URL

Living Standard — Last Updated

Participate:
GitHub whatwg/url (new issue, open issues, legacy open bugs)
IRC: #whatwg on Freenode
Commits:
https://github.com/whatwg/url/commits
@urlstandard
Translation (non-normative):
日本語

Abstract

The URL Standard defines URLs, domains, IP addresses, the application/x-www-form-urlencoded format, and their API.

Goals

The URL standard takes the following approach towards making URLs fully interoperable:

As the editors learn more about the subject matter the goals might increase in scope somewhat.

1. Infrastructure

Some terms used in this specification are defined in the DOM, Encoding, IDNA, and Web IDL Standards. [DOM] [ENCODING] [IDNA] [WEBIDL]

The C0 controls are code points in the range U+0000 to U+001F, inclusive.

The C0 controls and space are C0 controls and code point U+0020.

The tab and newline are code points U+0009, U+000A, and U+000D.

The ASCII digits are code points in the range U+0030 to U+0039, inclusive.

The ASCII hex digits are ASCII digits, code points in the range U+0041 to U+0046, inclusive, and code points in the range U+0061 to U+0066, inclusive.

The ASCII alpha are code points in the range U+0041 to U+005A, inclusive, and in the range U+0061 to U+007A, inclusive.

The ASCII alphanumeric are ASCII digits and ASCII alpha.

An ASCII string is a string whose code points are all in the range U+0000 to U+007F, inclusive.

To ASCII lowercase a string, replace all code points in the range U+0041 to U+005A, inclusive, with the corresponding code points in the range U+0061 to U+007A, inclusive.
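The code point classes and the ASCII lowercase operation defined above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical and not part of this specification, and each predicate assumes it is given a single code point.

```python
def is_ascii_digit(ch: str) -> bool:
    # U+0030 to U+0039
    return "0" <= ch <= "9"

def is_ascii_hex_digit(ch: str) -> bool:
    # ASCII digits, U+0041 to U+0046, and U+0061 to U+0066
    return is_ascii_digit(ch) or "A" <= ch <= "F" or "a" <= ch <= "f"

def is_ascii_alpha(ch: str) -> bool:
    # U+0041 to U+005A and U+0061 to U+007A
    return "A" <= ch <= "Z" or "a" <= ch <= "z"

def is_ascii_alphanumeric(ch: str) -> bool:
    return is_ascii_digit(ch) or is_ascii_alpha(ch)

def is_ascii_string(s: str) -> bool:
    # Every code point is in the range U+0000 to U+007F.
    return all(ord(ch) <= 0x7F for ch in s)

def ascii_lowercase(s: str) -> str:
    # Only U+0041 to U+005A are mapped; all other code points are kept as-is.
    return "".join(chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch for ch in s)
```

Unlike a general Unicode lowercasing routine, the last function leaves code points outside the ASCII uppercase range untouched.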


To serialize an integer, represent it as the shortest possible decimal number.


A Windows drive letter is two code points, of which the first is an ASCII alpha and the second is either ":" or "|".

A normalized Windows drive letter is a Windows drive letter of which the second code point is ":".
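A minimal sketch of the two drive letter definitions above. The function names are hypothetical and chosen for illustration only.

```python
import string

ASCII_ALPHA = set(string.ascii_letters)

def is_windows_drive_letter(s: str) -> bool:
    # Exactly two code points: an ASCII alpha followed by ":" or "|".
    return len(s) == 2 and s[0] in ASCII_ALPHA and s[1] in (":", "|")

def is_normalized_windows_drive_letter(s: str) -> bool:
    # Same as above, but the second code point has to be ":".
    return is_windows_drive_letter(s) and s[1] == ":"
```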

1.1. Parsers

The EOF code point is a conceptual code point that signifies the end of a string or code point stream.

Within a parser algorithm that uses a pointer variable, c references the code point the pointer variable points to.

Within a string-based parser algorithm that uses a pointer variable, remaining references the substring after pointer in the string being processed.

If "mailto:username@example" is a string being processed and pointer points to "@", c is "@" and remaining is "example".
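The example above can be reproduced with a short sketch. The helper name and the use of None as a stand-in for the EOF code point are assumptions made for illustration.

```python
EOF = None  # stand-in for the conceptual EOF code point

def c_and_remaining(s: str, pointer: int):
    # c is the code point at pointer, or EOF past the end of the string.
    c = s[pointer] if pointer < len(s) else EOF
    # remaining is the substring after pointer.
    remaining = s[pointer + 1:]
    return c, remaining

s = "mailto:username@example"
print(c_and_remaining(s, s.index("@")))  # ('@', 'example')
```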

A syntax violation indicates a non-fatal mismatch between input and syntax requirements. User agents, especially conformance checkers, are encouraged to report them somewhere.

A syntax violation does not mean that the parser terminates. Termination of a parser is always stated explicitly, e.g., through a return statement.

1.2. Percent-encoded bytes

A percent-encoded byte is "%", followed by two ASCII hex digits. Sequences of percent-encoded bytes, after conversion to bytes, should not cause the "UTF-8 decode without BOM or fail" algorithm to return failure.

To percent encode a byte into a percent-encoded byte, return a string consisting of "%", followed by a double-digit, uppercase, hexadecimal representation of byte.
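As a rough illustration, percent encoding a single byte could look like this. The function name is hypothetical, and the range check is an added safeguard rather than something the definition above requires.

```python
def percent_encode_byte(byte: int) -> str:
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte")
    # "%" followed by a double-digit, uppercase, hexadecimal representation.
    return "%{:02X}".format(byte)
```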

To percent decode a byte sequence input, run these steps:

Using anything but UTF-8 decode without BOM when the input contains bytes that are not ASCII bytes might be insecure and is not recommended.

  1. Let output be an empty byte sequence.

  2. For each byte byte in input, run these steps:

    1. If byte is not `%`, append byte to output.

    2. Otherwise, if byte is `%` and the next two bytes after byte in input are not in the ranges 0x30 to 0x39, 0x41 to 0x46, and 0x61 to 0x66, append byte to output.

    3. Otherwise, run these substeps:

      1. Let bytePoint be the two bytes after byte in input, decoded, and then interpreted as hexadecimal number.

      2. Append a byte whose value is bytePoint to output.

      3. Skip the next two bytes in input.

  3. Return output.
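The percent decode steps above can be sketched as follows. The function name is hypothetical; the loop index stands in for "skip the next two bytes".

```python
HEX_BYTES = b"0123456789ABCDEFabcdef"

def percent_decode(data: bytes) -> bytes:
    output = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        pair = data[i + 1:i + 3]
        is_escape = (
            byte == 0x25  # "%"
            and len(pair) == 2
            and pair[0] in HEX_BYTES
            and pair[1] in HEX_BYTES
        )
        if is_escape:
            # Interpret the two bytes as a hexadecimal number.
            output.append(int(pair.decode("ascii"), 16))
            i += 3  # the "%" plus the two bytes that were consumed
        else:
            output.append(byte)
            i += 1
    return bytes(output)
```

A "%" that is not followed by two ASCII hex digits is copied to the output unchanged, so malformed input never causes a failure here.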

The simple encode set is the C0 controls and all code points greater than U+007E.

The default encode set is the simple encode set and code points U+0020, '"', "#", "<", ">", "?", "`", "{", and "}".

The userinfo encode set is the default encode set and code points "/", ":", ";", "=", "@", "[", "\", "]", "^", and "|".

To UTF-8 percent encode a codePoint, using an encode set, run these steps:

  1. If codePoint is not in encode set, return codePoint.

  2. Let bytes be the result of running UTF-8 encode on codePoint.

  3. Percent encode each byte in bytes, and then return the results concatenated, in the same order.
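The three encode sets and the UTF-8 percent encode steps above might be sketched like this. Representing an encode set as a predicate on a single code point is a choice made for this sketch, and all names are hypothetical.

```python
def in_simple_encode_set(ch: str) -> bool:
    # C0 controls and all code points greater than U+007E.
    cp = ord(ch)
    return cp <= 0x1F or cp > 0x7E

def in_default_encode_set(ch: str) -> bool:
    return in_simple_encode_set(ch) or ch in ' "#<>?`{}'

def in_userinfo_encode_set(ch: str) -> bool:
    return in_default_encode_set(ch) or ch in "/:;=@[\\]^|"

def utf8_percent_encode(ch: str, in_encode_set) -> str:
    # Step 1: code points outside the encode set are returned unchanged.
    if not in_encode_set(ch):
        return ch
    # Steps 2 and 3: UTF-8 encode, then percent encode each byte in order.
    return "".join("%{:02X}".format(b) for b in ch.encode("utf-8"))
```

For example, "@" is left alone under the default encode set but becomes "%40" under the userinfo encode set.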

2. Security considerations

The security of a URL is a function of its environment. Care is to be taken when rendering, interpreting, and passing URLs around.

When rendering and allocating new URLs, "spoofing" needs to be considered: an attack whereby one host or URL can be confused for another. For example, consider how 1/l/I, m/rn/rri, 0/O, and а/a can all appear eerily similar. Worse, consider how U+202A and similar code points are invisible. [UTS36]

When passing a URL from party A to B, both need to carefully consider what is happening. A might end up leaking data it does not want to leak. B might receive input it did not expect and take an action that harms the user. In particular, B should never trust A, as at some point URLs from A can come from untrusted sources.

3. Hosts (domains and IP addresses)

A host is a domain, an IPv4 address, or an IPv6 address. Typically a host serves as a network address, but it is sometimes (ab)used as an opaque identifier in URLs where a network address is not necessary.

A domain identifies a realm within a network. [RFC1034]

An IPv4 address is a 32-bit identifier. [RFC791]

An IPv6 address is a 128-bit identifier and for the purposes of this specification represented as an ordered list of eight 16-bit pieces. [RFC4291]

Support for <zone_id> is intentionally omitted.

3.1. IDNA

The domain to ASCII algorithm, given a domain domain, runs these steps:

  1. Let result be the result of running Unicode ToASCII with domain_name set to domain, UseSTD3ASCIIRules set to false, processing_option set to Transitional_Processing, and VerifyDnsLength set to false.

  2. If result is a failure value, syntax violation, return failure.

  3. Return result.

The domain to Unicode algorithm, given a domain domain, runs these steps:

  1. Let result be the result of running Unicode ToUnicode with domain_name set to domain, UseSTD3ASCIIRules set to false.

  2. Signify syntax violations for any returned errors, and then, return result.

3.2. Host syntax

A host must be a domain, an IPv4 address, or "[", followed by an IPv6 address, followed by "]".

A domain is a valid domain if these steps return success:

  1. Let result be the result of running Unicode ToASCII with domain_name set to domain, UseSTD3ASCIIRules set to true, processing_option set to Nontransitional_Processing, and VerifyDnsLength set to true.

  2. If result is a failure value, return failure.

  3. Set result to the result of running Unicode ToUnicode with domain_name set to result, UseSTD3ASCIIRules set to true.

  4. If result contains any errors, return failure.

  5. Return success.

Ideally we define this in terms of a sequence of code points that make up a valid domain rather than through a whack-a-mole: bug 25334.

A domain must be a string that is a valid domain.

An IPv4 address must be four sequences of up to three ASCII digits per sequence, each representing a decimal number no greater than 255, and separated from each other by ".".
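A possible check for the IPv4 address syntax requirement above. The function name is hypothetical, and the sketch assumes that each sequence contains at least one digit, which the sentence above does not state explicitly. Note that this is the strict authoring syntax, not the more lenient behavior of the IPv4 parser.

```python
def is_valid_ipv4_syntax(s: str) -> bool:
    parts = s.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Assumption: one to three ASCII digits per sequence.
        if not 1 <= len(part) <= 3:
            return False
        if any(ch not in "0123456789" for ch in part):
            return False
        # Each sequence is read as a decimal number no greater than 255.
        if int(part) > 255:
            return False
    return True
```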

An IPv6 address is defined in the "Text Representation of Addresses" chapter of IP Version 6 Addressing Architecture. [RFC4291]

3.3. Host parsing

The host parser takes a string input and an optional Unicode flag (unset unless stated otherwise), and then runs these steps:

  1. If input starts with "[", run these substeps:

    1. If input does not end with "]", syntax violation, return failure.

    2. Return the result of IPv6 parsing input with its leading "[" and trailing "]" removed.

  2. Let domain be the result of UTF-8 decode without BOM on the percent decoding of UTF-8 encode on input.

  3. Let asciiDomain be the result of running domain to ASCII on domain.

  4. If asciiDomain is failure, return failure.

  5. If asciiDomain contains U+0000, U+0009, U+000A, U+000D, U+0020, "#", "%", "/", ":", "?", "@", "[", "\", or "]", syntax violation, return failure.

  6. Let ipv4Host be the result of IPv4 parsing asciiDomain.

  7. If ipv4Host is an IPv4 address or failure, return ipv4Host.

  8. Return asciiDomain if the Unicode flag is unset, and the result of running domain to Unicode on asciiDomain otherwise.

The IPv4 number parser takes a string input and a syntaxViolationFlag pointer, and then runs these steps:

  1. Let R be 10.

  2. If input contains at least two code points and the first two code points are either "0x" or "0X", run these substeps:

    1. Set syntaxViolationFlag.

    2. Remove the first two code points from input.

    3. Set R to 16.

  3. If input is the empty string, return zero.

  4. Otherwise, if input contains at least two code points and the first code point is "0", run these substeps:

    1. Set syntaxViolationFlag.

    2. Remove the first code point from input.

    3. Set R to 8.

  5. If input contains a code point that is not a radix-R digit, return failure.

  6. Return the mathematical integer value that is represented by input in radix-R notation, using ASCII hex digits for digits with values 0 through 15.
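The IPv4 number parser steps above, followed one-to-one, could be sketched as below. The function name is hypothetical. Since Python has no pointer arguments, this sketch returns the syntaxViolationFlag alongside the value, and uses None for failure; both are choices made for illustration.

```python
RADIX_DIGITS = {
    8: "01234567",
    10: "0123456789",
    16: "0123456789abcdefABCDEF",
}

def parse_ipv4_number(text: str):
    """Return (value, violation_flag); value is None on failure."""
    violation = False
    radix = 10                                  # step 1
    if len(text) >= 2 and text[:2] in ("0x", "0X"):
        violation = True                        # step 2
        text = text[2:]
        radix = 16
    if text == "":
        return 0, violation                     # step 3
    if len(text) >= 2 and text[0] == "0":
        violation = True                        # step 4
        text = text[1:]
        radix = 8
    if any(ch not in RADIX_DIGITS[radix] for ch in text):
        return None, violation                  # step 5
    return int(text, radix), violation          # step 6
```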

The IPv4 parser takes a string input and then runs these steps:

  1. Let syntaxViolationFlag be unset.

  2. Let parts be input split on ".".

  3. If the last item in parts is the empty string, set syntaxViolationFlag and remove the last item from parts.

  4. If parts has more than four items, return input.

  5. Let numbers be the empty list.

  6. For each part in parts:

    1. If part is the empty string, return input.

      0..0x300 is a domain, not an IPv4 address.

    2. Let n be the result of running the IPv4 number parser on part with syntaxViolationFlag.

    3. If n is failure, return input.

    4. Append n to numbers.

  7. If syntaxViolationFlag is set, syntax violation.

  8. If any item in numbers is greater than 255, syntax violation.

  9. If any but the last item in numbers is greater than 255, return failure.

  10. If the last item in numbers is greater than or equal to 256^(5 − the number of items in numbers), syntax violation, return failure.

  11. Let ipv4 be the last item in numbers.

  12. Remove the last item from numbers.

  13. Let counter be zero.

  14. For each n in numbers:

    1. Increment ipv4 by n × 256^(3 − counter).

    2. Increment counter by one.

  15. Return ipv4.
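A sketch of the IPv4 parser steps above. All names are hypothetical. To keep it short, syntax violations are not reported, failure is represented by None, an IPv4 address by an integer, and "return input" by returning the string unchanged. The guard for an empty list of numbers is an assumption, since the steps above do not say what happens for that input.

```python
FAILURE = None

def _parse_number(text: str):
    # Compact form of the IPv4 number parser, without violation tracking.
    radix = 10
    if len(text) >= 2 and text[:2] in ("0x", "0X"):
        text, radix = text[2:], 16
    if text == "":
        return 0
    if len(text) >= 2 and text[0] == "0":
        text, radix = text[1:], 8
    digits = "0123456789abcdefABCDEF" if radix == 16 else "0123456789"[:radix]
    if any(ch not in digits for ch in text):
        return FAILURE
    return int(text, radix)

def parse_ipv4(text: str):
    parts = text.split(".")
    if parts[-1] == "":
        parts.pop()                      # a single trailing dot is tolerated
    if len(parts) > 4:
        return text                      # not an IPv4 address, maybe a domain
    numbers = []
    for part in parts:
        if part == "":
            return text
        n = _parse_number(part)
        if n is FAILURE:
            return text
        numbers.append(n)
    if not numbers:
        return text                      # assumption: not covered by the steps
    if any(n > 255 for n in numbers[:-1]):
        return FAILURE
    if numbers[-1] >= 256 ** (5 - len(numbers)):
        return FAILURE
    ipv4 = numbers[-1]
    for counter, n in enumerate(numbers[:-1]):
        ipv4 += n * 256 ** (3 - counter)
    return ipv4
```

The last number may span several bytes, which is why "0x7f.1" yields the same address as "127.0.0.1".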

The IPv6 parser takes a string input and then runs these steps:

  1. Let address be a new IPv6 address with its 16-bit pieces initialized to 0.

  2. Let piece pointer be a pointer into address’s 16-bit pieces, initially zero (pointing to the first 16-bit piece), and let piece be the 16-bit piece it points to.

  3. Let compress pointer be another pointer into address’s 16-bit pieces, initially null and pointing to nothing.

  4. Let pointer be a pointer into input, initially zero (pointing to the first code point).

  5. If c is ":", run these substeps:

    1. If remaining does not start with ":", syntax violation, return failure.

    2. Increase pointer by two.

    3. Increase piece pointer by one and then set compress pointer to piece pointer.

  6. Main: While c is not the EOF code point, run these substeps:

    1. If piece pointer is eight, syntax violation, return failure.

    2. If c is ":", run these inner substeps:

      1. If compress pointer is non-null, syntax violation, return failure.

      2. Increase pointer and piece pointer by one, set compress pointer to piece pointer, and then jump to Main.

    3. Let value and length be 0.

    4. While length is less than 4 and c is an ASCII hex digit, set value to value × 0x10 + c interpreted as hexadecimal number, and increase pointer and length by one.

    5. Switching on c:

      "."
      1. If length is 0, syntax violation, return failure.

      2. Decrease pointer by length.

      3. Jump to IPv4.

      ":"
      1. Increase pointer by one.

      2. If c is the EOF code point, syntax violation, return failure.

      Anything but the EOF code point

      Syntax violation, return failure.

    6. Set piece to value.

    7. Increase piece pointer by one.

  7. If c is the EOF code point, jump to Finale.

  8. IPv4: If piece pointer is greater than six, syntax violation, return failure.

  9. Let dots seen be 0.

  10. While c is not the EOF code point, run these substeps:

    1. Let value be null.

    2. If c is not an ASCII digit, syntax violation, return failure.

    3. While c is an ASCII digit, run these subsubsteps:

      1. Let number be c interpreted as decimal number.

      2. If value is null, set value to number.

        Otherwise, if value is 0, syntax violation, return failure.

        Otherwise, set value to value × 10 + number.

      3. Increase pointer by one.

      4. If value is greater than 255, syntax violation, return failure.

    4. If dots seen is less than 3 and c is not a ".", syntax violation, return failure.

    5. Set piece to piece × 0x100 + value.

    6. If dots seen is 1 or 3, increase piece pointer by one.

    7. If c is not the EOF code point, increase pointer by one.

    8. If dots seen is 3 and c is not the EOF code point, syntax violation, return failure.

    9. Increase dots seen by one.

  11. Finale: If compress pointer is non-null, run these substeps:

    1. Let swaps be piece pointer − compress pointer.

    2. Set piece pointer to seven.

    3. While piece pointer is not zero and swaps is greater than zero, swap piece with the piece at pointer compress pointer + swaps − 1, and then decrease both piece pointer and swaps by one.

  12. Otherwise, if compress pointer is null and piece pointer is not eight, syntax violation, return failure.

  13. Return address.

To be clear, Main, IPv4, and Finale are simple markers. They serve no purpose other than being a location the algorithm can jump to.
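The IPv6 parser steps above can be sketched as follows. The names are hypothetical, failure is represented by None, an address by a list of eight integers, and syntax violations are not reported. The jump to IPv4 is modeled with a break: after the Main loop, c is non-EOF only when that jump was taken.

```python
HEX = "0123456789abcdefABCDEF"
DIGITS = "0123456789"

def parse_ipv6(text: str):
    def at(i):  # code point at index i, or None for the EOF code point
        return text[i] if i < len(text) else None

    address = [0] * 8
    piece = 0          # piece pointer
    compress = None    # compress pointer
    p = 0              # pointer
    if at(p) == ":":
        if at(p + 1) != ":":
            return None
        p += 2
        piece += 1
        compress = piece
    while at(p) is not None:                      # Main
        if piece == 8:
            return None
        if at(p) == ":":
            if compress is not None:
                return None
            p += 1
            piece += 1
            compress = piece
            continue
        value = length = 0
        while length < 4 and at(p) is not None and at(p) in HEX:
            value = value * 0x10 + int(at(p), 16)
            p += 1
            length += 1
        if at(p) == ".":
            if length == 0:
                return None
            p -= length
            break                                 # jump to IPv4
        elif at(p) == ":":
            p += 1
            if at(p) is None:
                return None
        elif at(p) is not None:
            return None
        address[piece] = value
        piece += 1
    if at(p) is not None:                         # IPv4
        if piece > 6:
            return None
        dots = 0
        while at(p) is not None:
            value = None
            if at(p) not in DIGITS:
                return None
            while at(p) is not None and at(p) in DIGITS:
                number = int(at(p))
                if value is None:
                    value = number
                elif value == 0:
                    return None
                else:
                    value = value * 10 + number
                p += 1
                if value > 255:
                    return None
            if dots < 3 and at(p) != ".":
                return None
            address[piece] = address[piece] * 0x100 + value
            if dots in (1, 3):
                piece += 1
            if at(p) is not None:
                p += 1
            if dots == 3 and at(p) is not None:
                return None
            dots += 1
    if compress is not None:                      # Finale
        swaps = piece - compress
        piece = 7
        while piece != 0 and swaps > 0:
            j = compress + swaps - 1
            address[piece], address[j] = address[j], address[piece]
            piece -= 1
            swaps -= 1
    elif piece != 8:
        return None
    return address
```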

3.4. Host serializing

The host serializer takes a host host and then runs these steps:

  1. If host is an IPv4 address, return the result of running the IPv4 serializer on host.

  2. Otherwise, if host is an IPv6 address, return "[", followed by the result of running the IPv6 serializer on host, followed by "]".

  3. Otherwise, host is a domain, return host.

The IPv4 serializer takes an IPv4 address address and then runs these steps:

  1. Let output be the empty string.

  2. Let n be the value of address.

  3. Repeat four times:

    1. Prepend n % 256, serialized, to output.

    2. Unless this is the fourth time, prepend "." to output.

    3. Set n to floor(n / 256).

  4. Return output.
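The IPv4 serializer steps above translate almost directly into code. The function name is hypothetical, and the address is assumed to be given as a 32-bit integer.

```python
def serialize_ipv4(address: int) -> str:
    output = ""
    n = address
    for i in range(4):
        # Prepend the lowest byte, serialized as a decimal number.
        output = str(n % 256) + output
        if i != 3:
            output = "." + output
        n //= 256
    return output

print(serialize_ipv4(0xC0A80001))  # 192.168.0.1
```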

The IPv6 serializer takes an IPv6 address address and then runs these steps:

  1. Let output be the empty string.

  2. Let compress pointer be a pointer to the first 16-bit piece in the first longest sequence of address’s 16-bit pieces that are 0.

    In 0:f:0:0:f:f:0:0 it would point to the second 0.

  3. If there is no sequence of address’s 16-bit pieces that are 0 longer than one, set compress pointer to null.

  4. For each piece in address’s pieces, run these substeps:

    1. If compress pointer points to piece, append "::" to output if piece is address’s first piece and append ":" otherwise, and then run these substeps again with all subsequent pieces in address’s pieces that are 0 skipped or go to the next step in the overall set of steps if that leaves no pieces.

    2. Append piece, represented as the shortest possible lowercase hexadecimal number, to output.

    3. If piece is not address’s last piece, append ":" to output.

  5. Return output.

This algorithm requires the recommendation from A Recommendation for IPv6 Address Text Representation. [RFC5952]
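A sketch of the IPv6 serializer steps above. The function name is hypothetical, the address is assumed to be a list of eight integers, and the compress pointer is modeled as an index that is None when no run of zero pieces is longer than one.

```python
def serialize_ipv6(pieces) -> str:
    # Find the first longest run of zero pieces that is longer than one.
    compress = None
    best_len = 1
    i = 0
    while i < 8:
        if pieces[i] == 0:
            j = i
            while j < 8 and pieces[j] == 0:
                j += 1
            if j - i > best_len:
                compress, best_len = i, j - i
            i = j
        else:
            i += 1

    output = ""
    i = 0
    while i < 8:
        if i == compress:
            output += "::" if i == 0 else ":"
            # Skip all subsequent pieces that are 0.
            while i < 8 and pieces[i] == 0:
                i += 1
            continue
        # Shortest possible lowercase hexadecimal number.
        output += format(pieces[i], "x")
        if i != 7:
            output += ":"
        i += 1
    return output
```

For the address 0:f:0:0:f:f:0:0 used in the example above, the first run of two zeros is compressed and the trailing run of the same length is written out in full.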

3.5. Host equivalence

To determine whether a host A equals B, return true if A is B, and false otherwise.

Certificate comparison requires a host equivalence check that ignores the trailing dot of a domain (if any). However, those hosts also have various other facets enforced, such as DNS length, that are not enforced here, as URLs do not enforce them. If anyone has a good suggestion for how to bring these two closer together, or what a good unified model would be, please file an issue.

4. URLs

A URL is a universal identifier. To disambiguate from a URL string it can also be referred to as a URL record.

A URL’s scheme is an ASCII string that identifies the type of URL and can be used to dispatch a URL for further processing after parsing. It is initially the empty string.

A URL’s username is an ASCII string identifying a user. It is initially the empty string.

A URL’s password is either null or an ASCII string identifying a user’s credentials. It is initially null.

A URL’s host is either null or a host. It is initially null.

A URL’s port is either null or a 16-bit unsigned integer that identifies a networking port. It is initially null.

A URL’s path is a list of zero or more ASCII strings holding data, usually identifying a location in hierarchical form. It is initially the empty list.

A URL’s query is either null or an ASCII string holding data. It is initially null.

A URL’s fragment is either null or a string holding data that can be used for further processing on the resource the URL’s other components identify. It is initially null.

This is not an ASCII string on purpose.

A URL also has an associated cannot-be-a-base-URL flag. It is initially unset.

A URL also has an associated object that is either null or a Blob object. It is initially null. [FILEAPI]

At this point this is used primarily to support "blob" URLs, but others can be added going forward, hence "object".
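The components of a URL record and their initial values, as described above, could be modeled with a data class. The class and field names are hypothetical, and the types for host and object are left open because their representation is not prescribed here.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass
class URLRecord:
    scheme: str = ""                      # initially the empty string
    username: str = ""                    # initially the empty string
    password: Optional[str] = None        # initially null
    host: Optional[Any] = None            # domain, IPv4 address, or IPv6 address
    port: Optional[int] = None            # 16-bit unsigned integer or null
    path: List[str] = field(default_factory=list)
    query: Optional[str] = None
    fragment: Optional[str] = None        # not necessarily an ASCII string
    cannot_be_a_base_url: bool = False    # flag, initially unset
    blob_object: Optional[Any] = None     # the associated object, initially null
```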


A special scheme is a scheme listed in the first column of the following table. A default port is a special scheme’s optional corresponding port and is listed in the second column on the same row.

scheme      port
"ftp"       21
"file"
"gopher"    70
"http"      80
"https"     443
"ws"        80
"wss"       443

A URL is special if its scheme is a special scheme.
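The table above maps naturally onto a dictionary. This is a sketch with hypothetical names; None stands for "no default port", which applies to "file" and to every non-special scheme.

```python
SPECIAL_SCHEMES = {
    "ftp": 21,
    "file": None,
    "gopher": 70,
    "http": 80,
    "https": 443,
    "ws": 80,
    "wss": 443,
}

def is_special_scheme(scheme: str) -> bool:
    return scheme in SPECIAL_SCHEMES

def default_port(scheme: str):
    # None for "file" and for schemes that are not special.
    return SPECIAL_SCHEMES.get(scheme)
```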

A local scheme is a scheme that is "about", "blob", "data", or "filesystem".

A URL is local if its scheme is a local scheme.

This definition is used externally. E.g., by the Fetch Standard and Referrer Policy. [FETCH] [REFERRER-POLICY]

A network scheme is a scheme that is "ftp", "http", or "https".

An HTTP(S) scheme is a scheme that is "http" or "https".

Network scheme and HTTP(S) scheme are used by HTML. [HTML]

A URL includes credentials if either its username is not the empty string or its password is non-null.

A URL can be designated as base URL.

A base URL is useful for the URL parser when the input might be a relative URL.


To pop a url’s path, if url’s scheme is not "file" or url’s path does not contain a single string that is a normalized Windows drive letter, remove url’s path’s last string, if any.
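A sketch of the pop operation above. The names are hypothetical. It assumes that "does not contain a single string that is a normalized Windows drive letter" means the path is not exactly one string that is a normalized Windows drive letter; that reading is an interpretation, not something the sentence above spells out.

```python
import string

def _is_normalized_windows_drive_letter(s: str) -> bool:
    return len(s) == 2 and s[0] in string.ascii_letters and s[1] == ":"

def pop_path(scheme: str, path: list) -> None:
    # Assumed reading: protect a lone drive letter in a "file" URL.
    only_drive_letter = (
        len(path) == 1 and _is_normalized_windows_drive_letter(path[0])
    )
    if scheme != "file" or not only_drive_letter:
        if path:
            path.pop()  # remove the last string, if any
```

Under this reading, popping the path of a "file" URL never removes a lone drive letter such as "C:".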

4.1. URL syntax

A URL must be either a relative URL with fragment or an absolute URL with fragment. To disambiguate from a URL record it can also be referred to as a URL string.

An absolute URL with fragment must be an absolute URL, followed by "#" and a fragment.

An absolute URL must be one of the following

any optionally followed by "?" and a query.

A scheme must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, "+", "-", and ".". Schemes should be registered in the IANA URI [sic] Schemes registry. [IANA-URI-SCHEMES] [RFC7595]
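The scheme syntax above can be checked with a regular expression. This is a sketch; the names are hypothetical, and it checks syntax only, not registration.

```python
import re

# One ASCII alpha, then zero or more of ASCII alphanumeric, "+", "-", and ".".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")

def is_valid_scheme(s: str) -> bool:
    return SCHEME_RE.fullmatch(s) is not None
```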

A relative URL with fragment must be a relative URL, followed by "#" and a fragment.

A relative URL must be one of the following, switching on base URL’s scheme:

Not "file"

a scheme-relative URL

a path-absolute URL

a path-relative scheme-less URL

"file"

a scheme-relative file URL

a path-absolute URL if base URL’s host is null

a path-absolute non-Windows-file URL if base URL’s host is non-null

a path-relative scheme-less URL

any optionally followed by "?" and a query.

A non-null base URL is necessary when parsing a relative URL.

A scheme-relative URL must be "//", followed by a host, optionally followed by ":" and a port, optionally followed by a path-absolute URL.

A port must be zero or more ASCII digits.

A scheme-relative file URL must be "//", followed by one of the following

A path-absolute URL must be "/" followed by a path-relative URL.

A path-absolute non-Windows-file URL must be a path-absolute URL that does not start with "/", followed by a Windows drive letter, followed by "/".

A path-relative URL must be zero or more path segments, separated from each other by "/", and not start with "/".

A path-relative scheme-less URL must be a path-relative URL that does not start with a scheme and ":".

A path segment must be one of the following

A single-dot path segment must be "." or an ASCII case-insensitive match for "%2e".

A double-dot path segment must be ".." or an ASCII case-insensitive match for ".%2e", "%2e.", or "%2e%2e".
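The two dot segment definitions above could be checked as follows. The names are hypothetical; ASCII case-insensitive matching is done by lowercasing only U+0041 to U+005A before comparing.

```python
def _ascii_lowercase(s: str) -> str:
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)

def is_single_dot_path_segment(segment: str) -> bool:
    return _ascii_lowercase(segment) in (".", "%2e")

def is_double_dot_path_segment(segment: str) -> bool:
    return _ascii_lowercase(segment) in ("..", ".%2e", "%2e.", "%2e%2e")
```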

A query must be zero or more URL units.

A fragment must be zero or more URL units.

The URL code points are ASCII alphanumeric, "!", "$", "&", "'", "(", ")", "*", "+", ",", "-", ".", "/", ":", ";", "=", "?", "@", "_", "~", and code points in the ranges U+00A0 to U+D7FF, U+E000 to U+FDCF, U+FDF0 to U+FFFD, U+10000 to U+1FFFD, U+20000 to U+2FFFD, U+30000 to U+3FFFD, U+40000 to U+4FFFD, U+50000 to U+5FFFD, U+60000 to U+6FFFD, U+70000 to U+7FFFD, U+80000 to U+8FFFD, U+90000 to U+9FFFD, U+A0000 to U+AFFFD, U+B0000 to U+BFFFD, U+C0000 to U+CFFFD, U+D0000 to U+DFFFD, U+E0000 to U+EFFFD, U+F0000 to U+FFFFD, U+100000 to U+10FFFD.

Code points higher than U+009F will be converted to percent-encoded bytes by the URL parser, except for code points appearing in fragments.

The URL units are URL code points and percent-encoded bytes.

Percent-encoded bytes can be used to encode code points that are not URL code points or are excluded from a syntax production.
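As an illustration of the URL code points and URL units definitions above, the following sketch tests whether a string consists only of URL units. All names are hypothetical. The ranges for the supplementary planes are generated rather than listed, since each one runs from the start of its plane to the code point ending in FFFD.

```python
import re

_PUNCTUATION = set("!$&'()*+,-./:;=?@_~")
_RANGES = [(0x00A0, 0xD7FF), (0xE000, 0xFDCF), (0xFDF0, 0xFFFD)] + [
    (plane << 16, (plane << 16) | 0xFFFD) for plane in range(1, 17)
]
_PERCENT_ENCODED_BYTE = re.compile(r"%[0-9A-Fa-f]{2}")

def is_url_code_point(ch: str) -> bool:
    if "0" <= ch <= "9" or "A" <= ch <= "Z" or "a" <= ch <= "z":
        return True
    if ch in _PUNCTUATION:
        return True
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _RANGES)

def consists_of_url_units(s: str) -> bool:
    i = 0
    while i < len(s):
        if _PERCENT_ENCODED_BYTE.match(s, i):
            i += 3      # "%" followed by two ASCII hex digits
        elif is_url_code_point(s[i]):
            i += 1
        else:
            return False
    return True
```

Note that "%" on its own is not a URL code point, so a stray "%" that does not start a percent-encoded byte makes the check fail.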


There is no conforming way to express a username or password of a URL record within a URL string.

4.2. URL parsing

The URL parser takes a string input, with an optional base URL base and an optional encoding encoding override, and then runs these steps:

Non-web-browser implementations only need to implement the basic URL parser.

  1. Let url be the result of running the basic URL parser on input with base, and encoding override as provided.

  2. If url is failure, return failure.

  3. If url’s scheme is not "blob", return url.

  4. If the first string in url’s path is not in the blob URL store, return url. [FILEAPI]

  5. Set url’s object to a StructuredClone of the entry in the blob URL store corresponding to the first string in url’s path. [HTML]

  6. Return url.


The basic URL parser takes a string input, optionally with a base URL base, optionally with an encoding encoding override, optionally with a URL url and a state override state override, and then runs these steps:

The encoding override argument is a legacy concept only relevant for HTML. The url and state override arguments are only for use by various APIs. [HTML]

When the url and state override arguments are not passed, the basic URL parser returns either a new URL or failure. If they are passed, the algorithm simply modifies the passed url and can terminate without returning anything.

  1. If url is not given:

    1. Set url to a new URL.

    2. If input contains any leading or trailing C0 controls and space, syntax violation.

    3. Remove any leading and trailing C0 controls and space from input.

  2. If input contains any tab and newline, syntax violation.

  3. Remove all tab and newline from input.

  4. Let state be state override if given, or scheme start state otherwise.

  5. If base is not given, set it to null.

  6. Let encoding be UTF-8.

  7. If encoding override is given, set encoding to the result of getting an output encoding from encoding override.

  8. Let buffer be the empty string.

  9. Let the @ flag and the [] flag be unset.

  10. Let pointer be a pointer to first code point in input.

  11. Keep running the following state machine by switching on state. If after a run pointer points to EOF code point, go to the next step. Otherwise, increase pointer by one and continue with the state machine.

    scheme start state
    1. If c is an ASCII alpha, append c, lowercased, to buffer, and set state to scheme state.

    2. Otherwise, if state override is not given, set state to no scheme state, and decrease pointer by one.

    3. Otherwise, syntax violation, terminate this algorithm.

    scheme state
    1. If c is an ASCII alphanumeric, "+", "-", or ".", append c, lowercased, to buffer.

    2. Otherwise, if c is ":", run these substeps:

      1. If state override is given, run these subsubsteps:

        1. If url’s scheme is a special scheme and buffer is not, terminate this algorithm.

        2. If url’s scheme is not a special scheme and buffer is, terminate this algorithm.

      2. Set url’s scheme to buffer.

      3. Set buffer to the empty string.

      4. If state override is given, terminate this algorithm.

      5. If url’s scheme is "file", run these subsubsteps:

        1. If remaining does not start with "//", syntax violation.

        2. Set state to file state.

      6. Otherwise, if url is special, base is non-null, and base’s scheme is equal to url’s scheme, set state to special relative or authority state.

        This means that base’s cannot-be-a-base-URL flag is unset.

      7. Otherwise, if url is special, set state to special authority slashes state.

      8. Otherwise, if remaining starts with a "/", set state to path or authority state, and increase pointer by one.

      9. Otherwise, set url’s cannot-be-a-base-URL flag, append an empty string to url’s path, and set state to cannot-be-a-base-URL path state.

    3. Otherwise, if state override is not given, set buffer to the empty string, state to no scheme state, and start over (from the first code point in input).

    4. Otherwise, syntax violation, terminate this algorithm.

    no scheme state
    1. If base is null, or base’s cannot-be-a-base-URL flag is set and c is not "#", syntax violation, return failure.

    2. Otherwise, if base’s cannot-be-a-base-URL flag is set and c is "#", set url’s scheme to base’s scheme, url’s path to base’s path, url’s query to base’s query, url’s fragment to the empty string, set url’s cannot-be-a-base-URL flag, and set state to fragment state.

    3. Otherwise, if base’s scheme is not "file", set state to relative state and decrease pointer by one.

    4. Otherwise, set state to file state and decrease pointer by one.

    special relative or authority state

    If c is "/" and remaining starts with "/", set state to special authority ignore slashes state and increase pointer by one.

    Otherwise, syntax violation, set state to relative state and decrease pointer by one.

    path or authority state

    If c is "/", set state to authority state.

    Otherwise, set state to path state, and decrease pointer by one.

    relative state

    Set url’s scheme to base’s scheme, and then, switching on c:

    EOF code point

    Set url’s username to base’s username, url’s password to base’s password, url’s host to base’s host, url’s port to base’s port, url’s path to base’s path, and url’s query to base’s query.

    "/"

    Set state to relative slash state.

    "?"

    Set url’s username to base’s username, url’s password to base’s password, url’s host to base’s host, url’s port to base’s port, url’s path to base’s path, url’s query to the empty string, and state to query state.

    "#"

    Set url’s username to base’s username, url’s password to base’s password, url’s host to base’s host, url’s port to base’s port, url’s path to base’s path, url’s query to base’s query, url’s fragment to the empty string, and state to fragment state.

    Otherwise

    If url is special and c is "\", syntax violation, set state to relative slash state.

    Otherwise, run these steps:

    1. Set url’s username to base’s username, url’s password to base’s password, url’s host to base’s host, url’s port to base’s port, url’s path to base’s path, and then remove url’s path’s last entry, if any.

    2. Set state to path state, and decrease pointer by one.

    relative slash state
    1. If either c is "/", or url is special and c is "\", run these substeps:

      1. If c is "\", syntax violation.

      2. Set state to special authority ignore slashes state.

    2. Otherwise, set url’s username to base’s username, url’s password to base’s password, url’s host to base’s host, url’s port to base’s port, state to path state, and then, decrease pointer by one.

    special authority slashes state

    If c is "/" and remaining starts with "/", set state to special authority ignore slashes state, and increase pointer by one.

    Otherwise, syntax violation, set state to special authority ignore slashes state, and decrease pointer by one.

    special authority ignore slashes state

    If c is neither "/" nor "\", set state to authority state, and decrease pointer by one.

    Otherwise, syntax violation.

    authority state
    1. If c is "@", run these substeps:

      1. Syntax violation.

      2. If the @ flag is set, prepend "%40" to buffer.

      3. Set the @ flag.

      4. For each codePoint in buffer, run these substeps:

        1. If codePoint is ":" and url’s password is null, set url’s password to the empty string and run these substeps for the next code point.

        2. Let encodedCodePoints be the result of running UTF-8 percent encode codePoint using the userinfo encode set.

        3. If url’s password is non-null, append encodedCodePoints to url’s password.

        4. Otherwise, append encodedCodePoints to url’s username.

      5. Set buffer to the empty string.

    2. Otherwise, if one of the following is true

      then decrease pointer by the number of code points in buffer plus one, set buffer to the empty string, and set state to host state.

    3. Otherwise, append c to buffer.

    host state
    hostname state
    1. If c is ":" and the [] flag is unset, run these substeps:

      1. If url is special and buffer is the empty string, return failure.

      2. Let host be the result of host parsing buffer.

      3. If host is failure, return failure.

      4. Set url’s host to host, buffer to the empty string, and state to port state.

      5. If state override is hostname state, terminate this algorithm.

    2. Otherwise, if one of the following is true

      then decrease pointer by one, and run these substeps:

      1. If url is special and buffer is the empty string, return failure.

      2. Let host be the result of host parsing buffer.

      3. If host is failure, return failure.

      4. Set url’s host to host, buffer to the empty string, and state to path start state.

      5. If state override is given, terminate this algorithm.

    3. Otherwise, run these substeps:

      1. If c is "[", set the [] flag.

      2. If c is "]", unset the [] flag.

      3. Append c to buffer.

    port state
    1. If c is an ASCII digit, append c to buffer.

    2. Otherwise, if one of the following is true

      run these substeps:

      1. If buffer is not the empty string, run these subsubsteps:

        1. Let port be the mathematical integer value that is represented by buffer in radix-10 using ASCII digits for digits with values 0 through 9.

        2. If port is greater than 2^16 − 1, syntax violation, return failure.

        3. Set url’s port to null, if port is url’s scheme’s default port, and to port otherwise.

        4. Set buffer to the empty string.

      2. If state override is given, terminate this algorithm.

      3. Set state to path start state, and decrease pointer by one.

    3. Otherwise, syntax violation, return failure.

    file state

    Set url’s scheme to "file", and then, switching on c:

    EOF code point

    If base is non-null and base’s scheme is "file", set url’s host to base’s host, url’s path to base’s path, and url’s query to base’s query.

    "/"
    "\"
    1. If c is "\", syntax violation.

    2. Set state to file slash state.

    "?"

    If base is non-null and base’s scheme is "file", set url’s host to base’s host, url’s path to base’s path, url’s query to the empty string, and state to query state.

    "#"

    If base is non-null and base’s scheme is "file", set url’s host to base’s host, url’s path to base’s path, url’s query to base’s query, url’s fragment to the empty string, and state to fragment state.

    Otherwise
    1. If base is non-null, base’s scheme is "file", and at least one of the following is true

      then set url’s host to base’s host, url’s path to base’s path, and then pop url’s path.

      This is a (platform-independent) Windows drive letter quirk.

    2. Otherwise, if base is non-null and base’s scheme is "file", syntax violation.

    3. Set state to path state, and decrease pointer by one.

    file slash state
    1. If c is "/" or "\", run these substeps:

      1. If c is "\", syntax violation.

      2. Set state to file host state.

    2. Otherwise, run these substeps:

      1. If base is non-null, base’s scheme is "file", and base’s path first string is a normalized Windows drive letter, append base’s path first string to url’s path.

        This is a (platform-independent) Windows drive letter quirk. Both url’s and base’s host are null under these conditions and therefore not copied.

      2. Set state to path state, and decrease pointer by one.

    file host state
    1. If c is EOF code point, "/", "\", "?", or "#", decrease pointer by one, and run these substeps:

      1. If buffer is a Windows drive letter, syntax violation, set state to path state.

        This is a (platform-independent) Windows drive letter quirk. buffer is not reset here and instead used in the path state.

      2. Otherwise, if buffer is the empty string, set state to path start state.

      3. Otherwise, run these steps:

        1. Let host be the result of host parsing buffer.

        2. If host is failure, return failure.

        3. If host is not "localhost", set url’s host to host.

        4. Set buffer to the empty string and state to path start state.

    2. Otherwise, append c to buffer.

    path start state
    1. If url is special and c is "\", syntax violation.

    2. Set state to path state, and if neither c is "/", nor url is special and c is "\", decrease pointer by one.

    path state
    1. If one of the following is true

      then run these substeps:

      1. If url is special and c is "\", syntax violation.

      2. If buffer is a double-dot path segment, pop url’s path, and then if neither c is "/", nor url is special and c is "\", append the empty string to url’s path.

      3. Otherwise, if buffer is a single-dot path segment and if neither c is "/", nor url is special and c is "\", append the empty string to url’s path.

      4. Otherwise, if buffer is not a single-dot path segment, run these subsubsteps:

        1. If url’s scheme is "file", url’s path is empty, and buffer is a Windows drive letter, run these subsubsubsteps:

          1. If url’s host is non-null, syntax violation.

          2. Set url’s host to null and replace the second code point in buffer with ":".

          This is a (platform-independent) Windows drive letter quirk.

        2. Append buffer to url’s path.

      5. Set buffer to the empty string.

      6. If c is "?", set url’s query to the empty string, and state to query state.

      7. If c is "#", set url’s fragment to the empty string, and state to fragment state.

    2. Otherwise, run these steps:

      1. If c is not a URL code point and not "%", syntax violation.

      2. If c is "%" and remaining does not start with two ASCII hex digits, syntax violation.

      3. If c is "%" and remaining, ASCII lowercased, starts with "2e", append "." to buffer and increase pointer by two.

      4. Otherwise, UTF-8 percent encode c using the default encode set, and append the result to buffer.

    cannot-be-a-base-URL path state
    1. If c is "?", set url’s query to the empty string and state to query state.

    2. Otherwise, if c is "#", set url’s fragment to the empty string and state to fragment state.

    3. Otherwise, run these substeps:

      1. If c is not EOF code point, not a URL code point, and not "%", syntax violation.

      2. If c is "%" and remaining does not start with two ASCII hex digits, syntax violation.

      3. If c is not EOF code point, UTF-8 percent encode c using the simple encode set, and append the result to the first string in url’s path.

    query state
    1. If c is EOF code point, or state override is not given and c is "#", run these substeps:

      1. If url is not special or url’s scheme is either "ws" or "wss", set encoding to UTF-8.

      2. Set buffer to the result of encoding buffer using encoding.

      3. For each byte in buffer run these subsubsteps:

        1. If byte is less than 0x21, greater than 0x7E, or is 0x22, 0x23, 0x3C, or 0x3E, append byte, percent encoded, to url’s query.

        2. Otherwise, append a code point whose value is byte to url’s query.

      4. Set buffer to the empty string.

      5. If c is "#", set url’s fragment to the empty string, and state to fragment state.

    2. Otherwise, run these substeps:

      1. If c is not a URL code point and not "%", syntax violation.

      2. If c is "%" and remaining does not start with two ASCII hex digits, syntax violation.

      3. Append c to buffer.

    fragment state

    Switching on c:

    EOF code point

    Do nothing.

    U+0000

    Syntax violation.

    Otherwise
    1. If c is not a URL code point and not "%", syntax violation.

    2. If c is "%" and remaining does not start with two ASCII hex digits, syntax violation.

    3. Append c to url’s fragment.

      Unfortunately, not using percent-encoding is intentional, as implementations with majority market share exhibit this behavior.

  12. Return url.
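
The buffer flush performed by the query state can be sketched as follows. This is a minimal, non-normative illustration that assumes encoding is UTF-8; the function name encodeQueryBuffer is hypothetical and not defined by this standard.

```javascript
// Non-normative sketch of the query state's buffer flush, assuming UTF-8.
// encodeQueryBuffer is a hypothetical name used only for illustration.
function encodeQueryBuffer(buffer) {
  // "Set buffer to the result of encoding buffer using encoding."
  const bytes = new TextEncoder().encode(buffer);
  let query = "";
  for (const byte of bytes) {
    // Bytes below 0x21, above 0x7E, and 0x22, 0x23, 0x3C, 0x3E are percent encoded.
    const mustEscape =
      byte < 0x21 || byte > 0x7e ||
      byte === 0x22 || byte === 0x23 || byte === 0x3c || byte === 0x3e;
    query += mustEscape
      ? "%" + byte.toString(16).toUpperCase().padStart(2, "0")
      : String.fromCharCode(byte);
  }
  return query;
}

console.log(encodeQueryBuffer("a b")); // "a%20b"
console.log(encodeQueryBuffer("é"));   // "%C3%A9"
```

Note that bytes such as 0x27 (') pass through unchanged under these rules.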


To set the username given a url and username, run these steps:

  1. Set url’s username to the empty string.

  2. For each code point in username, UTF-8 percent encode it using the userinfo encode set, and append the result to url’s username.

To set the password given a url and password, run these steps:

  1. If password is the empty string, set url’s password to null.

  2. Otherwise, run these substeps:

    1. Set url’s password to the empty string.

    2. For each code point in password, UTF-8 percent encode it using the userinfo encode set, and append the result to url’s password.

4.3. URL serializing

The URL serializer takes a URL url, an optional exclude fragment flag, and then runs these steps:

  1. Let output be url’s scheme and ":" concatenated.

  2. If url’s host is non-null:

    1. Append "//" to output.

    2. If url’s username is not the empty string or url’s password is non-null, run these substeps:

      1. Append url’s username to output.

      2. If url’s password is non-null, append ":", followed by url’s password, to output.

      3. Append "@" to output.

    3. Append url’s host, serialized, to output.

    4. If url’s port is non-null, append ":" followed by url’s port, serialized, to output.

  3. Otherwise, if url’s host is null and url’s scheme is "file", append "//" to output.

  4. If url’s cannot-be-a-base-URL flag is set, append the first string in url’s path to output.

  5. Otherwise, append "/", followed by the strings in url’s path (including empty strings), separated from each other by "/", to output.

  6. If url’s query is non-null, append "?", followed by url’s query, to output.

  7. If the exclude fragment flag is unset and url’s fragment is non-null, append "#", followed by url’s fragment, to output.

  8. Return output.
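
The steps above can be sketched in JavaScript as follows. This is a non-normative illustration: the record shape (scheme, username, password, host, port, path, query, fragment, cannotBeABaseURL) and the name serializeURL are assumptions made for the example, and host is assumed to be already serialized to a string.

```javascript
// Non-normative sketch of the URL serializer over a plain record.
// The record shape and the function name are hypothetical.
function serializeURL(url, excludeFragment = false) {
  let output = url.scheme + ":";
  if (url.host !== null) {
    output += "//";
    if (url.username !== "" || url.password !== null) {
      output += url.username;
      if (url.password !== null) output += ":" + url.password;
      output += "@";
    }
    output += url.host; // assumed to be serialized already
    if (url.port !== null) output += ":" + String(url.port);
  } else if (url.scheme === "file") {
    output += "//";
  }
  if (url.cannotBeABaseURL) {
    output += url.path[0];
  } else {
    // Empty strings in path are kept, so ["a", ""] yields "/a/".
    output += "/" + url.path.join("/");
  }
  if (url.query !== null) output += "?" + url.query;
  if (!excludeFragment && url.fragment !== null) output += "#" + url.fragment;
  return output;
}

const record = {
  scheme: "https", username: "user", password: null,
  host: "example.org", port: 8080, path: ["a", "b", ""],
  query: "q=1", fragment: "top", cannotBeABaseURL: false,
};
console.log(serializeURL(record));       // "https://user@example.org:8080/a/b/?q=1#top"
console.log(serializeURL(record, true)); // "https://user@example.org:8080/a/b/?q=1"
```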

4.4. URL equivalence

To determine whether a URL A equals B, optionally with an exclude fragments flag, run these steps:

  1. Let serializedA be the result of serializing A, with the exclude fragment flag set if the exclude fragments flag is set.

  2. Let serializedB be the result of serializing B, with the exclude fragment flag set if the exclude fragments flag is set.

  3. Return true if serializedA is serializedB, and false otherwise.
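
As a non-normative illustration, the comparison can be approximated with the URL class from the API section. The sketch assumes a runtime that provides that class and takes absolute URL strings rather than URL records; the name urlEquals is hypothetical.

```javascript
// Non-normative sketch: compare two absolute URL strings by serialization.
// Assumes a runtime with a global URL class; urlEquals is a hypothetical name.
function urlEquals(a, b, excludeFragments = false) {
  const serialize = (input) => {
    const url = new URL(input);
    // Setting hash to the empty string sets the fragment to null
    // (the hash setter does nothing for "javascript" URLs).
    if (excludeFragments) url.hash = "";
    return url.href;
  };
  return serialize(a) === serialize(b);
}

console.log(urlEquals("https://example.org/a#x", "https://example.org/a#y"));       // false
console.log(urlEquals("https://example.org/a#x", "https://example.org/a#y", true)); // true
```

Because the comparison is made on serializations, inputs that differ only in ways the parser normalizes (scheme case, a default port) compare as equal.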

4.5. Origin

See origin’s definition in HTML for the necessary background information. [HTML]

A URL’s origin is the origin returned by running these steps, switching on URL’s scheme:

"blob"

Let url be the result of parsing the first string in URL’s path.

Return a new opaque origin, if url is failure, and url’s origin otherwise.

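The origin of blob:https://whatwg.org/d0360e2f-caee-469f-9a2f-87d5b0456f6f is the tuple (https, whatwg.org, null, null). The port is null because the URL parser sets a URL’s port to null when it is the scheme’s default port.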

"ftp"
"gopher"
"http"
"https"
"ws"
"wss"

Return a tuple consisting of URL’s scheme, URL’s host, URL’s port, and null.

"file"

Unfortunate as it is, this is left as an exercise to the reader. When in doubt, return a new opaque origin.

Otherwise

Return a new opaque origin.

This does indeed mean that these URLs cannot be same-origin with themselves.
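
The switch on scheme can be sketched as follows. This is a non-normative illustration: the record shape, the { opaque: true } representation of an opaque origin, and the parse callback (which returns a URL record, or null on failure) are all assumptions made for the example.

```javascript
// Non-normative sketch of computing a URL's origin.
// originOf, the record shape, and the parse callback are hypothetical.
const TUPLE_SCHEMES = new Set(["ftp", "gopher", "http", "https", "ws", "wss"]);

function originOf(url, parse) {
  if (url.scheme === "blob") {
    // Parse the first string in the path and use that URL's origin.
    const inner = parse(url.path[0]);
    return inner === null ? { opaque: true } : originOf(inner, parse);
  }
  if (TUPLE_SCHEMES.has(url.scheme)) {
    return { scheme: url.scheme, host: url.host, port: url.port, domain: null };
  }
  // "file" and every other scheme: a new opaque origin on each call.
  return { opaque: true };
}

const dataURL = { scheme: "data", host: null, port: null, path: ["text/plain,hi"] };
const first = originOf(dataURL, () => null);
const second = originOf(dataURL, () => null);
console.log(first === second); // false
```

Returning a fresh object for every opaque origin mirrors the remark above: two origin computations for the same such URL never yield the same origin.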

4.6. URL rendering

A URL should be rendered in its serialized form, with these modifications:

For the purposes of bidirectional text it should be rendered as if it were in a left-to-right embedding. [BIDI]

Unfortunately, as rendered URLs are simply strings and can appear anywhere, a specific bidirectional algorithm for rendered URLs would not see wide adoption. Bidirectional text interacts with the parts of a URL in ways that can cause the rendering to be different from the model. Users of bidirectional languages are thus cautioned that this is to be expected, particularly in plain text environments.

Due to the confusion that can arise between a URL’s host and path with bidirectional text, browsers are encouraged to only render a URL’s host in places where it is important for users to distinguish between the two. E.g., users are expected to make trust decisions based on a URL’s host rendered in the address bar.

5. application/x-www-form-urlencoded

The application/x-www-form-urlencoded format is a simple way to encode name-value pairs in a byte sequence where all bytes are ASCII bytes.

The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms. [HTML]

5.1. application/x-www-form-urlencoded parsing

The features provided by the application/x-www-form-urlencoded parser are mainly relevant for server-oriented implementations. A browser-based implementation only needs what the application/x-www-form-urlencoded string parser requires.

The application/x-www-form-urlencoded parser takes a byte sequence input, optionally with an encoding encoding override, and optionally with a use _charset_ flag, and then runs these steps:

  1. Let encoding be UTF-8.

  2. If encoding override is given, set encoding to encoding override.

  3. If encoding is not UTF-8 and input contains bytes that are not ASCII bytes, return failure.

    This can only happen if input was not generated through the serializer or URLSearchParams.

  4. Let sequences be the result of splitting input on `&`.

  5. Let tuples be an empty list of name-value tuples where both name and value hold a byte sequence.

  6. For each byte sequence bytes in sequences, run these substeps:

    1. If bytes is the empty byte sequence, run these substeps for the next byte sequence.

    2. If bytes contains a `=`, then let name be the bytes from the start of bytes up to but excluding its first `=`, and let value be the bytes, if any, after the first `=` up to the end of bytes. If `=` is the first byte, then name will be the empty byte sequence. If it is the last, then value will be the empty byte sequence.

    3. Otherwise, let name have the value of bytes and let value be the empty byte sequence.

    4. Replace any `+` in name and value with 0x20.

    5. If use _charset_ flag is set and name is `_charset_`, run these substeps:

      1. Let result be the result of getting an encoding for value, decoded.

      2. If result is not failure, unset use _charset_ flag and set encoding to result.

    6. Add a tuple consisting of name and value to tuples.

  7. Let output be an empty list of name-value tuples where both name and value hold a string.

  8. For each name-value tuple in tuples, append a name-value tuple to output where the new name and value appended to output are the result of running decode on the percent decoding of the name and value from tuples, respectively, using encoding.

  9. Return output.
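
A simplified, non-normative sketch of the parser follows. It assumes encoding is always UTF-8 and omits the encoding override and the use _charset_ flag; the names percentDecode and parseUrlencoded are hypothetical.

```javascript
// Non-normative sketch of the application/x-www-form-urlencoded parser,
// restricted to UTF-8. Function names are hypothetical.
function percentDecode(bytes) {
  const out = [];
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x25 && i + 2 < bytes.length) {
      const hex = String.fromCharCode(bytes[i + 1], bytes[i + 2]);
      if (/^[0-9A-Fa-f]{2}$/.test(hex)) {
        out.push(parseInt(hex, 16));
        i += 2;
        continue;
      }
    }
    out.push(bytes[i]);
  }
  return Uint8Array.from(out);
}

function parseUrlencoded(input) {
  const decoder = new TextDecoder("utf-8");
  const plusToSpace = (seq) => seq.map((b) => (b === 0x2b ? 0x20 : b));
  const output = [];
  let start = 0;
  for (let i = 0; i <= input.length; i++) {
    if (i < input.length && input[i] !== 0x26) continue; // split on `&`
    const bytes = input.subarray(start, i);
    start = i + 1;
    if (bytes.length === 0) continue; // skip empty sequences
    const eq = bytes.indexOf(0x3d); // first `=`
    const name = eq === -1 ? bytes : bytes.subarray(0, eq);
    const value = eq === -1 ? new Uint8Array(0) : bytes.subarray(eq + 1);
    // `+` is replaced before percent decoding, so "%2B" survives as "+".
    output.push([
      decoder.decode(percentDecode(plusToSpace(name))),
      decoder.decode(percentDecode(plusToSpace(value))),
    ]);
  }
  return output;
}

const parsed = parseUrlencoded(new TextEncoder().encode("a=1&b=hello+world&&c&=d&f=1%2B1"));
console.log(parsed); // [["a","1"],["b","hello world"],["c",""],["","d"],["f","1+1"]]
```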

5.2. application/x-www-form-urlencoded serializing

The application/x-www-form-urlencoded byte serializer takes a byte sequence input and then runs these steps:

  1. Let output be the empty string.

  2. For each byte in input, depending on byte:

    0x20

    Append U+002B to output.

    0x2A
    0x2D
    0x2E
    0x30 to 0x39
    0x41 to 0x5A
    0x5F
    0x61 to 0x7A

    Append a code point whose value is byte to output.

    Otherwise

    Append byte, percent encoded, to output.

  3. Return output.
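
The byte serializer can be sketched as follows. This is a non-normative illustration; the name serializeUrlencodedBytes is hypothetical, and the input is assumed to be a Uint8Array.

```javascript
// Non-normative sketch of the application/x-www-form-urlencoded byte serializer.
// serializeUrlencodedBytes is a hypothetical name.
function serializeUrlencodedBytes(input) {
  let output = "";
  for (const byte of input) {
    if (byte === 0x20) {
      output += "+"; // U+002B
    } else if (
      byte === 0x2a || byte === 0x2d || byte === 0x2e ||
      (byte >= 0x30 && byte <= 0x39) ||
      (byte >= 0x41 && byte <= 0x5a) ||
      byte === 0x5f ||
      (byte >= 0x61 && byte <= 0x7a)
    ) {
      output += String.fromCharCode(byte);
    } else {
      // Everything else is percent encoded.
      output += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return output;
}

const sample = new TextEncoder().encode("a b*c-d.e_f~g");
console.log(serializeUrlencodedBytes(sample)); // "a+b*c-d.e_f%7Eg"
```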

The application/x-www-form-urlencoded serializer takes a list of name-value or name-value-type tuples tuples, optionally with an encoding encoding override, and then runs these steps:

  1. Let encoding be UTF-8.

  2. If encoding override is given, set encoding to the result of getting an output encoding from encoding override.

  3. Let output be the empty string.

  4. For each tuple in tuples, run these substeps:

    1. Let outputPair be a new name-value pair.

    2. Set outputPair’s name to the result of serializing the result of encoding tuple’s name, using encoding.

    3. If tuple has a type, tuple’s type is "hidden", and outputPair’s name is "_charset_", set outputPair’s value to encoding’s name.

    4. Otherwise, if tuple has a type, and tuple’s type is "file", set outputPair’s value to tuple’s value’s filename.

    5. Otherwise, set outputPair’s value to the result of serializing the result of encoding tuple’s value, using encoding.

    6. If tuple is not the first pair in tuples, then append "&" to output.

    7. Append outputPair’s name, followed by "=", followed by outputPair’s value, to output.

  5. Return output.

The HTML standard invokes this algorithm with name-value-type tuples. [HTML]
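
A non-normative sketch of the serializer follows. It assumes encoding is always UTF-8 (no encoding override), represents tuples as plain objects with name, value, and an optional type, and includes a compact byte serializer so that the example is self-contained. All names are hypothetical.

```javascript
// Non-normative sketch of the application/x-www-form-urlencoded serializer,
// restricted to UTF-8. Tuple shape and function names are hypothetical.
const utf8 = new TextEncoder();

function serializeBytes(bytes) {
  let out = "";
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    if (b === 0x20) out += "+";
    else if (/[*\-.0-9A-Z_a-z]/.test(ch)) out += ch;
    else out += "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

function serializeUrlencoded(tuples) {
  let output = "";
  tuples.forEach((tuple, index) => {
    const name = serializeBytes(utf8.encode(tuple.name));
    let value;
    if (tuple.type === "hidden" && name === "_charset_") {
      value = "UTF-8"; // the encoding's name
    } else if (tuple.type === "file") {
      value = tuple.value.filename; // appended as-is, following the steps above
    } else {
      value = serializeBytes(utf8.encode(tuple.value));
    }
    if (index > 0) output += "&";
    output += name + "=" + value;
  });
  return output;
}

const serialized = serializeUrlencoded([
  { name: "q", value: "a b&c" },
  { name: "_charset_", value: "", type: "hidden" },
  { name: "upload", value: { filename: "cat.png" }, type: "file" },
]);
console.log(serialized); // "q=a+b%26c&_charset_=UTF-8&upload=cat.png"
```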

5.3. Hooks

The application/x-www-form-urlencoded string parser takes a string input, UTF-8 encodes it, and then returns the result of application/x-www-form-urlencoded parsing it.

6. API

[Constructor(USVString url, optional USVString base),
 Exposed=(Window,Worker)]
interface URL {
  static USVString domainToASCII(USVString domain);
  static USVString domainToUnicode(USVString domain);

  stringifier attribute USVString href;
  readonly attribute USVString origin;
           attribute USVString protocol;
           attribute USVString username;
           attribute USVString password;
           attribute USVString host;
           attribute USVString hostname;
           attribute USVString port;
           attribute USVString pathname;
           attribute USVString search;
  readonly attribute URLSearchParams searchParams;
           attribute USVString hash;
};

A URL object has an associated url (a URL) and query object (a URLSearchParams object).

6.1. Constructors

The URL(url, base) constructor, when invoked, must run these steps:

  1. Let parsedBase be null.

  2. If base is given, run these substeps:

    1. Let parsedBase be the result of running the basic URL parser on base.

    2. If parsedBase is failure, throw a TypeError exception.

  3. Let parsedURL be the result of running the basic URL parser on url with parsedBase.

  4. If parsedURL is failure, throw a TypeError exception.

  5. Let query be parsedURL’s query, if that is non-null, and the empty string otherwise.

  6. Let result be a new URL object.

  7. Set result’s url to parsedURL.

  8. Set result’s query object to a new URLSearchParams object using query, and then set that query object’s url object to result.

  9. Return result.

To parse a string into a URL without using a base URL, invoke the URL constructor with a single argument:

var input = "https://example.org/💩",
    url = new URL(input)
url.pathname // "/%F0%9F%92%A9"

This throws an exception if the input is not an absolute URL:

try {
  var url = new URL("/🍣🍺")
} catch(e) {
  // A TypeError was thrown: "/🍣🍺" cannot be parsed without a base URL.
}

A base URL is necessary if the input is a relative URL:

var input = "/🍣🍺",
    url = new URL(input, document.baseURI)
url.href // "https://url.spec.whatwg.org/%F0%9F%8D%A3%F0%9F%8D%BA"

6.2. URL statics

The domainToASCII(domain) static method, when invoked, must run these steps:

  1. Let asciiDomain be the result of host parsing domain.

  2. Return the empty string if asciiDomain is not a domain, and asciiDomain otherwise.

The domainToUnicode(domain) static method, when invoked, must run these steps:

  1. Let unicodeDomain be the result of host parsing domain with the Unicode flag set.

  2. Return the empty string if unicodeDomain is not a domain, and unicodeDomain otherwise.

6.3. URL members

The href attribute’s getter must return the serialization of context object’s url.

The href attribute’s setter must run these steps:

  1. Let parsedURL be the result of running the basic URL parser on the given value.

  2. If parsedURL is failure, throw a TypeError exception.

  3. Set context object’s url to parsedURL.

The origin attribute’s getter must return the Unicode serialization of context object’s url’s origin. [HTML]

It returns the Unicode rather than the ASCII serialization for compatibility with HTML’s MessageEvent feature. [HTML]

The protocol attribute’s getter must return context object’s url’s scheme, followed by ":".

The protocol attribute’s setter must basic URL parse the given value, followed by ":", with context object’s url as url and scheme start state as state override.

The username attribute’s getter must return context object’s url’s username.

The username attribute’s setter must run these steps:

  1. If context object’s url’s host is null, or its cannot-be-a-base-URL flag is set, terminate these steps.

  2. Set the username given context object’s url and the given value.

The password attribute’s getter must run these steps:

  1. If context object’s url’s password is null, return the empty string.

  2. Return context object’s url’s password.

The password attribute’s setter must run these steps:

  1. If context object’s url’s host is null, or its cannot-be-a-base-URL flag is set, terminate these steps.

  2. Set the password given context object’s url and the given value.

The host attribute’s getter must run these steps:

  1. Let url be context object’s url.

  2. If url’s host is null, return the empty string.

  3. If url’s port is null, return url’s host, serialized.

  4. Return url’s host, serialized, followed by ":" and url’s port, serialized.

The host attribute’s setter must run these steps:

  1. If context object’s url’s cannot-be-a-base-URL flag is set, terminate these steps.

  2. Basic URL parse the given value with context object’s url as url and host state as state override.

If the given value for the host attribute’s setter lacks a port, context object’s url’s port will not change. This can be unexpected, as the host attribute’s getter does return a port, so one might have assumed that the setter would always "reset" both.
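
The port-preserving behavior of the host attribute’s setter can be observed as follows. This non-normative example assumes a runtime that implements the URL class; the variable name is arbitrary.

```javascript
// Non-normative example; assumes a runtime with a global URL class.
const hostDemo = new URL("https://example.org:8080/path");

hostDemo.host = "example.com"; // no port in the given value
console.log(hostDemo.href);    // "https://example.com:8080/path"

hostDemo.host = "example.net:9090"; // host and port both given
console.log(hostDemo.href);         // "https://example.net:9090/path"
```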

The hostname attribute’s getter must run these steps:

  1. If context object’s url’s host is null, return the empty string.

  2. Return context object’s url’s host, serialized.

The hostname attribute’s setter must run these steps:

  1. If context object’s url’s cannot-be-a-base-URL flag is set, terminate these steps.

  2. Basic URL parse the given value with context object’s url as url and hostname state as state override.

The port attribute’s getter must run these steps:

  1. If context object’s url’s port is null, return the empty string.

  2. Return context object’s url’s port, serialized.

The port attribute’s setter must run these steps:

  1. If context object’s url’s host is null, its cannot-be-a-base-URL flag is set, or its scheme is "file", terminate these steps.

  2. Basic URL parse the given value with context object’s url as url and port state as state override.

The pathname attribute’s getter must run these steps:

  1. If context object’s url’s cannot-be-a-base-URL flag is set, return the first string in context object’s url’s path.

  2. Return "/", followed by the strings in context object’s url’s path (including empty strings), separated from each other by "/".

The pathname attribute’s setter must run these steps:

  1. If context object’s url’s cannot-be-a-base-URL flag is set, terminate these steps.

  2. Empty context object’s url’s path.

  3. Basic URL parse the given value with context object’s url as url and path start state as state override.

The search attribute’s getter must run these steps:

  1. If context object’s url’s query is either null or the empty string, return the empty string.

  2. Return "?", followed by context object’s url’s query.

The search attribute’s setter must run these steps:

  1. Let url be context object’s url.

  2. If the given value is the empty string, set url’s query to null, empty url’s query object’s list, and terminate these steps.

  3. Let input be the given value with a single leading "?" removed, if any.

  4. Set url’s query to the empty string.

  5. Basic URL parse input with url as url and query state as state override.

  6. Set url’s query object’s list to the result of parsing input.

The searchParams attribute’s getter must return context object’s query object.

The hash attribute’s getter must run these steps:

  1. If context object’s url’s fragment is either null or the empty string, return the empty string.

  2. Return "#", followed by context object’s url’s fragment.

The hash attribute’s setter must run these steps:

  1. If context object’s url’s scheme is "javascript", terminate these steps.

  2. If the given value is the empty string, set context object’s url’s fragment to null and terminate these steps.

  3. Let input be the given value with a single leading "#" removed, if any.

  4. Set context object’s url’s fragment to the empty string.

  5. Basic URL parse input with context object’s url as url and fragment state as state override.
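
The following non-normative example shows how several getters and setters behave when a component is null or the empty string. It assumes a runtime that implements the URL class; the variable names are arbitrary.

```javascript
// Non-normative example; assumes a runtime with a global URL class.
const members = new URL("https://user@example.org:443/a/b?#");

console.log(members.port);     // "" (443 is the default port, so port is null)
console.log(members.host);     // "example.org"
console.log(members.password); // "" (password is null)
console.log(members.search);   // "" (query is the empty string)
console.log(members.hash);     // "" (fragment is the empty string)
console.log(members.href);     // "https://user@example.org/a/b?#"

members.search = "";           // sets query to null
console.log(members.href);     // "https://user@example.org/a/b#"

members.hash = "";             // sets fragment to null
console.log(members.href);     // "https://user@example.org/a/b"

// For a cannot-be-a-base-URL URL, pathname is the first string in the path.
const mail = new URL("mailto:user@example.org");
console.log(mail.pathname);    // "user@example.org"
```

The getters cannot distinguish a null query or fragment from an empty one; only the serialization in href does.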

6.4. Interface URLSearchParams

[Constructor(optional (USVString or URLSearchParams) init = ""),
 Exposed=(Window,Worker)]
interface URLSearchParams {
  void append(USVString name, USVString value);
  void delete(USVString name);
  USVString? get(USVString name);
  sequence<USVString> getAll(USVString name);
  boolean has(USVString name);
  void set(USVString name, USVString value);
  iterable<USVString, USVString>;
  stringifier;
};

A URLSearchParams object has an associated list of name-value pairs, which is initially empty.

A URLSearchParams object has an associated url object, which is initially null.

To create a new URLSearchParams object, optionally using init, run these steps:

  1. Let query be a new URLSearchParams object.

  2. If init is a string, set query’s list to the result of parsing init.

  3. If init is a URLSearchParams object, set query’s list to a copy of init’s list.

  4. Return query.

A URLSearchParams object’s update steps are to set url object’s url’s query to the serialization of URLSearchParams object’s list.

The URLSearchParams(init) constructor, when invoked, must run these steps:

  1. If init is given, is a string, and starts with "?", remove the first code point from init.

  2. Return a new URLSearchParams object, using init if given.
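
A short, non-normative example of the constructor’s behavior follows, assuming a runtime that implements URLSearchParams. Only a single leading "?" is removed, and constructing from another URLSearchParams object copies its list.

```javascript
// Non-normative example; assumes a runtime with a global URLSearchParams class.
const original = new URLSearchParams("?x=1&y=2");
console.log(original.get("x")); // "1" (the leading "?" was removed)

const doubled = new URLSearchParams("??x=1");
console.log(doubled.get("?x")); // "1" (only the first "?" is removed)

const copy = new URLSearchParams(original);
copy.append("z", "3");
console.log(original.has("z")); // false (the list was copied, not shared)
```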

The append(name, value) method, when invoked, must run these steps:

  1. Append a new name-value pair whose name is name and value is value, to list.

  2. Run the update steps.

The delete(name) method, when invoked, must run these steps:

  1. Remove all name-value pairs whose name is name from list.

  2. Run the update steps.

The get(name) method, when invoked, must return the value of the first name-value pair whose name is name in list, if there is such a pair, and null otherwise.

The getAll(name) method, when invoked, must return the values of all name-value pairs whose name is name, in list, in list order, and the empty sequence otherwise.

The set(name, value) method, when invoked, must run these steps:

  1. If there are any name-value pairs whose name is name, in list, set the value of the first such name-value pair to value and remove the others.

  2. Otherwise, append a new name-value pair whose name is name and value is value, to list.

  3. Run the update steps.

The has(name) method, when invoked, must return true if there is a name-value pair whose name is name in list, and false otherwise.

The value pairs to iterate over are the list’s name-value pairs, with the key being the name and the value being the value.

The stringification behavior must return the serialization of the URLSearchParams object’s list.
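
The methods above, together with the update steps, can be exercised as follows. This non-normative example assumes a runtime that implements the URL and URLSearchParams classes; the variable names are arbitrary.

```javascript
// Non-normative example; assumes global URL and URLSearchParams classes.
const page = new URL("https://example.org/?a=1&b=2&a=3");
const params = page.searchParams;

console.log(params.get("a"));    // "1" (first matching pair)
console.log(params.getAll("a")); // ["1", "3"]
console.log(params.get("c"));    // null
console.log(params.has("b"));    // true

params.set("a", "9");            // first "a" updated, the others removed
console.log(params.toString());  // "a=9&b=2"

params.append("c", "x y");       // the update steps rewrite the URL's query
console.log(page.search);        // "?a=9&b=2&c=x+y"

params.delete("b");
console.log(page.href);          // "https://example.org/?a=9&c=x+y"
console.log([...params]);        // [["a", "9"], ["c", "x y"]]
```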

6.5. URL APIs elsewhere

A standard that exposes URLs should expose the URL as a string (by serializing an internal URL). A standard should not expose a URL using a URL object. URL objects are meant for URL manipulation. In IDL the USVString type should be used.

The higher-level notion here is that values are to be exposed as immutable data structures.

If a standard decides to use a variant of the name "URL" for a feature it defines, it should name such a feature "url" (i.e., lowercase and with an "l" at the end). Names such as "URL", "URI", and "IRI" should not be used. However, if the name is a compound, "URL" (i.e., uppercase) is preferred, e.g., "newURL" and "oldURL".

The EventSource and HashChangeEvent interfaces in HTML are examples of proper naming. [HTML]

Acknowledgments

Many people have helped make URLs more interoperable over the years and thereby furthered the goals of this standard. Likewise, many people have helped make this standard what it is today.

With that, many thanks to 100の人, Adam Barth, Addison Phillips, Albert Wiersch, Alexandre Morgaut, Andrew Sullivan, Arkadiusz Michalski, Behnam Esfahbod, Bobby Holley, Boris Zbarsky, Brad Hill, Brandon Ross, Chris Rebert, Dan Appelquist, Daniel Bratell, David Håsäther, David Sheets, David Singer, David Walp, Domenic Denicola, Erik Arvidsson, Gavin Carothers, Geoff Richards, Glenn Maynard, Henri Sivonen, Ian Hickson, Jakub Gieryluk, James Graham, James Manger, James Ross, Joshua Bell, Jxck, Kevin Grandon, Larry Masinter, Leif Halvard Silli, Mark Davis, Marcos Cáceres, Martin Dürst, Mathias Bynens, Michael Peick, Michael™ Smith, Michel Suignard, Peter Occil, Philip Jägenstedt, Prayag Verma, Rodney Rehm, Roy Fielding, Ryan Sleevi, Sam Ruby, Santiago M. Mola, Sebastian Mayr, Simon Pieters, Simon Sapin, Stuart Cook, Sven Uhlig, Tab Atkins, 吉野剛史 (Takeshi Yoshino), Tantek Çelik, Tim Berners-Lee, Titi_Alone, Tomek Wytrębowicz, Valentin Gosu, Vyacheslav Matva, 山岸和利 (Yamagishi Kazutoshi), and 成瀬ゆい (Yui Naruse) for being awesome!

This standard is written by Anne van Kesteren (Mozilla, annevk@annevk.nl).

Per CC0, to the extent possible under law, the editors have waived all copyright and related or neighboring rights to this work.

Conformance

All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this specification are to be interpreted as described in RFC2119. For readability, these words do not appear in all uppercase letters in this specification. [RFC2119]

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)

References

Normative References

[BIDI]
Mark Davis; Aharon Lanin; Andrew Glass. Unicode Bidirectional Algorithm. 5 June 2014. Unicode Standard Annex #9. URL: http://www.unicode.org/reports/tr9/
[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[ENCODING]
Anne van Kesteren. Encoding Standard. Living Standard. URL: https://encoding.spec.whatwg.org/
[FILEAPI]
Arun Ranganathan; Jonas Sicking. File API. URL: https://w3c.github.io/FileAPI/
[HTML]
Ian Hickson. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[IANA-URI-SCHEMES]
Uniform Resource Identifier (URI) Schemes. URL: http://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
[IDNA]
Mark Davis; Michel Suignard. Unicode IDNA Compatibility Processing. URL: http://www.unicode.org/reports/tr46/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[RFC4291]
R. Hinden; S. Deering. IP Version 6 Addressing Architecture. February 2006. Draft Standard. URL: https://tools.ietf.org/html/rfc4291
[UTS36]
Mark Davis; Michel Suignard. Unicode Security Considerations. URL: http://unicode.org/reports/tr36/
[WEBIDL]
Boris Zbarsky; Cameron McCormack. Web IDL. URL: https://heycam.github.io/webidl/

Informative References

[FETCH]
Anne van Kesteren. Fetch Standard. Living Standard. URL: https://fetch.spec.whatwg.org/
[REFERRER-POLICY]
Jochen Eisinger; Mike West. Referrer Policy. 7 August 2014. WD. URL: https://w3c.github.io/webappsec/specs/referrer-policy/
[RFC1034]
P.V. Mockapetris. Domain names - concepts and facilities. November 1987. Internet Standard. URL: https://tools.ietf.org/html/rfc1034
[RFC3986]
T. Berners-Lee; R. Fielding; L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. January 2005. Internet Standard. URL: https://tools.ietf.org/html/rfc3986
[RFC3987]
M. Duerst; M. Suignard. Internationalized Resource Identifiers (IRIs). January 2005. Proposed Standard. URL: https://tools.ietf.org/html/rfc3987
[RFC5952]
S. Kawamura; M. Kawashima. A Recommendation for IPv6 Address Text Representation. August 2010. Proposed Standard. URL: https://tools.ietf.org/html/rfc5952
[RFC6454]
A. Barth. The Web Origin Concept. December 2011. Proposed Standard. URL: https://tools.ietf.org/html/rfc6454
[RFC7595]
D. Thaler, Ed.; T. Hansen; T. Hardie. Guidelines and Registration Procedures for URI Schemes. June 2015. Best Current Practice. URL: https://tools.ietf.org/html/rfc7595
[RFC791]
J. Postel. Internet Protocol. September 1981. Internet Standard. URL: https://tools.ietf.org/html/rfc791
