Components

Containers

Three containers are provided for interacting with URLs:

| Name | Description |
|---|---|
| url | A valid, modifiable URL which performs dynamic memory allocation to store the character buffer. |
| url_view | A read-only reference to a character buffer containing a valid URL. The view does not retain ownership of the underlying character buffer; instead, it is managed by the caller. |
| static_url | A valid, modifiable URL which stores the character buffer inside the class itself. This is a class template, where the maximum buffer size is a non-type template parameter. |

Inheritance provides the observer and modifier public members; class url_view_base has all the observers, while class url_base has all the modifiers. Although the members are public, these base classes can only be constructed by the library as needed to support the implementation. The class hierarchy looks like this:

[Figure: class hierarchy diagram]

Throughout this documentation and especially below, when an observer is discussed, it is applicable to all three derived containers shown in the table above. When a modifier is discussed, it is relevant to the containers url and static_url. The tables and exposition which follow describe the available observers and modifiers, along with notes relating important behaviors or special requirements.

Scheme

The most important part is the scheme, whose production rule is:

scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
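As an illustration of how this production rule reads in practice, the following standalone sketch checks a candidate string against it. The helper name is_valid_scheme and the two character predicates are hypothetical and not part of the library; the code uses only the C++17 standard library.

```cpp
#include <string_view>

// ALPHA in the rule above: an ASCII letter
inline bool is_scheme_alpha(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// DIGIT in the rule above: an ASCII decimal digit
inline bool is_scheme_digit(char c)
{
    return c >= '0' && c <= '9';
}

// Hypothetical helper: true if s matches
//   ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
inline bool is_valid_scheme(std::string_view s)
{
    // At least one character is required, and it must be a letter
    if (s.empty() || !is_scheme_alpha(s.front()))
        return false;
    // Every following character must come from the allowed set
    for (char c : s.substr(1))
    {
        bool ok = is_scheme_alpha(c) || is_scheme_digit(c)
            || c == '+' || c == '-' || c == '.';
        if (!ok)
            return false;
    }
    return true;
}
```

Under this rule "git+https" is accepted because the plus sign is allowed after the first letter, while "1http" is rejected because the first character is a digit.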

The scheme, which some informal texts incorrectly refer to as "protocol", defines how the rest of the URL is interpreted. Public schemes are registered and managed by the Internet Assigned Numbers Authority (IANA). Here are some registered schemes and their corresponding specifications:

Scheme Specification

http

magnet

mailto

payto

telnet

urn

Private schemes are possible, defined by organizations to enumerate internal resources such as documents or physical devices, or to facilitate the operation of their software. These are not subject to the same rigor as the registered ones; they can be developed and modified by the organization to meet specific needs with less concern for interoperability or backward compatibility. Note that private does not imply secret; some private schemes such as Amazon’s "s3" have publicly available specifications and are quite popular. Here are some examples:

Scheme Specification

app

odbc

slack

In some cases the scheme is implied by the surrounding context and therefore omitted. Here is a complete HTTP/1.1 GET request for the target URL "/index.htm":

GET /index.htm HTTP/1.1
Host: www.example.com
Accept: text/html
User-Agent: Beast

The scheme "http" is implied here because the context is already an HTTP request. The production rule for the URL in the request above is called origin-form, which the HTTP specification defines as follows:

origin-form    = absolute-path [ "?" query ]

absolute-path  = 1*( "/" segment )

All URLs have a scheme, whether it is explicit or implicit. The scheme determines what the rest of the URL means.

Here are some more examples of URLs using various schemes (and one example of something that is not a URL):

| URL | Notes |
|---|---|
| https://www.boost.org/index.html | Hierarchical URL with https scheme. Resource in the HTTP protocol. |
| ftp://host.dom/etc/motd | Hierarchical URL with ftp scheme. Resource in the FTP protocol. |
| urn:isbn:045145052 | Opaque URL with urn scheme. Identifies isbn resource. |
| mailto:person@example.com | Opaque URL with mailto scheme. Identifies e-mail address. |
| index.html | URL reference. Missing scheme and authority. |
| www.boost.org | A Protocol-Relative Link (PRL). Not a URL. |

API Reference

The scheme is represented as a case-insensitive string, along with an enumeration constant which acts as a numeric identifier when the string matches one of the well-known schemes: http, https, ws, wss, file, and ftp. Characters in the scheme are never escaped; only letters, digits, plus signs ("+"), hyphens ("-"), and periods (".") are allowed, and the first character must be a letter.
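The pairing of a case-insensitive string with a numeric identifier can be pictured with a small standalone sketch. The enumeration scheme_kind and the functions ascii_lower, iequals, and classify_scheme are hypothetical stand-ins written for this illustration, not the library's own types; the sketch also assumes that an empty string stands for "no scheme".

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for an enumeration of well-known schemes
enum class scheme_kind { none, unknown, ftp, file, http, https, ws, wss };

// Lowercase a single ASCII letter; other characters pass through
inline char ascii_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// ASCII case-insensitive comparison
inline bool iequals(std::string_view a, std::string_view b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (ascii_lower(a[i]) != ascii_lower(b[i]))
            return false;
    return true;
}

// Map a scheme string to an identifier.
// Assumption: an empty string means the URL has no scheme.
inline scheme_kind classify_scheme(std::string_view s)
{
    if (s.empty())
        return scheme_kind::none;
    struct entry { std::string_view name; scheme_kind id; };
    static const entry table[] = {
        { "ftp",   scheme_kind::ftp   },
        { "file",  scheme_kind::file  },
        { "http",  scheme_kind::http  },
        { "https", scheme_kind::https },
        { "ws",    scheme_kind::ws    },
        { "wss",   scheme_kind::wss   },
    };
    for (auto const& e : table)
        if (iequals(s, e.name))
            return e.id;
    // A scheme is present but is not one of the well-known ones
    return scheme_kind::unknown;
}
```

With this sketch "HTTP" and "http" map to the same identifier, while a valid but unlisted scheme such as "mailto" maps to the unknown value rather than to none.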

These members are used to inspect and modify the scheme:

| Function | Return Type | Description |
|---|---|---|
| has_scheme | bool | Return true if a scheme is present. |
| scheme | string_view | Return the scheme as a string, or the empty string if there is no scheme. |
| scheme_id | scheme | Return the scheme as an enumerated constant, the value scheme::unknown if the scheme is not one of the well-known schemes, or the value scheme::none if there is no scheme. |

| Function | Parameters | Description |
|---|---|---|
| set_scheme | string_view | Set the scheme to a string. |
| set_scheme_id | scheme | Set the scheme to a well-known scheme constant. |
| remove_scheme | | Remove the scheme if present. This includes the trailing colon (":"). |

Some package managers (pip, npm) and tools use compound schemes like git+https:// or svn+ssh:// where a plus sign separates a protocol from a transport mechanism. Boost.URL treats these as single scheme strings per RFC 3986 (which allows plus signs). To extract the transport suffix, use a helper like scheme_ex:

// Helper function to extract transport scheme from compound schemes
boost::core::string_view
scheme_ex(boost::core::string_view s)
{
    // Find the last '+' in the scheme
    // Examples: "git+https" -> "https", "svn+ssh" -> "ssh"
    auto pos = s.rfind('+');
    if (pos != boost::core::string_view::npos)
        return s.substr(pos + 1);
    return {};
}

This is an informal convention, not a URL standard. See WHATWG discussion.

Authority

The authority determines how a resource can be accessed. It contains two parts: the userinfo that holds identity credentials, and the host and port which identify a communication endpoint having dominion over the resource described in the remainder of the URL. This is the ABNF specification for the authority part:

authority   = [ user [ ":" password ] "@" ] host [ ":" port ]

The combination of user and optional password is called the userinfo.
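To make the grammar concrete, here is a standalone sketch that splits an authority string into its optional userinfo, its host, and its optional port. The type authority_parts and the function split_authority are hypothetical names for this illustration; the sketch assumes the input is already a syntactically valid authority and does not validate it.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type; optional members model "absent" parts
struct authority_parts
{
    std::optional<std::string_view> userinfo; // absent when there is no '@'
    std::string_view host;                    // always defined, may be empty
    std::optional<std::string_view> port;     // absent when there is no ':' after the host
};

// Split "[ userinfo '@' ] host [ ':' port ]".
// Assumption: s is a valid authority, so an unescaped '@' can only be
// the userinfo delimiter.
inline authority_parts split_authority(std::string_view s)
{
    authority_parts out;
    auto at = s.find('@');
    if (at != std::string_view::npos)
    {
        out.userinfo = s.substr(0, at);
        s.remove_prefix(at + 1);
    }
    // A bracketed IP literal may itself contain colons, so the port
    // separator is searched for only after the closing bracket
    std::size_t from = 0;
    if (!s.empty() && s.front() == '[')
    {
        auto close = s.find(']');
        if (close != std::string_view::npos)
            from = close + 1;
    }
    auto colon = s.find(':', from);
    if (colon != std::string_view::npos)
    {
        out.port = s.substr(colon + 1);
        s = s.substr(0, colon);
    }
    out.host = s;
    return out;
}
```

Using std::optional for the userinfo and port keeps an empty part (for example the port in "example.com:") distinguishable from a part that is not there at all.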

[Figure: authority diagram]

Some observations:

  • The use of the password field is deprecated.

  • The authority always has a defined host field, even if empty.

  • The host can be a name, or an IPv4, an IPv6, or an IPvFuture address.

  • All but the port field use percent-encoding to escape delimiters.
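The last observation relies on percent-encoding, where a delimiter is written as "%" followed by two hexadecimal digits. The following standalone sketch shows the decoding direction only. The names hex_value and pct_decode are hypothetical and written for this illustration; they are not the library's decoding functions.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Map one hexadecimal digit to its value, or -1 if it is not a hex digit
inline int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: replace each "%XX" escape with the byte it encodes.
// Returns std::nullopt if an escape is truncated or contains non-hex digits.
inline std::optional<std::string> pct_decode(std::string_view s)
{
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (s[i] != '%')
        {
            out.push_back(s[i]);
            continue;
        }
        // Two more characters must follow the percent sign
        if (i + 2 >= s.size())
            return std::nullopt;
        int hi = hex_value(s[i + 1]);
        int lo = hex_value(s[i + 2]);
        if (hi < 0 || lo < 0)
            return std::nullopt;
        out.push_back(static_cast<char>(hi * 16 + lo));
        i += 2; // skip the two hex digits just consumed
    }
    return out;
}
```

For example, a user written as "john%40doe" decodes to "john@doe"; the escaped at sign is data and does not end the userinfo.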

The host subcomponent represents where resources are located.

Note that if an authority is present, the host is always defined even if it is the empty string (corresponding to a zero-length reg-name in the ABNF).

url_view u( "https:///path/to_resource" );
assert( u.has_authority() );
assert( u.authority().buffer().empty() );
assert( u.path() == "/path/to_resource" );

The authority component also influences how we should interpret the URL path. If the authority is present, the path component must either be empty or begin with a slash.

Although the specification allows the format username:password, the password component should be used with care.

It is not recommended to transfer password data through URLs unless this is an empty string indicating no password.

API Reference

The authority is an optional part whose presence is indicated by an unescaped double slash ("//") immediately following the scheme, or at the beginning if the scheme is not present. It contains three components: an optional userinfo, the host, and an optional port.

An empty authority, corresponding to just a zero-length host component, is distinct from the absence of an authority. These members are used to inspect and modify the authority as a whole string:

| Function | Return Type | Description |
|---|---|---|
| has_authority | bool | Return true if an authority is present. |
| authority | authority_view | Return the authority as an authority_view. |
| encoded_authority | pct_string_view | Return the authority as a read-only view. |

| Function | Parameters | Description |
|---|---|---|
| set_encoded_authority | pct_string_view | Set the authority to the string, which may contain percent escapes. Reserved characters are percent-escaped automatically. |
| remove_authority | | Remove the authority if present. This includes the leading double slash ("//"). |

The paragraphs and tables that follow describe how to interact with the individual parts of the authority.

Userinfo

An authority may have an optional userinfo, which consists of a user and an optional password. The presence of the userinfo is indicated by an unescaped at sign ("@") that follows it. The password, if present, is prefixed by an unescaped colon (":"). An empty password string is distinct from no password. This table shows various URLs with userinfos, and the corresponding user and password:

| URL | User | Password | Notes |
|---|---|---|---|
| //user:pass@ | "user" | "pass" | User and password |
| //@ | "" | | Empty user, no password |
| //user@ | "user" | | No password |
| //user:@ | "user" | "" | Empty password |
| //:pass@ | "" | "pass" | Empty user |
| //:@ | "" | "" | Empty user and password |
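The distinction between an empty password and no password in the table above can be modeled with an optional value. This is a standalone sketch; userinfo_parts and split_userinfo are hypothetical names, and the input is assumed to be the encoded text that precedes the at sign.

```cpp
#include <optional>
#include <string_view>

// Hypothetical result type for the text that precedes '@'
struct userinfo_parts
{
    std::string_view user;                    // may be empty
    std::optional<std::string_view> password; // std::nullopt means "no password"
};

// Split at the first unescaped colon. In encoded form a colon that
// belongs to the user is written as "%3A", so the first literal ':'
// is the separator; later colons belong to the password.
inline userinfo_parts split_userinfo(std::string_view s)
{
    userinfo_parts out;
    auto colon = s.find(':');
    if (colon == std::string_view::npos)
    {
        out.user = s; // no colon: the password is absent
        return out;
    }
    out.user = s.substr(0, colon);
    out.password = s.substr(colon + 1); // present, possibly empty
    return out;
}
```

Applied to the rows of the table, "user:" yields a present but empty password, while "user" yields no password at all.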

Although the specification allows the format username:password, the password component is deprecated and should be avoided if possible or otherwise used with care. It is not recommended to transfer password data through URLs unless it is an empty string indicating no password.

These members are used to inspect and modify the userinfo:

| Function | Return Type | Description |
|---|---|---|
| has_userinfo | bool | Return true if a userinfo is present. |
| has_password | bool | Return true if a password is present. |
| user | std::string | Return the user as a decoded string. |
| password | std::string | Return the password as a decoded string. |
| userinfo | std::string | Return the userinfo as a decoded string. |
| encoded_user | pct_string_view | Return the user as a string with percent escapes. |
| encoded_password | pct_string_view | Return the password as a string with percent escapes, or an empty string if no password is present. |
| encoded_userinfo | pct_string_view | Return the userinfo as a string with percent escapes. |

| Function | Parameters | Description |
|---|---|---|
| set_user | string_view | Set the user to the string. Reserved characters are percent-escaped automatically. |
| set_password | string_view | Set the password to the string. Reserved characters are percent-escaped automatically. |
| set_userinfo | string_view | Set the userinfo to the string. Reserved characters are percent-escaped automatically. |
| set_encoded_user | pct_string_view | Set the user to the string, which may contain percent escapes. Reserved characters are percent-escaped automatically. |
| set_encoded_password | pct_string_view | Set the password to the string, which may contain percent escapes. Reserved characters are percent-escaped automatically. |
| set_encoded_userinfo | pct_string_view | Set the userinfo to the string, which may contain percent escapes. Reserved characters are percent-escaped automatically. |
| remove_password | | Remove the password if present. This includes the password separator colon (":"). |
| remove_userinfo | | Remove the userinfo if present. This includes the user and password separator colon (":") and the trailing at sign ("@"). |

Host

The host portion of the authority is a string which can be a host name, an IPv4 address, an IPv6 address, or an IPvFuture address depending on the contents. The host is always defined if an authority is present, even if the resulting host string would be zero length.
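How the contents of the host string select among these four kinds can be sketched without the library. The enumeration host_kind and the functions below are hypothetical names for this illustration. The sketch follows the RFC 3986 shapes only loosely: it treats any bracketed string as an IP literal without validating what is inside the brackets, and it accepts an IPv4 address only as four decimal octets from 0 to 255 without leading zeros.

```cpp
#include <string_view>

// Hypothetical stand-in for a host type enumeration
enum class host_kind { name, ipv4, ipv6, ipvfuture };

// True if s is a decimal octet: 0 to 255 with no leading zeros
inline bool is_dec_octet(std::string_view s)
{
    if (s.empty() || s.size() > 3)
        return false;
    if (s.size() > 1 && s[0] == '0')
        return false;
    int v = 0;
    for (char c : s)
    {
        if (c < '0' || c > '9')
            return false;
        v = v * 10 + (c - '0');
    }
    return v <= 255;
}

// True if s is four decimal octets separated by periods
inline bool is_ipv4(std::string_view s)
{
    for (int i = 0; i < 3; ++i)
    {
        auto dot = s.find('.');
        if (dot == std::string_view::npos || !is_dec_octet(s.substr(0, dot)))
            return false;
        s.remove_prefix(dot + 1);
    }
    return is_dec_octet(s);
}

// Classify an encoded host string by its shape.
// Assumption: the text inside brackets is not validated here.
inline host_kind classify_host(std::string_view s)
{
    if (s.size() >= 2 && s.front() == '[' && s.back() == ']')
    {
        // IPvFuture literals start with 'v' after the opening bracket
        char c = s[1];
        return (c == 'v' || c == 'V') ? host_kind::ipvfuture : host_kind::ipv6;
    }
    // Anything that is not a dotted-decimal IPv4 address is a name
    return is_ipv4(s) ? host_kind::ipv4 : host_kind::name;
}
```

Note that a string such as "256.1.1.1" falls through to the name case in this sketch because 256 is not a valid octet, and that the empty host is also classified as a name.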

These members are used to inspect and modify the host:

| Function | Return Type | Description |
|---|---|---|
| host_type | host_type | Return the host type enumeration constant. If there is no authority, this is the value host_type::none. |
| host | std::string | Return the host as a decoded string, or an empty string if there is no authority. |
| host_address | std::string | Return the host as a decoded string. If the host type is host_type::ipv6 or host_type::ipvfuture, the enclosing brackets are removed. |
| host_name | std::string | Return the host name as a decoded string, or the empty string if the host type is not host_type::name. |
| host_ipv4_address | ipv4_address | Return the host as an ipv4_address. If the host type is not host_type::ipv4, a default-constructed value is returned. |
| host_ipv6_address | ipv6_address | Return the host as an ipv6_address. If the host type is not host_type::ipv6, a default-constructed value is returned. |
| host_ipvfuture | string_view | Return the host as a string without enclosing brackets if the host type is host_type::ipvfuture, otherwise return an empty string. |
| encoded_host | pct_string_view | Return the host as a string with percent escapes, or an empty string if there is no authority. This includes enclosing brackets if the host type is host_type::ipv6 or host_type::ipvfuture. |
| encoded_host_address | pct_string_view | Return the host as a string with percent escapes. If the host type is host_type::ipv6 or host_type::ipvfuture, the enclosing brackets are removed. |
| encoded_host_name | pct_string_view | Return the host name as a string with percent escapes. If the host type is not host_type::name, an empty string is returned. |

| Function | Parameters | Description |
|---|---|---|
| set_host | string_view | Set the host to the string, depending on the contents. If the string is a valid IPv4 address, a valid IPv6 address enclosed in brackets, or a valid IPvFuture address enclosed in brackets then the resulting host type is host_type::ipv4, host_type::ipv6, or host_type::ipvfuture respectively. Otherwise, the host type is host_type::name, and any reserved characters are percent-escaped automatically. |
| set_host_address | string_view | Set the host to the string, depending on the contents. If the string is a valid IPv4 address, a valid IPv6 address, or a valid IPvFuture address then the resulting host type is host_type::ipv4, host_type::ipv6, or host_type::ipvfuture respectively. Otherwise, the host type is host_type::name, and any reserved characters are percent-escaped automatically. |
| set_host_ipv4 | ipv4_address | Set the host to the IPv4 address. The host type is host_type::ipv4. |
| set_host_ipv6 | ipv6_address | Set the host to the IPv6 address. The host type is host_type::ipv6. |
| set_host_ipvfuture | string_view | Set the host to the IPvFuture address, which should not include square brackets. The host type is host_type::ipvfuture. If the string is not a valid IPvFuture address, an exception is thrown. |
| set_host_name | string_view | Set the host to the string. Any reserved characters are percent-escaped automatically. The host type is host_type::name. |
| set_encoded_host | pct_string_view | Set the host to the string, depending on the contents. If the string is a valid IPv4 address, a valid IPv6 address enclosed in brackets, or a valid IPvFuture address enclosed in brackets then the resulting host type is host_type::ipv4, host_type::ipv6, or host_type::ipvfuture respectively. Otherwise, the host type is host_type::name, the string may contain percent escapes, and any reserved characters are percent-escaped automatically. |
| set_encoded_host_address | pct_string_view | Set the host to the string, depending on the contents. If the string is a valid IPv4 address, a valid IPv6 address, or a valid IPvFuture address then the resulting host type is host_type::ipv4, host_type::ipv6, or host_type::ipvfuture respectively. Otherwise, the host type is host_type::name, the string may contain percent escapes, and any reserved characters are percent-escaped automatically. |
| set_encoded_host_name | pct_string_view | Set the host to the string, which may contain percent escapes. Any reserved characters are percent-escaped automatically. The host type is host_type::name. |

Port

The port is a string of digits, possibly of zero length. The presence of a port is indicated by a colon prefix (":") appearing after the host and userinfo. A zero length port string is distinct from the absence of a port. The library represents the port with both a decimal string and an unsigned 16-bit integer. If the numeric value of the string would exceed the range of the integer, then it is mapped to the number zero.
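The mapping from the decimal string to the 16-bit integer, including the out-of-range rule, can be written in a few lines. This is a standalone sketch; port_to_number is a hypothetical name, and the input is assumed to contain only decimal digits, as the port grammar requires.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical helper: convert a port string to a 16-bit number.
// Assumption: digits contains only the characters '0' to '9'.
// An empty string and any value above 65535 both yield zero.
inline std::uint16_t port_to_number(std::string_view digits)
{
    std::uint32_t v = 0;
    for (char c : digits)
    {
        v = v * 10 + static_cast<std::uint32_t>(c - '0');
        // Stop as soon as the value leaves the 16-bit range, which
        // also keeps the 32-bit accumulator from overflowing
        if (v > 65535)
            return 0;
    }
    return static_cast<std::uint16_t>(v);
}
```

Because zero is returned both for an empty port and for an out-of-range one, the numeric value alone cannot tell these cases apart; the string form and the presence flag carry that information.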

These members are used to inspect and modify the port:

| Function | Return Type | Description |
|---|---|---|
| has_port | bool | Return true if a port is present. |
| port | string_view | Return the port as a string, or an empty string if there is no port. |
| port_number | std::uint16_t | Return the port as an unsigned integer. If the number would be greater than 65535, then zero is returned. |

| Function | Parameters | Description |
|---|---|---|
| set_port | string_view | Set the port to a string. If the string contains any character which is not a digit, an exception is thrown. |
| set_port_number | std::uint16_t | Set the port to a number. |
| remove_port | | Remove the port if present. This does not remove the authority. |

Path

Depending on the scheme, the path may be treated as a string, or as a hierarchically structured sequence of segments delimited by unescaped forward-slashes ("/"). A path is always defined for every URL, even if it is the empty string.
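The hierarchical reading of a path can be illustrated by splitting an encoded path at its unescaped slashes. This standalone sketch uses the hypothetical name split_segments and makes one simplifying assumption: a single leading slash only marks the path as absolute and does not start a segment. It is not a description of how the library's segment containers are implemented.

```cpp
#include <string_view>
#include <vector>

// Hypothetical helper: split an encoded path into its segments.
// Assumption: one leading '/' marks an absolute path and is not
// itself part of any segment.
inline std::vector<std::string_view> split_segments(std::string_view path)
{
    std::vector<std::string_view> out;
    if (!path.empty() && path.front() == '/')
        path.remove_prefix(1);
    // An empty path, or a path that is only "/", has no segments
    if (path.empty())
        return out;
    for (;;)
    {
        auto slash = path.find('/');
        if (slash == std::string_view::npos)
        {
            out.push_back(path); // last segment
            return out;
        }
        out.push_back(path.substr(0, slash));
        path.remove_prefix(slash + 1);
    }
}
```

Because the split happens on the encoded form, a slash written as "%2F" stays inside its segment, and a trailing slash produces a final empty segment.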

These members are used to inspect and modify the path:

| Function | Return Type | Description |
|---|---|---|
| is_path_absolute | bool | Return true if the path starts with a forward slash ("/"). |
| path | std::string | Return the path as a decoded string. |
| encoded_path | pct_string_view | Return the path as a string with percent escapes. |
| segments | segments_view | Return the path as a range of decoded segments. |
| encoded_segments | segments_encoded_view | Return the path as a range of segments with percent escapes. |

| Function | Parameters | Description |
|---|---|---|
| set_path | string_view | Set the path to the string. Reserved characters are percent-escaped automatically. |
| set_path_absolute | bool | Set whether the path is absolute. |
| set_encoded_path | pct_string_view | Set the path to the string, which may contain percent escapes. Reserved characters are percent-escaped automatically. |
| segments | | Return the path as a modifiable range of decoded segments (segments_ref). |
| encoded_segments | | Return the path as a modifiable range of segments with percent escapes (segments_encoded_ref). |

The segments-based containers segments_view, segments_ref, segments_encoded_view, and segments_encoded_ref are discussed in a later section.

Query

Depending on the scheme, the query may be treated as a string, or as a structured series of key-value pairs (called "params") separated by unescaped ampersands ("&"). The query is optional; an empty query string is distinct from no query.
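The structured reading of a query can be sketched by splitting the encoded query string at ampersands and then at the first equals sign of each item. The names param and split_params are hypothetical and used only for this illustration. The sketch assumes a query is present; the caller is responsible for the separate case of a URL with no query.

```cpp
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical key-value pair; the value is optional so that
// "key" (no '=') and "key=" (empty value) stay distinguishable
struct param
{
    std::string_view key;
    std::optional<std::string_view> value;
};

// Hypothetical helper: split an encoded query string into params.
// Assumption: the query is present, possibly empty.
inline std::vector<param> split_params(std::string_view query)
{
    std::vector<param> out;
    for (;;)
    {
        auto amp = query.find('&');
        // substr clamps the count, so npos selects the whole remainder
        std::string_view item = query.substr(0, amp);
        param p;
        auto eq = item.find('=');
        if (eq == std::string_view::npos)
        {
            p.key = item; // no '=': the value is absent
        }
        else
        {
            p.key = item.substr(0, eq);
            p.value = item.substr(eq + 1); // present, possibly empty
        }
        out.push_back(p);
        if (amp == std::string_view::npos)
            return out;
        query.remove_prefix(amp + 1);
    }
}
```

In this sketch an empty query string still produces one param with an empty key and no value, which is one way to keep "empty query" different from "no query".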

These members are used to inspect and modify the query:

| Function | Return Type | Description |
|---|---|---|
| has_query | bool | Return true if a query is present. |
| query | std::string | Return the query as a decoded string. |
| encoded_query | pct_string_view | Return the query as a string with percent escapes. |
| params | params_view | Return the query as a read-only range of decoded params. |
| encoded_params | params_encoded_view | Return the query as a read-only range of params with percent escapes. |

| Function | Parameters | Description |
|---|---|---|
| set_query | string_view | Set the query to a string. Reserved characters are percent-escaped automatically. |
| set_encoded_query | pct_string_view | Set the query to a string, which may contain percent escapes. Reserved characters are percent-escaped automatically. |
| params | | Return the query as a modifiable range of decoded params (params_ref). |
| encoded_params | | Return the query as a modifiable range of params with percent escapes (params_encoded_ref). |
| remove_query | | Remove the query. This also removes the leading question mark ("?") if present. |

The params-based containers params_view, params_ref, params_encoded_view, and params_encoded_ref are discussed in a later section.

Fragment

The fragment is treated as a string; there is no common, structured interpretation of the contents.

These members are used to inspect and modify the fragment:

| Function | Return Type | Description |
|---|---|---|
| has_fragment | bool | Return true if a fragment is present. |
| fragment | std::string | Return the fragment as a decoded string. |
| encoded_fragment | pct_string_view | Return the fragment as a string with percent escapes. |

| Function | Parameters | Description |
|---|---|---|
| set_fragment | string_view | Set the fragment to the string. Reserved characters are percent-escaped automatically. |
| set_encoded_fragment | pct_string_view | Set the fragment to the string, which may contain percent escapes. Reserved characters are percent-escaped automatically. |
| remove_fragment | | Remove the fragment. This also removes the leading pound sign ("#") if present. |

Compound Fields

For convenience, these observers and modifiers for aggregated subsets of the URL are provided:

| Function | Return Type | Description |
|---|---|---|
| encoded_host_and_port | pct_string_view | Return the host and port as a string with percent escapes. |
| encoded_origin | pct_string_view | Return only the scheme and authority parts as an individual string. |
| encoded_resource | pct_string_view | Return only the path, query, and fragment parts as an individual string. |
| encoded_target | pct_string_view | Return only the path and query parts as an individual string. |

| Function | Parameters | Description |
|---|---|---|
| remove_origin | | Remove the scheme and authority parts from the URL. |
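To show how the origin, resource, and target relate to one another, here is a standalone sketch that cuts a URL string at the corresponding boundaries. The names split_origin and strip_fragment are hypothetical. The sketch assumes the input is an absolute URL of the form scheme, "://", authority, then the rest; it does not validate the input and is not how the library computes these fields.

```cpp
#include <string_view>
#include <utility>

// Hypothetical helper: return { origin, resource } for a URL string.
// Assumption: url looks like scheme "://" authority [ rest ].
// If "://" is not found, the origin is empty and everything is resource.
inline std::pair<std::string_view, std::string_view>
split_origin(std::string_view url)
{
    auto sep = url.find("://");
    if (sep == std::string_view::npos)
        return { std::string_view{}, url };
    // The authority ends at the first '/', '?', or '#' after "://"
    auto end = url.find_first_of("/?#", sep + 3);
    if (end == std::string_view::npos)
        return { url, std::string_view{} };
    return { url.substr(0, end), url.substr(end) };
}

// Hypothetical helper: drop the fragment from a resource string,
// leaving only the path and query (the target)
inline std::string_view strip_fragment(std::string_view resource)
{
    return resource.substr(0, resource.find('#'));
}
```

For "https://www.example.com:8080/path?q=1#top" the sketch yields the origin "https://www.example.com:8080", the resource "/path?q=1#top", and, after dropping the fragment, the target "/path?q=1".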