Boost.URL is a library for manipulating Uniform Resource Identifiers (URIs) and Locators (URLs).
Boost.URL is a portable C++ library which provides containers and algorithms which model a "URL," more formally described using the Uniform Resource Identifier (URI) specification (henceforth referred to as rfc3986). A URL is a compact sequence of characters that identifies an abstract or physical resource. For example, this is a valid URL:
https://www.example.com/path/to/file.txt?userid=1001&pages=3&results=full#page1
This library understands the grammars related to URLs and provides functionality to validate, parse, examine, and modify URLs, and apply normalization or resolution algorithms.
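To make the component structure of the example URL concrete, here is a minimal sketch that splits a URL with the regular expression given in Appendix B of rfc3986. It uses only the standard library and is not how Boost.URL parses: the expression matches any input, so it separates components without validating them. The names `url_parts` and `split_url` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper type, not part of Boost.URL.
struct url_parts
{
    std::string scheme, authority, path, query, fragment;
};

// Splits a URI reference using the regular expression from
// rfc3986, Appendix B. No validation is performed.
inline url_parts split_url(std::string const& s)
{
    static std::regex const re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    std::smatch m;
    url_parts p;
    if (std::regex_match(s, m, re))
    {
        p.scheme    = m[2].str();
        p.authority = m[4].str();
        p.path      = m[5].str();
        p.query     = m[7].str();
        p.fragment  = m[9].str();
    }
    return p;
}

// Usage check with the example URL shown above.
inline void check_split_url()
{
    url_parts const p = split_url(
        "https://www.example.com/path/to/file.txt"
        "?userid=1001&pages=3&results=full#page1");
    assert(p.scheme == "https");
    assert(p.authority == "www.example.com");
    assert(p.path == "/path/to/file.txt");
    assert(p.query == "userid=1001&pages=3&results=full");
    assert(p.fragment == "page1");
}
```

A validating parser has to check each component against its grammar rule as well, which this sketch leaves out.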
While the library is general purpose, special care has been taken to ensure that the implementation and data representation are friendly to network programs which need to handle URLs efficiently and securely, including the case where the inputs come from untrusted sources. Interfaces are provided for using error codes instead of exceptions as needed, and most algorithms have the means to opt out of dynamic memory allocation. Another feature of the library is that all modifications leave the URL in a valid state. Code which uses this library is easy to read, flexible, and performant.
Network programs such as those using Boost.Asio or Boost.Beast often encounter the need to process, generate, or modify URLs. This library provides a much-needed modular component for handling these use cases.
Boost.URL offers these features:
C++11 as only requirement
Fast compilation, few templates
Strict compliance with rfc3986
Containers that maintain valid URLs
Parsing algorithms that work without exceptions
Control over storage and allocation for URLs
Support for -fno-exceptions, detected automatically
Features that work well on embedded devices
Note | Currently, the library does not handle Internationalized Resource Identifiers (IRIs). These are different from URLs, come from Unicode strings instead of low-ASCII strings, and are covered by a separate specification. |
The library requires a compiler supporting at least C++11.
Aliases for standard types, such as error_code or string_view, use their Boost equivalents.
Boost.URL works great on embedded devices. It can be used in a way that avoids all dynamic memory allocations. Furthermore, it offers alternative interfaces that work without exceptions if desired.
Boost.URL has been tested with the following compilers:
clang: 3.8, 4, 5, 6, 7, 8, 9, 10, 11, 12
gcc: 4.8, 4.9, 5, 6, 7, 8, 9, 10, 11
msvc: 14.1, 14.2, 14.3
and these architectures: x86, x64, ARM64, S390x.
We do not test or support gcc 8.0.1.
The development infrastructure for the library includes these per-commit analyses:
Coverage reports
Compilation and tests on Drone.io and GitHub Actions
Regular code audits for security
Various names have been used historically to refer to different flavors of resource identifiers, including URI, URL, URN, and even IRI. Over time, the distinction between URIs and URLs has disappeared when discussed in technical documents and informal works. In this library we use the term URL to refer to all strings which are valid according to the top-level grammar rules found in rfc3986.
This documentation uses the Augmented Backus-Naur Form (ABNF) notation of rfc5234 to specify particular grammars used by algorithms and containers. While a complete understanding of the notation is not a requirement for using the library, it may help for an understanding of how valid components of URLs are defined. In particular, this is of interest to users who wish to compose parsing algorithms using the combinators provided by the library.
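As an illustration of how an ABNF rule reads, rfc3986 defines the scheme component as `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`: one letter followed by zero or more letters, digits, or the three listed punctuation characters. The sketch below hand-codes that single rule with the standard library only. It does not use the library's combinators, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ALPHA and DIGIT as defined in rfc5234 (ASCII only).
inline bool is_alpha(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

inline bool is_digit(char c)
{
    return c >= '0' && c <= '9';
}

// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
inline bool is_valid_scheme(std::string const& s)
{
    if (s.empty() || !is_alpha(s[0]))
        return false;               // the leading ALPHA is mandatory
    for (std::size_t i = 1; i < s.size(); ++i)
    {
        char const c = s[i];
        if (!is_alpha(c) && !is_digit(c) &&
            c != '+' && c != '-' && c != '.')
            return false;           // not one of the *( ... ) alternatives
    }
    return true;
}

// Usage check for the rule above.
inline void check_scheme_rule()
{
    assert(is_valid_scheme("https"));
    assert(is_valid_scheme("svn+ssh"));
    assert(!is_valid_scheme("1http"));  // must start with ALPHA
    assert(!is_valid_scheme(""));
}
```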
Note | Sample code and identifiers used throughout are written as if the following declarations are in effect: |
#include <boost/url.hpp>
using namespace boost::urls;
We begin by including the library header file which brings all the symbols into scope.
#include <boost/url.hpp>
Alternatively, individual headers may be included to obtain the declarations for specific types.
Boost.URL is a compiled library. You need to install binaries in a location that can be found by your linker and link your program with the Boost.URL built library. If you followed the Boost Getting Started instructions (http://www.boost.org/doc/libs/release/more/getting_started/index.html), that's already been done for you.
For example, if you are using CMake, you can use the following commands to find and link the library:
find_package(Boost REQUIRED COMPONENTS url)
target_link_libraries(my_program PRIVATE Boost::url)
Say you have the following URL that you want to parse:
boost::core::string_view s = "https://user:pass@example.com:443/path/to/my%2dfile.txt?id=42&name=John%20Doe+Jingleheimer%2DSchmidt#page%20anchor";
In this example, string_view is an alias to boost::core::string_view, a string_view implementation that is implicitly convertible from and to std::string_view.
You can parse the string by calling this function:
boost::system::result<url_view> r = parse_uri( s );
The function parse_uri returns an object of type result<url_view>, which is a container resembling a variant that holds either an error or an object. A number of functions are available to parse different types of URL.
We can immediately call result::value to obtain a url_view.
url_view u = r.value();
Or simply
url_view u = *r;
for unchecked access.
When there are no errors, result::value returns an instance of url_view, which holds the parsed result.
result::value throws an exception on a parsing error. Alternatively, the functions result::has_value and result::has_error can be used to check whether the string has been parsed without errors.
Note | It is worth noting that parse_uri does not allocate any memory dynamically. Like a string_view, a url_view does not own the underlying character buffer. As long as the contents of the original string are unmodified, constructed URL views always contain a valid URL in its correctly serialized form. If the input does not match the URL grammar, an error code is reported through result rather than exceptions. Exceptions are thrown only on excessive input length. |
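The difference between checked access, unchecked access, and explicit error checks can be sketched with a simplified stand-in for a result type. This is a standard-library illustration of the "value or error" idea only. It is not boost::system::result, and `simple_result` and `parse_port` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Hypothetical, simplified stand-in for a result type.
template<class T>
class simple_result
{
    T value_{};
    std::error_code ec_;

public:
    simple_result(T v) : value_(v) {}
    simple_result(std::error_code ec) : ec_(ec) {}

    bool has_value() const noexcept { return !ec_; }
    bool has_error() const noexcept { return static_cast<bool>(ec_); }
    std::error_code error() const noexcept { return ec_; }

    // Checked access: throws when an error is stored.
    T const& value() const
    {
        if (ec_)
            throw std::system_error(ec_);
        return value_;
    }

    // Unchecked access: the caller must know that a value is stored.
    T const& operator*() const noexcept { return value_; }
};

// Parses a decimal port number; failures are reported as error codes.
inline simple_result<unsigned> parse_port(std::string const& s)
{
    if (s.empty())
        return std::make_error_code(std::errc::invalid_argument);
    unsigned n = 0;
    for (char c : s)
    {
        if (c < '0' || c > '9')
            return std::make_error_code(std::errc::invalid_argument);
        n = n * 10 + static_cast<unsigned>(c - '0');
        if (n > 65535)
            return std::make_error_code(std::errc::result_out_of_range);
    }
    return n;
}

// Usage check: test for errors first, or rely on value() to throw.
inline void check_parse_port()
{
    simple_result<unsigned> const ok = parse_port("443");
    assert(ok.has_value() && *ok == 443u);

    simple_result<unsigned> const bad = parse_port("44x");
    assert(bad.has_error());
    assert(bad.error() == std::errc::invalid_argument);
}
```

Code that must not throw checks has_value or has_error before dereferencing; code that prefers exceptions calls value() directly.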
Accessing the parts of the URL is easy:
url_view u( "https://user:pass@example.com:443/path/to/my%2dfile.txt?id=42&name=John%20Doe+Jingleheimer%2DSchmidt#page%20anchor" );
assert(u.scheme() == "https");
assert(u.authority().buffer() == "user:pass@example.com:443");
assert(u.userinfo() == "user:pass");
assert(u.user() == "user");
assert(u.password() == "pass");
assert(u.host() == "example.com");
assert(u.port() == "443");
assert(u.path() == "/path/to/my-file.txt");
assert(u.query() == "id=42&name=John Doe+Jingleheimer-Schmidt");
assert(u.fragment() == "page anchor");
URL paths can be further divided into path segments with the function url_view::segments.
Although URL query strings are often used to represent key/value pairs, this interpretation is not defined by rfc3986. Users can treat the query as a single entity. url_view provides the function url_view::params to extract this view of key/value pairs.
for (auto seg: u.segments())
    std::cout << seg << "\n";
std::cout << "\n";
for (auto param: u.params())
    std::cout << param.key << ": " << param.value << "\n";
std::cout << "\n";
The output is:
path
to
my-file.txt

id: 42
name: John Doe Jingleheimer-Schmidt
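For readers unfamiliar with the key/value convention, the following standard-library sketch shows roughly what such a split involves: the query is cut at each "&", each item is cut at its first "=", and both halves are percent-decoded with "+" read as a space. It is an illustration under those assumptions, not the library's implementation, and unlike the library's views it copies every string. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Value of a hexadecimal digit, or -1 if c is not one.
inline int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes %XX escapes; '+' becomes a space when plus_is_space is set.
// Malformed escapes are copied through unchanged.
inline std::string pct_decode(std::string const& s, bool plus_is_space)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (s[i] == '%' && i + 2 < s.size() &&
            hex_value(s[i + 1]) >= 0 && hex_value(s[i + 2]) >= 0)
        {
            out += static_cast<char>(
                hex_value(s[i + 1]) * 16 + hex_value(s[i + 2]));
            i += 2;
        }
        else if (s[i] == '+' && plus_is_space)
            out += ' ';
        else
            out += s[i];
    }
    return out;
}

using param = std::pair<std::string, std::string>;

// Splits "k1=v1&k2=v2" into decoded key/value pairs.
inline std::vector<param> split_query(std::string const& query)
{
    std::vector<param> result;
    std::size_t pos = 0;
    while (pos <= query.size())
    {
        std::size_t amp = query.find('&', pos);
        if (amp == std::string::npos)
            amp = query.size();
        std::string const item = query.substr(pos, amp - pos);
        std::size_t const eq = item.find('=');
        std::string const key = item.substr(0, eq);
        std::string const value =
            eq == std::string::npos ? std::string() : item.substr(eq + 1);
        result.emplace_back(pct_decode(key, true), pct_decode(value, true));
        pos = amp + 1;
    }
    return result;
}

// Usage check with the query of the example URL.
inline void check_split_query()
{
    std::vector<param> const ps =
        split_query("id=42&name=John%20Doe+Jingleheimer%2DSchmidt");
    assert(ps.size() == 2);
    assert(ps[0].first == "id" && ps[0].second == "42");
    assert(ps[1].second == "John Doe Jingleheimer-Schmidt");
}
```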
These functions return views referring to substrings and sub-ranges of the underlying URL. By simply referencing the relevant portion of the URL string internally, its components can represent percent-decoded strings and be converted to other types without any previous memory allocation.
std::string h = u.host();
assert(h == "example.com");
A special string_token type can also be used to specify how a portion of the URL should be decoded and returned.
std::string h = "host: ";
u.host(string_token::append_to(h));
assert(h == "host: example.com");
These functions might also return empty strings
url_view u1 = parse_uri( "http://www.example.com" ).value();
assert(u1.fragment().empty());
assert(!u1.has_fragment());
for both empty and absent components
url_view u2 = parse_uri( "http://www.example.com/#" ).value();
assert(u2.fragment().empty());
assert(u2.has_fragment());
Not every component has a corresponding function, such as has_authority, to check for its existence. This is because some URL components are mandatory and therefore always present.
When applicable, the encoded components can also be directly accessed through a string_view without any need to allocate memory:
std::cout <<
    "url : " << u << "\n"
    "scheme : " << u.scheme() << "\n"
    "authority : " << u.encoded_authority() << "\n"
    "userinfo : " << u.encoded_userinfo() << "\n"
    "user : " << u.encoded_user() << "\n"
    "password : " << u.encoded_password() << "\n"
    "host : " << u.encoded_host() << "\n"
    "port : " << u.port() << "\n"
    "path : " << u.encoded_path() << "\n"
    "query : " << u.encoded_query() << "\n"
    "fragment : " << u.encoded_fragment() << "\n";
The output is:
url : https://user:pass@example.com:443/path/to/my%2dfile.txt?id=42&name=John%20Doe+Jingleheimer%2DSchmidt#page%20anchor
scheme : https
authority : user:pass@example.com:443
userinfo : user:pass
user : user
password : pass
host : example.com
port : 443
path : /path/to/my%2dfile.txt
query : id=42&name=John%20Doe+Jingleheimer%2DSchmidt
fragment : page%20anchor
An instance of decode_view provides a number of functions to persist a decoded string:
decode_view dv("id=42&name=John%20Doe%20Jingleheimer%2DSchmidt");
std::cout << dv << "\n";
The output is:
id=42&name=John Doe Jingleheimer-Schmidt
decode_view and its decoding functions are designed to perform no memory allocations unless the algorithm where it is being used needs the result to be in another container. The design also permits recycling objects to reuse their memory, and at least minimizes the number of allocations by deferring them until the result is in fact needed by the application.
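The idea of decoding on the fly, without ever materializing the decoded string, can be sketched with a comparison function that decodes one character at a time. This is a standard-library illustration that assumes C++17 for std::string_view. It is not how decode_view is implemented, and `decoded_equals` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Value of a hexadecimal digit, or -1 if c is not one.
inline int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Compares a percent-encoded string with a plain string by decoding
// one character at a time; no decoded copy is ever built.
inline bool decoded_equals(std::string_view encoded, std::string_view plain)
{
    std::size_t j = 0;
    for (std::size_t i = 0; i < encoded.size(); ++i, ++j)
    {
        char c = encoded[i];
        if (c == '%' && i + 2 < encoded.size() &&
            hex_digit(encoded[i + 1]) >= 0 &&
            hex_digit(encoded[i + 2]) >= 0)
        {
            c = static_cast<char>(
                hex_digit(encoded[i + 1]) * 16 + hex_digit(encoded[i + 2]));
            i += 2;                 // skip the two hex digits
        }
        if (j >= plain.size() || plain[j] != c)
            return false;
    }
    return j == plain.size();
}

// Usage check: the encoded and plain spellings compare equal.
inline void check_decoded_equals()
{
    assert(decoded_equals("my%2dfile.txt", "my-file.txt"));
    assert(decoded_equals("page%20anchor", "page anchor"));
    assert(!decoded_equals("my%2dfile.txt", "my%2dfile.txt"));
}
```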
In the example above, the memory owned by str can be reused to store other results. This is also useful when manipulating URLs:
u1.set_host(u2.host());
If u2.host() returned a value type, then two memory allocations would be necessary for this operation. Another common use case is converting URL path segments into filesystem paths:
boost::filesystem::path p;
for (auto seg: u.segments())
    p.append(seg.begin(), seg.end());
std::cout << "path: " << p << "\n";
The output is:
path: "path/to/my-file.txt"
In this example, only the internal allocations of filesystem::path need to happen. In many common use cases, no allocations are necessary at all, such as finding the appropriate route for a URL in a web server:
auto match = [](
    std::vector<std::string> const& route,
    url_view u)
{
    auto segs = u.segments();
    if (route.size() != segs.size())
        return false;
    return std::equal(
        route.begin(),
        route.end(),
        segs.begin());
};
This allows us to easily match files in the document root directory of a web server:
std::vector<std::string> route = {"community", "reviews.html"};
if (match(route, u))
{
    handle_route(route, u);
}
The path and query parts of the URL are treated specially by the library. While they can be accessed as individual encoded strings, they can also be accessed through special view types.
This code calls encoded_segments to obtain the path segments as a container that returns encoded strings:
segments_encoded_view segs = u.encoded_segments();
for( auto v : segs )
{
    std::cout << v << "\n";
}
The output is:
path
to
my%2dfile.txt
As with other url_view functions which return encoded strings, the encoded segments container does not allocate memory. Instead, it returns views to the corresponding portions of the underlying encoded buffer referenced by the URL.
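What "views into the underlying buffer" means can be shown with a small standard-library sketch, assuming C++17 for std::string_view. Each segment handed to the callback points into the caller's buffer, so no characters are copied and nothing is allocated. The function `for_each_segment` is hypothetical and is not part of Boost.URL.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Calls f once per path segment. Each argument is a std::string_view
// into the caller's buffer, so no characters are copied.
template<class F>
void for_each_segment(std::string_view path, F f)
{
    if (!path.empty() && path.front() == '/')
        path.remove_prefix(1);          // skip the leading slash
    if (path.empty())
        return;                         // no segments
    for (;;)
    {
        std::size_t const slash = path.find('/');
        f(path.substr(0, slash));       // substr clamps npos to the end
        if (slash == std::string_view::npos)
            break;
        path.remove_prefix(slash + 1);
    }
}

// Usage check: every segment lies inside the original buffer.
inline void check_segments()
{
    std::string_view const path = "/path/to/my%2dfile.txt";
    int count = 0;
    for_each_segment(path, [&](std::string_view seg)
    {
        assert(seg.data() >= path.data());
        assert(seg.data() + seg.size() <= path.data() + path.size());
        ++count;
    });
    assert(count == 3);
}
```

Because the views borrow from the buffer, they are valid only while the original string is alive and unmodified.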
As with other library functions, decode_view permits accessing the elements of these containers while avoiding memory allocations entirely:
segments_encoded_view segs = u.encoded_segments();
for( pct_string_view v : segs )
{
    decode_view dv = *v;
    std::cout << dv << "\n";
}
The output is:
path
to
my-file.txt
Or with the encoded query parameters:
params_encoded_view params_ref = u.encoded_params();
for( auto v : params_ref )
{
    decode_view dk(v.key);
    decode_view dv(v.value);
    std::cout << "key = " << dk << ", value = " << dv << "\n";
}
The output is:
key = id, value = 42
key = name, value = John Doe+Jingleheimer-Schmidt
The library provides the containers url and static_url, which support modification of the URL contents. A url or static_url can be constructed from an existing url_view.
Unlike the url_view, which does not gain ownership of the underlying character buffer, the url container uses the default allocator to control a resizable character buffer which it owns.
url u = parse_uri( s ).value();
On the other hand, a static_url has fixed-capacity storage and does not require dynamic memory allocations.
static_url<1024> su = parse_uri( s ).value();
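The trade-off behind fixed-capacity storage can be sketched with a small buffer whose characters live inside the object itself. This standard-library illustration is not static_url: the class `fixed_buffer` is hypothetical, and reporting overflow with std::length_error is an assumption of the sketch, since how static_url reports overflow is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical fixed-capacity character buffer. Storage lives inside
// the object, so appending never touches the heap; exceeding the
// capacity is reported with an exception instead of growing.
template<std::size_t Capacity>
class fixed_buffer
{
    char data_[Capacity + 1];   // +1 for the terminating null
    std::size_t size_ = 0;

public:
    fixed_buffer() { data_[0] = '\0'; }

    void append(char const* s)
    {
        std::size_t const n = std::strlen(s);
        if (n > Capacity - size_)
            throw std::length_error("fixed_buffer capacity exceeded");
        std::memcpy(data_ + size_, s, n);
        size_ += n;
        data_[size_] = '\0';
    }

    char const* c_str() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }
    static constexpr std::size_t capacity() noexcept { return Capacity; }
};

// Usage check: a failed append leaves the contents unchanged.
inline void check_fixed_buffer()
{
    fixed_buffer<16> b;
    b.append("https://");
    b.append("a.com");
    assert(std::strcmp(b.c_str(), "https://a.com") == 0);

    bool threw = false;
    try { b.append("/too/long"); }
    catch (std::length_error const&) { threw = true; }
    assert(threw && b.size() == 13);
}
```

The capacity has to be chosen up front, which suits targets such as embedded devices where heap use is restricted.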
Objects of type url are std::regular. Similarly to built-in types, such as int, a url is copyable, movable, assignable, default constructible, and equality comparable. They support all the inspection functions of url_view, and also provide functions to modify all components of the URL.
Changing the scheme is easy:
u.set_scheme( "https" );
Or we can use a predefined constant:
u.set_scheme_id( scheme::https ); // equivalent to u.set_scheme( "https" );
The scheme must be valid, however, or an exception is thrown. All modifying functions perform validation on their input.
Attempting to set the URL scheme or port to an invalid string results in an exception.
Attempting to set other URL components to invalid strings will get the original input properly percent-encoded for that component.
It is not possible for a url to hold syntactically illegal text.
Modification functions return a reference to the object, so chaining is possible:
u.set_host_ipv4( ipv4_address( "192.168.0.1" ) )
    .set_port_number( 8080 )
    .remove_userinfo();
std::cout << u << "\n";
The output is:
https://192.168.0.1:8080/path/to/my%2dfile.txt?id=42&name=John%20Doe+Jingleheimer%2DSchmidt#page%20anchor
All non-const operations offer the strong exception safety guarantee.
The path segment and query parameter containers returned by a url offer modifiable range functionality, using member functions of the container:
params_ref p = u.params();
p.replace(p.find("name"), {"name", "John Doe"});
std::cout << u << "\n";
The output is:
https://192.168.0.1:8080/path/to/my%2dfile.txt?id=42&name=John%20Doe#page%20anchor
Algorithms to format URLs construct a mutable URL by parsing and applying arguments to a URL template. The following example uses the format function to construct an absolute URL:
url u = format("{}://{}:{}/rfc/{}", "https", "www.ietf.org", 80, "rfc2396.txt");
assert(u.buffer() == "https://www.ietf.org:80/rfc/rfc2396.txt");
The rules for a format URL string are the same as for a std::format_string, where replacement fields are delimited by curly braces. The URL type is inferred from the format string.
The URL components to which replacement fields belong are identified before replacement is applied and any invalid characters for that formatted argument are percent-escaped:
url u = format("https://{}/{}", "www.boost.org", "Hello world!");
assert(u.buffer() == "https://www.boost.org/Hello%20world!");
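Why the space is escaped while the exclamation mark is kept follows from the rfc3986 grammar for path segments, which allows unreserved characters, sub-delims, ":" and "@" to appear unescaped. The sketch below applies that character set with the standard library only. It is an illustration rather than the library's encoder, it treats every "%" in the input as literal data, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// True for characters that rfc3986 allows unescaped in a path segment:
// unreserved / sub-delims / ":" / "@".
inline bool is_segment_char(char c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9'))
        return true;
    // strchr would also match the terminating null, so exclude it.
    return c != '\0' && std::strchr("-._~!$&'()*+,;=:@", c) != nullptr;
}

// Percent-encodes every character that may not appear unescaped
// in a path segment.
inline std::string encode_segment(std::string const& s)
{
    static char const hex[] = "0123456789ABCDEF";
    std::string out;
    for (char c : s)
    {
        if (is_segment_char(c))
        {
            out += c;
        }
        else
        {
            unsigned char const u = static_cast<unsigned char>(c);
            out += '%';
            out += hex[u >> 4];
            out += hex[u & 0x0F];
        }
    }
    return out;
}

// Usage check mirroring the example above.
inline void check_encode_segment()
{
    assert(encode_segment("Hello world!") == "Hello%20world!");
    assert(encode_segment("a/b") == "a%2Fb");   // '/' would split the segment
}
```

Other components use different sets of allowed characters, so the same input can be escaped differently depending on where its replacement field sits.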
Delimiters in the URL template, such as ":", "//", "?", and "#", unambiguously associate each replacement field to a URL component. All other characters are normalized to ensure the URL is valid:
url u = format("{}:{}", "mailto", "someone@example.com");
assert(u.buffer() == "mailto:someone@example.com");
assert(u.scheme() == "mailto");
assert(u.path() == "someone@example.com");
url u = format("{}{}", "mailto:", "someone@example.com");
assert(u.buffer() == "mailto%3Asomeone@example.com");
assert(!u.has_scheme());
assert(u.path() == "mailto:someone@example.com");
assert(u.encoded_path() == "mailto%3Asomeone@example.com");
The function format_to can be used to format URLs into any modifiable URL container.
static_url<50> u;
format_to(u, "{}://{}:{}/rfc/{}", "https", "www.ietf.org", 80, "rfc2396.txt");
assert(u.buffer() == "https://www.ietf.org:80/rfc/rfc2396.txt");
Positional arguments are supported, as with std::format, and so are named arguments.
url u = format("{0}://{2}:{1}/{3}{4}{3}", "https", 80, "www.ietf.org", "abra", "cad");
assert(u.buffer() == "https://www.ietf.org:80/abracadabra");
The arg function can be used to associate names with arguments:
url u = format("https://example.com/~{username}", arg("username", "mark"));
assert(u.buffer() == "https://example.com/~mark");
A second overload based on std::initializer_list is provided for both format and format_to.
These overloads can help with lists of named arguments:
boost::core::string_view fmt = "{scheme}://{host}:{port}/{dir}/{file}";
url u = format(fmt, {{"scheme", "https"}, {"port", 80}, {"host", "example.com"}, {"dir", "path/to"}, {"file", "file.txt"}});
assert(u.buffer() == "https://example.com:80/path/to/file.txt");
The complete library documentation is available online at boost.org.
This library wouldn't be where it is today without the help of Peter Dimov, for design advice and general assistance.
Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at https://www.boost.org/LICENSE_1_0.txt)