The official "URL syntax" is primarily defined in two different specifications: RFC 3986 and the WHATWG URL spec.
RFC 3986 is the earlier one, and curl has always tried to adhere to that one (since it shipped in January 2005).
The WHATWG URL spec was written later, is incompatible with RFC 3986 and changes over time.
URL parsers as implemented in browsers, libraries and tools usually opt to support one of the mentioned specifications. Bugs, differences in interpretation and the moving nature of the WHATWG spec do however make it unlikely that multiple parsers treat URLs the same way.
Due to the inherent differences between URL parser implementations, it is considered a security risk to mix different implementations and assume the same behavior.
For example, if you use one parser to check if a URL uses a good hostname or the correct auth field, and then pass on that same URL to a second parser, there is always a risk it treats the same URL differently. There is no right and wrong in URL land, only differences of opinion.
libcurl offers a separate API to its URL parser for this reason, among others.
Applications may at times find it convenient to allow users to specify URLs for various purposes and that string would then end up fed to curl. Getting a URL from an external untrusted party and using it with curl brings several security concerns:
If you have an application that runs as or in a server application, getting an unfiltered URL can trick your application into accessing a local resource instead of a remote resource. Protecting yourself against localhost accesses is hard when accepting user-provided URLs.
Such custom URLs can access other ports than you planned as port numbers are part of the regular URL format. The combination of a local host and a custom port number can allow external users to play tricks with your local services.
Such a URL might use other schemes than you thought of or planned for.
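The three concerns above (local hosts, unexpected ports, unexpected schemes) can be illustrated with a minimal allowlist sketch. It uses Python's urllib.parse, which is not curl's parser, so a check like this is only a first filter and not a guarantee of how curl treats the same string. The policy sets ALLOWED_SCHEMES and ALLOWED_PORTS and the function name is_acceptable are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # hypothetical policy for this sketch
ALLOWED_PORTS = {80, 443}            # hypothetical policy for this sketch


def is_acceptable(url: str) -> bool:
    """Return True only if the URL passes this sketch's allowlist policy."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError for non-numeric or out-of-range ports
    except ValueError:
        return False

    # Concern: schemes other than the ones planned for.
    if parts.scheme not in ALLOWED_SCHEMES:
        return False

    host = parts.hostname
    if not host:
        return False

    # Concern: ports other than the ones planned for.
    if port is None:
        port = 443 if parts.scheme == "https" else 80
    if port not in ALLOWED_PORTS:
        return False

    # Concern: local resources instead of remote ones.
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal as far as this parser is concerned. A DNS name
        # may still resolve to a local address, which this sketch ignores.
        return True
    return not (addr.is_loopback or addr.is_private or addr.is_link_local)
```

Hostnames written in unusual numeric forms are not recognized as IP addresses by the ipaddress module, while another parser may treat them as IPv4 addresses. That is one reason a check made with one parser cannot be fully trusted for a URL later handled by another.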
curl recognizes a URL syntax that we call "RFC 3986 plus". It is grounded on the well established RFC 3986 to make sure previously written command lines and scripts using curl remain working.
curl's URL parser allows a few deviations from the spec in order to inter-operate better with URLs that appear in the wild.
A URL provided to curl cannot contain spaces. They need to be provided URL encoded to be accepted in a URL by curl.
An exception to this rule: Location: response headers that indicate to a client where a resource has been redirected to, sometimes contain spaces. This is a violation of RFC 3986 but is fine in the WHATWG spec. curl handles these by re-encoding them to %20.
Byte values in a provided URL that are outside of the printable ASCII range are percent-encoded by curl.
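As a rough illustration of the two encoding rules above, the following sketch percent-encodes spaces and every byte outside the printable ASCII range. It is not curl's implementation. The function name is hypothetical, the input is assumed to be UTF-8 text, and the lowercase hexadecimal digits are an arbitrary choice of this sketch.

```python
def percent_encode_nonprintable(url: str) -> str:
    """Percent-encode spaces and bytes outside printable ASCII (sketch)."""
    out = []
    for byte in url.encode("utf-8"):  # assumption: the input is UTF-8 text
        if 0x21 <= byte <= 0x7E:
            # Printable, non-space ASCII is passed through untouched,
            # including '%' so existing escapes are not double-encoded.
            out.append(chr(byte))
        else:
            out.append(f"%{byte:02x}")
    return "".join(out)
```

For example, a redirect target containing a space would come out with %20 in its place, and a non-ASCII letter would become one escape per UTF-8 byte.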
An absolute URL always starts with a "scheme" followed by a colon. For all the schemes curl supports, the colon must be followed by two slashes according to RFC 3986 but not according to the WHATWG spec, which allows any number of slashes from one upwards.
curl allows one, two or three slashes after the colon to still be considered a valid URL.
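The slash rule can be expressed as a small check. This is only a sketch of the rule as stated here, not of curl's parser. The function names and the regular expression are assumptions made for illustration.

```python
import re

# A scheme (letter, then letters, digits, '+', '.', '-'), a colon and the
# run of slashes that follows it.
_SCHEME_AND_SLASHES = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):(/*)")


def slash_count(url: str):
    """Return the number of slashes after 'scheme:', or None without a scheme."""
    match = _SCHEME_AND_SLASHES.match(url)
    return len(match.group(2)) if match else None


def slashes_accepted(url: str) -> bool:
    """True when one, two or three slashes follow the scheme's colon."""
    count = slash_count(url)
    return count is not None and 1 <= count <= 3
```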
curl supports "URLs" that do not start with a scheme. This is not supported by any of the specifications. This is a shortcut to entering URLs that was supported by browsers early on and has been mimicked by curl.
Based on what the hostname starts with, curl "guesses" what protocol to use:
ftp. means FTP
dict. means DICT
ldap. means LDAP
imap. means IMAP
smtp. means SMTP
pop3. means POP3
The curl command line tool supports "globbing" of URLs. It means that you can create ranges and lists using [N-M] and {one,two,three} sequences. The letters used for this ([]{}) are reserved in RFC 3986 and can therefore not legitimately be part of such a URL.
They are however not reserved or special in the WHATWG specification, so globbing can mess up such URLs. Globbing can be turned off for such occasions (using --globoff).
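To make the globbing idea concrete, here is a simplified sketch that expands plain numeric [N-M] ranges and {one,two,three} lists into individual URLs. It is not the curl tool's globbing code, it handles only these two basic forms, and the order of the results is simply what itertools.product yields. The function name expand_globs is hypothetical.

```python
import itertools
import re

# Matches either a numeric range like [1-3] or a list like {one,two,three}.
_GLOB_TOKEN = re.compile(r"\[(\d+)-(\d+)\]|\{([^{}]*)\}")


def expand_globs(pattern: str) -> list:
    """Expand simple [N-M] ranges and {a,b,c} lists into a list of URLs."""
    pieces = []  # each entry is a list of alternatives for one segment
    pos = 0
    for match in _GLOB_TOKEN.finditer(pattern):
        pieces.append([pattern[pos:match.start()]])  # literal text before the glob
        if match.group(1) is not None:
            low, high = int(match.group(1)), int(match.group(2))
            pieces.append([str(n) for n in range(low, high + 1)])
        else:
            pieces.append(match.group(3).split(","))
        pos = match.end()
    pieces.append([pattern[pos:]])  # trailing literal text
    return ["".join(combo) for combo in itertools.product(*pieces)]
```

A URL containing a bracketed IPv6 address is left alone by this sketch because the address does not look like a numeric range, but other URLs that legitimately contain these characters would be expanded, which is the situation --globoff exists for.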
A URL may consist of the following components - many of them are optional:
[scheme][divider][userinfo][hostname][port number][path][query][fragment]
Each component is separated from the following component with a divider character or string.
For example, this could look like:
http://user:password@www.example.com:80/index.html?foo=bar#top
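The component layout can be inspected with a short sketch. It uses Python's urllib.parse rather than curl's own parser, so it only shows how this particular parser splits the example URL. The function name split_components is hypothetical.

```python
from urllib.parse import urlsplit


def split_components(url: str) -> dict:
    """Split a URL into the components listed above (Python's view of it)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
```

Applied to the example URL above, this yields the scheme http, the user and password from the userinfo field, the hostname www.example.com, port 80, the path /index.html, the query foo=bar and the fragment top.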
The scheme specifies the protocol to use. A curl build can support a few or many different schemes. You can limit what schemes curl should accept.
curl supports the following schemes on URLs specified to transfer. They are matched case insensitively:
dict, file, ftp, ftps, gopher, gophers, http, https, imap, imaps, ldap, ldaps, mqtt, pop3, pop3s, rtmp, rtmpe, rtmps, rtmpt, rtmpte, rtmpts, rtsp, smb, smbs, smtp, smtps, telnet, tftp
When the URL is specified to identify a proxy, curl recognizes the following schemes:
http, https, socks4, socks4a, socks5, socks5h, socks
The userinfo field can be used to set username and password for authentication purposes in this transfer. The use of this field is discouraged since it often means passing around the password in plain text and is thus a security risk.
URLs for IMAP, POP3 and SMTP also support login options as part of the userinfo field. They are provided as a semicolon after the password and then the options.
The hostname part of the URL contains the address of the server that you want to connect to. This can be the fully qualified domain name of the server, the local network name of the machine on your network or the IP address of the server or machine represented by either an IPv4 or IPv6 address (within brackets). For example:
http://www.example.com/
http://hostname/
http://192.168.0.1/
http://[2001:1890:1112:1::20]/
Starting in curl 7.77.0, curl uses loopback IP addresses for the name localhost: 127.0.0.1 and ::1. It does not resolve the name using the resolver functions.
This is done to make sure the host accessed is truly the localhost - the local machine.
If curl was built with International Domain Name (IDN) support, it can also handle hostnames using non-ASCII characters.
When built with libidn2, curl uses the IDNA 2008 standard. This is equivalent to the WHATWG URL spec, but differs from certain browsers that use IDNA 2003 Transitional Processing. The two standards have a huge overlap but differ slightly, perhaps most famously in how they deal with the German "double s" (ß).
When WinIDN is used, curl uses IDNA 2003 Transitional Processing, like the rest of Windows.
If there is a colon after the hostname, that should be followed by the port number to use, in the range 1 - 65535. curl also supports a blank port number field - but only if the URL starts with a scheme.
If the port number is not specified in the URL, curl uses a default port number based on the provided scheme:
DICT 2628, FTP 21, FTPS 990, GOPHER 70, GOPHERS 70, HTTP 80, HTTPS 443, IMAP 143, IMAPS 993, LDAP 389, LDAPS 636, MQTT 1883, POP3 110, POP3S 995, RTMP 1935, RTMPS 443, RTMPT 80, RTSP 554, SCP 22, SFTP 22, SMB 445, SMBS 445, SMTP 25, SMTPS 465, TELNET 23, TFTP 69
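The default port rule amounts to a table lookup. The sketch below shows that logic only, with the URL split by Python's urllib.parse rather than by curl. The table uses the standard port numbers for each scheme, and the names DEFAULT_PORTS and effective_port are hypothetical.

```python
from urllib.parse import urlsplit

# Default port per scheme, keyed by lowercase scheme name.
DEFAULT_PORTS = {
    "dict": 2628, "ftp": 21, "ftps": 990, "gopher": 70, "gophers": 70,
    "http": 80, "https": 443, "imap": 143, "imaps": 993, "ldap": 389,
    "ldaps": 636, "mqtt": 1883, "pop3": 110, "pop3s": 995, "rtmp": 1935,
    "rtmps": 443, "rtmpt": 80, "rtsp": 554, "scp": 22, "sftp": 22,
    "smb": 445, "smbs": 445, "smtp": 25, "smtps": 465, "telnet": 23,
    "tftp": 69,
}


def effective_port(url: str):
    """Return the explicit port if present, else the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    # A missing or blank port field falls back to the table; an unknown
    # scheme yields None.
    return DEFAULT_PORTS.get(parts.scheme.lower())
```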
The path part of an FTP request specifies the file to retrieve and from which directory. If the file part is omitted then libcurl downloads the directory listing for the directory specified. If the directory is omitted then the directory listing for the root / home directory is returned.
FTP servers typically put the user in their "home directory" after login, which then differs between users. To explicitly specify the root directory of an FTP server, start the path with double slash // or /%2f (2F is the hexadecimal value of the ASCII code for the slash).
When a FILE:// URL is accessed on Windows systems, it can be crafted in a way so that Windows attempts to connect to a (remote) machine when curl wants to read or write such a path.
curl only allows the hostname part of a FILE URL to be one out of these three alternatives: localhost, 127.0.0.1 or blank ("", zero characters). Anything else makes curl fail to parse the URL.
curl accepts that the FILE URL's path starts with a "drive letter". That is a single letter a to z followed by a colon or a pipe character (|).
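The hostname and drive letter rules for FILE URLs can be sketched as two small checks. This is an illustration built on Python's urllib.parse, not curl's parser. The function names are hypothetical, and accepting uppercase drive letters as well is an assumption of this sketch.

```python
import re
from urllib.parse import urlsplit

# The three hostname alternatives named above; "" stands for a blank host.
ALLOWED_FILE_HOSTS = {"", "localhost", "127.0.0.1"}

# A single letter followed by ':' or '|' right after the leading slash.
# Assumption: uppercase letters are accepted too.
_DRIVE_PREFIX = re.compile(r"^/[A-Za-z][:|]")


def file_host_allowed(url: str) -> bool:
    """True if the URL is a FILE URL with one of the accepted hostnames."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        return False
    return (parts.hostname or "") in ALLOWED_FILE_HOSTS


def has_drive_letter(url: str) -> bool:
    """True if the FILE URL's path starts with a drive letter."""
    return bool(_DRIVE_PREFIX.match(urlsplit(url).path))
```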
The Windows operating system itself converts some file accesses to perform network accesses over SMB/CIFS, through several different file path patterns. This way, a file:// URL passed to curl might be converted into a network access inadvertently and unknowingly to curl. This is a Windows feature curl cannot control or disable.
The path part of an IMAP request not only specifies the mailbox to list or select, but can also be used to check the UIDVALIDITY of the mailbox, to specify the UID, SECTION and PARTIAL octets of the message to fetch and to specify what messages to search for.
A top level folder list:
imap://user:password@mail.example.com
A folder list on the user's inbox:
imap://user:password@mail.example.com/INBOX
Select the user's inbox and fetch message with uid = 1:
imap://user:password@mail.example.com/INBOX/;UID=1
Select the user's inbox and fetch the first message in the mailbox:
imap://user:password@mail.example.com/INBOX/;MAILINDEX=1
Select the user's inbox, check the UIDVALIDITY of the mailbox is 50 and fetch message 2 if it is:
imap://user:password@mail.example.com/INBOX;UIDVALIDITY=50/;UID=2
Select the user's inbox and fetch the text portion of message 3:
imap://user:password@mail.example.com/INBOX/;UID=3/;SECTION=TEXT
Select the user's inbox and fetch the first 1024 octets of message 4:
imap://user:password@mail.example.com/INBOX/;UID=4/;PARTIAL=0.1024
Select the user's inbox and check for NEW messages:
imap://user:password@mail.example.com/INBOX?NEW
Select the user's inbox and search for messages containing "shadows" in the subject line:
imap://user:password@mail.example.com/INBOX?SUBJECT%20shadows
Searching via the query part of the URL (after the ?) is a search request for the results to be returned as message sequence numbers (MAILINDEX). It is possible to make a search request for results to be returned as unique ID numbers (UID) by using a custom curl request via -X. UID numbers are unique per session (and multiple sessions when UIDVALIDITY is the same). For example, if you are searching for "foo bar" in header+body (TEXT) and you want the matching MAILINDEX numbers returned then you could search via URL:
imap://user:password@mail.example.com/INBOX?TEXT%20%22foo%20bar%22
If you want matching UID numbers you have to use a custom request:
imap://user:password@mail.example.com/INBOX -X "UID SEARCH TEXT \"foo bar\""
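The URL-based searches above differ from the plain IMAP search criteria only by percent-encoding. A minimal sketch of assembling such a URL follows. The function name imap_search_url and its parameters are hypothetical, and the mailbox name is assumed to need no encoding.

```python
from urllib.parse import quote


def imap_search_url(authority: str, mailbox: str, criteria: str) -> str:
    """Build an IMAP search URL with the criteria percent-encoded (sketch).

    authority is the part after 'imap://', for example
    'user:password@mail.example.com'.
    """
    # safe="" makes quote() encode everything except unreserved characters,
    # so spaces become %20 and double quotes become %22.
    return f"imap://{authority}/{mailbox}?{quote(criteria, safe='')}"
```

Calling it with the criteria TEXT "foo bar" reproduces the search URL shown above.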
For more information about IMAP commands please see RFC 9051. For more information about the individual components of an IMAP URL please see RFC 5092.
Note: older curl versions would FETCH by message sequence number when UID was specified in the URL. That was a bug fixed in 7.62.0, which added MAILINDEX to FETCH by mail sequence number.
The path part of an LDAP request can be used to specify the: Distinguished Name, Attributes, Scope, Filter and Extension for an LDAP search. Each field is separated by a question mark and when that field is not required an empty string with the question mark separator should be included.
Search for the DN as My Organization:
ldap://ldap.example.com/o=My%20Organization
The same search but only return postalAddress attributes:
ldap://ldap.example.com/o=My%20Organization?postalAddress
Search for an empty DN and request information about the rootDomainNamingContext attribute for an Active Directory server:
ldap://ldap.example.com/?rootDomainNamingContext
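The question-mark-separated layout can be sketched as a small builder. It covers only the DN, attributes, scope and filter fields and leaves out extensions. The function name ldap_url, its parameters and the choice of characters left unencoded are assumptions of this sketch.

```python
from urllib.parse import quote


def ldap_url(host: str, dn: str = "", attributes=(), scope: str = "",
             ldap_filter: str = "") -> str:
    """Build an LDAP URL of the form ldap://host/dn?attributes?scope?filter."""
    fields = [",".join(attributes), scope, ldap_filter]
    # Trailing empty fields are dropped, while empty fields in the middle
    # keep their question mark separator.
    while fields and not fields[-1]:
        fields.pop()
    url = f"ldap://{host}/{quote(dn, safe='=,')}"
    if fields:
        url += "?" + "?".join(quote(field, safe="=,()&|!*") for field in fields)
    return url
```

With only a filter given, the attributes and scope fields stay empty but their separators remain, so the filter ends up after three question marks.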
For more information about the individual components of an LDAP URL please see RFC 4516.
The path part of a POP3 request specifies the message ID to retrieve. If the ID is not specified then a list of waiting messages is returned instead.
The path part of an SCP URL specifies the path and file to retrieve or upload. The file is taken as an absolute path from the root directory on the server.
To specify a path relative to the user's home directory on the server, prepend ~/ to the path portion.
The path part of an SFTP URL specifies the file to retrieve or upload. If the path ends with a slash (/) then a directory listing is returned instead of a file. If the path is omitted entirely then the directory listing for the root / home directory is returned.
The path part of an SMB request specifies the file to retrieve and from what share and directory or the share to upload to and as such, may not be omitted. If the username is embedded in the URL then it must contain the domain name and as such, the backslash must be URL encoded as %2f.
When uploading to SMB, the size of the file needs to be known ahead of time, meaning that you cannot upload a file passed to curl over a pipe like stdin.
curl supports SMB version 1 (only).
The path part of an SMTP request specifies the hostname to present during communication with the mail server. If the path is omitted, then libcurl attempts to resolve the local computer's hostname. However, this may not return the fully qualified domain name that is required by some mail servers and specifying this path allows you to set an alternative name, such as your machine's fully qualified domain name, which you might have obtained from an external function such as gethostname or getaddrinfo.
The default SMTP port is 25. Some servers use port 587 as an alternative.
There is no official URL spec for RTMP so libcurl uses the URL syntax supported by the underlying librtmp library. It has a syntax where it wants a traditional URL, followed by a space and a series of space-separated name=value pairs.
While space is not typically a "legal" character in a URL, libcurl accepts them. When a user wants to pass in a # (hash) character it is treated as a fragment and it gets cut off by libcurl if provided literally. You have to escape it by providing it as backslash and its ASCII value in hexadecimal: \23.
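The hash escaping rule can be applied mechanically when assembling such a string. The sketch below handles only the # character described here. The function names and the option name used in the example are illustrative assumptions, not a list of what librtmp accepts.

```python
def escape_rtmp_value(value: str) -> str:
    """Replace '#' with a backslash followed by its hex ASCII value (23)."""
    return value.replace("#", "\\23")


def rtmp_url(base: str, options: dict) -> str:
    """Append space-separated name=value pairs to a traditional RTMP URL."""
    pairs = [f"{name}={escape_rtmp_value(str(value))}"
             for name, value in options.items()]
    return " ".join([base, *pairs])
```

For instance, passing a hypothetical option named playpath with the value live#main produces the pair playpath=live\23main after the base URL.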