990

When is a space in a URL encoded to +, and when is it encoded to %20?

Cole Tobin
asked Oct 27, 2009 at 23:23
BC.

6 Answers

538

From Wikipedia (emphasis and link added):

When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20". The MIME type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined (still in a very outdated manner) in the HTML and XForms specifications.

So, the real percent encoding uses %20 while form data in URLs is in a modified form that uses +. So you're most likely to only see + in URLs in the query string after an ?.
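As a rough illustration of the two encodings, here is a minimal Python sketch using the standard library's `urllib.parse`. The sample string is made up for the example; `quote` does generic percent-encoding and `quote_plus` does the form-style variant.

```python
from urllib.parse import quote, quote_plus

text = "hello world"

# Generic percent-encoding: the space becomes %20.
percent_encoded = quote(text)

# Form-style encoding (application/x-www-form-urlencoded): the space becomes +.
form_encoded = quote_plus(text)

print(percent_encoded)  # hello%20world
print(form_encoded)     # hello+world
```

Which of the two a given library picks by default varies, so it is worth checking rather than assuming.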

Timothy Jones
answered Oct 27, 2009 at 23:26
Joey

14 Comments

So + encoding would technically be multipart/form-data encoding, while percent encoding is application/x-www-form-urlencoded?
@BC: no - multipart/form-data uses MIME encoding; application/x-www-form-urlencoded uses + and properly encoded URIs use %20.
"So you're most likely to only see + in URLs in the query string after an ?" is an understatement. You should never see "+" in the path part of the URL because it will not do what you expect (space).
So basically: the target of a GET submission is http://www.bing.com/search?q=hello+world and a resource with a space in the name is http://camera.phor.net/cameralife/folders/2012/2012-06%20Pool%20party/
Note that for email links, you do need %20 and not + after the ?. For example, mailto:[email protected]?subject=I%20need%20help. If you tried that with +, the email will open with +es instead of spaces.
448

This confusion is because URLs are still 'broken' to this day.

From a blog post:

Take "http://www.google.com" for instance. This is a URL. A URL is a Uniform Resource Locator and is really a pointer to a web page (in most cases). URLs actually have a very well-defined structure since the first specification in 1994.

We can extract detailed information about the "http://www.google.com" URL:

+---------------+-------------------+
|      Part     |      Data         |
+---------------+-------------------+
|  Scheme       | http              |
|  Host         | www.google.com    |
+---------------+-------------------+

If we look at a more complex URL such as:

"https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third"

we can extract the following information:

+-------------------+---------------------+
|        Part       |       Data          |
+-------------------+---------------------+
|  Scheme           | https               |
|  User             | bob                 |
|  Password         | bobby               |
|  Host             | www.lunatech.com    |
|  Port             | 8080                |
|  Path             | /file;p=1           |
|  Path parameter   | p=1                 |
|  Query            | q=2                 |
|  Fragment         | third               |
+-------------------+---------------------+

https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third
\___/   \_/ \___/ \______________/ \__/\_______/ \_/ \___/
  |      |    |          |          |      | \_/  |    |
Scheme User Password    Host       Port  Path |   | Fragment
        \_____________________________/       | Query
                       |               Path parameter
                   Authority
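The same breakdown can be reproduced with a parser. This is a small sketch using Python's `urllib.parse.urlparse`, with the URL assembled from the parts listed in the table. One difference to be aware of: `urlparse` reports the path parameter separately, so its `path` is `/file` rather than `/file;p=1`.

```python
from urllib.parse import urlparse

# URL assembled from the parts listed in the table above.
url = "https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third"
parts = urlparse(url)

print(parts.scheme)    # https
print(parts.username)  # bob
print(parts.password)  # bobby
print(parts.hostname)  # www.lunatech.com
print(parts.port)      # 8080
print(parts.path)      # /file  (urlparse strips the ";p=1" parameter)
print(parts.params)    # p=1
print(parts.query)     # q=2
print(parts.fragment)  # third
```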

The reserved characters are different for each part.

For HTTP URLs, a space in a path fragment part has to be encoded to "%20" (not, absolutely not "+"), while the "+" character in the path fragment part can be left unencoded.

Now in the query part, spaces may be encoded to either "+" (for backwards compatibility: do not try to search for it in the URI standard) or "%20" while the "+" character (as a result of this ambiguity) has to be escaped to "%2B".

This means that the "blue+light blue" string has to be encoded differently in the path and query parts:

"http://example.com/blue+light%20blue?blue%2Blight+blue".

From there you can deduce that encoding a fully constructed URL is impossible without a syntactical awareness of the URL structure.
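A minimal Python sketch of that point: each part is encoded on its own, with its own rules, and only then assembled. The `safe="+"` argument is a choice made here to keep the literal plus sign unencoded in the path, matching the example URL above.

```python
from urllib.parse import quote, quote_plus

raw = "blue+light blue"

# Path segment: the space must become %20; a literal "+" may stay as is.
path_segment = quote(raw, safe="+")

# Query (form-style): the space becomes "+", so a literal "+" must become %2B.
query_value = quote_plus(raw)

url = f"http://example.com/{path_segment}?{query_value}"
print(url)  # http://example.com/blue+light%20blue?blue%2Blight+blue
```

Running either function over the already assembled URL would mangle the separators, which is why the pieces are encoded before they are joined.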

This boils down to:

You should have %20 before the ? and + after.

Source

Peter Mortensen
answered Apr 29, 2015 at 15:36
Matas Vaitkevicius

11 Comments

>> you should have %20 before the ? and + after Sorry for the silly question. I know a bit somehow that hashtag parameter is used after "?" question mark parameter. Though it is somehow different because using "#" does not reload the page. But I've been trying to use %20 and + sign after the "#" hashtag, and it seems not working. Which one needs to be used after "#"?
@Philcyb You might wanna read this: en.wikipedia.org/wiki/Percent-encoding
Does the query part actually have an "official" standard? I thought basically that part is application specific. 99.99% of apps use key1=value1&key1=value2 where keys and values are encoded with whatever rules encodeURIComponent follows, but AFAIK the contents of the query part are entirely 100% up to the app. Other than that it only goes up to the first #, there's no official encoding.
Actually, I just took a look at the LunaTech blog article, which you kindly referenced, and the take-home message seems to be more like: You must use %20 and not + before the ?, but after the ? it is simply a matter of taste. For the love of God, people, just always use the percent sign-based encoding and clear out some brain space for more important stuff.
Wow man. I have to say that graph in ASCII looks cool.
36

A space may be encoded as + only in the "application/x-www-form-urlencoded" content-type key-value pairs query part of a URL. In my opinion, this is optional, not mandatory. In other parts of URLs, spaces are encoded as %20.

It's better to always encode spaces as %20 rather than as +, even in the query part of a URL. This is because the HTML specification (RFC 1866) specifies that space characters should be encoded as + in application/x-www-form-urlencoded content-type key-value pairs (see paragraph 8.2.1, subparagraph 1).

This method of encoding form data is also specified in later versions of the HTML standard. For example, relevant paragraphs about application/x-www-form-urlencoded can be found in the HTML 4.01 Specification, and subsequent specifications.

Here is a sample string in a URL where the HTML specification allows encoding spaces as pluses: http://example.com/over/there?name=foo+bar. Spaces can be replaced by pluses only after the "?". In other cases, spaces should be encoded as %20. However, since it can be challenging to determine the correct context, it's best practice to never encode spaces as +.

I recommend percent-encoding all characters except those classified as "unreserved" in RFC 3986, section 2.3:

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"

The implementation of this encoding depends on the programming language you choose.

If your URL contains national characters, first encode them to UTF-8, and then percent-encode the result of the UTF-8 encoding.
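As one possible implementation, here is a short Python sketch. The helper name `encode_component` is made up for this example. It relies on `urllib.parse.quote` with `safe=""`, which leaves only letters, digits and "-", ".", "_", "~" unencoded (the "~" case assumes Python 3.7 or later) and encodes non-ASCII text as UTF-8 first.

```python
from urllib.parse import quote

def encode_component(text: str) -> str:
    """Percent-encode everything except RFC 3986 unreserved characters."""
    # safe="" removes "/" from quote()'s default safe set.
    # Non-ASCII characters are converted to UTF-8 bytes, then percent-encoded.
    return quote(text, safe="", encoding="utf-8")

print(encode_component("foo bar"))  # foo%20bar
print(encode_component("a+b/c~d"))  # a%2Bb%2Fc~d
print(encode_component("café"))     # caf%C3%A9
```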

answered Oct 27, 2016 at 19:29
Maxim Masiutin

5 Comments

Why should anyone care about the HTML specification if the requested resource isn't HTML? I've seen "+" in some Web APIs which don't respond with HTML, e.g. you request a PDF. I consider it wrong that they don't use "%20".
@TheincredibleJan, I agree with you. That's what my reply is about.
@MaximMasiutin When your answer says "This is a MAY, not a MUST", which spec are you referring to? I'm struggling to find a spec that has it as a may. In w3.org/TR/1999/REC-html401-19991224/interact/… using '+' (in the query section) is within a 'must' section of the spec.
@JosephH - thank you for your note. It is my personal opinion about MAY. I have edited the post. What I meant is that the HTML specification you quoted defines "+", but in the URL context, other rules apply, which permit encoding spaces as %20 also.
Agree! decodeURIComponent('+') returns +. So if a space is encoded into +, the server cannot decode it into space.
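The comment above is about JavaScript. As a rough analogue, the same split exists in Python's standard library, sketched below with a made-up input string: `unquote` is the generic percent-decoder and leaves "+" alone, while `unquote_plus` is the form-style decoder.

```python
from urllib.parse import unquote, unquote_plus

encoded = "foo+bar%20baz"

# Generic percent-decoding leaves "+" untouched.
generic = unquote(encoded)

# Form-style decoding also turns "+" into a space.
form_style = unquote_plus(encoded)

print(generic)     # foo+bar baz
print(form_style)  # foo bar baz
```

So whether a "+" comes back as a space depends on which decoder the receiving side uses, and %20 is read as a space by both.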
27

I would recommend %20.

Are you hard-coding them?

This is not very consistent across languages, though. If I'm not mistaken, in PHP urlencode() treats spaces as + whereas Python's urlencode() treats them as %20.

EDIT:

It seems I'm mistaken. Python's urlencode() (at least in 2.7.2) uses quote_plus() instead of quote() and thus encodes spaces as "+". It seems also that the W3C recommendation is the "+" as per here: http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

And in fact, you can follow this interesting debate on Python's own issue tracker about what to use to encode spaces: http://bugs.python.org/issue13866.

EDIT #2:

I understand that the most common way of encoding " " is as "+", but just a note, it may be just me, but I find this a bit confusing:

from urllib.parse import urlencode  # Python 3; in Python 2.7 this was urllib.urlencode

print(urlencode({' ': '+ '}))
# prints: +=%2B+
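For what it's worth, Python 3 lets you switch `urlencode()` to %20 by passing `quote_via`. A small sketch, with made-up parameter names and values:

```python
from urllib.parse import quote, urlencode

params = {"q": "hello world", "op": "1+1"}

default_style = urlencode(params)                   # uses quote_plus
percent_style = urlencode(params, quote_via=quote)  # uses quote

print(default_style)  # q=hello+world&op=1%2B1
print(percent_style)  # q=hello%20world&op=1%2B1
```

In both styles a literal "+" is escaped to %2B, so only the representation of the space differs.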
the Tin Man
answered Oct 27, 2009 at 23:31
Rui Vieira

7 Comments

Not hardcoding. Trying to determine from an aesthetic perspective what my urls containing spaces will look like.
Hi, I am confused too. When a user submits an HTML form, how does the form encode the space? With which character? Is the result browser-dependent?
And the URLEncoder.encode() method in Java converts it to + as well.
And then the question arises as to how to treat encoding in the body of a POST request: "Content-Type: application/x-www-form-urlencoded" where the parameters are in the form of "a=b&c=d", but aren't in a URL at all, just the body of the "document." They made a real mess out of this issue, and it's darned difficult to find definitive answers.
Perl's uri_escape() treats them as %20
24

To summarize the (somewhat conflicting) answers here, I think it can be boiled down to:

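+----------------+-----+-----+
|  standard      |  +  | %20 |
+----------------+-----+-----+
|  URL           | no  | yes |
|  Query String  | yes | yes |
|  Form Params   | yes | no  |
|  mailto query  | no  | yes |
+----------------+-----+-----+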

So historically I think what happened is:

  1. The RFC specified a pretty clear standard about the form of URLs and how they are encoded. In this context the query is just a "string", there is no specification how key/value pairs should be encoded
  2. The HTTP guys put out a standard of how key/value pairs are to be encoded in form params, and borrowed from the URL encoding standard, except that spaces should be encoded as +.
  3. The web guys said: cool we have a way to encode key/value pairs let's put that into the URL query string

Result: We end up with two different ways how to encode spaces in a URL depending on which part you're talking about. But it doesn't even violate the URL standard. From URL perspective the "query" is just a blackbox. If you want to use other encodings there besides percent encoding: knock yourself out.

But as the email example shows it can be problematic to borrow from the form-params implementation for an URL query string. So ultimately using %20 is safer but there might not be out of the box library support for it.
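To make the mailto case concrete, here is a minimal Python sketch. The address and subject are placeholders invented for the example. The subject is encoded with `quote`, not with a form encoder, so spaces come out as %20.

```python
from urllib.parse import quote

# Hypothetical address and subject, used only for illustration.
address = "someone@example.com"
subject = "I need help"

# mailto: queries are not form data, so spaces are written as %20.
link = f"mailto:{address}?subject={quote(subject, safe='')}"
print(link)  # mailto:someone@example.com?subject=I%20need%20help
```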

answered Jul 1, 2022 at 19:09
David Ongaro

2 Comments

Amazing answer, thanks, but this part was vague, would you explain what example you're talking about? "as the email example shows it can be problematic to borrow from the form-params implementation for an URL query string."
@aderchox I'm referring to this comment: stackoverflow.com/questions/1634271/…. Basically, e-mail clients don't accept the + encoding in general. Thanks for your praise, but I'm not happy with my answer, since it contains some inaccuracies. It wasn't the "HTTP guys" who introduced the + encoding, but the HTML guys (see the HTML <form> tag spec). I plan to fix my answer and provide some references soon.
3

As I was surprised that nobody cited the actual RFC 3986 on "percent encoding", I'm adding my own answer here:

As the aforementioned RFC does not include any reference to encoding spaces as +, I guess using %20 is the way to go today.

For example, "%20" is the percent-encoding for the binary octet "00100000" (ABNF: %x20), which in US-ASCII corresponds to the space character (SP).
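The arithmetic in that quote can be checked in a few lines of Python. This is only a sanity check of the octet value, not anything taken from the RFC itself.

```python
space = " "
octet = space.encode("ascii")[0]  # integer value of the single byte

bits = format(octet, "08b")
percent_encoded = f"%{octet:02X}"

print(bits)             # 00100000
print(percent_encoded)  # %20
```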

answered Oct 18, 2023 at 11:44
U. Windl

2 Comments

Yes. With the only exception for x-www-form-urlencoded.
You are referring to application/x-www-form-urlencoded in W3C's "Form content types"? I wonder what the motivation behind it actually was (saving two bytes per space?) when they state "Space characters are replaced by '+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by '%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., '%0D%0A').".
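The rules quoted in that comment can be seen in action with Python's `urlencode`, which produces this form-style encoding. A minimal sketch with a made-up field name and value:

```python
from urllib.parse import parse_qs, urlencode

fields = {"msg": "line one\r\nline two"}

# Spaces become "+", and the CR LF pair becomes %0D%0A.
body = urlencode(fields)
print(body)  # msg=line+one%0D%0Aline+two

# A form-style parser reverses both substitutions.
decoded = parse_qs(body)
print(decoded)  # {'msg': ['line one\r\nline two']}
```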