Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork9.7k
[Routing] Add seamless support for unicode requirements#19604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
6385b48 to2098128Comparenicolas-grekas commentedAug 18, 2016
ping@Tobion |
c960657 commentedAug 19, 2016
I like how this approach is simpler than my own PR#19562. I think UTF-8 is the only reasonable encoding to use for URLs (because this is what browsers use when pretty-printing the URLs in the location field), and I assume most people will use UTF-8 internally in their data and source code, so I like how this patch favours UTF-8 while still allowing people to do otherwise if they want. I think the auto-detection of UTF-8 patterns is a bit too magic. E.g. whether Here are two examples that are explicit. They are not native PCRE syntax but reuse syntax used in PCRE.
According to themanual, using |
nicolas-grekas commentedAug 19, 2016 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I didn't know about the |
2098128 tode3a063Comparenicolas-grekas commentedAug 19, 2016
The prefix idea doesn't play well with route prefixing as done by loaders... |
stof commentedAug 19, 2016
Another solution is to use a route option to opt in unicode mode. Route options are precisely meant to give hints to the route compiler. |
nicolas-grekas commentedAug 19, 2016
@stof but that would be a DX nightmare: if you'd use unicode, you'd need to repeat yourself thousands of times. The current way is a bit magic, but really seamless: if you use any unicode chars or any unicode property, unicode is enabled. There is only one special case: the |
nicolas-grekas commentedAug 19, 2016
Doc PR added:symfony/symfony-docs#6890 |
de3a063 to3f0718bComparefabpot commentedAug 19, 2016
I don't really like the |
stof commentedAug 19, 2016
@fabpot what about my proposal of using my proposal ? @nicolas-grekas we could keep the autodetection of unicode. The option would be used instead of using a magic |
fabpot commentedAug 19, 2016
@stof Indeed, I think an option is better than an obscure convention. |
3f0718b to7d6c262Comparenicolas-grekas commentedAug 19, 2016
updated |
c12f06a tod52bda3Comparec960657 commentedAug 23, 2016 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
The patch uses the term “Unicode” in variable names etc., but wouldn't it be more correct to use “UTF-8” (the specific encoding we are supporting)? “Unicode” is a much wider concept. Note that the Unicode character properties escape sequences, Also, I think we disagree a bit on what exactly this feature does: In my eyes, this flag enabled UTF-8 support for regular expressions, i.e. basically adds the |
| * Added support for`bool`,`int`,`float`,`string`,`list` and`map` defaults in XML configurations. | ||
| * Added support for unicode requirements | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Unicode is a proper noun, so it should be capitalized.
| * | ||
| * Existing settings will be overridden. | ||
| * | ||
| * @param string $unicode Whether unicode matching is enforced or not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
@param bool
8276d4b toa1d4bc4Comparenicolas-grekas commentedAug 24, 2016 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I hesitated on this, and choose Unicode to match what the PCRE doc uses. On the web, Unicode is synonym for UTF-8. Nobody uses any other Unicode encoding there. And the PCRE doc says about "unicode properties", the "unicode" flag, etc. I thought it'd be better to make the vocabularies match.
Yes, yet it's undocumented (in the PHP doc at least, where on the contrary it's specifically documented under the "Unicode properties" chapter), and thus nobody knows what it does, esp. when considering high-ASCII chars.
Yes when given |
c960657 commentedAug 24, 2016 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
In most places, the PCRE man pages refer to "UTF-8 mode" or "UTF mode" (the library also supports UTF-16 and UTF-32).
On theman page it says: |
nicolas-grekas commentedAug 24, 2016
I never read the man page of pcre but I read many times the one on php.net. I guess I'm more like the usual PHP user on this one :) |
a1d4bc4 to02b1214Comparec960657 commentedAug 25, 2016
I'm not sure which parts of the PHP manual you are referring to. I couldn't find any occurrences of the phrase "Unicode mode" outside user comments. "UTF-8 mode" occurs a few places, e.g. in thedescription of PREG_BAD_UTF8_OFFSET_ERROR. Unicode mode: UTF-8 mode: Additionally, the work "Unicode" is used in connection with character classes, but as mentioned above they work with and without UTF-8 mode. |
02b1214 to8f62888Comparenicolas-grekas commentedAug 25, 2016
@c960657 thanks for you input. I've just renamed unicode to utf8 everywhere. I've also added a deprecation targeting the current magic: I propose to encourage people to explicitly enable the utf8 flag; throw a deprecation when they don't, and throw a LogicException in 4.0. |
c960657 commentedAug 25, 2016
Sounds like a good solution :) |
| $this->utf8 =$utf8; | ||
| } | ||
| publicfunctionisUtf8() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
do we need a new property on the route ? I suggested using options, which are already meant to provide hints to the route compiler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
That would complicate the transition: options are validated and you can't add an unsupported key (no forward compat). This means it wouldn't be possible to write routes in a forward/backward compatible manner in Yml/Xml/Annotations.
21c3992 to70c40f0Compare| * Available options: | ||
| * | ||
| * * compiler_class: A class name able to compile this route instance (RouteCompiler by default) | ||
| * * utf8: Whether UTF-8 matching is enforced ot not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
utf8 is now an option on the route. ping@stof (I don't know were I've got this idea that it wasn't possible...)
70c40f0 toe5cb1f4Comparestof commentedAug 25, 2016
👍 |
e5cb1f4 toa829d34Comparefabpot commentedAug 25, 2016
Thank you@nicolas-grekas. |
…s (nicolas-grekas)This PR was merged into the 3.2-dev branch.Discussion----------[Routing] Add seamless support for unicode requirements| Q | A| ------------- | ---| Branch? | master| Bug fix? | no| New feature? | yes| BC breaks? | no| Deprecations? | yes| Tests pass? | yes| Fixed tickets |#3629,#5236,#19562| License | MIT| Doc PR |symfony/symfony-docs#6890This PR adds unicode support to route matching and generation by automatically adding the `u` modifier to regexps that use either unicode characters or unicode enabled character classes (e.g. `\p...` `\x{...}` `\X`).As a side note, if one wants to match a single unicode character (vs a single byte), one should use `\PM` or `\X` instead of `.` *or* set the `unicode` parameter to true.Commits-------a829d34 [Routing] Add seamless support for unicode requirements
…rekas)This PR was merged into the master branch.Discussion----------[Routing] Add doc about unicode requirementsRef.symfony/symfony#19604Commits-------75ed392 Add doc about unicode requirements
Uh oh!
There was an error while loading.Please reload this page.
This PR adds unicode support to route matching and generation by automatically adding the
umodifier to regexps that use either unicode characters or unicode enabled character classes (e.g.\p...\x{...}\X).As a side note, if one wants to match a single unicode character (vs a single byte), one should use
\PMor\Xinstead of.or set theunicodeparameter to true.