[Routing] Fix matching of utf8 params #42159
carsonbot commented Jul 16, 2021
Hey! I see that this is your first PR. That is great! Welcome! Symfony has a contribution guide, which I suggest you read. In short:
Review the GitHub status checks of your pull request and try to solve the reported issues. If some tests are failing, try to see if they are failing because of this change. When two Symfony core team members approve this change, it will be merged and you will become an official Symfony contributor! I am going to sit back now and wait for the reviews. Cheers! Carsonbot
// Match all variables enclosed in "{}" and iterate over them. But we only want to match the innermost variable
// in case of nested "{}", e.g. {foo{bar}}. This in ensured because \w does not match "{" or "}" itself.
preg_match_all('#\{(!)?(\w+)\}#', $pattern, $matches, \PREG_OFFSET_CAPTURE | \PREG_SET_ORDER);
$routeParamsPattern = $needsUtf8 ? '#\{(!)?([\p{L}_]+)\}#u' : '#\{(!)?(\w+)\}#';
Numbers?
Fixed, thanks
Foxprodev Jul 16, 2021 • edited
I am not 100% sure, but \w with the u flag should be enough. Am I wrong?
\w doesn't support unicode characters.
Here is an example:
There is no unicode flag in your snippet
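The question in this thread can be checked directly. The sketch below assumes PHP 7 or later with the bundled PCRE library and a UTF-8 encoded source file; `extractPlaceholders()` is a hypothetical helper for illustration, not a Symfony function. It runs the same placeholder pattern with and without the u modifier.

```php
<?php

/**
 * Hypothetical helper (not part of Symfony): returns the placeholder
 * names found in a route path, with or without the "u" modifier.
 *
 * @return string[]
 */
function extractPlaceholders(string $path, bool $utf8): array
{
    $pattern = '#\{(!)?(\w+)\}#' . ($utf8 ? 'u' : '');
    preg_match_all($pattern, $path, $matches);

    // Group 2 holds the variable name; group 1 is the optional "!".
    return $matches[2];
}

// Without "u" the subject is matched byte by byte, so the two bytes
// that encode "ä" are not word characters and nothing is extracted.
var_dump(extractPlaceholders('/foo/{bär}', false));

// With "u", PHP also enables Unicode character properties, so \w accepts "ä".
var_dump(extractPlaceholders('/foo/{bär}', true));
```

With the u modifier the helper returns the name bär; without it the list is empty, which is the situation where a route would be treated as static.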
} catch (ResourceNotFoundException $e) {
}
$this->assertEquals(['_route' => 'foo', 'bär' => 'baz'], $matcher->match('/foo/baz'));
assertSame whenever possible
I'm not sure that it's possible to use assertSame here. $matcher->match() doesn't guarantee the order of elements in the array. And assertEquals ignores it.
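The ordering concern can be illustrated with plain PHP comparison operators. This is only a sketch: it assumes that assertSame compares with `===`, which for arrays requires the same key/value pairs in the same order, while `==` only requires the same key/value pairs. The variable names are made up for the example.

```php
<?php

$expected = ['_route' => 'foo', 'bär' => 'baz'];
// Same pairs, different insertion order.
$actual = ['bär' => 'baz', '_route' => 'foo'];

var_dump($expected == $actual);  // bool(true)
var_dump($expected === $actual); // bool(false)
```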
// Match all variables enclosed in "{}" and iterate over them. But we only want to match the innermost variable
// in case of nested "{}", e.g. {foo{bar}}. This in ensured because \w does not match "{" or "}" itself.
preg_match_all('#\{(!)?(\w+)\}#', $pattern, $matches, \PREG_OFFSET_CAPTURE | \PREG_SET_ORDER);
$routeParamsPattern = $needsUtf8 ? '#\{(!)?([\p{L}\d_]+)\}#u' : '#\{(!)?(\w+)\}#';
nicolas-grekas Jul 19, 2021 • edited
there is no need to check for unicode support: \pL always works
this means that the regexp can unconditionally be: '#\{(!)?([\w\pL]++)\}#'
note that there are other occurrences of \w in this very file
Yes, I agree. Anyway, I found that I also need to make some fixes to support UTF-8 characters in the regex of compiled routes (for PHP 7.2).
@nicolas-grekas Maybe it's not so bad to split the logic.
Matching characters by Unicode property is not fast, because PCRE has to do a multistage table lookup in order to find a character's property. That is why the traditional escape sequences such as \d and \w do not use Unicode properties in PCRE by default
In non-unicode mode, PCRE doesn't use unicode tables.
Foxprodev Jul 19, 2021 • edited
@nicolas-grekas in non-unicode mode \pL does not fully handle unicode characters either: https://www.phpliveregex.com/p/BbM
When the u modifier is set (aka when $needsUtf8 is true, aka when the utf8 option is set), PCRE will use Unicode tables. It will use ASCII tables otherwise.
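A small check of the two modes, using the unconditional pattern proposed earlier in this thread. It is a sketch that assumes a UTF-8 encoded subject and a PCRE build with Unicode support; `matchPlaceholder()` is a hypothetical name, not a Symfony function. Without the u modifier PCRE treats the subject as single bytes, so the two-byte sequence for "ä" is not seen as one letter.

```php
<?php

/**
 * Hypothetical helper: returns the first placeholder name matched by
 * the proposed pattern, or null when nothing matches.
 */
function matchPlaceholder(string $path, bool $utf8): ?string
{
    $pattern = '#\{(!)?([\w\pL]++)\}#' . ($utf8 ? 'u' : '');

    return preg_match($pattern, $path, $m) === 1 ? $m[2] : null;
}

var_dump(matchPlaceholder('/foo/{bär}', true));  // string(4) "bär"
var_dump(matchPlaceholder('/foo/{bär}', false)); // NULL
var_dump(matchPlaceholder('/foo/{bar}', false)); // string(3) "bar"
```

For this subject the pattern only extracts the non-ASCII name when the u modifier is appended, which is consistent with setting the modifier conditionally on the utf8 option.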
So we still need to conditionally set the u modifier, right?
That's already done:
$regexp .= 'u';
But it doesn't work. We have to use the u modifier in this regex as well, because otherwise the route params won't be parsed and the route will be treated as static instead of dynamic.
Foxprodev Jul 19, 2021 • edited
Oh, that's the point. But it doesn't work in my test cases, and I am still sure that we need to add u to the regex string currently under discussion.
Anyway I will wait for full PR then. Thanks for the time!
nicolas-grekas left a comment
See grep '\\w' src/Symfony/Component/Routing/ -r
Change Regexp for routes with UTF-8 params.