Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork9.7k
[BrowserKit] Adds support for meta refresh#27118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
4f443ee to8516b29Compare
nicolas-grekas left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks for submitting this PR.
I didn't review in depth, here are pure CS comments for now :)
| * | ||
| * @return string|false Redirect URI if response has a meta refresh | ||
| */ | ||
| protectedfunctionhasMetaRefresh() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
this method should be private and should returnstring|null (thus, the docblock could/should be removed, and a return type added:?string instead.)
The name should be getMetaRefresh as one would expecthas* methods to return a bool IMHO.
| foreach ($deprecations ?unserialize($deprecations) :array()as$deprecation) { | ||
| if ($deprecation[0]) { | ||
| trigger_error($deprecation[1],E_USER_DEPRECATED); | ||
| @trigger_error($deprecation[1],E_USER_DEPRECATED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
should be reverted (fabbot failure is a false positive)
| if ($this->followRedirects) { | ||
| // Check for meta refresh redirect. | ||
| $redirect =$this->hasMetaRefresh(); | ||
| if ($redirect) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
according to previous comment:if ($this->followRedirects && null !== $redirect = $this->getMetaRefresh()) {
(removing one indentation level)
jhedstrom commentedMay 2, 2018
Thanks@nicolas-grekas for the review! I've made those changes, and also made the test a bit more robust (2 meta tags), and the logic in |
| /** | ||
| * Determines if the response has a meta refresh. | ||
| * | ||
| * @return string|null Redirect URI if response has a meta refresh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Should be removed as it duplicated the signature. The above sentence should be updated instead ("determines" is not accurate anymore.)
| { | ||
| $metaRefresh =$this->getCrawler()->filter('head meta[http-equiv="refresh"]'); | ||
| foreach ($metaRefresh->extract(array('content'))as$content) { | ||
| if (preg_match('/.*URL=(.*)$/i',$content,$m)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
about the regexp, what about restricting to zero timeouts? and be more strict about it, eg:
if (preg_match('/^\s*0\s*;\s*URL\s*=\s*(?|'([^']++)|"([^"]++)|([^'"].*))/i',$content,$m)) {return str_replace("\t\r\n",'',rtrim($m[1]);}
parsing logic extracted fromhttps://dev.w3.org/html5/spec-preview/the-meta-element.html#attr-meta-http-equiv-refresh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks for this regex! It helped out a lot :)
| return$m[1]; | ||
| } | ||
| } | ||
| returnnull; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
missing blank line just before this one
| * | ||
| * @return string|null Redirect URI if response has a meta refresh | ||
| */ | ||
| privatefunctiongetMetaRefresh(): ?string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
getMetaRefreshUrl?
| publicfunctiontestFollowMetaRefresh() | ||
| { | ||
| $content ='<html><head><meta http-equiv="Refresh" content="4" /><meta http-equiv="refresh" content="0; URL=http://www.example.com/redirected"/></head></html>'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
could be great to test with more refresh URLs formats (as accepted by HTML5)
nicolas-grekas commentedMay 2, 2018
Thanks for the first round of changes, here is a next step. Should be good on my side after it. |
jhedstrom commentedMay 2, 2018
I've made the changes suggested above. I included the only-zero time change, but the spec does allow for non-zero time redirects, so that may come up again :) I'm on the fence as to whether this is a bug fix or a feature. If folks think it is more of a feature, I can update the CHANGELOG as needed. |
nicolas-grekas left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Lgtm thanks. New feature or bugfix that's not obvious to me also. Anyone else to help decide?
codedokode commentedMay 3, 2018
I wanted to note that some sites use a combination of <noscript><metahttp-equiv="refresh"content="0; URL=/badbrowser.php"></noscript> This change will make impossible to access such sites with Crawler. Therefore I would suggest to add an option to enable this and make the option disabled by default. Or ignore Also, the Refresh can be sent as a HTTP header. |
jhedstrom commentedMay 3, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
That's actually the main motivation for this, tofollow those inside I think a separate option sounds reasonable here, since sometimes folks would want to follow them, and other times they would not if there was no support for non-JS sessions. |
| /** | ||
| * Sets whether to automatically follow meta refresh redirects or not. | ||
| * | ||
| * @param bool $followMetaRefresh Whether to follow meta refresh redirects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
the description of the argument is not really needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I've removed the description.
6d4fc48 toe2f175eCompare
fabpot left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looks good to me (I've made one comment about browser history, everything else is boring CS). IMHO, this is a new feature.
| /** | ||
| * Sets whether to automatically follow meta refresh redirects or not. | ||
| * | ||
| * @param bool $followMetaRefresh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This can be removed. The rule is that we don't add phpdocs when it does not add anything that is not already part of the signature.
| } | ||
| /** | ||
| * Get the meta refresh URL if the response has one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Can be removed, method name is clear enough (and private).
| return$this->crawler =$this->createCrawlerFromContent($this->internalRequest->getUri(),$this->internalResponse->getContent(),$this->internalResponse->getHeader('Content-Type')); | ||
| $this->crawler =$this->createCrawlerFromContent($this->internalRequest->getUri(),$this->internalResponse->getContent(),$this->internalResponse->getHeader('Content-Type')); | ||
| // Check for meta refresh redirect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
final dot can be removed
| // Check for meta refresh redirect. | ||
| if ($this->followMetaRefresh &&null !==$redirect =$this->getMetaRefreshUrl()) { | ||
| $this->redirect =$redirect; | ||
| $this->crawler =$this->followRedirect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Do we also need$this->redirects[serialize($this->history->current())] = true; to add the URL in the browser history?
jhedstrom commentedMay 17, 2018
I've made the changes noted above, including adding the redirect to the browser history. |
fabpot commentedMay 18, 2018
Thank you@jhedstrom. |
This PR was squashed before being merged into the 4.2-dev branch (closes#27118).Discussion----------[BrowserKit] Adds support for meta refresh| Q | A| ------------- | ---| Branch? | master| Bug fix? | no| New feature? | yes| BC breaks? | no| Deprecations? | no| Tests pass? | yes| Fixed tickets |#27117 <!-- #-prefixed issue number(s), if any -->| License | MIT| Doc PR | symfony/symfony-docs#... <!-- required for new features -->This adds support for following redirects defined by the `http-equiv=refresh` meta tag, such as```<meta http-equiv="Refresh" content="0; URL=http://example.com/somewhere">```Additional background atjhedstrom/drupalextension#325 (comment)<!--Write a short README entry for your feature/bugfix here (replace this comment block.)This will help people understand your PR and can be used as a start of the Doc PR.Additionally: - Bug fixes must be submitted against the lowest branch where they apply (lowest branches are regularly merged to upper ones so they get the fixes too). - Features and deprecations must be submitted against the master branch.-->Commits-------1c64c82 [BrowserKit] Adds support for meta refresh
fabpot commentedMay 18, 2018
@jhedstrom Great first contribution! |
jhedstrom commentedMay 18, 2018
Thanks@fabpot! |
Uh oh!
There was an error while loading.Please reload this page.
This adds support for following redirects defined by the
http-equiv=refreshmeta tag, such asAdditional background atjhedstrom/drupalextension#325 (comment)