
Issue 25400

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Classification
Title: robotparser doesn't return crawl delay for default entry
Type: behavior    Stage: resolved
Components: Library (Lib)    Versions: Python 3.7, Python 3.6

Process
Status: closed    Resolution: fixed
Dependencies:    Superseder:
Assigned To:    Nosy List: berker.peksag, pwirtz, python-dev
Priority: normal    Keywords: patch

Created on 2015-10-14 01:21 by pwirtz, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name                          Uploaded                           Description
robotparser_crawl_delay.patch      pwirtz, 2015-10-14 01:21           patch
robotparser_crawl_delay_v2.patch   pwirtz, 2015-10-14 18:35
issue25400_v2.diff                 berker.peksag, 2016-09-18 15:36
issue25400_v3.diff                 berker.peksag, 2016-09-18 16:01

Pull Requests
URL       Status    Linked
PR 552    closed    dstufft, 2017-03-31 16:36
Messages (8)
msg252971 - Author: Peter Wirtz (pwirtz) - Date: 2015-10-14 01:21

After changeset http://hg.python.org/lookup/dbed7cacfb7e, calling the crawl_delay method for a robots.txt file that has a crawl-delay for the * user agent always returns None. Example:

Python 3.6.0a0 (default:1aae9b6a6929+, Oct  9 2015, 22:08:05)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.robotparser
>>> parser = urllib.robotparser.RobotFileParser()
>>> parser.set_url('https://www.carthage.edu/robots.txt')
>>> parser.read()
>>> parser.crawl_delay('test_robotparser')
>>> parser.crawl_delay('*')
>>> print(parser.default_entry.delay)
120
>>>

Excerpt from https://www.carthage.edu/robots.txt:

User-agent: *
Crawl-Delay: 120
Disallow: /cgi-bin

I have written a patch that solves this. With the patch, the output is:

Python 3.6.0a0 (default:1aae9b6a6929+, Oct  9 2015, 22:08:05)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.robotparser
>>> parser = urllib.robotparser.RobotFileParser()
>>> parser.set_url('https://www.carthage.edu/robots.txt')
>>> parser.read()
>>> parser.crawl_delay('test_robotparser')
120
>>> parser.crawl_delay('*')
120
>>> print(parser.default_entry.delay)
120
>>>

This also applies to the request_rate method.
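In essence, the report asks crawl_delay (and request_rate) to fall back to the default "*" entry when no user-agent-specific entry matches. Below is a minimal sketch of that fallback, written as a subclass for illustration rather than as the attached patch itself; the class name PatchedRobotFileParser is made up here, while the internal names (entries, default_entry, applies_to, delay, req_rate, mtime) are the ones the stdlib parser uses.

import urllib.robotparser


class PatchedRobotFileParser(urllib.robotparser.RobotFileParser):
    """Illustrative sketch of the requested fallback; not the attached patch."""

    def crawl_delay(self, useragent):
        if not self.mtime():          # robots.txt has not been parsed yet
            return None
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.delay
        # Fall back to the default ("*") entry instead of returning None.
        return self.default_entry.delay if self.default_entry else None

    def request_rate(self, useragent):
        if not self.mtime():
            return None
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.req_rate
        return self.default_entry.req_rate if self.default_entry else None

Parsed against the robots.txt excerpt above, such a parser would return 120 for crawl_delay('test_robotparser') instead of None.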
msg252972 - Author: Peter Wirtz (pwirtz) - Date: 2015-10-14 01:25

This fix breaks the unit tests, though. I am not sure how to go about checking those, as this would be my first contribution to Python and to an open source project in general.
msg253015 - Author: Peter Wirtz (pwirtz) - Date: 2015-10-14 18:16

On further inspection of the tests, it appears that the way the tests are written, a test case can only be checked against one user agent at a time. I will attempt to rework the tests so they work correctly. Any advice would be much appreciated.
msg253016 - Author: Berker Peksag (berker.peksag) (Python committer) - Date: 2015-10-14 18:22

Thanks for the patch, Peter (and welcome to Python and open source development). I have a WIP patch to rewrite test_robotparser in a less magic way, so we can ignore the test failures for now. I'll take a closer look at your patch.
msg253017 - Author: Peter Wirtz (pwirtz) - Date: 2015-10-14 18:35

OK, for the meantime I reworked the test so it checks this case correctly, and the tests pass. There does seem to be some magic involved, so I hope I did not overlook anything. Here is the new patch.
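For illustration, here is a hedged sketch of the kind of test case being discussed (not the actual Lib/test/test_robotparser.py change): it feeds the robots.txt excerpt from the report to the parser via parse() and checks that crawl_delay and request_rate fall back to the "*" entry for an arbitrary user agent. The class name and the added Request-rate line are this sketch's own; on an unpatched interpreter these assertions fail, which is the point of the test.

import unittest
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 120
Request-rate: 3/15
Disallow: /cgi-bin
"""


class DefaultEntryTest(unittest.TestCase):
    def setUp(self):
        self.parser = urllib.robotparser.RobotFileParser()
        self.parser.parse(ROBOTS_TXT.splitlines())

    def test_crawl_delay_falls_back_to_default_entry(self):
        # Any agent without its own entry should get the "*" value.
        self.assertEqual(self.parser.crawl_delay('test_robotparser'), 120)
        self.assertEqual(self.parser.crawl_delay('*'), 120)

    def test_request_rate_falls_back_to_default_entry(self):
        rate = self.parser.request_rate('test_robotparser')
        self.assertEqual((rate.requests, rate.seconds), (3, 15))


if __name__ == '__main__':
    unittest.main()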
msg275776 - Author: Berker Peksag (berker.peksag) (Python committer) - Date: 2016-09-11 11:55

I've now updated Lib/test/test_robotparser.py (issue 25497). Peter, do you have time to update your patch? Thanks!
msg276897 - Author: Berker Peksag (berker.peksag) (Python committer) - Date: 2016-09-18 15:36

Here's an updated patch.
msg276900 - Author: Roundup Robot (python-dev) (Python triager) - Date: 2016-09-18 17:17

New changeset d5d910cfd288 by Berker Peksag in branch '3.6':
Issue #25400: RobotFileParser now correctly returns default values for crawl_delay and request_rate
https://hg.python.org/cpython/rev/d5d910cfd288

New changeset 911070065e38 by Berker Peksag in branch 'default':
Issue #25400: Merge from 3.6
https://hg.python.org/cpython/rev/911070065e38
History
Date                   User            Action    Args
2022-04-11 14:58:22    admin           set       github: 69586
2017-03-31 16:36:30    dstufft         set       pull_requests: + pull_request1034
2016-09-18 17:18:17    berker.peksag   set       status: open -> closed; resolution: fixed; stage: patch review -> resolved
2016-09-18 17:17:29    python-dev      set       nosy: + python-dev; messages: + msg276900
2016-09-18 16:01:20    berker.peksag   set       files: + issue25400_v3.diff
2016-09-18 15:36:23    berker.peksag   set       files: + issue25400_v2.diff; messages: + msg276897; versions: + Python 3.7
2016-09-11 11:55:50    berker.peksag   set       messages: + msg275776
2015-10-14 18:35:14    pwirtz          set       files: + robotparser_crawl_delay_v2.patch; messages: + msg253017
2015-10-14 18:22:35    berker.peksag   set       messages: + msg253016; stage: patch review
2015-10-14 18:16:25    pwirtz          set       messages: + msg253015
2015-10-14 09:10:17    berker.peksag   set       nosy: + berker.peksag
2015-10-14 01:25:01    pwirtz          set       messages: + msg252972
2015-10-14 01:21:42    pwirtz          create