
21.10. urllib.robotparser — Parser for robots.txt

This module provides a single class, RobotFileParser, which answers questions about whether or not a particular user agent can fetch a URL on the Web site that published the robots.txt file. For more details on the structure of robots.txt files, see http://www.robotstxt.org/orig.html.

class urllib.robotparser.RobotFileParser(url='')

This class provides methods to read, parse and answer questions about the robots.txt file at url.

set_url(url)

Sets the URL referring to a robots.txt file.

read()

Reads the robots.txt URL and feeds it to the parser.

parse(lines)

Parses the lines argument.
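
For example, a parser can be fed rules directly, without fetching a URL; the rules and site below are illustrative only, not from any real robots.txt file:

>>> import urllib.robotparser
>>> rp = urllib.robotparser.RobotFileParser()
>>> rp.parse(["User-agent: *", "Disallow: /private/"])
>>> rp.can_fetch("*", "http://example.com/private/page.html")
False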

can_fetch(useragent, url)

Returns True if the useragent is allowed to fetch the url according to the rules contained in the parsed robots.txt file.
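
The answer can differ per user agent. A short sketch, using made-up agent names and rules that block only one agent:

>>> import urllib.robotparser
>>> rp = urllib.robotparser.RobotFileParser()
>>> rp.parse(["User-agent: BadBot", "Disallow: /"])
>>> rp.can_fetch("BadBot", "http://example.com/")
False
>>> rp.can_fetch("GoodBot", "http://example.com/")
True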

mtime()

Returns the time the robots.txt file was last fetched. This is useful for long-running web spiders that need to check for new robots.txt files periodically.

modified()

Sets the time the robots.txt file was last fetched to the current time.
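
A minimal sketch of the periodic check mentioned under mtime(); the one-day threshold is an arbitrary choice, not part of the module:

>>> import time
>>> import urllib.robotparser
>>> rp = urllib.robotparser.RobotFileParser("http://www.musi-cal.com/robots.txt")
>>> rp.read()
>>> # Later, in a long-running spider:
>>> if time.time() - rp.mtime() > 24 * 60 * 60:  # arbitrary one-day threshold
...     rp.read()  # re-fetch and re-parse; this also refreshes mtime()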

The following example demonstrates basic use of the RobotFileParser class.

>>> import urllib.robotparser
>>> rp = urllib.robotparser.RobotFileParser()
>>> rp.set_url("http://www.musi-cal.com/robots.txt")
>>> rp.read()
>>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
False
>>> rp.can_fetch("*", "http://www.musi-cal.com/")
True
