robotstxt

Crates.io · Docs.rs · Apache 2.0

A native Rust port of Google's robots.txt parser and matcher C++ library.

  • Native Rust port with no third-party crate dependencies
  • Zero `unsafe` code
  • Preserves all behavior of the original library
  • API consistent with the original library
  • Passes 100% of Google's original test suite

Installation

```toml
[dependencies]
robotstxt = "0.3.0"
```

Quick start

```rust
use robotstxt::DefaultMatcher;

let mut matcher = DefaultMatcher::default();
let robots_body = "user-agent: FooBot\n\
                   disallow: /\n";
assert_eq!(
    false,
    matcher.one_agent_allowed_by_robots(robots_body, "FooBot", "https://foo.com/")
);
```
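A `disallow` rule scoped to a sub-path blocks only that subtree and leaves the rest of the site allowed. Below is a minimal sketch of this, reusing the same `one_agent_allowed_by_robots` call from above; the robots body and URLs are invented for illustration, and it assumes the matcher can be reused across calls, as in Google's C++ original:

```rust
use robotstxt::DefaultMatcher;

fn main() {
    let mut matcher = DefaultMatcher::default();
    // Invented example: disallow only the /private/ subtree for FooBot.
    let robots_body = "user-agent: FooBot\n\
                       disallow: /private/\n";

    // The /private/ subtree is blocked for FooBot...
    assert!(!matcher.one_agent_allowed_by_robots(robots_body, "FooBot", "https://foo.com/private/page"));
    // ...while paths outside it remain allowed.
    assert!(matcher.one_agent_allowed_by_robots(robots_body, "FooBot", "https://foo.com/"));
}
```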

About

Quoting the README from Google's robots.txt parser and matcher repo:

> The Robots Exclusion Protocol (REP) is a standard that enables website owners to control which URLs may be accessed by automated clients (i.e. crawlers) through a simple text file with a specific syntax. It's one of the basic building blocks of the internet as we know it and what allows search engines to operate.
>
> Because the REP was only a de-facto standard for the past 25 years, different implementers implement parsing of robots.txt slightly differently, leading to confusion. This project aims to fix that by releasing the parser that Google uses.
>
> The library is slightly modified (i.e. some internal headers and equivalent symbols) production code used by Googlebot, Google's crawler, to determine which URLs it may access based on rules provided by webmasters in robots.txt files. The library is released open-source to help developers build tools that better reflect Google's robots.txt parsing and matching.

Crate `robotstxt` aims to be a faithful conversion, from C++ to Rust, of Google's robots.txt parser and matcher.
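To make the robots.txt syntax concrete: a file is made of groups, one per `user-agent` line, and a crawler obeys the group that matches its name, falling back to the `*` group otherwise. The sketch below illustrates this with the same `one_agent_allowed_by_robots` call from the quick start; the rules, agent names, and URLs are invented for this example:

```rust
use robotstxt::DefaultMatcher;

fn main() {
    // Invented example: one group for FooBot, one wildcard group for everyone else.
    let robots_body = "user-agent: FooBot\n\
                       disallow: /tmp/\n\
                       \n\
                       user-agent: *\n\
                       disallow: /\n";
    let mut matcher = DefaultMatcher::default();

    // FooBot matches its own group, so only /tmp/ is off-limits to it.
    assert!(matcher.one_agent_allowed_by_robots(robots_body, "FooBot", "https://example.com/index.html"));
    assert!(!matcher.one_agent_allowed_by_robots(robots_body, "FooBot", "https://example.com/tmp/cache"));

    // BarBot has no group of its own and falls back to the wildcard group.
    assert!(!matcher.one_agent_allowed_by_robots(robots_body, "BarBot", "https://example.com/index.html"));
}
```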

Testing

```
$ git clone https://github.com/Folyd/robotstxt
Cloning into 'robotstxt'...
$ cd robotstxt/tests
...
$ mkdir c-build && cd c-build
...
$ cmake ..
...
$ make
...
$ make test
Running tests...
Test project ~/robotstxt/tests/c-build
    Start 1: robots-test
1/1 Test #1: robots-test ......................   Passed    0.33 sec
```

License

The robotstxt parser and matcher Rust library is licensed under the terms of the Apache license. See LICENSE for more information.
