Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.


internetarchive/heritrix3


Introduction

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or mispronounced as heratrix, heritix, heretix, or heratix) is an archaic word for heiress (a woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.

Crawl Operators!

Heritrix is designed to respect the robots.txt exclusion directives and META nofollow tags. Please consider the load your crawl will place on seed sites and set politeness policies accordingly. Also, always identify your crawl with contact information in the User-Agent so sites that may be adversely affected by your crawl can contact you or adapt their server behavior accordingly.
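To make the two requests above concrete, here is a minimal Java sketch of what a politeness delay and a self-identifying User-Agent could look like. It is not Heritrix's API: the class PolitenessSketch, its method names, the adaptive "last fetch time multiplied by a factor, clamped to a range" rule, and the example URL are all assumptions chosen for illustration. Consult the Heritrix documentation for the actual settings.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not part of Heritrix.
final class PolitenessSketch {

    // Assumed policy: wait (last fetch duration * factor) before contacting
    // the same host again, but never less than min and never more than max.
    static Duration politenessDelay(Duration lastFetch, double factor,
                                    Duration min, Duration max) {
        long scaled = Math.round(lastFetch.toMillis() * factor);
        long clamped = Math.max(min.toMillis(), Math.min(max.toMillis(), scaled));
        return Duration.ofMillis(clamped);
    }

    // Builds a User-Agent value that carries a contact URL, so an affected
    // site operator can reach whoever runs the crawl.
    static String userAgent(String crawlerName, String contactUrl) {
        if (contactUrl == null || contactUrl.trim().isEmpty()) {
            throw new IllegalArgumentException("a contact URL is required");
        }
        return "Mozilla/5.0 (compatible; " + crawlerName + " +" + contactUrl + ")";
    }

    public static void main(String[] args) {
        // Example values are assumptions, not recommended defaults.
        Duration delay = politenessDelay(Duration.ofMillis(2000), 5.0,
                Duration.ofSeconds(3), Duration.ofSeconds(30));
        System.out.println("Wait " + delay.toMillis() + " ms before the next request");
        System.out.println(userAgent("example-crawler", "https://example.org/crawl-info"));
    }
}
```

Scaling the delay by the previous response time means a slow or struggling server is automatically contacted less often, while the lower and upper bounds keep the crawl from hammering fast hosts or stalling on slow ones.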

Documentation

Developer Documentation

Latest Releases

Information about releases can be found here.

License

Heritrix is free software; you can redistribute it and/or modify it under the terms of the Apache License, Version 2.0.

Some individual source code files are subject to or offered under other licenses. See the included LICENSE.txt file for more information.

Heritrix is distributed with the libraries it depends upon. The libraries can be found under the lib directory in the release distribution, and are used under the terms of their respective licenses, which are included alongside the libraries in the lib directory.
