ArchiveBot, an IRC bot for archiving websites
ArchiveTeam/ArchiveBot
1. ArchiveBot

    <SketchCow> Coders, I have a question.
    <SketchCow> Or, a request, etc.
    <SketchCow> I spent some time with xmc discussing something we could do to make things easier around here.
    <SketchCow> What we came up with is a trigger for a bot, which can be triggered by people with ops.
    <SketchCow> You tell it a website. It crawls it. WARC. Uploads it to archive.org. Boom.
    <SketchCow> I can supply machine as needed.
    <SketchCow> Obviously there's some sanitation issues, and it is root all the way down or nothing.
    <SketchCow> I think that would help a lot for smaller sites
    <SketchCow> Sites where it's 100 pages or 1000 pages even, pretty simple.
    <SketchCow> And just being able to go "bot, get a sanity dump"

2. More info

    ArchiveBot has two major backend components: the control node, which
    runs the IRC interface and bookkeeping programs, and the crawlers, which
    do all the Web crawling. ArchiveBot users communicate with ArchiveBot
    by issuing commands in an IRC channel.

    User's guide: http://archivebot.readthedocs.org/en/latest/
    Control node installation guide: INSTALL.backend
    Crawler installation guide: INSTALL.pipeline

3. Local use

    ArchiveBot was originally written as a set of separate programs for
    deployment on a server. This means it has a poor distribution story.

    However, Ivan Kozik (@ivan) has taken the ArchiveBot pipeline,
    dashboard, ignores, and control system and created a package intended for
    personal use. You can find it at
    https://github.com/ArchiveTeam/grab-site.

4. License

    Copyright 2013 David Yip; made available under the MIT license. See
    LICENSE for details.

5. Acknowledgments

    Thanks to Alard (@alard), who added WARC generation and Lua scripting to
    GNU Wget. Wget+lua was the first web crawler used by ArchiveBot.

    Thanks to Christopher Foo (@chfoo) for wpull, ArchiveBot's current web
    crawler.

    Thanks to Ivan Kozik (@ivan) for maintaining ignore patterns and
    tracking down performance problems at scale.

    Other thanks go to the following projects:

    * Celluloid <http://celluloid.io/>
    * Cinch <https://github.com/cinchrb/cinch/>
    * CouchDB <http://couchdb.apache.org/>
    * Ember.js <http://emberjs.com/>
    * Redis <http://redis.io/>
    * Seesaw <https://github.com/ArchiveTeam/seesaw-kit>

6. Special thanks

    Dragonette, Barnaby Bright, Vienna Teng, NONONO.

    The memory hole of the Web has gone too far.
    Don't look down, never look away; ArchiveBot's like the wind.

vim:ts=2:sw=2:tw=72:et