Making a reusable toolkit for writing seesaw scripts
ArchiveTeam/seesaw-kit
An asynchronous toolkit for distributed web processing. Written in Python and named after its behavior, it supports concurrent downloads, uploads, etc.
This toolkit is well known for Archive Team projects. It also powers the Archive Team warrior.
Requires Python 2 or 3.
Needs the Tornado library for event-driven I/O. The complete list of required Python modules is in requirements.txt.
To run the example pipeline:

    sudo pip install -r requirements.txt
    ./run-pipeline --help
    ./run-pipeline examples/example-pipeline.py someone

Point your browser to http://127.0.0.1:8001/.
You can also use `run-pipeline2` or `run-pipeline3` to be explicit about the Python version.
General idea: a set of `Task`s that can be combined into a `Pipeline` that processes `Item`s:
- An `Item` is a thing that needs to be downloaded (a user, for example). It has properties that are filled by the `Task`s.
- A `Task` is a step in the download process: it takes an item, does something with it and passes it on. Example Tasks: getting an item name from the tracker, running a download script, rsyncing the result, notifying the tracker that it's done.
- A `Pipeline` represents a sequence of `Task`s. To make a seesaw script for a new project you'd specify a new `Pipeline`.
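The relationship between the three concepts can be sketched in a few lines of plain Python. This is only an illustrative model, not seesaw's own classes: `ToyItem`, `ToyTask`, `ToyPipeline`, the two step functions and the `data/` path are hypothetical names invented for this sketch, and the real toolkit runs its tasks asynchronously rather than in a simple loop.

```python
class ToyItem(dict):
    """A bag of properties that tasks fill in as the item moves along."""


class ToyTask:
    """One step: receives an item, does something with it, passes it on."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def process(self, item):
        self.func(item)
        # Record which steps the item has passed through.
        item.setdefault("log", []).append(self.name)
        return item


class ToyPipeline:
    """An ordered sequence of tasks applied to each item."""

    def __init__(self, *tasks):
        self.tasks = tasks

    def run(self, item):
        for task in self.tasks:
            item = task.process(item)
        return item


def get_item_name(item):
    # Stand-in for asking a tracker which item to work on.
    item["item_name"] = "someone"


def download(item):
    # Stand-in for running a download script for this item.
    item["path"] = "data/%s.warc.gz" % item["item_name"]


pipeline = ToyPipeline(
    ToyTask("GetItemName", get_item_name),
    ToyTask("Download", download),
)
result = pipeline.run(ToyItem())  # the pipeline is fed an empty item
print(result["item_name"], result["path"], result["log"])
```

The item starts empty and leaves the pipeline carrying whatever each step stored on it, which is the sense in which an item's properties are "filled by the tasks".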
A `Task` can work on multiple `Item`s at a time (e.g., multiple Wget downloads). The concurrency can be limited by wrapping the task in a `LimitConcurrency` `Task`: this will queue the items and run them one-by-one (e.g., a single Rsync upload).
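The queueing behaviour of such a wrapper can be approximated with a semaphore. The sketch below uses the standard library's `asyncio` rather than Tornado or seesaw's own wrapper, and `fake_upload`, `limited` and `main` are hypothetical helpers written for this illustration only.

```python
import asyncio


async def fake_upload(item, state):
    # Stand-in for an rsync upload; tracks how many uploads overlap.
    state["running"] += 1
    state["peak"] = max(state["peak"], state["running"])
    await asyncio.sleep(0.01)
    state["running"] -= 1
    item["uploaded"] = True


async def limited(semaphore, task, item, state):
    # Items wait here until a slot is free, then run the wrapped task.
    async with semaphore:
        await task(item, state)


async def main(limit, n_items):
    semaphore = asyncio.Semaphore(limit)  # created inside the running loop
    state = {"running": 0, "peak": 0}
    items = [{"item_name": "item-%d" % i} for i in range(n_items)]
    await asyncio.gather(
        *(limited(semaphore, fake_upload, item, state) for item in items)
    )
    return items, state


items, state = asyncio.run(main(limit=1, n_items=5))
print(state["peak"], all(item["uploaded"] for item in items))
```

With `limit=1` all five items are submitted at once, but no two uploads ever overlap. Raising the limit would allow that many to run side by side.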
The `Pipeline` needs to be fed empty `Item` objects; by controlling the number of active `Item`s you can limit how many items are processed at once. (For example, add a new item each time an item leaves the pipeline.)
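The feeding rule in the parenthesis can be shown with a small deterministic simulation. It assumes every item needs the same fixed number of steps, which real downloads do not. `run_with_item_limit` and its parameters are hypothetical and exist only for this sketch.

```python
from collections import deque


def run_with_item_limit(total_items, max_active, steps_per_item=3):
    """Feed empty items into a toy pipeline, keeping at most max_active in flight."""
    active = deque()
    started = finished = 0
    peak = 0
    while finished < total_items:
        # Top up: feed new empty items until the active limit is reached.
        while len(active) < max_active and started < total_items:
            active.append({"id": started, "steps_left": steps_per_item})
            started += 1
        peak = max(peak, len(active))
        # Advance every active item by one step.
        for item in active:
            item["steps_left"] -= 1
        # Finished items leave the pipeline, freeing a slot for the next one.
        while active and active[0]["steps_left"] == 0:
            active.popleft()
            finished += 1
    return finished, peak


finished, peak = run_with_item_limit(total_items=5, max_active=2)
print(finished, peak)
```

All five items are processed, yet the number in flight never exceeds the chosen limit, because a new item is only added when a slot has been freed.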
With the `ItemValue`, `ItemInterpolation` and `ConfigValue` classes it is possible to pass item-specific arguments to the `Task` objects. The value of these objects will be re-evaluated for each item. Examples: a path name that depends on the item name, a configurable bandwidth limit, the number of concurrent downloads.
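The idea of re-evaluating a value for each item can be modelled as placeholder objects that are only resolved once an item is available. The classes below are toy stand-ins and do not reproduce the signatures of seesaw's `ItemValue`, `ItemInterpolation` or `ConfigValue`. The `resolve` method, the `%`-style template and the example URLs are assumptions made for this sketch.

```python
class LazyItemValue:
    """Looks up one property of the item at evaluation time."""

    def __init__(self, key):
        self.key = key

    def resolve(self, item):
        return item[self.key]


class LazyInterpolation:
    """Fills a %-style template from the item's properties."""

    def __init__(self, template):
        self.template = template

    def resolve(self, item):
        return self.template % item


class LazyConfigValue:
    """Holds a setting that may be changed between items."""

    def __init__(self, value):
        self.value = value

    def resolve(self, item):
        return self.value


def resolve_args(args, item):
    # Plain strings pass through; placeholders are evaluated against the item.
    return [a.resolve(item) if hasattr(a, "resolve") else a for a in args]


bandwidth = LazyConfigValue("100k")
args = [
    "wget", "--limit-rate", bandwidth,
    "-O", LazyInterpolation("%(item_name)s.html"),
    LazyItemValue("url"),
]

first = resolve_args(args, {"item_name": "alice", "url": "http://example.com/alice"})
bandwidth.value = "50k"  # the setting is changed while the pipeline is running
second = resolve_args(args, {"item_name": "bob", "url": "http://example.com/bob"})
print(first)
print(second)
```

The argument list is written once, but each item gets its own output path and URL, and the second item picks up the new bandwidth setting because the value is read at resolution time rather than at definition time.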
Consult the wiki for more information.