This repository was archived by the owner on Jun 10, 2024. It is now read-only.
A Powerful Spider (Web Crawler) System in Python.
- Write scripts in Python
- Powerful WebUI with script editor, task monitor, project manager and result viewer
- MySQL, MongoDB, Redis, SQLite, Elasticsearch; PostgreSQL with SQLAlchemy as database backend
- RabbitMQ, Redis and Kombu as message queue
- Task priority, retry, periodic crawling, recrawl by age, etc.
- Distributed architecture, crawling of JavaScript pages, Python 2.{6,7} and 3.{3,4,5,6} support, etc.
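The task features listed above (priority, retry, recrawl by age) can be illustrated with a small, self-contained sketch. This is not pyspider's scheduler: `ToyScheduler`, `Task` and their fields are hypothetical names, and the conventions used here (a higher `priority` runs first, `age` is in seconds with `-1` meaning "never recrawl", failed tasks are re-queued immediately) are assumptions chosen for illustration.

```python
import heapq
import itertools
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Task:
    url: str
    priority: int = 0   # assumption: higher value is fetched earlier
    age: float = -1     # assumption: seconds a result stays fresh; -1 = never recrawl
    retries: int = 3    # assumption: maximum number of retries after a failure
    attempts: int = 0   # retries used so far


class ToyScheduler:
    """Hypothetical in-memory scheduler; not pyspider's implementation."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Task]] = []
        # Monotonic tie-breaker: keeps FIFO order within one priority level
        # and ensures Task objects themselves are never compared.
        self._counter = itertools.count()
        self._last_crawled: Dict[str, float] = {}

    def _push(self, task: Task) -> None:
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._heap, (-task.priority, next(self._counter), task))

    def submit(self, task: Task, now: Optional[float] = None) -> bool:
        """Queue a task unless its URL was crawled and is still fresh."""
        now = time.time() if now is None else now
        last = self._last_crawled.get(task.url)
        if last is not None:
            # Recrawl by age: skip if "never recrawl" or still within age.
            if task.age < 0 or now - last < task.age:
                return False
        self._push(task)
        return True

    def next_task(self) -> Optional[Task]:
        """Pop the highest-priority task, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def report(self, task: Task, ok: bool, now: Optional[float] = None) -> None:
        """Record a success, or re-queue a failure while retries remain."""
        now = time.time() if now is None else now
        if ok:
            self._last_crawled[task.url] = now
        elif task.attempts < task.retries:
            task.attempts += 1
            self._push(task)  # simplification: no delay before the retry
```

The counter in each heap entry is the design choice worth noting: it keeps equal-priority tasks in submission order and avoids comparing `Task` objects. A production scheduler would also de-duplicate URLs that are already queued and wait before retrying, both of which this sketch deliberately omits.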
Tutorial: http://docs.pyspider.org/en/latest/tutorial/
Documentation: http://docs.pyspider.org/
Release notes: https://github.com/binux/pyspider/releases
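```python
from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
    }

    # Re-run on_start every 24 * 60 minutes, i.e. once a day.
    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    # Requests handled by index_page are treated as valid for 10 days,
    # so the daily on_start does not refetch the page within that period.
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # Follow every link whose href starts with "http".
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    # The returned dict is stored as the result for this page.
    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
```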
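For readers without pyspider installed, the link-following and title extraction in the sample handler can be approximated with the standard library alone. This is a hypothetical stand-in, not how pyspider works: pyspider's `response.doc` is, as far as we know, a PyQuery object, whereas `LinkAndTitleParser` and `parse_page` are illustrative names built on `html.parser`. For reference, `@every(minutes=24 * 60)` works out to 1,440 minutes (once a day), and `age=10 * 24 * 60 * 60` to 864,000 seconds (10 days).

```python
from html.parser import HTMLParser
from typing import List


class LinkAndTitleParser(HTMLParser):
    """Collects hrefs starting with "http" and the <title> text of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self._title_parts: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Same prefix test as the CSS selector a[href^="http"];
            # relative links such as "/about" are skipped.
            if href and href.startswith("http"):
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    @property
    def title(self) -> str:
        return "".join(self._title_parts).strip()


def parse_page(url: str, html: str) -> dict:
    """Hypothetical helper returning a result dict like detail_page's, plus links."""
    parser = LinkAndTitleParser()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title, "links": parser.links}
```

Unlike a selector engine, this parser never builds a DOM; it streams over the tags once, which is enough for a flat prefix filter like `href^="http"` but not for nested or structural selectors.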
- `pip install pyspider`
- run the command `pyspider`, then visit http://localhost:5000/
WARNING: The WebUI is open to the public by default and can be used to execute arbitrary commands that may harm your system. Please use it only within an internal network, or enable `need-auth` for the WebUI.
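One way to act on the warning above is pyspider's JSON config file. The sketch below assumes, based on our reading of the pyspider command-line documentation, that `pyspider -c config.json` loads per-subcommand defaults and that the `webui` section accepts `username`, `password` and `need-auth`; please verify this against the documentation for your version. `webui_auth_config` and `write_config` are hypothetical helpers, and the credentials are placeholders.

```python
import json
from pathlib import Path


def webui_auth_config(username: str, password: str) -> dict:
    """Build a config dict that turns on WebUI authentication (assumed keys)."""
    return {
        "webui": {
            "username": username,
            "password": password,
            # Assumed to correspond to the `webui --need-auth` option.
            "need-auth": True,
        }
    }


def write_config(path: Path, config: dict) -> None:
    """Serialize the config as pretty-printed JSON."""
    path.write_text(json.dumps(config, indent=2))


if __name__ == "__main__":
    # Placeholder credentials: replace them before exposing the WebUI.
    write_config(Path("config.json"), webui_auth_config("admin", "change-me"))
```

With the file written, starting pyspider as `pyspider -c config.json` should make the WebUI ask for the configured username and password.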
Quickstart: http://docs.pyspider.org/en/latest/Quickstart/
- Use it
- Open an issue, send a PR
- User Group
- Chinese Q&A
- a visual scraping interface like Portia
Licensed under the Apache License, Version 2.0