This repository was archived by the owner on Jun 10, 2024. It is now read-only.
binux/pyspider

A Powerful Spider (Web Crawler) System in Python.

  • Write scripts in Python
  • Powerful WebUI with script editor, task monitor, project manager and result viewer
  • MySQL, MongoDB, Redis, SQLite, Elasticsearch; PostgreSQL with SQLAlchemy as database backend
  • RabbitMQ, Redis and Kombu as message queue
  • Task priority, retry, periodic crawling, recrawl by age, and more
  • Distributed architecture, JavaScript page crawling, support for Python 2.{6,7} and 3.{3,4,5,6}, and more
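
The "recrawl by age" feature listed above can be pictured as a simple freshness check: a page is fetched again only once its previous fetch is older than the configured age. The sketch below is a standalone illustration of that idea, not pyspider's scheduler code. The helper name `needs_recrawl` and the rule that a negative age means "never recrawl" are assumptions made for this example.

```python
import time


# Hypothetical helper, not part of pyspider: decides whether a page that was
# last fetched at `last_crawl_time` should be fetched again for a given age.
def needs_recrawl(last_crawl_time, age, now=None):
    """Return True if the previous fetch is more than `age` seconds old.

    A negative age is treated as "never recrawl" (an assumption of this sketch).
    """
    if now is None:
        now = time.time()
    if age < 0:
        return False
    return now - last_crawl_time > age


TEN_DAYS = 10 * 24 * 60 * 60  # the same arithmetic as age=10*24*60*60

now = 1_000_000_000  # fixed timestamp so the example is deterministic
fetched_an_hour_ago = needs_recrawl(now - 3600, TEN_DAYS, now=now)
fetched_long_ago = needs_recrawl(now - TEN_DAYS - 1, TEN_DAYS, now=now)
print(fetched_an_hour_ago, fetched_long_ago)
```

With an age of ten days, a page fetched an hour ago counts as still valid, while one fetched just over ten days ago is due for another fetch.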

Tutorial: http://docs.pyspider.org/en/latest/tutorial/
Documentation: http://docs.pyspider.org/
Release notes: https://github.com/binux/pyspider/releases

Sample Code

```python
from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
    }

    # Run on_start once every 24 hours.
    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    # Treat a fetched index page as valid for 10 days.
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # Follow every link whose href starts with "http".
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
```
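
To make the `a[href^="http"]` selector in `index_page` concrete, here is a minimal standard-library sketch of the same filtering step: keep only the links whose `href` begins with `http`. It does not use pyspider or its `response.doc` object. The class `AbsoluteLinkParser`, the function `absolute_links` and the sample HTML are hypothetical names and data introduced for illustration.

```python
from html.parser import HTMLParser


class AbsoluteLinkParser(HTMLParser):
    """Collect href values of <a> tags that start with "http"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Mirror the CSS prefix match a[href^="http"].
            if name == "href" and value and value.startswith("http"):
                self.links.append(value)
                break


def absolute_links(html):
    """Return all matching hrefs in document order."""
    parser = AbsoluteLinkParser()
    parser.feed(html)
    return parser.links


sample = (
    '<a href="http://example.com/a">A</a>'
    '<a href="/local">B</a>'
    '<a href="https://example.com/c">C</a>'
)
print(absolute_links(sample))
# prints ['http://example.com/a', 'https://example.com/c']
```

The relative link `/local` is skipped, which is why the sample handler only ever schedules absolute URLs for `detail_page`.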

Installation

WARNING: The WebUI is open to the public by default and can be used to execute any command, which may harm your system. Please use it only on an internal network, or enable need-auth for the WebUI.
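
As a rough sketch of what enabling need-auth could look like, the snippet below builds a JSON config with a `webui` section. The key names (`username`, `password`, `need-auth`), the file name `config.json` and the `pyspider -c config.json` invocation mentioned in the comments are assumptions drawn from memory of the project's deployment documentation, so verify them against the Documentation link before relying on them. The credentials are placeholders.

```python
import json

# Assumed layout: a "webui" section carrying the credentials and the
# "need-auth" switch. Values below are placeholders, not defaults.
config = {
    "webui": {
        "username": "some_name",
        "password": "some_passwd",
        "need-auth": True,
    }
}


def render_config(cfg):
    """Serialize the config dict to pretty-printed JSON text."""
    return json.dumps(cfg, indent=2)


# Save the printed text as config.json, then (assumed invocation):
#   pyspider -c config.json
print(render_config(config))
```

Keeping the credentials in a config file rather than on the command line also keeps them out of shell history.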

Quickstart:http://docs.pyspider.org/en/latest/Quickstart/

Contribute

TODO

v0.4.0

  • a visual scraping interface like portia

License

Licensed under the Apache License, Version 2.0
