DataCrawl-AI/datacrawl

Tiny Web Crawler

CI | Coverage | Stable Version | License: MIT | Download Stats | Discord

A simple and efficient web crawler for Python.

Features

  • Crawl web pages and extract links starting from a root URL recursively
  • Concurrent workers and custom delay
  • Handle relative and absolute URLs
  • Designed with simplicity in mind, making it easy to use and extend for various web crawling tasks
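As an illustration of what handling relative and absolute URLs involves, here is a minimal sketch built only on the standard library's urllib.parse. It is not taken from tiny-web-crawler's source; the helper name normalize_link and the choice to drop fragments and non-HTTP(S) links are assumptions made for this example.

```python
from typing import Optional
from urllib.parse import urldefrag, urljoin, urlparse


def normalize_link(base_url: str, href: str) -> Optional[str]:
    """Resolve href against base_url (hypothetical helper, not the library's API).

    Returns an absolute HTTP(S) URL without its fragment, or None when the
    link uses another scheme such as mailto: or javascript:.
    """
    # urljoin resolves relative links and leaves absolute links unchanged.
    absolute = urljoin(base_url, href)
    # Drop "#section" anchors so the same page is not treated as two URLs.
    absolute, _fragment = urldefrag(absolute)
    if urlparse(absolute).scheme not in ("http", "https"):
        return None
    return absolute


if __name__ == "__main__":
    print(normalize_link("https://github.com/solutions/", "ci-cd"))
    print(normalize_link("https://github.com/solutions/ci-cd", "/pricing#plans"))
    print(normalize_link("https://github.com", "mailto:someone@example.com"))
```

A crawler would typically apply a step like this to every href found on a page before deciding whether the link has been seen.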

Installation

Install using pip:

pip install tiny-web-crawler

Usage

from tiny_web_crawler import Spider
from tiny_web_crawler import SpiderSettings

settings = SpiderSettings(root_url='http://github.com', max_links=2)

spider = Spider(settings)
spider.start()

# Set workers and delay (default: delay is 0.5 sec and verbose is True)
# If you do not want delay, set delay=0
settings = SpiderSettings(root_url='https://github.com', max_links=5, max_workers=5, delay=1, verbose=False)

spider = Spider(settings)
spider.start()
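To give a rough idea of what the max_workers and delay settings control, the sketch below runs a fetch function on a thread pool and pauses after each request. It is a conceptual illustration only and does not reproduce how tiny-web-crawler schedules work internally; fetch_all and polite_fetch are hypothetical names, and the fetch callable is supplied by the caller.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_all(
    urls: List[str],
    fetch: Callable[[str], str],
    max_workers: int = 5,
    delay: float = 0.5,
) -> Dict[str, str]:
    """Fetch every URL on a thread pool (hypothetical helper for illustration)."""

    def polite_fetch(url: str) -> str:
        result = fetch(url)
        # Each worker pauses after a request before it picks up the next URL.
        time.sleep(delay)
        return result

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs URL and result.
        return dict(zip(urls, pool.map(polite_fetch, urls)))


if __name__ == "__main__":
    # A stand-in fetch function so the example runs without network access.
    pages = fetch_all(["a", "b", "c"], fetch=str.upper, max_workers=2, delay=0)
    print(pages)
```

With delay=0 the pause disappears entirely, which matches the note in the usage example about disabling the delay.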

Output Format

Crawled output sample for https://github.com

{
    "http://github.com": {
        "urls": [
            "http://github.com/",
            "https://githubuniverse.com/",
            "..."
        ]
    },
    "https://github.com/solutions/ci-cd": {
        "urls": [
            "https://github.com/solutions/ci-cd/",
            "https://githubuniverse.com/",
            "..."
        ]
    }
}
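A result in this shape can be post-processed with plain dictionary operations. The sketch below assumes each crawled page is a top-level key whose value holds a "urls" list, and it works on an in-memory dict rather than on any particular output file. The helper names unique_links and uncrawled_links are hypothetical.

```python
from typing import Dict, List, Set

# Assumed shape: {page_url: {"urls": [link, ...]}, ...}
CrawlResult = Dict[str, Dict[str, List[str]]]


def unique_links(result: CrawlResult) -> Set[str]:
    """Collect every distinct link found across all crawled pages."""
    found: Set[str] = set()
    for page in result.values():
        found.update(page.get("urls", []))
    return found


def uncrawled_links(result: CrawlResult) -> List[str]:
    """Links that were discovered but are not themselves keys of the result."""
    return sorted(unique_links(result) - set(result))


if __name__ == "__main__":
    sample: CrawlResult = {
        "http://github.com": {
            "urls": ["http://github.com/", "https://githubuniverse.com/"]
        },
        "https://github.com/solutions/ci-cd": {
            "urls": [
                "https://github.com/solutions/ci-cd/",
                "https://githubuniverse.com/",
            ]
        },
    }
    print(len(unique_links(sample)))
    print(uncrawled_links(sample))
```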

Contributing

Thank you for considering contributing.

  • If you are a first-time contributor, you can pick a good-first-issue and get started.
  • Please feel free to ask questions.
  • Before starting work on an issue, please get it assigned to you so that multiple people do not end up working on the same issue.
  • We are working toward our first major release. Please check this issue and see if anything interests you.

Dev setup

  • Install poetry on your system: pipx install poetry
  • Clone the repo you forked
  • Create a venv or use poetry shell
  • Run poetry install --with dev
  • pre-commit install (see)
  • pre-commit install --hook-type pre-push

Before raising a PR, please make sure you have these checks covered:

  • An issue exists or has been created that the PR addresses
  • Tests are written for the changes
  • All lint checks and tests pass
