scraper

AIHawk aims to ease the job hunt by automating the job application process. Using artificial intelligence, it enables users to apply for multiple jobs in a tailored way.

python resume bot agent chrome scraper automation job scraping selenium jobs artificial-intelligence automate jobseeker gpt jobsearch human-resources chatgpt opeai application-resume

Updated May 28, 2025
Python

gocolly/colly

Star 24.7k

Elegant Scraper and Crawler Framework for Golang

go golang crawler scraper framework spider scraping crawling

Updated Sep 30, 2025
Go

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

nodejs javascript npm crawler scraper automation typescript web-crawler headless scraping crawling web-scraping web-crawling headless-chrome apify puppeteer playwright

Updated Oct 7, 2025
TypeScript

codelucas/newspaper

Star 14.8k

newspaper3k is a library for news, full-text, and article metadata extraction in Python 3.

python crawler scraper news crawling news-aggregator

Updated Aug 14, 2025
HTML

Evil0ctal/Douyin_TikTok_Download_API

Star 14.5k

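🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.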

python api crawler scraper spider async web-scraping douyin tiktok fastapi tiktok-scraper tiktok-api douyin-api pywebio tiktok-signature no-watermark online-parsing douyin-tiktok-api douyin-tiktok-download douyin-scraper

Updated Oct 2, 2025
Python

getmaxun/maxun

Star 13.7k

⚡ Easiest no code web data extraction platform • Instantly turn any website into API or spreadsheet ⚡

api scraper automation browser web-scraper self-hosted web-scraping data-extraction webscraping agents hacktoberfest browser-automation no-code web-automation rpa robotic-process-automation playwright hacktoberfest-accepted no-code-web-scraper

Updated Oct 6, 2025
TypeScript

pwxcoo/chinese-xinhua

Star 11.4k

📙 Chinese Xinhua Dictionary database. Includes xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.

json data scraper json-data python3 chinese chinese-nlp chinese-characters chinese-simplified chinese-traditional json-dataset chinese-language

Updated Dec 26, 2023
Python

guyueyingmu/avbook

Star 9.8k

AV movie management system; crawlers for avmoo, javbus, and javlibrary; online AV video library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database

crawler scraper laravel database spider magnet-link guzzlehttp magnet adult javbus javlibrary avmoo adult-video

Updated Jun 1, 2024
PHP

TeamWiseFlow/wiseflow

Star 7.8k

Use LLMs to dig out what you care about from massive amounts of information and a variety of sources daily.

crawler scraper information-gathering focus-stacking llm

Updated Oct 2, 2025
Python

alirezamika/autoscraper

Star 7k

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

python crawler machine-learning scraper automation ai scraping artificial-intelligence web-scraping scrape webscraping webautomation

Updated Jun 9, 2025
Python

BruceDone/awesome-crawler

Star 7k

A collection of awesome web crawlers and spiders in different languages

crawler scraper awesome spider web-crawler web-scraper node-crawler

Updated Jun 16, 2024

apify/crawlee-python

Star 6.8k

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

python crawler scraper automation web-crawler headless scraping crawling pip web-scraping beautifulsoup web-crawling hacktoberfest headless-chrome apify playwright