scraping

AIHawk aims to ease the job hunt by automating the job application process. Using artificial intelligence, it enables users to apply for multiple jobs in a tailored way.

python resume bot agent chrome scraper automation job scraping selenium jobs artificial-intelligence automate jobseeker gpt jobsearch human-resources chatgpt opeai application-resume

Updated Nov 16, 2025
Python

gocolly/colly

Star 24.9k

Elegant Scraper and Crawler Framework for Golang

go golang crawler scraper framework spider scraping crawling

Updated Dec 4, 2025
Go

ScrapeGraphAI/Scrapegraph-ai

Star 22k

Python scraper based on AI

markdown crawler web-crawler scraping web-scraper web-scraping data-extraction webscraping web-data-extraction web-search ai-search rag web-data scraping-python web-crawlers llm ai-crawler large-language-model ai-scraping firecrawl-alternative

Updated Dec 13, 2025
Python

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

nodejs javascript npm crawler scraper automation typescript web-crawler headless scraping crawling web-scraping web-crawling headless-chrome apify puppeteer playwright

Updated Dec 17, 2025
TypeScript

soxoj/maigret

Star 18.2k

🕵️‍♂️ Collect a dossier on a person by username from thousands of sites

python cli open-source osint social-network scraping sherlock python3 cybersecurity identification infosec pentesting blueteam investigation reconnaissance redteam osint-framework socmint osint-python namechecker

Updated Dec 6, 2025
Python

psf/requests-html

Star 13.9k

Pythonic HTML Parsing for Humans™

python html http scraping requests kennethreitz beautifulsoup lxml css-selectors pyquery

Updated Apr 16, 2024
Python

ultrafunkamsterdam/undetected-chromedriver

Star 12.1k

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / Datadome / CloudFlare IUAM)

testing chrome automation webdriver browser captcha scraping selenium navigator python3 cloudflare chromedriver anti-bot bot-detection cloudflare-bypass distil anti-detection

Updated Jul 5, 2025
Python

code4craft/webmagic

Star 11.7k

A scalable web crawler framework for Java.

java crawler framework scraping

Updated Nov 10, 2025
Java

D4Vinci/Scrapling

Star 8.3k

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

python crawler data automation ai mcp scraping crawling web-scraper web-scraping selectors xpath data-extraction stealth webscraping crawling-python playwright web-scraping-python ai-scraping mcp-server

Updated Dec 18, 2025
Python

lorien/awesome-web-scraping

Star 7.7k

List of libraries, tools and APIs for web scraping and data processing.

crawler spider scraping crawling web-scraping captcha-recaptcha webscraping crawling-framework scraping-framework captcha-bypass scraping-tool crawling-tool scraping-python crawling-python

Updated Oct 13, 2025
Makefile

apify/crawlee-python

Star 7.3k

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

python crawler scraper automation web-crawler headless scraping crawling pip web-scraping beautifulsoup web-crawling hacktoberfest headless-chrome apify playwright