webscraper

Here are 1,518 public repositories matching this topic...

Languages: All 1,518 | Python 869 | JavaScript 160 | Jupyter Notebook 105 | HTML 60 | Go 54 | TypeScript 52 | Java 42 | C# 41 | PHP 17 | R 17

jaypyles/Scraperr

Star 4.7k

Self-hosted webscraper.

python docker kubernetes opensource helm scraping webscraper web-scraper self-hosted web-scraping web-scrapers webscraping playwright

Updated Oct 12, 2025
TypeScript

any4ai/AnyCrawl

Star 2.5k

AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.

data html-to-markdown scraping webscraper crawl scrape serp rag aitools ai-scraping

Updated Nov 24, 2025
TypeScript

anaskhan96/soup

Star 2.2k

Web Scraper in Go, similar to BeautifulSoup

go golang webscraper web-scraper beautifulsoup webscraping html-node

Updated Nov 2, 2023
Go

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

html cli http json scraper web rest command-line curl xml webscraper wget css-selector xpath xquery data-processing httpie webscraping datascraping xmlstarlet

Updated Feb 22, 2025
Pascal

scrapfly/scrapfly-scrapers

Star 771

Scalable Python web scraping scripts for 40+ popular domains

python crawler scraper automation spider web-crawler scraping crawling webscraper proxies web-scraping webscraping twitter-scraper datascraping python-scraper captcha-bypass antibot scraping-python crawling-python web-scraping-python

Updated Nov 26, 2025
Python

rootVIII/proxy_requests

Star 390

A class that uses scraped proxies to make HTTP GET/POST requests (Python requests)

python http proxy proxy-requests webscraper proxy-server http-proxy python3 recursion requests proxy-list webscraping python-requests http-getter recursion-problem http-proxy-middleware http-get requests-module webscraper-api

Updated Dec 3, 2020
Python

salimk/Rcrawler

Star 360

An R web crawler and scraper

crawler scraper r webscraper crawlers webcrawler webscraping webscrapping rpackage

Updated Mar 27, 2022
R

onepointAI/onepoint

Star 316

An AI assistant tool that integrates coding, writing, and reading functions. For better alternatives see https://monica.im/desktop

electron react macos ai toolkit webscraper reading coding all-in-one xiaoai-tts chatgpt gpt-35-turbo

Updated May 12, 2023
TypeScript

toby-p/rightmove_webscraper.py

Star 272

Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object

python data-science data-mining csv pandas-dataframe webscraper pandas python3 data-analysis rightmove

Updated Dec 27, 2023
Python

intergalacticalvariable/reader

Star 262

📚 This is an adapted version of Jina AI's Reader for local deployment using Docker. Convert any URL to an LLM-friendly input with a simple prefix: http://127.0.0.1:3000/https://website-to-scrape.com/

docker scraper proxy webscraper self-hosted webscraping website-screenshot website-screenshot-capturer rag webscraping-data llm

Updated Jul 18, 2025
TypeScript

serpapi/lego-ai-parser

Star 235

Lego AI Parser is an open-source application that uses OpenAI to parse the visible text of HTML elements.

python html parser machine-learning scraper tools ai parser-library parser-generator webscraper artificial-intelligence datascience webapp openai classification webscraping gpt-3

Updated Jun 10, 2024
Python

TBosak/mkfd

Star 215

RSS feed builder created with Bun 🥖 and Hono 🔥. Builds feeds from webpages, email folders, and REST API calls.

docker rss dockerfile scraper typescript webscraper self-hosted feed help-wanted dockerhub rss-generator hono bun rssfeed contributors-welcome honojs bunjs mkfd

Updated Nov 19, 2025
TypeScript

AliAkhtari78/SpotifyScraper

Star 200

Spotify scraper to extract all the information from Spotify and download MP3s with the cover of the song

python crawler scraper webscraper python3 free webscraping spotify-downloader spotify-web-player spotfiy infromation spotify-scraper spotify-scraping spotify-crawler preview-mp3 album-title spotify-songs

Updated Nov 24, 2025
Makefile

MichaelYochpaz/iSubRip

Star 189

A Python command-line tool for scraping and downloading subtitles from AppleTV and iTunes movie pages.

python scraper script itunes webscraper m3u8 subtitles appletv pypi-package

Updated Oct 14, 2025
Python

mehmetozkaya/DotnetCrawler

Star 179

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-w…

crawler csharp dotnetcore scraping crawling webscraper scrapy entity-framework-core webcrawler webscraping scrapy-crawler ddd-architecture htmlagilitypack webcrawling webcrawler-htmlagilitypack