data-extraction
Here are 1,200 public repositories matching this topic...
🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data
- Updated Nov 29, 2025 - TypeScript
Turn any website into clean, contextualized data pipelines for your workflows
- Updated Nov 29, 2025 - TypeScript
🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!
- Updated Nov 26, 2025 - Python
Extract Keywords from sentence or Replace keywords in sentences.
- Updated Apr 13, 2025 - Python
Python package for scraping recipes data
- Updated Nov 26, 2025 - Python
ContextGem: Effortless LLM extraction from documents
- Updated Nov 16, 2025 - Python
A powerful Model Context Protocol (MCP) server that provides an all-in-one solution for public web access.
- Updated Nov 24, 2025 - JavaScript
Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (from the Apache PDFBox library).
- Updated Dec 17, 2023 - Java
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
- Updated Dec 2, 2024 - Python
A lightweight, self-hosted headless browser automation platform. Designed as an alternative to Browserless, built for speed, privacy, and scalability.
- Updated Oct 19, 2025 - JavaScript
Lightweight library for scraping web-sites with LLMs
- Updated Oct 15, 2025 - Python
A beginner-friendly yet powerful Python toolkit for financial analysis and automation — built to make modern investing accessible to everyone
- Updated Nov 14, 2025 - Python
📰 Let ChatGPT Summarize Hacker News for You
- Updated Sep 12, 2025 - Python
🚜 Parse text and tables from PDF files.
- Updated Nov 21, 2025 - HTML
Local-first, open-source AI assistant for your data. Unify tasks, notes, docs, photos, and bookmarks. Private, self-hosted, and extensible via APIs.
- Updated Nov 13, 2025 - TypeScript
🤖 AI-powered web scraping editor with visual workflow builder. Build, test & deploy web scrapers using natural language. Powered by ScrapeGraphAI & LangGraph.
- Updated Oct 26, 2025 - Python
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
- Updated Oct 27, 2025 - Python
Benchmarking PDF libraries
- Updated Jul 2, 2025 - Python
Undetected web-scraping & seamless HTML parsing in Python!
- Updated Jul 14, 2025 - Python