data-extraction
Here are 983 public repositories matching this topic...
The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data 🔥
Updated Oct 7, 2025 - TypeScript
⚡ Easiest no code web data extraction platform • Instantly turn any website into API or spreadsheet ⚡
Updated Oct 6, 2025 - TypeScript
🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!
Updated Oct 6, 2025 - Python
Extract keywords from sentences or replace keywords in sentences.
Updated Apr 13, 2025 - Python
Converts a PDF file into a text file while keeping the layout of the original PDF. Useful, for instance, for extracting the content of a table in a PDF file. This is a subclass of the PDFTextStripper class (from the Apache PDFBox library).
Updated Dec 17, 2023 - Java
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Updated Dec 2, 2024 - Python
ContextGem: Effortless LLM extraction from documents
Updated Oct 1, 2025 - Python
A powerful Model Context Protocol (MCP) server that provides an all-in-one solution for public web access.
Updated Sep 24, 2025 - JavaScript
Lightweight library for scraping websites with LLMs
Updated Aug 25, 2025 - Python
A beginner-friendly yet powerful Python toolkit for financial analysis and automation — built to make modern investing accessible to everyone
Updated Sep 17, 2025 - Python
📰 Let ChatGPT Summarize Hacker News for You
Updated Sep 12, 2025 - Python
🚜 Parse text and tables from PDF files.
Updated Jan 22, 2025 - HTML
The agentic AI platform for enterprise. Built for availability, scalability, and security. Complete end-to-end context engineering and LLM orchestration infrastructure. Run anywhere - local, cloud, or bare metal.
Updated Oct 6, 2025 - Python
A lightweight, self-hosted headless browser automation platform. Designed as an alternative to Browserless, built for speed, privacy, and scalability.
Updated Sep 26, 2025 - JavaScript
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
Updated Oct 3, 2025 - Python
Local-first, open-source AI assistant for your data. Unify tasks, notes, docs, photos, and bookmarks. Private, self-hosted, and extensible via APIs.
Updated Oct 6, 2025 - TypeScript
Benchmarking PDF libraries
Updated Jul 2, 2025 - Python
Undetected web-scraping & seamless HTML parsing in Python!
Updated Jul 14, 2025 - Python
🤖 AI-powered web scraping editor with visual workflow builder. Build, test & deploy web scrapers using natural language. Powered by ScrapeGraphAI & LangGraph.
Updated Aug 18, 2025 - Python
A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.
Updated Mar 19, 2024 - Ruby