extract-data
Here are 275 public repositories matching this topic...
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
- Updated Dec 16, 2025 - Python
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
- Updated Dec 17, 2025 - Python
Web Crawler/Spider for NodeJS + server-side jQuery ;-)
- Updated May 28, 2025 - TypeScript
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
- Updated Dec 17, 2025 - Python
Open-source platform for extracting structured data from documents using AI.
- Updated May 15, 2025 - JavaScript
Crawly, a high-level web crawling & scraping framework for Elixir.
- Updated Jul 16, 2025 - Elixir
Extract structured data from websites. Website scraping.
- Updated Mar 7, 2023 - Go
A simple resume parser for extracting information from resumes.
- Updated Feb 7, 2024 - Python
Receipt scanner that extracts information from your PDF or image receipts, built in NodeJS.
- Updated Nov 18, 2018 - JavaScript
Turns webpages into LLM-friendly input text. Similar to Firecrawl and Jina Reader API. Makes RAG, AI web scraping, and image and webpage link extraction easy.
- Updated Nov 26, 2025 - Python
Extract data from .trace documents generated by Instruments
- Updated Sep 21, 2020 - Objective-C
Extract data from HTML tables.
- Updated May 1, 2020 - Python
An R package for acquisition and processing of NASA SMAP data
- Updated Nov 29, 2025 - R
Library and CLI for extracting data from HTML via CSS selectors.
- Updated Aug 22, 2025 - Go
FBLYZE is a Facebook scraping system and analysis system.
- Updated Apr 28, 2021 - Jupyter Notebook
Get lyrics for any song in less than 2 seconds by passing in the song name (spelled correctly or misspelled) with this Python library.
- Updated Jan 11, 2024 - Python
Extract and parse structured data with jQuery selectors, XPath, or JsonPath from common web formats such as HTML, XML, and JSON.
- Updated Jan 22, 2024 - Java
This program extracts insider trading data from the SEC website and stores it in an Excel file for the specified time frame.
- Updated Oct 5, 2022 - Python
A tool to replace data in a Unity Asset Bundle from modified files.
- Updated Apr 15, 2024 - C#