table-extraction
Here are 80 public repositories matching this topic...
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
Updated Nov 8, 2025 - Python
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Updated Dec 17, 2025 - Python
A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 50+ formats. Available for Rust, Python, Ruby, Go, and TypeScript/Node.js—or use via CLI, REST API, or MCP server.
Updated Dec 17, 2025 - HTML
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
Updated Jun 24, 2024 - Python
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
Updated Aug 25, 2025 - Python
High-accuracy PDF-to-Markdown OCR API using LLMs with vision capabilities. Features parallel processing, batching, and auto-retry logic for scalable extraction.
Updated Nov 29, 2025 - Python
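The entry above lists parallel processing, batching, and auto-retry as its scaling features. A minimal sketch of that general pattern in Python, assuming a hypothetical `convert_page` callable (for example, one vision-LLM request per page). The helper names `with_retry`, `batched`, and `convert_pages` are illustrative only and are not that project's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retry(func, attempts=3, base_delay=0.5):
    """Wrap func so that failures are retried with exponential backoff."""
    def wrapper(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return func(*args, **kwargs)
            except Exception:
                if attempt == attempts - 1:
                    raise  # out of attempts: propagate the last error
                # With the defaults this waits 0.5 s, then 1 s, ...
                time.sleep(base_delay * 2 ** attempt)
    return wrapper


def batched(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def convert_pages(pages, convert_page, batch_size=4, workers=4,
                  attempts=3, base_delay=0.5):
    """Convert pages concurrently, one batch at a time, keeping page order.

    convert_page is a hypothetical per-page converter supplied by the caller.
    """
    safe_convert = with_retry(convert_page, attempts, base_delay)
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batched(pages, batch_size):
            # pool.map yields results in input order and re-raises errors
            results.extend(pool.map(safe_convert, batch))
    return results
```

Threads fit this workload because each page conversion is usually I/O-bound (waiting on a remote API); for CPU-bound local OCR, swapping in `ProcessPoolExecutor` would be the usual choice.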
img2table is a table identification and extraction Python library for PDFs and images, based on OpenCV image processing.
Updated Nov 9, 2025 - Python
Document Layout Analysis resource repos for development with PdfPig.
Updated Oct 1, 2023 - C#
Python library to extract tabular data from images and scanned PDFs
Updated Jul 30, 2024 - Python
A curated list of awesome Table Structure Recognition (TSR) research, including models, papers, datasets, and code. Continuously updated.
Updated Sep 9, 2024
Extract tables from PDF files (port of tabula-java)
Updated Mar 17, 2025 - C#
A carefully designed OCR pipeline for universal bordered table recognition and reconstruction.
Updated Jan 10, 2023 - C++
✂️ Extract Tables from Microsoft Word Documents with R
Updated Oct 2, 2021 - R
Best PDF Converter! PDF to any format, pdf2word/excel/xml/html/txt...
Updated Mar 11, 2021 - Python
CCKS2019 Evaluation Task 5: Public Company Announcement Information Extraction (3rd place)
Updated Sep 15, 2019 - Python
🔍 Table Extraction Tool: A powerful open-source solution combining OCR and computer vision for extracting structured tabular data from images. Ideal for LLM preprocessing, data analysis, and automation. 🚀
Updated Feb 22, 2025 - Python
Automated data extraction from engineering blueprint images.
Updated Aug 26, 2023 - Python
Parsee's PDF reader, specialized in extracting tables with numeric values and in accurately extracting and preserving text paragraphs. Full support for scans and images.
Updated Aug 29, 2025 - Python
A line-based framework to detect and extract tabular data in JSON format from raster images using computer vision and Tesseract OCR.
Updated Oct 6, 2025 - Python
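The line-based approach described above (and the bordered-table pipeline listed earlier) rests on one idea: consecutive ruling-line coordinates define a grid of cells, and each recognized word belongs to the cell that contains it. A minimal Python sketch, assuming line detection (for example with OpenCV morphology) and OCR (for example with Tesseract) have already produced plain coordinates. The function `grid_to_json` and its input format are hypothetical, not any listed project's API, and merged (spanning) cells are ignored.

```python
import json
from bisect import bisect_right


def cell_index(value, edges):
    """Return i such that edges[i] <= value < edges[i + 1], or None if outside."""
    i = bisect_right(edges, value) - 1
    if 0 <= i < len(edges) - 1:
        return i
    return None


def grid_to_json(h_lines, v_lines, words):
    """Build a JSON table from ruling-line positions and OCR word boxes.

    h_lines: y-coordinates of horizontal ruling lines
    v_lines: x-coordinates of vertical ruling lines
    words:   (text, x0, y0, x1, y1) boxes, e.g. from an OCR engine
    The first grid row is treated as the header.
    """
    ys, xs = sorted(h_lines), sorted(v_lines)
    n_rows, n_cols = len(ys) - 1, len(xs) - 1
    cells = [[[] for _ in range(n_cols)] for _ in range(n_rows)]
    # Visit words top-to-bottom, then left-to-right (rough reading order),
    # and assign each one to the cell containing the centre of its box.
    for text, x0, y0, x1, y1 in sorted(words, key=lambda w: (w[2], w[1])):
        r = cell_index((y0 + y1) / 2, ys)
        c = cell_index((x0 + x1) / 2, xs)
        if r is not None and c is not None:
            cells[r][c].append(text)
    rows = [[" ".join(parts) for parts in row] for row in cells]
    header, body = rows[0], rows[1:]
    return json.dumps([dict(zip(header, row)) for row in body])
```

Using box centres rather than corners keeps words that slightly overlap a ruling line in the right cell; words whose centre falls outside the grid are dropped.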