# document-data-extraction
Here are 2 public repositories matching this topic...
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
Topics: nlp, machine-learning, ocr, extraction, document, onprem, document-analysis, table-extraction, unstructured-data, rag, onpremise, llms, vlms, document-information-extraction, ocr-onpremise, document-data-extraction, onprem-vision, onprem-ocr, llm-ocr, ocr-benchmark
- Updated Aug 25, 2025 - Python
TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the visual template that the documents share.
- Updated May 29, 2025 - Python