document-parser
Here are 50 public repositories matching this topic...
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
- Updated
Jul 18, 2025 - Python
Get your documents ready for gen AI
- Updated
Jul 18, 2025 - Python
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
- Updated
Jul 18, 2025 - HTML
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
- Updated
Jul 4, 2025 - Python
Knowledge Agents and Management in the Cloud
- Updated
Jul 18, 2025 - Python
Improved file parsing for LLMs
- Updated
Nov 13, 2024 - Python
A Repo For Document AI
- Updated
Jul 17, 2025 - Python
LAYRA—an enterprise-ready, out-of-the-box solution—unlocks next-generation intelligent systems powered by visual RAG and limitless visual multi-step agent workflow orchestration.
- Updated
Jul 18, 2025 - TypeScript
Parse PDFs into markdown using Vision LLMs
- Updated
Feb 8, 2025 - Python
A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each tested for support of tables, equations, handwriting, two-column layouts, and multi-column layouts.
- Updated
Jul 14, 2025
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing.
- Updated
Jul 18, 2025 - Python
Tutorial on how to deskew (straighten) text images
- Updated
Mar 15, 2022 - Python
A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, or PDF with GROBID and LangChain, or listen to them as a podcast. Customize your own pipelines.
- Updated
Mar 17, 2025 - Python
The invoice, document, and resume parser powered by AI.
- Updated
Nov 22, 2024 - Python
An open-source framework for a Retrieval-Augmented System (RAG) that uses semantic search to retrieve the expected results and generates human-readable conversational responses with the help of an LLM (Large Language Model).
- Updated
Jul 19, 2024 - Python
An OCR-based document parser to extract information from identity document images
- Updated
Aug 25, 2022 - TypeScript
Graphlit Platform
- Updated
Feb 20, 2024
Resume Parsing app to extract information using AI
- Updated
Jan 19, 2022 - Jupyter Notebook
Novalad offers a unified, centralized platform enabling organizations to extract meaningful data and perform advanced processing at high speed.
- Updated
Jul 14, 2025 - Jupyter Notebook