document-parser
Here are 40 public repositories matching this topic...
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
- Updated Mar 24, 2025 - TypeScript
Get your documents ready for gen AI
- Updated Mar 19, 2025 - Python
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
- Updated Mar 21, 2025 - HTML
Knowledge Agents and Management in the Cloud
- Updated Mar 22, 2025 - Python
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
- Updated Mar 3, 2025 - Python
Improved file parsing for LLMs
- Updated Nov 13, 2024 - Python
A Repo For Document AI
- Updated Mar 23, 2025 - Python
Parse PDFs into markdown using Vision LLMs
- Updated Feb 8, 2025 - Python
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing.
- Updated Mar 21, 2025 - Python
Tutorial on how to deskew (straighten) text images
- Updated Mar 15, 2022 - Python
A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, or PDF with GROBID and LangChain, or listen to them as a podcast. Customize your own pipelines.
- Updated Mar 17, 2025 - Python
The invoice, document, and resume parser powered by AI.
- Updated Nov 22, 2024 - Python
An OCR-based document parser to extract information from identity document images
- Updated Aug 25, 2022 - TypeScript
An open-source framework for Retrieval-Augmented System (RAG) that uses semantic search to retrieve the expected results and generates human-readable conversational responses with the help of an LLM (Large Language Model).
- Updated Jul 19, 2024 - Python
Resume Parsing app to extract information using AI
- Updated Jan 19, 2022 - Jupyter Notebook
Graphlit Platform
- Updated Feb 20, 2024
Build a RAG preprocessing pipeline
- Updated Apr 7, 2024 - Jupyter Notebook
Python client library for Graphlit Platform
- Updated Mar 16, 2025 - Python
Extract text from your DOCX documents.
- Updated Feb 10, 2024 - Python