document-parser
Here are 59 public repositories matching this topic...
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
- Updated Oct 3, 2025 - TypeScript
Get your documents ready for gen AI
- Updated Oct 7, 2025 - Python
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
- Updated Sep 26, 2025 - HTML
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
- Updated Sep 28, 2025 - Python
Knowledge Agents and Management in the Cloud
- Updated Oct 7, 2025 - TypeScript
Improved file parsing for LLMs
- Updated Nov 13, 2024 - Python
A Repo For Document AI
- Updated Sep 15, 2025 - Python
LAYRA—an enterprise-ready, out-of-the-box solution—unlocks next-generation intelligent systems powered by visual RAG and limitless visual multi-step agent workflow orchestration.
- Updated Oct 7, 2025 - TypeScript
Extract and convert data from any document, image, PDF, Word document, PowerPoint file, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.
- Updated Sep 11, 2025 - Python
Safe, Open, High-Performance — PDF for AI
- Updated Oct 7, 2025 - Java
Parse PDFs into markdown using Vision LLMs
- Updated Oct 4, 2025 - Python
A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each is tested for support of tables, equations, handwriting, two-column layouts, and multi-column layouts.
- Updated Jul 14, 2025
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing.
- Updated Oct 7, 2025 - Python
Tutorial on how to deskew (straighten) text images
- Updated Mar 15, 2022 - Python
A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, or PDF with GROBID and LangChain, or listen to them as a podcast. Customize your own pipelines.
- Updated Mar 17, 2025 - Python
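Document Parser: supports PDF, TXT, DOC, DOCX, Markdown, and other file formats; efficiently extracts and parses content and generates a standard document tree structure. Its built-in PDF Parser, Text Parser, and Word Parser support intelligent applications such as RAG, knowledge bases, and full-text search.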
- Updated Sep 17, 2025 - Python
The invoice, document, and resume parser powered by AI.
- Updated Nov 22, 2024 - Python
Graphlit Platform
- Updated Feb 20, 2024
An open-source framework for Retrieval-Augmented System (RAG) that uses semantic search to retrieve the expected results and generates human-readable conversational responses with the help of an LLM (Large Language Model).
- Updated Jul 19, 2024 - Python