document-parser

Here are 59 public repositories matching this topic...

infiniflow/ragflow

Star 65.6k

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs

agent ai deep-learning mcp multi-agent openai document-parser ai-search rag document-understanding llm agentic retrieval-augmented-generation ollama deepseek graphrag agentic-workflow agentic-ai deepseek-r1 deep-research

Updated Oct 3, 2025
TypeScript

docling-project/docling

Star 40.8k

Get your documents ready for gen AI

html markdown pdf ai convert xlsx pdf-converter docx documents pptx pdf-to-text tables document-parser pdf-to-json document-parsing

Updated Oct 7, 2025
Python

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

nlp pdf machine-learning natural-language-processing information-retrieval ocr deep-learning ml docx preprocessing pdf-to-text data-pipelines donut document-image-processing document-parser pdf-to-json document-image-analysis llm document-parsing langchain

Updated Sep 26, 2025
HTML

freeok/so-novel

Star 4.5k

Novel download | Web fiction download | Online novels

cli ebook tui novel document-parser offline-reader content-export

Updated Oct 7, 2025
Java

Marker-Inc-Korea/AutoRAG

Star 4.3k

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

python open-source qa benchmarking ops pipeline analysis optimization evaluation embeddings automl document-parser rag llm retrieval-augmented-generation llm-ops llm-evaluation rag-evaluation

Updated Sep 28, 2025
Python

run-llama/llama_cloud_services

Star 4.2k

Knowledge Agents and Management in the Cloud

pdf parsing document pptx structured-data pdf-to-text pdf-to-excel tables docx-to-markdown document-parser pdf-document-processor pdf-to-json document-parsing ppt-to-json pdf-to-markdown ppt-to-markdown

Updated Oct 7, 2025
TypeScript

Filimoa/open-parse

Star 3.1k

Improved file parsing for LLMs

document-parser table-detection document-structure layout-parsing

Updated Nov 13, 2024
Python

deepdoctection/deepdoctection

Star 3k

A Repo For Document AI

python nlp ocr tensorflow pytorch document-parser document-layout-analysis table-recognition table-detection document-understanding publaynet layoutlm document-ai document-image-analysis pubtabnet

Updated Sep 15, 2025
Python

liweiphys/layra

Star 821

LAYRA—an enterprise-ready, out-of-the-box solution—unlocks next-generation intelligent systems powered by visual RAG and limitless visual multi-step agent workflow orchestration.

agent workflow knowledge-base document-parser fastapi llm gpt-4o colpali visual-rag

Updated Oct 7, 2025
TypeScript

NanoNets/docstrange

Star 706

Extract and convert data from any document, image, PDF, Word doc, PPT, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

markdown ocr ai structured-data tables pdf-parser document-parser structured-data-capture pdf-to-json llm document-parsing image-to-markdown pdf-to-markdown

Updated Sep 11, 2025
Python

opendataloader-project/opendataloader-pdf

Star 685

Safe, Open, High-Performance — PDF for AI

html markdown pdf json sdk recognition ai pdf-converter documents dataloader tables ocr-recognition document-parser pdf-to-html pdf-to-json document-parsing pdf-to-markdown

Updated Oct 7, 2025
Java

iamarunbrahma/vision-parse

Star 430

Parse PDFs into markdown using Vision LLMs

text-extraction pdf-parser document-parser pdf-to-markdown

Updated Oct 4, 2025
Python

GiftMungmeeprued/document-parsers-list

Star 157

A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each is tested for support of tables, equations, handwriting, two-column layouts, and multi-column layouts.

pdf ocr preprocessing pdf-to-text document-image-processing data-pipeline document-parser document-parsing langchain

Updated Jul 14, 2025

marieai/marie-ai

Star 72

Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing

python docker ocr pytorch omr optical-character-recognition optical-mark-recognition icr document-parser document-layout-analysis table-recognition table-detection publaynet intelligent-character-recognition intelligent-word-recognition iwr pubtabnet

Updated Oct 7, 2025
Python

JPLeoRX/opencv-text-deskew

Star 52

Tutorial on how to deskew (straighten) text images

python opencv tutorial computer-vision image-processing opencv-python deskew document-parser

Updated Mar 15, 2022
Python

papercast-dev/papercast

Star 52

A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, or PDF with GROBID and LangChain, and listen to them as a podcast. Customize your own pipelines.

python nlp pipeline podcast pdf-converter tts arxiv pdf-to-text dag document-parser pdf-document-processor grobid semantic-scholar document-parsing

Updated Mar 17, 2025
Python

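LianjiaTech/bella-domify

Star 49

Document Parser: supports PDF, TXT, DOC, DOCX, Markdown, and other file formats; efficiently extracts and parses content and generates a standard document tree structure. Built-in PDF Parser, Text Parser, and Word Parser support intelligent applications such as RAG, knowledge bases, and full-text search.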

parser pdf-parser document-parser

Updated Sep 17, 2025
Python

InvoiceableAI/Invoiceable

Star 40

The invoice, document, and resume parser powered by AI.

python resume ai experimental invoices invoice documents resume-parser resumes document-parser invoice-parser invoiceable

Updated Nov 22, 2024
Python

graphlit/graphlit

Star 23

Graphlit Platform

data natural-language-processing information-retrieval framework chatbot pdf-to-text copilot document-parser rag pdf-to-json vector-database llm graphlit

Updated Feb 20, 2024

decisionfacts/semantic-ai

Star 21

An open-source framework for a Retrieval-Augmented System (RAG) that uses semantic search to retrieve the expected results and generates human-readable conversational responses with the help of an LLM (Large Language Model).

pdf machine-learning ocr deep-neural-networks openai docx approximate-nearest-neighbor-search semantic-search document-parser rag fastapi vector-database inference-api openai-api llm retrieval-augmented-generation llama2

Updated Jul 19, 2024
Python
