document-parsing

Here are 47 public repositories matching this topic...

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 80+ languages.

  • Updated Oct 6, 2025
  • Python
docling

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

  • Updated Sep 26, 2025
  • HTML
ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

  • Updated Aug 27, 2025
  • Python

Extract and convert data from any document, image, PDF, Word document, PowerPoint file, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

  • Updated Sep 11, 2025
  • Python

Open-source spreadsheet platform for deep research and document processing

  • Updated Sep 25, 2025
  • TypeScript

A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each tested for support of tables, equations, handwriting, two-column layouts, and multi-column layouts.

  • Updated Jul 14, 2025
Documents-Parsing-Lab

Jupyter notebooks testing different OCR models for document parsing (Dolphin, MonkeyOCR, Marker, Nanonets, ...)

  • Updated Sep 18, 2025
  • Jupyter Notebook

A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, Semantic Scholar, or PDF with GROBID and LangChain, or listen to them as a podcast. Customize your own pipelines.

  • Updated Mar 17, 2025
  • Python

A Unified Toolkit for Deep Learning-Based Table Extraction

  • Updated Nov 21, 2024
  • Python

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

  • Updated Apr 7, 2023

Docling4j brings Docling's document-understanding functionality to Java® projects

  • Updated Mar 31, 2025
  • Java

Official implementation of our ECCVW paper "μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context"

  • Updated Aug 30, 2024
  • Python
ats

Applicant Tracking System (ATS): A powerful platform leveraging generative AI and soft-match algorithms to analyze resumes against job descriptions. Built with React and Node.js, it streamlines hiring insights. Future plans include expanding to investor pitches and other structured documents.

  • Updated Apr 15, 2025
  • JavaScript

Tool for converting First National Bank (FNB) bank statement PDFs into useful structured data

  • Updated Oct 31, 2024
  • Python

A collection of document parsers and hands-on examples for constructing structured data for your RAG applications.

  • Updated Aug 17, 2025
  • Python
