document-understanding
Here are 40 public repositories matching this topic...
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
- Updated Jul 18, 2025 - Python
A Repo For Document AI
- Updated Jul 17, 2025 - Python
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
- Updated May 30, 2025 - Python
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
- Updated Apr 9, 2025 - C++
A curated list of resources on the Document Understanding (DU) topic
- Updated Jun 2, 2023
Parsing-free RAG supported by VLMs
- Updated Feb 19, 2025 - Python
Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
- Updated Jul 25, 2024 - Python
Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
- Updated Oct 31, 2022 - Python
Sample applications and demos for Document AI, the end-to-end document processing platform on Google Cloud
- Updated Jul 18, 2025 - Jupyter Notebook
A Curated List of Awesome Table Structure Recognition (TSR) Research, including models, papers, datasets, and code. Continuously updated.
- Updated Sep 9, 2024
Algorithms, papers, datasets, and performance comparisons for Document AI. Continuously updated.
- Updated Mar 1, 2025
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
- Updated Apr 3, 2024 - Python
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models
- Updated Jan 13, 2025 - Jupyter Notebook
Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks.
- Updated May 23, 2023 - Jupyter Notebook
ReadingBank: A Benchmark Dataset for Reading Order Detection
- Updated Aug 26, 2024
Object Detection Model for Scanned Documents
- Updated Mar 6, 2025 - Jupyter Notebook
Checkbox Detection Model for Scanned Documents
- Updated Mar 6, 2025 - Jupyter Notebook
Datasets and Evaluation Scripts for CompHRDoc
- Updated Feb 25, 2025 - Python
[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.
- Updated Apr 7, 2025 - Python
TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning
- Updated Sep 17, 2024