document-image-processing
Here are 19 public repositories matching this topic...
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
- Updated
Jul 18, 2025 - HTML
A Unified Toolkit for Deep Learning Based Document Image Analysis
- Updated
Aug 15, 2024 - Python
A comprehensive list of awesome document image rectification papers.
- Updated
Jun 15, 2025
The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.
- Updated
Jun 18, 2025 - Python
The official repo for “DocScanner: Robust Document Image Rectification with Progressive Learning”, IJCV, 2025.
- Updated
Jun 18, 2025 - Python
Detectron2 for Document Layout Analysis
- Updated
Aug 2, 2024 - Python
A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each tested for support of tables, equations, handwriting, two-column layouts, and multi-column layouts.
- Updated
Jul 14, 2025
The official code for “Geometric Representation Learning for Document Image Rectification”, ECCV, 2022.
- Updated
Jun 18, 2025 - Python
Document image processing tool, including bleaching, text orientation correction, sharpening, handwritten-note denoising and beautification, shadow removal, dewarping, and edge trimming enhancement (DocBleach / TextOrientationCorrection / DocSharpening / HandwritingDenoisingBeautifying / DocShadowRemoval / document_image_dewarping / DocTrimmingEnhancement).
- Updated
Aug 27, 2024 - Python
Android App for English Handwritten Text Recognition
- Updated
Sep 20, 2017 - Java
Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents
- Updated
May 5, 2022 - Python
Python wrapper to facilitate data manipulation for the SmartDoc 2015 - Challenge 1 Dataset.
- Updated
Jun 17, 2024 - Jupyter Notebook
The ScriptNet / competitions site.
- Updated
Dec 16, 2018 - Python
A web app for evaluating the quality of scanned document images
- Updated
Feb 1, 2024 - HTML
Complex-background image bleaching, text orientation correction, sharpening, handwritten-note denoising and beautification, shadow removal, distortion correction, black-spot removal, and edge trimming enhancement.
- Updated
May 23, 2024
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
- Updated
Mar 3, 2023 - HTML
This script automates extracting text from various file formats (images, PDFs, DOCX) using Optical Character Recognition (OCR) powered by Azure Cognitive Services. The script supports image preprocessing, text extraction, and uploading the processed files to Google Cloud Storage (GCS).
- Updated
Jan 30, 2025 - Python
Sophia Trikoupi dataset (Collection of 46 handwritten, annotated pages)
- Updated
Apr 29, 2019 - Python
- Updated
Jul 19, 2018