layout-analysis

An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.

python ocr latex pytorch latex-pdf math-formula layout-analysis math-ocr mathpix table-ocr math-formula-recognition image-to-markdown

Updated Jul 25, 2025
Jupyter Notebook

UglyToad/PdfPig

Star 2.3k

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdf csharp pdfbox netstandard pdf-files pdf-document pdf-generation hocr document-analysis pdf-extractor alto-xml page-xml layout-analysis pdf-document-processor

Updated Dec 7, 2025
C#

kotaro-kinoshita/yomitoku

Star 1.2k

YomiToku is a Python package that provides an AI-powered document image analysis engine designed specifically for the Japanese language.

python ocr deep-learning pytorch layout-analysis

Updated Dec 17, 2025
Python

mittagessen/kraken

Star 921

OCR engine for all the languages

ocr neural-networks hocr optical-character-recognition htr handwritten-text-recognition alto-xml page-xml layout-analysis

Updated Dec 13, 2025
Python

BobLd/DocumentLayoutAnalysis

Star 627

Document layout analysis resources for development with PdfPig.

pdf csharp hocr tei hocr-documents alto-xml table-extraction page-xml alto layout-analysis document-layout-analysis xycut docstrum pdfpig xy-cut recursive-xy-cut page-segmentation

Updated Oct 1, 2023
C#

mindspore-lab/mindocr

Star 293

A toolbox of OCR models and algorithms based on MindSpore

ocr deep-learning text-recognition text-detection layout-analysis crnn dbnet table-recognition mindspore key-information-extraction layoutxlm ocr-large-model tablemaster vary-toy

Updated Jul 24, 2025
Python

RapidAI/RapidLayout

Star 256

Analysis of Chinese and English layouts

layout layout-analysis cdla pp-structure doclayout-yolo

Updated Aug 6, 2025
Python

RapidAI/RapidDocEx

Star 207

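📝 Extracts content from document images and reproduces each image one-to-one in Word or TXT for further use or processing. Planned: accept PDF or image input and output the corresponding JSON, TXT, Word, and Markdown formats.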

layout-analysis layout-recover

Updated Nov 1, 2024
Python

ppaanngggg/yolo-doclaynet

Star 145

YOLO models trained on DocLayNet: power your document intelligence with layout analysis

yolo document-analysis layout-analysis ultralytics yolov8 doclaynet

Updated Aug 3, 2025
Python

andreagemelli/doc2graph

Star 135

Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks.

nlp deep-learning pytorch layout-analysis geometric-deep-learning table-detection gnn document-understanding key-information-extraction

Updated Oct 18, 2025
Jupyter Notebook

xushengfeng/eSearch-OCR

Star 106

A Node.js library based on PaddleOCR

nodejs ocr layout-analysis onnx paddleocr

Updated Aug 23, 2025
TypeScript

NormXU/Layout2Graph

Star 81

An official implementation of the paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis"

layout-analysis gnn-framework

Updated Oct 14, 2023
Python

CycloneBoy/pdf_table

Star 55

A Unified Toolkit for Deep Learning-Based Table Extraction

pdf ocr ai table layout-analysis pdf-to-html table-recognition document-parsing

Updated Nov 21, 2024
Python

JPLeoRX/detectron2-publaynet

Star 50

Trained Detectron2 object detection models for document layout analysis, based on the PubLayNet dataset

python machine-learning computer-vision deep-learning neural-network python3 pytorch artificial-intelligence neural-networks faster-rcnn document-classification object-detection document-analysis document-layout instance-segmentation layout-analysis document-layout-analysis detectron2 publaynet

Updated Apr 16, 2023
Python

MaitySubhajit/SelfDocSeg

Star 42

[ICDAR 2023] SelfDocSeg: A self-supervised vision-based approach towards Document Segmentation (Oral)

computer-vision layout-analysis self-supervised-learning document-segmentation

Updated Oct 6, 2023
Python

empressabyss/nordrassil

Star 36

A keyboard layout that provides an elegant and balanced typing experience by its use of a thumb-alpha, emphasis on middle fingers, deprioritisation of pinkies, and arcane keys.

warcraft layouts keyboard-layout qmk keyboards dactyl layout-analysis arcane arcane-key

Updated Nov 7, 2025

dell-research-harvard/HJDataset

Star 34

A Large Dataset of Historical Japanese Documents with Complex Layouts

python dataset layout-analysis detectron2

Updated Jul 22, 2022
Jupyter Notebook
