ocr-python

Here are 495 public repositories matching this topic...

Languages: Python 349, Jupyter Notebook 80, HTML 19, JavaScript 8, TypeScript 7, CSS 4, Shell 4, C++ 3, Roff 3, Dart 2

Sort: Most stars

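hiroi-sora/Umi-OCR

Star 40.7k

Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning/generating QR codes. Includes built-in multi-language libraries.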

screenshot qt ocr qml ocr-python paddleocr umi-ocr

Updated Nov 20, 2025
Python

CnOCR: Awesome Chinese/English OCR Python toolkit based on PyTorch/MXNet. It comes with 20+ well-trained models for different application scenarios and can be used directly after installation.

ocr pytorch chinese-character-recognition ocr-python english-character-recognition

Updated Sep 21, 2025
Python

CatchTheTornado/text-extract-api

Star 3k

Document (PDF, Word, PPTX ...) extraction and parsing API using state-of-the-art OCR engines and Ollama-supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSON or Markdown.

api pdf json ocr extract anonymization pii ocr-python llm

Updated Dec 8, 2025
Python

hiroi-sora/Umi-OCR_v2

Star 948

An ending and a new beginning

qt ocr qml ocr-python paddleocr

Updated Nov 19, 2023
QML

ankandrew/fast-plate-ocr

Star 347

Lightweight & fast OCR models for license plate text recognition.

ocr tensorflow keras pytorch license-plate plate-recognition onnx license-plate-recognition jax ocr-python albumentations plate-ocr license-plate-reader keras3 license-plate-check license-plate-ocr

Updated Sep 23, 2025
Python

Psarpei/Multi-Type-TD-TSR

Star 282

Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition

nlp computer-science machine-learning natural-language-processing ocr computer-vision deep-learning algorithms machine-learning-algorithms image-processing nlp-machine-learning ocr-recognition computer-vision-algorithms ocr-python table-detection computer-vision-opencv table-detection-using-deep-learning table-structure-recognition

Updated Sep 5, 2022
Jupyter Notebook

maxent-ai/ocrpy

Star 223

OCR, Archive, Index and Search: an implementation-agnostic OCR framework.

python nlp aws information-retrieval ocr computer-vision deep-learning azure cv image-processing transformers tesseract-ocr google-vision-api semantic-search ocr-python

Updated Nov 3, 2023
Jupyter Notebook

MrZilinXiao/Hyper-Table-OCR

Star 178

A carefully designed OCR pipeline for universal bordered table recognition and reconstruction.

ocr deep-learning table-extraction ocr-python table-ocr

Updated Jan 10, 2023
C++

nathanaday/RealTime-OCR

Star 178

Perform text detection in a variety of languages with your computer webcam using Google Tesseract OCR and OpenCV. This script achieves a real-time OCR effect via multi-threading.

python ocr multithreading cv2 opencv-python pytesseract ocr-python

Updated Jan 30, 2023
Python

genieincodebottle/parsemypdf

Star 158

A collection of PDF parsing approaches, including AI-based tools (Docling, Claude, OpenAI, Gemini, Meta's Llama Vision, unstructured-io) and libraries such as pdfminer, pymupdf, and pdfplumber, for efficient snapshot, text, table, and metadata extraction.

ocr openai claude camelot pymupdf pypdf ocr-python markitdown gemini-pro gemini-ai llama-parse omniai unstructured-io docling llama-vision mistral-ocr smoldocling llama4

Updated Aug 29, 2025
Python

blueaxis/Cloe

Star 139

Manga OCR snipping application for desktop

ocr pyqt5 ocr-python snipping-tool manga-ocr

Updated Jan 7, 2023
Python

ilic5000/pabkvizgenerator

Star 127

Anansi is a Python-based crawler that combines computer vision (cv2 and FFmpeg) with OCR (EasyOCR and Tesseract) to find and extract questions and correct answers from video files of popular TV game shows in the Balkan region.

python opencv computer-vision tesseract quiz-game quiz-app ocr-python easyocr