dwqs/ollama-ocr

Inspired by imanoop7/Ollama-OCR

Ollama OCR for web

A powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision language models through Ollama to extract text from images.

Supported Models

  • LLaVA: A multimodal model that combines a vision encoder with Vicuna for general-purpose visual and language understanding, with chat capabilities in the spirit of the multimodal GPT-4. (Note: LLaVA can sometimes produce incorrect output.)
  • Llama 3.2 Vision: Instruction-tuned models optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.
  • MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Quick Start

Prerequisites

  1. Install Ollama
  2. Pull the required models:

     ollama pull llama3.2-vision:11b
     ollama pull llava:13b
     ollama pull minicpm-v:8b
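To confirm that the models are available locally before starting the app, one option is to query the Ollama server's `GET /api/tags` endpoint, which lists installed models. The following is a minimal sketch and not part of this project: the helper names `missingModels` and `checkModels` are hypothetical, and the base URL assumes Ollama's default local port.

```typescript
// Relevant part of the response from Ollama's GET /api/tags endpoint.
interface TagsResponse {
  models: { name: string }[];
}

// Hypothetical helper: returns the required model tags that are not installed.
function missingModels(required: string[], tags: TagsResponse): string[] {
  const installed = new Set(tags.models.map((m) => m.name));
  return required.filter((name) => !installed.has(name));
}

// Hypothetical helper: asks a local Ollama server which of the README's
// models still need to be pulled. The default base URL is an assumption.
async function checkModels(baseUrl = "http://localhost:11434"): Promise<string[]> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama responded with HTTP ${res.status}`);
  }
  const tags = (await res.json()) as TagsResponse;
  return missingModels(["llama3.2-vision:11b", "llava:13b", "minicpm-v:8b"], tags);
}
```

An empty array from `checkModels()` would mean all three models are already pulled.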

Then run the following commands:

    git clone git@github.com:dwqs/ollama-ocr.git
    cd ollama-ocr
    yarn        # or: npm i
    yarn dev    # or: npm run dev

Docker Support

You can also run the demo with Docker: debounce/ollama-ocr

Examples

Input Image 1

input-image

Output Markdown

output-markdown.png

Input Image 2

input-image

Output JSON

output-json.png

Output Format Details

  • Markdown Format: The output is a markdown string containing the extracted text from the image.
  • Text Format: The output is a plain text string containing the extracted text from the image.
  • JSON Format: The output is a JSON object containing the extracted text from the image.
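As an illustration of how the three output formats could map onto requests to Ollama, here is a minimal sketch that builds a body for Ollama's `POST /api/generate` endpoint. The prompt wording and the names `PROMPTS`, `buildOcrRequest`, and `runOcr` are assumptions made for this example and may differ from what the project actually does. The `images`, `stream`, and `format` fields belong to the Ollama generate API.

```typescript
type OutputFormat = "markdown" | "text" | "json";

// Hypothetical prompts; the project's real prompts may be worded differently.
const PROMPTS: Record<OutputFormat, string> = {
  markdown: "Extract all text from this image and return it as Markdown.",
  text: "Extract all text from this image and return it as plain text.",
  json: "Extract all text from this image and return it as a JSON object.",
};

// Subset of the request body accepted by Ollama's POST /api/generate.
interface GenerateRequest {
  model: string;
  prompt: string;
  images: string[]; // base64-encoded image data
  stream: false;
  format?: "json";
}

// Builds the request body for one image and one output format.
function buildOcrRequest(
  model: string,
  imageBase64: string,
  format: OutputFormat,
): GenerateRequest {
  const req: GenerateRequest = {
    model,
    prompt: PROMPTS[format],
    images: [imageBase64],
    stream: false,
  };
  // Ask Ollama to constrain the reply to valid JSON only for the JSON format.
  if (format === "json") {
    req.format = "json";
  }
  return req;
}

// Sends the request to a local Ollama server (default URL is an assumption)
// and returns the generated text from the `response` field.
async function runOcr(
  model: string,
  imageBase64: string,
  format: OutputFormat,
  baseUrl = "http://localhost:11434",
): Promise<string> {
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildOcrRequest(model, imageBase64, format)),
  });
  if (!res.ok) {
    throw new Error(`Ollama responded with HTTP ${res.status}`);
  }
  const data = (await res.json()) as { response: string };
  return data.response;
}
```

With this sketch, `runOcr("llama3.2-vision:11b", imageBase64, "markdown")` would resolve to the Markdown string, while the `"json"` format returns a string that the caller would still need to pass through `JSON.parse`.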

License

MIT
