# ollama-ocr
Inspired by imanoop7/Ollama-OCR.
A powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision language models through Ollama to extract text from images.
- LLaVA: A multimodal model that combines a vision encoder with Vicuna for general-purpose visual and language understanding, with chat capabilities in the spirit of the multimodal GPT-4. (Note: LLaVA can occasionally produce incorrect output.)
- Llama 3.2 Vision: Instruction-tuned models optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.
- MiniCPM-V 2.6: A GPT-4V-level MLLM for single-image, multi-image, and video understanding on your phone.
- Install Ollama
- Pull the required models:
```
ollama pull llama3.2-vision:11b
ollama pull llava:13b
ollama pull minicpm-v:8b
```
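To confirm the pulls succeeded, Ollama's REST API lists installed models at `GET http://localhost:11434/api/tags`. The helper below is a sketch (its name and the hard-coded list are illustrative, not part of this package's API); the required models mirror the `ollama pull` commands above:

```typescript
// Sketch: check which required models are missing from a local Ollama install.
// The helper name and model list are assumptions, not part of ollama-ocr's API.
const REQUIRED_MODELS = ["llama3.2-vision:11b", "llava:13b", "minicpm-v:8b"];

export function missingModels(
  installed: string[],
  required: string[] = REQUIRED_MODELS
): string[] {
  return required.filter((model) => !installed.includes(model));
}

// Usage (requires a running Ollama server):
// const res = await fetch("http://localhost:11434/api/tags");
// const { models } = await res.json(); // [{ name: "llava:13b", ... }, ...]
// console.log(missingModels(models.map((m: { name: string }) => m.name)));
```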
Then run the following commands:
```
git clone git@github.com:dwqs/ollama-ocr.git
cd ollama-ocr
yarn # or npm i
yarn dev # or npm run dev
```

Alternatively, you can run the demo from Docker: debounce/ollama-ocr.
- Markdown Format: The output is a markdown string containing the extracted text from the image.
- Text Format: The output is a plain text string containing the extracted text from the image.
- JSON Format: The output is a JSON object containing the extracted text from the image.
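Since the models are served through Ollama, each output format ultimately comes down to prompting the vision model via Ollama's `POST /api/generate` endpoint. This README does not show the package's own JavaScript API, so the helper name and prompt strings below are assumptions sketching how such a request could be assembled:

```typescript
// Sketch of building an OCR request for Ollama's /api/generate endpoint.
// The helper name and prompt wording are illustrative assumptions.
type OutputFormat = "markdown" | "text" | "json";

const PROMPTS: Record<OutputFormat, string> = {
  markdown: "Extract all text from this image and format it as Markdown.",
  text: "Extract all text from this image as plain text.",
  json: "Extract all text from this image and return it as a JSON object.",
};

export function buildOcrRequest(
  model: string,
  imageBase64: string,
  format: OutputFormat
) {
  return {
    model,                 // e.g. "llava:13b"
    prompt: PROMPTS[format],
    images: [imageBase64], // Ollama accepts base64-encoded images
    stream: false,         // ask for one complete response
  };
}

// Usage (requires a running Ollama server):
// const body = buildOcrRequest("llava:13b", base64Image, "markdown");
// const res = await fetch("http://localhost:11434/api/generate", {
//   method: "POST",
//   body: JSON.stringify(body),
// });
// const { response } = await res.json(); // the extracted text
```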
License: MIT