# Ollama-OCR
A powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision-language models through Ollama to extract text from images and PDFs. Available both as a Python package and as a Streamlit web application.
## Multiple Vision Models Support
- LLaVA: Efficient vision-language model for real-time processing (note: LLaVA can occasionally generate incorrect output)
- Llama 3.2 Vision: Advanced model with high accuracy for complex documents
- Granite3.2-vision: A compact and efficient vision-language model designed specifically for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, and diagrams.
- Moondream: Small vision language model designed to run efficiently on edge devices.
- Minicpm-v: MiniCPM-V 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344).
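To stay inside MiniCPM-V 2.6's pixel budget, larger images need downscaling before OCR. The helper below is a hypothetical sketch (not part of ollama-ocr) that scales dimensions to fit the roughly 1.8-million-pixel limit (1344 × 1344) while preserving aspect ratio:

```python
# Hypothetical helper (not part of the ollama-ocr package): scale image
# dimensions down to MiniCPM-V 2.6's ~1.8 MP budget (1344 x 1344 pixels)
# while preserving the aspect ratio.
MAX_PIXELS = 1344 * 1344

def fit_to_budget(width: int, height: int, max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Return (width, height) scaled so that width * height <= max_pixels."""
    pixels = width * height
    if pixels <= max_pixels:
        return width, height
    scale = (max_pixels / pixels) ** 0.5
    return int(width * scale), int(height * scale)

print(fit_to_budget(1344, 1344))  # already within budget, unchanged
print(fit_to_budget(4000, 3000))  # downscaled to fit
```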
## Multiple Output Formats
- Markdown: Preserves text formatting with headers and lists
- Plain Text: Clean, simple text extraction
- JSON: Structured data format
- Structured: Tables and organized data
- Key-Value Pairs: Extracts labeled information
- Table: Extracts all tabular data
## Batch Processing
- Process multiple images in parallel
- Progress tracking for each image
- Image preprocessing (resize, normalize, etc.)
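The parallel-with-progress flow above can be sketched with the standard library. This is an illustration of the pattern, not the package's actual implementation; `fake_ocr` is a stand-in for the real per-image model call:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_ocr(image_path: str) -> str:
    # Stand-in for the per-image OCR call; the real package sends the
    # image to an Ollama vision model here.
    return f"text from {image_path}"

def process_batch(paths, max_workers=4):
    """Run fake_ocr over paths in parallel, tracking progress and failures."""
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fake_ocr, p): p for p in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception:
                failed.append(path)
            print(f"progress: {done}/{len(futures)}")  # simple progress tracking
    return {"results": results,
            "statistics": {"total": len(paths),
                           "successful": len(results),
                           "failed": len(failed)}}

batch = process_batch(["a.png", "b.png", "c.png"])
print(batch["statistics"])
```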
## Custom Prompts
- Override default prompts with custom instructions for text extraction.
## Installation

```shell
pip install ollama-ocr
```
- Install Ollama
- Pull the required model:

```shell
ollama pull llama3.2-vision:11b
ollama pull granite3.2-vision
ollama pull moondream
ollama pull minicpm-v
```
## Quick Start

```python
from ollama_ocr import OCRProcessor

# Initialize OCR processor.
# You can use any vision model available on Ollama,
# and pass your own Ollama API endpoint via base_url.
ocr = OCRProcessor(
    model_name='llama3.2-vision:11b',
    base_url="http://host.docker.internal:11434/api/generate",
)

# Process an image (PDF paths such as "path/to/your/file.pdf" also work)
result = ocr.process_image(
    image_path="path/to/your/image.png",
    format_type="markdown",  # Options: markdown, text, json, structured, key_value
    custom_prompt="Extract all text, focusing on dates and names.",  # Optional custom prompt
    language="English",  # Specify the language of the text (New! 🆕)
)
print(result)
```
```python
from ollama_ocr import OCRProcessor

# Initialize OCR processor; max_workers controls parallel processing
ocr = OCRProcessor(model_name='llama3.2-vision:11b', max_workers=4)

# Process multiple images with progress tracking
batch_results = ocr.process_batch(
    input_path="path/to/images/folder",  # Directory or list of image paths
    format_type="markdown",
    recursive=True,    # Search subdirectories
    preprocess=True,   # Enable image preprocessing
    custom_prompt="Extract all text, focusing on dates and names.",  # Optional custom prompt
    language="English",  # Specify the language of the text (New! 🆕)
)

# Access results
for file_path, text in batch_results['results'].items():
    print(f"\nFile: {file_path}")
    print(f"Extracted Text: {text}")

# View statistics
print("\nProcessing Statistics:")
print(f"Total images: {batch_results['statistics']['total']}")
print(f"Successfully processed: {batch_results['statistics']['successful']}")
print(f"Failed: {batch_results['statistics']['failed']}")
```
## Output Format Details

- Markdown Format: The output is a markdown string containing the extracted text from the image.
- Text Format: The output is a plain text string containing the extracted text from the image.
- JSON Format: The output is a JSON object containing the extracted text from the image.
- Structured Format: The output is a structured object containing the extracted text from the image.
- Key-Value Format: The output is a dictionary containing the extracted text from the image.
- Table Format: Extracts all tabular data.
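As a post-processing sketch for the key-value format (assuming the model returns plain lines of the form `Label: value`, which can vary by model and prompt), the output can be split into a dict:

```python
def parse_key_values(text: str) -> dict[str, str]:
    """Split 'Label: value' lines into a dict; lines without a colon are skipped."""
    pairs = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            if key.strip():
                pairs[key.strip()] = value.strip()
    return pairs

sample = """Invoice Number: 12345
Date: 2024-01-15
Total: $99.00"""
print(parse_key_values(sample))
```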
## Streamlit Web Application

- User-Friendly Interface
  - Drag-and-drop file upload
  - Real-time processing
  - Download extracted text
  - Image preview with details
  - Responsive design
- Language Selection: Specify the language for better OCR accuracy. (New! 🆕)
- Clone the repository:

```shell
git clone https://github.com/imanoop7/Ollama-OCR.git
cd Ollama-OCR
```

- Install dependencies:

```shell
pip install -r requirements.txt
```
- Go to the directory where app.py is located:

```shell
cd src/ollama_ocr
```

- Run the Streamlit app:

```shell
streamlit run app.py
```
## Examples

- Ollama OCR on Colab: How to use Ollama-OCR on Google Colab.
- Example Notebook: Example usage of Ollama OCR.
- Ollama OCR with Autogen: Use Ollama-OCR with autogen.
- Ollama OCR with LangGraph: Use Ollama-OCR with LangGraph.
This project is licensed under the MIT License - see the LICENSE file for details.
Built with Ollama. Powered by Vision Models.