dwqs/ollama-ocr

Inspired by imanoop7/Ollama-OCR.

Ollama OCR for web

A powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision language models through Ollama to extract text from images.

Supported Models

LLaVA: A multimodal model that combines a vision encoder with Vicuna for general-purpose visual and language understanding, with chat capabilities in the spirit of the multimodal GPT-4. (Note: LLaVA can sometimes generate incorrect output.)
Llama 3.2 Vision: Instruction-tuned models optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.
MiniCPM-V 2.6: A GPT-4V-level MLLM for single image, multi-image, and video on your phone.

Quick Start

Prerequisites

Install Ollama
Pull the required models:

    ollama pull llama3.2-vision:11b
    ollama pull llava:13b
    ollama pull minicpm-v:8b

Then run the following commands:

    git clone git@github.com:dwqs/ollama-ocr.git
    cd ollama-ocr
    yarn            # or: npm i
    yarn dev        # or: npm run dev
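
Before starting the dev server, it can help to confirm that the models listed under Prerequisites have actually been pulled. The sketch below is not part of this project: it assumes Ollama's `GET /api/tags` endpoint, which returns the locally available models, and the names `REQUIRED_MODELS`, `TagsResponse`, and `missingModels` are hypothetical helpers introduced only for illustration.

```typescript
// Models named in the Prerequisites section of this README.
const REQUIRED_MODELS: string[] = ["llama3.2-vision:11b", "llava:13b", "minicpm-v:8b"];

// Minimal subset of the JSON returned by Ollama's GET /api/tags endpoint.
interface TagsResponse {
  models: { name: string }[];
}

// Returns the required models that do not appear in the local model list.
function missingModels(tags: TagsResponse, required: string[] = REQUIRED_MODELS): string[] {
  return required.filter(
    (name) => !tags.models.some((m) => m.name === name),
  );
}

// Hand-written sample response; in practice it would be fetched from
// http://localhost:11434/api/tags (assuming Ollama's default port).
const sampleTags: TagsResponse = { models: [{ name: "llava:13b" }] };
console.log(missingModels(sampleTags));
// prints the two models that still need `ollama pull`:
// llama3.2-vision:11b and minicpm-v:8b
```

Any model name the function returns still needs an `ollama pull` before it can be selected in the demo.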

Docker Support

You can also run the demo from Docker: debounce/ollama-ocr

Examples

Input Image1

Output Markdown

Input Image2

Output JSON

Output Format Details

Markdown Format: The output is a markdown string containing the extracted text from the image.
Text Format: The output is a plain text string containing the extracted text from the image.
JSON Format: The output is a JSON object containing the extracted text from the image.
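
To illustrate how the three output formats might map onto a request to Ollama, here is a minimal sketch. It assumes Ollama's `POST /api/generate` endpoint with its `images`, `stream`, and `format` fields. The prompt wording and the names `OutputFormat`, `PROMPTS`, and `buildOcrRequest` are hypothetical and not taken from this project's source, which may build its requests differently.

```typescript
// The three output formats described above.
type OutputFormat = "markdown" | "text" | "json";

// Hypothetical prompts; the wording used by this project may differ.
const PROMPTS: Record<OutputFormat, string> = {
  markdown: "Extract all text from this image and return it as Markdown.",
  text: "Extract all text from this image and return it as plain text.",
  json: "Extract all text from this image and return it as a JSON object.",
};

// Request body for Ollama's POST /api/generate endpoint.
interface GenerateBody {
  model: string;
  prompt: string;
  images: string[]; // base64-encoded image data, without a "data:" URL prefix
  stream: boolean;
  format?: "json"; // asks Ollama to constrain the reply to valid JSON
}

// Builds the request body for one image and one output format.
function buildOcrRequest(model: string, imageBase64: string, format: OutputFormat): GenerateBody {
  const body: GenerateBody = {
    model,
    prompt: PROMPTS[format],
    images: [imageBase64],
    stream: false, // return a single response object instead of a stream
  };
  if (format === "json") {
    body.format = "json";
  }
  return body;
}

const exampleBody = buildOcrRequest("llama3.2-vision:11b", "<base64 image data>", "json");
console.log(JSON.stringify(exampleBody, null, 2));

// Sending it (assumes Ollama is listening on its default port, 11434):
//   const res = await fetch("http://localhost:11434/api/generate", {
//     method: "POST",
//     headers: { "Content-Type": "application/json" },
//     body: JSON.stringify(exampleBody),
//   });
//   const { response } = await res.json(); // the extracted text
```

With `stream` set to `false`, the reply arrives as one JSON object whose `response` field holds the extracted text. For the JSON format that field is itself a JSON string and needs a second `JSON.parse`.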

License

MIT
