arkeodev/image-text-extractor

Image Text Extractor

An OCR application that extracts text from images.

Features

Extract text from uploaded images
Process multiple image formats (PNG, JPG, JPEG, GIF, WEBP)
User-friendly Streamlit interface
RESTful API endpoints
Integration with Langchain for advanced text processing
Together AI Vision model integration

Prerequisites

Python 3.12 or higher
Poetry package manager
Together AI API key

Installation

1. Clone the repository:

   git clone https://github.com/yourusername/ImageTextExtractor.git
   cd ImageTextExtractor

2. Install dependencies using Poetry:

   poetry install

Usage

Streamlit UI

1. Start the FastAPI backend:

   poetry run python main.py

2. In a new terminal, launch the Streamlit interface:

   poetry run streamlit run ui.py

3. Open your browser and navigate to http://localhost:8501

4. Enter your Together AI API key

5. Upload an image and wait for the results
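If the page at port 8501 does not load, or the upload fails, it can help to confirm that both processes from steps 1 and 2 are listening. The following is a minimal sketch using only the standard library; the helper name `port_open` is hypothetical, and port 8000 for the backend is taken from the REST API URL in this README rather than from the application code.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services() -> dict:
    """Probe the two local ports this README mentions (assumed defaults)."""
    return {
        "FastAPI backend (8000)": port_open("localhost", 8000),
        "Streamlit UI (8501)": port_open("localhost", 8501),
    }
```

Calling `check_services()` after both commands have started should report True for each entry; a False value suggests the corresponding process is not running or uses a different port.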

REST API

The application exposes a REST API endpoint for OCR processing.

Endpoint: POST /ocr

Request:

URL: http://localhost:8000/ocr
Method: POST
Content-Type: multipart/form-data

Parameters:

file: Image file (supported formats: PNG, JPG, JPEG, GIF, WEBP)
api_key: Together AI API key
system_prompt: (Optional) Custom prompt for the vision model

Example using curl:

   curl -X POST http://localhost:8000/ocr \
     -F "file=@/path/to/your/image.jpg" \
     -F "api_key=your_together_ai_api_key" \
     -F "system_prompt=Convert the provided image into text"
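The same request can be sent from Python. The sketch below uses only the standard library and builds the multipart/form-data body by hand; the function names `build_multipart` and `post_ocr` are hypothetical helpers, not part of this project. The field names (`file`, `api_key`, `system_prompt`) and the URL come from the parameter list above. Since the response format is not documented here, the body is returned as raw text.

```python
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes, content_type):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        parts.append(b"")
        parts.append(str(value).encode("utf-8"))
    parts.append(f"--{boundary}".encode())
    parts.append(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'.encode()
    )
    parts.append(f"Content-Type: {content_type}".encode())
    parts.append(b"")
    parts.append(file_bytes)
    parts.append(f"--{boundary}--".encode())
    parts.append(b"")
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def post_ocr(image_path, api_key, system_prompt=None,
             url="http://localhost:8000/ocr"):
    """POST an image to the /ocr endpoint and return the raw response text."""
    fields = {"api_key": api_key}
    if system_prompt is not None:
        fields["system_prompt"] = system_prompt
    with open(image_path, "rb") as fh:
        data = fh.read()
    # Fall back to a generic type if the extension is not recognized.
    ctype = mimetypes.guess_type(image_path)[0] or "application/octet-stream"
    body, header = build_multipart(
        fields, "file", os.path.basename(image_path), data, ctype
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example call (requires the backend to be running):
# print(post_ocr("/path/to/your/image.jpg", "your_together_ai_api_key"))
```

A third-party HTTP client would shorten this considerably; the standard library is used here only to keep the sketch free of extra dependencies.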

Response:

To run the tests:

   poetry run pytest

Environment Variables

The application uses the following configurations (defined in config.py):

LOGGING_LEVEL: Default is "INFO"
SUPPORTED_IMAGE_TYPES: [".png", ".jpg", ".jpeg", ".gif", ".webp"]
TOGETHER_MODEL_NAME: "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo"
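As an illustration of how these settings might be laid out and used, here is a minimal sketch of a config module. This is not the project's actual config.py: reading LOGGING_LEVEL from the environment is an assumption suggested by this section's title, and `is_supported_image` is a hypothetical helper added to show how the extension list could be applied to an uploaded file name.

```python
import os
from pathlib import Path

# Assumption: the logging level can be overridden from the environment,
# with "INFO" as the documented default.
LOGGING_LEVEL = os.environ.get("LOGGING_LEVEL", "INFO")

# Values as listed in this README.
SUPPORTED_IMAGE_TYPES = [".png", ".jpg", ".jpeg", ".gif", ".webp"]
TOGETHER_MODEL_NAME = "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo"


def is_supported_image(filename: str) -> bool:
    """Check a file name's extension against SUPPORTED_IMAGE_TYPES.

    The comparison is case-insensitive, so "scan.PNG" is accepted.
    """
    return Path(filename).suffix.lower() in SUPPORTED_IMAGE_TYPES
```

A check like this only inspects the file name; it does not verify that the file contents are a valid image.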

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Together AI for providing the vision model
Langchain for the AI integration framework
Streamlit for the user interface
FastAPI for the REST API implementation
