arkeodev/image-text-extractor


An OCR application that extracts text from images.

Features

  • Extract text from uploaded images
  • Process multiple image formats (PNG, JPG, JPEG, GIF, WEBP)
  • User-friendly Streamlit interface
  • RESTful API endpoints
  • Integration with Langchain for advanced text processing
  • Together AI Vision model integration

Prerequisites

  • Python 3.12 or higher
  • Poetry package manager
  • Together AI API key

Installation

1. Clone the repository:

   git clone https://github.com/yourusername/ImageTextExtractor.git
   cd ImageTextExtractor

2. Install dependencies using Poetry:

   poetry install

Usage

Streamlit UI

1. Start the FastAPI backend:

   poetry run python main.py

2. In a new terminal, launch the Streamlit interface:

   poetry run streamlit run ui.py

3. Open your browser and navigate to http://localhost:8501

4. Enter your Together AI API key

5. Upload an image and wait for the results

REST API

The application exposes a REST API endpoint for OCR processing.

Endpoint: POST /ocr

Request:

  • URL: http://localhost:8000/ocr
  • Method: POST
  • Content-Type: multipart/form-data

Parameters:

  • file: Image file (supported formats: PNG, JPG, JPEG, GIF, WEBP)
  • api_key: Together AI API key
  • system_prompt: (Optional) Custom prompt for the vision model

Example using curl:

curl -X POST http://localhost:8000/ocr \
  -F "file=@/path/to/your/image.jpg" \
  -F "api_key=your_together_ai_api_key" \
  -F "system_prompt=Convert the provided image into text"
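
Example using Python:

The same request can be sketched in Python with only the standard library. This is a minimal sketch, not part of the project: `build_ocr_request` is a hypothetical helper name, the multipart body is assembled by hand, and the file's MIME type is guessed from its name. The URL and field names (`file`, `api_key`, `system_prompt`) are the ones documented above.

```python
import mimetypes
import urllib.request
import uuid

OCR_URL = "http://localhost:8000/ocr"  # endpoint documented above


def build_ocr_request(image_bytes, filename, api_key, system_prompt=None, url=OCR_URL):
    """Hypothetical helper: build a multipart/form-data POST for /ocr."""
    boundary = uuid.uuid4().hex

    # Plain text fields; system_prompt is optional per the parameter list.
    fields = {"api_key": api_key}
    if system_prompt is not None:
        fields["system_prompt"] = system_prompt

    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )

    # File field; fall back to a generic type if the extension is unknown.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("utf-8")
        + image_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Usage (requires the backend to be running on localhost:8000):
# with open("/path/to/your/image.jpg", "rb") as f:
#     req = build_ocr_request(f.read(), "image.jpg", "your_together_ai_api_key")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the helper easy to inspect without a running server. A third-party HTTP client would shorten the code but adds a dependency the project may not declare.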

Response:

Run the test suite with pytest:

   poetry run pytest

Environment Variables

The application uses the following configurations (defined in config.py):

  • LOGGING_LEVEL: Default is "INFO"
  • SUPPORTED_IMAGE_TYPES: [".png", ".jpg", ".jpeg", ".gif", ".webp"]
  • TOGETHER_MODEL_NAME: "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo"
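
As a rough illustration of how these settings might be used, the sketch below declares the three documented values and checks a filename against the supported extensions. It is an assumption, not the project's actual config.py: reading `LOGGING_LEVEL` from the environment and the `is_supported_image` helper are both hypothetical.

```python
import os
from pathlib import Path

# Values as documented above. Reading LOGGING_LEVEL from the environment
# is an assumption; the real config.py may simply hard-code it.
LOGGING_LEVEL = os.environ.get("LOGGING_LEVEL", "INFO")
SUPPORTED_IMAGE_TYPES = [".png", ".jpg", ".jpeg", ".gif", ".webp"]
TOGETHER_MODEL_NAME = "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo"


def is_supported_image(filename: str) -> bool:
    """Hypothetical helper: case-insensitive check of the file extension."""
    return Path(filename).suffix.lower() in SUPPORTED_IMAGE_TYPES
```

Lowercasing the suffix means a file named `scan.PNG` is accepted, while a file with no extension or an unlisted one (for example `.pdf`) is rejected.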

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments
