AzozzALFiras/Pdf-OCR

A simple, free tool for extracting text from scanned PDFs and images using OCR, and converting images to PDFs. It processes files locally in the browser, ensuring privacy and security while enabling users to effortlessly convert documents and images into editable text or PDF format.

PDF2Text OCR Tool with Image to PDF and Image OCR

A simple, free, and easy-to-use tool for converting scanned PDF files, images, and documents to text using Optical Character Recognition (OCR). This tool processes files locally in the browser, allowing developers and users to extract text from PDF documents and images, as well as convert images into PDFs.

Features

OCR-powered PDF to Text Conversion: Extract text from scanned PDF files using Tesseract.js.
Image OCR: Extract text from images (JPG, PNG, etc.) using OCR technology.
Multi-language Support: Supports various languages including English, Arabic, Spanish, French, and more.
Image to PDF Conversion: Convert images (JPG, PNG, etc.) into a PDF file.
Downloadable Output: Extracted text can be downloaded as a PDF file or plain text file.
Copy to Clipboard: The extracted text can be copied to the clipboard for easy pasting.
Local Processing: All processing is done locally in the browser, ensuring privacy and security.
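To illustrate the Image OCR and multi-language features, here is a minimal sketch. It assumes the page has loaded Tesseract.js so that a global `Tesseract` object is available. The names `LANGUAGE_CODES`, `toTesseractLangs`, and `ocrImage` are hypothetical and are not taken from this repository; the three-letter codes are the usual Tesseract language identifiers.

```javascript
// Hypothetical mapping from dropdown labels to Tesseract language codes.
const LANGUAGE_CODES = {
  English: "eng",
  Arabic: "ara",
  Spanish: "spa",
  French: "fra",
};

// Build the language string Tesseract expects, e.g. "eng+ara".
function toTesseractLangs(names) {
  const codes = names.map((name) => {
    const code = LANGUAGE_CODES[name];
    if (typeof code !== "string") {
      throw new Error(`Unsupported language: ${name}`);
    }
    return code;
  });
  return codes.join("+");
}

// Run OCR on an image (File, Blob, URL, or canvas) and return its text.
// Assumption: the Tesseract.js script is loaded and exposes `Tesseract`.
async function ocrImage(image, languageNames) {
  const langs = toTesseractLangs(languageNames);
  const { data } = await Tesseract.recognize(image, langs);
  return data.text;
}
```

In a page, this could be called as `ocrImage(fileInput.files[0], ["English", "Arabic"])`, where `fileInput` is whatever file input element the page provides. Recognition runs in the browser; depending on how Tesseract.js is configured, its language data may still be fetched over the network on first use.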

Technologies

Tesseract.js: A powerful JavaScript library for OCR.
pdf.js: A PDF rendering engine that allows us to convert PDF pages to images.
jsPDF: A library to generate downloadable PDFs.
Tailwind CSS: A utility-first CSS framework for modern web design.
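The way these libraries can fit together for scanned PDFs is sketched below: pdf.js renders each page to a canvas, and Tesseract.js reads text from that canvas. This is an illustration under stated assumptions, not the code in `index.html`. It assumes the globals `pdfjsLib` and `Tesseract` are loaded and that the pdf.js worker has already been configured; the function names and the 200 DPI default are hypothetical choices.

```javascript
// PDF user space uses 72 units per inch, so a pdf.js viewport scale of
// dpi / 72 yields roughly `dpi` pixels per inch.
function scaleForDpi(dpi) {
  return dpi / 72;
}

// Render one pdf.js page onto a new canvas at the requested DPI.
async function renderPageToCanvas(page, dpi = 200) {
  const viewport = page.getViewport({ scale: scaleForDpi(dpi) });
  const canvas = document.createElement("canvas");
  canvas.width = Math.ceil(viewport.width);
  canvas.height = Math.ceil(viewport.height);
  const canvasContext = canvas.getContext("2d");
  await page.render({ canvasContext, viewport }).promise;
  return canvas;
}

// OCR every page of a PDF (given as an ArrayBuffer) and join the results.
// Assumption: `pdfjsLib` and `Tesseract` are available as globals.
async function ocrPdf(arrayBuffer, langs = "eng") {
  const pdf = await pdfjsLib.getDocument({ data: arrayBuffer }).promise;
  const pageTexts = [];
  for (let n = 1; n <= pdf.numPages; n++) {
    const page = await pdf.getPage(n);
    const canvas = await renderPageToCanvas(page);
    const { data } = await Tesseract.recognize(canvas, langs);
    pageTexts.push(data.text);
  }
  return pageTexts.join("\n\n");
}
```

A higher DPI usually helps recognition of small print at the cost of memory and time, so the value is worth tuning per document.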

Usage

Open the PDF2Text OCR Tool in your browser.
Select a PDF file by clicking the Select PDF File button.
Choose the language for OCR from the dropdown menu.
Wait for the tool to process the file and extract the text.
Once the extraction is complete, you can:
- Copy the text to your clipboard.
- Download the extracted text as a PDF.
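The copy and download actions above can be sketched as follows. This is a hedged example rather than the tool's actual code: it assumes a browser context, the standard Clipboard API, and the jsPDF UMD build exposed as `window.jspdf`. All function names, the margin, and the font size are hypothetical.

```javascript
// Derive an output file name from the source name, e.g. "scan.pdf" -> "scan.txt".
function outputFileName(sourceName, extension) {
  const base = sourceName.replace(/\.[^./\\]+$/, "");
  return `${base}.${extension}`;
}

// Copy text with the async Clipboard API (requires a secure context).
async function copyText(text) {
  await navigator.clipboard.writeText(text);
}

// Download text as a plain .txt file through a temporary object URL.
function downloadAsText(text, fileName) {
  const blob = new Blob([text], { type: "text/plain;charset=utf-8" });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = fileName;
  link.click();
  URL.revokeObjectURL(url);
}

// Download text as a PDF with jsPDF, wrapping lines and adding pages as needed.
// Assumption: the jsPDF UMD bundle is loaded and exposes `window.jspdf`.
function downloadAsPdf(text, fileName) {
  const { jsPDF } = window.jspdf;
  const doc = new jsPDF({ unit: "pt", format: "a4" });
  const margin = 40;
  const lineHeight = 14;
  doc.setFontSize(11);
  const pageHeight = doc.internal.pageSize.getHeight();
  const maxWidth = doc.internal.pageSize.getWidth() - 2 * margin;
  let y = margin;
  for (const line of doc.splitTextToSize(text, maxWidth)) {
    if (y > pageHeight - margin) {
      doc.addPage();
      y = margin;
    }
    doc.text(line, margin, y);
    y += lineHeight;
  }
  doc.save(fileName);
}
```

For example, `downloadAsText(text, outputFileName("scan.pdf", "txt"))` would offer `scan.txt` for download. Text in non-Latin scripts such as Arabic will likely need a custom font registered with jsPDF to render correctly in the PDF output.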

Installation

To run this project locally, follow these steps:

1- Clone the repository:

git clone https://github.com/AzozzALFiras/Pdf-OCR.git

2- Navigate to the project folder:

cd Pdf-OCR

3- Open the index.html file in your browser to use the tool locally.

License

MIT

Acknowledgments

Tesseract.js – An open-source OCR (Optical Character Recognition) library.
pdf.js – A Mozilla project that allows the rendering of PDF documents in a web browser.
jsPDF – A library for generating PDF documents using JavaScript.
Tailwind CSS – A utility-first CSS framework for building custom user interfaces.
