A simple, free tool for extracting text from scanned PDFs and images using OCR, and converting images to PDFs. It processes files locally in the browser, ensuring privacy and security while enabling users to effortlessly convert documents and images into editable text or PDF format.
AzozzALFiras/Pdf-OCR
A simple, free, and easy-to-use tool for converting scanned PDF files, images, and documents to text using Optical Character Recognition (OCR). This tool processes files locally in the browser, allowing developers and users to extract text from PDF documents and images, as well as convert images into PDFs.
- OCR-powered PDF to Text Conversion: Extract text from scanned PDF files using Tesseract.js.
- Image OCR: Extract text from images (JPG, PNG, etc.) using OCR technology.
- Multi-language Support: Supports various languages including English, Arabic, Spanish, French, and more.
- Image to PDF Conversion: Convert images (JPG, PNG, etc.) into a PDF file.
- Downloadable Output: Extracted text can be downloaded as a PDF file or plain text file.
- Copy to Clipboard: The extracted text can be copied to the clipboard for easy pasting.
- Local Processing: All processing is done locally in the browser, ensuring privacy and security.
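The multi-language feature corresponds to Tesseract's three-letter language codes, which can be combined with `+`. The following is a minimal sketch of how dropdown labels might be translated into such a string; the `LANGUAGE_CODES` table and the `toTesseractLang` helper are hypothetical names for illustration and are not taken from this repository.

```javascript
// Hypothetical mapping from dropdown labels to Tesseract language codes.
const LANGUAGE_CODES = {
  English: "eng",
  Arabic: "ara",
  Spanish: "spa",
  French: "fra",
};

// Build the language string handed to Tesseract.js, e.g. "eng+ara".
function toTesseractLang(names) {
  const codes = names.map((name) => {
    const code = LANGUAGE_CODES[name];
    if (!code) throw new Error(`Unsupported language: ${name}`);
    return code;
  });
  // Drop duplicates while keeping the order the user chose.
  return [...new Set(codes)].join("+");
}
```

The result could then be passed as the second argument of `Tesseract.recognize(image, lang)`.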
- Tesseract.js: A powerful JavaScript library for OCR.
- pdf.js: A PDF rendering engine that allows us to convert PDF pages to images.
- jsPDF: A library to generate downloadable PDFs.
- Tailwind CSS: A utility-first CSS framework for modern web design.
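The libraries above fit together roughly as follows: pdf.js renders each page to a canvas, and Tesseract.js recognizes the text on that canvas. This is a sketch under stated assumptions, not the repository's actual code. The function name `extractTextFromPdf`, the `deps` parameter, and the default `scale` of 2 are hypothetical; `deps` exists only so the function can be exercised outside a browser.

```javascript
// Sketch: render every PDF page with pdf.js, then OCR it with Tesseract.js.
// `deps` is a hypothetical injection point. In the browser, pass
// { pdfjsLib, Tesseract, createCanvas: () => document.createElement("canvas") }.
async function extractTextFromPdf(data, lang, deps, scale = 2) {
  const { pdfjsLib, Tesseract, createCanvas } = deps;

  // getDocument returns a loading task; its `promise` resolves to the document.
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  const pages = [];

  // pdf.js page numbers start at 1.
  for (let n = 1; n <= pdf.numPages; n++) {
    const page = await pdf.getPage(n);
    // A larger scale yields a larger bitmap for the OCR step.
    const viewport = page.getViewport({ scale });

    const canvas = createCanvas();
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    await page.render({ canvasContext: canvas.getContext("2d"), viewport }).promise;

    // Tesseract.recognize resolves to an object whose data.text holds the result.
    const result = await Tesseract.recognize(canvas, lang);
    pages.push(result.data.text.trim());
  }

  // Separate pages with a blank line.
  return pages.join("\n\n");
}
```

Processing pages one at a time keeps only a single canvas alive at once, which may help with memory on large documents.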
- Open the PDF2Text OCR Tool in your browser.
- Select a PDF file by clicking the Select PDF File button.
- Choose the language for OCR from the dropdown menu.
- Wait for the tool to process the file and extract the text.
- Once the extraction is complete, you can:
  - Copy the text to your clipboard.
  - Download the extracted text as a PDF.
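The copy and download steps could be implemented along these lines. This is a minimal sketch, assuming the browser's Clipboard API and a jsPDF instance; the helper names `copyText` and `writeTextToPdf`, and the margin and line-height values, are hypothetical and not taken from this repository.

```javascript
// Copy text with the asynchronous Clipboard API (available in secure contexts only).
// The `clipboard` parameter is a hypothetical hook so the function can be tested.
async function copyText(text, clipboard = navigator.clipboard) {
  await clipboard.writeText(text);
}

// Write text into a jsPDF document, wrapping lines and adding pages as needed.
// `doc` is a jsPDF instance; margin and lineHeight are illustrative values
// expressed in the document's unit (millimetres by default).
function writeTextToPdf(doc, text, margin = 15, lineHeight = 7) {
  const pageWidth = doc.internal.pageSize.getWidth();
  const pageHeight = doc.internal.pageSize.getHeight();

  // splitTextToSize wraps the text to the usable width between the margins.
  const lines = doc.splitTextToSize(text, pageWidth - 2 * margin);

  let y = margin;
  for (const line of lines) {
    // Start a new page once the next line would fall into the bottom margin.
    if (y > pageHeight - margin) {
      doc.addPage();
      y = margin;
    }
    doc.text(line, margin, y);
    y += lineHeight;
  }
  return doc;
}
```

In the browser, a call such as `writeTextToPdf(new jsPDF(), text).save("extracted-text.pdf")` would then trigger the download; the file name here is only an example.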
To run this project locally, follow these steps:
1- Clone the repository:
git clone https://github.com/AzozzALFiras/Pdf-OCR.git
2- Navigate to the project folder:
cd Pdf-OCR
3- Open the index.html file in your browser to use the tool locally.
- Tesseract.js – An open-source OCR (Optical Character Recognition) library.
- pdf.js – A Mozilla project that allows the rendering of PDF documents in a web browser.
- jsPDF – A library for generating PDF documents using JavaScript.
- Tailwind CSS – A utility-first CSS framework for building custom user interfaces.