Fork of tess-two rewritten from scratch to support latest version of Tesseract OCR.
adaptech-cz/Tesseract4Android
Fork of tess-two rewritten from scratch to build with CMake and support the latest Android Studio and Tesseract OCR.
The Java/JNI wrapper files and tests for Leptonica / Tesseract are based on the tess-two project, which is based on Tesseract Tools for Android.
This project uses additional libraries (with their own specific licenses):
- Tesseract OCR 5.5.1
- Leptonica 1.85.0
- libjpeg v9f
- libpng 1.6.48
Prerequisites:
- Android 5.0 (API 21) or higher
- v4.0.0 trained data file(s) for the language(s) you want to use.
  - These files must be placed in a (sub)directory named `tessdata`, and the path must be readable by the app. When targeting API >= 29, the only suitable places for this are the app's private directories (like `context.getFilesDir()` or `context.getExternalFilesDir()`).
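A small pre-flight check can catch a wrong data path before Tesseract is initialized. This is a sketch, assuming the layout described above (`<dataPath>/tessdata/<lang>.traineddata`) and languages joined with `+` (like `"eng+deu+fra"`); `TessDataCheck` and `missingLanguages` are hypothetical names, not part of the library.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tesseract4Android.
final class TessDataCheck {
    /**
     * Returns the languages from a spec such as "eng+deu+fra" whose
     * <dataPath>/tessdata/<lang>.traineddata file is missing or not readable.
     * An empty result only means the files are present, not that they are valid.
     */
    static List<String> missingLanguages(String dataPath, String languages) {
        File tessdata = new File(dataPath, "tessdata");
        List<String> missing = new ArrayList<>();
        for (String lang : languages.split("\\+")) {
            if (lang.isEmpty()) {
                continue; // tolerate stray separators such as "eng+"
            }
            File file = new File(tessdata, lang + ".traineddata");
            if (!file.isFile() || !file.canRead()) {
                missing.add(lang);
            }
        }
        return missing;
    }
}
```

On a device, `dataPath` would be something like `new File(context.getFilesDir(), "tesseract").getAbsolutePath()`; a non-empty result explains why initialization would fail before you call it.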
This library is available in two variants:
- Standard - Single-threaded. Best for single-core processors or when using multiple Tesseract instances in parallel.
- OpenMP - Multi-threaded. Provides better performance on multi-core processors when using only a single instance of Tesseract.
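With the Standard variant, parallelism comes from running several instances at once, each owned by exactly one thread. The following is a minimal plain-Java sketch of that pattern, written without the library so it stays self-contained; `PerWorkerOcr`, `process` and its parameters are hypothetical names. With Tesseract4Android, `engineFactory` would create and initialize a new `TessBaseAPI`, `recognize` would call `setImage` and `getUTF8Text`, and `release` would call `recycle()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper, not part of Tesseract4Android.
final class PerWorkerOcr {
    /**
     * Runs recognize(engine, item) for every item on up to `workers` threads. Each worker
     * creates its own engine, uses it only on its own thread, and releases it when done.
     * Results are returned in the order of `items`.
     */
    static <E, T, R> List<R> process(List<T> items, int workers, Supplier<E> engineFactory,
                                     BiFunction<E, T, R> recognize, Consumer<E> release)
            throws InterruptedException, ExecutionException {
        if (items.isEmpty()) {
            return new ArrayList<>();
        }
        int n = Math.max(1, Math.min(workers, items.size()));
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<Map<Integer, R>>> futures = new ArrayList<>();
            for (int w = 0; w < n; w++) {
                final int first = w;
                futures.add(pool.submit(() -> {
                    E engine = engineFactory.get(); // one engine per worker, never shared
                    try {
                        Map<Integer, R> out = new HashMap<>();
                        // Worker `first` handles items first, first + n, first + 2n, ...
                        for (int i = first; i < items.size(); i += n) {
                            out.put(i, recognize.apply(engine, items.get(i)));
                        }
                        return out;
                    } finally {
                        release.accept(engine); // e.g. TessBaseAPI.recycle()
                    }
                }));
            }
            List<R> results = new ArrayList<>(Collections.<R>nCopies(items.size(), null));
            for (Future<Map<Integer, R>> future : futures) {
                for (Map.Entry<Integer, R> entry : future.get().entrySet()) {
                    results.set(entry.getKey(), entry.getValue());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Creating one engine per worker (rather than per image) keeps the expensive initialization to `n` times while still never sharing an instance between threads.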
You can get a compiled version of Tesseract4Android from JitPack.io.
- Add the JitPack repository at the end of `repositories` in your project's root `settings.gradle` file:

```groovy
dependencyResolutionManagement {
    repositories {
        ...
        maven { url = uri("https://jitpack.io") }
    }
}
```
- Add the dependency to your app module's `build.gradle` file:

```groovy
dependencies {
    // To use the Standard variant:
    implementation 'cz.adaptech.tesseract4android:tesseract4android:4.9.0'

    // To use the OpenMP variant:
    implementation 'cz.adaptech.tesseract4android:tesseract4android-openmp:4.9.0'
}
```
- Use the `TessBaseAPI` class in your code:
This is the simplest possible example: a `TessBaseAPI` instance is created, used to recognize one image, and then destroyed. It is better to create and initialize the instance only once and reuse it to recognize multiple images. See the sample project for such usage, which additionally shows progress notifications and a way to stop ongoing processing.
```java
import com.googlecode.tesseract.android.TessBaseAPI;
import java.io.File;

// `context` and `image` are provided by the surrounding code.

// Create a TessBaseAPI instance (this internally creates the native Tesseract instance)
TessBaseAPI tess = new TessBaseAPI();

// NOTE: TessBaseAPI is not thread-safe. If you want to process multiple images in parallel,
// create a separate instance of TessBaseAPI for each thread.

// The given path must contain a `tessdata` subdirectory with the `*.traineddata` language files.
// The path must be directly readable by the app.
String dataPath = new File(context.getFilesDir(), "tesseract").getAbsolutePath();

// Initialize the API for the specified language
// (can be called multiple times during the Tesseract lifetime)
if (!tess.init(dataPath, "eng")) { // could be multiple languages, like "eng+deu+fra"
    // Error initializing Tesseract (wrong/inaccessible data path or missing language file(s))
    // Release the native Tesseract instance
    tess.recycle();
    return;
}

// Load the image (file path, Bitmap, Pix...)
// (can be called multiple times during the Tesseract lifetime)
tess.setImage(image);

// Start the recognition (if not done for this image yet) and retrieve the result
// (can be called multiple times during the Tesseract lifetime)
String text = tess.getUTF8Text();

// Release the native Tesseract instance when you don't want to use it anymore.
// After this call, no method can be called on this TessBaseAPI instance.
tess.recycle();
```
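Recognized text can also be filtered by confidence before it is shown to the user. The following is a minimal sketch, assuming you have already collected word/confidence pairs (for example with the library's `ResultIterator` at word level; treat the exact calls as something to verify against the library). Tesseract reports confidences on a 0 to 100 scale; `ConfidenceFilter`, `Word` and `keepConfident` are hypothetical names.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of Tesseract4Android.
final class ConfidenceFilter {
    /** A recognized word and its confidence (0..100, as reported by Tesseract). */
    static final class Word {
        final String text;
        final float confidence;

        Word(String text, float confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    /** Keeps non-blank words whose confidence is at least minConfidence, joined by single spaces. */
    static String keepConfident(List<Word> words, float minConfidence) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Word word : words) {
            String trimmed = word.text.trim();
            if (word.confidence >= minConfidence && !trimmed.isEmpty()) {
                joiner.add(trimmed);
            }
        }
        return joiner.toString();
    }
}
```

The threshold is application-specific; a fixed value such as 60 is only a starting point to tune against your own images.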
There is an example application in the `sample` directory. It shows basic usage of `TessBaseAPI` inside a ViewModel, with progress indication, the ability to stop processing, filtering of results based on confidence, and more.
It uses a sample image and English traineddata, which are extracted from the assets in the APK to the app's private directory on the device. This is simple, but it keeps two copies of the data file (one inside the APK itself, the other on the device storage), which wastes space. If you plan to use multiple traineddata files, it is better to download them directly from the internet rather than distributing them within the APK.
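Whether the file comes from the APK assets or from a download, it helps to copy it only once and never leave a half-written file at the final path. A minimal sketch; `TrainedDataInstaller`, `StreamSource` and `installIfMissing` are hypothetical names. On Android, the source could be `() -> context.getAssets().open("eng.traineddata")` (the asset name is an assumption) and the target `new File(context.getFilesDir(), "tesseract/tessdata/eng.traineddata")`.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of Tesseract4Android.
final class TrainedDataInstaller {
    /** Opens the source lazily, so nothing is read when the target already exists. */
    interface StreamSource {
        InputStream open() throws IOException;
    }

    /**
     * Copies the source to `target` unless `target` already exists. Data is written to a
     * ".part" file first and then renamed, so an interrupted copy never leaves a truncated
     * .traineddata file at the final path. Returns true if a copy was made.
     */
    static boolean installIfMissing(StreamSource source, File target) throws IOException {
        if (target.isFile()) {
            return false;
        }
        File dir = target.getAbsoluteFile().getParentFile();
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Cannot create directory " + dir);
        }
        File part = new File(dir, target.getName() + ".part");
        try (InputStream in = source.open(); OutputStream out = new FileOutputStream(part)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        if (!part.renameTo(target)) {
            part.delete();
            throw new IOException("Cannot rename " + part + " to " + target);
        }
        return true;
    }
}
```

Call it on a background thread before initializing Tesseract; on later app starts it returns `false` immediately without touching the source.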
You can use Android Studio to open the project and build the AAR, or you can use `gradlew` from the command line.
To build the release version of the library, use the task `tesseract4android:assembleRelease`. After a successful build, the resulting AAR files are in the `<project dir>/tesseract4Android/build/outputs/aar/` directory.
Alternatively, you can publish the AAR directly to your local Maven repository by using the task `tesseract4android:publishToMavenLocal`. After a successful build, you can consume your library like any other Maven dependency. Just make sure to add the `mavenLocal()` repository to the `repositories {}` block in your project's `build.gradle` file.
- Open this project in Android Studio.
- Open the Gradle panel, expand `Tesseract4Android / :tesseract4Android / Tasks / other` and run `assembleRelease` (to get the AAR).
- Or, in the same panel, expand `Tesseract4Android / :tesseract4Android / Tasks / publishing` and run `publishToMavenLocal` (to publish the AAR).
- In the project directory, create a `local.properties` file containing:

```properties
sdk.dir=c\:\\your\\path\\to\\android\\sdk
ndk.dir=c\:\\your\\path\\to\\android\\ndk
```

Note: for paths on Windows, you must use `\` to escape some special characters, as in the example above.
- Call `gradlew tesseract4android:assembleRelease` from the command line (to get the AAR).
- Or call `gradlew tesseract4android:publishToMavenLocal` from the command line (to publish the AAR).
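If you generate `local.properties` from a script or tool instead of writing it by hand, `java.util.Properties.store()` already applies the escaping from the Windows note above (`\` is written as `\\` and `:` as `\:`). A sketch; `LocalPropertiesWriter` and `render` are hypothetical names, and the paths are the placeholders from the example.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of the Tesseract4Android build.
final class LocalPropertiesWriter {
    /** Renders sdk.dir and ndk.dir in .properties syntax with the required escaping. */
    static String render(String sdkDir, String ndkDir) {
        Properties props = new Properties();
        props.setProperty("sdk.dir", sdkDir);
        props.setProperty("ndk.dir", ndkDir);
        StringWriter out = new StringWriter();
        try {
            // store() escapes '\' and ':' in values and also writes a leading
            // "#<timestamp>" comment line, which is ignored when the file is read.
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringWriter
        }
        return out.toString();
    }
}
```

`render("c:\\your\\path\\to\\android\\sdk", "c:\\your\\path\\to\\android\\ndk")` produces the two entries shown above (in no guaranteed order), ready to be written to `local.properties`.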
Copyright 2019 Adaptech s.r.o., Robert Pösel

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.