sakarika/kanjitomo-ocr
KanjiTomo OCR is a Java library for identifying Japanese characters from images. The algorithm used in this library is custom-made and was originally used with the KanjiTomo program: https://www.kanjitomo.net/ A description of the algorithm is here.
This library is intended for interactive programs where the user points to individual words with the mouse. Batch processing of whole pages is not supported.
- Include KanjiTomoOCR.jar in your project.
- Add the "--illegal-access=deny" JVM parameter. This is not strictly required but prevents unnecessary warnings on startup.
- The "-Xmx1200m" and "-server" JVM parameters are also recommended for performance reasons.
- Create a KanjiTomo class instance.
- Load the data structures with the loadData method. This needs to be done only once, at startup.
- Set the target image with the setTargetImage method. This can be a whole page or a screenshot of the area around the target word. Screenshots around the mouse cursor can be taken with Java's Robot class.
- Start OCR with the runOCR method. The Point argument determines the first character to be scanned, in the target image's coordinates, and should correspond to the mouse cursor location.
- Results are returned as an OCRResults object. The bestMatchingCharacters field contains a list of identified characters, and the words list contains the results of a dictionary search based on these characters.
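    // Create the instance and load data structures (only once, at startup)
    KanjiTomo tomo = new KanjiTomo();
    tomo.loadData();
    // Set the target image: a whole page or a screenshot around the target word
    BufferedImage image = ImageIO.read(new File("file.png"));
    tomo.setTargetImage(image);
    // Run OCR starting from the character at (80,40) in image coordinates
    OCRResults results = tomo.runOCR(new Point(80,40));
    System.out.println(results);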
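When the target image is a screenshot taken around the mouse cursor, the cursor position has to be translated from screen coordinates into the screenshot's coordinates before it is passed to runOCR. The following is a minimal sketch of one way to do that with Java's Robot class. The class name `CursorCapture`, its method names, and the capture size are hypothetical and not part of the library; the commented-out lines assume an already loaded KanjiTomo instance named `tomo`.

```java
import java.awt.AWTException;
import java.awt.Point;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of KanjiTomo OCR.
final class CursorCapture {

    // Returns a region of the requested size centered on the cursor,
    // shifted (and shrunk if needed) so that it stays inside the screen bounds.
    static Rectangle regionAround(Point cursor, int width, int height, Rectangle screen) {
        int w = Math.min(width, screen.width);
        int h = Math.min(height, screen.height);
        int x = clamp(cursor.x - w / 2, screen.x, screen.x + screen.width - w);
        int y = clamp(cursor.y - h / 2, screen.y, screen.y + screen.height - h);
        return new Rectangle(x, y, w, h);
    }

    // Converts the cursor position from screen coordinates to the
    // coordinates of an image captured from the given region.
    static Point toImageCoordinates(Point cursor, Rectangle region) {
        return new Point(cursor.x - region.x, cursor.y - region.y);
    }

    // Takes the screenshot. Requires a graphical (non-headless) environment.
    static BufferedImage capture(Rectangle region) throws AWTException {
        return new Robot().createScreenCapture(region);
    }

    private static int clamp(int value, int min, int max) {
        return Math.max(min, Math.min(max, value));
    }

    // Possible use with the library (assumes a loaded KanjiTomo instance "tomo"):
    //   Rectangle region = regionAround(cursor, 400, 200, screen);
    //   tomo.setTargetImage(capture(region));
    //   OCRResults results = tomo.runOCR(toImageCoordinates(cursor, region));
}
```

The cursor position and screen bounds are passed in as arguments, so the coordinate logic can be checked without a display. In an interactive program they could come from `MouseInfo.getPointerInfo().getLocation()` and `Toolkit.getDefaultToolkit().getScreenSize()`.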
KanjiTomo is free to use for non-commercial purposes. The license file is here.
KanjiTomo has been created by Sakari Kääriäinen. You can contact me at kanjitomo(at)gmail.com
EDICT, ENAMDICT and KANJIDIC dictionaries are the property of the Electronic Dictionary Research and Development Group, and are used in conformance with the Group's licence.
https://www.edrdg.org/jmdict/edict.html
imgscalr library by Riyad Kalla
https://github.com/rkalla/imgscalr
Unsharp Mask code by Romain Guy
http://www.java2s.com/Code/Java/Advanced-Graphics/UnsharpMaskDemo.htm
Kryo library by EsotericSoftware
https://github.com/EsotericSoftware/kryo