Generate text images for training deep learning ocr model
Sanster/text_renderer
New version release: https://github.com/oh-my-ocr/text_renderer
Generate text images for training deep learning OCR models (e.g. CRNN). Supports both Latin and non-Latin text.
- Ubuntu 16.04
- python 3.5+
Install dependencies:
pip3 install -r requirements.txt
By default, running `python3 main.py` generates 20 text images and a `labels.txt` file in `output/default/`.
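Once the default run has finished, the labels can be loaded for training. The sketch below assumes each line of `labels.txt` has the form `<image name> <text>`, separated by the first space; this layout is an assumption and is not confirmed here, so check a generated file before relying on it. The helper names `parse_label_line` and `load_labels` are hypothetical.

```python
def parse_label_line(line):
    """Split one label line into (image_name, text).

    Assumes the line looks like "<image name> <text>"; only the first
    space is treated as the separator, so the text may contain spaces.
    """
    name, _, text = line.rstrip("\n").partition(" ")
    return name, text


def load_labels(path):
    """Read a labels file into a dict mapping image name to text."""
    labels = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                name, text = parse_label_line(line)
                labels[name] = text
    return labels
```

For example, `load_labels("output/default/labels.txt")` would return a dictionary keyed by image name, under the format assumption above.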
Run `python3 main.py --help` to see all optional arguments and their meanings, and put your own data in the corresponding folder. Configure the text effects and their fractions in `configs/default.yaml` (or create a new config file and pass it with the `--config_file` option). Here are some examples:
- Run the `main.py` file.
For non-Latin languages (e.g. Chinese), it is very common for a font to support only a limited set of characters. In this case, you will get bad results like these:
Manually selecting fonts that support every character in `--chars_file` is tedious. Run `main.py` with the `--strict` option, and the renderer will keep retrying to get text from the corpus during generation until all of its characters are supported by a font.
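The retry behavior described for `--strict` can be pictured with a small standalone sketch. This is not the renderer's actual code: the function names, the fixed-length window sampling, and the `max_retries` cap are all assumptions made for illustration.

```python
import random


def font_supports(text, supported_chars):
    """Return True if every character of text is in the font's supported set."""
    return all(ch in supported_chars for ch in text)


def sample_supported_text(corpus, supported_chars, length=5,
                          max_retries=100, rng=random):
    """Draw random windows from corpus until one is fully supported.

    Returns the first supported window, or None if the corpus is too
    short or no supported window is found within max_retries draws
    (the retry cap is an assumption of this sketch).
    """
    if len(corpus) < length:
        return None
    for _ in range(max_retries):
        start = rng.randrange(0, len(corpus) - length + 1)
        text = corpus[start:start + length]
        if font_supports(text, supported_chars):
            return text
    return None
```

The cap keeps the loop from spinning forever when a font covers almost none of the corpus, which is the situation the font check is meant to reveal.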
You can use the `check_font.py` script to check how many characters in `--chars_file` your fonts do not support:
    python3 tools/check_font.py

    checking font ./data/fonts/eng/Hack-Regular.ttf
    chars not supported(4971):
    ['第','朱','广','沪','联','自','治','县','驼','身','进','行','纳','税','防','火','墙','掏','心','内','容','万','警','钟','上','了','解'...]
    0 fonts support all chars(5071) in ./data/chars/chn.txt:
    []
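A check of this kind can be approximated with the third-party fontTools package by comparing the characters you need against the font's Unicode character map. This is a minimal sketch, not the code of `tools/check_font.py`; the function names are hypothetical, and fontTools must be installed for `load_supported_codepoints` to work.

```python
def unsupported_chars(chars, supported_codepoints):
    """Return the characters whose code points are missing from the font."""
    return [ch for ch in chars if ord(ch) not in supported_codepoints]


def load_supported_codepoints(font_path):
    """Read the set of Unicode code points a font file maps to glyphs.

    Imports fontTools lazily so the pure helper above stays usable
    without it.
    """
    from fontTools.ttLib import TTFont

    font = TTFont(font_path)
    cmap = font.getBestCmap() or {}  # None when the font has no Unicode cmap
    return set(cmap.keys())
```

For instance, `unsupported_chars(open("./data/chars/chn.txt", encoding="utf-8").read().split(), load_supported_codepoints("./data/fonts/eng/Hack-Regular.ttf"))` would list the missing characters, assuming the chars file holds one character per line.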
If you want to use a GPU to speed up image generation, first compile OpenCV with CUDA support; see "Compiling OpenCV with CUDA support".
Then build the Cython part, and add the `--gpu` option when running `main.py`:
    cd libs/gpu
    python3 setup.py build_ext --inplace
Running `python3 main.py --debug` saves images with extra information drawn on them. You can see how perspectiveTransform works, along with all bounding and rotated boxes.
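To make the box positions in the debug images easier to interpret, here is a dependency-free sketch of the mapping that a perspective transform applies to each point: multiply by a 3x3 matrix in homogeneous coordinates, then divide by the third component. It mirrors the math behind OpenCV's `cv2.perspectiveTransform` but is not the renderer's code; `apply_homography` is a hypothetical helper.

```python
def apply_homography(H, points):
    """Map 2D points through a 3x3 perspective matrix H.

    Each (x, y) is treated as (x, y, 1), multiplied by H, and then
    divided by the resulting third coordinate w.
    """
    out = []
    for x, y in points:
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        new_x = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
        new_y = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
        out.append((new_x, new_y))
    return out


# A pure translation by (10, 5) moves every corner of a text box equally.
translate = [[1, 0, 10], [0, 1, 5], [0, 0, 1]]
box = [(0, 0), (100, 0), (100, 32), (0, 32)]
moved = apply_homography(translate, box)
# moved == [(10.0, 5.0), (110.0, 5.0), (110.0, 37.0), (10.0, 37.0)]
```

When the bottom row of the matrix is not `[0, 0, 1]`, the divisor `w` varies per point, which is what turns a rectangle into a general quadrilateral.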
See https://github.com/Sanster/text_renderer/projects/1
If you use text_renderer in your research, please consider using the following BibTeX entry.
    @misc{text_renderer,
      author = {weiqing.chu},
      title = {text_renderer},
      howpublished = {\url{https://github.com/Sanster/text_renderer}},
      year = {2021}
    }