TextBoxes: A Fast Text Detector with a Single Deep Neural Network
Recommended: TextBoxes++ is an extension of TextBoxes that supports oriented scene text detection. The recognition part is also included in TextBoxes++.
This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-processing except for a standard non-maximum suppression. For more details, please refer to our paper.
Please cite TextBoxes in your publications if it helps your research:
@inproceedings{LiaoSBWL17,
  author    = {Minghui Liao and Baoguang Shi and Xiang Bai and Xinggang Wang and Wenyu Liu},
  title     = {TextBoxes: {A} Fast Text Detector with a Single Deep Neural Network},
  booktitle = {AAAI},
  year      = {2017}
}
- Get the code. We will call the directory that you cloned Caffe into $CAFFE_ROOT.
    git clone https://github.com/MhLiao/TextBoxes.git
    cd TextBoxes
    make -j8
    make py
- Models trained on ICDAR 2013: Dropbox link, BaiduYun link
- Fully convolutional reduced (atrous) VGGNet: Dropbox link, BaiduYun link
- Compiled mex file for evaluation (for multi-scale test evaluation: evaluation_nms.m): Dropbox link, BaiduYun link
- Run "python examples/demo.py".
- You can modify "use_multi_scale" in the "examples/demo.py" script to control whether multi-scale testing is used.
- The results are saved in "examples/results/".
- Train for about 50k iterations on the synthetic data referred to in the paper.
- Train for about 2k iterations on the corresponding training data, such as ICDAR 2013 and SVT.
- For more information, such as the learning rate setting, please refer to the paper.
- Using the given test code, you can achieve an F-measure of about 80% on ICDAR 2013 with a single scale.
- Using the given multi-scale test code, you can achieve an F-measure of about 85% on ICDAR 2013 with non-maximum suppression.
- For more performance information, please refer to the paper and to Task1 and Task4 of Challenge2 on the ICDAR 2015 website: http://rrc.cvc.uab.es/?ch=2&com=evaluation
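The non-maximum suppression step mentioned above can be illustrated with a minimal sketch. This is not the code shipped in this repository (the multi-scale evaluation uses evaluation_nms.m); the function names `iou` and `nms` and the threshold of 0.5 are assumptions chosen for illustration. It shows the usual greedy scheme: keep the highest-scoring box, then drop any remaining box that overlaps a kept box too much.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first.

    iou_threshold is an illustrative value, not one taken from the paper.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Hypothetical detections: the first two cover nearly the same word.
    boxes = [(158, 128, 411, 181), (160, 130, 410, 180), (443, 128, 501, 169)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In a multi-scale setting, detections from all scales would typically be mapped back to the original image coordinates and pooled into one list before a step like this is applied.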
The reference XML file is as follows:
<?xml version="1.0" encoding="utf-8"?>
<annotation>
  <object>
    <name>text</name>
    <bndbox>
      <xmin>158</xmin>
      <ymin>128</ymin>
      <xmax>411</xmax>
      <ymax>181</ymax>
    </bndbox>
  </object>
  <object>
    <name>text</name>
    <bndbox>
      <xmin>443</xmin>
      <ymin>128</ymin>
      <xmax>501</xmax>
      <ymax>169</ymax>
    </bndbox>
  </object>
  <folder></folder>
  <filename>100.jpg</filename>
  <size>
    <width>640</width>
    <height>480</height>
    <depth>3</depth>
  </size>
</annotation>
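To sanity-check annotation files before training, a short script along the following lines may help. It is a sketch using only Python's standard `xml.etree.ElementTree` module; the helper name `read_annotation` is hypothetical and not part of this repository. It reads the file name and the bounding boxes from an annotation in the format shown above.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the reference annotation, embedded so the example is self-contained.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<annotation>
  <object>
    <name>text</name>
    <bndbox><xmin>158</xmin><ymin>128</ymin><xmax>411</xmax><ymax>181</ymax></bndbox>
  </object>
  <object>
    <name>text</name>
    <bndbox><xmin>443</xmin><ymin>128</ymin><xmax>501</xmax><ymax>169</ymax></bndbox>
  </object>
  <folder></folder>
  <filename>100.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
</annotation>
"""


def read_annotation(xml_bytes):
    """Return (filename, [(xmin, ymin, xmax, ymax), ...]) from one annotation."""
    root = ET.fromstring(xml_bytes)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        box = tuple(
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append(box)
    return root.findtext("filename"), boxes


if __name__ == "__main__":
    filename, boxes = read_annotation(SAMPLE)
    print(filename, boxes)
```

For a file on disk, the same fields can be read by replacing `ET.fromstring(xml_bytes)` with `ET.parse(path).getroot()`.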
Please let me know if you encounter any issues.