Part of the book series:Lecture Notes in Computer Science ((LNISA,volume 11295))
Included in the following conference series:
2837Accesses
Abstract
Scene text detection (STD) in natural images is still challenging since text objects exhibit vast diversity in fonts, scales and orientations. Deep learning based state-of-the-art STD methods are promising such as PixelLink which has achieved 85% accuracy on ICDAR 2015 benchmark. Our preliminary experimental results with PixelLink have shown that its detection errors come mainly from two aspects: failing to detect the small scale and ambiguous text objects. In this paper, following the powerful PixelLink framework, we try to improve the STD performance via delicately designing a new fused semantic segmentation network with attention. Specifically, an inception module is carefully designed to extract multi-scale receptive field features aiming at enhancing feature representation. Besides, a hierarchical feature fusion module is cascaded with the inception module to capture multi-level inception features to obtain more semantic information. At last, to suppress background disturbance and better locate the text objects, an attention module is developed to learn a probability heat map of texts which helps accurately infer the texts even for ambiguous texts. Experimental results on three public benchmarks demonstrate the effectiveness of our proposed method compared with the state-of-the-arts. We note that the highest F-measure on ICADR 2015, ICADR 2013 and MSRA-TD500 has been obtained for our proposed method but the higher computational cost is required.
This is a preview of subscription content,log in via an institution to check access.
Access this chapter
Subscribe and save
- Get 10 units per month
- Download Article/Chapter or eBook
- 1 Unit = 1 Article or 1 Chapter
- Cancel anytime
Buy Now
- Chapter
- JPY 3498
- Price includes VAT (Japan)
- eBook
- JPY 5719
- Price includes VAT (Japan)
- Softcover Book
- JPY 7149
- Price includes VAT (Japan)
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: detecting scene text via instance segmentation (2018)
He, P., Huang, W., He, T., Zhu, Q., Qiao, Y., Li, X.: Single shot text detector with regional attention. In: IEEE International Conference on Computer Vision, pp. 3066–3074 (2017)
Dai, Y., Huang, Z., Gao, Y., Chen, K.: Fused text segmentation networks for multi-oriented scene text detection (2017)
He, W., Zhang, X.Y., Yin, F., Liu, C.L.: Deep direct regression for multi-oriented scene text detection, pp. 745–753 (2017)
Tian, Z., Huang, W., He, T., He, P., Qiao, Y.: Detecting text in natural image with connectionist text proposal network. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 56–72. Springer, Cham (2016).https://doi.org/10.1007/978-3-319-46484-8_4
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell.39, 1137–1149 (2017)
Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016).https://doi.org/10.1007/978-3-319-46448-0_2
Dai, J., He, K., Sun, J.: Instance-aware semantic segmentation via multi-task network cascades. In: Computer Vision and Pattern Recognition, pp. 3150–3158 (2016)
Zhang, Z., Shen, W., Yao, C., Bai, X.: Symmetry-based text line detection in natural scenes. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2558–2567 (2015)
Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2315–2324 (2016)
Liao, M., Shi, B., Bai, X., Wang, X., Liu, W.: TextBoxes: a fast text detector with a single deep neural network (2016)
Yao, C., Bai, X., Liu, W.: A unified framework for multioriented text detection and recognition. IEEE Trans. Image Process.23, 4737–4749 (2014)
Nagaoka, Y., Miyazaki, T., Sugaya, Y., Omachi, S.: Text detection by faster R-CNN with multiple region proposal networks. In: IAPR International Conference on Document Analysis and Recognition, pp. 15–20 (2017)
Liao, M., Zhu, Z., Shi, B., Xia, G., Bai, X.: Rotation-sensitive regression for oriented scene text detection (2018)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
He, T., Huang, W., Qiao, Y., Yao, J.: Accurate text localization in natural image with cascaded convolutional text network (2016)
Zhang, Z., Zhang, C., Shen, W., Yao, C., Liu, W., Bai, X.: Multi-oriented text detection with fully convolutional networks. In: Computer Vision and Pattern Recognition, pp. 4159–4167 (2016)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. Comput. Sci. (2014)
Szegedy, C., et al.: Going deeper with convolutions, pp. 1–9 (2014)
Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions (2016)
Zhou, X., et al.: EAST: an efficient and accurate scene text detector, pp. 2642–2651 (2017)
Karatzas, D., et al.: ICDAR 2013 robust reading competition. In: International Conference on Document Analysis and Recognition, pp. 1484–1493 (2013)
Karatzas, D., et al.: ICDAR 2015 competition on robust reading. In: International Conference on Document Analysis and Recognition, pp. 1156–1160 (2015)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments, pp. 3482–3490 (2017)
Ma, J., et al.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed.PP, 1 (2017)
Acknowledgment
This paper was partially supported by the Shenzhen Science & Technology Fundamental Research Program (No.: JCYJ20160330095814461) & Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality (ZDSYS201703031405467). Special Acknowledgements are given to Aoto-PKUSZ Joint Research Center of Artificial Intelligence on Scene Cognition & Technology Innovation for its support.
Author information
Authors and Affiliations
ADSPLAB, School of ECE, Peking University, Shenzhen, China
Chao Liu, Yuexian Zou & Dongming Yang
Peng Cheng Laboratory, Shenzhen, China
Yuexian Zou
- Chao Liu
You can also search for this author inPubMed Google Scholar
- Yuexian Zou
You can also search for this author inPubMed Google Scholar
- Dongming Yang
You can also search for this author inPubMed Google Scholar
Corresponding author
Correspondence toYuexian Zou.
Editor information
Editors and Affiliations
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Ioannis Kompatsiaris
EURECOM, Sophia Antipolis, France
Benoit Huet
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Vasileios Mezaris
Dublin City University, Dublin, Ireland
Cathal Gurrin
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Stefanos Vrochidis
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, C., Zou, Y., Yang, D. (2019). Enhancing Scene Text Detection via Fused Semantic Segmentation Network with Attention. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science(), vol 11295. Springer, Cham. https://doi.org/10.1007/978-3-030-05710-7_44
Download citation
Published:
Publisher Name:Springer, Cham
Print ISBN:978-3-030-05709-1
Online ISBN:978-3-030-05710-7
eBook Packages:Computer ScienceComputer Science (R0)
Share this paper
Anyone you share the following link with will be able to read this content:
Sorry, a shareable link is not currently available for this article.
Provided by the Springer Nature SharedIt content-sharing initiative