Movatterモバイル変換

Part of the book series:Lecture Notes in Computer Science ((LNISA,volume 11295))

Included in the following conference series:

International Conference on Multimedia Modeling

2837Accesses
2Citations

Abstract

Scene text detection (STD) in natural images is still challenging since text objects exhibit vast diversity in fonts, scales and orientations. Deep learning based state-of-the-art STD methods are promising such as PixelLink which has achieved 85% accuracy on ICDAR 2015 benchmark. Our preliminary experimental results with PixelLink have shown that its detection errors come mainly from two aspects: failing to detect the small scale and ambiguous text objects. In this paper, following the powerful PixelLink framework, we try to improve the STD performance via delicately designing a new fused semantic segmentation network with attention. Specifically, an inception module is carefully designed to extract multi-scale receptive field features aiming at enhancing feature representation. Besides, a hierarchical feature fusion module is cascaded with the inception module to capture multi-level inception features to obtain more semantic information. At last, to suppress background disturbance and better locate the text objects, an attention module is developed to learn a probability heat map of texts which helps accurately infer the texts even for ambiguous texts. Experimental results on three public benchmarks demonstrate the effectiveness of our proposed method compared with the state-of-the-arts. We note that the highest F-measure on ICADR 2015, ICADR 2013 and MSRA-TD500 has been obtained for our proposed method but the higher computational cost is required.

This is a preview of subscription content,log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

¥17,985 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Subscribe now

Buy Now

Chapter: JPY 3498; Price includes VAT (Japan)

eBook: JPY 5719; Price includes VAT (Japan)

Softcover Book: JPY 7149; Price includes VAT (Japan)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

A Novel Attention Mechanism for Scene Text Detection

Semantic Segmentation Architecture for Text Detection with an Attention Module

A spatial feature adaptive network for text detection

Article28 February 2022

References

Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: detecting scene text via instance segmentation (2018)
Google Scholar
He, P., Huang, W., He, T., Zhu, Q., Qiao, Y., Li, X.: Single shot text detector with regional attention. In: IEEE International Conference on Computer Vision, pp. 3066–3074 (2017)
Google Scholar
Dai, Y., Huang, Z., Gao, Y., Chen, K.: Fused text segmentation networks for multi-oriented scene text detection (2017)
Google Scholar
He, W., Zhang, X.Y., Yin, F., Liu, C.L.: Deep direct regression for multi-oriented scene text detection, pp. 745–753 (2017)
Google Scholar
Tian, Z., Huang, W., He, T., He, P., Qiao, Y.: Detecting text in natural image with connectionist text proposal network. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 56–72. Springer, Cham (2016).https://doi.org/10.1007/978-3-319-46484-8_4
Chapter Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell.39, 1137–1149 (2017)
Article Google Scholar
Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016).https://doi.org/10.1007/978-3-319-46448-0_2
Chapter Google Scholar
Dai, J., He, K., Sun, J.: Instance-aware semantic segmentation via multi-task network cascades. In: Computer Vision and Pattern Recognition, pp. 3150–3158 (2016)
Google Scholar
Zhang, Z., Shen, W., Yao, C., Bai, X.: Symmetry-based text line detection in natural scenes. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2558–2567 (2015)
Google Scholar
Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2315–2324 (2016)
Google Scholar
Liao, M., Shi, B., Bai, X., Wang, X., Liu, W.: TextBoxes: a fast text detector with a single deep neural network (2016)
Google Scholar
Yao, C., Bai, X., Liu, W.: A unified framework for multioriented text detection and recognition. IEEE Trans. Image Process.23, 4737–4749 (2014)
Article MathSciNet Google Scholar
Nagaoka, Y., Miyazaki, T., Sugaya, Y., Omachi, S.: Text detection by faster R-CNN with multiple region proposal networks. In: IAPR International Conference on Document Analysis and Recognition, pp. 15–20 (2017)
Google Scholar
Liao, M., Zhu, Z., Shi, B., Xia, G., Bai, X.: Rotation-sensitive regression for oriented scene text detection (2018)
Google Scholar
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Google Scholar
He, T., Huang, W., Qiao, Y., Yao, J.: Accurate text localization in natural image with cascaded convolutional text network (2016)
Google Scholar
Zhang, Z., Zhang, C., Shen, W., Yao, C., Liu, W., Bai, X.: Multi-oriented text detection with fully convolutional networks. In: Computer Vision and Pattern Recognition, pp. 4159–4167 (2016)
Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. Comput. Sci. (2014)
Google Scholar
Szegedy, C., et al.: Going deeper with convolutions, pp. 1–9 (2014)
Google Scholar
Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions (2016)
Google Scholar
Zhou, X., et al.: EAST: an efficient and accurate scene text detector, pp. 2642–2651 (2017)
Google Scholar
Karatzas, D., et al.: ICDAR 2013 robust reading competition. In: International Conference on Document Analysis and Recognition, pp. 1484–1493 (2013)
Google Scholar
Karatzas, D., et al.: ICDAR 2015 competition on robust reading. In: International Conference on Document Analysis and Recognition, pp. 1156–1160 (2015)
Google Scholar
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments, pp. 3482–3490 (2017)
Google Scholar
Ma, J., et al.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed.PP, 1 (2017)
Google Scholar

Download references

Acknowledgment

This paper was partially supported by the Shenzhen Science & Technology Fundamental Research Program (No.: JCYJ20160330095814461) & Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality (ZDSYS201703031405467). Special Acknowledgements are given to Aoto-PKUSZ Joint Research Center of Artificial Intelligence on Scene Cognition & Technology Innovation for its support.

Author information

Authors and Affiliations

ADSPLAB, School of ECE, Peking University, Shenzhen, China
Chao Liu, Yuexian Zou & Dongming Yang
Peng Cheng Laboratory, Shenzhen, China
Yuexian Zou

Authors

Chao Liu
View author publications
You can also search for this author inPubMed Google Scholar
Yuexian Zou
View author publications
You can also search for this author inPubMed Google Scholar
Dongming Yang
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence toYuexian Zou.

Editor information

Editors and Affiliations

Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Ioannis Kompatsiaris
EURECOM, Sophia Antipolis, France
Benoit Huet
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Vasileios Mezaris
Dublin City University, Dublin, Ireland
Cathal Gurrin
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Stefanos Vrochidis

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Liu, C., Zou, Y., Yang, D. (2019). Enhancing Scene Text Detection via Fused Semantic Segmentation Network with Attention. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science(), vol 11295. Springer, Cham. https://doi.org/10.1007/978-3-030-05710-7_44

Download citation

DOI:https://doi.org/10.1007/978-3-030-05710-7_44
Published:08 December 2018
Publisher Name:Springer, Cham
Print ISBN:978-3-030-05709-1
Online ISBN:978-3-030-05710-7
eBook Packages:Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Movatterモバイル変換

Enhancing Scene Text Detection via Fused Semantic Segmentation Network with Attention

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

A Novel Attention Mechanism for Scene Text Detection

Semantic Segmentation Architecture for Text Detection with an Attention Module

A spatial feature adaptive network for text detection

References

Acknowledgment

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Access this chapter

Subscribe and save

Buy Now