Enhancing Scene Text Detection via Fused Semantic Segmentation Network with Attention

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11295)

Abstract

Scene text detection (STD) in natural images remains challenging because text objects vary widely in font, scale, and orientation. Deep-learning-based methods are the current state of the art; PixelLink, for example, achieves 85% accuracy on the ICDAR 2015 benchmark. Our preliminary experiments with PixelLink show that its detection errors stem mainly from two sources: missed small-scale text and missed ambiguous text. In this paper, we build on the PixelLink framework and improve STD performance with a new fused semantic segmentation network with attention. Specifically, an inception module extracts features with multi-scale receptive fields to strengthen the feature representation. A hierarchical feature fusion module is cascaded with the inception module to combine multi-level inception features and capture richer semantic information. Finally, to suppress background interference and localize text more precisely, an attention module learns a probability heat map of text, which helps infer text accurately even when it is ambiguous. Experimental results on three public benchmarks demonstrate the effectiveness of the proposed method compared with state-of-the-art methods. The proposed method obtains the highest F-measure on ICDAR 2015, ICDAR 2013, and MSRA-TD500, at the expense of a higher computational cost.
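The abstract does not spell out how the attention module combines its heat map with the features, so the following is only a minimal sketch of one common way such a heat map could be used, not the authors' implementation. It assumes the heat map is a per-pixel sigmoid of raw scores and that features are reweighted by element-wise multiplication; the function names, the toy 2x3 feature map, and the raw scores are all hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def text_heat_map(scores):
    """Turn per-pixel raw text scores into a probability heat map."""
    return [[sigmoid(s) for s in row] for row in scores]


def apply_attention(features, heat_map):
    """Weight each feature value by the text probability at the same pixel.

    Assumption: gating by element-wise multiplication. The paper may
    combine the heat map with the features differently.
    """
    return [
        [f * p for f, p in zip(f_row, p_row)]
        for f_row, p_row in zip(features, heat_map)
    ]


# Hypothetical single-channel 2x3 feature map and raw attention scores.
features = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]]
scores = [[-10.0, 0.0, 10.0],
          [10.0, -10.0, 0.0]]

heat_map = text_heat_map(scores)
attended = apply_attention(features, heat_map)
```

Under these assumptions, pixels with strongly negative scores (likely background) are pushed toward zero, while pixels with strongly positive scores (likely text) keep almost all of their feature response.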

Acknowledgment

This paper was partially supported by the Shenzhen Science & Technology Fundamental Research Program (No. JCYJ20160330095814461) and the Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality (ZDSYS201703031405467). Special acknowledgements are given to the Aoto-PKUSZ Joint Research Center of Artificial Intelligence on Scene Cognition & Technology Innovation for its support.

Author information

Authors and Affiliations

  1. ADSPLAB, School of ECE, Peking University, Shenzhen, China

    Chao Liu, Yuexian Zou & Dongming Yang

  2. Peng Cheng Laboratory, Shenzhen, China

    Yuexian Zou

Corresponding author

Correspondence to Yuexian Zou.

Editor information

Editors and Affiliations

  1. Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece

    Ioannis Kompatsiaris

  2. EURECOM, Sophia Antipolis, France

    Benoit Huet

  3. Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece

    Vasileios Mezaris

  4. Dublin City University, Dublin, Ireland

    Cathal Gurrin

  5. National Chiao Tung University, Hsinchu, Taiwan

    Wen-Huang Cheng

  6. Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece

    Stefanos Vrochidis

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, C., Zou, Y., Yang, D. (2019). Enhancing Scene Text Detection via Fused Semantic Segmentation Network with Attention. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11295. Springer, Cham. https://doi.org/10.1007/978-3-030-05710-7_44
