
Feature Enhancement with Text-Specific Region Contrast for Scene Text Detection

  • Conference paper

Abstract

As a fundamental step in most visual text-related tasks, scene text detection has long been widely studied. However, due to the diversity of the foreground (aspect ratios, colors, shapes, etc.) and the complexity of the background, scene text detection still faces many challenges. It is often difficult to obtain discriminative text-level features when dealing with overlapping text regions or ambiguous adjacent regions, resulting in suboptimal detection performance. In this paper, we propose Text-specific Region Contrast (TRC), based on contrastive learning, to enhance the features of text regions. Specifically, to formulate positive and negative sample pairs for contrast-based training, we divide regions in scene text images into three categories, i.e., text regions, backgrounds, and text-adjacent regions. Furthermore, we design a Text Multi-scale Strip Convolutional Attention module, called TextMSCA, to refine embedding features for precise contrast. We find that the learned features can focus on complete text regions and effectively tackle the ambiguity problem. Additionally, our method is lightweight and can be implemented in a plug-and-play manner while maintaining a high inference speed. Extensive experiments conducted on multiple benchmarks verify that the proposed method consistently improves the baseline by significant margins.
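The region-contrast idea in the abstract can be illustrated with a small sketch. This is not the authors' formulation, which the abstract does not give. It assumes an InfoNCE-style loss in which each text pixel is pulled toward a text-region prototype and pushed away from background and text-adjacent prototypes. The label ids, function names, prototype averaging, and temperature value are all hypothetical choices made for illustration.

```python
import math

# Hypothetical label ids for the three region categories named in the abstract.
BACKGROUND, TEXT, ADJACENT = 0, 1, 2


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def region_prototypes(embeddings, labels):
    """Average the normalized embeddings of each region category (assumed design)."""
    sums, counts = {}, {}
    for vec, lab in zip(embeddings, labels):
        unit = l2_normalize(vec)
        acc = sums.setdefault(lab, [0.0] * len(unit))
        for i, x in enumerate(unit):
            acc[i] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: l2_normalize([x / counts[lab] for x in acc])
            for lab, acc in sums.items()}


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax of the positive similarity."""
    def sim(a, b):
        return sum(x * y for x, y in zip(a, b)) / temperature

    pos = sim(anchor, positive)
    logits = [pos] + [sim(anchor, n) for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - pos


def text_region_contrast_loss(embeddings, labels, temperature=0.1):
    """Mean InfoNCE loss over text pixels, contrasted against region prototypes."""
    protos = region_prototypes(embeddings, labels)
    if TEXT not in protos:
        return 0.0  # no text pixels, so there is nothing to contrast
    negatives = [protos[k] for k in (BACKGROUND, ADJACENT) if k in protos]
    losses = [info_nce(l2_normalize(vec), protos[TEXT], negatives, temperature)
              for vec, lab in zip(embeddings, labels) if lab == TEXT]
    return sum(losses) / len(losses)


# Toy usage: four 2-D pixel embeddings with one label each.
labels = [TEXT, TEXT, BACKGROUND, ADJACENT]
separated = text_region_contrast_loss(
    [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [-1.0, 0.0]], labels)
mixed = text_region_contrast_loss(
    [[1.0, 0.0], [1.0, 0.1], [1.0, 0.05], [1.0, 0.02]], labels)
print(separated < mixed)
```

In this toy setting the loss is lower when text embeddings point away from the background and text-adjacent embeddings than when all three categories share nearly the same direction, which is the behavior a contrast-based objective of this kind is meant to reward.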

Supported by the Natural Science Foundation of China (Grant No. 62376266) and by the Key Research Program of Frontier Sciences, CAS (Grant No. ZDBS-LY-7024).



Author information

Authors and Affiliations

  1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China

    Xurui Sun, Jiahao Lyu, Yifei Zhang, Bo Fang, Yu Zhou, Enze Xie & Can Ma

  2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China

    Xurui Sun, Jiahao Lyu, Yifei Zhang, Bo Fang & Enze Xie

  3. School of Information and Communication Engineering, Communication University of China, Beijing, China

    Gangyan Zeng

Corresponding author

Correspondence to Yu Zhou.

Editor information

Editors and Affiliations

  1. Nanjing University of Information Science and Technology, Nanjing, China

    Qingshan Liu

  2. Xiamen University, Xiamen, China

    Hanzi Wang

  3. Beijing University of Posts and Telecommunications, Beijing, China

    Zhanyu Ma

  4. Sun Yat-sen University, Guangzhou, China

    Weishi Zheng

  5. Peking University, Beijing, China

    Hongbin Zha

  6. Chinese Academy of Sciences, Beijing, China

    Xilin Chen

  7. Chinese Academy of Sciences, Beijing, China

    Liang Wang

  8. Xiamen University, Xiamen, China

    Rongrong Ji

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sun, X. et al. (2024). Feature Enhancement with Text-Specific Region Contrast for Scene Text Detection. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14431. Springer, Singapore. https://doi.org/10.1007/978-981-99-8540-1_1
