
A Cost-Efficient Framework for Scene Text Detection in the Wild

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13031)

Included in the following conference series: Pacific Rim International Conference on Artificial Intelligence (PRICAI)

Abstract

Scene text detection in the wild is an active research area in computer vision and has made great progress with the aid of deep learning. However, training deep text detection models requires large amounts of annotations, such as bounding boxes and quadrangles, which are laborious and expensive to obtain. Although synthetic data is easier to acquire, a model trained on it exhibits a large performance gap compared with one trained on real data, owing to domain shift. To address this problem, we propose a novel two-stage framework for cost-efficient scene text detection. Specifically, to unleash the power of synthetic data, we design an unsupervised domain adaptation scheme consisting of Entropy-aware Global Transfer (EGT) and Text Region Transfer (TRT) to pre-train the model. Furthermore, we fine-tune the model on a minimal set of actively annotated real samples together with enhanced pseudo-labeled ones, aiming to save annotation cost. In this framework, both the diversity of the synthetic data and the realism of the unlabeled real data are fully exploited. Extensive experiments on various benchmarks show that the proposed framework significantly outperforms the baseline and achieves desirable performance with only a few labeled real samples.
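The abstract outlines two training stages that map onto well-known recipes. A minimal PyTorch-style sketch of the first stage follows: the detector is pre-trained on labeled synthetic images while a domain classifier, attached through a gradient reversal layer, adversarially aligns global features of the synthetic and real domains. All module names here (`DomainClassifier`, the `det_head.loss` call), the pooling choice, and the loss weight are illustrative assumptions; the paper's actual EGT and TRT modules are not reproduced.

```python
# Sketch of stage-1 pre-training: supervised detection loss on synthetic
# data plus a domain-adversarial loss that pushes the backbone toward
# domain-invariant features. Assumptions are flagged in comments.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales the gradient in the
    backward pass (the standard gradient-reversal trick for adversarial
    domain adaptation)."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class DomainClassifier(nn.Module):
    """Predicts the domain (0 = synthetic, 1 = real) of pooled features."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                 nn.Linear(256, 2))

    def forward(self, x):
        return self.net(x)


def pretrain_step(backbone, det_head, dom_clf, syn_batch, real_imgs, opt,
                  lambd=0.1):
    """One pre-training step. `det_head.loss` is a hypothetical API that
    stands in for any text detector's supervised loss."""
    syn_imgs, syn_targets = syn_batch          # labeled synthetic batch
    syn_feat = backbone(syn_imgs)              # (B, C, H, W)
    real_feat = backbone(real_imgs)            # unlabeled real batch

    det_loss = det_head.loss(syn_feat, syn_targets)

    # Global average pooling, then domain classification through the
    # reversal layer so the backbone is trained to confuse the classifier.
    pooled = torch.cat([syn_feat.mean(dim=(2, 3)),
                        real_feat.mean(dim=(2, 3))], dim=0)
    logits = dom_clf(GradReverse.apply(pooled, lambd))
    labels = torch.cat([
        torch.zeros(syn_imgs.size(0), dtype=torch.long, device=pooled.device),
        torch.ones(real_imgs.size(0), dtype=torch.long, device=pooled.device)])
    dom_loss = F.cross_entropy(logits, labels)

    # An entropy-aware variant could reweight dom_loss per sample by the
    # entropy of its domain prediction; the exact EGT weighting and the
    # region-level TRT branch are not reproduced here.
    loss = det_loss + dom_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return float(loss)
```

The second stage can likewise be sketched as an active-learning split of the unlabeled real pool: spend the small annotation budget on the images the pre-trained model is least certain about, and pseudo-label the confidently predicted rest for fine-tuning. The entropy scoring rule and the `predict_text_probs` call below are assumptions for illustration, not the paper's selection criterion.

```python
# Sketch of stage-2 sample selection under the assumptions stated above.
import torch


def split_annotation_budget(model, pool, budget, conf_thresh=0.8):
    """Return (indices to annotate manually, {index: pseudo label}).
    `pool` yields (index, image); `model.predict_text_probs` is a
    hypothetical API returning per-region text probabilities in [0, 1]."""
    model.eval()
    scored, pseudo = [], {}
    with torch.no_grad():
        for idx, img in pool:
            probs = model.predict_text_probs(img)
            # Mean binary entropy over regions: high = uncertain image.
            ent = -(probs * torch.log(probs + 1e-8)
                    + (1 - probs) * torch.log(1 - probs + 1e-8)).mean()
            scored.append((ent.item(), idx))
            # Confidence = distance from the 0.5 decision boundary.
            conf = (probs - 0.5).abs().mul(2)
            if conf.numel() > 0 and conf.min().item() > conf_thresh:
                pseudo[idx] = probs > 0.5   # keep confident detections
    scored.sort(reverse=True)               # most uncertain first
    chosen = [idx for _, idx in scored[:budget]]
    return chosen, {i: p for i, p in pseudo.items() if i not in chosen}
```

In practice the pseudo labels would be filtered and refreshed as fine-tuning proceeds; the thresholds here are placeholders.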



Acknowledgments

This work is supported by the Open Research Project of the State Key Laboratory of Media Convergence and Communication, Communication University of China (No. SKLMCC2020KF004), the Beijing Municipal Science & Technology Commission (Z191100007119002), the Key Research Program of Frontier Sciences, CAS (Grant No. ZDBS-LY-7024), and the National Natural Science Foundation of China (No. 62006221).

Author information

Authors and Affiliations

  1. Communication University of China, Beijing, China

    Gangyan Zeng & Yuan Zhang

  2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China

    Yu Zhou & Xiaomeng Yang

  3. School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China

    Yu Zhou & Xiaomeng Yang

Authors
  1. Gangyan Zeng
  2. Yuan Zhang
  3. Yu Zhou
  4. Xiaomeng Yang

Corresponding author

Correspondence to Yu Zhou.

Editor information

Editors and Affiliations

  1. MIMOS Berhad, Kuala Lumpur, Malaysia

    Duc Nghia Pham

  2. Sirindhorn International Institute of Science and Technology, Thammasat University, Mueang Pathum Thani, Thailand

    Thanaruk Theeramunkong

  3. Data61, CSIRO, Brisbane, QLD, Australia

    Guido Governatori

  4. Department of Philosophy, Tsinghua University, Beijing, China

    Fenrong Liu

Rights and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Zeng, G., Zhang, Y., Zhou, Y., Yang, X. (2021). A Cost-Efficient Framework for Scene Text Detection in the Wild. In: Pham, D.N., Theeramunkong, T., Governatori, G., Liu, F. (eds) PRICAI 2021: Trends in Artificial Intelligence. PRICAI 2021. Lecture Notes in Computer Science, vol 13031. Springer, Cham. https://doi.org/10.1007/978-3-030-89188-6_11
