Progressive Evolution from Single-Point to Polygon for Scene Text

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14808)

Included in the following conference series:

International Conference on Document Analysis and Recognition


Abstract

The advancement of text shape representations towards compactness has enhanced text detection and spotting performance, but at a high annotation cost. Current models use single-point annotations to reduce costs, yet such annotations lack sufficient localization information for downstream applications. To overcome this limitation, we introduce Point2Polygon, which can efficiently transform single points into compact polygons. Our method uses a coarse-to-fine process: it first creates and selects anchor points based on recognition confidence, then refines the polygon vertically and horizontally using recognition information to optimize its shape. We demonstrate the accuracy of the generated polygons through extensive experiments: 1) by creating polygons from ground truth points, we achieved an accuracy of 82.0% on ICDAR 2015; 2) by training detectors with polygons generated by our method, we attained 86% of the accuracy obtained when training with ground truth (GT); 3) the proposed Point2Polygon can be seamlessly integrated with single-point spotters so that they generate polygons, and this integration led to 82.5% accuracy for the generated polygons. Our method relies solely on synthetic recognition information, eliminating the need for any manual annotation beyond single points.
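The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes axis-aligned boxes instead of polygons, and it replaces the text recognizer with a toy scoring function (`toy_score`, the overlap with a hidden box) that stands in for recognition confidence. All function names, candidate sizes, and the step size are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); boxes, not polygons (assumption)


def make_candidates(point: Tuple[float, float], sizes: Sequence[Tuple[float, float]]) -> List[Box]:
    """Coarse step: propose boxes of several sizes centred on the annotated point."""
    px, py = point
    return [(px - w / 2, py - h / 2, px + w / 2, py + h / 2) for w, h in sizes]


def refine(box: Box, score: Callable[[Box], float], axes: Sequence[int],
           step: float = 1.0, max_iters: int = 200) -> Tuple[Box, float]:
    """Fine step: greedily shift the edges listed in `axes` while the score improves."""
    current = list(box)
    best = score(tuple(current))
    for _ in range(max_iters):
        improved = False
        for i in axes:
            for delta in (-step, step):
                trial = current.copy()
                trial[i] += delta
                if trial[0] >= trial[2] or trial[1] >= trial[3]:
                    continue  # skip degenerate boxes
                s = score(tuple(trial))
                if s > best:
                    current, best, improved = trial, s, True
        if not improved:
            break
    return tuple(current), best


def point_to_box(point, score, sizes):
    """Select the most confident candidate, then refine vertically and horizontally."""
    anchor = max(make_candidates(point, sizes), key=score)
    box, _ = refine(anchor, score, axes=(1, 3))   # vertical: top and bottom edges
    return refine(box, score, axes=(0, 2))        # horizontal: left and right edges


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Toy stand-in for recognition confidence: overlap with a hidden text box.
HIDDEN_TEXT_BOX: Box = (10.0, 20.0, 50.0, 30.0)


def toy_score(box: Box) -> float:
    return iou(box, HIDDEN_TEXT_BOX)


box, conf = point_to_box((30.0, 25.0), toy_score, sizes=[(20, 8), (30, 12), (60, 20)])
print(box, conf)
```

In this toy setup the refinement recovers the hidden box exactly, because the stand-in score rises monotonically as each edge approaches it. A real recognizer's confidence is noisier, so a practical system would likely need a more robust search than this greedy one.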

L. Deng and M. Huang—Equal contribution.



Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 62225603, No. 62206104).

Author information

Authors and Affiliations

Huazhong University of Science and Technology, Wuhan, China
Linger Deng, Xudong Xie, Yuliang Liu & Xiang Bai
South China University of Technology, Guangzhou, China
Mingxin Huang & Lianwen Jin


Corresponding author

Correspondence to Yuliang Liu.

Editor information

Editors and Affiliations

Luleå Tekniska Universitet, Luleå, Sweden
Elisa H. Barney Smith
Luleå Tekniska Universitet, Luleå, Sweden
Marcus Liwicki
Tsinghua University, Beijing, China
Liangrui Peng

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 47 KB)


Cite this paper

Deng, L., Huang, M., Xie, X., Liu, Y., Jin, L., Bai, X. (2024). Progressive Evolution from Single-Point to Polygon for Scene Text. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14808. Springer, Cham. https://doi.org/10.1007/978-3-031-70549-6_7


DOI: https://doi.org/10.1007/978-3-031-70549-6_7
Published: 09 September 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70548-9
Online ISBN: 978-3-031-70549-6
eBook Packages: Computer Science, Computer Science (R0)
