Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14808)
Abstract
The advancement of text shape representations towards compactness has enhanced text detection and spotting performance, but at a high annotation cost. Current models use single-point annotations to reduce costs, yet such annotations lack sufficient localization information for downstream applications. To overcome this limitation, we introduce Point2Polygon, which can efficiently transform single points into compact polygons. Our method uses a coarse-to-fine process: it first creates anchor points and selects among them based on recognition confidence, then refines the polygon vertically and horizontally using recognition information to optimize its shape. We demonstrate the accuracy of the generated polygons through extensive experiments: 1) by creating polygons from ground truth points, we achieved an accuracy of 82.0% on ICDAR 2015; 2) in training detectors with polygons generated by our method, we attained 86% of the accuracy obtained when training with ground truth (GT); 3) the proposed Point2Polygon can also be seamlessly integrated into single-point spotters so that they generate polygons, and this integration yielded an accuracy of 82.5% for the generated polygons. Notably, our method relies solely on synthetic recognition information, eliminating the need for any manual annotation beyond single points.
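The coarse-to-fine idea summarized above can be illustrated with a minimal, hypothetical sketch. It is not the authors' Point2Polygon implementation: it simplifies the polygon to an axis-aligned box, replaces the recognizer's confidence with a toy scoring function (IoU against a hidden target box), and uses a plain greedy edge search for the vertical and horizontal refinement. The function names make_anchors, refine_edges and point_to_box are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical simplification: a text region is an axis-aligned box
# (x0, y0, x1, y1) rather than a full polygon.
Box = Tuple[float, float, float, float]
# Assumed stand-in for recognition confidence: any function mapping a box to a score.
Score = Callable[[Box], float]


def make_anchors(point: Tuple[float, float], widths: Sequence[float],
                 heights: Sequence[float]) -> List[Box]:
    """Coarse stage: candidate boxes of several sizes centred on the point."""
    x, y = point
    return [(x - w / 2, y - h / 2, x + w / 2, y + h / 2)
            for w in widths for h in heights]


def refine_edges(box: Box, score: Score, edges: Sequence[int],
                 step: float, max_iters: int = 100) -> Box:
    """Fine stage: greedily shift the given edges by +/-step while the score improves.

    Edge indices follow the Box layout: 0=left, 1=top, 2=right, 3=bottom.
    """
    best, best_score = box, score(box)
    for _ in range(max_iters):
        improved = False
        for edge in edges:
            for delta in (-step, step):
                cand = list(best)
                cand[edge] += delta
                if cand[0] >= cand[2] or cand[1] >= cand[3]:
                    continue  # skip degenerate (empty) boxes
                cand_score = score(tuple(cand))
                if cand_score > best_score:
                    best, best_score, improved = tuple(cand), cand_score, True
        if not improved:
            break  # local optimum reached for these edges
    return best


def point_to_box(point: Tuple[float, float], score: Score,
                 widths: Sequence[float], heights: Sequence[float],
                 step: float = 1.0) -> Box:
    """Coarse anchor selection followed by vertical, then horizontal refinement."""
    anchors = make_anchors(point, widths, heights)
    box = max(anchors, key=score)                  # keep the most confident anchor
    box = refine_edges(box, score, (1, 3), step)   # vertical: top and bottom edges
    box = refine_edges(box, score, (0, 2), step)   # horizontal: left and right edges
    return box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (used here only as a toy score)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy

    def area(r: Box) -> float:
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    target = (10.0, 20.0, 70.0, 40.0)  # toy "true" word region

    def toy_score(b: Box) -> float:
        return iou(b, target)

    # A single point slightly off the centre of the target region.
    print(point_to_box((38.0, 31.0), toy_score, (20, 40, 60, 80), (10, 20, 30)))
    # prints (10.0, 20.0, 70.0, 40.0)
```

In a real pipeline the score would come from a text recognizer applied to the cropped region rather than from a known target box. Because the greedy search only moves an edge when the score strictly improves, it stops at a local optimum, which is why a reasonable coarse anchor matters.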
L. Deng and M. Huang—Equal contribution.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 62225603, No. 62206104).
Author information
Authors and Affiliations
Huazhong University of Science and Technology, Wuhan, China
Linger Deng, Xudong Xie, Yuliang Liu & Xiang Bai
South China University of Technology, Guangzhou, China
Mingxin Huang & Lianwen Jin
Corresponding author
Correspondence to Yuliang Liu.
Editor information
Editors and Affiliations
Luleå Tekniska Universitet, Luleå, Sweden
Elisa H. Barney Smith
Luleå Tekniska Universitet, Luleå, Sweden
Marcus Liwicki
Tsinghua University, Beijing, China
Liangrui Peng
Electronic supplementary material is available for this chapter.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Deng, L., Huang, M., Xie, X., Liu, Y., Jin, L., Bai, X. (2024). Progressive Evolution from Single-Point to Polygon for Scene Text. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14808. Springer, Cham. https://doi.org/10.1007/978-3-031-70549-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70548-9
Online ISBN: 978-3-031-70549-6
eBook Packages: Computer Science, Computer Science (R0)