1 Shanghai Univ. of Electric Power (China)
2 Univ. of Chinese Academy of Sciences (China)
3 Univ. of Moratuwa (Sri Lanka)
*Address all correspondence to Minglei Tong, tongminglei@shiep.edu.cn
- 1 Introduction
- 2 Related Work
- 2.1 Transformer-Free Methods
- 2.2 Hybrid Methods
- 2.3 Purely Transformer Methods
- 3 Methodology
- 3.1 Tokenization
- 3.2 Bidirectional Semantic Fusion Framework
- 3.3 Prediction and Selection Strategy
- 4 Experiment
- 4.1 Datasets and Implementation Details
- 4.2 Ablation Studies
- 4.2.1 Subword-based tokenization
- 4.2.2 Bidirectional semantic fusion framework
- 4.2.3 Selection strategy
- 4.3 Comparison with the State-of-the-Art
- 4.4 Further Analysis and Insights
- 4.5 Limitations
- 5 Conclusion
Most current approaches in the scene text recognition literature train the language model on text data far sparser than the corpora used in natural language processing, resulting in inadequately trained language priors. We therefore propose a simple transformer encoder–decoder model, the multilingual semantic fusion network (MSFN), that leverages prior linguistic knowledge to learn robust language features. First, we label the text dataset with forward sequences, backward sequences, and subwords, the latter extracted by tokenization with linguistic information. Then we introduce a multilingual model into the decoder, with three channels corresponding to the three labelings of the dataset. The final output fuses the three channels to yield more accurate results. In experiments, MSFN achieves cutting-edge performance across six benchmark datasets, and extensive ablation studies demonstrate the effectiveness of the proposed method. Code is available at https://github.com/lclee0577/MLViT.
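To make the labeling scheme concrete, the sketch below shows how a single text sample could be expanded into the three label channels named above: forward characters, backward characters, and subwords. It is a minimal illustration only; the helper `make_labels` and the choice of a generic WordPiece tokenizer from the HuggingFace `transformers` library are assumptions, not the tokenizer or API actually used by MSFN.

```python
# Minimal sketch of building the three label channels for one sample.
# NOTE: make_labels and the "bert-base-uncased" WordPiece tokenizer are
# illustrative stand-ins, not the paper's actual tokenization pipeline.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def make_labels(text: str):
    """Return the forward, backward, and subword labelings of `text`."""
    forward = list(text)                  # characters, read left to right
    backward = forward[::-1]              # the same characters, right to left
    subwords = tokenizer.tokenize(text)   # linguistically informed subword units
    return forward, backward, subwords

fwd, bwd, sub = make_labels("reading")
print(fwd)  # ['r', 'e', 'a', 'd', 'i', 'n', 'g']
print(bwd)  # ['g', 'n', 'i', 'd', 'a', 'e', 'r']
print(sub)  # ['reading']; a rarer word splits, e.g. ['read', '##ing']
```

Each channel would then supervise its own decoder branch, and the three predictions are fused following the prediction and selection strategy of Sec. 3.3.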