Electrical Engineering and Systems Science > Audio and Speech Processing
arXiv:2207.12089 (eess)
[Submitted on 1 Jul 2022]
Title:A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese
View a PDF of the paper titled A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese, by Song Zhang and 3 other authors
View PDFAbstract:Grapheme-to-phoneme (G2P) conversion is an indispensable part of the Chinese Mandarin text-to-speech (TTS) system, and the core of G2P conversion is to solve the problem of polyphone disambiguation, which is to pick up the correct pronunciation for several candidates for a Chinese polyphonic character. In this paper, we propose a Chinese polyphone BERT model to predict the pronunciations of Chinese polyphonic characters. Firstly, we create 741 new Chinese monophonic characters from 354 source Chinese polyphonic characters by pronunciation. Then we get a Chinese polyphone BERT by extending a pre-trained Chinese BERT with 741 new Chinese monophonic characters and adding a corresponding embedding layer for new tokens, which is initialized by the embeddings of source Chinese polyphonic characters. In this way, we can turn the polyphone disambiguation task into a pre-training task of the Chinese polyphone BERT. Experimental results demonstrate the effectiveness of the proposed model, and the polyphone BERT model obtain 2% (from 92.1% to 94.1%) improvement of average accuracy compared with the BERT-based classifier model, which is the prior state-of-the-art in polyphone disambiguation.
Comments: | Accepted for INTERSPEECH 2022 |
Subjects: | Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG) |
Cite as: | arXiv:2207.12089 [eess.AS] |
(orarXiv:2207.12089v1 [eess.AS] for this version) | |
https://doi.org/10.48550/arXiv.2207.12089 arXiv-issued DOI via DataCite |
Full-text links:
Access Paper:
- View PDF
- TeX Source
- Other Formats
View a PDF of the paper titled A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese, by Song Zhang and 3 other authors
Current browse context:
eess.AS
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer(What is the Explorer?)
Connected Papers(What is Connected Papers?)
Litmaps(What is Litmaps?)
scite Smart Citations(What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv(What is alphaXiv?)
CatalyzeX Code Finder for Papers(What is CatalyzeX?)
DagsHub(What is DagsHub?)
Gotit.pub(What is GotitPub?)
Hugging Face(What is Huggingface?)
Papers with Code(What is Papers with Code?)
ScienceCast(What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower(What are Influence Flowers?)
CORE Recommender(What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community?Learn more about arXivLabs.