Chinese Mandarin Grapheme-to-Phoneme Converter. Converts Chinese text to Zhuyin (Bopomofo) or Pinyin (INTERSPEECH 2022)
GitYCC/g2pW
Authors: Yi-Chang Chen, Yu-Chuan Chang, Yen-Cheng Chang, and Yi-Ren Yeh
This is the official repository of our paper g2pW: A Conditional Weighted Softmax BERT for Polyphone Disambiguation in Mandarin (INTERSPEECH 2022).
- g2pW is included in PaddlePaddle/PaddleSpeech
- g2pW is included in mozillazg/pypinyin-g2pW
(This work was tested with PyTorch 1.7.0, CUDA 10.1, python 3.6 and Ubuntu 16.04.)
Install PyTorch
$ pip install g2pw
>>> from g2pw import G2PWConverter
>>> conv = G2PWConverter()
>>> sentence = '上校請技術人員校正FN儀器'
>>> conv(sentence)
[['ㄕㄤ4', 'ㄒㄧㄠ4', 'ㄑㄧㄥ3', 'ㄐㄧ4', 'ㄕㄨ4', 'ㄖㄣ2', 'ㄩㄢ2', 'ㄐㄧㄠ4', 'ㄓㄥ4', None, None, 'ㄧ2', 'ㄑㄧ4']]
>>> sentences = ['銀行', '行動']
>>> conv(sentences)
[['ㄧㄣ2', 'ㄏㄤ2'], ['ㄒㄧㄥ2', 'ㄉㄨㄥ4']]
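In the example output, each inner list appears to hold one entry per input character, with None where no reading was produced (here the Latin letters F and N). Under that assumption, the following sketch pairs each character with its reading. The helper name align_readings is hypothetical and not part of g2pw, and the readings are copied from the output shown above rather than computed, so the snippet runs without the model.

```python
from typing import List, Optional, Tuple


def align_readings(sentence: str,
                   readings: List[Optional[str]]) -> List[Tuple[str, Optional[str]]]:
    """Pair each character with its reading (hypothetical helper, not a g2pw API).

    Assumes one reading per character, as the example output suggests.
    """
    if len(sentence) != len(readings):
        raise ValueError("expected exactly one reading per character")
    return list(zip(sentence, readings))


sentence = '上校請技術人員校正FN儀器'
# Copied from the converter output shown above, not recomputed here.
readings = ['ㄕㄤ4', 'ㄒㄧㄠ4', 'ㄑㄧㄥ3', 'ㄐㄧ4', 'ㄕㄨ4', 'ㄖㄣ2', 'ㄩㄢ2',
            'ㄐㄧㄠ4', 'ㄓㄥ4', None, None, 'ㄧ2', 'ㄑㄧ4']

pairs = align_readings(sentence, readings)

# Characters that received no reading.
unconverted = [ch for ch, reading in pairs if reading is None]

# The polyphone 校 occurs twice; collect the reading at each position.
xiao_readings = [reading for ch, reading in pairs if ch == '校']
```

Pairing characters with readings makes the disambiguation visible: the two occurrences of 校 receive different readings depending on context.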
conv = G2PWConverter(model_dir='./G2PWModel-v2-onnx/', model_source='./path-to/bert-base-chinese/')
>>> from g2pw import G2PWConverter
>>> conv = G2PWConverter(style='pinyin', enable_non_tradional_chinese=True)
>>> conv('然而,他红了20年以后,他竟退出了大家的视线。')
[['ran2', 'er2', None, 'ta1', 'hong2', 'le5', None, None, 'nian2', 'yi3', 'hou4', None, 'ta1', 'jing4', 'tui4', 'chu1', 'le5', 'da4', 'jia1', 'de5', 'shi4', 'xian4', None]]
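With style='pinyin', the example output writes each syllable with a trailing tone digit, and 5 appears on particles such as le5 and de5, which matches the usual convention of marking the neutral tone with 5. A downstream step may want the syllable and tone separately. The sketch below assumes that trailing-digit format; split_tone is a hypothetical helper, not part of g2pw, and the sample list is a prefix of the output shown above.

```python
from typing import Optional, Tuple


def split_tone(syllable: Optional[str]) -> Optional[Tuple[str, Optional[int]]]:
    """Split 'ran2' into ('ran', 2); pass None through unchanged.

    Hypothetical helper: assumes the tone, when present, is a single
    trailing digit, as in the example output.
    """
    if syllable is None:
        return None
    if syllable and syllable[-1].isdigit():
        return syllable[:-1], int(syllable[-1])
    return syllable, None  # no tone digit found


# First entries of the converter output shown above (copied, not recomputed).
readings = ['ran2', 'er2', None, 'ta1', 'hong2', 'le5']
parsed = [split_tone(r) for r in readings]

# Keep only the syllables that carry tone 5 in this sample.
tone5 = [p[0] for p in parsed if p is not None and p[1] == 5]
```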
$ git clone https://github.com/GitYCC/g2pW.git
For example, we train models on CPP dataset as follows:
$ bash cpp_dataset/download.sh
$ python scripts/train_g2p_bert.py --config configs/config_cpp.py
$ python scripts/test_g2p_bert.py \
    --config saved_models/CPP_BERT_M_DescWS-Sec-cLin-B_POSw01/config.py \
    --checkpoint saved_models/CPP_BERT_M_DescWS-Sec-cLin-B_POSw01/best_accuracy.pth \
    --sent_path cpp_dataset/test.sent \
    --output_path output_pred.txt
$ python scripts/predict_g2p_bert.py \
    --config saved_models/CPP_BERT_M_DescWS-Sec-cLin-B_POSw01/config.py \
    --checkpoint saved_models/CPP_BERT_M_DescWS-Sec-cLin-B_POSw01/best_accuracy.pth \
    --sent_path cpp_dataset/test.sent \
    --lb_path cpp_dataset/test.lb
To cite the code, data, or paper, please use this BibTeX entry:
@inproceedings{chen22d_interspeech,
  title     = {g2pW: A Conditional Weighted Softmax BERT for Polyphone Disambiguation in Mandarin},
  author    = {Yi-Chang Chen and Yu-Chuan Steven and Yen-Cheng Chang and Yi-Ren Yeh},
  year      = {2022},
  booktitle = {Interspeech 2022},
  pages     = {1926--1930},
  doi       = {10.21437/Interspeech.2022-216},
  issn      = {2958-1796},
}