Chinese Mandarin Grapheme-to-Phoneme Converter. 中文轉注音或拼音 (INTERSPEECH 2022)

GitYCC/g2pW

Authors: Yi-Chang Chen, Yu-Chuan Chang, Yen-Cheng Chang and Yi-Ren Yeh

This is the official repository of our paper "g2pW: A Conditional Weighted Softmax BERT for Polyphone Disambiguation in Mandarin" (INTERSPEECH 2022).

News

Getting Started

Dependency / Install

(This work was tested with PyTorch 1.7.0, CUDA 10.1, Python 3.6 and Ubuntu 16.04.)

Quick Demo

>>> from g2pw import G2PWConverter
>>> conv = G2PWConverter()
>>> sentence = '上校請技術人員校正FN儀器'
>>> conv(sentence)
[['ㄕㄤ4', 'ㄒㄧㄠ4', 'ㄑㄧㄥ3', 'ㄐㄧ4', 'ㄕㄨ4', 'ㄖㄣ2', 'ㄩㄢ2', 'ㄐㄧㄠ4', 'ㄓㄥ4', None, None, 'ㄧ2', 'ㄑㄧ4']]
>>> sentences = ['銀行', '行動']
>>> conv(sentences)
[['ㄧㄣ2', 'ㄏㄤ2'], ['ㄒㄧㄥ2', 'ㄉㄨㄥ4']]
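Judging from the demo output above, the converter appears to return one entry per input character, with None for characters it does not convert (here the Latin letters F and N). The sketch below pairs each character with its reading under that assumption. `pair_with_characters` is a hypothetical helper, not part of g2pW, and the readings are copied from the demo so the snippet runs without loading the model.

```python
from typing import List, Optional, Tuple


def pair_with_characters(
    sentence: str, readings: List[Optional[str]]
) -> List[Tuple[str, Optional[str]]]:
    """Pair each character with its predicted reading (hypothetical helper).

    Assumes one reading per character, as the demo output suggests.
    """
    if len(sentence) != len(readings):
        raise ValueError("expected exactly one reading per character")
    return list(zip(sentence, readings))


# Readings copied from the Quick Demo output; no model is loaded here.
sentence = '上校請技術人員校正FN儀器'
readings = ['ㄕㄤ4', 'ㄒㄧㄠ4', 'ㄑㄧㄥ3', 'ㄐㄧ4', 'ㄕㄨ4', 'ㄖㄣ2', 'ㄩㄢ2',
            'ㄐㄧㄠ4', 'ㄓㄥ4', None, None, 'ㄧ2', 'ㄑㄧ4']

pairs = pair_with_characters(sentence, readings)

# Characters that received no reading (None in the converter output).
unconverted = [char for char, reading in pairs if reading is None]
```

In this sentence the polyphone 校 occurs twice and receives two different readings (ㄒㄧㄠ4 in 上校, ㄐㄧㄠ4 in 校正), which is the disambiguation the model is built for.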

Load Offline Model

conv = G2PWConverter(model_dir='./G2PWModel-v2-onnx/', model_source='./path-to/bert-base-chinese/')
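When working offline, it may help to confirm that both local directories exist before constructing the converter, so a wrong path fails early with a clear message. The following is a minimal sketch. `offline_kwargs` is a hypothetical helper, not part of g2pW, and the only constructor arguments it relies on are the `model_dir` and `model_source` shown above.

```python
from pathlib import Path
from typing import Dict


def offline_kwargs(model_dir: str, model_source: str) -> Dict[str, str]:
    """Validate local paths and return constructor kwargs (hypothetical helper)."""
    for name, path in (("model_dir", model_dir), ("model_source", model_source)):
        if not Path(path).is_dir():
            raise FileNotFoundError(f"{name} is not a directory: {path}")
    return {"model_dir": model_dir, "model_source": model_source}


# Usage, assuming both directories were downloaded beforehand:
#
#   from g2pw import G2PWConverter
#   conv = G2PWConverter(**offline_kwargs('./G2PWModel-v2-onnx/',
#                                         './path-to/bert-base-chinese/'))
```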

Support Simplified Chinese and Pinyin

>>> from g2pw import G2PWConverter
>>> conv = G2PWConverter(style='pinyin', enable_non_tradional_chinese=True)
>>> conv('然而,他红了20年以后,他竟退出了大家的视线。')
[['ran2', 'er2', None, 'ta1', 'hong2', 'le5', None, None, 'nian2', 'yi3', 'hou4', None, 'ta1', 'jing4', 'tui4', 'chu1', 'le5', 'da4', 'jia1', 'de5', 'shi4', 'xian4', None]]
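Digits and punctuation come back as None in the pinyin output above. One way to turn the result into a readable line is to fall back to the original character at those positions. This is a sketch only: `render_reading` is a hypothetical helper, not part of g2pW, and the readings are copied from the example so the snippet runs without the model.

```python
from typing import List, Optional


def render_reading(sentence: str, readings: List[Optional[str]]) -> str:
    """Join readings with spaces, keeping the original character where the
    reading is None (hypothetical helper; assumes one reading per character)."""
    if len(sentence) != len(readings):
        raise ValueError("expected exactly one reading per character")
    tokens = [reading if reading is not None else char
              for char, reading in zip(sentence, readings)]
    return ' '.join(tokens)


# Readings copied from the pinyin example output; no model is loaded here.
sentence = '然而,他红了20年以后,他竟退出了大家的视线。'
readings = ['ran2', 'er2', None, 'ta1', 'hong2', 'le5', None, None, 'nian2',
            'yi3', 'hou4', None, 'ta1', 'jing4', 'tui4', 'chu1', 'le5', 'da4',
            'jia1', 'de5', 'shi4', 'xian4', None]

line = render_reading(sentence, readings)
print(line)
```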

Scripts

$ git clone https://github.com/GitYCC/g2pW.git

Train Model

For example, to train a model on the CPP dataset:

$ bash cpp_dataset/download.sh
$ python scripts/train_g2p_bert.py --config configs/config_cpp.py

Testing

$ python scripts/test_g2p_bert.py \
    --config saved_models/CPP_BERT_M_DescWS-Sec-cLin-B_POSw01/config.py \
    --checkpoint saved_models/CPP_BERT_M_DescWS-Sec-cLin-B_POSw01/best_accuracy.pth \
    --sent_path cpp_dataset/test.sent \
    --output_path output_pred.txt

Prediction

$ python scripts/predict_g2p_bert.py \
    --config saved_models/CPP_BERT_M_DescWS-Sec-cLin-B_POSw01/config.py \
    --checkpoint saved_models/CPP_BERT_M_DescWS-Sec-cLin-B_POSw01/best_accuracy.pth \
    --sent_path cpp_dataset/test.sent \
    --lb_path cpp_dataset/test.lb

Checkpoints

Citation

To cite the code, data, or paper, please use this BibTeX entry:

@inproceedings{chen22d_interspeech,
  title     = {g2pW: A Conditional Weighted Softmax BERT for Polyphone Disambiguation in Mandarin},
  author    = {Yi-Chang Chen and Yu-Chuan Steven and Yen-Cheng Chang and Yi-Ren Yeh},
  year      = {2022},
  booktitle = {Interspeech 2022},
  pages     = {1926--1930},
  doi       = {10.21437/Interspeech.2022-216},
  issn      = {2958-1796},
}
