Pronunciation lexicon covering both English and Chinese languages for Automatic Speech Recognition.


speechio/BigCiDian


1. Goal

This project is an attempt to create a pronunciation lexicon covering both English and Chinese words in a unified phoneset for ASR applications.

P.S. "CiDian" means "lexicon" in Chinese.

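Typical use cases in Chinese ASR applications (code-switched utterances, with English glosses):

  • 你手机上都装了什么 APP ? (What apps do you have installed on your phone?)
  • APPLE 的新 MACBOOK PRO 真漂亮 (Apple's new MacBook Pro is really beautiful)
  • 上个月 PRADA 出了款新包包 (Prada released a new bag last month)
  • 手机开了 GPRS 导航 (The phone has GPRS navigation turned on)
  • 世界杯 H 组小组赛 (World Cup Group H group stage)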

2. Phoneset

The unified phoneset should be simple and precise, and it should cover both languages. Note that the mappings listed below are heavily based on IPA.

2.1 English Phoneset Mapping

English entries are derived from CMUDict 0.7b, hence we need a mapping from the ARPA phoneset to the target phoneset.

| ARPA | IPA | CMUDict example entries |
|------|-----|-------------------------|
| AA0 | a | icon: AY1 K AA0 N |
| AA1 | a | heart: HH AA1 R T |
| AA2 | a | kmart: K EY1 M AA2 R T |
| AE0 | æ | romance: R OW1 M AE0 N S |
| AE1 | æ | lambda: L AE1 M D AH0 |
| AE2 | æ | setback: S EH1 T B AE2 K |
| AH0 | ə | station: S T EY1 SH AH0 N |
| AH1 | ʌ | bug: B AH1 G |
| AH2 | ʌ | haircut: HH EH1 R K AH2 T |
| AO0 | ɔ | hongkong: HH AO1 NG K AO0 NG |
| AO1 | ɔ | law: L AO1 |
| AO2 | ɔ | layoff: L EY1 AO2 F |
| AW0 | au | foundation: F AW0 N D EY1 SH AH0 N |
| AW1 | au | founder: F AW1 N D ER0 |
| AW2 | au | hometown: HH OW1 M T AW2 N |
| AY0 | ai | hypothese: HH AY0 P AA1 TH AH0 S IY2 Z |
| AY1 | ai | ice: AY1 S |
| AY2 | ai | iceland: AY1 S L AH0 N D |
| B | b | bike: B AY1 K |
| CH | ch | chase: CH EY1 S |
| D | d | desk: D EH1 S K |
| DH | ð | those: DH OW1 Z |
| EH0 | e | princess: P R IH1 N S EH0 S |
| EH1 | e | professor: P R AH0 F EH1 S ER0 |
| EH2 | e | progress: P R AA1 G R EH2 S |
| ER0 | ə r | programmer: P R OW1 G R AE2 M ER0 |
| ER1 | ə r | purge: P ER1 JH |
| ER2 | ə r | showgirl: SH OW1 G ER2 L |
| EY0 | ei | eighteen: EY0 T IY1 N |
| EY1 | ei | email: IY0 M EY1 L |
| EY2 | ei | thursday: TH ER1 Z D EY2 |
| F | f | face: F EY1 S |
| G | g | give: G IH1 V |
| HH | h | hey: HH EY1 |
| IH0 | i | facing: F EY1 S IH0 NG |
| IH1 | i | fear: F IH1 R |
| IH2 | i | fellowship: F EH1 L OW0 SH IH2 P |
| IY0 | ii | email: IY0 M EY1 L |
| IY1 | ii | prefix: P R IY1 F IH0 K S |
| IY2 | ii | increase: IH1 N K R IY2 S |
| JH | zh | gesture: JH EH1 S CH ER0 |
| K | k | cat: K AE1 T |
| L | l | lack: L AE1 K |
| M | m | may: M EY1 |
| N | n | no: N OW1 |
| NG | ŋ | thing: TH IH1 NG |
| OW0 | əu | crypto: K R IH1 P T OW0 |
| OW1 | əu | token: T OW1 K AH0 N |
| OW2 | əu | earphone: IH1 R F OW2 N |
| OY0 | ɔi | invoice: IH1 N V OY0 S |
| OY1 | ɔi | floyd: F L OY1 D |
| OY2 | ɔi | episode: EH1 P IH0 S OW2 D |
| P | p | pat: P AE1 T |
| R | r | risk: R IH1 S K |
| S | s | sing: S IH1 NG |
| SH | sh | shake: SH EY1 K |
| T | t | test: T EH1 S T |
| TH | θ | think: TH IH1 NG K |
| UH0 | u | fulfill: F UH0 L F IH1 L |
| UH1 | u | full: F UH1 L |
| UH2 | u | goodbye: G UH2 D B AY1 |
| UW0 | uu | rescue: R EH1 S K Y UW0 |
| UW1 | uu | fool: F UW1 L |
| UW2 | uu | restroom: R EH1 S T R UW2 M |
| V | v | very: V EH1 R IY0 |
| W | w | west: W EH1 S T |
| Y | y | yes: Y EH1 S |
| Z | z | zero: Z IY1 R OW0 |
| ZH | ʒ | illusion: IH2 L UW1 ZH AH0 N |

Note: if you find anything that doesn't make sense in the mapping table, please let me know. Thanks.
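The mapping in section 2.1 can be applied mechanically to a CMUDict pronunciation. The following is a minimal sketch, not the project's actual conversion script: the dictionary is transcribed from the table in this section, and the names `ARPA_TO_TARGET` and `arpa_to_target` are hypothetical. It relies on the observation that stress digits only matter for AH (AH0 maps to ə, AH1/AH2 map to ʌ), and that ER expands to two target phones.

```python
# Stress-free ARPA symbol -> target phone(s), transcribed from the table above.
# "AH0" is listed explicitly because it is the only stress-dependent mapping.
ARPA_TO_TARGET = {
    "AA": "a", "AE": "æ", "AH0": "ə", "AH": "ʌ", "AO": "ɔ",
    "AW": "au", "AY": "ai", "B": "b", "CH": "ch", "D": "d",
    "DH": "ð", "EH": "e", "ER": "ə r", "EY": "ei", "F": "f",
    "G": "g", "HH": "h", "IH": "i", "IY": "ii", "JH": "zh",
    "K": "k", "L": "l", "M": "m", "N": "n", "NG": "ŋ",
    "OW": "əu", "OY": "ɔi", "P": "p", "R": "r", "S": "s",
    "SH": "sh", "T": "t", "TH": "θ", "UH": "u", "UW": "uu",
    "V": "v", "W": "w", "Y": "y", "Z": "z", "ZH": "ʒ",
}


def arpa_to_target(pron: str) -> str:
    """Convert a space-separated ARPA pronunciation to the target phoneset."""
    out = []
    for sym in pron.split():
        # Try the stress-marked symbol first (only AH0 hits), then the bare symbol.
        target = ARPA_TO_TARGET.get(sym) or ARPA_TO_TARGET[sym.rstrip("012")]
        # One ARPA symbol may expand to several target phones (ER -> "ə r").
        out.extend(target.split())
    return " ".join(out)


print(arpa_to_target("HH AA1 R T"))        # heart
print(arpa_to_target("JH EH1 S CH ER0"))   # gesture
```

An unknown symbol raises `KeyError`, which is a deliberate choice in this sketch so that unmapped phones are noticed rather than silently passed through.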

2.2 Chinese PinYin Mapping

Chinese entries are extracted from the DaCiDian project.

Here is a PinYin to IPA mapping from an educational perspective: https://resources.allsetlearning.com/chinese/pronunciation/Pinyin_chart

With a few mapping modifications and symbolic adaptations, this yields the final PinYin to target phoneset mapping.

2.3 Tone

The Chinese PinYin system normally has 5 tones, numbered 0 to 4, whereas English has no tone. In BigCiDian, Chinese tonal information is retained and merged with untoned English, so a phoneme in the resulting phoneset may have up to 6 tonal variants (1 from English and 5 from Chinese):

e.g. for phoneme *ai*:

1. HI -> h ai
2. 哎 -> ai_0
3. 掰 -> b ai_1
4. 还 -> h ai_2
5. 凯 -> k ai_3
6. 外 -> w ai_4
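To make the tone convention concrete, here is a small sketch that groups toned and untoned phones by their base phoneme, using the six *ai* pronunciations listed above. The helper names `base_phone` and `tonal_variants` are hypothetical; the only assumption taken from the text is that a Chinese tone is written as a `_N` suffix and English phones carry no suffix.

```python
from collections import defaultdict


def base_phone(phone: str) -> str:
    """Drop the Chinese tone suffix: 'ai_3' -> 'ai'; an untoned 'ai' is unchanged."""
    return phone.split("_", 1)[0]


def tonal_variants(prons):
    """Map each base phoneme to the set of toned/untoned forms seen in `prons`."""
    variants = defaultdict(set)
    for pron in prons:
        for phone in pron.split():
            variants[base_phone(phone)].add(phone)
    return variants


# The six pronunciations from the example above (HI, 哎, 掰, 还, 凯, 外).
prons = ["h ai", "ai_0", "b ai_1", "h ai_2", "k ai_3", "w ai_4"]
variants = tonal_variants(prons)
print(sorted(variants["ai"]))  # one untoned English form plus five Chinese tones
```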

2.4 The unified phoneset

The final unified bi-lingual phoneset details are listed below:

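| phoneme | CN example | EN example |
|---------|------------|------------|
| a | 把: b a_3 | AACHEN: a k ə n |
| æ | | CAT: k æ t |
| ai | 爱: ai_4 | KITE: k ai t |
| an | 安: an_1 | |
| aŋ | 羊: y aŋ_2 | |
| au | 老: l au_3 | LOUD: l au d |
| b | 白: b ai_2 | BUT: b ʌ t |
| ch | 陈: ch ən_2 | CHEST: ch e s t |
| d | 大: d a_4 | DAY: d ei |
| ð | | THIS: ð i s |
| e | | BED: b e d |
| ei | 累: l ei_4 | LAKE: l ei k |
| ə | 鹅: ə_2 | COCA-COLA: k əu k ə k əu l a |
| ən | 陈: ch ən_2 | |
| əŋ | 横: h əŋ_2 | |
| ər | 二: ər_4 | |
| əu | 欧: əu_1 | BOAT: b əu t |
| f | 房: f aŋ_2 | FACE: f ei s |
| g | 刚: g aŋ_1 | GIVE: g i v |
| h | 海: h ai_3 | HUG: h ʌ g |
| i | 天: t i an_1 | HIT: h i t |
| ie | 别: b ie_2 | |
| ii | 比: b ii_3 | BEAT: b ii t |
| iii | 吃: ch iii_1 | |
| in | 音: y in_1 | |
| iŋ | 听: t iŋ_1 | |
| j | 九: j i əu_3 | |
| k | 看: k an_4 | CAKE: k ei k |
| l | 来: l ai_2 | LAKE: l ei k |
| m | 马: m a_3 | MAKE: m ei k |
| n | 那: n a_1 | NIKE: n ai k ii |
| ŋ | | INTERESTING: i n t ə r e s t i ŋ |
| ɔ | | OFF: ɔ f |
| ɔi | | JOY: zh ɔi |
| p | 胖: p aŋ_4 | PACE: p ei s |
| q | 钱: q i an_2 | |
| r | 让: ʒ aŋ_4 | RISK: r i s k |
| s | 丝: s iii_1 | SING: s i ŋ |
| sh | 上: sh aŋ_4 | SHAKE: sh ei k |
| t | 团: t u an_2 | TIME: t ai m |
| ts | 才: ts ai_2 | |
| u | | BOOK: b u k |
| uŋ | 从: ts uŋ_2 | |
| uɔ | 桌: zh uɔ_1 | |
| uu | 不: b uu_4 | TWO: t uu |
| v | | VICTORY: v i k t ə r ii |
| ʌ | | CUT: k ʌ t |
| w | 王: w aŋ_2 | WEST: w e s t |
| x | 西: x ii_1 | |
| y | 言: y an_2 | YES: y e s |
| yu | 去: q yu_4 | |
| yue | 缺: q yue_1 | |
| z | 赞: z an_4 | ZOO: z uu |
| zh | 中: zh uŋ_1 | GESTURE: zh e s ch ə r |
| ʒ | 让: ʒ aŋ_4 | LEISURE: l e ʒ ə r |
| θ | | THINK: θ i ŋ k |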

So overall there are 56 phonemes in the unified phoneset (regardless of tones).

Theoretically, some phonemes could be split at a finer granularity (e.g. au -> a u, ɔi -> ɔ i, an -> a n ...), making the phoneset even more compact. But it is common practice that larger acoustic modeling units benefit Chinese ASR accuracy, and decision-tree based state-tying makes the base phoneset size less relevant to the ASR problem.

I may or may not change the unified phoneset in the future; currently it seems sufficient for my purpose.

3. Usage

Running `sh run.sh` should give you a ready-to-use bi-lingual ASR lexicon (lexicon.txt) and a phoneset list (phones.list) in the project root directory.
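Once the two files exist, a quick sanity check is to confirm that every phone used in the lexicon also appears in the phoneset list. The sketch below assumes, without confirmation from this README, that lexicon.txt holds one entry per line in the common `WORD phone1 phone2 ...` whitespace-separated layout and that phones.list holds one phone per line. The function names `parse_lexicon` and `unknown_phones` are hypothetical.

```python
from collections import defaultdict


def parse_lexicon(lines):
    """Parse 'WORD phone1 phone2 ...' lines into {word: [pronunciation, ...]}."""
    lexicon = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:      # skip blank or malformed lines
            continue
        word, phones = fields[0], fields[1:]
        if phones not in lexicon[word]:   # keep alternative pronunciations, drop exact repeats
            lexicon[word].append(phones)
    return lexicon


def unknown_phones(lexicon, phone_list):
    """Return phones used in the lexicon that are missing from the phone list."""
    known = set(phone_list)
    return {p for prons in lexicon.values() for pron in prons for p in pron if p not in known}


# In-memory stand-ins for the generated files (assumed layout, see above).
sample_lexicon = parse_lexicon(["HI h ai", "外 w ai_4", "", "HI h ai"])
print(unknown_phones(sample_lexicon, ["h", "ai", "w", "ai_4"]))

# With the real files, something like the following should work if the
# assumed layout holds:
#   with open("lexicon.txt", encoding="utf-8") as f:
#       lexicon = parse_lexicon(f)
#   with open("phones.list", encoding="utf-8") as f:
#       phones = [line.strip() for line in f if line.strip()]
#   print(unknown_phones(lexicon, phones))
```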

4. Extend entries

To extend the final lexicon with entries of your own interest (say "IPHONE", "华为P30"), you can either:

  • add those entries to the underlying sources (CMUDict and DaCiDian)

or:

  • maintain a separate extension lexicon, and merge it with the main lexicon automatically generated above.
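The second option can be sketched as a simple line-level merge. This is an illustration under assumptions, not a script from the project: `merge_lexicons` is a hypothetical helper, the lexicon layout is assumed to be `WORD phone1 phone2 ...` per line, and the pronunciation given for "IPHONE" is only an illustrative guess in the unified phoneset.

```python
def merge_lexicons(main_lines, ext_lines):
    """Append extension entries to the main lexicon, dropping exact duplicates.

    Entries are normalized to single-space separation so that the same
    word/pronunciation pair written with different spacing is merged.
    """
    seen = set()
    merged = []
    for line in list(main_lines) + list(ext_lines):
        fields = line.split()
        if len(fields) < 2:      # skip blank or malformed lines
            continue
        entry = " ".join(fields)
        if entry not in seen:
            seen.add(entry)
            merged.append(entry)
    return merged


main = ["HI h ai"]
extension = ["IPHONE ai f əu n", "HI  h ai"]  # illustrative entries only
for entry in merge_lexicons(main, extension):
    print(entry)
```

Keeping the extension in its own file means the main lexicon can be regenerated from the sources at any time and the merge simply re-run.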

5. Experiment result

On the AISHELL-2 Mandarin ASR task, the Chinese lexicon (DaCiDian) was replaced with the multilingual CN-EN lexicon (BigCiDian). Details are shown below:

For DaCiDian, system performance:

----- test -----:
%WER 44.39 [ 21986 / 49532, 338 ins, 2085 del, 19563 sub ] exp/mono/decode_test/cer_9_0.0
%WER 24.25 [ 12011 / 49532, 393 ins, 792 del, 10826 sub ] exp/tri1/decode_test/cer_12_0.0
%WER 22.13 [ 10963 / 49532, 396 ins, 644 del, 9923 sub ] exp/tri2/decode_test/cer_12_0.0
%WER 19.29 [ 9555 / 49532, 263 ins, 640 del, 8652 sub ] exp/tri3/decode_test/cer_13_0.5
%WER 8.33 [ 4125 / 49532, 84 ins, 192 del, 3849 sub ] exp/chain/tdnn_1a/decode_test/cer_8_0.5

For BigCiDian, system performance:

%WER 43.92 [ 21754 / 49532, 405 ins, 1574 del, 19775 sub ] exp/mono/decode_test/cer_7_0.0
%WER 22.54 [ 11163 / 49532, 406 ins, 652 del, 10105 sub ] exp/tri1/decode_test/cer_11_0.0
%WER 21.09 [ 10445 / 49532, 377 ins, 609 del, 9459 sub ] exp/tri2/decode_test/cer_12_0.0
%WER 18.47 [ 9148 / 49532, 265 ins, 621 del, 8262 sub ] exp/tri3/decode_test/cer_13_0.5
%WER 8.22 [ 4072 / 49532, 68 ins, 260 del, 3744 sub ] exp/chain/tdnn_1a/decode_test/cer_9_0.5
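To put the two result listings side by side, the short script below recomputes the error rates from the raw error counts reported above and derives the relative error reduction for each system. The counts are copied from the listings; the names `ERRORS` and `relative_reduction` are introduced here purely for illustration.

```python
TOTAL = 49532  # number of reference tokens in the test set, from the listings

# system -> (DaCiDian errors, BigCiDian errors), copied from the results above
ERRORS = {
    "mono": (21986, 21754),
    "tri1": (12011, 11163),
    "tri2": (10963, 10445),
    "tri3": (9555, 9148),
    "chain/tdnn_1a": (4125, 4072),
}


def relative_reduction(baseline: int, new: int) -> float:
    """Fraction of baseline errors removed by the new system."""
    return (baseline - new) / baseline


for system, (dacidian, bigcidian) in ERRORS.items():
    print(
        f"{system:14s} "
        f"{100 * dacidian / TOTAL:6.2f}% -> {100 * bigcidian / TOTAL:6.2f}%  "
        f"relative reduction {100 * relative_reduction(dacidian, bigcidian):5.2f}%"
    )
```

The relative reduction is largest for the tri1 system (about 7%) and smallest for the mono and chain systems (roughly 1%).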

Conclusion

  • BigCiDian gives only slightly better results than DaCiDian.
  • More importantly, BigCiDian turns a pure Chinese ASR system into a multilingual system, which is what most Chinese ASR applications need nowadays.

THE END
