CN112102812B - Anti-false wake-up method based on multiple acoustic models - Google Patents

Anti-false wake-up method based on multiple acoustic models

Info

Publication number
CN112102812B
Authority
CN
China
Prior art keywords
voice recognition
firmware
model
language
acoustic
Prior art date
Legal status
Active
Application number
CN202011300127.7A
Other languages
Chinese (zh)
Other versions
CN112102812A (en)
Inventor
舒畅
何云鹏
许兵
Current Assignee
Chipintelli Technology Co Ltd
Original Assignee
Chipintelli Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chipintelli Technology Co Ltd
Priority to CN202011300127.7A
Publication of CN112102812A
Application granted
Publication of CN112102812B
Status: Active
Anticipated expiration

Abstract

An anti-false-wake-up method based on multiple acoustic models, and a corresponding voice recognition module, are provided. The anti-false-wake-up method comprises the following steps: S1, selecting the corpora required to train a plurality of acoustic models of different languages, respectively; S2, processing the corpora; S3, training each acoustic model with its corpus; S4, packaging each trained acoustic model with its language model to form voice recognition firmware for the different languages, and burning the firmware into a voice recognition module; and S5, the voice recognition module inputs the audio to be recognized into all of the voice recognition firmware simultaneously, and only when every firmware recognizes it as the command word at the same time does the voice recognition module accept the command word and execute the command. By training voice recognition firmware for two different languages and running both simultaneously, the invention effectively prevents non-command words from being misrecognized as command words because of similar-sounding (harmonic) words, without affecting the normal recognition of command words.

Description

Anti-false wake-up method based on multiple acoustic models
Technical Field
The invention belongs to the technical field of voice recognition, and in particular relates to an anti-false-wake-up method based on multiple acoustic models.
Background
With the growing demand for human-computer interaction, applications of speech recognition appear in more and more areas of daily life. As speech interaction matures, users increasingly expect speech recognition to be both convenient and accurate. Speech recognition is achieved by matching a language model with an acoustic model.
For example, because only part of the English corpus is labeled as garbage words when an English model is trained, in a Chinese environment some Chinese words whose pronunciation is similar to an English command word are misrecognized by the English model as that command word, and the corresponding action is triggered. The prior-art solution is to expand the garbage lexicon used to train the English acoustic model. This can reduce false wake-ups of the English model in an English environment and improve its misrecognition in a Chinese environment, but it still does not solve the problem effectively.
Disclosure of Invention
To overcome the above technical defects in the prior art, the invention discloses an anti-false-wake-up method based on multiple acoustic models.
The multi-acoustic-model-based anti-false-wake-up method of the invention comprises the following steps:
S1, selecting the corpora required to train a plurality of acoustic models of different languages, respectively; the corpora used to train the different acoustic models all contain the command-word corpus;
S2, processing the corpora so that the pronunciations of the command words in the command-word corpus are the same or similar under all of the language models;
S3, training each corresponding acoustic model with its processed corpus;
S4, packaging each trained acoustic model with its language model to form voice recognition firmware for the different languages, and burning the firmware into a voice recognition module;
S5, during recognition, the voice recognition module inputs the audio to be recognized into all of the voice recognition firmware simultaneously; only when every firmware recognizes it as the command word at the same time does the voice recognition module accept the command word and execute the command;
if the firmware do not all recognize the command word simultaneously, the audio is not treated as a command word;
the plurality of language models and the plurality of voice recognition firmware differ only in the language they target; in step S2, the pronunciation of each command word in the several language models is defined by the pronunciation rules of a single language, which may be any one of the different languages of step S1.
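The unanimous-vote rule of step S5 can be sketched in a few lines of Python. The function and the command-word set below are hypothetical illustrations of the logic only, not the patent's implementation.

```python
# Sketch of the step-S5 decision rule: an utterance is accepted as a
# command word only when every language-specific recognition firmware
# reports the same command word. All names here are illustrative.

COMMAND_WORDS = {"START", "WARMER"}  # example command-word set

def judge_command(firmware_outputs):
    """firmware_outputs: one recognition result (a string, or None for
    'nothing recognized') per voice recognition firmware. Returns the
    accepted command word, or None if the firmware do not all agree."""
    if not firmware_outputs:
        return None
    first = firmware_outputs[0]
    if first in COMMAND_WORDS and all(out == first for out in firmware_outputs):
        return first  # every firmware recognized the same command word
    return None  # at least one firmware disagreed: not a command

# Both firmware agree, so the command is accepted.
print(judge_command(["WARMER", "WARMER"]))
# Only one firmware fires (e.g. on a similar-sounding word): rejected.
print(judge_command(["WARMER", None]))
```

Note that the rule is a strict AND over all firmware: a single dissenting firmware is enough to reject the utterance, which is exactly what suppresses harmonic-word false wake-ups.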
Specifically, for two different languages the method comprises the following steps:
S11, selecting the corpora required to train the first acoustic model and the second acoustic model, respectively; the corpora used to train the two acoustic models both contain the command-word corpus;
S21, processing the corpora so that the pronunciations of the command words are the same or similar under the first language model and the second language model;
S31, training the first acoustic model and the second acoustic model with the two processed corpora, respectively;
S41, packaging each trained acoustic model with its language model to form a first voice recognition firmware and a second voice recognition firmware, and burning both into a voice recognition module;
S51, during recognition, the voice recognition module inputs the audio to be recognized into the first and second voice recognition firmware simultaneously; only when both firmware recognize it as the command word at the same time does the voice recognition module accept the command word and execute the command;
if both firmware do not recognize the command word simultaneously, the audio is not treated as a command word;
the first acoustic model, first language model and first voice recognition firmware target one of the two languages, and the second acoustic model, second language model and second voice recognition firmware target the other.
Preferably, the corpora used to train the first and second acoustic models in step S11 are different.
Preferably, the training of step S3 is performed with the Kaldi toolkit.
The invention also discloses a voice recognition module based on multiple acoustic models, comprising voice recognition firmware of different languages. Every command word is given a pronunciation in every voice recognition firmware and is marked there as a command word. The voice recognition module further comprises a command-word judgment module whose judgment rule is as follows: only when all of the voice recognition firmware recognize the command word is it judged to be a command word; otherwise it is not treated as a command word.
With the above anti-false-wake-up method and voice recognition module based on multiple acoustic models, voice recognition firmware of two different languages is trained and both perform recognition simultaneously. This effectively prevents non-command words from being misrecognized as command words because of similar-sounding words, without affecting the normal recognition of command words.
Drawings
FIG. 1 is a diagram illustrating an embodiment of a speech recognition firmware according to the present invention;
FIG. 2 is a diagram illustrating an embodiment of speech signal recognition according to the present invention.
Detailed Description
The following provides a more detailed description of the present invention.
The multi-acoustic-model-based anti-false-wake-up method of the invention comprises the following steps:
S1, selecting the corpora required to train a plurality of acoustic models of different languages, respectively; the corpora used to train the different acoustic models all contain the command-word corpus;
S2, processing the corpora so that the pronunciations of the command words in the command-word corpus are the same or similar under all of the language models;
S3, training each corresponding acoustic model with its processed corpus;
S4, packaging each trained acoustic model with its language model to form voice recognition firmware for the different languages, and burning the firmware into a voice recognition module;
S5, during recognition, the voice recognition module inputs the audio to be recognized into all of the voice recognition firmware simultaneously; only when every firmware recognizes it as the command word at the same time does the voice recognition module accept the command word and execute the command;
if the firmware do not all recognize the command word simultaneously, the audio is not treated as a command word.
In its simplest form, for two languages, the method comprises the following steps.
S11, selecting the corpora required to train the first acoustic model and the second acoustic model, respectively; the corpora used to train the two acoustic models both contain the command-word corpus;
S21, processing the corpora so that the pronunciations of the command words are the same or similar under the first language model and the second language model;
S31, training the first acoustic model and the second acoustic model with the two processed corpora, respectively;
S41, packaging each trained acoustic model with its language model to form a first voice recognition firmware and a second voice recognition firmware, and burning both into a voice recognition module;
the first acoustic model, first language model and first voice recognition firmware target one of the two languages, and the second acoustic model, second language model and second voice recognition firmware target the other.
as shown in fig. 1, the above steps complete the construction of the speech recognition module, and step S51 completes the specific recognition process.
In the specific recognition process of step S51, the voice recognition module simultaneously inputs the audio to be recognized into the first voice recognition firmware and the second voice recognition firmware, and when the first voice recognition firmware and the second voice recognition firmware simultaneously recognize as a command word, the voice recognition module determines that the command word and executes the command;
if the command word is not recognized at the same time, the command word is not considered.
Take the two most common languages, Chinese and English, as an example: the language corresponding to the first acoustic model, first language model and first voice recognition firmware is Chinese, and the language corresponding to the second acoustic model, second language model and second voice recognition firmware is English.
Other languages may also be chosen, and the number of languages may be increased further: as long as two or more languages contain words whose pronunciation is similar to that of a meaningful word, recognition can be performed according to the invention.
In step S11, the corpora needed to train the English and Chinese acoustic models are selected according to the application context.
The corpora used to train the English and Chinese acoustic models may be the same, but are preferably different: training with different corpora expands the garbage lexicon, which better reduces the false-wake-up rate.
The corpora used to train the different models all include the command-word corpus.
In step S21, the corpora are processed, and the Chinese and English language models are generated from the Chinese and English lexicons, so that the pronunciation of each command word is the same or similar under both language models.
For example, for the English command word START, the pronunciation of the corresponding Chinese command word is labeled in pinyin as SI DA TE in the Chinese acoustic model; tone marks may be added, the three syllables carrying the first tone, the fourth tone and the neutral tone respectively.
In step S31, the English and Chinese acoustic models are trained with the processed corpora, respectively.
The training of step S3 or S31 may be performed with Kaldi. Kaldi is written mainly in C++, with Shell, Python and Perl as glue languages; it is a completely free speech recognition toolkit with which the required English and Chinese models can be trained quickly.
In step S41, each trained acoustic model is packaged with its language model to form the first voice recognition firmware and the second voice recognition firmware, which are burned into the voice recognition module.
Through the above process, a voice recognition module based on two acoustic models is obtained, containing the first and second voice recognition firmware as well as a command-word judgment module. The judgment rule of the command-word judgment module is as follows: only when both the first and second voice recognition firmware recognize the command word is it judged to be a command word; otherwise it is not. The specific judgment logic is shown in fig. 2.
In the recognition process, the voice recognition module inputs the audio to be recognized into the Chinese and English voice recognition firmware simultaneously.
When the Chinese and English voice recognition firmware recognize the command word at the same time, the command-word judgment module of the voice recognition module accepts the command word and the command is executed.
If they do not recognize it simultaneously, whether neither firmware recognizes it or only one does, the audio is not treated as a command word.
The invention mainly addresses misrecognition caused by Chinese words whose pronunciation is similar to that of an English command word, or English words whose pronunciation is similar to that of a Chinese command word.
For example, the English command word WARMER, used as a heating command, sounds similar to the common Chinese word for "we" (pinyin WO MEN); when that word is spoken, English voice recognition firmware on its own easily misrecognizes it as the command word WARMER.
During training, WARMER is defined as a command word and its pronunciation is stored in the English language model; in the Chinese language model, the English word WARMER is also given a similar pronunciation under Chinese pronunciation rules, for example WO MEN ER, so that it resembles the English pronunciation of WARMER.
The Chinese word "we" itself is not a command word: it has its pronunciation in the Chinese language model, but no similar pronunciation is assigned to it in the English language model.
When the user actually says WARMER, the Chinese and English language models both contain the same or a similar pronunciation, so the Chinese and English voice recognition firmware recognize it at the same time and it is judged to be a command word.
When the user says "we", the English voice recognition firmware may still misrecognize it as the command word WARMER even though it is not one, but the Chinese voice recognition firmware recognizes that it is not a command word. Because "we" is a native Chinese pronunciation, the Chinese firmware recognizes it more easily and more accurately; therefore, when the Chinese firmware judges it not to be a command word, or classifies it as a garbage word, the module does not accept it as a command word.
In the invention, the corpora used to train the different language models are not entirely the same: only the command-word portion is the same, while the non-command-word (garbage-word) portions differ. For example, some words may not be covered during training in an English environment, or are simply not used in that environment; after the model is placed in a Chinese environment, the pronunciation of some English non-command words is similar to that of a Chinese command word, causing false wake-ups. With two language models recognizing simultaneously, the words that would falsely wake the device in the Chinese environment can be defined as garbage words, and since the Chinese model judges them differently from the English model, the false wake-ups of single-model recognition are avoided.
The invention also discloses a voice recognition module based on multiple acoustic models, comprising voice recognition firmware of different languages formed by the above method. Every command word is given a pronunciation in every voice recognition firmware and is marked there as a command word. The voice recognition module further comprises a command-word judgment module for judging command words. The command-word judgment module is a software module that collects the recognition results of all the voice recognition firmware; only when every firmware recognizes the current voice signal as the command word does the judgment module accept it as a command word, otherwise it does not.
With the above anti-false-wake-up method and voice recognition module based on multiple acoustic models, voice recognition firmware of two different languages is trained and both perform recognition simultaneously, which effectively prevents non-command words from being misrecognized as command words because of similar-sounding words, without affecting the normal recognition of command words.
The foregoing describes preferred embodiments of the invention. Where the preferred embodiments do not obviously contradict one another, they may be combined in any manner. The specific parameters in the embodiments and examples serve only to illustrate the inventors' verification process clearly and are not intended to limit the scope of the invention. The scope of the invention is defined by the claims, and equivalent structural changes made using the description and drawings of the invention are likewise included within that scope.

Claims (4)

CN202011300127.7A, filed 2020-11-19, priority 2020-11-19: Anti-false wake-up method based on multiple acoustic models (Active, CN112102812B)


Publications (2)

Publication Number: Publication Date
CN112102812A (en): 2020-12-18
CN112102812B (en): 2021-02-05

Family

ID=73785894

Families Citing this family (1)

CN114187899B (filed 2021-11-04, granted 2025-07-08), Hangzhou Zhuoxi Institute of Brain and Intelligence: Intelligent voice assistant wake-up word recognition model reinforcement method based on genetic algorithm

Patent Citations (6)

US20130238336A1 (2012-03-08, published 2013-09-12), Google Inc.: Recognizing speech in multiple languages
WO2014197303A1 (2013-06-07, published 2014-12-11), Microsoft Corporation: Language model adaptation using result selection
US20200160838A1 (2018-11-21, published 2020-05-21), Samsung Electronics Co., Ltd.: Speech recognition method and apparatus
CN111292728A (2018-11-21, published 2020-06-16), Samsung Electronics Co., Ltd.: Speech recognition method and device
CN111292750A (2020-03-09, published 2020-06-16): Local voice recognition method based on cloud improvement
CN111710337A (2020-06-16, published 2020-09-25): Voice data processing method and device, computer readable medium and electronic equipment

Family Cites Families (2)

US10923117B2 (2019-02-19, granted 2021-02-16), Tencent America LLC: Best path change rate for unsupervised language model weight selection
CN111933129B (2020-09-11, granted 2021-01-05), Tencent Technology (Shenzhen) Co., Ltd.: Audio processing method, language model training method and device, and computer equipment

Non-Patent Citations (3)

Saurabh Garg et al., "Dual Language Models for Code Switched Speech Recognition", arxiv.org/abs/1711.01048, 2018-08-03
Shubham Toshniwal et al., "Multilingual Speech Recognition With A Single End-To-End Model", arxiv.org/abs/1711.01694, 2018-02-15
Zhou Liang et al., "Research on the correlation between speech recognition accuracy and retrieval performance", Proceedings of the 2nd National Conference on Information Retrieval and Content Security (NCIRCS-2005), 2005-10



Legal Events

Code: Title
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
