A method and device for controlling a smart device by voice
Technical field
Embodiments of the present invention relate to the field of voice control technology, and in particular to a method and device for controlling a smart device by voice.
Background art
In the prior art, there are two main ways to remotely control a smart TV: one is to control the smart TV through infrared communication between a button remote control and the TV; the other is to control the smart TV by voice. When a TV is operated with a remote control, the user cannot use the TV if the remote control fails. Voice control has therefore gradually been replacing remote-control operation.
Voice control brings users many conveniences, simplifies their operating steps, and achieves good human-machine interaction. However, the voice function of current smart TVs mostly supports only Mandarin. Although Mandarin is widely promoted, the Mandarin spoken by users in many regions is not particularly standard, and home users still communicate mainly in their local dialect. As a result, the speech recognition rate of the smart TV is relatively low: a user may call out a voice command several times and the TV may still fail to recognise it, so the user does not get a good interactive experience.
Summary of the invention
Embodiments of the present invention provide a method and device for controlling a smart device by voice, so as to provide a new voice control mode for smart devices, improve speech recognition accuracy, and improve the user's interactive experience.
In a first aspect, an embodiment of the present invention provides a method for controlling a smart device by voice, including:
obtaining a control voice uttered by a user;
comparing the control voice with the voice templates included in a preset language family classification library;
determining, according to the comparison result, the control instruction corresponding to the control voice, and performing the operation corresponding to the control instruction, wherein the language family classification library includes Mandarin voice templates and dialect voice templates associated with smart device control instructions.
Preferably, before the control voice uttered by the user is obtained, the method further includes:
displaying a set sentence on the display screen of the smart device, to prompt the user to read the set sentence aloud;
matching the user voice corresponding to the set sentence against a set test voice library, to determine the user's language family classification;
setting the comparison priority of the voice templates of the determined language family classification in the language family classification library to the highest;
and comparing the control voice with the voice templates included in the preset language family classification library includes:
determining the audio feature code corresponding to the control voice uttered by the current user;
comparing the audio feature code with the voice templates having the highest comparison priority in the language family classification library, to obtain the comparison result.
Preferably, matching the user voice against the preset test voice library corresponding to the set sentence, to determine the user's language family classification, includes:
determining the audio feature code corresponding to the user voice;
matching the audio feature code against the audio feature code templates in the test voice library;
determining the user's language family classification according to the matching result, wherein the language family classification includes Mandarin and dialects.
Preferably, matching the audio feature code against the audio feature code templates in the test voice library includes:
determining the region in which the smart device is located according to its Internet Protocol (IP) address;
obtaining, according to the determined region, the audio feature code template corresponding to that region from the test voice library;
matching the audio feature code against that audio feature code template and determining the matching degree;
when the matching degree does not reach the set threshold, comparing the audio feature code in turn, in a set order, with the remaining audio feature code templates in the test voice library.
Preferably, comparing the control voice with the voice templates included in the preset language family classification library includes:
determining the audio feature code corresponding to the control voice uttered by the current user;
matching the audio feature code in turn, according to a preset comparison priority, against the voice templates included in the language family classification library, to recognise the control voice.
In a second aspect, an embodiment of the present invention further provides a device for controlling a smart device by voice, the device including:
a voice acquisition module, configured to obtain a control voice uttered by a user;
a voice recognition module, configured to compare the control voice with the voice templates included in a preset language family classification library;
an instruction determination module, configured to determine, according to the comparison result, the control instruction corresponding to the control voice and to perform the operation corresponding to the control instruction, wherein the language family classification library includes Mandarin voice templates and dialect voice templates associated with smart device control instructions.
Preferably, the device further includes:
a set sentence display module, configured to display a set sentence on the display screen of the smart device before the control voice uttered by the user is obtained, to prompt the user to read the set sentence aloud;
a language family determination module, configured to match the user voice corresponding to the set sentence against a set test voice library, to determine the user's language family classification;
a priority setting module, configured to set the comparison priority of the voice templates of the determined language family classification in the language family classification library to the highest;
and the instruction determination module is specifically configured to:
determine the audio feature code corresponding to the control voice uttered by the current user;
compare the audio feature code with the voice templates having the highest comparison priority in the language family classification library, to recognise the control voice.
Preferably, the language family determination module includes:
a feature code extraction submodule, configured to determine the audio feature code corresponding to the user voice;
a voice matching submodule, configured to match the audio feature code against the audio feature code templates in the test voice library;
a language family determination submodule, configured to determine the user's language family classification according to the matching result, wherein the language family classification includes Mandarin and dialects.
Preferably, the language family determination submodule is specifically configured to:
determine the region in which the smart device is located according to its Internet Protocol (IP) address;
obtain, according to the determined region, the audio feature code template corresponding to that region from the test voice library;
match the audio feature code against that audio feature code template and determine the matching degree;
when the matching degree does not reach the set threshold, match the audio feature code against the remaining audio feature code templates in the test voice library.
Preferably, the instruction determination module is specifically configured to:
determine the audio feature code corresponding to the control voice uttered by the current user;
match the audio feature code, in the order of a preset comparison priority, against the voice templates included in the language family classification library, to recognise the control voice.
Embodiments of the present invention obtain the control voice uttered by the user, compare the control voice with the voice templates included in the preset language family classification library, determine the control instruction corresponding to the control voice according to the comparison result, and perform the operation corresponding to the control instruction. This addresses the problem in the prior art that the voice control function of a smart device has a low recognition rate for dialects, so that a user may call out a control instruction several times and the smart device may still fail to recognise it. It provides a way to recognise the user's voice control instructions more quickly and improves the user experience.
Brief description of the drawings
Fig. 1 is a flow chart of a method for controlling a smart device by voice in Embodiment one of the present invention;
Fig. 2 is a flow chart of the control voice recognition steps in a method for controlling a smart device by voice in Embodiment two of the present invention;
Fig. 3 is a schematic structural diagram of a device for controlling a smart device by voice in Embodiment three of the present invention.
Detailed description of the invention
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flow chart of a method for controlling a smart device by voice provided by Embodiment one of the present invention. This embodiment applies to the case where a user who does not speak Mandarin, or whose Mandarin is non-standard, uses the voice control function of a smart device. The method may be performed by a device for controlling a smart device by voice, which is configured in a smart device that has data processing capability. The method specifically includes the following steps:
Step 110: obtain the control voice uttered by the user.
The control voice is the voice instruction that the user wants to issue to the smart device; it may be in Mandarin, a dialect, or any other language. The smart device obtains the voice instruction through a microphone, performs pre-processing such as filtering and noise reduction on it, converts the pre-processed voice instruction into a digital signal, and inputs it to the main controller of the smart device. The microphone may be built into the smart device (for example, inside the body of a smart TV) or located separately outside it (for example, outside the TV body), in which case the microphone and the smart device are communicatively connected. If the smart device is a smart TV, then when the user says the TV channel they want to watch, the microphone collects the user's voice data. During collection, pre-processing such as filtering and noise reduction filters out unwanted sound as noise and retains the user's voice data.
Step 120: compare the control voice with the voice templates included in the preset language family classification library.
The language family classification library is a collection of phonemes related to Mandarin, dialects, or other languages. Depending on the language family, the library includes several sub-libraries, for example a Mandarin sub-library, a Shanghainese sub-library, a Northeastern dialect sub-library, a Chongqing dialect sub-library, and a Cantonese sub-library. A phoneme is the smallest unit that makes up a syllable, or the smallest speech segment; it is the smallest linear speech unit distinguished in terms of sound quality. Examples are the initials and finals of Chinese and the vowels and consonants of English. Each language has its own phonemes, and even within one language the phonemes of different dialects differ.
Each sub-library stores a set of phonemes associated with smart device control instructions, and the phoneme set forms voice templates in the form of phrases. The control instructions include operating instructions for smart devices such as turn on the TV, turn off the TV, turn on the air conditioner, watch BTV, and play the News Broadcast programme. A phrase is composed of a set number of continuously pronounced phonemes. For example, the Shanghainese sub-library stores Shanghainese voice templates associated with smart device control instructions, and the Northeastern dialect sub-library stores Northeastern dialect voice templates associated with smart device control instructions.
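The library layout described above (one sub-library per language, each holding phrase-level voice templates tied to control instructions) could be modelled roughly as follows. This is only an illustrative sketch: the class names `VoiceTemplate` and `SubLibrary`, the instruction codes such as `TV_ON`, and the phoneme strings are all hypothetical placeholders, not real transcriptions and not names from the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class VoiceTemplate:
    phonemes: Tuple[str, ...]  # phoneme sequence that makes up one phrase
    instruction: str           # control instruction the phrase is associated with


@dataclass
class SubLibrary:
    language: str
    templates: List[VoiceTemplate] = field(default_factory=list)

    def add(self, phonemes, instruction):
        """Store one phrase template and the instruction it maps to."""
        self.templates.append(VoiceTemplate(tuple(phonemes), instruction))


def build_library() -> Dict[str, SubLibrary]:
    """Build a tiny two-language library with invented phoneme strings."""
    library = {name: SubLibrary(name) for name in ("mandarin", "shanghainese")}
    # Placeholder phonemes only; real templates would come from recorded speech.
    library["mandarin"].add(["d", "a", "k", "ai"], "TV_ON")
    library["mandarin"].add(["g", "uan", "b", "i"], "TV_OFF")
    library["shanghainese"].add(["k", "e", "k", "e"], "TV_ON")
    return library
```

Keeping the instruction on the template itself means that once a template is matched, no second lookup is needed to find the operation to perform.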
The smart device determines the audio feature code corresponding to the control voice uttered by the current user. The audio feature code includes the phonemes corresponding to that control voice. For example, the main controller of the smart device divides the control voice into frames, each occupying a set duration (for example, 25 ms). Assuming that such a frame is both long enough (to contain enough information to judge the phoneme it belongs to) and sufficiently stationary (to allow short-time Fourier analysis), each frame is converted into a feature vector, and the phonemes corresponding to the control voice are identified in sequence.
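The framing step can be sketched as below. The sketch makes several assumptions that the embodiment does not state: frames do not overlap, the feature vector is simply the magnitudes of the first few discrete Fourier transform coefficients, and the function names are hypothetical. Mapping feature vectors to phonemes is not shown.

```python
import cmath
import math


def split_into_frames(samples, sample_rate, frame_ms=25):
    """Cut a sample sequence into consecutive, non-overlapping frames."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    # A trailing partial frame is dropped.
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_to_feature_vector(frame, n_bins=8):
    """Return the magnitudes of the first n_bins DFT coefficients of one frame."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n_bins)]


# Example: an 80 Hz sine sampled at 8 kHz completes two cycles per 25 ms frame,
# so its energy lands in DFT bin 2 of every frame.
signal = [math.sin(2 * math.pi * 80 * t / 8000) for t in range(400)]
frames = split_into_frames(signal, sample_rate=8000)
features = [frame_to_feature_vector(f) for f in frames]
```

A production system would more likely use overlapping, windowed frames and a dedicated signal processing library; the direct DFT here is chosen only to keep the example dependency-free.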
According to a preset comparison priority, the audio feature code is matched in turn against the voice templates included in the language family classification library, to recognise the control voice. A priority may be assigned in advance to each sub-library in the library. For example, before the smart device leaves the factory, each sub-library may be assigned a priority according to a set rule. One possible order gives the Mandarin sub-library the highest priority and ranks the remaining sub-libraries by the first letter of the first character of each region's name. Alternatively, the user may assign the priorities; one possibility is to prompt the user to set the sub-library priorities according to the languages used by household members. As a further alternative, the smart device may record the sub-library to which each control voice belongs and automatically update the priority ranking according to frequency of use. After framing the control voice and identifying its phonemes, the smart device compares the control voice with the voice templates of each sub-library in order of sub-library priority, performing a search decoding process. The decoding principle is generally as follows: a search network is built by connecting Markov models according to a given grammar and dictionary (each node of the network may be a phrase, for example), and from all possible search paths one or more optimal paths (typically those with maximum a posteriori probability) are selected as the recognition result (a string of phrases that appear in the dictionary).
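The priority-ordered comparison, and the option of re-ranking sub-libraries by frequency of use, might look roughly like this. It is a simplified sketch, not the search decoding described above: the Markov-model search is replaced by a toy position-wise similarity, and the function names, the 0.8 threshold, the phoneme strings, and the instruction codes are all assumptions made for illustration.

```python
def similarity(a, b):
    """Toy matching degree: share of positions where the phonemes agree."""
    if not a or not b:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / max(len(a), len(b))


def recognise(phonemes, library, priority, threshold=0.8):
    """Try sub-libraries in priority order; return (language, instruction) or None."""
    for language in priority:
        templates = library.get(language, {})
        if not templates:
            continue
        best = max(templates, key=lambda tpl: similarity(phonemes, tpl))
        if similarity(phonemes, best) >= threshold:
            return language, templates[best]
    return None


def reorder_by_usage(priority, usage_counts):
    """Rank sub-libraries by how often they matched; ties keep their old order."""
    return sorted(priority, key=lambda lang: -usage_counts.get(lang, 0))


# Hypothetical library: language -> {phoneme tuple: control instruction}.
LIBRARY = {
    "mandarin": {("d", "a", "k", "ai"): "TV_ON"},
    "cantonese": {("h", "oi", "d", "in"): "TV_ON",
                  ("s", "aan", "d", "in"): "TV_OFF"},
}
```

Calling `recognise(("s", "aan", "d", "in"), LIBRARY, ["mandarin", "cantonese"])` first fails against the Mandarin sub-library and then matches the Cantonese one, which is the fall-through behaviour the priority order is meant to give.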
Step 130: determine the control instruction corresponding to the control voice according to the comparison result, and perform the operation corresponding to the control instruction.
The voice templates included in the language family classification library are associated with smart device control instructions; that is, a set correspondence exists between the voice templates and control instructions such as turn on the TV, turn off the TV, turn on the air conditioner, watch BTV, and play the News Broadcast programme. Determining the correspondence between the control voice and a voice template therefore also determines the correspondence between the control voice and a control instruction. The smart device performs the corresponding operation according to the determined control instruction.
The technical solution of this embodiment obtains the control voice uttered by the user, compares the control voice with the voice templates included in the preset language family classification library, determines the control instruction corresponding to the control voice according to the comparison result, and performs the operation corresponding to the control instruction. This addresses the problem in the prior art that the voice control function of a smart device has a low recognition rate for dialects, so that a user may call out a control instruction several times and the smart device may still fail to recognise it. It provides a way to recognise the user's voice control instructions more quickly and improves the user experience.
Embodiment two
Fig. 2 is a flow chart of the control voice recognition steps in a method for controlling a smart device by voice in Embodiment two of the present invention. The technical solution of this embodiment further explains how the control voice is compared with the voice templates included in the preset language family classification library, and specifically includes the following steps:
Step 210: display a set sentence on the display screen of the smart device, to prompt the user to read the set sentence aloud.
When the smart device detects voice input, it displays the set sentence on the display screen and prompts the user to read it aloud. For example, when it is detected that the user is issuing a voice instruction to a smart TV, the TV displays a short sentence that can distinguish the languages of different regions.
Step 220: match the user voice corresponding to the set sentence against the set test voice library, to determine the user's language family classification.
The smart device determines the audio feature code corresponding to the user voice. The audio feature code is determined in the same way as described in Embodiment one, which is not repeated here.
The smart device matches the audio feature code against the audio feature code templates in the test voice library. The test voice library includes audio feature code templates corresponding to the set sentence. For example, the test voice library includes a Mandarin test sub-library, a Shanghainese test sub-library, a Northeastern dialect test sub-library, a Chongqing dialect test sub-library, and a Cantonese test sub-library. Each test sub-library stores the set of phonemes corresponding to the set sentence, and the phoneme set forms audio feature code templates in the form of phrases (a phrase being composed of a set number of continuously pronounced phonemes). For example, the Chongqing dialect test sub-library stores the Chongqing dialect voice template corresponding to the set sentence, and the Southern Min test sub-library stores the Southern Min voice template corresponding to the set sentence.
The smart device determines the audio feature code of the user voice corresponding to the set sentence. The audio feature code includes the phonemes corresponding to the user voice uttered by the current user. For example, the main controller of the smart device divides the user voice into frames, each occupying a set duration (for example, 25 ms). Assuming that such a frame is both long enough (to contain enough information to judge the phoneme it belongs to) and sufficiently stationary (to allow short-time Fourier analysis), each frame is converted into a feature vector, and the phonemes corresponding to the user voice are identified in sequence.
The smart device determines the region in which it is located according to its Internet Protocol (IP) address, and obtains from the test voice library the audio feature code template corresponding to that region. The smart device matches the audio feature code against that template and determines the matching degree. When the matching degree exceeds the set threshold, the language family classification corresponding to the current test sub-library is determined to be the language used by the user. When the matching degree does not reach the set threshold, the audio feature code is compared in turn, in a set order, with the remaining audio feature code templates in the test voice library. The user's language family classification, which includes Mandarin and dialects, is determined according to the matching result. For example, the main controller of the smart device determines from the current IP address that the smart device is in Tianjin, and selects the audio feature code template included in the Tianjin dialect test sub-library of the test voice library. The smart device matches the audio feature code of the user voice corresponding to the set sentence against the audio feature code template of the Tianjin dialect test sub-library. If the matching degree exceeds the set threshold, the current user is determined to speak the Tianjin dialect. If the matching degree does not reach the set threshold, the audio feature code corresponding to the set sentence is matched against the remaining audio feature code templates in the test voice library according to the set priority, and the language family classification corresponding to the template whose matching degree exceeds the set threshold is taken as the language family classification used by the user. The priorities of the test sub-libraries in the test voice library are set in the same way as the priorities of the sub-libraries in the language family classification library in Embodiment one, which is not repeated here.
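The region-first matching with threshold fallback can be sketched as follows. The sketch assumes that the IP-to-region lookup has already been done elsewhere and is passed in as `region_language`; the matching degree is a toy position-wise similarity; and the function names, the 0.8 threshold, and the sample data are hypothetical.

```python
def matching_degree(features, template):
    """Toy matching degree: share of positions where the phonemes agree."""
    if not features or not template:
        return 0.0
    same = sum(1 for a, b in zip(features, template) if a == b)
    return same / max(len(features), len(template))


def detect_language_family(features, test_library, region_language,
                           fallback_order, threshold=0.8):
    """Check the region's test template first, then the rest in the set order."""
    order = [region_language] + [lang for lang in fallback_order
                                 if lang != region_language]
    for language in order:
        template = test_library.get(language)
        if template is not None and matching_degree(features, template) >= threshold:
            return language
    return None  # no template reached the threshold


# Hypothetical test library: one template per language for the same set sentence.
TEST_LIBRARY = {
    "tianjin": ("a", "b", "c"),
    "mandarin": ("a", "x", "y"),
    "cantonese": ("q", "r", "s"),
}
```

Trying the regional template first keeps the common case to a single comparison, while the fallback order still covers users whose dialect differs from that of the region in which the device is installed.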
Step 230: set the comparison priority of the voice templates of the determined language family classification in the language family classification library to the highest.
The smart device sets to the highest the comparison priority of the voice templates corresponding to the language family classification that the user was determined to use in the preceding step. For example, if the preceding step determines that the user speaks the Tianjin dialect, the priority of the voice templates included in the Tianjin dialect sub-library of the language family classification library is set to the highest. The priority of a voice template in the language family classification library corresponds to the priority of the sub-library that contains it. The priorities of the remaining voice templates follow the priority ordering of the sub-libraries described in Embodiment one, which is not repeated here.
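Raising one sub-library to the top while leaving the others in their existing order amounts to a small list operation. A minimal sketch, assuming priorities are held as an ordered list of language names (the function name `promote` is hypothetical):

```python
def promote(priority, language):
    """Move `language` to the front; the remaining order is unchanged."""
    return [language] + [lang for lang in priority if lang != language]


# Example: the user was determined to speak the Tianjin dialect.
new_priority = promote(["mandarin", "cantonese", "tianjin"], "tianjin")
```

The function also works when the language is not yet in the list, in which case it is simply added at the front.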
Step 240: obtain the control voice uttered by the user.
The smart device collects the user's voice data through the microphone. The collected voice data is processed in the same way as in Embodiment one, which is not repeated here.
Step 250: determine the audio feature code corresponding to the control voice uttered by the current user.
The smart device determines the audio feature code corresponding to the control voice uttered by the current user. The audio feature code is determined in the same way as described in Embodiment one, which is not repeated here.
Step 260: compare the audio feature code with the voice templates having the highest comparison priority in the language family classification library, to obtain the comparison result.
The smart device compares the audio feature code with the highest-priority voice templates determined in the preceding steps. For example, suppose the preceding steps determine that the voice templates included in the Tianjin dialect sub-library have the highest priority. When performing the comparison, the main controller of the smart device first compares the audio feature code with the voice templates included in the Tianjin dialect sub-library. If a phrase exists whose matching degree exceeds the set threshold, that phrase or phrase string is taken, according to the comparison result, as the recognition result of the control voice.
The technical solution of this embodiment narrows the scope of speech recognition and improves recognition speed by having the user read the set sentence. Because the language used by the user is determined first, the smart device knows which voice templates in the language family classification library to compare first when it obtains the user's control instruction. Each time a control instruction is obtained, the smart device can therefore select the preferred comparison templates in a targeted manner, further improving speech recognition speed and accuracy.
Embodiment three
Fig. 3 is a schematic structural diagram of a device for controlling a smart device by voice in Embodiment three of the present invention. The device includes:
a voice acquisition module 310, configured to obtain the control voice uttered by the user;
a voice recognition module 320, configured to compare the control voice with the voice templates included in the preset language family classification library;
an instruction determination module 330, configured to determine, according to the comparison result, the control instruction corresponding to the control voice and to perform the operation corresponding to the control instruction, wherein the language family classification library includes Mandarin voice templates and dialect voice templates associated with smart device control instructions.
In the technical solution of this embodiment, the voice acquisition module 310 obtains the control voice uttered by the user, the voice recognition module 320 compares the control voice with the voice templates included in the preset language family classification library, and the instruction determination module 330 determines the control instruction corresponding to the control voice according to the comparison result and performs the operation corresponding to the control instruction. This addresses the problem in the prior art that the voice control function of a smart device has a low recognition rate for dialects, so that a user may call out a control instruction several times and the smart device may still fail to recognise it. It provides a way to recognise the user's voice control instructions more quickly and improves the user experience.
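The way the three modules cooperate can be illustrated with a small object sketch. It is not the device's actual implementation: the class and method names are hypothetical, the microphone is stood in for by a callable, and template comparison is reduced to an exact dictionary lookup.

```python
class VoiceAcquisitionModule:
    """Stands in for module 310: obtains the control voice."""

    def __init__(self, microphone):
        self._microphone = microphone  # callable returning a phoneme tuple

    def acquire(self):
        return self._microphone()


class VoiceRecognitionModule:
    """Stands in for module 320: compares the voice with stored templates."""

    def __init__(self, templates):
        self._templates = templates  # phoneme tuple -> control instruction

    def compare(self, control_voice):
        return self._templates.get(control_voice)  # None when nothing matches


class InstructionDeterminationModule:
    """Stands in for module 330: picks the instruction and runs its operation."""

    def __init__(self, operations):
        self._operations = operations  # control instruction -> callable

    def determine_and_execute(self, comparison_result):
        operation = self._operations.get(comparison_result)
        if operation is None:
            return None
        operation()
        return comparison_result


class VoiceControlDevice:
    """Wires the three modules together in the order described above."""

    def __init__(self, acquisition, recognition, determination):
        self.acquisition = acquisition
        self.recognition = recognition
        self.determination = determination

    def handle_once(self):
        voice = self.acquisition.acquire()
        result = self.recognition.compare(voice)
        return self.determination.determine_and_execute(result)
```

Passing the microphone and the operations in as callables keeps each module testable on its own, which mirrors the separation of responsibilities between modules 310, 320, and 330.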
Further, the device also includes:
a set sentence display module, configured to display a set sentence on the display screen of the smart device before the control voice uttered by the user is obtained, to prompt the user to read the set sentence aloud;
a language family determination module, configured to match the user voice corresponding to the set sentence against the set test voice library, to determine the user's language family classification;
a priority setting module, configured to set the comparison priority of the voice templates of the determined language family classification in the language family classification library to the highest;
and the instruction determination module is specifically configured to:
determine the audio feature code corresponding to the control voice uttered by the current user;
compare the audio feature code with the voice templates having the highest comparison priority in the language family classification library, to recognise the control voice.
Further, the language family determination module includes:
a feature code extraction submodule, configured to determine the audio feature code corresponding to the user voice;
a voice matching submodule, configured to match the audio feature code against the audio feature code templates in the test voice library;
a language family determination submodule, configured to determine the user's language family classification according to the matching result, wherein the language family classification includes Mandarin and dialects.
Further, the language family determination submodule is specifically configured to:
determine the region in which the smart device is located according to its Internet Protocol (IP) address;
obtain, according to the determined region, the audio feature code template corresponding to that region from the test voice library;
match the audio feature code against that audio feature code template and determine the matching degree;
when the matching degree does not reach the set threshold, match the audio feature code against the remaining audio feature code templates in the test voice library.
Further, the instruction determination module 330 is specifically configured to:
determine the audio feature code corresponding to the control voice uttered by the current user;
match the audio feature code, in the order of a preset comparison priority, against the voice templates included in the language family classification library, to recognise the control voice.
The above device for controlling a smart device by voice can perform the method for controlling a smart device by voice provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.
Those skilled in the art will understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Note that the above describes only the preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.