CN107293300A - Speech recognition method and device, computer device and readable storage medium - Google Patents

Speech recognition method and device, computer device and readable storage medium

Info

Publication number
CN107293300A
Authority
CN
China
Prior art keywords
voice information
lip
pause information
user
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710648985.2A
Other languages
Chinese (zh)
Inventor
关超雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meizu Technology Co Ltd
Original Assignee
Meizu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Meizu Technology Co Ltd
Priority to CN201710648985.2A
Publication of CN107293300A
Legal status: Withdrawn (current)

Abstract

The invention provides a speech recognition method. The method includes: obtaining voice information input by a user; obtaining lip images of the user while the voice information is being input; recognizing pause information in the voice information according to the lip images; and performing speech recognition on the voice information according to the pause information. The invention also provides a speech recognition device, a computer device and a computer-readable storage medium. The invention can perform speech recognition with the aid of lip images and improve the accuracy of speech recognition.

Description

Speech recognition method and device, computer device and readable storage medium
Technical field
The present invention relates to the field of intelligent speech technology, and in particular to a speech recognition method and device, a computer device and a readable storage medium.
Background technology
At present, with the development of electronic and communication technology, terminals such as mobile phones and tablet computers are widely used, and the modes of human-computer interaction are increasingly diverse. Voice input, as one of the most convenient and natural modes of human-computer interaction, is accepted by more and more users. However, the accuracy of current speech recognition is not high, and the user experience is poor.
Summary of the invention
In view of the above, it is necessary to provide a speech recognition method and device, a computer device and a readable storage medium that can perform speech recognition with the aid of lip images and improve the accuracy of speech recognition.
A first aspect of the application provides a speech recognition method, the method including:
obtaining voice information input by a user;
obtaining lip images of the user while the voice information is being input;
recognizing pause information in the voice information according to the lip images;
performing speech recognition on the voice information according to the pause information.
In another possible implementation, performing speech recognition on the voice information according to the pause information includes:
inserting the pause information into the text information converted from the voice information according to the time mapping relationship between the pause information and the voice information; or
removing the pause information from the voice information, and performing speech recognition on the voice information from which the pause information has been removed.
In another possible implementation, recognizing pause information in the voice information according to the lip images includes:
recognizing word-break pause information and/or punctuation pause information in the voice information according to the lip images;
and performing speech recognition on the voice information according to the pause information includes:
performing speech recognition on the voice information according to the word-break pause information and/or the punctuation pause information.
In another possible implementation, obtaining the voice information input by the user and obtaining the lip images of the user while the voice information is being input includes:
when the user inputs the voice information, collecting the voice information through a microphone of a terminal and capturing the lip images through a camera of the terminal.
In another possible implementation, the method further includes:
judging whether the lip motion information matches the voice information;
if the lip motion information does not match the voice information, controlling the camera to stop capturing the lip images.
In another possible implementation, the method further includes:
obtaining the motion amplitude of the user's lips according to the lip images, and recognizing the tone corresponding to the voice information according to the motion amplitude of the user's lips; or
obtaining lip characteristics of the user's pronunciation, determining user features according to the lip characteristics, and performing speech recognition on the voice information according to the user features and the pause information.
A second aspect of the application provides a speech recognition device, the device including:
a first acquisition unit for obtaining voice information input by a user;
a second acquisition unit for obtaining lip images of the user while the voice information is being input;
a first recognition unit for recognizing pause information in the voice information according to the lip images;
a second recognition unit for performing speech recognition on the voice information according to the pause information.
In another possible implementation, the second recognition unit is specifically configured to:
insert the pause information into the text information converted from the voice information according to the time mapping relationship between the pause information and the voice information; or
remove the pause information from the voice information, and perform speech recognition on the voice information from which the pause information has been removed.
A third aspect of the application provides a computer device including a processor, the processor being configured to implement the steps of the speech recognition method when executing a computer program stored in a memory.
A fourth aspect of the application provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the steps of the speech recognition method when executed by a processor.
The present invention obtains voice information input by a user, obtains lip images of the user while the voice information is being input, recognizes pause information in the voice information according to the lip images, and performs speech recognition on the voice information according to the pause information. The present invention can therefore perform speech recognition with the aid of lip images and improve the accuracy of speech recognition.
Brief description of the drawings
Fig. 1 is a flowchart of the speech recognition method provided by embodiment one of the present invention;
Fig. 2 is a structural diagram of the speech recognition device provided by embodiment two of the present invention;
Fig. 3 is a schematic diagram of the computer device provided by embodiment three of the present invention.
Description of main element symbols
Computer device 1
Speech recognition device 10
Memory 20
Processor 30
Computer program 40
First acquisition unit 201
Second acquisition unit 202
First recognition unit 203
Second recognition unit 204
The following embodiments will further illustrate the present invention with reference to the above drawings.
Detailed description of the embodiments
In order that the above objects, features and advantages of the present invention may be understood more clearly, the invention is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the application and the features in the embodiments may be combined with one another.
Many specific details are set forth in the following description to facilitate a thorough understanding of the invention. The described embodiments are only some, rather than all, of the embodiments of the invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the invention without creative work fall within the scope of protection of the invention.
Unless otherwise defined, all technical and scientific terms used herein have the meaning commonly understood by those skilled in the technical field of the invention. The terms used in the description are intended only to describe specific embodiments and are not intended to limit the invention.
Preferably, the speech recognition method of the invention is applied to one or more terminals. A terminal is a device capable of automatically performing numerical computation and/or information processing according to instructions that are set or stored in advance; its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), an embedded device, and the like.
The terminal may be, but is not limited to, any electronic product capable of human-computer interaction with a user through a keyboard, a mouse, a remote control, a touch pad or a voice-control device, for example a tablet computer, a smart phone, a personal digital assistant (Personal Digital Assistant, PDA), an intelligent wearable device, and the like.
Embodiment one
Fig. 1 is a flowchart of the speech recognition method provided by embodiment one of the present invention. As shown in Fig. 1, the method specifically includes the following steps:
101: Obtain voice information input by a user.
The voice information is speech data obtained from the user's natural speech. For example, the voice information is a voice signal obtained by converting the user's natural speech into an electrical signal through a microphone.
The voice information may be collected through a microphone of the terminal while the user is inputting it. For example, it may be detected whether a voice-input start instruction is received (for example, whether the home key of the terminal is long-pressed); if a voice-input start instruction is received, collection of the voice information input by the user through the microphone of the terminal is started. It may also be detected whether a voice-input end instruction is received (for example, whether the home key of the terminal is released); if a voice-input end instruction is received, collection of the voice information through the microphone of the terminal is stopped.
Alternatively, voice information collected in advance may be read. For example, the voice information input by the user may be collected in advance and read when speech recognition needs to be performed on it.
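The following Python sketch illustrates this collection step under stated assumptions: it is not part of the patent, the third-party sounddevice library stands in for the terminal's microphone interface, and a threading.Event stands in for the home-key press/release signal.

# Minimal sketch of step 101, assuming the start/stop signal is exposed as a
# threading.Event (the home-key detection itself is platform specific).
import queue
import threading

import numpy as np
import sounddevice as sd  # third-party microphone access library

SAMPLE_RATE = 16000  # Hz, a common rate for speech recognition

def record_while_pressed(pressed: threading.Event) -> np.ndarray:
    """Collect microphone audio for as long as `pressed` is set."""
    blocks: "queue.Queue[np.ndarray]" = queue.Queue()

    def on_audio(indata, frames, time_info, status):
        if pressed.is_set():          # only keep audio while the key is held
            blocks.put(indata.copy())

    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1,
                        callback=on_audio):
        pressed.wait()                # wait for the voice-input start signal
        while pressed.is_set():       # keep streaming until the key is released
            sd.sleep(50)              # milliseconds

    chunks = []
    while not blocks.empty():
        chunks.append(blocks.get())
    return np.concatenate(chunks) if chunks else np.empty((0, 1))

Reading pre-collected voice information would simply replace this capture with loading a stored waveform.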
102: Obtain lip images of the user while the voice information is being input.
A lip image, also called a lip-movement image or lip-reading image, is an image of the changing lip movements of a speaker while speaking. The lip images over a period of time form an image sequence or video.
A face image of the user while inputting the voice information may be obtained, and the lip position determined from the face image, so as to obtain the lip images.
The camera may also be aimed directly at the user's lips. For example, the camera may be built into the microphone (for example in a headset), or the microphone may be built into the camera, so that in use the camera points directly at the user's lips and the lip images are obtained conveniently.
The lip images may be captured through the camera of the terminal while the user is inputting the voice information. For example, it may be detected whether a voice-input start instruction is received; if so, the lip images of the user are captured through the camera of the terminal while the voice information input by the user is collected through the microphone of the terminal. It may also be detected whether a voice-input end instruction is received; if so, capturing of the lip images through the camera of the terminal is stopped at the same time as collection of the voice information through the microphone of the terminal is stopped.
Alternatively, lip images captured in advance may be read. For example, the lip images may be captured while the voice information input by the user is collected in advance, and read when speech recognition needs to be performed on the voice information.
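As a rough illustration of capturing lip images on commodity hardware (not prescribed by the patent), the sketch below uses OpenCV's stock Haar face cascade and crops the lower third of the detected face as the lip region; a real system would use a dedicated lip or facial-landmark detector.

# Sketch of step 102: grab one camera frame and crop an approximate lip region.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def grab_lip_frame(capture: cv2.VideoCapture):
    """Return a cropped lip image from one camera frame, or None."""
    ok, frame = capture.read()
    if not ok:
        return None
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    # Heuristic: the mouth sits roughly in the lower third of the face box.
    return frame[y + 2 * h // 3 : y + h, x : x + w]

# Usage: call grab_lip_frame(cv2.VideoCapture(0)) inside the recording loop and
# append the results to a list to build the lip-image sequence.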
While the voice information input by the user is being collected through the microphone of the terminal and the lip images are being captured through the camera of the terminal, it may be judged whether the lip motion information matches the voice information; if the lip motion information does not match the voice information, the camera is controlled to stop capturing the lip images.
It may be detected whether the lip motion information is synchronous with the voice information; if they are not synchronous, the lip motion information does not match the voice information. For example, if the voice information indicates that the user started speaking at the 1st second while the lip motion information indicates that the user started speaking at the 5th second, the lip motion information and the voice information are not synchronous, and therefore do not match.
Alternatively, it may be detected whether the text corresponding to the lip motion information is consistent with the text corresponding to the voice information; if not, the lip motion information does not match the voice information. For example, if within a certain period the text corresponding to the lip motion information is "I have a meeting" while the text corresponding to the voice information is "the weather is nice today", the two texts are inconsistent, and therefore the lip motion information does not match the voice information.
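A minimal sketch of the synchrony check follows, assuming the voice information is a mono NumPy waveform and the lip images are a list of cropped frames; the energy and motion thresholds and the one-second tolerance are illustrative values, not taken from the patent.

# Compare when speech energy first appears with when lip motion first appears.
import numpy as np

def first_speech_time(audio, sr, energy_thresh=0.01, frame_len=0.02):
    step = int(sr * frame_len)
    for i in range(0, len(audio) - step, step):
        if np.mean(audio[i:i + step] ** 2) > energy_thresh:
            return i / sr
    return None

def first_lip_motion_time(lip_frames, fps, motion_thresh=5.0):
    for i in range(1, len(lip_frames)):
        diff = np.mean(np.abs(lip_frames[i].astype(float) -
                              lip_frames[i - 1].astype(float)))
        if diff > motion_thresh:
            return i / fps
    return None

def lips_match_voice(audio, sr, lip_frames, fps, tolerance=1.0):
    t_voice = first_speech_time(audio, sr)
    t_lip = first_lip_motion_time(lip_frames, fps)
    if t_voice is None or t_lip is None:
        return False
    return abs(t_voice - t_lip) <= tolerance  # asynchronous onsets = mismatch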
103: Recognize pause information in the voice information according to the lip images.
Pauses often occur while a user is speaking. The lip images therefore include lip images during pauses, and the voice information includes the voice information during pauses (the pause information). The pause information contained in the voice information can be recognized from the lip images during the pauses.
A user may pause when a word break or punctuation is needed. The pause information may therefore indicate a word break and/or punctuation (in this case the pause information may be a mute signal), and may include word-break pause information and/or punctuation pause information.
Alternatively, a user may pause while the other party is speaking or while thinking. The pause information may therefore represent a period of silence; in this case the pause information is invalid voice input.
Alternatively, a user may pause when there is noise (for example when the noise is excessive). The pause information may therefore represent noise (in this case the pause information may be a noise signal); in this case the pause information is also invalid voice input.
When the pause information indicates a word break and/or punctuation, word-break pause information and/or punctuation pause information in the voice information may be recognized according to the lip images.
It may be detected from the lip images whether the user's lips do not change, or change by no more than a preset amplitude, within a first preset time (for example 0.1 second); if so, the voice information corresponding to the first preset time is identified as word-break pause information.
It may be detected from the lip images whether the user's lips do not change, or change by no more than the preset amplitude, within a second preset time (for example 0.5 second); if so, the voice information corresponding to the second preset time is identified as punctuation pause information. The second preset time may be longer than the first preset time.
When the pause information represents a period of silence or noise, it may be detected from the lip images whether the user's lips do not change, or change by no more than the preset amplitude, within a third preset time (for example 3 seconds); if so, the voice information corresponding to the third preset time is identified as pause information. Alternatively, if the user's lips do not change, or change by no more than the preset amplitude, within the third preset time and the amplitude of the voice signal corresponding to the third preset time is greater than a preset threshold, the voice information corresponding to the third preset time is identified as pause information. The third preset time may be longer than the second preset time.
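The sketch below shows one way these rules could be expressed in code, assuming the lip images are a list of equally sized frames captured at a known frame rate; the frame-difference motion measure and the preset amplitude are illustrative stand-ins for whatever lip-motion metric an implementation actually uses.

# Sketch of step 103: classify still spans of the lip-image sequence into
# word-break, punctuation, or invalid-input pauses using the three example
# preset durations from the description (0.1 s, 0.5 s, 3 s).
import numpy as np

WORD_BREAK_S, PUNCTUATION_S, INVALID_S = 0.1, 0.5, 3.0

def lip_motion_amplitudes(lip_frames):
    """Mean absolute frame-to-frame difference, one value per frame pair."""
    frames = [f.astype(float) for f in lip_frames]
    return [np.mean(np.abs(b - a)) for a, b in zip(frames, frames[1:])]

def find_pauses(lip_frames, fps, preset_amplitude=2.0):
    """Return (start_s, end_s, kind) spans where the lips barely move."""
    still = [a <= preset_amplitude for a in lip_motion_amplitudes(lip_frames)]
    pauses, i = [], 0
    while i < len(still):
        if still[i]:
            j = i
            while j < len(still) and still[j]:
                j += 1
            duration = (j - i) / fps
            if duration >= INVALID_S:
                kind = "invalid"        # long silence or noise
            elif duration >= PUNCTUATION_S:
                kind = "punctuation"
            elif duration >= WORD_BREAK_S:
                kind = "word_break"
            else:
                kind = None
            if kind:
                pauses.append((i / fps, j / fps, kind))
            i = j
        else:
            i += 1
    return pauses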
104: Perform speech recognition on the voice information according to the pause information.
If the pause information includes word-break pause information, speech recognition may be performed on the voice information according to the word-break pause information.
Alternatively, if the pause information includes punctuation pause information, speech recognition may be performed on the voice information according to the punctuation pause information.
Alternatively, if the pause information includes both word-break pause information and punctuation pause information, speech recognition may be performed on the voice information according to the word-break pause information and the punctuation pause information.
The pause information may be inserted into the text information converted from the voice information according to the time mapping relationship (that is, the corresponding time relationship) between the pause information and the voice information. For example, speech recognition may be performed on the voice information to obtain the corresponding text information, and the pause information (word-break pause information and/or punctuation pause information) may then be inserted into the text information according to the times at which it occurs in the voice information, so as to obtain text information containing the pause information.
Alternatively, the pause information may be removed from the voice information, and speech recognition performed on the voice information from which the pause information has been removed. As described above, the pause information may represent noise or silence, that is, invalid voice input; performing speech recognition on the voice information from which the pause information has been removed removes the noise or silence in the voice information.
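A sketch of both options follows, assuming a recognizer that returns words with end timestamps (so the time mapping is available) and reusing the pause spans produced by the previous sketch; the mapping window and the punctuation marks are illustrative.

# Sketch of step 104: insert pause markers by the time mapping, or cut
# invalid spans out of the waveform before recognition.
import numpy as np

def insert_pauses(words, pauses):
    """words: list of (text, end_time_s); pauses: output of find_pauses()."""
    marks = {"word_break": " ", "punctuation": ", "}
    out = []
    for text, end_t in words:
        out.append(text)
        for start, _, kind in pauses:
            # A pause beginning just after this word maps onto it in time.
            if kind in marks and end_t <= start < end_t + 0.3:
                out.append(marks[kind])
    return "".join(out)

def remove_invalid_spans(audio, sr, pauses):
    """Drop the samples covered by 'invalid' (silence/noise) pause spans."""
    keep = np.ones(len(audio), dtype=bool)
    for start, end, kind in pauses:
        if kind == "invalid":
            keep[int(start * sr):int(end * sr)] = False
    return audio[keep]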
Various speech recognition techniques, such as dynamic time warping (Dynamic Time Warping, DTW), hidden Markov models (Hidden Markov Model, HMM), vector quantization (Vector Quantization, VQ) and artificial neural networks (Artificial Neural Network, ANN), may be used to perform speech recognition on the voice information or on the voice information from which the pause information has been removed.
The speech recognition method of embodiment one obtains voice information input by a user, obtains lip images of the user while the voice information is being input, recognizes pause information in the voice information according to the lip images, and performs speech recognition on the voice information according to the pause information. The speech recognition method of embodiment one can thus perform speech recognition with the aid of lip images and improve the accuracy of speech recognition.
In another embodiment, the method may further include: obtaining the motion amplitude of the user's lips according to the lip images, and recognizing the tone corresponding to the voice information according to the motion amplitude of the user's lips. The tone may include a declarative tone, an interrogative tone, an imperative tone, an exclamatory tone, and the like. For example, if the motion amplitude of the user's lips is within a first preset amplitude range, the tone corresponding to the voice information is determined to be exclamatory; if the motion amplitude of the user's lips is within a second preset amplitude range, the tone is determined to be imperative.
In another embodiment, the method may further include: obtaining lip characteristics of the user's pronunciation; determining user features according to the lip characteristics; and performing speech recognition on the voice information according to the user features and the pause information. The user features may include the user's gender, language type, dialect type and/or habitual expressions, and the like. For example, the language type (for example Chinese) may be determined according to the lip characteristics of the user's pronunciation, and speech recognition performed on the voice information according to the language type and the pause information. Obtaining such auxiliary information (the user features) before performing speech recognition on the voice information can further improve the accuracy of speech recognition.
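As a toy illustration of the tone refinement, the sketch below maps the peak lip-motion amplitude onto a tone label; the amplitude ranges are placeholders for the patent's unspecified first and second preset amplitude ranges, and the amplitudes are those computed by lip_motion_amplitudes above.

# Map the peak lip-motion amplitude to a tone label (placeholder thresholds).
def classify_tone(amplitudes):
    peak = max(amplitudes) if amplitudes else 0.0
    if peak >= 20.0:          # assumed "first preset amplitude range"
        return "exclamatory"
    if peak >= 10.0:          # assumed "second preset amplitude range"
        return "imperative"
    return "declarative"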
Embodiment two
Fig. 2 is a structural diagram of the speech recognition device provided by embodiment two of the present invention. As shown in Fig. 2, the speech recognition device 10 may include a first acquisition unit 201, a second acquisition unit 202, a first recognition unit 203 and a second recognition unit 204.
The first acquisition unit 201 is used to obtain voice information input by a user. The manner of obtaining the voice information is as described for step 101 of embodiment one and is not repeated here.
The second acquisition unit 202 is used to obtain lip images of the user while the voice information is being input. The manner of obtaining the lip images, of judging whether the lip motion information matches the voice information, and of stopping the camera on a mismatch is as described for step 102 of embodiment one and is not repeated here.
The first recognition unit 203 is used to recognize pause information in the voice information according to the lip images. The manner of recognizing word-break pause information, punctuation pause information and invalid-input (silence or noise) pause information from the lip images and the preset times is as described for step 103 of embodiment one and is not repeated here.
The second recognition unit 204 is used to perform speech recognition on the voice information according to the pause information. The manner of inserting the pause information into the converted text according to the time mapping relationship, or of removing the pause information before recognition, and the applicable speech recognition techniques are as described for step 104 of embodiment one and are not repeated here.
The speech recognition device 10 of embodiment two obtains voice information input by a user, obtains lip images of the user while the voice information is being input, recognizes pause information in the voice information according to the lip images, and performs speech recognition on the voice information according to the pause information. The speech recognition device 10 of embodiment two can thus perform speech recognition with the aid of lip images and improve the accuracy of speech recognition.
In another embodiment, the speech recognition device 10 may further include:
a third recognition unit for obtaining the motion amplitude of the user's lips according to the lip images and recognizing the tone corresponding to the voice information according to the motion amplitude of the user's lips. The tone may include a declarative tone, an interrogative tone, an imperative tone, an exclamatory tone, and the like. For example, if the motion amplitude of the user's lips is within a first preset amplitude range, the tone corresponding to the voice information is determined to be exclamatory; if the motion amplitude of the user's lips is within a second preset amplitude range, the tone is determined to be imperative.
In another embodiment, the speech recognition device 10 may further include:
a fourth recognition unit for obtaining lip characteristics of the user's pronunciation, determining user features according to the lip characteristics, and performing speech recognition on the voice information according to the user features and the pause information. The user features may include the user's gender, language type, dialect type and/or habitual expressions, and the like. For example, the language type (for example Chinese) may be determined according to the lip characteristics of the user's pronunciation, and speech recognition performed on the voice information according to the language type and the pause information. Obtaining such auxiliary information (the user features) before performing speech recognition can further improve the accuracy of speech recognition.
Embodiment three
Fig. 3 is a schematic diagram of the computer device provided by embodiment three of the present invention. The computer device 1 includes a memory 20, a processor 30 and a computer program 40, such as a speech recognition program, stored in the memory 20 and executable on the processor 30. When executing the computer program 40, the processor 30 implements the steps of the above speech recognition method embodiment, for example steps 101 to 104 shown in Fig. 1. Alternatively, when executing the computer program 40, the processor 30 implements the functions of the modules/units in the above device embodiment, for example units 201 to 204.
Exemplarily, the computer program 40 may be divided into one or more modules/units, which are stored in the memory 20 and executed by the processor 30 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, the instruction segments describing the execution of the computer program 40 in the computer device 1. For example, the computer program 40 may be divided into the first acquisition unit 201, the second acquisition unit 202, the first recognition unit 203 and the second recognition unit 204 of Fig. 2; the specific functions of each module are described in embodiment two, and a structural sketch is given below.
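The module split referred to above can be pictured with the following sketch, which wires four stand-in callables for units 201 to 204 into one device object; it reflects the structure described here, not any concrete implementation, and the unit bodies would reuse the sketches given in embodiment one.

# Structural sketch of the device in embodiment two / the module split above.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class SpeechRecognitionDevice:
    acquire_voice: Callable[[], Any]              # first acquisition unit 201
    acquire_lip_images: Callable[[], Any]         # second acquisition unit 202
    recognize_pauses: Callable[[Any, Any], Any]   # first recognition unit 203
    recognize_speech: Callable[[Any, Any], str]   # second recognition unit 204

    def run(self) -> str:
        voice = self.acquire_voice()
        lips = self.acquire_lip_images()
        pauses = self.recognize_pauses(voice, lips)
        return self.recognize_speech(voice, pauses)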
The computer device 1 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. Those skilled in the art will understand that the schematic diagram of Fig. 3 is only an example of the computer device 1 and does not constitute a limitation on it; the computer device 1 may include more or fewer parts than illustrated, may combine certain parts, or may have different parts; for example, it may also include input/output devices, network access devices, buses and the like.
The processor 30 may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor 30 may be any conventional processor. The processor 30 is the control center of the computer device 1 and connects the various parts of the whole computer device 1 through various interfaces and lines.
The memory 20 may be used to store the computer program 40 and/or the modules/units. The processor 30 realizes the various functions of the computer device 1 by running or executing the computer program and/or modules/units stored in the memory 20 and by calling the data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required for at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the computer device 1 (such as audio data or a phone book). In addition, the memory 20 may include a high-speed random access memory, and may also include a non-volatile memory such as a hard disk, an internal memory, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
If the integrated modules/units of the computer device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the processes of the above method embodiments by instructing the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electric carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electric carrier signals and telecommunication signals.
In the several embodiments provided by the present invention, it should be understood that the disclosed computer device and method may be implemented in other ways. For example, the computer device embodiment described above is only schematic; the division of the units is only a division by logical function, and there may be other ways of dividing them in actual implementation.
In addition, the functional units in each embodiment of the invention may be integrated in the same processing unit, may exist physically separately, or two or more units may be integrated in the same unit. The above integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It is obvious to those skilled in the art that the invention is not restricted to the details of the above exemplary embodiments, and that the invention can be realized in other specific forms without departing from its spirit or essential attributes. Therefore, from whichever point of view, the embodiments should be regarded as exemplary and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes falling within the meaning and scope of equivalency of the claims are intended to be included in the invention. Any reference sign in a claim should not be regarded as limiting the claim concerned. Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or computer devices stated in a computer device claim may also be realized by the same unit or computer device through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to restrict, the technical solution of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that the technical solution of the invention may be modified or equivalently substituted without departing from the spirit and scope of the technical solution of the invention.

Claims (10)

CN201710648985.2A, priority date 2017-08-01, filing date 2017-08-01: Speech recognition method and device, computer device and readable storage medium (Withdrawn); published as CN107293300A (en)

Priority Applications (1)

Application number: CN201710648985.2A (published as CN107293300A (en)); priority date: 2017-08-01; filing date: 2017-08-01; title: Speech recognition method and device, computer device and readable storage medium


Publications (1)

Publication number: CN107293300A; publication date: 2017-10-24

Family

ID=60104131

Family Applications (1)

Application number: CN201710648985.2A (Withdrawn, CN107293300A (en)); priority date: 2017-08-01; filing date: 2017-08-01; title: Speech recognition method and device, computer device and readable storage medium

Country Status (1)

Country: CN; publication: CN107293300A (en)



Legal Events

Code: PB01; event: Publication
Code: SE01; event: Entry into force of request for substantive examination
Code: WW01; event: Invention patent application withdrawn after publication; application publication date: 2017-10-24

