Detailed Description of the Embodiments
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit the present invention.
In the following description, suffixes such as "module", "component" or "unit" used to denote elements are employed merely to facilitate the description of the present invention and have no specific meaning in themselves. Therefore, "module", "component" and "unit" may be used interchangeably.
A smart device may be implemented in various forms. For example, the smart device described in the present invention may be implemented as a mobile terminal having a display interface, such as a mobile phone, a tablet computer, a laptop computer, a palmtop computer, a personal digital assistant (Personal Digital Assistant, PDA), a portable media player (Portable Media Player, PMP), a navigation device, a wearable device, a smart bracelet, a pedometer or a smart speaker, or as a fixed terminal having a display interface, such as a digital TV, a desktop computer, an air conditioner, a refrigerator, a water heater or a vacuum cleaner.
In the following description, a smart device is taken as an example. Those skilled in the art will appreciate that, apart from elements used specifically for mobile purposes, the constructions according to the embodiments of the present invention can also be applied to smart devices of the fixed type.
Referring to Fig. 1, which is a schematic diagram of the hardware architecture of a smart device for implementing various embodiments of the present invention, the smart device 100 may include components such as an RF (Radio Frequency) unit 101, a WiFi module 102, an audio output unit 103, an A/V (audio/video) input unit 104, a sensor 105, a display area 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110 and a power supply 111. Those skilled in the art will understand that the smart device structure shown in Fig. 1 does not constitute a limitation on the smart device; the smart device may include more or fewer components than illustrated, may combine certain components, or may have a different arrangement of components.
The components of the smart device are described in detail below with reference to Fig. 1:
The radio frequency unit 101 may be used to receive and send signals during information transmission and reception or during a call. Specifically, after receiving downlink information from a base station, it passes the information to the processor 110 for processing; in addition, it sends uplink data to the base station. In general, the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer and the like. Furthermore, the radio frequency unit 101 can also communicate with a network and other devices through wireless communication. The above wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplexing-Long Term Evolution), TDD-LTE (Time Division Duplexing-Long Term Evolution) and the like.
WiFi is a short-range wireless transmission technology. Through the WiFi module 102, the smart device can help the user send and receive e-mail, browse web pages, access streaming media and the like, providing the user with wireless broadband Internet access. Although Fig. 1 shows the WiFi module 102, it can be understood that it is not an essential component of the smart device and may be omitted as needed without changing the essence of the invention. For example, in the present embodiment, the smart device 100 may establish a synchronization association with an App terminal based on the WiFi module 102.
The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the WiFi module 102, or stored in the memory 109, into an audio signal and output it as sound when the smart device 100 is in a call signal reception mode, a call mode, a recording mode, a speech recognition mode, a broadcast reception mode or the like. Moreover, the audio output unit 103 may also provide audio output related to a specific function performed by the smart device 100 (for example, a call signal reception sound, a message reception sound, etc.). The audio output unit 103 may include a loudspeaker, a buzzer and the like. In the present embodiment, when a prompt to re-enter a voice signal is output, the prompt may be a voice prompt, a vibration prompt based on the buzzer, or the like.
The A/V input unit 104 is used to receive an audio or video signal. The A/V input unit 104 may include a graphics processing unit (Graphics Processing Unit, GPU) 1041 and a microphone 1042. The graphics processor 1041 processes image data of still pictures or video obtained by an image capture apparatus (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display area 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or another storage medium) or sent via the radio frequency unit 101 or the WiFi module 102. The microphone 1042 can receive sound (audio data) in an operating mode such as a telephone call mode, a recording mode or a speech recognition mode, and can process such sound into audio data. In the case of the telephone call mode, the processed audio (voice) data may be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 101. The microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to eliminate (or suppress) noise or interference generated in the course of receiving and sending audio signals.
The smart device 100 further includes at least one sensor 105, such as an optical sensor, a motion sensor and other sensors. Specifically, the optical sensor includes an ambient light sensor and a proximity sensor, wherein the ambient light sensor can adjust the brightness of the display interface 1061 according to the brightness of the ambient light, and the proximity sensor can turn off the display interface 1061 and/or the backlight when the smart device 100 is moved to the ear. As a kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in all directions (generally three axes) and can detect the magnitude and direction of gravity when stationary; it can be used for applications that identify the posture of the mobile phone (such as landscape/portrait switching, related games and magnetometer pose calibration), vibration-recognition-related functions (such as a pedometer or tapping) and the like. Other sensors that can also be configured in the mobile phone, such as a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, will not be described in detail here.
The display area 106 is used to display information input by the user or information provided to the user. The display area 106 may include a display interface 1061, and the display interface 1061 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED) display or the like.
The user input unit 107 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the smart device. Specifically, the user input unit 107 may include an operation and control interface 1071 and other input devices 1072. The operation and control interface 1071, also referred to as a touch screen, collects touch operations by the user on or near it (for example, operations by the user on or near the operation and control interface 1071 using a finger, a stylus or any other suitable object or accessory) and drives the corresponding connection apparatus according to a preset program. The operation and control interface 1071 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the touch orientation of the user, detects the signal brought by the touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 110, and can receive and execute commands sent by the processor 110. In addition, the operation and control interface 1071 may be implemented in various types such as resistive, capacitive, infrared and surface acoustic wave. Besides the operation and control interface 1071, the user input unit 107 may also include other input devices 1072. Specifically, the other input devices 1072 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a switch key), a trackball, a mouse, a joystick and the like, which are not specifically limited here.
Further, the operation and control interface 1071 may cover the display interface 1061. After the operation and control interface 1071 detects a touch operation on or near it, it transmits the operation to the processor 110 to determine the type of the touch event, and the processor 110 then provides a corresponding visual output on the display interface 1061 according to the type of the touch event. Although in Fig. 1 the operation and control interface 1071 and the display interface 1061 implement the input and output functions of the smart device as two independent components, in some embodiments the operation and control interface 1071 and the display interface 1061 may be integrated to implement the input and output functions of the smart device, which is not specifically limited here.
The interface unit 108 serves as an interface through which at least one external device can be connected to the smart device 100. For example, the external device may include a wired or wireless headphone port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port and the like. The interface unit 108 may be used to receive input (for example, data information, electric power, etc.) from an external device and transfer the received input to one or more elements within the smart device 100, or may be used to transmit data between the smart device 100 and an external device.
The memory 109 may be used to store software programs and various data. The memory 109 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a speech recognition system) and the like; the data storage area may store data created according to the use of the smart device (such as voiceprint data, wake word models and user information) and the like. In addition, the memory 109 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device or other solid-state storage components.
The processor 110 is the control center of the smart device. It connects the various parts of the entire smart device using various interfaces and lines, and performs the various functions of the smart device and processes data by running or executing the software programs and/or modules stored in the memory 109 and calling the data stored in the memory 109, thereby monitoring the smart device as a whole. The processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, the user interface, application programs and the like, and the modem processor mainly handles wireless communication. It can be understood that the above modem processor may also not be integrated into the processor 110.
The smart device 100 may also include a power supply 111 (such as a battery) that supplies power to the various components. Preferably, the power supply 111 may be logically connected to the processor 110 through a power management system, so as to implement functions such as managing charging, discharging and power consumption through the power management system.
Although not shown in Fig. 1, the smart device 100 may also include a Bluetooth module and the like capable of establishing a communication connection with other terminals, which will not be described in detail here.
Based on the above hardware configuration of the smart device, the smart device of the embodiment of the present invention is equipped with a speech recognition system. By acquiring the wake word information in a received voice signal, the wake word is bound with the user. Rather than blindly recording a large amount of speech, after the wake word is recorded it is bound with the user information, so that in the subsequent recognition process the wake word can be recognized directly in correspondence with the user. This improves recognition accuracy without recording a large amount of speech, reduces operations, is convenient to use, and improves the degree of intelligence.
As shown in Fig. 1, the memory 109, as a kind of computer storage medium, may include an operating system and a wake word binding program.
In the smart device 100 shown in Fig. 1, the WiFi module 102 is mainly used to connect to a background server or big-data cloud, communicate data with the background server or big-data cloud, and establish communication connections with other terminal devices; the processor 110 may be used to call the wake word binding application program stored in the memory 109 and perform the following operations:
Step S1: acquiring a voice signal sent by a user;
Step S2: extracting the wake word information and the user information in the voice signal;
Step S3: binding the user information and the wake word information with the user.
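By way of illustration only, the flow of steps S1 to S3 may be sketched as follows. This is a minimal sketch in which the voice signal is modeled as already-recognized text of the form "wake_word:user_id"; the helper names extract_wake_word and extract_user_info are hypothetical stand-ins for the acoustic-model and voiceprint analysis described later in this embodiment.

```python
def extract_wake_word(voice_signal: str) -> str:
    # Hypothetical stand-in for step S2's acoustic-model analysis:
    # the "signal" is modeled here as text of the form "wake_word:user_id".
    return voice_signal.split(":")[0]

def extract_user_info(voice_signal: str) -> str:
    # Hypothetical stand-in for step S2's voiceprint extraction.
    return voice_signal.split(":")[1]

def bind_wake_word(voice_signal: str, bindings: dict) -> dict:
    """Steps S1-S3: extract the wake word and user info, bind them to the user."""
    wake_word = extract_wake_word(voice_signal)   # step S2
    user_info = extract_user_info(voice_signal)   # step S2
    bindings[user_info] = wake_word               # step S3: bind to the user
    return bindings

# Usage: one user utterance, bound into an initially empty registry.
bindings = bind_wake_word("air conditioner:user_42", {})
print(bindings)  # {'user_42': 'air conditioner'}
```

The binding registry here is an ordinary dictionary keyed by user; the embodiment's actual storage is the memory 109 of the smart device.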
Optionally, the step S3 includes:
Step S31: acquiring the wake word model registered by the user to the speech recognition system, and binding the user information and the wake word with the wake word model.
Further, when the user information is voiceprint information, the processor 110 can be used to call the wake word binding application program stored in the memory 109 and perform the following operations:
Step S311: collecting the wake word voice signal input by the user multiple times;
Step S312: acquiring the tempo feature, tone feature and phoneme feature in the wake word voice signal of each input;
Step S313: performing acoustic feature processing on the tempo features and tone features acquired each time, and registering the acoustically processed tempo feature information and tone feature information as the voiceprint data of the user;
Step S314: permuting and combining the phoneme features acquired each time based on a preset acoustic model to obtain the wake word model;
Step S315: saving the voiceprint data, the wake word and the wake word model in association with one another.
Further, the processor 110 can be used to call the wake word binding application program stored in the memory 109 and perform the following operations:
Step S21: when a voice signal is received, judging whether the volume value of the voice signal is greater than a preset volume value;
Step S22: if so, acquiring the wake word information in the voice signal based on an acoustic model and a grammatical structure, and acquiring the voiceprint information in the voice signal based on voiceprint recognition technology.
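The volume gate of step S21 can be sketched as follows, assuming RMS amplitude as the "volume value" of a PCM frame; the embodiment does not fix the volume measure, so RMS is an illustrative choice.

```python
import math

def volume_value(samples):
    """RMS amplitude of a PCM frame, one plausible reading of the
    'volume value' of step S21."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def passes_volume_gate(samples, preset_volume=0.1):
    # Step S21: only signals louder than the preset volume value proceed
    # to wake word and voiceprint extraction (step S22).
    return volume_value(samples) > preset_volume

print(passes_volume_gate([0.5, -0.5, 0.5, -0.5]))      # True: RMS is 0.5
print(passes_volume_gate([0.01, -0.01, 0.01, -0.01]))  # False: RMS is 0.01
```

The preset value of 0.1 is an assumption; per the embodiment it should be the minimum volume required by the voiceprint and speech recognition engines.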
Further, after the step S3, the processor 110 can be used to call the wake word binding application program stored in the memory 109 and perform the following operations:
Step S4: receiving a wake voice signal, and extracting the wake word in the wake voice signal;
Step S5: when the wake word matches the preset wake word in the speech recognition system, responding to the wake word voice signal and performing the corresponding operation.
Further, after the step S4, the processor 110 can be used to call the wake word binding application program stored in the memory 109 and perform the following operations:
Step S6: adjusting the recognition threshold of the preset wake word in the speech recognition system;
Step S7: when the wake word matches the adjusted preset wake word, responding to the wake word voice signal and performing the corresponding operation.
Further, when the user information is voiceprint information, the processor 110 can be used to call the wake word binding application program stored in the memory 109 and perform the following operations:
Step S61: extracting the voiceprint information in the wake word voice signal;
Step S62: when there is no voiceprint data matching the voiceprint information in the speech recognition system, raising the wake word recognition threshold of the speech recognition system;
Step S63: when there is voiceprint data matching the voiceprint information in the speech recognition system, lowering the wake word recognition threshold of the speech recognition system.
Further, after the step S61, the processor 110 can be used to call the wake word binding application program stored in the memory 109 and perform the following operations:
Step S64: calculating, according to a preset voiceprint model, the similarity between the voiceprint information and the voiceprint data registered in the speech recognition system;
Step S65: when the similarity is within a preset range, judging that there is voiceprint data matching the voiceprint information in the speech recognition system;
Step S66: when the similarity is outside the preset range, judging that there is no voiceprint data matching the voiceprint information in the speech recognition system.
The present invention further proposes a wake word binding method, applied to a speech recognition system to be woken or to a smart device equipped with a speech recognition system.
Referring to Fig. 2, Fig. 2 is a schematic flow chart of a first embodiment of the wake word binding method of the present invention.
In this embodiment, the wake word binding method includes the following steps:
S10: acquiring a voice signal sent by a user;
In the present embodiment, when the user wakes the speech recognition system with a custom wake word for the first time, or fails to wake it, in order to avoid having to re-record the user's wake word and to improve the wake-up rate, the user's custom wake word model needs to be trained, so that a response is made when the user inputs a wake word corresponding to the wake word model. The user sends out a voice signal, and the voice signal sent by the user is collected. The voice signal may contain information such as "air conditioner", "dehumidifier" or "fan", or information provided in advance as a wake word, such as "power on", "raise the temperature" or "raise the wind speed by one level".
S20: extracting the wake word information and the user information in the voice signal;
After the voice signal input by the user is acquired, the wake word information and the user information in the voice signal are extracted. The user information may be user identity information, such as user voiceprint data, capable of identifying the user. The wake word and the user information are extracted by converting the voice signal into text information and extracting, from the text information, the wake word and the sentence carrying the user information.
S30: binding the user information and the wake word information with the user.
Specifically, the user-defined wake word voice signal is collected; for example, the user may input the voice signal "air conditioner" multiple times, and after the smart device picks up the "air conditioner" voice signal through a microphone or an audio sensor, the wake word model registered by the user to the speech recognition system is obtained, and the user information and the wake word are bound with the wake word model.
To facilitate subsequent, more accurate adjustment of the wake word recognition threshold according to the recognized voiceprint data, after the voiceprint data of the registered user and the registered wake word model are obtained, the voiceprint data and the wake word model are further associated, establishing an association relationship between the two.
In the present embodiment, by acquiring the wake word information in the received voice signal, the wake word is bound with the user. Rather than blindly recording a large amount of speech, after the wake word is recorded it is bound with the user information, so that in the subsequent recognition process the wake word can be recognized directly in correspondence with the user. This improves recognition accuracy without recording a large amount of speech, reduces operations, is convenient to use, and improves the degree of intelligence.
Further, referring to Fig. 3, in the wake word binding method based on the above embodiment, the step of acquiring the wake word model registered by the user to the speech recognition system and binding the user information and the wake word with the wake word model includes:
S100: collecting the wake word voice signal input by the user multiple times;
In the present embodiment, the user information is described taking user voiceprint data as an example. In order to improve the accuracy of wake word binding, the method in the present embodiment may, in the sampling phase, collect the wake word voice signal input by the user multiple times, and then obtain the optimal wake word model and voiceprint data from the wake word voice signals collected.
S200: acquiring the tempo feature, tone feature and phoneme feature in the wake word voice signal of each input;
When obtaining the voiceprint data of the user and the wake word model registered by the user to the speech recognition system from the wake word voice signals collected multiple times, specifically, after the wake word voice signal input each time by the same user is converted into a digital voice signal, the tempo feature and tone feature in the voice signal are acquired based on voiceprint recognition technology, and the phoneme features in the voice signal are acquired based on an acoustic model and a grammatical structure, for example by obtaining through endpoint detection the start and end positions of the various segments (such as phonemes, syllables and morphemes) in the voice signal and excluding the silent segments from the voice signal.
S300: performing acoustic feature processing on the tempo features and tone features acquired each time, and registering the acoustically processed tempo feature information and tone feature information as the voiceprint data of the user;
After the wake word voice signal of the first input is acquired, tempo feature 1 and tone feature 1 are obtained based on voiceprint recognition; the tempo feature 2 and tone feature 2 in the wake word voice signal of the second input are then obtained. When the difference between them is large, tempo feature 1 is optimized using tempo feature 2 and tone feature 1 is optimized using tone feature 2, and so on, until the differences between the newly obtained tempo feature n and tone feature n and the current tempo feature n-1 and tone feature n-1, respectively, are within a preset range; the current tempo feature and tone feature, after acoustic feature processing, are then registered as the voiceprint data of the user.
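The iterative refinement of step S300 can be sketched for a single scalar feature as follows. The running-mean update is an assumed choice of "optimization"; the embodiment specifies only that each new capture refines the estimate until successive estimates differ by less than a preset range.

```python
def refine_feature(captures, preset_range=0.05):
    """Refine one scalar feature (e.g. a tempo feature) over repeated
    captures, per step S300: capture n optimizes the estimate from
    captures 1..n-1, stopping once the change falls within preset_range.
    Averaging is an illustrative assumption, not the prescribed method."""
    estimate = captures[0]
    for i, cap in enumerate(captures[1:], start=2):
        new_estimate = estimate + (cap - estimate) / i  # running-mean update
        if abs(new_estimate - estimate) <= preset_range:
            return new_estimate  # converged: register as voiceprint data
        estimate = new_estimate
    return estimate

# Four captures of the same feature; the third barely moves the estimate.
print(refine_feature([1.00, 1.20, 1.10, 1.08]))  # approximately 1.1
```

In the embodiment the same loop runs in parallel for the tempo feature and the tone feature, each against its own preset range.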
S400: permuting and combining the phoneme features acquired each time based on a preset acoustic model to obtain the wake word model;
Similarly, after the wake word voice signal of the first input is acquired, phoneme feature 1 is obtained based on the acoustic model and the grammatical structure; the phoneme feature 2 in the wake word voice signal of the second input is then obtained, giving the position of each phoneme in the permutation and combination. When the first input differs from the second input, the phoneme feature 3 in the wake word voice signal of a third input is obtained, and so on, until the position of each phoneme in the preset phoneme permutation and combination of the wake word model is determined, whereupon the wake word model is obtained.
S500: saving the voiceprint data, the wake word and the wake word model in association with one another.
After the voiceprint data of the user and the registered wake word model are obtained, the user information of the user (such as the user account or user number), the voiceprint data and the wake word are saved to the speech recognition system in association with the wake word model, so that in the subsequent wake-up process the wake word model corresponding to the user can be determined from the recognized voiceprint data for subsequent wake word recognition. Associating the voiceprint data with the wake word and the wake word model makes wake-up through voiceprint recognition more accurate.
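The associative save of step S500 can be sketched as one record per user tying account, voiceprint data, wake word and wake word model together, so that a recognized voiceprint later retrieves the matching model. The field names and string placeholders are illustrative, not prescribed by the embodiment.

```python
registry = {}  # keyed by user account, standing in for the memory 109

def save_binding(account, voiceprint, wake_word, wake_word_model):
    """Step S500: save voiceprint data, wake word and wake word model
    in association with one another under the user's account."""
    registry[account] = {
        "voiceprint": voiceprint,
        "wake_word": wake_word,
        "model": wake_word_model,
    }

def model_for_voiceprint(voiceprint):
    # Subsequent wake-up: look up the wake word model by recognized voiceprint.
    for record in registry.values():
        if record["voiceprint"] == voiceprint:
            return record["model"]
    return None  # no registered voiceprint matches

save_binding("user_42", "vp_A", "air conditioner", "model_A")
print(model_for_voiceprint("vp_A"))  # model_A
print(model_for_voiceprint("vp_B"))  # None
```

A linear scan suffices for a sketch; a real system would index the store by voiceprint identity.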
Further, referring to Fig. 4, in the wake word binding method based on the above embodiment, step S20 includes:
S20a: when a voice signal is received, judging whether the volume value of the voice signal is greater than a preset volume value;
In the present embodiment, since a voiceprint is a sound wave spectrum carrying verbal information, the voiceprint itself is closely related to amplitude, frequency, fundamental frequency contour, formant frequency bandwidth and the like, and during propagation the farther a sound wave travels, the smaller the volume value of the received voice signal, the volume value being inversely related to the propagation distance; the voiceprint is therefore related to the volume value of the received voice signal. In addition, the speech recognition engine of the speech recognition system only recognizes speech whose volume reaches a preset threshold. Therefore, in order to improve the accuracy of voiceprint recognition and speech recognition, it is necessary to judge whether the volume value of the received voice signal is greater than a preset volume value, the preset volume value being the minimum volume value of the voice signal required for voiceprint recognition and speech recognition.
S20b: if so, acquiring the wake word information in the voice signal based on the acoustic model and the grammatical structure, and acquiring the voiceprint information in the voice signal based on voiceprint recognition technology.
When the volume value of the received voice signal is greater than the preset volume value, the received voice signal is judged to be valid, and voiceprint recognition and acoustic model analysis can further be performed on it: for example, the silent parts of segments such as phonemes, syllables and morphemes in the voice signal are excluded based on endpoint detection, the voiceprint information of the voice signal is then obtained based on the syllable characteristics in the voice signal, and the wake word information in the voice signal is obtained based on the morpheme features and phoneme features in the voice signal, the acoustic model and the grammatical structure.
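The endpoint detection mentioned in step S20b can be sketched as frame-level energy gating: frames below an energy threshold are treated as silent segments and excluded. This is a minimal sketch; a real detector would also smooth decisions across neighboring frames.

```python
def remove_silent_segments(frames, energy_threshold=0.02):
    """Energy-based endpoint detection per step S20b: keep only frames
    whose mean squared amplitude reaches the (assumed) threshold."""
    def energy(frame):
        return sum(s * s for s in frame) / len(frame)
    return [f for f in frames if energy(f) >= energy_threshold]

frames = [
    [0.4, -0.4, 0.4],    # voiced: energy 0.16
    [0.01, 0.0, -0.01],  # silent: energy ~0.00007
    [0.3, 0.3, -0.3],    # voiced: energy 0.09
]
print(len(remove_silent_segments(frames)))  # 2
```

The surviving frames would then feed the syllable and phoneme analysis described above.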
Further, referring to Fig. 5, the wake word binding method based on the above embodiment further includes, after step S30:
S40: receiving a wake voice signal, and extracting the wake word in the wake voice signal;
S50: when the wake word matches the preset wake word in the speech recognition system, responding to the wake word voice signal and performing the corresponding operation.
After the user has a bound wake word, upon receiving a wake word voice signal a wake-up operation is made: the wake word in the wake voice signal is extracted, and when the wake word matches the preset wake word stored in the speech recognition system in correspondence with the user, the wake word voice signal is responded to and the corresponding operation is performed, achieving accurate wake-up.
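Steps S40 and S50 can be sketched as follows, with the wake voice signal modeled as already-recognized text for illustration; the actual extraction uses the acoustic analysis described earlier.

```python
def respond_to_wake(wake_voice_signal, bound_wake_words):
    """Steps S40-S50: extract the wake word from the wake voice signal and
    perform the corresponding operation only when it matches a wake word
    bound in the speech recognition system."""
    wake_word = wake_voice_signal.strip().lower()   # step S40 (text stand-in)
    if wake_word in bound_wake_words:               # step S50: match check
        return f"performing operation for '{wake_word}'"
    return None  # no response: not a bound wake word

bound = {"air conditioner", "fan"}
print(respond_to_wake("Air Conditioner", bound))  # performing operation for 'air conditioner'
print(respond_to_wake("hello", bound))            # None
```

Returning None models the system simply not waking for unmatched input.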
Further, in order to better achieve wake-up and reduce the error rate, referring to Fig. 6, the method further includes, after the step S40:
S60: adjusting the recognition threshold of the preset wake word in the speech recognition system;
S70: when the wake word matches the adjusted preset wake word, responding to the wake word voice signal and performing the corresponding operation.
The wake word recognition threshold is adjusted rather than held fixed, being adjusted as the user's situation differs. Specifically, referring to Fig. 7, the adjustment process includes:
S201: extracting the voiceprint information in the wake word voice signal;
After the wake word information is extracted, the voiceprint information is extracted from the wake word voice signal. The main purpose of the embodiments of the present invention is precisely to solve the problem of a low wake-up rate when a user uses a personalized or custom wake word to wake a speech recognition system, or a smart device equipped with a speech recognition system, and the core of wake word binding technology and speech recognition technology is training models and recognition models; therefore, in order to improve the wake-up rate of speech recognition, the corresponding wake word model and voiceprint data need to be registered with the speech recognition system in advance, for waking the speech recognition system after the user inputs a matching voice signal. In order to further improve the wake-up rate of the speech recognition system and avoid false wake-up caused by ambient noise, it may first be judged whether there is voiceprint data matching the voiceprint information in the speech recognition system. When such voiceprint data exists in the speech recognition system, step S202 is performed; when it does not, step S203 is performed.
S202: lowering the wake word recognition threshold of the speech recognition system;
When there is voiceprint data matching the voiceprint information in the speech recognition system, it can be determined, according to the voiceprint data the user has registered in the speech recognition system, that the current user of the smart device is a registered user, ruling out false wake-up by ambient noise or other sounds. The wake word recognition threshold of the user corresponding to the voiceprint data is therefore lowered, so as to improve the probability of the user waking the speech recognition system.
S203: raising the wake word recognition threshold of the speech recognition system.
When there is no voiceprint data matching the voiceprint information in the speech recognition system, it can be inferred that the voice signal may be ambient noise or may have been sent by an unregistered user. In order to avoid false wake-up caused by ambient noise while improving the security of the speech recognition system, the wake word recognition threshold of the speech recognition system can accordingly be raised at this time, increasing the difficulty of waking.
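Steps S202 and S203 fix only the direction of adjustment; the step size and bounds below are illustrative assumptions.

```python
def adjust_wake_threshold(current, voiceprint_matched,
                          step=0.1, floor=0.1, ceiling=0.9):
    """Steps S202/S203: lower the wake word recognition threshold when the
    incoming voiceprint matches registered voiceprint data, raise it when it
    does not. step, floor and ceiling are assumed values, clamping the
    threshold to a sane range."""
    if voiceprint_matched:
        return max(floor, current - step)   # S202: registered user, easier to wake
    return min(ceiling, current + step)     # S203: noise/unregistered, harder to wake

print(adjust_wake_threshold(0.5, True))   # 0.4
print(adjust_wake_threshold(0.5, False))  # 0.6
```

Clamping prevents repeated adjustments from driving the threshold to a value where the system either never wakes or wakes on anything.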
Further, with reference to Fig. 8, the wake-up word binding method based on the above-described embodiment further includes, after step S201:
S204: Calculate, according to a preset voiceprint model, the similarity between the voiceprint information and the voiceprint data registered in the speech recognition system;
In this embodiment, when determining whether voiceprint data matching the voiceprint information in the voice signal exists in the speech recognition system, the similarity between the voiceprint information in the voice signal and the voiceprint data registered in the speech recognition system may be calculated based on the preset voiceprint model, in order to improve the accuracy of voiceprint recognition and thereby the wake-up rate of subsequent speech recognition. Specifically, syllable-state segmentation may be performed on tone A in the voiceprint information based on the preset voiceprint model, syllable-state segmentation may then be performed by the same means on tone S in the voiceprint data, and the coincidence degree of each state syllable of tone A and tone S may then be compared; this coincidence degree is the similarity. In other embodiments, the similarity may also be calculated by comparing timing B in the voiceprint information of the voice signal with timing D in the voiceprint data.
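The coincidence-degree comparison described above can be sketched as a simple function over two syllable-state sequences. The disclosure does not specify the exact metric; treating the similarity as the fraction of aligned states that coincide is an assumption made here for illustration.

```python
def state_overlap_similarity(states_a: list, states_b: list) -> float:
    """Coincidence degree of two syllable-state sequences (tone A vs. tone S),
    taken as the similarity: the fraction of aligned states that coincide,
    normalized by the longer sequence so extra states count against the score."""
    if not states_a or not states_b:
        return 0.0
    matches = sum(1 for a, b in zip(states_a, states_b) if a == b)
    return matches / max(len(states_a), len(states_b))
```

For example, two segmentations that agree on two of three states would score 2/3; identical segmentations score 1.0.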
S205: When the similarity is within a preset range, determine that voiceprint data matching the voiceprint information exists in the speech recognition system;
When the coincidence degree of each state syllable of tone A and tone S is within the preset range, it may be determined that voiceprint data matching the voiceprint information exists in the speech recognition system.
S206: When the similarity is outside the preset range, determine that no voiceprint data matching the voiceprint information exists in the speech recognition system.
When the coincidence degree of each state syllable of tone A and tone S is outside the preset range, it is determined that no voiceprint data matching the voiceprint information exists in the speech recognition system.
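The decision in steps S205/S206 reduces to a range test on the computed similarity. The bounds below are illustrative assumptions; the disclosure only says "a preset range".

```python
def has_matching_voiceprint(similarity: float,
                            lower: float = 0.70,
                            upper: float = 1.00) -> bool:
    """S205/S206: a similarity inside the preset range means matching
    voiceprint data exists in the speech recognition system; a similarity
    outside the range means it does not."""
    return lower <= similarity <= upper
```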
Further, with reference to Fig. 9, in the wake-up word binding method based on the above-described embodiment, step S203 includes:
S2031: When no voiceprint data matching the voiceprint information exists in the speech recognition system, obtain current user state information and image information;
In this embodiment, when the coincidence degree of each state syllable of tone A and tone S is outside the preset range, it is determined that no voiceprint data matching the voiceprint information exists in the speech recognition system. At this point the user may have input an unregistered wake-up word, or ambient noise may have been received. It is therefore necessary to further obtain current user state information and image information, to determine whether the current user is a registered user and whether the received voice signal is ambient noise.
S2032: When it is detected that the current user is not vocalizing, is outside the recognition range of the speech recognition system, or is unregistered, turn up the wake-up word recognition threshold of the speech recognition system.
When it is determined according to the obtained current user state information that the user is not vocalizing, or that the user is outside the recognition range of the speech recognition system, the received voice signal is judged to be ambient noise; to reduce false wake-ups caused by ambient noise, the wake-up word recognition threshold of the speech recognition system is turned up, increasing the wake-up difficulty and reducing the false wake-up rate. When it is determined according to the obtained current user image information that the current user is unregistered, the wake-up word recognition threshold of the speech recognition system is likewise turned up, increasing the wake-up difficulty and improving the security of speech recognition.
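The three conditions of step S2032 combine into a single decision: raise the threshold unless the user is vocalizing, in range, and registered. A minimal sketch, with argument names assumed for illustration:

```python
def should_raise_threshold(user_vocalizing: bool,
                           within_recognition_range: bool,
                           user_registered: bool) -> bool:
    """S2032: turn up the wake-up word recognition threshold when the signal
    is likely ambient noise (user silent or out of range) or when the current
    user, per the obtained image information, is unregistered."""
    return not (user_vocalizing and within_recognition_range and user_registered)
```

Only when all three checks pass does the threshold stay untouched; any single failure is enough to raise it.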
Further, with reference to Fig. 10, in the wake-up word binding method based on the above-described embodiment, step S70 includes:
S71: Count the matching degree between the wake-up word information in the received voice signal and the wake-up word model registered with the speech recognition system;
In this embodiment, since the wake-up word information in the voice signal is mainly matched against the wake-up word model, the specific matching method may be the matching degree of the permutations and combinations of phonemes. For example, when the wake-up word model includes 48 phonemes, the wake-up word information in the received voice signal needs to be counted, that is, the phoneme features in the wake-up word information are counted; the phonemes in the wake-up word information are then compared, and when they reach a preset quantity, the permutations and combinations of the phonemes are further compared.
S72: When the matching degree reaches the turned-down or turned-up wake-up word recognition threshold, wake up the speech recognition system or the smart device on which the speech recognition system resides.
When the phonemes in the wake-up word information reach the preset quantity and the coincidence rate of the phoneme permutations and combinations exceeds a preset threshold, it is determined that the matching degree between the wake-up word information in the voice signal and the wake-up word model has reached the turned-down or turned-up wake-up word recognition threshold. At this point the voice signal can be responded to, for example by waking up the speech recognition system or the smart device on which the speech recognition system resides, so as to recognize the phonetic control commands or voice interaction instructions subsequently input by the user and respond with control actions or interactive actions, thereby improving the intelligence of the smart device.
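Steps S71/S72 can be sketched as a phoneme-level matching degree checked against the adjusted threshold. The disclosure compares phoneme permutations and combinations without specifying the metric; approximating the coincidence rate as positional agreement with the registered model is an assumption of this sketch.

```python
def phoneme_match_degree(heard: list, model: list) -> float:
    """S71 sketch: coincidence rate of the phoneme permutation, approximated
    here as the fraction of model positions the heard phonemes agree with."""
    if not model:
        return 0.0
    matches = sum(1 for h, m in zip(heard, model) if h == m)
    return matches / len(model)

def try_wake(heard: list, model: list, threshold: float) -> bool:
    """S72: wake the speech recognition system (or its smart device) when the
    matching degree reaches the turned-down or turned-up recognition threshold."""
    return phoneme_match_degree(heard, model) >= threshold
```

Note how this composes with the earlier threshold adjustment: a registered user's lowered threshold lets a slightly imperfect utterance still wake the system, while the raised threshold demands a near-exact match.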
In addition, an embodiment of the present invention further provides a storage medium storing a wake-up word binding application program. When the wake-up word binding program is executed by a processor, the steps of the wake-up word binding method described above are implemented.
The method implemented when the wake-up word binding program is executed may refer to the respective embodiments of the wake-up word binding method of the present invention, and details are not repeated here.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical memory, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the computer or by the processor of the other programmable data processing device generate a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory generate a manufactured article including an instruction device, which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to generate computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
It should be noted that in the claims, any reference mark between bracket should not be configured to powerThe limitation that profit requires.Word "comprising" does not exclude the presence of component not listed in the claims or step.Before componentWord "a" or "an" does not exclude the presence of multiple such components.The present invention can be by means of including several different componentsIt hardware and is realized by means of properly programmed computer.In the unit claims listing several devices, these are filledSeveral in setting can be embodied by the same hardware branch.The use of word first, second, and third is notIndicate any sequence.These words can be construed to title.
Although preferred embodiments of the present invention have been described, it is created once a person skilled in the art knows basicProperty concept, then additional changes and modifications may be made to these embodiments.So it includes excellent that the following claims are intended to be interpreted asIt selects embodiment and falls into all change and modification of the scope of the invention.
Obviously, various changes and modifications can be made to the invention without departing from essence of the invention by those skilled in the artGod and range.In this way, if these modifications and changes of the present invention belongs to the range of the claims in the present invention and its equivalent technologiesWithin, then the present invention is also intended to include these modifications and variations.