Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The embodiments of the present invention provide a voiceprint recognition method and device. The voiceprint recognition method and device can be applied in any scenario or equipment in which the identity of an unknown user needs to be recognized. The characters in the character string used for voiceprint recognition may be Arabic numerals, English letters, characters of other languages, and so on. To simplify the description, the embodiments of the present invention are described using Arabic numerals as the characters.
The voiceprint recognition method in the embodiments of the present invention can be divided into two stages, as shown in Fig. 1:
1) The voiceprint registration stage of a registered user
In the voiceprint registration stage, a registered user reads a registration character string aloud (namely the second character string referred to hereinafter). The voiceprint recognition device collects the registration voice information produced when the registered user reads the registration character string aloud, and then performs speech recognition on the registration voice information to obtain the speech segments, contained in the registration voice information, that respectively correspond to the multiple characters in the registration character string. The device then performs voiceprint feature extraction and voiceprint model training on the speech segment corresponding to each character. That is, according to the voiceprint feature of the speech segment corresponding to each character, and in combination with the preset universal background model (Universal Background Model, UBM, i.e. GMM-UBM) of the corresponding character, it trains to obtain the feature vector corresponding to each character in the registration voice information. The voiceprint recognition device can then store in its model library, for each registered user, the feature vectors corresponding to the multiple characters in the registration voice information that the user read aloud in the voiceprint registration stage.
For example, if the registration character string is the digit string 0185851, it contains four distinct digits "0", "1", "5" and "8". The voiceprint recognition device performs voiceprint feature extraction and voiceprint model training on the speech segment corresponding to each character in the registration voice information to obtain the voiceprint features of the speech segments corresponding to "0", "1", "5" and "8". It then trains in combination with the preset UBMs of the corresponding characters to obtain the feature vector corresponding to each character in the registration voice information, namely the feature vectors corresponding to the digits "0", "1", "5" and "8" respectively.
2) The identity recognition stage of a verification user
In the identity recognition stage, a verification user, i.e. a user of unknown identity, reads a verification character string aloud (namely the first character string referred to hereinafter; the second character string and the first character string share at least one identical character). The voiceprint recognition device collects the verification voice information produced when the verification user reads the verification character string aloud, and then performs speech recognition on the verification voice information to obtain the speech segments, contained in the verification voice information, that respectively correspond to the multiple characters in the verification character string. The device then performs voiceprint feature extraction and voiceprint model training on the speech segment corresponding to each character. That is, according to the voiceprint feature of the speech segment corresponding to each character, and in combination with the preset UBM of the corresponding character, it trains to obtain the feature vector corresponding to each character in the verification voice information. Finally, it calculates the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information. If the similarity score reaches a preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information.
For example, if the verification character string is the digit string 85851510, the voiceprint recognition device performs voiceprint feature extraction and voiceprint model training on the speech segment corresponding to each character in the verification voice information produced when the verification user reads the string aloud, obtaining the GMMs corresponding to "0", "1", "5" and "8". In combination with the preset UBMs of the corresponding characters, it can then calculate the feature vectors of the verification user's verification voice information, namely the feature vectors corresponding to the digits "0", "1", "5" and "8" respectively. It then calculates the similarity scores between the feature vectors corresponding to "0", "1", "5" and "8" in the verification voice information and the feature vectors corresponding to "0", "1", "5" and "8" in the registration voice information respectively. If the similarity score reaches the preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information.
It should be pointed out that the voiceprint registration stage of the registered user and the identity recognition stage of the verification user may be implemented in the same device, or may be implemented in different devices. For example, the voiceprint registration stage of the registered user is implemented in a first device, and the first device then sends the feature vectors corresponding to the multiple characters in the registration voice information to a second device, so that the identity recognition stage of the verification user can be implemented in the second device.
The two processes above are described in detail below through specific embodiments.
Fig. 2 is a schematic flowchart of a voiceprint recognition method in an embodiment of the present invention. As shown in the figure, the voiceprint recognition method in this embodiment may include:
S201: obtain the verification voice information produced by a verification user reading a first character string aloud.
The verification user is a user of unknown identity whose identity needs to be verified by the voiceprint recognition device. The first character string is the character string used to authenticate the verification user. It may be randomly generated, or it may be a preset fixed character string, for example a character string that is at least partly identical to the second character string corresponding to the pre-generated registration voice information. Specifically, the character string may include m characters, of which n are mutually distinct, where m and n are positive integers and m ≥ n.
For example, the first character string is "12358948", 8 characters in total, containing 7 mutually distinct characters: "1", "2", "3", "4", "5", "8" and "9".
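The relationship between m and n in the example above can be checked with a minimal sketch. The function name `count_characters` is an illustrative assumption and does not come from the embodiment.

```python
def count_characters(char_string):
    """Return (m, n): total characters and mutually distinct characters."""
    m = len(char_string)          # total number of characters read aloud
    n = len(set(char_string))     # number of mutually distinct characters
    return m, n


if __name__ == "__main__":
    m, n = count_characters("12358948")
    print(m, n)
```

Because every distinct character occurs at least once, m ≥ n always holds for any string processed this way.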
In an optional embodiment, the voiceprint recognition device may generate and display the first character string, so that the verification user reads it aloud according to the displayed first character string.
S202: perform speech recognition on the verification voice information to obtain the speech segments, contained in the verification voice information, that respectively correspond to the multiple characters in the first character string.
As shown in Fig. 3, the voiceprint recognition device may divide the verification voice information into the speech segments corresponding to the multiple characters by means of speech recognition and sound-intensity filtering, and may optionally also remove invalid speech segments so that they do not take part in subsequent processing.
S203: extract the voiceprint feature of the speech segment corresponding to each character.
Specifically, the voiceprint recognition device may extract the MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Predictive) coefficients of the speech segment corresponding to each character as the voiceprint feature of that speech segment.
S204: according to the voiceprint feature of the speech segment corresponding to each character, and in combination with the preset universal background model of the corresponding character, train to obtain the feature vector corresponding to each character in the verification voice information.
The universal background model (UBM) in the embodiments of the present invention is a Gaussian mixture model obtained by joint training on speech segments of one particular digit spoken by a large number of speakers. It characterizes the distribution of the speech of the corresponding digit in the feature space. Because the training data comes from a large number of speakers, it does not characterize any one specific speaker and is independent of identity, so it can be regarded as a universal background model. Illustratively, a UBM may be trained using speech samples from more than 1000 speakers with a duration of more than 20 hours, in which the characters occur with relatively balanced frequency. The mathematical expression of the UBM is:
$$P(x) = \sum_{i=1}^{C} a_i \, N(x \mid \mu_i, \Sigma_i) \tag{1}$$
Here, P(x) represents the probability distribution of the UBM; C indicates that the UBM contains C Gaussian components in total, which are summed; $a_i$ represents the weight of the i-th Gaussian component; $\mu_i$ represents the mean of the i-th Gaussian component; $\Sigma_i$ represents the variance of the i-th Gaussian component; N(x) represents the Gaussian distribution; and x represents the input sample, namely the voiceprint feature.
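Formula (1) can be evaluated directly, as in the following minimal sketch. It assumes diagonal covariances (each Σ_i stored as a list of per-dimension variances), which the embodiment does not specify. The function names `gaussian_pdf` and `ubm_density` are illustrative.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density N(x | mean, var) of a Gaussian with diagonal covariance."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def ubm_density(x, weights, means, variances):
    """Evaluate P(x) = sum_i a_i * N(x | mu_i, Sigma_i) as in formula (1)."""
    return sum(
        a * gaussian_pdf(x, mu, var)
        for a, mu, var in zip(weights, means, variances)
    )


if __name__ == "__main__":
    # Toy two-component model over 2-dimensional features (hypothetical values).
    weights = [0.4, 0.6]
    means = [[0.0, 0.0], [1.0, 1.0]]
    variances = [[1.0, 1.0], [0.5, 0.5]]
    print(ubm_density([0.5, 0.5], weights, means, variances))
```

For real MFCC or PLP features, the per-frame densities are usually accumulated in the log domain to avoid underflow; the sketch keeps the direct form so that it matches formula (1) term by term.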
The voiceprint recognition device can take the voiceprint feature of the speech segment corresponding to each character in the verification voice information as training sample data, and use the maximum a posteriori (Maximum A Posteriori, MAP) algorithm to adjust the parameters of the preset universal background model of the corresponding character. That is, after the voiceprint feature of the speech segment corresponding to each character in the verification voice information is substituted into formula (1) as the input sample, the parameters of the preset universal background model of the corresponding character are adjusted continually so that the posterior probability P(x) is maximized. The feature vector corresponding to that character in the verification voice information can then be determined from the parameters that maximize the posterior probability P(x).
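Because a large number of experiments and papers have demonstrated that the means of the Gaussian components in a UBM can be used to distinguish the identity information of speakers, the mean supervector of the UBM is defined as the concatenation of the means of its C Gaussian components:
$$\left[\mu_1^{\top}, \mu_2^{\top}, \ldots, \mu_C^{\top}\right]^{\top}$$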
Accordingly, the voiceprint recognition device can take the voiceprint feature of the speech segment corresponding to each character in the verification voice information as training sample data, and use the maximum a posteriori (Maximum A Posteriori, MAP) algorithm to adjust the mean supervector of the preset universal background model of the corresponding character. That is, after the voiceprint feature of the speech segment corresponding to each character in the verification voice information is substituted into formula (1) as the input sample, the mean supervector is adjusted continually so that the posterior probability P(x) is maximized. The mean supervector that maximizes the posterior probability P(x) can then be taken as the feature vector corresponding to that character in the verification voice information.
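The adjustment of the mean supervector can be illustrated with a minimal sketch of one common form of MAP mean adaptation. The embodiment does not give the update rule, so the relevance-factor interpolation used here, the diagonal covariances, and the names `map_adapt_means`, `supervector` and `relevance` are all assumptions made for illustration.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Shift each UBM component mean toward the frames it explains."""
    num_comp, dim = len(means), len(means[0])
    counts = [0.0] * num_comp                        # soft frame counts
    sums = [[0.0] * dim for _ in range(num_comp)]    # weighted feature sums
    for x in frames:
        log_p = [
            math.log(w) + log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        peak = max(log_p)
        post = [math.exp(lp - peak) for lp in log_p]
        total = sum(post)
        for c in range(num_comp):
            gamma = post[c] / total                  # component responsibility
            counts[c] += gamma
            for d in range(dim):
                sums[c][d] += gamma * x[d]
    adapted = []
    for c in range(num_comp):
        if counts[c] == 0.0:
            adapted.append(list(means[c]))           # no data: keep the UBM mean
            continue
        alpha = counts[c] / (counts[c] + relevance)  # data-versus-prior weight
        adapted.append([
            alpha * (sums[c][d] / counts[c]) + (1.0 - alpha) * means[c][d]
            for d in range(dim)
        ])
    return adapted


def supervector(means):
    """Concatenate the component means into one mean supervector."""
    return [value for mean in means for value in mean]
```

Under these assumptions, `supervector(map_adapt_means(...))` computed from the frames of one character's speech segment would play the role of that character's feature vector. Components that see little data stay close to the UBM means.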
In another optional embodiment, to alleviate the slow convergence caused by the high dimensionality of the supervector, probabilistic principal component analysis (PPCA) is used to restrict the range of variation of the mean supervector to a subspace. The voiceprint recognition device can take the voiceprint feature of the speech segment corresponding to each character in the verification voice information as training sample data, use the maximum a posteriori algorithm to adjust the mean supervector of the preset universal background model of the corresponding character, and, in combination with a preset supervector subspace matrix, obtain the feature vector corresponding to each character in the verification voice information. In a specific implementation, the following formula may be used to adjust the mean supervector of the preset universal background model of the corresponding character so that the posterior probability of the adjusted universal background model of that character is maximized:
M = m + Tω, where M represents the mean supervector of the universal background model of a given character after adjustment, m represents the mean supervector of the universal background model of that character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to that character in the verification voice information. After the voiceprint feature of the speech segment corresponding to each character in the verification voice information is substituted into formula (1) as the input sample, the mean supervector in formula (1) can be adjusted by continually adjusting ω so that the posterior probability P(x) is maximized, and the ω that maximizes the posterior probability P(x) can then be taken as the feature vector corresponding to that character in the verification voice information. The supervector subspace matrix T is determined according to the correlation between the dimensions of the mean supervector of the Gaussian mixture model.
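The relation M = m + Tω can be made concrete with a small worked sketch. It only shows how a candidate ω maps to an adjusted mean supervector. The search for the ω that maximizes the posterior probability is not shown, and the numeric values and the function name `adjusted_supervector` are hypothetical.

```python
def adjusted_supervector(m, T, omega):
    """Return M = m + T * omega for a supervector m and subspace matrix T.

    m     : list of length D (mean supervector before adjustment)
    T     : list of D rows, each of length R (supervector subspace matrix)
    omega : list of length R (low-dimensional feature vector)
    """
    if len(T) != len(m) or any(len(row) != len(omega) for row in T):
        raise ValueError("dimension mismatch between m, T and omega")
    return [
        mi + sum(t * w for t, w in zip(row, omega))
        for mi, row in zip(m, T)
    ]


if __name__ == "__main__":
    m = [1.0, 2.0, 3.0]                        # D = 3
    T = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # D x R with R = 2
    omega = [0.5, -1.0]
    print(adjusted_supervector(m, T, omega))
```

Because R is typically much smaller than D, optimizing over ω rather than over the full supervector reduces the number of free parameters, which is the motivation for the subspace restriction described in the text.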
S205: calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information; if the similarity score reaches a preset verification threshold, determine the verification user to be the registered user corresponding to the registration voice information.
Specifically, the voiceprint recognition device may obtain the registration voice information of the registered user in the voiceprint registration stage and, through voiceprint feature extraction and voiceprint model training similar to those of this embodiment, obtain the feature vector corresponding to the speech segment of each character in the registration voice information. The registration voice information may be the registration voice information that the voiceprint recognition device obtains when the registered user reads a second character string aloud. The second character string and the first character string share at least one identical character, i.e. the second character string corresponding to the registration voice information is at least partly identical to the first character string. Alternatively, in an optional embodiment, the voiceprint recognition device may obtain the feature vectors corresponding to the characters in the registration voice information from outside. That is, after the registered user enters the registration voice information through another device, that device or a server obtains, through voiceprint feature extraction and voiceprint model training, the feature vector corresponding to the speech segment of each character in the registration voice information. The voiceprint recognition device obtains these feature vectors from the other device or the server, and in the identity recognition stage of the verification user compares them with the feature vectors corresponding to the characters in the verification voice information.
In a specific implementation, the similarity score is a value that measures the degree of similarity between the two feature vectors of the same character, obtained after the voiceprint recognition device compares the feature vector corresponding to each character in the verification voice information with the feature vector corresponding to the same character in the preset registration voice information. In an optional embodiment, the cosine distance value between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information may be calculated as the similarity score. That is, the similarity score between the feature vector of a given character in the verification voice information and its feature vector in the registration voice information is calculated by the following formula:
$$\mathrm{score}_i = \frac{\langle \omega_i(tar),\ \omega_i(test) \rangle}{\lVert \omega_i(tar) \rVert \, \lVert \omega_i(test) \rVert}$$
Here, the subscript i denotes the i-th character shared by the verification voice information and the registration voice information, ωi(tar) denotes the feature vector corresponding to that character in the verification voice information, and ωi(test) denotes the feature vector corresponding to that character in the registration voice information. If the verification voice information and the registration voice information contain multiple identical characters, the similarity scores calculated for the individual characters by the above formula may be averaged. If the mean similarity score over the characters reaches the corresponding preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information. If there are multiple registered users, such as the registered users A, B and C shown in Fig. 1, the similarity between the feature vector of a given character of the verification user and the feature vector of the same character of each registered user may be calculated. When the similarity score between the feature vector of that character for some registered user and the feature vector of that character in the verification voice is the highest and reaches the preset verification threshold, that registered user is taken as the identity recognition result for the verification user.
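The scoring and decision rule described above can be sketched as follows. The data layout (a dictionary from character to feature vector, and a model library keyed by user) and the names `cosine_score`, `mean_score` and `identify` are illustrative assumptions, not part of the embodiment.

```python
import math


def cosine_score(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_score(verify_vecs, enrolled_vecs):
    """Average the per-character scores over the shared characters."""
    shared = sorted(set(verify_vecs) & set(enrolled_vecs))
    if not shared:
        return None
    scores = [cosine_score(verify_vecs[c], enrolled_vecs[c]) for c in shared]
    return sum(scores) / len(scores)


def identify(verify_vecs, model_library, threshold):
    """Return (user, score) for the best-scoring registered user that
    reaches the threshold, or (None, best_score) if no user does."""
    best_user, best = None, None
    for user, enrolled_vecs in model_library.items():
        score = mean_score(verify_vecs, enrolled_vecs)
        if score is not None and (best is None or score > best):
            best_user, best = user, score
    if best is not None and best >= threshold:
        return best_user, best
    return None, best
```

With a single registered user the same function reduces to a plain verification decision against the threshold. The threshold value itself would have to be tuned on held-out data and is not given in the text.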
In an optional embodiment, if the same character occurs more than once in the verification voice information, for example 0, 1, 5 and 8 each occur twice in the verification voice information as shown in Fig. 2, then the feature vectors obtained by processing the speech segments corresponding to the two occurrences of character 0 may each be scored against the feature vector of character 0 in the preset registration voice information, and the average of these similarity scores may be taken as the similarity score between the feature vector of character 0 in this verification voice information and the feature vector of character 0 in the preset registration voice information, and so on for the other characters.
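The averaging over repeated occurrences of one character can be sketched as follows. The function names are illustrative assumptions, and cosine similarity is used as the per-occurrence score in line with the optional embodiment described earlier.

```python
import math


def cosine_score(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def repeated_character_score(occurrence_vecs, enrolled_vec):
    """Average the scores of every occurrence of one character.

    occurrence_vecs : feature vectors, one per occurrence of the character
                      in the verification voice information
    enrolled_vec    : feature vector of the same character in the
                      registration voice information
    """
    if not occurrence_vecs:
        raise ValueError("the character does not occur in the verification speech")
    scores = [cosine_score(vec, enrolled_vec) for vec in occurrence_vecs]
    return sum(scores) / len(scores)
```

An alternative design would pool the frames of all occurrences before estimating a single feature vector; the sketch follows the text and averages the scores instead.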
It should be pointed out that there are many other ways of measuring the similarity between two feature vectors, and the above is only one implementation provided by the present invention. On the basis of the solution disclosed by the present invention, those skilled in the art can obtain, without creative effort, further ways of calculating the similarity score between the feature vectors of the characters shared by the verification voice information and the registration voice information, and the present invention does not list them exhaustively.
Thus, this embodiment obtains the voiceprint feature of the speech segment corresponding to each character in the verification user's verification voice information, trains in combination with the preset UBM of the corresponding character to obtain the feature vector corresponding to each character in the verification voice information, and compares the similarity of the feature vector corresponding to each character in the verification voice information with the feature vector of the same character in the registration voice information, thereby determining the identity of the verification user. Because the user feature vectors being compared correspond to specific characters, this approach takes full account of the voiceprint characteristics of a user reading different characters aloud, and can therefore effectively improve the accuracy of voiceprint recognition.
Fig. 5 is a schematic flowchart of the voiceprint registration of a registered user in an embodiment of the present invention. As shown in the figure, the voiceprint registration process in this embodiment may include:
S501: obtain the registration voice information produced by a registered user reading a second character string aloud, where the second character string and the first character string share at least one identical character.
The registered user is a user whose legitimate identity has been established, and the second character string is the character string used to collect the voiceprint feature vectors of the registered user. It may be randomly generated, or it may be a preset fixed character string. Specifically, the second character string may also include m characters, of which n are mutually distinct, where m and n are positive integers and m ≥ n.
In an optional embodiment, the voiceprint recognition device may generate and display the second character string, so that the registered user reads it aloud according to the displayed second character string.
S502: perform speech recognition on the registration voice information to obtain the speech segments, contained in the registration voice information, that respectively correspond to the multiple characters in the second character string.
The voiceprint recognition device may divide the registration voice information into the speech segments corresponding to the multiple characters by means of speech recognition and sound-intensity filtering, and may optionally also remove invalid speech segments so that they do not take part in subsequent processing.
S503: extract the voiceprint feature of the speech segment corresponding to each character in the registration voice information.
Specifically, the voiceprint recognition device may extract the MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Predictive) coefficients of the speech segment corresponding to each character as the voiceprint feature of that speech segment.
S504: according to the voiceprint feature of the speech segment corresponding to each character in the registration voice information, and in combination with the preset universal background model of the corresponding character, train to obtain the feature vector corresponding to each character in the registration voice information.
For the expression of the UBM, refer to the embodiment above. This step of the voiceprint registration process is similar to step S204 of the voiceprint recognition process. The voiceprint recognition device can take the voiceprint feature of the speech segment corresponding to each character in the registration voice information as training sample data, and use the maximum a posteriori (Maximum A Posteriori, MAP) algorithm to adjust the parameters of the preset universal background model of the corresponding character. That is, after the voiceprint feature of the speech segment corresponding to each character in the registration voice information is substituted into formula (1) as the input sample, the parameters of the preset universal background model of the corresponding character are adjusted continually so that the posterior probability P(x) is maximized. The feature vector corresponding to that character in the registration voice information can then be determined from the parameters that maximize the posterior probability P(x).
Furthermore, because the means of the Gaussian components in a UBM can be used to distinguish the identity information of speakers, the voiceprint recognition device can take the voiceprint feature of the speech segment corresponding to each character in the registration voice information as training sample data, and use the maximum a posteriori (Maximum A Posteriori, MAP) algorithm to adjust the mean supervector of the preset universal background model of the corresponding character. That is, after the voiceprint feature of the speech segment corresponding to each character in the registration voice information is substituted into formula (1) as the input sample, the mean supervector is adjusted continually so that the posterior probability P(x) is maximized. The mean supervector that maximizes the posterior probability P(x) can then be taken as the feature vector corresponding to that character in the registration voice information.
In another optional embodiment, the following formula may be used to adjust the mean supervector of the preset universal background model of the corresponding character so that the posterior probability of the adjusted universal background model of that character is maximized:
M = m + Tω, where M represents the mean supervector of the universal background model of a given character after adjustment, m represents the mean supervector of the universal background model of that character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to that character in the registration voice information. After the voiceprint feature of the speech segment corresponding to each character in the registration voice information is substituted into formula (1) as the input sample, the mean supervector in formula (1) can be adjusted by continually adjusting ω so that the posterior probability P(x) is maximized, and the ω that maximizes the posterior probability P(x) can then be taken as the feature vector corresponding to that character in the registration voice information.
Fig. 6 is a schematic flowchart of a voiceprint recognition method in another embodiment of the present invention. As shown in the figure, the voiceprint recognition method in this embodiment may include the following steps:
S601: randomly generate a first character string and display it.
S602: obtain the verification voice information produced by a verification user reading the first character string aloud.
S603: identify the valid speech segments and the invalid speech segments in the verification voice information.
Specifically, the verification voice may be divided according to sound intensity, and speech segments with low sound intensity are treated as invalid speech segments (for example, silent sections and impulse noise).
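The division by sound intensity can be illustrated with a minimal energy-threshold sketch. The frame length, the use of mean-square energy as the intensity measure, the threshold value, and the function names are assumptions made for illustration; the embodiment does not specify them, and this sketch does not treat impulse noise separately.

```python
def frame_energies(samples, frame_len):
    """Mean-square energy of consecutive non-overlapping frames."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def valid_regions(samples, frame_len, threshold):
    """Return (start, end) sample ranges whose frame energy reaches the
    threshold; everything outside them is treated as invalid speech."""
    regions, start = [], None
    energies = frame_energies(samples, frame_len)
    for idx, energy in enumerate(energies):
        if energy >= threshold and start is None:
            start = idx                                   # a valid region opens
        elif energy < threshold and start is not None:
            regions.append((start * frame_len, idx * frame_len))
            start = None                                  # the region closes
    if start is not None:
        regions.append((start * frame_len, len(energies) * frame_len))
    return regions


if __name__ == "__main__":
    # Silence, a burst of signal, then silence again (hypothetical samples).
    samples = [0.0] * 4 + [1.0] * 8 + [0.0] * 4
    print(valid_regions(samples, frame_len=4, threshold=0.5))
```

In practice the threshold is often set relative to the estimated noise floor of the recording rather than as a fixed constant.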
S604: perform speech recognition on the valid speech segments to obtain the speech segments respectively corresponding to the multiple characters in the first character string.
The speech segments respectively corresponding to the multiple characters in the first character string can be obtained through speech recognition.
S605: determine that the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string.
To effectively prevent voiceprint recognition from being carried out with voice information of a registered user that has been illegally recorded or copied, a different first character string may be randomly generated each time, and in this step it is judged whether the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string. If the order is inconsistent, the voiceprint recognition may be judged to have failed. If the order is consistent with that of the corresponding characters in the first character string, the subsequent steps are executed.
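Random generation of the first character string (S601) and the order check (S605) can be sketched together as follows. The function names, the default length of 8, and the representation of the recognition result as a sequence of characters are illustrative assumptions.

```python
import secrets

DIGITS = "0123456789"


def generate_first_string(length=8):
    """Randomly generate a digit string to be displayed to the verification user."""
    return "".join(secrets.choice(DIGITS) for _ in range(length))


def order_is_consistent(recognized_chars, first_string):
    """Check that the recognized characters, in time order, match the
    displayed first character string exactly."""
    return list(recognized_chars) == list(first_string)


if __name__ == "__main__":
    challenge = generate_first_string()
    # A replayed recording of an older challenge would usually fail this check.
    print(challenge, order_is_consistent(list(challenge), challenge))
```

The sketch uses the `secrets` module rather than `random` so that the challenge is hard to predict, which suits the anti-replay purpose described above.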
S606: extract the voiceprint feature of the speech segment corresponding to each character.
Specifically, the voiceprint recognition device may extract the MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Predictive) coefficients of the speech segment corresponding to each character as the voiceprint feature of that speech segment.
S607: take the voiceprint feature of the speech segment corresponding to each character in the verification voice information as training sample data, and use the maximum a posteriori algorithm to adjust the mean supervector of the preset universal background model of the corresponding character, thereby estimating the feature vector corresponding to each character in the verification voice information.
Because a large number of experiments and papers have demonstrated that the means of the Gaussian components in a UBM can be used to distinguish the identity information of speakers, the voiceprint recognition device can take the voiceprint feature of the speech segment corresponding to each character in the verification voice information as training sample data, and use the maximum a posteriori (Maximum A Posteriori, MAP) algorithm to adjust the mean supervector of the preset universal background model of the corresponding character. That is, after the voiceprint feature of the speech segment corresponding to each character in the verification voice information is substituted into formula (1) as the input sample, the mean supervector is adjusted continually so that the posterior probability P(x) is maximized. The mean supervector that maximizes the posterior probability P(x) can then be taken as the feature vector corresponding to that character in the verification voice information.
In another optional embodiment, to alleviate the slow convergence caused by the high dimensionality of the supervector, the voiceprint recognition device may use the following formula to adjust the mean supervector of the preset universal background model of the corresponding character so that the posterior probability of the adjusted universal background model of that character is maximized:
M = m + Tω, where M represents the mean supervector of the universal background model of a given character after adjustment, m represents the mean supervector of the universal background model of that character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to that character in the verification voice information. After the voiceprint feature of the speech segment corresponding to each character in the verification voice information is substituted into formula (1) as the input sample, the mean supervector in formula (1) can be adjusted by continually adjusting ω so that the posterior probability P(x) is maximized, and the ω that maximizes the posterior probability P(x) can then be taken as the feature vector corresponding to that character in the verification voice information.
S608: calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information; if the similarity score reaches a preset verification threshold, determine the verification user to be the registered user corresponding to the registration voice information.
In this embodiment, the voiceprint recognition device may calculate the cosine distance value between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information as the similarity score. That is, the similarity score between the feature vector of a given character in the verification voice information and its feature vector in the registration voice information is calculated by the following formula:
$$\mathrm{score}_i = \frac{\langle \omega_i(tar),\ \omega_i(test) \rangle}{\lVert \omega_i(tar) \rVert \, \lVert \omega_i(test) \rVert}$$
Here, the subscript i denotes the i-th character shared by the verification voice information and the registration voice information, ω_i(tar) denotes the feature vector corresponding to that character in the verification voice information, and ω_i(test) denotes the feature vector corresponding to that character in the registration voice information. If the verification voice information and the registration voice information contain multiple identical characters, the similarity scores calculated for the individual characters by the above formula may be averaged; if the mean similarity score reaches the corresponding preset verification threshold, the verifying user is determined to be the registered user corresponding to the registration voice information. If there are multiple registered users, such as registered users A, B and C shown in Fig. 1, the feature vector of a given character of the verifying user may be compared with the feature vector of the same character of each registered user; when the similarity score between the feature vector of that character for a certain registered user and the feature vector of that character in the verification voice is the highest and reaches the preset verification threshold, that registered user is taken as the identification result for the verifying user.
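The scoring and decision logic described above can be sketched as follows. This is a minimal illustration under stated assumptions: feature vectors are plain lists keyed by character, the names `cosine_score`, `mean_shared_score` and `identify` are hypothetical, and the multi-user decision uses the mean score over shared characters, which is one possible reading of the embodiment.

```python
import math


def cosine_score(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero-length feature vector")
    return dot / norm


def mean_shared_score(verify, enrolled):
    """Average the per-character scores over characters present in both."""
    shared = sorted(set(verify) & set(enrolled))
    if not shared:
        raise ValueError("no shared characters")
    return sum(cosine_score(verify[c], enrolled[c]) for c in shared) / len(shared)


def identify(verify, users, threshold):
    """Return (best user, score), or (None, score) if below the threshold."""
    best_user, best_score = None, float("-inf")
    for name, enrolled in users.items():
        score = mean_shared_score(verify, enrolled)
        if score > best_score:
            best_user, best_score = name, score
    if best_score >= threshold:
        return best_user, best_score
    return None, best_score


# Toy feature vectors (hypothetical) for two registered users.
users = {
    "A": {"1": [1.0, 0.0], "2": [0.0, 1.0]},
    "B": {"1": [0.0, 1.0], "2": [1.0, 0.0]},
}
verify = {"1": [2.0, 0.0], "2": [0.0, 3.0]}
print(identify(verify, users, threshold=0.8))  # ('A', 1.0)
```

The threshold value 0.8 is arbitrary here; in practice it would be tuned on held-out trials.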
Thus, in this embodiment, the feature vector corresponding to each character in the verification voice information is compared for similarity with the feature vector of the same character in the registration voice information, combined with a check of the temporal order of the voice segments, which further ensures the accuracy of the identity determined for the verifying user.
Fig. 7 is a schematic structural diagram of a voiceprint recognition device in an embodiment of the present invention. As shown in the figure, the voiceprint recognition device in this embodiment may include:
Voice acquisition module 710, configured to acquire the verification voice information produced when the verifying user reads aloud the first character string.
The verifying user is a user of unknown identity whose identity needs to be verified by the voiceprint recognition device. The first character string is the character string used to authenticate the verifying user; it may be randomly generated or may be a preset fixed character string, for example a character string that is at least partly identical to the second character string corresponding to the pre-generated registration voice information. Specifically, the character string may include m characters, of which n are mutually different, where m and n are positive integers and m ≥ n.
For example, the first character string "12358948" has 8 characters in total and contains 7 mutually different characters: "1", "2", "3", "4", "5", "8" and "9".
Voice segment recognition module 720, configured to perform speech recognition on the verification voice information to obtain the voice segments, contained in the verification voice information, that respectively correspond to the multiple characters in the first character string.
As shown in Fig. 3, the voice segment recognition module 720 may divide the verification voice information into voice segments corresponding to the multiple characters by means of speech recognition and sound-intensity filtering, and may optionally discard invalid voice segments so that they do not take part in subsequent processing.
In an optional embodiment, the voice segment recognition module may further include, as shown in Fig. 8:
Valid segment recognition unit 721, configured to identify the valid voice segments and the invalid voice segments in the verification voice information.
Specifically, the valid segment recognition unit 721 may divide the verification voice according to sound intensity, treating voice segments with low sound intensity as invalid voice segments (for example, silent sections and impulse noise).
Speech recognition unit 722, configured to perform speech recognition on the valid voice segments to obtain the voice segments respectively corresponding to the multiple characters in the first character string.
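The intensity-based division performed by the valid segment recognition unit 721 can be sketched as a simple frame-energy threshold. The function name `split_by_intensity`, the fixed frame length and the threshold value are illustrative assumptions; the embodiment does not specify how sound intensity is measured, and this sketch does not attempt to detect impulse noise separately.

```python
def split_by_intensity(samples, frame_len, threshold):
    """Return (start, end) sample ranges whose mean frame energy exceeds threshold."""
    segments, start = [], None
    n_frames = len(samples) // frame_len
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > threshold:
            if start is None:
                start = f * frame_len          # a valid segment begins
        elif start is not None:
            segments.append((start, f * frame_len))  # the segment ends
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments


# Toy signal (hypothetical): silence, loud speech, silence, quieter speech.
samples = [0.0] * 4 + [1.0] * 8 + [0.0] * 4 + [0.5] * 4
print(split_by_intensity(samples, frame_len=4, threshold=0.1))
# [(4, 12), (16, 20)]
```

The sample ranges returned here would then be passed to speech recognition, while everything outside them is treated as invalid and skipped.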
Voiceprint feature extraction module 730, configured to extract the voiceprint feature of the voice segment corresponding to each character in the verification voice information.
Specifically, the voiceprint feature extraction module 730 may extract the MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Predictive coefficient) features of the voice segment corresponding to each character as the voiceprint feature of that voice segment.
Feature model training module 740, configured to obtain, by training, the feature vector corresponding to each character in the verification voice information according to the voiceprint feature of the voice segment corresponding to that character, in combination with the preset universal background model corresponding to that character.
The feature model training module 740 may take the voiceprint features of the voice segment corresponding to each character in the verification voice information as training sample data and adjust the parameters of the preset universal background model corresponding to that character using the maximum a posteriori (MAP) algorithm. That is, after the voiceprint features of the voice segment corresponding to each character in the verification voice information are substituted into formula (1) as input samples, the parameters of the preset universal background model corresponding to that character are adjusted iteratively so that the posterior probability P(x) is maximized, and the feature model training module 740 determines the feature vector corresponding to that character in the verification voice information from the parameters that maximize P(x).
Since numerous experiments and published papers have demonstrated that the means of the individual Gaussian components of a UBM can be used to distinguish speaker identity, the mean supervector of the UBM is defined as:
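In the conventional GMM-UBM formulation, which is assumed here, the mean supervector stacks the mean vectors of all Gaussian components into one long vector. The symbols C (number of Gaussian components), F (dimension of the voiceprint feature) and μ_c (mean vector of the c-th component) are illustrative notation, and m is used for the supervector to match its use in M = m + Tω:

```latex
m = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_C \end{bmatrix} \in \mathbb{R}^{CF},
\qquad \mu_c \in \mathbb{R}^{F},\quad c = 1, \dots, C
```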
Accordingly, the feature model training module 740 may take the voiceprint features of the voice segment corresponding to each character in the verification voice information as training sample data and adjust the mean supervector of the preset universal background model corresponding to that character using the maximum a posteriori (MAP) algorithm. That is, after the voiceprint features of the voice segment corresponding to each character in the verification voice information are substituted into formula (1) as input samples, the mean supervector is adjusted iteratively so that the posterior probability P(x) is maximized, and the feature model training module 740 may take the mean supervector that maximizes P(x) as the feature vector corresponding to that character in the verification voice information.
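A common way to carry out such a mean adjustment is relevance-factor MAP adaptation, sketched below for a one-dimensional GMM. This is an assumption about the procedure, since formula (1) and the exact update rule are not reproduced in this passage; the names `gaussian_pdf` and `map_adapt_means` and the relevance factor of 16 are hypothetical choices.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """One pass of mean-only MAP adaptation of a 1-D GMM (the UBM)."""
    n = [0.0] * len(means)      # soft frame counts per component
    first = [0.0] * len(means)  # posterior-weighted sums of the frames
    for x in frames:
        likes = [w * gaussian_pdf(x, mu, var)
                 for w, mu, var in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:
            continue  # frame is too far from every component to matter
        for c, like in enumerate(likes):
            post = like / total
            n[c] += post
            first[c] += post * x
    adapted = []
    for c, mu in enumerate(means):
        alpha = n[c] / (n[c] + relevance)       # data-dependent mixing weight
        data_mean = first[c] / n[c] if n[c] > 0.0 else mu
        adapted.append(alpha * data_mean + (1.0 - alpha) * mu)
    return adapted


# Toy UBM (hypothetical) with two components and 16 frames near the first one.
adapted = map_adapt_means([1.0] * 16, [0.5, 0.5], [0.0, 10.0], [1.0, 1.0])
```

With one-dimensional features the returned list is itself the adapted mean supervector: components that explain many frames move toward the data, while components that explain few frames stay close to the UBM means.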
In another optional embodiment, to mitigate the slow convergence caused by the high dimensionality of the supervector, the variation of the mean supervector is restricted to a subspace by means of probabilistic principal component analysis (PPCA). The feature model training module 740 may take the voiceprint features of the voice segment corresponding to each character in the verification voice information as training sample data, adjust the mean supervector of the preset universal background model corresponding to that character using the maximum a posteriori algorithm, and combine this with the preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information. In a specific implementation, the feature model training module 740 may adjust the mean supervector of the preset universal background model corresponding to each character using the following formula, so that the posterior probability of the adjusted universal background model of that character is maximized:
M = m + Tω, where M denotes the adjusted mean supervector of the universal background model of a given character, m denotes the mean supervector of that character's universal background model before adjustment, T is a preset supervector subspace matrix, and ω is the feature vector corresponding to that character in the verification voice information. That is, after the voiceprint features of the voice segment corresponding to each character in the verification voice information are substituted into formula (1) as input samples, the mean supervector in formula (1) is adjusted by iteratively adjusting ω so that the posterior probability P(x) is maximized, and the ω that maximizes P(x) is taken as the feature vector corresponding to that character in the verification voice information. The supervector subspace matrix T is determined from the correlations between the individual dimensions of the mean supervector of the Gaussian mixture model.
Similarity judgment module 750, configured to calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information.
Specifically, the voiceprint recognition device may acquire the registration voice information of a registered user in the voiceprint registration stage and, through the voice segment recognition module 720, the voiceprint feature extraction module 730 and the feature model training module 740, obtain the feature vector corresponding to the voice segment of each character in the registration voice information. The registration voice information may be the voice information that the voiceprint recognition device acquires when the registered user reads aloud the second character string; the second character string shares at least one character with the first character string, that is, the second character string corresponding to the registration voice information is at least partly identical to the first character string. In an optional embodiment, the voiceprint recognition device may instead obtain the feature vectors corresponding to the characters in the registration voice information from an external source: after the registered user records the registration voice information through another device, that device or a server obtains, through voiceprint feature extraction and voiceprint model training, the feature vector corresponding to the voice segment of each character in the registration voice information, and the voiceprint recognition device retrieves these feature vectors from that device or server, so that in the identification stage of the verifying user the similarity judgment module 750 can compare them with the feature vectors corresponding to the characters in the verification voice information.
In a specific implementation, the similarity score is a value that measures the degree of similarity between the two feature vectors of the same character, obtained when the voiceprint recognition device compares the feature vector corresponding to each character in the verification voice information with the feature vector corresponding to the same character in the preset registration voice information. In an optional embodiment, the similarity judgment module 750 may calculate the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information, and use it as the similarity score. That is, the similarity score between the feature vector of a given character in the verification voice information and its feature vector in the registration voice information is calculated by the following formula:
$$\mathrm{score}_i = \frac{\langle \omega_i(\mathrm{tar}),\ \omega_i(\mathrm{test}) \rangle}{\lVert \omega_i(\mathrm{tar}) \rVert \, \lVert \omega_i(\mathrm{test}) \rVert}$$
Here, the subscript i denotes the i-th character shared by the verification voice information and the registration voice information, ω_i(tar) denotes the feature vector corresponding to that character in the verification voice information, and ω_i(test) denotes the feature vector corresponding to that character in the registration voice information. In an optional embodiment, if the same character appears more than once in the verification voice information, for example the characters 0, 1, 5 and 8 each appear twice in the verification voice information shown in Fig. 2, then the feature vectors obtained from the two voice segments corresponding to character 0 may each be scored against the feature vector of character 0 in the preset registration voice information, and the average of these similarity scores is used as the similarity score between the feature vector of character 0 in this verification voice information and the feature vector of character 0 in the preset registration voice information, and so on for the other characters.
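The handling of repeated characters can be sketched as grouping the per-segment scores by character and averaging each group. The data layout (a list of character and vector pairs in spoken order) and the function names are illustrative assumptions.

```python
import math
from collections import defaultdict


def cosine_score(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero-length feature vector")
    return dot / norm


def per_character_scores(verify_segments, enrolled):
    """Average the scores of every occurrence of each shared character.

    verify_segments : list of (character, feature vector) in spoken order
    enrolled        : dict mapping character to its registered feature vector
    """
    buckets = defaultdict(list)
    for char, vec in verify_segments:
        if char in enrolled:  # characters absent from registration are skipped
            buckets[char].append(cosine_score(vec, enrolled[char]))
    return {char: sum(vals) / len(vals) for char, vals in buckets.items()}


# Toy data (hypothetical): character "0" is spoken twice.
enrolled = {"0": [1.0, 0.0], "1": [0.0, 1.0]}
segments = [("0", [1.0, 0.0]), ("1", [0.0, 2.0]), ("0", [0.0, 1.0])]
scores = per_character_scores(segments, enrolled)
print(scores)  # {'0': 0.5, '1': 1.0}
```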
It should be noted that there are many other ways to measure the similarity between two feature vectors, and the above is only one embodiment provided by the present invention. On the basis of the disclosed scheme, those skilled in the art can, without creative effort, derive further ways of calculating the similarity score between the feature vectors of characters shared by the verification voice information and the registration voice information; these are not enumerated exhaustively here.
User identification module 760, configured to determine the verifying user to be the registered user corresponding to the registration voice information if the similarity score reaches the preset verification threshold.
If the verification voice information and the registration voice information contain multiple identical characters, the user identification module 760 may average the similarity scores calculated for the individual characters by the similarity judgment module 750; if the mean similarity score reaches the corresponding preset verification threshold, the verifying user is determined to be the registered user corresponding to the registration voice information. If there are multiple registered users, such as registered users A, B and C shown in Fig. 1, the user identification module 760 may compare the feature vector of a given character of the verifying user with the feature vector of the same character of each registered user; when the similarity score between the feature vector of that character for a certain registered user and the feature vector of that character in the verification voice is the highest and reaches the preset verification threshold, that registered user is taken as the identification result for the verifying user.
Further, in an optional embodiment, the voice acquisition module 710 is also configured to acquire the registration voice information produced when a registered user reads aloud the second character string, the second character string sharing at least one character with the first character string;
the voice segment recognition module 720 is also configured to perform speech recognition on the registration voice information to obtain the voice segments, contained in the registration voice information, that respectively correspond to the multiple characters in the second character string;
the voiceprint feature extraction module 730 is also configured to extract the voiceprint feature of the voice segment corresponding to each character in the registration voice information;
the feature model training module 740 is also configured to obtain, by training, the feature vector corresponding to each character in the registration voice information according to the voiceprint feature of the voice segment corresponding to that character, in combination with the preset universal background model corresponding to that character.
In an optional embodiment, the voiceprint recognition device may further include:
Character order determination module 770, configured to determine that the order of the voice segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string.
To effectively prevent voiceprint recognition from being carried out with voice information of a registered user that has been illegally copied or recorded, a different first character string may be randomly generated each time, and the module judges whether the order of the voice segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string. If it is not, voiceprint recognition may be judged to have failed; if it is, the voiceprint feature extraction module 730 or the feature model training module 740 may be notified to perform feature extraction and voiceprint training on the verification voice information.
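The order check can be sketched as a gate placed in front of feature extraction and scoring. The function names and the callback `run_scoring` are hypothetical, and the sketch assumes the speech recognizer has already produced one recognized character per valid voice segment.

```python
def order_is_consistent(recognized_chars, first_string):
    """True when the recognized character sequence equals the prompted string."""
    return "".join(recognized_chars) == first_string


def gate_on_order(recognized_chars, first_string, run_scoring):
    """Run the scoring callback only if the spoken order matches the prompt."""
    if not order_is_consistent(recognized_chars, first_string):
        return None  # voiceprint recognition is judged to have failed
    return run_scoring()


prompt = "12358948"
print(gate_on_order(list("12358948"), prompt, lambda: "scored"))  # scored
print(gate_on_order(list("12385948"), prompt, lambda: "scored"))  # None
```

Because the prompt changes on every attempt, a replayed recording of an earlier session will generally fail this check before any voiceprint comparison is made.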
In an optional embodiment, the voiceprint recognition device may further include:
Character string display module 700, configured to randomly generate the first character string and display it.
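Random generation of the first character string can be sketched with the Python standard library. The function names, the default length of 8, the digit alphabet and the retry limit are illustrative assumptions; the only constraint taken from the embodiment is that the first character string shares at least one character with the second character string.

```python
import secrets

DIGITS = "0123456789"


def generate_first_string(length=8, alphabet=DIGITS):
    """Draw `length` characters uniformly at random from `alphabet`."""
    if length < 1 or not alphabet:
        raise ValueError("length must be positive and alphabet non-empty")
    return "".join(secrets.choice(alphabet) for _ in range(length))


def shared_characters(first_string, second_string):
    """Characters that occur in both strings."""
    return set(first_string) & set(second_string)


def generate_overlapping_string(second_string, length=8, alphabet=DIGITS,
                                max_tries=1000):
    """Regenerate until the string shares a character with the registration string."""
    for _ in range(max_tries):
        candidate = generate_first_string(length, alphabet)
        if shared_characters(candidate, second_string):
            return candidate
    raise RuntimeError("no overlapping string found")


first = generate_overlapping_string("0123456789")
m, n = len(first), len(set(first))  # m characters in total, n distinct (m >= n)
```

The `secrets` module is used instead of `random` so that the prompt is hard to predict, which matters when the prompt is meant to defeat replayed recordings.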
Thus, in this embodiment, the voiceprint feature of the voice segment corresponding to each character in the verification voice information of the verifying user is obtained and, in combination with the preset UBM of the corresponding character, trained into the feature vector corresponding to each character in the verification voice information; the feature vector of each character in the verification voice information is then compared for similarity with the feature vector of the same character in the registration voice information to determine the identity of the verifying user. Because the user feature vectors being compared correspond to specific characters, this approach fully takes into account the voiceprint characteristics of the user when reading different characters aloud, and can therefore effectively improve voiceprint recognition accuracy.
In an actual test with training samples from 1,000 people and 290,000 trials (of which about 10,000 were identity-matched trials and about 280,000 were mismatched trials), a recall rate of 79.8% was achieved at an error rate of one in a thousand, and the equal error rate (EER) was 3.39%. Compared with the traditional text-independent modeling method, voiceprint recognition performance improved by more than 40%.
Those of ordinary skill in the art will understand that all or part of the processes in the above method embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present invention and of course does not limit the scope of the claims of the present invention; equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope of the present invention.