Method for sound beautification and emotion modification
Technical field
The invention belongs to the fields of emotion recognition, voice recognition, and acoustic processing. It processes the speech a user hears so that the voice carries the mood and the sound type the user wants to hear. It also corrects accented or unclear passages and removes noise so that the user hears the speech more clearly, which meets user demand and improves user satisfaction.
Background art
With the rapid development of artificial-intelligence speech recognition, companies such as Google and iFlytek have made great achievements in the field: recognized speech can be transcribed into text and then translated into other languages. Household appliances, mobile phones, and similar devices can now be controlled by voice. For example, an air conditioner can be switched on and off by voice, and a user can tell Siri which contact to call and the call is dialed automatically. These are all steps in the gradual development of speech recognition.
Everyone has times when things go badly, and harsh criticism from others at such a moment adds another layer of pressure. As higher organisms with emotions, we may want, in certain situations, to hear language with a particular emotional expression, or voices with a different timbre or tone. Combining human emotion with voice recognition and processing can therefore be a pleasant experience for the user.
In everyday calls, dialect, non-standard Mandarin, or background noise often makes the conversation difficult to follow. Sound beautification can improve the result in such cases.
Summary of the invention
Technical problem: the invention discloses a method of sound beautification and emotion modification. According to the user's needs, it applies acoustic processing and emotion modification to speech, changing the timbre, tone, and emotion of the original voice. It can also denoise the speech so that what is heard is clearer and easier to understand.
Technical solution: to solve the problems described in the background art above, the invention proposes a method of sound beautification and emotion modification. First, voice data is acquired and each word is identified by speech recognition, with particular attention to recognizing accented speech. Next, emphasis words are marked according to the relative intensity between related words and the intervals between words. Then the emotional keynote is established according to the intonation of each word, the loudness of the voice, and the manner of speaking of the sentence as a whole. Based on the data accumulated above, the sound is then processed: the mood of the original voice is changed through intonation, loudness, intervals, and so on, and the sound can be beautified by collecting the acoustic information of a particular person so that it sounds like the voice of a certain celebrity. Finally, white noise can be removed from the final output, or the recognizability of the sound can be enhanced. The invention not only satisfies what the user wants to hear; by adjusting the mood of the other party's speech, it also lets the user feel more comfortable and relaxed.
Architecture
(1) Acquire voice data through speech recognition. Accented voice data is recognized by fuzzy matching (if it contains domestic or foreign dialects, a dialect speech database is queried during this process to determine more accurately the semantics and word meanings of what the user said). The input sound is converted into feature quantities for further processing.
(2) Mark emphasis words according to the relative intensity between related words and the intervals between words; then establish the emotional keynote according to the intonation of each word, the loudness of the voice, and the manner of speaking of the sentence as a whole. The relative intensity between keywords helps identify the rough meaning of an unclear sentence. The intervals between characters help avoid grouping characters into the wrong words, which would change the meaning; they help decide which characters form a word and so convey that word's meaning. For every sentence, every word, and even every character, differences in intonation and loudness express different emotions. On this basis we can determine what emotion fills the words the user hears or speaks, and we can also change the voice along these factors so that it carries the mood the user wants. Specifically, the pronunciation of each recognition word held in the recognition-word storage unit is converted into a phoneme string according to the conversion rules held in the conversion-rule storage unit, which stores conversion rules between pronunciations and phonemes or between pronunciations and phoneme strings. Standard patterns are then extracted and finally concatenated. This also works well when a recognition word has many pronunciations.
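The emphasis-marking idea in step (2) can be illustrated with a minimal sketch. It assumes that each recognized word already comes with a loudness value and the length of the pause before it; the Word type, the function name, and both thresholds are hypothetical choices made for illustration, not values taken from the invention.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    intensity: float   # mean loudness of the word (arbitrary units), assumed given
    gap_before: float  # silence before the word in seconds, assumed given

def mark_emphasis(words, intensity_ratio=1.3, gap_threshold=0.25):
    """Return words that are loud relative to the sentence mean or follow a long pause."""
    if not words:
        return []
    mean_intensity = sum(w.intensity for w in words) / len(words)
    emphasized = []
    for w in words:
        relatively_loud = w.intensity >= intensity_ratio * mean_intensity
        after_pause = w.gap_before >= gap_threshold
        if relatively_loud or after_pause:
            emphasized.append(w.text)
    return emphasized

sentence = [
    Word("I", 0.4, 0.0),
    Word("really", 0.9, 0.05),
    Word("like", 0.5, 0.05),
    Word("this", 0.5, 0.30),
]
print(mark_emphasis(sentence))  # ['really', 'this']
```

Here "really" is marked because it is much louder than the sentence average, and "this" because it follows a noticeable pause. A practical system would likely combine these cues with intonation as well.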
(3) Based on the data accumulated above, process the sound: change the mood of the original voice through intonation, loudness, intervals, and so on. The sound can also be beautified by collecting the acoustic information of a particular person, so that it sounds like the voice of a certain celebrity. In more detail, collect data on particular voices, such as the pitch, frequency, timbre, and intonation of a certain host's voice. According to this data, the passage of speech the user wishes to change can be adjusted by modifying its various values, to meet the user's needs as far as possible. That is, the available voice data is saved in a database with some of its features stored as parameters; when the user requests a change, the way the sound is heard can be changed by changing these parameters. Not only a sound transformation model but also an emotion transformation model needs to be established. First obtain training data (the input and output data can be aligned in duration using the dynamic time warping algorithm), then preprocess it and extract the emotion-influencing factors of the training data (including the tone of the words, the speaking intervals, and so on). The model is then trained starting from the initialized parameters of the sound transformation model. The model can be a neural network composed of encoders, where each encoder represents the eigenspace of the voice information of one class of similar original speakers; the spectral features of the speech signal need to be transformed.
(In the transformation formula, one term denotes the output of the n-th eigenspace model of input encoding layer i, another denotes the network parameters of the n-th eigenspace model of input layer i, and δ denotes the activation function.)
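The duration alignment mentioned in step (3) can be sketched with a textbook dynamic time warping routine. This is a minimal illustration on one-dimensional feature sequences with an absolute-difference cost; the function name and the toy sequences are assumptions, and a real system would align multi-dimensional spectral frames.

```python
def dtw_path(a, b):
    """Align two 1-D feature sequences; return (total cost, list of (i, j) index pairs)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which frames were matched.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path

source_frames = [1, 2, 3]
target_frames = [1, 2, 2, 3]
total, path = dtw_path(source_frames, target_frames)
print(total, path)  # 0.0 [(0, 0), (1, 1), (1, 2), (2, 3)]
```

The path shows the second source frame being stretched over two target frames, which is how input and output utterances of different length can be paired frame by frame before training.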
(4) Remove from the final output the noise that degrades the result (white noise or other colored noise), or enhance the recognizability of the sound. Noise is eliminated by signal processing: it can be cancelled by acoustically outputting a sound whose phase is opposite to that of the noise leaking into the interior space of a moving body. This makes the beautified and emotion-modified speech clearer and more comfortable for the user to listen to.
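The opposite-phase cancellation described in step (4) can be illustrated with an idealized sketch. It assumes the noise waveform is known exactly, which is rarely true in practice; the helper names, the sample rate, and the test tones are hypothetical.

```python
import math

def sine(freq_hz, n_samples, sample_rate=8000, amplitude=1.0):
    """Generate a sine tone as a list of samples."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * k / sample_rate)
            for k in range(n_samples)]

def cancel_noise(noisy, noise_estimate):
    """Add the phase-inverted noise estimate to the noisy signal."""
    anti_noise = [-x for x in noise_estimate]
    return [s + a for s, a in zip(noisy, anti_noise)]

speech = sine(200, 160)                  # stand-in for the wanted voice
noise = sine(50, 160, amplitude=0.5)     # stand-in for leaked low-frequency noise
noisy = [s + n for s, n in zip(speech, noise)]

cleaned = cancel_noise(noisy, noise)
residual = max(abs(c - s) for c, s in zip(cleaned, speech))
print(residual < 1e-9)  # True
```

With a perfect noise estimate the residual is only floating-point rounding error. In a real setting the estimate is imperfect and delayed, so the cancellation is partial.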
Beneficial effect
(1) Helps users regulate their own emotions and creates a comfortable listening environment;
(2) Creates a new form of entertainment for users, letting them change their own or other people's voices, and the emotion contained in those voices, as they wish;
(3) Can, to some extent, improve communication between two people in a call.
Brief description of the drawings
Fig. 1 is a flow chart of the implementation of the method of sound beautification and emotion modification.
Specific embodiment
(1) Acquire voice data through speech recognition. Accented voice data is recognized by fuzzy matching (if it contains domestic or foreign dialects, a dialect speech database is queried during this process to determine more accurately the semantics and word meanings of what the user said). The input sound is converted into feature quantities for further processing.
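As a rough illustration of converting input sound into feature quantities, the sketch below computes two simple per-frame features, RMS energy and zero-crossing count. These are stand-ins chosen for brevity; the text does not say which features are used, and the function name and frame size are assumptions.

```python
import math

def frame_features(samples, frame_size=4):
    """Split samples into non-overlapping frames; return (rms_energy, zero_crossings) per frame."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        # Count sign changes between neighbouring samples.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        features.append((rms, crossings))
    return features

print(frame_features([1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5]))
# [(1.0, 3), (0.5, 0)]
```

The first frame is loud and oscillates rapidly, the second is quieter and steady, so even these two numbers separate them. Speech recognizers typically use richer spectral features.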
(2) Mark emphasis words according to the relative intensity between related words and the intervals between words; then establish the emotional keynote according to the intonation of each word, the loudness of the voice, and the manner of speaking of the sentence as a whole. The relative intensity between keywords helps identify the rough meaning of an unclear sentence. The intervals between characters help avoid grouping characters into the wrong words, which would change the meaning; they help decide which characters form a word and so convey that word's meaning. For every sentence, every word, and even every character, differences in intonation and loudness express different emotions. On this basis we can determine what emotion fills the words the user hears or speaks, and we can also change the voice along these factors so that it carries the mood the user wants. Specifically, the pronunciation of each recognition word held in the recognition-word storage unit is converted into a phoneme string according to the conversion rules held in the conversion-rule storage unit, which stores conversion rules between pronunciations and phonemes or between pronunciations and phoneme strings. Standard patterns are then extracted and finally concatenated. This also works well when a recognition word has many pronunciations.
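The rule-based conversion of a pronunciation into a phoneme string can be sketched as a table lookup with greedy longest-match. The rule table, the phoneme labels, and the function name below are all hypothetical; they only illustrate the idea of a conversion-rule storage unit.

```python
# Hypothetical conversion rules: pronunciation unit -> phoneme sequence.
CONVERSION_RULES = {
    "zh": ["ZH"],
    "ong": ["UH", "NG"],
    "g": ["G"],
    "uo": ["W", "AO"],
}

def to_phoneme_string(pronunciation, rules=CONVERSION_RULES):
    """Convert a pronunciation into phonemes by greedy longest-match against the rules."""
    phonemes = []
    pos = 0
    max_len = max(len(unit) for unit in rules)
    while pos < len(pronunciation):
        # Try the longest candidate unit first, then shorter ones.
        for length in range(min(max_len, len(pronunciation) - pos), 0, -1):
            unit = pronunciation[pos:pos + length]
            if unit in rules:
                phonemes.extend(rules[unit])
                pos += length
                break
        else:
            raise ValueError(f"no conversion rule for {pronunciation[pos:]!r}")
    return phonemes

print(to_phoneme_string("zhongguo"))  # ['ZH', 'UH', 'NG', 'G', 'W', 'AO']
```

A word with several pronunciations would simply be converted once per pronunciation, giving one phoneme string for each.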
(3) Based on the data accumulated above, process the sound: change the mood of the original voice through intonation, loudness, intervals, and so on. The sound can also be beautified by collecting the acoustic information of a particular person, so that it sounds like the voice of a certain celebrity. In more detail, collect data on particular voices, such as the pitch, frequency, timbre, and intonation of a certain host's voice. According to this data, the passage of speech the user wishes to change can be adjusted by modifying its various values, to meet the user's needs as far as possible. That is, the available voice data is saved in a database with some of its features stored as parameters; when the user requests a change, the way the sound is heard can be changed by changing these parameters. Not only a sound transformation model but also an emotion transformation model needs to be established. First obtain training data (the input and output data can be aligned in duration using the dynamic time warping algorithm), then preprocess it and extract the emotion-influencing factors of the training data (including the tone of the words, the speaking intervals, and so on). The model is then trained starting from the initialized parameters of the sound transformation model. The model can be a neural network composed of encoders, where each encoder represents the eigenspace of the voice information of one class of similar original speakers; the spectral features of the speech signal need to be transformed.
(In the transformation formula, one term denotes the output of the n-th eigenspace model of input encoding layer i, another denotes the network parameters of the n-th eigenspace model of input layer i, and δ denotes the activation function.)
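The idea of storing voice features as parameters and changing them on request can be sketched as follows. The VoiceProfile fields, the blend_toward function, and the sample values are hypothetical; the sketch only moves stored parameters toward a target voice and does not perform any actual audio synthesis.

```python
from dataclasses import dataclass, asdict

@dataclass
class VoiceProfile:
    pitch_hz: float
    intensity_db: float
    speaking_rate: float  # syllables per second

def blend_toward(source, target, amount):
    """Move each parameter of `source` a fraction `amount` (0..1) of the way to `target`."""
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0 and 1")
    src, tgt = asdict(source), asdict(target)
    return VoiceProfile(**{k: src[k] + amount * (tgt[k] - src[k]) for k in src})

user = VoiceProfile(pitch_hz=120.0, intensity_db=60.0, speaking_rate=5.0)
host = VoiceProfile(pitch_hz=180.0, intensity_db=66.0, speaking_rate=4.0)
print(blend_toward(user, host, 0.5))
# VoiceProfile(pitch_hz=150.0, intensity_db=63.0, speaking_rate=4.5)
```

An amount of 0 leaves the user's voice parameters unchanged and 1 adopts the stored target profile entirely, which gives the user a single control over how strongly the voice is altered.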
(4) Remove from the final output the noise that degrades the result (white noise or other colored noise), or enhance the recognizability of the sound. Noise is eliminated by signal processing: it can be cancelled by acoustically outputting a sound whose phase is opposite to that of the noise leaking into the interior space of a moving body. This makes the beautified and emotion-modified speech clearer and more comfortable for the user to listen to.