A small-scale speaker emotion recognition system
Technical field
The present invention relates to a speech emotion recognition system, and in particular to a small-scale speaker emotion recognition system.
Background art
Speech is an important means of communication between people. Sound is a carrier of information, and the information people obtain from it naturally includes emotional information. A speech signal therefore carries not only linguistic content but also the speaker's emotion. The same sentence can be spoken with different emotions, and a different emotion may change its meaning. If a computer cannot extract the emotion from an operator's speech, it cannot communicate as effectively as it might. It may even misunderstand the operator's intent and perform an erroneous operation, inconveniencing the operator.
Speech signal processing is an important research field with a long history. Emotion research on speech signals, by contrast, is an emerging field that draws on several disciplines, chiefly physiology, psychology, and signal processing. The outcome of this research, a speech emotion recognition system, has broad application prospects. Specifically, it can be applied to:
1. Distance education. An emotion recognition system can be added to a distance-education system to judge whether a learner's emotional expression is appropriate, helping learners read aloud more expressively and improve their reading ability.
2. Criminal investigation. An emotion recognition system can be built into a lie detector and used to estimate how truthful a subject's statements are. As the technology improves, the lie detector's capabilities can be steadily extended and put to practical use, so emotion recognition also has considerable practical value in criminal investigation.
3. Entertainment and games. Most games currently convey information through text. Adding speech emotion recognition and expression to a game enriches the way information is conveyed and makes the game more attractive to players. This novel form of interaction can relieve player fatigue to some extent, offers both auditory and visual enjoyment, and makes the game more playable.
Summary of the invention
The object of the present invention is to provide a speaker emotion recognition system that uses a small emotional speech corpus, takes part of its speech as training samples for building reference templates, and reports the recognition rate for each emotion.
The object of the present invention is achieved as follows. In the first step, on the basis of a survey of the domestic and international literature, a small emotional speech corpus is established. Part of the speech serves as training samples for building reference templates; the rest serves as test samples for the subsequent emotion recognition tests. In the second step, the speech in the corpus is preprocessed, mainly by pre-emphasis, windowing and framing, and endpoint detection. In the third step, emotion parameters are extracted from the preprocessed speech signal; these comprise the fundamental frequency, the formants, the mel-frequency cepstral coefficients (MFCC), and related statistics. The extraction is simulated in software to obtain the range over which each parameter is distributed for each emotion type, and the results are briefly analyzed. In the fourth step, the speech emotion recognition experiment is carried out: an emotion classifier based on the support vector machine (SVM) is trained on the emotion parameters of the training speech and then used to predict which emotion each test utterance belongs to. After the experiment, the recognition rate for each emotion is computed and the final statistics are analyzed. Finally, a simple human-machine interface is designed for the whole system; through it the user can input test speech, view the system's recognition result for that speech, and clear the result.
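The preprocessing named in the second step (pre-emphasis, windowing and framing, endpoint detection) can be sketched as follows. This is only an illustrative sketch, not the implementation of the invention: the pre-emphasis coefficient 0.97, the frame length of 256 samples, the hop of 128 samples, the Hamming window, and the energy-ratio threshold 0.1 are all assumed values that the text does not specify, and the function names are hypothetical.

```python
import math

def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_and_window(signal, frame_len=256, hop=128):
    """Split the signal into overlapping frames and apply a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames

def detect_endpoints(frames, ratio=0.1):
    """Return (first, last) indices of the frames whose short-time energy
    exceeds ratio * max energy, or None if every frame is silent."""
    energies = [sum(s * s for s in frame) for frame in frames]
    if not energies or max(energies) == 0:
        return None
    threshold = ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return active[0], active[-1]

# Usage on a synthetic signal: silence, a tone, silence (hypothetical test data).
tone = [math.sin(2 * math.pi * 0.1 * n) for n in range(2048)]
signal = [0.0] * 1024 + tone + [0.0] * 1024
frames = frame_and_window(pre_emphasize(signal))
print(detect_endpoints(frames))
```

A single energy threshold is the simplest form of endpoint detection; practical systems often combine short-time energy with the zero-crossing rate, which is a design choice this sketch leaves out.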
A small-scale Chinese emotional speech corpus was recorded for this work. The speech in the corpus falls into four emotion classes: happy, angry, sad, and surprised. The speakers were six male students. Each read four texts aloud in each of the four emotions, four times per emotion, giving 384 samples in total (6 × 4 × 4 × 4) as the experimental emotional speech corpus. Emotions are classified with an SVM, using the "one-versus-one" method to handle the multi-class problem. Recognition was performed with three sets of emotional features: the prosodic features of the speech (parameters related to pitch and formants), the MFCC-related parameters, and the combination of the two. The recognition results were then analyzed and compared. In the experiment, when all 11 parameters were used, the average recognition rate over the four emotions was 79.15%, with the recognition rate for sadness as high as 83.3%. It was also found that misrecognition occurs most readily between happy and angry.
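The corpus size and the recognition-rate statistics described above can be checked with a short sketch. The enumeration follows the recording protocol stated in the text (6 speakers, 4 texts, 4 emotions, 4 repetitions). The function recognition_rates is a hypothetical helper, and it assumes that the average recognition rate is the unweighted mean of the per-emotion rates, which the text does not state explicitly. The labels in the usage example are made-up toy data, not the experimental results.

```python
from collections import defaultdict
from itertools import product

EMOTIONS = ["happy", "angry", "sad", "surprised"]

# Enumerate every recording: speaker x text x emotion x repetition.
corpus = list(product(range(1, 7), range(1, 5), EMOTIONS, range(1, 5)))
print(len(corpus))  # 6 * 4 * 4 * 4 = 384

def recognition_rates(true_labels, predicted_labels):
    """Return (per-emotion recognition rate, unweighted average rate)."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    per_class = {label: correct[label] / total[label] for label in total}
    average = sum(per_class.values()) / len(per_class)
    return per_class, average

# Toy labels for illustration only (assumed, not taken from the experiment).
truth = ["happy", "happy", "angry", "angry", "sad", "sad"]
guess = ["happy", "angry", "angry", "happy", "sad", "sad"]
per_class, average = recognition_rates(truth, guess)
print(per_class, average)
```

With a balanced test set, as in a corpus where every emotion has the same number of samples, the unweighted mean equals the overall fraction of correctly recognized utterances.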
Brief description of the drawings
Fig. 1 is a flowchart of speech emotion recognition.
Embodiment
The present invention is described in more detail below by way of example with reference to the accompanying drawing:
Embodiment 1
With reference to Fig. 1, which is a flowchart of speech emotion recognition, the system works as follows.
1. Acquisition of the emotional speech corpus. Most current speech emotion recognition research targets other languages; comparatively little has been done for Chinese, and no Chinese emotional speech corpus dedicated to emotion recognition could be found. The preparatory work before the recognition study is therefore to record a small-scale Chinese emotional speech corpus, on which the subsequent work is based.
2. Preprocessing of the speech signal. Emotional feature parameters cannot be extracted directly from the speech signals in the corpus. Front-end processing must be carried out first, comprising pre-emphasis, windowing and framing, and endpoint detection.
3. Extraction of emotional feature parameters. After preprocessing, the emotional feature parameters are extracted from the signal. They fall into two main classes. The first is acoustic feature parameters, comprising the 12th-order MFCC parameters and the formant parameters. The second is prosodic feature parameters, comprising the fundamental frequency of the speech, the short-time energy, the average zero-crossing rate, and similar parameters. These were refined further, and the parameters finally selected as emotional features are the mean, maximum, and minimum of the fundamental frequency, the mean and maximum of the first formant, and the 10th, 11th, and 12th MFCC parameters.
4. Design of the emotion classifier. The present invention adopts a speech emotion classifier based on the support vector machine (SVM). Because a basic SVM handles only two-class problems, multi-class classification requires one SVM to be designed for every pair of classes. When an unknown sample is to be classified, its class is finally decided by voting among these classifiers. This is the so-called "one-versus-one" method.
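The pairwise voting scheme can be sketched as follows. This is a sketch under stated assumptions rather than the classifier of the invention: to stay self-contained it replaces each two-class SVM with a nearest-centroid rule, which is a stand-in only, and the two-dimensional feature vectors are hypothetical toy values rather than the pitch, formant, and MFCC parameters listed above. What the sketch does show faithfully is the structure: one binary classifier per pair of emotions (6 for 4 emotions) and a majority vote over their decisions.

```python
from collections import Counter
from itertools import combinations

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(v[i] for v in vectors) / len(vectors)
            for i in range(len(vectors[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_pairwise(samples):
    """samples maps each emotion label to its training vectors.
    Returns one binary classifier per pair of labels."""
    centroids = {label: centroid(vs) for label, vs in samples.items()}
    # Stand-in for a two-class SVM: each pair keeps the two class centroids.
    return {(a, b): (centroids[a], centroids[b])
            for a, b in combinations(sorted(samples), 2)}

def predict(classifiers, x):
    """Let every pairwise classifier vote, then return the most voted label."""
    votes = Counter()
    for (a, b), (ca, cb) in classifiers.items():
        votes[a if sq_dist(x, ca) <= sq_dist(x, cb) else b] += 1
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training vectors, one small cluster per emotion.
samples = {
    "happy": [[1.0, 1.0], [1.2, 0.9]],
    "angry": [[1.0, -1.0], [0.9, -1.1]],
    "sad": [[-1.0, -1.0], [-1.1, -0.9]],
    "surprised": [[-1.0, 1.0], [-0.9, 1.1]],
}
classifiers = train_pairwise(samples)
print(len(classifiers))  # 4 * 3 / 2 = 6 pairwise classifiers
print(predict(classifiers, [-1.05, -0.95]))
```

For k classes the scheme trains k(k-1)/2 binary classifiers. A tie in the vote is possible in general; this sketch breaks it in favor of the label that received its first vote earliest, which is an arbitrary choice.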