Embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the drawings and embodiments. It should be appreciated that the specific embodiments described herein are intended only to explain the present invention, not to limit it.
The technical solutions of the present invention are described below by way of specific embodiments.
Embodiment One:
This embodiment provides an emotion recognition method. Fig. 1 shows the implementation flow of the method; for convenience of description, only the parts relevant to the embodiment of the present invention are shown.
The emotion recognition method comprises:
Step S11: performing image-based emotion recognition on an acquired facial image to obtain a first emotion value;
Step S12: performing speech emotion recognition on an acquired voice signal to obtain a second emotion value;
Step S13: finding an emotion word in a speech text to obtain a third emotion value corresponding to the emotion word, the speech text being generated by processing the voice signal with speech recognition technology;
Step S14: determining a user emotion value as the sum of the first emotion value multiplied by a first weight, the second emotion value multiplied by a second weight, and the third emotion value multiplied by a third weight; that is, user emotion value = first emotion value × first weight + second emotion value × second weight + third emotion value × third weight.
It should be noted that humans are sentient beings who often display rich emotions, and that the language humans use to communicate accordingly contains many emotion words. Emotion words fall into three classes: commendatory words, derogatory words and neutral words. To enable a mobile terminal to judge a user's emotion in real time, an emotion value is defined. In this embodiment, the emotion value of a commendatory word is a positive number, the emotion value of a derogatory word is a negative number, and the emotion value of a neutral word is 0; as the emotional color of a commendatory word grows stronger, its emotion value becomes larger, and as the emotional color of a derogatory word grows stronger, its emotion value becomes smaller. For example: a neutral word has an emotion value of 0, 'worried' has an emotion value of -0.5, 'angry' has an emotion value of -1, 'joyful' has an emotion value of 0.5, and 'happy' has an emotion value of 1.
It should be noted that the first emotion value, the second emotion value, the third emotion value and the user emotion value are all emotion values in this sense; 'first', 'second' and 'third' are merely designations used to distinguish them.
In addition, it should also be noted that emotion words can be classified by word formation into emotion words with a negative prefix and emotion words without one. An emotion word generated by adding a negative prefix to an emotion word without a negative prefix has an emotion value of opposite sign to that of the original word; for example: 'happy' has an emotion value of 1, and 'unhappy' has an emotion value of -1.
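By way of illustration, the following is a minimal sketch of such an emotion-word lexicon with negative-prefix handling; the word list, emotion values and prefix list are illustrative assumptions, not part of the claimed method:

```python
# Sketch of an emotion-word lexicon; all entries are illustrative assumptions.
EMOTION_LEXICON = {
    "neutral": 0.0,
    "worried": -0.5,
    "angry": -1.0,
    "joyful": 0.5,
    "happy": 1.0,
}
NEGATIVE_PREFIXES = ("un", "dis")  # assumed prefix list

def emotion_value(word):
    """Return the emotion value of a word; a known emotion word carrying a
    negative prefix gets the opposite sign of its base word."""
    if word in EMOTION_LEXICON:
        return EMOTION_LEXICON[word]
    for prefix in NEGATIVE_PREFIXES:
        if word.startswith(prefix) and word[len(prefix):] in EMOTION_LEXICON:
            return -EMOTION_LEXICON[word[len(prefix):]]  # sign is flipped
    return None  # not an emotion word

print(emotion_value("happy"))    # 1.0
print(emotion_value("unhappy"))  # -1.0
```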
It should be noted that the first weight, the second weight and the third weight sum to 1. To improve the accuracy of emotion recognition, and because image-based emotion recognition, speech emotion recognition and emotion word recognition have different accuracies, the weights assigned to the first emotion value, the second emotion value and the third emotion value (namely the first weight, the second weight and the third weight) are adjusted accordingly.
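As a minimal sketch of the weighted combination of step S14 (the default weights below are assumed example values, chosen only so that they sum to 1):

```python
def user_emotion_value(e1, e2, e3, w1=0.4, w2=0.3, w3=0.3):
    """Weighted fusion of the three emotion values; the three weights
    must sum to 1. The default weights are illustrative assumptions."""
    assert abs(w1 + w2 + w3 - 1.0) < 1e-9, "weights must sum to 1"
    return e1 * w1 + e2 * w2 + e3 * w3

# Example: happy face (1), angry voice (-1), angry speech text (-1)
print(user_emotion_value(1.0, -1.0, -1.0))  # about -0.2
```

In practice the weights would be tuned to reflect the relative accuracy of the three recognition channels, as described above.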
In this way, a facial image is acquired in real time, the user's expression is recognized from the acquired facial image using image-based emotion recognition technology, and the expression is recorded as an emotion word, so that the emotion value corresponding to that emotion word can be obtained. For example: if the acquired image shows the user with a broad smile, the image-based emotion recognition technology identifies the user as happy, and the recorded emotion value is 1.
At the same time, the user's voice signal is collected, the user's emotion is recognized from the acquired voice signal using speech emotion recognition technology, and the emotion is recorded as an emotion word, so that the emotion value corresponding to that emotion word can be obtained. For example: if the user shouts at someone, a loud, rapid voice signal is acquired; after processing with speech emotion recognition technology, the emotion is judged to be anger, and the recorded emotion value is -1.
Meanwhile, the collected voice signal is processed with speech recognition technology to recognize the speech content contained in it and record the content as a speech text. The speech text is then searched for emotion words; if an emotion word is found, the emotion value corresponding to it is recorded. For example: when the user is angry and says 'I am very angry', the corresponding voice signal is recognized with speech recognition technology and the result 'I am very angry' is recorded in the speech text; since 'I am very angry' contains the emotion word 'angry', the recorded emotion value is -1.
After the first weight, the second weight and the third weight are determined for the first emotion value, the second emotion value and the third emotion value respectively, the user emotion value is calculated. The user emotion value expresses the user's mood at that moment. The emotion recognition method of this embodiment can thus judge the user's mood in real time from the user's current expression and voice, so that the mobile terminal can adjust the interface presented to the user or perform a preset operation.
As an embodiment of the present invention, since the mobile terminal carries a camera, before the step of performing image-based emotion recognition on the acquired facial image to obtain the first emotion value, the emotion recognition method further comprises:
acquiring the facial image through the camera of the mobile terminal.
In this way, the camera carried by the mobile terminal can photograph or video the user in real time to acquire the user's facial image.
As an embodiment of the present invention, since the mobile terminal carries a microphone, before the step of performing speech emotion recognition on the acquired voice signal to obtain the second emotion value, the emotion recognition method further comprises:
acquiring the voice signal through the microphone of the mobile terminal.
In this way, the microphone of the mobile terminal can pick up the user's speech in real time and output a voice signal.
Preferably, the step of performing image-based emotion recognition on the acquired facial image to obtain the first emotion value is specifically:
if an expression feature matching an expression feature extracted from the facial image is found in an expression feature library, outputting the first emotion value corresponding to that expression feature.
It should be noted that the expression feature library needs to be established in advance according to the matching requirements. For each user to be matched, the expression feature library stores the expression features of that user's various expressions, including information such as changes at the corners of the eyes, changes at the corners of the mouth, and the degree to which the teeth are shown.
It should be noted that the expression feature library can be established locally or on a server, for example on a cloud server.
When the expression feature library is established locally, the emotion value corresponding to the matched expression feature can be obtained directly from the matching result and used as the first emotion value.
When the expression feature library is established on a server, each time a facial image is acquired and its expression features are extracted, those expression features are sent to the server. The server traverses the expression feature library to match them; the matching process searches for an expression feature satisfying a preset matching threshold (the matching threshold being determined by the accuracy of the established expression feature library and the matching requirements), and if a matching expression feature is found, feeds back the emotion value corresponding to it. After the emotion recognition device receives the emotion value, it uses it as the first emotion value.
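The following sketch illustrates this threshold matching, assuming expression features are numeric vectors and taking Euclidean distance as a stand-in similarity measure; both the feature representation and the library entries are assumptions, since the embodiment does not fix them:

```python
import math

# Each library entry: (feature_vector, emotion_value); entries are assumed examples.
EXPRESSION_LIBRARY = [
    ([0.9, 0.8, 0.7], 1.0),   # e.g. broad smile -> happy
    ([0.1, 0.2, 0.1], -1.0),  # e.g. scowl -> angry
]
MATCH_THRESHOLD = 0.5  # preset threshold; depends on library accuracy and matching needs

def match_expression(feature):
    """Traverse the library and return the emotion value of the first entry
    within the preset matching threshold, or None if nothing matches."""
    for lib_feature, value in EXPRESSION_LIBRARY:
        if math.dist(feature, lib_feature) <= MATCH_THRESHOLD:
            return value
    return None

first_emotion_value = match_expression([0.85, 0.75, 0.7])
print(first_emotion_value)  # 1.0
```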
Preferably, the step of performing speech emotion recognition on the acquired voice signal to obtain the second emotion value is specifically:
if a voice feature identical to a voice feature contained in a voice feature library is extracted from the voice signal, outputting the second emotion value corresponding to that voice feature.
It should be noted that the voice feature library is a model library established for voice feature matching, and therefore needs to be established in advance to meet the matching requirements. For each user to be matched, the voice feature library stores that user's various voice features, including information such as the intensity and duration of the whole utterance and the intensity and duration of each character recognized by speech recognition technology. For example: for each user to be matched, voice features such as speaking rate and loudness are collected under different emotions (angry, worried, neutral, joyful, happy and so on).
It should be noted that the voice feature library can be established locally or on a server, for example on a cloud server.
When the voice feature library is established locally, the emotion value corresponding to the matched voice feature can be obtained directly from the matching result and used as the second emotion value.
When the voice feature library is established on a server, each acquired voice signal is stored and sent to the server in the form of an audio file. The server traverses the voice feature library to match the voice features in the audio file; the matching process searches for a voice feature satisfying a preset voice matching threshold (the voice matching threshold being determined by the accuracy of the established voice feature library and the matching requirements), and if a matching voice feature is found, feeds back the emotion value corresponding to it. After the emotion recognition device receives the emotion value, it uses it as the second emotion value.
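For illustration, the sketch below extracts the kind of utterance-level voice features mentioned above (overall duration and intensity) from a raw signal; the sample rate and the use of root-mean-square amplitude as intensity are assumptions:

```python
import math

SAMPLE_RATE = 16000  # assumed sampling rate in Hz

def voice_features(samples):
    """Compute two simple utterance-level features: duration in seconds
    and root-mean-square intensity of the whole signal."""
    duration = len(samples) / SAMPLE_RATE
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return {"duration_s": duration, "intensity_rms": rms}

# Example: one second of a loud alternating signal
print(voice_features([0.8, -0.8] * (SAMPLE_RATE // 2)))
# {'duration_s': 1.0, 'intensity_rms': 0.8...}
```

Character-level intensities and durations, and features such as speaking rate, would be computed analogously over the time spans of the individual recognized characters.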
Preferably, the method for calculating the third emotion value comprises:
determining one or more seed emotion words, and establishing an emotion tree with each seed emotion word as a root node, the root node and child nodes of the emotion tree being emotion words with the same class of emotion;
determining the third emotion value according to the position of the emotion word in the emotion tree that contains it.
It should be noted that, to facilitate judging the emotion value of an emotion word, emotion trees are established: one each for commendatory words, derogatory words and neutral words.
In the emotion tree containing all commendatory words, one commendatory word is selected as the root node. All emotion words whose emotion is stronger than the root node are grouped into one region, and one emotion word is chosen from this region as a child node of the root node; then, within this region and according to the emotion of that child node, the emotion words stronger than the child node are grouped into one subregion, from which a new emotion word is chosen as a child node of that child node, while the emotion words weaker than the child node are grouped into another subregion, from which a new emotion word is chosen as its other child node. At the same time, all emotion words whose emotion is weaker than the root node are grouped into another region, and one emotion word is chosen from this region as the other child node of the root node; the subregions under this child node are divided and populated in the same way. Proceeding by analogy, the complete emotion tree is established. For example: 'happy' is selected as the root node, with a corresponding emotion value of 1; 'delighted' is selected as one child node of the root node, with an emotion value of 1.5; 'joyful' is selected as another child node of the root node, with an emotion value of 0.5; and so on.
Similarly, emotion trees containing all derogatory words and all neutral words are established in the same way as the emotion tree containing all commendatory words.
Once an emotion tree is established, the emotion value of each node in it is determined accordingly. For example, for the emotion tree of commendatory words: the root node is assigned emotion value 1. The node whose emotional color is weaker than the root becomes one child of the root, with emotion value 0.5, and the node whose emotional color is stronger than the root becomes the other child, with emotion value 1.5. Then, under the child with value 0.5, the weaker node receives emotion value 0.25 and the stronger node 0.75; under the child with value 1.5, the weaker node receives 1.25 and the stronger node 1.75. Proceeding by analogy, the emotion values of the whole emotion tree are determined.
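The example values above follow a binary-subdivision pattern: each node's value is the midpoint of the intensity interval covered by its subtree. A minimal sketch under that assumption (the (0, 2) interval for commendatory words is inferred from the example values, not stated explicitly):

```python
def node_value(path, low=0.0, high=2.0):
    """Emotion value of a node reached from the root by a path of
    'weak'/'strong' choices; each value is the midpoint of the interval
    that the node's subtree covers. The (0, 2) range is an assumption
    drawn from the example values in the text."""
    for step in path:
        mid = (low + high) / 2
        if step == "weak":
            high = mid  # descend into the weaker half
        else:
            low = mid   # descend into the stronger half
    return (low + high) / 2

print(node_value([]))                  # root: 1.0
print(node_value(["weak"]))            # 0.5
print(node_value(["strong"]))          # 1.5
print(node_value(["weak", "strong"]))  # 0.75
```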
After the emotion tree is established, if an emotion word identical to an emotion word in the speech text is found in the emotion tree, the emotion value corresponding to that emotion word is obtained directly.
As an embodiment of the present invention, in order to reduce the number of nodes in the emotion tree and the time spent searching for matches in it, the method for calculating the third emotion value further comprises:
finding, in the emotion tree, the emotion word whose word sense most closely matches that of the emotion word in the speech text;
and the step of determining the third emotion value according to the position of the emotion word in the emotion tree that contains it is specifically:
determining the third emotion value according to the position, in the emotion tree, of the emotion word whose word sense matches most closely.
Specifically, after an emotion word is found in the speech text, the node in the emotion tree holding the emotion word whose literal sense, as recorded in a dictionary, is closest to that of the found word is located, and the third emotion value of the emotion word in the speech text is determined from the specific position of that node in the emotion tree. For example: a neutral word has an emotion value of 0, 'worried' -0.5, 'angry' -1, 'joyful' 0.5 and 'happy' 1; if the literal sense of the emotion word 'indignant' contained in the speech text is closest to 'angry', the emotion value of 'indignant' is -1.
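A minimal sketch of this closest-sense lookup follows; the word_similarity callable is a hypothetical stand-in for the dictionary-based comparison of literal senses described above:

```python
def closest_tree_value(word, tree_nodes, word_similarity):
    """Return the emotion value of the tree node whose word sense is
    closest to `word`. `tree_nodes` maps emotion words to emotion values;
    `word_similarity` is an assumed external function returning a score
    in [0, 1]."""
    best = max(tree_nodes, key=lambda node: word_similarity(word, node))
    return tree_nodes[best]

# Toy similarity: exact matches score 1; 'indignant' is close to 'angry'.
def toy_similarity(a, b):
    return 1.0 if a == b else {("indignant", "angry"): 0.9}.get((a, b), 0.0)

tree = {"worried": -0.5, "angry": -1.0}
print(closest_tree_value("indignant", tree, toy_similarity))  # -1.0
```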
Embodiment Two:
Fig. 2 shows the composition of the emotion recognition device provided by Embodiment Two of the present invention; for convenience of description, only the parts relevant to the embodiment of the present invention are shown.
It should be noted that the emotion recognition device provided by this embodiment and the emotion recognition method provided by Embodiment One are mutually applicable.
An emotion recognition device, comprising:
a first emotion value unit 21, configured to perform image-based emotion recognition on an acquired facial image to obtain a first emotion value;
a second emotion value unit 22, configured to perform speech emotion recognition on an acquired voice signal to obtain a second emotion value;
a third emotion value unit 23, configured to find an emotion word in a speech text to obtain a third emotion value corresponding to the emotion word, the speech text being generated by processing the voice signal with speech recognition technology; and
a user emotion value unit 24, configured to determine a user emotion value as the sum of the first emotion value multiplied by a first weight, the second emotion value multiplied by a second weight, and the third emotion value multiplied by a third weight.
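Purely for illustration, the sketch below shows how the four units of Fig. 2 could be composed; the callables standing in for units 21 to 23 and the default weights are assumptions, with the unit internals stubbed out:

```python
class EmotionRecognitionDevice:
    """Sketch of the device of Fig. 2: three recognition units whose
    outputs are fused by the user emotion value unit."""

    def __init__(self, image_unit, speech_unit, text_unit,
                 w1=0.4, w2=0.3, w3=0.3):
        self.image_unit = image_unit    # first emotion value unit 21
        self.speech_unit = speech_unit  # second emotion value unit 22
        self.text_unit = text_unit      # third emotion value unit 23
        self.weights = (w1, w2, w3)

    def user_emotion_value(self, face_image, voice_signal, speech_text):
        """User emotion value unit 24: weighted sum of the three values."""
        e1 = self.image_unit(face_image)
        e2 = self.speech_unit(voice_signal)
        e3 = self.text_unit(speech_text)
        w1, w2, w3 = self.weights
        return e1 * w1 + e2 * w2 + e3 * w3

# Example with stubbed units: happy face, angry voice, angry speech text
device = EmotionRecognitionDevice(lambda img: 1.0,
                                  lambda sig: -1.0,
                                  lambda txt: -1.0)
print(device.user_emotion_value(None, None, None))  # about -0.2
```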
Preferably, the first emotion value unit 21 is specifically configured to:
if an expression feature matching an expression feature extracted from the facial image is found in the expression feature library, output the first emotion value corresponding to that expression feature.
Preferably, the second emotion value unit 22 is specifically configured to:
if a voice feature identical to a voice feature contained in the voice feature library is extracted from the voice signal, output the second emotion value corresponding to that voice feature.
Preferably, the third emotion value unit 23 further comprises a third emotion value calculating unit 231, which specifically comprises:
an emotion tree establishing unit 2311, configured to determine one or more seed emotion words and to establish an emotion tree with each seed emotion word as a root node, the root node and child nodes of the emotion tree being emotion words with the same class of emotion; and
a determining unit 2313, configured to determine the third emotion value according to the position of the emotion word in the emotion tree that contains it.
Preferably, the third emotion value calculating unit further comprises:
a matching unit 2312, configured to find, in the emotion tree, the emotion word whose word sense most closely matches that of the emotion word in the speech text.
The determining unit 2313 is then specifically configured to:
determine the third emotion value according to the position, in the emotion tree, of the emotion word whose word sense matches most closely.
It should be noted that each functional unit of the emotion recognition device provided by this embodiment may be a software unit, a hardware unit, or a unit combining software and hardware, and may also be integrated into a mobile terminal as an independent component or run in an application system of the mobile terminal.
As an embodiment of the present invention, a mobile terminal is also provided, comprising the emotion recognition device provided by Embodiment Two. The mobile terminal includes terminals with human-computer interaction functions, such as smart TVs, smartphones, tablets such as the iPad, and intelligent robots.
In the embodiments of the present invention, the facial image is acquired through the camera of the mobile terminal, and if an expression feature matching an expression feature extracted from the facial image is found in the expression feature library, the first emotion value corresponding to that expression feature is output. At the same time, if a voice feature identical to a voice feature contained in the voice feature library is extracted from the voice signal, the second emotion value corresponding to that voice feature is output. Meanwhile, an emotion word is found in the speech text to obtain the third emotion value corresponding to it: the emotion word in the emotion tree whose word sense most closely matches that of the emotion word in the speech text is found, and the third emotion value is determined from the position of that most closely matching emotion word in the emotion tree. The user emotion value is then determined as the sum of the first emotion value multiplied by the first weight, the second emotion value multiplied by the second weight, and the third emotion value multiplied by the third weight. The mobile terminal can thus judge the user's current mood from the user emotion value and perform a preset operation.
Those skilled in the art will appreciate that the units of Embodiment Two above are divided only according to functional logic and are not limited to this division, provided the corresponding functions can be realized; in addition, the specific names of the functional units are used only for ease of mutual distinction and do not limit the protection scope of the present invention.
Those of ordinary skill in the art will also understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware, the program being storable in a computer-readable storage medium, the storage medium including ROM/RAM, magnetic disks, optical discs and the like.
The above is a further detailed description of the present invention in conjunction with specific preferred embodiments, but it cannot be concluded that the specific implementation of the present invention is limited to these descriptions. For persons of ordinary skill in the technical field of the present invention, equivalent substitutions or obvious variations made without departing from the concept of the present invention, and having identical performance or use, shall all be deemed to fall within the scope of patent protection of the present invention as determined by the appended claims.