CN110287323B - Target-oriented emotion classification method - Google Patents

Target-oriented emotion classification method

Info

Publication number
CN110287323B
Authority
CN
China
Prior art keywords
target sequence
text
word
words
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910568300.2A
Other languages
Chinese (zh)
Other versions
CN110287323A (en)
Inventor
顾凌云
王洪阳
严涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Bingjian Information Technology Co ltd
Original Assignee
Chengdu Bingjian Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Bingjian Information Technology Co ltd
Priority to CN201910568300.2A
Publication of CN110287323A
Application granted
Publication of CN110287323B
Legal status: Active

Abstract

The invention discloses a target-oriented emotion classification method in the technical field of big data. The method establishes a client server, which collects text information and sends it to a central server, and builds a preprocessing module, a GloVe model module, a position information coding module, an attention encoder and a classifier module in the central server. It solves the technical problem of extracting a target sequence to be analyzed and completing target-oriented emotion analysis for that sequence, and is suitable for scenarios requiring fine-grained emotion analysis.

Description

Target-oriented emotion classification method
Technical Field
The invention belongs to the technical field of big data, and particularly relates to a target-oriented emotion classification method.
Background
Emotion classification, also known as opinion mining, is a field of natural language processing used to extract opinions about things mentioned in a text passage and identify their emotional tendencies. Emotion classification technology is now applied in more and more fields. As public information on the internet keeps growing, large numbers of texts with subjective emotional coloring appear on social networks, news resources, and business and government system platforms. With emotion analysis technology, opinions can be extracted from unstructured text information, analyzed and processed, and the resulting structured data finally displayed visually to users.
At present there are two main categories of techniques for emotion classification: machine learning based methods and deep learning based methods. Machine learning based methods are technically mature; Bayes classifiers, support vector machines and logistic regression are common. When a traditional machine learning method is used for emotion classification, the model learns the subjective emotion in the text from the input text features, and a classifier is trained to minimize the final classification error; the classifier can then be used to analyze the emotion in other texts. In this process feature engineering is a critical link: the quality of the features directly determines how well the classifier finally performs. The greatest drawback of these methods is that the features fed to the classifier must be designed and extracted manually, for example with the widely used dictionary-rule-based text keyword extraction techniques, where keywords are converted into vector representations with TF-IDF and similar techniques and used as the text features. Because different keyword extraction rules must be formulated for different fields, the workload of this approach is huge and the quality of the extracted features cannot be guaranteed.
Deep learning based methods use neural networks to extract features from text and train a classifier on the extracted features to complete the emotion classification task. The neural networks commonly used for processing text mainly include recurrent neural networks, convolutional neural networks, and the like.
Whether based on machine learning or deep learning, current methods are directed at the emotion of a whole text or sentence and belong to a single-granularity mode. If a sentence contains several targets to be analyzed and different targets carry different emotional colors, the result obtained by such coarse-grained emotion classification is no longer accurate, and fine-grained emotion classification must be performed for the different target objects.
Disclosure of Invention
The invention aims to provide a target-oriented emotion classification method that solves the technical problem of extracting the target sequence to be analyzed and completing target-oriented emotion analysis for that sequence.
In order to achieve the purpose, the invention adopts the following technical scheme:
a target-oriented emotion classification method comprises the following steps:
step 1: establishing a client server and a central server, wherein the client server is used for collecting text information and sending the text information to the central server;
establishing a preprocessing module, a GloVe model module, a position information coding module, an attention coder and a classifier module in a central server;
step 2: after the central server acquires the text information, the central server preprocesses the text data with subjective emotional colors in the text information through a preprocessing module, and respectively shows text sentences and target sequences in the text data, and the method specifically comprises the following steps:
step A1: establishing a Chinese stop word dictionary, deleting stop words contained in the text data according to the Chinese stop word dictionary, and deleting incomplete text data contained in the text data according to the Chinese stop word dictionary to obtain original sentence data;
step A2: taking the sentences with emotional colors in the original sentence data as targets to be detected, establishing target sequences for the targets to be detected, and extracting the target sequences to obtain subsequences of the target sequences corresponding to the original sentence data;
step A3: carrying out serialization operation on the original sentence data and the target sequence to complete the serialization operation of the text data;
and step 3: the GloVe model module pre-trains a language model by using a GloVe word representation tool, obtains the feature representation of the word vector of the original sentence data and the target sequence by using the language model, and captures the wide semantic features among words;
and 4, step 4: the position information coding module codes the position information of the context words in the original sentence data relative to the target sequence, and calculates the position weight of each word in the original sentence data, and the method specifically comprises the following steps:
step B1: specifying that words closer to the target sequence contribute more to the computation of its emotion value, and words farther from the target sequence contribute less;
step B2: calculating the position distance of each word in the context relative to the target sequence to obtain position distance information; if the target sequence consists of several words, a context word that belongs to the target sequence has position distance 0; then calculating the position weights of all context words relative to the target sequence from the position distance information;
and 5: respectively encoding the word vectors of the original sentence data and the target sequence by using an attention encoder, and specifically comprising the following steps:
step C1: combining the position distance information with the original sentence data to update word vectors, so that each word vector in the context coded by the Glove word representation tool can embody the position distance information between the word vector and the target sequence;
step C2: the learning of text semantics is completed by using a long-short term network and an attention mechanism, and the process comprises the following steps:
step Y1: using Bi-LSTM to learn the meaning of the text words from the forward direction and the reverse direction respectively, and combining word vectors obtained by the forward direction and the reverse direction respectively to form a final text word vector;
step Y2: using an attention encoder to further learn the interrelation between each word in the text sentence and the target sequence respectively to obtain a final text feature vector;
step 6: the classifier module learns a classifier for the final text feature vector and calculates the emotion classification of the original sentence data, and the specific steps are as follows:
step D1: calculating the positive, neutral and negative emotion scores of the text with respect to the target sequence through one layer of fully connected neural network, and taking the item with the highest probability as the emotion classification result, wherein the specific calculation formula is as follows:
score_j = W_p · F + b_p, j ∈ [1, 3];
where W_p and b_p are the parameters of the neurons between the input layer and the output layer of the neural network, which are continuously updated during model training until a convergence state is reached; score_j denotes the score of the text belonging to label j, and j = 1, 2, 3 represents the emotion values positive, neutral and negative respectively;
step D2: calculating the text emotion category for the target sequence through Softmax normalization and extracting the emotion label with the maximum probability as the text emotion value of the target sequence, with the formula:
P(y = j) = exp(score_j) / Σ_{k=1}^{3} exp(score_k).
Preferably, when step A3 is executed, character statistics are first performed on the original sentence data and the target sequence, and a dictionary library is established that contains all the words of the corpus; the subscript indexes of the words of the original sentence data and the target sequence in the dictionary library are then looked up, completing the serialization of the text data.
Preferably, when step 3 is executed, the method comprises the following steps:
step S1: presetting a corpus and constructing a co-occurrence matrix from it, in which each element represents how often a word co-occurs with a context word within a context window of a specific size; specifically, a decay function of the distance d between two words in the context window is defined to compute the weight;
step S2: constructing an approximate relation between the word vectors and the co-occurrence matrix, which can be represented by the following formula:
w_i^T · w̃_j + b_i + b̃_j = log(X_ij);
where w_i and w̃_j are the word vectors to be solved finally; b_i and b̃_j are the bias terms of the two word vectors; i and j index the word vectors, and X_ij is the corresponding element of the co-occurrence matrix;
step S3: constructing the loss function J according to the following formula:
J = Σ_{i,j=1}^{|V|} f(X_ij) · (w_i^T · w̃_j + b_i + b̃_j − log(X_ij))²;
the loss function J uses the mean square error while adding a weight function f(X_ij);
the formula of the weight function f(X_ij) is:
f(x) = (x / x_max)^α if x < x_max; f(x) = 1 otherwise;
A word vector table of the corpus is obtained after training with the GloVe word representation tool; let the word vector table be L ∈ ℝ^{d_v × |V|}, where d_v is the dimension of the word vectors and |V| is the size of the entire dictionary library constructed above.
After the words in the original sentence data are mapped into vectors by looking them up in the word vector table, the text sentence is represented as X = {x_1, x_2, ..., x_n}; similarly, the words in the target sequence are looked up in the word vector table to obtain the vectorized target sequence X^t = {x^t_1, x^t_2, ..., x^t_m}.
the target-oriented emotion classification method solves the technical problems that a target sequence to be analyzed is extracted and target-oriented emotion analysis is completed aiming at the target sequence, is suitable for scenes needing fine-grained emotion analysis, can realize emotion classification on texts with a plurality of targets to be analyzed and different targets having different emotion colors in sentences, and is more accurate and effective in extraction.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of the location information encoding of the present invention;
FIG. 3 is a flow chart of the object-oriented text attention learning of the present invention.
Detailed Description
As shown in FIGS. 1-3, the target-oriented emotion classification method includes the following steps:
step 1: establishing a client server and a central server, wherein the client server is used for collecting text information and sending the text information to the central server;
establishing a preprocessing module, a GloVe model module, a position information coding module, an attention coder and a classifier module in a central server;
step 2: after the central server acquires the text information, the central server preprocesses the text data with subjective emotional colors in the text information through a preprocessing module, and respectively shows text sentences and target sequences in the text data, and the method specifically comprises the following steps:
step A1: establishing a Chinese stop word dictionary, deleting stop words contained in the text data according to the Chinese stop word dictionary, and deleting incomplete text data contained in the text data according to the Chinese stop word dictionary to obtain original sentence data;
the established Chinese stop word dictionary is represented as Ds={wd1,wd2,...,wdtAnd f, stopping word list filtering on the text, deleting incomplete text data with missing values, and obtaining an original sentence W ═ W { (W)1,w2,...,wnAnd simultaneously extracting a target sequence to be analyzed, wherein the target sequence is expressed as
Figure GDA0002622689690000062
Wherein WtIs a subsequence of W, and the goal of this embodiment is to predict the emotional tendency of the target sequence to be analyzed; whereint refers to target and represents the target sequence.
Step A2: taking the sentences with emotional colors in the original sentence data as targets to be detected, establishing target sequences for the targets to be detected, and extracting the target sequences to obtain subsequences of the target sequences corresponding to the original sentence data;
step A3: carrying out serialization operation on the original sentence data and the target sequence to complete the serialization operation of the text data;
Firstly, character statistics are performed on the original sentence data and the target sequence, and a dictionary library containing all the words of the training corpus is established. The subscript indexes of the words of the original sentence data and the target sequence in the dictionary library are looked up, completing the serialization of the text data. For example, the serialized original sentence data is represented as I = {i_1, i_2, ..., i_n}, and the target sequence as I^t = {i^t_1, i^t_2, ..., i^t_m}.
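As a rough illustration of steps A1-A3, the stop-word filtering and serialization can be sketched in Python (the dictionary, stop words and sentences here are hypothetical examples, not from the patent's corpus):

```python
def build_vocab(sentences):
    """Assign each distinct word a subscript index in a dictionary library."""
    vocab = {}
    for sent in sentences:
        for word in sent:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def serialize(words, vocab, stop_words=frozenset()):
    """Drop stop words, then map each remaining word to its dictionary index."""
    kept = [w for w in words if w not in stop_words]
    return [vocab[w] for w in kept if w in vocab]

corpus = [["the", "screen", "is", "bright"], ["battery", "life", "is", "poor"]]
vocab = build_vocab(corpus)
# Serialized sentence I and serialized target sequence I^t (the target is a
# subsequence of the sentence).
sentence_ids = serialize(["the", "screen", "is", "bright"], vocab,
                         stop_words={"the", "is"})
target_ids = serialize(["screen"], vocab)
```

The target sequence is serialized against the same dictionary library as the sentence, so its indices are directly comparable.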
And step 3: the GloVe model module pre-trains a language model by using a GloVe word representation tool, obtains the feature representation of the word vector of the original sentence data and the target sequence by using the language model, and captures the wide semantic features among words;
when step 3 is executed, the method comprises the following steps:
step S1: presetting a corpus and constructing a co-occurrence matrix from it, in which each element represents how often a word co-occurs with a context word within a context window of a specific size; specifically, a decay function of the distance d between two words in the context window is defined to compute the weight;
step S2: constructing an approximate relation between the word vectors and the co-occurrence matrix, which can be represented by the following formula:
w_i^T · w̃_j + b_i + b̃_j = log(X_ij);
where w_i and w̃_j are the word vectors to be solved finally, and w_i^T is the transpose of w_i; b_i and b̃_j are the bias terms of the two word vectors; i and j index the word vectors, and X_ij is the corresponding element of the co-occurrence matrix;
step S3: constructing the loss function J according to the following formula:
J = Σ_{i,j=1}^{|V|} f(X_ij) · (w_i^T · w̃_j + b_i + b̃_j − log(X_ij))²;
where V represents the whole dictionary library; the loss function J uses the mean square error while adding a weight function f(x).
The formula of the weight function f(x) is:
f(x) = (x / x_max)^α if x < x_max; f(x) = 1 otherwise;
where x_max represents the highest co-occurrence frequency of one word in the context of another, set here to 120, and α is 0.76. A word vector table of the corpus is obtained after training with the GloVe word representation tool; let the word vector table be L ∈ ℝ^{d_v × |V|}, where d_v is the dimension of the word vectors and |V| is the size of the entire dictionary library constructed above.
After the words in the original sentence data are mapped into vectors by looking them up in the word vector table, the text sentence is represented as X = {x_1, x_2, ..., x_n}; similarly, the words in the target sequence are looked up in the word vector table to obtain the vectorized target sequence X^t = {x^t_1, x^t_2, ..., x^t_m}.
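The GloVe weight function and a single term of the loss J described above can be sketched as follows (the vectors and biases are random stand-ins; only x_max = 120 and α = 0.76 come from this embodiment):

```python
import numpy as np

X_MAX, ALPHA = 120.0, 0.76   # values given in the embodiment

def f_weight(x):
    """GloVe weighting f(x): damps the influence of very frequent pairs."""
    return (x / X_MAX) ** ALPHA if x < X_MAX else 1.0

def glove_loss_term(w_i, w_j_tilde, b_i, b_j_tilde, x_ij):
    """One (i, j) term of J: f(X_ij) * (w_i^T w~_j + b_i + b~_j - log X_ij)^2."""
    inner = w_i @ w_j_tilde + b_i + b_j_tilde - np.log(x_ij)
    return f_weight(x_ij) * inner ** 2

rng = np.random.default_rng(0)
w_i, w_j_tilde = rng.normal(size=5), rng.normal(size=5)
loss = glove_loss_term(w_i, w_j_tilde, 0.1, -0.2, x_ij=30.0)
```

Summing this term over all word pairs in the dictionary library gives the full loss J being minimized during pre-training.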
and 4, step 4: the position information coding module codes the position information of the context words in the original sentence data relative to the target sequence, and calculates the position weight of each word in the original sentence data, and the method specifically comprises the following steps:
step B1: specifying that words closer to the target sequence contribute more to the computation of its emotion value, and words farther from the target sequence contribute less;
step B2: calculating the position distance of each word in the context relative to the target sequence to obtain position distance information; if the target sequence consists of several words, a context word that belongs to the target sequence has position distance 0; then calculating the position weights of all context words relative to the target sequence from the position distance information;
As shown in fig. 2, the flow of position information encoding is as follows. First, the position in the whole original sentence of the first character of the target sequence is computed, and its index is recorded as k. Then the distance l of each context word from the target sequence is computed. Assuming the total length of the target sequence is m, the distance calculation formula is:
l_i = i − k, if i < k;  l_i = 0, if k ≤ i ≤ k + m − 1;  l_i = i − (k + m − 1), if i > k + m − 1;
where l_i denotes the distance of the ith word in the current context from the target sequence.
As the formula shows, the distance of the context on the left side of the target sequence is less than 0, the distance of the context on its right side is greater than 0, and the distance of all words inside the target sequence is set to 0. What must be computed is the influence of the surrounding words on the emotion value of the target sequence: context farther from the target sequence contributes less to its emotion, and context closer to it contributes more. The position weight of each word in the text sentence relative to the target sequence can be calculated by the following formula:
w_i = 1 − |l_i| / n;
where n is the total length of the text, m is the total length of the target sequence, |l_i| is the absolute value of the distance, and w_i denotes the position weight of the ith word in the original sentence relative to the target sequence.
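A minimal sketch of the position encoding: the piecewise distance and the weight computation. The `1 - |l|/n` weight form is our reading of the patent's weight formula, so treat it as an assumption:

```python
import numpy as np

def position_weights(n, k, m):
    """Distances l_i and weights w_i = 1 - |l_i|/n for a sentence of length n
    whose target sequence starts at index k and spans m words."""
    l = np.zeros(n, dtype=int)
    for i in range(n):
        if i < k:                 # context left of the target: negative distance
            l[i] = i - k
        elif i >= k + m:          # context right of the target: positive distance
            l[i] = i - (k + m - 1)
        # words inside the target sequence keep distance 0
    w = 1.0 - np.abs(l) / n      # closer words get weights nearer to 1
    return l, w

l, w = position_weights(n=6, k=2, m=2)   # target occupies positions 2-3
```

Words inside the target get weight 1, and the weight decays linearly with distance on both sides.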
And 5: respectively encoding the word vectors of the original sentence data and the target sequence by using an attention encoder, and specifically comprising the following steps:
step C1: combining the position distance information with the original sentence data to update the word vectors, so that each word vector x_i in the context encoded by the GloVe word representation tool embodies the position distance information between it and the target sequence; the concrete implementation formula is:
X = (x_1 · w_1, x_2 · w_2, ..., x_n · w_n);
where w_i denotes the position weight of the ith word in the original sentence relative to the target sequence, and X is the updated text sentence word vector, whose representation already contains the position distance information of the target sequence.
Step C2: the learning of text semantics is completed by using a long-short term network and an attention mechanism, and the process comprises the following steps:
step Y1: using Bi-LSTM to learn the meaning of the text words from the forward direction and the reverse direction respectively, and combining word vectors obtained by the forward direction and the reverse direction respectively to form a final text word vector;
Bi-LSTM (Bi-directional Long Short-Term Memory), also called a bidirectional long short-term memory network, can learn the meaning of the text words in the forward and backward directions respectively, and combines the word vectors obtained from the two directions into the final text word vectors, completing the preliminary learning of the sequential text. The following is the single-direction word vector learning process:
f_i = σ(W_f[x_i, h_{i−1}] + b_f);
i_i = σ(W_i[x_i, h_{i−1}] + b_i);
C̃_i = tanh(W_c[x_i, h_{i−1}] + b_c);
o_i = σ(W_o[x_i, h_{i−1}] + b_o);
C_i = f_i ∗ C_{i−1} + i_i ∗ C̃_i;
h_i = tanh(C_i) ∗ o_i;
where h_i is the text word representation obtained after one pass of learning, h→_i is the result of forward learning and h←_i is the result of backward learning. The bidirectional text word vector can be represented as:
h_i = [h→_i ; h←_i] ∈ ℝ^{2d};
where d is the number of hidden-layer neurons in the Bi-LSTM. From this, H = {h_1, h_2, ..., h_n} ∈ ℝ^{n×2d} represents the text feature vectors of all words. Encoding the target sequence with Bi-LSTM in the same way gives H^t = {h^t_1, ..., h^t_m} ∈ ℝ^{m×2d_t}, the feature vector representation of the whole target sequence, where d_t is the number of hidden-layer neurons of the LSTM for the target sequence.
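The six gate equations of the single-direction pass can be checked with a small NumPy implementation (weights are random stand-ins; the stacked-gate layout of `W` is an implementation choice, not the patent's notation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_i, h_prev, c_prev, W, b):
    """One single-direction step: forget/input/output gates, candidate cell
    state, cell update C_i = f*C_{i-1} + i*C~_i, and h_i = tanh(C_i)*o_i."""
    z = W @ np.concatenate([x_i, h_prev]) + b   # stacked gate pre-activations
    d = h_prev.size
    f = sigmoid(z[:d])             # forget gate f_i
    i = sigmoid(z[d:2 * d])        # input gate i_i
    o = sigmoid(z[2 * d:3 * d])    # output gate o_i
    c_tilde = np.tanh(z[3 * d:])   # candidate cell state C~_i
    c = f * c_prev + i * c_tilde
    h = np.tanh(c) * o
    return h, c

rng = np.random.default_rng(1)
dx, dh = 4, 3                      # toy input / hidden sizes (assumed)
W = rng.normal(scale=0.1, size=(4 * dh, dx + dh))
b = np.zeros(4 * dh)
seq = rng.normal(size=(5, dx))     # a 5-word sentence of random word vectors
h, c = np.zeros(dh), np.zeros(dh)
forward = []
for x_i in seq:
    h, c = lstm_step(x_i, h, c, W, b)
    forward.append(h)
# The backward pass runs over reversed(seq) with its own weights; per-position
# concatenation of the two directions gives the 2*dh-dimensional Bi-LSTM vectors.
```

Since h_i = tanh(C_i)·o_i with o_i ∈ (0, 1), every component of the hidden state stays strictly inside (−1, 1).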
Step Y2: using an attention encoder to further learn the interrelation between each word in the text sentence and the target sequence respectively to obtain a final text feature vector;
fig. 3 is a flowchart of attention learning for target-oriented text, which includes the following steps:
Step 201: fine-grained calculation from the target sequence to the text sentence. To incorporate the information of the target sequence into the text sentence, the relationship between the text sentence and the target sequence words is calculated: for each word in the text sentence, the attention coefficient between that word and every word in the target sequence is computed. The calculation formula is:
s_{i,j} = h_i · (W_a · h^t_j);
where W_a is the parameter of a linear transformation that converts the target sequence vector h^t_j into the same space as the text sentence vector h_i so that a relevance score can be computed. The attention coefficient then follows:
α_{i,j} = exp(s_{i,j}) / Σ_{k=1}^{m} exp(s_{i,k});
where α_{i,j} denotes the attention coefficient between the ith word in the text sentence and the jth word in the target sequence, so that Σ_j α_{i,j} = 1. Finally, the attention coefficients are multiplied with the target sequence vectors to obtain the context word vectors based on the target sequence:
C^{st}_i = Σ_{j=1}^{m} α_{i,j} · h^t_j.
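A sketch of the fine-grained attention in step 201, assuming a bilinear relevance score s_ij = h_i^T (W_a h_j^t) followed by a softmax over the target words (the exact form of the score function is our assumption):

```python
import numpy as np

def attention_context(H, Ht, Wa):
    """For each sentence word h_i, score every target word h_j^t, softmax over
    j to get alpha_ij, and return the alpha-weighted mix of target vectors."""
    S = H @ (Wa @ Ht.T)                       # (n, m) relevance scores s_ij
    S = S - S.max(axis=1, keepdims=True)      # numerically stable softmax
    A = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)
    return A, A @ Ht                          # coefficients, context vectors C^st

rng = np.random.default_rng(2)
n, m, d = 4, 2, 6                             # toy sentence/target lengths
H, Ht = rng.normal(size=(n, d)), rng.normal(size=(m, d))
Wa = rng.normal(scale=0.1, size=(d, d))
A, C_st = attention_context(H, Ht, Wa)
```

The symmetric direction of step 202 is obtained by swapping the roles of `H` and `Ht` (with its own transformation parameter).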
Step 202: fine-grained calculation from the text sentence to the target sequence. To merge the text sentence information into the target sequence, the relationship between the target sequence and the text sentence words is calculated: for each word in the target sequence, the attention coefficient between that word and every word in the text sentence is computed. The calculation formula is:
s′_{i,j} = h^t_i · (W_b · h_j);
where W_b is the parameter of a linear transformation that converts the text sentence vector h_j into the same space as the target sequence vector h^t_i so that a relevance score can be computed. The attention coefficient then follows:
β_{i,j} = exp(s′_{i,j}) / Σ_{k=1}^{n} exp(s′_{i,k});
where β_{i,j} denotes the attention coefficient between the ith word in the target sequence and the jth word in the text sentence, so that Σ_j β_{i,j} = 1. Finally, the attention coefficients are multiplied with the text sentence vectors to obtain the target sequence word vectors based on the text sentence:
C^{ts}_i = Σ_{j=1}^{n} β_{i,j} · h_j.
Step 203: coarse-grained calculation from the target sequence to the text sentence. After the attention calculations of the text based on the target sequence and of the target sequence based on the text sentence are completed, bidirectional information between the two has been learned on top of the preliminary Bi-LSTM features: the text sentence now contains position and target sequence information, and the target sequence contains text sentence and position information. On this fine-grained basis, the coarse-grained relationship between the text sentence and the target sequence is learned next. For each word in the text sentence, only the attention coefficient of the target sequence word with the maximum emotion contribution is retained, and the text features extracted by the Bi-LSTM are updated with these coefficients to obtain the new text sentence word vectors Z^{ac}. The concrete formulas are:
α_i = max(α_{i,:});
Z^{ac} = (α_1 · h_1, α_2 · h_2, ..., α_n · h_n);
where α_{i,:} denotes the attention coefficients between the ith word in the text sentence and all words in the target sequence, and the max() function takes the maximum attention coefficient.
Step 204: coarse-grained calculation from the text sentence to the target sequence. For each word in the target sequence, only the attention coefficient of the text sentence word most related to emotion is retained, and the target sequence features extracted by the Bi-LSTM are updated with these coefficients to obtain the new target sequence word vectors Z^{ca}. The specific formulas are:
β_i = max(β_{i,:});
Z^{ca} = (β_1 · h^t_1, β_2 · h^t_2, ..., β_m · h^t_m).
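Steps 203-204 reduce each attention matrix to one coefficient per word and rescale the Bi-LSTM features with it; a sketch (the attention matrix here is hand-made for illustration):

```python
import numpy as np

def coarse_update(A, H):
    """Keep, for each word, only its maximum attention coefficient over the
    other sequence (alpha_i = max(alpha_i,:)) and rescale the Bi-LSTM
    features: Z_i = alpha_i * h_i."""
    alpha = A.max(axis=1)          # (n,) one coefficient per word
    return alpha[:, None] * H      # (n, d) rescaled features

A = np.array([[0.7, 0.3],          # word 0 attends most strongly with 0.7
              [0.2, 0.8]])         # word 1 attends most strongly with 0.8
H = np.ones((2, 3))                # stand-in Bi-LSTM features
Z = coarse_update(A, H)
```

The same function produces Z^{ac} from (α, H) and Z^{ca} from (β, H^t).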
Step 205: feature synthesis of the fine-grained and coarse-grained word features — the target-sequence-based features of the text sentence and the text-sentence-based features of the target sequence — to obtain the final text features. The fine-grained target-sequence-based sentence features are spliced with the coarse-grained ones, and likewise the fine-grained text-sentence-based target features are spliced with the coarse-grained ones. Before splicing, the four feature matrices are reduced in dimension, e.g. by pooling each into a single vector:
r^{st} = pool(C^{st});  r^{ac} = pool(Z^{ac});
r^{ts} = pool(C^{ts});  r^{ca} = pool(Z^{ca});
H′ = [r^{st} ; r^{ac}];
H′^t = [r^{ts} ; r^{ca}];
yielding the coarse- and fine-grained combined text sentence word features H′ and target sequence word features H′^t. Finally, feature mixing between the two completes the final text feature vector, expressed as F:
F = [H′ ; H′^t].
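Step 205's dimension reduction and splicing can be sketched with mean pooling as a stand-in for the unspecified reduction (the pooling choice is an assumption):

```python
import numpy as np

def fuse(C_st, Z_ac, C_ts, Z_ca):
    """Mean-pool each of the four feature matrices down to one vector, then
    concatenate sentence-side (H') and target-side (H'^t) features into the
    final text feature vector F."""
    r = [M.mean(axis=0) for M in (C_st, Z_ac, C_ts, Z_ca)]
    H_prime = np.concatenate([r[0], r[1]])      # sentence-side features H'
    Ht_prime = np.concatenate([r[2], r[3]])     # target-side features H'^t
    return np.concatenate([H_prime, Ht_prime])  # final feature vector F

rng = np.random.default_rng(3)
F = fuse(rng.normal(size=(5, 4)), rng.normal(size=(5, 4)),
         rng.normal(size=(2, 4)), rng.normal(size=(2, 4)))
```

Whatever reduction is used, the key point is that F mixes both granularities from both directions before classification.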
step 6: the classifier module learns a classifier for the final text feature vector and calculates the emotion classification of the original sentence data, and the specific steps are as follows:
step D1: calculating the positive, neutral and negative emotion scores of the text with respect to the target sequence through one layer of fully connected neural network, and taking the item with the highest probability as the emotion classification result, wherein the specific calculation formula is as follows:
score_j = W_p · F + b_p, j ∈ [1, 3];
where W_p and b_p are the parameters of the neurons between the input layer and the output layer of the neural network, which are continuously updated during model training until a convergence state is reached; score_j denotes the score of the text belonging to label j, and j = 1, 2, 3 represents the emotion values positive, neutral and negative respectively;
step D2: calculating the text emotion category for the target sequence through Softmax normalization and extracting the emotion label with the maximum probability as the text emotion value of the target sequence, with the formula:
P(y = j) = exp(score_j) / Σ_{k=1}^{3} exp(score_k).
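Steps D1-D2 amount to one fully connected layer followed by a softmax; a sketch with random stand-in parameters for W_p and b_p:

```python
import numpy as np

def classify(F, Wp, bp):
    """score_j = Wp . F + bp for j in {1: positive, 2: neutral, 3: negative},
    then softmax-normalize and pick the most probable label."""
    scores = Wp @ F + bp                       # (3,) emotion scores
    e = np.exp(scores - scores.max())          # numerically stable softmax
    probs = e / e.sum()
    return probs, int(np.argmax(probs)) + 1    # labels are 1-indexed

rng = np.random.default_rng(4)
F = rng.normal(size=8)                         # stand-in final feature vector
Wp, bp = rng.normal(size=(3, 8)), np.zeros(3)
probs, label = classify(F, Wp, bp)
```

Subtracting the maximum score before exponentiation leaves the softmax output unchanged but avoids overflow for large scores.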
the target-oriented emotion classification method solves the technical problems that a target sequence to be analyzed is extracted and target-oriented emotion analysis is completed aiming at the target sequence, is suitable for scenes needing fine-grained emotion analysis, can realize emotion classification on texts with a plurality of targets to be analyzed and different targets having different emotion colors in sentences, and is more accurate and effective in extraction.

Claims (3)

1. A target-oriented emotion classification method, characterized by comprising the following steps:
step 1: establishing a client server and a central server, wherein the client server is used for collecting text information and sending the text information to the central server;
establishing a preprocessing module, a GloVe model module, a position information coding module, an attention coder and a classifier module in a central server;
step 2: after the central server acquires the text information, it preprocesses the text data with subjective emotional color in the text information through the preprocessing module, and represents the text sentences and the target sequences in the text data respectively, specifically comprising the following steps:
step A1: establishing a Chinese stop-word dictionary, deleting the stop words contained in the text data according to the Chinese stop-word dictionary, and deleting incomplete text data contained in the text data, to obtain the original sentence data;
step A2: taking the sentences with emotional colors in the original sentence data as targets to be detected, establishing target sequences for the targets to be detected, and extracting the target sequences to obtain subsequences of the target sequences corresponding to the original sentence data;
step A3: carrying out serialization operation on the original sentence data and the target sequence to complete the serialization operation of the text data;
and step 3: the GloVe model module pre-trains a language model by using a GloVe word representation tool, obtains the feature representation of the word vector of the original sentence data and the target sequence by using the language model, and captures the wide semantic features among words;
and 4, step 4: the position information coding module codes the position information of the context words in the original sentence data relative to the target sequence, and calculates the position weight of each word in the original sentence data, and the method specifically comprises the following steps:
step B1: specifying that words closer to the target sequence contribute more to the calculation of its emotion value, and words farther from the target sequence contribute less to its emotion value;
step B2: calculating the position distance of each word in the context relative to the target sequence to obtain position distance information; if a target sequence consists of a plurality of words and a certain context word belongs to the target sequence, the position distance between that word and the target sequence is 0; the position weights of all context words relative to the target sequence are then calculated from the position distance information;
and 5: respectively encoding the word vectors of the original sentence data and the target sequence by using an attention encoder, and specifically comprising the following steps:
step C1: combining the position distance information with the original sentence data to update the word vectors, so that each word vector in the context encoded by the GloVe word representation tool embodies the position distance information between it and the target sequence;
step C2: completing the learning of the text semantics using a long short-term memory network and an attention mechanism, comprising the following steps:
step Y1: using a Bi-LSTM to learn the meaning of the text words in the forward and reverse directions respectively, and combining the word vectors obtained in the two directions to form the final text word vectors;
step Y2: using an attention encoder to further learn the interrelation between each word in the text sentence and the target sequence respectively to obtain a final text feature vector;
step 6: the classifier module learns a classifier for the final text feature vector and calculates the emotion classification of the original sentence data, and the specific steps are as follows:
step D1: the positive, neutral and negative emotion scores of the text with respect to the target sequence are each calculated through one layer of fully-connected neural network, and the item with the highest probability is taken as the emotion classification result; the specific calculation formula is as follows:

score_j = W_p · F + b_p, j ∈ [1, 3];

where W_p and b_p are the parameters of the neurons between the input layer and the output layer of the neural network, which are varied continuously during training until the model finally reaches a convergent state; score_j expresses the score of the text belonging to label j, where j takes the values 1, 2, 3, expressing the emotion values positive, neutral and negative respectively;
F denotes the text feature vector, d denotes the distance between two words in a context window, and t denotes target, i.e. the target sequence;
step D2: the emotion category of the text with respect to the target sequence is calculated through Softmax normalization, and the emotion label with the maximum probability is extracted as the text emotion value of the target sequence; the formula is as follows:

p_j = exp(score_j) / Σ_{k=1}^{3} exp(score_k).
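Steps B1 and B2 of claim 1 can be sketched as follows. The linear decay 1 − d/n used here is an assumed weighting for illustration; the claim only requires that closer words receive larger weights and that words inside the target sequence get distance 0.

```python
def position_weights(tokens, target_start, target_len):
    """Weight each context word by its distance to the target span (step B2)."""
    n = len(tokens)
    weights = []
    for i in range(n):
        if target_start <= i < target_start + target_len:
            d = 0                              # word belongs to the target sequence
        elif i < target_start:
            d = target_start - i               # distance to the left of the target
        else:
            d = i - (target_start + target_len - 1)  # distance to the right
        weights.append(1.0 - d / n)            # closer words -> larger weight (step B1)
    return weights

# target sequence is the single word "screen" at index 1
w = position_weights(["the", "screen", "is", "great"], 1, 1)
```

Here "screen" itself gets weight 1.0, while "great", two positions away, gets the smallest weight in the sentence.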
2. The target-oriented emotion classification method of claim 1, characterized in that: when step A3 is executed, character statistics are first performed on the original sentence data and the target sequence, and a dictionary library containing all words of the training corpus is established; the subscript index of each word of the original sentence data and of the target sequence in the dictionary library is then looked up, completing the serialization of the text data.
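A minimal sketch of the serialization described in claim 2, under the assumption that dictionary indices start at 1 and unseen words map to 0 (conventions the claim does not specify):

```python
def build_vocab(corpus):
    """Collect every word of the training corpus into a dictionary library."""
    vocab = {}
    for sentence in corpus:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab) + 1)  # first-seen order, indices from 1
    return vocab

def serialize(sentence, vocab):
    """Replace each word by its subscript index in the dictionary library."""
    return [vocab.get(w, 0) for w in sentence.split()]  # 0 = out-of-vocabulary

corpus = ["the screen is great", "the battery is bad"]
vocab = build_vocab(corpus)
ids = serialize("the screen is bad", vocab)
```

The same `serialize` call is applied to both the original sentence data and the target sequence, so both share one index space.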
3. The target-oriented emotion classification method of claim 1, characterized in that: when step 3 is executed, it comprises the following steps:
step S1: presetting a corpus and constructing a co-occurrence matrix from it, in which each element represents the number of times a word co-occurs with a context word within a context window of a specific size; specifically, an attenuation function of the distance d between two words in the context window is defined for calculating the weight;
step S2: constructing an approximate relation between the word vectors and the co-occurrence matrix, which can be represented by the following formula:

w_i^T · w̃_j + b_i + b̃_j = log(X_ij);
where w_i and w̃_j are the word vectors to be solved finally; b_i and b̃_j are the bias terms of the two word vectors; i and j respectively index the word vectors, and X_ij is the corresponding element of the co-occurrence matrix;
step S3: the loss function J is constructed according to the following formula:

J = Σ_{i,j=1}^{|V|} f(X_ij) · (w_i^T · w̃_j + b_i + b̃_j − log(X_ij))^2;
the loss function J uses the mean square error, while adding a weight function f (x);
The formula of the weight function f(x) is as follows:

f(x) = (x / x_max)^α if x < x_max, and f(x) = 1 otherwise;

where α takes the value 0.76;
After training with the GloVe word representation tool, a word vector table of the corpus of dimension d_v × |V| is obtained, where d_v is the dimension of the word vectors and |V| is the size of the entire dictionary library constructed above;
after the words in the original sentence data are mapped into vectors by looking them up in the word vector table, the text sentence is represented as a sequence of word vectors; in the same way, the words in the target sequence are looked up in the word vector table to obtain the vectorized target sequence.
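The weighting function f(x) and a single term of the loss J from claim 3 can be sketched as follows. Here x_max = 100 is an assumed cutoff (the claim does not give one), α = 0.76 follows the text, and the vectors and the co-occurrence count are toy values.

```python
import math

X_MAX, ALPHA = 100.0, 0.76

def f(x):
    """GloVe weight: damp rare pairs, cap frequent ones at 1."""
    return (x / X_MAX) ** ALPHA if x < X_MAX else 1.0

def loss_term(w_i, w_j, b_i, b_j, x_ij):
    """One summand of J: f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2."""
    dot = sum(a * b for a, b in zip(w_i, w_j))
    return f(x_ij) * (dot + b_i + b_j - math.log(x_ij)) ** 2

# toy word vectors, zero biases, and a co-occurrence count of 10
t = loss_term([0.1, 0.2], [0.3, -0.1], 0.0, 0.0, 10.0)
```

Summing `loss_term` over all pairs (i, j) with nonzero co-occurrence gives the full weighted mean-square-error objective J.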
CN201910568300.2A — filed 2019-06-27 — Target-oriented emotion classification method — Active — granted as CN110287323B (en)

Priority Applications (1)

Application Number — Priority Date — Filing Date — Title
CN201910568300.2A (CN110287323B) — 2019-06-27 — 2019-06-27 — Target-oriented emotion classification method


Publications (2)

Publication Number — Publication Date
CN110287323A (en) — 2019-09-27
CN110287323B (en) — 2020-10-23

Family

ID=68019310

Family Applications (1)

Application Number — Title — Priority Date — Filing Date
CN201910568300.2A — Target-oriented emotion classification method (Active, CN110287323B (en)) — 2019-06-27 — 2019-06-27

Country Status (1)

Country — Link
CN — CN110287323B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number — Priority date — Publication date — Assignee — Title
CN112667803A (en)* — 2019-09-30 — 2021-04-16 — 北京国双科技有限公司 — Text emotion classification method and device
CN110866405A (en)* — 2019-11-14 — 2020-03-06 — 电子科技大学 — An aspect-level sentiment classification method based on sentence information
CN111553148A (en)* — 2020-03-31 — 2020-08-18 — 深圳壹账通智能科技有限公司 — Label establishing method and device, electronic equipment and medium
CN111539513A (en)* — 2020-04-10 — 2020-08-14 — 中国检验检疫科学研究院 — Method and device for determining risk of imported animal infectious diseases
CN111552810B (en)* — 2020-04-24 — 2024-03-19 — 深圳数联天下智能科技有限公司 — Entity extraction and classification method and device, computer equipment and storage medium
CN113221551B (en)* — 2021-05-28 — 2022-07-29 — 复旦大学 — A fine-grained sentiment analysis method based on sequence generation
CN113656560B (en)* — 2021-10-19 — 2022-02-22 — 腾讯科技(深圳)有限公司 — Emotion category prediction method and device, storage medium and electronic equipment
CN114936283B (en)* — 2022-05-18 — 2023-12-26 — 电子科技大学 — Network public opinion analysis method based on Bert
CN116052081B (en)* — 2023-01-10 — 2025-02-28 — 山东高速建设管理集团有限公司 — A method, system, electronic device and storage medium for real-time monitoring of site safety

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number — Priority date — Publication date — Assignee — Title
US9606988B2 (en)* — 2014-11-04 — 2017-03-28 — Xerox Corporation — Predicting the quality of automatic translation of an entire document
US9786299B2 (en)* — 2014-12-04 — 2017-10-10 — Microsoft Technology Licensing, LLC — Emotion type classification for interactive dialog system
CN107038154A (en)* — 2016-11-25 — 2017-08-11 — 阿里巴巴集团控股有限公司 — A text emotion recognition method and device
CN107832400B (en)* — 2017-11-01 — 2019-04-16 — 山东大学 — A relationship classification method using a location-based LSTM and CNN joint model
KR101851794B1 (en)* — 2017-12-22 — 2018-04-24 — 주식회사 마인드셋 — Apparatus and Method for Generating Emotion Scores for Target Phrases
CN108829667A (en)* — 2018-05-28 — 2018-11-16 — 南京柯基数据科技有限公司 — An intent recognition method for multi-turn dialogue based on a memory network
CN109710761A (en)* — 2018-12-21 — 2019-05-03 — 中国标准化研究院 — Sentiment analysis method based on attention-enhanced bidirectional LSTM model
CN109902177B (en)* — 2019-02-28 — 2022-11-29 — 上海理工大学 — Text emotion analysis method based on dual-channel convolutional memory neural network



Legal Events

Code — Title
PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant
