Disclosure of Invention
In view of the foregoing problems, it is an object of the present invention to provide a method, an apparatus, an electronic device, and a computer-readable storage medium for emotion analysis of data with good generalization effect and high analysis accuracy.
In order to achieve the above object, the present invention provides a data emotion analyzing method, including:
collecting comment information;
obtaining the score of the user in the comment information;
obtaining emotion polarities of the comment information by adopting keyword matching and dictionary rules based on a dictionary, wherein the emotion polarities comprise neutral, negative and positive;
analyzing the probability of the comment information belonging to different emotion polarities based on machine learning;
converting the scores, the emotion polarities and the probabilities of the emotion polarities into the same range by adopting a mapping method;
and obtaining the emotion polarity and the score of the comment information by adopting a weighted voting fusion mode according to the scores, the emotion polarities and the probabilities of the emotion polarities converted into the same range.
Preferably, the step of obtaining the emotion polarity of the comment information by using keyword matching and dictionary rules based on the dictionary includes:
loading a dictionary, wherein the dictionary comprises a subject word dictionary, an emotion word dictionary and a user segmentation word dictionary, and the emotion word dictionary comprises a common emotion word dictionary and a theme related emotion word dictionary;
preprocessing comment information through a user word segmentation dictionary, wherein the preprocessing comprises sentence segmentation and word segmentation;
scanning the comment information through the subject word dictionary, and judging whether the scanned words are subject words or not;
if the scanned word is a subject word, obtaining the emotion polarity of the subject word through a theme related emotion dictionary;
if the scanned word is not the subject word, obtaining the emotion polarity of the word through a common emotion dictionary;
and taking the emotion polarity to which the maximum word number belongs in the comment information as the emotion polarity of the comment information.
Further, preferably, the step of converting the scores, the emotion polarities and the probabilities of emotion polarities to the same range by using the mapping method comprises:
mapping the negative emotion polarity to the minimum value of the range;
mapping the positive emotion polarity to the maximum value of the range;
mapping the neutral emotion polarity to the average of the maximum and minimum values of the range.
Preferably, the step of analyzing the probability that the comment information belongs to different emotion polarities based on machine learning includes:
pre-training a language model by adopting unsupervised linguistic data;
adjusting the pre-trained language model through the labeled corpus;
and obtaining the probability that the comment information belongs to different emotion polarities through the adjusted language model.
Further, preferably, the step of pre-training the language model by using unsupervised corpus includes:
and pre-training the language model by adopting a BERT pre-training model.
Preferably, the step of converting the scores, the emotion polarities and the probabilities of emotion polarities to the same range by using the mapping method comprises:
obtaining the number of grading grades graded by a user;
mapping the score of the lowest scoring level to the minimum of the range;
mapping the score of the highest scoring level to the maximum of the range;
mapping the scores of the intermediate levels to the average of the maximum and minimum values;
mapping the scores of the levels between the middle level and the lowest level to a value less than the mean of (i) the average of the maximum and minimum values and (ii) the minimum value;
mapping the scores of the levels between the middle level and the highest level to a value greater than the mean of (i) the average of the maximum and minimum values and (ii) the maximum value.
Preferably, the step of converting the scores, the emotion polarities and the probabilities of emotion polarities to the same range by using the mapping method comprises:
obtaining the sum of the absolute values of the maximum and minimum values of the range;
multiplying the average of the sum of absolute values by the probability of emotion polarity.
In addition, the present invention also provides a data emotion analyzing apparatus, including:
the collection module is used for collecting comment information;
the score extraction module is used for obtaining the score of the user in the comment information;
the first emotion polarity acquisition module is used for acquiring the emotion polarities of the comment information by adopting keyword matching and dictionary rules based on a dictionary, wherein the emotion polarities comprise neutral, negative and positive polarities;
the second emotion polarity acquisition module is used for analyzing the probability that the comment information belongs to different emotion polarities based on machine learning;
the mapping module converts the scores, the emotion polarities and the probabilities of the emotion polarities into the same range by adopting a mapping method;
and the third emotion polarity obtaining module is used for obtaining the emotion polarity and the score of the comment information by adopting a weighted voting fusion mode according to the scores, the emotion polarities and the probabilities of the emotion polarities converted into the same range.
In order to achieve the above object, the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data emotion analysis method described above.
In order to achieve the above object, the present invention further provides a computer-readable storage medium storing a computer program, which when executed by a processor, implements the above data emotion analysis method.
The data emotion analysis method, apparatus, electronic device and computer-readable storage medium perform weighted voting fusion on the scoring information of the user comment, the dictionary-based emotion analysis result obtained with keyword matching and dictionary rules, and the machine-learning-based emotion analysis result, so as to obtain the emotion polarity of the current user comment text; fusing multiple analysis measures improves the generalization effect and the analysis accuracy.
To the accomplishment of the foregoing and related ends, one or more aspects of the invention comprise the features hereinafter fully described. The following description and the annexed drawings set forth in detail certain illustrative aspects of the invention. These aspects are indicative, however, of but a few of the various ways in which the principles of the invention may be employed. Further, the present invention is intended to include all such aspects and their equivalents.
Detailed Description
Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow diagram of a data emotion analysis method according to the present invention, and as shown in fig. 1, the data emotion analysis method includes:
step S1, comment information is collected;
step S2, obtaining the score of the user in the comment information;
step S3, obtaining the emotion polarity of the comment information by adopting keyword matching and dictionary rules based on the dictionary, wherein the emotion polarity comprises neutral, negative and positive;
step S4, analyzing the probability that the comment information belongs to different emotion polarities based on machine learning;
step S5, converting the scores, the emotion polarities and the probabilities of the emotion polarities into the same range by adopting a mapping method;
and step S6, obtaining the emotion polarity and the score of the comment information by adopting a weighted voting fusion mode according to the scores, the emotion polarities and the probabilities of the emotion polarities converted into the same range.
The data emotion analysis method combines the control that a rule-and-keyword approach offers over emotion results with the generalization capability of deep learning. It performs voting fusion on the rating information of user comments, the emotion analysis results of keywords and rules (keyword matching and dictionary rules), and the emotion analysis results of a machine learning algorithm to obtain the emotion polarity of the current user comment text, which avoids the defects of a single dictionary-based strategy. The deep-learning-based strategy has strong semantic understanding and strong generalization capability, and largely escapes the dilemma that new words are hard to discover and dictionaries are hard to construct. The strategy based on user behavior does not rely on the dictionary at all, and can to some extent handle cases that the dictionary-based strategy does not cover. In addition, the invention does not need to select kernel functions, and parameters can be tuned according to experience and evaluation results. Emotion analysis based on keyword rules relies heavily on manually constructed emotion dictionaries; its recall is often low, and it performs poorly on short sentences and on sentences without relevant keywords. However, the keyword-rule approach can handle bad cases (cases the model judges incorrectly) quickly and gives strong control over the emotion result. Introducing deep-learning-based emotion analysis alleviates the poor generalization of the keyword-rule approach.
In one embodiment, in step S3, the step of obtaining the emotion polarity of the comment information by using keyword matching and dictionary rules based on the dictionary includes:
loading a dictionary, wherein the dictionary comprises a subject word dictionary, an emotion word dictionary and a user segmentation word dictionary, and the emotion word dictionary comprises a common emotion word dictionary and a theme related emotion word dictionary;
preprocessing comment information through a user word segmentation dictionary, wherein the preprocessing comprises sentence segmentation and word segmentation;
scanning the comment information through the subject word dictionary, and judging whether the scanned words are subject words or not;
if the scanned word is a subject word, obtaining the emotion polarity of the subject word through a theme related emotion dictionary;
if the scanned word is not the subject word, obtaining the emotion polarity of the word through a common emotion dictionary;
and taking the emotion polarity to which the maximum word number belongs in the comment information as the emotion polarity of the comment information.
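The steps of this embodiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the three toy dictionaries, the function name dictionary_polarity, the pre-tokenized input and the "neutral" fallback when no word matches are all hypothetical.

```python
from collections import Counter

# Hypothetical toy dictionaries; real ones would be loaded from files.
SUBJECT_WORDS = {"scenery", "service"}
TOPIC_SENTIMENT = {"scenery": "positive", "service": "negative"}
COMMON_SENTIMENT = {"good": "positive", "beautiful": "positive", "poor": "negative"}


def dictionary_polarity(tokens):
    """Return the polarity that the largest number of words belongs to."""
    counts = Counter()
    for word in tokens:
        if word in SUBJECT_WORDS:
            # Subject word: look it up in the theme related emotion dictionary.
            polarity = TOPIC_SENTIMENT.get(word)
        else:
            # Any other word: look it up in the common emotion dictionary.
            polarity = COMMON_SENTIMENT.get(word)
        if polarity is not None:
            counts[polarity] += 1
    if not counts:
        return "neutral"  # assumption: no matched word means neutral
    return counts.most_common(1)[0][0]
```

How ties between polarities are broken is not specified in the text; this sketch simply returns whichever tied polarity was counted first.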
In one embodiment, in step S3, the step of obtaining the emotion polarity of the comment information by using keyword matching and dictionary rules based on the dictionary includes:
setting weight corresponding to neutral emotion polarity;
loading a dictionary, wherein the dictionary comprises a subject word dictionary, an emotion word dictionary and a user word segmentation dictionary;
preprocessing comment information through a user word segmentation dictionary, wherein the preprocessing comprises sentence segmentation and word segmentation;
scanning the comment information through the subject word dictionary to obtain the subject words contained in the comment information, so as to obtain the clauses (delimited by punctuation) containing the subject words and the word vector group of each clause, wherein the initial weight of the word vector group is the weight corresponding to neutral, and the initial weight of each word in the word vector group is the weight corresponding to neutral divided by the number of words in the word vector group;
obtaining the word category of each word in the word vector group of a clause through the emotion dictionary, wherein the word categories comprise degree adverbs, negation words, positive words and negative words;
successively updating the weight of the word vector group, specifically including:
if the word is a positive word, acquiring the previous word and the next word of the word; if the previous word is a degree adverb, multiplying the initial weight of the degree adverb by the initial weight of the word; if the previous word is a negation word or a negative word, multiplying the weight of the previous word by -1; if the next word is a negative word, multiplying the weight of the next word by -1; if the previous or next word falls under none of the above cases, adding the initial weight of the next or previous word to the initial weight of the word;
if the word is a negative word, multiplying the initial weight of the word by -1;
if the word is a negation word, acquiring the previous word of the word; if the previous word is a negation word, adding the weights of the previous word and the word; if the previous word is a degree adverb, multiplying the initial weight of the degree adverb by the initial weight of the word; if the previous word falls under any other case, multiplying the weight of the previous word by -1;
if the word is a degree adverb, the initial weight is unchanged;
updating successively to obtain the weight of the updated word vector group;
if the weight of the word vector group is larger than the weight corresponding to neutral, the emotion polarity of the clause is positive;
if the weight of the word vector group is smaller than the weight corresponding to neutral, the emotion polarity of the clause is negative;
if the weight of the word vector group is equal to the weight corresponding to neutral, the emotion polarity of the clause is neutral;
and taking the emotion polarity to which the largest number of subject words in the comment information belong as the emotion polarity of the comment information.
Preferably, in the step of obtaining the emotion polarity of the comment information by using keyword matching and dictionary rules based on the dictionary, the method further includes:
obtaining subject terms of comment information similar to terms input by a client;
and taking the emotion polarity to which the largest number of subject terms similar to the term input by the client belong as the emotion polarity of the comment information.
Obtaining the emotion polarity of the comment information with dictionary-based keyword matching and dictionary rules in this way avoids the bias introduced by merely counting occurrences of emotion words. For example, take the comment "the scenery is very good and beautiful, but the service is poor and the facilities are incomplete". If only the occurrences of emotion words are counted, the comment is neutral; if "good" and "beautiful", which both describe the scenery, are counted only once, the comment is negative. If the term input by the client is "scenery", then "scenery" is the subject word and the comment is positive, which better matches the actual situation.
In one embodiment, in step S3, the step of obtaining the emotion polarity of the comment information by using keyword matching and dictionary rules based on the dictionary includes: loading a dictionary, wherein the dictionary further comprises a polarity white list, and the polarity white list comprises a positive white list, a negative white list and a neutral white list; performing word segmentation, preprocessing and sentence segmentation; scanning the comment information from left to right; when a subject word is scanned, recording the subject word; when an emotion word is scanned, judging whether the emotion word is preceded by a subject word; if it is, acquiring the emotion polarity from the theme related emotion dictionary, and if it is not, acquiring the emotion polarity from the common emotion dictionary; then checking whether the emotion word exists in the polarity white list, and if so, determining the emotion polarity according to the polarity white list; and counting the numbers of positive, negative and neutral occurrences in the comment information respectively, and taking the emotion polarity with the largest count as the final emotion polarity of the comment information.
The polarity white list is a list of words with a strong and definite emotional tendency. It has the highest priority and can be used as a supplementary check to correct inaccurate identification.
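The left-to-right scan with the polarity white list can be sketched as follows. This is an illustrative sketch only: the toy dictionaries, the keying of the theme related dictionary by (subject word, emotion word) pairs, the fallback to the common dictionary when a pair is missing, and the function name scan_polarity are assumptions not stated in the text.

```python
from collections import Counter

# Hypothetical toy dictionaries for illustration.
SUBJECT_WORDS = {"room", "price"}
# Assumed layout: (subject word, emotion word) -> polarity
TOPIC_SENTIMENT = {("price", "high"): "negative", ("room", "big"): "positive"}
COMMON_SENTIMENT = {"high": "positive", "big": "neutral", "terrible": "negative"}
POLARITY_WHITELIST = {"terrible": "negative"}


def scan_polarity(tokens):
    """Scan tokens left to right and return the most frequent polarity."""
    counts = Counter()
    current_subject = None
    for word in tokens:
        if word in SUBJECT_WORDS:
            current_subject = word  # record the subject word
            continue
        if word in POLARITY_WHITELIST:
            # The white list has the highest priority.
            polarity = POLARITY_WHITELIST[word]
        elif current_subject is not None and (current_subject, word) in TOPIC_SENTIMENT:
            polarity = TOPIC_SENTIMENT[(current_subject, word)]
        else:
            polarity = COMMON_SENTIMENT.get(word)
        if polarity is not None:
            counts[polarity] += 1
    return counts.most_common(1)[0][0] if counts else "neutral"
```

In this toy data, "high" is positive in general but negative after the subject word "price", which is the kind of case the theme related dictionary is meant to cover.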
In one embodiment, the user word segmentation dictionary can also store a professional dictionary and new words of the travel field to improve word segmentation accuracy; related subject words under each subject of the subject word dictionary can be found with FastText (a word vector tool). For the emotion dictionary, positive and negative seed words can be given, the words most similar to the seed words are then found with a mutual information algorithm, and those words are added to the emotion word dictionary.
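One possible reading of the seed-word expansion is pointwise mutual information (PMI) over document-level co-occurrence. The sketch below assumes that reading; the text only names "a mutual information algorithm", so the PMI formula, the document-level counting and the function names pmi_scores and most_similar are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(docs):
    """Document-level PMI for every pair of words that co-occur."""
    n = len(docs)
    word_df = Counter()
    pair_df = Counter()
    for doc in docs:
        words = sorted(set(doc))
        word_df.update(words)
        pair_df.update(combinations(words, 2))  # keys are sorted (a, b) pairs
    scores = {}
    for (a, b), count in pair_df.items():
        p_ab = count / n
        scores[(a, b)] = math.log(p_ab / ((word_df[a] / n) * (word_df[b] / n)))
    return scores


def most_similar(seed, docs):
    """Return the word with the highest PMI against the seed word."""
    best_word, best_score = None, float("-inf")
    for (a, b), score in pmi_scores(docs).items():
        if seed in (a, b):
            other = b if a == seed else a
            if score > best_score:
                best_word, best_score = other, score
    return best_word
```

A word returned for a positive seed would then be a candidate for the positive part of the emotion word dictionary, typically after a manual check.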
In one embodiment, in step S4, the step of analyzing the probability that the comment information belongs to different emotion polarities based on machine learning includes:
pre-training a language model by adopting unsupervised linguistic data;
adjusting the pre-trained language model through the labeled corpus;
and obtaining the probability that the comment information belongs to different emotion polarities through the adjusted language model.
Preferably, the step of pre-training the language model by using unsupervised corpus includes:
and pre-training the language model by adopting a BERT pre-training model.
Experimental results also show that using BERT performs better than a baseline (benchmark model) built on traditional algorithms. The success of pre-trained language models such as BERT demonstrates that latent semantic information can be learned from massive unlabeled text, without labeling a large amount of training data for each downstream NLP task. In addition, the success of the pre-trained language model BERT opens a new paradigm for NLP research: a large unsupervised corpus is first used for language model pre-training (Pre-training), and a small labeled corpus is then used for fine-tuning (Fine-tuning) to complete a specific NLP classification task.
Any model training and prediction requires an explicit input, and the Processor in the BERT code is responsible for processing the model input. Google has written Processors for several public data sets in the run_classifier.py file, such as XnliProcessor, MnliProcessor, MrpcProcessor and ColaProcessor.
For a model that needs to go through the complete process of training, cross-validation and testing, the custom Processor needs to inherit DataProcessor and override get_labels, which returns the labels, as well as get_train_examples, get_dev_examples and get_test_examples, which return the individual input examples. The latter three are called in the FLAGS.do_train, FLAGS.do_eval and FLAGS.do_predict phases of the main function, respectively.
The contents of the functions get_train_examples, get_dev_examples and get_test_examples differ only slightly, in that each must specify the path of the file it reads. Taking get_train_examples as an example, the function needs to return a list of InputExample instances. The InputExample class is a very simple class with only an initialization function. Among the parameters to be passed in, guid distinguishes each example and can be defined in the form 'train-%d' % (i). text_a is a string, and text_b is another string. After subsequent input processing (already included in the BERT code, so it need not be implemented separately), text_a and text_b are combined into the form [CLS] text_a [SEP] text_b [SEP] and passed to the model. The last parameter, label, is also a string, and its content must appear in the list returned by the get_labels function.
In one embodiment, there is an input file named train_sentiment.txt under the data path, containing comments such as: "I went there once before; the scenery is good, and the history is worth a visit. Traffic is not very convenient."
The input file train_sentiment.txt is read by the get_train_examples function;
the specific Processor code is:
# Method of the custom DataProcessor subclass in run_classifier.py,
# where os and tokenization are already imported.
# Read the txt file and return a list of InputExample instances.
# text_a is a string and text_b is another string. After subsequent input
# processing (already included in the BERT code), text_a and text_b are
# combined into the form [CLS] text_a [SEP] text_b [SEP] and passed to the model.
def get_train_examples(self, data_dir):
    file_path = os.path.join(data_dir, 'train_sentiment.txt')
    train_data = []
    with open(file_path, 'r', encoding='utf-8') as f:
        for index, line in enumerate(f):
            guid = 'train-%d' % index  # guid distinguishes each example
            line = line.replace("\n", "").split("\t")
            text_a = tokenization.convert_to_unicode(str(line[1]))  # text to be classified
            label = str(line[2])  # emotion category corresponding to the text
            # Add the example to the InputExample list
            train_data.append(InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
    return train_data
For this text classification task (with 3 emotion polarity labels: 0 is neutral, 1 is positive, and 2 is negative), the get_labels function can be written as follows:
def get_labels(self):
    return ['0', '1', '2']
Run fine-tuning:
python3 run_classifier.py \
--data_dir=data \
--task_name=sim \
--vocab_file=chinese_L-12_H-768_A-12/vocab.txt \
--bert_config_file=chinese_L-12_H-768_A-12/bert_config.json \
--output_dir=sim_model \
--do_train=true \
--do_eval=true \
--init_checkpoint=chinese_L-12_H-768_A-12/bert_model.ckpt \
--max_seq_length=70 \
--train_batch_size=32 \
--learning_rate=5e-5 \
--num_train_epochs=3.0
After training, run prediction:
python3 run_classifier.py \
--data_dir=data \
--task_name=sa \
--vocab_file=chinese_L-12_H-768_A-12/vocab.txt \
--bert_config_file=chinese_L-12_H-768_A-12/bert_config.json \
--output_dir=sim_model \
--do_predict=true \
--init_checkpoint=chinese_L-12_H-768_A-12/bert_model.ckpt \
--max_seq_length=70
As a result of the run, a test_results file is generated after the run finishes, in which the three probability values in each row are the probabilities that the text is neutral, positive and negative, respectively; these probabilities are incorporated into the fusion strategy.
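Reading one row of the prediction output can be sketched as follows. This assumes each row holds three tab-separated values in label order 0, 1, 2 (neutral, positive, negative), matching get_labels above; the function name parse_probabilities is hypothetical.

```python
def parse_probabilities(row):
    """Parse one tab-separated row into named class probabilities."""
    neutral, positive, negative = (float(x) for x in row.strip().split("\t"))
    return {"neutral": neutral, "positive": positive, "negative": negative}


def most_likely(probabilities):
    """Return the polarity with the highest probability."""
    return max(probabilities, key=probabilities.get)
```

For example, most_likely(parse_probabilities("0.01\t0.96\t0.03")) selects the positive polarity, while the fusion step itself uses all three probabilities rather than only the top one.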
OTA websites carry user scores, and a user's scoring behavior has an emotional tendency; the user score is single and independent. Meanwhile, in view of the inaccuracy and randomness of user scores, the user score is incorporated into the fusion strategy as one calculation factor influencing the final emotional tendency.
User scores take different forms on different websites, so they are normalized to the interval of 1 to 5 points; the normalized results are the discrete scores 1, 2, 3, 4 and 5. For example, if a website uses star ratings and the user gives the comment five stars, the score is mapped to 5 points.
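One way to normalize scores from differently scaled websites is sketched below. The text does not specify the normalization rule, so the linear rescaling, the round-half-up step and the function name normalize_score are assumptions made for illustration.

```python
import math


def normalize_score(raw, raw_min, raw_max):
    """Rescale a raw site score linearly to a discrete score from 1 to 5."""
    if raw_max <= raw_min:
        raise ValueError("raw_max must be greater than raw_min")
    scaled = 1 + 4 * (raw - raw_min) / (raw_max - raw_min)
    # Round half up, then clamp to the 1..5 interval.
    return min(5, max(1, math.floor(scaled + 0.5)))
```

Under these assumptions, five stars on a five-star site gives normalize_score(5, 1, 5) == 5, and 8 on a 0 to 10 scale gives 4.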
In one embodiment, in step S5, the step of converting the score, the emotion polarity and the probability of emotion polarity to the same range by using the mapping method includes:
mapping the negative emotion polarity to the minimum value of the range;
mapping the positive emotion polarity to the maximum value of the range;
mapping the neutral emotion polarity to the average of the maximum and minimum values of the range.
In one embodiment, in step S5, the step of converting the score, the emotion polarity and the probability of emotion polarity to the same range by using the mapping method includes:
obtaining the number of grading grades graded by a user;
mapping the score of the lowest scoring level to the minimum of the range;
mapping the score of the highest scoring level to the maximum of the range;
mapping the scores of the intermediate levels to the average of the maximum and minimum values;
mapping the scores of the levels between the middle level and the lowest level to a value less than the mean of (i) the average of the maximum and minimum values and (ii) the minimum value;
mapping the scores of the levels between the middle level and the highest level to a value greater than the mean of (i) the average of the maximum and minimum values and (ii) the maximum value.
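The constraints of this embodiment can be checked with a small sketch. It does not choose the mapped values; it only verifies that a candidate mapping meets the stated conditions. The function name satisfies_score_mapping, the requirement of an odd number of levels (so that a middle level exists) and the strict bounds against the minimum and maximum are assumptions.

```python
def satisfies_score_mapping(values, low=-100.0, high=100.0):
    """Check mapped scores, ordered from the lowest level to the highest."""
    mid = (low + high) / 2
    lower_bound = (mid + low) / 2   # mean of the midpoint and the minimum
    upper_bound = (mid + high) / 2  # mean of the midpoint and the maximum
    n = len(values)
    if n < 3 or n % 2 == 0:
        return False  # assumption: an odd number of levels is required
    middle = n // 2
    if values[0] != low or values[-1] != high or values[middle] != mid:
        return False
    below_ok = all(low < v < lower_bound for v in values[1:middle])
    above_ok = all(upper_bound < v < high for v in values[middle + 1:-1])
    return below_ok and above_ok
```

With the range from -100 to 100, the five-level mapping -100, -60, 0, 70, 100 passes this check, while an evenly spaced mapping -100, -50, 0, 50, 100 does not.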
In one embodiment, in step S5, the step of converting the score, the emotion polarity and the probability of emotion polarity to the same range by using the mapping method includes:
obtaining the sum of the absolute values of the maximum and minimum values of the range;
multiplying the average of the sum of absolute values by the probability of emotion polarity.
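The probability mapping can be sketched as follows. The scale factor follows the two steps above; applying a negative sign to the negative probability and a zero factor to the neutral probability is an assumption consistent with a symmetric range such as -100 to 100, and the function name map_probabilities is hypothetical.

```python
def map_probabilities(positive, negative, neutral, low=-100.0, high=100.0):
    """Map class probabilities into the common range."""
    # Average of the sum of the absolute values of the range bounds.
    scale = (abs(high) + abs(low)) / 2
    return (scale * positive, -scale * negative, 0.0 * neutral)
```

For the range from -100 to 100 the scale is 100, so probabilities of 0.96, 0.03 and 0.01 map to approximately 96, -3 and 0.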
In a specific embodiment of the present invention, the scenario is emotion analysis of scenic-spot-related user comments on OTA websites (an OTA, or online travel agency, is mainly an agent of various travel products, such as Ctrip, Tuniu, Fliggy and Qunar), which specifically includes:
obtaining the scoring information of the user comment information through a crawler technology, wherein the user scoring information is user scoring: 1 point, 2 points, 3 points, 4 points and 5 points, and the scoring result of the user is single and independent;
the emotion trends of the emotion analysis results of the keywords and the rules comprise positive direction, negative direction and neutrality, and the output result is single and independent;
the machine learning algorithm emotion analysis result is the probability that the emotion polarity is positive, negative and neutral;
the user scoring results of 1 to 5 points are mapped to -100, -60, 0, 70 and 100, respectively, represented by A1, B1, C1, D1 and F1; a user who gives four points leans toward positive feedback and is closer to 100, so 4 is mapped to 70 instead of 50, and a user who gives two points leans toward negative feedback and is closer to -100, so 2 is mapped to -60 instead of -50, which better reflects the user's preference;
the keywords and the rules obtain positive direction, negative direction and neutrality of emotion polarity which are respectively mapped to 100, -100 and 0 and are represented by A2, B2 and C2;
the machine learning algorithm emotion analysis result is mapped by multiplying the probabilities that the emotion polarity is positive, negative and neutral by 100, -100 and 0, respectively, represented by A3, B3 and C3; for example, if the probabilities that the emotion polarity is positive, negative and neutral are 96%, 3% and 1%, respectively, they are mapped to 96, -3 and 0;
constructing a weighted voting fusion model by the following formula:
score = a × (A1 or B1 or C1 or D1 or F1) + b × (A2 or B2 or C2) + c × (A3 + B3 + C3);
wherein a + b + c =1, a, b, and c are scale factors of emotion scores, preferably, a =0.3, b =0.3, and c =0.4 are initially set, the scale factors may be modified through data training, and score is a result of weighted voting fused evaluation information;
inputting the mapped scores, emotion polarities and probabilities of emotion polarities into the weighted voting fusion model to obtain a final result, result = score, wherein -100 <= result <= 100, including:
setting a decision range (i, j) within which the emotion polarity is neutral, where -100 < i < j < 100; preferably, i = -30 and j = 30, and i and j can also be modified through data training;
if -100 <= result <= i, the emotion polarity of the final evaluation information is negative;
if i < result < j, the emotion polarity of the final evaluation information is neutral;
if j <= result <= 100, the emotion polarity of the final evaluation information is positive.
Preferably, since neutral emotion polarity is mapped to 0, C2 and/or C3 may be omitted from the weighted voting fusion model, for example score = a × (A1 or B1 or C1 or D1 or F1) + b × (A2 or B2) + c × (A3 + B3), which further reduces computation and increases analysis speed.
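The weighted voting fusion of this specific embodiment can be sketched as follows, using the mapped values, the preferred scale factors a = 0.3, b = 0.3, c = 0.4 and the preferred thresholds i = -30, j = 30 given above. The names SCORE_MAP, POLARITY_MAP and fuse, and the dictionary layout of the probabilities, are hypothetical.

```python
SCORE_MAP = {1: -100, 2: -60, 3: 0, 4: 70, 5: 100}                 # A1, B1, C1, D1, F1
POLARITY_MAP = {"positive": 100, "negative": -100, "neutral": 0}   # A2, B2, C2


def fuse(user_score, rule_polarity, probs, a=0.3, b=0.3, c=0.4, i=-30, j=30):
    """Return (score, polarity) from the three mapped signals.

    probs holds the 'positive', 'negative' and 'neutral' probabilities.
    """
    # A3 + B3 + C3: probabilities multiplied by 100, -100 and 0.
    ml_term = 100 * probs["positive"] - 100 * probs["negative"] + 0 * probs["neutral"]
    score = a * SCORE_MAP[user_score] + b * POLARITY_MAP[rule_polarity] + c * ml_term
    if score <= i:
        polarity = "negative"
    elif score < j:
        polarity = "neutral"
    else:
        polarity = "positive"
    return score, polarity
```

For a four-point comment that the rules judge positive and the model scores at 96% positive, 3% negative and 1% neutral, the score is 0.3 × 70 + 0.3 × 100 + 0.4 × 93 = 88.2, which falls in the positive interval.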
Fig. 2 is a block diagram showing a configuration of a data emotion analyzing apparatus according to the present invention, and as shown in fig. 2, the data emotion analyzing apparatus 100 includes:
the acquisition module 110: collecting comment information;
the score extraction module 120 is used for obtaining the scores of the users in the comment information;
a first emotion polarity obtaining module 130, which obtains emotion polarities of the comment information by using keyword matching and dictionary rules based on a dictionary, where the emotion polarities include neutral, negative, and positive;
the second emotion polarity obtaining module 140 analyzes the probability that the comment information belongs to different emotion polarities based on machine learning;
the mapping module 150 converts the scores, the emotion polarities and the probabilities of the emotion polarities into the same range by adopting a mapping method;
and a third emotion polarity obtaining module 160, which obtains the emotion polarity and the score of the comment information by adopting a weighted voting fusion mode according to the scores, the emotion polarities and the probabilities of the emotion polarities converted into the same range.
In one embodiment, first emotion polarity acquisition module 130 includes:
the loading unit is used for loading a dictionary, wherein the dictionary comprises a subject word dictionary, an emotional word dictionary and a user word segmentation dictionary, and the emotional word dictionary comprises a common emotional word dictionary and a theme related emotional word dictionary;
the pre-processing unit is used for pre-processing the comment information through a user word segmentation dictionary, wherein the pre-processing comprises sentence segmentation and word segmentation;
a scanning unit for scanning the comment information through the subject word dictionary;
the judging unit is used for judging whether the scanned words are subject words or not, sending signals to the subject word emotion analyzing unit if the scanned words are subject words, and sending signals to the non-subject word emotion analyzing unit if the scanned words are not subject words;
the theme word emotion analysis unit is used for acquiring the emotion polarity of the theme word through a theme related emotion dictionary;
the non-subject word emotion analysis unit acquires the emotion polarity of the word through a common emotion dictionary;
and the first emotion analysis unit is used for taking the emotion polarity to which the maximum word number belongs in the comment information as the emotion polarity of the comment information.
In one embodiment, the second emotion polarity obtaining module 140 includes:
the pre-training unit is used for pre-training a language model on an unlabeled corpus, preferably by adopting a BERT pre-training model;
the adjusting unit is used for adjusting (fine-tuning) the pre-trained language model with a labeled corpus;
and the second emotion analysis unit is used for obtaining the probability that the comment information belongs to different emotion polarities through the adjusted language model.
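The text above does not state how the adjusted language model turns its output into probabilities. A common arrangement, assumed here purely for illustration, is a three-way classification head whose raw scores (logits) are normalized with a softmax. The label order and the function names are hypothetical, and the logits are supplied by hand rather than produced by a real model.

```python
import math

# Hypothetical label order for a three-way sentiment classification head.
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def polarity_probabilities(logits):
    """Map raw classifier scores to a probability per emotion polarity."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per polarity")
    return dict(zip(LABELS, softmax(logits)))
```

The resulting probabilities sum to one, so each of them can be passed directly to the mapping step that follows.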
In one embodiment, the mapping module 150 includes a first mapping submodule, a second mapping submodule, and a third mapping submodule, where the first mapping submodule is used to map the emotion polarity obtained by the first emotion polarity obtaining module, the second mapping submodule is used to map the score of the comment information, and the third mapping submodule is used to map the probability obtained by the second emotion polarity obtaining module, specifically:
the first mapping submodule maps negative to the minimum value of the range, positive to the maximum value of the range, and neutral to the average of the maximum and minimum values of the range.
The second mapping submodule includes:
the counting unit is used for acquiring the number of rating levels available to the user;
an emotion polarity mapping unit, which maps the score of the lowest rating level to the minimum value of the range, maps the score of the highest rating level to the maximum value of the range, maps the score of the intermediate level to the average of the maximum and minimum values, and maps the score of a level between the intermediate level and the lowest level to the average of that average value and the minimum value.
The third mapping submodule obtains the sum of the absolute values of the maximum value and the minimum value of the range, and multiplies half of that sum by the probability of the emotion polarity.
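The three mappings can be sketched as follows for a range [lo, hi], taken here as [-1, 1] by assumption. The function names are hypothetical. For the rating mapping, linear interpolation is an assumption: on a five-level scale it reproduces the anchor points described above (lowest level to the minimum, intermediate level to the midpoint, the level between them to the mean of the midpoint and the minimum), and it extends them symmetrically to the upper half, which the text above does not spell out.

```python
# Sketch of the three mappings into a common range; names are hypothetical.

def map_polarity(polarity, lo=-1.0, hi=1.0):
    """negative -> lo, positive -> hi, neutral -> midpoint of the range."""
    mid = (lo + hi) / 2
    return {"negative": lo, "neutral": mid, "positive": hi}[polarity]


def map_rating(level, n_levels, lo=-1.0, hi=1.0):
    """Map a 1-based rating level onto [lo, hi] by linear interpolation (assumption)."""
    if n_levels < 2 or not 1 <= level <= n_levels:
        raise ValueError("level must lie in 1..n_levels and n_levels must be >= 2")
    return lo + (level - 1) / (n_levels - 1) * (hi - lo)


def map_probability(prob, lo=-1.0, hi=1.0):
    """Scale a probability by half the sum of |hi| and |lo|."""
    return (abs(hi) + abs(lo)) / 2 * prob
```

With the default range, a two-star rating on a five-star scale maps to -0.5, and a probability is left unchanged because half of |1| + |-1| equals 1.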
FIG. 3 is a schematic structural diagram of an electronic device for implementing the data emotion analysis method according to the present invention.
The electronic device 1 may include a processor 10, a memory 11, and a bus, and may further include a computer program, such as a data emotion analysis program 12, stored in the memory 11 and operable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of a data emotion analysis program, but also to temporarily store data that has been output or is to be output.
The processor 10 may in some embodiments be composed of integrated circuits, for example, a single packaged integrated circuit, or a plurality of packaged integrated circuits with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit of the electronic device 1; it connects the components of the electronic device 1 through various interfaces and lines, and executes the functions of the electronic device 1 and processes its data by running or executing programs or modules (e.g., the data emotion analysis program) stored in the memory 11 and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
FIG. 3 shows only an electronic device with certain components, and it will be understood by those skilled in the art that the structure shown in FIG. 3 does not constitute a limitation of the electronic device 1, which may comprise fewer or more components than those shown, a combination of some components, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component. Preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device. The power supply may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may include a display and an input unit (such as a keyboard), and optionally a standard wired interface or a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The data emotion analysis program 12 stored in the memory 11 of the electronic device 1 is a combination of a plurality of instructions which, when run on the processor 10, implement the following:
collecting comment information;
obtaining the score of the user in the comment information;
obtaining emotion polarities of the comment information by adopting keyword matching and dictionary rules based on a dictionary, wherein the emotion polarities comprise neutral, negative and positive;
analyzing the probability of the comment information belonging to different emotion polarities based on machine learning;
converting the scores, the emotion polarities and the probabilities of the emotion polarities into the same range by adopting a mapping method;
and obtaining the emotion polarity and the score of the comment information by adopting a weighted voting fusion mode according to the scores, the emotion polarities and the probabilities of the emotion polarities converted into the same range.
Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not described herein again.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. The computer-readable storage medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, which may be non-volatile or volatile, and which stores a computer program that, when executed by a processor, implements the following operations:
collecting comment information;
obtaining the score of the user in the comment information;
obtaining emotion polarities of the comment information by adopting keyword matching and dictionary rules based on a dictionary, wherein the emotion polarities comprise neutral, negative and positive;
analyzing the probability of the comment information belonging to different emotion polarities based on machine learning;
converting the scores, the emotion polarities and the probabilities of the emotion polarities into the same range by adopting a mapping method;
and obtaining the emotion polarity and the score of the comment information by adopting a weighted voting fusion mode according to the scores, the emotion polarities and the probabilities of the emotion polarities converted into the same range.
The specific implementation of the computer-readable storage medium of the present application is substantially the same as the specific implementation of the data emotion analysis method, apparatus, and electronic device, and is not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is only one logical functional division, and other divisions may be adopted in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.