CN109165298A - A text sentiment analysis system with autonomous upgrading and noise resistance - Google Patents

A text sentiment analysis system with autonomous upgrading and noise resistance

Info

Publication number
CN109165298A
Authority
CN
China
Prior art keywords
algorithm model
module
industry
learning algorithm
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810930606.3A
Other languages
Chinese (zh)
Other versions
CN109165298B (en)
Inventor
陈福
刘洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Wujie Data Technology Co ltd
Original Assignee
Shanghai Wenjun Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Wenjun Information Technology Co ltd
Priority to CN201810930606.3A
Publication of CN109165298A
Application granted
Publication of CN109165298B
Active
Anticipated expiration

Abstract

A text sentiment analysis system with autonomous upgrading and noise resistance, relating to the technical field of text sentiment analysis, comprises a user end, a back end, and a text sentiment judgment system. The text sentiment judgment system comprises a medium categorization module, an industry classification module, a medium engine group, an industry engine group, and a rule learning engine group. Based on the judgment result data, the rule learning engine group counts the accuracy with which each sentiment judgment path judges the sentiment orientation of the text content, and matches each text with the sentiment judgment path of highest accuracy. At the same time, the rule learning engine group trains the existing deep learning or machine learning algorithm models online to form new deep learning or machine learning algorithm models, and compares them with the existing models to achieve iterative upgrading. The application provides a text sentiment analysis system that has self-learning ability, adapts autonomously to its environment, and has strong anti-interference ability, improving accuracy while maintaining efficiency.

Description

A text sentiment analysis system with autonomous upgrading and noise resistance
Technical field
The present invention relates to the technical field of text sentiment analysis, and in particular to a text sentiment analysis system with autonomous upgrading and noise resistance.
Background art
Analyzing and accurately judging customer sentiment is a goal that businesses pursue diligently. With the massive growth of internet text data, analyzing the data manually has become impractical, so machine learning methods have been introduced one after another: machines perform sentiment analysis on these texts, long or short, to extract the information they express, in the hope of accurately judging and grasping users' sentiment.
Many such techniques have emerged: some are semantics-based and some statistics-based; some are supervised, some unsupervised, and some semi-supervised; some are based on traditional SVM or random forest algorithms and some on deep learning; some specialize in short text and some in long text. Judging from what has been publicly disclosed, however, the performance of these techniques is not entirely satisfactory. For example, in our tests the accuracy of Baidu's open short-text sentiment analysis engine was only about 75%. In other words, the accuracy with which current machine-based techniques judge the sentiment of internet text is still far from that of human judgment, not even reaching 80%, and is also much lower than the accuracy of machine AI techniques in the field of video recognition.
Analysis shows that the main factors currently limiting the quality of text sentiment analysis are:
1. Existing word segmentation techniques and the like can introduce vocabulary that is irrelevant to the article or even ambiguous, yet vocabulary is the foundation of all machine learning algorithms because it is the source from which article features are extracted;
2. The same word often carries different sentiment meanings in different types of articles and in articles from different fields;
3. The internet is synonymous with change: new words keep emerging, and the same word, in similar scenarios, takes on different meanings over time;
4. Although the algorithms are of the machine learning type, their models are usually trained manually before being deployed to the production environment, and during operation they cannot automatically learn and adapt to the complex internet environment described above.
In short, the internet introduces too much interference. Although the machine learning algorithms currently in use can predict the sentiment of an article (accurately or not), they lack the ability to adapt and learn autonomously, that is, they lack a noise-resistance mechanism. This is why the accuracy of current machine-learning-based text sentiment judgment techniques is not high.
Summary of the invention
To solve at least one of the problems of the prior art described above, the application provides a text sentiment analysis system that has self-learning ability, adapts autonomously to its environment, and has strong anti-interference ability, improving accuracy while maintaining efficiency.
To achieve the above technical effect, the specific technical solution of the present invention is as follows:
A text sentiment analysis system with autonomous upgrading and noise resistance comprises a user end, a back end, and a text sentiment judgment system; the text sentiment judgment system comprises a medium categorization module, an industry classification module, a medium engine group, an industry engine group, and a rule learning engine group;
The medium categorization module obtains the text content to be analyzed for sentiment and judges whether it originates from a medium. If it does, the module sends the text content to the medium feature dictionary of the corresponding medium type; otherwise the text content is not sent to any medium feature dictionary. The medium types include comments, news, blogs, WeChat, and microblogs, and the medium feature dictionaries correspondingly include a comment feature dictionary, a news feature dictionary, a blog feature dictionary, a WeChat feature dictionary, and a microblog feature dictionary;
The medium feature dictionary receives the text content of the corresponding medium type and generates, through a word segmentation module, the vocabulary to be analyzed for sentiment. The generated vocabulary is sent to the M medium feature extraction modules in the medium feature extraction module group, namely the first, second, and third through M-th medium feature extraction modules, where M is an integer. Each medium feature extraction module sends the feature vector it extracts to the N medium feature selection modules in the medium feature selection module group, namely the first and second through N-th medium feature selection modules, where N is an integer. Each medium feature selection module sends the feature vector it selects to the medium engine group;
The medium engine group comprises a medium deep learning engine group and a medium machine learning engine group. The medium deep learning engine group comprises Q medium deep learning engines that perform sentiment judgment based on deep learning algorithm models, where Q is an integer; the medium machine learning engine group comprises S medium machine learning engines that perform sentiment judgment based on machine learning algorithm models, where S is an integer. Using the corresponding algorithm models, the medium engine group computes on the feature vectors received from the medium feature selection modules to obtain sentiment analysis result data for each vocabulary item to be analyzed, and sends the computed sentiment analysis result data to the medium sentiment orientation judgment module;
The medium sentiment orientation judgment module judges whether each item of sentiment analysis result data is correct and sends the judgment result data to the rule learning engine group;
The industry classification module obtains the text content to be analyzed for sentiment and judges whether it belongs to an industry field. If it does, the module sends the text content to the industry feature dictionary of the corresponding industry field; otherwise the text content is not sent to any industry feature dictionary. The industry fields include catering, electronics, automobiles, communications, and clothing, and the industry feature dictionaries correspondingly include a catering feature dictionary, an electronics feature dictionary, an automobile feature dictionary, a communications feature dictionary, and a clothing feature dictionary;
The industry feature dictionary receives the text content of the corresponding industry field and generates, through a word segmentation module, the vocabulary to be analyzed for sentiment. The generated vocabulary is sent to the X industry feature extraction modules in the industry feature extraction module group, namely the first, second, and third through X-th industry feature extraction modules, where X is an integer. Each industry feature extraction module sends the feature vector it extracts to the Y industry feature selection modules in the industry feature selection module group, namely the first and second through Y-th industry feature selection modules, where Y is an integer. Each industry feature selection module sends the feature vector it selects to the industry engine group;
The industry engine group comprises an industry deep learning engine group and an industry machine learning engine group. The industry deep learning engine group comprises U industry deep learning engines that perform sentiment judgment based on deep learning algorithm models, where U is an integer; the industry machine learning engine group comprises V industry machine learning engines that perform sentiment judgment based on machine learning algorithm models, where V is an integer. Using the corresponding algorithm models, the industry engine group computes on the feature vectors received from the industry feature selection modules to obtain sentiment analysis result data for each vocabulary item to be analyzed, and sends the computed sentiment analysis result data to the industry sentiment orientation judgment module;
The industry sentiment orientation judgment module judges whether each item of sentiment analysis result data is correct and sends the judgment result data to the rule learning engine group;
Based on the judgment result data received from the medium sentiment orientation judgment module and the industry sentiment orientation judgment module, the rule learning engine group counts the accuracy with which each of the M*N*(Q+S) and/or X*Y*(U+V) sentiment judgment paths judges the sentiment orientation of the text content, and matches texts of each medium type and industry field with the sentiment judgment path of highest accuracy. At the same time, based on the known judgment result data, the rule learning engine group trains the existing deep learning or machine learning algorithm models online to form new deep learning or machine learning algorithm models, adds the new models to the medium engine group or the industry engine group, and compares them against the existing models, thereby iteratively upgrading the deep learning or machine learning algorithm models.
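The path counting described above can be illustrated with a minimal Python sketch. It assumes that a sentiment judgment path is one (extraction module, selection module, engine) combination, which gives M*N*(Q+S) paths on the medium side; the class `PathStats`, the function `enumerate_paths`, and the module names such as `"ext1"` are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from itertools import product


def enumerate_paths(extractors, selectors, engines):
    """Each (extractor, selector, engine) triple is one sentiment judgment path."""
    return list(product(extractors, selectors, engines))


class PathStats:
    """Running accuracy per (category, path), fed by correct/incorrect judgments."""

    def __init__(self):
        self.total = defaultdict(int)
        self.correct = defaultdict(int)

    def record(self, category, path, is_correct):
        # One judgment result from a sentiment orientation judgment module.
        self.total[(category, path)] += 1
        self.correct[(category, path)] += int(is_correct)

    def accuracy(self, category, path):
        n = self.total[(category, path)]
        return self.correct[(category, path)] / n if n else 0.0

    def best_path(self, category, paths):
        # The path with the highest observed accuracy for this category.
        return max(paths, key=lambda p: self.accuracy(category, p))


# M = 2 extractors, N = 2 selectors, Q + S = 1 + 2 engines: 2 * 2 * 3 = 12 paths.
paths = enumerate_paths(["ext1", "ext2"], ["sel1", "sel2"], ["dl1", "ml1", "ml2"])
stats = PathStats()
stats.record("news", paths[0], True)
stats.record("news", paths[0], False)
stats.record("news", paths[1], True)
best = stats.best_path("news", paths)
```

Keying the statistics by category (a medium type or an industry field) is one way to let each kind of text settle on its own best path; the source does not specify how the statistics are stored.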
Further, when the same vocabulary to be analyzed in a text content is sent to both the medium categorization module and the industry classification module and is judged by both the medium sentiment orientation judgment module and the industry sentiment orientation judgment module, the rule learning engine group performs the following steps:
S1. The rule learning engine group obtains the judgment result data of the medium sentiment orientation judgment module and of the industry sentiment orientation judgment module;
S2. It judges whether the two sets of judgment result data are consistent. If they are consistent, the judgment result data is sent to the user at the user end, who can label the text online, forming user labeling data based on the labeled text; if they are inconsistent, step S3 is performed;
S3. The rule learning engine group notifies the administrator at the back end;
S4. The administrator labels the text online, forming administrator labeling data based on the labeled text;
S5. The administrator judges whether the industry to which the text content is attributed is correct. If it is correct, the text is placed in the industry-class correct text training library and the medium-class error text training library; if it is incorrect, step S6 is performed;
S6. The text is placed in the industry-class error text training library and the medium-class correct text training library.
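Steps S1 to S6 amount to a small decision procedure, sketched below in Python under stated assumptions. The function name `route_judgments`, the action strings, and the library names are hypothetical labels chosen for illustration; the administrator's verdict is modeled as an optional boolean that is only needed when the two results disagree.

```python
def route_judgments(medium_result, industry_result, industry_is_correct=None):
    """Return (action, training_libraries) for a text judged on both paths.

    industry_is_correct is the administrator's verdict from step S5;
    None means the administrator has not been consulted yet.
    """
    if medium_result == industry_result:
        # S2: consistent results go to the user, who may label the text.
        return "send_to_user", []
    if industry_is_correct is None:
        # S3: inconsistent results are escalated to the back-end administrator.
        return "notify_admin", []
    if industry_is_correct:
        # S5: the industry judgment was right, so the medium judgment was wrong.
        return "train", ["industry_correct", "medium_error"]
    # S6: the industry judgment was wrong, so the medium judgment was right.
    return "train", ["industry_error", "medium_correct"]
```

For example, `route_judgments("positive", "negative", industry_is_correct=True)` files the text as a correct sample for the industry side and as an error sample for the medium side.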
Further, the specific steps by which the new deep learning or machine learning algorithm model is compared with the existing deep learning or machine learning algorithm model, so as to iteratively upgrade the model, are as follows:
a. Construct a new training and test sample, consisting of the user labeling data, the administrator labeling data, and training and test sample data extracted from new text content;
b. Train the existing deep learning or machine learning algorithm model with the new training and test sample to form a new deep learning or machine learning algorithm model capable of sentiment judgment, and add the new words and new meanings of old words identified during training to the corresponding medium feature dictionary or industry feature dictionary;
c. Verify whether the accuracy of the new deep learning or machine learning algorithm model in judging the sentiment orientation of the new training and test sample reaches 85%. If it reaches the standard, add the new model to the medium engine group or the industry engine group; if not, perform step d;
d. Abandon this iteration;
e. Check whether the accuracy of the new model that reached the standard is higher than the accuracy of the existing deep learning or machine learning algorithm model. If so, retain the new model and delete the existing model; if not, perform step f;
f. Retain the existing deep learning or machine learning algorithm model and delete the new model;
g. Repeat steps a, b, c, d, e, and f above.
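One pass through steps a to f can be sketched as a comparison between an existing model and a newly trained one. In the Python sketch below, models are plain callables from text to label, and `upgrade_step`, `toy_train`, and the sample data are hypothetical; the only value taken from the text is the 85% threshold. The sketch evaluates both models on the same sample set, as the steps describe, and does not add a held-out split that the source does not mention.

```python
def accuracy(model, samples):
    """Fraction of (text, label) samples that the model labels correctly."""
    if not samples:
        return 0.0
    return sum(model(text) == label for text, label in samples) / len(samples)


def upgrade_step(existing, train, samples, threshold=0.85):
    """One pass of steps a-f; returns the model kept in the engine group."""
    candidate = train(existing, samples)          # step b: train on the new sample
    candidate_accuracy = accuracy(candidate, samples)
    if candidate_accuracy < threshold:            # steps c-d: below 85%, abandon
        return existing
    if candidate_accuracy > accuracy(existing, samples):
        return candidate                          # step e: new model replaces old
    return existing                               # step f: old model is retained


def always_positive(text):
    """Toy existing model that labels every text as positive."""
    return "positive"


def toy_train(existing, samples):
    """Toy trainer: memorizes the texts that were labeled negative."""
    negative_texts = {text for text, label in samples if label == "negative"}
    return lambda text: "negative" if text in negative_texts else "positive"


samples = [("good", "positive"), ("bad", "negative"),
           ("great", "positive"), ("awful", "negative")]
kept = upgrade_step(always_positive, toy_train, samples)
```

Step g corresponds to calling `upgrade_step` again whenever a new batch of labeled samples is available, passing in the model returned by the previous call.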
Further, a single sentiment judgment path comprises one feature extraction module, one feature selection module, and one sentiment judgment algorithm model. The feature extraction module extracts features from the word segments that the word segmentation module produces by applying word segmentation to the original text; the extracted feature vectors, after selection and correction by the feature selection module, are passed to the sentiment judgment algorithm model for training, forming a new sentiment judgment algorithm model.
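A single path of this kind can be pictured as three functions applied in sequence. The Python sketch below is only an illustration under simple assumptions: bag-of-words counts stand in for feature extraction, dictionary filtering stands in for feature selection, and a polarity sum stands in for the sentiment judgment algorithm model. None of these specific techniques are named in the source, and `catering_dictionary` and all function names are hypothetical.

```python
from collections import Counter


def extract_features(tokens):
    """Feature extraction: bag-of-words counts over the segmented tokens."""
    return Counter(tokens)


def select_features(features, feature_dictionary):
    """Feature selection: keep only words present in the feature dictionary."""
    return {word: count for word, count in features.items()
            if word in feature_dictionary}


def judge(features, feature_dictionary):
    """Toy sentiment model: sum of dictionary polarities weighted by counts."""
    score = sum(feature_dictionary[word] * count
                for word, count in features.items())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def run_path(tokens, feature_dictionary):
    """One sentiment judgment path: extract, then select, then judge."""
    selected = select_features(extract_features(tokens), feature_dictionary)
    return judge(selected, feature_dictionary)


# Hypothetical catering feature dictionary mapping words to polarities.
catering_dictionary = {"delicious": 1, "slow": -1}
label = run_path(["the", "food", "was", "delicious", "delicious", "but", "slow"],
                 catering_dictionary)
```

Swapping any one of the three functions for another implementation yields a different path, which is what makes the paths comparable by accuracy.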
Further, during operation of the system, the rule learning engine group also uses the judgment result data sent by the medium sentiment orientation judgment module and the industry sentiment orientation judgment module to identify new words and new meanings of old words, and adds them to the corresponding medium feature dictionary or industry feature dictionary.
Further, the machine learning algorithm models include decision tree algorithm models, regression algorithm models, clustering algorithm models, and artificial neural network algorithm models.
According to the above technical solution, a medium categorization module and an industry classification module are provided. On the one hand, different medium feature dictionaries are constructed for different medium types such as comments, news, blogs, microblogs, and WeChat; on the other hand, different industry feature dictionaries are constructed for different industry fields such as catering, electronics, clothing, automobiles, and communications. Subsequent processing then selects, according to the medium type and industry field, the appropriate feature dictionary, feature extraction method, feature vector representation, algorithm model, and engine, so as to obtain a more accurate sentiment orientation judgment. In the present invention, adapting to the industry and adapting to the medium are two mutually independent sentiment judgment paths. A text necessarily belongs to some medium and some industry, and these two paths allow feature extraction and sentiment orientation recognition to draw fully on the medium and industry characteristics of the text. In addition, the present invention introduces a "rule learning engine group", which performs the following functions: 1) during system operation, it counts the accuracy of each sentiment judgment path in judging text sentiment orientation and finds the most suitable sentiment judgment path for texts of each medium type and industry field; 2) during system operation, it trains algorithm models online based on known judgment result data and adds the new algorithm models to the engine group to compete with the other algorithm models. Autonomous learning in the present invention is mainly realized in the rule learning engine group, at two levels:
1) Autonomous optimal selection: by counting the judgment result data of each sentiment judgment path, the sentiment judgment path with the highest prediction accuracy is selected;
2) Self-iterating upgrade: by collecting error-correction information from users and administrators, the system periodically trains the old models, generates new models, tests the new models, and upgrades by replacing the old models.
Compared with the prior art, the present invention has the following advantages:
1. Sentiment judgment is based on medium classification and industry classification;
2. It overcomes the large amount of noise that internet text introduces across space and time;
3. Through good user interaction, suitable training and test samples are obtained from the massive data of the internet;
4. Optimal sentiment judgment path selection and continuous autonomous iterative upgrading ensure a high accuracy of sentiment judgment.
Brief description of the drawings
The invention is described in further detail below through specific embodiments with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of the overall framework of the invention;
Fig. 2 is a schematic diagram of one part of the framework of the invention;
Fig. 3 is a schematic diagram of another part of the framework of the invention;
Fig. 4 is a flow chart of the method used by the rule learning engine group when the same vocabulary to be analyzed is sent to both the medium categorization module and the industry classification module;
Fig. 5 is a flow chart of the method by which the deep learning or machine learning algorithm model is iteratively upgraded;
Fig. 6 is a schematic diagram of how a sentiment judgment path is generated and used;
Wherein: 1, user end; 2, back end; 3, text sentiment judgment system; 4, medium categorization module; 5, industry classification module; 6, medium engine group; 61, medium deep learning engine group; 62, medium machine learning engine group; 7, industry engine group; 71, industry deep learning engine group; 72, industry machine learning engine group; 8, rule learning engine group; 9, medium feature dictionary; 10, medium feature extraction module group; 11, medium feature selection module group; 12, medium sentiment orientation judgment module; 13, industry feature dictionary; 14, industry feature extraction module group; 15, industry feature selection module group; 16, industry sentiment orientation judgment module; 17, sentiment judgment path.
Specific embodiments
To make the purposes, technical solutions, and advantages of the present embodiment clearer, the technical solutions in the present embodiment are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
In addition, the terms "first", "second", "M-th", "X-th", and the like are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first", "second", "M-th", "X-th", and the like may explicitly or implicitly include one or more of such features.
In the present invention, unless otherwise expressly specified and limited, terms such as "installation", "connection", and "fixation" should be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be internal communication between two elements or an interaction between two elements. Those of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.
Embodiment
As shown in Fig. 1, a text sentiment analysis system with autonomous upgrading and noise resistance comprises a user end 1, a back end 2, and a text sentiment judgment system 3; the text sentiment judgment system 3 comprises a medium categorization module 4, an industry classification module 5, a medium engine group 6, an industry engine group 7, and a rule learning engine group 8.
The main classification module 3 obtains the text content to be analyzed for sentiment and, according to whether it originates from a medium and whether it belongs to an industry, sends it to the medium categorization module 4 and/or the industry classification module 5: if the text content only originates from a medium, it is sent to the medium categorization module 4; if it only belongs to an industry, it is sent to the industry classification module 5; if it both originates from a medium and belongs to an industry, it is sent to both the medium categorization module 4 and the industry classification module 5.
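The three routing cases can be summarized in a short Python sketch. The function `route_text` and the module name strings are hypothetical; how the system actually detects a medium source or an industry attribution is not described at this point in the text, so both are passed in as booleans.

```python
def route_text(from_medium, has_industry):
    """Return the modules that should receive a text, per the three cases above."""
    targets = []
    if from_medium:
        targets.append("medium_categorization_module")
    if has_industry:
        targets.append("industry_classification_module")
    return targets
```

A text with both properties yields both targets, which is the case in which the two sentiment orientation judgments can later be compared against each other.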
As shown in Fig. 2, the medium categorization module 4 obtains the text content to be analyzed for sentiment and judges whether it originates from a medium. If it does, the module sends the text content to the medium feature dictionary 9 of the corresponding medium type; otherwise the text content is not sent to the medium feature dictionary 9. The medium types include comments, news, blogs, WeChat, and microblogs, and the medium feature dictionary 9 correspondingly includes a comment feature dictionary, a news feature dictionary, a blog feature dictionary, a WeChat feature dictionary, and a microblog feature dictionary.
As shown in Fig. 2, the medium feature dictionary 9 receives the text content of the corresponding medium type and generates, through a word segmentation module (not shown), the vocabulary to be analyzed for sentiment; the word segmentation module and its word segmentation technique are prior art and are not described in detail here. The generated vocabulary is sent to the M medium feature extraction modules in the medium feature extraction module group 10, namely the first, second, and third through M-th medium feature extraction modules, where M is an integer. Each medium feature extraction module sends the feature vector it extracts to the N medium feature selection modules in the medium feature selection module group 11, namely the first and second through N-th medium feature selection modules, where N is an integer. Each medium feature selection module sends the feature vector it selects to the medium engine group 6. In the present invention, the first through M-th medium feature extraction modules and the first through X-th industry feature extraction modules mentioned below are named differently only to distinguish them; in essence they are all feature extraction modules, using the same or different feature extraction techniques, which are prior art and are not described in detail here. Similarly, the first through N-th medium feature selection modules and the first through Y-th industry feature selection modules are in essence all feature selection modules, using the same or different feature selection techniques, which are prior art.
As shown in Fig. 2, the medium engine group 6 comprises a medium deep learning engine group 61 and a medium machine learning engine group 62. The medium deep learning engine group 61 comprises Q medium deep learning engines that perform sentiment judgment based on deep learning algorithm models, where Q is an integer; the medium machine learning engine group 62 comprises S medium machine learning engines that perform sentiment judgment based on machine learning algorithm models, where S is an integer. Using the corresponding algorithm models, the medium engine group 6 computes on the feature vectors received from the medium feature selection modules to obtain sentiment analysis result data for each vocabulary item to be analyzed, and sends the computed sentiment analysis result data to the medium sentiment orientation judgment module 12. The machine learning algorithm models include decision tree algorithm models, regression algorithm models, clustering algorithm models, and artificial neural network algorithm models.
It must be noted that, in the present invention, the medium deep learning engine group 61 and the industry deep learning engine group 71, the medium machine learning engine group 62 and the industry machine learning engine group 72, the medium deep learning engines and the industry deep learning engines, and the medium machine learning engines and the industry machine learning engines are named differently only to distinguish them; in essence they are all deep learning engines and machine learning engines. The medium deep learning engine group 61 and the industry deep learning engine group 71, and the medium machine learning engine group 62 and the industry machine learning engine group 72, may use deep learning engines and machine learning engines of the same or different techniques. A deep learning engine group is a combination of multiple deep learning engines that perform sentiment judgment based on deep learning algorithm models; because there are multiple deep learning algorithm models, the combination contains different deep learning engines. A machine learning engine group is a combination of multiple machine learning engines that perform sentiment judgment based on machine learning algorithm models; because there are multiple machine learning algorithm models, the combination contains different machine learning engines. Every sentiment judgment path contains exactly one deep learning engine or machine learning engine.
The medium sentiment orientation judgment module 12 judges whether each item of sentiment analysis result data is correct and sends the judgment result data to the rule learning engine group 8.
As shown in figure 3, the trade classification module 5 obtains the content of text to sentiment analysis, judges that it whether there is and returnThe industry field of category if content of text is the industry field in the presence of ownership, then sends it to the industry of corresponding industry fieldFeature lexicon 13, it is on the contrary then be not sent to industrial characteristic dictionary 13;The industry field include food and drink, electronics, automobile, communication,Clothes, the industrial characteristic dictionary 13 accordingly include catering field feature lexicon, electronic field feature lexicon, automotive field spyLevy dictionary, communications field feature lexicon, garment industry feature lexicon.
As shown in figure 3, the industrial characteristic dictionary 13 receives the text content to be analyzed for the corresponding industry field, generates the vocabulary to be analyzed by means of a word segmentation module, and sends the generated vocabulary to the X industrial characteristic extraction modules in the industrial characteristic extraction module group 14. The industrial characteristic extraction modules comprise a first, a second, a third, and so on up to an X-th industrial characteristic extraction module, where X is an integer. Each industrial characteristic extraction module sends the feature vectors it extracts to the Y industrial characteristic selecting modules in the industrial characteristic selecting module group 15. The industrial characteristic selecting modules comprise a first, a second, and so on up to a Y-th industrial characteristic selecting module, where Y is an integer. Each industrial characteristic selecting module sends the feature vectors it selects to the industry engine group 7.
As shown in figure 3, the industry engine group 7 includes an industry deep learning engine group 71 and an industry machine learning engine group 72. The industry deep learning engine group 71 includes U industry deep learning engines that perform emotion judgment based on deep learning algorithm models, where U is an integer; the industry machine learning engine group 72 includes V industry machine learning engines that perform emotion judgment based on machine learning algorithm models, where V is an integer. Based on the corresponding algorithm model, the industry engine group 7 performs calculations on the feature vectors received from the industrial characteristic selecting modules to obtain the sentiment analysis result data of each vocabulary item to be analyzed, and sends the calculated sentiment analysis result data to the industry sentiment orientation judgment module 16.
The industry sentiment orientation judgment module 16 judges whether each item of sentiment analysis result data is correct and sends the judgment result data to the rule learning engine group 8.
The rule learning engine group 8, according to the judgment result data received from the medium sentiment orientation judgment module 12 and the industry sentiment orientation judgment module 16, counts the accuracy with which each of the M*N*(Q+S) and/or X*Y*(U+V) emotion judgment paths 17 judges the sentiment orientation of the text content to be analyzed, and matches texts of different media types and industry fields with the emotion judgment path 17 of highest accuracy. The system shown in figure 6 illustrates how a single emotion judgment path is generated and used: a single path contains one feature extraction module, one feature selection module, and one emotion judgment algorithm model. The feature extraction module performs feature extraction on the segmented words that the word segmentation module produces from the original text; the extracted feature vectors, after being selected and corrected by the feature selection module, are passed to the emotion judgment algorithm model for training, forming a new emotion judgment algorithm model. Figure 6 uses a regression algorithm model as its example.
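The path count and the selection of the most accurate path can be sketched as follows. This is only an illustration under assumptions: the function names enumerate_paths and best_path, and the representation of a judgment record as a (path, is_correct) pair, are hypothetical and do not come from the description.

```python
from itertools import product


def enumerate_paths(extractors, selectors, dl_engines, ml_engines):
    """Return every (extractor, selector, engine) combination.

    With X extractors, Y selectors, U deep learning engines and
    V machine learning engines this yields X * Y * (U + V) paths.
    """
    engines = list(dl_engines) + list(ml_engines)
    return list(product(extractors, selectors, engines))


def best_path(judgements):
    """Pick the path with the highest observed accuracy.

    judgements is an iterable of (path, is_correct) pairs, one per
    judged text; the result is (path, accuracy).
    """
    totals = {}
    for path, is_correct in judgements:
        hits, seen = totals.get(path, (0, 0))
        totals[path] = (hits + int(is_correct), seen + 1)
    accuracy = {p: hits / seen for p, (hits, seen) in totals.items()}
    winner = max(accuracy, key=accuracy.get)
    return winner, accuracy[winner]
```

For example, two extractors, three selectors, one deep learning engine and two machine learning engines give 2 * 3 * (1 + 2) = 18 candidate paths.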
The rule learning engine group 8, according to known judgment result data, trains the existing deep learning algorithm model or machine learning algorithm model online to form a new deep learning algorithm model or machine learning algorithm model, adds the new model to the medium engine group 6 or industry engine group 7, and compares it with the existing deep learning algorithm model or machine learning algorithm model, thereby iteratively upgrading the deep learning algorithm model or machine learning algorithm model. Taking the decision tree algorithm model as an example, the online training process of an algorithm model is as follows:
1) Collect all manually marked sample data, including user marking data and administrator marking data, and split it into a training set and a test set at a ratio of 2:1;
2) Calculate the information entropy of feature word A in training set D according to the following formula:
P(X = A) = p_i, i = 1, 2, 3, ..., n
where p_i is the probability when the feature word is A;
3) Calculate the information gain G(D, A) of feature word A with respect to training set D according to the following formula:
G(D, A) = H(D) - H(D | A)
where H(D) is the empirical entropy of training set D, and H(D | A) is the empirical conditional entropy of training set D given feature word A;
4) Based on training set D, a preset threshold array E (an array in which each element is a threshold e), and the information gain G(D, A) calculated above, generate a new decision tree algorithm model using a decision tree algorithm such as ID3 or CART;
5) Test the newly generated decision tree with the test set; when its accuracy is higher than 85%, deploy it to the machine learning engine group;
6) In the production environment, let the newly generated decision tree algorithm model compete with the old decision tree algorithm model so that the better one survives and the worse one is eliminated; the decision tree algorithm model in turn competes with other algorithm models, such as random forest, to gain more say in judging text sentiment.
This entire process is completed online. As the sample data grows, each algorithm model can be expected to fit the real production environment more and more closely, and its judgment accuracy to rise accordingly.
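Steps 1) to 3) can be illustrated with a short sketch. The entropy formula itself is not reproduced in this text, so the sketch assumes the standard Shannon entropy H = -sum(p_i * log2(p_i)) with base-2 logarithms; the sample representation (a set of words paired with a label) and the function names are likewise assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Empirical Shannon entropy H(D) of a list of class labels (base 2)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(samples, feature_word):
    """G(D, A) = H(D) - H(D | A) for the presence or absence of one word.

    samples is a list of (set_of_words, label) pairs.
    """
    labels = [label for _, label in samples]
    with_word = [label for words, label in samples if feature_word in words]
    without_word = [label for words, label in samples if feature_word not in words]
    conditional = sum(
        len(part) / len(samples) * entropy(part)
        for part in (with_word, without_word)
        if part  # skip an empty partition
    )
    return entropy(labels) - conditional


def split_two_to_one(samples):
    """Split samples 2:1 into (training set, test set), keeping their order."""
    cut = len(samples) * 2 // 3
    return samples[:cut], samples[cut:]
```

A word that separates the labels perfectly has a gain equal to H(D), while a word spread evenly across the labels has a gain of zero. A real run would probably shuffle the samples before splitting; the text does not say.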
As shown in figure 5, the new deep learning algorithm model or machine learning algorithm model is compared with the existing deep learning algorithm model or machine learning algorithm model, and the specific steps for iteratively upgrading the deep learning algorithm model or machine learning algorithm model are as follows:
a. Build a new training test sample, which consists of user marking data, administrator marking data, and training test sample data extracted from new text content;
b. Train the existing deep learning algorithm model or machine learning algorithm model with the new training test sample to form a new deep learning algorithm model or machine learning algorithm model capable of emotion judgment, and at the same time add the new words and new meanings of old words identified during training to the corresponding media characteristics dictionary 9 or industrial characteristic dictionary 13;
c. Verify whether the accuracy with which the new deep learning algorithm model or machine learning algorithm model judges the sentiment orientation of the new training test sample reaches 85%. If it reaches the standard, add the new deep learning algorithm model or machine learning algorithm model to the medium engine group 6 or industry engine group 7; if it does not, go to step d;
d. Abandon this iteration;
e. Determine whether the accuracy of the new deep learning algorithm model or machine learning algorithm model that reached the standard is higher than the accuracy of the existing deep learning algorithm model or machine learning algorithm model. If so, keep the new deep learning algorithm model or machine learning algorithm model and delete the existing one; if not, go to step f;
f. Keep the existing deep learning algorithm model or machine learning algorithm model and delete the new one;
g. Repeat steps a, b, c, d, e, and f.
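One pass through the verification and comparison steps c to f can be sketched as below. Models are represented here as plain callables that map a text to a label, which is an assumption made for brevity; the names accuracy and upgrade are hypothetical, and only the 85% threshold comes from the description.

```python
def accuracy(model, samples):
    """Fraction of (text, label) samples that the model labels correctly."""
    correct = sum(1 for text, label in samples if model(text) == label)
    return correct / len(samples)


def upgrade(current, candidate, samples, threshold=0.85):
    """Return the model that stays deployed after one iteration."""
    candidate_acc = accuracy(candidate, samples)
    if candidate_acc < threshold:
        return current  # steps c and d: standard not reached, abandon
    if candidate_acc > accuracy(current, samples):
        return candidate  # step e: the new model replaces the existing one
    return current  # step f: keep the existing model, drop the new one
```

Because the candidate must both reach the threshold and beat the deployed model, the accuracy measured on the given samples never decreases from one iteration to the next.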
In the present invention, the iterative upgrading of old and new algorithm models takes place in the production environment through a survival-of-the-fittest mechanism. This ensures that the corresponding emotion judgment path only moves forward; in other words, it ensures that the sentiment analysis system becomes more and more intelligent and its emotion judgments more and more accurate.
The massive data of the Internet provides the training and test sample data that the present invention requires. We implemented the invention on Spark, which allows us to apply the above approach to massive data on a parallel computing cluster and to effectively resist the equally massive noise on the Internet, yielding a sentiment analysis system capable of evolving online by itself.
As shown in figure 4, when the same vocabulary to be analyzed in the text content is sent to both the medium classification module 4 and the trade classification module 5 and is judged by both the medium sentiment orientation judgment module 12 and the industry sentiment orientation judgment module 16, the rule learning engine group 8 uses the following steps:
S1. The rule learning engine group 8 obtains the judgment result data of the medium sentiment orientation judgment module 12 and of the industry sentiment orientation judgment module 16;
S2. Judge whether the two sets of judgment result data are consistent. If they are, the judgment result data is sent to the user of user terminal 1, who can mark the text online and form user marking data based on the marked text; if they are not, go to step S3;
S3. The rule learning engine group 8 notifies the administrator of background end 2;
S4. The administrator marks the text online and forms administrator marking data based on the marked text;
S5. The administrator judges whether the industry to which the text content is attributed is correct. If it is correct, the text is put into the industry-class correct text training library and the medium-class error text training library; if it is not, go to step S6;
S6. The text is put into the industry-class error text training library and the medium-class correct text training library.
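The branching in steps S2 to S6 can be sketched as follows. The TrainingLibraries container, the reconcile function, and the admin_confirms_industry callback are hypothetical names introduced for illustration; the administrator's decision is modeled as a callable because the text describes it as a manual judgment.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingLibraries:
    """The four text training libraries named in steps S5 and S6."""
    industry_correct: list = field(default_factory=list)
    industry_error: list = field(default_factory=list)
    medium_correct: list = field(default_factory=list)
    medium_error: list = field(default_factory=list)


def reconcile(text, medium_result, industry_result, libraries, admin_confirms_industry):
    """Route one text according to steps S2 to S6 and return who marks it."""
    if medium_result == industry_result:
        return "user"  # S2: consistent results go to the user for marking
    # S3 and S4: the administrator is notified and marks the text.
    if admin_confirms_industry(text):
        # S5: the industry attribution is correct.
        libraries.industry_correct.append(text)
        libraries.medium_error.append(text)
    else:
        # S6: the industry attribution is wrong.
        libraries.industry_error.append(text)
        libraries.medium_correct.append(text)
    return "administrator"
```

Each disputed text therefore lands in exactly two libraries: one "correct" library for one classification and one "error" library for the other.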
In addition, during operation of the system, the rule learning engine group 8 also uses the judgment result data sent by the medium sentiment orientation judgment module 12 and the industry sentiment orientation judgment module 16 to identify new words and new meanings of old words, and adds them to the corresponding media characteristics dictionary 9 or industrial characteristic dictionary 13.
The specific examples above are used only to help understand the present invention and are not intended to limit it. Those of ordinary skill in the art may also make several simple deductions, variations, or substitutions based on the idea of the present invention.

Claims (6)

The media characteristics dictionary receives the text content to be analyzed for the corresponding media type, generates the vocabulary to be analyzed by means of a word segmentation module, and sends the generated vocabulary to the M media characteristics extraction modules in the media characteristics extraction module group, the media characteristics extraction modules comprising a first, a second, a third, and so on up to an M-th media characteristics extraction module, M being an integer; each media characteristics extraction module sends the feature vectors it extracts to the N media characteristics selecting modules in the media characteristics selecting module group, the media characteristics selecting modules comprising a first, a second, and so on up to an N-th media characteristics selecting module, N being an integer; each media characteristics selecting module sends the feature vectors it selects to the medium engine group;
The medium engine group includes a medium deep learning engine group and a medium machine learning engine group; the medium deep learning engine group includes Q medium deep learning engines that perform emotion judgment based on deep learning algorithm models, Q being an integer; the medium machine learning engine group includes S medium machine learning engines that perform emotion judgment based on machine learning algorithm models, S being an integer; based on the corresponding algorithm model, the medium engine group performs calculations on the feature vectors received from the media characteristics selecting modules to obtain the sentiment analysis result data of each vocabulary item to be analyzed, and sends the calculated sentiment analysis result data to the medium sentiment orientation judgment module;
The industrial characteristic dictionary receives the text content to be analyzed for the corresponding industry field, generates the vocabulary to be analyzed by means of a word segmentation module, and sends the generated vocabulary to the X industrial characteristic extraction modules in the industrial characteristic extraction module group, the industrial characteristic extraction modules comprising a first, a second, a third, and so on up to an X-th industrial characteristic extraction module, X being an integer; each industrial characteristic extraction module sends the feature vectors it extracts to the Y industrial characteristic selecting modules in the industrial characteristic selecting module group, the industrial characteristic selecting modules comprising a first, a second, and so on up to a Y-th industrial characteristic selecting module, Y being an integer; each industrial characteristic selecting module sends the feature vectors it selects to the industry engine group;
The industry engine group includes an industry deep learning engine group and an industry machine learning engine group; the industry deep learning engine group includes U industry deep learning engines that perform emotion judgment based on deep learning algorithm models, U being an integer; the industry machine learning engine group includes V industry machine learning engines that perform emotion judgment based on machine learning algorithm models, V being an integer; based on the corresponding algorithm model, the industry engine group performs calculations on the feature vectors received from the industrial characteristic selecting modules to obtain the sentiment analysis result data of each vocabulary item to be analyzed, and sends the calculated sentiment analysis result data to the industry sentiment orientation judgment module;
The rule learning engine group, according to the judgment result data received from the medium sentiment orientation judgment module and the industry sentiment orientation judgment module, counts the accuracy with which each of the M*N*(Q+S) and/or X*Y*(U+V) emotion judgment paths judges the sentiment orientation of the text content to be analyzed, and matches texts of different media types and industry fields with the emotion judgment path of highest accuracy; at the same time, the rule learning engine group, according to known judgment result data, trains the existing deep learning algorithm model or machine learning algorithm model online to form a new deep learning algorithm model or machine learning algorithm model, adds the new deep learning algorithm model or machine learning algorithm model to the medium engine group or industry engine group, and compares it with the existing deep learning algorithm model or machine learning algorithm model, thereby iteratively upgrading the deep learning algorithm model or machine learning algorithm model.
CN201810930606.3A | Priority date: 2018-08-15 | Filing date: 2018-08-15 | Text emotion analysis system capable of achieving automatic upgrading and resisting noise | Active | CN109165298B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201810930606.3A, CN109165298B (en) | 2018-08-15 | 2018-08-15 | Text emotion analysis system capable of achieving automatic upgrading and resisting noise

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201810930606.3A, CN109165298B (en) | 2018-08-15 | 2018-08-15 | Text emotion analysis system capable of achieving automatic upgrading and resisting noise

Publications (2)

Publication Number | Publication Date
CN109165298A (en) | 2019-01-08
CN109165298B (en) | 2022-11-15

Family

ID=64895868

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201810930606.3A (Active), CN109165298B (en) | Text emotion analysis system capable of achieving automatic upgrading and resisting noise | 2018-08-15 | 2018-08-15

Country Status (1)

Country | Link
CN (1) | CN109165298B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20110208522A1 (en) * | 2010-02-21 | 2011-08-25 | Nice Systems Ltd. | Method and apparatus for detection of sentiment in automated transcriptions
CN103123620A (en) * | 2012-12-11 | 2013-05-29 | 中国互联网新闻中心 | Web text sentiment analysis method based on propositional logic
US20140337257A1 (en) * | 2013-05-09 | 2014-11-13 | Metavana, Inc. | Hybrid human machine learning system and method
US20150199609A1 (en) * | 2013-12-20 | 2015-07-16 | Xurmo Technologies Pvt. Ltd | Self-learning system for determining the sentiment conveyed by an input text
CN104281694A (en) * | 2014-10-13 | 2015-01-14 | 安徽华贞信息科技有限公司 | Analysis system of emotional tendency of text
US20160117591A1 (en) * | 2014-10-23 | 2016-04-28 | Fair Isaac Corporation | Dynamic Business Rule Creation Using Scored Sentiments
CN105335352A (en) * | 2015-11-30 | 2016-02-17 | 武汉大学 | Entity identification method based on Weibo emotion
CN107291696A (en) * | 2017-06-28 | 2017-10-24 | 达而观信息科技(上海)有限公司 | A kind of comment word sentiment analysis method and system based on deep learning
CN107945033A (en) * | 2017-11-14 | 2018-04-20 | 李勇 | A kind of analysis method of network public-opinion, system and relevant apparatus

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
叶强等: "面向互联网评论情感分析的中文主观性自动判别方法研究", 《信息系统学报》*
左荣欣: "一种分层多算法集成的微博情感分类方法", 《电子世界》*
汪淳等: "基于网络舆情倾向性分析的机器学习方法研究", 《智能计算机与应用》*

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2021056127A1 (en) * | 2019-09-23 | 2021-04-01 | Beijing Didi Infinity Technology And Development Co., Ltd. | Systems and methods for analyzing sentiment
CN110888983A (en) * | 2019-11-26 | 2020-03-17 | 厦门市美亚柏科信息股份有限公司 | Positive and negative emotion analysis method, terminal device and storage medium
CN110888983B (en) * | 2019-11-26 | 2022-07-15 | 厦门市美亚柏科信息股份有限公司 | Positive and negative emotion analysis method, terminal equipment and storage medium

Also Published As

Publication number | Publication date
CN109165298B (en) | 2022-11-15


Legal Events

Code | Title | Description
PB01 | Publication | Publication
SE01 | Entry into force of request for substantive examination | Entry into force of request for substantive examination
TA01 | Transfer of patent application right | Effective date of registration: 2022-09-27. Address after: 201100 5th and 6th floor, 380 Xinsong Road, Minhang District, Shanghai. Applicant after: Shanghai WuJie Data Technology Co.,Ltd. Address before: Room 1449, No. 4999, Zhongchun Road, Minhang District, Shanghai, 201100. Applicant before: SHANGHAI WENJUN INFORMATION TECHNOLOGY Co.,Ltd.
GR01 | Patent grant | Patent grant
