A text sentiment analysis system with autonomous upgrading and noise resistance
Technical field
The present invention relates to the technical field of text sentiment analysis, and in particular to a text sentiment analysis system with autonomous upgrading and noise resistance.
Background art
Accurately analyzing and judging customer sentiment is a goal that merchants pursue diligently. As the volume of text data on the internet grows massively, analyzing it by hand is no longer feasible. Machine learning methods have therefore been introduced one after another to perform sentiment analysis on these texts, long or short, and to extract the information they express, in the hope of judging and grasping users' sentiment accurately.
At present, many such technologies exist: some are semantics-based, others statistics-based; some are supervised, some unsupervised, and some semi-supervised; some are based on traditional SVM or random forest algorithms, others on deep learning; some specialize in short texts, others in long texts. Judging from what has been publicly disclosed, however, the performance of these technologies is not satisfactory. For example, in our tests of Baidu's open short-text sentiment analysis engine, the accuracy was only about 75%. In other words, the technologies currently used to recognize text sentiment by machine judge the sentiment of internet text with an accuracy that is still far from that of manual judgment, below even 80%, and much lower than the accuracy of machine AI technology in the field of video recognition.
On analysis, the main factors that currently limit the quality of text sentiment analysis are:
1. Existing word segmentation techniques can introduce vocabulary that is irrelevant to the article or even ambiguous, yet vocabulary is the basis of all machine learning algorithms, because it is the source from which article features are extracted;
2. The same word often carries different sentiment meanings in articles of different types and in articles from different fields;
3. The internet is synonymous with change: new words keep emerging, and a single word, in similar contexts, takes on different meanings as time passes;
4. Although the algorithms are of the machine learning type, their models are usually trained manually before going online in the production environment, and during operation they cannot automatically learn and adapt to the complex internet environment described above.
In short, the internet introduces too much interference, and although the machine learning algorithms currently in use can predict article sentiment (accurately or not), they lack the ability to adapt and learn autonomously, that is, they lack a noise-resistance mechanism. This is why current machine-learning-based text sentiment judgment is not very accurate.
Summary of the invention
In order to solve at least one of the above problems of the prior art, the present application provides a text sentiment analysis system that has self-learning ability, adapts to its environment autonomously, and has strong anti-interference ability, improving accuracy while maintaining efficiency.
To achieve the above technical effect, the specific technical solution of the present invention is as follows:
A text sentiment analysis system with autonomous upgrading and noise resistance comprises a user terminal, a back end, and a text sentiment judgment system; the text sentiment judgment system comprises a media classification module, an industry classification module, a media engine group, an industry engine group, and a rule learning engine group;
The media classification module obtains the text content to be analyzed for sentiment and judges whether it originates from a medium; if so, it sends the text content to the media feature dictionary of the corresponding media type, and otherwise does not send it to any media feature dictionary; the media types include comments, news, blogs, WeChat, and microblogs, and the media feature dictionaries correspondingly include a comment feature dictionary, a news feature dictionary, a blog feature dictionary, a WeChat feature dictionary, and a microblog feature dictionary;
The media feature dictionary receives the text content to be analyzed of the corresponding media type, generates the vocabulary to be analyzed through a word segmentation module, and sends the generated vocabulary to the M media feature extraction modules in the media feature extraction module group, namely the first, second, and third through M-th media feature extraction modules, M being an integer; each media feature extraction module sends the feature vectors it extracts to the N media feature selection modules in the media feature selection module group, namely the first and second through N-th media feature selection modules, N being an integer; each media feature selection module sends the feature vectors it selects to the media engine group;
The media engine group comprises a media deep learning engine group and a media machine learning engine group; the media deep learning engine group comprises Q media deep learning engines that perform sentiment judgment based on deep learning algorithm models, Q being an integer; the media machine learning engine group comprises S media machine learning engines that perform sentiment judgment based on machine learning algorithm models, S being an integer; based on the corresponding algorithm models, the media engine group processes the feature vectors received from the media feature selection modules, computes sentiment analysis result data for each word to be analyzed, and sends the computed sentiment analysis result data to the media sentiment tendency judgment module;
The media sentiment tendency judgment module judges whether each item of sentiment analysis result data is correct and sends the judgment result data to the rule learning engine group;
The industry classification module obtains the text content to be analyzed for sentiment and judges whether it belongs to an industry field; if so, it sends the text content to the industry feature dictionary of the corresponding industry field, and otherwise does not send it to any industry feature dictionary; the industry fields include catering, electronics, automobiles, communications, and clothing, and the industry feature dictionaries correspondingly include a catering feature dictionary, an electronics feature dictionary, an automobile feature dictionary, a communications feature dictionary, and a clothing feature dictionary;
The industry feature dictionary receives the text content to be analyzed of the corresponding industry field, generates the vocabulary to be analyzed through a word segmentation module, and sends the generated vocabulary to the X industry feature extraction modules in the industry feature extraction module group, namely the first, second, and third through X-th industry feature extraction modules, X being an integer; each industry feature extraction module sends the feature vectors it extracts to the Y industry feature selection modules in the industry feature selection module group, namely the first and second through Y-th industry feature selection modules, Y being an integer; each industry feature selection module sends the feature vectors it selects to the industry engine group;
The industry engine group comprises an industry deep learning engine group and an industry machine learning engine group; the industry deep learning engine group comprises U industry deep learning engines that perform sentiment judgment based on deep learning algorithm models, U being an integer; the industry machine learning engine group comprises V industry machine learning engines that perform sentiment judgment based on machine learning algorithm models, V being an integer; based on the corresponding algorithm models, the industry engine group processes the feature vectors received from the industry feature selection modules, computes sentiment analysis result data for each word to be analyzed, and sends the computed sentiment analysis result data to the industry sentiment tendency judgment module;
The industry sentiment tendency judgment module judges whether each item of sentiment analysis result data is correct and sends the judgment result data to the rule learning engine group;
Based on the judgment result data received from the media sentiment tendency judgment module and the industry sentiment tendency judgment module, the rule learning engine group counts the accuracy with which each of the M*N*(Q+S) and/or X*Y*(U+V) sentiment judgment paths judges the sentiment tendency of the text content to be analyzed, and matches texts of different media types and industry fields with the sentiment judgment path of highest accuracy; at the same time, the rule learning engine group uses the known judgment result data to train the existing deep learning algorithm models or machine learning algorithm models online, forming new deep learning algorithm models or machine learning algorithm models, adds the new models to the media engine group or industry engine group, and compares them against the existing deep learning algorithm models or machine learning algorithm models, thereby realizing the iterative upgrading of the deep learning algorithm models or machine learning algorithm models.
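The path counting and path matching described above can be illustrated with a minimal sketch. It assumes that each sentiment judgment path is one (extraction module, selection module, engine) triple and that every judgment arrives as a (category, path, is_correct) record; the function names and the record layout are hypothetical and are not taken from the specification.

```python
from collections import defaultdict
from itertools import product


def enumerate_paths(extractors, selectors, engines):
    """Every (extractor, selector, engine) triple is one sentiment judgment path."""
    return list(product(extractors, selectors, engines))


def best_path_per_category(judgments):
    """Pick the most accurate path for each media type or industry field.

    judgments: iterable of (category, path, is_correct) records (assumed layout).
    Returns {category: (path, accuracy)}.
    """
    counts = defaultdict(lambda: [0, 0])  # (category, path) -> [correct, total]
    for category, path, is_correct in judgments:
        entry = counts[(category, path)]
        entry[0] += int(is_correct)
        entry[1] += 1

    best = {}
    for (category, path), (correct, total) in counts.items():
        accuracy = correct / total
        if category not in best or accuracy > best[category][1]:
            best[category] = (path, accuracy)
    return best
```

With M = 2 extraction modules, N = 2 selection modules, Q = 1 deep learning engine and S = 2 machine learning engines, `enumerate_paths` yields 2*2*(1+2) = 12 paths, in line with the M*N*(Q+S) count.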
Further, when the same vocabulary to be analyzed in the text content is sent to both the media classification module and the industry classification module and is judged by both the media sentiment tendency judgment module and the industry sentiment tendency judgment module, the rule learning engine group performs the following steps:
S1. The rule learning engine group obtains the judgment result data of the media sentiment tendency judgment module and of the industry sentiment tendency judgment module;
S2. It judges whether the two sets of judgment result data are consistent; if they are consistent, the judgment result data are sent to the user at the user terminal, who can mark the text online, the marked text forming user marking data; if they are inconsistent, step S3 is carried out;
S3. The rule learning engine group notifies the administrator at the back end;
S4. The administrator marks the text online, the marked text forming administrator marking data;
S5. The administrator judges whether the industry attribution of the text content is correct; if it is correct, the text is put into the industry correct-text training library and the media error-text training library; if it is incorrect, step S6 is carried out;
S6. The text is put into the industry error-text training library and the media correct-text training library.
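Steps S1 to S6 can be sketched as a small routing function. This is only an illustration under stated assumptions: the `TrainingLibraries` container, the `reconcile` function, and the returned action strings are hypothetical names, and the administrator's decision in step S5 is modeled as an optional boolean argument.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingLibraries:
    """The four training libraries named in steps S5 and S6."""
    industry_correct: list = field(default_factory=list)
    industry_error: list = field(default_factory=list)
    media_correct: list = field(default_factory=list)
    media_error: list = field(default_factory=list)


def reconcile(text, media_result, industry_result, libraries, industry_is_correct=None):
    """Return the next action for a text judged on both paths.

    industry_is_correct stands in for the administrator's decision in step S5;
    None means the administrator has not reviewed the text yet (assumption).
    """
    if media_result == industry_result:   # S2: results agree
        return "send_to_user"
    if industry_is_correct is None:       # S3: results disagree, escalate
        return "notify_administrator"
    if industry_is_correct:               # S5: industry side was right
        libraries.industry_correct.append(text)
        libraries.media_error.append(text)
    else:                                 # S6: industry side was wrong
        libraries.industry_error.append(text)
        libraries.media_correct.append(text)
    return "filed_for_training"
```

Keeping the four libraries separate means each disagreement produces one positive and one negative training example, one for each of the two paths.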
Further, the specific steps by which a new deep learning algorithm model or machine learning algorithm model is compared against the existing deep learning algorithm model or machine learning algorithm model, realizing the iterative upgrading of the deep learning algorithm model or machine learning algorithm model, are as follows:
a. Build a new training and test sample, consisting of the user marking data, the administrator marking data, and training and test sample data extracted from new text content;
b. Train the existing deep learning algorithm model or machine learning algorithm model with the new training and test sample to form a new deep learning algorithm model or machine learning algorithm model capable of sentiment judgment, and add the new words and new meanings of old words identified and obtained during training to the corresponding media feature dictionary or industry feature dictionary;
c. Verify whether the accuracy of the new deep learning algorithm model or machine learning algorithm model in judging the sentiment tendency of the new training and test sample reaches 85%; if it reaches this standard, add the new deep learning algorithm model or machine learning algorithm model to the media engine group or industry engine group; if it does not, carry out step d;
d. Abandon this iteration;
e. Judge whether the accuracy of the new deep learning algorithm model or machine learning algorithm model that reached the standard is higher than that of the existing deep learning algorithm model or machine learning algorithm model; if so, retain the new deep learning algorithm model or machine learning algorithm model and delete the existing one; if not, carry out step f;
f. Retain the existing deep learning algorithm model or machine learning algorithm model and delete the new one;
g. Repeat steps a, b, c, d, e, and f above.
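One pass through steps a to f can be sketched as follows. The sketch assumes that a model is any callable mapping features to a label and that `retrain` is a hypothetical callable producing the new model; for brevity it evaluates both models on the same labelled samples, which is a simplification rather than something the specification prescribes.

```python
ACCURACY_THRESHOLD = 0.85  # the 85% bar from step c


def accuracy(model, samples):
    """Fraction of (features, label) samples the model predicts correctly."""
    hits = sum(1 for features, label in samples if model(features) == label)
    return hits / len(samples)


def upgrade_once(current_model, retrain, samples):
    """Run one iteration and return the model that stays in the engine group.

    retrain is a hypothetical callable: (current_model, samples) -> new model.
    """
    candidate = retrain(current_model, samples)           # step b
    candidate_acc = accuracy(candidate, samples)          # step c
    if candidate_acc < ACCURACY_THRESHOLD:
        return current_model                              # step d: abandon
    if candidate_acc > accuracy(current_model, samples):  # step e: new model wins
        return candidate
    return current_model                                  # step f: old model stays
```

Step g then amounts to calling `upgrade_once` again whenever a fresh batch of marked samples has been collected.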
Further, a single sentiment judgment path comprises, along the path, one feature extraction module, one feature selection module, and one sentiment judgment algorithm model; the feature extraction module performs feature extraction on the word segments that the word segmentation module produces from the original text using word segmentation technology, and the extracted feature vectors, after selection and correction by the feature selection module, are passed to the sentiment judgment algorithm model for training, forming a new sentiment judgment algorithm model.
Further, while the system is running, the rule learning engine group also uses the judgment result data sent by the media sentiment tendency judgment module and the industry sentiment tendency judgment module to identify new words and new meanings of old words, and adds them to the corresponding media feature dictionary or industry feature dictionary.
Further, the machine learning algorithm models include a decision tree algorithm model, a regression algorithm model, a clustering algorithm model, and an artificial neural network algorithm model.
According to the above technical solution, we set up a media classification module and an industry classification module. On the one hand, different media feature dictionaries are built for different media types such as comments, news, blogs, microblogs, and WeChat; on the other hand, different industry feature dictionaries are built for different industry fields such as catering, electronics, clothing, automobiles, and communications. Downstream, the appropriate feature dictionary, feature extraction method, feature vector representation, algorithm model, and engine are selected according to the media type and industry field, so as to obtain a more accurate sentiment tendency judgment. In the present invention, adapting to the industry and adapting to the medium are two mutually independent sentiment judgment paths. A text necessarily belongs to some medium and some industry, and through these two paths, feature extraction and sentiment tendency recognition take full account of the characteristics of the text's medium and industry. In addition, the present invention introduces the "rule learning engine group", which realizes the following functions: 1) during system operation, it counts the accuracy of each sentiment judgment path in judging text sentiment tendency and finds the most suitable sentiment judgment path for texts of different media types and industry fields; 2) during system operation, it trains algorithm models online from the known judgment result data and adds the new algorithm models to the engine group to compete with the other algorithm models. Autonomous learning in the present invention is realized mainly in the rule learning engine group, at two levels:
1) Autonomous optimal selection: by counting the judgment result data of each sentiment judgment path, the sentiment judgment path with the highest prediction accuracy is selected;
2) Autonomous iterative upgrading: by collecting the error-correction information of users and administrators, the old model is periodically trained, a new model is generated and tested, and the new model upgrades and replaces the old one.
Compared with the prior art, the present invention has the following advantages:
1. Sentiment judgment is based on media classification and industry classification;
2. It overcomes the defect that internet text, spanning space and time, introduces a large amount of noise;
3. Good user interaction allows suitable training and test samples to be obtained from the massive data of the internet;
4. Optimal sentiment judgment path selection and continuous autonomous iterative upgrading ensure a higher accuracy of sentiment judgment.
Brief description of the drawings
The present invention is described in further detail below through specific embodiments in combination with the drawings.
Fig. 1 is a schematic diagram of the overall framework of the present invention;
Fig. 2 is a schematic diagram of one part of the framework of the present invention;
Fig. 3 is a schematic diagram of another part of the framework of the present invention;
Fig. 4 is a flow chart of the method used by the rule learning engine group in the present invention when the same vocabulary to be analyzed is sent to both the media classification module and the industry classification module;
Fig. 5 is a flow chart of the method by which a deep learning algorithm model or machine learning algorithm model realizes iterative upgrading in the present invention;
Fig. 6 is a schematic diagram of the generation and use of one sentiment judgment path in the present invention;
In the figures: 1, user terminal; 2, back end; 3, text sentiment judgment system; 4, media classification module; 5, industry classification module; 6, media engine group; 61, media deep learning engine group; 62, media machine learning engine group; 7, industry engine group; 71, industry deep learning engine group; 72, industry machine learning engine group; 8, rule learning engine group; 9, media feature dictionary; 10, media feature extraction module group; 11, media feature selection module group; 12, media sentiment tendency judgment module; 13, industry feature dictionary; 14, industry feature extraction module group; 15, industry feature selection module group; 16, industry sentiment tendency judgment module; 17, sentiment judgment path.
Detailed description of the embodiments
To make the purposes, technical solutions, and advantages of the present embodiment clearer, the technical solutions in the present embodiment are described clearly and completely below with reference to the drawings of the present embodiment. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative work fall within the scope of protection of the present invention.
In addition, the terms "first", "second", "M", "X", and the like are used for descriptive purposes only and should not be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature defined by "first", "second", "M", "X", and the like may explicitly or implicitly include one or more such features.
In the present invention, unless otherwise specifically defined or limited, terms such as "installation", "connection", and "fixation" should be interpreted broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be internal communication between two elements or an interaction between two elements. Those of ordinary skill in the art can understand the specific meanings of the above terms in the present invention according to the specific circumstances.
Embodiment
As shown in Fig. 1, a text sentiment analysis system with autonomous upgrading and noise resistance comprises a user terminal 1, a back end 2, and a text sentiment judgment system 3; the text sentiment judgment system 3 comprises a media classification module 4, an industry classification module 5, a media engine group 6, an industry engine group 7, and a rule learning engine group 8.
The main classification module 3 obtains the text content to be analyzed for sentiment and, depending on whether it originates from a medium and whether it belongs to an industry, sends the text content to the media classification module 4 and/or the industry classification module 5: if the text content only originates from a medium, it is sent to the media classification module 4; if the text content only belongs to an industry, it is sent to the industry classification module 5; if the text content both originates from a medium and belongs to an industry, it is sent to the media classification module 4 and the industry classification module 5 at the same time.
As shown in Fig. 2, the media classification module 4 obtains the text content to be analyzed for sentiment and judges whether it originates from a medium; if so, it sends the text content to the media feature dictionary 9 of the corresponding media type, and otherwise does not send it to any media feature dictionary 9; the media types include comments, news, blogs, WeChat, and microblogs, and the media feature dictionaries 9 correspondingly include a comment feature dictionary, a news feature dictionary, a blog feature dictionary, a WeChat feature dictionary, and a microblog feature dictionary.
As shown in Fig. 2, the media feature dictionary 9 receives the text content to be analyzed of the corresponding media type and generates the vocabulary to be analyzed through a word segmentation module (not shown); the word segmentation module and its word segmentation technology are prior art and are not described in detail here. The generated vocabulary is sent to the M media feature extraction modules in the media feature extraction module group 10, namely the first, second, and third through M-th media feature extraction modules, M being an integer; each media feature extraction module sends the feature vectors it extracts to the N media feature selection modules in the media feature selection module group 11, namely the first and second through N-th media feature selection modules, N being an integer; each media feature selection module sends the feature vectors it selects to the media engine group 6. In the present invention, the first through M-th media feature extraction modules and the first through X-th industry feature extraction modules mentioned below are named differently only for the purpose of distinction; in essence they are all feature extraction modules, using identical or different feature extraction technologies, which are prior art and are not detailed here. Similarly, the first through N-th media feature selection modules and the first through Y-th industry feature selection modules are in essence all feature selection modules, using identical or different feature selection technologies, which are prior art.
As shown in Fig. 2, the media engine group 6 comprises a media deep learning engine group 61 and a media machine learning engine group 62; the media deep learning engine group 61 comprises Q media deep learning engines that perform sentiment judgment based on deep learning algorithm models, Q being an integer; the media machine learning engine group 62 comprises S media machine learning engines that perform sentiment judgment based on machine learning algorithm models, S being an integer; based on the corresponding algorithm models, the media engine group 6 processes the feature vectors received from the media feature selection modules, computes sentiment analysis result data for each word to be analyzed, and sends the computed sentiment analysis result data to the media sentiment tendency judgment module 12. The machine learning algorithm models include a decision tree algorithm model, a regression algorithm model, a clustering algorithm model, and an artificial neural network algorithm model.
It should be noted that, in the present invention, the media deep learning engine group 61 and the industry deep learning engine group 71, the media machine learning engine group 62 and the industry machine learning engine group 72, the media deep learning engines and the industry deep learning engines, and the media machine learning engines and the industry machine learning engines are named differently only for the purpose of distinction; in essence they are all deep learning engines and machine learning engines. The media deep learning engine group 61 and the industry deep learning engine group 71, and likewise the media machine learning engine group 62 and the industry machine learning engine group 72, may use deep learning engines and machine learning engines of the same technology or of different technologies. A deep learning engine group is a combination of multiple deep learning engines that perform sentiment judgment based on deep learning algorithm models; because there are multiple deep learning algorithm models, the combination contains different deep learning engines. A machine learning engine group is a combination of multiple machine learning engines that perform sentiment judgment based on machine learning algorithm models; because there are multiple machine learning algorithm models, the combination contains different machine learning engines. Each sentiment judgment path always contains one deep learning engine or one machine learning engine.
The media sentiment tendency judgment module 12 judges whether each item of sentiment analysis result data is correct and sends the judgment result data to the rule learning engine group 8.
As shown in Fig. 3, the industry classification module 5 obtains the text content to be analyzed for sentiment and judges whether it belongs to an industry field; if so, it sends the text content to the industry feature dictionary 13 of the corresponding industry field, and otherwise does not send it to any industry feature dictionary 13; the industry fields include catering, electronics, automobiles, communications, and clothing, and the industry feature dictionaries 13 correspondingly include a catering feature dictionary, an electronics feature dictionary, an automobile feature dictionary, a communications feature dictionary, and a clothing feature dictionary.
As shown in Fig. 3, the industry feature dictionary 13 receives the text content to be analyzed of the corresponding industry field, generates the vocabulary to be analyzed through the word segmentation module, and sends the generated vocabulary to the X industry feature extraction modules in the industry feature extraction module group 14, namely the first, second, and third through X-th industry feature extraction modules, X being an integer; each industry feature extraction module sends the feature vectors it extracts to the Y industry feature selection modules in the industry feature selection module group 15, namely the first and second through Y-th industry feature selection modules, Y being an integer; each industry feature selection module sends the feature vectors it selects to the industry engine group 7.
As shown in Fig. 3, the industry engine group 7 comprises an industry deep learning engine group 71 and an industry machine learning engine group 72; the industry deep learning engine group 71 comprises U industry deep learning engines that perform sentiment judgment based on deep learning algorithm models, U being an integer; the industry machine learning engine group 72 comprises V industry machine learning engines that perform sentiment judgment based on machine learning algorithm models, V being an integer; based on the corresponding algorithm models, the industry engine group 7 processes the feature vectors received from the industry feature selection modules, computes sentiment analysis result data for each word to be analyzed, and sends the computed sentiment analysis result data to the industry sentiment tendency judgment module 16.
The industry sentiment tendency judgment module 16 judges whether each item of sentiment analysis result data is correct and sends the judgment result data to the rule learning engine group 8.
Based on the judgment result data received from the media sentiment tendency judgment module 12 and the industry sentiment tendency judgment module 16, the rule learning engine group 8 counts the accuracy with which each of the M*N*(Q+S) and/or X*Y*(U+V) sentiment judgment paths 17 judges the sentiment tendency of the text content to be analyzed, and matches texts of different media types and industry fields with the sentiment judgment path 17 of highest accuracy. Fig. 6 shows the generation and use of one sentiment judgment path in this system: a single sentiment judgment path comprises, along the path, one feature extraction module, one feature selection module, and one sentiment judgment algorithm model; the feature extraction module performs feature extraction on the word segments that the word segmentation module produces from the original text using word segmentation technology, and the extracted feature vectors, after selection and correction by the feature selection module, are passed to the sentiment judgment algorithm model for training, forming a new sentiment judgment algorithm model. The figure takes a regression algorithm model as its example.
Based on known judgment result data, the rule learning engine group 8 trains the existing deep learning algorithm models or machine learning algorithm models online to form new deep learning algorithm models or machine learning algorithm models, adds the new models to the medium engine group 6 or the industry engine group 7, and compares them against the existing deep learning algorithm models or machine learning algorithm models, thereby realizing iterative upgrading of the deep learning algorithm models or machine learning algorithm models. Taking a decision tree algorithm model as an example, the online training process of an algorithm model is as follows:
1) collect all manually labelled sample data, including user marking data and administrator marking data, and split it into a training set and a test set at a ratio of 2:1;
2) calculate the information entropy of feature word A in training set D according to the following formulas:
P(X=A) = pi, i=1,2,3,...,n
H(X) = -Σ pi log pi, summed over i=1,2,3,...,n
Wherein, pi is the probability that the feature word is A;
3) calculate the information gain G(D, A) of feature word A with respect to training set D according to the following formula:
G(D, A) = H(D) - H(D | A)
Wherein, H(D) is the empirical entropy of training set D, and H(D | A) is the empirical conditional entropy of training set D given feature word A;
4) based on training set D, a preset array E (each element of which is a threshold e) and the information gain G(D, A) calculated above, generate a new decision tree algorithm model according to a decision tree algorithm such as ID3 or CART;
5) test the newly generated decision tree with the test set, and deploy it to the machine learning engine group when its accuracy is higher than 85%;
6) in the production environment, let the newly generated decision tree algorithm model compete with the old decision tree algorithm model, so that the better one survives and the worse one is eliminated; the decision tree algorithm model of course also competes with other algorithm models, such as random forest, for greater authority in judging text sentiment.
This process is completed entirely online. It can be expected that, as the sample data grows, each algorithm model will fit the real production environment more and more closely, and its judgment accuracy will become higher and higher.
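Steps 1) to 3) can be illustrated with a minimal Python sketch. It assumes the standard Shannon definition of entropy with a base-2 logarithm, represents each sample as a (set of words, label) pair, and treats feature word A as the binary condition "the word occurs in the sample"; these representations and the function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def split_2_to_1(samples):
    """Step 1: send every third sample to the test set (2:1 split)."""
    train = [s for i, s in enumerate(samples) if i % 3 != 2]
    test = [s for i, s in enumerate(samples) if i % 3 == 2]
    return train, test


def entropy(labels):
    """Empirical entropy H(D) = -sum(p_i * log2(p_i)) over label frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(samples, word):
    """G(D, A) = H(D) - H(D | A) for samples given as (set_of_words, label)."""
    labels = [label for _, label in samples]
    if not labels:
        return 0.0
    with_word = [label for words, label in samples if word in words]
    without_word = [label for words, label in samples if word not in words]
    n = len(labels)
    # Conditional entropy: subset entropies weighted by subset size.
    conditional = (len(with_word) / n) * entropy(with_word) \
        + (len(without_word) / n) * entropy(without_word)
    return entropy(labels) - conditional
```

A word that separates the sentiment classes perfectly has a gain equal to H(D), while a word distributed evenly across the classes has a gain of zero; an algorithm such as ID3 would split on the word with the largest gain.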
As shown in Figure 5, the specific steps by which a new deep learning algorithm model or machine learning algorithm model is compared against an existing deep learning algorithm model or machine learning algorithm model, realizing iterative upgrading of the deep learning algorithm model or machine learning algorithm model, are as follows:
a, construct a new training test sample, which consists of user marking data, administrator marking data and training test sample data extracted from new text content;
b, train the existing deep learning algorithm model or machine learning algorithm model using the new training test sample to form a new deep learning algorithm model or machine learning algorithm model capable of sentiment judgment, and at the same time add the new words and new meanings of old words identified during training to the corresponding media feature dictionary 9 or industry feature dictionary 13;
c, verify whether the accuracy with which the new deep learning algorithm model or machine learning algorithm model judges the sentiment orientation of the new training test sample reaches 85%; if the standard is reached, add the new deep learning algorithm model or machine learning algorithm model to the medium engine group 6 or the industry engine group 7 and carry out step e; if the standard is not reached, carry out step d;
d, abandon this iteration;
e, judge whether the accuracy of the new deep learning algorithm model or machine learning algorithm model that reached the standard is higher than the accuracy of the existing deep learning algorithm model or machine learning algorithm model; if so, retain the new deep learning algorithm model or machine learning algorithm model and delete the existing deep learning algorithm model or machine learning algorithm model; if not, carry out step f;
f, retain the existing deep learning algorithm model or machine learning algorithm model and delete the new deep learning algorithm model or machine learning algorithm model;
g, repeat steps a, b, c, d, e and f above.
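The decision logic of steps c to f can be sketched in Python as follows. This is a simplified illustration only: the `Model` class, the `upgrade` function and the outcome strings are hypothetical, both models are scored on the same sample for simplicity, and the brief period in which the old and new models coexist in the engine group is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (text, labelled sentiment)
THRESHOLD = 0.85          # the 85% bar stated in step c


@dataclass
class Model:
    name: str
    predict: Callable[[str], str]


def accuracy(model: Model, samples: List[Sample]) -> float:
    """Share of samples whose predicted sentiment equals the label."""
    if not samples:
        return 0.0
    hits = sum(1 for text, label in samples if model.predict(text) == label)
    return hits / len(samples)


def upgrade(existing: Model, candidate: Model, samples: List[Sample]):
    """Return (kept_model, outcome) following steps c to f."""
    candidate_accuracy = accuracy(candidate, samples)
    if candidate_accuracy < THRESHOLD:
        return existing, "abandoned"        # step d: below the 85% bar
    if candidate_accuracy > accuracy(existing, samples):
        return candidate, "replaced"        # step e: new model wins
    return existing, "kept_existing"        # step f: old model stays
```

Calling `upgrade` once per newly built training test sample corresponds to the repetition in step g; because a candidate only replaces the existing model when it is strictly more accurate, the retained model never gets worse on the sample used for comparison.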
In the present invention, the iterative upgrading of old and new algorithm models takes place in the production environment through a survival-of-the-fittest mechanism, which ensures that each sentiment judgment path only moves forward; in other words, it ensures that the sentiment analysis system becomes more and more intelligent and its sentiment judgment more and more accurate.
The massive data of the internet ensures the training and test sample data required by the present invention. The invention is implemented on Spark, so that the above approach can be carried out on a parallel computing cluster over massive data while effectively resisting the equally massive noise on the internet, realizing a sentiment analysis system capable of evolving online by itself.
As shown in Figure 4, when the same word to be analyzed in the text content is sent to both the medium classification module 4 and the industry classification module 5 and is judged by both the medium sentiment orientation judgment module 12 and the industry sentiment orientation judgment module 16, the rule learning engine group 8 uses the following steps:
S1, the rule learning engine group 8 obtains the judgment result data of the medium sentiment orientation judgment module 12 and of the industry sentiment orientation judgment module 16 respectively;
S2, judge whether the two items of judgment result data are consistent; if consistent, send the judgment result data to the user of user terminal 1, who can label the text online and form user marking data based on the labelled text; if inconsistent, carry out step S3;
S3, the rule learning engine group 8 notifies the administrator of background end 2;
S4, the administrator labels the text online and forms administrator marking data based on the labelled text;
S5, the administrator judges whether the industry to which the text content has been assigned is correct; if correct, the text is put into the industry-class correct text training library and the medium-class error text training library; if incorrect, carry out step S6;
S6, the text is put into the industry-class error text training library and the medium-class correct text training library.
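The routing in steps S1 to S6 can be sketched in Python as follows. The four library names, the `reconcile` function and the returned status strings are hypothetical; the administrator's verdict from step S5 is modelled as an optional argument that is absent until the administrator has responded.

```python
def new_libraries():
    """Four training libraries, keyed by hypothetical names."""
    return {
        "industry_correct": [],
        "industry_error": [],
        "medium_correct": [],
        "medium_error": [],
    }


def reconcile(text, medium_result, industry_result, libraries,
              industry_is_correct=None):
    """Route one text according to steps S1 to S6.

    industry_is_correct is the administrator's verdict from step S5;
    None means the administrator has only been notified so far.
    """
    if medium_result == industry_result:       # S2: results agree
        return "sent_to_user"
    if industry_is_correct is None:            # S3: verdict still pending
        return "administrator_notified"
    if industry_is_correct:                    # S5: industry assignment right
        libraries["industry_correct"].append(text)
        libraries["medium_error"].append(text)
    else:                                      # S6: industry assignment wrong
        libraries["industry_error"].append(text)
        libraries["medium_correct"].append(text)
    return "filed_for_training"
```

Each disagreement thus yields one positive and one negative training example, so both classification branches receive labelled data from the same administrator decision.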
In addition, while the system is running, the rule learning engine group 8 also uses the judgment result data sent by the medium sentiment orientation judgment module 12 and the industry sentiment orientation judgment module 16 to identify new words and new meanings of old words, and adds the new words and new meanings so obtained to the corresponding media feature dictionary 9 or industry feature dictionary 13.
The present invention has been described above by way of specific examples, which are intended only to aid understanding of the invention and not to limit it. A person of ordinary skill in the art may also make several simple deductions, variations or substitutions in accordance with the idea of the invention.