Summary of the invention
The object of the present invention is to provide a method, apparatus, device, and computer-readable storage medium for constructing a causality knowledge base, which can solve the problem that prior-art technical solutions for constructing a causality knowledge base achieve low accuracy when used for result prediction.
To achieve the above object, the present invention provides the following technical solutions:
A method for constructing a causality knowledge base, comprising:
acquiring text data from a data source, and processing the text data into a plurality of sentences;
determining, among the sentences obtained by the processing, the causal sentences in which a causal relationship exists, identifying the entity pairs contained in the causal sentences, and determining that the set of identified entity pairs is the causality knowledge base; wherein each entity pair includes a cause entity and a result entity;
each time a preset time period has elapsed since the moment the text data was acquired from the data source, judging whether the total amount of data changed in the data source reaches a data-quantity threshold; if so, returning to the step of acquiring text data from the data source; if not, determining that the causality knowledge base does not need to be rebuilt.
Preferably, determining whether any sentence is a causal sentence in which a causal relationship exists comprises:
taking the sentence as the current sentence, and judging whether the current sentence contains any explicit causality prompt word included in an explicit causality prompt word set; if so, determining that the current sentence is a causal sentence; if not, determining that the current sentence is not a causal sentence; wherein an explicit causality prompt word is a causality prompt word indicating that a causal relationship definitely exists.
Preferably, before determining that the current sentence is not a causal sentence, the method further comprises:
if the current sentence does not contain any explicit causality prompt word included in the explicit causality prompt word set, judging whether the current sentence contains any fuzzy causality prompt word included in a fuzzy causality prompt word set; if so, converting the current sentence into a feature vector and inputting it to a pre-created classifier, determining that the current sentence is a causal sentence if the result output by the classifier is a preset value, and determining that the current sentence is not a causal sentence if the result is not the preset value; if not, executing the step of determining that the current sentence is not a causal sentence; wherein a fuzzy causality prompt word is a causality prompt word indicating that a causal relationship may exist, and the classifier is trained on feature vectors converted from a plurality of sentences containing fuzzy causality prompt words, together with labels indicating whether each corresponding sentence is a causal sentence.
Preferably, after determining that the set of identified entity pairs is the causality knowledge base, the method further comprises:
performing Cartesian-product pairing on the identified entity pairs, and determining that the resulting entity pairs are new entity pairs;
performing a clustering operation on the entities contained in the new entity pairs to obtain a plurality of set pairs, each composed of causality entity sets, wherein each causality entity set contains the cause entities or the result entities that the clustering operation assigned to the same class;
retaining, for each set pair, the new entity pairs whose co-occurrence frequency in the data source is greater than a frequency threshold, and deleting the other entity pairs.
Preferably, the method further comprises:
adding the co-occurrence frequency of each retained new entity pair to the causality knowledge base.
Preferably, the method further comprises:
calculating the support of each new entity pair according to the following formula, and adding the support of each new entity pair to the causality knowledge base:
SupportNum = (α * Adverb + β * SentenceType + γ * Emotion) * Negative;
wherein α, β and γ are preset weight coefficients, α > β > γ, and α + β + γ = 1; SupportNum is the support; Adverb is the score corresponding to the degree adverb contained in the sentence corresponding to the new entity pair; SentenceType is the score corresponding to the causality prompt word contained in that sentence; Emotion is the score corresponding to the emotion word contained in that sentence; and Negative is the score corresponding to the negative word contained in that sentence.
Preferably, the method further comprises:
comparing a cause entity to be predicted with each cause entity in the most recently obtained causality knowledge base, and outputting the result entity contained in the new entity pair whose cause entity matches the cause entity to be predicted, together with the co-occurrence frequency and the support corresponding to that new entity pair.
An apparatus for constructing a causality knowledge base, comprising:
a preprocessing module, configured to acquire text data from a data source and process the text data into a plurality of sentences;
a construction module, configured to determine, among the sentences obtained by the processing, the causal sentences in which a causal relationship exists, identify the entity pairs contained in the causal sentences, and determine that the set of identified entity pairs is the causality knowledge base; wherein each entity pair includes a cause entity and a result entity;
an incremental learning module, configured to judge, each time a preset time period has elapsed since the moment the text data was acquired from the data source, whether the total amount of data changed in the data source reaches a data-quantity threshold; if so, to return to the step of acquiring text data from the data source; if not, to determine that the causality knowledge base does not need to be rebuilt.
A device for constructing a causality knowledge base, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of any of the above methods for constructing a causality knowledge base when executing the computer program.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of any of the above methods for constructing a causality knowledge base.
The present invention provides a method, apparatus, device, and computer-readable storage medium for constructing a causality knowledge base. The method comprises: acquiring text data from a data source, and processing the text data into a plurality of sentences; determining, among the sentences obtained by the processing, the causal sentences in which a causal relationship exists, identifying the entity pairs contained in the causal sentences, and determining that the set of identified entity pairs is the causality knowledge base, wherein each entity pair includes a cause entity and a result entity; and, each time a preset time period has elapsed since the moment the text data was acquired from the data source, judging whether the total amount of data changed in the data source reaches a data-quantity threshold, and if so, returning to the step of acquiring text data from the data source, and if not, determining that the causality knowledge base does not need to be rebuilt.
In the technical solution disclosed in this application, the text data in the data source is acquired and processed into sentences, and the entity pairs present in the causal sentences among those sentences are identified, so that the set of entity pairs forms the causality knowledge base. Causal relationships can then be recognized based on the causality knowledge base, which in turn enables result prediction. After each time period, it is judged whether the amount of data changed in the data source is large enough; if so, the causal relationships present in the data source are considered likely to have changed considerably, and the causality knowledge base is constructed again; otherwise, the method waits for the next judgment. Thus, after the causality knowledge base has been constructed, the technical solution can also periodically judge whether the causal relationships in the data source may have changed considerably, and rebuild the causality knowledge base when they may have. This never-ending learning framework ensures that the causality knowledge base matches the causal relationships in the data source, so that the knowledge base remains valid in real time, which in turn ensures the accuracy of result prediction based on the causality knowledge base.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, which shows a flowchart of a method for constructing a causality knowledge base provided by an embodiment of the present invention, the method may include:
S11: acquire text data from a data source, and process the text data into a plurality of sentences.
The method for constructing a causality knowledge base provided by this embodiment may be executed by the corresponding construction apparatus. In a big-data environment, selecting data that is rich in content, covers a wide range of fields, and is widely recognized is the top priority for knowledge extraction. This embodiment selects Chinese Wikipedia as the data source: it is updated in real time, can be accessed freely, and is the largest network resource system on the Internet, and most of its content has been edited and verified multiple times by different users, which fully ensures the comprehensiveness and accuracy of the data. Natural language processing is performed on the text data acquired from the data source to obtain the corresponding sentences, and all of the sentences obtained may be assembled into a sentence-set XML file. The processing of text data into sentences follows the same principle as the corresponding prior-art solutions and is not described in detail here.
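The text leaves sentence segmentation to existing techniques. Purely as an illustration, and assuming sentences end at a Chinese or Western full stop, question mark, or exclamation mark, the step could be sketched as follows. The function name `split_sentences` and the punctuation set are assumptions made for this sketch, not part of the described method; a production system would use a proper NLP toolkit.

```python
import re

# Assumed sentence terminators: Chinese and Western full stop,
# question mark, and exclamation mark (an illustrative choice).
_SENTENCE = re.compile(r"[^。！？.!?]+[。！？.!?]?")


def split_sentences(text):
    """Split raw text into sentences, keeping each terminator with its sentence."""
    return [m.strip() for m in _SENTENCE.findall(text) if m.strip()]
```

This naive rule also splits at decimal points and abbreviations, so it is only suitable for showing where the step sits in the pipeline.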
S12: determine, among the sentences obtained by the processing, the causal sentences in which a causal relationship exists, identify the entity pairs contained in the causal sentences, and determine that the set of identified entity pairs is the causality knowledge base; wherein each entity pair includes a cause entity and a result entity.
An entity is a Chinese word or phrase; for example, "teacher", "temperature rise", and "suffer losses" are all entities. A causal sentence is a sentence in which a causal relationship exists; it contains an entity expressing the cause (the cause entity) and an entity expressing the result (the result entity). After the causal sentences are determined, the entity pairs they contain are identified, and the set of these entity pairs is the causality knowledge base. Identifying the entity pairs contained in a causal sentence follows the same principle as the corresponding prior-art solutions and is not described in detail here. Once the causality knowledge base is obtained, causal relationships can be recognized based on it, which in turn enables result prediction. Specifically, when there is a cause whose result needs to be predicted, the cause entity is compared with the cause entities in the causality knowledge base. If the knowledge base contains a cause entity that matches (is identical to) the cause entity to be predicted, the result entity in the corresponding entity pair is determined to be the result entity of the cause entity to be predicted; that is, the result expressed by that result entity is the result corresponding to the cause to be predicted.
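The lookup described above can be sketched as a dictionary keyed by cause entity. This is a minimal illustration under the assumption that matching means exact string equality; the names `build_index` and `predict_results` are hypothetical and do not come from the text.

```python
from collections import defaultdict


def build_index(entity_pairs):
    """Index (cause, result) pairs by cause entity, dropping duplicate results."""
    index = defaultdict(list)
    for cause, result in entity_pairs:
        if result not in index[cause]:
            index[cause].append(result)
    return index


def predict_results(index, cause):
    """Return the result entities recorded for an exactly matching cause entity."""
    return list(index.get(cause, []))
```

A cause with no matching entry simply yields an empty list, which corresponds to the case where the knowledge base cannot make a prediction.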
S13: each time a preset time period has elapsed since the moment the text data was acquired from the data source, judge whether the total amount of data changed in the data source reaches a data-quantity threshold; if so, return to the step of acquiring text data from the data source; if not, determine that the causality knowledge base does not need to be rebuilt.
The specific values of the preset time period and the data-quantity threshold can be set according to actual needs. Starting from the moment text data was last acquired from the data source, each time a certain period elapses it is judged whether the amount of data changed in the data source is large enough, which determines whether the causality knowledge base needs to be rebuilt. If the amount of changed data is large enough, the data source can be considered to have undergone considerable causal change, and the causality knowledge base is rebuilt; once a new causality knowledge base has been constructed, any subsequent result prediction must use the most recently obtained knowledge base. Thus, this embodiment periodically judges, based on the amount of data changed in the data source, whether the causality knowledge base needs to be constructed again, and rebuilds it when the amount of changed data is determined to be large enough, so that the knowledge base reflects the data that has changed. To construct a complete and practical causality knowledge base, this embodiment adopts a never-ending learning framework in which the knowledge base is rebuilt whenever considerable causal change may have occurred in the data source; this incremental extraction mode ensures that the causality knowledge base remains valid in real time.
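The check in S13 reduces to two comparisons. The sketch below is one possible reading, assuming the elapsed time and the amount of changed data are available as plain numbers; the function name `rebuild_decision`, the return labels, and the units are illustrative assumptions.

```python
def rebuild_decision(elapsed, changed_amount, period, threshold):
    """Decide what to do at one check of the data source.

    elapsed        -- time since text data was last acquired (assumed unit: seconds)
    changed_amount -- total amount of data changed since then (assumed unit: bytes)
    period         -- the preset time period
    threshold      -- the data-quantity threshold
    """
    if elapsed < period:
        return "wait"      # the preset period has not elapsed yet
    if changed_amount >= threshold:
        return "rebuild"   # return to the acquisition step (S11)
    return "keep"          # no rebuild needed; wait for the next judgment
```

For example, with a period of one day and a threshold of 1,000,000 bytes, `rebuild_decision(86400, 2_500_000, 86400, 1_000_000)` selects a rebuild.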
In the technical solution disclosed in this application, the text data in the data source is acquired and processed into sentences, and the entity pairs present in the causal sentences among those sentences are identified, so that the set of entity pairs forms the causality knowledge base and causal relationships can be recognized based on it. After each time period, it is judged whether the amount of data changed in the data source is large enough; if so, the causal relationships present in the data source are considered likely to have changed considerably, and the causality knowledge base is constructed again; otherwise, the method waits for the next judgment. Thus, after the causality knowledge base has been constructed, the technical solution can also periodically judge whether the causal relationships in the data source may have changed considerably, and rebuild the knowledge base when they may have. This never-ending learning framework ensures that the causality knowledge base matches the causal relationships in the data source, so that the knowledge base remains valid in real time, which in turn ensures the accuracy of result prediction based on the causality knowledge base.
In the method for constructing a causality knowledge base provided by an embodiment of the present invention, determining whether any sentence is a causal sentence in which a causal relationship exists may include:
taking the sentence as the current sentence, and judging whether the current sentence contains any explicit causality prompt word included in an explicit causality prompt word set; if so, determining that the current sentence is a causal sentence; if not, determining that the current sentence is not a causal sentence; wherein an explicit causality prompt word is a causality prompt word indicating that a causal relationship definitely exists.
It should be noted that an explicit causality prompt word is a causality prompt word indicating that the corresponding sentence definitely contains a causal relationship, such as "because" or "lead to". The explicit causality prompt word set can be compiled in advance by staff. As long as a sentence contains any explicit causality prompt word from this set, the sentence is considered a causal sentence in which a causal relationship definitely exists; otherwise, it is considered not to be a causal sentence. In this way, whether a sentence is a causal sentence can be determined quickly and effectively.
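The explicit prompt-word test amounts to a membership check. A minimal sketch follows, assuming simple case-insensitive substring matching; the word set shown contains only English renderings of the two examples given in the text plus one inflected form, and both the set and the function name are illustrative.

```python
# Hypothetical prompt-word set; a real set would be compiled by staff.
EXPLICIT_PROMPT_WORDS = {"because", "lead to", "leads to"}


def has_explicit_prompt(sentence, prompt_words=EXPLICIT_PROMPT_WORDS):
    """Return True if the sentence contains any explicit causality prompt word."""
    text = sentence.lower()
    return any(word in text for word in prompt_words)
```

Substring matching is adequate for Chinese text, where words are not separated by spaces; for space-delimited languages a tokenized comparison would avoid accidental matches inside longer words.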
In the method for constructing a causality knowledge base provided by an embodiment of the present invention, before determining that the current sentence is not a causal sentence, the method may further include:
if the current sentence does not contain any explicit causality prompt word included in the explicit causality prompt word set, judging whether the current sentence contains any fuzzy causality prompt word included in a fuzzy causality prompt word set; if so, converting the current sentence into a feature vector and inputting it to a pre-created classifier, determining that the current sentence is a causal sentence if the result output by the classifier is a preset value, and determining that the current sentence is not a causal sentence if the result is not the preset value; if not, executing the step of determining that the current sentence is not a causal sentence; wherein a fuzzy causality prompt word is a causality prompt word indicating that a causal relationship may exist, and the classifier is trained on feature vectors converted from a plurality of sentences containing fuzzy causality prompt words, together with labels indicating whether each corresponding sentence is a causal sentence.
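The two-stage decision described above (explicit prompt words first, then fuzzy prompt words backed by a classifier) can be sketched as a short cascade. In this sketch `vectorize` and `classifier` are placeholders supplied by the caller, and the preset value of 1 is an assumption; none of these names come from the text.

```python
def is_causal_sentence(sentence, explicit_words, fuzzy_words,
                       vectorize, classifier, preset_value=1):
    """Cascade: explicit prompt word -> fuzzy prompt word plus classifier -> not causal."""
    # Stage 1: any explicit prompt word settles the question immediately.
    if any(word in sentence for word in explicit_words):
        return True
    # Stage 2: without a fuzzy prompt word the sentence is not causal.
    if not any(word in sentence for word in fuzzy_words):
        return False
    # Stage 3: the classifier decides for sentences with a fuzzy prompt word.
    return classifier(vectorize(sentence)) == preset_value
```

Keeping the classifier behind the fuzzy prompt-word check means it is only invoked on the ambiguous sentences it was trained on.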
It should be noted that a fuzzy causality prompt word is a causality prompt word indicating that the corresponding sentence may contain a causal relationship, without this being certain, such as "then" or "following". The fuzzy causality prompt word set can be compiled in advance by staff. When a sentence contains no explicit causality prompt word but does contain a fuzzy causality prompt word from this set, the sentence is considered a possible causal sentence; otherwise, it is considered not to be a causal sentence. This further ensures the accuracy of judging whether a sentence is a causal sentence.
In addition, existing techniques are limited to extracting relationships from sentences that contain causality prompt words, so identifying causality prompt words comprehensively and accurately greatly helps to improve the quality of the causal relationships obtained. This embodiment comprehensively summarizes the existing causality prompt words based on knowledge of the Chinese language, so that different causal relationships can be classified accurately, and on this basis performs a multi-stage judgment of whether a sentence is a causal sentence.
It should further be noted that the classifier-based judgment in this embodiment actually uses LTP natural language processing technology and is made according to part-of-speech rules, syntactic dependencies, principal component analysis, and the like. Specifically, a training set can be annotated manually. Each training sample consists of the feature vector converted from a sentence containing a fuzzy causality prompt word and a label indicating whether that sentence is a causal sentence. A classifier capable of identifying causal sentences is then trained on this training set, which allows sentences with fuzzy causality prompt words to be identified with higher accuracy.
The classifier may be one implemented with the naive Bayes algorithm. Put simply, identifying whether a sentence containing a fuzzy causality prompt word is a causal sentence can be defined directly as a binary classification problem: the sentence either is or is not a causal sentence (a 0/1 problem). With a machine-learning approach, a sentence is converted into a feature vector; for example, the feature vector of a sentence X is expressed as X = (x1, x2, ..., xn), where xi (i from 1 to n) is the numerical representation of the sentence's word sequence. Similarly, the classification decision variable for causal sentences is C = {1, 0}, where 1 indicates that the sentence is a causal sentence and 0 indicates that it is not. Given a training set {{X1}, {X2}, ..., {Xn}}, a machine-learning algorithm learns whether a target sentence is a causal sentence according to the following mapping:
f: X → C
The classifier f is learned by machine learning from the manually annotated training set, so that, given a new sentence, it can be judged whether the sentence is a causal sentence.
In addition, because the naive Bayes algorithm for text classification has stable classification efficiency and high classification performance, this embodiment uses it to implement the classifier. The algorithm is not very sensitive to missing data, and it treats the components of the feature vector as relatively independent with respect to the decision variable, which gives it a clear advantage over comparable algorithms in adaptability and complexity. Observing the classification features of sentences with fuzzy causality prompt words shows that such sentences rely mainly on the causality prompt word and on context-dependent lexical and syntactic features. Because of the randomness and diversity of Chinese semantic expression, these features are only weakly correlated with one another. Moreover, because Chinese expression emphasizes meaning over form, most causal sentences lack a complete syntactic structure, which amounts to missing data in the training samples. Given these characteristics, the naive Bayes algorithm is used to classify the sentences. Its posterior probability can be calculated by the following equation:
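The posterior formula itself is not reproduced in this text. For orientation only, the textbook naive Bayes form is P(C | X) ∝ P(C) · Π P(xi | C), with the components xi treated as conditionally independent given the class C. The sketch below implements that textbook form over bag-of-words features with Laplace smoothing; the smoothing, the tokenized input, and the class name `NaiveBayes` are assumptions of this sketch and are not stated in the text.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Textbook multinomial naive Bayes over word lists (illustrative sketch)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing               # Laplace smoothing (assumed)
        self.class_counts = Counter()            # number of sentences per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def fit(self, sentences, labels):
        for words, label in zip(sentences, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def log_posteriors(self, words):
        """Unnormalized log P(C) + sum of log P(xi | C) for each class C."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, count in self.class_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(count / total)
            for word in words:
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                likelihood = ((self.word_counts[label][word] + self.smoothing)
                              / (n_words + self.smoothing * vocab_size))
                score += math.log(likelihood)
            scores[label] = score
        return scores

    def predict(self, words):
        scores = self.log_posteriors(words)
        return max(scores, key=scores.get)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and label 1 or 0 corresponds to the decision variable C described above.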
In the method for constructing a causality knowledge base provided by an embodiment of the present invention, after determining that the set of identified entity pairs is the causality knowledge base, the method may further include:
performing Cartesian-product pairing on the identified entity pairs, and determining that the resulting entity pairs are new entity pairs;
performing a clustering operation on the entities contained in the new entity pairs to obtain a plurality of set pairs, each composed of causality entity sets, wherein each causality entity set contains the cause entities or the result entities that the clustering operation assigned to the same class;
retaining, for each set pair, the new entity pairs whose co-occurrence frequency in the data source is greater than a frequency threshold, and deleting the other entity pairs.
The entity pairs obtained in step S12 may be redundant and meaningless, making it difficult for them to form knowledge. This embodiment therefore clusters the identified entity pairs based on the similarity of the entities. Specifically, Cartesian-product pairing is performed on all of the identified entities (both cause entities and result entities) to obtain a plurality of new entity pairs, and a clustering operation is performed on these new entity pairs to obtain the corresponding set pairs. Each set pair contains a causality entity set composed of cause entities and a causality entity set composed of result entities. Only the representative entities in each causality entity set are then kept: new entity pairs whose co-occurrence frequency in the data source is greater than a frequency threshold, set according to actual needs, are retained, and the rest are deleted. The co-occurrence frequency of a new entity pair is the ratio of the number of sentences in the data source in which the two entities appear together to the total number of sentences in the data source. In this way, the most representative entity pairs are selected from all possible entity pairs, which ensures the accuracy of result prediction using the causality knowledge base.
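The co-occurrence frequency defined above, and the threshold filter built on it, can be sketched as follows. This is a minimal illustration that treats an entity as present when it occurs as a substring of a sentence and that omits the clustering step entirely; the function names are hypothetical.

```python
def cooccurrence_frequency(cause, result, sentences):
    """Share of all sentences in which both entities appear together."""
    if not sentences:
        return 0.0
    together = sum(1 for s in sentences if cause in s and result in s)
    return together / len(sentences)


def retain_pairs(pairs, sentences, frequency_threshold):
    """Keep only the pairs whose co-occurrence frequency exceeds the threshold."""
    kept = {}
    for cause, result in pairs:
        freq = cooccurrence_frequency(cause, result, sentences)
        if freq > frequency_threshold:
            kept[(cause, result)] = freq
    return kept
```

Returning the frequency alongside each retained pair makes it available later as an indicator of how strongly the cause leads to the result.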
This embodiment is described below with an example. Consider the sentence: "Due to climate warming, the snow cover melts, triggering snowmelt floods, which cause disasters in many regions and inflict serious economic losses on the people." The entity pairs expressing causality include: temperature rise, snow melting, mountain torrents → disaster, losses. Cartesian-product pairing of the entity pairs yields the following new entity pairs:
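The original listing of pairs is not reproduced in this text. As a sketch only, and assuming the pairing combines each cause entity with each result entity from the example sentence, the enumeration could look like this. The English entity labels and the function name `pair_entities` are illustrative.

```python
from itertools import product

# Entities taken from the example sentence (English renderings, illustrative).
cause_entities = ["temperature rise", "snow melting", "mountain torrents"]
result_entities = ["disaster", "economic loss"]


def pair_entities(causes, results):
    """Cartesian-product pairing: every cause combined with every result."""
    return list(product(causes, results))


new_entity_pairs = pair_entities(cause_entities, result_entities)
```

Under this assumption, three cause entities and two result entities yield six candidate pairs, which are then filtered by co-occurrence frequency.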
Using the TF-IDF method, the co-occurrence frequency in the data source of the cause part and the result part of each new entity pair is counted. If the co-occurrence frequency is below a certain threshold, the pair is eliminated as an untrustworthy causal relationship; otherwise, it is retained as a trustworthy causal relationship, and its co-occurrence frequency is kept as a later indicator of the credibility of the causal relationship. A clustering operation is performed on all of the entities contained in the new entity pairs, giving the effect shown in Fig. 2, and the most representative entities are retained. When "overcast" and "thunder and lightning" are input, the following result can be obtained:
Here, 0.38, 0.52 and 0.78 are the co-occurrence frequencies of the corresponding new entity pairs, which can also be regarded as the intensity with which this cause leads to this result.
The method for constructing a causality knowledge base provided by an embodiment of the present invention may further include:
adding the co-occurrence frequency of each retained new entity pair to the causality knowledge base.
It should be noted that the co-occurrence frequency of a new entity pair can be regarded as the intensity with which the cause in the pair leads to the corresponding result. Adding the co-occurrence frequency of each new entity pair to the causality knowledge base therefore makes it available for query when needed and further improves the causality knowledge base.
The method for constructing a causality knowledge base provided by an embodiment of the present invention may further include:
calculating the support of each new entity pair according to the following formula, and adding the support of each new entity pair to the causality knowledge base:
SupportNum = (α * Adverb + β * SentenceType + γ * Emotion) * Negative;
wherein α, β and γ are preset weight coefficients, α > β > γ, and α + β + γ = 1; SupportNum is the support; Adverb is the score corresponding to the degree adverb contained in the sentence corresponding to the new entity pair; SentenceType is the score corresponding to the causality prompt word contained in that sentence; Emotion is the score corresponding to the emotion word contained in that sentence; and Negative is the score corresponding to the negative word contained in that sentence.
The specific values of the weight coefficients and the scores of the various words can be set according to actual needs. On the basis of the causal relationships mined from sentences, this embodiment calculates the support of the causal relationship between a cause entity and its corresponding result entity from the emotion words, degree adverbs, causality prompt words, and negative words in the sentence. It should be noted that the support, or intensity, of a causal relationship refers to the degree to which the cause entity influences the occurrence of the result entity in the causal sentence, that is, how likely the cause is to bring about the result; support and intensity express this meaning from different angles.
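The support formula can be evaluated directly once the four scores are known. The sketch below assumes the scores are supplied as plain numbers; the default weights 0.5, 0.3 and 0.2 are hypothetical values chosen only to satisfy α > β > γ and α + β + γ = 1, and the function name is illustrative.

```python
def support_num(adverb, sentence_type, emotion, negative,
                alpha=0.5, beta=0.3, gamma=0.2):
    """SupportNum = (alpha*Adverb + beta*SentenceType + gamma*Emotion) * Negative."""
    # Enforce the constraints stated for the weight coefficients.
    if not alpha > beta > gamma:
        raise ValueError("weights must satisfy alpha > beta > gamma")
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return (alpha * adverb + beta * sentence_type + gamma * emotion) * negative
```

With these example weights, scores of 0.8, 0.5 and 0.6 and a Negative score of 1 give (0.40 + 0.15 + 0.12) * 1 = 0.67; a Negative score of -1 flips the sign.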
Degree adverbs are adverbs that modify or limit other adverbs or adjectives in terms of degree, and are used to express the semantic intensity of a sentence or the degree to which the patient is affected. From the semantic role that degree adverbs play in text, it follows that both relative and absolute degree adverbs express different degrees of sentiment polarity. Therefore, based on the magnitude classification of degree adverbs in Chinese linguistics and according to the requirements of causality support, the degree adverbs are adjusted appropriately and assigned different polarity values (scores), as shown in Table 1:
Table 1: Polarity values of degree adverbs
An emotion word in a sentence expresses the agent's commendatory or derogatory attitude toward the patient and carries a certain emotion. Because it also has a degree of polarity effect on the expression of causality, the present embodiment treats Chinese emotion words as a weak factor affecting causality support. A score may be set for each emotion word in advance according to actual needs; in general, the more obvious the commendatory or derogatory tendency and the heavier the emotion, the larger the score. Alternatively, based on a third-party emotion vocabulary ontology obtained in advance, the emotion words may be divided into 7 major classes and 20 groups, and the polarity of the emotion words divided into five levels, 9, 7, 5, 3 and 1, in descending order; the classification of emotion words may be as shown in Table 2. An emotion word may have different levels in different groups. In that case the value obtained by weighting the levels of the emotion word across all groups is the score of that emotion word, and the weight coefficients may be set according to actual needs. For example, the emotion word "happiness" has level 9 in the group "happy" and level 7 in the group "feeling at ease", and so on.
Table 2. Classification of emotion words
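The weighting of an emotion word's levels across groups can be illustrated with a minimal sketch. It assumes the weighted value is normalized by the sum of the weights (the text only says the levels are weighted), and the function name `emotion_score` and the equal weights are hypothetical; only the levels 9 and 7 for "happiness" come from the example above.

```python
def emotion_score(levels_by_group, weights_by_group):
    """Weighted average of an emotion word's polarity levels across groups.

    Assumption: the result is normalized by the total weight; the text
    leaves the exact weighting scheme open.
    """
    total_weight = sum(weights_by_group[g] for g in levels_by_group)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted = sum(levels_by_group[g] * weights_by_group[g] for g in levels_by_group)
    return weighted / total_weight


# Levels of "happiness" as given in the example in the text.
happiness_levels = {"happy": 9, "feeling at ease": 7}
# Hypothetical equal weights; real coefficients are set according to actual needs.
group_weights = {"happy": 0.5, "feeling at ease": 0.5}

happiness_score = emotion_score(happiness_levels, group_weights)
```

With equal weights the score is simply the midpoint of the two levels; giving one group a larger weight pulls the score toward that group's level.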
In Chinese, causality between entities is expressed with different causality prompt words, and the causality expressed accordingly has different semantic intensity. Causal semantics expressed with rigorous, auxiliary-type causality prompt words have stronger support than causal semantics expressed with fuzzy causality prompt words. This can be explained to a certain extent as follows: causality extracted from a Chinese corpus on the basis of rigorous causality prompt words has stronger certainty. On this basis, the present embodiment assigns different causal support to the extracted causality according to the causality prompt word used, as shown in Table 3:
Table 3. Polarity values of causality prompt words
| Part of speech of causality prompt word | Polarity value | Part of speech of causality prompt word | Polarity value |
| Nested causal conjunction | 0.7 | Adverb expressing cause and effect | 0.3 |
| Single conjunction expressing cause and effect | 0.5 | Verb expressing cause and effect | 0.3 |
| Preposition expressing cause and effect | 0.1 | Verb expressing the production of a result | 0.6 |
Causality can generally be divided into positive association and negative association. Positive association usually means that cause and effect share the same trend, i.e. they increase or decrease together; negative association is the opposite, the two showing different growth trends. In other words, the reason either causes the occurrence of the result or inhibits it. Such semantic results are identified here on the basis of negative words: if a negative word appears in a sentence expressing cause and effect, the causality is an inhibition relationship. The present embodiment therefore identifies positive and negative causality based on negative words. Specifically, the score corresponding to the negative word is determined as follows: if a negative word is present in the sentence, the score is 1; otherwise the score is -1. The negative vocabulary may be as shown in Table 4:
Table 4. Chinese negative words
By calculating over the degree adverbs, causality prompt words, emotion words and negative words contained in a sentence, the support, which indicates the degree of influence between cause and effect, is obtained and recorded in the causality knowledge base, thereby further improving the causality knowledge base. It should be noted that if multiple sentences contain a given new entity pair, the mean of the supports calculated from these sentences is used as the support of that new entity pair. In addition, once the intensity and support of a new entity pair have been obtained, the causality categories in Table 5 can be derived. For the "strong reason → strong result" category, the support and intensity of the new entity pair are each greater than or equal to the corresponding threshold; for "weak reason → strong result", the support is greater than the corresponding threshold and the intensity is less than the corresponding threshold; for "weak reason → weak result", the support and intensity are each less than or equal to the corresponding threshold; for "strong reason → weak result", the support is less than the corresponding threshold and the intensity is greater than the corresponding threshold. Each threshold may be set according to actual needs, and the category of each new entity pair may also be added to the causality knowledge base for later query.
Table 5. Causality categories
| Strong reason → strong result | Weak reason → weak result |
| Weak reason → strong result | Strong reason → weak result |
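The averaging of support over sentences and the four categories of Table 5 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and because the text leaves values exactly equal to a threshold ambiguous, the sketch assumes that a value equal to its threshold counts as "strong".

```python
from statistics import mean


def pair_support(sentence_supports):
    """Support of a new entity pair: mean of the supports of its sentences."""
    return mean(sentence_supports)


def classify_pair(support, intensity, support_threshold, intensity_threshold):
    """Map (support, intensity) to one of the four categories of Table 5.

    Assumption: values equal to a threshold are treated as reaching it.
    """
    high_support = support >= support_threshold
    high_intensity = intensity >= intensity_threshold
    if high_support and high_intensity:
        return "strong reason → strong result"
    if high_support:
        return "weak reason → strong result"    # support high, intensity low
    if high_intensity:
        return "strong reason → weak result"    # support low, intensity high
    return "weak reason → weak result"


support = pair_support([0.25, 0.75])            # two sentences mention the pair
category = classify_pair(support, 0.8, 0.5, 0.5)
```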
The construction method of a causality knowledge base provided in an embodiment of the present invention may further include:
Comparing a reason entity to be predicted with each reason entity in the most recently obtained causality knowledge base, and outputting the result entity contained in the new entity pair corresponding to the reason entity that matches the reason entity to be predicted, the co-occurrence frequency corresponding to that new entity pair, and the support corresponding to that new entity pair.
Any entity corresponding to a reason for which result prediction is needed may serve as the reason entity to be predicted. Based on the most recently obtained causality knowledge base, the reason entity that matches the reason entity to be predicted (identical to it, or with a similarity greater than a similarity threshold set in advance according to actual needs) can be determined. The result entity corresponding to that reason entity is then determined to be the result entity corresponding to the reason entity to be predicted, thereby realizing result prediction. The intensity, support and category of the new entity pair to which that result entity belongs may also be output, so that the output is comprehensive and complete.
In addition, if reason prediction is needed for an entity corresponding to a result, that entity may be taken as the result entity to be predicted. The result entity to be predicted is compared with each result entity in the most recently obtained causality knowledge base, and the reason entity contained in the new entity pair corresponding to the result entity that matches the result entity to be predicted, the co-occurrence frequency corresponding to that new entity pair, and the support corresponding to that new entity pair are output. The result entity that matches the result entity to be predicted may be a result entity identical to it, or one whose similarity to it is greater than a similarity threshold set in advance according to actual needs. Reason prediction is thereby realized. The category of the new entity pair corresponding to the result entity to be predicted may of course also be output, so that the output is comprehensive and complete.
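The lookup described above, in either direction, can be sketched as follows. The text does not specify how similarity is computed, so this sketch assumes character-level similarity from Python's standard `difflib.SequenceMatcher`; the record layout, the function name `predict` and the sample knowledge base entries are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Assumed similarity measure: character-level ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def predict(entity, knowledge_base, known="reason", wanted="result", threshold=0.8):
    """Return (wanted entity, co-occurrence frequency, support) for every
    stored pair whose `known` entity is identical to `entity` or more
    similar to it than `threshold`."""
    matches = []
    for pair in knowledge_base:
        if pair[known] == entity or similarity(pair[known], entity) > threshold:
            matches.append((pair[wanted], pair["co_occurrence"], pair["support"]))
    return matches


# Hypothetical knowledge base records.
kb = [
    {"reason": "heavy rain", "result": "flood", "co_occurrence": 12, "support": 0.6},
    {"reason": "smoking", "result": "lung cancer", "co_occurrence": 30, "support": 0.7},
]

results_for_rain = predict("heavy rains", kb)                                # result prediction
reasons_for_flood = predict("flood", kb, known="result", wanted="reason")   # reason prediction
```

Swapping the `known` and `wanted` keys is all that distinguishes reason prediction from result prediction in this sketch.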
The technical solution disclosed in the present application mines the causality between entities against the background of Internet big data and presents it to the user in the form of causality entity pairs. On the one hand this alleviates the trouble that "information overload" brings to people; on the other hand it makes full use of the advantages of big data and promotes the development of information technology. At the same time, an incremental, continuous machine learning framework is used to extract causality entities incrementally, which improves the timeliness of the causality knowledge base. It can be seen that, against the background of Internet big data, the technical solution disclosed in the present application uses a perpetual learning framework and an incremental extraction mode, and learns positive and negative association and the intensity of causal association at the same time, so as to build and gradually complete the causality knowledge base and to support qualitative reasoning and attribution explanation. In an "Internet" + "big data" environment, social network resources can be fully used and all kinds of data collected in real time. The causality knowledge base thus built will effectively promote the development of the information society and provide big data analysis services for public needs and for the sharing of basic scientific and technological resource data, and therefore has broad prospects for industrial application.
An embodiment of the present invention also provides a construction device for a causality knowledge base which, as shown in Figure 3, may include:
A preprocessing module 11, configured to: obtain text data from a data source and process the text data into multiple sentences;
A building module 12, configured to: determine the cause and effect sentences, in which causality exists, among the multiple sentences obtained by processing; identify the entity pairs contained in the cause and effect sentences; and determine that the set of identified entity pairs is the causality knowledge base; wherein an entity pair includes a reason entity and a result entity;
An incremental learning module 13, configured to: each time a preset time period has elapsed since the moment the text data was obtained from the data source, judge whether the total amount of data changed in the data source reaches a data-quantity threshold; if so, return to the step of obtaining text data from the data source; if not, determine that the causality knowledge base does not need to be rebuilt.
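The decision made by the incremental learning module can be sketched as a single check that is run once per preset time period. This is a minimal illustration under stated assumptions: the function name `check_and_rebuild` is hypothetical, the amount of changed data is assumed to be measured by the caller, and "reaches" is read as greater than or equal to the threshold.

```python
def check_and_rebuild(changed_amount, data_threshold, rebuild):
    """Run once each time the preset time period has elapsed.

    changed_amount: total amount of data changed in the data source
                    since text data was last obtained (measured by the caller).
    rebuild:        callable that re-runs the pipeline from the
                    'obtain text data' step.
    """
    if changed_amount >= data_threshold:
        rebuild()
        return True
    return False  # the knowledge base is kept as it is


rebuild_log = []
first_check = check_and_rebuild(120, 500, lambda: rebuild_log.append("rebuilt"))
second_check = check_and_rebuild(800, 500, lambda: rebuild_log.append("rebuilt"))
```

In this usage the first check leaves the knowledge base untouched because too little data changed, and the second triggers one rebuild.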
In the construction device for a causality knowledge base provided in an embodiment of the present invention, the building module may include:
A first judging unit, configured to: take any sentence as the current sentence and judge whether the current sentence contains any clear causality prompt word included in a set of clear causality prompt words; if so, determine that the current sentence is a cause and effect sentence; if not, determine that the current sentence is not a cause and effect sentence; wherein a clear causality prompt word is a causality prompt word indicating that causality definitely exists.
In the construction device for a causality knowledge base provided in an embodiment of the present invention, the building module may further include:
A second judging unit, configured to: before it is determined that the current sentence is not a cause and effect sentence, if the current sentence does not contain any clear causality prompt word included in the set of clear causality prompt words, judge whether the current sentence contains any fuzzy causality prompt word included in a set of fuzzy causality prompt words; if so, convert the current sentence into a feature vector and input it to a classifier created in advance, and determine that the current sentence is a cause and effect sentence if the result output by the classifier is a preset value, or that it is not a cause and effect sentence if the result output by the classifier is not the preset value; if not, instruct the first judging unit to execute the step of determining that the current sentence is not a cause and effect sentence; wherein a fuzzy causality prompt word is a causality prompt word indicating that causality may exist, and the classifier is obtained by training on the feature vectors converted from multiple sentences containing fuzzy causality prompt words together with labels indicating whether each such sentence is a cause and effect sentence.
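The two-stage judgment performed by the first and second judging units can be sketched as follows. The prompt words below are hypothetical English placeholders (the actual sets contain Chinese prompt words), matching is done on whole word tokens for simplicity, and the feature conversion and classifier are passed in as callables because the text does not fix their form.

```python
import re

# Hypothetical placeholder vocabularies, not the sets used in the embodiment.
CLEAR_PROMPT_WORDS = {"because", "therefore"}
FUZZY_PROMPT_WORDS = {"after", "then"}


def is_cause_effect_sentence(sentence, to_features, classifier, preset_value=1):
    """Stage 1: a clear prompt word settles the question.
    Stage 2: a fuzzy prompt word defers to the pre-trained classifier.
    Otherwise the sentence is not a cause and effect sentence."""
    tokens = set(re.findall(r"\w+", sentence.lower()))
    if tokens & CLEAR_PROMPT_WORDS:
        return True
    if tokens & FUZZY_PROMPT_WORDS:
        return classifier(to_features(sentence)) == preset_value
    return False


# Stand-ins for the real feature extraction and trained classifier.
def toy_features(sentence):
    return {"length": len(sentence.split())}


def toy_classifier(features):
    return 1 if features["length"] > 3 else 0


clear_case = is_cause_effect_sentence(
    "The road flooded because it rained", toy_features, toy_classifier)
fuzzy_case = is_cause_effect_sentence(
    "After the storm the road flooded", toy_features, toy_classifier)
```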
The construction device for a causality knowledge base provided in an embodiment of the present invention may further include:
A reprocessing module, configured to: after it is determined that the set of identified entity pairs is the causality knowledge base, pair the identified entity pairs by Cartesian product and determine the resulting entity pairs to be new entity pairs; perform a clustering operation on the entities contained in the new entity pairs to obtain multiple set pairs each made up of causality entity sets, wherein each causality entity set contains reason entities or result entities that were classified into the same class by the clustering operation; and retain, among the new entity pairs contained in each set pair, those whose co-occurrence frequency in the data source is greater than a frequency threshold, and delete the other entity pairs.
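The Cartesian-product pairing and the frequency filter can be sketched as follows. The sketch makes several assumptions that the text does not confirm: the product is taken between all reason entities and all result entities, co-occurrence frequency is counted as the number of sentences containing both entities, and the clustering step is omitted entirely. All function names and sample data are hypothetical.

```python
from itertools import product


def candidate_pairs(entity_pairs):
    """Pair every reason entity with every result entity (Cartesian product)."""
    reasons = {reason for reason, _ in entity_pairs}
    results = {result for _, result in entity_pairs}
    return set(product(reasons, results))


def co_occurrence(pair, sentences):
    """Assumed measure: number of sentences that mention both entities."""
    reason, result = pair
    return sum(1 for s in sentences if reason in s and result in s)


def filter_pairs(entity_pairs, sentences, frequency_threshold):
    """Keep only new entity pairs whose co-occurrence exceeds the threshold."""
    kept = {}
    for pair in candidate_pairs(entity_pairs):
        frequency = co_occurrence(pair, sentences)
        if frequency > frequency_threshold:
            kept[pair] = frequency
    return kept


identified = [("rain", "flood"), ("smoking", "cancer")]
corpus = [
    "rain caused flood",
    "heavy rain and flood again",
    "smoking causes cancer",
    "rain and cancer",
]
retained = filter_pairs(identified, corpus, frequency_threshold=1)
```

The retained frequencies are exactly the values that the adding module would then write into the knowledge base.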
The construction device for a causality knowledge base provided in an embodiment of the present invention may further include:
An adding module, configured to: add the co-occurrence frequency of each retained new entity pair to the causality knowledge base.
The construction device for a causality knowledge base provided in an embodiment of the present invention may further include:
A computing module, configured to: calculate the support of each new entity pair according to the following formula, and add the support of each new entity pair to the causality knowledge base:
SupportNum = (α * Adverb + β * SentenceType + γ * Emotion) * Negative;
Wherein α, β and γ are preset weight coefficients, α > β > γ, and α + β + γ = 1; SupportNum is the support; Adverb is the score corresponding to the degree adverb contained in the sentence corresponding to the new entity pair; SentenceType is the score corresponding to the causality prompt word contained in the sentence corresponding to the new entity pair; Emotion is the score corresponding to the emotion word contained in the sentence corresponding to the new entity pair; and Negative is the score corresponding to the negative word contained in the sentence corresponding to the new entity pair.
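The support formula can be written out as a short function. This is a minimal sketch: the function name `support_num`, the default weights 0.5, 0.3 and 0.2, and the adverb and emotion scores in the example are hypothetical; only the constraints α > β > γ and α + β + γ = 1, the formula itself and the value 0.7 for a nested causal conjunction come from the text.

```python
def support_num(adverb, sentence_type, emotion, negative,
                alpha=0.5, beta=0.3, gamma=0.2):
    """SupportNum = (α·Adverb + β·SentenceType + γ·Emotion) · Negative.

    The default weights are illustrative only; they merely satisfy
    α > β > γ and α + β + γ = 1 as the text requires.
    """
    if not alpha > beta > gamma:
        raise ValueError("weights must satisfy alpha > beta > gamma")
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if negative not in (1, -1):
        raise ValueError("negative word score must be 1 or -1")
    return (alpha * adverb + beta * sentence_type + gamma * emotion) * negative


# Hypothetical adverb and emotion scores; 0.7 is the polarity value that
# Table 3 gives for a nested causal conjunction.
example_support = support_num(adverb=0.8, sentence_type=0.7, emotion=0.5, negative=1)
```

Because Negative is either 1 or -1, it only flips the sign of the weighted sum and never changes its magnitude.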
The construction device for a causality knowledge base provided in an embodiment of the present invention may further include:
A comparison module, configured to: compare a reason entity to be predicted with each reason entity in the most recently obtained causality knowledge base, and output the result entity contained in the new entity pair corresponding to the reason entity that matches the reason entity to be predicted, the co-occurrence frequency corresponding to that new entity pair, and the support corresponding to that new entity pair.
An embodiment of the present invention also provides construction equipment for a causality knowledge base, which may include:
A memory, configured to store a computer program;
A processor, configured to implement the steps of any of the above construction methods of a causality knowledge base when executing the computer program.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of any of the above construction methods of a causality knowledge base are implemented.
It should be noted that, for an explanation of the relevant parts of the construction device, the construction equipment and the computer-readable storage medium provided in the embodiments of the present invention, reference is made to the detailed description of the corresponding parts of the construction method of a causality knowledge base provided in the embodiments of the present invention, which is not repeated here. In addition, those parts of the above technical solution whose implementation principle is consistent with that of the corresponding technical solutions in the prior art are not described in detail, to avoid excessive repetition.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.