CN109308323A - Method, apparatus and device for constructing a causality knowledge base - Google Patents

Publication number: CN109308323A
Authority: CN (China)
Prior art keywords: causality, entity, sentence, knowledge base, cause
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN201811494944.3A
Other languages: Chinese (zh)
Inventors: 高云龙, 朱明�, 郝志成, 吴川
Current assignee: Changchun Institute of Optics, Fine Mechanics and Physics, CAS (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Changchun Institute of Optics, Fine Mechanics and Physics, CAS
Application filed by Changchun Institute of Optics, Fine Mechanics and Physics, CAS
Priority to CN201811494944.3A
Publication of CN109308323A

Abstract

The invention discloses a method, apparatus and device for constructing a causality knowledge base, and a computer-readable storage medium. The method comprises the following steps. First, text data is acquired from a data source and processed into multiple sentences. Next, the causal sentences (sentences in which a causal relationship exists) are identified among those sentences, the entity pairs they contain are extracted, and the set of extracted entity pairs is taken as the causality knowledge base. Each entity pair comprises a cause entity and a result entity. Finally, each time a preset time period has elapsed since the text data was acquired, the method judges whether the total amount of changed data in the data source has reached a data-quantity threshold. If it has, the method returns to the step of acquiring text data from the data source. If it has not, the knowledge base is not rebuilt. This never-ending learning framework keeps the causality knowledge base up to date, which in turn ensures accuracy when results are predicted from it.

Description

Method, apparatus and device for constructing a causality knowledge base
Technical field
The present invention relates to the field of data analysis. More specifically, it relates to a method, apparatus and device for constructing a causality knowledge base, and to a computer-readable storage medium.
Background art
With the rapid development of Internet technology, every user has become an author of data and presents an individual voice, which has driven explosive growth in data.
People use text to express their daily behavior and social emotions, so text data contains a large amount of individually summarized life experience. This includes rich associations between things, and even causal relationships. Causality is the main form of connection between things and is widely used in fields such as economics, medicine, the military and security. Result prediction usually requires a causality knowledge base that records cause entities and their corresponding result entities. For a cause whose outcome is to be predicted, the corresponding result is looked up in this knowledge base, so the knowledge base plays a very important role in result prediction. In the prior art, a causality knowledge base is typically built once from a fixed body of data and then reused for every later prediction. However, causal relationships may change over time: relationships may be added, removed or altered. Basing every subsequent prediction on a knowledge base that was built only once therefore leads to low accuracy.
In summary, prior-art techniques for constructing a causality knowledge base give low accuracy when the knowledge base is used for result prediction.
Summary of the invention
The object of the present invention is to provide a method, apparatus and device for constructing a causality knowledge base, and a computer-readable storage medium. These solve the problem of low prediction accuracy in prior-art techniques for constructing causality knowledge bases.
To achieve the above object, the invention provides the following technical solutions:
A method for constructing a causality knowledge base, comprising:
acquiring text data from a data source, and processing the text data into multiple sentences;
determining which of the resulting sentences are causal sentences (sentences in which a causal relationship exists), identifying the entity pairs contained in the causal sentences, and taking the set of identified entity pairs as the causality knowledge base, where each entity pair comprises a cause entity and a result entity;
each time a preset time period has elapsed since the text data was acquired from the data source, judging whether the total amount of changed data in the data source has reached a data-quantity threshold; if so, returning to the step of acquiring text data from the data source; if not, determining that the causality knowledge base need not be rebuilt.
Preferably, determining whether a given sentence is a causal sentence comprises:
taking that sentence as the current sentence and judging whether it contains any explicit causality cue word from an explicit causality cue word set. If it does, the current sentence is determined to be a causal sentence; if not, it is determined not to be. An explicit causality cue word is a cue word indicating that a causal relationship definitely exists.
Preferably, before determining that the current sentence is not a causal sentence, the method further comprises:
if the current sentence contains no explicit causality cue word, judging whether it contains any fuzzy causality cue word from a fuzzy causality cue word set. If it does, the current sentence is converted into a feature vector and input into a pre-created classifier. If the classifier outputs a preset value, the current sentence is determined to be a causal sentence; otherwise it is determined not to be. If the sentence contains no fuzzy causality cue word either, the step of determining that the current sentence is not a causal sentence is executed. A fuzzy causality cue word is a cue word indicating that a causal relationship may exist. The classifier is trained on feature vectors converted from multiple sentences that contain fuzzy causality cue words, together with labels indicating whether each sentence is a causal sentence.
Preferably, after the set of identified entity pairs has been taken as the causality knowledge base, the method further comprises:
pairing the identified entities by Cartesian product, and taking the resulting entity pairs as novel entity pairs;
performing a clustering operation on the entities contained in the novel entity pairs to obtain multiple set pairs composed of causality entity sets, where each causality entity set contains the cause entities, or the result entities, that the clustering operation assigned to the same class;
retaining, in each set pair, the novel entity pairs whose co-occurrence frequency in the data source exceeds a frequency threshold, and deleting the other entity pairs.
Preferably, the method further comprises:
adding the co-occurrence frequency of each retained novel entity pair to the causality knowledge base.
Preferably, the method further comprises:
calculating the support of each novel entity pair according to the following formula, and adding the support of each novel entity pair to the causality knowledge base:
$$\mathrm{SupportNum} = (\alpha \cdot \mathrm{Adverb} + \beta \cdot \mathrm{SentenceType} + \gamma \cdot \mathrm{Emotion}) \cdot \mathrm{Negative}$$
where α, β and γ are preset weight coefficients with α > β > γ and α + β + γ = 1, and SupportNum is the support. The remaining terms are scores derived from the sentence corresponding to the novel entity pair. Adverb is the score for the degree adverbs in that sentence. SentenceType is the score for its causality cue word. Emotion is the score for its emotion words. Negative is the score for its negation words.
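The following minimal Python sketch shows the support formula. The weights 0.5, 0.3 and 0.2 are only an illustrative choice that satisfies α > β > γ and α + β + γ = 1, because the patent gives no values. The patent also does not say how the Adverb, SentenceType, Emotion and Negative scores are derived, so here they are passed in as plain numbers. Treating Negative as -1 for a negated sentence and 1 otherwise is an assumption.

```python
import math


def support_num(adverb: float, sentence_type: float, emotion: float,
                negative: float, alpha: float = 0.5, beta: float = 0.3,
                gamma: float = 0.2) -> float:
    """SupportNum = (alpha*Adverb + beta*SentenceType + gamma*Emotion) * Negative.

    The default weights are illustrative assumptions, not values from the patent.
    """
    # Enforce the stated constraints on the weights.
    if not alpha > beta > gamma:
        raise ValueError("weights must satisfy alpha > beta > gamma")
    if not math.isclose(alpha + beta + gamma, 1.0):
        raise ValueError("weights must sum to 1")
    return (alpha * adverb + beta * sentence_type + gamma * emotion) * negative


# Hypothetical scores for one novel entity pair; negative=-1 marks a negated sentence.
print(support_num(adverb=0.8, sentence_type=0.6, emotion=0.4, negative=-1))
```

Validating the weights up front means a misconfigured α, β, γ fails loudly rather than silently skewing every support value.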
Preferably, the method further comprises:
comparing a cause entity to be predicted with each cause entity in the most recently built causality knowledge base. For each novel entity pair whose cause entity matches the cause entity to be predicted, the method outputs the pair's result entity, its co-occurrence frequency and its support.
An apparatus for constructing a causality knowledge base, comprising:
a preprocessing module, configured to acquire text data from a data source and process the text data into multiple sentences;
a construction module, configured to determine which of the resulting sentences are causal sentences, identify the entity pairs contained in the causal sentences, and take the set of identified entity pairs as the causality knowledge base, where each entity pair comprises a cause entity and a result entity;
an incremental learning module, configured to judge, each time a preset time period has elapsed since the text data was acquired from the data source, whether the total amount of changed data in the data source has reached a data-quantity threshold. If so, it returns to the step of acquiring text data from the data source; if not, it determines that the causality knowledge base need not be rebuilt.
A device for constructing a causality knowledge base, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of any of the above methods for constructing a causality knowledge base when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above methods for constructing a causality knowledge base.
The present invention provides a method, apparatus and device for constructing a causality knowledge base, and a computer-readable storage medium. In the method, text data is acquired from a data source and processed into multiple sentences. The causal sentences among them are identified, the entity pairs they contain are extracted, and the set of extracted entity pairs is taken as the causality knowledge base. Each entity pair comprises a cause entity and a result entity. Each time a preset time period has elapsed since the text data was acquired, the method judges whether the total amount of changed data in the data source has reached a data-quantity threshold. If it has, the method returns to the step of acquiring text data from the data source; if it has not, the knowledge base is not rebuilt.
In the technical solution disclosed in this application, the text data in the data source is processed into sentences, and the entity pairs in the causal sentences are identified to form a causality knowledge base. Causal relationships can then be recognized, and results predicted, on the basis of this knowledge base. After each period, the method also judges whether enough data has changed in the data source. If so, the causal relationships in the data source are considered likely to have changed substantially, and the knowledge base is rebuilt; otherwise, the method waits for the next check. The method therefore keeps checking the data source after the knowledge base has been built, and rebuilds the knowledge base whenever its causal relationships may have changed substantially. This never-ending learning framework keeps the causality knowledge base consistent with the causal relationships in the data source. The knowledge base stays current, which in turn ensures accuracy when results are predicted from it.
Brief description of the drawings
The drawings required to describe the embodiments of the present invention and the prior art are briefly introduced below, to explain their technical solutions more clearly. The drawings described below show only some embodiments of the present invention. Those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for constructing a causality knowledge base provided by an embodiment of the present invention;
Fig. 2 is an example diagram of the results of the clustering operation in a method for constructing a causality knowledge base provided by an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of an apparatus for constructing a causality knowledge base provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, which shows a flowchart of a method for constructing a causality knowledge base provided by an embodiment of the present invention, the method may include:
S11: acquiring text data from a data source, and processing the text data into multiple sentences.
The method provided by this embodiment may be executed by a corresponding construction apparatus. In a big-data environment, knowledge extraction depends above all on choosing data that is rich in content, covers a wide range of fields and is widely recognized. This embodiment uses the Chinese Wikipedia as the data source. It is updated in real time, can be accessed at any time, and is the largest online resource system on the Internet. Most of its content has also been edited and verified several times by different users, which ensures that the data is comprehensive and accurate. Natural language processing is applied to the text data acquired from the data source to obtain the corresponding sentences, and all of these sentences can be written to a sentence-set XML file. Processing text data into sentences follows the same principle as the corresponding prior-art techniques and is not described further here.
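The patent leaves sentence splitting to existing NLP techniques. One possible approach is sketched below; it is an assumption, not the patent's procedure. Chinese text is split on sentence-final punctuation and the sentences are serialized to XML. The delimiter set and the element names `sentences` and `sentence` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Split after Chinese/ASCII sentence-final punctuation, or at line breaks.
# This delimiter set is an illustrative assumption.
_SPLIT = re.compile(r"(?<=[。！？!?])|\n")


def split_sentences(text: str) -> list[str]:
    """Return the non-empty, whitespace-trimmed sentences in text."""
    return [part.strip() for part in _SPLIT.split(text) if part.strip()]


def sentences_to_xml(sentences: list[str]) -> str:
    """Serialize sentences to an XML string (hypothetical element names)."""
    root = ET.Element("sentences")
    for i, sentence in enumerate(sentences):
        node = ET.SubElement(root, "sentence", id=str(i))
        node.text = sentence
    return ET.tostring(root, encoding="unicode")


text = "由于气候变暖，积雪融化。导致多地受灾！\n百姓遭受损失。"
print(sentences_to_xml(split_sentences(text)))
```

A zero-width lookbehind keeps the punctuation attached to its sentence. Empty fragments produced around line breaks are dropped by the filter.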
S12: determining which of the resulting sentences are causal sentences, identifying the entity pairs contained in the causal sentences, and taking the set of identified entity pairs as the causality knowledge base, where each entity pair comprises a cause entity and a result entity.
Here an entity is a Chinese word or phrase; for example, "teacher", "temperature rise" and "incur losses" are all entities. A causal sentence is a sentence in which a causal relationship exists. It contains an entity indicating the cause (the cause entity) and an entity indicating the result (the result entity). After the causal sentences have been determined, the entity pairs they contain are identified, and the set of these pairs forms the causality knowledge base. Identifying the entity pairs in causal sentences follows the same principle as the corresponding prior-art techniques and is not described further here. Once the knowledge base exists, causal relationships can be recognized from it and results predicted. To predict the result of a cause, its cause entity is compared with the cause entities in the knowledge base. If the knowledge base contains a matching (identical) cause entity, the result entity of the pair containing it is taken as the predicted result of that cause.
S13: each time a preset time period has elapsed since the text data was acquired from the data source, judging whether the total amount of changed data in the data source has reached a data-quantity threshold; if so, returning to the step of acquiring text data from the data source; if not, determining that the causality knowledge base need not be rebuilt.
The specific values of the preset time period and the data-quantity threshold can be set according to actual needs. Starting from the moment text data was last acquired from the data source, the method checks after each period whether enough data has changed, and so determines whether the knowledge base needs to be rebuilt. If the amount of changed data is large enough, the causal relationships in the data source are considered likely to have changed substantially. The knowledge base is then rebuilt, and the newly built knowledge base is used for all later result predictions. This embodiment therefore uses the amount of change in the data source to decide periodically whether to rebuild the knowledge base, and rebuilds it only when the change is sufficiently large. This keeps the knowledge base consistent with the changed data, complete and practical. The approach is a never-ending learning framework whose incremental extraction mode keeps the causality knowledge base current.
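The periodic check in S13 reduces to a small decision function, sketched below as an illustrative assumption. The patent does not specify how the "total amount of changed data" is measured (edited articles, bytes or revisions), so the caller supplies it as a number. The names `RebuildPolicy` and `decide` are hypothetical, and a scheduler would call `decide` once per period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RebuildPolicy:
    period_seconds: float   # the "preset time period"
    change_threshold: int   # the "data-quantity threshold"


def decide(last_fetch_time: float, now: float, changed_amount: int,
           policy: RebuildPolicy) -> str:
    """Return 'wait', 'rebuild' or 'keep' for the S13 check.

    'wait'    - the preset period since the last fetch has not elapsed yet;
    'rebuild' - enough data changed: go back to S11 (fetch) and S12 (build);
    'keep'    - too little changed: keep the current knowledge base.
    """
    if now - last_fetch_time < policy.period_seconds:
        return "wait"
    if changed_amount >= policy.change_threshold:
        return "rebuild"
    return "keep"


policy = RebuildPolicy(period_seconds=24 * 3600, change_threshold=10_000)
print(decide(last_fetch_time=0, now=24 * 3600, changed_amount=12_500, policy=policy))  # rebuild
```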
In the technical solution disclosed in this application, the text data in the data source is processed into sentences, and the entity pairs in the causal sentences are identified to form a causality knowledge base from which causal relationships can be recognized. After each period, the method judges whether enough data has changed in the data source. If so, the causal relationships in the data source are considered likely to have changed substantially and the knowledge base is rebuilt; otherwise, the method waits for the next check. Once the knowledge base has been built, the method therefore keeps checking the data source and rebuilds the knowledge base whenever its causal relationships may have changed substantially. This never-ending learning framework keeps the knowledge base consistent with the data source and current, so results predicted from it remain accurate.
In the method provided by this embodiment, determining whether a given sentence is a causal sentence may include:
taking that sentence as the current sentence and judging whether it contains any explicit causality cue word from an explicit causality cue word set. If it does, the current sentence is determined to be a causal sentence; if not, it is determined not to be. An explicit causality cue word is a cue word indicating that a causal relationship definitely exists.
An explicit causality cue word indicates that the corresponding sentence definitely contains a causal relationship; examples are "because" and "lead to". Staff can compile the explicit causality cue word set in advance. Any sentence that contains a word from this set is considered a causal sentence; otherwise, it is not considered a causal sentence. This determines quickly and effectively whether a sentence is a causal sentence.
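The explicit-cue test is a substring check against a word list, as in the minimal sketch below. Only 因为 ("because") and 导致 ("lead to") correspond to the examples in the text; the other entries in `EXPLICIT_CUES` are assumptions. A production system would match segmented words rather than raw substrings to avoid accidental matches.

```python
# Hypothetical explicit causality cue set; only 因为 and 导致 come from the text.
EXPLICIT_CUES = frozenset({"因为", "导致", "由于", "所以", "因此", "造成"})


def has_explicit_cue(sentence: str, cues=EXPLICIT_CUES) -> bool:
    """True if the sentence contains any explicit causality cue word."""
    return any(cue in sentence for cue in cues)


print(has_explicit_cue("因为下雨，比赛取消了。"))  # True
print(has_explicit_cue("今天天气很好。"))          # False
```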
In the method provided by this embodiment, before the current sentence is determined not to be a causal sentence, the method may further include:
if the current sentence contains no explicit causality cue word, judging whether it contains any fuzzy causality cue word from a fuzzy causality cue word set. If it does, the current sentence is converted into a feature vector and input into a pre-created classifier. If the classifier outputs a preset value, the current sentence is determined to be a causal sentence; otherwise it is determined not to be. If the sentence contains no fuzzy causality cue word either, the step of determining that the current sentence is not a causal sentence is executed. A fuzzy causality cue word is a cue word indicating that a causal relationship may exist. The classifier is trained on feature vectors converted from multiple sentences that contain fuzzy causality cue words, together with labels indicating whether each sentence is a causal sentence.
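Combining the two cue sets with the classifier gives a three-way cascade, sketched below under several assumptions. The cue lists are placeholders: 于是 ("then") and 随后 ("afterwards") are guesses at the fuzzy cues the text mentions. The preset output value is assumed to be 1. `classifier` can be any callable that turns a sentence into a feature vector and returns 0 or 1; a stub stands in for the trained model here.

```python
from typing import Callable, Iterable


def is_causal_sentence(sentence: str,
                       explicit_cues: Iterable[str],
                       fuzzy_cues: Iterable[str],
                       classifier: Callable[[str], int],
                       preset: int = 1) -> bool:
    """Explicit cue -> causal; fuzzy cue -> ask the classifier; neither -> not causal."""
    if any(cue in sentence for cue in explicit_cues):
        return True
    if any(cue in sentence for cue in fuzzy_cues):
        return classifier(sentence) == preset
    return False


EXPLICIT = ("因为", "导致")   # from the examples in the text
FUZZY = ("于是", "随后")      # hypothetical fuzzy cue words


def stub_classifier(sentence: str) -> int:
    """Stand-in for the trained classifier: flags sentences mentioning 暴雨 (rainstorm)."""
    return 1 if "暴雨" in sentence else 0


print(is_causal_sentence("暴雨过后，随后河水上涨。", EXPLICIT, FUZZY, stub_classifier))  # True
```

The classifier is only consulted when a fuzzy cue is present, so most sentences are decided by cheap string checks.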
A fuzzy causality cue word indicates that the corresponding sentence may contain a causal relationship, but does not settle whether one exists; examples are "then" and "following". Staff can compile the fuzzy causality cue word set in advance. A sentence that has no explicit causality cue word but does contain a fuzzy causality cue word from this set is considered a possible causal sentence; otherwise, it is not considered a causal sentence. This further improves the accuracy of judging whether a sentence is a causal sentence.
Existing techniques extract relations only from sentences that contain causality cue words, so identifying these cue words comprehensively and accurately greatly improves the quality of the extracted causal relationships. This embodiment draws on knowledge of Chinese language and literature to summarize existing causality cue words comprehensively. This allows different kinds of causal relationship to be classified accurately and lets the method judge in several stages whether a sentence is a causal sentence. The classifier's judgement relies on LTP natural language processing technology, using part-of-speech rules, syntactic dependency relations, principal component analysis and similar techniques. Specifically, a training set can be labeled manually. Each training sample consists of a feature vector converted from a sentence that contains a fuzzy causality cue word, together with a label indicating whether that sentence is a causal sentence. Training on this set yields a classifier that recognizes causal sentences among those containing fuzzy causality cue words, with high accuracy. The classifier can be implemented with a naive Bayes algorithm.
Put simply, deciding whether a sentence that contains a fuzzy causality cue word is a causal sentence is a binary classification problem: the sentence either is or is not a causal sentence (a 0/1 problem). Using machine learning, a sentence can be converted into a feature vector. For example, the feature vector of sentence X is $X = (x_1, x_2, \ldots, x_n)$, where $x_i$ ($i = 1, \ldots, n$) is the numerical representation of the sentence's word sequence. Similarly, the classification decision variable for causal sentences is $C = \{1, 0\}$, where 1 means the sentence is a causal sentence and 0 means it is not. Given a training set $\{X_1, X_2, \ldots, X_m\}$, a machine learning algorithm learns the following mapping, which decides whether a target sentence is a causal sentence:
$$f: X \rightarrow C$$
Machine learning is used to learn the classifier f from the manually labeled training set, so that f can judge whether any new sentence is a causal sentence.
This embodiment implements the classifier with a naive Bayes algorithm because naive Bayes text classification offers stable classification efficiency and good classification performance. The algorithm is relatively insensitive to missing data. It also treats the components of the feature vector as relatively independent given the decision variable, which gives it clear advantages over similar algorithms in adaptability and complexity. An examination of sentences containing fuzzy causality cue words shows that their classification depends mainly on the causality cue word and on context-dependent lexical and syntactic features. Chinese expression is semantically free and diverse, so these features are only weakly dependent on one another. Chinese also emphasizes meaning over form (parataxis), so most causal sentences lack a complete syntactic structure, which means the training samples have missing data. Given these characteristics, a naive Bayes algorithm is used to classify the sentences. The posterior probability can be calculated by the following equation:
$$P(C \mid X) = \frac{P(C) \prod_{i=1}^{n} P(x_i \mid C)}{P(X)}$$
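The sketch below is a generic multinomial naive Bayes classifier for the 0/1 causal-sentence problem. It scores each class as log P(C) + Σ log P(x_i | C) and returns the higher-scoring class. It illustrates the technique and is not the patent's implementation. Sentences are assumed to be pre-segmented into word lists (the patent uses LTP for this), Laplace smoothing with α = 1 is an assumed choice, and the toy training data is made up.

```python
import math
from collections import Counter


class NaiveBayesCausal:
    """Multinomial naive Bayes over word lists; labels are 1 (causal) or 0 (non-causal)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs: list[list[str]], labels: list[int]) -> "NaiveBayesCausal":
        self.classes = sorted(set(labels))
        self.vocab = {word for doc in docs for word in doc}
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc: list[str]) -> dict[int, float]:
        """Unnormalised log posterior: log P(c) + sum_i log P(x_i | c)."""
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            denom = self.totals[c] + self.alpha * v
            scores[c] = self.log_prior[c] + sum(
                math.log((self.word_counts[c][w] + self.alpha) / denom) for w in doc)
        return scores

    def predict(self, doc: list[str]) -> int:
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Toy, hand-labelled training data (pre-segmented words).
docs = [["暴雨", "于是", "河水", "上涨"], ["降温", "于是", "结冰"],
        ["会议", "于是", "结束"], ["随后", "大家", "离开"]]
labels = [1, 1, 0, 0]
model = NaiveBayesCausal().fit(docs, labels)
print(model.predict(["暴雨", "于是", "上涨"]))  # 1
```

Working in log space avoids floating-point underflow on long sentences. P(X) is dropped because it is the same for both classes and does not change which class scores higher.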
In the method provided by this embodiment, after the set of identified entity pairs has been taken as the causality knowledge base, the method may further include:
pairing the identified entities by Cartesian product, and taking the resulting entity pairs as novel entity pairs;
performing a clustering operation on the entities contained in the novel entity pairs to obtain multiple set pairs composed of causality entity sets, where each causality entity set contains the cause entities, or the result entities, that the clustering operation assigned to the same class;
retaining, in each set pair, the novel entity pairs whose co-occurrence frequency in the data source exceeds a frequency threshold, and deleting the other entity pairs.
The entity pairs obtained in step S12 may be redundant or meaningless, which makes it difficult to turn them into knowledge. This embodiment therefore clusters the identified entity pairs based on entity similarity. All the identified entities (cause entities and result entities) are first paired by Cartesian product to obtain multiple novel entity pairs. These pairs are then clustered into multiple set pairs, each containing one causality entity set of cause entities and one of result entities. Only the representative entities in each set are kept: novel entity pairs whose co-occurrence frequency in the data source exceeds a frequency threshold, set according to actual needs, are retained, and the rest are deleted. The co-occurrence frequency of a novel entity pair is the number of sentences in the data source in which the pair occurs together, divided by the total number of sentences in the data source. This selects the most representative pairs among all possible entity pairs, which in turn helps ensure accurate result prediction with the causality knowledge base.
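The pairing and frequency filter can be sketched as follows, under two assumptions. First, cause entities are paired with result entities, as in the worked example, rather than every entity with every other. Second, co-occurrence is approximated by both strings appearing as substrings of the same sentence. The clustering step is omitted because the patent does not name a clustering algorithm, and the threshold value and example sentences are illustrative.

```python
from itertools import product


def pair_entities(causes: list[str], effects: list[str]) -> list[tuple[str, str]]:
    """Cartesian product of cause entities and result entities."""
    return list(product(causes, effects))


def cooccurrence_frequency(pair: tuple[str, str], sentences: list[str]) -> float:
    """Share of all sentences in which both entities of the pair appear."""
    if not sentences:
        return 0.0
    cause, effect = pair
    hits = sum(1 for s in sentences if cause in s and effect in s)
    return hits / len(sentences)


def filter_pairs(pairs: list[tuple[str, str]], sentences: list[str],
                 threshold: float) -> dict[tuple[str, str], float]:
    """Keep pairs whose co-occurrence frequency exceeds the threshold."""
    kept = {}
    for pair in pairs:
        freq = cooccurrence_frequency(pair, sentences)
        if freq > threshold:
            kept[pair] = freq
    return kept


sentences = ["气温上升导致积雪融化。", "山洪导致多地受灾。",
             "山洪造成严重损失。", "今天天气晴朗。"]
pairs = pair_entities(["气温上升", "积雪融化", "山洪"], ["受灾", "损失"])
print(filter_pairs(pairs, sentences, threshold=0.2))
# {('山洪', '受灾'): 0.25, ('山洪', '损失'): 0.25}
```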
The following example illustrates this embodiment. Take the sentence: "Because of climate warming, the snowcap melted and triggered snowmelt flash floods, causing disaster in many areas and serious economic losses for the people." The entity pair expressing the causal relationship is: temperature rise, snow melt, flash flood → disaster, losses. Performing Cartesian-product pairing on this entity pair yields the following new entity pairs:
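The Cartesian-product pairing in this example can be sketched as below. The sketch assumes that every cause entity is paired with every effect entity of the same causal sentence; the entity strings are English renderings of the example, and `cartesian_pairs` is a hypothetical helper name.

```python
from itertools import product


def cartesian_pairs(causes: list[str], effects: list[str]) -> list[tuple[str, str]]:
    """Pair every cause entity with every effect entity."""
    return list(product(causes, effects))


causes = ["temperature rise", "snow melt", "flash flood"]
effects = ["disaster", "losses"]
for cause, effect in cartesian_pairs(causes, effects):
    print(f"{cause} -> {effect}")
```

Running it prints six pairs, from `temperature rise -> disaster` to `flash flood -> losses`.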
Using a TF-IDF-based method over the data source, the co-occurrence frequency in the data source of the cause part and the effect part of each new entity pair is counted. If the co-occurrence frequency is below a certain threshold, the pair is discarded as an untrusted causal relationship; otherwise it is retained as a trusted one, and its co-occurrence frequency is kept as the trust index of that causal relationship for later use. Clustering all entities contained in the new entity pairs gives the result shown in Figure 2, and the most representative entities are retained. When "overcast sky" and "thunder and lightning" are input, the following results are obtained:
Here, 0.38, 0.52 and 0.78 are the co-occurrence frequencies of the corresponding new entity pairs; each value can also be read as the strength with which that cause leads to that effect.
The method for constructing a causality knowledge base provided by an embodiment of the present invention may further include:
Adding the co-occurrence frequency of each retained new entity pair to the causality knowledge base.
Note that the co-occurrence frequency of a new entity pair can be regarded as the strength with which the cause in the pair leads to its corresponding effect. Adding it to the causality knowledge base therefore makes it available for queries when needed and further refines the knowledge base.
The method for constructing a causality knowledge base provided by an embodiment of the present invention may further include:
Calculating the support of each new entity pair according to the following formula, and adding the support of each new entity pair to the causality knowledge base:
SupportNum=(α * Adverb+ β * SentenceType+ γ * Emotion) * Negative;
where α, β and γ are preset weight coefficients with α > β > γ and α + β + γ = 1; SupportNum is the support; Adverb is the score of the degree adverb contained in the sentence corresponding to the new entity pair; SentenceType is the score of the causality cue word in that sentence; Emotion is the score of the emotion word in that sentence; and Negative is the score of the negative word in that sentence.
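A direct transcription of the support formula into Python might look like the following sketch. The default weights α = 0.5, β = 0.3, γ = 0.2 are hypothetical values chosen only to satisfy α > β > γ and α + β + γ = 1; the four word scores are assumed to have been looked up already (for example, 0.7 for a nested causal conjunction in Table 3), and `negative` is passed in as the ±1 negative-word score.

```python
import math


def support(
    adverb: float,
    sentence_type: float,
    emotion: float,
    negative: float,
    alpha: float = 0.5,
    beta: float = 0.3,
    gamma: float = 0.2,
) -> float:
    """SupportNum = (alpha*Adverb + beta*SentenceType + gamma*Emotion) * Negative."""
    if not alpha > beta > gamma:
        raise ValueError("weights must satisfy alpha > beta > gamma")
    if not math.isclose(alpha + beta + gamma, 1.0):
        raise ValueError("weights must sum to 1")
    return (alpha * adverb + beta * sentence_type + gamma * emotion) * negative


print(support(adverb=0.8, sentence_type=0.7, emotion=0.5, negative=1))
```

With these hypothetical inputs the result is approximately 0.71; flipping `negative` to -1 flips the sign of the support.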
The specific values of the weight coefficients and the scores of the various words can be set according to actual needs. Building on the extracted causal relationships, this embodiment mines the emotion words, degree adverbs, causality cue words and negative words in each sentence and uses them to calculate the support of the causal relationship between a cause entity and the corresponding effect entity. Note that the support and the strength of a causal relationship both describe how strongly the cause entity influences the effect entity in a causal sentence, i.e. how likely the cause part is to lead to the effect; support and strength express this meaning from different perspectives.
A degree adverb is an adverb that modifies or limits another adverb or an adjective in terms of degree, and it expresses the semantic intensity or degree of the word it modifies. Judging from the semantic role that degree adverbs play in text, both relative and absolute degree adverbs express the degree of the sentiment polarity of the corresponding traits in the text. Therefore, based on the magnitude classification of degree adverbs in Chinese knowledge resources, the degree adverbs are adjusted appropriately to the requirements of causal support and assigned different polarity values (scores), as shown in Table 1:
Table 1 Degree adverb polarity values
An emotion word in a sentence expresses the actor's commendatory or derogatory attitude toward the object acted on and carries a certain emotion, so it also has a degree of polarity effect in the expression of causal relationships. This embodiment therefore treats Chinese emotion words as a weak factor influencing causal support. A score can be set in advance for each emotion word according to actual needs; in general, the more obvious the commendatory or derogatory tendency and the heavier the emotion, the larger the score. Alternatively, based on a pre-obtained third-party emotion-vocabulary ontology, the emotion words can be divided into 7 major classes and 20 groups, and their polarity divided into five levels, 9, 7, 5, 3 and 1, in descending order; the classification of emotion words is shown in Table 2. Because an emotion word can have different levels in different groups, its score can be taken as the value obtained by weighting its levels across all groups, with weight coefficients set according to actual needs. For example, the emotion word "happiness" has level 9 in the group "happy" and level 7 in the group "at ease".
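If the weighting is taken to be a weighted sum, the emotion-word score described above could be computed as in this sketch. The weighted-sum interpretation, the group weights 0.6 and 0.4, and the function name `emotion_score` are assumptions for illustration; only the five levels 9, 7, 5, 3, 1 and the "happiness" levels come from the text.

```python
LEVELS = {9, 7, 5, 3, 1}  # the five allowed polarity levels


def emotion_score(levels_by_group: dict[str, int], group_weights: dict[str, float]) -> float:
    """Weighted sum of an emotion word's levels across the groups it belongs to."""
    for group, level in levels_by_group.items():
        if level not in LEVELS:
            raise ValueError(f"invalid level {level} in group {group!r}")
    return sum(group_weights[g] * level for g, level in levels_by_group.items())


# "happiness": level 9 in the "happy" group, level 7 in the "at ease" group.
happiness = {"happy": 9, "at ease": 7}
weights = {"happy": 0.6, "at ease": 0.4}  # hypothetical weights
print(emotion_score(happiness, weights))
```

With these hypothetical weights the score is approximately 8.2 (0.6 × 9 + 0.4 × 7).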
Table 2 Classification of emotion words
Causal relationships between entities expressed in Chinese through different causality cue words carry different semantic intensities. Causal semantics expressed by rigorous, auxiliary-type cue words have stronger support than causal semantics expressed by fuzzy cue words. To some extent this can be explained as follows: causal relationships extracted from a Chinese corpus on the basis of rigorous cue words carry greater certainty. Based on this reasoning, this embodiment assigns different causal support values to the extracted causal relationships according to their cue words, as shown in Table 3:
Table 3 Polarity values of causality cue words
Cue word type | Polarity value
Nested causal conjunction | 0.7
Single conjunction expressing cause and effect | 0.5
Preposition expressing cause and effect | 0.1
Adverb expressing cause and effect | 0.3
Verb expressing cause and effect | 0.3
Verb expressing the production of a result | 0.6
Causal relationships can generally be divided into positive associations and negative associations. In a positive association, cause and effect share the same trend, i.e. they increase or decrease together; in a negative association their trends are opposite. In other words, the cause either promotes the effect or inhibits it. Here this distinction is made on the basis of negative words: if a negative word appears in a sentence expressing cause and effect, the causal relationship is an inhibitory one. This embodiment therefore uses negative words to determine whether a causal relationship is positive or negative. Specifically, the negative-word score is 1 if a negative word is present in the sentence, and -1 otherwise; the negative words are listed in Table 4:
Table 4 Chinese negative words
Calculating, from the degree adverbs, causality cue words, emotion words and negative words contained in a sentence, a support value that represents the degree of influence within a causal relationship, and recording it in the causality knowledge base, further refines the knowledge base. Note that if several sentences contain the same new entity pair, the mean of the support values calculated from those sentences is used as the support of that pair. Once the strength and support of each new entity pair are known, the causal relationships can be divided into the categories of Table 5: for strong cause → strong effect, the support and the strength of the pair are each greater than or equal to their thresholds; for weak cause → strong effect, the support is greater than its threshold and the strength is less than its threshold; for weak cause → weak effect, the support and the strength are each less than or equal to their thresholds; and for strong cause → weak effect, the support is less than its threshold and the strength is greater than its threshold. Each threshold can be set according to actual needs, and the category of each new entity pair can also be added to the causality knowledge base for querying.
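The averaging rule and the four categories of Table 5 can be sketched as follows. The thresholds are hypothetical, and because the text is not consistent about whether a value exactly equal to a threshold counts as strong or weak, this sketch assumes that "greater than or equal to" means strong. The strength value 0.38 is borrowed from the example co-occurrence frequencies given earlier.

```python
from statistics import mean


def pair_support(sentence_supports: list[float]) -> float:
    """Support of a new entity pair = mean of its per-sentence support values."""
    return mean(sentence_supports)


def categorize(support: float, strength: float,
               support_threshold: float, strength_threshold: float) -> str:
    """Map (support, strength) to one of the four Table 5 categories.

    Support decides strong/weak effect; strength decides strong/weak cause.
    A value equal to its threshold is treated as strong (an assumption).
    """
    strong_effect = support >= support_threshold
    strong_cause = strength >= strength_threshold
    if strong_cause and strong_effect:
        return "strong cause -> strong effect"
    if strong_effect:
        return "weak cause -> strong effect"
    if strong_cause:
        return "strong cause -> weak effect"
    return "weak cause -> weak effect"


s = pair_support([0.6, 0.8])  # two sentences mention the same pair
print(categorize(s, strength=0.38, support_threshold=0.5, strength_threshold=0.5))
# weak cause -> strong effect
```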
Table 5 Causality categories
Strong cause → strong effect | Weak cause → weak effect
Weak cause → strong effect | Strong cause → weak effect
The method for constructing a causality knowledge base provided by an embodiment of the present invention may further include:
Comparing a cause entity to be predicted with each cause entity in the most recently obtained causality knowledge base, and outputting the effect entity contained in the new entity pair whose cause entity matches the cause entity to be predicted, together with that pair's co-occurrence frequency and support.
Any entity corresponding to a cause for which result prediction is required can serve as the cause entity to be predicted. Based on the most recently obtained causality knowledge base, the cause entities that match the cause entity to be predicted (identical to it, or with a similarity greater than a similarity threshold set in advance according to actual needs) are determined, and the effect entities corresponding to those cause entities are taken as the effects of the cause entity to be predicted, which realizes result prediction. The strength, support and category of the new entity pair to which each such effect entity belongs are output as well, so that the output is comprehensive and complete.
Conversely, if cause prediction is needed for an entity corresponding to an effect, that entity can be used as the effect entity to be predicted. It is compared with each effect entity in the most recently obtained causality knowledge base, and the cause entity contained in the new entity pair whose effect entity matches it is output, together with that pair's co-occurrence frequency and support. An effect entity matches the effect entity to be predicted if it is identical to it or if their similarity exceeds a similarity threshold set in advance according to actual needs. This realizes cause prediction; the category of the corresponding new entity pair can of course also be output, so that the output is comprehensive and complete.
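A minimal sketch of the forward (cause → effect) and backward (effect → cause) lookups is shown below. The text does not specify the similarity measure, so this sketch swaps in Python's standard `difflib.SequenceMatcher.ratio()` as a stand-in; the `CausalEntry` record, the threshold 0.8 and the knowledge-base contents are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class CausalEntry:
    cause: str
    effect: str
    cooccurrence: float  # strength
    support: float
    category: str


def matches(query: str, entity: str, threshold: float) -> bool:
    """Identical, or similarity above the threshold (SequenceMatcher as a stand-in)."""
    return query == entity or SequenceMatcher(None, query, entity).ratio() > threshold


def predict(kb: list[CausalEntry], query: str, direction: str = "effect",
            threshold: float = 0.8) -> list[tuple[str, float, float, str]]:
    """direction="effect": query is a cause, return effects; "cause": the reverse."""
    if direction not in ("effect", "cause"):
        raise ValueError("direction must be 'effect' or 'cause'")
    results = []
    for e in kb:
        key, other = (e.cause, e.effect) if direction == "effect" else (e.effect, e.cause)
        if matches(query, key, threshold):
            results.append((other, e.cooccurrence, e.support, e.category))
    return results


kb = [  # hypothetical knowledge-base contents
    CausalEntry("heavy rain", "flooding", 0.52, 0.71, "strong cause -> strong effect"),
    CausalEntry("heavy rain", "traffic delays", 0.38, 0.44, "weak cause -> weak effect"),
    CausalEntry("drought", "crop failure", 0.78, 0.65, "strong cause -> strong effect"),
]
print(predict(kb, "heavy rains"))                      # result prediction
print(predict(kb, "crop failure", direction="cause"))  # cause prediction
```

The near-match "heavy rains" still retrieves both effects of "heavy rain", because its similarity ratio (about 0.95) exceeds the threshold.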
The technical solution disclosed in this application mines causal relationships between entities against the background of Internet big data and presents them to users as causal entity pairs. On the one hand, this relieves the difficulties caused by "information overload"; on the other, it makes full use of big data and advances information technology. Meanwhile, an incremental, continuous machine-learning framework is used to extract causal entities incrementally, which keeps the causality knowledge base up to date. In short, in the context of Internet big data, the disclosed solution uses a never-ending learning framework and an incremental extraction mode, simultaneously learns positive and negative associations and causal association strength, and builds and gradually completes a causality knowledge base that supports qualitative reasoning and attribution explanation. In an "Internet + big data" environment it can make full use of social network resources and collect various kinds of data in real time; the resulting causality knowledge base can effectively advance the development of the information society and provide big-data analysis services for public needs and for the sharing of basic scientific and technological resource data, giving it broad prospects for industrial application.
An embodiment of the present invention further provides a device for constructing a causality knowledge base which, as shown in Figure 3, may include:
A preprocessing module 11, configured to obtain text data from a data source and process the text data into multiple sentences;
A construction module 12, configured to determine which of the processed sentences are causal sentences (sentences in which a causal relationship exists), identify the entity pairs contained in the causal sentences, and determine the set of identified entity pairs to be the causality knowledge base, where each entity pair includes a cause entity and an effect entity;
An incremental learning module 13, configured to judge, each time a preset time period has elapsed since the moment text data was obtained from the data source, whether the total amount of changed data in the data source has reached a data-quantity threshold; if so, to return to the step of obtaining text data from the data source, and if not, to determine that the causality knowledge base does not need to be rebuilt.
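The trigger logic of the incremental learning module can be sketched as a polling loop. This is an assumption-laden illustration: `fetch_and_build` and `count_changes` are hypothetical callbacks standing in for the preprocessing and construction steps and for measuring the changed data, and `max_cycles` exists only so that the sketch terminates (the described system would keep running).

```python
import time
from typing import Callable


def should_rebuild(changed_amount: int, threshold: int) -> bool:
    """Rebuild once the changed data in the source reaches the threshold."""
    return changed_amount >= threshold


def incremental_loop(fetch_and_build: Callable[[], None],
                     count_changes: Callable[[], int],
                     period_s: float, threshold: int, max_cycles: int) -> int:
    """Build once, then re-check every period_s seconds; return the number of builds."""
    fetch_and_build()
    builds = 1
    for _ in range(max_cycles):
        time.sleep(period_s)  # wait one preset time period
        if should_rebuild(count_changes(), threshold):
            fetch_and_build()  # return to the step of obtaining text data
            builds += 1
        # otherwise the knowledge base is not rebuilt in this cycle
    return builds


changes = iter([5, 20, 0])  # hypothetical changed-data amounts per check
print(incremental_loop(lambda: None, lambda: next(changes),
                       period_s=0, threshold=10, max_cycles=3))  # prints 2
```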
In the device for constructing a causality knowledge base provided by an embodiment of the present invention, the construction module may include:
A first judging unit, configured to take any sentence as the current sentence and judge whether it contains any explicit causality cue word from the explicit causality cue word set; if so, to determine that the current sentence is a causal sentence, and if not, to determine that it is not. An explicit causality cue word is a cue word indicating that a causal relationship definitely exists.
In the device for constructing a causality knowledge base provided by an embodiment of the present invention, the construction module may further include:
A second judging unit, configured to act before the current sentence is determined not to be a causal sentence: if the current sentence contains no explicit causality cue word from the explicit causality cue word set, it judges whether the current sentence contains any fuzzy causality cue word from the fuzzy causality cue word set. If it does, the current sentence is converted into a feature vector and input to a pre-created classifier; if the classifier outputs a preset value, the current sentence is determined to be a causal sentence, and otherwise it is determined not to be. If it does not, the second judging unit instructs the first judging unit to perform the step of determining that the current sentence is not a causal sentence. A fuzzy causality cue word is a cue word indicating that a causal relationship may exist; the classifier is trained on the feature vectors converted from multiple sentences containing fuzzy causality cue words, together with labels indicating whether each of those sentences is a causal sentence.
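The two-stage decision made by the first and second judging units can be sketched as follows. The English cue words are hypothetical stand-ins for the Chinese cue-word sets, which the text does not list here; matching on word boundaries with `re` only works for space-delimited text, so Chinese input would need segmentation first. The binary bag-of-words `featurize` function and the toy classifier are hypothetical placeholders for the trained classifier.

```python
import re
from typing import Callable

# Hypothetical English stand-ins for the Chinese cue-word sets.
EXPLICIT_CUES = ("because", "therefore", "as a result")
FUZZY_CUES = ("since", "after", "so")


def contains_any(sentence: str, cues: tuple[str, ...]) -> bool:
    """True if any cue occurs as a whole word or phrase (case-insensitive)."""
    text = sentence.lower()
    return any(re.search(rf"\b{re.escape(cue)}\b", text) for cue in cues)


def featurize(sentence: str, vocab: list[str]) -> list[int]:
    """Binary bag-of-words vector over a fixed vocabulary."""
    text = sentence.lower()
    return [1 if re.search(rf"\b{re.escape(w)}\b", text) else 0 for w in vocab]


def is_causal(sentence: str, classifier: Callable[[list[int]], int],
              vocab: list[str], preset: int = 1) -> bool:
    if contains_any(sentence, EXPLICIT_CUES):  # first judging unit
        return True
    if contains_any(sentence, FUZZY_CUES):  # second judging unit
        return classifier(featurize(sentence, vocab)) == preset
    return False  # no cue word at all: not a causal sentence


def toy_classifier(vector: list[int]) -> int:
    """Stand-in for a trained model: fires on the first vocabulary feature."""
    return 1 if vector[0] else 0


vocab = ["flooded", "since", "so"]
print(is_causal("It rained, so the river flooded.", toy_classifier, vocab))  # True
print(is_causal("I have lived here since 2010.", toy_classifier, vocab))  # False
```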
The device for constructing a causality knowledge base provided by an embodiment of the present invention may further include:
A reprocessing module, configured, after the set of identified entity pairs has been determined to be the causality knowledge base, to perform Cartesian-product pairing on the identified entity pairs and determine each resulting entity pair to be a new entity pair; to cluster the entities contained in the new entity pairs to obtain multiple set pairs made up of causality entity sets, where each causality entity set contains the cause entities or effect entities assigned to the same class by the clustering operation; and, for each set pair, to retain the new entity pairs whose co-occurrence frequency in the data source is greater than a frequency threshold and delete the other entity pairs.
The device for constructing a causality knowledge base provided by an embodiment of the present invention may further include:
An adding module, configured to add the co-occurrence frequency of each retained new entity pair to the causality knowledge base.
The device for constructing a causality knowledge base provided by an embodiment of the present invention may further include:
A computing module, configured to calculate the support of each new entity pair according to the following formula and add the support of each new entity pair to the causality knowledge base:
SupportNum=(α * Adverb+ β * SentenceType+ γ * Emotion) * Negative;
where α, β and γ are preset weight coefficients with α > β > γ and α + β + γ = 1; SupportNum is the support; Adverb is the score of the degree adverb contained in the sentence corresponding to the new entity pair; SentenceType is the score of the causality cue word in that sentence; Emotion is the score of the emotion word in that sentence; and Negative is the score of the negative word in that sentence.
The device for constructing a causality knowledge base provided by an embodiment of the present invention may further include:
A comparison module, configured to compare a cause entity to be predicted with each cause entity in the most recently obtained causality knowledge base, and to output the effect entity contained in the new entity pair whose cause entity matches the cause entity to be predicted, together with that pair's co-occurrence frequency and support.
An embodiment of the present invention further provides an apparatus for constructing a causality knowledge base, which may include:
A memory, configured to store a computer program;
A processor, configured to implement the steps of any of the above methods for constructing a causality knowledge base when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above methods for constructing a causality knowledge base.
Note that for explanations of the relevant parts of the device, apparatus and computer-readable storage medium for constructing a causality knowledge base provided by the embodiments of the present invention, reference can be made to the detailed description of the corresponding parts of the method, which is not repeated here. In addition, parts of the above technical solutions whose implementation principle is the same as that of corresponding prior-art solutions are not described in detail, to avoid excessive repetition.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

If the current sentence does not contain any explicit causality cue word from the explicit causality cue word set, judging whether the current sentence contains any fuzzy causality cue word from the fuzzy causality cue word set; if so, converting the current sentence into a feature vector and inputting it into a pre-created classifier, determining that the current sentence is a causal sentence if the classifier outputs a preset value, and determining that it is not a causal sentence otherwise; if not, performing the step of determining that the current sentence is not a causal sentence. The fuzzy causality cue word is a cue word indicating that a causal relationship may exist, and the classifier is trained on the feature vectors converted from multiple sentences containing fuzzy causality cue words, together with labels indicating whether each of those sentences is a causal sentence.
CN201811494944.3A | 2018-12-07 | 2018-12-07 | A method, device and device for constructing a causal relationship knowledge base | Pending | CN109308323A

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201811494944.3A (CN109308323A) | 2018-12-07 | 2018-12-07 | A method, device and device for constructing a causal relationship knowledge base

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201811494944.3A (CN109308323A) | 2018-12-07 | 2018-12-07 | A method, device and device for constructing a causal relationship knowledge base

Publications (1)

Publication Number | Publication Date
CN109308323A | 2019-02-05

Family

ID=65222443

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201811494944.3A (Pending, CN109308323A) | A method, device and device for constructing a causal relationship knowledge base | 2018-12-07 | 2018-12-07

Country Status (1)

Country | Link
CN (1) | CN109308323A


Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US8976063B1 * | 2014-04-29 | 2015-03-10 | Google Inc. | Automated detection of vehicle parking and location
CN104735074A * | 2015-03-31 | 2015-06-24 | 江苏通付盾信息科技有限公司 | Malicious URL detection method and implement system thereof
CN105550288A * | 2015-12-10 | 2016-05-04 | 百度在线网络技术(北京)有限公司 | Database system updating method and management system
CN106022018A * | 2016-05-14 | 2016-10-12 | 丁贤根 | CMS object-oriented artificial intelligence information secrecy system
CN107783973A * | 2016-08-24 | 2018-03-09 | 慧科讯业有限公司 | Method, device and system for monitoring internet media event based on industry knowledge map database


Non-Patent Citations (2)

Title
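杨攀飞 (Yang Panfei): "因果关系知识库的研究与构建" (Research and Construction of a Causality Knowledge Base), 《中国优秀博硕士学位论文全文数据库(硕士) 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) *
焦玉英 (Jiao Yuying) et al.: "合作数字参考服务中的知识库建设——DREW与DCVRS的Knowledge Base" (Knowledge base construction in collaborative digital reference services: the Knowledge Bases of DREW and DCVRS), 《图书情报知识》 (Documentation, Information & Knowledge) *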

Cited By (18)

Publication number | Priority date | Publication date | Assignee | Title
CN112543897A * | 2019-03-13 | 2021-03-23 | 欧姆龙株式会社 | Analysis device, analysis method, and analysis program
CN112543897B * | 2019-03-13 | 2024-02-02 | 欧姆龙株式会社 | Analysis device, analysis method, and storage medium
CN112100312A * | 2019-06-18 | 2020-12-18 | 国际商业机器公司 | Intelligent extraction of causal knowledge from data sources
CN112100312B * | 2019-06-18 | 2024-09-17 | 国际商业机器公司 | Intelligent extraction of causal knowledge from data sources
CN110377759A * | 2019-07-22 | 2019-10-25 | 中国工商银行股份有限公司 | Event relation map construction method and device
CN110377759B * | 2019-07-22 | 2022-02-11 | 中国工商银行股份有限公司 | Method and device for constructing event relation graph
CN110674308A * | 2019-08-23 | 2020-01-10 | 上海科技发展有限公司 | Scientific and technological word list expansion method, device, terminal and medium based on grammar mode
CN111428052A * | 2020-03-30 | 2020-07-17 | 中国科学技术大学 | Method for constructing educational concept graph with multiple relations from multi-source data
CN111428052B * | 2020-03-30 | 2023-06-16 | 中国科学技术大学 | A method for building educational concept maps with multiple relationships from multi-source data
CN114254752A * | 2020-09-25 | 2022-03-29 | 辉达公司 | Knowledge discovery using neural networks
CN112287111A * | 2020-12-18 | 2021-01-29 | 腾讯科技(深圳)有限公司 | Text processing method and related device
CN113033809B * | 2021-04-16 | 2023-01-17 | 复旦大学 | A Commonsense Causal Reasoning Method and System Based on Weak Evidence Aggregation
CN113033809A * | 2021-04-16 | 2021-06-25 | 复旦大学 | Common sense causal reasoning method and system based on weak evidence aggregation
CN113642321A * | 2021-06-28 | 2021-11-12 | 浙江工业大学 | Financial field-oriented causal relationship extraction method and system
CN113642321B * | 2021-06-28 | 2024-03-29 | 浙江工业大学 | Causal relationship extraction methods and systems for the financial field
CN113742445B * | 2021-07-16 | 2022-09-27 | 中国科学院自动化研究所 | Text recognition sample obtaining method and device and text recognition method and device
CN113742445A * | 2021-07-16 | 2021-12-03 | 中国科学院自动化研究所 | Text recognition sample obtaining method and device and text recognition method and device
CN116151231A * | 2021-11-18 | 2023-05-23 | 富士通株式会社 | Storage medium, output method, and information processing apparatus


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication

Application publication date: 2019-02-05

