Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, a first object of the present invention is to provide a method for constructing a domain knowledge graph, in which a high-quality, high-accuracy initial structured knowledge graph is first constructed from structured knowledge and is then expanded and completed in a targeted manner using unstructured text, so that a complete, high-quality knowledge graph of the target domain can be constructed.
A second object of the invention is to provide an apparatus for constructing a knowledge graph.
A third object of the invention is to provide a computer-readable storage medium.
A fourth object of the invention is to provide an electronic device.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a method for constructing a knowledge graph, including:
acquiring a structured database of a target field, and constructing an initial structured knowledge graph according to the structured database of the target field, wherein the initial structured knowledge graph comprises standard entities and standard relations;
extracting a reference entity and a reference relation in the unstructured text of the target field according to the standard entity and the standard relation;
and performing knowledge fusion on the reference entities and the reference relations extracted from the unstructured text and the standard entities and the standard relations in the initial structured knowledge graph to form a knowledge graph of the target field.
According to the method for constructing the knowledge graph, the initial structured knowledge graph is constructed according to the structured database of the target field, the reference entity and the reference relation in the unstructured text of the target field are extracted according to the standard entity and the standard relation of the initial structured knowledge graph, and the reference entity and the reference relation extracted from the unstructured text and the standard entity and the standard relation in the initial structured knowledge graph are subjected to knowledge fusion to form the knowledge graph of the target field. The initial structured knowledge graph constructed by the structured knowledge is a high-quality and high-accuracy knowledge graph, and the initial structured knowledge graph is subjected to targeted expansion and completion by the aid of the unstructured text on the basis, so that a complete and high-quality knowledge graph in the target field can be constructed.
According to an embodiment of the present invention, the extracting, according to the standard entity and the standard relationship, a corresponding reference entity and a corresponding reference relationship in the unstructured text of the field includes:
marking the corresponding entities and relations in the unstructured text according to the standard entities and the standard relations;
performing extractive text summarization on the labeled unstructured text according to the standard entities and the standard relations, so as to screen out a sentence set associated with the standard entities and the standard relations;
and performing entity recognition on the sentence sets associated with the standard entities and the standard relations, and performing entity relation extraction according to recognition results to obtain reference entities and reference relations in the unstructured text.
According to an embodiment of the present invention, the extracting the entity relationship according to the recognition result includes:
after entity recognition has been performed on the sentence set associated with the standard entities and the standard relations, extracting entity relations by means of relation classification and syntactic analysis.
According to an embodiment of the present invention, the constructing an initial structured knowledge-graph from the structured database of the target domain comprises:
constructing a domain ontology, wherein the domain ontology comprises important concepts, concept relations and axioms in the target domain;
mapping the structured knowledge in the structured database to the domain ontology to obtain entity nodes and relationship nodes;
and carrying out knowledge fusion on the entity nodes and the relation nodes obtained from different structured databases to obtain the standard entities and the standard relations, and forming the initial structured knowledge graph according to the standard entities and the standard relations.
According to one embodiment of the invention, the knowledge fusion of the reference entities and reference relations extracted from the unstructured text and the standard entities and standard relations in the initial structured knowledge graph comprises:
verifying the reference entities and reference relations, together with the standard entities and standard relations, against the axioms to determine whether they satisfy the axioms;
and when the reference entities and standard entities, and the reference relations and standard relations, satisfy the axioms, performing knowledge fusion on the reference entities and the standard entities, and on the reference relations and the standard relations.
According to an embodiment of the present invention, the knowledge fusion of the reference entities and reference relations extracted from the unstructured text and the standard entities and standard relations in the initial structured knowledge graph further includes:
setting a relation confidence value according to the original source of the reference relation and the frequency of the reference relation;
and performing knowledge fusion on the reference relation and the standard relation according to the relation confidence value.
According to an embodiment of the present invention, fusing the reference relationship and the standard relationship according to the relationship confidence value includes:
and when the reference relationship conflicts with the standard relationship, if the relationship confidence value of the reference relationship is smaller than a preset threshold value, deleting the reference relationship.
In order to achieve the above object, an embodiment of a second aspect of the present invention provides an apparatus for constructing a knowledge graph, including:
the acquisition module is used for acquiring a structured database of the target field;
a construction module for constructing an initial structured knowledge graph according to the structured database of the target domain, the initial structured knowledge graph comprising standard entities and standard relationships;
the extraction module is used for extracting a reference entity and a reference relation in the unstructured text of the target field according to the standard entity and the standard relation;
and the fusion module is used for carrying out knowledge fusion on the reference entities and the reference relations extracted from the unstructured text and the standard entities and the standard relations in the initial structured knowledge graph so as to form the knowledge graph of the target field.
According to the device for constructing the knowledge graph, the acquisition module is used for acquiring the structured database of the target field, the construction module is used for constructing the initial structured knowledge graph according to the structured database of the target field, the initial structured knowledge graph comprises the standard entity and the standard relation, the extraction module is used for extracting the reference entity and the reference relation in the unstructured text of the target field according to the standard entity and the standard relation, and the fusion module is used for carrying out knowledge fusion on the reference entity and the reference relation extracted from the unstructured text and the standard entity and the standard relation in the initial structured knowledge graph so as to form the knowledge graph of the target field. The initial structured knowledge graph constructed by the structured knowledge is a high-quality and high-accuracy knowledge graph, and the initial structured knowledge graph is subjected to targeted expansion and completion by the aid of the unstructured text on the basis, so that a complete and high-quality knowledge graph in the target field can be constructed.
In order to achieve the above object, an embodiment of a third aspect of the present invention provides a computer-readable storage medium on which a knowledge graph construction program is stored; when executed by a processor, the program implements the aforementioned method for constructing a knowledge graph.
According to the computer-readable storage medium of the embodiment of the invention, by the above method for constructing the knowledge graph, the initial structured knowledge graph is constructed by using the structured knowledge, and the knowledge graph is a high-quality and high-accuracy knowledge graph, and on the basis, the initial structured knowledge graph is subjected to targeted expansion and completion by using the unstructured text, so that a complete and high-quality knowledge graph of the target field can be constructed.
In order to achieve the above object, an embodiment of a fourth aspect of the present invention provides an electronic device, including a memory, a processor, and a knowledge graph construction program stored in the memory and executable on the processor, where the processor implements the aforementioned method for constructing a knowledge graph when executing the program.
According to the electronic device of the embodiment of the invention, by means of the above method for constructing the knowledge graph, the initial structured knowledge graph is constructed from structured knowledge and is therefore a high-quality, high-accuracy knowledge graph; on this basis, the initial structured knowledge graph is expanded and completed in a targeted manner using the unstructured text, so that a complete, high-quality knowledge graph of the target field can be constructed.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the invention and are not to be construed as limiting it.
A method, an apparatus, a computer-readable storage medium, and an electronic device for constructing a knowledge graph according to embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for constructing a knowledge graph according to an embodiment of the present invention. Referring to Fig. 1, the method for constructing a knowledge graph includes the following steps:
Step S100, acquiring a structured database of the target field, and constructing an initial structured knowledge graph according to the structured database of the target field, wherein the initial structured knowledge graph comprises standard entities and standard relations.
Specifically, the construction and use of information and Internet systems for a specific target field accumulate a large number of structured databases; these accumulated databases are acquired and used to construct the initial structured knowledge graph.
In one embodiment, referring to Fig. 2, constructing an initial structured knowledge graph according to the structured database of the target field comprises:
Step S110, constructing a domain ontology, wherein the domain ontology comprises important concepts, concept relations and axioms in the target domain.
Specifically, when an initial structured knowledge graph is constructed according to the structured database of the target field, a domain ontology may be constructed first. To do so, the target field is determined, and then the important concepts, concept relations, axioms and other elements of the target field are identified. This is done mainly by automatically extracting the relational database table structures of existing systems in the target field, supplemented by revision and confirmation by domain experts.
Step S120, mapping the structured knowledge in the structured database to the domain ontology to obtain entity nodes and relationship nodes.
Specifically, structured knowledge migration can be performed after the domain ontology is built. When structured knowledge migration is carried out, because a large amount of domain knowledge exists in a database or a document in the form of structured data, the structured knowledge in the database or the document can be extracted by an automatic program on the basis of the domain ontology determined in the previous step so as to carry out the knowledge migration.
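The mapping of step S120 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the domain ontology is stored as a dictionary that tells the program which table corresponds to which concept and which columns correspond to which relations; the `ONTOLOGY` layout, the `map_rows` function and the drug/disease example data are hypothetical and are not part of the described method.

```python
# Hypothetical ontology fragment: each table maps to a concept, one column
# holds the entity name, and other columns map to (relation, tail concept).
ONTOLOGY = {
    "drug": {
        "concept": "Drug",
        "key": "name",
        "relations": {"treats": ("treats", "Disease")},
    },
}


def map_rows(table, rows, ontology=ONTOLOGY):
    """Map rows of one structured table to entity nodes and relation triples."""
    spec = ontology[table]
    entities, triples = set(), []
    for row in rows:
        head = (spec["concept"], row[spec["key"]])
        entities.add(head)
        for column, (relation, tail_concept) in spec["relations"].items():
            value = row.get(column)
            if value:  # skip empty cells
                tail = (tail_concept, value)
                entities.add(tail)
                triples.append((head, relation, tail))
    return entities, triples


entities, triples = map_rows("drug", [{"name": "aspirin", "treats": "headache"}])
```

In this sketch every entity is a `(concept, name)` pair, so each node stays tied to the ontology concept it was mapped to.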
Step S130, performing knowledge fusion on the entity nodes and the relation nodes acquired from different structured databases to obtain standard entities and standard relations, and forming an initial structured knowledge graph according to the standard entities and the standard relations.
Specifically, after the structured knowledge migration, knowledge fusion may be performed. Because the structured knowledge comes from different databases and documents, entity alignment, entity disambiguation, and relation and attribute merging need to be performed on the same entities and relations from the different databases, so that the multi-source structured knowledge is imported into the initial knowledge graph in a unified manner. This yields entity-relation-entity triples converted from the structured knowledge, and the entities and relations into which the structured knowledge is converted serve as the standard entities and the standard relations.
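A minimal sketch of this multi-source fusion is given below. It assumes that entity alignment can be approximated by name normalization plus an alias table; the `canonical` and `fuse_sources` functions, the alias table and the example data are hypothetical, and a real system would typically use richer alignment and disambiguation techniques.

```python
def canonical(name, aliases):
    """Normalize case and whitespace, then map known aliases to one name."""
    key = " ".join(name.lower().split())
    return aliases.get(key, key)


def fuse_sources(sources, aliases=None):
    """Merge triples from several databases; remember which sources agree."""
    aliases = aliases or {}
    fused = {}
    for source, triples in sources.items():
        for head, relation, tail in triples:
            triple = (canonical(head, aliases), relation, canonical(tail, aliases))
            fused.setdefault(triple, set()).add(source)
    return fused


fused = fuse_sources(
    {
        "db_a": [("Aspirin", "treats", "Headache")],
        "db_b": [("acetylsalicylic  acid", "treats", "headache")],
    },
    aliases={"acetylsalicylic acid": "aspirin"},  # hypothetical alias table
)
```

Here the two source records collapse into a single standard triple, and the set of contributing sources is retained so that provenance is not lost during merging.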
Step S200, extracting reference entities and reference relations in the unstructured text of the target field according to the standard entities and the standard relations.
For example, a large amount of unstructured text of the target field can be acquired from search engines by web crawling. Because the standard entities and standard relations obtained from the structured knowledge are highly accurate, high-quality knowledge, the reference entities and reference relations in the unstructured text can be extracted according to them in order to establish a high-quality knowledge graph of the target field.
In one embodiment, as shown in fig. 3, extracting corresponding reference entities and reference relationships in an unstructured text of a domain according to the standard entities and the standard relationships includes:
Step S210, labeling the corresponding entities and relations in the unstructured text according to the standard entities and the standard relations.
Specifically, the entities and relations in the unstructured text that correspond to the standard entities and standard relations are labeled one by one with special symbols. This process may be called reverse labeling and can be completed automatically by a machine. During labeling, only the entities or relations that actually appear in the text are labeled.
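Reverse labeling can be sketched as a dictionary lookup over the text. The sketch below assumes English text with word boundaries and uses bracket tags `[E:...]` and `[R:...]` as the special symbols; the tag format and the `reverse_label` function are hypothetical choices for illustration, and text in languages without whitespace word boundaries would need a different matching strategy.

```python
import re


def reverse_label(text, entities, relation_words):
    """Wrap known entity and relation mentions in [E:...] / [R:...] tags."""
    terms = [(t, "E") for t in entities] + [(t, "R") for t in relation_words]
    if not terms:
        return text
    # Longest terms first, so a long name wins over a shorter name it contains.
    terms.sort(key=lambda item: len(item[0]), reverse=True)
    kinds = {term.lower(): kind for term, kind in terms}
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term, _ in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda m: f"[{kinds[m.group(0).lower()]}:{m.group(0)}]", text
    )


labeled = reverse_label(
    "Aspirin treats headache.", ["aspirin", "headache"], ["treats"]
)
```

Only terms that actually occur in the text are tagged; everything else is left unchanged.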
Step S220, performing extractive text summarization on the labeled unstructured text according to the standard entities and the standard relations, so as to screen out a sentence set associated with the standard entities and the standard relations.
Not every sentence in an article needs relation extraction, since many relations are not knowledge of the target field. The target sentences for automatic relation extraction therefore need to be screened so that the extracted relations do not include unrelated entities and relations, which keeps the domain knowledge graph specialized and free of irrelevant knowledge. However, if sentences were selected only by the entity and relation labels of step S210, much new content that is absent from the structured data would be missed, leaving the domain knowledge graph less complete and less comprehensive. Extractive text summarization is therefore performed on the unstructured text. In general, extractive text summarization finds the more informative sentences by counting keyword frequencies. In this embodiment, to make the domain knowledge graph more specialized, the weight of sentences associated with the standard entities and standard relations is additionally increased during extractive summarization, and the more relevant sentences are screened out of the large number of sentences in the unstructured text to serve as the target sentence set for entity recognition and entity relation extraction.
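The weighted sentence screening can be sketched as follows. The source does not specify a scoring formula, so the one used here (average word frequency of a sentence, multiplied by a boost for each standard term it mentions) is an assumption, as are the `select_sentences` function, the `boost` and `top_k` parameters, and the example text.

```python
import re
from collections import Counter


def select_sentences(text, standard_terms, top_k=2, boost=2.0):
    """Pick the top_k sentences by word frequency, boosted by standard terms."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    freq = Counter(tok for toks in tokenized for tok in toks)
    terms = [t.lower() for t in standard_terms]

    scored = []
    for index, (sentence, toks) in enumerate(zip(sentences, tokenized)):
        if not toks:
            continue
        base = sum(freq[t] for t in toks) / len(toks)  # frequency-based score
        hits = sum(1 for term in terms if term in sentence.lower())
        scored.append((base * (1 + boost * hits), index, sentence))

    best = sorted(scored, key=lambda item: item[0], reverse=True)[:top_k]
    # Return the selected sentences in their original order.
    return [sentence for _, _, sentence in sorted(best, key=lambda item: item[1])]


selected = select_sentences(
    "Aspirin relieves headache. The weather was nice today. "
    "Aspirin may interact with warfarin.",
    ["aspirin", "headache"],
)
```

Note that the third sentence is kept even though "warfarin" is not a standard term: it scores well because it mentions a standard entity, which is how content that is new relative to the structured data can still reach the extraction step.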
Step S230, performing entity recognition on the sentence set associated with the standard entities and the standard relations, and performing entity relation extraction according to the recognition result to obtain the reference entities and reference relations in the unstructured text.
Specifically, when the entity recognition is performed on the sentence sets associated with the standard entities and the standard relationships, both the entities that are consistent with the standard entities and the entities related to the standard entities are recognized.
Further, extracting the entity relations according to the recognition result includes: after entity recognition has been performed on the sentence set associated with the standard entities and standard relations, extracting entity relations by means of relation classification and syntactic analysis. Specifically, the two methods may be applied at the same time, so that the recognized entities are used to extract the semantic relations between two or more entities from the text. Using relation classification and syntactic analysis together makes the extracted relations between entities more complete, which facilitates the construction of a complete, high-quality knowledge graph.
Step S300, performing knowledge fusion on the reference entities and reference relations extracted from the unstructured text and the standard entities and standard relations in the initial structured knowledge graph to form the knowledge graph of the target field.
Specifically, those entities and relations extracted from the unstructured text in the previous step that differ from the standard entities and standard relations are added to the initial structured knowledge graph, while those that are similar to the standard entities and standard relations undergo entity alignment, entity disambiguation, relation and attribute merging, and the like. Steps S200 and S300 iterate with each other: step S200 may use the entities and relations obtained in step S300 to extract further information from the unstructured text, and step S300 may use the new entities and relations extracted in step S200 to expand and complete the initial structured knowledge graph in a targeted and controllable manner.
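The mutual iteration of steps S200 and S300 can be sketched as a loop that stops once a round adds nothing new. The `build_graph` function, the `max_rounds` limit and the toy `extract_near_graph` and `fuse_union` stand-ins are hypothetical; in particular, the stand-in extractor simply looks up pre-extracted facts, whereas the described method extracts them from unstructured text.

```python
def build_graph(initial_triples, extract, fuse, max_rounds=5):
    """Alternate extraction (S200) and fusion (S300) until the graph is stable."""
    graph = set(initial_triples)
    for _ in range(max_rounds):
        candidates = extract(graph)        # S200: guided by the current graph
        updated = fuse(graph, candidates)  # S300: merge candidates into graph
        if updated == graph:
            break
        graph = updated
    return graph


# Toy stand-ins (assumptions): facts "found" in text, as ready-made triples.
CORPUS_FACTS = [
    ("headache", "symptom_of", "migraine"),
    ("migraine", "treated_by", "triptan"),
    ("paris", "capital_of", "france"),
]


def extract_near_graph(graph):
    """Return only facts whose head entity is already known to the graph."""
    known = {entity for head, _, tail in graph for entity in (head, tail)}
    return {fact for fact in CORPUS_FACTS if fact[0] in known}


def fuse_union(graph, candidates):
    return graph | candidates


final_graph = build_graph(
    {("aspirin", "treats", "headache")}, extract_near_graph, fuse_union
)
```

In this toy run the graph grows outward from the seed triple over two rounds, while the unrelated fact about Paris is never admitted, which mirrors the targeted and controllable expansion described above.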
The method for constructing the knowledge graph comprises the steps of obtaining a structured database of a target field, constructing an initial structured knowledge graph according to the structured database, extracting a reference entity and a reference relation in an unstructured text of the target field according to a standard entity and a standard relation of the initial structured knowledge graph, and carrying out knowledge fusion on the reference entity and the reference relation extracted from the unstructured text and the standard entity and the standard relation in the initial structured knowledge graph to construct the knowledge graph of the target field. The initial structured knowledge graph constructed by the structured knowledge is a high-quality and high-accuracy knowledge graph, and the initial structured knowledge graph is subjected to targeted expansion and completion by the aid of the unstructured text on the basis, so that a complete and high-quality knowledge graph in the target field can be constructed.
In one embodiment, step S300 includes: verifying the reference entity and the reference relationship and the standard entity and the standard relationship according to the axiom to judge whether the reference entity and the standard entity, the reference relationship and the standard relationship meet the axiom; and when the reference entity and the standard entity, the reference relationship and the standard relationship meet the axiom, fusing the reference entity and the standard entity, and the reference relationship and the standard relationship.
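Axiom verification before fusion can be sketched as follows. The source does not state what form the axioms take; the sketch assumes, for illustration only, that each axiom constrains the concept types allowed at the head and tail of a relation. The `AXIOMS` table, the function names and the example data are hypothetical.

```python
# Hypothetical axioms: relation -> (required head concept, required tail concept).
AXIOMS = {"treats": ("Drug", "Disease")}


def satisfies_axioms(triple, entity_types, axioms=AXIOMS):
    """Check a (head, relation, tail) triple against the type constraints."""
    head, relation, tail = triple
    if relation not in axioms:
        return False  # assumption: relations unknown to the ontology are rejected
    head_type, tail_type = axioms[relation]
    return (
        entity_types.get(head) == head_type
        and entity_types.get(tail) == tail_type
    )


def fuse_verified(graph, candidates, entity_types, axioms=AXIOMS):
    """Add only those candidate triples that satisfy the axioms."""
    accepted = {t for t in candidates if satisfies_axioms(t, entity_types, axioms)}
    return set(graph) | accepted


entity_types = {
    "aspirin": "Drug", "headache": "Disease", "fever": "Disease", "paris": "City",
}
verified_graph = fuse_verified(
    {("aspirin", "treats", "headache")},
    {("aspirin", "treats", "fever"), ("paris", "treats", "headache")},
    entity_types,
)
```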
Further, in one embodiment, step S300 further includes: setting a relation confidence value according to the original source of the reference relation and the frequency of the reference relation; and fusing the reference relation and the standard relation according to the relation confidence value.
Specifically, the more reliable the original source of a reference relation (for example, text published by an authority or by official media, or a standard reference work) and the more frequently the reference relation occurs, the higher the corresponding relation confidence value. In this embodiment, the relation confidence value may be a number between 0 and 1, with a larger number indicating higher confidence in the relation. Fusing the reference relation and the standard relation according to the relation confidence value includes: when the reference relation conflicts with the standard relation, deleting the reference relation if its relation confidence value is smaller than a preset threshold. The preset threshold may be chosen as required and may, for example, be set to 0.6. When a reference relation and a standard relation conflict and cannot be merged, the relation confidence value is used as the basis for the decision: if the relation confidence value of the reference relation is less than 0.6, its confidence is low and the reference relation is deleted; if it is greater than or equal to 0.6, its confidence is high, the reference relation may be added to the initial structured knowledge graph, and the standard relation that conflicts with it may be deleted. On the one hand, axioms are used to verify the standard and reference entities and the standard and reference relations; on the other hand, a relation confidence value is set for each reference relation and assists the knowledge fusion, so that the accuracy of the resulting knowledge graph can be ensured.
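The confidence-based conflict handling can be sketched as follows. The source only states that the confidence value depends on source reliability and occurrence frequency and lies between 0 and 1; the concrete formula, the `SOURCE_RELIABILITY` table, the function names and the example triples below are assumptions made for illustration. The threshold of 0.6 is the example value given in the text.

```python
# Hypothetical reliability scores per kind of source, each between 0 and 1.
SOURCE_RELIABILITY = {"official": 0.95, "news": 0.7, "forum": 0.3}


def relation_confidence(source, count, reliability=SOURCE_RELIABILITY):
    """Assumed formula: grows with source reliability and with frequency.

    count / (count + 1) rises towards 1 as the relation is seen more often,
    so the result always stays between 0 and 1.
    """
    return reliability.get(source, 0.1) * count / (count + 1)


def resolve_conflict(graph, standard, reference, confidence, threshold=0.6):
    """Resolve a conflict between a standard and a reference relation."""
    graph = set(graph)
    if confidence < threshold:
        graph.discard(reference)  # low confidence: drop the reference relation
    else:
        graph.discard(standard)   # high confidence: replace the standard one
        graph.add(reference)
    return graph


standard = ("device_a", "made_by", "vendor_x")
reference = ("device_a", "made_by", "vendor_y")

high = relation_confidence("official", 4)  # 0.95 * 4 / 5
low = relation_confidence("forum", 4)      # 0.3 * 4 / 5

kept_reference = resolve_conflict({standard}, standard, reference, high)
kept_standard = resolve_conflict({standard}, standard, reference, low)
```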
In summary, according to the method for constructing a knowledge graph of the embodiment of the present invention, an initial structured knowledge graph is constructed according to a structured database, a reference entity and a reference relationship in an unstructured text of a target field are extracted according to a standard entity and a standard relationship of the initial structured knowledge graph, and the reference entity and the reference relationship extracted in the unstructured text and the standard entity and the standard relationship in the initial structured knowledge graph are subjected to knowledge fusion to construct the knowledge graph of the target field. The initial structured knowledge graph constructed by the structured knowledge is a high-quality and high-accuracy knowledge graph, and is supplemented with unstructured texts to perform targeted expansion and completion on the initial structured knowledge graph, so that a complete and high-quality target domain knowledge graph can be constructed.
Referring to fig. 4, another embodiment of the present application provides a knowledge-graph constructing apparatus, including:
an obtaining module 100, configured to obtain a structured database of the target field;
a construction module 200, configured to construct an initial structured knowledge graph according to the structured database of the target domain, where the initial structured knowledge graph includes standard entities and standard relationships;
an extraction module 300, configured to extract the reference entities and the reference relationships in the unstructured text of the target field according to the standard entities and the standard relationships;
and a fusion module 400, configured to perform knowledge fusion on the reference entities and reference relationships extracted from the unstructured text and the standard entities and standard relationships in the initial structured knowledge graph to form a knowledge graph of the target domain.
In one embodiment, referring to Fig. 5, the extraction module 300 includes a labeling unit 310, a processing unit 320, and a recognition and extraction unit 330. The labeling unit 310 is configured to label the corresponding entities and relationships in the unstructured text according to the standard entities and the standard relationships; the processing unit 320 is configured to perform extractive text summarization on the labeled unstructured text according to the standard entities and the standard relationships, so as to screen out a sentence set associated with the standard entities and the standard relationships; the recognition and extraction unit 330 is configured to perform entity recognition on the sentence set associated with the standard entities and the standard relationships, and to perform entity relationship extraction according to the recognition result to obtain the reference entities and reference relationships in the unstructured text.
In one embodiment, the construction module 200 comprises an ontology unit 210 and a mapping unit 220, wherein the ontology unit 210 is used for constructing a domain ontology, and the domain ontology comprises important concepts, concept relationships and axioms in the target domain. The mapping unit 220 is configured to map the structured knowledge in the structured database to the domain ontology to obtain entity nodes and relationship nodes. The fusion module 300 includes a fusion subunit 310, and the fusion subunit 310 is configured to perform knowledge fusion on the entity nodes and the relationship nodes obtained from different structured databases to obtain standard entities and standard relationships, and to form an initial structured knowledge graph according to the standard entities and the standard relationships.
In one embodiment, the fusion module 300 further includes a first verification unit 320 and a second verification unit 330. The first verification unit 320 is configured to verify the reference entities and reference relationships, together with the standard entities and standard relationships, against the axioms to determine whether they satisfy the axioms; the fusion subunit 310 performs knowledge fusion on the reference entities and the standard entities, and on the reference relationships and the standard relationships, when they satisfy the axioms. The second verification unit 330 is configured to set a relationship confidence value according to the original source of a reference relationship and the frequency with which the reference relationship occurs; the fusion subunit 310 is configured to perform knowledge fusion on the reference relationship and the standard relationship according to the relationship confidence value, and, when the reference relationship conflicts with the standard relationship, to delete the reference relationship if its relationship confidence value is smaller than a preset threshold.
It should be noted that, for the description of the apparatus for constructing a knowledge graph in the present application, please refer to the description of the method for constructing a knowledge graph in the present application, and details are not repeated here.
The device for constructing the knowledge graph is characterized in that the initial structured knowledge graph constructed by adopting the structured knowledge is a high-quality and high-accuracy knowledge graph, and the initial structured knowledge graph is subjected to targeted expansion and completion by the aid of the unstructured text, so that a complete and high-quality target field knowledge graph can be constructed.
In addition, in another embodiment of the present application, a computer-readable storage medium is provided, on which a knowledge graph constructing program is stored, and the knowledge graph constructing program is executed by a processor to implement the aforementioned knowledge graph constructing method, and for a description of operation of the knowledge graph constructing program in the present application, please refer to the description of the knowledge graph constructing method in the present application, which is not repeated herein.
According to the computer-readable storage medium of the embodiment of the invention, the complete and high-quality target domain knowledge graph can be constructed by the knowledge graph construction method.
In addition, another embodiment of the present application provides an electronic device, which includes a memory, a processor, and a knowledge graph constructing program that is stored in the memory and is executable on the processor, where the processor implements the knowledge graph constructing method when executing the knowledge graph constructing program, and details are not repeated here.
According to the electronic equipment provided by the embodiment of the invention, the complete and high-quality target domain knowledge graph can be constructed by the construction method of the knowledge graph.
It should be noted that the logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.