Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a method, a system and a storage medium for constructing a knowledge graph that capture the relations between knowledge points by constructing the knowledge graph and accurately track the knowledge state of a learner.
In order to solve the above problem, an embodiment of the present invention provides a method for constructing a knowledge graph, which includes at least the following steps:
acquiring initial learning data, storing the initial learning data in a learning resource database, extracting ontology terms from the initial learning data according to the subject syllabus and teaching materials, and labeling the extracted ontology terms;
entering the relations and ordering between the meta-knowledge points, constructing a directed subject knowledge graph with an ontology editor, generating an OWL file, and storing the OWL file in the learning system in triple form;
updating and recording the learning process state data of a user in real time, and performing multi-knowledge-point modeling on the learning process state data through the Deep Knowledge Tracing (DKT) algorithm to obtain the learning mastery state of the user;
and mining the relations between the knowledge points according to the learning mastery state of the user, and dynamically updating the subject knowledge graph.
Further, the method for constructing the knowledge graph further comprises the following step:
performing knowledge reasoning on the subject knowledge graph with the TransE algorithm, adding new meta-knowledge points and their hierarchical relations, and thereby further refining the subject knowledge graph.
Further, the initial learning data comprises learner personal information, subject learning content information, subject exercise test data and learning resource data.
Further, the labeling comprises automatic labeling and manual labeling, wherein,
the automatic labeling specifically comprises: in the process of constructing the directed subject knowledge graph with the ontology editor, using the Jena framework to establish the hierarchical relations of the ontology terms;
the manual labeling specifically comprises: organizing the attribute relations and association relations of the extracted ontology terms, and importing them into the ontology editor through the Jena framework.
Further, the learning process state data comprises knowledge state records and knowledge level test records, wherein,
the knowledge state records comprise statistics on mastered knowledge points and statistics on unmastered knowledge points, and the knowledge level test records comprise the questions, the answer results, the number of attempts and the answering time.
Further, updating and recording the learning process state data of the user in real time and performing multi-knowledge-point modeling on it through the DKT algorithm to obtain the learning mastery state of the user specifically comprises:
acquiring the learning test data of the learners over all time steps in the system as training samples for the DKT model;
implementing LSTM training with TensorFlow;
feeding the learning data of each learner as one batch and performing model training;
after model training is finished, exporting the DKT model and deploying it to TensorFlow Serving;
and calling the DKT model online in real time through TensorFlow Serving, and returning the student's knowledge point mastery state and the probability of answering each knowledge point correctly on the next attempt.
Further, mining the relations between the knowledge points according to the learning mastery state of the user and dynamically updating the subject knowledge graph specifically comprises:
obtaining, through the DKT model, the current probability of every learner answering each knowledge point correctly;
identifying the implicit relations between the knowledge points with a probabilistic association rule mining technique;
and judging the user's mastery state of the knowledge points according to the implicit relations between the knowledge points, and dynamically updating the subject knowledge graph in real time.
An embodiment of the present invention further provides a system for constructing a knowledge graph, including:
the data module, which is used for collecting initial learning data, storing the initial learning data in a learning resource database, extracting ontology terms from the initial learning data according to the subject syllabus and teaching materials, and labeling the extracted ontology terms;
the graph construction module, which is used for entering the relations and ordering between the meta-knowledge points, constructing a directed subject knowledge graph with the ontology editor, generating an OWL file, and storing the OWL file in the learning system in triple form;
the DKT module, which is used for updating and recording the learning process state data of the user in real time, and performing multi-knowledge-point modeling on the learning process state data through the DKT algorithm to obtain the learning mastery state of the user;
the knowledge point mining module, which is used for mining the relations between the knowledge points according to the learning mastery state of the user and dynamically updating the subject knowledge graph;
and the knowledge inference module, which is used for performing knowledge reasoning on the subject knowledge graph with the TransE algorithm, adding new meta-knowledge points and their hierarchical relations, and thereby further refining the subject knowledge graph.
Further, the DKT module is specifically configured for:
acquiring the learning test data of the learners over all time steps in the system as training samples for the DKT model;
implementing LSTM training with TensorFlow;
feeding the learning data of each learner as one batch and performing model training;
after model training is finished, exporting the DKT model and deploying it to TensorFlow Serving;
and calling the DKT model online in real time through TensorFlow Serving, and returning the student's knowledge point mastery state and the probability of answering each knowledge point correctly on the next attempt.
Another embodiment of the present invention further provides a computer-readable storage medium, which includes a stored computer program, wherein when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute the method for constructing a knowledge graph as described above.
The embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a method, a system and a storage medium for constructing a knowledge graph, wherein the method comprises the following steps: acquiring initial learning data, storing the initial learning data in a learning resource database, and extracting and labeling ontology terms from the initial learning data according to the subject syllabus and teaching materials; entering the relations and ordering between the meta-knowledge points, constructing a directed subject knowledge graph with an ontology editor, generating an OWL file, and storing the OWL file in the learning system in triple form; updating and recording the learning process state data of the user in real time, and performing multi-knowledge-point modeling on the learning process state data through the DKT algorithm to obtain the learning mastery state of the user; and mining the relations between the knowledge points according to the learning mastery state of the user, and dynamically updating the subject knowledge graph. By relying on Deep Knowledge Tracing (DKT), a knowledge tracing method based on deep learning, the embodiment handles multi-knowledge-point modeling scenarios effectively, tracks the learner's latest knowledge state accurately, completes the construction of the knowledge graph, obtains the internal relations between knowledge points, performs knowledge reasoning on the knowledge graph, and updates the knowledge graph dynamically.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
First, application scenarios that can be provided by the present invention, such as constructing a knowledge graph, are introduced.
The first embodiment of the present invention:
Please refer to figs. 1-4.
As shown in fig. 1, the method for constructing a knowledge graph provided in this embodiment includes at least the following steps:
S101, collecting initial learning data, storing the initial learning data in a learning resource database, extracting ontology terms from the initial learning data according to the subject syllabus and teaching materials, and labeling the extracted ontology terms;
Specifically, for step S101, initial learning data is collected, including the curriculum objectives, learning resources, course structures, teaching strategies, exercise and test question banks, and the like of each field. Subject experts extract the ontology terms from the subject syllabus and teaching materials in advance. The ontology terms are labeled in one of two modes, automatic labeling by the system or manual labeling, so that the more efficient mode can be chosen for each situation, which improves flexibility and work efficiency.
S102, entering the relations and ordering between the meta-knowledge points, constructing a directed subject knowledge graph with an ontology editor, generating an OWL file, and storing the OWL file in the learning system in triple form;
Specifically, for step S102, the relations and ordering between the meta-knowledge points are entered according to the initial learning data collected in step S101. A directed subject knowledge graph is then constructed with the ontology editor Protégé from the extracted and labeled ontology terms and from the relations and ordering between the meta-knowledge points. After construction is complete, an OWL file is generated and stored in the learning system in triple form.
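As a minimal sketch of what storing the graph in triple form can look like, the Python fragment below keeps prerequisite edges between meta-knowledge points as (subject, relation, object) triples and derives a learning order from the directed graph. The knowledge point names, the relation name `isPrerequisiteOf` and both helper functions are hypothetical; they are not taken from the OWL file described in this embodiment.

```python
from collections import defaultdict, deque

# Hypothetical triples (subject, relation, object); names are illustrative only.
TRIPLES = [
    ("Addition", "isPrerequisiteOf", "Multiplication"),
    ("Multiplication", "isPrerequisiteOf", "Division"),
    ("Addition", "isPrerequisiteOf", "Subtraction"),
    ("Subtraction", "isPrerequisiteOf", "Division"),
]


def successors(triples, relation):
    """Build an adjacency map subject -> [objects] for one relation type."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        if rel == relation:
            graph[subj].append(obj)
    return graph


def learning_order(triples, relation="isPrerequisiteOf"):
    """Topologically sort the directed graph (Kahn's algorithm)."""
    graph = successors(triples, relation)
    nodes = sorted({t[0] for t in triples} | {t[2] for t in triples})
    indegree = {n: 0 for n in nodes}
    for subj in list(graph):
        for obj in graph[subj]:
            indegree[obj] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("the prerequisite relation contains a cycle")
    return order
```

Calling `learning_order(TRIPLES)` returns the knowledge points in an order in which every prerequisite precedes the points that depend on it. A production system would more likely query the OWL triples through an RDF library than through plain tuples.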
S103, updating and recording the learning process state data of the user in real time, and performing multi-knowledge-point modeling on the learning process state data through the DKT algorithm to obtain the learning mastery state of the user;
Specifically, in step S103, the learner logs in to the learning system to learn and take tests. During learning, multi-knowledge-point modeling of the learner is performed through the DKT algorithm based on the learner's learning and test data, yielding the learner's current learning mastery state, such as the mastery state of each knowledge point, the answering time and the answer accuracy.
S104, mining the relations between the knowledge points according to the learning mastery state of the user, and dynamically updating the subject knowledge graph.
Specifically, in step S104, the probabilities of the current learner answering each knowledge point correctly are obtained through DKT to give the corresponding learning mastery state; the implicit relations between the knowledge points are identified with the probabilistic association rule mining technique; and the subject knowledge graph is dynamically updated.
In a preferred embodiment, the method for constructing a knowledge-graph further includes:
performing knowledge reasoning on the subject knowledge graph with the TransE algorithm, adding new meta-knowledge points and their hierarchical relations, and thereby further refining the subject knowledge graph.
Specifically, TransE is applied to the triples of the OWL file stored in the learning system. Each triple consists of a subject, a relation and an object; the subject and the object are collectively referred to as entities. In the directed knowledge graph, the triples are linked into a graph in which each node is an entity and each edge is a relation pointing from the subject to the object. Intuitively, TransE learns distributed vector representations of the entities and relations and treats the relation in each triple as a translation from the subject to the object: the subject, relation and object vectors are adjusted continuously so that the subject vector plus the relation vector is as close as possible to the object vector. This completes the relation inference between subject and object, so that meta-knowledge points and their relations can be added or removed, the subject knowledge graph is further refined, and the prediction accuracy and effectiveness of the knowledge graph are improved.
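The translation idea can be written compactly. The formulas below follow the standard TransE formulation from the literature rather than anything stated in this text; the symbols $d$, $S$ and $S'$ are introduced here for illustration, and $\gamma$ is taken to be the hyper-parameter named in the training procedure.

```latex
% h, r, t: embedding vectors of the subject (head), relation and object (tail)
\mathbf{h} + \mathbf{r} \approx \mathbf{t}, \qquad
d(h, r, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert

% Margin-based ranking loss over original triples S and corrupted triples S'
\mathcal{L} = \sum_{(h,r,t) \in S} \; \sum_{(h',r,t') \in S'_{(h,r,t)}}
  \max\bigl(0,\; \gamma + d(h, r, t) - d(h', r, t')\bigr)
```

A triple whose distance $d$ is small is considered plausible, which is what allows candidate relations between knowledge points to be ranked.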
The training process of the TransE algorithm is as follows. First, the training set, the hyper-parameter γ and the learning rate λ are determined. The relation vectors and entity vectors are then initialized: each dimension of each vector takes a random value within a preset range, and all vectors are normalized after initialization. Training then enters a loop that uses mini-batches to speed up training. For each batch, negative sampling is performed (one entity of a triple in the training set is replaced at random). T_batch is initially an empty list, to which pairs of (original triple, corrupted triple) are appended: T_batch = [([h, r, t], [h', r, t']), ([...], [...]), ...]. Finally, the model is trained on T_batch and the parameters are adjusted by gradient descent.
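The training loop described above can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation of this embodiment: the initialization range 6/sqrt(dim), the squared L2 distance, the default values and the function names are all choices made for the example, and γ is used as the margin of a ranking loss.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def distance(h, r, t):
    """Squared L2 distance between h + r and t (assumed distance measure)."""
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t))


def train_transe(triples, dim=8, gamma=1.0, lr=0.01, epochs=200,
                 batch_size=2, seed=0):
    rng = random.Random(seed)
    entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    relations = sorted({r for _, r, _ in triples})
    bound = 6 / math.sqrt(dim)  # assumed preset range for initialization

    def init():
        return normalize([rng.uniform(-bound, bound) for _ in range(dim)])

    ent = {e: init() for e in entities}
    rel = {r: init() for r in relations}

    for _ in range(epochs):
        batch = rng.sample(triples, min(batch_size, len(triples)))
        # Negative sampling: corrupt the head or the tail of each triple.
        t_batch = []
        for h, r, t in batch:
            if rng.random() < 0.5:
                corrupted = (rng.choice(entities), r, t)
            else:
                corrupted = (h, r, rng.choice(entities))
            t_batch.append(((h, r, t), corrupted))
        for (h, r, t), (h2, _, t2) in t_batch:
            loss = (gamma + distance(ent[h], rel[r], ent[t])
                    - distance(ent[h2], rel[r], ent[t2]))
            if loss <= 0:
                continue  # margin already satisfied, no update
            for i in range(dim):
                g_pos = 2 * (ent[h][i] + rel[r][i] - ent[t][i])
                g_neg = 2 * (ent[h2][i] + rel[r][i] - ent[t2][i])
                # Gradient descent step on gamma + d_pos - d_neg.
                ent[h][i] -= lr * g_pos
                ent[t][i] += lr * g_pos
                rel[r][i] -= lr * (g_pos - g_neg)
                ent[h2][i] += lr * g_neg
                ent[t2][i] -= lr * g_neg
        # Re-normalize the entity vectors after each mini-batch.
        ent = {e: normalize(v) for e, v in ent.items()}
    return ent, rel
```

After training, `distance(ent[h], rel[r], ent[t])` can be used to score a candidate triple; lower scores indicate a more plausible relation between two knowledge points.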
In a preferred embodiment, the initial learning data includes learner personal information, subject learning content information, subject exercise test data, and learning resource data.
Specifically, the learning resource data and the subject learning content information are collected in advance by an educational institution and entered into the system; the learner registers personal information when logging in to the system; and the subject exercise test data for the learner is obtained after the learner takes tests on the subject knowledge.
In a preferred embodiment, the labeling includes automatic labeling and manual labeling, wherein,
the automatic labeling specifically comprises: in the process of constructing the directed subject knowledge graph with the ontology editor, using the Jena framework to establish the hierarchical relations of the ontology terms;
the manual labeling specifically comprises: organizing the attribute relations and association relations of the extracted ontology terms, and importing them into the ontology editor through the Jena framework.
Specifically, the ontology terms are labeled in one of two modes, automatic labeling by the system or manual labeling. In automatic labeling, the Jena framework is used to establish the hierarchical relations of the ontology while the ontology is being constructed with Protégé. In manual labeling, the attribute relations and association relations are organized by subject experts and then imported into Protégé through the Jena framework. The more efficient mode can be chosen for each scenario, which improves flexibility and work efficiency.
In a preferred embodiment, the learning process state data comprises knowledge state records and knowledge level test records, wherein,
the knowledge state records comprise statistics on mastered knowledge points and statistics on unmastered knowledge points, and the knowledge level test records comprise the questions, the answer results, the number of attempts and the answering time.
Specifically, during the learner's learning process, the learning process state data is collected and updated in real time. It comprises the knowledge state records and the knowledge level test records, that is, the learner's mastery state for each knowledge point and the test records for the knowledge points.
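One possible in-memory shape for this data is sketched below. The class and field names are hypothetical; the text only states which pieces of information the records contain (questions, answer results, number of attempts, answering time, and mastery statistics).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnswerRecord:
    """One knowledge level test record (field names are hypothetical)."""
    question_id: str
    knowledge_point: str
    correct: bool          # answer result
    attempts: int          # number of attempts
    seconds_spent: float   # answering time


@dataclass
class LearnerState:
    """Learning process state data for one learner."""
    learner_id: str
    records: List[AnswerRecord] = field(default_factory=list)
    mastery: Dict[str, float] = field(default_factory=dict)

    def add_record(self, record: AnswerRecord) -> None:
        """Append a test record as soon as it arrives."""
        self.records.append(record)

    def mastered(self, threshold: float = 0.5) -> List[str]:
        """Knowledge points whose mastery estimate reaches the threshold."""
        return sorted(k for k, p in self.mastery.items() if p >= threshold)
```

The `mastery` dictionary is meant to hold the per-knowledge-point estimates produced by the tracing model; the 0.5 threshold is an arbitrary default for the example.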
In a preferred embodiment, as shown in fig. 3, updating and recording the learning process state data of the user in real time and performing multi-knowledge-point modeling on it through the DKT algorithm to obtain the learning mastery state of the user specifically comprises:
acquiring the learning test data of the learners over all time steps in the system as training samples for the DKT model;
implementing LSTM training with TensorFlow;
feeding the learning data of each learner as one batch and performing model training;
after model training is finished, exporting the DKT model and deploying it to TensorFlow Serving;
and calling the DKT model online in real time through TensorFlow Serving, and returning the student's knowledge point mastery state and the probability of answering each knowledge point correctly on the next attempt.
Specifically, fig. 2 shows the DKT model unrolled over time. The input sequence x1, x2, x3, … corresponds to the times t1, t2, t3, …; x′1, x′2, x′3, … are the external features (number of attempts and answering time) of the inputs x1, x2, x3, …; the hidden states h0, h1, h2, … correspond to the student's knowledge point mastery at each time; and the output sequence y1, y2, y3, … gives, for each time, the probability that the student answers each exercise in the question bank correctly. When a learner answers questions, the system calls the model online in real time through TensorFlow Serving: the hidden layer returns the student's knowledge point mastery state and the output layer returns the probability of answering the knowledge point correctly on the next attempt, so that the user's learning mastery state is obtained accurately and the subject knowledge graph is updated dynamically.
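The online call can be illustrated by building the request that a client would send to TensorFlow Serving. The sketch below assumes the one-hot input encoding commonly used for DKT (a vector of length 2 × number of knowledge points per answer), which this text does not specify, and it omits the external features x′. The model name `dkt`, host and port are placeholders; the URL pattern and the `instances` field follow the TensorFlow Serving REST predict API.

```python
import json


def encode_interaction(skill_index, correct, num_skills):
    """One-hot encode one (knowledge point, result) pair as a 2*num_skills vector."""
    vec = [0.0] * (2 * num_skills)
    # Correct answers use the second half of the vector (assumed convention).
    vec[skill_index + (num_skills if correct else 0)] = 1.0
    return vec


def build_predict_request(history, num_skills, model_name="dkt",
                          host="localhost", port=8501):
    """Return (url, body) for a TensorFlow Serving REST predict call.

    history is a list of (skill_index, correct) pairs ordered by time.
    model_name, host and port are placeholders for a real deployment.
    """
    sequence = [encode_interaction(s, c, num_skills) for s, c in history]
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": [sequence]})
    return url, body
```

The returned `url` and `body` can be sent with any HTTP client as a POST request. How the response maps to hidden-state and output-layer values depends on the signature with which the model was exported.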
In a preferred embodiment, as shown in fig. 4, mining the relations between the knowledge points according to the learning mastery state of the user and dynamically updating the subject knowledge graph specifically comprises:
obtaining, through the DKT model, the current probability of every learner answering each knowledge point correctly;
identifying the implicit relations between the knowledge points with a probabilistic association rule mining technique;
and judging the user's mastery state of the knowledge points according to the implicit relations between the knowledge points, and dynamically updating the subject knowledge graph in real time.
Specifically, the mining proceeds as follows: the current probabilities of all learners answering each knowledge point correctly are obtained through DKT, and the implicit relations between the knowledge points are identified with the probabilistic association rule mining technique. From the perspective of the prerequisite relationship, if the concept Si is a prerequisite of the concept Sj, then a learner who does not know Si is likely not to know Sj, and a learner who knows Sj is likely to know Si. This yields two candidate association rules: knowing Sj implies knowing Si, and not knowing Si implies not knowing Sj.
Two key indexes, the support supp and the confidence conf, are calculated for the first rule and compared against two key parameters, the thresholds minsupp and minconf; the rule is accepted when both indexes reach their thresholds.
The second rule is evaluated in the same way.
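The support and confidence conditions can be written out explicitly. The expressions below use the standard association rule definitions, writing $S = 1$ for a mastered and $S = 0$ for an unmastered knowledge point (a notation assumed here); they are a plausible reading, not formulas quoted from this text.

```latex
% Rule S_j => S_i: learners who know S_j also know S_i
\mathrm{supp}(S_j \Rightarrow S_i) = P(S_i = 1,\, S_j = 1) \ge \mathit{minsupp}, \qquad
\mathrm{conf}(S_j \Rightarrow S_i) = P(S_i = 1 \mid S_j = 1) \ge \mathit{minconf}

% Rule \neg S_i => \neg S_j: learners who do not know S_i do not know S_j
\mathrm{supp}(\neg S_i \Rightarrow \neg S_j) = P(S_i = 0,\, S_j = 0) \ge \mathit{minsupp}, \qquad
\mathrm{conf}(\neg S_i \Rightarrow \neg S_j) = P(S_j = 0 \mid S_i = 0) \ge \mathit{minconf}
```

Under this reading, Si is accepted as a prerequisite of Sj only when both rules satisfy both thresholds.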
The method for constructing the knowledge graph provided by this embodiment comprises the following steps: acquiring initial learning data, storing the initial learning data in a learning resource database, and extracting and labeling ontology terms from the initial learning data according to the subject syllabus and teaching materials; entering the relations and ordering between the meta-knowledge points, constructing a directed subject knowledge graph with an ontology editor, generating an OWL file, and storing the OWL file in the learning system in triple form; updating and recording the learning process state data of the user in real time, and performing multi-knowledge-point modeling on the learning process state data through the DKT algorithm to obtain the learning mastery state of the user; and mining the relations between the knowledge points according to the learning mastery state of the user, and dynamically updating the subject knowledge graph. By relying on Deep Knowledge Tracing (DKT), a knowledge tracing method based on deep learning, the embodiment handles multi-knowledge-point modeling scenarios effectively, tracks the learner's latest knowledge state accurately, completes the construction of the knowledge graph, obtains the internal relations between knowledge points, performs knowledge reasoning on the knowledge graph, and updates the knowledge graph dynamically.
Second embodiment of the invention
Please refer to fig. 2-5.
As shown in fig. 5, an embodiment of the present invention further provides a system for constructing a knowledge graph, including:
the data module 100, which is used for collecting initial learning data, storing the initial learning data in a learning resource database, extracting ontology terms from the initial learning data according to the subject syllabus and teaching materials, and labeling the extracted ontology terms;
Specifically, for the data module 100, initial learning data is collected, including the curriculum objectives, learning resources, course structures, teaching strategies, exercise and test question banks, and the like of each field. Subject experts extract the ontology terms from the subject syllabus and teaching materials in advance. The ontology terms are labeled in one of two modes, automatic labeling by the system or manual labeling, so that the more efficient mode can be chosen for each scenario, which improves flexibility and work efficiency.
the graph construction module 200, which is used for entering the relations and ordering between the meta-knowledge points, constructing a directed subject knowledge graph with the ontology editor, generating an OWL file, and storing the OWL file in the learning system in triple form;
Specifically, for the graph construction module 200, the relations and ordering between the meta-knowledge points are entered according to the initial learning data collected by the data module 100. A directed subject knowledge graph is then constructed with the ontology editor Protégé from the extracted and labeled ontology terms and from the relations and ordering between the meta-knowledge points. After construction is complete, an OWL file is generated and stored in the learning system in triple form.
the DKT module 300, which is used for updating and recording the learning process state data of the user in real time, and performing multi-knowledge-point modeling on the learning process state data through the DKT algorithm to obtain the learning mastery state of the user;
Specifically, for the DKT module 300, the learner logs in to the learning system to learn and take tests. During learning, multi-knowledge-point modeling of the learner is performed through the DKT algorithm based on the learner's learning and test data, yielding the learner's current learning mastery state, such as the mastery state of each knowledge point, the answering time and the answer accuracy.
the knowledge point mining module 400, which is used for mining the relations between the knowledge points according to the learning mastery state of the user and dynamically updating the subject knowledge graph;
Specifically, for the knowledge point mining module 400, as shown in fig. 4, the current probability of every learner answering each knowledge point correctly is obtained through the DKT model; the implicit relations between the knowledge points are identified with a probabilistic association rule mining technique; and the user's mastery state of the knowledge points is judged according to the implicit relations between the knowledge points, and the subject knowledge graph is dynamically updated in real time.
The mining proceeds as follows: the current probabilities of all learners answering each knowledge point correctly are obtained through DKT, and the implicit relations between the knowledge points are identified with the probabilistic association rule mining technique. From the perspective of the prerequisite relationship, if the concept Si is a prerequisite of the concept Sj, then a learner who does not know Si is likely not to know Sj, and a learner who knows Sj is likely to know Si. This yields two candidate association rules: knowing Sj implies knowing Si, and not knowing Si implies not knowing Sj.
Two key indexes, the support supp and the confidence conf, are calculated for the first rule and compared against two key parameters, the thresholds minsupp and minconf; the rule is accepted when both indexes reach their thresholds.
The second rule is evaluated in the same way.
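A minimal Python sketch of such a mining step is given below. It computes expected support and confidence directly from mastery probabilities such as those returned by a DKT model, treating the mastery of different knowledge points as independent for each learner. That independence assumption, the function names and the default thresholds are choices made for this example and are not taken from the embodiment.

```python
def rule_stats(prob, i, j):
    """Expected support and confidence of both prerequisite rules for S_i -> S_j.

    prob[k][s] is the estimated probability that learner k has mastered
    knowledge point s (for example, a DKT output). Mastery of different
    knowledge points is treated as independent per learner, which is a
    simplifying assumption of this sketch.
    """
    n = len(prob)
    both_known = sum(p[i] * p[j] for p in prob)
    both_unknown = sum((1 - p[i]) * (1 - p[j]) for p in prob)
    known_j = sum(p[j] for p in prob)
    unknown_i = sum(1 - p[i] for p in prob)
    return {
        "supp_known": both_known / n,
        "conf_known": both_known / known_j if known_j else 0.0,
        "supp_unknown": both_unknown / n,
        "conf_unknown": both_unknown / unknown_i if unknown_i else 0.0,
    }


def is_prerequisite(prob, i, j, minsupp=0.2, minconf=0.8):
    """True if both rules S_j => S_i and not S_i => not S_j pass the thresholds."""
    s = rule_stats(prob, i, j)
    return (s["supp_known"] >= minsupp and s["conf_known"] >= minconf
            and s["supp_unknown"] >= minsupp and s["conf_unknown"] >= minconf)
```

Running `is_prerequisite` over every ordered pair of knowledge points gives a set of candidate prerequisite edges that can be compared with, or added to, the directed subject knowledge graph.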
and the knowledge inference module 500, which is used for performing knowledge reasoning on the subject knowledge graph with the TransE algorithm, adding new meta-knowledge points and their hierarchical relations, and thereby further refining the subject knowledge graph.
Specifically, for the knowledge inference module 500, TransE is applied to the triples of the OWL file stored in the learning system. Each triple consists of a subject, a relation and an object; the subject and the object are collectively referred to as entities. In the directed knowledge graph, the triples are linked into a graph in which each node is an entity and each edge is a relation pointing from the subject to the object. Intuitively, TransE learns distributed vector representations of the entities and relations and treats the relation in each triple as a translation from the subject to the object: the subject, relation and object vectors are adjusted continuously so that the subject vector plus the relation vector is as close as possible to the object vector. This completes the relation inference between subject and object, so that meta-knowledge points and their relations can be added or removed, the subject knowledge graph is further refined, and the prediction accuracy and effectiveness of the knowledge graph are improved.
The training process of the TransE algorithm is as follows. First, the training set, the hyper-parameter γ and the learning rate λ are determined. The relation vectors and entity vectors are then initialized: each dimension of each vector takes a random value within a preset range, and all vectors are normalized after initialization. Training then enters a loop that uses mini-batches to speed up training. For each batch, negative sampling is performed (one entity of a triple in the training set is replaced at random). T_batch is initially an empty list, to which pairs of (original triple, corrupted triple) are appended: T_batch = [([h, r, t], [h', r, t']), ([...], [...]), ...]. Finally, the model is trained on T_batch and the parameters are adjusted by gradient descent.
In a preferred embodiment, as shown in fig. 3, the DKT module 300 is specifically constructed by the following steps:
acquiring the learning test data of the learners over all time steps in the system as training samples for the DKT model;
implementing LSTM training with TensorFlow;
feeding the learning data of each learner as one batch and performing model training;
after model training is finished, exporting the DKT model and deploying it to TensorFlow Serving;
and calling the DKT model online in real time through TensorFlow Serving, and returning the student's knowledge point mastery state and the probability of answering each knowledge point correctly on the next attempt.
Specifically, fig. 2 shows the DKT model structure unrolled over time. The input sequence x1, x2, x3, … corresponds to the times t1, t2, t3, …; x′1, x′2, x′3, … are the external features (number of attempts and answering time) of the inputs x1, x2, x3, …; the hidden states h0, h1, h2, … correspond to the student's knowledge point mastery at each time; and the output sequence y1, y2, y3, … gives, for each time, the probability that the student answers each exercise in the question bank correctly. When a learner answers questions, the system calls the model online in real time through TensorFlow Serving: the hidden layer returns the student's knowledge point mastery state and the output layer returns the probability of answering the knowledge point correctly on the next attempt, so that the user's learning mastery state is obtained accurately and the subject knowledge graph is updated dynamically.
In a preferred embodiment, the labeling includes automatic labeling and manual labeling, wherein,
the automatic labeling specifically comprises: in the process of constructing the directed subject knowledge graph with the ontology editor, using the Jena framework to establish the hierarchical relations of the ontology terms;
the manual labeling specifically comprises: organizing the attribute relations and association relations of the extracted ontology terms, and importing them into the ontology editor through the Jena framework.
Specifically, the ontology terms are labeled in one of two modes, automatic labeling by the system or manual labeling. In automatic labeling, the Jena framework is used to establish the hierarchical relations of the ontology while the ontology is being constructed with Protégé. In manual labeling, the attribute relations and association relations are organized by subject experts and then imported into Protégé through the Jena framework. The more efficient mode can be chosen for each scenario, which improves flexibility and work efficiency.
The system for constructing a knowledge graph provided by this embodiment comprises: the data module, which is used for collecting initial learning data, storing the initial learning data in a learning resource database, extracting ontology terms from the initial learning data according to the subject syllabus and teaching materials, and labeling the extracted ontology terms; the graph construction module, which is used for entering the relations and ordering between the meta-knowledge points, constructing a directed subject knowledge graph with the ontology editor, generating an OWL file, and storing the OWL file in the learning system in triple form; the DKT module, which is used for updating and recording the learning process state data of the user in real time, and performing multi-knowledge-point modeling on the learning process state data through the DKT algorithm to obtain the learning mastery state of the user; the knowledge point mining module, which is used for mining the relations between the knowledge points according to the learning mastery state of the user and dynamically updating the subject knowledge graph; and the knowledge inference module, which is used for performing knowledge reasoning on the subject knowledge graph with the TransE algorithm, adding new meta-knowledge points and their hierarchical relations, and thereby further refining the subject knowledge graph. By relying on Deep Knowledge Tracing (DKT), a knowledge tracing method based on deep learning, the embodiment handles multi-knowledge-point modeling scenarios effectively, tracks the learner's latest knowledge state accurately, completes the construction of the knowledge graph, obtains the internal relations between knowledge points, performs knowledge reasoning on the knowledge graph, and updates the knowledge graph dynamically.
Another embodiment of the present invention further provides a computer-readable storage medium, which includes a stored computer program, wherein when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute the method for constructing a knowledge graph as described above.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the modules may be a logical division, and in actual implementation, there may be another division, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The foregoing is directed to the preferred embodiment of the present invention, and it is understood that various changes and modifications may be made by one skilled in the art without departing from the spirit of the invention, and it is intended that such changes and modifications be considered as within the scope of the invention.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.