Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art or the related art.
To this end, an aspect of the present invention is to provide a knowledge point prediction method.
Another aspect of the invention is to provide a knowledge point prediction system.
Yet another aspect of the present invention is directed to a readable storage medium.
In view of the above, according to an aspect of the present invention, a knowledge point prediction method is provided, including: acquiring a question bank together with its corresponding knowledge point labels and knowledge point label tree; preprocessing all question texts in the question bank according to the knowledge point labels to obtain a plurality of word vector samples; training a preset text classification model according to the plurality of word vector samples and the knowledge point label tree to obtain a prediction model; and predicting knowledge points according to the prediction model.
The knowledge point prediction method provided by the invention first collects an online question bank, the knowledge point labels of the question bank, and a knowledge point label tree, where the knowledge point label tree enriches the hierarchical representation of the knowledge point labels of each question. The method then enters a question preprocessing stage in which all question texts in the question bank are preprocessed according to the knowledge point labels to obtain a plurality of word vector samples. Next, a preset text classification model is trained according to the plurality of word vector samples and the knowledge point label tree, thereby converting the problem of predicting the knowledge points related to a question into a hierarchical multi-label classification problem. Moreover, the hierarchical characteristics of knowledge points allow similar knowledge point labels to be merged; for example, equation calculation in mathematics can include linear equations in one unknown and linear equations in several unknowns, so the hierarchical multi-label classification structure can be simplified. Finally, knowledge points are predicted according to the prediction model obtained by training, namely the optimal model.
According to the knowledge point prediction method provided by the invention, on the one hand, the problem of predicting the knowledge points related to a question is converted into a hierarchical multi-label classification problem; compared with traditional question knowledge point prediction methods, neither manual labeling nor the empirical selection of similarity thresholds is needed, so the prediction precision is improved. On the other hand, character vectors are used in place of word vectors when training the model, so neither word segmentation nor a domain dictionary is needed in the question text preprocessing stage, which saves manpower and material resources and greatly improves the prediction precision and universality.
The knowledge point label tree is a multidisciplinary hierarchical knowledge point label tree; the online question bank may be one or more, and is not limited herein.
The knowledge point prediction method according to the present invention may further include the following technical features:
in the above technical solution, the step of training the preset text classification model according to the plurality of word vector samples and the knowledge point label tree to obtain the prediction model specifically includes: training the preset text classification model according to the plurality of word vector samples and the knowledge point label tree to obtain a target model, and outputting the corresponding hierarchical labels; judging whether the hierarchical labels are predicted correctly; taking the target model as the prediction model if the prediction is correct; and continuing to train the target model if the prediction is incorrect.
In this technical solution, the preset text classification model is trained with the plurality of word vector samples and the knowledge point label tree, so that it can learn hierarchical knowledge point labels. When training finishes, a target model is obtained and the corresponding hierarchical labels are output. Whether the target model is the optimal model is determined by judging whether the hierarchical labels are predicted correctly. If the hierarchical labels are predicted correctly, the target model is the optimal model and is taken as the prediction model; if they are predicted incorrectly, the target model must be trained further to obtain the optimal model. With this technical solution, the preset text classification model can learn hierarchical knowledge point labels, and judging whether the hierarchical labels are predicted correctly determines the optimal model and improves the prediction precision.
In any of the above technical solutions, the hierarchical label includes a first-level classification label and a second-level classification label.
In this embodiment, the hierarchical labels include a first-level classification label and a second-level classification label, but are not limited thereto.
In any of the above technical solutions, the step of training a preset text classification model according to the plurality of word vector samples and the knowledge point label tree to obtain a target model and outputting hierarchical labels specifically includes: inputting the plurality of word vector samples into the preset text classification model for training so as to output a first-level classification label; and, while keeping the word vector samples and the convolutional-layer parameters of the preset text classification model unchanged, fine-tuning the preset text classification model according to the knowledge point label tree to obtain the target model and output a second-level classification label.
In this technical solution, in the question text preprocessing step, all question texts are converted into a plurality of word vector samples, and the word vector samples are input into the preset text classification model for feature learning to train the first-level classification label, so that text sequence information is effectively captured. The word vector samples and the convolutional-layer parameters are then kept unchanged, and on that basis the knowledge point label tree is input into the output layer of the first-stage model for fine-tuning, obtaining the target model and outputting a second-level classification label, so that the hierarchical relationship of the knowledge points is effectively learned. With this technical solution, the problem of predicting the knowledge points related to a question is converted into a hierarchical label classification problem; text sequence information is effectively captured and the hierarchical relationship of the knowledge points is effectively learned, so the prediction precision and universality are greatly improved.
In any of the above technical solutions, the primary classification label includes any one of or a combination of the following: mathematics, language, English, physics, chemistry, biology; the secondary classification label includes any one of or a combination of the following: equation, triangle, acceleration, gravity.
In this embodiment, the first-level classification labels include any one or a combination of mathematics, language, English, physics, chemistry, and biology. It is understood that subject types in the education field are not limited to these six, so the classification of the top-level labels is not limited thereto. Likewise, the knowledge points under each subject are diverse, so the second-level classification labels include, but are not limited to, any one or a combination of equations, triangles, acceleration, and gravity.
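By way of illustration only, the two-level label tree described above might be represented as a simple mapping from first-level to second-level labels; the subjects and knowledge points shown are examples taken from the text, not an exhaustive structure:

```python
# Minimal sketch of a two-level knowledge point label tree.
# The concrete subjects and knowledge points are illustrative.
LABEL_TREE = {
    "mathematics": ["equation", "triangle"],
    "physics": ["acceleration", "gravity"],
}

def secondary_labels(primary):
    """Return the second-level labels under a first-level label."""
    return LABEL_TREE.get(primary, [])

def primary_of(secondary):
    """Look up the first-level label a second-level label belongs to."""
    for parent, children in LABEL_TREE.items():
        if secondary in children:
            return parent
    return None
```

A richer tree (more levels, more subjects) would follow the same parent-to-children pattern.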
In any of the above technical solutions, the preset text classification model is a TextCNN model.
In this technical solution, the preset text classification model is a TextCNN model, but is not limited thereto. The TextCNN model is used for extracting the text features of questions and can efficiently extract semantic features from short texts; meanwhile, fine-tuning the improved TextCNN model based on the knowledge point label tree enables the model to learn hierarchical knowledge point labels, which greatly improves the prediction precision and universality and also improves the prediction speed.
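To illustrate the multi-scale convolution idea behind TextCNN (this is not the disclosure's actual implementation), the following framework-free sketch slides kernels of several widths over a sequence of character vectors and max-pools each feature map into one feature; the embedding dimension, kernel widths, and random kernels are illustrative assumptions:

```python
import random

EMB_DIM = 8  # illustrative character-vector dimension

def conv1d_maxpool(seq, kernel):
    """Dot each width-w window of the sequence with a flattened
    kernel, then max-pool over all window positions."""
    width = len(kernel) // EMB_DIM
    scores = []
    for i in range(len(seq) - width + 1):
        window = [x for vec in seq[i:i + width] for x in vec]
        scores.append(sum(a * b for a, b in zip(window, kernel)))
    return max(scores)

def textcnn_features(seq, widths=(2, 3, 4), seed=0):
    """One pooled feature per kernel width, as in TextCNN; real models
    use many kernels per width and learn their weights."""
    rng = random.Random(seed)
    feats = []
    for w in widths:
        kernel = [rng.uniform(-1, 1) for _ in range(w * EMB_DIM)]
        feats.append(conv1d_maxpool(seq, kernel))
    return feats
```

The pooled features would then feed a fully connected output layer that produces the classification labels.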
In any of the above technical solutions, the step of preprocessing all question texts in the question bank according to the knowledge point labels specifically includes: segmenting each question text into one or more corresponding text short sentences according to the one or more knowledge point labels corresponding to that question text; and converting the text short sentences corresponding to all the question texts into the corresponding word vector samples.
In this technical solution, in the data acquisition stage, an online question bank and the corresponding knowledge point labels and knowledge point label tree are crawled. In the data preprocessing stage, each question text is preprocessed according to the knowledge point labels. First, each question text is divided into different text short sentences at punctuation marks such as commas, periods, and spaces, with each text short sentence corresponding to one knowledge point label; the text short sentences are then represented with vectors to obtain the word vector samples. Because character-level vectors are used in place of word-level vectors during training, neither word segmentation nor a domain dictionary is needed, which saves manpower and material resources and ensures the universality and domain independence of the model.
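The preprocessing described above can be sketched as follows; the filtered character set, the fixed length, and the vocabulary are illustrative assumptions rather than values fixed by the disclosure:

```python
import re

FIXED_LEN = 16  # illustrative fixed sample length

def preprocess(question_text, labels, vocab):
    """Filter preset characters, split the question text into short
    sentences at commas, periods, and spaces, pair each short sentence
    with its knowledge point label, and map characters to padded
    fixed-length index sequences (0 = unknown/padding)."""
    text = re.sub(r"[#*_]", "", question_text)        # drop preset characters
    phrases = [p for p in re.split(r"[,.。，\s]+", text) if p]
    samples = []
    for phrase, label in zip(phrases, labels):        # one label per phrase
        ids = [vocab.get(ch, 0) for ch in phrase][:FIXED_LEN]
        ids += [0] * (FIXED_LEN - len(ids))           # pad to fixed length
        samples.append((ids, label))
    return samples
```

In the full pipeline each index would be looked up in a pretrained embedding table to form the actual vector sample.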
In any of the above technical solutions, before the step of segmenting each question text into one or more corresponding text short sentences according to the one or more knowledge point labels corresponding to that question text, the method further includes: performing data filtering processing on each question text to delete preset characters.
In this technical solution, in the question text preprocessing stage, filtering is required before each question text is segmented into short sentences. Preset characters, namely unnecessary characters and characters irrelevant to the knowledge points, are filtered out by the data filtering processing, which removes interference so that the text short sentences can be segmented better and faster. The TextCNN model then performs feature learning with multi-scale convolution kernels to effectively capture text sequence information.
According to another aspect of the present invention, there is provided a knowledge point prediction system including: a memory storing a program; and a processor that implements the knowledge point prediction method of any one of the above technical solutions when executing the program.
The knowledge point prediction system provided by the invention realizes the steps of the knowledge point prediction method of any one of the above technical schemes when the processor executes the program, so the knowledge point prediction system has all the beneficial effects of the knowledge point prediction method of any one of the above technical schemes.
According to still another aspect of the present invention, there is provided a readable storage medium, on which a program is stored, the program, when executed by a processor, implementing the knowledge point prediction method according to any one of the above-mentioned aspects.
The program stored on the readable storage medium provided by the present invention, when executed by a processor, implements the steps of the knowledge point prediction method according to any one of the above technical solutions, and therefore the readable storage medium includes all the advantageous effects of the knowledge point prediction method according to any one of the above technical solutions.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
According to an embodiment of an aspect of the present invention, a knowledge point prediction method is proposed.
First embodiment, fig. 1 is a flowchart illustrating a knowledge point prediction method according to a first embodiment of the present invention. The knowledge point prediction method comprises the following steps:
step 102, acquiring a question bank and the corresponding knowledge point labels and knowledge point label tree;
step 104, preprocessing all question texts in a question bank according to the knowledge point labels to obtain a plurality of word vector samples;
step 106, training a preset text classification model according to a plurality of word vector samples and a knowledge point label tree to obtain a prediction model;
and step 108, predicting the knowledge points according to the prediction model.
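Steps 102 to 108 can be sketched as a high-level pipeline skeleton; every helper name here is a hypothetical stand-in, not an API defined by this disclosure:

```python
def predict_knowledge_points(acquire, preprocess, train, texts):
    """Orchestrate the four steps; the acquire/preprocess/train
    callables are hypothetical placeholders for the stages described
    in steps 102-106."""
    bank, labels, label_tree = acquire()              # step 102
    samples = preprocess(bank, labels)                # step 104
    model = train(samples, label_tree)                # step 106
    return [model(text) for text in texts]            # step 108
```

With stub implementations this runs end to end, which is how the stages compose regardless of the concrete model.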
According to the knowledge point prediction method provided by this embodiment, an online question bank, the knowledge point labels of the question bank, and a knowledge point label tree are first collected, where the knowledge point label tree enriches the hierarchical representation of the knowledge point labels of each question. The method then enters a question preprocessing stage in which all question texts in the question bank are preprocessed according to the knowledge point labels to obtain a plurality of word vector samples. Next, a preset text classification model is trained according to the plurality of word vector samples and the knowledge point label tree, thereby converting the problem of predicting the knowledge points related to a question into a hierarchical multi-label classification problem. Moreover, the hierarchical characteristics of knowledge points allow similar knowledge point labels to be merged; for example, equation calculation in mathematics can include linear equations in one unknown and linear equations in several unknowns, so the hierarchical multi-label classification structure can be simplified. Finally, knowledge points are predicted according to the prediction model obtained by training, namely the optimal model.
According to the knowledge point prediction method provided by the invention, on the one hand, the problem of predicting the knowledge points related to a question is converted into a hierarchical multi-label classification problem; compared with traditional question knowledge point prediction methods, neither manual labeling nor the empirical selection of similarity thresholds is needed, so the prediction precision is improved. On the other hand, character vectors are used in place of word vectors when training the model, so neither word segmentation nor a domain dictionary is needed in the question text preprocessing stage, which saves manpower and material resources and greatly improves the prediction precision and universality.
The knowledge point label tree is a multidisciplinary hierarchical knowledge point label tree; the online question bank may be one or more, and is not limited herein.
Second embodiment, fig. 2 is a flow chart of a knowledge point prediction method according to a second embodiment of the present invention. The knowledge point prediction method comprises the following steps:
step 202, acquiring a question bank and the corresponding knowledge point labels and knowledge point label tree;
step 204, preprocessing all question texts in a question bank according to the knowledge point labels to obtain a plurality of word vector samples;
step 206, training a preset text classification model according to a plurality of word vector samples and a knowledge point label tree to obtain a target model, and outputting corresponding hierarchical labels;
step 208, judging whether the hierarchical label is predicted correctly; if the prediction is correct, go to step 210, if the prediction is incorrect, go to step 212;
step 210, taking the target model as a prediction model, and predicting the knowledge points;
and step 212, continuing to train the target model.
In this embodiment, the preset text classification model is trained with the plurality of word vector samples and the knowledge point label tree, so that it can learn hierarchical knowledge point labels. When training finishes, a target model is obtained and the corresponding hierarchical labels are output. Whether the target model is the optimal model is determined by judging whether the hierarchical labels are predicted correctly. If the hierarchical labels are predicted correctly, the target model is the optimal model and is taken as the prediction model; if they are predicted incorrectly, the target model must be trained further to obtain the optimal model. With this technical solution, the preset text classification model can learn hierarchical knowledge point labels and the optimal model can be determined, which improves the training precision.
In one embodiment of the invention, the hierarchical labels include a primary classification label and a secondary classification label.
In this embodiment, the hierarchical labels include, but are not limited to, primary classification labels and secondary classification labels.
Third embodiment, fig. 3 is a flow chart of a knowledge point prediction method according to a third embodiment of the present invention. The knowledge point prediction method comprises the following steps:
step 302, acquiring a question bank and the corresponding knowledge point labels and knowledge point label tree;
step 304, preprocessing all question texts in a question bank according to the knowledge point labels to obtain a plurality of word vector samples;
step 306, inputting a plurality of word vector samples into a preset text classification model for training so as to output a first-level classification label;
step 308, fine-tuning the preset text classification model according to the knowledge point label tree while keeping the word vector samples and the convolutional-layer parameters of the preset text classification model unchanged, so as to obtain a target model and output a second-level classification label;
step 310, judging whether the first-level classification label and the second-level classification label are predicted correctly; if the prediction is correct, go to step 312, if the prediction is incorrect, go to step 314;
step 312, taking the target model as a prediction model, and predicting the knowledge points;
and step 314, continuing to train the target model.
In this embodiment, in the question text preprocessing stage, all question texts are converted into a plurality of word vector samples, and the word vector samples are input into the preset text classification model for feature learning to train the first-level classification label, so that text sequence information is effectively captured. The word vector samples and the convolutional-layer parameters are then kept unchanged, and on that basis the knowledge point label tree is input into the output layer of the first-stage model for fine-tuning, obtaining the target model and outputting a second-level classification label, so that the hierarchical relationship of the knowledge points is effectively learned. With this technical solution, the problem of predicting the knowledge points related to a question is converted into a hierarchical label classification problem; text sequence information is effectively captured and the hierarchical relationship of the knowledge points is effectively learned, so the prediction precision and universality are greatly improved.
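The two-stage procedure above (train everything for the first-level label, then freeze the convolutional layer and fine-tune only the output layer) can be illustrated with a deliberately tiny, framework-free sketch; the parameter names, learning rate, and gradient values are illustrative assumptions, not the disclosure's exact procedure:

```python
def sgd_step(params, grads, frozen=(), lr=0.1):
    """Apply one gradient-descent step, skipping frozen parameter
    groups (the essence of fine-tuning with a frozen backbone)."""
    return {
        name: (w if name in frozen else w - lr * grads.get(name, 0.0))
        for name, w in params.items()
    }

params = {"conv": 1.0, "output": 1.0}
# Stage 1: train all parameters on the first-level classification labels.
params = sgd_step(params, {"conv": 0.5, "output": 0.5})
# Stage 2: fine-tune with the label tree; convolutional layer frozen,
# so only the output layer moves.
params = sgd_step(params, {"conv": 0.5, "output": 0.5}, frozen=("conv",))
```

In a real framework the same effect is achieved by disabling gradients on the convolutional layers before the second training pass.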
Further, a BSF (Boolean screening function) algorithm is used to judge whether the hierarchical labels are predicted correctly.
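The disclosure names the BSF (Boolean screening function) algorithm without specifying it. One plausible reading, sketched here purely as an assumption, is a boolean check that the predicted second-level label actually lies under the predicted first-level label in the label tree and that the pair matches the ground truth:

```python
def bsf_check(pred_primary, pred_secondary, truth, tree):
    """Hypothetical boolean screening: the predicted secondary label
    must be a child of the predicted primary label in the tree, and
    the (primary, secondary) pair must match the ground-truth pair."""
    in_tree = pred_secondary in tree.get(pred_primary, [])
    return in_tree and (pred_primary, pred_secondary) == truth
```

Under this reading, a False result sends the target model back for further training, as in step 212.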
Fourth embodiment, fig. 4 is a flow chart of a knowledge point prediction method according to a fourth embodiment of the present invention. The knowledge point prediction method comprises the following steps:
step 402, acquiring a question bank and the corresponding knowledge point labels and knowledge point label tree;
step 404, segmenting each question text into one or more corresponding text short sentences according to the one or more knowledge point labels corresponding to that question text;
step 406, converting the text short sentences corresponding to all the question texts into the corresponding word vector samples;
step 408, inputting a plurality of word vector samples into a preset text classification model for training so as to output a first-level classification label;
step 410, fine-tuning the preset text classification model according to the knowledge point label tree while keeping the word vector samples and the convolutional-layer parameters of the preset text classification model unchanged, so as to obtain a target model and output a second-level classification label;
step 412, determining whether the first class label and the second class label are predicted correctly; if the prediction is correct, go to step 414; if the prediction is incorrect, go to step 416;
step 414, taking the target model as a prediction model, and predicting the knowledge points;
and step 416, continuing to train the target model.
In this embodiment, in the data acquisition stage, the online question bank and the corresponding knowledge point labels and knowledge point label tree are obtained by crawling. In the data preprocessing stage, each question text is preprocessed according to the knowledge point labels. First, each question text is divided into different text short sentences at punctuation marks such as commas, periods, and spaces, with each text short sentence corresponding to one knowledge point label; the text short sentences are then represented with vectors to obtain the word vector samples. Because character-level vectors are used in place of word-level vectors during training, neither word segmentation nor a domain dictionary is needed, which saves manpower and material resources and ensures the universality and domain independence of the model.
In the fifth embodiment, before the step of segmenting each question text into one or more corresponding text short sentences according to the one or more knowledge point labels corresponding to that question text, the knowledge point prediction method further includes: performing data filtering processing on each question text to delete preset characters.
In this embodiment, in the question text preprocessing stage, filtering is required before each question text is segmented into short sentences. Preset characters, namely unnecessary characters and characters irrelevant to the knowledge points, are filtered out by the data filtering processing, which removes interference so that the text short sentences can be segmented better and faster; the TextCNN model then performs feature learning with multi-scale convolution kernels to effectively capture text sequence information.
In any of the above embodiments, the primary classification label includes any one or a combination of: mathematics, language, English, physics, chemistry, biology; the secondary classification label includes any one or a combination of: equation, triangle, acceleration, gravity.
In this embodiment, the primary classification labels include any one or a combination of mathematics, language, English, physics, chemistry, and biology. It is understood that subject types in the education field are not limited to these six, so the classification of the top-level labels is not limited thereto. Likewise, the knowledge points under each subject are diverse, so the secondary classification labels include, but are not limited to, any one or a combination of equations, triangles, acceleration, and gravity.
In any of the above embodiments, the preset text classification model is a TextCNN model.
In this embodiment, the preset text classification model is a TextCNN model, but is not limited thereto. The TextCNN model is used for extracting the text features of questions and can efficiently extract semantic features from short texts; meanwhile, fine-tuning the improved TextCNN model based on the knowledge point label tree enables the model to learn hierarchical knowledge point labels, which greatly improves the prediction precision and universality and also improves the prediction speed.
According to an embodiment of another aspect of the present invention, a knowledge point prediction system is proposed, and fig. 5 shows a schematic block diagram of a knowledge point prediction system 500 of an embodiment of the present invention. The knowledge point prediction system 500 includes: a memory 502, the memory 502 storing a program; and a processor 504, wherein the processor 504 executes the program to implement the knowledge point prediction method according to any one of the above embodiments.
In the knowledge point prediction system 500 provided in the present embodiment, the processor 504 executes the program to implement the steps of the knowledge point prediction method according to any one of the above embodiments, so the knowledge point prediction system 500 includes all the beneficial effects of the knowledge point prediction method according to any one of the above embodiments.
Fig. 6 is a flow chart of a knowledge point prediction method according to an embodiment of the present invention. The knowledge point prediction method comprises the following steps:
step 602, crawling the question bank and the hierarchical knowledge point label tree, and downloading open-source word vectors;
step 604, filtering out unnecessary characters, splitting the text into sentences at commas and periods and dividing different short sentences at spaces, so that each question text short sentence corresponds one-to-one with a knowledge point label, and representing the short sentences with fixed-length character vectors;
step 606, transmitting the phrase matrix represented by the word vector into an improved TextCNN for training;
step 608, keeping the vectors and the convolutional-layer parameters unchanged, performing fine-tuning on that basis, and judging with the BSF algorithm whether the hierarchical labels are predicted correctly;
and step 610, determining an optimal model and carrying out industrial field deployment.
In this embodiment, the knowledge point prediction method is divided into five steps:
1. Data acquisition stage: the online question bank and the corresponding knowledge point labels and knowledge point label tree are acquired, and the Tencent open-source word vectors are downloaded. The knowledge point label tree is a multidisciplinary hierarchical knowledge point label tree; taking question bank A as an example, its structure is shown in fig. 7, where the first-level classification labels include mathematics, physics, and so on, and the second-level classification labels under mathematics include equation calculation, triangles, and so on.
2. Data preprocessing stage: the question text is segmented at punctuation marks and standardized into fixed-length character vectors, which are fed to the TextCNN model to extract the question text features. In the related art, word segmentation is performed first and keywords are then extracted according to a domain dictionary to serve as knowledge points; in this embodiment, however, neither word segmentation nor a dictionary is needed and character vectors are used directly, so the scheme has universality and domain independence. The architecture of the TextCNN model is shown in fig. 8.
3. Model training stage: the short-sentence matrix represented by the vectors is fed into the improved TextCNN for training; the TextCNN model extracts the question text features and can quickly and efficiently extract semantic features from the short sentences.
4. Model optimization stage: the label prediction part of the TextCNN model is improved; the vectors and the convolutional-layer parameters are kept unchanged, fine-tuning is performed on that basis, and the BSF algorithm is used to judge whether the hierarchical labels are predicted correctly. Through the improved TextCNN model, a knowledge point hierarchy is innovatively introduced into the task, the hierarchical representation of each question's knowledge point labels is enriched, and similar knowledge point labels can be merged through the hierarchical characteristics of knowledge points. For example, under equation calculation in mathematics, knowledge points such as linear equations in one unknown and linear equations in several unknowns have a hierarchical relationship.
5. Deployment stage: the optimal model is determined and deployed in the industrial field.
In the related art, a similarity algorithm is used to predict question knowledge points, or a neural network with one hidden layer is built for each knowledge point to judge whether the question described by the input vector belongs to that network's knowledge point. Some approaches use the TextCNN model directly for multi-label classification without considering the hierarchical relationships among the knowledge point labels. The algorithmic models used in the related art thus either yield locally optimal results or ignore the relationship between text sequence information and knowledge points. In the embodiments of the invention, the problem of predicting the knowledge points related to a question is converted into a hierarchical label classification problem, and TextCNN performs feature learning with multi-scale convolution kernels, effectively capturing text sequence information. The word vectors and the convolutional-layer parameters are then kept unchanged and fine-tuning is performed on that basis, with the labels expanded from the first-level set [mathematics, ..., physics] to the next-level set [equation calculation, ..., triangle, acceleration, ..., gravity], so that the hierarchical relationships of the knowledge points are effectively learned.
According to the knowledge point prediction method provided by the embodiment, the hierarchical label prediction method is used for predicting the question knowledge point based on the word vector and the improved TextCNN model, so that the prediction precision and the universality are greatly improved.
According to an embodiment of a further aspect of the present invention, a readable storage medium is proposed, on which a program is stored, which when executed by a processor implements the knowledge point prediction method according to any one of the embodiments described above.
The present embodiment provides a readable storage medium, and the program is executed by a processor to implement the steps of the knowledge point prediction method according to any one of the above embodiments, so that the readable storage medium includes all the beneficial effects of the knowledge point prediction method according to any one of the above embodiments.
Compared with traditional question knowledge point prediction methods, the knowledge point prediction method, knowledge point prediction system, and readable storage medium provided herein have the following advantages:
1. Using the TextCNN model to extract question text features allows semantic features of short texts to be extracted quickly and efficiently.
2. For question texts, neither word segmentation nor a domain dictionary is needed; the model is trained directly on character vectors, so it has universality and domain independence.
3. The label prediction part of the TextCNN model is improved so that the model can learn hierarchical knowledge point labels, which greatly improves the prediction precision.
In the description herein, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance unless explicitly stated or limited otherwise; the terms "connected," "mounted," "secured," and the like are to be construed broadly and include, for example, fixed connections, removable connections, or integral connections; may be directly connected or indirectly connected through an intermediate. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the description herein, the description of the terms "one embodiment," "some embodiments," "specific embodiments," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.