CN119783635B - Question generation method, device, equipment and computer readable storage medium

Info

Publication number
CN119783635B
Authority
CN
China
Prior art keywords
entity
question
recognition
stem
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202510283454.2A
Other languages
Chinese (zh)
Other versions
CN119783635A (en)
Inventor
郑肖南
陈文坚
郑巨隆
易亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Tongyuan Zhihui Technology Co ltd
Original Assignee
Zhejiang Tongyuan Zhihui Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Tongyuan Zhihui Technology Co ltd
Priority to CN202510283454.2A
Publication of CN119783635A
Application granted
Publication of CN119783635B
Status: Active
Anticipated expiration

Abstract

The invention discloses a question generation method, device, equipment and computer readable storage medium, applied to the field of intelligent question production. The method comprises: obtaining a text material containing knowledge points and a word library formed by pre-arranging entities in each field and their corresponding entity labels; performing entity recognition on the text material by using a named entity recognition model combined with the word library to obtain a recognition result, the recognition result comprising a target entity and a corresponding target entity label; using a natural language processing model to perform semantic analysis and part-of-speech analysis on the recognition result in the text material to generate the stem of a question; and obtaining interference items (distractors) for the stem from the word library according to the target entity label, the interference items and the target entity serving as the options of the question. The invention decomposes question generation into several processes, and each process can be trained and optimized with a different model for accuracy, so the generated questions are more accurate.

Description

Question generation method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of intelligent question generation, and in particular, to a method, apparatus, device, and computer readable storage medium for generating questions.
Background
The questions used in quiz (answering) games span a wide range of categories and are needed in large quantities, so intelligently generated questions must cover many aspects of everyday knowledge and be of high quality in order to satisfy competitive answering by users. At the present stage, deep neural networks are generally used to produce questions in batches. However, because a neural network model is a black box, it can only learn certain rules from a large amount of data to generate questions, and how the questions are actually formed remains unknown; the output of the model is therefore uncontrolled, and problems such as incorrect answers, non-unique answers and unsuitable distractor options may exist.
Therefore, how to ensure the correctness of the final topic generation is a technical problem that needs to be solved currently.
Disclosure of Invention
Accordingly, an object of the present invention is to provide a method, apparatus, device and computer readable storage medium for generating questions, which solve the problem of incorrect generated questions in the prior art.
In order to solve the technical problems, the invention provides a method for generating a question, which comprises the following steps:
acquiring text materials containing knowledge points and word libraries formed by pre-arranging entities in each field and corresponding entity tags;
Carrying out entity recognition on the text material by using a named entity recognition model and combining the word stock to obtain a recognition result, wherein the recognition result comprises a target entity and a corresponding target entity label;
Using a natural language processing model, and carrying out semantic analysis and part-of-speech analysis according to the recognition result in the text material to generate a question stem;
and acquiring an interference item of the question stem from the word stock according to the target entity tag, and taking the interference item and the target entity as options of the question.
Optionally, using a named entity recognition model and combining the word stock to perform entity recognition on the text material to obtain a recognition result, including:
performing entity recognition on the text material by using a jieba entity recognition model combined with the word stock to obtain a first recognition result;
performing entity recognition on the text material by using a LAC entity recognition model combined with the word stock to obtain a second recognition result;
and taking the overlapping part of the first identification result and the second identification result as the identification result.
Optionally, after using the named entity recognition model and combining the word stock to perform entity recognition on the text material, the method further includes:
performing entity tag correction on the target entity tag through the trained FastText model to obtain a corrected entity tag, wherein the corrected entity tag and the target entity jointly form a corrected identification result;
Correspondingly, using a natural language processing model, and carrying out semantic analysis and part-of-speech analysis according to the recognition result in the text material, generating a question stem, including:
And carrying out semantic analysis and part-of-speech analysis on the corrected recognition result by using a natural language processing model to generate the stem of the question.
Optionally, using a natural language processing model, and performing semantic analysis and part-of-speech analysis according to the recognition result in the text material, to generate a stem of the question, including:
performing context understanding on the recognition result by using a BERT model, and if the target entity is associated with other entities and the word order of the stem is reasonable according to the logical order of the parts of speech in the text material, blanking out the target entity in the text material to obtain a gap-filling stem;
and performing natural language processing on the gap-filling stem by using a Transformer model to obtain the stem of the question.
Optionally, performing natural language processing on the gap-filling stem by using a Transformer model to obtain the stem of the question includes:
performing natural language processing on the gap-filling stem by using the Transformer model to obtain an optimized stem;
analyzing the optimized stem by using a stem evaluation model based on LSTM to obtain a stem parameter value;
if the parameter value of the question stem is larger than a preset question stem threshold value, taking the optimized question stem as the question stem of the question;
Otherwise, the gap-filling type question stem is used as the question stem of the question.
Optionally, after obtaining the interference item of the stem from the word stock according to the target entity tag, the method further includes:
carrying out diversification processing on the interference items of the stems by using a GAN model to obtain a plurality of options;
correspondingly, taking the interference item and the target entity as options of the question includes:
taking the plurality of options and the target entity as options of the question.
Optionally, taking the plurality of options and the target entity as options of the question includes:
analyzing the plurality of options by using an LSTM-based interference item evaluation model, sorting the options according to their parameter values, and obtaining a preset number of target options according to the sorting results;
and taking the target options and the target entity as options of the question.
The invention also provides a question generation device, which comprises:
the text material and word stock obtaining module is used for obtaining text materials containing knowledge points and word stocks formed by pre-arranging entities in each field and corresponding entity labels;
the entity recognition module is used for carrying out entity recognition on the text material by using a named entity recognition model and combining the word stock to obtain a recognition result, wherein the recognition result comprises a target entity and a corresponding target entity label;
the stem generation module is used for utilizing a natural language processing model, carrying out semantic analysis and part-of-speech analysis according to the identification result in the text material and generating a stem of a question;
and the option generating module is used for acquiring the interference item of the question stem from the word stock according to the target entity tag, and taking the interference item and the target entity as options of the question.
The invention also provides a question generation device, comprising:
A memory for storing a computer program;
And the processor is used for realizing the steps of the topic generation method when executing the computer program.
The invention also provides a computer readable storage medium, wherein the computer readable storage medium stores computer executable instructions which, when loaded and executed by a processor, implement the steps of the question generation method described above.
The method comprises the steps of obtaining text materials containing knowledge points, pre-arranging entities and corresponding entity labels in each field to form a word stock, utilizing a named entity recognition model and combining the word stock to conduct entity recognition on the text materials to obtain recognition results, enabling the recognition results to comprise target entities and corresponding target entity labels, utilizing a natural language processing model to conduct semantic analysis and part-of-speech analysis according to the recognition results in the text materials to generate a stem of a question, and obtaining interference items of the stem from the word stock according to the target entity labels, wherein the interference items and the target entities are used as options of the question. The invention obtains the recognition result by adopting a named entity recognition technology and a custom word stock, then analyzes the semantics and the parts of speech by a semantic understanding technology to generate the question stem, obtains the question options based on the word stock and the recognition result, and generates a large number of questions. Because the invention decomposes the question generation into a plurality of steps, each step can achieve the purpose of question accuracy by training and optimizing different models, thereby ensuring the accuracy of the question generation. And the whole architecture is based on natural language processing technology and other neural network algorithms, so that the degree of dependence on manpower in the process of question generation is reduced, and a large amount of questions can be generated by the method only by collecting proper knowledge materials.
In addition, the invention also provides a question generation device, equipment and a computer readable storage medium, which also have the beneficial effects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for generating a topic according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for determining a recognition result according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for generating a topic according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a method for optimizing a topic according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating another method for generating a topic according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a topic generating device according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a topic generating device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
First, several terms involved in the present application are explained:
Jieba entity recognition model: jieba is a Python library for Chinese word segmentation that also supports Chinese named entity recognition;
LAC entity recognition model: LAC is a joint lexical analysis model intended to complete Chinese word segmentation, part-of-speech tagging and proper-name recognition in a single pass;
FastText: a fast text classification model;
BERT model: a typical pre-trained language model that uses bidirectional encoding;
Transformer model: a deep learning model mainly used for processing sequence data, in particular in the field of natural language processing;
LSTM: Long Short-Term Memory network, a recurrent neural network;
GAN model: a generative adversarial network, a deep learning model.
The invention mainly meets the demand for question variety and quantity in quiz games by intelligently generating multiple-choice questions. The generated questions need to cover many aspects of everyday knowledge and must be of high quality and produced in large quantities to satisfy competitive answering by users in the game. An existing scheme that directly uses a neural network cannot guarantee question quality: the phrasing of the stem may be unnatural, the options may not match, and so on. Because the neural network model is a black box, it can only learn certain rules from a large amount of data to generate questions while the concrete realization remains unknown, so the output of the model is uncontrolled. Moreover, question correctness is hard to guarantee; a proper question must be logically complete, and multiple-choice questions generated purely by neural network techniques may suffer from incorrect answers, non-unique answers, unsuitable distractor options and similar problems.
In the process of actually generating the title, the invention mainly uses technologies such as named entity recognition, semantic analysis, text generation and the like, please refer to fig. 1, fig. 1 is a flowchart of a title generation method provided by an embodiment of the invention. The method may include:
S101, acquiring text materials containing knowledge points and word libraries formed by pre-arranging entities in each field and corresponding entity tags.
Specifically, step S101 is a data preparation step. Before the question generation process is carried out, entities in each field and their corresponding labels are collected and arranged to form word libraries; the finer the field division, the higher the quality of the generated questions. Meanwhile, short text materials containing knowledge points need to be collected as the input of the whole question generation process.
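The patent does not prescribe a storage format for the word library; the Python sketch below assumes one plausible layout, a tab-separated file with one "entity<TAB>label" pair per line, and builds the two lookup tables used in the later steps. The file name and example labels are illustrative assumptions, not part of the patent.

# Minimal sketch of loading a pre-arranged word library (lexicon).
# Assumed file format: one "entity<TAB>entity label" pair per line (illustrative).
from collections import defaultdict

def load_lexicon(path="lexicon.tsv"):
    entity2label = {}                    # entity -> entity label
    label2entities = defaultdict(list)   # entity label -> all entities of that field
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entity, label = line.split("\t")
            entity2label[entity] = label
            label2entities[label].append(entity)
    return entity2label, label2entities

entity2label, label2entities = load_lexicon()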
S102, carrying out entity recognition on the text material by using a named entity recognition model and combining a word stock to obtain a recognition result, wherein the recognition result comprises a target entity and a corresponding target entity label.
The present embodiment does not limit the named entity recognition model, nor does it limit the number of named entity models. Step S102 is an entity identification part. When the text material is input into the named entity recognition model, the word stock is loaded in advance, so that the named entity model is constrained, and an accurate recognition result is obtained.
Further, in order to improve accuracy of entity result recognition, the above method for performing entity recognition on text materials by using a named entity recognition model and combining a word stock to obtain a recognition result may specifically include the following steps:
step 21, performing entity recognition on the text material by utilizing jieba entity recognition models and combining word stock to obtain a first recognition result;
Step 22, performing entity recognition on the text material by using the LAC entity recognition model combined with the word stock to obtain a second recognition result;
And step 23, taking the overlapping part of the first identification result and the second identification result as the identification result.
In this embodiment, two entity recognition tools are adopted, specifically a jieba entity recognition model and a LAC entity recognition model. Double recognition is carried out in combination with the word stock arranged in advance, and the intersection of the two models' results is selected as the final recognition result, which reduces the probability of recognition errors by a single entity recognition model.
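A minimal sketch of this dual-recognition step is given below, assuming the pypi packages jieba and LAC (Baidu's lexical analyzer); the custom-dictionary file names and the lexicon-restricted intersection are simplifying assumptions rather than the patented implementation.

# Sketch of dual entity recognition with jieba and LAC, constrained by the word library.
import jieba
import jieba.posseg as pseg
from LAC import LAC

jieba.load_userdict("lexicon_jieba.txt")   # one "entity [freq] [tag]" per line (assumed file)
lac = LAC(mode="lac")
lac.load_customization("lexicon_lac.txt")  # constrains LAC with the word library (assumed file)

def recognize(text, entity2label):
    # First recognition result: jieba segmentation restricted to lexicon entities.
    first = {(w, entity2label[w]) for w, _flag in pseg.cut(text) if w in entity2label}
    # Second recognition result: LAC segmentation restricted to lexicon entities.
    words, _tags = lac.run(text)
    second = {(w, entity2label[w]) for w in words if w in entity2label}
    # Keep only the overlapping part of the two results.
    return first & second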
Further, to improve the accuracy of the recognition result, it may be corrected. After entity recognition is performed on the text material by using the named entity recognition model combined with the word stock to obtain the recognition result, the method may specifically further include the following steps:
Performing entity tag correction on the target entity tag through the trained FastText model to obtain a corrected entity tag, wherein the corrected entity tag and the target entity jointly form a corrected identification result;
correspondingly, in step S103, the corrected recognition result is subjected to semantic analysis and part-of-speech analysis by using a natural language processing model, so as to generate a stem of the question.
In this embodiment, it is considered that the same entity may carry different entity labels, i.e. belong to different entity categories, in different semantic environments; for example, the same name may fall under the "poetry" category in materials of the historical literature class but under the "game character" category in materials of the game class. Entity recognition alone may therefore mislabel the category of some entities, so an entity category corrector (predictor) is trained with a FastText model: the input of the FastText model is an encoding vector composed of the material text and the entity, its output is the probability of belonging to each category, and the category with the highest probability is selected as the final entity category, i.e. the entity label. Through this step, the target entity and the target entity label corresponding to each knowledge material are obtained. For the detailed flow refer to fig. 2, which is a flow example diagram of a method for determining a recognition result according to an embodiment of the present invention.
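A minimal sketch of such a corrector, assuming the open-source fasttext Python package and a supervised model trained on lines of the form "__label__<category> <material text> <entity>", is shown below; the training-file name and hyperparameters are illustrative assumptions.

# Sketch of the FastText entity-category corrector.
import fasttext

def train_corrector(train_file="label_corrector.train"):
    # Each training line (assumed format): __label__<category> <material text> <entity>
    return fasttext.train_supervised(input=train_file, epoch=25, wordNgrams=2)

def correct_label(model, material_text, entity):
    labels, _probs = model.predict(f"{material_text} {entity}")
    # The category with the highest probability becomes the corrected entity label.
    return labels[0].replace("__label__", "")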
S103, utilizing a natural language processing model, and carrying out semantic analysis and part-of-speech analysis according to the recognition result in the text material to generate a question stem.
This embodiment does not limit the choice of natural language processing model. Simple semantic analysis and part-of-speech analysis are performed on the entity recognition result in the text material to judge whether the recognized entity can serve as an answer from which a stem is generated.
Further, in order to improve the accuracy of the stem generation, the method for generating the stem by using the natural language processing model and performing semantic analysis and part-of-speech analysis according to the recognition result in the text material may specifically include the following steps:
Step 31, performing context understanding on the recognition result by using the BERT model; if the target entity is related to other entities, and the word order of the stem is reasonable according to the logical order of the parts of speech in the text material, the target entity in the text material is blanked out to obtain the gap-filling stem.
The semantic analysis in this embodiment mainly uses the BERT model to understand the context and to determine whether the currently recognized target entity has a certain relationship with other entities. If so, it is further determined whether the word order of the generated stem is reasonable according to the part-of-speech logical order in the text material. If both conditions are satisfied, the target entity in the text material is replaced by an underline, generating the gap-filling stem, as shown in the left half of fig. 3, which is a flowchart illustrating a method for generating a question according to an embodiment of the present invention.
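The patent does not spell out the exact association test for step 31; the sketch below, built with the Hugging Face transformers package and the public bert-base-chinese checkpoint, uses a crude cosine-similarity proxy between mean-pooled BERT embeddings of the target entity and the other recognized entities, with an illustrative threshold. It stands in for the idea of the step, not for the patented criterion.

# Rough sketch of the BERT-based suitability check and the blanking step.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def embed(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)       # mean-pooled token embeddings

def make_gap_fill_stem(text, target_entity, other_entities, threshold=0.6):
    target_vec = embed(target_entity)
    related = any(
        torch.cosine_similarity(target_vec, embed(e), dim=0) > threshold
        for e in other_entities
    )
    if not related:
        return None                            # entity judged unsuitable for questioning
    # Replace the target entity with an underline to obtain the gap-filling stem.
    return text.replace(target_entity, "____", 1)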
Step 32, performing natural language processing on the gap-filling stem by using a Transformer model to obtain the stem of the question.
In this embodiment, the gap-filling stem is optimized into natural language form by using a Transformer model: the trained Transformer model rewrites the statement into an interrogative form that conforms to natural language rules. For example, for the input "The author of 'Quiet Night Thought' is ___.", the output is "Which poet is the author of 'Quiet Night Thought'?".
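The patent only states that a trained Transformer model rewrites the gap-filling stem into question form; the sketch below assumes a privately fine-tuned sequence-to-sequence checkpoint loaded through the Hugging Face transformers package. The checkpoint name "my-org/stem2question" is a placeholder, not a real model.

# Sketch of rewriting the gap-filling stem into an interrogative stem.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("my-org/stem2question")         # placeholder name
seq2seq = AutoModelForSeq2SeqLM.from_pretrained("my-org/stem2question")

def optimize_stem(gap_fill_stem):
    inputs = tok(gap_fill_stem, return_tensors="pt")
    ids = seq2seq.generate(**inputs, max_new_tokens=64, num_beams=4)
    return tok.decode(ids[0], skip_special_tokens=True)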
Further, in order to ensure the accuracy of stem generation, performing natural language processing on the gap-filling stem by using the Transformer model to obtain the stem of the question may specifically include:
step 321, performing natural language processing on the gap-filling stem by using the Transformer model to obtain an optimized stem;
Step 322, analyzing the optimized stem by using a stem evaluation model based on LSTM to obtain a stem parameter value;
step 323, taking the optimized stem as the stem of the question if the parameter value of the stem is larger than a preset stem threshold;
step 324, otherwise, taking the gap-filling type stem as the stem of the question.
This embodiment further checks the optimization result of the Transformer model. Since the result generated by the Transformer model is not always accurate, an evaluation model is added after it to score the optimized result. This model is obtained by LSTM training; given the generated stem and the answer as input, it produces a value in the interval [0, 1], and the closer the value is to 1, the better the generation effect. Whether the optimized stem is retained is therefore decided by setting a threshold value. For the optimization of the stem refer to the left half of fig. 4, which is a flowchart illustrating a method for optimizing a question according to an embodiment of the present invention.
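A minimal PyTorch sketch of such an LSTM-based scorer is shown below; the embedding size, pooling and the encode tokenizer are assumptions about details the patent does not state, and the 0.8 threshold is purely illustrative.

# Sketch of an LSTM-based stem evaluation model scoring a (stem, answer) pair in [0, 1].
import torch
import torch.nn as nn

class StemScorer(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        x = self.embed(token_ids)
        _, (h, _) = self.lstm(x)                  # h: (num_layers, batch, hidden_dim)
        return torch.sigmoid(self.head(h[-1]))    # score in [0, 1]

def keep_optimized(scorer, encode, optimized_stem, answer, fallback_stem, threshold=0.8):
    # encode is an assumed tokenizer returning a (1, seq_len) LongTensor.
    score = scorer(encode(optimized_stem + answer)).item()
    # Keep the optimized stem only if its score exceeds the preset threshold.
    return optimized_stem if score > threshold else fallback_stem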
S104, obtaining interference items of the topics from the word stock according to the target entity tags, and taking the interference items and the target entity as options of the topics.
This embodiment does not limit the specific method for acquiring the stem's interference items (distractors). For example, a clustering algorithm may be used, or another algorithm for computing similarity may be selected. The method for generating the options is shown in the right half of fig. 3, which is a flowchart illustrating a method for generating a question according to an embodiment of the present invention.
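As a concrete illustration of drawing initial distractors from the word library, the sketch below simply takes other entities that share the target entity's label and ranks them with a generic similarity function; the similarity function is a stand-in for whatever clustering or similarity measure is actually chosen.

# Sketch of selecting initial distractors from the word library.
def initial_distractors(target_entity, target_label, label2entities, similarity, k=3):
    candidates = [e for e in label2entities[target_label] if e != target_entity]
    # Prefer the entities most similar to the correct answer (similarity is a stand-in).
    candidates.sort(key=lambda e: similarity(target_entity, e), reverse=True)
    return candidates[:k]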
Further, in order to improve the diversity and rationality of the interference items, after the interference items of the stem are obtained from the word stock according to the target entity tag, the method may further include the following steps:
carrying out diversification processing on the interference items of the stem by using a GAN model to obtain a plurality of options;
correspondingly, taking the interference item and the target entity as options of the question includes:
taking the plurality of options and the target entity as options of the question.
The optimization of the options in this embodiment is intended to ensure the diversity and rationality of the options. Considering diversity first, a generative adversarial network is trained on the option data in an existing question bank to generate options, and the first three generated options are automatically added to the initial option list; that is, the interference items of the stem are diversified with a GAN model to obtain a plurality of options. Refer to the right half of fig. 4, which is a flowchart illustrating a method for optimizing a question according to an embodiment of the present invention.
Further, in order to ensure the accuracy and reliability of the generated interference items, taking the plurality of options and the target entity as options of the question may specifically include the following steps:
step 41, analyzing the plurality of options by using an interference item evaluation model based on LSTM, sorting the options according to parameter values of the options, and obtaining a preset number of target options according to sorting results;
Step 42, taking the target options and the target entity as options of the question.
In this embodiment, an interference-option evaluation model is trained with an LSTM on the stems and distractor data in an existing question bank. The initial set of options is ranked by score, and the required options are then selected in order; the purpose of this step is to select, by means of the evaluation model, the interference items that best suit the current stem from the initial option set. For the optimization of options refer to the right half of fig. 4, which is a flowchart illustrating a method for optimizing a question according to an embodiment of the present invention.
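The ranking step only needs a trained pairwise scorer; the sketch below assumes a scorer and encode helper like those sketched above for the stem evaluation model, and simply sorts the candidates by score, keeping a preset number plus the correct answer.

# Sketch of ranking candidate distractors with the LSTM interference-item evaluation model.
def select_options(scorer, encode, stem, candidates, correct_entity, num_distractors=3):
    ranked = sorted(
        candidates,
        key=lambda opt: scorer(encode(stem + opt)).item(),
        reverse=True,
    )
    # Keep the best-scoring distractors and add the correct answer as an option.
    return ranked[:num_distractors] + [correct_entity]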
Further, a designated large language model is called to perform a final correctness check on the generated multiple-choice question, and whether further manual checking is needed is judged by comparing the answer given by the large language model with the correct answer of the current question.
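The patent does not name the large language model or its interface; in the sketch below, ask_llm is a hypothetical callable standing in for whatever designated model is used, and the prompt wording is an assumption.

# Sketch of the final machine check: compare the LLM's answer with the stored correct answer.
def needs_manual_review(ask_llm, stem, options, correct_answer):
    prompt = f"{stem}\n选项: {', '.join(options)}\n只回答正确选项的内容。"   # assumed prompt
    llm_answer = ask_llm(prompt).strip()
    # If the model disagrees with the stored correct answer, route the question to manual review.
    return llm_answer != correct_answer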
The question generation method provided above comprises: obtaining a text material containing knowledge points and a word stock formed by pre-arranging entities and corresponding entity tags in each field; performing entity recognition on the text material by using a named entity recognition model combined with the word stock to obtain a recognition result comprising a target entity and a corresponding target entity tag; using a natural language processing model to perform semantic analysis and part-of-speech analysis on the recognition result in the text material to generate the stem of a question; and obtaining interference items for the stem from the word stock according to the target entity tag, the interference items and the target entity serving as the options of the question. The invention decomposes question generation into several flows, and each flow can be trained and optimized with a different model for accuracy, so the generated questions are more accurate. The whole architecture is based on natural language processing techniques and other neural network algorithms, such as sequence network models, generative adversarial network models and diffusion generation models, which reduces the dependence on manpower in question production: a large number of questions can be produced by this method as long as suitable knowledge materials are collected. In addition, the optimization of the stem and of the options guarantees the diversity of the questions and avoids repeating the large number of questions already on the market, and finally the machine check performed by calling the large language model saves manpower and improves efficiency.
For better understanding of the present invention, please refer to fig. 5, fig. 5 is a flowchart illustrating another method for generating a question according to an embodiment of the present invention, which may include:
(1) Data preparation, namely, natural language text materials containing specific knowledge points and classification word libraries containing various fields.
(2) And identifying the specific entity contained in the material and identifying the specific category of the entity according to the text material description.
(3) Generating a gap-filling multiple-choice stem: an entity judged suitable for questioning through semantic analysis and part-of-speech analysis is replaced by an underline, and other entities of the same category as the current entity are obtained through a clustering algorithm as initial interference items.
(4) Optimizing the stem: the generated gap-filling stem is rewritten into question form by the trained natural language model, and a scoring model (i.e. the evaluation model) judges the grammar and fluency of the generated question to decide whether to keep the optimized result.
(5) Optimizing the question options: a batch of interference items outside the word stock is generated with the generative adversarial network, the diffusion model and the like, and finally the interference items best suited to the current question are screened out with the trained option scoring model.
(6) And finishing the generation of the questions, namely checking the accuracy of the generated questions through a large language model.
The following describes a topic generation device provided in an embodiment of the present invention, and the topic generation device described below and the topic generation method described above can be referred to correspondingly.
Referring to fig. 6 specifically, fig. 6 is a schematic structural diagram of a topic generating device according to an embodiment of the present invention, which may include:
The text material and word stock obtaining module 100 is used for obtaining a word stock formed by text materials containing knowledge points, and pre-arranging entities in each field and corresponding entity tags;
The entity recognition module 200 is configured to perform entity recognition on the text material by using a named entity recognition model and combining the word stock to obtain a recognition result, where the recognition result includes a target entity and a corresponding target entity tag;
The stem generation module 300 is configured to generate a stem of a question by using a natural language processing model and performing semantic analysis and part-of-speech analysis according to the recognition result in the text material;
and the option generating module 400 is configured to obtain, according to the target entity tag, an interference item of the stem from the word stock, and use the interference item and the target entity as options of the topic.
Based on the above embodiment, the entity identification module 200 may include:
the first recognition unit is used for carrying out entity recognition on the text material by utilizing jieba entity recognition models and combining the word stock to obtain a first recognition result;
The second recognition unit is used for performing entity recognition on the text material by using the LAC entity recognition model combined with the word stock to obtain a second recognition result;
And the identification result determining unit is used for taking the overlapping part of the first identification result and the second identification result as the identification result.
Based on the above embodiment, the topic generation device may further include:
The entity tag correction module is used for carrying out entity recognition on the text material by using a named entity recognition model and combining the word stock to obtain a recognition result, and carrying out entity tag correction on the target entity tag through a trained FastText model to obtain a corrected entity tag;
accordingly, the stem generation module 300 may include:
And the stem generation unit is used for carrying out semantic analysis and part-of-speech analysis on the corrected recognition result by utilizing a natural language processing model to generate the stem of the question.
Based on any of the foregoing embodiments, the stem generation module 300 may include:
The gap-filling stem generation unit is used for performing context understanding on the recognition result by using a BERT model; if the target entity is associated with other entities and the word order of the stem is reasonable according to the logical order of the parts of speech in the text material, the target entity in the text material is blanked out to obtain the gap-filling stem;
and the natural language processing unit is used for performing natural language processing on the gap-filling stem by using a Transformer model to obtain the stem of the question.
Based on the above embodiment, the natural language processing unit may include:
the stem optimization subunit is used for performing natural language processing on the gap-filling stem by using the Transformer model to obtain an optimized stem;
the stem evaluation subunit is used for analyzing the optimized stem by utilizing a stem evaluation model based on LSTM to obtain a stem parameter value;
a first result subunit, configured to take the optimized stem as a stem of the question if the stem parameter value is greater than a preset stem threshold;
and the second result subunit is used for taking the gap-filling type question stem as the question stem of the question if not.
Based on the above embodiment, the topic generation device may further include:
the diversification processing module is used for carrying out diversification processing on the interference items of the stem by utilizing a GAN model after obtaining the interference items of the stem from the word stock according to the target entity tag so as to obtain a plurality of options;
accordingly, the option generating module 400 may include:
and the option generating unit is used for taking the plurality of options and the target entity as options of the question.
Based on the above embodiment, the option generating unit may include:
The option evaluation subunit is used for analyzing the plurality of options by utilizing an interference item evaluation model based on the LSTM, sequencing the options according to the parameter values of the options, and obtaining a preset number of target options according to the sequencing result;
and the option determining subunit is used for taking the target options and the target entity as options of the question.
The order of the modules and units in the above-described topic generation device may be changed without affecting the logic.
In the question generation device provided by the embodiment of the present invention, the text material and word stock acquisition module 100 is used to acquire a text material containing knowledge points and a word stock formed by pre-arranging entities in each field and corresponding entity tags; the entity recognition module 200 performs entity recognition on the text material by using a named entity recognition model combined with the word stock to obtain a recognition result comprising a target entity and a corresponding target entity tag; the stem generation module 300 uses a natural language processing model to perform semantic analysis and part-of-speech analysis on the recognition result in the text material to generate the stem of a question; and the option generation module 400 acquires interference items for the stem from the word stock according to the target entity tag and takes the interference items and the target entity as options of the question. Because the device decomposes question generation into several steps, each step can be trained and optimized with a different model for accuracy, thereby ensuring the accuracy of question generation. The whole architecture is based on natural language processing techniques and other neural network algorithms, such as sequence network models, generative adversarial network models and diffusion generation models, which reduces the dependence on manpower in question production: a large number of questions can be produced by this device as long as suitable knowledge materials are collected. In addition, the optimization of the stem and of the options guarantees the diversity of the questions and avoids repeating the large number of questions already on the market, and finally the machine check performed by calling the large language model saves manpower and improves efficiency.
The following describes a topic generation device provided in an embodiment of the present invention, and the topic generation device described below and the topic generation method described above may be referred to correspondingly.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a topic generating device according to an embodiment of the present invention, which may include:
A memory 10 for storing a computer program;
A processor 20 for executing the computer program to implement the question generation method described above.
The memory 10, the processor 20, and the communication interface 31 all communicate with each other via a communication bus 32.
In the embodiment of the present invention, the memory 10 is used for storing one or more programs, the programs may include program codes, the program codes include computer operation instructions, and in the embodiment of the present invention, the memory 10 may store programs for implementing the following functions:
Acquiring text materials containing knowledge points and pre-arranging entities in each field and word libraries formed by corresponding entity tags;
Carrying out entity recognition on the text material by using a named entity recognition model and combining a word stock to obtain a recognition result, wherein the recognition result comprises a target entity and a corresponding target entity label;
Using a natural language processing model, and carrying out semantic analysis and part-of-speech analysis according to the recognition result in the text material to generate a question stem;
and obtaining interference items of the topics from the word stock according to the target entity tags, and taking the interference items and the target entity as options of the topics.
In one possible implementation, memory 10 may include a storage program area that may store an operating system, as well as at least one application program required for functionality, etc., and a storage data area that may store data created during use.
In addition, memory 10 may include read only memory and random access memory and provide instructions and data to the processor. A portion of the memory may also include NVRAM. The memory stores an operating system and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for performing various operations. The operating system may include various system programs for implementing various basic tasks as well as handling hardware-based tasks.
The processor 20 may be a central processing unit (CPU), an ASIC, a DSP, an FPGA or another programmable logic device; the processor 20 may also be a microprocessor or any conventional processor. The processor 20 may call a program stored in the memory 10.
The communication interface 31 may be an interface of a communication module for connecting with other devices or systems.
Of course, it should be noted that the structure shown in fig. 7 is not limited to the topic generating device in the embodiment of the present invention, and the topic generating device may include more or less components than those shown in fig. 7 or may be combined with some components in practical applications.
The following describes a computer-readable storage medium provided in an embodiment of the present invention, and the computer-readable storage medium described below and the topic generation method described above may be referred to correspondingly.
The present invention also provides a computer-readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the topic generation method described above.
The computer readable storage medium may include a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk or other media capable of storing program code.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Finally, it is further noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The foregoing describes in detail a question generation method, device, equipment and computer readable storage medium. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the description of the embodiments is only intended to aid understanding of the method and its core concept. For those skilled in the art, variations are possible in the specific embodiments and the scope of application in light of the idea of the present invention, and the content of this specification should therefore not be construed as limiting the invention.

Claims (8)

Translated from Chinese
1. A question generation method, comprising:
acquiring a text material containing knowledge points and a word library formed by pre-arranging entities in each field and corresponding entity labels;
performing entity recognition on the text material by using a named entity recognition model combined with the word library to obtain a recognition result, the recognition result comprising a target entity and a corresponding target entity label;
using a natural language processing model to perform semantic analysis and part-of-speech analysis on the recognition result in the text material to generate the stem of a question;
obtaining interference items for the stem from the word library according to the target entity label, and taking the interference items and the target entity as options of the question;
wherein performing entity recognition on the text material by using the named entity recognition model combined with the word library to obtain the recognition result comprises:
performing entity recognition on the text material by using a jieba entity recognition model combined with the word library to obtain a first recognition result;
performing entity recognition on the text material by using a LAC entity recognition model combined with the word library to obtain a second recognition result;
taking the overlapping part of the first recognition result and the second recognition result as the recognition result;
and wherein, after performing entity recognition on the text material by using the named entity recognition model combined with the word library to obtain the recognition result, the method further comprises:
performing entity label correction on the target entity label through a trained FastText model to obtain a corrected entity label, the corrected entity label and the target entity together constituting a corrected recognition result;
correspondingly, using the natural language processing model to perform semantic analysis and part-of-speech analysis on the recognition result in the text material to generate the stem of the question comprises:
using the natural language processing model to perform semantic analysis and part-of-speech analysis on the corrected recognition result to generate the stem of the question.

2. The question generation method according to claim 1, wherein using the natural language processing model to perform semantic analysis and part-of-speech analysis on the recognition result in the text material to generate the stem of the question comprises:
performing context understanding on the recognition result by using a BERT model; if the target entity is associated with other entities, and the word order of the stem is reasonable according to the logical order of the parts of speech in the text material, blanking out the target entity in the text material to obtain a gap-filling stem;
performing natural language processing on the gap-filling stem by using a Transformer model to obtain the stem of the question.

3. The question generation method according to claim 2, wherein performing natural language processing on the gap-filling stem by using the Transformer model to obtain the stem of the question comprises:
performing natural language processing on the gap-filling stem by using the Transformer model to obtain an optimized stem;
analyzing the optimized stem by using an LSTM-based stem evaluation model to obtain a stem parameter value;
if the stem parameter value is greater than a preset stem threshold, taking the optimized stem as the stem of the question;
otherwise, taking the gap-filling stem as the stem of the question.

4. The question generation method according to claim 1, wherein, after obtaining the interference items for the stem from the word library according to the target entity label, the method further comprises:
performing diversification processing on the interference items of the stem by using a GAN model to obtain a plurality of options;
correspondingly, taking the interference items and the target entity as options of the question comprises:
taking the plurality of options and the target entity as options of the question.

5. The question generation method according to claim 4, wherein taking the plurality of options and the target entity as options of the question comprises:
analyzing the plurality of options by using an LSTM-based interference item evaluation model, sorting the options according to their parameter values, and obtaining a preset number of target options according to the sorting result;
taking the target options and the target entity as options of the question.

6. A question generation device, comprising:
a text material and word library acquisition module, configured to acquire a text material containing knowledge points and a word library formed by pre-arranging entities in each field and corresponding entity labels;
an entity recognition module, configured to perform entity recognition on the text material by using a named entity recognition model combined with the word library to obtain a recognition result, the recognition result comprising a target entity and a corresponding target entity label;
a stem generation module, configured to use a natural language processing model to perform semantic analysis and part-of-speech analysis on the recognition result in the text material to generate the stem of a question;
an option generation module, configured to obtain interference items for the stem from the word library according to the target entity label and take the interference items and the target entity as options of the question;
wherein the entity recognition module comprises:
a first recognition unit, configured to perform entity recognition on the text material by using a jieba entity recognition model combined with the word library to obtain a first recognition result;
a second recognition unit, configured to perform entity recognition on the text material by using a LAC entity recognition model combined with the word library to obtain a second recognition result;
a recognition result determination unit, configured to take the overlapping part of the first recognition result and the second recognition result as the recognition result;
the device further comprising:
an entity label correction module, configured to perform, after entity recognition is performed on the text material by using the named entity recognition model combined with the word library to obtain the recognition result, entity label correction on the target entity label through a trained FastText model to obtain a corrected entity label, the corrected entity label and the target entity together constituting a corrected recognition result;
correspondingly, the stem generation module comprises:
a stem generation unit, configured to use the natural language processing model to perform semantic analysis and part-of-speech analysis on the corrected recognition result to generate the stem of the question.

7. Question generation equipment, comprising:
a memory for storing a computer program;
a processor, configured to implement the steps of the question generation method according to any one of claims 1 to 5 when executing the computer program.

8. A computer readable storage medium, wherein the computer readable storage medium stores computer executable instructions which, when loaded and executed by a processor, implement the steps of the question generation method according to any one of claims 1 to 5.
CN202510283454.2A — 2025-03-11 — Question generation method, device, equipment and computer readable storage medium — Active — granted as CN119783635B (en)

Priority Applications (1)

CN202510283454.2A (CN119783635B) — priority date 2025-03-11 — filing date 2025-03-11 — Question generation method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

CN202510283454.2A (CN119783635B) — priority date 2025-03-11 — filing date 2025-03-11 — Question generation method, device, equipment and computer readable storage medium

Publications (2)

CN119783635A — 2025-04-08
CN119783635B — 2025-06-06

Family

Family ID: 95242554

Family Applications (1)

CN202510283454.2A — Active — CN119783635B (en) — priority date 2025-03-11 — filing date 2025-03-11 — Question generation method, device, equipment and computer readable storage medium

Country Status (1)

CN — CN119783635B (en)

Citations (2)

* Cited by examiner, † Cited by third party
CN112560443A * — priority 2020-12-29, published 2021-03-26 — Ping An Bank Co., Ltd. — Choice question generation model training method, choice question generation method, device and medium
CN112686025A * — priority 2021-01-27, published 2021-04-20 — Zhejiang Gongshang University — Chinese choice question interference item generation method based on free text

Family Cites Families (5)

* Cited by examiner, † Cited by third party
CN110825879B * — priority 2019-09-18, published 2024-05-07 — Ping An Technology (Shenzhen) Co., Ltd. — Decide a case result determination method, device, equipment and computer readable storage medium
CN114201613B * — priority 2021-11-30, published 2022-10-21 — Beijing Baidu Netcom Science and Technology Co., Ltd. — Test question generation method, test question generation device, electronic device, and storage medium
CN115964997A * — priority 2022-12-07, published 2023-04-14 — 竹间智能科技(上海)有限公司 — Confusion option generation method and device for choice questions, electronic equipment and storage medium
US12210800B2 * — priority 2023-01-31, published 2025-01-28 — Adobe Inc. — Modifying digital images using combinations of direct interactions with the digital images and context-informing speech input
CN116227627A * — priority 2023-03-07, published 2023-06-06 — Beihang University — A method and system for sorting distractor items in multiple-choice questions based on multi-information source enhancement


Also Published As

CN119783635A (en) — 2025-04-08


Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant
