CN119783635B - Question generation method, device, equipment and computer readable storage medium

Info

Publication number
CN119783635B
Authority
CN
China
Prior art keywords
entity
question
recognition
stem
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202510283454.2A
Other languages
Chinese (zh)
Other versions
CN119783635A (en)
Inventor
郑肖南
陈文坚
郑巨隆
易亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Tongyuan Zhihui Technology Co ltd
Original Assignee
Zhejiang Tongyuan Zhihui Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Tongyuan Zhihui Technology Co ltd
Priority to CN202510283454.2A
Publication of CN119783635A
Application granted
Publication of CN119783635B
Status: Active
Anticipated expiration

Abstract

The invention discloses a question generation method, device, equipment and computer readable storage medium, applied to the field of intelligent question production. The method comprises: obtaining a text material containing knowledge points and a word library formed by pre-arranging entities in each field and their corresponding entity labels; performing entity recognition on the text material by using a named entity recognition model combined with the word library to obtain a recognition result, the recognition result comprising a target entity and a corresponding target entity label; using a natural language processing model to perform semantic analysis and part-of-speech analysis on the recognition result in the text material to generate the stem of a question; and obtaining interference items (distractors) for the stem from the word library according to the target entity label, the interference items and the target entity serving as the options of the question. The invention decomposes question generation into several processes, and each process can be trained and optimized with a different model for accuracy, so the generated questions are more accurate.

Description

Question generation method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of intelligent question generation, and in particular, to a method, apparatus, device, and computer readable storage medium for generating questions.
Background
The questions used in quiz (answering) games span a wide range of categories and are needed in large quantities, so intelligently generated questions must cover many aspects of everyday knowledge and be of high quality in order to satisfy competitive answering by users. At the present stage, deep neural networks are generally used to produce questions in batches. However, because a neural network model is a black box, it can only learn certain rules from a large amount of data to generate questions, and how the questions are actually formed remains unknown; the output of the model is therefore uncontrolled, and problems such as incorrect answers, non-unique answers and unsuitable distractor options may exist.
Therefore, how to ensure the correctness of the final topic generation is a technical problem that needs to be solved currently.
Disclosure of Invention
Accordingly, an object of the present invention is to provide a method, apparatus, device and computer readable storage medium for generating questions, which solve the problem of incorrect generated questions in the prior art.
In order to solve the technical problems, the invention provides a method for generating a question, which comprises the following steps:
acquiring text materials containing knowledge points and word libraries formed by pre-arranging entities in each field and corresponding entity tags;
Carrying out entity recognition on the text material by using a named entity recognition model and combining the word stock to obtain a recognition result, wherein the recognition result comprises a target entity and a corresponding target entity label;
Using a natural language processing model, and carrying out semantic analysis and part-of-speech analysis according to the recognition result in the text material to generate a question stem;
and acquiring an interference item of the question stem from the word stock according to the target entity tag, and taking the interference item and the target entity as options of the question.
Optionally, using a named entity recognition model and combining the word stock to perform entity recognition on the text material to obtain a recognition result, including:
performing entity recognition on the text material by using a jieba entity recognition model combined with the word stock to obtain a first recognition result;
performing entity recognition on the text material by using a LAC entity recognition model combined with the word stock to obtain a second recognition result;
and taking the overlapping part of the first identification result and the second identification result as the identification result.
Optionally, after using the named entity recognition model and combining the word stock to perform entity recognition on the text material, the method further includes:
performing entity tag correction on the target entity tag through the trained FastText model to obtain a corrected entity tag, wherein the corrected entity tag and the target entity jointly form a corrected identification result;
Correspondingly, using a natural language processing model, and carrying out semantic analysis and part-of-speech analysis according to the recognition result in the text material, generating a question stem, including:
And carrying out semantic analysis and part-of-speech analysis on the corrected recognition result by using a natural language processing model to generate the stem of the question.
Optionally, using a natural language processing model, and performing semantic analysis and part-of-speech analysis according to the recognition result in the text material, to generate a stem of the question, including:
performing context understanding on the recognition result by using a BERT model, and if the target entity is associated with other entities and the word order of the stem is reasonable according to the logical order of the parts of speech in the text material, blanking out the target entity in the text material to obtain a gap-filling stem;
and performing natural language processing on the gap-filling stem by using a Transformer model to obtain the stem of the question.
Optionally, performing natural language processing on the gap-filling stem by using a Transformer model to obtain the stem of the question includes:
performing natural language processing on the gap-filling stem by using the Transformer model to obtain an optimized stem;
analyzing the optimized stem by using a stem evaluation model based on LSTM to obtain a stem parameter value;
if the parameter value of the question stem is larger than a preset question stem threshold value, taking the optimized question stem as the question stem of the question;
Otherwise, the gap-filling type question stem is used as the question stem of the question.
Optionally, after obtaining the interference item of the stem from the word stock according to the target entity tag, the method further includes:
carrying out diversification processing on the interference items of the stems by using a GAN model to obtain a plurality of options;
correspondingly, taking the interference item and the target entity as options of the question includes:
taking the plurality of options and the target entity as options of the question.
Optionally, taking the plurality of options and the target entity as options of the question includes:
analyzing the plurality of options by using an LSTM-based interference item evaluation model, sorting the options according to their parameter values, and obtaining a preset number of target options according to the sorting results;
and taking the target options and the target entity as options of the question.
The invention also provides a question generation device, which comprises:
the text material and word stock obtaining module is used for obtaining text materials containing knowledge points and word stocks formed by pre-arranging entities in each field and corresponding entity labels;
the entity recognition module is used for carrying out entity recognition on the text material by using a named entity recognition model and combining the word stock to obtain a recognition result, wherein the recognition result comprises a target entity and a corresponding target entity label;
the stem generation module is used for utilizing a natural language processing model, carrying out semantic analysis and part-of-speech analysis according to the identification result in the text material and generating a stem of a question;
and the option generating module is used for acquiring the interference item of the question stem from the word stock according to the target entity tag, and taking the interference item and the target entity as options of the question.
The invention also provides a question generation device, comprising:
A memory for storing a computer program;
And the processor is used for realizing the steps of the topic generation method when executing the computer program.
The invention also provides a computer readable storage medium, wherein the computer readable storage medium stores computer executable instructions which, when loaded and executed by a processor, implement the steps of the question generation method described above.
The method comprises the steps of obtaining text materials containing knowledge points, pre-arranging entities and corresponding entity labels in each field to form a word stock, utilizing a named entity recognition model and combining the word stock to conduct entity recognition on the text materials to obtain recognition results, enabling the recognition results to comprise target entities and corresponding target entity labels, utilizing a natural language processing model to conduct semantic analysis and part-of-speech analysis according to the recognition results in the text materials to generate a stem of a question, and obtaining interference items of the stem from the word stock according to the target entity labels, wherein the interference items and the target entities are used as options of the question. The invention obtains the recognition result by adopting a named entity recognition technology and a custom word stock, then analyzes the semantics and the parts of speech by a semantic understanding technology to generate the question stem, obtains the question options based on the word stock and the recognition result, and generates a large number of questions. Because the invention decomposes the question generation into a plurality of steps, each step can achieve the purpose of question accuracy by training and optimizing different models, thereby ensuring the accuracy of the question generation. And the whole architecture is based on natural language processing technology and other neural network algorithms, so that the degree of dependence on manpower in the process of question generation is reduced, and a large amount of questions can be generated by the method only by collecting proper knowledge materials.
In addition, the invention also provides a question generation device, equipment and a computer readable storage medium, which also have the beneficial effects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for generating a topic according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for determining a recognition result according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for generating a topic according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a method for optimizing a topic according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating another method for generating a topic according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a topic generating device according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a topic generating device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
First, several terms involved in the present application are explained:
Jieba entity recognition model: jieba is a Python library for Chinese word segmentation that also supports Chinese named entity recognition;
LAC entity recognition model: LAC is a joint lexical analysis model intended to complete Chinese word segmentation, part-of-speech tagging and proper-name recognition in a single pass;
FastText: a fast text classification model;
BERT model: a typical pre-trained language model that uses bidirectional encoding;
Transformer model: a deep learning model mainly used for processing sequence data, in particular in the field of natural language processing;
LSTM: Long Short-Term Memory network, a recurrent neural network;
GAN model: a generative adversarial network, a deep learning model.
The invention mainly meets the demand for question variety and quantity in quiz games by intelligently generating multiple-choice questions. The generated questions need to cover many aspects of everyday knowledge and must be of high quality and produced in large quantities to satisfy competitive answering by users in the game. An existing scheme that directly uses a neural network cannot guarantee question quality: the phrasing of the stem may be unnatural, the options may not match, and so on. Because the neural network model is a black box, it can only learn certain rules from a large amount of data to generate questions while the concrete realization remains unknown, so the output of the model is uncontrolled. Moreover, question correctness is hard to guarantee; a proper question must be logically complete, and multiple-choice questions generated purely by neural network techniques may suffer from incorrect answers, non-unique answers, unsuitable distractor options and similar problems.
In the process of actually generating the title, the invention mainly uses technologies such as named entity recognition, semantic analysis, text generation and the like, please refer to fig. 1, fig. 1 is a flowchart of a title generation method provided by an embodiment of the invention. The method may include:
S101, acquiring text materials containing knowledge points and word libraries formed by pre-arranging entities in each field and corresponding entity tags.
Specifically, step S101 is a data preparation step. Before the question generation process is carried out, entities in each field and their corresponding labels are collected and arranged to form word libraries; the finer the field division, the higher the quality of the generated questions. Meanwhile, short text materials containing knowledge points need to be collected as the input of the whole question generation process.
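The patent does not prescribe a storage format for the word library; the Python sketch below assumes one plausible layout, a tab-separated file with one "entity<TAB>label" pair per line, and builds the two lookup tables used in the later steps. The file name and example labels are illustrative assumptions, not part of the patent.

# Minimal sketch of loading a pre-arranged word library (lexicon).
# Assumed file format: one "entity<TAB>entity label" pair per line (illustrative).
from collections import defaultdict

def load_lexicon(path="lexicon.tsv"):
    entity2label = {}                    # entity -> entity label
    label2entities = defaultdict(list)   # entity label -> all entities of that field
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entity, label = line.split("\t")
            entity2label[entity] = label
            label2entities[label].append(entity)
    return entity2label, label2entities

entity2label, label2entities = load_lexicon()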
S102, carrying out entity recognition on the text material by using a named entity recognition model and combining a word stock to obtain a recognition result, wherein the recognition result comprises a target entity and a corresponding target entity label.
The present embodiment does not limit the named entity recognition model, nor does it limit the number of named entity models. Step S102 is an entity identification part. When the text material is input into the named entity recognition model, the word stock is loaded in advance, so that the named entity model is constrained, and an accurate recognition result is obtained.
Further, in order to improve accuracy of entity result recognition, the above method for performing entity recognition on text materials by using a named entity recognition model and combining a word stock to obtain a recognition result may specifically include the following steps:
step 21, performing entity recognition on the text material by utilizing jieba entity recognition models and combining word stock to obtain a first recognition result;
Step 22, performing entity recognition on the text material by using the LAC entity recognition model combined with the word stock to obtain a second recognition result;
And step 23, taking the overlapping part of the first identification result and the second identification result as the identification result.
In this embodiment, two entity recognition tools are adopted, specifically a jieba entity recognition model and a LAC entity recognition model. Double recognition is carried out in combination with the word stock arranged in advance, and the intersection of the two models' results is selected as the final recognition result, which reduces the probability of recognition errors by a single entity recognition model.
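A minimal sketch of this dual-recognition step is given below, assuming the pypi packages jieba and LAC (Baidu's lexical analyzer); the custom-dictionary file names and the lexicon-restricted intersection are simplifying assumptions rather than the patented implementation.

# Sketch of dual entity recognition with jieba and LAC, constrained by the word library.
import jieba
import jieba.posseg as pseg
from LAC import LAC

jieba.load_userdict("lexicon_jieba.txt")   # one "entity [freq] [tag]" per line (assumed file)
lac = LAC(mode="lac")
lac.load_customization("lexicon_lac.txt")  # constrains LAC with the word library (assumed file)

def recognize(text, entity2label):
    # First recognition result: jieba segmentation restricted to lexicon entities.
    first = {(w, entity2label[w]) for w, _flag in pseg.cut(text) if w in entity2label}
    # Second recognition result: LAC segmentation restricted to lexicon entities.
    words, _tags = lac.run(text)
    second = {(w, entity2label[w]) for w in words if w in entity2label}
    # Keep only the overlapping part of the two results.
    return first & second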
Further, to improve the accuracy of the recognition result, it may be corrected. After entity recognition is performed on the text material by using the named entity recognition model combined with the word stock to obtain the recognition result, the method may specifically further include the following steps:
Performing entity tag correction on the target entity tag through the trained FastText model to obtain a corrected entity tag, wherein the corrected entity tag and the target entity jointly form a corrected identification result;
correspondingly, in step S103, the corrected recognition result is subjected to semantic analysis and part-of-speech analysis by using a natural language processing model, so as to generate a stem of the question.
In this embodiment, it is considered that the same entity may carry different entity labels, i.e. belong to different entity categories, in different semantic environments; for example, the same name may fall under the "poetry" category in materials of the historical literature class but under the "game character" category in materials of the game class. Entity recognition alone may therefore mislabel the category of some entities, so an entity category corrector (predictor) is trained with a FastText model: the input of the FastText model is an encoding vector composed of the material text and the entity, its output is the probability of belonging to each category, and the category with the highest probability is selected as the final entity category, i.e. the entity label. Through this step, the target entity and the target entity label corresponding to each knowledge material are obtained. For the detailed flow refer to fig. 2, which is a flow example diagram of a method for determining a recognition result according to an embodiment of the present invention.
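A minimal sketch of such a corrector, assuming the open-source fasttext Python package and a supervised model trained on lines of the form "__label__<category> <material text> <entity>", is shown below; the training-file name and hyperparameters are illustrative assumptions.

# Sketch of the FastText entity-category corrector.
import fasttext

def train_corrector(train_file="label_corrector.train"):
    # Each training line (assumed format): __label__<category> <material text> <entity>
    return fasttext.train_supervised(input=train_file, epoch=25, wordNgrams=2)

def correct_label(model, material_text, entity):
    labels, _probs = model.predict(f"{material_text} {entity}")
    # The category with the highest probability becomes the corrected entity label.
    return labels[0].replace("__label__", "")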
S103, utilizing a natural language processing model, and carrying out semantic analysis and part-of-speech analysis according to the recognition result in the text material to generate a question stem.
This embodiment does not limit the choice of natural language processing model. Simple semantic analysis and part-of-speech analysis are performed on the entity recognition result in the text material to judge whether the recognized entity can serve as an answer from which a stem is generated.
Further, in order to improve the accuracy of the stem generation, the method for generating the stem by using the natural language processing model and performing semantic analysis and part-of-speech analysis according to the recognition result in the text material may specifically include the following steps:
Step 31, performing context understanding on the recognition result by using the BERT model; if the target entity is related to other entities, and the word order of the stem is reasonable according to the logical order of the parts of speech in the text material, the target entity in the text material is blanked out to obtain the gap-filling stem.
The semantic analysis in this embodiment mainly uses the BERT model to understand the context and to determine whether the currently recognized target entity has a certain relationship with other entities. If so, it is further determined whether the word order of the generated stem is reasonable according to the part-of-speech logical order in the text material. If both conditions are satisfied, the target entity in the text material is replaced by an underline, generating the gap-filling stem, as shown in the left half of fig. 3, which is a flowchart illustrating a method for generating a question according to an embodiment of the present invention.
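The patent does not spell out the exact association test for step 31; the sketch below, built with the Hugging Face transformers package and the public bert-base-chinese checkpoint, uses a crude cosine-similarity proxy between mean-pooled BERT embeddings of the target entity and the other recognized entities, with an illustrative threshold. It stands in for the idea of the step, not for the patented criterion.

# Rough sketch of the BERT-based suitability check and the blanking step.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def embed(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)       # mean-pooled token embeddings

def make_gap_fill_stem(text, target_entity, other_entities, threshold=0.6):
    target_vec = embed(target_entity)
    related = any(
        torch.cosine_similarity(target_vec, embed(e), dim=0) > threshold
        for e in other_entities
    )
    if not related:
        return None                            # entity judged unsuitable for questioning
    # Replace the target entity with an underline to obtain the gap-filling stem.
    return text.replace(target_entity, "____", 1)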
Step 32, performing natural language processing on the gap-filling stem by using a Transformer model to obtain the stem of the question.
In this embodiment, the gap-filling stem is optimized into natural language form by using a Transformer model: the trained Transformer model rewrites the statement into an interrogative form that conforms to natural language rules. For example, for the input "The author of 'Quiet Night Thought' is ___.", the output is "Which poet is the author of 'Quiet Night Thought'?".
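The patent only states that a trained Transformer model rewrites the gap-filling stem into question form; the sketch below assumes a privately fine-tuned sequence-to-sequence checkpoint loaded through the Hugging Face transformers package. The checkpoint name "my-org/stem2question" is a placeholder, not a real model.

# Sketch of rewriting the gap-filling stem into an interrogative stem.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("my-org/stem2question")         # placeholder name
seq2seq = AutoModelForSeq2SeqLM.from_pretrained("my-org/stem2question")

def optimize_stem(gap_fill_stem):
    inputs = tok(gap_fill_stem, return_tensors="pt")
    ids = seq2seq.generate(**inputs, max_new_tokens=64, num_beams=4)
    return tok.decode(ids[0], skip_special_tokens=True)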
Further, in order to ensure the accuracy of stem generation, performing natural language processing on the gap-filling stem by using the Transformer model to obtain the stem of the question may specifically include:
step 321, performing natural language processing on the gap-filling stem by using the Transformer model to obtain an optimized stem;
Step 322, analyzing the optimized stem by using a stem evaluation model based on LSTM to obtain a stem parameter value;
step 323, taking the optimized stem as the stem of the question if the parameter value of the stem is larger than a preset stem threshold;
step 324, otherwise, taking the gap-filling type stem as the stem of the question.
This embodiment further checks the optimization result of the Transformer model. Since the result generated by the Transformer model is not always accurate, an evaluation model is added after it to score the optimized result. This model is obtained by LSTM training; given the generated stem and the answer as input, it produces a value in the interval [0, 1], and the closer the value is to 1, the better the generation effect. Whether the optimized stem is retained is therefore decided by setting a threshold value. For the optimization of the stem refer to the left half of fig. 4, which is a flowchart illustrating a method for optimizing a question according to an embodiment of the present invention.
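A minimal PyTorch sketch of such an LSTM-based scorer is shown below; the embedding size, pooling and the encode tokenizer are assumptions about details the patent does not state, and the 0.8 threshold is purely illustrative.

# Sketch of an LSTM-based stem evaluation model scoring a (stem, answer) pair in [0, 1].
import torch
import torch.nn as nn

class StemScorer(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        x = self.embed(token_ids)
        _, (h, _) = self.lstm(x)                  # h: (num_layers, batch, hidden_dim)
        return torch.sigmoid(self.head(h[-1]))    # score in [0, 1]

def keep_optimized(scorer, encode, optimized_stem, answer, fallback_stem, threshold=0.8):
    # encode is an assumed tokenizer returning a (1, seq_len) LongTensor.
    score = scorer(encode(optimized_stem + answer)).item()
    # Keep the optimized stem only if its score exceeds the preset threshold.
    return optimized_stem if score > threshold else fallback_stem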
S104, obtaining interference items of the topics from the word stock according to the target entity tags, and taking the interference items and the target entity as options of the topics.
This embodiment does not limit the specific method for acquiring the stem's interference items (distractors). For example, a clustering algorithm may be used, or another algorithm for computing similarity may be selected. The method for generating the options is shown in the right half of fig. 3, which is a flowchart illustrating a method for generating a question according to an embodiment of the present invention.
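As a concrete illustration of drawing initial distractors from the word library, the sketch below simply takes other entities that share the target entity's label and ranks them with a generic similarity function; the similarity function is a stand-in for whatever clustering or similarity measure is actually chosen.

# Sketch of selecting initial distractors from the word library.
def initial_distractors(target_entity, target_label, label2entities, similarity, k=3):
    candidates = [e for e in label2entities[target_label] if e != target_entity]
    # Prefer the entities most similar to the correct answer (similarity is a stand-in).
    candidates.sort(key=lambda e: similarity(target_entity, e), reverse=True)
    return candidates[:k]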
Further, in order to improve the diversity and rationality of the interference items, after the interference items of the stem are obtained from the word stock according to the target entity tag, the method may further include the following steps:
carrying out diversification processing on the interference items of the stem by using a GAN model to obtain a plurality of options;
correspondingly, taking the interference item and the target entity as options of the question includes:
taking the plurality of options and the target entity as options of the question.
The optimization of the options in this embodiment is intended to ensure the diversity and rationality of the options. Considering diversity first, a generative adversarial network is trained on the option data in an existing question bank to generate options, and the first three generated options are automatically added to the initial option list; that is, the interference items of the stem are diversified with a GAN model to obtain a plurality of options. Refer to the right half of fig. 4, which is a flowchart illustrating a method for optimizing a question according to an embodiment of the present invention.
Further, in order to ensure the accuracy and reliability of the generated interference items, taking the plurality of options and the target entity as options of the question may specifically include the following steps:
step 41, analyzing the plurality of options by using an interference item evaluation model based on LSTM, sorting the options according to parameter values of the options, and obtaining a preset number of target options according to sorting results;
Step 42, taking the target options and the target entity as options of the question.
In this embodiment, an interference-option evaluation model is trained with an LSTM on the stems and distractor data in an existing question bank. The initial set of options is ranked by score, and the required options are then selected in order; the purpose of this step is to select, by means of the evaluation model, the interference items that best suit the current stem from the initial option set. For the optimization of options refer to the right half of fig. 4, which is a flowchart illustrating a method for optimizing a question according to an embodiment of the present invention.
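The ranking step only needs a trained pairwise scorer; the sketch below assumes a scorer and encode helper like those sketched above for the stem evaluation model, and simply sorts the candidates by score, keeping a preset number plus the correct answer.

# Sketch of ranking candidate distractors with the LSTM interference-item evaluation model.
def select_options(scorer, encode, stem, candidates, correct_entity, num_distractors=3):
    ranked = sorted(
        candidates,
        key=lambda opt: scorer(encode(stem + opt)).item(),
        reverse=True,
    )
    # Keep the best-scoring distractors and add the correct answer as an option.
    return ranked[:num_distractors] + [correct_entity]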
Further, a designated large language model is called to perform a final correctness check on the generated multiple-choice question, and whether further manual checking is needed is judged by comparing the answer given by the large language model with the correct answer of the current question.
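The patent does not name the large language model or its interface; in the sketch below, ask_llm is a hypothetical callable standing in for whatever designated model is used, and the prompt wording is an assumption.

# Sketch of the final machine check: compare the LLM's answer with the stored correct answer.
def needs_manual_review(ask_llm, stem, options, correct_answer):
    prompt = f"{stem}\n选项: {', '.join(options)}\n只回答正确选项的内容。"   # assumed prompt
    llm_answer = ask_llm(prompt).strip()
    # If the model disagrees with the stored correct answer, route the question to manual review.
    return llm_answer != correct_answer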
The question generation method provided above comprises: obtaining a text material containing knowledge points and a word stock formed by pre-arranging entities and corresponding entity tags in each field; performing entity recognition on the text material by using a named entity recognition model combined with the word stock to obtain a recognition result comprising a target entity and a corresponding target entity tag; using a natural language processing model to perform semantic analysis and part-of-speech analysis on the recognition result in the text material to generate the stem of a question; and obtaining interference items for the stem from the word stock according to the target entity tag, the interference items and the target entity serving as the options of the question. The invention decomposes question generation into several flows, and each flow can be trained and optimized with a different model for accuracy, so the generated questions are more accurate. The whole architecture is based on natural language processing techniques and other neural network algorithms, such as sequence network models, generative adversarial network models and diffusion generation models, which reduces the dependence on manpower in question production: a large number of questions can be produced by this method as long as suitable knowledge materials are collected. In addition, the optimization of the stem and of the options guarantees the diversity of the questions and avoids repeating the large number of questions already on the market, and finally the machine check performed by calling the large language model saves manpower and improves efficiency.
For better understanding of the present invention, please refer to fig. 5, fig. 5 is a flowchart illustrating another method for generating a question according to an embodiment of the present invention, which may include:
(1) Data preparation, namely, natural language text materials containing specific knowledge points and classification word libraries containing various fields.
(2) And identifying the specific entity contained in the material and identifying the specific category of the entity according to the text material description.
(3) Generating a gap-filling multiple-choice stem: an entity judged suitable for questioning through semantic analysis and part-of-speech analysis is replaced by an underline, and other entities of the same category as the current entity are obtained through a clustering algorithm as initial interference items.
(4) Optimizing the stem: the generated gap-filling stem is rewritten into question form by the trained natural language model, and a scoring model (i.e. the evaluation model) judges the grammar and fluency of the generated question to decide whether to keep the optimized result.
(5) Optimizing the question options: a batch of interference items outside the word stock is generated with the generative adversarial network, the diffusion model and the like, and finally the interference items best suited to the current question are screened out with the trained option scoring model.
(6) And finishing the generation of the questions, namely checking the accuracy of the generated questions through a large language model.
The following describes a topic generation device provided in an embodiment of the present invention, and the topic generation device described below and the topic generation method described above can be referred to correspondingly.
Referring to fig. 6 specifically, fig. 6 is a schematic structural diagram of a topic generating device according to an embodiment of the present invention, which may include:
The text material and word stock obtaining module 100 is used for obtaining a word stock formed by text materials containing knowledge points, and pre-arranging entities in each field and corresponding entity tags;
The entity recognition module 200 is configured to perform entity recognition on the text material by using a named entity recognition model and combining the word stock to obtain a recognition result, where the recognition result includes a target entity and a corresponding target entity tag;
The stem generation module 300 is configured to generate a stem of a question by using a natural language processing model and performing semantic analysis and part-of-speech analysis according to the recognition result in the text material;
and the option generating module 400 is configured to obtain, according to the target entity tag, an interference item of the stem from the word stock, and use the interference item and the target entity as options of the topic.
Based on the above embodiment, the entity identification module 200 may include:
the first recognition unit is used for carrying out entity recognition on the text material by utilizing jieba entity recognition models and combining the word stock to obtain a first recognition result;
The second recognition unit is used for performing entity recognition on the text material by using the LAC entity recognition model combined with the word stock to obtain a second recognition result;
And the identification result determining unit is used for taking the overlapping part of the first identification result and the second identification result as the identification result.
Based on the above embodiment, the topic generation device may further include:
The entity tag correction module is used for carrying out entity recognition on the text material by using a named entity recognition model and combining the word stock to obtain a recognition result, and carrying out entity tag correction on the target entity tag through a trained FastText model to obtain a corrected entity tag;
accordingly, the stem generation module 300 may include:
And the stem generation unit is used for carrying out semantic analysis and part-of-speech analysis on the corrected recognition result by utilizing a natural language processing model to generate the stem of the question.
Based on any of the foregoing embodiments, the stem generation module 300 may include:
The gap-filling stem generation unit is used for performing context understanding on the recognition result by using a BERT model; if the target entity is associated with other entities and the word order of the stem is reasonable according to the logical order of the parts of speech in the text material, the target entity in the text material is blanked out to obtain the gap-filling stem;
and the natural language processing unit is used for performing natural language processing on the gap-filling stem by using a Transformer model to obtain the stem of the question.
Based on the above embodiment, the natural language processing unit may include:
the stem optimization subunit is used for performing natural language processing on the gap-filling stem by using the Transformer model to obtain an optimized stem;
the stem evaluation subunit is used for analyzing the optimized stem by utilizing a stem evaluation model based on LSTM to obtain a stem parameter value;
a first result subunit, configured to take the optimized stem as a stem of the question if the stem parameter value is greater than a preset stem threshold;
and the second result subunit is used for taking the gap-filling type question stem as the question stem of the question if not.
Based on the above embodiment, the topic generation device may further include:
the diversification processing module is used for carrying out diversification processing on the interference items of the stem by utilizing a GAN model after obtaining the interference items of the stem from the word stock according to the target entity tag so as to obtain a plurality of options;
accordingly, the option generating module 400 may include:
and the option generating unit is used for taking the plurality of options and the target entity as options of the question.
Based on the above embodiment, the option generating unit may include:
The option evaluation subunit is used for analyzing the plurality of options by utilizing an interference item evaluation model based on the LSTM, sequencing the options according to the parameter values of the options, and obtaining a preset number of target options according to the sequencing result;
and the option determining subunit is used for taking the target options and the target entity as options of the question.
The order of the modules and units in the above-described topic generation device may be changed without affecting the logic.
In the question generation device provided by the embodiment of the present invention, the text material and word stock acquisition module 100 is used to acquire a text material containing knowledge points and a word stock formed by pre-arranging entities in each field and corresponding entity tags; the entity recognition module 200 performs entity recognition on the text material by using a named entity recognition model combined with the word stock to obtain a recognition result comprising a target entity and a corresponding target entity tag; the stem generation module 300 uses a natural language processing model to perform semantic analysis and part-of-speech analysis on the recognition result in the text material to generate the stem of a question; and the option generation module 400 acquires interference items for the stem from the word stock according to the target entity tag and takes the interference items and the target entity as options of the question. Because the device decomposes question generation into several steps, each step can be trained and optimized with a different model for accuracy, thereby ensuring the accuracy of question generation. The whole architecture is based on natural language processing techniques and other neural network algorithms, such as sequence network models, generative adversarial network models and diffusion generation models, which reduces the dependence on manpower in question production: a large number of questions can be produced by this device as long as suitable knowledge materials are collected. In addition, the optimization of the stem and of the options guarantees the diversity of the questions and avoids repeating the large number of questions already on the market, and finally the machine check performed by calling the large language model saves manpower and improves efficiency.
The following describes a topic generation device provided in an embodiment of the present invention, and the topic generation device described below and the topic generation method described above may be referred to correspondingly.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a topic generating device according to an embodiment of the present invention, which may include:
A memory 10 for storing a computer program;
A processor 20 for executing the computer program to implement the question generation method described above.
The memory 10, the processor 20, and the communication interface 31 all communicate with each other via a communication bus 32.
In the embodiment of the present invention, the memory 10 is used for storing one or more programs, the programs may include program codes, the program codes include computer operation instructions, and in the embodiment of the present invention, the memory 10 may store programs for implementing the following functions:
Acquiring text materials containing knowledge points and pre-arranging entities in each field and word libraries formed by corresponding entity tags;
Carrying out entity recognition on the text material by using a named entity recognition model and combining a word stock to obtain a recognition result, wherein the recognition result comprises a target entity and a corresponding target entity label;
Using a natural language processing model, and carrying out semantic analysis and part-of-speech analysis according to the recognition result in the text material to generate a question stem;
and obtaining interference items of the topics from the word stock according to the target entity tags, and taking the interference items and the target entity as options of the topics.
In one possible implementation, memory 10 may include a storage program area that may store an operating system, as well as at least one application program required for functionality, etc., and a storage data area that may store data created during use.
In addition, memory 10 may include read only memory and random access memory and provide instructions and data to the processor. A portion of the memory may also include NVRAM. The memory stores an operating system and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for performing various operations. The operating system may include various system programs for implementing various basic tasks as well as handling hardware-based tasks.
The processor 20 may be a central processing unit (CPU), an ASIC, a DSP, an FPGA or another programmable logic device; the processor 20 may also be a microprocessor or any conventional processor. The processor 20 may call a program stored in the memory 10.
The communication interface 31 may be an interface of a communication module for connecting with other devices or systems.
Of course, it should be noted that the structure shown in fig. 7 is not limited to the topic generating device in the embodiment of the present invention, and the topic generating device may include more or less components than those shown in fig. 7 or may be combined with some components in practical applications.
The following describes a computer-readable storage medium provided in an embodiment of the present invention, and the computer-readable storage medium described below and the topic generation method described above may be referred to correspondingly.
The present invention also provides a computer-readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the topic generation method described above.
The computer readable storage medium may include a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk or other media capable of storing program code.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Finally, it is further noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The foregoing describes in detail a question generation method, device, equipment and computer readable storage medium. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the description of the embodiments is only intended to aid understanding of the method and its core concept. For those skilled in the art, variations are possible in the specific embodiments and the scope of application in light of the idea of the present invention, and the content of this specification should therefore not be construed as limiting the invention.

Claims (8)

Translated from Chinese
1. A question generation method, comprising:
acquiring a text material containing knowledge points and a word library formed by pre-arranging entities in each field and corresponding entity labels;
performing entity recognition on the text material by using a named entity recognition model combined with the word library to obtain a recognition result, the recognition result comprising a target entity and a corresponding target entity label;
using a natural language processing model to perform semantic analysis and part-of-speech analysis on the recognition result in the text material to generate the stem of a question;
obtaining interference items for the stem from the word library according to the target entity label, and taking the interference items and the target entity as options of the question;
wherein performing entity recognition on the text material by using the named entity recognition model combined with the word library to obtain the recognition result comprises:
performing entity recognition on the text material by using a jieba entity recognition model combined with the word library to obtain a first recognition result;
performing entity recognition on the text material by using a LAC entity recognition model combined with the word library to obtain a second recognition result;
taking the overlapping part of the first recognition result and the second recognition result as the recognition result;
and wherein, after performing entity recognition on the text material by using the named entity recognition model combined with the word library to obtain the recognition result, the method further comprises:
performing entity label correction on the target entity label through a trained FastText model to obtain a corrected entity label, the corrected entity label and the target entity together constituting a corrected recognition result;
correspondingly, using the natural language processing model to perform semantic analysis and part-of-speech analysis on the recognition result in the text material to generate the stem of the question comprises:
using the natural language processing model to perform semantic analysis and part-of-speech analysis on the corrected recognition result to generate the stem of the question.

2. The question generation method according to claim 1, wherein using the natural language processing model to perform semantic analysis and part-of-speech analysis on the recognition result in the text material to generate the stem of the question comprises:
performing context understanding on the recognition result by using a BERT model; if the target entity is associated with other entities, and the word order of the stem is reasonable according to the logical order of the parts of speech in the text material, blanking out the target entity in the text material to obtain a gap-filling stem;
performing natural language processing on the gap-filling stem by using a Transformer model to obtain the stem of the question.

3. The question generation method according to claim 2, wherein performing natural language processing on the gap-filling stem by using the Transformer model to obtain the stem of the question comprises:
performing natural language processing on the gap-filling stem by using the Transformer model to obtain an optimized stem;
analyzing the optimized stem by using an LSTM-based stem evaluation model to obtain a stem parameter value;
if the stem parameter value is greater than a preset stem threshold, taking the optimized stem as the stem of the question;
otherwise, taking the gap-filling stem as the stem of the question.

4. The question generation method according to claim 1, wherein, after obtaining the interference items for the stem from the word library according to the target entity label, the method further comprises:
performing diversification processing on the interference items of the stem by using a GAN model to obtain a plurality of options;
correspondingly, taking the interference items and the target entity as options of the question comprises:
taking the plurality of options and the target entity as options of the question.

5. The question generation method according to claim 4, wherein taking the plurality of options and the target entity as options of the question comprises:
analyzing the plurality of options by using an LSTM-based interference item evaluation model, sorting the options according to their parameter values, and obtaining a preset number of target options according to the sorting result;
taking the target options and the target entity as options of the question.

6. A question generation device, comprising:
a text material and word library acquisition module, configured to acquire a text material containing knowledge points and a word library formed by pre-arranging entities in each field and corresponding entity labels;
an entity recognition module, configured to perform entity recognition on the text material by using a named entity recognition model combined with the word library to obtain a recognition result, the recognition result comprising a target entity and a corresponding target entity label;
a stem generation module, configured to use a natural language processing model to perform semantic analysis and part-of-speech analysis on the recognition result in the text material to generate the stem of a question;
an option generation module, configured to obtain interference items for the stem from the word library according to the target entity label and take the interference items and the target entity as options of the question;
wherein the entity recognition module comprises:
a first recognition unit, configured to perform entity recognition on the text material by using a jieba entity recognition model combined with the word library to obtain a first recognition result;
a second recognition unit, configured to perform entity recognition on the text material by using a LAC entity recognition model combined with the word library to obtain a second recognition result;
a recognition result determination unit, configured to take the overlapping part of the first recognition result and the second recognition result as the recognition result;
the device further comprising:
an entity label correction module, configured to perform, after entity recognition is performed on the text material by using the named entity recognition model combined with the word library to obtain the recognition result, entity label correction on the target entity label through a trained FastText model to obtain a corrected entity label, the corrected entity label and the target entity together constituting a corrected recognition result;
correspondingly, the stem generation module comprises:
a stem generation unit, configured to use the natural language processing model to perform semantic analysis and part-of-speech analysis on the corrected recognition result to generate the stem of the question.

7. Question generation equipment, comprising:
a memory for storing a computer program;
a processor, configured to implement the steps of the question generation method according to any one of claims 1 to 5 when executing the computer program.

8. A computer readable storage medium, wherein the computer readable storage medium stores computer executable instructions which, when loaded and executed by a processor, implement the steps of the question generation method according to any one of claims 1 to 5.
CN202510283454.2A — 2025-03-11 — Question generation method, device, equipment and computer readable storage medium — Active — granted as CN119783635B (en)

Priority Applications (1)

CN202510283454.2A (CN119783635B) — priority date 2025-03-11 — filing date 2025-03-11 — Question generation method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

CN202510283454.2A (CN119783635B) — priority date 2025-03-11 — filing date 2025-03-11 — Question generation method, device, equipment and computer readable storage medium

Publications (2)

CN119783635A — 2025-04-08
CN119783635B — 2025-06-06

Family

Family ID: 95242554

Family Applications (1)

CN202510283454.2A — Active — CN119783635B (en) — priority date 2025-03-11 — filing date 2025-03-11 — Question generation method, device, equipment and computer readable storage medium

Country Status (1)

CN — CN119783635B (en)

Citations (2)

* Cited by examiner, † Cited by third party
CN112560443A * — priority 2020-12-29, published 2021-03-26 — Ping An Bank Co., Ltd. — Choice question generation model training method, choice question generation method, device and medium
CN112686025A * — priority 2021-01-27, published 2021-04-20 — Zhejiang Gongshang University — Chinese choice question interference item generation method based on free text

Family Cites Families (5)

* Cited by examiner, † Cited by third party
CN110825879B * — priority 2019-09-18, published 2024-05-07 — Ping An Technology (Shenzhen) Co., Ltd. — Decide a case result determination method, device, equipment and computer readable storage medium
CN114201613B * — priority 2021-11-30, published 2022-10-21 — Beijing Baidu Netcom Science and Technology Co., Ltd. — Test question generation method, test question generation device, electronic device, and storage medium
CN115964997A * — priority 2022-12-07, published 2023-04-14 — 竹间智能科技(上海)有限公司 — Confusion option generation method and device for choice questions, electronic equipment and storage medium
US12210800B2 * — priority 2023-01-31, published 2025-01-28 — Adobe Inc. — Modifying digital images using combinations of direct interactions with the digital images and context-informing speech input
CN116227627A * — priority 2023-03-07, published 2023-06-06 — Beihang University — A method and system for sorting distractor items in multiple-choice questions based on multi-information source enhancement


Also Published As

CN119783635A (en) — 2025-04-08


Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant
