CN116501859B - Paragraph retrieval method, device and medium based on refrigerator field - Google Patents

Paragraph retrieval method, device and medium based on refrigerator field

Info

Publication number
CN116501859B
CN116501859B (application CN202310752492.9A)
Authority
CN
China
Prior art keywords
model
data
training
paragraph
fluency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310752492.9A
Other languages
Chinese (zh)
Other versions
CN116501859A (en)
Inventor
刘昊
夏祎敏
马坚
魏志强
孔令磊
曾谁飞
李桂玺
张景瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Qingdao Haier Refrigerator Co Ltd
Original Assignee
Ocean University of China
Qingdao Haier Refrigerator Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China, Qingdao Haier Refrigerator Co Ltd
Priority to CN202310752492.9A
Publication of CN116501859A
Application granted
Publication of CN116501859B
Active (current legal status)
Anticipated expiration


Abstract

Translated from Chinese

The invention relates to a paragraph retrieval method, device, and medium for the refrigerator domain, and belongs to the field of paragraph retrieval in natural language processing. The method applies a cross-training scheme to transfer-learning model training for the dual tasks of question generation and paragraph retrieval, and introduces a fluency reward mechanism and a data filtering method based on a target-performance reward into the question generation model; the question generation and paragraph retrieval models are cross-trained jointly to reduce overfitting of the models to the source domain. The invention introduces question generation based on the fluency reward mechanism, which genuinely improves the quality of the generated questions in practice, and at the same time introduces the data filtering method based on the target-performance reward, which further improves the adaptability of the QG and IR models to the refrigerator domain.

Description

Translated from Chinese
Paragraph retrieval method, device and medium based on the refrigerator field

Technical Field

The invention belongs to the field of paragraph retrieval in natural language processing, and in particular relates to a paragraph retrieval method, apparatus, device, and medium for the refrigerator domain.

Background Art

In modern households the refrigerator is one of the indispensable home appliances. It provides the essential functions of food storage and preservation, keeping food fresh and safe. Using a refrigerator well, however, is not common knowledge and involves many complex aspects, such as correct storage temperatures, sorting of ingredients, and preservation techniques. Many people therefore face doubts and questions when using a refrigerator, and answering these questions with accurate, useful guidance is critical to food safety and a good user experience. In this setting, the paragraph retrieval task in the refrigerator domain becomes particularly important. Paragraph retrieval is an information retrieval technique that helps users quickly obtain the knowledge and guidance they need by locating and extracting relevant paragraphs from a large body of text. In the refrigerator domain, paragraph retrieval can provide accurate, authoritative guidance and answer a wide range of questions about refrigerator use. This technology is not only about convenience and practicality; more importantly, it safeguards food safety and health. The necessity of the paragraph retrieval task in the refrigerator domain highlights its important role in meeting user needs and solving practical problems: through precise retrieval and the provision of relevant text, users can obtain authoritative guidance on food storage, temperature control, preservation tips, and more. This not only helps users operate the refrigerator correctly but also avoids food waste and food safety problems. The paragraph retrieval task in the refrigerator domain is therefore essential for improving the user experience, ensuring food safety, and promoting the development of refrigerator technology.

Compared with paragraph retrieval in other domains, the difficulty of the paragraph retrieval (IR) task in the refrigerator domain is that the refrigerator domain covers many aspects, such as ingredient storage, temperature control, and food preservation techniques. Knowledge and information in this domain are broad and complex, and specific guidance and advice must be provided for different situations and needs. In addition, data in the refrigerator domain may be relatively scarce, especially publicly available large-scale datasets, which poses a data-scarcity challenge when building paragraph retrieval models. The paragraph retrieval model is therefore trained by generating rich and accurate domain questions through a question generation task. Question generation (QG) is the task of "automatically generating questions from various inputs such as raw text, databases, or semantic representations". Humans are able to ask rich, creative, and insightful questions; for example, question: How should longans be stored in the refrigerator to keep them fresh and sweet? Paragraph: Put the longans in a transparent airtight container and make sure it is well sealed to prevent oxygen and moisture from entering. Then place the container in the refrigerated compartment and keep the temperature at about 4 degrees Celsius. Refrigeration extends the shelf life of longans and prevents them from spoiling because of excessive temperature. It is recommended to eat the longans within 3 to 5 days to preserve their fresh taste. How to give a question generation model the ability to ask on-topic, grammatically and logically correct questions under diverse input conditions is a challenging task.

In the refrigerator domain, collecting labeled data for the question generation and paragraph retrieval tasks requires domain experts, so building supervised models is costly. By leveraging models trained in other domains where labeled data are readily available, transfer learning circumvents the shortage of labeled data in the refrigerator domain. Traditional QG and IR tasks adopt the self-training approach within transfer learning: given a pretrained model capable of performing the task of interest in the source domain and unlabeled data from the refrigerator domain, the pretrained model is used to predict labels for the refrigerator-domain data. The pretrained model is then further trained on this synthetic data to adapt to the new domain (a step also known as fine-tuning for domain adaptation). Although self-training improves performance in the refrigerator domain, the fine-tuned self-trained model may overfit owing to confirmation bias.

Summary of the Invention

In view of the above technical problems, the present invention provides a paragraph retrieval method, apparatus, device, and medium for the refrigerator domain. Considering the duality of the QG and IR tasks, the method proposes cross joint training of the two models to reduce overfitting to the source domain. The input data of QG and IR are paragraphs and questions, respectively (these inputs do not need to be aligned). Higher-quality synthetic data pairs are generated by QG and IR, and high-quality pairs are manually selected from them to train the fluency reward mechanism and the data value estimator model used for question generation evaluation, which solves the technical problem that the refrigerator domain lacks labeled datasets, so that the fluency reward model and the question generation data value estimator model could not otherwise be trained. The invention introduces question generation based on a fluency reward mechanism, which genuinely improves the quality of generated questions in practice. In addition, the invention introduces a data filtering method based on a target-performance reward, further improving the adaptability of the QG and IR models to the refrigerator domain.

The present invention is realized through the following technical solution:

A paragraph retrieval method for the refrigerator domain, the method comprising:

Step 1: applying a cross-training method to transfer-learning model training for the dual tasks. The cross-training method uses a question generation model (QG) and a paragraph retrieval model (IR); paragraphs and related questions covering refrigerator-domain knowledge are collected as training data for the question generation model (QG) and the paragraph retrieval model (IR). This manually collected training data does not need to be aligned. Synthetic data are finally obtained, and their quality is higher than that of the data obtained with the traditional self-training method;

Step 2: introducing a fluency reward mechanism into the question generation model. The question generation model uses a base model to generate questions; from the higher-quality synthetic data obtained in Step 1, high-quality data are manually selected as training data for the fluency reward mechanism, which is used to evaluate the fluency of the questions generated by the base model; the base model is then fine-tuned under a reinforcement learning framework by optimizing the fluency reward;

Step 3: a data filtering method based on a target-performance reward. From the higher-quality synthetic data obtained in Step 1, high-quality data are manually selected as training data for the data value estimator model; when the paragraph retrieval model answers a question, information including whether the answer is correct and the time taken is recorded and passed to the data value estimator model as feedback; this information fed back directly by the paragraph retrieval model is used to adjust the parameters of the data value estimator model so as to better estimate the value of the data, improve the performance of the question generation and paragraph retrieval models, and further improve the adaptability of the transferred models to the refrigerator domain.

Further, in Step 1: QG and IR are dual tasks. QG uses the BART model; IR uses the pretrained dense passage retriever DPR, which uses a BERT dual encoder to encode the question q and the paragraph p separately and is trained to maximize the dot product between the encodings E_Q(q) and E_P(p) while minimizing the similarity to other closely related but negative paragraphs. For QG, there are unlabeled questions in the refrigerator domain, and its dual task IR retrieves their corresponding input paragraphs from the refrigerator domain; the resulting question-paragraph pairs are added to QG's synthetic data for fine-tuning QG. For IR, there are unlabeled paragraphs in the refrigerator domain, and QG generates their input questions; the resulting question-paragraph pairs are added to IR's synthetic data for fine-tuning IR.
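By way of illustration only, the following sketch shows a DPR-style dual encoder that scores question-paragraph pairs by the dot product of their encodings and is trained with in-batch negatives; the checkpoint names and hyperparameters are assumptions and not part of the claimed method.

```python
import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
q_encoder = BertModel.from_pretrained("bert-base-chinese")   # E_Q
p_encoder = BertModel.from_pretrained("bert-base-chinese")   # E_P

def encode(encoder, texts):
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=256, return_tensors="pt")
    # Use the [CLS] hidden state as the dense representation.
    return encoder(**batch).last_hidden_state[:, 0]

def in_batch_contrastive_loss(questions, positive_passages):
    q_vecs = encode(q_encoder, questions)          # (B, H)
    p_vecs = encode(p_encoder, positive_passages)  # (B, H)
    scores = q_vecs @ p_vecs.T                     # dot products E_Q(q) · E_P(p)
    # Diagonal entries are the positive pairs; the other passages in the
    # batch serve as closely related but negative passages.
    labels = torch.arange(scores.size(0))
    return F.cross_entropy(scores, labels)
```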

Further, the base model is the BART model.

Further, in Step 2: a language model φ is first pretrained; then the fluency reward of a question \hat{q} generated in Step 1 is defined as the negative perplexity evaluated by φ, expressed as:

R_{flu}(\hat{q}) = -\mathrm{PPL}_{\phi}(\hat{q})    (2);

To optimize the fluency reward mechanism during training, the loss function L_{flu} is defined as follows:

L_{flu} = -\big(R_{flu}(\hat{q}) - \bar{R}\big)\sum_{t=1}^{T}\log p(\hat{q}_t \mid \hat{q}_{<t}, p)    (3);

where \hat{q}_t is the t-th token of the predicted question \hat{q}, sampled from the vocabulary distribution specified by the question generator's decoder; T is the total number of tokens in the predicted question \hat{q}; and \bar{R} is a predefined negative perplexity that serves as a baseline reward in the reinforcement learning algorithm to stabilize the training process.
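By way of illustration only, the sketch below computes the fluency reward of a generated question as its negative perplexity under a pretrained language model and subtracts a predefined baseline; the checkpoint name and the baseline value are assumptions for illustration, not the patented implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any well-trained language model for the target language can act as the
# scorer phi; this checkpoint name is an assumption.
tok = AutoTokenizer.from_pretrained("uer/gpt2-chinese-cluecorpussmall")
lm = AutoModelForCausalLM.from_pretrained("uer/gpt2-chinese-cluecorpussmall")
lm.eval()

@torch.no_grad()
def fluency_reward(question: str, baseline: float = -50.0) -> float:
    ids = tok(question, return_tensors="pt").input_ids
    nll = lm(ids, labels=ids).loss          # mean token negative log-likelihood
    perplexity = torch.exp(nll).item()
    reward = -perplexity                    # R_flu = -PPL(q), cf. Eq. (2)
    return reward - baseline                # advantage over the fixed baseline
```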

Further, the value of the data is estimated in Step 3 as follows: the question, the corresponding answer paragraph, and the context are concatenated as the input of the data value estimator model, and the sequence is encoded with BERT:

h = \mathrm{BERT}([\mathrm{CLS}]\; \hat{q}\; [\mathrm{SEP}]\; p\; [\mathrm{SEP}]\; c)    (4);

where p, \hat{q} and c denote the answer paragraph, the question, and the context respectively; h denotes the hidden representation of the input sequence derived from the "[CLS]" token; and "[SEP]" is a special token used as a separator.
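As an illustrative sketch only, the helper below concatenates the question, answer paragraph, and context with [SEP] separators, encodes the sequence with BERT, and returns the [CLS] hidden state as h in Eq. (4); the checkpoint name is an assumption.

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

@torch.no_grad()
def encode_example(question: str, passage: str, context: str) -> torch.Tensor:
    sep = tokenizer.sep_token
    text = question + sep + passage + sep + context
    batch = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    # h: hidden representation taken at the [CLS] position of the sequence.
    return encoder(**batch).last_hidden_state[:, 0].squeeze(0)
```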

The present invention is also an apparatus for paragraph retrieval in the refrigerator domain. The apparatus comprises a transfer-learning model training module based on the cross-training method, a question generation model module based on the fluency reward mechanism, and a data filtering module based on the target-performance reward; the transfer-learning model training module based on the cross-training method runs the method of Step 1, the question generation model module based on the fluency reward runs the method of Step 2, and the data filtering module based on the target-performance reward runs the method of Step 3.

The present invention further provides a computer-readable storage medium storing a computer program, the computer program being adapted to be loaded by a processor to execute the paragraph retrieval method for the refrigerator domain.

Beneficial effects of the present invention compared with the prior art: (1) Because the refrigerator domain currently lacks public question datasets for training paragraph retrieval models, the invention first introduces a question generation model to generate questions. Moreover, under current transfer learning, QG and IR are usually fine-tuned by self-training, which may be affected by confirmation bias and thus lead to overfitting. The invention proposes a transfer-learning-based method that cross-trains the question generation and paragraph retrieval models, jointly training QG and IR and alleviating the problems caused by the self-training method;

(2) Because the refrigerator domain lacks publicly labeled data pairs, annotation by domain experts is very costly. The cross-training method based on transfer learning proposed by the invention not only alleviates the overfitting problem but also allows model training to produce higher-quality data pairs merely by collecting unaligned paragraphs and questions, greatly reducing the cost of data labeling. In addition, from the synthetic data generated by the models, high-quality data are manually selected again for training the fluency reward model and the data value estimator, solving the problem that these two models lack refrigerator-domain training data and thereby making both models perform better in the refrigerator domain;

(3) The evaluation metrics of the traditional question generation task are limited to measuring the similarity and overlap between the generated text and the answer. The invention therefore adopts the grammatical and logical correctness of the question sentence, a criterion frequently cited in human evaluation of question quality, and trains a dedicated model for this metric. Addressing the lack of optimization for sentence grammatical correctness in traditional question generation models, the invention introduces a question generation model based on a fluency reward mechanism;

(4) Although the consistency evaluator in the prior art filters out some low-quality data, the setting of the evaluator's confidence threshold in the prior art cannot be adapted to the refrigerator domain. The invention proposes a data filtering method based on a target-performance reward, thereby further improving the adaptability of the QG and IR models to the refrigerator domain.

Brief Description of the Drawings

Fig. 1 is a flowchart of Step 1 of the present invention;

Fig. 2 is a flowchart of Step 2 of the invention;

Fig. 3 is a flowchart of Step 3 of the invention.

Detailed Description of the Embodiments

The present invention will be further described below with reference to specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Embodiment 1: a paragraph retrieval method for the refrigerator domain, the specific steps of which are as follows:

Step 1: applying a cross-training method to transfer-learning model training for the dual tasks. The cross-training method uses a question generation model (QG) and a paragraph retrieval model (IR); paragraphs and related questions covering refrigerator-domain knowledge are collected as training data for the question generation model (QG) and the paragraph retrieval model (IR). This manually collected training data does not need to be aligned, and high-quality synthetic data are finally obtained;

Step 2: introducing a fluency reward mechanism into the question generation model. The question generation model uses a base model to generate questions; from the higher-quality synthetic data obtained in Step 1, high-quality data are manually selected as training data for the fluency reward mechanism, which is used to evaluate the fluency of the questions generated by the base model; the base model is then fine-tuned under a reinforcement learning framework by optimizing the fluency reward;

Step 3: a data filtering method based on a target-performance reward. From the higher-quality synthetic data obtained in Step 1, high-quality data are manually selected as training data for the data value estimator model. When the paragraph retrieval model answers a question, information including whether the answer is correct and the time taken is recorded and passed to the data value estimator model as feedback; this information fed back directly by the paragraph retrieval model is used to adjust the parameters of the data value estimator model so as to better estimate the value of the data, improve the performance of the question generation and paragraph retrieval models, and further improve the adaptability of the transferred models to the refrigerator domain.

The method specifically comprises the following steps:

Step 1: as shown in Fig. 1, QG uses the BART model. IR uses the pretrained dense passage retriever DPR, which uses a BERT dual encoder to encode the question q and the paragraph p separately and is trained to maximize the dot product between the encodings E_Q(q) and E_P(p) while minimizing the similarity to other closely related but negative paragraphs. For QG, there are unlabeled questions in the target domain, and its dual task IR can retrieve their corresponding input paragraphs from the target domain; the resulting question-paragraph pairs are added to QG's synthetic data for fine-tuning QG. For IR, there are unlabeled paragraphs in the target domain, and QG can generate their input questions; the resulting question-paragraph pairs are added to IR's synthetic data for fine-tuning IR. It is worth noting that in self-training the QG model learns from the QG model's own synthetic data, and likewise for the IR model. In the self-training data, the input data are sampled from the target-domain distribution, whereas the output data are predictions carrying noisy labels; the two models therefore learn output data with noisy labels, which causes the models to overfit in the refrigerator domain. In cross-training, by contrast, the training data of the QG model are the synthetic data of the IR model, and the training data of the IR model are the synthetic data of the QG model. For the QG model's training data, the input data are the IR model's noisily labeled output data, while the output data are the IR model's input data sampled from the target-domain distribution. Thus, in cross-training the QG model learns output data with correct labels, and the same holds for the IR model; this is exactly the opposite of the training-data labels under the self-training method. Cross-training therefore reduces the models' learning of noisy output labels and mitigates the overfitting of the QG and IR models.
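By way of illustration only, the sketch below shows one cross-training round; qg_model and ir_model are stand-in interfaces (assumed to expose generate, retrieve, and fine_tune operations) rather than the exact API of the claimed method.

```python
def cross_training_round(qg_model, ir_model,
                         unlabeled_passages, unlabeled_questions, corpus):
    # IR builds synthetic (question, passage) pairs used to fine-tune QG:
    # the question is real target-domain data, the passage is IR's prediction.
    qg_synthetic = [(q, ir_model.retrieve(q, corpus)) for q in unlabeled_questions]

    # QG builds synthetic (question, passage) pairs used to fine-tune IR:
    # the passage is real target-domain data, the question is QG's prediction.
    ir_synthetic = [(qg_model.generate(p), p) for p in unlabeled_passages]

    # Each model is fine-tuned on the other model's synthetic data, so the
    # label side of its training data carries correct (non-noisy) labels.
    qg_model.fine_tune(qg_synthetic)
    ir_model.fine_tune(ir_synthetic)
    return qg_synthetic, ir_synthetic
```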

Step 2: as shown in Fig. 2, the question generator uses the BART model. Given a paragraph p as input, the goal is to generate a relevant question q that can be answered by the paragraph. This can be expressed as maximizing the conditional probability P(q | p):

P(q \mid p) = \prod_{t=1}^{T} P(q_t \mid q_{<t}, p)    (1),

where q_t is the t-th token of the generated question q, and q_{<t} denotes the previously decoded tokens, i.e. q_1, ..., q_{t-1}. The overall framework of the question generation model based on the fluency reward mechanism is shown in Fig. 2.
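As an illustrative sketch only, the snippet below decodes a question from a passage with a BART encoder-decoder, i.e. it searches for a high-probability q under P(q | p) in Eq. (1); the checkpoint name and decoding settings are assumptions (Chinese BART checkpoints commonly ship with a BERT-style tokenizer).

```python
from transformers import BartForConditionalGeneration, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("fnlp/bart-base-chinese")
qg_model = BartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese")

def generate_question(passage: str, max_length: int = 64) -> str:
    inputs = tokenizer(passage, truncation=True, max_length=512,
                       return_tensors="pt")
    # Beam search approximates argmax_q P(q | p) over the decoder vocabulary.
    output_ids = qg_model.generate(**inputs, max_length=max_length, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```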

This embodiment designs a fluency reward mechanism intended to evaluate the fluency of the questions generated by the base model; the BART model is then fine-tuned under a reinforcement learning framework by optimizing the fluency reward. The design of the fluency reward mechanism is described in detail next.

Under a well-trained language model (LM), the perplexity of a sentence is usually regarded as a good indicator of its fluency. This embodiment therefore introduces a language-model-based reward to improve the fluency of the generated questions. A language model φ is first pretrained; the fluency reward of a generated question \hat{q} is then defined as the negative perplexity evaluated by φ, expressed as:

R_{flu}(\hat{q}) = -\mathrm{PPL}_{\phi}(\hat{q})    (2),

To optimize the fluency reward during training, the loss function L_{flu} is defined as follows:

L_{flu} = -\big(R_{flu}(\hat{q}) - \bar{R}\big)\sum_{t=1}^{T}\log p(\hat{q}_t \mid \hat{q}_{<t}, p)    (3),

where \hat{q}_t is the t-th token of the predicted question \hat{q}, sampled from the vocabulary distribution specified by the question generator's decoder, and \bar{R} is a predefined negative perplexity that serves as a baseline reward in the reinforcement learning algorithm to stabilize the training process. The language model φ used in this embodiment is a BART model.

The question generation model is first pretrained by minimizing the cross-entropy loss and the copy-error loss, which are combined into a maximum-likelihood loss L_{ml};

L_{ml} = L_{ce} + L_{cp},

The base QG model trained with L_{ml} is then fine-tuned with a combined loss function that linearly combines L_{ml} and the reinforcement-learning-based loss L_{flu}, so as to maximize the previously defined fluency reward for QG. Specifically:

L = \alpha L_{ml} + (1 - \alpha) L_{flu}, where L represents the combined loss function and \alpha is the linear combination weight.
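By way of illustration only, the following sketch expresses a REINFORCE-style fluency loss and the linear combination of the maximum-likelihood and reinforcement-learning losses; the mixing weight alpha and the function names are assumptions, since the text only states that the two losses are combined linearly.

```python
import torch

def rl_fluency_loss(token_log_probs: torch.Tensor, reward: float,
                    baseline: float) -> torch.Tensor:
    # Sketch of Eq. (3): scale the summed log-probabilities of the sampled
    # question by the (reward - baseline) advantage, negated for minimization.
    return -(reward - baseline) * token_log_probs.sum()

def combined_loss(loss_ml: torch.Tensor, loss_rl: torch.Tensor,
                  alpha: float = 0.7) -> torch.Tensor:
    # Linear combination of the maximum-likelihood and RL losses.
    return alpha * loss_ml + (1.0 - alpha) * loss_rl
```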

Step 3: as shown in Fig. 3, a data value estimator model, denoted QVE, is designed on the basis of Step 2. It receives a synthetic question-answer example (c_u, p_u, \hat{q}_u), where c_u is the refrigerator-domain context, p_u is the refrigerator-domain paragraph, and \hat{q}_u is the generated question, and outputs a score representing its "value", i.e. v = QVE(c_u, p_u, \hat{q}_u). This "value" represents the potential of the example, when used as a training sample, to improve paragraph retrieval performance in the refrigerator domain; with this score, the synthetic examples most useful for refrigerator-domain paragraph retrieval training can be selected. This embodiment uses the BERT model as the basis of the question value estimator. Specifically, the question, the corresponding answer paragraph, and the context are concatenated as the input of the question value estimator, and the sequence is encoded with BERT:

h = \mathrm{BERT}([\mathrm{CLS}]\; \hat{q}\; [\mathrm{SEP}]\; p\; [\mathrm{SEP}]\; c)    (4),

where p, \hat{q} and c denote the answer paragraph, the question, and the context respectively; h denotes the hidden representation of the input sequence derived from the "[CLS]" token; and "[SEP]" is a special token used as a separator.

The probabilities of the answer span (start index and end index) given by the pretrained paragraph retrieval model, P_{start} and P_{end}, are added to the hidden representation h as additional features, which accelerates the training convergence of the data value estimator model and improves performance. These two features are therefore appended to a linear transformation of the original hidden representation, and a linear classifier is then built to output the value of the question:

h_1 = W_1 h + b_1    (5),

h_2 = \mathrm{ReLU}(W_2 [h_1; P_{start}; P_{end}] + b_2),\quad h_3 = \mathrm{ReLU}(W_3 h_2 + b_3)    (6),

v = \sigma(W_4 h_3 + b_4)    (7),

where H denotes the dimensionality of the hidden representation of the input sequence derived from the "<CLS>" token using the BERT model; H1, H2, H3 and H4 denote the dimensions of the intermediate hidden representations in the QVE (question value estimator); W1, W2, W3 and W4 denote the trainable weight parameters of the linear layers; and b1, b2, b3 and b4 denote the trainable bias parameters of the linear layers.
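As an illustrative sketch only, the module below realises a value head of this kind: the [CLS] representation h is combined with the start/end answer probabilities from the pretrained retrieval model, passed through a small MLP (weights W1..W4, biases b1..b4), and mapped to a scalar value; the layer sizes and the exact arrangement of the layers are assumptions.

```python
import torch
import torch.nn as nn

class QuestionValueHead(nn.Module):
    def __init__(self, hidden_size: int = 768, dims=(256, 128, 64)):
        super().__init__()
        self.proj = nn.Linear(hidden_size, dims[0])          # W1, b1
        self.mlp = nn.Sequential(
            nn.Linear(dims[0] + 2, dims[1]), nn.ReLU(),      # W2, b2
            nn.Linear(dims[1], dims[2]), nn.ReLU(),          # W3, b3
            nn.Linear(dims[2], 1),                           # W4, b4
        )

    def forward(self, h, p_start, p_end):
        # h: (B, hidden_size); p_start, p_end: (B,) answer-span probabilities.
        h1 = self.proj(h)
        feats = torch.cat([h1, p_start.unsqueeze(-1), p_end.unsqueeze(-1)], dim=-1)
        return torch.sigmoid(self.mlp(feats)).squeeze(-1)    # value in (0, 1)
```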

The reward of the data value estimator model is based on the performance improvement that the selected samples bring when training the refrigerator-domain IR model. To this end, the IR model is fine-tuned on the selected batch of samples according to the cross-entropy loss.

The reward R is defined as the performance gain of the IR model on the selected refrigerator-domain batch of samples P_t, between the model before fine-tuning (\theta_{IR}) and after fine-tuning (\theta'_{IR}):

R = \mathrm{Perf}(\theta'_{IR}) - \mathrm{Perf}(\theta_{IR})    (8).

Since the question selection process is discrete and non-differentiable, reinforcement learning is used to update the data value estimator model. Mathematically, the goal is to minimize the following expression:

L(\gamma) = -\mathbb{E}_{S \sim \pi_{\gamma}(D)}[R]    (9),

where L denotes the loss function, the goal being to minimize this loss; E denotes the expectation taken over the expression that follows; S denotes the selected questions, i.e. the questions chosen from the policy \pi_{\gamma}; \pi_{\gamma} denotes the question-selection policy; \gamma is its parameter; and D is the input data.
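By way of illustration only, the sketch below performs one REINFORCE update of the value estimator: the log-probability of the discrete selection is weighted by the IR performance gain used as the reward; the interfaces and the epsilon constant are assumptions.

```python
import torch

def qve_policy_gradient_step(values: torch.Tensor,
                             selection_mask: torch.Tensor,
                             reward: float,
                             optimizer: torch.optim.Optimizer) -> float:
    """values: (N,) scores from the estimator (must carry gradients);
    selection_mask: (N,) 0/1 tensor of selected questions;
    reward: scalar performance gain of IR after fine-tuning on the selection."""
    eps = 1e-8
    log_probs = torch.where(selection_mask.bool(),
                            torch.log(values + eps),
                            torch.log(1.0 - values + eps))
    loss = -reward * log_probs.sum()   # minimise -E[R * log pi(S | D)], cf. Eq. (9)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```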

After the question value estimator model has been trained, it can be used to compute the question values of all synthetic questions in the refrigerator domain. This embodiment then selects the top K% of synthetic data pairs as the training corpus for training the refrigerator-domain IR model. The exact value of the top K% depends on the specific settings and requirements and is adjusted according to the actual situation and the needs of the refrigerator domain. In general, selecting the top K% of synthetic question-paragraph pairs as the training corpus screens out relatively high-quality samples and avoids introducing low-quality synthetic questions into the training of the refrigerator-domain IR model. The specific choice of K% depends on balancing the number and quality of training samples, and can generally be adjusted and selected according to experiments and performance on the validation set. In the refrigerator domain, given the particularities of the domain and in order to safeguard the user experience, the K% value chosen by the present invention is 30%.
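As an illustrative sketch only, the helper below keeps the top-K% of synthetic pairs ranked by their estimated value (with K = 30 as stated above); the data layout is an assumption.

```python
def select_top_k_percent(scored_pairs, k_percent: float = 30.0):
    """scored_pairs: iterable of (value_score, question, passage) tuples."""
    ranked = sorted(scored_pairs, key=lambda item: item[0], reverse=True)
    keep = max(1, int(len(ranked) * k_percent / 100.0))
    return ranked[:keep]
```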

By taking the optimization objective of downstream IR model performance into account, the question value evaluation model of the present invention can select more useful questions and thereby improve the refrigerator-domain IR model.

The improvement brought by the question value evaluation model is usually larger when more annotated (question-paragraph) data pairs are available. This is because training the question value evaluation model (with reinforcement learning) relies on IR feedback computed over the available annotated pairs: with more annotated pairs the feedback can be more accurate, leading to a better question value evaluation model that selects more useful synthetic questions. On the basis of Step 1, the present invention completes the generation of a large number of higher-quality synthetic data pairs; from this synthetic data, high-quality usable annotated pairs are obtained by manual selection, which makes the IR model's feedback more accurate and the question selection more valuable.

After the three steps, the present invention has constructed a question generation model based on the fluency reward mechanism, a filter optimized with the target-performance reward, and a paragraph retrieval model. First, synthetic data pairs are generated with the question generation model based on the fluency reward mechanism. Second, the data with high value scores are filtered out with the filter optimized by the target-performance reward. This part of the data then serves as the training data of the paragraph retrieval model. Finally, the synthetic data of the paragraph retrieval model serve as the training data of the question generation model based on the fluency reward mechanism, and training iterates in a loop, ultimately achieving a joint improvement of the models' performance.
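By way of illustration only, the loop below sketches this iterative pipeline end to end; all model objects are stand-in interfaces and the round count is an assumption.

```python
def iterative_training(qg, ir, qve, passages, questions, corpus,
                       rounds: int = 3, k_percent: float = 30.0):
    for _ in range(rounds):
        # 1. QG generates synthetic (question, passage) pairs for unlabeled passages.
        synthetic = [(qg.generate(p), p) for p in passages]
        # 2. The value estimator scores each pair; keep the top K%.
        scored = sorted(((qve.score(q, p), q, p) for q, p in synthetic),
                        key=lambda item: item[0], reverse=True)
        keep = max(1, int(len(scored) * k_percent / 100.0))
        # 3. The filtered pairs train the paragraph retrieval model.
        ir.fine_tune([(q, p) for _, q, p in scored[:keep]])
        # 4. IR's synthetic pairs (retrieved passages for real questions) train QG.
        qg.fine_tune([(q, ir.retrieve(q, corpus)) for q in questions])
```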

Claims (4)

QG and IR have a dual nature. QG uses the BART model; IR uses a pretrained dense passage retriever DPR, which uses a BERT dual encoder to encode the question q and the paragraph p respectively, the BERT dual encoder being trained to maximize the similarity between the encodings E_P(p) and E_Q(q) while minimizing the similarity of other closely related but negative paragraphs; for QG, there are unlabeled questions in the refrigerator field, and its dual task IR retrieves the input paragraphs corresponding to the unlabeled questions from the refrigerator field, the generated question-paragraph pairs being added to the synthetic data of QG for fine-tuning the QG; for IR, there are unlabeled paragraphs in the refrigerator field, and the QG generates their input questions, the generated question-paragraph pairs being added to the synthetic data of IR for fine-tuning the IR;
3. A device, characterized by comprising a model training module based on transfer learning with a cross-training method, a question generation model module based on a fluency reward mechanism, and a data filtering module based on a target-performance reward; the model training module based on transfer learning with the cross-training method runs the method of Step 1 in the paragraph retrieval method based on the refrigerator field according to any one of claims 1-2; the question generation model module based on the fluency reward runs the method of Step 2 in the paragraph retrieval method based on the refrigerator field according to any one of claims 1-2; and the data filtering module based on the target-performance reward runs the method of Step 3 in the paragraph retrieval method based on the refrigerator field according to any one of claims 1-2.
CN202310752492.9A | 2023-06-26 | 2023-06-26 | Paragraph retrieval method, device and medium based on refrigerator field | Active | CN116501859B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310752492.9A | 2023-06-26 | 2023-06-26 | Paragraph retrieval method, device and medium based on refrigerator field (CN116501859B, en)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310752492.9A | 2023-06-26 | 2023-06-26 | Paragraph retrieval method, device and medium based on refrigerator field (CN116501859B, en)

Publications (2)

Publication Number | Publication Date
CN116501859A (en) | 2023-07-28
CN116501859B (en) | 2023-09-01

Family

ID=87320493

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202310752492.9A | Paragraph retrieval method, device and medium based on refrigerator field | 2023-06-26 | 2023-06-26 | Active | CN116501859B (en)

Country Status (1)

Country | Link
CN (1) | CN116501859B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11481418B2 (en)* | 2020-01-02 | 2022-10-25 | International Business Machines Corporation | Natural question generation via reinforcement learning based graph-to-sequence model
US11741371B2 (en)* | 2020-03-20 | 2023-08-29 | International Business Machines Corporation | Automatically generating diverse text

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN104054075A (en)* | 2011-12-06 | 2014-09-17 | 派赛普申合伙公司 | Text mining, analysis and output system
CN109885672A (en)* | 2019-03-04 | 2019-06-14 | 中国科学院软件研究所 | A question-and-answer intelligent retrieval system and method for online education
CN112116685A (en)* | 2020-09-16 | 2020-12-22 | 中国石油大学(华东) | Image caption generation method with multi-attention fusion network based on multi-granularity reward mechanism
CN112699216A (en)* | 2020-12-28 | 2021-04-23 | 平安科技(深圳)有限公司 | End-to-end language model pre-training method, system, device and storage medium
CN113836895A (en)* | 2021-02-08 | 2021-12-24 | 宏龙科技(杭州)有限公司 | An unsupervised machine reading comprehension method based on large-scale problem self-learning
CN113704421A (en)* | 2021-04-02 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Information retrieval method and device, electronic equipment and computer readable storage medium
CN113204611A (en)* | 2021-04-06 | 2021-08-03 | 北京百度网讯科技有限公司 | Method for establishing reading understanding model, reading understanding method and corresponding device
WO2023098971A1 (en)* | 2021-11-30 | 2023-06-08 | Huawei Technologies Co., Ltd. | Method and apparatus for self-supervised extractive question answering
CN114818743A (en)* | 2022-03-21 | 2022-07-29 | 内蒙古工业大学 | Mongolian Chinese neural machine translation method based on multiple constraint terms
CN116089592A (en)* | 2023-03-21 | 2023-05-09 | 南京大学 | Method, device and storage medium for realizing open-domain multi-answer question and answer

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dialogue generation with deep reinforcement learning based on hierarchical encoding; Zhao Yuqing; Xiang Yang; Journal of Computer Applications, No. 10; full text *

Also Published As

Publication number | Publication date
CN116501859A (en) | 2023-07-28

Similar Documents

Publication | Publication Date | Title
CN109657041B (en) | Deep learning-based automatic problem generation method
CN111144448B (en) | Video barrage emotion analysis method based on multi-scale attention convolution coding network
CN110427629B (en) | Semi-supervised text reduction model training method and system
WO2021031480A1 (en) | Text generation method and device
CN111400455A (en) | Relation detection method of question-answering system based on knowledge graph
CN117150151B (en) | Wrong question analysis and test question recommendation system and method based on large language model
CN114880461A (en) | Chinese news text summarization method combining contrast learning and pre-training technology
CN113254604B (en) | A method and device for generating professional text based on reference specification
CN112668344B (en) | Diverse problem generation method with controllable complexity based on hybrid expert model
CN112015760B (en) | Automatic question-answering method and device based on candidate answer set reordering and storage medium
CN118350459A (en) | A method for generating dual-model corpus
CN111738006A (en) | A Question Generation Method Based on Named Entity Recognition of Product Reviews
CN114417880A (en) | An interactive intelligent question answering method based on the question and answer knowledge base of power grid training
CN120067301B (en) | Self-adaptive question setting method and system based on retrieval enhancement algorithm
CN112287678A (en) | Ancient poetry automatic generation method based on pre-training model
CN114648015A (en) | An Aspect-Level Sentiment Word Recognition Method Based on Dependency Attention Model
CN115510814A (en) | A Method for Generating Complex Questions at the Chapter Level Based on Dual Programming
CN117992614A (en) | A method, device, equipment and medium for sentiment classification of Chinese online course reviews
CN116029283A (en) | Role consistency dialogue generation method based on common sense expansion
CN119849482A (en) | Chinese grammar error correction method and system based on big model fine tuning
CN116501859B (en) | Paragraph retrieval method, device and medium based on refrigerator field
CN114330349A (en) | Specific field named entity recognition method
CN118747893A (en) | Image captioning model based on fine-tuning large language model with structure and instructions
CN116644759B (en) | Method and system for extracting aspect category and semantic polarity in sentence
CN115934933B (en) | Text summarization method and system based on double-ended contrastive learning

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
