WO2022105122A1

Movatterモバイル変換

Info

Publication number: WO2022105122A1
Application number: PCT/CN2021/090555
Authority: WO
Inventors: 毛经纬
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-11-17
Filing date: 2021-04-28
Publication date: 2022-05-27
Anticipated expiration: 2023-05-17
Also published as: CN112417885A

Abstract

Provided are an answer generation method and apparatus based on artificial intelligence, and a computer device and a medium. The method comprises: acquiring data information, and identifying the data information by using an information identification model corresponding to the information type of the data information, so as to obtain text (S11); when the length of the text is greater than a pre-set length threshold value, performing segmentation processing on the text according to a pre-set sliding window, so as to obtain a plurality of text segments (S12); receiving query information, and generating, according to the query information and a pre-set question template, a query question corresponding to the query information (S13); splicing each text segment and the query question, so as to obtain input information, and inputting the input information into a BERT model, so as to obtain a feature vector corresponding to each text character in the input information (S14); and determining, in the text segments and according to a pre-set calculation formula and the feature vector, a question answer corresponding to the query question. Therefore, the question answering accuracy of a question answering system can be improved.

Description

Translated fromChinese

基于人工智能的答案生成方法、装置、计算机设备及介质Answer generation method, device, computer equipment and medium based on artificial intelligence

本申请要求于2020年11月17日提交中国专利局、申请号为202011288043.6，发明名称为“基于人工智能的答案生成方法、装置、计算机设备及介质”的中国专利申请的优先权，其全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application filed on November 17, 2020 with the application number 202011288043.6 and the title of the invention is "An answer generation method, device, computer equipment and medium based on artificial intelligence", the entire content of which is Incorporated herein by reference.

技术领域technical field

本申请涉及人工智能技术领域，具体涉及一种基于人工智能的答案生成方法、装置、计算机设备及介质。The present application relates to the technical field of artificial intelligence, and in particular to an answer generation method, apparatus, computer equipment and medium based on artificial intelligence.

背景技术Background technique

问答系统(Question Answering System，QA)是信息检索系统的一种高级形式，其融合了信息检索、信息抽取以及自然语言处理等技术，它能用准确、简洁的自然语言回答用户用自然语言提出的问题。通常问答系统是在获取到的数据中进行信息抽取，以获取问题对应的答案，但是由于汉语表达方式灵活，具有相同语义句子的关键词的出现的位置也不定，因此容易抽取到错误的答案。Question Answering System (QA) is an advanced form of information retrieval system, which integrates information retrieval, information extraction and natural language processing technologies. question. Usually, the question answering system extracts information from the acquired data to obtain the answer corresponding to the question. However, due to the flexible expression in Chinese, the position of the keywords with the same semantic sentence is also uncertain, so it is easy to extract the wrong answer.

发明人在实现本申请的过程中发现，信息抽取通常是基于命名实体识别模型，由于命名实体识别模型对输入文本的限制，可能造成信息的缺失，导致难以抽取完整的数据信息，所以容易导致信息抽取的准确率不高，从而导致问答系统回答问题的准确率较低。In the process of realizing this application, the inventor found that information extraction is usually based on the named entity recognition model. Due to the restriction of the named entity recognition model on the input text, information may be missing, making it difficult to extract complete data information, so it is easy to cause information The accuracy of the extraction is not high, resulting in a low accuracy rate of the question answering system in answering questions.

发明内容SUMMARY OF THE INVENTION

鉴于以上内容，有必要提出一种基于人工智能的答案生成方法、装置、计算机设备及介质，能够提高问答系统回答问题的准确率。In view of the above content, it is necessary to propose an answer generation method, device, computer equipment and medium based on artificial intelligence, which can improve the accuracy of question answering system answering questions.

本申请的第一方面提供一种基于人工智能的答案生成方法，所述方法包括：A first aspect of the present application provides an artificial intelligence-based answer generation method, the method comprising:

获取数据信息，并使用与所述数据信息的信息类型对应的信息识别模型识别所述数据信息得到文本；Acquire data information, and use an information identification model corresponding to the information type of the data information to identify the data information to obtain text;

当所述文本的长度大于预设长度阈值时，根据预设的滑动窗口对所述文本进行切分处理，得到多个文本片段；When the length of the text is greater than a preset length threshold, the text is segmented according to a preset sliding window to obtain a plurality of text fragments;

接收查询信息，并根据所述查询信息和预设的问题模板生成所述查询信息对应的查询问题；receiving query information, and generating a query question corresponding to the query information according to the query information and a preset question template;

将每个所述文本片段和所述查询问题进行拼接得到输入信息，并将所述输入信息输入预先训练的BERT模型得到所述输入信息中每个文本字符对应的特征向量；The input information is obtained by splicing each of the text fragments and the query question, and the input information is input into a pre-trained BERT model to obtain a feature vector corresponding to each text character in the input information;

根据预设的第一概率计算公式和所述输入信息中每个文本字符对应的特征向量计算每个文本字符对应的第一起始概率，并根据预设的第二概率计算公式和所述输入信息中每个文本字符对应的特征向量计算每个文本字符对应的第二结束概率每个文本字符每个文本字符每个文本字符；The first starting probability corresponding to each text character is calculated according to the preset first probability calculation formula and the feature vector corresponding to each text character in the input information, and the first starting probability corresponding to each text character is calculated according to the preset second probability calculation formula and the input information Calculate the second end probability corresponding to each text character in the feature vector corresponding to each text character in each text character each text character each text character;

根据预设的第三概率计算公式计算所述输入信息的错误概率；Calculate the error probability of the input information according to a preset third probability calculation formula;

当所述错误概率不超过预设的错误概率阈值时，根据多个所述文本字符对应的起始概率和多个所述文本字符对应的结束概率在所述文本片段中确定所述查询问题对应的问题答案。When the error probability does not exceed a preset error probability threshold, determine the corresponding query question in the text segment according to the start probability corresponding to the plurality of text characters and the end probability corresponding to the plurality of text characters answer to the question.

本申请的第二方面提供一种基于人工智能的答案生成装置，所述装置包括：A second aspect of the present application provides an artificial intelligence-based answer generation device, the device comprising:

文本生成模块，用于获取数据信息，并使用与所述数据信息的信息类型对应的信息识别模型识别所述数据信息得到文本；a text generation module for acquiring data information, and using an information identification model corresponding to the information type of the data information to identify the data information to obtain text;

切分处理模块，用于当所述文本的长度大于预设长度阈值时，根据预设的滑动窗口对所述文本进行切分处理，得到多个文本片段；a segmentation processing module, configured to perform segmentation processing on the text according to a preset sliding window when the length of the text is greater than a preset length threshold to obtain a plurality of text fragments;

问题生成模块，用于接收查询信息，并根据所述查询信息和预设的问题模板生成所述查询信息对应的查询问题；a question generation module, configured to receive query information, and generate a query question corresponding to the query information according to the query information and a preset question template;

向量确定模块，用于将每个所述文本片段和所述查询问题进行拼接得到输入信息，并将所述输入信息输入预先训练的BERT模型得到所述输入信息中每个文本字符对应的特征向量；A vector determination module for splicing each of the text fragments and the query question to obtain input information, and inputting the input information into a pre-trained BERT model to obtain a feature vector corresponding to each text character in the input information ;

概率计算模块，用于根据预设的第一概率计算公式和所述输入信息中每个文本字符对应的特征向量，计算每个文本字符对应的起始概率，并根据预设的第二概率计算公式和输入信息中每个文本字符对应的特征向量，计算每个文本字符对应的结束概率；The probability calculation module is used to calculate the starting probability corresponding to each text character according to the preset first probability calculation formula and the feature vector corresponding to each text character in the input information, and calculate according to the preset second probability Formula and the feature vector corresponding to each text character in the input information, and calculate the end probability corresponding to each text character;

所述概率计算模块，还用于根据预设的第三概率计算公式计算所述输入信息的错误概率；The probability calculation module is further configured to calculate the error probability of the input information according to a preset third probability calculation formula;

答案生成模块，用于当所述错误概率不超过预设的错误概率阈值时，根据多个所述文本字符对应的起始概率和多个所述文本字符对应的结束概率在所述文本片段中确定所述查询问题对应的问题答案。The answer generation module is configured to, when the error probability does not exceed a preset error probability threshold, generate a value in the text segment according to the start probability corresponding to a plurality of the text characters and the end probability corresponding to the plurality of text characters A question answer corresponding to the query question is determined.

本申请的第三方面提供一种计算机设备，所述计算机设备包括处理器，所述处理器用于执行存储器中存储的计算机可读指令时实现以下步骤：A third aspect of the present application provides a computer device, the computer device includes a processor, and the processor is configured to implement the following steps when executing computer-readable instructions stored in a memory:

将每个所述文本片段和所述查询问题进行拼接得到输入信息，并将所述输入信息输入预先训练的BERT模型，得到所述输入信息中每个文本字符对应的特征向量；The input information is obtained by splicing each of the text fragments and the query question, and the input information is input into a pre-trained BERT model to obtain a feature vector corresponding to each text character in the input information;

根据预设的第一概率计算公式和所述输入信息中每个文本字符对应的特征向量计算每个文本字符对应的起始概率，并根据预设的第二概率计算公式和所述输入信息中每个文本字符对应的特征向量计算每个文本字符对应的结束概率；The starting probability corresponding to each text character is calculated according to the preset first probability calculation formula and the feature vector corresponding to each text character in the input information, and the starting probability corresponding to each text character is calculated according to the preset second probability calculation formula and the input information. The feature vector corresponding to each text character calculates the end probability corresponding to each text character;

当所述错误概率不超过预设的错误概率阈值时，根据多个所述文本字符对应的起始概率和多个所述文本字符对应的结束概率在所述多个文本片段中确定所述查询问题对应的问题答案。When the error probability does not exceed a preset error probability threshold, the query is determined in the plurality of text segments according to the start probability corresponding to the plurality of text characters and the end probability corresponding to the plurality of text characters The corresponding answer to the question.

本申请的第四方面提供一种计算机可读存储介质，所述计算机可读存储介质上存储有计算机可读指令，所述计算机可读指令被处理器执行时实现以下步骤：A fourth aspect of the present application provides a computer-readable storage medium, where computer-readable instructions are stored thereon, and when the computer-readable instructions are executed by a processor, the following steps are implemented:

综上所述，本申请所述的基于人工智能的答案生成方法、装置、计算机设备及存储介质，通过根据数据信息的信息类型对数据信息进行处理得到文本片段，如采用与所述数据信息的信息类型对应的信息识别模型识别所述数据信息得到文本，对文本进行切分得到文本片段，其中数据信息的信息类型有多种，不局限于文字类型，提高问答系统的使用范围，同时对数据信息进行预处理可以得到更准确的文本片段，从而提高问题答案的准确率；接着将获取的查询信息和预设的问题模板生成所述查询信息对应的查询问题，避免因查询信息不清晰影响问题答案生成的情况发生，从而提高问题答案的准确率；将所述文本片段和所述查询问题进行拼接得到输入信息，并将所述输入信息输入预先训练的BERT模型得到所述输入信息中每个文本字符对应的特征向量，通过将所述文本片段和所述查询问题进行拼接得到输入信息实现BERT模型的单输入，可提高BERT模型的处理速度，从而提高问题答案生成的速率，同时将文本片段和查询问题进行拼接可以使得特征之间进行更全面的交互，提高BERT模型确定全文语义信息的准确率，从而提高每个文本字符对应的特征向量的准确率，进一步提高问题答案生成的准确率；然后根据预设的第一概率计算公式和所述输入信息中每个文本字符对应的特征向量，计算每个文本字符对应的起始概率；并根据预设的第二概率计算公式和输入信息中每个文本字符对应的特征向量，计算每个所述文本字符对应的结束概率；根据预设的第三概率计算公式计算所述输入信息的错误概率；最后当所述错误概率不超过预设的错误概率阈值时，根据所述多个文本字符对应的起始概率和所述多个文本字符对应的结束概率在所述文本片段中确定所述查询问题对应的问题答案，通过预设的错误概率阈值，在错误概率超过预设的错误概率阈值时确定文本片段中不包括问题答案，可避免在不包括问题答案的文本片段中确定出错误的问题答案，进一步提高了问答系统回答问题的准确率。To sum up, the artificial intelligence-based answer generation method, device, computer equipment and storage medium described in this application obtain text fragments by processing the data information according to the information type of the data information. The information recognition model corresponding to the information type recognizes the data information to obtain text, and divides the text to obtain text fragments. There are many types of data information, not limited to text types. The information can be preprocessed to obtain more accurate text fragments, thereby improving the accuracy of the question answer; then, the obtained query information and the preset question template are used to generate the query question corresponding to the query information, so as to avoid the problem caused by unclear query information. Answer generation occurs, thereby improving the accuracy of the answer to the question; the input information is obtained by splicing the text fragment and the query question, and the input information is input into the pre-trained BERT model to obtain each of the input information. The feature vector corresponding to the text character, the input information is obtained by splicing the text fragment and the query question to realize the single input of the BERT model, which can improve the processing speed of the BERT model, thereby improving the rate of question answer generation. Splicing with the query question can make the features more comprehensively interact, and improve the accuracy of the BERT model to determine the full text semantic information, thereby improving the accuracy of the feature vector corresponding to each text character, and further improving the accuracy of question answer generation; Then, according to the preset first probability calculation formula and the feature vector corresponding to each text character in the input information, the starting probability corresponding to each text character is calculated; and according to the preset second probability calculation formula and the input information the feature vector corresponding to each text character, calculate the end probability corresponding to each text character; calculate the error probability of the input information according to the preset third probability calculation formula; finally, when the error probability does not exceed the preset When the error probability threshold is set, the question answer corresponding to the query question is determined in the text segment according to the start probability corresponding to the multiple text characters and the end probability corresponding to the multiple text characters, and the preset error probability Threshold, when the error probability exceeds the preset error probability threshold, it is determined that the question answer is not included in the text segment, which can avoid determining the wrong question answer in the text segment that does not include the question answer, and further improve the question answering system. The accuracy rate of answering questions .

附图说明Description of drawings

图1是本申请实施例一提供的基于人工智能的答案生成方法的流程示意图。FIG. 1 is a schematic flowchart of an answer generation method based on artificial intelligence provided in Embodiment 1 of the present application.

图2是本申请实施例二提供的基于人工智能的答案生成装置的示意性框图。FIG. 2 is a schematic block diagram of an apparatus for generating an answer based on artificial intelligence provided in Embodiment 2 of the present application.

图3是本申请实施例三提供的计算机设备的结构示意图。FIG. 3 is a schematic structural diagram of a computer device provided in Embodiment 3 of the present application.

具体实施方式Detailed ways

为了能够更清楚地理解本申请的上述目的、特征和优点，下面结合附图和具体实施例对本申请进行详细描述。需要说明的是，在不冲突的情况下，本申请的实施例及实施例中的特征可以相互组合。In order to more clearly understand the above objects, features and advantages of the present application, the present application will be described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other in the case of no conflict.

除非另有定义，本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中在本申请的说明书中所使用的术语只是为了描述具体的实施例的目的，不是旨在于限制本申请。Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field to which this application belongs. The terms used herein in the specification of the application are for the purpose of describing specific embodiments only, and are not intended to limit the application.

本申请实施例提供的基于人工智能的答案生成方法可应用于终端设备或服务器中，该终端设备可以手机、平板电脑、笔记本电脑、台式电脑、个人数字助理和穿戴式设备等电子设备，该服务器可以为单台的服务器，也可以为由多台服务器组成的服务器集群。以下以该基于人工智能的答案生成方法应用于服务器为例进行解释说明。The artificial intelligence-based answer generation method provided in the embodiment of the present application can be applied to a terminal device or a server. It can be a single server or a server cluster composed of multiple servers. The following takes the artificial intelligence-based answer generation method applied to the server as an example for explanation.

图1是本申请实施例一提供的基于人工智能的答案生成方法的流程图。所述基于人工智能的答案生成方法具体包括以下步骤，根据不同的需求，该流程图中步骤的顺序可以改变，某些可以省略。FIG. 1 is a flowchart of an answer generation method based on artificial intelligence provided in Embodiment 1 of the present application. The artificial intelligence-based answer generation method specifically includes the following steps. According to different requirements, the order of the steps in the flowchart can be changed, and some can be omitted.

S11、获取数据信息，并使用与所述数据信息的信息类型对应的信息识别模型识别所述数据信息得到文本。S11. Acquire data information, and use an information identification model corresponding to the information type of the data information to identify the data information to obtain text.

其中，所述信息类型可以为文字类型，也可以为图片类型，还可以是语音类型，本申请实施例不做限定。当所述数据信息的信息类型为非文字类型时，将所述数据信息的信息类型转换为文字类型，并根据转换后的数据信息得到文本。例如，当所述数据信息的信息类型为图片类型时，将所述数据信息输入预设的图像识别模型，得到所述数据信息对应的文本信息，并根据所述数据信息对应的文本信息得到文本；当所述数据信息的信息类型为语音类型时，将所述数据类型输入预设的语音识别模型，得到所述数据信息对应的文本信息，并根据所述数据信息对应的文本信息得到文本。数据信息的信息类型有多种，不局限于文字类型，提高问答系统的使用范围。The information type may be a text type, a picture type, or a voice type, which is not limited in this embodiment of the present application. When the information type of the data information is a non-text type, the information type of the data information is converted into a text type, and text is obtained according to the converted data information. For example, when the information type of the data information is a picture type, input the data information into a preset image recognition model to obtain text information corresponding to the data information, and obtain text according to the text information corresponding to the data information. When the information type of the data information is a voice type, input the data type into a preset speech recognition model to obtain text information corresponding to the data information, and obtain text according to the text information corresponding to the data information. There are many types of data information, not limited to text types, which improves the use of the question answering system.

在一实施方式中，当所述数据信息为语音类型时，在所述获取数据信息之后，所述方法具体还包括：In one embodiment, when the data information is of a voice type, after the acquiring the data information, the method specifically further includes:

识别所述数据信息所属的区域口音类型；基于所述区域口音类型对所述数据信息进行语音纠正处理；Identifying the regional accent type to which the data information belongs; performing speech correction processing on the data information based on the regional accent type;

所述使用与所述数据信息的信息类型对应的信息识别模型识别所述数据信息得到文本包括：The text obtained by identifying the data information using an information identification model corresponding to the information type of the data information includes:

将处理后的数据信息输入预设的语音识别模型得到所述数据信息对应的文本信息；通过所述数据信息对应的文本信息得到文本。Input the processed data information into a preset speech recognition model to obtain text information corresponding to the data information; and obtain text through the text information corresponding to the data information.

不同区域的用户在输入语音时，会自带本区域的口音，由于口音的区域化，会导致识别困难，容易造成语音识别的错误。基于所述区域口音类型，对所述数据信息进行语音纠正处理，可以获得符合标准的普通话，进而可以更准确的识别出数据信息的内容，得到更准确的文本，从而提高答案生成的准确率。Users in different regions will bring their own accents in their own regions when inputting speech. Due to the regionalization of accents, it will be difficult to recognize and easily lead to errors in speech recognition. Based on the regional accent type, voice correction processing is performed on the data information, and standard Mandarin can be obtained, so that the content of the data information can be more accurately identified, and more accurate text can be obtained, thereby improving the accuracy of answer generation.

在一实施方式中，在所述使用与所述数据信息的信息类型对应的信息识别模型识别所述数据信息得到文本之后，所述方法具体还包括以下步骤：In one embodiment, after the text is obtained by identifying the data information using an information identification model corresponding to the information type of the data information, the method further includes the following steps:

判断所述文本中是否存在错误文字；determine whether there is any wrong word in the text;

若所述文本中存在错误文字，从预设词库中查找与所述错误文字的相似度大于预设的相似度阈值的词，得到候选词组；If there is an erroneous character in the text, search for a word whose similarity with the erroneous character is greater than a preset similarity threshold from a preset thesaurus to obtain a candidate phrase;

根据编辑距离算法在所述候选词组中确定第一文字；determining the first character in the candidate phrase according to an edit distance algorithm;

使用所述第一文字替换所述错误文字，并根据替换后的文本得到新的文本。The wrong text is replaced with the first text, and a new text is obtained according to the replaced text.

在该可选的实施方式中，当文本出现单词拼写错误、同音字时，问答系统可以自动识别纠错。汉字的纠错使用编辑距离算法实现，首先在预设词库(通常是与该文本同一行业领域的词库)中查找词相似度以及拼音相似度较高的词作为候选词，以缩小编辑距离计算范围，之后计算各个候选词与需要纠错词的编辑距离，取编辑距离最小的候选词，如果编辑距离值超过设置的纠错阀值，则作为结果返回。In this optional implementation, when there are spelling mistakes and homophones in the text, the question answering system can automatically identify and correct the errors. The error correction of Chinese characters is realized by the edit distance algorithm. First, the word similarity and pinyin similarity are searched in the preset thesaurus (usually the thesaurus of the same industry field as the text) as candidate words to reduce the edit distance. Calculate the range, and then calculate the edit distance between each candidate word and the word that needs to be corrected. Take the candidate word with the smallest edit distance. If the edit distance value exceeds the set error correction threshold, it will be returned as the result.

通过根据所述数据信息的信息类型对所述数据信息进行预处理得到文本，数据信息的信息类型有多种，不局限于文字类型，提高问答系统的使用范围，同时对数据信息进行预处理可以得到更准确的文本片段，从而提高问题答案的准确率。The text is obtained by preprocessing the data information according to the information type of the data information. There are many types of data information, not limited to the text type, which improves the application range of the question answering system. Get more accurate text snippets to improve the accuracy of your question answers.

需要说明的是，为了保证上述数据信息过程中的数据的私密性和安全性，所述处理过程中的信息数据，比如获取到的数据信息、识别所述数据信息得到的文本等可存储于区块链中。It should be noted that, in order to ensure the privacy and security of the data in the above data information process, the information data in the processing process, such as the acquired data information, the text obtained by identifying the data information, etc., can be stored in the area. in the blockchain.

S12、当所述文本的长度大于预设长度阈值时，根据预设的滑动窗口对所述文本进行切分处理，得到多个文本片段。S12. When the length of the text is greater than a preset length threshold, perform segmentation processing on the text according to a preset sliding window to obtain multiple text fragments.

其中，所述预设长度阈值为BERT模型所能够识别的最大文本长度。The preset length threshold is the maximum text length that the BERT model can recognize.

在一实施方式中，所述根据预设的滑动窗口对所述文本进行切分处理，得到多个文本片段，具体包括基于人工智能的以下步骤：In one embodiment, the process of segmenting the text according to a preset sliding window to obtain a plurality of text fragments specifically includes the following steps based on artificial intelligence:

从所述文本的第一个文本字符开始无重叠滑动所述滑动窗口，并在每次滑动之后判断是否满足滑动结束条件；Sliding the sliding window without overlapping from the first text character of the text, and judging whether the sliding end condition is satisfied after each sliding;

当确定满足滑动结束条件时，停止所述滑动窗口的滑动，并将每次滑动时所述滑动窗口在所述文本中的开始位置和结束位置确定为字符切分节点；When it is determined that the sliding end condition is met, the sliding of the sliding window is stopped, and the start position and the end position of the sliding window in the text each time the sliding window is slid is determined as a character segmentation node;

从每个所述字符切分节点开始，从所述文本中切分出所述预设长度阈值的文本字符，得到多个文本片段。Starting from each of the character segmentation nodes, the text characters of the preset length threshold are segmented from the text to obtain a plurality of text segments.

其中，当每次滑动之后所述滑动窗口在所述文本中的结束位置与所述文本的结束位置之间的差值小于或者等于所述长度阈值时，确定满足所述预设滑动结束条件；当每次滑动之后所述滑动窗口在所述文本中的结束位置与所述文本的结束位置之间的差值大于所述长度阈值时，确定未满足所述预设滑动结束条件。Wherein, when the difference between the end position of the sliding window in the text and the end position of the text after each sliding is less than or equal to the length threshold, it is determined that the preset sliding end condition is satisfied; When the difference between the end position of the sliding window in the text and the end position of the text after each sliding is greater than the length threshold, it is determined that the preset sliding end condition is not met.

示例性的，BERT模型允许输入的最大文本长度，即文本长度阈值，记为m。当文本T的长度L大于所设定的最大文本长度m，那么需要将文本T进行切分，如按照预先设定的滑动窗口d对文本T进行切分得到多个文本片段。例如，BERT模型的文本预设长度阈值m为500，预设的滑动窗口为40，现有一文本T的长度L为600，如文本T＝[t₁，t₂，...，t₆₀₀]，文本T的长度大于文本长度阈值，按照滑动窗口d对文本T进行切分，得到长度分别为500，500，500，480的4个文本片段，如[t₁，t₂，...，t₅₀₀]，，[t₄₁，t₄₂，...，t₅₄₀]，[t₈₁，t₈₂，...，t₅₈₀]，[t₁₂₁，t₁₂₂，...，t₆₀₀]。Exemplarily, the maximum text length allowed by the BERT model, that is, the text length threshold, is denoted as m. When the length L of the text T is greater than the set maximum text length m, the text T needs to be segmented, for example, the text T is segmented according to the preset sliding window d to obtain multiple text segments. For example, the preset text length threshold m of the BERT model is 500, the preset sliding window is 40, and the length L of an existing text T is 600, such as text T=[t₁ , t₂ ,...,t₆₀₀ ] , the length of the text T is greater than the text length threshold, segment the text T according to the sliding window d, and obtain 4 text segments with lengths of 500, 500, 500, 480, such as [t₁ , t₂ , ..., t₅₀₀ ], [t₄₁ , t₄₂ , ..., t₅₄₀ ], [t₈₁ , t₈₂ , ..., t₅₈₀ ], [t₁₂₁ , t₁₂₂ , ..., t₆₀₀ ].

根据BERT模型的文本预设长度阈值对文本进行切分处理，避免因文本过长导致文本信息读取不全的情况发生，进一步提高了答案生成的准确率。According to the text preset length threshold of the BERT model, the text is segmented and processed, so as to avoid the incomplete reading of text information due to too long text, and further improve the accuracy of answer generation.

S13、接收查询信息，并根据所述查询信息和预设的问题模板生成所述查询信息对应的查询问题。S13. Receive query information, and generate a query question corresponding to the query information according to the query information and a preset question template.

示例性的，用户在问答系统的搜索页面上输入需要查询的查询信息，问答系统获取该查询信息，并根据该查询信息的内容和预先设置的问题模板生成所述查询信息对应的查询问题。例如用户输入需要查询的查询信息为“小李的出院日期”，根据预先设置的问题模板和该查询信息生成该查询信息对应的查询问题为：小李的出院日期是什么。Exemplarily, a user inputs query information to be queried on a search page of the question answering system, the question answering system obtains the query information, and generates a query question corresponding to the query information according to the content of the query information and a preset question template. For example, the user inputs the query information to be queried as "Xiao Li's hospital discharge date", and the query question corresponding to the query information is generated according to the preset question template and the query information: what is Xiao Li's hospital discharge date.

在一实施方式中，所述查询信息包括至少一个查询句子，所述根据所述查询信息和预设的问题模板生成所述查询信息对应的查询问题包括：In one embodiment, the query information includes at least one query sentence, and the generating a query question corresponding to the query information according to the query information and a preset question template includes:

对各所述查询句子进行句法分析和命名实体识别，得到每个查询句子对应的语法树；Perform syntactic analysis and named entity recognition on each of the query sentences to obtain a syntax tree corresponding to each query sentence;

将各所述查询句子对应的语法树与预先建立的问题模板数据库中的问题模板进行匹配；Matching the syntax tree corresponding to each described query sentence with the question template in the pre-established question template database;

当所述问题模板数据库中存在一问题模板与一语法树相匹配时，将所述语法树对应的查询句子转换为基于所述语法树相匹配的问题模板的疑问句，得到查询问题。When a question template matches a syntax tree in the question template database, the query sentence corresponding to the syntax tree is converted into a question sentence based on the question template matched by the syntax tree to obtain a query question.

示例性的，所述问题模板可包括以下几种，其中，数据库中的问题模板的格式可如下：Exemplarily, the question template may include the following types, wherein the format of the question template in the database may be as follows:

表示“多少”的问题模板：QP＜CD＝number＜CLP；Question template representing "how many": QP<CD=number<CLP;

表示“第几”的问题模板：QP＜OD＝number；Question template representing "number": QP<OD=number;

其中的符号均来自斯坦福自然语言实验室对语法树中存在的成分的定义。当某个问题模板在当前查询句子的语法树上匹配成功时，就利用该问题模板将当前的查询句子改写成基于该问题模板的疑问句，从而就生成了查询问题。The symbols in it all come from the Stanford Natural Language Laboratory's definition of the components that exist in the syntax tree. When a question template is successfully matched on the syntax tree of the current query sentence, the question template is used to rewrite the current query sentence into a question sentence based on the question template, thereby generating a query question.

其中，对于所述问题模板数据库，是从大量的文章数据中，学习语言规则而得到大量的问题模板，构成所述问题模板数据库。对于所述查询内容中的每一个查询句子，都利用各自的语法树去匹配问题模板数据库中的问题模板，一旦匹配成功，就利用与之匹配的问题模板直接将查询句子转化为相应的疑问句，从而生成相应的查询问题。当句子与当前的问题模板数据库中的问题模板都不匹配时，即该查询句子不能生成问题；通过批量统计不能生成查询问题的句子，制定新的问题模板，更新到所述问题模板数据库。在此仅举例说明问题模板和问题模板数据库，本实施例对问题模板和问题模板数据库不做任何限制。Wherein, for the question template database, a large number of question templates are obtained by learning language rules from a large amount of article data, and the question template database is constituted. For each query sentence in the query content, the respective syntax tree is used to match the question template in the question template database. Once the matching is successful, the query sentence is directly converted into a corresponding question sentence by using the matching question template, Thereby generating corresponding query questions. When the sentence does not match the question template in the current question template database, that is, the query sentence cannot generate a question; by batch counting the sentences that cannot generate the query question, a new question template is formulated and updated to the question template database. Only the question template and the question template database are exemplified here, and this embodiment does not impose any limitation on the question template and the question template database.

在一实施方式中，所述对各所述查询句子进行句法分析和命名实体识别，得到每个查询句子对应的语法树包括：In one embodiment, performing syntactic analysis and named entity recognition on each of the query sentences to obtain a syntax tree corresponding to each query sentence includes:

对各所述查询句子进行词分割，得到多个查询词；Perform word segmentation on each of the query sentences to obtain a plurality of query words;

对所述多个查询词进行词性标注，得到各所述查询词对应的词性标注标签；Perform part-of-speech tagging on the plurality of query words to obtain part-of-speech tagging labels corresponding to each of the query words;

对所述多个查询词进行命名实体识别，确定所述多个查询词中的命名实体词；Performing named entity recognition on the plurality of query words, and determining the named entity words in the plurality of query words;

根据所述多个查询词对应的词性标注标签和所述命名实体词得到各所述查询句子对应的语法树。A syntax tree corresponding to each of the query sentences is obtained according to the part-of-speech tags corresponding to the plurality of query words and the named entity words.

示例性的，对查询句子进行句法分析和命名实体识别的过程可以包括以下步骤：先对查询句子进行词分割得到多个查询词，然后根据句法分析中用于表示时间名词的符号对每个查询词进行词性标注得到各所述查询词对应的词性标注标签，并对每个查询词进行命名实体的识别，比如是人名、机构名、地名还是其它以名称为标识的实体，从而确定所述多个查询词中的命名实体词。在对查询句子完成句法分析和命名实体识别后，就可以根据该查询句子中各查询词对应的词性标注标签和该查询句子中的命名实体词对该查询句子建立相应的语法树。Exemplarily, the process of performing syntactic analysis and named entity recognition on the query sentence may include the following steps: firstly performing word segmentation on the query sentence to obtain a plurality of query words, and then analyzing each query according to the symbols used to represent time nouns in the syntactic analysis. The part-of-speech tagging of each query word is performed to obtain the part-of-speech tagging label corresponding to each query word, and each query word is identified as a named entity, such as a person's name, an institution name, a place name or other entity identified by a name, so as to determine the number of the query words. named entity words in a query term. After completing the syntax analysis and named entity recognition on the query sentence, a corresponding syntax tree can be established for the query sentence according to the part-of-speech tags corresponding to each query word in the query sentence and the named entity words in the query sentence.

例如，一查询句子为“2020年因冠状病毒大势蔓延”，对该查询句子进行词的分割，得到多个查询词，如“2020年|因|冠状|病毒|大势|蔓延”其中用符号“|”表示分割，然后根据句法分析中用于表示时间名词的符号对每个查询词进行加注词性标注标签，例如，“NT”在句法分析中表示常用名词，因此“2020年”被标注为“NT”；并对每个查询词进行命名实体的识别，确定所述多个查询词中的命名实体词，如将2020年确定为以时间为标识的命名实体词、将冠状、病毒确定为以名称为标识的命名实体词。最后根据查询句子中多个查询词对应的词性标注标签和所述命名实体词得到该查询句子对应的语法树。利用所述查询句子对应的语法树与预先建立的问题模板数据库中的问题模板进行匹配生成查询句子对应的查询问题，可以提高查询问题生成的准确率，从而提高问题答案的准确率。For example, a query sentence is "2020 due to the spread of coronavirus", and the query sentence is divided into words to obtain multiple query words, such as "2020 | due to | coronavirus | virus | general trend | spread" in which the symbol " |" means segmentation, and then each query word is annotated with a part-of-speech tag according to the symbols used to represent temporal nouns in syntactic analysis, for example, "NT" means common nouns in syntactic analysis, so "2020" is labeled as "NT"; and identify the named entity for each query word, and determine the named entity word in the multiple query words, for example, determine 2020 as the named entity word identified by time, determine the coronavirus and virus as the named entity word Named entity word identified by name. Finally, a grammar tree corresponding to the query sentence is obtained according to the part-of-speech tags corresponding to the plurality of query words in the query sentence and the named entity words. The query question corresponding to the query sentence is generated by matching the syntax tree corresponding to the query sentence with the question template in the pre-established question template database, which can improve the accuracy of query question generation and thus the accuracy of the question answer.

通过根据所述查询信息和预设的问题模板生成所述查询信息对应的查询问题，避免因查询信息不清晰影响问题答案生成的情况发生，从而提高问题答案的准确率。By generating a query question corresponding to the query information according to the query information and a preset question template, the generation of the question answer is avoided due to unclear query information, thereby improving the accuracy of the question answer.

S14、将每个所述文本片段和所述查询问题进行拼接得到输入信息，并将所述输入信息输入预先训练的BERT模型得到所述输入信息中每个文本字符对应的特征向量。S14, splicing each of the text fragments and the query question to obtain input information, and inputting the input information into a pre-trained BERT model to obtain a feature vector corresponding to each text character in the input information.

示例性的，所述BERT模型的训练任务可以包括掩码语言模型(Masked Language Model，MLM)任务和/或下一句预测(Next Sentence Prediction，NSP)任务，其中所述MLM任务用于对训练文本片段中的预设比例的词进行掩码处理并预测被掩码处理的词，上述预设比例可以根据实际情况进行合理设置，例如，上述预设比例可以为15％、20％等；所述NSP任务用于预测句子对关系，如判断句子B是否是句子A的下文。通过使用预先训练的BERT模型得到所述输入信息中每个文本字符对应的特征向量，由于BERT模型融合全文语义信息的能力，可以提高每个文本字符对应的特征向量的准确率，从而提高问题答案的准确率。Exemplarily, the training task of the BERT model may include a masked language model (Masked Language Model, MLM) task and/or a next sentence prediction (Next Sentence Prediction, NSP) task, wherein the MLM task is used for training text. The preset proportion of words in the segment is masked and the words to be masked are predicted. The above preset proportion can be set reasonably according to the actual situation, for example, the above preset proportion can be 15%, 20%, etc.; the The NSP task is used to predict sentence-pair relations, such as judging whether sentence B is the context of sentence A. By using the pre-trained BERT model to obtain the feature vector corresponding to each text character in the input information, due to the ability of the BERT model to integrate full-text semantic information, the accuracy of the feature vector corresponding to each text character can be improved, thereby improving the answer to the question 's accuracy.

示例性的，所述将所述文本片段和所述查询问题进行拼接得到输入信息，并将所述输入信息输入预先训练的BERT模型，具体可以包括：通过查询字向量表将所述文本片段和所述查询问题中的每个字符转换为一维向量，将转换为一维向量后的文本片段和转换为一维向量后的查询问题进行拼接，得到输入信息，所述输入信息包括多个文本字符。例如，文本片段为文本w，通过查询字向量表将文本W中的各个字符转换为一维向量，如W＝[w₁，w₂，...，w_n]；查询问题为问题Q，通过查询字向量表将问题Q中的各个字符转换为一维向量，如Q＝[q₁，q₂，...，q_n]，将W＝[w₁，w₂，...，w_n]和Q＝[q₁，q₂，...，q_n]进行拼接得到输入信息，并将所述输入信息输入预先训练的BERT模型。Exemplarily, the input information obtained by splicing the text fragment and the query question, and inputting the input information into a pre-trained BERT model may specifically include: querying the word vector table to combine the text fragment and the query. Each character in the query question is converted into a one-dimensional vector, and the text segment converted into a one-dimensional vector and the query question after being converted into a one-dimensional vector are spliced to obtain input information, and the input information includes a plurality of texts character. For example, if the text fragment is text w, each character in the text W is converted into a one-dimensional vector by querying the word vector table, such as W=[w₁ , w₂ ,...,_wn ]; the query question is question Q, Convert each character in question Q into a one-dimensional vector by querying the word vector table, such as Q=[q₁ , q₂ ,..., q_n ], W=[w₁ , w₂ ,..., w_n ] and Q=[q₁ ,_q₂ , .

示例性的，可在查询问题前面添加专用分类记号，如CLS记号，进行标记；将查询问题和文本段落拼接在一起时，中间使用专用记号，如SEP记号，进行区别标记，如[CLS q₁，q₂，...，q_n SEP w₁，w₂，...，w_n]。Exemplarily, special classification marks, such as CLS marks, can be added in front of the query questions for marking; when splicing query questions and text paragraphs together, special marks, such as SEP marks, are used in the middle for distinguishing marks, such as [CLS q₁ , q₂ , ..., q_n SEP w₁ , w₂ , ..., w_n ].

示例性的，预先训练的BERT模型根据所述输入信息确定所述输入信息对应的全文语义信息，并根据所述全文语义信息对所述输入信息中各字符的向量进行处理，得到所述输入信息中各字符融合全文语义信息后对应的向量表示，即得到输入信息中每个文本字符对应的特征向量，如特征向量V＝[v₁，v₂，...，v_m]。Exemplarily, the pre-trained BERT model determines full-text semantic information corresponding to the input information according to the input information, and processes the vector of each character in the input information according to the full-text semantic information to obtain the input information After each character in the input information is fused with the full text semantic information, the corresponding vector representation is obtained, that is, the feature vector corresponding to each text character in the input information is obtained, such as the feature vector V=[v₁ , v₂ , . . . , v_m ].

通过将文本片段和查询问题进行拼接得到输入信息，实现BERT模型的单输入，可提高BERT模型的处理速度，从而提高问题答案生成的速率。同时将文本片段和查询问题进行拼接可以使得特征之间进行更全面的交互，提高BERT模型确定全文语义信息的准确率，从而提高问题答案生成的准确率。The input information is obtained by splicing text fragments and query questions to realize single input of the BERT model, which can improve the processing speed of the BERT model, thereby increasing the rate of question answer generation. At the same time, splicing text fragments and query questions can make more comprehensive interactions between features, improve the accuracy of BERT model in determining full-text semantic information, and thus improve the accuracy of question answer generation.

在一实施方式中，所述将所述输入信息输入预先训练的BERT模型得到所述输入信息中每个文本字符对应的特征向量之前，还包括：In one embodiment, before the input information is input into the pre-trained BERT model to obtain the feature vector corresponding to each text character in the input information, the method further includes:

确定所述输入信息中每个文字字符对应的初始向量、类别向量和位置向量，其中，所述类别向量用于表示所述每个文字字符对应的内容对象，所述位置向量用于表示所述每个文字字符在所述输入信息中的相对位置；Determine an initial vector, a category vector, and a position vector corresponding to each character character in the input information, wherein the category vector is used to represent the content object corresponding to each character character, and the position vector is used to represent the the relative position of each text character in the input information;

分别将所述输入信息中每个文字字符对应的初始向量、类别向量和位置向量进行叠加，得到所述每个文字字符对应的目标向量；The initial vector, the category vector and the position vector corresponding to each character character in the input information are respectively superimposed to obtain the target vector corresponding to each character character;

所述将所述输入信息输入预先训练的BERT模型得到所述输入信息中每个文本字符对应的特征向量，包括：将所述每个文字字符对应的目标向量输入预先训练的BERT模型，得到所述输入信息中每个文字字符对应的特征向量。The inputting the input information into the pre-trained BERT model to obtain the feature vector corresponding to each text character in the input information includes: inputting the target vector corresponding to each text character into the pre-trained BERT model to obtain the The feature vector corresponding to each text character in the input information.

输入信息中的每个文字字符可以称为一个标记(Token)，上述每个文字字符对应的初始向量也可以称为词嵌入(Token embedding)，可以是指每个文本字符的初始化的向量。上述每个文本字符对应的类别向量也可以称为分割嵌入(Segment embedding)，用于表示每个文字字符对应的内容对象，可用于区分所述输入信息中的查询问题和文本片段，例如文本字符A为所述查询问题中的文本字符，文本字符B为所述文本片段中的文本字符，文本字符A的类别向量是0，文本字符B的类别向量是1。上述每个文本字符对应的位置向量也可以称为位置嵌入(Position Embedding)，用于表示所述每个文本字符在所述输入信息中的相对位置。Each text character in the input information can be called a token (Token), and the initial vector corresponding to each text character above can also be called a word embedding (Token embedding), which can refer to the initialization vector of each text character. The category vector corresponding to each text character above can also be called segment embedding, which is used to represent the content object corresponding to each text character, and can be used to distinguish query questions and text fragments in the input information, such as text characters. A is a text character in the query question, text character B is a text character in the text segment, the category vector of text character A is 0, and the category vector of text character B is 1. The above-mentioned position vector corresponding to each text character may also be called position embedding (Position Embedding), which is used to represent the relative position of each text character in the input information.

通过将所述文本片段和所述查询问题进行拼接得到输入信息实现BERT模型的单输入，可提高BERT模型的处理速度，从而提高问题答案生成的速率。同时将文本片段和查询问题进行拼接可以使得特征之间进行更全面的交互，提高BERT模型确定全文语义信息的准确率，从而提高每个文本字符对应的特征向量的准确率，进一步提高问题答案生成的准确率。The single input of the BERT model is realized by splicing the text fragment and the query question to obtain input information, which can improve the processing speed of the BERT model, thereby increasing the rate of generating the answer to the question. At the same time, splicing text fragments and query questions can make more comprehensive interactions between features, improve the accuracy of BERT model in determining full-text semantic information, thus improve the accuracy of feature vectors corresponding to each text character, and further improve the generation of question answers. 's accuracy.

S15、根据预设的第一概率计算公式和所述输入信息中每个文本字符对应的特征向量，计算每个文本字符对应的起始概率，并根据预设的第二概率计算公式和输入信息中每个文本字符对应的特征向量，计算每个文本字符对应的结束概率。S15. Calculate the starting probability corresponding to each text character according to the preset first probability calculation formula and the feature vector corresponding to each text character in the input information, and calculate the starting probability corresponding to each text character according to the preset second probability calculation formula and the input information The feature vector corresponding to each text character in , and the end probability corresponding to each text character is calculated.

示例性的，所述第一概率计算公式为：Exemplarily, the first probability calculation formula is:

其中，v_s是BERT模型的起始概率参数，v_i是文本片段中第i个字符语义融合后对应的特征向量，v_j是文本片段中全部字符进行语义融合后特征向量的平均值，m是BERT模型允许输入的最大文本长度。Among them, v_s is the initial probability parameter of the BERT model, vi is the feature vector corresponding to the ith character in the text segment after semantic fusion, v_j is the average value of the feature vectors after semantic fusion of all characters in the text segment,_m is the maximum text length allowed by the BERT model.

根据所述第一概率计算公式和所述输入信息中每个文本字符语义融合后对应的特征向量，计算所述文本片段中每个文本字符对应的起始概率，即问题答案的起始概率。例如，文本片段W＝[w₁，w₂，...，w_n]，查询问题Q＝[q₁，q₂，...，q_n]，将查询问题Q和文本片段W进行拼接得到输入信息V，通过BERT模型进行语义融合，得到输入信息V中每个文本字符语义融合后对应的特征向量，如V＝[v₁，v₂，...，v_m]，根据输入信息V中每个文本字符语义融合后对应的特征向量计算所述第一概率计算公式中需要的数据，并将得到的数据带入所述第一概率计算公式计算文本片段W中每个文本字符对应的起始概率，如计算W1对应的起始概率、W2对应的起始概率...Wn对应的起始概率。According to the first probability calculation formula and the feature vector corresponding to the semantic fusion of each text character in the input information, the starting probability corresponding to each text character in the text segment is calculated, that is, the starting probability of the answer to the question. For example, the text segment W=[w₁ , w₂ ,...,_wn ], the query question Q=[q₁ , q₂ ,..., q_n ], the query question Q and the text segment W are concatenated Obtain the input information V, perform semantic fusion through the BERT model, and obtain the corresponding feature vector after semantic fusion of each text character in the input information V, such as V=[v₁ , v₂ , ..., v_m ], according to the input information The feature vector corresponding to each text character in V after semantic fusion calculates the data required in the first probability calculation formula, and brings the obtained data into the first probability calculation formula to calculate the corresponding value of each text character in the text segment W The starting probability of , such as calculating the starting probability corresponding to W1, the starting probability corresponding to W2...the starting probability corresponding to Wn.

示例性的，所述第二概率计算公式为：Exemplarily, the second probability calculation formula is:

其中，v_e是BERT模型的结束概率参数，v_i是文本片段中第i个字符语义融合后对应的特征向量，v_j是文本片段中全部字符进行语义融合后特征向量的平均值，m是BERT模型允许输入的最大文本长度。Among them, v_e is the end probability parameter of the BERT model, v_i is the feature vector corresponding to the ith character in the text segment after semantic fusion, v_j is the average value of the feature vectors after semantic fusion of all characters in the text segment, m is The maximum text length that the BERT model allows for input.

根据所述第二概率计算公式和所述输入信息中每个文本字符语义融合后对应的特征向量，计算所述文本片段中每个文本字符对应的结束概率，即问题答案的结束概率。例如，文本片段W＝[w₁，w₂，...，w_n]，查询问题Q＝[q₁，q₂，...，q_n]，将查询问题Q和文本片段W进行拼接得到输入信息V，通过BERT模型进行语义融合，得到输入信息V中每个文本字符语义融合后对应的特征向量，如V＝[v₁，v₂，...，v_m]，根据输入信息V中每个文本字符语义融合后对应的特征向量计算所述第二概率计算公式中需要的数据，并将得到的数据带入所述第二概率计算公式计算文本片段W中每个文本字符对应的结束概率，如计算W1对应的结束概率、W2对应的结束概率...Wn对应的结束概率。According to the second probability calculation formula and the feature vector corresponding to the semantic fusion of each text character in the input information, the end probability corresponding to each text character in the text segment is calculated, that is, the end probability of the answer to the question. For example, the text segment W=[w₁ , w₂ ,...,_wn ], the query question Q=[q₁ , q₂ ,..., q_n ], the query question Q and the text segment W are concatenated Obtain the input information V, perform semantic fusion through the BERT model, and obtain the corresponding feature vector after semantic fusion of each text character in the input information V, such as V=[v₁ , v₂ , ..., v_m ], according to the input information After the semantic fusion of each text character in V, the corresponding feature vector calculates the data required in the second probability calculation formula, and brings the obtained data into the second probability calculation formula to calculate the corresponding value of each text character in the text segment W The ending probability of , such as calculating the ending probability corresponding to W1, the ending probability corresponding to W2...the ending probability corresponding to Wn.

S16、根据预设的第三概率计算公式计算所述输入信息的错误概率。S16. Calculate the error probability of the input information according to a preset third probability calculation formula.

示例性的，所述第三概率计算公式为：Exemplarily, the third probability calculation formula is:

P_N＝σ(w*v_cls+b)P_N =σ(w*v_cls +b)

其中，σ是sigmoid函数，w是BERT模型中可学习的权值矩阵，b是BERT模型的错误概率参数，v_cls是输入信息中CLS记号对应的特征向量。Among them, σ is the sigmoid function, w is the learnable weight matrix in the BERT model, b is the error probability parameter of the BERT model, and v_cls is the feature vector corresponding to the CLS notation in the input information.

获取输入信息中每个文本字符语义融合后对应的特征向量后，将输入信息V中CLS记号对应的特征向量，代入所述第三概率计算公式计算所述输入信息的错误概率，即计算所述输入信息中所述文本片段中不包含问题答案的概率。After obtaining the feature vector corresponding to the semantic fusion of each text character in the input information, the feature vector corresponding to the CLS symbol in the input information V is substituted into the third probability calculation formula to calculate the error probability of the input information, that is, calculating the The probability that an answer to the question is not contained in the text fragment described in the input message.

S17、当所述错误概率不超过预设的错误概率阈值时，根据多个所述文本字符对应的起始概率和多个所述文本字符对应的结束概率在所述文本片段中确定所述查询问题对应的问题答案。S17. When the error probability does not exceed a preset error probability threshold, determine the query in the text segment according to the start probability corresponding to a plurality of the text characters and the end probability corresponding to the plurality of text characters The corresponding answer to the question.

示例性的，所述错误概率阈值设为0.5，当所述错误概率的值小于或者等于0.5时，确定所述文本片段中存在所述查询问题对应的问题答案，并根据所述多个文本字符对应的起始概率和所述多个文本字符对应的结束概率在所述文本片段中确定所述查询问题对应的问题答案。通过分别计算每个文本字符对应的起始概率和其他所有文本字符对应的结束概率的乘积数值，确定最大的乘积数值，将该最大的乘积数值对应的两个文本字符之间的文本确定为所述查询问题对应的问题答案。例如，文本片段中包括文本字符[w₁，w₂，...，w_n]，计算文本片段中每个文本字符对应的起始概率和其他所有文本字符对应的结束概率的乘积数值，如计算文本字符W1的起始概率与文本字符W2、W3....Wn的结束概率的乘积数值，计算文本字符W2的起始概率与文本字符W3、W4....Wn的结束概率的乘积数值，计算文本字符W3的起始概率与文本字符W4....Wn的结束概率的乘积数值，确定所述乘积数值中的最大值，并将该最大的乘积数值对应的两个文本字符之间的文本确定为所述查询问题对应的问题答案。当文本字符W2的起始概率和文本字符W6的结束概率的乘积数值为最大值时，将文本字符W2和文本字符W6之间的文本确定为查询问题对应的问题答案。Exemplarily, the error probability threshold is set to 0.5, and when the value of the error probability is less than or equal to 0.5, it is determined that there is a question answer corresponding to the query question in the text segment, and according to the plurality of text characters The corresponding starting probability and the ending probability corresponding to the plurality of text characters determine the question answer corresponding to the query question in the text segment. By calculating the product value of the start probability corresponding to each text character and the end probability corresponding to all other text characters respectively, the maximum product value is determined, and the text between the two text characters corresponding to the maximum product value is determined as all the text characters. The answer to the question corresponding to the query question. For example, if the text segment includes text characters [w₁ , w₂ , ..., w_n ], calculate the product value of the start probability corresponding to each text character in the text segment and the end probability corresponding to all other text characters, such as Calculate the value of the product of the start probability of text character W1 and the end probability of text characters W2, W3....Wn, and calculate the product of the start probability of text character W2 and the end probability of text characters W3, W4....Wn Numerical value, calculate the value of the product of the starting probability of the text character W3 and the ending probability of the text character W4....Wn, determine the maximum value in the product value, and calculate the maximum value of the product value corresponding to the two text characters. The text in between is determined as the question answer corresponding to the query question. When the product value of the start probability of the text character W2 and the end probability of the text character W6 is the maximum value, the text between the text character W2 and the text character W6 is determined as the question answer corresponding to the query question.

示例性的，当所述错误概率超过预设的错误概率阈值时，可按照预设的提示规则生成提示指令。例如，所述错误概率阈值设为0.5，当所述错误概率的值大于0.5时，确定所述文本片段中不存在所述查询问题对应的问题答案，按照预设的提示规则生成提示指令。Exemplarily, when the error probability exceeds a preset error probability threshold, a prompt instruction may be generated according to a preset prompt rule. For example, the error probability threshold is set to 0.5. When the error probability value is greater than 0.5, it is determined that there is no question answer corresponding to the query question in the text segment, and a prompt instruction is generated according to a preset prompt rule.

通过预设的错误概率阈值，在错误概率超过预设的错误概率阈值时确定文本片段中不包括问题答案，可避免在不包括问题答案的文本片段中确定出错误的问题答案，从而提高问题答案的准确率。Through the preset error probability threshold, when the error probability exceeds the preset error probability threshold, it is determined that the question answer is not included in the text segment, which can avoid determining the wrong question answer in the text segment that does not include the question answer, thereby improving the question answer. 's accuracy.

在一实施方式中，所述在所述文本片段中确定所述查询问题对应的问题答案之后，所述方法还包括：In one embodiment, after the question answer corresponding to the query question is determined in the text segment, the method further includes:

获取所述查询问题对应的机密级别，以及获取用户的用户级别；Obtain the confidentiality level corresponding to the query question, and obtain the user level of the user;

判断所述用户级别与所述机密级别是否匹配；judging whether the user level matches the secret level;

当所述用户级别与所述机密级别匹配时，输出所述查询问题对应的问题答案。When the user level matches the confidentiality level, a question answer corresponding to the query question is output.

核实当前用户的用户身份，确定用户级别，将用户级别与当前查询问题的机密级别进行匹配，当所述用户级别与所述机密级别匹配时，输出所述查询问题对应的问题答案，当所述用户级别与所述机密级别不匹配时，不输出所述查询问题对应的问题答案。通过将机密级别与用户级别进行匹配判断，可以对当前用户进行身份校验，如果匹配，表明当前用户属于用于查询问题对应的搜索权限。通过这种用户身份的校验，可以确保信息的安全性。Verify the user identity of the current user, determine the user level, match the user level with the confidentiality level of the current query question, when the user level matches the confidentiality level, output the question answer corresponding to the query question, when the When the user level does not match the confidentiality level, the question answer corresponding to the query question is not output. By judging the secret level and the user level, the identity of the current user can be verified. If it matches, it indicates that the current user belongs to the search authority corresponding to the query question. Through the verification of the user identity, the security of the information can be ensured.

上述实施例提供的基于人工智能的答案生成方法，通过根据数据信息的信息类型对数据信息进行处理得到文本片段，如采用与所述数据信息的信息类型对应的信息识别模型识别所述数据信息得到文本，对文本进行切分得到文本片段，其中数据信息的信息类型有多种，不局限于文字类型，提高问答系统的使用范围，同时对数据信息进行预处理可以得到更准确的文本片段，从而提高问题答案的准确率；接着将获取的查询信息和预设的问题模板生成所述查询信息对应的查询问题，避免因查询信息不清晰影响问题答案生成的情况发生，从而提高问题答案的准确率；将所述文本片段和所述查询问题进行拼接得到输入信息，并将所述输入信息输入预先训练的BERT模型得到所述输入信息中每个文本字符对应的特征向量，通过将所述文本片段和所述查询问题进行拼接得到输入信息实现BERT模型的单输入，可提高BERT模型的处理速度，从而提高问题答案生成的速率，同时将文本片段和查询问题进行拼接可以使得特征之间进行更全面的交互，提高BERT模型确定全文语义信息的准确率，从而提高每个文本字符对应的特征向量的准确率，进一步提高问题答案生成的准确率；然后根据预设的第一概率计算公式和所述输入信息中每个文本字符对应的特征向量，计算每个文本字符对应的起始概率；并根据预设的第二概率计算公式和输入信息中每个文本字符对应的特征向量，计算每个文本字符对应的结束概率；根据预设的第三概率计算公式计算所述输入信息的错误概率；最后当所述错误概率不超过预设的错误概率阈值时，根据所述多个文本字符对应的起始概率和所述多个文本字符对应的结束概率在所述文本片段中确定所述查询问题对应的问题答案，通过预设的错误概率阈值，在错误概率超过预设的错误概率阈值时确定文本片段中不包括问题答案，可避免在不包括问题答案的文本片段中确定出错误的问题答案，从而提高了问答系统回答问题的准确率。The artificial intelligence-based answer generation method provided by the above embodiment obtains text fragments by processing data information according to the information type of the data information, such as identifying the data information by using an information recognition model corresponding to the information type of the data information. Text, segment the text to obtain text fragments, in which there are various types of data information, not limited to text types, improve the use of the question answering system, and preprocess the data information to obtain more accurate text fragments, thus Improve the accuracy of the question answer; then generate the query question corresponding to the query information from the acquired query information and the preset question template, so as to avoid the occurrence of the problem that the generation of the question answer is affected by the unclear query information, thereby improving the accuracy of the question answer ; The text fragment and the query question are spliced to obtain input information, and the input information is input into a pre-trained BERT model to obtain the feature vector corresponding to each text character in the input information, by combining the text fragment The input information is obtained by splicing with the query question to realize the single input of the BERT model, which can improve the processing speed of the BERT model, thereby improving the generation rate of the question answer. At the same time, splicing the text fragment and the query question can make the features more comprehensive interaction, improve the accuracy of the BERT model in determining the full-text semantic information, thereby improving the accuracy of the feature vector corresponding to each text character, and further improving the accuracy of the question answer generation; then according to the preset first probability calculation formula and the described The feature vector corresponding to each text character in the input information is calculated, and the starting probability corresponding to each text character is calculated; and each text is calculated according to the preset second probability calculation formula and the feature vector corresponding to each text character in the input information. The ending probability corresponding to the characters; calculating the error probability of the input information according to the preset third probability calculation formula; finally, when the error probability does not exceed the preset error probability threshold, according to the starting probability corresponding to the plurality of text characters The starting probability and the ending probability corresponding to the plurality of text characters are used to determine the question answer corresponding to the query question in the text segment, and the text is determined by the preset error probability threshold when the error probability exceeds the preset error probability threshold. The question answer is not included in the segment, which can avoid determining the wrong question answer in the text segment that does not include the question answer, thereby improving the question answering accuracy of the question answering system.

图2是本申请实施例二提供的基于人工智能的答案生成装置的示意性框图，该答案生成装置用于执行前述的基于人工智能的答案生成方法。其中，该答案生成装置可以配置于服务器或终端中。FIG. 2 is a schematic block diagram of an apparatus for generating an answer based on artificial intelligence provided in Embodiment 2 of the present application, and the apparatus for generating an answer is configured to execute the aforementioned method for generating an answer based on artificial intelligence. Wherein, the answer generating apparatus may be configured in a server or a terminal.

其中，服务器可以为独立的服务器，也可以为服务器集群。该终端可以是手机、平板电脑、笔记本电脑、台式电脑、个人数字助理和穿戴式设备等电子设备。The server may be an independent server or a server cluster. The terminal may be an electronic device such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant and a wearable device.

如图2所示，基于人工智能的答案生成装置20包括：文本生成模块201、切分处理模块202、问题生成模块203、向量确定模块204、概率计算模块205和答案生成模块206。As shown in FIG. 2 , the artificial intelligence-based answer generation device 20 includes: atext generation module 201 , asegmentation processing module 202 , a question generation module 203 , a vector determination module 204 , a probability calculation module 205 and an answer generation module 206 .

所述文本生成模块201，用于获取数据信息，并使用与所述数据信息的信息类型对应的信息识别模型识别所述数据信息得到文本；Thetext generation module 201 is configured to acquire data information, and use an information identification model corresponding to the information type of the data information to identify the data information to obtain text;

所述切分处理模块202，用于当所述文本的长度大于预设长度阈值时，根据预设的滑动窗口对所述文本进行切分处理，得到多个文本片段；Thesegmentation processing module 202 is configured to perform segmentation processing on the text according to a preset sliding window when the length of the text is greater than a preset length threshold to obtain multiple text fragments;

所述问题生成模块203，用于接收查询信息，并根据所述查询信息和预设的问题模板生成所述查询信息对应的查询问题；The question generation module 203 is configured to receive query information, and generate a query question corresponding to the query information according to the query information and a preset question template;

所述向量确定模块204，用于将每个所述文本片段和所述查询问题进行拼接得到输入信息，并将所述输入信息输入预先训练的BERT模型得到所述输入信息中每个文本字符对应的特征向量；The vector determination module 204 is used for splicing each of the text fragments and the query question to obtain input information, and inputting the input information into a pre-trained BERT model to obtain the corresponding correspondence of each text character in the input information. eigenvector of ;

所述概率计算模块205，用于根据预设的第一概率计算公式和所述输入信息中每个文本字符对应的特征向量，计算每个文本字符对应的起始概率，并根据预设的第二概率计算公式和输入信息中每个文本字符对应的特征向量，计算每个文本字符对应的结束概率；The probability calculation module 205 is configured to calculate the starting probability corresponding to each text character according to the preset first probability calculation formula and the feature vector corresponding to each text character in the input information, and calculate the corresponding starting probability according to the preset first probability calculation formula. The second probability calculation formula and the feature vector corresponding to each text character in the input information, calculate the end probability corresponding to each text character;

所述概率计算模块205，还用于根据预设的第三概率计算公式计算所述输入信息的错误概率；The probability calculation module 205 is further configured to calculate the error probability of the input information according to a preset third probability calculation formula;

所述答案生成模块206，用于当所述错误概率不超过预设的错误概率阈值时，根据多个所述文本字符对应的起始概率和多个所述文本字符对应的结束概率在所述文本片段中确定所述查询问题对应的问题答案。The answer generation module 206 is configured to, when the error probability does not exceed a preset error probability threshold, generate a value in the A question answer corresponding to the query question is determined in the text fragment.

需要说明的是，所属领域的技术人员可以清楚地了解到，为了描述的方便和简洁，上述描述的装置和各模块及单元的具体工作过程，可以参考前述基于人工智能的答案生成方法实施例中的对应过程，在此不再赘述。It should be noted that those skilled in the art can clearly understand that, for the convenience and brevity of the description, the specific working process of the above-described device and each module and unit can be referred to in the foregoing embodiments of the artificial intelligence-based answer generation method. The corresponding process is not repeated here.

上述实施例提供的基于人工智能的答案生成装置，通过根据数据信息的信息类型对数据信息进行处理得到文本片段，如采用与所述数据信息的信息类型对应的信息识别模型识别所述数据信息得到文本，对文本进行切分得到文本片段，其中数据信息的信息类型有多种，不局限于文字类型，提高问答系统的使用范围，同时对数据信息进行预处理可以得到更准确的文本片段，从而提高问题答案的准确率；接着将获取的查询信息和预设的问题模板生成所述查询信息对应的查询问题，避免因查询信息不清晰影响问题答案生成的情况发生，从而提高问题答案的准确率；将所述文本片段和所述查询问题进行拼接得到输入信息，并将所述输入信息输入预先训练的BERT模型得到所述输入信息中每个文本字符对应的特征向量，通过将所述文本片段和所述查询问题进行拼接得到输入信息实现BERT模型的单输入，可提高BERT模型的处理速度，从而提高问题答案生成的速率，同时将文本片段和查询问题进行拼接可以使得特征之间进行更全面的交互，提高BERT模型确定全文语义信息的准确率，从而提高每个文本字符对应的特征向量的准确率，进一步提高问题答案生成的准确率；然后根据预设的第一概率计算公式和所述输入信息中每个文本字符对应的特征向量，计算每个文本字符对应的起始概率；并根据预设的第二概率计算公式和输入信息中每个文本字符对应的特征向量，计算每个文本字符对应的结束概率；根据预设的第三概率计算公式计算所述输入信息的错误概率；最后当所述错误概率不超过预设的错误概率阈值时，根据所述多个文本字符对应的起始概率和所述多个文本字符对应的结束概率在所述文本片段中确定所述查询问题对应的问题答案，通过预设的错误概率阈值，在错误概率超过预设的错误概率阈值时确定文本片段中不包括问题答案，可避免在不包括问题答案的文本片段中确定出错误的问题答案，从而提高了问答系统回答问题的准确率。The artificial intelligence-based answer generation device provided by the above embodiment obtains text fragments by processing the data information according to the information type of the data information. Text, segment the text to obtain text fragments, in which there are various types of data information, not limited to text types, improve the use of the question answering system, and preprocess the data information to obtain more accurate text fragments, thus Improve the accuracy of the question answer; then generate the query question corresponding to the query information from the acquired query information and the preset question template, so as to avoid the occurrence of the problem that the generation of the question answer is affected by the unclear query information, thereby improving the accuracy of the question answer ; The text fragment and the query question are spliced to obtain input information, and the input information is input into a pre-trained BERT model to obtain the feature vector corresponding to each text character in the input information, by combining the text fragment The input information is obtained by splicing with the query question to realize the single input of the BERT model, which can improve the processing speed of the BERT model, thereby improving the generation rate of the question answer. At the same time, splicing the text fragment and the query question can make the features more comprehensive interaction, improve the accuracy of the BERT model in determining the full-text semantic information, thereby improving the accuracy of the feature vector corresponding to each text character, and further improving the accuracy of the question answer generation; then according to the preset first probability calculation formula and the described The feature vector corresponding to each text character in the input information is calculated, and the starting probability corresponding to each text character is calculated; and each text is calculated according to the preset second probability calculation formula and the feature vector corresponding to each text character in the input information. The ending probability corresponding to the characters; calculating the error probability of the input information according to the preset third probability calculation formula; finally, when the error probability does not exceed the preset error probability threshold, according to the starting probability corresponding to the plurality of text characters The starting probability and the ending probability corresponding to the plurality of text characters are used to determine the question answer corresponding to the query question in the text segment, and the text is determined by the preset error probability threshold when the error probability exceeds the preset error probability threshold. The question answer is not included in the segment, which can avoid determining the wrong question answer in the text segment that does not include the question answer, thereby improving the question answering accuracy of the question answering system.

上述实施例提供的答案生成装置可以实现为一种计算机程序的形式，该计算机程序可以在如图3所示的计算机设备上运行。The answer generating apparatus provided by the above embodiments may be implemented in the form of a computer program, and the computer program may be executed on the computer device as shown in FIG. 3 .

参阅图3所示，为本申请实施例三提供的计算机设备的结构示意图。该计算机设备可以为服务器或终端设备。Referring to FIG. 3 , it is a schematic structural diagram of a computer device according to Embodiment 3 of the present application. The computer device can be a server or a terminal device.

如图3所示，该计算机设备30包括通过系统总线连接的处理器301和存储器302，其中，存储器302可以包括非易失性存储介质和易失性存储介质。As shown in FIG. 3 , thecomputer device 30 includes aprocessor 301 and amemory 302 connected through a system bus, wherein thememory 302 may include a non-volatile storage medium and a volatile storage medium.

存储器302可存储操作系统和计算机程序。示例性的，所述计算机程序可以被分割成一个或多个模块/单元，所述一个或者多个模块/单元被存储在所述存储器302中，并由所述处理器301执行，以完成本申请。所述一个或多个模块/单元可以是能够完成特定功能的一系列计算机可读指令段，该指令段用于描述所述计算机程序在所述计算机设备中的执行过程。例如，所述计算机程序可以被分割成文本生成模块201、切分处理模块202、问题生成模块203、向量确定模块204、概率计算模块205、答案生成模块206。该计算机可读指令被执行时，可使得处理器301执行任意一项所述的基于人工智能的答案生成方法。Memory 302 may store operating systems and computer programs. Exemplarily, the computer program can be divided into one or more modules/units, and the one or more modules/units are stored in thememory 302 and executed by theprocessor 301 to complete the present invention. Application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program in the computer device. For example, the computer program can be divided into atext generation module 201 , asegmentation processing module 202 , a question generation module 203 , a vector determination module 204 , a probability calculation module 205 , and an answer generation module 206 . When executed, the computer-readable instructions can cause theprocessor 301 to execute any one of the artificial intelligence-based answer generation methods.

处理器301用于提供计算和控制能力，支撑整个计算机设备的运行。Theprocessor 301 is used to provide computing and control capabilities and support the operation of the entire computer equipment.

在一可行实施例中，所述计算机设备还包括网络接口，所述网络接口用于进行网络通信，如发送分配的任务等。本领域技术人员可以理解，图3中示出的结构，仅仅是与本申请方案相关的部分结构的框图，并不构成对本申请方案所应用于其上的计算机设备的限定，具体的计算机设备可以包括比图中所示更多或更少的部件，或者组合某些部件，或者具有不同的部件布置。In a feasible embodiment, the computer device further includes a network interface, and the network interface is used for network communication, such as sending assigned tasks and the like. Those skilled in the art can understand that the structure shown in FIG. 3 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer equipment to which the solution of the present application is applied. Include more or fewer components than shown in the figures, or combine certain components, or have a different arrangement of components.

应当理解的是，处理器301是中央处理单元(Central Processing Unit，CPU)，该处理器还可以是其他通用处理器、数字信号处理器(Digital Signal Processor，DSP)、专用集成电路(Application Specific Integrated Circuit，ASIC)、现场可编程门阵列(Field-Programmable Gate Array，FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。其中，通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。It should be understood that theprocessor 301 is a central processing unit (Central Processing Unit, CPU), and the processor may also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSP), application specific integrated circuits (Application Specific Integrated circuits) Circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. Wherein, the general-purpose processor can be a microprocessor or the processor can also be any conventional processor or the like.

其中，在一个实施例中，所述处理器执行存储在存储器中的计算机可读指令，以实现如下步骤：Wherein, in one embodiment, the processor executes computer-readable instructions stored in the memory to implement the following steps:

根据预设的第一概率计算公式和所述输入信息中每个文本字符对应的特征向量，计算每个文本字符对应的起始概率，并根据预设的第二概率计算公式和输入信息中每个文本字符对应的特征向量，计算每个文本字符对应的结束概率；Calculate the starting probability corresponding to each text character according to the preset first probability calculation formula and the feature vector corresponding to each text character in the input information, and calculate the starting probability corresponding to each text character according to the preset second probability calculation formula and each text character in the input information. feature vectors corresponding to each text character, and calculate the end probability corresponding to each text character;

具体地，所述处理器对上述计算机可读指令的具体实现方法可参考前述基于人工智能的答案生成方法实施例中相关步骤的描述，在此不赘述。Specifically, for the specific implementation method of the computer-readable instruction by the processor, reference may be made to the description of the relevant steps in the foregoing embodiment of the answer generation method based on artificial intelligence, which is not repeated here.

本申请实施例四还提供一种计算机可读存储介质，所述计算机可读存储介质上存储计算机可读指令，所述计算机可读指令被执行时实现以下步骤：Embodiment 4 of the present application further provides a computer-readable storage medium, where computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed, the following steps are implemented:

具体地，上述计算机可读指令被所述处理器执行时的具体实现方法可参考前述基于人工智能的答案生成方法实施例中相关步骤的描述，在此不赘述。Specifically, for a specific implementation method when the computer-readable instruction is executed by the processor, reference may be made to the description of the relevant steps in the foregoing embodiments of the answer generation method based on artificial intelligence, and details are not described herein.

其中，所述计算机可读存储介质可以是前述实施例所述的计算机设备的内部存储单元，例如所述计算机设备的硬盘或内存。所述计算机可读存储介质也可以是所述计算机设备的外部存储设备，例如所述计算机设备上配备的插接式硬盘，智能存储卡(Smart Media Card， SMC)，安全数字(Secure Digital，SD)卡，闪存卡(Flash Card)等。The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, such as a hard disk or a memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk equipped on the computer device, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) ) card, Flash Card, etc.

前述实施例提供的计算机设备及计算机可读存储介质，通过根据数据信息的信息类型对数据信息进行处理得到文本片段，如采用与所述数据信息的信息类型对应的信息识别模型识别所述数据信息得到文本，对文本进行切分得到文本片段，其中数据信息的信息类型有多种，不局限于文字类型，提高问答系统的使用范围，同时对数据信息进行预处理可以得到更准确的文本片段，从而提高问题答案的准确率；接着将获取的查询信息和预设的问题模板生成所述查询信息对应的查询问题，避免因查询信息不清晰影响问题答案生成的情况发生，从而提高问题答案的准确率；将所述文本片段和所述查询问题进行拼接得到输入信息，并将所述输入信息输入预先训练的BERT模型得到所述输入信息中每个文本字符对应的特征向量，通过将所述文本片段和所述查询问题进行拼接得到输入信息实现BERT模型的单输入，可提高BERT模型的处理速度，从而提高问题答案生成的速率，同时将文本片段和查询问题进行拼接可以使得特征之间进行更全面的交互，提高BERT模型确定全文语义信息的准确率，从而提高每个文本字符对应的特征向量的准确率，进一步提高问题答案生成的准确率；然后根据预设的第一概率计算公式和所述输入信息中每个文本字符对应的特征向量，计算每个文本字符对应的起始概率；并根据预设的第二概率计算公式和输入信息中每个文本字符对应的特征向量，计算每个文本字符对应的结束概率；根据预设的第三概率计算公式计算所述输入信息的错误概率；最后当所述错误概率不超过预设的错误概率阈值时，根据所述多个文本字符对应的起始概率和所述多个文本字符对应的结束概率在所述文本片段中确定所述查询问题对应的问题答案，通过预设的错误概率阈值，在错误概率超过预设的错误概率阈值时确定文本片段中不包括问题答案，可避免在不包括问题答案的文本片段中确定出错误的问题答案，从而提高了问答系统回答问题的准确率。The computer device and the computer-readable storage medium provided by the foregoing embodiments obtain text fragments by processing the data information according to the information type of the data information, such as identifying the data information by using an information identification model corresponding to the information type of the data information. Obtain text, segment the text to obtain text fragments, there are various types of data information, not limited to text types, improve the use range of the question answering system, and preprocess the data information to obtain more accurate text fragments, Thereby, the accuracy of the question answer is improved; then the obtained query information and the preset question template are used to generate the query question corresponding to the query information, so as to avoid the occurrence of the problem that the generation of the question answer is affected by the unclear query information, thereby improving the accuracy of the question answer. The input information is obtained by splicing the text fragment and the query question, and the input information is input into the pre-trained BERT model to obtain the feature vector corresponding to each text character in the input information. The fragment and the query question are spliced to obtain the input information to realize the single input of the BERT model, which can improve the processing speed of the BERT model, thereby increasing the rate of generating the answer to the question. Comprehensive interaction improves the accuracy of the BERT model in determining the full-text semantic information, thereby improving the accuracy of the feature vector corresponding to each text character, and further improving the accuracy of the question answer generation; then calculate the formula according to the preset first probability and all Describe the feature vector corresponding to each text character in the input information, and calculate the initial probability corresponding to each text character; and according to the preset second probability calculation formula and the feature vector corresponding to each text character in the input information, calculate each The end probability corresponding to the text characters; the error probability of the input information is calculated according to the preset third probability calculation formula; finally, when the error probability does not exceed the preset error probability threshold, the error probability corresponding to the plurality of text characters is calculated. The starting probability and the ending probability corresponding to the plurality of text characters are determined in the text segment to determine the answer to the question corresponding to the query question, and a preset error probability threshold is used to determine when the error probability exceeds the preset error probability threshold. The question answer is not included in the text segment, which can avoid determining the wrong question answer in the text segment that does not include the question answer, thereby improving the question answering accuracy of the question answering system.

本申请中所指区块链是分布式数据存储、点对点传输、共识机制、加密算法等计算机技术的新型应用模式。区块链(Blockchain)，本质上是一个去中心化的数据库，是一串使用密码学方法相关联产生的数据块，每一个数据块中包含了一批次网络交易的信息，用于验证其信息的有效性(防伪)和生成下一个区块。区块链可以包括区块链底层平台、平台产品服务层以及应用服务层等。The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm. Blockchain, essentially a decentralized database, is a series of data blocks associated with cryptographic methods. Each data block contains a batch of network transaction information to verify its Validity of information (anti-counterfeiting) and generation of the next block. The blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

还应当理解，在本申请说明书和所附权利要求书中使用的术语“和/或”是指相关联列出的项中的一个或多个的任何组合以及所有可能组合，并且包括这些组合。需要说明的是，在本文中，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者系统不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者系统所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括该要素的过程、方法、物品或者系统中还存在另外的相同要素。It will also be understood that, as used in this specification and the appended claims, the term "and/or" refers to and including any and all possible combinations of one or more of the associated listed items. It should be noted that, herein, the terms "comprising", "comprising" or any other variation thereof are intended to encompass non-exclusive inclusion, such that a process, method, article or system comprising a series of elements includes not only those elements, It also includes other elements not expressly listed or inherent to such a process, method, article or system. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article or system that includes the element.

上述本申请实施例序号仅仅为了描述，不代表实施例的优劣。以上所述，仅为本申请的具体实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到各种等效的修改或替换，这些修改或替换都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以权利要求的保护范围为准。The above-mentioned serial numbers of the embodiments of the present application are only for description, and do not represent the advantages or disadvantages of the embodiments. The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily think of various equivalents within the technical scope disclosed in the present application. Modifications or substitutions shall be covered by the protection scope of this application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.