CN111553151A

Info

Publication number: CN111553151A
Application number: CN202010255040.6A
Authority: CN
Inventors: 赵亮
Original assignee: OneConnect Financial Technology Co Ltd Shanghai
Current assignee: OneConnect Financial Technology Co Ltd Shanghai
Priority date: 2020-04-02
Filing date: 2020-04-02
Publication date: 2020-08-18
Also published as: WO2021196934A1

Abstract

Translated from Chinese

This application is applicable to the field of computer technology and proposes a question recommendation method, device, storage medium and server based on field similarity calculation. The question recommendation method includes: acquiring an input first question sentence; performing word segmentation on the first question sentence and extracting the fields it contains; comparing each extracted field, one by one, with the fields in a pre-built field data table, and determining any field that appears both among the extracted fields and in the field data table as a target field; calculating the similarity between the target field and each other field in the field data table; and selecting the other field with the highest similarity to replace the target field in the first question sentence, thereby obtaining a recommended second question sentence. This method can generate new questions that better match user expectations and improve the accuracy of the questions recommended by an intelligent question answering system.

Description

Technical Field

The present application belongs to the field of computer technology and, in particular, relates to a question recommendation method, device, storage medium and server based on field similarity calculation.

Background

A natural-language intelligent question answering system typically works as follows: the user inputs a question; the system applies natural language processing to the question and generates a query in a structured query language; it then uses that query to look up the answer in a database or knowledge base; and finally it returns the query result to the user.

At present, intelligent question answering systems recommend questions in two main ways. One is real-time recommendation, which recommends based on the question the user is currently typing; the other is similar-question recommendation. Real-time recommendation is usually triggered by keywords: for example, when the user types "by", the name of an enumerated field is recommended. Similar-question recommendation randomly replaces a keyword in the original question with another keyword of the same type to assemble a new question. However, the questions recommended in either way are often far from what the user expects, so the accuracy of question recommendation is low.

Summary of the Invention

In view of this, the present application proposes a question recommendation method, device, storage medium and server based on field similarity calculation, which can improve the accuracy of the questions recommended by an intelligent question answering system.

In a first aspect, an embodiment of the present application provides a question recommendation method based on field similarity calculation, including:

acquiring an input first question sentence;

performing word segmentation on the first question sentence and extracting the fields it contains;

comparing each of the fields, one by one, with the fields in a pre-built field data table, finding the fields that the extracted fields and the field data table have in common, and determining them as target fields;

calculating the similarity between the target field and each other field in the field data table except the target field;

selecting, from the other fields, the field with the highest similarity and using it to replace the target field in the first question sentence, to obtain a recommended second question sentence.

Further, the similarity between the target field and any one of the other fields in the field data table can be calculated by the following steps:

combining the character string and enumeration values of the target field with the character string and enumeration values of the other field to calculate similarity indices for the two fields, where a similarity index is a parameter that measures the degree of similarity between two fields;

calculating the similarity between the target field and the other field from these similarity indices.

Further, calculating the similarity indices of the target field and the other field may include:

calculating the string similarity index, the string length similarity index, the enumeration value count similarity index and the enumeration value length similarity index of the target field and the other field;

and calculating the similarity between the target field and the other field from the similarity indices may include:

calculating the average or weighted average of the string similarity index, the string length similarity index, the enumeration value count similarity index and the enumeration value length similarity index, and taking it as the similarity between the target field and the other field.

Further, the string similarity index can be calculated by the following formula:

[formula not reproduced in the source text]

where s₁ denotes the string similarity index, sim denotes the number of identical characters shared by the two fields, short denotes the string length of the shorter of the two fields, long denotes the string length of the longer of the two fields, and α is a hyperparameter that controls how strongly the string content influences the similarity;

the string length similarity index can be calculated by the following formula:

[formula not reproduced in the source text]

where s₂ denotes the string length similarity index, short denotes the string length of the shorter of the two fields, and long denotes the string length of the longer of the two fields;

the enumeration value count similarity index can be calculated by the following formula:

[formula not reproduced in the source text]

where s₃ denotes the enumeration value count similarity index, min denotes the number of enumeration values of the field that has fewer enumeration values, and max denotes the number of enumeration values of the field that has more enumeration values;

the enumeration value length similarity index can be calculated by the following formula:

[formula not reproduced in the source text]

where s₄ denotes the enumeration value length similarity index, avg_min denotes the average enumeration value length of the field whose enumeration values are shorter on average, and avg_max denotes the average enumeration value length of the field whose enumeration values are longer on average.

Further, calculating the similarity between the target field and each other field in the field data table except the target field may include:

retrieving all historical question sentences of the user who input the first question sentence;

constructing a co-occurrence matrix from the historical question sentences, where the co-occurrence matrix records the number of times any two fields in the field data table appear together in the same historical question sentence of the user;

calculating the similarity between the target field and each of the other fields from the co-occurrence matrix.

Further, determining the similarity between the target field and each of the other fields from the co-occurrence matrix may include:

extracting from the co-occurrence matrix the field vector of the target field and the field vector of each of the other fields, where each element of a field vector is the number of times the corresponding field appears together with one of the fields in the field data table in the same historical question sentence of the user;

calculating the cosine similarity between the field vector of the target field and the field vector of each of the other fields, to obtain the similarity between the target field and each of the other fields.
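As a minimal sketch of the cosine-similarity step, the following Python snippet treats each row of a co-occurrence matrix as a field vector. The counts are those listed in Table 2 of the detailed description; the English field labels, the helper names `cosine` and `rank_by_cosine`, and the choice to fill the undefined diagonal with 0 are assumptions made for illustration only.

```python
import math

# Field order used for every row and column (labels are illustrative).
FIELDS = ["occupation", "gender", "age", "income", "industry"]

# Co-occurrence counts from Table 2. The diagonal ("--" in the table)
# is set to 0 here, which is an assumption.
M = [
    [0, 18, 27, 22, 3],
    [18, 0, 2, 15, 5],
    [27, 2, 0, 30, 10],
    [22, 15, 30, 0, 21],
    [3, 5, 10, 21, 0],
]


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a field that never co-occurs has no direction
    return dot / (norm_u * norm_v)


def rank_by_cosine(target):
    """Return (field, similarity) pairs for every other field, best first."""
    t = FIELDS.index(target)
    scores = [
        (name, cosine(M[t], M[i]))
        for i, name in enumerate(FIELDS)
        if i != t
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_cosine("occupation")
```

With these counts and a zero diagonal, the top-ranked field for "occupation" is "industry". A different treatment of the diagonal could change the ranking, so that choice should be made deliberately.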

Further, after the co-occurrence matrix is constructed from the historical question sentences, the method may also include:

determining, from the co-occurrence matrix, the field in the field data table that appears together with the target field in the same historical question sentence of the user most often;

selecting that most frequently co-occurring field and using it to replace the target field in the first question sentence, to obtain a recommended third question sentence.
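The third-question variant can be sketched as follows. The counts are taken from the "occupation" row of Table 2 in the detailed description, and the example question is the one used later for the second question sentence; the function names `most_cooccurring` and `recommend_third_question` are hypothetical, and plain string replacement is an assumed way of performing the substitution.

```python
# Co-occurrence counts for the target field "职业" (occupation), taken from
# the "occupation" row of Table 2. Keys are the other fields in that table.
cooccurrence_with_target = {
    "性别": 18,            # gender
    "年龄": 27,            # age
    "个人税后月收入": 22,  # personal after-tax monthly income
    "行业": 3,             # industry
}


def most_cooccurring(counts):
    """Return the field with the highest co-occurrence count."""
    return max(counts, key=counts.get)


def recommend_third_question(question, target, counts):
    """Replace the target field with its most frequently co-occurring field."""
    return question.replace(target, most_cooccurring(counts))


# "What is the average income distribution of different occupations in Shanghai"
first = "上海不同职业的平均收入分布如何"
third = recommend_third_question(first, "职业", cooccurrence_with_target)
```

Here the most frequent partner of "职业" is "年龄" (age), so the sketch produces a question about income distribution across ages.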

In a second aspect, an embodiment of the present application provides a question recommendation device based on field similarity calculation, including:

a question acquisition module, configured to acquire an input first question sentence;

a word segmentation module, configured to perform word segmentation on the first question sentence and extract the fields it contains;

a field comparison module, configured to compare each of the fields, one by one, with the fields in a pre-built field data table, find the fields that the extracted fields and the field data table have in common, and determine them as target fields;

a field similarity calculation module, configured to calculate the similarity between the target field and each other field in the field data table except the target field;

a question recommendation module, configured to select, from the other fields, the field with the highest similarity and use it to replace the target field in the first question sentence, to obtain a recommended second question sentence.

In a third aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the question recommendation method proposed in the first aspect of the embodiments of the present application.

In a fourth aspect, an embodiment of the present application provides a server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the question recommendation method proposed in the first aspect of the embodiments of the present application.

In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the steps of the question recommendation method described in the first aspect above.

In the question recommendation method based on field similarity calculation proposed in this application, after the fields of the input question sentence are extracted, each field is compared one by one with the fields in the pre-built field data table, and any field that appears both among the extracted fields and in the field data table is determined as a target field. The similarity between the target field and each other field in the field data table is then calculated, and the field with the highest similarity replaces the target field in the question sentence, yielding the recommended question. Compared with the conventional approach of randomly replacing a keyword with another keyword of the same type, the present application takes into account the similarity between the preset fields and replaces the field in the original question sentence with the most similar field. This can generate new questions that better match user expectations and improve the accuracy of the questions recommended by the intelligent question answering system.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a first embodiment of a question recommendation method provided by an embodiment of the present application;

FIG. 2 is a flowchart of a second embodiment of a question recommendation method provided by an embodiment of the present application;

FIG. 3 is a flowchart of a third embodiment of a question recommendation method provided by an embodiment of the present application;

FIG. 4 is a structural diagram of an embodiment of a question recommendation device provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a server provided by an embodiment of the present application.

Detailed Description

In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, in order to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits and methods are omitted so that unnecessary detail does not obscure the description of the present application. In addition, in the description of this specification and the appended claims, the terms "first", "second", "third" and so on are used only to distinguish items and should not be construed as indicating or implying relative importance.

The present application proposes a question recommendation method, device, storage medium and server that can improve the accuracy of the questions recommended by an intelligent question answering system.

It should be understood that the question recommendation method based on field similarity calculation proposed in the embodiments of the present application may be executed by various types of servers or terminal devices.

Referring to FIG. 1, a first embodiment of the question recommendation method based on field similarity calculation in the embodiments of the present application includes:

101. Acquire the input first question sentence;

The user can enter the question to be asked, that is, the first question sentence, on a terminal device by voice or by typing, and the question sentence is sent to the intelligent question answering system on the server side.

102. Perform word segmentation on the first question sentence and extract the fields it contains;

After obtaining the question sentence, the server segments it into words and extracts the fields it contains. Any of the word segmentation methods available in the prior art can be used, for example jieba. If the user's question is "男性不同职业平均年龄如何？" ("What is the average age of men in different occupations?"), segmenting it with jieba yields the field list ["男性", "不同", "职业", "平均", "年龄", "如何", "？"] (male, different, occupation, average, age, how, ?).

103. Compare each of the fields, one by one, with the fields in a pre-built field data table, find the fields that the extracted fields and the field data table have in common, and determine them as target fields;

After word segmentation yields the fields of the first question sentence, the server compares each of them with the fields in the pre-built field data table and determines the fields found in both as target fields.

The pre-built field data table may be as shown in Table 1 below:

Table 1

| Name (姓名) | Occupation (职业) | Gender (性别) | Age (年龄) | Personal after-tax monthly income (个人税后月收入) | Industry (行业) |
| --- | --- | --- | --- | --- | --- |
| Zhang San | Police officer | Male | 35 | 4500 | Security |
| Li Si | Waiter | Female | 29 | 4000 | Service |
| … | … | … | … | … | … |

In Table 1, "Name", "Occupation", "Gender", "Age", "Personal after-tax monthly income" and "Industry" are the fields of the field data table, while "Zhang San", "Li Si", "Waiter", "Police officer", "Male", "Female", "Security", "Service" and so on are enumeration values of those fields. When the field data table is built, these fields and enumeration values are written into a data structure. In Python, for example, the dict type can be used to store the data, giving a dict-type table.
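One possible shape for such a dict-type table is sketched below. It maps each field name to the distinct enumeration values seen for it, using the two concrete rows of Table 1 in their original Chinese. The row layout and the helper name `build_field_table` are assumptions; the source only states that a dict can hold the data.

```python
# The two concrete rows of Table 1 (the "…" row is omitted).
rows = [
    {"姓名": "张三", "职业": "警察", "性别": "男", "年龄": 35,
     "个人税后月收入": 4500, "行业": "安保"},
    {"姓名": "李四", "职业": "服务员", "性别": "女", "年龄": 29,
     "个人税后月收入": 4000, "行业": "服务"},
]


def build_field_table(rows):
    """Map each field name to the list of distinct values seen for it."""
    table = {}
    for row in rows:
        for field, value in row.items():
            values = table.setdefault(field, [])
            if value not in values:
                values.append(value)
    return table


field_table = build_field_table(rows)
field_names = list(field_table)  # the fields of the field data table
```

Keeping the enumeration values alongside each field name makes the later value-count and value-length comparisons a matter of simple lookups.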

In addition, these fields can be added to jieba's custom dictionary so that field keywords are not split apart when the user's question is segmented. For example, by default jieba splits the field keyword "个人税后月收入" (personal after-tax monthly income) into three fields, "个人" (personal), "税后" (after-tax) and "月收入" (monthly income); once "个人税后月收入" has been added to jieba's custom dictionary, jieba no longer splits it.
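The effect of a custom dictionary entry can be illustrated without jieba itself. The sketch below uses greedy forward maximum matching, which is a deliberately simple stand-in and not jieba's actual algorithm; the function name `max_match` and the tiny vocabularies are hypothetical.

```python
def max_match(text, vocab):
    """Greedy forward maximum matching against a vocabulary.

    At each position the longest vocabulary entry that matches is taken;
    a single character is emitted when nothing matches.
    """
    max_len = max((len(word) for word in vocab), default=1)
    max_len = max(max_len, 1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


base_vocab = {"个人", "税后", "月收入"}
default_split = max_match("个人税后月收入", base_vocab)

# Adding the full field name as one entry keeps it in one piece.
custom_vocab = base_vocab | {"个人税后月收入"}
custom_split = max_match("个人税后月收入", custom_vocab)
```

The first call splits the field name into three pieces, while the second returns it whole, which is the behaviour the custom dictionary is meant to achieve.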

Suppose the extracted fields are ["男性", "不同", "职业", "平均", "年龄", "如何", "？"]. Comparing them with the fields in Table 1 shows that "职业" (occupation) and "年龄" (age) appear in both, so these are taken as the target fields. Note that there may be one target field or several.
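The comparison in step 103 amounts to a membership test, as this short sketch shows. The token list and field names come from the example above; the function name `find_target_fields` is hypothetical.

```python
def find_target_fields(tokens, field_names):
    """Return the tokens that are also field names, in order, without repeats."""
    known = set(field_names)
    targets = []
    for token in tokens:
        if token in known and token not in targets:
            targets.append(token)
    return targets


# Segmentation result of "男性不同职业平均年龄如何？"
tokens = ["男性", "不同", "职业", "平均", "年龄", "如何", "？"]
# Fields of Table 1
field_names = ["姓名", "职业", "性别", "年龄", "个人税后月收入", "行业"]

targets = find_target_fields(tokens, field_names)
```

Note that "男性" (male) is not matched, because the table holds the field "性别" (gender) rather than that token.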

104. Calculate the similarity between the target field and each other field in the field data table except the target field;

After the target field is determined, the similarity between the target field and each other field in the field data table is calculated. In the example of Table 1, for the target field "occupation", this means calculating the similarity between "occupation" and "name", between "occupation" and "gender", between "occupation" and "age", between "occupation" and "personal after-tax monthly income", and between "occupation" and "industry".

The similarity between the target field and any one of the other fields in the field data table can be calculated by the following steps:

(1) Combine the character string and enumeration values of the target field with the character string and enumeration values of the other field to calculate similarity indices for the two fields, where a similarity index is a parameter that measures the degree of similarity between two fields;

(2) Calculate the similarity between the target field and the other field from these similarity indices.

Attributes of the strings and enumeration values, such as the length of the string or the number and category of the enumeration values, are important parameters for determining how similar two fields are. Further, calculating the similarity indices of the target field and the other field may include calculating their string similarity index, string length similarity index, enumeration value count similarity index and enumeration value length similarity index.

Specifically, the string similarity index can be calculated using the following formula:

[formula not reproduced in the source text]

where s₁ denotes the string similarity index, sim denotes the number of identical characters shared by the two fields (that is, the target field and the other field), short denotes the string length of the shorter of the two fields, long denotes the string length of the longer of the two fields, and α is a hyperparameter that controls how strongly the string content influences the similarity. One term of the formula, likewise not reproduced in the source text, serves to compress s₁ to between 0 and 1. For example, for the two fields "个人税后月收入" (personal after-tax monthly income) and "个人所得税" (personal income tax), sim = 3 (the shared characters "个", "人" and "税"), short = 5 and long = 7.
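The three quantities that feed s₁ can be checked with a few lines of Python. The sketch stops short of s₁ itself, since its formula is not available here. Counting distinct shared characters is an assumed reading of sim that agrees with the worked example, and the function name `string_features` is hypothetical.

```python
def string_features(field_a, field_b):
    """Return (sim, short_len, long_len) for two field names.

    sim counts the distinct characters that occur in both names; this
    reading of "identical characters" is an assumption that matches the
    worked example in the text.
    """
    sim = len(set(field_a) & set(field_b))
    short_len, long_len = sorted((len(field_a), len(field_b)))
    return sim, short_len, long_len


features = string_features("个人税后月收入", "个人所得税")
```

For the two example fields this gives sim = 3, short = 5 and long = 7, in line with the values stated above.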

The string length similarity index can be calculated using the following formula:

[formula not reproduced in the source text]

where s₂ denotes the string length similarity index, short denotes the string length of the shorter of the two fields (that is, the target field and the other field), and long denotes the string length of the longer of the two fields. The source illustrates this with s₂ for the fields "个人税后月收入" (personal after-tax monthly income, 7 characters) and "职业" (occupation, 2 characters); the resulting value is not reproduced in the source text.

The enumeration value count similarity index can be calculated using the following formula:

[formula not reproduced in the source text]

where s₃ denotes the enumeration value count similarity index, min denotes the number of enumeration values of the field that has fewer enumeration values, and max denotes the number of enumeration values of the field that has more enumeration values. For example, if the "occupation" field in the field data table has 6 enumeration values (警察 police officer, 护士 nurse, 教师 teacher, 程序员 programmer, 学生 student, 职员 clerk) and the "gender" field has 2 (男 male and 女 female), then min = 2 and max = 6; the resulting value of s₃ is not reproduced in the source text.

The enumeration value length similarity index can be calculated using the following formula:

[formula not reproduced in the source text]

where s₄ denotes the enumeration value length similarity index, avg_min denotes the average enumeration value length of the field whose enumeration values are shorter on average, and avg_max denotes the average enumeration value length of the field whose enumeration values are longer on average. For example, the average enumeration value length of the "occupation" field is (2+2+2+3+2+2)/6 = 2.17 and that of the "gender" field is (1+1)/2 = 1, so avg_min = 1 and avg_max = 2.17; the resulting value of s₄ is not reproduced in the source text.
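Because the formulas for s₂, s₃ and s₄ are not available in this text, the following sketch assumes the simplest form consistent with the definitions: the smaller quantity divided by the larger one. This ratio form is an assumption, not something the source confirms, and all function names are hypothetical. The inputs are the example fields and enumeration values given above.

```python
def ratio(smaller, larger):
    """Smaller-over-larger ratio in [0, 1]; ASSUMED form of s2, s3 and s4."""
    if larger == 0:
        return 0.0
    return smaller / larger


def length_similarity(name_a, name_b):
    """Assumed s2: shorter field-name length over longer field-name length."""
    short_len, long_len = sorted((len(name_a), len(name_b)))
    return ratio(short_len, long_len)


def enum_count_similarity(values_a, values_b):
    """Assumed s3: smaller enumeration value count over the larger count."""
    n_min, n_max = sorted((len(values_a), len(values_b)))
    return ratio(n_min, n_max)


def enum_length_similarity(values_a, values_b):
    """Assumed s4: smaller average value length over the larger average."""
    if not values_a or not values_b:
        return 0.0
    avg_a = sum(len(v) for v in values_a) / len(values_a)
    avg_b = sum(len(v) for v in values_b) / len(values_b)
    avg_min, avg_max = sorted((avg_a, avg_b))
    return ratio(avg_min, avg_max)


occupations = ["警察", "护士", "教师", "程序员", "学生", "职员"]
genders = ["男", "女"]

s2 = length_similarity("个人税后月收入", "职业")
s3 = enum_count_similarity(occupations, genders)
s4 = enum_length_similarity(occupations, genders)
```

Under this assumed ratio form the examples evaluate to s₂ = 2/7, s₃ = 2/6 and s₄ = 1/2.17. A different formula in the original would of course give different values.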

Specifically, calculating the similarity between the target field and the other field from their similarity indices may include:

calculating the average or weighted average of the string similarity index, the string length similarity index, the enumeration value count similarity index and the enumeration value length similarity index, and taking it as the similarity between the target field and the other field. The source gives an example formula for the similarity of two fields at this point, which is not reproduced in the source text.
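Combining the four indices by plain or weighted averaging can be sketched as below. The index values and the weights are hypothetical numbers chosen only to exercise the function, and the name `combine` is not from the source.

```python
def combine(indices, weights=None):
    """Average, or weighted average, of the similarity indices s1..s4."""
    if not indices:
        raise ValueError("at least one index is required")
    if weights is None:
        return sum(indices) / len(indices)
    if len(weights) != len(indices):
        raise ValueError("weights and indices must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, indices)) / total


# Hypothetical index values (s1, s2, s3, s4) for one pair of fields.
indices = [0.4, 0.5, 0.25, 0.75]

plain = combine(indices)                       # simple mean
weighted = combine(indices, weights=[2, 1, 1, 0])  # emphasise s1, ignore s4
```

Normalising by the sum of the weights keeps the result in the same 0 to 1 range as the individual indices, whatever scale the weights are given in.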

105. Select, from the other fields, the field with the highest similarity and use it to replace the target field in the first question sentence, to obtain a recommended second question sentence.

After the similarity between the target field and each other field in the field data table has been calculated, the other field with the highest similarity is selected and used to replace the target field in the first question sentence, giving the recommended second question sentence. For example, suppose the first question sentence is "上海不同职业的平均收入分布如何" ("What is the average income distribution of different occupations in Shanghai"), where "职业" (occupation) is a target field, and the field in the field data table most similar to "职业" is "行业" (industry). Replacing "职业" with "行业" in the first question sentence gives the second question sentence "上海不同行业的平均收入分布如何" ("What is the average income distribution of different industries in Shanghai"). Finally, the second question sentence is recommended to the user, which completes one round of question recommendation.
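Step 105 can be sketched as a selection followed by a string replacement. The similarity scores below are hypothetical placeholders, the function name `recommend_questions` is not from the source, and producing one recommendation per target field is an assumption about how several target fields would be handled.

```python
def recommend_questions(question, similarities_by_target):
    """Build one recommended question per target field.

    For each target field, the other field with the highest similarity
    replaces the target field in the original question.
    """
    recommendations = []
    for target, scores in similarities_by_target.items():
        best = max(scores, key=scores.get)
        recommendations.append(question.replace(target, best))
    return recommendations


# Hypothetical similarity scores between "职业" (occupation) and the
# other fields of Table 1.
similarities_by_target = {
    "职业": {
        "姓名": 0.31,            # name
        "性别": 0.42,            # gender
        "年龄": 0.45,            # age
        "个人税后月收入": 0.38,  # personal after-tax monthly income
        "行业": 0.87,            # industry
    },
}

first = "上海不同职业的平均收入分布如何"
second_questions = recommend_questions(first, similarities_by_target)
```

With these placeholder scores, "行业" ranks highest, so the sketch reproduces the second question sentence from the example above.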

In this embodiment, after the fields of the input question sentence are extracted, each field is compared one by one with the fields in the pre-built field data table, and any field that appears both among the extracted fields and in the field data table is determined as a target field. The similarity between the target field and each other field in the field data table is then calculated, and the field with the highest similarity replaces the target field in the question sentence, yielding the recommended question. Compared with the conventional approach of randomly replacing a keyword with another keyword of the same type, this embodiment takes into account the similarity between the preset fields and replaces the field in the original question sentence with the most similar field. This can generate new questions that better match user expectations and improve the accuracy of the questions recommended by the intelligent question answering system.

Referring to FIG. 2, a second embodiment of the question recommendation method based on field similarity calculation in the embodiments of the present application includes:

201. Acquire the input first question sentence;

202. Perform word segmentation on the first question sentence and extract the fields it contains;

203. Compare each of the fields, one by one, with the fields in a pre-built field data table, find the fields that the extracted fields and the field data table have in common, and determine them as target fields;

Steps 201-203 are the same as steps 101-103; see the description of steps 101-103 for details.

204. Retrieve all historical question sentences of the user who input the first question sentence;

After the target field is determined, the server can obtain the historical question records of the user who input the first question sentence and retrieve all of that user's historical question sentences.

205、根据所述历史提问语句构建共现矩阵，所述共现矩阵记录所述字段数据表中任意两个字段共同出现于所述用户的同一条历史提问语句中的次数；205. Construct a co-occurrence matrix according to the historical question statement, and the co-occurrence matrix records the number of times that any two fields in the field data table co-occur in the same historical question statement of the user;

Then, a co-occurrence matrix is constructed from the historical question sentences; the co-occurrence matrix records the number of times any two fields in the field data table co-occur in the same historical question sentence of the user. For example, a co-occurrence matrix M constructed from the user's historical question sentences corresponds to Table 2 below:

Table 2

Co-occurrence matrix M | Occupation | Gender | Age | Personal monthly income after tax | Industry
Occupation | -- | 18 | 27 | 22 | 3
Gender | 18 | -- | 2 | 15 | 5
Age | 27 | 2 | -- | 30 | 10
Personal monthly income after tax | 22 | 15 | 30 | -- | 21
Industry | 3 | 5 | 10 | 21 | --

In Table 2, the value for "gender" and "occupation" is 18, meaning that across all of the user's historical question sentences, "gender" and "occupation" co-occurred in the same sentence 18 times. For example, all question sentences the user has asked are stored in advance, such as "the relationship between different genders and occupations", "the unmarried proportion by occupation and gender", ..., "the correlation between different genders and different occupations". Each of these questions contains both "occupation" and "gender"; if there are 18 such question sentences, then "occupation" and "gender" co-occur 18 times.
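The counting described above can be sketched as follows. This is a minimal illustration only: the function name `build_cooccurrence`, the sample history, and the use of substring matching in place of word segmentation are assumptions and not part of the application.

```python
from itertools import combinations


def build_cooccurrence(history, fields):
    """Count, for every pair of fields, how many historical sentences contain both."""
    index = {field: i for i, field in enumerate(fields)}
    matrix = [[0] * len(fields) for _ in fields]
    for sentence in history:
        # Assumption: a field "appears" in a sentence if its name is a substring of it.
        present = [field for field in fields if field in sentence]
        for a, b in combinations(present, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1  # keep the matrix symmetric
    return matrix


fields = ["occupation", "gender", "age"]
history = [
    "relationship between gender and occupation",
    "unmarried proportion by occupation and gender",
    "age distribution by occupation",
]
matrix = build_cooccurrence(history, fields)
```

Each sentence contributes at most one count per field pair, and the diagonal is left at zero because a field is not paired with itself.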

206. Calculate, according to the co-occurrence matrix, the similarity between the target field and each other field in the field data table except the target field;

After the co-occurrence matrix is constructed, the similarity between the target field and each other field in the field data table except the target field may be calculated according to the co-occurrence matrix.

Specifically, step 206 may include:

(1) Extract from the co-occurrence matrix the field vector of the target field and the field vector of each of the other fields, where the elements of a field vector are the numbers of times the corresponding field co-occurs with each field in the field data table in the same historical question sentence of the user;

(2) Calculate the cosine similarity between the field vector of the target field and the field vector of each of the other fields, to obtain the similarity between the target field and each of the other fields.

In the co-occurrence matrix, each field corresponds to a field vector. For example, the field vector of "occupation" is [0, 18, 27, 22, 3] and the field vector of "gender" is [18, 0, 2, 15, 5]; that is, the row or column of the co-occurrence matrix in which a field is located is that field's field vector. After the field vectors are extracted, the cosine similarity between the field vector of the target field and the field vector of each of the other fields is calculated, which gives the similarity between the target field and each of the other fields. For example, if the target field is "occupation", its similarity to the other field "gender" equals the cosine similarity of the vectors [0, 18, 27, 22, 3] and [18, 0, 2, 15, 5].
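For illustration, the cosine similarity of step 206 can be computed on the values of Table 2 as in the following sketch. The names `cosine` and `rank_similar`, and the shortened label "income" for "personal monthly income after tax", are assumptions; the diagonal entries are taken as 0, as in the field vectors quoted above.

```python
import math

FIELDS = ["occupation", "gender", "age", "income", "industry"]
M = [
    [0, 18, 27, 22, 3],
    [18, 0, 2, 15, 5],
    [27, 2, 0, 30, 10],
    [22, 15, 30, 0, 21],
    [3, 5, 10, 21, 0],
]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(target, fields, matrix):
    """Return (field, similarity) pairs for all other fields, most similar first."""
    t = fields.index(target)
    scores = [(f, cosine(matrix[t], matrix[i])) for i, f in enumerate(fields) if i != t]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank_similar("occupation", FIELDS, M)
```

With these values the similarity between "occupation" and "gender" is roughly 0.42, while "industry" ranks first for the target field "occupation" at roughly 0.87, so under this embodiment "industry" would be the replacement candidate.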

207. Select the field with the highest similarity among the other fields, and use it to replace the target field in the first question sentence to obtain a recommended second question sentence.

Step 207 is the same as step 105; for details, refer to the description of step 105.

In this embodiment of the present application, after the fields of the input question sentence are extracted, each field is compared one by one with the fields in the pre-built field data table, and any extracted field that also appears in the field data table is determined to be a target field. All historical question sentences input by the user are then looked up and a co-occurrence matrix is constructed; the similarity between the target field and each other field in the field data table except the target field is calculated according to the co-occurrence matrix, and the field with the highest similarity replaces the target field in the question sentence, yielding the recommended question. Compared with the first embodiment of the present application, this embodiment provides a specific way of calculating the similarity between the target field and each of the other fields.

Referring to FIG. 3, a third embodiment of the question recommendation method based on field similarity calculation in the embodiments of the present application includes:

301. Acquire an input first question sentence;

302. Perform word segmentation on the first question sentence, and extract each field contained therein;

303. Compare each of the fields one by one with the fields in a pre-built field data table, find the fields that also appear in the field data table, and determine them as target fields;

304. Look up all historical question sentences of the user who input the first question sentence;

305. Construct a co-occurrence matrix from the historical question sentences, where the co-occurrence matrix records the number of times any two fields in the field data table co-occur in the same historical question sentence of the user;

Steps 301-305 are the same as steps 201-205; for details, refer to the description of steps 201-205.

306. Determine, according to the co-occurrence matrix, the field in the field data table that co-occurs with the target field in the same historical question sentence of the user the greatest number of times;

307. Select the field with the greatest number of co-occurrences, and use it to replace the target field in the first question sentence to obtain a recommended third question sentence.

After the co-occurrence matrix is constructed, the field in the field data table that co-occurs with the target field in the same historical question sentence of the user the greatest number of times can be determined according to the co-occurrence matrix; that field is then selected to replace the target field in the first question sentence, yielding the recommended third question sentence.

For example, the first question sentence is "What is the average income distribution of different occupations in Shanghai", in which "occupation" is a target field. In the co-occurrence matrix M, the field with the most co-occurrences with the field "occupation" is "age" (27 times), so "age" can be used to replace "occupation" in the first question sentence, yielding the third question sentence: "What is the average income distribution of different ages in Shanghai".
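Steps 306-307 can be sketched on the Table 2 values as follows. The helper names `most_cooccurring` and `recommend`, the shortened label "income", and the plain string replacement on an English paraphrase of the question are assumptions made for illustration.

```python
FIELDS = ["occupation", "gender", "age", "income", "industry"]
M = [
    [0, 18, 27, 22, 3],
    [18, 0, 2, 15, 5],
    [27, 2, 0, 30, 10],
    [22, 15, 30, 0, 21],
    [3, 5, 10, 21, 0],
]


def most_cooccurring(target, fields, matrix):
    """Return the other field with the largest co-occurrence count against the target."""
    t = fields.index(target)
    candidates = [i for i in range(len(fields)) if i != t]
    best = max(candidates, key=lambda i: matrix[t][i])
    return fields[best]


def recommend(question, target, fields, matrix):
    """Replace the target field in the question with its most frequently co-occurring field."""
    return question.replace(target, most_cooccurring(target, fields, matrix))


first = "What is the average income distribution across occupation groups in Shanghai?"
third = recommend(first, "occupation", FIELDS, M)
```

Unlike the second embodiment, no vector similarity is computed here; only the row of the target field is inspected, which is why the result ("age") differs from the cosine-based choice.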

In this embodiment of the present application, after the fields of the input question sentence are extracted, each field is compared one by one with the fields in the pre-built field data table, and any extracted field that also appears in the field data table is determined to be a target field. All historical question sentences input by the user are then looked up and a co-occurrence matrix is constructed; the field in the field data table that co-occurs with the target field in the same historical question sentence of the user the greatest number of times is determined according to the co-occurrence matrix, and that field is selected to replace the target field in the first question sentence, yielding the recommended third question sentence. Compared with the second embodiment of the present application, this embodiment provides a way of generating question sentences that also uses the co-occurrence matrix but does not rely on calculating the similarity between fields.

It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.

Corresponding to the question recommendation method based on field similarity calculation described in the above embodiments, FIG. 4 shows a structural block diagram of a question recommendation apparatus based on field similarity calculation provided by an embodiment of the present application. For convenience of description, only the parts related to the embodiments of the present application are shown.

Referring to FIG. 4, the apparatus includes:

a question acquisition module 401, configured to acquire an input first question sentence;

a word segmentation module 402, configured to perform word segmentation on the first question sentence and extract each field contained therein;

a field comparison module 403, configured to compare each of the fields one by one with the fields in a pre-built field data table, find the fields that also appear in the field data table, and determine them as target fields;

a field similarity calculation module 404, configured to calculate the similarity between the target field and each other field in the field data table except the target field;

a question recommendation module 405, configured to select the field with the highest similarity among the other fields, and use it to replace the target field in the first question sentence to obtain a recommended second question sentence.

Further, the field similarity calculation module may include:

a similarity index calculation unit, configured to calculate a similarity index of the target field and any one of the other fields by combining the character string and enumerated values of the target field with the character string and enumerated values of that other field, where the similarity index is a parameter for measuring the degree of similarity between two fields;

a first field similarity calculation unit, configured to calculate the similarity between the target field and that other field according to the similarity index of the target field and that other field.

Further, the similarity index calculation unit may be specifically configured to: calculate a character string similarity index, a character string length similarity index, an enumerated value number similarity index, and an enumerated value length similarity index of the target field and that other field;

The first field similarity calculation unit may be specifically configured to: calculate the average or weighted average of the character string similarity index, the character string length similarity index, the enumerated value number similarity index, and the enumerated value length similarity index, as the similarity between the target field and that other field.
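The averaging performed by the first field similarity calculation unit can be sketched as follows. The function name `combine_indices`, the sample index values, and the weights are hypothetical; the formulas that produce the four indices themselves are not reproduced in this sketch.

```python
def combine_indices(indices, weights=None):
    """Plain average of the similarity indices, or weighted average if weights are given."""
    if not indices:
        raise ValueError("at least one similarity index is required")
    if weights is None:
        return sum(indices) / len(indices)
    if len(weights) != len(indices):
        raise ValueError("weights and indices must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, indices)) / total


# Hypothetical values for the four indices: string, string length,
# enumerated value number, enumerated value length.
indices = [0.5, 1.0, 0.25, 0.25]
plain = combine_indices(indices)
weighted = combine_indices(indices, weights=[2, 1, 1, 0])
```

Dividing by the sum of the weights keeps the combined similarity on the same scale as the individual indices regardless of how the weights are chosen.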

Further, the character string similarity index may be calculated by the following formula:

a historical sentence lookup unit, configured to look up all historical question sentences of the user who input the first question sentence;

a co-occurrence matrix construction unit, configured to construct a co-occurrence matrix from the historical question sentences, where the co-occurrence matrix records the number of times any two fields in the field data table co-occur in the same historical question sentence of the user;

a second field similarity calculation unit, configured to calculate the similarity between the target field and each of the other fields according to the co-occurrence matrix.

Further, the second field similarity calculation unit may include:

a field vector extraction subunit, configured to extract from the co-occurrence matrix the field vector of the target field and the field vector of each of the other fields, where the elements of a field vector are the numbers of times the corresponding field co-occurs with each field in the field data table in the same historical question sentence of the user;

a cosine similarity calculation subunit, configured to calculate the cosine similarity between the field vector of the target field and the field vector of each of the other fields, to obtain the similarity between the target field and each of the other fields.

Further, the field similarity calculation module may also include:

a highest-frequency field determination unit, configured to determine, according to the co-occurrence matrix, the field in the field data table that co-occurs with the target field in the same historical question sentence of the user the greatest number of times;

a field replacement module, configured to select the field with the greatest number of co-occurrences, and use it to replace the target field in the first question sentence to obtain a recommended third question sentence.

An embodiment of the present application further provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of any one of the question recommendation methods based on field similarity calculation shown in FIG. 1 to FIG. 3.

An embodiment of the present application further provides a server, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when executing the computer-readable instructions, the processor implements the steps of any one of the question recommendation methods based on field similarity calculation shown in FIG. 1 to FIG. 3.

An embodiment of the present application further provides a computer program product which, when run on a server, causes the server to execute the steps of any one of the question recommendation methods based on field similarity calculation shown in FIG. 1 to FIG. 3.

FIG. 5 is a schematic diagram of a server provided by an embodiment of the present application. As shown in FIG. 5, the server 5 of this embodiment includes: a processor 50, a memory 51, and computer-readable instructions 52 stored in the memory 51 and executable on the processor 50. When executing the computer-readable instructions 52, the processor 50 implements the steps in each of the above embodiments of the question recommendation method based on field similarity calculation, for example steps 101 to 105 shown in FIG. 1. Alternatively, when executing the computer-readable instructions 52, the processor 50 implements the functions of the modules/units in the above apparatus embodiments, for example the functions of modules 401 to 405 shown in FIG. 4.

Exemplarily, the computer-readable instructions 52 may be divided into one or more modules/units, which are stored in the memory 51 and executed by the processor 50 to carry out the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 52 in the server 5.

The server 5 may be a computing device such as a smartphone, a notebook computer, a palmtop computer, or a cloud server. The server 5 may include, but is not limited to, the processor 50 and the memory 51. Those skilled in the art will understand that FIG. 5 is merely an example of the server 5 and does not constitute a limitation on it; the server 5 may include more or fewer components than shown, combine certain components, or use different components. For example, the server 5 may also include input and output devices, network access devices, buses, and the like.

The processor 50 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor or any conventional processor.

The memory 51 may be an internal storage unit of the server 5, such as a hard disk or memory of the server 5. The memory 51 may also be an external storage device of the server 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card (Flash Card) provided on the server 5. Further, the memory 51 may include both an internal storage unit of the server 5 and an external storage device. The memory 51 is used to store the computer-readable instructions and other programs and data required by the server. The memory 51 may also be used to temporarily store data that has been output or is to be output.

It should be noted that, because the information exchange and execution processes between the above apparatuses/units are based on the same concept as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiments and are not repeated here.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional units and modules is only an example. In practical applications, the above functions may be allocated to different functional units and modules as required; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated in one processing unit, may each exist physically on their own, or two or more units may be integrated in one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing them from one another and are not used to limit the protection scope of the present application. For the specific working processes of the units and modules in the above system, refer to the corresponding processes in the foregoing method embodiments, which are not repeated here.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or apparatus capable of carrying the computer program code to a photographing device/terminal device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc.

In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in one embodiment, refer to the relevant descriptions of the other embodiments.

The above embodiments are only used to illustrate the technical solutions of the present application and not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.

Claims

1. A question recommendation method based on field similarity calculation, characterized by comprising the following steps:

acquiring an input first question sentence;

performing word segmentation processing on the first question sentence, and extracting each field contained in the first question sentence;

comparing the fields one by one with fields in a pre-built field data table, finding the fields that are the same as fields in the field data table, and determining them as target fields;

respectively calculating the similarity between the target field and each other field except the target field in the field data table;

and selecting the field with the highest similarity in the other fields, and replacing the target field in the first question sentence to obtain the recommended second question sentence.

2. The question recommendation method of claim 1 wherein the similarity between the target field and any one of the other fields in the field data table is calculated by:

calculating a similarity index of the target field and any one of the other fields by combining the character string and the enumerated value of the target field and the character string and the enumerated value of any one of the other fields, wherein the similarity index is a parameter for measuring the similarity between the two fields;

and calculating the similarity between the target field and any one of the other fields according to the similarity indexes of the target field and any one of the other fields.

3. The question recommendation method of claim 2, wherein said calculating a similarity index of the target field and the any one of the other fields comprises:

calculating a character string similarity index, a character string length similarity index, an enumerated value number similarity index and an enumerated value length similarity index of the target field and any other field;

the calculating the similarity between the target field and the any one other field according to the similarity index between the target field and the any one other field includes:

and calculating an average value or a weighted average value of the character string similarity index, the character string length similarity index, the enumerated value number similarity index and the enumerated value length similarity index as the similarity of the target field and any other field.

4. The question recommendation method of claim 3, wherein the string similarity index is calculated using the following formula:

wherein s₁ represents the character string similarity index, sim represents the number of identical character strings of the two fields, short represents the character string length of the shorter of the two fields, long represents the character string length of the longer of the two fields, and α is a hyper-parameter for controlling the influence of the character string on the similarity;

the character string length similarity index is calculated by adopting the following formula:

wherein s₂ represents the character string length similarity index, short represents the character string length of the shorter of the two fields, and long represents the character string length of the longer of the two fields;

the enumeration value number similarity index is calculated by adopting the following formula:

wherein s₃ represents the enumerated value number similarity index, min represents the number of enumerated values of the field with fewer enumerated values of the two fields, and max represents the number of enumerated values of the field with more enumerated values of the two fields;

the enumerated value length similarity index is calculated by adopting the following formula:

wherein s₄ represents the enumerated value length similarity index, avg_min represents the average enumerated value length of the field whose enumerated values are shorter on average of the two fields, and avg_max represents the average enumerated value length of the field whose enumerated values are longer on average of the two fields.

5. The question recommendation method of claim 1, wherein said separately calculating the similarity between the target field and each of the other fields in the field data table except the target field comprises:

searching all historical question sentences of the user who inputs the first question sentence;

constructing a co-occurrence matrix according to the historical question sentences, wherein the co-occurrence matrix records the times of the common occurrence of any two fields in the field data table in the same historical question sentences of the user;

and calculating the similarity between the target field and each other field according to the co-occurrence matrix.

6. The question recommendation method of claim 5, wherein said calculating a similarity between the target field and the respective other fields according to the co-occurrence matrix comprises:

extracting a field vector of the target field and a field vector of each other field from the co-occurrence matrix, wherein each element of a field vector is the number of times the corresponding field occurs together with a respective field in the field data table in the same historical question sentence of the user;

and respectively calculating cosine similarity between the field vector of the target field and the field vectors of each other field to obtain the similarity between the target field and each other field.
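The cosine similarity step of claim 6 can be sketched as follows, taking each row of a co-occurrence matrix as a field vector. The helper names and the choice to return 0.0 for an all-zero vector are assumptions made for illustration.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for fields with no history
    return dot / (norm_u * norm_v)


def similarities_to_target(matrix: Sequence[Sequence[float]],
                           fields: Sequence[str],
                           target: str) -> Dict[str, float]:
    """Map every field other than target to its cosine similarity with target."""
    t = fields.index(target)
    return {
        name: cosine_similarity(matrix[t], matrix[i])
        for i, name in enumerate(fields)
        if i != t
    }
```

The field with the largest value in the returned mapping would then be the replacement candidate for the second question sentence.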

7. The question recommendation method according to claim 5 or 6, further comprising, after constructing the co-occurrence matrix according to the historical question sentences:

determining, according to the co-occurrence matrix, the field in the field data table that occurs together with the target field in the same historical question sentence of the user the greatest number of times;

and replacing the target field in the first question sentence with the determined field to obtain a recommended third question sentence.

8. A question recommendation apparatus based on field similarity calculation, comprising:

the question acquisition module is used for acquiring an input first question sentence;

the word segmentation module is used for carrying out word segmentation on the first question sentence and extracting each field contained in the first question sentence;

the field comparison module is used for comparing each extracted field, one by one, with the fields in a pre-constructed field data table, finding the fields that are identical between the extracted fields and the field data table, and determining them as target fields;

the field similarity calculation module is used for calculating the similarity between the target field and each other field except the target field in the field data table;

and the question recommending module is used for selecting, from the other fields, the field with the highest similarity, and replacing the target field in the first question sentence with the selected field to obtain a recommended second question sentence.
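The module structure of claim 8 could be wired together roughly as shown below. This is a sketch under several assumptions: the class name `QuestionRecommender` is hypothetical, the segmenter and the similarity function are supplied by the caller, and only the first matching target field is replaced.

```python
from typing import Callable, List, Optional, Sequence


class QuestionRecommender:
    def __init__(self,
                 field_table: Sequence[str],
                 segment: Callable[[str], List[str]],
                 similarity: Callable[[str, str], float]) -> None:
        self.field_table = list(field_table)  # pre-constructed field data table
        self.segment = segment                # word segmentation module
        self.similarity = similarity          # field similarity calculation module

    def recommend(self, question: str) -> Optional[str]:
        """Return a second question sentence, or None if no field matches."""
        tokens = self.segment(question)
        # Field comparison module: tokens that also exist in the field table.
        targets = [tok for tok in tokens if tok in self.field_table]
        if not targets:
            return None
        target = targets[0]  # assumption: handle only the first target field
        others = [f for f in self.field_table if f != target]
        if not others:
            return None
        # Question recommending module: substitute the most similar field.
        best = max(others, key=lambda f: self.similarity(target, f))
        return question.replace(target, best)
```

For example, `QuestionRecommender(fields, str.split, my_similarity).recommend(text)` would use whitespace splitting as a stand-in segmenter, which is adequate only for space-delimited input.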

9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the question recommendation method as claimed in any one of claims 1 to 7.

10. A server comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the question recommendation method according to any one of claims 1 to 7 when executing the computer program.