CN111274375A

Movatterモバイル変換

Info

Publication number: CN111274375A
Application number: CN202010067240.9A
Authority: CN
Inventors: 陈羽中; 谢琪; 刘漳辉
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2020-01-20
Filing date: 2020-01-20
Publication date: 2020-06-12
Anticipated expiration: 2040-01-20
Also published as: CN111274375B

Abstract

The invention relates to a multi-turn dialogue method and a system based on a bidirectional GRU network, wherein the method comprises the following steps: step A: collecting dialogue context and answer data and constructing dialogue training setD(ii) a And B: using a dialog training setDTraining and fusing bidirectional GRU deep learning network modelM(ii) a And C: conversing with the user, inputting the user question into the trained deep learning network modelMAnd outputting the matched answer. The method and the system are beneficial to improving the matching of answers to the user questions.

Description

Translated fromChinese

一种基于双向GRU网络的多轮对话方法及系统A multi-round dialogue method and system based on bidirectional GRU network

技术领域technical field

本发明涉及自然语言处理领域，具体涉及一种基于双向GRU网络的多轮对话方法及系统。The invention relates to the field of natural language processing, in particular to a multi-round dialogue method and system based on a bidirectional GRU network.

背景技术Background technique

近年来，随着深度学习与神经网络的迅猛发展，人工智能领域迎来了新变革。作为人工智能领域的核心技术之一，多轮对话已成为一个研究热点，未来它可以广泛应用于人机交互、智能家居、智能客服、智能家教和社交机器人等不同行业，具有重大的研究意义、学术价值和应用价值，因而得到了学术界的持续关注与工业界的高度重视。In recent years, with the rapid development of deep learning and neural networks, the field of artificial intelligence has ushered in new changes. As one of the core technologies in the field of artificial intelligence, multi-round dialogue has become a research hotspot. In the future, it can be widely used in different industries such as human-computer interaction, smart home, smart customer service, smart tutoring and social robots. It has great research significance. The academic value and application value have thus received continuous attention from academia and high attention from industry.

Lowe等人将对话上下文字面相连接，形成连接后的上下文矩阵和回答进行匹配，进一步的考虑了对话上下文的整体语意。Yan等人将上下文语句与输入消息连接起来，作为新的查询查询，并执行深度神经网络体系结构的匹配。Zhou等人改进多角度响应选择，使用包含话语视图和单词视图的多视图模型。Zhou等人提出了一种基于注意力机制的对话上下文和回答的匹配算法。该方法使用基于scale-attention的自注意力机制和交互注意力机制构造了两种匹配矩阵，验证了该方法的有效性。Wu等人将候选的回答和每一个上下文的语句进行匹配，并且使用了RNN用来维持句子语义的顺序性，该框架提高了系统的性能，表明了回答和每一个上下文之间的交互作用是有效的。Zhou等人还将回答和每一个上下文语句进行交互，他们使用了一个编码层作为转化而没有用RNN去表示不同层级的句子。他们使用注意力机制提取对话和回答之间更多依赖关系信息，把所有信息累加起来以计算匹配度。现有的注意力机制模型，虽然可以提取对话和回答之间更多依赖关系信息，但是容易受到噪声的影响并且无法补获长期依赖。Lowe et al. connected the dialogue context literally, formed a connected context matrix and matched the answer, and further considered the overall semantics of the dialogue context. Yan et al. concatenate contextual sentences with input messages as new query queries and perform matching of deep neural network architectures. Zhou et al. improve multi-angle response selection, using a multi-view model that includes both utterance views and word views. Zhou et al. proposed an attention-based matching algorithm for dialogue context and answer. This method uses scale-attention-based self-attention mechanism and interactive attention mechanism to construct two matching matrices, which verifies the effectiveness of the method. Wu et al. matched candidate answers with sentences in each context, and used RNN to maintain the order of sentence semantics. The framework improved the performance of the system, showing that the interaction between the answer and each context is Effective. Zhou et al. also interacted the answer with each contextual sentence, and they used an encoding layer as the transformation instead of using RNN to represent sentences at different levels. They use the attention mechanism to extract more information about the dependencies between the dialogue and the answer, and add up all the information to calculate the matching degree. Although the existing attention mechanism models can extract more information about the dependencies between dialogues and answers, they are easily affected by noise and cannot capture long-term dependencies.

发明内容SUMMARY OF THE INVENTION

本发明的目的在于提供一种基于双向GRU网络的多轮对话方法及系统，该方法及系统有利于提高对于用户提问所作回答的匹配性。The purpose of the present invention is to provide a multi-round dialogue method and system based on a bidirectional GRU network, which is beneficial to improve the matching of answers to user questions.

为实现上述目的，本发明采用的技术方案是：一种基于双向GRU网络的多轮对话方法，包括以下步骤：In order to achieve the above object, the technical solution adopted in the present invention is: a multi-round dialogue method based on a two-way GRU network, comprising the following steps:

步骤A：采集对话上下文和回答数据，构建对话训练集TS；Step A: Collect dialogue context and answer data, and construct dialogue training set TS;

步骤B：使用对话训练集TS，训练融合双向GRU网络的深度学习网络模型；Step B: Use the dialogue training set TS to train a deep learning network model fused with a bidirectional GRU network;

步骤C：与用户进行对话，将用户提问输入到训练好的深度学习网络模型中，输出匹配的答案。Step C: Have a dialogue with the user, input the user's question into the trained deep learning network model, and output the matching answer.

进一步地，所述步骤B具体包括以下步骤：Further, the step B specifically includes the following steps:

步骤B1：遍历对话训练集TS，对每一个训练样本的对话上下文和回答进行编码，得到初始表征向量；Step B1: Traverse the dialogue training set TS, encode the dialogue context and answer of each training sample, and obtain an initial representation vector;

步骤B2：将对话上下文和回答的初始表征向量输入到多头注意力机制模块，得到对话和回答的语义表征向量，计算对话与回答的词语相似度矩阵；Step B2: Input the dialogue context and the initial representation vector of the answer to the multi-head attention mechanism module, obtain the semantic representation vector of the dialogue and the answer, and calculate the word similarity matrix of the dialogue and the answer;

步骤B3：将步骤B1得到的对话上下文和回答的初始表征向量输入到双向GRU网络中，计算对话和回答的双向隐状态，然后计算对话与回答的正向语义表征矩阵与反向语义表征矩阵；Step B3: Input the dialogue context and the initial representation vector of the answer obtained in Step B1 into the bidirectional GRU network, calculate the bidirectional hidden state of the dialogue and the answer, and then calculate the forward semantic representation matrix and the reverse semantic representation matrix of the dialogue and the answer;

步骤B4：将对话与回答的词语相似度矩阵、正向语义表征矩阵、反向语义表征矩阵合并为张量输入到二维卷积神经网络中，然后进行特征降维，得到融合对话与回答的语义信息的表征向量序列；Step B4: Combine the word similarity matrix, forward semantic representation matrix, and reverse semantic representation matrix of the dialogue and answer into a tensor and input it into a two-dimensional convolutional neural network, and then perform feature dimension reduction to obtain a fusion dialogue and answer matrix. A sequence of representation vectors of semantic information;

步骤B5：将步骤B4得到的表征向量序列输入到双向GRU网络中，得到融合对话与回答的上下文依赖关系以及语义信息的表征向量

Step B5: Input the sequence of representation vectors obtained in Step B4 into the bidirectional GRU network to obtain the representation vector of the context dependency of the fusion dialogue and answer and the semantic information

步骤B6：重复步骤B2-B5，计算对话训练集中所有训练样本的融合对话与回答的上下文依赖关系以及语义信息的表征向量

Step B6: Repeat steps B2-B5 to calculate the contextual dependencies of the fused dialogue and answer and the representation vector of semantic information of all training samples in the dialogue training set

步骤B7：将所有样本的表征向量

输入到深度学习网络模型的全连接层，根据目标损失函数loss，利用反向传播方法计算深度网络中各参数的梯度，并利用随机梯度下降方法更新参数；Step B7: Convert the characterization vectors of all samples

Input to the fully connected layer of the deep learning network model, according to the target loss function loss, use the back propagation method to calculate the gradient of each parameter in the deep network, and use the stochastic gradient descent method to update the parameters;

步骤B8：当深度学习网络模型产生的损失值小于设定阈值或者达到最大迭代次数时，终止深度学习网络模型的训练。Step B8: When the loss value generated by the deep learning network model is smaller than the set threshold or reaches the maximum number of iterations, the training of the deep learning network model is terminated.

进一步地，所述步骤B1中，对话训练集表示为

其中N表示训练样本数，(U,a)表示对话训练集TS中由对话上下文U与回答a构成的一个训练样本，对话上下文U由对话过程中的多个语句组成，分别对对话上下文U中的每句话以及回答a编码得到其初始表征向量；若u_t表示对话上下文U中的第t句话，其初始表征向量

表示为：Further, in the step B1, the dialogue training set is expressed as

Among them, N represents the number of training samples, and (U, a) represents a training sample composed of a dialogue context U and an answer a in the dialogue training set TS. The dialogue context U consists of multiple sentences in the dialogue process, and the Each sentence and answer a are encoded to obtain its initial representation vector; if u_t represents the t-th sentence in the dialogue context U, its initial representation vector

Expressed as:

回答a的初始表征向量表示为：The initial representation vector for answer a is expressed as:

其中，

L_t和L_a分别表示u_t和a经过分词以及去除停用词后的剩余词数，

和

分别为

和

中第i个词的词向量，通过在预训练的词向量矩阵

中查找得到，d₁表示词向量的维度，|D|表示词典中的词数。in,

L_t and La represent the number of remaining words of u_t and_a after word segmentation and removal of stop words, respectively,

and

respectively

and

The word vector of the i-th word in the pretrained word vector matrix

, d₁ represents the dimension of the word vector, and |D| represents the number of words in the dictionary.

进一步地，所述步骤B2具体包括以下步骤：Further, the step B2 specifically includes the following steps:

步骤B21：选择能够整除d₁的整数s，对对话上下文中的每句话，将其初始表征向量

与回答的初始表征向量

在最后一个维度上平均切分成s个子向量，分别得到子向量序列

和

其中

是u_t的第h个子向量，

是

的第h个子向量；Step B21: Select an integer s that can divide d₁ , and for each sentence in the dialogue context, use its initial representation vector

and the initial representation vector of the answer

In the last dimension, it is divided into s sub-vectors on average, and the sub-vector sequences are obtained respectively

and

in

is the h-th subvector of u_t ,

Yes

The h-th subvector of ;

步骤B22：将

的每个子向量和

中对应的子向量构成一个子向量对，即

h＝1,2,...,n，输入到注意力机制模块中，计算得到

的语义表征向量

以及

的语义表征向量

Step B22: put

Each subvector of and

The corresponding sub-vectors in form a sub-vector pair, that is,

h=1,2,...,n, input into the attention mechanism module, the calculation is obtained

The semantic representation vector of

as well as

The semantic representation vector of

其中

的计算公式如下：in

The calculation formula is as follows:

的计算公式如下：

The calculation formula is as follows:

其中，T表示矩阵转置操作；Among them, T represents the matrix transpose operation;

计算

的加权连接，得到u_t的语义表征向量

表示如下：calculate

The weighted connection of , obtains the semantic representation vector of u_t

It is expressed as follows:

计算

的加权连接，得到a的语义表征向量

表示如下：calculate

The weighted connection of , obtains the semantic representation vector of a

It is expressed as follows:

其中W₁,W₂为多头注意力机制的训练参数；where W₁ , W₂ are the training parameters of the multi-head attention mechanism;

步骤B23：计算对话上下文中的每句话与回答的词语相似度矩阵；u_t表示对话上下文中的第t句话，则其与回答a的词语相似度矩阵

的计算公式如下：Step B23: Calculate the word similarity matrix of each sentence in the dialogue context and the answer; u_t represents the t-th sentence in the dialogue context, then it is the word similarity matrix of answer a

The calculation formula is as follows:

进一步地，所述步骤B3具体包括以下步骤：Further, the step B3 specifically includes the following steps:

步骤B31：将回答的初始表征向量视作词向量构成的序列，输入到双向GRU网络中，计算正向与反向隐状态向量；Step B31: Treat the initial representation vector of the answer as a sequence composed of word vectors, input it into the bidirectional GRU network, and calculate the forward and reverse hidden state vectors;

将回答的初始表征向量

视作

构成的序列，依次输入正向GRU，得到回答的正向隐状态向量

initial representation vector that will answer

regarded as

The sequence formed is input into the forward GRU in turn, and the forward hidden state vector of the answer is obtained.

将

依次输入反向GRU，得到回答的反向隐状态向量

其中

d₂为GRU的单元数；Will

Enter the reverse GRU in turn to get the reverse hidden state vector of the answer

in

d₂ is the unit number of GRU;

步骤B32：将对话上下文中的每句话的初始表征向量视作词向量构成的序列，输入到双向GRU网络中，计算正向与反向隐状态向量；Step B32: The initial representation vector of each sentence in the dialogue context is regarded as a sequence composed of word vectors, input into the bidirectional GRU network, and the forward and reverse hidden state vectors are calculated;

u_t表示对话上下文中的第t句话，将

视作

构成的序列，依次输入正向GRU，得到对话上下文中第t句话u_t的正向隐状态向量

u_t represents the t-th sentence in the dialogue context, which will be

regarded as

The formed sequence is input to the forward GRU in turn, and the forward hidden state vector of the_t -th sentence ut in the dialogue context is obtained.

将

依次输入反向GRU，得到对话上下文中第t句话u_t的反向隐状态向量

其中

Will

Enter the reverse GRU in turn to get the reverse hidden state vector of the_t -th sentence ut in the dialogue context

in

步骤B33：计算对话上下文中的每句话与回答的正向语义表征矩阵与反向语义表征矩阵；u_t表示对话上下文中的第t句话，则其与回答a的正向语义表征矩阵M_2,t与反向语义表征矩阵M_3,t的计算公式如下：Step B33: Calculate the forward semantic representation matrix and reverse semantic representation matrix of each sentence and answer in the dialogue context; u_t represents the t-th sentence in the dialogue context, then it and the forward semantic representation matrix M of answer a_2,t and the reverse semantic representation matrix M_3,t The calculation formula is as follows:

其中，

in,

进一步地，所述步骤B4具体包括以下步骤：Further, the step B4 specifically includes the following steps:

步骤B41：合并M_1,t、M_2,t、M_3,t，得到张量

Step B41: Merge M_1,t , M_2,t , M_3,t to obtain a tensor

M_t＝[M_1,t,M_2,t,M_3,t]M_t =[M_1,t ,M_2,t ,M_3,t ]

步骤B42：将M_t输入到二维卷积神经网络中进行卷积和池化，然后输入全连接层进行降维，得到融合u_t与a的语义信息的表征向量

其中d₃为全连接层降维后的维度；Step B42: Input M_t into a two-dimensional convolutional neural network for convolution and pooling, and then input it into a fully connected layer for dimensionality reduction to obtain a representation vector that fuses the semantic information of u_t and a

where d₃ is the dimension of the fully connected layer after dimensionality reduction;

步骤B43：对对话上下文U中的每个语句，计算其与回答a的语义信息的表征向量

其中L_u为对话上下文U中的句子数。Step B43: For each sentence in the dialogue context U, calculate the representation vector of the semantic information of the sentence and the answer a

where_Lu is the number of sentences in the dialogue context U.

进一步地，所述步骤B5中，将表征向量序列

输入到双向GRU网络中，通过双向GRU网络对对话上下文与回答的关系进行建模，将最后输出的隐状态向量作为融合对话与回答的上下文依赖关系以及语义信息的表征向量

其中

Further, in the step B5, characterize the vector sequence

The input is input into the bidirectional GRU network, the relationship between the dialogue context and the answer is modeled through the bidirectional GRU network, and the final output hidden state vector is used as the representation vector of the context dependency of the fusion dialogue and answer and the semantic information.

in

进一步地，所述步骤B7具体包括以下步骤：Further, the step B7 specifically includes the following steps:

步骤B71：将最终的表征向量

输入到全连接层，并使用softmax归一化，计算回答属于各类别的概率，计算公式如下：Step B71: Convert the final representation vector

Input to the fully connected layer and use softmax normalization to calculate the probability that the answer belongs to each category. The calculation formula is as follows:

g^c(U,a)＝softmax(y)g^c (U,a)=softmax(y)

其中，W_s为全连接层权重矩阵，b_s为全连接层的偏置项，g^c(U,a)为回答属于步骤B1所处理的训练样本(U,a)中的对话上下文U的概率，0≤g^c(U,a)≤1，c∈{正确,错误}；Among them, W_s is the weight matrix of the fully connected layer, b_s is the bias term of the fully connected layer, and g^c (U, a) is the answer to the dialogue context U in the training sample (U, a) processed in step B1. Probability, 0≤g^c (U, a)≤1, c∈{true, false};

步骤B72：用交叉熵作为损失函数计算损失值，通过梯度优化算法AdaGrad进行学习率更新，利用反向传播迭代更新模型参数，以最小化损失函数来训练模型；Step B72: Calculate the loss value by using the cross entropy as the loss function, update the learning rate through the gradient optimization algorithm AdaGrad, and iteratively update the model parameters by using backpropagation to train the model to minimize the loss function;

其中，最小化损失函数Loss的计算公式为：Among them, the calculation formula for minimizing the loss function Loss is:

其中，(U_i,a_i)表示对话训练集TS中的第i个训练样本，y_i为类别标签，y_i∈{0,1}。Among them, (U_i , a_i ) represents the ith training sample in the dialogue training set TS,_yi is the category label, and_yi ∈ {0,1}.

本发明还提供了一种采用上述方法的多轮对话系统，包括：The present invention also provides a multi-round dialogue system using the above method, including:

构建训练集模块，用于采集对话上下文和回答数据，构建对话训练集TS；Build a training set module to collect dialogue context and answer data, and construct a dialogue training set TS;

模型训练模块，用于使用对话训练集TS，训练融合双向GRU网络的深度学习网络模型；以及A model training module for training a deep learning network model fused with a bidirectional GRU network using the dialogue training set TS; and

多轮对话模块，用于与用户进行对话，将用户提问输入训练好的深度学习网络模型中，输出最匹配的回答。The multi-round dialogue module is used to dialogue with the user, input the user's question into the trained deep learning network model, and output the most matching answer.

相较于现有技术，本发明具有以下有益效果：提供了一种基于双向GRU网络的多轮对话方法及系统，该方法及系统使用多头注意力能够捕获长期依赖，并且多头注意力机制比传统的注意力机制更细粒度，从而能够减少噪声的影响。同时使用双向GRU可以更好地捕获语句在时间上的关系，提高对于用户提问所作回答的准确性和匹配性，具有很强的实用性和广阔的应用前景。Compared with the prior art, the present invention has the following beneficial effects: providing a multi-round dialogue method and system based on a bidirectional GRU network, the method and system can capture long-term dependencies using multi-head attention, and the multi-head attention mechanism is more efficient than traditional The attention mechanism is more fine-grained and thus able to reduce the effect of noise. At the same time, the use of bidirectional GRU can better capture the temporal relationship of sentences, improve the accuracy and matching of answers to user questions, and has strong practicability and broad application prospects.

附图说明Description of drawings

图1为本发明实施例的方法实现流程图。FIG. 1 is a flowchart of a method implementation according to an embodiment of the present invention.

图2为本发明实施例的系统结构示意图。FIG. 2 is a schematic diagram of a system structure according to an embodiment of the present invention.

图3为本发明实施例的模型架构图。FIG. 3 is a model architecture diagram of an embodiment of the present invention.

具体实施方式Detailed ways

下面结合附图及具体实施例对本发明作进一步的详细说明。The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

本发明提供了一种基于双向GRU网络的多轮对话方法，如图1所示，包括以下步骤：The present invention provides a multi-round dialogue method based on a bidirectional GRU network, as shown in FIG. 1 , including the following steps:

步骤A：采集对话上下文和回答数据，构建对话训练集TS。Step A: Collect dialogue context and answer data, and construct a dialogue training set TS.

步骤B：使用对话训练集TS，训练融合双向GRU网络的深度学习网络模型。Step B: Use the dialogue training set TS to train a deep learning network model fused with a bidirectional GRU network.

图3是本发明实施例中深度学习网络模型的架构图。使用对话训练集TS训练所述模型具体包括以下步骤：FIG. 3 is an architectural diagram of a deep learning network model in an embodiment of the present invention. Using the dialogue training set TS to train the model specifically includes the following steps:

步骤B1：遍历对话训练集TS，对每一个训练样本的对话上下文和回答进行编码，得到初始表征向量。Step B1: Traverse the dialogue training set TS, encode the dialogue context and answer of each training sample, and obtain an initial representation vector.

其中，对话训练集表示为

表示为：Among them, the dialogue training set is expressed as

Expressed as:

其中，

和

分别为

和

中第i个词的词向量，通过在预训练的词向量矩阵

and

respectively

and

The word vector of the i-th word in the pretrained word vector matrix

步骤B2：将对话上下文和回答的初始表征向量输入到多头注意力机制模块，得到对话和回答的语义表征向量，计算对话与回答的词语相似度矩阵。具体包括以下步骤：Step B2: Input the dialogue context and the initial representation vector of the answer to the multi-head attention mechanism module, obtain the semantic representation vector of the dialogue and the answer, and calculate the word similarity matrix of the dialogue and the answer. Specifically include the following steps:

与回答的初始表征向量

和

其中

是u_t的第h个子向量，

是

and the initial representation vector of the answer

and

in

is the h-th subvector of u_t ,

Yes

The h-th subvector of ;

步骤B22：将

的每个子向量和

中对应的子向量构成一个子向量对，即

输入到注意力机制模块中，计算得到

的语义表征向量

以及

的语义表征向量

Step B22: put

Each subvector of and

The corresponding sub-vectors in form a sub-vector pair, that is,

Input into the attention mechanism module and calculate

The semantic representation vector of

as well as

The semantic representation vector of

其中

的计算公式如下：in

The calculation formula is as follows:

的计算公式如下：

The calculation formula is as follows:

计算

的加权连接，得到u_t的语义表征向量

表示如下：calculate

The weighted connection of , obtains the semantic representation vector of u_t

It is expressed as follows:

计算

的加权连接，得到a的语义表征向量

表示如下：calculate

The weighted connection of , obtains the semantic representation vector of a

It is expressed as follows:

The calculation formula is as follows:

步骤B3：将步骤B1得到的对话上下文和回答的初始表征向量输入到双向GRU网络中，计算对话和回答的双向隐状态，然后计算对话与回答的正向语义表征矩阵与反向语义表征矩阵。具体包括以下步骤：Step B3: Input the dialogue context and the initial representation vector of the answer obtained in Step B1 into the bidirectional GRU network, calculate the bidirectional hidden state of the dialogue and the answer, and then calculate the forward semantic representation matrix and the reverse semantic representation matrix of the dialogue and the answer. Specifically include the following steps:

将回答的初始表征向量

视作

构成的序列，依次输入正向GRU，得到回答的正向隐状态向量

initial representation vector that will answer

regarded as

将

依次输入反向GRU，得到回答的反向隐状态向量

其中

d₂为GRU的单元数；Will

in

d₂ is the unit number of GRU;

u_t表示对话上下文中的第t句话，将

视作

u_t represents the t-th sentence in the dialogue context, which will be

regarded as

将

其中

Will

in

其中，

in,

步骤B4：将对话与回答的词语相似度矩阵、正向语义表征矩阵、反向语义表征矩阵合并为张量输入到二维卷积神经网络中，然后进行特征降维，得到融合对话与回答的语义信息的表征向量序列。具体包括以下步骤：Step B4: Combine the word similarity matrix, forward semantic representation matrix, and reverse semantic representation matrix of the dialogue and answer into a tensor and input it into a two-dimensional convolutional neural network, and then perform feature dimension reduction to obtain a fusion dialogue and answer matrix. A sequence of representation vectors for semantic information. Specifically include the following steps:

步骤B41：合并M_1,t、M_2,t、M_3,t，得到张量

Step B41: Merge M_1,t , M_2,t , M_3,t to obtain a tensor

M_t＝[M_1,t,M_2,t,M_3,t]M_t =[M_1,t ,M_2,t ,M_3,t ]

where_Lu is the number of sentences in the dialogue context U.

其中，将表征向量序列

其中

where, will characterize the vector sequence

in

步骤B7：将所有样本的表征向量

输入到深度学习网络模型的全连接层，根据目标损失函数loss，利用反向传播方法计算深度网络中各参数的梯度，并利用随机梯度下降方法更新参数。具体包括以下步骤：Step B7: Convert the characterization vectors of all samples

Input to the fully connected layer of the deep learning network model, according to the target loss function loss, use the back propagation method to calculate the gradient of each parameter in the deep network, and use the stochastic gradient descent method to update the parameters. Specifically include the following steps:

步骤B71：将最终的表征向量

g^c(U,a)＝softmax(y)g^c (U,a)=softmax(y)

本发明还提供了采用上述方法的多轮对话系统，如图2所示，包括：The present invention also provides a multi-round dialogue system using the above method, as shown in Figure 2, including:

以上是本发明的较佳实施例，凡依本发明技术方案所作的改变，所产生的功能作用未超出本发明技术方案的范围时，均属于本发明的保护范围。The above are the preferred embodiments of the present invention, all changes made according to the technical solutions of the present invention, when the resulting functional effects do not exceed the scope of the technical solutions of the present invention, belong to the protection scope of the present invention.

Claims

1. A multi-turn dialogue method based on a bidirectional GRU network is characterized by comprising the following steps:

step A: collecting conversation context and answer data, and constructing a conversation Training Set (TS);

and B: training a deep learning network model fusing a bidirectional GRU network by using a session Training Set (TS);

and C: and (5) carrying out dialogue with the user, inputting the user question into the trained deep learning network model, and outputting a matched answer.

2. The method of claim 1, wherein step B specifically comprises the following steps:

step B1: traversing a dialogue Training Set (TS), and coding the dialogue context and answer of each training sample to obtain an initial characterization vector;

step B2: inputting the initial characterization vectors of the conversation context and the answer into a multi-head attention mechanism module to obtain semantic characterization vectors of the conversation and the answer, and calculating a word similarity matrix of the conversation and the answer;

step B3: inputting the dialogue context and the initial characterization vector of the answer obtained in the step B1 into a bidirectional GRU network, calculating bidirectional hidden states of the dialogue and the answer, and then calculating a forward semantic characterization matrix and a reverse semantic characterization matrix of the dialogue and the answer;

step B4: merging a word similarity matrix, a forward semantic representation matrix and a reverse semantic representation matrix of the conversation and the answer into a tensor, inputting the tensor into a two-dimensional convolution neural network, and then performing feature dimensionality reduction to obtain a representation vector sequence fusing semantic information of the conversation and the answer;

step B5: inputting the characterization vector sequence obtained in the step B4 into a bidirectional GRU network to obtain a characterization vector fusing context dependence of conversation and answer and semantic information

Step B6: repeating the steps B2-B5, calculating the context dependency relationship of the fused dialog and answer of all the training samples in the dialog training set and the characterization vector of the semantic information

Step (ii) ofB7: feature vectors of all samples

Inputting the data into a full-connection layer of a deep learning network model, calculating the gradient of each parameter in the deep network by using a back propagation method according to a target loss function loss, and updating the parameter by using a random gradient descent method;

step B8: and terminating the training of the deep learning network model when the loss value generated by the deep learning network model is smaller than a set threshold value or reaches the maximum iteration number.

3. The method of claim 2, wherein in step B1, the dialog training set is expressed as

Wherein N represents the number of training samples, and (U, a) represents a training sample consisting of a conversation context U and an answer a in a conversation training set TS, wherein the conversation context U consists of a plurality of sentences in a conversation process, and each sentence in the conversation context U and the answer a are coded respectively to obtain an initial characterization vector of the conversation context U; if u_tRepresents the t < th > sentence in the dialog context U, its initial token vector

Expressed as:

the initial characterization vector for answer a is represented as:

wherein,

L_tand L_aRespectively represents u_tAnd a the number of remaining words after word segmentation and removal of stop words,

and

are respectively as

And

the word vector of the ith word is obtained by pre-training the word vector matrix

Is found in d₁Represents the dimension of the word vector, | D | represents the number of words in the dictionary.

4. The method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 3, wherein said step B2 specifically comprises the steps of:

step B21: selecting the ability to divide d₁S, for each sentence in the conversational context, its initial token vector

Initial characterization vector with answer

Dividing the last dimension into s sub-vectors to obtain sub-vector sequences

And

wherein

Is u_tThe h-th sub-vector of (2),

is that

The h-th sub-vector of (1);

step B22: will be provided with

Each subvector of (1) and

wherein the corresponding sub-vectors form a sub-vector pair, i.e.

Inputting the data into an attention mechanism module, and calculating to obtain

Semantic representation vector of

And

semantic representation vector of

Wherein

The calculation formula of (a) is as follows:

the calculation formula of (a) is as follows:

wherein T represents a matrix transpose operation;

computing

Weighted concatenation of to obtain u_tSemantic representation vector of

Is represented as follows:

computing

To obtain a semantic representation vector of a

Is represented as follows:

wherein W₁,W₂Training parameters for a multi-head attention system;

step B23: calculating the term facies of each sentence and answer in a conversational contextA similarity matrix; u. of_tThe word similarity matrix representing the tth sentence in the dialog context with answer a

The calculation formula of (a) is as follows:

5. the method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 4, wherein said step B3 specifically comprises the steps of:

step B31: taking the initial characterization vectors of the answers as a sequence formed by word vectors, inputting the sequence into a bidirectional GRU network, and calculating forward and reverse hidden state vectors;

initial characterization vector of answer

Is regarded as

The formed sequence is sequentially input into forward GRU to obtain a forward hidden state vector of the answer

Will be provided with

Sequentially inputting reverse GRUs to obtain a reverse hidden state vector of the answer

Wherein

d₂The unit number of GRU;

step B32: regarding the initial characterization vector of each sentence in the dialog context as a sequence formed by word vectors, inputting the sequence into a bidirectional GRU network, and calculating forward and reverse hidden state vectors;

u_tthe t-th sentence in the context of the presentation dialog will

Is regarded as

The formed sequence is input with forward GRU in turn to obtain the t-th sentence u in the dialog context_tForward hidden state vector of

Will be provided with

Inputting reverse GRU in turn to obtain the t-th sentence u in the dialog context_tReverse hidden state vector of

Wherein

Step B33: calculating a forward semantic representation matrix and a reverse semantic representation matrix of each sentence in the dialogue context; u. of_tDenotes the t-th sentence in the dialog context, and its forward semantic representation matrix M with answer a_2,tAnd a reverse semantic representation matrix M_3,tThe calculation formula of (a) is as follows:

wherein,

6. the method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 5, wherein said step B4 specifically comprises the steps of:

step B41: merge M_1,t、M_2,t、M_3,tTo obtain tensor

M_t＝[M_1,t,M_2,t,M_3,t]

Step B42: will M_tInputting the data into a two-dimensional convolution neural network for convolution and pooling, and then inputting the data into a full-connection layer for dimensionality reduction to obtain a fusion u_tCharacterization vector of semantic information of a

Wherein d is₃Dimension after dimension reduction of the full connection layer;

step B43: for each sentence in the dialog context U, a characterization vector of its semantic information with the answer a is calculated

Wherein L is_uIs the number of sentences in the dialog context U.

7. The method of claim 6, wherein in step B5, the token vector sequence is represented by a two-way GRU network

Inputting the result into a bidirectional GRU network, modeling the relationship between the conversation context and the answer through the bidirectional GRU network, and outputting the finally output hidden state vectorCharacterization vectors as context dependencies and semantic information for fusing dialog and answers

Wherein

8. The method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 7, wherein said step B7 specifically comprises the steps of:

step B71: the final characterization vector

Inputting the data into the full-link layer, and calculating the probability of answers belonging to each category by using softmax normalization, wherein the calculation formula is as follows:

g^c(U,a)＝softmax(y)

wherein, W_sIs a full connection layer weight matrix, b_sBias term for fully connected layer, g^c(U, a) is the probability of answering the dialog context U belonging to the training sample (U, a) processed in step B1, 0 ≦ g^c(U, a) is less than or equal to 1, c belongs to { correct, wrong };

step B72: calculating a loss value by using the cross entropy as a loss function, updating the learning rate by using a gradient optimization algorithm AdaGrad, and training a model by using a minimum loss function by updating model parameters through back propagation iteration;

the calculation formula of the Loss minimization function Loss is as follows:

wherein (U)_i,a_i) Presentation dialogue trainingSet i training sample in TS, y_iAs a class label, y_i∈{0,1}。

9. A multi-turn dialog system employing the method of any of claims 1-8 comprising:

a training set building module for collecting the dialogue context and the answer data and building a dialogue training set TS;

the model training module is used for training a deep learning network model fusing the bidirectional GRU network by using a session Training Set (TS); and

and the multi-turn dialogue module is used for carrying out dialogue with the user, inputting the user questions into the trained deep learning network model and outputting the best matched answers.