


Technical Field
The present invention relates to the field of natural language processing, and in particular to a multi-turn dialogue method and system based on a bidirectional GRU network.
Background Art
In recent years, the rapid development of deep learning and neural networks has brought new changes to the field of artificial intelligence. As one of the core technologies of artificial intelligence, multi-turn dialogue has become a research hotspot. It can be widely applied in industries such as human-computer interaction, smart homes, intelligent customer service, intelligent tutoring, and social robots, and its research significance, academic value, and application value have earned it sustained attention from both academia and industry.
Lowe et al. concatenated the utterances of the dialogue context literally, matched the resulting context matrix against the answer, and thereby took the overall semantics of the dialogue context into account. Yan et al. concatenated the context utterances with the input message to form a new query and performed matching with a deep neural network architecture. Zhou et al. improved multi-angle response selection with a multi-view model that combines an utterance view and a word view. Zhou et al. also proposed an attention-based matching algorithm between the dialogue context and the answer, which constructs two matching matrices using a scale-attention-based self-attention mechanism and an interactive attention mechanism; its effectiveness was verified experimentally. Wu et al. matched the candidate answer against each utterance of the context and used an RNN to preserve the order of sentence semantics; this framework improved system performance and showed that modeling the interaction between the answer and each context utterance is effective. Zhou et al. likewise let the answer interact with each context utterance, but used an encoding layer for the transformation instead of an RNN to represent sentences at different levels; they applied an attention mechanism to extract more dependency information between the dialogue and the answer and accumulated all of this information to compute the matching degree. Although existing attention-mechanism models can extract more dependency information between dialogues and answers, they are easily affected by noise and cannot capture long-term dependencies.
SUMMARY OF THE INVENTION
The purpose of the present invention is to provide a multi-turn dialogue method and system based on a bidirectional GRU network, which improves how well the returned answers match the user's questions.

To achieve the above object, the technical solution adopted by the present invention is a multi-turn dialogue method based on a bidirectional GRU network, comprising the following steps:

Step A: Collect dialogue contexts and answer data, and construct a dialogue training set TS.

Step B: Use the dialogue training set TS to train a deep learning network model incorporating bidirectional GRU networks.

Step C: Converse with the user, feed the user's question into the trained deep learning network model, and output the matching answer.
Further, step B specifically comprises the following steps:

Step B1: Traverse the dialogue training set TS and encode the dialogue context and the answer of each training sample to obtain their initial representation vectors.

Step B2: Feed the initial representation vectors of the dialogue context and the answer into the multi-head attention module to obtain semantic representation vectors of the dialogue and the answer, and compute the word similarity matrix between the dialogue and the answer.

Step B3: Feed the initial representation vectors obtained in step B1 into a bidirectional GRU network, compute the bidirectional hidden states of the dialogue and the answer, and then compute the forward and reverse semantic representation matrices between the dialogue and the answer.

Step B4: Merge the word similarity matrix, the forward semantic representation matrix, and the reverse semantic representation matrix into a tensor, feed it into a two-dimensional convolutional neural network, and then perform feature dimensionality reduction to obtain a sequence of representation vectors that fuses the semantic information of the dialogue and the answer.

Step B5: Feed the sequence of representation vectors obtained in step B4 into a bidirectional GRU network to obtain a representation vector that fuses the context dependencies and semantic information of the dialogue and the answer.

Step B6: Repeat steps B2–B5 to compute such representation vectors for all training samples in the dialogue training set.

Step B7: Feed the representation vectors of all samples into the fully connected layer of the deep learning network model; according to the target loss function, compute the gradient of each parameter of the deep network by backpropagation, and update the parameters by stochastic gradient descent.

Step B8: Terminate the training of the deep learning network model when the loss value it produces falls below a set threshold or the maximum number of iterations is reached.
Further, in step B1, the dialogue training set is expressed as $TS=\{(U_i,a_i,y_i)\}_{i=1}^{N}$, where N is the number of training samples and (U, a) denotes a training sample in TS consisting of a dialogue context U and an answer a. The dialogue context U consists of multiple utterances of the dialogue; each utterance of U and the answer a are encoded to obtain their initial representation vectors. If $u_t$ denotes the t-th utterance of the dialogue context U, its initial representation vector is expressed as:

$$R^{u_t}=\left[e^{u_t}_1,e^{u_t}_2,\ldots,e^{u_t}_{L_t}\right]\in\mathbb{R}^{L_t\times d_1}$$

The initial representation vector of answer a is expressed as:

$$R^{a}=\left[e^{a}_1,e^{a}_2,\ldots,e^{a}_{L_a}\right]\in\mathbb{R}^{L_a\times d_1}$$

where $L_t$ and $L_a$ denote the numbers of words remaining in $u_t$ and a after word segmentation and stop-word removal, $e^{u_t}_i$ and $e^{a}_i$ are the word vectors of the i-th words of $u_t$ and a, obtained by lookup in the pretrained word-vector matrix $E\in\mathbb{R}^{|D|\times d_1}$, $d_1$ is the dimension of the word vectors, and |D| is the number of words in the dictionary.
Further, step B2 specifically comprises the following steps:

Step B21: Choose an integer s that divides $d_1$. For each utterance of the dialogue context, split its initial representation vector $R^{u_t}$ and the initial representation vector $R^{a}$ of the answer evenly along the last dimension into s sub-vectors, obtaining the sub-vector sequences $\{R^{u_t}_1,\ldots,R^{u_t}_s\}$ and $\{R^{a}_1,\ldots,R^{a}_s\}$, where $R^{u_t}_h\in\mathbb{R}^{L_t\times d_1/s}$ is the h-th sub-vector of $u_t$ and $R^{a}_h\in\mathbb{R}^{L_a\times d_1/s}$ is the h-th sub-vector of a.

Step B22: Pair each sub-vector of $R^{u_t}$ with the corresponding sub-vector of $R^{a}$, i.e. $(R^{u_t}_h,R^{a}_h)$, $h=1,2,\ldots,s$, and feed the pairs into the attention module to compute the semantic representation vector $O^{u_t}_h$ of $R^{u_t}_h$ and the semantic representation vector $O^{a}_h$ of $R^{a}_h$.

$O^{u_t}_h$ is computed as:

$$O^{u_t}_h=\mathrm{softmax}\!\left(\frac{R^{u_t}_h\left(R^{a}_h\right)^{T}}{\sqrt{d_1/s}}\right)R^{a}_h$$

$O^{a}_h$ is computed as:

$$O^{a}_h=\mathrm{softmax}\!\left(\frac{R^{a}_h\left(R^{u_t}_h\right)^{T}}{\sqrt{d_1/s}}\right)R^{u_t}_h$$

where T denotes the matrix transpose operation.

Compute the weighted concatenation of $O^{u_t}_1,\ldots,O^{u_t}_s$ to obtain the semantic representation vector $\hat{R}^{u_t}$ of $u_t$:

$$\hat{R}^{u_t}=\left[O^{u_t}_1,O^{u_t}_2,\ldots,O^{u_t}_s\right]W_1$$

Compute the weighted concatenation of $O^{a}_1,\ldots,O^{a}_s$ to obtain the semantic representation vector $\hat{R}^{a}$ of a:

$$\hat{R}^{a}=\left[O^{a}_1,O^{a}_2,\ldots,O^{a}_s\right]W_2$$

where $W_1$, $W_2$ are training parameters of the multi-head attention mechanism.

Step B23: Compute the word similarity matrix between each utterance of the dialogue context and the answer. With $u_t$ denoting the t-th utterance of the dialogue context, its word similarity matrix $M_{1,t}$ with answer a is computed as:

$$M_{1,t}=\hat{R}^{u_t}\left(\hat{R}^{a}\right)^{T}$$
Further, step B3 specifically comprises the following steps:

Step B31: Treat the initial representation vector of the answer as a sequence of word vectors and feed it into a bidirectional GRU network to compute the forward and reverse hidden state vectors.

Treating the initial representation vector $R^{a}$ of the answer as the sequence $e^{a}_1,e^{a}_2,\ldots,e^{a}_{L_a}$, feed it into the forward GRU in turn to obtain the forward hidden state vectors $\overrightarrow{H}^{a}=\left[\overrightarrow{h}^{a}_1,\ldots,\overrightarrow{h}^{a}_{L_a}\right]$ of the answer.

Feed $e^{a}_{L_a},\ldots,e^{a}_1$ into the reverse GRU in turn to obtain the reverse hidden state vectors $\overleftarrow{H}^{a}=\left[\overleftarrow{h}^{a}_1,\ldots,\overleftarrow{h}^{a}_{L_a}\right]$ of the answer, where $\overrightarrow{h}^{a}_i,\overleftarrow{h}^{a}_i\in\mathbb{R}^{d_2}$ and $d_2$ is the number of GRU units.

Step B32: Treat the initial representation vector of each utterance of the dialogue context as a sequence of word vectors and feed it into the bidirectional GRU network to compute the forward and reverse hidden state vectors.

With $u_t$ denoting the t-th utterance of the dialogue context, treat $R^{u_t}$ as the sequence $e^{u_t}_1,\ldots,e^{u_t}_{L_t}$ and feed it into the forward GRU in turn to obtain the forward hidden state vectors $\overrightarrow{H}^{u_t}=\left[\overrightarrow{h}^{u_t}_1,\ldots,\overrightarrow{h}^{u_t}_{L_t}\right]$ of $u_t$.

Feed $e^{u_t}_{L_t},\ldots,e^{u_t}_1$ into the reverse GRU in turn to obtain the reverse hidden state vectors $\overleftarrow{H}^{u_t}=\left[\overleftarrow{h}^{u_t}_1,\ldots,\overleftarrow{h}^{u_t}_{L_t}\right]$ of $u_t$, where $\overrightarrow{h}^{u_t}_i,\overleftarrow{h}^{u_t}_i\in\mathbb{R}^{d_2}$.

Step B33: Compute the forward and reverse semantic representation matrices between each utterance of the dialogue context and the answer. With $u_t$ denoting the t-th utterance of the dialogue context, its forward semantic representation matrix $M_{2,t}$ and reverse semantic representation matrix $M_{3,t}$ with answer a are computed as:

$$M_{2,t}=\overrightarrow{H}^{u_t}\left(\overrightarrow{H}^{a}\right)^{T},\qquad M_{3,t}=\overleftarrow{H}^{u_t}\left(\overleftarrow{H}^{a}\right)^{T}$$

where $M_{2,t},M_{3,t}\in\mathbb{R}^{L_t\times L_a}$.
Further, step B4 specifically comprises the following steps:

Step B41: Merge $M_{1,t}$, $M_{2,t}$, $M_{3,t}$ into the tensor

$$M_t=\left[M_{1,t},M_{2,t},M_{3,t}\right]\in\mathbb{R}^{L_t\times L_a\times 3}$$

Step B42: Feed $M_t$ into a two-dimensional convolutional neural network for convolution and pooling, then into a fully connected layer for dimensionality reduction, obtaining a representation vector $v_t\in\mathbb{R}^{d_3}$ that fuses the semantic information of $u_t$ and a, where $d_3$ is the output dimension of the fully connected layer.

Step B43: For each utterance of the dialogue context U, compute its representation vector with answer a in this way, yielding the sequence $\left[v_1,v_2,\ldots,v_{L_u}\right]$, where $L_u$ is the number of utterances in the dialogue context U.
Further, in step B5, the sequence of representation vectors $\left[v_1,v_2,\ldots,v_{L_u}\right]$ is fed into a bidirectional GRU network, which models the relationship between the dialogue context and the answer; the final output hidden state is taken as the representation vector $v\in\mathbb{R}^{2d_2}$ that fuses the context dependencies and semantic information of the dialogue and the answer.
Further, step B7 specifically comprises the following steps:

Step B71: Feed the final representation vector v into the fully connected layer and normalize with softmax to compute the probability that the answer belongs to each class:

$$y=W_s\,v+b_s$$
$$g_c(U,a)=\mathrm{softmax}(y)$$

where $W_s$ is the weight matrix of the fully connected layer, $b_s$ is its bias term, and $g_c(U,a)$ is the probability that the answer matches the dialogue context U of the training sample (U, a) processed in step B1, with $0\le g_c(U,a)\le 1$ and $c\in\{\text{correct},\text{wrong}\}$.

Step B72: Compute the loss value using cross-entropy as the loss function, update the learning rate with the AdaGrad gradient optimization algorithm, and iteratively update the model parameters by backpropagation, training the model by minimizing the loss function.

The loss function Loss to be minimized is computed as:

$$\mathrm{Loss}=-\sum_{i=1}^{N}\left[y_i\log g(U_i,a_i)+(1-y_i)\log\left(1-g(U_i,a_i)\right)\right]$$

where $(U_i,a_i)$ denotes the i-th training sample in the dialogue training set TS and $y_i\in\{0,1\}$ is its class label.
The present invention also provides a multi-turn dialogue system employing the above method, comprising:

a training set construction module for collecting dialogue contexts and answer data and constructing the dialogue training set TS;

a model training module for using the dialogue training set TS to train the deep learning network model incorporating bidirectional GRU networks; and

a multi-turn dialogue module for conversing with the user, feeding the user's question into the trained deep learning network model, and outputting the best-matching answer.
Compared with the prior art, the present invention has the following beneficial effects: it provides a multi-turn dialogue method and system based on a bidirectional GRU network in which multi-head attention captures long-term dependencies and, being finer-grained than traditional attention mechanisms, reduces the influence of noise. At the same time, the bidirectional GRU better captures the temporal relationships between utterances, improving the accuracy and matching quality of the answers given to user questions. The method and system are highly practical and have broad application prospects.
DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart of the method according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of the system structure according to an embodiment of the present invention.

FIG. 3 is a model architecture diagram of an embodiment of the present invention.
DETAILED DESCRIPTION

The present invention is further described in detail below with reference to the accompanying drawings and specific embodiments.
The present invention provides a multi-turn dialogue method based on a bidirectional GRU network which, as shown in FIG. 1, comprises the following steps:

Step A: Collect dialogue contexts and answer data, and construct a dialogue training set TS.

Step B: Use the dialogue training set TS to train a deep learning network model incorporating bidirectional GRU networks.
FIG. 3 shows the architecture of the deep learning network model in this embodiment of the present invention. Training the model with the dialogue training set TS specifically comprises the following steps:

Step B1: Traverse the dialogue training set TS and encode the dialogue context and the answer of each training sample to obtain their initial representation vectors.

Here the dialogue training set is expressed as $TS=\{(U_i,a_i,y_i)\}_{i=1}^{N}$, where N is the number of training samples and (U, a) denotes a training sample in TS consisting of a dialogue context U and an answer a. The dialogue context U consists of multiple utterances of the dialogue; each utterance of U and the answer a are encoded to obtain their initial representation vectors. If $u_t$ denotes the t-th utterance of the dialogue context U, its initial representation vector is expressed as:

$$R^{u_t}=\left[e^{u_t}_1,e^{u_t}_2,\ldots,e^{u_t}_{L_t}\right]\in\mathbb{R}^{L_t\times d_1}$$

The initial representation vector of answer a is expressed as:

$$R^{a}=\left[e^{a}_1,e^{a}_2,\ldots,e^{a}_{L_a}\right]\in\mathbb{R}^{L_a\times d_1}$$

where $L_t$ and $L_a$ denote the numbers of words remaining in $u_t$ and a after word segmentation and stop-word removal, $e^{u_t}_i$ and $e^{a}_i$ are the word vectors of the i-th words of $u_t$ and a, obtained by lookup in the pretrained word-vector matrix $E\in\mathbb{R}^{|D|\times d_1}$, $d_1$ is the dimension of the word vectors, and |D| is the number of words in the dictionary.
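For illustration, the encoding in step B1 amounts to an embedding lookup. Below is a minimal PyTorch sketch; the dimensions, vocabulary size, and word ids are assumptions chosen for the example, and a randomly initialized `nn.Embedding` stands in for the pretrained word-vector matrix E:

```python
# Sketch of step B1: encode word-id sequences into initial representation
# vectors via a (stand-in for a pretrained) word-embedding matrix.
import torch

d1 = 200                                  # word-vector dimension d1 (assumed)
vocab_size = 50000                        # |D|, dictionary size (assumed)
emb = torch.nn.Embedding(vocab_size, d1)  # stand-in for the pretrained matrix E

def encode(word_ids):
    """Stack the word vectors of a tokenized utterance into an (L, d1) matrix."""
    return emb(torch.tensor(word_ids))    # R^{u_t} or R^{a}

R_u = encode([4, 17, 256, 9])   # a context utterance with L_t = 4 words
R_a = encode([8, 3, 77])        # a candidate answer with L_a = 3 words
print(R_u.shape, R_a.shape)     # torch.Size([4, 200]) torch.Size([3, 200])
```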
Step B2: Feed the initial representation vectors of the dialogue context and the answer into the multi-head attention module to obtain semantic representation vectors of the dialogue and the answer, and compute the word similarity matrix between the dialogue and the answer. This specifically comprises the following steps:

Step B21: Choose an integer s that divides $d_1$. For each utterance of the dialogue context, split its initial representation vector $R^{u_t}$ and the initial representation vector $R^{a}$ of the answer evenly along the last dimension into s sub-vectors, obtaining the sub-vector sequences $\{R^{u_t}_1,\ldots,R^{u_t}_s\}$ and $\{R^{a}_1,\ldots,R^{a}_s\}$, where $R^{u_t}_h\in\mathbb{R}^{L_t\times d_1/s}$ is the h-th sub-vector of $u_t$ and $R^{a}_h\in\mathbb{R}^{L_a\times d_1/s}$ is the h-th sub-vector of a.

Step B22: Pair each sub-vector of $R^{u_t}$ with the corresponding sub-vector of $R^{a}$, i.e. $(R^{u_t}_h,R^{a}_h)$, $h=1,2,\ldots,s$, and feed the pairs into the attention module to compute the semantic representation vector $O^{u_t}_h$ of $R^{u_t}_h$ and the semantic representation vector $O^{a}_h$ of $R^{a}_h$.

$O^{u_t}_h$ is computed as:

$$O^{u_t}_h=\mathrm{softmax}\!\left(\frac{R^{u_t}_h\left(R^{a}_h\right)^{T}}{\sqrt{d_1/s}}\right)R^{a}_h$$

$O^{a}_h$ is computed as:

$$O^{a}_h=\mathrm{softmax}\!\left(\frac{R^{a}_h\left(R^{u_t}_h\right)^{T}}{\sqrt{d_1/s}}\right)R^{u_t}_h$$

where T denotes the matrix transpose operation.

Compute the weighted concatenation of $O^{u_t}_1,\ldots,O^{u_t}_s$ to obtain the semantic representation vector $\hat{R}^{u_t}$ of $u_t$:

$$\hat{R}^{u_t}=\left[O^{u_t}_1,O^{u_t}_2,\ldots,O^{u_t}_s\right]W_1$$

Compute the weighted concatenation of $O^{a}_1,\ldots,O^{a}_s$ to obtain the semantic representation vector $\hat{R}^{a}$ of a:

$$\hat{R}^{a}=\left[O^{a}_1,O^{a}_2,\ldots,O^{a}_s\right]W_2$$

where $W_1$, $W_2$ are training parameters of the multi-head attention mechanism.

Step B23: Compute the word similarity matrix between each utterance of the dialogue context and the answer. With $u_t$ denoting the t-th utterance of the dialogue context, its word similarity matrix $M_{1,t}$ with answer a is computed as:

$$M_{1,t}=\hat{R}^{u_t}\left(\hat{R}^{a}\right)^{T}$$
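As an illustration of steps B21–B23, the following PyTorch sketch splits the two representations into s heads, applies scaled dot-product cross-attention per head, concatenates the head outputs with the learned projections $W_1$/$W_2$, and forms the word similarity matrix $M_{1,t}$. The scaling factor $\sqrt{d_1/s}$ and the random inputs and parameters are assumptions of the example, not values fixed by the method:

```python
# Sketch of steps B21-B23: per-head cross-attention and word similarity matrix.
import torch
import torch.nn.functional as F

def multi_head_match(R_u, R_a, s, W1, W2):
    d1 = R_u.size(-1)
    dh = d1 // s                          # sub-vector width d1/s
    U_heads = R_u.split(dh, dim=-1)       # s tensors of shape (L_t, d1/s)
    A_heads = R_a.split(dh, dim=-1)       # s tensors of shape (L_a, d1/s)
    O_u, O_a = [], []
    for R_uh, R_ah in zip(U_heads, A_heads):
        scale = dh ** 0.5                 # assumed scaled dot-product attention
        O_u.append(F.softmax(R_uh @ R_ah.T / scale, dim=-1) @ R_ah)
        O_a.append(F.softmax(R_ah @ R_uh.T / scale, dim=-1) @ R_uh)
    R_u_hat = torch.cat(O_u, dim=-1) @ W1  # semantic representation of u_t
    R_a_hat = torch.cat(O_a, dim=-1) @ W2  # semantic representation of a
    M1 = R_u_hat @ R_a_hat.T               # word similarity matrix (L_t, L_a)
    return R_u_hat, R_a_hat, M1

d1, s = 200, 4
W1, W2 = torch.randn(d1, d1), torch.randn(d1, d1)  # stand-ins for trained W1, W2
_, _, M1 = multi_head_match(torch.randn(6, d1), torch.randn(5, d1), s, W1, W2)
print(M1.shape)   # torch.Size([6, 5])
```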
Step B3: Feed the initial representation vectors obtained in step B1 into a bidirectional GRU network, compute the bidirectional hidden states of the dialogue and the answer, and then compute the forward and reverse semantic representation matrices between the dialogue and the answer. This specifically comprises the following steps:

Step B31: Treat the initial representation vector of the answer as a sequence of word vectors and feed it into a bidirectional GRU network to compute the forward and reverse hidden state vectors.

Treating the initial representation vector $R^{a}$ of the answer as the sequence $e^{a}_1,e^{a}_2,\ldots,e^{a}_{L_a}$, feed it into the forward GRU in turn to obtain the forward hidden state vectors $\overrightarrow{H}^{a}=\left[\overrightarrow{h}^{a}_1,\ldots,\overrightarrow{h}^{a}_{L_a}\right]$ of the answer.

Feed $e^{a}_{L_a},\ldots,e^{a}_1$ into the reverse GRU in turn to obtain the reverse hidden state vectors $\overleftarrow{H}^{a}=\left[\overleftarrow{h}^{a}_1,\ldots,\overleftarrow{h}^{a}_{L_a}\right]$ of the answer, where $\overrightarrow{h}^{a}_i,\overleftarrow{h}^{a}_i\in\mathbb{R}^{d_2}$ and $d_2$ is the number of GRU units.

Step B32: Treat the initial representation vector of each utterance of the dialogue context as a sequence of word vectors and feed it into the bidirectional GRU network to compute the forward and reverse hidden state vectors.

With $u_t$ denoting the t-th utterance of the dialogue context, treat $R^{u_t}$ as the sequence $e^{u_t}_1,\ldots,e^{u_t}_{L_t}$ and feed it into the forward GRU in turn to obtain the forward hidden state vectors $\overrightarrow{H}^{u_t}=\left[\overrightarrow{h}^{u_t}_1,\ldots,\overrightarrow{h}^{u_t}_{L_t}\right]$ of $u_t$.

Feed $e^{u_t}_{L_t},\ldots,e^{u_t}_1$ into the reverse GRU in turn to obtain the reverse hidden state vectors $\overleftarrow{H}^{u_t}=\left[\overleftarrow{h}^{u_t}_1,\ldots,\overleftarrow{h}^{u_t}_{L_t}\right]$ of $u_t$, where $\overrightarrow{h}^{u_t}_i,\overleftarrow{h}^{u_t}_i\in\mathbb{R}^{d_2}$.

Step B33: Compute the forward and reverse semantic representation matrices between each utterance of the dialogue context and the answer. With $u_t$ denoting the t-th utterance of the dialogue context, its forward semantic representation matrix $M_{2,t}$ and reverse semantic representation matrix $M_{3,t}$ with answer a are computed as:

$$M_{2,t}=\overrightarrow{H}^{u_t}\left(\overrightarrow{H}^{a}\right)^{T},\qquad M_{3,t}=\overleftarrow{H}^{u_t}\left(\overleftarrow{H}^{a}\right)^{T}$$

where $M_{2,t},M_{3,t}\in\mathbb{R}^{L_t\times L_a}$.
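A corresponding sketch of step B3, with the number of GRU units $d_2$ assumed to be 128: a bidirectional GRU produces the forward and reverse hidden states for $u_t$ and a, from which $M_{2,t}$ and $M_{3,t}$ are built as dot products:

```python
# Sketch of steps B31-B33: bidirectional hidden states and the forward/reverse
# semantic representation matrices.
import torch

d1, d2 = 200, 128
bigru = torch.nn.GRU(input_size=d1, hidden_size=d2, bidirectional=True)

def bi_states(R):
    out, _ = bigru(R.unsqueeze(1))     # (L, 1, 2*d2): forward ++ reverse states
    out = out.squeeze(1)
    return out[:, :d2], out[:, d2:]    # forward H, reverse H

R_u, R_a = torch.randn(6, d1), torch.randn(5, d1)  # assumed inputs from step B1
Hu_f, Hu_b = bi_states(R_u)
Ha_f, Ha_b = bi_states(R_a)
M2 = Hu_f @ Ha_f.T                     # forward semantic matrix (L_t, L_a)
M3 = Hu_b @ Ha_b.T                     # reverse semantic matrix (L_t, L_a)
print(M2.shape, M3.shape)              # torch.Size([6, 5]) torch.Size([6, 5])
```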
Step B4: Merge the word similarity matrix, the forward semantic representation matrix, and the reverse semantic representation matrix into a tensor, feed it into a two-dimensional convolutional neural network, and then perform feature dimensionality reduction to obtain a sequence of representation vectors that fuses the semantic information of the dialogue and the answer. This specifically comprises the following steps:

Step B41: Merge $M_{1,t}$, $M_{2,t}$, $M_{3,t}$ into the tensor

$$M_t=\left[M_{1,t},M_{2,t},M_{3,t}\right]\in\mathbb{R}^{L_t\times L_a\times 3}$$

Step B42: Feed $M_t$ into a two-dimensional convolutional neural network for convolution and pooling, then into a fully connected layer for dimensionality reduction, obtaining a representation vector $v_t\in\mathbb{R}^{d_3}$ that fuses the semantic information of $u_t$ and a, where $d_3$ is the output dimension of the fully connected layer.

Step B43: For each utterance of the dialogue context U, compute its representation vector with answer a in this way, yielding the sequence $\left[v_1,v_2,\ldots,v_{L_u}\right]$, where $L_u$ is the number of utterances in the dialogue context U.
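The following sketch illustrates steps B41–B42. The channel count, kernel size, and pooling size are assumptions, since the description does not fix them; what matters is the pattern of stacking the three matrices as channels, convolving, pooling, and reducing to $d_3$ dimensions:

```python
# Sketch of steps B41-B42: stack M1, M2, M3 as channels, then CNN + FC.
import torch

L_t, L_a, d3 = 6, 5, 64
M1 = torch.randn(L_t, L_a)          # assumed inputs from steps B23 and B33
M2 = torch.randn(L_t, L_a)
M3 = torch.randn(L_t, L_a)
M_t = torch.stack([M1, M2, M3]).unsqueeze(0)   # (1, 3, L_t, L_a)

conv = torch.nn.Conv2d(in_channels=3, out_channels=8, kernel_size=2)
pool = torch.nn.MaxPool2d(kernel_size=2)
feat = pool(torch.relu(conv(M_t)))             # (1, 8, 2, 2) with these sizes
fc = torch.nn.Linear(feat.numel(), d3)         # fully connected dim. reduction
v_t = fc(feat.flatten())                       # representation vector v_t
print(v_t.shape)                               # torch.Size([64])
```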
Step B5: Feed the sequence of representation vectors obtained in step B4 into a bidirectional GRU network to obtain a representation vector that fuses the context dependencies and semantic information of the dialogue and the answer.

Specifically, the sequence $\left[v_1,v_2,\ldots,v_{L_u}\right]$ is fed into a bidirectional GRU network, which models the relationship between the dialogue context and the answer; the final output hidden state is taken as the representation vector $v\in\mathbb{R}^{2d_2}$ that fuses the context dependencies and semantic information of the dialogue and the answer.
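A sketch of this aggregation step, under the assumption that the final vector v is the concatenation of the last forward and last backward hidden states of the matching-level bidirectional GRU:

```python
# Sketch of step B5: aggregate [v_1, ..., v_{L_u}] into the matching vector v.
import torch

d2, d3, L_u = 128, 64, 10
match_gru = torch.nn.GRU(input_size=d3, hidden_size=d2, bidirectional=True)

v_seq = torch.randn(L_u, 1, d3)      # one v_t per context utterance (assumed)
_, h_n = match_gru(v_seq)            # h_n: (2, 1, d2), final fwd/bwd states
v = torch.cat([h_n[0], h_n[1]], dim=-1).squeeze(0)   # final vector in R^{2*d2}
print(v.shape)                        # torch.Size([256])
```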
Step B6: Repeat steps B2–B5 to compute, for every training sample in the dialogue training set, the representation vector that fuses the context dependencies and semantic information of the dialogue and the answer.

Step B7: Feed the representation vectors of all samples into the fully connected layer of the deep learning network model; according to the target loss function, compute the gradient of each parameter of the deep network by backpropagation, and update the parameters by stochastic gradient descent. This specifically comprises the following steps:

Step B71: Feed the final representation vector v into the fully connected layer and normalize with softmax to compute the probability that the answer belongs to each class:

$$y=W_s\,v+b_s$$
$$g_c(U,a)=\mathrm{softmax}(y)$$

where $W_s$ is the weight matrix of the fully connected layer, $b_s$ is its bias term, and $g_c(U,a)$ is the probability that the answer matches the dialogue context U of the training sample (U, a) processed in step B1, with $0\le g_c(U,a)\le 1$ and $c\in\{\text{correct},\text{wrong}\}$.

Step B72: Compute the loss value using cross-entropy as the loss function, update the learning rate with the AdaGrad gradient optimization algorithm, and iteratively update the model parameters by backpropagation, training the model by minimizing the loss function.

The loss function Loss to be minimized is computed as:

$$\mathrm{Loss}=-\sum_{i=1}^{N}\left[y_i\log g(U_i,a_i)+(1-y_i)\log\left(1-g(U_i,a_i)\right)\right]$$

where $(U_i,a_i)$ denotes the i-th training sample in the dialogue training set TS and $y_i\in\{0,1\}$ is its class label.
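One training step of B71–B72 can be sketched as follows. The batch size, dimensions, and learning rate are assumptions, and PyTorch's `CrossEntropyLoss` folds the softmax of step B71 into the loss computation:

```python
# Sketch of step B7: fully connected layer + cross-entropy + AdaGrad update.
import torch

d = 256                                # dimension of v (2*d2, assumed)
fc = torch.nn.Linear(d, 2)             # W_s, b_s; classes {correct, wrong}
optimizer = torch.optim.Adagrad(fc.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()  # softmax + cross-entropy combined

v_batch = torch.randn(32, d)           # representation vectors of one batch
y_batch = torch.randint(0, 2, (32,))   # class labels y_i in {0, 1}

optimizer.zero_grad()
loss = loss_fn(fc(v_batch), y_batch)   # batch loss value
loss.backward()                        # gradients via backpropagation
optimizer.step()                       # AdaGrad parameter update
print(float(loss))
```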
Step B8: Terminate the training of the deep learning network model when the loss value it produces falls below a set threshold or the maximum number of iterations is reached.

Step C: Converse with the user, feed the user's question into the trained deep learning network model, and output the matching answer.
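In deployment, step C amounts to scoring each candidate answer against the current context and returning the highest-scoring one. The sketch below assumes a `score(U, a)` function wrapping the trained model of step B; the dummy scorer in the usage lines is for illustration only:

```python
# Sketch of step C: rank candidate answers by matching probability.
def respond(context_utterances, candidates, score):
    ranked = sorted(candidates, key=lambda a: score(context_utterances, a),
                    reverse=True)
    return ranked[0]   # the candidate with the highest matching score

# Usage with a dummy scorer standing in for the trained model:
answer = respond(["hi", "how are you"], ["fine, thanks", "blue"],
                 score=lambda U, a: len(a))
print(answer)
```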
The present invention also provides a multi-turn dialogue system employing the above method which, as shown in FIG. 2, comprises:

a training set construction module for collecting dialogue contexts and answer data and constructing the dialogue training set TS;

a model training module for using the dialogue training set TS to train the deep learning network model incorporating bidirectional GRU networks; and

a multi-turn dialogue module for conversing with the user, feeding the user's question into the trained deep learning network model, and outputting the best-matching answer.
The above are preferred embodiments of the present invention; any changes made according to the technical solution of the present invention whose resulting functional effects do not exceed the scope of the technical solution of the present invention fall within the protection scope of the present invention.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010067240.9A | 2020-01-20 | 2020-01-20 | Multi-turn dialogue method and system based on bidirectional GRU network |

| Publication Number | Publication Date |
|---|---|
| CN111274375A (application) | 2020-06-12 |
| CN111274375B (granted) | 2022-06-14 |

Legal status: granted; patent right subsequently terminated due to non-payment of the annual fee (Expired - Fee Related).