
A multi-round dialogue method and system based on bidirectional GRU network

Info

Publication number
CN111274375A
CN111274375A
Authority
CN
China
Prior art keywords
answer
vector
context
dialogue
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010067240.9A
Other languages
Chinese (zh)
Other versions
CN111274375B (en)
Inventor
陈羽中
谢琪
刘漳辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN202010067240.9A
Publication of CN111274375A
Application granted
Publication of CN111274375B
Expired - Fee Related
Anticipated expiration

Abstract

The invention relates to a multi-round dialogue method and system based on a bidirectional GRU network, wherein the method comprises the following steps: step A: collecting dialogue context and answer data and constructing a dialogue training set D; step B: using the dialogue training set D to train a deep learning network model M fused with a bidirectional GRU network; step C: conversing with the user, inputting the user question into the trained deep learning network model M, and outputting the matched answer. The method and the system help improve the matching of answers to user questions.

Description

Translated from Chinese

A multi-round dialogue method and system based on bidirectional GRU network

Technical Field

The present invention relates to the field of natural language processing, and in particular to a multi-round dialogue method and system based on a bidirectional GRU network.

Background Art

In recent years, with the rapid development of deep learning and neural networks, the field of artificial intelligence has undergone a new transformation. As one of the core technologies of artificial intelligence, multi-round dialogue has become a research hotspot. It can be widely applied across industries such as human-computer interaction, smart homes, intelligent customer service, intelligent tutoring, and social robots, and therefore carries significant research, academic, and application value, attracting sustained attention from academia and strong interest from industry.

Lowe et al. concatenated the dialogue context literally, forming a concatenated context matrix that is matched against the answer, thereby considering the overall semantics of the dialogue context. Yan et al. concatenated contextual sentences with the input message as a new query and performed matching with a deep neural network architecture. Zhou et al. improved multi-angle response selection using a multi-view model that includes both an utterance view and a word view. Zhou et al. also proposed an attention-based matching algorithm for dialogue contexts and answers; this method constructs two matching matrices using a scale-attention-based self-attention mechanism and an interactive attention mechanism, and its effectiveness was verified. Wu et al. matched each candidate answer against every sentence of the context and used an RNN to preserve the sequential semantics of the sentences; this framework improved system performance, showing that the interaction between the answer and each context utterance is effective. Zhou et al. further interacted the answer with every context sentence, using an encoding layer as the transformation rather than an RNN to represent sentences at different levels; they used an attention mechanism to extract richer dependency information between the dialogue and the answer, accumulating all of this information to compute the matching degree. Although existing attention mechanism models can extract richer dependency information between dialogues and answers, they are easily affected by noise and cannot capture long-term dependencies.

Summary of the Invention

The purpose of the present invention is to provide a multi-round dialogue method and system based on a bidirectional GRU network, which help improve the matching of answers to user questions.

To achieve the above purpose, the technical solution adopted by the present invention is a multi-round dialogue method based on a bidirectional GRU network, comprising the following steps:

Step A: collect dialogue context and answer data, and construct a dialogue training set TS;

Step B: use the dialogue training set TS to train a deep learning network model fused with a bidirectional GRU network;

Step C: converse with the user, input the user's question into the trained deep learning network model, and output the matching answer.

Further, step B specifically comprises the following steps:

Step B1: traverse the dialogue training set TS, and encode the dialogue context and answer of each training sample to obtain initial representation vectors;

Step B2: input the initial representation vectors of the dialogue context and the answer into a multi-head attention module to obtain semantic representation vectors of the dialogue and the answer, and compute a word similarity matrix between the dialogue and the answer;

Step B3: input the initial representation vectors of the dialogue context and the answer obtained in step B1 into a bidirectional GRU network, compute the bidirectional hidden states of the dialogue and the answer, and then compute the forward and backward semantic representation matrices of the dialogue and the answer;

Step B4: merge the word similarity matrix, the forward semantic representation matrix, and the backward semantic representation matrix of the dialogue and the answer into a tensor, input it into a two-dimensional convolutional neural network, and then perform feature dimensionality reduction to obtain a sequence of representation vectors fusing the semantic information of the dialogue and the answer;

Step B5: input the sequence of representation vectors obtained in step B4 into a bidirectional GRU network to obtain a representation vector $\tilde{v}$ fusing the context dependencies and semantic information of the dialogue and the answer;

Step B6: repeat steps B2-B5 to compute the representation vectors $\tilde{v}$ fusing the context dependencies and semantic information of the dialogue and the answer for all training samples in the dialogue training set;

Step B7: input the representation vectors $\tilde{v}$ of all samples into the fully connected layer of the deep learning network model; according to the target loss function Loss, compute the gradient of each parameter in the deep network by backpropagation, and update the parameters by stochastic gradient descent;

Step B8: when the loss value produced by the deep learning network model falls below the set threshold or the maximum number of iterations is reached, terminate the training of the deep learning network model.

Further, in step B1, the dialogue training set is expressed as $TS = \{(U_i, a_i)\}_{i=1}^{N}$, where N denotes the number of training samples and (U, a) denotes a training sample in TS consisting of a dialogue context U and an answer a. The dialogue context U consists of multiple utterances from the dialogue process; each utterance in U and the answer a are encoded separately to obtain their initial representation vectors. If $u_t$ denotes the t-th utterance in the dialogue context U, its initial representation vector $\mathbf{u}_t$ is expressed as:

$$\mathbf{u}_t = \left[ e^{u_t}_1, e^{u_t}_2, \ldots, e^{u_t}_{L_t} \right]$$

The initial representation vector of the answer a is expressed as:

$$\mathbf{a} = \left[ e^{a}_1, e^{a}_2, \ldots, e^{a}_{L_a} \right]$$

where $\mathbf{u}_t \in \mathbb{R}^{L_t \times d_1}$ and $\mathbf{a} \in \mathbb{R}^{L_a \times d_1}$; $L_t$ and $L_a$ denote the number of words remaining in $u_t$ and a, respectively, after word segmentation and stop-word removal; $e^{u_t}_i$ and $e^{a}_i$ are the word vectors of the i-th words of $u_t$ and a, obtained by lookup in the pretrained word vector matrix $E \in \mathbb{R}^{d_1 \times |D|}$, where $d_1$ denotes the dimension of the word vectors and |D| denotes the number of words in the dictionary.

Further, step B2 specifically comprises the following steps:

Step B21: select an integer s that divides $d_1$; for each utterance in the dialogue context, split its initial representation vector $\mathbf{u}_t$ and the initial representation vector $\mathbf{a}$ of the answer evenly into s sub-vectors along the last dimension, obtaining the sub-vector sequences $\{u^1_t, u^2_t, \ldots, u^s_t\}$ and $\{a^1, a^2, \ldots, a^s\}$, where $u^h_t$ is the h-th sub-vector of $u_t$ and $a^h$ is the h-th sub-vector of $\mathbf{a}$;

Step B22: pair each sub-vector of $\mathbf{u}_t$ with the corresponding sub-vector of $\mathbf{a}$, i.e. $(u^h_t, a^h)$, $h = 1, 2, \ldots, s$, and input the pairs into the attention module, computing the semantic representation vector $\hat{u}^h_t$ of $u^h_t$ and the semantic representation vector $\hat{a}^h$ of $a^h$;

where $\hat{u}^h_t$ is computed as:

$$\hat{u}^h_t = \mathrm{softmax}\!\left( u^h_t \, (a^h)^T \right) a^h$$

and $\hat{a}^h$ is computed as:

$$\hat{a}^h = \mathrm{softmax}\!\left( a^h \, (u^h_t)^T \right) u^h_t$$

where T denotes the matrix transpose operation;

compute the weighted concatenation of $\hat{u}^1_t, \ldots, \hat{u}^s_t$ to obtain the semantic representation vector $\hat{U}_t$ of $u_t$, expressed as:

$$\hat{U}_t = W_1 \left[ \hat{u}^1_t \oplus \hat{u}^2_t \oplus \cdots \oplus \hat{u}^s_t \right]$$

compute the weighted concatenation of $\hat{a}^1, \ldots, \hat{a}^s$ to obtain the semantic representation vector $\hat{A}$ of a, expressed as:

$$\hat{A} = W_2 \left[ \hat{a}^1 \oplus \hat{a}^2 \oplus \cdots \oplus \hat{a}^s \right]$$

where $W_1$, $W_2$ are training parameters of the multi-head attention mechanism;

Step B23: compute the word similarity matrix between each utterance in the dialogue context and the answer; with $u_t$ denoting the t-th utterance in the dialogue context, its word similarity matrix $M_{1,t} \in \mathbb{R}^{L_t \times L_a}$ with the answer a is computed as:

$$M_{1,t} = \hat{U}_t \cdot \hat{A}^T$$

Further, step B3 specifically comprises the following steps:

Step B31: treat the initial representation vector of the answer as a sequence of word vectors and input it into the bidirectional GRU network, computing the forward and backward hidden state vectors;

the initial representation vector $\mathbf{a}$ of the answer is treated as the sequence $e^a_1, e^a_2, \ldots, e^a_{L_a}$ and fed in order into the forward GRU, yielding the forward hidden state vectors $\overrightarrow{h}^a_1, \ldots, \overrightarrow{h}^a_{L_a}$ of the answer; the sequence $e^a_{L_a}, \ldots, e^a_1$ is fed in order into the backward GRU, yielding the backward hidden state vectors $\overleftarrow{h}^a_1, \ldots, \overleftarrow{h}^a_{L_a}$, where $\overrightarrow{h}^a_i, \overleftarrow{h}^a_i \in \mathbb{R}^{d_2}$ and $d_2$ is the number of GRU units;

Step B32: treat the initial representation vector of each utterance in the dialogue context as a sequence of word vectors and input it into the bidirectional GRU network, computing the forward and backward hidden state vectors;

with $u_t$ denoting the t-th utterance in the dialogue context, $\mathbf{u}_t$ is treated as the sequence $e^{u_t}_1, \ldots, e^{u_t}_{L_t}$ and fed in order into the forward GRU, yielding the forward hidden state vectors $\overrightarrow{h}^{u_t}_1, \ldots, \overrightarrow{h}^{u_t}_{L_t}$ of $u_t$; the sequence $e^{u_t}_{L_t}, \ldots, e^{u_t}_1$ is fed in order into the backward GRU, yielding the backward hidden state vectors $\overleftarrow{h}^{u_t}_1, \ldots, \overleftarrow{h}^{u_t}_{L_t}$, where $\overrightarrow{h}^{u_t}_i, \overleftarrow{h}^{u_t}_i \in \mathbb{R}^{d_2}$;

Step B33: compute the forward and backward semantic representation matrices between each utterance in the dialogue context and the answer; with $u_t$ denoting the t-th utterance, its forward semantic representation matrix $M_{2,t}$ and backward semantic representation matrix $M_{3,t}$ with the answer a are computed as:

$$M_{2,t} = \overrightarrow{H}_{u_t} \cdot \overrightarrow{H}_a^{\,T}$$

$$M_{3,t} = \overleftarrow{H}_{u_t} \cdot \overleftarrow{H}_a^{\,T}$$

where $\overrightarrow{H}_{u_t} = [\overrightarrow{h}^{u_t}_1, \ldots, \overrightarrow{h}^{u_t}_{L_t}]$, $\overleftarrow{H}_{u_t} = [\overleftarrow{h}^{u_t}_1, \ldots, \overleftarrow{h}^{u_t}_{L_t}]$, $\overrightarrow{H}_a = [\overrightarrow{h}^a_1, \ldots, \overrightarrow{h}^a_{L_a}]$, $\overleftarrow{H}_a = [\overleftarrow{h}^a_1, \ldots, \overleftarrow{h}^a_{L_a}]$, and $M_{2,t}, M_{3,t} \in \mathbb{R}^{L_t \times L_a}$.

Further, step B4 specifically comprises the following steps:

Step B41: merge $M_{1,t}$, $M_{2,t}$, $M_{3,t}$ to obtain the tensor $M_t \in \mathbb{R}^{3 \times L_t \times L_a}$:

$$M_t = \left[ M_{1,t}, M_{2,t}, M_{3,t} \right]$$

Step B42: input $M_t$ into the two-dimensional convolutional neural network for convolution and pooling, then input the result into a fully connected layer for dimensionality reduction, obtaining a representation vector $v_t \in \mathbb{R}^{d_3}$ that fuses the semantic information of $u_t$ and a, where $d_3$ is the dimension after dimensionality reduction by the fully connected layer;

Step B43: for each utterance in the dialogue context U, compute its representation vector fusing its semantic information with that of the answer a, yielding the sequence $\{v_1, v_2, \ldots, v_{L_u}\}$, where $L_u$ is the number of utterances in the dialogue context U.

Further, in step B5, the representation vector sequence $\{v_1, v_2, \ldots, v_{L_u}\}$ is input into the bidirectional GRU network, which models the relationship between the dialogue context and the answer; the hidden state vector output at the last step is taken as the representation vector $\tilde{v} \in \mathbb{R}^{2 d_2}$ that fuses the context dependencies and semantic information of the dialogue and the answer.

Further, step B7 specifically comprises the following steps:

Step B71: input the final representation vector $\tilde{v}$ into the fully connected layer and apply softmax normalization to compute the probability that the answer belongs to each category, as follows:

$$y = W_s \tilde{v} + b_s$$

$$g_c(U, a) = \mathrm{softmax}(y)$$

where $W_s$ is the weight matrix of the fully connected layer, $b_s$ is the bias term of the fully connected layer, and $g_c(U, a)$ is the probability that the answer belongs to the dialogue context U of the training sample (U, a) processed in step B1, with $0 \le g_c(U, a) \le 1$ and $c \in \{\text{correct}, \text{wrong}\}$;

Step B72: compute the loss value using cross entropy as the loss function, update the learning rate with the gradient optimization algorithm AdaGrad, and iteratively update the model parameters via backpropagation, training the model by minimizing the loss function;

the loss function Loss to be minimized is computed as:

$$Loss = -\sum_{i=1}^{N} \left[ y_i \log g_c(U_i, a_i) + (1 - y_i) \log\!\left(1 - g_c(U_i, a_i)\right) \right]$$

where $(U_i, a_i)$ denotes the i-th training sample in the dialogue training set TS and $y_i \in \{0, 1\}$ is its category label.

The present invention also provides a multi-round dialogue system employing the above method, comprising:

a training set construction module for collecting dialogue context and answer data and constructing the dialogue training set TS;

a model training module for training, with the dialogue training set TS, a deep learning network model fused with a bidirectional GRU network; and

a multi-round dialogue module for conversing with the user, inputting the user's question into the trained deep learning network model, and outputting the best-matching answer.

Compared with the prior art, the present invention has the following beneficial effects: it provides a multi-round dialogue method and system based on a bidirectional GRU network in which multi-head attention captures long-term dependencies and, being finer-grained than the traditional attention mechanism, reduces the influence of noise. Meanwhile, the bidirectional GRU better captures the temporal relationships between utterances, improving the accuracy and relevance of answers to user questions; the invention is highly practical and has broad application prospects.

Brief Description of the Drawings

Fig. 1 is a flowchart of the method implementation according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of the system structure according to an embodiment of the present invention.

Fig. 3 is a model architecture diagram of an embodiment of the present invention.

Detailed Description of the Embodiments

The present invention is further described in detail below with reference to the accompanying drawings and specific embodiments.

The present invention provides a multi-round dialogue method based on a bidirectional GRU network, as shown in Fig. 1, comprising the following steps:

Step A: collect dialogue context and answer data, and construct a dialogue training set TS.

Step B: use the dialogue training set TS to train a deep learning network model fused with a bidirectional GRU network.

Fig. 3 shows the architecture of the deep learning network model in an embodiment of the present invention. Training the model with the dialogue training set TS specifically comprises the following steps:

Step B1: traverse the dialogue training set TS, and encode the dialogue context and answer of each training sample to obtain initial representation vectors.

The dialogue training set is expressed as $TS = \{(U_i, a_i)\}_{i=1}^{N}$, where N denotes the number of training samples and (U, a) denotes a training sample in TS consisting of a dialogue context U and an answer a. The dialogue context U consists of multiple utterances from the dialogue process; each utterance in U and the answer a are encoded separately to obtain their initial representation vectors. If $u_t$ denotes the t-th utterance in the dialogue context U, its initial representation vector $\mathbf{u}_t$ is expressed as:

$$\mathbf{u}_t = \left[ e^{u_t}_1, e^{u_t}_2, \ldots, e^{u_t}_{L_t} \right]$$

The initial representation vector of the answer a is expressed as:

$$\mathbf{a} = \left[ e^{a}_1, e^{a}_2, \ldots, e^{a}_{L_a} \right]$$

where $\mathbf{u}_t \in \mathbb{R}^{L_t \times d_1}$ and $\mathbf{a} \in \mathbb{R}^{L_a \times d_1}$; $L_t$ and $L_a$ denote the number of words remaining in $u_t$ and a, respectively, after word segmentation and stop-word removal; $e^{u_t}_i$ and $e^{a}_i$ are the word vectors of the i-th words of $u_t$ and a, obtained by lookup in the pretrained word vector matrix $E \in \mathbb{R}^{d_1 \times |D|}$, where $d_1$ denotes the dimension of the word vectors and |D| denotes the number of words in the dictionary.
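In one possible implementation, the encoding of step B1 can be sketched as follows; this is a non-limiting sketch assuming PyTorch, in which the vocabulary size, dimensions, and token counts are illustrative assumptions and the embedding table stands in for the pretrained word vector matrix E:

```python
import torch
import torch.nn as nn

vocab_size, d1 = 10000, 200               # |D| and word-vector dimension (assumed values)
embedding = nn.Embedding(vocab_size, d1)  # stands in for the pretrained matrix E

utterance_ids = torch.randint(0, vocab_size, (1, 18))  # token ids of u_t (L_t = 18 after stop-word removal)
answer_ids = torch.randint(0, vocab_size, (1, 22))     # token ids of a (L_a = 22)

U_t = embedding(utterance_ids)   # (1, L_t, d1): initial representation vector of u_t
A = embedding(answer_ids)        # (1, L_a, d1): initial representation vector of a
```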

Step B2: input the initial representation vectors of the dialogue context and the answer into the multi-head attention module to obtain semantic representation vectors of the dialogue and the answer, and compute the word similarity matrix between the dialogue and the answer. Specifically, this comprises the following steps:

Step B21: select an integer s that divides $d_1$; for each utterance in the dialogue context, split its initial representation vector $\mathbf{u}_t$ and the initial representation vector $\mathbf{a}$ of the answer evenly into s sub-vectors along the last dimension, obtaining the sub-vector sequences $\{u^1_t, u^2_t, \ldots, u^s_t\}$ and $\{a^1, a^2, \ldots, a^s\}$, where $u^h_t$ is the h-th sub-vector of $u_t$ and $a^h$ is the h-th sub-vector of $\mathbf{a}$;

Step B22: pair each sub-vector of $\mathbf{u}_t$ with the corresponding sub-vector of $\mathbf{a}$, i.e. $(u^h_t, a^h)$, $h = 1, 2, \ldots, s$, and input the pairs into the attention module, computing the semantic representation vector $\hat{u}^h_t$ of $u^h_t$ and the semantic representation vector $\hat{a}^h$ of $a^h$;

where $\hat{u}^h_t$ is computed as:

$$\hat{u}^h_t = \mathrm{softmax}\!\left( u^h_t \, (a^h)^T \right) a^h$$

and $\hat{a}^h$ is computed as:

$$\hat{a}^h = \mathrm{softmax}\!\left( a^h \, (u^h_t)^T \right) u^h_t$$

where T denotes the matrix transpose operation;

compute the weighted concatenation of $\hat{u}^1_t, \ldots, \hat{u}^s_t$ to obtain the semantic representation vector $\hat{U}_t$ of $u_t$, expressed as:

$$\hat{U}_t = W_1 \left[ \hat{u}^1_t \oplus \hat{u}^2_t \oplus \cdots \oplus \hat{u}^s_t \right]$$

compute the weighted concatenation of $\hat{a}^1, \ldots, \hat{a}^s$ to obtain the semantic representation vector $\hat{A}$ of a, expressed as:

$$\hat{A} = W_2 \left[ \hat{a}^1 \oplus \hat{a}^2 \oplus \cdots \oplus \hat{a}^s \right]$$

where $W_1$, $W_2$ are training parameters of the multi-head attention mechanism;

Step B23: compute the word similarity matrix between each utterance in the dialogue context and the answer; with $u_t$ denoting the t-th utterance in the dialogue context, its word similarity matrix $M_{1,t} \in \mathbb{R}^{L_t \times L_a}$ with the answer a is computed as:

$$M_{1,t} = \hat{U}_t \cdot \hat{A}^T$$
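In one possible implementation, steps B21-B23 can be sketched as follows; this is a non-limiting sketch assuming PyTorch and assuming that the attention module is standard scaled dot-product attention applied per sub-vector pair, with W1 and W2 as the trainable projections named above and all sizes illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d1, s = 200, 4                       # word-vector dimension and head count (s divides d1)
W1 = nn.Linear(d1, d1, bias=False)   # training parameter for the utterance side
W2 = nn.Linear(d1, d1, bias=False)   # training parameter for the answer side

def multi_head_match(U_t, A):
    # U_t: (L_t, d1) utterance representation; A: (L_a, d1) answer representation
    u_heads = U_t.chunk(s, dim=-1)   # s sub-vectors of size d1/s each
    a_heads = A.chunk(s, dim=-1)
    scale = (d1 // s) ** 0.5
    u_hat, a_hat = [], []
    for u_h, a_h in zip(u_heads, a_heads):
        attn_u = F.softmax(u_h @ a_h.T / scale, dim=-1)  # (L_t, L_a) weights
        attn_a = F.softmax(a_h @ u_h.T / scale, dim=-1)  # (L_a, L_t) weights
        u_hat.append(attn_u @ a_h)   # semantic representation of the h-th sub-vector of u_t
        a_hat.append(attn_a @ u_h)   # semantic representation of the h-th sub-vector of a
    U_hat = W1(torch.cat(u_hat, dim=-1))   # weighted connection of the s heads
    A_hat = W2(torch.cat(a_hat, dim=-1))
    M1_t = U_hat @ A_hat.T                 # word similarity matrix, (L_t, L_a)
    return U_hat, A_hat, M1_t
```

Splitting into s sub-vectors before attending is what makes the mechanism finer-grained than single-head attention, which is the noise-reduction property claimed for the invention.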

Step B3: input the initial representation vectors of the dialogue context and the answer obtained in step B1 into the bidirectional GRU network, compute the bidirectional hidden states of the dialogue and the answer, and then compute the forward and backward semantic representation matrices of the dialogue and the answer. Specifically, this comprises the following steps:

Step B31: treat the initial representation vector of the answer as a sequence of word vectors and input it into the bidirectional GRU network, computing the forward and backward hidden state vectors;

the initial representation vector $\mathbf{a}$ of the answer is treated as the sequence $e^a_1, e^a_2, \ldots, e^a_{L_a}$ and fed in order into the forward GRU, yielding the forward hidden state vectors $\overrightarrow{h}^a_1, \ldots, \overrightarrow{h}^a_{L_a}$ of the answer; the sequence $e^a_{L_a}, \ldots, e^a_1$ is fed in order into the backward GRU, yielding the backward hidden state vectors $\overleftarrow{h}^a_1, \ldots, \overleftarrow{h}^a_{L_a}$, where $\overrightarrow{h}^a_i, \overleftarrow{h}^a_i \in \mathbb{R}^{d_2}$ and $d_2$ is the number of GRU units;

Step B32: treat the initial representation vector of each utterance in the dialogue context as a sequence of word vectors and input it into the bidirectional GRU network, computing the forward and backward hidden state vectors;

with $u_t$ denoting the t-th utterance in the dialogue context, $\mathbf{u}_t$ is treated as the sequence $e^{u_t}_1, \ldots, e^{u_t}_{L_t}$ and fed in order into the forward GRU, yielding the forward hidden state vectors $\overrightarrow{h}^{u_t}_1, \ldots, \overrightarrow{h}^{u_t}_{L_t}$ of $u_t$; the sequence $e^{u_t}_{L_t}, \ldots, e^{u_t}_1$ is fed in order into the backward GRU, yielding the backward hidden state vectors $\overleftarrow{h}^{u_t}_1, \ldots, \overleftarrow{h}^{u_t}_{L_t}$, where $\overrightarrow{h}^{u_t}_i, \overleftarrow{h}^{u_t}_i \in \mathbb{R}^{d_2}$;

Step B33: compute the forward and backward semantic representation matrices between each utterance in the dialogue context and the answer; with $u_t$ denoting the t-th utterance, its forward semantic representation matrix $M_{2,t}$ and backward semantic representation matrix $M_{3,t}$ with the answer a are computed as:

$$M_{2,t} = \overrightarrow{H}_{u_t} \cdot \overrightarrow{H}_a^{\,T}$$

$$M_{3,t} = \overleftarrow{H}_{u_t} \cdot \overleftarrow{H}_a^{\,T}$$

where $\overrightarrow{H}_{u_t} = [\overrightarrow{h}^{u_t}_1, \ldots, \overrightarrow{h}^{u_t}_{L_t}]$, $\overleftarrow{H}_{u_t} = [\overleftarrow{h}^{u_t}_1, \ldots, \overleftarrow{h}^{u_t}_{L_t}]$, $\overrightarrow{H}_a = [\overrightarrow{h}^a_1, \ldots, \overrightarrow{h}^a_{L_a}]$, $\overleftarrow{H}_a = [\overleftarrow{h}^a_1, \ldots, \overleftarrow{h}^a_{L_a}]$, and $M_{2,t}, M_{3,t} \in \mathbb{R}^{L_t \times L_a}$.
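In one possible implementation, step B3 can be sketched as follows; this is a non-limiting sketch assuming PyTorch, where one bidirectional GRU encodes both the utterance and the answer, and the forward and backward semantic matrices are assumed to be dot products between the directional hidden states; d2 is the GRU unit count named in the text:

```python
import torch
import torch.nn as nn

d1, d2 = 200, 128
bigru = nn.GRU(input_size=d1, hidden_size=d2, bidirectional=True, batch_first=True)

def directional_matrices(U_t, A):
    # U_t: (1, L_t, d1); A: (1, L_a, d1)
    H_u, _ = bigru(U_t)                   # (1, L_t, 2*d2), forward/backward states concatenated
    H_a, _ = bigru(A)
    fwd_u, bwd_u = H_u.split(d2, dim=-1)  # directional hidden states of u_t
    fwd_a, bwd_a = H_a.split(d2, dim=-1)
    M2_t = fwd_u @ fwd_a.transpose(1, 2)  # forward semantic matrix, (1, L_t, L_a)
    M3_t = bwd_u @ bwd_a.transpose(1, 2)  # backward semantic matrix
    return M2_t, M3_t
```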

Step B4: merge the word similarity matrix, the forward semantic representation matrix, and the backward semantic representation matrix of the dialogue and the answer into a tensor, input it into a two-dimensional convolutional neural network, and then perform feature dimensionality reduction to obtain a sequence of representation vectors fusing the semantic information of the dialogue and the answer. Specifically, this comprises the following steps:

Step B41: merge $M_{1,t}$, $M_{2,t}$, $M_{3,t}$ to obtain the tensor $M_t \in \mathbb{R}^{3 \times L_t \times L_a}$:

$$M_t = \left[ M_{1,t}, M_{2,t}, M_{3,t} \right]$$

Step B42: input $M_t$ into the two-dimensional convolutional neural network for convolution and pooling, then input the result into a fully connected layer for dimensionality reduction, obtaining a representation vector $v_t \in \mathbb{R}^{d_3}$ that fuses the semantic information of $u_t$ and a, where $d_3$ is the dimension after dimensionality reduction by the fully connected layer;

Step B43: for each utterance in the dialogue context U, compute its representation vector fusing its semantic information with that of the answer a, yielding the sequence $\{v_1, v_2, \ldots, v_{L_u}\}$, where $L_u$ is the number of utterances in the dialogue context U.
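In one possible implementation, step B4 can be sketched as follows; this is a non-limiting sketch assuming PyTorch in which the three matching matrices are stacked as the channels of a 3-channel input to a 2-D CNN; the kernel size, channel count, pooling window, and d3 are illustrative assumptions not specified in the text:

```python
import torch
import torch.nn as nn

d3 = 64
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
pool = nn.AdaptiveMaxPool2d((4, 4))   # fixes the spatial size so the FC layer is length-independent
fc = nn.Linear(8 * 4 * 4, d3)         # dimensionality reduction to d3

def fuse(M1_t, M2_t, M3_t):
    # each matrix: (L_t, L_a); stack into the tensor M_t of shape (1, 3, L_t, L_a)
    M_t = torch.stack([M1_t, M2_t, M3_t], dim=0).unsqueeze(0)
    feat = pool(torch.relu(conv(M_t)))    # convolution and pooling
    return fc(feat.flatten(1))            # (1, d3) fused vector v_t
```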

Step B5: input the sequence of representation vectors obtained in step B4 into the bidirectional GRU network to obtain a representation vector $\tilde{v}$ fusing the context dependencies and semantic information of the dialogue and the answer.

The representation vector sequence $\{v_1, v_2, \ldots, v_{L_u}\}$ is input into the bidirectional GRU network, which models the relationship between the dialogue context and the answer; the hidden state vector output at the last step is taken as the representation vector $\tilde{v} \in \mathbb{R}^{2 d_2}$ that fuses the context dependencies and semantic information of the dialogue and the answer.
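In one possible implementation, step B5 can be sketched as follows; this is a non-limiting sketch assuming PyTorch, where the final hidden states of the two directions are concatenated as the matching vector (one reading of "the last output hidden state" above):

```python
import torch
import torch.nn as nn

d3, d2 = 64, 128
match_gru = nn.GRU(input_size=d3, hidden_size=d2, bidirectional=True, batch_first=True)

def match_vector(v_seq):
    # v_seq: (1, L_u, d3), one fused vector per utterance of the context
    _, h_n = match_gru(v_seq)                    # h_n: (2, 1, d2), final state per direction
    return torch.cat([h_n[0], h_n[1]], dim=-1)   # (1, 2*d2) representation vector fed to the classifier
```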

Step B6: repeat steps B2-B5 to compute the representation vectors $\tilde{v}$ fusing the context dependencies and semantic information of the dialogue and the answer for all training samples in the dialogue training set.

Step B7: input the representation vectors $\tilde{v}$ of all samples into the fully connected layer of the deep learning network model; according to the target loss function Loss, compute the gradient of each parameter in the deep network by backpropagation and update the parameters by stochastic gradient descent. Specifically, this comprises the following steps:

Step B71: input the final representation vector $\tilde{v}$ into the fully connected layer and apply softmax normalization to compute the probability that the answer belongs to each category, as follows:

$$y = W_s \tilde{v} + b_s$$

$$g_c(U, a) = \mathrm{softmax}(y)$$

where $W_s$ is the weight matrix of the fully connected layer, $b_s$ is the bias term of the fully connected layer, and $g_c(U, a)$ is the probability that the answer belongs to the dialogue context U of the training sample (U, a) processed in step B1, with $0 \le g_c(U, a) \le 1$ and $c \in \{\text{correct}, \text{wrong}\}$;

Step B72: compute the loss value using cross entropy as the loss function, update the learning rate with the gradient optimization algorithm AdaGrad, and iteratively update the model parameters via backpropagation, training the model by minimizing the loss function;

the loss function Loss to be minimized is computed as:

$$Loss = -\sum_{i=1}^{N} \left[ y_i \log g_c(U_i, a_i) + (1 - y_i) \log\!\left(1 - g_c(U_i, a_i)\right) \right]$$

where $(U_i, a_i)$ denotes the i-th training sample in the dialogue training set TS and $y_i \in \{0, 1\}$ is its category label.
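In one possible implementation, steps B71-B72 can be sketched as follows; this is a non-limiting sketch assuming PyTorch, where the fully connected layer plus softmax yields the two-class probability $g_c(U, a)$ and the cross-entropy loss is minimized; AdaGrad is named in the text, so torch.optim.Adagrad is used, while the learning rate is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d2 = 128
classifier = nn.Linear(2 * d2, 2)   # W_s and b_s, classes {correct, wrong}
optimizer = torch.optim.Adagrad(classifier.parameters(), lr=0.01)

def train_step(v, label):
    # v: (1, 2*d2) matching vector; label: (1,) tensor holding the class index 0/1
    y = classifier(v)                   # y = W_s v + b_s
    loss = F.cross_entropy(y, label)    # cross entropy computed over softmax(y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```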

Step B8: when the loss value produced by the deep learning network model falls below the set threshold or the maximum number of iterations is reached, terminate the training of the deep learning network model.
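A non-limiting sketch of the step-B8 stopping rule follows: training ends once the mean epoch loss drops below a set threshold or the maximum iteration count is reached; training_pairs and train_step are the hypothetical pieces from the sketches above, shown here with a placeholder sample and assumed hyperparameters:

```python
import torch

training_pairs = [(torch.randn(1, 2 * 128), torch.tensor([1]))]  # placeholder (v, label) data
max_iters, threshold = 50, 0.05                                  # assumed hyperparameters

for epoch in range(max_iters):
    epoch_loss = sum(train_step(v, y) for v, y in training_pairs) / len(training_pairs)
    if epoch_loss < threshold:
        break
```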

Step C: converse with the user, input the user's question into the trained deep learning network model, and output the matching answer.

The present invention also provides a multi-round dialogue system employing the above method, as shown in Fig. 2, comprising:

a training set construction module for collecting dialogue context and answer data and constructing the dialogue training set TS;

a model training module for training, with the dialogue training set TS, a deep learning network model fused with a bidirectional GRU network; and

a multi-round dialogue module for conversing with the user, inputting the user's question into the trained deep learning network model, and outputting the best-matching answer.

The above are merely preferred embodiments of the present invention; all changes made according to the technical solution of the present invention fall within the protection scope of the present invention, provided that the functional effects produced do not exceed the scope of the technical solution of the present invention.

Claims (9)

1. A multi-turn dialogue method based on a bidirectional GRU network, characterized by comprising the following steps:
Step A: collecting conversation context and answer data, and constructing a conversation training set TS;
Step B: training a deep learning network model fusing a bidirectional GRU network by using the conversation training set TS;
Step C: carrying out dialogue with the user, inputting the user question into the trained deep learning network model, and outputting a matched answer.
2. The method of claim 1, wherein step B specifically comprises the following steps:
Step B1: traversing the dialogue training set TS, and coding the dialogue context and answer of each training sample to obtain initial characterization vectors;
Step B2: inputting the initial characterization vectors of the conversation context and the answer into a multi-head attention module to obtain semantic characterization vectors of the conversation and the answer, and calculating a word similarity matrix of the conversation and the answer;
Step B3: inputting the initial characterization vectors of the dialogue context and the answer obtained in step B1 into a bidirectional GRU network, calculating bidirectional hidden states of the dialogue and the answer, and then calculating a forward semantic characterization matrix and a reverse semantic characterization matrix of the dialogue and the answer;
Step B4: merging the word similarity matrix, the forward semantic characterization matrix and the reverse semantic characterization matrix of the conversation and the answer into a tensor, inputting the tensor into a two-dimensional convolutional neural network, and then performing feature dimensionality reduction to obtain a characterization vector sequence fusing the semantic information of the conversation and the answer;
Step B5: inputting the characterization vector sequence obtained in step B4 into a bidirectional GRU network to obtain a characterization vector $\tilde{v}$ fusing the context dependencies and semantic information of the conversation and the answer;
Step B6: repeating steps B2-B5, and calculating the characterization vectors $\tilde{v}$ fusing the context dependencies and semantic information of the conversation and the answer for all training samples in the conversation training set;
Step B7: inputting the characterization vectors $\tilde{v}$ of all samples into a fully connected layer of the deep learning network model, calculating the gradient of each parameter in the deep network by a backpropagation method according to a target loss function Loss, and updating the parameters by a stochastic gradient descent method;
Step B8: terminating the training of the deep learning network model when the loss value generated by the deep learning network model is smaller than a set threshold or the maximum number of iterations is reached.
3. The method of claim 2, wherein in step B1 the dialogue training set is expressed as $TS = \{(U_i, a_i)\}_{i=1}^{N}$, wherein N denotes the number of training samples, and (U, a) denotes a training sample in the conversation training set TS consisting of a conversation context U and an answer a, the conversation context U consisting of a plurality of sentences in the conversation process, each sentence in the conversation context U and the answer a being coded respectively to obtain initial characterization vectors; if $u_t$ denotes the t-th sentence in the dialogue context U, its initial characterization vector $\mathbf{u}_t$ is expressed as:
$$\mathbf{u}_t = \left[ e^{u_t}_1, e^{u_t}_2, \ldots, e^{u_t}_{L_t} \right]$$
the initial characterization vector of the answer a is expressed as:
$$\mathbf{a} = \left[ e^{a}_1, e^{a}_2, \ldots, e^{a}_{L_a} \right]$$
wherein $\mathbf{u}_t \in \mathbb{R}^{L_t \times d_1}$ and $\mathbf{a} \in \mathbb{R}^{L_a \times d_1}$; $L_t$ and $L_a$ respectively denote the number of words remaining in $u_t$ and a after word segmentation and removal of stop words; $e^{u_t}_i$ and $e^{a}_i$ are respectively the word vectors of the i-th words of $u_t$ and a, obtained by lookup in the pretrained word vector matrix $E \in \mathbb{R}^{d_1 \times |D|}$; $d_1$ denotes the dimension of the word vectors, and |D| denotes the number of words in the dictionary.
4. The multi-turn dialogue method based on a bidirectional GRU network as claimed in claim 3, wherein step B2 specifically comprises the steps of:
Step B21: selecting an integer s that divides $d_1$; for each sentence in the conversational context, splitting its initial characterization vector $\mathbf{u}_t$ and the initial characterization vector $\mathbf{a}$ of the answer evenly into s sub-vectors along the last dimension, obtaining the sub-vector sequences $\{u^1_t, u^2_t, \ldots, u^s_t\}$ and $\{a^1, a^2, \ldots, a^s\}$, wherein $u^h_t$ is the h-th sub-vector of $u_t$ and $a^h$ is the h-th sub-vector of $\mathbf{a}$;
Step B22: pairing each sub-vector of $\mathbf{u}_t$ with the corresponding sub-vector of $\mathbf{a}$, i.e. $(u^h_t, a^h)$, $h = 1, 2, \ldots, s$, and inputting the pairs into the attention module, computing the semantic characterization vector $\hat{u}^h_t$ of $u^h_t$ and the semantic characterization vector $\hat{a}^h$ of $a^h$;
wherein $\hat{u}^h_t$ is calculated as:
$$\hat{u}^h_t = \mathrm{softmax}\!\left( u^h_t \, (a^h)^T \right) a^h$$
and $\hat{a}^h$ is calculated as:
$$\hat{a}^h = \mathrm{softmax}\!\left( a^h \, (u^h_t)^T \right) u^h_t$$
wherein T denotes the matrix transpose operation;
computing the weighted concatenation of $\hat{u}^1_t, \ldots, \hat{u}^s_t$ to obtain the semantic characterization vector $\hat{U}_t$ of $u_t$:
$$\hat{U}_t = W_1 \left[ \hat{u}^1_t \oplus \hat{u}^2_t \oplus \cdots \oplus \hat{u}^s_t \right]$$
computing the weighted concatenation of $\hat{a}^1, \ldots, \hat{a}^s$ to obtain the semantic characterization vector $\hat{A}$ of a:
$$\hat{A} = W_2 \left[ \hat{a}^1 \oplus \hat{a}^2 \oplus \cdots \oplus \hat{a}^s \right]$$
wherein $W_1$, $W_2$ are training parameters of the multi-head attention mechanism;
Step B23: calculating the word similarity matrix of each sentence in the conversational context with the answer; with $u_t$ denoting the t-th sentence in the dialogue context, its word similarity matrix $M_{1,t} \in \mathbb{R}^{L_t \times L_a}$ with the answer a is calculated as:
$$M_{1,t} = \hat{U}_t \cdot \hat{A}^T$$
5. The multi-turn dialogue method based on a bidirectional GRU network as claimed in claim 4, wherein step B3 specifically comprises the steps of:
Step B31: taking the initial characterization vector of the answer as a sequence of word vectors, inputting the sequence into the bidirectional GRU network, and calculating the forward and reverse hidden state vectors;
the initial characterization vector $\mathbf{a}$ of the answer is regarded as the sequence $e^a_1, e^a_2, \ldots, e^a_{L_a}$ and fed in order into the forward GRU, yielding the forward hidden state vectors $\overrightarrow{h}^a_1, \ldots, \overrightarrow{h}^a_{L_a}$ of the answer; the sequence $e^a_{L_a}, \ldots, e^a_1$ is fed in order into the reverse GRU, yielding the reverse hidden state vectors $\overleftarrow{h}^a_1, \ldots, \overleftarrow{h}^a_{L_a}$, wherein $\overrightarrow{h}^a_i, \overleftarrow{h}^a_i \in \mathbb{R}^{d_2}$ and $d_2$ is the number of GRU units;
Step B32: regarding the initial characterization vector of each sentence in the dialogue context as a sequence of word vectors, inputting it into the bidirectional GRU network, and calculating the forward and reverse hidden state vectors;
with $u_t$ denoting the t-th sentence in the dialogue context, $\mathbf{u}_t$ is regarded as the sequence $e^{u_t}_1, \ldots, e^{u_t}_{L_t}$ and fed in order into the forward GRU, yielding the forward hidden state vectors $\overrightarrow{h}^{u_t}_1, \ldots, \overrightarrow{h}^{u_t}_{L_t}$ of $u_t$; the sequence $e^{u_t}_{L_t}, \ldots, e^{u_t}_1$ is fed in order into the reverse GRU, yielding the reverse hidden state vectors $\overleftarrow{h}^{u_t}_1, \ldots, \overleftarrow{h}^{u_t}_{L_t}$, wherein $\overrightarrow{h}^{u_t}_i, \overleftarrow{h}^{u_t}_i \in \mathbb{R}^{d_2}$;
Step B33: calculating the forward and reverse semantic characterization matrices of each sentence in the dialogue context with the answer; with $u_t$ denoting the t-th sentence, its forward semantic characterization matrix $M_{2,t}$ and reverse semantic characterization matrix $M_{3,t}$ with the answer a are calculated as:
$$M_{2,t} = \overrightarrow{H}_{u_t} \cdot \overrightarrow{H}_a^{\,T}$$
$$M_{3,t} = \overleftarrow{H}_{u_t} \cdot \overleftarrow{H}_a^{\,T}$$
wherein $\overrightarrow{H}_{u_t} = [\overrightarrow{h}^{u_t}_1, \ldots, \overrightarrow{h}^{u_t}_{L_t}]$, $\overleftarrow{H}_{u_t} = [\overleftarrow{h}^{u_t}_1, \ldots, \overleftarrow{h}^{u_t}_{L_t}]$, $\overrightarrow{H}_a = [\overrightarrow{h}^a_1, \ldots, \overrightarrow{h}^a_{L_a}]$, and $\overleftarrow{H}_a = [\overleftarrow{h}^a_1, \ldots, \overleftarrow{h}^a_{L_a}]$.
6. The multi-turn dialogue method based on a bidirectional GRU network as claimed in claim 5, wherein step B4 specifically comprises the steps of:
Step B41: merging $M_{1,t}$, $M_{2,t}$, $M_{3,t}$ to obtain the tensor $M_t \in \mathbb{R}^{3 \times L_t \times L_a}$:
$$M_t = \left[ M_{1,t}, M_{2,t}, M_{3,t} \right]$$
Step B42: inputting $M_t$ into the two-dimensional convolutional neural network for convolution and pooling, then inputting the result into the fully connected layer for dimensionality reduction, obtaining a characterization vector $v_t \in \mathbb{R}^{d_3}$ fusing the semantic information of $u_t$ and a, wherein $d_3$ is the dimension after dimensionality reduction by the fully connected layer;
Step B43: for each sentence in the dialogue context U, calculating its characterization vector fusing semantic information with the answer a, yielding $\{v_1, v_2, \ldots, v_{L_u}\}$, wherein $L_u$ is the number of sentences in the dialogue context U.
7. The method of claim 6, wherein in step B5 the characterization vector sequence $\{v_1, v_2, \ldots, v_{L_u}\}$ is input into the bidirectional GRU network, the relationship between the conversation context and the answer is modeled by the bidirectional GRU network, and the finally output hidden state vector is taken as the characterization vector $\tilde{v} \in \mathbb{R}^{2 d_2}$ fusing the context dependencies and semantic information of the dialogue and the answer.
8. The multi-turn dialogue method based on a bidirectional GRU network as claimed in claim 7, wherein step B7 specifically comprises the steps of:
Step B71: inputting the final characterization vector $\tilde{v}$ into the fully connected layer and calculating the probability of the answer belonging to each category using softmax normalization, as follows:
$$y = W_s \tilde{v} + b_s$$
$$g_c(U, a) = \mathrm{softmax}(y)$$
wherein $W_s$ is the weight matrix of the fully connected layer, $b_s$ is the bias term of the fully connected layer, and $g_c(U, a)$ is the probability of the answer belonging to the dialogue context U of the training sample (U, a) processed in step B1, with $0 \le g_c(U, a) \le 1$ and $c \in \{\text{correct}, \text{wrong}\}$;
Step B72: calculating the loss value using cross entropy as the loss function, updating the learning rate with the gradient optimization algorithm AdaGrad, and updating the model parameters by backpropagation iteration, training the model by minimizing the loss function;
the loss function Loss to be minimized is calculated as:
$$Loss = -\sum_{i=1}^{N} \left[ y_i \log g_c(U_i, a_i) + (1 - y_i) \log\!\left(1 - g_c(U_i, a_i)\right) \right]$$
wherein $(U_i, a_i)$ denotes the i-th training sample in the dialogue training set TS, and $y_i \in \{0, 1\}$ is its category label.
9. A multi-turn dialogue system employing the method of any one of claims 1-8, comprising:
a training set building module for collecting dialogue context and answer data and building a dialogue training set TS;
a model training module for training a deep learning network model fusing a bidirectional GRU network by using the conversation training set TS; and
a multi-turn dialogue module for carrying out dialogue with the user, inputting the user questions into the trained deep learning network model, and outputting the best-matched answers.
Priority Applications (1)

Application Number: CN202010067240.9A
Priority Date: 2020-01-20
Filing Date: 2020-01-20
Title: Multi-turn dialogue method and system based on bidirectional GRU network
Granted Publication: CN111274375B (en)
Legal Status: Expired - Fee Related

Publications (2)

Publication Number: CN111274375A, Publication Date: 2020-06-12
Publication Number: CN111274375B, Publication Date: 2022-06-14

Family

ID: 70996874

Family Applications (1)

Application Number: CN202010067240.9A (Expired - Fee Related)
Title: Multi-turn dialogue method and system based on bidirectional GRU network
Priority Date: 2020-01-20; Filing Date: 2020-01-20

Country Status (1)

Country: CN; Link: CN111274375B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party

CN110020015A (priority 2017-12-29, published 2019-07-16), 中国科学院声学研究所: A kind of conversational system answers generation method and system *
CN108874972A (priority 2018-06-08, published 2018-11-23), 青岛里奥机器人技术有限公司: A kind of more wheel emotion dialogue methods based on deep learning *
US20190385051A1 (priority 2018-06-14, published 2019-12-19), Accenture Global Solutions Limited: Virtual agent with a dialogue management system and method of training a dialogue management system *
CN109460463A (priority 2018-11-15, published 2019-03-12), 平安科技(深圳)有限公司: Model training method, device, terminal and storage medium based on data processing *
CN109933659A (priority 2019-03-22, published 2019-06-25), 重庆邮电大学: A vehicle multi-round dialogue method for travel *

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party

宋皓宇等: 基于DQN的开放域多轮对话策略学习, 《中文信息学报》 *

Cited By (14)

* Cited by examiner, † Cited by third party

WO2021147405A1 (priority 2020-08-31, published 2021-07-29), 平安科技(深圳)有限公司: Customer-service statement quality detection method and related device *
CN112434143B (priority 2020-11-20, published 2022-12-09), 西安交通大学: Dialog method, storage medium and system based on hidden state constraint of GRU *
CN112434143A (priority 2020-11-20, published 2021-03-02), 西安交通大学: Dialog method, storage medium and system based on hidden state constraint of GRU *
CN112632236A (priority 2020-12-02, published 2021-04-09), 中山大学: Improved sequence matching network-based multi-turn dialogue model *
CN112818105A (priority 2021-02-05, published 2021-05-18), 江苏实达迪美数据处理有限公司: Multi-turn dialogue method and system fusing context information *
CN112818105B (priority 2021-02-05, published 2021-12-07), 江苏实达迪美数据处理有限公司: Multi-turn dialogue method and system fusing context information *
CN113157855A (priority 2021-02-22, published 2021-07-23), 福州大学: Text summarization method and system fusing semantic and context information *
CN114443827A (priority 2022-01-28, published 2022-05-06), 福州大学: Local information perception dialogue method and system based on pre-training language model *
CN114490991A (priority 2022-01-28, published 2022-05-13), 福州大学: Dialogue structure-aware dialogue method and system based on fine-grained local information enhancement *
CN114564568A (priority 2022-02-25, published 2022-05-31), 福州大学: Dialogue state tracking method and system based on knowledge enhancement and context awareness *
CN114840652A (priority 2022-04-21, published 2022-08-02), 大箴(杭州)科技有限公司: Training method, device, model and dialogue scoring method for dialogue scoring model *
CN115276697A (priority 2022-07-22, published 2022-11-01), 交通运输部规划研究院: Coast radio station communication system integrated with intelligent voice *
CN116028604A (priority 2022-11-22, published 2023-04-28), 福州大学: An answer selection method and system based on knowledge-enhanced graph convolutional network *
CN116128438A (priority 2022-12-27, published 2023-05-16), 江苏巨楷科技发展有限公司: Intelligent community management system based on big data record information *

Also Published As

Publication Number: CN111274375B, Publication Date: 2022-06-14


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
CF01: Termination of patent right due to non-payment of annual fee

Granted publication date: 2022-06-14

