CN117494728A - Chinese dialogue text intention recognition method, device, equipment and storage medium - Google Patents

Chinese dialogue text intention recognition method, device, equipment and storage medium

Info

Publication number
CN117494728A
CN117494728A
Authority
CN
China
Prior art keywords
vector
target
intention
capsule
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311402170.8A
Other languages
Chinese (zh)
Inventor
于凤英
王健宗
程宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Chuangke Technology Beijing Co ltd
Original Assignee
Ping An Chuangke Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Chuangke Technology Beijing Co ltd
Priority to CN202311402170.8A
Publication of CN117494728A
Legal status: Pending (current)

Abstract

The invention discloses a Chinese dialogue text intention recognition method, device, equipment and storage medium. The method comprises the following steps: acquiring a Chinese dialogue text, inputting the Chinese dialogue text into an ERNIE pre-training model, and outputting a semantic enhancement representation corresponding to the Chinese dialogue text; acquiring a first feature vector and a second feature vector; performing feature fusion on the first feature vector and the second feature vector to obtain a target fusion vector; and inputting the target fusion vector into a capsule network for intention recognition to acquire a target intention. By acquiring the target fusion vector, the method combines the feature-extraction strengths of the feature extraction network and the self-attention network, so the obtained target fusion vector is more accurate. The capsule network performs intention recognition on the target fusion vector, and even when the Chinese dialogue text is a small sample, the method yields a more accurate intention recognition result and improves the accuracy of intention recognition. Applied in the insurance business field, the method identifies the intention of Chinese dialogue text with higher accuracy.

Description

Chinese dialogue text intention recognition method, device, equipment and storage medium
Technical Field
The invention relates to the field of deep learning, in particular to a method, a device, equipment and a storage medium for identifying Chinese dialogue text intention.
Background
In recent years, with the rapid development of deep learning, man-machine dialogue systems have become increasingly intelligent. In the insurance business field, in order to accurately provide service guidance to a user, it is generally necessary to perform intention recognition on the user through a man-machine dialogue system and acquire an intention recognition result, so that the user can be given service guidance corresponding to that result. In the prior art, dialogues in a man-machine dialogue system are generally transcribed into text, and intention recognition of users in the man-machine dialogue system is realized by classifying the text. For example, a prior-art man-machine dialogue system in the insurance services field acquires the transcribed text of a user's dialogue, such as "I want to take out insurance for my parents, what do you recommend?", performs intention recognition on the text, and obtains the intention recognition result "insurance recommendation", so that a corresponding insurance service recommendation can be made to the user in actual operation in the insurance business field, helping the user receive service guidance matching the intention recognition result.
However, because users' dialogue language is generally colloquial and the dialogue text content is sparse, performing intention recognition directly on the user's dialogue text, as in the prior art, yields final recognition results of low accuracy. How to improve the accuracy of dialogue text intention recognition is therefore a problem urgently in need of a solution.
Disclosure of Invention
The embodiment of the invention provides a method, a device, equipment and a storage medium for identifying Chinese dialogue text intention, which are used for solving the problem of how to improve the accuracy of dialogue text intention identification.
A method for Chinese dialogue text intention recognition, comprising:
acquiring a Chinese dialogue text, inputting the Chinese dialogue text into an ERNIE pre-training model, and outputting a semantic enhancement representation corresponding to the Chinese dialogue text;
extracting features of the semantic enhancement representation by adopting a feature extraction network to obtain a first feature vector;
extracting features of the semantic enhancement representation by adopting a self-attention network to obtain a second feature vector;
performing feature fusion on the first feature vector and the second feature vector to obtain a target fusion vector;
and inputting the target fusion vector into a capsule network for intention recognition to acquire the target intention of the Chinese dialogue text.
Preferably, the feature extraction of the semantic enhancement representation by using a feature extraction network, to obtain a first feature vector, includes:
inputting the semantic enhancement representation to a first convolution layer to obtain a first convolution vector;
performing forward traversal and backward traversal on the first convolution vector by adopting a feature extraction network to respectively acquire forward context information and backward context information;
and splicing the front context information and the rear context information to obtain a first feature vector.
Preferably, the feature extraction of the semantic enhancement representation by using the self-attention network to obtain a second feature vector includes:
identifying the semantic enhancement representation by adopting a self-attention network, and acquiring a first self-attention vector, a second self-attention vector and a third self-attention vector corresponding to each word vector in the semantic enhancement representation;
inputting the first self-attention vector corresponding to each word vector into a second convolution layer to obtain a fourth self-attention vector corresponding to the first self-attention vector;
inputting the second self-attention vector corresponding to each word vector into a third convolution layer to obtain a fifth self-attention vector corresponding to the second self-attention vector;
taking the dot product of the fourth self-attention vector corresponding to each word vector and the fifth self-attention vector corresponding to all word vectors as the association degree between each word vector and all word vectors, and performing activation processing on the association degree to acquire a weight value corresponding to each word vector;
and weighting the third self-attention vector corresponding to each word vector by adopting a weight value corresponding to each word vector to acquire a second feature vector.
Preferably, the feature fusion of the first feature vector and the second feature vector is performed to obtain a target fusion vector, which includes:
correcting the first feature vector by adopting a first weight to obtain a first target vector;
correcting the second feature vector by adopting a second weight to obtain a second target vector;
and taking the sum value of the first target vector and the second target vector as a target fusion vector.
Preferably, the inputting the target fusion vector into a capsule network for intention recognition, obtaining the target intention of the Chinese dialogue text, includes:
inputting the target fusion vector to each initialized capsule layer of a capsule network respectively, and obtaining an initial capsule vector corresponding to the target fusion vector;
dynamically routing the initial capsule vector to obtain a target capsule vector;
and calculating the intention similarity between the target capsule vector and each candidate intention vector, taking the candidate intention vector corresponding to the maximum intention similarity as a target intention vector, and taking the intention corresponding to the target intention vector as a target intention.
Preferably, the number of the initial capsule vectors is N, wherein N is greater than or equal to 2;
the dynamically routing the initial capsule vector to obtain a target capsule vector includes:
acquiring initial weights corresponding to each initial capsule vector, and acquiring a first intermediate vector according to N initial capsule vectors and the initial weights corresponding to the initial capsule vectors;
respectively calculating vector similarity between N initial capsule vectors and the first intermediate vector, determining the vector similarity as a target weight of the initial capsule vectors, and acquiring a second intermediate vector according to the N initial capsule vectors and the N target weights;
if the second intermediate vector is the last iteration result in the preset iteration times, determining the second intermediate vector as a target capsule vector;
if the second intermediate vector is not the last iteration result in the preset iteration times, updating the second intermediate vector into a first intermediate vector, and continuously executing the calculation of the vector similarity between the N initial capsule vectors and the first intermediate vector.
Preferably, the obtaining a first intermediate vector according to the N initial capsule vectors and the initial weights corresponding to the N initial capsule vectors includes:
determining the product of each initial capsule vector and the corresponding initial weight as a first initial vector corresponding to each initial capsule vector;
determining the sum of the first initial vectors corresponding to the N initial capsule vectors as a second initial vector;
and performing compression activation processing on the second initial vector by adopting a square compression activation function to obtain a first intermediate vector.
A Chinese dialogue text intention recognition device, comprising:
the semantic enhancement representation acquisition module is used for acquiring a Chinese dialogue text, inputting the Chinese dialogue text into an ERNIE pre-training model and outputting a semantic enhancement representation corresponding to the Chinese dialogue text;
the first feature vector acquisition module is used for carrying out feature extraction on the semantic enhancement representation by adopting a feature extraction network to acquire a first feature vector;
the second feature vector acquisition module is used for carrying out feature extraction on the semantic enhancement representation by adopting a self-attention network to acquire a second feature vector;
the target fusion vector acquisition module is used for carrying out feature fusion on the first feature vector and the second feature vector to acquire a target fusion vector;
and the target intention acquisition module is used for inputting the target fusion vector into a capsule network for intention recognition and acquiring the target intention of the Chinese dialogue text.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above Chinese dialogue text intention recognition method when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the above Chinese dialogue text intention recognition method.
According to the Chinese dialogue text intention recognition method, device, equipment and storage medium, the ERNIE pre-training model extracts semantic information from the Chinese dialogue text to obtain a semantic enhancement representation, which strengthens the semantic representation capacity of the Chinese dialogue text and improves the quality of the semantic enhancement representation; the feature extraction network can fully learn the relations between context information in the semantic enhancement representation and can acquire an accurate first feature vector even for the semantic enhancement representation of relatively colloquial Chinese dialogue text; the self-attention network performs feature extraction on the semantic enhancement representation, so that the obtained second feature vector depends less on external information during feature extraction and adheres more closely to the internal information of the semantic enhancement representation; the first feature vector and the second feature vector are fused to obtain a target fusion vector, combining the feature-extraction advantages of the feature extraction network and the self-attention network, so that the obtained target fusion vector is more accurate; and the capsule network performs intention recognition on the target fusion vector to obtain the target intention, yielding a relatively accurate intention recognition result even when the Chinese dialogue text is a small sample, which can effectively improve the accuracy of intention recognition. For the application scenario of intention recognition in the insurance business field, the Chinese dialogue text intention recognition method overcomes the low recognition accuracy caused by highly colloquial user speech and sparse dialogue text content, acquires a more accurate intention recognition result, effectively improves intention recognition accuracy, and facilitates more accurate service guidance to the user according to the intention recognition result in actual insurance business operation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an application environment of a method for recognizing Chinese dialog text intention in an embodiment of the invention;
FIG. 2 is a flow chart of a method for Chinese dialog text intent recognition in accordance with an embodiment of the present invention;
FIG. 3 is another flow chart of a method for Chinese dialog text intent recognition in accordance with an embodiment of the present invention;
FIG. 4 is another flow chart of a method for Chinese dialog text intent recognition in accordance with an embodiment of the present invention;
FIG. 5 is another flow chart of a method for Chinese dialog text intent recognition in accordance with an embodiment of the present invention;
FIG. 6 is another flow chart of a method for Chinese dialog text intent recognition in accordance with an embodiment of the present invention;
FIG. 7 is another flow chart of a method for Chinese dialog text intent recognition in accordance with an embodiment of the present invention;
FIG. 8 is another flow chart of a method for Chinese dialog text intent recognition in accordance with an embodiment of the present invention;
FIG. 9 is a schematic diagram of a Chinese dialogue text intention recognition device according to an embodiment of the invention;
FIG. 10 is a schematic diagram of a computer device in accordance with an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The Chinese dialogue text intention recognition method provided by the embodiment of the invention can be applied to the application environment shown in fig. 1. Specifically, the method is applied to a Chinese dialogue text intention recognition system which, as shown in fig. 1, comprises a client and a server that communicate through a network, so as to address the problem of how to improve the accuracy of dialogue text intention recognition. The client, also called the user side, refers to the program that corresponds to the server and provides local services for the user. The client may be installed on, but is not limited to, various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
In one embodiment, as shown in fig. 2, a method for identifying the intention of Chinese dialogue text is provided, which is described as applied to the server in fig. 1 and includes the following steps:
S201: acquiring a Chinese dialogue text, inputting the Chinese dialogue text into an ERNIE pre-training model, and outputting a semantic enhancement representation corresponding to the Chinese dialogue text;
S202: extracting features of the semantic enhancement representation by adopting a feature extraction network to obtain a first feature vector;
S203: extracting features of the semantic enhancement representation by adopting a self-attention network to obtain a second feature vector;
S204: performing feature fusion on the first feature vector and the second feature vector to obtain a target fusion vector;
S205: and inputting the target fusion vector into a capsule network for intention recognition to acquire the target intention of the Chinese dialogue text.
The Chinese dialogue text is Chinese text obtained by recognizing the sentences spoken by a speaker. The ERNIE pre-training model is used to obtain vectors containing the semantic information in the Chinese dialogue text. The semantic enhancement representation is the vector containing semantic information obtained after semantic information extraction is performed on the Chinese dialogue text.
As an example, in step S201, the server obtains the Chinese dialogue text of a speaker during a man-machine dialogue, inputs the Chinese dialogue text into a pre-trained ERNIE model, and uses the ERNIE pre-training model to extract semantic information from the Chinese dialogue text to obtain the semantic enhancement representation corresponding to the Chinese dialogue text. For example, in an intention recognition scenario in the insurance services field, the server acquires the Chinese dialogue text "I want to take out insurance for my parents, what do you recommend?" from a speaker (i.e., a user of the man-machine dialogue system), inputs the Chinese dialogue text into the ERNIE pre-training model, and extracts semantic information from it, obtaining vector representations corresponding to semantic units such as "I", "want", "my", "parents", "take out", "insurance", "you", "what" and "recommend"; the vectors corresponding to this semantic information serve as the semantic enhancement representation of the Chinese dialogue text. In this example, extracting semantic information from the Chinese dialogue text with the ERNIE pre-training model to obtain the semantic enhancement representation strengthens the semantic representation capacity of the Chinese dialogue text, improves the quality of the semantic enhancement representation, and facilitates subsequently improving the accuracy of semantic feature extraction on the semantic enhancement representation.
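As an illustration (not part of the patented method's specification), a minimal sketch of this step in Python with PyTorch follows, assuming the Hugging Face transformers library and the community ERNIE checkpoint nghuyong/ernie-1.0-base-zh; the patent does not name a specific checkpoint, and the sample utterance paraphrases the example above.

# Minimal sketch of step S201: encode the Chinese dialogue text with an
# ERNIE encoder to obtain a per-token semantic enhancement representation.
# Assumption: the "nghuyong/ernie-1.0-base-zh" checkpoint; the patent does
# not specify one.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("nghuyong/ernie-1.0-base-zh")
encoder = AutoModel.from_pretrained("nghuyong/ernie-1.0-base-zh")

text = "我想给我的父母投保，你们有什么推荐的？"  # illustrative dialogue text
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)

# One vector per token: the semantic enhancement representation,
# shape (1, seq_len, hidden_size).
semantic_enhanced = outputs.last_hidden_state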
Wherein the feature extraction network refers to a network for feature extraction of the semantic enhanced representation. The first feature vector is the result of feature extraction of the semantic enhanced representation by the feature extraction network.
As an example, in step S202, the server performs feature extraction on the semantic enhancement representation using a feature extraction network and obtains the resulting first feature vector. In this example, the feature extraction network may be a bidirectional long short-term memory network BiLSTM (Bi-directional Long Short-Term Memory) or a bidirectional gated recurrent unit BiGRU (Bi-directional Gated Recurrent Units). Feature extraction may be performed on the semantic enhancement representation using BiLSTM or BiGRU, performing bidirectional extraction according to the contextual relations of the semantic enhancement representation to obtain a feature vector, which is used as the first feature vector. For example, in an intention recognition scenario in the insurance services field, BiLSTM or BiGRU is used to perform feature extraction on the semantic enhancement representation corresponding to semantic units such as "I", "want", "my", "parents", "take out", "insurance", "you", "what" and "recommend", obtaining the first feature vector. In this example, the feature extraction network can fully learn the relations between context information in the semantic enhancement representation and obtain an accurate first feature vector even for the semantic enhancement representation of relatively colloquial Chinese dialogue text.
Wherein the self-attention network is used for extracting features of the semantic enhanced representation. The second feature vector is the result of feature extraction of the semantic enhanced representation from the attention network.
As an example, in step S203, the server inputs the semantic enhancement representation into a self-attention network, obtains through it the q vector (query), k vector (key) and v vector (value) corresponding to each word vector in the semantic enhancement representation, computes a weight for each word vector from the q and k vectors, performs a weighted summation of the v vectors using those weights to obtain the feature extraction result of the self-attention network, and uses that result as the second feature vector. For example, in an intention recognition scenario in the insurance services field, the q, k and v vectors corresponding to the word vectors for "I", "want", "my", "parents", "take out", "insurance", "you", "what" and "recommend" are acquired from the self-attention network, the weight of each word vector is computed from the q and k vectors, the v vectors are weighted and summed using these weights, and the weighted sum is used as the second feature vector. In this example, feature extraction on the semantic enhancement representation via the self-attention network yields a second feature vector that focuses more on the internal information of the semantic enhancement representation, depends less on external information during feature extraction, and adheres more closely to that internal information.
The target fusion vector is a vector obtained by carrying out feature fusion on the first feature vector and the second feature vector.
As an example, in step S204, after obtaining the first feature vector extracted by the feature extraction network and the second feature vector extracted by the self-attention network, the server performs fusion processing on the first feature vector and the second feature vector to obtain a target fusion vector. In the example, the target fusion vector is acquired, so that the advantages of feature extraction by the feature extraction network and the self-attention network are conveniently fused, and the acquired target fusion vector is more accurate.
The capsule network is used for carrying out intention recognition on the target fusion vector. The target intention refers to the intention recognition result of the Chinese dialog text.
As an example, in step S205, the server inputs the target fusion vector into the initialized capsule layers of the capsule network, iterates the output of the initialized capsule layers using dynamic routing to obtain an iterated vector, computes the association degree between that vector and the vector corresponding to each intention category, and takes the intention category with the highest association degree as the target intention of the Chinese dialogue text. For example, in an intention recognition scenario in the insurance business field, the target fusion vector corresponding to the Chinese dialogue text "I want to take out insurance for my parents, what do you recommend?" is input into the initialization capsule layers, their output is iterated by dynamic routing to obtain an iterated vector, the association degree between that vector and the vectors of all intention categories is computed, and the intention category with the highest association degree, "insurance recommendation", is taken as the target intention of the Chinese dialogue text, thereby accurately recognizing its target intention. In this example, performing intention recognition on the target fusion vector with the capsule network to obtain the target intention provides an accurate recognition result even when the Chinese dialogue text is a small sample, improving the accuracy of intention recognition.
According to the Chinese dialogue text intention recognition method provided by this embodiment, semantic information is extracted from the Chinese dialogue text using the ERNIE pre-training model to obtain the semantic enhancement representation, strengthening the semantic representation capacity of the Chinese dialogue text and improving the quality of the semantic enhancement representation; the feature extraction network can fully learn the relations between context information in the semantic enhancement representation and can acquire an accurate first feature vector even for the semantic enhancement representation of relatively colloquial Chinese dialogue text; the self-attention network performs feature extraction on the semantic enhancement representation, so that the obtained second feature vector depends less on external information during feature extraction and adheres more closely to the internal information of the semantic enhancement representation; the first feature vector and the second feature vector are fused to obtain a target fusion vector, combining the feature-extraction advantages of the feature extraction network and the self-attention network, so that the obtained target fusion vector is more accurate; and the capsule network performs intention recognition on the target fusion vector to obtain the target intention, yielding a relatively accurate intention recognition result even when the Chinese dialogue text is a small sample, which can effectively improve the accuracy of intention recognition.
In the insurance business field, for the application scenario of intention recognition, the Chinese dialogue text intention recognition method provided by this embodiment can overcome the low intention recognition accuracy caused by highly colloquial user speech and sparse dialogue text content (i.e., the Chinese dialogue text being a small sample), acquire a relatively accurate intention recognition result, effectively improve intention recognition accuracy, and facilitate relatively accurate service guidance to the user according to the intention recognition result in actual insurance business operation.
In one embodiment, as shown in fig. 3, step S202, that is, performing feature extraction on the semantic enhancement representation by using a feature extraction network, obtains a first feature vector, includes:
S301: inputting the semantic enhancement representation to a first convolution layer to obtain a first convolution vector;
S302: performing forward traversal and backward traversal on the first convolution vector by adopting a feature extraction network to respectively acquire forward context information and backward context information;
S303: and splicing the forward context information and the backward context information to obtain a first feature vector.
Wherein the first convolution layer is a convolution layer added before the feature extraction network for performing convolution processing on the semantic enhanced representation. The first convolution vector refers to the output result of the semantic enhancement representation in the first convolution layer.
As an example, in step S301, the server inputs the semantic enhancement representation to the first convolution layer and performs convolution processing on it to obtain the first convolution vector corresponding to each word vector in the semantic enhancement representation. In this example, before feature extraction with the feature extraction network, the first convolution layer convolves the semantic enhancement representation so that the first convolution vector of each word vector captures the local correlations of the semantic enhancement representation, which improves the representation and generalization capability of the feature extraction network during subsequent feature extraction and gives it a better feature extraction effect.
The forward context information refers to a vector which is extracted by the feature extraction network through forward traversal of the first convolution vector and contains the context information. The backward context information refers to a vector containing context information extracted by the feature extraction network by backward traversing the first convolution vector.
As an example, in step S302, the server uses the feature extraction network to traverse the first convolution vectors forward, obtaining the forward context information of each first convolution vector, and to traverse them backward, obtaining the backward context information of each first convolution vector. For example, the server uses BiLSTM or BiGRU to perform the forward and backward traversals over the obtained first convolution vectors, acquiring the forward and backward context information of each. In this example, acquiring both the forward and backward context information captures the feature information of the first convolution vectors more comprehensively.
As an example, in step S303, after obtaining the forward and backward context information of each first convolution vector, the server splices them in the order of the first convolution vectors to obtain the first feature vector. In this example, splicing the forward and backward context information lets the obtained first feature vector include more detailed context information even for the semantic enhancement representation of relatively colloquial Chinese dialogue text, so that a target fusion vector containing more context information can be obtained when feature fusion is later performed with the first feature vector.
According to the Chinese dialogue text intention recognition method provided by this embodiment, the semantic enhancement representation is input to the first convolution layer to obtain the first convolution vector, and convolving the semantic enhancement representation with the first convolution layer lets the first convolution vector capture its local correlations; the feature extraction network performs forward and backward traversals over the first convolution vector to acquire the forward and backward context information respectively, so that the feature information in the first convolution vector is captured more comprehensively; and the forward and backward context information is spliced, so that even for the semantic enhancement representation of relatively colloquial Chinese dialogue text, the obtained first feature vector contains more detailed context information, and a target fusion vector containing more context information can be obtained when feature fusion is performed with it.
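As an illustration, a minimal sketch of steps S301 to S303 in Python with PyTorch follows; all layer dimensions are illustrative assumptions, not values from the patent.

# Minimal sketch of S301-S303: a first convolution layer followed by a
# BiLSTM whose bidirectional pass yields the forward and backward context
# information, concatenated into the first feature vector.
import torch
import torch.nn as nn

class ConvBiLSTM(nn.Module):
    def __init__(self, embed_dim=768, conv_dim=256, hidden_dim=384):
        super().__init__()
        # S301: first convolution layer, capturing local correlations.
        self.conv = nn.Conv1d(embed_dim, conv_dim, kernel_size=3, padding=1)
        # S302: bidirectional=True performs the forward and backward traversals.
        self.bilstm = nn.LSTM(conv_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, x):  # x: (batch, seq_len, embed_dim) semantic enhancement representation
        c = self.conv(x.transpose(1, 2)).transpose(1, 2)  # first convolution vectors
        h, _ = self.bilstm(c)
        # S303: h concatenates forward and backward context information along
        # the last dimension, giving the first feature vector per position.
        return h  # (batch, seq_len, 2 * hidden_dim)

For instance, h = ConvBiLSTM()(semantic_enhanced) would yield per-position first feature vectors for the representation computed in the earlier sketch.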
In one embodiment, as shown in fig. 4, step S203, that is, performing feature extraction on the semantic enhancement representation by using the self-attention network, obtains a second feature vector, includes:
S401: identifying the semantic enhancement representation by adopting a self-attention network, and acquiring a first self-attention vector, a second self-attention vector and a third self-attention vector corresponding to each word vector in the semantic enhancement representation;
S402: inputting the first self-attention vector corresponding to each word vector into a second convolution layer, and obtaining a fourth self-attention vector corresponding to the first self-attention vector;
S403: inputting the second self-attention vector corresponding to each word vector into a third convolution layer, and obtaining a fifth self-attention vector corresponding to the second self-attention vector;
S404: taking the dot product of the fourth self-attention vector corresponding to each word vector and the fifth self-attention vector corresponding to all word vectors as the association degree between each word vector and all word vectors, and activating the association degree to acquire a weight value corresponding to each word vector;
S405: and weighting the third self-attention vector corresponding to each word vector by adopting the weight value corresponding to each word vector to acquire a second feature vector.
The word vector refers to a vector corresponding to semantic information of the Chinese dialogue text in the semantic enhancement representation. Understandably, the semantic enhancement representation refers to a vector containing semantic information obtained by extracting semantic information from the text of the Chinese dialogue, and thus, each semantic information corresponds to one word vector in the semantic enhancement representation. The first self-attention vector is a q vector (query) corresponding to each word vector obtained by identifying each word vector by the self-attention network. The second self-attention vector refers to a k vector (key) corresponding to each word vector obtained by identifying each word vector by the self-attention network. The third self-attention vector refers to a v vector (value) corresponding to each word vector obtained by identifying each word vector by the self-attention network.
As an example, in step S401, the server identifies the semantic enhancement representation using the self-attention network, and obtains a q vector (query), a k vector (key), and a v vector (value) corresponding to each word vector in the semantic enhancement representation, where the q vector (query) corresponding to each word vector is used as a first self-attention vector, the k vector (key) corresponding to each word vector is used as a second self-attention vector, and the v vector (value) corresponding to each word vector is used as a third self-attention vector. In this example, the first self-attention vector, the second self-attention vector, and the third self-attention vector are acquired, facilitating subsequent acquisition of the second feature vector from the self-attention vectors described above.
The second convolution layer refers to a convolution layer which carries out convolution processing on the first self-attention vector. The fourth self-attention vector is the convolution output result of the second convolution layer.
As an example, in step S402, the server inputs the first self-attention vector corresponding to each word vector into the second convolution layer, performs convolution processing, and outputs a corresponding fourth self-attention vector. In this example, the second convolution layer performs convolution processing on the first self-attention vector, specifically, downsampling the first self-attention vector to reduce the length of the first self-attention vector, and obtaining a fourth self-attention vector with a reduced length. In this example, the second convolution layer is adopted to obtain the fourth self-attention vector with reduced length, so that the calculation amount is reduced when the fourth self-attention vector is calculated subsequently.
Wherein the third convolution layer refers to a convolution layer that convolves the second self-attention vector. The fifth self-attention vector is the convolution output result of the third convolution layer.
As an example, in step S403, the server inputs the second self-attention vector corresponding to each word vector into the third convolution layer, performs convolution processing, and outputs a corresponding fifth self-attention vector. In this example, the third convolution layer performs convolution processing on the second self-attention vector, specifically, downsampling the second self-attention vector to reduce the modulo length of the second self-attention vector, and obtaining a fifth self-attention vector with reduced modulo length. In this example, the third convolution layer is used to obtain the fifth self-attention vector with reduced module length, so as to reduce the calculation amount when the fifth self-attention vector is calculated subsequently.
The association degree refers to the degree of association between each word vector. The weight value is obtained based on the fourth self-attention vector and the fifth self-attention vector and is used for carrying out feature fusion on the third self-attention vector corresponding to each word vector.
As an example, in step S404, the server takes the dot product of the fourth self-attention vector corresponding to each word vector with the fifth self-attention vectors corresponding to all word vectors as the association degree between each word vector and all word vectors, and applies an activation function to the association degree to obtain the weight value corresponding to each word vector. In this example, since the fourth and fifth self-attention vectors have reduced module length, the computation is smaller, making it more convenient to obtain the weight value corresponding to each word vector; and obtaining a weight value for each word vector makes it possible to weight the third self-attention vectors according to those weight values.
As an example, in step S405, the server weights the third self-attention vector corresponding to each word vector with that word vector's weight value to obtain the second feature vector. In this example, a weighted summation is performed over the third self-attention vectors using the weight values: for weight values {α_1, α_2, …, α_i, …, α_n} and the corresponding third self-attention vectors {v_1, v_2, …, v_i, …, v_n}, the weighted sum h2 = α_1·v_1 + α_2·v_2 + … + α_n·v_n gives the second feature vector, where α_i is the weight value corresponding to the i-th word vector, v_i is the third self-attention vector corresponding to the i-th word vector, and n is the number of word vectors (and likewise of third self-attention vectors). Obtaining the second feature vector in this way reduces dependence on external information, captures more of the features in the semantic information, better fits the intention of the Chinese dialogue text, and facilitates subsequently obtaining the target fusion vector from the second feature vector.
According to the Chinese dialogue text intention recognition method provided by this embodiment, the second convolution layer yields a fourth self-attention vector of reduced length and the third convolution layer yields a fifth self-attention vector of reduced length, so that subsequent computation over the fourth and fifth self-attention vectors is cheaper; obtaining a weight value for each word vector makes it possible to weight the third self-attention vectors according to those weight values; and obtaining the second feature vector through the self-attention network reduces its dependence on external information, captures more of the features in the semantic information, better fits the intention of the Chinese dialogue text, and facilitates obtaining the target fusion vector from the second feature vector.
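As an illustration, a minimal sketch of steps S401 to S405 in Python with PyTorch follows. Where the text leaves details open, the sketch makes assumptions: the down-sampling convolutions use kernel size 1, the activation is a softmax over word positions, and each word's association degree is aggregated by summing its dot products with all words; dimensions are illustrative.

# Minimal sketch of S401-S405: self-attention whose q and k vectors are
# down-projected by convolution layers before the dot products, and whose
# per-word weight values produce the second feature vector as a weighted
# sum of the v vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownProjectedSelfAttention(nn.Module):
    def __init__(self, dim=768, reduced=128):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)  # first self-attention vectors (q)
        self.k_proj = nn.Linear(dim, dim)  # second self-attention vectors (k)
        self.v_proj = nn.Linear(dim, dim)  # third self-attention vectors (v)
        # S402/S403: second and third convolution layers down-sample q and k.
        self.q_conv = nn.Conv1d(dim, reduced, kernel_size=1)
        self.k_conv = nn.Conv1d(dim, reduced, kernel_size=1)

    def forward(self, x):  # x: (batch, seq_len, dim)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        q4 = self.q_conv(q.transpose(1, 2)).transpose(1, 2)  # fourth vectors
        k5 = self.k_conv(k.transpose(1, 2)).transpose(1, 2)  # fifth vectors
        # S404: dot products of each word's fourth vector with all fifth
        # vectors, aggregated into one association degree per word, then
        # activated into weight values (softmax is an assumption).
        assoc = torch.bmm(q4, k5.transpose(1, 2)).sum(dim=-1)  # (batch, seq_len)
        alpha = F.softmax(assoc, dim=-1)
        # S405: weighted sum of the third self-attention vectors.
        return torch.bmm(alpha.unsqueeze(1), v).squeeze(1)  # (batch, dim)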
In one embodiment, as shown in fig. 5, step S204, i.e. performing feature fusion on the first feature vector and the second feature vector, obtains a target fusion vector, includes:
S501: correcting the first feature vector by adopting a first weight to obtain a first target vector;
S502: correcting the second feature vector by adopting a second weight to obtain a second target vector;
S503: and taking the sum of the first target vector and the second target vector as a target fusion vector.
Wherein the first weight is a hyperparameter that corrects the first feature vector, taking a value between 0 and 1. The first target vector is the vector obtained by correcting the first feature vector.
As an example, in step S501, the server corrects the first feature vector using the first weight to obtain the corrected first target vector. In this example, for a first feature vector h1 and a first weight β, the first target vector is β·h1. Acquiring the corrected first target vector ensures that the subsequently obtained target fusion vector includes the vector features of the first feature vector.
Wherein the second weight is a hyperparameter that corrects the second feature vector, taking a value between 0 and 1. The second target vector is the vector obtained by correcting the second feature vector.
As an example, in step S502, the server corrects the second feature vector using the second weight to obtain the corrected second target vector. In this example, for a second feature vector h2 and a second weight γ, the second target vector is γ·h2. Acquiring the corrected second target vector ensures that the subsequently obtained target fusion vector includes the vector features of the second feature vector.
As an example, in step S503, the server takes the sum of the first target vector and the second target vector as the target fusion vector; in this example, the target fusion vector is v = β·h1 + γ·h2. Fusing the first and second target vectors gives a target fusion vector that contains more detailed context information even for the semantic enhancement representation of relatively colloquial Chinese dialogue text, inheriting the advantage of the feature extraction network, and that captures more of the features in the semantic information and better fits the intention of the Chinese text, inheriting the advantage of the self-attention network, which improves accuracy when intention recognition is subsequently performed on the target fusion vector.
According to the Chinese dialogue text intention recognition method provided by this embodiment, the first and second target vectors are fused so that the obtained target fusion vector both contains more detailed context information and captures more of the features in the semantic information, fitting the intention of the Chinese dialogue text and improving accuracy when intention recognition is subsequently performed on the target fusion vector.
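As an illustration, a minimal sketch of this fusion in Python with PyTorch follows; the weight values and the mean-pooling of the per-position first feature vectors into a single vector are assumptions made for the sake of a runnable example.

# Minimal sketch of S501-S503: weighted fusion of the two feature vectors.
import torch

h = torch.randn(1, 12, 768)   # per-position first feature vectors (e.g., from ConvBiLSTM)
h2 = torch.randn(1, 768)      # second feature vector (e.g., from DownProjectedSelfAttention)
h1 = h.mean(dim=1)            # pool to a single first feature vector (assumption)
beta, gamma = 0.6, 0.4        # first and second weights, illustrative values in (0, 1)
v = beta * h1 + gamma * h2    # target fusion vector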
In one embodiment, as shown in fig. 6, step S205, namely inputting the target fusion vector into a capsule network for intention recognition to obtain the target intention of the Chinese dialogue text, includes:
S601: respectively inputting the target fusion vector into each initialized capsule layer of the capsule network, and obtaining an initial capsule vector corresponding to the target fusion vector;
S602: dynamically routing the initial capsule vector to obtain a target capsule vector;
S603: calculating the intention similarity between the target capsule vector and each candidate intention vector, taking the candidate intention vector corresponding to the maximum intention similarity as a target intention vector, and taking the intention corresponding to the target intention vector as a target intention.
The initialization capsule layer is used for converting the target fusion vector into a corresponding initial capsule vector in the capsule network. The initial capsule vector is a vector which is output after the target fusion vector is converted by the initial capsule layer.
As an example, in step S601, the server inputs the target fusion vector to each initialization capsule layer of the capsule network, and each initialization capsule layer converts the input target fusion vector into a corresponding initial capsule vector. In this example, an initial capsule vector corresponding to the target fusion vector is obtained, so that a corresponding target capsule vector is conveniently obtained according to the initial capsule vector.
The dynamic route is used for carrying out iterative updating on the initial capsule vector to obtain the target capsule vector. The target capsule vector is the output vector of the capsule network.
As an example, in step S602, the server processes each initial capsule vector in a dynamic routing manner, and uses the finally output vector as the target capsule vector. In this example, the target capsule vector is obtained, so that the intention recognition is conveniently performed on the target capsule vector, and the target intention is obtained.
The candidate intention vectors are vectors corresponding to the different preset intention categories used for intention recognition of the target capsule vector. The intention similarity refers to the similarity between the target capsule vector and each candidate intention vector. The target intention vector is the candidate intention vector with the greatest intention similarity. The target intention is the intention corresponding to the target intention vector.
As an example, in step S603, after obtaining the target capsule vector output by the capsule network, the server calculates the intention similarity between the target capsule vector and each candidate intention vector, takes the candidate intention vector with the maximum intention similarity as the target intention vector, and takes the intention corresponding to the target intention vector as the target intention, thereby realizing intention recognition of the Chinese dialogue text. In this example, cosine similarity may be chosen as the intention similarity between the target capsule vector and each candidate intention vector; that is, the intention similarity is determined by computing their cosine similarity, a larger cosine similarity meaning a larger intention similarity. The candidate intention vector corresponding to the maximum intention similarity is taken as the target intention vector, and the target intention corresponding to it as the intention recognition result.
According to the Chinese dialogue text intention recognition method provided by this embodiment, the intention similarity between the target capsule vector and each candidate intention vector is calculated, the candidate intention vector with the maximum intention similarity is taken as the target intention vector, and the target intention corresponding to it is taken as the intention recognition result, so that a relatively accurate intention recognition result can be achieved even in the small-sample case.
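As an illustration, a minimal sketch of step S603 in Python with PyTorch follows; the function and variable names are illustrative, not from the patent.

# Minimal sketch of S603: pick the candidate intention whose vector is most
# cosine-similar to the target capsule vector.
import torch
import torch.nn.functional as F

def pick_intention(target_capsule, candidate_vectors, intention_labels):
    # target_capsule: (dim,); candidate_vectors: (num_intentions, dim)
    sims = F.cosine_similarity(target_capsule.unsqueeze(0), candidate_vectors, dim=-1)
    best = int(sims.argmax())
    return intention_labels[best], float(sims[best])

# Example with random vectors and two candidate intentions:
labels = ["insurance recommendation", "claim inquiry"]
label_vecs = torch.randn(2, 16)
print(pick_intention(torch.randn(16), label_vecs, labels))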
In one embodiment, as shown in fig. 7, step S602, namely dynamically routing the initial capsule vector to obtain the target capsule vector, includes:
S701: acquiring initial weights corresponding to each initial capsule vector, and acquiring a first intermediate vector according to the N initial capsule vectors and the initial weights corresponding to them;
S702: respectively calculating vector similarity between the N initial capsule vectors and the first intermediate vector, determining the vector similarity as the target weight of the initial capsule vectors, and acquiring a second intermediate vector according to the N initial capsule vectors and the N target weights;
S703: if the second intermediate vector is the last iteration result in the preset iteration times, determining the second intermediate vector as the target capsule vector;
S704: if the second intermediate vector is not the last iteration result in the preset iteration times, updating the second intermediate vector into the first intermediate vector, and continuing to calculate the vector similarity between the N initial capsule vectors and the first intermediate vector.
Wherein the number of initial capsule vectors is N, with N greater than or equal to 2. The initial weight is the weight corresponding to each initial capsule vector, set according to empirical values. The first intermediate vector is the vector obtained by processing each initial capsule vector according to its initial weight.
As an example, in step S701, the server obtains a preset initial weight, and performs an operation process on the initial capsule vector according to the initial weight to obtain a first intermediate vector. In this example, a first intermediate vector is obtained, facilitating the obtaining of a second intermediate vector from the first intermediate vector.
The vector similarity refers to the similarity between the first intermediate vector and each initial capsule vector. The target weights are determined by vector similarity. The second intermediate vector is a vector obtained by processing each initial capsule vector according to the target weight.
As an example, in step S702, the server calculates a vector similarity between each initial capsule vector and the first intermediate vector from among N initial capsule vectors, and uses the vector similarity between each initial capsule vector and the first intermediate vector as a target weight corresponding to the initial capsule vector to obtain N target weights. And respectively carrying out weighted summation processing on the N initial capsule vectors corresponding to the N target weights to obtain weighted summation processing results, and carrying out activation compression on the weighted summation processing results by using a square compression activation function to obtain a second intermediate vector with the modular length range of [0,1 ]. In this example, the compressed second intermediate vector is obtained, so that the target capsule vector is obtained according to the second intermediate vector.
The preset iteration times refer to the preset times of iterating the initial capsule vector by adopting a dynamic routing mode.
As an example, in step S703, when the server determines that the obtained second intermediate vector is the iteration result obtained in the last iteration in the preset number of iterations, the server outputs the second intermediate vector corresponding to the iteration result as the target capsule vector. Understandably, if the second intermediate vector is the iteration result obtained in the last iteration in the preset iteration times, it indicates that the iteration of the initial capsule vector has reached the preset iteration times, and the obtained second intermediate vector is the target capsule vector without performing the iteration again.
As an example, in step S704, when it is determined that the obtained second intermediate vector is not the iteration result obtained in the last iteration, the server updates the second intermediate vector to the first intermediate vector, and continues to perform calculation of the vector similarity between the N initial capsule vectors and the first intermediate vector, that is, continues to iterate the initial capsule vector until it is determined that the obtained second intermediate vector is the iteration result obtained in the last iteration in the preset iteration number, and uses the second intermediate vector obtained in the last iteration as the target capsule vector. Understandably, if the second intermediate vector is not the iteration result obtained by the last iteration in the preset iteration times, it indicates that the iteration of the initial capsule vector does not reach the preset iteration times, and the iteration needs to be continued on the initial capsule vector until the second intermediate vector obtained by the determination is the iteration result obtained by the last iteration in the preset iteration times.
According to the Chinese dialogue text intention recognition method provided by this embodiment, dynamic routing is adopted to iterate over the initial capsule vectors for the preset number of iterations, obtaining the target capsule vector corresponding to a small-sample Chinese dialogue text. The method iterates well on small-sample data sets and can comprehensively capture the relation between the initial capsule vectors and the intention, so that the obtained target capsule vector contains more intention information and yields a good recognition effect when intention recognition is subsequently performed on it.
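As an illustration, a minimal sketch of the dynamic-routing loop of steps S701 to S704 in Python with PyTorch follows. Uniform initial weights, cosine similarity as the vector similarity, and the standard capsule-network squash form for the square compression activation function are assumptions where the patent defers to empirical values or does not fix a formula.

# Minimal sketch of S701-S704: iterate the intermediate vector a preset
# number of times, re-weighting each initial capsule vector by its
# similarity to the current intermediate vector.
import torch
import torch.nn.functional as F

def squash(s, eps=1e-8):
    # Square compression activation: keeps direction, maps the norm into [0, 1).
    sq = (s * s).sum(dim=-1, keepdim=True)
    return (sq / (1.0 + sq)) * s / torch.sqrt(sq + eps)

def dynamic_routing(u, num_iterations=3):
    # u: (N, dim) initial capsule vectors, N >= 2.
    w = torch.full((u.size(0), 1), 1.0 / u.size(0))  # initial weights (assumption: uniform)
    v = squash((w * u).sum(dim=0))                   # first intermediate vector (S801-S803)
    for _ in range(num_iterations - 1):              # preset iteration times
        w = F.cosine_similarity(u, v.unsqueeze(0), dim=-1).unsqueeze(-1)  # target weights (S702)
        v = squash((w * u).sum(dim=0))               # second intermediate vector
    return v                                         # target capsule vector (S703)

print(dynamic_routing(torch.randn(4, 16)).norm())  # norm lies in [0, 1)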
In an embodiment, as shown in fig. 8, step S701, that is, obtaining a first intermediate vector according to N initial capsule vectors and their corresponding initial weights, includes:
s801: determining the product of each initial capsule vector and the corresponding initial weight as a first initial vector corresponding to each initial capsule vector;
s802: determining the sum of first initial vectors corresponding to the N initial capsule vectors as a second initial vector;
s803: and performing compression activation processing on the second initial vector by adopting a square compression activation function to obtain a first intermediate vector.
As an example, in step S801, the server determines the product of each initial capsule vector and its corresponding initial weight as the first initial vector of that initial capsule vector, obtaining N first initial vectors. In this example, a first initial vector is obtained for each initial capsule vector so that the sum of the first initial vectors can be obtained subsequently.
As an example, in step S802, the server determines a sum of the first initial vectors corresponding to the N initial capsule vectors as the second initial vector, so as to facilitate the subsequent acquisition of the first intermediate vector.
The square compression activation function is used to compress the second initial vector so that the modulus of the second initial vector lies in the range [0,1].
As an example, in step S803, the server performs compression activation processing on the second initial vector using the square compression activation function to obtain the first intermediate vector, whose modulus lies in the range [0,1]. In this example, the first intermediate vector is obtained so that the target weights can be determined according to the similarity between the first intermediate vector and each initial capsule vector.
According to the Chinese dialogue text intention recognition method, the product of each initial capsule vector and its corresponding initial weight is determined as the first initial vector of that capsule vector, the sum of the first initial vectors corresponding to the N initial capsule vectors is determined as the second initial vector, and the second initial vector is compressed and activated with the square compression activation function to obtain the first intermediate vector.
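In standard capsule-network notation, steps S801 to S803 can be written compactly as follows; the exact squash form is an assumption borrowed from the capsule-network literature, since the embodiment states only that the modulus lies in [0,1]:

$$ s = \sum_{i=1}^{N} w_i u_i, \qquad v = \operatorname{squash}(s) = \frac{\lVert s \rVert^{2}}{1 + \lVert s \rVert^{2}} \cdot \frac{s}{\lVert s \rVert} $$

where the $u_i$ are the initial capsule vectors, the $w_i$ their initial weights, each $w_i u_i$ a first initial vector, $s$ the second initial vector, and $v$ the first intermediate vector.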
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
In one embodiment, a Chinese dialogue text intention recognition device is provided, which corresponds one-to-one to the Chinese dialogue text intention recognition method in the above embodiment. As shown in fig. 9, the Chinese dialogue text intention recognition device includes a semantic enhancement representation acquisition module 901, a first feature vector acquisition module 902, a second feature vector acquisition module 903, a target fusion vector acquisition module 904, and a target intention acquisition module 905. The functional modules are described in detail as follows:
the semantic enhancement representation acquisition module 901 is used for acquiring a Chinese dialogue text, inputting the Chinese dialogue text into the ERNIE pre-training model, and outputting a semantic enhancement representation corresponding to the Chinese dialogue text;
a first feature vector acquisition module 902, configured to perform feature extraction on the semantic enhancement representation by using a feature extraction network to acquire a first feature vector;
a second feature vector acquisition module 903, configured to perform feature extraction on the semantic enhancement representation by using a self-attention network to acquire a second feature vector;
a target fusion vector acquisition module 904, configured to perform feature fusion on the first feature vector and the second feature vector to acquire a target fusion vector;
a target intention acquisition module 905, configured to input the target fusion vector into the capsule network for intention recognition and acquire the target intention of the Chinese dialogue text.
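For orientation only, the way these five modules chain together can be sketched as follows; the components passed in (`ernie`, `extractor1`, `extractor2`, `capsule_net`) and the fusion weights are placeholders for the concrete modules above, not the embodiment's implementation:

```python
import torch.nn as nn

class IntentRecognitionDevice(nn.Module):
    # Hypothetical wiring of the five modules described above.
    def __init__(self, ernie, extractor1, extractor2, capsule_net,
                 w1: float = 0.5, w2: float = 0.5):
        super().__init__()
        self.ernie, self.ext1, self.ext2 = ernie, extractor1, extractor2
        self.capsule_net, self.w1, self.w2 = capsule_net, w1, w2

    def forward(self, token_ids):
        h = self.ernie(token_ids)             # semantic enhancement representation (module 901)
        f1, f2 = self.ext1(h), self.ext2(h)   # first / second feature vectors (modules 902, 903)
        fused = self.w1 * f1 + self.w2 * f2   # target fusion vector (module 904)
        return self.capsule_net(fused)        # target intention (module 905)
```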
In an embodiment, the first feature vector acquisition module 902 includes:
the first convolution vector acquisition submodule is used for inputting the semantic enhancement representation into the first convolution layer to acquire a first convolution vector;
the context information acquisition sub-module is used for performing forward traversal and backward traversal on the first convolution vector by adopting the feature extraction network to acquire forward context information and backward context information respectively;
and the first feature vector acquisition sub-module is used for splicing the forward context information and the backward context information to acquire the first feature vector.
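As a hedged illustration, the first convolution layer plus forward/backward traversal can be realized with a one-dimensional convolution followed by a BiLSTM; treating the feature extraction network as a BiLSTM, together with all layer sizes, is an assumption of this sketch:

```python
import torch
import torch.nn as nn

class FirstFeatureExtractor(nn.Module):
    # Conv1d -> BiLSTM; the BiLSTM's forward and backward passes play the role
    # of the forward and backward traversal described above.
    def __init__(self, emb_dim: int = 768, conv_dim: int = 256, hidden: int = 128):
        super().__init__()
        self.conv1 = nn.Conv1d(emb_dim, conv_dim, kernel_size=3, padding=1)  # first convolution layer
        self.bilstm = nn.LSTM(conv_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, emb_dim), the semantic enhancement representation.
        c = self.conv1(x.transpose(1, 2)).transpose(1, 2)  # first convolution vector
        out, _ = self.bilstm(c)  # forward and backward context information,
        return out               # concatenated on the last dim: the first feature vector
```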
In one embodiment, the second feature vector acquisition module 903 includes:
the first self-attention vector acquisition sub-module is used for recognizing the semantic enhancement representation by adopting a self-attention network, and acquiring a first self-attention vector, a second self-attention vector and a third self-attention vector corresponding to each word vector in the semantic enhancement representation;
the second self-attention vector acquisition sub-module is used for inputting the first self-attention vector corresponding to each word vector into the second convolution layer, and acquiring a fourth self-attention vector corresponding to the first self-attention vector;
the third self-attention vector acquisition sub-module is used for inputting the second self-attention vector corresponding to each word vector into the third convolution layer, and acquiring a fifth self-attention vector corresponding to the second self-attention vector;
the weight value acquisition sub-module is used for taking the dot product of the fourth self-attention vector corresponding to each word vector and the fifth self-attention vector corresponding to all word vectors as the association degree between each word vector and all word vectors, and performing activation processing on the association degree to acquire the weight value corresponding to each word vector;
and the second feature vector acquisition sub-module is used for acquiring a second feature vector by adopting a weight value corresponding to each word vector to carry out weighting processing on a third self-attention vector corresponding to each word vector.
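A minimal sketch of this convolution-augmented self-attention is given below; the linear projections producing the first, second, and third self-attention vectors, the kernel sizes, and the softmax as the activation are assumptions where the embodiment leaves details open:

```python
import torch
import torch.nn as nn

class SecondFeatureExtractor(nn.Module):
    # Self-attention whose query/key branches pass through convolution layers.
    def __init__(self, emb_dim: int = 768, attn_dim: int = 256):
        super().__init__()
        self.q = nn.Linear(emb_dim, attn_dim)  # first self-attention vector
        self.k = nn.Linear(emb_dim, attn_dim)  # second self-attention vector
        self.v = nn.Linear(emb_dim, attn_dim)  # third self-attention vector
        self.conv_q = nn.Conv1d(attn_dim, attn_dim, 3, padding=1)  # second convolution layer
        self.conv_k = nn.Conv1d(attn_dim, attn_dim, 3, padding=1)  # third convolution layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, emb_dim), the semantic enhancement representation.
        q, k, v = self.q(x), self.k(x), self.v(x)
        q4 = self.conv_q(q.transpose(1, 2)).transpose(1, 2)  # fourth self-attention vectors
        k5 = self.conv_k(k.transpose(1, 2)).transpose(1, 2)  # fifth self-attention vectors
        assoc = q4 @ k5.transpose(1, 2)          # dot products: association degrees
        weights = torch.softmax(assoc, dim=-1)   # activation -> weight value per word vector
        return weights @ v                       # weighted third vectors: second feature vector
```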
In one embodiment, the target fusion vector acquisition module 904 includes:
the first target vector acquisition sub-module is used for correcting the first feature vector by adopting the first weight to acquire a first target vector;
the second target vector acquisition sub-module is used for correcting the second feature vector by adopting a second weight to acquire a second target vector;
and the target fusion vector acquisition sub-module is used for taking the sum value of the first target vector and the second target vector as a target fusion vector.
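This fusion reduces to a weighted sum, sketched below; the weight values are illustrative (the embodiment does not fix them), and the two feature vectors are assumed to have matching shapes, projected beforehand if necessary:

```python
import torch

def fuse_features(f1: torch.Tensor, f2: torch.Tensor,
                  w1: float = 0.5, w2: float = 0.5) -> torch.Tensor:
    # Corrects each feature vector with its weight and sums the results:
    # first target vector + second target vector = target fusion vector.
    return w1 * f1 + w2 * f2
```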
In one embodiment, the target intention acquisition module 905 includes:
the initial capsule vector acquisition sub-module is used for respectively inputting the target fusion vector into each initialized capsule layer of the capsule network to acquire an initial capsule vector corresponding to the target fusion vector;
the target capsule vector acquisition sub-module is used for dynamically routing the initial capsule vector to acquire a target capsule vector;
the target intention obtaining sub-module is used for calculating the intention similarity between the target capsule vector and each candidate intention vector, taking the candidate intention vector corresponding to the maximum intention similarity as a target intention vector, and taking the intention corresponding to the target intention vector as a target intention.
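The intention-similarity step amounts to a nearest-neighbour lookup over the candidate intention vectors; the use of cosine similarity below is an assumed choice, since the embodiment does not name the similarity metric:

```python
import torch
import torch.nn.functional as F

def predict_intent(target_capsule: torch.Tensor,
                   candidate_intents: torch.Tensor) -> int:
    # target_capsule: (D,); candidate_intents: (num_intents, D).
    sims = F.cosine_similarity(candidate_intents,
                               target_capsule.unsqueeze(0), dim=-1)
    return int(sims.argmax())  # index of the target intention vector
```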
In one embodiment, the target capsule vector acquisition sub-module comprises:
the first intermediate vector acquisition unit is used for acquiring initial weights corresponding to each initial capsule vector, and acquiring first intermediate vectors according to the N initial capsule vectors and the initial weights corresponding to the N initial capsule vectors;
the second intermediate vector acquisition unit is used for respectively calculating the vector similarity between each of the N initial capsule vectors and the first intermediate vector, determining each vector similarity as the target weight of the corresponding initial capsule vector, and acquiring the second intermediate vector according to the N initial capsule vectors and the N target weights;
the first acquisition unit is used for determining the second intermediate vector as the target capsule vector if the second intermediate vector is the result of the last iteration within the preset number of iterations;
and the second acquisition unit is used for updating the first intermediate vector to the second intermediate vector if the second intermediate vector is not the result of the last iteration within the preset number of iterations, and continuing to calculate the vector similarity between the N initial capsule vectors and the first intermediate vector.
Wherein the number of initial capsule vectors is N, and N is greater than or equal to 2.
in an embodiment, the first intermediate vector acquisition unit includes:
a first initial vector determining subunit, configured to determine, as a first initial vector corresponding to each initial capsule vector, a product of each initial capsule vector and an initial weight corresponding to the initial capsule vector;
a second initial vector determining subunit, configured to determine a sum of first initial vectors corresponding to the N initial capsule vectors as a second initial vector;
a first intermediate vector acquisition subunit, configured to perform compression activation processing on the second initial vector by using the square compression activation function to obtain the first intermediate vector.
For specific limitations on the Chinese dialogue text intention recognition device, reference may be made to the above limitations on the Chinese dialogue text intention recognition method, and details are not repeated here. Each module in the above Chinese dialogue text intention recognition device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, a processor in the computer device, or may be stored in software form in a memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data adopted or generated in the process of executing the Chinese dialogue text intention recognition method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a method for Chinese dialogue text intent recognition.
In an embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the Chinese dialogue text intention recognition method in the above embodiments, for example, S201 to S205 shown in fig. 2, or the steps shown in figs. 3 to 8, which are not repeated here. Alternatively, when executing the computer program, the processor implements the functions of the modules/units in the above embodiment of the Chinese dialogue text intention recognition device, for example, the functions of the semantic enhancement representation acquisition module 901, the first feature vector acquisition module 902, the second feature vector acquisition module 903, the target fusion vector acquisition module 904, and the target intention acquisition module 905 shown in fig. 9, which are not repeated here.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the Chinese dialogue text intention recognition method in the above embodiments, for example, S201 to S205 shown in fig. 2, or the steps shown in figs. 3 to 8, which are not repeated here. Alternatively, when executed by the processor, the computer program implements the functions of the modules/units in the embodiment of the Chinese dialogue text intention recognition device, for example, the functions of the semantic enhancement representation acquisition module 901, the first feature vector acquisition module 902, the second feature vector acquisition module 903, the target fusion vector acquisition module 904, and the target intention acquisition module 905 shown in fig. 9, which are not repeated here. The computer-readable storage medium may be non-volatile or volatile.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by instructing related hardware through a computer program, which may be stored in a non-volatile computer-readable storage medium and which, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. The volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the division of the above functional units and modules is illustrated; in practical applications, the above functions may be allocated to different functional units and modules as required, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included within the protection scope of the present invention.
