CN117669565A

Info

Publication number: CN117669565A
Application number: CN202311791354.8A
Authority: CN
Inventors: 贾文雷; 喇全亮; 刘升平; 梁家恩
Original assignee: Unisound Intelligent Technology Co Ltd
Current assignee: Unisound Intelligent Technology Co Ltd
Priority date: 2023-12-24
Filing date: 2023-12-24
Publication date: 2024-03-08

Abstract

Embodiments of the invention relate to a slot value extraction method and device based on slot meanings of a large model. The method comprises: acquiring the related information of the mth slot at the current moment, wherein the related information is formed by splicing a sentence to be filled, the original input text and a filling instruction, and the sentence to be filled consists of the slot name, meaning explanation and corresponding slot value of a first slot, of a second slot, and of a third slot; determining an attention matrix of the mth slot based on the related information of the mth slot at the current moment, a parameter matrix and the hidden-layer representation vector of the decoder at the previous moment, wherein the parameter matrix comprises a q matrix, a k matrix and a v matrix in the attention mechanism; determining a vocabulary probability distribution corresponding to the mth slot based on the attention matrix of the mth slot; and determining a slot value corresponding to the mth slot based on the vocabulary probability distribution, and replacing the corresponding blank-filling symbol with the slot value to obtain the filled sentence.

Description

Method and device for extracting slot values of slot meaning based on large model

Technical Field

The invention relates to the technical field of computers, in particular to a slot value extraction method and device based on a large model for slot meaning.

Background

Slot value extraction is a key task in human-machine dialogue systems. It performs lexical and grammatical analysis on a sentence in a given domain and, taking the context into account, judges whether a segment of the sentence string is a word slot that carries a certain meaning and fits the context. In other words, the purpose of slot value extraction is to identify the slot values in a user's sentence. In recent years, with the rapid development of deep learning, slot value extraction has advanced greatly, and researchers have proposed a series of effective algorithms. In the approaches currently on the market, the text and an instruction for extracting slots are generally used as the model input, and the model outputs slot names and their corresponding slot values in a specific structure. However, the following problems are prone to occur with this approach:

1. abnormal slots are generated;

2. the extracted slot values are inaccurate.

Therefore, how to avoid generating abnormal slots and extracting inaccurate slot values during slot value extraction is a technical problem to be solved.

Disclosure of Invention

Based on the above problems, the invention provides a method and a device for extracting a slot value based on a large model slot meaning, so as to solve the problems in the prior art.

To this end, in a first aspect, the present invention provides a method for extracting slot values based on the meaning of slots of a large model, the method comprising:

determining an attention matrix of the mth slot based on the related information of the mth slot at the current moment, the parameter matrix and the hidden-layer representation vector of the decoder at the previous moment; wherein the parameter matrix comprises a q matrix, a k matrix and a v matrix in an attention mechanism;

determining a vocabulary probability distribution corresponding to the mth slot based on the attention matrix of the mth slot;

and determining a slot value corresponding to the mth slot based on the vocabulary probability distribution, and replacing the corresponding blank-filling symbol with the slot value to obtain the filled sentence.

In one possible implementation manner, the related information is formed by splicing the sentence to be filled, the original input text and the filling instruction, specifically:

Input text: O\nSentence to be filled: U\nI

where I represents the blank-filling instruction, n represents the total number of slots, O represents the original input text, and U represents the sentence to be filled, which is expressed as follows:

"The meaning of $s_0$ is [explanation of $s_0$], and its corresponding value is _; …; the meaning of $s_{m-1}$ is [explanation of $s_{m-1}$], and its corresponding value is _; …; the meaning of $s_{n-1}$ is [explanation of $s_{n-1}$], and its corresponding value is _;"

where _ marks the position of the slot value; $s_{n-1}$ denotes the slot name of the nth slot; and $t_{n-1}$ denotes the length of the explanation text of the nth slot.

In one possible implementation manner, the determining, based on the attention matrix of the mth slot, a vocabulary probability distribution corresponding to the mth slot specifically includes:

determining a hidden layer representation vector output at the current moment based on the attention matrix of the mth slot position;

and determining the vocabulary probability distribution corresponding to the mth slot based on the hidden layer representation vector output at the current moment.

In one possible implementation manner, determining the attention matrix of the mth slot based on the related information of the mth slot at the current moment, the parameter matrix and the hidden-layer representation vector of the decoder at the previous moment specifically comprises determining the attention matrix of the mth slot by the following formula:

[formula not reproduced]

where $atten_m$ denotes the attention matrix corresponding to the mth slot; $o_{m-1}$ is the hidden-layer representation vector of the decoder at the previous moment; $W^v$, $W^q$ and $W^k$ are the v matrix, the q matrix and the k matrix, respectively; $d_k$ denotes the dimension of $i_m$ and $o_{m-1}$; $i_m$ is the encoding vector matrix of the related information of the mth slot at the current moment; and $T$ denotes transposition.

In one possible implementation manner, determining the vocabulary probability distribution corresponding to the mth slot based on the attention matrix of the mth slot specifically comprises determining the vocabulary probability distribution corresponding to the mth slot according to the following formula:

$prob_{vocb} = \mathrm{softmax}(o_m)$

where $prob_{vocb}$ denotes the vocabulary probability distribution; $atten_m$ denotes the attention matrix corresponding to the mth slot; the parameter matrix applies a linear transformation that converts the length of the attention output into the length of the vocabulary; and $o_m$ denotes the hidden-layer representation vector of the decoder at the current moment.
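
For reference, the softmax in the formula above is the standard normalised exponential. Written out component by component, with $|V|$ standing for the vocabulary length (a symbol introduced here only for illustration), it reads:

```latex
prob_{vocb}[j] = \frac{\exp\left(o_m[j]\right)}{\sum_{k=1}^{|V|} \exp\left(o_m[k]\right)}, \qquad j = 1, \dots, |V|
```

Every component lies between 0 and 1 and the components sum to 1, which is what allows $prob_{vocb}$ to be read as a probability distribution over the vocabulary.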

In a second aspect, the present invention provides a slot value extraction device based on a slot meaning of a large model, the device comprising:

the attention matrix module is used for determining an attention matrix of the mth slot based on the related information of the mth slot at the current moment, the parameter matrix and the hidden-layer representation vector of the decoder at the previous moment; wherein the parameter matrix comprises a q matrix, a k matrix and a v matrix in an attention mechanism;

the vocabulary probability distribution module is used for determining a vocabulary probability distribution corresponding to the mth slot based on the attention matrix of the mth slot;

and the slot value module is used for determining a slot value corresponding to the mth slot based on the vocabulary probability distribution, and replacing the corresponding blank-filling symbol with the slot value to obtain the filled sentence.

In one possible implementation, the vocabulary probability distribution module includes:

the first determining unit is used for determining a hidden layer representation vector output at the current moment based on the attention matrix of the mth slot position;

and the second determining unit is used for determining the vocabulary probability distribution corresponding to the mth slot position based on the hidden layer representation vector output at the current moment.

In a third aspect, the present invention provides a computer server comprising: memory, processor, and transceiver;

the processor is used for coupling with the memory, reading and executing the instructions in the memory to realize the slot value extraction method based on the slot meaning of the large model in the first aspect;

the transceiver is coupled to the processor and is controlled by the processor to transmit and receive messages.

In a fourth aspect, the present invention provides a chip system, including a processor, where the processor is coupled to a memory, where the memory stores program instructions, and where the program instructions stored in the memory implement the slot value extraction method according to the first aspect based on the slot meaning of the large model when executed by the processor.

In a fifth aspect, the present invention provides a computer readable storage medium, on which a computer program is stored, the computer program being executed by a processor to perform the slot value extraction method according to the first aspect based on the slot meaning of the large model.

According to the slot value extraction method based on slot meanings of a large model provided by the invention, during slot information extraction the slot names and slot values of the slots are organized into one sentence that includes an explanation of the meaning of each slot. This sentence is spliced with the original input text and input into the large model, so that the large model can fully understand the meaning of each slot and accurately extract the slot value corresponding to each slot, while also being prevented from outputting abnormal slots.

Drawings

FIG. 1 is a schematic flow chart of a method for extracting slot values based on the meaning of slot positions of a large model according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of determining a vocabulary probability distribution;

FIG. 3 is a schematic diagram of a slot value extraction device based on the meaning of a slot position of a large model according to a second embodiment of the present invention;

FIG. 4 is a schematic diagram of the structure of the vocabulary probability distribution module;

FIG. 5 is a schematic diagram of a computer server according to a third embodiment of the present invention;

FIG. 6 is a schematic diagram of a chip system according to a fourth embodiment of the present invention;

FIG. 7 is a schematic diagram of a computer-readable storage medium according to a fifth embodiment of the present invention.

Detailed Description

The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.

It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Example 1

The first embodiment of the present invention provides a slot value extraction method based on structural enhancement of a large model. The execution body of the method is a server, or a system or device with computing and processing capability. For slot value extraction, based on a pre-trained large model, the slot names and slot values of the slots are organized into a sentence U in natural language text; the sentence includes the explanation corresponding to each slot, and the underscore mark "_" is used at the position of each slot value. When the large model performs slot value extraction, the sentence U, the original input text O and the blank-filling instruction I are input to the model together. During decoding, the large model can fully understand the meaning of each slot and accurately extract the corresponding slot value, so that it outputs the sentence with each slot value mark position filled by the extracted slot value. Here the large model refers to a large-scale language model, such as OpenAI's ChatGPT, which is not described in detail here. As shown in FIG. 1, the method includes the following steps:

Step 110, acquiring the related information of the mth slot at the current moment;

The splicing is explained in detail below:

First, the set of slot names of the predefined slots is defined as $S = \{s_0, s_1, \dots, s_{n-1}\}$, where n represents the total number of slots and $s_l$ ($0 \le l \le n-1$) denotes the slot name of the $(l+1)$th slot; the set of slot values is defined as $V = \{v_0, v_1, \dots, v_{n-1}\}$, where $v_l$ ($0 \le l \le n-1$) denotes the slot value corresponding to $s_l$.

The slot names and slot values of the slots are organized into a sentence U in natural language, in which the position of each slot value is marked with an underscore "_" and an explanation is given for each slot. The explanation corresponding to $s_j$ ($0 \le j < n$) is a text of length $t_j$, where t denotes the length of the text and j denotes the jth slot. The sentence U may be expressed as "The meaning of $s_0$ is [explanation of $s_0$], and its corresponding value is _; the meaning of $s_1$ is [explanation of $s_1$], and its corresponding value is _; …; the meaning of $s_{n-1}$ is [explanation of $s_{n-1}$], and its corresponding value is _;".

According to the above definition of the sentence, the related information of the mth slot at the current moment is specifically:

Input text: O\nSentence to be filled: U\nI
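
The splicing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_sentence` and `build_prompt`, the English clause wording, and the example slots, input text and instruction are all hypothetical and do not come from the patent text.

```python
from typing import Sequence

BLANK = "_"  # blank-filling symbol used in the sentence U


def build_sentence(slot_names: Sequence[str], explanations: Sequence[str]) -> str:
    """Build the sentence to be filled U: one clause per slot, each ending in a blank."""
    if len(slot_names) != len(explanations):
        raise ValueError("each slot needs exactly one explanation")
    clauses = [
        f"The meaning of {name} is {meaning}, and its corresponding value is {BLANK};"
        for name, meaning in zip(slot_names, explanations)
    ]
    return " ".join(clauses)


def build_prompt(original_text: str, sentence: str, instruction: str) -> str:
    """Splice O, U and I in the order: input text, sentence to be filled, instruction."""
    return f"Input text: {original_text}\nSentence to be filled: {sentence}\n{instruction}"


# Hypothetical slots and texts, for illustration only.
names = ["city", "date"]
meanings = ["the place the user wants to travel to", "the day of departure"]
U = build_sentence(names, meanings)
prompt = build_prompt(
    "Book me a ticket to Beijing for tomorrow",
    U,
    "Fill in each blank with the value found in the input text.",
)
print(prompt)
```

Because the slot names and the blanks are fixed by the caller rather than generated by the model, the model is only asked to supply values, which is how the approach avoids abnormal slot names.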

The related information of the mth slot at the current moment is denoted $t_m$ and input into the model. The slot values are extracted step by step in an autoregressive manner, and the filled sentence is finally generated. The model computes the slot values as described in steps 120 to 140:

Step 120, determining an attention matrix of the mth slot based on the related information of the mth slot at the current moment, the parameter matrix and the hidden-layer representation vector of the decoder at the previous moment;

The computation of a large-scale language model is autoregressive: each step depends on the output vector of the previous step. Specifically, the invention determines the attention matrix of the mth slot by the following formula:

[formula not reproduced]

where $atten_m$ denotes the attention matrix corresponding to the mth slot; $o_{m-1}$ is the hidden-layer representation vector of the decoder at the previous moment; $W^v$, $W^q$ and $W^k$ are the v matrix, the q matrix and the k matrix, respectively; $d_k$ denotes the dimension of $i_m$ and $o_{m-1}$; $i_m$ denotes the encoding vector matrix of $t_m$; and $T$ denotes transposition. The v matrix, q matrix and k matrix mentioned here are the three core matrices of the attention mechanism and are trained parameter matrices.
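
The symbols listed above are the ingredients of scaled dot-product attention. The following is a minimal sketch of that standard technique, not the patent's own formula: it assumes (the text does not say so) that the query is derived from $o_{m-1}$ and that the keys and values are derived from $i_m$. The helper names and the toy numbers are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (r x k) and b (k x c)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(o_prev: List[float], i_m: Matrix,
              w_q: Matrix, w_k: Matrix, w_v: Matrix) -> List[float]:
    """Scaled dot-product attention for a single decoding step.

    ASSUMPTION: the query comes from o_prev (decoder state at the previous
    moment); keys and values come from i_m (encoded related information).
    """
    d_k = len(o_prev)
    q = matmul([o_prev], w_q)               # 1 x d_k
    k = matmul(i_m, w_k)                    # L x d_k
    v = matmul(i_m, w_v)                    # L x d_k
    k_t = [list(col) for col in zip(*k)]    # d_k x L, the transpose of k
    scores = [s / math.sqrt(d_k) for s in matmul(q, k_t)[0]]
    weights = softmax(scores)               # one weight per input position
    return matmul([weights], v)[0]          # weighted sum of the value rows


# Toy numbers, for illustration only: d_k = 2 and three input positions.
identity = [[1.0, 0.0], [0.0, 1.0]]
o_prev = [1.0, 0.0]
i_m = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
atten_m = attention(o_prev, i_m, identity, identity, identity)
print(atten_m)
```

With a single query the result is a vector rather than a matrix; stacking one such row per query position would give the matrix form used in the text.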

Step 130, determining a vocabulary probability distribution corresponding to the mth slot based on the attention matrix of the mth slot;

Specifically, as shown in FIG. 2, this is realized through the following two steps:

Step 1301, determining the hidden-layer representation vector output at the current moment based on the attention matrix of the mth slot;

Specifically, the hidden-layer representation vector output at the current moment is determined according to the following formula:

[formula not reproduced]

where $o_m$ is the hidden-layer representation vector output at the current moment, that is, the hidden-layer representation vector of the decoder at the current moment; $atten_m$ denotes the attention matrix corresponding to the mth slot; and the parameter matrix is a trained parameter matrix that applies a linear transformation converting the length of the attention output into the length of the vocabulary.

Step 1302, determining the vocabulary probability distribution corresponding to the mth slot based on the hidden-layer representation vector output at the current moment.

Specifically, the vocabulary probability distribution corresponding to the mth slot is determined according to the following formula:

$prob_{vocb} = \mathrm{softmax}(o_m)$

where $prob_{vocb}$ denotes the vocabulary probability distribution, and $o_m$ denotes the hidden-layer representation vector output at the current moment, whose size is consistent with the length of the vocabulary.
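
Steps 1301 and 1302 can be illustrated together. The sketch below assumes that the hidden-layer vector $o_m$ is obtained by multiplying the attention output by the trained parameter matrix, which is the usual form of such a projection; the name `w_out` and all numeric values are hypothetical.

```python
import math
from typing import List


def project(atten: List[float], w_out: List[List[float]]) -> List[float]:
    """Linear map from the attention output (length d) to the vocabulary length."""
    return [sum(a * w for a, w in zip(atten, column)) for column in zip(*w_out)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy values: attention output of length 2, vocabulary of 3 tokens.
atten_m = [0.8, 0.6]
w_out = [[1.0, 0.0, -1.0],   # d x |vocab|; hypothetical trained parameters
         [0.0, 2.0, 0.5]]
o_m = project(atten_m, w_out)        # length equals the vocabulary size
prob_vocb = softmax(o_m)
print(o_m, prob_vocb)
```

The projection is what makes the size of $o_m$ match the vocabulary length, so that each component of `prob_vocb` corresponds to exactly one vocabulary entry.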

Step 140, determining a slot value corresponding to the mth slot based on the vocabulary probability distribution, and replacing the corresponding blank-filling symbol with the slot value to obtain the filled sentence.

Specifically, the slot value is determined by the following formula, and the corresponding blank-filling symbol is replaced with it to obtain the filled sentence: $y_m = \max(prob_{vocb})$.
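
One reading of the formula above is greedy selection: the vocabulary entry with the highest probability is taken as the output. The sketch below illustrates that reading; `pick_token` and the toy vocabulary are hypothetical names and data, and a real slot value may span several tokens chosen one after another in the same way.

```python
from typing import Sequence


def pick_token(prob_vocb: Sequence[float], vocab: Sequence[str]) -> str:
    """Return the vocabulary entry whose probability is highest (greedy choice)."""
    if len(prob_vocb) != len(vocab):
        raise ValueError("distribution and vocabulary must have the same length")
    best = max(range(len(vocab)), key=lambda j: prob_vocb[j])
    return vocab[best]


vocab = ["Beijing", "Shanghai", "tomorrow"]   # hypothetical toy vocabulary
y_m = pick_token([0.7, 0.1, 0.2], vocab)
print(y_m)  # Beijing
```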

After all slots in the slot set have been traversed, the computation ends and the filled sentence $U_{final}$ is finally generated, which can be expressed as "The meaning of $s_0$ is [explanation of $s_0$], and its corresponding value is $v_0$; the meaning of $s_1$ is [explanation of $s_1$], and its corresponding value is $v_1$; …; the meaning of $s_{n-1}$ is [explanation of $s_{n-1}$], and its corresponding value is $v_{n-1}$;".
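
The traversal over the slots can be sketched as a loop that fills one blank per step. In this illustration the model call is replaced by a stand-in function; `fill_sentence`, the `extract` interface and the canned values are hypothetical and only show the control flow.

```python
from typing import Callable

BLANK = "_"


def fill_sentence(sentence: str, extract: Callable[[str, int], str], n_slots: int) -> str:
    """Fill the blanks left to right, one slot per step.

    `extract(partial_sentence, m)` stands in for the model call that returns
    the value of slot m given everything filled so far (hypothetical interface).
    """
    for m in range(n_slots):
        value = extract(sentence, m)
        # Replace only the first remaining blank, i.e. the blank of slot m.
        sentence = sentence.replace(BLANK, value, 1)
    return sentence


U = ("The meaning of city is the destination, and its corresponding value is _; "
     "The meaning of date is the day of departure, and its corresponding value is _;")
canned = ["Beijing", "tomorrow"]          # stand-in for model outputs
U_final = fill_sentence(U, lambda partial, m: canned[m], n_slots=2)
print(U_final)
```

This simple string replacement assumes that neither the slot names, the explanations nor the extracted values contain the underscore character; a production version would track blank positions explicitly.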

Example 2

The second embodiment of the present invention provides a slot value extraction device based on slot meanings of a large model. As shown in FIG. 3, the device includes: a related information module 310, an attention matrix module 320, a vocabulary probability distribution module 330, and a slot value module 340. Specifically:

the attention matrix module 320 is configured to determine an attention matrix of the mth slot based on the related information of the mth slot at the current moment, the parameter matrix, and the hidden-layer representation vector of the decoder at the previous moment; wherein the parameter matrix comprises a q matrix, a k matrix and a v matrix in an attention mechanism;

a vocabulary probability distribution module 330, configured to determine a vocabulary probability distribution corresponding to the mth slot based on the attention matrix of the mth slot;

and a slot value module 340, configured to determine a slot value corresponding to the mth slot based on the vocabulary probability distribution, and replace a corresponding gap filler symbol to fill the completed sentence.

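Further, the related information module 310 is specifically configured to acquire the related information of the mth slot at the current moment, which takes the following form:

Input text: O\nSentence to be filled: U\nI

"The meaning of $s_0$ is [explanation of $s_0$], and its corresponding value is _; …; the meaning of $s_{m-1}$ is [explanation of $s_{m-1}$], and its corresponding value is _; …; the meaning of $s_{n-1}$ is [explanation of $s_{n-1}$], and its corresponding value is _;"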

Further, the attention matrix module 320 specifically determines the attention matrix of the mth slot by the following formula:

[formula not reproduced]

where $atten_m$ denotes the attention matrix corresponding to the mth slot; $o_{m-1}$ is the hidden-layer representation vector of the decoder at the previous moment; $W^v$, $W^q$ and $W^k$ are the v matrix, the q matrix and the k matrix, respectively; $d_k$ denotes the dimension of $i_m$ and $o_{m-1}$; $i_m$ is the encoding vector matrix of the related information of the mth slot at the current moment; and $T$ denotes transposition.

Further, as shown in FIG. 4, the vocabulary probability distribution module 330 includes a first determining unit 3301 and a second determining unit 3302. Specifically:

the first determining unit 3301 is configured to determine the hidden-layer representation vector output at the current moment based on the attention matrix of the mth slot:

[formula not reproduced]

where $atten_m$ denotes the attention matrix corresponding to the mth slot; the parameter matrix applies a linear transformation that converts the length of the attention output into the length of the vocabulary; and $o_m$ denotes the hidden-layer representation vector of the decoder at the current moment, that is, the hidden-layer representation vector output at the current moment.

The second determining unit 3302 is configured to determine the vocabulary probability distribution corresponding to the mth slot based on the hidden-layer representation vector output at the current moment:

$prob_{vocb} = \mathrm{softmax}(o_m)$

The device provided in the second embodiment of the present invention may perform the method steps in the first embodiment of the method, and its implementation principle and technical effects are similar, and are not described herein again.
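
The way the four modules of FIG. 3 hand results to one another can be sketched as a small composition. This is an illustrative arrangement only: the class name, the callable signatures and the stub implementations are hypothetical, and each stub would be replaced by the corresponding computation of the first embodiment.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class SlotValueExtractor:
    """Wires the four modules together; each module is a plain callable."""
    related_info: Callable[[int], str]                    # module 310
    attention: Callable[[str, List[float]], List[float]]  # module 320
    vocab_dist: Callable[[List[float]], List[float]]      # module 330
    slot_value: Callable[[Sequence[float]], str]          # module 340

    def step(self, m: int, o_prev: List[float]) -> str:
        """Run one extraction step for slot m."""
        info = self.related_info(m)
        atten = self.attention(info, o_prev)
        probs = self.vocab_dist(atten)
        return self.slot_value(probs)


# Stub modules, for illustration only.
vocab = ["Beijing", "tomorrow"]
extractor = SlotValueExtractor(
    related_info=lambda m: f"related information of slot {m}",
    attention=lambda info, o_prev: list(o_prev),
    vocab_dist=lambda atten: [a / sum(atten) for a in atten],
    slot_value=lambda probs: vocab[max(range(len(vocab)), key=lambda j: probs[j])],
)
print(extractor.step(0, [3.0, 1.0]))  # Beijing
```

Keeping each module behind a callable mirrors the statement below that the module division is logical: the same wiring works whether a module is backed by software, hardware or a mixture of both.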

It should be understood that the division of the above device into modules is merely a division of logical functions; in an actual implementation the modules may be fully or partially integrated into one physical entity or may be physically separate. The modules may all be implemented as software invoked by a processing element, or all be implemented in hardware; alternatively, some modules may be implemented as software invoked by a processing element and the others in hardware. For example, the determining module may be a separately provided processing element, may be integrated in a chip of the above device, or may be stored in the memory of the above device in the form of program code that is called and executed by a processing element of the above device to perform the functions of the determining module. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.

For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field-programmable gate arrays (FPGA). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke the program code. For another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, Bluetooth, microwave). The computer-readable storage medium may be any available medium that can be accessed by the computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)).

Example 3

A third embodiment of the present invention provides a computer server, as shown in FIG. 5, including: memory, processor, and transceiver;

the processor is used for coupling with the memory, reading and executing the instructions in the memory to realize any of the large model-based structure enhanced slot value extraction methods provided in the first embodiment;

the transceiver is coupled to the processor, and the processor controls the transceiver to transmit and receive messages.

Example 4

A fourth embodiment of the present invention provides a chip system, as shown in FIG. 6, including a processor coupled to a memory, where the memory stores program instructions, and the program instructions stored in the memory, when executed by the processor, implement any of the large model-based structure-enhanced slot value extraction methods provided in the first embodiment.

Example 5

A fifth embodiment of the present invention provides a computer-readable storage medium, as shown in FIG. 7, including a program or instructions which, when run on a computer, implement any of the large model-based structure-enhanced slot value extraction methods provided in the first embodiment.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of function in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The foregoing detailed description of the invention has been presented for purposes of illustration and description, and it should be understood that the invention is not limited to the particular embodiments disclosed, but is intended to cover all modifications, equivalents, alternatives, and improvements within the spirit and principles of the invention.

Claims

1. A method for extracting slot values based on slot meaning of a large model, the method comprising:

2. The method according to claim 1, wherein the related information is formed by splicing the sentence to be filled, the original input text and the filling instruction, specifically:

Input text: O\nSentence to be filled: U\nI

3. The method according to claim 1, wherein the determining the vocabulary probability distribution corresponding to the mth slot based on the attention matrix of the mth slot specifically includes:

4. The method according to claim 1, wherein the determining the attention matrix of the mth slot is based on the related information of the mth slot at the current time, the parameter matrix, and the hidden layer token vector of the decoder at the previous time, specifically: the attention matrix for the mth slot is determined by the following formula:

5. The method according to claim 1, wherein the determining, based on the attention matrix of the mth slot, a vocabulary probability distribution corresponding to the mth slot is specifically: determining the vocabulary probability distribution corresponding to the mth slot according to the following formula:

$prob_{vocb} = \mathrm{softmax}(o_m)$

where $prob_{vocb}$ denotes the vocabulary probability distribution; $atten_m$ denotes the attention matrix corresponding to the mth slot; the parameter matrix applies a linear transformation that converts the length of the attention output into the length of the vocabulary; and $o_m$ denotes the hidden-layer representation vector of the decoder at the current moment.

6. A slot value extraction device based on a large model of slot meaning, the device comprising:

7. The apparatus of claim 6, wherein the vocabulary probability distribution module comprises:

8. A computer server, comprising: memory, processor, and transceiver;

the processor is coupled with the memory, reads and executes the instructions in the memory to realize the slot value extraction method based on the slot meaning of the large model according to any one of claims 1 to 5;

9. A system on a chip comprising a processor coupled to a memory, the memory storing program instructions that when executed by the processor implement the large model based slot value extraction method of any one of claims 1-5.

10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the large-model-based slot value extraction method according to any one of claims 1 to 5.