Disclosure of Invention
In view of the above problems, the invention provides a slot value extraction method and device based on large-model slot meanings, so as to solve the problems in the prior art.
To this end, in a first aspect, the present invention provides a slot value extraction method based on large-model slot meanings, the method comprising:
acquiring related information of the mth slot at the current moment, wherein the related information is formed by splicing a sentence to be filled, an original input text and a gap-filling instruction; the sentence to be filled consists of the slot name, meaning explanation and corresponding slot value of each first slot, the slot name, meaning explanation and corresponding slot value of the second slot, and the slot name, meaning explanation and corresponding slot value of each third slot; the first slots are all slots before the current moment, the second slot is the mth slot at the current moment, and the third slots are all slots after the current moment, wherein the slot value corresponding to the second slot and the slot values corresponding to the third slots are replaced by gap-filling symbols;
determining an attention matrix of the mth slot based on the related information of the mth slot at the current moment, a parameter matrix, and the hidden layer representation vector of the decoder at the previous moment; wherein the parameter matrix comprises the q matrix, the k matrix and the v matrix of the attention mechanism;
determining a vocabulary probability distribution corresponding to the mth slot based on the attention matrix of the mth slot;
and determining the slot value corresponding to the mth slot based on the vocabulary probability distribution, and replacing the corresponding gap-filling symbol with the slot value so as to fill in the sentence.
In one possible implementation, the related information formed by splicing the sentence to be filled, the original input text and the gap-filling instruction is specifically:
Input text: O\nSentence to be filled: U\nI
wherein I denotes the gap-filling instruction, n denotes the total number of slots, O denotes the original input text, and U denotes the sentence to be filled, which is expressed as follows:
"the meaning of $s_0$ is [explanation of $s_0$], and its corresponding value is _; …; the meaning of $s_{m-1}$ is [explanation of $s_{m-1}$], and its corresponding value is _; …; the meaning of $s_{n-1}$ is [explanation of $s_{n-1}$], and its corresponding value is _;"
wherein _ marks the position of a slot value; $s_{n-1}$ denotes the slot name of the nth slot, and $t_{n-1}$ denotes the length of the explanation text of the nth slot.
In one possible implementation manner, the determining, based on the attention matrix of the mth slot, a vocabulary probability distribution corresponding to the mth slot specifically includes:
determining a hidden layer representation vector output at the current moment based on the attention matrix of the mth slot position;
and determining the vocabulary probability distribution corresponding to the mth slot based on the hidden layer representation vector output at the current moment.
In one possible implementation, determining the attention matrix of the mth slot based on the related information of the mth slot at the current moment, the parameter matrix, and the hidden layer representation vector of the decoder at the previous moment specifically includes determining the attention matrix of the mth slot by the following formula:
wherein $\mathrm{Atten}_m$ denotes the attention matrix corresponding to the mth slot, $o_{m-1}$ is the hidden layer representation vector of the decoder at the previous moment, $W_v$, $W_q$ and $W_k$ are the v matrix, the q matrix and the k matrix respectively, $d_k$ denotes the dimension of $i_m$ and $o_{m-1}$, $i_m$ denotes the encoding vector matrix of the related information of the mth slot at the current moment, and $T$ denotes transposition.
In one possible implementation manner, the determining, based on the attention matrix of the mth slot, a vocabulary probability distribution corresponding to the mth slot specifically includes: determining the vocabulary probability distribution corresponding to the mth slot according to the following formula:
$\mathrm{prob}_{vocb} = \mathrm{softmax}(o_m)$
wherein $\mathrm{prob}_{vocb}$ denotes the vocabulary probability distribution, $\mathrm{Atten}_m$ denotes the attention matrix corresponding to the mth slot, the parameter matrix is used to perform a linear transformation that converts the length of the attention output into the length of the vocabulary, and $o_m$ denotes the hidden layer representation vector of the decoder at the current moment.
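For orientation only, the symbol definitions above are consistent with a standard scaled dot-product attention followed by a linear projection and a softmax. The following chain is a sketch under that assumption and is not taken from the original formulas: the choice of $o_{m-1}$ as the query and $i_m$ as the source of keys and values, the multiplication order, and the name $W_o$ for the linear-transformation parameter matrix are all assumptions.

```latex
\begin{aligned}
\mathrm{Atten}_m &= \operatorname{softmax}\!\left(\frac{(o_{m-1} W_q)\,(i_m W_k)^{T}}{\sqrt{d_k}}\right) i_m W_v \\
o_m &= \mathrm{Atten}_m \, W_o \\
\mathrm{prob}_{vocb} &= \operatorname{softmax}(o_m)
\end{aligned}
```

Under this reading, $W_o$ has as many columns as the vocabulary has entries, so that $o_m$ has the length of the vocabulary.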
In a second aspect, the present invention provides a slot value extraction device based on a slot meaning of a large model, the device comprising:
the related information module is used for acquiring related information of the mth slot at the current moment, wherein the related information is formed by splicing a sentence to be filled, an original input text and a gap-filling instruction; the sentence to be filled consists of the slot name, meaning explanation and corresponding slot value of each first slot, the slot name, meaning explanation and corresponding slot value of the second slot, and the slot name, meaning explanation and corresponding slot value of each third slot; the first slots are all slots before the current moment, the second slot is the mth slot at the current moment, and the third slots are all slots after the current moment, wherein the slot value corresponding to the second slot and the slot values corresponding to the third slots are replaced by gap-filling symbols;
the attention matrix module is used for determining an attention matrix of the mth slot based on the related information of the mth slot at the current moment, the parameter matrix and the hidden layer representation vector of the decoder at the previous moment; wherein the parameter matrix comprises the q matrix, the k matrix and the v matrix of the attention mechanism;
the vocabulary probability distribution module is used for determining a vocabulary probability distribution corresponding to the mth slot based on the attention matrix of the mth slot;
and the slot value module is used for determining the slot value corresponding to the mth slot based on the vocabulary probability distribution, and replacing the corresponding gap-filling symbol with the slot value so as to fill in the sentence.
In one possible implementation, the vocabulary probability distribution module includes:
the first determining unit is used for determining a hidden layer representation vector output at the current moment based on the attention matrix of the mth slot position;
and the second determining unit is used for determining the vocabulary probability distribution corresponding to the mth slot position based on the hidden layer representation vector output at the current moment.
In a third aspect, the present invention provides a computer server comprising: memory, processor, and transceiver;
the processor is used for coupling with the memory, reading and executing the instructions in the memory to realize the slot value extraction method based on the slot meaning of the large model in the first aspect;
the transceiver is coupled to the processor and is controlled by the processor to transmit and receive messages.
In a fourth aspect, the present invention provides a chip system, including a processor, where the processor is coupled to a memory, where the memory stores program instructions, and where the program instructions stored in the memory implement the slot value extraction method according to the first aspect based on the slot meaning of the large model when executed by the processor.
In a fifth aspect, the present invention provides a computer readable storage medium, on which a computer program is stored, the computer program being executed by a processor to perform the slot value extraction method according to the first aspect based on the slot meaning of the large model.
According to the slot value extraction method based on large-model slot meanings provided by the invention, during slot information extraction the slot names and slot values of the slots are organized into one sentence that also contains an explanation of the meaning of each slot. The sentence is spliced with the original input text and input into the large model, so that the large model can fully understand the meaning of each slot and accurately extract the slot value corresponding to each slot, while the large model is also prevented from outputting abnormal slots.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Embodiment 1
The first embodiment of the present invention provides a slot value extraction method based on large-model slot meanings. The execution body of the method is a server, or a system or device with computing and processing capability. For slot value extraction, based on a pre-trained large model, the slot names and slot values of the slots are organized in natural language into a sentence U that also contains the explanation corresponding to each slot, with the underscore mark "_" placed at the position of each slot value. When the large model performs slot value extraction, the sentence U, the original input text O and the gap-filling instruction I are input into the model together. During decoding, the large model can fully understand the meaning of each slot and accurately extract the slot value corresponding to each slot, so that the large model outputs the sentence with each marked position filled by the extracted slot value. The large model here refers to a large-scale language model, such as OpenAI's ChatGPT, which is not described in detail here. As shown in fig. 1, the method includes the following steps:
Step 110, obtaining the related information of the mth slot at the current moment;
the related information is formed by splicing a sentence to be filled, an original input text and a gap-filling instruction; the sentence to be filled consists of the slot name, meaning explanation and corresponding slot value of each first slot, the slot name, meaning explanation and corresponding slot value of the second slot, and the slot name, meaning explanation and corresponding slot value of each third slot; the first slots are all slots before the current moment, the second slot is the mth slot at the current moment, and the third slots are all slots after the current moment, wherein the slot value corresponding to the second slot and the slot values corresponding to the third slots are replaced by gap-filling symbols.
The splicing is explained in detail below:
first, the set of slot names of the predefined slots is defined as $S = \{s_0, s_1, \ldots, s_{n-1}\}$, where n denotes the total number of slots and $s_l$ ($0 \le l \le n-1$) denotes the slot name of the (l+1)th slot; the set of slot values is defined as $V = \{v_0, v_1, \ldots, v_{n-1}\}$, where $v_l$ ($0 \le l \le n-1$) denotes the slot value corresponding to $s_l$.
The slot names and slot values of the slots are organized in natural language into a sentence U, in which the position of each slot value is marked with an underscore "_" and an explanation is given for each slot. The explanation corresponding to $s_j$ ($0 \le j < n$) is a text of length $t_j$, where t denotes the length of the text and j denotes the jth slot. The sentence U may be expressed as "the meaning of $s_0$ is [explanation of $s_0$], and its corresponding value is _; the meaning of $s_1$ is [explanation of $s_1$], and its corresponding value is _; …; the meaning of $s_{n-1}$ is [explanation of $s_{n-1}$], and its corresponding value is _;".
According to the definition of the sentence, the related information of the mth slot at the current moment is specifically:
Input text: O\nSentence to be filled: U\nI
wherein I denotes the gap-filling instruction, n denotes the total number of slots, O denotes the original input text, and U denotes the sentence to be filled, which is expressed as follows:
"the meaning of $s_0$ is [explanation of $s_0$], and its corresponding value is _; …; the meaning of $s_{m-1}$ is [explanation of $s_{m-1}$], and its corresponding value is _; …; the meaning of $s_{n-1}$ is [explanation of $s_{n-1}$], and its corresponding value is _;"
wherein _ marks the position of a slot value; $s_{n-1}$ denotes the slot name of the nth slot, and $t_{n-1}$ denotes the length of the explanation text of the nth slot.
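As an illustration only, the splicing described above can be sketched in Python as follows. The English clause wording, the names `Slot`, `build_sentence` and `build_related_info`, and the sample slots are assumptions made for this sketch and do not come from the original text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

PLACEHOLDER = "_"  # gap-filling symbol marking an unfilled slot value


@dataclass(frozen=True)
class Slot:
    name: str         # slot name s_j
    explanation: str  # natural-language explanation of the slot's meaning


def build_sentence(slots: Sequence[Slot], values: Sequence[Optional[str]]) -> str:
    """Build sentence U; a value of None is rendered as the placeholder."""
    if len(slots) != len(values):
        raise ValueError("slots and values must have the same length")
    clauses = []
    for slot, value in zip(slots, values):
        shown = PLACEHOLDER if value is None else value
        clauses.append(
            f"The meaning of {slot.name} is {slot.explanation}, "
            f"and its corresponding value is {shown};"
        )
    return " ".join(clauses)


def build_related_info(original_text: str, sentence: str, instruction: str) -> str:
    """Splice O, U and I in the order: input text, sentence to be filled, instruction."""
    return f"Input text: {original_text}\nSentence to be filled: {sentence}\n{instruction}"


# Hypothetical usage: the first slot is already filled, the second is still open.
slots = [Slot("city", "the destination city"), Slot("date", "the travel date")]
sentence = build_sentence(slots, ["Paris", None])
related_info = build_related_info("Book a trip to Paris tomorrow", sentence, "Fill in the blanks.")
```

Passing `None` for a value keeps the gap-filling symbol in place, which corresponds to the second and third slots whose values have not yet been extracted.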
The related information of the mth slot at the current moment is denoted $t_m$ and is input into the model. The slot values are extracted step by step in an autoregressive manner, and a filled sentence is finally generated. The computation by which the model generates a slot value is shown in steps 120 to 140:
Step 120, determining the attention matrix of the mth slot based on the related information of the mth slot at the current moment, the parameter matrix and the hidden layer representation vector of the decoder at the previous moment;
the computation of the large-scale language model is autoregressive, that is, each computation step depends on the output vector of the previous step. Specifically, the invention determines the attention matrix of the mth slot by the following formula:
wherein $\mathrm{Atten}_m$ denotes the attention matrix corresponding to the mth slot, $o_{m-1}$ is the hidden layer representation vector of the decoder at the previous moment, $W_v$, $W_q$ and $W_k$ are the v matrix, the q matrix and the k matrix respectively, $d_k$ denotes the dimension of $i_m$ and $o_{m-1}$, $i_m$ denotes the encoding vector matrix of $t_m$, and $T$ denotes transposition. The v matrix, q matrix and k matrix mentioned here are the three core matrices of the attention mechanism and are trained parameter matrices.
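The attention step can be sketched in plain Python as follows. This is a minimal sketch that assumes the standard scaled dot-product form, with $o_{m-1}$ providing the query and $i_m$ providing the keys and values; that assignment, the multiplication order, the single attention head and the helper names are assumptions and are not stated in the text.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (r x k) and b (k x c)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(o_prev: Matrix, i_m: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Assumed form: softmax((o_prev W_q)(i_m W_k)^T / sqrt(d_k)) (i_m W_v)."""
    q = matmul(o_prev, w_q)           # (1, d_k) query from the previous decoder state
    k = matmul(i_m, w_k)              # (L, d_k) keys from the encoded related information
    v = matmul(i_m, w_v)              # (L, d_k) values from the encoded related information
    d_k = len(q[0])
    scores = matmul(q, transpose(k))  # (1, L) similarity of the query to each position
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)         # (1, d_k) attention output


# Hypothetical toy input: two encoded positions, identity parameter matrices.
identity = [[1.0, 0.0], [0.0, 1.0]]
atten_m = attention([[1.0, 0.0]], identity, identity, identity, identity)
```

In a real large model the parameter matrices are trained and the computation is repeated across layers and heads; the sketch only shows how the quantities named in the definitions could fit together.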
Step 130, determining the vocabulary probability distribution corresponding to the mth slot based on the attention matrix of the mth slot;
specifically, as shown in fig. 2, this is realized through the following two steps:
Step 1301, determining the hidden layer representation vector output at the current moment based on the attention matrix of the mth slot;
specifically, the hidden layer representation vector output at the current moment is determined according to the following formula:
wherein $o_m$ denotes the hidden layer representation vector of the decoder at the current moment, i.e., the hidden layer representation vector output at the current moment, $\mathrm{Atten}_m$ denotes the attention matrix corresponding to the mth slot, and the parameter matrix is a trained parameter matrix used to perform a linear transformation that converts the length of the attention output into the length of the vocabulary.
Step 1302, determining the vocabulary probability distribution corresponding to the mth slot based on the hidden layer representation vector output at the current moment.
Specifically, the vocabulary probability distribution corresponding to the mth slot is determined according to the following formula:
$\mathrm{prob}_{vocb} = \mathrm{softmax}(o_m)$
wherein $\mathrm{prob}_{vocb}$ denotes the vocabulary probability distribution, and $o_m$ denotes the hidden layer representation vector output at the current moment, whose size is equal to the length of the vocabulary.
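Steps 1301 and 1302 can be illustrated with the following sketch, which maps an attention output to a vocabulary-sized vector and normalizes it. The name `w_out` for the linear-transformation parameter matrix, the absence of a bias term and the toy numbers are assumptions made for this sketch.

```python
import math
from typing import List


def project_to_vocab(atten_m: List[float], w_out: List[List[float]]) -> List[float]:
    """Linear map from the attention output (length d) to o_m (length of the vocabulary)."""
    return [sum(a * w for a, w in zip(atten_m, col)) for col in zip(*w_out)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax, giving prob_vocb from o_m."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy input: d = 2, vocabulary of three entries.
o_m = project_to_vocab([1.0, 2.0], [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
prob_vocb = softmax(o_m)
```

Subtracting the maximum before exponentiating does not change the result of the softmax; it only avoids overflow for large values of $o_m$.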
Step 140, determining the slot value corresponding to the mth slot based on the vocabulary probability distribution, and replacing the corresponding gap-filling symbol with the slot value so as to fill in the sentence.
Specifically, the slot value is determined by the following formula, that is, the vocabulary entry with the highest probability is selected, and the corresponding gap-filling symbol is replaced to fill in the sentence: $y_m = \max(\mathrm{prob}_{vocb})$.
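A minimal sketch of step 140 is given below. It reads the formula as selecting the vocabulary entry with the highest probability, and it assumes that the gap-filling symbol does not otherwise occur in the sentence; the function names and the toy vocabulary are hypothetical.

```python
from typing import Sequence


def pick_token(prob_vocb: Sequence[float], vocab: Sequence[str]) -> str:
    """Return the vocabulary entry whose probability is highest."""
    if len(prob_vocb) != len(vocab):
        raise ValueError("probabilities and vocabulary must have the same length")
    best = max(range(len(prob_vocb)), key=lambda i: prob_vocb[i])
    return vocab[best]


def fill_first_placeholder(sentence: str, value: str, placeholder: str = "_") -> str:
    """Replace the leftmost gap-filling symbol with the extracted value."""
    return sentence.replace(placeholder, value, 1)


# Hypothetical usage with a three-entry vocabulary.
y_m = pick_token([0.1, 0.7, 0.2], ["Paris", "Rome", "Oslo"])
filled = fill_first_placeholder("its corresponding value is _; its corresponding value is _;", y_m)
```

Because slots are processed in order, the leftmost remaining gap-filling symbol is the one that belongs to the current slot. A slot value spanning several tokens would require repeating the selection until the value is complete, which this sketch does not cover.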
After all slots in the slot set have been traversed, the computation ends, and the finally generated filled sentence $U_{final}$ can be expressed as "the meaning of $s_0$ is [explanation of $s_0$], and its corresponding value is $v_0$; the meaning of $s_1$ is [explanation of $s_1$], and its corresponding value is $v_1$; …; the meaning of $s_{n-1}$ is [explanation of $s_{n-1}$], and its corresponding value is $v_{n-1}$;".
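The slot-by-slot traversal can be sketched as a simple loop. In this sketch the callable `extract` stands in for steps 120 to 140 (in practice, a call to the large model); it, the clause wording and the sample data are assumptions for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

PLACEHOLDER = "_"


def render_sentence(slots: Sequence[Tuple[str, str]], values: Sequence[Optional[str]]) -> str:
    """Render U: filled slots show their value, the others show the placeholder."""
    return " ".join(
        f"The meaning of {name} is {meaning}, and its corresponding value is "
        f"{PLACEHOLDER if value is None else value};"
        for (name, meaning), value in zip(slots, values)
    )


def fill_all_slots(
    original_text: str,
    slots: Sequence[Tuple[str, str]],
    instruction: str,
    extract: Callable[[str, int], str],
) -> str:
    """Fill the slots one at a time; each prompt contains the values found so far."""
    values: List[Optional[str]] = [None] * len(slots)
    for m in range(len(slots)):
        sentence = render_sentence(slots, values)
        related_info = (
            f"Input text: {original_text}\nSentence to be filled: {sentence}\n{instruction}"
        )
        values[m] = extract(related_info, m)  # stand-in for steps 120 to 140
    return render_sentence(slots, values)


# Hypothetical usage with a stub in place of the model.
demo_slots = [("city", "the destination city"), ("date", "the travel date")]
u_final = fill_all_slots(
    "Book a trip to Paris tomorrow",
    demo_slots,
    "Fill in the blanks.",
    lambda prompt, m: ["Paris", "tomorrow"][m],
)
```

When slot m is processed, the values of the earlier slots already appear in the prompt, while slot m and the later slots still carry the gap-filling symbol.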
According to the slot value extraction method based on large-model slot meanings provided by the invention, during slot information extraction the slot names and slot values of the slots are organized into one sentence that also contains an explanation of the meaning of each slot. The sentence is spliced with the original input text and input into the large model, so that the large model can fully understand the meaning of each slot and accurately extract the slot value corresponding to each slot, while the large model is also prevented from outputting abnormal slots.
Embodiment 2
The second embodiment of the present invention provides a slot value extraction device based on large-model slot meanings. As shown in fig. 3, the device includes: a related information module 310, an attention matrix module 320, a vocabulary probability distribution module 330, and a slot value module 340. Specifically:
the related information module 310 is configured to obtain related information of the mth slot at the current moment, where the related information is formed by splicing a sentence to be filled, an original input text and a gap-filling instruction; the sentence to be filled consists of the slot name, meaning explanation and corresponding slot value of each first slot, the slot name, meaning explanation and corresponding slot value of the second slot, and the slot name, meaning explanation and corresponding slot value of each third slot; the first slots are all slots before the current moment, the second slot is the mth slot at the current moment, and the third slots are all slots after the current moment, wherein the slot value corresponding to the second slot and the slot values corresponding to the third slots are replaced by gap-filling symbols;
the attention matrix module 320 is configured to determine an attention matrix of the mth slot based on the related information of the mth slot at the current moment, the parameter matrix, and the hidden layer representation vector of the decoder at the previous moment; wherein the parameter matrix comprises the q matrix, the k matrix and the v matrix of the attention mechanism;
the vocabulary probability distribution module 330 is configured to determine a vocabulary probability distribution corresponding to the mth slot based on the attention matrix of the mth slot;
and the slot value module 340 is configured to determine the slot value corresponding to the mth slot based on the vocabulary probability distribution, and replace the corresponding gap-filling symbol with the slot value so as to fill in the sentence.
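The cooperation of the four modules can be pictured with the following sketch, in which each module is represented by a callable and the device chains them in order. The class name, the callable signatures and the stub modules are assumptions for illustration; the replacement of the gap-filling symbol performed by module 340 is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class SlotValueExtractionDevice:
    related_info_module: Callable[[int], str]              # module 310: m -> related information
    attention_module: Callable[[str, Vector], Vector]      # module 320: (info, o_{m-1}) -> Atten_m
    vocab_distribution_module: Callable[[Vector], Vector]  # module 330: Atten_m -> prob_vocb
    slot_value_module: Callable[[Vector], str]             # module 340: prob_vocb -> slot value

    def extract(self, m: int, o_prev: Vector) -> str:
        """Run the four modules in sequence for the mth slot."""
        info = self.related_info_module(m)
        atten = self.attention_module(info, o_prev)
        prob = self.vocab_distribution_module(atten)
        return self.slot_value_module(prob)


# Hypothetical stubs that only demonstrate the data flow between modules.
device = SlotValueExtractionDevice(
    related_info_module=lambda m: f"related information for slot {m}",
    attention_module=lambda info, o_prev: [x * 2.0 for x in o_prev],
    vocab_distribution_module=lambda atten: [a / sum(atten) for a in atten],
    slot_value_module=lambda prob: ["Paris", "Rome"][prob.index(max(prob))],
)
value = device.extract(0, [1.0, 3.0])
```

Passing the modules in as callables mirrors the statement that the division into modules is a division of logical functions: each one can be replaced or integrated with the others without changing the flow.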
Further, the related information module 310 is specifically configured to:
organize the slot names and slot values of the slots in natural language into a sentence U, in which the position of each slot value is marked with an underscore "_" and an explanation is given for each slot. The explanation corresponding to $s_j$ ($0 \le j < n$) is a text of length $t_j$, where t denotes the length of the text and j denotes the jth slot. The sentence U may be expressed as "the meaning of $s_0$ is [explanation of $s_0$], and its corresponding value is _; the meaning of $s_1$ is [explanation of $s_1$], and its corresponding value is _; …; the meaning of $s_{n-1}$ is [explanation of $s_{n-1}$], and its corresponding value is _;".
According to the definition of the sentence, the related information of the mth slot at the current moment is specifically:
Input text: O\nSentence to be filled: U\nI
wherein I denotes the gap-filling instruction, n denotes the total number of slots, O denotes the original input text, and U denotes the sentence to be filled, which is expressed as follows:
"the meaning of $s_0$ is [explanation of $s_0$], and its corresponding value is _; …; the meaning of $s_{m-1}$ is [explanation of $s_{m-1}$], and its corresponding value is _; …; the meaning of $s_{n-1}$ is [explanation of $s_{n-1}$], and its corresponding value is _;"
wherein _ marks the position of a slot value; $s_{n-1}$ denotes the slot name of the nth slot, and $t_{n-1}$ denotes the length of the explanation text of the nth slot.
Further, the attention matrix module 320 specifically determines the attention matrix of the mth slot by the following formula:
wherein $\mathrm{Atten}_m$ denotes the attention matrix corresponding to the mth slot, $o_{m-1}$ is the hidden layer representation vector of the decoder at the previous moment, $W_v$, $W_q$ and $W_k$ are the v matrix, the q matrix and the k matrix respectively, $d_k$ denotes the dimension of $i_m$ and $o_{m-1}$, $i_m$ denotes the encoding vector matrix of the related information of the mth slot at the current moment, and $T$ denotes transposition.
Further, as shown in fig. 4, the vocabulary probability distribution module 330 includes a first determining unit 3301 and a second determining unit 3302. Specifically,
the first determining unit 3301 is configured to determine the hidden layer representation vector output at the current moment based on the attention matrix of the mth slot;
specifically, the hidden layer representation vector output at the current moment is determined according to the following formula:
wherein $\mathrm{Atten}_m$ denotes the attention matrix corresponding to the mth slot, the parameter matrix is used to perform a linear transformation that converts the length of the attention output into the length of the vocabulary, and $o_m$ denotes the hidden layer representation vector of the decoder at the current moment, i.e., the hidden layer representation vector output at the current moment.
The second determining unit 3302 is configured to determine the vocabulary probability distribution corresponding to the mth slot based on the hidden layer representation vector output at the current moment.
Specifically, the vocabulary probability distribution corresponding to the mth slot is determined according to the following formula:
$\mathrm{prob}_{vocb} = \mathrm{softmax}(o_m)$
wherein $\mathrm{prob}_{vocb}$ denotes the vocabulary probability distribution, and $o_m$ denotes the hidden layer representation vector output at the current moment, whose size is equal to the length of the vocabulary.
The device provided in the second embodiment of the present invention may perform the method steps in the first embodiment of the method, and its implementation principle and technical effects are similar, and are not described herein again.
It should be understood that the division of the above device into modules is merely a division of logical functions; in actual implementation, the modules may be fully or partially integrated into one physical entity or may be physically separate. The modules may all be implemented as software invoked by a processing element, or all in hardware; alternatively, some modules may be implemented as software invoked by a processing element and the others in hardware. For example, the determining module may be a separately arranged processing element, may be integrated in a chip of the above device, or may be stored in a memory of the above device in the form of program code that a processing element of the device calls in order to execute the functions of the determining module. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In implementation, each step of the above method or each of the above modules may be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), one or more digital signal processors (Digital Signal Processor, DSP), or one or more field-programmable gate arrays (Field Programmable Gate Array, FPGA). For another example, when one of the above modules is implemented as program code invoked by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU) or another processor capable of invoking program code. For another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (System-on-a-chip, SOC).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (Digital Subscriber Line, DSL)) or wireless means (e.g., infrared, radio, Bluetooth, microwave). The computer-readable storage medium may be any available medium that can be accessed by the computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk, SSD).
Embodiment 3
A third embodiment of the present invention provides a computer server, as shown in FIG. 5, including: memory, processor, and transceiver;
the processor is coupled to the memory and reads and executes the instructions in the memory to implement any of the slot value extraction methods based on large-model slot meanings provided in the first embodiment;
the transceiver is coupled to the processor, and the processor controls the transceiver to transmit and receive messages.
Embodiment 4
A fourth embodiment of the present invention provides a chip system, as shown in FIG. 6, including a processor coupled to a memory, where the memory stores program instructions that, when executed by the processor, implement any of the slot value extraction methods based on large-model slot meanings provided in the first embodiment.
Embodiment 5
A fifth embodiment of the present invention provides a computer-readable storage medium, as shown in fig. 7, including a program or instructions that, when run on a computer, implement any of the slot value extraction methods based on large-model slot meanings provided in the first embodiment.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of function in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing detailed description of the invention has been presented for purposes of illustration and description, and it should be understood that the invention is not limited to the particular embodiments disclosed, but is intended to cover all modifications, equivalents, alternatives, and improvements within the spirit and principles of the invention.