CN114077650A - Training method and device for spoken language comprehension model - Google Patents

Training method and device for spoken language comprehension model

Info

Publication number
CN114077650A
CN114077650A
Authority
CN
China
Prior art keywords
language
training
model
language model
spoken
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010839609.3A
Other languages
Chinese (zh)
Inventor
祝官文
王宝军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010839609.3A
Publication of CN114077650A
Legal status: Pending (current)

Abstract

Translated from Chinese



The present application provides a training method and device for a spoken language understanding model, relating to the fields of artificial intelligence and natural language understanding. The method includes: determining the embedding layer weight parameters of a multi-language pre-trained language model, which uses a bidirectional encoder architecture, according to an unlabeled corpus of a target minor language; determining the coding layer weight parameters of the pre-trained language model according to a labeled corpus of at least one major language; and initializing the embedding layer of the pre-trained language model with the embedding layer weight parameters and the coding layer with the coding layer weight parameters, to obtain a spoken language understanding model of the target minor language. The training method and device can improve the accuracy of the resulting spoken language understanding model for a specific minor language, thereby improving the performance of the spoken language understanding task in that language.


Description

Training method and device for a spoken language understanding model
Technical Field
The present application relates to the field of artificial intelligence, and more particularly, to a method and an apparatus for training a spoken language understanding model.
Background
In the field of artificial intelligence, the task of spoken language understanding in a human-machine dialog system is of crucial importance. The spoken language understanding task mainly comprises two subtasks, namely intention identification and slot filling, and is realized through a spoken language understanding model. Specifically, to make the machine understand the expression of the user, the intention expressed in the user utterance is determined first, and then the intention of the user is converted into an explicit instruction that the machine can recognize so that the machine performs a corresponding operation according to the instruction. Therefore, the accuracy of intent recognition and slot filling has a very important impact on the quality of the overall dialog system.
With the increasing trend of internationalization, the spoken language understanding task needs to support many different languages, which requires that the spoken language understanding model can perform the task in each of them. When a prior-art spoken language understanding model is trained for a target language, a large amount of corpus in that language must be collected and labeled. If the target language is a minor language, it is difficult to collect and label enough corpus, and training directly on a small amount of minor-language corpus usually yields a model that is not accurate enough.
Disclosure of Invention
The application provides a method and a device for training a spoken language understanding model, which can improve the accuracy of the spoken language understanding model while obtaining a spoken language understanding model based on a specific minor language, thereby improving the performance of the spoken language understanding task for that minor language.
In a first aspect, a method for training a spoken language understanding model is provided, which specifically includes: firstly, determining an embedding layer weight parameter of a pre-training language model according to an unlabeled corpus of a target minor language and the multi-language-based pre-training language model, wherein the pre-training language model adopts a bidirectional encoder architecture; determining a coding layer weight parameter of the pre-training language model according to at least one labeled corpus of a large language and the pre-training language model; and then initializing the embedding layer of the pre-training language model with the embedding layer weight parameters, and initializing the coding layer of the pre-training language model with the coding layer weight parameters, to obtain the spoken language understanding model of the target minor language.
It should be understood that, in the embodiment of the present application, initializing the embedding layer of the multi-language pre-training language model with the obtained embedding layer weight parameters specifically means assigning new initial values to the embedding layer weight parameters of the model. Similarly, initializing the coding layer of the multi-language-based pre-training language model with the obtained coding layer weight parameters specifically means assigning new initial values to the coding layer weight parameters of the model.
Here, the minor language may also be called a scarce language or a low-resource language; the large language may also be called a rich language or a high-resource language; and the multi-language pre-training model may be a multi-layer bidirectional encoder structure with a Transformer as its core.
The training method of the spoken language understanding model in the embodiment of the application uses the labeled corpus of at least one large language and the unlabeled corpus of the target minor language to learn in stages, thereby achieving transfer across languages. By reusing the labeled corpus of the large language to obtain the corresponding weight parameters, the dependence on labeled corpus of the target minor language during model training is greatly reduced. In addition, initializing the multi-language pre-training language model with the obtained embedding layer weight parameters and coding layer weight parameters improves the performance and accuracy of the spoken language understanding model of the target minor language while obtaining a model based on that language, and also reduces its training time.
The method of the embodiment of the present application may be executed by a data processing device, or may be executed by a chip in the data processing device, which is not limited in the embodiment of the present application.
Optionally, the data processing device may obtain the embedded layer weight parameter first, and then obtain the coding layer weight parameter; or, the data processing device may also obtain the coding layer weight parameter first, and then obtain the embedding layer weight parameter; alternatively, the data processing device may obtain the embedding layer weight parameter and the encoding layer weight parameter at the same time.
With reference to the first aspect, in some implementations of the first aspect, determining the embedding layer weight parameter of the pre-trained language model according to the unlabeled corpus of the target language and the multi-language-based pre-trained language model includes: freezing the original coding layer weight parameters of the pre-training language model; and inputting the unmarked corpus into a pre-training language model for training to obtain the weight parameter of the embedding layer.
Freezing the original coding layer weight parameters of the pre-trained language model as described above may be understood as keeping the original coding layer parameters unchanged.
In the embodiment of the present application, when acquiring the embedding layer weight parameters, the data processing device may freeze the coding layer weight parameters of the pre-training language model, that is, keep them unchanged, and then input the unlabeled corpus of the target minor language into the multi-language pre-training language model for training, thereby obtaining the embedding layer weight parameters of the trained model. The embedding layer parameters may be understood as word embedding vector information related to the target minor language.
With reference to the first aspect, in certain implementations of the first aspect, determining coding layer weight parameters of a pre-trained language model according to at least one labeled corpus of a large language and the pre-trained language model includes: freezing the original embedded layer weight parameters of the pre-training language model; and inputting the marked linguistic data into a pre-training language model for training to obtain a coding layer weight parameter.
Freezing the original embedding layer weight parameters of the pre-trained language model as described above may be understood as keeping the original embedding layer parameters unchanged.
In the embodiment of the present application, when acquiring the coding layer weight parameters, the data processing device may freeze the embedding layer weight parameters of the multi-language pre-training language model, that is, keep them unchanged, and then input the labeled corpus of one or more large languages collected by technical staff into the pre-training language model for training; this process may also be referred to as fine-tuning, and yields the coding layer weight parameters of the fine-tuned pre-training language model. The coding layer parameters may be understood as information related to the spoken language understanding task learned from the large-language corpus.
With reference to the first aspect, in certain implementations of the first aspect, after obtaining the spoken language understanding model of the target minor language, the method further includes: fine-tuning the spoken language understanding model of the target minor language according to the labeled corpus of the target minor language to obtain a new spoken language understanding model of the target minor language.
The fine-tuning may be understood as inputting the labeled corpus of the target minor language into the spoken language understanding model of the target minor language for training, so as to adjust the weight parameters of the model.
The labeled corpus of the target minor language can be obtained through manual labeling. Fine-tuning the spoken language understanding model of the target minor language with this labeled corpus can further improve the accuracy of the model while obtaining a spoken language understanding model based on the specific minor language, thereby improving the performance of the spoken language understanding task for the target minor language.
In a second aspect, a device for training a spoken language understanding model is provided, which is configured to perform the method in any one of the possible implementations of the first aspect. In particular, the apparatus comprises means for performing the method of any one of the possible implementations of the first aspect described above.
In a third aspect, there is provided another apparatus for training a spoken language understanding model, comprising a processor coupled to a memory and configured to execute instructions in the memory to implement the method of any one of the possible implementations of the first aspect. Optionally, the apparatus further comprises a memory. Optionally, the apparatus further comprises a communication interface, the processor being coupled to the communication interface.
In one implementation, the training means of the spoken language understanding model is a data processing device. When the training apparatus of the spoken language understanding model is a data processing device, the communication interface may be a transceiver, or an input/output interface.
In another implementation, the training device of the spoken language understanding model is a chip configured in a server. When the training apparatus of the spoken language understanding model is a chip configured in a server, the communication interface may be an input/output interface.
In a fourth aspect, a processor is provided, comprising: input circuit, output circuit and processing circuit. The processing circuit is configured to receive a signal via the input circuit and transmit a signal via the output circuit, so that the processor performs the method of any one of the possible implementations of the first aspect.
In a specific implementation process, the processor may be a chip, the input circuit may be an input pin, the output circuit may be an output pin, and the processing circuit may be a transistor, a gate circuit, a flip-flop, various logic circuits, and the like. The input signal received by the input circuit may, for example and without limitation, be received and input by a receiver; the signal output by the output circuit may, for example and without limitation, be output to and transmitted by a transmitter; and the input circuit and the output circuit may be the same circuit functioning as the input circuit and the output circuit at different times. The embodiment of the present application does not limit the specific implementation of the processor and the various circuits.
In a fifth aspect, a processing apparatus is provided that includes a processor and a memory. The processor is configured to read instructions stored in the memory, and may receive signals via the receiver and transmit signals via the transmitter to perform the method of any one of the possible implementations of the first aspect.
Optionally, there are one or more processors and one or more memories.
Alternatively, the memory may be integrated with the processor, or provided separately from the processor.
In a specific implementation process, the memory may be a non-transient memory, such as a Read Only Memory (ROM), which may be integrated on the same chip as the processor, or may be separately disposed on different chips.
It will be appreciated that the associated data interaction process, for example sending indication information, may be a process of the processor outputting the indication information, and receiving capability information may be a process of the processor receiving the input capability information. Specifically, the data output by the processor may be output to a transmitter, and the input data received by the processor may come from a receiver. The transmitter and receiver may be collectively referred to as a transceiver.
The processing device in the fifth aspect may be a chip, the processor may be implemented by hardware or software, and when implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general-purpose processor implemented by reading software code stored in a memory, which may be integrated with the processor, located external to the processor, or stand-alone.
In a sixth aspect, there is provided a computer program product comprising: computer program (also called code, or instructions), which when executed, causes a computer to perform the method of any of the possible implementations of the first aspect described above.
In a seventh aspect, a computer-readable storage medium is provided, which stores a computer program (which may also be referred to as code or instructions) that, when executed on a computer, causes the computer to perform the method in any of the possible implementations of the first aspect.
Drawings
FIG. 1 is a schematic diagram of a structure of a multi-lingual based pre-trained language model;
FIG. 2 is a schematic flow chart diagram of a method of training a spoken language understanding model according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of a training apparatus for a spoken language understanding model according to an embodiment of the present application;
FIG. 4 is a schematic block diagram of another training apparatus for a spoken language understanding model according to an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
With the continuous development of artificial intelligence technology, natural language human-computer interaction systems, which enable human-computer interaction through natural language, become more and more important. Human-computer interaction through natural language requires a system capable of recognizing specific meanings of human natural language. Typically, systems identify the specific meaning of a sentence by employing key information extraction on the sentence in natural language.
In one possible scenario, a natural language processing system may include a user device and a data processing device.
The user equipment may be an intelligent terminal such as a mobile phone, a personal computer, or an information processing center. It should be understood that the user equipment is the initiator of natural language data processing and acts as the initiator of requests such as language question answering or queries; a user usually initiates such requests through the user equipment.
The data processing device may be a device or server having a data processing function, such as a cloud server, a network server, an application server, or a management server. The data processing device receives question sentences such as query statements/speech/text from the intelligent terminal through an interactive interface, and then performs language data processing by means of machine learning, deep learning, search, inference, decision making and the like, using a memory for storing data and a processor for data processing. The memory here is a general term and includes a database for local storage and for storing historical data; the database may be deployed on the data processing device or on another network server, which is not limited herein.
In another possible scenario, the intelligent terminal may directly serve as a data processing device, directly receive an input from a user, and directly process the input by hardware of the intelligent terminal itself, where a specific process is similar to that in the previous scenario, and reference may be made to the above description, and details are not repeated here.
For ease of understanding, the relevant terms referred to in this application will first be described.
1. The large language, which may also be called a rich language or a high-resource language, refers to a language with a large number of users and a wide application range, such as "Chinese", "English" and "French".
2. The minor language, which may also be called a scarce language or a low-resource language, refers to a language with a small number of users or a small application scope, such as "Mongolian language", "Vietnamese language", "Portuguese language", and the like.
3. The corpus, i.e. the language material, also called language text, is the basic unit that makes up a corpus collection. Each language has its corresponding corpus.
4. Spoken language understanding task and spoken language understanding model
The spoken language understanding task mainly comprises two subtasks, namely intent recognition and slot filling. The intent is the user's will, i.e. what the user wants to do. Intents are sometimes also referred to as "dialog acts", i.e. acts by which the state or context of information shared by users in a conversation changes and is continually updated. Intents are typically named as a "verb + noun", such as querying the weather or booking a hotel. Intent recognition may also be referred to as intent classification, i.e. classifying user utterances into previously defined intent classes according to the domain and intent they relate to. Depending on the application scenario, intent recognition may also take the form of text classification (e.g. recognizing spam mail or spam messages) or sentiment analysis (e.g. product reviews on an e-commerce platform, or reviews of a food-delivery service). In a human-machine dialog system, for the machine to respond accurately to the sentence input by the user, it must be able to determine the intent the user wants to express, i.e. perform intent recognition.
Slot filling is the process of converting the user's intent into explicit machine instructions. Slot filling, also known as named entity recognition, is essentially a sequence labeling problem in which each character of a given text is assigned a label. The label format may take three forms: BO, BIO and BIEO. BO is generally used when the data set is small; if there is a large amount of data, the BIEO format may be chosen, but the application does not limit this. Here, "X-B" indicates that the segment containing the element belongs to type X and that the element is at the beginning (begin) of the segment; "X-I" indicates that the segment containing the element belongs to type X and that the element is in the middle of the segment (inside); "O" indicates that the element does not belong to any type, i.e. a non-semantic slot (outside).
The intent and slots are described below with the example in Table 1. For the input sentence "播放告白气球" ("play Love Confession"), the machine determines that the intent expressed by the input sentence is "play music", and determines the corresponding slot information, namely "播放\O 告\musicName-B 白\musicName-I 气\musicName-I 球\musicName-I".
Table 1
[Table 1: the example input sentence, its intent ("play music"), and the slot labels described above]
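As an illustration of the labeling above, a minimal Python sketch is shown below; the token split of the utterance and the exact tag strings are assumptions made only for this example.

# Illustration of the Table 1 example: the utterance "播放告白气球"
# ("play Love Confession") paired with its slot labels and intent.
tokens = ["播放", "告", "白", "气", "球"]
slot_tags = ["O", "musicName-B", "musicName-I", "musicName-I", "musicName-I"]
intent = "play music"

for token, tag in zip(tokens, slot_tags):
    print(f"{token}\t{tag}")   # each character/token with its slot label
print("intent:", intent)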
The spoken language understanding model is obtained by training based on a multi-language pre-training language model. The multi-language pre-training language model is a multi-layer bidirectional encoder with a Transformer as its core, such as BERT (bidirectional encoder representations from Transformers).
The structure of the multi-lingual pre-trained language model is described in detail below with reference to FIG. 1. As shown in fig. 1, the structure of the model includes: an input layer, an embedding (embedding) layer, an encoding (encoder) layer, an intention recognition layer, and a slot filling layer.
1. Input layer: used for inputting a text. Before the text enters the embedding layer, a [CLS] token is inserted at the beginning of the sentence; it serves as an aggregate feature of the whole sentence and is used for the sentence classification task. In addition, a [SEP] token is inserted at the end of each sentence to separate the sentences.
2. Embedding layer: outputs word embedding vector information corresponding one-to-one to the text of the input layer. The input of this layer consists of the word vector, text vector and position vector corresponding to each character of the input text, and its output is the word vector information for each character obtained by summing the word vector, text vector and position vector, for example e1 to e5 in FIG. 1 (a code sketch of this layer and of the output layers in item 4 is given after this list).
3. Coding layer: its internal architecture is a multi-layer bidirectional Transformer structure, which may have, for example, 12 or 8 layers; the number of Transformer encoder layers is not limited in the present application. The key part of the Transformer encoder is the multi-head self-attention mechanism.
The input and output of the coding layer have exactly the same form: the input is the original word vector information of each character in the text, and the output is the enhanced vector information of each character after the semantic information of the full text has been fused in.
4. Intent recognition and slot filling layer: the encoding vector corresponding to the [CLS] token, which is used for sentence classification (for example h1 in FIG. 1, where h1 is the weighted average of h2 to h5), is fed into a fully connected (FC) layer to predict the intent of the input text, and the encoding vectors corresponding to the remaining text (for example h2 to h5 in FIG. 1) are fed into another FC layer to predict the slot information of the input text.
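The following Python (PyTorch) sketch illustrates items 2 and 4 of the structure described above: an embedding layer that sums the word, text (segment) and position vectors, and the two fully connected output layers that predict the intent from the [CLS] encoding vector and a slot label for each remaining token. All sizes (vocabulary, hidden dimension, label counts) are placeholder assumptions, and the coding layer of item 3 is assumed to be any standard multi-layer bidirectional Transformer encoder placed between the two modules.

import torch
import torch.nn as nn

class EmbeddingLayer(nn.Module):
    """Item 2: each output vector e_i is the sum of the word vector, the text
    (segment) vector and the position vector of the corresponding token."""
    def __init__(self, vocab_size=120000, hidden=768, max_len=512, n_segments=2):
        super().__init__()
        self.word = nn.Embedding(vocab_size, hidden)
        self.segment = nn.Embedding(n_segments, hidden)
        self.position = nn.Embedding(max_len, hidden)
        self.norm = nn.LayerNorm(hidden)

    def forward(self, token_ids, segment_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        positions = positions.unsqueeze(0).expand_as(token_ids)
        summed = self.word(token_ids) + self.segment(segment_ids) + self.position(positions)
        return self.norm(summed)

class IntentSlotHeads(nn.Module):
    """Item 4: one FC layer predicts the intent from the [CLS] encoding vector
    (h1); another FC layer predicts slot information for the remaining tokens
    (h2..hn)."""
    def __init__(self, hidden=768, n_intents=20, n_slots=30):
        super().__init__()
        self.intent_fc = nn.Linear(hidden, n_intents)
        self.slot_fc = nn.Linear(hidden, n_slots)

    def forward(self, encoder_states):
        # encoder_states: [batch, seq_len, hidden]; position 0 holds the [CLS] vector
        intent_logits = self.intent_fc(encoder_states[:, 0])
        slot_logits = self.slot_fc(encoder_states[:, 1:])
        return intent_logits, slot_logits

In the terms of FIG. 1, EmbeddingLayer produces e1 to e5 and IntentSlotHeads consumes the encoder outputs h1 to h5.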
In the field of natural language understanding, the training of spoken language understanding models is intended to let machines understand the user's expression. Generally, a pre-trained language understanding model such as BERT (bidirectional encoder representations from Transformers) is fine-tuned with a task corpus of the target language. The specific training process is as follows: first, a large amount of labeled corpus of the target language is obtained; this labeled corpus is then input into the multi-language pre-training model for training, a process that adjusts the weight parameters of the multi-language pre-training language model and may also be called fine-tuning, so as to obtain a spoken language understanding model of the target language. After the spoken language understanding model of the target language is obtained, unlabeled corpus of the target language can be input to it as input text, and the intent information and slot information corresponding to the input text are obtained.
In the process of training the spoken language understanding model, a large number of labeled samples of the target language are required. For some minor languages it is difficult to collect and label such spoken language understanding data, and training directly on a small amount of corpus usually yields a model that is not accurate enough.
In view of this, the present application provides a method and an apparatus for training a spoken language understanding model which, by reusing spoken language understanding task data from labeled large-language corpora, greatly reduce the dependence on labeled minor-language corpus and improve the accuracy and performance of the minor-language spoken language understanding model.
It should be understood that the spoken language understanding model of the embodiment of the present application is applicable not only to speech input scenarios, for example the spoken language understanding task after speech recognition in assistants such as "Xiaodu", "Xiao AI" and "Tmall Genie", but also to text input scenarios, for example the spoken language understanding task after text input in assistants such as "AliMe". The embodiments of the present application do not limit this.
In the embodiments shown below, terms and English abbreviations such as weight parameters, freezing and fine-tuning are examples given for convenience of description and should not limit the present application in any way. This application does not exclude the possibility that other terms may be defined in existing or future protocols to carry out the same or similar functions.
The following describes in detail a method for training a spoken language understanding model according to an embodiment of the present application with reference to fig. 2. The method of the embodiment of the present application may be executed by a data processing device, or may be executed by a chip in the data processing device, which is not limited in the embodiment of the present application.
Fig. 2 is a schematic flowchart of a method 200 for training a spoken language understanding model according to an embodiment of the present disclosure. As shown in fig. 2, the method 200 may include the following steps:
S201, determining an embedding layer weight parameter of a pre-training language model according to the unlabeled corpus of the target minor language and the multi-language pre-training language model.
S202, determining a coding layer weight parameter of the pre-training language model according to the labeled corpus of at least one large language and the multi-language pre-training language model.
S203, initializing the embedding layer of the pre-training language model with the embedding layer weight parameters, and initializing the coding layer of the pre-training language model with the coding layer weight parameters, to obtain the spoken language understanding model of the target minor language.
It should be understood that, in the embodiment of the present application, initializing the embedding layer of the multi-language pre-training language model with the obtained embedding layer weight parameters specifically means assigning new initial values to the embedding layer weight parameters of the model, the initial values being the embedding layer weight parameters determined in S201. Similarly, initializing the coding layer of the multi-language-based pre-training language model with the obtained coding layer weight parameters specifically means assigning new initial values to the coding layer weight parameters of the model, the initial values being the coding layer weight parameters determined in S202.
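A minimal PyTorch-style sketch of this initialization step (S203) is given below; the attribute names `embeddings` and `encoder` are assumptions about how the model object is organized.

import copy

def initialize_slu_model(pretrained_model, embedding_state, encoder_state):
    """Sketch of S203: assign the embedding layer weight parameters obtained in
    S201 and the coding layer weight parameters obtained in S202 to the
    multi-language pre-training language model as its new initial values."""
    model = copy.deepcopy(pretrained_model)            # start from the pre-trained model
    model.embeddings.load_state_dict(embedding_state)  # new initial values from S201
    model.encoder.load_state_dict(encoder_state)       # new initial values from S202
    return model

Here embedding_state and encoder_state stand for the weight parameters determined in S201 and S202, respectively.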
The training method of the spoken language understanding model in the embodiment of the application uses the labeled corpus of at least one large language and the unlabeled corpus of the target minor language to learn in stages, thereby achieving transfer across languages. By reusing the labeled corpus of the large language to obtain the corresponding weight parameters, the dependence on labeled corpus of the target minor language during model training is greatly reduced. In addition, initializing the multi-language pre-training language model with the obtained embedding layer weight parameters and coding layer weight parameters improves the performance and accuracy of the spoken language understanding model of the target minor language while obtaining a model based on that language, and also reduces its training time.
It should be understood that the spoken language understanding model trained in the embodiment of the present application is specific to the target minor language, i.e. it is able to perform the spoken language understanding task of that specific language. The target minor language may be any minor language.
The unlabeled corpus of the target minor language may be collected by a technician in advance in various ways, which is not limited in the embodiment of the present application. In one possible implementation, the technician may collect unlabeled corpus of the target minor language, such as Wikipedia articles and news, over the network; in another possible implementation, the technician may translate existing unlabeled large-language corpus into corpus of the target minor language through a public translation model; in another possible implementation, the technician may derive new target-language corpus from some existing unlabeled target-language corpus by means of data augmentation.
The labeled corpus of at least one large language may be a set of labeled corpora of one or more large languages, such as one or more of "Chinese", "English" or "French"; the specific combination is not limited in the embodiments of the present application. The labeled corpora of the large languages can be collected by technical personnel, and the specific collection method is not limited in the embodiment of the application.
In addition, the execution sequence of S201 and S202 in the method 200 is not limited in the present application. For example, the data processing device may first obtain the embedding layer weight parameters and then the coding layer weight parameters; or it may first obtain the coding layer weight parameters and then the embedding layer weight parameters; or it may obtain both at the same time.
As an alternative embodiment, the determining the embedding layer weight parameter of the pre-trained language model according to the unlabeled corpus of the target language and the multi-language-based pre-trained language model includes: freezing the original coding layer weight parameters of the pre-training language model; and inputting the unmarked corpus into a pre-training language model for training to obtain the weight parameter of the embedding layer.
In the embodiment of the present application, when acquiring the embedding layer weight parameters, the data processing device may freeze the coding layer weight parameters of the pre-training language model, that is, keep them unchanged, and then input the unlabeled corpus of the target minor language into the multi-language pre-training language model for training, thereby obtaining the embedding layer weight parameters of the trained model. The embedding layer parameters may be understood as word embedding vector information related to the target minor language.
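A PyTorch-style sketch of this step (S201) under stated assumptions is shown below: the attribute names (`embeddings`, `encoder`), the self-supervised loss function and the hyperparameters are placeholders; the essential point is only that the coding layer weights stay frozen while the embedding layer is trained on the unlabeled target-language corpus.

import torch

def train_embedding_layer(model, unlabeled_target_batches, mlm_loss_fn, lr=5e-5):
    """Sketch of S201: freeze the coding (encoder) layer weight parameters and
    train only the embedding layer on the unlabeled corpus of the target minor
    language, then return the resulting embedding layer weight parameters."""
    for p in model.encoder.parameters():
        p.requires_grad = False                      # keep coding layer weights unchanged
    optimizer = torch.optim.Adam(model.embeddings.parameters(), lr=lr)
    for batch in unlabeled_target_batches:
        loss = mlm_loss_fn(model, batch)             # e.g. masked-token prediction loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return {k: v.detach().clone() for k, v in model.embeddings.state_dict().items()}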
As an alternative embodiment, the determining, according to the labeled corpus and the pre-trained language model of at least one large language, the coding layer weight parameter of the pre-trained language model includes: freezing the original embedding layer weight parameters of the pre-training language model; and inputting the marked linguistic data into the pre-training language model for training to obtain the weight parameters of the coding layer.
In the embodiment of the present application, when acquiring the coding layer weight parameters, the data processing device may freeze the embedding layer weight parameters of the multi-language pre-training language model, that is, keep them unchanged, and then input the labeled corpus of one or more large languages collected by technical staff into the pre-training language model for training; this process may also be referred to as fine-tuning, and yields the coding layer weight parameters of the fine-tuned pre-training language model. The coding layer parameters may be understood as information related to the spoken language understanding task learned from the large-language corpus.
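Correspondingly, a sketch of this step (S202) under the same assumptions: the embedding layer is frozen and the remaining parameters (the coding layer together with the intent/slot output layers) are fine-tuned on the labeled large-language corpus; `task_loss_fn` stands for a combined intent-classification and slot-filling loss.

import torch

def train_coding_layer(model, labeled_major_batches, task_loss_fn, lr=5e-5):
    """Sketch of S202: freeze the embedding layer weight parameters and fine-tune
    the remaining parameters on the labeled corpus of one or more large
    languages, then return the resulting coding layer weight parameters."""
    for p in model.embeddings.parameters():
        p.requires_grad = False                      # keep embedding layer weights unchanged
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=lr)
    for batch in labeled_major_batches:
        loss = task_loss_fn(model, batch)            # intent loss + slot-filling loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return {k: v.detach().clone() for k, v in model.encoder.state_dict().items()}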
As an alternative embodiment, after obtaining the spoken language understanding model of the target language (i.e. S203), the method 200 of this embodiment of the present application may further include:
s204, fine-tuning the spoken language understanding model of the target language according to the labeled corpus of the target language to obtain a new spoken language understanding model of the target language.
The labeled corpus of the target minor language in the embodiment of the application can be obtained through manual labeling. Fine-tuning the spoken language understanding model of the target minor language with this labeled corpus can further improve the accuracy of the model while obtaining a spoken language understanding model based on the specific minor language, thereby improving the performance of the spoken language understanding task for the target minor language.
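A sketch of this optional fine-tuning step (S204), again under assumed attribute names and placeholder hyperparameters: all weight parameters of the spoken language understanding model obtained in S203 are further adjusted on the (typically small) labeled corpus of the target minor language.

import torch

def finetune_on_target(slu_model, labeled_target_batches, task_loss_fn, epochs=3, lr=2e-5):
    """Sketch of S204: fine-tune the whole spoken language understanding model of
    the target minor language on its labeled corpus to obtain a new model."""
    optimizer = torch.optim.Adam(slu_model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in labeled_target_batches:
            loss = task_loss_fn(slu_model, batch)    # joint intent + slot loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return slu_model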
It should be understood that the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
The method for training the spoken language understanding model according to the embodiment of the present application is described in detail above with reference to fig. 1 to 2, and the device for training the spoken language understanding model according to the embodiment of the present application is described in detail below with reference to fig. 3 to 4.
Fig. 3 shows a training apparatus 300 for a spoken language understanding model provided in an embodiment of the present application, where the apparatus 300 includes: a determination module 310 and an initialization module 320.
The determining module 310 is configured to determine the embedding layer weight parameters of a pre-training language model according to the unlabeled corpus of a target minor language and the multi-language-based pre-training language model, and to determine the coding layer weight parameters of the pre-training language model according to the labeled corpus of at least one large language and the pre-training language model. The initialization module 320 is configured to initialize the embedding layer of the pre-training language model with the embedding layer weight parameters obtained by the determining module 310, and to initialize the coding layer of the pre-training language model with the coding layer weight parameters obtained by the determining module 310, so as to obtain a spoken language understanding model of the target minor language.
Optionally, the determining module 310 is configured to freeze the original coding layer weight parameters of the pre-training language model and input the unlabeled corpus into the pre-training language model for training, to obtain the embedding layer weight parameters.
Optionally, the determining module 310 is configured to freeze the original embedding layer weight parameters of the pre-training language model and input the labeled corpus into the pre-training language model for training, to obtain the coding layer weight parameters.
Optionally, the apparatus 300 further includes: an adjusting module 330, configured to fine-tune the spoken language understanding model of the target minor language according to the labeled corpus of the target minor language, to obtain a new spoken language understanding model of the target minor language.
It should be appreciated that the apparatus 300 herein is embodied in the form of functional modules. The term module herein may refer to an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (e.g., a shared, dedicated, or group processor) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functionality. In an optional example, it may be understood by those skilled in the art that the apparatus 300 may be specifically a data processing device in the foregoing embodiment, or the functions of the data processing device in the foregoing embodiment may be integrated in the apparatus 300, and the apparatus 300 may be configured to execute each procedure and/or step corresponding to the data processing device in the foregoing method embodiment, and in order to avoid repetition, details are not described here again.
The device 300 has the functions of realizing the corresponding steps executed by the data processing equipment in the method; the above functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above. For example, the determining module 310 may be a communication interface, such as a transceiver interface.
In an embodiment of the present application, the apparatus 300 in fig. 3 may also be a chip or a chip system, for example: system on chip (SoC). Correspondingly, the determining module 310 may be a transceiver circuit of the chip, and is not limited herein.
Fig. 4 shows another training apparatus 400 for a spoken language understanding model provided in an embodiment of the present application. The apparatus 400 includes a processor 410, a transceiver 420, and a memory 430. Wherein the processor 410, the transceiver 420 and the memory 430 are in communication with each other through an internal connection path, the memory 430 is used for storing instructions, and the processor 410 is used for executing the instructions stored in the memory 430 to control the transceiver 420 to transmit and/or receive signals.
It should be understood that the apparatus 400 may be embodied as the data processing device in the foregoing embodiments, or the functions of the data processing device in the foregoing embodiments may be integrated in the apparatus 400, and the apparatus 400 may be configured to execute each step and/or flow corresponding to the data processing device in the foregoing method embodiments. Optionally, the memory 430 may include both read-only memory and random access memory, and provides instructions and data to the processor; a portion of the memory may also include non-volatile random access memory, and the memory may, for example, also store device type information. The processor 410 may be configured to execute the instructions stored in the memory, and when the processor executes the instructions, it may perform the steps and/or processes corresponding to the data processing device in the above method embodiments.
It should be understood that, in the embodiments of the present application, the processor may be a Central Processing Unit (CPU), and the processor may also be other general processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or by a combination of hardware and software modules in a processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM or registers. The storage medium is located in a memory, and the processor executes the instructions in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, details are not described here.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the apparatus and the module described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for training a spoken language understanding model, comprising:
determining an embedding layer weight parameter of a pre-training language model according to a label-free corpus of a target language and the multi-language-based pre-training language model, wherein the pre-training language model adopts a framework of a bidirectional encoder;
determining a coding layer weight parameter of the pre-training language model according to at least one marked corpus of a large language and the pre-training language model;
initializing the embedding layer of the pre-training language model by adopting the embedding layer weight parameters, and initializing the coding layer of the pre-training language model by adopting the coding layer weight parameters to obtain the spoken language understanding model of the target language.
2. The method according to claim 1, wherein determining the embedding layer weight parameters of the pre-trained language model according to the unlabeled corpus of the target language and the multi-language based pre-trained language model comprises:
freezing the original coding layer weight parameters of the pre-training language model;
and inputting the unmarked corpus into the pre-training language model for training to obtain the weight parameters of the embedding layer.
3. The method according to claim 1 or 2, wherein determining coding layer weight parameters of the pre-trained language model according to the labeled corpus of at least one major language and the pre-trained language model comprises:
freezing the original embedding layer weight parameters of the pre-training language model;
and inputting the marked linguistic data into the pre-training language model for training to obtain the weight parameters of the coding layer.
4. The method according to any one of claims 1 to 3, wherein after said obtaining the spoken language understanding model of the target language, the method further comprises:
and fine-tuning the spoken language understanding model of the target language according to the labeled corpus of the target language to obtain a new spoken language understanding model of the target language.
5. A training device for a spoken language understanding model, comprising:
the determining module is used for determining the weight parameters of the embedding layer of the pre-training language model according to the unmarked corpus of the target language and the multi-language-based pre-training language model, wherein the pre-training language model adopts the framework of a bidirectional encoder; determining a coding layer weight parameter of the pre-training language model according to at least one marked corpus of a large language and the pre-training language model;
and the initialization module is used for initializing the embedding layer of the pre-training language model by adopting the embedding layer weight parameters and initializing the coding layer of the pre-training language model by adopting the coding layer weight parameters to obtain the spoken language understanding model of the target language.
6. The apparatus of claim 5, wherein the determining module is specifically configured to:
and freezing the original coding layer weight parameters of the pre-training language model, inputting the unmarked corpus into the pre-training language model for training, and obtaining the embedded layer weight parameters.
7. The apparatus according to claim 5 or 6, wherein the determining module is specifically configured to:
and freezing the original embedded layer weight parameters of the pre-training language model, inputting the marked corpus into the pre-training language model for training, and obtaining the coding layer weight parameters.
8. The apparatus of any of claims 5 to 7, further comprising:
and the adjusting module is used for finely adjusting the spoken language understanding model of the target language according to the labeled corpus of the target language to obtain a new spoken language understanding model of the target language.
9. A training device for a spoken language understanding model, comprising: a processor coupled with a memory for storing a computer program that, when invoked by the processor, causes the training apparatus to perform the method of any of claims 1 to 4.
10. A computer-readable storage medium for storing a computer program comprising instructions for implementing the method of any one of claims 1 to 4.
CN202010839609.3A, filed 2020-08-19 (priority date 2020-08-19): Training method and device for spoken language comprehension model. Status: Pending. Published as CN114077650A (en).

Priority Applications (1)

Application Number: CN202010839609.3A (published as CN114077650A (en)); Priority Date: 2020-08-19; Filing Date: 2020-08-19; Title: Training method and device for spoken language comprehension model

Applications Claiming Priority (1)

Application Number: CN202010839609.3A (published as CN114077650A (en)); Priority Date: 2020-08-19; Filing Date: 2020-08-19; Title: Training method and device for spoken language comprehension model

Publications (1)

Publication Number: CN114077650A (en); Publication Date: 2022-02-22

Family

ID=80281860

Family Applications (1)

Application Number: CN202010839609.3A (CN114077650A (en), pending); Title: Training method and device for spoken language comprehension model; Priority Date: 2020-08-19; Filing Date: 2020-08-19

Country Status (1)

Country: CN; Link: CN114077650A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number / Priority date / Publication date / Assignee / Title
CN114706944A (en)* / 2022-03-17 / 2022-07-05 / 海信电子科技(武汉)有限公司 / Server and multi-language text semantic understanding method
WO2023226309A1 (en)* / 2022-05-24 / 2023-11-30 / 华为云计算技术有限公司 / Model training method and related device
US12436984B2 (en)* / 2020-11-19 / 2025-10-07 / Shenzhen University / Pre-training language model-based summarization generation method


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number / Priority date / Publication date / Assignee / Title
CN110705279A (en)* / 2018-07-10 / 2020-01-17 / 株式会社理光 / Vocabulary selection method and device and computer readable storage medium
CN111539227A (en)* / 2020-07-06 / 2020-08-14 / 北京百度网讯科技有限公司 / Method, apparatus, device and computer storage medium for training semantic representation model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MIKEL ARTETXE et al.: "On the Cross-lingual Transferability of Monolingual Representations", Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 10 June 2020 (2020-06-10), pages 4623-4637 *


Similar Documents

Publication / Publication Date / Title
Yao et al. An improved LSTM structure for natural language processing
CN107729313B (en) Deep neural network-based polyphone pronunciation distinguishing method and device
Kim et al. Two-stage multi-intent detection for spoken language understanding
CN111062217B (en) Language information processing method and device, storage medium and electronic equipment
CN111931517B (en) Text translation method, device, electronic equipment and storage medium
JP5901001B1 (en) Method and device for acoustic language model training
WO2020119075A1 (en) General text information extraction method and apparatus, computer device and storage medium
CN111563144A (en) Statement context prediction-based user intention identification method and device
CN115392264A (en) RASA-based task-type intelligent multi-turn dialogue method and related equipment
CN114757176A (en) Method for obtaining target intention recognition model and intention recognition method
CN118364916A (en) News retrieval method and system based on large language model and knowledge graph
CN112463942A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN114077650A (en) Training method and device for spoken language comprehension model
US20240119862A1 (en) Syllable-based text conversion for pronunciation help
CN115238708B (en) Text semantic recognition method, device, equipment, storage medium and program product
KR20200140171A (en) Electronic device and Method for controlling the electronic device thereof
WO2024055707A1 (en) Translation method and related device
Hori et al. Statistical dialog management applied to WFST-based dialog systems
Ostendorf: Continuous-space language processing: Beyond word embeddings
CN113362815A (en) Voice interaction method, system, electronic equipment and storage medium
CN111489742B (en) Acoustic model training method, voice recognition device and electronic equipment
CN113012685A (en) Audio recognition method and device, electronic equipment and storage medium
CN113705194B (en) Extraction method for short and electronic equipment
CN119443087A (en) Text error correction method, device, equipment and storage medium
CN110852063B (en) Word vector generation method and device based on bidirectional LSTM neural network

Legal Events

Date / Code / Title / Description
PB01: Publication
SE01: Entry into force of request for substantive examination
