CN115730587A - Text feature extraction method based on NGU language model - Google Patents

Text feature extraction method based on NGU language model

Info

Publication number
CN115730587A
Authority
CN
China
Prior art keywords
text
ngu
model
language model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211606356.0A
Other languages
Chinese (zh)
Inventor
曹肖攀
马国祖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Wanwei Information Technology Co Ltd
Original Assignee
China Telecom Wanwei Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Wanwei Information Technology Co Ltd
Priority to CN202211606356.0A
Publication of CN115730587A
Legal status: Pending

Abstract

The invention relates to the technical field of natural language processing, in particular to a text feature extraction method based on an NGU language model. The method optimizes and improves the GRU model of the recurrent neural network family and proposes a new text feature extraction model, the NGU language model: a normalization mechanism is introduced into the gating units of the GRU, the hyperbolic tangent function, which has a saturation region, is replaced by a layer normalization operation, and the feedforward neural network of the Transformer is fused into the iteration unit to improve the semantic representation capability of the model, that is, its ability to fit data.

Description

Text feature extraction method based on NGU language model
Technical Field
The invention relates to the technical field of natural language processing, in particular to a text feature extraction method based on an NGU language model.
Background
RNNs, Transformers and similar architectures are important basic units in the field of natural language processing and play a key role in natural language processing tasks. The GRU model, which belongs to the RNN family, is especially important in entity recognition, relation extraction and text generation. In text generation, the GRU unit can perform inference in an iterative, streaming fashion, so text is generated faster than with a Transformer or GPT model. In entity recognition and relation extraction, the GRU model also performs well, because its loop iteration mechanism introduces information from adjacent text content. However, because the Transformer uses a multi-head self-attention mechanism with global interaction, it has strong semantic representation capability, so current natural language processing tasks increasingly adopt large models based on Transformer variants; trained on large data sets, these models achieve excellent results but consume a large amount of computing resources. To integrate the advantages of the Transformer and the GRU, so that the model has the semantic representation capability of the Transformer while keeping the advantages of the GRU, the NGU language model is proposed on the basis of the GRU model.
Disclosure of Invention
This patent optimizes and improves the GRU model of the recurrent neural network family and proposes a new text feature extraction model, the NGU language model: a normalization mechanism is introduced into the gating units of the GRU, the hyperbolic tangent function, which has a saturation region, is replaced by a layer normalization operation, and the feedforward neural network of the Transformer is fused into the iteration unit to improve the semantic representation capability of the model, that is, its ability to fit data. This patent defines the result as the NGU language model.
To solve the problems in the prior art and integrate the respective advantages of the GRU and the Transformer, a text feature extraction method based on an NGU language model is provided, comprising the following steps:
S1, constructing a training data set: a training data set related to the task is collected, organized and placed into train.txt; the maximum text length input to the NGU language model is 1000, and when a text is shorter than 1000 it is padded to the maximum length of 1000 with [PAD];
S2, constructing the mapping from characters to IDs: the characters in train.txt of the training set from S1 are counted and recorded as token_list, and a dictionary Dict_token is then built from the characters in token_list, where the key of Dict_token is the character index number and the value is the specific single character; [PAD] is the padding character used when a text is shorter than the maximum length;
S3, adaptation of training data and the model: if a text sample in the training data set obtained in step S1 is shorter than the maximum length of 1000, its character list is padded to the maximum length of 1000 with [PAD], then mapped into a list of index numbers through the dictionary Dict_token and turned into the model input tensor X; with a batch size batch_size of 128 input to the NGU language model, X has size [128, 1000];
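As an illustration only, the data preparation in steps S1-S3 can be sketched in Python as follows; the helper names build_vocab and encode and the use of PyTorch tensors are assumptions made for this sketch, not part of the claimed method.

```python
# Illustrative sketch of S1-S3: build the character vocabulary from train.txt,
# pad every text to MAX_LEN = 1000 with [PAD], map characters to index numbers,
# and stack a batch of samples into the input tensor X of size [128, 1000].
# Helper names and the use of PyTorch are assumptions made for illustration.
import torch

MAX_LEN = 1000
BATCH_SIZE = 128
PAD = "[PAD]"

def build_vocab(lines):
    # Dict_token maps index number -> single character (plus the [PAD] token).
    chars = sorted({ch for line in lines for ch in line})
    token_list = chars + [PAD]
    dict_token = {i: ch for i, ch in enumerate(token_list)}
    char_to_id = {ch: i for i, ch in dict_token.items()}
    return dict_token, char_to_id

def encode(text, char_to_id):
    # Split the text into single characters, pad to MAX_LEN with [PAD], map to ids.
    tokens = list(text)[:MAX_LEN]
    tokens += [PAD] * (MAX_LEN - len(tokens))
    return [char_to_id.get(t, char_to_id[PAD]) for t in tokens]

with open("train.txt", encoding="utf-8") as f:
    lines = [line.strip() for line in f if line.strip()]

dict_token, char_to_id = build_vocab(lines)
batch = [encode(text, char_to_id) for text in lines[:BATCH_SIZE]]
X = torch.tensor(batch, dtype=torch.long)   # size [128, 1000] when 128 samples are available
```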
s4, extracting the text features of the NGU language model: the original GRU network model iteration formula is as follows:
$h_t = f(h_{t-1}, x_t)$
f is the equivalent function of the GRU gated recurrent unit, and the detailed formula of f is as follows:
$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$
$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$
$\tilde{h}_t = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t])$
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$
the proposed NGU iterative formula is specifically as follows:
$z_t = \sigma(\mathrm{LN}(W_z \cdot [h_{t-1}, x_t]))$
$r_t = \sigma(\mathrm{LN}(W_r \cdot [h_{t-1}, x_t]))$
$\tilde{h}_t = \mathrm{FFN}(\mathrm{LN}(W_h \cdot [r_t \odot h_{t-1}, x_t]))$
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$
where LN denotes the layer normalization operation and FFN denotes the Transformer feedforward block, both defined below.
The sigmoid function has a saturation region when x is far from 0:
$\sigma(x) = \frac{1}{1 + e^{-x}}$
after the fully connected layer, information is lost when it passes through the sigmoid function; a layer normalization operation is therefore introduced, normalization is performed over the embedding representation dimension, and the sigmoid function is then applied, which effectively retains the text representation information;
When the input x of tanh(x) in the candidate hidden state is far from 0 and reaches the saturation region, the output tends toward a stable value and much semantic information is lost. The hyperbolic tangent function tanh in the GRU is therefore replaced by the layer normalization operation. The layer normalization operation is as follows, with normalization performed over the semantic representation dimension d_model: if the current word representation matrix is T, its size is [1, d_model], and the values along the second dimension are, in order:
$T = [t_1, t_2, \dots, t_{d\_model}]$
the mean is:
$\mu = \frac{1}{d\_model} \sum_{i=1}^{d\_model} t_i$
the variance is:
$\sigma^2 = \frac{1}{d\_model} \sum_{i=1}^{d\_model} (t_i - \mu)^2$
each element after layer normalization is:
$\hat{t}_i = \frac{t_i - \mu}{\sqrt{\sigma^2 + \epsilon}}$
the layer normalization operation only translates and rescales the data without losing semantic information; at the same time, it normalizes the embedding representation dimension to be close to 0, which makes model training more stable; the feedforward neural network layer of the Transformer is migrated into the NGU model, where the feedforward network consists of two linear transformations and one nonlinear transformation (the GeLU activation function), followed by a residual connection and a layer normalization operation:
$\mathrm{FFN}(x) = \mathrm{LayerNorm}(x + W_2 \, \mathrm{GeLU}(W_1 x + b_1) + b_2)$
the word embedding dimension is 256 and the hidden layer dimension of the feedforward network is 2048; for the model input of size [128, 1000] from step S3, each word in the text data is first embedded through token_embedding, and the embedded representation of each of the 1000 word positions has shape [128, 256], where 128 is the batch size in training and 256 is the word embedding dimension; each word embedding of the text data X is then fed into the NGU loop iteration unit in turn, and the hidden states of all time steps $h_1, h_2, \dots, h_{1000}$ are spliced together into a tensor of dimensionality [128, 1000, 256]; the extraction of text features is thus completed by the NGU language model, and each word of each sentence in a batch is represented as a 256-dimensional vector;
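For illustration, one possible reading of the NGU iteration unit described in S4 is sketched below in PyTorch. The placement of the layer normalization and of the feedforward block, and the class name NGUCellSketch, are assumptions based on the modifications described above, not a definitive implementation of the patented unit.

```python
# Sketch of a single NGU iteration step, assuming: layer normalization before the
# sigmoid in the update and reset gates, layer normalization in place of tanh for
# the candidate state, and a Transformer-style feedforward block fused into the unit.
import torch
import torch.nn as nn

class NGUCellSketch(nn.Module):
    def __init__(self, d_model=256, d_ff=2048):
        super().__init__()
        self.wz = nn.Linear(2 * d_model, d_model)   # update gate
        self.wr = nn.Linear(2 * d_model, d_model)   # reset gate
        self.wh = nn.Linear(2 * d_model, d_model)   # candidate state
        self.ln_z = nn.LayerNorm(d_model)
        self.ln_r = nn.LayerNorm(d_model)
        self.ln_h = nn.LayerNorm(d_model)           # replaces the tanh of the GRU
        # Feedforward block: two linear maps with a GeLU in between,
        # followed by a residual connection and layer normalization.
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))
        self.ln_out = nn.LayerNorm(d_model)

    def forward(self, x_t, h_prev):
        cat = torch.cat([h_prev, x_t], dim=-1)
        z = torch.sigmoid(self.ln_z(self.wz(cat)))        # update gate
        r = torch.sigmoid(self.ln_r(self.wr(cat)))        # reset gate
        cat_r = torch.cat([r * h_prev, x_t], dim=-1)
        h_cand = self.ln_h(self.wh(cat_r))                 # layer norm instead of tanh
        h_cand = self.ln_out(h_cand + self.ffn(h_cand))    # fused feedforward block
        return (1 - z) * h_prev + z * h_cand
```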
S5, applying the NGU language model: the proposed NGU language model is a non-pre-trained language model and its parameters are trained on the specific natural language processing task; the text representation tensor obtained in S4 has size (128, 1000, 256), and this tensor is fed into the subsequent neural network for text classification, relation extraction, text generation or entity recognition to be trained.
The method has the following advantages. The GRU sigmoid activation function has a saturation region, and the intermediate fully connected layer compresses the representation information of the text into this saturation region; the method introduces layer normalization after the fully connected layer and before the activation function, shifting the data center close to 0 so that the information is retained. To further preserve text information, that is, to avoid saturated activation functions as much as possible, the hyperbolic tangent function in the GRU is directly replaced with a layer normalization operation; because of the memory gating effect of the sigmoid, this does not affect the stability of model training. The normalization operation only translates the values and normalizes the variance, preserving the richness of the values to the greatest extent and improving the amount of data the model can fit. The feedforward neural network of the Transformer, which performs a fully connected operation over the embedding representation dimension, is a module with powerful fitting capability. The NGU language model effectively combines the respective advantages of the GRU and the Transformer and can be applied to various natural language processing tasks. Although the idea of this patent is derived from the GRU, the hyperbolic tangent function in the GRU is replaced, layer normalization is introduced in multiple places, and a feedforward fully connected network is introduced in the iteration unit, so the model is clearly different from the GRU model; this patent defines it as the NGU language model.
Drawings
FIG. 1 is an overall flowchart of text feature extraction according to the present invention;
FIG. 2 is a plot of the sigmoid function used in the GRU;
FIG. 3 is a plot of the tanh function and its derivative.
Detailed Description
The following provides further detail in combination with specific cases:
the invention specifically relates to a text feature extraction method based on an NGU language model, which comprises the following steps:
S1, constructing a training data set: for the specific tasks in a project, such as text classification, relation extraction, entity recognition, text search, text writing, text completion and other natural language processing tasks, a training data set related to the task is collected, organized and placed into train.txt. The maximum text length input to the NGU language model is 1000; when a text is shorter than 1000, [PAD] is used to pad it to the maximum length of 1000;
S2, constructing the mapping from characters to IDs: the characters in train.txt of the training set from S1 are counted and recorded as token_list. A dictionary Dict_token is then built from the characters in token_list, where the key of Dict_token is the character index number and the value is the specific single character, for example Dict_token = {0: "中", 1: "华", 2: "唐", 3: "人", 4: "民", 5: "[PAD]", ...}; [PAD] is the padding character used when a text is shorter than the maximum length;
S3, adaptation of training data and the model: suppose a text sample in the training data set obtained in step S1 is "随着深度学习的发展，人工智能在金融领域、医疗领域、教育领域得到越来越多的重视。" ("With the development of deep learning, artificial intelligence has gained more and more attention in the financial, medical and educational fields."), that is, the character list ['随', '着', '深', '度', '学', '习', '的', '发', '展', ...]. The text is shorter than the maximum length of 1000, so the list is padded to the maximum text length of 1000 with [PAD], then mapped into a list of index numbers through the dictionary Dict_token and turned into the model input tensor X; if the batch size input to the NGU language model is 128, X has size [128, 1000].
S4, extracting the text features of the NGU language model: the original GRU network model iterative formula is as follows:
$h_t = f(h_{t-1}, x_t)$
f is the equivalent function of the GRU gated recurrent unit, and the detailed formula of f is as follows:
$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$
$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$
$\tilde{h}_t = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t])$
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$
the NGU iterative formula proposed by this patent is specifically as follows:
$z_t = \sigma(\mathrm{LN}(W_z \cdot [h_{t-1}, x_t]))$
$r_t = \sigma(\mathrm{LN}(W_r \cdot [h_{t-1}, x_t]))$
$\tilde{h}_t = \mathrm{FFN}(\mathrm{LN}(W_h \cdot [r_t \odot h_{t-1}, x_t]))$
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$
where LN denotes the layer normalization operation and FFN denotes the Transformer feedforward block, both defined below.
the GRU is plotted in sigmoid function as shown in fig. 2.
The sigmoid function has a saturation region when x is far from 0:
$\sigma(x) = \frac{1}{1 + e^{-x}}$
After the fully connected layer, information is lost when it passes through the sigmoid function. The method therefore introduces a layer normalization operation that normalizes over the embedding representation dimension; applying the sigmoid function afterwards effectively preserves the richness of the text representation information.
The tanh function in the GRU also saturates when x is far from 0; its derivative approaches 0 there, causing the gradient to vanish. The tanh function and its derivative are plotted in FIG. 3.
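The saturation behaviour can be illustrated with a small numerical sketch (values chosen only for demonstration, not taken from the patent): sigmoid and tanh flatten out for inputs far from 0, while layer normalization over the representation dimension pulls the pre-activations back toward 0.

```python
# Illustrative only: sigmoid saturates and the tanh derivative vanishes for inputs
# far from 0; normalizing over the last dimension first keeps the values near 0.
import torch
import torch.nn.functional as F

x = torch.tensor([[6.0, 8.0, 10.0, 12.0]])      # pre-activations far from 0
print(torch.sigmoid(x))                          # all close to 1: information collapses
print(1 - torch.tanh(x) ** 2)                    # tanh derivative close to 0: gradient vanishes

x_norm = F.layer_norm(x, normalized_shape=(4,))  # normalize over the representation dimension
print(x_norm)                                    # centered near 0, unit variance
print(torch.sigmoid(x_norm))                     # values spread out again
```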
When the input x of tanh(x) in the candidate hidden state is far from 0 and reaches the saturation region, the output tends toward a stable value and much semantic information is lost. This patent adopts the layer normalization operation as a replacement for the hyperbolic tangent function tanh in the GRU. The LayerNorm layer normalization operation is as follows: normalization is performed over the semantic representation dimension d_model. If the current word representation matrix is T, its size is [1, d_model], and the values along the second dimension are, in order:
$T = [t_1, t_2, \dots, t_{d\_model}]$
the mean is:
$\mu = \frac{1}{d\_model} \sum_{i=1}^{d\_model} t_i$
the variance is:
$\sigma^2 = \frac{1}{d\_model} \sum_{i=1}^{d\_model} (t_i - \mu)^2$
each element after layer normalization is:
$\hat{t}_i = \frac{t_i - \mu}{\sqrt{\sigma^2 + \epsilon}}$
it can be seen that the layer normalization operation only performs translation contraction on the data, and does not lose semantic information. Meanwhile, the operation enables the dimension of the embedded representation to be normalized to be close to 0, and the model training is more stable. Layer normalization operation is completely introduced into a gate control unit, and a tanh hyperbolic tangent function in a GRU is replaced by the layer normalization operation, so that the model training speed is higher, and the semantic representation capability is good. In order to further improve the data fitting capability of the method, a feedforward layer neural network layer in a Transformer is transferred to an NGU model, wherein the feedforward neural network comprises two linear transformations and one nonlinear transformation (GeLU activation function), and then the operation of residual error network and layer normalization is carried out.
$\mathrm{FFN}(x) = \mathrm{LayerNorm}(x + W_2 \, \mathrm{GeLU}(W_1 x + b_1) + b_2)$
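A minimal sketch of the layer normalization computation and of the feedforward block just described follows; the names layer_norm_manual and FeedForwardBlock are illustrative assumptions.

```python
# Sketch of the layer normalization over d_model and of the feedforward block
# (two linear transformations with a GeLU in between, then a residual connection
# and layer normalization), following the formulas above.
import torch
import torch.nn as nn

def layer_norm_manual(t, eps=1e-5):
    # t: word representation matrix of size [1, d_model]; normalize over d_model.
    mu = t.mean(dim=-1, keepdim=True)                  # mean over the representation dimension
    var = t.var(dim=-1, unbiased=False, keepdim=True)  # variance over the representation dimension
    return (t - mu) / torch.sqrt(var + eps)            # translation and rescaling only

class FeedForwardBlock(nn.Module):
    def __init__(self, d_model=256, d_ff=2048):
        super().__init__()
        self.linear1 = nn.Linear(d_model, d_ff)
        self.linear2 = nn.Linear(d_ff, d_model)
        self.act = nn.GELU()
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        # FFN(x) = LayerNorm(x + W2 * GeLU(W1 * x + b1) + b2)
        return self.norm(x + self.linear2(self.act(self.linear1(x))))
```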
In this patent, the word embedding dimension is 256 and the hidden layer dimension of the feedforward network is 2048. For the model input of size [128, 1000] from step S3, each word in the text data is first embedded through token_embedding; at this point the embedded representation of each of the 1000 word positions has shape [128, 256], where 128 is the batch size in training and 256 is the word embedding dimension. Each word embedding of the text data X is then fed into the NGU loop iteration unit in turn, and the hidden states of all time steps $h_1, h_2, \dots, h_{1000}$ are spliced together into a tensor of dimensionality [128, 1000, 256]; that is, the extraction of text features is completed by the NGU language model, and each word of each sentence in a batch is represented by the NGU language model as a 256-dimensional vector.
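A sketch of the loop over the 1000 time steps that produces the [128, 1000, 256] feature tensor is given below; ngu_cell stands for an NGU iteration unit such as the NGUCellSketch above, and vocab_size is an assumed value.

```python
# Illustrative only: embed the id tensor, run the NGU iteration unit over the
# 1000 word positions, and stack the per-step hidden states into [128, 1000, 256].
import torch
import torch.nn as nn

batch_size, max_len, d_model, vocab_size = 128, 1000, 256, 6000  # vocab_size is assumed

token_embedding = nn.Embedding(vocab_size, d_model)
ngu_cell = NGUCellSketch(d_model=d_model, d_ff=2048)   # from the earlier sketch

X = torch.randint(0, vocab_size, (batch_size, max_len))  # stand-in for the id tensor of S3
emb = token_embedding(X)                                  # [128, 1000, 256]

h = torch.zeros(batch_size, d_model)
states = []
for t in range(max_len):
    h = ngu_cell(emb[:, t, :], h)   # one NGU iteration step per word position
    states.append(h)

H = torch.stack(states, dim=1)      # [128, 1000, 256] text feature tensor
```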
S5, applying the NGU language model: the NGU language model proposed in this patent is a non-pre-trained language model and its parameters need to be trained on the specific natural language processing task. The text representation tensor obtained in S4 has size (128, 1000, 256); this tensor is fed into the subsequent neural network for text classification, relation extraction, text generation or entity recognition to be trained, which completes the method of this patent application.
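As an illustration of S5, the sketch below attaches a simple classification head to the NGU features and runs one training step; the mean pooling, num_classes and the head itself are assumptions, since the patent only states that the feature tensor is fed into a subsequent task network.

```python
# Illustrative only: mean-pool the [128, 1000, 256] NGU features into sentence
# vectors, classify them, and backpropagate one step on a made-up training batch.
import torch
import torch.nn as nn

num_classes = 10                        # assumed number of classes
classifier = nn.Linear(256, num_classes)

H = torch.randn(128, 1000, 256)         # stand-in for the NGU text feature tensor
labels = torch.randint(0, num_classes, (128,))

pooled = H.mean(dim=1)                  # [128, 256] sentence representation
logits = classifier(pooled)             # [128, num_classes]
loss = nn.CrossEntropyLoss()(logits, labels)
loss.backward()
```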
Through theoretical analysis, this patent improves on the GRU model and proposes the NGU language model. The main improvements are: a layer normalization operation is introduced before the sigmoid function in the GRU reset gate and update gate; the hyperbolic tangent function in the GRU is changed to a layer normalization operation; and the feedforward neural network of the Transformer is introduced into the NGU to further improve the semantic representation complexity of the model, that is, its data fitting capability. The NGU language model effectively combines the respective advantages of the GRU and the Transformer and can be applied to various natural language processing tasks.

Claims (1)

1. A text feature extraction method based on an NGU language model is characterized by comprising the following steps:
S1, constructing a training data set: a training data set related to the task is collected, organized and placed into train.txt; the maximum text length input to the NGU language model is 1000, and when a text is shorter than 1000 it is padded to the maximum length of 1000 with [PAD];
S2, constructing the mapping from characters to IDs: the characters in train.txt of the training set from S1 are counted and recorded as token_list, and a dictionary Dict_token is built from the characters in token_list, where the key of Dict_token is the character index number and the value is the specific single character; [PAD] is the padding character used when a text is shorter than the maximum length;
S3, adaptation of training data and the model: if a text sample in the training data set obtained in step S1 is shorter than the maximum length of 1000, its character list is padded to the maximum length of 1000 with [PAD], then mapped into a list of index numbers through the dictionary Dict_token and turned into the model input tensor X; with a batch size batch_size of 128 input to the NGU language model, X has size [128, 1000];
s4, extracting the text features of the NGU language model: the original GRU network model iteration formula is as follows:
$h_t = f(h_{t-1}, x_t)$
f is the equivalent function of the GRU gated recurrent unit, and the detailed formula of f is as follows:
$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$
$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$
$\tilde{h}_t = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t])$
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$
the proposed NGU iterative formula is specifically as follows:
$z_t = \sigma(\mathrm{LN}(W_z \cdot [h_{t-1}, x_t]))$
$r_t = \sigma(\mathrm{LN}(W_r \cdot [h_{t-1}, x_t]))$
$\tilde{h}_t = \mathrm{FFN}(\mathrm{LN}(W_h \cdot [r_t \odot h_{t-1}, x_t]))$
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$
where LN denotes the layer normalization operation and FFN denotes the Transformer feedforward block, both defined below;
The sigmoid function has a saturation region when x is far from 0:
$\sigma(x) = \frac{1}{1 + e^{-x}}$
after the fully connected layer, information is lost when it passes through the sigmoid function; a layer normalization operation is therefore introduced, normalization is performed over the embedding representation dimension, and the sigmoid function is then applied, which effectively retains the text representation information;
when the input x of tanh(x) in the candidate hidden state is far from 0 and reaches the saturation region, the output tends toward a stable value and much semantic information is lost;
the hyperbolic tangent function tanh in the GRU is replaced by the layer normalization operation; the layer normalization operation is as follows, with normalization performed over the semantic representation dimension d_model: if the current word representation matrix is T, its size is [1, d_model], and the values along the second dimension are, in order:
$T = [t_1, t_2, \dots, t_{d\_model}]$
the mean is:
$\mu = \frac{1}{d\_model} \sum_{i=1}^{d\_model} t_i$
the variance is:
$\sigma^2 = \frac{1}{d\_model} \sum_{i=1}^{d\_model} (t_i - \mu)^2$
each element after layer normalization is:
$\hat{t}_i = \frac{t_i - \mu}{\sqrt{\sigma^2 + \epsilon}}$
the layer normalization operation only translates and rescales the data without losing semantic information; at the same time, it normalizes the embedding representation dimension to be close to 0, which makes model training more stable; the feedforward neural network layer of the Transformer is migrated into the NGU model, where the feedforward network consists of two linear transformations and one nonlinear transformation (the GeLU activation function), followed by a residual connection and a layer normalization operation:
$\mathrm{FFN}(x) = \mathrm{LayerNorm}(x + W_2 \, \mathrm{GeLU}(W_1 x + b_1) + b_2)$
the word embedding dimension is 256 and the hidden layer dimension of the feedforward network is 2048; for the model input of size [128, 1000] from step S3, each word in the text data is first embedded through token_embedding, and the embedded representation of each of the 1000 word positions has shape [128, 256], where 128 is the batch size in training and 256 is the word embedding dimension; each word embedding of the text data X is then fed into the NGU loop iteration unit in turn, and the hidden states of all time steps $h_1, h_2, \dots, h_{1000}$ are spliced together into a tensor of dimensionality [128, 1000, 256]; the extraction of text features is completed by the NGU language model, and each word of each sentence in a batch is represented as a 256-dimensional vector;
S5, applying the NGU language model: the proposed NGU language model is a non-pre-trained language model and its parameters are trained on the specific natural language processing task; the text representation tensor obtained in S4 has size (128, 1000, 256), and this tensor is fed into the subsequent neural network for text classification, relation extraction, text generation or entity recognition to be trained.
CN202211606356.0A | 2022-12-15 | 2022-12-15 | Text feature extraction method based on NGU language model | Pending | CN115730587A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211606356.0A | 2022-12-15 | 2022-12-15 | Text feature extraction method based on NGU language model

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202211606356.0A | 2022-12-15 | 2022-12-15 | Text feature extraction method based on NGU language model

Publications (1)

Publication Number | Publication Date
CN115730587A (en) | 2023-03-03

Family

ID=85301365

Family Applications (1)

Application Number | Publication | Status | Priority Date | Filing Date
CN202211606356.0A | CN115730587A (en) | Pending | 2022-12-15 | 2022-12-15

Country Status (1)

Country | Link
CN (1) | CN115730587A (en)

Cited By (2)

* Cited by examiner, † Cited by third party

Publication Number | Priority Date | Publication Date | Assignee | Title
CN117077162A (en)* | 2023-07-31 | 2023-11-17 | Shanghai Jiao Tong University | Privacy reasoning method, system, medium and electronic equipment based on Transformer network model
CN117077162B (en)* | 2023-07-31 | 2024-04-19 | Shanghai Jiao Tong University | Privacy reasoning method, system, medium and electronic device based on Transformer network model

Similar Documents

Publication | Title
CN109597891B (en) | Text emotion analysis method based on bidirectional long- and short-term memory neural network
CN109918644B (en) | A named entity recognition method for TCM health consultation text based on transfer learning
Gallant et al. | Representing objects, relations, and sequences
CN110569033B (en) | Method for generating basic codes of digital transaction type intelligent contracts
CN109840322B (en) | Complete shape filling type reading understanding analysis model and method based on reinforcement learning
CN117316466B (en) | Clinical decision method, system and equipment based on knowledge graph and natural language processing technology
CN108920445A (en) | A kind of name entity recognition method and device based on Bi-LSTM-CRF model
CN111046661B (en) | Reading understanding method based on graph convolution network
CN109670168B (en) | Short answer automatic scoring method, system and storage medium based on feature learning
CN106951858A (en) | A method and device for identifying human relationship based on deep convolutional network
CN117423108A (en) | Image fine granularity description method and system for instruction fine adjustment multi-mode large model
CN112487820A (en) | Chinese medical named entity recognition method
CN111897944A (en) | Knowledge Graph Question Answering System Based on Semantic Space Sharing
CN109815478A (en) | Recognition method and system of medicinal chemical entities based on convolutional neural network
CN116341546A (en) | Medical natural language processing method based on pre-training model
CN114297399B (en) | Knowledge graph generation method, system, storage medium and electronic device
CN111914555A (en) | Automatic relation extraction system based on Transformer structure
CN113674866B (en) | Pre-training method for medical text
CN117708339B (en) | ICD automatic coding method based on pre-training language model
CN112148891A (en) | Knowledge graph completion method based on graph perception tensor decomposition
CN117852523A (en) | A cross-domain small sample relationship extraction method and device for learning discriminative semantics and multi-view context
Ji | Combining knowledge with data for efficient and generalizable visual learning
CN115730587A (en) | Text feature extraction method based on NGU language model
Guo et al. | Efficient agricultural question classification with a BERT-enhanced DPCNN model
CN116543289A (en) | An Image Description Method Based on Encoder-Decoder and Bi-LSTM Attention Model

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
