
A language model pre-training method based on coreference resolution

Info

Publication number
CN113886591B
Authority
CN
China
Prior art keywords
training
word
data
mask
phrase
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111237852.9A
Other languages
Chinese (zh)
Other versions
CN113886591A (en)
Inventor
侯良学
王冠
杨根科
褚健
王宏武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Institute Of Artificial Intelligence Shanghai Jiaotong University
Original Assignee
Ningbo Institute Of Artificial Intelligence Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Institute Of Artificial Intelligence Shanghai Jiaotong University
Priority to CN202111237852.9A
Publication of CN113886591A
Application granted
Publication of CN113886591B
Legal status: Active (current)
Anticipated expiration


Abstract

Translated from Chinese


The present invention discloses a language model pre-training method based on coreference resolution, relating to the technical field of natural language processing and comprising the following steps: S100, data preprocessing: pronouns in a corpus are extracted by string matching, and named entities and noun phrases in the corpus are extracted by a processing tool, to serve as the masking candidate sets of the training data generation stage; S200, training data generation: masking is performed in mask_word mode and mask_phrase mode to generate mask_word training data and mask_phrase training data, respectively; S300, pre-training: the word_learning mode or the phrase_learning mode is selected adaptively for training according to a training mode selection factor α_t. The present invention adds semantic training on pronouns, phrases, and entities and adaptively switches learning modes, which enhances the semantic representation ability of the model and makes it better suited to coreference resolution tasks.

Description

Language model pre-training method based on coreference resolution
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a language model pre-training method based on coreference resolution.
Background
The task of coreference resolution is to group the expressions in a text (including pronouns, named entities, noun phrases, etc.) that refer to the same entity. Current state-of-the-art end-to-end neural coreference resolution models take word vectors as input, build span representations with an attention module, and then score span pairs for coreference, thereby realizing coreference resolution. Coreference resolution requires reasoning over context information and world knowledge; that is, an advanced language model is needed to obtain semantically richer word vector representations.
BERT (Bidirectional Encoder Representations from Transformers) is a language model built on the Transformer framework that is pre-trained on massive text corpora by randomly masking words (for English corpora, a word here means a single English word) and predicting the masked words; however, this pre-training scheme has some drawbacks. For example, in the sentence "Harry Potter is a wonderful work of magic literature", if only "Harry" is masked, it can easily be predicted from "Potter", so the word vector of "Harry Potter" learned by the model does not capture information such as "magic literature", i.e., the contextual information is not rich enough. In the field of coreference resolution in particular, language representations with richer semantic information are needed to capture the relationships between entities. Moreover, in the coreference resolution task, pronoun resolution has a high error rate because pronouns carry weak semantics, the BERT pre-training method masks pronouns with low probability, and resolving pronouns requires more external knowledge, so the model's learning of pronouns needs to be strengthened.
SpanBERT proposes a pre-training method that randomly masks contiguous spans for span-level tasks such as question answering and named entity recognition, with span lengths drawn from a geometric distribution L ~ Geo(0.2).
Baidu's ERNIE model pre-trains on Chinese with a three-stage masking mechanism: basic-level, phrase-level, and entity-level. The granularity progresses from single words to phrases to entities, implicitly injecting phrase and entity knowledge and greatly improving the representation capability of the language model. However, in actual use this ERNIE pre-training approach causes basic-level knowledge to be forgotten during the entity-level training stage, which degrades the model's word representations.
Accordingly, those skilled in the art are working to develop a language model pre-training method based on coreference resolution.
Disclosure of Invention
In view of the above drawbacks of the prior art, the technical problem the present invention aims to solve is how to provide word vectors with richer semantic information through a language model pre-training method for English coreference resolution, thereby improving the prediction accuracy of coreference resolution.
Humans generally learn a language by first learning basic words, then phrases, and finally applying them to sentence-level and discourse-level tasks. However, because the knowledge of a neural network language model is stored in its network weights, training words first and then phrases may cause word-granularity information to be forgotten. The inventors therefore propose to adaptively train word blocks of different granularities according to the current loss and, to address the low accuracy of pronoun resolution in coreference resolution, to strengthen the language model's training on pronouns. In the training stage, the first 20% of the steps use the word_learning mode to learn word-level information; in the remaining 80% of the steps, the word_learning mode (training words) or the phrase_learning mode (training phrases) is selected adaptively according to the loss, and the two modes use different loss functions.
In one embodiment of the invention, a language model pre-training method based on coreference resolution is provided, comprising:
S100, data preprocessing: extracting pronouns in the corpus by string matching, and extracting named entities, noun phrases, and the like in the corpus with a processing tool, to serve as the masking candidate sets of the training data generation stage;
S200, training data generation: performing masking in mask_word mode (word masking) and mask_phrase mode (phrase masking) to generate mask_word training data and mask_phrase training data, respectively;
S300, pre-training: adaptively switching between the word_learning mode and the phrase_learning mode for training according to a training mode selection factor α_t.
Optionally, in the coreference-resolution-based language model pre-training method of the above embodiment, step S100 includes:
S110, acquiring English Wikipedia data;
S120, extracting pronouns in the corpus, and establishing a pronoun set PronounSet;
S130, extracting all named entities in the corpus, and building an entity set EntitySet;
S140, extracting noun phrases in the corpus and removing phrases that overlap with EntitySet, to obtain a noun phrase set NounPhraseSet.
Further, in the coreference-resolution-based language model pre-training method of the above embodiment, the processing tool in step S100 is the entity recognition module of the Python natural language processing toolkit Spacy.
Further, in the coreference-resolution-based language model pre-training method of the above embodiment, step S140 uses the noun phrase extraction module in Spacy.
Optionally, in the coreference-resolution-based language model pre-training method of any of the above embodiments, step S200 includes:
S210, copying the data into two copies, named data one and data two;
S220, creating training instances for the text in data one and data two according to BERT's training data generation procedure, each instance comprising several sentences;
S230, masking the instances created from data one and data two in the mask_word mode (word masking) and the mask_phrase mode (phrase masking), respectively;
S240, generating the mask_word training data: randomly selecting 15% of the words in the sentences of the instances created from data one and putting them into CandidateSet1 (the word masking candidate set); each word in CandidateSet1 is replaced with "[MASK]" with 80% probability, replaced with another random word with 10% probability, and left unchanged with 10% probability;
S250, generating the mask_phrase training data: randomly selecting named entities and noun phrases from the sentences of the instances created from data two and adding them to CandidateSet2 (the phrase masking candidate set); each word block in CandidateSet2 is replaced with "[MASK]" with 80% probability, replaced with random words with 10% probability, and left unchanged with 10% probability, and all words within a word block are treated consistently, i.e., either all replaced or all left unchanged.
Further, in the coreference-resolution-based language model pre-training method of the above embodiment, in step S220 the sentence length is limited to 128 English words: sentences shorter than 128 words are padded to 128, and sentences longer than 128 words are truncated.
Further, in the coreference-resolution-based language model pre-training method of the above embodiment, pronouns account for about one third of CandidateSet1 in step S240 and other words for the remaining two thirds; when the total number of pronouns is less than one third, ordinary words are used instead.
Further, in the coreference-resolution-based language model pre-training method of the above embodiment, the named entities and noun phrases selected into CandidateSet2 in step S250 account for 15% of the sentence length, with named entities and noun phrases each accounting for 50%.
Optionally, in the coreference-resolution-based language model pre-training method of any of the above embodiments, step S300 includes: in the word_learning mode, the mask_word training data are input into the BERT network to predict the masked words and compute the corresponding loss; in the phrase_learning mode, the mask_phrase training data are input into the BERT network to predict the masked phrases and compute the corresponding loss.
Further, in the coreference-resolution-based language model pre-training method of any of the above embodiments, step S300 includes:
S310, warm-up training: basic words are learned first; the first 20% of the training steps use the word_learning mode for warm-up, and the initial word_learning prediction loss and the initial phrase_learning prediction loss are saved;
S320, adaptive training: in the remaining 80% of the training steps, whether step t+1 adopts the word_learning or the phrase_learning mode is decided according to the selection factor α_t, computed from the word_learning and phrase_learning losses at training step t; when α_t > 0, step t+1 adopts the word_learning mode, otherwise the phrase_learning mode is used to continue training.
The invention adds semantic training on pronouns, phrases, and entities and adaptively switches learning modes, which enhances the semantic representation capability of the model and makes it better suited to coreference resolution tasks.
The conception, specific structure, and technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, features, and effects of the present invention.
Drawings
FIG. 1 is a flow diagram illustrating a coreference-resolution-based language model pre-training method according to an exemplary embodiment.
Detailed Description
The following description of the preferred embodiments of the present invention refers to the accompanying drawings, which make the technical contents thereof more clear and easy to understand. The present invention may be embodied in many different forms of embodiments and the scope of the present invention is not limited to only the embodiments described herein.
In the drawings, like structural elements are referred to by like reference numerals and components having similar structure or function are referred to by like reference numerals. The dimensions and thickness of each component shown in the drawings are arbitrarily shown, and the present invention is not limited to the dimensions and thickness of each component. The thickness of the components is schematically and appropriately exaggerated in some places in the drawings for clarity of illustration.
The inventors designed a language model pre-training method based on coreference resolution, as shown in FIG. 1, comprising the following steps:
S100, data preprocessing: extract pronouns in the corpus by string matching, and extract named entities, noun phrases, and the like in the corpus with a processing tool, to serve as the masking candidate sets of the training data generation stage, where the processing tool is the entity recognition module of the Python natural language processing toolkit Spacy; this step specifically comprises the following sub-steps (an illustrative extraction sketch follows the list):
S110, acquiring English Wikipedia data;
S120, extracting pronouns in the corpus, and establishing a pronoun set PronounSet;
S130, extracting all named entities in the corpus, and building an entity set EntitySet;
S140, extracting noun phrases in the corpus with the noun phrase extraction module of Spacy, and removing phrases that overlap with EntitySet, to obtain a noun phrase set NounPhraseSet.
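The following is a minimal sketch of this preprocessing step using the Spacy toolkit named above. It assumes the English pipeline "en_core_web_sm" is installed; the pronoun list and the function and variable names (build_candidate_sets, PRONOUNS) are illustrative assumptions rather than details taken from the patent.

import spacy

# assumed (partial) pronoun list used for string matching (S120)
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them",
            "his", "hers", "its", "their", "theirs"}

nlp = spacy.load("en_core_web_sm")

def build_candidate_sets(corpus_lines):
    pronoun_set, entity_set, noun_phrase_set = set(), set(), set()
    for doc in nlp.pipe(corpus_lines):
        # S120: pronouns found by string matching against the pronoun list
        for token in doc:
            if token.text.lower() in PRONOUNS:
                pronoun_set.add(token.text.lower())
        # S130: named entities from the entity recognition module
        for ent in doc.ents:
            entity_set.add(ent.text)
        # S140: noun phrases, excluding those that coincide with entities
        for chunk in doc.noun_chunks:
            if chunk.text not in entity_set:
                noun_phrase_set.add(chunk.text)
    return pronoun_set, entity_set, noun_phrase_set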
S200, training data generation: masking is performed in mask_word mode (word masking) and mask_phrase mode (phrase masking) to generate mask_word training data and mask_phrase training data, respectively, specifically comprising the following sub-steps (an illustrative masking sketch follows the list):
S210, copying the data into two copies, named data one and data two;
S220, creating training instances for the text in data one and data two according to BERT's training data generation procedure, each instance comprising several sentences; the sentence length is limited to 128 English words, with shorter sentences padded to 128 words and longer sentences truncated;
S230, masking the instances created from data one and data two in the mask_word mode (word masking) and the mask_phrase mode (phrase masking), respectively;
S240, generating the mask_word training data: randomly select 15% of the words in the sentences of the instances created from data one and put them into CandidateSet1 (the word masking candidate set), in which pronouns account for about one third and other words for the remaining two thirds (when pronouns make up less than one third, ordinary words are used instead); each word in CandidateSet1 is replaced with "[MASK]" with 80% probability, replaced with another random word with 10% probability, and left unchanged with 10% probability;
S250, generating the mask_phrase training data: randomly select named entities and noun phrases from the sentences of the instances created from data two and add them to CandidateSet2 (the phrase masking candidate set); the selected named entities and noun phrases account for 15% of the sentence length, with named entities and noun phrases each accounting for 50%; each word block in CandidateSet2 is replaced with "[MASK]" with 80% probability, replaced with random words with 10% probability, and left unchanged with 10% probability, and all words within a word block are treated consistently, i.e., either all replaced or all left unchanged.
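The masking procedure of S240 and S250 can be sketched as follows. This is a simplified illustration that assumes whitespace tokenization, an in-memory vocabulary list, and precomputed (start, end) phrase spans; the helper names mask_word_example and mask_phrase_example are assumptions, not identifiers from the patent. The 15% selection ratio, the roughly one-third pronoun share, the 80/10/10 replacement scheme, and the block-consistent replacement follow the description above.

import random

def mask_word_example(tokens, pronoun_set, vocab, mask_ratio=0.15):
    """mask_word mode (S240): mask single words, roughly one third of them pronouns."""
    n_mask = max(1, int(len(tokens) * mask_ratio))
    pron_positions = [i for i, tok in enumerate(tokens) if tok.lower() in pronoun_set]
    other_positions = [i for i in range(len(tokens)) if i not in set(pron_positions)]
    n_pron = min(len(pron_positions), n_mask // 3)
    candidate_set1 = random.sample(pron_positions, n_pron) + \
        random.sample(other_positions, min(len(other_positions), n_mask - n_pron))
    masked, labels = list(tokens), {}
    for i in candidate_set1:
        labels[i] = tokens[i]
        r = random.random()
        if r < 0.8:
            masked[i] = "[MASK]"               # 80%: replace with [MASK]
        elif r < 0.9:
            masked[i] = random.choice(vocab)   # 10%: replace with a random word
        # remaining 10%: keep the original word
    return masked, labels

def mask_phrase_example(tokens, phrase_spans, vocab, mask_ratio=0.15):
    """mask_phrase mode (S250): mask whole entity / noun-phrase word blocks,
    treating all words inside a block consistently."""
    budget = int(len(tokens) * mask_ratio)     # selected blocks cover ~15% of the length
    spans = list(phrase_spans)                 # (start, end) indices of each block
    random.shuffle(spans)
    masked, labels, used = list(tokens), {}, 0
    for start, end in spans:
        if used + (end - start) > budget:
            continue
        used += end - start
        r = random.random()                    # one draw per block => consistent treatment
        for i in range(start, end):
            labels[i] = tokens[i]
            if r < 0.8:
                masked[i] = "[MASK]"
            elif r < 0.9:
                masked[i] = random.choice(vocab)
            # else: leave the whole block unchanged
    return masked, labels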
S300, pre-training: adaptively switch between the word_learning mode and the phrase_learning mode according to a training mode selection factor α_t; in the word_learning mode, the mask_word training data are input into the BERT network to predict the masked words and compute the corresponding loss, and in the phrase_learning mode, the mask_phrase training data are input into the BERT network to predict the masked phrases and compute the corresponding loss; this step specifically comprises the following sub-steps (an illustrative training-loop sketch follows the list):
S310, warm-up training: basic words are learned first; the first 20% of the training steps use the word_learning mode for warm-up, and the initial word_learning prediction loss and the initial phrase_learning prediction loss are saved;
S320, adaptive training: in the remaining 80% of the training steps, whether step t+1 adopts the word_learning or the phrase_learning mode is decided according to the selection factor α_t, computed from the word_learning and phrase_learning losses at training step t; when α_t > 0, step t+1 adopts the word_learning mode, otherwise the phrase_learning mode is used to continue training.
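A minimal sketch of the pre-training schedule of S310 and S320 is given below. The patent defines the selection factor α_t by a formula that is not reproduced in this text, so the normalized-loss difference used here is only an assumed stand-in consistent with the stated rule that α_t > 0 selects the word_learning mode; the model.train_step / model.eval_step interface and the batch iterators are likewise illustrative.

def pretrain(model, word_batches, phrase_batches, total_steps):
    """Warm up in word_learning for the first 20% of steps, then switch
    adaptively between word_learning and phrase_learning per step."""
    warmup_steps = int(0.2 * total_steps)
    loss_word_0 = loss_phrase_0 = None   # initial losses saved at the end of warm-up
    loss_word_t = loss_phrase_t = None   # most recent loss observed in each mode
    mode = "word_learning"

    for t in range(total_steps):
        if t >= warmup_steps and loss_word_0 is not None:
            # Assumed surrogate for alpha_t: difference of the two losses, each
            # normalized by its initial value (the exact formula is given as an
            # equation in the original patent and is not reproduced here).
            alpha_t = loss_word_t / loss_word_0 - loss_phrase_t / loss_phrase_0
            mode = "word_learning" if alpha_t > 0 else "phrase_learning"

        if mode == "word_learning":
            loss_word_t = model.train_step(next(word_batches))      # mask_word data
        else:
            loss_phrase_t = model.train_step(next(phrase_batches))  # mask_phrase data

        if t == warmup_steps - 1:
            # save the initial word_learning and phrase_learning prediction losses
            loss_word_0 = loss_word_t
            loss_phrase_0 = loss_phrase_t = model.eval_step(next(phrase_batches))
    return model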
The foregoing describes in detail preferred embodiments of the present invention. It should be understood that numerous modifications and variations can be made in accordance with the concepts of the invention by one of ordinary skill in the art without undue burden. Therefore, all technical solutions which can be obtained by logic analysis, reasoning or limited experiments based on the prior art by the person skilled in the art according to the inventive concept shall be within the scope of protection defined by the claims.

Claims (7)

Translated from Chinese
1. A language model pre-training method based on coreference resolution, characterized by comprising the following steps:
S100, data preprocessing: extracting pronouns in a corpus by string matching, and extracting named entities and noun phrases in the corpus with a processing tool, to serve as the masking candidate sets of the training data generation stage;
S200, training data generation: performing masking in mask_word mode and mask_phrase mode to generate mask_word training data and mask_phrase training data respectively, specifically comprising:
S210, copying the data into two copies, named data one and data two;
S220, creating training instances for the text in data one and data two according to BERT's training data generation procedure, each instance comprising several sentences;
S230, masking the instances created from data one and data two in the mask_word mode and the mask_phrase mode, respectively;
S240, generating the mask_word training data: randomly selecting 15% of the words in the sentences of the instances created from data one and putting them into CandidateSet1; replacing each word in CandidateSet1 with "[MASK]" with 80% probability, with another random word with 10% probability, and leaving it unchanged with 10% probability;
S250, generating the mask_phrase training data: randomly selecting named entities and noun phrases from the sentences of the instances created from data two and adding them to CandidateSet2; replacing each word block in CandidateSet2 with "[MASK]" with 80% probability, with random words with 10% probability, and leaving it unchanged with 10% probability, the replacement being consistent for all words in a word block, i.e., either all replaced or all left unchanged;
S300, pre-training: adaptively switching between the word_learning mode and the phrase_learning mode for training according to a training mode selection factor α_t, specifically comprising:
S310, warm-up training: basic words are learned first; the first 20% of the training steps use the word_learning mode for warm-up, and the initial word_learning prediction loss and the initial phrase_learning prediction loss are saved;
S320, adaptive training: in the remaining 80% of the training steps, whether step t+1 adopts the word_learning or the phrase_learning mode is decided according to the selection factor α_t, as follows: when α_t > 0, step t+1 adopts the word_learning mode; otherwise, the phrase_learning mode is used to continue training.
2. The coreference-resolution-based language model pre-training method according to claim 1, wherein step S100 comprises:
S110, acquiring English Wikipedia data;
S120, extracting pronouns in the corpus to build a pronoun set PronounSet;
S130, extracting all named entities in the corpus to build an entity set EntitySet;
S140, extracting noun phrases in the corpus and removing phrases that overlap with EntitySet, to obtain a noun phrase set NounPhraseSet.
3. The coreference-resolution-based language model pre-training method according to claim 1, wherein the processing tool is the entity recognition module in the Python natural language processing toolkit Spacy.
4. The coreference-resolution-based language model pre-training method according to claim 2, wherein step S140 uses the noun phrase extraction module in Spacy.
5. The coreference-resolution-based language model pre-training method according to claim 1, wherein in step S220 the sentence length is limited to 128 English words; sentences shorter than 128 words are padded, and sentences longer than 128 words are truncated.
6. The coreference-resolution-based language model pre-training method according to claim 1, wherein the named entities and noun phrases selected into CandidateSet2 in step S250 account for 15% of the sentence length, with named entities and noun phrases each accounting for 50%.
7. The coreference-resolution-based language model pre-training method according to claim 1, wherein step S300 comprises: in the word_learning mode, the mask_word training data are input into the BERT network to predict the masked words and compute the corresponding loss; in the phrase_learning mode, the mask_phrase training data are input into the BERT network to predict the masked phrases and compute the corresponding loss.
CN202111237852.9A (priority date 2021-10-25, filing date 2021-10-25): A language model pre-training method based on coreference resolution, Active, CN113886591B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111237852.9A (CN113886591B, en) | 2021-10-25 | 2021-10-25 | A language model pre-training method based on coreference resolution

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111237852.9A (CN113886591B, en) | 2021-10-25 | 2021-10-25 | A language model pre-training method based on coreference resolution

Publications (2)

Publication Number | Publication Date
CN113886591A (en) | 2022-01-04
CN113886591B (en) | 2025-06-27

Family

ID=79013503

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111237852.9A | A language model pre-training method based on coreference resolution (CN113886591B, en, Active) | 2021-10-25 | 2021-10-25

Country Status (1)

Country | Link
CN (1) | CN113886591B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111160006A (en)* | 2019-12-06 | 2020-05-15 | 北京明略软件系统有限公司 | Method and device for realizing reference resolution
CN111428490A (en)* | 2020-01-17 | 2020-07-17 | 北京理工大学 | A Weakly Supervised Learning Method for Referential Resolution Using Language Models


Also Published As

Publication numberPublication date
CN113886591A (en)2022-01-04


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
