# LangML

A Keras-based and TensorFlow-backend NLP Models Toolkit.
LangML (LanguageModeL) is a Keras-based and TensorFlow-backend language model toolkit. It provides mainstream pre-trained language models, e.g., BERT, RoBERTa, and ALBERT, together with their downstream application models.

## Features
- Common and widely used Keras layers: CRF, Transformer, and attention layers such as Additive, ScaledDot, MultiHead, and GatedAttentionUnit.
- Pre-trained language models: BERT, RoBERTa, and ALBERT, with friendly interfaces that make it easy to implement downstream singleton, shared/unshared two-tower, or multi-tower models.
- Tokenizers: WPTokenizer (WordPiece) and SPTokenizer (SentencePiece).
- Baseline models: text classification, named entity recognition, and contrastive learning. There is no need to write any code; just preprocess the data into a specific format and use `langml-cli` to train various baseline models.
- Prompt-Based Tuning: PTuning
## Installation

You can install or upgrade langml/langml-cli via the following command:
```bash
pip install -U langml
```
LangML runs on either pure Keras or TensorFlow Keras; select the backend via the `TF_KERAS` environment variable (a Python alternative is sketched after this list):

- Use pure Keras (the default setting):

  ```bash
  export TF_KERAS=0
  ```

- Use TensorFlow Keras:

  ```bash
  export TF_KERAS=1
  ```
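The same switch can be flipped from Python, as long as it happens before langml is imported; a minimal sketch, assuming `TF_KERAS` is read at import time (as the export commands above imply):

```python
import os

# Choose the TensorFlow Keras backend before importing langml,
# since the TF_KERAS flag is assumed to be read at import time.
os.environ['TF_KERAS'] = '1'

from langml import keras, L  # noqa: E402
```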
## Quick Start

Load pre-trained language models and tokenizers:

```python
from langml import WPTokenizer, SPTokenizer
from langml import load_bert, load_albert

# load the BERT / RoBERTa pre-trained language model
bert_model, bert = load_bert(config_path, checkpoint_path)
# load the ALBERT pre-trained language model
albert_model, albert = load_albert(config_path, checkpoint_path)
# load the WordPiece tokenizer
wp_tokenizer = WPTokenizer(vocab_path, lowercase)
# load the SentencePiece tokenizer
sp_tokenizer = SPTokenizer(vocab_path, lowercase)
```
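Once a tokenizer is loaded, it encodes raw text into model inputs. A hypothetical sketch: the `encode` call below is an assumption rather than confirmed API, so check the tokenizer documentation for the exact method name and return fields:

```python
# Hypothetical usage sketch: encode() and its return value are assumptions,
# not confirmed API; consult the langml tokenizer docs for the exact interface.
encoded = wp_tokenizer.encode('LangML makes language models easy')
print(encoded)
```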
Fine-tune a downstream task, e.g., a two-class classifier on the CLS representation:

```python
from langml import keras, L
from langml import load_bert

config_path = '/path/to/bert_config.json'
ckpt_path = '/path/to/bert_model.ckpt'
vocab_path = '/path/to/vocab.txt'

bert_model, bert_instance = load_bert(config_path, ckpt_path)
# get the CLS representation
cls_output = L.Lambda(lambda x: x[:, 0])(bert_model.output)
output = L.Dense(2, activation='softmax',
                 kernel_initializer=bert_instance.initializer)(cls_output)
train_model = keras.Model(bert_model.input, output)
train_model.summary()
train_model.compile(loss='categorical_crossentropy',
                    optimizer=keras.optimizers.Adam(1e-5))
```
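To smoke-test the classifier end to end, you can fit it on random toy arrays. A minimal sketch, assuming the standard two-input BERT layout (token ids plus segment ids); the shapes below are placeholders, and real training would use tokenizer outputs instead:

```python
import numpy as np

n_samples, seq_len = 32, 128
# toy inputs: random token ids and all-zero segment ids (assumed two-input layout)
token_ids = np.random.randint(0, 100, size=(n_samples, seq_len))
segment_ids = np.zeros((n_samples, seq_len))
# one-hot labels for the 2-class softmax head defined above
labels = keras.utils.to_categorical(np.random.randint(0, 2, size=n_samples), num_classes=2)

train_model.fit([token_ids, segment_ids], labels, batch_size=8, epochs=1)
```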
## Baselines

Use `langml-cli` to train baseline models from the command line:

- Text Classification
  ```bash
  $ langml-cli baseline clf --help
  Usage: langml baseline clf [OPTIONS] COMMAND [ARGS]...

    classification command line tools

  Options:
    --help  Show this message and exit.

  Commands:
    bert
    bilstm
    textcnn
  ```
- Named Entity Recognition
  ```bash
  $ langml-cli baseline ner --help
  Usage: langml baseline ner [OPTIONS] COMMAND [ARGS]...

    ner command line tools

  Options:
    --help  Show this message and exit.

  Commands:
    bert-crf
    lstm-crf
  ```
- Contrastive Learning
  ```bash
  $ langml-cli baseline contrastive --help
  Usage: langml baseline contrastive [OPTIONS] COMMAND [ARGS]...

    contrastive learning command line tools

  Options:
    --help  Show this message and exit.

  Commands:
    simcse
  ```
- Text Matching
  ```bash
  $ langml-cli baseline matching --help
  Usage: langml baseline matching [OPTIONS] COMMAND [ARGS]...

    text matching command line tools

  Options:
    --help  Show this message and exit.

  Commands:
    sbert
  ```
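Each subcommand listed under `Commands` accepts `--help` as well, so you can inspect its training options before preparing data; for example:

```bash
$ langml-cli baseline ner bert-crf --help
```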
## Documentation

Please visit [langml.readthedocs.io](https://langml.readthedocs.io) for the latest documentation.
## Acknowledgements

The implementation of the pre-trained language models is inspired by [CyberZHG/keras-bert](https://github.com/CyberZHG/keras-bert) and [bojone/bert4keras](https://github.com/bojone/bert4keras).