# LangML
LangML (LanguageModeL) is a Keras-based and TensorFlow-backend language model toolkit, which provides mainstream pre-trained language models, e.g., BERT/RoBERTa/ALBERT, and their downstream application models.
- Common and widely-used Keras layers: CRF, Transformer, and attention variants such as Additive, ScaledDot, MultiHead, and GatedAttentionUnit.
- Pretrained Language Models: BERT, RoBERTa, ALBERT. The interfaces are friendly and make it easy to implement downstream singleton, shared/unshared two-tower, or multi-tower models (see the sketch after this list).
- Tokenizers: WPTokenizer (wordpiece), SPTokenizer (sentencepiece)
- Baseline models: Text Classification, Named Entity Recognition, Contrastive Learning. No code is required: just preprocess the data into the expected format and train various baseline models with `langml-cli`.
- Prompt-Based Tuning: PTuning
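For instance, a shared two-tower (siamese) model can be assembled by reusing one loaded encoder for both inputs. Below is a minimal sketch built on the `load_bert` interface demonstrated later in this README; the paths, CLS pooling, and cosine-similarity head are illustrative assumptions, not langml's built-in two-tower implementation.

```python
from langml import keras, L
from langml import load_bert

config_path = '/path/to/bert_config.json'
ckpt_path = '/path/to/bert_model.ckpt'

# one encoder, reused by both towers, so the weights are shared;
# an unshared variant would simply call load_bert twice
bert_model, bert = load_bert(config_path, ckpt_path)

# each tower gets its own token-id / segment-id inputs, mirroring the
# encoder's own input signature
left_inputs = [keras.layers.Input(shape=(None,)) for _ in bert_model.inputs]
right_inputs = [keras.layers.Input(shape=(None,)) for _ in bert_model.inputs]

# CLS pooling on each tower (the same Lambda layer is reused)
pool = L.Lambda(lambda x: x[:, 0])
left_vec = pool(bert_model(left_inputs))
right_vec = pool(bert_model(right_inputs))

# cosine-similarity head
similarity = keras.layers.Dot(axes=-1, normalize=True)([left_vec, right_vec])
two_tower = keras.Model(left_inputs + right_inputs, similarity)
two_tower.summary()
```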
You can install or upgrade langml/langml-cli via the following command:
```bash
pip install -U langml
```
LangML runs on either pure Keras or TensorFlow Keras; switch backends via the `TF_KERAS` environment variable:

- Use pure Keras (default setting):

  ```bash
  export TF_KERAS=0
  ```

- Use TensorFlow Keras:

  ```bash
  export TF_KERAS=1
  ```
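You can also select the backend from Python by exporting the variable before the first langml import (a minimal sketch, assuming `TF_KERAS` is read at import time):

```python
import os

# choose the TensorFlow Keras backend; this must happen before langml is
# imported (assumption: TF_KERAS is read once at import time)
os.environ['TF_KERAS'] = '1'

from langml import keras, L  # noqa: E402
```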
Load a pretrained language model and a matching tokenizer:

```python
from langml import WPTokenizer, SPTokenizer
from langml import load_bert, load_albert

# load a BERT / RoBERTa PLM
bert_model, bert = load_bert(config_path, checkpoint_path)
# load an ALBERT PLM
albert_model, albert = load_albert(config_path, checkpoint_path)

# load a wordpiece tokenizer
wp_tokenizer = WPTokenizer(vocab_path, lowercase)
# load a sentencepiece tokenizer
sp_tokenizer = SPTokenizer(vocab_path, lowercase)
```
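To turn raw text into model inputs, the tokenizer is used roughly as follows. This is a hedged sketch: the `encode` method and its `ids`/`segment_ids` fields are assumptions modeled on common tokenizer wrappers, not confirmed langml API.

```python
# hypothetical API: encode() and the ids / segment_ids fields are assumptions;
# check the langml documentation for the exact tokenizer interface
tokenized = wp_tokenizer.encode('LangML is a language model toolkit.')
token_ids, segment_ids = tokenized.ids, tokenized.segment_ids
```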
Fine-tune a pretrained model, e.g., a two-class classifier over the CLS representation:

```python
from langml import keras, L
from langml import load_bert

config_path = '/path/to/bert_config.json'
ckpt_path = '/path/to/bert_model.ckpt'
vocab_path = '/path/to/vocab.txt'

bert_model, bert_instance = load_bert(config_path, ckpt_path)
# get the CLS representation
cls_output = L.Lambda(lambda x: x[:, 0])(bert_model.output)
output = L.Dense(2,
                 activation='softmax',
                 kernel_initializer=bert_instance.initializer)(cls_output)
train_model = keras.Model(bert_model.input, output)
train_model.summary()
train_model.compile(loss='categorical_crossentropy',
                    optimizer=keras.optimizers.Adam(1e-5))
```
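To sanity-check the graph, you can fit it on a dummy batch. The shapes, vocabulary range, and input order below are illustrative assumptions; real token and segment ids would come from WPTokenizer.

```python
import numpy as np

# dummy batch: 8 sequences of length 128 (real ids would come from a tokenizer)
token_ids = np.random.randint(1, 100, size=(8, 128))
segment_ids = np.zeros_like(token_ids)
# one-hot labels for the 2-class softmax head
labels = keras.utils.to_categorical(np.random.randint(0, 2, size=8), num_classes=2)

train_model.fit([token_ids, segment_ids], labels, epochs=1, batch_size=4)
```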
- Text Classification
```bash
$ langml-cli baseline clf --help
Usage: langml baseline clf [OPTIONS] COMMAND [ARGS]...

  classification command line tools

Options:
  --help  Show this message and exit.

Commands:
  bert
  bilstm
  textcnn
```
- Named Entity Recognition
```bash
$ langml-cli baseline ner --help
Usage: langml baseline ner [OPTIONS] COMMAND [ARGS]...

  ner command line tools

Options:
  --help  Show this message and exit.

Commands:
  bert-crf
  lstm-crf
```
- Contrastive Learning
```bash
$ langml-cli baseline contrastive --help
Usage: langml baseline contrastive [OPTIONS] COMMAND [ARGS]...

  contrastive learning command line tools

Options:
  --help  Show this message and exit.

Commands:
  simcse
```
- Text Matching
```bash
$ langml-cli baseline matching --help
Usage: langml baseline matching [OPTIONS] COMMAND [ARGS]...

  text matching command line tools

Options:
  --help  Show this message and exit.

Commands:
  sbert
```
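Each baseline subcommand ships its own help text; appending `--help` to a concrete model command (names taken from the listings above) prints its training options:

```bash
$ langml-cli baseline clf bert --help
$ langml-cli baseline ner bert-crf --help
```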
Please visit [langml.readthedocs.io](https://langml.readthedocs.io) to check the latest documentation.
The implementation of the pretrained language models is inspired by [CyberZHG/keras-bert](https://github.com/CyberZHG/keras-bert) and [bojone/bert4keras](https://github.com/bojone/bert4keras).