4AI/langml

A Keras-based and TensorFlow-backend NLP Models Toolkit.

LangML (LanguageModeL) is a Keras-based, TensorFlow-backed language model toolkit that provides mainstream pre-trained language models, e.g. BERT/RoBERTa/ALBERT, along with their downstream application models.


Outline

  • Features
  • Installation
  • Quick Start
  • Documentation
  • Reference

Features

  • Common and widely used Keras layers: CRF, Transformer, and attention layers (Additive, ScaledDot, MultiHead, GatedAttentionUnit), among others.
  • Pretrained language models: BERT, RoBERTa, and ALBERT, with friendly interfaces that make it easy to build downstream singleton, shared/unshared two-tower, or multi-tower models.
  • Tokenizers: WPTokenizer (wordpiece) and SPTokenizer (sentencepiece).
  • Baseline models: text classification, named entity recognition, and contrastive learning. No code is required: just preprocess the data into the expected format and use langml-cli to train the various baseline models.
  • Prompt-based tuning: PTuning.
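The CRF layer listed above is typically paired with Viterbi decoding at inference time. As background, here is a minimal plain-NumPy sketch of that decoding step (illustrative only, not langml's CRF implementation):

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag path for one sequence.

    emissions: [seq_len, num_tags] per-position tag scores
    transitions: [num_tags, num_tags] score of moving from tag i to tag j
    """
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()
    history = []
    for t in range(1, seq_len):
        # score[i] + transitions[i, j] + emissions[t, j] for every (i, j) pair
        broadcast = score[:, None] + transitions + emissions[t][None, :]
        history.append(broadcast.argmax(axis=0))  # best previous tag per current tag
        score = broadcast.max(axis=0)
    # backtrack from the best final tag
    path = [int(score.argmax())]
    for back in reversed(history):
        path.append(int(back[path[-1]]))
    return path[::-1]
```

With zero transition scores this reduces to a per-position argmax; non-zero transitions let the layer penalize invalid tag sequences, which is the point of a CRF head in NER.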

Installation

You can install or upgrade langml/langml-cli via the following command:

pip install -U langml

Quick Start

Specify the Keras variant

  1. Use pure Keras (the default setting):

export TF_KERAS=0

  2. Use TensorFlow Keras:

export TF_KERAS=1
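Equivalently, the variant can be selected from within Python, provided the environment variable is set before langml is imported (an assumption consistent with the exports above, which imply langml reads TF_KERAS at import time):

```python
import os

# Must run before `import langml`: '1' selects tf.keras, '0' selects pure Keras.
os.environ['TF_KERAS'] = '1'
```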

Load pretrained language models

from langml import WPTokenizer, SPTokenizer
from langml import load_bert, load_albert

# load a BERT / RoBERTa PLM
bert_model, bert = load_bert(config_path, checkpoint_path)
# load an ALBERT PLM
albert_model, albert = load_albert(config_path, checkpoint_path)
# load a wordpiece tokenizer
wp_tokenizer = WPTokenizer(vocab_path, lowercase)
# load a sentencepiece tokenizer
sp_tokenizer = SPTokenizer(vocab_path, lowercase)
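As background on the two tokenizer flavors: WordPiece splits out-of-vocabulary words into sub-tokens by greedy longest-prefix matching against the vocabulary. A toy sketch of that matching rule (illustrative only, not the internals of langml's WPTokenizer):

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first WordPiece split, BERT-style."""
    tokens, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = '##' + piece  # continuation pieces carry a '##' prefix
            if piece in vocab:
                cur = piece
                break
            end -= 1  # shrink the candidate until it is in the vocabulary
        if cur is None:
            return ['[UNK]']  # no prefix matched at all
        tokens.append(cur)
        start = end
    return tokens

vocab = {'lang', '##ml', '##model'}
print(wordpiece_tokenize('langml', vocab))  # ['lang', '##ml']
```

SentencePiece (SPTokenizer) instead learns sub-word units directly from raw text, so it needs no pre-tokenized vocabulary of this form.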

Finetune a model

from langml import keras, L
from langml import load_bert

config_path = '/path/to/bert_config.json'
ckpt_path = '/path/to/bert_model.ckpt'
vocab_path = '/path/to/vocab.txt'

bert_model, bert_instance = load_bert(config_path, ckpt_path)
# get the CLS representation
cls_output = L.Lambda(lambda x: x[:, 0])(bert_model.output)
output = L.Dense(2,
                 activation='softmax',
                 kernel_initializer=bert_instance.initializer)(cls_output)
train_model = keras.Model(bert_model.input, output)
train_model.summary()
train_model.compile(loss='categorical_crossentropy',
                    optimizer=keras.optimizers.Adam(1e-5))
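The `L.Lambda(lambda x: x[:, 0])` step simply slices the first (CLS) position out of BERT's `[batch, seq_len, hidden]` sequence output. A plain-NumPy illustration of that slice, with made-up shapes:

```python
import numpy as np

# BERT-style sequence output: [batch, seq_len, hidden]
batch, seq_len, hidden = 2, 8, 4
sequence_output = np.arange(batch * seq_len * hidden,
                            dtype=np.float32).reshape(batch, seq_len, hidden)

# same slice as the L.Lambda above: keep position 0 of every sequence
cls_vectors = sequence_output[:, 0]
print(cls_vectors.shape)  # (2, 4): one hidden-size vector per example
```

The classifier head then only sees one `hidden`-sized vector per example, which is why a single `Dense` layer suffices on top.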

Use langml-cli to train baseline models

  1. Text Classification

$ langml-cli baseline clf --help
Usage: langml baseline clf [OPTIONS] COMMAND [ARGS]...

  classification command line tools

Options:
  --help  Show this message and exit.

Commands:
  bert
  bilstm
  textcnn

  2. Named Entity Recognition

$ langml-cli baseline ner --help
Usage: langml baseline ner [OPTIONS] COMMAND [ARGS]...

  ner command line tools

Options:
  --help  Show this message and exit.

Commands:
  bert-crf
  lstm-crf

  3. Contrastive Learning

$ langml-cli baseline contrastive --help
Usage: langml baseline contrastive [OPTIONS] COMMAND [ARGS]...

  contrastive learning command line tools

Options:
  --help  Show this message and exit.

Commands:
  simcse

  4. Text Matching

$ langml-cli baseline matching --help
Usage: langml baseline matching [OPTIONS] COMMAND [ARGS]...

  text matching command line tools

Options:
  --help  Show this message and exit.

Commands:
  sbert
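The simcse baseline above trains sentence embeddings with an in-batch contrastive objective: each sentence should be closest to its own second view and far from the other sentences in the batch. A toy NumPy sketch of that loss (illustrative only, not langml's implementation):

```python
import numpy as np

def in_batch_contrastive_loss(z1, z2, temperature=0.05):
    """SimCSE-style loss over two views z1, z2 of the same batch.

    Row i of z1 is treated as a positive pair with row i of z2;
    every other row in the batch acts as a negative.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sims = z1 @ z2.T / temperature  # [batch, batch] cosine similarities
    # cross-entropy with the diagonal (matching pairs) as the target class
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

When the two views of each sentence coincide and differ from the rest of the batch, the loss approaches zero; shuffling the pairing drives it up, which is exactly the signal the baseline trains on.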

Documentation

Please visit langml.readthedocs.io for the latest documentation.

Reference

The implementation of the pretrained language models is inspired by CyberZHG/keras-bert and bojone/bert4keras.

