# LangML

A Keras-based and TensorFlow-backend NLP Models Toolkit.
LangML (LanguageModeL) is a Keras-based and TensorFlow-backend language model toolkit. It provides mainstream pre-trained language models, e.g., BERT, RoBERTa, and ALBERT, together with their downstream application models.

## Features
- Common and widely used Keras layers: CRF, Transformer, and attention layers such as Additive, ScaledDot, MultiHead, and GatedAttentionUnit.
- Pre-trained language models: BERT, RoBERTa, and ALBERT, with friendly interfaces that make it easy to implement downstream singleton, shared/unshared two-tower, or multi-tower models.
- Tokenizers: WPTokenizer (WordPiece) and SPTokenizer (SentencePiece).
- Baseline models: text classification, named entity recognition, and contrastive learning. There is no need to write any code; just preprocess the data into a specific format and use `langml-cli` to train various baseline models.
- Prompt-Based Tuning: PTuning
## Installation

You can install or upgrade langml/langml-cli via the following command:
```bash
pip install -U langml
```
LangML runs on either pure Keras or TensorFlow Keras; select the backend via the `TF_KERAS` environment variable (a Python alternative is sketched after this list):

- Use pure Keras (the default setting):

  ```bash
  export TF_KERAS=0
  ```

- Use TensorFlow Keras:

  ```bash
  export TF_KERAS=1
  ```
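The same switch can be flipped from Python, as long as it happens before langml is imported; a minimal sketch, assuming `TF_KERAS` is read at import time (as the export commands above imply):

```python
import os

# Choose the TensorFlow Keras backend before importing langml,
# since the TF_KERAS flag is assumed to be read at import time.
os.environ['TF_KERAS'] = '1'

from langml import keras, L  # noqa: E402
```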
## Quick Start

Load pre-trained language models and tokenizers:

```python
from langml import WPTokenizer, SPTokenizer
from langml import load_bert, load_albert

# load the BERT / RoBERTa pre-trained language model
bert_model, bert = load_bert(config_path, checkpoint_path)
# load the ALBERT pre-trained language model
albert_model, albert = load_albert(config_path, checkpoint_path)
# load the WordPiece tokenizer
wp_tokenizer = WPTokenizer(vocab_path, lowercase)
# load the SentencePiece tokenizer
sp_tokenizer = SPTokenizer(vocab_path, lowercase)
```
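Once a tokenizer is loaded, it encodes raw text into model inputs. A hypothetical sketch: the `encode` call below is an assumption rather than confirmed API, so check the tokenizer documentation for the exact method name and return fields:

```python
# Hypothetical usage sketch: encode() and its return value are assumptions,
# not confirmed API; consult the langml tokenizer docs for the exact interface.
encoded = wp_tokenizer.encode('LangML makes language models easy')
print(encoded)
```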
Fine-tune a downstream task, e.g., a two-class classifier on the CLS representation:

```python
from langml import keras, L
from langml import load_bert

config_path = '/path/to/bert_config.json'
ckpt_path = '/path/to/bert_model.ckpt'
vocab_path = '/path/to/vocab.txt'

bert_model, bert_instance = load_bert(config_path, ckpt_path)
# get the CLS representation
cls_output = L.Lambda(lambda x: x[:, 0])(bert_model.output)
output = L.Dense(2, activation='softmax',
                 kernel_initializer=bert_instance.initializer)(cls_output)
train_model = keras.Model(bert_model.input, output)
train_model.summary()
train_model.compile(loss='categorical_crossentropy',
                    optimizer=keras.optimizers.Adam(1e-5))
```
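To smoke-test the classifier end to end, you can fit it on random toy arrays. A minimal sketch, assuming the standard two-input BERT layout (token ids plus segment ids); the shapes below are placeholders, and real training would use tokenizer outputs instead:

```python
import numpy as np

n_samples, seq_len = 32, 128
# toy inputs: random token ids and all-zero segment ids (assumed two-input layout)
token_ids = np.random.randint(0, 100, size=(n_samples, seq_len))
segment_ids = np.zeros((n_samples, seq_len))
# one-hot labels for the 2-class softmax head defined above
labels = keras.utils.to_categorical(np.random.randint(0, 2, size=n_samples), num_classes=2)

train_model.fit([token_ids, segment_ids], labels, batch_size=8, epochs=1)
```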
## Baselines

Use `langml-cli` to train baseline models from the command line:

- Text Classification
  ```bash
  $ langml-cli baseline clf --help
  Usage: langml baseline clf [OPTIONS] COMMAND [ARGS]...

    classification command line tools

  Options:
    --help  Show this message and exit.

  Commands:
    bert
    bilstm
    textcnn
  ```
- Named Entity Recognition
  ```bash
  $ langml-cli baseline ner --help
  Usage: langml baseline ner [OPTIONS] COMMAND [ARGS]...

    ner command line tools

  Options:
    --help  Show this message and exit.

  Commands:
    bert-crf
    lstm-crf
  ```
- Contrastive Learning
  ```bash
  $ langml-cli baseline contrastive --help
  Usage: langml baseline contrastive [OPTIONS] COMMAND [ARGS]...

    contrastive learning command line tools

  Options:
    --help  Show this message and exit.

  Commands:
    simcse
  ```
- Text Matching
  ```bash
  $ langml-cli baseline matching --help
  Usage: langml baseline matching [OPTIONS] COMMAND [ARGS]...

    text matching command line tools

  Options:
    --help  Show this message and exit.

  Commands:
    sbert
  ```
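Each subcommand listed under `Commands` accepts `--help` as well, so you can inspect its training options before preparing data; for example:

```bash
$ langml-cli baseline ner bert-crf --help
```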
## Documentation

Please visit [langml.readthedocs.io](https://langml.readthedocs.io) for the latest documentation.
## Acknowledgements

The implementation of the pre-trained language models is inspired by [CyberZHG/keras-bert](https://github.com/CyberZHG/keras-bert) and [bojone/bert4keras](https://github.com/bojone/bert4keras).