
An Extensible Continual Learning Framework Focused on Language Models (LMs)

UIC-Liu-Lab/ContinualLM


Imagine an LM that not only effortlessly acquires new knowledge but also retains its mastery of skills, all while successfully transferring knowledge. Is it even possible?

News

🔥 We have added checkpoints in Hugging Face for easier reproduction!
🔥 We have added continual_pretrain.ipynb as a self-contained example of the soft-masking scenario. It runs well without GPUs!
🔥 Soft-masking can also work in conventional continual fine-tuning. Check out our latest EMNLP23 paper!
🔥 Wondering whether you can adapt a black-box LLM without worrying about the update of its parameters? Check out our latest paper on retrieval-augmented generation (RAG) here!

Introduction

In 2021, we introduced Pycontinual, a straightforward and flexible framework for continual learning. Our research has benefited significantly from this framework. Today, we are excited to share ContinualLM, an extensible continual learning framework focused on language models (LMs), designed to sustain the benefits of continual learning (CL) in this field.

Continual learning for LMs is distinct from traditional CL because:

  • Each task is treated as a domain-specific corpus (at present, our primary focus is on domain-adaptive pre-training, which is also known as pre-finetuning or post-training).
  • Moreover, the evaluation process involves fine-tuning the corresponding end-task.

Our repository includes a PyTorch implementation of a collection of state-of-the-art (SoTA) methods, using the same training and evaluation pipeline. This repository is committed to advancing the field of continual learning for LMs. The methods included are:

Simple Example

We have added continual_pretrain.ipynb as a self-contained example of the soft-masking scenario. It runs well without GPUs!

Dataset

When it comes to the continual learning of language models (LMs), finding appropriate datasets is crucial. The datasets we provide adhere to the following principles:

  • Domain-specific: The domain corpus must be specific enough to enhance end-task performance.
  • End-task available: We favor assessing the trained language models through the end-task rather than relying on perplexity, since the former represents a more dependable evaluation approach.

We release our dataset comprising 6 distinct domains, each accompanied by its corresponding end-task. The dataset can be found here. Below are some statistics for each domain:

| Domain Corpus | Size | End-task | Task | #Training | #Testing | #Classes |
|---|---|---|---|---|---|---|
| Yelp Restaurant | 758MB | Restaurant | Aspect Sentiment Classification (ASC) | 3,452 | 1,120 | 3 |
| Amazon Phone | 724MB | Phone | Aspect Sentiment Classification (ASC) | 239 | 553 | 2 |
| Amazon Camera | 319MB | Camera | Aspect Sentiment Classification (ASC) | 230 | 626 | 2 |
| ACL Papers | 867MB | ACL | Citation Intent Classification | 1,520 | 421 | 6 |
| AI Papers | 507MB | AI | Relation Classification | 2,260 | 2,388 | 7 |
| PubMed Papers | 989MB | PubMed | Chemical-protein Interaction Prediction | 2,667 | 7,398 | 13 |
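
If you want to script against these statistics, the table can be transcribed into a small Python structure. This is only an illustrative sketch: the `DomainStats` class and its field names are made up for this example and are not part of ContinualLM. The numbers are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainStats:
    """One row of the dataset table (hypothetical helper, not a ContinualLM class)."""
    corpus: str
    size_mb: int
    end_task: str
    n_train: int
    n_test: int
    n_classes: int


DOMAINS = [
    DomainStats("Yelp Restaurant", 758, "Restaurant", 3452, 1120, 3),
    DomainStats("Amazon Phone", 724, "Phone", 239, 553, 2),
    DomainStats("Amazon Camera", 319, "Camera", 230, 626, 2),
    DomainStats("ACL Papers", 867, "ACL", 1520, 421, 6),
    DomainStats("AI Papers", 507, "AI", 2260, 2388, 7),
    DomainStats("PubMed Papers", 989, "PubMed", 2667, 7398, 13),
]

# Total size of the six domain corpora, in MB.
total_mb = sum(d.size_mb for d in DOMAINS)

# End-task with the fewest labeled training examples.
smallest = min(DOMAINS, key=lambda d: d.n_train)

print(f"total corpus size: {total_mb} MB")
print(f"smallest training set: {smallest.end_task} ({smallest.n_train} examples)")
```

Note how small the Phone and Camera training sets are compared with the size of their domain corpora.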

Architecture

The architecture of ContinualLM largely follows that of Pycontinual, CPT and DGA.

Installation

conda create --name continuallm --file requirements.txt

⚠️ Our model is based on transformers==4.17.0 and adapter-transformers==3.0.1. We recommend using these specific versions, as using other versions may result in unexpected bugs.
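
To confirm that an environment matches these pins before launching a long run, a quick check along the following lines may help. It is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper names `installed_versions` and `find_mismatches` are hypothetical, and the distribution names are assumed to be exactly those in the pin strings above.

```python
from importlib import metadata

# Pins copied from the note above.
PINNED = {"transformers": "4.17.0", "adapter-transformers": "3.0.1"}


def installed_versions(names):
    """Return {distribution name: installed version, or None if it is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed, pinned=PINNED):
    """Return {name: (installed, expected)} for every package that deviates from its pin."""
    return {
        name: (installed.get(name), expected)
        for name, expected in pinned.items()
        if installed.get(name) != expected
    }


# Report deviations in the current environment (prints nothing if all pins match).
for name, (have, want) in find_mismatches(installed_versions(PINNED)).items():
    print(f"{name}: found {have}, expected {want}")
```

The comparison is kept separate from the metadata lookup so that it can be tested without touching the real environment.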

Domain-adaptive Pre-training

This is where continual learning happens. We learn a sequence of domains, one after another.

max_samples=640000
for idrandom in 0
do
    for pt_task in 0 1 2 3 4 5
    do
        python -m torch.distributed.launch --nproc_per_node 4 --use_env posttrain.py \
            --per_device_train_batch_size 62 \
            --fp16 \
            --max_seq_length 164 \
            --max_samples ${max_samples} \
            --idrandom ${idrandom} \
            --ntasks 6 \
            --pt_task ${pt_task} \
            --baseline 'das'
    done
done

  • --idrandom: choose the task sequence. See ./sequences for more details.
  • --baseline: see the introduction for available baseline models (see choices in config.py).
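
As a rough guide to what the launch command above implies, the arithmetic below estimates the global batch size and the number of optimizer steps per domain. It is a back-of-the-envelope sketch under two assumptions that the command itself does not confirm: that no gradient accumulation is used, and that --max_samples caps the number of training samples seen per domain in a single pass.

```python
import math

nproc_per_node = 4                 # --nproc_per_node
per_device_train_batch_size = 62   # --per_device_train_batch_size
max_samples = 640_000              # --max_samples
gradient_accumulation_steps = 1    # ASSUMPTION: not shown in the command above

# Samples consumed per optimizer step across all processes.
global_batch = nproc_per_node * per_device_train_batch_size * gradient_accumulation_steps

# ASSUMPTION: one pass over at most max_samples samples per domain.
steps_per_domain = math.ceil(max_samples / global_batch)

print(f"global batch size: {global_batch}")
print(f"optimizer steps per domain: {steps_per_domain}")
```

If you run on fewer GPUs, the global batch size shrinks accordingly unless you compensate elsewhere, which may affect reproduction.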

End-task Fine-tuning

After continual learning of the LM, we can evaluate its performance by running end-task fine-tuning individually.

max_samples=640000
seed=(2021 111 222 333 444 555 666 777 888 999)
for round in 0; do
    for idrandom in 0; do
        for pt_task in 0 1 2 3 4 5; do
            for ft_task in $(seq 0 ${pt_task}); do
                python finetune.py \
                    --max_seq_length 164 \
                    --pt_task ${pt_task} \
                    --ft_task ${ft_task} \
                    --idrandom ${idrandom} \
                    --ntasks 6 \
                    --max_samples ${max_samples} \
                    --seed ${seed[$round]} \
                    --baseline 'das'
            done
        done
    done
done
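
The nested loops above fine-tune, after each pre-training stage pt_task, on every end-task from 0 up to and including pt_task. The sketch below enumerates those runs in Python to make the schedule explicit. The function name `finetune_runs` is hypothetical and only mirrors the shell loops; it does not call finetune.py.

```python
from itertools import product

seeds = [2021, 111, 222, 333, 444, 555, 666, 777, 888, 999]
rounds = [0]      # `for round in 0` in the script above
idrandoms = [0]   # `for idrandom in 0` in the script above
ntasks = 6


def finetune_runs(rounds, idrandoms, ntasks, seeds):
    """List the (seed, idrandom, pt_task, ft_task) combinations the shell loops launch."""
    runs = []
    for rnd, idrandom in product(rounds, idrandoms):
        for pt_task in range(ntasks):
            # `seq 0 ${pt_task}` is inclusive, hence pt_task + 1.
            for ft_task in range(pt_task + 1):
                runs.append({
                    "seed": seeds[rnd],
                    "idrandom": idrandom,
                    "pt_task": pt_task,
                    "ft_task": ft_task,
                })
    return runs


runs = finetune_runs(rounds, idrandoms, ntasks, seeds)
print(f"{len(runs)} fine-tuning runs")
```

With 6 domains this gives 1 + 2 + ... + 6 = 21 fine-tuning runs per seed and task sequence, so adding more rounds or sequences multiplies the cost accordingly.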

Checkpoints in Hugging Face

For those who are interested solely in the resulting model or want to continue pre-training the model with their own data, we have good news! We offer checkpoints through Hugging Face.

You can easily import our continually post-trained model with Hugging Face's transformers!

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Import our model. The package will take care of downloading the models automatically
tokenizer = AutoTokenizer.from_pretrained("UIC-Liu-Lab/DAS-Rest2Cam")
model = AutoModelForSequenceClassification.from_pretrained("UIC-Liu-Lab/DAS-Rest2Cam", trust_remote_code=True)

# Tokenize input texts
texts = [
    "There's a kid on a skateboard.",
    "A kid is skateboarding.",
    "A kid is inside the house.",
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Get the model output!
res = model(**inputs)

If you encounter any problems when loading the models directly through Hugging Face's API, you can also download the models manually from the repo and use model = AutoModel.from_pretrained({PATH TO THE DOWNLOAD MODEL}).

⚠ The continual pre-training sequence is the first sequence at ./sequences/posttrain (from Restaurant to Camera). You can use the downloaded weights to fine-tune the corresponding end-task.

⚠ If you are interested in the importance files, please refer to before_distill0 and after_mlm{domain_id}. The "before" prefix signifies the importance computed before pre-training, which is done only once before the first domain for general pre-trained knowledge. The "after" prefix indicates the importance computed after the pre-training of domain_id.

Reference

We highly appreciate your starring and citing this repository. Your attention and recognition are greatly valued.

@inproceedings{ke2022dgs,
  title={Continual Learning of Language Models},
  author={Ke, Zixuan and Shao, Yijia and Lin, Haowei and Konishi, Tatsuya and Kim, Gyuhak and Liu, Bing},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2023}
}

@inproceedings{ke2022dga,
  title={Adapting a Language Model While Preserving its General Knowledge},
  author={Ke, Zixuan and Shao, Yijia and Lin, Haowei and Xu, Hu and Shu, Lei and Liu, Bing},
  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
  year={2022}
}

@inproceedings{ke2022continual,
  title={Continual Training of Language Models for Few-Shot Learning},
  author={Ke, Zixuan and Lin, Haowei and Shao, Yijia and Xu, Hu and Shu, Lei and Liu, Bing},
  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
  year={2022}
}

Contact

If you have any questions regarding the code, please feel free to send an email to Zixuan Ke, Yijia Shao, or Haowei Lin. Alternatively, you may open an issue. We would like to express our gratitude to Bing Liu, Hu Xu, and Lei Shu for their valuable comments and opinions.

