YerevaNN/WARP

Code for ACL'2021 paper WARP 🌀 Word-level Adversarial ReProgramming. Outperforming `GPT-3` on SuperGLUE Few-Shot text classification. https://aclanthology.org/2021.acl-long.381/

License

MIT license

🌀 WARP: Word-level Adversarial ReProgramming

This repository contains code for the ACL 2021 paper WARP: Word-level Adversarial ReProgramming.

Figure: WARP adds a few trainable embeddings around the input, which causes the masked language model to predict the sentiment of the sentence in the SST-2 task.

Transfer learning from pretrained language models has recently become the dominant approach to solving many NLP tasks. A common approach to multi-task transfer learning that maximizes parameter sharing is to train one or more task-specific layers on top of the language model.

In this paper, we present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation. Adversarial reprogramming attempts to learn task-specific word embeddings that, when concatenated to the input text, instruct the language model to solve the specified task.
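As a rough illustration of the idea in the paragraph above, the following sketch places trainable prompt vectors around the frozen embeddings of the input text before the sequence would be passed to a masked language model. It is a toy example in plain Python, not the repository's implementation: the function name `build_input`, the 2-dimensional vectors, and the prefix/text/mask/suffix layout are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def build_input(prefix: List[Vector], text: List[Vector],
                mask: Vector, suffix: List[Vector]) -> List[Vector]:
    """Concatenate trainable prompts with frozen text embeddings.

    Hypothetical layout: [prefix prompts] [text tokens] [mask] [suffix prompts]
    """
    return prefix + text + [mask] + suffix


# Toy 2-dimensional embeddings; real models use far larger vectors.
text_embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # frozen, from the language model
mask_embedding = [0.0, 0.0]                              # frozen embedding of the mask token
prefix_prompts = [[1.0, 1.0], [2.0, 2.0]]                # trainable
suffix_prompts = [[3.0, 3.0]]                            # trainable

sequence = build_input(prefix_prompts, text_embeddings, mask_embedding, suffix_prompts)

# The model's prediction at the mask position would be read out as the class label.
mask_position = len(prefix_prompts) + len(text_embeddings)
print(len(sequence), mask_position)
```

In this sketch only `prefix_prompts` and `suffix_prompts` would receive gradient updates; the text and mask embeddings, like the rest of the language model, stay fixed.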

Using up to 25K trainable parameters per task, this approach outperforms all existing methods that use up to 25M trainable parameters on the public leaderboard of the GLUE benchmark. Our method, initialized with task-specific human-readable prompts, also works in a few-shot setting, outperforming GPT-3 on two SuperGLUE tasks after training on just 32 samples.
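To see how a budget of roughly 25K parameters can arise, here is a back-of-the-envelope count. It assumes that the only trainable parameters are the prompt embeddings and one embedding per class, each of the model's hidden size. The helper name `trainable_parameters` and the chosen values (20 prompts, 3 classes, hidden size 1024 as in `roberta-large`) are illustrative assumptions, not figures taken from this README.

```python
def trainable_parameters(num_prompts: int, num_classes: int, hidden_size: int) -> int:
    """Count parameters when each prompt and each class owns one hidden-size vector."""
    return (num_prompts + num_classes) * hidden_size


# Assumed setting: 20 prompts, 3 classes, hidden size 1024.
count = trainable_parameters(num_prompts=20, num_classes=3, hidden_size=1024)
print(count)  # 23552
```

Under these assumptions the count stays below 25K.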

Few-Shot Results

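| Set  | Model              | CB F₁ | CB Acc. | RTE Acc. |
|------|--------------------|-------|---------|----------|
| dev  | GPT-3 Small        | 26.1  | 42.9    | 52.3     |
| dev  | GPT-3 Med          | 40.4  | 58.9    | 48.4     |
| dev  | GPT-3              | 57.2  | 82.1    | 72.9     |
| dev  | PET (ALBERT)       | 59.4  | 85.1    | 69.8     |
| dev  | iPET (ALBERT)      | 92.4  | 92.9    | 74.0     |
| dev  | WARP_init (ALBERT) | 84.0  | 87.5    | 71.8     |
| test | GPT-3              | 52.0  | 75.6    | 69.0     |
| test | PET (ALBERT)       | 60.2  | 87.2    | 67.2     |
| test | iPET (ALBERT)      | 79.9  | 88.8    | 70.8     |
| test | WARP_init (ALBERT) | 70.2  | 82.4    | 69.1     |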

Results on the SuperGLUE benchmark. The results for the test set are obtained from the SuperGLUE evaluation server. We only show systems performing in a similar few-shot training setup using 32 examples.
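The test-set margins between WARP_init and GPT-3 can be read off the table with a few lines of arithmetic. The snippet below only restates numbers from the test rows above; the variable names are arbitrary.

```python
# Test-set scores copied from the results table.
gpt3 = {"CB F1": 52.0, "CB Acc.": 75.6, "RTE Acc.": 69.0}
warp_init = {"CB F1": 70.2, "CB Acc.": 82.4, "RTE Acc.": 69.1}

# Positive values mean WARP_init scores higher than GPT-3 on that metric.
margins = {metric: round(warp_init[metric] - gpt3[metric], 1) for metric in gpt3}

for metric, delta in margins.items():
    print(f"{metric}: {delta:+.1f}")
```

The margin is large on CB and small on RTE, where the two systems are within 0.1 points of each other.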

Setup

The code requires YerevaNN's internal version of allennlp:

    git clone https://github.com/YerevaNN/allennlp
    cd allennlp
    git checkout warp
    pip install .

Training

Linear Probing

    for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
    do
        export HPARAMS='{
            "dataset": "'$DATASET'",
            "lr": 0.0001,
            "num_epochs": 20,
            "prompts": [],
            "reorder_optimized": false,
            "max_batch_size": 8,
            "max_tokens_sq": 262144,
            "on_logits": false,
            "pooling_index": null,
            "seed": 1
        }'
        python -m allennlp train \
        -s .aim/baseline-linear-${DATASET} configs/warp.jsonnet
    done

WARP_0

    for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
    do
        export HPARAMS='{
            "dataset": "'$DATASET'",
            "lr": 0.0001,
            "num_epochs": 20,
            "prompts": [null, "<mask>"],
            "reorder_optimized": true,
            "max_batch_size": 8,
            "max_tokens_sq": 262144,
            "on_logits": "pre_decoder_layer_norm",
            "pooling_index": 1,
            "seed": 1
        }'
        python -m allennlp train \
        -s .aim/baseline-warp_0-${DATASET} configs/warp.jsonnet
    done
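The two loops above pass hyperparameters as a JSON string in the `HPARAMS` environment variable, which `configs/warp.jsonnet` presumably reads at training time. Splicing `$DATASET` into a quoted shell string is easy to get wrong, so one option is to build the JSON in Python instead. The sketch below mirrors the WARP_0 settings; the helper name `warp0_run` is hypothetical, and the snippet only prints the commands rather than launching them.

```python
import json
import os
from typing import Dict, List, Tuple

DATASETS = ["cola", "sst2", "mrpc", "qqp", "stsb", "mnli", "rte", "wnli", "qnli"]


def warp0_run(dataset: str) -> Tuple[Dict[str, str], List[str]]:
    """Return the environment and command line for one WARP_0 run."""
    hparams = {
        "dataset": dataset,
        "lr": 0.0001,
        "num_epochs": 20,
        "prompts": [None, "<mask>"],  # serialized as [null, "<mask>"]
        "reorder_optimized": True,
        "max_batch_size": 8,
        "max_tokens_sq": 262144,
        "on_logits": "pre_decoder_layer_norm",
        "pooling_index": 1,
        "seed": 1,
    }
    # Copy the current environment and add the serialized hyperparameters.
    env = dict(os.environ, HPARAMS=json.dumps(hparams))
    command = ["python", "-m", "allennlp", "train",
               "-s", f".aim/baseline-warp_0-{dataset}", "configs/warp.jsonnet"]
    return env, command


for name in DATASETS:
    env, command = warp0_run(name)
    print(" ".join(command))
    # To actually launch a run: subprocess.run(command, env=env, check=True)
```

Because `json.dumps` handles quoting and the `null`/`true`/`false` literals, the resulting string is valid JSON for every dataset name.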

Training WARP

    export DATASET="rte"
    export HPARAMS='{
        "benchmark": "super_glue",
        "classifier_init": null,
        "dataset": "'$DATASET'",
        "ensure_whitespace_between": false,
        "lr": 0.001,
        "max_batch_size": 8,
        "max_tokens_sq": 262144,
        "num_epochs": 30,
        "prompt_better_init": "<mask>",
        "prompts": [-10,-11,-12,-13,-14,null,-15,-16,-17,-18,-19,"<mask>",-20,-21,-22,-23,-24,null,-25,-26,-27,-28,-29],
        "seed": 1,
        "transformer_model": "roberta-large"
    }'
    python -m allennlp train \
    -s .aim/t-${DATASET} configs/warp.jsonnet

WARP_init

Few-Shot Experiments

    export HPARAMS='{
        "benchmark": "super_glue",
        "classifier_init": {
            "entailment": " yes",
            "not_entailment": " instead"
        },
        "dataset": "few_rte",
        "eval_mode": false,
        "lr": 0.001,
        "max_batch_size": 2,
        "max_tokens_sq": 131072,
        "num_epochs": 100,
        "num_gradient_accumulation_steps": 2,
        "prompt_better_init": "[PAD]",
        "prompts": [-10,-11,[-14,"\""],null,[-15,"\""],[-16,"?"],"<mask>",[-20,","],null,[-29,"!"],-30,-31],
        "seed": 3,
        "str_cut_frac": 0,
        "transformer_model": "albert-xxlarge-v2",
        "validation_metric": null
    }'
    python -m allennlp train \
    -s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet

    export HPARAMS='{
        "benchmark": "super_glue",
        "classifier_init": {
            "entailment": " yes",
            "not_entailment": " instead"
        },
        "dataset": "few_rte",
        "grad_norm": 1,
        "lr": 0.001,
        "max_batch_size": 2,
        "max_tokens_sq": 131072,
        "num_epochs": 30,
        "num_gradient_accumulation_steps": 2,
        "prompt_better_init": "[PAD]",
        "prompts": [-10,-11,[-14,"\""],null,[-15,"\""],[-16,"?"],"<mask>",[-20,","],null,[-29,"!"],-30,-31],
        "seed": 1,
        "str_cut_frac": 0.06,
        "transformer_model": "albert-xxlarge-v2",
        "validation_metric": "+training_val_metric"
    }'
    python -m allennlp train \
    -s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet
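The `prompts` list in the few-shot configurations mixes several kinds of entries. The README does not document the format, so the reading used below is an assumption: a bare negative integer is a trainable prompt, a `[id, "token"]` pair is a trainable prompt initialized from that token, `null` is a slot for one input sentence, and a plain string such as `"<mask>"` is a literal token. The helper `classify` and its category names are hypothetical.

```python
from collections import Counter
from typing import Any

# The few-shot RTE template from the configurations above, as a Python list.
template = [-10, -11, [-14, '"'], None, [-15, '"'], [-16, "?"],
            "<mask>", [-20, ","], None, [-29, "!"], -30, -31]


def classify(entry: Any) -> str:
    """Label one template entry according to the assumed reading of the format."""
    if entry is None:
        return "input_slot"
    if isinstance(entry, str):
        return "literal_token"
    if isinstance(entry, int):
        return "trainable_prompt"
    if isinstance(entry, list) and len(entry) == 2:
        return "initialized_prompt"
    raise ValueError(f"unrecognized template entry: {entry!r}")


counts = Counter(classify(entry) for entry in template)
for kind, number in sorted(counts.items()):
    print(kind, number)
```

Under this reading the template has two input slots, which would match the two sentences of an RTE example (premise and hypothesis), and a single `"<mask>"` position from which the label would be predicted.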

Evaluation

    python -m allennlp predict \
      --silent --use-dataset-reader --cuda-device 0 \
      --batch-size 50 \
      --predictor glue --output-file v0.1/AX.tsv /data/arp/.aim/H-93ae5ae9 ax/test

    python -m allennlp predict \
      --silent --use-dataset-reader --cuda-device 0 \
      --batch-size 50 \
      --predictor glue --output-file v0.1/MNLI-m.tsv /data/arp/.aim/H-93ae5ae9 test_matched

Citation

If you want to refer to our work, use this BibTeX entry:

    @inproceedings{hambardzumyan-etal-2021-warp,
        title = "{WARP}: {W}ord-level {A}dversarial {R}e{P}rogramming",
        author = "Hambardzumyan, Karen  and
          Khachatrian, Hrant  and
          May, Jonathan",
        booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
        month = aug,
        year = "2021",
        address = "Online",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2021.acl-long.381",
        doi = "10.18653/v1/2021.acl-long.381",
        pages = "4921--4933"
    }

mahnerak.com/WARP
