- Notifications
You must be signed in to change notification settings - Fork16
Code for ACL'2021 paper WARP 🌀 Word-level Adversarial ReProgramming. Outperforming `GPT-3` on SuperGLUE Few-Shot text classification.https://aclanthology.org/2021.acl-long.381/
License
YerevaNN/WARP
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This repository contains code for ACL'2021 PaperWARP: Word-level Adversarial ReProgramming.
WARP adds a few trainable embeddings around the input, which causes the masked language model to predict the sentiment of the sentence in the SST-2 task.Transfer learning from pretrained language models recently became the dominant approach for solving many NLP tasks. A common approach to transfer learning for multiple tasks that maximize parameter sharing trains one or more task-specific layers on top of the language model.
In this paper, we present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation. Adversarial reprogramming attempts to learn task-specific word embeddings that, when concatenated to the input text, instruct the language model to solve the specified task.
Using up to 25K trainable parameters per task, this approach outperforms all existing methods that use up to 25M trainable parameters on the public leaderboard of the GLUE benchmark. Our method, initialized with task-specific human-readable prompts, also works in a few-shot setting, outperforming GPT-3 on two SuperGLUE tasks after training on just 32 samples.
Set | Model | CB | RTE | |
---|---|---|---|---|
F1 | Acc. | Acc. | ||
dev | ||||
GPT-3 Small | 26.1 | 42.9 | 52.3 | |
GPT-3 Med | 40.4 | 58.9 | 48.4 | |
GPT-3 | 57.2 | 82.1 | 72.9 | |
PET (ALBERT) | 59.4 | 85.1 | 69.8 | |
iPET (ALBERT) | 92.4 | 92.9 | 74.0 | |
WARPinit (ALBERT) | 84.0 | 87.5 | 71.8 | |
test | ||||
GPT-3 | 52.0 | 75.6 | 69.0 | |
PET (ALBERT) | 60.2 | 87.2 | 67.2 | |
iPET (ALBERT) | 79.9 | 88.8 | 70.8 | |
WARPinit (ALBERT) | 70.2 | 82.4 | 69.1 |
The code requires YerevaNN's internal version ofallennlp
git clone https://github.com/YerevaNN/allennlpgit checkout warppip install .
forDATASETin'cola''sst2''mrpc''qqp''stsb''mnli''rte''wnli''qnli'doexport HPARAMS='{ "dataset": "'$DATASET'", "lr": 0.0001, "num_epochs": 20, "prompts": [], "reorder_optimized": false, "max_batch_size": 8, "max_tokens_sq": 262144, "on_logits": false, "pooling_index": null, "seed": 1}' python -m allennlp train \ -s .aim/baseline-linear-${DATASET} configs/warp.jsonnetdone
forDATASETin'cola''sst2''mrpc''qqp''stsb''mnli''rte''wnli''qnli'doexport HPARAMS='{ "dataset": "'$DATASET'", "lr": 0.0001, "num_epochs": 20, "prompts": [null, "<mask>"], "reorder_optimized": true, "max_batch_size": 8, "max_tokens_sq": 262144, "on_logits": "pre_decoder_layer_norm", "pooling_index": 1, "seed": 1 }' python -m allennlp train \ -s .aim/baseline-warp_0-${DATASET} configs/warp.jsonnetdone
export DATASET="rte"export HPARAMS='{ "benchmark":"super_glue", "classifier_init":null, "dataset":"'$DATASET'", "ensure_whitespace_between":false, "lr":0.001, "max_batch_size":8, "max_tokens_sq":262144, "num_epochs":30, "prompt_better_init":"<mask>", "prompts":[-10,-11,-12,-13,-14,null,-15,-16,-17,-18,-19,"<mask>",-20,-21,-22,-23,-24,null,-25,-26,-27,-28,-29], "seed":1, "transformer_model":"roberta-large"}'python -m allennlp train \-s .aim/t-${DATASET} configs/warp.jsonnet
export HPARAMS='{ "benchmark":"super_glue", "classifier_init": { "entailment": " yes", "not_entailment": " instead" }, "dataset":"few_rte", "eval_mode":false, "lr":0.001, "max_batch_size":2, "max_tokens_sq":131072, "num_epochs":100, "num_gradient_accumulation_steps":2, "prompt_better_init": "[PAD]", "prompts":[-10,-11,[-14,"\""],null,[-15,"\""], [-16, "?"], "<mask>", [-20, ","], null, [-29, "!"],-30,-31], "seed":3, "str_cut_frac":0, "transformer_model":"albert-xxlarge-v2", "validation_metric": null}'python -m allennlp train \-s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet
export HPARAMS='{ "benchmark":"super_glue", "classifier_init":{ "entailment":" yes", "not_entailment":" instead" }, "dataset":"few_rte", "grad_norm":1, "lr":0.001, "max_batch_size":2, "max_tokens_sq":131072, "num_epochs":30, "num_gradient_accumulation_steps":2, "prompt_better_init":"[PAD]", "prompts":[-10,-11,[-14,"\""],null,[-15,"\""],[-16,"?"],"<mask>",[-20,","],null,[-29,"!"],-30,-31], "seed":1, "str_cut_frac":0.06, "transformer_model":"albert-xxlarge-v2", "validation_metric":"+training_val_metric"}'python -m allennlp train \-s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet
python -m allennlp predict \ --silent --use-dataset-reader --cuda-device 0 \ --batch-size 50 \ --predictor glue --output-file v0.1/AX.tsv /data/arp/.aim/H-93ae5ae9 ax/test
python -m allennlp predict \ --silent --use-dataset-reader --cuda-device 0 \ --batch-size 50 \ --predictor glue --output-file v0.1/MNLI-m.tsv /data/arp/.aim/H-93ae5ae9 test_matched
If you want to refer to our work use this bibTeX:
@inproceedings{hambardzumyan-etal-2021-warp, title = "{WARP}: {W}ord-level {A}dversarial {R}e{P}rogramming", author = "Hambardzumyan, Karen and Khachatrian, Hrant and May, Jonathan", booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)", month = aug, year = "2021", address = "Online", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.acl-long.381", doi = "10.18653/v1/2021.acl-long.381", pages = "4921--4933"}
About
Code for ACL'2021 paper WARP 🌀 Word-level Adversarial ReProgramming. Outperforming `GPT-3` on SuperGLUE Few-Shot text classification.https://aclanthology.org/2021.acl-long.381/