varunkumar-dev/TransformersDataAugmentation
This code was originally released as part of the amazon-research package (https://github.com/amazon-research/transformers-data-augmentation). In the paper, we mention the https://github.com/varinf/TransformersDataAugmentation URL, so we are providing a copy of the same code here.
Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper.
The code contains implementations of the following data augmentation methods:
- EDA (Baseline)
- Backtranslation (Baseline)
- CBERT (Baseline)
- BERT Prepend (Our paper)
- GPT-2 Prepend (Our paper)
- BART Prepend (Our paper)
In the paper, we use three datasets from the following resources:
- STSA-2: https://github.com/1024er/cbert_aug/tree/crayon/datasets/stsa.binary
- TREC: https://github.com/1024er/cbert_aug/tree/crayon/datasets/TREC
- SNIPS: https://github.com/MiuLab/SlotGated-SLU/tree/master/data/snips
Run the src/utils/download_and_prepare_datasets.sh file to prepare all datasets. download_and_prepare_datasets.sh performs the following steps (see the example after the list):
- Download the data from GitHub
- Replace numeric labels with text labels for the STSA-2 and TREC datasets
- For a given dataset, create 15 random splits of the train and dev data
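For example, dataset preparation is a single command. This sketch assumes the script is run from the repository root; the output locations are whatever the script itself writes:

```bash
# Download STSA-2, TREC, and SNIPS, convert numeric labels to text,
# and create 15 random train/dev splits per dataset.
bash src/utils/download_and_prepare_datasets.sh
```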
To run this code, you need the following dependencies (an example pip setup follows the list):
- PyTorch 1.5
- fairseq 0.9
- transformers 2.9
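One possible way to set up a matching environment is with pip; the exact package pins below are an assumption based on the versions listed above, not an official requirements file:

```bash
# Assumed pins corresponding to the listed dependency versions
pip install torch==1.5.0 fairseq==0.9.0 transformers==2.9.0
```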
To run a data augmentation experiment for a given dataset, run the corresponding bash script in the scripts folder. For example, to run data augmentation on the snips dataset (a usage sketch follows the list):
- run scripts/bart_snips_lower.sh for the BART experiment
- run scripts/bert_snips_lower.sh for the rest of the data augmentation methods
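Putting it together, running the snips experiments might look like the sketch below; it assumes the scripts are launched from the repository root (after dataset preparation) and take no additional arguments:

```bash
# Assumed invocation from the repository root, after running download_and_prepare_datasets.sh
bash scripts/bart_snips_lower.sh   # BART-based augmentation
bash scripts/bert_snips_lower.sh   # EDA, backtranslation, CBERT, BERT prepend, GPT-2 prepend
```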
@inproceedings{kumar-etal-2020-data,
    title = "Data Augmentation using Pre-trained Transformer Models",
    author = "Kumar, Varun and Choudhary, Ashutosh and Cho, Eunah",
    booktitle = "Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems",
    month = dec,
    year = "2020",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.lifelongnlp-1.3",
    pages = "18--26",
}
Please reach out to kuvrun@amazon.com with any questions related to this code.
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license.