varunkumar-dev/TransformersDataAugmentation
This code was originally released as part of the amazon-research package (https://github.com/amazon-research/transformers-data-augmentation). In the paper, we mention the https://github.com/varinf/TransformersDataAugmentation URL, so we are providing a copy of the same code here.
Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper.
The code contains implementations of the following data augmentation methods:
- EDA (Baseline)
- Backtranslation (Baseline)
- CBERT (Baseline)
- BERT Prepend (Our paper)
- GPT-2 Prepend (Our paper)
- BART Prepend (Our paper)
In the paper, we use three datasets from the following resources:
- STSA-2: https://github.com/1024er/cbert_aug/tree/crayon/datasets/stsa.binary
- TREC: https://github.com/1024er/cbert_aug/tree/crayon/datasets/TREC
- SNIPS: https://github.com/MiuLab/SlotGated-SLU/tree/master/data/snips
Run the src/utils/download_and_prepare_datasets.sh file to prepare all datasets. download_and_prepare_datasets.sh performs the following steps (see the example after the list):
- Download the data from GitHub
- Replace numeric labels with text labels for the STSA-2 and TREC datasets
- For a given dataset, create 15 random splits of the train and dev data
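For example, dataset preparation is a single command. This sketch assumes the script is run from the repository root; the output locations are whatever the script itself writes:

```bash
# Download STSA-2, TREC, and SNIPS, convert numeric labels to text,
# and create 15 random train/dev splits per dataset.
bash src/utils/download_and_prepare_datasets.sh
```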
To run this code, you need the following dependencies (an example pip setup follows the list):
- PyTorch 1.5
- fairseq 0.9
- transformers 2.9
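One possible way to set up a matching environment is with pip; the exact package pins below are an assumption based on the versions listed above, not an official requirements file:

```bash
# Assumed pins corresponding to the listed dependency versions
pip install torch==1.5.0 fairseq==0.9.0 transformers==2.9.0
```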
To run a data augmentation experiment for a given dataset, run the corresponding bash script in the scripts folder. For example, to run data augmentation on the snips dataset (a usage sketch follows the list):
- run scripts/bart_snips_lower.sh for the BART experiment
- run scripts/bert_snips_lower.sh for the rest of the data augmentation methods
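Putting it together, running the snips experiments might look like the sketch below; it assumes the scripts are launched from the repository root (after dataset preparation) and take no additional arguments:

```bash
# Assumed invocation from the repository root, after running download_and_prepare_datasets.sh
bash scripts/bart_snips_lower.sh   # BART-based augmentation
bash scripts/bert_snips_lower.sh   # EDA, backtranslation, CBERT, BERT prepend, GPT-2 prepend
```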
@inproceedings{kumar-etal-2020-data,
    title = "Data Augmentation using Pre-trained Transformer Models",
    author = "Kumar, Varun and Choudhary, Ashutosh and Cho, Eunah",
    booktitle = "Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems",
    month = dec,
    year = "2020",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.lifelongnlp-1.3",
    pages = "18--26",
}
Please reach out to kuvrun@amazon.com with any questions related to this code.
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license.