varunkumar-dev/TransformersDataAugmentation


This code was originally released as part of the amazon-research package (https://github.com/amazon-research/transformers-data-augmentation). In the paper, we mentioned the https://github.com/varinf/TransformersDataAugmentation URL, so we provide a copy of the same code here.

Code associated with the Data Augmentation using Pre-trained Transformer Models paper.

The code contains implementations of the following data augmentation methods (a short sketch of the prepend idea follows the list):

  • EDA (Baseline)
  • Backtranslation (Baseline)
  • CBERT (Baseline)
  • BERT Prepend (Our paper)
  • GPT-2 Prepend (Our paper)
  • BART Prepend (Our paper)
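
The prepend methods condition a pre-trained language model on the class label: each training example is serialized with its label in front of the text, the model is fine-tuned on those sequences, and new examples are generated by prompting with the label alone. The snippet below is a minimal sketch of that idea for GPT-2 using the Hugging Face transformers API; the model name, separator token, and label string are illustrative and are not taken from this repository's scripts.

```python
# Minimal sketch of the label-prepend idea (not the repository's exact code).
# A fine-tuned model would be trained on lines like "<label> SEP <text>";
# only the generation step is shown here, conditioning on the label prefix.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # in practice: a model fine-tuned on prepended data

prompt = "PlayMusic SEP "  # hypothetical SNIPS intent label used as the conditioning prefix
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(
    input_ids,
    max_length=40,
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```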

Datasets

In the paper, we use three datasets: STSA-2, TREC, and SNIPS.

Low-data regime experiment setup

Run the src/utils/download_and_prepare_datasets.sh script to prepare all datasets.
download_and_prepare_datasets.sh performs the following steps (a sketch of the split step appears after the list):

  1. Download data from GitHub.
  2. Replace numeric labels with text for the STSA-2 and TREC datasets.
  3. For a given dataset, create 15 random splits of train and dev data.
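
As referenced above, the following is a rough Python sketch of step 3: drawing repeated random train/dev splits from a full training file. The split sizes, file names, and TSV format are assumptions for illustration; the actual procedure is in src/utils/download_and_prepare_datasets.sh and the scripts it calls.

```python
# Illustrative sketch of step 3 only: sample 15 random low-data train/dev
# splits from a full training file. File names, format, and split sizes are
# hypothetical; see download_and_prepare_datasets.sh for the real procedure.
import os
import random

def make_split(lines, n_train, n_dev, seed):
    rng = random.Random(seed)
    sample = rng.sample(lines, n_train + n_dev)
    return sample[:n_train], sample[n_train:]

with open("train_full.tsv") as f:   # hypothetical full training file, one example per line
    lines = f.readlines()

for seed in range(15):              # 15 random splits, as described above
    train, dev = make_split(lines, n_train=100, n_dev=100, seed=seed)
    out_dir = f"split_{seed}"
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "train.tsv"), "w") as f:
        f.writelines(train)
    with open(os.path.join(out_dir, "dev.tsv"), "w") as f:
        f.writelines(dev)
```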

Dependencies

To run this code, you need the following dependencies (a quick version check follows the list):

  • PyTorch 1.5
  • fairseq 0.9
  • transformers 2.9
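
A quick way to confirm that the pinned versions are installed is to print them from Python; this check is just a convenience and is not part of the repository.

```python
# Sanity check of the dependency versions listed above.
import torch
import fairseq
import transformers

print("torch:       ", torch.__version__)        # expected 1.5.x
print("fairseq:     ", fairseq.__version__)      # expected 0.9.x
print("transformers:", transformers.__version__)  # expected 2.9.x
```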

How to run

To run a data augmentation experiment for a given dataset, run the bash script in the scripts folder. For example, to run data augmentation on the SNIPS dataset:

  • run scripts/bart_snips_lower.sh for the BART experiment
  • run scripts/bert_snips_lower.sh for the rest of the data augmentation methods

How to cite

@inproceedings{kumar-etal-2020-data,
    title = "Data Augmentation using Pre-trained Transformer Models",
    author = "Kumar, Varun and Choudhary, Ashutosh and Cho, Eunah",
    booktitle = "Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems",
    month = dec,
    year = "2020",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.lifelongnlp-1.3",
    pages = "18--26",
}

Contact

Please reach out to kuvrun@amazon.com for any questions related to this code.

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 license.
