This is the official repository which contains all the code necessary to replicate the results from the ACL 2019 long paper "Syntactically Supervised Transformers for Faster Neural Machine Translation". It can also be used to train a vanilla Transformer or a Semi-Autoregressive Transformer.
The full model architecture is displayed below:
Our approach uses syntactic supervision to speed up neural machine translation (NMT) for the Transformer architecture. We modify the Transformer architecture by adding a single-layer parse decoder that autoregressively predicts a shallow chunking of the target parse. Then, conditioned on this parse, a separate token decoder generates the final target translation in one shot (non-autoregressively). The figure above demonstrates the inputs and outputs for each module in the architecture.
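For intuition, here is a minimal sketch of that two-stage decoding flow. The names used here (`synst_decode`, `encode`, `parse_step`, `token_decode`) are hypothetical stand-ins and do not correspond to the repository's actual API:

```python
# Illustrative sketch of SynST's two-stage decoding described above.
# All names here are hypothetical stand-ins, not the repository's actual API.
from typing import Callable, List, Sequence

def synst_decode(
    source: Sequence[str],
    encode: Callable[[Sequence[str]], object],               # encoder over source tokens
    parse_step: Callable[[object, List[str]], str],          # single-layer parse decoder, one chunk at a time
    token_decode: Callable[[object, List[str]], List[str]],  # token decoder, one shot
    max_chunks: int = 128,
    end_symbol: str = "</s>",
) -> List[str]:
    """Autoregressively predict a shallow chunk sequence, then emit all tokens at once."""
    memory = encode(source)

    # Stage 1: the parse decoder predicts coarse chunk identifiers left to right.
    chunks: List[str] = []
    while len(chunks) < max_chunks:
        chunk = parse_step(memory, chunks)
        if chunk == end_symbol:
            break
        chunks.append(chunk)

    # Stage 2: conditioned on the predicted parse, the token decoder generates
    # the final target translation non-autoregressively (a single parallel pass).
    return token_decode(memory, chunks)
```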
The code requires Python 3.6+. The Python dependencies can be installed with the command (using a virtual environment is highly recommended):
pip install -r requirements.txt
In order to parse the datasets, the code also depends upon jq and the shift-reduce parsers from CoreNLP. First, make sure you have an appropriate Java runtime installed.
Then download and unzip the main CoreNLP package to the directory of your choice:
curl -O https://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip
unzip stanford-corenlp-full-2018-10-05.zip
You'll also need to download the shift-reduce parsers for each of the languages:
cd stanford-corenlp-full-2018-10-05
curl -O https://nlp.stanford.edu/software/stanford-srparser-2014-10-23-models.jar
curl -O https://nlp.stanford.edu/software/stanford-french-corenlp-2018-10-05-models.jar
curl -O https://nlp.stanford.edu/software/stanford-german-corenlp-2018-10-05-models.jar
Additionally, if you want to use the scripts that wrap `multi-bleu.perl` and `sacrebleu`, then you'll need to have Moses-SMT available as well.
The code has one main entry point, `main.py`, with a couple of support scripts for the analysis conducted in the paper. Please use `python main.py -h` for additional options not listed below. You can also use `python main.py <action> -h` for options specific to the available actions: `{train, evaluate, translate, pass}`.
For example, the following command preprocesses the parsed WMT'14 En-De dataset:

CLASSPATH=stanford-corenlp-full-2018-10-05/* python main.py \
  --dataset wmt_en_de_parsed --span 6 -d raw/wmt -p preprocessed/wmt -v pass
If you have issues with preprocessing, a few common problems are:
- Not correctly setting your `CLASSPATH` to include CoreNLP
- The environment variables `LANG` and `LC_ALL` are not set to use UTF-8. Try setting `LANG=en_US.UTF-8 LC_ALL=` on the command line when running the preprocessing (a quick check of your locale's encoding is sketched below).
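As a quick sanity check for the UTF-8 issue above, you can print the encoding Python derives from your locale (plain Python, no project code involved):

```python
# Print the encoding Python picks up from your locale; if this does not report
# UTF-8, set LANG/LC_ALL as described above before running the preprocessing.
import locale

print(locale.getpreferredencoding())  # expect 'UTF-8'
```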
Assuming you have access to 8 1080Ti GPUs, you can recreate the results for SynST on the WMT'14 En-De dataset with:
python main.py -b 3175 --dataset wmt_en_de_parsed --span 6 \
  --model parse_transformer -d raw/wmt -p preprocessed/wmt -v train \
  --checkpoint-interval 1200 --accumulate 2 --label-smoothing 0
The above command line will train on 8 GPUs with approximately 3175 combined source/target tokens per GPU, and accumulate the gradients over two batches before updating model parameters (leading to ~50.8k tokens per model update).
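For reference, that figure is just the per-GPU token budget multiplied by the number of GPUs and the accumulation steps:

```python
# Effective tokens per parameter update for the command above:
# per-GPU token budget x number of GPUs x gradient-accumulation batches.
tokens_per_gpu = 3175
num_gpus = 8
accumulation = 2

print(tokens_per_gpu * num_gpus * accumulation)  # 50800, i.e. ~50.8k tokens per update
```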
The default model is the Transformer model, which can take the additional command-line argument `--span <k>` to produce a semi-autoregressive variant (where the default `--span 1` is the basic Transformer). For example, the line below will train a semi-autoregressive Transformer with `k=2` on the WMT'14 De-En dataset:
python main.py -b 3175 --dataset wmt_de_en --span 2 \
  -d raw/wmt -p preprocessed/wmt -v train \
  --checkpoint-interval 1200 --accumulate 2
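For intuition, here is a conceptual sketch of the `--span <k>` idea; `predict_span` is a hypothetical helper, not part of this codebase:

```python
# Conceptual illustration of --span k: a standard Transformer decoder emits
# 1 token per step (k=1), while the semi-autoregressive variant emits k tokens
# per decoding step. `predict_span` is a hypothetical stand-in.
from typing import Callable, List

def span_decode(
    predict_span: Callable[[List[str], int], List[str]],  # predicts the next k tokens jointly
    k: int,
    max_len: int = 256,
    end_symbol: str = "</s>",
) -> List[str]:
    output: List[str] = []
    while len(output) < max_len:
        for token in predict_span(output, k):
            if token == end_symbol:
                return output
            output.append(token)
    return output
```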
You can run a separate process to evaluate each new checkpoint generated during training (you may either want to do it on a GPU not used for training or disable cuda as done below):
python main.py -b 5000 --dataset wmt_en_de_parsed --span 6 \
  --model parse_transformer -d raw/wmt -p preprocessed/wmt \
  --split valid --disable-cuda -v evaluate \
  --watch-directory /tmp/synst/checkpoints
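Conceptually, the `--watch-directory` option keeps this evaluation process scoring checkpoints as they appear; a rough sketch of that pattern (illustrative only, not the repository's implementation) looks like:

```python
# Rough sketch of the "watch a directory and evaluate new checkpoints" pattern
# used above (illustrative only; not the repository's implementation).
import glob
import time
from typing import Callable

def watch_checkpoints(directory: str, evaluate: Callable[[str], None], poll_seconds: float = 60.0) -> None:
    seen = set()
    while True:
        for path in sorted(glob.glob(f"{directory}/*.pt")):
            if path not in seen:
                seen.add(path)
                evaluate(path)  # e.g. compute validation metrics for this checkpoint
        time.sleep(poll_seconds)
```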
After training a model, you can generate translations with the following command (currently only translation on a single GPU is supported):
CUDA_VISIBLE_DEVICES=0 python main.py --dataset wmt_en_de_parsed --span 6 \
  --model parse_transformer -d raw/wmt -p preprocessed/wmt \
  --batch-size 1 --batch-method example --split test -v \
  --restore /tmp/synst/checkpoints/checkpoint.pt \
  --average-checkpoints 5 translate \
  --max-decode-length 50 --length-basis input_lens --order-output
By default, this will output translations to `/tmp/synst/output`.
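The `--average-checkpoints 5` flag presumably averages model weights over recent checkpoints before translating; a conceptual sketch of checkpoint averaging is below (it assumes each checkpoint file holds a name-to-tensor state dict, which may differ from the repository's actual checkpoint layout):

```python
# Conceptual sketch of checkpoint averaging (what --average-checkpoints does in
# spirit; not the repository's implementation). Assumes each file holds a
# parameter state dict mapping names to tensors.
import torch

def average_checkpoints(paths):
    averaged = None
    for path in paths:
        state = torch.load(path, map_location="cpu")  # assumed: a name -> tensor mapping
        if averaged is None:
            averaged = {name: tensor.clone().float() for name, tensor in state.items()}
        else:
            for name, tensor in state.items():
                averaged[name] += tensor.float()
    return {name: tensor / len(paths) for name, tensor in averaged.items()}
```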
If you have a comet.ml account, you can track experiments by prefixing the script call with:

env $(cat ~/.comet.ml | xargs) python main.py --track ...

Where `~/.comet.ml` is the file which contains your API key for logging experiments on the service. By default, this will track experiments in a workspace named `umass-nlp` with the project name `synst`. See `args.py` in order to configure the experiment tracking to suit your needs.
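If you prefer not to use the shell `env $(cat ~/.comet.ml | xargs)` idiom, the snippet below sketches a rough Python equivalent; it assumes `~/.comet.ml` contains one `KEY=VALUE` pair per line (the exact variable names depend on your comet.ml setup):

```python
# Rough Python equivalent of `env $(cat ~/.comet.ml | xargs)`: export each
# KEY=VALUE line from ~/.comet.ml into the environment before launching
# `python main.py --track ...` from the same process.
import os

with open(os.path.expanduser("~/.comet.ml")) as config:
    for line in config:
        line = line.strip()
        if line and "=" in line and not line.startswith("#"):
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()
```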
If you find this code useful, please cite:

@inproceedings{akoury2019synst,
  title={Syntactically Supervised Transformers for Faster Neural Machine Translation},
  author={Akoury, Nader and Krishna, Kalpesh and Iyyer, Mohit},
  booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  year={2019}
}