This is the official repository which contains all the code necessary to replicate the results from the ACL 2019 long paper "Syntactically Supervised Transformers for Faster Neural Machine Translation". It can also be used to train a vanilla Transformer or a Semi-Autoregressive Transformer.
The full model architecture is displayed below:
Our approach uses syntactic supervision to speed up neural machine translation (NMT) for the Transformer architecture. We modify the Transformer architecture by adding a single-layer parse decoder that autoregressively predicts a shallow chunking of the target parse. Then, conditioned on this parse, a separate token decoder generates the final target translation in one shot (non-autoregressively). The figure above demonstrates the inputs and outputs for each module in the architecture.
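For intuition, here is a minimal sketch of that two-stage decoding flow. The names used here (`synst_decode`, `encode`, `parse_step`, `token_decode`) are hypothetical stand-ins and do not correspond to the repository's actual API:

```python
# Illustrative sketch of SynST's two-stage decoding described above.
# All names here are hypothetical stand-ins, not the repository's actual API.
from typing import Callable, List, Sequence

def synst_decode(
    source: Sequence[str],
    encode: Callable[[Sequence[str]], object],               # encoder over source tokens
    parse_step: Callable[[object, List[str]], str],          # single-layer parse decoder, one chunk at a time
    token_decode: Callable[[object, List[str]], List[str]],  # token decoder, one shot
    max_chunks: int = 128,
    end_symbol: str = "</s>",
) -> List[str]:
    """Autoregressively predict a shallow chunk sequence, then emit all tokens at once."""
    memory = encode(source)

    # Stage 1: the parse decoder predicts coarse chunk identifiers left to right.
    chunks: List[str] = []
    while len(chunks) < max_chunks:
        chunk = parse_step(memory, chunks)
        if chunk == end_symbol:
            break
        chunks.append(chunk)

    # Stage 2: conditioned on the predicted parse, the token decoder generates
    # the final target translation non-autoregressively (a single parallel pass).
    return token_decode(memory, chunks)
```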
The code requires Python 3.6+. The Python dependencies can be installed with the command (using a virtual environment is highly recommended):
pip install -r requirements.txt
In order to parse the datasets, the code also depends upon jq and the shift-reduce parsers from CoreNLP. First, make sure you have an appropriate Java runtime installed.
Then download and unzip the main CoreNLP package to the directory of your choice:
curl -O https://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip
unzip stanford-corenlp-full-2018-10-05.zip
You'll also need to download the shift-reduce parsers for each of the languages:
cd stanford-corenlp-full-2018-10-05
curl -O https://nlp.stanford.edu/software/stanford-srparser-2014-10-23-models.jar
curl -O https://nlp.stanford.edu/software/stanford-french-corenlp-2018-10-05-models.jar
curl -O https://nlp.stanford.edu/software/stanford-german-corenlp-2018-10-05-models.jar
Additionally, if you want to use the scripts that wrap `multi-bleu.perl` and `sacrebleu`, then you'll need to have Moses-SMT available as well.
The code has one main entry point, `main.py`, with a couple of support scripts for the analysis conducted in the paper. Please use `python main.py -h` for additional options not listed below. You can also use `python main.py <action> -h` for options specific to the available actions: `{train, evaluate, translate, pass}`.
For example, the following command preprocesses the parsed WMT'14 En-De dataset:

CLASSPATH=stanford-corenlp-full-2018-10-05/* python main.py \
  --dataset wmt_en_de_parsed --span 6 -d raw/wmt -p preprocessed/wmt -v pass
If you have issues with preprocessing, a few common problems are:
- Not correctly setting your `CLASSPATH` to include CoreNLP
- The environment variables `LANG` and `LC_ALL` are not set to use UTF-8. Try setting `LANG=en_US.UTF-8 LC_ALL=` on the command line when running the preprocessing (a quick check of your locale's encoding is sketched below).
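As a quick sanity check for the UTF-8 issue above, you can print the encoding Python derives from your locale (plain Python, no project code involved):

```python
# Print the encoding Python picks up from your locale; if this does not report
# UTF-8, set LANG/LC_ALL as described above before running the preprocessing.
import locale

print(locale.getpreferredencoding())  # expect 'UTF-8'
```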
Assuming you have access to 8 1080Ti GPUs, you can recreate the results for SynST on the WMT'14 En-De dataset with:
python main.py -b 3175 --dataset wmt_en_de_parsed --span 6 \
  --model parse_transformer -d raw/wmt -p preprocessed/wmt -v train \
  --checkpoint-interval 1200 --accumulate 2 --label-smoothing 0
The above command line will train on 8 GPUs with approximately 3175 combined source/target tokens per GPU, and accumulate the gradients over two batches before updating model parameters (leading to ~50.8k tokens per model update).
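For reference, that figure is just the per-GPU token budget multiplied by the number of GPUs and the accumulation steps:

```python
# Effective tokens per parameter update for the command above:
# per-GPU token budget x number of GPUs x gradient-accumulation batches.
tokens_per_gpu = 3175
num_gpus = 8
accumulation = 2

print(tokens_per_gpu * num_gpus * accumulation)  # 50800, i.e. ~50.8k tokens per update
```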
The default model is the Transformer model, which can take the additional command-line argument `--span <k>` to produce a semi-autoregressive variant (where the default `--span 1` is the basic Transformer). For example, the line below will train a semi-autoregressive Transformer with `k=2` on the WMT'14 De-En dataset:
python main.py -b 3175 --dataset wmt_de_en --span 2 \
  -d raw/wmt -p preprocessed/wmt -v train \
  --checkpoint-interval 1200 --accumulate 2
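For intuition, here is a conceptual sketch of the `--span <k>` idea; `predict_span` is a hypothetical helper, not part of this codebase:

```python
# Conceptual illustration of --span k: a standard Transformer decoder emits
# 1 token per step (k=1), while the semi-autoregressive variant emits k tokens
# per decoding step. `predict_span` is a hypothetical stand-in.
from typing import Callable, List

def span_decode(
    predict_span: Callable[[List[str], int], List[str]],  # predicts the next k tokens jointly
    k: int,
    max_len: int = 256,
    end_symbol: str = "</s>",
) -> List[str]:
    output: List[str] = []
    while len(output) < max_len:
        for token in predict_span(output, k):
            if token == end_symbol:
                return output
            output.append(token)
    return output
```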
You can run a separate process to evaluate each new checkpoint generated during training (you may either want to do it on a GPU not used for training or disable cuda as done below):
python main.py -b 5000 --dataset wmt_en_de_parsed --span 6 \
  --model parse_transformer -d raw/wmt -p preprocessed/wmt \
  --split valid --disable-cuda -v evaluate \
  --watch-directory /tmp/synst/checkpoints
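Conceptually, the `--watch-directory` option keeps this evaluation process scoring checkpoints as they appear; a rough sketch of that pattern (illustrative only, not the repository's implementation) looks like:

```python
# Rough sketch of the "watch a directory and evaluate new checkpoints" pattern
# used above (illustrative only; not the repository's implementation).
import glob
import time
from typing import Callable

def watch_checkpoints(directory: str, evaluate: Callable[[str], None], poll_seconds: float = 60.0) -> None:
    seen = set()
    while True:
        for path in sorted(glob.glob(f"{directory}/*.pt")):
            if path not in seen:
                seen.add(path)
                evaluate(path)  # e.g. compute validation metrics for this checkpoint
        time.sleep(poll_seconds)
```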
After training a model, you can generate translations with the following command (currently only translation on a single GPU is supported):
CUDA_VISIBLE_DEVICES=0 python main.py --dataset wmt_en_de_parsed --span 6 \
  --model parse_transformer -d raw/wmt -p preprocessed/wmt \
  --batch-size 1 --batch-method example --split test -v \
  --restore /tmp/synst/checkpoints/checkpoint.pt \
  --average-checkpoints 5 translate \
  --max-decode-length 50 --length-basis input_lens --order-output
By default, this will output translations to `/tmp/synst/output`.
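The `--average-checkpoints 5` flag presumably averages model weights over recent checkpoints before translating; a conceptual sketch of checkpoint averaging is below (it assumes each checkpoint file holds a name-to-tensor state dict, which may differ from the repository's actual checkpoint layout):

```python
# Conceptual sketch of checkpoint averaging (what --average-checkpoints does in
# spirit; not the repository's implementation). Assumes each file holds a
# parameter state dict mapping names to tensors.
import torch

def average_checkpoints(paths):
    averaged = None
    for path in paths:
        state = torch.load(path, map_location="cpu")  # assumed: a name -> tensor mapping
        if averaged is None:
            averaged = {name: tensor.clone().float() for name, tensor in state.items()}
        else:
            for name, tensor in state.items():
                averaged[name] += tensor.float()
    return {name: tensor / len(paths) for name, tensor in averaged.items()}
```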
If you have a comet.ml account, you can track experiments by prefixing the script call with:

env $(cat ~/.comet.ml | xargs) python main.py --track ...

Where `~/.comet.ml` is the file which contains your API key for logging experiments on the service. By default, this will track experiments in a workspace named `umass-nlp` with the project name `synst`. See `args.py` in order to configure the experiment tracking to suit your needs.
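If you prefer not to use the shell `env $(cat ~/.comet.ml | xargs)` idiom, the snippet below sketches a rough Python equivalent; it assumes `~/.comet.ml` contains one `KEY=VALUE` pair per line (the exact variable names depend on your comet.ml setup):

```python
# Rough Python equivalent of `env $(cat ~/.comet.ml | xargs)`: export each
# KEY=VALUE line from ~/.comet.ml into the environment before launching
# `python main.py --track ...` from the same process.
import os

with open(os.path.expanduser("~/.comet.ml")) as config:
    for line in config:
        line = line.strip()
        if line and "=" in line and not line.startswith("#"):
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()
```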
If you find this code useful, please cite:

@inproceedings{akoury2019synst,
  title={Syntactically Supervised Transformers for Faster Neural Machine Translation},
  author={Akoury, Nader and Krishna, Kalpesh and Iyyer, Mohit},
  booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  year={2019}
}