dojoteef/synst

Source code to reproduce the results in the ACL 2019 paper "Syntactically Supervised Transformers for Faster Neural Machine Translation"


This is the official repository which contains all the code necessary to replicate the results from the ACL 2019 long paper Syntactically Supervised Transformers for Faster Neural Machine Translation. It can also be used to train a vanilla Transformer or a Semi-Autoregressive Transformer.

The full model architecture is displayed below:

Our approach uses syntactic supervision to speed up neural machine translation (NMT) for the Transformer architecture. We modify the Transformer architecture by adding a single-layer parse decoder that autoregressively predicts a shallow chunking of the target parse. Then, conditioned on this parse, a separate token decoder generates the final target translation in one shot (non-autoregressively). The figure above demonstrates the inputs and outputs for each module in the architecture.

Requirements

The code requires Python 3.6+. The Python dependencies can be installed with the command (using a virtual environment is highly recommended):

pip install -r requirements.txt
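Since a virtual environment is recommended, a minimal sketch of the full setup could look like the following (the environment name synst-env is just an example):

# create and activate an isolated environment, then install the dependencies
python3 -m venv synst-env
source synst-env/bin/activate
pip install -r requirements.txt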

In order to parse the datasets, the code also depends upon jq and the shift-reduce parsers from CoreNLP. First, make sure you have an appropriate Java runtime installed.

Then download and unzip the main CoreNLP package to the directory of your choice:

curl -O https://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip
unzip stanford-corenlp-full-2018-10-05.zip

You'll also need to download the shift-reduce parser models for each of the languages:

cd stanford-corenlp-full-2018-10-05
curl -O https://nlp.stanford.edu/software/stanford-srparser-2014-10-23-models.jar
curl -O https://nlp.stanford.edu/software/stanford-french-corenlp-2018-10-05-models.jar
curl -O https://nlp.stanford.edu/software/stanford-german-corenlp-2018-10-05-models.jar

Additionally, if you want to use the scripts that wrap multi-bleu.perl and sacrebleu, then you'll need to have Moses-SMT available as well.
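For example, one way to make Moses-SMT available locally is to clone its repository (where you place the checkout is up to you):

# clone Moses-SMT, which provides multi-bleu.perl among other scripts
git clone https://github.com/moses-smt/mosesdecoder.git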

Basic Usage

The code has one main entry point main.py with a couple of support scripts for the analysis conducted in the paper. Please use python main.py -h for additional options not listed below. You can also use python main.py <action> -h for options specific to the available actions: {train, evaluate, translate, pass}.
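For example, to view the general options and then the options specific to the train action:

python main.py -h         # general options
python main.py train -h   # options specific to the train action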

Preprocessing

CLASSPATH=stanford-corenlp-full-2018-10-05/* python main.py \
  --dataset wmt_en_de_parsed --span 6 -d raw/wmt -p preprocessed/wmt -v pass

Troubleshooting

If you have issues with preprocessing, a few common problems are:

  1. Not correctly setting your CLASSPATH to include CoreNLP
  2. The environment variables for LANG and LC_ALL are not set to use UTF-8. Try setting LANG=en_US.UTF-8 LC_ALL= on the command-line when running the preprocessing (see the combined example below).
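For instance, a single preprocessing invocation that addresses both points could look like this (the path assumes the CoreNLP directory unzipped above; the LC_ALL value is kept exactly as noted in point 2):

# UTF-8 locale plus CoreNLP on the CLASSPATH, then the preprocessing pass
LANG=en_US.UTF-8 LC_ALL= CLASSPATH=stanford-corenlp-full-2018-10-05/* python main.py \
  --dataset wmt_en_de_parsed --span 6 -d raw/wmt -p preprocessed/wmt -v pass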

Training

Assuming you have access to 8 1080Ti GPUs, you can recreate the results for SynST on the WMT'14 En-De dataset with:

python main.py -b 3175 --dataset wmt_en_de_parsed --span 6 \
  --model parse_transformer -d raw/wmt -p preprocessed/wmt -v train \
  --checkpoint-interval 1200 --accumulate 2 --label-smoothing 0

The above command line will train on 8 GPUs with approximately 3175 source/target tokens combined per GPU, and accumulate the gradients over two batches before updating model parameters (8 GPUs × 3175 tokens × 2 accumulated batches ≈ 50.8k tokens per model update).

The default model is the Transformer model, which can take the additional command-line argument --span <k> to produce a semi-autoregressive variant (where the default --span 1 is the basic Transformer). For example, the line below will train a semi-autoregressive Transformer with k=2 on the WMT'14 De-En dataset:

python main.py -b 3175 --dataset wmt_de_en --span 2 \
  -d raw/wmt -p preprocessed/wmt -v train \
  --checkpoint-interval 1200 --accumulate 2

Evaluating Perplexity

You can run a separate process to evaluate each new checkpoint generated during training (you may either want to do it on a GPU not used for training or disable CUDA as done below):

python main.py -b 5000 --dataset wmt_en_de_parsed --span 6 \
  --model parse_transformer -d raw/wmt -p preprocessed/wmt \
  --split valid --disable-cuda -v evaluate \
  --watch-directory /tmp/synst/checkpoints

Translating

After training a model, you can generate translations with the following command (currently only translation on a single GPU is supported):

CUDA_VISIBLE_DEVICES=0 python main.py --dataset wmt_en_de_parsed --span 6 \
  --model parse_transformer -d raw/wmt -p preprocessed/wmt \
  --batch-size 1 --batch-method example --split test -v \
  --restore /tmp/synst/checkpoints/checkpoint.pt \
  --average-checkpoints 5 translate \
  --max-decode-length 50 --length-basis input_lens --order-output

By default, this will output translations to /tmp/synst/output.
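If you would rather score the generated translations directly with sacrebleu instead of the wrapper scripts, a hypothetical invocation could look like the following (both file paths are illustrative; use a detokenized reference for your test split):

# score a system output against a reference with sacrebleu; paths are placeholders
cat path/to/system_output.txt | sacrebleu path/to/reference.de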

Experiment tracking

If you have a comet.ml account, you can track experiments by prefixing the script call with:

env $(cat ~/.comet.ml | xargs) python main.py --track ...

Where ~/.comet.ml is the file which contains your API key for logging experiments on the service. By default, this will track experiments in a workspace named umass-nlp with project name synst. See args.py in order to configure the experiment tracking to suit your needs.
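Since the file is passed through xargs to env, it should contain KEY=VALUE pairs, one per line. As a minimal sketch (assuming the standard comet_ml environment variable name):

# contents of ~/.comet.ml (one KEY=VALUE pair per line)
COMET_API_KEY=<your api key>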

Cite

@inproceedings{akoury2019synst,
  title={Syntactically Supervised Transformers for Faster Neural Machine Translation},
  author={Akoury, Nader and Krishna, Kalpesh and Iyyer, Mohit},
  booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  year={2019}
}
