nmtpy is a Python framework based on dl4mt-tutorial to experiment with Neural Machine Translation pipelines.
This codebase is no longer maintained as we moved towards nmtpytorch.

If you use nmtpy, you may want to cite the following paper:
```
@article{nmtpy2017,
  author    = {Ozan Caglayan and Mercedes Garc\'{i}a-Mart\'{i}nez and Adrien Bardet and Walid Aransa and Fethi Bougares and Lo\"{i}c Barrault},
  title     = {NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems},
  journal   = {Prague Bull. Math. Linguistics},
  volume    = {109},
  pages     = {15--28},
  year      = {2017},
  url       = {https://ufal.mff.cuni.cz/pbml/109/art-caglayan-et-al.pdf},
  doi       = {10.1515/pralin-2017-0035},
  timestamp = {Tue, 12 Sep 2017 10:01:08 +0100}
}
```
- Model checkpoints were unnecessarily larger by 30% because of a storage format issue. This is now fixed by https://github.com/lium-lst/nmtpy/commit/0721f34924d23b02caca52e8c3fcbcaafbb4ef41.
- `attention_factors_seplogits.py` is removed and its functionality is added to the `attention_factors` model as a configuration switch: `sep_h2olayer: True`.
- `tied_trg_emb: True/False` is replaced with `tied_emb: False/2way/3way` to also support the sharing of "all" embeddings throughout the network.
nmtpy is a suite of Python tools, primarily based on the starter code provided in dl4mt-tutorial, for training neural machine translation networks using Theano. The basic motivation behind forking dl4mt-tutorial was to create a framework where it is easy to implement a new model by simply copying and modifying an existing model class (or even inheriting from it and overriding some of its methods).

To achieve this, nmtpy completely isolates the training loop, beam search, iteration, and model definition:
- An `nmt-train` script to start a training experiment.
- `nmt-translate` to produce model-agnostic translations: you just pass a trained model's checkpoint file and it does its job.
- `nmt-rescore` to rescore translation hypotheses using an nmtpy model.
- An abstract `BaseModel` class to derive from to define your NMT architecture.
- An abstract `Iterator` to derive from for your custom iterators.
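The subclassing workflow can be sketched with a minimal, self-contained stand-in for the abstract model class. This is only an illustration of the pattern described above; the class and method names (`init_params`, `build`) are hypothetical and do not reproduce nmtpy's actual API:

```python
from abc import ABC, abstractmethod

class BaseModel(ABC):
    """Schematic stand-in for nmtpy's abstract model class (hypothetical API)."""
    def __init__(self, **config):
        # Configuration options come from the experiment's INI file.
        self.config = config

    @abstractmethod
    def init_params(self):
        """Allocate the model's parameters."""

    @abstractmethod
    def build(self):
        """Construct the computation graph."""

class MyNMT(BaseModel):
    """A new architecture: copy an existing model class and override
    only the parts that differ."""
    def init_params(self):
        self.params = {"W_emb": None}  # placeholder parameter store

    def build(self):
        return "graph"  # would return the symbolic graph in practice

model = MyNMT(emb_dim=256)
model.init_params()
```

The point of the design is that `nmt-train` and `nmt-translate` only interact with the abstract interface, so any subclass is usable without touching the training or decoding scripts.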
A non-exhaustive list of differences between nmtpy and dl4mt-tutorial is as follows:
- No shell script, everything is in Python
- Thorough object-oriented refactoring of the code: clear separation of the API and the scripts that interface with it
- INI style configuration files to define everything regarding a training experiment
- Transparent cleanup mechanism to kill stale processes, remove temporary files
- Simultaneous logging of training details to stdout and log file
- Supports out-of-the-box BLEU, METEOR and COCO eval metrics
- Includes subword-nmt utilities for training and applying a BPE model (NOTE: This may change as the upstream subword-nmt moves forward as well.)
- Plugin-like text filters for hypothesis post-processing (Example: BPE, Compound, Char2Words for Char-NMT)
- Early-stopping and checkpointing based on perplexity, BLEU or METEOR (Ability to add new metrics easily)
- Single `.npz` file to store everything about a training experiment
- Automatic free GPU selection and reservation using `nvidia-smi`
- Shuffling support between epochs:
  - Simple shuffle
  - Homogeneous batches of same-length samples to improve training speed
- Improved parallel translation decoding on CPU
- Forced decoding, i.e. rescoring using NMT
- Export of decoding information into `json` for further visualization of attention coefficients
- Improved numerical stability and reproducibility
- Glorot/Xavier, He, Orthogonal weight initializations
- Efficient SGD, Adadelta, RMSProp and ADAM: single forward/backward Theano function without intermediate variables
- Ability to stop updating a set of weights by recompiling optimizer
- Several recurrent blocks:
- GRU, Conditional GRU (CGRU) and LSTM
- Multimodal attentive CGRU variants
- Layer Normalization support for GRU
- 2-way or 3-way tied target embeddings
- Simple/Non-recurrent Dropout, L2 weight decay
- Training and validation loss normalization for comparable perplexities
- Initialization of a model with a pretrained NMT for further finetuning
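One of the items above, homogeneous batching, is easy to illustrate: grouping samples of equal length into the same mini-batch removes intra-batch padding and speeds up training. The sketch below is a minimal, self-contained illustration of the idea, not nmtpy's actual implementation; the function name and signature are hypothetical:

```python
import random
from collections import defaultdict

def homogeneous_batches(sentences, batch_size, seed=0):
    """Group sentences of equal length into mini-batches so that no
    intra-batch padding is needed (illustrative sketch only)."""
    buckets = defaultdict(list)
    for sent in sentences:
        buckets[len(sent)].append(sent)
    batches = []
    for bucket in buckets.values():
        for i in range(0, len(bucket), batch_size):
            batches.append(bucket[i:i + batch_size])
    # Shuffle batch order between epochs for stochasticity.
    random.Random(seed).shuffle(batches)
    return batches

corpus = [["a"], ["b", "c"], ["d", "e"], ["f", "g"], ["h", "i", "j"]]
batches = homogeneous_batches(corpus, batch_size=2)
# Every batch contains sentences of a single length only.
assert all(len({len(s) for s in b}) == 1 for b in batches)
```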
It is advised to check the actual model implementations for the most up-to-date information, as what is written here may become outdated.
This is the basic attention-based NMT from dl4mt-tutorial, improved in different ways:
- 3 forward dropout layers after source embeddings, source context and before softmax, managed by the configuration parameters `emb_dropout`, `ctx_dropout`, `out_dropout`.
- Layer normalization for the source encoder (`layer_norm: True|False`).
- Tied embeddings (`tied_emb: False|2way|3way`).
This model uses the simple `BitextIterator`, i.e. it directly reads plain parallel text files as defined in the experiment configuration file. Please see this monomodal example for usage.
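For illustration, the options above would appear in the experiment's INI file roughly as follows. This is a hypothetical fragment: the section names and file paths are placeholders, and the exact layout should be checked against the configuration files shipped in the examples folder:

```ini
[model]
# Dropout on source embeddings, source context and pre-softmax layer
emb_dropout: 0.2
ctx_dropout: 0.4
out_dropout: 0.4
# Layer normalization for the source encoder
layer_norm: True
# Share embeddings: False, 2way or 3way
tied_emb: 2way

[data]
# Plain parallel text files read by BitextIterator (placeholder paths)
train_src: ~/data/train.en
train_trg: ~/data/train.de
```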
These `fusion` models, derived from `attention.py` and `basefusion.py`, implement several multimodal NMT / image captioning architectures detailed in the following papers:
The models are separated into 8 files, each implementing its own multimodal CGRU, differing in the way the attention is formulated in the decoder (4 ways) x the way the multimodal contexts are fused (2 ways: SUM/CONCAT). These models also use a different data iterator, namely `WMTIterator`, which requires converting the textual data into `.pkl` format as in the multimodal example.
The `WMTIterator` only knows how to handle the ResNet-50 convolutional features that we provide on the examples page. If you would like to use FC-style fixed-length vectors or other types of multimodal features, you need to write your own iterator.
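A custom iterator for fixed-length features essentially pairs each sentence with its feature vector and yields mini-batches of both. The following is a minimal, self-contained sketch of that idea; the class name and interface are hypothetical and do not match the real `WMTIterator` API:

```python
class FeatureTextIterator:
    """Hypothetical iterator pairing sentences with fixed-length
    (e.g. FC-style) image feature vectors, yielded in mini-batches."""
    def __init__(self, sentences, features, batch_size):
        assert len(sentences) == len(features), "one feature vector per sentence"
        self.data = list(zip(sentences, features))
        self.batch_size = batch_size

    def __iter__(self):
        for i in range(0, len(self.data), self.batch_size):
            chunk = self.data[i:i + self.batch_size]
            sents, feats = zip(*chunk)
            yield list(sents), list(feats)

sents = [["a", "b"], ["c"], ["d", "e", "f"]]
feats = [[0.1] * 4, [0.2] * 4, [0.3] * 4]  # toy 4-dimensional vectors
batches = list(FeatureTextIterator(sents, feats, batch_size=2))
```

In practice such an iterator would also handle shuffling and vocabulary mapping, but the pairing-and-batching core is the part the text above asks you to supply.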
The model file `attention_factors.py` corresponds to the following paper:
In the examples folder of this repository, you can find data and a configuration file to run this model.
This is a basic recurrent language model to be used with the `nmt-test-lm` utility.
You need the following Python libraries installed in order to use nmtpy:

- numpy
- Theano >= 0.9
We recommend using the Anaconda Python distribution, which is equipped with Intel MKL (Math Kernel Library), greatly improving CPU decoding speed during beam search. With a correct compilation and installation you should achieve similar performance with OpenBLAS as well, but the setup procedure may be difficult to follow for inexperienced users.
nmtpy only supports Python 3.5+; please see pythonclock.org.
Please note that METEOR requires a Java runtime, so `java` should be in your `$PATH`.
Before installing nmtpy, you need to run `scripts/get-meteor-data.sh` to download the METEOR paraphrase files.
$ python setup.py install
Note: When you add a new model under `models/`, it will not be directly available at runtime, as it needs to be installed as well. To avoid re-installing each time, you can use development mode with `python setup.py develop`, which makes Python see the `git` folder directly as the library content.
(Update: Theano 1.0 includes a configuration option `deterministic = more` that obsoletes the patch below.)
When we started to work on dl4mt-tutorial, we noticed an annoying reproducibility problem: multiple runs of the same experiment (same seed, same machine, same GPU) were not producing exactly the same training and validation losses after a few iterations.
The solution that was discussed in Theano issues was to replace a non-deterministic GPU operation with its deterministic equivalent. To achieve this, you should patch your local Theano v0.9.0 installation using this patch, unless upstream developers add a configuration option to `.theanorc`.
Here is a basic `.theanorc` file (note that the way you installed CUDA and CuDNN may require some modifications):
```ini
[global]
# Not so important as nmtpy will pick an available GPU
device = gpu0
# We use float32 everywhere
floatX = float32
# Keep theano compilation in RAM if you have a 7/24 available server
base_compiledir = /tmp/theano-%(user)s
# For Theano >= 0.10, if you want exact same results for each run
# with same seed
deterministic = more

[cuda]
root = /opt/cuda-8.0

[dnn]
# Make sure you use CuDNN as well
enabled = auto
library_path = /opt/CUDNN/cudnn-v5.1/lib64
include_path = /opt/CUDNN/cudnn-v5.1/include

[lib]
# Allocate 95% of GPU memory once
cnmem = 0.95
```
You may also want to try the new GPU backend after installing libgpuarray. In order to do so, pass `GPUARRAY=1` into the environment when running `nmt-train`:
$ GPUARRAY=1 nmt-train -c <conf file> ...
Recent Theano versions can automatically detect correct MKL flags. You should obtain a similar output after running the following command:

```
$ python -c 'import theano; print(theano.config.blas.ldflags)'
-L/home/ozancag/miniconda/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -lm -Wl,-rpath,/home/ozancag/miniconda/lib
```
nmtpy includes code from the following projects:

- dl4mt-tutorial
- Scripts from subword-nmt
- Ensembling and alignment collection from nematus
- `multi-bleu.perl` from mosesdecoder
- METEOR v1.5 JAR from meteor
- Sorted data iterator, COCO eval script and LSTM from arctic-captions
- `pycocoevalcap` from coco-caption
See the LICENSE file for license information.