Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction
This is a fork of Fairseq(-py) with implementations of the following models:
An NMT model with two-dimensional convolutions to jointly encode the source and the target sequences.
Pervasive Attention also provides an extensive decoding grid that we leverage to efficiently train wait-k models.
See the Pervasive Attention README.
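To make the joint 2D encoding concrete, here is a minimal, hedged PyTorch sketch of the idea (it is not the repository's actual model, which stacks masked DenseNet-style convolutions over the grid): each cell of a target-by-source grid concatenates one target and one source embedding, 2D convolutions process the grid, and pooling over the source axis yields one representation per target position.

```python
# Toy sketch of 2D joint source/target encoding (not the repository's model):
# each grid cell (t, s) concatenates the embeddings of target token t and source
# token s; a 2D convolution processes the grid and max-pooling over the source
# axis produces one representation per target position for next-token prediction.
import torch
import torch.nn as nn

class Toy2DJointEncoder(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=64):
        super().__init__()
        self.src_embed = nn.Embedding(vocab_size, embed_dim)
        self.tgt_embed = nn.Embedding(vocab_size, embed_dim)
        # The paper uses a deep DenseNet with masked convolutions; one plain conv keeps the sketch short.
        self.conv = nn.Conv2d(2 * embed_dim, hidden_dim, kernel_size=3, padding=1)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src_tokens, tgt_tokens):
        src = self.src_embed(src_tokens)   # (B, Ts, E)
        tgt = self.tgt_embed(tgt_tokens)   # (B, Tt, E)
        Ts, Tt = src.size(1), tgt.size(1)
        # Build the (Tt x Ts) grid of concatenated target/source embeddings.
        grid = torch.cat(
            [tgt.unsqueeze(2).expand(-1, -1, Ts, -1),
             src.unsqueeze(1).expand(-1, Tt, -1, -1)],
            dim=-1)                                          # (B, Tt, Ts, 2E)
        feats = self.conv(grid.permute(0, 3, 1, 2))          # (B, H, Tt, Ts)
        pooled = feats.max(dim=-1).values.permute(0, 2, 1)   # pool over source axis -> (B, Tt, H)
        return self.proj(pooled)                             # logits per target position
```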
Transformer Wait-k models (Ma et al., 2019) with unidirectional encoders and with joint training of multiple wait-k paths.
See the wait-k README.
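As a rough illustration of the wait-k policy (stated here from Ma et al., 2019, not taken from this codebase): the decoder emits the t-th target token after reading only the first min(k + t, |source|) source tokens, and multi-path training samples a new k for each batch so a single model can serve several latency regimes at test time.

```python
# Illustrative wait-k helpers (assumptions for exposition, not the repository's API).
import random

def waitk_context_size(k: int, t: int, src_len: int) -> int:
    """Source tokens visible when emitting the t-th target token (0-indexed) under wait-k."""
    return min(k + t, src_len)

def sample_waitk(k_min: int = 1, k_max: int = 9) -> int:
    """Multi-path training: sample a fresh k for each batch (the range is illustrative)."""
    return random.randint(k_min, k_max)
```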
- PyTorch version >= 1.4.0
- Python version >= 3.6
- For training new models, you'll also need an NVIDIA GPU and NCCL (a quick environment check is sketched below)
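Assuming PyTorch is already installed, a quick way to verify these requirements (a sketch, not part of the repository):

```python
# Environment check against the requirements above.
import sys
import torch

print("Python:", sys.version.split()[0])                          # needs >= 3.6
print("PyTorch:", torch.__version__)                               # needs >= 1.4.0
print("CUDA available:", torch.cuda.is_available())                # needed to train new models
print("NCCL available:", torch.distributed.is_nccl_available())    # needed for multi-GPU training
```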
Installing Fairseq
```bash
git clone https://github.com/elbayadm/attn2d
cd attn2d
pip install --editable .
```
fairseq(-py) is MIT-licensed. The license applies to the pre-trained models as well.
For Pervasive Attention, please cite:
```bibtex
@InProceedings{elbayad18conll,
  author    = "Elbayad, Maha and Besacier, Laurent and Verbeek, Jakob",
  title     = "Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction",
  booktitle = "Proceedings of the 22nd Conference on Computational Natural Language Learning",
  year      = "2018",
}
```
For our wait-k models, please cite:
```bibtex
@article{elbayad20waitk,
  title   = {Efficient Wait-k Models for Simultaneous Machine Translation},
  author  = {Elbayad, Maha and Besacier, Laurent and Verbeek, Jakob},
  journal = {arXiv preprint arXiv:2005.08595},
  year    = {2020}
}
```
For Fairseq, please cite:
```bibtex
@inproceedings{ott2019fairseq,
  title     = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  author    = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
  year      = {2019},
}
```