Implementation of the Transformer model in the paper:
Ashish Vaswani, et al. "Attention is all you need." NIPS 2017.
Check my blog post on attention and the Transformer.
Implementations that helped me:
- https://github.com/Kyubyong/transformer/
- https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py
- http://nlp.seas.harvard.edu/2018/04/01/attention.html
```bash
$ git clone https://github.com/lilianweng/transformer-tensorflow.git
$ cd transformer-tensorflow
$ pip install -r requirements.txt

# Check the help message:
$ python train.py --help

Usage: train.py [OPTIONS]

Options:
  --seq-len INTEGER                Input sequence length.  [default: 20]
  --d-model INTEGER                d_model  [default: 512]
  --d-ff INTEGER                   d_ff  [default: 2048]
  --n-head INTEGER                 n_head  [default: 8]
  --batch-size INTEGER             Batch size  [default: 128]
  --max-steps INTEGER              Max train steps.  [default: 300000]
  --dataset [iwslt15|wmt14|wmt15]  Which translation dataset to use.  [default: iwslt15]
  --help                           Show this message and exit.

# Train a model on dataset WMT14:
$ python train.py --dataset wmt14
```
Let's say the model is saved in the folder `transformer-wmt14-seq20-d512-head8-1541573730` under the `checkpoints` folder.
```bash
$ python eval.py transformer-wmt14-seq20-d512-head8-1541573730
```
With the default config, this implementation gets BLEU ~ 20 on the WMT14 test set.
[WIP] A couple of tricky points in the implementation:
- How to construct the mask correctly?
- How to correctly shift the decoder input (used as training input) and the decoder target (used as ground truth in the loss function)?
- How to make the prediction in an autoregressive way?
- Keeping the embedding of `<pad>` as a constant zero vector is quite important.
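Below is a minimal TensorFlow sketch of how these pieces can fit together. It is not the exact code in this repo, just an illustration under a few assumptions: `<pad>` is token id 0, `start_id` is the id of the start-of-sequence token, and `logits_fn` is a hypothetical callable standing in for the full encoder-decoder model.

```python
import tensorflow as tf

PAD_ID = 0  # assumption: <pad> is token id 0


def padding_mask(seq):
    """1.0 for real tokens, 0.0 for <pad>; shaped [batch, 1, 1, seq_len] so it
    broadcasts over attention heads and query positions."""
    mask = tf.cast(tf.not_equal(seq, PAD_ID), tf.float32)
    return mask[:, tf.newaxis, tf.newaxis, :]


def look_ahead_mask(seq_len):
    """Lower-triangular matrix: position i may only attend to positions <= i."""
    return tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)


def shift_right(target_ids, start_id):
    """Decoder input = target shifted right by one step with <s> prepended.
    The loss is then computed between the decoder output and the *unshifted* target."""
    batch_size = tf.shape(target_ids)[0]
    start = tf.fill([batch_size, 1], start_id)
    return tf.concat([start, target_ids[:, :-1]], axis=1)


def embed_with_zero_pad(ids, embedding_matrix):
    """Embedding lookup that keeps the <pad> row (row 0 here) as a constant zero vector."""
    d_model = tf.shape(embedding_matrix)[1]
    zeroed = tf.concat([tf.zeros([1, d_model]), embedding_matrix[1:, :]], axis=0)
    return tf.nn.embedding_lookup(zeroed, ids)


def greedy_decode(logits_fn, src_ids, start_id, max_len):
    """Autoregressive prediction: repeatedly feed the growing prefix back into the
    decoder and append the argmax token. `logits_fn(src_ids, dec_input)` is assumed
    to return logits of shape [batch, dec_len, vocab]."""
    dec = tf.fill([tf.shape(src_ids)[0], 1], start_id)
    for _ in range(max_len):
        logits = logits_fn(src_ids, dec)
        next_id = tf.argmax(logits[:, -1, :], axis=-1, output_type=tf.int32)
        dec = tf.concat([dec, next_id[:, tf.newaxis]], axis=1)
    return dec[:, 1:]
```

For the decoder's self-attention, the padding mask and the look-ahead mask are typically combined element-wise so that both padding positions and future positions are masked out.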