Implementation of the Transformer model in the paper:
Ashish Vaswani, et al. "Attention is all you need." NIPS 2017.
Check my blog post on attention and the Transformer.
Implementations that helped me:
- https://github.com/Kyubyong/transformer/
- https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py
- http://nlp.seas.harvard.edu/2018/04/01/attention.html
```bash
$ git clone https://github.com/lilianweng/transformer-tensorflow.git
$ cd transformer-tensorflow
$ pip install -r requirements.txt

# Check the help message:
$ python train.py --help

Usage: train.py [OPTIONS]

Options:
  --seq-len INTEGER                Input sequence length.  [default: 20]
  --d-model INTEGER                d_model  [default: 512]
  --d-ff INTEGER                   d_ff  [default: 2048]
  --n-head INTEGER                 n_head  [default: 8]
  --batch-size INTEGER             Batch size  [default: 128]
  --max-steps INTEGER              Max train steps.  [default: 300000]
  --dataset [iwslt15|wmt14|wmt15]  Which translation dataset to use.  [default: iwslt15]
  --help                           Show this message and exit.

# Train a model on dataset WMT14:
$ python train.py --dataset wmt14
```
Let's say the model is saved in the folder `transformer-wmt14-seq20-d512-head8-1541573730` under the `checkpoints` folder.
```bash
$ python eval.py transformer-wmt14-seq20-d512-head8-1541573730
```
With the default config, this implementation gets BLEU ~ 20 on the WMT14 test set.
[WIP] A couple of tricky points in the implementation:
- How to construct the mask correctly?
- How to correctly shift the decoder input (used as training input) and the decoder target (used as ground truth in the loss function)?
- How to make the prediction in an autoregressive way?
- Keeping the embedding of `<pad>` as a constant zero vector is quite important.
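Below is a minimal TensorFlow sketch of how these pieces can fit together. It is not the exact code in this repo, just an illustration under a few assumptions: `<pad>` is token id 0, `start_id` is the id of the start-of-sequence token, and `logits_fn` is a hypothetical callable standing in for the full encoder-decoder model.

```python
import tensorflow as tf

PAD_ID = 0  # assumption: <pad> is token id 0


def padding_mask(seq):
    """1.0 for real tokens, 0.0 for <pad>; shaped [batch, 1, 1, seq_len] so it
    broadcasts over attention heads and query positions."""
    mask = tf.cast(tf.not_equal(seq, PAD_ID), tf.float32)
    return mask[:, tf.newaxis, tf.newaxis, :]


def look_ahead_mask(seq_len):
    """Lower-triangular matrix: position i may only attend to positions <= i."""
    return tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)


def shift_right(target_ids, start_id):
    """Decoder input = target shifted right by one step with <s> prepended.
    The loss is then computed between the decoder output and the *unshifted* target."""
    batch_size = tf.shape(target_ids)[0]
    start = tf.fill([batch_size, 1], start_id)
    return tf.concat([start, target_ids[:, :-1]], axis=1)


def embed_with_zero_pad(ids, embedding_matrix):
    """Embedding lookup that keeps the <pad> row (row 0 here) as a constant zero vector."""
    d_model = tf.shape(embedding_matrix)[1]
    zeroed = tf.concat([tf.zeros([1, d_model]), embedding_matrix[1:, :]], axis=0)
    return tf.nn.embedding_lookup(zeroed, ids)


def greedy_decode(logits_fn, src_ids, start_id, max_len):
    """Autoregressive prediction: repeatedly feed the growing prefix back into the
    decoder and append the argmax token. `logits_fn(src_ids, dec_input)` is assumed
    to return logits of shape [batch, dec_len, vocab]."""
    dec = tf.fill([tf.shape(src_ids)[0], 1], start_id)
    for _ in range(max_len):
        logits = logits_fn(src_ids, dec)
        next_id = tf.argmax(logits[:, -1, :], axis=-1, output_type=tf.int32)
        dec = tf.concat([dec, next_id[:, tf.newaxis]], axis=1)
    return dec[:, 1:]
```

For the decoder's self-attention, the padding mask and the look-ahead mask are typically combined element-wise so that both padding positions and future positions are masked out.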