Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
tensorflow/tensor2tensor
Tensor2Tensor, or T2T for short, is a library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
T2T was developed by researchers and engineers in the Google Brain team and a community of users. It is now deprecated; we keep it running and welcome bug-fixes, but encourage users to use the successor library Trax.
This iPython notebook explains T2T and runs in your browser using a free VM from Google, no installation needed. Alternatively, here is a one-command version that installs T2T, downloads MNIST, trains a model and evaluates it:
    pip install tensor2tensor && t2t-trainer \
      --generate_data \
      --data_dir=~/t2t_data \
      --output_dir=~/t2t_train/mnist \
      --problem=image_mnist \
      --model=shake_shake \
      --hparams_set=shake_shake_quick \
      --train_steps=1000 \
      --eval_steps=100
- Suggested Datasets and Models
- Basics
- T2T Overview
- Adding your own components
- Adding a dataset
- Papers
- Run on FloydHub
Below we list a number of tasks that can be solved with T2T when you train the appropriate model on the appropriate problem. We give the problem and model below and we suggest a setting of hyperparameters that we know works well in our setup. We usually run either on Cloud TPUs or on 8-GPU machines; you might need to modify the hyperparameters if you run on a different setup.
For evaluating mathematical expressions at the character level involving addition, subtraction and multiplication of both positive and negative decimal numbers with variable digits assigned to symbolic variables, use
- the MLU data-set:
--problem=algorithmic_math_two_variables
You can try solving the problem with different transformer models and hyperparameters as described in the paper:
- Standard transformer:
--model=transformer
--hparams_set=transformer_tiny
- Universal transformer:
--model=universal_transformer
--hparams_set=universal_transformer_tiny
- Adaptive universal transformer:
--model=universal_transformer
--hparams_set=adaptive_universal_transformer_tiny
For answering questions based on a story, use
- the bAbi data-set:
--problem=babi_qa_concat_task1_1k
You can choose the bAbi task from the range [1,20] and the subset from 1k or 10k. To combine test data from all tasks into a single test set, use --problem=babi_qa_concat_all_tasks_10k
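As a small illustration of the naming scheme above, the following Python sketch builds a bAbi problem name from a task number and a subset size. The helper `babi_problem_name` is hypothetical and not part of T2T; it only assumes that the pattern in the example flag (`babi_qa_concat_task<N>_<subset>`) extends to the other tasks and to the 10k subset.

```python
def babi_problem_name(task: int, subset: str = "1k") -> str:
    """Build a bAbi --problem value (hypothetical helper, not a T2T API)."""
    # The README states that tasks range over [1, 20] and subsets are 1k or 10k.
    if not 1 <= task <= 20:
        raise ValueError("bAbi task must be in the range [1, 20]")
    if subset not in ("1k", "10k"):
        raise ValueError("subset must be '1k' or '10k'")
    # Assumed naming pattern, inferred from babi_qa_concat_task1_1k.
    return f"babi_qa_concat_task{task}_{subset}"


print("--problem=" + babi_problem_name(1, "1k"))
```

Validating the arguments up front gives a readable error for an out-of-range task before the name ever reaches the trainer.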
For image classification, we have a number of standard data-sets:
- ImageNet (a large data-set): --problem=image_imagenet, or one of the re-scaled versions (image_imagenet224, image_imagenet64, image_imagenet32)
- CIFAR-10: --problem=image_cifar10 (or --problem=image_cifar10_plain to turn off data augmentation)
- CIFAR-100: --problem=image_cifar100
- MNIST: --problem=image_mnist
For ImageNet, we suggest using ResNet or Xception, i.e., use --model=resnet --hparams_set=resnet_50 or --model=xception --hparams_set=xception_base. ResNet should get to above 76% top-1 accuracy on ImageNet.
For CIFAR and MNIST, we suggest trying the shake-shake model: --model=shake_shake --hparams_set=shakeshake_big. This setting trained for --train_steps=700000 should yield close to 97% accuracy on CIFAR-10.
For (un)conditional image generation, we have a number of standard data-sets:
- CelebA: --problem=img2img_celeba for image-to-image translation, namely, superresolution from 8x8 to 32x32.
- CelebA-HQ: --problem=image_celeba256_rev for a downsampled 256x256.
- CIFAR-10: --problem=image_cifar10_plain_gen_rev for class-conditional 32x32 generation.
- LSUN Bedrooms: --problem=image_lsun_bedrooms_rev
- MS-COCO: --problem=image_text_ms_coco_rev for text-to-image generation.
- Small ImageNet (a large data-set): --problem=image_imagenet32_gen_rev for 32x32 or --problem=image_imagenet64_gen_rev for 64x64.
We suggest using the Image Transformer, i.e., --model=imagetransformer, or the Image Transformer Plus, i.e., --model=imagetransformerpp, which uses a discretized mixture of logistics, or a variational auto-encoder, i.e., --model=transformer_ae. For CIFAR-10, using --hparams_set=imagetransformer_cifar10_base or --hparams_set=imagetransformer_cifar10_base_dmol yields 2.90 bits per dimension. For Imagenet-32, using --hparams_set=imagetransformer_imagenet32_base yields 3.77 bits per dimension.
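For readers unfamiliar with the metric, bits per dimension is commonly computed as an image's negative log-likelihood in nats divided by the number of dimensions times ln 2. The sketch below assumes that common definition and a 32x32 RGB image (3072 dimensions); the function name is illustrative and not a T2T API, and the README does not state how T2T itself computes the figure.

```python
import math


def bits_per_dim(nll_nats: float, num_dims: int) -> float:
    """Convert a per-image negative log-likelihood in nats to bits per dimension."""
    # Assumed definition: divide by ln 2 to go from nats to bits,
    # then average over all pixel/channel dimensions.
    return nll_nats / (num_dims * math.log(2))


cifar_dims = 32 * 32 * 3  # 32x32 image with 3 colour channels
# Total nats per image that would correspond to 2.90 bits per dimension.
nll_for_reported_score = 2.90 * cifar_dims * math.log(2)
print(round(nll_for_reported_score, 1), "nats per image")
print(round(bits_per_dim(nll_for_reported_score, cifar_dims), 2), "bits per dimension")
```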
For language modeling, we have these data-sets in T2T:
- PTB (a small data-set): --problem=languagemodel_ptb10k for word-level modeling and --problem=languagemodel_ptb_characters for character-level modeling.
- LM1B (a billion-word corpus): --problem=languagemodel_lm1b32k for subword-level modeling and --problem=languagemodel_lm1b_characters for character-level modeling.
We suggest starting with --model=transformer on this task, using --hparams_set=transformer_small for PTB and --hparams_set=transformer_base for LM1B.
For the task of recognizing the sentiment of a sentence, use
- the IMDB data-set:
--problem=sentiment_imdb
We suggest using --model=transformer_encoder here and, since it is a small data-set, trying --hparams_set=transformer_tiny and training for a few steps (e.g., --train_steps=2000).
For speech-to-text, we have these data-sets in T2T:
- Librispeech (US English): --problem=librispeech for the whole set and --problem=librispeech_clean for a smaller but nicely filtered part.
- Mozilla Common Voice (US English): --problem=common_voice for the whole set and --problem=common_voice_clean for a quality-checked subset.
For summarizing longer text into a shorter one, we have these data-sets:
- CNN/DailyMail articles summarized into a few sentences:
--problem=summarize_cnn_dailymail32k
We suggest using --model=transformer and --hparams_set=transformer_prepend for this task. This yields good ROUGE scores.
There are a number of translation data-sets in T2T:
- English-German:
--problem=translate_ende_wmt32k
- English-French:
--problem=translate_enfr_wmt32k
- English-Czech:
--problem=translate_encs_wmt32k
- English-Chinese:
--problem=translate_enzh_wmt32k
- English-Vietnamese:
--problem=translate_envi_iwslt32k
- English-Spanish:
--problem=translate_enes_wmt32k
You can get translations in the other direction by appending _rev to the problem name, e.g., for German-English use --problem=translate_ende_wmt32k_rev (note that you still need to download the original data with t2t-datagen --problem=translate_ende_wmt32k).
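The relationship between the two names can be sketched in Python as follows. The helper `datagen_problem` is hypothetical (not a T2T function); it only encodes the rule stated above: train on the `_rev` name, but generate data with the original name.

```python
REV_SUFFIX = "_rev"


def datagen_problem(problem: str) -> str:
    """Name to pass to t2t-datagen: the original, non-reversed problem."""
    # Hypothetical helper: strip a trailing _rev if present, otherwise keep the name.
    if problem.endswith(REV_SUFFIX):
        return problem[: -len(REV_SUFFIX)]
    return problem


# German-English is the reverse of the English-German problem.
train_problem = "translate_ende_wmt32k" + REV_SUFFIX
print("t2t-datagen --problem=" + datagen_problem(train_problem))
print("t2t-trainer --problem=" + train_problem)
```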
For all translation problems, we suggest trying the Transformer model: --model=transformer. At first it is best to try the base setting, --hparams_set=transformer_base. When trained on 8 GPUs for 300K steps this should reach a BLEU score of about 28 on the English-German data-set, which is close to the state of the art. If training on a single GPU, try the --hparams_set=transformer_base_single_gpu setting. For very good results or larger data-sets (e.g., for English-French), try the big model with --hparams_set=transformer_big.
See this example to learn how the translation works.
Here's a walkthrough of training a good English-to-German translation model using the Transformer model from Attention Is All You Need on WMT data.
    pip install tensor2tensor

    # See what problems, models, and hyperparameter sets are available.
    # You can easily swap between them (and add new ones).
    t2t-trainer --registry_help

    PROBLEM=translate_ende_wmt32k
    MODEL=transformer
    HPARAMS=transformer_base_single_gpu

    DATA_DIR=$HOME/t2t_data
    TMP_DIR=/tmp/t2t_datagen
    TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS

    mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

    # Generate data
    t2t-datagen \
      --data_dir=$DATA_DIR \
      --tmp_dir=$TMP_DIR \
      --problem=$PROBLEM

    # Train
    # * If you run out of memory, add --hparams='batch_size=1024'.
    t2t-trainer \
      --data_dir=$DATA_DIR \
      --problem=$PROBLEM \
      --model=$MODEL \
      --hparams_set=$HPARAMS \
      --output_dir=$TRAIN_DIR

    # Decode
    DECODE_FILE=$DATA_DIR/decode_this.txt
    echo "Hello world" >> $DECODE_FILE
    echo "Goodbye world" >> $DECODE_FILE
    echo -e 'Hallo Welt\nAuf Wiedersehen Welt' > ref-translation.de

    BEAM_SIZE=4
    ALPHA=0.6

    t2t-decoder \
      --data_dir=$DATA_DIR \
      --problem=$PROBLEM \
      --model=$MODEL \
      --hparams_set=$HPARAMS \
      --output_dir=$TRAIN_DIR \
      --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" \
      --decode_from_file=$DECODE_FILE \
      --decode_to_file=translation.en

    # See the translations
    cat translation.en

    # Evaluate the BLEU score
    # Note: Report this BLEU score in papers, not the internal approx_bleu metric.
    t2t-bleu --translation=translation.en --reference=ref-translation.de
    # Assumes tensorflow or tensorflow-gpu installed
    pip install tensor2tensor

    # Installs with tensorflow-gpu requirement
    pip install tensor2tensor[tensorflow_gpu]

    # Installs with tensorflow (cpu) requirement
    pip install tensor2tensor[tensorflow]
Binaries:
    # Data generator
    t2t-datagen

    # Trainer
    t2t-trainer --registry_help
Library usage:
python -c "from tensor2tensor.models.transformer import Transformer"
- Many state of the art and baseline models are built-in and new models can be added easily (open an issue or pull request!).
- Many datasets across modalities - text, audio, image - available for generation and use, and new ones can be added easily (open an issue or pull request for public datasets!).
- Models can be used with any dataset and input mode (or even multiple); all modality-specific processing (e.g. embedding lookups for text tokens) is done with bottom and top transformations, which are specified per-feature in the model.
- Support for multi-GPU machines and synchronous (1 master, many workers) and asynchronous (independent workers synchronizing through a parameter server) distributed training.
- Easily swap amongst datasets and models by command-line flag with the data generation script t2t-datagen and the training script t2t-trainer.
- Train on Google Cloud ML and Cloud TPUs.
Problems consist of features such as inputs and targets, and metadata such as each feature's modality (e.g. symbol, image, audio) and vocabularies. Problem features are given by a dataset, which is stored as a TFRecord file with tensorflow.Example protocol buffers. All problems are imported in all_problems.py or are registered with @registry.register_problem. Run t2t-datagen to see the list of available problems and download them.
T2TModels define the core tensor-to-tensor computation. They apply a default transformation to each input and output so that models may deal with modality-independent tensors (e.g. embeddings at the input; and a linear transform at the output to produce logits for a softmax over classes). All models are imported in the models subpackage, inherit from T2TModel, and are registered with @registry.register_model.
Hyperparameter sets are encoded in HParams objects, and are registered with @registry.register_hparams. Every model and problem has an HParams. A basic set of hyperparameters is defined in common_hparams.py, and hyperparameter set functions can compose other hyperparameter set functions.
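The idea of one hyperparameter set function composing another can be sketched without T2T installed. In this illustration plain dictionaries stand in for HParams objects, and the function names and values are invented for the example; they are not T2T's actual sets.

```python
def basic_params():
    """Stand-in for a base hyperparameter set (values are made up)."""
    return {"hidden_size": 512, "num_layers": 6, "batch_size": 4096}


def tiny_params():
    """Compose basic_params() and override only what differs."""
    hparams = basic_params()  # start from the base set
    hparams.update(hidden_size=128, num_layers=2)
    return hparams


print(tiny_params())
```

Because each function builds a fresh object from its parent, a derived set inherits later changes to the base set without copying values by hand.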
The trainer binary is the entrypoint for training, evaluation, and inference. Users can easily switch between problems, models, and hyperparameter sets by using the --model, --problem, and --hparams_set flags. Specific hyperparameters can be overridden with the --hparams flag. --schedule and related flags control local and distributed training/evaluation (distributed training documentation).
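A minimal Python sketch of assembling such an invocation is shown below. The function `trainer_command` is a hypothetical convenience wrapper, not part of T2T; the flags are the ones named in this README, and the sketch assumes that --hparams accepts comma-separated name=value pairs in the same form as the --decode_hparams example in the walkthrough.

```python
import shlex


def trainer_command(problem, model, hparams_set, data_dir, output_dir, overrides=None):
    """Return the argv list for a t2t-trainer run (hypothetical helper)."""
    args = [
        "t2t-trainer",
        f"--data_dir={data_dir}",
        f"--output_dir={output_dir}",
        f"--problem={problem}",
        f"--model={model}",
        f"--hparams_set={hparams_set}",
    ]
    if overrides:
        # Assumed format: comma-separated name=value pairs.
        joined = ",".join(f"{name}={value}" for name, value in overrides.items())
        args.append(f"--hparams={joined}")
    return args


cmd = trainer_command(
    problem="translate_ende_wmt32k",
    model="transformer",
    hparams_set="transformer_base_single_gpu",
    data_dir="t2t_data",
    output_dir="t2t_train",
    overrides={"batch_size": 1024},
)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning a list rather than a string means the result can be passed directly to `subprocess.run` without shell quoting concerns.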
T2T's components are registered using a central registration mechanism that enables easily adding new ones and easily swapping amongst them by command-line flag. You can add your own components without editing the T2T codebase by specifying the --t2t_usr_dir flag in t2t-trainer.
You can do so for models, hyperparameter sets, modalities, and problems. Please do submit a pull request if your component might be useful to others.
See the example_usr_dir for an example user directory.
To add a new dataset, subclass Problem and register it with @registry.register_problem. See TranslateEndeWmt8k for an example. Also see the data generators README.
Click this button to open a Workspace on FloydHub. You can use the workspace to develop and test your code on a fully configured cloud GPU machine.
Tensor2Tensor comes preinstalled in the environment; you can simply open a Terminal and run your code.
    # Test the quick-start on a Workspace's Terminal with this command
    t2t-trainer \
      --generate_data \
      --data_dir=./t2t_data \
      --output_dir=./t2t_train/mnist \
      --problem=image_mnist \
      --model=shake_shake \
      --hparams_set=shake_shake_quick \
      --train_steps=1000 \
      --eval_steps=100
Note: Ensure compliance with the FloydHub Terms of Service.
When referencing Tensor2Tensor, please cite this paper.
    @article{tensor2tensor,
      author  = {Ashish Vaswani and Samy Bengio and Eugene Brevdo and
                 Francois Chollet and Aidan N. Gomez and Stephan Gouws and
                 Llion Jones and \L{}ukasz Kaiser and Nal Kalchbrenner and
                 Niki Parmar and Ryan Sepassi and Noam Shazeer and
                 Jakob Uszkoreit},
      title   = {Tensor2Tensor for Neural Machine Translation},
      journal = {CoRR},
      volume  = {abs/1803.07416},
      year    = {2018},
      url     = {http://arxiv.org/abs/1803.07416},
    }
Tensor2Tensor was used to develop a number of state-of-the-art models and deep learning methods. Here we list some papers that were based on T2T from the start and benefited from its features and architecture in ways described in the Google Research Blog post introducing T2T.
- Attention Is All You Need
- Depthwise Separable Convolutions for Neural Machine Translation
- One Model To Learn Them All
- Discrete Autoencoders for Sequence Models
- Generating Wikipedia by Summarizing Long Sequences
- Image Transformer
- Training Tips for the Transformer Model
- Self-Attention with Relative Position Representations
- Fast Decoding in Sequence Models using Discrete Latent Variables
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
- Universal Transformers
- Attending to Mathematical Language with Transformers
- The Evolved Transformer
- Model-Based Reinforcement Learning for Atari
- VideoFlow: A Flow-Based Generative Model for Video
NOTE: This is not an official Google product.