torralba-lab/im2recipe

Code supporting the CVPR 2017 paper "Learning Cross-modal Embeddings for Cooking Recipes and Food Images"

This repository contains the code to train and evaluate models from the paper:
Learning Cross-modal Embeddings for Cooking Recipes and Food Images

Clone it using:

git clone --recursive https://github.com/torralba-lab/im2recipe.git

If you find this code useful, please consider citing:

@inproceedings{salvador2017learning,
  title={Learning Cross-modal Embeddings for Cooking Recipes and Food Images},
  author={Salvador, Amaia and Hynes, Nicholas and Aytar, Yusuf and Marin, Javier and Ofli, Ferda and Weber, Ingmar and Torralba, Antonio},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2017}
}

Contents

  1. Installation
  2. Recipe1M Dataset
  3. Vision models
  4. Out-of-the-box training
  5. Prepare training data
  6. Training
  7. Testing
  8. Visualization
  9. Pretrained model
  10. Contact

Installation

Install Torch:

git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh

Install the following packages:

luarocks install torch
luarocks install nn
luarocks install image
luarocks install optim
luarocks install rnn
luarocks install loadcaffe
luarocks install moonscript

Install CUDA and cudnn. Then run:

luarocks install cutorch
luarocks install cunn
luarocks install cudnn

A custom fork of torch-hdf5 with string support is needed:

cd ~/torch/extra
git clone https://github.com/nhynes/torch-hdf5.git
cd torch-hdf5
git checkout chars2
luarocks build hdf5-0-0.rockspec

We use Python 2.7 for data processing. Install dependencies with pip install -r requirements.txt.

Recipe1M Dataset

In order to get access to the dataset, please fill out the form here.

Vision models

We used the following pretrained vision models:

  • VGG-16 (Caffe format): when training, point the arguments -proto and -caffemodel to the downloaded .prototxt and .caffemodel files.
  • ResNet-50 (Torch format): when training, point the argument -resnet_model to the downloaded .t7 file.
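For orientation, training can then be pointed at either backbone roughly as follows (paths are placeholders; the ResNet line mirrors the Training section below, while the -net value on the VGG line is an assumption, since this README only names the -proto and -caffemodel arguments):

th main.lua -net resnet -resnet_model /path/to/resnet-50.t7
th main.lua -net vgg -proto /path/to/vgg16.prototxt -caffemodel /path/to/vgg16.caffemodel   # -net vgg is assumed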

Out-of-the-box training

To train the model, you will need the following files:

  • data/data.h5: HDF5 file containing skip-instructions vectors, ingredient ids, categories and preprocessed images.
  • data/text/vocab.bin: ingredient Word2Vec vocabulary. Used during training to select word2vec vectors given ingredient ids.

The links to download them are available here.

Prepare training data

We also provide the steps to format and prepare Recipe1M data for training the trijoint model. We hope these instructions will allow others to train similar models with other data sources as well.

Choosing semantic categories

We provide the script we used to extract semantic categories from bigrams in recipe titles:

  • Run python bigrams --crtbgrs. This will save to disk all bigrams in the corpus of recipe titles in the training set, sorted by frequency.
  • Running the same script with --nocrtbgrs will create class labels from those bigrams, adding the Food-101 categories.

These steps will create a file called classes1M.pkl in ./data/ that will be used later to create the HDF5 file including categories.
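In sequence (commands exactly as described above; the README does not spell out where the bigrams script lives, so run it from the directory that contains it):

python bigrams --crtbgrs     # dump all recipe-title bigrams in the training set, sorted by frequency
python bigrams --nocrtbgrs   # build class labels from those bigrams plus the Food-101 categories
# result: ./data/classes1M.pkl, consumed later when building the HDF5 file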

Word2Vec

Training word2vec with recipe data:

  • Run python tokenize_instructions.py train to create a single file with all training recipe text.
  • Run the same script, python tokenize_instructions.py, to generate the same file with data for all partitions (needed for skip-thoughts later).
  • Download and compile word2vec.
  • Train with:
./word2vec -hs 1 -negative 0 -window 10 -cbow 0 -iter 10 -size 300 -binary 1 -min-count 10 -threads 20 -train tokenized_instructions_train.txt -output vocab.bin
  • Run python get_vocab.py vocab.bin to extract dictionary entries from the w2v binary file. This script will save vocab.txt, which will be used to create the dataset later.
  • Move vocab.bin and vocab.txt to ./data/text/.
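Putting the steps above together, a minimal sketch of the whole word2vec pipeline (paths are placeholders, and it assumes the tokenization scripts and the compiled word2vec binary are reachable from the repository root):

python tokenize_instructions.py train    # training-set text -> tokenized_instructions_train.txt
python tokenize_instructions.py          # all partitions -> tokenized_instructions.txt (used by skip-instructions)
./word2vec -hs 1 -negative 0 -window 10 -cbow 0 -iter 10 -size 300 -binary 1 \
  -min-count 10 -threads 20 -train tokenized_instructions_train.txt -output vocab.bin
python get_vocab.py vocab.bin            # writes vocab.txt
mkdir -p data/text && mv vocab.bin vocab.txt data/text/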

Skip-instructions

  • Navigate to th-skip.
  • Create the directories where data will be stored:
mkdir data
mkdir snaps
  • Prepare the dataset, running from the scripts directory:
python mk_dataset.py --dataset /path/to/recipe1M/ --vocab /path/to/w2v/vocab.txt --toks /path/to/tokenized_instructions.txt

where tokenized_instructions.txt contains text instructions for the entire dataset (generated in step 2 of the Word2Vec section above), and vocab.txt contains the entries of the word2vec dictionary (generated with get_vocab.py in the previous section).

  • Train the model with:
moon main.moon -dataset data/dataset.h5 -dim 1024 -nEncRNNs 2 -snapfile snaps/snapfile -savefreq 500 -batchSize 128 -w2v /path/to/w2v/vocab.bin
  • Get the encoder from the trained model. From scripts:
moon extract_encoder.moon ../snaps/snapfile_xx.t7 encoder.t7 true
  • Extract features. From scripts:
moon encode.moon -data ../data/dataset.h5 -model encoder.t7 -partition test -out encs_test_1024.t7

Run this for -partition = {train,val,test} and -out = {encs_train_1024,encs_val_1024,encs_test_1024} to extract features for the whole dataset (a loop over the three runs is sketched after this list).

  • Move the files encs_*_1024.t7 containing the skip-instructions features to ./data/text.
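From the scripts directory, the three encoding runs can be scripted in one go; a minimal sketch, assuming the encoder.t7 produced in the previous step is in the current directory:

# encode all three partitions with the trained skip-instructions encoder
for part in train val test; do
  moon encode.moon -data ../data/dataset.h5 -model encoder.t7 -partition $part -out encs_${part}_1024.t7
done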

Creating HDF5 file

Navigate back to ./. Run the following from ./pyscripts:

python mk_dataset.py -vocab /path/to/w2v/vocab.txt -dataset /path/to/recipe1M/ -h5_data /path/to/h5/outfile/data.h5 -stvecs /path/to/skip-instr_files/
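If the HDF5 command-line utilities are installed, you can sanity-check the generated file before training by listing its contents (the dataset names inside data.h5 are not documented here, so simply inspect what was written):

h5ls -r /path/to/h5/outfile/data.h5   # recursively list groups, datasets and their shapes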

Training

  • Train the model with:
th main.lua -dataset /path/to/h5/file/data.h5 -ingrW2V /path/to/w2v/vocab.bin -net resnet -resnet_model /path/to/resnet/model/resnet-50.t7 -snapfile snaps/snap -dispfreq 1000 -valfreq 10000

Note: Again, this can be run without arguments with default parameters if files are in the default location.

  • You can use multiple GPUs to train the model with the -ngpus flag. With 4 GTX Titan X GPUs you can set -batchSize to ~150. This is the default configuration, with which the model converges in about 3 days.
  • Plot loss curves at any time with python plotcurve.py -logfile /path/to/logfile.txt. If dispfreq and valfreq differ from their defaults, they need to be passed as arguments to this script for the curves to be displayed correctly. Running this script will also give you the elapsed training time. logfile.txt should contain the stdout of main.lua; redirect it with th main.lua > /path/to/logfile.txt.
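As an illustration only, a multi-GPU run that logs stdout for plotting might look like this (all paths are placeholders; the -ngpus and -batchSize values are the ones mentioned above, and data/data.h5 and data/text/vocab.bin are the default-location files from the Out-of-the-box training section):

th main.lua -dataset data/data.h5 -ingrW2V data/text/vocab.bin -net resnet \
  -resnet_model /path/to/resnet-50.t7 -snapfile snaps/snap \
  -ngpus 4 -batchSize 150 > /path/to/logfile.txt
python plotcurve.py -logfile /path/to/logfile.txt   # loss curves and elapsed training time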

Testing

  • Extract features from the test set with th main.lua -test 1 -loadsnap snaps/snap_xx.dat. They will be saved in results.
  • After feature extraction, compute MedR and recall scores with python rank.py.
  • Extracting embeddings for any dataset partition is possible with the extract flag, which can be either train, val or test (default).
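For example, to extract embeddings for every partition from one snapshot and then score the test split (the snapshot name is a placeholder taken from the bullet above, and the exact spelling of the partition flag, written here as -extract, is an assumption based on the description above):

# extract embeddings for train, val and test from the same snapshot
for part in train val test; do
  th main.lua -test 1 -loadsnap snaps/snap_xx.dat -extract $part
done
python rank.py   # MedR and recall scores on the features saved in results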

Visualization

We provide a script to visualize top-1 im2recipe examples in ./pyscripts/vis.py. It will save figures under ./data/figs/.

Pretrained model

Our best model can be downloaded here. You can test it with:

th main.lua -test 1 -loadsnap im2recipe_model.t7

Contact

For any questions or suggestions you can use the issues section or reach us at amaia.salvador@upc.edu or nhynes@mit.edu.
