This is a codebase for image captioning research.
It supports:
- Self critical training from Self-critical Sequence Training for Image Captioning
- Bottom up feature from ref.
- Test time ensemble
- Multi-GPU training. (DistributedDataParallel is now supported with the help of pytorch-lightning; see ADVANCED.md for details.)
- Transformer captioning model.
A simple demo Colab notebook is available here.
Requirements:

- Python 3
- PyTorch 1.3+ (along with torchvision)
- cider (already been added as a submodule)
- coco-caption (already been added as a submodule) (remember to follow the initialization steps in coco-caption/README.md; see the sketch after this list)
- yacs
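Since cider and coco-caption are added as git submodules, they have to be fetched explicitly. A minimal sketch of the setup (the get_stanford_models.sh step is only an assumption about what coco-caption/README.md asks for, so double-check that file):

$ git submodule update --init --recursive
$ cd coco-caption && bash get_stanford_models.sh  # assumed setup step; follow coco-caption/README.md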
For pretrained models, check out MODEL_ZOO.md.
If you want to do evaluation only, you can follow the evaluation section below after downloading the pretrained models (and also the pretrained resnet101 or precomputed bottom-up features; see data/README.md).
We now support both flickr30k and COCO. See details in data/README.md. (Note: the later sections assume the COCO dataset; it should be trivial to use flickr30k.)
$ python tools/train.py --id fc --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_fc --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30
or
$ python tools/train.py --cfg configs/fc.yml --id fc
The train script will dump checkpoints into the folder specified by --checkpoint_path (default = log_$id/). By default, only the best-performing checkpoint on validation and the latest checkpoint are saved, to save disk space. You can also set --save_history_ckpt to 1 to save every checkpoint.
To resume training, set the --start_from option to the path saving infos.pkl and model.pth (usually you can just set --start_from and --checkpoint_path to be the same).
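For example, to resume the FC model trained above, reusing only flags that already appear in this README (a sketch, not a separate recipe):

$ python tools/train.py --cfg configs/fc.yml --id fc --start_from log_fc --checkpoint_path log_fc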
To check the training curve or validation curve, you can use tensorboard. The loss histories are automatically dumped into --checkpoint_path.
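For example, assuming the checkpoint path used in the command above:

$ tensorboard --logdir log_fc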
The command above uses scheduled sampling; you can set --scheduled_sampling_start to -1 to turn off scheduled sampling.
If you'd like to evaluate BLEU/METEOR/CIDEr scores during training in addition to validation cross entropy loss, use the --language_eval 1 option, but don't forget to pull the submodule coco-caption.
All the arguments can also be specified in a yaml file and passed with --cfg. Configurations given on the command line override those in the cfg file when they conflict.
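For instance, the following uses configs/fc.yml but overrides the batch size on the command line (the value 16 is arbitrary, just to illustrate the override):

$ python tools/train.py --cfg configs/fc.yml --id fc --batch_size 16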
For more options, see opts.py.
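Since the options are defined with argparse, you can also list them directly from the command line (assuming standard argparse behavior):

$ python tools/train.py --help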
To train with self-critical (reward-based) training, first preprocess the dataset and get the cache for calculating the CIDEr score:
$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
Then copy the model pretrained with cross entropy. (Copying is not mandatory; it just keeps a backup.)
$ bash scripts/copy_model.sh fc fc_rl
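Conceptually, this step just seeds the new checkpoint folder from the cross-entropy one. A rough equivalent, assuming the model.pth / infos.pkl layout described above (the real script may name or pick files differently, so prefer scripts/copy_model.sh):

$ mkdir -p log_fc_rl
$ cp log_fc/model.pth log_fc/infos.pkl log_fc_rl/  # assumed file names; see scripts/copy_model.sh for the actual logic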
Then run:
$ python tools/train.py --id fc_rl --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-5 --start_from log_fc_rl --checkpoint_path log_fc_rl --save_checkpoint_every 6000 --language_eval 1 --val_images_use 5000 --self_critical_after 30 --cached_tokens coco-train-idxs --max_epochs 50 --train_sample_n 5
or
$ python tools/train.py --cfg configs/fc_rl.yml --id fc_rl
You will see a huge boost on the CIDEr score :).
A few notes on training: starting self-critical training after 30 epochs, the CIDEr score goes up to 1.05 after 600k iterations (including the 30 epochs of pretraining).
Note: this doesn't work for models trained with bottom-up features.
Now place all your images of interest into a folder, e.g. blah, and run the eval script:
$ python tools/eval.py --model model.pth --infos_path infos.pkl --image_folder blah --num_images 10
This tells the eval script to run up to 10 images from the given folder. If you have a big GPU you can speed up the evaluation by increasing batch_size. Use --num_images -1 to process all images. The eval script will create a vis.json file inside the vis folder, which can then be visualized with the provided HTML interface:
$ cd vis
$ python -m SimpleHTTPServer
Now visit localhost:8000 in your browser and you should see your predicted captions.
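Note that SimpleHTTPServer is the Python 2 name of this module; with the Python 3 interpreter listed in the requirements, the equivalent built-in server is:

$ python -m http.server 8000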
To evaluate on the (default) test split of the preprocessed dataset:
$ python tools/eval.py --dump_images 0 --num_images 5000 --model model.pth --infos_path infos.pkl --language_eval 1
The default split to evaluate is test. The default inference method is greedy decoding (--sample_method greedy); to sample from the posterior, set --sample_method sample.
Beam Search. Beam search can improve performance over greedy decoding by ~5%. However, it is a little more expensive. To turn on beam search, use --beam_size N, where N should be greater than 1.
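For example, combining the evaluation flags shown above with a beam size of 5:

$ python tools/eval.py --dump_images 0 --num_images 5000 --model model.pth --infos_path infos.pkl --language_eval 1 --beam_size 5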
To evaluate on the official COCO test set (there are no ground-truth labels, hence --input_label_h5 none and --language_eval 0):
$ python tools/eval.py --input_json cocotest.json --input_fc_dir data/cocotest_bu_fc --input_att_dir data/cocotest_bu_att --input_label_h5 none --num_images -1 --model model.pth --infos_path infos.pkl --language_eval 0
You can download the preprocessed files cocotest.json, cocotest_bu_att and cocotest_bu_fc from this link.
Using cpu. The code currently uses GPU by default; there is no option to switch. If someone really needs a CPU model, please open an issue; I can potentially create a CPU checkpoint and modify eval.py to run the model on CPU. However, there's no point using CPUs to train the model.
Train on other dataset. It should be trivial to port if you can create a file like dataset_coco.json for your own dataset.
Live demo. Not supported for now. Pull requests are welcome.
For more advanced usage, check out ADVANCED.md.
If you find this repo useful, please consider citing (no obligation at all):
@article{luo2018discriminability,
  title={Discriminability objective for training descriptive captions},
  author={Luo, Ruotian and Price, Brian and Cohen, Scott and Shakhnarovich, Gregory},
  journal={arXiv preprint arXiv:1803.04376},
  year={2018}
}
Of course, please also cite the original papers of the models you are using (you can find the references in the model files).
Thanks to the original neuraltalk2 and the awesome PyTorch team.