Official PyTorch implementation of the IJCAI 2022 paper S2 Transformer for Image Captioning
Pengpeng Zeng, Haonan Zhang, Jingkuan Song, and Lianli Gao
Clone this repository and create the m2release conda environment using the environment.yml file:
conda env create -f environment.yml
conda activate m2release
Then download the spaCy data by executing the following command:
python -m spacy download en_core_web_md
Note
Python 3 is required to run our code. If you encounter network problems, please download the en_core_web_md library from here, unzip it, and place it in /your/anaconda/path/envs/m2release/lib/python*/site-packages/
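To verify the model is reachable from the m2release environment (whether installed via the download command or copied manually), a quick check like the following can be run; the sample sentence is arbitrary.

```python
# Optional sanity check: confirm the spaCy model loads from the m2release environment.
import spacy

nlp = spacy.load("en_core_web_md")            # raises OSError if the model is not installed
doc = nlp("a man riding a horse on a beach")  # arbitrary sample sentence
print(len(doc), doc[0].vector.shape)          # token count and word-vector dimensionality
```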
- Annotation. Download the annotation file m2_annotations [1]. Extract it and put it in the project root directory.
- Feature. Download the processed ResNeXt-101 and ResNeXt-152 image features [2] (code: 9vtB) and put them in the project root directory; a quick sanity check for the downloaded feature file is sketched below.
Update: the image features are also available on OneDrive.
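As a quick sanity check on the downloaded features, the HDF5 file can be opened with h5py. The file name and dataset layout below are assumptions for illustration; the keys in the released files may be named or grouped differently.

```python
import h5py

# Hypothetical file name; point this at the downloaded ResNeXt feature file.
features_path = "coco_grid_feats.hdf5"

with h5py.File(features_path, "r") as f:
    keys = list(f.keys())
    print(f"{len(keys)} top-level entries; first few keys: {keys[:5]}")
    first = f[keys[0]]
    # The entry may be a dataset or a group depending on how the file is organized.
    if isinstance(first, h5py.Dataset):
        print("shape:", first.shape, "dtype:", first.dtype)
    else:
        print("group with members:", list(first.keys()))
```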
Run python train_transformer.py using the following arguments:
Argument | Possible values |
---|---|
--exp_name | Experiment name |
--batch_size | Batch size (default: 50) |
--workers | Number of data-loading workers (accelerates training in the XE stage). |
--head | Number of heads (default: 8) |
--resume_last | If used, the training will be resumed from the last checkpoint. |
--resume_best | If used, the training will be resumed from the best checkpoint. |
--features_path | Path to visual features file (h5py) |
--annotation_folder | Path to annotations |
--num_clusters | Number of pseudo regions |
For example, to train the model, run the following command:
python train_transformer.py --exp_name S2 --batch_size 50 --m 40 --head 8 --features_path /path/to/features --num_clusters 5
or just run:
bash train.sh
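For reference, the flags above map naturally onto an argparse parser. The sketch below is only illustrative: defaults other than those stated in the table are placeholder assumptions, not the values used by train_transformer.py.

```python
import argparse

# Illustrative parser mirroring the documented flags; defaults not given in the
# table above are placeholders rather than the repository's actual values.
parser = argparse.ArgumentParser(description="S2 Transformer training (illustrative)")
parser.add_argument("--exp_name", type=str, default="S2", help="Experiment name")
parser.add_argument("--batch_size", type=int, default=50, help="Batch size")
parser.add_argument("--workers", type=int, default=4, help="Data-loading workers")
parser.add_argument("--head", type=int, default=8, help="Number of attention heads")
parser.add_argument("--resume_last", action="store_true", help="Resume from the last checkpoint")
parser.add_argument("--resume_best", action="store_true", help="Resume from the best checkpoint")
parser.add_argument("--features_path", type=str, required=True, help="Path to HDF5 visual features")
parser.add_argument("--annotation_folder", type=str, default="m2_annotations", help="Path to annotations")
parser.add_argument("--num_clusters", type=int, default=5, help="Number of pseudo regions")

if __name__ == "__main__":
    print(parser.parse_args())
```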
Note
We use torch.distributed to train our model; you can set worldSize in train_transformer.py to determine the number of GPUs used for training.
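As a generic illustration of the mechanism behind that setting (not the repository's actual launcher code), single-node multi-GPU training with torch.distributed typically spawns one process per GPU, roughly as follows:

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    # One process per GPU; NCCL is the usual backend for CUDA tensors.
    dist.init_process_group("nccl", init_method="tcp://127.0.0.1:29500",
                            rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    model = torch.nn.Linear(512, 512).cuda(rank)  # stand-in for the captioning model
    model = DDP(model, device_ids=[rank])
    # ... build dataloaders with DistributedSampler and run the training loop ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2  # analogous to the worldSize setting in train_transformer.py
    mp.spawn(run, args=(world_size,), nprocs=world_size)
```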
Run python test_transformer.py to evaluate the model with the following arguments:
python test_transformer.py --batch_size 10 --features_path /path/to/features --model_path /path/to/saved_transformer_models/ckpt --num_clusters 5
Tip
We have removed the SPICE evaluation metric during training because it is time-consuming. You can add it back when evaluating the model: download this file and put it in /path/to/evaluation/, then uncomment the corresponding code in __init__.py.
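For reference, the scorers in evaluation/ are based on the standard COCO caption metrics. The standalone sketch below uses the pycocoevalcap package as an assumption for illustration; the repo's own wrappers may expose a different interface, and the SPICE scorer additionally needs the Java-based jar referenced above.

```python
# Standalone metric computation with pycocoevalcap (pip install pycocoevalcap).
# Illustrative only: the repo's evaluation/ package wraps scorers like these,
# but its exact interface may differ.
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# Keys are image ids; values are lists of reference / generated captions (toy data).
gts = {0: ["a man riding a horse on a beach", "a person rides a horse near the ocean"],
       1: ["two dogs play in the snow", "dogs playing in snow"]}
res = {0: ["a man is riding a horse on the beach"],
       1: ["two dogs are playing in the snow"]}

bleu, _ = Bleu(4).compute_score(gts, res)   # list of BLEU-1..BLEU-4
cider, _ = Cider().compute_score(gts, res)  # corpus-level CIDEr (degenerate on toy data)
print("BLEU-4:", bleu[3], "CIDEr:", cider)
```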
We provide a checkpoint here; using it, you should get the following results (second row):
Model | B@1 | B@4 | M | R | C | S |
---|---|---|---|---|---|---|
Our Paper (ResNeXt-101) | 81.1 | 39.6 | 29.6 | 59.1 | 133.5 | 23.2 |
Reproduced Model (ResNeXt-101) | 81.2 | 39.9 | 29.6 | 59.1 | 133.7 | 23.3 |
We also report the performance of our model on the online COCO test server with an ensemble of four S2 models. The detailed online test code can be obtained in this repo.
[1] Cornia, M., Stefanini, M., Baraldi, L., & Cucchiara, R. (2020). Meshed-memory transformer for image captioning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[2] Zhang, X., Sun, X., Luo, Y., Ji, J., Zhou, Y., Wu, Y., Huang, F., & Ji, R. (2021). RSTNet: Captioning with adaptive attention on visual and non-visual words. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 15465–15474).
@inproceedings{S2,
  author    = {Pengpeng Zeng* and Haonan Zhang* and Jingkuan Song and Lianli Gao},
  title     = {S2 Transformer for Image Captioning},
  booktitle = {IJCAI},
  pages     = {1608--1614},
  year      = {2022}
}
Thanks to Zhang et al. for releasing the visual features (ResNeXt-101 and ResNeXt-152); our code implementation is also based on their repo.
Thanks also for the original annotations prepared by M2 Transformer and the effective visual representations from grid-feats-vqa.