[IJCAI 2022] Official PyTorch code for the paper “S2 Transformer for Image Captioning”


Official code implementation for the paper "S2 Transformer for Image Captioning"
Pengpeng Zeng, Haonan Zhang, Jingkuan Song, and Lianli Gao

[Figure: Relationship-Sensitive Transformer]

Table of Contents

  • Environment setup
  • Data Preparation
  • Training
  • Evaluation
  • Reference and Citation
  • Acknowledgements

Environment setup

Clone this repository and create the m2release conda environment using the environment.yml file:

conda env create -f environment.yml
conda activate m2release

Then download spacy data by executing the following command:

python -m spacy download en_core_web_md

Note

Python 3 is required to run our code. If you have network problems, please download the en_core_web_md library from here, unzip it, and place it in /your/anaconda/path/envs/m2release/lib/python*/site-packages/.
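As an optional sanity check (a minimal sketch, not part of the original scripts), you can verify that the model is installed and loadable:

```python
# Optional sanity check: confirm that en_core_web_md is installed and loadable.
import spacy

nlp = spacy.load("en_core_web_md")
doc = nlp("A man riding a horse on the beach.")
print([token.text for token in doc])  # prints the tokenized sentence
```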

Data Preparation

  • Annotation. Download the annotation file m2_annotations [1]. Extract it and put it in the project root directory.
  • Feature. Download the processed ResNeXt-101 and ResNeXt-152 image features [2] (code 9vtB) and put them in the project root directory (see the loading sketch below).

Update: Image features are also available on OneDrive.
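If you want to inspect the downloaded features before training, the file is a standard HDF5 archive and can be opened with h5py. The path and per-image key layout below are illustrative assumptions; list the keys of your own file to see its actual structure.

```python
# Sketch: peek inside the pre-extracted grid features.
# "/path/to/features.hdf5" and the per-image key layout are assumptions
# for illustration; inspect your own file to confirm.
import h5py

with h5py.File("/path/to/features.hdf5", "r") as f:
    keys = list(f.keys())
    print(f"{len(keys)} entries, e.g. {keys[:3]}")
    sample = f[keys[0]][()]  # typically an array of grid features for one image
    print("feature array shape:", sample.shape)
```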

Training

Run python train_transformer.py using the following arguments:

| Argument | Possible values |
| --- | --- |
| `--exp_name` | Experiment name |
| `--batch_size` | Batch size (default: 50) |
| `--workers` | Number of workers; accelerates model training in the XE stage |
| `--head` | Number of heads (default: 8) |
| `--resume_last` | If used, training resumes from the last checkpoint |
| `--resume_best` | If used, training resumes from the best checkpoint |
| `--features_path` | Path to the visual features file (h5py) |
| `--annotation_folder` | Path to the annotations |
| `--num_clusters` | Number of pseudo regions |

For example, to train the model, run the following command:

python train_transformer.py --exp_name S2 --batch_size 50 --m 40 --head 8 --features_path /path/to/features --num_clusters 5

or just run:

bash train.sh

Note

We use torch.distributed to train our model. You can set the worldSize in train_transformer.py to choose the number of GPUs for training; a minimal illustrative sketch of this kind of setup is shown below.
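For orientation only, the sketch below shows how a typical torch.distributed multi-GPU launch is wired up. The worldSize variable mirrors the one mentioned above, but the actual initialization lives in train_transformer.py and may differ in detail.

```python
# Illustrative sketch of a torch.distributed multi-GPU setup (not the repo's
# exact code): one process per GPU, joined into a single process group.
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    dist.init_process_group("nccl",
                            init_method="tcp://127.0.0.1:23456",
                            rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    # ... build the model, wrap it in torch.nn.parallel.DistributedDataParallel,
    # shard the data with a DistributedSampler, and run the training loop ...
    dist.destroy_process_group()

if __name__ == "__main__":
    worldSize = 4  # number of GPUs, analogous to worldSize in train_transformer.py
    mp.spawn(worker, args=(worldSize,), nprocs=worldSize)
```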

Evaluation

Offline Evaluation.

Run python test_transformer.py to evaluate the model using the following arguments:

python test_transformer.py --batch_size 10 --features_path /path/to/features --model_path /path/to/saved_transformer_models/ckpt --num_clusters 5

Tip

We removed the SPICE evaluation metric during training because it is time-consuming. You can add it back when evaluating the model: download this file and put it in /path/to/evaluation/, then uncomment the relevant code in __init__.py.
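For reference, the standard COCO caption metrics (including SPICE once it is re-enabled) are commonly computed with scorer classes like those in the pycocoevalcap package. The repo bundles its own evaluation module, so the sketch below only illustrates the general scorer interface.

```python
# Sketch of the usual COCO caption scorer interface (pycocoevalcap-style);
# the repo ships its own evaluation module, so this is illustrative only.
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.spice.spice import Spice  # requires Java and the SPICE jar

# reference captions and generated captions, keyed by image id
gts = {"391895": ["a man riding a horse on the beach"]}
res = {"391895": ["a man rides a horse along the shore"]}

for name, scorer in [("BLEU", Bleu(4)), ("CIDEr", Cider()), ("SPICE", Spice())]:
    score, _ = scorer.compute_score(gts, res)
    print(name, score)
```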

We provide a checkpoint here; using it, you should get the following results (second row):

| Model | B@1 | B@4 | M | R | C | S |
| --- | --- | --- | --- | --- | --- | --- |
| Our Paper (ResNeXt-101) | 81.1 | 39.6 | 29.6 | 59.1 | 133.5 | 23.2 |
| Reproduced Model (ResNeXt-101) | 81.2 | 39.9 | 29.6 | 59.1 | 133.7 | 23.3 |

Online Evaluation

We also report the performance of our model on the online COCO test server with an ensemble of four S2 models. The detailed online test code can be found in this repo.

Reference and Citation

Reference

[1] Cornia, M., Stefanini, M., Baraldi, L., & Cucchiara, R. (2020). Meshed-memory transformer for image captioning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[2] Zhang, X., Sun, X., Luo, Y., Ji, J., Zhou, Y., Wu, Y., Huang, F., & Ji, R. (2021). RSTNet: Captioning with adaptive attention on visual and non-visual words. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15465–15474.

Citation

@inproceedings{S2,
  author    = {Pengpeng Zeng* and
               Haonan Zhang* and
               Jingkuan Song and
               Lianli Gao},
  title     = {S2 Transformer for Image Captioning},
  booktitle = {IJCAI},
  pages     = {1608--1614},
  year      = {2022}
}

Acknowledgements

Thanks to Zhang et al. for releasing the visual features (ResNeXt-101 and ResNeXt-152). Our code implementation is also based on their repo.
Thanks also to M2 Transformer for the original annotations and to grid-feats-vqa for the effective visual representations.

