This repository contains the code for the following paper:
- R. Hu, J. Andreas, T. Darrell, K. Saenko, *Explainable Neural Computation via Stack Neural Module Networks*. In ECCV, 2018. (PDF)
```
@inproceedings{hu2018explainable,
  title={Explainable Neural Computation via Stack Neural Module Networks},
  author={Hu, Ronghang and Andreas, Jacob and Darrell, Trevor and Saenko, Kate},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2018}
}
```

Project Page: http://ronghanghu.com/snmn
- Install Python 3 (Anaconda recommended: https://www.continuum.io/downloads).
- Install TensorFlow (we used TensorFlow 1.5.0 in our experiments): `pip install tensorflow-gpu` (or `pip install tensorflow-gpu==1.5.0` to install TensorFlow 1.5.0).
- Download this repository or clone with Git, and then enter the root directory of the repository: `git clone https://github.com/ronghanghu/snmn.git && cd snmn`
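For example, a minimal sketch combining the steps above with Anaconda (the environment name `snmn` and the exact Python 3 minor version are arbitrary choices, not requirements):

```
# create and activate a fresh Python 3 environment (name is arbitrary)
conda create -n snmn python=3.6
conda activate snmn

# install the TensorFlow version used in our experiments
pip install tensorflow-gpu==1.5.0
```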
Note (08/04/2019): there was previously an error in the released code -- gradient clipping was missing, causing training to be unstable (especially for VQAv1 and VQAv2). This error has now been fixed.
- Download the CLEVR dataset from http://cs.stanford.edu/people/jcjohns/clevr/, and symlink it to `exp_clevr_snmn/clevr_dataset`. After this step, the file structure should look like

```
exp_clevr_snmn/clevr_dataset/
  images/
    train/
      CLEVR_train_000000.png
      ...
    val/
    test/
  questions/
    CLEVR_train_questions.json
    CLEVR_val_questions.json
    CLEVR_test_questions.json
  ...
```
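For example, assuming the CLEVR dataset was downloaded and unpacked to `/path/to/CLEVR_v1.0` (an assumed path; adjust to your setup), the symlink can be created from the repository root with:

```
# link the unpacked CLEVR directory (containing images/ and questions/)
# to the location expected by the code
ln -s /path/to/CLEVR_v1.0 exp_clevr_snmn/clevr_dataset
```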
- (Optional) If you want to run any experiments on the CLEVR-Ref dataset for the referential expression grounding task, you can download it from here, and symlink it to `exp_clevr_snmn/clevr_loc_dataset`. After this step, the file structure should look like

```
exp_clevr_snmn/clevr_loc_dataset/
  images/
    loc_train/
      CLEVR_loc_train_000000.png
      ...
    loc_val/
    loc_test/
  questions/
    CLEVR_loc_train_questions.json
    CLEVR_loc_val_questions.json
    CLEVR_loc_test_questions.json
  ...
```

- Extract visual features from the images and store them on the disk. In our experiments, we extract visual features using the ResNet-101 C4 block. Then, construct the "expert layout" from the ground-truth functional programs, and build image collections (imdb) for CLEVR (and CLEVR-Ref). These procedures can be done as follows.
```
./exp_clevr_snmn/tfmodel/resnet/download_resnet_v1_101.sh  # download ResNet-101
cd ./exp_clevr_snmn/data/
python extract_resnet101_c4.py        # feature extraction
python get_ground_truth_layout.py     # construct expert policy
python build_clevr_imdb.py            # build image collections
cd ../../

# (Optional, if you want to run on the CLEVR-Ref dataset)
cd ./exp_clevr_snmn/data/
python extract_resnet101_c4_loc.py    # feature extraction
python get_ground_truth_layout_loc.py # construct expert policy
python build_clevr_imdb_loc.py        # build image collections
cd ../../
```

Add the root of this repository to PYTHONPATH:
```
export PYTHONPATH=.:$PYTHONPATH
```

Train on the CLEVR dataset for VQA:

- with ground-truth layout: `python exp_clevr_snmn/train_net_vqa.py --cfg exp_clevr_snmn/cfgs/vqa_gt_layout.yaml`
- without ground-truth layout: `python exp_clevr_snmn/train_net_vqa.py --cfg exp_clevr_snmn/cfgs/vqa_scratch.yaml`
(Optional) Train on the CLEVR-Ref dataset for the REF task:
- with ground-truth layout: `python exp_clevr_snmn/train_net_loc.py --cfg exp_clevr_snmn/cfgs/loc_gt_layout.yaml`
- without ground-truth layout: `python exp_clevr_snmn/train_net_loc.py --cfg exp_clevr_snmn/cfgs/loc_scratch.yaml`
(Optional) Train jointly on the CLEVR and CLEVR-Ref datasets for VQA and REF tasks:
- with ground-truth layout: `python exp_clevr_snmn/train_net_joint.py --cfg exp_clevr_snmn/cfgs/joint_gt_layout.yaml`
- without ground-truth layout: `python exp_clevr_snmn/train_net_joint.py --cfg exp_clevr_snmn/cfgs/joint_scratch.yaml`
Note:
- By default, the above scripts use GPU 0. To run on a different GPU, append the `GPU_ID` parameter to the commands above (e.g. appending `GPU_ID 2` to use GPU 2; see the example below). During training, the script will write TensorBoard events to `exp_clevr_snmn/tb/{exp_name}/` and save the snapshots under `exp_clevr_snmn/tfmodel/{exp_name}/`.
- When training without ground-truth layout, there is some variance in performance between runs, and training sometimes gets stuck in poor local minima. In our experiments, before evaluating on the test split, we ran 4 trials and selected the best one based on validation performance.
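For example, one possible way to train with ground-truth layout on GPU 2 and monitor the run (assuming the experiment name `vqa_gt_layout` is taken from the config file name) is:

```
# train on GPU 2 instead of the default GPU 0
python exp_clevr_snmn/train_net_vqa.py --cfg exp_clevr_snmn/cfgs/vqa_gt_layout.yaml GPU_ID 2

# monitor the TensorBoard events written under exp_clevr_snmn/tb/
tensorboard --logdir exp_clevr_snmn/tb
```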
Add the root of this repository to PYTHONPATH:
```
export PYTHONPATH=.:$PYTHONPATH
```

Evaluate on the CLEVR dataset for the VQA task:
```
python exp_clevr_snmn/test_net_vqa.py --cfg exp_clevr_snmn/cfgs/{exp_name}.yaml TEST.ITER 200000
```
where `{exp_name}` should be one of `vqa_gt_layout`, `vqa_scratch`, `joint_gt_layout` and `joint_scratch`.

Expected accuracy: 96.6% for `vqa_gt_layout`, 93.0% for `vqa_scratch`, 96.5% for `joint_gt_layout`, 93.9% for `joint_scratch`.

Note:

- The above evaluation script will print out the accuracy (only for the val split) and also save it under `exp_clevr_snmn/results/{exp_name}/`. It will also save a prediction output file in this directory.
- The above evaluation script will generate 100 visualizations by default, and save them under `exp_clevr_snmn/results/{exp_name}/`. You may change the number of visualizations with the `TEST.NUM_VIS` parameter (e.g. appending `TEST.NUM_VIS 400` to the commands above to generate 400 visualizations).
- By default, the above script evaluates on the validation split of CLEVR. To evaluate on the test split, append `TEST.SPLIT_VQA test` to the command above (see the example after these notes). As there are no ground-truth answers for the test split in the downloaded CLEVR data, the displayed accuracy will be zero on the test split. You may email the prediction outputs in `exp_clevr_snmn/results/{exp_name}/` to the CLEVR dataset authors for the test split accuracy.
- By default, the above script uses GPU 0. To run on a different GPU, append the `GPU_ID` parameter to the commands above (e.g. appending `GPU_ID 2` to use GPU 2).
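For instance, one possible command to evaluate the `vqa_gt_layout` model on the test split while generating 400 visualizations (flag values taken from the notes above) is:

```
# evaluate on the CLEVR test split (displayed accuracy will be zero, see above)
# and generate 400 visualizations instead of the default 100
python exp_clevr_snmn/test_net_vqa.py --cfg exp_clevr_snmn/cfgs/vqa_gt_layout.yaml \
    TEST.ITER 200000 TEST.SPLIT_VQA test TEST.NUM_VIS 400
```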
(Optional) Evaluate on the CLEVR-Ref dataset for the REF task:
```
python exp_clevr_snmn/test_net_loc.py --cfg exp_clevr_snmn/cfgs/{exp_name}.yaml TEST.ITER 200000
```
where `{exp_name}` should be one of `loc_gt_layout`, `loc_scratch`, `joint_gt_layout` and `joint_scratch`.

Expected accuracy (Precision@1): 96.0% for `loc_gt_layout`, 93.4% for `loc_scratch`, 96.2% for `joint_gt_layout`, 95.4% for `joint_scratch`.

Note:

- The above evaluation script will print out the accuracy (Precision@1) and also save it under `exp_clevr_snmn/results/{exp_name}/`.
- The above evaluation script will generate 100 visualizations by default, and save them under `exp_clevr_snmn/results/{exp_name}/`. You may change the number of visualizations with the `TEST.NUM_VIS` parameter (e.g. appending `TEST.NUM_VIS 400` to the commands above to generate 400 visualizations).
- By default, the above script evaluates on the validation split of CLEVR-Ref. To evaluate on the test split, append `TEST.SPLIT_LOC loc_test` to the command above.
- By default, the above script uses GPU 0. To run on a different GPU, append the `GPU_ID` parameter to the commands above (e.g. appending `GPU_ID 2` to use GPU 2).
Note (08/04/2019): there was previously an error in the released code -- gradient clipping was missing, causing training to be unstable (especially for VQAv1 and VQAv2). This error has now been fixed.
- Download the VQAv1 and VQAv2 dataset annotations from http://www.visualqa.org/download.html, and symlink them to `exp_vqa/vqa_dataset`. After this step, the file structure should look like

```
exp_vqa/vqa_dataset/
  Questions/
    OpenEnded_mscoco_train2014_questions.json
    OpenEnded_mscoco_val2014_questions.json
    OpenEnded_mscoco_test-dev2015_questions.json
    OpenEnded_mscoco_test2015_questions.json
    v2_OpenEnded_mscoco_train2014_questions.json
    v2_OpenEnded_mscoco_val2014_questions.json
    v2_OpenEnded_mscoco_test-dev2015_questions.json
    v2_OpenEnded_mscoco_test2015_questions.json
  Annotations/
    mscoco_train2014_annotations.json
    mscoco_val2014_annotations.json
    v2_mscoco_train2014_annotations.json
    v2_mscoco_val2014_annotations.json
    v2_mscoco_train2014_complementary_pairs.json
    v2_mscoco_val2014_complementary_pairs.json
```

- Download the COCO images from http://mscoco.org/, and symlink them to `exp_vqa/coco_dataset`. After this step, the file structure should look like

```
exp_vqa/coco_dataset/
  images/
    train2014/
      COCO_train2014_000000000009.jpg
      ...
    val2014/
      COCO_val2014_000000000042.jpg
      ...
    test2015/
      COCO_test2015_000000000001.jpg
      ...
```

- Extract visual features from the images and store them on the disk. In our experiments, we extract visual features using the ResNet-152 C5 block. Then, build image collections (imdb) for VQAv1 and VQAv2. These procedures can be done as follows.
```
./exp_vqa/tfmodel/resnet/download_resnet_v1_152.sh  # Download ResNet-152
cd ./exp_vqa/data/
python extract_resnet152_c5_7x7.py        # feature extraction for all COCO images
python build_vqa_imdb_r152_7x7.py         # build image collections for VQAv1
python build_vqa_imdb_r152_7x7_vqa_v2.py  # build image collections for VQAv2
cd ../../
```

(Note that this repository already contains the "expert layout" from parsing results using the Stanford Parser. They are the same as in N2NMN.)
You may skip the training procedure and directly download the pretrained models here for evaluation. The downloaded models should be put under `exp_vqa/tfmodel/{exp_name}/`.
Add the root of this repository to PYTHONPATH:
```
export PYTHONPATH=.:$PYTHONPATH
```

Train on the VQAv1 dataset:

- with ground-truth layout: `python exp_vqa/train_net_vqa.py --cfg exp_vqa/cfgs/vqa_v1_gt_layout.yaml`
- without ground-truth layout: `python exp_vqa/train_net_vqa.py --cfg exp_vqa/cfgs/vqa_v1_scratch.yaml`
Train on the VQAv2 dataset:
- with ground-truth layout: `python exp_vqa/train_net_vqa.py --cfg exp_vqa/cfgs/vqa_v2_gt_layout.yaml`
- without ground-truth layout: `python exp_vqa/train_net_vqa.py --cfg exp_vqa/cfgs/vqa_v2_scratch.yaml`
Note:
- By default, the above scripts use GPU 0, and train on the union of the train2014 and val2014 splits. To run on a different GPU, append the `GPU_ID` parameter to the commands above (e.g. appending `GPU_ID 2` to use GPU 2). During training, the script will write TensorBoard events to `exp_vqa/tb/{exp_name}/` and save the snapshots under `exp_vqa/tfmodel/{exp_name}/`.
Add the root of this repository to PYTHONPATH:
```
export PYTHONPATH=.:$PYTHONPATH
```

Evaluate on the VQAv1 dataset:
```
python exp_vqa/test_net_vqa.py --cfg exp_vqa/cfgs/{exp_name}.yaml TEST.ITER 20000
```
where `{exp_name}` should be one of `vqa_v1_gt_layout` and `vqa_v1_scratch`.

Note:

- By default, the above script evaluates on the test-dev2015 split of VQAv1. To evaluate on the test2015 split, append `TEST.SPLIT_VQA test2015` to the command above (see the example below).
- By default, the above script uses GPU 0. To run on a different GPU, append the `GPU_ID` parameter to the commands above (e.g. appending `GPU_ID 2` to use GPU 2).
- The above evaluation script will not print out the accuracy (the displayed accuracy will be zero), but will write the prediction outputs to `exp_vqa/eval_outputs/{exp_name}/`, which can be uploaded to the evaluation server (http://www.visualqa.org/roe.html) for evaluation.

Expected accuracy: 66.0% for `vqa_v1_gt_layout`, 65.5% for `vqa_v1_scratch`.
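As a concrete illustration (using `vqa_v1_gt_layout`; any of the experiment names above works), evaluating on the test2015 split on GPU 2 could look like:

```
# evaluate on the test2015 split using GPU 2; predictions are written to
# exp_vqa/eval_outputs/{exp_name}/ for upload to the VQA evaluation server
python exp_vqa/test_net_vqa.py --cfg exp_vqa/cfgs/vqa_v1_gt_layout.yaml \
    TEST.ITER 20000 TEST.SPLIT_VQA test2015 GPU_ID 2
```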
Evaluate on the VQAv2 dataset:
```
python exp_vqa/test_net_vqa.py --cfg exp_vqa/cfgs/{exp_name}.yaml TEST.ITER 40000
```
where `{exp_name}` should be one of `vqa_v2_gt_layout` and `vqa_v2_scratch`.

Note:

- By default, the above script uses GPU 0. To run on a different GPU, append the `GPU_ID` parameter to the commands above (e.g. appending `GPU_ID 2` to use GPU 2).
- The above evaluation script will not print out the accuracy (the displayed accuracy will be zero), but will write the prediction outputs to `exp_vqa/eval_outputs_vqa_v2/{exp_name}/`, which can be uploaded to the evaluation server (http://www.visualqa.org/roe.html) for evaluation.

Expected accuracy: 64.0% for `vqa_v2_gt_layout`, 64.1% for `vqa_v2_scratch`.
The outline of the configuration code (such as `models_clevr_snmn/config.py`) is obtained from the Detectron codebase.