A PyTorch reimplementation of bottom-up-attention models
This repository contains a PyTorch reimplementation of the bottom-up-attention project based on Caffe.
We use Detectron2 as the backend to provide complete functionality, including training, testing, and feature extraction. Furthermore, we migrate the pre-trained Caffe-based model from the original repository, which can extract the same visual features as the original model (with a deviation < 0.01).
Some example object and attribute predictions for salient image regions are illustrated below. The script to obtain the following visualizations can be found here.
- Python >= 3.6
- PyTorch >= 1.4
- CUDA >= 9.2 and cuDNN
- Apex
- Detectron2
- Ray
- OpenCV
- Pycocotools
Note that most of the requirements above are needed for Detectron2.
Clone the project including the required version (v0.2.1) of Detectron2. Note that if you use another version, some strange problems may occur.
```bash
# clone the repository including Detectron2 (@be792b9)
$ git clone --recursive https://github.com/MILVLG/bottom-up-attention.pytorch
```

Install Detectron2:
```bash
$ cd detectron2
$ pip install -e .
$ cd ..
```
We recommend using Detectron2 v0.2.1 (@be792b9) as the backend for this project, which has been cloned in step 1. We believe a newer Detectron2 version is also compatible with this project unless its interfaces have changed (we have tested v0.3 with PyTorch 1.5).
Compile the remaining tools using the following script:
```bash
# install apex
$ git clone https://github.com/NVIDIA/apex.git
$ cd apex
$ python setup.py install
$ cd ..

# install the rest of the modules
$ python setup.py build develop
$ pip install ray
```
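After installation, a quick sanity check (a minimal sketch; it only verifies that the core dependencies import) can confirm the setup before moving on:

```bash
# print the installed PyTorch and Detectron2 versions if all imports succeed
$ python -c "import torch, detectron2, apex, ray, cv2; print(torch.__version__, detectron2.__version__)"
```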
If you want to train or test the model, you need to download the images and annotation files of the Visual Genome (VG) dataset. If you only need to extract visual features using the pre-trained model, you can skip this part.

The original VG images (part1 and part2) should be downloaded, unzipped into one folder, and placed in the datasets folder.

The annotation files generated by the original repository need to be converted to the COCO data format required by Detectron2. The preprocessed annotation files can be downloaded here and unzipped to the datasets folder.
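As a rough sketch of how the downloads can be arranged (the archive names below are placeholders for whatever files you actually downloaded, not the project's exact filenames):

```bash
# placeholders: substitute the image and annotation archives you downloaded
$ mkdir -p datasets/visual_genome/images
$ unzip <vg_images_part1>.zip -d datasets/visual_genome/images
$ unzip <vg_images_part2>.zip -d datasets/visual_genome/images
# if the archives unzip into sub-folders, move the .jpg files up into images/
$ unzip <annotations>.zip -d datasets/visual_genome/annotations
```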
Finally, the datasets folder will have the following structure:
```
|-- datasets
|  |-- visual_genome
|  |  |-- images
|  |  |  |-- 1.jpg
|  |  |  |-- 2.jpg
|  |  |  |-- ...
|  |  |  |-- ...
|  |  |-- annotations
|  |  |  |-- visual_genome_train.json
|  |  |  |-- visual_genome_test.json
|  |  |  |-- visual_genome_val.json
```

The following script will train a bottom-up-attention model on the train split of VG:
```bash
$ python3 train_net.py --mode d2 \
  --config-file configs/d2/train-d2-r101.yaml \
  --resume
```
- `mode = 'd2'` refers to training a model with the Detectron2 backend, which is inspired by grid-feats-vqa. We think it is unnecessary to train a new model in the `caffe` mode; the pre-trained Caffe models are provided for testing and feature extraction.
- `config-file` refers to all the configurations of the model.
- `resume` is a flag to resume training from a specific checkpoint.
Given the trained model, the following script will test the performance on the val split of VG:
```bash
$ python3 train_net.py --mode caffe \
  --config-file configs/caffe/test-caffe-r101.yaml \
  --eval-only
```
- `mode = {'caffe', 'd2'}` refers to the mode to use. For the model converted from Caffe, you need to use the `caffe` mode; for other models trained with Detectron2, you need to use the `d2` mode.
- `config-file` refers to all the configurations of the model, which also include the path of the model weights.
- `eval-only` is a flag that declares the testing phase.
Given the trained model, the following script will extract the bottom-up-attention visual features. Single GPU and multiple GPUs are both supported.
```bash
$ python3 extract_features.py --mode caffe \
  --num-cpus 32 --gpus '0,1,2,3' \
  --extract-mode roi_feats \
  --min-max-boxes '10,100' \
  --config-file configs/caffe/test-caffe-r101.yaml \
  --image-dir <image_dir> --bbox-dir <out_dir> --out-dir <out_dir> --fastmode
```
- `mode = {'caffe', 'd2'}` refers to the mode to use. For the model converted from Caffe, you need to use the `caffe` mode; for other models trained with Detectron2, you need to use the `d2` mode. `'caffe'` is the default value. Note that the `d2` mode needs to run with Ray.
- `num-cpus` refers to the number of CPU cores to use for accelerating the CPU computation. `0` stands for using all available CPUs, and `1` is the default value.
- `gpus` refers to the IDs of the GPUs to use. `'0'` is the default value. If more than one GPU is specified, e.g. `'0,1,2,3'`, the script will use the Ray library for parallelization.
- `config-file` refers to all the configurations of the model, which also include the path of the model weights.
- `extract-mode` refers to the feature extraction mode, one of {`roi_feats`, `bboxes`, `bbox_feats`}.
- `min-max-boxes` refers to the minimum and maximum number of features (boxes) to be extracted. Note that the `d2` mode only supports `'100,100'` to get exactly 100 boxes per image; other values yield about 50~60 boxes per image.
- `image-dir` refers to the input image directory.
- `bbox-dir` refers to the directory of pre-extracted bboxes. It is only used when `extract-mode` is set to `'bbox_feats'`.
- `out-dir` refers to the output feature directory.
- `fastmode` enables a faster version (about 2x faster on a workstation with 4 Titan-V GPUs and 32 CPU cores), at the expense of a potential memory leak if the computing capabilities of the GPUs and CPUs are mismatched. More details and some matched examples can be found here.
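The extracted features can then be inspected or loaded in downstream code. The one-liner below is only a sketch under the assumption that extract_features.py writes one compressed NumPy archive (.npz) per image; the file name and the stored keys should be checked against your actual output:

```bash
# list the arrays stored in one (assumed) per-image .npz output file
$ python -c "import numpy as np; d = np.load('<out_dir>/1.npz', allow_pickle=True); print({k: getattr(d[k], 'shape', None) for k in d.files})"
```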
Using the same pre-trained model, we also provide an alternative two-stage strategy for extracting visual features. This results in (slightly) more accurate bounding boxes and visual features, at the expense of more time overhead:
```bash
# extract bboxes only:
$ python3 extract_features.py --mode caffe \
  --num-cpus 32 --gpus '0,1,2,3' \
  --extract-mode bboxes \
  --config-file configs/caffe/test-caffe-r101.yaml \
  --image-dir <image_dir> --out-dir <out_dir> --resume

# extract visual features with the pre-extracted bboxes:
$ python3 extract_features.py --mode caffe \
  --num-cpus 32 --gpus '0,1,2,3' \
  --extract-mode bbox_feats \
  --config-file configs/caffe/test-caffe-r101.yaml \
  --image-dir <image_dir> --bbox-dir <bbox_dir> --out-dir <out_dir> --resume
```
We provide pre-trained models as follows, including models trained in both the caffe and d2 modes.

For the models of the caffe mode, R101-k36 and R101-k10-100 refer to the fix36 model and the dynamic 10-100 model provided in the original bottom-up-attention repository. We additionally provide an R152 model which outperforms the two counterparts above.

For the models of the d2 mode, we follow the configurations and implementations in grid-feats-vqa and trained three models using the training script in this repository, namely R50, R101, and X152.
| name | mode | objects mAP@0.5 | weighted objects mAP@0.5 | download |
|---|---|---|---|---|
| R101-k36 | caffe | 9.3 | 14.0 | model |
| R101-k10-100 | caffe | 10.2 | 15.1 | model |
| R152 | caffe | 11.1 | 15.7 | model |
| R50 | d2 | 8.2 | 14.9 | model |
| R101 | d2 | 9.2 | 15.9 | model |
| X152 | d2 | 10.7 | 17.7 | model |
This project is released under the Apache 2.0 license.
This repository is currently maintained by Zhou Yu (@yuzcccc), Tongan Luo (@Zoroaster97), and Jing Li (@J1mL3e_).
If this repository is helpful for your research or you want to refer to the provided pre-trained models, you can cite the work using the following BibTeX entry:
```
@misc{yu2020buapt,
  author = {Yu, Zhou and Li, Jing and Luo, Tongan and Yu, Jun},
  title = {A PyTorch Implementation of Bottom-Up-Attention},
  howpublished = {\url{https://github.com/MILVLG/bottom-up-attention.pytorch}},
  year = {2020}
}
```