# CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
By Leonard Salewski, A. Sophia Koepke, Hendrik Lensch, and Zeynep Akata. Published in Springer LNAI xxAI and also presented at the CVPR 2022 Workshop on Explainable AI for Computer Vision (XAI4CV). A preprint is available on arXiv.
This repository is the official implementation of CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations. It contains code to generate the CLEVR-X dataset and a PyTorch dataset implementation.
Below is an example from the CLEVR dataset extended with CLEVR-X's natural language explanation:
**Question:** There is a purple metallic ball; what number of cyan objects are right of it?
**Answer:** 1
**Explanation:** There is a cyan cylinder which is on the right side of the purple metallic ball.
This repository contains instructions for downloading and generating the CLEVR-X dataset.
The generated CLEVR-X dataset is available here: CLEVR-X dataset (~1.21 GB).
The download includes two JSON files, which contain the explanations for all CLEVR train and CLEVR validation questions (`CLEVR_train_explanations_v0.7.10.json` and `CLEVR_val_explanations_v0.7.10.json`, respectively). The general layout of the JSON files follows the original CLEVR JSON files. The `info` key contains general information, whereas the `questions` key contains the dataset itself. The latter is a list of dictionaries, where each dictionary is one sample of the CLEVR-X dataset.
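The layout described above can be read with the standard library alone. The following is a minimal sketch; the mock data only mimics the documented top-level structure (`info` and `questions` keys), since the real files are large, and the per-sample fields other than `image_index` are not spelled out here:

```python
import json


def load_clevr_x(path):
    """Load a CLEVR-X explanations JSON file and return (info, samples)."""
    with open(path) as f:
        data = json.load(f)
    # "info" holds general metadata; "questions" is the list of samples.
    return data["info"], data["questions"]


# Tiny mock file mimicking the documented layout (the real files are ~1.21 GB).
mock = {
    "info": {"version": "v0.7.10"},
    "questions": [{"image_index": 0}, {"image_index": 1}],
}
with open("mock_clevr_x.json", "w") as f:
    json.dump(mock, f)

info, samples = load_clevr_x("mock_clevr_x.json")
print(info["version"], len(samples))  # v0.7.10 2
```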
Furthermore, we provide two Python pickle files at the same link. They contain lists of the image indices of the CLEVR-X train and CLEVR-X validation subsets (both of which are part of the CLEVR train subset).
Note that we do not provide the images of the CLEVR dataset; they can be downloaded from the original CLEVR project page.
As stated above, the two Python pickle files (`train_images_ids_v0.7.10-recut.pkl` and `dev_images_ids_v0.7.10-recut.pkl`) contain the image indices of all CLEVR-X train explanations and all CLEVR-X validation explanations.
To obtain the train samples, iterate through the samples in `CLEVR_train_explanations_v0.7.10.json` and use those samples whose `image_index` is in the list contained in `train_images_ids_v0.7.10-recut.pkl`.
To obtain the validation samples, iterate through the samples in `CLEVR_train_explanations_v0.7.10.json` and use those samples whose `image_index` is in the list contained in `dev_images_ids_v0.7.10-recut.pkl`.
All samples from the CLEVR validation subset (`CLEVR_val_explanations_v0.7.10.json`) are used for the CLEVR-X test subset.
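The filtering step above can be sketched as follows. The helper and the mock data are illustrative (the real ids would be read with `pickle.load(open("train_images_ids_v0.7.10-recut.pkl", "rb"))`); only the `image_index` field is taken from the description above:

```python
def select_subset(samples, image_ids):
    """Keep only the samples whose image_index is in the given id list."""
    ids = set(image_ids)  # set membership makes each lookup O(1)
    return [s for s in samples if s["image_index"] in ids]


# Mock stand-ins: 4 images with 10 questions each, as in CLEVR,
# and hypothetical train/dev image-id lists.
samples = [{"image_index": i // 10} for i in range(40)]
train_ids, dev_ids = [0, 1, 2], [3]

train = select_subset(samples, train_ids)
dev = select_subset(samples, dev_ids)
print(len(train), len(dev))  # 30 10
```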
The following sections explain how to generate the CLEVR-X dataset.
The required libraries for generating the CLEVR-X dataset can be found in the `environment.yaml` file. To create an environment and install the requirements, use conda:

```bash
conda env create --file environment.yaml
```

Activate it with:

```bash
conda activate clevr_explanations
```
As CLEVR-X uses the same questions and images as CLEVR, it is necessary to download the CLEVR dataset. Follow the instructions on the CLEVR dataset website to download the original dataset (images, scene graphs, and questions & answers). The extracted files should be located in a folder called `CLEVR_v1.0`, referred to below as `$CLEVR_ROOT`. For further instructions and information about the original CLEVR code, it can also be helpful to refer to the CLEVR GitHub repository.
First, change into the `question_generation` directory:

```bash
cd question_generation
```
To generate explanations for the CLEVR training subset, run this command:

```bash
python generate_explanations.py \
  --input_scene_file $CLEVR_ROOT/scenes/CLEVR_train_scenes.json \
  --input_questions_file $CLEVR_ROOT/questions/CLEVR_train_questions.json \
  --output_explanations_file $CLEVR_ROOT/questions/CLEVR_train_explanations_v0.7.13.json \
  --seed "43" \
  --metadata_file ./metadata.json
```
This generation takes about 6 hours on an Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz. Note that setting the `--log_to_dataframe` flag to `true` may increase the generation time significantly, but allows dumping (parts of) the dataset as an HTML table.
First, change into the `question_generation` directory:

```bash
cd question_generation
```
To generate explanations for the CLEVR validation subset, run this command:

```bash
python generate_explanations.py \
  --input_scene_file $CLEVR_ROOT/scenes/CLEVR_val_scenes.json \
  --input_questions_file $CLEVR_ROOT/questions/CLEVR_val_questions.json \
  --output_explanations_file $CLEVR_ROOT/questions/CLEVR_val_explanations_v0.7.13.json \
  --seed "43" \
  --metadata_file ./metadata.json
```
This generation takes less than 1 hour on an Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz. Note that setting the `--log_to_dataframe` flag to `true` may increase the generation time significantly, but allows dumping (parts of) the dataset as an HTML table.
Both commands use the `--input_scene_file`, `--input_questions_file`, and `--metadata_file` provided by the original CLEVR dataset. You can use any name for the `--output_explanations_file` argument, but the dataloader expects it in the format `CLEVR_<split>_explanations_<version>.json`.
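The expected filename format can be checked with a small regular expression. This is a hypothetical helper, not part of the repository's dataloader; it only encodes the `CLEVR_<split>_explanations_<version>.json` pattern stated above:

```python
import re

# Pattern for CLEVR_<split>_explanations_<version>.json
PATTERN = re.compile(
    r"^CLEVR_(?P<split>[A-Za-z]+)_explanations_(?P<version>[\w.]+)\.json$"
)


def parse_name(filename):
    """Return (split, version) if the filename matches, else None."""
    m = PATTERN.match(filename)
    return (m.group("split"), m.group("version")) if m else None


print(parse_name("CLEVR_train_explanations_v0.7.13.json"))  # ('train', 'v0.7.13')
print(parse_name("my_explanations.json"))  # None
```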
Note that the original CLEVR test set does not have publicly accessible scene graphs and functional programs. Thus, we use the CLEVR validation set as the CLEVR-X test subset. The following code generates a new split of the CLEVR training set into the CLEVR-X training and validation subsets:
```bash
cd question_generation
python dev_split.py --root $CLEVR_ROOT
```
As each image comes with ten questions, the split is performed over images rather than over individual dataset samples. The code stores the image indices of each split in two separate Python pickle files (named `train_images_ids_v0.7.10-recut.pkl` and `dev_images_ids_v0.7.10-recut.pkl`). We have published our files alongside the dataset download and recommend using those indices.
Different baselines and VQA-X models achieve the following performance on CLEVR-X:
| Model name | Accuracy | BLEU | METEOR | ROUGE-L | CIDEr |
|---|---|---|---|---|---|
| Random Words | 3.6% | 0.0 | 8.4 | 11.4 | 5.9 |
| Random Explanations | 3.6% | 10.9 | 16.6 | 35.3 | 30.4 |
| PJ-X | 80.3% | 78.8 | 52.5 | 85.8 | 566.8 |
| FM | 63.0% | 87.4 | 58.9 | 93.4 | 639.8 |
For more information on the baselines and models, check the respective publications and our CLEVR-X publication itself.
For information on the license, please look into the `LICENSE` file.
If you use CLEVR-X in any of your works, please use the following BibTeX entry to cite it:
```bibtex
@inproceedings{salewski2022clevrx,
  title     = {CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations},
  author    = {Leonard Salewski and A. Sophia Koepke and Hendrik P. A. Lensch and Zeynep Akata},
  booktitle = {xxAI - Beyond explainable Artificial Intelligence},
  pages     = {85--104},
  year      = {2022},
  publisher = {Springer}
}
```
You can also find our work on Google Scholar and Semantic Scholar.