LisaAnne/Hallucination
Rohrbach*, Anna and Hendricks*, Lisa Anne, et al. "Object Hallucination in Image Captioning." EMNLP (2018).
Find the paper here.
```
@inproceedings{objectHallucination,
  title = {Object Hallucination in Image Captioning},
  author = {Rohrbach, Anna and Hendricks, Lisa Anne and Burns, Kaylee and Darrell, Trevor and Saenko, Kate},
  booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2018}
}
```
License: BSD 2-Clause license
Getting Started
Run `setup.sh` to download the generated sentences used for our analysis. Additionally, you will need the MSCOCO annotations (both the instance segmentations and the ground-truth captions). If you do not already have them, they can be downloaded here. Other Python requirements are listed in `requirements.txt`.
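If you want to sanity-check your setup before running the scripts, a minimal sketch like the one below loads the instance segmentations with `pycocotools`; the `coco/annotations` layout matches the default `--annotation_path`, but the exact filename (`instances_val2014.json`) is an assumption.

```python
# Minimal sanity check that the MSCOCO annotations are in place.
# Assumes the default coco/annotations layout; the exact filename
# (instances_val2014.json) is an assumption, not prescribed by the repo.
from pycocotools.coco import COCO

annotation_file = 'coco/annotations/instances_val2014.json'
coco = COCO(annotation_file)

# The 80 MSCOCO object categories that CHAIR is computed over.
categories = [cat['name'] for cat in coco.loadCats(coco.getCatIds())]
print(len(categories), 'object categories, e.g.', categories[:5])
```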
Replicating Results
After running `setup.sh` you should be able to replicate the results in our paper by running `table1.py`, `table2.py`, `table3.py`, `table4.py`, and `figure6.py` (example usage: `python table1.py --annotation_path PATH_TO_COCO_ANNOTATIONS`, where `coco/annotations` is the default for `--annotation_path`). Our scripts call on `utils/chair.py` to compute the CHAIR metric. See below for more details on `utils/chair.py`.
If you would like to run `figure4.py` (language and image model consistency), you will need to download some intermediate features. Please see the Language and Image Model Consistency section below.
To reproduce our results on correlation with human scores, run `python table5.py`. The file with the image IDs used in the human evaluation, as well as the average human scores for each of the compared models, can be found in `data/human_scores` after running `setup.sh`.
Evaluating CHAIR
See `utils/chair.py` to understand how we compute the CHAIRs and CHAIRi metrics. Evaluate generated sentences by providing a path to the generated sentences as well as the path to the COCO annotations.
Example usage:
`python utils/chair.py --cap_file generated_sentences/fc_beam5_test.json --annotation_path coco`
where `cap_file` corresponds to a JSON file with your generated captions and `annotation_path` points to where the MSCOCO annotations are stored.
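For intuition, here is a simplified sketch of the two metrics (not the actual `utils/chair.py`, which additionally handles MSCOCO synonyms and builds the ground-truth object set from both the segmentation labels and the reference captions): CHAIRi is the fraction of mentioned object instances that do not appear among an image's ground-truth objects, and CHAIRs is the fraction of captions containing at least one such hallucinated object. The function and variable names below are hypothetical.

```python
# Simplified CHAIR computation (illustration only; names are hypothetical).
# `mentioned` maps image_id -> set of MSCOCO objects found in the generated caption;
# `gt_objects` maps image_id -> set of ground-truth objects for that image.
def chair_metrics(mentioned, gt_objects):
    num_mentions = 0             # total object mentions across all captions
    num_hallucinated = 0         # mentions not grounded in the image
    num_captions = 0             # total captions
    num_captions_with_error = 0  # captions with at least one hallucinated object

    for image_id, objects in mentioned.items():
        hallucinated = objects - gt_objects.get(image_id, set())
        num_mentions += len(objects)
        num_hallucinated += len(hallucinated)
        num_captions += 1
        num_captions_with_error += int(bool(hallucinated))

    chair_i = num_hallucinated / max(num_mentions, 1)          # per-instance rate
    chair_s = num_captions_with_error / max(num_captions, 1)   # per-sentence rate
    return chair_i, chair_s
```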
We expect generated sentences to be stored as a dictionary with the following keys:
- `overall`: metrics from the COCO evaluation toolkit computed over the entire dataset.
- `imgToEval`: a dictionary with keys corresponding to image IDs and values containing the caption, image_id, and sentence metrics for the particular caption.
Note that this is the format of the captions output by the open-source code here, which we used to replicate most of the models presented in the paper.
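For reference, a minimal sketch of a caption file in this format is shown below; the image ID, caption text, per-sentence metric names, and output path are placeholders, not values from the repository.

```python
import json

# Hypothetical minimal caption file in the expected format; the image ID,
# caption text, metric names, and output path are placeholders.
captions = {
    'overall': {'CIDEr': 0.0, 'METEOR': 0.0},   # dataset-level metrics (placeholder values)
    'imgToEval': {
        '404464': {                             # key: image id (placeholder)
            'image_id': 404464,
            'caption': 'a man riding a horse on a beach',
            'CIDEr': 0.0, 'METEOR': 0.0,        # per-sentence metrics (placeholders)
        },
    },
}

with open('generated_sentences/my_model_test.json', 'w') as f:  # hypothetical filename
    json.dump(captions, f)
```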
Language and Image Model Consistency
To compute language and image consistency, we trained a classifier to predict class labels given an image, and a language model to predict the next word in a sentence given all previous words. You can access the labels predicted by our image classifier in `output/image_classifier` and the words predicted by our language model here. To run our code, you need to first download the zip file into the main directory and unzip it. Once you have these intermediate features, you can look at `utils/lm_consistency.py` and `utils/im_consistency.py` to understand how these metrics are computed. Running `figure4.py` will output the results from our paper (constructing the actual bar plot is left as an exercise to the reader).
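As a rough illustration of the idea only (not the code in `utils/lm_consistency.py` or `utils/im_consistency.py`), one way to frame such a consistency score is the fraction of hallucination errors that the image classifier (or the language model) would itself have predicted; the data layout and function name below are assumptions.

```python
# Illustrative consistency score (not the actual utils/*_consistency.py code).
# `hallucinations` maps image_id -> set of hallucinated objects in the generated caption;
# `model_predictions` maps image_id -> set of objects predicted by the image classifier
# (or, for language consistency, objects the language model assigns high probability).
def consistency(hallucinations, model_predictions):
    num_errors = 0
    num_consistent = 0
    for image_id, objects in hallucinations.items():
        predicted = model_predictions.get(image_id, set())
        num_errors += len(objects)
        num_consistent += len(objects & predicted)
    # Fraction of hallucination errors that agree with the model's own predictions.
    return num_consistent / max(num_errors, 1)
```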
Human Eval
Replicate the results from our human evaluation by running `python table5.py`. Raw human evaluation scores can be found in `data/human_scores` after running `setup.sh`.
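If you want to explore the raw scores yourself, a sketch along these lines could compute a rank correlation between per-model metric scores and the average human scores; `table5.py` is the reference implementation, and the model names and placeholder values below are hypothetical.

```python
# Hypothetical sketch: rank correlation between per-model metric scores and
# average human scores. Model names and values are placeholders to replace
# with your own numbers; table5.py is the reference implementation.
from scipy.stats import spearmanr

human_scores = {'model_a': 3.2, 'model_b': 2.8, 'model_c': 3.9}      # placeholders
metric_scores = {'model_a': 0.11, 'model_b': 0.14, 'model_c': 0.06}  # placeholders

models = sorted(human_scores)
rho, p_value = spearmanr([human_scores[m] for m in models],
                         [metric_scores[m] for m in models])
print('Spearman correlation: %.3f (p = %.3f)' % (rho, p_value))
```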
Captioning Models
We generated sentences for the majority of models by training the open-source models available here. Within this framework, we wrote code for the LRCN model as well as the top-down deconstructed models (Table 3 in the paper). This code is available upon request. For the top-down model with bounding boxes, we used the code here. For the Neural Baby Talk model, we used the code here. For the GAN-based model, we used the sentences from the paper here. Sentences were obtained directly from the author (we did not train the GAN model).