LisaAnne/Hallucination


Rohrbach*, Anna and Hendricks*, Lisa Anne, et al. "Object Hallucination in Image Captioning." EMNLP (2018).

Find the paper here.

@inproceedings{objectHallucination,
  title = {Object Hallucination in Image Captioning},
  author = {Rohrbach, Anna and Hendricks, Lisa Anne and Burns, Kaylee and Darrell, Trevor and Saenko, Kate},
  booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2018}
}

License: BSD 2-Clause license

Running the Code

Getting Started

Run setup.sh to download the generated sentences used for our analysis. Additionally, you will need the MSCOCO annotations (both the instance segmentations and the ground truth captions). If you do not already have them, they can be downloaded here. You can see other Python requirements in requirements.txt.
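
A minimal setup might look like the following (a sketch only; the MSCOCO annotations should end up under coco/annotations, the default path used by the scripts, or be pointed to with --annotation_path):

bash setup.sh
pip install -r requirements.txt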

Replicating Results

After running setup.sh you should be able to replicate the results in our paper by running table1.py, table2.py, table3.py, table4.py and figure6.py (example usage: python table1.py --annotation_path PATH_TO_COCO_ANNOTATIONS, where coco/annotations is the default for --annotation_path). Our scripts call on utils/chair.py to compute the CHAIR metric. See below for more details on utils/chair.py.

If you would like to run figure4.py (language and image model consistency), you will need to download some intermediate features. Please see the Language and Image Model Consistency section below.

To reproduce our results on correlation with human scores, run python table5.py. The file with the image IDs used in the human evaluation, as well as the average human scores for each of the compared models, can be found in data/human_scores after running setup.sh.

Evaluating CHAIR

See utils/chair.py to understand how we compute the CHAIRs and CHAIRi metrics. Evaluate generated sentences by providing a path to the generated sentences as well as the path to the MSCOCO annotations.
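
For intuition, the sketch below illustrates the two metrics, assuming hypothetical helpers mentioned_objects(caption) and gt_objects(image_id) that map a caption and an image to sets of MSCOCO objects; the actual implementation in utils/chair.py additionally handles synonyms and multi-word object names.

# Illustrative sketch of CHAIR; mentioned_objects and gt_objects are
# hypothetical helpers (see utils/chair.py for the real implementation).
def chair(captions):
    # captions: list of (image_id, caption) pairs
    hallucinated_mentions, total_mentions, hallucinated_captions = 0, 0, 0
    for image_id, caption in captions:
        mentioned = mentioned_objects(caption)            # objects named in the caption
        hallucinated = [o for o in mentioned if o not in gt_objects(image_id)]
        total_mentions += len(mentioned)
        hallucinated_mentions += len(hallucinated)
        hallucinated_captions += 1 if hallucinated else 0
    chair_i = hallucinated_mentions / max(total_mentions, 1)   # per-instance
    chair_s = hallucinated_captions / max(len(captions), 1)    # per-sentence
    return chair_i, chair_s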

Example usage is:

python utils/chair.py --cap_file generated_sentences/fc_beam5_test.json --annotation_path coco

where cap_file corresponds to a JSON file with your generated captions and annotation_path points to where the MSCOCO annotations are stored.

We expect generated sentences to be stored as a dictionary with the following keys:

  • overall: metrics from the COCO evaluation toolkit computed over the entire dataset.
  • imgToEval: a dictionary whose keys are image IDs and whose values contain a caption, an image_id, and the sentence-level metrics for that caption.

Note that this is the format of the captions output by the open-source code here, which we used to replicate most of the models presented in the paper.
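
As a concrete illustration, a caption file with this structure could look like the sketch below (the values and image ID are made up, and the exact per-caption metric names depend on the COCO evaluation toolkit output):

import json

# Illustrative only: the values and the image ID are invented.
captions = {
    "overall": {"CIDEr": 0.95, "Bleu_4": 0.32},        # dataset-level metrics
    "imgToEval": {
        "184613": {
            "image_id": 184613,
            "caption": "a man riding a horse on the beach",
            "CIDEr": 1.10,                             # per-caption metrics
        }
    }
}

with open("generated_sentences/my_captions.json", "w") as f:   # hypothetical path
    json.dump(captions, f)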

Language and Image Model Consistency

To compute language and image consistency, we trained a classifier to predict class labels given an image and a language model to predict the next word in a sentence given all previous words in the sentence. You can access the labels predicted by our image classifier in output/image_classifier and the words predicted by our language model here. To run our code, you need to first download the zip file into the main directory and unzip it. Once you have these intermediate features, you can look at utils/lm_consistency.py and utils/im_consistency.py to understand how these metrics are computed. Running figure4.py will output the results from our paper (constructing the actual bar plot is left as an exercise to the reader).
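
As a rough illustration only (the actual metric definitions are in utils/lm_consistency.py and utils/im_consistency.py), one way to think about consistency is to ask whether a hallucinated object is also among the top predictions of the language model at that position, or among the image classifier's predicted labels. In the sketch below, lm_topk and im_topk are hypothetical helpers.

# Illustrative sketch only; see utils/lm_consistency.py and utils/im_consistency.py
# for the metrics actually used in the paper. lm_topk and im_topk are hypothetical.
def consistency(hallucinations, k=5):
    # hallucinations: list of (image_id, previous_words, hallucinated_object)
    lm_hits, im_hits = 0, 0
    for image_id, prefix, obj in hallucinations:
        lm_hits += obj in lm_topk(prefix, k)        # language model also predicts it
        im_hits += obj in im_topk(image_id, k)      # image classifier also predicts it
    n = max(len(hallucinations), 1)
    return lm_hits / n, im_hits / n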

Human Eval

Replicate the results from our human evaluation by running python table5.py. Raw human evaluation scores can be found in data/human_scores after running setup.sh.

Captioning Models

We generated sentences for the majority of models by training the open-source models available here. Within this framework, we wrote code for the LRCN model as well as the top-down deconstructed models (Table 3 in the paper). This code is available upon request. For the top-down model with bounding boxes, we used the code here. For the Neural Baby Talk model, we used the code here. For the GAN-based model, we used the sentences from the paper here. Sentences were obtained directly from the author (we did not train the GAN model).
