[COLM 2024] ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
bigai-nlco/ExoViP
Official implementation of our paper: ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
In this work, we devise a "plug-and-play" method, ExoViP, to correct the errors at both the planning and execution stages through introspective verification. We employ verification modules as "exoskeletons" to enhance current vision-language programming schemes. Specifically, our proposed verification module utilizes a mixture of three sub-verifiers to validate predictions after each reasoning step, subsequently calibrating the visual module predictions and refining the reasoning trace planned by LLMs.
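The verification-and-calibration step described above can be sketched as follows. This is an illustrative sketch only, not the official implementation: the function names, weights, and stand-in verifiers are assumptions, and the real system's sub-verifiers are vision-language models rather than simple heuristics.

```python
# Illustrative sketch: combine scores from three hypothetical sub-verifiers
# to rerank a visual module's candidate predictions after a reasoning step.
from typing import Callable, List, Tuple

# A sub-verifier maps a candidate prediction to a score in [0, 1].
Verifier = Callable[[str], float]

def verify_and_rerank(
    candidates: List[str],
    sub_verifiers: List[Verifier],
    weights: List[float],
) -> List[Tuple[str, float]]:
    """Score each candidate with a weighted mixture of sub-verifiers,
    then sort so the next reasoning step consumes the best candidate first."""
    scored = []
    for cand in candidates:
        score = sum(w * v(cand) for v, w in zip(sub_verifiers, weights))
        scored.append((cand, score))
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Toy usage with stand-in verifiers (placeholders for the real models).
v1 = lambda c: 1.0 if "cat" in c else 0.2   # e.g. image-text matching
v2 = lambda c: 0.8                          # e.g. candidate-answer scoring
v3 = lambda c: float(len(c) < 20)           # e.g. a consistency check
ranked = verify_and_rerank(["a cat on a mat", "a dog"], [v1, v2, v3],
                           [0.5, 0.3, 0.2])
```

The calibrated scores can then feed a search over candidate programs, which is how the verification signal also refines the reasoning trace planned by the LLM.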
Paste your OPENAI-API-KEY and OPENAI-API-BASE into `engine/.env` and `tasks/*.ipynb`.

    conda env create -f environment.yaml
    conda activate exovip

If Huggingface is not reachable from your network, you can download all checkpoints into the `prev_trained_models` directory.
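If you want to load the credentials from `engine/.env` in your own scripts, a minimal stdlib-only parser looks like the following. This is a sketch under the assumption that the file uses simple `KEY=VALUE` lines; the project itself may read the file differently (e.g. via a dotenv library).

```python
# Minimal sketch of reading key/value pairs from a dotenv-style file
# such as engine/.env. Skips blank lines and "#" comments.
def read_env_file(path: str) -> dict:
    """Parse KEY=VALUE lines into a dict, stripping surrounding quotes."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, val = line.partition("=")
            values[key.strip()] = val.strip().strip('"').strip("'")
    return values

# Usage: creds = read_env_file("engine/.env")
```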
Errors in existing methods fall into two categories:
- Module Error: the visual modules fail to execute the program correctly
- Planning Error: the LLM cannot parse the language query into a correct, solvable program
We conducted a comparative analysis of the statistics derived from a random sample of 100 failure incidents before (left) and after (right) the implementation of our method.
Our method has been validated on six tasks:
- Compositional Image Question Answering: GQA
- Referring Expression Understanding: RefCOCO/RefCOCO+/RefCOCOg
- Natural Language for Visual Reasoning: NLVR
- Visual Abstract Reasoning: KILOGRAM
- Language-guided Image Editing: MagicBrush
- Spatial-Temporal Video Reasoning: AGQA
NOTE: All experiments are run on subsets of these datasets; please refer to `datasets`.
Code demos:

    cd tasks
    # GQA: gqa.ipynb
    # NLVR: nlvr.ipynb
    # RefCOCO(+/g): refcoco.ipynb
    # KILOGRAM: kilogram.ipynb
    # MagicBrush: magicbrush.ipynb
    # AGQA: agqa.ipynb
Acknowledgement: visprog, a neuro-symbolic system that solves complex and compositional visual tasks given natural language instructions.
If you find our work helpful, please cite it.
    @inproceedings{wang2024exovip,
      title={ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning},
      author={Wang, Yuxuan and Yuille, Alan and Li, Zhuowan and Zheng, Zilong},
      booktitle={The first Conference on Language Modeling (COLM)},
      year={2024}
    }