# ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning (COLM 2024)
Official implementation of our paper: ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
In this work, we devise a "plug-and-play" method, ExoViP, to correct the errors at both the planning and execution stages through introspective verification. We employ verification modules as "exoskeletons" to enhance current vision-language programming schemes. Specifically, our proposed verification module utilizes a mixture of three sub-verifiers to validate predictions after each reasoning step, subsequently calibrating the visual module predictions and refining the reasoning trace planned by LLMs.
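The verification loop described above can be sketched as follows. This is an illustrative outline only, not the repository's actual API: the function names, the step/candidate representation, and the simple score-averaging rule are all assumptions made for clarity.

```python
# Illustrative sketch of ExoViP-style step-wise verification.
# All names here are hypothetical; see the notebooks for the real implementation.

def verify(step, candidates, sub_verifiers):
    """Score each candidate prediction with a mixture of sub-verifiers
    and return the candidates re-ranked by their averaged score."""
    scored = []
    for cand in candidates:
        scores = [v(step, cand) for v in sub_verifiers]
        scored.append((sum(scores) / len(scores), cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

def run_program(steps, sub_verifiers):
    """Execute each program step, verify its candidate outputs,
    and keep the calibrated (top-ranked) prediction in the trace."""
    trace = []
    for step in steps:
        candidates = step["module"](*step["args"])  # visual module proposes candidates
        ranked = verify(step, candidates, sub_verifiers)
        best_score, best = ranked[0]
        trace.append((step["name"], best, best_score))
    return trace
```

In the full method, a low verification score can also trigger exploration, i.e. re-planning of the remaining reasoning trace by the LLM rather than just re-ranking the current step's candidates.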
Paste your OpenAI API key and API base into `engine/.env` and `tasks/*.ipynb`.
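A `.env` file along these lines should work; the exact variable names are an assumption following common OpenAI-client conventions, so match whatever the notebooks read.

```shell
# engine/.env -- values are placeholders, variable names may differ in the repo
OPENAI_API_KEY=sk-...
OPENAI_API_BASE=https://api.openai.com/v1
```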
```shell
conda env create -f environment.yaml
conda activate exovip
```
If Hugging Face is not reachable from your network, you can download all checkpoints into the `prev_trained_models` directory.
Errors in existing methods can be grouped into two categories:
- Module Error: the visual modules fail to correctly execute the program
- Planning Error: the LLM cannot parse the language query into a correct, solvable program
We conducted a comparative analysis of the statistics derived from a random sample of 100 failure incidents before (left) and after (right) the implementation of our method.
Our method has been validated on six tasks:
- Compositional Image Question Answering: GQA
- Referring Expression Understanding: RefCOCO/RefCOCO+/RefCOCOg
- Natural Language for Visual Reasoning: NLVR
- Visual Abstract Reasoning: KILOGRAM
- Language-guided Image Editing: MagicBrush
- Spatial-Temporal Video Reasoning: AGQA
NOTE: All experiments are run on subsets of these datasets; please refer to `datasets`.
code demos
```shell
cd tasks
# GQA:          gqa.ipynb
# NLVR:         nlvr.ipynb
# RefCOCO(+/g): refcoco.ipynb
# KILOGRAM:     kilogram.ipynb
# MagicBrush:   magicbrush.ipynb
# AGQA:         agqa.ipynb
```
Our implementation builds on visprog, a neuro-symbolic system that solves complex and compositional visual tasks given natural language instructions.
If you find our work helpful, please cite it.
```bibtex
@inproceedings{wang2024exovip,
  title={ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning},
  author={Wang, Yuxuan and Yuille, Alan and Li, Zhuowan and Zheng, Zilong},
  booktitle={The first Conference on Language Modeling (COLM)},
  year={2024}
}
```