[COLM 2024] ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning


Official implementation of our paper: ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning


Introduction

In this work, we devise a "plug-and-play" method, ExoViP, to correct the errors at both the planning and execution stages through introspective verification. We employ verification modules as "exoskeletons" to enhance current vision-language programming schemes. Specifically, our proposed verification module utilizes a mixture of three sub-verifiers to validate predictions after each reasoning step, subsequently calibrating the visual module predictions and refining the reasoning trace planned by LLMs.
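As a rough illustration of this verification step, here is a minimal sketch (ours, not the repository's actual API; the function names and the simple weighted mixture are assumptions). Each sub-verifier, e.g. an image-text-matching, a captioning, or a VQA verifier, scores every candidate prediction, and the mixture re-ranks the candidates:

```python
# Hedged sketch of per-step verification (not the repository's actual API).
# Each sub-verifier maps (image, question, candidate) to a confidence score;
# the weighted mixture of scores re-ranks the visual module's candidates.

def verify_candidates(image, question, candidates, sub_verifiers, weights):
    """Re-rank a module's candidate predictions with a mixture of verifiers.

    sub_verifiers: list of callables (image, question, candidate) -> float
    weights: one mixing weight per sub-verifier
    """
    scored = []
    for cand in candidates:
        # Mixture of sub-verifier confidences for this candidate.
        score = sum(w * verify(image, question, cand)
                    for verify, w in zip(sub_verifiers, weights))
        scored.append((score, cand))
    # Highest verification score first; the top score can also serve to
    # calibrate the module's own prediction confidence.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```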

Environment

Paste your OPENAI-API-KEY and OPENAI-API-BASE into engine/.env and tasks/*.ipynb
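For reference, engine/.env would then contain entries along these lines (illustrative placeholders only; match the exact variable names the code reads):

```
# engine/.env -- illustrative placeholders only
OPENAI-API-KEY=<your key>
OPENAI-API-BASE=<your API base URL>
```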

```
conda env create -f environment.yaml
conda activate exovip
```

If Hugging Face is not reachable from your network, you can download all checkpoints into the prev_trained_models directory
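If you prefer to script the download, a minimal sketch with huggingface_hub follows (the repo IDs below are illustrative assumptions, not the project's definitive checkpoint list; check the code for the models it actually loads):

```python
# Hedged sketch: pre-fetching checkpoints into prev_trained_models/.
# The repo_ids are illustrative; substitute the models the code loads.
from huggingface_hub import snapshot_download

for repo_id in [
    "Salesforce/blip-vqa-base",        # e.g., a VQA module
    "openai/clip-vit-large-patch14",   # e.g., image-text matching
]:
    snapshot_download(
        repo_id=repo_id,
        local_dir=f"prev_trained_models/{repo_id.split('/')[-1]}",
    )
```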

Highlights

Errors in existing methods can be summarized into two categories (see the sketch after this list):

  • Module error: the visual modules fail to execute the program correctly
  • Planning error: the LLM fails to parse the language query into a correct, solvable program
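To make the distinction concrete, here is a minimal sketch (our illustration, reusing the verify_candidates helper above; the execute_step interface is an assumption) of how step-wise verification can address both error types: calibrating a module's output for module errors, and keeping only the best-scoring candidate program for planning errors:

```python
# Hedged sketch: using step-wise verification against both error types.
# Assumes the verify_candidates() helper from the earlier sketch.

def run_program(program, image, question, execute_step,
                sub_verifiers, weights):
    """Execute one candidate program, verifying every step.

    execute_step: callable (image, question, step) -> list of candidate
    outputs from the visual module handling that step (an assumed
    interface, not the repository's actual API).
    """
    step_scores, answer = [], None
    for step in program:
        candidates = execute_step(image, question, step)
        ranked = verify_candidates(image, question, candidates,
                                   sub_verifiers, weights)
        best_score, answer = ranked[0]      # calibrate the module output
        step_scores.append(best_score)
    trace_score = sum(step_scores) / max(len(step_scores), 1)
    return answer, trace_score

def best_of_programs(programs, image, question, execute_step,
                     sub_verifiers, weights):
    """Keep the program whose verified trace scores highest,
    discarding badly planned traces (the planning-error side)."""
    runs = [run_program(p, image, question, execute_step,
                        sub_verifiers, weights) for p in programs]
    return max(runs, key=lambda pair: pair[1])[0]
```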


We conducted a comparative analysis of statistics from a random sample of 100 failure cases before (left) and after (right) applying our method.


Start

Our method has been validated on six tasks: GQA, NLVR, RefCOCO(+/g), KILOGRAM, MagicBrush, and AGQA.

NOTE: All experiments are run on subsets of these datasets; see datasets for details.

Code demos

```
cd tasks
# GQA
gqa.ipynb
# NLVR
nlvr.ipynb
# RefCOCO(+/g)
refcoco.ipynb
# KILOGRAM
kilogram.ipynb
# MagicBrush
magicbrush.ipynb
# AGQA
agqa.ipynb
```

Available Modules

[Figure: the set of available modules]

Examples

[Figure: qualitative examples]

Acknowledgement

Our code builds on visprog, a neuro-symbolic system that solves complex and compositional visual tasks given natural language instructions

Citation

If you find our work helpful, please cite it.

```bibtex
@inproceedings{wang2024exovip,
  title     = {ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning},
  author    = {Wang, Yuxuan and Yuille, Alan and Li, Zhuowan and Zheng, Zilong},
  booktitle = {The first Conference on Language Modeling (COLM)},
  year      = {2024}
}
```
