# TopViewRS: Vision-Language Models as Top-View Spatial Reasoners
📄 [Arxiv](https://arxiv.org/abs/2406.02537) · 🕸️ [Project Page] · 🤗 [Data](https://huggingface.co/datasets/chengzu/topviewrs)
- Define the top-view spatial reasoning task for VLMs via 4 carefully designed tasks of increasing complexity, encompassing 9 distinct fine-grained sub-tasks with questions structured to probe different model abilities.
- Collect the **TopViewRS** dataset (**Top**-**View** **R**easoning in **S**pace), comprising 11,384 multiple-choice questions with either photo-realistic or semantic top-view maps of real-world scenarios.
- Investigate 10 VLMs from different model families and sizes, highlighting the performance gap compared to human annotators.
Part of the benchmark is now available on Hugging Face: https://huggingface.co/datasets/chengzu/topviewrs.
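A minimal sketch for loading the released portion with the 🤗 `datasets` library is shown below; it assumes the dataset can be loaded without specifying a config, so check the dataset card on the Hub for the exact config and split names.

```python
from datasets import load_dataset

# Load the released portion of TopViewRS from the Hugging Face Hub.
# NOTE: the dataset may expose multiple configs (e.g. per task); if so,
# pass the config name from the dataset card as a second argument.
dataset = load_dataset("chengzu/topviewrs")

# Print the available splits and the first example of each split
# (field names depend on the dataset card and are not listed here).
for split_name, split in dataset.items():
    print(split_name, split[0])
```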
The rest of the benchmark is coming soon.
If you find TopViewRS useful for your work, please cite our paper:
```bibtex
@misc{li2024topviewrs,
  title={TopViewRS: Vision-Language Models as Top-View Spatial Reasoners},
  author={Chengzu Li and Caiqi Zhang and Han Zhou and Nigel Collier and Anna Korhonen and Ivan Vulić},
  year={2024},
  eprint={2406.02537},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```