

cambridgeltl/topviewrs



A novel evaluation benchmark for spatial reasoning of vision-language models.

📄 [Arxiv] · 🕸️ [Project Page] · 🤗 [Data]

Key takeaways

  • Define the top-view spatial reasoning task for VLMs via 4 carefully designed tasks of increasing complexity, encompassing 9 distinct fine-grained sub-tasks with structured questions that target different model abilities.
  • Collect the TopViewRS dataset (Top-View Reasoning in Space), comprising 11,384 multiple-choice questions with either photo-realistic or semantic top-view maps of real-world scenarios.
  • Investigate 10 VLMs from different model families and sizes, highlighting the performance gap compared to human annotators.


Dataset

Part of the benchmark is now available on Hugging Face: https://huggingface.co/datasets/chengzu/topviewrs.

Code

Coming soon.

Citation

If you find TopViewRS useful, please cite:

@misc{li2024topviewrs,
  title={TopViewRS: Vision-Language Models as Top-View Spatial Reasoners},
  author={Chengzu Li and Caiqi Zhang and Han Zhou and Nigel Collier and Anna Korhonen and Ivan Vulić},
  year={2024},
  eprint={2406.02537},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

