# TopViewRS: Vision-Language Models as Top-View Spatial Reasoners
📄 [Arxiv](https://arxiv.org/abs/2406.02537) · 🕸️ [Project Page] · 🤗 [Data](https://huggingface.co/datasets/chengzu/topviewrs)
- Define the top-view spatial reasoning task for VLMs via 4 carefully designed tasks of increasing complexity, encompassing 9 distinct fine-grained sub-tasks with questions structured to probe different model abilities.
- Collect the **TopViewRS** dataset (**Top**-**View** **R**easoning in **S**pace), comprising 11,384 multiple-choice questions with either photo-realistic or semantic top-view maps of real-world scenarios.
- Investigate 10 VLMs from different model families and sizes, highlighting the performance gap compared to human annotators.
Part of the benchmark is now available on Hugging Face: https://huggingface.co/datasets/chengzu/topviewrs.
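A minimal sketch for loading the released portion with the 🤗 `datasets` library is shown below; it assumes the dataset can be loaded without specifying a config, so check the dataset card on the Hub for the exact config and split names.

```python
from datasets import load_dataset

# Load the released portion of TopViewRS from the Hugging Face Hub.
# NOTE: the dataset may expose multiple configs (e.g. per task); if so,
# pass the config name from the dataset card as a second argument.
dataset = load_dataset("chengzu/topviewrs")

# Print the available splits and the first example of each split
# (field names depend on the dataset card and are not listed here).
for split_name, split in dataset.items():
    print(split_name, split[0])
```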
The rest of the benchmark is coming soon.
If you find TopViewRS useful for your work, please cite our paper:
```bibtex
@misc{li2024topviewrs,
  title={TopViewRS: Vision-Language Models as Top-View Spatial Reasoners},
  author={Chengzu Li and Caiqi Zhang and Han Zhou and Nigel Collier and Anna Korhonen and Ivan Vulić},
  year={2024},
  eprint={2406.02537},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```