Movatterモバイル変換


[0]ホーム

URL:


Skip to main content
Cornell University
We gratefully acknowledge support from the Simons Foundation,member institutions, and all contributors.Donate
arxiv logo>cs> arXiv:2306.11335
arXiv logo
Cornell University Logo

Computer Science > Robotics

arXiv:2306.11335 (cs)
[Submitted on 20 Jun 2023 (v1), last revised 20 Mar 2024 (this version, v4)]

Title:Surfer: Progressive Reasoning with World Models for Robotic Manipulation

View PDFHTML (experimental)
Abstract:Considering how to make the model accurately understand and follow natural language instructions and perform actions consistent with world knowledge is a key challenge in robot manipulation. This mainly includes human fuzzy instruction reasoning and the following of physical knowledge. Therefore, the embodied intelligence agent must have the ability to model world knowledge from training data. However, most existing vision and language robot manipulation methods mainly operate in less realistic simulator and language settings and lack explicit modeling of world knowledge. To bridge this gap, we introduce a novel and simple robot manipulation framework, called Surfer. It is based on the world model, treats robot manipulation as a state transfer of the visual scene, and decouples it into two parts: action and scene. Then, the generalization ability of the model on new instructions and new scenes is enhanced by explicit modeling of the action and scene prediction in multi-modal information. In addition to the framework, we also built a robot manipulation simulator that supports full physics execution based on the MuJoCo physics engine. It can automatically generate demonstration training data and test data, effectively reducing labor costs. To conduct a comprehensive and systematic evaluation of the robot manipulation model in terms of language understanding and physical execution, we also created a robotic manipulation benchmark with progressive reasoning tasks, called SeaWave. It contains 4 levels of progressive reasoning tasks and can provide a standardized testing platform for embedded AI agents in multi-modal environments. On average, Surfer achieved a success rate of 54.74% on the defined four levels of manipulation tasks, exceeding the best baseline performance of 47.64%.
Subjects:Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:arXiv:2306.11335 [cs.RO]
 (orarXiv:2306.11335v4 [cs.RO] for this version)
 https://doi.org/10.48550/arXiv.2306.11335
arXiv-issued DOI via DataCite

Submission history

From: Kaidong Zhang [view email]
[v1] Tue, 20 Jun 2023 07:06:04 UTC (1,327 KB)
[v2] Wed, 21 Jun 2023 06:56:47 UTC (1,327 KB)
[v3] Mon, 8 Jan 2024 13:29:26 UTC (5,223 KB)
[v4] Wed, 20 Mar 2024 13:18:18 UTC (5,674 KB)
Full-text links:

Access Paper:

Current browse context:
cs.RO
Change to browse by:
export BibTeX citation

Bookmark

BibSonomy logoReddit logo

Bibliographic and Citation Tools

Bibliographic Explorer(What is the Explorer?)
Connected Papers(What is Connected Papers?)
scite Smart Citations(What are Smart Citations?)

Code, Data and Media Associated with this Article

CatalyzeX Code Finder for Papers(What is CatalyzeX?)
Hugging Face(What is Huggingface?)
Papers with Code(What is Papers with Code?)

Demos

Hugging Face Spaces(What is Spaces?)

Recommenders and Search Tools

Influence Flower(What are Influence Flowers?)
CORE Recommender(What is CORE?)

arXivLabs: experimental projects with community collaborators

arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.

Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.

Have an idea for a project that will add value for arXiv's community?Learn more about arXivLabs.

Which authors of this paper are endorsers? |Disable MathJax (What is MathJax?)

[8]ページ先頭

©2009-2025 Movatter.jp