ritzz-ai/GUI-R1


The official repo for "GUI-R1: A Generalist R1-style Vision-Language Action Model For GUI Agents".

🤗 GUI-R1-3K | 🤗 GUI-R1 | 📑 Paper

News

  • [2025/05/04] We released an 800K high-quality reinforcement learning dataset, filtered from the OS-Atlas pretraining data using QwenVL2.5-7B and covering varying levels of difficulty. From it, we further filtered a diverse subset of 10K samples and applied the DAPO algorithm, which gives you the potential to outperform InfiGUI-R1. We warmly welcome everyone to use it!
  • [2025/04/18] We released the weights, code, and scripts.
  • [2025/04/17] We released the Dataset!
  • [2025/04/14] Our GUI-R1 paper (GUI-R1: A Generalist R1-style Vision-Language Action Model For GUI Agents) is now available on arXiv!
  • [2025/03/10] We started our project.

Our Exploration

By leveraging a small amount of carefully curated, high-quality data across multiple platforms (including Windows, Linux, macOS, Android, and web) and employing policy optimization algorithms such as group relative policy optimization (GRPO) to update the model, GUI-R1 achieves superior performance using only 0.02% of the data (3K vs. 13M) compared to previous state-of-the-art methods such as OS-Atlas, across eight benchmarks spanning three platform types (mobile, desktop, and web). These results demonstrate the strong potential of reinforcement learning based on unified action-space rule modeling for improving the execution capabilities of LVLMs on real-world GUI agent tasks.
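For intuition about the GRPO update, here is a minimal sketch of the group-relative advantage computation it relies on; the function and the example reward values are illustrative only and are not code from this repository.

    import numpy as np

    def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
        # Each sampled response is scored against the mean/std of the rewards
        # in its own group, so no learned critic (value model) is needed.
        return (group_rewards - group_rewards.mean()) / (group_rewards.std() + eps)

    # Example: rule-based rewards for four responses sampled for one
    # (instruction, screenshot) pair.
    rewards = np.array([1.0, 0.0, 0.5, 1.0])
    print(grpo_advantages(rewards))  # above-average responses get positive advantage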

Framework

Given the high-level instruction, the action history, and the screenshot input, the policy model generates multiple responses containing reasoning steps. Verifiable rewards, such as the action type reward, click point reward, and input text reward, are then used with a policy gradient optimization algorithm to update the policy model.
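To make the rule-based nature of these rewards concrete, the sketch below scores a predicted action against ground truth. The function signatures, action schema, and all-or-nothing scoring are assumptions for illustration; the exact reward definitions used in GUI-R1 are given in the paper and the training code.

    def action_type_reward(pred_type: str, gt_type: str) -> float:
        # 1 if the predicted action type (e.g. click, type, scroll) matches, else 0.
        return 1.0 if pred_type == gt_type else 0.0

    def click_point_reward(pred_xy: tuple, gt_bbox: tuple) -> float:
        # 1 if the predicted click point lies inside the ground-truth element box.
        x, y = pred_xy
        x1, y1, x2, y2 = gt_bbox
        return 1.0 if x1 <= x <= x2 and y1 <= y <= y2 else 0.0

    def input_text_reward(pred_text: str, gt_text: str) -> float:
        # 1 for an exact match of the typed text, ignoring case and outer whitespace.
        return 1.0 if pred_text.strip().lower() == gt_text.strip().lower() else 0.0

A scalar reward for a rollout can then be formed by summing whichever of these terms apply to the ground-truth action type.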

Result

[Result figures: benchmark comparison tables; see the repository for the images.]

Requirements

We recommend using the pre-built docker image in EasyR1.

    # stable
    docker pull hiyouga/verl:ngc-th2.5.1-cu120-vllm0.7.4-hotfix
    # nightly
    docker pull hiyouga/verl:ngc-th2.6.0-cu120-vllm0.8.2

Data preparation

Download the training and evaluation dataset GUI-R1-3K.

The structure of the directory should be:

    │── Dataset
    │   ├── train.parquet
    │   ├── test.parquet
    │   ├── androidcontrol_high_test.parquet
    │   ├── androidcontrol_low_test.parquet
    │   ├── guiact_web_test.parquet
    │   ├── guiodyssey_test.parquet
    │   ├── omniact_web_test.parquet
    │   ├── omniact_desktop_test.parquet
    │   ├── screenspot_pro_test.parquet
    │   ├── screenspot_test.parquet
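Assuming standard parquet tooling, you can sanity-check a downloaded split with pandas; this snippet only prints whatever the file contains, since the schema is defined by the dataset itself (the path below assumes the layout shown above).

    import pandas as pd

    # Inspect one split of GUI-R1-3K; adjust the path to where Dataset/ was placed.
    df = pd.read_parquet("Dataset/train.parquet")
    print(df.shape)
    print(df.columns.tolist())
    print(df.head(1))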

RL Training

    bash examples/qwen2_5_vl_3b_gui_grpo.sh
    bash examples/qwen2_5_vl_7b_gui_grpo.sh

Inference and Evaluation

    cd guir1
    bash inference.sh
    bash eval.sh
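Evaluation ultimately compares an action parsed out of the model's free-form response against the ground truth. Below is a minimal parsing sketch; the <answer> tag and the JSON fields are hypothetical stand-ins for whatever output format the repo's prompts actually define.

    import json
    import re

    def parse_action(response: str):
        # Pull a JSON action out of a response such as
        # '<think>...</think><answer>{"action": "click", "point": [512, 384]}</answer>'.
        # The <answer> tag and JSON schema are assumptions for illustration.
        match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
        if match is None:
            return None
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            return None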

Star History

Star History Chart

Acknowledgements

We would like to express our sincere gratitude to DeepSeek, VLM-R1, QwenVL, EasyR1, and OS-ATLAS for providing open-source resources that contributed to the development of this project.

Citation

If you find this repo useful for your research, please consider citing the paper:

    @article{luo2025gui,
      title={GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents},
      author={Luo, Run and Wang, Lu and He, Wanwei and Xia, Xiaobo},
      journal={arXiv preprint arXiv:2504.10458},
      year={2025}
    }

