- Notifications
You must be signed in to change notification settings - Fork238
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
License
inclusionAI/AReaL
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
|Paper |Documentation |Ask DeepWiki |🤗 Models & Data |
WeChat (微信) Group |
AReaL is an open-sourcefully asynchronous reinforcement learning training systemfor largereasoning and agentic models, developed by the AReaL Team at Ant Group.Built upon the open-source projectReaLHF,we are fully committed to open-source principles by providing training details, data,and infrastructure required to reproduce our results along with the models themselves.AReaL aims to help everyone build their own AI agents easily and affordably. Our teamloves milk tea because it's delicious, customizable, and affordable. We hope you enjoyour project just as you enjoy real-world milk tea (cheers).
AReaL Highlights
- ⚡Flexibility: Seamless customization formulti-turn agentic rolloutworkflows within a single file, and smooth integration withother agentic tooling frameworks.
- 🚀Scalability: Through algorithm-system co-design, AReaL deliversstable fullyasynchronous RL training withindustry-leading speed. AReaL seamlessly adapts todiverse computational environments, scaling from a single node to 1,000+ GPUs.
- 🔪Cutting-Edge Performance: AReaL produces state-of-the-artmath,coding, andsearch agents with exceptionalcapabilities.
[2025/08/30] Introducing ASearcher, a state-of-the-art search agent built withAReaL's end-to-end asynchronous RL training. Check out thepaper andtheopen-source repository!
[2025/07/31] (AReaL-lite) We introduce AReaL-lite, alightweight version ofAReaL designed specifically for AI researchers and rapid prototyping. AReaL-litefeatures analgorithm-first API design that prioritizes ease of use and algorithmdevelopment, while natively supportingfully asynchronous agentic RL. With 80% fewerlines of code, AReaL-lite maintains 90% of AReaL's performance and core functionality.Check outour AReaL-lite design documentation andthe quickstart guide tobegin your journey withAReaL-lite!
📋 Previous Releases
[2025/06/03] (v0.3, boba²) We releaseboba² (double-boba) for fullyasynchronous RL training, which achieves2.77× speedup while delivering comparable orsuperior training performance compared to synchronous systems. Furthermore,asynchronous RL significantly simplifies multi-turn agentic RL training setup! Check outour v0.3 overview blog and theresearch paper.
[2025/03/31] (v0.2, boba) Introducing our milestone release—boba! Please call itA-ReaL-boba! This release features significantly faster training with SGLang support andstate-of-the-art 7B and 32B models for mathematical reasoning. Check out ourv0.2 technical blog.
[2025/02/24] (v0.1) Our initial release includes reproducible results for 1.5B and7B Large Reasoning Models (LRMs). Check out ourv0.1 technical blog.
| Task | Description | Performance |
|---|---|---|
| Math | Mathematical problem solving (SFT, GRPO, or PPO) | TBA |
| Multi-Turn Math | Iterative mathematical problem solving with self-correction | Training Curve |
| LoRA Math | Math Agent Trained With LoRA | TBA |
| VLM Math | CLEVR visual counting tasks | TBA |
| Reasoning | Countdown numbers game with custom rewards | Training Curve |
| Search Agent | An agent with end-to-end reasoning, search, browsing, and summarization capabilities | ASearcher Repo |
| Tool-Integrated Reasoning | An agent that can invoke tools during reasoning | TIR Example |
| RLHF | RLHF for LLM Alignment | RLHF Example |
| Algorithm | Documentation | Paper | Configuration |
|---|---|---|---|
| GRPO | 📖 Docs | 📄 Paper | 🔗 GSM8K Example |
| GSPO | 📖 Docs | 📄 Paper | 🔗 GSM8K Example |
| PPO | - | 📄 Paper | 🔗 GSM8K Example |
| DAPO | 📖 Docs | 📄 Paper | 🔗 GSM8K Example |
| LitePPO | 📖 Docs | 📄 Paper | - |
| Dr.GRPO | 📖 Docs | 📄 Paper | - |
| REINFORCE++ | - | 📄 Paper | 🔗 GSM8K Example |
| RLOO | 📖 Docs | 📄 Paper | 🔗 GSM8K Example |
| RLHF Reward Modeling | - | - | 🔗 RLHF Example |
| SFT | - | - | 🔗 GSM8K Example |
| Model Family | Megatron | PyTorch FSDP | Notes |
|---|---|---|---|
| Qwen2/3 | ✅ | ✅ | - |
| Qwen3-MoE | ✅ | ✅ | - |
| Qwen2.5-VL | ❌ | ✅ | Vision-language model |
| Qwen3-VL | ❌ | ✅ | Vision-language model |
| Gemma 3 | ❌ | ✅ | Vision-language model |
| Other Hugging Face LLM | ❌ | ✅ | Compatibility depending on the version oftransformers |
| Backend | DP | Tensor Parallel | Sequence Parallel within TP | Context Parallel | Pipeline Parallel | Expert Parallel | 1D Sequence Packing | LoRA |
|---|---|---|---|---|---|---|---|---|
| Megatron | ✅ (ZeRO-1) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| PyTorch FSDP | ✅ (FSDP2) | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
| Backend | Tensor Parallel | Context Parallel | Pipeline Parallel | Data Parallel Attention | Expert Parallel |
|---|---|---|---|---|---|
| vLLM | ✅ | ❓ | ✅ | ❓ | ❓ |
| SGLang | ✅ | ❌ | ❌ | ✅ | ✅ |
Our training scripts automatically download the required dataset (openai/gsm8k) andmodel (Qwen/Qwen2-1.5B-Instruct). To run on a single node:
python3 -m areal.launcher.local \ examples/math/gsm8k_grpo.py \ --config examples/math/gsm8k_grpo.yaml
To run on a Ray cluster with 2 nodes and 8 GPUs per node (remember to update paths inthe YAML file to point to your shared storage):
python3 -m areal.launcher.ray \ examples/math/gsm8k_grpo.py \ --config examples/math/gsm8k_grpo.yaml \ cluster.n_nodes=2 \ cluster.n_gpus_per_node=8
For comprehensive setup instructions, seeour quickstart guide.
- Installation
- Quickstart
- CLI Configurations
- Asynchronous RL Explained
- Fine-Tuning Large MoE
- Agentic RL
- Debugging Best Practices
- Handling OOM Issues
- Customize dataset with AReaL-lite
- Customize Agentic/RVLR rollout workflows with AReaL-lite
- Customize algorithms with AReaL-lite
We warmly welcome contributions from the community! Whether you're fixing bugs, addingfeatures, improving documentation, or helping others, your contribution is valued.Please check ourContributing Guide for detailed information.
# Fork and clone the repositorygit clone https://github.com/YOUR-USERNAME/AReaLcd AReaL# Install in development modepip install -e".[dev]"# Set up pre-commit hooks for automatic formattingpip install pre-commitpre-commit install# Make changesgit checkout -b feat/gpt-o5git add.# `git commit` will automatically format your filegit commit -m"Implement gpt-o5 training loop"git push
- GitHub Discussions - Askquestions, share ideas, and connect with the community
- WeChat Group - Join our WeChat community (微信群)
- Project Roadmap - See what we're working on and what's planned
AReaL is under active development with planned minor releases weekly and major releasesmonthly. We warmly welcome community engagement and contributions. We are alsoactively hiring interns and full-time employees with open positions in both the USand China.
We gratefully acknowledge that major contributors are from the AReaL Team at Ant Groupand the Institute for Interdisciplinary Information Sciences, Tsinghua University.
We have also received invaluable assistance from the following groups (listedalphabetically):
The Data Intelligence Lab at Ant Research for their data support
TheRelaxed System Lab from HKUST forseamless collaboration on numerous system-related aspects
TheSGLang team for supporting custom weightupdate features and their contributions during AReaL-lite development
The Super Computing Technology (SCT) team at Ant Group for their expertise inlarge-scale cluster operations and maintenance
Special thanks to @Lyken17 for providing valuable suggestions throughout ourdevelopment process
We also deeply appreciate all pioneering work from the community, particularly theReaLHF project from OpenPsi Inc. and otheroutstanding projects, including but not limited toDeepScaleR,Open-Reasoner-Zero,OpenRLHF,VeRL,SGLang,QwQ,Light-R1, andDAPO.
@inproceedings{mei2025real,author ={Mei, Zhiyu and Fu, Wei and Li, Kaiwei and Wang, Guangju and Zhang, Huanchen and Wu, Yi},title ={ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation},booktitle ={Proceedings of the Eighth Conference on Machine Learning and Systems, MLSys 2025, Santa Clara, CA, USA, May 12-15, 2025},publisher ={mlsys.org},year ={2025},}
@misc{fu2025areal,title={AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning},author={Wei Fu and Jiaxuan Gao and Xujie Shen and Chen Zhu and Zhiyu Mei and Chuyi He and Shusheng Xu and Guo Wei and Jun Mei and Jiashu Wang and Tongkai Yang and Binhang Yuan and Yi Wu},year={2025},eprint={2505.24298},archivePrefix={arXiv},primaryClass={cs.LG},url={https://arxiv.org/abs/2505.24298},}
About
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
Topics
Resources
License
Contributing
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Uh oh!
There was an error while loading.Please reload this page.
