Installation | Quick Start | Implementation Details | Add Dataset/Environment
Debug & Known Issues | License | Acknowledgement | Citation
This is the official implementation of "ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning".
If you like our work, we would be delighted if you gave us a star ⭐!
- [2025/11/28] 🔥 ReinFlow now supports fine-tuning GR00T VLA models from NVIDIA. Check it out at RLinf-GR00T-N1.5.
- [2025/11/7] Updated the limitations section.
- [2025/11/5] Updated tips on hyperparameter tuning.
- [2025/11/2] 🔥 We scaled up ReinFlow to fine-tune VLA models such as $\pi_0$ and $\pi_{0.5}$. The code and checkpoint for the LIBERO environment are available at RLinf-pi0. A technical report including results on LIBERO, MetaWorld, and ManiSkill/Simpler is available at "$\pi_{\texttt{RL}}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models" (arXiv:2510.25889).
- [2025/09/18] The ReinFlow paper is accepted at NeurIPS 2025.
- [2025/08/18] All training metrics (losses, rewards, etc.) released in WandB to help you reproduce our results.
- [2025/07/30] Fixed the rendering bug in Robomimic. Now supports rendering at 1080p resolution.
- [2025/07/29] Added a tutorial on how to record videos during evaluation in the docs.
- [2025/06/14] Updated the webpage with a detailed explanation of the algorithm design.
- [2025/05/28] Paper is posted on arXiv!
ReinFlow is a flexible policy gradient framework for fine-tuning flow matching policies at any denoising step.
How does it work?
👉 First, train flow policies using imitation learning (behavior cloning).
👉 Then, fine-tune them with online reinforcement learning using ReinFlow!
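For intuition, here is a minimal sketch of the first stage: behavior cloning a 1-rectified-flow policy with the standard conditional flow-matching regression loss. The function and the `velocity_net` interface are illustrative assumptions, not the repository's actual API.

```python
import torch

def flow_matching_bc_loss(velocity_net, obs, expert_action):
    """Behavior-cloning loss for a 1-rectified-flow policy (schematic).

    velocity_net(noisy_action, t, obs) returns a predicted velocity with the
    same shape as the action.
    """
    a1 = expert_action                                     # target action from demonstrations
    a0 = torch.randn_like(a1)                              # Gaussian noise (flow source)
    t = torch.rand(a1.shape[0], 1, device=a1.device)       # random interpolation time in [0, 1]
    xt = (1.0 - t) * a0 + t * a1                           # straight-line interpolant
    target_velocity = a1 - a0                              # constant velocity of the straight path
    pred_velocity = velocity_net(xt, t, obs)
    return ((pred_velocity - target_velocity) ** 2).mean()
```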
🧩 Supports:
- ✅ 1-Rectified Flow
- ✅ Shortcut Models
- ✅ Any other policy defined by ODEs (in principle)
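At inference time, a trained flow policy turns noise into an action by integrating the learned ODE with only a few Euler steps. The sketch below is schematic, with a hypothetical `velocity_net` and tensor shapes; the repository's actual samplers (including the shortcut-model variant) may differ.

```python
import torch

@torch.no_grad()
def sample_action(velocity_net, obs, action_dim, num_steps=4):
    """Euler integration of da/dt = v_theta(a, t, obs) from t=0 to t=1 (schematic)."""
    a = torch.randn(obs.shape[0], action_dim, device=obs.device)  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = torch.full((obs.shape[0], 1), k * dt, device=obs.device)
        a = a + velocity_net(a, t, obs) * dt                      # one Euler step along the learned flow
    return a
```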
📈 Empirical Results: ReinFlow achieves strong performance across a variety of robotic tasks:
- 🦵 Legged Locomotion (OpenAI Gym)
- ✋ State-based manipulation (Franka Kitchen)
- 👀 Visual manipulation (Robomimic)
🧠 Key Innovation: ReinFlow trains a noise injection network end-to-end:
- ✅ Makes policy probabilities tractable, even with very few denoising steps (e.g., 4, 2, or 1)
- ✅ Robust to discretization and Monte Carlo approximation errors
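Conceptually, the noise injection network converts each deterministic Euler step into a Gaussian transition, so the log-probability of the whole denoising chain is a sum of Gaussian log-densities that can be fed into a policy-gradient (e.g., PPO-style) objective. The sketch below illustrates this idea under assumed names (`velocity_net`, `noise_net`) and a simplified noise parameterization; it is not the paper's exact formulation.

```python
import torch

def sample_with_logprob(velocity_net, noise_net, obs, action_dim, num_steps=4):
    """Stochastic denoising with learned noise injection (schematic).

    Each transition x_{k+1} ~ N(x_k + v_theta(x_k, t_k, obs) * dt, sigma_phi(x_k, t_k, obs)^2)
    is Gaussian, so the chain log-probability is exactly computable.
    """
    batch = obs.shape[0]
    x = torch.randn(batch, action_dim, device=obs.device)
    dt = 1.0 / num_steps
    log_prob = torch.zeros(batch, device=obs.device)
    for k in range(num_steps):
        t = torch.full((batch, 1), k * dt, device=obs.device)
        mean = x + velocity_net(x, t, obs) * dt         # deterministic Euler drift
        std = noise_net(x, t, obs)                      # learned, state-dependent noise scale (> 0)
        dist = torch.distributions.Normal(mean, std)
        x = dist.sample()                               # inject noise: the policy becomes stochastic
        log_prob = log_prob + dist.log_prob(x).sum(-1)  # accumulate the exact chain log-density
    return x, log_prob                                  # log_prob enters the policy-gradient update
```

Setting the injected noise to zero recovers the deterministic Euler sampler sketched earlier, which is why the same framework applies at any number of denoising steps.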
Learn more on our 🔗 project website or check out the arXiv paper.
Please follow the steps in installation/reinflow-setup.md.
To fully reproduce our experiments, please refer to ReproduceExps.md.
To download our training data and reproduce the plots in the paper, please refer to ReproduceFigs.md.
Please refer to Implement.md for descriptions of the key hyperparameters of FQL, DPPO, and ReinFlow.
Please refer to Custom.md.
Please refer to KnownIssues.md to see how to resolve errors you encounter.
After training flow policies with RL on multiple benchmarks (OpenAI Gym, Franka Kitchen, Robomimic, LIBERO, ManiSkill, MetaWorld) and scaling model size from 3M to 3B parameters, we have found the following hyperparameters to be critical to RL's success, especially in visual manipulation with sparse rewards:
- SFT success rate. RL cannot easily train visual manipulation policies from scratch, so optimize your SFT success rate before starting RL. The stronger your SFT checkpoint is, the easier RL will be.
- Noise level. When the SFT success rate is low, tune the noise down to [0.04, 0.10] or [0.05, 0.12] to avoid too many erroneous behaviors during early-stage exploration. When the SFT success rate is high, relaxing the noise level to [0.08, 0.16] is usually good practice.
- Entropy coefficient. Turn it off first. If the policy struggles to improve, adding a small coefficient of 0.005 may help. When the policy is small and the problem is simple (dense reward, low-dimensional input), use a larger entropy coefficient; otherwise, be cautious about increasing this constant.
- Critic warmup. The stronger your SFT checkpoint is, the more you need critic warmup. Pick an appropriate critic network architecture and run some rounds of warmup before policy gradient ascent. Make sure the critic loss decreases smoothly after the warmup phase, and keep a keen eye on the explained variance: it should quickly rise to a high level. Even without warmup, ReinFlow should still increase the success rate eventually, but convergence is usually slower.
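To make these tips concrete, here is an illustrative set of starting values. The key names and the warmup count are placeholders, not the repository's actual config schema; the numeric ranges simply restate the guidance above.

```python
# Illustrative ReinFlow fine-tuning hyperparameters distilled from the tips above.
# Key names are hypothetical placeholders, not the repository's config schema.
finetune_config = {
    "noise_level_range": [0.08, 0.16],  # drop to [0.04, 0.10] or [0.05, 0.12] if SFT success rate is low
    "entropy_coef": 0.0,                # start at 0; try ~0.005 only if the policy plateaus
    "critic_warmup_rounds": 5,          # assumed value: warm up the critic before policy-gradient updates
    "denoising_steps": 4,               # ReinFlow also supports 2 or even 1 step
}
```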
Based on community feedback, we have added a limitations section to highlight the shortcomings of our algorithm and note important caveats. We hope this discussion will inspire future research.
- ReinFlow may not be an optimal method to train RL agents from scratch. Our method is designed for fine-tuning purposes, not pre-training.
- Release pi0, pi0.5 fine-tuning results.
- Release WandB metrics
- Release docs
- Release checkpoints
- Release codebase
This repository is released under the MIT license. See LICENSE. If you use our code, we would appreciate it if you pasted the license at the beginning of your scripts.
This repository was developed from multiple open-source projects. Major references include:
- TorchCFM, Tong et al.: Conditional flow-matching repository.
- Shortcut Models, Frans et al.: One-Step Diffusion via Shortcut Models.
- DPPO, Ren et al.: DPPO official implementation.
We also thank our collaborators from the open-source RL infrastructure project RLinf for their generous support, which enabled scaling ReinFlow to models of up to 3 billion parameters across 320 highly randomized visual manipulation environments with thousands of object-scene-task-pose combinations.
For more references, please refer to Acknowledgement.md.
@misc{zhang2025reinflowfinetuningflowmatching,
  title={ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning},
  author={Tonghe Zhang and Chao Yu and Sichang Su and Yu Wang},
  year={2025},
  eprint={2505.22094},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2505.22094},
}