[NeurIPS 2025] Flow x RL. "ReinFlow: Fine-tuning Flow Policy with Online Reinforcement Learning". Supports VLAs, e.g., pi0 and pi0.5. Fully open-sourced.

💐 Paper accepted at NeurIPS 2025

Tonghe Zhang$^1$, Chao Yu$^{2,3}$, Sichang Su$^4$, Yu Wang$^2$

$^1$ Carnegie Mellon University  $^2$ Tsinghua University  $^3$ Beijing Zhongguancun Academy  $^4$ University of Texas at Austin

Website | Docs | NeurIPS | arXiv | Checkpoints | WandB

[Figure: Architecture Diagram]

[Figures: Shortcut Flow Can | Shortcut Transport]


Installation | Quick Start | Implementation Details | Add Dataset/Environment
Debug & Known Issues | License | Acknowledgement | Citation

This is the official implementation of "ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning".

If you like our work, please consider giving us a star!

📢 News

  • [2025/11/28] 🔥 ReinFlow now supports fine-tuning NVIDIA's GR00T VLA models. Check it out at RLinf-GR00T-N1.5.
  • [2025/11/7] Updated the limitations section.
  • [2025/11/5] Updated tips on hyperparameter tuning.
  • [2025/11/2] 🔥 We scaled up ReinFlow to fine-tune VLA models such as $\pi_0$ and $\pi_{0.5}$.
    The code and checkpoints for the LIBERO environment are available at RLinf-pi0. A technical report with results on LIBERO, MetaWorld, and ManiSkill/Simpler is available as $\pi_{\texttt{RL}}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models (arXiv:2510.25889).
  • [2025/09/18] The ReinFlow paper is accepted at NeurIPS 2025.
  • [2025/08/18] All training metrics (losses, rewards, etc.) released in WandB to help you reproduce our results.
  • [2025/07/30] Fixed the rendering bug in Robomimic; rendering at 1080p resolution is now supported.
  • [2025/07/29] Added a tutorial on how to record videos during evaluation to the docs.
  • [2025/06/14] Updated the webpage with a detailed explanation of the algorithm design.
  • [2025/05/28] Paper is posted on arXiv!

🚀 About ReinFlow

ReinFlow is a flexible policy gradient framework for fine-tuning flow matching policies at any denoising step.

How does it work?
👉 First, train flow policies using imitation learning (behavior cloning); a minimal sketch of this pretraining objective follows below.
👉 Then, fine-tune them with online reinforcement learning using ReinFlow!
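
To make the first stage concrete, here is a minimal, hypothetical PyTorch-style sketch of a behavior-cloning objective for a 1-Rectified Flow policy. The module `velocity_net` and its `(obs, x_t, t)` signature are illustrative assumptions, not this repository's actual API.

```python
# Minimal sketch (not this repo's exact code) of flow-matching behavior cloning,
# assuming flat action vectors of shape (batch, action_dim).
import torch

def flow_matching_bc_loss(velocity_net, obs, expert_action):
    """Conditional flow-matching loss for a 1-Rectified Flow policy."""
    noise = torch.randn_like(expert_action)          # x_0 ~ N(0, I)
    t = torch.rand(expert_action.shape[0], 1)        # t ~ U(0, 1), broadcast over action dims
    x_t = (1.0 - t) * noise + t * expert_action      # straight-line interpolation between noise and data
    target_velocity = expert_action - noise          # constant velocity along the straight line
    pred_velocity = velocity_net(obs, x_t, t)        # hypothetical network signature
    return ((pred_velocity - target_velocity) ** 2).mean()
```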

🧩 Supports:

  • ✅ 1-Rectified Flow
  • ✅ Shortcut Models
  • ✅ Any other policy defined by ODEs (in principle)

📈 Empirical Results: ReinFlow achieves strong performance across a variety of robotic tasks:

  • 🦵 Legged Locomotion (OpenAI Gym)
  • ✋ State-based manipulation (Franka Kitchen)
  • 👀 Visual manipulation (Robomimic)

🧠 Key Innovation: ReinFlow trains a noise injection network end-to-end (a minimal sketch follows this list):

  • ✅ Makes policy probabilities tractable, even with very few denoising steps (e.g., 4, 2, or 1)
  • ✅ Robust to discretization and Monte Carlo approximation errors
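
As a rough illustration of the idea (the modules `velocity_net` and `noise_net` are assumptions, not the repository's interface), the sketch below injects learnable Gaussian noise at each Euler denoising step, so the chain of transitions has an exact log-probability that a policy-gradient method can use:

```python
# Illustrative sketch of ReinFlow-style noise injection; not the repo's exact interface.
import torch

def sample_action_with_logprob(velocity_net, noise_net, obs, action_dim, num_steps=4):
    """Sample an action with few denoising steps and return the chain's log-probability."""
    batch = obs.shape[0]
    x = torch.randn(batch, action_dim)              # start from Gaussian noise
    dt = 1.0 / num_steps
    log_prob = torch.zeros(batch)
    for k in range(num_steps):
        t = torch.full((batch, 1), k * dt)
        mean = x + dt * velocity_net(obs, x, t)     # deterministic Euler update
        std = noise_net(obs, x, t)                  # learned noise scale, assumed positive (e.g., softplus output)
        dist = torch.distributions.Normal(mean, std)
        x = dist.sample()                           # stochastic next iterate
        log_prob = log_prob + dist.log_prob(x).sum(dim=-1)  # Gaussian chain => tractable log-prob
    return x, log_prob                              # final action and its chain log-probability
```

Because every transition is an explicit Gaussian, the joint log-probability of the sampled chain stays exact no matter how few denoising steps are used, which is what makes fine-tuning with standard policy-gradient objectives possible.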

Learn more on our 🔗 project website or check out the arXiv paper.

🚀 Installation

Please follow the steps in installation/reinflow-setup.md.

🚀 Quick Start: Reproduce Our Results

To fully reproduce our experiments, please refer to ReproduceExps.md.

To download our training data and reproduce the plots in the paper, please refer to ReproduceFigs.md.

🚀 Implementation Details

Please refer to Implement.md for descriptions of the key hyperparameters of FQL, DPPO, and ReinFlow.

🚀 Adding Your Own Dataset or Environment

Please refer to Custom.md.

🚀 Debug Aid and Known Issues

Please refer to KnownIssues.md to see how to resolve errors you encounter.

🚀 Tips on Hyperparameter Tuning

After training flow policies with RL on multiple benchmarks (OpenAI Gym, Franka Kitchen, Robomimic, LIBERO, ManiSkill, MetaWorld) and scaling model size from 3M to 3B parameters, we have found that the following hyperparameters are critical to RL's success, especially in visual manipulation with sparse rewards (summarized in the sketch after this list):

  • SFT success rate. RL cannot easily train visual manipulation policies from scratch, so optimize your SFT success rate before starting RL. The stronger your SFT checkpoint is, the easier RL will be.
  • Noise level. When the SFT success rate is low, tune the noise down to [0.04, 0.10] or [0.05, 0.12] to avoid too many erroneous behaviors during early-stage exploration. When the SFT success rate is high, relaxing the noise range to [0.08, 0.16] is usually good practice.
  • Entropy coefficient. Turn it off first. When the policy struggles to improve, adding a small coefficient of 0.005 may help. When the policy is small and the problem is simple (dense reward, low-dimensional input), use a larger entropy coefficient. Otherwise, be cautious about increasing this constant.
  • Critic warmup. The stronger your SFT checkpoint is, the more you need a critic warmup. Pick an appropriate critic network architecture and add some rounds of warmup before policy gradient ascent. Make sure the critic loss decreases smoothly after the warmup phase, and keep a keen eye on the explained variance: it should quickly rise to a higher level. Even without warmup, ReinFlow should eventually be able to increase the success rate, but convergence is usually slower.
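
For quick reference, the advice above can be condensed into the sketch below; the key names are purely illustrative and do not correspond to the actual configuration files in this repository.

```python
# Illustrative summary of the tuning heuristics above; keys and structure are hypothetical.
reinflow_tuning_hints = {
    "noise_std_range": {
        "weak_sft": (0.04, 0.10),    # or (0.05, 0.12): limit erroneous early exploration
        "strong_sft": (0.08, 0.16),  # a stronger SFT checkpoint tolerates wider noise
    },
    "entropy_coef": 0.0,             # start disabled; try ~0.005 only if the policy plateaus
    "critic_warmup": True,           # more important the stronger the SFT checkpoint is
}
```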

🚀 Limitation and Caveats

Based on community feedback, we have added a limitations section to highlight the shortcomings of our algorithm and note important caveats. We hope this discussion will inspire future research.

  • ReinFlow may not be an optimal method for training RL agents from scratch. Our method is designed for fine-tuning, not pre-training.

⭐ Todo

  • Release pi0, pi0.5 fine-tuning results.
  • Release WandB metrics
  • Release docs
  • Release checkpoints
  • Release codebase

License

This repository is released under the MIT license. See LICENSE. If you use our code, we would appreciate it if you paste the license at the beginning of the script.

Acknowledgement

This repository was developed from multiple open-source projects.

We also thank our collaborators from the open-source RL infrastructure projectRLinf for their generous support, which enabled scaling ReinFlow to models of up to 3 billion parameters across 320 highly randomized visual manipulation environments with thousands of object-scene-task-pose combinations.

For more references, please refer to Acknowledgement.md.

Cite our work

@misc{zhang2025reinflowfinetuningflowmatching,
  title={ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning},
  author={Tonghe Zhang and Chao Yu and Sichang Su and Yu Wang},
  year={2025},
  eprint={2505.22094},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2505.22094},
}

Star History

Star History Chart
