
Antonin Raffin

Research Engineer in Robotics and Machine Learning

German Aerospace Center (DLR)

Bio

Antonin Raffin is a research engineer at the German Aerospace Center (DLR) who specializes in reinforcement learning (RL). He is the lead developer of Stable-Baselines3 (SB3), an open-source library that implements Deep RL algorithms. His main focus is on learning controllers directly on real robots and improving the reproducibility of RL.

Interests

  • Robotics
  • Reinforcement Learning
  • State Representation Learning
  • Machine Learning

Projects


SBX: Stable Baselines Jax

A proof-of-concept version of Stable-Baselines3 in Jax.

Datasaurust

Blazingly fast implementation of the Datasaurus paper in Rust. Same Stats, Different Graphs.

Stable Baselines3

A set of improved implementations of reinforcement learning algorithms in PyTorch.
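
For a sense of the library's interface, a minimal training script looks like this (a sketch assuming stable-baselines3 and Gymnasium are installed; CartPole-v1 is just a stand-in task):

    # Train PPO on CartPole-v1 with default hyperparameters.
    from stable_baselines3 import PPO

    # "MlpPolicy" selects a small fully connected policy/value network.
    model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
    model.learn(total_timesteps=10_000)  # run 10k environment steps
    model.save("ppo_cartpole")           # reload later with PPO.load("ppo_cartpole")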

Learning to Drive Smoothly in Minutes

Learning to drive smoothly in minutes using reinforcement learning on a Donkey Car.

RL Baselines Zoo

A collection of 70+ pre-trained RL agents using Stable Baselines.

S-RL Toolbox

S-RL Toolbox: Reinforcement Learning (RL) and State Representation Learning (SRL) for Robotics

Stable Baselines

A fork of OpenAI Baselines, with implementations of reinforcement learning algorithms.

Racing Robot

Autonomous Racing Robot With an Arduino, a Raspberry Pi and a Pi Camera

Arduino Robust Serial

A simple and robust serial communication protocol, with implementations in C (Arduino), C++, Python, and Rust.

Selected Publications

An Open-Loop Baseline for Reinforcement Learning Locomotion Tasks

Antonin Raffin, Olivier Sigaud, Jens Kober, Alin Albu-Schäffer, Joao Silvério, Freek Stulp
October 2023 · RLC 2024
Outstanding Paper Award on Empirical Resourcefulness in RL
In search of a simple baseline for Deep Reinforcement Learning in locomotion tasks, we propose a model-free open-loop strategy. By leveraging prior knowledge and the elegance of simple oscillators to generate periodic joint motions, it achieves respectable performance in five different locomotion environments, with a number of tunable parameters that is a tiny fraction of the thousands typically required by DRL algorithms. We conduct two additional experiments using open-loop oscillators to identify current shortcomings of these algorithms. Our results show that, compared to the baseline, DRL is more prone to performance degradation when exposed to sensor noise or failure. Furthermore, we demonstrate a successful transfer from simulation to reality using an elastic quadruped, where RL fails without randomization or reward engineering. Overall, the proposed baseline and associated experiments highlight the existing limitations of DRL for robotic applications, provide insights on how to address them, and encourage reflection on the costs of complexity and generality.
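
To make the idea concrete, such an open-loop policy amounts to one fixed sine wave per joint; the snippet below is only an illustrative sketch with placeholder parameter values, not the configuration used in the paper:

    import numpy as np

    # Open-loop oscillator: each joint tracks a fixed sine wave.
    # Amplitudes, phases, offsets and the frequency are the only tunable
    # parameters (a handful per joint, no neural network, no feedback).
    def open_loop_action(t, amplitudes, phases, offsets, frequency=1.5):
        return amplitudes * np.sin(2 * np.pi * frequency * t + phases) + offsets

    # Hypothetical 4-joint example: desired joint positions at t = 0.2 s.
    a = np.array([0.3, 0.3, 0.5, 0.5])
    phi = np.array([0.0, np.pi, 0.0, np.pi])
    b = np.zeros(4)
    q_des = open_loop_action(0.2, a, phi, b)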

Learning to Exploit Elastic Actuators for Quadruped Locomotion

Antonin Raffin, Daniel Seidel, Jens Kober, Alin Albu-Schäffer, Joao Silvério, Freek Stulp
October 2022 · SoftRobot 2024
Spring-based actuators in legged locomotion provide energy-efficiency and improved performance, but increase the difficulty of controller design. While previous work has focused on extensive modeling and simulation to find optimal controllers for such systems, we propose to learn model-free controllers directly on the real robot. In our approach, gaits are first synthesized by central pattern generators (CPGs), whose parameters are optimized to quickly obtain an open-loop controller that achieves efficient locomotion. Then, to make this controller more robust and further improve the performance, we use reinforcement learning to close the loop, to learn corrective actions on top of the CPGs. We evaluate the proposed approach on the DLR elastic quadruped bert. Our results in learning trotting and pronking gaits show that exploitation of the spring actuator dynamics emerges naturally from optimizing for dynamic motions, yielding high-performing locomotion, particularly the fastest walking gait recorded on bert, despite being model-free. The whole process takes no more than 1.5 hours on the real robot and results in natural-looking gaits.
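
The control scheme described above can be summarized as a residual policy: the learned action is added on top of the CPG reference. The following outline is only illustrative, with placeholder names and gains rather than the code run on bert:

    import numpy as np

    def cpg_reference(t, amplitude=0.4, frequency=2.0, phase=0.0):
        # Open-loop central pattern generator output for one joint (placeholder form).
        return amplitude * np.sin(2 * np.pi * frequency * t + phase)

    def control(t, observation, policy, scale=0.1):
        # Residual control: a small learned correction on top of the CPG gait.
        return cpg_reference(t) + scale * policy(observation)

    # Example with a dummy "policy" that always outputs zero correction:
    print(control(0.1, np.zeros(12), policy=lambda obs: 0.0))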

Recent Publications

The 37 Implementation Details of Proximal Policy Optimization
Stable-Baselines3: Reliable Reinforcement Learning Implementations
Smooth Exploration for Robotic Reinforcement Learning
See all publications

Recent & Upcoming Talks

Recent Advances in RL for Continuous Control (SOTA) - Early 2026 Update

A presentation on recent advances in model-free RL (SOTA, early 2026), in terms of algorithms, software, and simulators.
Feb 6, 2026 09:00 — 10:00, University of Mannheim, Mannheim, Germany

Stable-Baselines3 (SB3) Tutorial: Getting Started With Reinforcement Learning

This tutorial will present the basics of the Gymnasium and Stable-Baselines3 (SB3) libraries in order to apply reinforcement learning …

Oct 30, 2025 14:00 — 15:30, DLR, Munich, Germany

PhD Defense: Enabling Reinforcement Learning on Real Robots

This dissertation makes several contributions to the training of reinforcement learning agents directly on real robots. It introduces a …
Oct 28, 2025 09:00 — 12:00, TUM, Munich, Germany

Designing (Robot) Software That Is Easy to Use

Hardware without software is like an instrument without a musician. Based on my experience maintaining the Stable-Baselines3 library and working with real robots, I will present key principles for creating easy-to-use interfaces.
Sep 27, 2025 08:00 — 11:00, Seoul, South Korea
See all talks

Recent Posts

RL103: From Deep Q-Learning (DQN) to Soft Actor-Critic (SAC) and Beyond

This second blog post continues my practical introduction to (deep) reinforcement learning, presenting the main concepts and providing intuitions to understand the more recent Deep RL algorithms. In a first post (RL102), I started from tabular Q-learning and worked my way up to Deep Q-learning (DQN).
Dec 12, 2025

RL102: From Tabular Q-Learning to Deep Q-Learning (DQN)

This blog post is meant to be a practical introduction to (deep) reinforcement learning, presenting the main concepts and providing intuitions to understand the more recent Deep RL algorithms. For a more in-depth and theoretical introduction, I recommend reading the RL Bible by Sutton and Barto.
Sep 16, 2025
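
For reference, the tabular Q-learning update that the post starts from fits in a few lines (sketched here on a made-up toy problem):

    import numpy as np

    n_states, n_actions = 10, 2
    Q = np.zeros((n_states, n_actions))  # tabular action-value estimates
    alpha, gamma = 0.1, 0.99             # learning rate and discount factor

    def q_learning_update(s, a, reward, s_next, done):
        # The target uses the greedy value of the next state (off-policy update).
        target = reward + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])

    # Example transition: state 3, action 1, reward 1.0, next state 4.
    q_learning_update(3, 1, 1.0, 4, done=False)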

Getting SAC to Work on a Massive Parallel Simulator: Tuning for Speed (Part II)

This second post details how I tuned the Soft Actor-Critic (SAC) algorithm to learn as fast as PPO in the context of a massively parallel simulator (thousands of robots simulated in parallel).
Jul 1, 2025

Automatic Hyperparameter Tuning - In Practice (Part 2)

This is the second (and last) post on automatic hyperparameter optimization. In the first part, I introduced the challenges and main components of hyperparameter tuning (samplers, pruners, objective function, …). This second part is about the practical application of this technique with the Optuna library, in a reinforcement learning setting (using the Stable-Baselines3 (SB3) library).
Apr 23, 2025
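
As a taste of what the post covers, a bare-bones Optuna study looks roughly like this (a sketch with a dummy objective; the post itself tunes real RL hyperparameters with Stable-Baselines3):

    import optuna

    def objective(trial):
        # Sample a candidate hyperparameter (here: a learning rate on a log scale).
        learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
        # Placeholder score: in practice, train an agent with this learning rate
        # and return its mean evaluation reward.
        return -((learning_rate - 3e-4) ** 2)

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=20)
    print(study.best_params)
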
See all posts

Experience

Researcher

German Aerospace Center (DLR)

October 2018 – Present, Munich
Machine Learning for Robots.

PhD in Robotics

Technical University of Munich (TUM)

October 2018 – October 2025, Munich
PhD Thesis: Enabling Reinforcement Learning on Real Robots

Research Engineer

ENSTA ParisTech - U2IS robotics lab

October 2017 – October 2018, Palaiseau
Working on Reinforcement Learning and State Representation Learning for the DREAM project.

Research Intern

Riminder

April 2017 – September 2017, Paris
Deep Learning for Human Resources.

Research Intern

TU Berlin - RBO lab

May 2016 – August 2016, Berlin
Research internship in representation and reinforcement learning.

Tags

Arduino · C++ · Deep Learning · Jax · Kalman Filter · Machine Learning · Path Planning · Path Tracking · Python · Reinforcement Learning · Robotics · Rust · State Representation Learning
