# NeMo RL Documentation

Welcome to the NeMo RL documentation. NeMo RL is an open-source post-training library from NVIDIA, designed to streamline and scale reinforcement learning methods for large language and multimodal models (LLMs, VLMs, and more).

This documentation provides comprehensive guides, examples, and references to help you get started with NeMo RL and build powerful post-training pipelines for your models.

## Getting Started

- **Overview**: Learn about NeMo RL's architecture, design philosophy, and the key features that make it ideal for scalable reinforcement learning.
- **Quick Start**: Get up and running quickly with examples for both the DTensor and Megatron Core training backends.
- **Installation**: Follow step-by-step instructions for installing NeMo RL, including prerequisites, system dependencies, and environment setup.
- **Features**: Explore the current features and upcoming enhancements in NeMo RL, including distributed training, advanced parallelism, and more.
- **Tips and Tricks**: Troubleshoot common issues, including missing submodules, Ray dashboard access, and debugging techniques.

## Training and Generation

- **Training Backends**: Learn about the DTensor and Megatron Core training backends, their capabilities, and how to choose the right one for your use case.
- **Algorithms**: Discover supported algorithms, including GRPO, SFT, DPO, RM, and on-policy distillation, with detailed guides and examples.
- **Evaluation**: Learn how to evaluate your models using built-in evaluation datasets and custom evaluation pipelines.
- **Cluster Setup**: Configure and deploy NeMo RL on multi-node Slurm or Kubernetes clusters for distributed computing.
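To make the "group-relative" idea behind GRPO (listed among the algorithms above) concrete, here is a minimal, framework-free sketch of how group-normalized advantages are computed: each sampled completion's reward is normalized by the mean and standard deviation of its prompt's group. The function and variable names are illustrative, not NeMo RL APIs; see the Algorithms guide for the real implementation.

```python
from statistics import mean, stdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one prompt's group of sampled completions.

    Each reward is centered on the group mean and scaled by the group's
    (sample) standard deviation, so above-average completions get positive
    advantages and below-average ones get negative advantages.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for the same prompt, scored 0/1 by a reward
# function: the two correct ones receive positive advantages, the two
# incorrect ones negative advantages of equal magnitude.
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Because advantages are centered within each group, they sum to (approximately) zero per prompt, which is what makes the estimator "relative" rather than dependent on an absolute reward scale.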

## Guides and Examples

- **GRPO on DeepScaleR**: Reproduce DeepScaleR results with NeMo RL using GRPO on mathematical reasoning tasks.
- **SFT on OpenMathInstruct-2**: Follow a step-by-step guide for supervised fine-tuning on the OpenMathInstruct-2 dataset.
- **Environments**: Create custom reward environments and integrate them with NeMo RL training pipelines.
- **Adding New Models**: Learn how to add support for new model architectures in NeMo RL.
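As a concrete illustration of what a custom reward environment does, here is a small self-contained sketch that scores a model's completion on a math prompt by exact match against a reference answer. The class name, `score` method, and `Answer:` convention are illustrative assumptions for this sketch, not the actual NeMo RL environment interface; see the Environments guide for the real API.

```python
import re


class ExactMatchMathEnv:
    """Toy reward environment (illustrative only, not the NeMo RL API).

    Rewards 1.0 if the completion's final answer exactly matches the
    reference answer, and 0.0 otherwise.
    """

    def __init__(self, reference_answer: str):
        self.reference_answer = reference_answer.strip()

    def extract_answer(self, completion: str) -> str:
        # Assumes the model was prompted to finish with "Answer: <value>".
        match = re.search(r"Answer:\s*(\S+)", completion)
        return match.group(1) if match else ""

    def score(self, completion: str) -> float:
        return 1.0 if self.extract_answer(completion) == self.reference_answer else 0.0


env = ExactMatchMathEnv("42")
reward = env.score("We add 40 and 2. Answer: 42")  # matches -> 1.0
```

Sparse 0/1 rewards like this pair naturally with group-based algorithms such as GRPO, where several completions per prompt are scored and compared against each other.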

## Advanced Topics

- **Design and Philosophy**: Take a deep dive into NeMo RL's architecture, APIs, and design decisions for scalable RL.
- **Debugging**: Use tools and techniques for debugging distributed Ray applications and RL training runs.
- **FP8 Quantization**: Optimize large language models with FP8 quantization for faster training and inference.
- **Docker Containers**: Build and use Docker containers for reproducible NeMo RL environments.

## API Reference

- **Complete API Documentation**: Comprehensive reference for all NeMo RL modules, classes, functions, and methods. Browse the complete Python API with detailed docstrings and usage examples.