Build RL environments for LLM training

NVIDIA-NeMo/Gym

NeMo Gym is a framework for building reinforcement learning environments to train large language models.

NeMo Gym is a component of the NVIDIA NeMo Framework, NVIDIA’s GPU-accelerated platform for building and training generative AI models.

🏆 Why NeMo Gym?

  • Scaffolding and patterns to accelerate environment development: multi-step, multi-turn, and user modeling scenarios
  • Contribute environments without expert knowledge of the entire RL training loop
  • Test environment and throughput end-to-end independent of the RL training loop
  • Interoperable with existing environments, systems and RL training frameworks
  • Growing collection of training environments and datasets to enable Reinforcement Learning from Verifiable Reward (RLVR)

Important

NeMo Gym is currently in early development. You should expect evolving APIs, incomplete documentation, and occasional bugs. We welcome contributions and feedback - for any changes, please open an issue first to kick off discussion!

🚀 Quick Start

Setup

git clone git@github.com:NVIDIA-NeMo/Gym.git
cd Gym

# Install dependencies
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
uv venv --python 3.12 && source .venv/bin/activate
uv sync --extra dev --group docs

# Configure your model API access
echo "policy_base_url: https://api.openai.com/v1
policy_api_key: your-openai-api-key
policy_model_name: gpt-4.1-2025-04-14" > env.yaml
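Before starting the servers, it can help to confirm that env.yaml contains the three keys written by the setup commands. The following is a minimal sketch and not part of NeMo Gym: `load_flat_yaml` and `missing_keys` are hypothetical helpers, and the parser assumes the flat `key: value` layout produced by the echo command rather than general YAML.

```python
from pathlib import Path

# The three keys written to env.yaml by the setup step.
REQUIRED_KEYS = ("policy_base_url", "policy_api_key", "policy_model_name")


def load_flat_yaml(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines (hypothetical helper, not general YAML)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so URLs such as https://... stay intact.
        key, _, value = line.partition(":")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str) -> list[str]:
    """Return the required keys that are absent or empty."""
    values = load_flat_yaml(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path("env.yaml")
    if env_path.exists():
        missing = missing_keys(env_path.read_text())
        print("env.yaml looks complete" if not missing else f"Missing keys: {missing}")
    else:
        print("env.yaml not found in the current directory")
```

Run it from the Gym directory after the setup step. It only checks that the keys are present, not that the API key or model name is valid.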

Start Servers

Terminal 1 (start servers):

# Start servers (this will keep running)
config_paths="resources_servers/example_simple_weather/configs/simple_weather.yaml,\
responses_api_models/openai_model/configs/openai_model.yaml"
ng_run "+config_paths=[${config_paths}]"
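The `+config_paths=[...]` argument in the command above is a comma-separated list of config files wrapped in square brackets. If you launch the servers from a script, the argument could be assembled as sketched here. `build_config_paths_arg` is a hypothetical helper; only the argument format shown in the command is assumed.

```python
def build_config_paths_arg(paths: list[str]) -> str:
    """Join config file paths into the `+config_paths=[a,b]` form."""
    if not paths:
        raise ValueError("at least one config path is required")
    return "+config_paths=[" + ",".join(paths) + "]"


paths = [
    "resources_servers/example_simple_weather/configs/simple_weather.yaml",
    "responses_api_models/openai_model/configs/openai_model.yaml",
]
print(build_config_paths_arg(paths))
```

The printed string is what the shell expands `"+config_paths=[${config_paths}]"` to, so it can be passed to `ng_run` as a single argument.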

Terminal 2 (interact with agent):

# In a NEW terminal, activate environment
source .venv/bin/activate

# Interact with your agent
python responses_api_agents/simple_agent/client.py

Collect Rollouts

Terminal 2 (keep servers running in Terminal 1):

# Create a simple dataset with one query
echo '{"responses_create_params":{"input":[{"role":"developer","content":"You are a helpful assistant."},{"role":"user","content":"What is the weather in Seattle?"}]}}' > weather_query.jsonl

# Collect verified rollouts
ng_collect_rollouts \
    +agent_name=simple_weather_simple_agent \
    +input_jsonl_fpath=weather_query.jsonl \
    +output_jsonl_fpath=weather_rollouts.jsonl

# View the result
cat weather_rollouts.jsonl | python -m json.tool

This writes the rollouts, with their verification scores, to weather_rollouts.jsonl for use as training data.
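For more than one query, the input file can be generated rather than typed by hand. The sketch below is an illustration, not part of NeMo Gym: `make_query`, `write_jsonl`, and `read_jsonl` are hypothetical helpers, and the record shape is copied from the single weather query above. Nothing is assumed about the fields in the rollout output, so the reader simply prints each record in full.

```python
import json
from pathlib import Path


def make_query(user_text: str, system_text: str = "You are a helpful assistant.") -> dict:
    """Build one record in the same shape as the weather query."""
    return {
        "responses_create_params": {
            "input": [
                {"role": "developer", "content": system_text},
                {"role": "user", "content": user_text},
            ]
        }
    }


def write_jsonl(path: Path, records: list[dict]) -> None:
    """Write one compact JSON object per line."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Read a JSONL file, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    cities = ["Seattle", "Boston"]  # illustrative inputs
    queries = [make_query(f"What is the weather in {city}?") for city in cities]
    write_jsonl(Path("weather_query.jsonl"), queries)

    rollouts_path = Path("weather_rollouts.jsonl")
    if rollouts_path.exists():
        for rollout in read_jsonl(rollouts_path):
            # Field names in the rollout records are not assumed; print everything.
            print(json.dumps(rollout, indent=2))
```

Note that `python -m json.tool` expects a single JSON document. Once the rollouts file holds more than one line, use `python -m json.tool --json-lines` (Python 3.8 or later) or a reader like the one above.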

Clean Up Servers

In Terminal 1, where the servers are running, press Ctrl+C to stop the ng_run process.

📖 Documentation

🤝 Community & Support

We'd love your contributions! Here's how to get involved:

📚 Citations

If you use NeMo Gym in your research, please cite it using the following BibTeX entry:

@misc{nemo-gym,
  title = {NeMo Gym: An Open Source Framework for Scaling Reinforcement Learning Environments for LLM},
  howpublished = {\url{https://github.com/NVIDIA-NeMo/Gym}},
  year = {2025},
  note = {GitHub repository},
}

📦 Available Resource Servers

NeMo Gym includes a curated collection of resource servers for training and evaluation across multiple domains:

Table 1: Example Resource Servers

Purpose: Demonstrate NeMo Gym patterns and concepts.

| Name | Demonstrates | Config | README |
| --- | --- | --- | --- |
| Multi Step | Instruction_Following example | example_multi_step.yaml | README |
| Simple Weather | Basic single-step tool calling | simple_weather.yaml | README |
| Stateful Counter | Session state management (in-memory) | stateful_counter.yaml | README |

Table 2: Resource Servers for Training

Purpose: Training-ready environments with curated datasets.

Tip

Each resource server includes example data, configuration files, and tests. See each server's README for details.

| Resource Server | Domain | Dataset | Description | Value | Config | Train | Validation | License |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Google Search | agent | Nemotron-RL-knowledge-web_search-mcqa | Multiple-choice question answering problems with search tools integrated | Improve knowledge-related benchmarks with search tools | config | | - | Apache 2.0 |
| Math Advanced Calculations | agent | Nemotron-RL-math-advanced_calculations | An instruction-following math environment with counter-intuitive calculators | Improve instruction-following capabilities in specific math environments | config | | - | Apache 2.0 |
| Workplace Assistant | agent | Nemotron-RL-agent-workplace_assistant | Workplace assistant multi-step tool-using environment | Improve multi-step tool use capability | config | | | Apache 2.0 |
| Mini Swe Agent | coding | SWE-bench_Verified | A software development environment with mini-swe-agent orchestration | Improve software development capabilities, like SWE-bench | config | | | MIT |
| Instruction Following | instruction_following | Nemotron-RL-instruction_following | Instruction-following datasets targeting IFEval and IFBench style capabilities | Improve IFEval and IFBench | config | | - | Apache 2.0 |
| Structured Outputs | instruction_following | Nemotron-RL-instruction_following-structured_outputs | Checks whether responses follow the structured output requirements in prompts | Improve instruction-following capabilities | config | | | Apache 2.0 |
| Equivalence Llm Judge | knowledge | Nemotron-RL-knowledge-openQA | Short answer questions with LLM-as-a-judge | Improve knowledge-related benchmarks like GPQA / HLE | config | | - | Apache 2.0 |
| Mcqa | knowledge | Nemotron-RL-knowledge-mcqa | Multiple-choice question answering problems | Improve benchmarks like MMLU / GPQA / HLE | config | | - | Apache 2.0 |
| Math With Judge | math | Nemotron-RL-math-OpenMathReasoning | Math dataset with math-verify and LLM-as-a-judge | Improve math capabilities including AIME 24 / 25 | config | | | Creative Commons Attribution 4.0 International |
