Fork · Nous Research · published Apr 3, 2026

NousResearch/RL

forked from NVIDIA-NeMo/RL

Description: Scalable toolkit for efficient model reinforcement

License: Apache-2.0

Stars: 11

Forks: 2

Open issues: 1

Created: 2026-04-03T14:35:01Z

Pushed: 2026-06-04T18:36:40Z

Default branch: main

Fork: yes

Parent repository: NVIDIA-NeMo/RL

Archived: no

README:

📣 News

  • [03/12/2026] GDPO Support
      • Group reward-Decoupled Normalization Policy Optimization (GDPO) is now supported for multi-reward RL training.
      • Example: [gdpo_math_1B.yaml](/examples/configs/gdpo_math_1B.yaml)
  • Support Async RL training
  • WIP: Nemo-gym compatibility
  • [03/11/2026] Nemotron-3-Super was post-trained with NeMo-RL! Follow this guide to reproduce the full RL training recipe.
  • [02/04/2026] LoRA Support
      • LoRA SFT is supported on both DTensor and Megatron Core backends.
      • LoRA GRPO is supported on both DTensor and Megatron Core backends.
      • LoRA DPO is supported on both DTensor and Megatron Core backends.
      • Nano v3 LoRA recipes:
          • [sft-nanov3-30BA3B-2n8g-fsdp2-lora.yaml](examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.yaml)
          • [grpo-nanov3-30BA3B-2n8g-fsdp2-lora.yaml](examples/configs/recipes/llm/grpo-nanov3-30BA3B-2n8g-fsdp2-lora.yaml)
          • [grpo-nanov3-30BA3B-2n8g-megatron-lora.yaml](examples/configs/recipes/llm/grpo-nanov3-30BA3B-2n8g-megatron-lora.yaml)
  • [01/30/2026] Release v0.5.0!
      • Both linux/amd64 and linux/arm64 Docker containers are available on NGC at nvcr.io/nvidia/nemo-rl:v0.5.0.
      • NeMo-Gym + NeMo-RL support
      • 📊 View the release run metrics on Google Colab to get a head start on your experimentation.
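
The GDPO entry above names the technique but does not describe its mechanics. As a rough illustration only, the sketch below assumes that "decoupled normalization" means standardizing each reward signal separately within a prompt's group of sampled responses and then summing the results, and contrasts this with summing the raw rewards first. The function names (`group_normalize`, `decoupled_advantages`, `summed_then_normalized`) and the example reward names are hypothetical and are not NeMo RL APIs; the actual implementation may differ.

```python
from statistics import mean, pstdev


def group_normalize(values, eps=1e-6):
    """Standardize one reward signal across the responses sampled for one prompt."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def decoupled_advantages(rewards_by_name, weights=None):
    """Assumed GDPO-style step: normalize each reward on its own, then sum.

    rewards_by_name maps a reward name to a list with one score per response.
    """
    names = list(rewards_by_name)
    weights = weights or {name: 1.0 for name in names}
    totals = [0.0] * len(rewards_by_name[names[0]])
    for name in names:
        normalized = group_normalize(rewards_by_name[name])
        for i, value in enumerate(normalized):
            totals[i] += weights[name] * value
    return totals


def summed_then_normalized(rewards_by_name):
    """Baseline for comparison: add the raw rewards, then normalize once."""
    names = list(rewards_by_name)
    n = len(rewards_by_name[names[0]])
    summed = [sum(rewards_by_name[name][i] for name in names) for i in range(n)]
    return group_normalize(summed)


if __name__ == "__main__":
    # Four responses to one prompt, scored by two hypothetical reward functions.
    rewards = {
        "correctness": [1.0, 0.0, 0.0, 0.0],
        "format": [0.0, 1.0, 1.0, 0.0],
    }
    print("summed first:", summed_then_normalized(rewards))
    print("decoupled:   ", decoupled_advantages(rewards))
```

In this toy group, the first two responses have the same raw reward total, so the summed baseline gives them identical advantages. The decoupled version ranks the response that earned the rarer correctness reward higher.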

Previous News

  • [12/15/2025] NeMo-RL is the framework that trained NVIDIA-NeMotron-3-Nano-30B-A3B-FP8! [This guide](docs/guides/nemotron-3-nano.md) provides reproducible instructions for the post-training process.
  • [10/10/2025] DAPO Algorithm Support

NeMo RL now supports the Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO) algorithm, which extends GRPO with Clip-Higher, Dynamic Sampling, Token-Level Policy Gradient Loss, and Overlong Reward Shaping for more stable and efficient RL training. See the [DAPO guide](docs/guides/dapo.md) for more details.
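
The four DAPO components listed above can be sketched in a few lines of plain Python. This is a minimal illustration based on the commonly published description of DAPO, not NeMo RL's implementation: all function names are hypothetical, and the default clip ranges and length thresholds are illustrative assumptions rather than values taken from this repository.

```python
def clip_higher_objective(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """Clip-Higher: PPO-style clipped term with a wider upper clip range."""
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


def token_level_loss(token_ratios_per_response, advantages):
    """Token-level loss: average over every token in the batch.

    Each response contributes in proportion to its length, instead of
    averaging per response first.
    """
    total, count = 0.0, 0
    for token_ratios, advantage in zip(token_ratios_per_response, advantages):
        for ratio in token_ratios:
            total += clip_higher_objective(ratio, advantage)
            count += 1
    return -total / max(count, 1)


def keep_group(rewards):
    """Dynamic sampling: drop prompt groups whose responses all score the same.

    Such groups yield zero advantage everywhere and carry no gradient signal.
    """
    return len(set(rewards)) > 1


def overlong_penalty(length, max_len, cache_len):
    """Overlong reward shaping: a linear penalty in the last cache_len tokens."""
    soft_start = max_len - cache_len
    if length <= soft_start:
        return 0.0
    if length <= max_len:
        return (soft_start - length) / cache_len
    return -1.0


if __name__ == "__main__":
    print(clip_higher_objective(1.25, 1.0))          # inside the wider upper range
    print(keep_group([1.0, 1.0, 1.0]))               # uniform group is filtered out
    print(overlong_penalty(110, max_len=120, cache_len=20))
```

The asymmetric clip lets low-probability tokens with positive advantage grow more than a symmetric range would allow, while the lower bound still limits how fast probabilities are pushed down.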

  • [5/14/2025] [Reproduce DeepscaleR with NeMo RL!](docs/guides/grpo-deepscaler.md)
  • [5/14/2025] Release v0.2.1!
      • 📊 View the release run metrics on Google Colab to get a head start on your experimentation.

Overview

NeMo RL is an open-source post-training library under the NVIDIA NeMo Framework that streamlines and scales reinforcement learning methods for multimodal models (LLMs, VLMs, etc.). Built for flexibility, reproducibility, and scale, it supports both small-scale experiments and large multi-GPU, multi-node deployments, enabling fast experimentation in research and production environments.

[Figure: NeMo RL Architecture Diagram]

What you can expect:

  • Flexibility with a modular design that allows easy integration and customization.
  • Efficient resource management using Ray, enabling scalable and flexible deployment across different hardware configurations.
  • Hackable with native PyTorch-only paths for quick research prototypes.
  • High performance with Megatron Core, supporting various parallelism techniques for large models and large context lengths.
  • Seamless integration with Hugging Face for ease of use, allowing users to leverage a wide range of pre-trained models and tools.
  • Comprehensive documentation that is both detailed and user-friendly, with practical examples.

Please refer to our design documents for more details on the architecture and design philosophy.

Training Backends

NeMo RL supports multiple training backends to accommodate different model sizes and hardware configurations:

  • DTensor - PyTorch's next-generation distributed training with improved memory…
