Fork · Baseten · published May 21, 2026

basetenlabs/compact-rl

forked from PrimeIntellect-ai/prime-rl


Description: Agentic RL Training at Scale w/ Compaction in the Loop

Language: Python

License: Apache-2.0

Stars: 5

Forks: 0

Open issues: 0

Created: 2026-05-21T16:40:17Z

Pushed: 2026-06-07T08:34:19Z

Default branch: main

Fork: yes

Parent repository: PrimeIntellect-ai/prime-rl

Archived: no

README:

---

PRIME-RL: Async RL Training at Scale

---

Overview

PRIME-RL is a framework for large-scale reinforcement learning. It is designed to be easy to use and hackable, yet capable of scaling to 1000+ GPUs. Here is what we think sets it apart:

1. Fully asynchronous RL for high-throughput agentic training at scale.
2. Performant: built to train 1T+ MoE models on 1000+ GPUs with FSDP2 for training and vLLM for inference, with FP8 inference, PD disaggregation, EP and CP parallelism, and more.
3. Native integration with `verifiers` environments through the Environments Hub, including built-in support for SWE and agentic environments.
4. End-to-end post-training: SFT, RL training, and evals.
5. Multi-node deployment with Slurm and Kubernetes support.
6. Multimodal support for VLMs such as Qwen3-VL.
7. Hackable, modular, and extensible by design.

Model support

The trainer works out of the box with both Hugging Face and Prime custom ModelForCausalLM implementations. For selected families (especially large MoE models) we also ship highly optimized training code under src/prime_rl/trainer/models/, including expert parallelism (EP) for MoE layers and context parallelism (CP) for long sequences (see the table), as well as additional kernels such as quack-kernels.

With [model] impl = "auto" (the default), the trainer selects that custom stack when the Hugging Face config type is registered.

| Family | Example IDs | MoE | EP | CP |
|--------|-------------|-----|----|-----|
| GLM-5 (glm_moe_dsa) | zai-org/GLM-5, zai-org/GLM-5-FP8 | yes | ✅ | ✅ |
| Qwen3 MoE (qwen3_moe) | Qwen/Qwen3-30B-A3B, … | yes | ✅ | ✅ |
| Qwen3.5 MoE (qwen3_5_moe) | Qwen/Qwen3.5-35B-A3B, … | yes | ✅ | ✅ |
| Qwen3 / Qwen3.5 VLMs | [multimodal.md](docs/multimodal.md) (qwen3_vl, qwen3_5, qwen3_5_moe) | MoE only on MoE VLMs | MoE only | ✅ |
| Poolside Laguna (laguna) | poolside/Laguna-XS.2 | yes | ✅ | ✅ |
| MiniMax M2 (minimax_m2) | MiniMax/MiniMax-M2 | yes | ✅ | ✅ |
| Nemotron H (nemotron_h) | nvidia/Nemotron-3-Nano-30B-A3B, nvidia/Nemotron-3-Super-120B-A12B, … | yes | ✅ | ❌ |
| Trinity (afmoe) | arcee-ai/Trinity-Mini, … | yes | ✅ | ✅ |
| GLM-4 · GLM-4.5 MoE · INTELLECT-3 (glm4_moe) | THUDM/GLM-4-9B-0414, zai-org/GLM-4.5-Air, zai-org/GLM-4.5, PrimeIntellect/INTELLECT-3, … | yes | ✅ | ✅ |
| GPT-OSS (HF, MoE) | openai/gpt-oss-20b, openai/gpt-oss-120b | yes | ❌ | ✅ |
| Other HF causal LMs | Qwen3 dense, Mistral, … (impl = "hf") | varies | ❌ | ✅ |
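
The `impl = "auto"` behavior described above can be pictured with a small sketch. Everything here is hypothetical and for illustration only: `resolve_impl`, the `"custom"` label, and the `REGISTERED_MODEL_TYPES` set are not taken from PRIME-RL's code, and the fallback to the Hugging Face implementation for unregistered config types is an assumption inferred from the table.

```python
# Hypothetical sketch of "auto" selection; names are illustrative, not PRIME-RL's API.

# Illustrative subset of config types from the table above.
# The real registry lives in the trainer and may differ.
REGISTERED_MODEL_TYPES = {"qwen3_moe", "glm4_moe", "minimax_m2", "nemotron_h"}


def resolve_impl(impl: str, model_type: str, registered=REGISTERED_MODEL_TYPES) -> str:
    """Return which model stack to use: "custom" (optimized) or "hf"."""
    if impl == "auto":
        # "auto" prefers the optimized stack when the config type is registered.
        # ASSUMPTION: unregistered types fall back to the Hugging Face implementation.
        return "custom" if model_type in registered else "hf"
    if impl in {"custom", "hf"}:
        # An explicit choice is passed through unchanged.
        return impl
    raise ValueError(f"unknown impl: {impl!r}")


if __name__ == "__main__":
    for model_type in ("qwen3_moe", "mistral"):
        print(model_type, "->", resolve_impl("auto", model_type))
```

Under these assumptions, a registered MoE config type resolves to the optimized stack, while a dense model outside the registry stays on the Hugging Face path.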

Setup

> *We develop and test on NVIDIA RTX 3090/4090/5090, A100, H100, H200, and B200. If your setup fails, please create an issue.*

Prerequisites

Currently, you need at least one NVIDIA GPU to use PRIME-RL. If you don't already have access to one, we recommend our compute platform for everything from renting on-demand single GPUs for developing, debugging and small ablations, to reserving 1000+ GPU clusters for production-scale training.

Quick Setup

Set up PRIME-RL in a single command.

curl -sSL https://raw.githubusercontent.com/PrimeIntellect-ai/prime-rl/main/scripts/install.sh | bash

Manual Setup

1. Clone the repository

git clone https://github.com/PrimeIntellect-ai/prime-rl.git
cd prime-rl

2. Initialize submodules

git submodule update --init -- deps/verifiers deps/renderers deps/research-environments deps/pydantic-config

3. Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

4. Install dependencies from the lock file

uv sync --all-extras

4.1. Optional: Install Flash Attention 3 (on Hopper GPUs only, for the flash_attention_3 attention backend)

> *NOTE*: This step takes a while because Flash Attention 3 has no prebuilt wheels, so the extension is built from source.
>
> *NOTE*: After this step, do not run uv sync --all-extras or a plain uv run, because either will uninstall the package. Use uv sync --inexact or uv run --no-sync instead.

uv pip install "flash-attn-3 @ git+https://github.com/Dao-AILab/flash-attention.git@main#subdirectory=hopper" --no-build-isolation
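
If you launch entrypoints from a script after installing Flash Attention 3, one way to follow the note above is to build every `uv run` invocation with `--no-sync`. This is a minimal sketch: the helper name `uv_run_command` is hypothetical, and only the `uv run --no-sync` flag and the config path come from this README.

```python
# Hypothetical helper: builds a `uv run` command that skips the automatic sync,
# so a manually installed package (e.g. flash-attn-3) is not uninstalled.


def uv_run_command(args: list[str], no_sync: bool = True) -> list[str]:
    """Return the argv list for `uv run [--no-sync] <args...>`."""
    cmd = ["uv", "run"]
    if no_sync:
        cmd.append("--no-sync")
    return cmd + list(args)


if __name__ == "__main__":
    # Print the command instead of executing it; pass the list to
    # subprocess.run(cmd, check=True) to actually launch it.
    print(" ".join(uv_run_command(["trainer", "@", "configs/debug/rl/train.toml"])))
```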

Validate your environment setup

1. Check that the environment uses Python 3.12

uv run python -V

2. Check that flash-attn is installed

uv run python -c "import flash_attn"

3. Check that you can run the SFT trainer (*this requires 1 GPU*)

uv run sft @ configs/debug/sft/train.toml

4. Check that you can run the RL trainer (*this requires 1 GPU*)

uv run trainer @ configs/debug/rl/train.toml

5. Check that you can run the inference server (*this requires 1 GPU*)

uv run inference @ configs/debug/infer.toml

*Keep the inference server running in the background for the next steps.*

5.1. Check that you can run the orchestrator against the inference server

uv run orchestrator @ configs/debug/orch.toml

5.2. Check that you can run evals against the inference server

uv run eval @ configs/debug/eval.toml
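
Steps 1 and 2, plus the requirement that the inference server is already running before steps 5.1 and 5.2, can be bundled into a small preflight script. This is a sketch under stated assumptions: the function names are hypothetical, and `127.0.0.1:8000` is a placeholder address that should be replaced with whatever your inference config uses. The port check only confirms TCP reachability, not that the server is healthy.

```python
import importlib.util
import socket
import sys


def python_is_312(version_info=sys.version_info) -> bool:
    """Step 1: the environment is expected to use Python 3.12."""
    return tuple(version_info[:2]) == (3, 12)


def module_available(name: str) -> bool:
    """Step 2: True if a top-level module (e.g. "flash_attn") can be located."""
    return importlib.util.find_spec(name) is not None


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Steps 5.1/5.2: True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # ASSUMPTION: placeholder address; use the one from your inference config.
    print("python 3.12:", python_is_312())
    print("flash_attn importable:", module_available("flash_attn"))
    print("inference server reachable:", port_is_open("127.0.0.1", 8000))
```

Saved under a name of your choice (for example a hypothetical `preflight.py`), it can be run inside the project environment with `uv run python preflight.py`.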

Additional Setup

1. If you want to log your runs to W&B, log in

uv run wandb login
# Or set `export WANDB_API_KEY=...`

2. If you require gated/private models or datasets from Hugging Face, log in

uv run hf auth login
# Or set `export HF_TOKEN=...`
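
When using the environment-variable route instead of the interactive logins, a quick check like the following can catch a missing token before a long run starts. The helper `missing_credentials` is hypothetical; only the variable names `WANDB_API_KEY` and `HF_TOKEN` come from the steps above. Note that an empty result is not required if you logged in interactively, since those commands store credentials on disk rather than in the environment.

```python
import os


def missing_credentials(env=os.environ, required=("WANDB_API_KEY", "HF_TOKEN")) -> list[str]:
    """Return the names of required variables that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Not set in the environment:", ", ".join(missing))
    else:
        print("All credential variables are set.")
```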

Training Examples

We provide end-to-end training examples in the [examples](examples) directory to highlight features of the framework and guide you through the process of training your own models.

Basic Training: 1 to 8 GPUs

Follow this guide to learn the basics of Prime-RL. You can train your own models on 1 to 8 GPUs. Ideal for getting…

