Fork · Arcee AI · published Nov 11, 2025

arcee-ai/prime-rl

forked from PrimeIntellect-ai/prime-rl


Description: Async RL Training at Scale

Language: Python

License: Apache-2.0

Stars: 0

Forks: 0

Open issues: 0

Created: 2025-11-11T00:08:20Z

Pushed: 2026-02-05T15:20:50Z

Default branch: main

Fork: yes

Parent repository: PrimeIntellect-ai/prime-rl

Archived: no

README:

---

PRIME-RL: Async RL Training at Scale

---

Overview

PRIME-RL is a framework for large-scale asynchronous reinforcement learning. It is designed to be easy-to-use and hackable, yet capable of scaling to 1000+ GPUs. Beyond that, here is why we think you might like it:

1. Integrates natively with `verifiers` environments via the Environments Hub
2. Supports end-to-end post-training, including SFT and RL training and evals
3. Multi-node deployment with FSDP2 training and vLLM inference backend
4. Designed for asynchronous agentic RL training at scale
5. Hackable, modular and extensible by nature

Setup

> *We develop and test on NVIDIA RTX 3090/4090/5090, A100, H100, H200, and B200. If your setup fails, please create an issue.*

Prerequisites

Currently, you need at least one NVIDIA GPU to use PRIME-RL. If you don't already have access to one, we recommend our compute platform for everything from renting on-demand single GPUs for developing, debugging and small ablations, to reserving 1000+ GPU clusters for production-scale training.

Quick Setup

Set up PRIME-RL in a single command.

curl -sSL https://raw.githubusercontent.com/PrimeIntellect-ai/prime-rl/main/scripts/install.sh | bash

Manual Setup

1. Clone the repository

git clone https://github.com/PrimeIntellect-ai/prime-rl.git
cd prime-rl

2. Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

3. Install dependencies from the lock file

uv sync --all-extras

3.1. Optional: Install Flash Attention 3 (on Hopper GPUs only, for flash_attention_3 attention backend)

> *NOTE*: This step takes a while because Flash Attention 3 has no prebuilt wheels, so the extension is built from source.

> *NOTE*: After this step, running uv sync --all-extras or uv run will uninstall the package. To avoid this, run uv sync --inexact or uv run --no-sync instead.

uv pip install "flash-attn-3 @ git+https://github.com/Dao-AILab/flash-attention.git@main#subdirectory=hopper" --no-build-isolation

Validate your environment setup

1. Check that the environment uses Python 3.12

uv run python -V

2. Check that flash-attn is installed

uv run python -c "import flash_attn"

3. Check that you can run the SFT trainer (*this requires 1 GPU*)

uv run sft @ configs/debug/sft/train.toml

4. Check that you can run the RL trainer (*this requires 1 GPU*)

uv run trainer @ configs/debug/rl/train.toml

5. Check that you can run the inference server (*this requires 1 GPU*)

uv run inference @ configs/debug/infer.toml

*Keep the inference server running in the background for the next steps.*

5.1. Check that you can run the orchestrator against the inference server

uv run orchestrator @ configs/debug/orch.toml

5.2. Check that you can run evals against the inference server

uv run eval @ configs/debug/eval.toml
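The first two checks above (Python 3.12 and an importable flash_attn) can also be scripted. The following is a minimal sketch, not part of PRIME-RL: the function names and the idea of a single report are assumptions made for illustration. It only uses the standard library and looks modules up without importing them.

```python
import importlib.util
import sys


def check_python(required=(3, 12)):
    """Return True when the running interpreter matches the required major.minor."""
    return sys.version_info[:2] == tuple(required)


def check_module(name):
    """Return True when the top-level module `name` can be located (without importing it)."""
    return importlib.util.find_spec(name) is not None


def environment_report(modules=("flash_attn",)):
    """Collect the check results into a dict keyed by check name (hypothetical helper)."""
    report = {"python_3_12": check_python()}
    for module in modules:
        report[f"module:{module}"] = check_module(module)
    return report


if __name__ == "__main__":
    for name, ok in environment_report().items():
        print(f"{'ok' if ok else 'MISSING':8s}{name}")
```

Saved under a name of your choice (for example a hypothetical check_env.py), it could be run inside the project environment with uv run python check_env.py. It does not replace steps 3 to 5.2, which exercise the actual trainers and servers.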

Additional Setup

1. If you want to log your runs to W&B, log in

uv run wandb login
# Or set `export WANDB_API_KEY=...`

2. If you require gated/private models or datasets from HuggingFace, log in

uv run hf auth login
# Or set `export HF_TOKEN=...`
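If you go the environment-variable route, a quick pre-flight check can tell you which of the two variables named above are missing before a long run starts. This is a small sketch under that assumption; the helper name is hypothetical and it does not detect credentials stored by the interactive login commands, so an unset variable does not necessarily mean you are logged out.

```python
import os


def missing_credentials(env=None, names=("WANDB_API_KEY", "HF_TOKEN")):
    """Return the variable names from `names` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All credential variables are set.")
```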

Training Examples

We provide end-to-end training examples in the [examples](examples) directory to highlight features of the framework and guide you through the process of training your own models.

1. [Reverse Text](examples/reverse_text/README.md): Train Qwen3-0.6B to reverse a small chunk of text. Demonstrates tiny-scale single-turn SFT and RL training. Can be trained on a single consumer GPU in a few minutes, and is ideal for getting started.
2. [Wordle](examples/wordle/README.md): Train Qwen3-1.7B to play Wordle. A fun example of multi-turn SFT and RL training. Can be trained on 2-4 H100 GPUs in a few hours. Ideal for exploring the multi-turn training capabilities of the framework.
3. [Alphabet Sort](examples/alphabet_sort/README.md): Train Qwen3-4B-Instruct-2507 to sort names alphabetically. Demonstrates multi-turn RL training via LoRA without SFT warmup. Can be trained on a single H100 GPU in just over an hour. Ideal for exploring LoRA-based training.
4. [Wiki Search](examples/wiki_search/README.md): Train Qwen3-4B-Instruct-2507 to answer trivia questions by searching through Wikipedia. Demonstrates multi-turn training with web search tool use.
5. *More to come...*

Docs

Check out the [docs](docs) directory for in-depth guides on how to use PRIME-RL.

  • [Entrypoints](docs/entrypoints.md) - Overview of the main components (orchestrator, trainer, inference) and how to run SFT, RL, and evals
  • [Configs](docs/configs.md) - Configuration system using TOML files, CLI arguments, and environment variables
  • [Environments](docs/environments.md) - Installing and using verifiers environments from the Environments Hub
  • [Async Training](docs/async.md) - Understanding asynchronous off-policy training and step semantics
  • [Logging](docs/logging.md) - Logging with loguru, torchrun, and Weights & Biases
  • [Checkpointing](docs/checkpointing.md) - Saving and resuming training from checkpoints
  • [Benchmarking](docs/benchmarking.md) - Performance benchmarking and throughput measurement
  • [Deployment](docs/deployment.md) - Training deployment on single-GPU, multi-GPU, and multi-node clusters
  • [Troubleshooting](docs/troubleshooting.md) - Common issues and their solutions

Contributing

We warmly welcome community contributions! We use issues to track bugs, feature requests, and share our internal roadmap. If…

