togethercomputer/xorl
Description: XoRL
Language: Python
Stars: 10
Forks: 0
Open issues: 0
Created: 2026-03-12T23:33:34Z
Pushed: 2026-04-25T12:02:38Z
Default branch: main
Fork: no
Archived: no
README:
High-performance distributed training for LLMs — RL, SFT, MoE, and beyond.
🚀 Installation · ⚡ Quick Start · 📚 Documentation
---
🔍 Overview
XoRL is a distributed training framework designed for large language models with composable parallelism and flexible training modes.
The XoRL stack consists of three repos:
| Repo | Description |
|---|---|
| [xorl](https://github.com/togethercomputer/xorl-internal) | Distributed training framework: local SFT/pretraining and server-mode RL training |
| [xorl-client](https://github.com/togethercomputer/xorl-client) | Lightweight Python SDK for driving the xorl training server (forward/backward, optimizer steps, checkpointing, sampling) |
| [xorl-sglang](https://github.com/togethercomputer/xorl-sglang) | Fork of SGLang with weight-sync APIs, MoE routing export, and numerical alignment for online RL |
Two training modes:
- Local: `torchrun`-based training for offline SFT and pretraining
- Server: REST API-driven training for online RL loops, where xorl-client drives the training loop and xorl-sglang serves inference
Parallelism strategies — mix and match freely:
| Strategy | Description |
|---|---|
| FSDP2 | Fully sharded data parallelism (PyTorch native) |
| Tensor Parallel | Column/row weight sharding across GPUs |
| Pipeline Parallel | Interleaved 1F1B schedule across stages |
| Context Parallel | Ring attention + Ulysses sequence parallel |
| Expert Parallel | MoE expert sharding via DeepEP |
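When strategies are combined, the per-dimension degrees have to fit the available GPUs. The sketch below assumes the common convention that the world size equals the product of the data, tensor, pipeline, and context parallel degrees. Whether XoRL derives its FSDP2 degree this way is an assumption, the parameter names are illustrative rather than XoRL config keys, and expert parallelism is left out because frameworks differ on how it overlaps with the other dimensions.

```python
from math import prod


def data_parallel_degree(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Return the number of ranks left for data-parallel sharding.

    Assumes world_size = dp * tp * pp * cp (a common convention, not
    confirmed for XoRL). Parameter names are hypothetical.
    """
    degrees = {"tp": tp, "pp": pp, "cp": cp}
    for name, value in degrees.items():
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
    model_parallel = prod(degrees.values())
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp*cp={model_parallel}"
        )
    return world_size // model_parallel


if __name__ == "__main__":
    # 8 GPUs with tensor parallel 2 and pipeline parallel 2.
    print(data_parallel_degree(8, tp=2, pp=2))
```

A check like this can catch an impossible layout (for example 8 GPUs with `tp=3`) before any process group is created.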
Fine-tuning methods — full weights, LoRA, and QLoRA (int4/nvfp4/block_fp8), all FSDP2-compatible.
---
🚀 Installation
    git clone --recurse-submodules git@github.com:togethercomputer/xorl-internal.git
    cd xorl-internal
> Already cloned without `--recurse-submodules`? Run `git submodule update --init --recursive`.
Option A: uv (recommended)
    uv sync
    source .venv/bin/activate
Option B: conda
    conda create -n xorl python=3.12
    conda activate xorl
    pip install -e .
Submodules
The repo includes two git submodules under submodules/ (needed for server / online RL training):
- [xorl-client](https://github.com/togethercomputer/xorl-client): Lightweight Python SDK (no PyTorch dependency) for driving the xorl training server. Provides `ServiceClient`, `TrainingClient`, `SamplingClient`, and `RestClient` with async-first `APIFuture` semantics, automatic request ordering, and Tinker API compatibility.
- [xorl-sglang](https://github.com/togethercomputer/xorl-sglang): XoRL's fork of SGLang with NCCL-based weight sync endpoints, MoE routing data export (R3), and numerical alignment flags for online RL.
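Before installing the submodules, it can help to confirm they were actually fetched. The following is a minimal sketch, not part of XoRL: the helper name `missing_submodules` is hypothetical, and it assumes the two submodules live at `submodules/xorl-client` and `submodules/xorl-sglang` as the install commands in this README suggest.

```python
from pathlib import Path

# Submodule locations taken from the install commands in this README.
SUBMODULES = ("submodules/xorl-client", "submodules/xorl-sglang")


def missing_submodules(repo_root: Path) -> list[str]:
    """Return submodule paths that are absent or empty under repo_root."""
    missing = []
    for rel in SUBMODULES:
        path = repo_root / rel
        # An uninitialised submodule is typically an existing but empty directory.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_submodules(Path.cwd())
    if missing:
        print("Not initialised:", ", ".join(missing))
        print("Run: git submodule update --init --recursive")
    else:
        print("All submodules are present.")
```

Run it from the repository root; an empty result means both directories contain files.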
Install individually:
    pip install -e submodules/xorl-client
    pip install -e "submodules/xorl-sglang/python[all]"
Or use the bundled `pyproject.sglang.toml`, which pins PyTorch to 2.9.1 (required by sglang) and installs everything together:
uv:
    cp pyproject.sglang.toml pyproject.toml
    uv sync
    source .venv/bin/activate
conda:
    conda create -n xorl-sglang python=3.12
    conda activate xorl-sglang
    cp pyproject.sglang.toml pyproject.toml
    pip install -e .
> Note: The default `pyproject.toml` uses PyTorch 2.10.0, but sglang requires PyTorch 2.9.1. xorl and sglang can therefore share an environment only if you install from `pyproject.sglang.toml`.
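To see which of the two pins an existing environment satisfies, a small check along these lines may help. It is a sketch under stated assumptions: the helper names are hypothetical, the two version strings come from the note above, and the comparison simply ignores a local build tag such as `+cu128`.

```python
from importlib import metadata
from typing import Optional


def matches_pin(installed: str, required: str) -> bool:
    """True if `installed` equals `required`, ignoring a local build tag like '+cu128'."""
    return installed.split("+", 1)[0] == required


def installed_torch_version() -> Optional[str]:
    """Return the installed torch version string, or None if torch is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_torch_version()
    if version is None:
        print("torch is not installed in this environment")
    else:
        pins = (("pyproject.toml", "2.10.0"), ("pyproject.sglang.toml", "2.9.1"))
        for label, pin in pins:
            print(f"{label}: expects {pin}, match={matches_pin(version, pin)}")
```

Reading the version through `importlib.metadata` avoids importing torch itself, so the check stays fast and works even when the CUDA runtime is unavailable.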
See the installation guide for full setup including optional dependencies (DeepEP, Flash Attention).
⚡ Quick Start
    # Local training on 8 GPUs
    torchrun --nproc_per_node=8 -m xorl.cli.train examples/local/dummy/configs/full/qwen3_8b.yaml
See the quick start guide for more examples including MoE, server training, and LoRA.
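When the launch is scripted (for example from a job scheduler wrapper), the same command can be assembled in Python. This is a minimal sketch: `build_torchrun_command` is a hypothetical helper, not an XoRL API, and it only reproduces the single-node invocation shown above.

```python
import shlex


def build_torchrun_command(config_path: str, gpus_per_node: int = 8) -> list[str]:
    """Build the argv list for a single-node local training launch."""
    if gpus_per_node < 1:
        raise ValueError("gpus_per_node must be >= 1")
    return [
        "torchrun",
        f"--nproc_per_node={gpus_per_node}",
        "-m",
        "xorl.cli.train",
        config_path,
    ]


if __name__ == "__main__":
    cmd = build_torchrun_command("examples/local/dummy/configs/full/qwen3_8b.yaml")
    # Print rather than execute; pass `cmd` to subprocess.run(cmd, check=True) to launch.
    print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting problems if the config path ever contains spaces.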
---
📚 Documentation
| Topic | Link |
|---|---|
| Parallelism | Overview |
| MoE & DeepEP | MoE docs |
| LoRA / QLoRA | Adapters |
| Server training | Server docs |
| Config reference | Local · Server |
---
🧠 Supported Models
| Model | Type | HuggingFace ID |
|---|---|---|
| Qwen3 | Dense | Qwen/Qwen3-8B, Qwen/Qwen3-32B, ... |
| Qwen3-MoE | Mixture-of-Experts | Qwen/Qwen3-30B-A3B, Qwen/Qwen3-235B-A22B, ... |
| Qwen3.5 | Dense | Qwen/Qwen3.5-7B, ... |
| Qwen3.5-MoE | Mixture-of-Experts | Qwen/Qwen3.5-35B-A3B, Qwen/Qwen3.5-397B-A17B, ... |
Models are loaded directly from HuggingFace checkpoints — no preprocessing needed. See the supported models page for details.
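The IDs in the table follow a naming pattern in which MoE checkpoints carry an `-A<n>B` suffix for activated parameters. As an illustration only, the sketch below reads that pattern from an ID string. It is a naming heuristic, not how XoRL detects model type (that is presumably driven by the checkpoint's own config), and `describe` is a hypothetical helper.

```python
import re

# Matches a trailing "-<total>B" with an optional "-A<active>B" suffix.
_SIZE = re.compile(r"-(\d+(?:\.\d+)?)B(?:-A(\d+(?:\.\d+)?)B)?$")


def describe(hf_id: str) -> dict:
    """Infer type and parameter counts (in billions) from a HuggingFace ID."""
    name = hf_id.split("/")[-1]
    match = _SIZE.search(name)
    if match is None:
        raise ValueError(f"cannot parse a size suffix from {hf_id!r}")
    total, active = match.group(1), match.group(2)
    return {
        "type": "Mixture-of-Experts" if active else "Dense",
        "total_params_b": float(total),
        # Dense models activate all of their parameters.
        "active_params_b": float(active) if active else float(total),
    }


if __name__ == "__main__":
    for model_id in ("Qwen/Qwen3-8B", "Qwen/Qwen3-30B-A3B"):
        print(model_id, describe(model_id))
```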
---
🤝 Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, coding conventions, and how to run tests.