inclusionAI/asystem-awex
Python
Description: A high-performance RL training-inference weight synchronization framework, designed to enable second-level parameter updates from training to inference in RL workflows
Language: Python
License: Apache-2.0
Stars: 160
Forks: 18
Open issues: 3
Created: 2025-11-17T02:43:55Z
Pushed: 2026-05-25T09:31:45Z
Default branch: main
Fork: no
Archived: no
README:
Awex
Awex is a high-performance RL training-inference weight synchronization framework, designed to enable second-level parameter updates from training to inference in RL workflows. It minimizes iteration latency, ensuring rollout phases consistently use the latest model.
🚀 Key Features
- Extreme Sync Speed: Trillion-parameter models fully synchronized within 10 seconds; validated on thousand-GPU
clusters with industry-leading performance.
- Unified Weight Adaptation Layer: Automatically handles tensor format/layout differences across parallel strategies
and engine frameworks, supporting any model architecture.
- Zero-Redundancy Transfer & In-Place Update: Transfers only necessary shards; supports in-place GPU memory updates
on inference, avoiding costly allocation and copying.
- Multi-Mode Transfer Support: Supports NCCL, RDMA, and shared-memory transfer modes to leverage NVLink/NVSwitch/RDMA
bandwidth and reduce long-tail latency.
- Heterogeneous Deployment Compatibility: Fully supports both co-located and separated deployment modes, so synchronous
and asynchronous RL algorithms run seamlessly.
- Extensibility: Easily extends to support new training and inference engines.
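To make the in-place update idea concrete, here is a minimal pure-Python sketch, not Awex's implementation. It models the inference-side parameter as a preallocated `bytearray` and writes each received chunk into it through a `memoryview` slice, so no new buffer is allocated and any reference the engine already holds sees the new values. The `apply_chunks` helper and the `(offset, payload)` chunk format are hypothetical; for a GPU tensor, the analogous step would be an in-place copy (for example `Tensor.copy_()`) into the existing parameter storage.

```python
from typing import Iterable, Tuple

# Hypothetical chunk format: (byte offset into the parameter buffer, payload).
Chunk = Tuple[int, bytes]


def apply_chunks(buffer: bytearray, chunks: Iterable[Chunk]) -> int:
    """Copy received chunks into an existing buffer without reallocating it.

    Returns the total number of bytes written.
    """
    written = 0
    with memoryview(buffer) as view:
        for offset, payload in chunks:
            end = offset + len(payload)
            if offset < 0 or end > len(view):
                raise ValueError(
                    f"chunk [{offset}, {end}) is outside a buffer of {len(view)} bytes"
                )
            # Slice assignment writes into the buffer's existing memory.
            view[offset:end] = payload
            written += len(payload)
    return written


if __name__ == "__main__":
    param = bytearray(8)             # stands in for preallocated parameter storage
    engine_view = memoryview(param)  # a reference the "engine" already holds
    n = apply_chunks(param, [(0, b"\x01\x02"), (4, b"\x03\x04\x05\x06")])
    # The engine's existing reference observes the update; nothing was rebound.
    print(n, engine_view.tobytes())
```

Because the destination memory is fixed, the cost of an update is just the copy of the bytes that actually changed hands.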
Architecture
The Awex weight exchange framework consists primarily of three components:
- WeightWriter: Runs within each training process. It collects and reports metadata for the weight shards held by that process, converts weights, constructs the resharding transfer plan, transmits weights, and performs related functions;
- WeightReader: Runs on the control process of each inference instance and starts a WorkerWeightsReader on each GPU managed by that instance, serving as the counterpart of the training-side WeightWriter. It collects and reports weight-shard metadata for each inference process, converts weights, constructs the resharding transfer plan, receives weights, and performs related functions;
- MetaServer: Job-level global server for service discovery and weight metadata exchange between training and inference engines, as well as event notification functions in co-located scenarios;
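The division of labor between writers, readers, and the MetaServer can be illustrated with a toy, in-process sketch. It assumes a simple sharding scheme in which each tensor is split evenly along dimension 0 across tensor-parallel ranks, and it describes each shard by its row range in the full (unified) tensor. `ShardMeta`, `shard_rows`, and `ToyMetaServer` are hypothetical names; the real MetaServer is a job-level service whose API is not shown in this README.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Tuple


@dataclass(frozen=True)
class ShardMeta:
    """Hypothetical metadata for one shard, in unified (full-tensor) coordinates."""
    name: str        # parameter name after conversion to a unified naming scheme
    rank: int        # worker that holds this shard
    row_start: int   # first row of the full tensor held by this shard
    row_end: int     # one past the last row held


def shard_rows(name: str, total_rows: int, tp_size: int, rank: int) -> ShardMeta:
    """Rows owned by `rank` if a tensor is split evenly along dim 0 (an assumption)."""
    if total_rows % tp_size != 0:
        raise ValueError("this sketch assumes total_rows is divisible by tp_size")
    per_rank = total_rows // tp_size
    return ShardMeta(name, rank, rank * per_rank, (rank + 1) * per_rank)


class ToyMetaServer:
    """In-process stand-in for a job-level metadata exchange service."""

    def __init__(self) -> None:
        self._metas: DefaultDict[Tuple[str, int], List[ShardMeta]] = defaultdict(list)

    def report(self, role: str, step_id: int, metas: List[ShardMeta]) -> None:
        """Called by each writer ("train") or reader ("infer") worker."""
        self._metas[(role, step_id)].extend(metas)

    def collect(self, role: str, step_id: int) -> List[ShardMeta]:
        """Return the global view, sorted so every caller sees the same order."""
        return sorted(
            self._metas.get((role, step_id), []),
            key=lambda m: (m.name, m.row_start, m.rank),
        )


if __name__ == "__main__":
    server = ToyMetaServer()
    for r in range(4):  # training side: tensor parallel size 4
        server.report("train", 1, [shard_rows("layer0.weight", 8, 4, r)])
    for r in range(2):  # inference side: tensor parallel size 2
        server.report("infer", 1, [shard_rows("layer0.weight", 8, 2, r)])
    print([(m.rank, m.row_start, m.row_end) for m in server.collect("train", 1)])
```

Sorting in `collect` is the design point worth noting: once every worker reads the same ordered global view, each side can build its half of the transfer plan locally without further coordination.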
The core weight-exchange modules fall into five parts:
- Unified training-inference weight conversion: Converts weights from training and inference engines, which use different parallelism strategies and tensor layouts, into a unified format for subsequent weight metadata calculation and weight transmission;
- Global weight metadata calculation and exchange: After training and inference weights are converted into the unified format, collects the weight-shard metadata from every worker and reports it to the MetaServer for subsequent transfer-plan construction;
- P2P weight transmission execution plan: The training and inference engines each obtain the global weight-shard metadata from all workers, then independently construct deterministic peer-to-peer plans for sending and receiving;
- NCCL weight transmission: Uses NCCL's send/recv API for peer-to-peer weight transmission based on the constructed transmission plan;
- RDMA weight transmission: Combines NUMA affinity with RDMA communication, following a globally load-balanced transfer plan for weight updates;
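The plan-construction step can be sketched as an interval-overlap computation. This sketch assumes each shard is described by a parameter name and a half-open row range of the unified tensor. Both sides sort the same global metadata before planning, so the sender and the receiver independently arrive at the same list; each would then keep only the entries whose `src` (sender) or `dst` (receiver) is its own rank. When several training ranks hold the same rows, for example replicas, the sketch picks the one with the fewest rows assigned so far. This greedy heuristic stands in for global load balancing and is not necessarily the strategy Awex uses. `Shard`, `Transfer`, and `build_plan` are hypothetical names, not Awex's API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Shard:
    name: str   # parameter name in the unified format
    rank: int   # worker holding the shard
    start: int  # first row (inclusive) in the unified tensor
    end: int    # last row (exclusive)


@dataclass(frozen=True)
class Transfer:
    name: str
    src: int    # training rank that sends
    dst: int    # inference rank that receives
    start: int
    end: int


def build_plan(train: List[Shard], infer: List[Shard]) -> List[Transfer]:
    """Deterministically map training shards onto inference shards.

    Every destination receives each of its rows exactly once. When several
    training ranks hold the same rows (e.g. replicas), the rank with the
    fewest rows assigned so far is chosen: a greedy load-balancing heuristic.
    """
    # Sorting makes the result independent of the order metadata arrived in.
    train = sorted(train, key=lambda s: (s.name, s.start, s.rank))
    infer = sorted(infer, key=lambda s: (s.name, s.start, s.rank))
    load: Dict[int, int] = {}
    plan: List[Transfer] = []
    for dst in infer:
        # Split the destination range at every training-shard boundary inside it.
        cuts = {dst.start, dst.end}
        for s in train:
            if s.name == dst.name:
                cuts.update(c for c in (s.start, s.end) if dst.start < c < dst.end)
        points = sorted(cuts)
        for lo, hi in zip(points, points[1:]):
            holders = [
                s.rank for s in train
                if s.name == dst.name and s.start <= lo and hi <= s.end
            ]
            if not holders:
                raise ValueError(f"rows [{lo}, {hi}) of {dst.name} have no source")
            src = min(holders, key=lambda r: (load.get(r, 0), r))
            load[src] = load.get(src, 0) + (hi - lo)
            plan.append(Transfer(dst.name, src, dst.rank, lo, hi))
    return plan


if __name__ == "__main__":
    # Two full training replicas (ranks 0 and 1) feeding inference TP=2.
    train = [Shard("w", 0, 0, 8), Shard("w", 1, 0, 8)]
    infer = [Shard("w", 0, 0, 4), Shard("w", 1, 4, 8)]
    for t in build_plan(train, infer):
        print(t)
```

In the replica example, rows [0, 4) are assigned to rank 0. Because rank 0 then already carries four rows, rows [4, 8) go to rank 1, so the two replicas share the sending work. Only the overlapping slices are scheduled, which is the "transfer only necessary shards" property in miniature.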
Awex also supports tensor-level weight validation. It compares the weights loaded from the file system with those received through transmission, tensor by tensor, to verify that the transmission path is correct.
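Tensor-level validation amounts to comparing two name-to-tensor mappings entry by entry. The sketch below is a hypothetical helper, not Awex's validation code. It uses flat Python lists of floats in place of tensors and reports missing or unexpected names, size mismatches, and elements that differ by more than a tolerance. With PyTorch tensors, `torch.equal` or `torch.allclose` would take the place of the element-wise comparison.

```python
from typing import Dict, List, Sequence


def compare_weights(
    reference: Dict[str, Sequence[float]],
    received: Dict[str, Sequence[float]],
    atol: float = 0.0,
) -> List[str]:
    """Compare two name -> flat-tensor mappings; return a list of mismatch reports.

    An empty list means every tensor is present on both sides, has the same
    size, and matches element-wise within `atol`.
    """
    problems: List[str] = []
    for name in sorted(set(reference) | set(received)):
        if name not in received:
            problems.append(f"{name}: missing from received weights")
            continue
        if name not in reference:
            problems.append(f"{name}: unexpected tensor in received weights")
            continue
        ref, got = reference[name], received[name]
        if len(ref) != len(got):
            problems.append(f"{name}: size {len(got)} != expected {len(ref)}")
            continue
        # `not (diff <= atol)` is also true for NaN, so NaNs are reported.
        bad = sum(1 for a, b in zip(ref, got) if not abs(a - b) <= atol)
        if bad:
            problems.append(f"{name}: {bad} element(s) differ by more than atol={atol:g}")
    return problems


if __name__ == "__main__":
    from_files = {"embed": [0.1, 0.2, 0.3], "lm_head": [1.0, -1.0]}
    from_transfer = {"embed": [0.1, 0.2, 0.3], "lm_head": [1.0, -0.5]}
    for line in compare_weights(from_files, from_transfer) or ["all tensors match"]:
        print(line)
```

The default `atol=0.0` demands exact equality, a reasonable setting when both paths are expected to deliver identical values. A nonzero tolerance only makes sense if a conversion step is known to introduce rounding.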
See the [documentation](docs) for more details.
For a comprehensive introduction to Awex, see the Medium article.
Performance Benchmarks
On thousand-GPU scale clusters, Awex using NCCL transmission can exchange 10B-scale model weights within one second, and exchange 1T-scale model weights within twenty seconds. Using RDMA for transmission, 1T model weight exchange time can be further reduced to six seconds.
| Weight Parameter Scale | Weight Data Size | Verl Time | Awex NCCL Transmission Time | Awex RDMA Transmission Time |
| ---------------------- | ---------------- | --------- | --------------------------- | --------------------------- |
| 10B | 31 GB | 3.5 s | 0.8 s | 0.5 s |
| 100B | 191 GB | 35 s | 9 s | 3.2 s |
| 1000B | 1000 GB (FP8) | N/A | 20 s | 6 s |
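The benchmark figures can be turned into implied throughput (data size divided by sync time) and speedup over Verl with a few lines of arithmetic. These are derived numbers computed only from the benchmark table. They are aggregate figures for the whole cluster, not per GPU or per link, and are expressed in the same GB unit the table uses.

```python
from typing import Dict, Optional, Tuple

# Copied from the benchmark table: (data size in GB, Verl s, Awex NCCL s, Awex RDMA s).
# None marks the Verl entry that the table does not report.
BENCHMARKS: Dict[str, Tuple[float, Optional[float], float, float]] = {
    "10B": (31.0, 3.5, 0.8, 0.5),
    "100B": (191.0, 35.0, 9.0, 3.2),
    "1000B": (1000.0, None, 20.0, 6.0),
}


def summarize(
    size_gb: float, verl_s: Optional[float], nccl_s: float, rdma_s: float
) -> Dict[str, Optional[float]]:
    """Implied aggregate throughput (GB/s) and speedup over Verl for one row."""
    return {
        "nccl_gb_per_s": size_gb / nccl_s,
        "rdma_gb_per_s": size_gb / rdma_s,
        "nccl_speedup": verl_s / nccl_s if verl_s is not None else None,
        "rdma_speedup": verl_s / rdma_s if verl_s is not None else None,
    }


if __name__ == "__main__":
    for scale, row in BENCHMARKS.items():
        s = summarize(*row)
        speedups = (
            f"{s['nccl_speedup']:.1f}x / {s['rdma_speedup']:.1f}x vs Verl"
            if s["nccl_speedup"] is not None
            else "no Verl baseline"
        )
        print(
            f"{scale}: NCCL {s['nccl_gb_per_s']:.1f} GB/s, "
            f"RDMA {s['rdma_gb_per_s']:.1f} GB/s, {speedups}"
        )
```

For the 1000B row, this gives 1000 / 6 ≈ 167 GB/s aggregate over RDMA versus 1000 / 20 = 50 GB/s over NCCL.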
📦 Installation
Requirements
- Python 3.8 or higher
- PyTorch 2.0.0 or higher (for GPU support)
Basic Installation
Install awex using pip:
pip install awex
Build from Source
Clone the repository and install in development mode:
git clone git@github.com:inclusionAI/awex.git
cd awex
pip install -e .
For development with additional tools:
pip install -e ".[dev]"
Quick Start
Awex is a pure Python library that supports Python 3.8 and above and can be installed with a single command:
pip install awex
Megatron training engine weight sending example:
from awex import NCCLWeightsWriter
from awex.engine.mcore import MegatronEngine

# init: awex_config, hf_config and mcore_model are assumed to be
# created by your training setup (not shown here)
train_engine = MegatronEngine(awex_config, hf_config, mcore_model)
writer = NCCLWeightsWriter(train_engine)
writer.initialize()

# write weights
writer.write_weights(step_id=1)
SGLang inference engine weight update example:
from awex import WeightsReader, InferenceConfig
from awex.engine.sglang import SGLangEngine
import sglang as sgl

# replace "xxx" with the path to your model
sgl_engine = sgl.Engine(model_path="xxx", tp_size=2, random_seed=42)
awex_config = InferenceConfig.from_sgl_engine(sgl_engine, comm_backend="nccl")

# for sglang support, you must ensure https://github.com/sgl-project/sglang/pull/13595
# is included in your sglang version
inference_engine = SGLangEngine(awex_config, sgl_engine)
reader = WeightsReader(inference_engine)
reader.initialize()

# update weights
reader.update_weights(step_id=1)
Weight Conversion Tests
These scripts compare weight formats across Megatron, vLLM, and SGLang by converting all parameters into…