What does this repo signal mean?

Novita AI published novitalabs/pegaflow (Rust). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo novitalabs/pegaflow · language Rust · New repo with 133 stars, solid but not major. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Novita AI Repo: novitalabs/pegaflow

Captured source

GitHub · github.com/novitalabs/pegaflow

novitalabs/pegaflow repository metadata

published Jan 5, 2026 · seen Jun 5 · captured Jun 11 · http 200 · method plain

novitalabs/pegaflow

Description: High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and SGLang.

Language: Rust

License: Apache-2.0

Stars: 136

Forks: 20

Open issues: 36

Created: 2026-01-05T08:38:08Z

Pushed: 2026-06-10T17:56:00Z

Default branch: master

Fork: no

Archived: no

README:

PegaFlow

PegaFlow is a high-performance KV cache storage engine for LLM inference. Offload KV cache from GPU to host memory or SSD, and share it across nodes via RDMA.
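To get a feel for how much data an engine like this has to move, the KV cache footprint can be estimated from the model shape. The sketch below uses the published Llama-3.1-8B shape (32 layers, 8 KV heads, head dimension 128), the model that appears in the benchmark section of this README. The 2-byte element size (FP16/BF16) and the reading of "10K" as 10,000 tokens are assumptions, and `kv_cache_bytes` is an illustrative helper, not part of PegaFlow.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV cache size: one K and one V vector per layer, per KV head, per token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return num_tokens * per_token


# Llama-3.1-8B shape; the 2-byte dtype is an assumption, not stated in the README.
per_token = kv_cache_bytes(1, num_layers=32, num_kv_heads=8, head_dim=128)
prompt_total = kv_cache_bytes(10_000, num_layers=32, num_kv_heads=8, head_dim=128)

print(f"{per_token / 1024:.0f} KiB per token")
print(f"{prompt_total / 1e9:.2f} GB for a 10,000-token prompt")
```

Under these assumptions a single long prompt carries on the order of a gigabyte of KV cache, which is why offloading to host memory or SSD and reusing it across requests can pay off.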

- Decoupled from inference lifecycle: runs as an independent sidecar; KV cache survives engine restarts, scales independently, and is shared across instances
- Topology-aware, PCIe-saturating transfers: NUMA-aware pinned memory + layer-wise DMA to maximize hardware bandwidth
- GIL-free Rust core: zero Python overhead on the hot path; your inference engine keeps its threads
- Production-ready observability: built-in Prometheus metrics and OTLP export, not an afterthought
- Pluggable: works with vLLM as a drop-in KV connector
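As a rough illustration of how the Prometheus output mentioned in the feature list might be consumed, the sketch below parses a payload in the Prometheus text exposition format and derives a cache hit ratio. The metric names in `SAMPLE` are hypothetical placeholders, not PegaFlow metrics (the actual names are documented in the project's metrics reference), and the parser is deliberately minimal: it assumes no trailing timestamps on sample lines.

```python
def parse_prometheus_text(text):
    """Parse Prometheus text exposition lines into {series: value}.

    Minimal on purpose: skips comments and blank lines, and assumes
    sample lines carry no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


# Hypothetical payload: these metric names are NOT taken from PegaFlow.
SAMPLE = """\
# HELP example_kv_cache_hits_total Cache lookups served from the store.
# TYPE example_kv_cache_hits_total counter
example_kv_cache_hits_total 90
example_kv_cache_misses_total 10
"""

metrics = parse_prometheus_text(SAMPLE)
hits = metrics["example_kv_cache_hits_total"]
misses = metrics["example_kv_cache_misses_total"]
hit_ratio = hits / (hits + misses)
print(f"hit ratio: {hit_ratio:.2f}")
```

In a real deployment the payload would come from the server's metrics endpoint rather than a string literal; the endpoint address is configuration-specific and is not shown here.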

News

2026-05-18 — vLLM x Novita AI: PegaFlow for Production-Grade External KV Cache, a joint blog post with the vLLM team.

Architecture

Framework Integration

| Framework | Status | Link |
|-----------|--------|------|
| vLLM | ✅ Ready | [Quick Start](#3-launch-your-inference-engine) |

Quick Start

1. Install

uv pip install pegaflow-llm # CUDA 12
uv pip install pegaflow-llm-cu13 # CUDA 13

2. Start PegaFlow Server

pegaflow-server

3. Launch your inference engine

vLLM:

vllm serve Qwen/Qwen3-0.6B \
--kv-transfer-config '{"kv_connector": "PegaKVConnector", "kv_role": "kv_both", "kv_connector_module_path": "pegaflow.connector"}'

> For full server options, multi-node setup, and advanced configuration, see [Server Configuration](./docs/server.md).
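When the launch is scripted, quoting the JSON by hand is easy to get wrong. The sketch below builds the same `--kv-transfer-config` value from a dictionary and prints a shell-safe command line. The keys and values are the ones shown in the vLLM command above; the variable names are illustrative only.

```python
import json
import shlex

# Connector settings copied from the README's vLLM command.
kv_transfer_config = {
    "kv_connector": "PegaKVConnector",
    "kv_role": "kv_both",
    "kv_connector_module_path": "pegaflow.connector",
}

# Build the argument vector; json.dumps produces the quoted JSON payload.
argv = [
    "vllm", "serve", "Qwen/Qwen3-0.6B",
    "--kv-transfer-config", json.dumps(kv_transfer_config),
]

# shlex.join adds the shell quoting needed to paste the command into a terminal.
print(shlex.join(argv))
```

The same `argv` list can be handed to `subprocess.run(argv, check=True)` once `pegaflow-server` is running, which avoids shell quoting altogether.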

Development

Build from source

export PYO3_PYTHON=$(which python)
export LD_LIBRARY_PATH=$(python -c "import sysconfig; print(sysconfig.get_config_var('LIBDIR'))"):$LD_LIBRARY_PATH

cargo run -r # start server
cd python && maturin develop -r # build Python bindings

The default source build targets CUDA 12.8. If your environment uses CUDA 13, disable the default CUDA feature and enable cuda-13 explicitly:

cargo run -r --no-default-features --features cuda-13 --bin pegaflow-server
cd python && uv run maturin develop -r --no-default-features --features cuda-13
./scripts/build-wheel.sh --release --no-default-features --features cuda-13
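The CUDA-specific choices above can be collected in one lookup so that setup scripts do not hard-code them in several places. This is a minimal sketch: the package names and cargo commands are copied from this README, while `BUILD_MATRIX` and `options_for` are hypothetical names introduced for illustration.

```python
# Package names and source-build commands as listed in the README, keyed by CUDA major version.
BUILD_MATRIX = {
    12: {
        "wheel": "pegaflow-llm",
        "server_cmd": ["cargo", "run", "-r"],
    },
    13: {
        "wheel": "pegaflow-llm-cu13",
        "server_cmd": [
            "cargo", "run", "-r",
            "--no-default-features", "--features", "cuda-13",
            "--bin", "pegaflow-server",
        ],
    },
}


def options_for(cuda_version):
    """Return install/build options for a CUDA version string such as '12.8'."""
    major = int(cuda_version.split(".")[0])
    if major not in BUILD_MATRIX:
        raise ValueError(f"no documented build for CUDA {cuda_version}")
    return BUILD_MATRIX[major]


print(options_for("12.8")["wheel"])
print(" ".join(options_for("13.0")["server_cmd"]))
```

Versions outside CUDA 12 and 13 raise an error instead of guessing, since the README documents no build for them.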

We use Conventional Commits — run cz c for an interactive commit prompt.

Benchmarks

KV Cache Benchmark

H800 reference numbers with Llama-3.1-8B (8 prompts, 10K-token prefill, 1-token decode, 4.0 req/s):

| Configuration | TTFT mean (ms) | TTFT p99 (ms) |
| --------------- | -------------- | ------------- |
| PegaFlow (Cold) | 572.5 | 1113.7 |
| PegaFlow (Warm) | 61.5 | 77.0 |

The warm-start path cuts mean TTFT by roughly 9.3x relative to cold-start (61.5 ms vs 572.5 ms) and p99 TTFT by about 14.5x (77.0 ms vs 1113.7 ms), which indicates effective KV cache reuse across requests.
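The speedup figures follow directly from the table. A short check, using only the four TTFT values reported above (the variable and function names are illustrative):

```python
# TTFT values in milliseconds, copied from the benchmark table.
ttft_ms = {
    "cold": {"mean": 572.5, "p99": 1113.7},
    "warm": {"mean": 61.5, "p99": 77.0},
}


def speedup(metric):
    """Ratio of cold-start to warm-start TTFT for the given metric."""
    return ttft_ms["cold"][metric] / ttft_ms["warm"][metric]


for metric in ("mean", "p99"):
    saved = ttft_ms["cold"][metric] - ttft_ms["warm"][metric]
    print(f"{metric}: {speedup(metric):.1f}x faster, {saved:.1f} ms saved")
```

The tail latency improves more than the mean, so the warm path also tightens the TTFT distribution, not only its average.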

Documentation

- [Server Configuration](./docs/server.md): full CLI options, SSD cache, multi-node setup
- [Python Package](./python/README.md): Python bindings and vLLM connector configuration
- [P2P KV Cache Sharing](./docs/p2p.md): cross-node RDMA setup, tuning, and troubleshooting
- [P/D Router](./docs/pd.md): prefill/decode disaggregation
- [vLLM I/O Patch](./docs/vllm-patch.md): optional patch for better transfer throughput
- [Metrics](./docs/metrics.md): Prometheus and OTLP metrics reference
- [Goals & Non-Goals](./docs/goals.md): project scope and design philosophy

Notability

notability 6.0/10

New repo with 133 stars, solid but not major