novitalabs/pegaflow v0.22.4
Repository: novitalabs/pegaflow
Tag: v0.22.4
Published: 2026-05-29T05:49:47Z
Prerelease: no
Release notes:
✨ Highlights
- Disaggregated P/D over RDMA push (#297) — New PdConnector plus a v2 transfer engine that pushes KV prefill→decode layer-by-layer, overlapping transfer with compute. Added TTFT is 2–4× lower than NIXL on H20/Qwen3-8B.
- Query leases (#284, #288) — Pin refcounts replaced by lease-backed query/load/release; query results are Loading/Ready only, with TTL-based reclaim.
- Save-only mode (#300) — New pegaflow.mode lets an instance populate the cache without serving reads.
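The layer-by-layer push in #297 can be pictured as a producer/consumer pipeline: as soon as one layer's KV is computed it is handed to a sender, so its transfer runs while the next layer is still computing. The following is only a minimal sketch of that overlap pattern, not PegaFlow's transfer engine; `prefill_with_layer_push`, `compute_layer`, and `push_layer` are hypothetical names introduced for illustration.

```python
import queue
import threading


def prefill_with_layer_push(num_layers, compute_layer, push_layer):
    """Compute layers in order and push each finished layer from a background thread.

    compute_layer(i) -> kv and push_layer(i, kv) are hypothetical callbacks
    standing in for the model forward pass and the RDMA write.
    Returns the list of layer indices in the order they were pushed.
    """
    pending = queue.Queue()
    pushed = []

    def sender():
        while True:
            item = pending.get()
            if item is None:  # sentinel: prefill has no more layers
                return
            layer, kv = item
            push_layer(layer, kv)
            pushed.append(layer)

    worker = threading.Thread(target=sender)
    worker.start()
    for layer in range(num_layers):
        kv = compute_layer(layer)   # compute layer i
        pending.put((layer, kv))    # its transfer overlaps compute of layer i + 1
    pending.put(None)
    worker.join()
    return pushed
```

With a single sender thread and a FIFO queue, layers arrive on the decode side in the same order they were computed, which is what lets the decode instance start as soon as the last layer lands.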
🚀 Features
- Sharded SSD cache across multiple files (#299)
- Per-peer N QPs with WQE-level round-robin, --qps-per-peer (default 2) (#291)
- Metaserver node-lifecycle fencing with heartbeat UUIDs, --node-stale-secs (#285)
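To illustrate what WQE-level round-robin across per-peer QPs means (#291), the sketch below rotates every posted work request over a fixed set of queue pairs for one peer, rather than pinning a whole transfer to a single QP. It is a toy model under stated assumptions: `PeerQueuePairs` and its string QP labels are hypothetical and do not reflect PegaFlow's internals or the verbs API. Only the default of 2 comes from the notes above.

```python
import itertools


class PeerQueuePairs:
    """Toy model of N queue pairs to one peer with per-WQE round-robin."""

    def __init__(self, peer, qps_per_peer=2):  # 2 mirrors the --qps-per-peer default
        if qps_per_peer < 1:
            raise ValueError("qps_per_peer must be at least 1")
        # Hypothetical labels standing in for real QP handles.
        self.qps = [f"{peer}/qp{i}" for i in range(qps_per_peer)]
        self._next = itertools.cycle(range(qps_per_peer))

    def post(self, wqe):
        """Pick the next QP in rotation for this single work request."""
        qp = self.qps[next(self._next)]
        return qp, wqe


peer = PeerQueuePairs("node-b")
placement = [peer.post(f"wqe{i}")[0] for i in range(5)]
# placement alternates: node-b/qp0, node-b/qp1, node-b/qp0, ...
```

Rotating per work request rather than per transfer spreads a single large transfer across all QPs, at the cost of losing ordering guarantees between work requests that land on different QPs.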
🐛 Fixes
- Preserve non-MLA KV layout registration, e.g. GLM-4.7-FP8 (#295)
- Allocate pinned pools on GPU-local NUMA nodes (#293)
- Handle split physical KV blocks for FlashMLA (#292)
- Allow each worker to consume a query lease once (multi-worker loads) (#288)
- Validate --nics; fail on RDMA init error instead of silently disabling P2P (#283)
- Remove scheduler save limit (#282); demote cache_lookup_reuse log to debug (#280)
⚡ Performance
- CPU-path benchmarks + long-block save optimizations (#290): query 12.3 → 6.1 ms, save 21.3 → 13.1 ms
⚠️ Upgrade notes
- Query API is now Loading/Ready only; pin/unpin semantics removed (#284)
- Release RPC returns FailedPrecondition for unknown/expired leases (#289)
- --nics rejects empty entries and fails on RDMA init error (#283)
- New flags: --qps-per-peer, --node-stale-secs; new config pegaflow.mode
- TinyLFU admission is now off unless explicitly enabled (#287)
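For callers migrating off pin/unpin, the lease semantics described above (#284, #289) can be modeled roughly as follows: a query hands back a lease and a Loading/Ready state, leases are reclaimed once their TTL passes, and releasing an unknown or expired lease is an error. This is a hedged sketch only. `LeaseTable`, `QueryState`, and the local `FailedPrecondition` exception are hypothetical stand-ins, not PegaFlow's client API or its gRPC types.

```python
import time
import uuid
from enum import Enum


class QueryState(Enum):
    LOADING = "Loading"
    READY = "Ready"


class FailedPrecondition(Exception):
    """Stand-in for the FailedPrecondition status returned by Release."""


class LeaseTable:
    def __init__(self, ttl_secs, clock=time.monotonic):
        self._ttl = ttl_secs
        self._clock = clock
        self._leases = {}  # lease_id -> (expiry time, block hashes)

    def query(self, block_hashes, ready):
        """Issue a lease; the result is only ever Loading or Ready."""
        lease_id = uuid.uuid4().hex
        self._leases[lease_id] = (self._clock() + self._ttl, tuple(block_hashes))
        return lease_id, QueryState.READY if ready else QueryState.LOADING

    def reclaim_expired(self):
        """Drop every lease whose TTL has passed; return how many were dropped."""
        now = self._clock()
        expired = [lid for lid, (expiry, _) in self._leases.items() if expiry <= now]
        for lid in expired:
            del self._leases[lid]
        return len(expired)

    def release(self, lease_id):
        """Release a live lease; unknown or expired leases raise FailedPrecondition."""
        self.reclaim_expired()
        if lease_id not in self._leases:
            raise FailedPrecondition(f"unknown or expired lease: {lease_id}")
        del self._leases[lease_id]
```

In this model a client that previously treated a late unpin as harmless needs to catch the error and treat the lease as already reclaimed.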