What does this fork signal mean?

Together AI created togethercomputer/sglang-mla-rotation as a fork of sgl-project/sglang. This fork signal points to upstream code the lab may be inspecting, patching, or building on. High-signal details: repo togethercomputer/sglang-mla-rotation · parent sgl-project/sglang · summary: optimized structured generation framework with attention rotation technique. onlylabs links this event to 1 captured evidence page and 6 related fork signals.

Together AI Fork: togethercomputer/sglang-mla-rotation

Captured source

GitHub · github.com/togethercomputer/sglang-mla-rotation

togethercomputer/sglang-mla-rotation repository metadata

published Apr 8, 2026 · seen Jun 5 · captured Jun 11 · http 200 · method plain

togethercomputer/sglang-mla-rotation

Description: SGLang is a high-performance serving framework for large language models and multimodal models.

License: Apache-2.0

Stars: 0

Forks: 0

Open issues: 0

Created: 2026-04-08T00:10:20Z

Pushed: 2026-04-08T18:47:50Z

Default branch: main

Fork: yes

Parent repository: sgl-project/sglang

Archived: no
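The metadata fields above can be read programmatically to derive the same "fork, no traction" picture. The following is a minimal sketch, assuming the values are available as a dictionary shaped like a GitHub REST API repository object (key names such as `stargazers_count`, `pushed_at`, and `parent.full_name` follow that API). The `summarize_fork` helper and its output keys are hypothetical, introduced only for illustration, and are not part of onlylabs or SGLang.

```python
from datetime import datetime

# Sample payload mirroring the metadata listed above. Key names follow the
# GitHub REST API repository object; the values are copied from this page.
repo = {
    "full_name": "togethercomputer/sglang-mla-rotation",
    "fork": True,
    "parent": {"full_name": "sgl-project/sglang"},
    "license": {"spdx_id": "Apache-2.0"},
    "stargazers_count": 0,
    "forks_count": 0,
    "open_issues_count": 0,
    "created_at": "2026-04-08T00:10:20Z",
    "pushed_at": "2026-04-08T18:47:50Z",
    "default_branch": "main",
    "archived": False,
}


def parse_utc(ts):
    """Parse an ISO 8601 UTC timestamp; the 'Z' suffix is rewritten so that
    datetime.fromisoformat also accepts it on Python versions before 3.11."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def summarize_fork(repo):
    """Hypothetical helper: condense repository metadata into a few signals."""
    created = parse_utc(repo["created_at"])
    pushed = parse_utc(repo["pushed_at"])
    engagement = (
        repo["stargazers_count"]
        + repo["forks_count"]
        + repo["open_issues_count"]
    )
    return {
        "is_fork": repo["fork"],
        "parent": (repo.get("parent") or {}).get("full_name"),
        # Seconds between repository creation and the most recent push.
        "push_delay_seconds": int((pushed - created).total_seconds()),
        "has_traction": engagement > 0,
    }


summary = summarize_fork(repo)
print(summary)
```

For this capture, the sketch reports a push about 18.6 hours after the fork was created and no stars, forks, or open issues, which is consistent with the low notability score assigned below.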

README:

--------------------------------------------------------------------------------

News

[2026/02] 🔥 Unlocking 25x Inference Performance with SGLang on NVIDIA GB300 NVL72 (blog).
[2026/01] 🔥 SGLang Diffusion accelerates video and image generation (blog).
[2025/12] SGLang provides day-0 support for latest open models (MiMo-V2-Flash, Nemotron 3 Nano, Mistral Large 3, LLaDA 2.0 Diffusion LLM, MiniMax M2).
[2025/10] 🔥 SGLang now runs natively on TPU with the SGLang-Jax backend (blog).
[2025/09] Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part II): 3.8x Prefill, 4.8x Decode Throughput (blog).
[2025/09] SGLang Day 0 Support for DeepSeek-V3.2 with Sparse Attention (blog).
[2025/08] SGLang x AMD SF Meetup on 8/22: Hands-on GPU workshop, tech talks by AMD/xAI/SGLang, and networking (Roadmap, Large-scale EP, Highlights, AITER/MoRI, Wave).

[2025/11] SGLang Diffusion accelerates video and image generation (blog).
[2025/10] PyTorch Conference 2025 SGLang Talk (slide).
[2025/10] SGLang x Nvidia SF Meetup on 10/2 (recap).
[2025/08] SGLang provides day-0 support for OpenAI gpt-oss model (instructions).
[2025/06] SGLang, the high-performance serving infrastructure powering trillions of tokens daily, has been awarded the third batch of the Open Source AI Grant by a16z (a16z blog).
[2025/05] Deploying DeepSeek with PD Disaggregation and Large-scale Expert Parallelism on 96 H100 GPUs (blog).
[2025/06] Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part I): 2.7x Higher Decoding Throughput (blog).
[2025/03] Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X (AMD blog).
[2025/03] SGLang Joins PyTorch Ecosystem: Efficient LLM Serving Engine (PyTorch blog).
[2025/02] Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU (AMD blog).
[2025/01] SGLang provides day one support for DeepSeek V3/R1 models on NVIDIA and AMD GPUs with DeepSeek-specific optimizations. (instructions, AMD blog, 10+ other companies)
[2024/12] v0.4 Release: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs (blog).
[2024/10] The First SGLang Online Meetup (slides).
[2024/09] v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision (blog).
[2024/07] v0.2 Release: Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) (blog).
[2024/02] SGLang enables 3x faster JSON decoding with compressed finite state machine (blog).
[2024/01] SGLang provides up to 5x faster inference with RadixAttention (blog).
[2024/01] SGLang powers the serving of the official LLaVA v1.6 release demo (usage).

About

SGLang is a high-performance serving framework for large language models and multimodal models. It is designed to deliver low-latency and high-throughput inference across a wide range of setups, from a single GPU to large distributed clusters. Its core features include:

Fast Runtime: Provides efficient serving with RadixAttention for prefix caching, a zero-overhead CPU scheduler, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, quantization (FP4/FP8/INT4/AWQ/GPTQ), and multi-LoRA batching.
Broad Model Support: Supports a wide range of language models (Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, Mistral, etc.), embedding models (e5-mistral, gte, mcdse), reward models (Skywork), and diffusion models (WAN, Qwen-Image), with easy extensibility for adding new models. Compatible with most Hugging Face models and OpenAI APIs.

-...

Excerpt shown; open the source for the full document.

Notability

notability 2.0/10

Routine fork by notable lab, no traction.