Snowflake/Arctic-LSTM-Speculator-gpt-oss-120b
Published: Aug 21, 2025. License: apache-2.0. Downloads: 78. Likes: 5.
ArcticSpeculator
Build the fastest OSS vLLM-based speculative decoding system for your own model, using ArcticTraining and ArcticInference!
Throughput (tokens/s) of gpt-oss-120b on 8xH100 using vLLM:
| method | ShareGPT | HumanEval |
|--------------------------------------|----------------|--------------|
| vLLM V1 Baseline | 220.2 | 220.7 |
| ArcticSpeculator | 377.3 | 400.0 |
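
To put the throughput figures in relative terms, the short sketch below computes the speedup of ArcticSpeculator over the vLLM V1 baseline for each dataset. The numbers are copied from the table; the `throughput` dictionary and the `speedup` helper are illustrative names, not part of ArcticInference or vLLM.

```python
# Throughput figures (tokens/s) copied from the table above.
# The dictionary layout and helper name are illustrative assumptions.
throughput = {
    "ShareGPT": {"baseline": 220.2, "speculator": 377.3},
    "HumanEval": {"baseline": 220.7, "speculator": 400.0},
}


def speedup(baseline: float, accelerated: float) -> float:
    """Return the throughput ratio of the accelerated run to the baseline."""
    return accelerated / baseline


speedups = {
    name: speedup(row["baseline"], row["speculator"])
    for name, row in throughput.items()
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
# prints:
# ShareGPT: 1.71x
# HumanEval: 1.81x
```

On these figures, speculative decoding yields roughly a 1.7x to 1.8x throughput gain, with the larger gain on HumanEval.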
For more details about ArcticSpeculator and how to use it:
- ❄️ Using Arctic-Inference and Arctic-Training for improving real-world speculative decoding Performance (blog)
- 🚀 Getting started guide using ArcticTraining
See all of the speculators we have released via our Speculators Collection