How CoreWeave Spot and Flex Reservations Work (and When to Use Each)
Source: Flex Reservations and Spot Explained | CoreWeave Blog
AI demand isn’t flat, even when workloads run continuously. Some work needs always-on capacity, some needs guaranteed headroom, and some can trade interruptions for cost. That mix requires more than a single capacity model. To support these diverse workloads, CoreWeave Capacity Plans now include four models—Reservations, On-Demand, Spot (GA), and Flex Reservations (preview). Each model is designed for a different workload pattern. Designed to match AI workloads that aren’t always steady, Spot and Flex Reservations expand how you balance guarantees, cost, and utilization. This post explains how each works in practice and when to use them.
[Figure: Capacity Plans mix]
Flex Reservations: Guaranteed capacity without overbuying
Traditional reservation models offer guaranteed access but introduce the challenge of overprovisioning. When you reserve GPUs for 24/7 access, you pay for them whether they're fully utilized or not. For AI teams with fluctuating workloads, that often means paying for idle capacity and underutilized infrastructure.
21% budget loss: An estimated 21% of enterprise cloud infrastructure spend is wasted on underutilized resources (Harness Press Release, February 2025).
Flex Reservations are designed to change that by separating guaranteed access from full 24/7 run-rate pricing. Flex Reservations provide:
- Guaranteed access up to your Flex Reservations ceiling
- A lower holding fee to keep that capacity reserved (idle or running)
- A complementary usage rate that applies only when GPU nodes are in use
Most clouds force a binary choice: commit and pay continuously, or stay flexible and give up capacity guarantees. With this new offering, CoreWeave sets a new standard by reserving capacity without overprovisioning. No more overbuying full-time reservations just to innovate. You secure the capacity you need for the long term, while your cost structure tracks actual utilization more closely.
How Flex works
From an operational perspective, nothing changes in how you deploy workloads. The change is in the commercial structure beneath it:
- You reserve a peak ceiling
- You pay a 24/7 holding fee
- You pay usage charges only when instances are in use
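To make the three Flex cost components concrete, here is a minimal cost-model sketch. The rates are made-up placeholders, not CoreWeave pricing, and the `Rates` class and function names are hypothetical. Following the description of Flex, the holding fee applies to the whole ceiling whether GPUs are idle or running, and the usage rate applies only to GPU-hours actually used; a traditional reservation is modeled as the full run rate on the whole ceiling around the clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-GPU-hour prices; NOT actual CoreWeave pricing."""
    reserved_rate: float  # full run rate of a traditional 24/7 reservation
    holding_fee: float    # Flex fee on the whole ceiling, idle or running
    usage_rate: float     # Flex fee charged only on GPU-hours actually used


def reservation_cost(rates: Rates, ceiling_gpus: int, hours: float) -> float:
    # A traditional reservation bills the full ceiling around the clock.
    return rates.reserved_rate * ceiling_gpus * hours


def flex_cost(rates: Rates, ceiling_gpus: int, hours: float, used_gpu_hours: float) -> float:
    # Flex bills the holding fee on the full ceiling plus the usage rate on GPU-hours in use.
    if used_gpu_hours > ceiling_gpus * hours:
        raise ValueError("usage cannot exceed the reserved ceiling")
    return rates.holding_fee * ceiling_gpus * hours + rates.usage_rate * used_gpu_hours


def break_even_utilization(rates: Rates) -> float:
    # Average utilization u of the ceiling at which both models cost the same:
    # holding_fee + usage_rate * u == reserved_rate  ->  u = (reserved - holding) / usage
    return (rates.reserved_rate - rates.holding_fee) / rates.usage_rate


if __name__ == "__main__":
    rates = Rates(reserved_rate=2.00, holding_fee=0.60, usage_rate=1.80)  # made-up numbers
    ceiling, hours = 200, 720  # 200-GPU ceiling over a 30-day month
    for utilization in (0.25, 0.50, 0.75, 1.00):
        used = ceiling * hours * utilization
        print(f"{utilization:>4.0%}  reservation={reservation_cost(rates, ceiling, hours):>10,.0f}"
              f"  flex={flex_cost(rates, ceiling, hours, used):>10,.0f}")
    print(f"break-even utilization: {break_even_utilization(rates):.0%}")
```

With these placeholder rates, Flex costs less than an always-on reservation of the same 200-GPU ceiling below roughly 78% average utilization and more above it. Whether, and where, such a crossover exists for a real contract depends on the actual holding fee and usage rate, which the post does not publish.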
With Flex, you commit to a defined peak capacity (for example, 200 GPUs). That capacity is reserved for you, up to the ceiling, for the full term, just like a traditional Reservation. The reservation is continuous, but the cost is not. Instead of paying full run rates around the clock, you pay a lower holding fee that keeps the GPUs set aside so they're available when you need them. The key change is simple: you pay to hold the capacity, not full run rates for capacity you aren't using. When you spin up and use those GPUs, you pay a usage rate. When you scale down below your peak, you stop paying the usage rate on the idle capacity and pay only the holding fee. Your total cost flexes with real utilization; your access does not. The goal is a cost model that reflects true AI usage without forcing teams to plan down to the minute or predict exactly how much capacity they will need.
Spot: Lower-cost compute for interruptible workloads
Not every workload needs guaranteed uptime. Some jobs are inherently fault-tolerant, while others are experimental, batch-based, or opportunistic. For these use cases, paying a premium for guaranteed capacity doesn't always make sense. Spot instances offer lower-cost access to GPUs with no long-term commitment, designed for workloads that can tolerate interruption.
How CoreWeave Spot handles preemption differently
Like other Spot-style offerings on the market, these instances can be preempted, but for the right workloads the savings can be significant. CoreWeave Spot surfaces preemption as a clear signal with advance notice, giving workloads time to checkpoint and shift work before a node is terminated.
If a Spot node is going to be preempted, you receive:
- Explicit termination signaling: a clear preemption signal, so interruption is manageable, not chaotic
- A defined preemption notice window: advance notice before termination, giving workloads time to checkpoint and drain cleanly
Other clouds typically offer shorter or variable notice periods, depending on the product tier and instance type. CoreWeave gives you more time to pause, pivot, and plan before a Spot instance is interrupted.
Take Spot for a spin
We've said we designed Capacity Plans for real-world AI. Now you can test that on real-world infrastructure, with real benchmarks and real insights. For Spot instances, CoreWeave ARENA provides a way to test before you scale: a controlled environment for running end-to-end workloads under production-like conditions. Test performance, understand cost behavior, and evaluate scaling dynamics before committing to Spot or long-term capacity.
If your focus is production inference workflows, Weights and Biases Inference is another path built on CoreWeave infrastructure. It lets teams deploy and iterate within existing ML workflows, with explicit GPU type selection and visibility into performance and cost as demand evolves.
Is your workload a good fit for Spot?
For teams that build with resilience in mind, Spot can dramatically reduce the cost of experimentation and large-scale processing. At a high level, Spot works best when workloads are:
- Checkpointable: training jobs that regularly save model state, so progress isn't lost
- Retry-safe: jobs that can automatically restart from the last saved step
- Distributed or fault-tolerant: systems where losing a node doesn't collapse the entire job
- Non-urgent or batch-based: backfills, experiments, evaluation runs, and overflow workloads
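The "checkpointable" and "retry-safe" properties work together. A job that saves its progress periodically and resumes from its last saved step loses only the work done since that checkpoint. Here is a minimal sketch of that resume-and-retry loop. Preemption is simulated with a hypothetical `Preempted` exception, and the JSON checkpoint and function names are illustrative, not a CoreWeave API.

```python
import json
import os


class Preempted(Exception):
    """Hypothetical stand-in for losing a Spot node mid-run."""


def load_step(path):
    # Resume from the last saved step, or start from zero if no checkpoint exists yet.
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["step"]


def save_step(path, step):
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp, path)  # swap in one rename, so readers never see a half-written file


def run_job(path, total_steps, checkpoint_every, preempt_at=None):
    step = load_step(path)
    while step < total_steps:
        if preempt_at is not None and step == preempt_at:
            raise Preempted(f"node reclaimed at step {step}")
        step += 1  # stand-in for one unit of real work
        if step % checkpoint_every == 0 or step == total_steps:
            save_step(path, step)
    return step


def run_with_retries(path, total_steps, checkpoint_every=50, max_attempts=5, preemptions=()):
    # preemptions: hypothetical steps at which successive attempts get interrupted.
    schedule = list(preemptions)
    for attempt in range(1, max_attempts + 1):
        try:
            preempt_at = schedule.pop(0) if schedule else None
            return run_job(path, total_steps, checkpoint_every, preempt_at), attempt
        except Preempted:
            continue  # in a real deployment, a replacement node would be requested here
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

For example, `run_with_retries(path, 300, preemptions=[120, 230])` is interrupted twice, resumes from steps 100 and 200, and finishes on the third attempt, redoing only 50 steps in total. Checkpointing more often shrinks that lost work at the cost of more I/O.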
Interruption ahead? Make sure you've designed for it.
The most successful Spot users treat interruption as a design constraint, not...