databricks/rollout-operator
forked from grafana/rollout-operator
Description: Kubernetes Rollout Operator
Language: Go
License: Apache-2.0
Stars: 1
Forks: 3
Open issues: 1
Created: 2024-03-26T17:29:26Z
Pushed: 2025-09-01T23:07:00Z
Default branch: db_main
Fork: yes
Parent repository: grafana/rollout-operator
Archived: no
README:
Kubernetes Rollout Operator
This operator coordinates the rollout of pods between different StatefulSets within a specific namespace and can be used to manage multi-AZ deployments where pods running in each AZ are managed by a dedicated StatefulSet.
How updates work
The operator coordinates the rollout of pods belonging to StatefulSets that carry the rollout-group label and have their update strategy set to OnDelete. The label value should identify the group of StatefulSets to which the StatefulSet belongs. Make sure the StatefulSet has a name label in its spec.template, as the operator uses it to find the pods belonging to that StatefulSet.
For example, given the following StatefulSets in a namespace:
- ingester-zone-a with rollout-group: ingester
- ingester-zone-b with rollout-group: ingester
- compactor-zone-a with rollout-group: compactor
- compactor-zone-b with rollout-group: compactor
The operator independently coordinates the rollout of pods of each group:
- Rollout group: ingester
  - ingester-zone-a
  - ingester-zone-b
- Rollout group: compactor
  - compactor-zone-a
  - compactor-zone-b
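The grouping described above can be sketched in Go as follows. This is an illustration only, not the operator's code: the statefulSet type and the groupByRolloutGroup function are hypothetical, and only the rollout-group label name comes from the text.

```go
package main

import (
	"fmt"
	"sort"
)

// rolloutGroupLabel is the label name mentioned in the README text.
const rolloutGroupLabel = "rollout-group"

// statefulSet is a hypothetical, minimal stand-in for a Kubernetes
// StatefulSet; only the fields needed for grouping are modeled.
type statefulSet struct {
	Name   string
	Labels map[string]string
}

// groupByRolloutGroup buckets StatefulSet names by the value of their
// rollout-group label. StatefulSets without the label are skipped.
func groupByRolloutGroup(sets []statefulSet) map[string][]string {
	groups := make(map[string][]string)
	for _, s := range sets {
		group, ok := s.Labels[rolloutGroupLabel]
		if !ok || group == "" {
			continue
		}
		groups[group] = append(groups[group], s.Name)
	}
	// Sort names so the result does not depend on input order.
	for _, names := range groups {
		sort.Strings(names)
	}
	return groups
}

func main() {
	sets := []statefulSet{
		{Name: "ingester-zone-a", Labels: map[string]string{rolloutGroupLabel: "ingester"}},
		{Name: "ingester-zone-b", Labels: map[string]string{rolloutGroupLabel: "ingester"}},
		{Name: "compactor-zone-a", Labels: map[string]string{rolloutGroupLabel: "compactor"}},
		{Name: "compactor-zone-b", Labels: map[string]string{rolloutGroupLabel: "compactor"}},
	}
	groups := groupByRolloutGroup(sets)
	fmt.Println(groups["ingester"])  // [ingester-zone-a ingester-zone-b]
	fmt.Println(groups["compactor"]) // [compactor-zone-a compactor-zone-b]
}
```

Each resulting group is then handled independently of the others.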
For each rollout group, the operator guarantees:
1. Pods in 2 different StatefulSets are not rolled out at the same time
2. Pods in a StatefulSet are rolled out if and only if all pods in all other StatefulSets of the same group are Ready (otherwise it will start or continue the rollout once this check is satisfied)
3. Pods are rolled out if and only if all StatefulSets in the same group have OnDelete update strategy (otherwise the operator will skip the group and log an error)
4. The maximum number of not-Ready pods in a StatefulSet doesn't exceed the value configured in the rollout-max-unavailable annotation (if not set, it defaults to 1). Values:
   - 1: pods are rolled out sequentially, one at a time
   - > 1: pods are rolled out in parallel (honoring the configured number of max unavailable pods)
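As a rough illustration of how these guarantees combine, the following Go sketch models the checks on plain structs. It is not the operator's implementation: the statefulSet type, canRollOut, and deletionBudget are hypothetical names, and falling back to 1 for an unparsable or non-positive annotation value is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// statefulSet is a hypothetical, simplified view of a StatefulSet.
type statefulSet struct {
	Name           string
	UpdateStrategy string // "OnDelete" or "RollingUpdate"
	Replicas       int
	ReadyReplicas  int
	Annotations    map[string]string
}

// maxUnavailable reads the rollout-max-unavailable annotation, defaulting to 1.
// Assumption: invalid or non-positive values also fall back to 1.
func maxUnavailable(s statefulSet) int {
	raw, ok := s.Annotations["rollout-max-unavailable"]
	if !ok {
		return 1
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		return 1
	}
	return n
}

// canRollOut reports whether pods of target may be rolled out now:
// every StatefulSet in the group must use OnDelete, and every other
// StatefulSet in the group must have all of its pods Ready.
func canRollOut(target statefulSet, group []statefulSet) (bool, string) {
	for _, s := range group {
		if s.UpdateStrategy != "OnDelete" {
			return false, fmt.Sprintf("%s does not use the OnDelete update strategy", s.Name)
		}
	}
	for _, s := range group {
		if s.Name != target.Name && s.ReadyReplicas < s.Replicas {
			return false, fmt.Sprintf("%s has not-Ready pods", s.Name)
		}
	}
	return true, ""
}

// deletionBudget returns how many outdated pods may be deleted right now
// without exceeding the max-unavailable limit of the StatefulSet.
func deletionBudget(s statefulSet, outdated int) int {
	budget := maxUnavailable(s) - (s.Replicas - s.ReadyReplicas)
	if budget < 0 {
		budget = 0
	}
	if budget > outdated {
		budget = outdated
	}
	return budget
}

func main() {
	zoneA := statefulSet{Name: "ingester-zone-a", UpdateStrategy: "OnDelete", Replicas: 3, ReadyReplicas: 3,
		Annotations: map[string]string{"rollout-max-unavailable": "2"}}
	zoneB := statefulSet{Name: "ingester-zone-b", UpdateStrategy: "OnDelete", Replicas: 3, ReadyReplicas: 2}
	group := []statefulSet{zoneA, zoneB}

	ok, reason := canRollOut(zoneA, group)
	fmt.Println(ok, reason) // false ingester-zone-b has not-Ready pods
	fmt.Println(deletionBudget(zoneA, 3)) // 2
}
```

Checking readiness of the other StatefulSets before touching the target is what keeps two StatefulSets of the same group from being rolled out at the same time in this sketch.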
How scaling up and down works
The operator can also optionally coordinate scaling up and down of StatefulSets that are part of the same rollout-group based on the grafana.com/rollout-downscale-leader annotation. When using this feature, the grafana.com/min-time-between-zones-downscale label must also be set on each StatefulSet.
This can be useful for automating the tedious scaling of stateful services like Mimir ingesters. Making use of this feature requires adding a few annotations and labels to configure how it works.
If the grafana.com/rollout-upscale-only-when-leader-ready annotation is set to true on a follower StatefulSet, the operator will only scale up the follower once all replicas in the leader StatefulSet are ready. This ensures that the follower zone does not scale up until the leader zone is completely stable.
Example usage for a multi-AZ ingester group:
- For ingester-zone-a, add the following:
  - Labels:
    - grafana.com/min-time-between-zones-downscale=12h (change the value here to an appropriate duration)
    - grafana.com/prepare-downscale=true (to allow the service to be notified when it will be scaled down)
  - Annotations:
    - grafana.com/prepare-downscale-http-path=ingester/prepare-shutdown (to call a specific endpoint on the service)
    - grafana.com/prepare-downscale-http-port=80 (to call a specific endpoint on the service)
- For ingester-zone-b, add the following:
  - Labels:
    - grafana.com/min-time-between-zones-downscale=12h (change the value here to an appropriate duration)
    - grafana.com/prepare-downscale=true (to allow the service to be notified when it will be scaled down)
  - Annotations:
    - grafana.com/rollout-downscale-leader=ingester-zone-a (zone b will follow zone a, after a delay)
    - grafana.com/rollout-upscale-only-when-leader-ready=true (zone b will only scale up once all replicas in zone a are ready)
    - grafana.com/prepare-downscale-http-path=ingester/prepare-shutdown (to call a specific endpoint on the service)
    - grafana.com/prepare-downscale-http-port=80 (to call a specific endpoint on the service)
- For ingester-zone-c, add the following:
  - Labels:
    - grafana.com/min-time-between-zones-downscale=12h (change the value here to an appropriate duration)
    - grafana.com/prepare-downscale=true (to allow the service to be notified when it will be scaled down)
  - Annotations:
    - grafana.com/rollout-downscale-leader=ingester-zone-b (zone c will follow zone b, after a delay)
    - grafana.com/rollout-upscale-only-when-leader-ready=true (zone c will only scale up once all replicas in zone b are ready)
    - grafana.com/prepare-downscale-http-path=ingester/prepare-shutdown (to call a specific endpoint on the service)
    - grafana.com/prepare-downscale-http-port=80 (to call a specific endpoint on the service)
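The leader and follower behavior described above might be modeled roughly as in the Go sketch below. This is a simplified reading of the text, not the operator's logic: the zone type, desiredFollowerReplicas, and the sinceLeaderDownscale parameter are hypothetical, and how the operator actually tracks the time of the last downscale is not covered in this excerpt.

```go
package main

import (
	"fmt"
	"time"
)

// zone is a hypothetical, simplified view of one StatefulSet in the group.
type zone struct {
	Name          string
	Replicas      int
	ReadyReplicas int
}

// mustParse converts a label value such as "12h" into a duration.
func mustParse(value string) time.Duration {
	d, err := time.ParseDuration(value)
	if err != nil {
		panic(err)
	}
	return d
}

// desiredFollowerReplicas returns the replica count the follower should have.
//   - Scale up: follow the leader, but if upscaleOnlyWhenLeaderReady is set,
//     wait until all leader replicas are ready.
//   - Scale down: follow the leader only once minTimeBetweenZones has
//     elapsed since the leader was scaled down (sinceLeaderDownscale).
func desiredFollowerReplicas(leader, follower zone, upscaleOnlyWhenLeaderReady bool,
	minTimeBetweenZones, sinceLeaderDownscale time.Duration) int {
	switch {
	case leader.Replicas > follower.Replicas:
		if upscaleOnlyWhenLeaderReady && leader.ReadyReplicas < leader.Replicas {
			return follower.Replicas // leader not stable yet, keep waiting
		}
		return leader.Replicas
	case leader.Replicas < follower.Replicas:
		if sinceLeaderDownscale < minTimeBetweenZones {
			return follower.Replicas // too soon after the leader's downscale
		}
		return leader.Replicas
	default:
		return follower.Replicas
	}
}

func main() {
	minGap := mustParse("12h") // grafana.com/min-time-between-zones-downscale

	// Zone a scaled up to 5 but only 4 replicas are ready: zone b stays at 3.
	leader := zone{Name: "ingester-zone-a", Replicas: 5, ReadyReplicas: 4}
	follower := zone{Name: "ingester-zone-b", Replicas: 3, ReadyReplicas: 3}
	fmt.Println(desiredFollowerReplicas(leader, follower, true, minGap, 0)) // 3

	// Zone a scaled down to 2 one hour ago: zone b waits, then follows.
	leader = zone{Name: "ingester-zone-a", Replicas: 2, ReadyReplicas: 2}
	fmt.Println(desiredFollowerReplicas(leader, follower, true, minGap, mustParse("1h")))  // 3
	fmt.Println(desiredFollowerReplicas(leader, follower, true, minGap, mustParse("13h"))) // 2
}
```

Chaining zone c to zone b in the same way means a downscale propagates through the zones one at a time, with the configured delay between each step.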
Scaling based on reference resource
Rollout-operator can use a custom resource with scale and status subresources as the "source of truth" for the number of replicas of a target StatefulSet. The "source of truth" resource (or "reference resource") is configured using the following annotations:
- grafana.com/rollout-mirror-replicas-from-resource-name
- grafana.com/rollout-mirror-replicas-from-resource-kind
- grafana.com/rollout-mirror-replicas-from-resource-api-version
- grafana.com/rollout-mirror-replicas-from-resource-write-back
These annotations must be set on the StatefulSet that rollout-operator will scale (i.e. the target StatefulSet). The number of replicas in the target StatefulSet will follow the replicas in the reference resource (read from its scale subresource). The reference resource's status subresource will be updated with the current number of replicas in the target StatefulSet, unless this is explicitly disabled by setting the grafana.com/rollout-mirror-replicas-from-resource-write-back annotation to false.
This is similar to using grafana.com/rollout-downscale-leader, but the reference resource can be any kind of resource, not just a StatefulSet. Furthermore, grafana.com/min-time-between-zones-downscale is not respected when using scaling based on a reference resource.
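To make the role of the four annotations concrete, here is a small Go sketch that reads them from a StatefulSet's annotation map. It is an illustration under assumptions, not the operator's code: the replicaSource type and replicaSourceFromAnnotations are hypothetical, the example resource name, kind, and API version are placeholders, and treating the name annotation alone as the switch that enables the feature is a simplification of this sketch.

```go
package main

import "fmt"

// Annotation names as listed in the README text.
const (
	annName       = "grafana.com/rollout-mirror-replicas-from-resource-name"
	annKind       = "grafana.com/rollout-mirror-replicas-from-resource-kind"
	annAPIVersion = "grafana.com/rollout-mirror-replicas-from-resource-api-version"
	annWriteBack  = "grafana.com/rollout-mirror-replicas-from-resource-write-back"
)

// replicaSource is a hypothetical description of the reference resource.
type replicaSource struct {
	Name       string
	Kind       string
	APIVersion string
	WriteBack  bool // whether to update the reference resource's status
}

// replicaSourceFromAnnotations extracts the reference resource settings.
// The second return value is false when no reference resource is configured.
// Write-back stays enabled unless the annotation is explicitly "false".
func replicaSourceFromAnnotations(annotations map[string]string) (replicaSource, bool) {
	name := annotations[annName]
	if name == "" {
		return replicaSource{}, false
	}
	return replicaSource{
		Name:       name,
		Kind:       annotations[annKind],
		APIVersion: annotations[annAPIVersion],
		WriteBack:  annotations[annWriteBack] != "false",
	}, true
}

func main() {
	// Placeholder values; the kind and API version are not from the README.
	annotations := map[string]string{
		annName:       "ingester-zone-a",
		annKind:       "ExampleScaler",
		annAPIVersion: "example.com/v1",
		annWriteBack:  "false",
	}
	src, ok := replicaSourceFromAnnotations(annotations)
	fmt.Println(ok, src.Name, src.Kind, src.APIVersion, src.WriteBack)
	// true ingester-zone-a ExampleScaler example.com/v1 false
}
```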
This can be used in combination with a HorizontalPodAutoscaler when it is undesirable to set the number of replicas directly on the target StatefulSet, because we want to add custom logic to the scaledown (see next…