`SentinelRolloutService` is a Restate virtual object keyed by the literal string `singleton`, so there is only ever one rollout in flight. Architecture and lifecycle details live in Sentinel Deployment.
This doc is the operator-facing recipe: how to actually start, monitor, and unstick a rollout in prod.
Prerequisites
- The target image exists in `ghcr.io/unkeyed/sentinel` and has passed CI.
- You have a kubeconfig for the prod control-plane cluster.
- You have the Restate ingress bearer token from the `restate-cloud-credentials` secret (AWS Secrets Manager).
- You have a Slack webhook URL (optional, but strongly recommended in prod: it’s the only progress stream you’ll get without tailing logs).
Start a rollout (curl)
By default a rollout runs in waves of `[1, 5, 25, 50, 100]` (cumulative percent). Pass `wave_percentages` to override, e.g. `[10, 100]` for a fast two-wave rollout on staging.
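Because the percentages are cumulative, each wave deploys only the sentinels not already covered by earlier waves. The sketch below shows that arithmetic for a fleet of `total` sentinels. The function name `wave_sizes`, the round-up rule (so a 1% wave always touches at least one sentinel), and the input checks are assumptions for illustration; the service’s actual rounding and validation may differ.

```python
def wave_sizes(total: int, wave_percentages: list[int]) -> list[int]:
    """Number of *new* sentinels each wave deploys, given cumulative percentages.

    Hypothetical helper: rounding up and the validation below are assumptions,
    not confirmed SentinelRolloutService behavior.
    """
    if not wave_percentages or wave_percentages[-1] != 100:
        raise ValueError("assumed rule: the last wave must be 100 so every sentinel is covered")
    if any(p <= 0 for p in wave_percentages):
        raise ValueError("percentages must be positive")
    if any(b <= a for a, b in zip(wave_percentages, wave_percentages[1:])):
        raise ValueError("cumulative percentages must be strictly increasing")

    sizes = []
    covered = 0  # sentinels already deployed by earlier waves
    for pct in wave_percentages:
        target = (total * pct + 99) // 100  # cumulative target, rounded up with integer math
        sizes.append(target - covered)
        covered = target
    return sizes


print(wave_sizes(200, [1, 5, 25, 50, 100]))  # [2, 8, 40, 50, 100]
print(wave_sizes(200, [10, 100]))            # [20, 180]
```

With a small fleet, an early wave can round to zero new sentinels in this model: `wave_sizes(3, [1, 5, 25, 50, 100])` gives `[1, 0, 0, 1, 1]`.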
The call is synchronous from Restate’s perspective: the response returns when the rollout reaches `completed` or `paused`. Append `/send` to the handler path if you want fire-and-forget and will monitor from Slack.
The rollout is rejected if another rollout is active (any state other than `idle`, `completed`, or `cancelled`). Cancel or roll back the previous one first.
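The curl call is a plain Restate ingress request, so it can also be scripted. Below is a minimal standard-library Python sketch. It assumes Restate’s ingress URL convention for virtual objects (`/<service>/<key>/<handler>`, with `/send` appended for a one-way call) and bearer-token auth. The body field names other than `wave_percentages` (`image`, `slack_webhook_url`) are hypothetical placeholders; check the request message for the real ones.

```python
import json
import urllib.request

SERVICE = "hydra.v1.SentinelRolloutService"
KEY = "singleton"  # the only virtual-object key the service accepts


def rollout_request(ingress_url: str, token: str, body: dict,
                    handler: str = "Rollout", fire_and_forget: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST to a SentinelRolloutService handler via Restate ingress."""
    path = f"/{SERVICE}/{KEY}/{handler}"
    if fire_and_forget:
        path += "/send"  # one-way call: Restate acknowledges without waiting for the handler
    return urllib.request.Request(
        ingress_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical body: only wave_percentages comes from this runbook; the other names are placeholders.
req = rollout_request(
    "https://restate-ingress.example.com",  # placeholder ingress URL
    "<token from restate-cloud-credentials>",
    {
        "image": "ghcr.io/unkeyed/sentinel:<tag>",
        "slack_webhook_url": "https://hooks.slack.com/services/<...>",
        "wave_percentages": [10, 100],
    },
)
# To actually send it: urllib.request.urlopen(req).read()
```

The same helper covers the other handlers, e.g. `rollout_request(url, token, {}, handler="Resume")` (or `Cancel`, `RollbackAll`). Setting `fire_and_forget=True` mirrors the `/send` variant, which returns as soon as Restate accepts the invocation.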
Start a rollout (Restate UI)
If you’d rather click than curl, the Restate Cloud UI exposes every handler as a button.
- Open the Restate Cloud dashboard for the target environment and sign in.
- Go to Services → `hydra.v1.SentinelRolloutService`.
- Click the `Rollout` handler. The playground opens with a request form.
- Set the virtual-object key to `singleton` (the service only accepts this key).
- Fill in the JSON request body. Add `"wave_percentages": [10, 100]` if you want to override the defaults.
- Hit Send (blocks until `completed`/`paused`) or Send async (fire-and-forget; watch Slack).
Resume, Cancel, and RollbackAll use the same flow: pick the handler on `SentinelRolloutService`, set the key to `singleton`, and send an empty `{}` body.
Observing a running rollout in the UI:
- Invocations tab: find the active `Rollout` invocation and inspect its journal to see which wave is executing, what each `SentinelService.Deploy` call returned, and where it’s suspended.
- State tab on the `SentinelRolloutService/singleton` object: the current `rolloutState` (wave index, succeeded/failed IDs, previous images) is stored here and updates live.
Monitor progress
- Slack: messages fire on every phase transition (rollout started, wave started/completed, paused, resumed, rollback started/completed).
- Logs: tail the control-plane worker and look for `starting sentinel rollout`, `starting wave`, and `sentinel deploy failed`.
- DB: `sentinels.deploy_status` moves `progressing → ready` (or `failed`) as each wave runs.
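When tailing worker logs, a small filter that keeps only the three rollout messages cuts the noise. This sketch assumes each message appears verbatim somewhere in the log line (plain text or inside a structured field alike); the short labels are made up for readability.

```python
from typing import Iterable, Iterator

# Substrings from this runbook, mapped to hypothetical short labels.
ROLLOUT_MARKERS = {
    "starting sentinel rollout": "rollout",
    "starting wave": "wave",
    "sentinel deploy failed": "FAILURE",
}


def rollout_events(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (label, line) for log lines that mention a rollout milestone."""
    for line in lines:
        for marker, label in ROLLOUT_MARKERS.items():
            if marker in line:
                yield label, line.rstrip("\n")
                break  # one label per line is enough
```

For example, pipe `kubectl logs -f <worker-pod>` into a script that runs `for label, line in rollout_events(sys.stdin): print(label, line)`; the pod name depends on your cluster.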
When a wave fails
The rollout transitions to `paused` and returns. Sentinels that succeeded in the paused wave stay on the new image; failed ones stay wherever Kubernetes left them. Investigate the failure (sentinel logs, Krane logs, `deploy_status = failed` rows), then pick one:
Resume — skip the failed wave and continue
Cancel — stop here, keep the new image on whatever succeeded
Valid from `in_progress` or `paused`.
RollbackAll — revert every sentinel that took the new image
Calls `SentinelService.Deploy` to return each sentinel to its previous image (captured at rollout start). Failed sentinels are not touched, since they never took the new image. Valid from `paused` or `cancelled`. The response returns the count of sentinels successfully reverted.
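The RollbackAll rule (revert only sentinels that took the new image, skip failed ones, return the number of successful reverts) can be modeled in a few lines. This is a hypothetical model, not the service code: `RolloutState` only loosely mirrors the `rolloutState` fields described in the UI section, and `deploy` is a stand-in callable for `SentinelService.Deploy` that returns whether the deploy succeeded.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RolloutState:
    """Hypothetical shape; the real rolloutState field names may differ."""
    succeeded_ids: list[str] = field(default_factory=list)
    failed_ids: list[str] = field(default_factory=list)
    previous_images: dict[str, str] = field(default_factory=dict)  # sentinel id -> image at rollout start


def rollback_all(state: RolloutState, deploy: Callable[[str, str], bool]) -> int:
    """Redeploy each sentinel that took the new image back to its previous image.

    Failed sentinels are skipped because they never took the new image.
    Returns how many sentinels were successfully reverted.
    """
    reverted = 0
    for sentinel_id in state.succeeded_ids:
        previous = state.previous_images.get(sentinel_id)
        if previous is None:
            continue  # nothing captured for this sentinel; leave it alone
        if deploy(sentinel_id, previous):
            reverted += 1
    return reverted
```

In this model, a returned count lower than the number of succeeded sentinels means some reverts failed and those sentinels are still on the new image.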
State reference
| State | Next legal ops |
|---|---|
| `idle` / `completed` | `Rollout` |
| `cancelled` | `Rollout`, `RollbackAll` |
| `in_progress` | `Cancel` |
| `paused` | `Resume`, `Cancel`, `RollbackAll` |
| `rolling_back` | wait |
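For scripts that drive a rollout, the state table can be encoded as a lookup so an illegal call is caught before it reaches Restate. A sketch follows; the dictionary and function names are illustrative. It allows `RollbackAll` from `cancelled`, as the RollbackAll description states, and treats every state except `idle`, `completed`, and `cancelled` as an active rollout.

```python
# Legal next operations per rollout state (from the state reference and the op descriptions).
LEGAL_OPS: dict[str, frozenset[str]] = {
    "idle": frozenset({"Rollout"}),
    "completed": frozenset({"Rollout"}),
    "cancelled": frozenset({"Rollout", "RollbackAll"}),
    "in_progress": frozenset({"Cancel"}),
    "paused": frozenset({"Resume", "Cancel", "RollbackAll"}),
    "rolling_back": frozenset(),  # nothing to do but wait
}

INACTIVE_STATES = frozenset({"idle", "completed", "cancelled"})


def is_legal(state: str, op: str) -> bool:
    """True if `op` may be called while the rollout is in `state`."""
    if state not in LEGAL_OPS:
        raise ValueError(f"unknown rollout state: {state!r}")
    return op in LEGAL_OPS[state]


def rollout_active(state: str) -> bool:
    """A new Rollout is rejected while any rollout is active."""
    return state not in INACTIVE_STATES
```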
Tips
- Test on staging first. The same RPCs exist on the staging control plane: use the staging ingress URL and always run a full rollout there before prod.
- Don’t skip the Slack webhook in prod. If you `/send` the rollout and forget to pass the webhook, your only progress signal is worker logs.
- Custom waves for emergencies. Rolling back a bad image via a fresh rollout of the last-known-good tag is often faster than `RollbackAll` if most sentinels are already on the bad image. But think about what `previousImages` will capture before you do it: it is snapshotted at rollout start, so sentinels still on the bad image record it as their previous image, and a later `RollbackAll` would put them back on it.
- The `singleton` key is intentional. Don’t try to run two rollouts at once by varying the key; clients always address `singleton`.
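The `previousImages` caveat is easy to miss, so here is a tiny model of it. It assumes, per the RollbackAll description, that a new rollout snapshots each sentinel’s current image at start and that `RollbackAll` restores that snapshot for sentinels that took the new image. The function name and the plain dict of sentinel images are hypothetical.

```python
def rollback_would_restore(current_images: dict[str, str], image: str) -> list[str]:
    """Sentinels whose captured previous image would be `image` if a rollout started now.

    Hypothetical model: the new rollout captures current_images as previousImages,
    and a later RollbackAll puts each sentinel that took the new image back on its
    captured image.
    """
    previous_images = dict(current_images)  # snapshot taken at rollout start
    return sorted(sid for sid, img in previous_images.items() if img == image)


fleet = {"s1": "sentinel:bad", "s2": "sentinel:bad", "s3": "sentinel:good"}
# Rolling forward to sentinel:good now records sentinel:bad as the previous image for s1 and s2.
print(rollback_would_restore(fleet, "sentinel:bad"))  # ['s1', 's2']
```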

