
Purpose

The control plane worker is Unkey’s asynchronous execution layer for the control plane. It owns long-running, stateful workflows that coordinate changes across the control plane and downstream systems. The worker sits between the control API and infrastructure services: it takes durable tasks off the request path and drives each one to completion exactly once.

Source

svc/ctrl/worker

Place in the stack

The control plane worker is not an API surface. It is a workflow host that acts on state changes initiated by the control API and scheduled jobs.
  1. The control API validates requests and persists intent to MySQL.
  2. The control API triggers a Restate workflow on the worker.
  3. The worker coordinates downstream systems, writes new state to MySQL, and emits side effects such as builds, certificate issuance, and routing updates.
  4. Edge components (Frontline, Krane, Sentinel) consume the updated state to apply changes at the data plane.
This separation keeps API handlers short, deterministic, and retry-safe. The worker assumes responsibility for orchestration, retries, and durable state transitions.

Interfaces

  • Restate workflow handlers served by the worker.
  • Health endpoints: /health/live, /health/ready, and /health/startup.
  • Optional Prometheus metrics server.
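The three health paths above follow the common liveness, readiness, and startup probe split. The sketch below shows one way such endpoints could be wired with the standard library. What the worker actually checks behind each path is not documented here, so the `started` and `ready` flags and their meanings are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Health holds two hypothetical flags; the real worker's checks may differ.
type Health struct {
	started atomic.Bool // assumed: initialization finished
	ready   atomic.Bool // assumed: dependencies reachable
}

// Mux wires the three probe paths named in the docs.
func (h *Health) Mux() *http.ServeMux {
	mux := http.NewServeMux()
	// Liveness: the process is running and can serve HTTP at all.
	mux.HandleFunc("/health/live", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.Handle("/health/startup", probe(&h.started))
	mux.Handle("/health/ready", probe(&h.ready))
	return mux
}

// probe returns 200 when flag is set and 503 otherwise.
func probe(flag *atomic.Bool) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		if flag.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	}
}

// status issues an in-memory GET against the handler and returns the code.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	h := &Health{}
	mux := h.Mux()
	fmt.Println(status(mux, "/health/ready")) // 503 before dependencies are up
	h.started.Store(true)
	h.ready.Store(true)
	fmt.Println(status(mux, "/health/ready")) // 200
}
```

Keeping liveness independent of dependencies means an orchestrator restarts the process only when it is truly stuck, not when MySQL or Restate is briefly unavailable.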

Service boundaries

The worker groups multiple Restate services into one process. Each service uses a virtual object key that defines concurrency boundaries and protects against conflicting state mutations.
| Service | Virtual object key | Responsibility |
|---|---|---|
| DeployService | project_id | Build, deploy, promote, and rollback orchestration for a project. |
| DeploymentService | deployment_id | Serializes desired state changes with nonce-based last-writer-wins. |
| RoutingService | project_id | Atomic reassignment of frontline routes to a deployment. |
| CustomDomainService | domain | Domain ownership verification and post-verify actions. |
| CertificateService | domain | Certificate issuance and renewal with ACME and Vault. |
| ClickhouseUserService | workspace_id | ClickHouse user provisioning and quota updates when enabled. |
| SentinelService | sentinel_id | Deploys a single sentinel: updates config, suspends on an awakeable until Krane reports healthy, marks failed on timeout. |
| SentinelRolloutService | singleton | Fleet-wide progressive rollout of sentinel images using percentage waves, with pause-on-failure and RollbackAll. |
| CronService | varies per handler | Unified entry point for all scheduled tasks (quota check, key refill, key last-used sync, audit log export, audit log outbox cleanup, ratelimit global counters cleanup). Per-handler VO keys: billing period, date, or a fixed per-task slug. |
| KeyLastUsedPartitionService | partition_index | Per-partition fan-out target for CronService.RunKeyLastUsedSync. |

System responsibilities

The worker centralizes orchestration for operations that touch multiple systems or must span minutes:
  • Deployment orchestration across regions, including builds, rollout, and routing updates.
  • Domain ownership verification, certificate issuance, and renewal.
  • Background maintenance, such as key refills and quota checks.
  • Optional ClickHouse user provisioning for analytics access.

Durability model

The worker relies on Restate to make workflows durable and idempotent. Each workflow step is journaled so Restate can replay completed steps and resume from the last successful checkpoint.
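  • Durable steps isolate side effects and journal their results, giving exactly-once semantics for completed steps: a step whose result is recorded is never re-executed on replay. A step that fails before its result is journaled is retried, so side effects inside it should be idempotent.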
  • Virtual object keys serialize conflicting operations per domain, project, deployment, workspace, or region.
  • Long-running operations use Restate retries and durable sleep to wait out external rate limits.
  • Background jobs persist progress in Restate state for safe resumption.
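The replay model described above can be shown with a toy journal. This is a simplified sketch, not Restate's API: the hypothetical `Journal` lives in memory and indexes steps by position, whereas Restate's journal is durable and scoped to an invocation. The handler has a build step and a deploy step; the first attempt fails during deploy, and the replay reuses the recorded build result instead of building again.

```go
package main

import "fmt"

// Journal is a toy model of journaled steps: each step's result is recorded
// the first time it succeeds and returned unchanged on replay.
type Journal struct {
	entries []string
	pos     int
}

// Run executes fn only when the current step has no recorded result.
func (j *Journal) Run(fn func() (string, error)) (string, error) {
	if j.pos < len(j.entries) {
		v := j.entries[j.pos] // replay: reuse the recorded result
		j.pos++
		return v, nil
	}
	v, err := fn()
	if err != nil {
		return "", err // nothing recorded, so this step runs again on retry
	}
	j.entries = append(j.entries, v)
	j.pos++
	return v, nil
}

// Replay rewinds the cursor, as if the handler restarted from the top.
func (j *Journal) Replay() { j.pos = 0 }

// deployHandler runs two steps; failDeploy simulates a failure in step two.
// calls counts how many times each side effect actually executed.
func deployHandler(j *Journal, calls map[string]int, failDeploy bool) error {
	if _, err := j.Run(func() (string, error) {
		calls["build"]++
		return "image:v1", nil
	}); err != nil {
		return err
	}
	_, err := j.Run(func() (string, error) {
		calls["deploy"]++
		if failDeploy {
			return "", fmt.Errorf("region unavailable")
		}
		return "deployed", nil
	})
	return err
}

func main() {
	j := &Journal{}
	calls := map[string]int{}
	fmt.Println(deployHandler(j, calls, true)) // region unavailable
	j.Replay()
	fmt.Println(deployHandler(j, calls, false))  // <nil>
	fmt.Println(calls["build"], calls["deploy"]) // 1 2
}
```

The counts show both halves of the durability rule: the completed build step ran once, while the deploy step, which failed before its result was recorded, ran twice.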

Determinism: never read the wall clock directly

Handler bodies replay on every retry, so any non-deterministic value read outside a journaled step diverges between executions and Restate aborts with a diverging-paths error. Reading time.Now() (or a clock.Clock) directly in handler code is the common offender: the first run and each replay observe different timestamps. Read the current time through restateutil.Now(ctx), which wraps the read in a restate.Run step so the value is journaled on the first execution and reused verbatim on every replay. Unit conversions on the result (UnixMilli, Add, …) are deterministic and safe outside the step.
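```go
// Journaled on the first execution; every replay returns the same timestamp.
now, err := restateutil.Now(ctx)
if err != nil {
    return nil, fmt.Errorf("get now: %w", err)
}
// Arithmetic on the journaled value is deterministic, so it can run outside a step.
cutoff := now.Add(-retention).UnixMilli()
```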
The same rule applies to any other non-deterministic source (randomness, external reads): produce it inside a restate.Run step, never in the bare handler body.

Dependencies

  • MySQL for control plane state.
  • Restate admin and ingress endpoints.
  • Vault for encryption operations.
  • ClickHouse for analytics and build telemetry (optional).
  • GitHub App credentials for git-based deployments.
  • Route53 credentials for ACME DNS challenges.
  • Depot and registry credentials for builds.