
Control Plane (Ctrl)

The control plane service for managing deployments and infrastructure

The ctrl service provides a deployment platform similar to Vercel, Railway, or Fly.io. When a customer deploys their application, ctrl:

  1. Builds container images from source code using Depot.dev
  2. Orchestrates deployment across regions through event streaming to Krane agents
  3. Deploys containers to Kubernetes via the pull-based infrastructure model
  4. Assigns domains to route traffic and configures sentinels
  5. Secures applications with automatic TLS certificate provisioning

All multi-step operations are durable, using Restate workflows to ensure consistency even during failures, network partitions, or process crashes. The pull-based architecture ensures deployments remain resilient even when individual regions experience connectivity issues.
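The resume-on-failure behavior can be sketched in Go: a journal of completed steps stands in for the workflow engine's persistent log. The `journal` type and `step` helper are illustrative stand-ins, not Restate's actual API.

```go
package main

import "fmt"

// journal records the results of completed steps, mimicking how a durable
// workflow engine persists progress. On retry, steps whose results are
// already journaled are skipped and their recorded results reused.
type journal struct {
	done map[string]string
}

// step runs fn at most once per name: if the journal already holds a
// result for this step, it is returned without re-executing fn.
func (j *journal) step(name string, fn func() string) string {
	if v, ok := j.done[name]; ok {
		return v
	}
	v := fn()
	j.done[name] = v
	return v
}

// deploy is a toy three-phase deployment built from journaled steps.
func deploy(j *journal) string {
	img := j.step("build", func() string { return "image:abc123" })
	j.step("rollout", func() string { return "rolled out " + img })
	return j.step("assign-domains", func() string { return "routes live" })
}

func main() {
	j := &journal{done: map[string]string{}}
	// First attempt "crashes" after the build step...
	j.step("build", func() string { return "image:abc123" })
	// ...and the retry resumes from the journal instead of rebuilding.
	fmt.Println(deploy(j))
}
```

A real engine persists the journal outside the process, which is what makes recovery survive crashes and partitions rather than just in-process retries.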

Architecture

Service Composition

The ctrl service is composed of several specialized services and workflows. The RPC services handle synchronous operations:

  1. ClusterService: pull-based infrastructure coordination
  2. BuildService: container image building
  3. DeployService: deployment creation and management
  4. AcmeService: ACME challenge coordination
  5. OpenApiService: OpenAPI spec management
  6. CtrlService: health checks

Running alongside these are the Restate workflows that provide durable orchestration. The DeployService workflow orchestrates the full deployment lifecycle, the RoutingService workflow manages domain and sentinel configuration, and the CertificateService workflow handles TLS certificate provisioning through the ACME protocol.

Technology Stack

The ctrl service is built on Connect RPC for service-to-service communication using HTTP/2, with gRPC streaming for the ClusterService. Restate provides durable workflow orchestration with exactly-once semantics, ensuring operations complete reliably even during failures. A single MySQL database stores all persistent state: projects, deployments, deployment topology, instances, domains, sentinel configurations, and certificates. S3 stores build contexts and encrypted vault data. The ClusterService coordinates with Krane agents using event streaming for pull-based deployments to Kubernetes. Depot.dev handles remote container image building with persistent layer caching.

Services

Cluster Service

The cluster service implements version-based synchronization for coordinating deployments across multiple regions. Rather than pushing events to connected agents, it exposes WatchDeployments and WatchSentinels RPCs that Krane instances use to stream state changes. This design makes the control plane stateless with respect to connected clients.

The service provides these key RPCs. WatchDeployments and WatchSentinels establish server-streaming connections for receiving deployment and sentinel state changes respectively. For fresh connections (version=0), they first stream the complete desired state as a bootstrap, then switch to incremental mode. GetDesiredDeploymentState and GetDesiredSentinelState return current desired state for individual resources. ReportDeploymentStatus and ReportSentinelStatus receive pod status updates from agents.

When resources are created, updated, or deleted, the deploy workflow updates the resource with a monotonically increasing version number. Krane instances watching that region receive the change and apply it locally. This decouples the control plane from connection management and enables reliable at-least-once delivery through version-based resumption.
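The version-based resumption described above can be sketched with an in-memory change log standing in for the database-backed one (the `change` type and `watch` function are illustrative, not the actual RPC implementation):

```go
package main

import "fmt"

// change is one versioned mutation to a resource's desired state.
type change struct {
	Version int
	Key     string
	State   string
}

// watch returns all changes after the client's last seen version. A client
// connecting with version 0 therefore receives the complete desired state
// as a bootstrap; a reconnecting client resumes from its cursor. The server
// never tracks who is connected, which gives at-least-once delivery
// without connection state on the control plane.
func watch(log []change, after int) []change {
	var out []change
	for _, c := range log {
		if c.Version > after {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	log := []change{
		{1, "deploy-a", "running"},
		{2, "sentinel-x", "configured"},
		{3, "deploy-a", "standby"},
	}
	fmt.Println(len(watch(log, 0))) // fresh connection: full bootstrap
	fmt.Println(len(watch(log, 2))) // reconnection: incremental resume
}
```

In the real service the "log" is the set of versioned resource rows, and the stream stays open to deliver new versions as they are written.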

Read detailed Pull-Based Provisioning docs →

Build Service

The build service manages container image building for customer deployments through Depot, which provides remote BuildKit with persistent layer caching for fast rebuilds.

For GitHub-connected repositories, builds are triggered automatically via webhooks. BuildKit fetches the repository directly from GitHub using its native git context support, authenticated via GitHub App installation tokens. This eliminates the need for intermediate storage and provides efficient builds with automatic layer caching.

For CLI deployments, users provide pre-built Docker images directly, bypassing the build service entirely.

Read detailed Build System docs →

Deployment Service

The deployment service orchestrates the complete deployment lifecycle through durable workflows. It provides four key operations: CreateDeployment initiates a new deployment, GetDeployment queries the current status, Promote promotes a deployment to live, and Rollback rolls back to a previous deployment.

The deployment workflow progresses through several phases:

  1. Builds a container image from Git via Depot, or accepts a pre-built image
  2. Creates deployment topologies for all configured regions, ensuring sentinels and Cilium network policies exist per region
  3. Polls in parallel until all instances are running
  4. Once healthy, generates frontline routes for per-commit, per-branch, and per-environment domains and assigns them atomically through the routing service
  5. Marks the deployment as ready and, for non-rolled-back production environments, updates the project's live deployment pointer

The previous live deployment is then scheduled for standby via DeploymentService.

Restate implements durable execution by recording progress in a distributed, persistent log managed by the Restate server. If ctrl crashes during deployment, Restate resumes from the last completed phase rather than restarting from the beginning, so deployments complete reliably even during system failures.

Deployments are keyed by project_id in Restate's virtual object model. This ensures only one deployment operation per project runs at a time, preventing race conditions during concurrent deploy, rollback, or promote operations that could leave the system in an inconsistent state.

Read detailed Deployment Workflow docs →

ACME Service

The ACME service handles ACME protocol coordination for TLS certificate provisioning. It provides three key operations: CreateACMEUser registers an ACME account for a workspace, ValidateDomain validates domain ownership, and GetCertificate retrieves issued certificates.

The service coordinates with the Certificate workflow for actual certificate issuance. It supports both HTTP-01 challenges for custom domains and DNS-01 challenges via the Cloudflare provider for wildcard certificates on the default domain.

Private keys are encrypted using the vault service before storage. Certificates are stored in the main database for fast sentinel access without encryption overhead. Challenge records track certificate expiry with 90-day validity periods.
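The HTTP-01 flow reduces to a path lookup: the CA fetches a well-known URL and expects the key authorization recorded for that token. The `challengeResponse` helper below is illustrative, not the service's actual code, and the in-memory map stands in for the challenge records in the database:

```go
package main

import (
	"fmt"
	"strings"
)

// challengeResponse resolves an HTTP-01 validation request. The CA fetches
// /.well-known/acme-challenge/<token> over plain HTTP and expects the key
// authorization stored for that token in response.
func challengeResponse(path string, challenges map[string]string) (string, bool) {
	const prefix = "/.well-known/acme-challenge/"
	if !strings.HasPrefix(path, prefix) {
		return "", false
	}
	auth, ok := challenges[strings.TrimPrefix(path, prefix)]
	return auth, ok
}

func main() {
	challenges := map[string]string{"tok123": "tok123.thumbprint"}
	if auth, ok := challengeResponse("/.well-known/acme-challenge/tok123", challenges); ok {
		fmt.Println(auth)
	}
}
```

DNS-01 follows the same record-then-verify shape, but publishes the proof as a TXT record via the DNS provider instead of serving it over HTTP, which is what allows wildcard issuance.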

Read detailed Certificate docs →

OpenAPI Service

The OpenAPI service manages OpenAPI specifications scraped from deployed applications. It provides two key operations: GetDiff compares OpenAPI specs between deployments to detect breaking changes, and GetSpec retrieves the spec for a specific deployment.

Specs are scraped from GET /openapi.yaml on running instances during the deployment workflow. They're stored in the database and used for API documentation generation, request validation in sentinels, and breaking change detection between deployments.
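A simplified sketch of breaking-change detection, assuming only path sets are compared (the real GetDiff inspects full OpenAPI documents):

```go
package main

import "fmt"

// breakingChanges reports paths present in the old spec but missing from
// the new one: the simplest class of breaking change between deployments.
func breakingChanges(oldPaths, newPaths map[string]bool) []string {
	var removed []string
	for p := range oldPaths {
		if !newPaths[p] {
			removed = append(removed, p)
		}
	}
	return removed
}

func main() {
	oldSpec := map[string]bool{"/v1/keys": true, "/v1/apis": true}
	newSpec := map[string]bool{"/v1/keys": true}
	fmt.Println(breakingChanges(oldSpec, newSpec))
}
```

Added paths are backwards-compatible and would not appear in the result; a fuller diff would also flag removed parameters, narrowed types, and changed response schemas.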

Workflows

Workflows are implemented as Restate services for durable execution. The Deployment Workflow handles deploy, rollback, and promote operations. The Routing Workflow manages domain assignment and sentinel configuration. The Certificate Workflow processes ACME challenges for TLS certificate provisioning. See the individual workflow documentation pages for detailed implementation specifics.

Database Schema

The ctrl service uses a single MySQL database (unkey) that stores all data:

  1. Projects, environments, and workspaces
  2. Deployments and deployment history
  3. Deployment topology for regional distribution
  4. Instances tracking individual pods
  5. Domains and SSL certificates
  6. ACME users and challenges
  7. Sentinel configurations
  8. Certificate storage in PEM format

Resource tables (deployment_topology, sentinels) include a version column that drives Krane synchronization. Each mutation updates the version via the VersioningService singleton, providing a monotonically increasing version across all resources. Krane instances stream changes via WatchDeployments and WatchSentinels RPCs to receive incremental updates. Rows are indexed by (region, version) for efficient streaming.
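Conceptually, the versioning boils down to a mutex-guarded counter shared by all resource types. The `versioner` type below is illustrative; the real VersioningService persists versions in MySQL so they survive restarts:

```go
package main

import (
	"fmt"
	"sync"
)

// versioner hands out a single monotonically increasing version across all
// resource types. Every mutation to deployment_topology or sentinels stamps
// the row with the next version, so Krane can stream rows for its region
// with version greater than its cursor, in order.
type versioner struct {
	mu   sync.Mutex
	last int64
}

func (v *versioner) next() int64 {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.last++
	return v.last
}

func main() {
	v := &versioner{}
	fmt.Println(v.next(), v.next(), v.next())
}
```

Using one counter for all resources, rather than one per table, lets a single cursor per region cover both deployments and sentinels.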

Monitoring

The ctrl service exposes metrics and logs through OpenTelemetry. Key metrics include deployment duration broken down by phase, build success and failure rates, the number of Krane poll iterations required for deployments to become ready, domain assignment latency, ACME challenge success rates, state change processing latency, and instance status update processing time.

All operations include structured logging fields for correlation and debugging. Common fields include deployment_id, project_id, and workspace_id across all operations. Build operations add build_id and depot_project_id. ClusterService operations add region and sequence for tracking sync progress. System-level logs include instance_id, region, and platform to identify which ctrl instance handled the operation.

Logs are shipped to Grafana Loki in production for centralized log aggregation and querying.
