Overview

This documentation provides a comprehensive guide to Unkey’s infrastructure setup using Pulumi for AWS deployments. The infrastructure uses a multi-account architecture, GitHub Actions for CI/CD, and cross-account role assumption for secure and scalable infrastructure management.

Architecture

Unkey’s infrastructure is organized into multiple AWS accounts:
  • Management/Root account: Central account for authentication and cross-account access management
  • Sandbox account: Development environment
  • Canary account: Testing environment
  • Production account: Production environment
Each account has a dedicated role (UnkeyPulumiAWSExecutor) that Pulumi can assume to deploy resources. GitHub Actions workflows use OIDC authentication to assume a role in the management account, which then assumes the executor roles in target accounts.

Authentication & Authorization Flow

Setup & Configuration

GitHub OIDC Configuration
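
The core of the GitHub OIDC setup is an IAM role in the management account that trusts GitHub’s OIDC provider. The sketch below shows what such a trust policy typically looks like; the account ID placeholder and the exact conditions are illustrative assumptions, not the actual Unkey policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<management-account-id>:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:unkeyed/infra:*"
        }
      }
    }
  ]
}
```

The `sub` condition restricts which repository (and optionally branch) may assume the role; tightening it to specific branches is a common hardening step.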

Pulumi Environment, Secrets and Configuration (ESC)

Unkey uses Pulumi ESC to manage configuration and secrets across environments. The naming convention for stacks and environments follows a pattern:
unkey/<project>/<cloud>-<cloud_account_name>-<cloud_region>
Example:
  • Project: api
  • Cloud: aws
  • AWS Account: canary
  • Region: us-east-1
  • Stack name: unkey/api/aws-canary-us-east-1
  • Environment: unkey/api/aws-canary-us-east-1
There is also a global environment for each AWS account to hold configuration items that don’t change across regions: unkey/api/canary-global.
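
As a concrete illustration of the convention, a small Go helper could assemble these names (these functions are hypothetical, not part of the Unkey codebase):

```go
package main

import "fmt"

// stackName builds a stack/environment name following the convention
// unkey/<project>/<cloud>-<cloud_account_name>-<cloud_region>.
func stackName(project, cloud, account, region string) string {
  return fmt.Sprintf("unkey/%s/%s-%s-%s", project, cloud, account, region)
}

// globalEnvName builds the per-account global environment name,
// e.g. unkey/api/canary-global.
func globalEnvName(project, account string) string {
  return fmt.Sprintf("unkey/%s/%s-global", project, account)
}

func main() {
  fmt.Println(stackName("api", "aws", "canary", "us-east-1")) // unkey/api/aws-canary-us-east-1
  fmt.Println(globalEnvName("api", "canary"))                 // unkey/api/canary-global
}
```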

ESC Configuration items

The following configuration items are required by the Pulumi code; each is retrieved with config.Require(), which panics if the value is not set.

Required Configuration Items

  • roleToAssumeARN: ARN of the AWS role to assume for deployments. Used to create the privileged AWS provider for cross-account access.
  • cidrBlock: CIDR block for the VPC. Defines the IP address range for the VPC.
  • hostedZoneID: Route53 hosted zone ID. Used for DNS record creation for certificate validation.
  • certificateDomain: Domain name for the SSL certificate. Used to create and validate the ACM certificate.
  • awsRegion: AWS region for deployment. Used in container environment variables.
  • clickhouseUrl: URL for the ClickHouse database. Passed as an environment variable to containers.

Required Secret Configuration Items

These items are retrieved using config.RequireSecret() which will also panic if not found:
  • clickhouseUrl: URL for the ClickHouse database. Used as an environment variable in the container.
  • OTEL_EXPORTER_OTLP_HEADERS: Headers for the OpenTelemetry exporter. Used as an environment variable in the container.
  • OTEL_EXPORTER_OTLP_ENDPOINT: Endpoint for the OpenTelemetry exporter. Used as an environment variable in the container.

Environment-Specific Items from ESC

These configuration items would be set in Pulumi ESC environments and imported into stacks:

Global Environment (e.g., unkey/api/canary-global)

  • databasePrimaryDsn: Connection string for the primary database

Regional Environment (e.g., unkey/api/aws-canary-us-east-1)

  • aws:profile: AWS profile name (e.g., unkey-canary-admin)
  • aws:region: AWS region (e.g., us-east-1)

Configuration for Database Password Resources

The code also creates database password resources with these parameters:
  • Database: "unkey"
  • Branch: "main"
  • Name: Dynamically generated based on project and stack
  • Replica: Determined by stack name (false for canary stacks)
  • Role: "readwriter" for primary, "reader" for replica
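
The parameters above can be sketched in Go. This is a hypothetical reading of the description (struct and function names are invented for illustration, and the exact name-generation scheme is an assumption):

```go
package main

import (
  "fmt"
  "strings"
)

// dbPasswordParams mirrors the parameters described above for the
// database password resources.
type dbPasswordParams struct {
  Database string
  Branch   string
  Name     string
  Replica  bool
  Role     string
}

// passwordResources returns the primary password parameters, plus a
// replica set for non-canary stacks (canary stacks get no replica).
func passwordResources(project, stack string) []dbPasswordParams {
  primary := dbPasswordParams{
    Database: "unkey",
    Branch:   "main",
    Name:     fmt.Sprintf("%s-%s-primary", project, stack),
    Replica:  false,
    Role:     "readwriter",
  }
  params := []dbPasswordParams{primary}

  if !strings.Contains(stack, "canary") {
    replica := primary
    replica.Name = fmt.Sprintf("%s-%s-replica", project, stack)
    replica.Replica = true
    replica.Role = "reader"
    params = append(params, replica)
  }
  return params
}

func main() {
  for _, p := range passwordResources("api", "aws-canary-us-east-1") {
    fmt.Println(p.Name, p.Replica, p.Role) // api-aws-canary-us-east-1-primary false readwriter
  }
}
```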

Example Stack Configuration

Here’s an example of a complete stack configuration file (Pulumi.aws-canary-us-east-1.yaml), which references the regional ESC environment:

environment:
  - api/aws-canary-us-east-1

The regional environment definition (unkey/api/aws-canary-us-east-1) in turn imports the global environment and exposes configuration under pulumiConfig:

imports:
  - api/canary-global

values:
  pulumiConfig:

Environment Configurations

Configuration values are set with the ESC CLI. In the regional environment:
esc env set unkey/api/aws-canary-us-east-1 pulumiConfig.aws:profile unkey-canary-admin
esc env set unkey/api/aws-canary-us-east-1 pulumiConfig.aws:region us-east-1
In the global environment, values such as the database DSN are set as secrets (see “Setting Secrets” below).

Working with Stacks and Environments

# Create a stack
pulumi stack init unkey/api/aws-canary-us-east-1

# Create the global environment
esc env init unkey/api/canary-global

# Create the region environment
esc env init unkey/api/aws-canary-us-east-1

Setting Secrets

Secrets are managed using ESC:
esc env set --secret unkey/api/canary-global pulumiConfig.api:databasePrimaryDsn "thesecretgoeshere"

Accessing Configuration in Code

In Go code, configuration values are accessed using the Pulumi config system:
package main

import (
  "github.com/pulumi/pulumi/sdk/v3/go/pulumi"
  "github.com/pulumi/pulumi/sdk/v3/go/pulumi/config"
)

func main() {
  pulumi.Run(func(ctx *pulumi.Context) error {
    // Initialize configuration for the current project namespace
    cfg := config.New(ctx, "")

    // Access regular config (panics if unset)
    cidrBlock := cfg.Require("cidrBlock")

    // Access secrets (panics if unset)
    databasePrimaryDsn := cfg.RequireSecret("databasePrimaryDsn")

    _ = cidrBlock
    _ = databasePrimaryDsn

    // ... rest of code
    return nil
  })
}

Role Assumption in Pulumi Code

The Pulumi code uses role assumption to obtain the necessary permissions in the target AWS account:
roleToAssumeARN := cfg.Require("roleToAssumeARN") // cfg from config.New(ctx, "")

provider, err := aws.NewProvider(ctx, "privileged", &aws.ProviderArgs{
  AssumeRole: &aws.ProviderAssumeRoleArgs{
    RoleArn:     pulumi.StringPtr(roleToAssumeARN),
    SessionName: pulumi.String("NameYourSession"),
    ExternalId:  pulumi.String("SomeNameIsUsefulHere"),
  },
  Region: pulumi.String(region),
})
if err != nil {
  return err
}

Resources created with this provider (passed via the pulumi.Provider resource option) are deployed into the target account.

Deployed Resources

The Pulumi code deploys several AWS resources:
  1. VPC and Networking:
    • VPC with public subnets
    • Security groups for ALB, Fargate tasks, and Redis
  2. Serverless Redis (Valkey):
    • Elasticache Serverless Cache with Valkey engine
  3. Load Balancing:
    • Application Load Balancer
    • Target groups and listeners for HTTP/HTTPS traffic
    • SSL/TLS certificate with DNS validation
  4. ECS Fargate Service:
    • ECS Cluster
    • Fargate service with task definition
    • Container configuration with environment variables
  5. Secrets Management:
    • Database credentials for primary and replica databases

Making Changes to Infrastructure

There are two primary methods for deploying infrastructure changes: automated deployment through GitHub Actions and manual deployment by human operators. Each approach has specific workflows and considerations.

Automated Deployment via GitHub Actions

GitHub Actions is the primary method for deploying infrastructure changes to all environments. This approach provides consistency, auditability, and reduces the risk of human error.

Prerequisites

  1. GitHub Repository Access: Ensure you have appropriate access to the unkeyed/infra repository.
  2. Pull Request Process: All changes should follow the standard PR review process.

Workflow

  1. Create a Feature Branch:
    git checkout -b feature/my-infrastructure-change
    
  2. Make Your Changes:
    • Update Pulumi code in Go files
    • Modify stack configuration in Pulumi.*.yaml files
  3. Test Locally (if possible):
    # Preview changes without applying them
    pulumi preview --stack unkey/api/aws-sandbox-us-east-1
    
  4. Commit and Push Changes (enable git commit signing, and use it; follow Conventional Commits):
    git add .
    git commit -S -s -m "fix: description of infrastructure changes"
    git push origin feature/my-infrastructure-change
  5. Create Pull Request:
    • Open a PR against the main branch
    • Include a detailed description of the changes
    • Request reviews from appropriate team members
  6. CI/CD Pipeline Execution:
    • GitHub Actions will automatically run the Pulumi workflow
    • The workflow will:
      • Authenticate to AWS using OIDC
      • Assume the necessary roles
      • Deploy stacks when either:
        • the GitHub Actions workflow file changes, or
        • any code in the project covered by that workflow changes
    • Keep in mind that changes spanning multiple stacks may need to be coordinated
  7. Deployment Order:
    • Changes are typically deployed to sandbox first
    • Once verified, they’re deployed to canary
    • Finally, they’re deployed to production
  8. Monitor Deployments:
    • Check GitHub Actions logs for deployment status
    • Verify resources in AWS Console
    • Check application functionality

Troubleshooting CI/CD Deployments

  • Authentication Issues: Verify the OIDC trust relationship is correctly configured
  • Permission Errors: Check that the assumed roles have the necessary policies
  • Failed Deployments: Review the GitHub Actions logs for specific error messages

Manual Deployment by Humans

In some scenarios, you may need to deploy changes manually. This approach is typically used for emergency fixes or when testing new infrastructure components.

Prerequisites

  1. AWS CLI and Credentials:
    # Install AWS CLI
    brew install awscli
    
    # Configure credentials
    aws configure
    
  2. Pulumi and ESC CLI:
    brew update && brew install pulumi/tap/esc pulumi/tap/pulumi
    
  3. AWS SSO Access: Ensure you have SSO access to the relevant AWS accounts with Administrator permissions.

Workflow

  1. Clone the Repository:
    git clone https://github.com/unkeyed/infra.git
    cd infra
    
  2. Switch to the Appropriate Branch:
    git checkout main  # Or feature branch if testing
    
  3. Login to AWS SSO:
    # Login to the relevant account
    aws sso login --profile unkey-sandbox-admin
    
  4. Set Up Pulumi Stack:
    # Select the appropriate stack
    AWS_PROFILE=unkey-sandbox-admin pulumi stack select unkey/api/aws-sandbox-us-east-1
    
  5. Configure Role Assumption:
    • The roleToAssumeARN config is set in the project’s global environment.
    • Your SSO role lets you assume the UnkeyPulumiAWSExecutor role.
  6. Preview Changes:
    AWS_PROFILE=unkey-sandbox-admin pulumi preview
    
  7. Apply Changes:
    AWS_PROFILE=unkey-sandbox-admin pulumi up
    
  8. Verify Deployment:
    • Check AWS Console in the account you’re working in for deployed resources
    • Test functionality
    • Monitor logs and metrics (ECS tasks for API are a good place to start, for example)
  9. Document the Changes:
    • Create necessary tickets…
    • Update documentation
    • Create a PR to formalize the changes once the incident has passed.
  10. Testing notes:
    • When testing GitHub Actions changes, use a test branch such as workflow-testing (or one of your choosing).

Managing Secrets Manually

When working with secrets manually, use the ESC CLI:
# View existing secrets (will not show values)
esc env open unkey/api/aws-sandbox-us-east-1 -f yaml

# Set a new secret
esc env set --secret unkey/api/aws-sandbox-global pulumiConfig.api:myNewSecret "secretvalue"

# Update an existing secret
esc env set --secret unkey/api/aws-sandbox-global pulumiConfig.api:existingSecret "newsecretvalue"

Common Manual Operations

  1. Emergency Resource Updates: If you can’t wait for CI/CD, you can run locally, but ONLY use this in extreme cases:
    pulumi up --skip-preview -y
    

Best Practices for Manual Deployments

  1. Communicate Changes: Notify @imeyer and @chronark before and after manual deployments
  2. Document Everything: Record all manual changes in appropriate tickets or documentation
  3. Transfer to CI/CD: Move manual changes to the CI/CD pipeline as soon as possible
  4. Limit Scope: Make the smallest possible change needed to resolve the issue
  5. Test First: Always test changes in sandbox before applying to production
  6. Note how often none of this is followed: Ahem.

Common Workflows

Adding a New Secret

# Set the secret in the global environment
esc env set --secret unkey/api/canary-global pulumiConfig.api:newSecret "secretvalue"

# Access the secret in code
newSecret := config.RequireSecret("newSecret")

Working with Multiple AWS Accounts through automation

The infrastructure is designed to support multiple AWS accounts through role assumption:
  1. GitHub Actions assumes the GitHubActionsOIDCRole in the management account
  2. The Pulumi code assumes the UnkeyPulumiAWSExecutor role in the target account
  3. Project resources are created in the target account

Troubleshooting

If you encounter issues:
  1. Check the GitHub Actions workflow logs
  2. Verify the role assumption chain is working correctly (TODO)
  3. Ensure the UnkeyPulumiAWSExecutor role has the necessary permissions
Number 3 is the most likely cause right now as the infrastructure grows.