Overview
This documentation provides a comprehensive guide to Unkey's infrastructure setup using Pulumi for AWS deployments. The infrastructure is designed with a multi-account architecture, leveraging GitHub Actions for CI/CD, and implements cross-account role assumption for secure and scalable infrastructure management.

Architecture
Unkey's infrastructure is organized into multiple AWS accounts:
- Management/Root account: Central account for authentication and cross-account access management
- Sandbox account: Development environment
- Canary account: Testing environment
- Production account: Production environment
Each target account exposes an IAM role (`UnkeyPulumiAWSExecutor`) that Pulumi can assume to deploy resources. GitHub Actions workflows use OIDC authentication to assume a role in the management account, which then assumes the executor roles in target accounts.
Authentication & Authorization Flow
Setup & Configuration
GitHub OIDC Configuration
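The workflow-side half of the OIDC setup typically looks like the following. This is a hedged sketch, not Unkey's actual workflow: the file name, job layout, and account ID are illustrative, while the role name comes from the role-assumption chain described in this document.

```yaml
# .github/workflows/deploy.yml (illustrative)
permissions:
  id-token: write   # required so GitHub can issue an OIDC token to the job
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Authenticate to the management account via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111111111111:role/GitHubActionsOIDCRole
          aws-region: us-east-1
```

The corresponding IAM side is an OIDC identity provider for `token.actions.githubusercontent.com` in the management account, with a trust policy scoped to the repository.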
Pulumi Environment, Secrets and Configuration (ESC)
Unkey uses Pulumi ESC to manage configuration and secrets across environments. The naming convention for stacks and environments follows a pattern:
- Project: `api`
- Cloud: `aws`
- AWS Account: `canary`
- Region: `us-east-1`
- Stack name: `unkey/api/aws-canary-us-east-1`
- Environment: `unkey/api/aws-canary-us-east-1`

Global, region-independent configuration lives in a separate environment, e.g. `unkey/api/canary-global`.
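The naming convention can be expressed as a tiny helper. This is a hypothetical function for illustration, not code from the actual repository:

```go
package main

import "fmt"

// buildStackName assembles a stack name following the convention
// <org>/<project>/<cloud>-<account>-<region>.
func buildStackName(org, project, cloud, account, region string) string {
	return fmt.Sprintf("%s/%s/%s-%s-%s", org, project, cloud, account, region)
}

func main() {
	fmt.Println(buildStackName("unkey", "api", "aws", "canary", "us-east-1"))
	// → unkey/api/aws-canary-us-east-1
}
```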
ESC Configuration items
The following configuration items are explicitly required in the Pulumi code and will cause a panic if not set. They are retrieved using `config.Require()`, which panics if the value is not found.
Required Configuration Items
| Config Key | Description | Usage |
|---|---|---|
| `roleToAssumeARN` | ARN of the AWS role to assume for deployments | Used to create the privileged AWS provider for cross-account access |
| `cidrBlock` | CIDR block for the VPC | Defines the IP address range for the VPC |
| `hostedZoneID` | Route53 hosted zone ID | Used for DNS record creation for certificate validation |
| `certificateDomain` | Domain name for the SSL certificate | Used to create and validate the ACM certificate |
| `awsRegion` | AWS region for deployment | Used in container environment variables |
| `clickhouseUrl` | URL for the ClickHouse database | Passed as an environment variable to containers |
Required Secret Configuration Items
These items are retrieved using `config.RequireSecret()`, which will also panic if not found:
| Secret Config Key | Description | Usage |
|---|---|---|
| `clickhouseUrl` | URL for the ClickHouse database | Used as an environment variable in the container |
| `OTEL_EXPORTER_OTLP_HEADERS` | Headers for the OpenTelemetry exporter | Used as an environment variable in the container |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Endpoint for the OpenTelemetry exporter | Used as an environment variable in the container |
Environment-Specific Items from ESC
These configuration items would be set in Pulumi ESC environments and imported into stacks:

Global Environment (e.g., unkey/api/canary-global)
| Config Key | Description |
|---|---|
| `databasePrimaryDsn` | Connection string for the primary database |
Regional Environment (e.g., unkey/api/aws-canary-us-east-1)
| Config Key | Description |
|---|---|
| `aws:profile` | AWS profile name (e.g., `unkey-canary-admin`) |
| `aws:region` | AWS region (e.g., `us-east-1`) |
Configuration for Database Password Resources
The code also creates database password resources with these parameters:
- Database: `"unkey"`
- Branch: `"main"`
- Name: Dynamically generated based on project and stack
- Replica: Determined by stack name (false for canary stacks)
- Role: `"readwriter"` for primary, `"reader"` for replica
Example Stack Configuration
Here’s an example of what should be in a complete stack configuration file (Pulumi.aws-canary-us-east-1.yaml):
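The file itself was not captured in this page. Based on the required keys listed above, a sketch might look like the following; every concrete value here is a placeholder, not Unkey's real configuration:

```yaml
# Pulumi.aws-canary-us-east-1.yaml (illustrative values)
environment:
  - unkey/api/aws-canary-us-east-1
config:
  api:roleToAssumeARN: arn:aws:iam::222222222222:role/UnkeyPulumiAWSExecutor
  api:cidrBlock: 10.0.0.0/16
  api:hostedZoneID: Z0000000000000
  api:certificateDomain: canary.example.com
  api:awsRegion: us-east-1
```

Secret values such as `clickhouseUrl` would come from the imported ESC environment rather than being stored in this file.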
Environment Configurations
The global environment holds region-independent values such as `databasePrimaryDsn`; the regional environment holds per-region values such as `aws:profile` and `aws:region`.

Working with Stacks and Environments
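The environment definitions themselves were not captured in this page. A hedged sketch of what the two might contain, using the keys from the tables above; the concrete values and the global-into-regional import are illustrative assumptions:

```yaml
# unkey/api/canary-global (illustrative)
values:
  pulumiConfig:
    databasePrimaryDsn:
      fn::secret: "mysql://user:password@host/unkey"
```

```yaml
# unkey/api/aws-canary-us-east-1 (illustrative)
imports:
  - unkey/api/canary-global
values:
  pulumiConfig:
    aws:profile: unkey-canary-admin
    aws:region: us-east-1
```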
Setting Secrets
Secrets are managed using ESC.

Accessing Configuration in Code
In Go code, configuration values are accessed using the Pulumi config system.

Role Assumption in Pulumi Code
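A minimal sketch combining both ideas: reading required config and secrets, then creating a privileged AWS provider that assumes the executor role. This is an illustration, not the actual project code; the SDK versions and the session name are assumptions.

```go
package main

import (
	"github.com/pulumi/pulumi-aws/sdk/v5/go/aws"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi/config"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		cfg := config.New(ctx, "")

		// Required values: Require panics if the key is unset.
		roleARN := cfg.Require("roleToAssumeARN")
		cidr := cfg.Require("cidrBlock")

		// Secrets: RequireSecret returns an Output marked as secret.
		clickhouseURL := cfg.RequireSecret("clickhouseUrl")

		// Privileged provider that assumes the executor role in the
		// target account; resources created with this provider run
		// under the assumed role's permissions.
		privileged, err := aws.NewProvider(ctx, "privileged", &aws.ProviderArgs{
			AssumeRole: &aws.ProviderAssumeRoleArgs{
				RoleArn:     pulumi.StringPtr(roleARN),
				SessionName: pulumi.StringPtr("pulumi-deploy"),
			},
		})
		if err != nil {
			return err
		}

		// Pass the provider to resources via pulumi.Provider(privileged).
		_, _, _ = privileged, cidr, clickhouseURL
		return nil
	})
}
```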
The Pulumi code uses role assumption to obtain the necessary permissions in the target AWS account.

Deployed Resources
The Pulumi code deploys several AWS resources:
- VPC and Networking:
  - VPC with public subnets
  - Security groups for the ALB, Fargate tasks, and Redis
- Serverless Redis (Valkey):
  - ElastiCache Serverless cache with the Valkey engine
- Load Balancing:
  - Application Load Balancer
  - Target groups and listeners for HTTP/HTTPS traffic
  - SSL/TLS certificate with DNS validation
- ECS Fargate Service:
  - ECS cluster
  - Fargate service with task definition
  - Container configuration with environment variables
- Secrets Management:
  - Database credentials for primary and replica databases
Making Changes to Infrastructure
There are two primary methods for deploying infrastructure changes: automated deployment through GitHub Actions and manual deployment by human operators. Each approach has specific workflows and considerations.

Automated Deployment via GitHub Actions
GitHub Actions is the primary method for deploying infrastructure changes to all environments. This approach provides consistency and auditability, and reduces the risk of human error.

Prerequisites
- GitHub Repository Access: Ensure you have appropriate access to the `unkeyed/infra` repository.
- Pull Request Process: All changes should follow the standard PR review process.
Workflow
- Create a Feature Branch:
- Make Your Changes:
  - Update Pulumi code in Go files
  - Modify stack configuration in `Pulumi.*.yaml` files
- Test Locally (if possible):
- Commit and Push Changes:
- Create a Pull Request:
  - Open a PR against the main branch
  - Include a detailed description of the changes
  - Request reviews from appropriate team members
- CI/CD Pipeline Execution:
  - Consider how to coordinate changes across stacks
  - GitHub Actions will automatically run the Pulumi workflow
  - The workflow will:
    - Authenticate to AWS using OIDC
    - Assume the necessary roles
    - Deploy a stack when either the GitHub Actions workflow file changes or any code in that workflow's project changes
- Deployment Order:
  - Changes are typically deployed to sandbox first
  - Once verified, they're deployed to canary
  - Finally, they're deployed to production
- Monitor Deployments:
  - Check GitHub Actions logs for deployment status
  - Verify resources in the AWS Console
  - Check application functionality
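The local side of the steps above might look like this; branch, commit message, and stack names are illustrative:

```shell
git checkout -b feat/my-infra-change
# ...edit Go files and Pulumi.*.yaml...
pulumi preview --stack unkey/api/aws-canary-us-east-1   # optional local check
git add -A
git commit -m "infra: describe the change"
git push origin feat/my-infra-change
```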
Troubleshooting CI/CD Deployments
- Authentication Issues: Verify the OIDC trust relationship is correctly configured
- Permission Errors: Check that the assumed roles have the necessary policies
- Failed Deployments: Review the GitHub Actions logs for specific error messages
Manual Deployment by us hoomans
In some scenarios, you may need to deploy changes manually. This approach is typically used for emergency fixes or when testing new infrastructure components.

Prerequisites
- AWS CLI and Credentials:
- Pulumi and ESC CLI:
- AWS SSO Access: Ensure you have SSO access to the relevant AWS accounts with Administrator permissions.
Workflow
- Clone the Repository:
- Switch to the Appropriate Branch:
- Login to AWS SSO:
- Set Up the Pulumi Stack:
- Configure Role Assumption:
  - The `roleToAssumeARN` config is set at the project's global stack.
  - Your SSO role gives you the ability to assume the `UnkeyPulumiAWSExecutor` role.
- Preview Changes:
- Apply Changes:
- Verify Deployment:
  - Check the AWS Console in the account you're working in for deployed resources
  - Test functionality
  - Monitor logs and metrics (ECS tasks for the API are a good place to start, for example)
- Document the Changes:
  - Create necessary tickets…
  - Update documentation
  - Create a PR to formalize the changes once the incident has passed.
- Thoughts for testing:
  - When testing GitHub Actions changes, use a test branch like `workflow-testing` or one of your choosing.
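The SSO login, stack selection, preview, and apply steps above might look like the following, using the profile and stack names from earlier sections; adapt them to your account and region:

```shell
# Log in to AWS SSO with the account's admin profile
aws sso login --profile unkey-canary-admin

# Select the stack for the target account/region
pulumi stack select unkey/api/aws-canary-us-east-1

# Preview, then apply
pulumi preview
pulumi up
```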
Managing Secrets Manually
When working with secrets manually, use the ESC CLI.

Common Manual Operations
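Commonly useful ESC CLI commands for inspecting an environment, using the naming convention from above:

```shell
# Show the environment definition
esc env get unkey/api/aws-canary-us-east-1

# Open the environment, resolving imports and secrets, to verify values
esc env open unkey/api/aws-canary-us-east-1
```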
- Emergency Resource Updates: If you can't wait around, you can run locally… but ONLY use this in extreme cases.
Best Practices for Manual Deployments
- Communicate Changes: Notify @imeyer and @chronark before and after manual deployments
- Document Everything: Record all manual changes in appropriate tickets or documentation
- Transfer to CI/CD: Move manual changes to the CI/CD pipeline as soon as possible
- Limit Scope: Make the smallest possible change needed to resolve the issue
- Test First: Always test changes in sandbox before applying to production
- Note how often none of this is followed: Ahem.
Common Workflows
Adding a New Secret
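A sketch of adding a new secret with the ESC CLI; the environment, key, and value here are illustrative:

```shell
# Store the secret in the appropriate ESC environment
esc env set unkey/api/canary-global \
  pulumiConfig.databasePrimaryDsn "mysql://user:password@host/unkey" --secret
```

Once set, the value is available to stacks that import the environment and can be read in Go with `cfg.RequireSecret("databasePrimaryDsn")`.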
Working with Multiple AWS Accounts through automation
The infrastructure is designed to support multiple AWS accounts through role assumption:
- GitHub Actions assumes the `GitHubActionsOIDCRole` in the management account
- The Pulumi code assumes the `UnkeyPulumiAWSExecutor` role in the target account
- Project resources are created in the target account
Troubleshooting
If you encounter issues:
- Check the GitHub Actions workflow logs
- Verify the role assumption chain is working correctly (TODO)
- Ensure the `UnkeyPulumiAWSExecutor` role has the necessary permissions

