# Unkey Infrastructure as Code Documentation

> Pulumi AWS infrastructure architecture and workflows.

## Table of Contents

<details>
  <summary>Expand for menu...</summary>

  * [Unkey Infrastructure as Code Documentation](#unkey-infrastructure-as-code-documentation)
    * [Table of Contents](#table-of-contents)
    * [Overview](#overview)
    * [Architecture](#architecture)
    * [Authentication & Authorization Flow](#authentication--authorization-flow)
      * [GitHub OIDC Authentication Explained](#github-oidc-authentication-explained)
    * [Setup & Configuration](#setup--configuration)
      * [GitHub OIDC Configuration](#github-oidc-configuration)
      * [Cross-Account Access Policy](#cross-account-access-policy)
      * [Executor Role Trust Policy](#executor-role-trust-policy)
      * [Pulumi Permissions Policy](#pulumi-permissions-policy)
    * [Pulumi Environment, Secrets and Configuration (ESC)](#pulumi-environment-secrets-and-configuration-esc)
      * [ESC Configuration items](#esc-configuration-items)
        * [Required Configuration Items](#required-configuration-items)
        * [Required Secret Configuration Items](#required-secret-configuration-items)
      * [Environment-Specific Items from ESC](#environment-specific-items-from-esc)
        * [Global Environment (e.g., unkey/api/canary-global)](#global-environment-eg-unkeyapicanary-global)
        * [Regional Environment (e.g., unkey/api/aws-canary-us-east-1)](#regional-environment-eg-unkeyapiaws-canary-us-east-1)
    * [Configuration for Database Password Resources](#configuration-for-database-password-resources)
    * [Example Stack Configuration](#example-stack-configuration)
    * [Environment Configurations](#environment-configurations)
      * [Working with Stacks and Environments](#working-with-stacks-and-environments)
      * [Setting Secrets](#setting-secrets)
      * [Accessing Configuration in Code](#accessing-configuration-in-code)
    * [Role Assumption in Pulumi Code](#role-assumption-in-pulumi-code)
    * [Deployed Resources](#deployed-resources)
    * [Making Changes to Infrastructure](#making-changes-to-infrastructure)
      * [Automated Deployment via GitHub Actions](#automated-deployment-via-github-actions)
        * [Prerequisites](#prerequisites)
        * [Workflow](#workflow)
        * [Troubleshooting CI/CD Deployments](#troubleshooting-cicd-deployments)
      * [Manual Deployment by us hoomans](#manual-deployment-by-us-hoomans)
        * [Prerequisites](#prerequisites-1)
        * [Workflow](#workflow-1)
      * [Managing Secrets Manually](#managing-secrets-manually)
        * [Common Manual Operations](#common-manual-operations)
        * [Best Practices for Manual Deployments](#best-practices-for-manual-deployments)
    * [Common Workflows](#common-workflows)
      * [Adding a New Secret](#adding-a-new-secret)
      * [Working with Multiple AWS Accounts through automation](#working-with-multiple-aws-accounts-through-automation)
      * [Troubleshooting](#troubleshooting)
</details>

## Overview

This documentation provides a comprehensive guide to Unkey's infrastructure setup using Pulumi for AWS deployments. The infrastructure is designed with a multi-account architecture, leveraging GitHub Actions for CI/CD, and implements cross-account role assumption for secure and scalable infrastructure management.

## Architecture

Unkey's infrastructure is organized into multiple AWS accounts:

* **Management/Root account**: Central account for authentication and cross-account access management
* **Sandbox account**: Development environment
* **Canary account**: Testing environment
* **Production account**: Production environment

Each account has a dedicated role (`UnkeyPulumiAWSExecutor`) that Pulumi can assume to deploy resources. GitHub Actions workflows use OIDC authentication to assume a role in the management account, which then assumes the executor roles in target accounts.

## Authentication & Authorization Flow

<details>
  <summary>Expand for all the gory details...</summary>

  ### GitHub OIDC Authentication Explained

  OpenID Connect (OIDC) integration between GitHub Actions and AWS works as follows:

  1. **Token-based Authentication**: Instead of storing long-lived AWS credentials as secrets in GitHub, GitHub Actions generates a short-lived OIDC token during workflow runs.

  2. **Trust Relationship**: AWS is configured to trust GitHub Actions as an identity provider. The trust is established through:
     * An OIDC provider in AWS IAM that points to `token.actions.githubusercontent.com`
     * A role (`GitHubActionsOIDCRole`) with a trust policy that validates the OIDC token

  3. **Conditional Access**: The trust policy includes conditions that verify:
     * The token audience (`aud`) is `sts.amazonaws.com`
     * The token subject (`sub`) matches `repo:unkeyed/infra:*`, meaning it came from workflows in the specified repository

  4. **Role Assumption**: When the GitHub Actions workflow runs, it:
     * Requests an OIDC token from GitHub's token issuer
     * Sends this token to AWS STS (Security Token Service) using the `AssumeRoleWithWebIdentity` API
     * If the token is valid and meets the conditions, AWS returns temporary credentials

  This approach eliminates the need for storing AWS access keys, enhances security by using short-lived credentials, and simplifies credential management.

  The full authentication flow is:

  1. GitHub Actions authenticates via OIDC to AWS
  2. GitHub Actions assumes the GitHubActionsOIDCRole in the Management Account
  3. The GitHubActionsOIDCRole assumes the UnkeyPulumiAWSExecutor role in the Target Account
  4. Pulumi uses the assumed role to deploy AWS resources in the Target Account
</details>
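
The two trust-policy conditions described in this section can be sketched in plain Go. This is only an illustration of how `StringEquals` and `StringLike` behave, not how AWS implements them; `wildcardMatch` and `tokenAllowed` are hypothetical helper names, and the audience and subject pattern are copied from the policy documented here.

```go
package main

import "fmt"

// Values taken from the trust policy conditions documented in this section.
const (
	expectedAudience = "sts.amazonaws.com"
	subjectPattern   = "repo:unkeyed/infra:*"
)

// wildcardMatch reports whether s matches pattern, where '*' matches any run
// of characters (including none) and '?' matches exactly one character.
// It approximates the wildcard semantics of IAM's StringLike operator.
func wildcardMatch(pattern, s string) bool {
	p, i := 0, 0
	star, mark := -1, 0
	for i < len(s) {
		switch {
		case p < len(pattern) && pattern[p] == '*':
			// Remember the star position so we can backtrack to it later.
			star, mark = p, i
			p++
		case p < len(pattern) && (pattern[p] == '?' || pattern[p] == s[i]):
			p++
			i++
		case star >= 0:
			// Let the last '*' absorb one more character and retry.
			mark++
			p, i = star+1, mark
		default:
			return false
		}
	}
	// Any trailing stars match the empty string.
	for p < len(pattern) && pattern[p] == '*' {
		p++
	}
	return p == len(pattern)
}

// tokenAllowed (hypothetical) applies both conditions: an exact audience
// match and a wildcard subject match.
func tokenAllowed(aud, sub string) bool {
	return aud == expectedAudience && wildcardMatch(subjectPattern, sub)
}

func main() {
	fmt.Println(tokenAllowed("sts.amazonaws.com", "repo:unkeyed/infra:ref:refs/heads/main"))
	fmt.Println(tokenAllowed("sts.amazonaws.com", "repo:someone/fork:ref:refs/heads/main"))
}
```

The first call models a workflow in `unkeyed/infra` and is allowed; the second models a token from another repository and is rejected because the subject does not match the pattern.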

## Setup & Configuration

### GitHub OIDC Configuration

<details>
  <summary>Expand for all the gory details...</summary>

  The management account is configured with an OIDC provider for GitHub Actions, allowing secure authentication without storing long-lived credentials:

  <details>
    <summary>Example policy</summary>

    ```json theme={"theme":"kanagawa-wave"}
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Federated": "arn:aws:iam::333769656712:oidc-provider/token.actions.githubusercontent.com"
          },
          "Action": "sts:AssumeRoleWithWebIdentity",
          "Condition": {
            "StringEquals": {
              "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
            },
            "StringLike": {
              "token.actions.githubusercontent.com:sub": "repo:unkeyed/infra:*"
            }
          }
        }
      ]
    }
    ```
  </details>

  ### Cross-Account Access Policy

  The `CrossAccountAssumeRole` policy attached to the `GitHubActionsOIDCRole` allows it to assume the `UnkeyPulumiAWSExecutor` role in target accounts:

  <details>
    <summary>Example policy</summary>

    ```json theme={"theme":"kanagawa-wave"}
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "sts:AssumeRole",
          "Resource": [
            "arn:aws:iam::343218208612:role/UnkeyPulumiAWSExecutor",
            "arn:aws:iam::920373003756:role/UnkeyPulumiAWSExecutor",
            "arn:aws:iam::222634365038:role/UnkeyPulumiAWSExecutor"
          ]
        },
        {
          "Effect": "Allow",
          "Action": ["ec2:DescribeAvailabilityZones", "ec2:DescribeRegions"],
          "Resource": "*"
        }
      ]
    }
    ```
  </details>

  ### Executor Role Trust Policy

  The `UnkeyPulumiAWSExecutor` role in each target account has a trust policy allowing assumption by the `GitHubActionsOIDCRole` from the management account and the Administrator role in the target account:

  <details>
    <summary>Example policy</summary>

    ```json theme={"theme":"kanagawa-wave"}
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": [
              "arn:aws:iam::222634365038:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_AdministratorAccess_231cb3ec4a4e5945",
              "arn:aws:iam::333769656712:role/GitHubActionsOIDCRole"
            ]
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
    ```
  </details>

  ### Pulumi Permissions Policy

  The `UnkeyPulumiPolicy` attached to the `UnkeyPulumiAWSExecutor` role grants permissions to manage AWS resources:

  <details>
    <summary>UnkeyPulumiPolicy (Click to expand)</summary>

    ```json theme={"theme":"kanagawa-wave"}
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "cloudformation:*",
            "cloudwatch:*",
            "ec2:*",
            "ecr:*",
            "ecs:*",
            "elasticache:*",
            "elasticloadbalancing:*",
            "globalaccelerator:*",
            "iam:AttachRolePolicy",
            "iam:CreateRole",
            "iam:DeleteRole",
            "iam:DeleteRolePolicy",
            "iam:DetachRolePolicy",
            "iam:GetRole",
            "iam:GetRolePolicy",
            "iam:PassRole",
            "iam:PutRolePolicy",
            "iam:ListRolePolicies",
            "iam:ListAttachedRolePolicies",
            "iam:ListInstanceProfilesForRole",
            "kms:*",
            "logs:*",
            "route53:*",
            "ssm:*"
          ],
          "Resource": "*"
        }
      ]
    }
    ```
  </details>
</details>

## Pulumi Environment, Secrets and Configuration (ESC)

Unkey uses Pulumi ESC to manage configuration and secrets across environments. The naming convention for stacks and environments follows a pattern:

```
unkey/<project>/<cloud>-<cloud_account_name>-<cloud_region>
```

Example:

* Project: `api`
* Cloud: `aws`
* AWS Account: `canary`
* Region: `us-east-1`
* Stack name: `unkey/api/aws-canary-us-east-1`
* Environment: `unkey/api/aws-canary-us-east-1`

There is also a global environment for each AWS account to hold configuration items that don't change across regions: `unkey/api/canary-global`.
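
The naming convention can be expressed as two small helpers. This is a minimal sketch: `stackName` and `globalEnvName` are hypothetical names, and the global pattern `unkey/<project>/<account>-global` is inferred from the `unkey/api/canary-global` example above.

```go
package main

import "fmt"

// stackName builds unkey/<project>/<cloud>-<cloud_account_name>-<cloud_region>.
// The same string is used for the stack and its regional ESC environment.
func stackName(project, cloud, account, region string) string {
	return fmt.Sprintf("unkey/%s/%s-%s-%s", project, cloud, account, region)
}

// globalEnvName builds the per-account global environment name.
// The pattern is an assumption inferred from unkey/api/canary-global.
func globalEnvName(project, account string) string {
	return fmt.Sprintf("unkey/%s/%s-global", project, account)
}

func main() {
	fmt.Println(stackName("api", "aws", "canary", "us-east-1"))
	fmt.Println(globalEnvName("api", "canary"))
}
```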

### ESC Configuration items

The following configuration items are explicitly required by the Pulumi code. They are retrieved using `config.Require()`, which panics if the value is not set.

#### Required Configuration Items

| Config Key          | Description                                   | Usage                                                               |
| ------------------- | --------------------------------------------- | ------------------------------------------------------------------- |
| `roleToAssumeARN`   | ARN of the AWS role to assume for deployments | Used to create the privileged AWS provider for cross-account access |
| `cidrBlock`         | CIDR block for the VPC                        | Defines the IP address range for the VPC                            |
| `hostedZoneID`      | Route53 hosted zone ID                        | Used for DNS record creation for certificate validation             |
| `certificateDomain` | Domain name for the SSL certificate           | Used to create and validate the ACM certificate                     |
| `awsRegion`         | AWS region for deployment                     | Used in container environment variables                             |
| `clickhouseUrl`     | URL for ClickHouse database                   | Passed as environment variable to containers                        |

#### Required Secret Configuration Items

These items are retrieved using `config.RequireSecret()` which will also panic if not found:

| Secret Config Key             | Description                         | Usage                                            |
| ----------------------------- | ----------------------------------- | ------------------------------------------------ |
| `clickhouseUrl`               | URL for the ClickHouse database     | Used as an environment variable in the container |
| `OTEL_EXPORTER_OTLP_HEADERS`  | Headers for OpenTelemetry exporter  | Used as an environment variable in the container |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Endpoint for OpenTelemetry exporter | Used as an environment variable in the container |
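
Because a missing key only surfaces as a panic at deploy time, it can help to check a resolved configuration up front. The sketch below is not part of the Pulumi code; `requiredKeys` simply lists the keys from the two tables above (with `clickhouseUrl` listed once), and `missingKeys` is a hypothetical helper operating on a plain map.

```go
package main

import "fmt"

// requiredKeys mirrors the required and required-secret tables above.
var requiredKeys = []string{
	"roleToAssumeARN",
	"cidrBlock",
	"hostedZoneID",
	"certificateDomain",
	"awsRegion",
	"clickhouseUrl",
	"OTEL_EXPORTER_OTLP_HEADERS",
	"OTEL_EXPORTER_OTLP_ENDPOINT",
}

// missingKeys returns the required keys that are absent from values,
// in the same order as requiredKeys.
func missingKeys(values map[string]string) []string {
	var missing []string
	for _, key := range requiredKeys {
		if _, ok := values[key]; !ok {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {
	// Example values only; none of these are real settings.
	values := map[string]string{
		"roleToAssumeARN": "roleARNHere",
		"cidrBlock":       "10.0.0.0/16",
	}
	fmt.Println(missingKeys(values))
}
```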

### Environment-Specific Items from ESC

These configuration items are set in Pulumi ESC environments and imported into stacks:

#### Global Environment (e.g., `unkey/api/canary-global`)

| Config Key           | Description                                |
| -------------------- | ------------------------------------------ |
| `databasePrimaryDsn` | Connection string for the primary database |

#### Regional Environment (e.g., `unkey/api/aws-canary-us-east-1`)

| Config Key    | Description                                   |
| ------------- | --------------------------------------------- |
| `aws:profile` | AWS profile name (e.g., `unkey-canary-admin`) |
| `aws:region`  | AWS region (e.g., `us-east-1`)                |

## Configuration for Database Password Resources

The code also creates database password resources with these parameters:

* Database: `"unkey"`
* Branch: `"main"`
* Name: Dynamically generated based on project and stack
* Replica: Determined by stack name (false for canary stacks)
* Role: `"readwriter"` for primary, `"reader"` for replica
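
The parameters above could be assembled roughly as follows. This is a sketch under stated assumptions, not the actual resource code: the `passwordParams` type, the `<project>-<stack>-<primary|replica>` name format, and the rule that every non-canary stack uses a replica are all hypothetical; only the database, branch, and role values come from the list above.

```go
package main

import (
	"fmt"
	"strings"
)

// passwordParams (hypothetical) groups the parameters listed above.
type passwordParams struct {
	Database string
	Branch   string
	Name     string
	Replica  bool
	Role     string
}

// usesReplica assumes that only canary stacks skip the replica.
func usesReplica(stack string) bool {
	return !strings.Contains(stack, "canary")
}

// newPasswordParams fills in the fixed values and picks the role:
// "readwriter" for the primary, "reader" for a replica.
func newPasswordParams(project, stack string, replica bool) passwordParams {
	role, suffix := "readwriter", "primary"
	if replica {
		role, suffix = "reader", "replica"
	}
	return passwordParams{
		Database: "unkey",
		Branch:   "main",
		// The name format is an assumption made for illustration.
		Name:    fmt.Sprintf("%s-%s-%s", project, stack, suffix),
		Replica: replica,
		Role:    role,
	}
}

func main() {
	stack := "aws-canary-us-east-1"
	fmt.Printf("%+v\n", newPasswordParams("api", stack, false))
	fmt.Println("replica wanted:", usesReplica(stack))
}
```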

## Example Stack Configuration

Here's an example of what should be in a complete stack configuration file (`Pulumi.aws-canary-us-east-1.yaml`):

```yaml theme={"theme":"kanagawa-wave"}
imports:
  - api/canary-global

environment:
  - api/aws-canary-us-east-1

values:
  pulumiConfig:
```

## Environment Configurations

In the global environment:

<details>
  <summary>Global Environment Configuration Commands (Click to expand)</summary>

  ```bash theme={"theme":"kanagawa-wave"}
  esc env set --secret unkey/api/canary-global pulumiConfig.api:databasePrimaryDsn "yourDsnHere"
  esc env set --secret unkey/api/canary-global pulumiConfig.api:clickhouseUrl "yourClickhouseUrlHere"
  esc env set --secret unkey/api/canary-global pulumiConfig.api:OTEL_EXPORTER_OTLP_HEADERS "yourHeadersHere"
  esc env set --secret unkey/api/canary-global pulumiConfig.api:OTEL_EXPORTER_OTLP_ENDPOINT "yourEndpointHere"
  esc env set unkey/api/canary-global pulumiConfig.api:roleToAssumeARN "roleARNHere"
  ```
</details>

In the regional environment:

```bash theme={"theme":"kanagawa-wave"}
esc env set unkey/api/aws-canary-us-east-1 pulumiConfig.aws:profile unkey-canary-admin
esc env set unkey/api/aws-canary-us-east-1 pulumiConfig.aws:region us-east-1
```

### Working with Stacks and Environments

```bash theme={"theme":"kanagawa-wave"}
# Create a stack
pulumi stack init unkey/api/aws-canary-us-east-1

# Create the global environment
esc env init unkey/api/canary-global

# Create the region environment
esc env init unkey/api/aws-canary-us-east-1
```

### Setting Secrets

Secrets are managed using ESC:

```bash theme={"theme":"kanagawa-wave"}
esc env set --secret unkey/api/canary-global pulumiConfig.api:databasePrimaryDsn "thesecretgoeshere"
```

### Accessing Configuration in Code

In Go code, configuration values are accessed using the Pulumi config system:

```go theme={"theme":"kanagawa-wave"}
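package main

import (
  "github.com/pulumi/pulumi/sdk/v3/go/pulumi"
  "github.com/pulumi/pulumi/sdk/v3/go/pulumi/config"
)

func main() {
  pulumi.Run(func(ctx *pulumi.Context) error {
    // Initialize configuration; "" selects the current project's namespace
    config := config.New(ctx, "")

    // Access regular config (panics if the key is missing)
    cidrBlock := config.Require("cidrBlock")

    // Access secrets (returned as a secret output, never as plain text)
    databasePrimaryDsn := config.RequireSecret("databasePrimaryDsn")

    // ... rest of code, which uses cidrBlock and databasePrimaryDsn
    return nil
  })
}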
```

## Role Assumption in Pulumi Code

The Pulumi code uses role assumption to obtain the necessary permissions in the target AWS account:

```go theme={"theme":"kanagawa-wave"}
roleToAssumeARN := config.Require("roleToAssumeARN")

provider, err := aws.NewProvider(ctx, "privileged", &aws.ProviderArgs{
  AssumeRole: &aws.ProviderAssumeRoleArgs{
    RoleArn:     pulumi.StringPtr(roleToAssumeARN),
    SessionName: pulumi.String("NameYourSession"),
    ExternalId:  pulumi.String("SomeNameIsUsefulHere"),
  },
  // region is assumed to be defined earlier in the program
  Region: pulumi.String(region),
})
if err != nil {
  return err
}
```
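
Since a wrong `roleToAssumeARN` only fails once the provider tries to assume the role, a cheap format check before creating the provider can catch typos early. The sketch below is an optional illustration, not part of the existing code; `executorAccountID` is a hypothetical helper, and it assumes the ARN always names the `UnkeyPulumiAWSExecutor` role in a 12-digit account.

```go
package main

import (
	"fmt"
	"regexp"
)

// executorARN matches arn:aws:iam::<12-digit account>:role/UnkeyPulumiAWSExecutor.
var executorARN = regexp.MustCompile(`^arn:aws:iam::(\d{12}):role/UnkeyPulumiAWSExecutor$`)

// executorAccountID returns the target account ID embedded in the ARN,
// or an error if the ARN does not name the executor role.
func executorAccountID(arn string) (string, error) {
	m := executorARN.FindStringSubmatch(arn)
	if m == nil {
		return "", fmt.Errorf("not an UnkeyPulumiAWSExecutor role ARN: %q", arn)
	}
	return m[1], nil
}

func main() {
	id, err := executorAccountID("arn:aws:iam::222634365038:role/UnkeyPulumiAWSExecutor")
	fmt.Println(id, err)
}
```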

## Deployed Resources

The Pulumi code deploys several AWS resources:

1. **VPC and Networking**:
   * VPC with public subnets
   * Security groups for ALB, Fargate tasks, and Redis

2. **Serverless Redis** (Valkey):
   * ElastiCache Serverless Cache with Valkey engine

3. **Load Balancing**:
   * Application Load Balancer
   * Target groups and listeners for HTTP/HTTPS traffic
   * SSL/TLS certificate with DNS validation

4. **ECS Fargate Service**:
   * ECS Cluster
   * Fargate service with task definition
   * Container configuration with environment variables

5. **Secrets Management**:
   * Database credentials for primary and replica databases

## Making Changes to Infrastructure

There are two primary methods for deploying infrastructure changes: automated deployment through GitHub Actions and manual deployment by human operators. Each approach has specific workflows and considerations.

### Automated Deployment via GitHub Actions

GitHub Actions is the primary method for deploying infrastructure changes to all environments. This approach provides consistency and auditability, and reduces the risk of human error.

#### Prerequisites

1. **GitHub Repository Access**: Ensure you have appropriate access to the `unkeyed/infra` repository.
2. **Pull Request Process**: All changes should follow the standard PR review process.

#### Workflow

1. **Create a Feature Branch**:
   ```bash theme={"theme":"kanagawa-wave"}
   git checkout -b feature/my-infrastructure-change
   ```

2. **Make Your Changes**:
   * Update Pulumi code in Go files
   * Modify stack configuration in `Pulumi.*.yaml` files

3. **Test Locally** (if possible):
   ```bash theme={"theme":"kanagawa-wave"}
   # Preview changes without applying them
   pulumi preview --stack unkey/api/aws-sandbox-us-east-1
   ```

4. **Commit and Push Changes**:

   You should have git commit signing enabled, and use it!

   ```bash theme={"theme":"kanagawa-wave"}
   git add .
   # Follow Conventional Commits for the commit message
   git commit -S -s -m "fix: description of infrastructure changes"
   git push origin feature/my-infrastructure-change
   ```

5. **Create Pull Request**:
   * Open a PR against the main branch
   * Include a detailed description of the changes
   * Request reviews from appropriate team members

6. **CI/CD Pipeline Execution**:
   * Consider whether the change needs to be coordinated across stacks
   * GitHub Actions will automatically run the Pulumi workflow
   * The workflow will:
     * Authenticate to AWS using OIDC
     * Assume the necessary roles
     * Deploy stacks when either of the following changes:
       * The GitHub Actions workflow file
       * Any code in the project that the workflow covers

7. **Deployment Order**:
   * Changes are typically deployed to sandbox first
   * Once verified, they're deployed to canary
   * Finally, they're deployed to production

8. **Monitor Deployments**:
   * Check GitHub Actions logs for deployment status
   * Verify resources in AWS Console
   * Check application functionality
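
The deployment order in step 7 amounts to a simple promotion loop that stops at the first failure. The sketch below only illustrates that ordering; it is not how the GitHub Actions workflow is implemented, and `promote`, `promotionOrder`, and `errDemo` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// promotionOrder follows the order described above.
var promotionOrder = []string{"sandbox", "canary", "production"}

// errDemo is a placeholder failure used by the example in main.
var errDemo = errors.New("health check failed")

// promote calls deploy for each account in order and stops at the first
// error, returning the accounts that were deployed successfully.
func promote(deploy func(account string) error) ([]string, error) {
	var done []string
	for _, account := range promotionOrder {
		if err := deploy(account); err != nil {
			return done, fmt.Errorf("deploy to %s failed: %w", account, err)
		}
		done = append(done, account)
	}
	return done, nil
}

func main() {
	// Simulate a failure in canary: production must not be touched.
	done, err := promote(func(account string) error {
		if account == "canary" {
			return errDemo
		}
		return nil
	})
	fmt.Println(done, err)
}
```

In this example only sandbox is reported as deployed, and the returned error names canary as the failing stage.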

#### Troubleshooting CI/CD Deployments

* **Authentication Issues**: Verify the OIDC trust relationship is correctly configured
* **Permission Errors**: Check that the assumed roles have the necessary policies
* **Failed Deployments**: Review the GitHub Actions logs for specific error messages

### Manual Deployment by us hoomans

In some scenarios, you may need to deploy changes manually. This approach is typically used for emergency fixes or when testing new infrastructure components.

#### Prerequisites

1. **AWS CLI and Credentials**:
   ```bash theme={"theme":"kanagawa-wave"}
   # Install AWS CLI
   brew install awscli

   # Configure credentials
   aws configure
   ```

2. **Pulumi and ESC CLI**:
   ```bash theme={"theme":"kanagawa-wave"}
   brew update && brew install pulumi/tap/esc pulumi/tap/pulumi
   ```

3. **AWS SSO Access**: Ensure you have SSO access to the relevant AWS accounts with Administrator permissions.

#### Workflow

1. **Clone the Repository**:
   ```bash theme={"theme":"kanagawa-wave"}
   git clone https://github.com/unkeyed/infra.git
   cd infra
   ```

2. **Switch to the Appropriate Branch**:
   ```bash theme={"theme":"kanagawa-wave"}
   git checkout main  # Or feature branch if testing
   ```

3. **Login to AWS SSO**:
   ```bash theme={"theme":"kanagawa-wave"}
   # Login to the relevant account
   aws sso login --profile unkey-sandbox-admin
   ```

4. **Set Up Pulumi Stack**:
   ```bash theme={"theme":"kanagawa-wave"}
   # Select the appropriate stack
   AWS_PROFILE=unkey-sandbox-admin pulumi stack select unkey/api/aws-sandbox-us-east-1
   ```

5. **Configure Role Assumption**:
   * The `roleToAssumeARN` config is set in the project's global ESC environment (e.g., `unkey/api/canary-global`).
   * Your SSO role gives you the ability to assume the `UnkeyPulumiAWSExecutor` role.

6. **Preview Changes**:
   ```bash theme={"theme":"kanagawa-wave"}
   AWS_PROFILE=unkey-sandbox-admin pulumi preview
   ```

7. **Apply Changes**:
   ```bash theme={"theme":"kanagawa-wave"}
   AWS_PROFILE=unkey-sandbox-admin pulumi up
   ```

8. **Verify Deployment**:
   * Check [AWS Console](https://unkey.awsapps.com/start/) in the account you're working in for deployed resources
   * Test functionality
   * Monitor logs and metrics (ECS tasks for API are a good place to start, for example)

9. **Document the Changes**:
   * Create necessary tickets...
   * Update documentation
   * Create a PR to formalize the changes once the incident has passed.

10. **Thoughts for testing**:
    * When testing GitHub Actions changes, use a test branch like `workflow-testing` or one of your choosing.

### Managing Secrets Manually

When working with secrets manually, use the ESC CLI:

```bash theme={"theme":"kanagawa-wave"}
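# View the environment definition (secret values are redacted)
esc env get unkey/api/aws-sandbox-us-east-1

# Resolve the environment, including plaintext secret values
esc env open unkey/api/aws-sandbox-us-east-1 -f yaml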

# Set a new secret
esc env set --secret unkey/api/aws-sandbox-global pulumiConfig.api:myNewSecret "secretvalue"

# Update an existing secret
esc env set --secret unkey/api/aws-sandbox-global pulumiConfig.api:existingSecret "newsecretvalue"
```

#### Common Manual Operations

1. **Emergency Resource Updates**:

   If you can't wait for the pipeline, you can run the deployment locally, but ONLY do this in extreme cases.

   ```bash theme={"theme":"kanagawa-wave"}
   pulumi up --skip-preview -y
   ```

#### Best Practices for Manual Deployments

1. **Communicate Changes**: Notify @imeyer and @chronark before and after manual deployments
2. **Document Everything**: Record all manual changes in appropriate tickets or documentation
3. **Transfer to CI/CD**: Move manual changes to the CI/CD pipeline as soon as possible
4. **Limit Scope**: Make the smallest possible change needed to resolve the issue
5. **Test First**: Always test changes in sandbox before applying to production
6. **Note how often none of this is followed**: Ahem.

## Common Workflows

### Adding a New Secret

```bash theme={"theme":"kanagawa-wave"}
# Set the secret in the global environment
esc env set --secret unkey/api/canary-global pulumiConfig.api:newSecret "secretvalue"

# Then access the secret in the Go code (this is not a shell command):
#   newSecret := config.RequireSecret("newSecret")
```

### Working with Multiple AWS Accounts through automation

The infrastructure is designed to support multiple AWS accounts through role assumption:

1. GitHub Actions assumes the `GitHubActionsOIDCRole` in the management account
2. The Pulumi code assumes the `UnkeyPulumiAWSExecutor` role in the target account
3. Project resources are created in the target account

### Troubleshooting

If you encounter issues:

1. Check the GitHub Actions workflow logs
2. Verify the role assumption chain is working correctly (TODO)
3. Ensure the `UnkeyPulumiAWSExecutor` role has the necessary permissions

Item 3 is the most likely cause right now, since the policy needs new permissions as the infrastructure grows.
