Goals
- Achieve 100% dataplane availability independent of primary database
- Provide fast access to dynamic data across global regions
- Propagate data across the system quickly
- Minimize load on expensive storage
- Enable efficient cache invalidation
- Run on any cloud or on-premises
Options
1. Direct S3 + In-Memory Cache with SWR (stale-while-revalidate)
Pros
- Simple, straightforward design
- Low architectural complexity
Cons
- Cache invalidation requires either notifying every machine or very low TTLs (<10s)
- Inefficient cache refresh patterns
- High load on S3 due to concurrent SWR requests from multiple machines
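To make the trade-offs of Option 1 concrete, here is a minimal sketch of a per-machine stale-while-revalidate cache. The names `SWRCache`, `loader`, `ttl`, `stale_for`, and `wait_for_refresh` are illustrative assumptions, not part of any existing codebase, and `loader` stands in for whatever function reads an object from S3. The sketch deduplicates refreshes within one machine only, which is why the S3 load noted above still grows with the number of machines.

```python
import threading
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

V = TypeVar("V")


class SWRCache(Generic[V]):
    """Serve cached values immediately; refresh in the background once stale."""

    def __init__(self, loader: Callable[[str], V], ttl: float = 10.0,
                 stale_for: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader        # hypothetical: fetches the value from S3
        self._ttl = ttl              # seconds a value counts as fresh
        self._stale_for = stale_for  # extra seconds a stale value may be served
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[V, float]] = {}  # key -> (value, stored_at)
        self._refreshing: Dict[str, threading.Thread] = {}

    def get(self, key: str) -> V:
        now = self._clock()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, stored_at = entry
                age = now - stored_at
                if age < self._ttl:
                    return value  # fresh hit, no backend call
                if age < self._ttl + self._stale_for:
                    self._start_refresh_locked(key)
                    return value  # stale hit, refresh runs in the background
        # Miss or too stale to serve: load synchronously.
        value = self._loader(key)
        self._store(key, value)
        return value

    def _store(self, key: str, value: V) -> None:
        with self._lock:
            self._entries[key] = (value, self._clock())

    def _start_refresh_locked(self, key: str) -> None:
        if key in self._refreshing:
            return  # at most one in-flight refresh per key on this machine
        thread = threading.Thread(target=self._refresh, args=(key,), daemon=True)
        self._refreshing[key] = thread
        thread.start()

    def _refresh(self, key: str) -> None:
        try:
            self._store(key, self._loader(key))
        except Exception:
            pass  # backend unavailable: keep serving the stale value
        finally:
            with self._lock:
                self._refreshing.pop(key, None)

    def wait_for_refresh(self, key: str) -> None:
        """Block until any in-flight refresh for `key` finishes (for tests)."""
        with self._lock:
            thread = self._refreshing.get(key)
        if thread is not None:
            thread.join()
```

Swallowing loader errors during a background refresh is a deliberate choice in this sketch: it favors availability over freshness, in line with the first goal.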
2. S3 + In-Memory Cache with Gossip Protocol
Pros
- Efficient cache invalidation through gossip, allowing higher TTLs
- Reduced load on primary storage
- Only need to notify one node for changes
Cons
- Need to implement ordering mechanism (timestamps/Lamport clocks)
- More complex system architecture
- Global gossip cluster management overhead
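The ordering mechanism mentioned in the cons could look roughly like the sketch below. It assumes gossip messages carry the new value along with a Lamport timestamp, and it breaks timestamp ties by node ID so that every node picks the same winner. `Update`, `Node`, `local_update`, and `receive` are hypothetical names, and the actual gossip transport (peer selection, fan-out, retries) is left out.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Update:
    key: str
    value: str
    lamport: int
    node_id: str

    @property
    def version(self) -> Tuple[int, str]:
        # Total order: Lamport timestamp first, node ID as the tie-breaker.
        return (self.lamport, self.node_id)


class Node:
    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.clock = 0
        self.cache: Dict[str, str] = {}
        self.last_seen: Dict[str, Tuple[int, str]] = {}  # key -> newest version applied

    def local_update(self, key: str, value: str) -> Update:
        """Apply a change that originated on this node and return the message to gossip."""
        self.clock += 1
        msg = Update(key, value, self.clock, self.node_id)
        self._apply(msg)
        return msg

    def receive(self, msg: Update) -> bool:
        """Handle a gossiped message. Returns True if it was new and should be forwarded."""
        # Lamport rule: move the local clock past anything we have observed.
        self.clock = max(self.clock, msg.lamport) + 1
        return self._apply(msg)

    def _apply(self, msg: Update) -> bool:
        seen = self.last_seen.get(msg.key)
        if seen is not None and msg.version <= seen:
            return False  # duplicate or older than what we already hold
        self.last_seen[msg.key] = msg.version
        self.cache[msg.key] = msg.value
        return True
```

With this check in place, two nodes that receive the same pair of concurrent updates in opposite orders end up with the same cached value, and a node that sees a message twice stops forwarding it. If messages were pure evictions rather than values, the same version comparison would still serve to deduplicate gossip traffic.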
3. S3 + Dedicated Cache Layer
Pros
- Better cache retention due to less frequent reboots
- Optional global eviction via gossip/Kafka later
- May need only one S3 region instead of replicating across regions
Cons
- Additional infrastructure to manage
4. DynamoDB Global Tables + Caching
Option 4A: Direct DynamoDB + Sentinel Memory Cache
- Each sentinel maintains a local in-memory SWR cache with a TTL of 10s
- DynamoDB serves as source of truth
- Automatic multi-region replication handled by AWS
Pros
- Built-in multi-region replication (eventually consistent across regions by default)
- No need to manage complex replication logic
- Lower latency reads from local region
- Automatic conflict resolution (last writer wins)
- Serverless and fully managed by AWS
- Cheaper for small reads than S3
- 99.999% availability SLA for global tables (S3 Standard is designed for 99.99%)
Cons
- Vendor lock-in to AWS, so we need an abstraction layer
- Higher storage cost compared to S3 due to replication
- Additional cost for replicated writes to each region
- Replication lag is controlled by AWS, not us
- More expensive than S3 for large reads (>13 KB)
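One possible shape for the abstraction that the vendor lock-in concern calls for is a small key-value interface with one adapter per backend. Everything named here is an assumption for illustration: `KeyValueStore`, `InMemoryStore`, `DynamoDBStore`, and the attribute names `pk` and `payload`. The DynamoDB adapter expects an injected object that behaves like a boto3 `Table` resource (`get_item`, `put_item`, `delete_item`), so the sketch itself does not import boto3.

```python
from typing import Any, Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Minimal interface the dataplane would depend on instead of a vendor SDK."""

    def get(self, key: str) -> Optional[str]: ...
    def put(self, key: str, value: str) -> None: ...
    def delete(self, key: str) -> None: ...


class InMemoryStore:
    """Backend for tests or on-premises setups without a managed store."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class DynamoDBStore:
    """Adapter around an injected table object.

    Assumes `table` exposes boto3 Table-style get_item/put_item/delete_item
    and that the table uses a string partition key named "pk" (hypothetical).
    """

    def __init__(self, table: Any) -> None:
        self._table = table

    def get(self, key: str) -> Optional[str]:
        response = self._table.get_item(Key={"pk": key})
        item = response.get("Item")  # "Item" is absent when the key does not exist
        return None if item is None else item["payload"]

    def put(self, key: str, value: str) -> None:
        self._table.put_item(Item={"pk": key, "payload": value})

    def delete(self, key: str) -> None:
        self._table.delete_item(Key={"pk": key})


def read_config(store: KeyValueStore, key: str, default: str = "") -> str:
    """Example caller: depends only on the interface, not on the backend."""
    value = store.get(key)
    return default if value is None else value
```

Keeping the interface this narrow means an S3-backed or on-premises adapter can be swapped in later without touching callers, at the cost of not exposing DynamoDB-specific features such as conditional writes.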

