Gossip Cluster
The pkg/cluster package provides gossip-based cluster membership and cross-region message propagation. Its primary use case is cache invalidation — when one node mutates data, all other nodes (including those in different regions) evict stale cache entries.
Built on hashicorp/memberlist (SWIM protocol).
Two-Tier Architecture
The cluster uses a two-tier gossip design: a fast LAN pool within each region and a WAN pool that connects regions through elected bridge nodes.
LAN Pool (intra-region)
Every node in a region joins the same LAN pool. Uses memberlist.DefaultLANConfig() — tuned for low-latency networks with ~1ms propagation. All nodes broadcast and receive messages.
- Port: GossipLANPort (default 7946)
- Seeds: GossipLANSeeds, typically a Kubernetes headless service DNS name resolving to all pod IPs in the region
- Encryption: AES-256 via GossipSecretKey
WAN Pool (cross-region)
Only the bridge node in each region participates in the WAN pool. Uses memberlist.DefaultWANConfig() — tolerates higher latency and packet loss typical of cross-region links.
- Port: GossipWANPort (default 7947)
- Seeds: GossipWANSeeds, addresses of bridge-capable nodes in other regions
Bridge Election
Each region auto-elects exactly one bridge — the node whose NodeID is lexicographically smallest among all LAN pool members. This is fully deterministic and requires no coordination protocol.
Election is re-evaluated whenever:
- A node joins the LAN pool (NotifyJoin)
- A node leaves the LAN pool (NotifyLeave)
- The initial LAN seed join completes
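The election rule and its triggers can be sketched in a few lines. This is a minimal illustration, not the package's implementation: the lanMembers type and its methods are hypothetical stand-ins for whatever state the real code maintains from memberlist events.

```go
package main

import (
	"fmt"
	"sort"
)

// lanMembers is a hypothetical view of the LAN pool as seen by one node.
// The real package derives this from memberlist join/leave events.
type lanMembers struct {
	self  string
	nodes map[string]struct{}
}

func newLANMembers(self string) *lanMembers {
	return &lanMembers{self: self, nodes: map[string]struct{}{self: {}}}
}

// join and leave mirror the NotifyJoin and NotifyLeave triggers.
func (m *lanMembers) join(id string)  { m.nodes[id] = struct{}{} }
func (m *lanMembers) leave(id string) { delete(m.nodes, id) }

// bridge returns the lexicographically smallest NodeID currently known,
// or "" if the member set is empty.
func (m *lanMembers) bridge() string {
	ids := make([]string, 0, len(m.nodes))
	for id := range m.nodes {
		ids = append(ids, id)
	}
	if len(ids) == 0 {
		return ""
	}
	sort.Strings(ids)
	return ids[0]
}

// isBridge reports whether this node should currently act as the bridge.
func (m *lanMembers) isBridge() bool { return m.bridge() == m.self }

func main() {
	m := newLANMembers("node-b")
	fmt.Println(m.isBridge()) // true: node-b is alone in the pool

	m.join("node-a")
	fmt.Println(m.isBridge()) // false: node-a sorts first

	m.leave("node-a")
	fmt.Println(m.isBridge()) // true: node-b is the smallest again
}
```

Because every node applies the same pure function to its own membership view, nodes agree on the bridge as soon as their views converge, without exchanging any election messages.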
Failover
When the bridge leaves (crash, scale-down, deployment), NotifyLeave fires on the remaining nodes, triggering re-evaluation. The node with the next smallest NodeID automatically promotes itself. No manual intervention is required.
Message Flow
Same-region broadcast
Cross-region relay
Loop Prevention
- LAN → WAN relay only happens for messages with direction=LAN (prevents re-relaying WAN messages)
- WAN → LAN re-broadcast is tagged direction=WAN, so the receiving bridge doesn't relay it again
- A source_region check on the WAN delegate drops messages originating in the same region
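The three rules above can be expressed as two small predicates. The sketch below assumes a simplified envelope struct with Direction and SourceRegion fields; these names and both function names are hypothetical and only approximate the real protobuf envelope and delegates.

```go
package main

import "fmt"

type direction int

const (
	directionLAN direction = iota
	directionWAN
)

// envelope is a hypothetical, simplified stand-in for the protobuf envelope.
type envelope struct {
	Direction    direction
	SourceRegion string
}

// shouldRelayToWAN is evaluated when a node receives a message on the LAN
// pool. Only the bridge relays, and only messages that originated on the LAN.
func shouldRelayToWAN(isBridge bool, msg envelope) bool {
	return isBridge && msg.Direction == directionLAN
}

// acceptFromWAN is evaluated when a bridge receives a message on the WAN
// pool. Messages from the bridge's own region are dropped; everything else
// is re-tagged direction=WAN before being re-broadcast on the LAN.
func acceptFromWAN(localRegion string, msg envelope) (envelope, bool) {
	if msg.SourceRegion == localRegion {
		return envelope{}, false
	}
	msg.Direction = directionWAN
	return msg, true
}

func main() {
	local := envelope{Direction: directionLAN, SourceRegion: "us-east"}
	fmt.Println(shouldRelayToWAN(true, local)) // true

	// The same message arriving at the eu-west bridge over the WAN pool.
	inbound, ok := acceptFromWAN("eu-west", local)
	fmt.Println(ok, shouldRelayToWAN(true, inbound)) // true false

	// An echo of the local region's own message is dropped.
	_, ok = acceptFromWAN("us-east", local)
	fmt.Println(ok) // false
}
```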
Protobuf Envelope
All messages use a single protobuf envelope (proto/cluster/v1/envelope.proto):
Adding a new message type:
- Add a new oneof variant to ClusterMessage
- Call cluster.Subscribe[*clusterv1.ClusterMessage_YourType](mux, handler)
The MessageMux handles routing automatically.
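One way such type-based routing could work is sketched below. It is not the actual MessageMux: the mux, subscribe, and dispatch names and the invalidate and ping payload types are hypothetical, and the sketch ignores locking and protobuf decoding.

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical payload types standing in for generated oneof wrapper types.
type invalidate struct{ Key string }
type ping struct{}

// mux maps a payload's concrete type to the handlers registered for it.
type mux struct {
	handlers map[reflect.Type][]func(any)
}

func newMux() *mux {
	return &mux{handlers: map[reflect.Type][]func(any){}}
}

// subscribe registers a typed handler. The type parameter T selects which
// payload type the handler receives.
func subscribe[T any](m *mux, h func(T)) {
	t := reflect.TypeOf((*T)(nil)).Elem()
	m.handlers[t] = append(m.handlers[t], func(v any) { h(v.(T)) })
}

// dispatch routes a payload to the handlers registered for its concrete type
// and returns how many handlers ran.
func (m *mux) dispatch(payload any) int {
	hs := m.handlers[reflect.TypeOf(payload)]
	for _, h := range hs {
		h(payload)
	}
	return len(hs)
}

func main() {
	m := newMux()
	var evicted []string
	subscribe(m, func(p *invalidate) { evicted = append(evicted, p.Key) })

	m.dispatch(&invalidate{Key: "user:42"}) // reaches the handler
	m.dispatch(&ping{})                     // no subscriber, silently ignored
	fmt.Println(evicted)
}
```

Go methods cannot declare their own type parameters, which is one plausible reason Subscribe is a package-level generic function taking the mux as an argument rather than a method on it.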
Wiring: API Service Example
The API service (svc/api/run.go) wires gossip like this:
Component Roles
| Component | Role |
|---|---|
| cluster.Cluster | Manages LAN/WAN memberlists, bridge election, message transport |
| cluster.MessageMux | Routes incoming ClusterMessage payloads to typed handlers |
| cluster.Subscribe[T] | Generic subscription; only receives messages matching the oneof variant |
| clustering.GossipBroadcaster | Bridges the cache.Broadcaster interface to the gossip Cluster.Broadcast() |
Fail-Open Design
Gossip is designed to never take down the API service. Every failure path degrades gracefully to local-only caching:
| Failure | Behavior |
|---|---|
| cluster.New() fails at startup | Logs error, continues without gossip (local-only caching) |
| LAN/WAN seed join exhaustion | Retries in background goroutine, logs and gives up — never crashes |
| Broadcast() fails (proto marshal) | Error logged and swallowed, returns nil to caller |
| Bridge promotion fails | Logs error, node stays non-bridge — LAN still works |
| Incoming message handler errors | Logged, never propagated to request handling |
| Bridge node dies | Next node auto-promotes, no manual intervention |
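The first and third rows of the table follow a common fail-open pattern, sketched here under stated assumptions. The broadcaster interface, noopBroadcaster, gossipBroadcaster, and newBroadcaster are hypothetical names; the real cache.Broadcaster interface and cluster.New signature may differ.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// broadcaster is a hypothetical minimal interface for cache invalidation.
type broadcaster interface {
	Broadcast(key string) error
}

// noopBroadcaster is the local-only fallback: invalidations go nowhere.
type noopBroadcaster struct{}

func (noopBroadcaster) Broadcast(string) error { return nil }

// gossipBroadcaster wraps a send function and never surfaces its errors.
type gossipBroadcaster struct{ send func(string) error }

func (g gossipBroadcaster) Broadcast(key string) error {
	if err := g.send(key); err != nil {
		log.Printf("gossip broadcast failed, continuing: %v", err)
	}
	return nil // errors are logged and swallowed
}

// newBroadcaster calls start (a stand-in for cluster startup) and falls back
// to the no-op broadcaster when startup fails.
func newBroadcaster(start func() (func(string) error, error)) broadcaster {
	send, err := start()
	if err != nil {
		log.Printf("gossip disabled, using local-only caching: %v", err)
		return noopBroadcaster{}
	}
	return gossipBroadcaster{send: send}
}

// failingStart and failingSend simulate the two failure paths for the demo.
func failingStart() (func(string) error, error) {
	return nil, errors.New("bind: address already in use")
}

func failingSend(string) error { return errors.New("marshal failed") }

func main() {
	b := newBroadcaster(failingStart)
	fmt.Println(b.Broadcast("user:42")) // <nil>
}
```

In this arrangement the caller always holds a usable broadcaster, so request handling never needs to check whether gossip is available.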
Configuration Reference
| Config Field | Default | Description |
|---|---|---|
| GossipEnabled | false | Enable gossip cluster |
| GossipBindAddr | 0.0.0.0 | Bind address for memberlist |
| GossipLANPort | 7946 | LAN memberlist port |
| GossipWANPort | 7947 | WAN memberlist port (bridge only) |
| GossipLANSeeds | (none) | Comma-separated LAN seed addresses |
| GossipWANSeeds | (none) | Comma-separated WAN seed addresses |
| GossipSecretKey | (none) | Base64-encoded AES key (openssl rand -base64 32) |
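As an illustration of how the seed and key fields might be validated before they reach memberlist, here is a minimal sketch. The parseSeeds, decodeSecretKey, and exampleKey helpers are hypothetical and not part of the package; the 32-byte check assumes the key is meant to select AES-256, as stated above.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// parseSeeds splits a comma-separated seed list, trimming whitespace and
// dropping empty entries.
func parseSeeds(raw string) []string {
	var seeds []string
	for _, s := range strings.Split(raw, ",") {
		if s = strings.TrimSpace(s); s != "" {
			seeds = append(seeds, s)
		}
	}
	return seeds
}

// decodeSecretKey decodes a base64 key and checks that it is 32 bytes long,
// the key size for AES-256.
func decodeSecretKey(encoded string) ([]byte, error) {
	key, err := base64.StdEncoding.DecodeString(strings.TrimSpace(encoded))
	if err != nil {
		return nil, fmt.Errorf("decode gossip secret key: %w", err)
	}
	if len(key) != 32 {
		return nil, fmt.Errorf("gossip secret key is %d bytes, want 32", len(key))
	}
	return key, nil
}

// exampleKey returns a throwaway all-zero key for demonstration only.
func exampleKey() string {
	return base64.StdEncoding.EncodeToString(make([]byte, 32))
}

func main() {
	fmt.Println(parseSeeds("gossip-lan.default.svc:7946, 10.0.0.5:7946"))

	key, err := decodeSecretKey(exampleKey())
	fmt.Println(len(key), err)
}
```

Checking the key length at startup turns a misconfigured secret into a clear log message instead of a harder-to-diagnose join failure later.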