Rate limiting is intentionally not a single synchronous global counter. The system converges in layers so the request path stays fast even when shared dependencies are slow or unavailable.

Convergence layers

Each layer handles a different scope.

| Layer | Scope | Purpose |
| --- | --- | --- |
| Local memory | One process | Make the immediate decision without a network round trip |
| Regional origin | One region | Converge multiple processes serving the same region |
| Global counters | All regions | Share meaningful regional usage with other regions |

The lower layer always remains useful when a higher layer lags. A process can continue making local decisions if regional convergence is delayed. A region can continue enforcing regional limits if global convergence is delayed.

Regional convergence

After a request is accepted, the process buffers a replay event for the regional origin. Replay merges the regional value back into local memory by taking the maximum of the two values, and marks the entry fresh. Processes in the same region converge toward the same count without waiting on every request.

Cold counters, stale counters, and strict-mode counters read the regional origin synchronously before deciding. As a result, the first decision for a key, decisions made after local state has gone stale, and decisions made after a denial all use a fresher regional baseline.

Warm entries carry a freshness deadline. Regional-origin reads and successful replays extend that deadline. While the entry is fresh, the request path can use local memory without blocking on origin. After the deadline passes, the next request refreshes from origin before deciding.

The freshness interval is intentionally short. Active identifiers normally stay fresh through replay, while idle or lagging identifiers re-read origin before they can keep serving an old local view for the rest of a long window. Concurrent stale requests for the same window cell share one origin read, then continue from the same refreshed value.
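
The freshness rules above can be sketched as a small in-memory model. This is an illustration under assumptions, not the service's implementation: the names `LocalCounter`, `Entry`, `read_origin`, and the `FRESHNESS_SECONDS` value are hypothetical, time is passed in explicitly, and strict mode, read failures, and the sharing of concurrent origin reads are left out.

```python
from dataclasses import dataclass

# Hypothetical value; the text only says the interval is intentionally short.
FRESHNESS_SECONDS = 2.0


@dataclass
class Entry:
    count: int = 0
    fresh_until: float = 0.0  # a new (cold) entry starts out stale


class LocalCounter:
    """Per-process view of one limit, refreshed from a regional origin."""

    def __init__(self, read_origin):
        self.read_origin = read_origin  # assumed callable: key -> regional count
        self.entries = {}
        self.pending_replays = []  # stand-in for the buffered replay events
        self.origin_reads = 0

    def _merge(self, entry, regional_count, now):
        # Merge with max so the local count never moves backwards,
        # then extend the freshness deadline.
        entry.count = max(entry.count, regional_count)
        entry.fresh_until = now + FRESHNESS_SECONDS

    def allow(self, key, limit, now):
        entry = self.entries.setdefault(key, Entry())
        if now >= entry.fresh_until:
            # Cold or stale: read the regional origin before deciding.
            self.origin_reads += 1
            self._merge(entry, self.read_origin(key), now)
        if entry.count >= limit:
            return False
        entry.count += 1
        self.pending_replays.append(key)  # replay happens off the request path
        return True

    def on_replay_result(self, key, regional_count, now):
        # A successful replay returns the regional value and keeps the entry fresh.
        self._merge(self.entries.setdefault(key, Entry()), regional_count, now)
```

In this sketch a warm, fresh entry decides from local memory alone, and only a cold or stale entry pays for an origin read, which mirrors the behavior described above.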

Strict mode

Strict mode is regional. When a request is denied, the service records a deadline for the (workspace, namespace, identifier, duration) tuple. Until that deadline passes, later requests for the same tuple refresh the current window from the regional origin before evaluating the limit.

The strict-mode key excludes the sequence. A denial in one fixed window can still affect the weighted previous-window term in the next fixed window, so strict mode survives the sequence rollover.

Strict mode does not publish cross-region state. It refreshes the current window only. Previous windows use the normal cold and stale refresh path because they no longer receive new accepted increments.
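
A minimal sketch of why the key excludes the sequence follows. Everything named here is an assumption for illustration: `StrictKey`, `StrictMode`, the `hold_ms` length (the text does not say how long the deadline lasts), deriving the sequence as `now_ms // duration_ms`, and the linear weighting in `effective_count`, which is the common sliding-window approximation rather than a confirmed formula.

```python
from typing import NamedTuple


class StrictKey(NamedTuple):
    # The tuple named in the text; note there is no window sequence in it.
    workspace: str
    namespace: str
    identifier: str
    duration_ms: int


class StrictMode:
    def __init__(self, hold_ms):
        self.hold_ms = hold_ms  # hypothetical deadline length
        self.deadlines = {}

    def record_denial(self, key, now_ms):
        self.deadlines[key] = now_ms + self.hold_ms

    def must_refresh(self, key, now_ms):
        """True while requests for this key should read the regional origin first."""
        deadline = self.deadlines.get(key)
        if deadline is None:
            return False
        if now_ms >= deadline:
            del self.deadlines[key]
            return False
        return True


def sequence(now_ms, duration_ms):
    # Assumed fixed-window numbering: one sequence per duration.
    return now_ms // duration_ms


def effective_count(current, previous, elapsed_ms, duration_ms):
    # Assumed weighting: the previous window counts in proportion to how much
    # of it still overlaps the sliding window.
    return current + previous * (1 - elapsed_ms / duration_ms)
```

Because the key carries no sequence, a denial recorded late in one window still forces origin refreshes early in the next one, where the previous window's count continues to contribute through the weighted term.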

Global convergence

Global convergence is eventual. A region publishes its own regional count when the count becomes meaningful for remote decisions. Other regions import the sum of foreign regional counts and include that imported count in future decisions. The publishing region may also import its own published count as a lower bound for local regional state on nodes that have not yet seen the same regional origin value.

1. Region A accepts traffic
2. Region A converges its local nodes
3. Region A publishes its regional count
4. Region B imports Region A's count
5. Region B includes that count in later decisions

This model means that simultaneous requests in multiple regions can briefly be accepted before every region has imported the latest remote usage. The tradeoff is deliberate: request-serving processes do not wait for cross-region coordination on the hot path.

Own-region imports are a safety net, not a replacement for the regional origin. They can only raise local regional counts. They do not refresh the local entry’s regional-origin freshness deadline, and foreign counts still stay separate so they cannot be published again.

Windows shorter than 60 seconds are effectively regional because the global convergence cadence is too coarse to provide useful cross-region accuracy. Longer windows can include global convergence before the window expires.
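
The separation between a region's own count and imported foreign counts can be sketched as follows. The class and method names are hypothetical, and merging each foreign count with `max` is an assumption chosen to match the monotonic behavior described elsewhere on this page.

```python
class RegionalState:
    """One region's view of a single window cell."""

    def __init__(self, region):
        self.region = region
        self.own = 0       # usage accepted in this region
        self.foreign = {}  # other region name -> last imported count

    def accept(self, n=1):
        self.own += n

    def publish(self):
        # Only the region's own count is published. Foreign counts are kept
        # apart precisely so they are never published a second time.
        return self.region, self.own

    def import_count(self, region, count):
        if region == self.region:
            # Own-region import: a lower bound that can only raise the count.
            self.own = max(self.own, count)
        else:
            self.foreign[region] = max(self.foreign.get(region, 0), count)

    def effective(self):
        # The count used for decisions: own usage plus the sum of foreign usage.
        return self.own + sum(self.foreign.values())
```

For example, if region A has accepted 4 requests and region B has accepted 2, then after B imports A's published count B decides against 6, while B's own published value stays at 2.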

Failure behavior

Failures degrade toward local decisions and recover when the affected layer becomes available again.

| Failure | Behavior |
| --- | --- |
| Regional read fails | The process continues from its local count and retries soon; failed reads do not make the local entry fresh |
| Regional replay lags | Other nodes in the region converge later, or refresh when their local entry becomes stale |
| Global publish lags | Other regions do not see the new count yet |
| Global import lags | The region continues with its existing imported counts |
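
The first row can be illustrated with a short sketch. The function name, the dictionary-shaped entry, the `freshness` default, and the use of `OSError` as the failure signal are all assumptions; the point is only that a failed read leaves both the count and the freshness deadline untouched, so a later request tries the origin again.

```python
def refresh(entry, read_origin, now, freshness=2.0):
    """Try to refresh a local entry from the regional origin.

    Returns True if the read succeeded. On failure the entry is left exactly
    as it was: the caller keeps deciding from the local count, and because
    the entry is still stale, the next request attempts the read again.
    """
    try:
        regional = read_origin()  # assumed callable returning the regional count
    except OSError:  # stand-in for whatever error a real origin client raises
        return False
    entry["count"] = max(entry["count"], regional)
    entry["fresh_until"] = now + freshness
    return True
```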
Correctness does not depend on making every layer synchronous. The invariant is that accepted local work is monotonic within a window cell, so delayed convergence can merge later without subtracting or rewriting history.
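
One way to see why monotonic counts tolerate delay is that a `max` merge gives the same result whatever order observations arrive in, and however often they repeat. The helper below is a hypothetical illustration of that property, not code from the service.

```python
from itertools import permutations


def merge(local, observed):
    # A monotonic merge: the count can rise but never fall.
    return max(local, observed)


def converged_values(observations):
    """Merge the observations in every possible order and collect the results."""
    results = set()
    for order in permutations(observations):
        value = 0
        for observed in order:
            value = merge(value, observed)
        results.add(value)
    return results


# Late, duplicated, or reordered observations all converge on one value.
print(converged_values([3, 7, 5, 7]))  # prints {7}
```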