Rate limiting is a shared runtime subsystem used by API, Frontline, and Sentinel policy execution. The implementation lives in internal/services/ratelimit.
The system is designed to protect request-serving paths first. Accuracy improves as state converges, but no normal request depends on a synchronous global counter.
Design goals
The rate limiter optimizes for these goals, in this order:

- Keep the hot path local. A process must be able to make a decision from memory without waiting on cross-region state.
- Preserve availability. Shared dependencies can improve accuracy, but a dependency failure must degrade toward local enforcement instead of taking down requests.
- Converge where it matters. Nodes in one region converge quickly. Regions share counts when usage is high enough to affect remote decisions.
- Avoid double counting. A region’s own count and imported foreign counts stay separate so remote usage is never republished as local usage.
- Smooth reset boundaries. Fixed window cells are evaluated as a sliding window so callers cannot spend a full limit on both sides of a boundary.
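
The boundary-smoothing goal can be illustrated with a minimal Go sketch. It assumes the common linear weighting, where the previous cell counts in proportion to how much of the sliding window still overlaps it. The source only states that the previous cell is weighted, so the exact formula is an assumption, and the `cell`, `effectiveCount`, and `allow` names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// cell is a hypothetical fixed window counter. It only ever grows.
type cell struct {
	count int64
}

// effectiveCount adds the full current cell to the previous cell, weighted by
// the fraction of the sliding window that still overlaps the previous cell.
// The linear weighting is an assumption made for this sketch.
func effectiveCount(prev, cur cell, window, elapsed time.Duration) float64 {
	overlap := 1 - float64(elapsed)/float64(window)
	if overlap < 0 {
		overlap = 0
	}
	return float64(prev.count)*overlap + float64(cur.count)
}

// allow increments the current cell only when the effective count plus the
// request cost stays within the limit. A denied request changes nothing.
func allow(prev cell, cur *cell, limit, cost int64, window, elapsed time.Duration) bool {
	if effectiveCount(prev, *cur, window, elapsed)+float64(cost) > float64(limit) {
		return false
	}
	cur.count += cost
	return true
}

func main() {
	window := time.Minute
	elapsed := 15 * time.Second // 25% into the new window, so 75% of prev still counts
	prev := cell{count: 8}
	cur := cell{}

	fmt.Println(effectiveCount(prev, cur, window, elapsed)) // 6

	allowed := 0
	for i := 0; i < 10; i++ {
		if allow(prev, &cur, 10, 1, window, elapsed) {
			allowed++
		}
	}
	fmt.Println(allowed) // 4
}
```

With a limit of 10, a caller who used 8 requests in the previous window gets only 4 more shortly after the boundary, not a fresh 10. The current cell is never decremented. The previous cell's influence fades only because its weight shrinks as time passes.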
Tradeoffs
These goals create deliberate tradeoffs:

| Choice | Benefit | Cost |
|---|---|---|
| Local-memory decision first | Low latency and high availability | Simultaneous traffic in different processes can briefly see different views |
| Async regional convergence | Requests don’t wait on the regional origin | A neighboring node may lag until replay or strict mode catches up |
| Async global convergence | No request waits on cross-region coordination | Multi-region bursts can briefly pass before imported counts arrive |
| Publish only meaningful regional usage | Cross-region writes stay proportional to useful signal | Low regional usage may remain regional only |
| Sliding-window evaluation over fixed cells | Smooths boundary bursts without per-request histories | Requires reading the current and previous window cells |
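
The separation between a region's own count and imported foreign counts, together with the "publish only meaningful regional usage" choice, might look like the following Go sketch. It uses the standard G-Counter merge (keep the maximum seen per region, sum across regions). The `regionCounter` type, its methods, and the publish threshold parameter are hypothetical names introduced for illustration and are not taken from the implementation.

```go
package main

import "fmt"

// regionCounter is a hypothetical per-window counter held by one region.
type regionCounter struct {
	region   string
	own      int64            // this region's own observations only
	imported map[string]int64 // highest count seen from each other region
}

func newRegionCounter(region string) *regionCounter {
	return &regionCounter{region: region, imported: make(map[string]int64)}
}

// observe records usage seen by this region.
func (c *regionCounter) observe(n int64) { c.own += n }

// merge imports another region's published count. Keeping the maximum makes
// replayed or out-of-order messages harmless.
func (c *regionCounter) merge(region string, count int64) {
	if region == c.region {
		return // never import our own count back
	}
	if count > c.imported[region] {
		c.imported[region] = count
	}
}

// total is the value a decision would use: own count plus all imported counts.
func (c *regionCounter) total() int64 {
	sum := c.own
	for _, v := range c.imported {
		sum += v
	}
	return sum
}

// publish returns the count to share with other regions. It exposes the own
// count only, and only once that count reaches the (assumed) threshold.
func (c *regionCounter) publish(threshold int64) (int64, bool) {
	if c.own < threshold {
		return 0, false
	}
	return c.own, true
}

func main() {
	us := newRegionCounter("us")
	eu := newRegionCounter("eu")
	us.observe(40)
	eu.observe(3)

	if n, ok := us.publish(10); ok {
		eu.merge("us", n)
	}
	if n, ok := eu.publish(10); ok { // eu is below the threshold, nothing is sent
		us.merge("eu", n)
	}
	eu.merge("us", 40) // a replayed message does not double count

	fmt.Println(us.total(), eu.total()) // 40 43
}
```

Because `publish` reads only `own`, the 40 requests that `eu` imported from `us` can never travel back out as `eu` usage. The cost shown in the table is also visible: `us` never learns about the 3 requests in `eu` while they stay under the threshold.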
Core idea
The rate limiter stores fixed window cells and evaluates them as a sliding window. Each request updates the current cell if the effective count is still under the limit. State converges in layers:

- Local counters keep the hot path fast.
- The regional origin converges nodes inside one region.
- Global counters converge regional observations across regions for longer windows.

Pages
- Request path explains how one request is evaluated, including sliding-window math and batch semantics.
- Consistency model explains what converges locally, regionally, and globally.
- Global counters explains the G-Counter model used for cross-region convergence.
Invariants
These invariants shape the implementation:

- The request path must not wait on cross-region state.
- A local count represents only this region’s own observations.
- Imported global count represents other regions and must not be pushed back out.
- A fixed window cell is grow-only while it is active.
- Sliding-window behavior comes from weighting the previous cell, not from decrementing the current cell.
- Batch requests must preserve all-or-nothing semantics.
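
The all-or-nothing invariant for batches can be sketched in Go as a check phase followed by a commit phase. This is a simplified illustration under stated assumptions: it compares raw counts instead of sliding-window effective counts, it runs on a single goroutine without locking, and the `check` type and `allowBatch` function are hypothetical names that do not come from the implementation.

```go
package main

import "fmt"

// check is one hypothetical entry in a batch: a counter key, its limit, and
// the cost this request would add.
type check struct {
	key   string
	limit int64
	cost  int64
}

// allowBatch applies every entry or none of them. Counters are only written
// after every entry has passed.
func allowBatch(counts map[string]int64, batch []check) bool {
	// Phase 1: verify. Pending costs accumulate per key so that a key that
	// appears more than once in the batch is checked against its combined cost.
	pending := make(map[string]int64, len(batch))
	for _, c := range batch {
		pending[c.key] += c.cost
		if counts[c.key]+pending[c.key] > c.limit {
			return false // nothing has been written yet
		}
	}
	// Phase 2: commit. Every entry passed, so all increments are applied.
	for key, n := range pending {
		counts[key] += n
	}
	return true
}

func main() {
	counts := map[string]int64{"user:1": 9, "org:1": 50}

	// The org limit has room, but the user limit does not, so the whole batch
	// is denied and the org counter stays untouched.
	denied := []check{
		{key: "org:1", limit: 100, cost: 1},
		{key: "user:1", limit: 10, cost: 2},
	}
	fmt.Println(allowBatch(counts, denied), counts["org:1"], counts["user:1"]) // false 50 9

	accepted := []check{
		{key: "org:1", limit: 100, cost: 1},
		{key: "user:1", limit: 10, cost: 1},
	}
	fmt.Println(allowBatch(counts, accepted), counts["org:1"], counts["user:1"]) // true 51 10
}
```

Separating verification from the writes also fits the grow-only invariant: a denied batch never increments a cell, so nothing needs to be rolled back.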

