# Token Bucket
The token bucket is the primary algorithm for request-rate limiting (requests per second). It allows short bursts while enforcing a steady-state rate over time.
## How it works

The implementation uses a continuous-time token bucket (also called a leaky bucket as a meter):

- Each unique limit key has a bucket with capacity `burst` tokens
- Tokens refill at `tokens_per_second` per second, continuously
- Each request consumes `cost` tokens (default 1)
- If the bucket has enough tokens, the request is allowed; otherwise it is rejected
Refill is calculated lazily on each request — there is no background timer:

```
elapsed = max(0, now - last_refill)
new_tokens = min(tokens + elapsed × tokens_per_second, burst)
if new_tokens >= cost:
    allow, set tokens = new_tokens - cost
else:
    reject, retry_after = ceil((cost - new_tokens) / tokens_per_second)
```

The `max(0, ...)` guards against backward clock jumps.
## State

State is stored in `ngx.shared.fairvisor_counters` as a single string entry per limit key:

- Key: `tb:{rule_name}:{composite_limit_key}`
- Value: `"{tokens}:{last_refill}"` (both as 6-decimal floats)

Example:

```
tb:per-org-rps:org-abc → "87.432100:1736940000.000000"
```
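The value encoding described above (two 6-decimal floats joined by a colon) can be sketched in Python; the function names `encode_state` and `decode_state` are invented for illustration:

```python
def encode_state(tokens: float, last_refill: float) -> str:
    """Serialize bucket state in the shared-dict value format: two 6-decimal floats."""
    return f"{tokens:.6f}:{last_refill:.6f}"


def decode_state(value: str) -> tuple[float, float]:
    """Parse a stored value back into (tokens, last_refill)."""
    tokens_s, last_refill_s = value.split(":", 1)
    return float(tokens_s), float(last_refill_s)
```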
## Configuration

```json
{
  "algorithm": "token_bucket",
  "algorithm_config": {
    "tokens_per_second": 100,
    "burst": 200,
    "cost_source": "fixed",
    "fixed_cost": 1,
    "default_cost": 1
  }
}
```
| Field | Type | Required | Default | Constraint | Description |
|---|---|---|---|---|---|
| `tokens_per_second` | number | yes* | — | > 0 | Steady-state refill rate |
| `rps` | number | yes* | — | > 0 | Alias for `tokens_per_second` |
| `burst` | number | yes | — | ≥ `tokens_per_second` | Maximum bucket capacity (max burst size) |
| `cost_source` | string | no | `"fixed"` | — | `"fixed"`, `"header:<name>"`, or `"query:<name>"` |
| `fixed_cost` | number | no | 1 | > 0 | Token cost per request when `cost_source` = `"fixed"` |
| `default_cost` | number | no | 1 | > 0 | Fallback cost when the source header/query param is absent or non-numeric |

\*One of `tokens_per_second` or `rps` must be provided.
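The constraints in the table can be expressed as a small validation sketch (the function `validate_config` is hypothetical, not part of the gateway's API; it just encodes the rules above):

```python
def validate_config(config: dict) -> tuple[float, float]:
    """Check the token-bucket constraints: a positive rate (tokens_per_second or
    its alias rps) and a burst at least as large as the rate. Returns (rate, burst)."""
    rate = config.get("tokens_per_second", config.get("rps"))
    if rate is None or rate <= 0:
        raise ValueError("one of tokens_per_second or rps (> 0) is required")
    burst = config.get("burst")
    if burst is None or burst < rate:
        raise ValueError("burst is required and must be >= the refill rate")
    return float(rate), float(burst)
```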
## Variable cost

You can charge a variable number of tokens per request by reading the cost from a header or query parameter:

```json
{
  "algorithm_config": {
    "tokens_per_second": 100,
    "burst": 1000,
    "cost_source": "header:x-request-weight",
    "default_cost": 1
  }
}
```
If `X-Request-Weight: 5` is sent, 5 tokens are consumed. If the header is absent or contains a non-numeric value, `default_cost` (1) is used.
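The cost-resolution rule can be sketched as follows. This is a simplified illustration (the helper name `resolve_cost` and the dict-based header/query lookup are assumptions, not the gateway's actual API):

```python
def resolve_cost(cost_source: str, headers: dict, query: dict,
                 fixed_cost: float = 1, default_cost: float = 1) -> float:
    """Resolve the per-request token cost from the configured source.
    Falls back to default_cost when the value is absent or non-numeric."""
    if cost_source == "fixed":
        return fixed_cost
    kind, _, name = cost_source.partition(":")
    source = headers if kind == "header" else query
    raw = source.get(name.lower())
    try:
        cost = float(raw)
    except (TypeError, ValueError):
        return default_cost
    return cost if cost > 0 else default_cost
```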
## Response headers

On every allowed request the following headers are set (in enforce mode), following draft-ietf-httpapi-ratelimit-headers; `Retry-After` on rejections follows RFC 9110:

```
RateLimit-Limit: 200
RateLimit-Remaining: 87
RateLimit-Reset: 1
RateLimit: "per-org-rps";r=87;t=1
```
On rejection:

```
HTTP 429 Too Many Requests
Retry-After: 2
RateLimit-Limit: 200
RateLimit-Remaining: 0
RateLimit-Reset: 2
X-Fairvisor-Reason: token_bucket_exceeded
```
## Retry-After jitter

The `Retry-After` value is the raw `ceil((cost - tokens) / tokens_per_second)` plus a deterministic per-identity jitter of up to 50%. The jitter is seeded from the identity (JWT subject, IP, path), so the same client always gets the same offset while different clients get different offsets. This prevents the thundering herd problem when many clients hit the limit at the same time.
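A jitter scheme of this shape can be sketched in Python. Note the assumptions: the SHA-256 hash of the identity string as the deterministic seed, and jitter applied multiplicatively before rounding, are illustrative choices; the gateway's actual seeding may differ.

```python
import hashlib
import math


def retry_after_with_jitter(cost: float, tokens: float,
                            tokens_per_second: float, identity: str) -> int:
    """Base retry delay plus a deterministic per-identity jitter of up to 50%."""
    base = (cost - tokens) / tokens_per_second
    # Hash the identity so the same client always lands on the same offset
    # while different clients spread out across the jitter window.
    digest = hashlib.sha256(identity.encode()).digest()
    fraction = int.from_bytes(digest[:4], "big") / 0xFFFFFFFF  # 0.0 .. 1.0
    return math.ceil(base * (1 + 0.5 * fraction))
```

Because the jitter is a pure function of the identity, retries from one client stay consistent while the fleet as a whole de-synchronizes.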
## Failure behavior

If the shared dict increment fails (for example, under dict memory pressure), the algorithm fails open:

- the request is allowed
- token state is left unchanged (no partial decrement is applied)
- the failure is logged for metrics

Traffic is never blocked due to storage failure.
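The fail-open policy can be sketched as a wrapper around any bucket check. This is a hypothetical helper (`check_limit_fail_open` is not part of the gateway); exceptions stand in for shared-dict failures here:

```python
def check_limit_fail_open(bucket_check, logger=print) -> bool:
    """Run a rate-limit check so that storage errors never block traffic.
    bucket_check is any callable returning True (allow) or False (reject)."""
    try:
        return bucket_check()
    except Exception as exc:
        # Storage failure: allow the request, leave state untouched, log for metrics.
        logger(f"rate-limit storage error, allowing request: {exc}")
        return True
```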
## Tuning

| Scenario | Recommendation |
|---|---|
| Interactive API (users expect burst then steady) | `burst = 5 × tokens_per_second` |
| Strict steady-rate enforcement | `burst = tokens_per_second` |
| LLM tool calls (each call is expensive) | `burst = tokens_per_second`; set `cost_source` to a per-call cost header |
| Background batch job (burst allowed at start) | `burst = 10 × tokens_per_second` |
## Examples

### Global rate limit with IP partitioning

```json
{
  "name": "global-rps",
  "limit_keys": ["ip:address"],
  "algorithm": "token_bucket",
  "algorithm_config": {
    "rps": 5,
    "burst": 10
  }
}
```
Each unique IP address gets its own 10-token bucket that refills at 5 tokens per second.
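Per-key partitioning can be sketched as a map from limit key to bucket state. The class name `KeyedBuckets` is invented for illustration; it models one independent bucket per key, as the example above describes:

```python
class KeyedBuckets:
    """One independent lazy-refill bucket per limit key (e.g. per client IP)."""

    def __init__(self, tokens_per_second: float, burst: float):
        self.rate = tokens_per_second
        self.burst = burst
        self.buckets: dict[str, tuple[float, float]] = {}  # key -> (tokens, last_refill)

    def allow(self, key: str, now: float, cost: float = 1) -> bool:
        # An unseen key starts with a full bucket.
        tokens, last = self.buckets.get(key, (self.burst, now))
        tokens = min(tokens + max(0.0, now - last) * self.rate, self.burst)
        if tokens >= cost:
            self.buckets[key] = (tokens - cost, now)
            return True
        self.buckets[key] = (tokens, now)
        return False
```

Exhausting one key's bucket leaves every other key's bucket untouched, which is exactly the partitioning behavior `limit_keys` provides.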
### Per-API-key tiered limits

```json
{
  "rules": [
    {
      "name": "enterprise",
      "limit_keys": ["header:x-api-key"],
      "algorithm": "token_bucket",
      "algorithm_config": { "rps": 1000, "burst": 2000 },
      "match": { "jwt:plan": "enterprise" }
    },
    {
      "name": "free",
      "limit_keys": ["header:x-api-key"],
      "algorithm": "token_bucket",
      "algorithm_config": { "rps": 10, "burst": 20 }
    }
  ]
}
```