Cost-Based Budget

cost_based enforces a cumulative budget per period. It is designed for spend caps over a fixed window (for example hourly or daily) where each request has a variable cost (money, units, weighted operations). It degrades gracefully as the budget nears exhaustion and applies a hard stop at the limit. This makes it complementary to token bucket, which limits instantaneous rate rather than total consumption.

How it works

For each limiter key, Fairvisor tracks a single period counter:

current_usage(period, key)

On each request:

  1. Resolve request cost (fixed, header:*, query:*)
  2. Compute current period start (5m, 1h, 1d, 7d)
  3. Atomically increment usage by cost
  4. Compute usage_percent = usage / budget * 100
  5. Pick highest matching staged action
  6. Allow / throttle / reject
  7. If rejected, roll back increment (request is not charged)
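
The steps above can be sketched as a small in-memory model. This is an illustration, not Fairvisor's implementation: the `evaluate` function, the plain dict standing in for the shared counter store, and the returned dict shape are all assumptions. Cost resolution and period start (steps 1 and 2) are taken as inputs, and reject stages are skipped during stage selection on the reading that rejection only happens once usage exceeds the budget.

```python
def evaluate(counters, key, period_start, cost, budget, staged_actions):
    """Hypothetical model of one cost_based decision (steps 3 to 7)."""
    counter_key = (key, period_start)

    # Step 3: charge the request (a shared store would do this atomically).
    usage = counters.get(counter_key, 0) + cost
    counters[counter_key] = usage

    # Steps 6 and 7: over budget means reject, then undo the charge.
    if usage > budget:
        counters[counter_key] = usage - cost
        return {"action": "reject", "reason": "budget_exceeded"}

    # Step 4: express usage as a percentage of the budget.
    usage_percent = usage / budget * 100

    # Step 5: highest non-reject stage whose threshold has been reached.
    decision = {"action": "allow"}
    for stage in sorted(staged_actions, key=lambda s: s["threshold_percent"]):
        if stage["action"] != "reject" and usage_percent >= stage["threshold_percent"]:
            decision = dict(stage)
    return decision


stages = [
    {"threshold_percent": 80, "action": "warn"},
    {"threshold_percent": 95, "action": "throttle", "delay_ms": 500},
    {"threshold_percent": 100, "action": "reject"},
]
counters = {}
first = evaluate(counters, "org-abc", 0, 85, 100, stages)   # usage 85: warn
second = evaluate(counters, "org-abc", 0, 12, 100, stages)  # usage 97: throttle
third = evaluate(counters, "org-abc", 0, 10, 100, stages)   # 107 > 100: reject
```

After the third call the counter is back at 97, because the rejected request was rolled back.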

Period alignment

Period   Alignment
5m       start of current 5-minute UTC slot (:00, :05, :10, …)
1h       start of current clock hour
1d       start of current UTC day
7d       start of current UTC week (Monday 00:00 UTC)

No explicit TTL is required for counters, because period start is embedded in the key. When period changes, a new key is used.
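
One way to compute these aligned starts from a Unix timestamp is plain modular arithmetic. The sketch below is an assumption about how such alignment could be done, not Fairvisor's code; `period_start` and `PERIOD_SECONDS` are hypothetical names. The only subtlety is the weekly period: the Unix epoch fell on a Thursday, so the week grid is shifted to the first Monday (1970-01-05).

```python
PERIOD_SECONDS = {"5m": 300, "1h": 3600, "1d": 86400, "7d": 604800}

# 1970-01-01 was a Thursday; the first Monday 00:00 UTC is 4 days later.
FIRST_MONDAY_EPOCH = 4 * 86400


def period_start(now, period):
    """Return the epoch second at which the current UTC-aligned period began."""
    length = PERIOD_SECONDS[period]
    now = int(now)
    # Only the weekly grid needs an offset; the others align with the epoch.
    offset = FIRST_MONDAY_EPOCH if period == "7d" else 0
    return now - ((now - offset) % length)


# 1761177600 is 2025-10-23 00:00:00 UTC (a Thursday).
now = 1761177600 + 12345
day_start = period_start(now, "1d")    # 1761177600
week_start = period_start(now, "7d")   # 1760918400, Monday 2025-10-20
```

Because the result changes only when a period boundary is crossed, embedding it in the counter key gives each period its own counter.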

Cost source resolution

cost_key determines where cost is read from:

  • fixed
  • header:<name>
  • query:<name>

If the header or query value is missing, invalid, or non-positive, default_cost is used.

Practical pattern:

  • start with fixed_cost: 1
  • migrate to weighted pricing using header:x-request-cost
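
The resolution rules can be sketched as follows. This is a minimal illustration under stated assumptions: `resolve_cost` is a hypothetical helper, headers and query parameters are passed as plain dicts, header names are matched case-insensitively, and costs are parsed as floats.

```python
import math


def resolve_cost(cost_key, headers, query, fixed_cost=1, default_cost=1):
    """Resolve a request's cost from cost_key, falling back to default_cost."""
    if cost_key == "fixed":
        return fixed_cost

    source, _, name = cost_key.partition(":")
    if source == "header":
        # HTTP header names are case-insensitive.
        lowered = {k.lower(): v for k, v in headers.items()}
        raw = lowered.get(name.lower())
    elif source == "query":
        raw = query.get(name)
    else:
        raise ValueError("unsupported cost_key: " + cost_key)

    try:
        cost = float(raw)
    except (TypeError, ValueError):
        return default_cost  # missing or not a number
    if not math.isfinite(cost) or cost <= 0:
        return default_cost  # non-positive or non-finite
    return cost


weighted = resolve_cost("header:x-request-cost", {"X-Request-Cost": "2.5"}, {})
fallback = resolve_cost("header:x-request-cost", {}, {})
```

Here `weighted` is 2.5 and `fallback` is the default cost of 1, which is what makes the fixed-then-weighted migration safe while some producers do not yet send the header.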

Staged actions

staged_actions is a sorted threshold table. The active stage is the highest threshold reached by current usage.

Example:

"staged_actions": [
  { "threshold_percent": 80, "action": "warn" },
  { "threshold_percent": 95, "action": "throttle", "delay_ms": 500 },
  { "threshold_percent": 100, "action": "reject" }
]

Action semantics:

  • warn: request allowed, warning marker is set
  • throttle: request delayed by delay_ms, then allowed
  • reject: request denied with budget_exceeded

Important runtime nuance:

  • config must contain a reject stage at 100%
  • a reject stage with a threshold below 100% does not take effect while usage is still within budget
  • true rejection happens only when usage exceeds budget
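
Looking up "the highest threshold reached" in a sorted table is a binary search. The sketch below shows only that table lookup, assuming thresholds are already in ascending order; `active_stage` is a hypothetical name. Per the nuance above, a caller would still treat reject separately and apply it only once usage exceeds the budget.

```python
from bisect import bisect_right


def active_stage(usage_percent, staged_actions):
    """Return the stage with the highest threshold <= usage_percent, or None."""
    thresholds = [stage["threshold_percent"] for stage in staged_actions]
    index = bisect_right(thresholds, usage_percent)
    return staged_actions[index - 1] if index else None


stages = [
    {"threshold_percent": 80, "action": "warn"},
    {"threshold_percent": 95, "action": "throttle", "delay_ms": 500},
    {"threshold_percent": 100, "action": "reject"},
]
below_all = active_stage(50, stages)     # None: no stage reached yet
warning = active_stage(94.9, stages)     # the warn stage
throttled = active_stage(95, stages)     # the throttle stage
```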

Rejection and rollback

When the increment makes usage exceed budget:

  • Fairvisor performs compensating dict:incr(..., -cost)
  • rejected request is not counted as consumed budget
  • response reason is budget_exceeded

This keeps accounting aligned with accepted traffic.

State format

Keys are stored in ngx.shared.fairvisor_counters:

cb:{rule_name}:{composite_limit_key}:{period_start_epoch}

Example:

cb:daily-budget:org-abc:1761177600
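
To make the key layout concrete, the sketch below splits a key into its parts and decodes the trailing epoch. `parse_counter_key` is a hypothetical helper for inspection and debugging; it assumes the rule name contains no colon, while the composite limit key may.

```python
from datetime import datetime, timezone


def parse_counter_key(key):
    """Split cb:{rule_name}:{composite_limit_key}:{period_start_epoch}."""
    prefix, rule_name, rest = key.split(":", 2)
    if prefix != "cb":
        raise ValueError("not a cost_based counter key: " + key)
    # The epoch is always the last segment, so split it off from the right.
    composite_limit_key, _, epoch = rest.rpartition(":")
    started = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return {
        "rule_name": rule_name,
        "limit_key": composite_limit_key,
        "period_start": started.isoformat(),
    }


parsed = parse_counter_key("cb:daily-budget:org-abc:1761177600")
# parsed["period_start"] == "2025-10-23T00:00:00+00:00"
```

The example key therefore belongs to the daily period that began at midnight UTC on 2025-10-23.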

Configuration

{
  "algorithm": "cost_based",
  "algorithm_config": {
    "budget": 1000,
    "period": "5m",
    "cost_key": "header:x-cost-units",
    "fixed_cost": 1,
    "default_cost": 1,
    "staged_actions": [
      { "threshold_percent": 80, "action": "warn" },
      { "threshold_percent": 95, "action": "throttle", "delay_ms": 500 },
      { "threshold_percent": 100, "action": "reject" }
    ]
  }
}

Field            Required   Default   Validation
budget           yes        -         positive number
period           yes        -         one of 5m, 1h, 1d, 7d
cost_key         no         fixed     fixed, header:<name>, query:<name>
fixed_cost       no         1         positive (used for fixed)
default_cost     no         1         positive
staged_actions   yes        -         non-empty, strictly ascending thresholds, must include reject@100

staged_actions[] fields:

Field               Required      Validation
threshold_percent   yes           number in [0, 100]
action              yes           warn, throttle, reject
delay_ms            conditional   required and > 0 for throttle
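
The validation rules in the two tables above can be checked before deploying a rule. The function below is a sketch of such a pre-flight check, not Fairvisor's validator; `validate_cost_based` and its error strings are hypothetical, and it assumes staged_actions is a list of dicts.

```python
PERIODS = ("5m", "1h", "1d", "7d")
ACTIONS = ("warn", "throttle", "reject")


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_cost_based(config):
    """Return a list of validation errors for an algorithm_config dict."""
    errors = []
    budget = config.get("budget")
    if not (_is_number(budget) and budget > 0):
        errors.append("budget must be a positive number")
    if config.get("period") not in PERIODS:
        errors.append("period must be one of 5m, 1h, 1d, 7d")

    cost_key = config.get("cost_key", "fixed")
    source, _, name = str(cost_key).partition(":")
    if cost_key != "fixed" and (source not in ("header", "query") or not name):
        errors.append("cost_key must be fixed, header:<name>, or query:<name>")

    for field in ("fixed_cost", "default_cost"):
        value = config.get(field, 1)
        if not (_is_number(value) and value > 0):
            errors.append(field + " must be positive")

    stages = config.get("staged_actions") or []
    if not stages:
        errors.append("staged_actions must be non-empty")
    previous = None
    for stage in stages:
        threshold = stage.get("threshold_percent")
        if not (_is_number(threshold) and 0 <= threshold <= 100):
            errors.append("threshold_percent must be a number in [0, 100]")
            continue
        if previous is not None and threshold <= previous:
            errors.append("thresholds must be strictly ascending")
        previous = threshold
        action = stage.get("action")
        if action not in ACTIONS:
            errors.append("action must be warn, throttle, or reject")
        delay = stage.get("delay_ms")
        if action == "throttle" and not (_is_number(delay) and delay > 0):
            errors.append("throttle requires delay_ms > 0")
    if not any(s.get("action") == "reject" and s.get("threshold_percent") == 100
               for s in stages):
        errors.append("staged_actions must include reject at 100")
    return errors
```

An empty list means the config satisfies every rule listed in the tables.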

Response headers

On allowed requests (in enforce mode):

RateLimit-Limit: 1000
RateLimit-Remaining: 753
RateLimit-Reset: <seconds until period end>

On rejection:

HTTP 429 Too Many Requests
Retry-After: <seconds until next period start>
RateLimit-Limit: 1000
RateLimit-Remaining: 0
X-Fairvisor-Reason: budget_exceeded

Retry-After is ceil(next_period_start - now), minimum 1 second.
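
The header values follow directly from the budget, current usage, and period boundaries. The sketch below is illustrative: `rate_limit_headers` is a hypothetical helper, and applying the same ceiling to RateLimit-Reset as to Retry-After is an assumption, since only the Retry-After rounding is stated above.

```python
import math


def rate_limit_headers(budget, usage, now, period_start, period_seconds, rejected):
    """Build response headers for an allowed or rejected request."""
    next_period_start = period_start + period_seconds
    if rejected:
        # ceil(next_period_start - now), never less than 1 second.
        retry_after = max(1, math.ceil(next_period_start - now))
        return {
            "Retry-After": str(retry_after),
            "RateLimit-Limit": str(budget),
            "RateLimit-Remaining": "0",
            "X-Fairvisor-Reason": "budget_exceeded",
        }
    return {
        "RateLimit-Limit": str(budget),
        "RateLimit-Remaining": str(max(0, budget - usage)),
        # Assumption: Reset uses the same ceiling as Retry-After.
        "RateLimit-Reset": str(math.ceil(next_period_start - now)),
    }


allowed = rate_limit_headers(1000, 247, 1761177610.5, 1761177600, 300, False)
# allowed["RateLimit-Remaining"] == "753", allowed["RateLimit-Reset"] == "290"
```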

Failure behavior

If the shared dict increment fails (for example, when the dict is under memory pressure), the algorithm fails open:

  • request is allowed
  • error detail is returned to the caller for logging/metrics
  • traffic is not blocked due to storage failure
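
Fail-open handling can be sketched as a thin wrapper around the increment. This is a model under assumptions: `charge_or_fail_open` is a hypothetical name, and the injected `incr` callable is assumed to return a `(new_value, error)` pair, with `new_value` set to None on failure.

```python
def charge_or_fail_open(incr, key, cost):
    """Try to charge cost; on storage failure, allow and surface the error."""
    new_value, error = incr(key, cost)
    if new_value is None:
        # Storage failed: allow the request, report the error upstream.
        return {"allowed": True, "usage": None, "error": error}
    return {"allowed": None, "usage": new_value, "error": None}


def failing_incr(key, cost):
    # Stand-in for a counter store that cannot allocate memory.
    return None, "no memory"


result = charge_or_fail_open(failing_incr, "cb:daily-budget:org-abc:1761177600", 5)
# result["allowed"] is True and result["error"] == "no memory"
```

In the success case `allowed` is left as None to signal that the normal budget check still has to decide; only a storage failure short-circuits to allow.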

Tuning

  1. Choose budget from real commercial quota (not synthetic RPS)
  2. Start with one warning stage (70-85%) and one throttle stage (90-97%)
  3. Keep throttle delay modest; runtime caps throttle sleep at 30s anyway
  4. Always keep a hard reject at 100%
  5. Use fixed cost first, then move to weighted cost once producers can send trustworthy cost headers

Example

{
  "name": "org-daily-spend",
  "limit_keys": ["jwt:org_id"],
  "algorithm": "cost_based",
  "algorithm_config": {
    "budget": 500,
    "period": "1d",
    "cost_key": "header:x-request-cost",
    "default_cost": 1,
    "staged_actions": [
      { "threshold_percent": 80, "action": "warn" },
      { "threshold_percent": 95, "action": "throttle", "delay_ms": 200 },
      { "threshold_percent": 100, "action": "reject" }
    ]
  }
}

Combine with other controls

Typical production stack:

  • token_bucket for short-term burst control
  • cost_based for period quota
  • token_bucket_llm for token economics on LLM endpoints

All matched rules must pass, so you can enforce both real-time and period constraints together.