Cost-Based Budget

cost_based enforces a cumulative budget per period. It is designed for daily or hourly spend caps where each request has a variable cost (money, units, weighted operations). It degrades gracefully as the budget nears exhaustion and stops hard at the limit. This makes it complementary to token bucket, which limits instantaneous rate rather than total consumption.

How it works

For each limiter key, Fairvisor tracks a single period counter:

current_usage(period, key)

On each request:

Resolve request cost (fixed, header:*, query:*)
Compute current period start (5m, 1h, 1d, 7d)
Atomically increment usage by cost
Compute usage_percent = usage / budget * 100
Pick the highest staged action whose threshold has been reached
Allow / throttle / reject
If rejected, roll back the increment (the request is not charged)
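A minimal Python sketch of the increment, compare and roll back sequence, assuming a plain in-memory dict in place of ngx.shared.fairvisor_counters. The SharedCounters class and the charge function are hypothetical names for illustration. Cost resolution and stage selection are left out: the function returns the usage percentage for the caller to map to a staged action. Periods here are epoch-aligned slots, which matches the 5m, 1h and 1d rows of the alignment table but not the Monday-aligned 7d week.

```python
class SharedCounters:
    """Hypothetical in-memory stand-in for ngx.shared.fairvisor_counters."""

    def __init__(self):
        self._data = {}

    def incr(self, key, delta):
        # Single-threaded stand-in for an atomic increment; a missing key starts at 0.
        self._data[key] = self._data.get(key, 0) + delta
        return self._data[key]

    def get(self, key):
        return self._data.get(key, 0)


def charge(counters, rule_name, limit_key, cost, budget, period_seconds, now):
    """Charge `cost` against the current period. Returns (accepted, usage_percent)."""
    # Epoch-aligned slot: correct for 5m/1h/1d; the Monday-aligned 7d week needs an offset.
    period_start = int(now // period_seconds) * period_seconds
    key = f"cb:{rule_name}:{limit_key}:{period_start}"

    usage = counters.incr(key, cost)       # increment first
    usage_percent = usage * 100 / budget   # same value as usage / budget * 100
    if usage > budget:
        counters.incr(key, -cost)          # roll back: the rejected request is not charged
        return False, usage_percent
    return True, usage_percent             # caller maps usage_percent to a staged action
```

Incrementing first and compensating on rejection keeps accepted requests to a single atomic operation. A side effect of this design is that a concurrent request may briefly see the rejected request's cost before the rollback lands.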

Period alignment

Period	Alignment
`5m`	start of current 5-minute UTC slot (`:00`, `:05`, `:10`, …)
`1h`	start of current UTC hour
`1d`	start of current UTC day
`7d`	start of current UTC week (Monday 00:00 UTC)

Counters need no explicit TTL, because the period start is embedded in the key. When the period changes, a new key is used.
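The alignment table can be expressed as integer arithmetic on Unix seconds. This is a sketch, assuming period starts are computed by flooring the timestamp to the slot size; period_start, period_end and FIRST_MONDAY are hypothetical names. The 7d case shifts by four days because the Unix epoch (1970-01-01) fell on a Thursday, so the first Monday 00:00 UTC is 1970-01-05.

```python
PERIOD_SECONDS = {"5m": 300, "1h": 3600, "1d": 86400, "7d": 604800}

# 1970-01-05 00:00 UTC: the first Monday after the Unix epoch (a Thursday).
FIRST_MONDAY = 4 * 86400


def period_start(now: int, period: str) -> int:
    """Floor a Unix timestamp to the start of its UTC period."""
    size = PERIOD_SECONDS[period]
    offset = FIRST_MONDAY if period == "7d" else 0  # weeks start on Monday, not Thursday
    return (now - offset) // size * size + offset


def period_end(now: int, period: str) -> int:
    """Exclusive end of the period, i.e. the start of the next one."""
    return period_start(now, period) + PERIOD_SECONDS[period]
```

For example, 1761177600 is 2025-10-23 00:00 UTC, a Thursday. Seven minutes and 13 seconds later, the 5m slot starts at 1761177900, the hour and day still start at 1761177600, and the week starts at 1760918400 (Monday 2025-10-20).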

Cost source resolution

cost_key determines where cost is read from:

fixed
header:<name>
query:<name>

If the header or query value is missing, invalid, or non-positive, default_cost is used. The fixed source always charges fixed_cost.
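A sketch of cost resolution for the three sources, assuming headers arrive as a dict with lower-cased names (HTTP header names are case-insensitive) and the query arrives as a raw string. The resolve_cost function is hypothetical, and accepting fractional costs is an assumption of this sketch; the section does not say whether non-integer costs are allowed.

```python
import math
from urllib.parse import parse_qs


def resolve_cost(cost_key, headers, query_string, fixed_cost=1, default_cost=1):
    """Return the cost to charge for one request.

    headers: dict keyed by lower-cased header name (assumption of this sketch).
    query_string: raw query string such as "cost=3&x=1".
    """
    if cost_key == "fixed":
        return fixed_cost

    source, _, name = cost_key.partition(":")
    if source == "header":
        raw = headers.get(name.lower())
    elif source == "query":
        values = parse_qs(query_string).get(name)
        raw = values[0] if values else None
    else:
        raise ValueError(f"unsupported cost_key: {cost_key!r}")

    try:
        cost = float(raw)
    except (TypeError, ValueError):
        return default_cost  # missing (None) or not a number
    if not math.isfinite(cost) or cost <= 0:
        return default_cost  # non-positive, nan or inf
    return cost
```

Falling back to default_cost rather than rejecting means a client that omits or garbles the cost header is still charged something, so the budget cannot be bypassed by sending a bad value.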

Practical pattern:

start with cost_key: fixed and fixed_cost: 1
migrate to weighted pricing with cost_key: header:x-request-cost

Staged actions

staged_actions is a sorted threshold table. The active stage is the highest threshold reached by current usage.

Example:

"staged_actions": [
  { "threshold_percent": 80, "action": "warn" },
  { "threshold_percent": 95, "action": "throttle", "delay_ms": 500 },
  { "threshold_percent": 100, "action": "reject" }
]

Action semantics:

warn: request allowed, warning marker is set
throttle: request delayed by delay_ms, then allowed
reject: request denied with budget_exceeded

Important runtime nuance:

the config must contain a reject stage at 100%
a reject threshold below 100% has no effect while usage is still within budget
actual rejection happens only when usage exceeds the budget
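A sketch of stage selection that follows the nuance above, assuming usage exactly equal to the budget is still allowed (reading "exceeds the budget" literally) and that reject entries in the table are skipped while usage is within budget. The select_stage function is hypothetical; stages are dicts shaped like the JSON entries, and "allow" stands for "no stage reached".

```python
def select_stage(usage, budget, staged_actions):
    """Map post-increment usage to the active stage (a dict shaped like a staged_actions entry)."""
    if usage > budget:
        # Rejection is driven by the budget itself, not by the threshold table.
        return {"action": "reject"}
    usage_percent = usage * 100 / budget  # equivalent to usage / budget * 100
    active = {"action": "allow"}
    for stage in staged_actions:  # thresholds are validated as strictly ascending
        if stage["action"] == "reject":
            continue  # a reject stage never fires while usage is within budget
        if usage_percent >= stage["threshold_percent"]:
            active = stage  # a higher threshold overwrites a lower one
    return active
```

With the example table and a budget of 1000, usage 800 selects warn, 960 selects the 500 ms throttle, 1000 still selects throttle, and only 1001 is rejected.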

Rejection and rollback

When the increment pushes usage over the budget:

Fairvisor performs a compensating dict:incr(..., -cost)
the rejected request is not counted as consumed budget
the response reason is budget_exceeded

This keeps accounting aligned with accepted traffic.

State format

Keys are stored in ngx.shared.fairvisor_counters:

cb:{rule_name}:{composite_limit_key}:{period_start_epoch}

Example:

cb:daily-budget:org-abc:1761177600

Configuration

{
  "algorithm": "cost_based",
  "algorithm_config": {
    "budget": 1000,
    "period": "5m",
    "cost_key": "header:x-cost-units",
    "fixed_cost": 1,
    "default_cost": 1,
    "staged_actions": [
      { "threshold_percent": 80, "action": "warn" },
      { "threshold_percent": 95, "action": "throttle", "delay_ms": 500 },
      { "threshold_percent": 100, "action": "reject" }
    ]
  }
}

Field	Required	Default	Validation
`budget`	yes	-	positive number
`period`	yes	-	one of `5m`, `1h`, `1d`, `7d`
`cost_key`	no	`fixed`	`fixed`, `header:<name>`, `query:<name>`
`fixed_cost`	no	`1`	positive (used for `fixed`)
`default_cost`	no	`1`	positive
`staged_actions`	yes	-	non-empty, strictly ascending thresholds, must include a `reject` stage at `100`

staged_actions[] fields:

Field	Required	Validation
`threshold_percent`	yes	number in `[0, 100]`
`action`	yes	`warn`, `throttle`, `reject`
`delay_ms`	conditional	required and `> 0` for `throttle`
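The two validation tables can be checked mechanically. Below is a sketch that encodes the listed rules and collects all violations instead of stopping at the first one; validate_cost_based and its error strings are hypothetical, and Fairvisor's actual validator and messages may differ.

```python
VALID_PERIODS = {"5m", "1h", "1d", "7d"}
VALID_ACTIONS = {"warn", "throttle", "reject"}


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_cost_based(cfg):
    """Return a list of error strings; an empty list means the config passes these checks."""
    errors = []
    if not (_is_number(cfg.get("budget")) and cfg["budget"] > 0):
        errors.append("budget must be a positive number")
    if cfg.get("period") not in VALID_PERIODS:
        errors.append("period must be one of 5m, 1h, 1d, 7d")

    cost_key = cfg.get("cost_key", "fixed")
    source, sep, name = cost_key.partition(":") if isinstance(cost_key, str) else ("", "", "")
    if not (cost_key == "fixed" or (source in ("header", "query") and sep and name)):
        errors.append("cost_key must be fixed, header:<name> or query:<name>")

    for field in ("fixed_cost", "default_cost"):
        if field in cfg and not (_is_number(cfg[field]) and cfg[field] > 0):
            errors.append(f"{field} must be positive")

    stages = cfg.get("staged_actions")
    if not isinstance(stages, list) or not stages:
        errors.append("staged_actions must be a non-empty list")
        return errors

    previous = None
    for i, stage in enumerate(stages):
        if not isinstance(stage, dict):
            errors.append(f"staged_actions[{i}] must be an object")
            continue
        threshold = stage.get("threshold_percent")
        action = stage.get("action")
        if not (_is_number(threshold) and 0 <= threshold <= 100):
            errors.append(f"staged_actions[{i}].threshold_percent must be in [0, 100]")
            continue
        if previous is not None and threshold <= previous:
            errors.append(f"staged_actions[{i}] breaks strictly ascending order")
        previous = threshold
        if action not in VALID_ACTIONS:
            errors.append(f"staged_actions[{i}].action must be warn, throttle or reject")
        if action == "throttle":
            delay = stage.get("delay_ms")
            if not (_is_number(delay) and delay > 0):
                errors.append(f"staged_actions[{i}].delay_ms must be > 0 for throttle")

    if not any(isinstance(s, dict) and s.get("action") == "reject"
               and s.get("threshold_percent") == 100 for s in stages):
        errors.append("staged_actions must include reject at 100")
    return errors
```

Running it on the configuration shown above returns an empty list.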

Response headers

On allowed requests (in enforce mode):

RateLimit-Limit: 1000
RateLimit-Remaining: 753
RateLimit-Reset: <seconds until period end>

On rejection:

HTTP 429 Too Many Requests
Retry-After: <seconds until next period start>
RateLimit-Limit: 1000
RateLimit-Remaining: 0
X-Fairvisor-Reason: budget_exceeded

Retry-After is ceil(next_period_start - now), minimum 1 second.
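A sketch of how the header values above could be derived, assuming RateLimit-Remaining is the budget minus usage after the current request (clamped at 0) and that RateLimit-Reset is rounded up like Retry-After; the section only says "seconds until period end". allowed_headers and rejected_headers are hypothetical names, and the 429 status itself is set separately.

```python
import math


def allowed_headers(budget, usage, now, period_end):
    """Headers for an allowed request; usage is the counter value after this request."""
    return {
        "RateLimit-Limit": str(budget),
        "RateLimit-Remaining": str(max(0, budget - usage)),
        # Assumption: seconds until period end are rounded up, like Retry-After.
        "RateLimit-Reset": str(max(0, math.ceil(period_end - now))),
    }


def rejected_headers(budget, now, next_period_start):
    """Headers for a 429 response when the budget is exhausted."""
    return {
        # ceil(next_period_start - now), never less than 1 second
        "Retry-After": str(max(1, math.ceil(next_period_start - now))),
        "RateLimit-Limit": str(budget),
        "RateLimit-Remaining": "0",
        "X-Fairvisor-Reason": "budget_exceeded",
    }
```

With a budget of 1000 and usage of 247, Remaining is 753, as in the example above. The 1-second floor stops a client that is rejected right at the period boundary from receiving Retry-After: 0 and retrying in a tight loop.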

Failure behavior

If the shared dict increment fails (for example, because the dict is under memory pressure), the algorithm fails open:

the request is allowed
the error detail is returned to the calling code path for logging and metrics
traffic is never blocked because of a storage failure
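A sketch of the fail-open path. In this Python version a raised exception stands in for the error return of the shared dict increment, and incr is a hypothetical callable(key, delta) returning the new counter value. charge_fail_open is likewise a hypothetical name.

```python
import logging

log = logging.getLogger("cost_based_sketch")


def charge_fail_open(incr, key, cost, budget):
    """Charge cost against key; returns (allowed, error_detail)."""
    try:
        usage = incr(key, cost)
    except Exception as exc:
        # Fail open: storage trouble must not block traffic; surface the error for logs/metrics.
        log.warning("cost_based increment failed for %s: %s", key, exc)
        return True, str(exc)
    if usage > budget:
        incr(key, -cost)  # compensating decrement for the rejected request
        return False, None
    return True, None
```

The trade-off is that, during a storage outage, spend is not counted, so the budget can be overrun until the dict recovers. That is the price of never rejecting traffic because of storage trouble.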

Tuning

Derive the budget from the real commercial quota, not from a synthetic RPS figure
Start with one warning stage (70-85%) and one throttle stage (90-97%)
Keep the throttle delay modest; the runtime caps throttle sleep at 30 s anyway
Always keep a hard reject at 100%
Use fixed cost first, then move to weighted cost once producers can send trustworthy cost headers

Example

{
  "name": "org-daily-spend",
  "limit_keys": ["jwt:org_id"],
  "algorithm": "cost_based",
  "algorithm_config": {
    "budget": 500,
    "period": "1d",
    "cost_key": "header:x-request-cost",
    "default_cost": 1,
    "staged_actions": [
      { "threshold_percent": 80, "action": "warn" },
      { "threshold_percent": 95, "action": "throttle", "delay_ms": 200 },
      { "threshold_percent": 100, "action": "reject" }
    ]
  }
}
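To see where each stage begins in absolute units, multiply the budget by each threshold. This is a small worked sketch for the example above; stage_entry_points is a hypothetical helper that computes budget * threshold_percent / 100.

```python
def stage_entry_points(budget, staged_actions):
    """Usage level at which each stage threshold is reached: budget * threshold_percent / 100."""
    return [(s["action"], budget * s["threshold_percent"] / 100) for s in staged_actions]


example = [
    {"threshold_percent": 80, "action": "warn"},
    {"threshold_percent": 95, "action": "throttle", "delay_ms": 200},
    {"threshold_percent": 100, "action": "reject"},
]
print(stage_entry_points(500, example))
# [('warn', 400.0), ('throttle', 475.0), ('reject', 500.0)]
```

So each org gets warnings from 400 cost units and 200 ms delays from 475. Per the runtime nuance above, a request is rejected only when it would push usage past 500.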

Combine with other controls

Typical production stack:

token_bucket for short-term burst control
cost_based for period quota
token_bucket_llm for token economics on LLM endpoints

All matched rules must pass, so you can enforce both real-time and period constraints together.