Cost-Based Budget
cost_based enforces a cumulative budget per period. It is designed for daily or hourly spend caps where each request has a variable cost (money, units, weighted operations). It supports graceful degradation as the budget nears exhaustion and a hard stop at the limit. This makes it complementary to token bucket, which limits instantaneous rate rather than total consumption.
How it works
For each limiter key, Fairvisor tracks a single period counter:
current_usage(period, key)
On each request:
- Resolve request cost (fixed, header:*, query:*)
- Compute current period start (5m, 1h, 1d, 7d)
- Atomically increment usage by cost
- Compute usage_percent = usage / budget * 100
- Pick the highest matching staged action
- Allow, throttle, or reject
- If rejected, roll back the increment (the request is not charged)
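The steps above can be sketched in Python as follows. This is a minimal, single-process sketch, not Fairvisor's implementation: a plain dict stands in for the shared counter store (so the increment is not atomic across workers), cost resolution and period alignment are assumed to have already happened, and the names check and STAGES are hypothetical. It also assumes that while usage is still within budget, a reached reject stage is skipped and the highest non-reject stage applies.

```python
from typing import Dict, List, Tuple

# Example staged_actions table (ascending thresholds).
STAGES = [
    {"threshold_percent": 80, "action": "warn"},
    {"threshold_percent": 95, "action": "throttle", "delay_ms": 500},
    {"threshold_percent": 100, "action": "reject"},
]


def check(counters: Dict[str, float], key: str, cost: float,
          budget: float, stages: List[dict]) -> Tuple[str, float]:
    """Decide one request. `key` is assumed to already embed the period start.

    Returns (action, usage) where usage is the stored value after accounting.
    """
    # Increment usage by cost (plain dict: NOT atomic across processes).
    usage = counters.get(key, 0) + cost
    counters[key] = usage

    if usage > budget:
        # Over budget: undo the increment (mirrors a compensating decrement)
        # so the rejected request is not charged.
        counters[key] = usage - cost
        return "reject", counters[key]

    usage_percent = usage / budget * 100
    action = "allow"
    for stage in stages:  # ascending threshold_percent, so the last match wins
        # Assumption: reject stages are skipped while usage is within budget.
        if stage["action"] != "reject" and usage_percent >= stage["threshold_percent"]:
            action = stage["action"]
    return action, usage
```

With budget=10 and a cost of 1 per request, requests 1 to 7 are allowed, requests 8 and 9 get warn, request 10 gets throttle, and request 11 is rejected while the stored usage stays at 10.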
Period alignment
| Period | Alignment |
|---|---|
| 5m | start of current 5-minute UTC slot (:00, :05, :10, …) |
| 1h | start of current clock hour |
| 1d | start of current UTC day |
| 7d | start of current UTC week (Monday 00:00 UTC) |
No explicit TTL is required for counters because the period start is embedded in the key: when the period changes, a new key is used.
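A sketch of how a period start and counter key could be derived from the alignment table, in Python. The helper names period_start and counter_key and the integer-seconds clock are assumptions for illustration. The Monday offset follows from 1970-01-01 being a Thursday, so the first Monday 00:00 UTC falls 4 days after the epoch.

```python
# Period lengths in seconds for the supported period values.
PERIOD_SECONDS = {"5m": 300, "1h": 3600, "1d": 86400, "7d": 7 * 86400}

# 1970-01-01 was a Thursday; the first Monday 00:00 UTC is 4 days later.
MONDAY_OFFSET = 4 * 86400


def period_start(now: int, period: str) -> int:
    """Epoch seconds of the UTC-aligned period containing `now` (hypothetical helper)."""
    length = PERIOD_SECONDS[period]
    # Weekly periods are shifted so that they start on Monday, not Thursday.
    offset = MONDAY_OFFSET if period == "7d" else 0
    return (now - offset) // length * length + offset


def counter_key(rule_name: str, composite_limit_key: str, now: int, period: str) -> str:
    """Build a key in the cb:{rule}:{limit_key}:{period_start} format (hypothetical helper)."""
    return f"cb:{rule_name}:{composite_limit_key}:{period_start(now, period)}"
```

For example, counter_key("daily-budget", "org-abc", 1761181200, "1d") returns "cb:daily-budget:org-abc:1761177600". The same call one day later yields a different key, so the old counter is simply no longer read.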
Cost source resolution
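cost_key determines where the request cost is read from:
- fixed: use fixed_cost
- header:<name>: read the named request header
- query:<name>: read the named query parameter

If the value is missing, invalid, or non-positive, default_cost is used.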
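A Python sketch of these resolution rules. The function name resolve_cost, the plain mapping arguments, and the lower-cased header lookup are assumptions for illustration; the real limiter reads request data from the proxy.

```python
import math
from typing import Mapping, Optional


def resolve_cost(cost_key: str, headers: Mapping[str, str],
                 query: Mapping[str, str], fixed_cost: float = 1,
                 default_cost: float = 1) -> float:
    """Return the request cost according to cost_key (hypothetical helper)."""
    if cost_key == "fixed":
        return fixed_cost

    source, _, name = cost_key.partition(":")
    raw: Optional[str] = None
    if source == "header":
        # Assumption: `headers` is keyed by lower-cased header name.
        raw = headers.get(name.lower())
    elif source == "query":
        raw = query.get(name)

    try:
        value = float(raw) if raw is not None else math.nan
    except ValueError:
        value = math.nan  # unparsable, e.g. "abc"

    # Missing, invalid, non-finite, or non-positive values fall back to default_cost.
    if not math.isfinite(value) or value <= 0:
        return default_cost
    return value
```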
Practical pattern:
- start with fixed_cost: 1
- migrate to weighted pricing using header:x-request-cost
Staged actions
staged_actions is a sorted threshold table. The active stage is the highest threshold reached by current usage.
Example:
"staged_actions": [
{ "threshold_percent": 80, "action": "warn" },
{ "threshold_percent": 95, "action": "throttle", "delay_ms": 500 },
{ "threshold_percent": 100, "action": "reject" }
]
Action semantics:
- warn: request allowed, warning marker is set
- throttle: request delayed by delay_ms, then allowed
- reject: request denied with budget_exceeded
Important runtime nuance:
- config must contain reject at 100%
- a reject threshold below 100% is not used while still under budget
- true rejection happens only when usage goes over budget
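Because staged_actions is sorted, finding the active stage amounts to a binary search for the highest threshold not above the current usage percentage. A Python sketch using the standard bisect module; the function name active_stage is hypothetical. Reaching the 100% stage here does not by itself mean a rejection, per the runtime nuance above.

```python
import bisect
from typing import List, Optional

# Example staged_actions table (strictly ascending thresholds).
STAGES = [
    {"threshold_percent": 80, "action": "warn"},
    {"threshold_percent": 95, "action": "throttle", "delay_ms": 500},
    {"threshold_percent": 100, "action": "reject"},
]


def active_stage(stages: List[dict], usage_percent: float) -> Optional[dict]:
    """Return the highest stage whose threshold_percent <= usage_percent.

    `stages` must be sorted by strictly ascending threshold_percent.
    Returns None while usage is below the first threshold.
    """
    thresholds = [s["threshold_percent"] for s in stages]
    i = bisect.bisect_right(thresholds, usage_percent)
    return stages[i - 1] if i > 0 else None
```

With the example table, 50% selects no stage, 80% selects warn, and 97% selects throttle.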
Rejection and rollback
When the increment makes usage exceed the budget:
- Fairvisor performs a compensating dict:incr(..., -cost)
- the rejected request is not counted as consumed budget
- the response reason is budget_exceeded
This keeps accounting aligned with accepted traffic.
State format
Keys are stored in ngx.shared.fairvisor_counters:
cb:{rule_name}:{composite_limit_key}:{period_start_epoch}
Example:
cb:daily-budget:org-abc:1761177600
Configuration
{
"algorithm": "cost_based",
"algorithm_config": {
"budget": 1000,
"period": "5m",
"cost_key": "header:x-cost-units",
"fixed_cost": 1,
"default_cost": 1,
"staged_actions": [
{ "threshold_percent": 80, "action": "warn" },
{ "threshold_percent": 95, "action": "throttle", "delay_ms": 500 },
{ "threshold_percent": 100, "action": "reject" }
]
}
}
| Field | Required | Default | Validation |
|---|---|---|---|
| budget | yes | - | positive number |
| period | yes | - | one of 5m, 1h, 1d, 7d |
| cost_key | no | fixed | fixed, header:<name>, query:<name> |
| fixed_cost | no | 1 | positive (used for fixed) |
| default_cost | no | 1 | positive |
| staged_actions | yes | - | non-empty, strictly ascending thresholds, must include reject@100 |
staged_actions[] fields:
| Field | Required | Validation |
|---|---|---|
| threshold_percent | yes | number in [0, 100] |
| action | yes | warn, throttle, reject |
| delay_ms | conditional | required and > 0 for throttle |
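The validation rules in the two tables can be expressed as a small checker. A Python sketch, assuming the config has already been parsed from JSON into a dict; the function names and error messages are hypothetical and are not Fairvisor's actual messages.

```python
from typing import Any, List

VALID_PERIODS = {"5m", "1h", "1d", "7d"}
VALID_ACTIONS = {"warn", "throttle", "reject"}


def is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate(cfg: dict) -> List[str]:
    """Return validation errors for an algorithm_config (empty list if valid)."""
    errors: List[str] = []

    for field, default in (("budget", None), ("fixed_cost", 1), ("default_cost", 1)):
        value = cfg.get(field, default)
        if not is_number(value) or value <= 0:
            errors.append(f"{field} must be a positive number")

    if cfg.get("period") not in VALID_PERIODS:
        errors.append("period must be one of 5m, 1h, 1d, 7d")

    cost_key = cfg.get("cost_key", "fixed")
    source, _, name = str(cost_key).partition(":")
    if cost_key != "fixed" and not (source in ("header", "query") and name):
        errors.append("cost_key must be fixed, header:<name> or query:<name>")

    stages = cfg.get("staged_actions")
    if not isinstance(stages, list) or not stages:
        errors.append("staged_actions must be a non-empty list")
        return errors

    previous = None
    for i, stage in enumerate(stages):
        if not isinstance(stage, dict):
            errors.append(f"staged_actions[{i}] must be an object")
            continue
        threshold = stage.get("threshold_percent")
        if not is_number(threshold) or not 0 <= threshold <= 100:
            errors.append(f"staged_actions[{i}].threshold_percent must be in [0, 100]")
        elif previous is not None and threshold <= previous:
            errors.append("staged_actions thresholds must be strictly ascending")
        else:
            previous = threshold
        action = stage.get("action")
        if action not in VALID_ACTIONS:
            errors.append(f"staged_actions[{i}].action must be warn, throttle or reject")
        if action == "throttle":
            delay = stage.get("delay_ms")
            if not is_number(delay) or delay <= 0:
                errors.append(f"staged_actions[{i}].delay_ms must be > 0 for throttle")

    if not any(isinstance(s, dict) and s.get("action") == "reject"
               and s.get("threshold_percent") == 100 for s in stages):
        errors.append("staged_actions must include reject at 100")
    return errors
```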
Response headers
On allowed requests (in enforce mode):
RateLimit-Limit: 1000
RateLimit-Remaining: 753
RateLimit-Reset: <seconds until period end>
On rejection:
HTTP 429 Too Many Requests
Retry-After: <seconds until next period start>
RateLimit-Limit: 1000
RateLimit-Remaining: 0
X-Fairvisor-Reason: budget_exceeded
Retry-After is ceil(next_period_start - now), minimum 1 second.
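The header values follow directly from the period boundaries. A Python sketch with hypothetical helper names: the Retry-After formula is the one stated above, while computing RateLimit-Remaining as budget minus usage and rounding RateLimit-Reset the same way as Retry-After are assumptions. It also assumes integer budget and usage so the values format without decimals.

```python
import math
from typing import Dict


def seconds_until(target: float, now: float) -> int:
    """Whole seconds until `target`, rounded up, never less than 1."""
    return max(1, math.ceil(target - now))


def allowed_headers(budget: int, usage: int, period_end: float, now: float) -> Dict[str, str]:
    # Assumption: Remaining = budget - usage (floored at 0), and Reset uses
    # the same rounding as Retry-After.
    return {
        "RateLimit-Limit": str(budget),
        "RateLimit-Remaining": str(max(0, budget - usage)),
        "RateLimit-Reset": str(seconds_until(period_end, now)),
    }


def rejection_headers(budget: int, next_period_start: float, now: float) -> Dict[str, str]:
    return {
        "Retry-After": str(seconds_until(next_period_start, now)),
        "RateLimit-Limit": str(budget),
        "RateLimit-Remaining": "0",
        "X-Fairvisor-Reason": "budget_exceeded",
    }
```

For a budget of 1000 with 247 units used, allowed_headers reports RateLimit-Remaining: 753, matching the example above.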
Failure behavior
If the shared dict increment fails (for example, under dict memory pressure), the algorithm fails open:
- the request is allowed
- the error detail is returned to the caller path for logging and metrics
- traffic is not blocked because of a storage failure
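A Python sketch of the fail-open path. The increment callable, which returns (new_value, None) on success or (None, error) on failure, is a hypothetical stand-in for the shared dict increment, and the IncrementResult type, the helper names, and the "no memory" error string are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical increment: (key, delta) -> (new_value, None) or (None, error).
IncrFn = Callable[[str, float], Tuple[Optional[float], Optional[str]]]


@dataclass
class IncrementResult:
    usage: Optional[float]       # new usage, or None if storage failed
    fail_open: bool = False      # True: skip enforcement and allow the request
    error: Optional[str] = None  # error detail for logging/metrics


def increment_usage(incr: IncrFn, key: str, cost: float) -> IncrementResult:
    usage, err = incr(key, cost)
    if err is not None:
        # Fail open: a storage failure must not block traffic.
        return IncrementResult(usage=None, fail_open=True, error=err)
    # On success, the caller continues with the budget check and staged actions.
    return IncrementResult(usage=usage)


def dict_incr(store: Dict[str, float]) -> IncrFn:
    """Working in-memory increment, for comparison."""
    def incr(key: str, delta: float) -> Tuple[Optional[float], Optional[str]]:
        store[key] = store.get(key, 0) + delta
        return store[key], None
    return incr


def full_incr(key: str, delta: float) -> Tuple[Optional[float], Optional[str]]:
    """Simulates an increment that fails under memory pressure."""
    return None, "no memory"
```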
Tuning
- Choose the budget from the real commercial quota (not synthetic RPS)
- Start with one warning stage (70-85%) and one throttle stage (90-97%)
- Keep the throttle delay modest; the runtime caps throttle sleep at 30s anyway
- Always keep a hard reject at 100%
- Use fixed cost first, then move to weighted cost once producers can send trustworthy cost headers
Example
{
"name": "org-daily-spend",
"limit_keys": ["jwt:org_id"],
"algorithm": "cost_based",
"algorithm_config": {
"budget": 500,
"period": "5m",
"cost_key": "header:x-request-cost",
"default_cost": 1,
"staged_actions": [
{ "threshold_percent": 80, "action": "warn" },
{ "threshold_percent": 95, "action": "throttle", "delay_ms": 200 },
{ "threshold_percent": 100, "action": "reject" }
]
}
}
Combine with other controls
Typical production stack:
- token_bucket for short-term burst control
- cost_based for period quota
- token_bucket_llm for token economics on LLM endpoints
All matched rules must pass, so you can enforce both real-time and period constraints together.