Runbook: Rate Limit by User
Purpose / When to use
Use this runbook when you need stable per-user fairness and protection against noisy clients without throttling whole tenants.
Blast radius & risk level
- Risk level: medium
- Typical impact if misconfigured: a high 429 rate for legitimate traffic, either because many clients share one identity key or because the descriptor is missing
Signals / symptoms
- One user can starve shared backend capacity
- Per-IP limits look healthy, but user-level abuse still passes
- fairvisor_descriptor_missing_total grows for your identity key
Detection queries
sum by (reason) (rate(fairvisor_decisions_total{action="reject"}[5m]))
rate(fairvisor_descriptor_missing_total{key="jwt:sub"}[5m])
rate(fairvisor_descriptor_missing_total{key="header:x-user-id"}[5m])
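If you want to run the descriptor-missing query from a script rather than a dashboard, the following is a minimal sketch using the standard Prometheus HTTP API (/api/v1/query). The Prometheus address and the helper names are assumptions for illustration, not part of Fairvisor.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable here; adjust for your environment.
PROMETHEUS_URL = "http://localhost:9090"


def missing_rate_query(key: str) -> str:
    """Build the PromQL expression from this runbook for one identity key."""
    return f'rate(fairvisor_descriptor_missing_total{{key="{key}"}}[5m])'


def build_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def max_sample(response: dict) -> float:
    """Largest sample value in an instant-vector response, or 0.0 if empty."""
    results = response.get("data", {}).get("result", [])
    return max((float(r["value"][1]) for r in results), default=0.0)


def descriptor_missing_rate(key: str) -> float:
    """Query the per-second rate of missing descriptors for one identity key."""
    url = build_query_url(missing_rate_query(key))
    with urllib.request.urlopen(url, timeout=10) as resp:
        return max_sample(json.load(resp))
```

Calling descriptor_missing_rate("jwt:sub") before and after a gateway change gives a quick comparison; a value that stays above zero suggests some traffic paths still lack the identity field.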
Optional header trace:
curl -i -X POST http://localhost:8080/v1/decision \
  -H 'X-Original-Method: GET' \
  -H 'X-Original-URI: /api/v1/items' \
  -H 'Authorization: Bearer <jwt>'
Triage checklist
- Pick the identity source (jwt:sub preferred, header:x-user-id fallback).
- Confirm identity field exists on production traffic paths.
- Confirm gateway forwards required headers consistently.
- Validate no descriptor missing spikes before enforcement.
- Confirm route selector scope is narrow enough (avoid global accidental coverage).
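To spot-check the identity field on sampled tokens, a sketch like the one below can help. It assumes that jwt:sub refers to the standard sub claim in the JWT payload, and it deliberately does NOT verify the signature, so treat it as a triage aid only. The function names are hypothetical.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if not isinstance(claims, dict):
        raise ValueError("JWT payload is not a JSON object")
    return claims


def has_subject(token: str) -> bool:
    """True when the token carries a non-empty string `sub` claim."""
    try:
        sub = jwt_claims(token).get("sub")
    except ValueError:  # also covers base64 and JSON decode errors
        return False
    return isinstance(sub, str) and sub != ""
```

Running has_subject over a sample of production tokens, one per traffic path, shows which paths would produce missing descriptors if the policy were enforced.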
Mitigation playbook
Safe-first path:
- Create policy with mode: shadow and per-user token bucket.
- Observe would-reject volume for at least one traffic cycle.
- Tune tokens_per_second and burst until false positives are acceptable.
- Promote to mode: enforce with monotonic bundle_version bump.
Reference policy snippet:
{
  "id": "api-per-user-limit",
  "spec": {
    "mode": "shadow",
    "selector": { "pathPrefix": "/api/" },
    "rules": [
      {
        "name": "per-user-rps",
        "limit_keys": ["jwt:sub"],
        "algorithm": "token_bucket",
        "algorithm_config": {
          "tokens_per_second": 10,
          "burst": 20
        }
      }
    ]
  }
}
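To build intuition for how tokens_per_second and burst interact while tuning, here is a minimal sketch of a generic per-user token bucket. It illustrates the textbook algorithm only; Fairvisor's actual implementation may differ, and the class names are hypothetical.

```python
class TokenBucket:
    """Generic token bucket: refills continuously, holds at most `burst` tokens."""

    def __init__(self, tokens_per_second: float, burst: float, now: float = 0.0):
        self.rate = tokens_per_second
        self.burst = burst
        self.tokens = float(burst)  # a new bucket starts full
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # each request costs one token
            return True
        return False


class PerUserLimiter:
    """Keeps one independent bucket per identity value (for example a jwt:sub)."""

    def __init__(self, tokens_per_second: float, burst: float):
        self.rate = tokens_per_second
        self.burst = burst
        self.buckets = {}

    def allow(self, user: str, now: float) -> bool:
        bucket = self.buckets.get(user)
        if bucket is None:
            bucket = self.buckets[user] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)


limiter = PerUserLimiter(tokens_per_second=10, burst=20)
allowed = sum(limiter.allow("alice", now=0.0) for _ in range(25))
print(allowed)  # 20: the burst is spent, the remaining 5 would be rejected
```

With the reference values, a user can send 20 requests at once, then sustain 10 per second. A noisy user draining their own bucket does not affect any other user's bucket, which is the fairness property this runbook targets.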
Fallback identity variant:
"limit_keys": ["header:x-user-id"]
Verification checklist
- Reject reason distribution is stable and expected.
- fairvisor_descriptor_missing_total is near zero for chosen key.
- No unexpected 429 surge on core endpoints.
- RateLimit-* headers align with expected user-level buckets.
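The header check above can be partly scripted. The sketch below collects any RateLimit-* response headers and applies one sanity check: the remaining count should never exceed the configured burst. The RateLimit-Remaining name and its plain-integer value are assumptions; confirm which RateLimit-* fields your deployment actually emits.

```python
from typing import Optional


def ratelimit_headers(headers: dict) -> dict:
    """Collect RateLimit-* headers, keyed by the lower-cased suffix."""
    prefix = "ratelimit-"
    return {
        name.lower()[len(prefix):]: value.strip()
        for name, value in headers.items()
        if name.lower().startswith(prefix)
    }


def remaining_within_burst(fields: dict, burst: int) -> Optional[bool]:
    """True/False when a numeric `remaining` field exists, None if it cannot be checked."""
    raw = fields.get("remaining")  # assumed header: RateLimit-Remaining
    if raw is None or not raw.isdigit():
        return None
    return int(raw) <= burst
```

A result of False for a user who should be on the per-user policy suggests the response was governed by a different, larger bucket than expected.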
Exit criteria
- No sustained user-facing error regression
- Per-user fairness objective achieved
- Alert noise stays within team threshold
Rollback / recovery path
- Switch policy back to mode: shadow.
- If still noisy, remove policy and redeploy known-good bundle.
- Verify reject-rate baseline recovery.
Post-incident notes
Record:
- chosen identity key and reason
- final threshold values
- false-positive examples
- gateway forwarding fixes applied
Do not
- Do not enforce per-user limits before validating descriptor presence.
- Do not combine multiple new identity keys in one rollout.
- Do not rely on IP as the primary identity for authenticated APIs.