How It Works
Fairvisor is a decision layer between your gateway and upstream API. Every request stops here, gets evaluated against your policy, and receives an allow or reject verdict — with machine-readable metadata your gateway can act on immediately.
TL;DR
- Put Fairvisor between gateway and upstream.
- Enforce rate, budget, LLM token, kill switch, and loop controls from one policy bundle.
- Get deterministic 429/503 reasons and standards-based headers for clients and observability.
Start here based on deployment mode:
- You already have a gateway (nginx/Envoy/Kong/Traefik): use decision-service mode (/v1/decision) via an auth hook. See Quickstart and Gateway integrations.
- You want inline proxying with no external gateway component: use reverse-proxy mode. See Standalone Mode and Docker.
A policy is a declarative JSON bundle you control: it defines which requests are rate-limited by what identity key, which LLM endpoints have token budgets, and which segments can be instantly blocked in an incident. Policies are hot-reloaded without restarting the process — push a new bundle, enforcement changes within seconds.
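To make the shape of a bundle concrete, here is a sketch. The limiter names (token_bucket, token_bucket_llm) and identity-key syntax (jwt:org_id) come from this page; every field name and value is illustrative, not the actual bundle schema — see Bundle Structure for the real format.

```json
{
  "version": 42,
  "rules": [
    {
      "name": "per-user-rpm",
      "limiter": "token_bucket",
      "key": ["jwt:user_id"],
      "rate_per_minute": 600,
      "burst": 100
    },
    {
      "name": "org-llm-tokens",
      "limiter": "token_bucket_llm",
      "key": ["jwt:org_id"],
      "tpm": 120000,
      "tpd": 2000000
    },
    {
      "name": "block-rogue-tenant",
      "kill_switch": { "match": "jwt:org_id=acme-rogue", "ttl_seconds": 3600 }
    }
  ]
}
```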
No external dependency sits in the decision path. State lives in in-process shared memory; decisions return in well under a millisecond. The result is a consistent enforcement point that scales with your gateway, not with a distributed locking system.
How a request flows
Decision service — Fairvisor runs as a sidecar. Your existing gateway calls /v1/decision via auth_request or ext_authz and handles forwarding itself.
Reverse proxy — Fairvisor sits inline. There is no separate gateway: traffic arrives at Fairvisor directly, gets evaluated, and is forwarded to the backend if allowed.
Both modes use the same policy bundle format and return the same X-Fairvisor-Reason / RateLimit-* rejection headers. See Quickstart to run locally in 5 minutes, or SaaS Connection for managed setup.
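In decision-service mode, the gateway-side logic reduces to two steps: build a decision request, then translate the verdict into a client response. The payload and verdict shapes below are illustrative guesses, not the documented /v1/decision schema:

```python
import json

def build_decision_request(method: str, path: str, headers: dict) -> str:
    # Illustrative request body; consult the /v1/decision reference
    # for the real schema.
    return json.dumps({"method": method, "path": path, "headers": headers})

def to_gateway_response(decision: dict) -> tuple:
    # Map a verdict onto what the gateway returns to the client.
    if decision.get("allow", False):
        return 200, {}
    return 429, {
        "X-Fairvisor-Reason": decision.get("reason", "unknown"),
        "Retry-After": str(decision.get("retry_after", 1)),
    }
```

In a real deployment the gateway's auth hook (auth_request, ext_authz) performs this call for you; the sketch only shows the mapping.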
The 429 response
When Fairvisor rejects a request, the gateway returns:
```http
HTTP/1.1 429 Too Many Requests
X-Fairvisor-Reason: tpm_exceeded
Retry-After: 12
RateLimit: "llm-default";r=0;t=12
RateLimit-Limit: 120000
RateLimit-Remaining: 0
RateLimit-Reset: 12
```
X-Fairvisor-Reason gives clients a machine-readable rejection code for retry logic and observability. The limiter metadata fields (RateLimit, RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset) follow the IETF RateLimit header fields draft (draft-ietf-httpapi-ratelimit-headers).
See Reasons reference for the full rejection code taxonomy.
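Client-side, a retry helper can branch on these headers. The header names match the example response above; which reason codes are safe to retry is an assumption here, not something this page specifies:

```python
# Assumption: these rejections clear with time, so retrying makes sense;
# kill_switch_active and loop_detected mean "stop", not "retry".
RETRYABLE = {"rate_limit_exceeded", "tpm_exceeded", "tpd_exceeded",
             "budget_exhausted"}

def retry_delay(status: int, headers: dict):
    """Seconds to wait before retrying, or None to give up."""
    if status != 429:
        return None
    if headers.get("X-Fairvisor-Reason") not in RETRYABLE:
        return None
    return int(headers.get("Retry-After", "1"))
```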
What Fairvisor enforces
Decision matrix
| If you need to… | Use | Typical identity keys | Typical reason on reject |
|---|---|---|---|
| Cap request frequency | token_bucket | jwt:user_id, header:x-api-key, ip:addr | rate_limit_exceeded |
| Cap cumulative spend per period | cost_based | jwt:org_id, jwt:plan | budget_exhausted |
| Cap LLM token usage (TPM/TPD) | token_bucket_llm | jwt:org_id, jwt:user_id | tpm_exceeded, tpd_exceeded |
| Instantly block a risky segment | kill switch | any descriptor value | kill_switch_active |
| Dry-run before enforcing | shadow mode | any descriptor value | allow + would_reject telemetry |
| Stop runaway agent loops | loop detection | request fingerprint + descriptors | loop_detected |
| Clamp catastrophic spend spikes | circuit breaker | global or policy scope | circuit_breaker_open |
Rate limiter
The rate limiter caps request frequency per identity key — a JWT claim (tenant id, pricing plan, user id), an HTTP header (API key or any arbitrary header), or an IP attribute (address, geo-country, network type). Each unique key value gets its own independent counter bucket. Short bursts are allowed up to a configurable capacity; the steady-state rate is enforced continuously. On rejection, Retry-After is deterministically jittered per identity to prevent thundering-herd retries when many clients hit the limit simultaneously.
See Token Bucket and Rules & Descriptors.
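The mechanics above can be sketched in a few lines. This is a minimal illustration of the token-bucket idea and of deterministic per-identity jitter, not Fairvisor's implementation:

```python
import hashlib

class TokenBucket:
    """Burst up to `capacity` requests; refill at `rate` tokens/second."""
    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, now

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, then spend one token if available.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

def jittered_retry_after(identity: str, base: int, spread: int = 5) -> int:
    # Deterministic jitter: the same identity always gets the same offset,
    # so a rejected fleet doesn't retry in lockstep.
    h = int(hashlib.sha256(identity.encode()).hexdigest(), 16)
    return base + h % spread
```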
Budget limiter
Budget limiter tracks cumulative consumption over a fixed time window — 5 minutes, hourly, daily, or weekly. Unlike the rate limiter, it measures total spend rather than instantaneous rate, making it the right tool for org-level quotas, per-plan API unit caps, and cost guardrails. Staged actions let you warn at 80%, throttle at 95%, and hard-reject at 100%. Rejected requests are rolled back — they don’t consume budget. Retry-After points to the next period boundary.
See Cost-Based Budget.
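The staged actions and rollback behavior described above can be sketched as follows; thresholds (80/95/100%) come from this page, everything else is illustrative:

```python
class Budget:
    """Cumulative-spend sketch with staged actions."""
    def __init__(self, limit: float):
        self.limit, self.spent = limit, 0.0

    def charge(self, cost: float) -> str:
        if self.spent + cost > self.limit:
            return "reject"          # rolled back: rejected spend never lands
        self.spent += cost
        frac = self.spent / self.limit
        if frac >= 0.95:
            return "throttle"
        if frac >= 0.80:
            return "warn"
        return "allow"
```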
LLM token limiter
LLM token limiter enforces per-minute (TPM) and per-day (TPD) token budgets for LLM endpoints. At request time it reserves tokens pessimistically — using request.max_tokens or a configurable default — so runaway prompts are blocked before they reach the model. After the response arrives, unused tokens are refunded. For streaming responses, mid-stream truncation is available when the completion budget runs out during generation.
See LLM Token Limiter and Streaming behavior.
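The reserve-then-refund cycle can be sketched for a per-minute budget; the default reservation value is hypothetical:

```python
DEFAULT_MAX_TOKENS = 1024  # hypothetical default when max_tokens is absent

class MinuteTokenBudget:
    """Pessimistic reservation at request time, refund after the response."""
    def __init__(self, tpm_limit: int):
        self.limit, self.used = tpm_limit, 0

    def reserve(self, max_tokens=None):
        want = max_tokens or DEFAULT_MAX_TOKENS
        if self.used + want > self.limit:
            return None              # blocked up front: tpm_exceeded
        self.used += want
        return want

    def settle(self, reserved: int, actual: int):
        # Refund the unused part once the response reports actual usage.
        self.used -= max(0, reserved - actual)
```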
Kill switch
Kill switch blocks a traffic segment instantly, matched on any descriptor (JWT claim, header value, IP, user-agent). It takes effect on the next request, with no restart. An optional TTL lets switches expire automatically — no manual cleanup needed. The primary use case is incident response: stop now, investigate later.
See Kill Switch and Incident runbook.
Shadow mode
Shadow mode runs enforcement in dry-run: rules evaluate, counters update, rejection decisions are computed and logged — but traffic is never actually blocked. Roll out a new policy in shadow first to validate it against real traffic before switching to enforce.
See Shadow Mode and Shadow rollout runbook.
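The shadow/enforce split is essentially one branch. The would_reject telemetry value comes from the decision matrix above; the function shape is illustrative:

```python
def decide(would_allow: bool, mode: str):
    """In shadow, compute and surface the verdict but never block."""
    verdict = "allow" if would_allow else "reject"
    if mode == "shadow":
        telemetry = "would_reject" if verdict == "reject" else None
        return "allow", telemetry
    return verdict, None
```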
Loop detection and circuit breaker
Loop detection blocks repeated identical requests from agentic workflows that have entered a runaway loop. The circuit breaker trips when spend rate exceeds a threshold and auto-resets after a cooldown — protecting upstream from burst cost spikes.
See Loop Detector and Core reference.
When edge is unavailable
When Fairvisor cannot serve a decision (no bundle loaded, process not started, or internal error), it returns HTTP 503 with X-Fairvisor-Reason: no_bundle_loaded or service_unavailable.
What your gateway does with that 503 depends on your configuration:
- nginx auth_request: fails closed by default — the request is rejected. To fail open, add an error_page directive on the auth location.
- Envoy ext_authz: configurable via failure_mode_allow.
- Kong / Traefik: see their respective plugin docs.
Choose fail-open vs fail-closed per endpoint risk level. See Gateway Failure Policy.
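As a concrete illustration of the nginx fail-open pattern, here is a sketch. Addresses, upstream names, and location names are hypothetical; verify against your own auth_request setup:

```nginx
# Decision subrequest to the Fairvisor sidecar (hypothetical address).
location = /_fairvisor_auth {
    internal;
    proxy_pass              http://127.0.0.1:9090/v1/decision;
    proxy_pass_request_body off;
    proxy_set_header        Content-Length "";
}

location /api/ {
    auth_request /_fairvisor_auth;
    # Fail open: map a 5xx from the decision subrequest to the upstream.
    error_page 500 502 503 = @fail_open;
    proxy_pass http://upstream_api;
}

location @fail_open {
    proxy_pass http://upstream_api;
}
```

Dropping the error_page line restores the default fail-closed behavior.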
SaaS control plane
In SaaS mode, edge maintains a continuous control loop: registration, heartbeat, config pull with ack, and batched event flush. Policy updates reach the edge within one poll interval without touching your infrastructure. See SaaS Connection.
For operator visibility, pair this with Metrics, SLO & Alerting, and Troubleshooting.
Frequently Asked Questions
What is Fairvisor?
Fairvisor is an open-source policy enforcement engine for API rate limiting, LLM cost control, and agentic loop protection. It runs between your API gateway and upstream services, evaluating every request against a declarative policy bundle and returning a deterministic allow or reject verdict — with machine-readable rejection headers, sub-millisecond latency, and no external database required.
How is Fairvisor different from LiteLLM?
LiteLLM is a proxy that unifies multiple LLM provider APIs under a single OpenAI-compatible interface. Fairvisor is a policy enforcement layer — it does not route to providers. They are complementary: run Fairvisor in front of LiteLLM to enforce rate limits, cost budgets, and kill switches on top of LiteLLM’s provider routing. See the LiteLLM integration guide or the full Fairvisor vs. LiteLLM comparison.
How is Fairvisor different from Kong or nginx rate limiting?
Kong and nginx provide basic request-per-second limiting, typically keyed on IP. Fairvisor adds:
- Identity-aware limiting on JWT claims, API keys, and arbitrary headers
- LLM-specific controls: TPM/TPD budgets, cost tracking, and post-response token reconciliation
- Kill switches, loop detection, and circuit breakers
- Hot-reloadable policy bundles without restart
- Standards-based rejection headers (IETF RateLimit header fields draft) and machine-readable reason codes
Fairvisor integrates alongside Kong and nginx — it is not a replacement. Full comparisons: Fairvisor vs. Kong, Fairvisor vs. nginx rate limiting.
How is Fairvisor different from Envoy’s rate limit service?
Envoy’s built-in rate limit service offers descriptor-based counting. Fairvisor adds LLM-aware controls (token budgets, cost tracking), kill switches, loop detection, circuit breakers, shadow mode for safe rollouts, and hot-reloadable policy bundles. Fairvisor integrates with Envoy via ext_authz and works alongside Envoy’s existing capabilities. See Fairvisor vs. Envoy Rate Limiting.
Does Fairvisor work with OpenAI?
Yes. Fairvisor is LLM-provider-agnostic. It enforces policy on any HTTP API request — including OpenAI, Anthropic, Google, Azure OpenAI, Mistral, and self-hosted models. The LLM token limiter reads token usage from standard response fields and works with any provider that follows OpenAI-compatible usage reporting.
Does Fairvisor work with Anthropic Claude?
Yes. Fairvisor treats Anthropic Claude endpoints as standard HTTP upstreams. Rate limiting, cost budgets, kill switches, and loop detection apply regardless of LLM provider.
Is Fairvisor open source?
Yes. Fairvisor Edge is open-source, available at github.com/fairvisor/edge. A managed SaaS control plane is available separately for teams that want centralized policy management, audit logs, and real-time observability without self-hosting the control plane.
Does Fairvisor require a database?
No. All enforcement state lives in in-process shared memory (ngx.shared.dict). There is no Redis, Postgres, or any external datastore in the decision path. Restarting the process resets in-memory counters — expected behavior for short-window rate limiters.
How fast is Fairvisor?
Fairvisor adds sub-millisecond latency to the decision path. State lives in in-process memory; no network round-trips occur per decision. For gateway integrations via auth_request or ext_authz, the added latency is well under 1ms for allow decisions. Decision latency is exposed per-request in the X-Fairvisor-Latency-Us header for observability.
Can I use Fairvisor without an existing API gateway?
Yes. Fairvisor’s reverse-proxy mode operates standalone: traffic arrives at Fairvisor directly, gets evaluated, and Fairvisor proxies the request upstream if allowed. No external gateway is required. See Standalone Mode and Docker quickstart.
What happens when Fairvisor is unavailable?
When Fairvisor cannot serve a decision, it returns HTTP 503. What your gateway does depends on configuration:
- nginx auth_request: fails closed by default (request rejected). Add error_page on the auth location to fail open.
- Envoy ext_authz: configurable via failure_mode_allow.
- Kong / Traefik: see their plugin documentation.
Choose fail-open vs fail-closed per endpoint risk level. See Gateway Failure Policy.
How do I update policies without downtime?
Policies hot-reload without restarting the process. Push a new bundle (via SaaS or file update) and enforcement changes within seconds — typically within one poll interval (configurable, default ~10 seconds). No nginx reload or process restart is needed.
Does Fairvisor support streaming LLM responses?
Yes. The LLM token limiter supports mid-stream truncation: if the completion token budget runs out during a streaming SSE response, Fairvisor can terminate the stream. Token refunds are applied after the stream completes using actual usage from the final chunk. See Streaming behavior.
What identity keys does Fairvisor support?
Fairvisor can key rate limits and budgets on:
- JWT claims — any field, e.g. jwt:user_id, jwt:org_id, jwt:plan
- HTTP headers — API keys, tenant headers, or any arbitrary header, e.g. header:x-api-key
- IP attributes — remote address, geo-country, network type (residential / datacenter / Tor), e.g. ip:addr, ip:country
Multiple keys can be combined per rule for compound identity matching.
Does Fairvisor support multi-tenancy?
Yes. Each unique value of an identity key gets its own independent counter bucket. A single policy rule keyed on jwt:org_id automatically creates isolated enforcement buckets for every organization — no per-tenant configuration required.
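The per-key bucket behavior can be shown in miniature; the quota value is arbitrary and the counter stands in for a real limiter:

```python
from collections import defaultdict

QUOTA = 3  # arbitrary per-org allowance, for illustration only

# One independent counter per identity value: a new org's bucket is
# created on its first request, with no per-tenant configuration.
buckets = defaultdict(lambda: QUOTA)

def allow(org_id: str) -> bool:
    if buckets[org_id] > 0:
        buckets[org_id] -= 1
        return True
    return False
```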
What is a kill switch?
A kill switch instantly blocks all traffic matching a descriptor — a JWT claim value, header value, or IP attribute. It takes effect on the next request with no restart, and supports an optional TTL for automatic expiry. The primary use case is incident response: block a misbehaving tenant, model endpoint, or IP range now and investigate later. See Kill Switch and the incident runbook.
How does shadow mode work?
Shadow mode runs enforcement in dry-run: rules evaluate, counters update, and rejection decisions are computed and logged — but traffic is never blocked. The would_reject outcome is visible in response headers and metrics. Roll out a new policy in shadow first to validate it against real production traffic before switching to enforce. See Shadow Mode.
How does Fairvisor stop agentic loops?
The loop detector identifies repeated identical requests from a single identity within a configurable time window — the signature of an agent stuck in a retry or tool-call loop. When the loop threshold is crossed, Fairvisor rejects subsequent requests with loop_detected. The window and threshold are configurable per rule. See Loop Detector.
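A fingerprint-plus-sliding-window sketch of the idea; the window and threshold values are hypothetical, not Fairvisor defaults:

```python
import hashlib
from collections import defaultdict, deque

WINDOW, THRESHOLD = 60.0, 5          # hypothetical per-rule settings

history = defaultdict(deque)         # fingerprint -> recent timestamps

def check(identity: str, method: str, path: str, body: str, now: float) -> str:
    # Fingerprint the request; identical requests hash to the same queue.
    fp = hashlib.sha256(f"{identity}|{method}|{path}|{body}".encode()).hexdigest()
    q = history[fp]
    while q and now - q[0] > WINDOW:  # drop entries outside the window
        q.popleft()
    q.append(now)
    return "loop_detected" if len(q) > THRESHOLD else "allow"
```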
How does the circuit breaker work?
The circuit breaker trips automatically when the spend rate across an endpoint exceeds a configured threshold, and resets after a cooldown period. Unlike a kill switch (which is operator-triggered), the circuit breaker is fully automatic: it protects upstream from burst cost spikes without operator intervention. After cooldown, the circuit closes and normal enforcement resumes. See Circuit Breaker.
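The trip-and-cooldown cycle can be sketched as a windowed spend counter; all parameter names and values here are illustrative:

```python
class SpendBreaker:
    """Trips when spend within the window exceeds a threshold;
    auto-resets after the cooldown elapses."""
    def __init__(self, threshold: float, window: float, cooldown: float):
        self.threshold, self.window, self.cooldown = threshold, window, cooldown
        self.spend, self.window_start, self.open_until = 0.0, 0.0, None

    def record(self, cost: float, now: float):
        if now - self.window_start > self.window:
            self.spend, self.window_start = 0.0, now   # new window
        self.spend += cost
        if self.spend > self.threshold:
            self.open_until = now + self.cooldown      # trip the circuit

    def allow(self, now: float) -> bool:
        if self.open_until is not None and now < self.open_until:
            return False     # open: reject with circuit_breaker_open
        self.open_until = None
        return True          # cooldown elapsed: closed, normal enforcement
```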
What is the difference between decision-service mode and reverse-proxy mode?
In decision-service mode, Fairvisor runs as a sidecar. Your existing gateway (nginx, Envoy, Kong) calls /v1/decision via an auth hook and handles forwarding itself. Fairvisor only returns the allow/reject verdict. In reverse-proxy mode, traffic arrives at Fairvisor directly, which evaluates and proxies allowed requests to the upstream in one step — no separate gateway needed.
What is a policy bundle?
A policy bundle is a declarative JSON file that defines all enforcement rules: which endpoints are rate-limited, by what identity key, at what thresholds, and with what actions (warn / throttle / reject). One bundle controls all limiters — rate, budget, tokens, kill switches, and loop detection — for a Fairvisor instance. Bundles are versioned and hot-reloaded. See Bundle Structure.
Does Fairvisor support Kubernetes?
Yes. Fairvisor ships a Helm chart for Kubernetes deployment with configurable replica count, resource limits, and health check probes. For multi-replica high-availability setups, see HA Installation.
Does Fairvisor have a SaaS version?
Fairvisor Edge (the enforcement engine) is self-hosted and open source. A managed SaaS control plane is available separately, providing centralized policy push, audit logs, event streaming, and real-time observability — without managing the control plane infrastructure yourself. See SaaS Connection.
Why OpenResty/LuaJIT?
In-process execution. No network calls. No serialization overhead. LuaJIT compiles to native code. This is how you get 100-microsecond decisions without an external dependency in the hot path.
Why do all rules have to pass?
Multiple rules can apply to the same request. All of them must allow it. This lets you compose policies — rate limit + budget + loop detection — without complex priority logic or rule ordering.
Why claims-based keying instead of IP-based?
IPs don’t map to customers. JWTs do. Fairvisor extracts org_id, user_id, plan tier — whatever your auth system puts in the token — and uses those as limit keys. IP-based limiting is still available for abuse controls.
Why does the edge run autonomously from the control plane?
The SaaS control plane is for management, not enforcement. If the SaaS goes down, the edge keeps enforcing with the last-known policy. No degradation. No fallback mode. Just works.
Full comparison pages: vs. LiteLLM · vs. Kong · vs. nginx · vs. Envoy · vs. Cloudflare · vs. AWS API Gateway · vs. Azure APIM · vs. GCP API Gateway