Operator FAQ

Does Fairvisor require an external datastore?

No. Core limiter state is in nginx shared dict memory.

What is the fastest health check set?

/livez, /readyz, and a sampled /v1/decision request.

Why does /v1/decision return 503?

Usually one of three causes: no active bundle, an invalid startup environment, or a runtime initialization failure.

Can I run without SaaS?

Yes. Use standalone mode with FAIRVISOR_CONFIG_FILE.

How do I force rollback quickly?

Re-apply the last known-good bundle and reload workers.

How do I debug sudden 429 spikes?

Start with reject reason distribution, then map to policy/rule.
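
As a rough illustration of the first step, the sketch below tallies reject reasons from a list of values such as those carried in X-Fairvisor-Reason. The reason strings in the sample are hypothetical placeholders, not Fairvisor's actual reason codes, and how you collect them (access logs, metrics labels) is left open.

```python
from collections import Counter

def reject_reason_distribution(reasons):
    """Return (reason, count, share) tuples, most frequent reason first."""
    counts = Counter(reasons)
    total = sum(counts.values())
    # An empty input yields an empty list, so no division by zero occurs.
    return [(reason, n, n / total) for reason, n in counts.most_common()]

# Hypothetical sample: one reason value per rejected request.
sample = ["rate_limited", "rate_limited", "budget_exceeded", "rate_limited"]
distribution = reject_reason_distribution(sample)
```

The dominant reason tells you which policy or rule family to inspect first.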

How do I detect descriptor wiring bugs?

Watch fairvisor_descriptor_missing_total and verify gateway forwarding.

Is Retry-After random?

It includes deterministic per-identity jitter to spread retries.
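
One way to get jitter that is deterministic per identity is to derive it from a hash of the identity key. The sketch below shows that idea only; the hash function, the jitter range, and the function name are assumptions, not Fairvisor's actual implementation.

```python
import hashlib

def retry_after_seconds(identity: str, base_seconds: int, max_jitter_seconds: int = 5) -> int:
    """Add a stable, identity-derived offset to a base Retry-After value."""
    digest = hashlib.sha256(identity.encode("utf-8")).digest()
    # The same identity always maps to the same jitter in [0, max_jitter_seconds].
    jitter = int.from_bytes(digest[:4], "big") % (max_jitter_seconds + 1)
    return base_seconds + jitter
```

Because the offset depends only on the identity, a given client sees a consistent value while different clients are spread across the jitter range.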

Does reverse_proxy change decision logic?

No. The decision logic is the same; only the source of request context differs from decision_service mode.

Can I test limits safely in production?

Yes, via mode: shadow and phased promotion to enforce.

Should I fail-open or fail-closed at gateway?

Route-dependent. High-risk endpoints usually fail-closed.
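
A gateway-side sketch of a route-dependent failure mode, assuming the gateway can tell when the limiter is unreachable. The route prefixes, the table layout, and the default are hypothetical examples.

```python
# Hypothetical route table: (path prefix, failure mode). Longest prefix wins.
FAIL_MODES = [
    ("/v1/payments", "closed"),  # high-risk: deny when the limiter is unavailable
    ("/v1/search", "open"),      # low-risk: allow when the limiter is unavailable
]
DEFAULT_MODE = "open"

def fail_mode_for(path: str) -> str:
    matches = [(prefix, mode) for prefix, mode in FAIL_MODES if path.startswith(prefix)]
    if not matches:
        return DEFAULT_MODE
    return max(matches, key=lambda pm: len(pm[0]))[1]

def allow_when_limiter_unavailable(path: str) -> bool:
    return fail_mode_for(path) == "open"
```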

What is the minimum alert set?

Reject spike, no_bundle_loaded, and SaaS disconnected (if SaaS mode).

How do I size FAIRVISOR_SHARED_DICT_SIZE?

Start at 128m, load test with real key cardinality, then increase.
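
For a first estimate before load testing, a back-of-the-envelope calculation can help. The per-key byte cost below is a placeholder assumption, not a measured Fairvisor figure; replace it with what you observe under load.

```python
def estimate_shared_dict_mib(active_keys: int, bytes_per_key: int, headroom: float = 2.0) -> float:
    """Rough size estimate in MiB: keys x bytes per key x safety headroom."""
    return active_keys * bytes_per_key * headroom / (1024 * 1024)

# Hypothetical inputs: 500,000 active keys at an assumed 128 bytes each.
estimate = estimate_shared_dict_mib(500_000, 128)
```

With these placeholder inputs the estimate is about 122 MiB, which happens to sit close to the 128m starting point.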

Why do counters reset after restart?

The shared dict lives only in the memory of the running nginx instance and is not persisted, so a restart clears all counters.

How do I trace one reject end-to-end?

Use X-Fairvisor-Reason, Retry-After, RateLimit*, and metrics. For policy/rule attribution, use debug session headers (X-Fairvisor-Debug-*).

What if policy matches unexpectedly broad traffic?

Audit pathPrefix and method filters; add narrower selectors first.

Is there a public admin policy API?

Not in the current OSS runtime; policy delivery is file- or SaaS-based.

Can I trust client-supplied X-Original-* headers?

Only when they arrive from a trusted gateway boundary; never accept them from untrusted public clients.
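
A minimal sketch of enforcing that boundary at the edge: drop X-Original-* headers unless the peer address belongs to a trusted gateway network. The network range and function name are hypothetical, and in practice this filtering usually belongs in the gateway or proxy configuration.

```python
import ipaddress

# Hypothetical trusted gateway range; replace with your own network.
TRUSTED_GATEWAYS = [ipaddress.ip_network("10.0.0.0/8")]

def sanitize_headers(headers: dict, peer_ip: str) -> dict:
    """Keep X-Original-* headers only when the peer is a trusted gateway."""
    peer = ipaddress.ip_address(peer_ip)
    if any(peer in network for network in TRUSTED_GATEWAYS):
        return dict(headers)
    return {
        name: value
        for name, value in headers.items()
        if not name.lower().startswith("x-original-")
    }
```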

What should be in incident postmortem notes?

Trigger, impacted policy/rule, mitigation, rollback steps, and preventive checks.

What is the latency impact of Fairvisor on my gateway?

Sub-millisecond per decision. State is held in in-process shared memory, so there is no network hop per request. The actual overhead is visible in the X-Fairvisor-Latency-Us header.

How do I configure different limits for free vs paid plans?

Use jwt:plan as the identity key and define separate rules per plan value, or use a single rule with per-key overrides if your plan maps to a descriptor value.

Can I limit by both user and organization simultaneously?

Yes. Add two rules in the same policy: one keyed on jwt:user_id and one on jwt:org_id. Both counters are checked independently; either can reject the request.
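
The "both checked independently, either can reject" behavior can be sketched with two plain counters. This is a simplified model with hypothetical limits: it ignores window resets, and it assumes a rejected request consumes nothing, which may differ from Fairvisor's accounting.

```python
from collections import defaultdict

class DualKeyLimiter:
    def __init__(self, limits):
        self.limits = limits            # e.g. {"user_id": 2, "org_id": 3}
        self.counts = defaultdict(int)  # (claim name, claim value) -> requests seen

    def decide(self, claims):
        keys = [(name, claims[name]) for name in self.limits]
        # Check every rule first so a rejected request increments no counter.
        for name, value in keys:
            if self.counts[(name, value)] >= self.limits[name]:
                return ("reject", name)
        for key in keys:
            self.counts[key] += 1
        return ("allow", None)

limiter = DualKeyLimiter({"user_id": 2, "org_id": 3})
```

A user can be rejected by their own limit while the organization still has room, and a user with budget left can be rejected once the organization is exhausted.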

What is the difference between TPM and TPD limits?

TPM (tokens per minute) enforces instantaneous throughput — useful for protecting upstream capacity. TPD (tokens per day) enforces cumulative daily spend — useful for per-user or per-org quotas. Both can be configured on the same rule.
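
To make the two windows concrete, the sketch below tracks token usage per minute and per day with simple fixed windows. The windowing scheme, class name, and return values are assumptions for illustration; Fairvisor's actual algorithm is not described here.

```python
class TokenQuota:
    def __init__(self, tpm: int, tpd: int):
        self.tpm = tpm
        self.tpd = tpd
        self.minute_used = {}  # minute index -> tokens consumed
        self.day_used = {}     # day index -> tokens consumed

    def try_consume(self, tokens: int, now_seconds: float) -> str:
        minute = int(now_seconds // 60)
        day = int(now_seconds // 86400)
        used_minute = self.minute_used.get(minute, 0)
        used_day = self.day_used.get(day, 0)
        if used_minute + tokens > self.tpm:
            return "tpm_exceeded"  # throughput limit hit
        if used_day + tokens > self.tpd:
            return "tpd_exceeded"  # daily quota hit
        self.minute_used[minute] = used_minute + tokens
        self.day_used[day] = used_day + tokens
        return "ok"
```

A request can pass the per-minute check and still be rejected because the daily total is spent, which is the practical difference between the two limits.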

How do I handle burst traffic without rejecting legitimate users?

Configure the token bucket capacity (burst allowance) above the steady-state rate. Bursts up to the capacity are absorbed; sustained traffic above the steady-state rate is rejected. Shadow mode lets you tune these values against real traffic patterns before enforcing.
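
The interaction between rate and capacity is easier to see in a generic token bucket. This is a textbook sketch with hypothetical parameter names, not Fairvisor's code.

```python
class TokenBucket:
    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate          # tokens refilled per second (steady-state rate)
        self.capacity = capacity  # maximum stored tokens (burst allowance)
        self.tokens = capacity    # start full
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With rate 1 and capacity 3, three back-to-back requests pass, a fourth is rejected, and one more passes after a second of refill.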

How do I rotate API keys without downtime?

Update the policy bundle to accept the new key descriptor value (add it alongside the old one), push the bundle, wait for hot-reload, then decommission the old key. No restart required.

How do I know which rule triggered a reject?

Check X-Fairvisor-Reason for the reject code, then use debug session headers (X-Fairvisor-Debug-*) to get rule and policy attribution. See Decision Tracing.

What metrics should I alert on?

Minimum set: fairvisor_requests_rejected_total spike, fairvisor_no_bundle_loaded (non-zero), and fairvisor_saas_disconnected_seconds (if using SaaS). See SLO & Alerting.

How do I test a new policy before deploying it?

Use the fairvisor policy test CLI command with a fixture file, or deploy in mode: shadow to validate against real traffic. Shadow mode never blocks traffic but tracks would-reject decisions.

What happens if a JWT is missing or malformed?

The JWT claim identity key falls back to a sentinel value (empty or __invalid__). Configure a catch-all rule or selector to handle unauthenticated traffic explicitly.
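
The fallback can be pictured as follows. This sketch only decodes the payload and does not verify the signature, so it illustrates the sentinel behavior and is not an authentication routine. It picks __invalid__ as the sentinel; the helper names are hypothetical.

```python
import base64
import json

INVALID = "__invalid__"

def claim_or_sentinel(token, claim: str) -> str:
    """Return the claim value, or the sentinel if the token is missing or malformed."""
    try:
        payload_b64 = token.split(".")[1]
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)
        payload = json.loads(base64.urlsafe_b64decode(padded))
        value = payload[claim]
    except (AttributeError, IndexError, KeyError, TypeError, ValueError):
        return INVALID
    return value if isinstance(value, str) and value else INVALID

def make_unsigned_token(payload: dict) -> str:
    """Demo helper: build a token-shaped string with a dummy header and signature."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return "header." + body.rstrip("=") + ".signature"
```

All unauthenticated or broken requests then share one identity key, which is why an explicit catch-all rule for that key matters.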

How do I monitor Fairvisor in production?

Expose the /metrics endpoint to Prometheus. Key metrics: request rate, reject rate by reason, latency histogram, bundle reload events, and (in SaaS mode) heartbeat lag. See Metrics reference.

Can I apply different policies to different API paths?

Yes. Selectors support pathPrefix and HTTP method filters. Define separate policies per route and Fairvisor will apply only the matching rules per request.
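
A sketch of how a selector with a path prefix and method filter might be evaluated. The pathPrefix name comes from the text above; the methods field name and the plain string-prefix semantics are assumptions.

```python
def selector_matches(selector: dict, method: str, path: str) -> bool:
    prefix = selector.get("pathPrefix", "/")
    methods = selector.get("methods")  # hypothetical field; None means any method
    if not path.startswith(prefix):
        return False
    return methods is None or method.upper() in methods

chat_selector = {"pathPrefix": "/v1/chat", "methods": {"POST"}}
```

Under plain prefix semantics, /v1/chatbot also matches /v1/chat, which is one way a policy ends up covering more traffic than intended.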

How do I prevent one tenant from starving others?

Use per-tenant identity keys (e.g. jwt:org_id). Each tenant gets independent counter buckets — one tenant hitting their limit does not affect others.

What is the difference between warn, throttle, and reject actions?

warn — allows the request but sets headers indicating the budget is nearly exhausted. throttle — applies a response delay. reject — returns 429/503 immediately. Staged actions let you warn at 80%, throttle at 95%, and hard-reject at 100%.
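
The staged thresholds mentioned above (80%, 95%, 100%) can be expressed as a small mapping from budget usage to action. The function name and the integer comparison are illustrative choices; actual thresholds are whatever the policy configures.

```python
def staged_action(used: int, limit: int) -> str:
    """Map budget usage to an action using 80% / 95% / 100% thresholds."""
    # Integer comparisons avoid floating-point edge cases at the boundaries.
    if used >= limit:
        return "reject"
    if used * 100 >= limit * 95:
        return "throttle"
    if used * 100 >= limit * 80:
        return "warn"
    return "allow"
```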

How do I audit which policy was active at the time of an incident?

Bundle version and load timestamp are included in the /v1/status response. SaaS mode records a full policy change log with timestamps and actor.

Can Fairvisor enforce limits on non-LLM APIs?

Yes. The token bucket and cost-based budget limiters work on any HTTP API. LLM-specific limiters (TPM/TPD, token refund) apply only when configured on LLM endpoints, but rate and budget controls apply universally.

How do I set a global emergency kill switch for all traffic?

Add a kill switch rule with a selector that matches all paths (pathPrefix: /) and a descriptor that evaluates to true for the traffic you want to block. Push the bundle and it takes effect on the next request.

What is `fairvisor_descriptor_missing_total` and why is it spiking?

This counter increments when Fairvisor cannot extract the configured identity key from a request (e.g. a missing JWT claim or header). A spike usually means a gateway misconfiguration or a policy change that references a descriptor the gateway is not forwarding. Check gateway auth header passthrough and selector configuration.