# HA Installation
This guide describes a practical HA pattern for Fairvisor when you run two parallel installations (A and B) behind one gateway.
## Key constraint
Fairvisor OSS keeps limiter state in local memory (`ngx.shared.dict`) per instance.
- distributed/global shared counters are not supported
- each installation enforces limits independently
How you configure limits depends entirely on how your load balancer routes traffic.
## Option 1: Random / round-robin balancing
Each request is routed to any instance regardless of the client key. On average, each instance sees ~50% of the traffic for any given key, so you divide every limit by 2.
```
Client
  -> Gateway / LB (round-robin)
       -> Fairvisor A (50% limits)
       -> Fairvisor B (50% limits)
            -> Upstream
```
How it works: a client sending 120 req/s against a 100 req/s limit distributes ~60 req/s to each instance. Each instance enforces its 50 req/s limit and rejects 10 — total 100 req/s allowed.
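The arithmetic above can be checked with a short sketch (this is illustrative only, not a Fairvisor API; it assumes a steady request rate and an even 50/50 split):

```python
def simulate(total_rps, per_instance_limit, instances=2):
    """Round-robin: each instance sees total_rps / instances requests
    and admits at most per_instance_limit of them."""
    per_instance_traffic = total_rps / instances
    admitted = sum(min(per_instance_traffic, per_instance_limit)
                   for _ in range(instances))
    rejected = total_rps - admitted
    return admitted, rejected

# 120 req/s split over two instances with 50 req/s each:
# 60 req/s hits each instance, each admits 50 -> 100 admitted, 20 rejected.
print(simulate(total_rps=120, per_instance_limit=50))
```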
### How to set limits (÷2)

- `token_bucket.tokens_per_second` → divide by 2
- `token_bucket.burst` → divide by 2
- `cost_based.budget` → divide by 2
- `token_bucket_llm.tokens_per_minute` → divide by 2
- `token_bucket_llm.tokens_per_day` → divide by 2
- per-request caps (`max_*`) stay unchanged
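To avoid hand-editing both policy files, the per-installation config can be derived from the global target. A minimal sketch, assuming the field names used in this guide (`halve_policy` is a hypothetical helper, not part of Fairvisor):

```python
import copy

# Fields that hold rate/budget totals and must be split across instances.
DIVIDE_FIELDS = {
    "tokens_per_second", "burst", "budget",
    "tokens_per_minute", "tokens_per_day",
}

def halve_policy(policy, instances=2):
    """Return a per-installation copy of a global policy for
    round-robin balancing across `instances` installations.
    Per-request caps (max_*) are left unchanged."""
    local = copy.deepcopy(policy)
    cfg = local.get("algorithm_config", {})
    for field, value in cfg.items():
        if field in DIVIDE_FIELDS:
            cfg[field] = value // instances
    return local

global_policy = {
    "algorithm": "token_bucket",
    "algorithm_config": {"tokens_per_second": 100, "burst": 200},
}
print(halve_policy(global_policy))
# {'algorithm': 'token_bucket', 'algorithm_config': {'tokens_per_second': 50, 'burst': 100}}
```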
### Traffic requirements
- keep distribution close to 50/50
- avoid sticky routing that sends a key mostly to one installation
- monitor per-installation reject rates and drift — heavy skew means one instance rejects early while the other still has budget
## Option 2: Sticky balancing (per-key)
Each client key (API key, user ID, tenant) is consistently routed to the same installation. Each instance enforces the full limit independently for the keys it owns.
```
Client key X -> Gateway / LB (hash on key) -> Fairvisor A (100% limits)
Client key Y -> Gateway / LB (hash on key) -> Fairvisor B (100% limits)
                                                   -> Upstream
```
How it works: each instance only ever sees traffic for its own set of keys and enforces the full configured limit for each. No division needed.
Configure your load balancer to hash on the header or field that corresponds to your `limit_keys` (e.g. `Authorization`, `X-Tenant-ID`).
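For example, with an NGINX-based gateway, consistent hashing on the client's API key might look like the following (the upstream name and server addresses are illustrative):

```nginx
upstream fairvisor {
    # Consistent hashing: each Authorization value maps to one installation
    # and tends to stay there even if the upstream set changes.
    hash $http_authorization consistent;
    server fairvisor-a.internal:8080;
    server fairvisor-b.internal:8080;
}
```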
### How to set limits (unchanged)
Use your target limits directly — no division. Both installations carry identical policy.
### Traffic requirements
- sticky routing must be consistent — a key that hops between instances will see independent (effectively doubled) limits
- monitor per-installation traffic volume to detect key distribution imbalance
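A simple drift check over per-installation request counts can back the monitoring points above (the thresholds you alert on are your call; this helper is illustrative, not part of Fairvisor):

```python
def traffic_skew(count_a, count_b):
    """Fraction of total traffic handled by the busier installation.
    0.5 = perfectly balanced, 1.0 = everything on one side."""
    total = count_a + count_b
    if total == 0:
        return 0.5
    return max(count_a, count_b) / total

# With round-robin balancing, alert when skew drifts well past 0.5;
# with sticky balancing some skew is expected, so use a looser threshold.
print(traffic_skew(1000, 1000))  # 0.5
print(traffic_skew(900, 100))    # 0.9
```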
## Examples

### Token bucket
Target global policy:
```json
{
  "algorithm": "token_bucket",
  "algorithm_config": {
    "tokens_per_second": 100,
    "burst": 200
  }
}
```
| Balancing | Per-installation config |
|---|---|
| Round-robin | `tokens_per_second: 50`, `burst: 100` |
| Sticky | `tokens_per_second: 100`, `burst: 200` |
### Cost budget
Target global daily budget: 100000
| Balancing | Per-installation budget |
|---|---|
| Round-robin | 50000 |
| Sticky | 100000 |
### LLM token budgets
Target global: `tokens_per_minute = 120000`, `tokens_per_day = 2000000`

| Balancing | tokens_per_minute | tokens_per_day |
|---|---|---|
| Round-robin | 60000 | 1000000 |
| Sticky | 120000 | 2000000 |
Keep `max_prompt_tokens`, `max_completion_tokens`, and `max_tokens_per_request` unchanged in both cases.
## Operational checklist
- Deploy installation A and B in separate failure domains (nodes/zones)
- Decide on balancing mode and configure limits accordingly
- Validate both `/readyz` endpoints in health checks
- Test failover (all traffic to one side) and confirm expected behavior
- Alert on reject-rate imbalance between A and B
## Failure behavior
If one installation is down, all traffic shifts to the remaining one.
- Round-robin: the surviving instance has only half the limit — effective global limit halves during the outage.
- Sticky: the surviving instance takes over all keys and still enforces full per-key limits, but keys that move to it start with fresh limiter state (no accumulated history).
Options for degraded mode:
- accept temporary stricter limiting during failure
- maintain an alternate emergency policy bundle with adjusted limits
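The capacity arithmetic during an outage can be sketched as follows (illustrative only; `mode` names match the two options in this guide):

```python
def effective_global_limit(per_instance_limit, healthy_instances, mode):
    """Global admitted rate given per-installation limits.
    mode: 'round_robin' (limits were divided) or 'sticky' (full limits)."""
    if mode == "round_robin":
        # Each healthy instance still enforces only its divided share.
        return per_instance_limit * healthy_instances
    # Sticky: every instance enforces the full limit for the keys it owns,
    # so the per-key limit is unchanged regardless of instance count.
    return per_instance_limit

# Round-robin with ÷2 limits: losing one of two instances halves capacity.
print(effective_global_limit(50, healthy_instances=2, mode="round_robin"))  # 100
print(effective_global_limit(50, healthy_instances=1, mode="round_robin"))  # 50
# Sticky: the surviving instance still enforces the full 100 per key.
print(effective_global_limit(100, healthy_instances=1, mode="sticky"))      # 100
```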