HA Installation

This guide describes a practical high-availability (HA) pattern for Fairvisor when you run two parallel installations (A and B) behind one gateway.

Key constraint

Fairvisor OSS keeps limiter state in local memory (ngx.shared.dict) on each instance, which has two consequences:

distributed/global shared counters are not supported
each installation enforces limits independently

How you configure limits depends entirely on how your load balancer routes traffic.

Option 1: Random / round-robin balancing

Each request is routed to any instance regardless of the client key. On average, each instance sees ~50% of the traffic for any given key, so you divide every limit by 2.

Client
  -> Gateway / LB (round-robin)
     -> Fairvisor A (50% limits)
     -> Fairvisor B (50% limits)
         -> Upstream

How it works: a client sending 120 req/s against a 100 req/s global limit lands ~60 req/s on each instance. Each instance enforces its 50 req/s limit and rejects the excess ~10 req/s, so about 100 req/s is allowed in total.

How to set limits (÷2)

token_bucket.tokens_per_second → divide by 2
token_bucket.burst → divide by 2
cost_based.budget → divide by 2
token_bucket_llm.tokens_per_minute → divide by 2
token_bucket_llm.tokens_per_day → divide by 2
per-request caps (max_*) stay unchanged
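The division rules above can be sketched as a small helper. This is an illustration only, not part of Fairvisor: it assumes a policy is a dict shaped like the token bucket example in this guide ("algorithm" plus "algorithm_config"), and that the cost_based and token_bucket_llm fields sit in algorithm_config under the names listed above. The function name scale_policy and the 4096 cap are hypothetical.

```python
import copy

# Shared limits that the guide says to divide across installations.
# Per-request caps (max_*) are deliberately absent, so they stay unchanged.
SCALED_FIELDS = {
    "token_bucket": ("tokens_per_second", "burst"),
    "cost_based": ("budget",),
    "token_bucket_llm": ("tokens_per_minute", "tokens_per_day"),
}


def scale_policy(policy, installations=2):
    """Return a copy of `policy` with shared limits divided by `installations`."""
    scaled = copy.deepcopy(policy)
    config = scaled["algorithm_config"]
    for field in SCALED_FIELDS.get(scaled["algorithm"], ()):
        if field in config:
            # Floor division keeps the combined limit at or below the target.
            config[field] = config[field] // installations
    return scaled


# Hypothetical target policy; the max_tokens_per_request value is made up.
target = {
    "algorithm": "token_bucket_llm",
    "algorithm_config": {
        "tokens_per_minute": 120000,
        "tokens_per_day": 2000000,
        "max_tokens_per_request": 4096,
    },
}
per_installation = scale_policy(target)
print(per_installation["algorithm_config"])
```

The helper copies the policy rather than mutating it, so the target policy remains the single source of truth and the per-installation variant can be regenerated whenever the target changes.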

Traffic requirements

keep distribution close to 50/50
avoid sticky routing that sends a key mostly to one installation
monitor per-installation reject rates and drift; under heavy skew one instance rejects early while the other still has unused budget, so the effective limit falls below the target
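To see why skew matters, the arithmetic can be worked through in a few lines. This is a simplified steady-state model that ignores burst and only compares offered rate with each installation's halved limit; the function name allowed_rate and the percentage parameter are hypothetical.

```python
def allowed_rate(offered, global_limit, share_a_percent):
    """Total req/s admitted when installation A receives `share_a_percent` of traffic.

    Each installation enforces half of `global_limit` on whatever it receives.
    """
    per_instance_limit = global_limit / 2
    to_a = offered * share_a_percent / 100
    to_b = offered * (100 - share_a_percent) / 100
    # An installation can admit at most its own limit, and at most what it receives.
    return min(to_a, per_instance_limit) + min(to_b, per_instance_limit)


print(allowed_rate(120, 100, 50))  # even split
print(allowed_rate(120, 100, 70))  # skewed split
```

With an even split the pair admits 100 req/s. With a 70/30 split, A receives 84 req/s and admits 50, B receives 36 req/s and admits all of it, so only 86 req/s gets through even though the client is over the limit.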

Option 2: Sticky balancing (per-key)

Each client key (API key, user ID, tenant) is consistently routed to the same installation. Each instance enforces the full limit independently for the keys it owns.

Client key X  -> Gateway / LB (hash on key)  -> Fairvisor A (100% limits)
Client key Y  -> Gateway / LB (hash on key)  -> Fairvisor B (100% limits)
                                                   -> Upstream

How it works: each instance only ever sees traffic for its own set of keys and enforces the full configured limit for each. No division needed.

Configure your load balancer to hash on the header or field that corresponds to your limit_keys (e.g. Authorization, X-Tenant-ID).
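The routing idea can be illustrated with a minimal sketch. Real load balancers implement their own hashing, so this only shows the property that matters: the same key always maps to the same installation. The backend names and the function pick_installation are hypothetical.

```python
import hashlib

INSTALLATIONS = ["fairvisor-a", "fairvisor-b"]  # hypothetical backend names


def pick_installation(limit_key, backends=INSTALLATIONS):
    """Map a client key to one backend deterministically."""
    # A stable hash (unlike Python's built-in hash()) gives the same result
    # across processes and restarts.
    digest = hashlib.sha256(limit_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


for key in ("tenant-1", "tenant-2", "tenant-3"):
    print(key, "->", pick_installation(key))
```

Plain modulo hashing remaps keys when the number of backends changes, which matters for the failover case discussed later in this guide.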

How to set limits (unchanged)

Use your target limits directly, with no division. Both installations carry the same policy.

Traffic requirements

sticky routing must be consistent; a key that hops between instances is counted separately on each, so it can consume up to twice the intended limit
monitor per-installation traffic volume to detect key distribution imbalance

Examples

Token bucket

Target global policy:

{
  "algorithm": "token_bucket",
  "algorithm_config": {
    "tokens_per_second": 100,
    "burst": 200
  }
}

Balancing	Per-installation config
Round-robin	`tokens_per_second: 50`, `burst: 100`
Sticky	`tokens_per_second: 100`, `burst: 200`

Cost budget

Target global daily budget: 100000

Balancing	Per-installation `budget`
Round-robin	`50000`
Sticky	`100000`

LLM token budgets

Target global: tokens_per_minute = 120000, tokens_per_day = 2000000

Balancing	`tokens_per_minute`	`tokens_per_day`
Round-robin	`60000`	`1000000`
Sticky	`120000`	`2000000`

Keep max_prompt_tokens, max_completion_tokens, and max_tokens_per_request unchanged in both cases.

Operational checklist

Deploy installation A and B in separate failure domains (nodes/zones)
Decide on balancing mode and configure limits accordingly
Validate both /readyz endpoints in health checks
Test failover (all traffic to one side) and confirm expected behavior
Alert on reject-rate imbalance between A and B
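The last checklist item could be implemented along these lines. This is a sketch under assumptions: it takes (rejected, total) request counts per installation over the same time window, however your metrics system exposes them, and the 0.10 threshold is an arbitrary placeholder to tune. The function names are hypothetical.

```python
def reject_rate(rejected, total):
    """Fraction of requests rejected; 0.0 when there was no traffic."""
    return rejected / total if total else 0.0


def imbalance_alert(stats_a, stats_b, max_gap=0.10):
    """True when the reject rates of A and B differ by more than `max_gap`.

    `stats_a` and `stats_b` are (rejected, total) tuples for the same window.
    """
    gap = abs(reject_rate(*stats_a) - reject_rate(*stats_b))
    return gap > max_gap


print(imbalance_alert((50, 1000), (60, 1000)))   # 5% vs 6%
print(imbalance_alert((300, 1000), (20, 1000)))  # 30% vs 2%
```

The first call stays quiet and the second fires. Comparing rates rather than raw counts keeps the check meaningful when the two installations see different volumes.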

Failure behavior

If one installation is down, all traffic shifts to the remaining one.

Round-robin: the surviving instance is configured with only half the limit, so the effective global limit halves during the outage.
Sticky: the surviving instance takes over all keys and still enforces full limits, but it has no counter history for the keys that were routed to the failed instance, so those clients start from fresh state.
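The two cases can be compared numerically. This is a minimal model of the configured limit for a single key in a two-installation setup; it does not capture the counter reset that sticky keys experience when they move. The function name effective_limit and the mode strings are hypothetical.

```python
def effective_limit(target, mode, healthy):
    """Limit enforced for one key with `healthy` of two installations up."""
    if healthy not in (1, 2):
        raise ValueError("healthy must be 1 or 2")
    if mode == "round_robin":
        # Each installation is configured with half the target.
        return (target // 2) * healthy
    if mode == "sticky":
        # The key is owned by a single installation holding the full limit.
        return target
    raise ValueError(f"unknown mode: {mode}")


for mode in ("round_robin", "sticky"):
    for healthy in (2, 1):
        print(mode, healthy, effective_limit(100, mode, healthy))
```

For a target of 100, round-robin drops from 100 to 50 when one installation is lost, while sticky stays at 100.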

Options for degraded mode:

accept temporary stricter limiting during failure
maintain an alternate emergency policy bundle with adjusted limits
