HA Installation

This guide describes a practical high-availability (HA) pattern for Fairvisor when you run two parallel installations (A and B) behind one gateway.

Key constraint

Fairvisor OSS keeps limiter state in local memory (ngx.shared.dict) per instance.

  • distributed/global shared counters are not supported
  • each installation enforces limits independently

How you configure limits depends entirely on how your load balancer routes traffic.

Option 1: Random / round-robin balancing

Each request may be routed to either instance, regardless of the client key. On average, each instance sees ~50% of the traffic for any given key, so you divide every rate and budget limit by 2 (per-request caps stay as they are).

Client
  -> Gateway / LB (round-robin)
     -> Fairvisor A (50% limits)
     -> Fairvisor B (50% limits)
         -> Upstream

How it works: a client sending 120 req/s against a 100 req/s global limit sends ~60 req/s to each instance. Each instance enforces its 50 req/s limit and rejects ~10 req/s, so about 100 req/s is allowed in total.
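
The arithmetic above can be checked with a small sketch. It assumes a perfectly even split and steady traffic (no bursts); the function name is illustrative and not part of Fairvisor.

```python
def allowed_rate(offered_rps: float, global_limit_rps: float, instances: int = 2) -> float:
    """Total allowed rate when traffic is split evenly across instances.

    Each instance sees offered_rps / instances and enforces
    global_limit_rps / instances on its own local counters.
    """
    per_instance_offered = offered_rps / instances
    per_instance_limit = global_limit_rps / instances
    return instances * min(per_instance_offered, per_instance_limit)


print(allowed_rate(120, 100))  # 100.0 (each instance allows 50 and rejects 10)
print(allowed_rate(80, 100))   # 80.0  (under the limit, nothing is rejected)
```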

How to set limits (÷2)

  • token_bucket.tokens_per_second → divide by 2
  • token_bucket.burst → divide by 2
  • cost_based.budget → divide by 2
  • token_bucket_llm.tokens_per_minute → divide by 2
  • token_bucket_llm.tokens_per_day → divide by 2
  • per-request caps (max_*) stay unchanged
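
If you generate per-installation policy from one target policy, the division rules above could be scripted roughly as follows. This is a sketch, not a Fairvisor tool: it assumes every algorithm uses the same "algorithm" / "algorithm_config" shape as the token bucket example in this guide, and per_installation_policy is a hypothetical helper name.

```python
import copy

# Fields this guide says to divide. Anything else, including the
# per-request caps (max_*), is copied through unchanged.
SCALED_FIELDS = {
    "token_bucket": ("tokens_per_second", "burst"),
    "cost_based": ("budget",),
    "token_bucket_llm": ("tokens_per_minute", "tokens_per_day"),
}


def per_installation_policy(policy: dict, installations: int = 2) -> dict:
    """Return a copy of policy with rate and budget fields divided."""
    scaled = copy.deepcopy(policy)
    config = scaled.get("algorithm_config", {})
    for field in SCALED_FIELDS.get(scaled.get("algorithm"), ()):
        if field in config:
            # Integer division rounds down, which errs on the strict side.
            config[field] = config[field] // installations
    return scaled


target = {
    "algorithm": "token_bucket",
    "algorithm_config": {"tokens_per_second": 100, "burst": 200},
}
print(per_installation_policy(target)["algorithm_config"])
# {'tokens_per_second': 50, 'burst': 100}
```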

Traffic requirements

  • keep distribution close to 50/50
  • avoid sticky routing that sends a key mostly to one installation
  • monitor per-installation reject rates and drift — heavy skew means one instance rejects early while the other still has budget
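
To see why skew matters, the sketch below computes the total allowed rate for an uneven split between A and B. It assumes steady traffic and halved limits as described above; the function name and the share_a parameter are illustrative.

```python
def allowed_with_skew(offered_rps: float, global_limit_rps: float, share_a: float) -> float:
    """Total allowed rate when installation A receives share_a of the traffic."""
    per_instance_limit = global_limit_rps / 2
    to_a = offered_rps * share_a
    to_b = offered_rps - to_a
    return min(to_a, per_instance_limit) + min(to_b, per_instance_limit)


# A client sending exactly its 100 req/s limit:
print(allowed_with_skew(100, 100, 0.5))   # 100.0 (even split, nothing rejected)
print(allowed_with_skew(100, 100, 0.75))  # 75.0  (A rejects 25 while B has spare budget)
```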

Option 2: Sticky balancing (per-key)

Each client key (API key, user ID, tenant) is consistently routed to the same installation. Each instance enforces the full limit independently for the keys it owns.

Client key X  -> Gateway / LB (hash on key)  -> Fairvisor A (100% limits)
Client key Y  -> Gateway / LB (hash on key)  -> Fairvisor B (100% limits)
                                                   -> Upstream

How it works: each instance only ever sees traffic for its own set of keys and enforces the full configured limit for each. No division needed.

Configure your load balancer to hash on the header or field that corresponds to your limit_keys (e.g. Authorization, X-Tenant-ID).
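
In practice the hashing is done by the load balancer itself, using its own configuration. The sketch below only illustrates the idea: a stable hash of the key value always selects the same installation. The installation names and the pick_installation helper are hypothetical.

```python
import hashlib

# Hypothetical installation names; substitute your own upstream identifiers.
INSTALLATIONS = ["fairvisor-a", "fairvisor-b"]


def pick_installation(limit_key: str) -> str:
    """Map a client key (e.g. an X-Tenant-ID value) to one installation.

    sha256 is used instead of Python's built-in hash() because hash() is
    randomized per process for strings, which would break stickiness.
    """
    digest = hashlib.sha256(limit_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(INSTALLATIONS)
    return INSTALLATIONS[index]


# The same key always lands on the same installation.
assert pick_installation("tenant-42") == pick_installation("tenant-42")
```

Plain modulo hashing remaps keys whenever the number of installations changes, so keys move (and start with fresh counters) if you add or remove an installation.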

How to set limits (unchanged)

Use your target limits directly — no division. Both installations carry identical policy.

Traffic requirements

  • sticky routing must be consistent: a key that hops between instances is counted separately on each one, so it can receive up to double the intended limit
  • monitor per-installation traffic volume to detect key distribution imbalance

Examples

Token bucket

Target global policy:

{
  "algorithm": "token_bucket",
  "algorithm_config": {
    "tokens_per_second": 100,
    "burst": 200
  }
}

Balancing      Per-installation config
Round-robin    tokens_per_second: 50, burst: 100
Sticky         tokens_per_second: 100, burst: 200

Cost budget

Target global daily budget: 100000

Balancing      Per-installation budget
Round-robin    50000
Sticky         100000

LLM token budgets

Target global: tokens_per_minute = 120000, tokens_per_day = 2000000

Balancing      tokens_per_minute   tokens_per_day
Round-robin    60000               1000000
Sticky         120000              2000000

Keep max_prompt_tokens, max_completion_tokens, max_tokens_per_request unchanged in both cases.

Operational checklist

  1. Deploy installation A and B in separate failure domains (nodes/zones)
  2. Decide on balancing mode and configure limits accordingly
  3. Validate both /readyz endpoints in health checks
  4. Test failover (all traffic to one side) and confirm expected behavior
  5. Alert on reject-rate imbalance between A and B
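
For step 3, a readiness probe could look like the sketch below. It assumes /readyz answers with HTTP 200 when an installation is ready, which this guide does not specify, and the base URLs in the usage comment are placeholders.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with a non-2xx status
    except OSError:
        return 0  # connection refused, DNS failure, timeout, ...


def ready_installations(base_urls, fetch_status=http_status) -> dict:
    """Map each base URL to True if its /readyz endpoint returned 200."""
    return {
        base: fetch_status(base.rstrip("/") + "/readyz") == 200
        for base in base_urls
    }


# Usage (placeholder hosts):
# ready_installations(["http://fairvisor-a:8080", "http://fairvisor-b:8080"])
```

The fetch_status parameter exists only so the check can be exercised without a network, for example in a unit test.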

Failure behavior

If one installation is down, all traffic shifts to the remaining one.

  • Round-robin: the surviving instance is configured with only half the limit, so the effective global limit halves during the outage.
  • Sticky: the surviving instance takes over all keys and enforces the full limits. Keys that previously belonged to the failed instance start with fresh counters on the survivor, so their earlier consumption is not carried over.

Options for degraded mode:

  • accept temporary stricter limiting during failure
  • maintain an alternate emergency policy bundle with adjusted limits
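
For round-robin balancing, the numbers behind both options can be sketched as follows. The helper and its field names are illustrative, not a Fairvisor API.

```python
def round_robin_limits(target_global: int, total: int = 2, healthy: int = 1) -> dict:
    """Compare normal and degraded limits for round-robin balancing."""
    if not 1 <= healthy <= total:
        raise ValueError("healthy must be between 1 and total")
    normal = target_global // total
    return {
        # What each installation is configured with in normal operation.
        "normal_per_installation": normal,
        # What clients effectively get if you keep the normal bundle.
        "effective_global_during_outage": normal * healthy,
        # What an emergency bundle would need to restore the target.
        "emergency_per_installation": target_global // healthy,
    }


limits = round_robin_limits(100, total=2, healthy=1)
print(limits["effective_global_during_outage"])  # 50
print(limits["emergency_per_installation"])      # 100
```

If you apply an emergency bundle, plan to revert it once the failed installation returns. Otherwise both installations would enforce the full limit and the effective global limit would double.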