Suggested SLOs
- Decision availability (/v1/decision usable): >= 99.9%
- Readiness (/readyz): >= 99.95%
- Reject classification accuracy: no unexplained spikes in any reject reason
- SaaS connectivity (for SaaS mode): target >= 99.0%
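As a rough sketch of how the SaaS connectivity target could be tracked, the rules below average the fairvisor_saas_reachable gauge over a rolling window. This assumes the gauge is 0/1; the 30-day window, the group name, the recorded series name, and the alert name are all illustrative assumptions rather than part of Fairvisor.

```yaml
groups:
  - name: fairvisor-slo   # hypothetical group name
    rules:
      # Fraction of the window during which SaaS was reachable,
      # assuming fairvisor_saas_reachable is a 0/1 gauge.
      - record: fairvisor:saas_reachable:avg_30d   # hypothetical series name
        expr: avg_over_time(fairvisor_saas_reachable[30d])
      # Fires when measured connectivity drops below the 99.0% target.
      - alert: FairvisorSaasConnectivityBelowTarget   # hypothetical alert name
        expr: fairvisor:saas_reachable:avg_30d < 0.99
        for: 1h
        labels:
          severity: warning
```

The window should match whatever period the SLO is reported over. A shorter window reacts faster but is noisier.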
Core alerts
1) Reject spike
- alert: FairvisorRejectSpike
  expr: rate(fairvisor_decisions_total{action="reject"}[5m]) > 50
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Fairvisor reject rate spike"
2) No bundle loaded
- alert: FairvisorNoBundleLoaded
  expr: rate(fairvisor_decisions_total{action="reject",reason="no_bundle_loaded"}[5m]) > 0
  for: 2m
  labels:
    severity: critical
3) SaaS disconnected
- alert: FairvisorSaasDisconnected
  expr: fairvisor_saas_reachable == 0
  for: 5m
  labels:
    severity: warning
4) Descriptor mismatch regression
- alert: FairvisorDescriptorMissing
  expr: rate(fairvisor_descriptor_missing_total[5m]) > 0
  for: 10m
  labels:
    severity: warning
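Prometheus loads alerting rules from rule files in which each rule sits under a named group. The sketch below shows one way the alerts above might be wrapped. The group name and the annotation wording are assumptions. The second rule is an optional, hypothetical variant that sums across series, so the threshold applies to the total reject rate rather than to each label combination separately.

```yaml
groups:
  - name: fairvisor-core-alerts   # hypothetical group name
    rules:
      - alert: FairvisorNoBundleLoaded
        expr: rate(fairvisor_decisions_total{action="reject",reason="no_bundle_loaded"}[5m]) > 0
        for: 2m
        labels:
          severity: critical
        annotations:
          # Illustrative wording, not taken from the original rule.
          summary: "Fairvisor is rejecting requests because no bundle is loaded"
      # Optional variant: aggregate before comparing against the threshold.
      - alert: FairvisorRejectSpikeTotal   # hypothetical alert name
        expr: sum(rate(fairvisor_decisions_total{action="reject"}[5m])) > 50
        for: 10m
        labels:
          severity: warning
```

A file like this can be checked with `promtool check rules <file>` before it is loaded.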
Recommended dashboard panels
- reject rate by reason
- allow/reject split
- retry-after bucket distribution
- saas reachable state
- top descriptor-missing keys
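The panels above could be backed by queries along these lines. This is only a sketch: the structure is an illustrative title-to-query mapping, not a Grafana dashboard schema, and the `key` label on fairvisor_descriptor_missing_total is an assumption. The retry-after panel is left out because no metric for it is named here.

```yaml
# Illustrative mapping of panel title to PromQL; not a dashboard schema.
panels:
  - title: Reject rate by reason
    expr: sum by (reason) (rate(fairvisor_decisions_total{action="reject"}[5m]))
  - title: Allow/reject split
    expr: sum by (action) (rate(fairvisor_decisions_total[5m]))
  - title: SaaS reachable state
    expr: fairvisor_saas_reachable
  - title: Top descriptor-missing keys
    # Assumes the counter carries a `key` label; adjust to the real label name.
    expr: topk(10, sum by (key) (rate(fairvisor_descriptor_missing_total[5m])))
```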