The three terms
Example across all three
SLI: "The proportion of successful (non-5xx) /api/checkout requests."
Definition: count(status < 500) / count(*)
SLO: "99.9% of /api/checkout requests succeed over a rolling 28-day window."
Internal target; breached means we invest in reliability.
SLA: "We guarantee 99.5% availability of /api/checkout.
Customers whose availability falls below that level receive service credits."
External contract; breached means we pay out.

Error budget: the unlock
The error budget is the complement of the SLO: 100% minus the target. If your SLO is 99.9%, your error budget is 0.1%. Over a 30-day window that's about 43 minutes of tolerable failure (0.1% of 43,200 minutes). The budget is a currency:
- Budget plentiful → green light for risky deploys, new features, velocity
- Budget running out → freeze risky changes, invest in reliability
- Budget fully consumed → hard freeze until the next window resets
This framing ends the debate between "ship features" and "fix reliability" teams — the error budget tells you which phase you're in.
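The arithmetic behind the budget can be sketched in a few lines of Python. This is an illustration, not a prescribed implementation: the function names are hypothetical, and the 75% cut-off for "running out" is an assumption chosen for the example, since the text does not fix a number.

```python
def budget_minutes(slo: float, window_days: int) -> float:
    """Error budget expressed as minutes of full outage within the window."""
    return (1.0 - slo) * window_days * 24 * 60


def budget_consumed(total: int, server_errors: int, slo: float) -> float:
    """Share of the request-based budget already spent (1.0 means all of it)."""
    allowed_errors = total * (1.0 - slo)
    if allowed_errors == 0:
        return 0.0  # assumption: no traffic means nothing has been spent
    return server_errors / allowed_errors


def phase(consumed: float, running_low_at: float = 0.75) -> str:
    """Map budget consumption to the three phases described above.

    running_low_at is an illustrative threshold, not a value from the text.
    """
    if consumed >= 1.0:
        return "hard freeze"
    if consumed >= running_low_at:
        return "freeze risky changes"
    return "green light"


if __name__ == "__main__":
    print(round(budget_minutes(0.999, 30), 1))  # minutes of budget in 30 days
    spent = budget_consumed(total=1_000_000, server_errors=400, slo=0.999)
    print(round(spent, 2), phase(spent))
```

Counting failed requests against allowed failures, rather than minutes of downtime, matches the request-based SLI defined earlier; the minutes figure is only a convenient way to communicate the same budget.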
Common Laravel SLOs
| Endpoint class | Latency SLO | Availability SLO |
|---|---|---|
| Auth, checkout, payment | p95 < 300ms | 99.9% / 28 days |
| JSON API (list / read) | p95 < 200ms | 99.5% / 28 days |
| Page render (dashboard) | p95 < 500ms | 99.5% / 28 days |
| Internal admin | p95 < 1000ms | 99% / 28 days |
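As a rough sketch of how the targets in the table might be checked against measured data, the Python below evaluates a batch of latency samples and request counts for one endpoint class. The dictionary keys and function names are hypothetical, and the nearest-rank percentile is one of several common definitions; a metrics backend may compute p95 differently.

```python
# Targets copied from the table above; the keys are illustrative names.
SLOS = {
    "checkout": {"p95_ms": 300, "availability": 0.999},
    "json_api": {"p95_ms": 200, "availability": 0.995},
    "page_render": {"p95_ms": 500, "availability": 0.995},
    "internal_admin": {"p95_ms": 1000, "availability": 0.99},
}


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


def meets_slo(endpoint_class: str, latencies_ms: list[float],
              total: int, server_errors: int) -> tuple[bool, bool]:
    """Return (latency_ok, availability_ok) for one endpoint class."""
    target = SLOS[endpoint_class]
    latency_ok = percentile(latencies_ms, 95) < target["p95_ms"]
    availability_ok = (total - server_errors) / total >= target["availability"]
    return latency_ok, availability_ok


if __name__ == "__main__":
    samples = [100.0] * 95 + [900.0] * 5  # 5% slow requests
    print(meets_slo("checkout", samples, total=100_000, server_errors=50))
```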
Burn-rate alerts
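Alert on how fast you're burning the budget, not on raw thresholds. Burn rate is the observed error rate divided by the error rate the SLO allows, so a burn rate of 1x spends the budget exactly over the window. A 1-hour burn rate of 14x means you'd exhaust a 30-day budget in about 2 days if it continued, which is actionable. A brief latency spike that uses 0.5% of the budget probably isn't.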
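The burn-rate figures can be reproduced with a small Python sketch. The function names are hypothetical, and the 30-day window is taken from the example in the paragraph rather than from the 28-day SLOs in the table.

```python
def burn_rate(error_ratio: float, slo: float) -> float:
    """Observed error ratio divided by the ratio the SLO allows."""
    return error_ratio / (1.0 - slo)


def days_to_exhaustion(rate: float, window_days: float = 30) -> float:
    """Days until a full budget is gone if this burn rate continues."""
    if rate <= 0:
        return float("inf")
    return window_days / rate


def budget_spent(rate: float, lookback_hours: float, window_days: float = 30) -> float:
    """Fraction of the window's budget consumed during the lookback period."""
    return rate * lookback_hours / (window_days * 24)


if __name__ == "__main__":
    rate = burn_rate(error_ratio=0.014, slo=0.999)  # 1.4% errors vs 0.1% allowed
    print(round(rate, 1))                            # burn rate
    print(round(days_to_exhaustion(rate), 1))        # days left at this pace
    print(round(budget_spent(rate, 1) * 100, 1))     # % of budget used in 1 hour
```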
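Google's SRE Workbook describes the canonical multi-window, multi-burn-rate setup: page when either a high burn rate holds over 1 hour (14.4x, which spends 2% of a 30-day budget) or a sustained burn holds over 6 hours (6x, which spends 5%).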
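A minimal Python sketch of such a paging rule follows. The thresholds (14.4x over 1 hour, 6x over 6 hours) and the short confirmation windows (5 and 30 minutes) follow the configuration commonly cited from the SRE Workbook; treat them as starting assumptions to tune. The `error_ratio` callback is hypothetical and stands in for whatever query your metrics backend provides.

```python
from typing import Callable

# (long window in hours, short window in hours, burn-rate threshold)
# Values follow the commonly cited SRE Workbook setup; adjust for your service.
PAGE_RULES = [
    (1.0, 5 / 60, 14.4),
    (6.0, 0.5, 6.0),
]


def should_page(error_ratio: Callable[[float], float], slo: float) -> bool:
    """Page when any rule fires in both its long and its short window.

    error_ratio(hours) is a hypothetical callback returning the fraction of
    failed requests over the last `hours` hours.
    """
    budget = 1.0 - slo
    for long_hours, short_hours, threshold in PAGE_RULES:
        long_burn = error_ratio(long_hours) / budget
        short_burn = error_ratio(short_hours) / budget
        # The short window confirms the problem is still happening right now.
        if long_burn >= threshold and short_burn >= threshold:
            return True
    return False


if __name__ == "__main__":
    ongoing = {1.0: 0.02, 5 / 60: 0.02, 6.0: 0.004, 0.5: 0.02}
    print(should_page(lambda hours: ongoing[hours], slo=0.999))
```

Requiring the short window to agree with the long one keeps the alert from firing for hours after an incident has already ended.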