Why average is misleading
Consider two routes with the same average latency of 149ms:
| Route | Avg | p95 | p99 | Max |
|---|---|---|---|---|
| /api/orders | 149ms | 180ms | 210ms | 280ms |
| /api/checkout | 149ms | 50ms | 6,500ms | 12,000ms |
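Same average. Wildly different user experience. /api/orders is uniformly slow-ish: every percentile sits close to the average. /api/checkout is fast for most people (p95 is only 50ms) but catastrophic for a long tail (p99 is 6.5 seconds). The average hides that difference; the percentiles reveal it.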
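To see how an identical average can hide very different tails, here is a small sketch using synthetic numbers. The two lists are invented for illustration and are not the data behind the table above; `summarize` is a hypothetical helper.

```python
from statistics import mean

# Synthetic durations in milliseconds, invented for illustration only.
steady = [149] * 100                # every request takes 149 ms
spiky = [20] * 98 + [6470] * 2      # 98 fast requests, 2 very slow ones

def summarize(durations_ms):
    """Return (average, p95, p99) for a list of exactly 100 durations."""
    if len(durations_ms) != 100:
        raise ValueError("this sketch assumes exactly 100 samples")
    ordered = sorted(durations_ms)
    # With exactly 100 samples, the 95th and 99th smallest values
    # are p95 and p99 under the nearest-rank convention.
    return mean(ordered), ordered[94], ordered[98]

steady_summary = summarize(steady)
spiky_summary = summarize(spiky)
print("steady:", steady_summary)
print("spiky: ", spiky_summary)
```

Both lists average 149ms, but the spiky list has a p95 of 20ms and a p99 of 6,470ms, so the average alone cannot distinguish them.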
How to compute p95 in Laravel
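One common exact definition is the nearest-rank method: sort all N durations in ascending order and take the value at 1-based rank ceil(0.95 * N), which is zero-based index ceil(0.95 * N) - 1. With 20 requests, p95 is the 19th smallest duration, not the maximum.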
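A minimal sketch of the sort-and-pick approach, using the nearest-rank convention (1-based rank ceil(percent * N / 100)). The function name is hypothetical, and the percentile is passed as an integer so the rank is computed without floating-point rounding.

```python
def percentile_nearest_rank(durations_ms, percent):
    """Return the nearest-rank percentile of durations_ms.

    percent is an integer from 1 to 100 (95 for p95).
    """
    if not durations_ms:
        raise ValueError("need at least one duration")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ordered = sorted(durations_ms)
    # Integer form of ceil(percent * N / 100), giving a 1-based rank.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]

# Hypothetical sample: 20 requests taking 1ms, 2ms, ..., 20ms.
sample = list(range(1, 21))
p95 = percentile_nearest_rank(sample, 95)
p50 = percentile_nearest_rank(sample, 50)
```

For the 20-value sample, the rank is 19, so p95 is the 19th smallest value rather than the slowest request.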
Exact p95 in PostgreSQL
SELECT
route,
COUNT(*) AS requests,
AVG(duration_ms) AS avg_ms,
percentile_cont(0.5) WITHIN GROUP (ORDER BY duration_ms) AS p50,
percentile_cont(0.95) WITHIN GROUP (ORDER BY duration_ms) AS p95,
percentile_cont(0.99) WITHIN GROUP (ORDER BY duration_ms) AS p99
FROM request_logs
WHERE created_at > NOW() - INTERVAL '1 hour'
GROUP BY route
ORDER BY p95 DESC
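LIMIT 20;
percentile_cont computes the percentile by linear interpolation between the two nearest values, so the result may not equal any single recorded duration; use percentile_disc if you want an actual observed value. MySQL has no PERCENTILE_CONT (MariaDB 10.3.3+ provides it as a window function). In MySQL 8.0+, rank the rows with a window function such as ROW_NUMBER() or CUME_DIST() and select the row at the 95% position. Alternatively, compute the offset in application code and pass it as a bound parameter to ORDER BY duration_ms LIMIT 1 OFFSET ?, because LIMIT and OFFSET do not accept expressions.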
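To make the interpolation concrete, here is a rough sketch of a continuous percentile, assuming the usual definition in which the target sits at zero-based position fraction * (N - 1) in the sorted values. The function name is hypothetical, and this illustrates the idea rather than PostgreSQL's actual implementation.

```python
import math

def percentile_interpolated(values, fraction):
    """Linearly interpolated percentile; fraction is between 0.0 and 1.0."""
    if not values:
        raise ValueError("need at least one value")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    ordered = sorted(values)
    # Fractional zero-based position of the requested percentile.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    # Blend the two neighbouring values by how far position is past lower.
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight

# Hypothetical durations in milliseconds.
median = percentile_interpolated([10, 20, 30, 40], 0.5)
p95 = percentile_interpolated(list(range(1, 21)), 0.95)
```

The median of the four sample values is 25.0, halfway between 20 and 30, which is a value no request actually recorded. That is the practical difference from a pick-one-row method.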
How APMs compute p95 at scale
Storing every request's duration forever is expensive. APMs use streaming percentile algorithms — t-digest or HDR histograms — that estimate p95 from a small fixed memory footprint with bounded error.
NightOwl stores the exact duration of every request in your PostgreSQL database and computes p95 on demand using percentile_cont, so there is no accuracy tradeoff. This is cheap at typical Laravel volumes, since Postgres handles tables with billions of rows.
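To illustrate the fixed-memory idea behind streaming estimators, here is a deliberately simplified bucket histogram. It is not t-digest or an HDR histogram, and the class name and bucket boundaries are assumptions chosen for the example. It keeps one counter per bucket regardless of how many requests it sees, and reports the upper bound of the bucket that contains the requested percentile.

```python
import bisect

class BucketHistogram:
    """Fixed-memory latency histogram (simplified; not t-digest or HDR)."""

    def __init__(self, upper_bounds_ms):
        self.bounds = sorted(upper_bounds_ms)
        # One counter per bucket, plus a final overflow counter.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0

    def record(self, duration_ms):
        # bisect_left finds the first bucket whose upper bound >= duration_ms.
        self.counts[bisect.bisect_left(self.bounds, duration_ms)] += 1
        self.total += 1

    def percentile_upper_bound(self, percent):
        """Upper bound (ms) of the bucket holding the given integer percentile."""
        if self.total == 0:
            raise ValueError("no durations recorded")
        rank = (percent * self.total + 99) // 100  # ceil(percent * N / 100)
        seen = 0
        for index, count in enumerate(self.counts):
            seen += count
            if seen >= rank:
                if index < len(self.bounds):
                    return self.bounds[index]
                return float("inf")  # percentile fell in the overflow bucket

# Hypothetical bucket boundaries and durations (1ms .. 100ms).
histogram = BucketHistogram([10, 25, 50, 100, 250, 500, 1000])
for duration in range(1, 101):
    histogram.record(duration)
estimated_p95 = histogram.percentile_upper_bound(95)
```

The exact p95 of these 100 values is 95ms, while the histogram reports 100ms. The error is bounded by the bucket width, which is the kind of tradeoff streaming estimators accept in exchange for constant memory.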
Alerting on p95
Good p95 alerts have three properties:
- Per-route, not global. Global p95 hides per-endpoint regressions. Alert on specific high-value routes (/checkout, /login, /api/orders).
- Comparative, not absolute. "p95 exceeded 500ms" is a weak signal if the baseline was already 450ms. "p95 is 30% above last week's median p95 for this route" catches regressions.
- Sustained, not instant. Alert when p95 stays above the threshold for 5 or more consecutive minutes. Single-minute spikes are usually noise.
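The comparative and sustained properties above can be sketched as a single check that you would run per route. The function name, the 1.3 ratio, and the sample numbers are illustrative assumptions, not a recommended configuration.

```python
from statistics import median

def should_alert(p95_per_minute_ms, last_week_p95_ms,
                 max_ratio=1.3, sustained_minutes=5):
    """Return True if recent p95 values all exceed the comparative threshold.

    p95_per_minute_ms: one route's p95 for each recent minute, oldest first.
    last_week_p95_ms:  the same route's p95 samples from last week.
    """
    if len(p95_per_minute_ms) < sustained_minutes or not last_week_p95_ms:
        return False
    # Comparative: the threshold is relative to last week's median p95.
    threshold = median(last_week_p95_ms) * max_ratio
    # Sustained: every one of the most recent minutes must exceed it.
    recent = p95_per_minute_ms[-sustained_minutes:]
    return all(value > threshold for value in recent)

# Hypothetical data for one route; last week's median p95 is 400 ms.
last_week = [380, 400, 420]
one_spike = [410, 900, 420, 415, 430, 425]
regression = [410, 530, 560, 600, 580, 610]

spike_alert = should_alert(one_spike, last_week)
regression_alert = should_alert(regression, last_week)
```

With these numbers the threshold is 520ms. The single 900ms spike does not trigger an alert, while five consecutive minutes above 520ms do.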
What p95 doesn't tell you
P95 is a single number — it can't tell you why the slow tail is slow. Pair it with:
- Per-request trace view — see which query or external call ate the budget
- User dimension — is the slow tail concentrated in specific user segments?
- Time dimension — is the tail uniform or spiky at certain hours?
- p99 and max — does p99 correlate with p95 or does it blow up independently?
THE EASY WAY
NightOwl computes p95 per route with trace-level drilldown
Every Laravel request is recorded with route, duration, and component spans. The requests dashboard shows p95 per route with time-series trending. Click into a route to see its slowest individual requests, and into a request to see which query or external call drove the latency.
composer require nightowl/agent
php artisan nightowl:install
From $5/month flat. Data in your PostgreSQL.