[ GLOSSARY ]

Head-based vs tail-based trace sampling

QUICK ANSWER

What's the difference between head-based and tail-based sampling?

Head-based sampling decides whether to keep a trace before the trace completes, usually as a fixed percentage keyed to the trace ID. Cheap and simple, but you throw away traces without knowing whether they were interesting. Tail-based sampling buffers the whole trace and decides after it finishes, so you can keep every error and every slow trace while sampling the normal ones. Smarter, but operationally heavier.

Updated · 2026-04-13

Why sample at all

A fully-instrumented request generates 5-20 spans. At 1,000 req/s that's roughly 86 million requests a day, or 430 million to 1.7 billion spans. At ~1 KB per span that's roughly 0.4-1.7 TB/day, before retention. Storage adds up quickly; sampling trades data completeness for lower cost.
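
As a back-of-envelope check on those figures (illustrative numbers only, taken from the assumptions above):

php
// Rough trace volume for the assumptions above:
// 1,000 req/s, ~10 spans per request (midpoint of 5-20), ~1 KB per span.
$requestsPerSecond = 1_000;
$spansPerRequest   = 10;
$bytesPerSpan      = 1_024;

$spansPerDay = $requestsPerSecond * 86_400 * $spansPerRequest; // ~864 million spans
$bytesPerDay = $spansPerDay * $bytesPerSpan;                   // ~0.9 TB

printf("~%dM spans/day, ~%.1f TB/day\n", $spansPerDay / 1_000_000, $bytesPerDay / 1e12);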

Head-based sampling

Decide at the root span. A deterministic hash of the trace ID lets distributed services agree on the same decision — a trace either lives or dies across every service it touches.

OpenTelemetry head sampler (PHP)

php
use OpenTelemetry\SDK\Trace\Sampler\TraceIdRatioBasedSampler;
use OpenTelemetry\SDK\Trace\TracerProvider;

$sampler = new TraceIdRatioBasedSampler(0.1); // keep 10%

$tracerProvider = TracerProvider::builder()
    ->setSampler($sampler)
    ->build();

Pros: cheap (no buffering), works cross-service, no extra infrastructure to run. Cons: you throw away traces before knowing whether they were errors or slow; at a 10% rate you statistically drop about 90% of your interesting traces along with everything else.

Tail-based sampling

Buffer every span, keyed by trace ID. After a decision window (usually 30-60 seconds), apply rules to the buffered trace to decide whether to keep it.

OTel Collector tail sampling config

yaml
processors:
  tail_sampling:
    decision_wait: 30s
    num_traces: 50000
    policies:
      - name: errors-always
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: slow-always
        type: latency
        latency:
          threshold_ms: 1000
      - name: sample-normal
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

Pros: keep all errors, keep all slow traces, sample the rest. Much better data quality per dollar. Cons: you need a Collector cluster with enough memory to buffer all in-flight traces. Operational complexity.
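
One piece of that complexity: tail sampling only works if every span of a trace reaches the same Collector instance, so a multi-instance setup usually puts a routing tier in front of the tail-sampling Collectors. A minimal sketch using the Collector contrib load-balancing exporter, keyed by trace ID (the hostnames are placeholders):

yaml
exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - tail-collector-1:4317
          - tail-collector-2:4317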

Hybrid approach

Many production systems combine both: the SDK head-samples at 100% (keeps everything) and exports to a Collector that then applies tail rules. The head sampler stops being a sampler and becomes a pass-through; the tail rules do the actual filtering.
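
Concretely, that means the SDK exports every span and the Collector's traces pipeline runs the tail_sampling processor shown above. A minimal sketch of the pipeline wiring, assuming the referenced receivers, processors, and exporters are defined elsewhere in the same config:

yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlp]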

The NightOwl approach

NightOwl doesn't sample by default. At typical Laravel volumes (1-10K req/s) storing everything in PostgreSQL is cheap, and the full dataset matters for debugging rare issues. At high volumes where storage cost matters, configure sampling at the agent level — the Nightwatch package supports it via its nightwatch.sample_rate config key.

Frequently asked questions

What's the difference between head-based and tail-based trace sampling?

Head-based sampling decides whether to keep a trace at the moment the root span starts, before the trace is complete. Tail-based sampling buffers the whole trace and decides after it finishes. Head is cheap and fast; tail is smarter because you can keep all error traces and all slow traces and drop the boring ones.

Why do I need to sample traces at all?

Cost. At moderate traffic (1,000 req/s), full tracing at ~10 spans per request and ~1 KB per span generates roughly 860 GB of trace data per day. At 50¢/GB stored for 14 days, that's on the order of $6,000/month just for traces. Sampling at 10% cuts it to roughly $600. At 1% with tail-sampling bias toward errors, you keep the interesting data for roughly $60.

How do I configure head-based sampling in Laravel?

In OpenTelemetry SDK config, set TraceIdRatioBasedSampler with a ratio (0.1 = 10%, 0.01 = 1%). The sampler keeps or drops deterministically based on the trace ID itself, so every service running the same sampler config reaches the same decision, keeping traces consistent across services.
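
A minimal sketch of how that could be wired into a Laravel app, assuming the open-telemetry/sdk package is installed (the provider class and binding shown here are illustrative, not part of NightOwl or Laravel):

php
use Illuminate\Support\ServiceProvider;
use OpenTelemetry\API\Trace\TracerProviderInterface;
use OpenTelemetry\SDK\Trace\Sampler\ParentBased;
use OpenTelemetry\SDK\Trace\Sampler\TraceIdRatioBasedSampler;
use OpenTelemetry\SDK\Trace\TracerProvider;

class TracingServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        $this->app->singleton(TracerProviderInterface::class, function () {
            // Root spans are kept with ~10% probability; child spans follow
            // the decision already made for their parent, so a trace is
            // kept or dropped as a whole.
            return TracerProvider::builder()
                ->setSampler(new ParentBased(new TraceIdRatioBasedSampler(0.1)))
                ->build();
        });
    }
}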

What's the right sampling rate?

Depends on traffic and budget. For low-traffic apps (under 100 req/s), keep everything. For moderate traffic (100-1,000 req/s), head-sample at 10% and always keep errors and slow requests via tail sampling. For high traffic (1,000+ req/s), head-sample at 1% plus tail sampling for anomalies. Keeping everything at 100% generally stops being viable somewhere around a billion traces per month.

How does NightOwl handle sampling?

NightOwl stores every request by default — no sampling — because BYOD Postgres at Laravel-typical volumes (1K-10K req/s) is still cheap. At very high volumes (tens of thousands of req/s) we recommend enabling sampling at the agent level. This is the opposite tradeoff from cloud APMs, where per-event cost forces sampling earlier.

Can I use tail-based sampling with OpenTelemetry?

Yes, via the OTel Collector's tail_sampling processor. You run a Collector cluster that buffers spans per trace ID for a window (usually 30-60 seconds), then decides what to keep based on rules: always keep errors, always keep traces over a latency threshold, sample a percentage of the rest. More operational complexity than head sampling, but dramatically better data quality.

PRICING

Flat pricing. No event caps. No per-seat fees.

14-day free trial, no credit card. Your PostgreSQL, your data.

HOBBY

$5 /month

1 app · 14 days lookback · all Laravel events

TEAM

$15 /month

Up to 3 connected apps · unlimited environments · all Laravel events

AGENCY

$69 /month

Unlimited apps · unlimited agent instances · same flat rate at any traffic