[ GUIDE ]

How to monitor failed Laravel jobs in production

From the failed_jobs table to aggregated per-class failure rates — four layers, and when each is the right tool.

QUICK ANSWER

How do I monitor failed Laravel jobs?

Check php artisan queue:failed locally, listen to the JobFailed event in a service provider to dispatch alerts, and install a tool that groups failures by job class with per-class failure rate trending. Laravel's failed_jobs table captures the data; monitoring tools surface it. NightOwl groups every job attempt by class with status, duration, and retry history.

Updated · 2026-04-13

Layer 1 — The failed_jobs table

Laravel stores failed jobs in the failed_jobs table by default. Inspect and manage them from the CLI:

bash
php artisan queue:failed
# lists all failed jobs with UUID, connection, queue, class, failed_at

php artisan queue:retry all
# retries every failed job

php artisan queue:retry 5c1d1234-...
# retries a specific job by UUID

php artisan queue:flush
# deletes all failed job records

This is fine for small apps. Once you're past a few failures per day, eyeballing the list stops working — you need aggregation.
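
Before you get there, a rough per-class count is available straight from the table. A minimal sketch using the query builder, assuming the default database failed-job store (the job class lives in the payload's displayName field):

php
use Illuminate\Support\Facades\DB;

// Failures per job class over the last 24 hours, worst first
$counts = DB::table('failed_jobs')
    ->where('failed_at', '>=', now()->subDay())
    ->get()
    ->groupBy(fn ($row) => json_decode($row->payload)->displayName ?? 'unknown')
    ->map->count()
    ->sortDesc();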

Layer 2 — Event listeners for alerts

Laravel fires Illuminate\Queue\Events\JobFailed when a job exhausts its retries. Listen in EventServiceProvider (Laravel 11+ apps, which ship without one, can use AppServiceProvider's boot() the same way):

app/Providers/EventServiceProvider.php

php
use Illuminate\Queue\Events\JobFailed;
use Illuminate\Support\Facades\Event;
use Illuminate\Support\Facades\Log;

public function boot(): void
{
    Event::listen(function (JobFailed $event) {
        Log::channel('slack')->error('Job failed', [
            'job' => $event->job->resolveName(),
            'queue' => $event->job->getQueue(),
            'exception' => $event->exception->getMessage(),
            'attempts' => $event->job->attempts(),
        ]);
    });
}

Pair with a Slack log channel in config/logging.php. You get a ping on every failure. Good enough for low-volume apps; noisy for anything bigger.
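
The channel itself is a few lines in config/logging.php. A minimal definition, assuming LOG_SLACK_WEBHOOK_URL holds an incoming-webhook URL:

php
'slack' => [
    'driver' => 'slack',
    'url' => env('LOG_SLACK_WEBHOOK_URL'),
    'username' => 'Queue Alerts',
    'emoji' => ':rotating_light:',
    'level' => 'error', // only error and above ping the channel
],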

Layer 3 — Per-class failed() handler

For class-specific cleanup or alerts, Laravel calls a failed() method on the job if present:

app/Jobs/SendWelcomeEmail.php

php
use Throwable;

public function failed(Throwable $exception): void
{
    // Mark the user as needing manual outreach
    $this->user->update(['onboarding_failed_at' => now()]);

    // Alert the customer success team
    CustomerSuccess::notify(new OnboardingEmailFailed($this->user, $exception));
}

Useful for side effects (cleanup, customer support notifications). Don't use it for monitoring — you'd be duplicating logic across every job class.

Layer 4 — Aggregated failure rate per class

The real signal is failure rate trending, not individual failures. A job class that goes from 0.1% failure rate to 5% is in trouble, even if individual failures look routine.

Metrics to surface per job class:

  • Total attempts (count)
  • Success / released / failed breakdown
  • p50, p95, p99 duration
  • Failure rate over time (spike detection)
  • Most common exception fingerprint per class

Horizon does this for Redis queues in real time. For other drivers (database, SQS, Beanstalkd) you need an APM. NightOwl records every job attempt with class, duration, status, and exception — grouped by class for aggregate trends.
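
If you want a rough homegrown version before adopting a tool, the hooks are there. A sketch that records per-class durations via the queue's before/after callbacks, placed in a service provider's boot(). The job_metrics table is hypothetical; percentiles would be computed from it separately:

php
use Illuminate\Queue\Events\JobProcessed;
use Illuminate\Queue\Events\JobProcessing;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Queue;

$starts = [];

// Stamp each job's start time as the worker picks it up
Queue::before(function (JobProcessing $event) use (&$starts) {
    $starts[$event->job->getJobId()] = microtime(true);
});

// Record class and duration when it completes
Queue::after(function (JobProcessed $event) use (&$starts) {
    $start = $starts[$event->job->getJobId()] ?? microtime(true);
    unset($starts[$event->job->getJobId()]);

    DB::table('job_metrics')->insert([ // hypothetical table
        'class'       => $event->job->resolveName(),
        'duration_ms' => (int) ((microtime(true) - $start) * 1000),
        'status'      => 'processed',
        'created_at'  => now(),
    ]);
});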

Common failure patterns worth catching

  1. External API timeouts — wrap HTTP calls in Http::timeout(); retry with backoff rather than letting the whole job fail (sketched after this list).
  2. Database deadlocks — retry the transaction; don't count deadlocks as real failures.
  3. Serialization errors — Eloquent model referenced in the job was deleted before the worker picked it up. Use SerializesModels only when the model is guaranteed to exist.
  4. Memory exhaustion — jobs processing large collections should chunk with chunkById(). Tune worker --memory flag.
  5. Stale code on long-running workers — always queue:restart on deploy.
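
Pattern 1 in code: a sketch of a job that fails the HTTP call fast and retries it in-process before burning a job attempt. The endpoint is a placeholder:

php
use Illuminate\Support\Facades\Http;

public int $tries = 3;

// Wait 10 s, then 60 s between job-level retries
public function backoff(): array
{
    return [10, 60];
}

public function handle(): void
{
    $response = Http::timeout(5)   // give up on the call after 5 s
        ->retry(2, 200)            // up to 2 attempts, 200 ms apart
        ->get('https://api.example.com/status'); // placeholder URL
}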

THE EASY WAY

NightOwl groups every job attempt by class with failure trending

NightOwl records every job attempt — connection, queue, class, duration, status (processed / released / failed), and exception fingerprint. Per-class dashboards show failure rate over time, p95 duration, and most common error. Alerts fire on rate spikes across any configured channel (Slack, Discord, Email, Webhook).

bash
composer require nightowl/agent
php artisan nightowl:install

Works with database, Redis, SQS, Beanstalkd — all queue drivers. From $5/month flat.

Frequently asked questions

Where do failed Laravel jobs go?

By default, Laravel writes failed jobs to the failed_jobs table (or whatever store is configured under queue.failed). Each row contains the UUID, connection, queue, payload, exception, and failed_at. Jobs fail after exhausting their allowed attempts ($tries on the job class or the worker's --tries flag; a single attempt by default) or passing their retryUntil deadline. Without monitoring, they sit there until someone runs php artisan queue:failed.
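
Both limits live on the job class. A minimal sketch:

php
use DateTime;

// Give up after five attempts...
public int $tries = 5;

// ...or once this deadline passes; retryUntil takes precedence
// over $tries when both are defined
public function retryUntil(): DateTime
{
    return now()->addMinutes(10);
}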

How do I get alerted when a Laravel job fails?

Three options from simplest to most robust: (1) listen to the Illuminate\Queue\Events\JobFailed event in a service provider and dispatch a Slack/email alert, (2) use a failed() method on the job class itself for per-class handling, (3) use a monitoring tool like NightOwl that groups failures by job class, tracks failure rate over time, and alerts on threshold breaches without code changes.

What's the difference between a failed job and a released job?

A released job was attempted, errored, and pushed back onto the queue for retry — it hasn't exhausted its $tries yet. A failed job has exhausted its retry budget and sits in failed_jobs until retried or flushed. Both are worth tracking: high release rates predict future failures, and failure-only dashboards miss flapping jobs that eventually succeed.
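
Releasing is explicit. A sketch of a handle() method that defers instead of failing when a dependency is down (apiIsHealthy() is a hypothetical check; the standard InteractsWithQueue trait provides release()):

php
public function handle(): void
{
    if (! $this->apiIsHealthy()) { // hypothetical health check
        // Not a failure yet: back on the queue, retried in 30 s.
        // Each release still consumes one of the job's attempts.
        $this->release(30);

        return;
    }

    // ... normal processing
}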

How do I retry all failed Laravel jobs?

php artisan queue:retry all — this dispatches every failed job back to the queue. You can target specific jobs by UUID (queue:retry <uuid>) or by queue (queue:retry --queue=emails). After retrying, jobs are removed from failed_jobs. If they fail again, they're re-added.

How do I detect stuck or slow Laravel jobs?

Two mechanisms. First, set a $timeout on the job class — workers kill jobs that exceed it. Second, monitor per-class duration percentiles: a job class whose p95 duration jumps from 2s to 30s is stuck even if it technically completes. Horizon surfaces this for Redis queues; NightOwl does it across all drivers.
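
The timeout half is two properties on the job class (enforced only when the worker has the pcntl extension):

php
// Kill the job if it runs longer than 120 s
public int $timeout = 120;

// Treat a timeout as a hard failure instead of releasing for retry
public bool $failOnTimeout = true;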

Should I use Horizon or a third-party tool for queue monitoring?

Horizon is excellent if you're on Redis — it shows queue depth, throughput, and worker status in real time. But it doesn't aggregate failures over long windows, and it only works with Redis. For database, SQS, Beanstalkd, or multi-driver setups, you'll need a third-party APM like NightOwl that covers all drivers.

How do I avoid losing failed jobs on deploy?

Two things. First, point queue.failed.driver at a persistent store ('database-uuids', the default, keeps failures in the failed_jobs table) so they survive worker restarts. Second, always run php artisan queue:restart on deploy so long-running workers pick up your new code — old workers can process jobs with stale code and fail unexpectedly. Supervisord/systemd handle the actual restart.
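
In a deploy script, that's one line after the new release goes live:

bash
php artisan queue:restart
# Workers finish their current job, exit cleanly, and Supervisor
# (or systemd) boots replacements running the new code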

What exceptions cause Laravel jobs to fail?

Any exception thrown from handle() that isn't caught. Common culprits: HTTP timeouts on external APIs, database deadlocks, memory limits (especially for jobs processing large collections), serialization failures on Eloquent models where the row was deleted, and code that assumes an HTTP context (sessions, CSRF tokens) running inside a worker. Structured monitoring groups these by fingerprint so you can see which cause dominates.

PRICING

Flat pricing. No event caps. No per-seat fees.

14-day free trial, no credit card. Your PostgreSQL, your data.

HOBBY

$5 /month

1 app · 14 days lookback · all Laravel events

TEAM

$15 /month

Up to 3 connected apps · unlimited environments · all Laravel events

AGENCY

$69 /month

Unlimited apps · unlimited agent instances · same flat rate at any traffic
