Layer 1 — The failed_jobs table
Laravel stores failed jobs in the `failed_jobs` table by default. Inspect them from the command line:
```shell
# List all failed jobs with UUID, connection, queue, class, failed_at
php artisan queue:failed

# Retry every failed job
php artisan queue:retry all

# Retry a specific job by UUID
php artisan queue:retry 5c1d1234-...

# Delete all failed job records
php artisan queue:flush
```

This is fine for small apps. Once you're past a few failures per day, eyeballing the list stops working — you need aggregation.
Layer 2 — Event listeners for alerts
Laravel fires `Illuminate\Queue\Events\JobFailed` when a job exhausts retries. Listen in `EventServiceProvider`:
```php
// app/Providers/EventServiceProvider.php

use Illuminate\Queue\Events\JobFailed;
use Illuminate\Support\Facades\Event;
use Illuminate\Support\Facades\Log;

public function boot(): void
{
    Event::listen(function (JobFailed $event) {
        Log::channel('slack')->error('Job failed', [
            'job' => $event->job->resolveName(),
            'queue' => $event->job->getQueue(),
            'exception' => $event->exception->getMessage(),
            'attempts' => $event->job->attempts(),
        ]);
    });
}
```
Pair with a Slack log channel in `config/logging.php`. You get a ping on every failure. Good enough for low-volume apps; noisy for anything bigger.
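The Slack channel itself is a few lines of config. A minimal sketch, assuming you already have an incoming-webhook URL in your `.env`:

```php
// config/logging.php
'channels' => [
    'slack' => [
        'driver' => 'slack',
        'url' => env('LOG_SLACK_WEBHOOK_URL'),
        'username' => 'Queue Monitor',
        'emoji' => ':boom:',
        'level' => 'error', // only error-and-above reaches Slack
    ],
],
```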
Layer 3 — Per-class failed() handler
For class-specific cleanup or alerts, Laravel calls a `failed()` method on the job if present:

```php
// app/Jobs/SendWelcomeEmail.php

public function failed(Throwable $exception): void
{
    // Mark the user as needing manual outreach
    $this->user->update(['onboarding_failed_at' => now()]);

    // Alert the customer success team
    CustomerSuccess::notify(new OnboardingEmailFailed($this->user, $exception));
}
```

Useful for side effects (cleanup, customer support notifications). Don't use it for monitoring — you'd be duplicating logic across every job class.
Layer 4 — Aggregated failure rate per class
The real signal is failure rate trending, not individual failures. A job class that goes from 0.1% failure rate to 5% is in trouble, even if individual failures look routine.
Metrics to surface per job class:
- Total attempts (count)
- Success / released / failed breakdown
- p50, p95, p99 duration
- Failure rate over time (spike detection)
- Most common exception fingerprint per class
Horizon does this for Redis queues in real time. For other drivers (database, SQS, Beanstalkd) you need an APM. NightOwl records every job attempt with class, duration, status, and exception — grouped by class for aggregate trends.
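If you want a homegrown version of this for the database driver, Laravel's queue events are enough to feed a metrics table you aggregate later. A sketch — the `job_metrics` table is an assumption you'd create via a migration:

```php
use Illuminate\Queue\Events\JobFailed;
use Illuminate\Queue\Events\JobProcessed;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Event;

// One row per attempt; aggregate with GROUP BY class for trends.
Event::listen(function (JobProcessed $event) {
    DB::table('job_metrics')->insert([
        'class'       => $event->job->resolveName(),
        'queue'       => $event->job->getQueue(),
        'status'      => 'processed',
        'recorded_at' => now(),
    ]);
});

// Mirror the same insert with status 'failed' in a JobFailed listener
// so failure rate is just failed / total per class.
```

Durations and exception fingerprints need a bit more plumbing (timestamp on `JobProcessing`, hash of the exception class + message), which is where a dedicated tool starts to pay for itself.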
Common failure patterns worth catching
- External API timeouts — wrap HTTP calls in `Http::timeout()`; retry with backoff rather than letting the whole job fail.
- Database deadlocks — retry the transaction; don't count deadlocks as real failures.
- Serialization errors — the Eloquent model referenced in the job was deleted before the worker picked it up. Use `SerializesModels` only when the model is guaranteed to exist.
- Memory exhaustion — jobs processing large collections should chunk with `chunkById()`. Tune the worker's `--memory` flag.
- Stale code on long-running workers — always `queue:restart` on deploy.
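The first two patterns can be handled inside the job itself. A sketch, assuming a hypothetical `SyncCrmContact` job and endpoint:

```php
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Support\Facades\Http;

class SyncCrmContact implements ShouldQueue
{
    use Queueable;

    public int $tries = 3;

    // Wait 10s, 60s, 300s between job-level retries
    public function backoff(): array
    {
        return [10, 60, 300];
    }

    public function handle(): void
    {
        // Fail fast on a hung API instead of tying up the worker.
        // Http::retry() absorbs transient blips before the
        // job-level retry (with backoff) kicks in.
        Http::timeout(5)
            ->retry(2, 200)
            ->post('https://crm.example.com/contacts', [/* … */])
            ->throw();
    }
}
```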
THE EASY WAY
NightOwl groups every job attempt by class with failure trending
NightOwl records every job attempt — connection, queue, class, duration, status (processed / released / failed), and exception fingerprint. Per-class dashboards show failure rate over time, p95 duration, and most common error. Alerts fire on rate spikes across any configured channel (Slack, Discord, Email, Webhook).
```shell
composer require nightowl/agent
php artisan nightowl:install
```

Works with database, Redis, SQS, Beanstalkd — all queue drivers. From $5/month flat.