1. Keep your workers alive
Queue workers are long-running processes. They leak memory, get killed when the machine runs out of it, and have to be restarted on every deploy to pick up new code. You need a supervisor that restarts them automatically.
Supervisor config — /etc/supervisor/conf.d/laravel-worker.conf
[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /srv/app/artisan queue:work redis --sleep=3 --tries=3 --max-time=3600
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true
user=www-data
numprocs=4
redirect_stderr=true
stdout_logfile=/srv/app/storage/logs/worker.log
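stopwaitsecs=3600
If you're on Redis, Horizon is usually a better choice than hand-written queue:work programs: it auto-scales workers, exposes a dashboard, and handles graceful deploys. It does not replace the process monitor, though. In production, run the single php artisan horizon process under Supervisor so it is restarted if it exits.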
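The auto-scaling is configured per supervisor in config/horizon.php, which the horizon:install command publishes. The excerpt below is a sketch of what that section might look like; the supervisor name, queue list, and every number are illustrative assumptions, not recommendations.

```php
<?php

// config/horizon.php (excerpt only; the published file contains more keys)
return [
    'environments' => [
        'production' => [
            'supervisor-1' => [               // assumed supervisor name
                'connection' => 'redis',
                'queue' => ['default'],
                'balance' => 'auto',          // let Horizon scale workers with load
                'autoScalingStrategy' => 'time', // scale on estimated time to clear the queue
                'minProcesses' => 1,
                'maxProcesses' => 10,
                'balanceMaxShift' => 1,       // add or remove at most one process per adjustment
                'balanceCooldown' => 3,       // seconds between adjustments
                'tries' => 3,
            ],
        ],
    ],
];
```

With a setup along these lines, the numprocs value from the Supervisor config is no longer the knob that controls worker count; minProcesses and maxProcesses are.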
composer require laravel/horizon
php artisan horizon:install
php artisan horizon
2. Handle failures loud, not silent
Laravel writes failed jobs to the failed_jobs table by default. That's useful as a dead-letter store but invisible unless you look.
Register a listener with Queue::failing(), for example in a service provider's boot() method, to alert on failures:
use Illuminate\Queue\Events\JobFailed;
use Illuminate\Support\Facades\Queue;
// Inside a service provider such as App\Providers\AppServiceProvider
public function boot(): void
{
    // Runs once a job has failed for good, after its retries are exhausted.
    Queue::failing(function (JobFailed $event) {
        logger()->error('Job failed', [
            'job' => $event->job->resolveName(),
            'connection' => $event->connectionName,
            'queue' => $event->job->getQueue(),
            'attempts' => $event->job->attempts(),
            'exception' => $event->exception->getMessage(),
        ]);
    });
}
Logging is a baseline. For production alerting you want something that groups failures by job class and throttles duplicate noise; raw logger() calls will drown your channel.
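To illustrate what throttling by job class could mean, here is a minimal sketch in plain PHP. FailureAlertThrottle is a hypothetical helper, not part of Laravel, and the five-minute window is an arbitrary assumption. It keeps its state in memory, so as written it only dedupes within one process.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: allows at most one alert per job class per window.
 * State lives in memory, so it only dedupes within a single process.
 */
final class FailureAlertThrottle
{
    /** @var array<string, int> job class => Unix time of the last alert sent */
    private array $lastAlertAt = [];

    public function __construct(private int $windowSeconds = 300) {}

    public function shouldAlert(string $jobClass, ?int $now = null): bool
    {
        $now ??= time();
        $last = $this->lastAlertAt[$jobClass] ?? null;

        if ($last !== null && $now - $last < $this->windowSeconds) {
            return false; // this class already alerted inside the window
        }

        $this->lastAlertAt[$jobClass] = $now;

        return true;
    }
}

$throttle = new FailureAlertThrottle(windowSeconds: 300);

var_dump($throttle->shouldAlert('App\Jobs\ChargeCustomer', 1000)); // bool(true): first failure alerts
var_dump($throttle->shouldAlert('App\Jobs\ChargeCustomer', 1060)); // bool(false): duplicate inside the window
var_dump($throttle->shouldAlert('App\Jobs\SendInvoice', 1060));    // bool(true): different job class
var_dump($throttle->shouldAlert('App\Jobs\ChargeCustomer', 1300)); // bool(true): window has elapsed
```

Inside a Queue::failing() listener, the alert call would be wrapped in a check such as shouldAlert($event->job->resolveName()). Because workers run as separate processes, a real setup needs shared state; Laravel's Cache::add(), which only writes when the key is absent, is one way to get the same effect across workers.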
3. Track attempts, duration, and throughput
A job that takes 10x longer today than yesterday is a problem — even if it doesn't fail. The three metrics that matter:
- Wait time — how long a job sat in the queue before a worker picked it up
- Processing time — how long handle() took, with p50/p95/p99 per job class
- Attempt count — how often jobs are being retried before succeeding
Horizon exposes wait time and throughput for Redis queues on its metrics page. For everything else (SQS, database, mixed drivers, per-class duration percentiles), you need an APM with queue-specific watchers.
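To make the per-class percentiles concrete, here is a minimal sketch that computes p50/p95/p99 with the nearest-rank method. The sample durations are made up, and how they get collected (for example by timing jobs between Laravel's JobProcessing and JobProcessed events) is left out; an APM may well use a different percentile definition.

```php
<?php

declare(strict_types=1);

/**
 * Nearest-rank percentile: the smallest sample such that at least
 * $p percent of all samples are less than or equal to it.
 *
 * @param list<int|float> $samples Processing times, e.g. in milliseconds.
 */
function percentile(array $samples, float $p): float
{
    if ($samples === [] || $p <= 0 || $p > 100) {
        throw new InvalidArgumentException('Need at least one sample and 0 < p <= 100.');
    }

    sort($samples);
    $rank = (int) ceil($p / 100 * count($samples)); // 1-based rank

    return (float) $samples[$rank - 1];
}

// Hypothetical handle() durations in milliseconds, keyed by job class.
$durations = [
    'App\Jobs\ChargeCustomer' => [120, 80, 100, 900, 110, 95, 105, 130, 90, 115],
];

foreach ($durations as $jobClass => $samples) {
    printf(
        "%s p50=%.0fms p95=%.0fms p99=%.0fms\n",
        $jobClass,
        percentile($samples, 50),
        percentile($samples, 95),
        percentile($samples, 99),
    );
}
```

For this sample the median is 105 ms while p95 and p99 are both 900 ms: a single slow run leaves the median untouched but dominates the tail, which is why tail percentiles per job class are worth tracking alongside the average.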
4. Tag jobs for context
When a job fails, you want to know whose data it was processing. Define a tags() method on the job, which Horizon uses to tag it, or expose context as public properties (for example $this->user_id) that your APM can surface.
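use App\Models\Customer; // assumed model namespace
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\SerializesModels;
class ChargeCustomer implements ShouldQueue
{
    use Queueable, SerializesModels;
    public function __construct(public Customer $customer) {}
    // Horizon reads tags() to label the job in its dashboard.
    public function tags(): array
    {
        return ['billing', 'customer:'.$this->customer->id];
    }
    public function handle(): void
    {
        // ...
    }
}
THE EASY WAY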
NightOwl groups every queue attempt by job class
NightOwl's queue watcher records every dispatch, released/processed/failed transition, and exception, grouped by job class. You see p50/p95/p99 processing duration per class, wait-time trends, and failure fingerprints in one view — across all drivers (Redis, SQS, database, Beanstalkd), not just Redis.
Combine it with alert channels (Slack, Discord, Email, Webhook) to get paged on failure spikes without wiring custom Queue::failing listeners.
composer require nightowl/agent
php artisan nightowl:install
Telemetry lands in your PostgreSQL. Works alongside Horizon: NightOwl for the big picture, Horizon for live worker ops.