Most Laravel queue “bugs” aren’t bugs—they’re missing feedback loops. If a job can fail without paging you (or at least showing up somewhere you actually look), it will. The fix is not “add more retries” or “restart the worker more often”. The fix is to make failure observable, make retries intentional, and make “lost work” impossible to ignore.
This post is a production-focused checklist of the silent failure modes I see most often in Laravel queues, why they happen, and how to harden your setup so you detect, alert, and recover quickly.
The silent failure pattern: your queue is working… until it isn’t
A queue feels healthy when:
- queue:work is running
- Horizon (or your process manager) shows workers online
- jobs are being pushed
But “healthy” is not “reliable”. Silent failure usually looks like one of these:
- jobs stop being processed for minutes/hours with no alerts
- jobs are processed but do nothing (early returns, swallowed exceptions)
- jobs fail and get retried forever (or die) without anyone noticing
- jobs are “processed” but side effects don’t happen (DB committed, external API not called, emails not sent)
The root cause is almost always one of:
- the worker process is alive but stuck
- the job is failing in a way that doesn’t surface
- the job is being released/backed off indefinitely
- the queue driver semantics are misunderstood
If you only take one recommendation: treat queue health as an SLO with alerting, not as a background implementation detail.
Failure mode 1: the worker is running but not processing (stuck/hung)
The most dangerous state is a worker process that’s alive but not making progress. Your supervisor says it’s “RUNNING”, but jobs pile up.
Common causes:
- long-running jobs without timeouts (HTTP calls with no timeout, stuck I/O)
- deadlocks or slow queries holding a connection
- memory leaks causing swapping / GC thrash
- a single job monopolizing the worker
What to do (opinionated)
1) Set hard timeouts at multiple layers:
- job-level timeout
- worker-level --timeout
- HTTP client timeouts
2) Make “no progress” alertable:
- alert on queue depth (backlog)
- alert on oldest job age
- alert on Horizon “wait time” (if using Horizon)
3) Prefer more workers with shorter jobs over fewer workers with long jobs. Long-running jobs are where silent failures breed.
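If you use Horizon, the "wait time" alert from the list above is built in: you declare per-queue wait thresholds in config/horizon.php, and Horizon fires a LongWaitDetected event when a queue exceeds them. A minimal sketch (the thresholds and queue names are examples, not recommendations):

```php
// config/horizon.php (fragment)
'waits' => [
    // 'connection:queue' => max acceptable wait in seconds
    'redis:default' => 60,
    'redis:payments' => 15,
],
```

Route those notifications somewhere humans look (e.g. Horizon::routeSlackNotificationsTo(...) in a service provider), or they are just another unread dashboard.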
Example: enforce timeouts and fail fast
```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Http;

class SyncVendorCatalog implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // Hard stop for the worker
    public int $timeout = 60;

    // Don't keep retrying forever; make it loud
    public int $tries = 3;

    // Optional: fail as soon as one attempt throws an unhandled
    // exception, even if release()-based retries remain
    public int $maxExceptions = 1;

    public function handle(): void
    {
        $response = Http::timeout(10)   // total request timeout in seconds
            ->retry(2, 200)             // short in-process retry, 200ms delay
            ->get('https://api.vendor.com/catalog');

        $response->throw();

        // … process payload
    }
}
```
This does two important things:
- the job cannot hang indefinitely
- failures become deterministic (the job lands in failed_jobs once tries, or maxExceptions, is exhausted)
If you’re using curl directly, Guzzle, S3 clients, etc., set timeouts there too. Laravel can’t kill a job that’s blocked in a syscall unless the worker timeout triggers.
Operational note
If you run workers with php artisan queue:work, use explicit flags:

php artisan queue:work --timeout=60 --sleep=1 --tries=3
And don’t pretend queue:listen is acceptable in production; it’s slower and easier to misconfigure.
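Those flags belong in your process manager config, not in someone's shell history. A Supervisor program sketch matching the flags above (paths, process counts, and the program name are placeholders for your app):

```ini
[program:laravel-worker]
command=php /var/www/app/artisan queue:work redis --timeout=60 --sleep=1 --tries=3 --max-time=3600
numprocs=4
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
; give a running job time to finish before SIGKILL (keep this above --timeout)
stopwaitsecs=70
stdout_logfile=/var/log/laravel-worker.log
```

--max-time makes each worker exit cleanly after an hour, which sidesteps slow memory leaks; Supervisor restarts it automatically.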
Official docs: Queues https://laravel.com/docs/queues
Failure mode 2: exceptions are swallowed (your job “succeeds” while doing nothing)
This is the classic silent failure: the job finishes without throwing, so the queue driver marks it as done—even though the intended side effect never happened.
Where it happens:
- try { ... } catch (\Throwable $e) { /* log? */ } with no rethrow
- returning early on invalid state without recording it
- “best effort” integrations that ignore non-200 responses
What to do
Be strict: if a job is responsible for a side effect, it should throw when it can’t perform it. “Best effort” should be explicit and observable.
If you truly want to swallow an error (rare), you still need:
- a metric increment
- an error report (Sentry/Bugsnag)
- a dead-letter workflow (manual replay)
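For the dead-letter workflow, the built-in artisan commands are usually enough; the UUID argument below is a placeholder for the id that queue:failed prints:

```shell
php artisan queue:failed           # list failed jobs with their UUIDs
php artisan queue:retry <uuid>     # replay a single failed job
php artisan queue:retry all        # replay every failed job
php artisan queue:forget <uuid>    # discard one failed job
php artisan queue:flush            # discard all failed jobs (use with care)
```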
Example: don’t swallow; classify
```php
<?php

namespace App\Jobs;

use App\Services\Payments\PaymentGateway;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Throwable;

class CapturePayment implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 5;
    public int $backoff = 30; // seconds

    public function __construct(public int $orderId) {}

    public function handle(PaymentGateway $gateway): void
    {
        try {
            $gateway->capture($this->orderId);
        } catch (Throwable $e) {
            // If it's permanent, fail fast so it lands in failed_jobs.
            if ($gateway->isPermanentFailure($e)) {
                $this->fail($e); // marks the job as failed immediately
                return;
            }

            // If it's transient, rethrow so the queue retries with backoff.
            throw $e;
        }
    }
}
```
Key judgment call: “permanent failure” should not burn retries. You want it to go to failed_jobs quickly so you can handle it (refund, contact user, etc.).
Failure mode 3: misconfigured retries/backoff create infinite limbo
Laravel makes it easy to back off and retry, but it’s also easy to create jobs that never succeed and never fail loudly.
How this happens:
- release() is called repeatedly without a cap
- backoff grows but tries is high or defaulted
- retryUntil() pushes the failure window far out
- Horizon auto-balancing hides the fact that one queue is stuck
What to do
- Set finite retries ($tries) for most jobs.
- Use bounded retry windows for time-sensitive work (retryUntil).
- For idempotent tasks that can be replayed later, fail early and rely on a replay mechanism.
A reasonable default for many teams:
- $tries = 3 for external API calls
- $tries = 1 for non-idempotent side effects unless you've built idempotency
- $backoff = [10, 60, 300] for transient network issues
If you can’t explain why a job is safe to retry, it probably isn’t.
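Those defaults translate directly into job properties and methods. A sketch combining finite tries, staged backoff, and a bounded retry window (the class name is hypothetical):

```php
<?php

namespace App\Jobs;

use DateTime;
use Illuminate\Contracts\Queue\ShouldQueue;

class SyncExternalResource implements ShouldQueue
{
    public int $tries = 3;

    // Staged delays for transient network issues: 10s, then 60s, then 5min
    public function backoff(): array
    {
        return [10, 60, 300];
    }

    // Hard deadline: after this, attempts stop and the job fails loudly
    public function retryUntil(): DateTime
    {
        return now()->addMinutes(15);
    }
}
```

The retryUntil deadline is what keeps time-sensitive work out of infinite limbo: a stale sync that finally runs an hour late is often worse than a visible failure.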
Failure mode 4: “lost” jobs due to driver semantics and worker restarts
Not all queue drivers behave the same. Silent loss or duplication is often a mismatch between assumptions and reality.
Database driver gotchas
- Jobs are stored in a table; workers poll.
- Under load, polling delay can look like “stuck”.
- Long transactions can block job reservation.
If you’re at the point where queue latency matters, move off database.
Redis driver gotchas
Redis is the default for a reason: it’s fast and supports good semantics. But you still need to understand:
- visibility timeout / retry_after: if a worker dies mid-job, the job becomes available again after retry_after
- if retry_after is too small relative to job runtime, you'll get duplicate processing
In Laravel, retry_after is configured per connection in config/queue.php.
Rule: set retry_after comfortably above your maximum real job runtime (including worst-case API slowness), and keep job timeouts below it.
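As a sketch of that rule in config/queue.php (the numbers are examples; tune them to your real worst-case runtime):

```php
// config/queue.php (fragment)
'redis' => [
    'driver' => 'redis',
    'connection' => 'default',
    'queue' => env('REDIS_QUEUE', 'default'),
    // Must exceed the longest real job runtime, worst case included
    'retry_after' => 600,
    'block_for' => null,
],
```

Pair this with a worker timeout safely below it (e.g. --timeout=540), so a hung job is killed before Redis hands a second copy to another worker.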
Supervisor/systemd gotchas
Workers restart. Deploys happen. Servers reboot.
Silent failure here is often:
- workers not restarted after deploy (stale code)
- process manager configured to restart too aggressively (thrashing)
- logs not captured anywhere useful
If you deploy frequently, prefer Laravel Horizon for Redis-backed queues. It gives you process control, visibility, and per-queue balancing.
Official link: Horizon https://laravel.com/docs/horizon
Failure mode 5: you have “failed jobs”, but nobody sees them
Laravel will happily write to failed_jobs (or Horizon’s failed list) forever while your team never checks it.
This is the most common “silent failure” in real companies: the system is technically recording failure, but it’s not operationally connected to humans.
What to do (minimum viable)
- Ensure failed job storage is configured (the queue:failed-table migration for the database driver).
- Alert on failed job rate.
- Give engineers a one-command way to inspect and replay.
Concrete setup: hook into Queue events and report
Laravel emits queue events you can subscribe to. Use them to send failures to Sentry/Bugsnag and to emit metrics.
```php
<?php

namespace App\Providers;

use Illuminate\Queue\Events\JobFailed;
use Illuminate\Queue\Events\JobProcessed;
use Illuminate\Queue\Events\JobProcessing;
use Illuminate\Support\Facades\Event;
use Illuminate\Support\ServiceProvider;

class QueueObservabilityServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        Event::listen(JobProcessing::class, function (JobProcessing $event) {
            // Example: increment a metric, add trace context, etc.
            // statsd()->increment('queue.job_processing', 1, ['queue' => $event->job->getQueue()]);
        });

        Event::listen(JobProcessed::class, function (JobProcessed $event) {
            // statsd()->increment('queue.job_processed', 1, ['queue' => $event->job->getQueue()]);
        });

        Event::listen(JobFailed::class, function (JobFailed $event) {
            // Send to your error tracker
            if (app()->bound('sentry')) {
                \Sentry\captureException($event->exception);
            }

            // Also log with strong context
            logger()->error('Queue job failed', [
                'connection' => $event->connectionName,
                'queue' => $event->job->getQueue(),
                'job' => $event->job->resolveName(),
                'uuid' => method_exists($event->job, 'uuid') ? $event->job->uuid() : null,
                'message' => $event->exception->getMessage(),
            ]);
        });
    }
}
```
This is deliberately boring: make failures impossible to ignore by routing them into the same system that pages you for HTTP 500s.
If you’re using Horizon, also configure its notification hooks for long waits and failures.
Practical alerting targets
Pick alerts that catch “silent” quickly without constant noise:
- Queue backlog: jobs waiting > N for > 5 minutes
- Oldest job age: oldest pending job > X minutes
- Failed jobs rate: > 0 in 10 minutes for critical queues
- No processing: processed count = 0 while pending count increases
Don’t overcomplicate it. Two alerts (oldest job age + failed jobs) catch most incidents.
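On the database driver, "oldest job age" is one query away. A MySQL sketch against the default jobs table (which stores unix timestamps), assuming your monitoring agent can run SQL:

```sql
-- Seconds the oldest unreserved job has been waiting, per queue
SELECT queue,
       UNIX_TIMESTAMP() - MIN(available_at) AS oldest_wait_seconds
FROM jobs
WHERE reserved_at IS NULL
GROUP BY queue;
```

On Redis, lean on Horizon's wait-time metrics instead, or sample Queue::size('your-queue') from a scheduled command and ship it to your metrics system.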
Failure mode 6: duplication and non-idempotent side effects (the “it ran twice” incident)
Some teams call this a “silent failure” because the queue doesn’t error—but users see double emails, double charges, duplicate webhooks.
Duplication happens when:
- a worker times out but the side effect already happened
- retry_after expires and the job is re-queued while it's still running
- external systems retry webhooks and you enqueue duplicates
What to do
Assume at-least-once delivery. Build idempotency where it matters.
- For emails/notifications: store a send record keyed by a deterministic id
- For payments: use provider idempotency keys (Stripe supports this) and store request IDs
- For “sync” jobs: use upserts and version checks
Example: simple idempotency guard using cache lock
This won’t solve every case, but it’s a strong baseline for “don’t run twice concurrently” and “avoid rapid duplicates”.
```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Cache;

class SendInvoiceEmail implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 3;

    public function __construct(public int $invoiceId) {}

    public function handle(): void
    {
        $key = "invoice:{$this->invoiceId}:email_sent";

        // 24h idempotency window
        if (Cache::has($key)) {
            return;
        }

        $lock = Cache::lock("lock:{$key}", 30);

        if (! $lock->get()) {
            // Another worker is doing it.
            return;
        }

        try {
            // ... send email

            Cache::put($key, true, now()->addDay());
        } finally {
            $lock->release();
        }
    }
}
```
Tradeoff: cache-based idempotency is operationally dependent on Redis. For “money moved” side effects, prefer database unique constraints or provider idempotency keys.
What to change in a real Laravel codebase this week
If your queue failures are currently “silent”, don’t start by rewriting jobs. Start by tightening the system around them.
1) Move critical queues to Redis + Horizon if you’re still on the database driver.
2) Set explicit timeouts and retries on every job that touches the network.
3) Wire JobFailed to your error tracker and create one alert: “failed jobs > 0 on critical queue”.
4) Alert on oldest job age (this catches stuck workers even when nothing is failing).
5) Add idempotency to the top 1–2 risky jobs (payments, emails, webhooks). Don’t boil the ocean.
Decision rule to keep you out of trouble
If a queued job triggers an external side effect (email, payment, webhook, data sync), treat it like a mini production service:
- it must time out
- it must fail loudly (error tracker + alert)
- it must be safe to retry (or it must not retry)
When you’re unsure, bias toward: fail fast → land in failed jobs → replay intentionally. Silent “best effort” is how queues rot in production.
Read the full post on QCode: https://qcode.in/why-your-laravel-queue-fails-silently-and-how-to-fix-it/