Most Laravel queue “bugs” aren’t bugs—they’re missing feedback loops. If a job can fail without paging you (or at least showing up somewhere you actually look), it will. The fix is not “add more retries” or “restart the worker more often”. The fix is to make failure observable, make retries intentional, and make “lost work” impossible to ignore.
This post is a production-focused checklist of the silent failure modes I see most often in Laravel queues, why they happen, and how to harden your setup so you detect, alert, and recover quickly.
The silent failure pattern: your queue is working… until it isn’t
A queue feels healthy when:
- queue:work is running
- Horizon (or your process manager) shows workers online
- jobs are being pushed
But “healthy” is not “reliable”. Silent failure usually looks like one of these:
- jobs stop being processed for minutes/hours with no alerts
- jobs are processed but do nothing (early returns, swallowed exceptions)
- jobs fail and get retried forever (or die) without anyone noticing
- jobs are “processed” but side effects don’t happen (DB committed, external API not called, emails not sent)
The root cause is almost always one of:
- the worker process is alive but stuck
- the job is failing in a way that doesn’t surface
- the job is being released/backed off indefinitely
- the queue driver semantics are misunderstood
If you only take one recommendation: treat queue health as an SLO with alerting, not as a background implementation detail.
Failure mode 1: the worker is running but not processing (stuck/hung)
The most dangerous state is a worker process that’s alive but not making progress. Your supervisor says it’s “RUNNING”, but jobs pile up.
Common causes:
- long-running jobs without timeouts (HTTP calls with no timeout, stuck I/O)
- deadlocks or slow queries holding a connection
- memory leaks causing swapping / GC thrash
- a single job monopolizing the worker
What to do (opinionated)
1) Set hard timeouts at multiple layers:
- job-level timeout
- worker-level --timeout
- HTTP client timeouts
2) Make “no progress” alertable:
- alert on queue depth (backlog)
- alert on oldest job age
- alert on Horizon “wait time” (if using Horizon)
3) Prefer more workers with shorter jobs over fewer workers with long jobs. Long-running jobs are where silent failures breed.
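If you use Horizon, the "wait time" alert from the list above is built in: you declare per-queue wait thresholds in config/horizon.php, and Horizon fires a LongWaitDetected event when a queue exceeds them. A minimal sketch (the thresholds and queue names are examples, not recommendations):

```php
// config/horizon.php (fragment)
'waits' => [
    // 'connection:queue' => max acceptable wait in seconds
    'redis:default' => 60,
    'redis:payments' => 15,
],
```

Route those notifications somewhere humans look (e.g. Horizon::routeSlackNotificationsTo(...) in a service provider), or they are just another unread dashboard.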
Example: enforce timeouts and fail fast
```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Http;

class SyncVendorCatalog implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // Hard stop for the worker
    public int $timeout = 60;

    // Don't keep retrying forever; make it loud
    public int $tries = 3;

    // Optional: fail as soon as one attempt throws an unhandled
    // exception, even if release()-based retries remain
    public int $maxExceptions = 1;

    public function handle(): void
    {
        $response = Http::timeout(10)   // total request timeout in seconds
            ->retry(2, 200)             // short in-process retry, 200ms delay
            ->get('https://api.vendor.com/catalog');

        $response->throw();

        // … process payload
    }
}
```
This does two important things:
- the job cannot hang indefinitely
- failures become deterministic (the job lands in failed_jobs once tries, or maxExceptions, is exhausted)
If you’re using curl directly, Guzzle, S3 clients, etc., set timeouts there too. Laravel can’t kill a job that’s blocked in a syscall unless the worker timeout triggers.
Operational note
If you run workers with php artisan queue:work, use explicit flags:

php artisan queue:work --timeout=60 --sleep=1 --tries=3
And don’t pretend queue:listen is acceptable in production; it’s slower and easier to misconfigure.
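Those flags belong in your process manager config, not in someone's shell history. A Supervisor program sketch matching the flags above (paths, process counts, and the program name are placeholders for your app):

```ini
[program:laravel-worker]
command=php /var/www/app/artisan queue:work redis --timeout=60 --sleep=1 --tries=3 --max-time=3600
numprocs=4
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
; give a running job time to finish before SIGKILL (keep this above --timeout)
stopwaitsecs=70
stdout_logfile=/var/log/laravel-worker.log
```

--max-time makes each worker exit cleanly after an hour, which sidesteps slow memory leaks; Supervisor restarts it automatically.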
Official docs: Queues https://laravel.com/docs/queues
Failure mode 2: exceptions are swallowed (your job “succeeds” while doing nothing)
This is the classic silent failure: the job finishes without throwing, so the queue driver marks it as done—even though the intended side effect never happened.
Where it happens:
- try { ... } catch (\Throwable $e) { /* log? */ } with no rethrow
- returning early on invalid state without recording it
- “best effort” integrations that ignore non-200 responses
What to do
Be strict: if a job is responsible for a side effect, it should throw when it can’t perform it. “Best effort” should be explicit and observable.
If you truly want to swallow an error (rare), you still need:
- a metric increment
- an error report (Sentry/Bugsnag)
- a dead-letter workflow (manual replay)
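For the dead-letter workflow, the built-in artisan commands are usually enough; the UUID argument below is a placeholder for the id that queue:failed prints:

```shell
php artisan queue:failed           # list failed jobs with their UUIDs
php artisan queue:retry <uuid>     # replay a single failed job
php artisan queue:retry all        # replay every failed job
php artisan queue:forget <uuid>    # discard one failed job
php artisan queue:flush            # discard all failed jobs (use with care)
```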
Example: don’t swallow; classify
```php
<?php

namespace App\Jobs;

use App\Services\Payments\PaymentGateway;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Throwable;

class CapturePayment implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 5;
    public int $backoff = 30; // seconds

    public function __construct(public int $orderId) {}

    public function handle(PaymentGateway $gateway): void
    {
        try {
            $gateway->capture($this->orderId);
        } catch (Throwable $e) {
            // If it's permanent, fail fast so it lands in failed_jobs.
            if ($gateway->isPermanentFailure($e)) {
                $this->fail($e); // marks the job as failed immediately
                return;
            }

            // If it's transient, rethrow so the queue retries with backoff.
            throw $e;
        }
    }
}
```
Key judgment call: “permanent failure” should not burn retries. You want it to go to failed_jobs quickly so you can handle it (refund, contact user, etc.).
Failure mode 3: misconfigured retries/backoff create infinite limbo
Laravel makes it easy to back off and retry, but it’s also easy to create jobs that never succeed and never fail loudly.
How this happens:
- release() is called repeatedly without a cap
- backoff grows but tries is high or defaulted
- retryUntil() pushes the failure window far out
- Horizon auto-balancing hides the fact that one queue is stuck
What to do
- Set finite retries ($tries) for most jobs.
- Use bounded retry windows for time-sensitive work (retryUntil).
- For idempotent tasks that can be replayed later, fail early and rely on a replay mechanism.
A reasonable default for many teams:
- $tries = 3 for external API calls
- $tries = 1 for non-idempotent side effects unless you've built idempotency
- $backoff = [10, 60, 300] for transient network issues
If you can’t explain why a job is safe to retry, it probably isn’t.
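Those defaults translate directly into job properties and methods. A sketch combining finite tries, staged backoff, and a bounded retry window (the class name is hypothetical):

```php
<?php

namespace App\Jobs;

use DateTime;
use Illuminate\Contracts\Queue\ShouldQueue;

class SyncExternalResource implements ShouldQueue
{
    public int $tries = 3;

    // Staged delays for transient network issues: 10s, then 60s, then 5min
    public function backoff(): array
    {
        return [10, 60, 300];
    }

    // Hard deadline: after this, attempts stop and the job fails loudly
    public function retryUntil(): DateTime
    {
        return now()->addMinutes(15);
    }
}
```

The retryUntil deadline is what keeps time-sensitive work out of infinite limbo: a stale sync that finally runs an hour late is often worse than a visible failure.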
Failure mode 4: “lost” jobs due to driver semantics and worker restarts
Not all queue drivers behave the same. Silent loss or duplication is often a mismatch between assumptions and reality.
Database driver gotchas
- Jobs are stored in a table; workers poll.
- Under load, polling delay can look like “stuck”.
- Long transactions can block job reservation.
If you’re at the point where queue latency matters, move off database.
Redis driver gotchas
Redis is the default for a reason: it’s fast and supports good semantics. But you still need to understand:
- visibility timeout / retry_after: if a worker dies mid-job, the job becomes available again after retry_after
- if retry_after is too small relative to job runtime, you'll get duplicate processing
In Laravel, retry_after is configured per connection in config/queue.php.
Rule: set retry_after comfortably above your maximum real job runtime (including worst-case API slowness), and keep job timeouts below it.
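As a sketch of that rule in config/queue.php (the numbers are examples; tune them to your real worst-case runtime):

```php
// config/queue.php (fragment)
'redis' => [
    'driver' => 'redis',
    'connection' => 'default',
    'queue' => env('REDIS_QUEUE', 'default'),
    // Must exceed the longest real job runtime, worst case included
    'retry_after' => 600,
    'block_for' => null,
],
```

Pair this with a worker timeout safely below it (e.g. --timeout=540), so a hung job is killed before Redis hands a second copy to another worker.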
Supervisor/systemd gotchas
Workers restart. Deploys happen. Servers reboot.
Silent failure here is often:
- workers not restarted after deploy (stale code)
- process manager configured to restart too aggressively (thrashing)
- logs not captured anywhere useful
If you deploy frequently, prefer Laravel Horizon for Redis-backed queues. It gives you process control, visibility, and per-queue balancing.
Official link: Horizon https://laravel.com/docs/horizon
Failure mode 5: you have “failed jobs”, but nobody sees them
Laravel will happily write to failed_jobs (or Horizon’s failed list) forever while your team never checks it.
This is the most common “silent failure” in real companies: the system is technically recording failure, but it’s not operationally connected to humans.
What to do (minimum viable)
- Ensure failed job storage is configured (the queue:failed-table migration for the database driver).
- Alert on failed job rate.
- Give engineers a one-command way to inspect and replay.
Concrete setup: hook into Queue events and report
Laravel emits queue events you can subscribe to. Use them to send failures to Sentry/Bugsnag and to emit metrics.
```php
<?php

namespace App\Providers;

use Illuminate\Queue\Events\JobFailed;
use Illuminate\Queue\Events\JobProcessed;
use Illuminate\Queue\Events\JobProcessing;
use Illuminate\Support\Facades\Event;
use Illuminate\Support\ServiceProvider;

class QueueObservabilityServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        Event::listen(JobProcessing::class, function (JobProcessing $event) {
            // Example: increment a metric, add trace context, etc.
            // statsd()->increment('queue.job_processing', 1, ['queue' => $event->job->getQueue()]);
        });

        Event::listen(JobProcessed::class, function (JobProcessed $event) {
            // statsd()->increment('queue.job_processed', 1, ['queue' => $event->job->getQueue()]);
        });

        Event::listen(JobFailed::class, function (JobFailed $event) {
            // Send to your error tracker
            if (app()->bound('sentry')) {
                \Sentry\captureException($event->exception);
            }

            // Also log with strong context
            logger()->error('Queue job failed', [
                'connection' => $event->connectionName,
                'queue' => $event->job->getQueue(),
                'job' => $event->job->resolveName(),
                'uuid' => method_exists($event->job, 'uuid') ? $event->job->uuid() : null,
                'message' => $event->exception->getMessage(),
            ]);
        });
    }
}
```
This is deliberately boring: make failures impossible to ignore by routing them into the same system that pages you for HTTP 500s.
If you’re using Horizon, also configure its notification hooks for long waits and failures.
Practical alerting targets
Pick alerts that catch “silent” quickly without constant noise:
- Queue backlog: jobs waiting > N for > 5 minutes
- Oldest job age: oldest pending job > X minutes
- Failed jobs rate: > 0 in 10 minutes for critical queues
- No processing: processed count = 0 while pending count increases
Don’t overcomplicate it. Two alerts (oldest job age + failed jobs) catch most incidents.
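On the database driver, "oldest job age" is one query away. A MySQL sketch against the default jobs table (which stores unix timestamps), assuming your monitoring agent can run SQL:

```sql
-- Seconds the oldest unreserved job has been waiting, per queue
SELECT queue,
       UNIX_TIMESTAMP() - MIN(available_at) AS oldest_wait_seconds
FROM jobs
WHERE reserved_at IS NULL
GROUP BY queue;
```

On Redis, lean on Horizon's wait-time metrics instead, or sample Queue::size('your-queue') from a scheduled command and ship it to your metrics system.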
Failure mode 6: duplication and non-idempotent side effects (the “it ran twice” incident)
Some teams call this a “silent failure” because the queue doesn’t error—but users see double emails, double charges, duplicate webhooks.
Duplication happens when:
- a worker times out but the side effect already happened
- retry_after expires and the job is re-queued while it's still running
- external systems retry webhooks and you enqueue duplicates
What to do
Assume at-least-once delivery. Build idempotency where it matters.
- For emails/notifications: store a send record keyed by a deterministic id
- For payments: use provider idempotency keys (Stripe supports this) and store request IDs
- For “sync” jobs: use upserts and version checks
Example: simple idempotency guard using cache lock
This won’t solve every case, but it’s a strong baseline for “don’t run twice concurrently” and “avoid rapid duplicates”.
```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Cache;

class SendInvoiceEmail implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 3;

    public function __construct(public int $invoiceId) {}

    public function handle(): void
    {
        $key = "invoice:{$this->invoiceId}:email_sent";

        // 24h idempotency window
        if (Cache::has($key)) {
            return;
        }

        $lock = Cache::lock("lock:{$key}", 30);

        if (! $lock->get()) {
            // Another worker is doing it.
            return;
        }

        try {
            // ... send email

            Cache::put($key, true, now()->addDay());
        } finally {
            $lock->release();
        }
    }
}
```
Tradeoff: cache-based idempotency is operationally dependent on Redis. For “money moved” side effects, prefer database unique constraints or provider idempotency keys.
What to change in a real Laravel codebase this week
If your queue failures are currently “silent”, don’t start by rewriting jobs. Start by tightening the system around them.
1) Move critical queues to Redis + Horizon if you’re still on the database driver.
2) Set explicit timeouts and retries on every job that touches the network.
3) Wire JobFailed to your error tracker and create one alert: “failed jobs > 0 on critical queue”.
4) Alert on oldest job age (this catches stuck workers even when nothing is failing).
5) Add idempotency to the top 1–2 risky jobs (payments, emails, webhooks). Don’t boil the ocean.
Decision rule to keep you out of trouble
If a queued job triggers an external side effect (email, payment, webhook, data sync), treat it like a mini production service:
- it must time out
- it must fail loudly (error tracker + alert)
- it must be safe to retry (or it must not retry)
When you’re unsure, bias toward: fail fast → land in failed jobs → replay intentionally. Silent “best effort” is how queues rot in production.
Read the full post on QCode: https://qcode.in/why-your-laravel-queue-fails-silently-and-how-to-fix-it/