I built a Laravel queue monitoring tool because I got tired of not knowing what my jobs actually do

#laravel #php #opensource

At some point I realized I don’t really understand what’s going on with my queues.
I mean, yeah:

jobs are running
workers are alive
logs exist somewhere

But if I try to answer simple questions:

did this job actually do what it was supposed to do?
or did it just “successfully complete”?
where do things silently break?

For those, I don’t really have good answers.

The most annoying part: silent failures

There’s one type of bug that’s especially frustrating.
A job:

runs
throws no errors
finishes successfully
…and does absolutely nothing.
I thought it was rare, but once I started digging, I found several cases like this. What helped me spot them was looking at execution time.
For example: a job normally takes ~500ms, but sometimes it finishes in 5ms. That’s… suspicious 🙂
That’s how I found jobs that were “successful” but effectively did nothing.

At some point I just wanted a clear view. Nothing fancy.
Just open a page and understand:
what’s happening right now
what’s failing
what looks weird

Plus a few practical things:

see all retries
understand failure reasons
group recurring errors
retry not just one job, but many
sometimes fix the payload and re-run

And of course alerts:

Slack
Webhook
PagerDuty for critical stuff

So I built a small package. Installation is basically:

composer require romalytar/yammi-jobs-monitoring-laravel
php artisan migrate

Then you just open:

/jobs-monitor
No heavy setup; it just works.

What it actually shows

The main idea: not just status, but behavior.
You can open any job and see:

all attempts
errors
execution time for each try
That alone gives way more insight than just “failed / success”.
There are also some basic stats:
which jobs fail the most
slowest jobs
retry rate
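To make those numbers concrete, here is a minimal sketch of how they could be derived from raw attempt records. It is not the package’s code: the record shape (`job`, `uuid`, `attempt`, `status`, `ms`), the `jobStats` name, and the definition of retry rate as “share of runs that needed more than one attempt” are all assumptions for illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch (not the package's implementation): per-job-class stats
// from attempt records shaped like
// ['job' => class, 'uuid' => run id, 'attempt' => n, 'status' => 'success'|'failed', 'ms' => duration].
// "Retry rate" is ASSUMED to mean the share of runs that needed more than one attempt.
function jobStats(array $attempts): array
{
    $acc = [];
    foreach ($attempts as $a) {
        $job = $a['job'];
        $acc[$job] ??= ['attempts' => 0, 'failures' => 0, 'totalMs' => 0.0, 'runs' => []];
        $acc[$job]['attempts']++;
        $acc[$job]['totalMs'] += $a['ms'];
        if ($a['status'] === 'failed') {
            $acc[$job]['failures']++;
        }
        // Remember the highest attempt number seen for each individual run.
        $acc[$job]['runs'][$a['uuid']] = max($acc[$job]['runs'][$a['uuid']] ?? 0, $a['attempt']);
    }

    $stats = [];
    foreach ($acc as $job => $s) {
        $retried = count(array_filter($s['runs'], fn (int $n): bool => $n > 1));
        $stats[$job] = [
            'failures'   => $s['failures'],
            'avg_ms'     => $s['totalMs'] / $s['attempts'],
            'retry_rate' => $retried / count($s['runs']),
        ];
    }

    // "Which jobs fail the most": highest failure count first.
    uasort($stats, fn (array $x, array $y): int => $y['failures'] <=> $x['failures']);

    return $stats;
}
```

Sorting by failure count gives the “which jobs fail the most” view; sorting by `avg_ms` instead would give “slowest jobs”.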

Failed jobs (the dead-letter queue, DLQ) are actually usable:

retry
edit & retry (JSON payload)
bulk operations

One thing I personally like a lot: error grouping.
Instead of scrolling through identical stack traces, you see grouped failures and immediately understand:
“ok, this all comes from the same issue”.
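A rough idea of how grouping can work: reduce each exception to a fingerprint built from its class, the file and line where it was thrown, and its message with variable parts masked out. This is a hypothetical sketch (`errorFingerprint` and `groupFailures` are made-up names), not how the package actually groups errors.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch of error grouping, not the package's implementation:
// failures with the same exception class, throw site and message "shape"
// share one fingerprint.
function errorFingerprint(Throwable $e): string
{
    // Mask variable parts so "User 42 not found" and "User 7 not found"
    // normalize to the same text: "User N not found".
    $message = (string) preg_replace(
        ['/\d+/', "/'[^']*'/", '/"[^"]*"/'],
        ['N', "'?'", '"?"'],
        $e->getMessage()
    );

    return sha1(get_class($e) . '|' . $e->getFile() . ':' . $e->getLine() . '|' . $message);
}

/**
 * @param Throwable[] $failures
 * @return array<string, array{class: string, sample: string, count: int}>
 */
function groupFailures(array $failures): array
{
    $groups = [];
    foreach ($failures as $e) {
        $key = errorFingerprint($e);
        // The first occurrence becomes the group's sample message.
        $groups[$key] ??= ['class' => get_class($e), 'sample' => $e->getMessage(), 'count' => 0];
        $groups[$key]['count']++;
    }

    return $groups;
}
```

Exceptions like “User 42 not found” and “User 7 not found” thrown from the same line collapse into a single group. Including file and line keeps identical messages from different call sites apart; dropping them would merge groups more aggressively.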
And probably the most useful part: anomaly detection.
If a job suddenly:

becomes much slower
or suspiciously fast

…it gets flagged.

Those “too fast” cases are often silent failures.
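The “too fast / too slow” check can be sketched as comparing each run against the job’s own history. This minimal version assumes durations in milliseconds, uses the median of past runs as the baseline, and flags anything under 0.1× or over 5× of it; those ratios, the 10-sample minimum, and the function names are illustrative choices, not the package’s actual thresholds.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch: flag a job run whose duration is far off its own history.
// The median baseline and the 0.1x / 5x ratios are illustrative assumptions,
// not the actual logic of yammi-jobs-monitoring-laravel.

/** @param array<int|float> $values non-empty list of durations in ms */
function median(array $values): float
{
    sort($values);
    $n = count($values);
    $mid = intdiv($n, 2);

    return $n % 2 === 1
        ? (float) $values[$mid]
        : (float) (($values[$mid - 1] + $values[$mid]) / 2);
}

function classifyDuration(
    array $historyMs,
    float $durationMs,
    float $tooFastRatio = 0.1,
    float $tooSlowRatio = 5.0,
    int $minSamples = 10
): string {
    // Not enough history yet to know what "normal" looks like.
    if (count($historyMs) < $minSamples) {
        return 'unknown';
    }

    $baseline = median($historyMs);

    if ($durationMs < $baseline * $tooFastRatio) {
        return 'too_fast'; // candidate silent failure
    }
    if ($durationMs > $baseline * $tooSlowRatio) {
        return 'too_slow';
    }

    return 'normal';
}
```

With a history hovering around 500ms, a 5ms run is classified as `too_fast` and a 3000ms run as `too_slow`. The median is used instead of the mean so that a handful of earlier outliers doesn’t drag the baseline around.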
Also included:

worker heartbeat (you see when workers disappear)
scheduled task monitoring
alerts (Slack / Webhook / PagerDuty / etc.)
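Worker heartbeats follow a simple pattern: each worker periodically records a timestamp, and anything silent for too long counts as gone. The sketch below shows only the check; how the timestamps get recorded (cache, database table, etc.), the 60-second threshold, and the `missingWorkers` name are assumptions, not the package’s API.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch of a heartbeat check (not the package's API):
// each worker periodically records "still alive" with a Unix timestamp,
// and any worker silent for longer than the threshold is reported missing.

/**
 * @param array<string, int> $lastSeen worker id => Unix timestamp of its last heartbeat
 * @return list<string> ids of workers that look gone, sorted
 */
function missingWorkers(array $lastSeen, int $now, int $maxSilenceSeconds = 60): array
{
    $missing = [];
    foreach ($lastSeen as $workerId => $seenAt) {
        if ($now - $seenAt > $maxSilenceSeconds) {
            $missing[] = (string) $workerId;
        }
    }
    sort($missing);

    return $missing;
}
```

Passing `$now` in explicitly, rather than calling `time()` inside, keeps the check deterministic and easy to test.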

The whole idea is pretty simple

Not to build “another tool”, but to answer one question: what is actually happening in my queues?
Would love some feedback
If anyone tries it, I’d be really interested in: