Pub/Sub model vs Task Queues

nkaskjn ・1 min read

My understanding of the publish/subscribe model is that it runs at the application level, without a persistent queue like RabbitMQ. All it does for me is make tasks such as sending users a verification mail asynchronous, where mailing might take up to a few hundred milliseconds.
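To make the setup concrete, here is a minimal sketch of an application-level pub/sub layer, assuming a thread pool inside the same process. The `InProcessPubSub` class, the `user.registered` topic, and `send_verification_mail` are hypothetical names used only for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class InProcessPubSub:
    """Tiny in-memory publish/subscribe bus (hypothetical, for illustration)."""

    def __init__(self, workers=2):
        self._handlers = defaultdict(list)
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        # Hand each handler to the thread pool and return immediately.
        # Nothing is written to disk: if the process dies now, the event is lost.
        return [self._pool.submit(h, payload) for h in self._handlers[topic]]

    def shutdown(self):
        # Block until all submitted handlers have finished.
        self._pool.shutdown(wait=True)


sent = []


def send_verification_mail(user):
    # Stand-in for a real mail call that might take a few hundred milliseconds.
    sent.append(user["email"])


bus = InProcessPubSub()
bus.subscribe("user.registered", send_verification_mail)
futures = bus.publish("user.registered", {"email": "user@example.com"})
bus.shutdown()
```

The request handler only pays for `publish`, not for the mail itself. The trade-off is that pending events live only in memory.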

Now, I can achieve the same with task queues, which seem more viable since once a task is queued, it is still there even after a server crash. They are even better in a distributed system, or for systems running long or complex tasks.
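For comparison, a persistent task queue can be sketched with nothing more than SQLite. This is an illustration of the idea under a single-worker assumption, not how RabbitMQ or any particular queue works internally; the table layout and the function names are made up for the example.

```python
import json
import os
import sqlite3
import tempfile


def open_queue(path):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )
    db.commit()
    return db


def enqueue(db, payload):
    db.execute("INSERT INTO tasks (payload) VALUES (?)", (json.dumps(payload),))
    db.commit()  # once committed, the task is on disk and survives a process crash


def claim_next(db):
    # Assumes one worker; several workers would need an atomic claim.
    row = db.execute(
        "SELECT id, payload FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    db.execute("UPDATE tasks SET status = 'running' WHERE id = ?", (row[0],))
    db.commit()
    return row[0], json.loads(row[1])


def mark_done(db, task_id):
    db.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
    db.commit()


path = os.path.join(tempfile.mkdtemp(), "queue.db")
db = open_queue(path)
enqueue(db, {"task": "send_verification_mail", "to": "user@example.com"})
db.close()  # simulate the process going away

db = open_queue(path)  # "restart": the queued task is still there
task_id, payload = claim_next(db)
mark_done(db, task_id)
```

The extra structure (a table, a status column, a worker that claims rows) is the cost mentioned in the question; the benefit is that a queued task outlives the process that queued it.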

I want to keep this simple and focus on everyday cases like sending emails, analytics, or running some complex database queries.

Adding a job scheduler requires me to do some structuring in the codebase and to manage those queues and logs. While it might be a good fit when there are lots of tasks, why should I even bother in cases where I can get away with a pub/sub layer? What are the constraints I am not aware of or wrong about?

I would like to know your opinion on this.


I'd personally use Amazon SQS (aws.amazon.com/de/sqs/) for any kind of queue, as it's really simple to get going and it can also prevent duplicates (on FIFO queues); deduplication is truly a life saver.

I can't think of too many reasons to avoid running tasks at the application level, but the most prominent is definitely something unexpectedly crashing or failing. What would you do then? Try again some time later? But you'd need some worker for that, and some data in the database to determine who has already received a mail. Now it all becomes convoluted in comparison to a simple queue.

Another upside of a queue is that you can decouple your modules. One module inserts into the queue, another processes. Neither module ever needs to talk to the other. This allows for scaling on either side of the queue very easily. If everything is tightly linked, you need to scale the whole thing.

Personal example:
For our website we use SQS to store a list of tasks that should run whenever resources are available.
So we have:

  • 1 SQS Queue
  • Multiple worker instances (Kubernetes)

The worker polls SQS every few seconds to check if there is something in the queue that should be processed. If there is a task, it grabs it from the queue and does some processing. Kubernetes sees that the first worker is busy and spawns another one, which starts checking for tasks in the queue. Rinse and repeat. Once everything is processed, the workers start to shut down again, leaving just 1 worker polling for work. So it's basically a living system that responds to spikes in the queue on demand.
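A worker loop along these lines could look roughly like the sketch below. It assumes `sqs` behaves like `boto3.client("sqs")` and uses only its `receive_message` and `delete_message` calls; `handle` and `queue_url` are hypothetical values supplied by the application, and long polling via `WaitTimeSeconds` is used here in place of sleeping a few seconds between polls.

```python
def poll_once(sqs, queue_url, handle, wait_seconds=10):
    """Fetch up to 10 messages, process each, and delete the ones that succeed.

    `sqs` is expected to behave like boto3.client("sqs"); `handle` is a
    hypothetical callback supplied by the application.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=wait_seconds,  # long polling instead of a tight loop
    )
    processed = 0
    for message in response.get("Messages", []):
        try:
            handle(message["Body"])
        except Exception:
            # Do not delete: the message becomes visible again after the
            # visibility timeout, so it can be retried by any worker.
            continue
        sqs.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
        processed += 1
    return processed


def run_worker(sqs, queue_url, handle):
    # Runs until the process is stopped, e.g. when the orchestrator scales down.
    while True:
        poll_once(sqs, queue_url, handle)
```

Deleting a message only after `handle` succeeds is what gives the retry behaviour for free: a worker that crashes mid-task simply never deletes the message.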


It's definitely good to use third-party services like SQS; that takes away quite a bit of the work.
But here's the thing: server crashes can happen at any time, during lots of application-level operations. I opened a discussion on this, but sadly no one has replied yet ._.


Yup, you can only try to mitigate potential crashes by implementing unit tests and getting the test coverage as close to 100% as possible. Ship with confidence knowing your server can handle every incoming request. Input validation is also a must-have to avoid any weird side effects, especially when using forms where the user can input data. Client-side validation is just eye candy for the user so they know what they did wrong; the server is your source of truth and should decide if the input is correct. If the input was invalid from the beginning, you don't even need to start any of your processes, which again reduces your chance of running into a crash scenario.
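As a rough sketch of validating on the server before any work is queued: the field names, the deliberately loose email pattern, and the `enqueue` callback below are all assumptions made for the example.

```python
import re

# Deliberately loose pattern: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_signup(form):
    """Return (cleaned_values, errors) for a hypothetical signup form."""
    errors = []
    email = str(form.get("email", "")).strip()
    name = str(form.get("name", "")).strip()
    if not EMAIL_RE.match(email):
        errors.append("email is not valid")
    if not name:
        errors.append("name is required")
    return {"email": email, "name": name}, errors


def handle_signup(form, enqueue):
    cleaned, errors = validate_signup(form)
    if errors:
        # Reject early: no task is created for invalid input.
        return {"ok": False, "errors": errors}
    enqueue({"task": "send_verification_mail", "to": cleaned["email"]})
    return {"ok": True, "errors": []}
```

Invalid input never reaches the queue, so the worker only ever sees tasks that passed the server-side check.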

But when outsourcing this kind of thing, you don't need to test the actual queue. You only need to test your implementation of using the queue. That saves a lot of work, and to be honest it's super cost-effective.

Ah. Reducing the chances of the application and its processes crashing leaves us only with unexpected server crashes, or crashes caused by some other process. Cool. Lazy me: unit tests 0.0

Still trying to figure out how to handle those out-of-control crashes. Might it be a good idea to check for half-done logs after a server boot and take action?
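One way to picture the "half-done logs" idea is a small append-only journal with a `started` and a `done` record per task, scanned at boot. This is only a sketch under that assumption; the JSON-lines file format and the function names are hypothetical.

```python
import json
import os


def log_event(path, task_id, event):
    """Append one journal record, e.g. event="started" or event="done"."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"task_id": task_id, "event": event}) + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the record to disk


def find_unfinished(path):
    """Return ids of tasks that were started but never marked done."""
    started, done = [], set()
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # a crash can leave a torn last line; skip it
                if record["event"] == "started":
                    started.append(record["task_id"])
                elif record["event"] == "done":
                    done.add(record["task_id"])
    except FileNotFoundError:
        return []  # first boot: nothing to recover
    return [task_id for task_id in started if task_id not in done]
```

At startup the application would call `find_unfinished` and re-run or inspect whatever it returns. A task that crashed after doing its work but before logging `done` will show up too, so the tasks need to tolerate being run twice.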

Can you explain a little bit more about those "out of control" crashes? What causes your apps to crash this randomly?

No, it isn't about my apps, but about recovering when crashes happen. I can work on my own issues now, identify them and eliminate them.

But some things always slip through, and then there's the unexpected. Maybe we couldn't scale in time and overwhelming traffic crashed the hell out of my app and server; it can and does happen, as far as I know.

And when something bad does happen, we have to recover from it, right? That's all I'm trying to learn, designing and implementing architectures that might not be fail-safe but can recover with the least possible damage/data loss.

getting off topic 0.0