DEV Community

Discussion on: Pub/Sub model vs Task Queues

Alex Janke

I'd personally use Amazon SQS (aws.amazon.com/de/sqs/) for any kind of queue, as it's really simple to get going and it can also prevent duplicate messages (with FIFO queues); deduplication is truly a lifesaver.

I can't think of too many reasons to avoid running tasks at the application level, but the most prominent is definitely something unexpectedly crashing or failing. What would you do then? Try again some time later? But then you'd need some worker for that, plus some data in the database to determine who has already received a mail. Now it all becomes convoluted compared to a simple queue.

Another upside of a queue is that you can decouple your modules. One module inserts into the queue, another processes. Neither module ever needs to talk to the other. This makes it very easy to scale either side of the queue independently. If everything is tightly linked, you need to scale the whole thing.
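To make the decoupling concrete, here is a minimal sketch. It uses Python's in-process `queue.Queue` as a stand-in for a hosted queue, and the names `producer`, `consumer`, and `run` are made up for illustration. The point is only that the two sides share nothing except the queue.

```python
import queue
import threading


def producer(q: queue.Queue, emails: list[str]) -> None:
    """Inserts tasks; knows nothing about who processes them."""
    for address in emails:
        q.put(address)


def consumer(q: queue.Queue, sent: list[str]) -> None:
    """Processes tasks; knows nothing about who produced them."""
    while True:
        address = q.get()
        if address is None:  # sentinel value: no more work for this consumer
            break
        sent.append(address)  # stand-in for actually sending a mail


def run(emails: list[str], n_consumers: int = 2) -> list[str]:
    q: queue.Queue = queue.Queue()
    sent: list[str] = []
    workers = [
        threading.Thread(target=consumer, args=(q, sent))
        for _ in range(n_consumers)
    ]
    for w in workers:
        w.start()
    producer(q, emails)
    for _ in workers:  # one sentinel per consumer
        q.put(None)
    for w in workers:
        w.join()
    return sent
```

Scaling the processing side here just means raising `n_consumers`; the producer code does not change.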

Personal example:
For our website we use SQS to store a list of tasks that should run whenever resources are available.
So we have:

  • 1 SQS Queue
  • Multiple worker instances (Kubernetes)

The worker polls SQS every few seconds to check whether there is something in the queue that should be processed. If there is a task, it grabs it from the queue and does some processing. Kubernetes sees that the first worker is busy and spawns another one, which starts checking for tasks in the queue. Rinse and repeat. Once everything is processed, the workers start to shut down again, leaving just 1 worker polling for work. So it's basically a living system that responds to spikes in the queue on demand.

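A rough sketch of what one such polling worker could look like. This is not our actual worker: it polls an in-memory `queue.Queue` instead of SQS, and `poll_worker`, `poll_interval`, and `max_idle_polls` are hypothetical names. Spawning and removing workers is left to the orchestrator and is not shown.

```python
import queue
import time


def poll_worker(tasks: queue.Queue, handle, poll_interval: float = 0.01,
                max_idle_polls: int = 3) -> int:
    """Poll for tasks until the queue has been empty max_idle_polls times in a row.

    Returns the number of tasks processed before shutting down.
    """
    processed = 0
    idle = 0
    while idle < max_idle_polls:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            idle += 1  # nothing to do, wait and check again
            time.sleep(poll_interval)
            continue
        idle = 0  # found work, reset the idle counter
        handle(task)
        processed += 1
    return processed
```

With the real service you would typically receive a message, process it, and delete it only after the processing succeeded, so a task picked up by a crashed worker becomes visible to other workers again.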
Ande

It's definitely good to use third-party services like SQS. It saves quite a bit of work.
But here's the thing: server crashes can happen at any time during lots of application-level operations. I opened a discussion on this, but sadly no one has replied yet ._.

Alex Janke

Yup, you can only try to mitigate potential crashes by implementing unit tests and getting the test coverage as close to 100% as possible. Ship with confidence, knowing your server can handle every incoming request. Input validation is also a must-have to avoid any weird side effects, especially with forms where the user can enter data. Client-side validation is just eye candy for the user so they know what they did wrong - the server is your source of truth and should decide whether the input was correct. If the input was invalid from the beginning, you don't even need to start any of your processes, which yet again reduces your chance of running into a crash scenario.
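As a small illustration of validating on the server before any work is started, here is a sketch. The form fields (`email`, `name`), the deliberately loose regex, and the `enqueue` callback are all assumptions for the example.

```python
import re

# Intentionally simple pattern: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_signup(form: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the input is OK."""
    errors = []
    email = form.get("email")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        errors.append("email: invalid address")
    name = form.get("name")
    if not isinstance(name, str) or not name.strip():
        errors.append("name: required")
    return errors


def handle_signup(form: dict, enqueue) -> bool:
    """Only queue a task when the server-side check passes."""
    if validate_signup(form):
        return False  # rejected before any process is started
    enqueue({"email": form["email"], "name": form["name"].strip()})
    return True
```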

But when outsourcing these kinds of things, you don't need to test the actual queue. You only need to test your implementation of using the queue. That saves a lot of work - and to be honest, it's super cost-effective.

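For example, a test can swap the hosted queue for a tiny fake and check only what your own code sends. `FakeQueue`, its `send` method, and `queue_welcome_mails` are hypothetical names for this sketch, not part of any SDK.

```python
class FakeQueue:
    """Test double that records what would have been sent to the real queue."""

    def __init__(self):
        self.sent = []

    def send(self, body: str) -> None:
        self.sent.append(body)


def queue_welcome_mails(queue_client, addresses) -> int:
    """Code under test: enqueue each address once, preserving order."""
    count = 0
    for address in dict.fromkeys(addresses):  # drops duplicates, keeps order
        queue_client.send(address)
        count += 1
    return count
```

In production the same function would receive a thin wrapper around the real client that exposes the same `send` method.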
Ande

Ah. Reducing the chances of the application and its processors crashing leaves us with only unexpected server crashes, or crashes caused by some other process. Cool. Lazy me: unit tests 0.0

Still trying to figure out how to handle those out-of-control crashes. Might it be a good idea to check for half-done logs after a server boot and take action?

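One way that "check for half-done logs after boot" idea could look, as a rough sketch: append a "started" and a "done" line per task to a journal file, and on startup list every task that was started but never finished. The file format and the names `log_event` and `unfinished_tasks` are made up here, and it assumes tasks are safe to run again.

```python
import json
import os


def log_event(path: str, task_id: str, state: str) -> None:
    """Append one journal line and force it to disk before continuing."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"id": task_id, "state": state}) + "\n")
        f.flush()
        os.fsync(f.fileno())


def unfinished_tasks(path: str) -> list[str]:
    """Return ids that have a 'started' entry but no later 'done' entry."""
    if not os.path.exists(path):
        return []
    pending: dict[str, bool] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # torn line, e.g. the crash happened mid-write
            if event["state"] == "started":
                pending[event["id"]] = True
            elif event["state"] == "done":
                pending.pop(event["id"], None)
    return list(pending)
```

After a reboot, the app would call `unfinished_tasks(...)` and retry or roll back whatever it returns.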
Alex Janke

Can you explain a little bit more about those "out of control" crashes? What causes your apps to crash this randomly?

Ande • Edited

No, it isn't about my apps in particular but about recovering when crashes happen. I can work on my own issues, identify them, and eliminate them.

But some things always slip through, and there's always the unexpected. Maybe we couldn't scale in time and overwhelming traffic crashed the hell out of my app and server. It can and does happen, as far as I know.

And when something bad does happen, we have to recover from it, right? That's all I'm trying to learn: designing and implementing architectures that might not be fail-safe but can recover with the least possible damage or data loss.

Getting off topic 0.0