A lot of the details about building an app are mysterious, even if you've been a programmer for years. So, I'm going to bundle up some of the discussions I've had with friends who are starting their own apps into these App101 posts. Follow along and we'll bring you up to speed on the decisions that you'll have to make when building a web-based app today.
If you boil it down far enough, most internet programming follows this cycle:
Receive HTTP Request -> Do Work -> Deliver HTTP Response
This works fine 99% of the time because usually the work we're doing is simple: update a record in the database, pull some data and return it as JSON, and so on. However, once in a rare while this cycle breaks down. Maybe you find yourself in a situation where the "do work" step is too time- or resource-consuming to finish before the requester times out. Or maybe the actual work is only being scheduled by this request and can't start until its scheduled time. These are classic cases where you might want to build a work queue on your backend.
A work queue is like a todo list for your server. You put messages, which are often no more than simple dictionary objects, into the queue. You assign one or more workers to the top of the queue. Your workers then peel messages off the queue one by one, do some work, and then mark them as processed.
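To make that concrete, here is a minimal in-memory sketch of the idea in TypeScript. It is not how any particular queue library works; `WorkQueue`, `runWorker`, and the image-resize payloads are names I made up for illustration, and a real queue would keep its messages in a datastore rather than an array.

```typescript
// A message is just a plain dictionary-style object plus some bookkeeping.
type Message = {
  id: number;
  payload: Record<string, unknown>;
  processed: boolean;
};

class WorkQueue {
  private waiting: Message[] = [];
  private nextId = 1;

  // Put a new message at the back of the queue and return its id.
  enqueue(payload: Record<string, unknown>): number {
    const message: Message = { id: this.nextId, payload, processed: false };
    this.nextId += 1;
    this.waiting.push(message);
    return message.id;
  }

  // Take the oldest waiting message off the queue, if there is one.
  dequeue(): Message | undefined {
    return this.waiting.shift();
  }

  size(): number {
    return this.waiting.length;
  }
}

// A worker peels messages off one by one, does the work, and marks each as processed.
function runWorker(
  queue: WorkQueue,
  doWork: (payload: Record<string, unknown>) => void
): Message[] {
  const finished: Message[] = [];
  let message = queue.dequeue();
  while (message !== undefined) {
    doWork(message.payload);
    message.processed = true;
    finished.push(message);
    message = queue.dequeue();
  }
  return finished;
}

// Usage: queue two image-resize messages, then let one worker drain the queue.
const imageQueue = new WorkQueue();
imageQueue.enqueue({ task: "resize", file: "a.png" });
imageQueue.enqueue({ task: "resize", file: "b.png" });
const finishedMessages = runWorker(imageQueue, (payload) => {
  console.log("resizing", payload["file"]);
});
console.log(finishedMessages.length); // 2
```

The worker here runs in the same process as the code that enqueues, which is only useful for seeing the moving parts. In practice the worker is a separate process.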
You can process these messages in whatever order you want. Most queues start with first-in, first-out (FIFO) processing, where the oldest message in the queue is the next to be processed. However, many queue tools support additional features like scheduling messages - so they can't be processed before a certain time - and adding priority values - so that higher priority messages can jump the queue like a first class passenger at the airport.
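Here is a rough sketch of how scheduling and priorities might change which message comes out next. The names (`ScheduledQueue`, `notBefore`, `priority`) and the "higher number wins" convention are my own assumptions for the example; real queue tools each have their own options for this.

```typescript
type ScheduledJob = {
  id: number;
  payload: Record<string, unknown>;
  priority: number; // assumed convention: a higher number jumps the queue
  notBefore: number; // earliest time (in ms) at which the job may run
};

class ScheduledQueue {
  private jobs: ScheduledJob[] = [];
  private nextId = 1;

  add(
    payload: Record<string, unknown>,
    options: { priority?: number; notBefore?: number } = {}
  ): number {
    const job: ScheduledJob = {
      id: this.nextId,
      payload,
      priority: options.priority ?? 0,
      notBefore: options.notBefore ?? 0,
    };
    this.nextId += 1;
    this.jobs.push(job);
    return job.id;
  }

  // Pick the next job that is allowed to run at time `now`:
  // highest priority first, and oldest first among equal priorities.
  next(now: number): ScheduledJob | undefined {
    let best: ScheduledJob | undefined;
    for (const job of this.jobs) {
      if (job.notBefore > now) continue; // scheduled for later, skip it
      if (best === undefined || job.priority > best.priority) best = job;
    }
    if (best === undefined) return undefined;
    const chosenId = best.id;
    this.jobs = this.jobs.filter((job) => job.id !== chosenId);
    return best;
  }
}

// Usage: the digest has the highest priority, but it is scheduled for later,
// so the password reset is the first job handed out at time 0.
const mailQueue = new ScheduledQueue();
mailQueue.add({ email: "welcome" });
mailQueue.add({ email: "password-reset" }, { priority: 5 });
mailQueue.add({ email: "weekly-digest" }, { priority: 10, notBefore: 1000 });
const firstJob = mailQueue.next(0);
console.log(firstJob?.payload["email"]); // "password-reset"
```

Scanning the whole list on every call is fine for a demo. A real implementation would use a sorted structure so that picking the next job stays cheap as the queue grows.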
So now we know how a queue works, let's get back to the why of using one. The main benefit of queues is that they allow you to constrain resources and not constrain time.
What do I mean by that? Well, when you are handling an HTTP request, you are time constrained - you only have so many seconds to return a value before the connection to your client times out or breaks. You're not, though, resource constrained - if you're using a serverless architecture or a load balancer, your host might just spin up ever more instances of your API server to handle additional traffic. This is bad in both directions when you have heavy work to do, though.
Let's imagine you have a request that is really hard to process and takes 60 seconds, for example calculating some complex AI response. As the time needed to handle that request goes up, not only does your failure rate go up (since connections time out while you're working), but your infrastructure costs go up too, because you need more web servers to handle your traffic (since your existing instances are spending all their time doing this heavy work).
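A bit of back-of-the-envelope arithmetic shows why the cost climbs. By Little's law, the average number of requests in flight is the arrival rate multiplied by the time each request takes. The traffic numbers and the "10 concurrent requests per instance" figure below are assumptions I picked for illustration, not measurements.

```typescript
// Little's law: average requests in flight = arrival rate x time per request.
function requestsInFlight(
  requestsPerSecond: number,
  secondsPerRequest: number
): number {
  return requestsPerSecond * secondsPerRequest;
}

// How many web server instances are needed to hold that many open requests,
// assuming each instance can handle `concurrentPerInstance` at once.
function instancesNeeded(
  requestsPerSecond: number,
  secondsPerRequest: number,
  concurrentPerInstance: number
): number {
  const inFlight = requestsInFlight(requestsPerSecond, secondsPerRequest);
  return Math.ceil(inFlight / concurrentPerInstance);
}

// Usage: the same 2 requests per second, with fast work versus 60-second work.
console.log(instancesNeeded(2, 0.1, 10)); // 1
console.log(instancesNeeded(2, 60, 10)); // 12
```

Same traffic, twelve times the servers, purely because each request holds its connection open for a minute.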
Queues solve this problem in two ways. First, they escape your time constraint by moving the work to another process. Your web server can now return a "working on it" message almost immediately. Plus, because you pulled the processing out of your web server and into the queue worker, this heavy work no longer runs the risk of slowing or crashing your server. The second way they solve the constraint issue is that they allow you to constrain your infrastructure costs. Now you can control the number of workers who are working on the queue without causing failures. If your traffic goes up, it just means the queue takes longer to do the work rather than necessitating more infrastructure to handle that traffic.
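Here is a toy simulation of that second point. It assumes time moves in ticks and that each worker finishes at most one job per tick, which is a simplification I made up to keep the arithmetic obvious. The point is only that a traffic spike shows up as a longer backlog, not as a need for more workers.

```typescript
// Simulate a queue drained by a fixed pool of workers, one tick at a time.
// Assumption: each worker completes at most one job per tick.
function simulateBacklog(
  arrivalsPerTick: number[],
  workerCount: number
): number[] {
  const backlogAfterEachTick: number[] = [];
  let backlog = 0;
  for (const arrivals of arrivalsPerTick) {
    backlog += arrivals; // new jobs land in the queue
    backlog -= Math.min(backlog, workerCount); // workers take what they can
    backlogAfterEachTick.push(backlog);
  }
  return backlogAfterEachTick;
}

// Usage: a burst of 10 jobs over two ticks, handled by only 2 workers.
const backlog = simulateBacklog([5, 5, 0, 0, 0], 2);
console.log(backlog); // [3, 6, 4, 2, 0]
```

The worker count never changes, so the infrastructure bill stays flat. The burst is absorbed by making the later jobs wait a few ticks longer.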
That is, though, also why you may not want to use a queue. When you have more work coming in, the time to handle each job goes up because the queue gets longer. Additionally, you've added complexity to your API - now your client is dealing with an asynchronous process, so it will need to call back later to check "is my job done yet?" However, if you have heavy work that necessitates this, queues are often a better solution than just hoping the connection doesn't time out.
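To show what that extra API complexity looks like, here is a small sketch of the submit-then-poll pattern. Everything in it is hypothetical: the `JobApi` class, the routes mentioned in the comments, and the choice of HTTP 202 (Accepted) for the "working on it" reply are my assumptions, and the waiting between polls is stubbed out so the example stays synchronous.

```typescript
type TrackedJob = { id: string; status: "queued" | "done"; result?: string };

class JobApi {
  private jobs: Record<string, TrackedJob | undefined> = {};
  private counter = 0;

  // Think "POST /jobs" (hypothetical route): record the job and reply right away.
  submit(): { httpStatus: number; jobId: string } {
    this.counter += 1;
    const id = `job-${this.counter}`;
    this.jobs[id] = { id, status: "queued" };
    return { httpStatus: 202, jobId: id };
  }

  // Called by the queue worker once the heavy work has finished.
  complete(jobId: string, result: string): void {
    const job = this.jobs[jobId];
    if (job !== undefined) {
      job.status = "done";
      job.result = result;
    }
  }

  // Think "GET /jobs/:id" (hypothetical route): is my job done yet?
  check(jobId: string): { httpStatus: number; job?: TrackedJob } {
    const job = this.jobs[jobId];
    return job === undefined ? { httpStatus: 404 } : { httpStatus: 200, job };
  }
}

// Client side: ask up to `maxAttempts` times, pausing between attempts.
// `betweenAttempts` stands in for a real delay or backoff.
function pollUntilDone(
  api: JobApi,
  jobId: string,
  maxAttempts: number,
  betweenAttempts: () => void
): string | undefined {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = api.check(jobId);
    if (response.job !== undefined && response.job.status === "done") {
      return response.job.result;
    }
    betweenAttempts();
  }
  return undefined; // gave up: the job is still queued or does not exist
}

// Usage: the worker finishes while the client is waiting for the second time.
const api = new JobApi();
const submitted = api.submit();
let waits = 0;
const answer = pollUntilDone(api, submitted.jobId, 5, () => {
  waits += 1;
  if (waits === 2) api.complete(submitted.jobId, "all done");
});
console.log(submitted.httpStatus, answer); // 202 "all done"
```

A real client would also need to decide what to do when it runs out of attempts, which is exactly the kind of extra decision a synchronous request never forces on you.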
It can take years to truly master the administration and tuning of queues. (Indeed, I am far from a master here.) However, it takes only minutes to start learning those skills. I found two easy ways to start experimenting with queues today:
These services are cheap (both will handle billions of messages per month for free), and both can be configured to run your worker as a serverless FaaS process (Amazon's documentation for that is here, and Google's is here).
They have two downsides, though. First, like many GCP and AWS products, they both have a bit of a painful learning curve as you learn to configure the tool. The main one, though, is that both of these have time limits to how long your worker can run. They are long - Google's is 10 minutes and Amazon's is 15 - but you should consider this if your work might approach that threshold. That was, eventually, why I went with Option 2 during my own experimentation.
The other choice is to roll your queue yourself. This is the route I took for my current side project, Tweet Sweep, because I really wanted to learn how the system worked under the hood and I was worried my queue workers might take longer than 15 minutes.
If you're working in Node, I'd recommend using one of the two most maintained open source projects: bee-queue and BullMQ. Both have the features I've mentioned here, so you can get started with priorities and scheduling. Personally, I chose bee-queue because its API for creating parallelized workers was more straightforward.
With these packages (and, most likely, with any similar package in your preferred language), setting up the queue has two steps. First, you need a location for the queue messages to be stored while they're awaiting processing. Usually, this is a Redis datastore, since it's very fast and it is generally accepted as OK that you might lose queue messages if your infrastructure fails and needs to restart. Second, you will need a computer acting as your worker. Usually, this is a process on a server that attaches to your queue(s) in the datastore and, when a new message is published to one, pulls it down and processes it.
A hot tip: while learning to build my own queue, I found fly.io really useful. Its free tier comes with both a preconfigured Redis store and enough CPU budget to run two processes full-time, so you can run your API server and queue worker side by side.
I've used queues at every company I've worked at. Indeed, Foursquare had tens of queues handling tasks like score calculation on a check-in or resizing profile images that were uploaded. Even this month, I'm helping a current client explore them for a long application processing step during onboarding. They're a tremendously useful tool and they can, when appropriately used, take a huge amount of strain off of your API servers and network hardware. But the first step to being able to use them in your own apps is learning the basics - let me know how you get on with queues down in the comments.