Scaling HTTP requests that need responses

Metheny80 — Sat, 27 Aug 2022 11:47:13 +0000

I'm trying to understand how to scale requests that need a response.
We have a REST API backend hosted on AWS ECS, and of course we can scale out horizontally when needed.
The question is how to deal with peaks, or an increase in requests while the scale-up is in progress without losing any requests.
I found many posts about using queues, but mainly for fire-and-forget jobs.
I assume handling scale for HTTP requests where a response to the client is required is a very common issue.
Examples include a user trying to get or update their own info (GET /user or POST /user). These requests need to return information or sometimes an error code (e.g. the data is invalid, or a DB conflict occurred).
Of course we can use caching to reduce the processing time for a request, but that still doesn't eliminate the need to handle a sudden increase in requests.
Using queues for this case means the backend has to enqueue the request, wait for it to be processed, and then return a result or error code to the client, which stays blocked the whole time.
In other words, these requests would be handled asynchronously behind a synchronous HTTP call.
While technically it may be possible, this complicates the implementation, and I'm wondering if I'm missing anything simpler.
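
To make the queue idea concrete, here is a minimal sketch of what I have in mind. It is only an illustration, not a real implementation: an in-process asyncio.Queue stands in for an actual broker, and the names handle_request and worker, the status codes, and the "name" field are all hypothetical. The handler puts the request on a bounded queue together with a future, then blocks until a worker fills in the result or a timeout expires.

```python
import asyncio


async def worker(queue):
    # Hypothetical worker: takes queued requests and resolves the caller's future.
    while True:
        payload, reply = await queue.get()
        try:
            if reply.done():
                # The caller already gave up (timeout), so skip the work.
                continue
            if not payload.get("name"):
                reply.set_result((400, {"error": "name is required"}))
            else:
                reply.set_result((200, {"name": payload["name"]}))
        finally:
            queue.task_done()


async def handle_request(queue, payload, timeout=5.0):
    # Hypothetical HTTP handler: enqueue, then wait for the worker's answer.
    reply = asyncio.get_running_loop().create_future()
    try:
        queue.put_nowait((payload, reply))
    except asyncio.QueueFull:
        # Queue is at capacity: reject quickly instead of piling up work.
        return 503, {"error": "busy, retry later"}
    try:
        return await asyncio.wait_for(reply, timeout)
    except asyncio.TimeoutError:
        return 504, {"error": "timed out waiting for a worker"}


async def main():
    queue = asyncio.Queue(maxsize=100)  # bound chosen arbitrarily for the sketch
    task = asyncio.create_task(worker(queue))
    results = await asyncio.gather(
        handle_request(queue, {"name": "alice"}),
        handle_request(queue, {}),
    )
    task.cancel()
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Even in this toy version the handler needs a future per request, a timeout path, and a "queue full" path, which is the extra complexity I'm worried about. With a real broker between separate services, the reply would also have to be routed back to the instance holding the client connection.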

What is the best practice for this?

DEV Community: Metheny80
