Practical ECS scaling: an introduction

#aws #containers #cdk #scaling

How effective is it to vertically scale an application that has a memory leak? Can a CPU-heavy application perform better with more CPU resources? Should you horizontally scale your application based on response times?

To answer these questions in practice, I built and load-tested a mock application, aiming to reproduce results that are the same as, or very similar to, those shown in Nathan Peck's great article on Amazon ECS Scalability Best Practices.

Meet the app

The application itself is built in Flask and its infrastructure is managed with AWS CDK for Python. The app has several REST API endpoints:

/cpu_intensive, simulating a CPU-intensive task. Calculates the square root of 64 * 64 * 64 * 64 * 64 * 64 ** 64 on each request.
/memory_leak, simulating a memory leak. Allocates an additional 1 MB of memory on each request and never releases it.
/long_response_time, simulating increasingly longer responses from a busy downstream system (e.g. a database). Sleeps for 2 ms for every request received since the app was started, so each response takes longer than the previous one.

All source code is available in this repository.

The performance envelope

Vertically scaling a container means giving the container more hardware resources.

When you vertically scale an application the first step is to identify what resources the application container actually needs in order to function.

There are different dimensions of resources that an application needs. For example: CPU, memory, storage, and network bandwidth. Some machine learning workloads may actually require GPU as well. Source

These resources form the performance envelope: the set of hardware constraints within which the container has to operate.
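One way to picture the envelope is as a set of per-resource limits, with the container's actual usage measured against each one. The sketch below is purely illustrative: the `Envelope` class and both helper functions are hypothetical names, and it models only CPU and memory. CPU is expressed in ECS CPU units, where 1024 units correspond to 1 vCPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical container for two of the envelope's dimensions."""
    cpu_units: int   # ECS CPU units; 1024 units = 1 vCPU
    memory_mib: int  # memory in MiB


def utilization(limits: Envelope, usage: Envelope) -> dict:
    """Return the fraction of each limit that is currently in use."""
    return {
        "cpu": usage.cpu_units / limits.cpu_units,
        "memory": usage.memory_mib / limits.memory_mib,
    }


def tightest_dimension(limits: Envelope, usage: Envelope) -> str:
    """Name the resource closest to its limit."""
    ratios = utilization(limits, usage)
    return max(ratios, key=ratios.get)


if __name__ == "__main__":
    limits = Envelope(cpu_units=256, memory_mib=512)
    usage = Envelope(cpu_units=64, memory_mib=480)
    print(utilization(limits, usage))
    print(tightest_dimension(limits, usage))
```

The resource nearest its limit is the one the container will run out of first, so it is the natural first candidate when deciding which dimension to scale vertically.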