The introductory article defined the performance envelope, and this one looks at how changing the performance envelope for a CPU-heavy application affects its performance.
- The endpoint under test
- Running tests
- Results
- Can a CPU-heavy application perform better with more CPU resources?
The endpoint under test
Our mock application is built in Flask and has several REST API endpoints, one of which is /cpu_intensive, which simulates a CPU-intensive task.
When this endpoint is invoked, the application calculates the square root of 64 * 64 * 64 * 64 * 64 * 64 ** 64
and returns the result.
$ http http://ALB.eu-central-1.elb.amazonaws.com/cpu_intensive
HTTP/1.1 200 OK
Connection: close
Content-Length: 36
Content-Type: application/json
Date: Sun, 26 Nov 2023 15:52:46 GMT
Server: Werkzeug/2.3.7 Python/3.9.6
{
"result": "2.0568806966515076e+62"
}
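The returned value is easy to reproduce locally; here is a quick sanity check of the endpoint's calculation in plain Python (no Flask required):

```python
import math

# The calculation performed by /cpu_intensive:
# the square root of 64 * 64 * 64 * 64 * 64 * 64 ** 64
result = math.sqrt(64 * 64 * 64 * 64 * 64 * 64 ** 64)

print(result)  # 2.0568806966515076e+62, matching the API response
```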
Running tests
It is better to be roughly right than precisely wrong.
— Alan Greenspan
To load-test this application, I used hey to invoke the endpoint at 5 requests per second for 30 minutes: hey -z 30m -q 1 -c 5 $URL/cpu_intensive (5 concurrent workers, each limited to 1 request per second).
To be able to compare results, I ran the same application in three containers, each with different hardware constraints:
| | CPUs | Memory (GB) |
| --- | --- | --- |
| Container 1 | 0.25 | 0.5 |
| Container 2 | 0.5 | 1.0 |
| Container 3 | 1.0 | 2.0 |
Results
Container 1 (0.25CPU)
As expected, Container 1 performed the worst, averaging 3.13 requests per second. Containers 2 and 3 were both able to serve 4.99 requests per second.
One of the graphs from Nathan's article shows a CPU load peaking and staying at 100% for the duration of the load test. I was able to achieve the same results with container 1 in my test.
Container 1 clearly on its knees with average CPU utilization at 100% for the duration of the test:
In this graph you can see CPU and memory utilization over time as the load test ramps up. The CPU metric is much higher than the memory metric, and it flattens out around 100%.
This means that the application ran out of CPU resource first. The workload is primarily CPU bound. This is quite normal, as most workloads run out of CPU before they run out of memory. As the application runs out of CPU, the quality of the service suffers before it actually runs out of memory.
This tells us one micro-optimization we might be able to make: modify the performance envelope to add a bit more CPU and a bit less memory. Source
Container 2 (0.5CPU)
Container 2 has double the amount of CPU and delivers the expected performance of 5 requests per second with an average CPU utilization around 90%:
Container 3 (1CPU)
Doubling the amount of CPU again, container 3 delivers the expected performance with average CPU utilization around 35%:
We could even say that container 3, with 1CPU and 2GB of memory, is over-provisioned. In dollar amounts, it would cost $41 per month to run. On the other hand, container 2 would cost $20 while delivering the same baseline performance of 5 requests per second.
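As a rough sanity check on those dollar figures, they line up with on-demand Fargate-style pricing of roughly $0.04656 per vCPU-hour and $0.00511 per GB-hour (the rates and the 730-hour month below are my assumptions, not from the article):

```python
# Assumed per-hour rates (roughly AWS Fargate, eu-central-1); not from the article
VCPU_HOUR = 0.04656
GB_HOUR = 0.00511
HOURS_PER_MONTH = 730

def monthly_cost(vcpus, memory_gb):
    """Approximate monthly cost for a given performance envelope."""
    return (vcpus * VCPU_HOUR + memory_gb * GB_HOUR) * HOURS_PER_MONTH

print(round(monthly_cost(0.5, 1.0)))  # ~21, close to the article's $20
print(round(monthly_cost(1.0, 2.0)))  # ~41, the article's $41
```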
Can a CPU-heavy application perform better with more CPU resources?
As expected, yes. Increasing the amount of CPU from 0.25 to 0.5 allows the application container to deliver the expected performance of 5 requests per second while doing a CPU-heavy calculation.
Going from 0.5CPU to 1CPU doesn't add any measurable benefit at 5 requests per second, but it would allow the application to respond more quickly and scale to more requests per second.
Looking at hey's output in more detail, we can see that container 3 had response times almost 3 times faster than those from container 2.
| | CPUs | Memory (GB) | Requests/sec | Avg. response time (sec) |
| --- | --- | --- | --- | --- |
| Container 1 | 0.25 | 0.5 | 3.1384 | 1.5909 |
| Container 2 | 0.5 | 1.0 | 4.9974 | 0.8514 |
| Container 3 | 1.0 | 2.0 | 4.9990 | 0.3217 |
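The "almost 3 times faster" claim follows directly from those average response times:

```python
# Average response times measured by hey (seconds)
container_2 = 0.8514
container_3 = 0.3217

speedup = container_2 / container_3
print(f"Container 3 responded {speedup:.2f}x faster than container 2")  # ~2.65x
```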
The end goal of all this load testing and metric analysis is to define an expected performance envelope that fits your application needs. Ideally it should also provide a little bit of extra space for occasional bursts of activity. Source
Container 2, with 0.5CPU and 1GB of memory, provides just that. Vertically scaling a CPU-heavy application results in increased performance.
Next up: Let's look at how vertically scaling an application with a memory leak goes. ☠️