Jinwook Baek

Gunicorn performance analysis on AWS EC2

Introduction

There are several production-grade, self-hosted options for running a Django app server behind an nginx web server over the WSGI protocol, such as uWSGI, mod_wsgi, and Gunicorn. I have been using Gunicorn on many projects for many years and never questioned its performance or configuration details. Recently, though, I wanted to measure how Gunicorn's actual performance relates to worker count and find the optimal configuration for ECS Fargate. In contrast to on-prem servers, where I know the actual number of physical cores, AWS only lets me configure the number of logical cores (via vCPUs). In this post I describe how I tested performance using Gunicorn, Django, and Locust.

Standalone WSGI Containers - Flask Documentation (1.1.x)

Gunicorn

Gunicorn is a Python WSGI HTTP server for UNIX.

$ gunicorn django-sample.wsgi:application -w ${WORKER_COUNT} \
    --threads ${THREAD_COUNT} -b 0.0.0.0:8000

The command above is the usual way to run the application. As a starting point, the Gunicorn design document recommends (2 x $num_cores) + 1 workers. The formula assumes that, for a given core, one worker will be reading from or writing to the socket while another worker is processing a request.

Gunicorn - WSGI server - Gunicorn 20.0.4 documentation
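
As a minimal sketch of the (2 x $num_cores) + 1 rule of thumb, the worker count can be derived in a Gunicorn config file. The file name gunicorn.conf.py and the helper recommended_workers are illustrative and not part of the original setup; bind, workers, and threads are regular Gunicorn settings. Keep in mind that multiprocessing.cpu_count() reports logical CPUs (vCPUs on EC2), not physical cores.

```python
# gunicorn.conf.py (illustrative file name)
import multiprocessing


def recommended_workers(num_cores: int) -> int:
    """Starting point suggested by the Gunicorn docs: (2 x cores) + 1."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    return 2 * num_cores + 1


# Gunicorn reads these module-level names as settings.
bind = "0.0.0.0:8000"
# cpu_count() returns logical CPUs, so on EC2 this is the vCPU count.
workers = recommended_workers(multiprocessing.cpu_count())
# Assumption: one thread per worker; change it to test other combinations.
threads = 1
```

With such a file in place, the server could be started with `gunicorn -c gunicorn.conf.py django-sample.wsgi:application` instead of passing -w and --threads by hand.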

EC2 vCPU

I need to determine the number of CPU cores on the server. According to AWS, not all vCPUs are made the same. For T2 instances, 1 vCPU = 1 physical core. For all others, 1 vCPU = 1 logical core.

On a traditional Linux server, you can read the number of physical cores from the "cpu cores" field of /proc/cpuinfo.

$ grep "cpu cores" /proc/cpuinfo
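
To compare physical and logical core counts programmatically, the same /proc/cpuinfo data can be parsed in Python. This is only a sketch: the function name count_cores is made up for illustration, and it assumes the x86 Linux layout in which each processor block carries "physical id" and "core id" fields (hyperthread siblings share both values).

```python
from pathlib import Path
from typing import Tuple


def count_cores(cpuinfo_text: str) -> Tuple[int, int]:
    """Return (physical_cores, logical_cpus) parsed from /proc/cpuinfo text."""
    logical = 0
    physical = set()  # unique (physical id, core id) pairs
    fields = {}
    # The extra "" guarantees the last processor block gets flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one processor block.
            if "processor" in fields:
                logical += 1
                # Fall back to the processor number if core ids are missing.
                core_id = fields.get("core id", fields["processor"])
                physical.add((fields.get("physical id", "0"), core_id))
            fields = {}
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return len(physical), logical


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():  # only present on Linux
        print(count_cores(path.read_text()))
```

On an instance with one core and two threads per core, this should report one physical core and two logical CPUs.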

Optimizing CPU options

The number of vCPUs for the instance is the number of CPU cores multiplied by the threads per core. To specify a custom number of vCPUs, you must specify a valid number of CPU cores and threads per core for the instance type.

vCPU = physical cores * threads per core

According to this blog post, multi-threaded performance dropped by 34% on logical cores.

What's in a vCPU: State of Amazon EC2 in 2018 - Credera

You can configure threads per core during instance launch.

CPU option
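
For scripted launches, the same choice can be expressed through the CpuOptions parameter of the EC2 RunInstances API. The sketch below only builds the request parameters so that it runs without AWS access. The helper name cpu_options_params is hypothetical, the check that threads per core is 1 or 2 is an assumption, and valid core counts depend on the instance type, which this code does not validate.

```python
def cpu_options_params(instance_type: str, core_count: int,
                       threads_per_core: int) -> dict:
    """Build the CPU-related keyword arguments for an EC2 launch request."""
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    # Assumption: 1 disables hyperthreading, 2 keeps it enabled.
    if threads_per_core not in (1, 2):
        raise ValueError("threads_per_core must be 1 or 2")
    return {
        "InstanceType": instance_type,
        "CpuOptions": {
            "CoreCount": core_count,
            "ThreadsPerCore": threads_per_core,
        },
    }


# One physical core with a single thread, i.e. one vCPU.
params = cpu_options_params("t3.micro", core_count=1, threads_per_core=1)
vcpus = params["CpuOptions"]["CoreCount"] * params["CpuOptions"]["ThreadsPerCore"]
print(params, vcpus)

# With boto3 installed and credentials configured, the dict could be passed as:
#   boto3.client("ec2").run_instances(
#       ImageId="<your AMI ID>", MinCount=1, MaxCount=1, **params)
```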

Methodology

Architecture

Github Repo

[Image: architecture diagram]

The test runs on a 3-tier web architecture. The Django application runs behind nginx through Gunicorn and queries a MySQL DB in a private subnet.

Instances used

  • t3.micro - 1 CPU
    • cpuinfo
  • t3.micro - 2 vCPU (1 cpu * 2 threads)
    • cpuinfo
  • t3.xlarge - 2 cpu
    • cpuinfo
  • t3.xlarge - 4 vCPU (2 cpu * 2 threads)
  • t3.2xlarge - 4 cpu
    • cpuinfo

Locust

Locust - A modern load testing framework

I will use Locust to test and analyze performance under load. The following locustfile spawns multiple concurrent users, each requesting an endpoint that runs a simple SQL query.

from locust import HttpUser, TaskSet, between, task


class ReadPosts(TaskSet):
    @task(1)
    def read_posts(self):
        # The view behind this endpoint runs a simple SQL query.
        # The leading slash lets the path join cleanly with the configured host.
        response = self.client.get("/api/v1/posts/", name="list posts")
        print("Response status code:", response.status_code)
        print("Response content:", response.text)


class WebsiteUser(HttpUser):
    # Each simulated user runs ReadPosts, pausing 1-2 seconds between tasks.
    tasks = [ReadPosts]
    wait_time = between(1.0, 2.0)

Additional setup

  • Tested the Django app with DEBUG = False
  • Network latency is not considered, since the test was executed within the VPC
  • Used Aurora Serverless (RDS) to avoid a bottleneck on DB I/O

Test procedure

  • Used htop to record the CPU usage of each core and the memory usage
  • Incremented the number of concurrent users in steps of 100 to find the load at which each of two thresholds was reached
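
The stepped load described above can be sketched as a small function that maps elapsed time to a target user count. The 100-user step comes from the procedure; the 60-second step duration, the 1000-user cap, and the function name are assumptions made for illustration only.

```python
def users_for_step(elapsed_seconds: float, step_users: int = 100,
                   step_seconds: int = 60, max_users: int = 1000) -> int:
    """Return the target number of concurrent users at a given elapsed time.

    The load starts at step_users and grows by step_users after every
    step_seconds, until it reaches max_users.
    """
    if elapsed_seconds < 0:
        raise ValueError("elapsed_seconds must not be negative")
    step = int(elapsed_seconds // step_seconds) + 1
    return min(step * step_users, max_users)


# Print the schedule for the first five minutes, one line per step.
for second in range(0, 300, 60):
    print(second, users_for_step(second))
```

In newer Locust releases, a function like this could back the tick() method of a LoadTestShape subclass so the steps run unattended; otherwise the user count can simply be raised by hand in the web UI.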

Result

[Image: test results]

Full results:
Gunicorn performance test result

Observations

  • A single worker cannot fully utilize the CPUs
  • The recommended number of workers (2 * cores + 1) generally performs well, as expected
  • There is not much performance difference between physical and logical cores
    • A single thread per core seems to perform slightly better
    • Therefore, considering price, it is more cost-efficient to use multiple threads per core, provided memory usage is not an issue
      • t3.micro 1 core 2 threads - $0.0104 per Hour
      • t3.large 2 cores 1 thread - $0.0832 per Hour

Questions

  • What would be an efficient CPU usage cap? 60% - 90%?
  • Are there any other metrics I should have taken into account in the experiments?

ECS fargate performance

Having gained a general idea of how Gunicorn performs on EC2 instances, I took it further and tested how Gunicorn performs on an ECS Fargate setup with respect to the vCPUs allocated. Please refer to the GitHub repo for the task definitions I used.

ecs-task-definition

Since I cannot manage instances in a Fargate setup, I needed to assign CPU units to each task to compare with EC2. It was impossible to run htop on Fargate since I have no access to the instances. ECS provides Container Insights, but its metrics are neither as accurate nor as real-time as running htop inside the instances. I only recorded the failure rate against the concurrent user count.

I didn't assign a CPU unit for each task, since the two tasks in the previous test shared CPU resources on a single machine.
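
For reference when comparing against the EC2 runs, ECS expresses CPU in units where 1024 units correspond to 1 vCPU. The sketch below converts units to vCPUs and applies the (2 x cores) + 1 starting point. Treating a fractional vCPU as one core is my own assumption for illustration, the function names are hypothetical, and the Fargate results suggest the formula is only a starting point.

```python
import math

CPU_UNITS_PER_VCPU = 1024  # ECS convention: 1024 CPU units = 1 vCPU


def vcpus_from_cpu_units(cpu_units: int) -> float:
    """Convert ECS CPU units to a (possibly fractional) vCPU count."""
    if cpu_units <= 0:
        raise ValueError("cpu_units must be positive")
    return cpu_units / CPU_UNITS_PER_VCPU


def starting_workers(cpu_units: int) -> int:
    """Apply (2 x cores) + 1 to the vCPUs implied by the CPU units."""
    # Assumption: anything below one full vCPU is treated as a single core.
    cores = max(1, math.floor(vcpus_from_cpu_units(cpu_units)))
    return 2 * cores + 1


# Print vCPUs and a starting worker count for a few CPU unit values.
for units in (256, 1024, 2048, 4096):
    print(units, vcpus_from_cpu_units(units), starting_workers(units))
```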

Task definition parameters

[Image: task definition parameters]

Results

The results do not seem as consistent as those on EC2 cores. It seems that a higher worker count yields proportionally better performance.

[Image: Fargate test results]

Afterthoughts

I was not entirely satisfied with the result since I was not able to isolate all variables.

Next Todo

  • ASGI benchmark on different vCPU options
  • Use async workers + threads options

Reference

PEP 3333 -- Python Web Server Gateway Interface v1.0.1

uWSGI vs. Gunicorn, or How to Make Python Go Faster than Node

A Performance Analysis of Python WSGI Servers: Part 2 | Blog | AppDynamics

Better performance by optimizing Gunicorn config

A Guide to ASGI in Django 3.0 and its Performance

Quick start - Locust 1.1.1 documentation

What's in a vCPU: State of Amazon EC2 in 2018 - Credera

How Amazon ECS manages CPU and memory resources | Amazon Web Services
