Kien Do

Low-Noise EC2 Benchmarking: A Practical Guide

Introduction

Running reliable performance benchmarks in the cloud is notoriously difficult. EC2 instances have inherent performance variability—CPU frequency scaling, power-saving states, NUMA balancing, swap activity, and shared I/O—that can mask real code regressions or trigger false alarms. When your benchmark shows a 15% regression, is it your code or just EC2 having a bad day?

I first ran into this while benchmarking Rust production code with the Criterion crate and seeing wildly varying wall-clock results between runs. As it turns out, this is expected behaviour and is documented in Criterion's FAQ - How Should I Run Criterion.rs Benchmarks In A CI Pipeline? (short answer: don't). A bit more research showed that unreliable Rust benchmarking in CI is a common complaint. The problem isn't Rust-specific, though; the variability comes from the underlying hardware and affects benchmarks in any language.

Screenshot from Everett Pompeii's article on the same problem.

After implementing optimizations based on MongoDB's extensive research into cloud performance testing, I reduced benchmark variance from 20–30% down to under 5%. This guide provides the practical scripts and step-by-step instructions to do the same for your CI benchmarking infrastructure.

What you'll learn:

  1. How to configure an EC2 instance for low-noise performance testing
  2. How to register the instance as a GitHub Actions self-hosted runner
  3. How to create a CI workflow for automated benchmark regression detection (with Rust Criterion examples)

Why Dedicated EC2?

With dynamically provisioned shared runners (e.g. GitHub-hosted runners), you have no control over CPU frequency scaling, other workloads on the same physical host, memory pressure, or I/O contention. With a dedicated EC2 instance that you manage yourself, you control the configuration.

MongoDB's performance team spent years researching this problem. Their key insight: prioritize repeatability over peak performance. A benchmark that's consistently 10% slower but stable is far more useful than one that's sometimes fast but varies by 30%.

Why not bare metal?

Yes, bare metal would be ideal: no hypervisor, no noisy neighbours, and predictable performance. But when I asked the DevOps team about provisioning physical machines, the answer was an immediate "no". Too expensive, too difficult to manage and secure, and nobody does that anymore. Cloud VMs are the reality now; shared, ephemeral resources are how modern infrastructure works. So rather than fighting this trend, the goal becomes: how do we make cloud instances behave as close to bare metal as possible for the duration of our benchmarks?

That's what this guide addresses.

MongoDB Research References


Step 1: Create an EC2 Instance

Launch an EC2 instance with the following specifications:

| Component | Recommended | Alternative |
| --- | --- | --- |
| Instance Type | c6i.8xlarge (Nitro) | c3.8xlarge (Xen) |
| vCPUs / RAM | 32 cores, 64 GiB | 32 cores, 60 GiB |
| OS | Ubuntu 24.04 LTS | Ubuntu 22.04 LTS |
| Root Volume | 30 GiB gp3, 3000 IOPS | |
| EBS Data Volume | 100+ GiB gp3, 4000 IOPS | 100+ GiB io2, 5000+ IOPS |

Why these specs?

  • 32 vCPUs: Enables disabling hyperthreading (32 logical → 16 physical cores), eliminating interference between logical cores sharing execution units. You can inspect the core topology first; see the check after this list.
  • Dedicated EBS: MongoDB's research showed EBS with provisioned IOPS dramatically outperforms ephemeral SSDs by eliminating shared I/O contention. This is counterintuitive but well-documented.
  • c3.8xlarge advantage: Runs at a fixed optimal frequency automatically, so less configuration is needed.
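
Before changing anything, it can be worth confirming which logical CPUs are hyperthread siblings on your instance. These are read-only checks and don't assume any of the tuning scripts have run yet:

# List each logical CPU with the physical core and socket it belongs to
lscpu -e=CPU,CORE,SOCKET,ONLINE

# Show the sibling thread(s) that share a physical core with logical CPU 0
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list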

EBS cost comparison (100 GiB):

| Option | Type | IOPS | Monthly Cost |
| --- | --- | --- | --- |
| Recommended | gp3 | 4000 | ~$38 USD |
| MongoDB exact | io2 | 5000+ | ~$338 USD |

For most workloads, gp3 provides 90% of the benefit at 11% of the cost.
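
If you provision the data volume separately rather than at instance launch, the AWS CLI can create and attach it. A minimal sketch, assuming the CLI is configured; the Availability Zone, volume ID, and instance ID below are placeholders:

# Create a 100 GiB gp3 volume with 4000 provisioned IOPS
aws ec2 create-volume \
    --volume-type gp3 \
    --size 100 \
    --iops 4000 \
    --availability-zone us-east-1a

# Attach it to the benchmark instance (it shows up as an NVMe device on Nitro)
aws ec2 attach-volume \
    --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 \
    --device /dev/sdf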


Step 2: Run the Setup Scripts

Create a folder called ec2-setup and add the scripts from the Scripts section. Then:

# SSH into your instance
ssh ubuntu@<instance-ip>

# Make scripts executable
chmod +x ./ec2-setup/*.sh

# Identify your EBS data volume
lsblk
# Nitro instances: /dev/nvme1n1 or /dev/nvme2n1
# Xen instances: /dev/xvdb, /dev/xvdc

# Run setup (adjust device path as needed)
sudo ./ec2-setup/setup-ec2-instance.sh /dev/nvme1n1
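
On Nitro instances the NVMe device names are assigned in enumeration order, so before formatting anything it's worth confirming which device is your EBS data volume. The EBS volume ID is exposed as the NVMe serial number:

# Match NVMe devices to EBS volume IDs (the serial shows the vol-... ID without the dash)
sudo lsblk -o NAME,SIZE,MOUNTPOINT,SERIAL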

Step 3: Reboot and Apply CPU Optimizations

# Reboot to apply kernel parameters
sudo reboot

# After reboot, apply CPU optimizations
sudo /usr/local/bin/disable-hyperthreading.sh
sudo /usr/local/bin/set-cpu-frequency.sh

# Verify the system is ready
run-canary-tests
benchmark-status

Step 4: Register as a GitHub Actions Runner

  1. Navigate to Repository → Settings → Actions → Runners → New self-hosted runner

  2. Download and configure the runner:

mkdir actions-runner && cd actions-runner

# Download (check GitHub for current version)
curl -o actions-runner-linux-x64-2.311.0.tar.gz -L \
    https://github.com/actions/runner/releases/download/v2.311.0/actions-runner-linux-x64-2.311.0.tar.gz
tar xzf ./actions-runner-linux-x64-2.311.0.tar.gz

# Configure with your repository token
./config.sh --url https://github.com/YOUR_ORG/YOUR_REPO --token YOUR_TOKEN

  3. When prompted, add a descriptive label like benchmark-runner

  4. Install and start as a service:

sudo ./svc.sh install
sudo ./svc.sh start
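
Before leaving the instance, confirm the runner service is up (the runner's svc.sh helper also supports a status subcommand):

sudo ./svc.sh status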

For complete documentation, see GitHub Docs: Adding self-hosted runners.

Corporate environments: If you have GitHub Enterprise, proxies, or firewalls, consult your DevOps team for additional configuration.


Step 5: Verify Your Setup

Run through this checklist:

  • [ ] CPU governor shows performance (or N/A on Xen)
  • [ ] Hyperthreading disabled (16 active cores from 32)
  • [ ] Swap completely disabled (free -h shows 0 swap)
  • [ ] NUMA balancing disabled
  • [ ] EBS storage mounted at /opt/benchmark-data
  • [ ] Canary tests passing consistently
  • [ ] GitHub runner showing online in repository settings

# Quick verification commands
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
nproc  # Should show 16
free -h
cat /proc/sys/kernel/numa_balancing  # Should show 0
df -h /opt/benchmark-data

Your EC2 instance is now configured as a GitHub Actions self-hosted runner! You can create and run automated benchmarks on it as part of your CI workflow.

The next section provides an example workflow using Rust's Criterion crate, but the same approach works for any language's benchmarking framework.


Example CI Workflow - Rust Criterion Benchmarks

This workflow implements a baseline comparison strategy: run benchmarks on main, then on your feature branch, and compare. This provides meaningful regression detection in virtualized environments where absolute timing values are unreliable. The example uses Criterion, but the pattern applies to any benchmarking tool.

Create .github/workflows/benchmark.yaml:

name: Performance Benchmarks

on:
  push:
    branches: [main]
  pull_request:
    types: [labeled]

permissions:
  contents: read
  pull-requests: read

jobs:
  benchmark:
    name: Run Criterion Benchmarks
    if: >
      github.event_name == 'push' ||
      github.event.label.name == 'run-benchmark' ||
      github.event.label.name == 'full-benchmark'
    runs-on: [self-hosted, Linux, X64, benchmark-runner]

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install Rust toolchain
        run: |
          curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
          echo "$HOME/.cargo/bin" >> $GITHUB_PATH

      - name: Check benchmark scope
        id: check-labels
        run: |
          if [ "${{ github.event.label.name }}" = "full-benchmark" ]; then
            echo "feature_flag=--features full-benchmark" >> $GITHUB_OUTPUT
          else
            echo "feature_flag=" >> $GITHUB_OUTPUT
          fi

      - name: Run benchmarks with baseline comparison
        run: |
          cd src/your-project  # Adjust to your project path
          mkdir -p benchmark_results

          # Save current commit
          CURRENT_COMMIT=$(git rev-parse HEAD)

          # Run baseline on main
          git fetch origin main:refs/remotes/origin/main --force
          git checkout origin/main
          cargo bench ${{ steps.check-labels.outputs.feature_flag }} -- --verbose \
            > benchmark_results/baseline.txt 2>&1

          # Run on feature branch
          git checkout "$CURRENT_COMMIT"
          cargo bench ${{ steps.check-labels.outputs.feature_flag }} -- --verbose \
            > benchmark_results/feature.txt 2>&1

          # Check for regressions
          if grep -q "Performance has regressed" benchmark_results/feature.txt; then
            echo "::error::Performance regression detected!"
            grep -A5 "Performance has regressed" benchmark_results/feature.txt
            exit 1
          fi
          echo "No regressions detected."

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: src/your-project/benchmark_results/
          retention-days: 30

      - name: Cleanup
        if: always()
        run: |
          cd src/your-project
          cargo clean || true
          rm -rf benchmark_results || true

Controlling Benchmark Scope with Labels

| Label | Effect |
| --- | --- |
| run-benchmark | Run standard benchmarks |
| full-benchmark | Run extended benchmarks with larger datasets |
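
Labels can be applied from the GitHub UI or, if you use the gh CLI, from the command line (the PR number below is a placeholder):

# Trigger the standard benchmark job on an open pull request
gh pr edit 123 --add-label run-benchmark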

In your Cargo.toml:

[features]
full-benchmark = []

In your benchmark code:

#[cfg(not(feature = "full-benchmark"))]
const TEST_SIZES: &[usize] = &[100, 1_000];

#[cfg(feature = "full-benchmark")]
const TEST_SIZES: &[usize] = &[100, 1_000, 10_000, 100_000];

Running Benchmarks Manually

# Set target directory to EBS for faster builds
export CARGO_TARGET_DIR=/opt/benchmark-data/workdir

# Pin to specific cores for isolation
taskset -c 0-3 cargo bench

# With NUMA binding (multi-socket systems)
numactl --cpunodebind=0 --membind=0 cargo bench

# Full benchmark suite
cargo bench --features full-benchmark
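
Criterion also supports named baselines, which make a manual branch-to-branch comparison explicit instead of relying on the implicit "compare against the previous run" behaviour used in the CI workflow above. A sketch (the feature branch name is a placeholder):

# On main: record a baseline named "main"
git checkout main
cargo bench -- --save-baseline main

# On the feature branch: compare against the saved baseline
git checkout my-feature-branch
cargo bench -- --baseline main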

What the Scripts Do

CPU Optimization (01-system-tuning.sh)

Goal: Eliminate frequency scaling and power-saving states.

  • Performance governor: Locks CPU to maximum frequency
  • Disable C-states: Prevents power-saving delays via kernel parameters: processor.max_cstate=1 intel_idle.max_cstate=1 intel_pstate=disable idle=poll (you can verify these took effect after the reboot; see the check after this list)
  • Disable hyperthreading: Two logical cores on one physical core share execution units, caches, and branch predictors. Disabling hyperthreading eliminates this interference.
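
After the reboot in Step 3, two quick read-only checks confirm the kernel parameters were actually applied:

# The running kernel's command line should include the benchmark parameters
cat /proc/cmdline

# C-state limit as seen by the intel_idle driver (may not be loaded on all instance types)
cat /sys/module/intel_idle/parameters/max_cstate 2>/dev/null || echo "intel_idle not loaded"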

Memory Optimization

Goal: Eliminate swap and NUMA balancing.

  • Disable swap: Even with plenty of RAM, Linux may swap pages. A 10ms disk access in your hot loop destroys timing data.
  • Disable NUMA balancing: The kernel may migrate memory between NUMA nodes, causing unpredictable stalls.

Storage Optimization (02-storage-setup.sh)

Goal: Consistent I/O performance.

  • ext4 without journaling: Eliminates journal write overhead
  • Mount options: noatime,nodiratime,nobarrier,discard
  • none (noop) I/O scheduler: Minimal scheduling overhead for SSD-backed volumes; modern blk-mq kernels call the equivalent scheduler none. To check the volume's behaviour directly, see the fio sketch after this list.
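
Once the volume is mounted, a short fio run is a reasonable sanity check that the provisioned IOPS are delivered consistently. fio is installed by the setup script; the job parameters here are only an example:

# 60-second 4k random-write test with direct I/O against the benchmark volume
fio --name=ebs-check \
    --filename=/opt/benchmark-data/tmp/fio-test.dat \
    --rw=randwrite --bs=4k --size=1G \
    --ioengine=libaio --iodepth=32 --direct=1 \
    --runtime=60 --time_based --group_reporting

# Remove the test file afterwards
rm -f /opt/benchmark-data/tmp/fio-test.dat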

Canary Tests (03-system-validation.sh)

Goal: Distinguish EC2 issues from code regressions.

Run a known workload before your benchmarks. If the canary regresses, the problem is EC2, not your code.

run-canary-tests run       # Run all tests
run-canary-tests readiness # Check system configuration

Expected Results

| Metric | Before | After |
| --- | --- | --- |
| Benchmark variance | 20–30% | <5% |
| False positive rate | High | Near zero |
| Regression detection | Unreliable | Catches 5% regressions |

Scripts

Download these scripts and place them in an ec2-setup folder.

setup-ec2-instance.sh

#!/bin/bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
LOG_FILE="/var/log/ec2-benchmark-setup.log"

print_header() { echo -e "\e[1;34m=== $1 ===\e[0m"; }
print_status() { echo -e "\e[1;32m[$(date '+%H:%M:%S')] $1\e[0m" | tee -a "$LOG_FILE"; }
print_error() { echo -e "\e[1;31m[$(date '+%H:%M:%S')] ERROR: $1\e[0m" | tee -a "$LOG_FILE"; }
print_warning() { echo -e "\e[1;33m[$(date '+%H:%M:%S')] WARNING: $1\e[0m" | tee -a "$LOG_FILE"; }

check_root() {
    if [ "$EUID" -ne 0 ]; then
        print_error "This script must be run as root"
        exit 1
    fi
}

detect_instance_info() {
    print_status "Detecting EC2 instance information..."
    INSTANCE_TYPE=$(curl -s http://169.254.169.254/latest/meta-data/instance-type 2>/dev/null || echo "unknown")
    print_status "Instance Type: $INSTANCE_TYPE"
    print_status "CPU Cores: $(nproc)"
    print_status "Memory: $(free -g | awk '/^Mem:/{print $2}')GB"
}

update_system() {
    print_header "Updating System Packages"
    export DEBIAN_FRONTEND=noninteractive
    apt-get update && apt-get upgrade -y
    apt-get install -y \
        htop iotop sysstat cpufrequtils linux-tools-common \
        stress-ng bc curl wget jq git build-essential \
        fio nvme-cli smartmontools python3 unzip
}

install_rust() {
    print_header "Installing Rust Toolchain"
    if [ -x "/home/$SUDO_USER/.cargo/bin/cargo" ]; then
        print_status "Rust already installed"
        return 0
    fi
    sudo -u "$SUDO_USER" bash -c 'curl --proto "=https" --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y'
    echo 'export PATH="$HOME/.cargo/bin:$PATH"' >> "/home/$SUDO_USER/.bashrc"
}

setup_directories() {
    mkdir -p /usr/local/bin/benchmark-scripts
    cp "$SCRIPT_DIR"/*.sh /usr/local/bin/benchmark-scripts/
    chmod +x /usr/local/bin/benchmark-scripts/*.sh
}

create_convenience_scripts() {
    cat > /usr/local/bin/run-canary-tests << 'EOF'
#!/bin/bash
exec /usr/local/bin/benchmark-scripts/03-system-validation.sh "$@"
EOF
    chmod +x /usr/local/bin/run-canary-tests

    cat > /usr/local/bin/benchmark-status << 'EOF'
#!/bin/bash
echo "=== EC2 Benchmark Status ==="
echo "Instance: $(curl -s http://169.254.169.254/latest/meta-data/instance-type 2>/dev/null || echo 'unknown')"
echo "Cores: $(nproc)"
echo "Governor: $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 2>/dev/null || echo 'N/A')"
echo ""
free -h
echo ""
df -h | grep -E '(/opt/benchmark-data|/$)'
EOF
    chmod +x /usr/local/bin/benchmark-status
}

setup_monitoring() {
    cat > /etc/cron.d/benchmark-canary << 'EOF'
0 * * * * root /usr/local/bin/run-canary-tests run >> /var/log/canary-cron.log 2>&1
EOF
}

main() {
    local ebs_device="${1:-}"
    check_root
    detect_instance_info
    update_system
    install_rust
    setup_directories

    bash "$SCRIPT_DIR/01-system-tuning.sh"

    if [ -n "$ebs_device" ]; then
        bash "$SCRIPT_DIR/02-storage-setup.sh" "$ebs_device"
    else
        print_warning "No EBS device specified—skipping storage setup"
        mkdir -p /opt/benchmark-data/{workdir,results,tmp}
        chown -R "$SUDO_USER:$SUDO_USER" /opt/benchmark-data
    fi

    create_convenience_scripts
    setup_monitoring

    print_header "Setup Complete"
    print_warning "REBOOT REQUIRED to apply kernel parameters"
    print_status "After reboot, run:"
    print_status "  sudo /usr/local/bin/disable-hyperthreading.sh"
    print_status "  sudo /usr/local/bin/set-cpu-frequency.sh"
}

main "$@"

01-system-tuning.sh

#!/bin/bash
set -euo pipefail

echo "=== Applying System Tuning ==="

# CPU governor
if [ -d /sys/devices/system/cpu/cpu0/cpufreq ]; then
    echo 'performance' | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    cat > /etc/systemd/system/cpu-performance.service << 'EOF'
[Unit]
Description=Set CPU governor to performance
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor'
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
    systemctl enable cpu-performance.service
else
    echo "CPU frequency scaling not exposed (typical for Xen instances)"
fi

# Kernel parameters
sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"/GRUB_CMDLINE_LINUX_DEFAULT="\1 processor.max_cstate=1 intel_idle.max_cstate=1 intel_pstate=disable idle=poll"/' /etc/default/grub
update-grub

# Memory
swapoff -a
sed -i '/swap/d' /etc/fstab

cat > /etc/sysctl.d/99-benchmark-tuning.conf << 'EOF'
kernel.numa_balancing=0
vm.swappiness=1
vm.dirty_ratio=15
vm.dirty_background_ratio=5
vm.vfs_cache_pressure=50
EOF
sysctl -p /etc/sysctl.d/99-benchmark-tuning.conf

# Disable unnecessary services
for service in bluetooth cups avahi-daemon ModemManager; do
    systemctl disable "$service" 2>/dev/null || true
    systemctl stop "$service" 2>/dev/null || true
done

# Hyperthreading disable script
cat > /usr/local/bin/disable-hyperthreading.sh << 'EOF'
#!/bin/bash
# Assumes the typical Nitro vCPU numbering where vCPUs 16-31 are the
# hyperthread siblings of physical cores 0-15 (verify with lscpu -e).
CPU_COUNT=$(nproc)
if [ "$CPU_COUNT" -eq 32 ]; then
    echo "Disabling hyperthreading (32 → 16 cores)..."
    for core in $(seq 16 31); do
        echo 0 > /sys/devices/system/cpu/cpu$core/online
    done
    echo "Active cores: $(nproc)"
else
    echo "Expected 32 vCPUs but found $CPU_COUNT; check sibling pairs with lscpu before offlining cores."
fi
EOF
chmod +x /usr/local/bin/disable-hyperthreading.sh

# CPU frequency script
cat > /usr/local/bin/set-cpu-frequency.sh << 'EOF'
#!/bin/bash
if [ -d /sys/devices/system/cpu/cpu0/cpufreq ]; then
    for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed; do
        [ -f "$cpu" ] && echo "$(cat "${cpu%/*}/scaling_max_freq")" > "$cpu" 2>/dev/null || true
    done
    echo "CPU frequency locked to maximum"
else
    echo "CPU frequency scaling not available"
fi
EOF
chmod +x /usr/local/bin/set-cpu-frequency.sh

echo "=== System tuning complete. Reboot required. ==="

02-storage-setup.sh

#!/bin/bash
set -euo pipefail

echo "=== Configuring EBS Storage ==="

EBS_DEVICE="${1:-}"
MOUNT_POINT="/opt/benchmark-data"

if [ -z "$EBS_DEVICE" ]; then
    echo "Auto-detecting EBS device..."
    ROOT_DEVICE=$(lsblk -no PKNAME "$(findmnt -n -o SOURCE /)" | head -1)
    EBS_DEVICE="/dev/$(lsblk -dnr -o NAME | grep -E '(nvme|xvd)' | grep -v "$ROOT_DEVICE" | grep -v 'xvda' | head -1)"
    echo "Detected: $EBS_DEVICE"
fi

[ ! -b "$EBS_DEVICE" ] && echo "ERROR: Device $EBS_DEVICE not found!" && exit 1

# Create filesystem
mkfs.ext4 -F -O ^has_journal -E stride=32,stripe-width=32 -b 4096 -m 0 "$EBS_DEVICE"

mkdir -p "$MOUNT_POINT"
UUID=$(blkid -s UUID -o value "$EBS_DEVICE")
sed -i "\|$MOUNT_POINT|d" /etc/fstab
echo "UUID=$UUID $MOUNT_POINT ext4 noatime,nodiratime,nobarrier,discard 0 2" >> /etc/fstab
mount "$MOUNT_POINT"

mkdir -p "$MOUNT_POINT"/{workdir,results,tmp}
chmod 755 "$MOUNT_POINT"/{workdir,results,tmp}

# I/O scheduler (modern blk-mq kernels expose "none" rather than "noop")
DEVICE_NAME=$(basename "$EBS_DEVICE")
echo none > "/sys/block/$DEVICE_NAME/queue/scheduler" 2>/dev/null \
    || echo noop > "/sys/block/$DEVICE_NAME/queue/scheduler" 2>/dev/null || true
echo 32 > "/sys/block/$DEVICE_NAME/queue/nr_requests" 2>/dev/null || true

chown -R "$(logname)":"$(logname)" "$MOUNT_POINT" 2>/dev/null || true

echo "=== EBS configured at $MOUNT_POINT ==="

03-system-validation.sh

#!/bin/bash
set -euo pipefail

RESULTS_DIR="/opt/benchmark-data/results/canary"
LOG_FILE="/var/log/ec2-canary-tests.log"

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"; }

run_cpu_canary() {
    log "Running CPU canary..."
    mkdir -p "$RESULTS_DIR/cpu"
    local start=$(date +%s.%N)
    echo "scale=1000; 4*a(1)" | timeout 5s bc -l > /dev/null 2>&1 || true
    local duration=$(echo "$(date +%s.%N) - $start" | bc -l)
    echo "$duration" > "$RESULTS_DIR/cpu/pi_$(date +%s).txt"
    log "CPU canary: ${duration}s"
}

run_io_canary() {
    log "Running I/O canary..."
    mkdir -p "$RESULTS_DIR/io"
    mountpoint -q "/opt/benchmark-data" || { log "ERROR: Storage not mounted"; return 1; }

    local test_file="/opt/benchmark-data/tmp/canary.dat"
    local start=$(date +%s.%N)
    dd if=/dev/zero of="$test_file" bs=4k count=25600 oflag=direct 2>/dev/null
    local mbps=$(echo "scale=2; (25600 * 4) / 1024 / ($(date +%s.%N) - $start)" | bc -l)
    rm -f "$test_file"
    echo "$mbps" > "$RESULTS_DIR/io/write_$(date +%s).txt"
    log "I/O canary: ${mbps} MB/s"
}

check_readiness() {
    log "Checking readiness..."
    local ready=true
    local gov=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 2>/dev/null || echo "unknown")
    [ "$gov" != "performance" ] && [ "$gov" != "unknown" ] && log "WARNING: Governor is '$gov'" && ready=false
    mountpoint -q "/opt/benchmark-data" || { log "WARNING: Storage not mounted"; ready=false; }
    [ "$ready" = true ] && log "System ready" || log "Configuration issues detected"
}

case "${1:-run}" in
    run) mkdir -p "$RESULTS_DIR"; run_cpu_canary; run_io_canary; log "Canary tests complete" ;;
    readiness) check_readiness ;;
    *) echo "Usage: $0 [run|readiness]" ;;
esac

revert-all-optimizations.sh

#!/bin/bash
set -euo pipefail

[ "$EUID" -ne 0 ] && echo "Must be run as root" && exit 1

echo "=== Reverting Optimizations ==="

# Re-enable all cores
for core in $(seq 0 31); do
    echo 1 > "/sys/devices/system/cpu/cpu$core/online" 2>/dev/null || true
done

# Restore governor
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo ondemand > "$gov" 2>/dev/null || true
done

# Remove the benchmark kernel parameters added by 01-system-tuning.sh
sed -i 's/ processor.max_cstate=1 intel_idle.max_cstate=1 intel_pstate=disable idle=poll//' /etc/default/grub
update-grub

# Unmount storage
umount "/opt/benchmark-data" 2>/dev/null || true
sed -i '\|/opt/benchmark-data|d' /etc/fstab

# Remove services and scripts
systemctl disable cpu-performance.service 2>/dev/null || true
rm -f /etc/systemd/system/cpu-performance.service
rm -f /etc/sysctl.d/99-benchmark-tuning.conf
rm -f /usr/local/bin/{run-canary-tests,benchmark-status,disable-hyperthreading.sh,set-cpu-frequency.sh}
rm -rf /usr/local/bin/benchmark-scripts
rm -f /etc/cron.d/benchmark-{canary,stats}

# Restore services
for svc in bluetooth cups avahi-daemon; do
    systemctl enable "$svc" 2>/dev/null || true
done

systemctl daemon-reload
echo "=== Revert complete. Reboot required. ==="

Key Takeaways

  1. Repeatability over performance. A slower but stable benchmark is infinitely more useful than a fast but noisy one.

  2. Measure, don't assume. MongoDB's team was surprised that ephemeral SSDs performed worse than EBS—the opposite of conventional wisdom.

  3. Canary tests are essential. They distinguish "the cloud is being weird" from "my code got slower."

  4. Disable, don't tune. For benchmarking, disabling dynamic features entirely beats trying to tune them.

  5. Baseline comparison works. Comparing feature branches against main provides meaningful regression detection even in virtualized environments.


If you found this useful, I'd love to hear about your experience applying these optimizations to your own benchmarking infrastructure.

Special thanks to the MongoDB performance engineering team for their extensive research on EC2 benchmarking variability, which formed the foundation for this guide.
