DEV Community

Iwan Setiawan
Bare Metal vs. AWS RDS: A Deep Dive into NUMA-Aware Tuning and PostgreSQL Performance

A real-world comparison across 7 environments, same workload, same hardware class, zero guesswork.


When a client asked whether self-managed PostgreSQL on bare metal Kubernetes could replace their AWS RDS setup, I didn't want to answer based on intuition. So we ran the benchmarks ourselves — methodically, across seven environments, with the same scale factor and the same hardware class throughout.

This article documents what we found, including a tuning discovery that cut baseline latency by 56% without changing a single piece of hardware.


The Setup: 32-Core NUMA-Aware Beast

To keep the comparison fair, all environments (Bare Metal vs Cloud) were allocated 2 vCPU / 8 GB RAM.

However, there is a fundamental difference in the underlying "Metal." Our bare metal node is a powerhouse:

  • 32 logical cores (16 physical cores with Hyper-Threading)
  • NUMA-Aware Architecture
  • Local SSD SAS Storage

Unlike Cloud VMs (t3.large) that run on a hypervisor with "noisy neighbors," our Bare Metal allows the PostgreSQL process to talk directly to physical cores and dedicated memory banks. This is why "2 vCPU" on Bare Metal is not the same as "2 vCPU" in the Cloud.
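On Linux you can inspect that topology yourself. A small sketch using the standard lscpu and numactl tools (the pinning command is printed rather than executed, and the data directory path is purely illustrative):

```shell
# How many NUMA nodes does this host expose, and which CPUs sit on each?
lscpu | grep -i '^NUMA' || true   # e.g. "NUMA node(s): 2"

# Pinning a PostgreSQL server to node 0's CPUs and memory banks would
# look like this (printed only; /var/lib/postgresql/data is illustrative):
echo "numactl --cpunodebind=0 --membind=0 postgres -D /var/lib/postgresql/data"
```

Keeping the process and its memory on one node avoids cross-node memory access, which is the main latency penalty NUMA-aware tuning tries to eliminate.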

| Label | Description |
| --- | --- |
| CNPG Local ① | CloudNativePG, local-path storage, default tuning |
| CNPG Local ② | Same cluster, work_mem + connection tuning |
| CNPG Local ③ | Same cluster, shared_buffers maximized |
| CNPG Longhorn | CloudNativePG, Longhorn distributed storage |
| RDS Standard | AWS RDS PostgreSQL 17.6, t3.large |
| Aurora IO-Opt | AWS Aurora PostgreSQL 17.4, IO-Optimized |
| Aurora Standard | AWS Aurora PostgreSQL 17.4, Standard |

1. The Benchmark Results (Average TPS)

After multiple iterations across the full matrix, here is the final average performance leaderboard:

| Environment | Avg TPS (Read-Write) | Performance Status |
| --- | --- | --- |
| AWS RDS (t3.large) | 4,826.39 | Peak Performance (Burstable) |
| AWS Aurora (I/O Prov.) | 3,480.31 | High Performance (Expensive) |
| CNPG + Tuning ③ | 3,350.58 | The Efficiency King (Bare Metal) |
| AWS Aurora Standard | 3,325.89 | AWS Standard Tier |
| CNPG + Tuning ② | 3,214.43 | High Performance |
| CNPG + Longhorn | 1,654.84 | Distributed Storage Overhead |

2. Peak Read-Only TPS (Maximum Reads)

Maximum read throughput achieved across all client/thread combinations:

| Environment | Peak Read TPS | Best Config (clients / threads) |
| --- | --- | --- |
| RDS Standard | 13,955 | 10c / 4t |
| Aurora IO-Optimized | 10,928 | 10c / 1t |
| Aurora Standard | 10,020 | 10c / 1t |
| CNPG Local ③ | 8,325 | 10c / 1t |
| CNPG Local ② | 8,065 | 10c / 1t |
| CNPG Longhorn | 6,165 | 10c / 8t |
| CNPG Local ① | 4,758 | 10c / 8t |

RDS Standard leads on raw reads, but the 75% gap between CNPG ① and ③ (on the same physical hardware) proves that configuration is just as critical as the underlying metal.


3. Peak Read-Write TPS (Maximum Writes)

| Environment | Peak Write TPS | Best Config (clients / threads) |
| --- | --- | --- |
| RDS Standard | 2,839 | 25c / 8t |
| CNPG Local ③ | 2,539 | 25c / 1t |
| Aurora IO-Opt | 1,622 | 100c / 1t |
| Aurora Standard | 1,557 | 100c / 8t |
| CNPG Longhorn | 1,318 | 25c / 8t |
| CNPG Local ① | 1,254 | 25c / 1t |

Result: Tuned bare metal beats both Aurora variants. Aurora's write path involves distributed storage replication, which adds latency. Our NUMA-aware physical cores combined with local SSD SAS proved superior for raw write throughput.


4. The Tuning Discovery That Changed Everything

CNPG Local ① started with poor performance: 1,116 TPS (0.896 ms latency).

Step 1: Solving Memory Contention (Tuning ②)

We reduced work_mem (32MB -> 4MB) and max_connections (800 -> 200).
The math: 800 connections × 32MB = 25.6GB of potential RAM pressure on an 8GB node, which caused constant swapping. Capping connections stabilized the system.
Result: 2,402 TPS (a 115% improvement over the baseline).
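That worst-case arithmetic is easy to sanity-check. A minimal sketch of the before/after memory envelope (the figures are from our config above; real usage is lower when idle connections aren't sorting, and can be higher because a single complex query may allocate work_mem more than once):

```shell
# Worst case if every connection allocates work_mem once.
connections=800; work_mem_mb=32
echo "before: $(( connections * work_mem_mb )) MB potential on an 8192 MB node"

connections=200; work_mem_mb=4
echo "after:  $(( connections * work_mem_mb )) MB potential"
```

The "before" envelope is roughly 3x the node's physical RAM, which is exactly the recipe for the constant swapping we observed.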

Step 2: Maximizing Buffer Pool (Tuning ③)

We increased shared_buffers to 6GB and enabled HugePages, leveraging the host's direct memory access and NUMA affinity.
Result: Latency dropped to 0.394 ms.
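The HugePages sizing behind Tuning ③ is worth spelling out. A hedged sketch assuming the common 2 MB huge page size on x86_64; on PostgreSQL 15+ the authoritative count comes from `postgres -C shared_memory_size_in_huge_pages`, so treat this arithmetic as a first estimate only:

```shell
# 6 GB of shared_buffers on 2 MB huge pages, plus ~5% headroom for
# PostgreSQL's other shared memory segments. Illustrative numbers.
shared_buffers_mb=6144; hugepage_mb=2
pages=$(( shared_buffers_mb / hugepage_mb ))      # 3072
pages_with_headroom=$(( pages + pages / 20 ))     # 3225
echo "vm.nr_hugepages = $pages_with_headroom"
# Apply as root, then set huge_pages = on in postgresql.conf (illustrative):
# sysctl -w vm.nr_hugepages=$pages_with_headroom
```

Fewer, larger pages mean fewer TLB misses on a 6 GB buffer pool, which is where much of the latency drop comes from.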

① 0.896 ms ████████████████████████
② 0.416 ms ███████████
③ 0.394 ms ██████████

56% latency reduction. Same hardware.


5. Final Verdict: Performance vs. Cost Efficiency

"While RDS Standard showed higher peak TPS due to the burstable nature of the t3.large instance, it comes with a significantly higher price tag and unpredictable long-term performance once CPU credits are exhausted. For consistent production workloads where cost-efficiency is a priority, CNPG on Bare Metal is the clear winner—delivering 70% of the performance at 1/10th of the cost."

The Burstable CPU Trap

AWS t3 instances rely on a credit-based CPU system: once the credits are exhausted, performance throttles down to the baseline. Our 32-core bare metal node provides dedicated resources: no CPU credits, no "noisy neighbors." You get 100% of the CPU, 100% of the time.

The ROI Factor

Our bare metal host has 32 cores and 512GB of RAM. On AWS, an instance with 512GB of RAM would cost a fortune. On bare metal, we can host dozens of high-performance clusters on this one machine for a flat rate of ~Rp 7M/mo.


Key Takeaways

  1. Storage is Everything: Our Longhorn tests showed a massive performance drop. For databases, local SAS SSD storage (properly tuned) is non-negotiable.
  2. The "Cloud Tax" is Real: You are paying a massive premium for the "Managed" label. With CNPG, managing Postgres on K8s becomes viable and much cheaper.
  3. Know Your Workload: If you have unpredictable spikes, RDS burst capacity is great. But for a stable, high-traffic system, Bare Metal is the foundation.
  4. Default Config is the Enemy: The interaction between work_mem and max_connections is non-linear. Never trust the defaults.
  5. Bare Metal is Not Inherently Slower: Properly tuned, we achieved 69% of RDS's average read-write throughput (and 89% of its peak write TPS) on owned hardware.
  6. Aurora's Write Path Overhead: Distributed storage optimizes for durability and read scale-out, not raw single-node write speed.
  7. Storage Choice = Config Choice: Longhorn vs. local-path is a 33-50% difference. The storage backend deserves as much attention as the DB config.

Environment Details

  • CloudNativePG: v1.24 on K8s 1.31
  • Host: Bare Metal 32-Core (16 Physical/16 HT), NUMA-Aware
  • Storage: 2x Samsung SM863a Enterprise SSD in RAID 1 (SAS Interface). Even without NVMe, proper tuning achieved 100% of Aurora Standard's performance.
  • AWS Region: ap-southeast-3 (Indonesia)
  • Scale Factor: 100 (10M Rows)
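For reproducibility: the scale factor and the client/thread combinations in the tables above map directly onto pgbench, PostgreSQL's bundled benchmark tool. A dry-run sketch that prints the commands instead of executing them (the database name, 60-second duration, and exact sweep are assumptions, not the verbatim harness):

```shell
# Initialize at scale factor 100 (~10M rows in pgbench_accounts):
echo "pgbench -i -s 100 benchdb"

# Sweep clients (-c) and threads (-j); -S selects the read-only script,
# while the default script is a TPC-B-like read-write mix.
for clients in 10 25 100; do
  for threads in 1 4 8; do
    echo "pgbench -c $clients -j $threads -T 60 benchdb"      # read-write
    echo "pgbench -c $clients -j $threads -T 60 -S benchdb"   # read-only
  done
done
```

A "10c / 4t" entry in the tables corresponds to `-c 10 -j 4` in this matrix.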

— Iwan Setiawan, Hybrid Cloud & Platform Architect · portfolio.kangservice.cloud
