
Bare Metal vs. AWS RDS: A Deep Dive into NUMA-Aware Tuning and PostgreSQL Performance (Part 1)

Bare Metal vs. AWS RDS: Storage Baseline — Longhorn vs Local SSD vs Managed Cloud

Before tuning anything, we needed to answer a simpler question first: does storage backend matter more than the platform itself?

This is Part 1 of a 2-part series. This article establishes bare metal storage baselines across four configurations and compares them against AWS managed PostgreSQL. Part 2 covers CPU/NUMA pinning and HugePages, where bare metal overtakes Aurora on write throughput.


Most PostgreSQL performance comparisons jump straight to config tuning. We didn't.

Before touching CPU governors or HugePages, we needed to answer a more fundamental question: how much does storage backend affect performance on bare metal Kubernetes? We ran four configurations — and the results reveal exactly where the bottleneck lives.


The Setup

Every environment is capped at 2 vCPU / 8 GB RAM. The bare metal node is a 32-core NUMA-aware host with Samsung SM863a enterprise SSDs in RAID 1 (SAS); the AWS environments run on t3.large in ap-southeast-3.

Single-instance comparison throughout — one CNPG pod vs one RDS instance vs one Aurora instance. No read replicas, no Multi-AZ, no connection pooling.

PostgreSQL config — intentionally matched to AWS defaults:

```
shared_buffers       = ~1.9 GB
effective_cache_size = ~3.8 GB
work_mem             = 4 MB
max_connections      = 839
wal_buffers          = ~60 MB
maintenance_work_mem = 128 MB
```

With the same config across all environments, any performance difference comes purely from platform and storage architecture, not tuning.
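
For reference, here is roughly how a config like this is applied with CloudNativePG. This is a minimal sketch, not our exact manifest: the cluster name, storage class, and volume size are placeholders, and the parameter values simply mirror the approximate figures above.

```bash
# Minimal CNPG Cluster sketch (cluster name, storage class, and size are placeholders).
kubectl apply -f - <<'EOF'
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-bench
spec:
  instances: 1
  resources:                    # match the 2 vCPU / 8 GB cap used everywhere
    requests:
      cpu: "2"
      memory: 8Gi
    limits:
      cpu: "2"
      memory: 8Gi
  storage:
    storageClass: local-ssd     # swap for the Longhorn class in the Longhorn runs
    size: 50Gi
  postgresql:
    parameters:
      shared_buffers: "1900MB"          # ~1.9 GB
      effective_cache_size: "3800MB"    # ~3.8 GB
      work_mem: "4MB"
      max_connections: "839"
      wal_buffers: "60MB"
      maintenance_work_mem: "128MB"
EOF
```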

Benchmark: pgbench · Scale factor 100 (~10M rows) · 60s per run · 39 runs per environment
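
The runs themselves are plain pgbench. A minimal sketch follows, assuming read-only maps to the built-in select-only script (-S), read-write to the simple-update script (-N), and TPC-B to the default script; host, user, and database are placeholders, and the actual runs were driven by the Kubernetes Job linked at the end.

```bash
# One-time initialization: scale factor 100 (~10 million pgbench_accounts rows).
pgbench -i -s 100 -h "$PGHOST" -U "$PGUSER" "$PGDATABASE"

CLIENTS=25   # varied per run: 1, 10, 25, 50, 100, ...

# Read-only (select-only script); -j is client-side worker threads on the runner.
pgbench -S -c "$CLIENTS" -j 2 -T 60 -P 10 -h "$PGHOST" -U "$PGUSER" "$PGDATABASE"

# Read-write (simple-update script: skips branch/teller updates)
pgbench -N -c "$CLIENTS" -j 2 -T 60 -P 10 -h "$PGHOST" -U "$PGUSER" "$PGDATABASE"

# TPC-B-like (default script)
pgbench -c "$CLIENTS" -j 2 -T 60 -P 10 -h "$PGHOST" -U "$PGUSER" "$PGDATABASE"
```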

Four bare metal storage configurations tested:

| Label | Storage | Replicas | Disk |
|---|---|---|---|
| CNPG Local SSD | Direct-attached Samsung SM863a (SAS) | — | Dedicated |
| CNPG Longhorn 1R | Longhorn distributed storage | 1 | Dedicated |
| CNPG Longhorn 2R | Longhorn distributed storage | 2 | Dedicated |
| CNPG Longhorn 2R+shared | Longhorn distributed storage | 2 | Shared with OS/worker |
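
The Longhorn variants differ only in which StorageClass backs the PVC. A minimal sketch of the two-replica class; the class name is a placeholder, and the one-replica variant is identical apart from numberOfReplicas.

```bash
# Longhorn StorageClass sketch for the 2-replica runs (class name is a placeholder).
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2r
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"        # "1" for the single-replica runs
  staleReplicaTimeout: "30"
EOF
```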

Results: AWS RDS Standard (t3.large)

| Clients | RO TPS | RO Latency | RW TPS | RW Latency | TPC-B TPS | TPC-B Latency |
|---|---|---|---|---|---|---|
| 1 | 1,677 | 0.60 ms | 253 | 3.95 ms | 178 | 5.63 ms |
| 10 | 13,955 | 0.72 ms | 1,881 | 5.32 ms | 1,460 | 6.85 ms |
| 25 | 12,859 | 1.94 ms | 2,839 | 8.80 ms | 1,864 | 13.41 ms |
| 50 | 10,397 | 4.81 ms | 2,620 | 19.09 ms | 1,646 | 30.37 ms |
| 100 | 10,627 | 9.41 ms | 2,585 | 38.68 ms | 1,623 | 61.61 ms |

Results: AWS Aurora IO-Optimized (t3.large)

| Clients | RO TPS | RO Latency | RW TPS | RW Latency | TPC-B TPS | TPC-B Latency |
|---|---|---|---|---|---|---|
| 1 | 2,607 | 0.38 ms | 285 | 3.51 ms | 218 | 4.58 ms |
| 10 | 10,928 | 0.92 ms | 984 | 10.16 ms | 739 | 13.53 ms |
| 25 | 9,265 | 2.70 ms | 1,278 | 19.57 ms | 880 | 28.42 ms |
| 50 | 8,163 | 6.12 ms | 1,472 | 33.96 ms | 990 | 50.49 ms |
| 100 | 7,783 | 12.85 ms | 1,623 | 61.63 ms | 1,027 | 97.41 ms |

Results: AWS Aurora Standard (t3.large)

| Clients | RO TPS | RO Latency | RW TPS | RW Latency | TPC-B TPS | TPC-B Latency |
|---|---|---|---|---|---|---|
| 1 | 1,540 | 0.65 ms | 191 | 5.23 ms | 150 | 6.66 ms |
| 10 | 10,020 | 1.00 ms | 922 | 10.85 ms | 690 | 14.48 ms |
| 25 | 9,189 | 2.72 ms | 1,179 | 21.20 ms | 800 | 31.23 ms |
| 50 | 8,014 | 6.24 ms | 1,384 | 36.13 ms | 897 | 55.77 ms |
| 100 | 7,665 | 13.05 ms | 1,557 | 64.22 ms | 970 | 103.10 ms |

Results: CNPG Local SSD

| Clients | RO TPS | RO Latency | RW TPS | RW Latency | TPC-B TPS | TPC-B Latency |
|---|---|---|---|---|---|---|
| 1 | 749 | 1.34 ms | 134 | 7.48 ms | 99 | 10.10 ms |
| 10 | 7,675 | 1.30 ms | 1,425 | 7.02 ms | 1,031 | 9.70 ms |
| 25 | 6,788 | 3.68 ms | 1,560 | 16.02 ms | 1,073 | 23.30 ms |
| 50 | 6,430 | 7.78 ms | 1,550 | 32.27 ms | 996 | 50.18 ms |
| 100 | 6,092 | 16.41 ms | 1,464 | 68.32 ms | 902 | 110.92 ms |

Results: CNPG Longhorn 1 Replica (Dedicated Disk)

| Clients | RO TPS | RO Latency | RW TPS | RW Latency | TPC-B TPS | TPC-B Latency |
|---|---|---|---|---|---|---|
| 1 | 754 | 1.33 ms | 119 | 8.43 ms | 90 | 11.12 ms |
| 10 | 7,713 | 1.30 ms | 940 | 10.64 ms | 748 | 13.37 ms |
| 25 | 7,311 | 3.42 ms | 1,254 | 19.93 ms | 1,015 | 24.64 ms |
| 50 | 6,587 | 7.59 ms | 1,384 | 36.12 ms | 1,064 | 46.98 ms |
| 100 | 6,109 | 16.37 ms | 1,453 | 68.83 ms | 1,009 | 99.14 ms |

Results: CNPG Longhorn 2 Replicas (Dedicated Disk)

| Clients | RO TPS | RO Latency | RW TPS | RW Latency | TPC-B TPS | TPC-B Latency |
|---|---|---|---|---|---|---|
| 1 | 752 | 1.33 ms | 103 | 9.71 ms | 81 | 12.39 ms |
| 10 | 7,655 | 1.31 ms | 712 | 14.04 ms | 607 | 16.47 ms |
| 25 | 7,379 | 3.39 ms | 996 | 25.11 ms | 835 | 29.93 ms |
| 50 | 6,699 | 7.46 ms | 1,144 | 43.71 ms | 908 | 55.07 ms |
| 100 | 6,063 | 16.49 ms | 1,269 | 78.78 ms | 852 | 117.39 ms |

Results: CNPG Longhorn 2 Replicas (Shared Disk)

| Clients | RO TPS | RO Latency | RW TPS | RW Latency | TPC-B TPS | TPC-B Latency |
|---|---|---|---|---|---|---|
| 1 | 741 | 1.35 ms | 97 | 10.32 ms | 76 | 13.08 ms |
| 10 | 8,399 | 1.19 ms | 681 | 14.69 ms | 598 | 16.72 ms |
| 25 | 8,279 | 3.02 ms | 957 | 26.11 ms | 802 | 31.17 ms |
| 50 | 7,406 | 6.75 ms | 1,081 | 46.27 ms | 873 | 57.29 ms |
| 100 | 6,697 | 14.93 ms | 1,206 | 82.89 ms | 829 | 120.67 ms |

The Combined Average Summary

Averaged across all 13 client/thread combinations tested per workload type (the per-environment tables above show five of them):

| Config | Avg RO TPS | Avg RW TPS | Avg RW Latency | Overall Avg TPS |
|---|---|---|---|---|
| AWS RDS Standard | 10,724 | 2,250 | 17.30 ms | 4,826 |
| AWS Aurora IO-Optimized | 8,370 | 1,234 | 29.72 ms | 3,480 |
| AWS Aurora Standard | 8,039 | 1,162 | 31.45 ms | 3,326 |
| CNPG Local SSD | 6,111 | 1,355 | 30.02 ms | 2,796 |
| CNPG Longhorn 1R | 6,376 | 1,152 | 32.46 ms | 2,797 |
| CNPG Longhorn 2R | 6,356 | 935 | 39.35 ms | 2,671 |
| CNPG Longhorn 2R+shared | 7,052 | 892 | 40.95 ms | 2,885 |

What The Data Tells Us

Finding 1: Overall average is misleading for storage comparison.
Overall avg TPS (RO + RW + TPC-B combined) shows all bare metal configs at ~2,670–2,885, nearly identical. That's because read-only TPS is several times higher than write TPS, so it dominates the combined average, and read performance is unaffected by the storage backend. Always disaggregate by workload type.

Finding 2: Read performance — storage backend is irrelevant.
All four bare metal configs produce 6,111–7,052 Avg RO TPS; the variation is within normal test noise. Both Aurora variants out-read every bare metal config (8,039–8,370) thanks to Aurora's distributed, read-optimized storage layer.

Finding 3: Write performance — Local SSD wins clearly.
Local SSD delivers 1,355 Avg RW TPS versus 935 for Longhorn 2R (892 on the shared disk), a 45–52% advantage. Every Longhorn replica adds ~3–4 ms of write latency (the network round-trip for replication acknowledgment); a quick way to sanity-check that decomposition is sketched below.
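
One rough way to confirm that the penalty is network rather than disk is to compare raw fsync latency on the node's SSD with the round-trip time between Longhorn nodes. A sketch, assuming the default /var/lib/longhorn data path and with the peer node IP as a placeholder:

```bash
# Raw fsync cost on the underlying SSD (run on the storage node; the path is
# the default Longhorn data directory and is an assumption, adjust as needed).
pg_test_fsync -f /var/lib/longhorn/fsync-probe -s 5

# Network round-trip paid once per extra replica acknowledgment
# (10.0.0.2 is a placeholder for the other Longhorn node's IP).
ping -c 20 10.0.0.2
```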

Finding 4: Dedicated vs shared disk makes almost no difference.
Longhorn 2R dedicated (935 Avg RW TPS) vs Longhorn 2R shared (892) — only 4.6% difference. The bottleneck is network replication, not disk I/O contention.

Finding 5: Bare metal Local SSD write TPS beats Aurora IO-Optimized.
Local SSD Avg RW TPS (1,355) vs Aurora IO-Optimized (1,234) — +9.8% advantage at baseline, before any CPU or kernel tuning. Aurora's write path pays network replication overhead just like Longhorn.

Finding 6: RDS Standard leads overall — but it's burstable.
RDS Standard's 4,826 overall avg and 2,250 Avg RW TPS come from t3 CPU burst credits. Once credits are exhausted in sustained workloads, performance drops significantly.
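
A simple way to confirm the burst-credit effect during a sustained run is to watch the instance's CPUCreditBalance metric in CloudWatch. The DB identifier below is a placeholder and the date arithmetic assumes GNU date:

```bash
# Watch t3 burst credits drain during a long pgbench run.
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUCreditBalance \
  --dimensions Name=DBInstanceIdentifier,Value=pg-bench-rds \
  --statistics Average \
  --period 300 \
  --start-time "$(date -u -d '2 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --output table
```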


Recommendations

| Workload | Recommendation |
|---|---|
| Write-intensive OLTP | Local SSD — 45–52% higher write TPS vs Longhorn 2R |
| Read-heavy (API, reporting) | Longhorn is fine — zero read overhead vs local SSD |
| HA with block-level durability | Longhorn 2R — accept the write penalty, gain replication |
| Best write + HA | Local SSD + CNPG streaming replication — no storage network in write path (see the sketch below) |
| Managed simplicity | Aurora Standard — competitive write TPS, no ops overhead |
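
For the "best write + HA" row, the idea is to replicate at the PostgreSQL level with CNPG instead of at the block level with Longhorn, so the write path stays on local SSD and replication happens at the database layer. A minimal sketch, reusing the hypothetical names from the earlier manifest:

```bash
# CNPG streaming replication on local SSD (names are placeholders, as above).
kubectl apply -f - <<'EOF'
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-bench-ha
spec:
  instances: 2                  # one primary plus one streaming replica
  storage:
    storageClass: local-ssd     # each instance gets its own node-local volume
    size: 50Gi
  affinity:
    enablePodAntiAffinity: true
    podAntiAffinityType: required   # keep primary and replica on different nodes
EOF
```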

**→ Part 2:** CPU/NUMA pinning + HugePages — pushing bare metal write performance even further past Aurora.


Environment Details

  • CloudNativePG: v1.24 on Kubernetes 1.31
  • Host: Bare Metal 32-Core (16 Physical / 16 HT), NUMA-Aware
  • Storage: Samsung SM863a Enterprise SSD RAID 1 (SAS Interface)
  • PostgreSQL config: Intentionally matched to AWS t3.large defaults for fair comparison
  • Deployment: Single instance — no HA, no read replicas, no connection pooling
  • AWS Region: ap-southeast-3 (Indonesia)
  • Scale Factor: 100 (~10M rows, ~1.5 GB table)
  • Benchmark runner: Kubernetes-native pgbench Job — source on GitHub

— Iwan Setiawan, Hybrid Cloud & Platform Architect · portfolio.kangservice.cloud
