DEV Community

Iwan Setiawan
Bare Metal vs. AWS RDS: A Deep Dive into NUMA-Aware Tuning and PostgreSQL Performance

A real-world comparison across 7 environments, same workload, same hardware class, zero guesswork.


When a client asked whether self-managed PostgreSQL on bare metal Kubernetes could replace their AWS RDS setup, I didn't want to answer based on intuition. So we ran the benchmarks ourselves — methodically, across seven environments, with the same scale factor and the same hardware class throughout.

This article documents what we found, including a tuning discovery that cut baseline latency by 56% without changing a single piece of hardware.


The Setup: 32-Core NUMA-Aware Beast

To keep the comparison fair, all environments (Bare Metal vs Cloud) were allocated 2 vCPU / 8 GB RAM.

However, there is a fundamental difference in the underlying "Metal." Our bare metal node is a powerhouse:

  • 32 logical cores (16 physical cores with Hyper-Threading)
  • NUMA-Aware Architecture
  • Local SSD SAS Storage

Unlike Cloud VMs (t3.large) that run on a hypervisor with "noisy neighbors," our Bare Metal allows the PostgreSQL process to talk directly to physical cores and dedicated memory banks. This is why "2 vCPU" on Bare Metal is not the same as "2 vCPU" in the Cloud.
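On Linux you can inspect that topology yourself. A small sketch using the standard lscpu and numactl tools (the pinning command is printed rather than executed, and the data directory path is purely illustrative):

```shell
# How many NUMA nodes does this host expose, and which CPUs sit on each?
lscpu | grep -i '^NUMA' || true   # e.g. "NUMA node(s): 2"

# Pinning a PostgreSQL server to node 0's CPUs and memory banks would
# look like this (printed only; /var/lib/postgresql/data is illustrative):
echo "numactl --cpunodebind=0 --membind=0 postgres -D /var/lib/postgresql/data"
```

Keeping the process and its memory on one node avoids cross-node memory access, which is the main latency penalty NUMA-aware tuning tries to eliminate.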

| Label | Description |
| --- | --- |
| CNPG Local ① | CloudNativePG, local-path storage, default tuning |
| CNPG Local ② | Same cluster, work_mem + connection tuning |
| CNPG Local ③ | Same cluster, shared_buffers maximized |
| CNPG Longhorn | CloudNativePG, Longhorn distributed storage |
| RDS Standard | AWS RDS PostgreSQL 17.6, t3.large |
| Aurora IO-Opt | AWS Aurora PostgreSQL 17.4, IO-Optimized |
| Aurora Standard | AWS Aurora PostgreSQL 17.4, Standard |

1. The Benchmark Results (Average TPS)

After multiple iterations across the full matrix, here is the final average performance leaderboard:

| Environment | Avg TPS (Read-Write) | Performance Status |
| --- | --- | --- |
| AWS RDS (t3.large) | 4,826.39 | Peak Performance (Burstable) |
| AWS Aurora (I/O Prov.) | 3,480.31 | High Performance (Expensive) |
| CNPG + Tuning ③ | 3,350.58 | The Efficiency King (Bare Metal) |
| AWS Aurora Standard | 3,325.89 | AWS Standard Tier |
| CNPG + Tuning ② | 3,214.43 | High Performance |
| CNPG + Longhorn | 1,654.84 | Distributed Storage Overhead |

2. Peak Read-Only TPS (Maximum Reads)

Maximum read throughput achieved across all client/thread combinations:

| Environment | Peak Read TPS | Best Config (clients / threads) |
| --- | --- | --- |
| RDS Standard | 13,955 | 10c / 4t |
| Aurora IO-Optimized | 10,928 | 10c / 1t |
| Aurora Standard | 10,020 | 10c / 1t |
| CNPG Local ③ | 8,325 | 10c / 1t |
| CNPG Local ② | 8,065 | 10c / 1t |
| CNPG Longhorn | 6,165 | 10c / 8t |
| CNPG Local ① | 4,758 | 10c / 8t |

RDS Standard leads on raw reads, but the 75% gap between CNPG ① and ③ (on the same physical hardware) proves that configuration is just as critical as the underlying metal.


3. Peak Read-Write TPS (Maximum Writes)

| Environment | Peak Write TPS | Best Config (clients / threads) |
| --- | --- | --- |
| RDS Standard | 2,839 | 25c / 8t |
| CNPG Local ③ | 2,539 | 25c / 1t |
| Aurora IO-Opt | 1,622 | 100c / 1t |
| Aurora Standard | 1,557 | 100c / 8t |
| CNPG Longhorn | 1,318 | 25c / 8t |
| CNPG Local ① | 1,254 | 25c / 1t |

Result: Tuned bare metal beats both Aurora variants. Aurora's write path involves distributed storage replication, which adds latency. Our NUMA-aware physical cores combined with local SSD SAS proved superior for raw write throughput.


4. The Tuning Discovery That Changed Everything

CNPG Local ① started with poor performance: 1,116 TPS (0.896 ms latency).

Step 1: Solving Memory Contention (Tuning ②)

We reduced work_mem (32MB -> 4MB) and max_connections (800 -> 200).
The math: 800 connections × 32MB = 25.6GB of potential RAM pressure on an 8GB node, which caused constant swapping. Capping connections stabilized the system.
Result: 2,402 TPS (a 115% improvement over the baseline).
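That worst-case arithmetic is easy to sanity-check. A minimal sketch of the before/after memory envelope (the figures are from our config above; real usage is lower when idle connections aren't sorting, and can be higher because a single complex query may allocate work_mem more than once):

```shell
# Worst case if every connection allocates work_mem once.
connections=800; work_mem_mb=32
echo "before: $(( connections * work_mem_mb )) MB potential on an 8192 MB node"

connections=200; work_mem_mb=4
echo "after:  $(( connections * work_mem_mb )) MB potential"
```

The "before" envelope is roughly 3x the node's physical RAM, which is exactly the recipe for the constant swapping we observed.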

Step 2: Maximizing Buffer Pool (Tuning ③)

We increased shared_buffers to 6GB and enabled HugePages, leveraging the host's direct memory access and NUMA affinity.
Result: Latency dropped to 0.394 ms.
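The HugePages sizing behind Tuning ③ is worth spelling out. A hedged sketch assuming the common 2 MB huge page size on x86_64; on PostgreSQL 15+ the authoritative count comes from `postgres -C shared_memory_size_in_huge_pages`, so treat this arithmetic as a first estimate only:

```shell
# 6 GB of shared_buffers on 2 MB huge pages, plus ~5% headroom for
# PostgreSQL's other shared memory segments. Illustrative numbers.
shared_buffers_mb=6144; hugepage_mb=2
pages=$(( shared_buffers_mb / hugepage_mb ))      # 3072
pages_with_headroom=$(( pages + pages / 20 ))     # 3225
echo "vm.nr_hugepages = $pages_with_headroom"
# Apply as root, then set huge_pages = on in postgresql.conf (illustrative):
# sysctl -w vm.nr_hugepages=$pages_with_headroom
```

Fewer, larger pages mean fewer TLB misses on a 6 GB buffer pool, which is where much of the latency drop comes from.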

① 0.896 ms ████████████████████████
② 0.416 ms ███████████
③ 0.394 ms ██████████

56% latency reduction. Same hardware.


5. Final Verdict: Performance vs. Cost Efficiency

"While RDS Standard showed higher peak TPS due to the burstable nature of the t3.large instance, it comes with a significantly higher price tag and unpredictable long-term performance once CPU credits are exhausted. For consistent production workloads where cost-efficiency is a priority, CNPG on Bare Metal is the clear winner—delivering 70% of the performance at 1/10th of the cost."

The Burstable CPU Trap

AWS t3 instances rely on a credit-based CPU system: once the credits are exhausted, performance throttles down to the baseline. Our 32-core bare metal node provides dedicated resources: no CPU credits, no "noisy neighbors." You get 100% of the CPU, 100% of the time.

The ROI Factor

Our bare metal host has 32 cores and 512GB of RAM. On AWS, an instance with 512GB of RAM would cost a fortune. On bare metal, we can host dozens of high-performance clusters on this one machine for a flat rate of ~Rp 7M/mo.


Key Takeaways

  1. Storage is Everything: Our Longhorn tests showed a massive performance drop. For databases, local SAS SSD storage (properly tuned) is non-negotiable.
  2. The "Cloud Tax" is Real: You are paying a massive premium for the "Managed" label. With CNPG, managing Postgres on K8s becomes viable and much cheaper.
  3. Know Your Workload: If you have unpredictable spikes, RDS burst capacity is great. But for a stable, high-traffic system, Bare Metal is the foundation.
  4. Default Config is the Enemy: The interaction between work_mem and max_connections is non-linear. Never trust the defaults.
  5. Bare Metal is Not Inherently Slower: Properly tuned, we achieved 69% of RDS's average read-write throughput (and 89% of its peak write TPS) on owned hardware.
  6. Aurora's Write Path Overhead: Distributed storage optimizes for durability and read scale-out, not raw single-node write speed.
  7. Storage Choice = Config Choice: Longhorn vs. local-path is a 33-50% difference. The storage backend deserves as much attention as the DB config.

Environment Details

  • CloudNativePG: v1.24 on K8s 1.31
  • Host: Bare Metal 32-Core (16 Physical/16 HT), NUMA-Aware
  • Storage: 2x Samsung SM863a Enterprise SSD in RAID 1 (SAS Interface). Even without NVMe, proper tuning achieved 100% of Aurora Standard's performance.
  • AWS Region: ap-southeast-3 (Indonesia)
  • Scale Factor: 100 (10M Rows)
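For reproducibility: the scale factor and the client/thread combinations in the tables above map directly onto pgbench, PostgreSQL's bundled benchmark tool. A dry-run sketch that prints the commands instead of executing them (the database name, 60-second duration, and exact sweep are assumptions, not the verbatim harness):

```shell
# Initialize at scale factor 100 (~10M rows in pgbench_accounts):
echo "pgbench -i -s 100 benchdb"

# Sweep clients (-c) and threads (-j); -S selects the read-only script,
# while the default script is a TPC-B-like read-write mix.
for clients in 10 25 100; do
  for threads in 1 4 8; do
    echo "pgbench -c $clients -j $threads -T 60 benchdb"      # read-write
    echo "pgbench -c $clients -j $threads -T 60 -S benchdb"   # read-only
  done
done
```

A "10c / 4t" entry in the tables corresponds to `-c 10 -j 4` in this matrix.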

— Iwan Setiawan, Hybrid Cloud & Platform Architect · portfolio.kangservice.cloud
