The $400/Month Surprise
I ran the same BERT model on a T4 GPU and a 4-core CPU for a month. The GPU was faster, obviously. But it cost $400 more than the CPU setup, which handled 95% of requests under our 200ms SLA just fine.
Most benchmarks compare raw throughput or single-request latency. They skip the part where you actually pick hardware for a production system with a budget, an SLA, and real traffic patterns. This post runs five realistic scenarios — from a personal side project to a high-traffic API — and shows when GPUs pay for themselves and when you're just burning money.
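The $400 gap follows directly from the on-demand rate quoted below. A minimal sketch of the monthly math (the GPU rate is from the post; any CPU rate plugged in is an assumption, not a measured figure):

```python
HOURS_PER_MONTH = 730  # average hours in a month (24 * 365 / 12)

# g4dn.xlarge on-demand rate from the test setup below
GPU_RATE_PER_HOUR = 0.526

gpu_monthly = GPU_RATE_PER_HOUR * HOURS_PER_MONTH
print(f"GPU: ${gpu_monthly:.0f}/month")  # ≈ $384/month, always-on

# Break-even: the GPU pays off only if the CPU alternative costs
# at least this much more to hit the same SLA (e.g. by scaling out).
def monthly_cost(rate_per_hour, instances=1):
    """Always-on cost for `instances` machines at a given hourly rate."""
    return rate_per_hour * HOURS_PER_MONTH * instances
```

The comparison only makes sense per-SLA: an always-on GPU is a fixed ~$384/month regardless of traffic, so the question is how many CPU instances (at whatever rate your provider charges) you'd need to stay under your latency target.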
Test Setup: Models, Hardware, Traffic
I tested three model sizes across two compute tiers:
Models:
- BERT-base (110M params): text classification, sequence length 128
- ResNet50 (25M params): image classification, 224×224 input
- Whisper-tiny (39M params): speech-to-text, 30s audio clips
Hardware:
- GPU: AWS g4dn.xlarge (NVIDIA T4, 16GB VRAM, 4 vCPUs) — $0.526/hour on-demand
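Since the pass/fail criterion throughout is "95% of requests under 200ms," the measurement that matters is tail latency, not mean throughput. A minimal harness for that check (the `infer` callable is a stand-in for whichever model is under test; the nearest-rank p95 is one common convention among several):

```python
import time

SLA_MS = 200  # the post's latency target

def measure_p95(infer, n_requests=200):
    """Time n_requests sequential calls; return nearest-rank p95 latency in ms."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        infer()
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    return latencies[int(0.95 * len(latencies)) - 1]

# Example with a dummy workload standing in for a real model call
p95 = measure_p95(lambda: time.sleep(0.001))
print(f"p95 = {p95:.1f} ms, within SLA: {p95 <= SLA_MS}")
```

Sequential timing like this understates what happens under concurrent load, so treat it as a floor: if single-request p95 already misses the SLA, no amount of scaling fixes it on that hardware.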