I Couldn't Build a Local LLM PC for $1,300 — Budget Tiers and the VRAM Cliffs Between Them
You want to run LLMs locally. But "which GPU should I buy?" has no decent answer. Gaming benchmarks are everywhere. "How many billion parameters fit in this much VRAM?" — almost nowhere.
I started at $3,500, then cut to $2,000, $1,700, $1,300. Three breaking points appeared.
Scope: New parts only, NVIDIA GPUs. Used cards (RTX 3090), AMD GPUs (RX 7900 XTX), and Apple Silicon are valid alternatives, but each introduces warranty, software compatibility, or availability trade-offs that deserve their own articles. US street pricing as of early 2026.
The Premise: VRAM Decides Everything
Local LLM inference doesn't behave like gaming fps. Whether the model fits entirely in VRAM creates a discontinuous jump in performance.
CUDA core count and clock speed are secondary. If the full model sits in VRAM, inference is fast. If it spills to system RAM, the CPU-bound layers become the bottleneck and speed drops by an order of magnitude.
Measured on an RTX 4060 8GB: a 9B model with all layers on GPU runs at 33 t/s. A 27B model loads only 24 of 58 layers onto the GPU — 3.6 t/s. Same GPU, nearly 10x difference. That's the VRAM cliff.
The "fits or doesn't fit" boundary is the cliff, and cutting budget pushes you straight into it.
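You can estimate which side of the cliff a model lands on before buying anything. A minimal sketch, assuming Q4_K_M averages ~0.62 bytes per parameter (back-solved from the ~16.7GB file size of a 27B Q4_K_M model mentioned below) and ~1.5GB of VRAM headroom for KV cache and runtime overhead; the function names and both constants are my assumptions, not measurements:

```python
# Rough "does it fit?" check for Q4_K_M quantized models.
# ASSUMPTIONS: ~0.62 bytes/param average for Q4_K_M (derived from
# ~16.7 GB for a 27B model), plus ~1.5 GB KV cache / runtime headroom.
BYTES_PER_PARAM_Q4_K_M = 0.62
OVERHEAD_GB = 1.5

def model_file_gb(params_billion: float) -> float:
    """Approximate Q4_K_M GGUF file size in GB."""
    return params_billion * BYTES_PER_PARAM_Q4_K_M

def fits_in_vram(params_billion: float, vram_gb: float) -> bool:
    """True if the full model plus overhead should sit on the GPU."""
    return model_file_gb(params_billion) + OVERHEAD_GB <= vram_gb

for size, vram in [(9, 8), (27, 16), (27, 24)]:
    side = "all layers on GPU" if fits_in_vram(size, vram) else "partial offload"
    print(f"{size}B on {vram}GB VRAM -> {side}")
```

This reproduces the pattern above: 9B fits on 8GB (fast side of the cliff), 27B overflows even 16GB (partial offload), and only 24GB+ holds 27B entirely.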
$3,500: You Still Have to Choose
The RTX 5090 has 32GB GDDR7. Dream-tier VRAM for local LLM work. But the card alone is $2,000 MSRP, with street prices hitting $2,500–3,000.
[5090 Route: GPU-First]
GPU: RTX 5090 32GB ~$2,700 (street price)
CPU: Ryzen 5 9600X ~$200 ← budget consumed by GPU
RAM: DDR5 64GB ~$120
M/B: B650 ~$150
SSD: 1TB NVMe Gen4 ~$60
PSU: 1000W 80+ Gold ~$140
Case ~$80
──────────────────────────
Total ~$3,450
At $3,500 the 5090 build barely fits — and everything around the GPU is squeezed thin. You get 32GB VRAM, but the rest of the system is entry-level.
[5080 Route: Balanced]
GPU: RTX 5080 16GB ~$1,000
CPU: Ryzen 7 9800X3D ~$400
RAM: DDR5 128GB (32GBx4) ~$250
M/B: B650E ~$180
SSD: 2TB NVMe Gen4 ~$120
PSU: 850W 80+ Gold ~$110
Case ~$90
──────────────────────────
Total ~$2,150 (well under budget)
The 5080 route gives 16GB VRAM with headroom everywhere else. A 27B model at Q4_K_M (~16.7GB file size) won't fit entirely, but the majority of layers land on GPU. Compared to 8GB partial offload (3.6 t/s), you're looking at an estimated 15–25 t/s. Different world.
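A rough way to see how many layers land on GPU, in the spirit of llama.cpp's GPU-layer offload setting: assume layers are roughly uniform in size and reserve ~1.5GB of VRAM for KV cache and overhead. The function name and the overhead figure are my assumptions; the 58-layer count comes from the 8GB measurement above:

```python
# Rough guide to how many transformer layers fit on the GPU,
# as with llama.cpp's GPU-layer offload count.
# ASSUMPTIONS: layers are uniform in size; ~1.5 GB VRAM reserved
# for KV cache and runtime overhead.
def gpu_layers(file_gb: float, n_layers: int, vram_gb: float,
               overhead_gb: float = 1.5) -> int:
    per_layer_gb = file_gb / n_layers
    budget = max(vram_gb - overhead_gb, 0.0)
    return min(n_layers, int(budget / per_layer_gb))

# 27B at Q4_K_M: ~16.7 GB file, 58 layers (per the 8GB measurement)
print(gpu_layers(16.7, 58, 8))    # low twenties on an 8GB card
print(gpu_layers(16.7, 58, 16))   # the large majority on a 16GB card
```

The estimate lands near the measured 24-of-58 layers on 8GB, and shows why 16GB puts most of a 27B model on GPU rather than all of it.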
128GB RAM matters for MoE models. Qwen3.5-35B-A3B consumes over 30GB of system RAM for inactive expert weights in real measurements. 128GB handles large MoE without breaking a sweat.
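The reason MoE tolerates living in system RAM: each generated token only reads the active experts, not all 35B parameters. A sketch assuming the same ~0.62 bytes/param Q4_K_M figure and reading "A3B" as ~3B active parameters; note the measured 30GB+ RAM footprint exceeds the raw weight size because of runtime buffers and context:

```python
# Why MoE works from system RAM: per-token weight reads touch only
# the active experts. ASSUMPTIONS: ~0.62 bytes/param at Q4_K_M;
# "A3B" read as ~3B active parameters per token.
total_b, active_b = 35, 3          # Qwen3.5-35B-A3B
bytes_per_param = 0.62

total_gb = total_b * bytes_per_param
active_gb = active_b * bytes_per_param
print(f"weights resident in RAM/VRAM: ~{total_gb:.0f} GB")
print(f"read per generated token:     ~{active_gb:.1f} GB "
      f"({active_b / total_b:.0%} of total)")
```

Less than a tenth of the weights move per token, which is why capacity (can you hold it all?) matters more here than raw bandwidth.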
The decision at $3,500: go all-in on 5090, or take 5080 and balance everything? The 32GB VRAM advantage only matters when you're running 70B+ models seriously. For 27B-class and below, 16GB is enough to fight with. I'd pick the 5080 route.
$2,000: The Budget Where Decisions Get Interesting
Dropping from $3,500 to $2,000 makes the "GPU decides everything" structure even more visible.
GPU: RTX 5070 Ti 16GB ~$750
CPU: Ryzen 7 9700X ~$280
RAM: DDR5 96GB (48GBx2) ~$200
M/B: B650 ~$150
SSD: 1TB NVMe Gen4 ~$60
PSU: 750W 80+ Gold ~$90
Case ~$60
──────────────────────────
Total ~$1,590
The 5070 Ti holds 16GB VRAM while keeping the CPU and RAM at respectable levels. The remaining $400 can go to a better CPU cooler, SSD expansion, or savings.
There's a fork here: should you stretch to the RTX 5080 at ~$1,000? The 5080 and 5070 Ti share the same 16GB VRAM; the difference is CUDA core count and memory bandwidth. With VRAM equal, the measured LLM inference gap is roughly 10–20%, tracking the bandwidth difference.
I'd pick the 5070 Ti and put the $250 savings into RAM. MoE model inactive experts live in system RAM, so there are real scenarios where RAM capacity affects the experience more than a 10% GPU speed bump.
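Why bandwidth sets only a modest gap: token generation is memory-bandwidth-bound, so the theoretical ceiling is roughly bandwidth divided by the bytes read per token. A sketch with assumed bandwidth figures (not measurements) for the two cards; the ceiling also assumes the model fully fits in VRAM, which a 27B Q4_K_M does not on 16GB, so real speeds sit well below it:

```python
# Bandwidth-bound decode ceiling: each generated token reads the
# on-GPU weights roughly once, so ceiling ~= bandwidth / model size.
# ASSUMPTIONS: bandwidth figures are approximate spec values, and the
# whole model is in VRAM (optimistic for 27B on 16GB).
def tps_ceiling(model_gb: float, bandwidth_gbs: float) -> float:
    return bandwidth_gbs / model_gb

model_gb = 16.7  # 27B at Q4_K_M
for name, bw in [("RTX 5070 Ti (~896 GB/s, assumed)", 896),
                 ("RTX 5080   (~960 GB/s, assumed)", 960)]:
    print(f"{name}: ceiling ~{tps_ceiling(model_gb, bw):.0f} t/s")
```

Under these assumptions the 5080's ceiling is under 10% higher, which is why spending the difference on RAM capacity can matter more than the GPU speed bump.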
$2,000 is the "comfort line" for a local LLM build. 16GB VRAM + 96GB RAM + a solid CPU. Below this, you start sacrificing something important.
$1,700: The Cliff's Edge
Cutting just $300 from $2,000 changes the build's stability dramatically.
Keeping the 5070 Ti
GPU: RTX 5070 Ti 16GB ~$750
CPU: Ryzen 5 7600 ~$160 ← dropped two generations
RAM: DDR5 64GB ~$120 ← down from 96GB
M/B: B650 ~$150
SSD: 1TB NVMe ~$60
PSU: 750W ~$90
Case ~$60
──────────────────────────
Total ~$1,390
Everything except the GPU is compromised. The CPU drops two generations. RAM shrinks from 96GB to 64GB. Running Qwen3.5-35B-A3B MoE required 30GB+ of RAM in measurements, so at 64GB, coexistence with other applications gets tight.
Switching to the 5070 (12GB)
GPU: RTX 5070 12GB ~$550
CPU: Ryzen 5 9600X ~$200
RAM: DDR5 64GB ~$120
M/B + SSD + PSU + Case ~$340
──────────────────────────
Total ~$1,210
The balance is better. But 12GB VRAM occupies an awkward position.
12GB VRAM: Improvement, Not Transformation
Mapping "what fits" by VRAM tier makes 12GB's position clear.
| VRAM | Models (Q4_K_M) | Est. Speed | New Capability |
|---|---|---|---|
| 8GB | 7B–9B, all layers | 25–35 t/s | Entry line |
| 12GB | 7B–9B comfortably; 14B full | 14B: 15–25 t/s | 14B added |
| 16GB | 27B mostly on GPU | 27B: 10–20 t/s | 27B in range ★ |
| 24GB+ | 27B full; 70B partial | 27B: 20–30 t/s | 70B realistic |
Going from 8GB to 12GB unlocks the 14B class (Qwen3 14B, Phi-4 14B, etc.) with all layers on GPU. 14B is a genuine quality step up from 9B, especially in code generation and reasoning tasks. The improvement is real.
But the 12GB-to-16GB jump is bigger. The quality gap between 9B and 27B is significantly larger than between 9B and 14B. Code generation accuracy, long-context coherence, complex instruction following — there are many tasks where 27B is the first size that feels "usable."
Same 4GB increment, different returns:
- 8GB → 12GB: Incremental improvement (14B becomes available. Noticeable, not transformative)
- 12GB → 16GB: Step-function leap (27B class comes into range. The experience changes)
12GB isn't worthless. But if you're choosing where to invest 4GB under a budget constraint, the return on 12GB→16GB is overwhelmingly larger.
$1,300: The Budget Where I Couldn't Build
Trying to hold 16GB VRAM at $1,300:
RTX 5070 Ti 16GB ($750) → $550 left for the entire PC
- CPU: bottom-tier
- RAM: 32GB (not even an upgrade over many laptops)
- SSD: 500GB (fills up with 2–3 models)
Everything except the GPU drops below a current mid-range laptop. That's not "building a desktop" — that's a GPU in a cardboard box.
Dropping to 12GB VRAM does allow a balanced $1,300 build. As a platform for 14B models, it's not bad. But the leap from an 8GB environment is limited, and as the payoff for building a desktop PC, it feels thin.
Spending $1,300 on a 12GB VRAM desktop versus spending $1,300 on a laptop with an RTX 4060 8GB — which gives a richer local LLM experience? Honestly, it's a close call.
Three Cliffs Revealed by Budget
| Cliff | Range | What Happens |
|---|---|---|
| VRAM cliff | 12GB → 16GB | 27B class comes into range. Same 4GB, nonlinear returns |
| Build cliff | $2,000 → $1,700 | Everything except GPU collapses. $300 changes the entire picture |
| Viability cliff | $1,700 → $1,300 | Holding 16GB becomes impossible. Even 12GB has questionable ROI |
Bottom line: $1,700 is the floor for building a local LLM PC. At this price point you can still get an RTX 5070 Ti 16GB with a reasonable CPU and RAM. Below that, you can technically build something, but the value proposition breaks down.
If you can hit $1,700, stretching to $2,000 is worth it. RAM goes from 64GB to 96GB, and MoE model operation becomes practical.
References
- Running Qwen2.5-32B on RTX 4060 8GB — Beating M4 at 10.8 t/s with llama.cpp — The limits and possibilities of 8GB VRAM
- MoE Beat Dense 27B by 2.4x on 8GB VRAM — The 35B-A3B Benchmark Nobody Expected — Measured RAM consumption of MoE models
- Parameter Count Is the Worst Way to Pick a Model on 8GB VRAM — Measured data on the VRAM cliff