Last week I took a hard look at what my MacBook Pro M4 Max is actually packing.
128GB of unified memory. A 40-core GPU. MLX and Metal running natively on Apple Silicon.
Then I loaded a 70B-parameter LLM and ran a QLoRA finetune on it. On a laptop. From my couch.
The Part Nobody Talks About
The H100 has 80GB of HBM3. The M4 Max has 128GB of unified memory. A 70B model at FP16 is roughly 140GB of weights, so it doesn't fit on a $40,000 datacenter GPU; at 8-bit it's about 70GB, which crowds the H100 once you add a KV cache but sits comfortably inside 128GB of unified memory.
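The arithmetic behind that claim, ignoring KV cache and activation overhead:

```python
# Back-of-the-envelope weight footprints for a 70B-parameter model.
params = 70e9
bytes_per_param = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

for name, b in bytes_per_param.items():
    print(f"{name}: {params * b / 1e9:.0f} GB of weights")

# fp16: 140 GB -> overflows both the 80 GB H100 and 128 GB of unified memory
# int8:  70 GB -> tight on an H100 once the KV cache is added, comfortable in 128 GB
# int4:  35 GB -> fits almost anywhere; this is the regime QLoRA works in
```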
The Math Nobody Does
| Setup | Cost |
|---|---|
| H100 cloud (on-demand, running 24/7) | 730 hrs x $3/hr = $2,190/month = $26,280/year |
| M4 Max MacBook Pro | $4,000 one-time |
Break-even: month 2. After that: pure savings.
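The break-even point, spelled out with the figures from the table above:

```python
# Months until the one-time laptop purchase beats the recurring cloud bill.
h100_hourly = 3.00         # $/hr, on-demand
hours_per_month = 730      # 24/7 usage
laptop_price = 4_000       # M4 Max MacBook Pro, one-time

monthly_cloud = h100_hourly * hours_per_month    # $2,190
breakeven_months = laptop_price / monthly_cloud  # ~1.8 months

print(f"Cloud: ${monthly_cloud:,.0f}/month -> break-even after {breakeven_months:.1f} months")
```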
Inference Performance
Token generation is memory-bandwidth-bound: every new token has to stream the model weights out of memory. The M4 Max's 546 GB/s of bandwidth gives me about 15 tok/s on a 4-bit 70B model, which is fast enough for most interactive use cases.
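Where that 15 tok/s figure comes from, as a rough ceiling that ignores KV-cache traffic and compute:

```python
# Rough decode-speed ceiling for a memory-bandwidth-bound model.
bandwidth_gbps = 546                 # M4 Max memory bandwidth, GB/s
model_size_gb = 70e9 * 0.5 / 1e9     # ~35 GB of 4-bit weights

# Each generated token streams (roughly) every weight once.
tokens_per_second = bandwidth_gbps / model_size_gb
print(f"Theoretical ceiling: ~{tokens_per_second:.0f} tok/s")  # ~15-16 tok/s
```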
The Real Shift
Three years ago, finetuning a 70B model required a cluster. Now it requires a laptop and an afternoon.
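A minimal sketch of what "a laptop and an afternoon" looks like, assuming the mlx-lm package is installed; the model repo name is illustrative and the exact keyword arguments vary across mlx-lm versions. (mlx-lm also ships a LoRA finetuning entry point, typically invoked as `python -m mlx_lm.lora --train`, which is the Apple Silicon route for the QLoRA-style run described above.)

```python
# Sketch: load a 4-bit 70B model with MLX and generate from it on Apple Silicon.
# The repo name below is illustrative; pick any 4-bit conversion from the mlx-community hub.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Meta-Llama-3-70B-Instruct-4bit")

prompt = "Explain unified memory in one paragraph."
text = generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(text)
```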
What's your current setup for ML work? Cloud or local?