This article was originally published on runaihome.com
TL;DR: Bloomberg reports Apple will skip the M6 Pro/Max chips and fast-track an "AI-focused" M7 line — but the home-lab-relevant tiers (M7 Max/Ultra) land at the end of 2027 at the earliest. Token generation is bandwidth-bound, not Neural-Engine-bound, so "AI-focused" branding won't translate into proportional tok/s gains. If you need a Mac for local AI, buy the M5 Max with max RAM now.
| | Buy M5 Max now | Wait for M7 | Used RTX 3090 tower |
|---|---|---|---|
| Availability | Now (shipping) | M7 Max ~end 2027 / Ultra 2028 | Now |
| Bandwidth | 614 GB/s (40-core) | TBD (M6 base only 153→200 GB/s) | 936 GB/s |
| Best for | Capacity (70B+ on 128GB) | Nobody waiting today | Speed/$ under 24GB |
| Entry price | ~$3,599 (MBP M5 Max 2TB) | Unknown | ~$1,070 used |
| The catch | Pay now, no AI discount | 18+ month wait, NPU ≠ tok/s | 24GB ceiling, 350W |
Honest take: The chip worth waiting for (M7 Max/Ultra) is 18+ months out, and "AI-focused" mostly means a faster Neural Engine that doesn't drive token generation. Buy the M5 Max now if you want a Mac; buy a used RTX 3090 if you want tokens-per-dollar and can live under 24GB.
What the report actually says
On June 25, 2026, Bloomberg's Mark Gurman reported that Apple plans to skip the high-end M6 Mac chips — no M6 Pro, no M6 Max — and jump straight to an AI-focused M7 generation. This would be the first time since the 2020 move to Apple Silicon that Apple hasn't shipped a Pro/Max variant of a chip generation.
The specifics, per the report:
- A base M6 (codenamed Komodo / H18G) still ships in 2026 for entry-level Macs. It improves memory bandwidth to ~200 GB/s, up from ~153 GB/s on the M5 base, with an updated memory architecture and an upgraded Neural Engine.
- The M7 line is designed primarily around on-device AI processing. Apple is reportedly fast-tracking technologies it originally planned for later.
- Timeline: base M7 as early as the first half of 2027, M7 Pro and M7 Max as early as the end of 2027, and the M7 Ultra in 2028.
That last bullet is the whole story for home-lab buyers. The chips that matter for local AI — the Max and Ultra tiers with the wide memory buses and 128GB+ RAM ceilings — are a year and a half away at best, and historically Apple's "as early as" dates slip.
This is a single-sourced report about unannounced products. Treat the timeline as directional, not a promise. But even taking it at face value, the conclusion for anyone shopping today is clear.
Why "AI-focused" doesn't mean "faster at local LLMs"
Here's the trap. Apple markets Neural Engine TOPS. The M7 is reportedly built around on-device AI. It's natural to assume an "AI chip" will run your local models proportionally faster. It won't — and the reason is the single most important fact in all of local inference:
Token generation is bottlenecked by memory bandwidth, not compute.
When an LLM generates a token, it has to stream every active weight from memory once. A 30B model at Q4 is ~18GB of weights; at 600 GB/s that read takes roughly 30ms, which caps you at ~30 tok/s no matter how many TOPS the chip claims. The Neural Engine — the part Apple is pouring "AI focus" into — barely touches the decode path. On Apple Silicon, Ollama and MLX run inference on the GPU via Metal, not the Neural Engine. We dug into exactly why TOPS doesn't predict tokens/second in NPU vs Discrete GPU for Local LLMs.
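The arithmetic above can be sketched in a few lines of Python. This is a back-of-envelope upper bound, not a benchmark: it assumes every active weight is read exactly once per token and ignores KV-cache reads, compute time, and runtime overhead. The function name `decode_ceiling_tok_s` is hypothetical and exists only for this illustration.

```python
def decode_ceiling_tok_s(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/second if decode is purely bandwidth-bound.

    Assumption: each generated token requires streaming all active
    weights from memory once, and nothing else limits throughput.
    """
    if weights_gb <= 0 or bandwidth_gb_s <= 0:
        raise ValueError("weights_gb and bandwidth_gb_s must be positive")
    seconds_per_token = weights_gb / bandwidth_gb_s
    return 1.0 / seconds_per_token


# Figures from the text: ~18 GB of weights (30B at Q4) read at 600 GB/s.
weights = 18.0
bandwidth = 600.0
ms_per_token = 1000.0 * weights / bandwidth
ceiling = decode_ceiling_tok_s(weights, bandwidth)
print(f"{ms_per_token:.0f} ms per token, ceiling ~{ceiling:.0f} tok/s")
```

With those inputs the read takes 30 ms and the ceiling comes out at about 33 tok/s, which the text rounds to ~30. Note that TOPS does not appear anywhere in the formula; only the size of the weights and the memory bandwidth do.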
So what would actually make an M7 faster at local LLMs? More memory bandwidth. And the only bandwidth number the report gives is the base M6 going from 153 to 200 GB/s — a 31% bump on the lowest tier, the one with the least RAM. There's no public bandwidth figure for any M7 Max. Until there is, "AI-focused M7" is a claim about Neural Engine throughput and on-device model features (think a beefier Apple Foundation Model running Siri), not about how fast llama.cpp will spit out tokens.
The Neural Engine focus will help Apple's own on-device features — the 20B-class Apple Foundation Models we covered in the WWDC 2026 home lab verdict. It does very little for your Ollama or LM Studio workflow.
What the M5 Max actually delivers today
This is the machine you'd be buying instead of waiting, so the numbers matter. The M5 Max ships with up to 128GB unified memory and 614 GB/s of bandwidth in the 40-core GPU configuration (460 GB/s in the 32-core trim). For comparison, the M4 Max tops out at 546 GB/s (40-core) / 410 GB/s (32-core).
Real measured token-generation speeds on the M5 Max:
| Model | Quant | M5 Max tok/s |
|---|---|---|
| Llama 3.3 8B | Q4/Q5 | 100–120 |
| Qwen3.5 30B-A3B (MoE) | Q4 | ~58 |
| Llama 3.3 70B | Q4 | ~15–25 |
A few things stand out. The 8B speed is excellent — well past the ~7–10 tok/s human reading speed, so it feels instant. The 30B MoE number (~58 tok/s) is the sweet spot: a smart model at comfortable speed, because MoE only activates ~3B parameters per token. The 70B dense number (~15–25 tok/s) is usable but not snappy — fine for batch work and long-form, sluggish for back-and-forth.
One free speedup: MLX runs 15–25% faster than Ollama's generic GGUF path on Apple Silicon because of native Metal optimization. If you buy a Mac for local AI, run MLX-backed Ollama or LM Studio rather than the generic GGUF path. We covered what the stable MLX release changed in Ollama v0.30 on Apple Silicon.
The M5 Max's real superpower isn't speed — it's capacity. 128GB of unified memory lets you load models that no single consumer GPU can hold. That's the entire reason to buy a high-RAM Mac for AI: not because it's the fastest, but because it fits things a 24GB card can't. (How much you actually need is its own question — see How Much System RAM for Local LLMs.)
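A rough way to see the capacity argument is to check which models fit in which memory pool. The sketch below is illustrative only: the 0.6 GB per billion parameters figure is back-derived from this article's "30B at Q4 is ~18GB" number, and the 4 GB headroom for KV cache and runtime is a hypothetical placeholder, not a measured value. Real usable memory also depends on context length and on how much the OS reserves.

```python
# Assumption: ~0.6 GB per billion parameters at Q4, derived from the
# article's "30B at Q4 is ~18GB" figure. Other quants will differ.
GB_PER_BILLION_PARAMS_Q4 = 0.6


def weights_gb(params_billion: float) -> float:
    """Approximate Q4 weight size in GB for a dense model."""
    return params_billion * GB_PER_BILLION_PARAMS_Q4


def fits(params_billion: float, memory_gb: float, headroom_gb: float = 4.0) -> bool:
    """True if weights plus a hypothetical headroom fit in memory_gb.

    headroom_gb stands in for KV cache and runtime overhead; 4 GB is
    an illustrative guess, not a measurement.
    """
    return weights_gb(params_billion) + headroom_gb <= memory_gb


for name, params in [("8B", 8), ("30B", 30), ("70B", 70)]:
    print(
        f"{name}: ~{weights_gb(params):.0f} GB weights | "
        f"24 GB card: {fits(params, 24)} | 128 GB Mac: {fits(params, 128)}"
    )
```

Under these assumptions the 70B model is the one that separates the two machines: roughly 42 GB of weights cannot fit on a 24GB card but sits comfortably in 128GB of unified memory.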
The buy-now-vs-wait math
Let's be concrete about what waiting costs you.
If you wait for the M7:
- The base M7 (H1 2027) won't help: base tier means a modest RAM ceiling and, if it resembles the base M6, roughly 200 GB/s of bandwidth. That is the wrong machine for serious local AI.
- The M7 Max — the tier you'd actually want — is "as early as end of 2027." Call it 18 months from today, optimistically. Apple's "as early as" dates have a habit of becoming "actually shipping in spring."
- The M7 Ultra is 2028.
- For 18+ months you run nothing, or you run on hardware you already have.
If you buy the M5 Max now:
- You get 614 GB/s and 128GB today.
- The "AI-focused" M7 improvement is concentrated in the Neural Engine, which — as covered above — doesn't drive token generation. The generational tok/s gain for your workload will track bandwidth, and we have no evidence the M7 Max's bandwidth leap will be dramatic.
- Resale on Apple Silicon holds up well, so a 2026 purchase isn't stranded if you upgrade in 2028.
There's a real scenario where waiting makes sense: if Apple's "AI-focused" push includes a genuinely wide memory bus on the M7 Max/Ultra (say, pushing toward 800 GB/s–1 TB/s to feed on-device models), the tok/s gain would be large. But that's speculation on an unannounced chip, and "don't buy now because a much better thing might exist in two years" is true of every computer ever made.
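To put numbers on that speculative scenario, the sketch below scales a measured M5 Max decode speed by the bandwidth ratio. It is idealized: it assumes tok/s tracks bandwidth linearly, which is the best case for a bandwidth-bound workload. The 800 GB/s and 1 TB/s inputs are the hypothetical figures from the paragraph above, not reported specs, and `bandwidth_scaled_tok_s` is a made-up helper name.

```python
M5_MAX_GB_S = 614.0   # 40-core GPU configuration, from the article
MEASURED_TOK_S = 58.0  # Qwen3.5 30B-A3B at Q4 on the M5 Max, from the article


def bandwidth_scaled_tok_s(measured_tok_s: float, old_gb_s: float, new_gb_s: float) -> float:
    """Scale a measured decode speed by the bandwidth ratio (idealized)."""
    if old_gb_s <= 0 or new_gb_s <= 0:
        raise ValueError("bandwidth values must be positive")
    return measured_tok_s * (new_gb_s / old_gb_s)


# Hypothetical M7 Max bandwidths; Apple has published no such figure.
for hypothetical_gb_s in (800.0, 1000.0):
    gain = hypothetical_gb_s / M5_MAX_GB_S - 1.0
    projected = bandwidth_scaled_tok_s(MEASURED_TOK_S, M5_MAX_GB_S, hypothetical_gb_s)
    print(
        f"{hypothetical_gb_s:.0f} GB/s: +{gain:.0%} bandwidth, "
        f"~{MEASURED_TOK_S:.0f} tok/s -> ~{projected:.0f} tok/s"
    )

# The only reported bandwidth change, for reference: base tier 153 -> 200 GB/s.
print(f"Base M5 -> base M6: +{200.0 / 153.0 - 1.0:.0%}")
```

Even the optimistic 1 TB/s case works out to roughly a 63% gain over the M5 Max, and the 800 GB/s case to roughly 30%. Those would be meaningful improvements, but they rest entirely on a bandwidth figure nobody has reported.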
The option Apple doesn't want in this conversation: a used 3090
Every Mac-for-AI discussion needs this reality check. A used RTX 3090 sold for a lowest-average of $1,070 in June 2026 (range $966–$1,189; eBay listings often $800–$1,050). It has 936 GB/s of bandwidth — more than the M5 Max's 614 GB/s — and does roughly 95 tok/s on a 7B model, beating the M5 Max on raw token speed for anything that fits in its 24GB.
So the honest framing is two separate questions:
- Do you want a Mac at all? (Quiet, low-power, integrated, runs macOS, portable in the MacBook Pro.) If yes, buy the M5 Max — don't wait 18 months for a Neural Engine bump.
- **Do you just want the m