keeper

Posted on May 24

Windows Has an AI Problem. Can HarmonyOS PC Be the Answer?

#ai #hardware #windows #machinelearning

In May 2025, a mini PC with a Ryzen 7 8845HS sat on my desk. It had an NPU rated at 16 TOPS, well short of the 40 TOPS Microsoft requires for its Copilot+ checklist. I was trying to run a 14B local model on it. The result: 2.1 tokens per second on CPU, an iGPU bottlenecked by shared memory bandwidth, and an NPU that was completely unusable. It was reachable only through AMD's own Ryzen AI software stack, and no open-source LLM runtime supported it.

This isn't a hardware problem. It's a structural one.

The Windows AI Trap

Windows has a fundamental architectural disadvantage for local AI that no amount of NPU TOPS marketing can fix. It stems from three layers:

1. Fragmented Memory Architecture

The defining feature of the M-series Mac is unified memory — a single pool of high-bandwidth, low-latency RAM shared between CPU, GPU, and Neural Engine. A MacBook with 64GB unified memory can run a 70B parameter model (quantized to Q4, ~40GB) because the GPU has full access to all 64GB at 400+ GB/s.

Windows PCs have no equivalent. The architecture is:

System RAM: 16-64GB (DDR5/LPDDR5, 50-80 GB/s)
GPU VRAM: 8-24GB (GDDR6/7, 400-800 GB/s)
NPU memory: Shared with system, but bandwidth-constrained and API-locked

The GPU is the best inference engine, but it's limited to VRAM. A 14B Q4 model needs ~8GB — fits on a 12GB+ card. A 70B Q4 needs ~40GB — only RTX 6000 Ada ($6,800) or server GPUs. The system RAM has the capacity but runs at 1/10th the bandwidth. The NPU has the TOPS but no ecosystem.

Result: Every Windows AI PC is a "here's the capacity, here's the bandwidth, here's the TOPS — but you can't use all three at once" machine. Apple gets to use all three simultaneously.
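A rough way to see this trade-off is to treat token generation as memory-bound: each generated token reads the full set of weights once, so bandwidth divided by model size gives a ceiling on tokens per second. The sketch below applies that rule of thumb to the figures quoted in this section. The specific tier values are picked from the ranges above, capacity ignores OS overhead, and the `MemoryTier` and `decode_ceiling_tok_s` names are illustrative, not from any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryTier:
    name: str
    capacity_gb: float
    bandwidth_gbs: float  # GB/s


def decode_ceiling_tok_s(model_gb: float, tier: MemoryTier) -> Optional[float]:
    """Bandwidth-bound upper limit on decode speed, or None if the model does not fit."""
    if model_gb > tier.capacity_gb:
        return None
    return tier.bandwidth_gbs / model_gb


# Illustrative values taken from the ranges quoted in the text.
TIERS = [
    MemoryTier("system RAM (DDR5)", 64, 80),
    MemoryTier("GPU VRAM (GDDR6)", 24, 800),
    MemoryTier("Mac unified memory", 64, 400),
]
MODELS = {"14B Q4": 8, "70B Q4": 40}  # approximate sizes in GB, from the text

for label, size_gb in MODELS.items():
    for tier in TIERS:
        ceiling = decode_ceiling_tok_s(size_gb, tier)
        verdict = "does not fit" if ceiling is None else f"at most {ceiling:.0f} tok/s"
        print(f"{label:7s} on {tier.name:20s}: {verdict}")
```

With these inputs the 70B model fits only in system RAM (a ceiling of about 2 tok/s) or in unified memory (about 10 tok/s); the fast VRAM tier cannot hold it at all.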

2. OEM Incentive Misalignment

Microsoft doesn't make the hardware — Dell, Lenovo, HP, ASUS do. Each OEM competes on price. The cheapest SSD, the slowest RAM, the smallest battery. NPU is a checkbox component, not a system-level optimization. No OEM invests in unified memory because it requires a custom SoC and motherboard — which means they can't differentiate on any other dimension.

Qualcomm's Snapdragon X Elite was supposed to fix this. It has unified memory (LPDDR5X, up to 64GB, 135 GB/s). But Windows on ARM has its own problems: x86 emulation overhead, driver compatibility, and the same OEM cost-cutting pressure that makes Lenovo ship a 45W charger with a 28W SoC.

3. The NPU Ecosystem Is Stillborn

Every Windows NPU needs a different SDK:

Intel NPU: OpenVINO
AMD NPU: Ryzen AI / DirectML
Qualcomm NPU: QNN / ONNX Runtime
Microsoft's own: DirectML (if it supports your hardware)

No major open-source LLM runtime — not llama.cpp, not MLX, not ExLlamaV2 — supports any of these NPUs for text generation. The 16-45 TOPS on the spec sheet is a marketing number. In practice, those TOPS are only accessible through Microsoft's proprietary pipeline for Copilot+ features like real-time captions or Windows Studio Effects. Try running Llama 3.2 on the NPU. You can't.
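The fragmentation shows up even when a single runtime tries to paper over it. ONNX Runtime exposes each vendor path as a separate execution provider, and an application has to probe for whichever one happens to be installed. The sketch below is a minimal illustration of that probing. The provider names are the ones ONNX Runtime uses as far as I know; the preference order and the `pick_provider` helper are my own assumptions, and the code falls back to CPU when `onnxruntime` is not installed.

```python
# Preference order is an assumption for illustration, not a recommendation.
PREFERRED = [
    "QNNExecutionProvider",       # Qualcomm NPU via QNN
    "OpenVINOExecutionProvider",  # Intel via OpenVINO
    "VitisAIExecutionProvider",   # AMD Ryzen AI NPU
    "DmlExecutionProvider",       # DirectML
    "CPUExecutionProvider",       # fallback that is always present
]


def pick_provider(available, preferred=PREFERRED):
    """Return the first preferred provider that this machine actually offers."""
    for name in preferred:
        if name in available:
            return name
    raise RuntimeError("no usable execution provider found")


def detect_available():
    """Ask ONNX Runtime what it can use; assume CPU-only if it is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return ["CPUExecutionProvider"]
    return ort.get_available_providers()


print("selected:", pick_provider(detect_available()))
```

Selecting a provider is only the first hurdle: each one still needs its own driver stack and a model exported in a form it accepts, which is the gap the paragraph above describes.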

Apple's Moat Is Getting Deeper

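Apple's Ultra-class chips show how far this goes. The M2 Ultra made 192GB of unified memory available to the GPU, and the M3 Ultra Mac Studio raises that ceiling to 512GB. A single Mac Studio can run Llama 3.1 405B (Q2, 100GB or more) entirely locally. Not fast, but it works. With MLX, a 70B model runs at roughly 15-20 tok/s on an Ultra-class chip.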

Apple doesn't compete on peak TOPS or FLOPs. It competes on usable architecture — the hardware memory model that makes inference practical. And this advantage compounds as models grow: a 2027-era 1T-parameter MoE model will need 200-300GB. Only Apple's architecture can deliver that at a consumer price point.

Microsoft's response is to push NPUs harder — but a 100 TOPS NPU with 16GB of slow shared memory is still worse than a 30 TOPS GPU with 64GB of fast unified memory. The bottleneck is bandwidth and capacity, not multiply-accumulate operations.
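A simple roofline-style estimate makes the point concrete. Per generated token, a dense model needs roughly 2 operations per parameter and one full read of its weights, and the slower of the two sets the pace. The sketch below compares the two machines from the paragraph above for a 14B Q4 model (~8GB). The 60 GB/s and 400 GB/s bandwidth figures are my assumptions for "slow shared" and "fast unified" memory, and the function name is illustrative.

```python
def token_time_ms(params_billion, model_gb, tops, bandwidth_gbs):
    """Return (compute_ms, memory_ms, bound_ms) for one decoded token."""
    # ~2 operations per parameter per token for a dense model
    compute_ms = (2 * params_billion * 1e9) / (tops * 1e12) * 1e3
    # every weight byte is read once per token
    memory_ms = model_gb / bandwidth_gbs * 1e3
    return compute_ms, memory_ms, max(compute_ms, memory_ms)


MACHINES = {
    "100 TOPS NPU, slow shared memory": (100, 60),    # bandwidth is an assumption
    "30 TOPS GPU, fast unified memory": (30, 400),    # bandwidth is an assumption
}

for name, (tops, bandwidth) in MACHINES.items():
    compute_ms, memory_ms, bound_ms = token_time_ms(14, 8, tops, bandwidth)
    print(f"{name}: compute {compute_ms:.2f} ms, memory {memory_ms:.1f} ms, "
          f"about {1000 / bound_ms:.1f} tok/s")
```

Under these assumptions the compute term stays below 1 ms on both machines while the memory term is 20 ms or more, so extra TOPS change almost nothing and the bandwidth difference decides the outcome.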

Enter HarmonyOS PC: A Clean Sheet

Huawei launched HarmonyOS PC on May 19, 2025, powered by the Kirin X90 — a 5nm SoC with 14 CPU cores, an integrated GPU (Maleoon), and a self-developed NPU. This is a mobile-SoC-derived architecture — which means it's inherently unified memory. The CPU, GPU, and NPU share the same LPDDR5 pool.

This makes it one of the few non-Apple PC architectures built around unified memory, alongside Snapdragon X Elite. And that matters.

Kirin X90: What We Know

Dimension	Kirin X90	Apple M4	Snapdragon X Elite
Process	5nm (SMIC)	3nm (TSMC)	4nm (TSMC)
CPU	14-core	10-core	12-core
GPU	Maleoon (custom)	10-40 core	Adreno
NPU	Custom, AI +200% vs prev gen	38 TOPS	45 TOPS
Memory	LPDDR5 (unified)	LPDDR5X (unified)	LPDDR5X (unified)
AI Ops (claimed)	~60-80 TOPS*	38 TOPS	45 TOPS

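*Estimated from Huawei's "200% improvement over previous gen" claim, assuming a Kirin 9010 NPU baseline of ~30 TOPS. Read as 2x the baseline, that gives ~60 TOPS; read as +200% (3x), it gives ~90 TOPS. The range used here leans toward the conservative reading.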

The architecture is right — unified memory is the non-negotiable prerequisite for local AI inference. But capacity is the open question.

The 14B model test: A 14B Q4 model needs ~8GB. If the X90 supports 32GB LPDDR5 — MacBook Air territory — it fits comfortably. The GPU (Maleoon) handles inference via Vulkan/compute shaders, while both the GPU and CPU share the full memory bandwidth. Inference speed would depend on GPU optimization, but 10-20 tok/s is plausible with proper runtime support.

At 16GB, it fits but with memory pressure from the OS and browser. At 8GB (phone territory), forget it.
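The memory budget behind these three cases can be worked out roughly. Besides the ~8GB of weights, inference needs a KV cache that grows with context length, and the OS and browser take their own share. The sketch below assumes a model shape typical of public 14B models with grouped-query attention (48 layers, 8 KV heads, head dimension 128), an 8K context with fp16 cache entries, and 6GB reserved for the OS and applications. All of those values are my assumptions, not Kirin X90 or HarmonyOS figures.

```python
def kv_cache_gb(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """KV cache size in GB: keys and values for every layer and cached token."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / 1e9


def headroom_gb(total_gb, weights_gb, kv_gb, reserved_gb):
    """Memory left after the OS reserve, model weights, and KV cache."""
    return total_gb - reserved_gb - weights_gb - kv_gb


WEIGHTS_GB = 8.0   # 14B at Q4, from the text
RESERVED_GB = 6.0  # assumed OS + browser footprint
kv = kv_cache_gb(layers=48, kv_heads=8, head_dim=128, context_tokens=8192)

for total in (8, 16, 32):
    left = headroom_gb(total, WEIGHTS_GB, kv, RESERVED_GB)
    status = "fits" if left >= 0 else "does not fit"
    print(f"{total:2d} GB: KV cache {kv:.2f} GB, headroom {left:+.2f} GB ({status})")
```

With these assumptions the 16GB case is left with well under 1GB of headroom, which matches the "fits but with memory pressure" description, while 32GB leaves about half the memory free.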

The Supply Chain Math: CXMT and the Memory Squeeze

Here's where the theoretical architecture meets reality. HarmonyOS PC needs LPDDR5 memory. Huawei's HBM technology (the HiZQ 2.0 used in Ascend 950) draws on, and competes for, the same DRAM fab capacity.

CXMT (长鑫存储) is the only Chinese DRAM IDM. By mid-2026, their three 12-inch fabs (two in Hefei, one in Beijing) reach approximately 300K wafers/month total capacity.

The allocation problem:

Product	Die Consumption	Monthly Wafer Need (Est.)
Ascend 950 HBM (HiZQ 2.0)	Each 144GB chip = 4 stacks of 12-hi = 48 DRAM dies + base dies	30K-50K wafers (at 3x DDR die consumption per GB)
Kirin X90 LPDDR5 (5M units/year)	5M × 4-8 dies/device = 20-40M dies total	~25K-50K wafers (one-time for first batch)
DDR5 commodity	Industry contracts, server, PC aftermarket	Remaining 200K-240K wafers

The squeeze: HBM consumes roughly 3x the wafer capacity per GB compared to standard DDR (per Tom's Hardware analysis: yield loss from stacking, smaller dies, base die overhead). At that ratio, a single Atlas 950 SuperNode (8,192 Ascend 950 chips, 1.1 PB HBM) uses as much DRAM wafer capacity as roughly 220,000 PCs with 16GB each.
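The SuperNode comparison can be checked with the figures already given: 8,192 chips, 144GB of HBM per chip, and a 3x wafer multiplier per GB. The sketch below converts that into an equivalent number of PCs. Only the PC memory sizes are chosen for illustration; the function name is my own.

```python
CHIPS_PER_SUPERNODE = 8_192
HBM_GB_PER_CHIP = 144
HBM_WAFER_MULTIPLIER = 3  # wafer capacity per GB relative to standard DDR, from the text


def equivalent_pcs(pc_memory_gb):
    """Number of PCs whose DRAM needs the same wafer capacity as one SuperNode."""
    hbm_gb = CHIPS_PER_SUPERNODE * HBM_GB_PER_CHIP
    ddr_equivalent_gb = hbm_gb * HBM_WAFER_MULTIPLIER
    return ddr_equivalent_gb // pc_memory_gb


print("HBM per SuperNode (GB):", CHIPS_PER_SUPERNODE * HBM_GB_PER_CHIP)
for pc_gb in (16, 32):
    print(f"equivalent {pc_gb} GB PCs: {equivalent_pcs(pc_gb):,}")
```

Taking these numbers at face value, one SuperNode corresponds to about 221,000 PCs at 16GB or about 111,000 at 32GB, so a few dozen SuperNodes would match the 5M-unit PC volume discussed here.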

CXMT's profitability adds pressure. In H1 2026, CXMT reported ¥50-57 billion net profit, driven by the global DRAM shortage. The highest-margin product is HBM, not LPDDR. From a business perspective, CXMT's incentive is to allocate more capacity to HBM for Ascend and less to commodity LPDDR.

Verdict on availability: CXMT's 300K wafer/month capacity can supply both HBM and LPDDR5, but at the volumes Huawei needs, the LPDDR5 allocation is likely constrained. The better question isn't "can CXMT make enough"; it's "how much is Huawei willing to pay vs. the DRAM margins CXMT can earn on HBM and on the spot market."

If Huawei procures CXMT LPDDR5 at above-market prices (internal transfer pricing as a related-party transaction), they secure supply at the cost of lower margins. If they try to source from Samsung/SK Hynix, they face US export controls and uncertain allocation (both Korean vendors prioritize HBM for NVIDIA).

The pragmatic path: CXMT supplies just enough LPDDR5 for Kirin X90 volumes (the ~25K-50K wafers estimated above for the first 5M units), while the bulk of HBM capacity goes to Ascend 950, which has higher margins and strategic AI infrastructure importance.
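The LPDDR5 row of the allocation table can be turned into a monthly figure with simple arithmetic. The sketch below assumes about 800 good dies per wafer, a value backed out from the table itself (20M dies against ~25K wafers) rather than a published yield, and spreads 5M units evenly over twelve months. The function names are illustrative.

```python
UNITS_PER_YEAR = 5_000_000
GOOD_DIES_PER_WAFER = 800          # assumption implied by the table, not a published yield
CXMT_WAFERS_PER_MONTH = 300_000    # capacity estimate from the text


def wafers_needed(units, dies_per_device, good_dies_per_wafer):
    """Wafers required to produce the DRAM dies for a given number of devices."""
    return units * dies_per_device / good_dies_per_wafer


def monthly_share(annual_wafers, fab_wafers_per_month):
    """Fraction of monthly fab capacity if demand is spread evenly over a year."""
    return (annual_wafers / 12) / fab_wafers_per_month


for dies in (4, 8):
    annual = wafers_needed(UNITS_PER_YEAR, dies, GOOD_DIES_PER_WAFER)
    share = monthly_share(annual, CXMT_WAFERS_PER_MONTH)
    print(f"{dies} dies/device: {annual:,.0f} wafers/year, "
          f"{annual / 12:,.0f} wafers/month, {share:.2%} of capacity")
```

Under these assumptions 5M units need roughly 2K-4K wafers per month, around 1% of the 300K figure, which suggests the constraint is less about raw capacity than about which product gets priority.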

Bottom line: A HarmonyOS PC with 16GB RAM is likely. One with 32GB is possible but expensive. One with 64GB (Mac-level) is unlikely at launch.

The Real Disruption: Distributed AI

The most interesting possibility for HarmonyOS PC isn't local performance — it's what no other platform can do.

HarmonyOS has a distributed hardware abstraction layer that treats all devices in a user's ecosystem as a single resource pool. This was originally designed for file sharing and phone-as-webcam use cases. Applied to AI inference, it becomes something genuinely different:

Your phone's NPU + your PC's NPU + your tablet's NPU = pooled inference
Model layers sharded across devices over high-speed local interconnect
A 14B model's attention layers run on the PC GPU, embedding layers on the phone NPU

Windows can't do this. macOS can't do this (Continuity doesn't extend to GPU compute pooling). HarmonyOS's distributed architecture is unique.
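To make the idea of pooled inference concrete, here is a minimal sketch of how a runtime might assign contiguous blocks of layers to devices based on how much memory each can spare. It is not how HarmonyOS works today. The device budgets, the 170 MB per-layer size (roughly a 14B Q4 model split over 48 layers), and the `shard_layers` helper are all hypothetical.

```python
def shard_layers(num_layers, layer_mb, device_budgets_mb):
    """Greedily assign contiguous layer ranges to devices in the order given.

    Raises ValueError if the combined budgets cannot hold every layer.
    """
    plan = {}
    next_layer = 0
    for device, budget_mb in device_budgets_mb.items():
        fits = min(budget_mb // layer_mb, num_layers - next_layer)
        plan[device] = range(next_layer, next_layer + fits)
        next_layer += fits
    if next_layer < num_layers:
        raise ValueError(f"{num_layers - next_layer} layers do not fit on any device")
    return plan


# Hypothetical free-memory budgets in MB for each device in the pool.
budgets = {"pc": 6000, "phone": 1500, "tablet": 1500}
plan = shard_layers(num_layers=48, layer_mb=170, device_budgets_mb=budgets)

for device, layers in plan.items():
    print(f"{device}: layers {layers.start}-{layers.stop - 1} ({len(layers)} layers)")
```

A contiguous split like this keeps the number of device-to-device handoffs per token small, which matters once interconnect latency enters the picture.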

The practical challenge: interconnect latency. Even WiFi 7 at 5Gbps has 1-3ms latency between devices, which is roughly four orders of magnitude slower than a local DRAM access (on the order of 100ns). Fine-grained layer sharding has to synchronize at least once per layer for every token, so those milliseconds add up to hundreds of milliseconds per token. This limits distributed AI to batch/offline workloads (background summarization, async data processing) rather than interactive chat.

But for the use case of "leave your PC processing a large model overnight while sharing the workload with your phone" — it works.
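The latency cost depends heavily on how often a token has to cross the network. The sketch below estimates the network time added per generated token for two sharding styles: a coarse split with one round trip per token, and a fine-grained split that crosses devices twice in every layer. It uses the 1-3ms and 5Gbps figures from above and assumes a 48-layer model with a 5,120-wide fp16 activation vector (about 10KB per crossing); those shape values and the function name are my assumptions.

```python
def added_ms_per_token(crossings, one_way_latency_ms, payload_bytes, link_gbps):
    """Network time added to each token: per-crossing latency plus transfer time."""
    transfer_ms = payload_bytes * 8 / (link_gbps * 1e9) * 1e3
    return crossings * (one_way_latency_ms + transfer_ms)


ACTIVATION_BYTES = 5120 * 2  # assumed hidden size 5120, fp16 values
LAYERS = 48                  # assumed layer count for a 14B-class model
STYLES = {
    "coarse split (1 round trip per token)": 2,
    "fine-grained split (2 crossings per layer)": 2 * LAYERS,
}

for style, crossings in STYLES.items():
    for latency_ms in (1.0, 3.0):
        extra = added_ms_per_token(crossings, latency_ms, ACTIVATION_BYTES, 5)
        print(f"{style}, {latency_ms:.0f} ms hops: +{extra:.1f} ms/token "
              f"(network-only ceiling {1000 / extra:.1f} tok/s)")
```

Under these assumptions the transfer time itself is negligible and the per-hop latency dominates: the fine-grained split adds roughly 100-300ms per token, while a coarse split adds only a few milliseconds, which is why the partitioning strategy matters as much as the link speed.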

Three Scenarios for HarmonyOS PC

Scenario 1: The "Good Enough" (Likely)

16GB LPDDR5, unified memory, ~60 TOPS NPU
Runs 7B models at 15-25 tok/s via GPU inference
Runs 14B models at 5-10 tok/s (usable but slow)
Distributed AI for async/background workloads
Price: ¥3,999-4,999 ($550-700)
Verdict: A legitimate third option alongside Mac and Windows, but not a "Mac killer"

Scenario 2: The Memory Expansion (Possible)

32GB LPDDR5 (CXMT secures allocation, higher BOM cost)
Runs 14B Q4 models at 15-20 tok/s
Runs 32B Q4 models at 5-8 tok/s
Distributed AI with phone NPU for real-time assistance
Price: ¥5,999-6,999 ($830-970)
Verdict: Genuinely competitive with MacBook Air M4 for local AI workloads
Key risk: CXMT LPDDR5 allocation, BOM margin pressure
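One way to sanity-check the speeds in Scenarios 1 and 2 is to ask what memory bandwidth they imply, since a memory-bound decoder reads the full weights once per token. The sketch below multiplies model size by the claimed tok/s. The ~8GB size for 14B Q4 is from this article; the ~4GB and ~18GB sizes for 7B and 32B are my proportional estimates, and the Kirin X90's actual bandwidth is not public.

```python
def implied_bandwidth_gbs(model_gb, tok_s):
    """Minimum memory bandwidth (GB/s) needed to sustain tok_s when memory-bound."""
    return model_gb * tok_s


# (scenario, model, approximate size in GB, claimed tok/s range)
CLAIMS = [
    ("Scenario 1", "7B Q4", 4.0, (15, 25)),    # size is an estimate
    ("Scenario 1", "14B Q4", 8.0, (5, 10)),
    ("Scenario 2", "14B Q4", 8.0, (15, 20)),
    ("Scenario 2", "32B Q4", 18.0, (5, 8)),    # size is an estimate
]

for scenario, model, size_gb, (low, high) in CLAIMS:
    need_low = implied_bandwidth_gbs(size_gb, low)
    need_high = implied_bandwidth_gbs(size_gb, high)
    print(f"{scenario}, {model}: needs {need_low:.0f}-{need_high:.0f} GB/s")
```

For reference, the Snapdragon X Elite figure quoted earlier is 135 GB/s, so the Scenario 2 numbers for a 14B model (120-160 GB/s) sit at or above that class of memory system.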

Scenario 3: The Distributed Breakthrough (Long Shot)

16-32GB LPDDR5
HarmonyOS 6+ with native distributed inference API
"AI router" mode: PC serves as local inference server for all household HarmonyOS devices
Third-party model runtime (llama.cpp port, MindSpore Lite + Vulkan)
Verdict: Unique value proposition Windows and Mac can't replicate
Key unlock: Software ecosystem maturity, developer adoption

What Needs to Happen

For HarmonyOS PC to matter in the AI era, three things must align:

1. Memory capacity must exceed 16GB. Without this, the unified memory advantage is theoretical. A 14B model barely fits, leaving no room for the OS or applications. 32GB is the sweet spot.

2. An open runtime must exist. If the only way to run AI is through Huawei's MindSpore/CANN pipeline, developer adoption will be slow — the same trap Windows fell into with proprietary NPU SDKs. A Vulkan-based llama.cpp port would be transformative.

3. The distributed inference API must ship as a first-party feature. Not a developer preview. Not an enterprise SKU. A system-level API that any app can call, along the lines of a hypothetical harmonyos.distribute.infer(model, input, devices=[pc, phone]). This is the feature that differentiates HarmonyOS from every other platform.

The Honest Assessment

HarmonyOS PC solves the architectural problem Windows has — unified memory — and has a theoretical distributed computing advantage no other platform can match. But it faces three constraints:

Manufacturing: 5nm at SMIC is behind TSMC's 3nm. Performance per watt will trail Apple M4.
Ecosystem: No Adobe, no major games, limited developer tools. This limits adoption to first-wave enthusiasts and government/enterprise procurement.
Memory supply: CXMT capacity is strained by HBM demand. 32GB LPDDR5 at scale is not guaranteed.

The best case for HarmonyOS PC: It becomes what Windows should have been in the AI era — a platform where local AI inference is architecturally natural, not bolted on. But it only happens if Huawei prioritizes memory capacity over NPU TOPS, and opens the runtime to the community.

The worst case: It's another Windows — impressive NPU TOPS on paper, functionally inaccessible for real AI workloads, held back by proprietary SDKs and memory constraints.

Right now, the data points toward Scenario 1: a good product that's competitive but not transformative. The switch flips to transformative at Scenario 2 — and that requires CXMT to deliver 32GB-class LPDDR5 at scale, which is a supply chain question, not a technology one.

This analysis is a best-effort public-data assessment. CXMT capacity figures are estimates based on public reporting. Kirin X90 NPU TOPS is inferred from Huawei's "200% improvement" claim relative to a baseline Huawei has not published (assumed here to be ~30 TOPS).

DEV Community
