WIOWIZ Technologies

Originally published at wiowiz.com

From VAI to Mini-GPT: Verified Transformer Execution on Custom NPU Hardware

Running transformers in software is easy.
Proving they work correctly in custom hardware is the hard part.

At WIOWIZ, we set out to verify whether a transformer-based language model could execute end-to-end, bit-accurately, on our own NPU RTL — not just as a concept, but as real hardware logic.

📘 Full technical blog:
VAI (Virtual AI Inference) to Mini-GPT: RTL-Verified Transformer Execution on WZ-NPU


Why VAI (Virtual AI Inference)?

Most inference stacks treat model weights as transient data. Switching models means reloading weights from memory — expensive in both latency and bandwidth.

VAI flips this model.

  • Weights are loaded once into hardware weight banks
  • Weights remain resident across inferences
  • Model switching becomes a single-cycle select, not a reload

This only matters if it survives RTL verification.
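
To make the residency model concrete, here is a minimal Python sketch of the idea. The class and method names (`WeightBankFile`, `program`, `select`) are ours for illustration, not the WZ-NPU interface:

```python
import numpy as np

class WeightBankFile:
    """Toy model of VAI weight residency: program once, select per inference."""

    def __init__(self, num_banks: int):
        self.banks = [None] * num_banks   # resident weight sets, one per model
        self.active = 0                   # bank currently driving the datapath
        self.reloads = 0                  # re-programming events (should stay 0)

    def program(self, bank: int, weights: dict) -> None:
        """One-time, expensive load of a model's weights into a bank."""
        if self.banks[bank] is not None:
            self.reloads += 1             # a reload would break the VAI promise
        self.banks[bank] = weights

    def select(self, bank: int) -> None:
        """Model switch: in hardware this is a single-cycle mux select."""
        self.active = bank

    def read(self, name: str) -> np.ndarray:
        """Inference-side read from whichever bank is currently selected."""
        return self.banks[self.active][name]
```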


Verification Strategy (No Shortcuts)

Our goal was correctness, not benchmarks.

We verified in layers:

  • Persistent weight residency
  • Transformer attention math
  • Full transformer blocks
  • A complete language model end-to-end

Every stage was validated against RTL execution, not just software models.
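
At every layer, that validation reduces to the same step: capture the RTL outputs and compare them value-for-value against a golden reference. A minimal sketch of that comparison in Python, assuming fixed-point results exported as integer arrays (the function name and reporting format are illustrative):

```python
import numpy as np

def check_bit_exact(stage: str, rtl_out: np.ndarray, ref_out: np.ndarray) -> None:
    """Fail on the first stage whose RTL output deviates from the golden reference."""
    assert rtl_out.shape == ref_out.shape, f"[{stage}] shape mismatch"
    mismatches = np.flatnonzero(rtl_out.ravel() != ref_out.ravel())
    if mismatches.size:
        i = mismatches[0]
        raise AssertionError(
            f"[{stage}] {mismatches.size} mismatching values; first at index {i}: "
            f"rtl={rtl_out.ravel()[i]} ref={ref_out.ravel()[i]}"
        )
    print(f"[{stage}] bit-exact over {rtl_out.size} values")
```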


Persistent Weight Residency

Verification confirmed:

  • Multiple models coexist in separate weight banks
  • No reload after initial programming
  • One-cycle model selection

This validated the core VAI promise.
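
As a usage sketch, those three checks map onto assertions like the following, reusing the hypothetical `WeightBankFile` model from earlier (the weight values are placeholders):

```python
import numpy as np

banks = WeightBankFile(num_banks=4)

# One-time programming of two independent models into separate banks.
banks.program(0, {"wq": np.arange(16, dtype=np.int8).reshape(4, 4)})
banks.program(1, {"wq": -np.arange(16, dtype=np.int8).reshape(4, 4)})

# Switch between models repeatedly; no bank is ever re-programmed.
for bank in (0, 1, 0, 1):
    banks.select(bank)        # single-cycle select in hardware
    _ = banks.read("wq")      # inference reads the resident weights

assert banks.banks[0] is not None and banks.banks[1] is not None  # coexistence
assert banks.reloads == 0, "VAI violated: a bank was re-programmed"
print("residency checks passed")
```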


Verifying Transformer Attention

We verified the full attention pipeline:

  • Q × Kᵀ
  • Scaling and softmax
  • Weighted value accumulation

RTL outputs matched reference behavior bit-exactly.
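
The golden model here is standard scaled dot-product attention. A minimal NumPy reference covering the same three stages (the RTL computes a fixed-point equivalent; shapes are illustrative):

```python
import numpy as np

def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: softmax(Q·Kᵀ / √d) · V.

    q, k, v have shape (seq_len, d_head).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # Q × Kᵀ, scaled
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # weighted value accumulation
```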


Full Transformer Blocks

We scaled to complete transformer layers:

  • Multi-head attention (4 heads)
  • Feed-forward networks (FFN)
  • Residual and normalization paths

Cycle-accurate RTL execution continued to match the reference bit-exactly at this scale.
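
A reference-model sketch of one such block, building on the `attention()` function above. Parameter names, shapes, and the ReLU nonlinearity in the FFN are our assumptions for illustration, not the exact WZ-NPU configuration:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def transformer_block(x, p, num_heads=4):
    """One block: 4-head attention -> add & norm -> FFN -> add & norm.

    x: (seq_len, d_model); p: dict of weight arrays (names are ours).
    Uses the attention() reference defined above.
    """
    d_model = x.shape[-1]
    d_head = d_model // num_heads

    # Multi-head attention: project, split into heads, attend, re-combine.
    q, k, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
    heads = [
        attention(q[:, h * d_head:(h + 1) * d_head],
                  k[:, h * d_head:(h + 1) * d_head],
                  v[:, h * d_head:(h + 1) * d_head])
        for h in range(num_heads)
    ]
    attn = np.concatenate(heads, axis=-1) @ p["wo"]
    x = layer_norm(x + attn, p["g1"], p["b1"])     # residual + normalization

    # Feed-forward network (ReLU chosen here for simplicity).
    ffn = np.maximum(x @ p["w1"], 0.0) @ p["w2"]
    return layer_norm(x + ffn, p["g2"], p["b2"])   # residual + normalization
```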


Mini-GPT on WZ-NPU

Finally, we ran Mini-GPT end-to-end on the NPU:

  • Embedding layer
  • Two transformer blocks
  • Language-model head

Inference completed in 165,632 cycles, with bit-exact correctness across the entire pipeline.
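
Structurally, the verified pipeline has the shape of the golden-model sketch below, built on `transformer_block()` from the previous section. Parameter names are ours, and the cycle count above comes from RTL simulation, not from this Python:

```python
import numpy as np

def mini_gpt_forward(token_ids: np.ndarray, params: dict) -> np.ndarray:
    """Golden model of the pipeline: embedding -> two blocks -> LM head.

    token_ids: (seq_len,) ints; params: dict of weight arrays (names are ours).
    Returns next-token logits of shape (seq_len, vocab_size).
    """
    # Embedding layer: token embedding plus position embedding.
    x = params["tok_emb"][token_ids] + params["pos_emb"][: len(token_ids)]

    # Two transformer blocks, as in the verified Mini-GPT.
    for block_params in (params["block0"], params["block1"]):
        x = transformer_block(x, block_params)

    # Language-model head: project hidden states to vocabulary logits.
    return x @ params["lm_head"]
```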

Closing

From VAI’s weight-as-firmware concept to a fully verified Mini-GPT on WZ-NPU RTL, this project reinforces one principle:

In AI hardware, credibility comes from verification — not claims.

🔗 Read the full technical deep dive:
VAI (Virtual AI Inference) to Mini-GPT: RTL-Verified Transformer Execution on WZ-NPU
