From VAI to Mini-GPT: Verified Transformer Execution on Custom NPU Hardware
Running transformers in software is easy.
Proving they work correctly in custom hardware is the hard part.
At WIOWIZ, we set out to verify whether a transformer-based language model could execute end-to-end, bit-accurately, on our own NPU RTL — not just as a concept, but as real hardware logic.
📘 Full technical blog:
VAI (Virtual AI Inference) to Mini-GPT: RTL-Verified Transformer Execution on WZ-NPU
Why VAI (Virtual AI Inference)?
Most inference stacks treat model weights as transient data. Switching models means reloading weights from memory — expensive in both latency and bandwidth.
VAI inverts this approach.
- Weights are loaded once into hardware weight banks
- Weights remain resident across inferences
- Model switching becomes a single-cycle select, not a reload
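The weight-bank semantics above can be sketched as a small software model. This is an illustrative sketch only: the class, method names, and bank layout are assumptions for exposition, not the WZ-NPU programming interface.

```python
import numpy as np

class WeightBanks:
    """Toy model of VAI-style persistent weight residency.

    Each bank holds one model's weights. After a one-time program(),
    switching models is just changing a select index (a mux in hardware),
    never a reload from memory.
    """
    def __init__(self, num_banks, bank_shape):
        self.banks = [np.zeros(bank_shape, dtype=np.int8) for _ in range(num_banks)]
        self.active = 0  # currently selected model

    def program(self, bank_id, weights):
        # One-time load; the weights stay resident afterwards.
        self.banks[bank_id][...] = weights

    def select(self, bank_id):
        # Model switch: a single select, no DMA traffic.
        self.active = bank_id

    def read(self):
        return self.banks[self.active]
```

In the RTL, `select` corresponds to the one-cycle model selection that the verification below confirms; the software model simply makes the residency contract explicit.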
This only matters if it survives RTL verification.
Verification Strategy (No Shortcuts)
Our goal was correctness, not benchmarks.
We verified in layers:
- Persistent weight residency
- Transformer attention math
- Full transformer blocks
- A complete language model end-to-end
Every stage was validated against RTL execution, not just software models.
Persistent Weight Residency
Verification confirmed:
- Multiple models coexist in separate weight banks
- No reload after initial programming
- One-cycle model selection
This validated the core VAI promise.
Verifying Transformer Attention
We verified the full attention pipeline:
- Q × Kᵀ
- Scaling and softmax
- Weighted value accumulation
RTL outputs matched reference behavior bit-exactly.
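The three pipeline stages can be written as a compact floating-point reference. Note this is a generic scaled dot-product attention sketch, not the actual golden model (which, for bit-exact comparison against fixed-point RTL, would use the hardware's number formats).

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V -- the three verified stages."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # Q x K^T, then scaling
    weights = softmax(scores)      # row-wise softmax
    return weights @ V             # weighted value accumulation
```

A useful sanity check when validating such a reference: if all scores are equal, the softmax weights are uniform and the output is the mean of the value rows.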
Full Transformer Blocks
We scaled to complete transformer layers:
- Multi-head attention (4 heads)
- Feed-forward networks (FFN)
- Residual and normalization paths
Cycle-accurate execution remained consistent and correct.
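A full block combining those three paths might look like the reference below. The norm placement (pre-norm), ReLU activation, and weight names are assumptions for illustration; the post states only that the block contains 4-head attention, an FFN, and residual/normalization paths.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def mha(x, Wq, Wk, Wv, Wo, heads=4):
    """Multi-head attention: split the model dim across 4 heads."""
    T, d = x.shape
    hd = d // heads
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    outs = []
    for h in range(heads):
        q, k, v = (M[:, h*hd:(h+1)*hd] for M in (Q, K, V))
        outs.append(softmax(q @ k.T / np.sqrt(hd)) @ v)
    return np.concatenate(outs, axis=-1) @ Wo

def transformer_block(x, Wq, Wk, Wv, Wo, W1, W2):
    x = x + mha(layer_norm(x), Wq, Wk, Wv, Wo)       # attention + residual
    x = x + np.maximum(0, layer_norm(x) @ W1) @ W2   # ReLU FFN + residual
    return x
```

Verifying at this granularity matters because the residual adds and normalization create long dependency chains where fixed-point rounding errors accumulate fastest.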
Mini-GPT on WZ-NPU
Finally, we ran Mini-GPT end-to-end on the NPU:
- Embedding layer
- Two transformer blocks
- Language-model head
Inference completed in 165,632 cycles, with bit-exact correctness across the entire pipeline.
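End to end, the verified pipeline has the shape sketched below. The function and parameter names are hypothetical, and the learned positional embedding and untied LM head are assumptions; the two-entry `blocks` list mirrors the two transformer blocks described above.

```python
import numpy as np

def mini_gpt_forward(token_ids, tok_emb, pos_emb, blocks, W_head):
    """Reference pipeline: embedding -> transformer blocks -> LM head.

    `blocks` is a list of callables mapping a (T, d) activation to (T, d);
    for the Mini-GPT above it would hold two transformer blocks.
    Returns per-token logits over the vocabulary.
    """
    T = len(token_ids)
    x = tok_emb[token_ids] + pos_emb[:T]  # embedding layer
    for blk in blocks:                    # two transformer blocks
        x = blk(x)
    return x @ W_head                     # language-model head
```

A golden model in this shape lets each stage's RTL output be diffed independently before comparing final logits, which is what makes a bit-exact claim over the whole pipeline tractable.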
Closing
From VAI’s weight-as-firmware concept to a fully verified Mini-GPT on WZ-NPU RTL, this project reinforces one principle:
In AI hardware, credibility comes from verification — not claims.
🔗 Read the full technical deep dive:
VAI (Virtual AI Inference) to Mini-GPT: RTL-Verified Transformer Execution on WZ-NPU