WIOWIZ Technologies

Originally published at wiowiz.com

From VAI to Mini-GPT: Verified Transformer Execution on Custom NPU Hardware

Running transformers in software is easy.
Proving they work correctly in custom hardware is the hard part.

At WIOWIZ, we set out to verify whether a transformer-based language model could execute end-to-end, bit-accurately, on our own NPU RTL — not just as a concept, but as real hardware logic.

📘 Full technical blog:
VAI (Virtual AI Inference) to Mini-GPT: RTL-Verified Transformer Execution on WZ-NPU


Why VAI (Virtual AI Inference)?

Most inference stacks treat model weights as transient data. Switching models means reloading weights from memory — expensive in both latency and bandwidth.

VAI flips this model.

  • Weights are loaded once into hardware weight banks
  • Weights remain resident across inferences
  • Model switching becomes a single-cycle select, not a reload

This only matters if it survives RTL verification.
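
To make the residency model concrete, here is a minimal Python sketch of the idea. The class and method names (`WeightBankFile`, `program`, `select`) are ours for illustration, not the WZ-NPU interface:

```python
import numpy as np

class WeightBankFile:
    """Toy model of VAI weight residency: program once, select per inference."""

    def __init__(self, num_banks: int):
        self.banks = [None] * num_banks   # resident weight sets, one per model
        self.active = 0                   # bank currently driving the datapath
        self.reloads = 0                  # re-programming events (should stay 0)

    def program(self, bank: int, weights: dict) -> None:
        """One-time, expensive load of a model's weights into a bank."""
        if self.banks[bank] is not None:
            self.reloads += 1             # a reload would break the VAI promise
        self.banks[bank] = weights

    def select(self, bank: int) -> None:
        """Model switch: in hardware this is a single-cycle mux select."""
        self.active = bank

    def read(self, name: str) -> np.ndarray:
        """Inference-side read from whichever bank is currently selected."""
        return self.banks[self.active][name]
```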


Verification Strategy (No Shortcuts)

Our goal was correctness, not benchmarks.

We verified in layers:

  • Persistent weight residency
  • Transformer attention math
  • Full transformer blocks
  • A complete language model end-to-end

Every stage was validated against RTL execution, not just software models.
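
At every layer, that validation reduces to the same step: capture the RTL outputs and compare them value-for-value against a golden reference. A minimal sketch of that comparison in Python, assuming fixed-point results exported as integer arrays (the function name and reporting format are illustrative):

```python
import numpy as np

def check_bit_exact(stage: str, rtl_out: np.ndarray, ref_out: np.ndarray) -> None:
    """Fail on the first stage whose RTL output deviates from the golden reference."""
    assert rtl_out.shape == ref_out.shape, f"[{stage}] shape mismatch"
    mismatches = np.flatnonzero(rtl_out.ravel() != ref_out.ravel())
    if mismatches.size:
        i = mismatches[0]
        raise AssertionError(
            f"[{stage}] {mismatches.size} mismatching values; first at index {i}: "
            f"rtl={rtl_out.ravel()[i]} ref={ref_out.ravel()[i]}"
        )
    print(f"[{stage}] bit-exact over {rtl_out.size} values")
```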


Persistent Weight Residency

Verification confirmed:

  • Multiple models coexist in separate weight banks
  • No reload after initial programming
  • One-cycle model selection

This validated the core VAI promise.
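
As a usage sketch, those three checks map onto assertions like the following, reusing the hypothetical `WeightBankFile` model from earlier (the weight values are placeholders):

```python
import numpy as np

banks = WeightBankFile(num_banks=4)

# One-time programming of two independent models into separate banks.
banks.program(0, {"wq": np.arange(16, dtype=np.int8).reshape(4, 4)})
banks.program(1, {"wq": -np.arange(16, dtype=np.int8).reshape(4, 4)})

# Switch between models repeatedly; no bank is ever re-programmed.
for bank in (0, 1, 0, 1):
    banks.select(bank)        # single-cycle select in hardware
    _ = banks.read("wq")      # inference reads the resident weights

assert banks.banks[0] is not None and banks.banks[1] is not None  # coexistence
assert banks.reloads == 0, "VAI violated: a bank was re-programmed"
print("residency checks passed")
```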


Verifying Transformer Attention

We verified the full attention pipeline:

  • Q × Kᵀ
  • Scaling and softmax
  • Weighted value accumulation

RTL outputs matched reference behavior bit-exactly.
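
The golden model here is standard scaled dot-product attention. A minimal NumPy reference covering the same three stages (the RTL computes a fixed-point equivalent; shapes are illustrative):

```python
import numpy as np

def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: softmax(Q·Kᵀ / √d) · V.

    q, k, v have shape (seq_len, d_head).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # Q × Kᵀ, scaled
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # weighted value accumulation
```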


Full Transformer Blocks

We scaled to complete transformer layers:

  • Multi-head attention (4 heads)
  • Feed-forward networks (FFN)
  • Residual and normalization paths

Cycle-accurate RTL execution continued to match the reference bit-exactly at this scale.
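
A reference-model sketch of one such block, building on the `attention()` function above. Parameter names, shapes, and the ReLU nonlinearity in the FFN are our assumptions for illustration, not the exact WZ-NPU configuration:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def transformer_block(x, p, num_heads=4):
    """One block: 4-head attention -> add & norm -> FFN -> add & norm.

    x: (seq_len, d_model); p: dict of weight arrays (names are ours).
    Uses the attention() reference defined above.
    """
    d_model = x.shape[-1]
    d_head = d_model // num_heads

    # Multi-head attention: project, split into heads, attend, re-combine.
    q, k, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
    heads = [
        attention(q[:, h * d_head:(h + 1) * d_head],
                  k[:, h * d_head:(h + 1) * d_head],
                  v[:, h * d_head:(h + 1) * d_head])
        for h in range(num_heads)
    ]
    attn = np.concatenate(heads, axis=-1) @ p["wo"]
    x = layer_norm(x + attn, p["g1"], p["b1"])     # residual + normalization

    # Feed-forward network (ReLU chosen here for simplicity).
    ffn = np.maximum(x @ p["w1"], 0.0) @ p["w2"]
    return layer_norm(x + ffn, p["g2"], p["b2"])   # residual + normalization
```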


Mini-GPT on WZ-NPU

Finally, we ran Mini-GPT end-to-end on the NPU:

  • Embedding layer
  • Two transformer blocks
  • Language-model head

Inference completed in 165,632 cycles, with bit-exact correctness across the entire pipeline.
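
Structurally, the verified pipeline has the shape of the golden-model sketch below, built on `transformer_block()` from the previous section. Parameter names are ours, and the cycle count above comes from RTL simulation, not from this Python:

```python
import numpy as np

def mini_gpt_forward(token_ids: np.ndarray, params: dict) -> np.ndarray:
    """Golden model of the pipeline: embedding -> two blocks -> LM head.

    token_ids: (seq_len,) ints; params: dict of weight arrays (names are ours).
    Returns next-token logits of shape (seq_len, vocab_size).
    """
    # Embedding layer: token embedding plus position embedding.
    x = params["tok_emb"][token_ids] + params["pos_emb"][: len(token_ids)]

    # Two transformer blocks, as in the verified Mini-GPT.
    for block_params in (params["block0"], params["block1"]):
        x = transformer_block(x, block_params)

    # Language-model head: project hidden states to vocabulary logits.
    return x @ params["lm_head"]
```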

Closing

From VAI’s weight-as-firmware concept to a fully verified Mini-GPT on WZ-NPU RTL, this project reinforces one principle:

In AI hardware, credibility comes from verification — not claims.

🔗 Read the full technical deep dive:
VAI (Virtual AI Inference) to Mini-GPT: RTL-Verified Transformer Execution on WZ-NPU
