The Problem
I read "Attention is All You Need" a couple of times and watched a few hours' worth of YouTube (thanks 3Blue1Brown and Andrej Karpathy!) to try and wrap my head around multi-head attention and transformers. However, it wasn't quite clicking.
So, I built a visualiser where I could watch it happen myself - and it turned out to be pretty useful! There's something about wading through all the error messages to get to a functional final product that is the most satisfying feeling in the world.
The Solution
Seeing how attention works in such a plain format was the 'A-Ha!' moment, and I thought it might help some others as well. So, I fleshed it out a little more, added some trained models and causal masking options, added some user-friendly features and bundled it together in a Streamlit app.
So now, you can try my Transformer Attention Visualiser yourself. Try it here →
What It Does
It lets you:
- Build sentences (Mad Libs style or custom input)
- Watch attention patterns form across 1-16 heads
- Toggle trained vs random weights (see the difference training makes)
- Enable causal masking (watch the matrix become triangular)
- Read explanations tailored to what you're currently seeing
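To make the causal-masking feature in the list above concrete: masking works by setting the attention scores for future positions to negative infinity before the softmax, so each token can only attend to itself and earlier tokens, and the weight matrix comes out lower-triangular. Here is a minimal NumPy sketch of single-head scaled dot-product attention with an optional causal mask; this is an illustrative toy, not the app's actual code, and the function names and shapes are my own choices:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, causal=False):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if causal:
        # mask out future positions: strict upper triangle -> -inf
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores, axis=-1)  # rows sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d = 4, 8
Q = rng.standard_normal((seq_len, d))
K = rng.standard_normal((seq_len, d))
V = rng.standard_normal((seq_len, d))

out, w = attention(Q, K, V, causal=True)
print(np.round(w, 2))  # lower-triangular weight matrix
```

With `causal=True`, every entry above the diagonal is exactly zero, which is the "watch the matrix become triangular" effect the visualiser shows.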
What's Next
I'm certainly not done learning, and I suspect that means I'm not done building hands-on apps for the various topics I'll encounter. If you're also a visual learner, or these help you in any way, let me know - especially if there are things missing or under-explained! I'd love to hear about it, and learn something in the process of getting it online.
If anybody knows how to get it to load faster on the first cold open, please reach out. I may be getting better at ML concepts, but this has stumped me for some time!
The GitHub Repo is below - this won't be the last you hear of me!
Built this in Week 3 of my transition from operations manager to ML engineer. One concept at a time, one visualization at a time.