In the previous article, we explored the concept of self-attention in transformers. In this article, we will go deeper into how the comparisons between words are actually performed.
Building Query and Key Values
Let’s go back to our example.
We have already added positional encoding to the words “Let’s” and “go”.
Creating Query Values
The first step is to multiply the position-encoded values for the word “Let’s” by a set of weights and add the results together, which gives us a single value.
Next, we repeat the same process using a different set of weights, which gives us another value (for example, 3.7).
We do this twice because we started with two position-encoded values representing the word “Let’s”.
These resulting values together represent “Let’s” in a new form.
In transformer terminology, these are called query values.
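The steps above can be sketched in a few lines of NumPy. All of the numbers here are made up for illustration: the position-encoded values and the weights are hypothetical, and in a real transformer the weights would be learned during training. Each of the two "sets of weights" from the text becomes one column of a small weight matrix, so a single matrix multiplication produces both query values at once.

```python
import numpy as np

# Hypothetical position-encoded values for the word "Let's"
# (two numbers, as in the text).
lets_encoded = np.array([1.87, 0.65])

# Hypothetical query weights: each column is one "set of weights",
# so two columns give us two query values.
W_q = np.array([[2.0, 0.4],
                [1.1, 3.2]])

# Multiply the encoded values by each set of weights and sum:
# this is exactly what the matrix product does.
query_lets = lets_encoded @ W_q
print(query_lets)  # two query values representing "Let's"
```

The same multiply-and-sum happens once per column, which is why we end up with two query values for the two position-encoded inputs.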
Creating Key Values
Now, we use these query values to measure similarity with other words, such as “go”.
To do this, we first create a new set of values for each word, similar to how we created the query values.
- We generate two values for “Let’s”
- And two values for “go”
These new values are called key values.
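Creating the key values looks just like creating the query values, only with a different weight matrix and applied to every word. Again, the encoded values and the key weights below are made-up placeholders, not real learned parameters:

```python
import numpy as np

# Hypothetical position-encoded values for each word.
encoded = {
    "Let's": np.array([1.87, 0.65]),
    "go":    np.array([-0.30, 1.24]),
}

# Hypothetical key weights (a different matrix from the query weights).
W_k = np.array([[0.9, -0.5],
                [0.3,  1.8]])

# Two key values per word, produced the same way as the query values.
keys = {word: vec @ W_k for word, vec in encoded.items()}
print(keys)
```

Note that the key weights are shared across words: “Let’s” and “go” each get their own key values only because their position-encoded inputs differ.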
What’s Next?
We will use these key values along with the query values to calculate how similar “Let’s” is to “go”.
We will explore how this similarity is calculated in the next article.
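As a small preview of that similarity calculation, transformers compare a query with a key using the dot product: multiply the values pairwise and add them up. The numbers below are made up purely to show the mechanics:

```python
import numpy as np

q_lets = np.array([4.46, 2.83])  # hypothetical query values for "Let's"
k_go = np.array([0.10, 2.38])    # hypothetical key values for "go"

# Dot product: pairwise multiply, then sum. A larger value means
# the query and key point in more similar directions.
similarity = np.dot(q_lets, k_go)
print(similarity)
```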
Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.
Just run:
ipm install repo-name
… and you’re done! 🚀