
Rijul Rajesh
Understanding Attention Mechanisms – Part 3: From Cosine Similarity to Dot Product

In the previous article, we compared encoder and decoder outputs. In this article, we will walk through the math behind that comparison and see how it can be simplified further.

The output values for the two LSTM cells in the encoder for the word "Let’s" are -0.76 and 0.75.

The output values from the two LSTM cells in the decoder for the <EOS> token are 0.91 and 0.38.

We can represent this as:

A = Encoder
B = Decoder

              Cell #1    Cell #2
A (encoder)    -0.76       0.75
B (decoder)     0.91       0.38

Now, we plug these values into the cosine similarity equation:

cos(θ) = (A · B) / (‖A‖ × ‖B‖)

The numerator works out to -0.41, and the denominator to roughly 1.07 × 0.99 ≈ 1.05, which gives us a result of about -0.39.
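As a quick sanity check, here is a minimal Python sketch of that calculation, using the encoder and decoder values from the table above:

```python
import math

# Encoder output for "Let's" (A) and decoder output for <EOS> (B),
# taken from the example above.
a = [-0.76, 0.75]
b = [0.91, 0.38]

dot = sum(x * y for x, y in zip(a, b))        # numerator: A · B
norm_a = math.sqrt(sum(x * x for x in a))     # ‖A‖
norm_b = math.sqrt(sum(y * y for y in b))     # ‖B‖

cosine = dot / (norm_a * norm_b)
print(round(cosine, 2))  # → -0.39
```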

To simplify this further, a common approach is to compute only the numerator.

The denominator only normalizes the result into the range -1 to 1, so in some cases we can ignore it for simplicity.

Since every vector here has the same fixed number of cells, their magnitudes stay comparable, and this simplification works well. The numerator on its own is known as the dot product.

When we calculate only the dot product, we get:

(-0.76 × 0.91) + (0.75 × 0.38) ≈ -0.41
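In code, dropping the denominator reduces the similarity score to a single line, a sketch using the same example values:

```python
# Same vectors as before: encoder output (A) and decoder output (B).
a = [-0.76, 0.75]
b = [0.91, 0.38]

# Dot product: the cosine-similarity numerator, with no normalization.
score = sum(x * y for x, y in zip(a, b))
print(round(score, 2))  # → -0.41
```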

We will explore this further in the next article.


Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.

Just run:

ipm install repo-name

… and you’re done! 🚀


🔗 Explore Installerpedia here
