In the previous article, we explored the main idea of attention and the modifications it requires in an encoder–decoder model. Now, we will explore that idea further.
An encoder–decoder model can be as simple as an embedding layer attached to a single LSTM. If we want a more advanced encoder, we can add additional LSTM cells.
To start, we initialize the long-term memory (the cell state) and the short-term memory (the hidden state) of the encoder LSTMs with zeros.
If our input sentence, which we want to translate into Spanish, is "Let's go", we feed the token for "Let's" into the embedding layer, unroll the network one step, and then feed the token for "go" into the embedding layer.
This process creates the context vector, which we use to initialize a separate set of LSTM cells in the decoder.
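To make the unrolling concrete, here is a minimal NumPy sketch of the encoder. The embedding table, the toy dimensions, and the random weights are all hypothetical illustrations, not values from the article; the point is only the shape of the computation: start both memories at zero, step the LSTM once per input token, and treat the final pair of memories as the context vector.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W, U, b stack the parameters for the input,
    forget, cell-candidate, and output gates."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = 1 / (1 + np.exp(-z[0:n]))        # input gate
    f = 1 / (1 + np.exp(-z[n:2*n]))      # forget gate
    g = np.tanh(z[2*n:3*n])              # candidate cell state
    o = 1 / (1 + np.exp(-z[3*n:4*n]))    # output gate
    c = f * c + i * g                    # long-term memory (cell state)
    h = o * np.tanh(c)                   # short-term memory (hidden state)
    return h, c

rng = np.random.default_rng(0)
emb_dim, hid = 4, 3                      # toy sizes, chosen for illustration
embedding = {"Let's": rng.normal(size=emb_dim),
             "go":    rng.normal(size=emb_dim)}
W = rng.normal(size=(4 * hid, emb_dim))
U = rng.normal(size=(4 * hid, hid))
b = np.zeros(4 * hid)

# Encoder: both memories start at zero, then we unroll over the tokens.
h, c = np.zeros(hid), np.zeros(hid)
for token in ["Let's", "go"]:
    h, c = lstm_step(embedding[token], h, c, W, U, b)

# The final (h, c) pair is the context vector handed to the decoder.
context = (h, c)
```

In a real model the weights are learned and the decoder's LSTMs are initialized from `context` instead of from zeros.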
All of the input is compressed into the context vector.
But the idea of attention is that each step in the decoder should have direct access to the inputs.
So, let’s understand how attention connects the inputs to each step of the decoder.
In this example, the first thing attention does is determine how similar the encoder LSTM outputs are to the decoder LSTM output at each decoding step.
In other words, we compute a similarity score between the LSTM outputs (the short-term memory or hidden states) from the encoder and the decoder.
For instance, we calculate a similarity score between:
- The LSTM output from the first step in the encoder, and
- The LSTM output from the first step in the decoder
We also calculate a similarity score between:
- The LSTM output from the second step in the encoder, and
- The LSTM output from the first step in the decoder
There are various ways to calculate this similarity.
One simple method is cosine similarity, which measures how similar two vectors of numbers (here, the hidden states representing words) are, regardless of their lengths.
We will explore this further in the next article.