
Rijul Rajesh

Understanding Transformers Part 13: Introducing Encoder–Decoder Attention

In the previous article, we built up the decoder layers and stopped at the question of how the decoder relates the input sentence to the output sentence.

This brings us to the concept of Encoder–Decoder Attention.

Why Encoder–Decoder Attention Matters

Consider the input sentence:

“Don’t eat the delicious looking and smelling pizza.”

When translating this sentence, it is very important to keep track of the word “Don’t”.

If the translation ignores this word, we might end up with:

“Eat the delicious looking and smelling pizza.”

These two sentences have completely opposite meanings.

Key Idea

Because of this, the decoder must pay close attention to the important words in the input.

This is where encoder–decoder attention comes in.

It allows the decoder to focus on the most relevant parts of the input sentence while generating the output.

In simple terms, encoder–decoder attention helps the decoder keep track of significant words in the input.
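To make this concrete, here is a minimal NumPy sketch of encoder–decoder (cross) attention. Everything here is illustrative: the dimensions are toy-sized, the projection weights are random rather than learned, and the function names (`cross_attention`, `softmax`) are my own. The key point it demonstrates is that the queries come from the decoder while the keys and values come from the encoder, so each decoder position produces a weight distribution over the input words:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states, d_k=4):
    # Random projections stand in for learned weight matrices.
    rng = np.random.default_rng(0)
    W_q = rng.normal(size=(decoder_states.shape[-1], d_k))
    W_k = rng.normal(size=(encoder_states.shape[-1], d_k))
    W_v = rng.normal(size=(encoder_states.shape[-1], d_k))

    Q = decoder_states @ W_q   # queries: from the decoder
    K = encoder_states @ W_k   # keys:    from the encoder
    V = encoder_states @ W_v   # values:  from the encoder

    # Each decoder position scores every encoder position.
    scores = Q @ K.T / np.sqrt(d_k)        # shape: (dec_len, enc_len)
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ V, weights

# Toy example: 8 input tokens (e.g. "Don't eat the delicious ... pizza"),
# 3 output tokens generated so far.
enc = np.random.default_rng(1).normal(size=(8, 4))
dec = np.random.default_rng(2).normal(size=(3, 4))
out, attn = cross_attention(dec, enc)
print(out.shape)          # (3, 4)
print(attn.shape)         # (3, 8)
```

Each row of `attn` tells us how much each partially generated output word attends to each input word; a high weight on the position of "Don't" is exactly what lets the decoder preserve the negation.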

Updated Transformer Structure

With this idea, our current encoder–decoder structure looks like this:

We will build on this and explore the details in the next article.


Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.

Just run:

ipm install repo-name

… and you’re done! 🚀


🔗 Explore Installerpedia here
