Understanding Transformers Part 4: Introduction to Self-Attention

Rijul Rajesh

In the previous article, we learned how word embeddings and positional encoding are combined to represent both meaning and position.

Let’s return to our running example of translating the English sentence “Let’s go”. For each word, we compute its positional encoding and add it to the word embedding, producing the input vectors the transformer actually processes.
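This combination can be sketched in a few lines of NumPy. The embedding values below are made up purely for illustration (a real model learns them), and the positional encoding is the standard sinusoidal scheme:

```python
import numpy as np

# Toy 4-dimensional embeddings for the two tokens "Let's" and "go".
# (Illustrative values only; a real model learns these.)
embeddings = np.array([
    [1.0, 0.5, -0.3, 0.8],   # "Let's"
    [0.2, -0.7, 0.9, 0.1],   # "go"
])

def positional_encoding(num_positions, d_model):
    """Sinusoidal positional encoding: sine on even dims, cosine on odd dims."""
    pos = np.arange(num_positions)[:, None]   # (positions, 1)
    i = np.arange(d_model)[None, :]           # (1, dims)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])
    pe[:, 1::2] = np.cos(angle[:, 1::2])
    return pe

pe = positional_encoding(2, 4)
inputs = embeddings + pe   # what the attention layers actually see
print(inputs.shape)        # (2, 4): two tokens, four dimensions each
```

Note that position 0 always encodes to `[0, 1, 0, 1, ...]` (sine of 0, cosine of 0), so the first token's vector shifts by that fixed pattern.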

Understanding Relationships Between Words

Now let’s explore how a transformer keeps track of relationships between words.

Consider the sentence:

“The pizza came out of the oven and it tasted good.”

The word “it” could refer to either “pizza” or “oven”.

It is important that the transformer correctly associates “it” with “pizza”.

Self-Attention

Transformers use a mechanism called self-attention to handle this.

Self-attention helps the model determine how each word relates to every other word in the sentence, including itself.

Once these relationships are calculated as similarity scores, they are used to build each word’s new representation.

For example, if “it” is more strongly associated with “pizza”, then the similarity score for pizza will have a larger impact on how “it” is encoded by the transformer.
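This idea can be sketched as scaled dot-product self-attention, the standard formulation of the mechanism. In this minimal NumPy version, the query/key/value matrices `Wq`, `Wk`, `Wv` are random stand-ins for weights a real model would learn:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sentence.

    X: (tokens, d_model) input vectors (embedding + position).
    Each token's output is a weighted mix of every token's value
    vector, weighted by how strongly the two tokens relate.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise similarity
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                   # 3 tokens, 4-dim vectors
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(weights.round(2))   # row i: how much token i attends to each token
```

Row *i* of `weights` is exactly the set of similarity scores described above: if the row for “it” puts most of its weight on “pizza”, then pizza’s value vector dominates the new encoding of “it”.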

We have now covered the basic idea of self-attention. We will explore it in more detail in the next article.


Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.

Just run:

ipm install repo-name

… and you’re done! 🚀


🔗 Explore Installerpedia here
