In the previous article, we learned how positional encoding is generated using sine and cosine waves. Now we will apply those values to each word in the sentence.
Applying Positional Encoding to All Words
To get the positional values for the second word, we read the y-axis value of each sine and cosine curve at the x-axis position of the second word.
For the third word, we follow the same process at the third position.
Positional Values for Each Word
By doing this for every word, we get a set of positional values for each one:
Each word now has its own unique sequence of positional values.
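The process above can be sketched in NumPy. This is a minimal implementation of the standard sinusoidal scheme: even dimensions take the sine of the position-scaled angle, odd dimensions the cosine, so each position gets its own unique vector of values.

```python
import numpy as np

def positional_encoding(num_positions, d_model):
    """Sinusoidal positional encoding: one row of values per position."""
    pos = np.arange(num_positions)[:, None]          # shape (positions, 1)
    i = np.arange(d_model)[None, :]                  # shape (1, d_model)
    # Pairs of dimensions (0,1), (2,3), ... share the same frequency.
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    # Even dimensions use sine, odd dimensions use cosine.
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

# Positional values for a 3-word sentence with 4-dimensional embeddings.
pe = positional_encoding(3, 4)
```

For position 0 the sines are 0 and the cosines are 1, and every later position yields a different mix of values.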
Combining Embeddings with Positional Encoding
The next step is to add these positional values to the word embeddings.
After this addition, each word embedding now contains both:
- semantic meaning (from embeddings)
- positional information (from positional encoding)
So for the sentence:
"Jack eats burger"
we now have embeddings that also capture word order.
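As a concrete sketch, the addition is simple element-wise arithmetic. The embedding values below are made up for illustration (real embeddings come from a trained model), and `positional_encoding` is the usual sinusoidal formula:

```python
import numpy as np

def positional_encoding(num_positions, d_model):
    pos = np.arange(num_positions)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

# Hypothetical 4-dimensional embeddings for "Jack eats burger"
# (illustrative values only, not from a real model).
embeddings = np.array([
    [0.2, 0.1, 0.4, 0.3],   # "Jack"
    [0.5, 0.7, 0.1, 0.2],   # "eats"
    [0.9, 0.3, 0.6, 0.1],   # "burger"
])

# Element-wise addition: semantics + position in one vector per word.
final = embeddings + positional_encoding(3, 4)
```

Each row of `final` now encodes both what the word means and where it sits in the sentence.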
What Happens When We Change Word Order?
Let us reverse the sentence:
"Burger eats Jack"
- The embeddings for the first and third words get swapped.
- However, the positional values for positions 1, 2, and 3 remain the same.
When we add the positional values to the embeddings again:
- The final vectors for the first and third words become different from before.
This is how positional encoding helps transformers understand word order.
Even if the same words are used, changing their positions results in different final representations.
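We can verify this effect directly. Reusing the same hypothetical embeddings and the sinusoidal encoding, the word "Jack" gets a different final vector at position 1 than at position 3, even though its embedding never changes:

```python
import numpy as np

def positional_encoding(num_positions, d_model):
    pos = np.arange(num_positions)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

pe = positional_encoding(3, 4)

# Hypothetical embeddings (illustrative values only).
jack   = np.array([0.2, 0.1, 0.4, 0.3])
eats   = np.array([0.5, 0.7, 0.1, 0.2])
burger = np.array([0.9, 0.3, 0.6, 0.1])

original  = np.stack([jack, eats, burger]) + pe   # "Jack eats burger"
reordered = np.stack([burger, eats, jack]) + pe   # "Burger eats Jack"

# Same word embedding, different position => different final vector.
jack_changed = not np.allclose(original[0], reordered[2])
```

`original[0]` and `reordered[2]` both start from the "Jack" embedding, but the positional values added to them differ, so the transformer sees two distinct representations.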
We will explore further in the next article.
Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.
Just run:
ipm install repo-name
… and you're done!