In the previous article, we converted words into embeddings. Now let’s see how transformers add position to those numbers.
In a transformer, the numbers that represent word order come from a family of sine and cosine waves. Each curve generates the position values for one specific dimension of the word embedding.
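For reference, these curves follow the sinusoidal scheme from the original Transformer paper ("Attention Is All You Need"): even embedding dimensions read their value from a sine wave and odd dimensions from a cosine wave, with each dimension pair using a different frequency:

```
PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
```

Here `pos` is the word's position in the sentence and `d_model` is the embedding size.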
Understanding the Idea
Think of each embedding dimension as getting its value from a different wave.
For example:
- The green curve provides the positional values for the first embedding dimension of every word. For the first word in the sentence, which lies at the far left of the graph (position 0 on the x-axis), the value read from the green curve is 0 (the y-axis value at that position).
- The orange curve provides the positional values for the second embedding dimension. At the same position (first word), the value from the orange curve is 1.
- The blue curve provides the positional values for the third embedding dimension. For the first word, the value is 0.
- The red curve provides the positional values for the fourth embedding dimension. For the first word, the value is 1.
Final Positional Encoding for the First Word
By combining the values from all four curves, we get the positional encoding vector for the first word: [0, 1, 0, 1].
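To make this concrete, here is a minimal sketch that reproduces the four values above, assuming the standard sinusoidal scheme (the function name `positional_encoding` is my own, not from the article):

```python
import math

def positional_encoding(pos, d_model=4):
    """Sinusoidal positional encoding for a single position.

    Even dimensions sample a sine curve, odd dimensions a cosine curve;
    each dimension pair uses a different frequency, so every embedding
    dimension gets its value from its own wave.
    """
    values = []
    for i in range(d_model):
        # The wave's frequency is set by the dimension-pair index (i // 2).
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        values.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return values

# First word (position 0): the four curves give 0, 1, 0, 1.
print(positional_encoding(0))  # -> [0.0, 1.0, 0.0, 1.0]
```

Calling the same function with positions 1, 2, … produces the encodings for the remaining words.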
We will apply the same process to the remaining words in the next article.
Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.
Just run:
ipm install repo-name
… and you’re done! 🚀