Rijul Rajesh

Understanding Transformers Part 17: Generating the Output Word

In the previous article, we set up the residual connections to get the final output values from the decoder.

In this article, we begin by passing these two output values through a fully connected layer.

This layer has:

  • One input for each value representing the current token (in this case, 2 inputs)
  • One output for each word in the output vocabulary

Since our vocabulary has 4 tokens, this gives us 4 output values.
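The layer described above can be sketched in a few lines of plain Python. All the numbers here (the two residual values, the weights, and the biases) are made-up illustrative values, not the actual trained parameters:

```python
# Hypothetical decoder output for the current token: the two residual values.
decoder_output = [0.6, -0.2]

# Fully connected layer: 2 inputs -> 4 outputs (one per vocabulary word).
# These weights and biases are invented for illustration.
weights = [
    [ 1.1, -0.3, 0.4, 0.2],   # connections from input 0
    [-0.5,  0.9, 0.1, 0.7],   # connections from input 1
]
bias = [0.0, 0.1, -0.2, 0.05]

# Each output value (logit) is a weighted sum of the inputs plus a bias.
logits = [
    sum(x * row[j] for x, row in zip(decoder_output, weights)) + bias[j]
    for j in range(4)
]
print(logits)  # four raw scores, one per word in the vocabulary
```

These four raw scores are not yet probabilities; that is what the softmax in the next step is for.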

Selecting the Output Word

Next, we pass these 4 output values through a softmax function.

This allows us to select the most likely output word, which in this case is “vamos”.

So far, the translation is correct. However, the process does not stop here.
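A minimal sketch of this selection step, assuming a hypothetical 4-word vocabulary (only "vamos" and the end-of-sentence token come from the article; the other entries and the logit values are placeholders):

```python
import math

# Hypothetical vocabulary and logits from the fully connected layer.
vocab = ["<EOS>", "ir", "vamos", "y"]
logits = [0.1, -0.4, 2.3, 0.5]

# Softmax: exponentiate each logit, then normalize so they sum to 1.
exps = [math.exp(v) for v in logits]
total = sum(exps)
probs = [e / total for e in exps]

# Select the word with the highest probability.
best = max(range(len(probs)), key=probs.__getitem__)
print(vocab[best])  # "vamos"
```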

Continuing the Decoding Process

The decoder continues generating words until it produces an end-of-sentence (EOS) token, which indicates that the sentence is complete.

To generate the next word, we feed the predicted word back into the decoder.

We will explore this step in the next article.


Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.

Just run:

ipm install repo-name

… and you’re done! 🚀


🔗 Explore Installerpedia here
