In the previous article, we ended with the problem of needing more weights and biases to fit our data.
So in this article, we will add one more LSTM layer to the encoder.
What this means is that the output values (the short-term memories, or the hidden states) from the unrolled LSTM units in the first layer are used as the inputs to the unrolled LSTM units in the second layer.
Just as both embedding values are fed into both LSTM cells in the first layer, the outputs (the short-term memories) from each first-layer cell are fed into both LSTM cells in the second layer.
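The article doesn't show framework code, but as a rough sketch, here is how this stacking looks in PyTorch. The sizes below (2 embedding values per token, 2 cells per layer) match the toy model described in the text; everything else is an illustrative assumption:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes: 2 embedding values per token, 2 LSTM cells per layer.
embedding_dim = 2
hidden_size = 2

# num_layers=2 stacks a second LSTM layer on top of the first: the hidden
# state (short-term memory) that layer 1 emits at each step becomes the
# input to layer 2 at that same step.
encoder_lstm = nn.LSTM(input_size=embedding_dim,
                       hidden_size=hidden_size,
                       num_layers=2)

# Stand-in embeddings for the two input tokens "Let's" and "go":
# shape (seq_len=2, batch=1, embedding_dim=2).
embeddings = torch.randn(2, 1, embedding_dim)

output, (h_n, c_n) = encoder_lstm(embeddings)
# h_n: final short-term memories, one per layer -> shape (2, 1, 2)
# c_n: final long-term memories, one per layer  -> shape (2, 1, 2)
```

Note that `output` contains only the top layer's hidden states at every step, while `h_n` and `c_n` hold the final memories from *both* layers.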
Initializing the Memories
The only thing left to do is initialize the long-term and short-term memories, which are set to zero before the first input token is processed.
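In code, this amounts to starting both memories at zero. A minimal sketch in PyTorch, passing the initial state explicitly (when no initial state is given, `nn.LSTM` defaults to zeros, so this is equivalent):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_size, num_layers, batch = 2, 2, 1

lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, num_layers=num_layers)

# Both memories start at zero: one (batch, hidden_size) state per layer.
h0 = torch.zeros(num_layers, batch, hidden_size)  # short-term memories
c0 = torch.zeros(num_layers, batch, hidden_size)  # long-term memories

x = torch.randn(2, batch, 2)  # stand-in embeddings for two tokens
out_explicit, _ = lstm(x, (h0, c0))
out_default, _ = lstm(x)  # no state given: PyTorch initializes to zeros
```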
So with this, we are done creating the encoder part of the Encoder–Decoder model.
The Encoder Output
In this example, we have two layers of LSTMs, with two LSTM cells per layer.
The encoder encodes the input sentence “Let’s go” into a collection of long-term and short-term memories (cell states and hidden states).
The final long-term and short-term memories from both layers of the LSTM cells in the encoder are called the context vector.
So the encoder encodes the input sentence “Let’s go” into the context vector.
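Decoding is the next article's topic, but a brief sketch shows what "handing the context vector to the decoder" means in practice: the decoder's LSTM starts from the encoder's final memories instead of zeros. The sizes and the decoder setup here are illustrative assumptions, not the article's exact model:

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
hidden_size, num_layers = 2, 2

encoder = nn.LSTM(input_size=2, hidden_size=hidden_size, num_layers=num_layers)
decoder = nn.LSTM(input_size=2, hidden_size=hidden_size, num_layers=num_layers)

src = torch.randn(2, 1, 2)    # stand-in embeddings for "Let's" and "go"
_, context = encoder(src)     # context = (h_n, c_n): the context vector

# The decoder is initialized with the encoder's final short-term and
# long-term memories rather than with zeros.
tgt_start = torch.randn(1, 1, 2)  # stand-in embedding for the start token
dec_out, _ = decoder(tgt_start, context)
```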
Now we need to decode this context vector.
We will explore this in the next article.
Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.
Just run:
ipm install repo-name
… and you’re done! 🚀


