DEV Community

Chandan Kumar

LSTMs

The idea behind LSTMs (Long Short-Term Memory networks) is closely related to how we humans understand context when we talk: we interpret each word in light of what came before it. A traditional feedforward neural network has no way to store this context between inputs. Recurrent neural networks (RNNs) were invented to tackle this problem. They contain loops that feed information from one step back into the next, which allows information to persist.

A simple RNN looks like this:
(Figure: A simple RNN)

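At each time step it takes an input x and produces a hidden state h. That hidden state is also fed back into the RNN block as an extra input at the next time step, which gives the network the ability to connect previous information to the current task. However, simple RNNs struggle with longer sequences of text, in large part because the training signal (the gradient) tends to vanish or explode as it flows back through many time steps.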
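The loop can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the common formulation h_t = tanh(W_x·x_t + W_h·h_{t-1} + b); the function names (rnn_step, run_rnn) and weight names (W_x, W_h, b) are hypothetical and not from the original post.

```python
import math


def rnn_step(x, h_prev, W_x, W_h, b):
    """One time step of a simple RNN: h_t = tanh(W_x @ x_t + W_h @ h_{t-1} + b).

    Vectors are plain lists; matrices are lists of rows.
    """
    h_new = []
    for i in range(len(b)):
        # Weighted sum of the current input and the previous hidden state
        z = sum(W_x[i][j] * x[j] for j in range(len(x)))
        z += sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))
        z += b[i]
        h_new.append(math.tanh(z))
    return h_new


def run_rnn(xs, W_x, W_h, b):
    """Feed a whole sequence through the RNN, carrying the hidden state forward."""
    h = [0.0] * len(b)  # initial hidden state is all zeros
    for x in xs:
        # The hidden state loops back in as an input at the next step
        h = rnn_step(x, h, W_x, W_h, b)
    return h


if __name__ == "__main__":
    sequence = [[1.0], [0.5], [-0.2]]  # three time steps, one feature each
    W_x = [[0.8], [-0.3]]              # hidden size 2, input size 1 (made-up values)
    W_h = [[0.1, 0.0], [0.0, 0.1]]
    print(run_rnn(sequence, W_x, W_h, [0.0, 0.0]))
```

Because h is squashed through tanh and multiplied by W_h again at every step, the influence of early inputs on later states can shrink quickly, which is the weakness LSTMs were designed to address.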

LSTMs are a special kind of RNN designed to overcome this shortcoming: a simple RNN works well for short sequences of information, but an LSTM can also handle long ones.

LSTMs are capable of learning long-term dependencies. Their key component is the Cell state, the horizontal green line running through the picture below. It carries information about the earlier parts of the input sequence from one time step to the next.

(Figure: An LSTM cell, with the Cell state drawn as a horizontal green line)

An LSTM processes each time step in the following 4 steps:

Step 1 - Throw away what's not needed

The incoming Cell state contains information about the whole input sequence up to that moment. The first task is to decide what information to throw away from the Cell state. This decision is made by a sigmoid layer called the Forget gate. It looks at the previous hidden state and the current input and outputs a number between 0 and 1 for each entry in the Cell state. Values close to 0 mark information that should be forgotten, while values close to 1 mark information that should be kept.
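To make the Forget gate concrete, here is a minimal pure-Python sketch. It assumes the standard formulation f_t = sigmoid(W_f·[h_{t-1}, x_t] + b_f), where [h_{t-1}, x_t] is the previous hidden state and the current input stacked into one vector. The helper names (sigmoid, matvec, forget_gate, apply_forget) and the weights W_f, b_f are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(z):
    # Logistic function; output lies strictly between 0 and 1 (fine for moderate z)
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v (a list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def forget_gate(h_prev, x, W_f, b_f):
    """f_t = sigmoid(W_f @ [h_{t-1}, x_t] + b_f), one value per Cell state entry."""
    concat = h_prev + x  # list concatenation stacks h_{t-1} and x_t
    return [sigmoid(z + b) for z, b in zip(matvec(W_f, concat), b_f)]


def apply_forget(c_prev, f):
    """Scale the old Cell state element-wise; entries with f near 0 are mostly erased."""
    return [c * fi for c, fi in zip(c_prev, f)]


if __name__ == "__main__":
    f = forget_gate([0.0], [1.0], [[0.0, 0.0]], [0.0])
    print(f)                       # all-zero weights give exactly 0.5
    print(apply_forget([2.0], f))  # half of the old value survives
```

With all weights at zero every forget value is exactly 0.5, so half of each Cell state entry survives; a large negative bias pushes the values toward 0 and erases almost everything.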
 
 
Step 2 - Compute and keep what's needed

This is done by the part called the Input gate.

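It has two parts: a sigmoid layer, which decides which values of the Cell state will be updated, and a tanh layer, which creates a vector of new candidate values that could be added to the Cell state. Together they determine what new information is written to the Cell state.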
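A matching sketch of Step 2, under the same assumptions (standard LSTM equations, plain Python lists, hypothetical names W_i, b_i, W_c, b_c): the sigmoid layer produces i_t, and the tanh layer produces the candidate values, often written C~_t.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v (a list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def input_gate(h_prev, x, W_i, b_i, W_c, b_c):
    """Return (i_t, c_tilde) for the Input gate.

    i_t     = sigmoid(W_i @ [h_{t-1}, x_t] + b_i)  # which entries to update, in (0, 1)
    c_tilde = tanh(W_c @ [h_{t-1}, x_t] + b_c)     # candidate values, in (-1, 1)
    """
    concat = h_prev + x  # stack previous hidden state and current input
    i_t = [sigmoid(z + b) for z, b in zip(matvec(W_i, concat), b_i)]
    c_tilde = [math.tanh(z + b) for z, b in zip(matvec(W_c, concat), b_c)]
    return i_t, c_tilde


if __name__ == "__main__":
    # Hidden size 1, input size 1, made-up weights
    print(input_gate([0.0], [1.0], [[0.0, 0.0]], [0.0], [[0.0, 2.0]], [0.0]))
```

Splitting the work this way lets the network decide separately *whether* to write to an entry (i_t) and *what* to write (c_tilde).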
 
 
Step 3 - Update the Cell state

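Now it is time to update the old Cell state to a new Cell state using the values computed in the previous steps: multiply the old Cell state by the Forget gate's output, which drops the information marked for forgetting, then add the Input gate's output multiplied by the candidate values.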
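Step 3 is plain element-wise arithmetic. Assuming the forget values f, input values i, and candidates c_tilde come from the two previous steps, a minimal sketch of C_t = f_t * C_{t-1} + i_t * C~_t (the function name is hypothetical) is:

```python
def update_cell_state(c_prev, f, i, c_tilde):
    """C_t = f_t * C_{t-1} + i_t * C~_t, computed entry by entry."""
    return [fk * ck + ik * ctk for fk, ck, ik, ctk in zip(f, c_prev, i, c_tilde)]


if __name__ == "__main__":
    # Entry 0: keep half of 2.0 and add all of 0.5; entry 1: keep a quarter of -4.0, add nothing
    print(update_cell_state([2.0, -4.0], [0.5, 0.25], [1.0, 0.0], [0.5, 0.9]))  # -> [1.5, -1.0]
```

Because the new Cell state is a sum rather than a value repeatedly squashed through tanh, information can pass through many time steps almost unchanged when f stays near 1. This additive path is commonly cited as the reason LSTMs cope with long sequences better than simple RNNs.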
 
 
Step 4 - Compute what to output

Next, a sigmoid layer decides which parts of the new Cell state to output. This happens at the Output gate.
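The new Cell state is then put through tanh, which squashes its values to between -1 and 1, and multiplied by the output of the sigmoid layer, so that only the selected parts are output. The result is the new hidden state h, which is also passed on to the next time step.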
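Putting the four steps together, here is a minimal, dependency-free sketch of one LSTM time step. It assumes the standard formulation in which every gate reads the concatenation [h_{t-1}, x_t] and has its own weights. The function names and the weight dictionary keys (W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o) are hypothetical labels for illustration; real frameworks organize and initialize their weights differently.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v (a list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gate(W, b, v, act):
    """One layer: apply act(W @ v + b) element-wise."""
    return [act(z + bk) for z, bk in zip(matvec(W, v), b)]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; returns the new (h, c).

    p is a dict of hypothetical weights, each W_* acting on [h_{t-1}, x_t].
    """
    v = h_prev + x
    f = gate(p["W_f"], p["b_f"], v, sigmoid)            # Step 1: Forget gate
    i = gate(p["W_i"], p["b_i"], v, sigmoid)            # Step 2: Input gate ...
    c_tilde = gate(p["W_c"], p["b_c"], v, math.tanh)    # ... and candidate values
    c = [fk * ck + ik * ctk for fk, ck, ik, ctk in zip(f, c_prev, i, c_tilde)]  # Step 3
    o = gate(p["W_o"], p["b_o"], v, sigmoid)            # Step 4: Output gate
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]    # keep only the selected parts
    return h, c


def run_lstm(xs, p, hidden_size):
    """Run a sequence through the cell, starting from zero hidden and Cell states."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x in xs:
        h, c = lstm_step(x, h, c, p)
    return h, c


if __name__ == "__main__":
    # Hidden size 1, input size 1: every W_* is 1x2, every b_* has one entry (made-up values)
    params = {
        "W_f": [[0.5, 0.1]], "b_f": [1.0],
        "W_i": [[0.4, 0.3]], "b_i": [0.0],
        "W_c": [[0.2, 0.9]], "b_c": [0.0],
        "W_o": [[0.3, 0.6]], "b_o": [0.0],
    }
    print(run_lstm([[1.0], [0.5], [-0.2]], params, hidden_size=1))
```

Keeping separate weights per gate mirrors the four steps above one to one; libraries usually stack these matrices into one for speed, but the arithmetic per step is the same idea.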

I hope this post gives you an idea of how LSTMs work!
