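"not good" and "good not" don't mean the same thing: word order and history matter. A plain feedforward network that just sums or averages the word vectors sees the same input for both and can't tell them apart. An RNN can, because it carries a hidden state forward from one token to the next. Build a sequence and watch the memory update.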
🔁 Step through a sequence: https://dev48v.infy.uk/dl/day10-rnn.html
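To see why order trips up an order-blind model, here is a minimal sketch of a bag-of-words input built by summing word vectors. The 2-D vectors in `embed` are made-up values for illustration, not anything from the demo; the point is only that addition is commutative, so both orders produce exactly the same input.

```javascript
// Sketch: an order-blind "bag of words" input, built by summing word vectors.
// The 2-D word vectors below are hypothetical values chosen for illustration.
const embed = { not: [1, -1], good: [0.5, 2] };

// Add up the vectors for every word; the order of the words is thrown away.
const sumVectors = (words) =>
  words.map((w) => embed[w]).reduce((acc, v) => acc.map((x, i) => x + v[i]), [0, 0]);

const a = sumVectors(["not", "good"]);
const b = sumVectors(["good", "not"]);
console.log(a, b); // identical vectors: a model fed only this can't tell the phrases apart
```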
The trick: a hidden state
```js
let h = zeros(); // the "memory": a summary of everything seen so far
```
Before the RNN reads a token, h summarizes everything before it; after reading, h is updated to fold that token in.
The recurrence
```js
// pseudocode: · is a matrix-vector product, tanh is applied element-wise
for (const x of sequence) {
  h = tanh(Wx·x + Wh·h + b); // mix this token with the running memory
}
```
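The Wh·h term carries the memory forward from step to step, and tanh squashes every component into (-1, 1) so the state stays bounded. As a result, the hidden state (and any output computed from it) at step t depends on every token up to t, not just the current one.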
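A runnable version of the recurrence might look like this in plain JavaScript. It is a minimal sketch under assumed shapes (Wx is H×D, Wh is H×H, b has length H); `matVec`, `rnnStep` and `rnnForward` are hypothetical helper names, not the demo's code. Feeding the same two tokens in both orders shows that the final state depends on the history, not just on which tokens appeared.

```javascript
// Minimal vanilla RNN forward pass (sketch). Plain arrays, no libraries.
// Assumed shapes: Wx is H x D, Wh is H x H, b has length H, each x has length D.

// Matrix-vector product: (rows x cols) times (cols) gives (rows).
function matVec(M, v) {
  return M.map((row) => row.reduce((sum, w, j) => sum + w * v[j], 0));
}

// One step of the recurrence: h_new = tanh(Wx·x + Wh·h + b), tanh element-wise.
function rnnStep(Wx, Wh, b, x, h) {
  const fromInput = matVec(Wx, x);
  const fromMemory = matVec(Wh, h);
  return fromInput.map((xi, i) => Math.tanh(xi + fromMemory[i] + b[i]));
}

// Run the SAME cell over the whole sequence, collecting every hidden state.
function rnnForward(Wx, Wh, b, sequence) {
  let h = new Array(b.length).fill(0); // the memory starts empty
  const states = [];
  for (const x of sequence) {
    h = rnnStep(Wx, Wh, b, x, h);
    states.push(h);
  }
  return states;
}

// Usage with tiny hypothetical weights: 2 hidden units, 2-D one-hot tokens A and B.
const Wx = [[1, 0], [0, 1]];
const Wh = [[0.5, -1], [1, 0.5]];
const b = [0, 0];
const A = [1, 0];
const B = [0, 1];
const ab = rnnForward(Wx, Wh, b, [A, B]);
const ba = rnnForward(Wx, Wh, b, [B, A]);
console.log(ab[ab.length - 1], ba[ba.length - 1]); // different final states: order matters
```

The off-diagonal entries of `Wh` are what let one unit's memory influence the other, so swapping the order of A and B changes where each token's effect ends up.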
One shared cell, over time
There isn't a separate network per step. It's ONE small cell with one set of weights (Wx, Wh, b), applied again and again down the sequence. A conv kernel shares its weights across space; an RNN shares them across time. Because the number of parameters doesn't depend on the sequence length, the same cell handles sequences of any length.
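Weight sharing is easiest to see by counting parameters. This sketch assumes hypothetical sizes (H = 3 hidden units, D = 2 input features) and arbitrary deterministic weights, then reuses a single `cell` function for a 1-step and a 50-step sequence. The parameter count depends only on H and D.

```javascript
// Sketch: ONE cell (one set of weights) reused at every time step.
// Hypothetical sizes: H = 3 hidden units, D = 2 input features.
const H = 3;
const D = 2;

// Arbitrary but deterministic weights, so the example is reproducible.
const Wx = Array.from({ length: H }, (_, i) => Array.from({ length: D }, (_, j) => 0.1 * (i - j)));
const Wh = Array.from({ length: H }, (_, i) => Array.from({ length: H }, (_, j) => (i === j ? 0.5 : 0.1)));
const b = new Array(H).fill(0);

const matVec = (M, v) => M.map((row) => row.reduce((s, w, j) => s + w * v[j], 0));

// The cell: every call uses the same Wx, Wh and b.
function cell(x, h) {
  const fromInput = matVec(Wx, x);
  const fromMemory = matVec(Wh, h);
  return fromInput.map((xi, i) => Math.tanh(xi + fromMemory[i] + b[i]));
}

// Unroll the cell over a sequence of any length; return the final state.
function run(sequence) {
  let h = new Array(H).fill(0);
  for (const x of sequence) h = cell(x, h);
  return h;
}

// Parameter count depends only on H and D, never on sequence length.
const paramCount = H * D + H * H + H; // Wx + Wh + b

const shortOut = run([[1, 0]]);
const longOut = run(Array.from({ length: 50 }, (_, t) => [Math.sin(t), Math.cos(t)]));
console.log(paramCount, shortOut.length, longOut.length); // 18 3 3
```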
Memory in action
In the demo, "not" sets a context so the NEXT token's effect flips — "not good" ends negative, "not bad" positive. Same token, different history → different result. That's memory.
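The sign flip can be wired up by hand. This is a toy RNN with hand-picked weights, assumed purely for illustration (they are not learned, and not the demo's weights): one hidden unit remembers whether the previous token was "not", and four more units act as approximate AND gates on (word, negated?) pairs. A linear readout adds the "positive" units and subtracts the "negative" ones.

```javascript
// Toy RNN with HAND-PICKED weights (hypothetical; not learned, not the demo's).
// Vocabulary (one-hot): index 0 = "not", 1 = "good", 2 = "bad".
// Hidden units: [neg, goodPlain, goodNegated, badPlain, badNegated].
const VOCAB = { not: 0, good: 1, bad: 2 };
const k = 10; // large gain so tanh behaves almost like a step function

// Wx: how each token drives each hidden unit (5 x 3).
const Wx = [
  [k, 0, 0], // neg: fires on "not"
  [0, k, 0], // goodPlain
  [0, k, 0], // goodNegated
  [0, 0, k], // badPlain
  [0, 0, k], // badNegated
];
// Wh: only the previous "neg" unit (column 0) feeds forward (5 x 5).
const Wh = [
  [0, 0, 0, 0, 0], // neg does not persist: it only affects the next token
  [-k, 0, 0, 0, 0], // goodPlain is suppressed if the previous token was "not"
  [k, 0, 0, 0, 0], // goodNegated needs the previous token to be "not"
  [-k, 0, 0, 0, 0], // badPlain
  [k, 0, 0, 0, 0], // badNegated
];
// Biases set the AND thresholds.
const b = [0, -0.5 * k, -1.5 * k, -0.5 * k, -1.5 * k];
// Linear readout: positive-sentiment units minus negative-sentiment units.
const v = [0, 1, -1, -1, 1];

const matVec = (M, x) => M.map((row) => row.reduce((s, w, j) => s + w * x[j], 0));
const oneHot = (word) => [0, 1, 2].map((i) => (i === VOCAB[word] ? 1 : 0));

function sentiment(words) {
  let h = [0, 0, 0, 0, 0];
  for (const w of words) {
    const fromInput = matVec(Wx, oneHot(w));
    const fromMemory = matVec(Wh, h);
    h = fromInput.map((xi, i) => Math.tanh(xi + fromMemory[i] + b[i]));
  }
  return h.reduce((s, hi, i) => s + v[i] * hi, 0); // read out the final state
}

for (const phrase of ["good", "bad", "not good", "not bad", "good not"]) {
  console.log(phrase.padEnd(8), sentiment(phrase.split(" ")).toFixed(2));
}
```

With k = 10 the scores land near +2 for "good" and "not bad", near -2 for "bad" and "not good", and near 0 for "good not". A trained RNN would have to find its own weights; the hand-wired version just makes the mechanism visible.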
The flaw
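Training unrolls the loop over the sequence and backpropagates through every step (backpropagation through time). The gradient that reaches an early step is a product of one factor per step in between, so over long sequences it shrinks toward zero (vanishes) or blows up (explodes). That's why vanilla RNNs struggle to learn from the distant past. The fix is gates: LSTM/GRU, next.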
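The vanishing/exploding behaviour falls straight out of the chain rule. For a hypothetical 1-unit RNN, h_t = tanh(w·h_{t-1} + u·x_t), the derivative of the last state with respect to the first is a product of one factor w·(1 - h_t^2) per step. This sketch computes that product directly; `gradThroughTime` is an illustrative name, not a library function.

```javascript
// Sketch: dh_T/dh_0 for a hypothetical 1-unit RNN, h_t = tanh(w*h_{t-1} + u*x_t).
// Chain rule: dh_t/dh_{t-1} = w * (1 - h_t^2), because d tanh(z)/dz = 1 - tanh(z)^2.
// So dh_T/dh_0 is the product of those factors over all T steps.
function gradThroughTime(w, u, xs, h0 = 0) {
  let h = h0;
  let grad = 1; // dh_0/dh_0
  for (const x of xs) {
    h = Math.tanh(w * h + u * x);
    grad *= w * (1 - h * h); // one factor per unrolled step
  }
  return grad;
}

const T = 50;
const silent = new Array(T).fill(0);
// Zero inputs and h0 = 0 keep the state at 0, so every factor is exactly w.
console.log(gradThroughTime(0.5, 1, silent)); // 0.5^50, about 8.9e-16: vanishes
console.log(gradThroughTime(1.5, 1, silent)); // 1.5^50, about 6.4e8: explodes
// Otherwise (1 - h^2) <= 1, so with |w| < 1 each factor is at most |w| in size.
const xs = Array.from({ length: T }, (_, t) => Math.sin(t));
console.log(gradThroughTime(0.9, 1, xs)); // magnitude at most 0.9^50, about 5e-3
```

In a full RNN the scalar w becomes the matrix Wh and each factor becomes a Jacobian, but the repeated product behaves the same way, which is the problem gated cells are designed to tame.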