In the previous article, we saw how next-word prediction works and how the lack of training causes errors.
In this article, we will visualize the problem before moving on to weight optimization, which solves it.
Before we optimize all the weights, remember that these weights are the numbers associated with each word.
Since in this example we have two weights for each word, we can plot each word on a graph.
The graph plots each word using its weight to the top activation function as the x-coordinate and its weight to the bottom activation function as the y-coordinate.
For example, “The Incredibles” is plotted here.
Similarly, if we plot the other words, the graph looks like this.
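To make the idea concrete, here is a minimal sketch of the word-to-point mapping. The weight values below are made up for illustration; the real values come from the network's (currently untrained) state:

```python
# Hypothetical 2-D weights per word:
# (weight to top activation function, weight to bottom activation function)
word_weights = {
    "The Incredibles": (0.9, -0.4),
    "Despicable Me": (-0.6, 0.8),
    "is": (0.1, 0.3),
    "great": (-0.2, -0.7),
}

# Each word becomes one point on the graph: x from the top weight, y from the bottom weight.
for word, (x, y) in word_weights.items():
    print(f"{word}: x={x}, y={y}")
```

Feeding these `(x, y)` pairs to any 2-D scatter plot reproduces the graph described above.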
In this graph, the words “Despicable Me” and “The Incredibles” are currently not close to each other.
However, in the training data, both appear in the same context:
- The Incredibles is great!
- Despicable Me is great!
So we expect backpropagation to adjust their weights, making them more similar to each other.
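One simple way to measure “how similar” two words are on this graph is the distance between their points. This is a sketch using the same hypothetical weight values as above; after backpropagation we would expect this distance to shrink:

```python
import math

# Hypothetical pre-training weights (actual values depend on initialization)
incredibles = (0.9, -0.4)
despicable = (-0.6, 0.8)

def distance(a, b):
    """Euclidean distance between two 2-D weight vectors."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

# A large distance means the network currently treats the words as dissimilar.
print(f"Distance before training: {distance(incredibles, despicable):.2f}")
```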
Let’s see how the graph changes after training in the next article.
Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.
Just run:
`ipm install repo-name`
… and you’re done! 🚀