Rijul Rajesh

Understanding Word2Vec – Part 7: How Negative Sampling Speeds Up Word2Vec

In the previous article, we saw the huge number of weights involved and briefly mentioned a technique called negative sampling.
We will explore that technique further in this article.

One way Word2Vec speeds up training is by using something called negative sampling.

Negative sampling works by randomly selecting a subset of words that we do not want to predict during optimization.

For example, suppose we want the word “Antelope” to predict the word “A.”

So only Antelope has a 1 in its input position, and all the other words have 0s.

This means we can ignore the weights coming from every word except Antelope, because the other words multiply their weights by 0.
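The effect of the one-hot input can be seen in a small NumPy sketch. The sizes below are scaled down so it runs quickly; the article's example implies something closer to 1,000,000 words × 300 dimensions:

```python
import numpy as np

# Scaled-down toy sizes; the article's example implies roughly
# 1,000,000 words x 300 dimensions (300 million input weights).
vocab_size, embed_dim = 10_000, 300

# Input-to-hidden weight matrix: one 300-dim row per word.
W_in = np.random.rand(vocab_size, embed_dim).astype(np.float32)

# One-hot input vector for "Antelope" (hypothetical index 0).
antelope_idx = 0
one_hot = np.zeros(vocab_size, dtype=np.float32)
one_hot[antelope_idx] = 1.0

# Every row except Antelope's is multiplied by 0, so the product
# is just Antelope's row -- only those 300 weights matter.
hidden = one_hot @ W_in
print(hidden.shape)  # (300,)
```

This is why real implementations skip the matrix multiplication entirely and simply look up the input word's row.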

So this alone removes 300 million weights from the optimization step.

However, we still have 300 million weights on the output side, because the network computes a score for every word in the vocabulary. We only want to predict the word "A," not Antelope, Discover, or any of the other words.

Now imagine that Word2Vec randomly selects “Discover” as a word that we do not want to predict.
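That sampling step can be sketched as follows. This uses uniform sampling over a hypothetical toy vocabulary for simplicity; real Word2Vec draws negatives from a frequency-weighted distribution:

```python
import random

# Hypothetical toy vocabulary for illustration.
vocab = ["A", "Antelope", "Discover", "the", "jumps", "quickly"]
target = "A"          # the word we DO want to predict (positive example)
context = "Antelope"  # the input word

# Negative sampling: randomly pick words we do NOT want to predict.
candidates = [w for w in vocab if w not in (target, context)]
negatives = random.sample(candidates, k=1)
print(negatives)  # e.g. ['Discover']
```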

So for this round of backpropagation, we only update the output weights leading to "A" (the word we want to predict) and to "Discover" (the sampled negative), and ignore the weights leading to every other possible output.

In the end, out of the 600 million total weights in this neural network, we only optimize a few hundred per step: Antelope's 300 input weights, plus 300 output weights each for "A" and "Discover."
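The arithmetic behind that count can be checked directly, assuming 300-dimensional embeddings and a single sampled negative as in this example:

```python
embed_dim = 300
num_negatives = 1  # this example samples a single negative word ("Discover")

# Input side: only Antelope's row of the input matrix is updated.
input_weights = embed_dim

# Output side: only the positive word ("A") and the sampled
# negatives ("Discover") have their output vectors updated.
output_weights = (1 + num_negatives) * embed_dim

updated = input_weights + output_weights
print(updated)  # 900 weights touched, versus 600 million in total
```

With more negatives (the original Word2Vec paper suggests 5–20 for small datasets), the count grows linearly but stays tiny compared to the full network.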

So this is one way Word2Vec can efficiently learn a word embedding for each word in a large vocabulary.


Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.

Just run:

```shell
ipm install repo-name
```

… and you’re done! 🚀


🔗 Explore Installerpedia here
