In the previous article, we covered the concept of word embeddings and saw how training places similar words close to each other in the embedding space.
In this article, we will look at two strategies that Word2Vec uses to capture more context.
Continuous Bag of Words (CBOW)
The first method is the Continuous Bag of Words (CBOW) model. It increases the context by using surrounding words to predict the word that occurs in the middle.
For example, the CBOW method could use the words “The Incredibles” and “Great” to predict the word that occurs in between, “Is.”
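To make this concrete, here is a minimal sketch of how CBOW training pairs can be built: each word in the middle becomes the prediction target, and the words within a window around it become the input. The sentence and the window size below are illustrative choices, not part of the original example.

```python
# Sketch of CBOW training-pair construction (illustrative sentence and window).
sentence = "the incredibles is great".split()
window = 2  # number of words on each side to use as context (assumed)

pairs = []
for i, target in enumerate(sentence):
    # Collect surrounding words within the window, skipping the target itself.
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    pairs.append((context, target))

for context, target in pairs:
    print(context, "->", target)
```

For the position of “is”, this produces the pair `(['the', 'incredibles', 'great'], 'is')`: the surrounding words are the input, and the middle word is what the model learns to predict.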
Skip Gram
The second method is Skip-Gram, which flips the setup: it increases the context by using the word in the middle to predict the surrounding words.
For example, the Skip-Gram method could use the word “Is” to predict the surrounding words “The Incredibles,” “Great,” and “Despicable Me.”
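Skip-Gram pairs are simply the CBOW pairs inverted: the center word is the input, and each surrounding word (within the window) is a separate prediction target. Again, the sentence and window size here are illustrative assumptions.

```python
# Sketch of Skip-Gram training-pair construction (illustrative sentence and window).
sentence = "the incredibles is great".split()
window = 2  # number of words on each side to predict (assumed)

pairs = []
for i, center in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            # One (input, target) pair per surrounding word.
            pairs.append((center, sentence[j]))

for center, target in pairs:
    print(center, "->", target)
```

Notice that a single center word like “is” yields several training pairs, one per neighbor, which is why Skip-Gram generates more training examples from the same text than CBOW does.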
In practice, instead of using just two activations to create two embedding values per word, models typically use 100 or more, giving each word a much richer embedding vector.
And instead of using just two sentences, the model can train on millions of words and phrases.
As a result, the total number of weights that the network needs to optimize becomes very large, on the order of 600 million, which makes training slow.
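One way a figure of that size can arise, assuming a vocabulary of roughly 3 million words and phrases and 100 embedding values per word (both numbers are assumptions for illustration, not stated in the article):

```python
vocab_size = 3_000_000   # assumed vocabulary of words and phrases
embedding_dim = 100      # assumed number of activations / embedding values per word

# Weights into the hidden layer plus weights out to the softmax output layer.
total_weights = vocab_size * embedding_dim * 2
print(total_weights)  # 600000000
```

Every training step has to touch a slice of these weights, and the softmax over the full vocabulary is especially expensive, which is what motivates the speed-up discussed next.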
One way that Word2Vec speeds things up is by using something called negative sampling.
We will explore what negative sampling is in the next article.
Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.
Just run:
ipm install repo-name
… and you’re done! 🚀