Rohab Shabbir
Word Embeddings

Human Language and word meanings

Human language is highly complex and easily misunderstood. It comes naturally to humans but not to computers, because the same word can have different meanings in different contexts.
Google Translate works well up to a point, but when it translates a webpage literally, some lines make no sense because it sometimes translates independently of context. GPT-3, released by OpenAI, is a large model trained for translation, summarization, and other tasks.

Meaning

What "meaning" is, according to different definitions:

  • the idea represented by a word or phrase

  • the idea that a person wants to convey using words, phrases, etc.

  • the idea that is expressed in a work of writing

The most common linguistic way of thinking about meaning is that a word is a signifier (symbol) that signifies an idea.
This view is also referred to as denotational semantics.
This model is not easy to implement on a computer. Traditionally, NLP has handled meaning by making use of dictionaries.

Wordnet

WordNet is a large lexical database of English that groups words into sets of synonyms.
But it is not very effective either. For example, the database lists "good" as a synonym of "proficient", which may be correct in some contexts but not in all.
It is also missing new words.

Word Relationships

Representing words as discrete symbols
In traditional NLP, words are represented as discrete symbols: each symbol, such as hotel, conference, or motel, stands only for itself. This is called a localist representation.
Each word gets its own separate vector.
For example, two words as one-hot vectors:
hotel as [0 0 0 0 0 0 0 0 0 0 1 0 0]
motel as [0 0 0 0 0 0 0 1 0 0 0 0 0]

Now if a user mistakenly types "motel" instead of "hotel", this representation can never take the user from motel to hotel, because the two vectors show no similarity at all.
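The problem can be seen directly in code. This is a minimal sketch with a made-up toy vocabulary; the word indices are arbitrary, not from any real system. The dot product of any two distinct one-hot vectors is zero, so "hotel" and "motel" look completely unrelated:

```python
import numpy as np

# Toy vocabulary; the indices are arbitrary assumptions for illustration.
vocab = ["hotel", "motel", "conference"]

def one_hot(word):
    """Return a vector of zeros with a 1 at the word's vocabulary index."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

hotel = one_hot("hotel")
motel = one_hot("motel")

# Dot product is 0: one-hot vectors encode no similarity between words.
print(hotel @ motel)  # 0.0
```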

Distributional semantics
In this view, a word's meaning is given by the words around which it frequently occurs (meaning by context).

  • The bank of the road is curved here

  • This bank increases the salaries of its employees annually

Depending on the context, the word "bank" has a different meaning.
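The "words around it" are usually taken from a fixed-size window on each side of the center word. A small sketch (the window size of 2 is just an example):

```python
def context_words(tokens, center_index, window=2):
    """Collect the words within `window` positions of the center word."""
    left = tokens[max(0, center_index - window):center_index]
    right = tokens[center_index + 1:center_index + 1 + window]
    return left + right

sentence = "the bank of the road is curved".split()
print(context_words(sentence, sentence.index("bank")))  # ['the', 'of', 'the']
```

Collecting these windows over a large corpus tells us which words tend to co-occur with "bank", and that pattern is what distributional semantics uses as meaning.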

Word Embeddings

Word vectors are also called word embeddings.
An embedding is basically how we represent a word to a neural network: the word is represented as a vector in a continuous vector space.
A dense vector is built for each word, chosen so that it is similar to the vectors of words that appear in similar contexts.
A very common size for these vectors in practice is 300 dimensions.
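With dense vectors, similarity between words becomes measurable, typically with cosine similarity. The 4-dimensional vectors below are made-up toy values standing in for real 300-dimensional embeddings; only the comparison between the scores matters:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Made-up toy embeddings, not from a trained model.
hotel = np.array([0.9, 0.1, 0.4, 0.8])
motel = np.array([0.8, 0.2, 0.5, 0.7])
road  = np.array([-0.5, 0.9, 0.1, -0.3])

print(cosine_similarity(hotel, motel))  # high: similar contexts
print(cosine_similarity(hotel, road))   # low: unrelated words
```

Unlike one-hot vectors, these embeddings let a search engine connect "motel" to "hotel" because their vectors point in nearly the same direction.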

Word2vec

Introduced by Mikolov et al. in 2013.
Idea
We have a large corpus of text.
Each word in a fixed vocabulary is represented by a vector.
Go through each position t in the text, which has a center word (c) and context words (o).
Use the similarity of the word vectors for c and o to calculate the probability of o given c (or the other way around).
Keep adjusting the word vectors to increase this probability.
Remember that every word has 2 vectors:
a center vector and a context vector
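The probability step above can be sketched as a softmax over dot products: P(o | c) is proportional to exp(u_o · v_c), where v_c is the center vector of c and u_o the context vector of o. The sizes and random vectors below are toy assumptions, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5, 4  # toy sizes; real models use e.g. 300 dimensions

V = rng.normal(size=(vocab_size, dim))  # center-word vectors (one per word)
U = rng.normal(size=(vocab_size, dim))  # context-word vectors (one per word)

def p_context_given_center(c):
    """P(o | c): softmax over the dot products u_o . v_c for all o."""
    scores = U @ V[c]
    exp = np.exp(scores - scores.max())  # subtract max for stability
    return exp / exp.sum()

probs = p_context_given_center(2)
print(probs.sum())  # a valid distribution: sums to 1
```

Training then nudges V and U so that actually observed (center, context) pairs get higher probability.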

Minimizing the loss
To train the model, we gradually adjust our parameters to minimize the loss.
We use some calculus here, i.e. the chain rule, to determine how to optimize the parameter values (gradient descent).
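The adjustment loop is plain gradient descent: compute the derivative of the loss with respect to each parameter and step against it. A minimal sketch on a toy one-dimensional loss J(theta) = (theta - 3)^2, standing in for the word2vec objective:

```python
def grad(theta):
    """dJ/dtheta for the toy loss J(theta) = (theta - 3)^2."""
    return 2 * (theta - 3)

theta, lr = 0.0, 0.1  # arbitrary starting point and learning rate
for _ in range(100):
    theta -= lr * grad(theta)  # step against the gradient

print(round(theta, 4))  # converges to the minimum at theta = 3
```

In word2vec the parameters are the entries of every center and context vector, and the chain rule propagates the loss gradient back through the softmax to each of them.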

Conclusion

In conclusion, word embeddings have transformed NLP by representing words as dense vectors that capture semantic relationships and contextual meanings, which traditional methods like one-hot encoding and TF-IDF could not. Tools like WordNet helped but had limitations. Word2Vec, introduced by Mikolov et al., significantly advanced the field by using context to create meaningful word vectors. These embeddings are crucial for translating, summarizing, and understanding text more accurately, bridging the gap between human language complexity and machine understanding.
