<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rohab Shabbir</title>
    <description>The latest articles on DEV Community by Rohab Shabbir (@rohab_shabbir).</description>
    <link>https://dev.to/rohab_shabbir</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1614121%2Fbb33788b-5ad9-4d29-89d0-ec3c3fed9ab9.jpeg</url>
      <title>DEV Community: Rohab Shabbir</title>
      <link>https://dev.to/rohab_shabbir</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rohab_shabbir"/>
    <language>en</language>
    <item>
      <title>Word Embeddings</title>
      <dc:creator>Rohab Shabbir</dc:creator>
      <pubDate>Tue, 25 Jun 2024 20:34:18 +0000</pubDate>
      <link>https://dev.to/rohab_shabbir/word-embeddings-446a</link>
      <guid>https://dev.to/rohab_shabbir/word-embeddings-446a</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Human Language and word meanings&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Human language is highly complex and easily misunderstood. It comes naturally to humans but not to computers, because the same word can have different meanings in different contexts.&lt;br&gt;
Google Translate works well up to a point, but when it translates a webpage too literally, some lines stop making sense because the translation is done independently of context. GPT-3, released by OpenAI and trained on large amounts of text, handles translation, summarization, and other tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Meaning&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;What is meaning according to different definitions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;the idea represented by a word or phrase&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the idea that a person wants to convey using words, phrases, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the idea that is expressed in a work of writing&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The commonest linguistic way of thinking about meaning is to say that a word is a signifier (symbol) that signifies an idea; this is also referred to as denotational semantics. That model is not directly implemented in software. In NLP, the traditional way of handling meaning is to make use of dictionaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Wordnet&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;WordNet is a large lexical database of English that groups words into sets of synonyms.&lt;br&gt;
But it is not very precise. For example, it lists "good" as a synonym of "proficient", which may be correct in some contexts but not in all of them.&lt;br&gt;
It is also missing new words.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Word Relationships&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Representing words as discrete symbols&lt;/strong&gt;&lt;br&gt;
In traditional NLP, words are represented as discrete symbols: each word, such as hotel, conference, or motel, gets its own symbol. This is called a localist representation.&lt;br&gt;
We have a separate vector for each word.&lt;br&gt;
For example, representing two words as one-hot vectors:&lt;br&gt;
hotel as [0 0 0 0 0 0 0 0 0 0 1 0 0]&lt;br&gt;
motel as [0 0 0 0 0 0 0 1 0 0 0 0 0]&lt;/p&gt;

&lt;p&gt;Now if a user mistakenly types "motel" instead of "hotel", this vector representation can never take the user from motel to hotel, because it shows no similarity between the two words.&lt;/p&gt;
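The problem with one-hot vectors can be checked directly; a minimal NumPy sketch, using the same hypothetical 13-dimensional vectors as above:

```python
import numpy as np

# Hypothetical 13-dimensional one-hot vectors, as in the example above.
vocab_size = 13
hotel = np.zeros(vocab_size)
hotel[10] = 1.0
motel = np.zeros(vocab_size)
motel[7] = 1.0

# Their dot product (and hence cosine similarity) is zero:
# one-hot vectors encode no notion of similarity between words.
print(np.dot(hotel, motel))  # 0.0
```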

&lt;p&gt;&lt;strong&gt;Distributional semantics&lt;/strong&gt;&lt;br&gt;
Here, the meaning of a word is given by the words around which it frequently occurs (meaning by context).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The bank of the road is curved here&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This bank increases the salary of its employees annually&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Depending on the context, the word "bank" has a different meaning.&lt;/p&gt;
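The idea of meaning-by-context can be sketched in a few lines of plain Python: represent a word by counts of the words that appear near it. The two sentences here are made-up toy data:

```python
from collections import Counter

# Toy corpus: two uses of "bank" in different contexts.
corpus = [
    "the bank of the road is curved",
    "the bank raised the salary of its employees",
]

def context_counts(target, window=2):
    """Count the words occurring within `window` positions of `target`."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            if w == target:
                lo = max(0, i - window)
                counts.update(words[lo:i] + words[i + 1:i + 1 + window])
    return counts

bank_counts = context_counts("bank")
print(bank_counts)  # "bank" is characterized by its neighbours
```

In real distributional models these counts (over a huge corpus) are what gets compressed into dense vectors.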

&lt;h2&gt;
  
  
  &lt;strong&gt;Word Embeddings&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Word vectors are also called word embeddings.&lt;br&gt;
An embedding is how we present a word to a neural network: the word is represented as a vector in a continuous vector space.&lt;br&gt;
A dense vector is built for each word, chosen so that it is similar to the vectors of words that appear in similar contexts.&lt;br&gt;
A very common size in practice is 300 dimensions.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Word2vec&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Introduced by Mikolov et al. in 2013.&lt;br&gt;
&lt;strong&gt;Idea&lt;/strong&gt;&lt;br&gt;
We have a large corpus of text.&lt;br&gt;
Each word in a fixed vocabulary is represented by a vector.&lt;br&gt;
Go through each position t in the text, which has a center word c and context words o.&lt;br&gt;
Use the similarity of the word vectors for c and o to calculate the probability of o given c (or the other way around).&lt;br&gt;
Keep adjusting the word vectors to maximize this probability.&lt;br&gt;
Remember that every word has two vectors:&lt;br&gt;
a center vector and a context vector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Minimizing the loss&lt;/strong&gt;&lt;br&gt;
To train the model, we gradually adjust the parameters to minimize a loss.&lt;br&gt;
We use some calculus here, namely the chain rule, to work out how the loss changes with each parameter and hence how to update the parameters.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In conclusion, word embeddings have transformed NLP by representing words as dense vectors that capture semantic relationships and contextual meanings, which traditional methods like one-hot encoding and TF-IDF could not. Tools like WordNet helped but had limitations. Word2Vec, introduced by Mikolov et al., significantly advanced the field by using context to create meaningful word vectors. These embeddings are crucial for translating, summarizing, and understanding text more accurately, bridging the gap between human language complexity and machine understanding.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>nlp</category>
      <category>wordvector</category>
    </item>
    <item>
      <title>Introduction to Transformer Models</title>
      <dc:creator>Rohab Shabbir</dc:creator>
      <pubDate>Thu, 13 Jun 2024 01:24:06 +0000</pubDate>
      <link>https://dev.to/rohab_shabbir/introduction-to-transformer-models-1eon</link>
      <guid>https://dev.to/rohab_shabbir/introduction-to-transformer-models-1eon</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;NLP&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;NLP is a field of linguistics and machine learning focused on understanding everything related to human language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is NLP&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Classifying whole sentences — sentiment analysis&lt;/li&gt;
&lt;li&gt;Classifying each word in a sentence — grammatically, e.g. as noun or verb&lt;/li&gt;
&lt;li&gt;Generating text content — auto generated text&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Transformers and NLP&lt;/strong&gt;&lt;br&gt;
Transformers are game-changers in NLP. Unlike traditional models, they excel at understanding connections between words, no matter the distance. This "attention" allows them to act like language experts, analyzing massive amounts of text to perform tasks like translation and summarization with impressive accuracy.  We'll explore how these transformers work next!&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Transformers&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;These are models that can do almost every NLP task; some are described below. The most basic object for running these tasks is the pipeline() function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sentiment analysis&lt;/strong&gt;&lt;br&gt;
It classifies sentences as positive or negative.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jiajoyst7kd1f7291wg.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jiajoyst7kd1f7291wg.jpeg" alt="Sentiment analysis" width="800" height="244"&gt;&lt;/a&gt;&lt;br&gt;
The 0.999… score means the model is about 99.9% confident in this label.&lt;br&gt;
We can also pass several sentences; a score will be provided for each.&lt;br&gt;
By default, this pipeline selects a particular pretrained model that has been fine-tuned for sentiment analysis in English. The model is downloaded and cached when we create the classifier object.&lt;/p&gt;
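In code, this boils down to a few lines (a minimal sketch assuming the transformers library is installed; the default model is downloaded on first use):

```python
from transformers import pipeline  # requires the transformers library

# The default English sentiment-analysis pipeline.
classifier = pipeline("sentiment-analysis")
result = classifier("I love learning about NLP!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```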

&lt;p&gt;&lt;strong&gt;Zero-shot classification&lt;/strong&gt;&lt;br&gt;
It allows us to supply the labels we want instead of relying on the labels the model was trained with.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnhtdx75rt08cbyh80l9f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnhtdx75rt08cbyh80l9f.png" alt="zero shot" width="800" height="172"&gt;&lt;/a&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kljivks1pj48i6rc6ws.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kljivks1pj48i6rc6ws.jpeg" alt="output" width="800" height="84"&gt;&lt;/a&gt;&lt;/p&gt;
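A minimal sketch of the same idea in code (the sentence and candidate labels are illustrative; assumes the transformers library is installed and the default model can be downloaded):

```python
from transformers import pipeline  # requires the transformers library

# Zero-shot classification: we choose the labels ourselves.
classifier = pipeline("zero-shot-classification")
result = classifier(
    "This course is about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)
print(result["labels"][0])  # the highest-scoring of our labels
```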

&lt;p&gt;&lt;strong&gt;Text generation&lt;/strong&gt;&lt;br&gt;
The main idea of text generation is that we provide some text and the model continues it. We can also control the total length of the output text.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5e40h0qaz2h49693lsg3.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5e40h0qaz2h49693lsg3.jpeg" alt="text generation" width="800" height="459"&gt;&lt;/a&gt;If we don’t specify a model, a default one is used; otherwise we can specify a model, as in the picture above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mask filling&lt;/strong&gt;&lt;br&gt;
The idea of this task is to fill in the blanks in a sentence.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp19a0bv3kz9zjml4x2sk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp19a0bv3kz9zjml4x2sk.png" alt="mask" width="800" height="107"&gt;&lt;/a&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpd6yaof103atwmtbm1v0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpd6yaof103atwmtbm1v0.png" alt="mask filling" width="800" height="370"&gt;&lt;/a&gt;The value of top_k sets how many candidates are suggested for the masked position.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Named entity recognition&lt;/strong&gt;&lt;br&gt;
It identifies the persons, organizations, locations, and other entities in a sentence.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F052hakn1bl3i7bzbkxrx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F052hakn1bl3i7bzbkxrx.png" alt="recognition" width="733" height="156"&gt;&lt;/a&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopu1j175f303hzu2ctn2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopu1j175f303hzu2ctn2.png" alt="result" width="506" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PER – person&lt;/li&gt;
&lt;li&gt;ORG – organization&lt;/li&gt;
&lt;li&gt;LOC – location&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Question answering&lt;/strong&gt;&lt;br&gt;
It gives an answer based on the provided information. It does not generate answers; it extracts them from the given context.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fec8oe23qnc4a9uev0zov.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fec8oe23qnc4a9uev0zov.png" alt="question answer" width="800" height="205"&gt;&lt;/a&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5urzr1j4txfvpyu7u2ms.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5urzr1j4txfvpyu7u2ms.png" alt="output" width="800" height="42"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summarization&lt;/strong&gt;&lt;br&gt;
In this case, it summarizes the paragraph that we provide.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8at3ingforp7fwipzzrv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8at3ingforp7fwipzzrv.png" alt="summary" width="800" height="399"&gt;&lt;/a&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzw7ygfif9qxm7u793ihs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzw7ygfif9qxm7u793ihs.png" alt="output" width="800" height="90"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Translation&lt;/strong&gt;&lt;br&gt;
It translates the provided text into a different language.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg1f65nwesmmj06ik8zj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg1f65nwesmmj06ik8zj.png" alt="translation" width="800" height="153"&gt;&lt;/a&gt;I have provided the model name as well as the translation pair “en-ur”, English to Urdu.&lt;/p&gt;

&lt;h3&gt;
  
  
How do transformers work?
&lt;/h3&gt;

&lt;p&gt;The Transformer architecture was introduced in 2017; some influential models built on it are GPT, BERT, etc.&lt;br&gt;
Transformer models are basically language models: they have been trained on large amounts of raw text in a self-supervised fashion. &lt;strong&gt;Self-supervised learning&lt;/strong&gt; means that humans are not needed to label the data. A model trained this way is not immediately useful for specific practical tasks, so we then apply &lt;strong&gt;transfer learning&lt;/strong&gt;: transferring the knowledge of a pretrained model to another model for a specific task.&lt;br&gt;
Transformers are large models. To achieve better results they should be trained on large amounts of data, but training at that scale impacts the environment heavily due to carbon dioxide emissions.&lt;br&gt;
So instead of &lt;strong&gt;pretraining&lt;/strong&gt; (training a model from scratch), we &lt;strong&gt;fine-tune existing models&lt;/strong&gt; (reusing pretrained models) to save time and reduce the impact on the environment.&lt;br&gt;
Fine-tuning a model therefore has lower time, data, financial, and environmental costs. It is also quicker and easier to iterate over different fine-tuning schemes, as the training is less constraining than a full pretraining.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;General Architecture&lt;/strong&gt;&lt;br&gt;
It generally consists of two sections:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Encoders&lt;/li&gt;
&lt;li&gt;Decoders&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;u&gt;Encoders&lt;/u&gt; receive the input and build a representation of its features.&lt;br&gt;
&lt;u&gt;Decoders&lt;/u&gt; use that representation to generate output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Models&lt;/strong&gt;&lt;br&gt;
There are three types of models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Encoder-only — good for tasks that require understanding of the input, such as named entity recognition.&lt;/li&gt;
&lt;li&gt;Decoder-only — good for generative tasks.&lt;/li&gt;
&lt;li&gt;Encoder-decoder — good for generative tasks that need an input, such as summarization or translation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcs599i2um769ftfbkyj0.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcs599i2um769ftfbkyj0.jpeg" alt="layers" width="800" height="1039"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ENCODERS
&lt;/h3&gt;

&lt;p&gt;The architecture of BERT (one of the most popular models) is “encoder-only”.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does it actually work&lt;/strong&gt;&lt;br&gt;
It takes a sequence of words as input and generates a numerical feature vector for each word.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4num6yz8w22vab453b9y.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4num6yz8w22vab453b9y.jpeg" alt="Iencoder" width="800" height="475"&gt;&lt;/a&gt;The feature vector generated for each word is not just a value for that word in isolation: it is computed from the context of the whole sentence (the self-attention mechanism), looking both left and right (bi-directional).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When encoders can be used&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Classification tasks&lt;/li&gt;
&lt;li&gt;Question answering tasks&lt;/li&gt;
&lt;li&gt;Masked language modeling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In these tasks encoders really shine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Representatives of this family&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ALBERT&lt;/li&gt;
&lt;li&gt;BERT&lt;/li&gt;
&lt;li&gt;DistilBERT&lt;/li&gt;
&lt;li&gt;ELECTRA&lt;/li&gt;
&lt;li&gt;RoBERTa&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  DECODERS
&lt;/h3&gt;

&lt;p&gt;We can do tasks similar to the encoder tasks with decoders, with a small loss of performance.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F684lieiq4fczd2wgih7l.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F684lieiq4fczd2wgih7l.jpeg" alt="decoder" width="800" height="484"&gt;&lt;/a&gt;The difference between encoders and decoders is that encoders use full self-attention, while decoders use a masked self-attention mechanism: when generating the representation for a word, a decoder can only attend to the words that come before it, not to those that come after.&lt;/p&gt;
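The masked self-attention pattern can be visualized as a lower-triangular matrix; a small NumPy sketch:

```python
import numpy as np

# A causal (masked) self-attention mask for a 5-token sequence:
# position i may attend only to positions 0..i (True = attention allowed).
n = 5
mask = np.tril(np.ones((n, n), dtype=bool))
print(mask.astype(int))  # 1s on and below the diagonal, 0s above
```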

&lt;p&gt;&lt;strong&gt;When we should use a decoder&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text generation (generating a word or a sequence of words; in NLP this ability is called causal language modeling)&lt;/li&gt;
&lt;li&gt;Word prediction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At each stage, for a given word, the attention layers can only access the words positioned before it in the sentence. These models are often called auto-regressive models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Representatives of this family&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CTRL&lt;/li&gt;
&lt;li&gt;GPT&lt;/li&gt;
&lt;li&gt;GPT-2&lt;/li&gt;
&lt;li&gt;Transformer XL&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ENCODER-DECODER
&lt;/h3&gt;

&lt;p&gt;In this type of model, we use an encoder together with a decoder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working&lt;/strong&gt;&lt;br&gt;
Let’s take an example of translation (transduction)&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzjyrolrznyw40k3xbyy3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzjyrolrznyw40k3xbyy3.png" alt="encoder-decoder" width="800" height="450"&gt;&lt;/a&gt;We give a sentence as input to the encoder; it generates a numerical sequence (feature vector) for those words, which the decoder then takes as input. The decoder decodes this sequence and outputs a word. A start-of-sequence token tells it to begin decoding. Once we have the first word and the feature vector, the encoder is no longer needed.&lt;br&gt;
We have already seen the auto-regressive behaviour of the decoder: the word it outputs can now be fed back as input to generate the second word, and so on until the sequence is finished.&lt;br&gt;
In this model, the encoder takes care of understanding the input sequence, and the decoder takes care of generating the output based on that understanding.&lt;/p&gt;
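The decoding loop described above can be sketched schematically in plain Python, with trivial stand-ins for the encoder and decoder (the word-for-word "translation table" is purely illustrative, not how a real model translates):

```python
# Trivial stand-ins: encode() plays the encoder, decode_step() the decoder.
def encode(source_tokens):
    # a fixed representation of the whole input sentence
    return tuple(source_tokens)

# Hypothetical word-for-word lookup, purely for illustration.
TABLE = {"welcome": "bienvenue", "to": "à", "nyc": "nyc"}

def decode_step(representation, generated_so_far):
    # the decoder sees the encoder output plus its own previous outputs
    i = len(generated_so_far)
    if i == len(representation):
        return "[EOS]"  # end-of-sequence token: stop decoding
    return TABLE[representation[i]]

rep = encode(["welcome", "to", "nyc"])
output = ["[SOS]"]  # the start-of-sequence token kicks off decoding
while output[-1] != "[EOS]":
    # auto-regressive loop: each output word becomes part of the next input
    output.append(decode_step(rep, output[1:]))
print(output[1:-1])  # ['bienvenue', 'à', 'nyc']
```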

&lt;p&gt;&lt;strong&gt;Where we can use these&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Translation&lt;/li&gt;
&lt;li&gt;Summarization&lt;/li&gt;
&lt;li&gt;Generative question answering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Representatives of this family&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BART&lt;/li&gt;
&lt;li&gt;mBART&lt;/li&gt;
&lt;li&gt;Marian&lt;/li&gt;
&lt;li&gt;T5&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limitations&lt;/strong&gt;&lt;br&gt;
An important note to end the article on: whether you pretrain a model yourself or fine-tune an existing one, these models are powerful but come with limitations.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqfl8zm0l9lj1ck0vk9i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqfl8zm0l9lj1ck0vk9i.png" alt="limitations" width="800" height="162"&gt;&lt;/a&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lk94oqt0a9d247hwcqg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lk94oqt0a9d247hwcqg.png" alt="output" width="782" height="85"&gt;&lt;/a&gt;When asked to fill the mask in the sentences above, the model suggests gender-specific occupations: it has absorbed the biases present in its training data. If you are using any of these models, this can be an issue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
In conclusion, transformer models have revolutionized the field of NLP. Their ability to understand relationships between words and handle long sequences makes them powerful tools for a wide range of tasks, from translation and text summarization to question answering and text generation. While the technical details can be complex, hopefully, this introduction has given you a basic understanding of how transformers work and their potential impact on the future of human-computer interaction.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>beginners</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
