<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rohab Shabbir</title>
    <description>The latest articles on DEV Community by Rohab Shabbir (@rohab_shabbir).</description>
    <link>https://dev.to/rohab_shabbir</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1614121%2Fbb33788b-5ad9-4d29-89d0-ec3c3fed9ab9.jpeg</url>
      <title>DEV Community: Rohab Shabbir</title>
      <link>https://dev.to/rohab_shabbir</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rohab_shabbir"/>
    <language>en</language>
    <item>
      <title>Word Embeddings</title>
      <dc:creator>Rohab Shabbir</dc:creator>
      <pubDate>Tue, 25 Jun 2024 20:34:18 +0000</pubDate>
      <link>https://dev.to/rohab_shabbir/word-embeddings-446a</link>
      <guid>https://dev.to/rohab_shabbir/word-embeddings-446a</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Human Language and word meanings&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Human language is highly complex and easily misunderstood. It comes naturally to humans but not to computers, because the same word can have different meanings in different contexts.&lt;br&gt;
Google Translate works well up to a point, but when it translates a webpage too literally, some lines stop making sense because the translation is done independently of context. GPT-3, released by OpenAI and trained on large amounts of text, handles translation, summarization, and other tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Meaning&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;What is meaning according to different definitions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;the idea represented by a word or phrase&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the idea that a person wants to convey using words, phrases, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the idea that is expressed in a work of writing&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The commonest linguistic way of thinking about meaning is to say that a word is a signifier (symbol) that signifies an idea; this is also referred to as denotational semantics. That model is not directly implemented in software. In NLP, the traditional way of handling meaning is to make use of dictionaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Wordnet&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;WordNet is a large lexical database of English that groups words into sets of synonyms.&lt;br&gt;
But it is not very precise. For example, it lists "good" as a synonym of "proficient", which may be correct in some contexts but not in all of them.&lt;br&gt;
It is also missing new words.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Word Relationships&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Representing words as discrete symbols&lt;/strong&gt;&lt;br&gt;
In traditional NLP, words are represented as discrete symbols: each word, such as hotel, conference, or motel, gets its own symbol. This is called a localist representation.&lt;br&gt;
We have a separate vector for each word.&lt;br&gt;
For example, representing two words as one-hot vectors:&lt;br&gt;
hotel as [0 0 0 0 0 0 0 0 0 0 1 0 0]&lt;br&gt;
motel as [0 0 0 0 0 0 0 1 0 0 0 0 0]&lt;/p&gt;

&lt;p&gt;Now if a user mistakenly types "motel" instead of "hotel", this vector representation can never take the user from motel to hotel, because it shows no similarity between the two words.&lt;/p&gt;
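The problem with one-hot vectors can be checked directly; a minimal NumPy sketch, using the same hypothetical 13-dimensional vectors as above:

```python
import numpy as np

# Hypothetical 13-dimensional one-hot vectors, as in the example above.
vocab_size = 13
hotel = np.zeros(vocab_size)
hotel[10] = 1.0
motel = np.zeros(vocab_size)
motel[7] = 1.0

# Their dot product (and hence cosine similarity) is zero:
# one-hot vectors encode no notion of similarity between words.
print(np.dot(hotel, motel))  # 0.0
```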

&lt;p&gt;&lt;strong&gt;Distributional semantics&lt;/strong&gt;&lt;br&gt;
Here, the meaning of a word is given by the words around which it frequently occurs (meaning by context).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The bank of the road is curved here&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This bank increases the salary of its employees annually&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Depending on the context, the word "bank" has a different meaning.&lt;/p&gt;
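The idea of meaning-by-context can be sketched in a few lines of plain Python: represent a word by counts of the words that appear near it. The two sentences here are made-up toy data:

```python
from collections import Counter

# Toy corpus: two uses of "bank" in different contexts.
corpus = [
    "the bank of the road is curved",
    "the bank raised the salary of its employees",
]

def context_counts(target, window=2):
    """Count the words occurring within `window` positions of `target`."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            if w == target:
                lo = max(0, i - window)
                counts.update(words[lo:i] + words[i + 1:i + 1 + window])
    return counts

bank_counts = context_counts("bank")
print(bank_counts)  # "bank" is characterized by its neighbours
```

In real distributional models these counts (over a huge corpus) are what gets compressed into dense vectors.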

&lt;h2&gt;
  
  
  &lt;strong&gt;Word Embeddings&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Word vectors are also called word embeddings.&lt;br&gt;
An embedding is how we present a word to a neural network: the word is represented as a vector in a continuous vector space.&lt;br&gt;
A dense vector is built for each word, chosen so that it is similar to the vectors of words that appear in similar contexts.&lt;br&gt;
A very common size in practice is 300 dimensions.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Word2vec&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Introduced by Mikolov et al. in 2013.&lt;br&gt;
&lt;strong&gt;Idea&lt;/strong&gt;&lt;br&gt;
We have a large corpus of text.&lt;br&gt;
Each word in a fixed vocabulary is represented by a vector.&lt;br&gt;
Go through each position t in the text, which has a center word c and context words o.&lt;br&gt;
Use the similarity of the word vectors for c and o to calculate the probability of o given c (or the other way around).&lt;br&gt;
Keep adjusting the word vectors to maximize this probability.&lt;br&gt;
Remember that every word has two vectors:&lt;br&gt;
a center vector and a context vector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Minimizing the loss&lt;/strong&gt;&lt;br&gt;
To train the model, we gradually adjust the parameters to minimize a loss.&lt;br&gt;
We use some calculus here, namely the chain rule, to work out how the loss changes with each parameter and hence how to update the parameters.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In conclusion, word embeddings have transformed NLP by representing words as dense vectors that capture semantic relationships and contextual meanings, which traditional methods like one-hot encoding and TF-IDF could not. Tools like WordNet helped but had limitations. Word2Vec, introduced by Mikolov et al., significantly advanced the field by using context to create meaningful word vectors. These embeddings are crucial for translating, summarizing, and understanding text more accurately, bridging the gap between human language complexity and machine understanding.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>nlp</category>
      <category>wordvector</category>
    </item>
    <item>
      <title>Introduction to Transformer Models</title>
      <dc:creator>Rohab Shabbir</dc:creator>
      <pubDate>Thu, 13 Jun 2024 01:24:06 +0000</pubDate>
      <link>https://dev.to/rohab_shabbir/introduction-to-transformer-models-1eon</link>
      <guid>https://dev.to/rohab_shabbir/introduction-to-transformer-models-1eon</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;NLP&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;NLP is a field of linguistics and machine learning focused on understanding everything related to human language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is NLP&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Classifying whole sentences — sentiment analysis&lt;/li&gt;
&lt;li&gt;Classifying each word in a sentence — grammatically, e.g. as noun or verb&lt;/li&gt;
&lt;li&gt;Generating text content — auto generated text&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Transformers and NLP&lt;/strong&gt;&lt;br&gt;
Transformers are game-changers in NLP. Unlike traditional models, they excel at understanding connections between words, no matter the distance. This "attention" allows them to act like language experts, analyzing massive amounts of text to perform tasks like translation and summarization with impressive accuracy.  We'll explore how these transformers work next!&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Transformers&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;These are models that can do almost every NLP task; some are described below. The most basic object for running these tasks is the pipeline() function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sentiment analysis&lt;/strong&gt;&lt;br&gt;
It classifies sentences as positive or negative.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jiajoyst7kd1f7291wg.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jiajoyst7kd1f7291wg.jpeg" alt="Sentiment analysis" width="800" height="244"&gt;&lt;/a&gt;&lt;br&gt;
The 0.999… score means the model is about 99.9% confident in this label.&lt;br&gt;
We can also pass several sentences; a score will be provided for each.&lt;br&gt;
By default, this pipeline selects a particular pretrained model that has been fine-tuned for sentiment analysis in English. The model is downloaded and cached when we create the classifier object.&lt;/p&gt;
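In code, this boils down to a few lines (a minimal sketch assuming the transformers library is installed; the default model is downloaded on first use):

```python
from transformers import pipeline  # requires the transformers library

# The default English sentiment-analysis pipeline.
classifier = pipeline("sentiment-analysis")
result = classifier("I love learning about NLP!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```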

&lt;p&gt;&lt;strong&gt;Zero-shot classification&lt;/strong&gt;&lt;br&gt;
It allows us to supply the labels we want instead of relying on the labels the model was trained with.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnhtdx75rt08cbyh80l9f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnhtdx75rt08cbyh80l9f.png" alt="zero shot" width="800" height="172"&gt;&lt;/a&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kljivks1pj48i6rc6ws.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kljivks1pj48i6rc6ws.jpeg" alt="output" width="800" height="84"&gt;&lt;/a&gt;&lt;/p&gt;
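A minimal sketch of the same idea in code (the sentence and candidate labels are illustrative; assumes the transformers library is installed and the default model can be downloaded):

```python
from transformers import pipeline  # requires the transformers library

# Zero-shot classification: we choose the labels ourselves.
classifier = pipeline("zero-shot-classification")
result = classifier(
    "This course is about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)
print(result["labels"][0])  # the highest-scoring of our labels
```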

&lt;p&gt;&lt;strong&gt;Text generation&lt;/strong&gt;&lt;br&gt;
The main idea of text generation is that we provide some text and the model continues it. We can also control the total length of the output text.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5e40h0qaz2h49693lsg3.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5e40h0qaz2h49693lsg3.jpeg" alt="text generation" width="800" height="459"&gt;&lt;/a&gt;If we don’t specify a model, a default one is used; otherwise we can specify a model, as in the picture above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mask filling&lt;/strong&gt;&lt;br&gt;
The idea of this task is to fill in the blanks in a sentence.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp19a0bv3kz9zjml4x2sk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp19a0bv3kz9zjml4x2sk.png" alt="mask" width="800" height="107"&gt;&lt;/a&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpd6yaof103atwmtbm1v0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpd6yaof103atwmtbm1v0.png" alt="mask filling" width="800" height="370"&gt;&lt;/a&gt;The value of top_k sets how many candidates are suggested for the masked position.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Named entity recognition&lt;/strong&gt;&lt;br&gt;
It identifies the persons, organizations, locations, and other entities in a sentence.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F052hakn1bl3i7bzbkxrx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F052hakn1bl3i7bzbkxrx.png" alt="recognition" width="733" height="156"&gt;&lt;/a&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopu1j175f303hzu2ctn2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopu1j175f303hzu2ctn2.png" alt="result" width="506" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PER – person&lt;/li&gt;
&lt;li&gt;ORG – organization&lt;/li&gt;
&lt;li&gt;LOC – location&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Question answering&lt;/strong&gt;&lt;br&gt;
It gives an answer based on the provided information. It does not generate answers; it extracts them from the given context.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fec8oe23qnc4a9uev0zov.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fec8oe23qnc4a9uev0zov.png" alt="question answer" width="800" height="205"&gt;&lt;/a&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5urzr1j4txfvpyu7u2ms.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5urzr1j4txfvpyu7u2ms.png" alt="output" width="800" height="42"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summarization&lt;/strong&gt;&lt;br&gt;
In this case, it summarizes the paragraph that we provide.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8at3ingforp7fwipzzrv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8at3ingforp7fwipzzrv.png" alt="summary" width="800" height="399"&gt;&lt;/a&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzw7ygfif9qxm7u793ihs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzw7ygfif9qxm7u793ihs.png" alt="output" width="800" height="90"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Translation&lt;/strong&gt;&lt;br&gt;
It translates the provided text into a different language.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg1f65nwesmmj06ik8zj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg1f65nwesmmj06ik8zj.png" alt="translation" width="800" height="153"&gt;&lt;/a&gt;I have provided the model name as well as the translation pair “en-ur”, English to Urdu.&lt;/p&gt;

&lt;h3&gt;
  
  
How do transformers work?
&lt;/h3&gt;

&lt;p&gt;The Transformer architecture was introduced in 2017; some influential models built on it are GPT, BERT, etc.&lt;br&gt;
Transformer models are basically language models: they have been trained on large amounts of raw text in a self-supervised fashion. &lt;strong&gt;Self-supervised learning&lt;/strong&gt; means that humans are not needed to label the data. A model trained this way is not immediately useful for specific practical tasks, so we then apply &lt;strong&gt;transfer learning&lt;/strong&gt;: transferring the knowledge of a pretrained model to another model for a specific task.&lt;br&gt;
Transformers are large models. To achieve better results they should be trained on large amounts of data, but training at that scale impacts the environment heavily due to carbon dioxide emissions.&lt;br&gt;
So instead of &lt;strong&gt;pretraining&lt;/strong&gt; (training a model from scratch), we &lt;strong&gt;fine-tune existing models&lt;/strong&gt; (reusing pretrained models) to save time and reduce the impact on the environment.&lt;br&gt;
Fine-tuning a model therefore has lower time, data, financial, and environmental costs. It is also quicker and easier to iterate over different fine-tuning schemes, as the training is less constraining than a full pretraining.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;General Architecture&lt;/strong&gt;&lt;br&gt;
It generally consists of two sections:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Encoders&lt;/li&gt;
&lt;li&gt;Decoders&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;u&gt;Encoders&lt;/u&gt; receive the input and build a representation of its features.&lt;br&gt;
&lt;u&gt;Decoders&lt;/u&gt; use that representation to generate output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Models&lt;/strong&gt;&lt;br&gt;
There are three types of models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Encoder-only — good for tasks that require understanding of the input, such as named entity recognition.&lt;/li&gt;
&lt;li&gt;Decoder-only — good for generative tasks.&lt;/li&gt;
&lt;li&gt;Encoder-decoder — good for generative tasks that need an input, such as summarization or translation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcs599i2um769ftfbkyj0.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcs599i2um769ftfbkyj0.jpeg" alt="layers" width="800" height="1039"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ENCODERS
&lt;/h3&gt;

&lt;p&gt;The architecture of BERT (one of the most popular models) is “encoder-only”.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does it actually work&lt;/strong&gt;&lt;br&gt;
It takes a sequence of words as input and generates a numerical feature vector for each word.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4num6yz8w22vab453b9y.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4num6yz8w22vab453b9y.jpeg" alt="Iencoder" width="800" height="475"&gt;&lt;/a&gt;The feature vector generated for each word is not just a value for that word in isolation: it is computed from the context of the whole sentence (the self-attention mechanism), looking both left and right (bi-directional).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When encoders can be used&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Classification tasks&lt;/li&gt;
&lt;li&gt;Question answering tasks&lt;/li&gt;
&lt;li&gt;Masked language modeling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In these tasks encoders really shine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Representatives of this family&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ALBERT&lt;/li&gt;
&lt;li&gt;BERT&lt;/li&gt;
&lt;li&gt;DistilBERT&lt;/li&gt;
&lt;li&gt;ELECTRA&lt;/li&gt;
&lt;li&gt;RoBERTa&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  DECODERS
&lt;/h3&gt;

&lt;p&gt;We can do tasks similar to the encoder tasks with decoders, with a small loss of performance.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F684lieiq4fczd2wgih7l.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F684lieiq4fczd2wgih7l.jpeg" alt="decoder" width="800" height="484"&gt;&lt;/a&gt;The difference between encoders and decoders is that encoders use full self-attention, while decoders use a masked self-attention mechanism: when generating the representation for a word, a decoder can only attend to the words that come before it, not to those that come after.&lt;/p&gt;
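The masked self-attention pattern can be visualized as a lower-triangular matrix; a small NumPy sketch:

```python
import numpy as np

# A causal (masked) self-attention mask for a 5-token sequence:
# position i may attend only to positions 0..i (True = attention allowed).
n = 5
mask = np.tril(np.ones((n, n), dtype=bool))
print(mask.astype(int))  # 1s on and below the diagonal, 0s above
```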

&lt;p&gt;&lt;strong&gt;When we should use a decoder&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text generation (generating a word or a sequence of words; in NLP this ability is called causal language modeling)&lt;/li&gt;
&lt;li&gt;Word prediction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At each stage, for a given word, the attention layers can only access the words positioned before it in the sentence. These models are often called auto-regressive models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Representatives of this family&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CTRL&lt;/li&gt;
&lt;li&gt;GPT&lt;/li&gt;
&lt;li&gt;GPT-2&lt;/li&gt;
&lt;li&gt;Transformer XL&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ENCODER-DECODER
&lt;/h3&gt;

&lt;p&gt;In this type of model, we use an encoder together with a decoder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working&lt;/strong&gt;&lt;br&gt;
Let’s take an example of translation (transduction)&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzjyrolrznyw40k3xbyy3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzjyrolrznyw40k3xbyy3.png" alt="encoder-decoder" width="800" height="450"&gt;&lt;/a&gt;We give a sentence as input to the encoder; it generates a numerical sequence (feature vector) for those words, which the decoder then takes as input. The decoder decodes this sequence and outputs a word. A start-of-sequence token tells it to begin decoding. Once we have the first word and the feature vector, the encoder is no longer needed.&lt;br&gt;
We have already seen the auto-regressive behaviour of the decoder: the word it outputs can now be fed back as input to generate the second word, and so on until the sequence is finished.&lt;br&gt;
In this model, the encoder takes care of understanding the input sequence, and the decoder takes care of generating the output based on that understanding.&lt;/p&gt;
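The decoding loop described above can be sketched schematically in plain Python, with trivial stand-ins for the encoder and decoder (the word-for-word "translation table" is purely illustrative, not how a real model translates):

```python
# Trivial stand-ins: encode() plays the encoder, decode_step() the decoder.
def encode(source_tokens):
    # a fixed representation of the whole input sentence
    return tuple(source_tokens)

# Hypothetical word-for-word lookup, purely for illustration.
TABLE = {"welcome": "bienvenue", "to": "à", "nyc": "nyc"}

def decode_step(representation, generated_so_far):
    # the decoder sees the encoder output plus its own previous outputs
    i = len(generated_so_far)
    if i == len(representation):
        return "[EOS]"  # end-of-sequence token: stop decoding
    return TABLE[representation[i]]

rep = encode(["welcome", "to", "nyc"])
output = ["[SOS]"]  # the start-of-sequence token kicks off decoding
while output[-1] != "[EOS]":
    # auto-regressive loop: each output word becomes part of the next input
    output.append(decode_step(rep, output[1:]))
print(output[1:-1])  # ['bienvenue', 'à', 'nyc']
```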

&lt;p&gt;&lt;strong&gt;Where we can use these&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Translation&lt;/li&gt;
&lt;li&gt;Summarization&lt;/li&gt;
&lt;li&gt;Generative question answering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Representatives of this family&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BART&lt;/li&gt;
&lt;li&gt;mBART&lt;/li&gt;
&lt;li&gt;Marian&lt;/li&gt;
&lt;li&gt;T5&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limitations&lt;/strong&gt;&lt;br&gt;
An important note to end the article on: whether you pretrain a model yourself or fine-tune an existing one, these models are powerful but come with limitations.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqfl8zm0l9lj1ck0vk9i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqfl8zm0l9lj1ck0vk9i.png" alt="limitations" width="800" height="162"&gt;&lt;/a&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lk94oqt0a9d247hwcqg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lk94oqt0a9d247hwcqg.png" alt="output" width="782" height="85"&gt;&lt;/a&gt;When asked to fill the mask in the sentences above, the model suggests gender-specific occupations: it has absorbed the biases present in its training data. If you are using any of these models, this can be an issue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
In conclusion, transformer models have revolutionized the field of NLP. Their ability to understand relationships between words and handle long sequences makes them powerful tools for a wide range of tasks, from translation and text summarization to question answering and text generation. While the technical details can be complex, hopefully, this introduction has given you a basic understanding of how transformers work and their potential impact on the future of human-computer interaction.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>beginners</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
