<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rijul Rajesh</title>
    <description>The latest articles on DEV Community by Rijul Rajesh (@rijultp).</description>
    <link>https://dev.to/rijultp</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1207862%2F2d1456e5-ef74-42a1-ac31-d0e6d6bc547f.webp</url>
      <title>DEV Community: Rijul Rajesh</title>
      <link>https://dev.to/rijultp</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rijultp"/>
    <language>en</language>
    <item>
      <title>Understanding Attention Mechanisms – Part 6: Final Step in Decoding</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Sat, 04 Apr 2026 20:50:14 +0000</pubDate>
      <link>https://dev.to/rijultp/understanding-attention-mechanisms-part-6-final-step-in-decoding-5a87</link>
      <guid>https://dev.to/rijultp/understanding-attention-mechanisms-part-6-final-step-in-decoding-5a87</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-attention-mechanisms-part-5-how-attention-produces-the-first-output-1f3l"&gt;previous article&lt;/a&gt;, we obtained the initial output, but we didn’t receive the &lt;strong&gt;EOS token&lt;/strong&gt; yet.&lt;/p&gt;

&lt;p&gt;To get that, we need to &lt;strong&gt;unroll the embedding layer and the LSTMs in the decoder&lt;/strong&gt;, and then feed the translated word &lt;strong&gt;“vamos”&lt;/strong&gt; into the decoder’s unrolled embedding layer.&lt;/p&gt;

&lt;p&gt;After that, we follow the same process as before.&lt;br&gt;
But this time, we use the encoded values for &lt;strong&gt;“vamos”&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The second output from the decoder is &lt;strong&gt;EOS&lt;/strong&gt;, which means we are done decoding.&lt;/p&gt;
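&lt;p&gt;The unrolled decoding loop above can be sketched in a few lines of Python. The &lt;code&gt;decoder_step&lt;/code&gt; function here is a hypothetical stand-in for the embedding layer, the LSTMs, and attention:&lt;/p&gt;

```python
# Sketch of the decoding loop: feed each output word back in until the
# decoder emits EOS. decoder_step is a hypothetical stand-in for the
# embedding layer + LSTMs + attention.
def decode(decoder_step, max_steps=10):
    token = "EOS"           # decoding starts from the EOS token
    outputs = []
    for _ in range(max_steps):
        token = decoder_step(token)
        if token == "EOS":  # the second output is EOS, so we are done
            break
        outputs.append(token)
    return outputs

# Toy decoder_step mimicking the example: EOS -> "vamos" -> EOS.
steps = {"EOS": "vamos", "vamos": "EOS"}
print(decode(steps.get))    # prints ['vamos']
```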

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzq74iqzwe9az1bm85bvl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzq74iqzwe9az1bm85bvl.png" alt=" " width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9x312c5mhfohuiplujoz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9x312c5mhfohuiplujoz.png" alt=" " width="800" height="739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we add attention to an encoder-decoder model, the encoder mostly stays the same.&lt;/p&gt;

&lt;p&gt;However, during each step of decoding, the model has access to the individual encodings for each input word.&lt;/p&gt;

&lt;p&gt;We use similarity scores and the &lt;strong&gt;softmax function&lt;/strong&gt; to determine &lt;strong&gt;what percentage of each encoded input word&lt;/strong&gt; should be used to predict the next output word.&lt;/p&gt;

&lt;p&gt;With attention in place, the LSTMs are no longer strictly necessary, since each decoding step already has direct access to the individual input encodings.&lt;/p&gt;

&lt;p&gt;We’ll explore this further when we move on to &lt;strong&gt;transformers&lt;/strong&gt;.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Looking for an easier way to install tools, libraries, or entire repositories?&lt;/strong&gt;&lt;br&gt;
Try &lt;strong&gt;Installerpedia&lt;/strong&gt;: a &lt;strong&gt;community-driven, structured installation platform&lt;/strong&gt; that lets you install almost anything with &lt;strong&gt;minimal hassle&lt;/strong&gt; and &lt;strong&gt;clear, reliable guidance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ipm &lt;span class="nb"&gt;install &lt;/span&gt;repo-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;… and you’re done! 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hexmos.com/freedevtools/installerpedia" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2s3mzj8pfcq94a1y4at.png" alt="Installerpedia Screenshot" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://hexmos.com/freedevtools/installerpedia/" rel="noopener noreferrer"&gt;&lt;strong&gt;Explore Installerpedia here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Attention Mechanisms – Part 5: How Attention Produces the First Output</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Wed, 01 Apr 2026 21:16:46 +0000</pubDate>
      <link>https://dev.to/rijultp/understanding-attention-mechanisms-part-5-how-attention-produces-the-first-output-1f3l</link>
      <guid>https://dev.to/rijultp/understanding-attention-mechanisms-part-5-how-attention-produces-the-first-output-1f3l</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-attention-mechanisms-part-4-turning-similarity-scores-into-attention-weights-5aj2"&gt;previous article&lt;/a&gt;, we stopped at using the &lt;strong&gt;softmax function to scale the scores&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When we scale the values for the first encoded word &lt;strong&gt;“Let’s”&lt;/strong&gt; by &lt;strong&gt;0.4&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2mh2c1dzkberz4204ur.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2mh2c1dzkberz4204ur.png" alt=" " width="800" height="552"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And we scale the values for the second encoded word &lt;strong&gt;“go”&lt;/strong&gt; by &lt;strong&gt;0.6&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fculyv993hxrv7e5iyz9f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fculyv993hxrv7e5iyz9f.png" alt=" " width="637" height="780"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Finally, we add the scaled values together:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxw4ktni42f7fo3ms7eag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxw4ktni42f7fo3ms7eag.png" alt=" " width="745" height="578"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;These sums combine the separate encodings for both input words, &lt;strong&gt;“Let’s”&lt;/strong&gt; and &lt;strong&gt;“go”&lt;/strong&gt;, based on their similarity to &lt;strong&gt;EOS&lt;/strong&gt;.&lt;br&gt;
These are the &lt;strong&gt;attention values for EOS&lt;/strong&gt;.&lt;/p&gt;
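&lt;p&gt;The scale-and-add step can be sketched directly. The encoder outputs for &lt;strong&gt;“Let’s”&lt;/strong&gt; come from earlier in this series; the values for &lt;strong&gt;“go”&lt;/strong&gt; are made up for illustration:&lt;/p&gt;

```python
# Weighted sum of the encoder outputs using the softmax weights 0.4 and 0.6.
lets = [-0.76, 0.75]   # encoder LSTM outputs for "Let's" (from Part 3)
go = [0.01, 0.53]      # hypothetical encoder outputs for "go"

attention_values = [0.4 * a + 0.6 * b for a, b in zip(lets, go)]
# one attention value per LSTM cell
```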



&lt;p&gt;Now, to determine our first output word, we need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feed the attention values into a fully connected layer&lt;/li&gt;
&lt;li&gt;Also include the encoding for &lt;strong&gt;EOS&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Then pass everything through a &lt;strong&gt;softmax function&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows the model to select the first output word, &lt;strong&gt;“vamos”&lt;/strong&gt;.&lt;/p&gt;
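&lt;p&gt;A minimal sketch of that selection step, with a toy vocabulary and made-up fully connected weights (nothing here comes from a trained model):&lt;/p&gt;

```python
import math

# Toy sketch: the attention values plus the EOS encoding go through a
# fully connected layer, then softmax picks the most likely output word.
# All weights and the vocabulary are made up for illustration.
features = [-0.3, 0.62, 0.91, 0.38]     # attention values + EOS encoding
vocab = ["vamos", "y", "EOS"]
fc_weights = [
    [0.9, 0.8, 0.1, 0.2],    # one row of weights per vocabulary word
    [-0.5, 0.1, 0.1, -0.2],
    [0.1, -0.4, 0.2, 0.1],
]

logits = [sum(w * x for w, x in zip(row, features)) for row in fc_weights]
exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]   # softmax over the vocabulary
best = vocab[probs.index(max(probs))]
print(best)                             # prints vamos
```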

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd1jf8p4j1q2v61taj0z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd1jf8p4j1q2v61taj0z.png" alt=" " width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But we haven’t reached EOS yet.&lt;br&gt;
We will explore how to move further in the next article.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Attention Mechanisms – Part 4: Turning Similarity Scores into Attention Weights</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Tue, 31 Mar 2026 19:10:00 +0000</pubDate>
      <link>https://dev.to/rijultp/understanding-attention-mechanisms-part-4-turning-similarity-scores-into-attention-weights-5aj2</link>
      <guid>https://dev.to/rijultp/understanding-attention-mechanisms-part-4-turning-similarity-scores-into-attention-weights-5aj2</guid>
      <description>&lt;p&gt;In the previous &lt;a href="https://dev.to/rijultp/understanding-attention-mechanisms-part-3-from-cosine-similarity-to-dot-product-39ga"&gt;article&lt;/a&gt;, we just explored the benefits of using dot product instead of cosine similarity for attention.&lt;/p&gt;

&lt;p&gt;Let's dig in further and see how our diagram looks after using the dot product.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxltq5xdslpwvwgnbddh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxltq5xdslpwvwgnbddh.png" alt=" " width="800" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We simply multiply each pair of output values and add them together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(-0.76 × 0.91) + (0.75 × 0.38)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us &lt;strong&gt;-0.41&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;Likewise, we can compute a similarity score using the dot product between the second input word &lt;strong&gt;“go”&lt;/strong&gt; and the &lt;strong&gt;EOS token&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv5mxenu6r5xxs3x514c3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv5mxenu6r5xxs3x514c3.png" alt=" " width="800" height="615"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This gives us &lt;strong&gt;0.01&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;Now that we have the scores, let’s see how to use them.&lt;/p&gt;

&lt;p&gt;The score between &lt;strong&gt;“go” and EOS (0.01)&lt;/strong&gt; is higher than the score between &lt;strong&gt;“let’s” and EOS (-0.41)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Since the score for &lt;strong&gt;“go”&lt;/strong&gt; is higher, we want the encoding for &lt;strong&gt;“go”&lt;/strong&gt; to have more influence on the first word that comes out of the decoder.&lt;/p&gt;




&lt;p&gt;We can achieve this by passing the scores through the &lt;strong&gt;softmax function&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The softmax function gives us values between &lt;strong&gt;0 and 1&lt;/strong&gt;, and they all add up to &lt;strong&gt;1&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F74ye44r05ka3frlqitq8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F74ye44r05ka3frlqitq8.png" alt=" " width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, we can think of the softmax function as a way to determine &lt;strong&gt;what percentage of each encoded input word we should use&lt;/strong&gt; when decoding.&lt;/p&gt;

&lt;p&gt;In this case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We use &lt;strong&gt;40%&lt;/strong&gt; of the first encoded word (“let’s”)&lt;/li&gt;
&lt;li&gt;We use &lt;strong&gt;60%&lt;/strong&gt; of the second encoded word (“go”)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This helps the decoder decide what the first translated word should be.&lt;/p&gt;
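&lt;p&gt;The softmax step can be checked in a couple of lines:&lt;/p&gt;

```python
import math

# Softmax over the two similarity scores from this article.
scores = {"let's": -0.41, "go": 0.01}

exps = {word: math.exp(s) for word, s in scores.items()}
total = sum(exps.values())
weights = {word: e / total for word, e in exps.items()}
# weights is roughly {"let's": 0.40, "go": 0.60}
```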

&lt;p&gt;We will continue with the scaling part in the next article.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Cosine Similarity vs Dot Product in Attention Mechanisms</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Mon, 30 Mar 2026 21:35:34 +0000</pubDate>
      <link>https://dev.to/rijultp/cosine-similarity-vs-dot-product-in-attention-mechanisms-1m9h</link>
      <guid>https://dev.to/rijultp/cosine-similarity-vs-dot-product-in-attention-mechanisms-1m9h</guid>
      <description>&lt;p&gt;For comparing the hidden states between the encoder and decoder, we need a &lt;strong&gt;similarity score&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Two common approaches to calculate this are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cosine similarity&lt;/li&gt;
&lt;li&gt;Dot product&lt;/li&gt;
&lt;/ul&gt;




&lt;h4&gt;
  
  
  Cosine Similarity
&lt;/h4&gt;

&lt;p&gt;Cosine similarity computes the dot product of the two vectors and then divides by the product of their magnitudes, which normalizes the result to the range &lt;strong&gt;-1 to 1&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Encoder output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;-0.76&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Decoder output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.91&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.38&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cosine similarity ≈ &lt;strong&gt;-0.39&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Close to &lt;strong&gt;1&lt;/strong&gt; → very similar → strong attention&lt;/li&gt;
&lt;li&gt;Close to &lt;strong&gt;0&lt;/strong&gt; → not related&lt;/li&gt;
&lt;li&gt;Negative → opposite → low attention&lt;/li&gt;
&lt;/ul&gt;
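&lt;p&gt;The calculation can be reproduced in a few lines:&lt;/p&gt;

```python
import math

# Cosine similarity between the encoder and decoder outputs above.
a = [-0.76, 0.75]   # encoder output
b = [0.91, 0.38]    # decoder output

dot = sum(x * y for x, y in zip(a, b))
cosine = dot / (math.hypot(*a) * math.hypot(*b))
# cosine is roughly -0.39: negative, so low attention
```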




&lt;p&gt;This is useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Values can vary a lot in size&lt;/li&gt;
&lt;li&gt;You want a &lt;strong&gt;consistent scale (-1 to 1)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;The problem is that it’s a bit expensive: the normalization requires extra calculations (square roots and a division), and in attention we don’t always need that.&lt;/p&gt;




&lt;h4&gt;
  
  
  Dot Product
&lt;/h4&gt;

&lt;p&gt;Dot product is much simpler. It does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiply corresponding values&lt;/li&gt;
&lt;li&gt;Add them up&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(-0.76 × 0.91) + (0.75 × 0.38) = -0.41
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
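&lt;p&gt;The two steps map one-to-one onto code:&lt;/p&gt;

```python
a = [-0.76, 0.75]   # encoder output
b = [0.91, 0.38]    # decoder output

products = [x * y for x, y in zip(a, b)]   # multiply corresponding values
score = sum(products)                      # add them up
# score is roughly -0.41
```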






&lt;p&gt;Dot product is preferred in attention because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It’s fast&lt;/li&gt;
&lt;li&gt;It’s simple&lt;/li&gt;
&lt;li&gt;It gives good relative scores&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even if the numbers are not normalized, the model can still figure out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which words are more important&lt;/li&gt;
&lt;li&gt;Which words to ignore&lt;/li&gt;
&lt;/ul&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Attention Mechanisms – Part 3: From Cosine Similarity to Dot Product</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Sat, 28 Mar 2026 21:55:13 +0000</pubDate>
      <link>https://dev.to/rijultp/understanding-attention-mechanisms-part-3-from-cosine-similarity-to-dot-product-39ga</link>
      <guid>https://dev.to/rijultp/understanding-attention-mechanisms-part-3-from-cosine-similarity-to-dot-product-39ga</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-attention-mechanisms-part-2-comparing-encoder-and-decoder-outputs-22ch"&gt;previous article&lt;/a&gt;, we explored the comparison between encoder and decoder outputs. In this article, we will be checking the math on how the calculation is done, and how it can be further simplified.&lt;/p&gt;

&lt;p&gt;The output values for the &lt;strong&gt;two LSTM cells in the encoder&lt;/strong&gt; for the word &lt;strong&gt;"Let’s"&lt;/strong&gt; are &lt;strong&gt;-0.76&lt;/strong&gt; and &lt;strong&gt;0.75&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The output values from the &lt;strong&gt;two LSTM cells in the decoder&lt;/strong&gt; for the &lt;strong&gt;&lt;code&gt;&amp;lt;EOS&amp;gt;&lt;/code&gt; token&lt;/strong&gt; are &lt;strong&gt;0.91&lt;/strong&gt; and &lt;strong&gt;0.38&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We can represent this as:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A = Encoder&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;B = Decoder&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cell #1     Cell #2
-0.76       0.75
 0.91       0.38
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we plug these values into the &lt;strong&gt;cosine similarity equation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4scsocrbsp48d6cdoq61.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4scsocrbsp48d6cdoq61.png" alt=" " width="349" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This gives us a result of &lt;strong&gt;-0.39&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To simplify this further, a common approach is to &lt;strong&gt;compute only the numerator&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The denominator mainly &lt;strong&gt;scales the value between -1 and 1&lt;/strong&gt;, so in some cases, we can ignore it for simplicity.&lt;/p&gt;

&lt;p&gt;Since we are dealing with a &lt;strong&gt;fixed number of cells&lt;/strong&gt;, this simplification works well. This is also known as the &lt;strong&gt;dot product&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6oaj38y7mgeolaw27bz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6oaj38y7mgeolaw27bz.png" alt=" " width="244" height="108"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we calculate only the dot product, we get:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;(-0.76 × 0.91) + (0.75 × 0.38) = -0.41&lt;/strong&gt;&lt;/p&gt;
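&lt;p&gt;Both versions can be written side by side to see exactly what the simplification drops:&lt;/p&gt;

```python
import math

# Encoder outputs for "Let's" (A) and decoder outputs for EOS (B).
a = [-0.76, 0.75]
b = [0.91, 0.38]

numerator = sum(x * y for x, y in zip(a, b))    # dot product, roughly -0.41
denominator = math.hypot(*a) * math.hypot(*b)
cosine = numerator / denominator                # roughly -0.39

# Dropping the denominator keeps the sign and the relative ordering of
# the scores while skipping the square roots and the division.
```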

&lt;p&gt;We will &lt;strong&gt;explore this further in the next article&lt;/strong&gt;.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Attention Mechanisms – Part 2: Comparing Encoder and Decoder Outputs</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Fri, 27 Mar 2026 19:16:17 +0000</pubDate>
      <link>https://dev.to/rijultp/understanding-attention-mechanisms-part-2-comparing-encoder-and-decoder-outputs-22ch</link>
      <guid>https://dev.to/rijultp/understanding-attention-mechanisms-part-2-comparing-encoder-and-decoder-outputs-22ch</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-attention-mechanisms-part-1-why-long-sentences-break-encoder-decoders-lm9"&gt;previous article&lt;/a&gt;, we &lt;strong&gt;explored the main idea of attention&lt;/strong&gt; and the modifications it requires in an encoder–decoder model. Now, we will &lt;strong&gt;explore that idea further&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;An encoder–decoder model can be as simple as &lt;strong&gt;an embedding layer attached to a single LSTM&lt;/strong&gt;. If we want a more advanced encoder, we can &lt;strong&gt;add additional LSTM cells&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;Now, we initialize the &lt;strong&gt;long-term and short-term memory&lt;/strong&gt; in the LSTMs of the encoder with &lt;strong&gt;zeros&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If our input sentence, which we want to translate into Spanish, is &lt;strong&gt;"Let's go"&lt;/strong&gt;, we can &lt;strong&gt;feed a 1 for "Let's" into the embedding layer&lt;/strong&gt;, unroll the network, and then &lt;strong&gt;feed a 1 for "go" into the embedding layer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This process creates the &lt;strong&gt;context vector&lt;/strong&gt;, which we use to &lt;strong&gt;initialize a separate set of LSTM cells in the decoder&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdygvu0rik7w0ajcfrm5m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdygvu0rik7w0ajcfrm5m.png" alt=" " width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;All of the input is &lt;strong&gt;compressed into the context vector&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But the idea of &lt;strong&gt;attention&lt;/strong&gt; is that &lt;strong&gt;each step in the decoder should have direct access to the inputs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So, let’s understand how &lt;strong&gt;attention connects the inputs to each step of the decoder&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0at2hw9f0ehyotambfqf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0at2hw9f0ehyotambfqf.png" alt=" " width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;In this example, the first thing attention does is &lt;strong&gt;determine how similar the outputs from the encoder LSTMs are to the outputs from the decoder LSTMs at each step&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In other words, we compute a &lt;strong&gt;similarity score between the LSTM outputs&lt;/strong&gt; (the &lt;strong&gt;short-term memory or hidden states&lt;/strong&gt;) from the encoder and the decoder.&lt;/p&gt;

&lt;p&gt;For instance, we calculate a similarity score between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The LSTM output from the first step in the encoder&lt;/strong&gt;, and&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The LSTM output from the first step in the decoder&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn93fc6zknyy94lvwaq3m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn93fc6zknyy94lvwaq3m.png" alt=" " width="800" height="531"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We also calculate a similarity score between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The LSTM output from the second step in the encoder&lt;/strong&gt;, and&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The LSTM output from the first step in the decoder&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1aaq7bm68gpabseihwxi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1aaq7bm68gpabseihwxi.png" alt=" " width="800" height="532"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;There are &lt;strong&gt;various ways to calculate this similarity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;One simple method is &lt;strong&gt;cosine similarity&lt;/strong&gt;, which measures how similar two sequences of numbers (representing words) are.&lt;/p&gt;
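&lt;p&gt;As a sketch, the per-step comparison looks like this (the encoder values for &lt;strong&gt;“go”&lt;/strong&gt; are made up for illustration; the other numbers appear later in this series):&lt;/p&gt;

```python
import math

# Cosine similarity between each encoder hidden state and the decoder's
# first hidden state: one similarity score per input word.
def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

encoder_states = [[-0.76, 0.75],   # step 1: "Let's"
                  [0.01, 0.53]]    # step 2: "go" (hypothetical values)
decoder_state = [0.91, 0.38]       # decoder step 1 (the EOS step)

scores = [cosine(e, decoder_state) for e in encoder_states]
```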

&lt;p&gt;We will &lt;strong&gt;explore this further in the next article&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Looking for an easier way to install tools, libraries, or entire repositories?&lt;/strong&gt;&lt;br&gt;
Try &lt;strong&gt;Installerpedia&lt;/strong&gt;: a &lt;strong&gt;community-driven, structured installation platform&lt;/strong&gt; that lets you install almost anything with &lt;strong&gt;minimal hassle&lt;/strong&gt; and &lt;strong&gt;clear, reliable guidance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ipm &lt;span class="nb"&gt;install &lt;/span&gt;repo-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;… and you’re done! 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hexmos.com/freedevtools/installerpedia" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2s3mzj8pfcq94a1y4at.png" alt="Installerpedia Screenshot" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://hexmos.com/freedevtools/installerpedia/" rel="noopener noreferrer"&gt;&lt;strong&gt;Explore Installerpedia here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Attention Mechanisms – Part 1: Why Long Sentences Break Encoder–Decoders</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Thu, 26 Mar 2026 19:48:40 +0000</pubDate>
      <link>https://dev.to/rijultp/understanding-attention-mechanisms-part-1-why-long-sentences-break-encoder-decoders-lm9</link>
      <guid>https://dev.to/rijultp/understanding-attention-mechanisms-part-1-why-long-sentences-break-encoder-decoders-lm9</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-seq2seq-neural-networks-part-8-when-does-the-decoder-stop-4jdi"&gt;previous articles&lt;/a&gt;, we understood Seq2Seq models. Now, on the path toward transformers, we need to understand one more concept before reaching there: Attention.&lt;/p&gt;

&lt;p&gt;The encoder in a basic encoder–decoder, by unrolling the LSTMs, &lt;strong&gt;compresses the entire input sentence into a single context vector&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This works fine for short phrases like &lt;strong&gt;"Let's go"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But if we had a bigger input vocabulary with thousands of words, then we could input longer and more complicated sentences, like &lt;strong&gt;"Don't eat the delicious-looking and smelling pasta"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For longer phrases, even with LSTMs, &lt;strong&gt;words that are input early on can be forgotten&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this case, if we forget the first word &lt;strong&gt;"Don't"&lt;/strong&gt;, then it becomes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"eat the delicious-looking and smelling pasta"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So, sometimes it is important to &lt;strong&gt;remember the first word&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Basic RNNs had problems with long-term memory because they &lt;strong&gt;ran both long- and short-term information through a single path&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The main idea of &lt;strong&gt;Long Short-Term Memory (LSTM) units&lt;/strong&gt; is that they solve this problem by &lt;strong&gt;providing separate paths for long- and short-term memory&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Even with separate paths, if we have a lot of data, &lt;strong&gt;both paths still have to carry a large amount of information&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So, a word at the start of a long phrase, like &lt;strong&gt;"Don't"&lt;/strong&gt;, can still get lost.&lt;/p&gt;

&lt;p&gt;This is why the main idea of &lt;strong&gt;attention&lt;/strong&gt; is to &lt;strong&gt;add multiple new paths from the encoder to the decoder&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;There is &lt;strong&gt;one path per input value&lt;/strong&gt;, so each step of the decoder can &lt;strong&gt;directly access the relevant input values&lt;/strong&gt;.&lt;/p&gt;
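&lt;p&gt;As a rough sketch of the idea (with made-up numbers, not real model values): each decoder step scores every encoder output, turns the scores into weights, and takes a weighted sum, so every input word keeps its own path to the decoder:&lt;/p&gt;

```python
import math

def softmax(scores):
    # Turn raw scores into positive weights that sum to 1
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical encoder outputs, one per input word ("let's", "go")
encoder_outputs = [[0.9, 0.1], [0.2, 0.7]]

# Hypothetical similarity scores between the current decoder step
# and each encoder step -- one score per path
scores = [2.0, 0.5]
weights = softmax(scores)

# The decoder's context is a weighted sum of ALL encoder outputs,
# so early words like "Don't" cannot simply be forgotten
context = [sum(w * h[i] for w, h in zip(weights, encoder_outputs))
           for i in range(len(encoder_outputs[0]))]
```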

&lt;p&gt;We will explore more about attention in the next article.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Looking for an easier way to install tools, libraries, or entire repositories?&lt;/strong&gt;&lt;br&gt;
Try &lt;strong&gt;Installerpedia&lt;/strong&gt;: a &lt;strong&gt;community-driven, structured installation platform&lt;/strong&gt; that lets you install almost anything with &lt;strong&gt;minimal hassle&lt;/strong&gt; and &lt;strong&gt;clear, reliable guidance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ipm &lt;span class="nb"&gt;install &lt;/span&gt;repo-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;… and you’re done! 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hexmos.com/freedevtools/installerpedia" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2s3mzj8pfcq94a1y4at.png" alt="Installerpedia Screenshot" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://hexmos.com/freedevtools/installerpedia/" rel="noopener noreferrer"&gt;&lt;strong&gt;Explore Installerpedia here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Seq2Seq Neural Networks – Part 8: When Does the Decoder Stop?</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Wed, 25 Mar 2026 22:23:53 +0000</pubDate>
      <link>https://dev.to/rijultp/understanding-seq2seq-neural-networks-part-8-when-does-the-decoder-stop-4jdi</link>
      <guid>https://dev.to/rijultp/understanding-seq2seq-neural-networks-part-8-when-does-the-decoder-stop-4jdi</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-seq2seq-neural-networks-part-7-generating-the-output-with-softmax-387i"&gt;previous article&lt;/a&gt;, we saw the translation being done.&lt;/p&gt;

&lt;p&gt;But there is an issue.&lt;/p&gt;

&lt;p&gt;The decoder &lt;strong&gt;does not stop until it outputs an EOS token&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So, we plug the word &lt;strong&gt;"Vamos"&lt;/strong&gt; into the decoder’s unrolled embedding layer and unroll the two LSTM cells in each layer.&lt;/p&gt;

&lt;p&gt;Then, we run the output values (short-term memory or hidden states) into the same fully connected layer.&lt;/p&gt;

&lt;p&gt;The next predicted token is &lt;strong&gt;EOS&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgfk92jxp7gf6ufxqa7rg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgfk92jxp7gf6ufxqa7rg.png" alt=" " width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How the Decoder Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This means we have translated the English sentence &lt;strong&gt;"let’s go"&lt;/strong&gt; into the correct Spanish sentence.&lt;/p&gt;

&lt;p&gt;For the decoder, the &lt;strong&gt;context vector&lt;/strong&gt;, which is created by both layers of the encoder’s unrolled LSTM cells, is used to initialize the LSTMs in the decoder.&lt;/p&gt;

&lt;p&gt;The input to the LSTMs comes from the output word embedding layer, which starts with &lt;strong&gt;EOS&lt;/strong&gt;. After that, it uses whatever word was predicted by the output layer.&lt;/p&gt;

&lt;p&gt;In practice, the decoder keeps predicting words until it predicts the &lt;strong&gt;EOS token&lt;/strong&gt; or reaches some maximum output length.&lt;/p&gt;
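&lt;p&gt;The stopping rule above can be sketched as a simple loop (the toy &lt;code&gt;predictions&lt;/code&gt; table below is a hypothetical stand-in for the trained decoder):&lt;/p&gt;

```python
def decode(predict_next, start_token="EOS", eos_token="EOS", max_len=10):
    # Feed the previous token in, collect predictions until EOS
    # (or until a maximum output length, as a safety net)
    output = []
    token = start_token
    for _ in range(max_len):
        token = predict_next(token)
        if token == eos_token:
            break
        output.append(token)
    return output

# Toy stand-in for the trained decoder: EOS -> "vamos" -> EOS
predictions = {"EOS": "vamos", "vamos": "EOS"}
sentence = decode(predictions.get)  # ["vamos"]
```

&lt;p&gt;The &lt;code&gt;max_len&lt;/code&gt; cap matters because an untrained or badly trained model might never emit EOS on its own.&lt;/p&gt;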

&lt;p&gt;All these weights and biases are trained using &lt;strong&gt;backpropagation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When training an encoder-decoder model, instead of using the predicted token as input to the decoder LSTMs, we use the &lt;strong&gt;known correct token&lt;/strong&gt;. This is known as &lt;strong&gt;teacher forcing&lt;/strong&gt; (&lt;a href="https://dev.to/rijultp/understanding-teacher-forcing-in-seq2seq-models-a89"&gt;explained in this article&lt;/a&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What’s Next&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;That’s it for sequence-to-sequence neural networks.&lt;/p&gt;

&lt;p&gt;In the next article, we will continue with the &lt;strong&gt;attention mechanism&lt;/strong&gt; for neural networks.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Looking for an easier way to install tools, libraries, or entire repositories?&lt;/strong&gt;&lt;br&gt;
Try &lt;strong&gt;Installerpedia&lt;/strong&gt;: a &lt;strong&gt;community-driven, structured installation platform&lt;/strong&gt; that lets you install almost anything with &lt;strong&gt;minimal hassle&lt;/strong&gt; and &lt;strong&gt;clear, reliable guidance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ipm &lt;span class="nb"&gt;install &lt;/span&gt;repo-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;… and you’re done! 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hexmos.com/freedevtools/installerpedia" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2s3mzj8pfcq94a1y4at.png" alt="Installerpedia Screenshot" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://hexmos.com/freedevtools/installerpedia/" rel="noopener noreferrer"&gt;&lt;strong&gt;Explore Installerpedia here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Teacher Forcing in Seq2Seq Models</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Mon, 23 Mar 2026 19:35:35 +0000</pubDate>
      <link>https://dev.to/rijultp/understanding-teacher-forcing-in-seq2seq-models-a89</link>
      <guid>https://dev.to/rijultp/understanding-teacher-forcing-in-seq2seq-models-a89</guid>
      <description>&lt;p&gt;When we learn about seq2seq neural networks, there is a term we should know called Teacher Forcing.&lt;/p&gt;

&lt;p&gt;When we train a seq2seq model, the decoder generates one token at a time, building the output sequence step by step.&lt;/p&gt;

&lt;p&gt;At each step, it needs a previous token as input to predict the next one.&lt;/p&gt;

&lt;p&gt;So in this case, we should think about what to provide as the previous token, since this choice directly affects how well the model learns.&lt;/p&gt;




&lt;p&gt;Without teacher forcing, the model uses its own previous prediction as input.&lt;/p&gt;

&lt;p&gt;Suppose the target is "I am learning":&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Predicts "I" ✅&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Uses "I" and predicts "is" ❌&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Uses "is", and everything goes off track&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here, one small mistake early on causes all the following predictions to go wrong, and the mistakes keep compounding step by step.&lt;/p&gt;

&lt;p&gt;This makes training slow and unstable, and makes it harder for the model to learn correct sequences.&lt;/p&gt;




&lt;p&gt;With teacher forcing, instead of using the model’s prediction, we feed the correct token from the dataset at every step.&lt;/p&gt;

&lt;p&gt;So even if the model makes a mistake at one step, we replace it with the correct token before moving forward.&lt;/p&gt;

&lt;p&gt;This ensures that the model always sees the right context while learning.&lt;/p&gt;

&lt;p&gt;Even if the model makes a mistake, we do not let that mistake affect future steps during training.&lt;/p&gt;

&lt;p&gt;This makes training faster and more stable, and helps the model converge.&lt;/p&gt;
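&lt;p&gt;A minimal sketch of the difference (the always-wrong &lt;code&gt;bad_model&lt;/code&gt; and the &lt;code&gt;&amp;lt;sos&amp;gt;&lt;/code&gt; start token are hypothetical, just to show how the decoder's inputs differ):&lt;/p&gt;

```python
def decoder_inputs(target, teacher_forcing, predict):
    # Returns the sequence of previous-token inputs the decoder
    # sees at each training step
    inputs, prev = [], "<sos>"  # hypothetical start token
    for correct in target:
        inputs.append(prev)
        predicted = predict(prev)
        # Teacher forcing: the next input is the ground-truth token,
        # no matter what the model actually predicted
        prev = correct if teacher_forcing else predicted
    return inputs

target = ["I", "am", "learning"]
bad_model = lambda prev: "is"  # a model that always predicts "is"

with_tf = decoder_inputs(target, True, bad_model)      # ["<sos>", "I", "am"]
without_tf = decoder_inputs(target, False, bad_model)  # ["<sos>", "is", "is"]
```

&lt;p&gt;With teacher forcing the model always trains on the correct context; without it, the early "is" mistake contaminates every later step.&lt;/p&gt;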




&lt;p&gt;&lt;strong&gt;Looking for an easier way to install tools, libraries, or entire repositories?&lt;/strong&gt;&lt;br&gt;
Try &lt;strong&gt;Installerpedia&lt;/strong&gt;: a &lt;strong&gt;community-driven, structured installation platform&lt;/strong&gt; that lets you install almost anything with &lt;strong&gt;minimal hassle&lt;/strong&gt; and &lt;strong&gt;clear, reliable guidance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ipm &lt;span class="nb"&gt;install &lt;/span&gt;repo-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;… and you’re done! 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hexmos.com/freedevtools/installerpedia" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2s3mzj8pfcq94a1y4at.png" alt="Installerpedia Screenshot" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://hexmos.com/freedevtools/installerpedia/" rel="noopener noreferrer"&gt;&lt;strong&gt;Explore Installerpedia here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Seq2Seq Neural Networks – Part 7: Generating the Output with Softmax</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Sat, 21 Mar 2026 20:26:23 +0000</pubDate>
      <link>https://dev.to/rijultp/understanding-seq2seq-neural-networks-part-7-generating-the-output-with-softmax-387i</link>
      <guid>https://dev.to/rijultp/understanding-seq2seq-neural-networks-part-7-generating-the-output-with-softmax-387i</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-seq2seq-neural-networks-part-6-decoder-outputs-and-the-fully-connected-layer-1bhp"&gt;previous article&lt;/a&gt;,  we were transforming the outputs to the fully connected layer.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;fully connected layer&lt;/strong&gt; is just another name for a &lt;strong&gt;basic neural network&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This &lt;strong&gt;fully connected layer&lt;/strong&gt; has &lt;strong&gt;two inputs&lt;/strong&gt;, corresponding to the &lt;strong&gt;two values that come from the top layer of LSTM cells&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft40w0dlofjxmq3f305s3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft40w0dlofjxmq3f305s3.png" alt=" " width="709" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It has &lt;strong&gt;four outputs&lt;/strong&gt;, one for each &lt;strong&gt;token in the Spanish vocabulary&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwckxvl3osgzi1mwkip57.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwckxvl3osgzi1mwkip57.png" alt=" " width="574" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In between, we have &lt;strong&gt;connections between each input and output&lt;/strong&gt;, each with their own &lt;strong&gt;weights and biases&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Then, we run the &lt;strong&gt;output of the fully connected layer&lt;/strong&gt; through a &lt;strong&gt;Softmax function&lt;/strong&gt; to &lt;strong&gt;select the output word&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsl9kdpz6b6fn9bs1lnoh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsl9kdpz6b6fn9bs1lnoh.png" alt=" " width="581" height="726"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, going back to the &lt;strong&gt;full Encoder–Decoder model&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3ng3ncxxocvp47hkz3k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3ng3ncxxocvp47hkz3k.png" alt=" " width="800" height="526"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that the &lt;strong&gt;output from the Softmax function is “Vamos,”&lt;/strong&gt; which is the &lt;strong&gt;Spanish translation for “Let’s go.”&lt;/strong&gt;&lt;/p&gt;
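&lt;p&gt;A minimal sketch of these two steps, with made-up weights and biases (a real model learns these values during training):&lt;/p&gt;

```python
import math

def fully_connected(inputs, weights, biases):
    # One output per vocabulary token: weighted sum of the inputs plus a bias
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    # Turn raw outputs into probabilities that sum to 1
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["ir", "vamos", "y", "EOS"]  # the 4-token Spanish vocabulary
hidden = [0.6, -0.2]                 # hypothetical top-layer LSTM outputs

# Hypothetical weights: one row (of 2 weights) and one bias per output token
W = [[0.1, 0.4], [1.5, -0.3], [-0.2, 0.2], [0.3, 0.1]]
b = [0.0, 0.5, 0.0, -0.1]

probs = softmax(fully_connected(hidden, W, b))
predicted = vocab[probs.index(max(probs))]  # "vamos" with these numbers
```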

&lt;p&gt;We will continue further in the &lt;strong&gt;next article&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Looking for an easier way to install tools, libraries, or entire repositories?&lt;/strong&gt;&lt;br&gt;
Try &lt;strong&gt;Installerpedia&lt;/strong&gt;: a &lt;strong&gt;community-driven, structured installation platform&lt;/strong&gt; that lets you install almost anything with &lt;strong&gt;minimal hassle&lt;/strong&gt; and &lt;strong&gt;clear, reliable guidance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ipm &lt;span class="nb"&gt;install &lt;/span&gt;repo-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;… and you’re done! 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hexmos.com/freedevtools/installerpedia" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2s3mzj8pfcq94a1y4at.png" alt="Installerpedia Screenshot" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://hexmos.com/freedevtools/installerpedia/" rel="noopener noreferrer"&gt;&lt;strong&gt;Explore Installerpedia here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Seq2Seq Neural Networks – Part 6: Decoder Outputs and the Fully Connected Layer</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Fri, 20 Mar 2026 19:45:53 +0000</pubDate>
      <link>https://dev.to/rijultp/understanding-seq2seq-neural-networks-part-6-decoder-outputs-and-the-fully-connected-layer-1bhp</link>
      <guid>https://dev.to/rijultp/understanding-seq2seq-neural-networks-part-6-decoder-outputs-and-the-fully-connected-layer-1bhp</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-seq2seq-neural-networks-part-5-decoding-the-context-vector-358l"&gt;previous article&lt;/a&gt;, we were looking at the &lt;strong&gt;embedding values in the encoder and the decoder&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyspq3gjqxrqd9gkmeqqw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyspq3gjqxrqd9gkmeqqw.png" alt=" " width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, they have &lt;strong&gt;different input words and symbols (tokens)&lt;/strong&gt; and &lt;strong&gt;different weights&lt;/strong&gt;, which result in &lt;strong&gt;different embedding values for each token&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Because we have just finished encoding the English sentence &lt;strong&gt;“Let’s go,”&lt;/strong&gt; the decoder starts with the &lt;strong&gt;embedding values for the EOS token&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbz4djm02wc1tvhm2e2mo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbz4djm02wc1tvhm2e2mo.png" alt=" " width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The decoder then performs computations using &lt;strong&gt;two layers of LSTMs&lt;/strong&gt;, each with &lt;strong&gt;two LSTM cells&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;output values&lt;/strong&gt; (the &lt;strong&gt;short-term memories&lt;/strong&gt;, or &lt;strong&gt;hidden states&lt;/strong&gt;) from the &lt;strong&gt;top layer of LSTM cells&lt;/strong&gt; are then transformed using additional &lt;strong&gt;weights and biases&lt;/strong&gt; in what is called a &lt;strong&gt;fully connected layer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We will explore this further in the &lt;strong&gt;next article&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Looking for an easier way to install tools, libraries, or entire repositories?&lt;/strong&gt;&lt;br&gt;
Try &lt;strong&gt;Installerpedia&lt;/strong&gt;: a &lt;strong&gt;community-driven, structured installation platform&lt;/strong&gt; that lets you install almost anything with &lt;strong&gt;minimal hassle&lt;/strong&gt; and &lt;strong&gt;clear, reliable guidance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ipm &lt;span class="nb"&gt;install &lt;/span&gt;repo-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;… and you’re done! 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hexmos.com/freedevtools/installerpedia" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2s3mzj8pfcq94a1y4at.png" alt="Installerpedia Screenshot" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://hexmos.com/freedevtools/installerpedia/" rel="noopener noreferrer"&gt;&lt;strong&gt;Explore Installerpedia here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Seq2Seq Neural Networks – Part 5: Decoding the Context Vector</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Thu, 19 Mar 2026 03:12:22 +0000</pubDate>
      <link>https://dev.to/rijultp/understanding-seq2seq-neural-networks-part-5-decoding-the-context-vector-358l</link>
      <guid>https://dev.to/rijultp/understanding-seq2seq-neural-networks-part-5-decoding-the-context-vector-358l</guid>
      <description>&lt;p&gt;In the previous article, we stopped at the concept of the &lt;strong&gt;context vector&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this article, we will start by &lt;strong&gt;decoding the context vector&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Connecting the Decoder&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The first thing we need to do is &lt;strong&gt;connect the long-term and short-term memories&lt;/strong&gt; (the &lt;strong&gt;cell states and hidden states&lt;/strong&gt;) that form the &lt;strong&gt;context vector&lt;/strong&gt; to a new set of &lt;strong&gt;LSTMs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just like the &lt;strong&gt;encoder&lt;/strong&gt;, the decoder will also have &lt;strong&gt;two layers&lt;/strong&gt;, and each layer will have &lt;strong&gt;two LSTM cells&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;LSTMs in the decoder&lt;/strong&gt; are different from the ones in the encoder and have their own &lt;strong&gt;separate weights and biases&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Using the Context Vector&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;context vector&lt;/strong&gt; is used to &lt;strong&gt;initialize the long-term and short-term memories&lt;/strong&gt; (the &lt;strong&gt;cell states and hidden states&lt;/strong&gt;) in the LSTMs of the &lt;strong&gt;decoder&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is important because it allows the decoder to &lt;strong&gt;start with the information learned from the input sentence&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Goal of the Decoder&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The ultimate goal of the &lt;strong&gt;decoder&lt;/strong&gt; is to &lt;strong&gt;convert the context vector into the output sentence&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In simple terms, the &lt;strong&gt;encoder understands the input&lt;/strong&gt;, and the &lt;strong&gt;decoder generates the output&lt;/strong&gt; based on that understanding.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Decoder Inputs&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Just like in the encoder, the input to the &lt;strong&gt;LSTM cells in the first layer&lt;/strong&gt; comes from an &lt;strong&gt;embedding layer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;However, in this case, the embedding layer creates &lt;strong&gt;embedding values for Spanish words&lt;/strong&gt;, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ir&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;vamos&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;y&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;EOS&lt;/strong&gt; (End of Sentence symbol)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these words is treated as a &lt;strong&gt;token&lt;/strong&gt;, and the embedding layer converts them into &lt;strong&gt;numbers&lt;/strong&gt; that the neural network can process.&lt;/p&gt;
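&lt;p&gt;Conceptually, the embedding layer is just a lookup table from tokens to learned vectors (the two-dimensional values below are made up for illustration; real embeddings are learned and much larger):&lt;/p&gt;

```python
# Hypothetical learned 2-dimensional embeddings for the Spanish tokens
embeddings = {
    "ir":    [0.3, -0.8],
    "vamos": [0.9,  0.4],
    "y":     [-0.5, 0.1],
    "EOS":   [0.0,  0.0],
}

def embed(tokens):
    # The embedding layer maps each token to its vector of numbers
    return [embeddings[t] for t in tokens]

first_input = embed(["EOS"])  # the decoder's first input after encoding
```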

&lt;p&gt;We will explore the &lt;strong&gt;details of how the decoder generates the output sentence&lt;/strong&gt; in the next article.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Looking for an easier way to install tools, libraries, or entire repositories?&lt;/strong&gt;&lt;br&gt;
Try &lt;strong&gt;Installerpedia&lt;/strong&gt;: a &lt;strong&gt;community-driven, structured installation platform&lt;/strong&gt; that lets you install almost anything with &lt;strong&gt;minimal hassle&lt;/strong&gt; and &lt;strong&gt;clear, reliable guidance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ipm &lt;span class="nb"&gt;install &lt;/span&gt;repo-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;… and you’re done! 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hexmos.com/freedevtools/installerpedia" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2s3mzj8pfcq94a1y4at.png" alt="Installerpedia Screenshot" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://hexmos.com/freedevtools/installerpedia/" rel="noopener noreferrer"&gt;&lt;strong&gt;Explore Installerpedia here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
