<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ravi Suresh Mashru</title>
    <description>The latest articles on DEV Community by Ravi Suresh Mashru (@ravimashru).</description>
    <link>https://dev.to/ravimashru</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F496328%2F6126fe81-03d5-4238-8089-56c1e77929b8.png</url>
      <title>DEV Community: Ravi Suresh Mashru</title>
      <link>https://dev.to/ravimashru</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ravimashru"/>
    <language>en</language>
    <item>
      <title>A Brief Introduction to Dynamic Programming</title>
      <dc:creator>Ravi Suresh Mashru</dc:creator>
      <pubDate>Thu, 01 Jul 2021 04:29:16 +0000</pubDate>
      <link>https://dev.to/ravimashru/a-brief-introduction-to-dynamic-programming-563i</link>
      <guid>https://dev.to/ravimashru/a-brief-introduction-to-dynamic-programming-563i</guid>
      <description>&lt;p&gt;Dynamic programming is a technique that can be used to solve a particular class of problems. Let's take a look at how to determine if you can use dynamic programming for a given problem, and the different approaches (top-down and bottom-up) that you can use.&lt;/p&gt;

&lt;h2&gt;
  
  
  When can you use dynamic programming?
&lt;/h2&gt;

&lt;p&gt;To use dynamic programming, you need to be able to break down the problem into smaller subproblems. If you are sure you can do that, then you need to check if the problem has the following properties: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Overlapping subproblems &lt;/li&gt;
&lt;li&gt;Optimal substructure &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If a problem can be broken down into smaller subproblems and has these two properties, then you can apply dynamic programming to solve the problem and &lt;strong&gt;success is guaranteed&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;Let's dive deeper into what each of these mean while trying to solve &lt;a href="https://leetcode.com/problems/climbing-stairs/" rel="noopener noreferrer"&gt;problem #70&lt;/a&gt; - Climbing Stairs on Leetcode. &lt;/p&gt;

&lt;p&gt;Here is what the problem statement says: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You are climbing a staircase. It takes &lt;code&gt;n&lt;/code&gt; steps to reach the top. &lt;/p&gt;

&lt;p&gt;Each time you can either climb 1 or 2 steps. In how many distinct ways can you climb to the top?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Break down the problem
&lt;/h3&gt;

&lt;p&gt;Let us first see if we can break down the problem into smaller subproblems. Let us say that the answer we want - the number of distinct ways we can climb a staircase with &lt;code&gt;n&lt;/code&gt; steps, can be expressed as &lt;code&gt;countDistinctPaths(n)&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Now, we know that we can take either one or two steps at a time. If we take one step, we need to follow the same rules to climb the remaining &lt;code&gt;n-1&lt;/code&gt; steps. Similarly, if we climb two steps, we again need to follow the same rules to climb the remaining &lt;code&gt;n-2&lt;/code&gt; steps. &lt;/p&gt;

&lt;p&gt;So, the total number of ways we can climb the staircase is either by taking one step and then taking one of the &lt;code&gt;countDistinctPaths(n-1)&lt;/code&gt; paths for the remaining &lt;code&gt;n-1&lt;/code&gt; steps, or by taking two steps and then taking one of the &lt;code&gt;countDistinctPaths(n-2)&lt;/code&gt; paths for the remaining &lt;code&gt;n-2&lt;/code&gt; steps. &lt;/p&gt;

&lt;p&gt;We can write the total number of paths we can take as follows:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;countDistinctPaths(n) = countDistinctPaths(n-1) + countDistinctPaths(n-2)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;As you can see, we've managed to break the problem down into smaller subproblems! If we have the answer for &lt;code&gt;n-1&lt;/code&gt; and &lt;code&gt;n-2&lt;/code&gt;, then we can combine those answers to calculate the answer for &lt;code&gt;n&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;There's one more thing we need to think about though - how small can we keep making the problems? &lt;/p&gt;

&lt;p&gt;Well, the smallest staircase we can have is one with a single step. And there is only one way we can climb that step (since we can't take two steps in this case - because there's only a single step!) Also, if there is no staircase at all, there is no way we can climb it! &lt;/p&gt;

&lt;p&gt;So we can rewrite the problem as: &lt;/p&gt;

&lt;p&gt;&lt;code&gt;countDistinctPaths(0) = 0&lt;/code&gt; (no staircase, so no way to climb it!) &lt;/p&gt;

&lt;p&gt;&lt;code&gt;countDistinctPaths(1) = 1&lt;/code&gt; (only one step, only one way to climb it) &lt;/p&gt;

&lt;p&gt;For any value of n greater than 1, &lt;code&gt;countDistinctPaths(n) = countDistinctPaths(n-1) + countDistinctPaths(n-2)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This kind of expression is also commonly known as a &lt;strong&gt;recurrence relation&lt;/strong&gt;. &lt;/p&gt;

&lt;h3&gt;
  
  
  Overlapping subproblems
&lt;/h3&gt;

&lt;p&gt;To check if there are overlapping subproblems, let us try to think about how many times we will call &lt;code&gt;countDistinctPaths&lt;/code&gt; if we want the answer of say a staircase with 5 steps.&lt;/p&gt;

&lt;p&gt;We know that the answer we're looking for is &lt;code&gt;countDistinctPaths(5)&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;From our recurrence relation, we know that &lt;code&gt;countDistinctPaths(5) = countDistinctPaths(4) + countDistinctPaths(3)&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;We can then use the recurrence relation to further break down &lt;code&gt;countDistinctPaths(4)&lt;/code&gt; into &lt;code&gt;countDistinctPaths(4) = countDistinctPaths(3) + countDistinctPaths(2)&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Now if we put this back in the first expression, we get &lt;code&gt;countDistinctPaths(5) = (countDistinctPaths(3) + countDistinctPaths(2)) + countDistinctPaths(3)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Similarly, we can replace &lt;code&gt;countDistinctPaths(3)&lt;/code&gt; with &lt;code&gt;countDistinctPaths(2) + countDistinctPaths(1)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The result is then &lt;code&gt;countDistinctPaths(5) = (countDistinctPaths(3) + countDistinctPaths(2)) + (countDistinctPaths(2) + countDistinctPaths(1))&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you look closely, you'll notice that we're computing &lt;code&gt;countDistinctPaths(3)&lt;/code&gt; and &lt;code&gt;countDistinctPaths(2)&lt;/code&gt; multiple times. &lt;/p&gt;

&lt;p&gt;It's easier to see all the computations we have to do in the form of a tree: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6sskmtbrxkr4u3p0mzz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6sskmtbrxkr4u3p0mzz.png" alt="Call tree of a recursive function solving a problem with overlapping subproblems"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each node in this tree is a call to &lt;code&gt;countDistinctPaths&lt;/code&gt; and the value in the node is the parameter we are passing to the function. As you can see, we're repeating the calls for 2 and 3. This tells us that the problem we are trying to solve has overlapping subproblems. &lt;/p&gt;

&lt;h3&gt;
  
  
  Optimal substructure
&lt;/h3&gt;

&lt;p&gt;If you can find the optimal solution to a problem using optimal solutions to its subproblems, then the problem is said to have an optimal substructure. &lt;/p&gt;

&lt;p&gt;What this means for us is that to find the optimal solution for &lt;code&gt;countDistinctPaths(n)&lt;/code&gt;, we need the optimal solution for &lt;code&gt;countDistinctPaths(n-1)&lt;/code&gt; and &lt;code&gt;countDistinctPaths(n-2)&lt;/code&gt;. In this particular context, "optimal" for us means the maximum number of paths. Therefore, this problem has an optimal substructure. &lt;/p&gt;

&lt;p&gt;I personally had a tough time understanding this property and found looking at examples of problems that DON'T have optimal substructure helped understand this better. You can find a list of such problems on the &lt;a href="https://en.wikipedia.org/wiki/Optimal_substructure#Problems_without_optimal_substructure" rel="noopener noreferrer"&gt;optimal substructure page on Wikipedia&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How can you apply dynamic programming to a problem?
&lt;/h2&gt;

&lt;p&gt;Once you have verified that a problem can be solved using dynamic programming, there are two approaches you can use to solve it: the &lt;strong&gt;top-down&lt;/strong&gt; approach or the &lt;strong&gt;bottom-up approach&lt;/strong&gt;. Let's take a closer look at each of these. &lt;/p&gt;

&lt;h3&gt;
  
  
  The top-down approach
&lt;/h3&gt;

&lt;p&gt;The top-down approach involves converting the recurrence relation we wrote to recursive function calls and then making some minor tweaks to prevent the repeated evaluation of the overlapping subproblems. &lt;/p&gt;

&lt;p&gt;Let's start with a recursive implementation of our recurrence relation in JavaScript:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;countDistinctPaths&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 

  &lt;span class="c1"&gt;// If there are no stairs &lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
  &lt;span class="p"&gt;}&lt;/span&gt; 

  &lt;span class="c1"&gt;// If there is only one stair &lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
  &lt;span class="p"&gt;}&lt;/span&gt; 

  &lt;span class="c1"&gt;// The recurrence relation &lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;countDistinctPaths&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;countDistinctPaths&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; 

&lt;span class="p"&gt;}&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As we saw in the tree above, this plain recursive implementation makes repeated calculations for the overlapping subproblems. These repeated calls mean we are spending time calculating the same values over and over again. And the number of repeated calls is much higher for larger values of &lt;code&gt;n&lt;/code&gt; so we're wasting a lot of time!&lt;/p&gt;

&lt;p&gt;With dynamic programming, we can compute the value of each subproblem just once and store it in memory - a technique called "memoization" (nope, not a typo, it's not memorization. See &lt;a href="https://en.wikipedia.org/wiki/Memoization" rel="noopener noreferrer"&gt;this Wikipedia page&lt;/a&gt; for how this term was coined). The next time we encounter the same subproblem and need to calculate the value, instead of using the recurrence relation, we can just retrieve the result from memory!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// We've added "memo"ry to store values we compute. It is empty in the beginning. &lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;countDistinctPaths&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;memo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 

  &lt;span class="c1"&gt;// If there are no stairs &lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
  &lt;span class="p"&gt;}&lt;/span&gt; 

  &lt;span class="c1"&gt;// If there is only one stair &lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
  &lt;span class="p"&gt;}&lt;/span&gt; 

  &lt;span class="c1"&gt;// Check if we have already computed this before &lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; 
  &lt;span class="p"&gt;}&lt;/span&gt; 

  &lt;span class="c1"&gt;// The recurrence relation with two minor tweaks: &lt;/span&gt;
  &lt;span class="c1"&gt;// 1. We store the computed value in memo[n] for future use &lt;/span&gt;
  &lt;span class="c1"&gt;// 2. We pass the memory object to the recursive calls &lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;countDistinctPaths&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;countDistinctPaths&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; 

&lt;span class="p"&gt;}&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach is called top-down because we start with the biggest problem first, &lt;code&gt;countDistinctPaths(n)&lt;/code&gt; and keep recursively breaking it down into smaller problems until we reach the smallest possible subproblems - &lt;code&gt;countDistinctPaths(0)&lt;/code&gt; and &lt;code&gt;countDistinctPaths(1)&lt;/code&gt;. &lt;/p&gt;

&lt;h3&gt;
  
  
  The bottom-up approach
&lt;/h3&gt;

&lt;p&gt;With the bottom up approach, we start from the smallest subproblems and iteratively combine them until we find a solution to the original problem. &lt;/p&gt;

&lt;p&gt;For us, this means we start with &lt;code&gt;countDistinctPaths(0)&lt;/code&gt; and &lt;code&gt;countDistinctPaths(1)&lt;/code&gt;to which we know the answer, and then combine them to get the answer to &lt;code&gt;countDistinctPaths(2)&lt;/code&gt; and then &lt;code&gt;countDistinctPaths(3)&lt;/code&gt; and so on until we find the answer to &lt;code&gt;countDistinctPaths(n)&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;We need a way to store the value of &lt;code&gt;countDistinctPaths(n)&lt;/code&gt; for each value of &lt;code&gt;n&lt;/code&gt;. This storage is commonly known as a "table" and this bottom-up approach is also commonly called "tabulation". &lt;/p&gt;

&lt;p&gt;As you can see, there is no recursion involved here. Just good old iteration!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;countDistinctPaths&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 

  &lt;span class="c1"&gt;// Our "table" to store the result for each value of n&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; 

  &lt;span class="c1"&gt;// The case with no stairs &lt;/span&gt;
  &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 

  &lt;span class="c1"&gt;// The case with one stair &lt;/span&gt;
  &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 

  &lt;span class="c1"&gt;// We keep combining subproblems until we find a solution to the original problem &lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
    &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; 
  &lt;span class="p"&gt;}&lt;/span&gt; 

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; 
&lt;span class="p"&gt;}&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that's it! Once the loop finishes, we'll have the result we need at index &lt;code&gt;n&lt;/code&gt; of the table. &lt;/p&gt;

&lt;h2&gt;
  
  
  How can you learn more about dynamic programming?
&lt;/h2&gt;

&lt;p&gt;Like the title of this post says, this was just a brief introduction dynamic programming. The example I chose to work with was quite simple. There will be more complicated questions in which either the recurrence relation is not very obvious, or you need a two or even three dimensional table. The only way to get used to identifying if you can use dynamic programming to solve a particular problem and if so, how you can break down the problem and identify the key components is by practice. &lt;/p&gt;

&lt;p&gt;To practice problems, I highly recommend LeetCode. Here's a &lt;a href="https://leetcode.com/tag/dynamic-programming/" rel="noopener noreferrer"&gt;list of all the dynamic programming problems on LeetCode&lt;/a&gt;. LeetCode has great quality questions for every topic. But my personal favorite feature on LeetCode is the &lt;strong&gt;Discussions&lt;/strong&gt; tab in each problem. After you've solved the problem you can see and discuss how others are solving the same problem. Learning from other, more experienced programmers has always worked great for me. &lt;/p&gt;

&lt;p&gt;So go ahead and start solving problems! Even better, make a challenge out of it! I'm currently doing a "100 Days of Code" challenge (which I log in &lt;a href="https://github.com/ravimashru/100-days-of-code" rel="noopener noreferrer"&gt;this repo&lt;/a&gt; on GitHub) where I try to spend at least an hour every day solving problems on LeetCode or other similar sites. &lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>100daysofcode</category>
      <category>beginners</category>
      <category>computerscience</category>
    </item>
    <item>
      <title>Understanding the object-fit CSS property</title>
      <dc:creator>Ravi Suresh Mashru</dc:creator>
      <pubDate>Sat, 26 Jun 2021 11:14:52 +0000</pubDate>
      <link>https://dev.to/ravimashru/understanding-the-object-fit-css-property-mek</link>
      <guid>https://dev.to/ravimashru/understanding-the-object-fit-css-property-mek</guid>
      <description>&lt;p&gt;The &lt;code&gt;object-fit&lt;/code&gt; property determines how the content of a &lt;strong&gt;replaced element&lt;/strong&gt; is resized to fit inside its container on a web page.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a replaced element?
&lt;/h2&gt;

&lt;p&gt;A replaced element is an element whose contents are not affected by CSS in the current document.&lt;/p&gt;

&lt;p&gt;An example of a replaced element is the &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag. We can specify the position of the image, but we can't actually influence the contents of the image displayed inside the &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag using CSS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using object-fit with &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Let's look at a few examples of how we can use the &lt;code&gt;object-fit&lt;/code&gt; property to influence how an image is resized to fit inside its container element (i.e. the &lt;code&gt;img&lt;/code&gt; tag).&lt;/p&gt;

&lt;p&gt;The behaviour we see is applicable to all other replaced elements as well. Some other examples of replaced elements apart from &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; are: &lt;code&gt;&amp;lt;embed&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;iframe&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;video&amp;gt;&lt;/code&gt;. &lt;a href="https://developer.mozilla.org/en-US/docs/Web/CSS/Replaced_element" rel="noopener noreferrer"&gt;This page&lt;/a&gt; on MDN has more information about replaced elements.&lt;/p&gt;

&lt;p&gt;Consider the following two images of dimensions 100x150 and 250x300 respectively:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foahwzc7kxzygmf9supcr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foahwzc7kxzygmf9supcr.png" alt="img1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will place both images in &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tags with a width and height of &lt;code&gt;200px&lt;/code&gt; and see how the various &lt;code&gt;object-fit&lt;/code&gt; property values affect how the images are resized. I have given the &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag a gray border so that its edges are easy to identify.&lt;/p&gt;

&lt;p&gt;Note: the &lt;code&gt;object-fit&lt;/code&gt; CSS property is used on the &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;object-fit: fill&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is the default value of the &lt;code&gt;object-fit&lt;/code&gt; property. With this value, the image inside an &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag will be resized to the size of the container (i.e. the &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuprhm2n9swc824qvwa23.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuprhm2n9swc824qvwa23.png" alt="fill"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, if the aspect ratio of the container and the &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag aren't the same, the image will be stretched.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;object-fit: contain&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;With this value, the image will be resized with its aspect ratio maintained so that the entire image fits within the container.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcirl7tq0sprhzftao5xl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcirl7tq0sprhzftao5xl.png" alt="contain"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;As you can see, if the aspect ratio of the image is different from that of the &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag, then there will be a portion of the container that doesn't contain the image.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;object-fit: cover&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The image is resized with its aspect ratio maintained for this value as well. However, the image will be resized so that it doesn't leave any empty space in the container like with &lt;code&gt;object-fit: contain&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xe3j2obnq73azvqwpck.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xe3j2obnq73azvqwpck.png" alt="cover"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the aspect ratio of the image is different from that of the container, then there will be parts of the image that are clipped off.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;object-fit: none&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;With this value, the image is not resized at all and maintains its original dimensions. If the image is smaller than the container, the entire image is displayed with its original size. If the image is bigger than the container, it is clipped off.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcm0henhs76bfld6h3266.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcm0henhs76bfld6h3266.png" alt="none"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;object-fit: scale-down&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This value behaves as if the &lt;code&gt;object-fit&lt;/code&gt; property has either the value &lt;code&gt;none&lt;/code&gt; or the value &lt;code&gt;contain&lt;/code&gt; depending on which one results in a smaller image.&lt;/p&gt;

&lt;p&gt;This means that images smaller than the container size remain the same size (like with &lt;code&gt;object-fit: none&lt;/code&gt;) and images that are bigger than the container size are resized to fit the container (like with &lt;code&gt;object-fit: contain&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmk171gdz8h9ih75rfhr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmk171gdz8h9ih75rfhr.png" alt="scale-down"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>css</category>
      <category>webdev</category>
      <category>html</category>
    </item>
    <item>
      <title>An Introduction to Hash Tables</title>
      <dc:creator>Ravi Suresh Mashru</dc:creator>
      <pubDate>Fri, 04 Jun 2021 06:03:59 +0000</pubDate>
      <link>https://dev.to/ravimashru/an-introduction-to-hash-tables-50p7</link>
      <guid>https://dev.to/ravimashru/an-introduction-to-hash-tables-50p7</guid>
      <description>&lt;p&gt;Before diving into hash tables, let us quickly review two other data structures: &lt;strong&gt;arrays&lt;/strong&gt; and &lt;strong&gt;linked lists&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Arrays
&lt;/h2&gt;

&lt;p&gt;An array is used to store elements of the same type in contiguous memory locations. Since contiguous memory locations need to be reserved, you need to specify the size of the array upfront (when creating the array). Any element in the array can then be accessed in constant time since we know the exact memory location of each element in the array.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hmlp7tyv0181iz6jazc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hmlp7tyv0181iz6jazc.png" alt="Creating an array and accessing the element at index 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since contiguous memory locations are used, we know exactly where to find the element at index 2 given the memory address at which the array starts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Linked Lists
&lt;/h2&gt;

&lt;p&gt;A linked list is also used to store a collection of elements. However, each element in the list is stored at a different location in memory. Each element in the list has a link pointing to the next element in the list.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftomhvxicomfs2h4bmgbw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftomhvxicomfs2h4bmgbw.png" alt="A linked list with four items"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Starting from the first element in the list (also known as the &lt;strong&gt;head&lt;/strong&gt; of the list), we can follow this chain of links pointing to the next element in the list to sequentially access every item in the list. The last item in the list doesn't have a link pointing to the next element. That's how we know that we have reached the end of the list.&lt;/p&gt;

&lt;p&gt;Since the items in a linked list can be stored anywhere in memory, we do not need to specify upfront what the size of the list should be. The linked list grows and shrinks dynamically as elements are added to and removed from it. However, we lose the ability to directly access elements at a given index in the list. Instead, we have to start at the head of the list and follow the links to the next item until we arrive at the index we want.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hash Tables - the middle ground
&lt;/h2&gt;

&lt;p&gt;Arrays give us the advantage of quick random access, but require us to define a size in advance. On the other hand, linked lists allow us to add and remove items without the need to specify a size. However, they take away our ability to access an item at a given index in constant time.&lt;/p&gt;

&lt;p&gt;Hash tables give us the best of both worlds. They allow us to add, look up and remove elements in constant time&lt;sup id="fnref1"&gt;1&lt;/sup&gt;, and they also allow us to add and remove elements without having to specify a list size upfront.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hash Function
&lt;/h2&gt;

&lt;p&gt;When storing an element in a hash table, it is first passed through a hash function. The output of the hash function is the location in the hash table where the element should be stored.&lt;/p&gt;

&lt;p&gt;Since we need to look up the element from the table later, we need to ensure that when identical elements are passed through the hash function, it produces the same value.&lt;/p&gt;

&lt;p&gt;However, it is possible that when different elements are passed through the hash function, it produces the same value. When this happens, we call it a collision.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyo572fj9w069g4lnm1yc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyo572fj9w069g4lnm1yc.png" alt="Two elements being mapped to the same location in a hash table resulting in a collision"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A good hash function minimizes the chance of collision, but collisions cannot be eliminated entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling Collisions
&lt;/h2&gt;

&lt;p&gt;There are two main ways of dealing with collisions: &lt;strong&gt;linear probing&lt;/strong&gt; and &lt;strong&gt;chaining&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Linear Probing
&lt;/h3&gt;

&lt;p&gt;When using linear probing, when there is a collision, we just keep looking at the next available location in the table until we find a free slot (or realize that the table is full and we can't store any more elements&lt;sup id="fnref2"&gt;2&lt;/sup&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgaa50eh3o9zu3qf1qd58.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgaa50eh3o9zu3qf1qd58.png" alt="Solving a hash table collision by linear probing"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When using this approach, we need to adjust the way we look up values from the hash table as well. If we hash an element and don't find it at the index specified by the hash function, then there is still a possibility that it was stored at the next available free slot in the table. We therefore have to look other locations in the table, one at a time, starting from the index specified by the hash function to ensure that the element really isn't stored in the hash table. That sounds very similar to traversing all elements in a linked list!&lt;sup id="fnref3"&gt;3&lt;/sup&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Chaining
&lt;/h3&gt;

&lt;p&gt;The other way we can deal with collisions is by storing all elements that get mapped to the same index in a linked list within the hash table.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flm4qk4pu3ofl296b6r1q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flm4qk4pu3ofl296b6r1q.png" alt="A hash table with elements having the same hash value chained in a linked list"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We are now not restricted by the size of the hash table when storing elements, since the elements themselves are stored in a linked list outside the hash table. The hash table just contains links to the head of the various linked lists.&lt;/p&gt;

&lt;p&gt;However, if we have too many collisions and a lot of elements get mapped to the same index then the linked list for that index grows. We then have to traverse the entire linked list to find the element we are looking for. This is why it is important that we use a hash function that minimizes collisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing a Good Hash Function
&lt;/h2&gt;

&lt;p&gt;A good hash function minimizes the number of collisions by trying to uniformly distribute values over the entire hash table. It also uses all the information available in the element to calculate the hash value, and tries to map similar elements to different locations in a hash table. Finally, a hash function should be very fast to compute since we use it every time we insert into or look up a value in a hash table.&lt;/p&gt;

&lt;h2&gt;
  
  
  Complexity Analysis
&lt;/h2&gt;

&lt;p&gt;Assuming there are no collisions, it takes constant time to insert into and look up values from a hash table. The actual time taken for these operations depends on how fast the hash function is. However, when there are collisions, the hash table performs slightly worse.&lt;/p&gt;

&lt;p&gt;When using chaining, it still takes a constant time to add a new element to the hash table since adding a new element at the beginning of a linked list can be done in constant time. When using linear probing, however, we may take up to &lt;code&gt;O(n)&lt;/code&gt; time (where &lt;code&gt;n&lt;/code&gt; is the size of the hash table) in the worst case where the hash table is full and we probe the entire table looking for a free slot.&lt;/p&gt;

&lt;p&gt;When looking up values in a hash table during a collision, we take &lt;code&gt;O(n)&lt;/code&gt; time (where &lt;code&gt;n&lt;/code&gt; is the size of the hash table) since in the worst case, all elements are mapped to the same index and therefore we have to traverse the entire list of all elements to find the one we are looking for. This applies to deleting elements from a linked list as well, since to delete an element we first need to find it.&lt;/p&gt;

&lt;p&gt;We can use a balanced search tree instead of a linked list while chaining to reduce the worst case time complexity to &lt;code&gt;O(log n)&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Although hash tables provide a good balance between an array and a linked list, they are by no means a silver bullet.&lt;/p&gt;

&lt;p&gt;Since their worst case performance is &lt;code&gt;O(n)&lt;/code&gt;, if you know how many elements you need to store upfront, you would be better off using an array which guarantees adding and updating elements in constant time.&lt;/p&gt;

&lt;p&gt;Also, if you care about the order of items in the list and don't know how many items you need to store upfront, then you need to use a linked list since a hash table does not store this information.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://techdevguide.withgoogle.com/paths/data-structures-and-algorithms" rel="noopener noreferrer"&gt;Google Tech Dev Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=h2d9b_nEzoA&amp;amp;ab_channel=CS50" rel="noopener noreferrer"&gt;Hash Tables - CS50 - YouTube&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://leetcode.com/explore/learn/card/hash-table/" rel="noopener noreferrer"&gt;Introduction to Data Structure - Hash Table - LeetCode&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;The operations take constant time on average, and perform worse in the worst case. See the complexity analysis section for more details. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;When this happens, we can tell the user that the hash table is full, but then we wouldn't be doing any better than an array. Instead, we can dynamically increase the size of the hash table once it is full and redistribute the existing items and make space for new items. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;The worst case time complexity using this approach is in fact O(n) - the same as that of finding a value in a linked list. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>beginners</category>
      <category>100daysofcode</category>
      <category>programming</category>
    </item>
    <item>
      <title>Git: Merge vs Rebase</title>
      <dc:creator>Ravi Suresh Mashru</dc:creator>
      <pubDate>Thu, 11 Mar 2021 12:23:26 +0000</pubDate>
      <link>https://dev.to/ravimashru/git-merge-vs-rebase-59mf</link>
      <guid>https://dev.to/ravimashru/git-merge-vs-rebase-59mf</guid>
      <description>&lt;p&gt;I personally struggled to understand the difference between merging and rebasing in Git, especially since they both did what I wanted in the end (bring changes in another branch into my current branch), until I visualized what exactly happens in both processes.&lt;/p&gt;

&lt;p&gt;To compare the two, let us assume that the &lt;code&gt;main&lt;/code&gt; branch of your repository is where all the code is kept up-to-date.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--S_RmO9B0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bsegnlkfmrpqdqcgjqbs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--S_RmO9B0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bsegnlkfmrpqdqcgjqbs.png" alt="commit history of main branch"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Each circle in the sequence represents a commit. An arrow goes from a parent commit to a child commit (from a previous commit to the next commit).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To work on a new feature, you check out a new branch &lt;code&gt;cool-feature&lt;/code&gt; and make a few commits.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--evhQzdv9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/i1w8g1yal49wf8ov0b8u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--evhQzdv9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/i1w8g1yal49wf8ov0b8u.png" alt="commit history of main branch with feature branch checked out with new commits"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the meantime, some of your team members finished a feature they were working on and merged their commits to the &lt;code&gt;main&lt;/code&gt; branch.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--c1YBJqH7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yuq70w51ndeyamg2r9dy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--c1YBJqH7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yuq70w51ndeyamg2r9dy.png" alt="commit history of diverged main and feature branch"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You now need to bring the new changes in the &lt;code&gt;main&lt;/code&gt; branch into your &lt;code&gt;cool-feature&lt;/code&gt; branch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Git Merge
&lt;/h2&gt;

&lt;p&gt;If you want to &lt;strong&gt;merge&lt;/strong&gt; the changes into your branch, you would run the command &lt;code&gt;git merge main&lt;/code&gt; while still on your &lt;code&gt;cool-feature&lt;/code&gt; branch.&lt;/p&gt;

&lt;p&gt;This will replay the changes contained in commits on the &lt;code&gt;main&lt;/code&gt; branch &lt;strong&gt;since the first common commit between the two branches&lt;/strong&gt;. A new &lt;strong&gt;merge commit&lt;/strong&gt; will be created in your &lt;code&gt;cool-feature&lt;/code&gt; branch that contains all the latest changes from the &lt;code&gt;main&lt;/code&gt; branch.&lt;/p&gt;

&lt;p&gt;If both branches have changes to the same lines, then Git cannot determine which of the two changes to keep (the one in the &lt;code&gt;main&lt;/code&gt; branch or the one in the &lt;code&gt;cool-feature&lt;/code&gt; branch). This results in what Git calls a &lt;strong&gt;conflict&lt;/strong&gt; that the user has to resolve. After making necessary changes in the file that has a conflict (i.e. deciding whether to keep the changes from the &lt;code&gt;main&lt;/code&gt; branch, the &lt;code&gt;cool-feature&lt;/code&gt; branch or a mix of both) the user can commit these changes as a part of the &lt;strong&gt;merge commit&lt;/strong&gt; created.&lt;/p&gt;

&lt;p&gt;The merge commit has two parent commits: the latest commits on both the branches.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KilWEJVH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hort65si10psqqtwav68.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KilWEJVH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hort65si10psqqtwav68.png" alt="commit history after merging main branch with feature branch"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The algorithm Git uses to merge changes this way is called the &lt;strong&gt;3-way merge&lt;/strong&gt; algorithm because it uses three commits to generate the merge commit:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The latest commit on the &lt;code&gt;main&lt;/code&gt; branch.&lt;/li&gt;
&lt;li&gt;The latest commit on the &lt;code&gt;cool-feature&lt;/code&gt; branch.&lt;/li&gt;
&lt;li&gt;The latest commit that is common in both branches.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Git Rebase
&lt;/h2&gt;

&lt;p&gt;To perform a rebase, you would run the command &lt;code&gt;git rebase main&lt;/code&gt; while on your &lt;code&gt;cool-feature&lt;/code&gt; branch.&lt;/p&gt;

&lt;p&gt;This will replay each commit in your &lt;code&gt;cool-feature&lt;/code&gt; branch on top of the latest state of the &lt;code&gt;main&lt;/code&gt; branch. For each commit in your &lt;code&gt;cool-feature&lt;/code&gt; branch, a new commit will be created which contains the same changes as the initial commit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---J-TyQKI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jlrpbwrpbmkwf3r32vl9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---J-TyQKI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jlrpbwrpbmkwf3r32vl9.png" alt="commit history after rebasing feature branch on main branch"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In case there are conflicts when applying each commit, Git will stop rebasing and notify you so that you can resolve them. The changes you make to resolve the conflicts will be added to the new commit.&lt;/p&gt;

&lt;p&gt;Git also gives you the option of performing an &lt;strong&gt;&lt;a href="https://git-scm.com/docs/git-rebase#_interactive_mode"&gt;interactive rebase&lt;/a&gt;&lt;/strong&gt; where you can decide what to do with each commit that will be reapplied to the &lt;code&gt;main&lt;/code&gt; branch. &lt;/p&gt;

&lt;h2&gt;
  
  
  Which One Should You Use?
&lt;/h2&gt;

&lt;p&gt;The first question I asked myself after understanding the difference between the two was: which one should I use?&lt;/p&gt;

&lt;p&gt;Not surprisingly, like most things in software engineering, the answer turned out to be &lt;strong&gt;it depends&lt;/strong&gt;. Merging and rebasing give you two completely different views of your repository's commit history.&lt;/p&gt;

&lt;p&gt;The downside of using &lt;strong&gt;merge&lt;/strong&gt; is the creation of merge commits that can clutter the commit history of your repository. In a project where a lot of changes are actively being added to the main branch, your feature branch will have a new merge commit in it every time you merge it with the main branch. This can make it difficult to clearly see the actual commits you are making to implement the feature.&lt;/p&gt;

&lt;p&gt;On the other hand, using &lt;strong&gt;rebase&lt;/strong&gt; is not as straight forward as using &lt;strong&gt;merge&lt;/strong&gt;. Since rebasing re-writes history (by dropping old commits and creating new ones), rebasing a branch that you have previously pushed (e.g. to GitHub) and others have started using can cause problems. You will now have a different commit history after the rebase from what others accessing the same branch have. If they do not pull this new rewritten history, they will be working on an outdated copy of the branch. This might become a problem when they try to push their changes in a branch with an outdated history.&lt;/p&gt;

&lt;p&gt;Personally, if I am working on a feature branch that I have not yet pushed upstream (and as a result no one would have checked it out), I use &lt;strong&gt;rebase&lt;/strong&gt; to avoid the extra merge commit. However, if the branch I am working on has already been made public, I avoid rewriting history with a &lt;strong&gt;rebase&lt;/strong&gt; and live with the extra merge commit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.atlassian.com/git/tutorials/using-branches/git-merge"&gt;Git Merge - Atlassian Git Tutorial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.atlassian.com/git/tutorials/rewriting-history/git-rebase"&gt;git rebase - Atlassian Git Tutorial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.atlassian.com/git/tutorials/merging-vs-rebasing"&gt;Merging vs. Rebasing - Atlassian Git Tutorial&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>git</category>
      <category>github</category>
      <category>beginners</category>
    </item>
    <item>
      <title>An Introduction to Character Encodings</title>
      <dc:creator>Ravi Suresh Mashru</dc:creator>
      <pubDate>Wed, 06 Jan 2021 11:52:04 +0000</pubDate>
      <link>https://dev.to/ravimashru/an-introduction-to-character-encodings-4od4</link>
      <guid>https://dev.to/ravimashru/an-introduction-to-character-encodings-4od4</guid>
      <description>&lt;p&gt;In this post, we'll take a look at the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What character encodings are and why we need them.&lt;/li&gt;
&lt;li&gt;A few common character encoding formats (e.g. ASCII, The ISO 8859 Family).&lt;/li&gt;
&lt;li&gt;How Unicode and UTF-8 solved the problem of encoding characters from all the different languages in the world (including emojis! 😃).&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Letter, words and sentences are all human constructs created to communicate. Computers, however, understand only the language of binary - 0s and 1s. By understanding &lt;strong&gt;character encodings&lt;/strong&gt;, we can understand how computers store all the text that we see on our digital devices - tweets, facebook posts, and even this blog post!&lt;/p&gt;

&lt;p&gt;All language can be broken down into a sequence of characters. Different encodings store these characters in different ways. To keep things simple in the beginning, let us assume that we are interested in only letters of the English alphabet (lowercase and uppercase), the 10 digits (0 to 9), and a few special symbols (e.g. +, -, ?, *).&lt;/p&gt;

&lt;h2&gt;
  
  
  ASCII
&lt;/h2&gt;

&lt;p&gt;In ASCII (American Standard Code for Information Exchange), each character is stored as a sequence of 7 bits. Each bit can be either 0 or 1. Therefore, there are 

&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;27=1282^7 = 128&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;2&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;7&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;128&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 possible characters that can be represented using ASCII. This collection of 128 characters is called the &lt;a href="http://www.asciitable.com" rel="noopener noreferrer"&gt;ASCII character set&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Most computers deal with memory in chunks of 8 bits (also known as a &lt;strong&gt;byte&lt;/strong&gt;). In this case, the left-most bit is left unused (kept with a value of 0) and the 7 bits on the right-hand side are used to represent the character.&lt;/p&gt;

&lt;p&gt;For example, the character "A" has a value of 65 (in decimal). This means it would be stored in memory as: 01000001.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjk3813tcap0k6mb20trh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjk3813tcap0k6mb20trh.png" alt="ASCII character A in binary"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since writing in binary can be space and time-consuming, we can use the &lt;a href="https://www.rapidtables.com/convert/number/binary-to-hex.html" rel="noopener noreferrer"&gt;hexadecimal equivalent&lt;/a&gt; instead. So the character "A" can be represented as 41.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fq72vunezdiworxnbgb2p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fq72vunezdiworxnbgb2p.png" alt="ASCII character A in hexadecimal"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exercise 1&lt;/strong&gt;: Decode the following hexadecimal ASCII encoded text: &lt;em&gt;48 65 6C 6C 6F 20 77 6F 72 6C 64 21&lt;/em&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://github.com/ravimashru/blog/blob/main/01-character-encodings/README.md#exercise-1" rel="noopener noreferrer"&gt;(Solution)&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exercise 2&lt;/strong&gt;: Encode the following text using ASCII (in hexadecimal): &lt;em&gt;That's All Folks!&lt;/em&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://github.com/ravimashru/blog/blob/main/01-character-encodings/README.md#exercise-2" rel="noopener noreferrer"&gt;(Solution)&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can find a list of how each character is represented in ASCII &lt;a href="http://www.asciitable.com" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Since ASCII can be used to represent only 128 characters, it isn't enough for all the different characters in various languages.&lt;/p&gt;

&lt;p&gt;An attempt to fix this issue was to make the left-most bit that is unused in ASCII do something. This gave birth to ISO 8859.&lt;/p&gt;

&lt;h2&gt;
  
  
  The ISO 8859 Family
&lt;/h2&gt;

&lt;p&gt;The ISO 8859 family is a series of 10 different standards that are a superset of ASCII.&lt;/p&gt;

&lt;p&gt;In each of these standards, when the left-most bit is 0, the remaining bits represent ASCII characters as usual.&lt;/p&gt;

&lt;p&gt;When the left-most bit is 1, each of the standards use the remaining 7 bits to represent 128 new characters.&lt;/p&gt;

&lt;p&gt;You can find a list of all the characters represented by each of the standards when the left-most bit is 1 &lt;a href="http://czyborra.com/charsets/iso8859.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Although this doubled the number of characters that could be represented, this free-for-all ended up with so many different characters being represented in the extra space created for 128 new characters. And if you saved data using a computer that used one standard and read it in a computer that used another one, you wouldn't be able to make sense of the message since the ASCII characters would be the same, but the extra 128 characters would now be displayed differently.&lt;/p&gt;

&lt;p&gt;Also, a total of 256 characters was still not enough for all the languages in the world that collectively have thousands of letters!&lt;/p&gt;

&lt;h2&gt;
  
  
  Unicode
&lt;/h2&gt;

&lt;p&gt;The solution to the problem of representing thousands of different characters across different languages turned out to be: splitting &lt;strong&gt;character sets&lt;/strong&gt; and &lt;strong&gt;character encodings&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A character set is a mapping of a single character to unique numbers. These numbers are called &lt;strong&gt;code points&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A character encoding is a mapping of these code points to actual bytes that are stored in a computer.&lt;/p&gt;

&lt;p&gt;Unicode is a character set - it is a mapping of over 140,000 characters to a unique number. You can find a complete list of all characters and their corresponding numeric value on the Unicode &lt;a href="https://home.unicode.org" rel="noopener noreferrer"&gt;homepage&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Unicode values are usually represented by "U+" followed by hexadecimal numbers, e.g. U+0033 is the Unicode number for the digit "3".&lt;/p&gt;

&lt;p&gt;The way these decimal numbers are stored as bytes in a computer's memory depends on the character encoding used, e.g. UTF-8, UTF-16 and UTF-32.&lt;/p&gt;

&lt;h2&gt;
  
  
  UTF-8
&lt;/h2&gt;

&lt;p&gt;UTF-8 is a &lt;strong&gt;variable encoding&lt;/strong&gt; format. This means that a fixed number of bytes cannot be used to represent every character like in ASCII. Each character can be between 1 to 4 bytes long.&lt;/p&gt;

&lt;p&gt;All code points between 0 - 127 are the same as ASCII and are also stored in a single byte with the left-most bit set to 0. Therefore, all valid ASCII is valid UTF-8.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exercise 3&lt;/strong&gt;: Encode the following text from exercise 2 using UTF-8 (in hexadecimal): &lt;em&gt;That's All Folks!&lt;/em&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://github.com/ravimashru/blog/blob/main/01-character-encodings/README.md#exercise-3" rel="noopener noreferrer"&gt;(Solution)&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;All code points above 127 require multiple bytes to be encoded. The number of left-most 1s followed by a 0 in the first byte indicates how many bytes are there are in the encoding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fzesy4b8aygaalh15hs9o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fzesy4b8aygaalh15hs9o.png" alt="2-byte character encoding in utf-8"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Similarly, three-byte encodings would have the format &lt;code&gt;1110XXXX 10XXXXXX 10XXXXXX&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;X&lt;/code&gt;s are the positions that can be used to store the actual encoding of the character in binary format. However, not all two-byte encodings that have this format are valid.&lt;/p&gt;

&lt;p&gt;For example, consider the character "A" with the code point 65 (1000001 in binary). It may be tempting to encode it using two bytes as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Flz8ege0vnw4t0mlj4492.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Flz8ege0vnw4t0mlj4492.png" alt="Character A encoded using 2 bytes in UTF-8 format"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is an invalid encoding since all code points between 0 - 127 require a single byte.&lt;/p&gt;

&lt;p&gt;The smallest code point that can be encoded using two bytes is 128. Therefore, &lt;code&gt;11000010 10000000&lt;/code&gt; is the smallest valid UTF-8 two-byte encoding.&lt;/p&gt;

&lt;p&gt;The biggest two-byte encoding is &lt;code&gt;11011111 10111111&lt;/code&gt;, or 2047 (in decimal)/7FF (in hexadecimal).&lt;/p&gt;

&lt;p&gt;The following table summarizes the range of code points that each multi-byte encoding can be used to represent.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Number of bytes&lt;/th&gt;
&lt;th&gt;Smallest code point (decimal/hexadecimal)&lt;/th&gt;
&lt;th&gt;Largest code point (decimal/hexadecimal)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0 / 00&lt;/td&gt;
&lt;td&gt;127 / 7F&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;128 / 80&lt;/td&gt;
&lt;td&gt;2047 / 7FF&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2048 / 800&lt;/td&gt;
&lt;td&gt;65535 / FFFF&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;65536 / 10000&lt;/td&gt;
&lt;td&gt;1114111 / 1FFFFF&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When trying to encode a character using UTF-8, you need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Determine the code point value of the character.&lt;/li&gt;
&lt;li&gt;Use the table above to determine how many bytes are required to encode the character.&lt;/li&gt;
&lt;li&gt;Convert the code point value of the character to binary.&lt;/li&gt;
&lt;li&gt;Place the bits in the binary representation in the right places in the multi-byte encoding.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's say we want to encode the Greek capital letter delta (Δ) using UTF-8. We can use &lt;a href="https://unicode-table.com/en/" rel="noopener noreferrer"&gt;this&lt;/a&gt; site to find the Unicode value of this character - U+0394. This means the code point value of this character is 394 (in hexadecimal).&lt;/p&gt;

&lt;p&gt;Since 394 falls between 80 and 7FF, we need 2 bytes to encode this character. The two bytes will have the format &lt;code&gt;110XXXXX 10XXXXXX&lt;/code&gt;, where the &lt;code&gt;X&lt;/code&gt;s will be replaced by the binary value of the code point.&lt;/p&gt;

&lt;p&gt;The hexadecimal value 394 in binary is: 1110010100. This can now be placed in the two bytes as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Funl9gk267usnvmsgq0be.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Funl9gk267usnvmsgq0be.png" alt="Encoding of delta character in UTF-8"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When placing these bits, start from the right-hand side. Put 0 in all the additional &lt;code&gt;X&lt;/code&gt;s on the left-hand side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exercise 4&lt;/strong&gt;: Encode the following emoji using UTF-8: 😃&lt;br&gt;
&lt;em&gt;&lt;a href="https://github.com/ravimashru/blog/blob/main/01-character-encodings/README.md#exercise-4" rel="noopener noreferrer"&gt;(Solution)&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Decoding a sequence of bytes encoded using UTF-8 is a two stage process. Since a character may be encoded using multiple bytes, we first need to group bytes that are part of a multi-byte encoding together. Then we can convert each byte/group of bytes into the character they represent.&lt;/p&gt;

&lt;p&gt;For example, consider the following byte sequence: 72 C3 A9 73 75 6D C3 A9.&lt;/p&gt;

&lt;p&gt;It would be easier to see which bytes are part of a multi-byte sequence if we convert to binary.&lt;/p&gt;

&lt;p&gt;01110010 11000011 10101001 01110011 01110101 01101101 11000011 10101001&lt;/p&gt;

&lt;p&gt;These bytes can be divided as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The first byte starts with 0 and therefore represents a character on its own (from the ASCII character set).&lt;/li&gt;
&lt;li&gt;The second byte starts with &lt;code&gt;110&lt;/code&gt; and is therefore the first byte of a 2-byte sequence made up of the 2nd and 3rd byte.&lt;/li&gt;
&lt;li&gt;The 4th, 5th and 6th bytes all start with 0 and each represent characters from the ASCII character set.&lt;/li&gt;
&lt;li&gt;The 7th byte starts with &lt;code&gt;110&lt;/code&gt; and is also the first of a 2-byte sequence (the 7th and 8th).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;01110010 | 11000011 10101001 | 01110011 | 01110101 | 01101101 | 11000011 10101001&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using the &lt;a href="http://www.asciitable.com" rel="noopener noreferrer"&gt;ASCII table&lt;/a&gt;, we can see that the first byte represents the character "r".&lt;/li&gt;
&lt;li&gt;In the following 2-byte sequence, the highlighted bits contain the binary representation of the encoded character: 110 &lt;strong&gt;00011&lt;/strong&gt; 10 &lt;strong&gt;101001&lt;/strong&gt;. When extracted, they form the number 11101001 (in binary) or E9 (in hexadecimal). In Unicode, &lt;a href="https://unicode-table.com/en/search/?q=E9" rel="noopener noreferrer"&gt;this code point&lt;/a&gt; is for the character "é".&lt;/li&gt;
&lt;li&gt;Using the &lt;a href="http://www.asciitable.com" rel="noopener noreferrer"&gt;ASCII table&lt;/a&gt; again, we find the 4th, 5th and 6th bytes represent the characters "s", "u" and "m" respectively.&lt;/li&gt;
&lt;li&gt;The 7th and 8th bytes are exactly the same as the 2nd and 3rd bytes, and represent the character "é".&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Therefore, this byte sequence is an encoding of the word "résumé" in UTF-8.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exercise 5&lt;/strong&gt;: Decode the following bytes that have been encoded using UTF-8: &lt;em&gt;74 61 64 61 20 F0 9F 8E 89&lt;/em&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://github.com/ravimashru/blog/blob/main/01-character-encodings/README.md#exercise-5" rel="noopener noreferrer"&gt;(Solution)&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Hopefully, the following things were made clear by reading this post:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ASCII is an encoding that requires 7 bits to represent each character. It can represent up to 128 characters.&lt;/li&gt;
&lt;li&gt;Since memory deals with groups of 8 bits, the left-most bit is set to 0 in ASCII.&lt;/li&gt;
&lt;li&gt;ISO 8859 is a family of encodings that sets the left-most bit of a byte to 1 to create space for a total of 256 characters.&lt;/li&gt;
&lt;li&gt;Unicode is not an encoding, but a mapping of characters to code points.&lt;/li&gt;
&lt;li&gt;UTF-8 is one of the encodings that can be used to convert code points into sequences of bytes to be stored in a computer's memory. Other encodings are: UTF-16 and UTF-32.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.w3.org/International/questions/qa-what-is-encoding" rel="noopener noreferrer"&gt;Character encodings for beginners&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/" rel="noopener noreferrer"&gt;The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/markadell/explain-utf-8-character-encoding-like-i-m-five-15gh"&gt;Explain UTF-8 character encoding like I'm five&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/zenulabidin/lab-easiest-encoding-and-character-sets-guide-4ppg"&gt;Lab: Easiest Encoding and Character Sets Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webdev</category>
    </item>
    <item>
      <title>The Evolution of Recurrent Neural Networks</title>
      <dc:creator>Ravi Suresh Mashru</dc:creator>
      <pubDate>Thu, 03 Dec 2020 13:21:58 +0000</pubDate>
      <link>https://dev.to/ravimashru/the-evolution-of-recurrent-neural-networks-2flb</link>
      <guid>https://dev.to/ravimashru/the-evolution-of-recurrent-neural-networks-2flb</guid>
      <description>&lt;p&gt;In this post, we will take a look at the evolution of recurrent architectures of neural networks and why the concept of &lt;strong&gt;attention&lt;/strong&gt; is so exciting.&lt;/p&gt;

&lt;p&gt;After reading this post, you should be able to appreciate why the Transformer architecture, introduced in the paper &lt;a href="https://arxiv.org/abs/1706.03762"&gt;Attention Is All You Need&lt;/a&gt; (from which the title of this post is also borrowed), uses only attention mechanisms to achieve previously unimaginable feats like &lt;a href="https://gpt3examples.com"&gt;OpenAI's GPT-3&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recurrent Networks
&lt;/h2&gt;

&lt;p&gt;In feedforward networks, information flows in one direction: from the input layer, through your hidden layers, to the output layer. This type of architecture works for tasks like image classification, where you need to look at the entire input just once.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--omcAl3IZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/58jyjep96g6cydmiufou.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--omcAl3IZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/58jyjep96g6cydmiufou.png" alt="Feedforward network for image classification"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recurrent networks, on the other hand, are well suited for inputs like natural language that are recurrent in nature: made up of word after word in a sequence. They take one input at a time from the sequence. With each input, they also get their previous output - this enables recurrent neural networks to process the input as a sequence.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dti4pff1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/itzvdrwt8z3mq3myfk68.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dti4pff1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/itzvdrwt8z3mq3myfk68.png" alt="Recurrent Network cell"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An easier way to visualize recurrent networks is to "unroll" them - draw them separately for each time step.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CTJPDkN_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/n2vgf4fsvfqkcpjdblpu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CTJPDkN_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/n2vgf4fsvfqkcpjdblpu.png" alt="Unrolled recurrent network cells"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Although RNNs have access to their previous output, they are not very good at remembering things that came long before in a sentence. These are more commonly known as "long-term dependencies".&lt;/p&gt;

&lt;h2&gt;
  
  
  LSTMs
&lt;/h2&gt;

&lt;p&gt;LSTMs (short for Long Short Term Memory networks) are a type of recurrent network that has a vector called the "cell state" that helps it remember long-term dependencies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--A-WOtrEw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/qscq3w445alu7kmtuyvr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--A-WOtrEw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/qscq3w445alu7kmtuyvr.png" alt="Unrolled LSTM cells"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At each time step, an LSTM performs the following operations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It decides how much information it should forget from its cell state, using a hidden layer called the forget gate.&lt;/li&gt;
&lt;li&gt;It decides what new information it should remember given the current input, using a hidden layer called the input gate.&lt;/li&gt;
&lt;li&gt;It decides what information it should use from its memory to provide an output, using a hidden layer called the output gate.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8izC5_eR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/w8fkvihaa8yfqfzj80wj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8izC5_eR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/w8fkvihaa8yfqfzj80wj.png" alt="LSTM cell internals"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The above diagram is a little over-simplified so that it is easy to visualize the main parts that an LSTM is made up of. Let's take a closer look at each of the gates to see what goes on underneath.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Forget Gate
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zwjqe2AW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/kq7620hxh8oqwaalxssz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zwjqe2AW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/kq7620hxh8oqwaalxssz.png" alt="LSTM cell internals forget gate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The forget gate concatenates the current input and previous output and passes it through a hidden layer with a sigmoid activation. The output of this layer is then multiplied elementwise with the context vector.&lt;/p&gt;

&lt;p&gt;Since sigmoid activations are between 0 and 1, values close to 0 instruct the LSTM to forget information in those parts of the cell state, and retain information where the sigmoid activation values are close to 1.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Input Gate
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Q49qy_PY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/huy2s1qw91kb8g3uar3y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Q49qy_PY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/huy2s1qw91kb8g3uar3y.png" alt="LSTM cell internals input gate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The input gate also works on the concatenation of the current input and previous output.&lt;/p&gt;

&lt;p&gt;There is a sigmoid layer that decides which values to keep (between 0 and 1), and there is a tanh layer that provides values for what could be stored in the context vector. These two outputs are multiplied by each other and then added element-wise to the context vector.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Output Gate
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--P2pD3ooT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/4ddrq7tb8joauoyggk4x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--P2pD3ooT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/4ddrq7tb8joauoyggk4x.png" alt="LSTM cell internals output gate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The output gate then multiplies elementwise the output of a sigmoid gate with the context layer passed through a tanh activation. The tanh activation helps bring the context vector in a range of -1 to 1 and the sigmoid activation helps select which parts of the context should be used for the output.&lt;/p&gt;

&lt;p&gt;For a better understanding of what happens inside LSTMs, take a look at &lt;a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/"&gt;this blog post&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  GRUs
&lt;/h2&gt;

&lt;p&gt;GRUs (short for Gated Recurrent Units) are different from LSTMs in two major ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;GRUs don't have a separate cell state. The hidden state of the cell and its output are the same.&lt;/li&gt;
&lt;li&gt;GRUs have just 2 gates instead of 3 - the reset gate and update gate. The assumption is that if the network is trying to forget something from the hidden state, then it is because it wants to store some other information there. Therefore, we don't need separate forget and input gates. The additive inverse of the forget gate can be used as the input gate.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RK6_3LYX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/hvx1gki032lyxuubbe8w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RK6_3LYX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/hvx1gki032lyxuubbe8w.png" alt="GRU gated recurrent unit cell"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The reset gate decides what information from the hidden state needs to be supressed during this time step.&lt;/p&gt;

&lt;p&gt;The update gate acts like a combination of the forget gate and input gate in an LSTM.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sequence modelling with Encoder-Decoder architecture
&lt;/h2&gt;

&lt;p&gt;Although LSTMs and GRUs perform better than vanilla RNNs, they have a problem that they have one output for each input at every time step.&lt;/p&gt;

&lt;p&gt;This is not a problem for tasks like sentiment analysis where we are interested in just the last output. However, for tasks like language translation where the length of the output sequence and even the position of the translated words with respect to each other can be different, this is a major limitation.&lt;/p&gt;

&lt;p&gt;Sequence-to-sequence (often abbreviated seq2seq) models solve this problem by splitting the entire process into two steps.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The input sequence is first run through an encoder. The encoder generates a fixed vector representation of the input at the last time step.&lt;/li&gt;
&lt;li&gt;This fixed representation is then run through a decoder which produces the output sequence of interest.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--W6UdxZd5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/6a1eyi1535lug7hp83u9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--W6UdxZd5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/6a1eyi1535lug7hp83u9.png" alt="Encoder decoder architecture seq2seq sequence-to-sequence"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The importance of paying attention
&lt;/h2&gt;

&lt;p&gt;A major limitation of seq2seq models is that the encoder tries to cram all the information it can into a single vector. The model is therefore limited by how much information it can put in there. Also, the information in this vector will be biased towards the last few words that the encoder has seen most recently.&lt;/p&gt;

&lt;p&gt;An attention layer works on all the output states of the encoder. At each step, the decoder looks at its previous output and a weighted sum of all the encoder outputs that the attention layer generates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--B-usf42r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/5gckoejw4x32pjgjzc1d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--B-usf42r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/5gckoejw4x32pjgjzc1d.png" alt="Attention layer in encoder decoder"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The attention layer is a fully connected layer that is trained to select the right encoder output to focus on at each given timestep. It is a fully connected network that receives the previous output of the decoder and all the hidden states of the encoder. The output is a scalar value for each encoder output that acts as the "weightage" for that time step.&lt;/p&gt;

&lt;p&gt;These scalar values are passed through a softmax layer to ensure they add up to 1, and then multiplied by the corresponding encoder outputs and added together.&lt;/p&gt;

&lt;p&gt;The resulting vector of the weighted sums of the encoder outputs is provided as an input to the decoder along with the output of the decoder at the previous time step. That way, the decoder can "see" which part of the input it needs to focus on to generate the next output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FKXhlq-L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gxxepc2kj0njxcw669c7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FKXhlq-L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gxxepc2kj0njxcw669c7.png" alt="Attention layer internals"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The concept of attention is so powerful that the Transformer architecture used by models like GPT-3 does away completely with recurrent and convolutional operations and uses &lt;strong&gt;only attention&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Further reading&lt;/strong&gt;&lt;br&gt;
I highly recommend the following articles that helped me understand what I have written in this post:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/transformers-141e32e69591"&gt;How Transformers Work - The Neural Network used by Open AI and DeepMind&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://indico.io/blog/sequence-modeling-neural-networks-part2-attention-models/"&gt;Sequence Modeling with Neural Networks (Part 2): Attention Models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/"&gt;Understanding LSTM Networks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21"&gt;Illustrated Guide to LSTMs and GRUs: A step by step explanation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nlp</category>
      <category>deeplearning</category>
    </item>
  </channel>
</rss>
