<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jose Flavio Quispe Irrazábal</title>
    <description>The latest articles on DEV Community by Jose Flavio Quispe Irrazábal (@jflavio11).</description>
    <link>https://dev.to/jflavio11</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1108161%2F2d8e5319-3b77-4789-b565-e5e9149d0c42.png</url>
      <title>DEV Community: Jose Flavio Quispe Irrazábal</title>
      <link>https://dev.to/jflavio11</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jflavio11"/>
    <language>en</language>
    <item>
      <title>Analyzing the algorithmic complexity of the Kotlin API’s distinctBy function</title>
      <dc:creator>Jose Flavio Quispe Irrazábal</dc:creator>
      <pubDate>Wed, 10 Jan 2024 16:02:17 +0000</pubDate>
      <link>https://dev.to/jflavio11/analyzing-the-algorithmic-complexity-of-the-kotlin-apis-distinctby-function-509m</link>
      <guid>https://dev.to/jflavio11/analyzing-the-algorithmic-complexity-of-the-kotlin-apis-distinctby-function-509m</guid>
      <description>&lt;p&gt;A few times when I’ve conducted technical interviews with candidates for Software Engineer roles at the company where I was working, I’ve come across people who actually know how they might filter or sort a list, however when I delve into the depth of the problem, I start to find difficulties.&lt;/p&gt;

&lt;p&gt;Many times, as software developers, we get used to using libraries and APIs that have most complex solutions already implemented in a single function, making our lives easier. This doesn’t mean that we put aside our curiosity and ability to investigate how things work, at least in its most basic form.&lt;/p&gt;

&lt;p&gt;In this post, we are going to specifically analyze a very used function of the Kotlin API, this is &lt;code&gt;distinctBy&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can find the official documentation by clicking &lt;a href="https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/distinct-by.html" rel="noopener noreferrer"&gt;at this link&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Applying an algorithm in real life
&lt;/h2&gt;

&lt;p&gt;Suppose you have a list of products in which, for reasons not relevant to this post, there are repeated elements. We clearly don’t want to show the user a list of duplicate items. So we know that we should filter the list.&lt;/p&gt;

&lt;p&gt;We have a couple of alternatives:&lt;/p&gt;

&lt;h3&gt;
  
  
  Filtering the list with a loop
&lt;/h3&gt;

&lt;p&gt;We could instantiate a new empty list and looping through the original list, adding element by element only if the current item of the iteration is not found in this new list.&lt;/p&gt;

&lt;p&gt;Here we have a slight problem, that “only if it is not found in this new list” implies a search. Let’s assume that we haven’t used a hash table and are using a simple &lt;em&gt;ArrayList&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;We would have something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;filterList&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;newList&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;emptyList&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
   &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
       &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;findProduct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
           &lt;span class="n"&gt;newList&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;newList&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;findProduct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;productToFind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Product&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;Boolean&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="n"&gt;list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
       &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="n"&gt;productToFind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
           &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
       &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Doing a quick analysis, the &lt;code&gt;filterList&lt;/code&gt; function will call the &lt;code&gt;findProduct&lt;/code&gt; function N times, where N is the number of elements in the original list.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;findProduct&lt;/code&gt; function will take &lt;strong&gt;up to N times&lt;/strong&gt; to complete. Assuming that the element to search for is in the last position of the list, the worst case (&lt;em&gt;worst case scenario&lt;/em&gt;) will be &lt;strong&gt;O(n)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This means that the complexity of our algorithm will be &lt;strong&gt;O(n²)&lt;/strong&gt;. That is, it could take &lt;strong&gt;n²&lt;/strong&gt; units of time to finish. And if we talk about Memory Consumption (space complexity), we will have a new list, that is &lt;strong&gt;O(n)&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Filtering the list with the distinctBy function
&lt;/h3&gt;

&lt;p&gt;Great. It turns out that we know of the existence of a function called &lt;code&gt;distinctBy&lt;/code&gt; in which we must pass it as a parameter, a function that indicates the “key” which will serve as an indicator of whether an element is repeated or not. This can be a first name, a last name, or a code.&lt;/p&gt;

&lt;p&gt;We just need to write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;newList&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;distinctBy&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;¡And, &lt;em&gt;voilá&lt;/em&gt;!&lt;/p&gt;

&lt;p&gt;Ready, one line of code was enough and we already have a new list that does not contain repeated products with the same code.&lt;/p&gt;

&lt;p&gt;But… do we know what he does inside?&lt;/p&gt;

&lt;p&gt;Let’s analyze the source code of this function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffckhn8eo6yo9fx34qup4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffckhn8eo6yo9fx34qup4.png" alt="distinctBy source code"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see, it is an &lt;code&gt;inline&lt;/code&gt; extension function applicable to any data structure that implements the &lt;code&gt;Iterable&lt;/code&gt; interface.&lt;/p&gt;

&lt;p&gt;This function depends on two generic classes that it will need for its operation: one that determines the data types of the list to return (the same as the original list) denoted by the letter &lt;strong&gt;T&lt;/strong&gt;, and another that determines the differentiator of the objects, denoted with the letter &lt;strong&gt;K&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In the first two lines, two new objects are instantiated. The first is a &lt;em&gt;hash table&lt;/em&gt; or dictionary data structure that at the same time implements the &lt;em&gt;Set&lt;/em&gt; interface (HashSet): it will not allow repeated elements thanks to the fact that it contains a &lt;em&gt;hash table&lt;/em&gt; inside. The second is a simple list that will serve as the resulting filtered list.&lt;/p&gt;

&lt;p&gt;Iterates over the original list (shown as &lt;em&gt;this&lt;/em&gt; in the &lt;em&gt;for&lt;/em&gt; statement) and executes the &lt;code&gt;selector&lt;/code&gt; function that is passed as a parameter. This &lt;code&gt;selector&lt;/code&gt; function is passed the current element of the iteration.&lt;/p&gt;

&lt;p&gt;That means that in each iteration, the function &lt;code&gt;{ product -&amp;gt; product.code }&lt;/code&gt; will be returning what we want to serve as a unique identifier, that’s why its value is assigned to a variable named “key”.&lt;/p&gt;

&lt;p&gt;Once we get the unique identifier (which in our example is the product code), we proceed to insert it into our &lt;em&gt;hash table&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;As we can see, this insertion occurs inside a conditional if.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8f1ud3ctnmmnt7n4v821.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8f1ud3ctnmmnt7n4v821.png" alt="for loop inside distinctBy function"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Set interface states that the &lt;em&gt;add&lt;/em&gt; function will return &lt;em&gt;true&lt;/em&gt; if, and only if, the element was inserted. And, at what point is the element &lt;strong&gt;not inserted&lt;/strong&gt;? Well, when it already exists (it is verified with a hash table!).&lt;/p&gt;

&lt;p&gt;Once we know that this identifier did not exist in our identifier &lt;em&gt;hash table&lt;/em&gt; (&lt;code&gt;set.add(key)&lt;/code&gt; returns &lt;em&gt;true&lt;/em&gt;), then we proceed to insert the element (the product) in our new list (&lt;code&gt;list.add(e)&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;No lookup is performed to see if the iteration item already exists in the list. We only try to insert its identifier (the product code) in the &lt;em&gt;hash table&lt;/em&gt; and if this operation is successful (returns &lt;em&gt;true&lt;/em&gt;) then we just add this product to the filtered list.&lt;/p&gt;

&lt;p&gt;Since the whole original list is iterated anyway, the time complexity will be &lt;strong&gt;O(n)&lt;/strong&gt;, however, the verification of whether or not an element already exists in the list is returns in constant time &lt;strong&gt;O(1)&lt;/strong&gt;, due to our &lt;em&gt;hash table&lt;/em&gt; inside our &lt;em&gt;HashSet&lt;/em&gt; which will tell us if the identifier was inserted or not.&lt;/p&gt;

&lt;p&gt;Of course, when creating two new objects, the memory space used will be a little larger, but not as relevant since the purpose of the &lt;em&gt;HashSet&lt;/em&gt; is to store only the identifiers to know if an element has already been created. exists or not in the original list.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;distinctby&lt;/code&gt; function uses a &lt;em&gt;HashSet&lt;/em&gt; structure to speed up the cost of time with the verification of the existence of an element in our list.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Unlike a &lt;em&gt;HashMap&lt;/em&gt;, a &lt;em&gt;HashSet&lt;/em&gt; needs only one element at insert time and it cannot be repeated, it acts as key and value at the same time (actually, if we look at the internal implementation, a generic static &lt;code&gt;Object&lt;/code&gt; object is used as value).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Of course, &lt;code&gt;distinctBy&lt;/code&gt; offers us better performance than if we opted for an implementation of two iterations, one inside the other. However, we could also have used a &lt;em&gt;HashMap&lt;/em&gt; and inserted on top of it using the &lt;em&gt;product code&lt;/em&gt; as the key and the product as the value. The result would be a clean product structure.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What did you think? Do you also often analyze the functions that we use on a daily basis and that make our lives easier? I would like to know if you know of any other functions or APIs that also make use of these data structures efficiently and simplify our day-to-day developments 😄.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
