<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AnotherRandomDev</title>
    <description>The latest articles on DEV Community by AnotherRandomDev (@anotherrandomdev).</description>
    <link>https://dev.to/anotherrandomdev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2895748%2F886c5d8c-0cd7-4b50-bbcc-fc64347a52a5.jpeg</url>
      <title>DEV Community: AnotherRandomDev</title>
      <link>https://dev.to/anotherrandomdev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/anotherrandomdev"/>
    <language>en</language>
    <item>
      <title>EFFICIENTLY COMPARE MASSIVE DATA STREAMS IN JAVASCRIPT</title>
      <dc:creator>AnotherRandomDev</dc:creator>
      <pubDate>Wed, 26 Feb 2025 11:38:33 +0000</pubDate>
      <link>https://dev.to/anotherrandomdev/efficiently-compare-massive-data-streams-in-javascript-53ib</link>
      <guid>https://dev.to/anotherrandomdev/efficiently-compare-massive-data-streams-in-javascript-53ib</guid>
      <description>&lt;p&gt;I know, it looks impossible. Javascript is not a very performant language, mostly because of its single-threaded model. So if I tell you we’re going to compare 1.8 million votes in a fictional city election, right into the browser, you’ll probably think I’m crazy. A browser would freeze and crash — every time. Guess what? You’d be right. Here is the result with a very popular diff library:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbifzaiaboldsojv7u8j.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbifzaiaboldsojv7u8j.gif" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clearly, it’s not &lt;em&gt;wow&lt;/em&gt; time. But with the right techniques — efficient memory management, a streaming architecture, web workers, and batch updates — you can do it. Here’s the proof:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3tztknu5j198hparfw6v.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3tztknu5j198hparfw6v.gif" alt="Demo of an application that smoothly compares 1.8 million votes over two elections." width="800" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, even with 1.8 million objects processed in real time, the UI stays perfectly responsive. Most importantly, by injecting small data batches over time, we’ve turned a &lt;em&gt;wait for it&lt;/em&gt; delay into a &lt;em&gt;watch it happen&lt;/em&gt; experience.&lt;/p&gt;

&lt;p&gt;Alright, now let’s dive into how it’s built.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges
&lt;/h2&gt;

&lt;p&gt;We have three challenges to solve.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Achieving top performance.&lt;/li&gt;
&lt;li&gt;Handling different input formats.&lt;/li&gt;
&lt;li&gt;Being easy for developers to use.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Performance&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;First, we must avoid blocking the main thread at all costs. That means asynchronous code, a worker to handle the heavy lifting, and a linear time complexity, &lt;em&gt;O(n)&lt;/em&gt;: the execution time grows proportionally with the input size. Efficient memory management is just as crucial.&lt;/p&gt;

&lt;p&gt;Efficient memory management means freeing memory as soon as a piece of data has been processed, and sending results progressively, in chunks, to minimize UI updates. For example, instead of sending one million object diffs one by one, we could send them in groups of a thousand: that is a thousand updates instead of a million, which dramatically reduces the number of DOM updates.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Input Format&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The second challenge is to handle different kinds of input. We need to be versatile and accept arrays of objects, JSON files, and even data streams. So before starting the diff, we convert everything into a single format: a readable stream.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Developer Experience&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Finally, we need to provide a great developer experience. The function must be easy to use and easy to customize: the user passes two lists to our function, we compare them, and we progressively send back the result.&lt;/p&gt;

&lt;p&gt;The simplest way to do it is to return an event emitter with three kinds of events: &lt;code&gt;ondata&lt;/code&gt;, &lt;code&gt;onerror&lt;/code&gt;, and &lt;code&gt;onfinish&lt;/code&gt;. Easy peasy.&lt;/p&gt;
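&lt;p&gt;Here is a minimal sketch of that contract in plain JavaScript. The &lt;code&gt;createDiffEmitter&lt;/code&gt; helper is hypothetical, not superdiff’s API: it simply registers callbacks for the three events described above, named &lt;code&gt;data&lt;/code&gt;, &lt;code&gt;error&lt;/code&gt; and &lt;code&gt;finish&lt;/code&gt; here.&lt;/p&gt;

```javascript
// Hypothetical minimal emitter for the three events described in the article.
// It is an illustration, not superdiff's actual implementation.
function createDiffEmitter() {
  const listeners = { data: [], error: [], finish: [] };
  return {
    // Register a callback for one of the supported events.
    on(event, callback) {
      if (!Object.hasOwn(listeners, event)) {
        throw new Error(`Unknown event: ${event}`);
      }
      listeners[event].push(callback);
      return this; // allows chaining: emitter.on(...).on(...)
    },
    // Call every callback registered for the event, in registration order.
    emit(event, payload) {
      for (const callback of listeners[event]) callback(payload);
    },
  };
}

// Usage: the consumer only deals with data, error and finish.
const emitter = createDiffEmitter();
const received = [];
emitter
  .on("data", (chunk) => received.push(...chunk))
  .on("error", (message) => console.error(message))
  .on("finish", () => console.log(`Done: ${received.length} diffs received`));

emitter.emit("data", [{ status: "equal" }, { status: "updated" }]);
emitter.emit("finish"); // logs: Done: 2 diffs received
```

&lt;p&gt;Rejecting unknown event names is a design choice of this sketch: a typo such as &lt;code&gt;"finsh"&lt;/code&gt; fails loudly instead of silently never firing.&lt;/p&gt;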

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo4rl8y1utocuoi1gl23w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo4rl8y1utocuoi1gl23w.png" alt="Our streamListDiff function and its event listeners" width="800" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;ondata&lt;/code&gt; event receives an array of object diffs, called a &lt;code&gt;chunk&lt;/code&gt;. Each diff should be clear, with a previous value, a current value, an index tracker, and a status: equal, updated, moved, added, or deleted.&lt;/p&gt;
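&lt;p&gt;As an illustration, an object diff could be built like this. The &lt;code&gt;buildDiff&lt;/code&gt; helper, its field names and the rule that a content change takes precedence over a move are assumptions of this sketch, not necessarily superdiff’s exact output; values are compared shallowly.&lt;/p&gt;

```javascript
// Hypothetical diff builder. Field names and status precedence are
// assumptions for illustration; values are compared shallowly.
function shallowEqual(a, b) {
  const keysA = Object.keys(a);
  if (keysA.length !== Object.keys(b).length) return false;
  for (const key of keysA) {
    if (!Object.hasOwn(b, key)) return false;
    if (!Object.is(a[key], b[key])) return false;
  }
  return true;
}

// previousValue is null for added objects, currentValue is null for deleted ones.
function buildDiff(previousValue, currentValue, prevIndex, newIndex) {
  let status;
  if (previousValue === null) status = "added";
  else if (currentValue === null) status = "deleted";
  else if (!shallowEqual(previousValue, currentValue)) status = "updated";
  else if (prevIndex !== newIndex) status = "moved"; // same content, new position
  else status = "equal";
  return { previousValue, currentValue, prevIndex, newIndex, status };
}

const before = { id: 7, candidate: "John Doe" };
console.log(buildDiff(before, { id: 7, candidate: "Jane Roe" }, 0, 0).status); // updated
console.log(buildDiff(before, { id: 7, candidate: "John Doe" }, 0, 3).status); // moved
console.log(buildDiff(null, before, null, 2).status); // added
console.log(buildDiff(before, null, 0, null).status); // deleted
```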

&lt;p&gt;And because I know most of you love TypeScript, the output is typed, so you also get autocompletion. As icing on the cake, users can pass options to refine the output. Let’s see how the algorithm works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3yrqxfi9bk7oxouv4nr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3yrqxfi9bk7oxouv4nr.png" alt="Usage example with options and typing" width="800" height="217"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Algorithm Walkthrough
&lt;/h2&gt;

&lt;p&gt;For simplicity’s sake, the code we’re going through comes from the &lt;code&gt;streamListDiff&lt;/code&gt; function of the &lt;code&gt;@donedeal0/superdiff&lt;/code&gt; library.&lt;/p&gt;

&lt;p&gt;Let’s take a look at the main function. It takes two lists to compare, a reference key shared by all the objects (an &lt;code&gt;id&lt;/code&gt;, for example) to match objects across the two lists, and some options to refine the output.&lt;/p&gt;

&lt;p&gt;In the function body, we can see that it returns an event emitter before starting the diff. The trick is to trigger the diff logic asynchronously. The event loop first finishes executing all the synchronous code, including the caller’s code that attaches the listeners, and only then starts the real work. That way, no event fires before someone is listening.&lt;/p&gt;
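&lt;p&gt;The pattern fits in a few lines. &lt;code&gt;startDiff&lt;/code&gt; is a hypothetical stand-in for the real function: if it emitted its events synchronously, they would fire before the caller had a chance to attach any listener, and they would be lost. Deferring the work with &lt;code&gt;setTimeout&lt;/code&gt; avoids that.&lt;/p&gt;

```javascript
// Sketch of the "return first, work later" pattern. startDiff is a
// hypothetical stand-in: it returns an emitter right away and defers the
// actual work with setTimeout, so it runs after the caller's synchronous code.
function startDiff(prevList, nextList) {
  const handlers = { data: [], finish: [] };
  const emit = (event, payload) => handlers[event].forEach((cb) => cb(payload));

  setTimeout(() => {
    // Placeholder for the real diff: emit one chunk, then finish.
    emit("data", [{ prevLength: prevList.length, nextLength: nextList.length }]);
    emit("finish");
  }, 0);

  return {
    on(event, callback) {
      handlers[event].push(callback);
      return this;
    },
  };
}

const log = [];
const diff = startDiff([1, 2], [1, 2, 3]);
// Listeners are attached after startDiff() has returned, yet no event is missed.
diff
  .on("data", (chunk) => log.push(`data (${chunk.length} item)`))
  .on("finish", () => {
    log.push("finish");
    console.log(log.join(" -> ")); // sync code done -> data (1 item) -> finish
  });
log.push("sync code done");
```

&lt;p&gt;&lt;code&gt;queueMicrotask&lt;/code&gt; would also run after the current synchronous code; a &lt;code&gt;setTimeout&lt;/code&gt; task additionally yields back to the event loop, so pending input and rendering can be handled first.&lt;/p&gt;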

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2hw5apjeef3plp9ixpgc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2hw5apjeef3plp9ixpgc.png" alt="streamListDiff: our main function" width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, we convert our two lists to readable streams, using different methods for each input type (an array, a file, etc.).&lt;/p&gt;
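&lt;p&gt;A minimal sketch of that normalization step, assuming inputs are either an array of objects, a &lt;code&gt;File&lt;/code&gt;/&lt;code&gt;Blob&lt;/code&gt; holding a JSON array, or an existing &lt;code&gt;ReadableStream&lt;/code&gt;. &lt;code&gt;toReadableStream&lt;/code&gt; and &lt;code&gt;arrayToStream&lt;/code&gt; are hypothetical names. For brevity, files are parsed in one go with &lt;code&gt;JSON.parse&lt;/code&gt;; a real streaming implementation would parse them incrementally to keep memory usage low.&lt;/p&gt;

```javascript
// Hypothetical input normalizer: every supported input becomes a ReadableStream.
function arrayToStream(items) {
  let index = 0;
  return new ReadableStream({
    // pull() runs whenever the consumer wants more data, so items are
    // handed out on demand instead of being queued all at once.
    pull(controller) {
      if (index === items.length) controller.close();
      else controller.enqueue(items[index++]);
    },
  });
}

async function toReadableStream(input) {
  if (input instanceof ReadableStream) return input; // already a stream
  if (Array.isArray(input)) return arrayToStream(input);
  if (input instanceof Blob) {
    // File extends Blob in browsers, so this branch covers uploaded files too.
    // Simplification: the whole file is parsed at once.
    const parsed = JSON.parse(await input.text());
    if (!Array.isArray(parsed)) throw new TypeError("The file must contain a JSON array");
    return arrayToStream(parsed);
  }
  throw new TypeError("Expected an array, a File/Blob or a ReadableStream");
}

// Helper used below to collect a stream's content.
async function readAll(stream) {
  const reader = stream.getReader();
  const items = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) return items;
    items.push(value);
  }
}

const file = new Blob([JSON.stringify([{ id: 3 }])], { type: "application/json" });
const results = Promise.all(
  [[{ id: 1 }, { id: 2 }], file].map((input) => toReadableStream(input).then(readAll)),
);
results.then(([fromArray, fromFile]) => console.log(fromArray.length, fromFile[0].id)); // 2 3
```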

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ljh611oegzcgbo89aco.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ljh611oegzcgbo89aco.png" alt="Convert file or array inputs to readable streams" width="800" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once we have two valid streams, we iterate over both of them concurrently with &lt;code&gt;Promise.all()&lt;/code&gt;. At each iteration, we do two things. First, we check whether the object is valid; if not, we emit an error message. Second, we check whether an object with the same reference key is already waiting in the data buffer of the other list.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5xdl774ux5yi1l4pn75.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5xdl774ux5yi1l4pn75.png" alt="Iterating over the two streams" width="800" height="284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What’s a data buffer? There are two buffers, one for each list. The idea is to store unmatched objects in a hashmap so that the other list, which is being read at the same time, can instantly check whether there is a match for its current object. This avoids two full preliminary passes that would produce no results and consume a lot of memory before the real diff even starts. We don’t lose time, and we stay efficient.&lt;/p&gt;

&lt;p&gt;We use a &lt;code&gt;Map&lt;/code&gt; as the hashmap for unmatched objects because it accepts any kind of value as a key (plain objects coerce keys to strings), offers fast lookups, insertions, and deletions, and is iterable out of the box.&lt;/p&gt;
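&lt;p&gt;The difference with a plain object is easy to see: object keys are coerced to strings, while a &lt;code&gt;Map&lt;/code&gt; keeps the original key type. The snippet also shows the &lt;code&gt;has&lt;/code&gt;/&lt;code&gt;get&lt;/code&gt;/&lt;code&gt;delete&lt;/code&gt; calls a data buffer needs.&lt;/p&gt;

```javascript
// Plain objects coerce keys to strings, so the number 1 and the string "1"
// land in the same slot. A Map keeps them apart.
const asObject = {};
asObject[1] = "number key";
asObject["1"] = "string key";
console.log(Object.keys(asObject).length); // 1: the second write overwrote the first

const buffer = new Map();
buffer.set(1, "number key");
buffer.set("1", "string key");
console.log(buffer.size); // 2

// has/get/delete are the three operations the data buffers rely on.
if (buffer.has(1)) {
  const match = buffer.get(1);
  buffer.delete(1); // free the slot as soon as the match is consumed
  console.log(match, buffer.size); // number key 1
}

// A Map iterates in insertion order, which makes draining leftovers easy.
for (const [key, value] of buffer) console.log(typeof key, value); // string string key
```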

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frc1qlmn8c6n3njg7s9w5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frc1qlmn8c6n3njg7s9w5.png" alt="Concurrent insertions and retrieval of matching entries" width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Long story short, if there is a match in the buffer, we immediately remove it to free up memory and compare it to the current object. Once we have the object diff, we immediately send it to the user. If there is no match yet, we insert the object into its own list’s buffer, where it waits to be compared. Once both streams have been read, we simply iterate over each buffer and process the remaining objects the same way: they never found a match, so they are reported as deleted or added.&lt;/p&gt;
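&lt;p&gt;Putting the last few paragraphs together, here is a simplified sketch of the two-buffer approach. It is not superdiff’s source: &lt;code&gt;diffStreams&lt;/code&gt;, its &lt;code&gt;onDiff&lt;/code&gt;/&lt;code&gt;onError&lt;/code&gt; callbacks and the diff fields are hypothetical, it assumes reference keys are unique within each list, and it compares objects with &lt;code&gt;JSON.stringify&lt;/code&gt;, which is sensitive to key order.&lt;/p&gt;

```javascript
// Sketch of the two-buffer streaming diff (hypothetical names throughout).
function streamFromArray(items) {
  let i = 0;
  return new ReadableStream({
    pull(controller) {
      if (i === items.length) controller.close();
      else controller.enqueue(items[i++]);
    },
  });
}

async function diffStreams(prevStream, nextStream, key, { onDiff, onError }) {
  const prevBuffer = new Map(); // unmatched objects from the previous list
  const nextBuffer = new Map(); // unmatched objects from the next list

  const compare = (prev, next) => {
    let status = "equal";
    // Simplistic equality check: sensitive to key order.
    if (JSON.stringify(prev.value) !== JSON.stringify(next.value)) status = "updated";
    else if (prev.index !== next.index) status = "moved";
    onDiff({ previousValue: prev.value, currentValue: next.value,
      prevIndex: prev.index, newIndex: next.index, status });
  };

  // Reads one stream and looks for each object in the OTHER list's buffer.
  async function consume(stream, ownBuffer, otherBuffer, isPrev) {
    const reader = stream.getReader();
    for (let index = 0; ; index++) {
      const { done, value } = await reader.read();
      if (done) return;
      if (value === null || typeof value !== "object" || !(key in value)) {
        onError(`Invalid object at index ${index}: missing "${key}"`);
        continue;
      }
      const entry = { value, index };
      const ref = value[key];
      if (otherBuffer.has(ref)) {
        const match = otherBuffer.get(ref);
        otherBuffer.delete(ref); // matched: free the memory right away
        if (isPrev) compare(entry, match);
        else compare(match, entry);
      } else {
        ownBuffer.set(ref, entry); // wait for the other list to catch up
      }
    }
  }

  // The two readers interleave on the single JS thread (concurrency, not parallelism).
  await Promise.all([
    consume(prevStream, prevBuffer, nextBuffer, true),
    consume(nextStream, nextBuffer, prevBuffer, false),
  ]);

  // Leftovers never found a partner: deleted from prev, added in next.
  for (const { value, index } of prevBuffer.values()) {
    onDiff({ previousValue: value, currentValue: null, prevIndex: index, newIndex: null, status: "deleted" });
  }
  for (const { value, index } of nextBuffer.values()) {
    onDiff({ previousValue: null, currentValue: value, prevIndex: null, newIndex: index, status: "added" });
  }
}

const prevVotes = [{ id: 1, vote: "A" }, { id: 2, vote: "B" }, { id: 3, vote: "A" }];
const nextVotes = [{ id: 2, vote: "B" }, { id: 1, vote: "C" }, { id: 4, vote: "B" }];
const diffs = [];
const finished = diffStreams(streamFromArray(prevVotes), streamFromArray(nextVotes), "id", {
  onDiff: (d) => diffs.push(d),
  onError: (message) => console.error(message),
}).then(() => {
  for (const d of diffs) console.log((d.currentValue ?? d.previousValue).id, d.status);
});
```

&lt;p&gt;Whatever order the two readers happen to interleave in, every pair is found: the object that arrives first waits in its own buffer, and its partner finds it there when it arrives.&lt;/p&gt;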

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4rw9qp8g7k69r0bt730.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4rw9qp8g7k69r0bt730.png" alt="Sending an object diff to the user" width="800" height="286"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, you’re probably wondering why we don’t do a single iteration over the first list and find each object’s match in the second list. Well, with a million objects, a &lt;code&gt;find()&lt;/code&gt; lookup would be highly inefficient: each call scans the second list, so the total number of iterations grows quadratically, &lt;em&gt;O(n²)&lt;/em&gt;. With the data buffer method, the &lt;code&gt;has()&lt;/code&gt; method retrieves a match in constant time on average, whatever the size of the buffer.&lt;/p&gt;
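&lt;p&gt;A quick way to see the difference is to count comparisons. In this sketch (hypothetical helpers, worst-case ordering), &lt;code&gt;findPairs&lt;/code&gt; needs &lt;em&gt;n(n + 1)/2&lt;/em&gt; comparisons, while &lt;code&gt;mapPairs&lt;/code&gt; builds a &lt;code&gt;Map&lt;/code&gt; in one pass and then looks each key up directly. Unlike the streaming buffers, &lt;code&gt;mapPairs&lt;/code&gt; indexes the whole second list up front; it only illustrates the lookup cost.&lt;/p&gt;

```javascript
// Counting the work done by two ways of pairing objects by id.
// findPairs scans nextList with find() for every object: O(n * m) comparisons.
// mapPairs indexes nextList in a Map once, then each lookup is O(1) on average.
function findPairs(prevList, nextList, key) {
  let comparisons = 0;
  const pairs = prevList.map((prev) => [
    prev,
    nextList.find((next) => {
      comparisons++;
      return next[key] === prev[key];
    }),
  ]);
  return { pairs, comparisons };
}

function mapPairs(prevList, nextList, key) {
  const byKey = new Map(nextList.map((next) => [next[key], next])); // one pass
  const pairs = prevList.map((prev) => [prev, byKey.get(prev[key])]); // one pass
  return { pairs };
}

const n = 2000;
const prevList = Array.from({ length: n }, (_, i) => ({ id: i }));
const nextList = [...prevList].reverse(); // each match sits far from the start
const slow = findPairs(prevList, nextList, "id");
const fast = mapPairs(prevList, nextList, "id");
console.log(slow.comparisons); // 2001000, i.e. n * (n + 1) / 2
console.log(fast.pairs.every(([, next], i) => next === slow.pairs[i][1])); // true
```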

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqzp57zg3ksodr9twkgd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqzp57zg3ksodr9twkgd.png" alt="Map() is more efficient given our constraints" width="800" height="352"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Earlier, I said we immediately send each object diff to the user. This is only partially true. Imagine the user receives a million objects one by one: it could overload the main thread. What we do instead is store the object diffs in another buffer. Once this buffer reaches a certain size, say a thousand objects, we flush it and send a single batch of data to the user.&lt;/p&gt;

&lt;p&gt;To do it, we use a closure. Let’s take a look at the &lt;code&gt;outputDiffChunk&lt;/code&gt; function. It holds an array to store the diffs and returns two functions: &lt;code&gt;handleDiffChunk&lt;/code&gt; and &lt;code&gt;releaseLastChunks&lt;/code&gt;. &lt;code&gt;handleDiffChunk&lt;/code&gt; receives an object diff and adds it to the buffer if it’s not full yet. If it’s full, it sends the batch to the user. Because &lt;code&gt;outputDiffChunk&lt;/code&gt; is called only once and we keep reusing the &lt;code&gt;handleDiffChunk&lt;/code&gt; it returned, every call shares the same diff buffer, which the closure keeps alive between calls.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5dwb9mjp8zz5elc7yqoi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5dwb9mjp8zz5elc7yqoi.png" alt="Instead of sending data one at a time, we send it in batches." width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, &lt;code&gt;releaseLastChunks&lt;/code&gt; is self-explanatory. Once all diffs are processed, it flushes the diff buffer one last time and sends the remaining data to the user.&lt;/p&gt;
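&lt;p&gt;Here is a sketch of that closure. The function names come from the article, but the body is a reconstruction under assumptions, not the library’s code; &lt;code&gt;emitChunk&lt;/code&gt; is a hypothetical callback standing for whatever delivers a batch to the user, and the chunk size is lowered to 3 in the example so the batching is visible.&lt;/p&gt;

```javascript
// Reconstruction of the batching closure described above. The names come
// from the article; the body is a sketch, not superdiff's source code.
// emitChunk (hypothetical) delivers a batch to the user, e.g. via the data event.
function outputDiffChunk(emitChunk, chunkSize = 1000) {
  let buffer = []; // shared by both returned functions through the closure

  function handleDiffChunk(diff) {
    buffer.push(diff);
    if (buffer.length === chunkSize) {
      emitChunk(buffer);
      buffer = []; // start a new batch; the sent array now belongs to the consumer
    }
  }

  function releaseLastChunks() {
    if (buffer.length > 0) {
      emitChunk(buffer); // flush whatever is left, even if the batch is not full
      buffer = [];
    }
  }

  return { handleDiffChunk, releaseLastChunks };
}

const sent = [];
// Create the closure ONCE, then reuse handleDiffChunk for every diff.
const { handleDiffChunk, releaseLastChunks } = outputDiffChunk((chunk) => sent.push(chunk), 3);
const diffs = Array.from({ length: 7 }, (_, i) => ({ id: i + 1, status: "equal" }));
diffs.forEach((diff) => handleDiffChunk(diff));
releaseLastChunks();
console.log(sent.map((chunk) => chunk.length).join(", ")); // 3, 3, 1
```

&lt;p&gt;Calling &lt;code&gt;outputDiffChunk&lt;/code&gt; a second time would create a brand-new, empty buffer, which is why a single instance is shared for the whole diff.&lt;/p&gt;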

&lt;p&gt;Ultimately, we emit a &lt;code&gt;finish&lt;/code&gt; event, and that’s all.&lt;/p&gt;

&lt;h2&gt;
  
  
  One more thing
&lt;/h2&gt;

&lt;p&gt;The demo we saw earlier uses a virtualized list, which only renders the rows currently visible, to display massive amounts of items in the DOM. It also takes advantage of &lt;code&gt;requestAnimationFrame&lt;/code&gt; to avoid updating the DOM too often.&lt;/p&gt;
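&lt;p&gt;Both ideas can be sketched with two small helpers. They are hypothetical and not taken from the demo: &lt;code&gt;visibleRange&lt;/code&gt; computes which rows of a fixed-height list should be in the DOM, and &lt;code&gt;createFrameBatcher&lt;/code&gt; merges incoming chunks so that at most one render happens per animation frame.&lt;/p&gt;

```javascript
// Two hypothetical helpers in the spirit of the demo (not its actual code).

// 1) Virtualization: only the rows inside the viewport, plus a small
//    overscan margin, are rendered, however many rows the list holds.
function visibleRange({ scrollTop, viewportHeight, rowHeight, total, overscan = 5 }) {
  const first = Math.max(0, Math.floor(scrollTop / rowHeight) - overscan);
  const last = Math.min(total - 1, Math.ceil((scrollTop + viewportHeight) / rowHeight) + overscan);
  return { first, last };
}

// 2) Frame batching: incoming chunks are queued, and at most one render is
//    scheduled per frame. schedule defaults to requestAnimationFrame; it is
//    injectable so the sketch can also run outside a browser.
function createFrameBatcher(render, schedule = (cb) => requestAnimationFrame(cb)) {
  let pending = [];
  let scheduled = false;
  return function enqueue(chunk) {
    for (const diff of chunk) pending.push(diff);
    if (scheduled) return; // a render is already planned for the next frame
    scheduled = true;
    schedule(() => {
      scheduled = false;
      const batch = pending;
      pending = [];
      render(batch);
    });
  };
}

const range = visibleRange({ scrollTop: 36000, viewportHeight: 600, rowHeight: 36, total: 1_800_000 });
console.log(range.first, range.last); // 995 1022

// Fake frame scheduler for the example: frames run only when we say so.
const frames = [];
const renders = [];
const enqueue = createFrameBatcher((batch) => renders.push(batch.length), (cb) => frames.push(cb));
enqueue([1, 2, 3]);
enqueue([4, 5]); // same frame: no extra render is scheduled
frames.shift()(); // simulate the next animation frame
console.log(renders); // [ 5 ]
```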

&lt;p&gt;As a result, everything runs really smoothly. It seems like John Doe has been elected, congratulations to him!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;LINKS&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://github.com/DoneDeal0/superdiff" rel="noopener noreferrer"&gt;Repository&lt;/a&gt; | &lt;a href="https://superdiff.gitbook.io/donedeal0-superdiff" rel="noopener noreferrer"&gt;Documentation&lt;/a&gt; | &lt;a href="https://www.npmjs.com/package/@donedeal0/superdiff" rel="noopener noreferrer"&gt;Npm&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
