<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Parthipan Natkunam</title>
    <description>The latest articles on DEV Community by Parthipan Natkunam (@parthipan).</description>
    <link>https://dev.to/parthipan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F480406%2F9252d786-5688-43f7-bba0-712b0c1fec1f.png</url>
      <title>DEV Community: Parthipan Natkunam</title>
      <link>https://dev.to/parthipan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/parthipan"/>
    <language>en</language>
    <item>
      <title>Processing a 2GB CSV in Node Without Running Out of Memory</title>
      <dc:creator>Parthipan Natkunam</dc:creator>
      <pubDate>Sat, 30 May 2026 05:08:13 +0000</pubDate>
      <link>https://dev.to/coded_parts/processing-a-2gb-csv-in-node-without-running-out-of-memory-526c</link>
      <guid>https://dev.to/coded_parts/processing-a-2gb-csv-in-node-without-running-out-of-memory-526c</guid>
      <description>&lt;p&gt;&lt;strong&gt;Why the obvious approach crashes, and how a few generator functions keep memory flat no matter how big the file gets.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's a task that looks trivial on paper: read a CSV export, filter the rows you care about, sum one column, write a small report. The kind of thing you bang out in ten minutes. Now say the file is around 2GB.&lt;/p&gt;

&lt;p&gt;The first version is four lines. It works great on a 5MB sample. Then you point it at the real export and Node falls over with &lt;code&gt;JavaScript heap out of memory&lt;/code&gt;. The reflex is to do what most of us do first: bump &lt;code&gt;--max-old-space-size&lt;/code&gt;, give it more heap, and run it again. It gets further and dies again. That's the moment to stop fighting the symptom and look at what the code is actually asking the machine to do.&lt;/p&gt;

&lt;p&gt;Here is the thing worth internalizing: the size of your data does not have to dictate the size of your memory footprint. You can process a file bigger than your RAM. The trick is to never hold the whole thing at once, and generators give you a clean way to write code that does exactly that without turning into a mess of callbacks and manual state.&lt;/p&gt;

&lt;p&gt;Let's build up to it properly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The version that dies
&lt;/h2&gt;

&lt;p&gt;Here's roughly what the first attempt looked like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;export.csv&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;utf8&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Number&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nb"&gt;Number&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isNaN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="nx"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;total:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;total&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read the file. Split on newlines. Loop. Sum. Clean and readable, and on a small file it's perfect.&lt;/p&gt;

&lt;p&gt;The problem is hiding in the first line, and it's actually two problems stacked on top of each other.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;fs.readFileSync&lt;/code&gt; pulls the entire file into memory before you do anything with it, and with the &lt;code&gt;'utf8'&lt;/code&gt; option it hands the contents back as one giant string. A 2GB file is a 2GB allocation, minimum. (Depending on your Node version, a file this large can fail even earlier, because V8 caps the length of a single string at &lt;code&gt;buffer.constants.MAX_STRING_LENGTH&lt;/code&gt;.) Then &lt;code&gt;.split('\n')&lt;/code&gt; takes that string and produces an array with one string per line. For a file with millions of rows, that's millions of string objects, each with its own overhead, all alive at the same time. So now you're holding the raw file &lt;strong&gt;and&lt;/strong&gt; a fully expanded array of every line. You've roughly doubled the cost of the thing that was already too big.&lt;/p&gt;

&lt;p&gt;I wanted to see how bad it actually is, so I ran it. I generated a CSV with 2 million rows (&lt;code&gt;id,name,amount&lt;/code&gt;), which came out to about 45MB. Modest. Not even close to 2GB. Here's what the load-everything approach did to memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;naive sum: 999000000 | peak RSS MB: 238
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;238MB of resident memory to process a 45MB file. That's more than five times the file size sitting in RAM at peak. Now scale that ratio up. A 2GB file with the same shape would want somewhere north of 10GB, and your container almost certainly does not have that. Hence the crash.&lt;/p&gt;
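
&lt;p&gt;If you want to take a similar reading on your own machine, here is a minimal sketch of one way to do it. It assumes &lt;code&gt;process.resourceUsage().maxRSS&lt;/code&gt; (which Node reports in kilobytes) is a good enough proxy for peak memory. The helper name &lt;code&gt;peakRssMb&lt;/code&gt; is made up for illustration, and this is not necessarily how the figure above was captured.&lt;/p&gt;

```javascript
// Hypothetical helper: peak resident set size of this process, in MB.
// process.resourceUsage().maxRSS is reported in kilobytes.
function peakRssMb() {
  return Math.round(process.resourceUsage().maxRSS / 1024);
}

// Hold something noticeable in memory, then report the high-water mark.
const filler = new Array(1_000_000).fill('row');
console.log('rows held:', filler.length, '| peak RSS MB:', peakRssMb());
```

&lt;p&gt;Call it once at the very end of a script and you get the high-water mark for the whole run, which is the number that decides whether the process survives.&lt;/p&gt;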

&lt;h2&gt;
  
  
  What we actually want
&lt;/h2&gt;

&lt;p&gt;Step back from the code for a second.&lt;/p&gt;

&lt;p&gt;To sum a column, do you ever genuinely need every row in memory simultaneously? No. You need one row at a time. Read a line, pull out the number, add it to a running total, throw the line away, move on. At no point does row 1,400,000 need to coexist with row 3.&lt;/p&gt;

&lt;p&gt;That's the whole insight. The work is sequential and one-pass, so the memory should be too. We want to pull rows through the program one at a time, like water through a pipe, instead of trying to fit an entire ocean into a bucket.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Node has had streams forever, and streams do exactly this. But raw streams are awkward to compose&lt;/strong&gt;. The moment you want to chain "read lines" into "parse them" into "filter them" into "sum them," you're wiring up event handlers and managing backpressure by hand, and the readable four-line version turns into something you don't want to look at.&lt;/p&gt;

&lt;p&gt;This is where generators earn their place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generators, the one-paragraph version
&lt;/h2&gt;

&lt;p&gt;A normal function runs start to finish and returns once. A generator function (the &lt;code&gt;function*&lt;/code&gt; syntax) can pause itself partway through, hand a value back to whoever called it, and then resume from exactly where it left off the next time you ask for a value. It does this with &lt;code&gt;yield&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For reading files we want the async flavor, &lt;code&gt;async function*&lt;/code&gt;, because reading from disk is asynchronous. The consuming side uses &lt;code&gt;for await...of&lt;/code&gt; instead of a plain &lt;code&gt;for...of&lt;/code&gt;. Same idea, just async.&lt;/p&gt;
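
&lt;p&gt;To make the pausing concrete, here is a tiny synchronous sketch. The generator &lt;code&gt;steps&lt;/code&gt; and its log messages are invented for illustration; the point is only that nothing inside it runs until someone calls &lt;code&gt;next()&lt;/code&gt;.&lt;/p&gt;

```javascript
// Each yield pauses the function until the caller asks for another value.
function* steps() {
  console.log('running up to the first yield');
  yield 'a';
  console.log('resumed, running up to the second yield');
  yield 'b';
  console.log('resumed one last time, finishing');
}

const it = steps();        // nothing has run yet
const first = it.next();   // { value: 'a', done: false }
const second = it.next();  // { value: 'b', done: false }
const last = it.next();    // { value: undefined, done: true }
console.log(first, second, last);
```

&lt;p&gt;The async flavor behaves the same way, except that &lt;code&gt;next()&lt;/code&gt; returns a promise for each of those result objects.&lt;/p&gt;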

&lt;h2&gt;
  
  
  Building the pipeline
&lt;/h2&gt;

&lt;p&gt;Let's write the big-file version as a set of small generators, each doing one job.&lt;/p&gt;

&lt;p&gt;First, a generator that yields the file one line at a time. Node's &lt;code&gt;readline&lt;/code&gt; module already reads a stream line by line, so we wrap it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;readline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;readline&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;readLines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;readline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createInterface&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createReadStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;crlfDelay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;Infinity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;rl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;createReadStream&lt;/code&gt; reads the file in small chunks rather than all at once. &lt;code&gt;readline&lt;/code&gt; assembles complete lines from those chunks. We &lt;code&gt;yield&lt;/code&gt; each line as it arrives. Crucially, nothing is accumulating here. A line comes in, goes out, and is gone.&lt;/p&gt;

&lt;p&gt;Next, a generator that turns raw lines into parsed objects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// skip the header row&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Number&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice it takes a source of lines as its input and yields objects. It doesn't know or care whether those lines came from a file, a network socket, or an array in a test. It just transforms what flows through it.&lt;/p&gt;
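
&lt;p&gt;That last case is easy to try. The sketch below repeats &lt;code&gt;parse&lt;/code&gt; so it runs on its own and feeds it a plain array, relying on the fact that &lt;code&gt;for await...of&lt;/code&gt; also accepts ordinary synchronous iterables. The &lt;code&gt;collect&lt;/code&gt; helper and the sample rows are made up for the test.&lt;/p&gt;

```javascript
// Same parse stage as above, repeated so this snippet runs on its own.
async function* parse(lines) {
  for await (const line of lines) {
    const [id, name, amount] = line.split(',');
    if (id === 'id') continue; // skip the header row
    yield { id, name, amount: Number(amount) };
  }
}

// Hypothetical test helper: drain any async iterable into an array.
// Only sensible for small inputs, which is exactly what a test uses.
async function collect(source) {
  const out = [];
  for await (const item of source) out.push(item);
  return out;
}

// No file needed: an array of lines is a perfectly good source.
const sampleLines = ['id,name,amount', '1,alice,10', '2,bob,32'];
collect(parse(sampleLines)).then((rows) => console.log(rows));
```

&lt;p&gt;The header is skipped and the two data lines come out as objects with numeric &lt;code&gt;amount&lt;/code&gt; values, with no disk access anywhere in the test.&lt;/p&gt;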

&lt;p&gt;Now a filter, because in this scenario I only wanted rows at or above a threshold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;onlyAbove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;min&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;min&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And finally we connect them and consume the result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;readLines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;export.csv&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filtered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;onlyAbove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;filtered&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;total:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;count:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;})();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read it top to bottom. &lt;code&gt;readLines&lt;/code&gt; produces lines, &lt;code&gt;parse&lt;/code&gt; consumes those and produces objects, &lt;code&gt;onlyAbove&lt;/code&gt; consumes those and produces a filtered subset, and the &lt;code&gt;for await&lt;/code&gt; loop at the bottom pulls the whole chain. Each stage is maybe five lines. Each one does a single thing. You can test them in isolation, reorder them, drop one in or out, all without touching the others.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9duakh3tz5f8llv4k5vv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9duakh3tz5f8llv4k5vv.jpg" alt="Pipeline diagram with four generator stages feeding a for-await loop; forward arrows show data flow while dashed return arrows show the consumer pulling the next value upstream" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the part that matters. I ran this exact pipeline against the same 2 million row file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pipeline sum: 999000000 count: 2000000 | peak RSS MB: 89
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg3cqdlfxnog1918b3fb9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg3cqdlfxnog1918b3fb9.jpg" alt="Bar chart showing peak memory of 238 MB for the load-everything approach versus 89 MB for the generator pipeline on the same 45 MB file" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Same answer, 999000000, down to the last digit. But peak memory went from 238MB to 89MB. And that 89MB is not really "memory for the data." It's Node's baseline plus the read buffer plus a couple of objects in flight. The data itself is barely there because we only ever hold one row at a time. Throw a 2GB file at this and the number should stay roughly flat, because nothing in the pipeline grows with the file size. That's the whole game.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this composes when streams alone don't
&lt;/h2&gt;

&lt;p&gt;You might be thinking, fine, but Node streams could do this too, and you'd be right. So why the generators?&lt;/p&gt;

&lt;p&gt;Pull versus push. A raw readable stream pushes data at you through events; you react to &lt;code&gt;'data'&lt;/code&gt; and &lt;code&gt;'end'&lt;/code&gt; and you manage the timing yourself. When you chain several transformations, you're coordinating several event emitters and making sure none of them races ahead of a slow consumer. Backpressure, in the jargon.&lt;/p&gt;

&lt;p&gt;Generators flip it to pull. The consumer at the bottom of the loop asks for the next value, and that request travels back up the chain. &lt;code&gt;onlyAbove&lt;/code&gt; asks &lt;code&gt;parse&lt;/code&gt; for a row, &lt;code&gt;parse&lt;/code&gt; asks &lt;code&gt;readLines&lt;/code&gt; for a line, &lt;code&gt;readLines&lt;/code&gt; asks the file for a chunk. Nothing is produced until something downstream wants it. &lt;strong&gt;Backpressure isn't something you configure; it's just how &lt;code&gt;yield&lt;/code&gt; works.&lt;/strong&gt; The producer literally cannot get ahead because it's paused until you call for the next value.&lt;/p&gt;
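
&lt;p&gt;You can watch that happen with a deliberately slow consumer. This is a toy sketch, not part of the CSV pipeline: &lt;code&gt;produce&lt;/code&gt;, &lt;code&gt;consume&lt;/code&gt; and the &lt;code&gt;events&lt;/code&gt; log are names I made up to record who ran when.&lt;/p&gt;

```javascript
const events = [];

// Producer that records the moment it actually does work.
async function* produce() {
  for (const n of [1, 2, 3]) {
    events.push(`produce ${n}`);
    yield n;
  }
}

// Slow consumer: the producer stays parked at its yield while we wait.
async function consume() {
  for await (const n of produce()) {
    await new Promise((resolve) => setTimeout(resolve, 10));
    events.push(`consume ${n}`);
  }
  return events;
}

const finished = consume();
finished.then((log) => console.log(log.join(' | ')));
// produce 1 | consume 1 | produce 2 | consume 2 | produce 3 | consume 3
```

&lt;p&gt;The producer never gets to item 2 before the consumer has finished with item 1, even though the consumer is the slow side.&lt;/p&gt;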

&lt;p&gt;That's why the four small functions above read almost like the naive version, but behave like a carefully tuned stream. You get the readability of the simple loop and the memory profile of hand-written streaming, without choosing between them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this bites you
&lt;/h2&gt;

&lt;p&gt;I'd be lying if I said this is free.&lt;/p&gt;

&lt;p&gt;The big one: you get one pass. A generator is exhausted once you've iterated it. If you need to loop over the data twice, say, sum a column and then also find the max in a separate pass, you can't just iterate the same pipeline again. It's empty the second time. You either compute both in a single pass, or you re-create the pipeline from the source, or, if the result genuinely fits in memory, you collect it into an array (&lt;code&gt;const arr = []; for await (const x of pipe) arr.push(x);&lt;/code&gt;) and accept the cost. The streaming approach is for when the dataset doesn't fit, so collecting it usually defeats the point.&lt;/p&gt;
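
&lt;p&gt;Here is a small sketch of the single-pass option, plus a demonstration of the exhaustion problem. &lt;code&gt;sampleRows&lt;/code&gt; is a stand-in for the real pipeline and &lt;code&gt;sumAndMax&lt;/code&gt; is a hypothetical helper, so treat the shape as an assumption rather than the one true way to do it.&lt;/p&gt;

```javascript
// Stand-in for the real pipeline: any async iterable of rows works here.
async function* sampleRows() {
  yield { amount: 5 };
  yield { amount: 42 };
  yield { amount: 17 };
}

// Compute both aggregates in one pass instead of iterating twice.
async function sumAndMax(rows) {
  let total = 0;
  let max = -Infinity;
  for await (const row of rows) {
    total += row.amount;
    if (row.amount > max) max = row.amount;
  }
  return { total, max };
}

async function onePassDemo() {
  const rows = sampleRows();
  const stats = await sumAndMax(rows);  // { total: 64, max: 42 }
  // The generator is exhausted now, so a second pass sees no rows at all.
  const again = await sumAndMax(rows);  // { total: 0, max: -Infinity }
  return { stats, again };
}

const outcome = onePassDemo();
outcome.then((r) => console.log(r));
```

&lt;p&gt;Note that the second call does not throw. It quietly returns the "empty" answer, which is what makes this mistake easy to miss.&lt;/p&gt;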

&lt;p&gt;The other one is debugging. With an array you can &lt;code&gt;console.log&lt;/code&gt; the whole thing and see your data. With a lazy pipeline there's nothing to log until you pull a value through, and a &lt;code&gt;console.log&lt;/code&gt; inside a generator only fires when that value is actually demanded. The execution order can surprise you the first few times. It clicks, but there's an adjustment period.&lt;/p&gt;
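
&lt;p&gt;One thing that can help is a pass-through stage that logs whatever is pulled through it. The &lt;code&gt;tap&lt;/code&gt; helper below is hypothetical, a sketch of the idea rather than anything built into Node, and its optional &lt;code&gt;log&lt;/code&gt; parameter exists only so the demo can record the order of events.&lt;/p&gt;

```javascript
// Hypothetical debugging stage: reports each value as it is pulled through,
// then passes it along unchanged.
async function* tap(source, label, log = console.log) {
  for await (const item of source) {
    log(label, item);
    yield item;
  }
}

async function tapDemo() {
  const seen = [];
  const record = (label, item) => seen.push(`${label} ${item}`);

  const tapped = tap(['a', 'b'], 'after source:', record);
  seen.push('pipeline built'); // nothing has been logged by tap yet

  for await (const item of tapped) seen.push(`consumed ${item}`);
  return seen;
}

const trace = tapDemo();
trace.then((seen) => console.log(seen));
```

&lt;p&gt;The trace starts with "pipeline built" and then alternates between the tap and the consumer, which is the lazy execution order described above made visible.&lt;/p&gt;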

&lt;p&gt;And async generators do carry some per-iteration overhead compared to a tight synchronous loop over an array. If your data comfortably fits in memory and you care about raw speed, the array might genuinely be faster. This technique is about not dying on data that doesn't fit, not about winning microbenchmarks on data that does.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bit underneath
&lt;/h2&gt;

&lt;p&gt;What I find quietly interesting is that the &lt;code&gt;for await...of&lt;/code&gt; loop driving this whole thing is doing something generators were partly built to enable. The pause-and-resume machinery that lets a generator give up control and pick back up later is the same machinery that &lt;code&gt;async/await&lt;/code&gt; is built on top of. When you &lt;code&gt;await&lt;/code&gt; a promise, your function is effectively yielding control and waiting to be resumed, exactly like a generator yielding a value. async/await is, more or less, a generator and a runner that feeds it resolved promises. Once you've written a few generators by hand, a lot of the async behavior you've been taking on faith stops being magic.&lt;/p&gt;
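
&lt;p&gt;To show what "a generator and a runner" can look like, here is a toy version. The &lt;code&gt;run&lt;/code&gt; function is my own illustrative sketch, written under the assumption that the generator only yields promises or plain values; it is not how engines actually implement async/await.&lt;/p&gt;

```javascript
// Toy runner: drives a plain generator that yields promises, resuming it
// with each resolved value (or throwing the rejection back into it).
function run(generatorFn) {
  return new Promise((resolve, reject) => {
    const gen = generatorFn();
    function step(method, input) {
      let result;
      try {
        result = gen[method](input); // gen.next(value) or gen.throw(error)
      } catch (err) {
        reject(err);
        return;
      }
      if (result.done) {
        resolve(result.value);
        return;
      }
      Promise.resolve(result.value).then(
        (value) => step('next', value),
        (err) => step('throw', err),
      );
    }
    step('next', undefined);
  });
}

// Reads like an async function, with yield standing in for await.
const answer = run(function* () {
  const a = yield Promise.resolve(40);
  const b = yield new Promise((resolve) => setTimeout(() => resolve(2), 5));
  return a + b;
});
answer.then((value) => console.log('answer:', value)); // answer: 42
```

&lt;p&gt;Swap &lt;code&gt;function*&lt;/code&gt; for &lt;code&gt;async function&lt;/code&gt; and &lt;code&gt;yield&lt;/code&gt; for &lt;code&gt;await&lt;/code&gt; and the body is the code you write every day.&lt;/p&gt;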

&lt;p&gt;I dug into that whole layer, the two-way communication, &lt;code&gt;yield*&lt;/code&gt; composition, the async runner that became async/await, in a short book on generators. It's free. If the pipeline pattern here was useful and you want the full mental model under it, grab it: &lt;a href="https://codedparts.gumroad.com/l/generators-in-js" rel="noopener noreferrer"&gt;Generators in JavaScript&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The next time Node tells you it's out of memory, before you reach for a bigger heap, ask whether you ever needed all that data at once in the first place. Usually you didn't.&lt;/p&gt;

&lt;p&gt;Cheers :)&lt;/p&gt;

</description>
      <category>node</category>
      <category>javascript</category>
      <category>generators</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
