
sdcaulley

Posted on • Originally published at sdcaulley.com

Node.js - Streams and Promises

I have been working on a project that requires reading large .csv files from the local file system and then working with the data. Node.js has some great tools for this, namely streams, event emitters, and the native readline module. However, all of the example code and tutorials I found fell into one of four categories:

  • print the data to the console (not useful)
  • write the data to a file
  • push the incoming data to an outside array
  • use an external library

I started with the external library csv-parser. However, since it is basically a wrapper around the base Node.js technologies I listed above, I had the same problems working with my data that I describe below. I eventually uninstalled it and wrote my own lightweight version.

Background

"The `readline` module provides an interface for reading data from a Readable stream...one line at a time." From the [Node.js Documentation](https://nodejs.org/docs/latest-v16.x/api/readline.html)
"All streams are instances of EventEmitter." From the [Node.js Documentation](https://nodejs.org/docs/latest-v16.x/api/stream.html)

Basically, working with streams means listening for events that carry your data. And since the .on method of an EventEmitter expects a callback, everything you want to do next needs to happen in that callback. The readline module gives you the line event to listen for.
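
To make that concrete, here is a minimal sketch of why the follow-up work has to live in a callback. The `countLines` function, its `onDone` callback, and the in-memory sample data are hypothetical names and values chosen for illustration; an in-memory stream stands in for a real .csv file.

```javascript
const readline = require('readline');
const { Readable } = require('stream');

// Count the lines of a Readable stream. `onDone` (a hypothetical callback)
// receives the count observed right after the listener was attached and the
// count observed once the interface emits 'close'.
function countLines (input, onDone) {
  const rl = readline.createInterface({ input });
  let count = 0;

  rl.on('line', () => {
    count += 1;
  });

  // No 'line' event has fired yet: the stream delivers its data later,
  // after this synchronous code has finished running.
  const countRightAfterListening = count;

  rl.on('close', () => {
    onDone(countRightAfterListening, count);
  });
}

// Hypothetical in-memory data standing in for a .csv file on disk.
const sample = Readable.from(['name,age\nAda,36\n']);
countLines(sample, (early, final) => {
  console.log(early, final); // prints: 0 2
});
```

The count is only complete inside the 'close' handler, which is why code that runs "after" the listener is registered cannot rely on the data being there.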

Solution #1

At first I tried the "push the incoming data to an outside array" approach.

// rl is a readline interface created with readline.createInterface()
const incomingData = [];

rl.on('line', data => {
  incomingData.push(data);
})
  .on('close', () => {
    // do something with incomingData
  });

This solution does actually work if you are only reading one file. Unfortunately, I need to loop through a directory of files, read each one, and then do something with the data. I tried all sorts of things with counters and whatnot, but kept running into race conditions between the loops and what needed to happen next. So it was not really a solution for me.

Solution #2

This solution actually came from a member of my local code mentoring meetup. This solution uses Promises.

First, I created a JavaScript class for my various .csv needs.

const fs = require('fs');
const readline = require('readline');
const path = require('path');

class CSVHelpers {
  // No constructor: the class extends nothing, so calling super() would be a SyntaxError.

  /**
   * @param  {string} filePath
   * @return {Promise<{rows: Object[], file: string}>} Resolves with the row objects (key: header, value: field value) and the file path
   */
  read (filePath) {
    return new Promise ((resolve, reject) => {
      try {
        const reader = this._createReadStream(filePath);
        let rows = [];
        let headers = null;

        reader.on('line', row => {
          if (headers === null) {
            headers = row.split(',');
          } else {
            const rowArray = row.split(',');
            const rowObject = {};
            rowArray.forEach((item, index) => {
              rowObject[headers[index]] = item;
            });

            rows.push(rowObject);
          }
        })
          .on('close', () => {
            resolve({
              rows,
              file: filePath
            });
          });
      } catch (error) {
        reject(error);
      }
    });
  }

  /**
   * @param  {string} filePath
   * @return {readline.Interface} Readline event emitter
   */
  _createReadStream (filePath) {
    const fd = fs.openSync(path.resolve(filePath));
    const fileStream = fs.createReadStream(path.resolve(filePath), {fd});
    return readline.createInterface({
      input: fileStream
    });
  }
}

module.exports = CSVHelpers;

Then in my code:

const csv = new CSVHelpers();
const dataFiles = fs.readdirSync(<pathToDirectory>);

const filePromises = dataFiles.map(file => {
  return csv.read(<pathToFile>);
});

Promise.all(filePromises)
  .then(values => {
    // do something with the values.
  });

This Promise approach means I don't need to nest loops or callbacks.
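
For comparison, the same idea can be sketched with async/await, since a readline interface can also be consumed with `for await` in recent Node.js versions (this sketch assumes Node.js 12 or later). This is an alternative, not the class shown above: `readCsv` and `readCsvDirectory` are hypothetical names, and the parsing is the same naive split on commas, so quoted fields are not handled.

```javascript
const fs = require('fs');
const path = require('path');
const readline = require('readline');

// Read one .csv file into an array of row objects keyed by header.
// The naive split(',') does not handle quoted fields containing commas.
async function readCsv (filePath) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath),
    crlfDelay: Infinity // treat \r\n as a single line break
  });
  const rows = [];
  let headers = null;

  // Start iterating immediately; nothing is awaited between creating the
  // interface and entering the loop.
  for await (const line of rl) {
    const fields = line.split(',');
    if (headers === null) {
      headers = fields;
    } else {
      rows.push(Object.fromEntries(fields.map((field, i) => [headers[i], field])));
    }
  }

  return { rows, file: filePath };
}

// Read every .csv file in a directory. The returned promise resolves once
// all of the files have been parsed.
async function readCsvDirectory (dirPath) {
  const names = await fs.promises.readdir(dirPath);
  const csvFiles = names.filter(name => path.extname(name) === '.csv');
  return Promise.all(csvFiles.map(name => readCsv(path.join(dirPath, name))));
}
```

Calling `readCsvDirectory` with a directory path returns a promise for an array of `{ rows, file }` objects, one per .csv file, which mirrors the shape the `read` method resolves with.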

Conclusion

I do not know if this is the best solution, but it works for my use case, and solves the race conditions I was having. If you have better ways to solve the problem, please let me know.
