DEV Community

Cover image for Parsing CSV Files in Node.js with fs.createReadStream() and csv-parser
Isa Levine
Isa Levine

Posted on

Parsing CSV Files in Node.js with fs.createReadStream() and csv-parser

Cover image credit: Hunter x Hunter manga by Yoshihiro Togashi, meme-ified by yours truly. <3

Ah, good ol' CSV files: spreadsheet/database data rendered as pure text. Beautiful!

Parsing CSV (Comma-Separated Value) files involves reading a sequence of records row-by-row. Typically (but not necessarily), the first row is interpreted as the header for each column. All rows after the first are considered individual records, with each comma-separated-value belonging to a single column. Because of this, each row of a CSV must have the exact same number of values as each other row. (OK, I'm realizing that writing about CSVs is very tricky--see below for a visualization!)

The Challenge

In a recent takehome challenge, I was asked to parse a CSV of purchases that included:

  • date (as a String)
  • item (as an Integer) -- for this example, we'll use candy!
  • price (as a Float)
  • name (as a String)

Here's the example code we'll be working with (and a reveal about my recent TV-watching):

Raw CSV:

PURCHASE DATE,CANDY PURCHASED,CASH PAID,BUYER NAME
2016-04-03,1000,10000.11,Charlie Kelly
2017-11-14,1000,12000.22,Frank Reynolds
2018-01-20,2000,40000.33,Frank Reynolds
2018-03-20,2000,40000.44,Mac
2019-01-02,2000,50000.55,Sweet Dee
2019-01-02,1500,13500.66,Dennis Reynolds

More readable formatting:

PURCHASE DATE, | CANDY PURCHASED, | CASH PAID, | BUYER NAME
---------------|------------------|------------|-----------------
2016-04-03,    | 1000,            | 10000.11,  | Charlie Kelly
2017-11-14,    | 1000,            | 12000.22,  | Frank Reynolds
2018-01-20,    | 2000,            | 40000.33,  | Frank Reynolds
2018-03-20,    | 2000,            | 40000.44,  | Mac
2019-01-02,    | 2000,            | 50000.55,  | Sweet Dee
2019-01-02,    | 1500,            | 13500.66,  | Dennis Reynolds

Our Tools

We'll use two particular Node.js tools to accomplish our goal:

  1. The Node.js fs (file system) module, specifically the fs.createReadStream() method
  2. The npm package, csv-parser, which will convert our CSV into JSON

Since the fs module is native to Node.js, no external packages are needed. For our csv-parser npm package, go ahead and install it by running $ npm install csv-parser in your terminal.

Let's start off our app.js code by importing both with require:

// app.js

const csvParser = require('csv-parser');
const fs = require('fs');

Importing csv-parser this way will allow us to call csvParser() as a method, which will allow us to pipe in its data row-by-row. Now we're ready to dive in!

Reading the CSV File

gif of anime hand turning pages on a black book with a pentagram on the cover
Let's read some external data!

The CSV data above is currently stored in example_data.csv, located in the same directory as app.js. Let's create a variable that holds the filepath:

// app.js

const filepath = './example_data.csv'

Now let's put that fs.createReadStream() method to use:

fs.createReadStream(filepath)
    .on('error', () => {
        // handle error
    })

    .pipe(csvParser())
    .on('data', (row) => {
        // use row data
    })

    .on('end', () => {
        // handle end of CSV
    })

Let's break down each step:

  • fs.createReadStream(filepath) initializes the fs.ReadStream object -- since this is under-the-hood, we won't interact with fs.ReadStream explicitly.
  • .on('error', () => {} ) checks for errors with the given filepath before we start trying to pipe in its data.
  • .pipe(csvParser()) begins to pipe data into our ReadStream, which is now listening for the next two events:
    • .on('data', (row) => {} ) returns each line of the CSV row-by-row, accessible in its callback as row.
    • .on('end', () => {} ) listens for the end of the CSV. You can use this event to call methods you need after the entire file is read.

csv-parser converts CSV data to JSON

Let's take a look at how csv-parser is converting each row into usable JSON data.

Remember, the first line of a CSV is recorded as the column-headers! In our JSON, this will show up as the keys of each row Object, with their values being the data in their respective fields. This will make accessing the data in each row much simpler.

Here's our complete code:

const csv = require('csv-parser');
const fs = require('fs');

const filepath = "./example_data.csv"

fs.createReadStream(filepath)
    .on('error', () => {
        // handle error
    })

    .pipe(csv())
    .on('data', (row) => {
        console.log(row);
    })

    .on('end', () => {
        // handle end of CSV
    })

Running $ node app.js outputs:

// $ node app.js

Row {
'PURCHASE DATE': '2016-04-03',
'CANDY PURCHASED': '1000',
'CASH PAID': '10000.11',
'BUYER NAME': 'Charlie Kelly' }
Row {
'PURCHASE DATE': '2017-11-14',
'CANDY PURCHASED': '1000',
'CASH PAID': '12000.22',
'BUYER NAME': 'Frank Reynolds' }
Row {
'PURCHASE DATE': '2018-01-20',
'CANDY PURCHASED': '2000',
'CASH PAID': '40000.33',
'BUYER NAME': 'Frank Reynolds' }
Row {
'PURCHASE DATE': '2018-03-20',
'CANDY PURCHASED': '2000',
'CASH PAID': '40000.44',
'BUYER NAME': 'Mac' }
Row {
'PURCHASE DATE': '2019-01-02',
'CANDY PURCHASED': '2000',
'CASH PAID': '50000.55',
'BUYER NAME': 'Sweet Dee' }
Row {
'PURCHASE DATE': '2019-01-02',
'CANDY PURCHASED': '1500',
'CASH PAID': '13500.66',
'BUYER NAME': 'Dennis Reynolds' }

A couple things to note:

  • csv-parser does some pretty-printing for its console.log()-ing of the row-JSON. Keep this in mind if you have a challenge where the console output must match a specific format!
  • The keys from the first row have been stored as Strings, so in order to access each value, we must use the bracket notation: row["PURCHASE DATE"]. If the titles in the first row are single words with no special characters or spaces (i.e. if we had DATE instead of PURCHASE DATE in the first row), they can be accessed with dot notation: row.DATE

Using the row data

Let's revise our .on('data', (row) => {} ) callback to use the row's data, and print out a sentence describing each candy-purchase:

    .pipe(csv())
    .on('data', (row) => {
        let str = `${row["BUYER NAME"]} bought ${row["CANDY PURCHASED"]} pieces of candy on ${row["PURCHASE DATE"]} and paid $${row["CASH PAID"]}.`;
        console.log(str)
    })

Of course, you could use destructuring on each JSON-row to avoid writing out row["PURCHASE DATE"] and the other verbose key names inside the string...but I'll save that for another article. ;)

Running $ node app.js now outputs:

// $ node app.js

Charlie Kelly bought 1000 pieces of candy on 2016-04-03 and paid $10000.11.
Frank Reynolds bought 1000 pieces of candy on 2017-11-14 and paid $12000.22.
Frank Reynolds bought 2000 pieces of candy on 2018-01-20 and paid $40000.33.
Mac bought 2000 pieces of candy on 2018-03-20 and paid $40000.44.
Sweet Dee bought 2000 pieces of candy on 2019-01-02 and paid $50000.55.
Dennis Reynolds bought 1500 pieces of candy on 2019-01-02 and paid $13500.66.

Voila! CSV parsed!

Conclusion

Working with CSVs in Node.js is very easy and straightforward with the csv-parser package. However, there are plenty of other ways to approach this, including numerous other npm package options (surprise surprise), as well as purely using the fs module.

This particular code snippet from Dev.to user ItsASine does a great job of illustrating how to use fs only -- long story short, use fs.createReadStream() to return a Promise that resolves to a utf-8 encoded String. However, you MUST handle the first row headers and newline characters manually!

Have any other strategies to recommend? Please feel free to share and comment below!

Top comments (11)

Collapse
 
lejepra profile image
Lejepra

Hi Isa!

Great code!
Thank you for sharing that!

I am using this code, but I'm stuck...
I did the imports and everything, and I push the csv processed data into an array. After the csv processing I with to consume the data from the array, but the array is empty. Please see my code below:

fs.createReadStream(csvFilePath, 'utf8')
  .pipe(csv())
  .on('data', async row => {
    const { title, type, value, category }: CsvTransactions = row;
    console.log(row);
    transactionsArray.push({ title, type, value, category });
    console.log('transactionsArray inside loop: ', transactionsArray);
  })
  .on('end', () => {
    console.log('Csv file successfully processed!');
    return transactionsArray;
  });
await console.log('transactionsArray outside loop: ', transactionsArray);
Enter fullscreen mode Exit fullscreen mode

The console.logs show me that the csv processing is going fine, but the last console.log that should show me the final array with the comment 'transactionsArray outside loop: ' is printed before the processing occurs:
csvFilePath: E:\tmp\transactions.csv
transactionsArray outside loop: []
{ title: 'Loan', type: 'income', value: '1500', category: 'Others' }
transactionsArray inside loop: [
{ title: 'Loan', type: 'income', value: '1500', category: 'Others' }
]
{
title: 'Cellphone',
type: 'outcome',
value: '50',
category: 'Others'
}
transactionsArray inside loop: [
{ title: 'Loan', type: 'income', value: '1500', category: 'Others' },
{
title: 'Cellphone',
type: 'outcome',
value: '50',
category: 'Others'
}
]
{ title: 'Ice cream', type: 'outcome', value: '3', category: 'Food' }
transactionsArray inside loop: [
{ title: 'Loan', type: 'income', value: '1500', category: 'Others' },
{
title: 'Cellphone',
type: 'outcome',
value: '50',
category: 'Others'
},
{ title: 'Ice cream', type: 'outcome', value: '3', category: 'Food' }
]
Csv file successfully processed!

I understand that fs.createReadStream is an async operation, but I am struggling on how to make the code wait for that to finish, before going on with the execution.
Please help!

Thank you!

Collapse
 
isalevine profile image
Isa Levine

Hi Lejepra! I didn't actually use async / await in my own example code, so I'm basically guessing here.

I'd try moving the async to before the fs.createReadStream call, and seeing if that forces the await call to let the CSV-parsing finish.

Let me know if you're still working on this, and what you find out! :)

Collapse
 
rsjhon90 profile image
Jhony Rodrigues

Hi. I came across this situation in an academic project and we had to use the traditional Promise() constructor.
csv-parser does not support promises natively. Same for FS stream and console.log methods
Your code would be more or less like this, trying as hard as possible to make it look like it.

function loadTransactions(): Promise<CsvTransactions[]> {
  const transactions: Promise<CsvTransactions[]> = new Promise((resolve, reject) => {
      const transactionsArray: CsvTransactions[] = []

      fs.createReadStream(csvFilePath, 'utf8')
        .pipe(csv())
        .on('data', row => {
          const { title, type, value, category }: CsvTransactions = row;
          console.log(row);
          transactionsArray.push({ title, type, value, category });
          console.log('transactionsArray inside loop: ', transactionsArray);
        })
        .on('end', () => {
          console.log('Csv file successfully processed!');
          resolve(transactionsArray)
        })
        .on('error', (err) => {
          reject(err)
        })
    }
  )

  console.log('transactionsArray outside loop: ', transactions);

  return transactions;
}
Enter fullscreen mode Exit fullscreen mode

I'll take the opportunity and thank the author for the post that clarified other things for me.
Thank you!

Collapse
 
rohit114_1 profile image
Rohit Kumar • Edited

I have fixed this issue by using Promices,
that issue because the method in createReadStream is executed asynchronously
function createReadStream(){
var csvData = [];
return new Promise(resolve => {
fs.createReadStream('myfile.csv')
.pipe(csv())
.on('data', (data) => csvData.push(data))
.on('end', () => {
resolve(csvData)
});
})
}

const finalData = await createReadStream()

Collapse
 
bolugbogi profile image
Bolugbogi

I tried the code above but it is telling me

TypeError: csv is not a function
at Object. (C:\Users\user\Desktop\Kepler Project\index.js:19:9)
at Module._compile (node:internal/modules/cjs/loader:1126:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1180:10)
at Module.load (node:internal/modules/cjs/loader:1004:32)
at Function.Module._load (node:internal/modules/cjs/loader:839:12)
at Function.executeUserEntryPoint as runMain
at node:internal/main/run_main_module:17:47

Collapse
 
lclc68lclc profile image
Willina Clark

I'm running this. Is there something weird with Dates and CSV, for some reason when I try to console.log my statements, Date reads as undefined and the others read fine.

Collapse
 
isalevine profile image
Isa Levine • Edited

Hi Willina! As-is, the dates should just be processed as strings, so if the other string is printing fine, the date should too...

Just to test, I created a new directory, did a local install of csv-parser, and here's the code I'm running (copied verbatim):

// app.js

const csv = require('csv-parser');
const fs = require('fs');

const filepath = "./example_data.csv"

fs.createReadStream(filepath)
  .on('error', () => {
    // handle error
  })

  .pipe(csv())
  .on('data', (row) => {
    let str = `${row["BUYER NAME"]} bought ${row["CANDY PURCHASED"]} pieces of candy on ${row["PURCHASE DATE"]} and paid $${row["CASH PAID"]}.`;
    console.log(str)
  })

  .on('end', () => {
    // handle end of CSV
  })



// example_data.csv

PURCHASE DATE,CANDY PURCHASED,CASH PAID,BUYER NAME
2016-04-03,1000,10000.11,Charlie Kelly
2017-11-14,1000,12000.22,Frank Reynolds
2018-01-20,2000,40000.33,Frank Reynolds
2018-03-20,2000,40000.44,Mac
2019-01-02,2000,50000.55,Sweet Dee
2019-01-02,1500,13500.66,Dennis Reynolds



// console

Isas-MacBook-Pro:node-csv-parser isalevine$ node app.js 
Charlie Kelly bought 1000 pieces of candy on 2016-04-03 and paid $10000.11.
Frank Reynolds bought 1000 pieces of candy on 2017-11-14 and paid $12000.22.
Frank Reynolds bought 2000 pieces of candy on 2018-01-20 and paid $40000.33.
Mac bought 2000 pieces of candy on 2018-03-20 and paid $40000.44.
Sweet Dee bought 2000 pieces of candy on 2019-01-02 and paid $50000.55.
Dennis Reynolds bought 1500 pieces of candy on 2019-01-02 and paid $13500.66.

Any discrepancies between the code above and what you're running? Feel free to reply with the code itself and I can take a look :)

Collapse
 
lclc68lclc profile image
Willina Clark • Edited

I'm sorry, I was running this with my own data, everything shows up except the date which is formatted like this: 2017:12:16 12:45. The header shows up, but when I try to access the rows, I get undefined. Could the formatting be the issue?

Thread Thread
 
isalevine profile image
Isa Levine

Hi Willina, I'm so sorry I missed your reply!! I hope you were able to get unstuck--if not, I'd be happy to take a look at the data you were using, and see if we can find what's resulting in undefined! :)

Collapse
 
housseyn_cheriet profile image
Housseyn

Try this package:
npmjs.com/package/select-csv

Collapse
 
marcoazar18 profile image
marcoazar18

Hello, If I use this code in a helper function, how can I return the array to the main function without doing console.log? Thank you for helping