DEV Community

Randall
Randall

Posted on

2 2

Don't parse CSVs by hand!

I've seen custom JavaScript code to parse CSV data a number of times. It looks something like this:

const csvData = fs.readFileSync(csvFilePath, 'utf8');
const lines = csvData.split('\n');
const rows = lines.map(line => line.split(','));
Enter fullscreen mode Exit fullscreen mode

So simple, and it may be fine if you're parsing a single static CSV file whose content you know. But if you're parsing unknown CSV files, don't do this, there are many ways it can break.

What Can Go Wrong

Escaped cells

Columns in a CSV file may contain commas. Such columns are typically escaped with quotes. The raw data for a row with such a column might look something like this:

Name,Best Line
Hamlet,"To be, or not to be"
Enter fullscreen mode Exit fullscreen mode

Though the second line has two commas, this produces a table with two columns, not three:

Name Best Line
Hamlet To be, or not to be

This breaks the simple CSV parser implementation which just splits on ,.

Line Endings

The simple CSV parser implementation assumes LF line endings (\n), but text files, including CSV files, created on Windows will often have CRLF line endings (\r\n). Further, there may be line ending characters within escaped cells. Needless to say, these cases break the simple CSV parser implementation.

What to do instead

Bring in a library that's built for this. I like to use csv-parse. It's easy to use:

import { parse as csvParse } from 'csv-parse/sync';

const csvData = fs.readFileSync(csvFilePath, 'utf8');
const rows = csvParse(csvData);
Enter fullscreen mode Exit fullscreen mode

That's it.

Conclusion

I know no one likes bringing in libraries to solve what seems to be a simple task, but parsing arbitrary CSVs correctly is actually a lot harder than it looks at first glance. Save yourself the pain, use someone else's battle-hardened parser.

Image of Timescale

🚀 pgai Vectorizer: SQLAlchemy and LiteLLM Make Vector Search Simple

We built pgai Vectorizer to simplify embedding management for AI applications—without needing a separate database or complex infrastructure. Since launch, developers have created over 3,000 vectorizers on Timescale Cloud, with many more self-hosted.

Read full post →

Top comments (0)

Billboard image

Create up to 10 Postgres Databases on Neon's free plan.

If you're starting a new project, Neon has got your databases covered. No credit cards. No trials. No getting in your way.

Try Neon for Free →