<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Abrar ahmed</title>
    <description>The latest articles on DEV Community by Abrar ahmed (@abrar_ahmed).</description>
    <link>https://dev.to/abrar_ahmed</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3196900%2Fd078a851-1e2d-41db-a859-2d9a0323a4d6.png</url>
      <title>DEV Community: Abrar ahmed</title>
      <link>https://dev.to/abrar_ahmed</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abrar_ahmed"/>
    <language>en</language>
    <item>
      <title>Stop Repeating Yourself: How I Built a Reusable “Data Cleaning Playground” in JavaScript</title>
      <dc:creator>Abrar ahmed</dc:creator>
      <pubDate>Mon, 16 Jun 2025 11:13:04 +0000</pubDate>
      <link>https://dev.to/abrar_ahmed/stop-repeating-yourself-how-i-built-a-reusable-data-cleaning-playground-in-javascript-42l6</link>
      <guid>https://dev.to/abrar_ahmed/stop-repeating-yourself-how-i-built-a-reusable-data-cleaning-playground-in-javascript-42l6</guid>
      <description>&lt;p&gt;If you’ve ever worked with messy CSV or Excel files, you probably know this feeling:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“This should be a quick cleanup…”&lt;br&gt;
— 6 hours later, 300 lines of code and 2 coffees deep&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I found myself getting exhausted from having to write the same code over and over again just to rename some columns, clear out duplicates, or fix broken dates. So, I created a little in-browser “data playground” using JavaScript to make things easier.&lt;br&gt;
In this post, I’ll walk you through the process of building this project, explaining the hows and whys, and giving you the lowdown on how you can use it for your own messy-data adventures!&lt;/p&gt;
&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Every new freelance project brought a new variation of the same CSV chaos:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"name", "full_name", "Full Name", " Name " all in one sheet&lt;/li&gt;
&lt;li&gt;Empty rows and random nulls&lt;/li&gt;
&lt;li&gt;Dates in 5 different formats&lt;/li&gt;
&lt;li&gt;And of course… rows that just say “LOL”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sure, I could whip up a Python notebook or get Pandas going… but sometimes I just want a simple, browser-based way to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Preview the raw data&lt;/li&gt;
&lt;li&gt;Write a quick transform function&lt;/li&gt;
&lt;li&gt;Download the cleaned data&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  The Playground Idea
&lt;/h2&gt;

&lt;p&gt;Instead of starting from scratch every time, I built a simple JavaScript setup that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Lets me upload CSV, Excel, or JSON files&lt;/li&gt;
&lt;li&gt;Previews both raw and cleaned data instantly&lt;/li&gt;
&lt;li&gt;Lets me write a one-liner function to transform each row&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The best part? It all runs in the browser. No backend. No Python. No installations.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Setup: What I Used
&lt;/h2&gt;

&lt;p&gt;Libraries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PapaParse (for CSV)&lt;/li&gt;
&lt;li&gt;SheetJS (for Excel)&lt;/li&gt;
&lt;li&gt;JSON.parse (native)&lt;/li&gt;
&lt;li&gt;Vanilla JS + simple HTML&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here’s how the flow works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User selects a file&lt;/li&gt;
&lt;li&gt;Browser parses the file and calls a custom transform(row) function&lt;/li&gt;
&lt;li&gt;Cleaned data is rendered in a table&lt;/li&gt;
&lt;li&gt;Option to export cleaned data to CSV&lt;/li&gt;
&lt;/ol&gt;
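&lt;p&gt;Step 4 of the flow, exporting the cleaned data, can be sketched like this (the &lt;code&gt;toCSV&lt;/code&gt; and &lt;code&gt;downloadCSV&lt;/code&gt; helpers are hypothetical names; this assumes cleaned rows are flat objects):&lt;/p&gt;

```javascript
// Hypothetical export helpers for step 4: serialize cleaned rows to CSV
// and trigger a browser download. Fields with commas/quotes/newlines get quoted.
function toCSV(rows) {
  const headers = Object.keys(rows[0]);
  const escape = (v) => {
    const s = String(v ?? "");
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = [headers.map(escape).join(",")];
  for (const row of rows) {
    lines.push(headers.map((h) => escape(row[h])).join(","));
  }
  return lines.join("\n");
}

function downloadCSV(rows, filename = "cleaned.csv") {
  const blob = new Blob([toCSV(rows)], { type: "text/csv" });
  const a = document.createElement("a");
  a.href = URL.createObjectURL(blob);
  a.download = filename;
  a.click();
}
```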
&lt;h2&gt;
  
  
  Code Example: Basic CSV Upload + Transform
&lt;/h2&gt;

&lt;p&gt;Here’s the core snippet for uploading a CSV file and transforming each row. First, the HTML:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;input type="file" id="file" /&amp;gt;
&amp;lt;table id="output"&amp;gt;&amp;lt;/table&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the JavaScript:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;document.getElementById('file').addEventListener('change', (e) =&amp;gt; {
  const file = e.target.files[0];
  Papa.parse(file, {
    header: true,
    complete: (results) =&amp;gt; {
      const cleaned = results.data.map(transform);
      displayTable(cleaned);
    }
  });
});

function transform(row) {
  return {
    name: row["Full Name"]?.trim(),
    joined: new Date(row["Join_Date"]),
    isActive: row["Status"] === "Active"
  };
}

function displayTable(data) {
  const table = document.getElementById("output");
  if (!data.length) {
    table.innerHTML = "";
    return;
  }
  const headers = Object.keys(data[0]);
  // Build the markup in one string; assigning innerHTML once avoids
  // re-parsing the whole table on every row.
  let html = `&amp;lt;tr&amp;gt;${headers.map(h =&amp;gt; `&amp;lt;th&amp;gt;${h}&amp;lt;/th&amp;gt;`).join("")}&amp;lt;/tr&amp;gt;`;
  data.forEach(row =&amp;gt; {
    html += `&amp;lt;tr&amp;gt;${headers.map(h =&amp;gt; `&amp;lt;td&amp;gt;${row[h]}&amp;lt;/td&amp;gt;`).join("")}&amp;lt;/tr&amp;gt;`;
  });
  table.innerHTML = html;
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you a fully working data transformer in about 30 lines of code. You can add your logic to rename fields, cast values, or remove unwanted rows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus Features I Added
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Toggle between Raw and Cleaned views&lt;/li&gt;
&lt;li&gt;Support for JSON and Excel files (via SheetJS)&lt;/li&gt;
&lt;li&gt;Export button to download the cleaned file&lt;/li&gt;
&lt;li&gt;Dark mode toggle because… why not?&lt;/li&gt;
&lt;li&gt;“Live preview” of the transformation result&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why This Was Worth It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Saves me hours on every new project&lt;/li&gt;
&lt;li&gt;Helps clients understand their own data mess&lt;/li&gt;
&lt;li&gt;Way faster than booting up a backend or notebook&lt;/li&gt;
&lt;li&gt;Easy to tweak for one-off tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're a freelancer, analyst, or dev working with data, this is a huge time-saver.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Improvements (Pull Requests Welcome!)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Drag-and-drop UI&lt;/li&gt;
&lt;li&gt;Column mapping UI (like Zapier)&lt;/li&gt;
&lt;li&gt;Persistent transformations for repeat use&lt;/li&gt;
&lt;li&gt;Plugin support for merging files, filtering, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Want to Try It?
&lt;/h2&gt;

&lt;p&gt;I’m planning to release the base code as a public template. If you're interested, drop a comment and I’ll share the GitHub repo once it's live.&lt;/p&gt;

&lt;p&gt;Until then, feel free to clone the logic above and customize your own browser-based data wrangler!&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;br&gt;
Let me know how you’re handling messy data workflows — or drop your weirdest CSV horror story in the comments.&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>data</category>
      <category>webdev</category>
      <category>csv</category>
    </item>
    <item>
      <title>Build a Universal CSV, Excel &amp; JSON Data Previewer in Node.js</title>
      <dc:creator>Abrar ahmed</dc:creator>
      <pubDate>Sun, 15 Jun 2025 12:56:11 +0000</pubDate>
      <link>https://dev.to/abrar_ahmed/build-a-universal-csv-excel-json-data-previewer-in-nodejs-155k</link>
      <guid>https://dev.to/abrar_ahmed/build-a-universal-csv-excel-json-data-previewer-in-nodejs-155k</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Ever received a file from a client and thought: “Is this even readable?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I used to open every CSV in Excel, every JSON in VS Code, and every XLSX in Google Sheets, just to see the first few rows. It was pretty exhausting!&lt;/p&gt;

&lt;p&gt;So I built something better:&lt;br&gt;
A simple Node.js tool to preview CSV, Excel, or JSON files directly from the terminal — no manual opening, no GUI.&lt;/p&gt;

&lt;p&gt;This tutorial will walk you through how to build your own version of that.&lt;/p&gt;
&lt;h2&gt;
  
  
  What We’re Building
&lt;/h2&gt;
&lt;h5&gt;
  
  
  A terminal command like this:
&lt;/h5&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node preview.js data.xlsx
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;h5&gt;
  
  
  Outputs this:
&lt;/h5&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌───────┬─────────────┬────────────┐
│ Name  │ Email       │ Joined     │
├───────┼─────────────┼────────────┤
│ Alice │ alice@x.com │ 2023-09-01 │
│ Bob   │ bob@x.com   │ 2023-10-12 │
└───────┴─────────────┴────────────┘
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;It will support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CSV&lt;/li&gt;
&lt;li&gt;Excel (.xlsx)&lt;/li&gt;
&lt;li&gt;JSON&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And it’ll detect the type automatically — so you don’t need a flag.&lt;/p&gt;
&lt;h2&gt;
  
  
  What You’ll Use
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;fs (built-in)&lt;/li&gt;
&lt;li&gt;path (built-in)&lt;/li&gt;
&lt;li&gt;csv-parser&lt;/li&gt;
&lt;li&gt;xlsx&lt;/li&gt;
&lt;li&gt;cli-table3 (for formatted console output)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;
&lt;h5&gt;
  
  
  Create a folder:
&lt;/h5&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir data-previewer
cd data-previewer
npm init -y
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;h5&gt;
  
  
  Install dependencies:
&lt;/h5&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install csv-parser xlsx cli-table3
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;h3&gt;
  
  
  Step 1: File Type Detection
&lt;/h3&gt;

&lt;p&gt;Create a file called &lt;code&gt;preview.js&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const fs = require('fs');
const path = require('path');

const filePath = process.argv[2];
if (!filePath) {
  console.error('Please provide a file path.');
  process.exit(1);
}

const ext = path.extname(filePath).toLowerCase();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets us detect whether the input file is .csv, .json or .xlsx.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Parse CSV Files
&lt;/h3&gt;

&lt;p&gt;Add this function to preview.js:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const csv = require('csv-parser');

function parseCSV(filePath, rowLimit = 5) {
  const results = [];
  fs.createReadStream(filePath)
    .pipe(csv())
    .on('data', (data) =&amp;gt; {
      if (results.length &amp;lt; rowLimit) results.push(data);
    })
    .on('end', () =&amp;gt; {
      renderTable(results);
    });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Parse Excel Files
&lt;/h3&gt;

&lt;p&gt;Add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const XLSX = require('xlsx');

function parseExcel(filePath, rowLimit = 5) {
  const wb = XLSX.readFile(filePath);
  const sheet = wb.Sheets[wb.SheetNames[0]];
  const json = XLSX.utils.sheet_to_json(sheet, { defval: '' });
  renderTable(json.slice(0, rowLimit));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Parse JSON Files
&lt;/h3&gt;

&lt;p&gt;Add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function parseJSON(filePath, rowLimit = 5) {
  const raw = fs.readFileSync(filePath, 'utf8');
  const data = JSON.parse(raw);
  const rows = Array.isArray(data) ? data : [data];
  renderTable(rows.slice(0, rowLimit));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Render the Table
&lt;/h3&gt;

&lt;p&gt;Add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const Table = require('cli-table3');

function renderTable(data) {
  if (!data || data.length === 0) {
    console.log('No data found');
    return;
  }

  const headers = Object.keys(data[0]);
  const table = new Table({ head: headers });

  data.forEach(row =&amp;gt; {
    table.push(headers.map(h =&amp;gt; row[h]));
  });

  console.log(table.toString());
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 6: Glue It All Together
&lt;/h3&gt;

&lt;p&gt;At the bottom of preview.js:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;switch (ext) {
  case '.csv':
    parseCSV(filePath);
    break;
  case '.xlsx':
    parseExcel(filePath);
    break;
  case '.json':
    parseJSON(filePath);
    break;
  default:
    console.error('Unsupported file type');
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Try It Out
&lt;/h2&gt;

&lt;p&gt;Drop some sample files into your folder:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;customers.csv&lt;/li&gt;
&lt;li&gt;report.xlsx&lt;/li&gt;
&lt;li&gt;data.json&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  Run:
&lt;/h5&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node preview.js customers.csv
node preview.js report.xlsx
node preview.js data.json
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Boom! You now have a simple, universal file preview tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optional Upgrades
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Limit row count via CLI arg: node preview.js data.csv 10&lt;/li&gt;
&lt;li&gt;Highlight columns with missing values&lt;/li&gt;
&lt;li&gt;Export preview to temp file (Markdown/HTML)&lt;/li&gt;
&lt;li&gt;Add support for TSV or XML (fun challenge!)&lt;/li&gt;
&lt;/ul&gt;
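&lt;p&gt;The first upgrade is tiny. A sketch of the row-limit argument, assuming the &lt;code&gt;preview.js&lt;/code&gt; layout above (&lt;code&gt;parseRowLimit&lt;/code&gt; is a hypothetical helper name):&lt;/p&gt;

```javascript
// Hypothetical helper for the row-limit upgrade: read an optional second
// CLI argument, falling back to 5 when it is missing or not a number.
// argv layout: [node, script, file, limit]
function parseRowLimit(argv, fallback = 5) {
  const n = Number.parseInt(argv[3], 10);
  return Number.isNaN(n) ? fallback : n;
}

// Usage: node preview.js data.csv 10
// then pass parseRowLimit(process.argv) into parseCSV/parseExcel/parseJSON
```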

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This is a great project to build:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your first CLI tool&lt;/li&gt;
&lt;li&gt;Real-world file handling in Node.js&lt;/li&gt;
&lt;li&gt;Practice with CSV, Excel, and JSON formats&lt;/li&gt;
&lt;li&gt;Avoiding boilerplate GUI tools for file checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;If you found this useful, drop a like or bookmark.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Got improvements or want to extend it? Share your ideas below!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>node</category>
      <category>webdev</category>
      <category>csv</category>
    </item>
    <item>
      <title>Why I Stopped Writing “Just Another CSV Script” for Every Project</title>
      <dc:creator>Abrar ahmed</dc:creator>
      <pubDate>Wed, 11 Jun 2025 19:10:45 +0000</pubDate>
      <link>https://dev.to/abrar_ahmed/why-i-stopped-writing-just-another-csv-script-for-every-project-54ia</link>
      <guid>https://dev.to/abrar_ahmed/why-i-stopped-writing-just-another-csv-script-for-every-project-54ia</guid>
      <description>&lt;p&gt;Every project starts the same way:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Client sends a messy CSV file&lt;/li&gt;
&lt;li&gt;I write a quick script to clean it&lt;/li&gt;
&lt;li&gt;A week later… they send another file, slightly different&lt;/li&gt;
&lt;li&gt;I tweak the script again&lt;/li&gt;
&lt;li&gt;Repeat until I'm buried in tiny, fragile one-off scripts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Sound familiar?&lt;/p&gt;

&lt;p&gt;In the past, I treated CSV cleaning like it was a minor task—just whip up some Node.js, make the necessary fixes, and then get on with my day.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With One-Off Scripts
&lt;/h2&gt;

&lt;p&gt;One-time scripts are fast to write and easy to forget. But they come back to haunt you when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A client changes the column order or headers&lt;/li&gt;
&lt;li&gt;You forget which script handles which format&lt;/li&gt;
&lt;li&gt;Someone else needs to run it—and it only works on your machine&lt;/li&gt;
&lt;li&gt;You end up repeating the same logic across 10 files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I was solving the same problems repeatedly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Normalize inconsistent column names&lt;/li&gt;
&lt;li&gt;Convert date formats&lt;/li&gt;
&lt;li&gt;Drop blank or duplicate rows&lt;/li&gt;
&lt;li&gt;Handle different encodings (UTF-8 with BOMs… hello darkness)&lt;/li&gt;
&lt;li&gt;Export the cleaned result&lt;/li&gt;
&lt;/ul&gt;
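&lt;p&gt;The BOM problem in particular is a one-liner once you know it’s there. A minimal sketch, assuming the file has already been read into a string:&lt;/p&gt;

```javascript
// Strip a UTF-8 byte-order mark so the first header doesn't come back
// as "\uFEFFname" and silently break every lookup on that column.
function stripBOM(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}
```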

&lt;p&gt;I didn’t need more scripts. I needed structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Do Now Instead
&lt;/h2&gt;

&lt;p&gt;These days, when a messy new file lands, I don’t start from scratch.&lt;/p&gt;

&lt;p&gt;I’ve settled on an approach that breaks the work into small, testable parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;input parsers (CSV, Excel, JSON)&lt;/li&gt;
&lt;li&gt;a normalization layer (headers, encodings)&lt;/li&gt;
&lt;li&gt;a transformation layer (date formatting, filters, maps)&lt;/li&gt;
&lt;li&gt;an output formatter (CSV, JSON, preview)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn’t a framework. It’s just a mindset:&lt;br&gt;
Write it once → reuse it forever.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: Simple Modular Cleanup in Node.js
&lt;/h2&gt;

&lt;p&gt;Instead of one giant script, I use small utilities like these:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;parser.js&lt;/code&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const fs = require("fs");
const csv = require("csv-parser");

function parseCSV(filePath) {
  return new Promise((resolve, reject) =&amp;gt; {
    const results = [];
    fs.createReadStream(filePath)
      .pipe(csv())
      .on("data", (row) =&amp;gt; results.push(row))
      .on("end", () =&amp;gt; resolve(results))
      .on("error", reject);
  });
}

module.exports = { parseCSV };
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;cleaner.js&lt;/code&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function cleanRows(data) {
  return data
    .filter(row =&amp;gt; Object.values(row).some(val =&amp;gt; val !== ""))
    .map(row =&amp;gt; ({
      ...row,
      date: new Date(row.date).toISOString().split("T")[0], // Normalize date
      name: row.name?.trim(), // Clean string
    }));
}

module.exports = { cleanRows };
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;exporter.js&lt;/code&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const { writeFileSync } = require("fs");

function csvEscape(value) {
  const s = String(value ?? "");
  // Quote fields containing commas, quotes, or newlines
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function exportCSV(data, path) {
  const header = Object.keys(data[0]).map(csvEscape).join(",");
  const rows = data.map(obj =&amp;gt; Object.values(obj).map(csvEscape).join(",")).join("\n");
  writeFileSync(path, `${header}\n${rows}`, "utf8");
}

module.exports = { exportCSV };
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;main.js&lt;/code&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const { parseCSV } = require("./parser");
const { cleanRows } = require("./cleaner");
const { exportCSV } = require("./exporter");

async function runCleanup() {
  const raw = await parseCSV("dirty.csv");
  const cleaned = cleanRows(raw);
  exportCSV(cleaned, "cleaned.csv");
}

runCleanup();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, whenever I receive a new file, I simply adjust my cleaner.js logic—no need to start from square one anymore.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benefits of Moving Away From “Just Scripts”
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Less copy-paste, more confidence&lt;/li&gt;
&lt;li&gt;Easier to onboard clients or teammates&lt;/li&gt;
&lt;li&gt;Faster debugging (you know where the logic lives)&lt;/li&gt;
&lt;li&gt;Fewer edge-case surprises&lt;/li&gt;
&lt;li&gt;Scales from a 100-row file to 1 million+ rows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now when I get a weird file with 12 columns, 3 date formats, and 2 “LOL” rows… I know my workflow can handle it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways for Devs Handling Messy Data
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Your first script should solve the problem&lt;/li&gt;
&lt;li&gt;Your second should solve the pattern&lt;/li&gt;
&lt;li&gt;Your third should become a system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're still writing one-off scripts for every client file: no shame — we've all done it. But long term, it's pain on repeat.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you’ve already moved to a modular, testable data-cleaning setup, I’d love to hear how you approached it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cleancode</category>
      <category>automation</category>
      <category>csv</category>
      <category>node</category>
    </item>
    <item>
      <title>From Spreadsheets to Sanity: How I Automate Repetitive Data Tasks With Plain JavaScript</title>
      <dc:creator>Abrar ahmed</dc:creator>
      <pubDate>Tue, 03 Jun 2025 17:44:16 +0000</pubDate>
      <link>https://dev.to/abrar_ahmed/from-spreadsheets-to-sanity-how-i-automate-repetitive-data-tasks-with-plain-javascript-1kn0</link>
      <guid>https://dev.to/abrar_ahmed/from-spreadsheets-to-sanity-how-i-automate-repetitive-data-tasks-with-plain-javascript-1kn0</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;If you've ever found yourself copying and pasting the same data across Excel tabs for the umpteenth time in a week… trust me, I get it.&lt;br&gt;
At some point, those spreadsheets that were supposed to make our lives easier start feeling more like an unpaid internship.&lt;/p&gt;

&lt;p&gt;In this post, I’m eager to show you how I escaped the spreadsheet cycle and automated those repetitive data cleanup tasks with plain JavaScript—no frameworks or fancy libraries involved. It’s all about a bit of logic and some Node.js!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Clients would send me data like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Huge CSV files with inconsistent column names (Full Name, full_name, name_full)&lt;/li&gt;
&lt;li&gt;Mixed date formats (DD-MM-YYYY, YYYY/MM/DD)&lt;/li&gt;
&lt;li&gt;Duplicates and empty rows&lt;/li&gt;
&lt;li&gt;Repetitive filtering tasks (like removing inactive users)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I kept doing the same things in Excel until I decided: enough. Let’s script it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Read the File
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;For CSVs, I used csv-parse:
const fs = require('fs');
const parse = require('csv-parse');

fs.createReadStream('input.csv')
  .pipe(parse({ columns: true }))
  .on('data', (row) =&amp;gt; {
    // handle row
  });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Excel files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const XLSX = require('xlsx');
const workbook = XLSX.readFile('data.xlsx');
const sheet = workbook.Sheets[workbook.SheetNames[0]];
const json = XLSX.utils.sheet_to_json(sheet);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Clean the Data
&lt;/h3&gt;

&lt;p&gt;Normalize headers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function normalizeHeaders(row) {
  const normalized = {};
  for (let key in row) {
    const newKey = key.trim().toLowerCase().replace(/\s+/g, '_');
    normalized[newKey] = row[key];
  }
  return normalized;
}
data = data.map(normalizeHeaders);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Remove blank rows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data = data.filter(row =&amp;gt; Object.values(row).some(val =&amp;gt; val !== ''));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
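&lt;p&gt;Duplicate rows, mentioned earlier, can be dropped the same way. A sketch that keys each row on its JSON form (&lt;code&gt;dedupeRows&lt;/code&gt; is a hypothetical helper, and this only catches exact duplicates):&lt;/p&gt;

```javascript
// Drop exact-duplicate rows by keying each row on its serialized form.
// Note: key order matters in JSON.stringify, so rows must share a shape.
function dedupeRows(rows) {
  const seen = new Set();
  return rows.filter((row) => {
    const key = JSON.stringify(row);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```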



&lt;p&gt;Format dates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function formatDate(dateStr) {
  const date = new Date(dateStr);
  return date.toISOString().split('T')[0]; // yyyy-mm-dd
}
data = data.map(row =&amp;gt; ({
  ...row,
  joined_date: formatDate(row.joined_date)
}));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Export the Cleaned Data
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const { writeFileSync } = require('fs');
const { stringify } = require('csv-stringify/sync');

const output = stringify(data, { header: true });
writeFileSync('cleaned.csv', output);

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Boom — reusable cleanup in under 5 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Plain JavaScript is enough for most data cleanup tasks.&lt;/li&gt;
&lt;li&gt;csv-parse + csv-stringify make CSV parsing easy.&lt;/li&gt;
&lt;li&gt;Write a cleanup script once and you never do it manually again.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Ditch repetitive Excel formulas.&lt;/li&gt;
&lt;li&gt;Read CSV/Excel in JS.&lt;/li&gt;
&lt;li&gt;Normalize headers, clean rows, convert formats.&lt;/li&gt;
&lt;li&gt;Export back out — all automated.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let me know if you've built similar automations or want to share some CSV horror stories&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>productivity</category>
      <category>data</category>
      <category>csv</category>
    </item>
    <item>
      <title>What I Learned Cleaning 1 Million Rows of CSV Data Without Pandas</title>
      <dc:creator>Abrar ahmed</dc:creator>
      <pubDate>Sat, 31 May 2025 12:17:36 +0000</pubDate>
      <link>https://dev.to/abrar_ahmed/what-i-learned-cleaning-1-million-rows-of-csv-data-without-pandas-1a01</link>
      <guid>https://dev.to/abrar_ahmed/what-i-learned-cleaning-1-million-rows-of-csv-data-without-pandas-1a01</guid>
      <description>&lt;p&gt;Cleaning a small CSV? Pandas is perfect.&lt;br&gt;
Cleaning up a million rows on a limited machine or using a serverless function? That's when Pandas really struggles.&lt;/p&gt;

&lt;p&gt;That’s exactly the problem I faced.&lt;/p&gt;

&lt;p&gt;In this post, I’ll share:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why I avoided Pandas&lt;/li&gt;
&lt;li&gt;My Node.js pipeline with csv-parser&lt;/li&gt;
&lt;li&gt;How I handled common data issues: dates, phone numbers, missing fields&lt;/li&gt;
&lt;li&gt;What I’d do differently next time&lt;/li&gt;
&lt;/ul&gt;
&lt;h5&gt;
  
  
  Let’s dive in.
&lt;/h5&gt;
&lt;h2&gt;
  
  
  Why Not Pandas?
&lt;/h2&gt;

&lt;p&gt;Pandas is fantastic, but it does have a downside: it loads the entire file into memory. If you're working with files larger than about 500MB, you might run into some issues.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory errors&lt;/li&gt;
&lt;li&gt;Slow performance&lt;/li&gt;
&lt;li&gt;Crashes in limited environments (e.g., cloud functions, small servers)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my case, I had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 million+ rows&lt;/li&gt;
&lt;li&gt;Dirty data from multiple sources&lt;/li&gt;
&lt;li&gt;A need to stream and clean data row by row&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  My Setup: Streaming CSV Processing in Node.js
&lt;/h2&gt;

&lt;p&gt;Here’s the core pipeline using csv-parser and Node streams:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const fs = require('fs');
const csv = require('csv-parser');

let rowCount = 0;
let errorCount = 0;

fs.createReadStream('bigfile.csv')
  .pipe(csv())
  .on('data', (row) =&amp;gt; {
    rowCount++;

    // Clean data
    row.email = cleanEmail(row.email);
    row.phone = cleanPhone(row.phone);
    row.date = parseDate(row.date);

    // Validate required fields
    if (!row.email || !row.date) {
      errorCount++;
      logError(row);
      return;
    }

    // Save row to DB, another file, or API...
  })
  .on('end', () =&amp;gt; {
    console.log(`✅ Processed ${rowCount} rows`);
    console.log(`⚠️  Found ${errorCount} bad rows`);
  });

function cleanEmail(email) {
  return email?.trim().toLowerCase() || null;
}

function cleanPhone(phone) {
  const digits = phone?.replace(/\D/g, '');
  return digits?.length === 10 ? digits : null;
}

function parseDate(date) {
  const parsed = Date.parse(date);
  return isNaN(parsed) ? null : new Date(parsed).toISOString();
}

function logError(row) {
  fs.appendFileSync('errors.log', JSON.stringify(row) + '\n');
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Common Data Issues I Ran Into (and How I Fixed Them)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Inconsistent date formats (MM-DD-YYYY vs DD/MM/YYYY) → Used &lt;code&gt;Date.parse()&lt;/code&gt; + fallback logic.&lt;/li&gt;
&lt;li&gt;Phone numbers in weird formats → Removed non-digits, validated length&lt;/li&gt;
&lt;li&gt;Missing fields → Set defaults or marked as null&lt;/li&gt;
&lt;li&gt;Extra columns → Stripped to schema fields&lt;/li&gt;
&lt;li&gt;Encoding problems → Saved CSVs as UTF-8&lt;/li&gt;
&lt;/ul&gt;
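&lt;p&gt;The date fallback mentioned above can be as small as one regex. A sketch that tries &lt;code&gt;Date.parse()&lt;/code&gt; first and assumes day-first for slash-separated dates (&lt;code&gt;parseDateLoose&lt;/code&gt; is a hypothetical name; truly ambiguous dates like 05/06/2025 still need a policy decision):&lt;/p&gt;

```javascript
// Try the engine's parser first; fall back to DD/MM/YYYY, which
// Date.parse reads as MM/DD/YYYY (and rejects when the "month" is over 12).
function parseDateLoose(s) {
  const t = Date.parse(s);
  if (!isNaN(t)) return new Date(t).toISOString();
  const m = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(s);
  if (m) {
    return new Date(Date.UTC(Number(m[3]), Number(m[2]) - 1, Number(m[1]))).toISOString();
  }
  return null;
}
```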

&lt;h2&gt;
  
  
  Pro Tips for Large CSV Cleaning
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Stream, don’t load → Avoid memory issues by processing row by row&lt;/li&gt;
&lt;li&gt;Validate early → Catch bad data before it pollutes your system&lt;/li&gt;
&lt;li&gt;Log errors → Keep a separate file of rejected rows for review&lt;/li&gt;
&lt;li&gt;Test on a small sample → Always test your logic before full-scale runs&lt;/li&gt;
&lt;li&gt;Handle edge cases → Empty cells, extra commas, inconsistent headers—these will happen!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I’d Do Differently Next Time
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use a schema definition (like JSON Schema or Zod) to validate and transform rows automatically&lt;/li&gt;
&lt;li&gt;Build a mapping layer for multi-source CSVs (e.g., different column names)&lt;/li&gt;
&lt;li&gt;Consider tools like DuckDB or Polars if I need more advanced queries&lt;/li&gt;
&lt;/ul&gt;
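&lt;p&gt;On the schema idea: even without pulling in JSON Schema or Zod, a hand-rolled version captures the gist: declare per-field checks once, then validate every row against them. A sketch (the names here are illustrative, not a real library API):&lt;/p&gt;

```javascript
// Hand-rolled sketch of row validation: one check per required field.
const schema = {
  email: (v) => typeof v === "string" ? v.includes("@") : false,
  date: (v) => !isNaN(Date.parse(v)),
};

function validateRow(row, schema) {
  return Object.entries(schema).every(([field, check]) => check(row[field]));
}
```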

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Handling big data files involves more than just coding; it’s about crafting durable pipelines that can navigate the complexities and messiness of real-world scenarios.&lt;/p&gt;

&lt;p&gt;If you’re working with CSVs, remember:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Validate early&lt;/li&gt;
&lt;li&gt;Clean thoughtfully&lt;/li&gt;
&lt;li&gt;Log everything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And when in doubt, stream it, don’t load it all at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Have you ever tackled the challenge of cleaning up a huge dataset? What tools or tips have you found to be the most helpful? I’d love to hear your thoughts!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>csv</category>
      <category>node</category>
      <category>backend</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>How to Handle CSV, Excel, and JSON Uploads in Node.js (Without Losing Your Mind)</title>
      <dc:creator>Abrar ahmed</dc:creator>
      <pubDate>Fri, 30 May 2025 11:29:45 +0000</pubDate>
      <link>https://dev.to/abrar_ahmed/how-to-handle-csv-excel-and-json-uploads-in-nodejs-without-losing-your-mind-547d</link>
      <guid>https://dev.to/abrar_ahmed/how-to-handle-csv-excel-and-json-uploads-in-nodejs-without-losing-your-mind-547d</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Ever found yourself trying to build a feature for users to upload files, and suddenly you're knee-deep in weird CSV quirks, Excel formats, and complex JSON structures?&lt;br&gt;
I’ve been there too. Here’s a simple guide to help you handle file uploads in Node.js without losing your mind.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Step 1: Accept File Uploads (With Multer)
&lt;/h2&gt;

&lt;p&gt;First, use &lt;code&gt;multer&lt;/code&gt; to accept file uploads in Express:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install multer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Basic setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const express = require('express');
const multer = require('multer');
const upload = multer({ dest: 'uploads/' });
const app = express();

app.post('/upload', upload.single('file'), (req, res) =&amp;gt; {
  console.log(req.file);
  res.send('File uploaded!');
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Parse Different File Types
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. CSV Files
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;csv-parser&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install csv-parser
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const fs = require('fs');
const csv = require('csv-parser');

fs.createReadStream('uploads/file.csv')
  .pipe(csv())
  .on('data', (row) =&amp;gt; {
    console.log(row);
  });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Excel Files
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;xlsx&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install xlsx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const xlsx = require('xlsx');

const workbook = xlsx.readFile('uploads/file.xlsx');
const sheetName = workbook.SheetNames[0];
const data = xlsx.utils.sheet_to_json(workbook.Sheets[sheetName]);

console.log(data);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. JSON Files
&lt;/h3&gt;

&lt;p&gt;Simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const fs = require('fs');

const data = JSON.parse(fs.readFileSync('uploads/file.json', 'utf8'));
console.log(data);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Handle Common Data Issues
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Normalize date formats (e.g., with dayjs)&lt;/li&gt;
&lt;li&gt;Remove empty rows&lt;/li&gt;
&lt;li&gt;Deduplicate entries&lt;/li&gt;
&lt;li&gt;Validate column headers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example (normalize phone numbers):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function cleanPhoneNumber(num) {
  return num.replace(/\D/g, '');
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
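&lt;p&gt;The header-validation item can be sketched as a fail-fast check on the first parsed row — the required column names below are placeholders for whatever your upload actually needs:&lt;/p&gt;

```javascript
// Sketch: fail fast when an upload is missing required columns.
// The required column names are placeholders for illustration.
function validateHeaders(row, required) {
  const present = Object.keys(row).map((k) => k.trim().toLowerCase());
  const missing = required.filter((col) => !present.includes(col));
  if (missing.length) {
    throw new Error(`Missing required columns: ${missing.join(', ')}`);
  }
}

// The first parsed row stands in for the header check;
// " Email " passes because keys are trimmed and lowercased.
validateHeaders({ Name: 'Ann', ' Email ': 'a@x.com' }, ['name', 'email']);
```

&lt;p&gt;Throwing here, before any cleaning runs, means a bad file is rejected with a readable message instead of silently producing half-empty rows.&lt;/p&gt;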



&lt;h2&gt;
  
  
  Step 4: Structure Your Code
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Create a separate module for each file type&lt;/li&gt;
&lt;li&gt;Keep upload logic separate from parsing&lt;/li&gt;
&lt;li&gt;Log errors clearly&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Handling messy files is something you’ll encounter while creating real-world apps. But don’t worry! With the &lt;strong&gt;right tools&lt;/strong&gt;, you can easily work with CSV, Excel, and JSON files without losing your mind.&lt;/p&gt;

&lt;p&gt;Got your own tips or tools? Drop them in the comments!&lt;/p&gt;

</description>
      <category>node</category>
      <category>csv</category>
      <category>json</category>
      <category>data</category>
    </item>
    <item>
      <title>#csv</title>
      <dc:creator>Abrar ahmed</dc:creator>
      <pubDate>Thu, 29 May 2025 11:53:06 +0000</pubDate>
      <link>https://dev.to/abrar_ahmed/csv-15g2</link>
      <guid>https://dev.to/abrar_ahmed/csv-15g2</guid>
      <description></description>
      <category>csv</category>
      <category>data</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Handle Big Data Transformations Without Pandas (and My Favorite Workarounds)</title>
      <dc:creator>Abrar ahmed</dc:creator>
      <pubDate>Thu, 29 May 2025 11:16:55 +0000</pubDate>
      <link>https://dev.to/abrar_ahmed/how-to-handle-big-data-transformations-without-pandas-and-my-favorite-workarounds-3bln</link>
      <guid>https://dev.to/abrar_ahmed/how-to-handle-big-data-transformations-without-pandas-and-my-favorite-workarounds-3bln</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Are you having a tough time dealing with massive CSVs, Excel files, or JSON data that Pandas just can’t seem to manage? Let me share how I tackle huge datasets using Spark, along with some tools I'm checking out to simplify big data machine learning.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why Handling Big Data is Hard
&lt;/h2&gt;

&lt;p&gt;When it comes to handling large datasets — like those with millions of rows and gigabytes of files — you’ve probably experienced this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pandas crashes with an out-of-memory error&lt;/li&gt;
&lt;li&gt;Scikit-learn slows to a crawl&lt;/li&gt;
&lt;li&gt;Even simple &lt;code&gt;.fillna()&lt;/code&gt; or &lt;code&gt;.transpose()&lt;/code&gt; calls become impractical&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my project, I made the choice to move away from Pandas completely. Now, I’m relying on Apache Spark for distributed data processing. But keep in mind, Spark has its own set of limitations as well.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No built-in &lt;code&gt;pct_change()&lt;/code&gt; for percentage differences&lt;/li&gt;
&lt;li&gt;No &lt;code&gt;.transpose()&lt;/code&gt; for wide tables&lt;/li&gt;
&lt;li&gt;Complex data cleaning often requires custom UDFs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I began my search for smarter ways to tackle big data transformations, and here’s what I’ve discovered.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. How to Calculate pct_change() in Spark
&lt;/h3&gt;

&lt;p&gt;Pandas makes it easy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df['pct_change'] = df['value'].pct_change()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But in Spark, you have to use window functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from pyspark.sql import Window
from pyspark.sql.functions import col, lag

window = Window.partitionBy("group").orderBy("timestamp")
df = df.withColumn("prev_value", lag("value").over(window))
df = df.withColumn("pct_change", (col("value") - col("prev_value")) / col("prev_value"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the standard workaround for percentage change in Spark.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Transposing a DataFrame in Spark
&lt;/h3&gt;

&lt;p&gt;Pandas has &lt;code&gt;.T&lt;/code&gt; for transposing data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df.T
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In PySpark, you’ll need to pivot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pivoted = df.groupBy("id").pivot("column_name").agg(first("value"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This can help reshape wide datasets in Spark.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Efficiently Fill Nulls in Big Data
&lt;/h3&gt;

&lt;p&gt;Missing values are a common challenge in big data pipelines. Here’s a fast way to fill nulls in Spark:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df = df.fillna({"age": 0, "name": "Unknown"})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For all numeric columns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;numeric_cols = [f.name for f in df.schema.fields if f.dataType.simpleString() == 'int']
df = df.fillna(0, subset=numeric_cols)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean your data before feeding it into big data machine learning models.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Performance Tips for Big Data Pipelines
&lt;/h3&gt;

&lt;p&gt;If you’re working with large datasets in Spark, keep these in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To keep things efficient, try to minimize shuffles—operations like groupBy, repartition, and joins can really slow things down.&lt;/li&gt;
&lt;li&gt;Start filtering early to cut down on the amount of data you're working with.&lt;/li&gt;
&lt;li&gt;Whenever you can, steer clear of UDFs and stick to Spark’s built-in functions instead.&lt;/li&gt;
&lt;li&gt;And don’t forget to sample your data for testing before you scale up!&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Tools That Can Help With Big Data Processing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Dask, which offers a parallel API similar to Pandas for big data tasks.&lt;/li&gt;
&lt;li&gt;Polars, a lightning-fast DataFrame library built with Rust.&lt;/li&gt;
&lt;li&gt;DuckDB, perfect for running SQL analytics on local files, no matter how large they are.&lt;/li&gt;
&lt;li&gt;Custom APIs that let you offload transformations into services for added flexibility.&lt;/li&gt;
&lt;li&gt;I’m also diving into creating data cleaning APIs that can take raw files and transform them into clean, ready-to-use data—this could really revolutionize big data machine learning workflows!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Let’s Share Solutions: How Do You Handle Big Data?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What tools have come to your rescue when Pandas just didn’t cut it?&lt;/li&gt;
&lt;li&gt;Do you have any go-to tips for handling common transformations like pct_change in large datasets?&lt;/li&gt;
&lt;li&gt;Have you discovered any alternatives to Spark for cleaning data on a larger scale?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop your thoughts below — let’s build a resource for devs dealing with big data transformation challenges.&lt;/p&gt;

&lt;h3&gt;
  
  
  Big Data Cheatsheet for Developers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Pandas&lt;/th&gt;
&lt;th&gt;Spark/PySpark Approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Percentage change (&lt;code&gt;pct_change&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;df.pct_change()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;lag()&lt;/code&gt; + window functions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transpose&lt;/td&gt;
&lt;td&gt;&lt;code&gt;df.T&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pivot()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fill nulls&lt;/td&gt;
&lt;td&gt;&lt;code&gt;df.fillna()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;fillna()&lt;/code&gt; with dict or subset&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rolling calculations&lt;/td&gt;
&lt;td&gt;&lt;code&gt;df.rolling()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;UDFs or window functions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Handle massive files&lt;/td&gt;
&lt;td&gt;Pandas&lt;/td&gt;
&lt;td&gt;Dask, Polars, Spark, DuckDB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If this post helped you, feel free to bookmark it or share it with someone working with large datasets.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>bigdata</category>
      <category>dataengineering</category>
      <category>node</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>csv</title>
      <dc:creator>Abrar ahmed</dc:creator>
      <pubDate>Mon, 26 May 2025 13:26:49 +0000</pubDate>
      <link>https://dev.to/abrar_ahmed/csv-4p0g</link>
      <guid>https://dev.to/abrar_ahmed/csv-4p0g</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/abrar_ahmed/how-to-clean-messy-csv-excel-and-json-files-in-nodejs-without-pandas-3896" class="crayons-story__hidden-navigation-link"&gt;How to Clean Messy CSV, Excel, and JSON Files in Node.js (Without Pandas)&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/abrar_ahmed" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3196900%2Fd078a851-1e2d-41db-a859-2d9a0323a4d6.png" alt="abrar_ahmed profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/abrar_ahmed" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Abrar ahmed
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Abrar ahmed
                
              
              &lt;div id="story-author-preview-content-2526036" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/abrar_ahmed" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3196900%2Fd078a851-1e2d-41db-a859-2d9a0323a4d6.png" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Abrar ahmed&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/abrar_ahmed/how-to-clean-messy-csv-excel-and-json-files-in-nodejs-without-pandas-3896" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;May 26 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/abrar_ahmed/how-to-clean-messy-csv-excel-and-json-files-in-nodejs-without-pandas-3896" id="article-link-2526036"&gt;
          How to Clean Messy CSV, Excel, and JSON Files in Node.js (Without Pandas)
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/javascript"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;javascript&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/node"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;node&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/webdev"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;webdev&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/data"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;data&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/abrar_ahmed/how-to-clean-messy-csv-excel-and-json-files-in-nodejs-without-pandas-3896" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;5&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/abrar_ahmed/how-to-clean-messy-csv-excel-and-json-files-in-nodejs-without-pandas-3896#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            3 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>javascript</category>
      <category>node</category>
      <category>webdev</category>
      <category>data</category>
    </item>
    <item>
      <title>How to Clean Messy CSV, Excel, and JSON Files in Node.js (Without Pandas)</title>
      <dc:creator>Abrar ahmed</dc:creator>
      <pubDate>Mon, 26 May 2025 13:22:33 +0000</pubDate>
      <link>https://dev.to/abrar_ahmed/how-to-clean-messy-csv-excel-and-json-files-in-nodejs-without-pandas-3896</link>
      <guid>https://dev.to/abrar_ahmed/how-to-clean-messy-csv-excel-and-json-files-in-nodejs-without-pandas-3896</guid>
      <description>&lt;p&gt;If you're diving into building a Node.js app that deals with CSV or Excel file uploads, you've probably faced the challenge of messy data. Let me share how I tackle issues like broken headers, odd date formats, and duplicates — all using plain JavaScript.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Messy Files Are a Hidden Time Sink for Developers
&lt;/h2&gt;

&lt;p&gt;Uploading files is easy.&lt;br&gt;&lt;br&gt;
&lt;em&gt;Handling them correctly? Not so much.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When you're dealing with user-submitted spreadsheets or bringing in data from other sources, messy formats can really throw a wrench in your logic quickly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;"Name"&lt;/code&gt; vs &lt;code&gt;"name"&lt;/code&gt; vs &lt;code&gt;"Full Name"&lt;/code&gt; headers
&lt;/li&gt;
&lt;li&gt;Rows with empty or null values
&lt;/li&gt;
&lt;li&gt;Inconsistent date formats (&lt;code&gt;MM/DD/YYYY&lt;/code&gt;, &lt;code&gt;DD-MM-YYYY&lt;/code&gt;, &lt;code&gt;2024/05/25&lt;/code&gt;)
&lt;/li&gt;
&lt;li&gt;Duplicates that quietly pass through validations
&lt;/li&gt;
&lt;li&gt;Extra white spaces, strange encodings, BOM issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've ever found yourself sifting through a file and getting lost in a maze of strange edge cases — trust me, you're definitely not the only one.&lt;/p&gt;


&lt;h2&gt;
  
  
  How to Clean Messy Data Files in Node.js (Step-by-Step)
&lt;/h2&gt;

&lt;p&gt;Here’s how I handle messy CSVs in &lt;strong&gt;Node.js&lt;/strong&gt;, without switching languages or installing a full data science stack.&lt;/p&gt;

&lt;p&gt;You’ll learn how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parse CSV files
&lt;/li&gt;
&lt;li&gt;Normalize headers
&lt;/li&gt;
&lt;li&gt;Clean empty/null values
&lt;/li&gt;
&lt;li&gt;Standardize date formats
&lt;/li&gt;
&lt;li&gt;Deduplicate entries
&lt;/li&gt;
&lt;li&gt;Combine it all in a reusable utility&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Parse the CSV File in Node.js
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const fs = require('fs');
const csv = require('csv-parser');

function parseCSV(path) {
  return new Promise((resolve, reject) =&amp;gt; {
    const rows = [];
    fs.createReadStream(path)
      .pipe(csv())
      .on('data', (row) =&amp;gt; rows.push(row))
      .on('end', () =&amp;gt; resolve(rows))
      .on('error', reject);
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This reads the file and returns rows as JavaScript objects.&lt;/p&gt;


&lt;h2&gt;
  
  
  Normalize Column Headers
&lt;/h2&gt;

&lt;p&gt;Messy headers can ruin your logic. Standardize them early:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function normalizeHeaders(row) {
  const cleanRow = {};
  for (const key in row) {
    const newKey = key.trim().toLowerCase().replace(/\s+/g, '_');
    const value = row[key];
    cleanRow[newKey] = typeof value === 'string' ? value.trim() : value;
  }
  return cleanRow;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;"Full Name"&lt;/code&gt; becomes &lt;code&gt;full_name&lt;/code&gt;, &lt;code&gt;" age "&lt;/code&gt; becomes &lt;code&gt;age&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Remove Null or Empty Fields
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function cleanRow(row) {
  const cleaned = {};
  for (const key in row) {
    const val = row[key];
    if (val !== null &amp;amp;&amp;amp; val !== '' &amp;amp;&amp;amp; val !== undefined) {
      cleaned[key] = val;
    }
  }
  return Object.keys(cleaned).length ? cleaned : null;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps your dataset compact and safe for further processing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Standardize Date Formats with &lt;code&gt;dayjs&lt;/code&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install dayjs

Then use it:

const dayjs = require('dayjs');

function fixDate(value) {
  const parsed = dayjs(value, ['MM-DD-YYYY', 'DD/MM/YYYY', 'YYYY-MM-DD'], true);
  return parsed.isValid() ? parsed.toISOString() : value;
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All dates become ISO strings, e.g. &lt;code&gt;2024-05-25T00:00:00.000Z&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deduplicate Rows in JavaScript
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function deduplicate(data) {
  const seen = new Set();
  return data.filter(row =&amp;gt; {
    const key = JSON.stringify(row);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Great for catching repeated entries that cause DB constraints or logic errors.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Function: Clean a File from Start to Finish
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async function cleanFile(path) {
  const raw = await parseCSV(path);
  const cleaned = raw
    .map(normalizeHeaders)
    .map(cleanRow)
    .filter(Boolean);
  return deduplicate(cleaned);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You now get a nice array of consistent, clean objects.&lt;br&gt;
And best of all — no Excel wrangling, no Python, no drama.&lt;/p&gt;




&lt;h2&gt;
  
  
  Here are some bonus tips for tackling real projects:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Make sure to use the &lt;strong&gt;xlsx&lt;/strong&gt; package for seamless Excel support.&lt;/li&gt;
&lt;li&gt;Implement logging for any rows that get skipped; this is super helpful for debugging or reporting purposes.&lt;/li&gt;
&lt;li&gt;Validate that all necessary columns are present right from the start to prevent any silent data loss.&lt;/li&gt;
&lt;li&gt;When dealing with large files, stream them to sidestep memory issues in a production environment.&lt;/li&gt;
&lt;/ul&gt;
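&lt;p&gt;The skipped-row logging tip can be sketched as a variant of the cleaning step above that records what it drops and why — the row shapes here are illustrative:&lt;/p&gt;

```javascript
// Sketch: keep a log of rows dropped during cleaning so they can be
// reported back to the user. Row shapes here are illustrative.
function cleanWithLog(rows) {
  const skipped = [];
  const cleaned = rows.filter((row, index) => {
    const empty = Object.values(row).every((v) => v === '' || v == null);
    if (empty) {
      skipped.push({ index, reason: 'empty row' }); // remember why it was dropped
      return false;
    }
    return true;
  });
  return { cleaned, skipped };
}

const { cleaned, skipped } = cleanWithLog([
  { name: 'Ann' },
  { name: '', email: null },
]);
console.log(skipped); // [ { index: 1, reason: 'empty row' } ]
```

&lt;p&gt;The same pattern extends to other reasons (bad dates, missing columns), and the &lt;code&gt;skipped&lt;/code&gt; array is exactly what you’d surface in an upload preview.&lt;/p&gt;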




&lt;h2&gt;
  
  
  In the Next Post…
&lt;/h2&gt;

&lt;p&gt;I’ll share how I deal with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cleaning Excel &lt;strong&gt;(.xlsx)&lt;/strong&gt; files in &lt;strong&gt;Node.js&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Letting users upload files and preview what’s broken&lt;/li&gt;
&lt;li&gt;How I almost built a Pandas-style library in JavaScript (and why I didn’t)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Join the Conversation
&lt;/h2&gt;

&lt;p&gt;Have you faced any of this?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How do you handle user-uploaded files? Share your secret!&lt;/li&gt;
&lt;li&gt;Do you create unique utilities for every project, or reuse the same one?&lt;/li&gt;
&lt;li&gt;Got a CSV horror story?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop a comment — I’m really interested in hearing what everyone else is up to. We all deal with messy data, so let’s swap stories about our struggles and successes!&lt;/p&gt;




&lt;h2&gt;
  
  
  Thanks for reading!
&lt;/h2&gt;

&lt;p&gt;If you found this helpful, don’t hesitate to bookmark it or share it with others! Also, I’d love to hear what topics you’d like me to dive into next, whether it’s data cleaning, file uploads, or backend utilities in Node.js.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR: A Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Code Used&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Parse CSV&lt;/td&gt;
&lt;td&gt;&lt;code&gt;csv-parser&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fast and reliable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Normalize headers&lt;/td&gt;
&lt;td&gt;&lt;code&gt;trim + lowercase&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Avoid mismatch bugs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remove nulls/blanks&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.filter(Boolean)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Keeps data usable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fix inconsistent dates&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dayjs&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;ISO-standard conversion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deduplicate rows&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Set + JSON.stringify&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Prevents duplicate records&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>javascript</category>
      <category>node</category>
      <category>webdev</category>
      <category>data</category>
    </item>
  </channel>
</rss>
