DEV Community

Labeeb Bappu
Labeeb Bappu

Posted on

What Excel Doesn’t Tell You: Fixing Dirty Data at Scale

Today, we tackled a common challenge in web applications: safely and efficiently importing customer data from Excel spreadsheets. What started as a simple request evolved into a comprehensive tool designed for clarity, robustness, and user confidence.

However, the path to a truly robust solution is rarely linear. We quickly discovered that initial assumptions about data cleanliness and user input were often challenged by real-world scenarios. From unexpected whitespace in headers and data cells, to missing mandatory columns, and even the subtle nuances of contact information, each "failed case" became a valuable lesson, driving us to refine and strengthen the tool.

1. From Zero to Hero: Laying the Foundation

Our journey began with scaffolding the basic UI for a new "Import Customers" tab. We implemented core file upload functionality, ensuring that users could easily select their Excel files. This initial step set the stage for all the powerful features that followed.

2. Taming Wild Data: The Art of Data Cleaning

Real-world data is rarely pristine. (As any developer who's ever touched a spreadsheet knows, data often comes with its own unique personality, and sometimes, a few extra spaces.) A crucial part of our work involved making the tool resilient to common Excel quirks. We implemented intelligent data cleaning, automatically trimming whitespace from both column headers and individual data cells. Crucially, we also added logic to skip entirely empty rows, preventing unnecessary processing and potential errors down the line. This ensures that only meaningful data makes it into our system.

3. Decoding Contacts: Handling Complex Information

One of the more interesting puzzles was extracting contact information. Instead of a single "contact" field, we had two distinct contact numbers: "AGREEMENT CONTACT" and "SITE CONTACT." We designed the system to correctly identify and extract these, mapping them to "Office" and "Site" contacts respectively. A small but significant detail was automatically removing spaces from these contact numbers, ensuring clean, usable data for communication.

Here's a simplified look at how we processed these contacts:

// Inside the data processing logic
const agreementContactNumber = row["AGREEMENT CONTACT"];
if (agreementContactNumber) {
  const cleanedAgreementContactNumber = String(agreementContactNumber).replace(/\s/g, '');
  // Add "Office" contact if not already present
  if (!acc[customerName].contacts.some((c: any) => c.contact_name === "Office")) {
    acc[customerName].contacts.push({
      contact_name: "Office",
      contact_number: cleanedAgreementContactNumber,
    });
  }
}

const siteContactNumber = row["SITE CONTACT"];
if (siteContactNumber) {
  const cleanedSiteContactNumber = String(siteContactNumber).replace(/\s/g, '');
  // Add "Site" contact if not already present
  if (!acc[customerName].contacts.some((c: any) => c.contact_name === "Site")) {
    acc[customerName].contacts.push({
      contact_name: "Site",
      contact_number: cleanedSiteContactNumber,
    });
  }
}
Enter fullscreen mode Exit fullscreen mode

This snippet demonstrates how we ensure each contact type is uniquely added and its number is standardized.

4. The Power of Batches: Saving Data Safely and Efficiently

Saving potentially thousands of records to a database requires careful planning. We devised a robust batching strategy, sending data in manageable chunks (e.g., 100 records at a time) to the API. This approach prevents overwhelming the server, mitigates the risk of hitting API rate limits, and ensures that even if an issue occurs, only a small portion of the data is affected, making recovery much simpler.

5. User-First Design: Clarity Through Feedback

Throughout the development, user experience was paramount. We integrated clear, immediate feedback mechanisms:

  • Explicit Header Errors: If a mandatory column was missing, the UI now clearly states which ones, guiding the user to fix their file.
  • Real-time Progress: During the saving process, users see a live update of which batch is being processed (e.g., "Saving... (5/10)"), providing transparency and reducing anxiety.
  • Detailed Reports: After the import, a downloadable Excel report provides a row-by-row status, indicating whether each customer was created, updated, or encountered an error, all while preserving the original column order for easy reconciliation.

This project was a great example of how thoughtful design and incremental improvements can transform a complex task into a smooth, reliable, and user-friendly experience.

Conclusion: Balancing User Experience with Infrastructure Health

Our work on the customer import tool beautifully illustrates a core principle of software engineering: the delicate balance between providing an exceptional user experience and safeguarding the underlying infrastructure. By meticulously cleaning data, implementing smart batching, and offering transparent feedback, we not only empowered users with a powerful tool but also ensured the stability and performance of our systems. This commitment to both the user and the infrastructure is what truly defines a robust and sustainable solution.

Top comments (0)