My CSV had 50K rows. Row 23,487 broke everything.
Built a data processor for vendor CSV exports. Tested with sample files, looked good, pushed to production.
Two weeks later, 3 AM alert.
Vendor changed export format. Not the whole thing, just ONE field on random rows. They added an unquoted comma.
```
# Expected format
product_name,price,stock
Widgets,12.99,45

# Row 23,487
Widgets, Premium Edition,12.99,45
```
That comma shifted columns. Parser read "Premium Edition" as price, tried float conversion, crashed.
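You can see the silent shift in two lines (a minimal repro, not from the original incident code):

```python
import csv
import io

# The malformed row parses "successfully" -- csv.reader just returns four
# fields instead of three, shifting every column one place to the right.
bad_line = "Widgets, Premium Edition,12.99,45"
row = next(csv.reader(io.StringIO(bad_line)))
print(row)       # ['Widgets', ' Premium Edition', '12.99', '45']
print(len(row))  # 4, not the expected 3
```

No exception, no warning. `row[1]` is now `' Premium Edition'`, which is exactly what hit the float conversion.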
Spent an hour thinking it was my regex. Nope.
Thought maybe encoding issue. Nope.
Finally printed the raw row. There it was. Unquoted comma in product name field.
Tests missed it because the first 20K rows were clean. The malformed rows only appeared when product names contained certain keywords, and my test CSVs had none of them.
No validation existed. Python's csv module parsed the file without error, so I assumed the data was valid. Bad assumption: the parser only checks syntax, not shape.
Fixed it:
```python
import csv

def validate_row(row, expected_cols=3):
    if len(row) != expected_cols:
        return False, f"Expected {expected_cols} cols, got {len(row)}"
    try:
        float(row[1])  # price should be numeric
    except ValueError:
        return False, f"Not a number: {row[1]}"
    return True, None

with open('export.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip header
    # start=2 so reported row numbers match line numbers in the file
    for i, row in enumerate(reader, start=2):
        valid, error = validate_row(row)
        if not valid:
            print(f"Row {i}: {error}")
            continue
        process_product(row)  # existing processing code
```
Validate column count and types before processing. Log bad rows, skip them, don't explode.
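For what it's worth, this is what the vendor's export should have looked like: per RFC 4180, a field containing a comma gets quoted, and Python's `csv.writer` does that automatically (a small illustration, not part of the fix):

```python
import csv
import io

# csv.writer quotes any field containing a delimiter (QUOTE_MINIMAL default),
# so the embedded comma survives a round trip intact.
buf = io.StringIO()
csv.writer(buf).writerow(["Widgets, Premium Edition", "12.99", "45"])
print(buf.getvalue())  # '"Widgets, Premium Edition",12.99,45\r\n'

row = next(csv.reader(io.StringIO(buf.getvalue())))
print(len(row))  # 3 -- the comma stays inside the quoted field
```

Defensive validation still matters, though: you don't control the vendor's writer.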
Should have tested with actual production data instead of clean samples. Dry run mode would have caught this without waking me up at 3 AM.
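A minimal sketch of that dry-run idea, assuming a hypothetical `run()` entry point and repeating the validation so the snippet stands alone: with `dry_run=True` it reports every bad row without touching the real processing code.

```python
import csv

def validate_row(row, expected_cols=3):
    # Same checks as in the fix: column count first, then numeric price.
    if len(row) != expected_cols:
        return False, f"Expected {expected_cols} cols, got {len(row)}"
    try:
        float(row[1])
    except ValueError:
        return False, f"Not a number: {row[1]}"
    return True, None

def run(csv_path, dry_run=False, process=print):
    """Validate every data row; call `process` only when not dry-running."""
    bad = 0
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip header line
        for i, row in enumerate(reader, start=2):  # match file line numbers
            valid, error = validate_row(row)
            if not valid:
                bad += 1
                print(f"Row {i}: {error}")
            elif not dry_run:
                process(row)
    return bad
```

Wire `dry_run` to a `--dry-run` CLI flag and you can point it at a real production export before deploying, on your own schedule instead of at 3 AM.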