My CSV had 50K rows. Row 23,487 broke everything.
Built a data processor for vendor CSV exports. Tested with sample files, looked good, pushed to production.
Two weeks later, 3 AM alert.
Vendor changed export format. Not the whole thing, just ONE field on random rows. They added an unquoted comma.
```
# Expected format
product_name,price,stock
Widgets,12.99,45

# Row 23,487
Widgets, Premium Edition,12.99,45
```
That comma shifted columns. Parser read "Premium Edition" as price, tried float conversion, crashed.
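You can see the silent shift in two lines (a minimal repro, not from the original incident code):

```python
import csv
import io

# The malformed row parses "successfully" -- csv.reader just returns four
# fields instead of three, shifting every column one place to the right.
bad_line = "Widgets, Premium Edition,12.99,45"
row = next(csv.reader(io.StringIO(bad_line)))
print(row)       # ['Widgets', ' Premium Edition', '12.99', '45']
print(len(row))  # 4, not the expected 3
```

No exception, no warning. `row[1]` is now `' Premium Edition'`, which is exactly what hit the float conversion.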
Spent an hour thinking it was my regex. Nope.
Thought maybe encoding issue. Nope.
Finally printed the raw row. There it was. Unquoted comma in product name field.
Tests missed it because the first 20K rows were clean. The malformed rows only appeared when product names contained certain keywords, and my test CSVs had none of them.
No validation existed. Python's csv module parsed the file without error, so I assumed the data was valid. Bad assumption: the parser only checks syntax, not shape.
Fixed it:
```python
import csv

def validate_row(row, expected_cols=3):
    if len(row) != expected_cols:
        return False, f"Expected {expected_cols} cols, got {len(row)}"
    try:
        float(row[1])  # price should be numeric
    except ValueError:
        return False, f"Not a number: {row[1]}"
    return True, None

with open('export.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip header
    # start=2 so reported row numbers match line numbers in the file
    for i, row in enumerate(reader, start=2):
        valid, error = validate_row(row)
        if not valid:
            print(f"Row {i}: {error}")
            continue
        process_product(row)  # existing processing code
```
Validate column count and types before processing. Log bad rows, skip them, don't explode.
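For what it's worth, this is what the vendor's export should have looked like: per RFC 4180, a field containing a comma gets quoted, and Python's `csv.writer` does that automatically (a small illustration, not part of the fix):

```python
import csv
import io

# csv.writer quotes any field containing a delimiter (QUOTE_MINIMAL default),
# so the embedded comma survives a round trip intact.
buf = io.StringIO()
csv.writer(buf).writerow(["Widgets, Premium Edition", "12.99", "45"])
print(buf.getvalue())  # '"Widgets, Premium Edition",12.99,45\r\n'

row = next(csv.reader(io.StringIO(buf.getvalue())))
print(len(row))  # 3 -- the comma stays inside the quoted field
```

Defensive validation still matters, though: you don't control the vendor's writer.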
Should have tested with actual production data instead of clean samples. Dry run mode would have caught this without waking me up at 3 AM.
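A minimal sketch of that dry-run idea, assuming a hypothetical `run()` entry point and repeating the validation so the snippet stands alone: with `dry_run=True` it reports every bad row without touching the real processing code.

```python
import csv

def validate_row(row, expected_cols=3):
    # Same checks as in the fix: column count first, then numeric price.
    if len(row) != expected_cols:
        return False, f"Expected {expected_cols} cols, got {len(row)}"
    try:
        float(row[1])
    except ValueError:
        return False, f"Not a number: {row[1]}"
    return True, None

def run(csv_path, dry_run=False, process=print):
    """Validate every data row; call `process` only when not dry-running."""
    bad = 0
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip header line
        for i, row in enumerate(reader, start=2):  # match file line numbers
            valid, error = validate_row(row)
            if not valid:
                bad += 1
                print(f"Row {i}: {error}")
            elif not dry_run:
                process(row)
    return bad
```

Wire `dry_run` to a `--dry-run` CLI flag and you can point it at a real production export before deploying, on your own schedule instead of at 3 AM.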