Got a CSV from the marketing team last week. Simple request: clean it up and import into the database.
Opened it in Excel first (mistake #1). Looked fine. 2,000 rows, standard product data. SKU, name, price, quantity.
Wrote a quick parser in Python.
import csv
with open('products.csv', 'r') as f:
reader = csv.DictReader(f)
for row in reader:
print(row)
Printed fine. Pushed to the database.
Two hours later, the frontend stopped rendering prices correctly.
Debugging time.
Turns out the CSV had quote marks inside text fields. The product name column had entries like:
"Ergonomic Desk Chair (with "ergonomic" mesh)"
My parser treated that inner quote as the field terminator. Everything after "with " got truncated. So instead of "Ergonomic Desk Chair (with ergonomic mesh)" we got "Ergonomic Desk Chair (with ".
The database had half product names. The frontend was trying to parse "$" and failing.
Excel had hidden this from me. It auto-escapes quotes in the display. The CSV looked clean but the actual file content was a mess.
Fixed it with:
import csv
with open('products.csv', 'r') as f:
reader = csv.DictReader(f, quotechar='"', doublequote=True)
for row in reader:
print(row)
Wait, that still didn't work. The actual fix needed csv.QUOTE_ALL behavior or manual preprocessing.
Ended up with:
import csv
with open('products.csv', 'r') as f:
content = f.read()
# Escape unescaped quotes manually
lines = []
for line in content.split('\n'):
# Handle cases where quotes aren't properly escaped
fixed = line.replace('""', '<<QUOTE>>')
fixed = fixed.replace('"', '""')
fixed = fixed.replace('<<QUOTE>>', '""')
lines.append(fixed)
reader = csv.DictReader(lines, quotechar='"', doublequote=True)
for row in reader:
print(row)
Not pretty but it worked.
I'm just gonna validate CSV files from now on. Every single one. And marketing saying it's "simple" before sending it is basically a red flag at this point honestly.
Top comments (0)