CSV quote marks broke my parser. 2,000 records silently corrupted.

#python #data #csv #programming

Got a CSV from the marketing team last week. Simple request: clean it up and import into the database.

Opened it in Excel first (mistake #1). Looked fine. 2,000 rows, standard product data. SKU, name, price, quantity.

Wrote a quick parser in Python.

import csv

with open('products.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)

Printed fine. Pushed to the database.

Two hours later, the frontend stopped rendering prices correctly.

Debugging time.

Turns out the CSV had quote marks inside text fields. The product name column had entries like:

"Ergonomic Desk Chair (with "ergonomic" mesh)"

My parser treated that inner quote as the field terminator. Everything after "with " got truncated. So instead of "Ergonomic Desk Chair (with ergonomic mesh)" we got "Ergonomic Desk Chair (with ".

The database had half product names. The frontend was trying to parse "$" and failing.

Excel had hidden this from me. It auto-escapes quotes in the display. The CSV looked clean but the actual file content was a mess.

Fixed it with:

import csv

with open('products.csv', 'r') as f:
    reader = csv.DictReader(f, quotechar='"', doublequote=True)
    for row in reader:
        print(row)

Wait, that still didn't work. The actual fix needed csv.QUOTE_ALL behavior or manual preprocessing.

Ended up with:

import csv

with open('products.csv', 'r') as f:
    content = f.read()
    # Escape unescaped quotes manually
    lines = []
    for line in content.split('\n'):
        # Handle cases where quotes aren't properly escaped
        fixed = line.replace('""', '<<QUOTE>>')
        fixed = fixed.replace('"', '""')
        fixed = fixed.replace('<<QUOTE>>', '""')
        lines.append(fixed)

    reader = csv.DictReader(lines, quotechar='"', doublequote=True)
    for row in reader:
        print(row)

Not pretty but it worked.

I'm just gonna validate CSV files from now on. Every single one. And marketing saying it's "simple" before sending it is basically a red flag at this point honestly.

DEV Community

CSV quote marks broke my parser. 2,000 records silently corrupted.

Top comments (0)