DEV Community

Nico Reyes
Nico Reyes

Posted on

Spent 2 hours debugging CSV import. Turns out Excel added invisible characters.

Spent 2 hours debugging CSV import. Turns out Excel added invisible characters.

Client sent me a CSV file yesterday. Simple product list with prices. Import it into my Python script and half the rows fail validation.

Opened the CSV in a text editor. Everything looks fine. Columns are there. Data looks clean. Run the script again. Same errors.

What broke

My script was parsing product prices like this:

import csv

with open('products.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        try:
            price = float(row['Price'])
            print(f"{row['Product']}: ${price}")
        except ValueError:
            print(f"Failed: {row['Product']} - Bad price: '{row['Price']}'")
Enter fullscreen mode Exit fullscreen mode

Output showed weird stuff:

Product A: $19.99
Failed: Product B - Bad price: '29.99'
Product C: $39.99
Failed: Product D - Bad price: '49.99'
Enter fullscreen mode Exit fullscreen mode

The prices that failed looked identical to the ones that worked. Copy pasted them into the Python REPL. Worked fine there.

Fun.

Figured it out eventually

After way too long staring at this, I hexdumped the CSV file:

hexdump -C products.csv | head -20
Enter fullscreen mode Exit fullscreen mode

Found this:

00000030  50 72 6f 64 75 63 74 20  42 2c ef bb bf 32 39 2e  |Product B,...29.|
00000040  39 39 0d 0a 50 72 6f 64  75 63 74 20 43 2c 33 39  |99..Product C,39|
Enter fullscreen mode Exit fullscreen mode

See those ef bb bf bytes? UTF8 BOM (Byte Order Mark). Excel loves adding these to the start of fields when you export to CSV. Invisible in text editors. Breaks everything.

The failing prices had BOM characters prepended. So float() choked on them because technically the string was \ufeff29.99 not 29.99.

Fix

Strip the BOM when reading:

import csv

with open('products.csv', 'r', encoding='utf-8-sig') as f:
    reader = csv.DictReader(f)
    for row in reader:
        price = float(row['Price'])
        print(f"{row['Product']}: ${price}")
Enter fullscreen mode Exit fullscreen mode

The encoding='utf-8-sig' parameter tells Python to strip BOM characters automatically. All rows parsed fine after that.

When CSV imports fail randomly and you can't see why, check for invisible characters. hexdump is your friend.

Top comments (0)