Spent 2 hours debugging CSV import. Turns out Excel added invisible characters.
Client sent me a CSV file yesterday. Simple product list with prices. Imported it into my Python script, and half the rows failed validation.
Opened the CSV in a text editor. Everything looked fine. Columns were there. Data looked clean. Ran the script again. Same errors.
## What broke
My script was parsing product prices like this:
```python
import csv

with open('products.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        try:
            price = float(row['Price'])
            print(f"{row['Product']}: ${price}")
        except ValueError:
            print(f"Failed: {row['Product']} - Bad price: '{row['Price']}'")
```
Output showed weird stuff:
```
Product A: $19.99
Failed: Product B - Bad price: '29.99'
Product C: $39.99
Failed: Product D - Bad price: '49.99'
```
The prices that failed looked identical to the ones that worked. Copy pasted them into the Python REPL. Worked fine there.
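You can reproduce the mismatch in the REPL once you know what to look for. repr() is the trick that makes the hidden character visible (the values here are made up to mirror the failing rows):

```python
clean = "29.99"
dirty = "\ufeff29.99"  # same text with a BOM (U+FEFF) prepended

print(dirty)           # prints identically to clean -- the BOM is zero-width
print(repr(dirty))     # repr() exposes the hidden character: '\ufeff29.99'
print(clean == dirty)  # False

try:
    float(dirty)       # this is exactly what the import script was doing
except ValueError:
    print("float() rejects it")
```

Copy-pasting from terminal output usually drops the BOM, which is why the pasted values worked in the REPL.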
Fun.
## Figured it out eventually
After way too long staring at this, I hexdumped the CSV file:
```
hexdump -C products.csv | head -20
```
Found this:
```
00000030  50 72 6f 64 75 63 74 20  42 2c ef bb bf 32 39 2e  |Product B,...29.|
00000040  39 39 0d 0a 50 72 6f 64  75 63 74 20 43 2c 33 39  |99..Product C,39|
```
See those ef bb bf bytes? That's a UTF-8 BOM (byte order mark). Excel loves writing one at the start of every CSV it exports, and as the hexdump shows, stray BOMs can end up at the start of individual fields too. Invisible in text editors. Breaks everything.
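If hexdump isn't handy, the same check works from Python by reading the file in binary mode. A quick sketch (find_boms is just a name I made up):

```python
def find_boms(path):
    """Return the byte offset of every UTF-8 BOM (ef bb bf) in a file."""
    data = open(path, 'rb').read()
    offsets, start = [], 0
    # bytes.find returns -1 when no more matches exist
    while (i := data.find(b'\xef\xbb\xbf', start)) != -1:
        offsets.append(i)
        start = i + 1
    return offsets
```

A BOM at offset 0 is the normal Excel export artifact; any other offset means one is lurking inside your data.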
The failing prices had a BOM prepended, so float() choked: the string was actually \ufeff29.99, not 29.99.
## Fix
Strip the BOM when reading:
```python
import csv

with open('products.csv', 'r', encoding='utf-8-sig') as f:
    reader = csv.DictReader(f)
    for row in reader:
        # lstrip handles BOMs that appear inside fields, not just at the file start
        price = float(row['Price'].lstrip('\ufeff'))
        print(f"{row['Product']}: ${price}")
```

The encoding='utf-8-sig' parameter tells Python to strip a BOM at the start of the file automatically. That only covers the leading one, though — the BOMs my hexdump found were inside fields, so the .lstrip('\ufeff') handles those. All rows parsed fine after that.
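If you get CSVs from clients regularly, it may be worth scrubbing every field rather than just prices. A defensive sketch (scrub and the character list are my own choices, not a standard):

```python
import csv
import io

# Characters that commonly sneak in invisibly:
# BOM, zero-width space, non-breaking space
INVISIBLES = '\ufeff\u200b\u00a0'

def scrub(value):
    """Strip invisible characters and surrounding whitespace from one field."""
    for ch in INVISIBLES:
        value = value.replace(ch, '')
    return value.strip()

# Demo with an in-memory CSV containing a mid-field BOM:
raw = "Product,Price\nProduct B,\ufeff29.99\n"
for row in csv.DictReader(io.StringIO(raw)):
    price = float(scrub(row['Price']))
    print(f"{row['Product']}: ${price}")  # Product B: $29.99
```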
When CSV imports fail randomly and you can't see why, check for invisible characters. hexdump is your friend.