Built data validation for a product feed importer. CSV comes in, checks run, flags issues. Clean data goes to database.
Spent a week on it. Checked for empty fields, invalid URLs, price format, date format, SKU format. Passed every test.
First real import. 2000 products. Zero validation errors. Perfect.
Client checks dashboard next morning. "Why do 400 products say $0.00?"
Turns out zero is valid
Prices were there. Format was valid. Just all zeros.
My validation:
def validate_price(price):
    if not price:
        raise ValidationError("Price required")
    try:
        float_price = float(price)
    except ValueError:
        raise ValidationError("Price must be number")
    if float_price < 0:
        raise ValidationError("Price cannot be negative")
    return True
Looks fine. Checks for empty, checks format, checks negative.
What could go wrong?
Everything, apparently.
Zero is technically valid. Not empty. Not negative. Passes all checks. But no product costs $0. Obviously wrong data.
Went back through CSV. Supplier had encoding issues. Cells that should have been "$49.99" came through as "$0" after their export broke.
Validator saw valid numbers. Database accepted valid numbers. Dashboard showed valid numbers.
All wrong.
The one-line fix
Added range check. You're selling something, it costs something.
def validate_price(price):
    if not price:
        raise ValidationError("Price required")
    try:
        float_price = float(price)
    except ValueError:
        raise ValidationError("Price must be number")
    if float_price <= 0:  # Changed this
        raise ValidationError("Price must be greater than zero")
    return True
Reimported. 400 products flagged immediately. Got corrected feed from supplier. Imported clean.
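A small regression check is one way to keep this from creeping back. The sketch below is self-contained, so it repeats the fixed validator and defines `ValidationError` as a plain exception subclass. That class is an assumption; the real one may come from a framework. The `check` helper and the sample cells are made up for illustration, not taken from the real feed.

```python
class ValidationError(Exception):
    """Stand-in for the importer's real exception class (assumption)."""


def validate_price(price):
    if not price:
        raise ValidationError("Price required")
    try:
        float_price = float(price)
    except ValueError:
        raise ValidationError("Price must be number")
    if float_price <= 0:
        raise ValidationError("Price must be greater than zero")
    return True


def check(price):
    """Hypothetical helper: return the error message, or None if the price passes."""
    try:
        validate_price(price)
    except ValidationError as exc:
        return str(exc)
    return None


# Made-up sample cells covering the good case and each failure path.
for cell in ["49.99", "0", "0.00", "-5", "", "abc"]:
    print(repr(cell), "->", check(cell))
```

One more gap worth knowing about: `float()` also parses strings like "nan" and "inf", and both get past the `<= 0` check. If the feed could ever contain those, `math.isfinite` would be one way to catch them.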
Other nonsense I found
Started checking for stuff that looks valid but isn't:
Dates in future. Product available date set to 2099.
Weights of 0. Physically impossible unless you're selling air.
Descriptions that are just spaces. Technically not empty string.
Categories that don't exist in system. String validation passed, reference check didn't.
None of these broke format rules. All broke business rules.
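The four checks above could look roughly like this. A minimal sketch, not the actual importer code: the field names, the `KNOWN_CATEGORIES` set, and the one-year cutoff for availability dates are all assumptions, and the row is assumed to have already passed format validation and been parsed into Python types.

```python
from datetime import date, timedelta

# Hypothetical category list; in a real system this would come from the database.
KNOWN_CATEGORIES = {"shoes", "bags", "accessories"}

# Assumed tolerance: availability dates more than a year out are suspicious.
MAX_DAYS_AHEAD = 365


def logic_errors(row, today=None):
    """Return a list of business-rule problems for one already-parsed row."""
    today = today or date.today()
    errors = []

    if row["available_date"] > today + timedelta(days=MAX_DAYS_AHEAD):
        errors.append("Available date too far in future")
    if row["weight"] <= 0:
        errors.append("Weight must be greater than zero")
    if not row["description"].strip():  # catches whitespace-only text
        errors.append("Description is blank")
    if row["category"] not in KNOWN_CATEGORIES:
        errors.append("Unknown category")

    return errors


# Made-up row that is well-formed but breaks every rule.
bad_row = {
    "available_date": date(2099, 1, 1),
    "weight": 0.0,
    "description": "   ",
    "category": "unicorns",
}
print(logic_errors(bad_row, today=date(2024, 1, 1)))
```

Returning a list instead of raising on the first problem means one import run reports everything wrong with a row, which saves round trips with the supplier.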
Validator has two layers now. Format layer catches malformed data. Logic layer catches nonsense data that happens to be formatted correctly.
Still find weird edge cases, honestly. Log everything so I can add checks when something stupid gets through.
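Roughly how the two layers and the logging might fit together. This is a sketch under assumptions: the function names, the `feed_import` logger name, the row shape, and the stand-in `ValidationError` are all hypothetical, and only the price check is shown to keep it short.

```python
import logging

logger = logging.getLogger("feed_import")  # hypothetical logger name


class ValidationError(Exception):
    """Stand-in for the importer's real exception class (assumption)."""


def check_format(row):
    """Format layer: is the price a parseable number?"""
    try:
        row["price_value"] = float(row["price"])
    except (KeyError, TypeError, ValueError):
        raise ValidationError("Price must be number")


def check_logic(row):
    """Logic layer: does the well-formed value make business sense?"""
    if row["price_value"] <= 0:
        raise ValidationError("Price must be greater than zero")


def validate_rows(rows):
    """Split rows into accepted and rejected, logging both outcomes."""
    accepted, rejected = [], []
    for line_no, row in enumerate(rows, start=1):
        try:
            check_format(row)
            check_logic(row)
        except ValidationError as exc:
            logger.warning("row %d rejected: %s | %r", line_no, exc, row)
            rejected.append((line_no, str(exc)))
        else:
            # Accepted rows are logged too, so bad data that slipped
            # through can be traced back later.
            logger.info("row %d accepted: %r", line_no, row)
            accepted.append(row)
    return accepted, rejected


# Made-up rows: one clean, one zero price, one malformed.
sample = [
    {"sku": "A1", "price": "49.99"},
    {"sku": "A2", "price": "0"},
    {"sku": "A3", "price": "n/a"},
]
accepted, rejected = validate_rows(sample)
print(len(accepted), rejected)
```

Logging the accepted rows is the part that pays off later: when something wrong shows up on the dashboard, the log shows exactly what the validator saw and let through.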