Messy CSV exports are a tax on every data task: stray whitespace, duplicate rows, inconsistent headers, files too big to open. You don't always need pandas for this — Python's built-in csv module handles most of it with zero dependencies and code you can drop on any machine.
Here are three small patterns I reach for constantly.
1. Clean: dedupe, trim, fix headers
import csv
def clean(path, out):
seen = set()
with open(path, newline='', encoding='utf-8-sig') as f:
rows = list(csv.reader(f))
header = [h.strip().lower().replace(' ', '_') for h in rows[0]]
cleaned = []
for raw in rows[1:]:
cells = [c.strip() for c in raw]
if all(c == '' for c in cells):
continue # drop empty rows
key = tuple(cells)
if key in seen:
continue # drop duplicates
seen.add(key)
cleaned.append(cells)
with open(out, 'w', newline='', encoding='utf-8') as f:
w = csv.writer(f)
w.writerow(header)
w.writerows(cleaned)
Trims every cell, normalizes headers (First Name -> first_name), and removes empty/duplicate rows.
2. Split a huge CSV into chunks
import csv
def split(path, rows_per_file):
with open(path, newline='', encoding='utf-8-sig') as f:
reader = csv.reader(f)
header = next(reader)
chunk, part = [], 1
for row in reader:
chunk.append(row)
if len(chunk) >= rows_per_file:
_write(f'part{part}.csv', header, chunk); part += 1; chunk = []
if chunk:
_write(f'part{part}.csv', header, chunk)
def _write(name, header, rows):
with open(name, 'w', newline='', encoding='utf-8') as f:
w = csv.writer(f); w.writerow(header); w.writerows(rows)
3. Merge many CSVs into one
import csv, glob
def merge(pattern, out):
header = None
with open(out, 'w', newline='', encoding='utf-8') as o:
w = csv.writer(o)
for path in glob.glob(pattern):
with open(path, newline='', encoding='utf-8-sig') as f:
r = csv.reader(f)
h = next(r)
if header is None:
header = h; w.writerow(header)
w.writerows(r)
Why no pandas?
For one-off cleanups and small tools, the stdlib csv module is faster to ship: no install, runs anywhere Python 3.8+ runs, and the code stays readable enough to tweak for your own rules.
I cleaned these up into a tiny toolkit (proper CLI flags, edge cases, comments) so I stop rewriting them. If you'd rather grab them ready-made, it's $10 with full source: https://ko-fi.com/s/bfedf3fb78
What's your go-to for quick CSV wrangling?
Top comments (0)