DEV Community

benjamin


You don't need pandas to see what's in a CSV — I built a zero-dep CLI for it

Someone drops a CSV export in your lap. Before you can do anything with it, you need the basics: How many rows? What are the columns? Which ones are mostly empty? What's the range on that amount field, and how many distinct status values are there?

The reflex is to reach for pandas:

import pandas as pd
df = pd.read_csv("data.csv")
df.shape; df.dtypes; df.isna().mean(); df["amount"].describe(); df["status"].value_counts()

That's a pip install pandas, a Python session, and five API calls you half-remember, all to answer questions a glance should answer. csvkit is nicer, but it's still a dependency to install; Excel chokes on big files (a worksheet tops out at 1,048,576 rows) and isn't in your terminal.

So I built csvsight — one command, zero dependencies:

npx csvsight data.csv
data.csv — 10,432 rows × 5 columns  (comma, utf-8)

#  column   type    nulls       unique  detail
─  ───────  ──────  ──────────  ──────  ─────────────────────────────────────────
1  id       int     0 (0.0%)    10,432  min 1 · max 10432 · mean 5216.5
2  email    string  12 (0.1%)   10,411  e.g. "ada@example.com" · len 9–48
3  amount   float   34 (0.3%)   2,015   min 0.01 · max 9999 · mean 42.3
4  status   string  0 (0.0%)    3       active (61%) · churned (28%) · trial (11%)
5  country  string  120 (1.1%)  47      US (40%) · GB (12%) · DE (7%)

That's the whole tool. It:

  • auto-detects the delimiter (comma, tab, semicolon, or pipe),
  • infers each column's type (int / float / string) from the actual values,
  • counts 10+ spellings of "missing" (NULL, N/A, nan, none, -, empty, …), because real-world CSVs are wildly inconsistent about nulls,
  • shows min / max / mean for numbers, value distributions for low-cardinality columns, and an example + length range for free text.
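The post doesn't show csvsight's internals, so here is a minimal Python sketch of the per-column logic the list describes, under stated assumptions: cells arrive as raw strings, the null tokens beyond the ones named above are illustrative guesses, and the cut-off of 20 distinct values between "show a distribution" and "treat as free text" is a hypothetical choice. profile_column and the helper names are made up for this example, not csvsight's API.

```python
import math
from collections import Counter

# Illustrative "missing" spellings, matched case-insensitively after trimming.
# The post names NULL, N/A, nan, none, -, and empty; the rest are guesses.
NULL_TOKENS = {"", "null", "n/a", "nan", "none", "nil", "-", "#n/a", "?"}

# Hypothetical cut-off: columns with at most this many distinct values get a distribution.
LOW_CARDINALITY = 20


def is_null(cell: str) -> bool:
    """True if the trimmed, lowercased cell is one of the known 'missing' spellings."""
    return cell.strip().lower() in NULL_TOKENS


def parse_number(cell: str):
    """Return an int or a finite float for numeric text, otherwise None."""
    text = cell.strip()
    try:
        return int(text)
    except ValueError:
        pass
    try:
        number = float(text)
    except ValueError:
        return None
    return number if math.isfinite(number) else None  # reject inf/-inf


def infer_type(cells) -> str:
    """'int' if every non-null cell is an integer, 'float' if every one is numeric, else 'string'."""
    parsed = [parse_number(c) for c in cells if not is_null(c)]
    if not parsed or any(p is None for p in parsed):
        return "string"
    return "int" if all(isinstance(p, int) for p in parsed) else "float"


def profile_column(cells, top=3) -> dict:
    """Summarize one column given as a list of raw string cells."""
    present = [c.strip() for c in cells if not is_null(c)]
    kind = infer_type(cells)
    summary = {"type": kind, "nulls": len(cells) - len(present), "unique": len(set(present))}
    if kind != "string":
        numbers = [parse_number(c) for c in present]
        summary.update(min=min(numbers), max=max(numbers), mean=sum(numbers) / len(numbers))
    elif summary["unique"] <= LOW_CARDINALITY:
        # Low cardinality: most common values with their percentage of non-null cells.
        counts = Counter(present).most_common(top)
        summary["top"] = [(value, round(100 * n / len(present))) for value, n in counts]
    else:
        # Free text: one example value plus the range of lengths.
        lengths = [len(c) for c in present]
        summary.update(example=present[0], length=(min(lengths), max(lengths)))
    return summary
```

For example, profile_column(["1", "2", "3", "N/A"]) returns {'type': 'int', 'nulls': 1, 'unique': 3, 'min': 1, 'max': 3, 'mean': 2.0}. Trying int before float is what keeps an id column reported as int rather than float.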

--json emits the same analysis in machine-readable form; --no-header, --top N, and --delimiter are there when you need them.
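The post doesn't describe how the delimiter is auto-detected. One common heuristic, sketched here in Python as an assumption rather than csvsight's actual logic, is to count each candidate (comma, tab, semicolon, pipe) outside double quotes on the first few lines and prefer one that appears on every line the same number of times; presumably an explicit --delimiter skips this guess. detect_delimiter and count_outside_quotes are hypothetical names.

```python
# Candidate delimiters from the post: comma, tab, semicolon, pipe.
CANDIDATES = [",", "\t", ";", "|"]


def count_outside_quotes(line: str, delim: str) -> int:
    """Count occurrences of delim that are not inside a double-quoted span."""
    count, in_quotes = 0, False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes  # a doubled "" toggles twice, so it nets out
        elif ch == delim and not in_quotes:
            count += 1
    return count


def detect_delimiter(sample: str, max_lines: int = 20) -> str:
    """Guess the delimiter from the first few non-blank lines; fall back to a comma."""
    lines = [line for line in sample.splitlines() if line.strip()][:max_lines]
    best, best_score = ",", (False, 0)
    for delim in CANDIDATES:
        counts = [count_outside_quotes(line, delim) for line in lines]
        if not counts or min(counts) == 0:
            continue  # must appear on every sampled line
        # Prefer a delimiter that occurs the same number of times on every line,
        # then the one that splits the first line into more columns.
        score = (len(set(counts)) == 1, counts[0])
        if score > best_score:
            best, best_score = delim, score
    return best
```

With this sketch, detect_delimiter('name;note\nAda;"x, y, z"\nBob;ok\n') picks ";" even though commas appear inside the quoted field. Because it counts per line, a quoted field spanning several lines can skew the tally, so it's a guess to override, not a parser. Python's standard-library csv.Sniffer takes a broadly similar frequency-based approach.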

Why zero dependencies matters here

This is the kind of thing you want to run right now, on whatever machine you're on, without setting up an environment. It's pure standard library: a hand-rolled CSV parser (so quoted fields and embedded newlines are handled correctly) plus the profiling code. Run it with npx or pipx and it just works. The Node and Python ports behave identically, down to byte-for-byte identical output.
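csvsight's parser isn't shown in the post. To illustrate what "hand-rolled, so quoting and embedded newlines work" involves, here is a small character-by-character state machine in Python, assuming RFC 4180-style quoting (a field wrapped in double quotes may contain delimiters and newlines, and "" stands for a literal quote). parse_csv is a hypothetical name; in Python alone, the standard library's csv module already does this job, and the sketch just shows why a plain line.split(",") isn't enough.

```python
def parse_csv(text: str, delimiter: str = ",") -> list:
    """Split text into rows of fields, honoring RFC 4180-style double quoting."""
    rows, row, field = [], [], []
    in_quotes = False
    pending = False  # True once the current record has any content
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"' and i + 1 < n and text[i + 1] == '"':
                field.append('"')            # doubled quote -> literal quote
                i += 1
            elif ch == '"':
                in_quotes = False            # closing quote
            else:
                field.append(ch)             # delimiters and newlines are literal here
        elif ch == '"' and not field:
            in_quotes = pending = True       # opening quote at the start of a field
        elif ch == delimiter:
            row.append("".join(field))
            field, pending = [], True
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                       # treat \r\n as a single line break
            row.append("".join(field))
            rows.append(row)
            row, field, pending = [], [], False
        else:
            field.append(ch)
            pending = True
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field")
    if pending:                              # last record had no trailing newline
        row.append("".join(field))
        rows.append(row)
    return rows
```

For example, parse_csv('2,"line one\nline two"') returns [['2', 'line one\nline two']]: the newline inside the quotes stays in the field instead of starting a new row, which is exactly what splitting on lines first would get wrong.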

It's intentionally not a pandas replacement — no transforms, no querying. It answers exactly one question: what's in this file?

Install

npx csvsight data.csv       # Node >= 18
pip install csvsight        # Python >= 3.8

When a random CSV lands on you, what's your first move — pandas, csvkit, Excel, head + eyeballing? And what's the one stat you always end up wanting that a tool like this should show?
