Someone drops a CSV export in your lap. Before you can do anything with it you need the basics: how many rows? what are the columns? which ones are mostly empty? what's the range on that amount field, and how many distinct status values are there?
The reflex is to reach for pandas:
import pandas as pd
df = pd.read_csv("data.csv")
df.shape; df.dtypes; df.isna().mean(); df["amount"].describe(); df["status"].value_counts()
That's a pip install pandas, a Python session, and five API calls you half-remember — to answer questions a glance should answer. csvkit is nicer but still a dependency install; Excel chokes on big files and isn't in your terminal.
So I built csvsight — one command, zero dependencies:
npx csvsight data.csv
data.csv — 10,432 rows × 5 columns (comma, utf-8)
#  column   type    nulls       unique  detail
─  ───────  ──────  ──────────  ──────  ─────────────────────────────────────────
1  id       int     0 (0.0%)    10,432  min 1 · max 10432 · mean 5216.5
2  email    string  12 (0.1%)   10,411  e.g. "ada@example.com" · len 9–48
3  amount   float   34 (0.3%)   2,015   min 0.01 · max 9999 · mean 42.3
4  status   string  0 (0.0%)    3       active (61%) · churned (28%) · trial (11%)
5  country  string  120 (1.1%)  47      US (40%) · GB (12%) · DE (7%)
That's the whole tool. It:
- auto-detects the delimiter (comma, tab, semicolon, or pipe),
- infers each column's type (int / float / string) from the actual values,
- counts 10+ spellings of "missing" (NULL, N/A, nan, none, -, empty, …), because real-world CSVs are wildly inconsistent about nulls,
- shows min / max / mean for numbers, value distributions for low-cardinality columns, and an example + length range for free text.
--json gives the same analysis machine-readable; --no-header, --top N, and --delimiter are there when you need them.
Why zero dependencies matters here
This is the kind of thing you want to run now, on whatever machine you're on, without setting up an environment. It's pure standard library: a hand-rolled CSV parser (so quoting and embedded newlines work) plus the profiling. Run it with npx or pipx and you're done. The Node and Python ports behave identically, producing byte-for-byte matching output.
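For a sense of what "hand-rolled CSV parser" involves, here's a minimal sketch of the usual character-by-character state machine. This is my own illustration, not csvsight's code: `parse_csv` is a hypothetical name, and it only covers double-quoted fields, doubled-quote escapes, and LF / CRLF / CR line endings.

```python
def parse_csv(text, delimiter=","):
    """Parse CSV text into a list of rows, honoring double-quoted fields."""
    rows, row, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # a doubled quote is an escaped quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # delimiters and newlines are literal here
        elif ch == '"':
            in_quotes = True
        elif ch == delimiter:
            row.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat CRLF as a single line break
            row.append("".join(field))
            field = []
            rows.append(row)
            row = []
        else:
            field.append(ch)
        i += 1
    if field or row:  # final row when the text has no trailing newline
        row.append("".join(field))
        rows.append(row)
    return rows
```

The key design point is that the newline check only runs outside quotes, which is why a quoted field can span several lines without being split into extra rows. In Python the standard `csv` module already handles this; the sketch just shows how little machinery the core idea needs.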
It's intentionally not a pandas replacement — no transforms, no querying. It answers exactly one question: what's in this file?
Install
npx csvsight data.csv # Node >= 18
pip install csvsight # Python >= 3.8
- npm: https://www.npmjs.com/package/csvsight
- PyPI: https://pypi.org/project/csvsight
- GitHub: https://github.com/jjdoor/csvsight
When a random CSV lands on you, what's your first move — pandas, csvkit, Excel, head + eyeballing? And what's the one stat you always end up wanting that a tool like this should show?