
DEV Community

Yasin Islam

Posted on Apr 22

I built a CLI tool to quickly sanity-check CSV files (tidypeek)

#python #cli #data #opensource

Working with CSV files is annoying.

You load a dataset and immediately start wondering:

Are there missing values?
Are there duplicate rows?
Which column is the actual ID?
Is this dataset even clean enough to work with?

I found myself doing the same basic checks over and over again, so I built a small CLI tool to speed them up.

Introducing tidypeek

tidypeek is a lightweight command-line tool that gives you a quick sanity check of any CSV file.

You can install it with:

pip install tidypeek

and run:

tidypeek yourfile.csv

What it does

It analyzes your dataset and shows:

total rows and columns
column types
missing values
duplicate rows
likely identifier columns
duplicate IDs
simple insights about your data
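For readers curious what checks like these involve, here is a minimal sketch of a few of them using only Python's standard library. It is not tidypeek's implementation. The type-inference rule, treating empty cells as missing, and the rule that a column named `id` or ending in `_id` is an identifier are all assumptions made for illustration, and `sanity_check` is a hypothetical function name.

```python
import csv
import io
from collections import Counter


def _cell(row, col):
    """Read one cell; short rows give None for absent fields, so treat None as empty."""
    return (row[col] or "").strip()


def infer_type(values):
    """Guess a column type from its string cells (a simple heuristic, an assumption)."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "empty"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue  # at least one cell failed; try the next, looser type
    return "text"


def sanity_check(csv_text):
    """Hypothetical helper: shape, types, missing cells, duplicate rows and duplicate IDs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []

    report = {"rows": len(rows), "columns": len(columns)}
    report["types"] = {c: infer_type([_cell(r, c) for r in rows]) for c in columns}
    report["missing"] = {c: sum(1 for r in rows if _cell(r, c) == "") for c in columns}

    # Exact duplicate rows: count each copy beyond the first occurrence.
    row_counts = Counter(tuple(_cell(r, c) for c in columns) for r in rows)
    report["duplicate_rows"] = sum(n - 1 for n in row_counts.values())

    # Assumed ID rule (hypothetical): the column is named "id" or ends in "_id".
    id_cols = [c for c in columns if c.lower() == "id" or c.lower().endswith("_id")]
    report["id_columns"] = id_cols
    # Duplicate IDs: repeated non-empty values inside an ID column.
    report["duplicate_ids"] = {
        c: sum(n - 1 for n in Counter(_cell(r, c) for r in rows if _cell(r, c)).values())
        for c in id_cols
    }
    return report


if __name__ == "__main__":
    sample = "id,name,score\n1,Ann,90\n2,Bob,\n2,Bob,\n3,,75\n"
    print(sanity_check(sample))
```

On the sample above, the sketch reports 4 rows, 3 columns, one duplicate row and one duplicate value in `id`. Counting only the extra copies of a repeated row (rather than every copy) is one reasonable convention; other tools may count differently.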

Example output

Why I built it

Most tools are either:

too heavy (full profiling libraries)
or too manual (writing the same pandas code every time)

I wanted something:

fast
simple
terminal-based
useful before real analysis

Some example insights it gives

“4 columns have high missing values”
“Column ‘name’ appears to be an identifier but contains duplicates”
“12 columns have low uniqueness — useful for grouping”
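Insights like these can come from simple ratio rules over each column. The sketch below is a guess at how such messages might be produced, not tidypeek's actual logic: the thresholds (more than 30% empty cells, at least 90% distinct values for a likely identifier, at most 20% distinct values for low uniqueness) and the function name `insights` are hypothetical.

```python
import csv
import io

# Hypothetical thresholds; the real cutoffs are not stated in this post.
HIGH_MISSING_RATIO = 0.3    # "high missing": more than 30% empty cells
ID_UNIQUENESS_RATIO = 0.9   # "likely identifier": at least 90% distinct non-empty values
LOW_UNIQUENESS_RATIO = 0.2  # "low uniqueness": at most 20% distinct non-empty values


def _columns(k):
    """Pluralise the column count for the summary messages."""
    return "1 column has" if k == 1 else f"{k} columns have"


def insights(csv_text):
    """Return human-readable warnings about a CSV using simple ratio heuristics."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    if not rows:
        return []
    n = len(rows)

    # Normalise cells: None (short rows) and whitespace-only cells count as missing.
    cols = {c: [(r[c] or "").strip() for r in rows] for c in columns}
    messages = []

    high_missing = [c for c, vals in cols.items() if vals.count("") / n > HIGH_MISSING_RATIO]
    if high_missing:
        messages.append(f"{_columns(len(high_missing))} high missing values")

    low_unique = []
    for c, vals in cols.items():
        present = [v for v in vals if v]
        if not present:
            continue
        distinct = len(set(present))
        ratio = distinct / len(present)
        if ratio >= ID_UNIQUENESS_RATIO and distinct < len(present):
            # Almost every value is unique, yet some repeat: a suspicious "ID".
            messages.append(f"Column '{c}' appears to be an identifier but contains duplicates")
        elif ratio <= LOW_UNIQUENESS_RATIO:
            low_unique.append(c)
    if low_unique:
        messages.append(f"{_columns(len(low_unique))} low uniqueness, useful for grouping")
    return messages
```

Ratio thresholds keep the rules independent of dataset size, at the cost of needing tuning: on very small files a single repeated value can move a column across a cutoff.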

Thoughts

This is still v1, but it's already useful for:

quick dataset inspection
data cleaning workflows
learning data analysis

GitHub

https://github.com/Yasliu/TidyPeek

PyPI

https://pypi.org/project/tidypeek/

If you work with CSVs a lot, I'd love feedback on what else to add.
