Working with CSV files is annoying.
You load a dataset and immediately start wondering:
- Are there missing values?
- Are there duplicate rows?
- Which column is the actual ID?
- Is this dataset even clean enough to work with?
I found myself running the same basic checks over and over, so I built a small CLI tool to speed them up.
Introducing tidypeek
tidypeek is a lightweight command-line tool that gives you a quick sanity check of any CSV file.
You can install it with:
pip install tidypeek
and run:
tidypeek yourfile.csv
What it does
It analyzes your dataset and shows:
- total rows and columns
- column types
- missing values
- duplicate rows
- likely identifier columns
- duplicate IDs
- simple insights about your data
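To make the first few checks concrete, here is a minimal standard-library sketch of how row and column counts, column types, missing values, and duplicate rows could be computed. It is not tidypeek's implementation. `profile_csv`, `infer_type`, and the integer/float/text typing rule are hypothetical names and choices made for illustration.

```python
import csv
import io
from collections import Counter


def infer_type(values):
    """Guess a column type from its non-empty values: integer, float, text, or empty."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "empty"

    def all_parse(cast):
        # True only if every non-empty value converts without a ValueError.
        try:
            for v in non_empty:
                cast(v)
        except ValueError:
            return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "text"


def profile_csv(text):
    """Compute basic sanity-check statistics for CSV text with a header row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    rows = [row for row in reader if row]  # skip completely blank lines
    # Pad short rows with "" so every column has one value per row.
    columns = {
        name: [row[i] if i < len(row) else "" for row in rows]
        for i, name in enumerate(header)
    }
    row_counts = Counter(tuple(row) for row in rows)
    return {
        "rows": len(rows),
        "columns": len(header),
        "types": {name: infer_type(vals) for name, vals in columns.items()},
        "missing": {name: sum(1 for v in vals if not v.strip()) for name, vals in columns.items()},
        # Every extra copy of an identical row counts as one duplicate.
        "duplicate_rows": sum(c - 1 for c in row_counts.values() if c > 1),
    }


SAMPLE_CSV = "id,name,age\n1,Ann,30\n2,Bob,\n2,Bob,\n3,,41\n"
print(profile_csv(SAMPLE_CSV))
```

For a real file you would read the text first, for example `with open("yourfile.csv", newline="") as f: profile_csv(f.read())`. Working from text keeps the sketch easy to test without touching the filesystem.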
Example output
Why I built it
Most tools are either:
- too heavy (full profiling libraries)
- or too manual (writing the same pandas code every time)
I wanted something:
- fast
- simple
- terminal-based
- useful before real analysis
Some example insights it gives
- “4 columns have high missing values”
- “Column ‘name’ appears to be an identifier but contains duplicates”
- “12 columns have low uniqueness — useful for grouping”
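Insights like these read as simple rules layered on top of per-column statistics. The sketch below illustrates that idea under stated assumptions: the thresholds (`HIGH_MISSING_RATIO`, `ID_UNIQUENESS_RATIO`, `LOW_UNIQUENESS_RATIO`), the `_id` naming heuristic, and `generate_insights` itself are hypothetical choices for illustration, not tidypeek's actual rules.

```python
import csv
import io

# Hypothetical thresholds for illustration only; tidypeek's real rules may differ.
HIGH_MISSING_RATIO = 0.3     # flag columns with at least 30% empty cells
ID_UNIQUENESS_RATIO = 0.9    # fully populated columns this unique look like IDs
LOW_UNIQUENESS_RATIO = 0.25  # at most 25% as many distinct values as rows


def count_phrase(k):
    """'1 column has' vs. '3 columns have'."""
    return f"{k} column has" if k == 1 else f"{k} columns have"


def generate_insights(text):
    """Turn simple per-column statistics into human-readable messages."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    n = len(rows)
    if n == 0:
        return ["Dataset has no data rows"]

    high_missing, low_unique, messages = [], [], []
    for col in reader.fieldnames:
        # DictReader uses None for fields missing from short rows; treat as empty.
        values = [(row.get(col) or "").strip() for row in rows]
        present = [v for v in values if v]
        distinct = len(set(present))

        if (n - len(present)) / n >= HIGH_MISSING_RATIO:
            high_missing.append(col)

        # Identifier-like: an ID-style name, or a fully populated, nearly unique column.
        name = col.lower()
        looks_like_id = name == "id" or name.endswith("_id") or (
            len(present) == n and distinct / n >= ID_UNIQUENESS_RATIO
        )
        if looks_like_id:
            if distinct < len(present):
                messages.append(f"Column '{col}' appears to be an identifier but contains duplicates")
        elif 0 < distinct <= LOW_UNIQUENESS_RATIO * n:
            low_unique.append(col)

    if high_missing:
        messages.insert(0, f"{count_phrase(len(high_missing))} high missing values")
    if low_unique:
        messages.append(f"{count_phrase(len(low_unique))} low uniqueness, useful for grouping")
    return messages


ORDERS_CSV = (
    "customer_id,city,notes\n"
    "1,Oslo,late\n"
    "2,Oslo,vip\n"
    "3,Rome,gift\n"
    "4,Rome,\n"
    "5,Oslo,\n"
    "6,Rome,\n"
    "7,Oslo,\n"
    "8,Rome,\n"
    "9,Oslo,\n"
    "9,Rome,\n"
)
for line in generate_insights(ORDERS_CSV):
    print(line)
```

On this sample, `customer_id` is flagged as an identifier with a duplicate (`9`), `city` has only two distinct values across ten rows, and `notes` is mostly empty. Keeping the rules as plain thresholds makes them easy to tune per dataset.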
Thoughts
This is still v1, but it's already useful for:
- quick dataset inspection
- data cleaning workflows
- learning data analysis
GitHub
https://github.com/Yasliu/TidyPeek
PyPI
https://pypi.org/project/tidypeek/
If you work with CSVs a lot, I'd love feedback on what else to add.