DEV Community

Yasin Islam


I built a CLI tool to quickly sanity-check CSV files (tidypeek)

Working with CSV files is annoying.

You load a dataset and immediately start wondering:

  • Are there missing values?
  • Are there duplicate rows?
  • Which column is the actual ID?
  • Is this dataset even clean enough to work with?

I found myself doing the same basic checks over and over again, so I built a small CLI tool to speed them up.


Introducing tidypeek

tidypeek is a lightweight command-line tool that gives you a quick sanity check of any CSV file.

You can install it with:

pip install tidypeek

and run:

tidypeek yourfile.csv


What it does

It analyzes your dataset and shows:

  • total rows and columns
  • column types
  • missing values
  • duplicate rows
  • likely identifier columns
  • duplicate IDs
  • simple insights about your data
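If you are curious what checks like these involve, here is a minimal sketch using only Python's standard library (`csv`, `collections.Counter`). It is not tidypeek's source code: the function names (`sanity_check`, `infer_type`, `looks_like_id`), the simple int/float/str type inference, and the identifier heuristic (a column named `id` or `*_id`, or at least 95% distinct values) are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Assumed heuristic (not tidypeek's actual rule): a column is a likely
# identifier if it is named "id" / "*_id" or if at least 95% of its
# non-missing values are distinct.
ID_UNIQUENESS_THRESHOLD = 0.95


def infer_type(values):
    """Guess a column type ("int", "float", "str", or "empty") from string values."""
    if not values:
        return "empty"
    for type_name, cast in (("int", int), ("float", float)):
        try:
            for v in values:
                cast(v)
            return type_name  # every value parsed with this cast
        except ValueError:
            continue  # try the next, looser type
    return "str"


def looks_like_id(name, uniqueness):
    """Hypothetical identifier heuristic based on column name and uniqueness."""
    lowered = name.strip().lower()
    return lowered == "id" or lowered.endswith("_id") or uniqueness >= ID_UNIQUENESS_THRESHOLD


def sanity_check(text):
    """Summarize CSV text: shape, duplicate rows, and per-column stats."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [tuple(r) for r in reader if r]  # skip blank lines

    # Each extra copy of an identical row counts as one duplicate.
    duplicate_rows = sum(n - 1 for n in Counter(rows).values())

    column_stats = {}
    for i, name in enumerate(header):
        # Short rows are padded with "" so they count as missing values.
        cells = [r[i].strip() if i < len(r) else "" for r in rows]
        present = [c for c in cells if c]
        counts = Counter(present)
        uniqueness = len(counts) / len(present) if present else 0.0
        likely_id = looks_like_id(name, uniqueness)
        column_stats[name] = {
            "type": infer_type(present),
            "missing": len(cells) - len(present),
            "uniqueness": round(uniqueness, 2),
            "likely_id": likely_id,
            # Extra occurrences of repeated values, reported only for ID-like columns.
            "duplicate_ids": sum(n - 1 for n in counts.values()) if likely_id else 0,
        }

    return {
        "rows": len(rows),
        "columns": len(header),
        "duplicate_rows": duplicate_rows,
        "column_stats": column_stats,
    }


sample = """id,name,age,city
1,Ann,34,Oslo
2,Bob,,Oslo
3,Cy,29,Rome
3,Cy,29,Rome
"""

summary = sanity_check(sample)
print(summary["rows"], summary["columns"], summary["duplicate_rows"])  # prints: 4 4 1
for col, stats in summary["column_stats"].items():
    print(col, stats)
```

On this sample the `id` column is flagged as a likely identifier with one duplicate ID, and `age` has one missing value. A real tool would also need to handle large files, quoted headers, and locale-specific number formats, which this sketch ignores.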

Example output

[Screenshot: example tidypeek output]


Why I built it

Most tools are either:

  • too heavy (full profiling libraries)
  • or too manual (writing the same pandas code every time)

I wanted something:

  • fast
  • simple
  • terminal-based
  • useful before real analysis

Some example insights it gives

  • “4 columns have high missing values”
  • “Column ‘name’ appears to be an identifier but contains duplicates”
  • “12 columns have low uniqueness — useful for grouping”
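Messages like these can be produced by comparing per-column statistics against simple thresholds. The sketch below is hypothetical: the `insights` function, its input shape (a dict of `missing`, `distinct`, `likely_id`, and `duplicate_ids` per column), and the 30% missing / 5% uniqueness cutoffs are assumptions, not tidypeek's actual rules.

```python
# Assumed thresholds for illustration; tidypeek's real cutoffs may differ.
HIGH_MISSING_RATIO = 0.30    # at least 30% of rows are empty
LOW_UNIQUENESS_RATIO = 0.05  # distinct values are at most 5% of rows


def columns_phrase(n):
    """Return '1 column has' or 'N columns have'."""
    return f"{n} column has" if n == 1 else f"{n} columns have"


def insights(n_rows, column_stats):
    """Turn per-column stats into short, human-readable messages.

    column_stats maps a column name to a dict with the keys
    'missing', 'distinct', 'likely_id', and 'duplicate_ids'.
    """
    if n_rows == 0:
        return []  # nothing meaningful to say about an empty dataset
    messages = []

    high_missing = [c for c, s in column_stats.items() if s["missing"] / n_rows >= HIGH_MISSING_RATIO]
    if high_missing:
        messages.append(f"{columns_phrase(len(high_missing))} high missing values")

    for name, s in column_stats.items():
        if s["likely_id"] and s["duplicate_ids"] > 0:
            messages.append(f"Column '{name}' appears to be an identifier but contains duplicates")

    low_unique = [c for c, s in column_stats.items() if s["distinct"] / n_rows <= LOW_UNIQUENESS_RATIO]
    if low_unique:
        messages.append(f"{columns_phrase(len(low_unique))} low uniqueness, useful for grouping")

    return messages


stats = {
    "id": {"missing": 0, "distinct": 98, "likely_id": True, "duplicate_ids": 2},
    "email": {"missing": 40, "distinct": 60, "likely_id": False, "duplicate_ids": 0},
    "country": {"missing": 0, "distinct": 3, "likely_id": False, "duplicate_ids": 0},
}

for message in insights(100, stats):
    print(message)
```

With 100 rows, `email` crosses the missing-value threshold, `id` is flagged for duplicates, and `country` (3 distinct values) is reported as a good grouping candidate.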

Thoughts

This is still v1, but already useful for:

  • quick dataset inspection
  • data cleaning workflows
  • learning data analysis

GitHub

[https://github.com/Yasliu/TidyPeek]

PyPI

[https://pypi.org/project/tidypeek/]


If you work with CSVs a lot, I'd love your feedback on what to add next.
