DEV Community: Yasin Islam

The latest articles on DEV Community by Yasin Islam (@yasliu). https://dev.to/yasliu

I built a CLI tool to quickly sanity-check CSV files (tidypeek)
Yasin Islam, Wed, 22 Apr 2026 17:45:45 +0000
https://dev.to/yasliu/i-built-a-cli-tool-to-quickly-sanity-check-csv-files-tidypeek-1do5

<p>Working with CSV files is annoying.</p>
<p>You load a dataset and immediately start wondering:</p>
<ul>
<li>Are there missing values?</li>
<li>Are there duplicate rows?</li>
<li>Which column is the actual ID?</li>
<li>Is this dataset even clean enough to work with?</li>
</ul>
<p>I found myself running the same basic checks over and over again, so I built a small CLI tool to speed them up.</p>
<h2> Introducing tidypeek </h2>
<p><strong>tidypeek</strong> is a lightweight command-line tool that gives you a quick sanity check of any CSV file.</p>
<p>You can install it with:</p>
<p><code>pip install tidypeek</code></p>
<p>and run it with:</p>
<p><code>tidypeek yourfile.csv</code></p>
<h2> What it does </h2>
<p>It analyzes your dataset and shows:</p>
<ul>
<li>total rows and columns</li>
<li>column types</li>
<li>missing values</li>
<li>duplicate rows</li>
<li>likely identifier columns</li>
<li>duplicate IDs</li>
<li>simple insights about your data</li>
</ul>
<h2> Example output </h2>
<p><a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F715w699vsf687whpfjaz.png" class="article-body-image-wrapper"><img
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F715w699vsf687whpfjaz.png" alt="An example screenshot of what the output looks like" width="800" height="626"></a></p>
<h2> Why I built it </h2>
<p>Most tools are either:</p>
<ul>
<li>too heavy (full profiling libraries)</li>
<li>or too manual (writing the same pandas code every time)</li>
</ul>
<p>I wanted something:</p>
<ul>
<li>fast</li>
<li>simple</li>
<li>terminal-based</li>
<li>useful before real analysis</li>
</ul>
<h2> Some example insights it gives </h2>
<ul>
<li>“4 columns have high missing values”</li>
<li>“Column ‘name’ appears to be an identifier but contains duplicates”</li>
<li>“12 columns have low uniqueness — useful for grouping”</li>
</ul>
<h2> Thoughts </h2>
<p>This is still v1, but it is already useful for:</p>
<ul>
<li>quick dataset inspection</li>
<li>data cleaning workflows</li>
<li>learning data analysis</li>
</ul>
<h2> GitHub </h2>
<p><a href="https://github.com/Yasliu/TidyPeek" rel="noopener noreferrer">https://github.com/Yasliu/TidyPeek</a></p>
<h2> PyPI </h2>
<p><a href="https://pypi.org/project/tidypeek/" rel="noopener noreferrer">https://pypi.org/project/tidypeek/</a></p>
<p>If you work with CSVs a lot, I'd love feedback on what else to add.</p>
Tags: python, cli, data, opensource
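<p>For readers curious what the checks listed under "What it does" involve, here is a minimal sketch in standard-library Python. It is not tidypeek's actual implementation: the function name <code>peek</code>, the <code>id_threshold</code> parameter, and the "mostly unique values" heuristic for spotting identifier columns are all assumptions made for illustration.</p>

```python
import csv
import io
from collections import Counter

# Hypothetical sample data, not taken from the article.
SAMPLE = "id,name,city\n1,Ana,Oslo\n2,Ben,\n2,Ben,\n3,Cy,Oslo\n4,Di,Rome\n"


def peek(csv_text, id_threshold=0.9):
    """Summarize a CSV string: shape, missing cells, duplicate rows, likely IDs.

    Illustrative only. `id_threshold` is an assumed cutoff: a column is
    treated as a likely identifier when at least this fraction of its
    non-empty values are distinct.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    rows = [tuple(r) for r in reader if r]  # skip completely blank lines

    # Missing values: empty or whitespace-only cells, counted per column.
    missing = {
        name: sum(1 for r in rows if i >= len(r) or not r[i].strip())
        for i, name in enumerate(header)
    }

    # Duplicate rows: each repeat after the first occurrence counts once.
    duplicate_rows = sum(n - 1 for n in Counter(rows).values())

    # Likely identifier columns, mapped to the values that occur more than once.
    likely_ids = {}
    for i, name in enumerate(header):
        values = [r[i] for r in rows if i < len(r) and r[i].strip()]
        if not values:
            continue
        counts = Counter(values)
        if len(counts) / len(values) >= id_threshold:
            likely_ids[name] = sorted(v for v, n in counts.items() if n > 1)

    return {
        "rows": len(rows),
        "columns": len(header),
        "missing": missing,
        "duplicate_rows": duplicate_rows,
        "likely_ids": likely_ids,
    }


if __name__ == "__main__":
    # A lower threshold is used here because the sample is tiny.
    for key, value in peek(SAMPLE, id_threshold=0.75).items():
        print(f"{key}: {value}")
```

<p>A ratio-based heuristic like this can flag a column as a probable identifier even when it contains a few repeated values, which is the kind of situation the "appears to be an identifier but contains duplicates" insight describes. The real tool may decide this differently.</p>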