sam jha

Posted on Feb 22 • Edited on Mar 19

I built a Python CLI that reads INSIDE your files to organize them — here's how it works

#python #opensource #cli #showdev

Have you ever looked at a folder full of files named scan001.pdf, document_final_FINAL_v3.docx, or IMG_20231105_142233.jpg and thought: "I have absolutely no idea what any of these are"?

I've been there. So I built a tool to fix it.

What is Folder Intelligence?

Folder Intelligence is a Python CLI that doesn't just look at your filenames — it reads the content of your files to understand what they actually are, then organizes them intelligently.

GitHub: https://github.com/SGajjar24/folder-intelligence

Instead of renaming files based on metadata or patterns, it:

Extracts text from PDFs, Word docs, images (via OCR), and more
Generates meaningful names based on what the file actually contains
Detects and removes duplicate files using SHA-256 hashing
Creates an audit report of everything it did

The Core Problem It Solves

Most file organizers are dumb — they sort by extension, date, or size. They can't tell you that scan001.pdf is actually your 2022 tax return, or that IMG_0042.jpg is a receipt from last year.

Folder Intelligence reads the content and acts on it.

Key Features

1. Content-Aware Renaming

The tool extracts text from files and generates descriptive names:

before: scan001.pdf
after:  2022_tax_return_w2_form.pdf

2. OCR for Images

Using Tesseract OCR, it reads text from scanned documents and images:

before: IMG_20231105.jpg
after:  invoice_amazon_order_receipt.jpg

3. SHA-256 Deduplication

Finds exact duplicates regardless of filename and removes them safely:

Found 3 duplicates (47.2 MB freed)

4. Audit Trail

Every action is logged so you can review (or undo) changes:

audit_report_2024-01-15.json

5. Safe Dry-Run Mode

Run it first with --dry-run to preview all changes before anything gets touched.

Quick Start

pip install folder-intelligence

# Audit your folder
folder-intelligence audit /path/to/folder

# Rename files based on content
folder-intelligence rename /path/to/folder

# Find and remove duplicates
folder-intelligence dedupe /path/to/folder

# Run everything
folder-intelligence pipeline /path/to/folder

Tech Stack

Python 3.8+
PyMuPDF — PDF text extraction
python-docx — Word document parsing
Tesseract OCR — Image text recognition
SHA-256 — Duplicate detection
Rich — Beautiful CLI output

Why I Built This

I was cleaning up 10+ years of accumulated files — downloads, scans, backups — and realized I was spending more time figuring out what a file was before I could organize it than actually organizing it.

Every existing tool I found was either too simple (sort by date) or required cloud AI (expensive, privacy concerns). I wanted something local, fast, and smart.

So I built Folder Intelligence: enterprise-grade file organization, no AI models required, runs 100% offline.

What's Next

[ ] GUI wrapper (Tkinter/PyQt)
[ ] Plugin system for custom file handlers
[ ] Smarter rename suggestions with local LLM support (optional)
[ ] Windows Explorer / macOS Finder context menu integration

Try It Out

GitHub: https://github.com/SGajjar24/folder-intelligence
Star it if you find it useful!
Issues and PRs are very welcome

I'd love feedback from the dev.to community — especially on the rename logic and any edge cases you'd throw at it. What file types or workflows would make this more useful for you?

Top comments (1)

Tommy Chan • Mar 24

thanks 4 ur helpful post... good lucky 4 u