DEV Community

Cover image for I built a Python CLI that reads INSIDE your files to organize them — here's how it works
sam jha
sam jha

Posted on • Edited on

I built a Python CLI that reads INSIDE your files to organize them — here's how it works

Have you ever looked at a folder full of files named scan001.pdf, document_final_FINAL_v3.docx, or IMG_20231105_142233.jpg and thought: "I have absolutely no idea what any of these are"?

I've been there. So I built a tool to fix it.

What is Folder Intelligence?

Folder Intelligence is a Python CLI that doesn't just look at your filenames — it reads the content of your files to understand what they actually are, then organizes them intelligently.

GitHub: https://github.com/SGajjar24/folder-intelligence

Instead of renaming files based on metadata or patterns, it:

  • Extracts text from PDFs, Word docs, images (via OCR), and more
  • Generates meaningful names based on what the file actually contains
  • Detects and removes duplicate files using SHA-256 hashing
  • Creates an audit report of everything it did

The Core Problem It Solves

Most file organizers are dumb — they sort by extension, date, or size. They can't tell you that scan001.pdf is actually your 2022 tax return, or that IMG_0042.jpg is a receipt from last year.

Folder Intelligence reads the content and acts on it.

Key Features

1. Content-Aware Renaming

The tool extracts text from files and generates descriptive names:

before: scan001.pdf
after:  2022_tax_return_w2_form.pdf
Enter fullscreen mode Exit fullscreen mode

2. OCR for Images

Using Tesseract OCR, it reads text from scanned documents and images:

before: IMG_20231105.jpg
after:  invoice_amazon_order_receipt.jpg
Enter fullscreen mode Exit fullscreen mode

3. SHA-256 Deduplication

Finds exact duplicates regardless of filename and removes them safely:

Found 3 duplicates (47.2 MB freed)
Enter fullscreen mode Exit fullscreen mode

4. Audit Trail

Every action is logged so you can review (or undo) changes:

audit_report_2024-01-15.json
Enter fullscreen mode Exit fullscreen mode

5. Safe Dry-Run Mode

Run it first with --dry-run to preview all changes before anything gets touched.

Quick Start

pip install folder-intelligence

# Audit your folder
folder-intelligence audit /path/to/folder

# Rename files based on content
folder-intelligence rename /path/to/folder

# Find and remove duplicates
folder-intelligence dedupe /path/to/folder

# Run everything
folder-intelligence pipeline /path/to/folder
Enter fullscreen mode Exit fullscreen mode

Tech Stack

  • Python 3.8+
  • PyMuPDF — PDF text extraction
  • python-docx — Word document parsing
  • Tesseract OCR — Image text recognition
  • SHA-256 — Duplicate detection
  • Rich — Beautiful CLI output

Why I Built This

I was cleaning up 10+ years of accumulated files — downloads, scans, backups — and realized I was spending more time figuring out what a file was before I could organize it than actually organizing it.

Every existing tool I found was either too simple (sort by date) or required cloud AI (expensive, privacy concerns). I wanted something local, fast, and smart.

So I built Folder Intelligence: enterprise-grade file organization, no AI models required, runs 100% offline.

What's Next

  • [ ] GUI wrapper (Tkinter/PyQt)
  • [ ] Plugin system for custom file handlers
  • [ ] Smarter rename suggestions with local LLM support (optional)
  • [ ] Windows Explorer / macOS Finder context menu integration

Try It Out

I'd love feedback from the dev.to community — especially on the rename logic and any edge cases you'd throw at it. What file types or workflows would make this more useful for you?

Top comments (1)

Collapse
 
giggler profile image
Tommy Chan

thanks 4 ur helpful post... good lucky 4 u