DEV Community

sam jha
sam jha

Posted on

I built a Python CLI that reads INSIDE your files to organize them — here's how it works

Have you ever looked at a folder full of files named scan001.pdf, document_final_FINAL_v3.docx, or IMG_20231105_142233.jpg and thought: "I have absolutely no idea what any of these are"?

I've been there. So I built a tool to fix it.

What is Folder Intelligence?

Folder Intelligence is a Python CLI that doesn't just look at your filenames — it reads the content of your files to understand what they actually are, then organizes them intelligently.

GitHub: https://github.com/SGajjar24/folder-intelligence

Instead of renaming files based on metadata or patterns, it:

  • Extracts text from PDFs, Word docs, images (via OCR), and more
  • Generates meaningful names based on what the file actually contains
  • Detects and removes duplicate files using SHA-256 hashing
  • Creates an audit report of everything it did

The Core Problem It Solves

Most file organizers are dumb — they sort by extension, date, or size. They can't tell you that scan001.pdf is actually your 2022 tax return, or that IMG_0042.jpg is a receipt from last year.

Folder Intelligence reads the content and acts on it.

Key Features

1. Content-Aware Renaming

The tool extracts text from files and generates descriptive names:

before: scan001.pdf
after:  2022_tax_return_w2_form.pdf
Enter fullscreen mode Exit fullscreen mode

2. OCR for Images

Using Tesseract OCR, it reads text from scanned documents and images:

before: IMG_20231105.jpg
after:  invoice_amazon_order_receipt.jpg
Enter fullscreen mode Exit fullscreen mode

3. SHA-256 Deduplication

Finds exact duplicates regardless of filename and removes them safely:

Found 3 duplicates (47.2 MB freed)
Enter fullscreen mode Exit fullscreen mode

4. Audit Trail

Every action is logged so you can review (or undo) changes:

audit_report_2024-01-15.json
Enter fullscreen mode Exit fullscreen mode

5. Safe Dry-Run Mode

Run it first with --dry-run to preview all changes before anything gets touched.

Quick Start

pip install folder-intelligence

# Audit your folder
folder-intelligence audit /path/to/folder

# Rename files based on content
folder-intelligence rename /path/to/folder

# Find and remove duplicates
folder-intelligence dedupe /path/to/folder

# Run everything
folder-intelligence pipeline /path/to/folder
Enter fullscreen mode Exit fullscreen mode

Tech Stack

  • Python 3.8+
  • PyMuPDF — PDF text extraction
  • python-docx — Word document parsing
  • Tesseract OCR — Image text recognition
  • SHA-256 — Duplicate detection
  • Rich — Beautiful CLI output

Why I Built This

I was cleaning up 10+ years of accumulated files — downloads, scans, backups — and realized I was spending more time figuring out what a file was before I could organize it than actually organizing it.

Every existing tool I found was either too simple (sort by date) or required cloud AI (expensive, privacy concerns). I wanted something local, fast, and smart.

So I built Folder Intelligence: enterprise-grade file organization, no AI models required, runs 100% offline.

What's Next

  • [ ] GUI wrapper (Tkinter/PyQt)
  • [ ] Plugin system for custom file handlers
  • [ ] Smarter rename suggestions with local LLM support (optional)
  • [ ] Windows Explorer / macOS Finder context menu integration

Try It Out

I'd love feedback from the dev.to community — especially on the rename logic and any edge cases you'd throw at it. What file types or workflows would make this more useful for you?

Top comments (0)