Have you ever looked at a folder full of files named scan001.pdf, document_final_FINAL_v3.docx, or IMG_20231105_142233.jpg and thought: "I have absolutely no idea what any of these are"?
I've been there. So I built a tool to fix it.
What is Folder Intelligence?
Folder Intelligence is a Python CLI that doesn't just look at your filenames — it reads the content of your files to understand what they actually are, then organizes them intelligently.
GitHub: https://github.com/SGajjar24/folder-intelligence
Instead of renaming files based on metadata or patterns, it:
- Extracts text from PDFs, Word docs, images (via OCR), and more
- Generates meaningful names based on what the file actually contains
- Detects and removes duplicate files using SHA-256 hashing
- Creates an audit report of everything it did
The Core Problem It Solves
Most file organizers are dumb — they sort by extension, date, or size. They can't tell you that scan001.pdf is actually your 2022 tax return, or that IMG_0042.jpg is a receipt from last year.
Folder Intelligence reads the content and acts on it.
Key Features
1. Content-Aware Renaming
The tool extracts text from files and generates descriptive names:
before: scan001.pdf
after: 2022_tax_return_w2_form.pdf
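The post doesn't say exactly how names are generated, so here is a minimal sketch of one plausible approach, assuming a simple keyword-frequency heuristic: use the first four-digit year as a prefix, keep the most frequent non-stopword terms, and join everything into a snake_case name that keeps the original extension. `suggest_name` and the stopword list are hypothetical, not Folder Intelligence's actual code.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately tiny stopword list; a real tool would use a larger one.
STOPWORDS = {
    "the", "and", "for", "with", "this", "that", "from", "your", "you",
    "are", "was", "page", "of", "to", "in", "an", "on", "is",
}


def suggest_name(text: str, original: str, max_words: int = 4) -> str:
    """Build a snake_case filename from extracted text (hypothetical heuristic)."""
    suffix = Path(original).suffix.lower()
    lowered = text.lower()

    # Use the first plausible year (1900-2099) as a prefix, if one appears.
    year_match = re.search(r"\b(19|20)\d{2}\b", lowered)
    parts = [year_match.group(0)] if year_match else []

    # Count words of 2+ characters that start with a letter and aren't stopwords.
    words = re.findall(r"\b[a-z][a-z0-9]+\b", lowered)
    counts = Counter(w for w in words if w not in STOPWORDS)

    # most_common breaks ties by first appearance, so the order is deterministic.
    parts.extend(word for word, _ in counts.most_common(max_words))

    if not parts:
        return Path(original).name  # Nothing usable: keep the original name.
    return "_".join(parts) + suffix
```

For example, `suggest_name("Tax return 2022 tax", "scan001.pdf")` yields `2022_tax_return.pdf`. Frequency counting is crude; it is only meant to show the shape of "text in, descriptive name out".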
2. OCR for Images
Using Tesseract OCR, it reads text from scanned documents and images:
before: IMG_20231105.jpg
after: invoice_amazon_order_receipt.jpg
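Here is a minimal sketch of how text extraction could be dispatched by file type using the libraries listed in the Tech Stack section. Calling Tesseract through pytesseract and Pillow is my assumption (the post doesn't name a Python wrapper), and `extract_text` plus the extension sets are hypothetical. Each heavy library is imported lazily, so plain-text files work even when PyMuPDF, python-docx, or Tesseract isn't installed.

```python
from pathlib import Path

# Hypothetical extension sets; extend as needed.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}
TEXT_EXTS = {".txt", ".md", ".csv"}


def extract_text(path: str) -> str:
    """Return a file's text content, or "" if the type is unsupported."""
    p = Path(path)
    ext = p.suffix.lower()

    if ext == ".pdf":
        import fitz  # PyMuPDF
        with fitz.open(str(p)) as doc:
            return "\n".join(page.get_text() for page in doc)

    if ext == ".docx":
        import docx  # python-docx
        document = docx.Document(str(p))
        return "\n".join(par.text for par in document.paragraphs)

    if ext in IMAGE_EXTS:
        # OCR path: needs the Tesseract binary on PATH, plus pytesseract and Pillow.
        import pytesseract
        from PIL import Image
        with Image.open(p) as img:
            return pytesseract.image_to_string(img)

    if ext in TEXT_EXTS:
        return p.read_text(encoding="utf-8", errors="ignore")

    return ""  # Unsupported type: the caller should leave this file's name alone.
```

Returning an empty string for unknown types lets a renaming step fall back to the original filename instead of guessing.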
3. SHA-256 Deduplication
Finds exact duplicates regardless of filename and removes them safely:
Found 3 duplicates (47.2 MB freed)
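A common way to implement content-based deduplication is a two-pass scan: group files by size first (files with a unique size can't be duplicates), then hash only the size collisions with SHA-256. The sketch below assumes that approach; it is not necessarily what Folder Intelligence does internally, and `find_duplicates` / `remove_duplicates` are hypothetical names. Files are hashed in chunks so large videos or backups don't have to fit in memory.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder: str) -> Dict[str, List[Path]]:
    """Map each SHA-256 digest shared by 2+ files to those files' paths."""
    # Pass 1: bucket regular files (not symlinks) by size.
    by_size = defaultdict(list)
    for path in Path(folder).rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    # Pass 2: hash only files whose size matches at least one other file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_hash[sha256_of(path)].append(path)

    return {h: sorted(ps) for h, ps in by_hash.items() if len(ps) > 1}


def remove_duplicates(groups: Dict[str, List[Path]], dry_run: bool = True) -> int:
    """Keep the first file of each group, delete the rest; return bytes freed."""
    freed = 0
    for paths in groups.values():
        for extra in paths[1:]:
            freed += extra.stat().st_size
            if not dry_run:
                extra.unlink()
    return freed
```

Defaulting `dry_run` to `True` means deletion only happens when the caller explicitly asks for it.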
4. Audit Trail
Every action is logged so you can review (or undo) changes:
audit_report_2024-01-15.json
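The post doesn't describe the report's schema, so this is a minimal sketch under an assumed one: each entry stores a timestamp, an action, and source/destination paths, and the file follows the `audit_report_YYYY-MM-DD.json` naming from the post. `AuditLog` and `undo_renames` are hypothetical. Storing both paths is what makes undo possible: replaying renames newest-first restores the original names.

```python
import json
from datetime import date, datetime
from pathlib import Path
from typing import Dict, List


class AuditLog:
    """Collects actions in memory and writes them to a dated JSON report."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    def record(self, action: str, src: str, dst: str = "") -> None:
        self.entries.append({
            "time": datetime.now().isoformat(timespec="seconds"),
            "action": action,  # e.g. "rename" or "delete" (assumed labels)
            "src": src,
            "dst": dst,
        })

    def save(self, folder: str) -> Path:
        out = Path(folder) / f"audit_report_{date.today().isoformat()}.json"
        out.write_text(json.dumps(self.entries, indent=2), encoding="utf-8")
        return out


def undo_renames(report: Path) -> int:
    """Reverse recorded renames, newest first; return how many were undone."""
    entries = json.loads(report.read_text(encoding="utf-8"))
    undone = 0
    for entry in reversed(entries):
        if entry["action"] == "rename" and Path(entry["dst"]).exists():
            Path(entry["dst"]).rename(entry["src"])
            undone += 1
    return undone
```

Deletions can't be undone this way; the log can only record them, which is one more reason to preview with a dry run first.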
5. Safe Dry-Run Mode
Run it first with --dry-run to preview all changes before anything gets touched.
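A dry run is easiest to get right when planning and applying share the same code path, so the preview can't drift from what actually happens. Here is a minimal sketch assuming that design, with hypothetical `plan_rename` and `rename` helpers that also avoid overwriting an existing file by appending `_1`, `_2`, and so on.

```python
from pathlib import Path


def plan_rename(src: Path, new_name: str) -> Path:
    """Pick a target path, appending _1, _2, ... if the name is already taken."""
    target = src.with_name(new_name)
    stem, suffix = target.stem, target.suffix
    counter = 1
    while target.exists() and target != src:
        target = src.with_name(f"{stem}_{counter}{suffix}")
        counter += 1
    return target


def rename(src: Path, new_name: str, dry_run: bool = True) -> Path:
    """Rename src, or only print the planned change when dry_run is True."""
    target = plan_rename(src, new_name)
    if dry_run:
        print(f"[dry-run] {src.name} -> {target.name}")
    else:
        src.rename(target)
    return target
```

If `tax.pdf` already exists, `rename(Path("scan001.pdf"), "tax.pdf")` previews `scan001.pdf -> tax_1.pdf` without touching either file.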
Quick Start
pip install folder-intelligence
# Audit your folder
folder-intelligence audit /path/to/folder
# Rename files based on content
folder-intelligence rename /path/to/folder
# Find and remove duplicates
folder-intelligence dedupe /path/to/folder
# Run everything
folder-intelligence pipeline /path/to/folder
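For anyone curious how a subcommand CLI like this is typically wired up, here is a minimal sketch using the standard library's `argparse`. It is an assumption, not the project's actual entry point: the post doesn't say which CLI framework it uses, and putting `--dry-run` on every subcommand is my guess at how the flag is exposed.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="folder-intelligence")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, help_text in [
        ("audit", "report on a folder without changing it"),
        ("rename", "rename files based on their content"),
        ("dedupe", "find and remove duplicate files"),
        ("pipeline", "run every step"),
    ]:
        cmd = sub.add_parser(name, help=help_text)
        cmd.add_argument("folder", help="folder to process")
        cmd.add_argument("--dry-run", action="store_true",
                         help="preview changes without touching any file")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    # Placeholder: a real implementation would dispatch to each step here.
    print(f"{args.command} {args.folder} (dry_run={args.dry_run})")
    return 0
```

Hooking `main` up as a console-script entry point in the package metadata is what would make a bare `folder-intelligence` command available after `pip install`.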
Tech Stack
- Python 3.8+
- PyMuPDF — PDF text extraction
- python-docx — Word document parsing
- Tesseract OCR — Image text recognition
- SHA-256 — Duplicate detection
- Rich — Beautiful CLI output
Why I Built This
I was cleaning up 10+ years of accumulated files — downloads, scans, backups — and realized I was spending more time figuring out what a file was before I could organize it than actually organizing it.
Every existing tool I found was either too simple (sort by date) or required cloud AI (which costs money and raises privacy concerns). I wanted something local, fast, and smart.
So I built Folder Intelligence: content-aware file organization that needs no AI models and runs 100% offline.
What's Next
- [ ] GUI wrapper (Tkinter/PyQt)
- [ ] Plugin system for custom file handlers
- [ ] Smarter rename suggestions with local LLM support (optional)
- [ ] Windows Explorer / macOS Finder context menu integration
Try It Out
- GitHub: https://github.com/SGajjar24/folder-intelligence
- Star it if you find it useful!
- Issues and PRs are very welcome
I'd love feedback from the dev.to community — especially on the rename logic and any edge cases you'd throw at it. What file types or workflows would make this more useful for you?