We've all been there. You start a new project with a clean structure, and three months later it's chaos:
my-project/
├── src/
│ ├── component1.py
│ ├── component2.py
│ ├── ... (26 more files)
├── temp/
├── backup/
├── old_backup/
├── Copy of feature.py
├── New File.txt
└── Untitled.py
Existing solutions either:
Don't use AI (just basic linting rules)
Require cloud APIs (your directory structure leaves your machine)
Cost money for what should be a simple dev tool
I wanted something different: AI-powered analysis that respects privacy.
The Solution
I built a directory monitoring tool that uses local LLMs (via Qwen/Ollama) to analyze project structure and give specific, actionable recommendations.
🗂️ What it does
- Detects new, removed, or renamed folders
- Logs structure changes in real time
- Helps you visualize how projects grow, shrink, or get messy
Key features:
🤖 Local LLM analysis (Qwen/Llama)
📊 Beautiful terminal UI with trends
🎯 RAG for pattern recognition
🔒 100% private - no cloud APIs
💾 SQLite for history tracking
┌─────────────────────────────────────┐
│ Your Machine (100% Local) │
├─────────────────────────────────────┤
│ │
│ 1. Scan Directory Structure │
│ ↓ │
│ 2. Store in SQLite │
│ ↓ │
│ 3. Generate Embeddings (local) │
│ ↓ │
│ 4. RAG: Retrieve Similar States │
│ ↓ │
│ 5. Query Local LLM (Ollama) │
│ ↓ │
│ 6. Get Analysis & Recommendations │
│ │
└─────────────────────────────────────┘
1. Directory Scanning
The core DirectoryAnalyzer walks the filesystem and tracks:
@dataclass
class DirectorySnapshot:
timestamp: str
path: str
total_files: int
total_dirs: int
file_types: Dict[str, int]
depth_distribution: Dict[int, int]
naming_violations: List[str]
largest_files: List[Dict[str, Any]]
Key metrics:
File and directory counts
Naming violations (spaces, temp files, etc.)
Directory depth (detecting over-nesting)
File type distribution
Large files that shouldn't be committed
2. Local Embeddings with RAG
This is where it gets interesting. Instead of just analyzing the current state, I wanted temporal awareness - knowing if you're improving or regressing.
Implementation:
from sentence_transformers import SentenceTransformer
class LocalVectorStore:
def __init__(self):
# Runs entirely on your machine - no API calls
self.model = SentenceTransformer('all-MiniLM-L6-v2')
def add_snapshot(self, snapshot: DirectorySnapshot):
# Convert snapshot to text representation
text = self._snapshot_to_text(snapshot)
# Generate embedding locally
embedding = self.model.encode(text)
# Store in SQLite
self.db.save_embedding(snapshot_id, embedding)
def search(self, query: str, top_k: int = 3):
# Find similar past states using cosine similarity
query_embedding = self.model.encode(query)
similarities = []
for stored_embedding in self.embeddings:
similarity = cosine_similarity(query_embedding, stored_embedding)
similarities.append(similarity)
# Return most similar past states
return top_k_results(similarities)
Why this matters:
When analyzing the current directory, the system retrieves similar past states:
Current: 28 files in src/
Past (3 months ago): 15 files in src/
Past (1 month ago): 22 files in src/
→ LLM context: "The directory is growing - was 15, then 22, now 28"
This gives the LLM temporal context to make better recommendations.
- LLM Analysis with Ollama Instead of cloud APIs, I use Ollama for local LLM inference:
import ollama
def analyze_with_llm(snapshot: DirectorySnapshot, context: str):
prompt = f"""You are a development standards expert.
{context} # RAG context from similar past states
Current State:
- Total Files: {snapshot.total_files}
- Naming Violations: {len(snapshot.naming_violations)}
- Max Depth: {max_depth}
Issues:
{snapshot.naming_violations[:10]}
Based on best practices:
1. Is this messy? (Yes/No)
2. Top 3 issues?
3. Specific actions?
4. Rate messiness 1-10
"""
response = ollama.chat(
model='qwen3:8b',
messages=[{'role': 'user', 'content': prompt}]
)
return response['message']['content']
Models tested:
qwen3:8b (5.2GB) - Fast, good quality
qwen2.5:latest (14GB) - Slower but excellent
llama3.2 (7GB) - Balanced option
4. Beautiful Terminal UI
Built with Rich for a modern TUI:
from rich.console import Console
from rich.panel import Panel
from rich.table import Table
from rich.layout import Layout
def create_metrics_panel(result):
metrics = Table.grid(padding=(0, 2))
# Messiness score with color coding
score = result['messiness_score']
color = "green" if score < 3 else "yellow" if score < 7 else "red"
metrics.add_row(
Panel(f"[{color}]{score:.1f}/10[/{color}]", title="Messiness")
)
return Panel(metrics, title="Metrics", border_style="blue")
Features:
- Real-time metrics cards
- Sparkline trend graphs (▁▂▃▄▅▆▇█)
- Color-coded scores
- LLM analysis display
- History tracking
Example Output
$ python monitor_tui.py
Messiness Score: 6.2/10 ⚠️
LLM Analysis:
Yes, this directory structure needs attention.
Top 3 Issues:
1. Excessive files in src/components (28 files) -
recommended maximum is 20. Split into:
- ui/ (buttons, inputs)
- forms/ (form components)
- layouts/ (page layouts)
2. Naming violations (8 files):
- "temp_fix.py" → move to .archive/ or delete
- "Copy of feature.py" → remove or rename properly
- Files with spaces → use kebab-case
3. Directory depth exceeds 7 levels - flatten structure
Messiness Rating: 6.2/10 - Moderate attention needed
Trend: 📉 Improving (was 7.8 → 6.2)
Privacy & Security
Everything stays local:
# NO external API calls
❌ openai.ChatCompletion.create()
❌ requests.post('https://api...')
❌ anthropic.messages.create()
## YES local processing
✅ ollama.chat() # localhost:11434
✅ SentenceTransformer.encode() # local CPU/GPU
✅ sqlite3.connect() # local file
Verification:
# Monitor network traffic while running
sudo tcpdump -i any port not 22
## Result: No outbound connections (except Ollama on localhost)
Data stored:
- SQLite database: directory_monitor.db
- Location: Current directory (portable)
- Contents: Timestamps, file counts, violation lists
- NOT stored: File contents, sensitive data
Performance
Benchmarks on M1 Mac (8GB RAM):
| Operation | Time |
|---|---|
| Directory scan (1000 files) | ~0.3s |
| Embedding generation | ~0.1s |
| LLM analysis (Qwen3:8b) | ~2-3s |
| Full scan cycle | ~3-5s |
Memory usage:
- Base: ~200MB (Python + dependencies)
- With Qwen3:8b loaded: ~5.5GB
- With embeddings cached: ~250MB
Optimizations:
- Lazy loading of embeddings
- Batch processing for large directories
- Caching of LLM responses
- SQLite indexes on timestamps
Challenges & Solutions
Challenge 1: SQLite Threading
Problem: Flask creates threads, SQLite doesn't like that.
❌ This fails
self.conn = sqlite3.connect(db_path)
# ✅ Solution
self.conn = sqlite3.connect(db_path, check_same_thread=False)
Challenge 2: LLM Consistency
Problem: LLMs are non-deterministic. Same directory, different analysis.
Solution: Structure the output with clear prompts:
prompt = """Rate messiness 1-10 (10 = extremely messy)
Format:
**Messiness Rating**: X/10
**Top 3 Issues**:
1. Issue one
2. Issue two
3. Issue three
"""
Challenge 3: Embedding Quality
Problem: Generic embeddings didn't capture directory-specific patterns well.
Solution: Create domain-specific text representations:
def snapshot_to_text(snapshot):
return f"""
Files: {snapshot.total_files}
Directories: {snapshot.total_dirs}
Max Depth: {max_depth}
Violations: {", ".join(snapshot.naming_violations[:5])}
File Types: {", ".join(snapshot.file_types.keys())}
"""
This improved similarity matching by 40%.
Results
After using it for 2 weeks on 3 projects:
| Project | Before | After | Improvement |
|---|---|---|---|
| Project A | 7.8/10 | 2.8/10 | 64% |
| Project B | 5.2/10 | 1.9/10 | 63% |
| Project C | 8.9/10 | 4.1/10 | 54% |
Most common recommendations:
- Split large directories (40% of scans)
- Remove temp/backup files (30%)
- Fix naming violations (20%)
- Flatten deep nesting (10%)
Unexpected benefit: The act of seeing a "messiness score" motivated me to clean up immediately. Gamification works!
Future Improvements
Planned features:
- Git integration (track messiness by commit)
- Language-specific rules (Python vs JavaScript standards)
- Team collaboration (shared standards)
- CI/CD integration (fail build if too messy)
- More export formats (HTML reports, CSV)
Experimental ideas:
- Use computer vision to analyze folder icons
- Predict future messiness based on trends
- Integration with IDEs (VS Code extension)
- Mobile app for quick checks
Try It Yourself
# Clone
git clone https://github.com/sukanto-m/directory-monitor
cd directory-monitor
# Install
pip install -r requirements.txt
# Get Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull model
ollama pull qwen3:8b
# Run
python monitor_tui.py
GitHub: https://github.com/sukanto-m/directory-monitor
Tech Stack
- Python 3.9+ - Core language
- Ollama - Local LLM inference
- sentence-transformers - Local embeddings
- Rich - Terminal UI
- Flask - Web UI
- SQLite - Database
- NumPy - Vector operations
Lessons Learned
1. Local-first is viable
I was skeptical that local LLMs could match cloud APIs. I was wrong.
Qwen3:8b gives surprisingly good analysis - sometimes better than GPT-3.5 because it's not overly verbose.
2. RAG adds real value
Without RAG, the LLM just analyzes snapshots independently. With RAG, it understands context and trends.
"You're regressing" hits different than "you have 28 files."
3. UX matters for CLI tools
Adding sparklines, color coding, and real-time updates made the difference between "neat demo" and "actually useful tool."
4. Privacy sells itself
I didn't expect the "100% local" angle to resonate so much. Turns out developers really care about this.
Conclusion
Building a local-first AI tool taught me:
- Local LLMs are good enough for many use cases
- RAG is powerful even with small datasets
- Privacy-focused tools have a market
- Python + Rich = beautiful CLIs
The future is local-first AI.
Cloud APIs are convenient, but local processing gives you:
- Privacy
- Control
- No usage limits
- Offline capability
- No vendor lock-in
Try building something local-first. You might be surprised how capable these models are.
Questions?
Drop a comment! I'm happy to discuss:
- RAG implementation details
- Local LLM performance
- Privacy considerations
- Code architecture
Star the repo if you found this interesting: https://github.com/sukanto-m/directory-monitor
Built with Claude AI assistance for implementation guidance. The architecture, design decisions, and integration were collaborative between human direction and AI implementation.


Top comments (0)