I built a local LLM + Python tool that keeps your folders from turning into chaos

We've all been there. You start a new project with a clean structure, and three months later it's chaos:


my-project/
├── src/
│   ├── component1.py
│   ├── component2.py
│   ├── ... (26 more files)
├── temp/
├── backup/
├── old_backup/
├── Copy of feature.py
├── New File.txt
└── Untitled.py


Existing solutions fall short in one of three ways:

Don't use AI (just basic linting rules)
Require cloud APIs (your directory structure leaves your machine)
Cost money for what should be a simple dev tool

I wanted something different: AI-powered analysis that respects privacy.
The Solution
I built a directory monitoring tool that uses a local LLM (Qwen, served via Ollama) to analyze project structure and give specific, actionable recommendations.

🗂️ What it does

  • Detects new, removed, or renamed folders
  • Logs structure changes in real time
  • Helps you visualize how projects grow, shrink, or get messy

Key features:

🤖 Local LLM analysis (Qwen/Llama)
📊 Beautiful terminal UI with trends
🎯 RAG for pattern recognition
🔒 100% private - no cloud APIs
💾 SQLite for history tracking

┌─────────────────────────────────────┐
│      Your Machine (100% Local)      │
├─────────────────────────────────────┤
│                                     │
│  1. Scan Directory Structure        │
│     ↓                               │
│  2. Store in SQLite                 │
│     ↓                               │
│  3. Generate Embeddings (local)     │
│     ↓                               │
│  4. RAG: Retrieve Similar States    │
│     ↓                               │
│  5. Query Local LLM (Ollama)        │
│     ↓                               │
│  6. Get Analysis & Recommendations  │
│                                     │
└─────────────────────────────────────┘
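Tying those six stages together looks roughly like this - every name below is a stand-in, and the individual building blocks are shown piece by piece in the sections that follow:

def run_cycle(path: str) -> str:
    snapshot = scan_directory(path)               # 1. scan the tree
    snapshot_id = db.save_snapshot(snapshot)      # 2. persist to SQLite
    store.add_snapshot(snapshot_id, snapshot)     # 3. embed locally
    context = build_context(store, snapshot)      # 4. RAG: retrieve similar states
    return analyze_with_llm(snapshot, context)    # 5-6. query the local LLM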

1. Directory Scanning

The core DirectoryAnalyzer walks the filesystem and tracks:


from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class DirectorySnapshot:
    timestamp: str
    path: str
    total_files: int
    total_dirs: int
    file_types: Dict[str, int]
    depth_distribution: Dict[int, int]
    naming_violations: List[str]
    largest_files: List[Dict[str, Any]]
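The repo's actual walk is more involved, but here's a minimal sketch of how os.walk could populate such a snapshot (the violation heuristics below are illustrative, not the tool's real rules):

import os
from datetime import datetime, timezone

def scan_directory(root: str) -> DirectorySnapshot:
    total_files = total_dirs = 0
    file_types: Dict[str, int] = {}
    depth_distribution: Dict[int, int] = {}
    violations: List[str] = []

    for dirpath, dirnames, filenames in os.walk(root):
        # Depth relative to the scan root
        depth = os.path.relpath(dirpath, root).count(os.sep)
        depth_distribution[depth] = depth_distribution.get(depth, 0) + 1
        total_dirs += len(dirnames)
        for name in filenames:
            total_files += 1
            ext = os.path.splitext(name)[1] or '<none>'
            file_types[ext] = file_types.get(ext, 0) + 1
            # Toy heuristics: spaces and temp/copy prefixes count as violations
            if ' ' in name or name.startswith(('temp', 'Copy of')):
                violations.append(os.path.join(dirpath, name))

    return DirectorySnapshot(
        timestamp=datetime.now(timezone.utc).isoformat(),
        path=root,
        total_files=total_files,
        total_dirs=total_dirs,
        file_types=file_types,
        depth_distribution=depth_distribution,
        naming_violations=violations,
        largest_files=[],  # omitted in this sketch
    )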

Key metrics:

File and directory counts
Naming violations (spaces, temp files, etc.)
Directory depth (detecting over-nesting)
File type distribution
Large files that shouldn't be committed

2. Local Embeddings with RAG

This is where it gets interesting. Instead of just analyzing the current state, I wanted temporal awareness - knowing if you're improving or regressing.

Implementation:


import numpy as np
from sentence_transformers import SentenceTransformer

class LocalVectorStore:
    def __init__(self):
        # Runs entirely on your machine - no API calls
        self.model = SentenceTransformer('all-MiniLM-L6-v2')
        self.embeddings = []  # (snapshot_id, vector) pairs kept in memory

    def add_snapshot(self, snapshot_id: int, snapshot: DirectorySnapshot):
        # Convert the snapshot to a text representation
        text = self._snapshot_to_text(snapshot)

        # Generate the embedding locally
        embedding = self.model.encode(text)

        # Persist in SQLite and keep it in memory for search
        self.db.save_embedding(snapshot_id, embedding)
        self.embeddings.append((snapshot_id, embedding))

    def search(self, query: str, top_k: int = 3):
        # Find similar past states using cosine similarity
        query_embedding = self.model.encode(query)

        similarities = []
        for snapshot_id, stored in self.embeddings:
            score = float(np.dot(query_embedding, stored) /
                          (np.linalg.norm(query_embedding) * np.linalg.norm(stored)))
            similarities.append((score, snapshot_id))

        # Return the most similar past states, best first
        return sorted(similarities, reverse=True)[:top_k]
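Usage then looks something like this (the snapshot ID and query string are just for illustration):

store = LocalVectorStore()
store.add_snapshot(snapshot_id=42, snapshot=current_snapshot)

# Fetch the three most similar past states
for score, snap_id in store.search("many files in src, deep nesting"):
    print(f"snapshot {snap_id}: similarity {score:.2f}")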

Why this matters:

When analyzing the current directory, the system retrieves similar past states:

Current: 28 files in src/
Past (3 months ago): 15 files in src/
Past (1 month ago): 22 files in src/

→ LLM context: "The directory is growing - was 15, then 22, now 28"

This gives the LLM temporal context to make better recommendations.
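Concretely, the retrieved states can be flattened into the context string that gets passed to the LLM. A minimal sketch, assuming a hypothetical load_snapshot() helper that reads a past snapshot back from SQLite:

def build_context(store: LocalVectorStore, current: DirectorySnapshot) -> str:
    # Find the past states most similar to the current one
    query = f"{current.total_files} files, {current.total_dirs} dirs"
    lines = []
    for score, snap_id in store.search(query, top_k=3):
        past = load_snapshot(snap_id)  # hypothetical SQLite reader
        lines.append(f"- {past.timestamp}: {past.total_files} files "
                     f"(similarity {score:.2f})")
    return "Similar past states:\n" + "\n".join(lines)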

3. LLM Analysis with Ollama

Instead of cloud APIs, I use Ollama for local LLM inference:

import ollama

def analyze_with_llm(snapshot: DirectorySnapshot, context: str):
    # Derive max depth from the snapshot's depth distribution
    max_depth = max(snapshot.depth_distribution, default=0)

    # {context} below is the RAG context from similar past states
    prompt = f"""You are a development standards expert.

    {context}

    Current State:
    - Total Files: {snapshot.total_files}
    - Naming Violations: {len(snapshot.naming_violations)}
    - Max Depth: {max_depth}

    Issues:
    {snapshot.naming_violations[:10]}

    Based on best practices:
    1. Is this messy? (Yes/No)
    2. Top 3 issues?
    3. Specific actions?
    4. Rate messiness 1-10
    """

    response = ollama.chat(
        model='qwen3:8b',
        messages=[{'role': 'user', 'content': prompt}]
    )

    return response['message']['content']

Models tested:

qwen3:8b (5.2GB) - Fast, good quality
qwen2.5:latest (14GB) - Slower but excellent
llama3.2 (7GB) - Balanced option

4. Beautiful Terminal UI

Built with Rich for a modern TUI:


from rich.console import Console
from rich.panel import Panel
from rich.table import Table
from rich.layout import Layout

def create_metrics_panel(result):
    metrics = Table.grid(padding=(0, 2))

    # Messiness score with color coding
    score = result['messiness_score']
    color = "green" if score < 3 else "yellow" if score < 7 else "red"

    metrics.add_row(
        Panel(f"[{color}]{score:.1f}/10[/{color}]", title="Messiness")
    )

    return Panel(metrics, title="Metrics", border_style="blue")

Features:

  • Real-time metrics cards
  • Sparkline trend graphs (▁▂▃▄▅▆▇█) - generation sketched below
  • Color-coded scores
  • LLM analysis display
  • History tracking
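The sparklines are simple to generate by hand. A minimal sketch (not the repo's exact implementation):

BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    # Map each value onto one of eight bar heights
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)

print(sparkline([7.8, 7.1, 6.5, 6.2]))  # █▄▂▁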

Example Output


$ python monitor_tui.py

Messiness Score: 6.2/10 ⚠️

LLM Analysis:

Yes, this directory structure needs attention.

Top 3 Issues:
1. Excessive files in src/components (28 files) - 
   recommended maximum is 20. Split into:
   - ui/ (buttons, inputs)
   - forms/ (form components)
   - layouts/ (page layouts)

2. Naming violations (8 files):
   - "temp_fix.py" → move to .archive/ or delete
   - "Copy of feature.py" → remove or rename properly
   - Files with spaces → use kebab-case

3. Directory depth exceeds 7 levels - flatten structure

Messiness Rating: 6.2/10 - Moderate attention needed

Trend: 📉 Improving (was 7.8 → 6.2)

Privacy & Security

Everything stays local:


# NO external API calls
❌ openai.ChatCompletion.create()
❌ requests.post('https://api...')
❌ anthropic.messages.create()

# YES local processing
✅ ollama.chat()  # localhost:11434
✅ SentenceTransformer.encode()  # local CPU/GPU
✅ sqlite3.connect()  # local file


Verification:


# Monitor network traffic while running
sudo tcpdump -i any port not 22

# Result: No outbound connections (except Ollama on localhost)


Data stored:

  • SQLite database: directory_monitor.db
  • Location: Current directory (portable)
  • Contents: Timestamps, file counts, violation lists
  • NOT stored: File contents, sensitive data

Performance

Benchmarks on M1 Mac (8GB RAM):

| Operation | Time |
|---|---|
| Directory scan (1000 files) | ~0.3s |
| Embedding generation | ~0.1s |
| LLM analysis (Qwen3:8b) | ~2-3s |
| Full scan cycle | ~3-5s |

Memory usage:

  • Base: ~200MB (Python + dependencies)
  • With Qwen3:8b loaded: ~5.5GB
  • With embeddings cached: ~250MB

Optimizations:

  • Lazy loading of embeddings
  • Batch processing for large directories
  • Caching of LLM responses
  • SQLite indexes on timestamps (schema sketched below)
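The repo excerpt above doesn't show the schema, so here's an illustrative sketch of what the snapshots table and its timestamp index could look like (table and column names are assumptions):

import sqlite3

conn = sqlite3.connect("directory_monitor.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    path TEXT NOT NULL,
    total_files INTEGER,
    total_dirs INTEGER,
    data TEXT  -- full DirectorySnapshot as JSON
);
-- Speeds up history and trend queries that filter or sort by time
CREATE INDEX IF NOT EXISTS idx_snapshots_timestamp ON snapshots (timestamp);
""")
conn.commit()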

Challenges & Solutions

Challenge 1: SQLite Threading
Problem: Flask spawns worker threads, and a SQLite connection can't be shared across threads by default.

# ❌ This fails when a second thread touches the connection
self.conn = sqlite3.connect(db_path)

# ✅ Solution
self.conn = sqlite3.connect(db_path, check_same_thread=False)
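One caveat: check_same_thread=False only disables SQLite's safety check - access from multiple threads still needs to be serialized. A minimal pattern for that (my sketch, not necessarily how the repo handles it):

import sqlite3
import threading

class SafeDB:
    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path, check_same_thread=False)
        self.lock = threading.Lock()  # one statement at a time across threads

    def execute(self, sql: str, params=()):
        # Serialize every statement through the lock
        with self.lock:
            cur = self.conn.execute(sql, params)
            self.conn.commit()
            return cur.fetchall()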

Challenge 2: LLM Consistency
Problem: LLMs are non-deterministic - the same directory can get a different analysis on each run.
Solution: Constrain the output with a clearly structured prompt:

prompt = """Rate messiness 1-10 (10 = extremely messy)

Format:
**Messiness Rating**: X/10
**Top 3 Issues**:
1. Issue one
2. Issue two
3. Issue three
"""

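With the format pinned down, downstream code can pull the score out reliably. A small sketch, assuming the reply follows the template:

import re
from typing import Optional

def parse_messiness(reply: str) -> Optional[float]:
    # Matches lines like "**Messiness Rating**: 6.2/10"
    m = re.search(r"Messiness Rating\**\s*:?\s*(\d+(?:\.\d+)?)\s*/\s*10", reply)
    return float(m.group(1)) if m else None

print(parse_messiness("**Messiness Rating**: 6.2/10"))  # 6.2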

Challenge 3: Embedding Quality
Problem: Generic embeddings didn't capture directory-specific patterns well.
Solution: Create domain-specific text representations:

def snapshot_to_text(snapshot: DirectorySnapshot) -> str:
    # Derive max depth from the snapshot itself
    max_depth = max(snapshot.depth_distribution, default=0)
    return f"""
    Files: {snapshot.total_files}
    Directories: {snapshot.total_dirs}
    Max Depth: {max_depth}
    Violations: {", ".join(snapshot.naming_violations[:5])}
    File Types: {", ".join(snapshot.file_types.keys())}
    """

This improved similarity matching by 40%.

Results

After using it for 2 weeks on 3 projects:

| Project | Before | After | Improvement |
|---|---|---|---|
| Project A | 7.8/10 | 2.8/10 | 64% |
| Project B | 5.2/10 | 1.9/10 | 63% |
| Project C | 8.9/10 | 4.1/10 | 54% |

Most common recommendations:

  1. Split large directories (40% of scans)
  2. Remove temp/backup files (30%)
  3. Fix naming violations (20%)
  4. Flatten deep nesting (10%)

Unexpected benefit: The act of seeing a "messiness score" motivated me to clean up immediately. Gamification works!

Future Improvements

Planned features:

  • Git integration (track messiness by commit)
  • Language-specific rules (Python vs JavaScript standards)
  • Team collaboration (shared standards)
  • CI/CD integration (fail build if too messy)
  • More export formats (HTML reports, CSV)

Experimental ideas:

  • Use computer vision to analyze folder icons
  • Predict future messiness based on trends
  • Integration with IDEs (VS Code extension)
  • Mobile app for quick checks

Try It Yourself

# Clone
git clone https://github.com/sukanto-m/directory-monitor
cd directory-monitor

# Install
pip install -r requirements.txt

# Get Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull model
ollama pull qwen3:8b

# Run
python monitor_tui.py

GitHub: https://github.com/sukanto-m/directory-monitor

Tech Stack

  • Python 3.9+ - Core language
  • Ollama - Local LLM inference
  • sentence-transformers - Local embeddings
  • Rich - Terminal UI
  • Flask - Web UI
  • SQLite - Database
  • NumPy - Vector operations

Lessons Learned

1. Local-first is viable

I was skeptical that local LLMs could match cloud APIs. I was wrong.

Qwen3:8b gives surprisingly good analysis - sometimes better than GPT-3.5 because it's not overly verbose.

2. RAG adds real value

Without RAG, the LLM just analyzes snapshots independently. With RAG, it understands context and trends.

"You're regressing" hits different than "you have 28 files."

3. UX matters for CLI tools

Adding sparklines, color coding, and real-time updates made the difference between "neat demo" and "actually useful tool."

4. Privacy sells itself

I didn't expect the "100% local" angle to resonate so much. Turns out developers really care about this.

Conclusion

Building a local-first AI tool taught me:

  • Local LLMs are good enough for many use cases
  • RAG is powerful even with small datasets
  • Privacy-focused tools have a market
  • Python + Rich = beautiful CLIs

The future is local-first AI.

Cloud APIs are convenient, but local processing gives you:

  • Privacy
  • Control
  • No usage limits
  • Offline capability
  • No vendor lock-in

Try building something local-first. You might be surprised how capable these models are.


Questions?

Drop a comment! I'm happy to discuss:

  • RAG implementation details
  • Local LLM performance
  • Privacy considerations
  • Code architecture

Star the repo if you found this interesting: https://github.com/sukanto-m/directory-monitor


Built with Claude AI assistance for implementation guidance. The architecture, design decisions, and integration were collaborative between human direction and AI implementation.
