Most AI assistants today rely heavily on cloud APIs. While powerful, they introduce latency, cost, and privacy concerns.
So I built a fully local voice-controlled AI agent that can:
- Understand voice commands
- Detect user intent
- Generate code
- Create files
- Summarize text
- Chat interactively
All running completely offline using open-source tools.
System Architecture
End-to-End Flow
User Input (Voice/Text)
↓
Speech-to-Text (Whisper)
↓
Intent Detection (Rules + LLM)
↓
Execution Engine
├── File Operations
├── Code Generation
├── Summarization
└── Chat
↓
Streamlit UI (Results + Memory)
Component Breakdown
app.py → UI + orchestration
agent.py → intent detection + LLM calls
tools.py → secure file operations
stt.py → voice → text
How It Works
- Input Layer
  User provides either:
  - Voice input (recorded via browser)
  - Text command
- Speech-to-Text
Voice input is converted to text using Whisper:
"create a file called hello.txt"
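The transcription step can be sketched as follows, assuming the open-source `openai-whisper` package; the model size and the normalization helper are illustrative, not necessarily what the project uses:

```python
def transcribe(audio_path: str, model_size: str = "base") -> str:
    """Run Whisper locally on a recorded clip and return plain text."""
    import whisper  # imported lazily so the rest of the app loads without it
    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path)
    return result["text"].strip()

def normalize_command(text: str) -> str:
    """Lowercase and strip trailing punctuation so downstream rule
    matching sees a consistent command string."""
    return text.lower().strip().rstrip(".!?")
```

For example, `normalize_command("Create a file called hello.txt.")` yields the clean command string shown above.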
- Intent Detection
  A hybrid approach is used:
  - Rule-based classification (fast + reliable)
  - LLM fallback for flexibility
Example:
"Write a Python function for factorial"
→ Intent: write_code
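The rule-based half of the hybrid can be sketched as a small pattern table with a fallthrough; the patterns and intent names below are illustrative, and in the real system the final `return` would hand off to the LLM instead of defaulting to chat:

```python
import re

# Illustrative rule set; the project's actual rules may differ.
INTENT_RULES = [
    (r"\b(create|make|new)\b.*\bfile\b", "create_file"),
    (r"\b(write|generate)\b.*\b(code|function|script)\b", "write_code"),
    (r"\bsummari[sz]e\b", "summarize"),
]

def detect_intent(command: str) -> str:
    """Fast rule-based pass; anything unmatched falls through to 'chat',
    where the real system would consult the LLM instead."""
    text = command.lower()
    for pattern, intent in INTENT_RULES:
        if re.search(pattern, text):
            return intent
    return "chat"  # LLM fallback would go here
```

Rules handle the common phrasings cheaply; only ambiguous commands pay the cost of an LLM round trip.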
- Execution Engine
Depending on intent:
- create_file → writes to sandbox
- write_code → calls LLM
- summarize → LLM summarization
- chat → conversational response
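The intent-to-handler routing above amounts to a dictionary dispatch. This is a minimal sketch with stub handlers standing in for the real tools.py and LLM calls:

```python
# Stub handlers; the real ones write to the sandbox or call the LLM.
def handle_create_file(params: dict) -> str:
    return f"created {params.get('filename', 'untitled.txt')}"

def handle_chat(params: dict) -> str:
    return "chat response"

HANDLERS = {
    "create_file": handle_create_file,
    "chat": handle_chat,
}

def execute(intent: str, params: dict) -> str:
    """Route a detected intent to its handler; unknown intents degrade to chat."""
    handler = HANDLERS.get(intent, handle_chat)
    return handler(params)
```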
- UI Layer
Built with Streamlit:
  - Shows transcription
  - Displays detected intent
  - Requires confirmation for file actions
  - Displays results + saved files
All file operations are sandboxed to:
/output/
This prevents:
- Directory traversal (../../)
- Overwriting system files
- Unsafe file access
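The traversal check can be sketched with `pathlib`: resolve the candidate path and verify it still sits inside the sandbox. The directory name is illustrative:

```python
from pathlib import Path

SANDBOX = Path("output").resolve()

def safe_path(filename: str) -> Path:
    """Resolve a user-supplied filename inside the sandbox and reject
    anything that escapes it (e.g. '../../etc/passwd')."""
    candidate = (SANDBOX / filename).resolve()
    if candidate != SANDBOX and SANDBOX not in candidate.parents:
        raise ValueError(f"unsafe path: {filename}")
    return candidate
```

Resolving first is the important part: string checks alone miss tricks like `a/../../etc`.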
Model Strategy
Running large models locally can be tricky, so I used:
Model          Purpose
llama3.2:3b    Primary model
llama3.2:1b    Fallback (low RAM)
Fallback Mechanism
If the main model fails, the system automatically switches to the smaller one.
This ensures stability even on low-memory systems.
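The fallback loop can be sketched like this; `ask_model` is a hypothetical stand-in for the actual local LLM call (e.g. a request to Ollama):

```python
MODELS = ["llama3.2:3b", "llama3.2:1b"]  # primary first, fallback second

def generate(prompt: str, ask_model) -> str:
    """Try each model in order; return the first successful response."""
    last_error = None
    for model in MODELS:
        try:
            return ask_model(model, prompt)
        except Exception as exc:  # e.g. out-of-memory, model not loaded
            last_error = exc
    raise RuntimeError("all models failed") from last_error
```

Keeping the model list in one place also makes the UI's dropdown switching trivial to wire up.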
Dynamic Model Switching
The UI includes a dropdown to switch models in real time:
- No restart required
- Useful for performance testing and benchmarking
Session Memory (Bonus Feature)
The system maintains a short-term memory:
- Stores last commands
- Tracks detected intents
- Displays recent activity
Example:
- Command: create hello.txt
- Intent: create_file
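A bounded short-term memory like this is a natural fit for a `deque`; the class below is a minimal sketch, with the capacity chosen arbitrarily:

```python
from collections import deque

class SessionMemory:
    """Short-term memory: keeps only the last `maxlen` interactions."""
    def __init__(self, maxlen: int = 5):
        self.entries = deque(maxlen=maxlen)

    def remember(self, command: str, intent: str) -> None:
        self.entries.append({"command": command, "intent": intent})

    def recent(self) -> list:
        return list(self.entries)
```

Because the deque is bounded, old entries fall off automatically and memory use stays constant across a long session.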
⚠️ Challenges Faced
- LLM Returning Bad JSON
Sometimes the model output was malformed.
Fix:
- Avoid strict JSON parsing
- Use rule-based fallback
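Tolerant parsing can be sketched as: pull the first `{...}` block out of the raw model reply, attempt to parse it, and return `None` on any failure so the caller drops to the rule-based path. The function name is illustrative:

```python
import json
import re

def parse_intent_json(raw: str):
    """Extract a JSON object from a possibly chatty model reply.
    Returns None on failure so the caller can fall back to rules."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
```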
- High Memory Usage
Large models like 70B were unusable locally.
Fix:
- Switched to smaller models (3B, 1B)
- Added fallback logic
- Voice Misinterpretation
Example:
"write" → "right"
Fix:
Added text cleaning layer
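A cleaning layer can be as simple as a word-level homophone map; this is a naive sketch (a real fix would use context, since "right" is sometimes the intended word), and the map entries are illustrative:

```python
# Known speech-to-text misrecognitions; illustrative, not exhaustive.
HOMOPHONES = {
    "right": "write",
    "reed": "read",
}

def clean_transcript(text: str) -> str:
    """Naive word-by-word replacement of common misheard words."""
    words = [HOMOPHONES.get(w, w) for w in text.lower().split()]
    return " ".join(words)
```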
- Parameter Extraction Issues
Example:
"write hello world in it"
Fix:
- Regex-based extraction
- Post-cleaning of phrases
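The regex extraction can be sketched like this; the patterns below are illustrative examples of the approach, not the project's actual expressions:

```python
import re

def extract_file_params(command: str):
    """Pull a filename and optional content out of a spoken command like
    'create a file called hello.txt and write hello world in it'."""
    name = re.search(r"(?:called|named)\s+([\w.\-]+)", command)
    content = re.search(r"write\s+(.*?)\s+in it\b", command)
    return (
        name.group(1) if name else None,
        content.group(1).strip() if content else None,
    )
```

The non-greedy `(.*?)` plus the trailing anchor "in it" is what keeps phrases like "write hello world in it" from swallowing the rest of the sentence.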
Bonus Features Implemented
- Human-in-the-loop confirmation
- Graceful error handling
- Session memory
- Model switching
- Sandboxed file system
Future Improvements
- Multi-command execution (“Summarize this and save it to file”)
- Persistent memory (database)
- Model benchmarking dashboard
- Smarter NLP-based intent detection