Building a Private, Voice-Controlled AI Agent with Ollama and Faster-Whisper
Project Overview
As part of the Mem0 AI & Generative AI Developer Intern assignment, I built a local AI agent that allows users to manage files, write code, and summarize text using only their voice. The core mission: 100% privacy and zero cloud dependencies.
The Tech Stack
To ensure the agent runs entirely on a local machine, I selected the following components:
- Frontend: Streamlit for a fast, responsive Web UI.
- Speech-to-Text: Faster-Whisper (Int8 quantized) for high-speed local transcription on a CPU.
- Brain (LLM): Ollama running phi3:mini (or llama3.2:1b) to classify intents.
- Tool Execution: Python's os and pathlib for safe file operations.
The Architecture
The pipeline follows a clear flow:
- Audio Input: The user provides audio via the browser microphone or a file upload.
- Transcription: Faster-Whisper processes the audio into text.
- Intent Detection: The LLM analyzes the text and returns a structured JSON object.
- Action: The system executes the specific intent (e.g., creating a file in the output/ folder).
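The last two steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the JSON shape ({"intent": ..., "args": {...}}), the create_file intent name, and the helper names parse_intent and dispatch are all assumptions made for the example.

```python
import json
from pathlib import Path

OUTPUT_DIR = Path("output")  # folder named in the article; everything else here is hypothetical


def parse_intent(raw: str) -> dict:
    """Parse the LLM reply into {"intent": str, "args": dict}; fall back to 'unknown'."""
    unknown = {"intent": "unknown", "args": {}}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return unknown
    if not isinstance(data, dict) or not isinstance(data.get("intent"), str):
        return unknown
    args = data.get("args")
    return {"intent": data["intent"], "args": args if isinstance(args, dict) else {}}


def create_file(args: dict, base: Path) -> str:
    """Write a text file directly inside `base`, ignoring any directory parts in the name."""
    base.mkdir(parents=True, exist_ok=True)
    name = Path(str(args.get("filename", ""))).name
    if name in ("", ".", ".."):
        name = "untitled.txt"
    (base / name).write_text(str(args.get("content", "")), encoding="utf-8")
    return f"created {name}"


# Map intent names to handler functions (assumed intent vocabulary).
HANDLERS = {"create_file": create_file}


def dispatch(raw: str, base: Path = OUTPUT_DIR) -> str:
    """Route a raw LLM reply to the matching handler."""
    intent = parse_intent(raw)
    handler = HANDLERS.get(intent["intent"])
    if handler is None:
        return "unsupported intent"
    return handler(intent["args"], base)
```

Validating the reply before acting on it matters with small models, which do not always return well-formed JSON; here anything unparseable simply maps to an unsupported intent instead of raising.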
Challenges Faced
1. Hardware Constraints (RAM)
My local machine had limited available memory (~3.1 GiB), which initially caused crashes when running larger models.
Solution: I switched to a model with fewer parameters (Phi-3 Mini) and forced CPU-only mode in Ollama, which kept memory usage within what the machine had available.
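One possible way to request CPU-only inference is per call, through the options field of Ollama's local HTTP API. The sketch below is an assumption about how this could be done, not necessarily how the project does it: it sets num_gpu to 0 so no layers are offloaded to a GPU, and uses a small num_ctx to limit memory. The prompt wording and the function names build_request and classify are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(transcript: str, model: str = "phi3:mini") -> dict:
    """Build a non-streaming, JSON-mode generate request that stays on the CPU."""
    prompt = (
        "Classify the user's request and reply with a JSON object "
        'containing "intent" and "args".\n'
        f"Request: {transcript}"
    )
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,       # return one complete reply instead of chunks
        "format": "json",      # ask Ollama to constrain output to valid JSON
        "options": {
            "num_gpu": 0,      # assumption: offload zero layers, i.e. CPU only
            "num_ctx": 1024,   # assumption: a short context to save RAM
        },
    }


def classify(transcript: str) -> str:
    """Send the request to a locally running Ollama server and return the raw reply text."""
    body = json.dumps(build_request(transcript)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Calling classify requires the Ollama server to be running with the model already pulled; build_request can be inspected on its own without a server.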
2. Browser Mic Permissions
Running Streamlit on localhost often triggers strict browser security blocks for the microphone.
Solution: I implemented a dual-input system that allows users to upload .wav files as a reliable fallback.
Safety & Security
To prevent accidental system damage, I implemented a strict safety constraint: all file operations are sandboxed within a dedicated ./output/ directory.
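A sandbox check of this kind could look like the sketch below, which uses pathlib to resolve a requested path and reject anything that ends up outside the output directory. The function name resolve_in_sandbox is hypothetical, and the project's real check may differ.

```python
from pathlib import Path

SANDBOX = Path("output")  # the dedicated directory from the article


def resolve_in_sandbox(user_path: str, sandbox: Path = SANDBOX) -> Path:
    """Return the absolute path for `user_path`, or raise if it escapes the sandbox."""
    root = sandbox.resolve()
    # Joining then resolving collapses ".." segments and follows symlinks;
    # an absolute user_path replaces root entirely and is caught below.
    candidate = (root / user_path).resolve()
    if root not in candidate.parents:
        raise ValueError(f"path escapes sandbox: {user_path!r}")
    return candidate
```

For example, resolve_in_sandbox("notes/todo.txt") returns a path under output/, while "../secrets.txt" or an absolute path such as "/etc/passwd" raises ValueError before any file operation runs. Checking the resolved path rather than the raw string is what stops ".." tricks like "a/../../b".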
Conclusion
Building this agent taught me how to bridge the gap between speech-to-text and LLM tool-calling in a local environment.
Links:
- GitHub: [https://github.com/idrisshaik630/voice_ai_agent]
- Video Demo: [https://youtu.be/dTX6O6MKJrs?si=zabAcyBtRyD_2xr4]