<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jessica Ekka</title>
    <description>The latest articles on DEV Community by Jessica Ekka (@jessica_ekka_c7a605f5a2ee).</description>
    <link>https://dev.to/jessica_ekka_c7a605f5a2ee</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3884680%2F31e4e9d7-94da-42c6-8984-d555f2d6e7fa.jpg</url>
      <title>DEV Community: Jessica Ekka</title>
      <link>https://dev.to/jessica_ekka_c7a605f5a2ee</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jessica_ekka_c7a605f5a2ee"/>
    <language>en</language>
    <item>
      <title>Local Voice-Controlled AI Agent (Whisper + Ollama + Streamlit)</title>
      <dc:creator>Jessica Ekka</dc:creator>
      <pubDate>Fri, 17 Apr 2026 14:53:37 +0000</pubDate>
      <link>https://dev.to/jessica_ekka_c7a605f5a2ee/local-voice-controlled-ai-agent-whisper-ollama-streamlit-6ac</link>
      <guid>https://dev.to/jessica_ekka_c7a605f5a2ee/local-voice-controlled-ai-agent-whisper-ollama-streamlit-6ac</guid>
      <description>&lt;p&gt;Most AI assistants today rely heavily on cloud APIs. While powerful, they introduce latency, cost, and privacy concerns.&lt;/p&gt;

&lt;p&gt;So I built a fully local voice-controlled AI agent that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand voice commands&lt;/li&gt;
&lt;li&gt;Detect user intent&lt;/li&gt;
&lt;li&gt;Generate code&lt;/li&gt;
&lt;li&gt;Create files&lt;/li&gt;
&lt;li&gt;Summarize text&lt;/li&gt;
&lt;li&gt;Chat interactively&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All running completely offline using open-source tools.&lt;/p&gt;

&lt;h2&gt;System Architecture&lt;/h2&gt;

&lt;p&gt;End-to-End Flow:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;User Input (Voice/Text)
        ↓
Speech-to-Text (Whisper)
        ↓
Intent Detection (Rules + LLM)
        ↓
Execution Engine
   ├── File Operations
   ├── Code Generation
   ├── Summarization
   └── Chat
        ↓
Streamlit UI (Results + Memory)
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Component Breakdown&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;app.py&lt;/code&gt; → UI + orchestration&lt;/li&gt;
&lt;li&gt;&lt;code&gt;agent.py&lt;/code&gt; → intent detection + LLM calls&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tools.py&lt;/code&gt; → secure file operations&lt;/li&gt;
&lt;li&gt;&lt;code&gt;stt.py&lt;/code&gt; → voice → text&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;How It Works&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Input Layer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;User provides either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Voice input (recorded via browser)&lt;/li&gt;
&lt;li&gt;Text command&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Speech-to-Text&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Voice input is converted to text using Whisper:&lt;/p&gt;

&lt;p&gt;"create a file called hello.txt"&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Intent Detection&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A hybrid approach is used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rule-based classification (fast + reliable)&lt;/li&gt;
&lt;li&gt;LLM fallback for flexibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;"Write a Python function for factorial"&lt;br&gt;
→ Intent: write_code&lt;/p&gt;
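&lt;p&gt;A minimal sketch of this hybrid detection: cheap keyword rules fire first, and the LLM is consulted only when nothing matches. The rule table and function names here are illustrative, not the project's actual &lt;code&gt;agent.py&lt;/code&gt;.&lt;/p&gt;

```python
# Illustrative hybrid intent detection: keyword rules first, LLM fallback.
# The rule table below is a sketch, not the project's real one.
RULES = {
    "write_code": ("write a", "function", "script", "code"),
    "create_file": ("create a file", "make a file", "new file"),
    "summarize": ("summarize", "summary", "tl;dr"),
}

def detect_intent(command: str, llm_classify=None) -> str:
    text = command.lower()
    for intent, keywords in RULES.items():
        if any(kw in text for kw in keywords):
            return intent
    # No rule matched: defer to the LLM classifier, or treat as chat.
    return llm_classify(text) if llm_classify else "chat"
```

&lt;p&gt;Rules cover the common, unambiguous phrasings, so the (slower) LLM only runs on genuinely ambiguous commands.&lt;/p&gt;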

&lt;ol start="4"&gt;
&lt;li&gt;Execution Engine&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Depending on intent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;create_file → writes to sandbox&lt;/li&gt;
&lt;li&gt;write_code → calls LLM&lt;/li&gt;
&lt;li&gt;summarize → LLM summarization&lt;/li&gt;
&lt;li&gt;chat → conversational response&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="5"&gt;
&lt;li&gt;UI Layer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Built with Streamlit:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shows transcription&lt;/li&gt;
&lt;li&gt;Displays detected intent&lt;/li&gt;
&lt;li&gt;Requires confirmation for file actions&lt;/li&gt;
&lt;li&gt;Displays results + saved files&lt;/li&gt;
&lt;/ul&gt;
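&lt;p&gt;The intent-to-action routing in step 4 can be sketched as a dispatch table. The handler stubs below are illustrative placeholders; in the real project they would call into &lt;code&gt;tools.py&lt;/code&gt; or the LLM.&lt;/p&gt;

```python
# Illustrative dispatch table for the execution engine.
# Handlers are stubs; the real ones route to tools.py or the LLM.
def create_file(cmd): return f"created file for: {cmd}"
def write_code(cmd): return f"generated code for: {cmd}"
def summarize(cmd): return f"summary of: {cmd}"
def chat(cmd): return f"chat reply to: {cmd}"

HANDLERS = {
    "create_file": create_file,
    "write_code": write_code,
    "summarize": summarize,
    "chat": chat,
}

def execute(intent: str, command: str) -> str:
    # Unknown intents degrade gracefully to a chat response.
    handler = HANDLERS.get(intent, chat)
    return handler(command)
```

&lt;p&gt;A dictionary dispatch keeps adding a new intent to a one-line change: write the handler, register it.&lt;/p&gt;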

&lt;p&gt;All file operations are sandboxed to &lt;code&gt;/output/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This prevents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Directory traversal (&lt;code&gt;../../&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Overwriting system files&lt;/li&gt;
&lt;li&gt;Unsafe file access&lt;/li&gt;
&lt;/ul&gt;
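&lt;p&gt;A path check along these lines (using &lt;code&gt;pathlib&lt;/code&gt;; the actual &lt;code&gt;tools.py&lt;/code&gt; may differ) is enough to reject traversal attempts: resolve the target path and refuse anything that lands outside the sandbox.&lt;/p&gt;

```python
# Sketch of sandboxed file writing: resolve the target and refuse any
# path that escapes the output directory. Illustrative, not the exact
# implementation in tools.py.
from pathlib import Path

SANDBOX = Path("output").resolve()

def safe_write(filename: str, content: str) -> Path:
    SANDBOX.mkdir(exist_ok=True)
    target = (SANDBOX / filename).resolve()
    # "../../etc/passwd" resolves outside the sandbox, so it is rejected here.
    if SANDBOX not in target.parents:
        raise ValueError(f"unsafe path: {filename}")
    target.write_text(content)
    return target
```

&lt;p&gt;Resolving first and comparing ancestors catches both literal &lt;code&gt;..&lt;/code&gt; segments and absolute paths in one check.&lt;/p&gt;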

&lt;h2&gt;Model Strategy&lt;/h2&gt;

&lt;p&gt;Running large models locally can be tricky, so I used:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Model&lt;/th&gt;&lt;th&gt;Purpose&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;llama3.2:3b&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Primary model&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;llama3.2:1b&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Fallback (low RAM)&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;h2&gt;Fallback Mechanism&lt;/h2&gt;

&lt;p&gt;If the main model fails to load or respond, the agent automatically switches to the smaller one.&lt;/p&gt;

&lt;p&gt;This ensures stability even on low-memory systems.&lt;/p&gt;
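&lt;p&gt;The fallback can be sketched as a loop over model names, primary first. Here &lt;code&gt;ask(model, prompt)&lt;/code&gt; is a stand-in for the actual Ollama client call, injected so the retry logic stays independent of any one library.&lt;/p&gt;

```python
# Sketch of the model fallback: try the primary model, drop to the
# smaller one on any failure. `ask(model, prompt)` stands in for an
# Ollama client call and is injected for testability.
MODELS = ["llama3.2:3b", "llama3.2:1b"]  # primary first, fallback second

def generate(prompt: str, ask) -> str:
    last_error = None
    for model in MODELS:
        try:
            return ask(model, prompt)
        except Exception as err:  # e.g. out-of-memory, model not pulled
            last_error = err
    raise RuntimeError("all models failed") from last_error
```

&lt;p&gt;Because the list is ordered, the caller always gets the best model the machine can currently run, with no code changes on low-RAM systems.&lt;/p&gt;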

&lt;h2&gt;Dynamic Model Switching&lt;/h2&gt;

&lt;p&gt;The UI includes a dropdown to switch models in real time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No restart required&lt;/li&gt;
&lt;li&gt;Useful for testing performance&lt;/li&gt;
&lt;li&gt;Helps in benchmarking&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Session Memory (Bonus Feature)&lt;/h2&gt;

&lt;p&gt;The system maintains a short-term memory:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stores last commands&lt;/li&gt;
&lt;li&gt;Tracks detected intents&lt;/li&gt;
&lt;li&gt;Displays recent activity&lt;/li&gt;
&lt;/ul&gt;
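&lt;p&gt;This short-term memory can be sketched as a bounded deque of (command, intent) pairs; the class name and size are illustrative.&lt;/p&gt;

```python
# Sketch of session memory: a bounded deque of (command, intent) pairs.
# Oldest entries fall off automatically once the limit is reached.
from collections import deque

class SessionMemory:
    def __init__(self, size: int = 10):
        self.events = deque(maxlen=size)

    def remember(self, command: str, intent: str) -> None:
        self.events.append((command, intent))

    def recent(self, n: int = 5):
        # Most recent n entries, oldest first.
        return list(self.events)[-n:]
```

&lt;p&gt;A &lt;code&gt;maxlen&lt;/code&gt; deque gives the "short-term" property for free: no manual trimming, no unbounded growth across a long session.&lt;/p&gt;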

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;Command: create hello.txt&lt;br&gt;
Intent: create_file&lt;/p&gt;

&lt;h2&gt;⚠️ Challenges Faced&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;LLM Returning Bad JSON&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Sometimes the model output was malformed.&lt;/p&gt;

&lt;p&gt;Fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avoid strict JSON parsing&lt;/li&gt;
&lt;li&gt;Use a rule-based fallback&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="2"&gt;
&lt;li&gt;High Memory Usage&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Large models like 70B were unusable locally.&lt;/p&gt;

&lt;p&gt;Fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Switched to smaller models (3B, 1B)&lt;/li&gt;
&lt;li&gt;Added fallback logic&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Voice Misinterpretation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;"write" → "right"&lt;/p&gt;

&lt;p&gt;Fix:&lt;/p&gt;

&lt;p&gt;Added text cleaning layer&lt;/p&gt;
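&lt;p&gt;Such a cleaning layer can be sketched as a word-level substitution table applied to the transcript; the correction list below is illustrative, not the project's actual one.&lt;/p&gt;

```python
# Sketch of the transcript-cleaning layer: map common Whisper
# mishearings back to the intended command words.
# The substitution table is illustrative only.
import re

CORRECTIONS = {
    "right": "write",   # "right a file" → "write a file"
    "fail": "file",
    "filed": "file",
}

def clean_transcript(text: str) -> str:
    def fix(match):
        word = match.group(0)
        return CORRECTIONS.get(word.lower(), word)
    # Replace whole words only, leaving punctuation untouched.
    return re.sub(r"[a-zA-Z]+", fix, text)
```

&lt;p&gt;Word-level replacement avoids corrupting substrings (e.g. "bright" is never touched), at the cost of occasionally rewriting a legitimately intended word.&lt;/p&gt;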

&lt;ol start="4"&gt;
&lt;li&gt;Parameter Extraction Issues&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;"write hello world in it"&lt;/p&gt;

&lt;p&gt;Fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regex-based extraction&lt;/li&gt;
&lt;li&gt;Post-cleaning of phrases&lt;/li&gt;
&lt;/ul&gt;
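&lt;p&gt;Filename and content extraction can be sketched with two regexes; the patterns below are illustrative, not the project's exact ones.&lt;/p&gt;

```python
# Sketch of parameter extraction: pull a filename and optional content
# out of a command like "create a file called hello.txt and write
# hello world in it". Patterns are illustrative.
import re

def extract_params(command: str):
    name = re.search(r"(?:called|named)\s+(\S+\.\w+)", command)
    body = re.search(r"write\s+(.+?)\s+in\s+it\b", command)
    filename = name.group(1) if name else None
    content = body.group(1).strip() if body else ""
    return filename, content
```

&lt;p&gt;The lazy &lt;code&gt;(.+?)&lt;/code&gt; stops at the first "in it", which is the post-cleaning step that strips trailing filler like "in it" from the captured content.&lt;/p&gt;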

&lt;h2&gt;Bonus Features Implemented&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Human-in-the-loop confirmation&lt;/li&gt;
&lt;li&gt;Graceful error handling&lt;/li&gt;
&lt;li&gt;Session memory&lt;/li&gt;
&lt;li&gt;Model switching&lt;/li&gt;
&lt;li&gt;Sandboxed file system&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Future Improvements&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Multi-command execution (e.g. “Summarize this and save it to file”)&lt;/li&gt;
&lt;li&gt;Persistent memory (database)&lt;/li&gt;
&lt;li&gt;Model benchmarking dashboard&lt;/li&gt;
&lt;li&gt;Smarter NLP-based intent detection&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
    </item>
  </channel>
</rss>
