<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Srishti Mishra</title>
    <description>The latest articles on DEV Community by Srishti Mishra (@srishti1806).</description>
    <link>https://dev.to/srishti1806</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3873651%2F81bd91f9-c61e-467e-8f40-d4f8cc2f2d09.jpeg</url>
      <title>DEV Community: Srishti Mishra</title>
      <link>https://dev.to/srishti1806</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/srishti1806"/>
    <language>en</language>
    <item>
      <title>I Built a Voice-Controlled Local AI Agent That Actually Works — Here's Everything I Learned</title>
      <dc:creator>Srishti Mishra</dc:creator>
      <pubDate>Sun, 12 Apr 2026 15:36:08 +0000</pubDate>
      <link>https://dev.to/srishti1806/i-built-a-voice-controlled-local-ai-agent-that-actually-works-heres-everything-i-learned-3lal</link>
      <guid>https://dev.to/srishti1806/i-built-a-voice-controlled-local-ai-agent-that-actually-works-heres-everything-i-learned-3lal</guid>
      <description>&lt;p&gt;&lt;strong&gt;What I Built&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A voice-controlled AI agent that accepts audio input, classifies the user's intent, executes local tools, and displays the full pipeline in a Streamlit UI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it:&lt;/strong&gt; &lt;a href="https://github.com/Srishti-1806/OA_Submission/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://youtu.be/CogucxcxqlI?si=4qf1I-4La3uCyc6X" rel="noopener noreferrer"&gt;Demo Video&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;br&gt;
Audio (mic / file / text)&lt;br&gt;
        ↓&lt;br&gt;
[1] STT  — Groq Whisper API&lt;br&gt;
        ↓&lt;br&gt;
[2] Intent Detection — LLaMA 3.3 70B (JSON output)&lt;br&gt;
        ↓&lt;br&gt;
   create_file | write_code | summarize | general_chat | compound&lt;br&gt;
        ↓&lt;br&gt;
[3] Tool Execution → output/ folder&lt;br&gt;
        ↓&lt;br&gt;
[4] Streamlit UI&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Models Used&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;STT&lt;/td&gt;
&lt;td&gt;Groq Whisper (whisper-large-v3)&lt;/td&gt;
&lt;td&gt;Local Whisper took 12–18s per clip on CPU; Groq does it in under 1s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;LLaMA 3.3 70B via Groq&lt;/td&gt;
&lt;td&gt;Reliable structured JSON output; local 3B models were inconsistent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Hardware note:&lt;/strong&gt; Local &lt;code&gt;openai/whisper-base&lt;/code&gt; via HuggingFace works but is too slow for real-time use without a GPU. Groq's free tier is fast enough to feel instant.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Intent Classification&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The LLM returns strict JSON:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "intent": "write_code",
  "filename": "retry.py",
  "language": "python"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
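&lt;p&gt;A reply of this shape still has to be decoded defensively, because models sometimes wrap the JSON in extra prose. The sketch below is one possible way to do that, not the project's actual parser: the name &lt;code&gt;parse_intent&lt;/code&gt; and the choice of &lt;code&gt;general_chat&lt;/code&gt; as the default when nothing parses are assumptions made for illustration.&lt;/p&gt;

```python
import json
import re


def parse_intent(raw_reply: str) -> dict:
    """Pull the outermost JSON object out of an LLM reply and decode it.

    Hypothetical helper: falls back to {"intent": "general_chat"} whenever
    the reply cannot be decoded into a dict that carries an "intent" key.
    """
    default = {"intent": "general_chat"}  # assumed default, not from the post

    # Grab everything from the first "{" to the last "}" so surrounding
    # prose or code-fence markers are ignored.
    match = re.search(r"\{.*\}", raw_reply, re.DOTALL)
    if match is None:
        return default

    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return default

    if not isinstance(data, dict) or "intent" not in data:
        return default
    return data
```

&lt;p&gt;Returning a safe default instead of raising keeps the pipeline moving when the model misbehaves, which fits the graceful-degradation goal of the project.&lt;/p&gt;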



&lt;p&gt;For compound commands like &lt;em&gt;"write a retry function and send a leave letter"&lt;/em&gt;, it returns a &lt;code&gt;tasks&lt;/code&gt; array and the agent runs each sub-task sequentially. A keyword-based fallback handles API failures.&lt;/p&gt;
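&lt;p&gt;The keyword-based fallback can be very small. The following is a minimal sketch, assuming a hand-written keyword table; the post does not list the real keywords, so &lt;code&gt;KEYWORDS&lt;/code&gt; and &lt;code&gt;fallback_intent&lt;/code&gt; are hypothetical names and the word lists are illustrative only.&lt;/p&gt;

```python
# Hypothetical keyword table: the intents come from the post, the words do not.
# Order matters, because the first intent with a matching keyword wins.
KEYWORDS = {
    "write_code": ("code", "function", "script", "decorator"),
    "summarize": ("summarize", "summary"),
    "create_file": ("create", "make", "file"),
}


def fallback_intent(text: str) -> str:
    """Guess an intent from plain substring matches when the LLM call fails."""
    lowered = text.lower()
    for intent, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return intent
    # Nothing matched: treat the input as ordinary conversation.
    return "general_chat"
```

&lt;p&gt;Substring matching is crude (for example, "file" also matches "profile"), but for a fallback path that only runs when the API is unavailable, that trade-off may be acceptable.&lt;/p&gt;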




&lt;p&gt;&lt;strong&gt;Supported Intents&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Intent&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;create_file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"Make a notes.txt"&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;write_code&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"Write a Python retry decorator"&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;summarize&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"Summarize this and save it"&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;general_chat&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"What is a linked list?"&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;compound&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"Write a C++ sort and a leave letter"&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;strong&gt;Key Challenges&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Follow-up save commands&lt;/strong&gt; — &lt;em&gt;"Save that as a text file"&lt;/em&gt; has no content. The agent looks backward through chat history to find the last substantive assistant response and writes that to disk.&lt;/p&gt;
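&lt;p&gt;The backward search through chat history can be sketched as follows. This assumes messages are stored as dictionaries with &lt;code&gt;role&lt;/code&gt; and &lt;code&gt;content&lt;/code&gt; keys and that "substantive" means at least a minimum number of characters; the function name and the 40-character threshold are assumptions, not details from the project.&lt;/p&gt;

```python
from typing import Optional


def last_substantive_reply(history: list, min_chars: int = 40) -> Optional[str]:
    """Return the most recent assistant message long enough to be worth saving.

    `history` is assumed to be a list of {"role": ..., "content": ...} dicts,
    oldest first. Short acknowledgements such as "Sure." are skipped.
    """
    for message in reversed(history):
        if message["role"] != "assistant":
            continue
        content = message["content"].strip()
        if len(content) >= min_chars:
            return content
    # No assistant message met the threshold.
    return None
```

&lt;p&gt;When the function returns &lt;code&gt;None&lt;/code&gt;, the agent has nothing to save and can ask the user what content they meant.&lt;/p&gt;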

&lt;p&gt;&lt;strong&gt;2. Compound commands&lt;/strong&gt; — Split on &lt;code&gt;\band\b&lt;/code&gt; or commas, run intent detection on each fragment, execute independently.&lt;/p&gt;
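&lt;p&gt;A minimal sketch of that splitting step, using the same &lt;code&gt;\band\b&lt;/code&gt; pattern plus commas. The function name &lt;code&gt;split_compound&lt;/code&gt; and the case-insensitive flag are assumptions; each returned fragment would then go through intent detection on its own.&lt;/p&gt;

```python
import re


def split_compound(command: str) -> list:
    """Split a spoken command into sub-commands on the word "and" or on commas."""
    # \b keeps "and" from matching inside words such as "sandbox" or "candle".
    parts = re.split(r"\band\b|,", command, flags=re.IGNORECASE)
    # Drop empty fragments left behind by patterns like ", and".
    return [part.strip() for part in parts if part.strip()]
```

&lt;p&gt;This naive split also breaks phrases where "and" is part of a single request (for example "research and development"), so it works best on short, imperative voice commands.&lt;/p&gt;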

&lt;p&gt;&lt;strong&gt;3. Sandboxing&lt;/strong&gt; — All file writes are restricted to &lt;code&gt;output/&lt;/code&gt; via &lt;code&gt;_safe_filename()&lt;/code&gt; which strips path traversal sequences.&lt;/p&gt;
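&lt;p&gt;One way to enforce this kind of restriction is shown below. It is a sketch under stated assumptions, not the project's &lt;code&gt;_safe_filename()&lt;/code&gt;: the helper name &lt;code&gt;safe_output_path&lt;/code&gt;, the &lt;code&gt;untitled.txt&lt;/code&gt; default, and the extra resolve check are all illustrative choices. Instead of stripping traversal sequences from the string, it keeps only the final path component and then verifies the result still lives under &lt;code&gt;output/&lt;/code&gt;.&lt;/p&gt;

```python
from pathlib import Path

OUTPUT_DIR = Path("output")  # sandbox directory named in the post


def safe_output_path(requested: str) -> Path:
    """Map a user-supplied filename to a path that stays inside OUTPUT_DIR."""
    # Normalise Windows separators, then keep only the last path component,
    # so "../../etc/passwd" becomes "passwd".
    name = Path(requested.replace("\\", "/")).name
    if name in ("", ".", ".."):
        name = "untitled.txt"  # assumed default for unusable names

    candidate = (OUTPUT_DIR / name).resolve()
    # Second line of defence: reject anything that resolves outside output/.
    if OUTPUT_DIR.resolve() not in candidate.parents:
        raise ValueError("refusing to write outside the output directory")
    return candidate
```

&lt;p&gt;Checking the resolved path as well as cleaning the name means a mistake in the cleaning step cannot silently turn into a write outside the sandbox.&lt;/p&gt;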




&lt;p&gt;&lt;strong&gt;Bonus Features Implemented&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Compound commands&lt;/strong&gt; — multiple actions in one input&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Human-in-the-loop&lt;/strong&gt; — confirmation toggle before any file operation&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Graceful degradation&lt;/strong&gt; — keyword fallback if LLM fails; errors shown in UI&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Session memory&lt;/strong&gt; — last 10 messages passed as context on every call&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Setup&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Srishti-1806/OA_Submission.git
cd OA_Submission
pip install -r requirements.txt
# create a .env file and add GROQ_API_KEY to it
streamlit run app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can get a free Groq API key at &lt;a href="https://console.groq.com" rel="noopener noreferrer"&gt;console.groq.com&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flb61qv22otcsklr1c99h.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flb61qv22otcsklr1c99h.jpeg" alt=" " width="800" height="316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe0pbuzzjxbmrpxzqu1ua.jpeg" alt=" " width="800" height="402"&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa4fvv1f2gvuvvmsadkit.jpeg" alt=" " width="383" height="805"&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next Steps&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;True token streaming in the UI&lt;/li&gt;
&lt;li&gt;Structured outputs / function calling instead of regex-cleaned JSON&lt;/li&gt;
&lt;li&gt;TTS for voice responses to close the loop&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>backend</category>
    </item>
  </channel>
</rss>
