<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sanvi_Kulkarni</title>
    <description>The latest articles on DEV Community by Sanvi_Kulkarni (@sanvi09kulkarni).</description>
    <link>https://dev.to/sanvi09kulkarni</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3872222%2Fe425ac7d-f89d-4bad-9bed-9c0bbfeefd85.jpeg</url>
      <title>DEV Community: Sanvi_Kulkarni</title>
      <link>https://dev.to/sanvi09kulkarni</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sanvi09kulkarni"/>
    <language>en</language>
    <item>
      <title>How I Built a Voice-Controlled Local AI Agent with Python and Groq</title>
      <dc:creator>Sanvi_Kulkarni</dc:creator>
      <pubDate>Fri, 10 Apr 2026 17:33:35 +0000</pubDate>
      <link>https://dev.to/sanvi09kulkarni/how-i-built-a-voice-controlled-local-ai-agent-with-python-and-groq-gm3</link>
      <guid>https://dev.to/sanvi09kulkarni/how-i-built-a-voice-controlled-local-ai-agent-with-python-and-groq-gm3</guid>
      <description>&lt;p&gt;What I Built&lt;/p&gt;

&lt;p&gt;I built a voice-controlled AI agent that can take spoken input and convert it into meaningful actions.&lt;/p&gt;

&lt;p&gt;The system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accepts input via microphone or audio file&lt;/li&gt;
&lt;li&gt;Converts speech to text using Whisper (via the Groq API)&lt;/li&gt;
&lt;li&gt;Uses an LLM to understand what the user wants&lt;/li&gt;
&lt;li&gt;Executes the appropriate action locally — like creating files, generating code, summarizing content, or responding conversationally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project was developed as part of the Mem0 AI/ML Generative AI Developer Intern assignment.&lt;/p&gt;

&lt;p&gt;Live demo: [your streamlit URL here]&lt;br&gt;
GitHub: &lt;a href="https://github.com/Sanvi09Kulkarni/voice-AI-agent" rel="noopener noreferrer"&gt;https://github.com/Sanvi09Kulkarni/voice-AI-agent&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;The application follows a simple but effective pipeline:&lt;/p&gt;

&lt;p&gt;Audio Input → Speech-to-Text → Intent Detection → Action Execution → UI Output&lt;/p&gt;
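&lt;p&gt;In code terms, the pipeline can be sketched like this (all function names are illustrative stand-ins, not the project's actual modules):&lt;/p&gt;

```python
# Hedged sketch of the pipeline: Audio Input, Speech-to-Text,
# Intent Detection, Action Execution. Function names are illustrative.

def transcribe_audio(audio_path):
    # Real version: Groq-hosted Whisper, with faster-whisper as local fallback.
    return "write a bubble sort and save it as sort.py"

def detect_intent(transcript):
    # Real version: prompt an LLM to emit structured JSON.
    return {"intents": ["write_code"], "params": {"filename": "sort.py"}}

def execute_action(intent, params):
    # Real version: creates files, generates code, etc.; here we just report.
    return f"{intent} -> {params.get('filename')}"

def run_pipeline(audio_path):
    transcript = transcribe_audio(audio_path)
    plan = detect_intent(transcript)
    return [execute_action(i, plan["params"]) for i in plan["intents"]]
```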

&lt;p&gt;Tech stack used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Streamlit — for building the UI quickly&lt;/li&gt;
&lt;li&gt;Groq API — Whisper (speech-to-text) + LLM (intent understanding)&lt;/li&gt;
&lt;li&gt;faster-whisper — local fallback for transcription&lt;/li&gt;
&lt;li&gt;Python — core logic and tool execution&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Intent Classification Approach
&lt;/h2&gt;

&lt;p&gt;Instead of training a separate model, I used prompt engineering to guide the LLM to return structured outputs.&lt;/p&gt;

&lt;p&gt;The model is instructed to respond strictly in JSON, for example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"intents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"write_code"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"filename"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sort.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"language"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bubble sort function"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes the system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;predictable&lt;/li&gt;
&lt;li&gt;easy to parse&lt;/li&gt;
&lt;li&gt;extensible for multiple actions&lt;/li&gt;
&lt;/ul&gt;
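&lt;p&gt;Parsing that structured reply comes down to a single json.loads call plus a guard for malformed output. A minimal sketch (falling back to general_chat on bad JSON is my assumption, not necessarily the app's exact behavior):&lt;/p&gt;

```python
import json

def parse_intent_reply(reply):
    """Parse the LLM's JSON reply into (intents, params).

    Degrades to general_chat if the model returned malformed JSON.
    """
    try:
        data = json.loads(reply)
        return data.get("intents", []), data.get("params", {})
    except json.JSONDecodeError:
        # Malformed output: fall back to a safe default intent.
        return ["general_chat"], {}
```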

&lt;p&gt;Supported intents include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;create_file — creates a file in a safe directory&lt;/li&gt;
&lt;li&gt;write_code — generates and saves code&lt;/li&gt;
&lt;li&gt;summarize — produces concise summaries&lt;/li&gt;
&lt;li&gt;general_chat — handles normal conversations&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Handling Multiple Actions
&lt;/h2&gt;

&lt;p&gt;The system supports compound commands.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;“Write a retry function and save it as retry.py”&lt;/p&gt;

&lt;p&gt;This results in two intents (write_code + create_file), which the system executes sequentially.&lt;/p&gt;
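&lt;p&gt;Sequential execution can be sketched as a dispatch table mapping intent names to handlers; the handlers below are illustrative placeholders, not the project's real ones:&lt;/p&gt;

```python
# Hypothetical dispatch table mapping intent names to handler functions.
HANDLERS = {
    "write_code": lambda p: f"generated code for {p.get('filename')}",
    "create_file": lambda p: f"created {p.get('filename')}",
}

def execute_plan(intents, params):
    # Run each detected intent in order, collecting the results.
    results = []
    for intent in intents:
        handler = HANDLERS.get(intent)
        if handler is None:
            results.append(f"unknown intent: {intent}")
        else:
            results.append(handler(params))
    return results
```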

&lt;h2&gt;
  
  
  Safety Measures
&lt;/h2&gt;

&lt;p&gt;All file operations are restricted to a controlled output/ directory.&lt;/p&gt;

&lt;p&gt;To prevent misuse:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filenames are sanitized&lt;/li&gt;
&lt;li&gt;Path traversal (like ../../) is blocked&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ensures the agent cannot read or write files outside that directory.&lt;/p&gt;
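&lt;p&gt;One common way to implement the traversal check is to resolve the requested path and confirm it stays inside output/. A sketch, not the project's exact code (Path.is_relative_to needs Python 3.9+):&lt;/p&gt;

```python
from pathlib import Path

OUTPUT_DIR = Path("output").resolve()

def safe_path(filename):
    """Return a path inside OUTPUT_DIR, or None if the name escapes it."""
    # Resolve the requested name against the sandbox dir, then verify
    # the result is still contained in it (blocks ../../ tricks).
    candidate = (OUTPUT_DIR / filename).resolve()
    if candidate.is_relative_to(OUTPUT_DIR):
        return candidate
    return None
```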

&lt;h2&gt;
  
  
  Challenges I Faced
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Missing file contents&lt;br&gt;
Some project files were empty after setup, which caused import errors. I had to manually verify and restore each file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Model changes during development&lt;br&gt;
The Groq model llama3-8b-8192 was deprecated, so I switched to llama-3.3-70b-versatile.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Incorrect language detection&lt;br&gt;
The local Whisper model sometimes transcribed in the wrong language. Using Groq’s hosted Whisper resolved this.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Git issues with the virtual environment&lt;br&gt;
I accidentally committed .venv, which bloated the repository. I fixed it by adding .venv to .gitignore and removing the folder from tracking.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Groq vs Local Ollama — Speed Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Backend&lt;/th&gt;
&lt;th&gt;Transcription&lt;/th&gt;
&lt;th&gt;Intent Classification&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;faster-whisper (local, base model)&lt;/td&gt;
&lt;td&gt;~15-30 seconds&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ollama llama3 (local)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;~8-12 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Groq API&lt;/td&gt;
&lt;td&gt;~1 second&lt;/td&gt;
&lt;td&gt;~1-2 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Groq wins by a huge margin for development and demos. For a fully offline/private deployment, Ollama + faster-whisper is the way to go.&lt;/p&gt;




&lt;h2&gt;
  
  
  Graceful Degradation
&lt;/h2&gt;

&lt;p&gt;If the LLM is unavailable, the app falls back to a rule-based intent classifier using keyword matching. So even without an API key or internet connection, basic intents like create_file and write_code still work.&lt;/p&gt;
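&lt;p&gt;Such a fallback can be a handful of keyword rules; the keywords below are illustrative, not the app's actual rule set:&lt;/p&gt;

```python
# Illustrative keyword rules for the offline fallback classifier.
FALLBACK_RULES = [
    ("write_code", ["write code", "function", "script"]),
    ("create_file", ["create file", "save it as", "make a file"]),
    ("summarize", ["summarize", "summary"]),
]

def fallback_intent(text):
    """Keyword-matching classifier used when no LLM is reachable."""
    lowered = text.lower()
    intents = []
    for intent, keywords in FALLBACK_RULES:
        if any(k in lowered for k in keywords):
            intents.append(intent)
    # Default to conversation if nothing matched.
    return intents or ["general_chat"]
```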




&lt;h2&gt;
  
  
  How to Run It Yourself
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Clone the repo: git clone &lt;a href="https://github.com/Sanvi09Kulkarni/voice-AI-agent" rel="noopener noreferrer"&gt;https://github.com/Sanvi09Kulkarni/voice-AI-agent&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Install deps: pip install -r requirements.txt&lt;/li&gt;
&lt;li&gt;Get a free Groq API key at console.groq.com&lt;/li&gt;
&lt;li&gt;Run: streamlit run app.py&lt;/li&gt;
&lt;li&gt;Select Groq API in both dropdowns, paste your key, and start talking&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Structured JSON prompting is more reliable than free-form LLM output for classification tasks&lt;/li&gt;
&lt;li&gt;Groq's hosted Whisper is dramatically faster than running Whisper locally on CPU&lt;/li&gt;
&lt;li&gt;Streamlit makes it surprisingly easy to build production-looking AI apps quickly&lt;/li&gt;
&lt;li&gt;Always add .venv to .gitignore before your first commit&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Thanks for reading! Check out the live demo and drop a star on GitHub if you found this useful.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>github</category>
    </item>
  </channel>
</rss>
