<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Saichaithanya Kyatham</title>
    <description>The latest articles on DEV Community by Saichaithanya Kyatham (@saichaithanya_kyatham_7d1).</description>
    <link>https://dev.to/saichaithanya_kyatham_7d1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3873782%2F1dd909e0-f916-43ea-abaa-4fb0dbab34fd.png</url>
      <title>DEV Community: Saichaithanya Kyatham</title>
      <link>https://dev.to/saichaithanya_kyatham_7d1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/saichaithanya_kyatham_7d1"/>
    <language>en</language>
    <item>
      <title>Building a Voice-Controlled AI Agent with Groq, OpenRouter, and Streamlit</title>
      <dc:creator>Saichaithanya Kyatham</dc:creator>
      <pubDate>Sat, 11 Apr 2026 16:23:34 +0000</pubDate>
      <link>https://dev.to/saichaithanya_kyatham_7d1/building-a-voice-controlled-ai-agent-with-groq-openrouter-and-streamlit-4g9j</link>
      <guid>https://dev.to/saichaithanya_kyatham_7d1/building-a-voice-controlled-ai-agent-with-groq-openrouter-and-streamlit-4g9j</guid>
      <description>&lt;p&gt;I built a voice-controlled AI agent that can take audio input, convert speech to text, understand the user’s intent, perform local actions, and display the complete pipeline in a simple web UI.&lt;/p&gt;

&lt;p&gt;The goal of the project was to create an agent that supports:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;creating files&lt;/li&gt;
&lt;li&gt;writing code into files&lt;/li&gt;
&lt;li&gt;summarizing text&lt;/li&gt;
&lt;li&gt;general chat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To keep the system safe, every file the agent generates is written inside a dedicated output/ folder and nowhere else on disk.&lt;/p&gt;
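
&lt;p&gt;One way to enforce that guarantee (a sketch I wrote for illustration, not necessarily the project’s exact implementation) is to resolve every requested filename against the output/ directory and reject anything that escapes it:&lt;/p&gt;

```python
from pathlib import Path

OUTPUT_DIR = Path("output")

def safe_output_path(filename: str) -> Path:
    """Resolve filename inside OUTPUT_DIR, rejecting path traversal."""
    OUTPUT_DIR.mkdir(exist_ok=True)
    candidate = (OUTPUT_DIR / filename).resolve()
    # A legitimate target always has the resolved output/ dir as an ancestor.
    if OUTPUT_DIR.resolve() not in candidate.parents:
        raise ValueError(f"refusing to write outside output/: {filename}")
    return candidate
```

&lt;p&gt;Resolving first and then checking the ancestry catches tricks like "../secrets.txt" that a simple string prefix check would miss.&lt;/p&gt;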

&lt;p&gt;Tech Stack:&lt;/p&gt;

&lt;p&gt;I used:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Python&lt;/li&gt;
&lt;li&gt;Streamlit for the UI&lt;/li&gt;
&lt;li&gt;Groq Speech-to-Text API for transcription&lt;/li&gt;
&lt;li&gt;OpenRouter API for intent classification and text generation&lt;/li&gt;
&lt;li&gt;python-dotenv for API key management&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;How It Works:&lt;/p&gt;

&lt;p&gt;The workflow is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user records audio or uploads an audio file.&lt;/li&gt;
&lt;li&gt;The audio is saved temporarily.&lt;/li&gt;
&lt;li&gt;Groq converts the speech into text.&lt;/li&gt;
&lt;li&gt;OpenRouter classifies the user’s intent.&lt;/li&gt;
&lt;li&gt;Based on the intent, the system performs the required action.&lt;/li&gt;
&lt;li&gt;The UI shows the transcription, intent, action taken, and final result.&lt;/li&gt;
&lt;/ol&gt;
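
&lt;p&gt;The steps above can be sketched as a small dispatch loop. This is an illustrative outline rather than the project’s actual code: transcribe() and classify_intent() are stand-ins for the Groq and OpenRouter calls, and the keyword matching below only mimics what the LLM classifier does.&lt;/p&gt;

```python
def transcribe(audio_path):
    """Placeholder for the Groq speech-to-text call."""
    return "summarize this paragraph for me"

def classify_intent(text):
    """Placeholder for the OpenRouter intent classifier."""
    lowered = text.lower()
    for intent, keywords in [("write_code", ["function", "script", "code"]),
                             ("create_file", ["create", "file"]),
                             ("summarize", ["summarize", "summary"])]:
        if any(word in lowered for word in keywords):
            return intent
    return "chat"

# Each intent routes to a local action; stubs stand in for the real handlers.
ACTIONS = {
    "create_file": lambda text: "created an empty file in output/",
    "write_code": lambda text: "wrote generated code into output/",
    "summarize": lambda text: "returned a summary",
    "chat": lambda text: "returned a chat reply",
}

def handle_command(audio_path):
    """Run the full pipeline: transcribe, classify, then act."""
    text = transcribe(audio_path)
    intent = classify_intent(text)
    return {"transcription": text, "intent": intent,
            "result": ACTIONS[intent](text)}
```

&lt;p&gt;The dictionary returned at the end maps directly onto what the Streamlit UI displays: transcription, intent, and result.&lt;/p&gt;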

&lt;p&gt;Why I Chose This Approach:&lt;/p&gt;

&lt;p&gt;At first, I considered using local Whisper and Ollama. However, local speech-to-text often requires extra dependencies such as FFmpeg, and a local LLM setup can be harder to manage across devices.&lt;/p&gt;

&lt;p&gt;To make the project easier to run and more deployment-friendly, I used:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Groq for fast speech transcription&lt;/li&gt;
&lt;li&gt;OpenRouter for reasoning, summarization, chat, and code generation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This made the system more stable and portable.&lt;/p&gt;

&lt;p&gt;Main Challenges:&lt;/p&gt;

&lt;p&gt;One challenge was intent classification.&lt;br&gt;
For example, a command like:&lt;/p&gt;

&lt;p&gt;“Create a Python file with a retry function”&lt;/p&gt;

&lt;p&gt;was initially classified as create_file instead of write_code.&lt;/p&gt;

&lt;p&gt;I fixed this by improving the classifier prompt and making the intent rules more explicit.&lt;/p&gt;
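
&lt;p&gt;For illustration, an explicit rule-based classifier prompt might look like the following (my wording, not the exact prompt from the project):&lt;/p&gt;

```python
# Illustrative classifier prompt; the Rule line is what resolves the
# create_file vs. write_code ambiguity described above.
CLASSIFIER_PROMPT = """Classify the user's command into exactly one intent:
- create_file: make a new file with no code content requested
- write_code: the user asks for code, a function, or a script
- summarize: the user asks for a summary of some text
- chat: anything else

Rule: if the command mentions code, a function, or a script, answer
write_code even if it also says "create a file".

Answer with only the intent name.

Command: {command}"""

prompt = CLASSIFIER_PROMPT.format(
    command="Create a Python file with a retry function")
```

&lt;p&gt;Spelling out the tie-breaking rule in the prompt, instead of hoping the model infers it, is what makes the ambiguous example above classify correctly.&lt;/p&gt;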

&lt;p&gt;Another issue was handling API keys securely. I solved that by using a .env file and excluding it from GitHub with .gitignore.&lt;/p&gt;
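
&lt;p&gt;The key-loading pattern is straightforward; this sketch falls back to plain environment variables if python-dotenv is not installed:&lt;/p&gt;

```python
import os

# python-dotenv reads KEY=value pairs from a local .env file into
# os.environ; the .env file itself is listed in .gitignore.
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # fall back to whatever the shell environment provides

GROQ_API_KEY = os.getenv("GROQ_API_KEY")
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
```

&lt;p&gt;Keeping the keys out of the source tree means the repository can be shared publicly without rotating credentials.&lt;/p&gt;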

&lt;p&gt;What I Learned:&lt;/p&gt;

&lt;p&gt;This project taught me that building an AI agent is not just about calling a model. The real work is in:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;handling inputs properly&lt;/li&gt;
&lt;li&gt;structuring model outputs&lt;/li&gt;
&lt;li&gt;routing actions safely&lt;/li&gt;
&lt;li&gt;building a clear interface for users&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I also learned how much prompt design matters. A small prompt change can significantly improve the quality of intent detection.&lt;/p&gt;

&lt;p&gt;Conclusion:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldxp1vgvgytghn6kty50.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldxp1vgvgytghn6kty50.png" alt=" " width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This project was a practical way to combine speech recognition, intent understanding, local tool execution, and UI design into one application.&lt;/p&gt;

&lt;p&gt;Using Groq, OpenRouter, and Streamlit, I built a voice-controlled AI agent that can listen, understand, and act on user commands in a safe and structured way.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>python</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
