<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Saichaithanya Kyatham</title>
    <description>The latest articles on DEV Community by Saichaithanya Kyatham (@saichaithanya_kyatham_7d1).</description>
    <link>https://dev.to/saichaithanya_kyatham_7d1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3873782%2F1dd909e0-f916-43ea-abaa-4fb0dbab34fd.png</url>
      <title>DEV Community: Saichaithanya Kyatham</title>
      <link>https://dev.to/saichaithanya_kyatham_7d1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/saichaithanya_kyatham_7d1"/>
    <language>en</language>
    <item>
      <title>Building a Voice-Controlled AI Agent with Groq, OpenRouter, and Streamlit</title>
      <dc:creator>Saichaithanya Kyatham</dc:creator>
      <pubDate>Sat, 11 Apr 2026 16:23:34 +0000</pubDate>
      <link>https://dev.to/saichaithanya_kyatham_7d1/building-a-voice-controlled-ai-agent-with-groq-openrouter-and-streamlit-4g9j</link>
      <guid>https://dev.to/saichaithanya_kyatham_7d1/building-a-voice-controlled-ai-agent-with-groq-openrouter-and-streamlit-4g9j</guid>
      <description>&lt;p&gt;I built a voice-controlled AI agent that can take audio input, convert speech to text, understand the user’s intent, perform local actions, and display the complete pipeline in a simple web UI.&lt;/p&gt;

&lt;p&gt;The goal of the project was to create an agent that supports:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;creating files&lt;/li&gt;
&lt;li&gt;writing code into files&lt;/li&gt;
&lt;li&gt;summarizing text&lt;/li&gt;
&lt;li&gt;general chat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To keep the system safe, every file the agent generates is written inside a dedicated output/ folder and nowhere else on disk.&lt;/p&gt;
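
&lt;p&gt;One way to enforce that guarantee (a sketch I wrote for illustration, not necessarily the project’s exact implementation) is to resolve every requested filename against the output/ directory and reject anything that escapes it:&lt;/p&gt;

```python
from pathlib import Path

OUTPUT_DIR = Path("output")

def safe_output_path(filename: str) -> Path:
    """Resolve filename inside OUTPUT_DIR, rejecting path traversal."""
    OUTPUT_DIR.mkdir(exist_ok=True)
    candidate = (OUTPUT_DIR / filename).resolve()
    # A legitimate target always has the resolved output/ dir as an ancestor.
    if OUTPUT_DIR.resolve() not in candidate.parents:
        raise ValueError(f"refusing to write outside output/: {filename}")
    return candidate
```

&lt;p&gt;Resolving first and then checking the ancestry catches tricks like "../secrets.txt" that a simple string prefix check would miss.&lt;/p&gt;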

&lt;p&gt;Tech Stack:&lt;/p&gt;

&lt;p&gt;I used:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Python&lt;/li&gt;
&lt;li&gt;Streamlit for the UI&lt;/li&gt;
&lt;li&gt;Groq Speech-to-Text API for transcription&lt;/li&gt;
&lt;li&gt;OpenRouter API for intent classification and text generation&lt;/li&gt;
&lt;li&gt;python-dotenv for API key management&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;How It Works:&lt;/p&gt;

&lt;p&gt;The workflow is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user records audio or uploads an audio file.&lt;/li&gt;
&lt;li&gt;The audio is saved temporarily.&lt;/li&gt;
&lt;li&gt;Groq converts the speech into text.&lt;/li&gt;
&lt;li&gt;OpenRouter classifies the user’s intent.&lt;/li&gt;
&lt;li&gt;Based on the intent, the system performs the required action.&lt;/li&gt;
&lt;li&gt;The UI shows the transcription, intent, action taken, and final result.&lt;/li&gt;
&lt;/ol&gt;
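
&lt;p&gt;The steps above can be sketched as a small dispatch loop. This is an illustrative outline rather than the project’s actual code: transcribe() and classify_intent() are stand-ins for the Groq and OpenRouter calls, and the keyword matching below only mimics what the LLM classifier does.&lt;/p&gt;

```python
def transcribe(audio_path):
    """Placeholder for the Groq speech-to-text call."""
    return "summarize this paragraph for me"

def classify_intent(text):
    """Placeholder for the OpenRouter intent classifier."""
    lowered = text.lower()
    for intent, keywords in [("write_code", ["function", "script", "code"]),
                             ("create_file", ["create", "file"]),
                             ("summarize", ["summarize", "summary"])]:
        if any(word in lowered for word in keywords):
            return intent
    return "chat"

# Each intent routes to a local action; stubs stand in for the real handlers.
ACTIONS = {
    "create_file": lambda text: "created an empty file in output/",
    "write_code": lambda text: "wrote generated code into output/",
    "summarize": lambda text: "returned a summary",
    "chat": lambda text: "returned a chat reply",
}

def handle_command(audio_path):
    """Run the full pipeline: transcribe, classify, then act."""
    text = transcribe(audio_path)
    intent = classify_intent(text)
    return {"transcription": text, "intent": intent,
            "result": ACTIONS[intent](text)}
```

&lt;p&gt;The dictionary returned at the end maps directly onto what the Streamlit UI displays: transcription, intent, and result.&lt;/p&gt;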

&lt;p&gt;Why I Chose This Approach:&lt;/p&gt;

&lt;p&gt;At first, I considered using local Whisper and Ollama. However, local speech-to-text often requires extra dependencies such as FFmpeg, and a local LLM setup can be harder to manage across devices.&lt;/p&gt;

&lt;p&gt;To make the project easier to run and more deployment-friendly, I used:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Groq for fast speech transcription&lt;/li&gt;
&lt;li&gt;OpenRouter for reasoning, summarization, chat, and code generation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This made the system more stable and portable.&lt;/p&gt;

&lt;p&gt;Main Challenges:&lt;/p&gt;

&lt;p&gt;One challenge was intent classification.&lt;br&gt;
For example, a command like:&lt;/p&gt;

&lt;p&gt;“Create a Python file with a retry function”&lt;/p&gt;

&lt;p&gt;was initially classified as create_file instead of write_code.&lt;/p&gt;

&lt;p&gt;I fixed this by improving the classifier prompt and making the intent rules more explicit.&lt;/p&gt;
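
&lt;p&gt;For illustration, an explicit rule-based classifier prompt might look like the following (my wording, not the exact prompt from the project):&lt;/p&gt;

```python
# Illustrative classifier prompt; the Rule line is what resolves the
# create_file vs. write_code ambiguity described above.
CLASSIFIER_PROMPT = """Classify the user's command into exactly one intent:
- create_file: make a new file with no code content requested
- write_code: the user asks for code, a function, or a script
- summarize: the user asks for a summary of some text
- chat: anything else

Rule: if the command mentions code, a function, or a script, answer
write_code even if it also says "create a file".

Answer with only the intent name.

Command: {command}"""

prompt = CLASSIFIER_PROMPT.format(
    command="Create a Python file with a retry function")
```

&lt;p&gt;Spelling out the tie-breaking rule in the prompt, instead of hoping the model infers it, is what makes the ambiguous example above classify correctly.&lt;/p&gt;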

&lt;p&gt;Another issue was handling API keys securely. I solved that by using a .env file and excluding it from GitHub with .gitignore.&lt;/p&gt;
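
&lt;p&gt;The key-loading pattern is straightforward; this sketch falls back to plain environment variables if python-dotenv is not installed:&lt;/p&gt;

```python
import os

# python-dotenv reads KEY=value pairs from a local .env file into
# os.environ; the .env file itself is listed in .gitignore.
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # fall back to whatever the shell environment provides

GROQ_API_KEY = os.getenv("GROQ_API_KEY")
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
```

&lt;p&gt;Keeping the keys out of the source tree means the repository can be shared publicly without rotating credentials.&lt;/p&gt;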

&lt;p&gt;What I Learned:&lt;/p&gt;

&lt;p&gt;This project taught me that building an AI agent is not just about calling a model. The real work is in:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;handling inputs properly&lt;/li&gt;
&lt;li&gt;structuring model outputs&lt;/li&gt;
&lt;li&gt;routing actions safely&lt;/li&gt;
&lt;li&gt;building a clear interface for users&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I also learned how much prompt design matters. A small prompt change can significantly improve the quality of intent detection.&lt;/p&gt;

&lt;p&gt;Conclusion:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldxp1vgvgytghn6kty50.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldxp1vgvgytghn6kty50.png" alt=" " width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This project was a practical way to combine speech recognition, intent understanding, local tool execution, and UI design into one application.&lt;/p&gt;

&lt;p&gt;Using Groq, OpenRouter, and Streamlit, I built a voice-controlled AI agent that can listen, understand, and act on user commands in a safe and structured way.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>python</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
