Introduction
I built a Voice-Controlled AI Agent as part of a Mem0 internship
assignment. It accepts audio input, detects the user's intent,
and executes the matching action automatically.
How It Works
- The user uploads an audio file
- Groq Whisper converts the audio to text
- LLaMA 3.3 70B detects the intent
- A tool executes the matching action
- Results are shown in the Streamlit UI
Architecture
Audio Input
↓
Groq Whisper Large V3 (Speech to Text)
↓
LLaMA 3.3 70B (Intent Detection)
↓
Tool Execution
↓
Streamlit UI
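The flow above can be sketched as a small pipeline. This is an illustrative sketch, not the project's actual code: the function names and stub handlers are hypothetical, and the real app would plug in the Groq Whisper and LLaMA calls where the lambdas are.

```python
def run_pipeline(audio_path, transcribe, detect_intent, tools):
    """Voice-agent pipeline: audio -> text -> intent -> tool -> result."""
    text = transcribe(audio_path)      # Groq Whisper Large V3 in the real app
    intent = detect_intent(text)       # LLaMA 3.3 70B in the real app
    handler = tools.get(intent, tools["general_chat"])
    return handler(text)               # rendered in the Streamlit UI

# Stub components standing in for the real API calls:
tools = {
    "summarize": lambda t: "Summary: " + t[:20],
    "general_chat": lambda t: "Chat reply to: " + t,
}
result = run_pipeline(
    "hello.wav",
    transcribe=lambda path: "summarize this report",
    detect_intent=lambda text: "summarize",
    tools=tools,
)
print(result)  # Summary: summarize this repor
```

Keeping the stages injectable like this also makes each step easy to test in isolation.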
Models Used
Speech to Text: Groq Whisper Large V3
- Fast and accurate transcription
- Supports WAV and MP3 formats
- Free API available on Groq
Intent Detection: LLaMA 3.3 70B Versatile
- Classifies user intent accurately
- Fast response time
- Free on Groq API
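For intent detection, the classification can be done with a single chat completion against llama-3.3-70b-versatile. The sketch below is my reconstruction, not the project's exact code: the prompt wording and the normalize_intent helper are assumptions, though the Groq SDK call shape (client.chat.completions.create) matches Groq's documented Python API. The SDK import is deferred so the helper can be used without the package installed.

```python
VALID_INTENTS = {"create_file", "write_code", "summarize", "general_chat"}

def normalize_intent(raw_reply):
    """Map the model's free-form reply onto a known intent label."""
    label = raw_reply.strip().strip(".").lower()
    return label if label in VALID_INTENTS else "general_chat"

def detect_intent(user_text):
    """Ask LLaMA 3.3 70B on Groq to classify a request (needs GROQ_API_KEY)."""
    from groq import Groq  # lazy import: module loads even without the SDK
    client = Groq()        # reads GROQ_API_KEY from the environment
    resp = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system",
             "content": "Classify the request as exactly one of: "
                        + ", ".join(sorted(VALID_INTENTS))
                        + ". Reply with the label only."},
            {"role": "user", "content": user_text},
        ],
    )
    return normalize_intent(resp.choices[0].message.content)
```

Normalizing the reply matters because the model sometimes adds punctuation or capitalization around the label.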
Supported Intents
- create_file: Creates a new file automatically
- write_code: Generates and saves Python code
- summarize: Summarizes given text
- general_chat: General conversation
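Once an intent label comes back, a dispatch table maps it to a tool. The handlers below are simplified stand-ins I wrote for illustration (e.g. the one-line "summary"), not the project's real tools; the dispatch pattern itself is the point.

```python
def create_file(args_text):
    """create_file tool: create an empty file with the requested name."""
    name = args_text.strip() or "untitled.txt"
    with open(name, "w", encoding="utf-8") as f:
        f.write("")
    return f"Created {name}"

def summarize(text):
    """summarize tool: naive first-sentence summary as a placeholder."""
    return text.split(".")[0] + "."

TOOLS = {
    "create_file": create_file,
    "summarize": summarize,
    "general_chat": lambda text: "Let's chat: " + text,
}

def execute(intent, payload):
    """Dispatch an intent to its tool; unknown intents fall back to chat."""
    return TOOLS.get(intent, TOOLS["general_chat"])(payload)
```

Falling back to general_chat keeps the agent responsive even when the classifier returns something unexpected.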
Tech Stack
- Python 3.x
- Streamlit for UI
- Groq API for STT and LLM
- Python-dotenv for API key management
Challenges I Faced
Challenge 1: Model Decommissioned
- The llama3-8b-8192 model was decommissioned and stopped working mid-project
- Solution: Switched to llama-3.3-70b-versatile
Challenge 2: Python PATH Issue on Windows
- The python command was not recognized in the terminal
- Solution: Used the py launcher instead, e.g. py -m pip install
Challenge 3: GitHub Authentication Error
- Permission denied during git push
- Solution: Authenticated with a Personal Access Token
How to Run This Project
Step 1: Clone the repository
git clone https://github.com/prakashkumarmahato807/voice-agent
Step 2: Install dependencies
pip install streamlit groq python-dotenv
Step 3: Create a .env file and add your Groq API key
GROQ_API_KEY=your_key_here
Step 4: Run the app
streamlit run app.py
GitHub Repository
https://github.com/prakashkumarmahato807/voice-agent
Conclusion
I successfully built a working voice agent that processes
audio commands and executes actions automatically.
This project taught me about speech-to-text,
intent detection, and building AI-powered web applications.