Introduction
I built a Voice-Controlled AI Agent as part of a Mem0 internship
assignment. It accepts audio input, detects the user's intent,
and executes the matching action automatically.
How It Works
- The user uploads an audio file
- Groq Whisper converts the audio to text
- LLaMA 3.3 70B detects the intent
- A tool executes the matching action
- Results are shown in the Streamlit UI
Architecture
Audio Input
↓
Groq Whisper Large V3 (Speech to Text)
↓
LLaMA 3.3 70B (Intent Detection)
↓
Tool Execution
↓
Streamlit UI
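The flow above can be sketched as a small pipeline. This is an illustrative sketch, not the project's actual code: the function names and stub handlers are hypothetical, and the real app would plug in the Groq Whisper and LLaMA calls where the lambdas are.

```python
def run_pipeline(audio_path, transcribe, detect_intent, tools):
    """Voice-agent pipeline: audio -> text -> intent -> tool -> result."""
    text = transcribe(audio_path)      # Groq Whisper Large V3 in the real app
    intent = detect_intent(text)       # LLaMA 3.3 70B in the real app
    handler = tools.get(intent, tools["general_chat"])
    return handler(text)               # rendered in the Streamlit UI

# Stub components standing in for the real API calls:
tools = {
    "summarize": lambda t: "Summary: " + t[:20],
    "general_chat": lambda t: "Chat reply to: " + t,
}
result = run_pipeline(
    "hello.wav",
    transcribe=lambda path: "summarize this report",
    detect_intent=lambda text: "summarize",
    tools=tools,
)
print(result)  # Summary: summarize this repor
```

Keeping the stages injectable like this also makes each step easy to test in isolation.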
Models Used
Speech to Text: Groq Whisper Large V3
- Fast and accurate transcription
- Supports WAV and MP3 formats
- Free API available on Groq
Intent Detection: LLaMA 3.3 70B Versatile
- Classifies user intent accurately
- Fast response time
- Free on Groq API
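For intent detection, the classification can be done with a single chat completion against llama-3.3-70b-versatile. The sketch below is my reconstruction, not the project's exact code: the prompt wording and the normalize_intent helper are assumptions, though the Groq SDK call shape (client.chat.completions.create) matches Groq's documented Python API. The SDK import is deferred so the helper can be used without the package installed.

```python
VALID_INTENTS = {"create_file", "write_code", "summarize", "general_chat"}

def normalize_intent(raw_reply):
    """Map the model's free-form reply onto a known intent label."""
    label = raw_reply.strip().strip(".").lower()
    return label if label in VALID_INTENTS else "general_chat"

def detect_intent(user_text):
    """Ask LLaMA 3.3 70B on Groq to classify a request (needs GROQ_API_KEY)."""
    from groq import Groq  # lazy import: module loads even without the SDK
    client = Groq()        # reads GROQ_API_KEY from the environment
    resp = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system",
             "content": "Classify the request as exactly one of: "
                        + ", ".join(sorted(VALID_INTENTS))
                        + ". Reply with the label only."},
            {"role": "user", "content": user_text},
        ],
    )
    return normalize_intent(resp.choices[0].message.content)
```

Normalizing the reply matters because the model sometimes adds punctuation or capitalization around the label.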
Supported Intents
- create_file: Creates a new file automatically
- write_code: Generates and saves Python code
- summarize: Summarizes given text
- general_chat: General conversation
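Once an intent label comes back, a dispatch table maps it to a tool. The handlers below are simplified stand-ins I wrote for illustration (e.g. the one-line "summary"), not the project's real tools; the dispatch pattern itself is the point.

```python
def create_file(args_text):
    """create_file tool: create an empty file with the requested name."""
    name = args_text.strip() or "untitled.txt"
    with open(name, "w", encoding="utf-8") as f:
        f.write("")
    return f"Created {name}"

def summarize(text):
    """summarize tool: naive first-sentence summary as a placeholder."""
    return text.split(".")[0] + "."

TOOLS = {
    "create_file": create_file,
    "summarize": summarize,
    "general_chat": lambda text: "Let's chat: " + text,
}

def execute(intent, payload):
    """Dispatch an intent to its tool; unknown intents fall back to chat."""
    return TOOLS.get(intent, TOOLS["general_chat"])(payload)
```

Falling back to general_chat keeps the agent responsive even when the classifier returns something unexpected.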
Tech Stack
- Python 3.x
- Streamlit for UI
- Groq API for STT and LLM
- Python-dotenv for API key management
Challenges I Faced
Challenge 1: Model Decommissioned
- The llama3-8b-8192 model was decommissioned and stopped working mid-project
- Solution: Switched to llama-3.3-70b-versatile
Challenge 2: Python PATH Issue on Windows
- The python command was not recognized in the terminal
- Solution: Used the py launcher instead, e.g. py -m pip install
Challenge 3: GitHub Authentication Error
- Permission denied during git push
- Solution: Authenticated with a Personal Access Token
How to Run This Project
Step 1: Clone the repository
git clone https://github.com/prakashkumarmahato807/voice-agent
Step 2: Install dependencies
pip install streamlit groq python-dotenv
Step 3: Create a .env file and add your Groq API key
GROQ_API_KEY=your_key_here
Step 4: Run the app
streamlit run app.py
GitHub Repository
https://github.com/prakashkumarmahato807/voice-agent
Conclusion
I successfully built a working voice agent that processes
audio commands and executes actions automatically.
This project taught me about speech-to-text,
intent detection, and building AI-powered web applications.