DEV Community

prakashkumarmahato807
prakashkumarmahato807

Posted on

Building a Voice Controlled AI Agent with Groq and Streamlit

Introduction

I built a Voice Controlled AI Agent for Mem0 internship
assignment. It accepts audio input, detects user intent,
and executes actions automatically.

How It Works

  1. User uploads an audio file
  2. Groq Whisper converts audio to text
  3. LLaMA 3.3 70B detects the intent
  4. Tools execute the action
  5. Results shown in Streamlit UI

Architecture

Audio Input

Groq Whisper Large V3 (Speech to Text)

LLaMA 3.3 70B (Intent Detection)

Tool Execution

Streamlit UI

Models Used

Speech to Text: Groq Whisper Large V3

  • Fast and accurate transcription
  • Supports WAV and MP3 formats
  • Free API available on Groq

Intent Detection: LLaMA 3.3 70B Versatile

  • Classifies user intent accurately
  • Fast response time
  • Free on Groq API

Supported Intents

  • create_file: Creates a new file automatically
  • write_code: Generates and saves Python code
  • summarize: Summarizes given text
  • general_chat: General conversation

Tech Stack

  • Python 3.x
  • Streamlit for UI
  • Groq API for STT and LLM
  • Python-dotenv for API key management

Challenges I Faced

Challenge 1: Model Decommissioned

  • llama3-8b-8192 model stopped working
  • Solution: Switched to llama-3.3-70b-versatile

Challenge 2: Python PATH Issue on Windows

  • python command not recognized
  • Solution: Used py -m pip install commands

Challenge 3: GitHub Authentication Error

  • Permission denied during git push
  • Solution: Used Personal Access Token

How to Run This Project

Step 1: Clone the repository
Step 2: Install dependencies
pip install streamlit groq python-dotenv

Step 3: Create .env file and add your key
GROQ_API_KEY=your_key_here

Step 4: Run the app
streamlit run app.py

GitHub Repository

https://github.com/prakashkumarmahato807/voice-agent

Conclusion

Successfully built a working voice agent that processes
audio commands and executes actions automatically.
This project taught me about speech to text,
intent detection, and AI powered web applications.

Top comments (0)