Vani Soni

VOICE CONTROLLED LOCAL AI AGENT

Hey Everyone!!
Just wanted to share a quick glimpse of my latest project, "VOICE CONTROLLED LOCAL AI AGENT".

I have always been deeply interested in AI and Machine Learning, especially how systems like ChatGPT, Claude, Copilot, etc., work behind the scenes. So, I decided to create something related to it.

βš™οΈHOW IT WORKS:-

  1. The user provides input - either in the form of AUDIO or TEXT.
  2. Audio input is converted into text using the SpeechRecognition library.
  3. The text is sent to a local LLM (Llama3 via Ollama) to detect user intent.
  4. Based on the detected intent, the agent performs one of these actions:
     * Creating a file
     * Generating Python code
     * Summarising text
     * General chat response
  5. The result is displayed through a Streamlit interface.
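
To make steps 3 and 4 more concrete, here is a minimal sketch of how intent detection and dispatch could look. It is not the project's actual code. It assumes Ollama is running on its default local port with the `llama3` model pulled, and the intent labels, the prompt wording, and the function names (`ask_llm`, `detect_intent`, `run_agent`) are all made up for illustration.

```python
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
INTENTS = ("create_file", "write_code", "summarize", "chat")  # hypothetical labels


def ask_llm(prompt, model="llama3"):
    """Send one non-streaming prompt to a local Ollama server and return its text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def detect_intent(text, llm=ask_llm):
    """Ask the LLM for a single label; fall back to 'chat' on anything unexpected."""
    prompt = (
        "Classify the request into exactly one of: " + ", ".join(INTENTS) + ".\n"
        "Reply with the label only.\n\nRequest: " + text
    )
    reply = llm(prompt).strip().lower()
    return reply if reply in INTENTS else "chat"


def run_agent(text, llm=ask_llm, workdir="output"):
    """Detect the intent, perform the matching action, return (intent, result)."""
    intent = detect_intent(text, llm)
    if intent == "create_file":
        path = Path(workdir) / "new_file.txt"  # placeholder file name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
        return intent, f"Created {path}"
    if intent == "write_code":
        return intent, llm("Write Python code for this request:\n" + text)
    if intent == "summarize":
        return intent, llm("Summarise the following text:\n" + text)
    return intent, llm(text)  # general chat response
```

Passing the LLM call in as the `llm` argument is a design choice for this sketch: it lets the dispatch logic be tried with a stand-in function before Ollama is involved. In a Streamlit app, the returned result could then be shown with something like `st.write(result)`.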

πŸ› οΈTECH STACK:-
Python
Streamlit
Pydub
SpeechRecognition
Llama3 via Ollama (LLM)

🚧 CHALLENGES FACED:-
* I initially implemented Whisper for speech-to-text but ran into compatibility issues: it works best with Python 3.10, while my environment was Python 3.14.
* I switched to SpeechRecognition + Pydub, which ran smoothly in my environment.
* Handling multiple audio formats meant converting files to WAV before processing.
* Getting accurate intent detection meant cleaning up the LLM's inconsistent outputs.
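
The last two challenges can be sketched roughly like this. Treat it as an illustration under assumptions, not the project's real code: the helper names (`to_wav`, `transcribe`, `clean_intent`) and the intent labels are hypothetical, Pydub needs ffmpeg installed for non-WAV input, and `recognize_google` is just one of the recognizers SpeechRecognition offers (it calls an online service, so the project may well use a different one).

```python
import re
from pathlib import Path

INTENTS = ("create_file", "write_code", "summarize", "chat")  # hypothetical labels


def to_wav(src_path):
    """Convert an audio file to WAV with Pydub and return the WAV path."""
    if Path(src_path).suffix.lower() == ".wav":
        return str(src_path)  # already WAV, nothing to convert
    from pydub import AudioSegment  # imported lazily; needs pydub and ffmpeg

    wav_path = str(Path(src_path).with_suffix(".wav"))
    AudioSegment.from_file(src_path).export(wav_path, format="wav")
    return wav_path


def transcribe(wav_path):
    """Turn a WAV file into text with the SpeechRecognition library."""
    import speech_recognition as sr  # imported lazily

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file
    return recognizer.recognize_google(audio)  # assumed backend, see note above


def clean_intent(raw):
    """Map a noisy LLM reply such as 'Intent: **Write_Code**.' to a known label."""
    # Lowercase and collapse every run of non-letters into one underscore,
    # so "Write Code", "write-code" and "**Write_Code**" all look the same.
    normalized = re.sub(r"[^a-z]+", "_", raw.lower()).strip("_")
    for intent in INTENTS:
        if intent in normalized:
            return intent
    return "chat"  # safe default when no label is recognised
```

For example, `clean_intent("Intent: **Write_Code**.")` returns `"write_code"`, and a reply with no recognisable label falls back to `"chat"`.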

This project helped me understand how AI agents actually function, from input processing → intent understanding → action execution.

Still improving it and planning to add more advanced features soon!!

Would love to hear your feedback!😊