This project was built as part of an AI/ML assignment focused on
building real-world AI agents.
Introduction
As a third-year undergraduate interested in AI systems, I wanted to
explore how we can move beyond chat-based interfaces and build systems
that actually perform real actions.
In this project, I built a voice-controlled AI agent that takes audio
input, understands user intent, and executes tasks like file creation,
code generation, summarization, and general chat.
Problem Statement
Most AI systems today are limited to text-based interaction. Even voice
assistants often act as wrappers over chat models and do not perform
meaningful system-level actions.
The goal of this project was to build an agent that:
- accepts voice input
- understands the intent behind it
- executes real actions on the system safely
System Overview
Audio Input → Speech-to-Text → Intent Detection → Tool Execution → UI Output
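The stages above can be sketched as a chain of functions. This is an illustrative sketch, not the project's actual module layout; the function and field names here are assumptions.

```python
from dataclasses import dataclass

@dataclass
class PipelineResult:
    transcript: str
    intent: str
    action_output: str

def run_pipeline(audio_bytes, transcribe, detect_intent, execute):
    """Chain the pipeline stages; each stage is injected as a callable."""
    transcript = transcribe(audio_bytes)        # Speech-to-Text
    intent = detect_intent(transcript)          # Intent Detection
    output = execute(intent, transcript)        # Tool Execution
    return PipelineResult(transcript, intent, output)  # rendered by the UI
```

Injecting the stages as callables keeps each step (STT provider, LLM, tool runner) swappable and easy to test in isolation.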
Tech Stack
- Speech-to-Text: Groq API
- LLM: Ollama (local)
- UI: Streamlit
- Language: Python
Key Design Decisions
Speech-to-Text (Groq API)
Local STT models are computationally expensive. Using Groq provides fast
and reliable transcription.
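A minimal transcription call could look like the following. The helper signature is an assumption of this write-up; the commented real usage follows Groq's OpenAI-compatible audio API and requires a `GROQ_API_KEY`.

```python
def transcribe_file(client, path, model="whisper-large-v3"):
    """Send an audio file to an STT client with an OpenAI-style audio API."""
    with open(path, "rb") as f:
        result = client.audio.transcriptions.create(model=model, file=f)
    return result.text

# Real usage (requires the groq package and GROQ_API_KEY):
#   from groq import Groq
#   text = transcribe_file(Groq(), "recording.wav")
```

Passing the client in rather than constructing it inside the function keeps the code testable and makes it trivial to swap Groq for OpenAI's STT endpoint.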
LLM (Ollama)
Running the model locally ensures privacy and avoids API costs, though cloud models may offer lower latency.
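One way to get structured intent out of a local LLM is to prompt for JSON and parse defensively. This is a sketch under assumptions: the prompt wording, intent labels, and helper names are illustrative, and the commented usage assumes a running Ollama server with the Python client installed.

```python
import json

INTENT_PROMPT = """Classify the user's request into one of:
create_file, write_code, summarize, chat.
Reply with JSON only, e.g. {{"intent": "write_code"}}.

Request: {text}"""

VALID_INTENTS = {"create_file", "write_code", "summarize", "chat"}

def parse_intent(llm_reply):
    """Parse the model's JSON reply, falling back to 'chat' on bad output."""
    try:
        data = json.loads(llm_reply)
    except json.JSONDecodeError:
        return "chat"
    intent = data.get("intent") if isinstance(data, dict) else None
    return intent if intent in VALID_INTENTS else "chat"

# Real usage (assumes a local Ollama server and the ollama package):
#   import ollama
#   reply = ollama.chat(model="llama3",
#                       messages=[{"role": "user",
#                                  "content": INTENT_PROMPT.format(text=user_text)}])
#   intent = parse_intent(reply["message"]["content"])
```

Falling back to `chat` on malformed output means a flaky model reply degrades to a harmless conversation turn instead of an unintended file operation.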
Core Features
- Voice input (mic + file)
- Intent classification
- File creation and code generation
- Summarization and chat
- Human confirmation
- Session memory
- Safe execution in output folder
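The sandboxing idea behind the last feature can be sketched with path resolution: resolve the target and refuse anything that escapes `output/`. The helper name is an assumption, not the project's actual API.

```python
from pathlib import Path

OUTPUT_DIR = Path("output").resolve()

def safe_write(relative_path, content):
    """Write only inside output/; reject paths that escape the sandbox."""
    target = (OUTPUT_DIR / relative_path).resolve()
    if OUTPUT_DIR not in target.parents:
        raise ValueError(f"refusing to write outside {OUTPUT_DIR}: {relative_path}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return target
```

Resolving before checking is the important part: it defeats `../` traversal that a naive string-prefix check would miss.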
Challenges
- Local STT models were too heavy to run comfortably
- Getting reliably structured intent output from the LLM
- Executing file operations safely
- The latency vs. control trade-off between local and cloud models
Example Flow
User: "write a c++ code to find the max element from an array."
- Transcription
- Intent detection
- Confirmation
- File creation
- UI output
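The confirmation step in the flow above can be gated with a small helper before any tool runs. This is a sketch; the prompt wording and function name are illustrative, and the input function is injected so the gate can be driven by a console or a Streamlit button alike.

```python
def confirm(prompt_text, ask=input):
    """Ask the user before executing an action; only an explicit yes proceeds."""
    reply = ask(f"{prompt_text} [y/N] ").strip().lower()
    return reply in ("y", "yes")

# Example: gate file creation on user approval
# if confirm("Create output/max.cpp with the generated C++ code?"):
#     ...execute the tool...
```

Defaulting to "no" means an empty or ambiguous answer never triggers a file operation.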
Conclusion
This project demonstrates how AI systems can move beyond chat into
real-world action systems.
Project Link
Voice-Controlled Local AI Agent
This project implements the assignment from Mem0_ AI_ML & Generative AI Developer Intern Assignment.pdf: a voice-driven AI agent that accepts audio, transcribes speech, classifies the user's intent, safely executes local actions inside output/, and shows the full pipeline in a Streamlit UI.
Assignment status
Requirement-by-requirement status against the PDF:
- Audio input from microphone: satisfied
- Audio file upload: satisfied
- Speech-to-text: satisfied through OpenAI or Groq API-based STT
- Local or API STT note in README: satisfied
- Intent understanding with LLM: satisfied through Ollama, OpenAI, or Groq
- Minimum supported intents:
  - create file: satisfied
  - write code to new or existing file: satisfied
  - summarize text: satisfied
  - general chat: satisfied
- Tool execution for local file operations: satisfied
  - Create files or folders inside sandboxed output/: satisfied
  - Code generation saved directly to file: satisfied
- Text summarization: satisfied
- UI shows transcription: satisfied
- UI shows detected intent: satisfied
- UI shows action taken: satisfied
- …