What if you could control your computer using just your voice — without needing a powerful GPU or heavy local models?
I built a Voice-Controlled AI Agent that:
- Understands speech 🎤
- Detects user intent 🧠
- Executes real actions like file creation, code generation, and summarization ⚡
And the best part?
👉 It works smoothly even on low-end systems (8GB RAM).
🎬 Demo
📽️ Watch the full demo here:
👉 https://drive.google.com/file/d/17Uvp72dDi82pAqEqbJ6pl3LaLphxwaGm/view?usp=sharing
👉 https://youtu.be/Pl3lwBoYruM
🌐 Live App
👉 https://localaiagent-twxulfwrigcagqtbecnomh.streamlit.app/
✨ Features

🎤 Audio Input
- Record directly from the microphone
- Upload audio files

🧠 Intent Classification
- Converts speech → structured JSON
- Accurately detects user commands

⚡ Core Actions
- `create_file` → Creates files safely
- `write_code` → Generates and saves code
- `summarize_text` → Summarizes content
- `general_chat` → Handles normal queries

🔒 Safe Execution
- All outputs are restricted to the `/output` directory
- Prevents accidental system modification
🏗️ System Architecture
Building AI systems locally with limited RAM is challenging. Here's how I solved it:
1. 🎙️ Speech-to-Text (STT)
- Local Mode: Uses `openai-whisper` (tiny model) → runs on CPU
- Fast Mode (Recommended): Uses Groq API (Whisper-large-v3) → extremely fast ⚡
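The two modes above can be captured in a tiny backend-selection helper. This is a minimal sketch (the function and backend names are illustrative, not the project's actual code), assuming the Groq path is preferred whenever an API key is present:

```python
import os

def choose_stt_backend() -> str:
    """Pick the transcription backend: the Groq API when a key is
    available, otherwise the local CPU Whisper tiny model.
    (Hypothetical helper; names are illustrative.)"""
    if os.environ.get("GROQ_API_KEY"):
        return "groq-whisper-large-v3"   # fast, API-based
    return "local-whisper-tiny"          # slower, fully offline
```

This keeps the rest of the pipeline unaware of which engine produced the transcript, so the app degrades gracefully when no key is configured.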
2. 🧠 LLM + Intent Engine
Running large models locally was not feasible:
- 8B models consume ~5GB of RAM ❌
- This causes severe system slowdown
👉 Solution:
- Used Groq API (Llama 3 - 8B / 70B)

Provides:
- Fast inference ⚡
- Structured JSON output
- Reliable intent classification
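Getting reliable structured JSON out of an LLM still needs a defensive parser, since a model can occasionally reply with plain prose. A minimal sketch of that step, assuming a fallback to `general_chat` when the reply isn't valid JSON (field names are assumptions, not the project's exact schema):

```python
import json

def parse_intent(llm_output: str) -> dict:
    """Parse the model's reply into a structured intent dict.
    Falls back to general_chat if the reply is not valid JSON
    or is missing the required 'action' field."""
    try:
        intent = json.loads(llm_output)
        if "action" not in intent:
            raise ValueError("missing 'action' field")
        return intent
    except (json.JSONDecodeError, ValueError):
        # Treat anything unparseable as a normal chat turn
        return {"action": "general_chat", "text": llm_output}

print(parse_intent('{"action": "create_file", "filename": "hello.py"}'))
# → {'action': 'create_file', 'filename': 'hello.py'}
```

The fallback means a malformed model reply never crashes the app; it just turns into a chat response.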
3. 🖥️ Frontend
- Built using Streamlit
- Uses `st.audio_input` for seamless recording
- Simple and clean UI
🔄 How It Works
- User speaks or uploads audio 🎤
- Whisper converts speech → text
- LLM processes text → structured JSON
- System executes action locally
Example:

```json
{
  "action": "create_file",
  "filename": "hello.py"
}
```
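Once the JSON arrives, executing the action is just a lookup-and-dispatch step. A sketch of that routing, with stand-in handlers (the handler bodies and names here are assumptions for illustration):

```python
def handle_intent(intent: dict) -> str:
    """Route a structured intent to its handler.
    Unknown actions fall back to general_chat."""
    handlers = {
        "create_file": lambda i: f"created {i['filename']}",
        "write_code": lambda i: "code written",
        "summarize_text": lambda i: "summary ready",
        "general_chat": lambda i: "chat reply",
    }
    action = intent.get("action", "general_chat")
    return handlers.get(action, handlers["general_chat"])(intent)

print(handle_intent({"action": "create_file", "filename": "hello.py"}))
# → created hello.py
```

Because actions are just dictionary entries, adding a new capability later (e.g. the email automation mentioned below) is a one-line change.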
💻 Example Use Case
🗣️ User says:
"Create a Python file called hello.py"
⚙️ System:
- Transcribes audio
- Detects `create_file` intent
- Creates the file in the `/output` folder
- Shows a success message
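The "restricted to `/output`" guarantee comes down to validating every requested path before writing. A minimal sketch of that sandboxing idea, assuming `pathlib` resolution (this is an illustration of the technique, not the project's exact code):

```python
from pathlib import Path

OUTPUT_DIR = Path("output").resolve()

def safe_output_path(filename: str) -> Path:
    """Resolve a requested filename inside the output directory and
    reject anything that escapes it (e.g. '../../etc/passwd')."""
    candidate = (OUTPUT_DIR / filename).resolve()
    if OUTPUT_DIR not in candidate.parents:
        raise ValueError(f"unsafe path: {filename}")
    return candidate
```

Resolving the path first and then checking its ancestry catches `..` traversal tricks that naive string prefix checks can miss.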
⚡ Setup Instructions
Prerequisites
- Python 3.10+
- Groq API Key → https://console.groq.com
- FFmpeg installed
Installation
```shell
git clone <your-repo-link>
cd local_ai_agent
pip install -r requirements.txt
```
Environment Setup
```shell
GROQ_API_KEY=your_api_key_here
```
Run the App
```shell
streamlit run app.py
```
⚠️ Challenges Faced
- Running LLMs on 8GB RAM
- Slow transcription using CPU Whisper
- Ensuring consistent JSON output from LLM
- Managing safe file execution
💡 Key Learnings
- Hybrid approach (local + API) is powerful
- Structured prompts = better automation
- UI simplicity improves usability massively
🔮 Future Improvements
- Add more actions (email automation, system control)
- Improve offline performance
- Add memory (conversation history)
- Multi-command execution
🔗 Links
- 💻 GitHub: https://github.com/Akash7367/Local_AI_Agent
- 🌐 Portfolio: https://portfolio-c2xg.vercel.app/?_vercel_share=v9vu4mbb0xIGMIHlCjfjGlcQPbiusSj5
- 🔗 LinkedIn: https://www.linkedin.com/in/akash-kumar-298113264/
🙌 Final Thoughts
This project shows that you don’t need expensive hardware to build powerful AI systems.
With the right architecture and smart trade-offs, even a mid-range laptop can run intelligent AI agents efficiently.
If you found this useful, feel free to ⭐ the repo or share your thoughts!