Building Mini Gravity: A Local, Private Voice AI Agent

#agents #ai #llm #privacy

What if you could build a private, high-performance voice agent that runs entirely on your local machine, handles your documents, and generates code? That was the goal behind Mini Gravity. It is similar in concept to Google's Antigravity, but nothing leaves your machine. I quickly learned that while the model matters, the real "secret sauce" is the combination of robust logic definitions and the relentless pursuit of the perfect prompt.

The Architecture: A Three-Layer Pipeline
Mini Gravity is designed as a sequential pipeline that mirrors human interaction:

The Ear (STT Layer): We use Groq’s Whisper-large-v3 for sub-second transcription, fast enough that the exchange feels like a real conversation.
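For illustration, here is a minimal sketch of the STT call. It assumes the official `groq` Python SDK, whose `client.audio.transcriptions.create` takes a `(filename, bytes)` tuple and a model name, plus a `GROQ_API_KEY` environment variable. The `transcribe` helper and its optional `client` parameter are hypothetical, not taken from the Mini Gravity repo; injecting the client lets the function be tested without a network call. Note that this step sends audio to Groq's hosted API rather than processing it on-device.

```python
from pathlib import Path
from typing import Any, Optional


def transcribe(audio_path: str, client: Optional[Any] = None,
               model: str = "whisper-large-v3") -> str:
    """Send an audio file to Groq's Whisper endpoint and return the transcript text."""
    if client is None:
        # Imported lazily so the rest of the pipeline loads even without the SDK.
        from groq import Groq
        client = Groq()  # reads GROQ_API_KEY from the environment
    path = Path(audio_path)
    with path.open("rb") as f:
        result = client.audio.transcriptions.create(
            file=(path.name, f.read()),
            model=model,
        )
    # Strip stray whitespace so the intent layer receives clean text.
    return result.text.strip()
```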

The Brain (Intent Layer): DeepSeek-Coder-6.7B, served through Ollama's local REST API, classifies natural language into structured JSON intents.
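The intent step could look roughly like the sketch below, which uses only the standard library to call Ollama's `/api/generate` endpoint on its default port (11434). The system prompt, the intent names (`create_file`, `summarize_pdf`, `write_code`, `chat`), and the helper functions are assumptions for illustration; the project's actual prompt and schema may differ. Setting `"format": "json"` asks Ollama to constrain the reply to valid JSON, and `parse_intent` still validates it.

```python
import json
import urllib.request
from typing import Any, Dict

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port

# Hypothetical system prompt; the real project's prompt is not shown in the post.
SYSTEM_PROMPT = (
    "Classify the user's request into JSON with keys "
    '"intent" (one of: create_file, summarize_pdf, write_code, chat) '
    'and "args" (an object). Reply with JSON only.'
)


def build_payload(utterance: str, model: str = "deepseek-coder:6.7b") -> Dict[str, Any]:
    return {
        "model": model,
        "system": SYSTEM_PROMPT,
        "prompt": utterance,
        "format": "json",               # constrain the reply to valid JSON
        "stream": False,                # one response object, not a token stream
        "options": {"temperature": 0},  # deterministic classification
    }


def parse_intent(raw_body: bytes) -> Dict[str, Any]:
    # /api/generate wraps the model's text in the "response" field.
    text = json.loads(raw_body)["response"]
    intent = json.loads(text)
    if "intent" not in intent:
        raise ValueError(f"model returned no intent: {text!r}")
    intent.setdefault("args", {})
    return intent


def classify(utterance: str, timeout: float = 30.0) -> Dict[str, Any]:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(utterance)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # The timeout turns a hung model into an exception instead of a silent stall.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_intent(resp.read())
```

Keeping payload construction and parsing separate from `classify` leaves the network call thin, so both halves can be unit-tested without a running model.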

The Hands (Execution Layer): A Python engine that maps intents to system actions—parsing PDFs, managing files, and executing code.
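One common way to map intents to actions is a dispatch table. The sketch below assumes intents shaped like `{"intent": ..., "args": {...}}`; the registry, the `handler` decorator, and the two handlers shown are hypothetical stand-ins, not the contents of the project's executor.

```python
from pathlib import Path
from typing import Any, Callable, Dict

# Registry mapping intent names to handler functions.
HANDLERS: Dict[str, Callable[..., str]] = {}


def handler(name: str):
    """Decorator that registers a function as the handler for an intent."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        HANDLERS[name] = fn
        return fn
    return register


@handler("create_file")
def create_file(name: str, content: str = "") -> str:
    Path(name).write_text(content, encoding="utf-8")
    return f"created {name}"


@handler("list_files")
def list_files(directory: str = ".") -> str:
    return ", ".join(sorted(p.name for p in Path(directory).iterdir()))


def execute(intent: Dict[str, Any]) -> str:
    fn = HANDLERS.get(intent.get("intent", ""))
    if fn is None:
        # Unknown intents are reported instead of guessing an action.
        return f"unsupported intent: {intent.get('intent')!r}"
    return fn(**intent.get("args", {}))
```

Because the LLM only ever chooses a key in `HANDLERS`, it cannot trigger any action that was not explicitly registered.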

The Model Pivot: Llama 3.2 vs. DeepSeek-Coder
Initially, I opted for Llama 3.2, but its output was often "contaminated" with conversational filler, which is deadly for a system that saves raw .py files to disk. I pivoted to DeepSeek-Coder-6.7B and refined the instructions with surgical precision. The lesson I learned: regardless of model size, if your prompt isn’t airtight, you’ll get garbage out.
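A defensive layer helps regardless of which model is in use. As a sketch (not necessarily what Mini Gravity does), the hypothetical `extract_python` below keeps only the body of a Markdown code fence if one is present, then runs the result through Python's `ast.parse` so that leftover conversational text raises a `SyntaxError` instead of being written to a `.py` file.

```python
import ast
import re

# Matches a Markdown code fence (three backticks, optional language tag)
# and captures the code between the opening and closing fences.
FENCE_RE = re.compile(r"`{3}[\w+-]*\n(.*?)`{3}", re.DOTALL)


def extract_python(llm_output: str) -> str:
    """Return only the Python source from a model reply, or raise if it does not parse."""
    match = FENCE_RE.search(llm_output)
    code = match.group(1) if match else llm_output
    code = code.strip() + "\n"
    # ast.parse raises SyntaxError if conversational filler slipped through.
    ast.parse(code)
    return code
```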

The "Ah-ha!" Moment: The Power of Primitives
The biggest breakthrough came while building executor.py. As I wrote the primitives for every operation (PDF parsing, Excel extraction, file management), I realized that these simple, robust definitions are the true backbone of the agent. The LLM is the navigator, but the executor is the engine that keeps the user satisfied with predictable, high-speed logic.
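To make "simple, robust definitions" concrete, here is a sketch of what one file-management primitive might look like under a narrow contract: resolve the path inside a sandbox directory, do one thing, and return a structured result instead of raising. `WORKSPACE`, `safe_path`, `read_text`, and `Result` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from pathlib import Path

WORKSPACE = Path.cwd() / "workspace"  # hypothetical sandbox root


@dataclass
class Result:
    ok: bool
    message: str
    data: str = ""


def safe_path(name: str, root: Path = WORKSPACE) -> Path:
    """Resolve name inside root and refuse anything that escapes it (e.g. '../')."""
    root = root.resolve()
    target = (root / name).resolve()
    if target != root and root not in target.parents:
        raise PermissionError(f"{name!r} is outside the workspace")
    return target


def read_text(name: str, root: Path = WORKSPACE) -> Result:
    try:
        path = safe_path(name, root)
        return Result(True, f"read {path.name}", path.read_text(encoding="utf-8"))
    except (OSError, UnicodeDecodeError) as exc:
        # OSError also covers PermissionError and FileNotFoundError.
        # Failures become data the agent can report, not crashes.
        return Result(False, str(exc))
```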

Challenges Faced
Phonetic Drift: STT often mishears filenames (e.g., "Balagi" vs. "Balaji"). I implemented a context-aware system that "snaps" misheard filenames in voice commands to the documents currently in the user's active session.
The CLI Deadlock: Moving from local subprocess calls to a REST API architecture eliminated silent deadlocks and drastically improved response times.
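The filename snapping described under Phonetic Drift can be approximated with fuzzy string matching from the standard library. This sketch uses `difflib.get_close_matches` against the list of files in the active session; the function name and the 0.75 cutoff are assumptions, and the post does not say which matching technique the project actually uses.

```python
import difflib
from typing import List, Optional


def snap_filename(spoken: str, session_files: List[str], cutoff: float = 0.75) -> Optional[str]:
    """Map a possibly misheard filename to the closest file in the active session."""
    # Compare case-insensitively, keeping a lookup back to the original spelling.
    lookup = {f.lower(): f for f in session_files}
    matches = difflib.get_close_matches(spoken.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None
```

A higher cutoff reduces wrong snaps at the cost of rejecting more genuine mishearings; returning `None` lets the agent ask the user to confirm instead of guessing.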
What’s Next for Mini Gravity?
I’m now digging deeper into prompt tuning and exploring how to make the agent more proactive. The goal is to move beyond manual uploads toward deeper local system integration.

Check it out on GitHub: https://github.com/Sri-Balagi/Mini-gravity.git

Have you built a local AI agent lately? I'd love to hear about your own "Ah-ha!" moments in the comments below!