<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Monika Pralayakaveri</title>
    <description>The latest articles on DEV Community by Monika Pralayakaveri (@monika_joestar).</description>
    <link>https://dev.to/monika_joestar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3875519%2Fd1db1dc1-757e-45d6-894e-8c49d76bb775.jpeg</url>
      <title>DEV Community: Monika Pralayakaveri</title>
      <link>https://dev.to/monika_joestar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/monika_joestar"/>
    <language>en</language>
    <item>
      <title>Building a Context-Aware Local Voice Agent without the Bloat</title>
      <dc:creator>Monika Pralayakaveri</dc:creator>
      <pubDate>Mon, 13 Apr 2026 14:59:28 +0000</pubDate>
      <link>https://dev.to/monika_joestar/building-a-context-aware-local-voice-agent-without-the-bloat-1g23</link>
      <guid>https://dev.to/monika_joestar/building-a-context-aware-local-voice-agent-without-the-bloat-1g23</guid>
      <description>

&lt;h2&gt;
  
  
  🔗 Project Links &amp;amp; Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Repository&lt;/strong&gt;: &lt;a href="https://github.com/MonikaPralayakaveri/Assignment_Mem0/tree/main" rel="noopener noreferrer"&gt;MonikaPralayakaveri/Assignment_Mem0&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Demo Stack&lt;/strong&gt;: Python 3.10+, Streamlit, Groq SDK.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This post is organized as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;Architecture: The Lean Three-Layer Design&lt;/li&gt;
&lt;li&gt;Model Selection&lt;/li&gt;
&lt;li&gt;The Speech-to-Text (STT) and Text-to-Speech (TTS) Flow&lt;/li&gt;
&lt;li&gt;Bypassing LangChain for JSON-Array Routing&lt;/li&gt;
&lt;li&gt;The Safety Sandbox (Directory Isolation)&lt;/li&gt;
&lt;li&gt;Challenges faced and resolved&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  1. Introduction
&lt;/h2&gt;

&lt;p&gt;Building AI agents has become heavily tied to massive orchestration libraries like LangChain, LlamaIndex, AutoGen, or CrewAI.&lt;/p&gt;

&lt;p&gt;While these frameworks are incredible for enterprise scalability, they often introduce unnecessary latency, complex abstractions, and steep learning curves for simpler tasks.&lt;/p&gt;

&lt;p&gt;For this project (an assignment), the goal was to build a local, voice-controlled AI assistant capable of manipulating the local file system (creating, reading, rewriting, and deleting files) based on spoken intents such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a file.&lt;/li&gt;
&lt;li&gt;Write code to a new or existing file.&lt;/li&gt;
&lt;li&gt;Summarize text.&lt;/li&gt;
&lt;li&gt;General chat.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of wrapping it in external libraries, we stripped it down to raw LLM reasoning and built it from scratch in plain Python. Here is a breakdown of the architecture, the models deployed, and the unique challenges faced.&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Architecture: The Lean Three-Layer Design
&lt;/h2&gt;

&lt;p&gt;The system operates linearly over three distinct layers:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7q1pxhvjujfmlgm0g4l1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7q1pxhvjujfmlgm0g4l1.png" alt="Architecture Flow"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  1. The Interface Layer:
&lt;/h4&gt;

&lt;p&gt;A Streamlit web UI handling chat logic, microphone audio capture, and file uploads.&lt;/p&gt;
&lt;h4&gt;
  
  
  2. The LLM Intent Router (&lt;code&gt;agent.py&lt;/code&gt;):
&lt;/h4&gt;

&lt;p&gt;A pure Python execution loop that takes a strict JSON schema prompt and categorizes user requests into mapped intent lists.&lt;/p&gt;
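&lt;p&gt;A minimal sketch of that routing step (the &lt;code&gt;parse_actions&lt;/code&gt; helper name and fallback behavior are my own assumptions, not the repo's exact code): the router parses the model's reply against the expected schema and falls back to plain chat if parsing fails.&lt;/p&gt;

```python
import json

def parse_actions(llm_reply: str) -> list:
    # Strictly parse the expected {"actions": [...]} schema;
    # anything malformed is treated as ordinary chat instead of crashing.
    try:
        actions = json.loads(llm_reply)["actions"]
        if isinstance(actions, list):
            return actions
    except (json.JSONDecodeError, KeyError, TypeError):
        pass
    return [{"intent": "general_chat", "text": llm_reply}]
```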
&lt;h4&gt;
  
  
  3. The Tool Executors (&lt;code&gt;tools.py&lt;/code&gt; &amp;amp; &lt;code&gt;audio.py&lt;/code&gt;):
&lt;/h4&gt;

&lt;p&gt;Hardened Python functions whose logic isolates all file operations to a safe &lt;code&gt;.output/&lt;/code&gt; sandbox.&lt;/p&gt;
&lt;h2&gt;
  
  
  3. Model Selection
&lt;/h2&gt;

&lt;p&gt;I relied entirely on the Groq LPU Engine for this deployment:&lt;/p&gt;
&lt;h4&gt;
  
  
  Reasoning / Routing: &lt;code&gt;llama-3.3-70b-versatile&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;I needed a model that would reliably preserve the JSON output structure. The 70-billion-parameter model proved far more dependable at following the schema, even for compound tool queries, whereas an 8B model occasionally hallucinated parameters.&lt;/p&gt;
&lt;h4&gt;
  
  
  Transcription: &lt;code&gt;whisper-large-v3-turbo&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;Extremely resilient to different mic hardware setups and exceptionally fast.&lt;/p&gt;
&lt;h2&gt;
  
  
  4. The Speech-to-Text (STT) and Text-to-Speech (TTS) Flow
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Used Streamlit’s native &lt;code&gt;st.audio_input&lt;/code&gt; to capture &lt;code&gt;.wav&lt;/code&gt; buffers directly from the user, rather than using complex WebRTC/WebSocket streams like LiveKit.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  STT:
&lt;/h3&gt;

&lt;p&gt;These bytes go through Groq’s hosted &lt;code&gt;whisper-large-v3-turbo&lt;/code&gt; model.&lt;br&gt;
The result is near-instantaneous transcription without consuming heavy local GPU resources.&lt;/p&gt;
&lt;h3&gt;
  
  
  TTS:
&lt;/h3&gt;

&lt;p&gt;Once the LLM generates a response or action confirmation, we pipe the output string through &lt;code&gt;gTTS&lt;/code&gt; (Google Text-to-Speech) and dynamically update a Streamlit &lt;code&gt;st.audio&lt;/code&gt; component in the UI to read the status back to the user.&lt;/p&gt;
&lt;h2&gt;
  
  
  5. Bypassing LangChain for JSON-Array Routing
&lt;/h2&gt;

&lt;p&gt;Most AI developers reach for LangChain's &lt;code&gt;@tool&lt;/code&gt; decorator for this kind of routing.&lt;/p&gt;

&lt;p&gt;However, for a set of 7 localized tools, we found that simply prompting a highly capable model to return an array of JSON action objects was significantly faster.&lt;/p&gt;

&lt;p&gt;Because the system allows "Compound Commands" (e.g. "Create a file called index.html and then summarize this text"), we instructed the LLM to return an array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"intent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"create_file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"filename"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"index.html"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"intent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"summarize_text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text_to_summarize"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Sequential&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;independent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;execution&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By isolating the execution loop in native Python if/elif statements, we achieved reliable multi-step orchestration without the latency of an external framework.&lt;/p&gt;
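&lt;p&gt;That loop can be sketched roughly as follows; the handler functions here are hypothetical stand-ins for the real tools in &lt;code&gt;tools.py&lt;/code&gt;, not the actual implementations.&lt;/p&gt;

```python
import json

def create_file(filename):
    # Hypothetical stand-in for the real tool in tools.py
    return f"created {filename}"

def summarize_text(text):
    # Hypothetical stand-in for the real summarizer
    return f"summary of {len(text)} chars"

def run_actions(llm_reply: str) -> list:
    # Execute each action object in order using plain if/elif dispatch
    results = []
    for action in json.loads(llm_reply)["actions"]:
        intent = action["intent"]
        if intent == "create_file":
            results.append(create_file(action["filename"]))
        elif intent == "summarize_text":
            results.append(summarize_text(action["text_to_summarize"]))
        else:
            results.append(f"unknown intent: {intent}")
    return results
```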

&lt;h2&gt;
  
  
  6. The Safety Sandbox (Directory Isolation)
&lt;/h2&gt;

&lt;p&gt;I used Directory Isolation to prevent accidental deletions or modifications of my core project files.&lt;/p&gt;

&lt;p&gt;Giving an AI agent access to my terminal is like giving a toddler a chainsaw. To prevent the agent from accidentally deleting System32 or messing with my source code, I built a restricted execution environment.&lt;/p&gt;
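&lt;p&gt;One common way to enforce such directory isolation (a sketch under my own assumptions, not the repo's exact code) is to resolve every requested filename against the sandbox root and reject anything that escapes it:&lt;/p&gt;

```python
from pathlib import Path

SANDBOX = Path("output").resolve()  # hypothetical sandbox root

def safe_path(filename: str) -> Path:
    # Resolve the requested name and refuse anything outside the sandbox,
    # which blocks traversal tricks like "../../etc/passwd".
    candidate = (SANDBOX / filename).resolve()
    if candidate != SANDBOX and SANDBOX not in candidate.parents:
        raise ValueError(f"path escapes sandbox: {filename}")
    return candidate
```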

&lt;h2&gt;
  
  
  7. Challenges faced and resolved
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Challenge 1: The Hallucination of File Extensions (Fuzzy Context)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  The Problem:
&lt;/h4&gt;

&lt;p&gt;When a user says "Delete the dummy file," LLMs often hallucinate extensions, confidently guessing &lt;code&gt;dummy.py&lt;/code&gt; even if the file is just named &lt;code&gt;dummy&lt;/code&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Solution:
&lt;/h4&gt;

&lt;p&gt;I implemented Dynamic Context Injection: the backend silently crawls the &lt;code&gt;.output/&lt;/code&gt; sandbox folder and injects the actual file list into the hidden system prompt. This gives the agent full visibility of the directory, so the LLM can string-match the user's spoken phrasing directly to real files on disk.&lt;/p&gt;
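&lt;p&gt;The injection step can be sketched as follows; the function name and prompt wording are my own hypothetical choices, not taken from the repo.&lt;/p&gt;

```python
import os

def build_system_prompt(base_prompt: str, sandbox_dir: str = ".output") -> str:
    # Crawl the sandbox and append the real file list to the system prompt,
    # so the model matches actual names instead of guessing extensions.
    try:
        files = sorted(os.listdir(sandbox_dir))
    except FileNotFoundError:
        files = []
    listing = ", ".join(files) if files else "(no files yet)"
    return base_prompt + "\n\nFiles currently on disk: " + listing
```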

&lt;h3&gt;
  
  
  Challenge 2: Human in the Loop (Destructive Interception)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  The Problem:
&lt;/h4&gt;

&lt;p&gt;Granting an AI direct permission to rename or delete files is inherently reckless and dangerous for local file systems.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Solution:
&lt;/h4&gt;

&lt;p&gt;I decoupled the intent engine from immediate execution by intercepting "dangerous" actions (like &lt;code&gt;delete_file&lt;/code&gt;) and returning a pending_action payload. Instead of silently executing, the UI surfaces the &lt;code&gt;JSON&lt;/code&gt; plan and requires the user to click ✅ Approve or ❌ Reject, ensuring full transparency before any changes touch the disk.&lt;/p&gt;
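&lt;p&gt;In sketch form (the intent set beyond &lt;code&gt;delete_file&lt;/code&gt; and the payload field names are assumptions on my part):&lt;/p&gt;

```python
DESTRUCTIVE_INTENTS = {"delete_file", "rename_file"}

def execute(action: dict) -> str:
    # Placeholder executor standing in for the real tool dispatch
    return f"ran {action['intent']}"

def route(action: dict) -> dict:
    # Destructive intents are intercepted and deferred for human approval;
    # everything else executes immediately.
    if action["intent"] in DESTRUCTIVE_INTENTS:
        return {"status": "pending_action", "plan": action}
    return {"status": "executed", "result": execute(action)}
```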

&lt;h2&gt;
  
  
  8. Conclusion
&lt;/h2&gt;

&lt;p&gt;By refusing to rely on bloated frameworks for internal routing, this architecture showed that pure context injection and strict JSON prompts are more than sufficient for handling localized, secure, multimodal tasks.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
