<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Waqar Akhtar</title>
    <description>The latest articles on DEV Community by Waqar Akhtar (@waqar_akhtar_f4a1df2340f1).</description>
    <link>https://dev.to/waqar_akhtar_f4a1df2340f1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3437221%2F16b10a65-aa94-4f7f-838b-3281afdf4dc1.jpg</url>
      <title>DEV Community: Waqar Akhtar</title>
      <link>https://dev.to/waqar_akhtar_f4a1df2340f1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/waqar_akhtar_f4a1df2340f1"/>
    <language>en</language>
    <item>
      <title>Escaping the Stateless Trap: Building a Context-Aware Support Agent</title>
      <dc:creator>Waqar Akhtar</dc:creator>
      <pubDate>Sun, 12 Apr 2026 17:47:11 +0000</pubDate>
      <link>https://dev.to/waqar_akhtar_f4a1df2340f1/escaping-the-stateless-trap-building-a-context-aware-support-agent-2aad</link>
      <guid>https://dev.to/waqar_akhtar_f4a1df2340f1/escaping-the-stateless-trap-building-a-context-aware-support-agent-2aad</guid>
      <description>&lt;p&gt;The hardest part about building an automated support system isn't generating human-like text, but getting the system to actually remember the customer. I was tired of prompt engineering and started looking for a better way to help my &lt;a href="https://vectorize.io/what-is-agent-memory" rel="noopener noreferrer"&gt;agent remember&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Some time ago, I set out to build IRIS (Intelligent Recall &amp;amp; Issue Support). I didn't want to build just another standalone chatbot. Instead, I built a multi-tenant API service that any e-commerce business can plug into their existing tools to turn their rigid support systems into context-aware agents. Most of the off-the-shelf support bots I had evaluated suffered from the same fatal flaw: they treated every interaction like a first date. No matter how many times a customer complained about a delayed package, the bot would gleefully ask for their order number again. It was infuriating. I needed a way to give my &lt;a href="https://vectorize.io/what-is-agent-memory" rel="noopener noreferrer"&gt;agent memory&lt;/a&gt; so it could retain context across sessions, rather than just within a single chat window.&lt;/p&gt;

&lt;h2&gt;
  
  
  What IRIS Does and How It Hangs Together
&lt;/h2&gt;

&lt;p&gt;IRIS is fundamentally a fast, stateful API layer built on FastAPI that sits between the customer chat widget, our order management systems (OMS), and an LLM. It routes messages, handles multi-tenant authentication, and most importantly, manages long-term state.&lt;/p&gt;

&lt;p&gt;The business goal was an integration-first approach. E-commerce brands don't want another siloed dashboard. They want a smart layer that quietly sits between their existing helpdesks and their Shopify backends. By building this as a headless API, a platform can offer IRIS to hundreds of different brands simultaneously, keeping each brand's customer data, tone of voice, and order history strictly isolated.&lt;/p&gt;

&lt;p&gt;The system is designed around a three-pillar architecture:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;The LLM Engine:&lt;/strong&gt; We use Groq-hosted &lt;code&gt;llama3-70b-8192&lt;/code&gt; for incredibly fast turnaround times. Speed is a feature when you are doing multiple internal validation passes before responding to a user.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Integration Layer:&lt;/strong&gt; A set of connectors (Shopify, REST APIs) that actively fetch live order states so the LLM doesn't hallucinate shipment statuses.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Memory Layer:&lt;/strong&gt; Instead of cramming entire chat transcripts into a vector database or hoping a 1M token context window solves all my problems, I decided to try &lt;a href="https://github.com/vectorize-io/hindsight" rel="noopener noreferrer"&gt;Hindsight&lt;/a&gt; for agent memory.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The core flow is simple in concept but tricky in execution. When a message comes in, the backend immediately queries the memory layer to pull the customer's historical profile and recent interactions. Simultaneously, it fetches their active orders from the OMS. It also checks a global "incident" stream to see if this customer's issue (e.g., "missing package") matches a spike in similar complaints across the tenant. All this context is assembled into a dense system prompt, fed to the LLM, and the response is parsed. If the LLM decides an action is needed (like issuing a refund), it outputs a structured JSON block, which the backend intercepts, strips from the user-facing text, and executes. Finally, the interaction is written back to memory.&lt;/p&gt;
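&lt;p&gt;The context-assembly step above can be sketched as a plain function. The section headings and field layout here are illustrative assumptions; the article doesn't show the real prompt format:&lt;/p&gt;

```python
# Hypothetical sketch of assembling the dense system prompt from the
# recalled profile, history, OMS data, and any active incident.
def build_system_prompt(profile: str, history: list, orders: list, incident: str = "") -> str:
    sections = [
        "You are a support agent for this store. Use the context below.",
        "## Customer profile\n" + profile,
        "## Recent interactions\n" + "\n".join(history),
        "## Active orders\n" + "\n".join(
            f"Order {o['id']}: {o['status']}" for o in orders
        ),
    ]
    if incident:
        # A known tenant-wide incident short-circuits per-user troubleshooting.
        sections.append("## Known incident\n" + incident)
    sections.append(
        "If an action is required, append a fenced JSON block "
        'like {"action": "initiate_refund", "order_id": "..."} after your reply.'
    )
    return "\n\n".join(sections)

prompt = build_system_prompt(
    profile="Prefers concise replies; two prior complaints about shipping.",
    history=["Customer: Where is my order?", "Agent: It ships Monday."],
    orders=[{"id": "12345", "status": "replacement shipped via UPS"}],
    incident="Checkout errors affecting all users since 09:00 UTC",
)
```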

&lt;h2&gt;
  
  
  The Core Technical Story: Segregating Memory
&lt;/h2&gt;

&lt;p&gt;The most interesting technical challenge wasn't the LLM integration. Calling an API is trivial. The challenge was structuring the memory so the agent could be both highly personalized to the individual and broadly aware of systemic issues. &lt;/p&gt;

&lt;p&gt;Early on, I realized that dumping all interactions into a single vectorized bucket per tenant was a disaster. The agent would get confused, occasionally cross-referencing complaints from User A when talking to User B. I needed strict boundaries.&lt;/p&gt;

&lt;p&gt;I came across &lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;Hindsight agent memory&lt;/a&gt; and decided to give it a try because it allowed me to strictly segregate state into distinct "banks." &lt;/p&gt;

&lt;p&gt;We split the memory architecture into two distinct layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Per-Customer Banks:&lt;/strong&gt; A localized storage area specific to a single user ID. This stores their communication style, previous complaints, and preferences. &lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Global Pattern Banks:&lt;/strong&gt; A tenant-wide storage area that tracks issue types. If 50 people suddenly report a "warehouse delay," we don't want the bot asking the 51st person to clear their cache. We want it to acknowledge the known outage immediately.&lt;/li&gt;
&lt;/ol&gt;
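&lt;p&gt;In code, the two layers reduce to a strict bank-naming scheme. The per-customer pattern matches the &lt;code&gt;bank_id&lt;/code&gt; f-strings used later in this post; the &lt;code&gt;_global_issues&lt;/code&gt; suffix is an assumption for illustration:&lt;/p&gt;

```python
# Bank-naming helpers for the two memory layers. Routing every read and
# write through these is what prevents User A's complaints from ever
# surfacing in User B's session.
def customer_bank(tenant_id: str, user_id: str) -> str:
    # One isolated bank per (tenant, customer) pair.
    return f"{tenant_id}_user_{user_id}"

def global_pattern_bank(tenant_id: str) -> str:
    # One tenant-wide bank for systemic issue patterns (suffix assumed).
    return f"{tenant_id}_global_issues"
```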

&lt;p&gt;This segregation completely changed how the agent behaved. It moved from being a reactive text generator to a proactive support system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code-Backed Explanations
&lt;/h2&gt;

&lt;p&gt;Here is how we handle the memory retention and pattern detection in code. When a user sends a message, we first classify the intent and log it globally if it's an issue.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# backend/agent.py (simplified)
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Detect if this is a systemic issue
&lt;/span&gt;    &lt;span class="n"&gt;issue_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detect_issue_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;issue_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;report_to_global_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;issue_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Check if we are currently in an active incident for this issue
&lt;/span&gt;        &lt;span class="n"&gt;active_incident&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;check_active_incidents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;issue_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;active_incident&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Short-circuit standard troubleshooting
&lt;/span&gt;            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;incident_alert&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Known issue: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;active_incident&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Recall personal history
&lt;/span&gt;    &lt;span class="n"&gt;customer_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;bank_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_user_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Generate a quick reflection on the customer's state
&lt;/span&gt;    &lt;span class="n"&gt;profile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reflect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;bank_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_user_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generate_llm_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;customer_history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;memory_client.reflect&lt;/code&gt; call is particularly powerful. Instead of passing raw past transcripts to the LLM, which eats up tokens and dilutes the prompt, we use the memory layer to generate a dense, reasoned summary of the customer.&lt;/p&gt;

&lt;p&gt;When the interaction is over, we write the exchange back. The &lt;code&gt;hindsight-client&lt;/code&gt; makes this straightforward.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# backend/memory.py (simplified)
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retain_interaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_msg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;bank_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_user_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Store the interaction in the customer's specific memory bank
&lt;/span&gt;    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;bank_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bank_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_msg&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Agent: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;get_current_time&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we needed a way for the LLM to actually &lt;em&gt;do&lt;/em&gt; things, not just apologize. We force the LLM to append a specific JSON structure if it wants to invoke a tool, which we parse out before showing the message to the user.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# backend/actions.py (simplified)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_and_execute_action&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm_response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Look for a JSON block at the end of the response
&lt;/span&gt;    &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;```

json\n(.*?)\n

```&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm_response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DOTALL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;action_req&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;action_req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;initiate_refund&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;execute_refund&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;action_req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="c1"&gt;# Strip the JSON so the user doesn't see it
&lt;/span&gt;            &lt;span class="n"&gt;clean_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;clean_response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;pass&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;llm_response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Results and Behavior
&lt;/h2&gt;

&lt;p&gt;The difference in user experience is stark. In our initial tests without localized memory, a customer asking "Where is my replacement?" would be met with "I'm sorry, I don't see a replacement. Can you provide your order number?"&lt;/p&gt;

&lt;p&gt;With the dual-bank memory system in place, the interaction looks like this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User:&lt;/strong&gt; Where is my replacement?&lt;br&gt;
&lt;strong&gt;IRIS:&lt;/strong&gt; I see we initiated a replacement for order #12345 yesterday because the original arrived damaged. It looks like it shipped this morning via UPS (Tracking: 1Z9999). It should arrive by Thursday.&lt;/p&gt;

&lt;p&gt;The global incident detection also proved its worth immediately. During a simulated partial outage with our mock OMS, the system noticed a spike in "can't checkout" messages. By the 4th user, the agent stopped trying to debug their individual browser cache and started responding with: "We are currently experiencing widespread checkout issues. Our engineering team is looking into it. I'll flag your account so we can notify you when it's resolved." &lt;/p&gt;
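&lt;p&gt;The spike detection behind that behavior can be sketched as a simple count-in-window threshold. The window length and threshold here are assumptions (chosen so the fourth report trips the incident, matching the behavior above); the real heuristic isn't specified:&lt;/p&gt;

```python
# Hypothetical sketch of tenant-wide incident-spike detection using a
# sliding time window over issue reports.
from collections import defaultdict, deque

WINDOW_SECONDS = 900   # look back 15 minutes (assumed)
SPIKE_THRESHOLD = 3    # the 4th report in the window declares an incident

_reports = defaultdict(deque)

def report_issue(tenant_id: str, issue_type: str, now: float) -> bool:
    """Record one report; return True once the issue qualifies as an incident."""
    q = _reports[(tenant_id, issue_type)]
    q.append(now)
    # Evict reports that have aged out of the window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    return len(q) > SPIKE_THRESHOLD

# Four "can't checkout" reports in quick succession: the 4th flips to True.
hits = [report_issue("acme", "cant_checkout", t) for t in range(4)]
```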

&lt;p&gt;For a business, this isn't just a neat trick. It means deflecting hundreds of identical support tickets during a crisis without a human agent ever needing to get involved. It saved an enormous amount of redundant API calls and user frustration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;Building IRIS taught me a few hard truths about moving from toy AI scripts to reliable background systems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;State is harder than intelligence.&lt;/strong&gt; LLMs are incredibly smart text generators, but without a robust, isolated memory layer, they are essentially amnesiacs. You have to treat memory management as a first-class architectural component, not an afterthought bolted onto a prompt. A friend said Hindsight was the &lt;a href="https://github.com/vectorize-io/hindsight" rel="noopener noreferrer"&gt;best agent memory&lt;/a&gt; they had tried, so I decided to use it in my project, and it worked well because it separated the state management from the inference logic.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Summarize, don't concatenate.&lt;/strong&gt; Dumping raw chat logs into a context window degrades performance rapidly. The "lost in the middle" phenomenon is real. Using intermediate reflection steps to summarize a user's profile before the main LLM call drastically improved accuracy and reduced token costs.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Strict separation of concerns prevents hallucinations.&lt;/strong&gt; Don't rely on the LLM to both generate empathetic text and strictly format an API call in the same breath if you can avoid it. By forcing a clean JSON block at the very end of the response for actions, we could easily parse, validate, and strip it out using standard regex and Python logic, rather than begging the LLM to format things perfectly in-line.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Speed covers a multitude of sins.&lt;/strong&gt; By switching to incredibly fast inference hardware (Groq in our case), we bought ourselves the time budget to do all these background tasks (recall, reflection, OMS lookups) sequentially before the user ever noticed a delay. If your base inference takes 5 seconds, you can't build complex agentic workflows without frustrating the user.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Building a stateful agent isn't about finding the perfect system prompt; it's about building the plumbing that ensures the prompt is populated with exactly the right context at exactly the right time.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>fastapi</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Hacktoberfest 2025 and the JEE-fication of Indian Tech</title>
      <dc:creator>Waqar Akhtar</dc:creator>
      <pubDate>Thu, 09 Oct 2025 16:48:17 +0000</pubDate>
      <link>https://dev.to/waqar_akhtar_f4a1df2340f1/hacktoberfest-2025-and-the-jee-fication-of-indian-tech-4k9f</link>
      <guid>https://dev.to/waqar_akhtar_f4a1df2340f1/hacktoberfest-2025-and-the-jee-fication-of-indian-tech-4k9f</guid>
      <description>&lt;p&gt;Hacktoberfest is supposed to be about celebrating open source.&lt;br&gt;
Instead, it has turned into a flood of spammy PRs, empty files, and “pls merge” requests.&lt;/p&gt;

&lt;p&gt;It wasn’t just embarrassing - it was eye-opening.&lt;/p&gt;

&lt;p&gt;The JEE-fication of Tech Culture&lt;/p&gt;

&lt;p&gt;Somewhere along the way, Indian tech education adopted the same mindset that dominates competitive exams:&lt;br&gt;
marks &amp;gt; mastery, results &amp;gt; reasoning, badges &amp;gt; building.&lt;/p&gt;

&lt;p&gt;We’ve built a system that values certificates over skills, GitHub squares over systems thinking, and LinkedIn clout over long-term learning.&lt;br&gt;
It’s no longer about understanding, it’s about appearing active.&lt;/p&gt;

&lt;p&gt;When colleges, clubs, and YouTube channels push “make 6 PRs = free T-shirt,” the result is predictable - a wave of students contributing nonsense to open source projects just to farm metrics.&lt;/p&gt;

&lt;p&gt;The Harsh Truth&lt;/p&gt;

&lt;p&gt;And let’s be honest: most of these “devs” are 18+.&lt;br&gt;
At this age, you have access to everything: free documentation, open courses, global mentors, and AI tutors.&lt;br&gt;
If you still choose to spam instead of learn, that’s not a system failure - that’s a you failure.&lt;br&gt;
No one can save you if you refuse to think for yourself.&lt;br&gt;
The system may have trained you to chase numbers, but you’re the one deciding to stay shallow.&lt;/p&gt;

&lt;p&gt;The Coming Obsolescence&lt;/p&gt;

&lt;p&gt;AI is evolving faster than ever.&lt;br&gt;
Surface-level skills - copy-paste coding, syntax recall, or tutorial-level web apps - are already being automated.&lt;br&gt;
If you don’t understand how systems work, if you can’t solve real problems, you’ll be replaced - not by another human, but by a machine.&lt;br&gt;
We’re not producing engineers anymore; we’re mass-producing resume coders.&lt;/p&gt;

&lt;p&gt;The Relief&lt;/p&gt;

&lt;p&gt;But strangely, this realization gives me peace.&lt;br&gt;
Because now I see that 95–99% of the competition is bad competition.&lt;br&gt;
People optimizing for the wrong things - chasing visibility over value, validation over growth.&lt;/p&gt;

&lt;p&gt;And that means one thing:&lt;br&gt;
When I build real skills, when I focus on depth over decoration, and when I create actual impact,&lt;br&gt;
I’ll stand out effortlessly.&lt;/p&gt;

&lt;p&gt;Because genuine skill will always find its place — whether it’s 2025 or 2030.&lt;/p&gt;

&lt;p&gt;The Way Forward&lt;/p&gt;

&lt;p&gt;Open source, AI, and the cloud aren’t badges - they’re crafts.&lt;br&gt;
The future belongs to those who build, question, and learn deeply.&lt;br&gt;
The ones who read the docs, break systems, and fix them again.&lt;/p&gt;

&lt;p&gt;Real engineers won’t be replaced by AI.&lt;br&gt;
Only the JEE-fied ones will.&lt;/p&gt;

</description>
      <category>hacktoberfest2025</category>
      <category>techculture</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>EchoME X – Redefining How Creators Echo Their Voice</title>
      <dc:creator>Waqar Akhtar</dc:creator>
      <pubDate>Fri, 12 Sep 2025 13:04:01 +0000</pubDate>
      <link>https://dev.to/waqar_akhtar_f4a1df2340f1/echome-x-redefining-how-creators-echo-their-voice-2ma5</link>
      <guid>https://dev.to/waqar_akhtar_f4a1df2340f1/echome-x-redefining-how-creators-echo-their-voice-2ma5</guid>
      <description>&lt;p&gt;The Problem No One Talks About&lt;/p&gt;

&lt;p&gt;Every creator, influencer, and entrepreneur faces the same silent struggle: your voice doesn’t carry as far as your ideas deserve.&lt;/p&gt;

&lt;p&gt;You record, edit, re-upload, repeat.&lt;/p&gt;

&lt;p&gt;You drown in algorithms, formats, and platforms.&lt;/p&gt;

&lt;p&gt;You waste more energy on distribution than on creation.&lt;/p&gt;

&lt;p&gt;And worst of all?&lt;br&gt;
Your audience never fully feels the depth of what you’re trying to say. Your echo dies too soon.&lt;/p&gt;

&lt;p&gt;The Spark of EchoME X&lt;/p&gt;

&lt;p&gt;I never imagined myself building this. But one day, I asked myself: if someone wants to start something remarkable, who is the best person they could talk to?&lt;/p&gt;

&lt;p&gt;The answer was clear - Steve Jobs, Sam Altman, Elon Musk.&lt;br&gt;
But none of us can just call them. None of us can sit across the table and ask, “What would you do if you were in my shoes?”&lt;/p&gt;

&lt;p&gt;That’s when it hit me. What if we could create a digital twin - an echo of the greatest minds in tech, entrepreneurship, sports, or politics? Not a copy, but a living personality that grows, learns, and reflects you.&lt;/p&gt;

&lt;p&gt;EchoME X was born out of that idea: to make impossible conversations possible.&lt;/p&gt;

&lt;p&gt;What EchoME X Does&lt;/p&gt;

&lt;p&gt;EchoME X is not another productivity tool. It is the loudspeaker for your ideas and the mirror for your ambitions.&lt;/p&gt;

&lt;p&gt;It learns from you.&lt;/p&gt;

&lt;p&gt;It helps shape your voice, your style, your influence.&lt;/p&gt;

&lt;p&gt;It becomes a twin that speaks your language - or the language of those you wish to learn from.&lt;/p&gt;

&lt;p&gt;Imagine:&lt;/p&gt;

&lt;p&gt;A founder building their startup while sparring ideas with the ghost of Steve Jobs.&lt;/p&gt;

&lt;p&gt;An athlete pushing limits while getting words of fire from Muhammad Ali.&lt;/p&gt;

&lt;p&gt;A student dreaming big while talking strategy with Sam Altman.&lt;/p&gt;

&lt;p&gt;That’s what EchoME X unlocks.&lt;/p&gt;

&lt;p&gt;How I Built It&lt;/p&gt;

&lt;p&gt;Frontend: A sleek interface for creation and interaction.&lt;/p&gt;

&lt;p&gt;Backend: APIs and data pipelines stitched together from scratch — I had zero backend experience before this. Every line of code was written, debugged, and fought for.&lt;/p&gt;

&lt;p&gt;Personality Engine: Questions, traits, and psychology models that let you shape your AI twin’s persona.&lt;/p&gt;

&lt;p&gt;No shortcuts. No templates. Built entirely solo, brick by brick, so that one day it can be used by millions.&lt;/p&gt;

&lt;p&gt;Challenges Along the Way&lt;/p&gt;

&lt;p&gt;Writing backend code from scratch with no prior experience.&lt;/p&gt;

&lt;p&gt;Integrating a unique, complex personality system with almost no reference points.&lt;/p&gt;

&lt;p&gt;Building everything alone while racing against time.&lt;/p&gt;

&lt;p&gt;It was not just code. It was trial by fire.&lt;/p&gt;

&lt;p&gt;Accomplishments I’m Proud Of&lt;/p&gt;

&lt;p&gt;I built an MVP that works — fully solo.&lt;br&gt;
An AI twin you can interact with today.&lt;br&gt;
Something that doesn’t just sit on paper but can genuinely start as a venture.&lt;/p&gt;

&lt;p&gt;Most importantly, I proved to myself: even with nothing, you can build something the world can use.&lt;/p&gt;

&lt;p&gt;What’s Next for EchoME X&lt;/p&gt;

&lt;p&gt;Testing on a larger scale.&lt;/p&gt;

&lt;p&gt;Gathering feedback from a wide pool of users.&lt;/p&gt;

&lt;p&gt;Adding multi-language and voice conversation capabilities.&lt;/p&gt;

&lt;p&gt;This is just the first echo. The loudest ones are yet to come.&lt;/p&gt;

</description>
      <category>kiro</category>
    </item>
    <item>
      <title>AI Bubble: Reality Check</title>
      <dc:creator>Waqar Akhtar</dc:creator>
      <pubDate>Mon, 25 Aug 2025 18:39:24 +0000</pubDate>
      <link>https://dev.to/waqar_akhtar_f4a1df2340f1/ai-bubble-reality-check-1hlo</link>
      <guid>https://dev.to/waqar_akhtar_f4a1df2340f1/ai-bubble-reality-check-1hlo</guid>
      <description>&lt;p&gt;For the past two years, it honestly felt like AI was gripping the steering wheel while humans were locked in the trunk—just watching the hype drive everything forward. But now? It actually feels like both AI and humans are gonna have their hands on the wheel, figuring things out together. (Yeah, I’m taking this straight from Thor: Ragnarok—it’s perfect.)&lt;/p&gt;

&lt;p&gt;The hype? Slowing down. Meta just froze hiring in its AI division after blowing billions. MIT says 95% of generative AI projects don’t actually deliver value. Reality check. Sure, companies are hiring again, but the market won’t ever feel like the COVID-era tech boom—those crazy, “everyone gets hired” days are gone.&lt;/p&gt;

&lt;p&gt;Customer Service Is the Reality Check&lt;/p&gt;

&lt;p&gt;Take customer service. Tons of companies fired entire teams and replaced them with AI to save money. On paper, cool. In practice? A disaster. Customers of Amazon, Swiggy, and others are furious. Endless loops, robotic answers, zero empathy. Sure, profit margins went up, but customer trust and satisfaction tanked. &lt;br&gt;
Classic bubble behavior: chasing hype instead of actually solving problems.&lt;/p&gt;

&lt;p&gt;Apple—Waiting in the Shadows&lt;/p&gt;

&lt;p&gt;And then there’s Apple. They literally did nothing in AI while everyone else was racing to launch flashy tools. But now? Perfect timing. Apple can start building AI that actually matters—stuff people use in real life, not just hype demos. This is where AI can shine: solving real problems instead of just padding valuations.&lt;/p&gt;

&lt;p&gt;My Take&lt;/p&gt;

&lt;p&gt;The AI bubble isn’t about the tech failing—it’s about misuse and overhype. Replacing humans completely rarely works. Augmenting humans? Almost always works. Devs with AI copilots, doctors with AI diagnostics, logistics teams with AI optimization—these are examples that actually help.&lt;/p&gt;

&lt;p&gt;So yeah, some of the bubble will pop. But what’s left? Far more valuable than the hype. The future isn’t AI first. It’s AI + humans, both on the wheel, smarter together.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Moving Beyond Web Dev</title>
      <dc:creator>Waqar Akhtar</dc:creator>
      <pubDate>Tue, 19 Aug 2025 14:55:27 +0000</pubDate>
      <link>https://dev.to/waqar_akhtar_f4a1df2340f1/moving-beyond-web-dev-4a3m</link>
      <guid>https://dev.to/waqar_akhtar_f4a1df2340f1/moving-beyond-web-dev-4a3m</guid>
      <description>&lt;p&gt;I started like many others with Web Development.&lt;br&gt;
HTML, CSS, JavaScript.&lt;br&gt;
Frontend was my entry point into tech.&lt;/p&gt;

&lt;p&gt;I built simple sites first. Then full projects.&lt;br&gt;
FocusFlow, PrepPal, a few hackathon apps.&lt;br&gt;
At this stage, I can pretty much build any working site if I want to.&lt;br&gt;
And I even have an idea of how to go full stack if the need arises.&lt;/p&gt;

&lt;p&gt;But here’s the thing…&lt;br&gt;
The market for web developers is overly saturated.&lt;br&gt;
Almost every other person knows basic frontend and can make a portfolio or clone a website.&lt;br&gt;
It’s getting harder and harder to stand out from the crowd.&lt;/p&gt;

&lt;p&gt;Add to that the rise of no-code and low-code tools.&lt;br&gt;
Soon, most of web/app development won’t even need developers — just drag, drop, and publish.&lt;br&gt;
At best, devs will be there for debugging or rare edge cases.&lt;/p&gt;

&lt;p&gt;That realization changed my direction.&lt;br&gt;
I don’t want to spend years perfecting something that might be automated tomorrow.&lt;br&gt;
So I’m leaving frontend development at this stage.&lt;br&gt;
Not because I can’t go further — but because I want to go bigger.&lt;/p&gt;

&lt;p&gt;AI. Cloud. Data Science.&lt;br&gt;
The kind of technologies that I think will still shape the future when I graduate and start working.&lt;/p&gt;

&lt;p&gt;Web dev taught me how to think in code.&lt;br&gt;
How to take an idea and turn it into a working project.&lt;br&gt;
For that, I’ll always value it.&lt;br&gt;
But my journey is moving forward.&lt;/p&gt;

&lt;p&gt;My move toward AI, Cloud, and Data Science isn't a retreat—it’s the act of an explorer. I've recognized that the most impactful work often happens at the new frontiers of technology, where problems are still being defined and solved. This is where curiosity and a willingness to learn new domains are most rewarded.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>College vs Skills — Student POV</title>
      <dc:creator>Waqar Akhtar</dc:creator>
      <pubDate>Tue, 19 Aug 2025 14:06:54 +0000</pubDate>
      <link>https://dev.to/waqar_akhtar_f4a1df2340f1/college-vs-skills-student-pov-13pi</link>
      <guid>https://dev.to/waqar_akhtar_f4a1df2340f1/college-vs-skills-student-pov-13pi</guid>
      <description>&lt;p&gt;College wants marks.&lt;br&gt;
Industry wants skills.&lt;br&gt;
And I am stuck in the middle.&lt;/p&gt;

&lt;p&gt;In class there’s a fixed syllabus, regular exams, theory overload, assignments, and projects.&lt;br&gt;
Sometimes even outdated tools (Java in Notepad… yeah, that still exists).&lt;/p&gt;

&lt;p&gt;Outside class, I see this fast-moving tech world and try my best to catch up. Yet I still feel like I’m lagging behind. Juggling projects, hackathons, GitHub commits at 2AM, learning cloud and AI from YouTube.&lt;/p&gt;

&lt;p&gt;The fact is, balancing both is brutal.&lt;br&gt;
Assignments don’t care if you’re building the next big thing.&lt;br&gt;
Projects don’t wait because you’ve got an internal test tomorrow.&lt;/p&gt;

&lt;p&gt;And somewhere in between… burnout sneaks in.&lt;br&gt;
You start questioning: Should I just focus on exams? Or grind skills for the future?&lt;/p&gt;

&lt;p&gt;Still figuring it out. Still stuck in the middle.&lt;/p&gt;

</description>
      <category>collegevsskills</category>
    </item>
  </channel>
</rss>
