<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tharunika Shri</title>
    <description>The latest articles on DEV Community by Tharunika Shri (@tharunika_shri_23).</description>
    <link>https://dev.to/tharunika_shri_23</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3874881%2F7d661051-63de-40b3-af89-ea0253f0b2c4.png</url>
      <title>DEV Community: Tharunika Shri</title>
      <link>https://dev.to/tharunika_shri_23</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tharunika_shri_23"/>
    <language>en</language>
    <item>
      <title>Why My Support Bot Finally Stopped Acting Like a Goldfish</title>
      <dc:creator>Tharunika Shri</dc:creator>
      <pubDate>Sun, 12 Apr 2026 12:49:32 +0000</pubDate>
      <link>https://dev.to/tharunika_shri_23/why-my-support-bot-finally-stopped-acting-like-a-goldfish-3npm</link>
      <guid>https://dev.to/tharunika_shri_23/why-my-support-bot-finally-stopped-acting-like-a-goldfish-3npm</guid>
      <description>&lt;p&gt;&lt;strong&gt;Most support bots have the same problem: they forget everything.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can talk to them for an hour, come back the next day, and the entire conversation might as well have never happened. Every session starts from zero. Every problem must be re-explained. From a systems perspective, most of these bots are just stateless request processors pretending to be conversational systems.&lt;/p&gt;

&lt;p&gt;I ran into this problem while building a support automation system, and the fix turned out to be less about better prompting and more about something much simpler: memory.&lt;/p&gt;

&lt;p&gt;Specifically, persistent agent memory.&lt;/p&gt;

&lt;p&gt;Once I added that piece, the system stopped behaving like a goldfish.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the System Actually Does&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system I built is a customer support agent that remembers interactions across sessions. Instead of treating every conversation as a fresh request, the agent keeps a structured record of what happened before.&lt;/p&gt;

&lt;p&gt;When a user messages support, the system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Looks up their past interactions&lt;/li&gt;
&lt;li&gt;Retrieves relevant memories&lt;/li&gt;
&lt;li&gt;Uses that context to generate a response&lt;/li&gt;
&lt;li&gt;Stores the outcome of the interaction back into memory&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The core architecture is intentionally simple.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frs3hulrvcdd07b6i0e6g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frs3hulrvcdd07b6i0e6g.png" alt="System architecture diagram" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The memory layer is where the interesting work happens. Instead of trying to stuff all historical context into prompts, I needed something designed specifically for storing and retrieving conversational state.&lt;/p&gt;

&lt;p&gt;That led me to try Hindsight, which is essentially a purpose-built system for persistent agent memory.&lt;/p&gt;

&lt;p&gt;Before that, I tried a few naive approaches. None of them worked well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Prompt Engineering Wasn't Enough&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The first version of the system used a common pattern: just include the conversation history in the prompt.&lt;/p&gt;

&lt;p&gt;That works fine for short interactions, but breaks down quickly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompts get huge&lt;/li&gt;
&lt;li&gt;Context windows fill up&lt;/li&gt;
&lt;li&gt;Retrieval becomes expensive&lt;/li&gt;
&lt;li&gt;Historical information becomes noisy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The bigger problem is that not all information should be treated equally. If a user tells support once that they are frustrated with billing errors, that signal should persist. If they ask about shipping status, that might not matter next week.&lt;/p&gt;

&lt;p&gt;I needed something that behaved less like a transcript and more like memory.&lt;/p&gt;

&lt;p&gt;In other words, I needed a way to give the system real agent memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Memory Model&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The key design decision was to treat each interaction as an event that produces memory artifacts.&lt;/p&gt;

&lt;p&gt;Instead of storing raw chat logs, the system stores structured observations about the interaction.&lt;br&gt;
For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer name&lt;/li&gt;
&lt;li&gt;Known issues&lt;/li&gt;
&lt;li&gt;Previous failed fixes&lt;/li&gt;
&lt;li&gt;Sentiment signals&lt;/li&gt;
&lt;li&gt;Resolution outcomes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those get written into Hindsight after every conversation.&lt;/p&gt;
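<p>As a rough sketch, one of those structured observations might look like this in code. The field names here are my own illustration for this post, not a fixed schema:</p>

```python
from dataclasses import dataclass

# Illustrative shape for one structured observation. The field names
# mirror the bullet list above; they are an assumption, not a schema.
@dataclass
class MemoryArtifact:
    user_id: str
    customer_name: str
    known_issues: list
    failed_fixes: list
    sentiment: str    # e.g. "frustrated", "neutral"
    resolution: str   # e.g. "resolved", "unresolved"

artifact = MemoryArtifact(
    user_id="u_123",
    customer_name="Alex",
    known_issues=["billing discrepancy"],
    failed_fixes=["invoice correction"],
    sentiment="frustrated",
    resolution="unresolved",
)
```
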

&lt;p&gt;When a user returns, the system retrieves the relevant pieces of memory and injects them into the prompt.&lt;/p&gt;

&lt;p&gt;The backend code that retrieves memory looks roughly like this:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;memory = hindsight.search(
    user_id=user_id,
    query=user_message,
    top_k=5
)
context = "\n".join([m.content for m in memory])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The important part isn't the code. It's the behavior.&lt;/p&gt;

&lt;p&gt;Instead of feeding the model everything, the system feeds it the right things.&lt;/p&gt;

&lt;p&gt;That alone dramatically improved response quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Writing Memories Back&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Retrieval only works if the memory store is constantly updated.&lt;/p&gt;

&lt;p&gt;After every interaction, the system extracts the outcome of the conversation and writes it back to the memory store.&lt;/p&gt;

&lt;p&gt;This happens after the LLM generates a response.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;memory_entry = {
    "user_id": user_id,
    "event": "support_interaction",
    "summary": conversation_summary,
    "resolution": outcome
}

hindsight.write(memory_entry)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first few iterations of this were messy.&lt;/p&gt;

&lt;p&gt;I initially tried storing entire conversations, but those quickly became noisy and redundant. Eventually I settled on storing short summaries of interactions instead.&lt;/p&gt;

&lt;p&gt;This kept the memory store useful instead of bloated.&lt;/p&gt;
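<p>A minimal sketch of that summarization step, assuming the llm.generate call from the snippets above is passed in as a plain callable (the prompt wording and the character cap are my own choices):</p>

```python
def summarize_interaction(transcript, generate, max_chars=500):
    """Compress a raw transcript into a short memory summary.

    `generate` is any callable mapping a prompt string to model text,
    e.g. llm.generate from the earlier snippets.
    """
    prompt = (
        "Summarize this support conversation in two sentences, "
        "noting the issue and whether it was resolved:\n\n" + transcript
    )
    summary = generate(prompt)
    # A hard cap keeps individual memory entries small and retrieval cheap.
    return summary[:max_chars]
```
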

&lt;p&gt;&lt;strong&gt;The Retrieval Loop&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once both retrieval and writing were in place, the agent started behaving very differently.&lt;/p&gt;

&lt;p&gt;The request flow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User sends message&lt;/li&gt;
&lt;li&gt;Retrieve past memory&lt;/li&gt;
&lt;li&gt;Construct prompt with relevant history&lt;/li&gt;
&lt;li&gt;Generate response&lt;/li&gt;
&lt;li&gt;Store summarized outcome&lt;/li&gt;
&lt;/ol&gt;
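<p>The five steps above can be sketched as a single handler. The memory and llm objects here are stand-ins for the hindsight and llm calls shown elsewhere in this post, and summarize is a placeholder for whatever produces the stored summary:</p>

```python
def handle_message(user_id, user_message, memory, llm, summarize):
    """One pass through the retrieval loop.

    `memory` needs search(user_id, query, top_k) and write(entry);
    `llm` needs generate(prompt); `summarize` maps (message, response)
    to a short outcome string. All three are stand-ins for this sketch.
    """
    # 1-2. Retrieve the most relevant past memories for this user.
    past = memory.search(user_id=user_id, query=user_message, top_k=5)
    context = "\n".join(m.content for m in past)

    # 3-4. Construct the prompt with relevant history, then generate.
    prompt = (
        f"Customer message:\n{user_message}\n\n"
        f"Relevant past interactions:\n{context}\n\n"
        "Respond helpfully and avoid repeating failed solutions."
    )
    response = llm.generate(prompt)

    # 5. Store the summarized outcome back into memory.
    memory.write({
        "user_id": user_id,
        "event": "support_interaction",
        "summary": summarize(user_message, response),
    })
    return response
```
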

&lt;p&gt;A simplified version of the response generation looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;prompt = f"""
Customer message:
{user_message}

Relevant past interactions:
{context}

Respond helpfully and avoid repeating failed solutions.
"""
response = llm.generate(prompt)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That one instruction — avoid repeating failed solutions — became surprisingly powerful once the system actually knew what had failed before.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What This Looks Like in Practice&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The difference shows up clearly when the same customer comes back.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First Interaction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Customer: My billing is wrong this month.&lt;br&gt;
Agent: I'm sorry about that. Can you provide your account email and invoice number?&lt;/p&gt;

&lt;p&gt;The system records:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Issue: billing discrepancy
Sentiment: frustrated
Resolution: unresolved
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Fifth Interaction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Customer: My bill is still wrong again.&lt;/p&gt;

&lt;p&gt;The agent already knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This user had a billing issue last month&lt;/li&gt;
&lt;li&gt;They were frustrated&lt;/li&gt;
&lt;li&gt;The previous fix did not work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of asking the same questions again, the response becomes:&lt;/p&gt;

&lt;p&gt;Hi Alex — I see you had a billing issue last month that wasn't resolved. Let's fix that properly this time. I'm escalating this directly to billing so you don't have to repeat the details again.&lt;/p&gt;

&lt;p&gt;That behavior doesn't come from clever prompting.&lt;br&gt;
It comes from memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Retrieval Matters More Than Storage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the biggest lessons from building this system was that retrieval quality matters far more than storage volume.&lt;/p&gt;

&lt;p&gt;You can store thousands of interactions, but if retrieval pulls irrelevant memories, the model gets confused quickly.&lt;/p&gt;

&lt;p&gt;A smaller, well-filtered memory set works much better.&lt;/p&gt;

&lt;p&gt;This is one reason I ended up using Hindsight agent memory. It focuses heavily on retrieving the most relevant context rather than dumping everything back into the model.&lt;/p&gt;

&lt;p&gt;In practice, that means returning only a handful of memory entries for each request.&lt;/p&gt;

&lt;p&gt;Five memories consistently worked better than fifty.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Escalation Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Another tricky part of the system was escalation.&lt;/p&gt;

&lt;p&gt;Not every issue should be handled by the agent. Some problems require a human support rep.&lt;/p&gt;

&lt;p&gt;The agent decides whether to escalate based on a few signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unresolved issues&lt;/li&gt;
&lt;li&gt;repeated failures&lt;/li&gt;
&lt;li&gt;high frustration sentiment&lt;/li&gt;
&lt;/ul&gt;
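<p>A minimal sketch of that decision, assuming the signals are read from the memory entries written earlier. The thresholds are my own choice, not tuned values:</p>

```python
def should_escalate(memories, failure_threshold=2, frustration_threshold=2):
    """Decide whether to hand off to a human, using the signals above.

    `memories` is a list of dicts with "resolution" and "sentiment" keys,
    matching the memory entries written after each interaction.
    """
    unresolved = sum(1 for m in memories if m.get("resolution") == "unresolved")
    frustrated = sum(1 for m in memories if m.get("sentiment") == "frustrated")
    # Escalate on repeated failures or sustained frustration.
    return unresolved >= failure_threshold or frustrated >= frustration_threshold
```
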

&lt;p&gt;When escalation happens, the system generates a summary of the entire interaction history and passes it to a human agent. Something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Customer: Alex
Known issues: billing discrepancy
Previous attempts: invoice correction
Sentiment: frustrated
Resolution: unresolved
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The human agent now starts with context instead of a blank screen.&lt;/p&gt;

&lt;p&gt;That alone cuts a lot of friction out of support workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lessons From Building It&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A few practical lessons came out of this project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Stateless chatbots are fundamentally limited&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most bots fail because they reset context every session.&lt;/p&gt;

&lt;p&gt;Without persistent memory, you can't build real relationships with users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Memory should be structured&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Raw transcripts are noisy.&lt;/p&gt;

&lt;p&gt;Short summaries and labeled events are far easier to retrieve and reason about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Retrieval beats bigger prompts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Better context selection improves responses more than longer prompts.&lt;/p&gt;

&lt;p&gt;Focus on relevance, not volume.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
