<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tharunika Shri</title>
    <description>The latest articles on DEV Community by Tharunika Shri (@tharunika_shri_23).</description>
    <link>https://dev.to/tharunika_shri_23</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3874881%2F7d661051-63de-40b3-af89-ea0253f0b2c4.png</url>
      <title>DEV Community: Tharunika Shri</title>
      <link>https://dev.to/tharunika_shri_23</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tharunika_shri_23"/>
    <language>en</language>
    <item>
      <title>Why My Support Bot Finally Stopped Acting Like a Goldfish</title>
      <dc:creator>Tharunika Shri</dc:creator>
      <pubDate>Sun, 12 Apr 2026 12:49:32 +0000</pubDate>
      <link>https://dev.to/tharunika_shri_23/why-my-support-bot-finally-stopped-acting-like-a-goldfish-3npm</link>
      <guid>https://dev.to/tharunika_shri_23/why-my-support-bot-finally-stopped-acting-like-a-goldfish-3npm</guid>
      <description>&lt;p&gt;&lt;strong&gt;Most support bots have the same problem: they forget everything.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can talk to them for an hour, come back the next day, and the entire conversation might as well have never happened. Every session starts from zero. Every problem must be re-explained. From a systems perspective, most of these bots are just stateless request processors pretending to be conversational systems.&lt;/p&gt;

&lt;p&gt;I ran into this problem while building a support automation system, and the fix turned out to be less about better prompting and more about something much simpler: memory.&lt;/p&gt;

&lt;p&gt;Specifically, persistent agent memory.&lt;/p&gt;

&lt;p&gt;Once I added that piece, the system stopped behaving like a goldfish.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the System Actually Does&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system I built is a customer support agent that remembers interactions across sessions. Instead of treating every conversation as a fresh request, the agent keeps a structured record of what happened before.&lt;/p&gt;

&lt;p&gt;When a user messages support, the system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Looks up their past interactions&lt;/li&gt;
&lt;li&gt;Retrieves relevant memories&lt;/li&gt;
&lt;li&gt;Uses that context to generate a response&lt;/li&gt;
&lt;li&gt;Stores the outcome of the interaction back into memory&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The core architecture is intentionally simple.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frs3hulrvcdd07b6i0e6g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frs3hulrvcdd07b6i0e6g.png" alt="System architecture diagram" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The memory layer is where the interesting work happens. Instead of trying to stuff all historical context into prompts, I needed something designed specifically for storing and retrieving conversational state.&lt;/p&gt;

&lt;p&gt;That led me to try Hindsight, which is essentially a purpose-built system for persistent agent memory.&lt;/p&gt;

&lt;p&gt;Before that, I tried a few naive approaches. None of them worked well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Prompt Engineering Wasn't Enough&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The first version of the system used a common pattern: just include the conversation history in the prompt.&lt;/p&gt;

&lt;p&gt;That works fine for short interactions, but breaks down quickly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompts get huge&lt;/li&gt;
&lt;li&gt;Context windows fill up&lt;/li&gt;
&lt;li&gt;Retrieval becomes expensive&lt;/li&gt;
&lt;li&gt;Historical information becomes noisy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The bigger problem is that not all information should be treated equally. If a user tells support once that they are frustrated with billing errors, that signal should persist. If they ask about shipping status, that might not matter next week.&lt;/p&gt;

&lt;p&gt;I needed something that behaved less like a transcript and more like memory.&lt;/p&gt;

&lt;p&gt;In other words, I needed a way to give the system real agent memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Memory Model&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The key design decision was to treat each interaction as an event that produces memory artifacts.&lt;/p&gt;

&lt;p&gt;Instead of storing raw chat logs, the system stores structured observations about the interaction.&lt;br&gt;
For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer name&lt;/li&gt;
&lt;li&gt;Known issues&lt;/li&gt;
&lt;li&gt;Previous failed fixes&lt;/li&gt;
&lt;li&gt;Sentiment signals&lt;/li&gt;
&lt;li&gt;Resolution outcomes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those get written into Hindsight after every conversation.&lt;/p&gt;
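<p>As a rough sketch, one of those structured observations might look like this in code. The field names here are my own illustration for this post, not a fixed schema:</p>

```python
from dataclasses import dataclass

# Illustrative shape for one structured observation. The field names
# mirror the bullet list above; they are an assumption, not a schema.
@dataclass
class MemoryArtifact:
    user_id: str
    customer_name: str
    known_issues: list
    failed_fixes: list
    sentiment: str    # e.g. "frustrated", "neutral"
    resolution: str   # e.g. "resolved", "unresolved"

artifact = MemoryArtifact(
    user_id="u_123",
    customer_name="Alex",
    known_issues=["billing discrepancy"],
    failed_fixes=["invoice correction"],
    sentiment="frustrated",
    resolution="unresolved",
)
```
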

&lt;p&gt;When a user returns, the system retrieves the relevant pieces of memory and injects them into the prompt.&lt;/p&gt;

&lt;p&gt;The backend code that retrieves memory looks roughly like this:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;memory = hindsight.search(
    user_id=user_id,
    query=user_message,
    top_k=5
)
context = "\n".join([m.content for m in memory])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The important part isn't the code. It's the behavior.&lt;/p&gt;

&lt;p&gt;Instead of feeding the model everything, the system feeds it the right things.&lt;/p&gt;

&lt;p&gt;That alone dramatically improved response quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Writing Memories Back&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Retrieval only works if the memory store is constantly updated.&lt;/p&gt;

&lt;p&gt;After every interaction, the system extracts the outcome of the conversation and writes it back to the memory store.&lt;/p&gt;

&lt;p&gt;This happens after the LLM generates a response.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;memory_entry = {
    "user_id": user_id,
    "event": "support_interaction",
    "summary": conversation_summary,
    "resolution": outcome
}

hindsight.write(memory_entry)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first few iterations of this were messy.&lt;/p&gt;

&lt;p&gt;I initially tried storing entire conversations, but those quickly became noisy and redundant. Eventually I settled on storing short summaries of interactions instead.&lt;/p&gt;

&lt;p&gt;This kept the memory store useful instead of bloated.&lt;/p&gt;
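<p>A minimal sketch of that summarization step, assuming the llm.generate call from the snippets above is passed in as a plain callable (the prompt wording and the character cap are my own choices):</p>

```python
def summarize_interaction(transcript, generate, max_chars=500):
    """Compress a raw transcript into a short memory summary.

    `generate` is any callable mapping a prompt string to model text,
    e.g. llm.generate from the earlier snippets.
    """
    prompt = (
        "Summarize this support conversation in two sentences, "
        "noting the issue and whether it was resolved:\n\n" + transcript
    )
    summary = generate(prompt)
    # A hard cap keeps individual memory entries small and retrieval cheap.
    return summary[:max_chars]
```
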

&lt;p&gt;&lt;strong&gt;The Retrieval Loop&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once both retrieval and writing were in place, the agent started behaving very differently.&lt;/p&gt;

&lt;p&gt;The request flow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User sends message&lt;/li&gt;
&lt;li&gt;Retrieve past memory&lt;/li&gt;
&lt;li&gt;Construct prompt with relevant history&lt;/li&gt;
&lt;li&gt;Generate response&lt;/li&gt;
&lt;li&gt;Store summarized outcome&lt;/li&gt;
&lt;/ol&gt;
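<p>The five steps above can be sketched as a single handler. The memory and llm objects here are stand-ins for the hindsight and llm calls shown elsewhere in this post, and summarize is a placeholder for whatever produces the stored summary:</p>

```python
def handle_message(user_id, user_message, memory, llm, summarize):
    """One pass through the retrieval loop.

    `memory` needs search(user_id, query, top_k) and write(entry);
    `llm` needs generate(prompt); `summarize` maps (message, response)
    to a short outcome string. All three are stand-ins for this sketch.
    """
    # 1-2. Retrieve the most relevant past memories for this user.
    past = memory.search(user_id=user_id, query=user_message, top_k=5)
    context = "\n".join(m.content for m in past)

    # 3-4. Construct the prompt with relevant history, then generate.
    prompt = (
        f"Customer message:\n{user_message}\n\n"
        f"Relevant past interactions:\n{context}\n\n"
        "Respond helpfully and avoid repeating failed solutions."
    )
    response = llm.generate(prompt)

    # 5. Store the summarized outcome back into memory.
    memory.write({
        "user_id": user_id,
        "event": "support_interaction",
        "summary": summarize(user_message, response),
    })
    return response
```
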

&lt;p&gt;A simplified version of the response generation looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;prompt = f"""
Customer message:
{user_message}

Relevant past interactions:
{context}

Respond helpfully and avoid repeating failed solutions.
"""
response = llm.generate(prompt)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That one instruction — avoid repeating failed solutions — became surprisingly powerful once the system actually knew what had failed before.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What This Looks Like in Practice&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The difference shows up clearly when the same customer comes back.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First Interaction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Customer: My billing is wrong this month.&lt;br&gt;
Agent: I'm sorry about that. Can you provide your account email and invoice number?&lt;/p&gt;

&lt;p&gt;The system records:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Issue: billing discrepancy
Sentiment: frustrated
Resolution: unresolved
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Fifth Interaction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Customer: My bill is still wrong again.&lt;/p&gt;

&lt;p&gt;The agent already knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This user had a billing issue last month&lt;/li&gt;
&lt;li&gt;They were frustrated&lt;/li&gt;
&lt;li&gt;The previous fix did not work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of asking the same questions again, the response becomes:&lt;/p&gt;

&lt;p&gt;Hi Alex — I see you had a billing issue last month that wasn't resolved. Let's fix that properly this time. I'm escalating this directly to billing so you don't have to repeat the details again.&lt;/p&gt;

&lt;p&gt;That behavior doesn't come from clever prompting.&lt;br&gt;
It comes from memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Retrieval Matters More Than Storage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the biggest lessons from building this system was that retrieval quality matters far more than storage volume.&lt;/p&gt;

&lt;p&gt;You can store thousands of interactions, but if retrieval pulls irrelevant memories, the model gets confused quickly.&lt;/p&gt;

&lt;p&gt;A smaller, well-filtered memory set works much better.&lt;/p&gt;

&lt;p&gt;This is one reason I ended up using Hindsight agent memory. It focuses heavily on retrieving the most relevant context rather than dumping everything back into the model.&lt;/p&gt;

&lt;p&gt;In practice, that means returning only a handful of memory entries for each request.&lt;/p&gt;

&lt;p&gt;Five memories consistently worked better than fifty.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Escalation Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Another tricky part of the system was escalation.&lt;/p&gt;

&lt;p&gt;Not every issue should be handled by the agent. Some problems require a human support rep.&lt;/p&gt;

&lt;p&gt;The agent decides whether to escalate based on a few signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unresolved issues&lt;/li&gt;
&lt;li&gt;repeated failures&lt;/li&gt;
&lt;li&gt;high frustration sentiment&lt;/li&gt;
&lt;/ul&gt;
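<p>A minimal sketch of that decision, assuming the signals are read from the memory entries written earlier. The thresholds are my own choice, not tuned values:</p>

```python
def should_escalate(memories, failure_threshold=2, frustration_threshold=2):
    """Decide whether to hand off to a human, using the signals above.

    `memories` is a list of dicts with "resolution" and "sentiment" keys,
    matching the memory entries written after each interaction.
    """
    unresolved = sum(1 for m in memories if m.get("resolution") == "unresolved")
    frustrated = sum(1 for m in memories if m.get("sentiment") == "frustrated")
    # Escalate on repeated failures or sustained frustration.
    return unresolved >= failure_threshold or frustrated >= frustration_threshold
```
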

&lt;p&gt;When escalation happens, the system generates a summary of the entire interaction history and passes it to a human agent. Something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Customer: Alex
Known issues: billing discrepancy
Previous attempts: invoice correction
Sentiment: frustrated
Resolution: unresolved
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The human agent now starts with context instead of a blank screen.&lt;/p&gt;

&lt;p&gt;That alone cuts a lot of friction out of support workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lessons From Building It&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A few practical lessons came out of this project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Stateless chatbots are fundamentally limited&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most bots fail because they reset context every session.&lt;/p&gt;

&lt;p&gt;Without persistent memory, you can't build real relationships with users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Memory should be structured&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Raw transcripts are noisy.&lt;/p&gt;

&lt;p&gt;Short summaries and labeled events are far easier to retrieve and reason about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Retrieval beats bigger prompts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Better context selection improves responses more than longer prompts.&lt;/p&gt;

&lt;p&gt;Focus on relevance, not volume.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
