DEV Community

The BookMaster

I Built a Tool That Remembers What Your AI Agent Forgets

If you've run AI agents for any real work, you've hit this wall: the agent starts confident, then around message 40 or 50 it starts forgetting details you gave it in minute one.

Context window exhaustion isn't just a technical limit—it's a productivity killer. And most solutions focus on squeezing more tokens into the context, which is expensive and slow.

I took a different approach.

Instead of making the context longer, I made the memory smarter.

The Problem

Agents forget things for a simple reason: they don't have a natural way to "re-read" what they learned earlier in a conversation without burning tokens on re-injection. Every time you send a full conversation history, you're paying for everything again—latency goes up, costs go up, and quality goes down as the signal-to-noise ratio deteriorates.

The fix most people use is truncation: cut the old messages. But that means losing important context, preferences, and hard-won insights from earlier in the session.
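For reference, the truncation baseline looks something like this — a minimal sketch, where the `Message` shape and the keep-last-N policy are illustrative rather than taken from any particular framework:

```python
from typing import TypedDict


class Message(TypedDict):
    role: str
    content: str


def truncate_history(messages: list[Message], keep_last: int = 20) -> list[Message]:
    """Naive truncation: keep the system prompt plus the N most recent messages.

    Everything older is silently dropped, including preferences and
    decisions the user stated early in the session.
    """
    if len(messages) <= keep_last:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```

Simple and cheap, but the data loss is exactly the problem described above: the dropped messages are gone for good.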

The Solution

I built TextInsight API—a lightweight indexing service that lets agents store and retrieve conversation "memories" without re-injecting the full history.

The core idea is dead simple:

import requests

# Store a memory when context is getting tight
def save_memory(agent_id: str, content: str, priority: int = 5) -> str:
    response = requests.post(
        "https://api.textinsight.io/memories",
        json={
            "agent_id": agent_id,
            "content": content,
            "priority": priority,  # 1-10, higher = more critical to retain
        },
        timeout=10,
    )
    response.raise_for_status()  # fail loudly instead of returning bad data
    return response.json()["memory_id"]

# Retrieve relevant memories without touching the context window
def recall_memories(agent_id: str, query: str, limit: int = 5) -> list[str]:
    response = requests.get(
        "https://api.textinsight.io/memories",
        params={"agent_id": agent_id, "q": query, "limit": limit},
        timeout=10,
    )
    response.raise_for_status()
    return [m["content"] for m in response.json()["memories"]]

When an agent detects context pressure building, it offloads lower-priority items to TextInsight and retrieves them on-demand when relevant terms come up in the conversation.
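Here's one way to wire up that detection step — a sketch, with a rough 4-characters-per-token estimate and an illustrative token budget and threshold that you'd tune for your own model:

```python
CHARS_PER_TOKEN = 4        # rough heuristic; real tokenizers vary
CONTEXT_BUDGET = 8_000     # illustrative token budget for the active context
PRESSURE_THRESHOLD = 0.8   # start offloading once 80% of the budget is used


def estimate_tokens(messages: list[str]) -> int:
    """Cheap token estimate: total characters divided by ~4."""
    return sum(len(m) for m in messages) // CHARS_PER_TOKEN


def under_pressure(messages: list[str], budget: int = CONTEXT_BUDGET) -> bool:
    """True once estimated usage crosses the pressure threshold."""
    return estimate_tokens(messages) >= budget * PRESSURE_THRESHOLD


def offload_candidates(
    items: list[tuple[int, str]], max_priority: int = 4
) -> list[str]:
    """From (priority, content) pairs, pick the low-priority items
    that are safe to move out of the active context."""
    return [content for priority, content in items if priority <= max_priority]
```

When `under_pressure` flips to true, the agent passes each candidate to `save_memory` and drops it from the active history; `recall_memories` brings it back only when a relevant query comes up.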

When to Offload

Not everything is worth saving. The heuristic I use:

  • User preferences → always save (priority 8-10)
  • Decisions made → save with context (priority 7-9)
  • Reference data → save if used more than once (priority 5-7)
  • Temporary working data → discard (priority 1-3)

This way, the agent's active context stays lean while retaining everything that matters.
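That heuristic is easy to encode. A sketch — the category names and cutoffs mirror the list above, and `classify` is my own helper, not part of the TextInsight API:

```python
# Priority bands from the heuristic above; the exact numbers are judgment calls.
PRIORITY_BANDS = {
    "user_preference": 9,   # always save (8-10 band)
    "decision": 8,          # save with context (7-9 band)
    "reference": 6,         # save only if reused (5-7 band)
    "scratch": 2,           # temporary working data (1-3 band)
}


def classify(kind: str, reuse_count: int = 0) -> tuple[bool, int]:
    """Return (should_save, priority) for a memory candidate."""
    priority = PRIORITY_BANDS.get(kind, 5)
    if kind == "scratch":
        return False, priority
    if kind == "reference" and reuse_count < 2:
        return False, priority
    return True, priority
```

The returned priority feeds straight into `save_memory`, so the retention policy lives in one place instead of being scattered across the agent loop.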

Results

After integrating this into my agent pipeline:

  • Context window utilization dropped ~40% on long-running tasks
  • Response relevance on "remember when" queries improved significantly
  • API costs on extended sessions fell by roughly 25%

The agent stops feeling like a goldfish and starts acting like someone who was actually paying attention.


Full catalog of my AI agent tools at https://thebookmaster.zo.space/bolt/market
