When I work on a tricky problem I often go down random directions. I explore, learn what I can, and bring the learnings back. And I remember not to go down there again. This exploration, in my mind, is like the fog of war in Age of Empires, or a slime mold exploring its surroundings and then finding the right path.
AI agents could learn to benefit from doing the same, and for the most part they do, but they've got a bottleneck: they can't forget useless things. Everything is appended to their context, or compacted when the context window is about to run out. That doesn't help much, and it can be improved.
Right now there are a few ways agents deal with long context:
- **Summarization**: Compress old messages into a summary. But summaries are lossy. You lose the reasoning chain, the failed attempts, the "why" behind decisions (there's a rough sketch of this right after the list).
- **External stores**: Store stuff externally and retrieve it when needed. But this is passive. The agent doesn't control what's stored or when it's retrieved.
- **Just... keep everything**: Until you hit the context limit and things get weird.
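To make the lossiness concrete, here's roughly what summarization-based compaction looks like. This is a minimal sketch, not any particular framework's API; the `Message` shape and the `summarize` function are assumptions:

```typescript
// Minimal sketch of summarization-based compaction (assumed shapes, not a real API).
type Message = { role: "system" | "user" | "assistant"; content: string };

async function compact(
  messages: Message[],
  keepRecent: number,
  summarize: (text: string) => Promise<string> // hypothetical LLM-backed summarizer
): Promise<Message[]> {
  if (messages.length <= keepRecent) return messages;

  const old = messages.slice(0, messages.length - keepRecent);
  const recent = messages.slice(messages.length - keepRecent);

  // Everything before the cutoff gets squashed into one summary message.
  // This is exactly where the failed attempts and the "why" behind decisions get lost.
  const summary = await summarize(
    old.map((m) => `${m.role}: ${m.content}`).join("\n")
  );

  return [
    { role: "system", content: `Summary of earlier conversation: ${summary}` },
    ...recent,
  ];
}
```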
There's also some cool research happening here. The Reflexion framework lets agents reset their environment and try again, keeping self-reflections in memory. Tree of Thoughts explores multiple reasoning paths. And there's recent work like "Memory as Action" and "Git Context Controller" that treats context management as something the agent should actively control.
But these approaches are either architectural changes (you need to rebuild your whole system) or too drastic (reset everything and start over).
history.go(-10, "Bruh this path was a dead end")
What if we gave the agent a tool? Something dead simple:
> Ok, I've learned enough here. Take this message as my learning, drop the last 20 messages, and let's continue.
That's it. The agent decides when to use it, what to keep, and how far back to go.
The important bit here is that the agent initiates this, not some external system watching token counts. The agent realizes "I've been spinning my wheels" or "I found what I needed" and actively chooses to clean up after itself. I think the smarter models like Opus 4.5 and GPT-5.2 will be able to do this well.
I'm imagining a tool like:
time_travel(
  learning: "The API doesn't support batch operations, need to loop instead",
  steps_back: 15  // or go_to: "message_id_12345"
)
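If you wanted to expose that to a model as an actual tool, the declaration might look something like this. The exact wrapper format depends on your model provider; the names and schema here are just the ones imagined above, written against a generic "name + description + JSON Schema parameters" pattern:

```typescript
// Sketch of a tool declaration for time_travel (illustrative names, generic schema shape).
const timeTravelTool = {
  name: "time_travel",
  description:
    "Drop the last N messages from the conversation and replace them with a single learning. " +
    "Use this when a line of exploration is finished (dead end or answer found).",
  parameters: {
    type: "object",
    properties: {
      learning: {
        type: "string",
        description: "What was learned from the messages being dropped.",
      },
      steps_back: {
        type: "integer",
        description: "How many trailing messages to drop.",
      },
      go_to: {
        type: "string",
        description: "Alternative to steps_back: roll back to this message id.",
      },
    },
    required: ["learning"],
  },
};
```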
The agent calls this. The system:
- Takes the learning and injects it back as context (system prompt or user message)
- Drops the last N messages from the conversation
- Continues from there
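On the host side, the handler for that call could be as small as this. Again a sketch: the message shape, the role names, and the choice to re-inject the learning as a user message are all assumptions:

```typescript
// Rough sketch of what the host system does when the agent calls time_travel.
type Message = { role: "system" | "user" | "assistant"; content: string };

function applyTimeTravel(
  history: Message[],
  learning: string,
  stepsBack: number
): Message[] {
  // Don't drop more than exists, and always keep the first message
  // (usually the system prompt / task description).
  const cutoff = Math.max(1, history.length - stepsBack);
  const kept = history.slice(0, cutoff);

  // Re-inject the distilled learning so the model keeps the conclusion
  // without the messages that produced it.
  kept.push({
    role: "user",
    content: `[Learning from a dropped exploration] ${learning}`,
  });

  return kept;
}

// e.g. replaying the call imagined above:
// history = applyTimeTravel(history, "The API doesn't support batch operations, need to loop instead", 15);
```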
There's this thing called Physarum polycephalum. It's a slime mold. It doesn't have a brain, but it can solve mazes and find optimal paths between food sources. Researchers even used it to recreate the Tokyo rail network.
How does it do this? It explores everywhere at first. Then it withdraws from dead ends while reinforcing successful paths. It doesn't keep track of every failed path — it just stops going there.
That's what I want for agents. Explore, learn, withdraw, reinforce. Not "append everything forever and hope the model figures it out."