CodeKing

Posted on Jul 3

"My Coding Agent Remembered Sessions, Not Work. That Was the Bug"

#webdev #javascript #ai #opensource

A coding agent can keep a thread alive and still feel forgetful.

That was the annoying part I ran into after fixing session continuity in CliGate, my local control plane that ties together a resident assistant, Claude Code, Codex CLI, model routing, channels, and scheduled work.

The session was there. The follow-up worked. But repeated tasks were still slower than they should have been.

Why?

Because the agent remembered the conversation, not the work.

A live session is not the same thing as usable memory

Session continuity solves one real problem.

It keeps follow-ups like these from turning into a fresh start:

continue
do the same for this file
retry that
explain the error

That matters.

But it does not solve another problem that shows up the second you repeat a workflow a few days later.

If the agent previously figured out:

which button actually works
which step is a dead end
which field needs special handling
which rule the user always wants applied

then a still-open session is not enough.

The useful thing is not that the thread exists.

The useful thing is that the agent can recall what made the last run succeed.

The bug was “process amnesia”

The first run usually contains the expensive part.

That is when the agent explores, verifies, backtracks, and discovers the tiny details that are never in the idealized prompt:

this page hides the action under a menu
this editor is actually an iframe
this project wants Chinese replies and concise status updates
this environment URL is different from production

Before I fixed this in CliGate, those details existed only as raw execution history.

That meant the agent technically had logs, but not reusable memory.

On the next similar task, it often had to rediscover too much.

That is not intelligence.

That is paying the same debugging cost twice.
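To make the gap between logs and reusable memory concrete, here is a minimal sketch. It assumes a hypothetical trace format where each step records an action, whether it worked, and an optional note. Successful steps become the procedure and failed ones become dead ends. The trace shape and `distillProcedure` are illustrative names, not CliGate's code.

```javascript
// Hypothetical trace format: one entry per step the agent attempted.
// This is not CliGate's real log format; it only illustrates turning
// "what happened" into "what should be remembered".
const trace = [
  { action: "click the toolbar 'Publish' button", ok: false, note: "button is disabled" },
  { action: "open the overflow menu", ok: true },
  { action: "click 'Publish' in the menu", ok: true },
  { action: "type into the body editor", ok: false, note: "editor is inside an iframe" },
  { action: "switch into the editor iframe, then type", ok: true },
];

// Successful steps become the procedure; failed steps become dead ends,
// so the next run can skip them instead of rediscovering them.
function distillProcedure(task, steps) {
  return {
    type: "procedure",
    task,
    steps: steps.filter((s) => s.ok).map((s) => s.action),
    deadEnds: steps
      .filter((s) => !s.ok)
      .map((s) => (s.note ? `${s.action} (${s.note})` : s.action)),
  };
}

const memory = distillProcedure("publish a draft post", trace);
console.log(memory.steps.length); // 3
console.log(memory.deadEnds[1]); // type into the body editor (editor is inside an iframe)
```

This only filters by outcome. It does not decide which details are worth keeping, which is likely the harder part in a real agent.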

I stopped treating memory like a chat transcript problem

The wrong model was:

save more history -> keep more context -> hope the next run uses it well

That helps a little, but it gets noisy fast.

What I actually needed was a smaller, more reusable layer:

procedures
facts
directives
references

In other words, not “everything that happened,” but “what should be remembered from what happened.”

That changed the shape of the system.

Instead of forcing the next run to infer the lesson from a giant transcript, the assistant now has a file-based memory layer that can recall:

a procedure: the current best steps and known dead ends
a fact: a URL, environment rule, or known setting
a directive: how the user wants things done
a reference: where the relevant doc lives

That is much closer to how people actually work.
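One way to picture a file-based memory layer with these four kinds is a directory of small JSON files, one per entry. The sketch below assumes that layout. The `FileMemory` class, the `<kind>--<slug>.json` naming, and the entry fields are hypothetical and are not necessarily how CliGate stores things.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// The four kinds of memory described above. The file layout and field
// names are assumptions for illustration, not CliGate's actual format.
const KINDS = ["procedure", "fact", "directive", "reference"];

class FileMemory {
  constructor(dir) {
    this.dir = dir;
    fs.mkdirSync(dir, { recursive: true });
  }

  // One JSON file per entry, named "<kind>--<slug>.json".
  fileFor(kind, key) {
    const slug = key.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
    return path.join(this.dir, `${kind}--${slug}.json`);
  }

  save(kind, key, body) {
    if (!KINDS.includes(kind)) throw new Error(`unknown memory kind: ${kind}`);
    const entry = { kind, key, body, updatedAt: new Date().toISOString() };
    fs.writeFileSync(this.fileFor(kind, key), JSON.stringify(entry, null, 2));
    return entry;
  }

  // Returns the stored entry, or null if nothing was remembered.
  recall(kind, key) {
    const file = this.fileFor(kind, key);
    return fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, "utf8")) : null;
  }

  // Every entry of one kind, e.g. all standing directives.
  list(kind) {
    return fs
      .readdirSync(this.dir)
      .filter((name) => name.startsWith(`${kind}--`) && name.endsWith(".json"))
      .map((name) => JSON.parse(fs.readFileSync(path.join(this.dir, name), "utf8")));
  }
}

// Usage: a throwaway directory under the OS temp dir.
const mem = new FileMemory(fs.mkdtempSync(path.join(os.tmpdir(), "agent-memory-")));
mem.save("procedure", "publish draft", {
  steps: ["open the overflow menu", "click 'Publish' in the menu"],
  deadEnds: ["toolbar 'Publish' button is disabled"],
});
mem.save("fact", "staging url", { url: "https://staging.example.com" });
mem.save("directive", "reply language", { rule: "Always reply in Chinese" });
console.log(mem.recall("fact", "staging url").body.url); // https://staging.example.com
```

Keeping each entry in its own small file makes entries easy to inspect, edit by hand, or delete once they go stale, which matters when remembered steps are expected to decay.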

The important rule was verify-then-trust

The trap with workflow memory is obvious.

Interfaces change.

Buttons move.

Old steps decay.

So I did not want “perfect replay.”

I wanted something closer to this:

recall the previous best procedure
-> try it first
-> verify each important step
-> if it no longer works, fall back to exploration
-> update the memory after success

That turned out to be the difference between brittle automation and useful operational memory.

A remembered workflow should save exploration, not replace judgment.
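The verify-then-trust loop can be sketched as one small function. This is a minimal illustration that assumes the caller supplies hypothetical `runStep`, `verifyStep`, and `explore` hooks plus a memory object with `recall` and `save`. None of these are CliGate's actual API.

```javascript
// Minimal sketch of: recall -> try first -> verify -> fall back -> update.
// `memory`, `runStep`, `verifyStep`, and `explore` are hypothetical hooks
// supplied by the caller; this is not CliGate's API.
async function runWithMemory(task, { memory, runStep, verifyStep, explore }) {
  const remembered = memory.recall(task);

  if (remembered) {
    let stillValid = true;
    for (const step of remembered.steps) {
      await runStep(step);
      // Verify as we go instead of replaying blindly.
      if (!(await verifyStep(step))) {
        stillValid = false; // the UI moved or the step decayed
        break;
      }
    }
    if (stillValid) return { source: "memory", steps: remembered.steps };
  }

  // No memory, or the remembered path no longer works: explore.
  // If explore throws, nothing is saved.
  const steps = await explore(task);
  // Update memory only after exploration succeeds.
  memory.save(task, { steps });
  return { source: "exploration", steps };
}

// A trivial in-memory store for trying it out.
const store = new Map();
const memory = {
  recall: (task) => store.get(task) ?? null,
  save: (task, procedure) => store.set(task, procedure),
};
```

This sketch treats every step as important and restarts exploration from scratch when a remembered step fails. A real agent would probably resume from the failing step, and would need care with steps that have side effects before retrying them.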

This also fixed a second problem: user rules kept getting lost in the noise

There is another kind of “memory” that does not belong in a runtime session at all.

Things like:

always reply in Chinese
keep answers concise
do not touch production data
this project uses this test environment

Those are not just follow-up context.

They are standing operating rules.

Once I separated those from ordinary conversation history, the assistant became much more predictable.

The model no longer had to rediscover the same user preference in the middle of a task. It could start from it.

That sounds small, but it reduces a lot of friction.
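Separating standing rules from conversation history can be as simple as building them into the context from a different source. The sketch below assumes a generic chat-style message format (`role`/`content`) and a hypothetical `buildContext` helper. The `maxTurns` cutoff is an arbitrary illustration of keeping only recent session turns.

```javascript
// Hypothetical helper: standing rules come from the memory layer,
// follow-up context comes from the session. Assumes a generic
// chat-style { role, content } message format.
function buildContext({ directives, session, maxTurns = 20 }) {
  const rules = directives.map((rule) => `- ${rule}`).join("\n");
  // Guard against maxTurns <= 0, since slice(-0) would return everything.
  const recent = maxTurns > 0 ? session.slice(-maxTurns) : [];
  return [
    // Rules are injected on every run, not rediscovered from history.
    { role: "system", content: `Standing rules:\n${rules}` },
    // Only recent turns; the cutoff here is arbitrary.
    ...recent,
  ];
}

const context = buildContext({
  directives: [
    "Always reply in Chinese",
    "Keep answers concise",
    "Do not touch production data",
  ],
  session: [{ role: "user", content: "retry that" }],
});
console.log(context[0].content.split("\n")[1]); // - Always reply in Chinese
```

Because the rules no longer live in the transcript, trimming old turns cannot silently drop them.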

The real improvement was skipping re-discovery

The best sign that this change mattered was not that the memory layer looked elegant.

It was that repeat tasks got shorter.

The assistant could move faster because it was no longer starting with an empty tactical model every time.

Instead of:

inspect again
guess again
retry again
rediscover the same dead end

it could do this:

recall the best-known path
verify it
continue from there

That is a better loop for both desktop-style workflows and non-desktop tasks.

And it scales better than pretending one long-lived session can stand in for real memory.

The rule I am keeping

If you are building a coding agent, do not confuse “the thread is still attached” with “the system learned something useful.”

A session helps with continuity.

Memory helps with repeated work.

Those are different layers.

The session should keep the conversation alive.

The memory layer should keep the useful lessons alive.

Once I separated those two, CliGate started feeling less like a chat system with a very long buffer and more like an assistant that can actually learn how work gets done.

If you are building agent workflows, is your system remembering the thread, or remembering the successful procedure?
