DEV Community

CodeKing

"My Coding Agent Remembered Sessions, Not Work. That Was the Bug"

The first version of my coding agent had a very common bug: it remembered the conversation, but not the work.

That sounds fine until the agent has to do something real.

I would start a task from the web UI, continue it from a mobile channel, approve one command, ask for progress later, and then discover that the system was mostly guessing from the last few messages. It knew there was a session. It did not really know what job that session belonged to.

That is the difference between a chatbot and a working assistant.

The Problem Was The Unit Of Memory

Many agent systems begin with a simple shape:

conversation -> runtime session -> messages

That works for demos because the user does one thing at a time.

It breaks when the user behaves normally:

  • "continue the routing task"
  • "use Claude Code to review what Codex just changed"
  • "what happened with the thing from yesterday?"
  • "retry that, but keep the same working directory"

None of those are really about a chat session. They are about work.

A runtime session can crash. A user can switch from web to Telegram or Feishu. Two agents can work on the same issue from different roles. If the system treats the runtime session as the main identity, every one of those cases becomes fragile.

The Fix: Split Work From Execution

In CliGate, I started moving the design toward a different model:

Person
  -> Project
    -> Task
      -> Execution
        -> RuntimeSession

The important part is not the diagram. It is the boundary.

A Task is the thing the user thinks they are doing: "fix routing", "review the auth change", "write release notes", "check why the build failed".

An Execution is one concrete attempt to move that task forward. It may be Codex acting as the editor, Claude Code acting as a reviewer, or another provider doing a focused job.

A RuntimeSession is just the current process or provider session underneath that execution.

That means the assistant can say: this is still the same task, even if the runtime process has changed.
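
To make the boundary concrete, here is a minimal Python sketch of that hierarchy. It is not CliGate's actual code: every class name, field name, and the uuid-based ids are assumptions chosen for illustration. The point it tries to show is that swapping the runtime leaves the task and execution identities untouched.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional
from uuid import uuid4


def new_id() -> str:
    """Hypothetical id scheme: a random 32-character hex string."""
    return uuid4().hex


@dataclass
class RuntimeSession:
    provider: str                      # e.g. "codex" or "claude-code"
    id: str = field(default_factory=new_id)


@dataclass
class Execution:
    role: str                          # e.g. "editor" or "reviewer"
    provider: str
    runtime: Optional[RuntimeSession] = None
    id: str = field(default_factory=new_id)


@dataclass
class Task:
    title: str                         # what the user thinks they are doing
    executions: list[Execution] = field(default_factory=list)
    id: str = field(default_factory=new_id)


@dataclass
class Project:
    name: str
    tasks: list[Task] = field(default_factory=list)


@dataclass
class Person:
    name: str
    projects: list[Project] = field(default_factory=list)


# One task, one editor execution, one runtime underneath it.
task = Task(title="fix routing")
editor = Execution(role="editor", provider="codex",
                   runtime=RuntimeSession(provider="codex"))
task.executions.append(editor)

# The provider process is replaced; task and execution ids stay the same.
task_id, execution_id = task.id, editor.id
old_runtime_id = editor.runtime.id
editor.runtime = RuntimeSession(provider="codex")
```

In this sketch only the innermost id changes when a process is replaced, which is what lets the assistant keep talking about "the routing task" across restarts.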

Why This Matters In Real Use

The most annoying bugs came from follow-ups.

When I typed:

make the button green

I did not mean "start an unrelated new job." I meant "continue the last task with the same context."

When I typed:

use cc to review it too

I did not mean "replace the current agent." I meant "spawn a second execution under the same task, with a reviewer role."

Those two messages look similar if all you have is chat history. They are very different if the system has a task model.
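
A rough sketch of how the two follow-ups above could be routed once a task model exists. The keyword matching is a deliberately naive stand-in (a real system would more plausibly use a model to classify intent), and `route_follow_up`, `PROVIDER_ALIASES`, and the "codex" default are hypothetical names and choices, not CliGate's API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumed shorthand-to-provider mapping, for illustration only.
PROVIDER_ALIASES = {
    "cc": "claude-code",
    "claude code": "claude-code",
    "codex": "codex",
}


@dataclass
class Execution:
    role: str       # "editor" or "reviewer"
    provider: str


@dataclass
class Task:
    title: str
    executions: list[Execution] = field(default_factory=list)


def route_follow_up(message: str, task: Task) -> Execution:
    """Attach a follow-up to the current task instead of starting a new job."""
    match = re.search(r"\buse (cc|claude code|codex) to review\b", message.lower())
    if match:
        # "use cc to review it too": add a second execution, keep the editor.
        reviewer = Execution(role="reviewer",
                             provider=PROVIDER_ALIASES[match.group(1)])
        task.executions.append(reviewer)
        return reviewer

    # "make the button green": continue with the most recent editor execution.
    for execution in reversed(task.executions):
        if execution.role == "editor":
            return execution

    # No editor yet: create one (the default provider is an assumption).
    editor = Execution(role="editor", provider="codex")
    task.executions.append(editor)
    return editor
```

Either way the message lands on the same `Task`; the only decision is whether it reuses an execution or adds one.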

Once the assistant can distinguish task identity from execution identity, a few things become much easier:

  • status questions can be answered from task state
  • provider preference can follow the work instead of the channel
  • a dead runtime can be replaced without pretending the task is new
  • multiple agents can collaborate without sharing one messy transcript
  • web UI and mobile channels can show different levels of detail

That last point surprised me. On mobile, I want a short answer: "Codex is waiting for approval." In the web UI, I may want the full timeline: user message, assistant decision, runtime event, command output, file changes, approval, result.

Same task. Different presentation.
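
One way to picture "same task, different presentation" is a single task record rendered per channel. This is only a sketch under assumed names (`TimelineEvent`, `render`, and the channel strings are hypothetical); the event kinds mirror the timeline listed above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TimelineEvent:
    kind: str      # e.g. "user_message", "command_output", "approval"
    summary: str


@dataclass
class Task:
    title: str
    status: str                                   # short, human-readable state
    timeline: list[TimelineEvent] = field(default_factory=list)


def render(task: Task, channel: str) -> str:
    """Render one task differently depending on where it is shown."""
    if channel == "mobile":
        # Mobile channels get the one-line status only.
        return task.status
    # Anything else (e.g. the web UI) gets the full timeline.
    lines = [f"{task.title}: {task.status}"]
    lines += [f"  [{event.kind}] {event.summary}" for event in task.timeline]
    return "\n".join(lines)


task = Task(
    title="fix routing",
    status="Codex is waiting for approval.",
    timeline=[
        TimelineEvent("user_message", "continue the routing task"),
        TimelineEvent("approval", "pending: run test suite"),
    ],
)
mobile_view = render(task, "mobile")
web_view = render(task, "web")
```

Both views read from the same task state, so the short mobile answer never drifts from what the dashboard shows.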

The Rule I Wish I Had Started With

If the user can reasonably ask "what happened with that thing?", that thing deserves an identity outside the chat transcript.

For my project, that identity became Task.

The runtime session is still useful. It preserves provider context and lets the agent resume efficiently. But it should not be the thing the product uses to understand the user's work.

Sessions are implementation details. Work is the product surface.

What Changed

I am still iterating on the architecture, but the direction already cleaned up several design decisions:

  • follow-ups route to tasks, not just the latest session
  • retries can keep the same task identity
  • reviewer agents can attach to the same task as editor agents
  • approvals can be remembered at task or project scope
  • channel messages can stay short without losing full traceability in the dashboard

This also made failure handling less awkward. If a runtime dies, the assistant does not need to tell the user "your session is gone, please start over." It can start a new runtime under the same execution or create a fresh execution under the same task, depending on what actually failed.

That is a small implementation detail with a large UX effect.
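
The recovery choice described above can be sketched as a small function. The two failure kinds and the `recover` helper are assumptions made for this example, not the actual CliGate logic; the only claim is the shape: a dead runtime is replaced in place, a failed attempt gets a sibling execution, and the task id never changes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from uuid import uuid4


class Failure(Enum):
    RUNTIME_DIED = "runtime_died"          # process gone, attempt still valid
    EXECUTION_FAILED = "execution_failed"  # the attempt itself is unrecoverable


@dataclass
class RuntimeSession:
    provider: str
    id: str = field(default_factory=lambda: uuid4().hex)


@dataclass
class Execution:
    role: str
    provider: str
    runtime: RuntimeSession
    id: str = field(default_factory=lambda: uuid4().hex)


@dataclass
class Task:
    title: str
    executions: list[Execution] = field(default_factory=list)
    id: str = field(default_factory=lambda: uuid4().hex)


def recover(task: Task, execution: Execution, failure: Failure) -> Execution:
    """Pick the smallest restart that matches what actually failed."""
    if failure is Failure.RUNTIME_DIED:
        # Same execution, new runtime underneath it.
        execution.runtime = RuntimeSession(provider=execution.provider)
        return execution
    # Fresh execution with the same role, still under the same task.
    replacement = Execution(
        role=execution.role,
        provider=execution.provider,
        runtime=RuntimeSession(provider=execution.provider),
    )
    task.executions.append(replacement)
    return replacement
```

In both branches the user keeps asking about the same task, and the history of earlier executions stays attached to it.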

The Takeaway

I used to think agent memory meant better summaries of previous messages.

Now I think the more important question is: what are you summarizing into?

If everything collapses back into a conversation, the assistant will eventually lose the shape of the work. If the product has explicit projects, tasks, executions, and runtime sessions, the agent has somewhere stable to put its memory.

That has become one of the design principles behind CliGate.

If you are building coding agents, how are you modeling the difference between a conversation, a task, and a runtime session?
