What should an AI coding agent learn after a failed run?

Max Baluev — Sat, 13 Jun 2026 04:32:48 +0000

I am building AccInt (https://accint.xyz/), a local Work Model for agent-run work. The product is early, but the technical question is broader than one tool:

When an AI coding agent fails, what exactly should be learned?

Most agent-memory discussions stop at storing more context. That helps recall, but it does not answer the harder engineering question: which context, action, check, or decision actually helped a future run land?

The unit I am testing is a settled commitment:

What did the agent think it was going to do?
Which files, docs, traces, or prior runs did it retrieve?
What action did it take?
What needed human approval?
What did tests, reviewers, or production reality say afterward?
Which pieces should get stronger next time, and which should be penalized?
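
To make the questions above concrete, here is a minimal sketch of what one settled commitment record could look like. It is not AccInt's actual schema: the class name `SettledCommitment`, every field name, and the example values are hypothetical and chosen only to mirror the six questions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SettledCommitment:
    """One unit of agent work, recorded before and after the outcome is known.

    Hypothetical structure for illustration; all field names are assumptions.
    """
    intent: str                      # what the agent thought it was going to do
    retrieved: list[str]             # files, docs, traces, or prior runs it pulled in
    action: str                      # what it actually did
    needed_approval: bool = False    # whether a human had to approve the action
    outcome: Optional[bool] = None   # None until tests, review, or production settle it
    evidence: list[str] = field(default_factory=list)  # what settled the outcome

    def settle(self, landed: bool, evidence: list[str]) -> None:
        """Attach the outcome once reality has weighed in."""
        self.outcome = landed
        self.evidence = list(evidence)

    @property
    def is_settled(self) -> bool:
        # A commitment only becomes a learning signal after it has an outcome.
        return self.outcome is not None


# Illustrative values only, not taken from a real run.
commitment = SettledCommitment(
    intent="fix flaky retry test",
    retrieved=["tests/test_retry.py", "docs/backoff.md"],
    action="patched backoff jitter",
    needed_approval=True,
)
commitment.settle(landed=True, evidence=["test suite passed", "reviewer approved"])
```

Keeping `outcome` empty until something external settles it is the point of the sketch: nothing is strengthened or penalized while a commitment is still only a claim.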

For coding agents, this can be grounded in practical signals:

test results
diffs that actually shipped
failed commands and their fixes
reviewer corrections
repeated repo navigation mistakes
whether a future similar task takes fewer steps
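
One way these signals could be folded into a single signed number is sketched below. This is an assumption-heavy illustration, not how AccInt scores runs: `RunSignals`, `outcome_score`, and every weight are hypothetical, and a real system would need to calibrate them against its own repos.

```python
from dataclasses import dataclass


@dataclass
class RunSignals:
    """Practical signals from one agent run (hypothetical field names)."""
    tests_passed: bool
    diff_shipped: bool
    reviewer_corrections: int
    failed_commands: int
    steps: int            # steps this run took
    baseline_steps: int   # steps a prior similar task took, 0 if unknown


def outcome_score(s: RunSignals) -> float:
    """Return a signed score: positive means the run landed, negative means it did not.

    All weights are illustrative placeholders.
    """
    score = 1.0 if s.tests_passed else -1.0
    if s.diff_shipped:
        score += 1.0
    score -= 0.25 * s.reviewer_corrections
    score -= 0.1 * s.failed_commands
    if s.baseline_steps > 0:
        # Reward taking fewer steps than a similar earlier task, penalize more.
        score += (s.baseline_steps - s.steps) / s.baseline_steps
    return score


good_run = RunSignals(True, True, 0, 0, steps=10, baseline_steps=20)
bad_run = RunSignals(False, False, 2, 3, steps=30, baseline_steps=20)
print(outcome_score(good_run), outcome_score(bad_run))
```

A single scalar throws away which signal fired, so in practice it may be better to keep the individual signals alongside the score for later inspection.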

That is the gap I am trying to make concrete with AccInt: not just a memory store, not just a trace viewer, and not just orchestration, but a local learning substrate that turns agent activity into a Work Model, running on hardware you control.

The first wedge is Claude Code / Codex / OpenCode / MCP-style workflows near real repos, because those runs already produce commitments, diffs, tests, and outcomes.

If you use coding agents seriously, I would value feedback:

What evidence would you trust enough to update an agent memory?
What should never be learned automatically?
What would make this safe enough to use on a real codebase?

Early access / context: https://accint.xyz/

AccInt: a Work Model for AI coding agents

Max Baluev — Sat, 13 Jun 2026 00:44:36 +0000

I have been building AccInt, a local work loop for AI coding agents.

The short version: agents do not just need generic memory. They need a Work Model: a record of the context retrieved, decisions made, failed attempts, tests run, and outcomes that proved whether the work actually landed.

That matters because repeated agent work usually fails in the same places:

the right context was not retrieved next time
a past failed attempt was repeated
passing tests were not connected back to the decision that caused them
memory grew, but no one knew which memory earned its keep

AccInt is my attempt at making that feedback loop explicit. It combines three pieces: late-interaction (MaxSim) retrieval over scored tokens, recorded commitments and outcomes, and surprise-gated credit, so useful context gets stronger only when reality validates it.
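
For readers unfamiliar with the terms, here is a rough sketch of the two mechanisms. `maxsim` follows the standard late-interaction form: for each query token, take its best similarity against any document token, then sum. `surprise_gated_update` is only my guess at what surprise-gated credit could mean in its simplest form; the function name, the `lr` and `gate` parameters, and the update rule are assumptions, not AccInt's implementation.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length token embeddings."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: list[list[float]], doc_tokens: list[list[float]]) -> float:
    """Late-interaction score: sum over query tokens of the best-matching doc token."""
    if not doc_tokens:
        return 0.0
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def surprise_gated_update(weight: float, predicted_success: float, landed: bool,
                          lr: float = 0.5, gate: float = 0.2) -> float:
    """Adjust a memory item's weight only when the outcome was surprising.

    predicted_success is the prior belief (0..1) that the run would land.
    If the outcome is within `gate` of that belief, the weight is left alone.
    """
    surprise = (1.0 if landed else 0.0) - predicted_success
    if abs(surprise) < gate:
        return weight  # expected outcome: no credit, no penalty
    return weight + lr * surprise


# Toy 2-dimensional embeddings, for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # matches both query tokens exactly
doc_b = [[1.0, 0.0], [0.5, 0.5]]   # matches the second query token only partially
print(maxsim(query, doc_a), maxsim(query, doc_b))
```

In this toy case `doc_a` outscores `doc_b` because every query token finds an exact match. The gate means a memory item that was expected to help and did help gains nothing, while one that was expected to help and did not takes a penalty.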

I am especially looking for feedback from people using Claude Code, OpenCode, Codex, or building agentic devtools / RAG systems:

Where do your agents repeat the same mistakes?
What evidence should count as useful memory?
What would make a Work Model useful in your workflow?

Early access: https://accint.xyz/

DEV Community: Max Baluev
