I am building AccInt (https://accint.xyz/), a local Work Model for agent-run work. The product is early, but the technical question is broader than one tool:
When an AI coding agent fails, what exactly should be learned?
Most agent-memory discussions stop at storing more context. That helps recall, but it does not answer the harder engineering question: which context, action, check, or decision actually helped a future run land?
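The unit I am testing is a settled commitment: a record of what the agent set out to do and how that plan resolved once outside checks came back. For each one I ask: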
- What did the agent think it was going to do?
- Which files, docs, traces, or prior runs did it retrieve?
- What action did it take?
- What needed human approval?
- What did tests, reviewers, or production reality say after?
- Which pieces should get stronger next time, and which should be penalized?
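To make the unit concrete, here is a minimal sketch of what one settled commitment record could look like. Everything in it is hypothetical: `SettledCommitment`, `Verdict`, and the all-or-nothing `credit` rule are illustrative assumptions, not AccInt's actual schema or learning rule. It only shows the shape of the idea: a commitment counts as settled once every check has a final verdict, and only then do the retrieved pieces get reinforced or penalized.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    # How an outside check resolved the commitment (hypothetical labels).
    PASSED = "passed"
    FAILED = "failed"
    PENDING = "pending"


@dataclass
class SettledCommitment:
    """Hypothetical record of one agent commitment and how it resolved."""
    intent: str                # what the agent said it was going to do
    retrieved: list[str]       # files, docs, traces, or prior runs it pulled in
    actions: list[str]         # commands or edits it actually performed
    needed_approval: list[str] = field(default_factory=list)  # steps a human approved
    checks: dict[str, Verdict] = field(default_factory=dict)  # tests, reviewers, prod

    def is_settled(self) -> bool:
        # Settled only when there is at least one check and none is still pending.
        return bool(self.checks) and all(
            v is not Verdict.PENDING for v in self.checks.values()
        )

    def credit(self) -> dict[str, int]:
        # Naive credit assignment (an assumption): every retrieved item gets +1
        # if all checks passed and -1 if any failed. Unsettled work earns nothing.
        if not self.is_settled():
            return {}
        delta = 1 if all(v is Verdict.PASSED for v in self.checks.values()) else -1
        return {item: delta for item in self.retrieved}


c = SettledCommitment(
    intent="fix flaky date parsing test",
    retrieved=["src/dates.py", "docs/timezones.md"],
    actions=["edit src/dates.py", "pytest tests/test_dates.py"],
    checks={"pytest": Verdict.PASSED, "review": Verdict.PASSED},
)
print(c.credit())  # {'src/dates.py': 1, 'docs/timezones.md': 1}
```

Crediting every retrieved item equally is deliberately crude; the open question in this post is exactly how to replace it with finer-grained attribution.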
For coding agents, this can be grounded in practical signals:
- test results
- diffs that actually shipped
- failed commands and their fixes
- reviewer corrections
- repeated repo navigation mistakes
- whether a similar task later takes fewer steps
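One way to picture how these signals could feed back into memory is sketched below. It is a hedged illustration only: `RunSignals`, `reward`, `update_weights`, the hand-picked weights, and the learning rate are all hypothetical assumptions, not calibrated values and not how AccInt necessarily works. The sketch folds the signals into a single reward clamped to [-1, 1], then nudges the weight of each memory item the run used toward that reward with an exponential-moving-average update.

```python
from dataclasses import dataclass


@dataclass
class RunSignals:
    """Hypothetical bundle of outcome signals collected after one agent run."""
    tests_passed: bool
    diff_shipped: bool
    reviewer_corrections: int  # review comments that forced a change to the diff
    failed_commands: int       # commands that errored before a fix was found
    steps: int                 # steps this run took
    baseline_steps: int        # steps a past similar task took


def reward(s: RunSignals) -> float:
    # Illustrative, uncalibrated weights for each signal.
    r = 0.4 if s.tests_passed else -0.4
    r += 0.3 if s.diff_shipped else 0.0
    r -= 0.1 * min(s.reviewer_corrections, 3)
    r -= 0.05 * min(s.failed_commands, 4)
    if s.baseline_steps > 0:
        # Positive when this run needed fewer steps than the baseline.
        r += 0.2 * (s.baseline_steps - s.steps) / s.baseline_steps
    return max(-1.0, min(1.0, r))


def update_weights(
    weights: dict[str, float], used: list[str], r: float, lr: float = 0.2
) -> dict[str, float]:
    # Move each used memory item's weight a fraction `lr` of the way toward
    # the reward; items the run did not use are left unchanged.
    out = dict(weights)
    for item in used:
        old = out.get(item, 0.0)
        out[item] = old + lr * (r - old)
    return out


signals = RunSignals(
    tests_passed=True, diff_shipped=True, reviewer_corrections=1,
    failed_commands=2, steps=8, baseline_steps=10,
)
r = reward(signals)
weights = update_weights({"docs/timezones.md": 0.0}, ["docs/timezones.md", "src/dates.py"], r)
print(round(r, 2), weights)
```

Capping the per-signal penalties keeps one noisy signal, such as a long review thread, from dominating the update; whether any fixed weighting is trustworthy enough is one of the questions raised at the end of this post.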
That is the gap I am trying to make concrete with AccInt: not just a memory store, not just a trace viewer, and not just orchestration, but a local learning substrate that turns agent activity into a Work Model, running on hardware you control.
The first wedge is Claude Code / Codex / OpenCode / MCP-style workflows running against real repos, because those runs already produce commitments, diffs, tests, and outcomes.
If you use coding agents seriously, I would value feedback:
- What evidence would you trust enough to update an agent memory?
- What should never be learned automatically?
- What would make this safe enough to use on a real codebase?
Early access / context: https://accint.xyz/