<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Max Baluev</title>
    <description>The latest articles on DEV Community by Max Baluev (@max_baluev_4390903d1f3998).</description>
    <link>https://dev.to/max_baluev_4390903d1f3998</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3982031%2F84127fbf-8f95-4696-ab77-7fc18cb65994.png</url>
      <title>DEV Community: Max Baluev</title>
      <link>https://dev.to/max_baluev_4390903d1f3998</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/max_baluev_4390903d1f3998"/>
    <language>en</language>
    <item>
      <title>What should an AI coding agent learn after a failed run?</title>
      <dc:creator>Max Baluev</dc:creator>
      <pubDate>Sat, 13 Jun 2026 04:32:48 +0000</pubDate>
      <link>https://dev.to/max_baluev_4390903d1f3998/what-should-an-ai-coding-agent-learn-after-a-failed-run-5aek</link>
      <guid>https://dev.to/max_baluev_4390903d1f3998/what-should-an-ai-coding-agent-learn-after-a-failed-run-5aek</guid>
      <description>&lt;p&gt;I am building AccInt (&lt;a href="https://accint.xyz/" rel="noopener noreferrer"&gt;https://accint.xyz/&lt;/a&gt;), a local Work Model for agent-run work. The product is early, but the technical question is broader than one tool:&lt;/p&gt;

&lt;p&gt;When an AI coding agent fails, what exactly should be learned?&lt;/p&gt;

&lt;p&gt;Most agent-memory discussions stop at storing more context. That helps recall, but it does not answer the harder engineering question: which context, action, check, or decision actually helped a future run land?&lt;/p&gt;

&lt;p&gt;The unit I am testing is a settled commitment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What did the agent think it was going to do?&lt;/li&gt;
&lt;li&gt;Which files, docs, traces, or prior runs did it retrieve?&lt;/li&gt;
&lt;li&gt;What action did it take?&lt;/li&gt;
&lt;li&gt;What needed human approval?&lt;/li&gt;
&lt;li&gt;What did tests, reviewers, or production reality say afterward?&lt;/li&gt;
&lt;li&gt;Which pieces should get stronger next time, and which should be penalized?&lt;/li&gt;
&lt;/ul&gt;
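
&lt;p&gt;One way to picture those six questions is as a single record per run. The sketch below is only an illustration, not AccInt's actual schema: the class name, every field name, and the rule that work "landed" only when all signals are positive are assumptions made for the example.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class SettledCommitment:
    """One agent run, recorded from intent through outcome (hypothetical schema)."""

    intent: str            # what the agent thought it was going to do
    retrieved: list[str]   # files, docs, traces, or prior runs it retrieved
    action: str            # the action it took
    # Steps that needed human approval.
    needed_approval: list[str] = field(default_factory=list)
    # Verdicts from tests, reviewers, or production, keyed by signal name.
    outcome: dict[str, bool] = field(default_factory=dict)
    # Per-item adjustment for next time: positive strengthens, negative penalizes.
    credit: dict[str, float] = field(default_factory=dict)

    def is_settled(self) -> bool:
        # Assumption: a commitment is settled once at least one outcome signal exists.
        return len(self.outcome) > 0

    def landed(self) -> bool:
        # Assumption: the work landed only if every recorded signal is positive.
        return self.is_settled() and all(self.outcome.values())


# Hypothetical run: tests passed, but the reviewer rejected the change.
run = SettledCommitment(
    intent="fix flaky retry test",
    retrieved=["tests/test_retry.py", "docs/retry.md"],
    action="patched backoff calculation",
)
run.outcome["tests_passed"] = True
run.outcome["reviewer_accepted"] = False
```

&lt;p&gt;In this example the run is settled but did not land, which is the case where the retrieved items would be candidates for a penalty rather than a boost.&lt;/p&gt;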

&lt;p&gt;For coding agents, this can be grounded in practical signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;test results&lt;/li&gt;
&lt;li&gt;diffs that actually shipped&lt;/li&gt;
&lt;li&gt;failed commands and their fixes&lt;/li&gt;
&lt;li&gt;reviewer corrections&lt;/li&gt;
&lt;li&gt;repeated repo navigation mistakes&lt;/li&gt;
&lt;li&gt;whether a future similar task takes fewer steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the gap I am trying to make concrete with AccInt: not just a memory store, not just a trace viewer, and not just orchestration, but a local learning substrate that turns agent activity into a Work Model, running on hardware you control.&lt;/p&gt;

&lt;p&gt;The first wedge is Claude Code / Codex / OpenCode / MCP-style workflows near real repos, because those runs already produce commitments, diffs, tests, and outcomes.&lt;/p&gt;

&lt;p&gt;If you use coding agents seriously, I would value feedback:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What evidence would you trust enough to update an agent's memory?&lt;/li&gt;
&lt;li&gt;What should never be learned automatically?&lt;/li&gt;
&lt;li&gt;What would make this safe enough to use on a real codebase?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Early access / context: &lt;a href="https://accint.xyz/" rel="noopener noreferrer"&gt;https://accint.xyz/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>productivity</category>
    </item>
    <item>
      <title>AccInt: a Work Model for AI coding agents</title>
      <dc:creator>Max Baluev</dc:creator>
      <pubDate>Sat, 13 Jun 2026 00:44:36 +0000</pubDate>
      <link>https://dev.to/max_baluev_4390903d1f3998/accint-a-work-model-for-ai-coding-agents-427e</link>
      <guid>https://dev.to/max_baluev_4390903d1f3998/accint-a-work-model-for-ai-coding-agents-427e</guid>
      <description>&lt;p&gt;I have been building &lt;a href="https://accint.xyz/" rel="noopener noreferrer"&gt;AccInt&lt;/a&gt;, a local work loop for AI coding agents.&lt;/p&gt;

&lt;p&gt;The short version: agents do not just need generic memory. They need a &lt;strong&gt;Work Model&lt;/strong&gt;: a record of the context retrieved, decisions made, failed attempts, tests run, and outcomes that proved whether the work actually landed.&lt;/p&gt;

&lt;p&gt;That matters because repeated agent work usually fails in the same places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the right context was not retrieved next time&lt;/li&gt;
&lt;li&gt;a past failed attempt was repeated&lt;/li&gt;
&lt;li&gt;passing tests were not connected back to the decision that produced them&lt;/li&gt;
&lt;li&gt;memory grew, but no one knew which memory earned its keep&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AccInt is my attempt at making that feedback loop explicit. It combines three pieces: late-interaction (MaxSim) retrieval over scored tokens, a record of commitments and outcomes, and surprise-gated credit, so that useful context gets stronger only when reality validates it.&lt;/p&gt;
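
&lt;p&gt;To make those terms concrete, here is a minimal sketch in plain Python. It is not AccInt's implementation. The MaxSim function follows the usual late-interaction definition (each query token keeps its best-matching document token, and the per-token maxima are summed). The credit rule is purely an assumption for illustration: the function name, the learning rate, and the gate threshold are all hypothetical, and "surprise" is taken here to mean the gap between expected and observed outcome.&lt;/p&gt;

```python
def dot(u: list[float], v: list[float]) -> float:
    """Dot product of two equal-length token embeddings."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query: list[list[float]], doc: list[list[float]]) -> float:
    """Late-interaction score: sum over query tokens of the best doc-token match."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def surprise_gated_update(
    weight: float,
    expected: float,
    outcome: float,
    rate: float = 0.5,   # hypothetical learning rate
    gate: float = 0.2,   # hypothetical surprise threshold
) -> float:
    """Adjust a memory item's weight only when the outcome was surprising.

    Assumption: surprise is outcome minus expectation. Small surprises are
    ignored; large ones strengthen or penalize the weight in proportion.
    """
    surprise = outcome - expected
    if abs(surprise) > gate:
        return weight + rate * surprise
    return weight


# Toy 2-dimensional token embeddings (made up for the example).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]
doc_b = [[0.0, 0.2]]

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)

# Outcome matched expectation closely: the weight is left alone.
unchanged = surprise_gated_update(1.0, expected=0.9, outcome=1.0)
# The run was expected to succeed and failed: the weight is penalized.
penalized = surprise_gated_update(1.0, expected=0.9, outcome=0.0)
```

&lt;p&gt;With these toy vectors, doc_a outscores doc_b, and only the surprising failure changes the weight. A real system would need to decide where the expectation comes from and how credit is split across retrieved items, which this sketch leaves open.&lt;/p&gt;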

&lt;p&gt;I am especially looking for feedback from people using Claude Code, OpenCode, Codex, or building agentic devtools / RAG systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Where do your agents repeat the same mistakes?&lt;/li&gt;
&lt;li&gt;What evidence should count as useful memory?&lt;/li&gt;
&lt;li&gt;What would make a Work Model useful in your workflow?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Early access: &lt;a href="https://accint.xyz/" rel="noopener noreferrer"&gt;https://accint.xyz/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>devops</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
