<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Peter Tamas</title>
    <description>The latest articles on DEV Community by Peter Tamas (@kondvik).</description>
    <link>https://dev.to/kondvik</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3865877%2F6bf2c760-4e66-4d57-99bf-a118885f93d5.jpeg</url>
      <title>DEV Community: Peter Tamas</title>
      <link>https://dev.to/kondvik</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kondvik"/>
    <language>en</language>
    <item>
      <title>Yesterday's "not worth it" is today's quick win</title>
      <dc:creator>Peter Tamas</dc:creator>
      <pubDate>Tue, 28 Apr 2026 09:41:14 +0000</pubDate>
      <link>https://dev.to/kondvik/yesterdays-not-worth-it-is-todays-quick-win-bm0</link>
      <guid>https://dev.to/kondvik/yesterdays-not-worth-it-is-todays-quick-win-bm0</guid>
      <description>&lt;p&gt;Every team has a list of these. The repetitive jobs nobody wants, that everyone agrees should be automated, but never quite make it onto the roadmap. The ones where the right thing to do is build a proper UI, a proper pipeline, a proper anything. But the right thing costs three weeks, and the wrong thing (a human doing it 40 times a month) costs less. So we hire a student. We delegate it to an intern. We rotate it around the team. We grumble.&lt;/p&gt;

&lt;p&gt;I don't think we noticed the moment the math changed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The configuration drawer
&lt;/h2&gt;

&lt;p&gt;We have one of these on a platform we work on at Bobcats Coding. The team needs to produce a configuration for the application based on client input, every time a campaign goes live. The proper solution would be a configuration UI on the admin site, where the client could create the configs themselves through a structured form. We have wanted to build it for a long time. But you know how it goes: the client wants the configs immediately, and other features always end up higher on the list than automating the day-to-day workflow.&lt;/p&gt;

&lt;p&gt;So creating each configuration takes about 15 minutes. Read the client input, figure out the not-quite-trivial mappings to values in our DB, deal with the exceptions (every config has at least one), produce the JSON seed. Fast for an experienced dev. Painful when there are 30 of them in a queue. Familiar?&lt;/p&gt;

&lt;p&gt;We hired a student to do it. Cheapest solution. And for years, that math made sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  The math changed and we almost missed it
&lt;/h2&gt;

&lt;p&gt;Here is what is different now. The cost of "automating" this kind of task collapsed. Not by 10%, not by half. It went from "three weeks of dev time we don't have" to "one focused afternoon with Claude in plan mode."&lt;/p&gt;

&lt;p&gt;That is a different threshold. And once the threshold moves, the list of things worth automating expands. Suddenly the boring 15-minute task is on the table. Suddenly the messy one-off, the script you would never have committed to, the thing you used to delegate to the cheapest pair of hands you could find. All of it is a candidate.&lt;/p&gt;

&lt;p&gt;This is the part I keep coming back to. The mindshift is not "AI writes code now." Plenty of people have written that already. The mindshift is: the set of problems where automation is worth it just got dramatically bigger.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it actually looks like
&lt;/h2&gt;

&lt;p&gt;For our configuration task, we did this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Wrote down the onboarding material. The same thing we would write for a new student joining the team.&lt;/li&gt;
&lt;li&gt;Opened Claude Code in plan mode. Threw in the onboarding doc, a few previous client inputs paired with the hand-made outputs, and the bits of code and documentation that the task touches.&lt;/li&gt;
&lt;li&gt;Asked Claude to turn all of it into a slash command. Asked it to ask back if anything was unclear. (It always has questions. Usually good ones.)&lt;/li&gt;
&lt;li&gt;Specified the input arguments for the command, including a final freeform argument for "any custom instructions for this specific config."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What came out the other side is a &lt;code&gt;/create-configuration&lt;/code&gt; command we can call like a function:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/create-configuration &amp;lt;client input spreadsheet url&amp;gt; &amp;lt;optional additional instructions&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things worth noting. First, Claude wrote a better description of the task than we ever had on paper. The command file is, in effect, a human-readable functional spec for a process that lived in our heads for years. Second, because the output is a JSON object, the command also writes a test that checks the output values against the input spreadsheet. So we got testable automation, not a nice-looking script that fails silently.&lt;/p&gt;
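
&lt;p&gt;For context, a slash command in Claude Code is just a Markdown file under &lt;code&gt;.claude/commands/&lt;/code&gt;, with the process description as its body. Ours is project-specific, but the shape is roughly this (an illustrative sketch, not our actual command file):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
description: Produce a campaign configuration JSON from client input
argument-hint: &amp;lt;client input spreadsheet url&amp;gt; [additional instructions]
---

Read the client input from $1.
Map its fields to values in our DB following the onboarding notes below.
Apply any custom instructions passed in $2.
Produce the JSON seed, then write a test that checks the output values
against the input spreadsheet.

[onboarding material and example input/output pairs follow]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;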

&lt;p&gt;Time per config: from 15 minutes to about 2. Across the volume we run, the command now covers roughly 70% of the work the student was doing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bit I did not expect
&lt;/h2&gt;

&lt;p&gt;I asked the student to automate the remaining 30%. He did. We have another command now, for another use case, written by the person who actually understood the corners and the exceptions.&lt;/p&gt;

&lt;p&gt;He still works with us. He runs the commands every day. But the work he does on top of them is the interesting part now: the cases the commands cannot handle yet, the edges, the new clients, the things that need judgment. He likes his job more. And we get more value out of his time.&lt;/p&gt;

&lt;p&gt;This is the part that does not show up in the AI-replaces-jobs takes. When the threshold moves, the people who used to do the repetitive 70% do not disappear. They move up the stack. The work they do becomes the work that actually needs a human. Which, it turns out, is more interesting work.&lt;/p&gt;

&lt;h2&gt;
  
  
  A mental model
&lt;/h2&gt;

&lt;p&gt;I have started running a quick check on tasks I used to skip past. Roughly:&lt;/p&gt;

&lt;p&gt;If I can onboard a new colleague to a task that they could solve in front of a computer, I can probably onboard Claude to it. Without the forgetting, the typos, the inconsistent output, or the bad day on Tuesday.&lt;/p&gt;

&lt;p&gt;Not just code. Anything that fits the shape of "structured input, defined process, evaluable output." Configurations. Reports. Data cleanup. Migrations. The boring layer of work between the interesting tickets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your homework
&lt;/h2&gt;

&lt;p&gt;Pick one task on your team that nobody wanted to automate because it was not worth it. Then:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Imagine onboarding a new colleague to it. Write that down.&lt;/li&gt;
&lt;li&gt;Collect a few previous executions and any related docs.&lt;/li&gt;
&lt;li&gt;Decide how you would evaluate the output.&lt;/li&gt;
&lt;li&gt;Open Claude in plan mode, throw all of it in, and ask for a command.&lt;/li&gt;
&lt;li&gt;Try it. Iterate. Ship it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you have done this already, I would love to hear what you put on your list. We are running our own at Bobcats Coding, and I suspect we are not done finding things that used to be "not worth it."&lt;/p&gt;

</description>
      <category>automation</category>
      <category>management</category>
      <category>productivity</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>AI Field Notes #004 | Typing is no longer the bottleneck. Thinking is.</title>
      <dc:creator>Peter Tamas</dc:creator>
      <pubDate>Sat, 25 Apr 2026 12:27:10 +0000</pubDate>
      <link>https://dev.to/kondvik/ai-field-notes-004-typing-is-no-longer-the-bottleneck-thinking-is-50fn</link>
      <guid>https://dev.to/kondvik/ai-field-notes-004-typing-is-no-longer-the-bottleneck-thinking-is-50fn</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Software engineer behind this case study: Mark Kővári &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Goal: Test whether agentic coding workflows can produce production-grade, architecturally complex software from a phone. The project: &lt;a href="https://github.com/markkovari/clutter" rel="noopener noreferrer"&gt;Clutter&lt;/a&gt;, a polyglot multi-agent orchestration system (Rust, TypeScript, K8s, NATS, SurrealDB).&lt;/p&gt;

&lt;h2&gt;
  
  
  Highlights
&lt;/h2&gt;

&lt;p&gt;The act of writing code is increasingly something an agent can do for you. What it &lt;em&gt;can't&lt;/em&gt; do is decide what to build, how to structure it, when to test, and where to draw boundaries. As more of the typing gets delegated, everything else carries proportionally more weight: specification, architecture, review, testing strategy, and the judgment calls that hold a system together. This experiment is about that shift as much as it is about walking.&lt;/p&gt;

&lt;p&gt;Every AI coding tool that shipped in the past few months converges on the same interaction surface: prompt, forms, tool-use approval, text output. That's the entire human-in-the-loop contract. None of it requires a desktop.&lt;/p&gt;

&lt;p&gt;So I tested a hypothesis: can someone run a full agentic development workflow from a phone while walking the dog?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flaeuv21ct6azs82jinzg.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flaeuv21ct6azs82jinzg.webp" alt="Map of the dog walking" width="800" height="1005"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The bulk of the work happened across a few sessions in late March (March 26-29, 110 commits), with a follow-up on April 9. The walk shown above was a single 18 km, 4-hour session through Budapest, responsible for the biggest commit spike.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Requirement&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Remote execution environment&lt;/td&gt;
&lt;td&gt;Claude Code on a Mac Mini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Thin client&lt;/td&gt;
&lt;td&gt;Claude iOS app&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input method&lt;/td&gt;
&lt;td&gt;iOS voice dictation + text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nice weather&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Power bank&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dog and a nice view&lt;/td&gt;
&lt;td&gt;Optional but highly recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Claude Code already supports the full agentic loop: file reads, edits, shell commands, tool-use approvals. The process is identical to terminal usage, just accessed via a &lt;a href="https://code.claude.com/docs/en/remote-control" rel="noopener noreferrer"&gt;remote session&lt;/a&gt; on iOS.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built (and why)
&lt;/h2&gt;

&lt;p&gt;The walking experiment created its own problem quickly. I was running multiple concurrent Claude Code sessions, each on a different task, and switching between remote sessions on a phone was painful. I kept losing track of which session was working on what. The friction wasn't the coding, it was the context-switching.&lt;/p&gt;

&lt;p&gt;So I started building &lt;a href="https://github.com/markkovari/clutter" rel="noopener noreferrer"&gt;Clutter&lt;/a&gt;: a multi-agent orchestration system that manages one-shot agents. Describe a task, fire off an isolated agent, get results back through NATS events. About 80% of it was built while walking, and all of it was written AI-native.&lt;/p&gt;

&lt;p&gt;It also exposes an MCP server, so I can create projects and tasks in Clutter directly from a Claude conversation. AI-assisted development building the tool that manages AI-assisted development. During development I was spoon-feeding Clutter its own tasks, and it was creating PRs on its own repo automatically (e.g. &lt;a href="https://github.com/markkovari/clutter/pull/51" rel="noopener noreferrer"&gt;PR #51&lt;/a&gt;, branch &lt;code&gt;agent/picur-agent-1&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Funny enough, for a while that's all I was doing: developing Clutter &lt;em&gt;with&lt;/em&gt; itself but never using it on other projects. Someone called that out in a meeting. So I created a project targeting an external repo, added a task, and it completed first try. Sometimes you need someone to point out you've been sharpening the knife without cutting anything.&lt;/p&gt;

&lt;p&gt;The best way to judge whether this workflow produces real output is to look at what it produced:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Commits&lt;/td&gt;
&lt;td&gt;112 (commit activity)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Walking sessions&lt;/td&gt;
&lt;td&gt;Main session: ~18 km, ~4 hours (biggest commit day). Overall: March 26-29 + April 9 follow-up&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Languages&lt;/td&gt;
&lt;td&gt;Rust (76%), TypeScript (20%), Gherkin, Dockerfile, Helm&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rust crates&lt;/td&gt;
&lt;td&gt;6 (control-plane, agent-runner, core, embedder, agent-mcp, mcp-server)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TS/Node packages&lt;/td&gt;
&lt;td&gt;3 (dashboard, shared types, shared UI)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;td&gt;Docker, Kubernetes, Helm, NATS JetStream, SurrealDB, GitHub Actions CI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Documentation&lt;/td&gt;
&lt;td&gt;Architecture docs, ADRs, glossary, orchestration spec, conventions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tests&lt;/td&gt;
&lt;td&gt;BDD feature specs (Gherkin), unit tests, integration tests&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The system is a Rust/Axum control plane with K8s agent isolation, a SurrealDB task queue with atomic claiming, NATS event streaming, a React/Vite real-time dashboard, and a vector embedder for semantic search across agent history. Agents run in isolated namespaces because multiple instances on the same machine would otherwise fight for ports. The point isn't the architecture itself. It's that this level of complexity came out of a phone screen and a pair of walking shoes.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🐈 A group of bobcats is called a "clutter." That's where the name comes from.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The process
&lt;/h2&gt;

&lt;p&gt;Typical cycle: dictate a feature description while walking, review prompt on screen, send. Claude scaffolds the module, pauses for tool-use approval. Review proposed changes, approve or redirect, ask for tests. Another approval round. One feature, three to four approvals, ten minutes of walking.&lt;/p&gt;

&lt;p&gt;What changes without a desk is everything around the core loop. No cmd+click to jump to a definition. No split-screen diff. No grep. All of that gets delegated to the agent: "show me the swarm worker interface," "what imports the NATS subscriber," "run &lt;code&gt;cargo test&lt;/code&gt; and show me failures." The agent becomes the IDE.&lt;/p&gt;

&lt;p&gt;This forces you to work at a higher abstraction level. Instead of navigating files, you describe what you want to see. Instead of reading trait implementations line by line, you ask the agent to summarize. You stay in the intent layer, the agent handles navigation. For a project with this many moving parts, that's arguably the right level anyway.&lt;/p&gt;

&lt;h2&gt;
  
  
  What worked
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ambient development is real.&lt;/strong&gt; The cognitive overhead of the approval loop is low enough that walking actively helps architectural thinking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice-first input&lt;/strong&gt; works for prompt composition. Not perfect, but sufficient.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phone as thin client.&lt;/strong&gt; Functionally equivalent to a laptop for the human-in-the-loop surface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cognitive offloading.&lt;/strong&gt; Moving forces you to reason about structure rather than grep through files. Helps with modularity.&lt;/li&gt;
&lt;li&gt;Genuinely fun.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What didn't work
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Voice-to-text accuracy.&lt;/strong&gt; Mishears technical terms and identifiers. Not continuous like Gemini's voice mode either: dictate, review, send.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No push notifications for agent state.&lt;/strong&gt; Had to keep checking whether Claude was waiting for an approval. A notification on agent yield would change this significantly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No code navigation without the agent.&lt;/strong&gt; Every file lookup costs context window tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session stability.&lt;/strong&gt; Occasional remote session hiccups requiring reopen.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI collision.&lt;/strong&gt; Tool-use approval buttons appearing during typing cause misclicks.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Side effects: what building this way does to you
&lt;/h2&gt;

&lt;p&gt;There's something that doesn't show up in the commit count.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sharper instincts for code boundaries.&lt;/strong&gt; When you can't scroll through files, module boundaries need to be explicit and self-explanatory. You notice when an interface is too wide or a module's responsibility is unclear, because those are the moments you need three follow-up questions instead of one. This trains you to read and reason about code structure faster, whether you're on a phone or not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code organization follows your mental model.&lt;/strong&gt; When navigation is by description ("show me the task lifecycle," "what handles NATS events"), the codebase starts reflecting how you reason about it. Modules get named for what they do, not where they sit in a directory tree. Interfaces get narrower because you want to ask for one thing and get one thing back.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional guardrails harden continuously.&lt;/strong&gt; When an agent writes the code, you stop trusting that things are correct just because they compile. More tests, stricter type boundaries, better CI, more explicit conventions. The BDD specs, the CONVENTIONS.md, the orchestration spec all exist because this workflow surfaces the cost of ambiguity immediately. And this is where the human in the loop actually matters most: every intermediate artifact becomes a quality gate. A PR review isn't just a formality, it's the moment you catch what the agent missed. A rendered UI isn't just a preview, it's verification that intent survived translation. Every checkpoint (a green CI run, a visual diff, a passing smoke test) carries more weight now because the thing that produced the code between checkpoints isn't reasoning the way you would. The quality assurance surface doesn't shrink when agents write code. It grows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I built a polyglot multi-agent orchestration system with 112 commits, BDD specs, architecture docs, and a real-time dashboard, mostly from a phone while walking. The project itself was born from the friction of doing exactly that.&lt;/p&gt;

&lt;p&gt;The limiting factor isn't the device. It's how well you know your architecture and how clearly you can describe intent to the agent.&lt;/p&gt;

&lt;p&gt;As AI coding tools converge on the same agentic loop, the interface becomes thinner. The logical endpoint: the "IDE" is just a notification that your agent needs a decision. Everything else happens in the background.&lt;/p&gt;

&lt;p&gt;If you want to see what I made, check out the repository: &lt;a href="https://github.com/markkovari/clutter" rel="noopener noreferrer"&gt;https://github.com/markkovari/clutter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Highly recommend experimenting with it. Worst case, you go for a nice walk.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>productivity</category>
    </item>
    <item>
      <title>AI Field Notes #003 | When AI Reads Too Much: The Real Price of Complexity</title>
      <dc:creator>Peter Tamas</dc:creator>
      <pubDate>Tue, 14 Apr 2026 16:15:42 +0000</pubDate>
      <link>https://dev.to/kondvik/ai-field-notes-003-when-ai-reads-too-much-the-real-price-of-complexity-2j35</link>
      <guid>https://dev.to/kondvik/ai-field-notes-003-when-ai-reads-too-much-the-real-price-of-complexity-2j35</guid>
      <description>&lt;p&gt;Let’s be honest: reading code is not always as straightforward as we would like. Even experienced developers know that some codebases take more effort to navigate than others. And now, AI has joined the same reality.&lt;/p&gt;

&lt;p&gt;Turns out, when an AI agent walks through a messy codebase, it does not get tired. It gets expensive. Not in time, but in tokens. The more tangled the logic, the more it costs to figure out what is going on. Same confusion, different billing model.&lt;/p&gt;

&lt;p&gt;That is where this tool comes in. Instead of letting packages pile up like an overambitious Jenga tower, it restructures them into a more balanced, layered system. The goal is simple: make the codebase easier to navigate, not just for developers, but for AI agents too.&lt;/p&gt;

&lt;p&gt;Whether you are human or silicon, nobody enjoys digging through chaos. And if we can make code more readable for both, that is not just optimization. It is survival.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://bobcats-coding.notion.site/ai-field-notes-by-bobcats-coding" rel="noopener noreferrer"&gt;https://bobcats-coding.notion.site/ai-field-notes-by-bobcats-coding&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Context awareness for node packages
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Structure node packages so AI agents read less and understand more.&lt;/p&gt;

&lt;p&gt;Specifically: measure how TypeScript monorepo structure affects context window consumption, and build a tool that quantifies the waste and fixes it.&lt;/p&gt;

&lt;p&gt;Repository: &lt;a href="https://github.com/markkovari/context-pnpm" rel="noopener noreferrer"&gt;markkovari/context-pnpm&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Before/After Highlights
&lt;/h2&gt;

&lt;p&gt;When I work on different parts of a codebase with AI assistants, the context window fills up fast. Every file the assistant reads to understand a dependency is loaded in full, including implementation details it will never touch. For a busy utility module, that's thousands of tokens of waste, on every session, across every file that imports it. I kept hitting conversation compacting earlier than expected, and it was slowing me down.&lt;/p&gt;

&lt;p&gt;My theory was that the shape of your modules, how many packages you have, how big they are, how nested, directly influences how many tokens get burned just loading context. But I didn't have numbers. I didn't know the threshold where splitting a module actually pays off versus adding maintenance overhead for no gain.&lt;/p&gt;

&lt;p&gt;So I built a tool to find out.&lt;/p&gt;

&lt;h2&gt;
  
  
  The approach
&lt;/h2&gt;

&lt;p&gt;I wanted to answer a simple question: given a TypeScript codebase, which files are costing you the most tokens per AI session, and is it worth restructuring them?&lt;/p&gt;

&lt;p&gt;The core insight is that file size alone doesn't predict waste. What matters is how much of a file is implementation versus exported API, multiplied by how many files import it. A 10,000-token type declaration file with 98% exports barely registers. A 700-token utility module with a large implementation body, imported by 18 files, costs more than almost anything else.&lt;/p&gt;

&lt;p&gt;I landed on this scoring formula:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;score = (total_tokens − surface_tokens) × importer_count
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Definition&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;total_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Full file token count (tiktoken cl100k_base)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;surface_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Only the exported declarations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;importer_count&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Number of files that import this one&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 If the score is above 60 (the overhead of a &lt;code&gt;package.json&lt;/code&gt; + &lt;code&gt;index.ts&lt;/code&gt; boilerplate), extraction into a separate workspace package is worth it. Below that, leave it alone.&lt;/p&gt;
&lt;/blockquote&gt;
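
&lt;p&gt;To make the rule concrete, here is a minimal TypeScript sketch of the scoring and threshold logic (the &lt;code&gt;FileStats&lt;/code&gt; shape and function names are illustrative, not the tool's actual API):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Illustrative sketch of the scoring rule, not the tool's real API.
const EXTRACTION_THRESHOLD = 60; // approx. package.json + index.ts boilerplate

interface FileStats {
  path: string;
  totalTokens: number;   // full file token count (tiktoken cl100k_base)
  surfaceTokens: number; // exported declarations only
  importerCount: number; // number of files that import this one
}

function extractionScore(file: FileStats): number {
  const hiddenTokens = file.totalTokens - file.surfaceTokens;
  return hiddenTokens * file.importerCount;
}

function isExtractionCandidate(file: FileStats): boolean {
  return extractionScore(file) &amp;gt; EXTRACTION_THRESHOLD;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;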

&lt;h2&gt;
  
  
  The toolchain
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;External packages&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://github.com/openai/tiktoken" rel="noopener noreferrer"&gt;tiktoken&lt;/a&gt; (OpenAI)&lt;/td&gt;
&lt;td&gt;Accurate token counting with cl100k_base encoding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://typescript-eslint.io/packages/typescript-estree/" rel="noopener noreferrer"&gt;typescript-estree&lt;/a&gt; (typescript-eslint)&lt;/td&gt;
&lt;td&gt;ESTree-compatible AST parser to distinguish exported surface from implementation body&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Internal packages&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;analyzer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reads folders via glob pattern, returns total tokens, surface tokens, and importer counts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;estimator&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Projects token savings per AI session from analyzer output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cli&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;User-facing tool: &lt;code&gt;analyze&lt;/code&gt;, &lt;code&gt;estimate&lt;/code&gt;, &lt;code&gt;scaffold&lt;/code&gt;, &lt;code&gt;verify&lt;/code&gt;, &lt;code&gt;rebalance&lt;/code&gt;. Dry-run by default; nothing written without &lt;code&gt;--apply&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;scaffolder&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rewires imports/exports, registers new pnpm workspace packages, generates minimal &lt;code&gt;index.ts&lt;/code&gt; re-export surfaces&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The process: traverse the module tree, tokenize each file, separate surface from implementation via AST analysis, count importers, score everything, and rank by extraction value. The CLI can then scaffold the actual package extraction if the numbers justify it.&lt;/p&gt;
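
&lt;p&gt;As a rough sketch of the first two steps, the two external packages above are enough to get total and surface token counts for a file. This is deliberately simplified: it counts whole top-level export statements as surface, while the real analyzer separates exported signatures from their implementation bodies.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Simplified sketch: whole export statements count as surface here.
import { readFileSync } from "node:fs";
import { get_encoding } from "tiktoken";
import { parse } from "@typescript-eslint/typescript-estree";

const enc = get_encoding("cl100k_base");

function countTokens(text: string): number {
  return enc.encode(text).length;
}

function measureFile(path: string): { totalTokens: number; surfaceTokens: number } {
  const source = readFileSync(path, "utf8");
  const ast = parse(source, { range: true });

  const exportedChunks: string[] = [];
  for (const node of ast.body) {
    if (
      node.type === "ExportNamedDeclaration" ||
      node.type === "ExportDefaultDeclaration" ||
      node.type === "ExportAllDeclaration"
    ) {
      exportedChunks.push(source.slice(node.range[0], node.range[1]));
    }
  }

  return {
    totalTokens: countTokens(source),
    surfaceTokens: countTokens(exportedChunks.join("\n")),
  };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;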

&lt;h2&gt;
  
  
  Benchmarks
&lt;/h2&gt;

&lt;p&gt;During development, I spoon-fed the tool its own internal packages as test cases and added synthetic fixtures for both extremes: a "symmetric" already-optimized codebase and an "asymmetric" monolith with classic shared-utility anti-patterns.&lt;/p&gt;

&lt;p&gt;But the interesting part was running it against real-world open-source monorepos.&lt;/p&gt;

&lt;h3&gt;
  
  
  External package benchmarks
&lt;/h3&gt;

&lt;p&gt;I ran dry-run estimations against three popular TypeScript repositories:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Codebase&lt;/th&gt;
&lt;th&gt;Files&lt;/th&gt;
&lt;th&gt;Candidates&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;th&gt;Tokens saved / session&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://github.com/trpc/trpc" rel="noopener noreferrer"&gt;tRPC&lt;/a&gt; &lt;code&gt;packages/server/src&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;89&lt;/td&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;td&gt;63%&lt;/td&gt;
&lt;td&gt;68,572&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://github.com/tanstack/query" rel="noopener noreferrer"&gt;TanStack Query&lt;/a&gt; &lt;code&gt;packages/query-core/src&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;65%&lt;/td&gt;
&lt;td&gt;34,155&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://github.com/radix-ui/primitives" rel="noopener noreferrer"&gt;Radix UI&lt;/a&gt; &lt;code&gt;packages/&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;131&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;4%&lt;/td&gt;
&lt;td&gt;1,591&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Pricing reference: Claude Sonnet input at $3/1M tokens. The tRPC result means ~$0.21 in unnecessary tokens per session, which adds up across a team over weeks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  tRPC: the "deep internals" anti-pattern
&lt;/h4&gt;

&lt;p&gt;tRPC's &lt;code&gt;unstable-core-do-not-import/&lt;/code&gt; is a textbook case. 56 files fan out to 2-18 consumers each. Every adapter file that an AI session reads drags in the full internals of the procedure builder, router, and streaming infrastructure, even when it only needs one or two types. The top offender, &lt;code&gt;procedureBuilder.ts&lt;/code&gt;, scores 8,260: 4,386 tokens of implementation consumed by 5 importers. After extraction, each consumer would read only a ~200-token surface.&lt;/p&gt;

&lt;h4&gt;
  
  
  TanStack Query: tight coupling in a small graph
&lt;/h4&gt;

&lt;p&gt;31 files, 20 heavily cross-imported. &lt;code&gt;utils.ts&lt;/code&gt; is imported by 17 files and &lt;code&gt;queryClient.ts&lt;/code&gt; by 13. The interesting finding here: &lt;code&gt;types.ts&lt;/code&gt; is the largest file (10,521 tokens) but scores only fifth because 98% of it is surface. &lt;code&gt;utils.ts&lt;/code&gt; scores second despite being half the size, because its implementation body is large relative to what callers use. File size is a bad proxy for waste.&lt;/p&gt;

&lt;h4&gt;
  
  
  Radix UI: the correct negative result
&lt;/h4&gt;

&lt;p&gt;Only 5 candidates from 131 files, all with a single importer. Radix is already decomposed into ~30 packages with 1-5 files each and minimal internal coupling. The tool correctly says "nothing to do." This was an important validation: I needed to confirm it doesn't generate false positives on well-structured code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Synthetic fixtures
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fixture&lt;/th&gt;
&lt;th&gt;Files&lt;/th&gt;
&lt;th&gt;Candidates&lt;/th&gt;
&lt;th&gt;Tokens saved&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;monolith-service&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;9,936&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;db.ts&lt;/code&gt;, &lt;code&gt;logger.ts&lt;/code&gt;, &lt;code&gt;config.ts&lt;/code&gt; each imported by every other module. Most common anti-pattern.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;decomposed-app&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Small focused files, 1-2 consumers each. Correct negative.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What surprised me
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 The biggest finding: &lt;strong&gt;file size doesn't predict waste&lt;/strong&gt; (R² ≈ 0.15). Importer count alone is equally weak. The strongest predictor is hidden tokens (implementation body), but the score is multiplicative (&lt;code&gt;hidden × importers&lt;/code&gt;), so both dimensions matter.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This means you can't eyeball your way to the answer. A module that looks "big" might be mostly type exports and perfectly fine. A module that looks "small" might be silently burning thousands of tokens because it's imported everywhere and its public API is tiny compared to its internals.&lt;/p&gt;

&lt;h2&gt;
  
  
  What worked
&lt;/h2&gt;

&lt;p&gt;The Claude Code hook integration turned out to be the most practical outcome. Wire &lt;code&gt;estimate&lt;/code&gt; as a &lt;code&gt;SessionStart&lt;/code&gt; hook and it automatically surfaces context bloat whenever you open a session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"SessionStart"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx context-pnpm estimate . 2&amp;gt;/dev/null | grep -E 'Total|No extraction'"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the codebase is clean, you see &lt;code&gt;No extraction candidates&lt;/code&gt;. If it has drifted, you see the token savings waiting to be unlocked, before you've written a single line of code. This feedback loop keeps the team aware of structural drift without adding process overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  What didn't work (yet)
&lt;/h2&gt;

&lt;p&gt;The scaffolder, while functional, is the least mature piece. It handles straightforward cases well: generating workspace packages with minimal re-export surfaces and rewriting import paths. But in codebases with circular dependencies or complex re-export chains, the rewiring logic still needs manual intervention. I'm treating this as a "preview" feature while I iterate on edge cases.&lt;/p&gt;

&lt;p&gt;I also initially assumed I could use a simpler heuristic (just file size times importer count) and skip the AST-based surface detection entirely. The Radix UI and TanStack &lt;code&gt;types.ts&lt;/code&gt; results proved that assumption wrong. Without distinguishing surface from implementation, the scoring would have flagged &lt;code&gt;types.ts&lt;/code&gt; as one of the top offenders when it's actually fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current status and next steps
&lt;/h2&gt;

&lt;p&gt;The tool is open source and usable today for the read-only commands (&lt;code&gt;analyze&lt;/code&gt;, &lt;code&gt;estimate&lt;/code&gt;). The mutation commands (&lt;code&gt;scaffold&lt;/code&gt;, &lt;code&gt;rebalance&lt;/code&gt;) work but should be used with review.&lt;/p&gt;

&lt;p&gt;Next steps I'm considering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adding support for JavaScript/JSX alongside TypeScript (partially done)&lt;/li&gt;
&lt;li&gt;Making the scoring engine language-agnostic via &lt;a href="https://tree-sitter.github.io/tree-sitter/" rel="noopener noreferrer"&gt;Tree-sitter&lt;/a&gt;. The core formula is language-independent; I'd only need per-language definitions of "what counts as surface" (Python: &lt;code&gt;__all__&lt;/code&gt;; Go: capitalized identifiers; Rust: &lt;code&gt;pub&lt;/code&gt; items). The &lt;a href="https://github.com/kreuzberg-dev/tree-sitter-language-pack" rel="noopener noreferrer"&gt;tree-sitter-language-pack&lt;/a&gt; bundles 248+ grammars with Rust/Node.js/Python bindings, so the plumbing is there.&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;rebalance&lt;/code&gt; command that identifies merge/split/inline opportunities on existing workspace packages, not just extraction from monoliths&lt;/li&gt;
&lt;li&gt;Better heuristics for the "when to extract" decision, incorporating churn rate from git history alongside the static score&lt;/li&gt;
&lt;li&gt;Integration with CI pipelines so teams get warned when a PR pushes a module past the extraction threshold&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I don't think AI coding assistants will solve the context window problem on their own. Models will get bigger windows, but tokens are never free, and the cost curve is multiplicative with team size. Structuring your code so that AI can read less and understand more is a lever that keeps compounding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Writing code for two audiences
&lt;/h2&gt;

&lt;p&gt;For decades, "clean code" meant code that humans can read and maintain. That's still true, but AI agents are now a second consumer of your codebase. They read your modules, trace your imports, and parse your exports on every session, from scratch, burning tokens the whole time.&lt;/p&gt;

&lt;p&gt;The practices that help humans (small functions, clear separation of concerns) mostly overlap with what helps agents, but not entirely. An agent doesn't care about naming aesthetics. It cares about how many tokens it has to ingest before it can do useful work. A module with a 50-line public API and 2,000 lines of implementation behind it is perfectly clean by human standards, but it's wasteful for an agent that only needs the API.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;context-pnpm&lt;/code&gt; is built around treating AI readability as a first-class design constraint alongside human readability. The two rarely conflict: narrow interfaces, minimal public surface, and well-decomposed modules are good for both. The difference is that now there's a measurable cost when you get it wrong: tokens per session, dollars per month. I think this will quietly become a standard part of how teams think about code architecture, not as a buzzword, but as a practical recognition that your codebase has two kinds of readers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Old principles, new payoff: why SOLID matters for AI readability
&lt;/h2&gt;

&lt;p&gt;Most of what makes code AI-readable isn't new. The Interface Segregation Principle (the "I" in SOLID) says no consumer should depend on methods it doesn't use; that's literally what the scoring formula measures. The Dependency Inversion Principle says depend on abstractions, not implementations; that's what extraction into a minimal re-export surface achieves. Interface-Driven Development (IDD) formalizes this into "design the interface before the implementation." The difference now is that these principles have a measurable second payoff: every unnecessary token you hide behind an interface is a token the agent doesn't burn.&lt;/p&gt;

&lt;h2&gt;
  
  
  Automatic rebalancing
&lt;/h2&gt;

&lt;p&gt;Extraction is a one-time event, but codebases drift. The &lt;code&gt;rebalance&lt;/code&gt; command (in preview) treats the module tree like a self-balancing tree: merge, split, inline, or extract packages as import patterns change. The missing signal is git churn rate, which I'm exploring to avoid suggesting extraction on modules being actively rewritten.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alternatives and similar tools
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Difference from context-pnpm&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://github.com/tach-org/tach" rel="noopener noreferrer"&gt;Tach&lt;/a&gt; (Gauge)&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;Module boundaries, dependency enforcement, strict public interfaces. Written in Rust.&lt;/td&gt;
&lt;td&gt;No token-based scoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://arxiv.org/abs/2603.27277" rel="noopener noreferrer"&gt;Codebase-Memory&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;66 languages&lt;/td&gt;
&lt;td&gt;Tree-sitter knowledge graph, 10x fewer tokens via MCP&lt;/td&gt;
&lt;td&gt;Optimizes retrieval, not structure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/multilang-depends/depends" rel="noopener noreferrer"&gt;Depends&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Java, C/C++, Ruby&lt;/td&gt;
&lt;td&gt;Language-agnostic dependency extraction&lt;/td&gt;
&lt;td&gt;Raw data, no scoring or restructuring&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Interface-Driven Development&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://milan.milanovic.org/post/interface-driven-development/" rel="noopener noreferrer"&gt;IDD overview&lt;/a&gt; (Milanovic, 2022)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spec-Driven Development&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.infoq.com/articles/spec-driven-development/" rel="noopener noreferrer"&gt;Spec Driven Development&lt;/a&gt; (InfoQ, 2026)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Interface-based programming&lt;/td&gt;
&lt;td&gt;&lt;a href="https://en.wikipedia.org/wiki/Interface-based_programming" rel="noopener noreferrer"&gt;Wikipedia&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependency graphs at scale&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.hudsonrivertrading.com/hrtbeat/dependency-graph-python-codebase/" rel="noopener noreferrer"&gt;Building a Dependency Graph&lt;/a&gt; (HRT, 2025)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependency graph management&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.tweag.io/blog/2025-09-18-managing-dependency-graph/" rel="noopener noreferrer"&gt;Managing dependency graph in a large codebase&lt;/a&gt; (Tweag, 2025)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context engineering&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://martinfowler.com/articles/exploring-gen-ai/context-engineering-coding-agents.html" rel="noopener noreferrer"&gt;Context Engineering for Coding Agents&lt;/a&gt; (Fowler, 2026)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token optimization research&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://arxiv.org/abs/2603.27277" rel="noopener noreferrer"&gt;Codebase-Memory&lt;/a&gt; (arXiv, 2026)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context strategies&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.faros.ai/blog/context-engineering-for-developers" rel="noopener noreferrer"&gt;Context Engineering for Developers&lt;/a&gt; (Faros AI, 2025)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>codequality</category>
    </item>
    <item>
      <title>AI FIELD NOTES #002 – Weekly memos for Engineering Leaders</title>
      <dc:creator>Peter Tamas</dc:creator>
      <pubDate>Tue, 07 Apr 2026 13:12:03 +0000</pubDate>
      <link>https://dev.to/kondvik/ai-field-notes-002-weekly-memos-for-engineering-leaders-1o52</link>
      <guid>https://dev.to/kondvik/ai-field-notes-002-weekly-memos-for-engineering-leaders-1o52</guid>
      <description>&lt;p&gt;You are a software engineer, so you know that feeling. You are deep in dependency hell, reading library docs, digging through version histories, staring at compatibility matrices. Running tools to detect transitive version conflicts.&lt;/p&gt;

&lt;p&gt;I got tired of it, too. So I asked Opus 4.6 to build a version checker hook and let AI deal with this problem instead of me. It turned out to be one of the most quietly impactful things I've added to my workflow.&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://www.bobcatscoding.com/ai-field-notes" rel="noopener noreferrer"&gt;Bobcats Coding&lt;/a&gt;, we've deliberately built AI into our delivery system: how we specify, how we build, how we test, and how we learn. Of course, if you are in the middle of a large legacy codebase with years of untouched dependencies, your mileage may vary, but that's a story for another time. :)&lt;/p&gt;

&lt;h2&gt;
  
  
  The core issue
&lt;/h2&gt;

&lt;p&gt;One of my recurring frustrations with AI-generated code was debugging problems caused by poorly selected dependency versions. When you let AI add a dependency to a project, it usually does &lt;strong&gt;not&lt;/strong&gt; install the latest version of the library. Instead, it often selects an artifact that is &lt;strong&gt;a few major versions behind&lt;/strong&gt; the latest release. This behavior caused two types of issues for me repeatedly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Usage of already deprecated functions&lt;/li&gt;
&lt;li&gt;Version mismatch bugs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you don’t have proper &lt;strong&gt;E2E tests&lt;/strong&gt; in the project, version mismatch bugs can be very difficult to detect. The feature simply doesn’t behave as expected, usually without any error messages, which makes debugging extremely frustrating.&lt;/p&gt;

&lt;p&gt;After running into this situation multiple times across several projects, I asked &lt;strong&gt;Opus 4.6&lt;/strong&gt; to create a &lt;a href="https://www.notion.so/Version-checker-hook-Claude-Code-31b1c06aab6e809496c2de877f9b77ab?pvs=21" rel="noopener noreferrer"&gt;version-checking script&lt;/a&gt; and &lt;a href="https://www.notion.so/Version-checker-hook-Claude-Code-31b1c06aab6e809496c2de877f9b77ab?pvs=21" rel="noopener noreferrer"&gt;integrated it into a PostToolUse hook&lt;/a&gt; that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;checks whether all project dependencies are up to date (similar to Dependabot)&lt;/li&gt;
&lt;li&gt;detects version mismatch issues&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Since I started using this hook, I’ve never had to struggle with dependency versions again.
&lt;/h3&gt;

&lt;p&gt;When AI adds a new dependency for a feature, even if I miss an E2E test for a particular scenario, this hook saves me a lot of time by catching hidden issues caused by version mismatches.&lt;/p&gt;

&lt;p&gt;After every change in my dependencies, it runs version checks and &lt;strong&gt;automatically iterates on dependency versions until they align and all tests pass&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix
&lt;/h2&gt;

&lt;p&gt;Ask AI to create a &lt;strong&gt;version-checker PostToolUse&lt;/strong&gt; hook in your projects, one that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shows a warning when it detects a newer version of a dependency&lt;/li&gt;
&lt;li&gt;checks for version mismatch issues and resolves them when found&lt;/li&gt;
&lt;li&gt;detects transitive dependency conflicts and resolves them automatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Best practices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Update dependencies in a &lt;strong&gt;separate commit or PR&lt;/strong&gt; when warnings appear to keep the project up to date.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;TDD with BDD-style E2E tests&lt;/strong&gt; (only partially related, but still helpful for catching subtle runtime issues).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resulting Workflow
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;AI adds dependencies during feature implementation&lt;/li&gt;
&lt;li&gt;The PostToolUse hook detects version issues&lt;/li&gt;
&lt;li&gt;The hook automatically aligns dependency versions&lt;/li&gt;
&lt;li&gt;Lint, type checks, and tests verify correctness&lt;/li&gt;
&lt;li&gt;Only stable commits enter the repository&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated dependency version assurance&lt;/li&gt;
&lt;li&gt;Reduced debugging time&lt;/li&gt;
&lt;li&gt;Safer AI-generated code&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;In my full-stack TypeScript projects, the hook works surprisingly well, even when updated major versions were not yet compatible with each other. The project's feedback loops recognized this, automatically iterated through minor versions while reading issues and forums, and even came up with a small, effective temporary patch that resolved the incompatibility.&lt;/p&gt;

&lt;p&gt;The bottom line: connecting a version-checker pre-commit hook to all your projects isn't optional when you're working with AI-generated code. It's a mandatory feedback loop for maintaining developer productivity, and it has saved me a lot of nerves. :)&lt;/p&gt;

&lt;h2&gt;
  
  
  Want to try it? Here is my implementation example
&lt;/h2&gt;

&lt;p&gt;In my TypeScript projects, I implemented this workflow with a small script:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.notion.so/bobcats-coding/Version-checker-hook-Claude-Code-31b1c06aab6e809496c2de877f9b77ab?source=copy_link#31c1c06aab6e804e9c2efe8509462cee" rel="noopener noreferrer"&gt;check-versions.ts&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It performs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dependency outdated checks&lt;/li&gt;
&lt;li&gt;peer dependency mismatch detection&lt;/li&gt;
&lt;li&gt;transitive conflict detection&lt;/li&gt;
&lt;li&gt;optional automatic resolution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;bun scripts/check-versions.ts              #&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;All checks &lt;span class="o"&gt;(&lt;/span&gt;pre-commit&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;bun scripts/check-versions.ts --mismatch   #&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Mismatch only &lt;span class="o"&gt;(&lt;/span&gt;fast&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;bun scripts/check-versions.ts --fix        #&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Auto-resolve mismatches + transitive conflicts
&lt;span class="gp"&gt;bun scripts/check-versions.ts --json       #&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;JSON output &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;Claude hook&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows different checks depending on the stage of the workflow.&lt;/p&gt;

&lt;p&gt;For example, the &lt;strong&gt;fast mismatch-only check&lt;/strong&gt; is ideal for pre-commit hooks.&lt;/p&gt;
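
&lt;p&gt;Wiring that into git is a one-liner (a sketch; it assumes the same script path used above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/usr/bin/env bash
# .git/hooks/pre-commit (sketch): block the commit on version mismatches
bun scripts/check-versions.ts --mismatch || exit 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;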

&lt;p&gt;An example output of the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bun scripts/check-versions.ts 2&amp;gt;&amp;amp;1)
  ⎿  ⚠ OUTDATED PACKAGES:
       @vitejs/plugin-react           4.7.0 → 5.1.4        ▲ major
       vite                           6.4.1 → 7.3.1        ▲ major
       @colyseus/schema               2.0.37 → 4.0.17      ▲ major
       colyseus.js                    0.15.28 → 0.16.22    ▲ minor
       @colyseus/ws-transport         0.15.3 → 0.17.9      ▲ minor
       colyseus                       0.15.57 → 0.17.8     ▲ minor
       @colyseus/testing              0.15.4 → 0.17.11     ▲ minor
       @biomejs/biome                 2.4.4 → 2.4.6        ▲ patch
       @storybook/react               10.2.13 → 10.2.16    ▲ patch
       @storybook/react-vite          10.2.13 → 10.2.16    ▲ patch
       storybook                      10.2.13 → 10.2.16    ▲ patch

     ✓ No version mismatches or transitive conflicts.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The &lt;code&gt;check-versions.sh&lt;/code&gt; hook
&lt;/h2&gt;

&lt;p&gt;In the &lt;code&gt;.claude/hooks&lt;/code&gt; folder, I created the &lt;code&gt;check-versions.sh&lt;/code&gt; script, where I run the &lt;code&gt;check-versions.ts&lt;/code&gt; script only when a package.json file is modified.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="c"&gt;# Version checker hook — runs after PostToolUse on Edit/Write.&lt;/span&gt;
&lt;span class="c"&gt;# Only fires when the modified file is a package.json.&lt;/span&gt;
&lt;span class="c"&gt;# Runs mismatch check and injects result into Claude's context.&lt;/span&gt;

&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="nv"&gt;INPUT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;ROOT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/Users/kond/kondfox/isuperhero-claude"&lt;/span&gt;

&lt;span class="c"&gt;# Extract the file path from the hook payload&lt;/span&gt;
&lt;span class="nv"&gt;TOOL_INPUT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$INPUT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.tool_input // empty'&lt;/span&gt; 2&amp;gt;/dev/null &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;FILE_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TOOL_INPUT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.file_path // empty'&lt;/span&gt; 2&amp;gt;/dev/null &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Only run when a package.json was modified&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FILE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FILE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="s2"&gt;"package.json"&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Run mismatch check only (fast, no network call)&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ROOT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;RESULT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;~/.bun/bin/bun scripts/check-versions.ts &lt;span class="nt"&gt;--mismatch&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt; 2&amp;gt;&amp;amp;1 &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RESULT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
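

&lt;p&gt;For reference, this is roughly the shape of the JSON payload the hook reads from stdin. The exact payload carries more fields; the script only relies on &lt;code&gt;tool_input.file_path&lt;/code&gt;, so treat everything else here as illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Approximate shape of the PostToolUse stdin payload (illustrative).
// The hook above only depends on tool_input.file_path.
type PostToolUsePayload = {
  tool_name: string;   // e.g. "Edit" or "Write"
  tool_input: {
    file_path: string; // the file the tool just modified
    // ...other tool-specific fields
  };
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;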



&lt;h2&gt;
  
  
  &lt;code&gt;PostToolUse&lt;/code&gt; hook integration
&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;.claude/settings.json&lt;/code&gt;, I defined the &lt;code&gt;PostToolUse&lt;/code&gt; hook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PostToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Edit|Write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/Users/kond/kondfox/isuperhero-claude/.claude/hooks/check-versions.sh"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pre-commit Integration (Husky)
&lt;/h2&gt;

&lt;p&gt;The script runs automatically before every commit via Husky.&lt;/p&gt;

&lt;p&gt;Example pre-commit hook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env sh&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git rev-parse &lt;span class="nt"&gt;--show-toplevel&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

~/.bun/bin/bun scripts/check-versions.ts &lt;span class="nt"&gt;--fix&lt;/span&gt;
~/.bun/bin/bun run lint:fix
~/.bun/bin/bun run typecheck
~/.bun/bin/bun run &lt;span class="nb"&gt;test&lt;/span&gt;
~/.bun/bin/bun run &lt;span class="nb"&gt;test&lt;/span&gt;:e2e
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures that, before any commit enters the repository:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dependency versions are aligned&lt;/li&gt;
&lt;li&gt;lint errors are fixed&lt;/li&gt;
&lt;li&gt;types compile&lt;/li&gt;
&lt;li&gt;unit tests pass&lt;/li&gt;
&lt;li&gt;E2E tests pass&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;This creates a &lt;strong&gt;tight feedback loop for AI-generated code&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can see a full working example here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kondfox/isuperhero-claude" rel="noopener noreferrer"&gt;https://github.com/kondfox/isuperhero-claude&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The repository contains the full &lt;code&gt;check-versions.ts&lt;/code&gt; implementation and the Husky integration used in production.&lt;/p&gt;
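
&lt;p&gt;If you only want the core idea, here is a minimal sketch of what a mismatch check can look like. This is a simplified, hypothetical version, not the script from the repo; the real one also implements the &lt;code&gt;--fix&lt;/code&gt;, &lt;code&gt;--mismatch&lt;/code&gt;, and &lt;code&gt;--json&lt;/code&gt; flags you saw above, plus the registry lookup for newer versions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical, simplified mismatch check (Bun + TypeScript).
// Collect every declared version of every dependency across the
// workspace's package.json files, then report dependencies that are
// declared with more than one version.
const glob = new Bun.Glob("**/package.json");
const seen = new Map(); // dependency name -&amp;gt; Map(version -&amp;gt; files)

for (const file of glob.scanSync(".")) {
  if (file.includes("node_modules")) continue;
  const pkg = await Bun.file(file).json();
  for (const deps of [pkg.dependencies, pkg.devDependencies]) {
    for (const [name, version] of Object.entries(deps ?? {})) {
      const byVersion = seen.get(name) ?? new Map();
      byVersion.set(version, [...(byVersion.get(version) ?? []), file]);
      seen.set(name, byVersion);
    }
  }
}

let mismatches = 0;
for (const [name, byVersion] of seen) {
  if (byVersion.size &amp;gt; 1) {
    mismatches += 1;
    console.log(`✗ ${name}: ${[...byVersion.keys()].join(" vs ")}`);
  }
}
process.exit(mismatches &amp;gt; 0 ? 1 : 0);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
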

&lt;h2&gt;
  
  
  Legacy project?
&lt;/h2&gt;

&lt;p&gt;I have doubts about how good an idea it is to integrate the version-checker feedback loop into a legacy project where the dependencies are far behind the current versions. But I’m going to give it a try in such a project shortly and share the results with you.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>productivity</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>AI Field Notes #001 | Is AI frontend development finally getting good? Our Opus 4.6 test says yes. (And no.)</title>
      <dc:creator>Peter Tamas</dc:creator>
      <pubDate>Tue, 07 Apr 2026 13:00:00 +0000</pubDate>
      <link>https://dev.to/kondvik/ai-field-notes-001-is-ai-frontend-development-finally-getting-good-our-opus-46-test-says-yes-5c0a</link>
      <guid>https://dev.to/kondvik/ai-field-notes-001-is-ai-frontend-development-finally-getting-good-our-opus-46-test-says-yes-5c0a</guid>
      <description>&lt;p&gt;In December 2025, &lt;a href="https://www.notion.so/bobcats-coding/UI-component-development-with-Windsurf-Gemini-3-Pro-Figma-MCP-Playwright-MCP-2c31c06aab6e80b8b570ecec267b2499" rel="noopener noreferrer"&gt;I wrote about trying to build a full page with AI&lt;/a&gt; with a much smaller scope, and it didn’t go well.&lt;/p&gt;

&lt;p&gt;At that time, my conclusion was that while implementing a simple, small UI component with AI and Figma MCP worked quite well, it was surprising how badly it handled the implementation of a full page. The small UI component generation wasn't perfect either: I could get a ~90% "close enough" output that I could quickly align to the requirements by hand. But when I asked AI to implement a simple login page that contained only already-existing components, even with Figma MCP, the result was disappointing. The layout was far from the design, and it hallucinated elements that weren't in the design at all. No matter how I prompted, it just produced different hallucinations. I really don't understand this, because Figma MCP provides a structured description of the design. In the end, I spent much more time experimenting with AI than the few minutes it would have taken to puzzle the components into place myself.&lt;/p&gt;

&lt;p&gt;My current experience is still not flawless, but I'm amazed by the improvement in this area over the past 3 months. &lt;strong&gt;I managed to implement a whole complex page, with existing and new components, that I had estimated at 48 hours, in just 8 hours.&lt;/strong&gt; Not in one iteration, not 100% AI-generated, not without refactoring and human code reviews, but the velocity is impressive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Some Context on the Comparison
&lt;/h2&gt;

&lt;p&gt;After having satisfying experiences with Opus 4.6 UI component implementation, I was eager to retry a full-page AI implementation experiment. When you don't have a strict specification, it's easy to vibe-code a fair-looking result, but it's hard to evaluate how well the output matches the client's needs. That's why I chose a project where we had clear requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Figma designs that we need to implement&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;An OpenAPI specification of the backend API&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are strict, structural anchors that provide clear and easy verification of the result.&lt;/p&gt;

&lt;p&gt;The state of the project when I ran my experiment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The project was already "Claude-ready". It had a well-set-up, project-specific &lt;code&gt;CLAUDE.md&lt;/code&gt; that my colleagues had been using for months.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;We already had an API client, but none of the endpoints that this page uses were defined in it yet.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The site layout, design system, and some of the UI components that the page needed were already in place, but the design also contained new, complex UI components.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unfortunately, this was a client project we built at &lt;a href="https://www.bobcatscoding.com/ai-field-notes" rel="noopener noreferrer"&gt;Bobcats Coding&lt;/a&gt;, so screenshots, product details, and the repository stay private. But I'm going to write about everything else.&lt;/p&gt;

&lt;h2&gt;
  
  
  The first iteration
&lt;/h2&gt;

&lt;p&gt;The better you specify, the better the outcome you can expect. This isn't a new idea; it was true before AI coding as well. But with agentic engineering, specification is the new code.&lt;/p&gt;

&lt;p&gt;So I spent ~1 hour specifying the task as my initial prompt. I gave general context about the page we were building, and linked the Figma design of the whole page and of each component, one by one. I gave a clear specification for each element of the page: which API endpoint it gets its data from, what it represents, how it should work. I specified all the page actions as well: what should happen when a button is clicked, when a dropdown element is selected, and so on. I also instructed Claude to generate every new UI component in a reusable way within our UI library, test them, and provide Storybook stories and documentation.&lt;/p&gt;
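
&lt;p&gt;Reconstructed from memory (the real prompt is client-specific and stays private), the structure of that initial prompt looked roughly like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Context: what the page is for and where it lives in the app
Design: Figma link to the full page + a link per component
For each element of the page:
  - data source (which API endpoint, per the OpenAPI spec)
  - what it represents
  - behavior (clicks, selections, loading and error states)
Page actions: what each button and dropdown does
Constraints: new UI components go into the UI library, reusable,
             with tests, Storybook stories, and documentation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
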

&lt;p&gt;I asked AI to create an implementation plan that multiple agents could work on in parallel (because I was curious how this would work). I required a contract-first approach so that the results of the asynchronously working agents could be integrated at the end.&lt;/p&gt;
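
&lt;p&gt;"Contract-first" here means the shared types and interfaces get written down before the agents start, so each agent codes against the same surface. A minimal sketch of such a contract (the names are hypothetical, since the project is private):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical contract between agents: the agent building the API layer
// implements this interface; the agent composing the page only imports
// and calls it, so both can work in parallel and integrate cleanly.
export type CampaignSummary = {
  id: string;
  name: string;
  status: "draft" | "live" | "closed";
};

export interface CampaignApi {
  listCampaigns(): Promise&amp;lt;CampaignSummary[]&amp;gt;;
  getCampaign(id: string): Promise&amp;lt;CampaignSummary&amp;gt;;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
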

&lt;p&gt;Opus 4.6 worked for 9 minutes to create the plan. It correctly found all the files in the project that it needed to modify and the workspaces and folders where it should create the new files.&lt;/p&gt;

&lt;p&gt;It separated the work into 4 agents with clear responsibilities, tasks, and restrictions:&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent 1: API Layer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Update the API schema&lt;/li&gt;
&lt;li&gt;Generate API types from the schema&lt;/li&gt;
&lt;li&gt;Create and export DTO types&lt;/li&gt;
&lt;li&gt;Add mapping functions (see the sketch after this list)&lt;/li&gt;
&lt;li&gt;Implement new API client methods based on the given interface&lt;/li&gt;
&lt;li&gt;Add unit tests&lt;/li&gt;
&lt;/ul&gt;
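
&lt;p&gt;The "DTO types" and "mapping functions" items translate to a pattern like this. Again, the names are hypothetical and the generated types come from the client's OpenAPI spec, so read it as a sketch of the shape, not the project's code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical DTO + mapper; the real wire types are generated from the
// client's OpenAPI spec (e.g. with openapi-typescript), so every name
// here is a placeholder.
import type { components } from "./api-types"; // generated file, assumed

type CampaignDto = components["schemas"]["Campaign"];

export type Campaign = {
  id: string;
  name: string;
  startsAt: Date;
};

// Mapping function: converts the wire format into a UI-friendly shape.
export const toCampaign = (dto: CampaignDto): Campaign =&amp;gt; ({
  id: dto.id,
  name: dto.name,
  startsAt: new Date(dto.start_date),
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
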

&lt;h3&gt;
  
  
  Agent 2: UI Components
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Implement the discovered new UI components (the plan listed their names, dependencies, and functional descriptions)&lt;/li&gt;
&lt;li&gt;Add unit tests&lt;/li&gt;
&lt;li&gt;Create Storybook stories&lt;/li&gt;
&lt;li&gt;Export the components from the UI library&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Agent 3: Page Composition
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Replace the current placeholder component on the page (server component)&lt;/li&gt;
&lt;li&gt;Call the proper API client methods for data&lt;/li&gt;
&lt;li&gt;Feed the data to the created client component&lt;/li&gt;
&lt;li&gt;Implement the layout and state management of the client component&lt;/li&gt;
&lt;li&gt;Place the required UI components on the page&lt;/li&gt;
&lt;li&gt;Implement the page actions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Agent 4: E2E Tests
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Write the necessary BDD-style E2E tests for the page (the BDD features were also included in the plan for quick human verification)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It created an execution order as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1 (parallel): Agent 1 (API) + Agent 2 (UI Components)
Phase 2 (after Phase 1): Agent 3 (Page Composition)
Phase 3 (after Phase 2): Agent 4 (E2E Tests)

Agents 1 and 2 have zero dependencies and run fully in parallel.
Agent 3 depends on both but can start skeleton code immediately.
Agent 4 runs last as it needs rendered DOM.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The execution of the plan took &lt;strong&gt;29 minutes and 25 seconds&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The result at first glance was a bit odd. It clearly contained all the required elements and showed the data correctly from the API, but the layout was broken, and the component designs were only ~80–90% faithful to the Figma designs. No hallucinations, though!&lt;/p&gt;

&lt;p&gt;All in all, it was not great, not terrible for a first iteration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Refining the design
&lt;/h2&gt;

&lt;p&gt;I asked Claude to use Playwright MCP to verify its result: find the differences from the Figma design and fix them.&lt;/p&gt;

&lt;p&gt;Using Playwright MCP as a feedback loop in frontend development works surprisingly well. Claude opens the page in a browser, takes screenshots, analyzes them, finds the problems, fixes them, verifies the fix with Playwright again, and iterates until it's solved.&lt;/p&gt;
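
&lt;p&gt;Under the hood, this is nothing you couldn't script yourself. A minimal manual equivalent of the screenshot step, with a placeholder URL and output path:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Manual equivalent of the screenshot step in the loop above.
// The URL and the output path are placeholders.
import { chromium } from "playwright";

const browser = await chromium.launch();
const page = await browser.newPage({ viewport: { width: 1440, height: 900 } });
await page.goto("http://localhost:3000");
await page.screenshot({ path: "implementation.png", fullPage: true });
await browser.close();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
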

&lt;p&gt;However, my prompt was too vague, and the use of Figma MCP is still far from perfect, so the result was also disappointing. What worked much better was creating screenshots of both the UI implementation and the expected design, then describing the problems. Most of the design issues could be solved this way.&lt;/p&gt;

&lt;p&gt;Creating a 100% pixel-perfect design is still not something an LLM is capable of.&lt;/p&gt;

&lt;p&gt;You need to recognize the point when the agent gets stuck in a loop, when every iteration just makes the problem different, but you don't get any closer to the solution. That's the point when you need to take the keyboard and finish the job yourself.&lt;/p&gt;

&lt;p&gt;In the case of pixel-perfect design implementation, in my experience with current models and tools, you can usually reach a ~90–95% state with AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Refining the code
&lt;/h2&gt;

&lt;p&gt;Opus 4.6 generates relatively decent code, but most of the time it needs some refactoring. In the case of this experiment, here's what I found during code review:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It didn't create components for all the UI elements it should have. This led to unnecessary duplication that would have been difficult to maintain.&lt;/li&gt;
&lt;li&gt;It didn't always use the tokens from the design system or our SASS mixins (e.g., for typography).&lt;/li&gt;
&lt;li&gt;I found some overcomplicated, mutating logic that could have been written more simply.&lt;/li&gt;
&lt;li&gt;It didn't create some of the components as reusable as I expected.&lt;/li&gt;
&lt;li&gt;It hardcoded some constants that shouldn't have been hardcoded.&lt;/li&gt;
&lt;li&gt;It wasn't forward-thinking enough to extract functionality we could reuse later into a hook or utility function.&lt;/li&gt;
&lt;li&gt;It used far more &lt;code&gt;useMemo&lt;/code&gt; than necessary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When I pointed these issues out to Claude, it could fix them much faster than I would have. With proper permissions, it can even read PR comments from GitHub, so you don't necessarily need to prompt the fixes manually: you can review on GitHub, then give a short "fix my reviews on the PR" instruction.&lt;/p&gt;
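
&lt;p&gt;To make the &lt;code&gt;useMemo&lt;/code&gt; point concrete, this is the kind of pattern I mean (a reconstructed illustration, not code from the project):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;import { useMemo } from "react";

type Props = { firstName: string; lastName: string };

// Before: memoizing a cheap string concatenation buys nothing.
function UserBadgeBefore({ firstName, lastName }: Props) {
  const label = useMemo(() =&amp;gt; `${firstName} ${lastName}`, [firstName, lastName]);
  return &amp;lt;span&amp;gt;{label}&amp;lt;/span&amp;gt;;
}

// After: deriving the value inline is simpler and just as fast here.
function UserBadgeAfter({ firstName, lastName }: Props) {
  return &amp;lt;span&amp;gt;{`${firstName} ${lastName}`}&amp;lt;/span&amp;gt;;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
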

&lt;h2&gt;
  
  
  What didn’t work
&lt;/h2&gt;

&lt;p&gt;Figma MCP still surprisingly underperforms compared to a simple screenshot.&lt;/p&gt;

&lt;p&gt;The multi-agent implementation was fun to try, but resulted in a 3,000+ line PR, which is far from optimal. Next time, after I have the multi-agent implementation plan and the contracts (types, interfaces) in the code, I'd try to solve the task on separate branches using worktrees.&lt;/p&gt;

&lt;h2&gt;
  
  
  What worked
&lt;/h2&gt;

&lt;p&gt;With all the refinements, I could ship the module in ~8 hours instead of the estimated 48 hours. Fully tested and documented. Two things stood out from the start:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Playwright MCP is a MUST in frontend development.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;2. Creating the API schema mapping, types, and API client based on the given OpenAPI specification worked perfectly, even on the first iteration.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I don't think AI frontend development is solved, but for the first time, the velocity feels real. One thing's for sure: I'll keep testing, and I'll keep writing when something interesting happens.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>frontend</category>
      <category>ui</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
