<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Karun Japhet</title>
    <description>The latest articles on DEV Community by Karun Japhet (@javatarz).</description>
    <link>https://dev.to/javatarz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F243178%2Fb6350b6f-468d-4124-88f8-9119b98e01db.jpeg</url>
      <title>DEV Community: Karun Japhet</title>
      <link>https://dev.to/javatarz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/javatarz"/>
    <language>en</language>
    <item>
      <title>The Comfort Plateau AI Built For You</title>
      <dc:creator>Karun Japhet</dc:creator>
      <pubDate>Thu, 21 May 2026 06:02:00 +0000</pubDate>
      <link>https://dev.to/javatarz/the-comfort-plateau-ai-built-for-you-29h</link>
      <guid>https://dev.to/javatarz/the-comfort-plateau-ai-built-for-you-29h</guid>
      <description>&lt;p&gt;&lt;a href="https://karun.me/assets/images/posts/2026-05-21-the-comfort-plateau-ai-built-for-you/cover.png" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxckffh1ay6iwxldqkko0.png" alt="A man stands proudly next to a bicycle in a park, hand on hip, satisfied. The bicycle looks polished and complete but cannot actually be ridden." width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I've spent the last year watching engineers and writers adopt these tools, and the same pattern keeps showing up. The first three months look transformative. The next nine look almost identical to the first three.&lt;/p&gt;

&lt;p&gt;The output keeps coming. The confidence keeps building. The skill stops moving.&lt;/p&gt;

&lt;p&gt;This isn't a story about AI making people dumber. It's a story about a comfort plateau that didn't exist before, and what it costs to stay on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The accessibility breakthrough is real
&lt;/h2&gt;

&lt;p&gt;Draw a spider chart. Hundreds of axes: SQL, contract law, React, radiology, negotiation, Kubernetes, copywriting. Score a human. Most experts look like a star with two or three spikes and a flat field everywhere else. Depth is the point.&lt;/p&gt;

&lt;p&gt;Plot AI on the same chart. It's not a 5 on any axis. But it sits at a 2–4 across nearly everything. No individual has ever had a tool this broadly competent.&lt;/p&gt;

&lt;p&gt;This is why the gains for newcomers are so large. A &lt;a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321" rel="noopener noreferrer"&gt;BCG study&lt;/a&gt; of 758 consultants found below-average performers improved by 43% and above-average performers by 17%. A &lt;a href="https://www.nber.org/papers/w31161" rel="noopener noreferrer"&gt;field study&lt;/a&gt; of 5,179 customer-support agents found novices gained 34% in productivity while the most experienced gained almost nothing. Going from "can't do this" to "shipped a thing" used to take months. Now it takes minutes.&lt;/p&gt;

&lt;p&gt;This part of the story is good. It's also where most writing about AI stops.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bicycle you can't draw
&lt;/h2&gt;

&lt;p&gt;Ask a person to rate their understanding of how a bicycle works. They will rate it high. Ask them to draw one from memory. About &lt;a href="https://link.springer.com/article/10.3758/BF03195929" rel="noopener noreferrer"&gt;half of non-cyclists get the chain wrong&lt;/a&gt;, looping it around both wheels instead of only the back wheel and pedals. Many draw the frame joining the front and back wheels, which would make steering impossible. Asked to re-rate after the drawing, confidence collapses.&lt;/p&gt;

&lt;p&gt;This is the &lt;a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3062901/" rel="noopener noreferrer"&gt;Illusion of Explanatory Depth&lt;/a&gt;. People feel they understand things they can't actually explain. The illusion only breaks when you're forced to &lt;em&gt;produce&lt;/em&gt; the explanation. Without that production step, confidence sits unchallenged forever.&lt;/p&gt;

&lt;p&gt;AI removes the production step.&lt;/p&gt;

&lt;p&gt;You don't draw the bicycle. You ask for one. The pedals are in the right place. The chain runs correctly. You ship it. Nothing collapses. The next time you need a bicycle, you ask again, and the feedback loop that would have built a mental model never closes.&lt;/p&gt;

&lt;p&gt;I'm not certain the bicycle effect maps perfectly onto code or prose. But the mechanism, fluent output without forced explanation, feels right. And there's now direct evidence: an &lt;a href="https://www.anthropic.com/research/AI-assistance-coding-skills" rel="noopener noreferrer"&gt;Anthropic study&lt;/a&gt; measured exactly this in January. Junior developers learning an unfamiliar library with AI assistance scored 17% lower on conceptual understanding and debugging than those who learned without it. The code they shipped looked indistinguishable. The judgment behind it did not develop.&lt;/p&gt;

&lt;h2&gt;
  
  
  "But we had this fear before"
&lt;/h2&gt;

&lt;p&gt;Calculators were supposed to kill arithmetic. Spell-check was going to end literacy. Stack Overflow would breed shallow programmers. Most of those predictions were wrong. A &lt;a href="https://www.jstor.org/stable/749255" rel="noopener noreferrer"&gt;meta-analysis&lt;/a&gt; found calculator use, integrated properly, &lt;em&gt;improved&lt;/em&gt; math skills. Stack Overflow &lt;a href="https://ieee-security.org/TC/SP2016/papers/0824a289.pdf" rel="noopener noreferrer"&gt;made code more functional but significantly less secure&lt;/a&gt;, and yet didn't atrophy programmers broadly. Why would AI be different?&lt;/p&gt;

&lt;p&gt;Because of where it removes the work.&lt;/p&gt;

&lt;p&gt;A calculator removes execution. You still set up the problem. The mental model of "I need to multiply these two things and check the units" stays yours. Stack Overflow removes retrieval. You still adapt the answer, integrate it, decide whether it fits.&lt;/p&gt;

&lt;p&gt;GPS removes generation, the building of the spatial map. A &lt;a href="https://www.nature.com/articles/s41598-020-62877-0" rel="noopener noreferrer"&gt;longitudinal study&lt;/a&gt; tracking GPS users over three years found that heavy users showed declining hippocampal-dependent spatial memory. Not because they were already bad navigators. Because they stopped generating maps.&lt;/p&gt;

&lt;p&gt;AI is like GPS, not like a calculator. It removes the generation step, the building of the mental model. That's the step that triggers the bicycle collapse. Take it away and the collapse never happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  The motivated minority objection
&lt;/h2&gt;

&lt;p&gt;There's a real counter-argument: every tool produces a distribution. The top quartile uses it to accelerate; the median plateaus; nothing changes about who pulls ahead. AI is just the latest example.&lt;/p&gt;

&lt;p&gt;This is half right. The thing that used to make some people more valuable than others, deep knowledge in their specific domain, matters less now. AI has narrowed that gap on the work it can do. The BCG study showed it explicitly: bottom-half consultants gained more than top-half. The expertise gap is closing on AI-shaped tasks.&lt;/p&gt;

&lt;p&gt;But a &lt;a href="https://arxiv.org/html/2605.18143v1" rel="noopener noreferrer"&gt;new axis&lt;/a&gt; is opening. Variance among AI users isn't shrinking; it's growing by about 47% in some measures. The new top quartile isn't the people with the deepest domain knowledge. It's the people who can sit with an AI output and ask: &lt;em&gt;what would I have to draw to know if this is right?&lt;/em&gt; The ones who treat fluency as a warning, not a signal.&lt;/p&gt;

&lt;p&gt;That's a skill the AI cannot give you. By design, it removes the very moment that develops it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The plateau is where the squeeze is
&lt;/h2&gt;

&lt;p&gt;There's an external reason this matters now, not just a personal one. The roles being compressed aren't the ones at the top of any skill chart. They're the ones whose required skill sits at or below what AI already provides. AI raises the floor. Roles at or under the floor get cheaper, fewer, or absorbed into someone else's job. Roles above the floor, where the work requires understanding deep enough to explain and not just outputs polished enough to ship, don't.&lt;/p&gt;

&lt;p&gt;The comfort plateau is the floor. Stay there and the work you're doing is work AI already does. The work AI cannot do for you is the work that will keep belonging to humans.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this asks of you
&lt;/h2&gt;

&lt;p&gt;Try this. Take something you shipped with AI last week. Explain how it works, out loud, without looking. Notice where the explanation gets thin.&lt;/p&gt;

&lt;p&gt;If most of it gets thin, that's the plateau. Not a failure of effort. A failure of friction.&lt;/p&gt;

&lt;p&gt;Five years from now, the largest skill gap in knowledge work won't be between people who use AI and people who don't. It will be between people who let the fluency pass unchallenged and people who insisted on drawing the bicycle themselves.&lt;/p&gt;

&lt;p&gt;The tool will keep working either way. The growth will not.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>career</category>
    </item>
    <item>
      <title>The Judgement Pyramid: Reasoning vs Measurement</title>
      <dc:creator>Karun Japhet</dc:creator>
      <pubDate>Wed, 20 May 2026 06:00:16 +0000</pubDate>
      <link>https://dev.to/javatarz/the-judgement-pyramid-reasoning-vs-measurement-35il</link>
      <guid>https://dev.to/javatarz/the-judgement-pyramid-reasoning-vs-measurement-35il</guid>
      <description>&lt;p&gt;&lt;a href="https://karun.me/assets/images/posts/2026-05-20-the-judgement-pyramid/cover.png" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhadf7g7ps403iinniyj.png" alt="A three-tiered cutaway workshop with a glowing envelope descending between layers" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A team I was talking to had built a code-review skill in Claude Code. It looked good: a careful prompt, a set of rules, and examples. Then it shipped a PR with a nested recursive loop inside another recursive loop. That is the kind of cyclomatic complexity any static analyser (&lt;a href="https://pmd.github.io/" rel="noopener noreferrer"&gt;PMD&lt;/a&gt;, &lt;a href="https://radon.readthedocs.io/" rel="noopener noreferrer"&gt;radon&lt;/a&gt;, &lt;a href="https://eslint.org/docs/latest/rules/complexity" rel="noopener noreferrer"&gt;ESLint's complexity rule&lt;/a&gt;) would have flagged in milliseconds. The skill missed it.&lt;/p&gt;

&lt;p&gt;The team's response was to iterate the skill. More examples. More rules. A few more lines of prompt. The &lt;a href="https://dev.to/javatarz/context-engineering-for-ai-assisted-development-b8i#what-goes-wrong"&gt;context window kept ballooning&lt;/a&gt;. I think the skill wasn't the problem. The placement was. The LLM was being asked to &lt;em&gt;reason&lt;/em&gt; about something a tool can &lt;em&gt;measure&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Judgement Pyramid
&lt;/h2&gt;

&lt;p&gt;Every check in your AI-assisted workflow runs at one of three layers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fkarun.me%2Fassets%2Fimages%2Fposts%2F2026-05-20-the-judgement-pyramid%2Fjudgement-pyramid.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fkarun.me%2Fassets%2Fimages%2Fposts%2F2026-05-20-the-judgement-pyramid%2Fjudgement-pyramid.svg" alt="Three-tier pyramid: deterministic tools at the bottom, LLM judgement in the middle, human judgement at the top. Cost and judgement required both climb upward." width="680" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The metaphor borrows from the &lt;a href="https://martinfowler.com/bliki/TestPyramid.html" rel="noopener noreferrer"&gt;testing pyramid&lt;/a&gt;: a stack ordered by cost and reliability. This one orders by &lt;em&gt;how much judgement the check needs&lt;/em&gt;. The bottom is measurement: questions with a deterministic answer for a given input. The middle is soft judgement: style, design quality, naming, abstraction fit. The top is irreducible human judgement: does this serve the user, the team, the business.&lt;/p&gt;

&lt;p&gt;Cost climbs with judgement too. Not money, necessarily. Cognitive load. Time. Attention. The bottom layer runs in milliseconds and costs nothing. The middle burns tokens and produces a different answer each time you ask. The top costs a person's focus, which is the most finite thing in the system.&lt;/p&gt;

&lt;p&gt;The rule is simple: push every check to the lowest layer that can do it reliably.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reasoning vs measurement
&lt;/h2&gt;

&lt;p&gt;The useful question isn't &lt;em&gt;can the LLM do this&lt;/em&gt;. Usually it can. The question is &lt;em&gt;should it&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;You wouldn't format code by hand. You'd run a formatter. Not because you can't think about indentation, but because thinking about indentation is wasted thinking. A formatter is the right tool because formatting is a measurement, not a judgement.&lt;/p&gt;

&lt;p&gt;Cyclomatic complexity is a measurement. Coverage delta is a measurement. So is dead-code detection. So is whether two functions are near-duplicates, whether the import resolves, whether the test contains an assertion, whether the file ends with a newline. All of these have deterministic answers for a given input.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If a check has a deterministic answer for a given input, it's measurement. If it depends on context, taste, or domain knowledge, it's judgement.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Measurement belongs at the bottom of the pyramid. Judgement belongs higher.&lt;/p&gt;

&lt;h2&gt;
  
  
  Apex creep
&lt;/h2&gt;

&lt;p&gt;Work creeps up the pyramid. The LLM is the most visible tool in the room, so checks gravitate toward it, even when a linter would do the job faster, cheaper, and more reliably. I'll call this &lt;em&gt;apex creep&lt;/em&gt;: the steady drift of work toward the most expensive layer that can technically do it.&lt;/p&gt;

&lt;p&gt;You recognise the symptom. A review skill that keeps growing. A prompt that gains a few lines every sprint. A team that keeps tuning the LLM to catch a class of bug a static analyser would flag for free. Each iteration adds more rules and more examples. The skill works harder. The placement is still wrong.&lt;/p&gt;

&lt;p&gt;Apex creep is a placement bug. The fix is not a smarter LLM. The fix is moving the check to the layer that handles it deterministically.&lt;/p&gt;

&lt;h2&gt;
  
  
  The push-down move
&lt;/h2&gt;

&lt;p&gt;Two questions to ask of every check in your harness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is this reasoning, or measurement?&lt;/strong&gt; If measurement, push it to a deterministic tool. &lt;a href="https://www.sonarsource.com/products/sonarqube/" rel="noopener noreferrer"&gt;Sonar&lt;/a&gt;, &lt;a href="https://github.com/diffplug/spotless" rel="noopener noreferrer"&gt;Spotless&lt;/a&gt;, &lt;a href="https://docs.astral.sh/ruff/" rel="noopener noreferrer"&gt;Ruff&lt;/a&gt;, &lt;a href="https://eslint.org/" rel="noopener noreferrer"&gt;ESLint&lt;/a&gt;, coverage gates, pre-commit hooks, complexity calculators. Write a script if no tool exists. That's how &lt;a href="https://dev.to/javatarz/level-up-code-quality-with-an-ai-assistant-5cdn"&gt;&lt;code&gt;just lint&lt;/code&gt;&lt;/a&gt; got built, and that's &lt;a href="https://dev.to/javatarz/the-unix-philosophy-for-agentic-coding-112p"&gt;the Unix-philosophy move&lt;/a&gt; for agentic coding. Hooks fire on tool calls; CI fires on PRs; pre-commit fires on commit. Pick the cheapest layer that catches the failure and run it there.&lt;/p&gt;

&lt;p&gt;This is &lt;a href="https://en.wikipedia.org/wiki/Shift-left_testing" rel="noopener noreferrer"&gt;shift-left&lt;/a&gt; for AI checks: push verification as early and as cheap as it goes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What goes to the LLM?&lt;/strong&gt; Everything that genuinely needs context, taste, or cross-cutting judgement. REST endpoint shape. Naming. Abstraction fit. Whether the test asserts what the story actually asked for. Whether the change matches the intent of the design. The LLM is good at this kind of work. Once you stop using it as a linter, you've given it room to be good at it.&lt;/p&gt;

&lt;p&gt;What goes to the human is the top of the pyramid: does this serve the goal, and is the LLM's middle-tier work good enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bottom rung isn't free
&lt;/h2&gt;

&lt;p&gt;Some failures hide &lt;em&gt;below&lt;/em&gt; the bottom, in places where the obvious deterministic tool won't catch them.&lt;/p&gt;

&lt;p&gt;A hallucinated import name fails on &lt;code&gt;import&lt;/code&gt; or at the first test run. A &lt;a href="https://snyk.io/articles/slopsquatting-mitigation-strategies/" rel="noopener noreferrer"&gt;slopsquatted&lt;/a&gt; package (a real but malicious package registered under a name that LLMs tend to hallucinate) doesn't: it installs and imports cleanly. Both are LLM-shaped failures. The hallucinated import is caught by the bottom layer you already have. The slopsquatted package isn't.&lt;/p&gt;

&lt;p&gt;That's not an argument for moving the check upward. It's an argument for adding the right deterministic tool to the bottom: lockfiles, allowlists, supply-chain scanners, sandboxed installs. Match the deterministic tool to the failure mode. Don't reach for the LLM because the bottom seems thin.&lt;/p&gt;
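
&lt;p&gt;As one illustration of matching the tool to the failure mode, the sketch below checks requirement lines against an allowlist. Everything in it is an assumption made for the example: the allowlist contents, the function names, and the input being plain &lt;code&gt;requirements.txt&lt;/code&gt;-style lines. It only sees the dependencies named in the file, so it complements lockfiles and supply-chain scanners instead of replacing them.&lt;/p&gt;

```python
import re

# Hypothetical allowlist; in practice this would live in a reviewed file.
ALLOWED = {"requests", "numpy", "pytest"}


def package_name(requirement: str) -> str:
    """Extract the normalised package name from a requirements-style line."""
    # Cut at the first character that cannot be part of a package name,
    # such as a version operator, an extras bracket, or whitespace.
    name = re.split(r"[^A-Za-z0-9._-]", requirement.strip(), maxsplit=1)[0]
    # Treat runs of "-", "_" and "." as equivalent and ignore case.
    return re.sub(r"[-_.]+", "-", name).lower()


def unapproved(requirements: list[str], allowed: set[str] = ALLOWED) -> list[str]:
    """Return package names that are not on the allowlist, in input order."""
    found = []
    for line in requirements:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name = package_name(line)
        if name not in allowed and name not in found:
            found.append(name)
    return found
```

&lt;p&gt;Run as a pre-commit or CI step, a non-empty result from &lt;code&gt;unapproved&lt;/code&gt; blocks the change until a human adds the package to the allowlist, which is exactly the point where a misspelt or hallucinated name gets a second look.&lt;/p&gt;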

&lt;h2&gt;
  
  
  This isn't a new idea
&lt;/h2&gt;

&lt;p&gt;The top and the bottom have always existed. Humans have done judgement work since there's been work. Compilers, type checkers, linters, formatters, test runners, CI pipelines: the deterministic floor under us has been growing for fifty years. Most of what we call "engineering practice" is arguments about where to draw the line: what to automate, what to keep manual, what the cost-benefit looks like.&lt;/p&gt;

&lt;p&gt;Some teams over-rotated. They under-automated and called it craft, paying a cognitive tax forever to avoid investing in a tool once. Others &lt;a href="https://xkcd.com/1319/" rel="noopener noreferrer"&gt;over-rotated the other way&lt;/a&gt; and spent more building the automation than they ever saved.&lt;/p&gt;

&lt;p&gt;The LLM didn't replace either layer. It added a new one in the middle: cheaper than a human, more flexible than a script, but slower and less reliable than a script on anything a script can already check. The pyramid is the same shape it's always been. We just have a new layer to misplace work on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Audit your harness
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openai.com/index/harness-engineering/" rel="noopener noreferrer"&gt;Harness engineering&lt;/a&gt; is the work of 2026. Agents take longer tasks, run for longer horizons, and make more decisions before a human sees the output. The deterministic floor underneath them is the difference between an autonomous workflow you trust and one you babysit. Get placement right and autonomy compounds. Get it wrong and every increment of agent capability costs you more in review.&lt;/p&gt;

&lt;p&gt;So open your review skill. Open your pre-commit hooks. Open the prompts you've written for grooming, testing, ops investigation. Go line by line. For each check, ask: &lt;em&gt;reasoning, or measurement?&lt;/em&gt; If measurement, the LLM doesn't belong on it. Push it down.&lt;/p&gt;

&lt;p&gt;Most teams discover their pyramid is top-heavy not because they planned it that way, but because the LLM was the easiest place to add a rule. Apex creep is what happens when "add a rule" defaults to "add a prompt."&lt;/p&gt;

&lt;p&gt;Push it down.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Multi-Agent Development Workflows with Claude Code</title>
      <dc:creator>Karun Japhet</dc:creator>
      <pubDate>Tue, 19 May 2026 21:31:53 +0000</pubDate>
      <link>https://dev.to/javatarz/multi-agent-development-workflows-with-claude-code-n23</link>
      <guid>https://dev.to/javatarz/multi-agent-development-workflows-with-claude-code-n23</guid>
      <description>&lt;p&gt;Single-agent Claude Code is pair programming. One developer, one task, full attention.&lt;/p&gt;

&lt;p&gt;I've been running three or four agents against a project backlog simultaneously. Not because single-agent broke, but because groomed cards were sitting idle.&lt;/p&gt;

&lt;p&gt;Here's what that looks like in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shift: from writing code to shaping work
&lt;/h2&gt;

&lt;p&gt;When you use Claude Code as a single agent, you're pair programming. That's powerful when you're exploring a problem or designing an approach. But if you have independent cards groomed and ready, you're leaving throughput on the table.&lt;/p&gt;

&lt;p&gt;Your role shifts. Instead of writing code alongside one agent, you shape the work before it starts and judge it when it's done. You groom cards, make design decisions, dispatch work, and review output. The agents write the code. Addy Osmani calls this the &lt;a href="https://addyosmani.com/blog/factory-model/" rel="noopener noreferrer"&gt;factory model&lt;/a&gt;: you're no longer building software, you're building the factory that builds your software. The spec becomes the primary deliverable, and the harness (task tracking, isolation, quality gates, review) is the factory floor.&lt;/p&gt;

&lt;p&gt;Steve Yegge's &lt;a href="https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16dd04" rel="noopener noreferrer"&gt;Gas Town post&lt;/a&gt; maps this journey in eight stages, from IDE copilot to building your own orchestrator. I started multi-agent work at stage 6: three or four terminal windows, each running Claude Code on a different card. You realise quickly that you're the bottleneck. The agents can move faster than you can review, approve, and redirect. The answer isn't more attention from you. It's giving the agents more autonomy with safety nets: quality gates that catch problems automatically, structured dispatch so agents find their own work, and a review workflow for when they're done.&lt;/p&gt;

&lt;p&gt;This post is my version of stage 8. The tooling is still maturing, and this harness will look different in six months. This is the April 2026 version.&lt;/p&gt;

&lt;p&gt;Anthropic's &lt;a href="https://resources.anthropic.com/2026-agentic-coding-trends-report" rel="noopener noreferrer"&gt;2026 Agentic Coding Trends Report&lt;/a&gt; says multi-agent "doesn't make sense for 95% of agent-assisted development tasks." That's probably true for ad-hoc coding. But if you have a groomed backlog of independent cards, running them in parallel is the logical next step for moving through it faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two modes, not a progression
&lt;/h2&gt;

&lt;p&gt;These aren't stages you graduate through. They're modes you switch between based on what you're doing right now.&lt;/p&gt;

&lt;h3&gt;
  
  
  Thinking mode (single agent)
&lt;/h3&gt;

&lt;p&gt;When you're exploring, designing, or working through a single problem. Grooming cards, writing acceptance criteria, debugging something complex. The value is in the conversation, not the throughput.&lt;/p&gt;

&lt;p&gt;This is pair programming. Full attention on one thing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Throughput mode (parallel workers)
&lt;/h3&gt;

&lt;p&gt;When you have multiple cards ready to go. Each worker gets a card, a worktree, and runs independently. You review their output when they're done.&lt;/p&gt;

&lt;p&gt;Choose based on card complexity and dependencies:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://code.claude.com/docs/en/sub-agents" rel="noopener noreferrer"&gt;Sub-agents&lt;/a&gt;&lt;/strong&gt; for small, independent cards (roughly 15-minute tasks):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick fixes, config changes, bounded features&lt;/li&gt;
&lt;li&gt;Research running in background&lt;/li&gt;
&lt;li&gt;Automated code review of completed work&lt;/li&gt;
&lt;li&gt;Short-lived: no auto-compaction, so longer tasks can exhaust the context window&lt;/li&gt;
&lt;li&gt;Cheaper: minimal context startup, returns summaries only&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://code.claude.com/docs/en/agent-teams" rel="noopener noreferrer"&gt;Agent teams&lt;/a&gt;&lt;/strong&gt; for substantial cards or cards with cross-card dependencies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-file features that need to read large parts of the codebase&lt;/li&gt;
&lt;li&gt;Cards where the agent needs sustained autonomy and may hit context limits&lt;/li&gt;
&lt;li&gt;Each teammate is a full Claude Code session with auto-compaction, so they can sustain longer work&lt;/li&gt;
&lt;li&gt;More expensive: each teammate loads full project context independently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agent teams also handle coordination. When cards genuinely depend on each other, teammates can coordinate through peer-to-peer messaging (&lt;code&gt;SendMessage&lt;/code&gt;), shared task lists with dependency tracking, and auto-unblocking. &lt;a href="https://www.jeremyjarrell.com/vertically-slicing-user-stories" rel="noopener noreferrer"&gt;Vertically-sliced stories&lt;/a&gt; following the &lt;a href="https://en.wikipedia.org/wiki/INVEST_(mnemonic)" rel="noopener noreferrer"&gt;INVEST principle&lt;/a&gt; produce fewer cross-card dependencies than horizontal slicing, but they don't eliminate them. Real dependencies exist even in well-groomed backlogs.&lt;/p&gt;

&lt;p&gt;Real coordination cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One card updates a shared schema; other in-flight cards need to know before they merge&lt;/li&gt;
&lt;li&gt;Large refactors that can't be one card, where agents need to agree on new interfaces&lt;/li&gt;
&lt;li&gt;Adversarial debugging: competing hypotheses where agents share findings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agent teams require &lt;code&gt;CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1&lt;/code&gt; (v2.1.32+) and are still experimental. Current limitations: no session resumption, task status can lag, and only one team per session.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to choose
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Exploring, grooming, designing&lt;/td&gt;
&lt;td&gt;Thinking&lt;/td&gt;
&lt;td&gt;You need the dialogue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One thing that needs full attention&lt;/td&gt;
&lt;td&gt;Thinking&lt;/td&gt;
&lt;td&gt;Conversation &amp;gt; throughput&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiple small, bounded cards ready&lt;/td&gt;
&lt;td&gt;Throughput (sub-agents)&lt;/td&gt;
&lt;td&gt;Fast, cheap, parallel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiple substantial cards&lt;/td&gt;
&lt;td&gt;Throughput (agent teams)&lt;/td&gt;
&lt;td&gt;Full context, sustained autonomy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cards with cross-card dependencies&lt;/td&gt;
&lt;td&gt;Throughput (agent teams)&lt;/td&gt;
&lt;td&gt;Agents need to communicate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Research while you work&lt;/td&gt;
&lt;td&gt;Throughput (sub-agents)&lt;/td&gt;
&lt;td&gt;Background tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Review of completed work&lt;/td&gt;
&lt;td&gt;Throughput (sub-agents)&lt;/td&gt;
&lt;td&gt;Fresh context, separate reviewer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Rate limits are the real parallelism ceiling.&lt;/strong&gt; They're pooled across all sessions on your account. Opus has the strictest limits. Plan for this when dispatching multiple workers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The harness: making throughput mode reliable
&lt;/h2&gt;

&lt;p&gt;Dispatching multiple agents is easy. Getting reliable output is hard. The harness (task tracking, isolation, quality gates, review) is what makes multi-agent development repeatable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 0: the upstream gate
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The most important quality gate happens before any code is written.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Careful grooming is what makes the whole pipeline work. Clear description, specific acceptance criteria, explicit non-goals. As Ankit Jain puts it, "&lt;a href="https://www.latent.space/p/reviews-dead" rel="noopener noreferrer"&gt;the most valuable human judgment is exercised before the first line of code is generated, not after&lt;/a&gt;."&lt;/p&gt;

&lt;p&gt;I spend more time grooming cards than I do reviewing agent output. That ratio feels right. Groom in your main Claude Code session, use the conversation to think through edge cases, and write precise acceptance criteria. The card is the spec.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: task tracking
&lt;/h3&gt;

&lt;p&gt;Agents need to discover available work, claim it atomically, and track what's been tried. A TODO list isn't enough.&lt;/p&gt;

&lt;p&gt;I'm using &lt;a href="https://github.com/steveyegge/beads" rel="noopener noreferrer"&gt;Beads&lt;/a&gt; for this. It stores data locally via &lt;a href="https://github.com/dolthub/dolt" rel="noopener noreferrer"&gt;Dolt&lt;/a&gt;, gives agents programmatic access, and handles dependencies between tasks. The key commands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;bd ready&lt;/code&gt; lists tasks with no open blockers&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bd update &amp;lt;id&amp;gt; --claim&lt;/code&gt; atomically claims a task&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bd show &amp;lt;id&amp;gt;&lt;/code&gt; gets full card details including previous notes and rejection feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A &lt;code&gt;/dispatch&lt;/code&gt; skill wraps this into a workflow: find available cards via &lt;code&gt;bd ready&lt;/code&gt;, present them for selection, claim each one, and spawn a worker per card with worktree isolation.&lt;/p&gt;
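
&lt;p&gt;The two properties that matter here, "ready means no open blockers" and "only one worker can win a claim", can be shown with a toy in-memory model. This is not how Beads is implemented and it does not call the &lt;code&gt;bd&lt;/code&gt; CLI; the &lt;code&gt;Task&lt;/code&gt; and &lt;code&gt;Tracker&lt;/code&gt; names are hypothetical and exist only for this sketch.&lt;/p&gt;

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    id: str
    blocked_by: set[str] = field(default_factory=set)
    status: str = "open"  # one of: open, claimed, closed
    owner: Optional[str] = None


class Tracker:
    """Toy in-memory tracker; all names here are hypothetical."""

    def __init__(self, tasks: list[Task]):
        self._tasks = {t.id: t for t in tasks}
        self._lock = threading.Lock()

    def ready(self) -> list[str]:
        """IDs of open tasks whose blockers are all closed."""
        with self._lock:
            return [
                t.id
                for t in self._tasks.values()
                if t.status == "open"
                and all(self._tasks[b].status == "closed" for b in t.blocked_by)
            ]

    def claim(self, task_id: str, worker: str) -> bool:
        """Claim a task; the check and the write share one lock, so one caller wins."""
        with self._lock:
            task = self._tasks[task_id]
            if task.status != "open":
                return False
            task.status, task.owner = "claimed", worker
            return True

    def close(self, task_id: str) -> None:
        """Mark a task as closed, which may unblock its dependants."""
        with self._lock:
            self._tasks[task_id].status = "closed"
```

&lt;p&gt;A dispatcher built on this shape would loop over &lt;code&gt;ready()&lt;/code&gt;, call &lt;code&gt;claim()&lt;/code&gt; for each card, and spawn a worker only when the claim returns &lt;code&gt;True&lt;/code&gt;. A plain TODO list gives you neither the blocker check nor the single-winner claim.&lt;/p&gt;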

&lt;p&gt;For multi-developer setups, a centralized tool (GitHub Issues, Linear) may be more practical. Beads' strength is agent-native programmatic access. See also &lt;a href="https://github.com/AvivK5498/Claude-Code-Beads-Orchestration" rel="noopener noreferrer"&gt;The Claude Protocol&lt;/a&gt; and &lt;a href="https://github.com/dsifry/metaswarm" rel="noopener noreferrer"&gt;Metaswarm&lt;/a&gt; for existing harness implementations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: isolation
&lt;/h3&gt;

&lt;p&gt;Without worktree isolation, parallel agents share one working directory and can't safely write to the same files. With it, each agent gets its own branch and working directory.&lt;/p&gt;

&lt;p&gt;A worker agent definition (&lt;code&gt;.claude/agents/worker.md&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sonnet&lt;/span&gt;
&lt;span class="na"&gt;isolation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;worktree&lt;/span&gt;
&lt;span class="na"&gt;background&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Read&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Write&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Edit&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Bash&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Glob&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Grep&lt;/span&gt;
&lt;span class="na"&gt;permissionMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;acceptEdits&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;isolation: worktree&lt;/code&gt; gives each worker its own git &lt;a href="https://code.claude.com/docs/en/common-workflows" rel="noopener noreferrer"&gt;worktree&lt;/a&gt;. &lt;code&gt;background: true&lt;/code&gt; means the dispatch doesn't block waiting for workers to finish. &lt;code&gt;model: sonnet&lt;/code&gt; keeps costs down for development work (swap to &lt;code&gt;opus&lt;/code&gt; for complex cards).&lt;/p&gt;

&lt;p&gt;Supporting config:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.worktreeinclude&lt;/code&gt; copies gitignored files (like &lt;code&gt;.env&lt;/code&gt;) into new worktrees&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WorktreeCreate&lt;/code&gt; &lt;a href="https://code.claude.com/docs/en/hooks" rel="noopener noreferrer"&gt;hooks&lt;/a&gt; handle dependency installation&lt;/li&gt;
&lt;li&gt;Scope each agent via CLAUDE.md to prevent merge conflicts across worktrees&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anthropic's &lt;a href="https://www.anthropic.com/engineering/building-c-compiler" rel="noopener noreferrer"&gt;C compiler case study&lt;/a&gt; used this pattern with 16 parallel agents. They hit duplicate work and merge conflicts. Tighter scoping and atomic task claiming address both.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: quality gates
&lt;/h3&gt;

&lt;p&gt;Two categories: automated (hooks that block the agent) and manual (human judgment during review). I underestimated how large agent-generated diffs get when the card isn't tightly scoped. The diff size guard was an afterthought; it's now one of the more useful gates.&lt;/p&gt;

&lt;h4&gt;
  
  
  Automated gates (fail-fast pyramid)
&lt;/h4&gt;

&lt;p&gt;Run fastest and cheapest first, most expensive last:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Formatting&lt;/strong&gt; (PostToolUse on Write/Edit, instant). Auto-fix, not a gate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linting / static analysis&lt;/strong&gt; (seconds). Fast, deterministic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type checking&lt;/strong&gt; (seconds). Catches interface mismatches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secret detection&lt;/strong&gt; (PreToolUse on Edit/Write). Blocks before secrets hit disk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diff size guard&lt;/strong&gt; (instant). Reject if the change exceeds a threshold. Prevents &lt;a href="https://addyo.substack.com/p/the-80-problem-in-agentic-coding" rel="noopener noreferrer"&gt;comprehension debt&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unit tests&lt;/strong&gt; (minutes). The foundation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated code review&lt;/strong&gt; (subagent, 30-90s). Separate agent reviews the diff.&lt;/li&gt;
&lt;/ol&gt;
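&lt;p&gt;The diff size guard from the list above can be sketched in a few lines of Python. This assumes the guard is fed the summary line that &lt;code&gt;git diff --shortstat&lt;/code&gt; prints. The 400-line threshold and the function names are placeholders, not values from my setup.&lt;/p&gt;

```python
import re

# Hypothetical threshold; tune it to what a reviewer can actually read.
MAX_CHANGED_LINES = 400


def changed_lines(shortstat):
    """Sum insertions and deletions from `git diff --shortstat` output."""
    counts = re.findall(r"(\d+) (?:insertion|deletion)", shortstat)
    return sum(int(n) for n in counts)


def diff_size_guard(shortstat, limit=MAX_CHANGED_LINES):
    """Return (ok, message); the message is what the agent would be shown."""
    total = changed_lines(shortstat)
    if total > limit:
        return False, f"Diff touches {total} lines (limit {limit}). Split the card."
    return True, f"Diff touches {total} lines."


ok, message = diff_size_guard("12 files changed, 847 insertions(+), 23 deletions(-)")
print(ok, message)
```

&lt;p&gt;The guard counts lines rather than files on purpose: a one-file change of 900 lines is as hard to review as a twelve-file one.&lt;/p&gt;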

&lt;p&gt;The code review subagent must be a separate agent with its own context window. As Nick Tune &lt;a href="https://www.oreilly.com/radar/auto-reviewing-claudes-code/" rel="noopener noreferrer"&gt;writes&lt;/a&gt;, "asking the main agent to mark its own homework is obviously not a good approach." &lt;a href="https://hamy.xyz/blog/2026-02_code-reviews-claude-subagents" rel="noopener noreferrer"&gt;Hamilton Greene's 9-agent approach&lt;/a&gt; achieves roughly 75% useful suggestions versus less than 50% from single-agent review.&lt;/p&gt;

&lt;p&gt;Hook implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Edit|Write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./scripts/detect-secrets.sh"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PostToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Write|Edit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"jq -r '.tool_input.file_path' | xargs npx prettier --write"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"TaskCompleted"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./scripts/quality-gate.sh"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Exit code 0 proceeds. Exit code 2 blocks with feedback (the agent gets the stderr message and iterates). Lint, tests, and code review fire on &lt;code&gt;TaskCompleted&lt;/code&gt; (runs once when the agent says "done"). Secret detection fires on &lt;code&gt;PreToolUse&lt;/code&gt; (blocks before the write). See &lt;a href="https://code.claude.com/docs/en/hooks" rel="noopener noreferrer"&gt;hooks reference&lt;/a&gt;.&lt;/p&gt;
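&lt;p&gt;A fail-fast gate script following that exit-code convention might look like the Python sketch below. It is not the contents of my &lt;code&gt;quality-gate.sh&lt;/code&gt;. The &lt;code&gt;run_gates&lt;/code&gt; function is hypothetical, and the demo gates are stand-ins that call the Python interpreter so the example runs anywhere.&lt;/p&gt;

```python
import subprocess
import sys


def run_gates(gates):
    """Run gates in order and stop at the first failure.

    Returns (exit_code, feedback): 0 means proceed, 2 means block and
    hand the feedback to the agent.
    """
    for name, command in gates:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            return 2, f"Gate '{name}' failed:\n{result.stdout}{result.stderr}"
    return 0, ""


# Stand-in gates, cheapest first. A real setup would call the project's
# linter, type checker, and test runner here instead.
demo_gates = [
    ("lint", [sys.executable, "-c", "print('lint ok')"]),
    ("unit tests", [sys.executable, "-c", "import sys; sys.exit('1 test failed')"]),
    ("code review", [sys.executable, "-c", "print('never reached')"]),
]

exit_code, feedback = run_gates(demo_gates)
print(exit_code)
print(feedback, file=sys.stderr)
# A real hook script would finish with sys.exit(exit_code).
```

&lt;p&gt;Stopping at the first failure keeps the feedback short. The agent gets one problem to fix rather than the combined output of every gate.&lt;/p&gt;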

&lt;h4&gt;
  
  
  Manual gates
&lt;/h4&gt;

&lt;p&gt;What automated checks can't catch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://addyosmani.com/blog/good-spec/" rel="noopener noreferrer"&gt;Scope adherence&lt;/a&gt;.&lt;/strong&gt; Did the agent build what the card asked for, or add unrequested features?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://addyosmani.com/blog/factory-model/" rel="noopener noreferrer"&gt;Architectural coherence&lt;/a&gt;.&lt;/strong&gt; Does the implementation fit the architecture of the rest of the system, or did the agent invent its own patterns?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://earezki.com/ai-news/2026-04-04-ai-code-review-checklist/" rel="noopener noreferrer"&gt;Business logic correctness&lt;/a&gt;.&lt;/strong&gt; Models infer patterns statistically, not semantically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://addyo.substack.com/p/the-80-problem-in-agentic-coding" rel="noopener noreferrer"&gt;Comprehension check&lt;/a&gt;.&lt;/strong&gt; If you can't understand the diff, it's too large or too novel.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Layer 4: review gates
&lt;/h3&gt;

&lt;p&gt;For trunk-based development without PRs, the worktree branch is the review surface. &lt;code&gt;git diff main&lt;/code&gt; from the worktree shows exactly what would change on merge.&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;/review-worktree&lt;/code&gt; skill handles this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cross-references &lt;code&gt;bd list --label review:pending&lt;/code&gt; with &lt;code&gt;git worktree list&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Shows commit history and diff summary for the selected worktree&lt;/li&gt;
&lt;li&gt;Options: view full diff, view specific file, run tests, run review agent, approve, reject&lt;/li&gt;
&lt;li&gt;Approve: merge to main, close card, clean up worktree&lt;/li&gt;
&lt;li&gt;Reject: reopen card with feedback comment visible to the next worker via &lt;code&gt;bd show&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
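&lt;p&gt;The cross-referencing in step 1 can be sketched in Python as below. It parses the output of &lt;code&gt;git worktree list --porcelain&lt;/code&gt; and assumes each branch name contains its card ID, as in the &lt;code&gt;work/bd-e2f7-fix-ssn-validation&lt;/code&gt; naming used in the walkthrough further down. The card IDs are passed in as a plain list because I'm not assuming anything about &lt;code&gt;bd list&lt;/code&gt;'s output format. The paths and hashes in the sample are made up.&lt;/p&gt;

```python
def parse_worktrees(porcelain):
    """Map branch name to worktree path from `git worktree list --porcelain`."""
    worktrees = {}
    path = None
    for line in porcelain.splitlines():
        if line.startswith("worktree "):
            path = line[len("worktree "):]
        elif line.startswith("branch ") and path is not None:
            branch = line[len("branch "):].removeprefix("refs/heads/")
            worktrees[branch] = path
    return worktrees


def match_cards(card_ids, worktrees):
    """Pair each pending card with the worktree whose branch carries its ID."""
    matches = {}
    for card_id in card_ids:
        for branch, path in worktrees.items():
            if card_id in branch:
                matches[card_id] = path
    return matches


# Made-up sample output; paths and hashes are placeholders.
sample = """\
worktree /repo
HEAD 0000000000000000000000000000000000000001
branch refs/heads/main

worktree /repo/.worktrees/ssn-fix
HEAD 0000000000000000000000000000000000000002
branch refs/heads/work/bd-e2f7-fix-ssn-validation
"""

pending = ["bd-e2f7", "bd-b8d3"]
print(match_cards(pending, parse_worktrees(sample)))
```

&lt;p&gt;A card that is pending review but has no matching worktree simply drops out of the result, which is a useful signal that something was cleaned up too early.&lt;/p&gt;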

&lt;p&gt;For PR-based teams, the agent creates a PR from the worktree branch. Standard review process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 5: the feedback loop
&lt;/h3&gt;

&lt;p&gt;When work is rejected, the card reopens with a comment explaining why. On the next dispatch, the worker agent sees the rejection feedback via &lt;code&gt;bd show&lt;/code&gt;. It has context on what was tried and why it failed.&lt;/p&gt;

&lt;p&gt;When an agent is stuck, the card goes to &lt;code&gt;blocked&lt;/code&gt; with a &lt;code&gt;needs-help&lt;/code&gt; label and a note explaining what was tried. The human reviews and either re-grooms the card or splits it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it looks like in practice
&lt;/h2&gt;

&lt;p&gt;Here's a walkthrough on &lt;a href="https://github.com/javatarz/credit-card-lending" rel="noopener noreferrer"&gt;credit-card-lending&lt;/a&gt; using Agent Teams. Three cards groomed and ready. The session starts with &lt;code&gt;CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1&lt;/code&gt; enabled.&lt;/p&gt;

&lt;h3&gt;
  
  
  Check the backlog and create the team
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bd ready &lt;span class="nt"&gt;--plain&lt;/span&gt;
&lt;span class="go"&gt;
bd-c4a1  P1  Add payment grace period calculation     customer
bd-e2f7  P1  Fix SSN validation accepting 000 prefix  customer
bd-b8d3  P2  Add CSV export to transaction history     reporting

&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Create a team with three teammates, one &lt;span class="k"&gt;for &lt;/span&gt;each card. Each teammate should use the worker agent definition and get its own worktree.
&lt;span class="go"&gt;
Creating team "sprint-work"...

Spawning teammate "grace-period" for bd-c4a1...
  Worktree: work/bd-c4a1-payment-grace-period
  Card claimed.

Spawning teammate "ssn-fix" for bd-e2f7...
  Worktree: work/bd-e2f7-fix-ssn-validation
  Card claimed.

Spawning teammate "csv-export" for bd-b8d3...
  Worktree: work/bd-b8d3-csv-export-transactions
  Card claimed.

Team "sprint-work" running. Ctrl+T to toggle task list. Shift+Down to
cycle between teammates.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Monitor progress
&lt;/h3&gt;

&lt;p&gt;The shared task list shows what each teammate is working on. &lt;code&gt;Ctrl+T&lt;/code&gt; toggles it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tasks:
  [in_progress] bd-c4a1: Add payment grace period calculation (grace-period)
  [completed]   bd-e2f7: Fix SSN validation accepting 000 prefix (ssn-fix)
  [in_progress] bd-b8d3: Add CSV export to transaction history (csv-export)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While teammates work, I stay in the lead session. Groom next sprint's cards, explore a design problem, whatever needs thinking. Teammates message the lead if they're stuck or need clarification.&lt;/p&gt;

&lt;h3&gt;
  
  
  Review completed work
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;ssn-fix&lt;/code&gt; and &lt;code&gt;csv-export&lt;/code&gt; have finished. I review each worktree diff from the lead session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gi"&gt;&amp;gt; Show me the diff for ssn-fix's worktree
&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="p"&gt;Commits (main..HEAD):
&lt;/span&gt;  a3f8c21 Fix SSN validation to reject 000 and 999 prefixes
  e7b2d14 Add test cases for invalid SSN prefixes
&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="p"&gt;Changed files:
&lt;/span&gt;  src/.../customer/validation/SsnValidator.java      | 12 ++++++--
  src/.../customer/validation/SsnValidatorTest.java  | 28 ++++++++++++++++
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Small, focused fix. Two files, clear test coverage. Merge it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Merge ssn-fix&lt;span class="s1"&gt;'s worktree to main
&lt;/span&gt;&lt;span class="go"&gt;
Merging work/bd-e2f7-fix-ssn-validation into main... done
Closing bd-e2f7... done
Removing worktree... done
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the CSV export:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gi"&gt;&amp;gt; Show me the diff for csv-export's worktree
&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="p"&gt;Commits (main..HEAD):
&lt;/span&gt;  b1c4e89 Add CSV export endpoint for transaction history
  d5a7f23 Add PDF export endpoint for transaction history
  f9e1b34 Add export format selection dropdown to UI
&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="p"&gt;Changed files:
&lt;/span&gt;  12 files changed, 847 insertions(+), 23 deletions(-)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scope creep. The card said CSV export. The teammate added PDF export and a UI component.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Reject. Send csv-export a message: &lt;span class="s2"&gt;"Card asked for CSV export only.
&lt;/span&gt;&lt;span class="go"&gt;  PDF export and UI dropdown are out of scope. Revert those changes
  and keep only the CSV export."

Message sent to csv-export. Reopening bd-b8d3...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With Agent Teams, the rejection goes directly to the teammate via &lt;code&gt;SendMessage&lt;/code&gt;. The teammate receives the feedback, reverts the out-of-scope work, and resubmits. No re-dispatch needed.&lt;/p&gt;

&lt;p&gt;This is a common failure mode: agents are eager to build adjacent features. The tighter the acceptance criteria in the card, the less often this happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it falls apart
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Compound reliability
&lt;/h3&gt;

&lt;p&gt;Each agent at 95% success. Five agents chained: 0.95&lt;sup&gt;5&lt;/sup&gt;, or &lt;a href="https://towardsdatascience.com/the-multi-agent-trap/" rel="noopener noreferrer"&gt;roughly 77%&lt;/a&gt;. Multi-agent trades reliability for parallelism. The benefit must justify the overhead.&lt;/p&gt;
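&lt;p&gt;The arithmetic is easy to check. This Python sketch assumes each step fails independently, which is a simplification, and the &lt;code&gt;chain_success&lt;/code&gt; name is mine.&lt;/p&gt;

```python
def chain_success(per_agent, agents):
    """Probability that every agent in a sequential chain succeeds,
    assuming failures are independent."""
    return per_agent ** agents


# Print the end-to-end success rate for chains of different lengths.
for n in (1, 3, 5, 10):
    print(n, round(chain_success(0.95, n), 3))
```

&lt;p&gt;The decay is geometric, so every extra handoff costs more in absolute terms than it looks like it should.&lt;/p&gt;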

&lt;h3&gt;
  
  
  Context loss between agents
&lt;/h3&gt;

&lt;p&gt;Every handoff is lossy compression. Google Research found &lt;a href="https://github.blog/ai-and-ml/generative-ai/multi-agent-workflows-often-fail-heres-how-to-engineer-ones-that-dont/" rel="noopener noreferrer"&gt;39-70% degradation&lt;/a&gt; in sequential multi-agent tasks. Subagents summarize results back to the caller; teammates don't get the lead's conversation history. Isolation prevents context pollution but loses nuance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Token cost
&lt;/h3&gt;

&lt;p&gt;Multi-agent consumes &lt;a href="https://www.augmentcode.com/guides/single-agent-vs-multi-agent-ai" rel="noopener noreferrer"&gt;2-5x more tokens&lt;/a&gt; for equivalent work. No published harness has budget limits per task. &lt;code&gt;/usage&lt;/code&gt; monitoring is the best we have. This is an unsolved problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Time blindness
&lt;/h3&gt;

&lt;p&gt;From the &lt;a href="https://www.anthropic.com/engineering/building-c-compiler" rel="noopener noreferrer"&gt;C compiler case study&lt;/a&gt;: Claude can't tell time and will spend hours running tests instead of making progress. The harness needs to print progress infrequently and offer fast-test options.&lt;/p&gt;

&lt;h3&gt;
  
  
  Duplicate work
&lt;/h3&gt;

&lt;p&gt;Without task claiming, multiple agents fix the same bug independently and overwrite each other. I've seen this even with bd's &lt;code&gt;--claim&lt;/code&gt;, when two cards touch overlapping files. The C compiler case study hit it at scale with 16 agents targeting the same bug.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 18-month wall
&lt;/h3&gt;

&lt;p&gt;Without quality gates, the pattern is: early velocity (months 1-3), plateau (4-9), decline (10-15), stall (16-18) as comprehension debt accumulates. &lt;a href="https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report" rel="noopener noreferrer"&gt;CodeRabbit's research&lt;/a&gt; found AI-generated code produces 1.7x more issues and performance inefficiencies 8x more often than human code. This is why quality gates matter. Without them, the velocity gains are temporary.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest tradeoffs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Model lock-in
&lt;/h3&gt;

&lt;p&gt;Claude Code is locked to Claude models. The orchestration layer (sub-agents, agent teams, skills, hooks, worktrees) doesn't exist in other tools. Your model choice is portable (use Claude API keys with aider, opencode, etc.) but the harness is not. No open-source tool today gives you model flexibility and Claude Code's agent stack. If you're invested in this workflow, you're invested in Claude Code.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to stay in thinking mode
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You're exploring, designing, or grooming. The value is in the conversation.&lt;/li&gt;
&lt;li&gt;One task that needs your full attention and steering.&lt;/li&gt;
&lt;li&gt;Cost constraint. Throughput mode is 2-5x more expensive per equivalent output.&lt;/li&gt;
&lt;li&gt;The work isn't decomposed into independent cards yet. Dispatch without grooming is waste.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The real cost
&lt;/h3&gt;

&lt;p&gt;Anthropic's C compiler project: &lt;a href="https://www.anthropic.com/engineering/building-c-compiler" rel="noopener noreferrer"&gt;$20K in API costs&lt;/a&gt; for 16 agents producing 100K lines of code. That excludes significant human effort for workflow design, task decomposition, agent management, output review, and integration. Budget for both.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Today's harness is human-triggered. You run &lt;code&gt;/dispatch&lt;/code&gt; when you're ready. The next step is agents that continuously pull from the backlog as cards become ready, with the human as reviewer rather than dispatcher.&lt;/p&gt;

&lt;p&gt;The pieces exist: &lt;code&gt;bd ready&lt;/code&gt; for discovery, worktrees for isolation, hooks for quality, agent teams for coordination. The missing piece is the continuous loop, and the trust to let it run.&lt;/p&gt;

&lt;p&gt;Companies with agentic coding infrastructure report 30-50% acceleration in development cycles. But a &lt;a href="https://www.nber.org/" rel="noopener noreferrer"&gt;February 2026 NBER study&lt;/a&gt; of nearly 6,000 executives found 89% of firms report zero productivity change from AI. The gap between those groups isn't model quality. It's the infrastructure around the model.&lt;/p&gt;

&lt;p&gt;That's been the consistent lesson: harness design matters as much as prompt design.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Structuring Claude Code for Multi-Repo Workspaces</title>
      <dc:creator>Karun Japhet</dc:creator>
      <pubDate>Thu, 26 Mar 2026 19:38:34 +0000</pubDate>
      <link>https://dev.to/javatarz/structuring-claude-code-for-multi-repo-workspaces-4147</link>
      <guid>https://dev.to/javatarz/structuring-claude-code-for-multi-repo-workspaces-4147</guid>
      <description>&lt;p&gt;Claude Code understands one repo at a time. Most teams have thirty.&lt;/p&gt;

&lt;p&gt;Microservices, shared libraries, infrastructure-as-code, frontend apps, data pipelines, all in separate git repos. Start Claude Code in one and ask about another, and it has no context. It doesn't know the workspace exists.&lt;/p&gt;

&lt;p&gt;Here's how I've been setting this up to work across repositories.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://karun.me/assets/images/posts/2026-03-26-structuring-claude-code-for-multi-repo-workspaces/cover.png" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafmfxd109jf30ob3yb08.png" alt="Three translucent layers showing org, team, and repo context stacking in a multi-repo workspace" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;When you start Claude Code in &lt;code&gt;orders/order-service&lt;/code&gt;, it has no idea that &lt;code&gt;orders/orders-ui&lt;/code&gt; exists next door, or that shared libraries live in &lt;code&gt;shared/&lt;/code&gt;, or that the data team's Spark jobs are in &lt;code&gt;analytics/&lt;/code&gt;. Every session starts with you explaining the workspace layout.&lt;/p&gt;

&lt;p&gt;The same problem shows up when someone new joins the team. They clone one repo, but they don't know what other repos exist, how they relate, or where to look for shared infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  A bootstrap repo as the workspace root
&lt;/h2&gt;

&lt;p&gt;The approach I landed on: a bootstrap repo that sits above all the other repos as the workspace root. It doesn't contain application code. It contains:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A repo manifest&lt;/strong&gt; listing every repo, where it lives, and what it does&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context files&lt;/strong&gt; that Claude Code picks up from the directory tree&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tasks&lt;/strong&gt; for common cross-repo operations (pull all, search all, check status)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I use &lt;a href="https://github.com/alajmo/mani" rel="noopener noreferrer"&gt;mani&lt;/a&gt; as the repo manager, but the ideas apply regardless of tooling. You could do this with a shell script and a list of repos.&lt;/p&gt;

&lt;h3&gt;
  
  
  Directory structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;workspace/
  mani.yaml                  # imports per-product configs
  CLAUDE.md                  # org-level context
  mani.d/
    orders.yaml              # order management (3-tier)
    shipping.yaml            # shipping &amp;amp; logistics (3-tier)
    analytics.yaml           # data platform (Spark, Airflow, APIs)
    assist.yaml              # agentic AI system (FastAPI, LangGraph, React)
    shared.yaml              # shared libraries and services
    infra.yaml               # infrastructure repos
  orders/
    CLAUDE.md                # team-level context (tracked in bootstrap)
    order-service/           # Spring Boot (gitignored)
    payment-service/         # Spring Boot (gitignored)
    orders-ui/               # React (gitignored)
    reporting-service/       # Spring Boot + PostgreSQL (gitignored)
    pricing-engine/          # Vert.x, not Spring Boot (gitignored)
  shipping/
    CLAUDE.md
    shipment-service/        # Spring Boot + MongoDB
    shipping-ui/             # Angular
    carrier-service/         # Spring Boot, reactive
  analytics/
    CLAUDE.md
    airflow-dags/            # Python, Airflow
    spark-jobs/              # PySpark on EMR
    metrics-service/         # Kotlin, Micronaut
    dashboard-ui/            # React
  assist/
    CLAUDE.md
    agent-service/           # FastAPI + LangGraph
    conversation-service/    # Spring Boot + WebSocket
    chat-ui/                 # React + streaming chat
  shared/
    CLAUDE.md
    react-lib/
    java-commons/
    feature-toggles/
  infra/
    CLAUDE.md
    terraform-modules/
    ci-templates/
    cluster/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each indented directory under a product (&lt;code&gt;order-service/&lt;/code&gt;, &lt;code&gt;orders-ui/&lt;/code&gt;, &lt;code&gt;spark-jobs/&lt;/code&gt;, etc.) is a separate git repo, cloned by the repo manager and gitignored by the bootstrap repo. The CLAUDE.md files at each level are tracked in the bootstrap repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three layers of context
&lt;/h2&gt;

&lt;p&gt;Claude Code walks up the directory tree looking for CLAUDE.md files. If you start it in &lt;code&gt;orders/order-service&lt;/code&gt;, it reads:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;orders/order-service/CLAUDE.md&lt;/code&gt; (repo-level, committed in that repo)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;orders/CLAUDE.md&lt;/code&gt; (team-level, committed in bootstrap)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;workspace/CLAUDE.md&lt;/code&gt; (org-level, committed in bootstrap)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each layer adds context without repeating what the others provide.&lt;/p&gt;
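&lt;p&gt;To see which files a session would pick up from a given starting directory, the walk can be approximated in Python as below. This mirrors the behaviour described above, not Claude Code's actual implementation. The &lt;code&gt;context_files&lt;/code&gt; function and the temporary directory layout are illustrative.&lt;/p&gt;

```python
import tempfile
from pathlib import Path


def context_files(start, root):
    """List CLAUDE.md files from `start` up to `root`, nearest first."""
    start, root = Path(start).resolve(), Path(root).resolve()
    found = []
    for directory in [start, *start.parents]:
        candidate = directory / "CLAUDE.md"
        if candidate.is_file():
            found.append(candidate)
        if directory == root:
            break  # do not look above the workspace root
    return found


# Build a throwaway workspace with the three layers and list what is found.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp).resolve() / "workspace"
    repo = workspace / "orders" / "order-service"
    repo.mkdir(parents=True)
    for directory in (workspace, workspace / "orders", repo):
        (directory / "CLAUDE.md").write_text("context\n")
    for path in context_files(repo, workspace):
        print(path.relative_to(workspace.parent))
```

&lt;p&gt;Running something like this from a few different repos is a quick way to confirm that the team-level and org-level files sit where the walk will find them.&lt;/p&gt;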

&lt;h3&gt;
  
  
  Layer 1: Organisation
&lt;/h3&gt;

&lt;p&gt;The org-level CLAUDE.md covers things that apply everywhere:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Warning that this is a multi-repo workspace (check &lt;code&gt;git rev-parse --show-toplevel&lt;/code&gt; before git operations)&lt;/li&gt;
&lt;li&gt;How to discover repos (point to the manifest file)&lt;/li&gt;
&lt;li&gt;Which products exist and what they own&lt;/li&gt;
&lt;li&gt;Common cross-repo operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep this short. Claude reads it on every session regardless of which repo you're in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Team
&lt;/h3&gt;

&lt;p&gt;The team-level CLAUDE.md covers conventions shared across repos in that group. The content varies by product type:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A 3-tier product&lt;/strong&gt; (like orders or shipping) might cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backend stack (Java 21, Spring Boot 3.5, Gradle, MongoDB)&lt;/li&gt;
&lt;li&gt;Frontend stack (React 19, Vite, TypeScript)&lt;/li&gt;
&lt;li&gt;Build and test commands for each&lt;/li&gt;
&lt;li&gt;The one exception (the pricing engine uses Vert.x, not Spring Boot)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;A data platform&lt;/strong&gt; (like analytics) might cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Orchestration (Airflow DAGs, triggered via async-job-service)&lt;/li&gt;
&lt;li&gt;Processing (PySpark on EMR, containerised Python jobs on ECS)&lt;/li&gt;
&lt;li&gt;Multi-region support (pipelines run per-region with region-specific config)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;An agentic system&lt;/strong&gt; (like assist) might cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent framework (FastAPI + LangGraph for orchestration)&lt;/li&gt;
&lt;li&gt;Backing services (Spring Boot for persistence, WebSocket for streaming)&lt;/li&gt;
&lt;li&gt;Frontend (React with streaming UI patterns)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I learned not to list repos here. Lists go stale. Instead, tell Claude where to look: "This group's repos are defined in &lt;code&gt;mani.d/orders.yaml&lt;/code&gt;. Each project has a &lt;code&gt;desc&lt;/code&gt; field. Check that file for the current list."&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: Repository
&lt;/h3&gt;

&lt;p&gt;This lives in each repo and is maintained by the team that owns it. Build commands, architecture notes, test instructions, things specific to that codebase. This is standard Claude Code usage, nothing new.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project descriptions in the manifest
&lt;/h2&gt;

&lt;p&gt;One-line descriptions in the repo manifest make a big difference for discovery. When Claude reads the manifest, it knows what each repo does without cloning or exploring it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;projects&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;order-service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;desc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Order lifecycle management and fulfilment&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;git@gitlab.com:acme/order-service.git&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;orders/order-service&lt;/span&gt;
    &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;orders&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;java&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="na"&gt;pricing-engine&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;desc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Vert.x real-time pricing engine&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;git@gitlab.com:acme/pricing-engine.git&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;orders/pricing-engine&lt;/span&gt;
    &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;orders&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;java&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="na"&gt;orders-ui&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;desc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;React UI for order management and reporting&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;git@gitlab.com:acme/orders-ui.git&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;orders/orders-ui&lt;/span&gt;
    &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;orders&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;ui&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;desc&lt;/code&gt; field costs almost nothing to maintain and saves Claude from guessing or asking.&lt;/p&gt;
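
&lt;p&gt;Because discovery leans on that field, it may be worth checking that no project is missing one. Here's a minimal Python sketch, assuming the manifest has already been parsed into a dictionary by a YAML library. The function name &lt;code&gt;projects_missing_desc&lt;/code&gt; and the sample data are hypothetical and not part of mani.&lt;/p&gt;

```python
def projects_missing_desc(manifest: dict) -> list[str]:
    """Return the names of projects whose `desc` is absent or blank."""
    projects = manifest.get("projects") or {}
    return sorted(
        name
        for name, fields in projects.items()
        if not str((fields or {}).get("desc") or "").strip()
    )


# Hypothetical manifest, shaped like the YAML above after parsing.
manifest = {
    "projects": {
        "order-service": {"desc": "Order lifecycle management and fulfilment"},
        "pricing-engine": {"desc": ""},
        "orders-ui": {"path": "orders/orders-ui"},
    }
}

print(projects_missing_desc(manifest))  # prints ['orders-ui', 'pricing-engine']
```

&lt;p&gt;Run in CI on the bootstrap repo, a check like this would flag a new repo that was added without a description.&lt;/p&gt;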

&lt;h2&gt;
  
  
  Cross-repo tasks
&lt;/h2&gt;

&lt;p&gt;A repo manager like mani lets you define tasks that run across repos:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;update-repos&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;desc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pull latest for all repos&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;all&lt;/span&gt;
    &lt;span class="na"&gt;cmd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;current=$(git rev-parse --abbrev-ref HEAD)&lt;/span&gt;
      &lt;span class="s"&gt;if [[ -n $(git status -s) ]]; then&lt;/span&gt;
        &lt;span class="s"&gt;git fetch origin $branch&lt;/span&gt;
        &lt;span class="s"&gt;echo "FETCHED (dirty working tree on $current)"&lt;/span&gt;
      &lt;span class="s"&gt;elif [[ "$current" != "$branch" ]]; then&lt;/span&gt;
        &lt;span class="s"&gt;git fetch origin $branch&lt;/span&gt;
        &lt;span class="s"&gt;echo "FETCHED (on branch $current, not $branch)"&lt;/span&gt;
      &lt;span class="s"&gt;else&lt;/span&gt;
        &lt;span class="s"&gt;git pull --rebase origin $branch&lt;/span&gt;
      &lt;span class="s"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This one pulls latest on repos that are clean and on the default branch, and fetches (but doesn't touch) repos with work in progress. The data is available locally either way, so the next pull is fast.&lt;/p&gt;
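
&lt;p&gt;The branching is easier to see without the shell syntax. This is a small Python sketch of the same decision. &lt;code&gt;choose_action&lt;/code&gt; and its parameters are illustrative names, and &lt;code&gt;default&lt;/code&gt; stands in for the task's &lt;code&gt;$branch&lt;/code&gt;.&lt;/p&gt;

```python
def choose_action(dirty: bool, current: str, default: str) -> str:
    """Mirror the task's branching: only a clean default branch is pulled."""
    if dirty:
        return "fetch"  # uncommitted work: update refs, leave files alone
    if current != default:
        return "fetch"  # on another branch: do not rebase it
    return "pull --rebase"


print(choose_action(dirty=False, current="main", default="main"))
```

&lt;p&gt;Only one of the three paths rewrites the working tree, which is why the task is safe to run across every repo at once.&lt;/p&gt;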

&lt;p&gt;Other useful tasks: search across all repos, check which repos have uncommitted changes, trigger CI pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gitignore trick for team-level CLAUDE.md files
&lt;/h2&gt;

&lt;p&gt;The bootstrap repo gitignores all sub-repo directories. But the team-level CLAUDE.md files need to be tracked in bootstrap, inside those same directories. The fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Use dir/* instead of dir/ so exceptions work
orders/*
!orders/CLAUDE.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



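&lt;p&gt;&lt;code&gt;orders/&lt;/code&gt; ignores the directory itself, so git never looks inside it and a &lt;code&gt;!&lt;/code&gt; exception for a file within it has no effect. &lt;code&gt;orders/*&lt;/code&gt; ignores each entry inside the directory instead, which leaves room to re-include specific files.&lt;/p&gt;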

&lt;h2&gt;
  
  
  Skills, hooks, and commands
&lt;/h2&gt;

&lt;p&gt;Claude Code supports &lt;a href="https://docs.anthropic.com/en/docs/claude-code" rel="noopener noreferrer"&gt;skills, hooks, and custom commands&lt;/a&gt; configured in the &lt;code&gt;.claude/&lt;/code&gt; directory of a repo. These have always worked at the repo level. The bootstrap structure gives you two more levels:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Org level&lt;/strong&gt; (in the bootstrap repo's &lt;code&gt;.claude/&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Skills that work across all repos. I have one that queries SonarQube for any repo in the workspace, auto-detecting the project key from the current directory.&lt;/li&gt;
&lt;li&gt;Pre-commit hooks (gitleaks for secret detection, applied to the bootstrap repo itself).&lt;/li&gt;
&lt;li&gt;Shell scripts for operations that span teams, like auditing which repos still need a branch migration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Team level&lt;/strong&gt; (in each team's CLAUDE.md or tracked config):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build conventions that apply to all repos in a team but not the whole org. A team with ten Spring Boot services can document the shared Gradle convention plugins once, in the team CLAUDE.md.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Repo level&lt;/strong&gt; (in each repo, as before):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repo-specific skills, hooks, and commands. Nothing changes here.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The layering means you write a SonarQube skill once at the org level and it works in any repo. You document &lt;code&gt;./gradlew spotlessApply&lt;/code&gt; once at the team level and every repo in that team inherits the context.&lt;/p&gt;
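
&lt;p&gt;One way to picture the inheritance is as a walk from the workspace root down to the repo you're in, picking up every CLAUDE.md on the way. The sketch below only illustrates that layering and is not Claude Code's actual loading logic. &lt;code&gt;context_files&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
from pathlib import Path


def context_files(start: Path, workspace_root: Path) -> list[Path]:
    """Collect CLAUDE.md files from workspace_root down to start, broadest first."""
    start = start.resolve()
    workspace_root = workspace_root.resolve()
    # relative_to raises ValueError if start is outside the workspace.
    parts = start.relative_to(workspace_root).parts
    found = []
    current = workspace_root
    for part in ("",) + parts:
        # The empty first entry checks the workspace root itself.
        current = current / part if part else current
        candidate = current / "CLAUDE.md"
        if candidate.is_file():
            found.append(candidate)
    return found
```

&lt;p&gt;For a repo at &lt;code&gt;workspace/orders/order-service&lt;/code&gt;, this would return the org-level file first, then the team-level one, then the repo's own file if it has one.&lt;/p&gt;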

&lt;h2&gt;
  
  
  Partial and full checkouts
&lt;/h2&gt;

&lt;p&gt;Not everyone needs the whole workspace. Most developers I work with only clone their team's repos:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;workspace/
  mani.yaml
  CLAUDE.md
  orders/
    CLAUDE.md
    order-service/
    payment-service/
    orders-ui/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;They still get the org-level and team-level CLAUDE.md files. Claude Code still understands the team's conventions and knows how to discover the rest of the organisation through the manifest.&lt;/p&gt;

&lt;p&gt;A platform engineer or architect who works across teams clones everything. They get the full context at every level.&lt;/p&gt;

&lt;p&gt;The repo manager handles both. You can tag repos by team and clone selectively (&lt;code&gt;mani sync --tags orders&lt;/code&gt;) or clone everything (&lt;code&gt;mani sync&lt;/code&gt;). Either way, the layered context works because CLAUDE.md files at each level are already in place.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this gets you
&lt;/h2&gt;

&lt;p&gt;When someone starts Claude Code in any repo in the workspace, it already knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What the repo does and how to build it&lt;/li&gt;
&lt;li&gt;What other repos exist in the same team and how they relate&lt;/li&gt;
&lt;li&gt;How to navigate to shared libraries, infrastructure, and deployment configs&lt;/li&gt;
&lt;li&gt;Common conventions and exceptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to try this, start small. Create a bootstrap repo, add a CLAUDE.md with your workspace layout, and list your repos in a manifest with one-line descriptions. You can add team-level context and cross-repo tasks as the structure proves useful.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Agentic Patterns Developers Should Steal</title>
      <dc:creator>Karun Japhet</dc:creator>
      <pubDate>Thu, 19 Mar 2026 05:13:51 +0000</pubDate>
      <link>https://dev.to/javatarz/agentic-patterns-developers-should-steal-pb1</link>
      <guid>https://dev.to/javatarz/agentic-patterns-developers-should-steal-pb1</guid>
      <description>&lt;p&gt;Production agentic systems decompose problems and use the right tool for each step. Most developers hand the AI the whole problem.&lt;/p&gt;

&lt;p&gt;That's the gap. Teams building production AI workflows have developed patterns for making AI reliable. Developers using AI coding assistants like Claude Code, Cursor, or Copilot mostly haven't adopted them yet.&lt;/p&gt;

&lt;p&gt;These patterns aren't theoretical. They're practical and don't require special tooling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://karun.me/assets/images/posts/2026-03-19-agentic-patterns-developers-should-steal/cover.png" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhesd60z9bx0m2bp6oj7p.png" alt="A figure crossing a bridge from a chaotic single-screen setup to an organised multi-station workspace" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Patterns
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;What most devs currently do&lt;/th&gt;
&lt;th&gt;What devs should be doing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deterministic tool delegation&lt;/td&gt;
&lt;td&gt;Ask AI to do everything&lt;/td&gt;
&lt;td&gt;Use tools for solved problems, AI orchestrates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verification loops&lt;/td&gt;
&lt;td&gt;Accept first output&lt;/td&gt;
&lt;td&gt;Generate → evaluate → revise&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context engineering&lt;/td&gt;
&lt;td&gt;Dump everything in&lt;/td&gt;
&lt;td&gt;Curate what the model sees&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Upfront planning&lt;/td&gt;
&lt;td&gt;One big prompt&lt;/td&gt;
&lt;td&gt;Reviewable plan before execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persistent memory&lt;/td&gt;
&lt;td&gt;Start fresh each session&lt;/td&gt;
&lt;td&gt;Cross-session learning, codified constraints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structured guardrails&lt;/td&gt;
&lt;td&gt;Hope for the best&lt;/td&gt;
&lt;td&gt;Execution-layer constraints, hooks, gates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Look at the output&lt;/td&gt;
&lt;td&gt;Structured traces, quality measurement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-agent specialisation&lt;/td&gt;
&lt;td&gt;One agent does everything&lt;/td&gt;
&lt;td&gt;Separate agents for separate concerns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human-in-the-loop checkpoints&lt;/td&gt;
&lt;td&gt;Trust everything or nothing&lt;/td&gt;
&lt;td&gt;Consequence-based approval tiers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here's what each one looks like. Some link to deeper posts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deterministic Tool Delegation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The pattern:&lt;/strong&gt; Don't let the AI make decisions it doesn't need to make. If a deterministic tool can handle something (refactoring, formatting, linting, data validation), use the tool. The AI's job is orchestration, not execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What most developers do instead:&lt;/strong&gt; Ask the AI to rewrite code for a rename, follow a style guide from memory, or process data it doesn't need to see.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; Every unnecessary decision is a degree of freedom. Every degree of freedom is an opportunity to get something wrong, burn tokens, and produce a result you can't reproduce. Deterministic tools give you the same output every time.&lt;/p&gt;

&lt;p&gt;I wrote about this in depth in &lt;a href="https://dev.to/javatarz/the-unix-philosophy-for-agentic-coding-112p"&gt;The Unix Philosophy for Agentic Coding&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Verification Loops
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The pattern:&lt;/strong&gt; Instead of accepting the first output, create a generate-evaluate-revise cycle. The agent produces work, a separate pass critiques it against explicit criteria, and the agent revises.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What most developers do instead:&lt;/strong&gt; Prompt, receive, accept or reject. The interaction model is single-shot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; LLMs produce plausible output that can be subtly wrong. Research shows &lt;a href="https://www.anthropic.com/research/building-effective-agents" rel="noopener noreferrer"&gt;10-20 percentage point improvements&lt;/a&gt; on coding benchmarks from reflection alone. Anthropic's own guidance identifies the evaluator-optimizer workflow as one of the core composable patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this looks like in practice:&lt;/strong&gt; After asking your AI assistant to implement a feature, follow up with: "Review what you just wrote. Check for edge cases, error handling, and whether it follows patterns in this codebase. List problems, then fix them." For high-stakes changes, use a separate session as an independent reviewer.&lt;/p&gt;
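
&lt;p&gt;If you script your assistant, the same cycle can be written as a loop. This is a rough sketch, assuming you supply the three callables yourself. They are placeholders for whatever model calls or checks you use, not functions from a real SDK.&lt;/p&gt;

```python
from typing import Callable


def verification_loop(
    generate: Callable[[str], str],
    evaluate: Callable[[str], list[str]],
    revise: Callable[[str, list[str]], str],
    task: str,
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Generate once, then critique and revise until clean or out of rounds."""
    draft = generate(task)
    problems = evaluate(draft)
    rounds = 0
    while problems and rounds != max_rounds:
        draft = revise(draft, problems)
        problems = evaluate(draft)
        rounds += 1
    # Any problems still listed here are the ones a human needs to look at.
    return draft, problems
```

&lt;p&gt;The round limit matters: without it, a critic that always finds something would loop forever and burn tokens.&lt;/p&gt;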

&lt;p&gt;This pattern is also the foundation of test-driven development with AI: write the test first, let the AI implement, then the test itself becomes the verification loop. I've touched on this in the &lt;a href="https://dev.to/javatarz/intelligent-engineering-in-practice-41kf#3-tdd-implementation"&gt;TDD workflow in intelligent Engineering: In Practice&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context Engineering
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The pattern:&lt;/strong&gt; Deliberately architect what information the model sees, when it sees it, and in what form. Treat context as a finite resource, not an infinite scratchpad.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What most developers do instead:&lt;/strong&gt; Paste entire files, full error logs, and broad descriptions, trusting the model to extract what's relevant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; Including irrelevant data actively worsens output quality. Models have attention patterns that favour the start and end of context, with the middle getting less focus. More context is not always better context.&lt;/p&gt;

&lt;p&gt;I wrote a full post on this: &lt;a href="https://dev.to/javatarz/context-engineering-for-ai-assisted-development-b8i"&gt;Context Engineering for AI-Assisted Development&lt;/a&gt;. The short version: curate your CLAUDE.md for signal density, use &lt;code&gt;.claudeignore&lt;/code&gt; to exclude noise, provide the two or three most relevant files rather than the entire directory, and start fresh sessions when context degrades.&lt;/p&gt;

&lt;h3&gt;
  
  
  Upfront Planning
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The pattern:&lt;/strong&gt; Before any code is written, create an explicit plan that decomposes the work into steps with dependencies and acceptance criteria. Review the plan before execution begins.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What most developers do instead:&lt;/strong&gt; Give the AI a single prompt describing what they want and let it figure out the approach. "Add user authentication" becomes one big prompt rather than a sequence of reviewable steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; Internal planning by the model is invisible and unreviewable. An explicit plan is where you catch architectural mistakes that are expensive to fix after implementation. It also prevents the "AI rewrote half the codebase and something is broken but I don't know where" problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this looks like in practice:&lt;/strong&gt; For any task that touches more than two files: "Before implementing, create a plan. List the files you'll modify, the changes in each, the order of changes, and how you'll verify each step works." Review the plan before saying "proceed."&lt;/p&gt;

&lt;p&gt;This is central to the &lt;a href="https://dev.to/javatarz/intelligent-engineering-in-practice-41kf#2-design-discussion"&gt;design discussion workflow&lt;/a&gt; I use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Persistent Memory
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The pattern:&lt;/strong&gt; Retain lessons, decisions, and discovered patterns across sessions. Build institutional knowledge over time rather than starting from zero each conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What most developers do instead:&lt;/strong&gt; Every session starts fresh. They rediscover the same issues, re-explain the same conventions, and re-learn the same codebase quirks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; Without cross-session memory, the AI makes the same mistakes repeatedly and you correct it repeatedly. Codified constraints prevent the same mistakes from recurring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this looks like in practice:&lt;/strong&gt; Maintain a CLAUDE.md that evolves. When you discover a gotcha ("the payments service returns 200 even on failures, check the response body"), add it immediately. When the AI makes a mistake, codify the prevention rule. Over time, your context docs accumulate the institutional knowledge that makes the AI genuinely useful on your specific project.&lt;/p&gt;

&lt;p&gt;I cover this in detail in the &lt;a href="https://dev.to/javatarz/intelligent-engineering-in-practice-41kf#level-1-foundation"&gt;Foundation&lt;/a&gt; and &lt;a href="https://dev.to/javatarz/intelligent-engineering-in-practice-41kf#level-2-context-documentation"&gt;Context Documentation&lt;/a&gt; layers of the intelligent Engineering stack.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structured Guardrails
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The pattern:&lt;/strong&gt; Define explicit boundaries around which decisions the AI can make autonomously and which it should escalate. This includes architectural constraints ("don't introduce a new database without discussing it"), scope boundaries ("only modify files in this module"), and approval gates for high-impact changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What most developers do instead:&lt;/strong&gt; Give the AI full autonomy without defining what's in and out of scope. The agent makes architectural decisions, introduces new patterns, or changes public APIs without checking whether that's what you intended.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; A prompt might be ignored as context fills up. A pre-commit hook won't be. Deterministic enforcement catches what prompt-based instructions miss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this looks like in practice:&lt;/strong&gt; Define boundaries in your CLAUDE.md ("never modify migration files without asking"). Use pre-commit hooks for formatting, linting, and security checks. Set up Claude Code hooks for auto-formatting and blocking sensitive operations. Let low-risk operations run freely. Pause high-risk ones for review.&lt;/p&gt;
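
&lt;p&gt;As a sketch of the deterministic half, here's the core of a check that refuses changes to protected paths. The patterns and names are hypothetical. Note that &lt;code&gt;fnmatch&lt;/code&gt; lets &lt;code&gt;*&lt;/code&gt; match across &lt;code&gt;/&lt;/code&gt;, so these patterns are looser than gitignore globs.&lt;/p&gt;

```python
from fnmatch import fnmatch

# Hypothetical protected patterns; adjust to your repository layout.
PROTECTED = ("db/migrations/*", "*.pem", ".env")


def blocked_paths(changed: list[str], patterns: tuple[str, ...] = PROTECTED) -> list[str]:
    """Return the changed paths that match any protected pattern."""
    return [
        path for path in changed
        if any(fnmatch(path, pattern) for pattern in patterns)
    ]


print(blocked_paths(["src/app.py", "db/migrations/001_init.sql"]))
```

&lt;p&gt;A hook script could pass it the output of &lt;code&gt;git diff --cached --name-only&lt;/code&gt; and exit non-zero when the result is not empty.&lt;/p&gt;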

&lt;p&gt;I wrote a hands-on tutorial on this: &lt;a href="https://dev.to/javatarz/level-up-code-quality-with-an-ai-assistant-5cdn"&gt;Level Up Code Quality with an AI Assistant&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observability
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The pattern:&lt;/strong&gt; Systematically track what the AI did, what worked, and what failed, then use that data to improve future interactions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What most developers do instead:&lt;/strong&gt; Look at the output. No structured feedback, no trend tracking, no quality measurement over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; The &lt;a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/" rel="noopener noreferrer"&gt;METR study&lt;/a&gt; found that developers expected AI to make them 24% faster, and still believed afterwards that it had sped them up by about 20%, when they were actually 19% slower. Gut feel is unreliable. Without measurement, you don't know if the AI is helping, and you can't systematically improve your workflows.&lt;/p&gt;

&lt;p&gt;This is the least mature pattern in the list. The tooling barely exists for individuals and is fragmented across teams. I explore the current state, the gaps, and what we'd like to see in &lt;a href="https://dev.to/javatarz/observability-for-ai-assisted-development-2m06"&gt;Observability for AI-Assisted Development&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Agent Specialisation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The pattern:&lt;/strong&gt; Instead of one generalist agent handling everything, use multiple specialised agents with focused context, specific tool access, and defined roles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What most developers do instead:&lt;/strong&gt; One session, one agent: planning, implementation, and review all happen in the same context window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; Each agent gets a fresh, focused context window rather than one bloated context trying to hold planning, implementation, review, and testing simultaneously. Specialisation also lets you use different models for different tasks (a thinking model for planning, a fast model for implementation).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this looks like in practice:&lt;/strong&gt; Claude Code recently started offering to clear context when you accept a plan, giving the implementation phase a fresh, focused window with only the plan carried forward. Planning and implementation benefit from separate contexts.&lt;/p&gt;

&lt;p&gt;Take it further. Build an agentic team with a backlog: a planning agent that decomposes work into tasks, implementation agents that execute them, QA agents that test, and review agents that validate. Each agent has specific skills and focused context for its role. Claude Code's &lt;a href="https://code.claude.com/docs/en/agent-teams" rel="noopener noreferrer"&gt;Agent Teams&lt;/a&gt; and subagent features support this natively. Anthropic's engineering team &lt;a href="https://www.anthropic.com/engineering/building-c-compiler" rel="noopener noreferrer"&gt;built an entire C compiler&lt;/a&gt; using a team of 16 parallel agents, producing a 100,000-line compiler written in Rust. Codex has &lt;a href="https://developers.openai.com/codex/multi-agent/" rel="noopener noreferrer"&gt;similar multi-agent capabilities&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Anthropic's internal benchmarks showed a &lt;a href="https://www.anthropic.com/engineering/multi-agent-research-system" rel="noopener noreferrer"&gt;90% improvement&lt;/a&gt; with multi-agent (Opus lead + Sonnet subagents) over solo Opus on complex tasks. &lt;a href="https://www.augmentcode.com/customers/Tekion-enabled-AI-agents" rel="noopener noreferrer"&gt;Tekion&lt;/a&gt; deployed persona-driven agents across 1,300 engineers and saw 50-85% productivity gains, compared to 30-40% with raw LLM prompting. The trade-off is tokens: multi-agent workflows use 2-3x more tokens, but for significant features, the quality improvement justifies the cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Human-in-the-Loop Checkpoints
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The pattern:&lt;/strong&gt; Rather than either fully trusting the AI or micromanaging every line, define structured approval gates based on the consequence of the action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What most developers do instead:&lt;/strong&gt; Operate in one of two modes. Either review everything line-by-line (treating the AI as fancy autocomplete) or accept large chunks with only a cursory glance. A formatting change and a database schema change get the same level of scrutiny.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; Not all changes carry the same risk. A tiered approach gives you speed where it's safe and control where it matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this looks like in practice:&lt;/strong&gt; Define personal approval tiers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auto-approve:&lt;/strong&gt; Formatting, import organisation, adding type annotations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quick review:&lt;/strong&gt; New functions, test additions, single-file refactors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Careful review:&lt;/strong&gt; Public API changes, database operations, auth logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full review with plan:&lt;/strong&gt; Multi-file refactors, new architectural patterns, build/deploy changes&lt;/li&gt;
&lt;/ul&gt;
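
&lt;p&gt;The tiers above can be written down as data so that the strictest applicable tier wins. This is a sketch under assumptions: the change labels are hypothetical, and how a change gets labelled (by you, by a script, by the assistant) is left open.&lt;/p&gt;

```python
# Tiers ordered from least to most scrutiny, as in the list above.
TIERS = ["auto-approve", "quick review", "careful review", "full review with plan"]

# Hypothetical change labels mapped to the tier they require.
TIER_BY_KIND = {
    "formatting": "auto-approve",
    "imports": "auto-approve",
    "type-annotations": "auto-approve",
    "new-function": "quick review",
    "tests": "quick review",
    "single-file-refactor": "quick review",
    "public-api": "careful review",
    "database": "careful review",
    "auth": "careful review",
    "multi-file-refactor": "full review with plan",
    "new-architecture": "full review with plan",
    "build-deploy": "full review with plan",
}


def required_tier(kinds: list[str]) -> str:
    """Return the strictest tier among the kinds; unknown kinds get careful review."""
    tiers = [TIER_BY_KIND.get(kind, "careful review") for kind in kinds]
    return max(tiers, key=TIERS.index, default="auto-approve")


print(required_tier(["formatting", "database"]))  # prints careful review
```

&lt;p&gt;Defaulting unknown labels to careful review is a deliberate choice here: a change you can't classify shouldn't slip through on the fast path.&lt;/p&gt;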

&lt;p&gt;Use small, frequent git commits as checkpoints. If something goes wrong, you can revert to a known-good state without losing everything. Before accepting a change, ask yourself: if this is wrong, what breaks and how hard is it to fix?&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Start
&lt;/h2&gt;

&lt;p&gt;You don't need all nine patterns at once. Start with the ones that address your biggest pain points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Code quality issues?&lt;/strong&gt; Start with structured guardrails and verification loops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI keeps making the same mistakes?&lt;/strong&gt; Start with persistent memory and context engineering.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large diffs that are hard to review?&lt;/strong&gt; Start with upfront planning and human-in-the-loop checkpoints.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spending too much on tokens?&lt;/strong&gt; Start with deterministic tool delegation and context engineering.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not sure if AI is helping?&lt;/strong&gt; Observability is still largely unsolved, but start by establishing baselines now so you can measure later.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stop handing the AI the whole problem. Break it down and use the right tool for each step.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part of a series on applying patterns from agentic systems to AI-assisted development. See also: &lt;a href="https://dev.to/javatarz/the-unix-philosophy-for-agentic-coding-112p"&gt;The Unix Philosophy for Agentic Coding&lt;/a&gt; and &lt;a href="https://dev.to/javatarz/observability-for-ai-assisted-development-2m06"&gt;Observability for AI-Assisted Development&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Observability for AI-Assisted Development</title>
      <dc:creator>Karun Japhet</dc:creator>
      <pubDate>Sat, 14 Mar 2026 12:57:41 +0000</pubDate>
      <link>https://dev.to/javatarz/observability-for-ai-assisted-development-2m06</link>
      <guid>https://dev.to/javatarz/observability-for-ai-assisted-development-2m06</guid>
      <description>&lt;p&gt;Developers expected AI to make them 24% faster. A randomised controlled trial measured them at 19% slower.&lt;/p&gt;

&lt;p&gt;That's from METR's &lt;a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/" rel="noopener noreferrer"&gt;2025 study&lt;/a&gt;. These were experienced open-source developers working on their own codebases with tools they chose. Their forecast was off by over 40 percentage points, and even afterwards they believed AI had sped them up by about 20%.&lt;/p&gt;

&lt;p&gt;If your perception of AI's impact is that unreliable, what are you actually measuring?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://karun.me/assets/images/posts/2026-03-12-observability-for-ai-assisted-development/cover.png" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fajkvd66bzyxn901ttc53.png" alt="A figure in a boat on foggy water, holding a lantern that barely illuminates the surrounding mist" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  You Need a Baseline First
&lt;/h2&gt;

&lt;p&gt;If you didn't measure before AI, measuring with AI won't work.&lt;/p&gt;

&lt;p&gt;You can't attribute improvements to AI if you don't know what "before" looked like. Cycle time, deployment frequency, change failure rate, MTTR, value delivered per sprint: these need to exist as baselines before you introduce a new variable. Otherwise you're guessing, and as the METR study shows, our guesses aren't great.&lt;/p&gt;
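
&lt;p&gt;A baseline doesn't need a platform to get started. Here's a minimal sketch that computes two of those numbers from deployment records. The &lt;code&gt;Deployment&lt;/code&gt; shape is an assumption for illustration, and your own data will likely come from your CI or incident tooling in a different form.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Deployment:
    first_commit: datetime   # when work on the change started
    deployed: datetime       # when it reached production
    caused_incident: bool    # whether it needed a fix or rollback


def baseline(deployments: list[Deployment]) -> dict[str, float]:
    """Summarise median cycle time (hours) and change failure rate."""
    if not deployments:
        raise ValueError("need at least one deployment to form a baseline")
    hours = [
        (d.deployed - d.first_commit).total_seconds() / 3600
        for d in deployments
    ]
    failures = sum(d.caused_incident for d in deployments)
    return {
        "median_cycle_time_hours": median(hours),
        "change_failure_rate": failures / len(deployments),
    }
```

&lt;p&gt;Capture these over a period before AI adoption and again after, and you have something better than gut feel to compare.&lt;/p&gt;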

&lt;p&gt;I've seen teams adopt AI coding assistants and then ask "how do we know it's helping?" three months later. The real question should have been asked six months earlier: "how do we measure effectiveness?" If you didn't have that defined before AI, you won't have it now.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Exists Today
&lt;/h2&gt;

&lt;p&gt;The tooling for observability in AI-assisted development is fragmented. Cost visibility is reasonable. Quality visibility is nearly zero.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code&lt;/strong&gt; is the most transparent. It ships with native &lt;a href="https://code.claude.com/docs/en/monitoring-usage" rel="noopener noreferrer"&gt;OpenTelemetry support&lt;/a&gt;, tracking tokens, cost, tool calls, and session duration. The &lt;code&gt;/cost&lt;/code&gt; command shows real-time spend. &lt;code&gt;/stats&lt;/code&gt; visualises daily usage, session history, and model preferences. &lt;code&gt;/insights&lt;/code&gt; goes further, analysing your sessions to surface project areas, interaction patterns, and friction points. Commits are auto-tagged with a co-author line, giving you a built-in "was this AI-generated?" marker in your git history. Anthropic provides an &lt;a href="https://github.com/anthropics/claude-code-monitoring-guide" rel="noopener noreferrer"&gt;official monitoring guide&lt;/a&gt; with Grafana dashboard configs and a Docker Compose setup, and the community has built &lt;a href="https://grafana.com/grafana/dashboards/24640-claude-code-victoriastack/" rel="noopener noreferrer"&gt;importable dashboards&lt;/a&gt; and &lt;a href="https://grafana.com/grafana/plugins/timurdigital-claudestats-app/" rel="noopener noreferrer"&gt;plugins&lt;/a&gt;. The infrastructure for collecting data exists. What to do with it is the harder question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI Codex CLI&lt;/strong&gt; tags commits with a co-author line and supports &lt;a href="https://developers.openai.com/codex/cli/" rel="noopener noreferrer"&gt;OTel export&lt;/a&gt; for logs and traces. The &lt;a href="https://developers.openai.com/codex/enterprise/governance/" rel="noopener noreferrer"&gt;enterprise dashboard&lt;/a&gt; tracks daily users by product, code review completion rates, review priority and sentiment, and session-level message counts. It's adoption-focused: who's using what and how much. No quality metrics, no incident correlation, no rework tracking. Individual developers get &lt;code&gt;/status&lt;/code&gt; for rate limits but no cost visibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Aider&lt;/strong&gt; has the &lt;a href="https://aider.chat/docs/git.html" rel="noopener noreferrer"&gt;most configurable commit attribution&lt;/a&gt; of any tool (co-author trailers include the model name). But no OTel, no dashboard, no persistent cost history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub Copilot&lt;/strong&gt; offers &lt;a href="https://docs.github.com/en/copilot/concepts/copilot-usage-metrics/copilot-metrics" rel="noopener noreferrer"&gt;team-level dashboards&lt;/a&gt;: acceptance rates, DAU/MAU, feature adoption. It's oriented toward "is our license worth it?" rather than "is the output good?" No commit tagging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cursor&lt;/strong&gt; exposes very little. A "Year in Code" summary and an "AI Share of Committed Code" metric. No tracing, no commit tagging, no event-level data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cline&lt;/strong&gt; shows per-request cost in the UI (one of its standout features) and supports &lt;a href="https://docs.cline.bot/more-info/telemetry" rel="noopener noreferrer"&gt;OTel export at the enterprise tier&lt;/a&gt;. No commit tagging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Q Developer&lt;/strong&gt; has the &lt;a href="https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/dashboard.html" rel="noopener noreferrer"&gt;richest built-in analytics dashboard&lt;/a&gt; of any tool: acceptance rates, lines of code by feature type, code review counts, per-language breakdowns. But it's admin-only, subscription-based (no per-token tracking), and publishes to CloudWatch rather than OTel.&lt;/p&gt;

&lt;p&gt;Some of us have built our own layers on top. We use &lt;a href="https://github.com/Maciek-roboblog/Claude-Code-Usage-Monitor" rel="noopener noreferrer"&gt;Claude Code Usage Monitor&lt;/a&gt; to track token usage as a proxy for understanding consumption patterns. It isn't perfect and isn't always accurate, but it gives you a feel for where your usage goes. A few engineers on our teams have personal Grafana dashboards tracking their own AI metrics. But these aren't centralised, aren't standardised, and aren't as useful as they could be.&lt;/p&gt;

&lt;p&gt;The picture across the industry: cost visibility is reasonable if you're willing to set it up. Commit tagging is inconsistent (Claude Code and Codex do it by default, most others don't). Quality visibility is nearly zero everywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Missing
&lt;/h2&gt;

&lt;p&gt;The gaps fall into three levels: what individual developers need, what teams need, and what organisations need.&lt;/p&gt;

&lt;h3&gt;
  
  
  For the Individual Developer
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;No effort distribution.&lt;/strong&gt; You know how much you spent in tokens. You don't know where that effort went. Imagine if your AI assistant could tell you: "This week, 40% of your AI time went to test writing, 30% to refactoring, 20% to feature work, 10% to debugging. Your test-writing sessions had the highest acceptance rate. Your debugging sessions cost the most tokens per useful output." That would let you consciously decide where AI is worth using and where you're better off working without it.&lt;/p&gt;
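
&lt;p&gt;To make the idea concrete, here is a minimal sketch of what such a report could compute. It assumes session records that carry a task type, minutes spent, tokens used, and a count of accepted changes. No tool exports data in this shape today, so the record layout, the &lt;code&gt;SESSIONS&lt;/code&gt; sample, and the &lt;code&gt;effort_distribution&lt;/code&gt; name are all hypothetical.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical session records: (task_type, minutes, tokens, accepted_changes).
# No tool exports data in this shape today; the fields are assumptions.
SESSIONS = [
    ("test-writing", 120, 40_000, 20),
    ("test-writing", 120, 50_000, 25),
    ("refactoring", 180, 90_000, 18),
    ("feature-work", 120, 80_000, 10),
    ("debugging", 60, 120_000, 4),
]


def effort_distribution(sessions):
    """Return per-task share of AI time (percent) and tokens per accepted change."""
    minutes = defaultdict(int)
    tokens = defaultdict(int)
    accepted = defaultdict(int)
    for task, mins, toks, acc in sessions:
        minutes[task] += mins
        tokens[task] += toks
        accepted[task] += acc

    total_minutes = sum(minutes.values())
    report = {}
    for task in minutes:
        report[task] = {
            "time_share_pct": round(100 * minutes[task] / total_minutes),
            # Guard against division by zero when nothing was accepted.
            "tokens_per_accepted": tokens[task] // accepted[task] if accepted[task] else None,
        }
    return report


if __name__ == "__main__":
    for task, row in effort_distribution(SESSIONS).items():
        print(f"{task}: {row['time_share_pct']}% of time, "
              f"{row['tokens_per_accepted']} tokens per accepted change")
```

&lt;p&gt;The arithmetic is trivial. The hard part, which this sketch skips entirely, is classifying sessions by task type without asking the developer to tag them by hand.&lt;/p&gt;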

&lt;p&gt;&lt;strong&gt;Limited failure pattern detection.&lt;/strong&gt; Claude Code's &lt;code&gt;/insights&lt;/code&gt; is the closest thing we have: it analyses sessions and surfaces friction points. That's a real start, and most other tools don't offer anything comparable. But it's still a snapshot of recent sessions, not a long-running trend line. If the AI keeps making the same category of mistake (wrong import paths, ignoring your test conventions, using a deprecated API), you want something that surfaces "you've corrected the AI on import paths 12 times this month" and suggests adding it to your CLAUDE.md. Some people maintain a manual &lt;code&gt;lessons-learned.md&lt;/code&gt; where they log AI mistakes. It works, but it's ad hoc.&lt;/p&gt;
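
&lt;p&gt;A manual log can be made a little less ad hoc with a small counting script. The sketch below assumes each entry in the log is a single line of the form &lt;code&gt;date | category | note&lt;/code&gt;. That format, the sample entries, and the &lt;code&gt;top_corrections&lt;/code&gt; helper are illustrative assumptions, not something any tool emits.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical correction log: one "date | category | note" entry per line.
# The format is an assumption for illustration, not something a tool emits.
LOG = """\
2026-05-02 | import-paths | used relative import in service layer
2026-05-03 | test-conventions | skipped the builder pattern in tests
2026-05-07 | import-paths | imported from the wrong package
2026-05-09 | deprecated-api | called a removed client method
2026-05-11 | import-paths | missed the module alias
"""


def top_corrections(log_text, limit=3):
    """Count corrections per category and return the most frequent ones."""
    categories = []
    for line in log_text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3:  # skip blank or malformed lines
            categories.append(parts[1])
    return Counter(categories).most_common(limit)


if __name__ == "__main__":
    for category, count in top_corrections(LOG):
        print(f"{category}: corrected {count} times")
```

&lt;p&gt;The category that keeps topping this list is a reasonable candidate for a new rule in your context file.&lt;/p&gt;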

&lt;p&gt;&lt;strong&gt;No context effectiveness feedback.&lt;/strong&gt; CLAUDE.md files are checked in, reviewed in PRs, and engineered for effectiveness over time, much like prompts. The feedback loop exists but it's manual and slow. You notice the AI getting something wrong, update the file, and see if it improves. What's missing is the measurement that closes the loop: did that change actually improve output quality, or did it just feel like it did? The METR perception gap applies here too.&lt;/p&gt;

&lt;h3&gt;
  
  
  For the Team
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;No aggregate failure patterns.&lt;/strong&gt; If three engineers on the same team are all hitting the same AI failure mode, that's not three individual problems. It's a systemic context gap: a missing architectural convention, an undocumented pattern, a guardrail that should exist but doesn't. No tool surfaces this today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No RCA correlation.&lt;/strong&gt; Claude Code tags commits with a co-author line. That's the "was this AI-generated?" link in the RCA chain. But most other tools don't do this consistently. And even with the tag, nobody is aggregating that data: correlating AI-tagged commits with incident rates, rework rates, or review times over time. Traditional RCA follows a clear chain (incident → deployment → commit → PR → review → root cause). AI adds new questions to that chain: was the reviewer's miss caused by a large AI-generated diff? Was the AI missing context it should have had? Is this a known AI weakness that should be in the team's guardrails?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The velocity flatline problem.&lt;/strong&gt; We've seen this firsthand. Teams get faster with AI. Then velocity flattens. Not because AI stopped helping, but because teams redirected the extra capacity to paying off debt or solving problems they found interesting. That's not necessarily bad, but if you're not tracking what work goes where, you can't tell the difference between "team is investing in sustainability" and "team is coasting."&lt;/p&gt;

&lt;p&gt;The fix we found: track work against cards. Measure total value delivered, not just pace. Make sure the extra capacity from AI shows up as increased value, not just different work. This is a process fix, not a tooling fix. No observability tool surfaces this today.&lt;/p&gt;

&lt;h3&gt;
  
  
  For the Organisation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;No cross-team maturity view.&lt;/strong&gt; Some teams will be excellent at AI-assisted development. Others will struggle. As a CTO, you need to know which is which, and more importantly, what the effective teams are doing differently. Are they better at context engineering? More disciplined about review? Today, finding this out requires manual investigation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No automated "are we improving?" picture.&lt;/strong&gt; This is the hardest gap. Drawing a full picture of whether an engineering organisation is improving has always required someone to build that view manually. AI hasn't changed that. It's just added another variable.&lt;/p&gt;

&lt;p&gt;The data exists. Commits are tagged. Tickets track value. CI tracks quality. AI tools track cost and usage. But nobody is stitching them into a coherent picture that answers: "Is AI helping us deliver more value, or is it making us feel faster while quality degrades?"&lt;/p&gt;

&lt;h2&gt;
  
  
  What We'd Like to See
&lt;/h2&gt;

&lt;p&gt;Here's what I wish existed:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI timesheets.&lt;/strong&gt; Not for billing. For self-awareness. Show me where my AI time goes, which task types have the best return, and where I'm burning tokens for low value. Let me compare across weeks and see trends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automated RCA tagging.&lt;/strong&gt; Correlate AI-tagged commits with downstream incidents, reverts, and rework. Not to blame the tool, but to know where to invest in better review, context, or guardrails.&lt;/p&gt;
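
&lt;p&gt;As a rough illustration of the correlation step, the sketch below compares revert rates for AI-tagged and untagged commits. It assumes you already have commit messages (for example from &lt;code&gt;git log&lt;/code&gt;) and a flag for whether each commit was later reverted. The sample records, the &lt;code&gt;AI_MARKERS&lt;/code&gt; list, and the function names are hypothetical. Real co-author trailers also carry an email address, which the sample omits.&lt;/p&gt;

```python
# Hypothetical commit records: (sha, message, was_later_reverted).
# In practice the messages would come from `git log` and the revert flag
# from your own incident or revert tracking; both are assumptions here.
COMMITS = [
    ("a1", "Add profile endpoint\n\nCo-authored-by: Claude", False),
    ("b2", "Fix date parsing", False),
    ("c3", "Refactor billing\n\nCo-authored-by: Claude", True),
    ("d4", "Update README", False),
    ("e5", "Add SSN encryption\n\nCo-authored-by: Codex", False),
    ("f6", "Tune retry policy", True),
]

AI_MARKERS = ("claude", "codex")  # assumed trailer names; adjust per tool


def is_ai_tagged(message):
    """True when a Co-authored-by trailer names one of the assumed AI markers."""
    for line in message.splitlines():
        lowered = line.strip().lower()
        if lowered.startswith("co-authored-by:"):
            if any(marker in lowered for marker in AI_MARKERS):
                return True
    return False


def revert_rates(commits):
    """Return (ai_rate, human_rate): the fraction of each group later reverted."""
    groups = {True: [0, 0], False: [0, 0]}  # tagged -> [reverted, total]
    for _sha, message, reverted in commits:
        bucket = groups[is_ai_tagged(message)]
        bucket[0] += int(reverted)
        bucket[1] += 1
    return tuple(
        bucket[0] / bucket[1] if bucket[1] else 0.0
        for bucket in (groups[True], groups[False])
    )


if __name__ == "__main__":
    ai_rate, human_rate = revert_rates(COMMITS)
    print(f"AI-tagged revert rate: {ai_rate:.0%}")
    print(f"Untagged revert rate: {human_rate:.0%}")
```

&lt;p&gt;A difference between the two rates is a prompt for investigation, not a verdict. Commit size, task type, and reviewer load would all need controlling for before drawing conclusions.&lt;/p&gt;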

&lt;p&gt;&lt;strong&gt;Context effectiveness scoring.&lt;/strong&gt; When I change my CLAUDE.md, show me whether output quality improved for the task types I was targeting. Even a rough signal (fewer corrections needed, lower rework rate) would be valuable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure pattern aggregation.&lt;/strong&gt; Surface repeated AI mistakes at the team level. If the same failure shows up across engineers, flag it as a context gap, not an individual problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The org-wide picture, stitched together.&lt;/strong&gt; Combine git data, ticket data, CI data, and AI usage data into a view that answers: are we delivering more value? Is quality holding? Where should we invest next?&lt;/p&gt;

&lt;h2&gt;
  
  
  Questions for Solution Builders
&lt;/h2&gt;

&lt;p&gt;If you're building in this space, here are the questions I'd want answered:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Can the "are we improving?" picture be automated?&lt;/strong&gt; The data is there (git, tickets, CI, AI usage). Can you stitch it together without someone manually maintaining a dashboard? Can you infer value delivery trends from data that already exists?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;How do you measure context effectiveness without controlled experiments?&lt;/strong&gt; A/B testing CLAUDE.md configurations isn't practical in real workflows. What proxy signals can tell us whether a context change helped?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What does a useful AI timesheet look like?&lt;/strong&gt; Not session-level token counts, but task-level effort distribution. How do you classify AI sessions by task type without requiring the developer to manually tag them?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;How do you surface failure patterns across a team?&lt;/strong&gt; Individual correction patterns are noisy. Aggregate patterns are signal. What's the right level of abstraction?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;How do you separate "AI made us faster" from "we redirected capacity"?&lt;/strong&gt; Velocity metrics alone can't tell you this. What combination of signals can?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;How do you handle the perception gap?&lt;/strong&gt; Developers believe they're faster. Measurement sometimes shows otherwise. How do you present this data in a way that's constructive rather than demoralising?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These aren't rhetorical questions. If you're building tools in this space, I'd like to hear your answers.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is the second post in a series on applying patterns from agentic systems to everyday AI-assisted development. The first, &lt;a href="https://dev.to/javatarz/the-unix-philosophy-for-agentic-coding-112p"&gt;The Unix Philosophy for Agentic Coding&lt;/a&gt;, covers deterministic tool delegation.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>The Unix Philosophy for Agentic Coding</title>
      <dc:creator>Karun Japhet</dc:creator>
      <pubDate>Sat, 14 Mar 2026 12:57:26 +0000</pubDate>
      <link>https://dev.to/javatarz/the-unix-philosophy-for-agentic-coding-112p</link>
      <guid>https://dev.to/javatarz/the-unix-philosophy-for-agentic-coding-112p</guid>
      <description>&lt;p&gt;Most people use AI coding agents backwards. They hand the agent a problem and ask it to solve the whole thing. The agent reads, reasons, generates, and hopes for the best.&lt;/p&gt;

&lt;p&gt;There's a better way. One that's cheaper, more predictable, and already well understood. It's the &lt;a href="https://en.wikipedia.org/wiki/Unix_philosophy" rel="noopener noreferrer"&gt;Unix philosophy&lt;/a&gt;, applied to how we work with AI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://karun.me/assets/images/posts/2026-03-05-the-unix-philosophy-for-agentic-coding/cover.png" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh00aqm9rviog128pgikv.png" alt="A robotic conductor directing an orchestra of developer tools" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern
&lt;/h2&gt;

&lt;p&gt;The Unix philosophy boils down to: do one thing well, compose small tools, let the shell orchestrate. When you work with an AI coding agent, the agent is the shell.&lt;/p&gt;

&lt;p&gt;Here's how I think about it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Break the problem down.&lt;/strong&gt; Don't hand the agent a big, vague goal. Decompose it into sub-problems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If a tool exists, use it.&lt;/strong&gt; Refactoring, formatting, linting, deployment: these are solved problems. Don't ask the AI to reinvent them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If no tool exists, build one.&lt;/strong&gt; A small, deterministic script is better than an LLM making judgment calls where none are needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The agent orchestrates.&lt;/strong&gt; It decides what to do, in what order, with which tools. That's where its intelligence adds value.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The principle is simple: &lt;strong&gt;don't let AI make decisions it doesn't need to make.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every unnecessary decision is a degree of freedom. Every degree of freedom is an opportunity for the model to get something wrong, burn tokens, and produce a result you can't reproduce.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Goes Wrong Without This
&lt;/h2&gt;

&lt;p&gt;When you ask an AI agent to do something a deterministic tool already handles, you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inconsistency.&lt;/strong&gt; LLMs aren't deterministic. Run the same prompt twice, get different results. A tool gives you the same output every time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wasted tokens.&lt;/strong&gt; Generating 200 lines of reformatted code costs tokens. Running &lt;a href="https://prettier.io" rel="noopener noreferrer"&gt;Prettier&lt;/a&gt; or &lt;a href="https://docs.astral.sh/ruff/" rel="noopener noreferrer"&gt;Ruff&lt;/a&gt; costs none.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More failure modes.&lt;/strong&gt; The model might miss edge cases a dedicated tool handles by design. A refactoring tool knows about downstream dependencies. An LLM might not.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slower feedback loops.&lt;/strong&gt; Generating code, reviewing it, finding the error, regenerating: that cycle is slower than calling a tool that gets it right the first time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Examples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Refactoring
&lt;/h3&gt;

&lt;p&gt;I want to rename a method. The method is used across dozens of files.&lt;/p&gt;

&lt;p&gt;The naive approach: ask the agent to read the codebase, find all references, and rewrite them. The agent will try. It might miss some. It might introduce a formatting inconsistency along the way. You'll spend time reviewing a diff that's harder to trust.&lt;/p&gt;

&lt;p&gt;The better approach: the agent calls &lt;a href="https://www.jetbrains.com/help/idea/mcp-server.html" rel="noopener noreferrer"&gt;IntelliJ's refactoring tools via MCP&lt;/a&gt;. One command. Every reference updated. Downstream dependencies handled. No formatting changes. No guesswork.&lt;/p&gt;

&lt;p&gt;Refactoring is a solved problem. I wouldn't ask a teammate to do a manual find-and-replace across a codebase. I wouldn't ask an AI agent to either.&lt;/p&gt;

&lt;h3&gt;
  
  
  Analysing CSV Data
&lt;/h3&gt;

&lt;p&gt;I have a set of CSVs I need to extract insights from.&lt;/p&gt;

&lt;p&gt;The naive approach: hand the files to the agent and ask it to read, validate, extract, and summarise everything. The agent will try. It might misparse a column, silently drop malformed rows, or hallucinate a trend that isn't there. You won't know unless you check every step. Large CSVs make this worse. Hundreds of thousands of rows won't fit in a context window, and even if they did, you're burning tokens on data the model doesn't need to see. The agent doesn't know which rows matter until it's processed all of them.&lt;/p&gt;

&lt;p&gt;The better approach: build a small CLI that pre-processes the data first. Validate schemas, flag missing values, confirm row counts, filter to the relevant subset, compute the aggregations that don't need intelligence. This is deterministic work. Then pass the clean, reduced output to the agent for the part that actually needs judgment: identifying patterns and summarising insights.&lt;/p&gt;

&lt;p&gt;No tool existed for this specific validation, so I asked the agent to build one. That's the pattern. Build the tool, then use the tool. The agent wrote a script I can run repeatedly with predictable results. Now it's free to focus on what it's good at.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Formatting
&lt;/h3&gt;

&lt;p&gt;I want my code to follow our team's style guide.&lt;/p&gt;

&lt;p&gt;The naive approach: include the style guide in the prompt and ask the agent to follow it. It will mostly comply. It will sometimes get creative (especially as &lt;a href="https://dev.to/javatarz/context-engineering-for-ai-assisted-development-b8i"&gt;context fills up&lt;/a&gt;). You'll find inconsistencies across files that are annoying to track down.&lt;/p&gt;

&lt;p&gt;The better approach: let the agent write code however it wants, then run &lt;a href="https://prettier.io" rel="noopener noreferrer"&gt;Prettier&lt;/a&gt;, &lt;a href="https://github.com/psf/black" rel="noopener noreferrer"&gt;Black&lt;/a&gt;, &lt;a href="https://docs.astral.sh/ruff/" rel="noopener noreferrer"&gt;Ruff&lt;/a&gt;, or &lt;a href="https://eslint.org" rel="noopener noreferrer"&gt;ESLint&lt;/a&gt;. Zero ambiguity. The agent doesn't need to think about formatting at all, which means fewer tokens spent and fewer decisions that could go wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Skills, Hooks, and Tools
&lt;/h2&gt;

&lt;p&gt;If you use &lt;a href="https://docs.anthropic.com/en/docs/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;, you'll know about skills (composable prompt-driven capabilities) and hooks (event-driven automation). These are the wiring. But wiring without workers doesn't accomplish much.&lt;/p&gt;

&lt;p&gt;A good skill is composable. A great skill is composable and delegates to deterministic tools instead of taking on responsibilities it doesn't need. If a skill invokes a CLI tool, an API, or a build system instead of asking the LLM to reason through a solved problem, that skill will be faster, cheaper, and more reliable.&lt;/p&gt;

&lt;p&gt;The same applies beyond Claude Code. Cursor rules, Windsurf workflows, any AI assistant: the pattern holds. Build your workflows so the AI orchestrates tools rather than replacing them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;This isn't just about code formatting and refactoring. The same principle applies to deployment pipelines, database migrations, CI/CD workflows, building CLIs for business operations. Anywhere a deterministic tool can guarantee a correct result, use it. Reserve the LLM for the parts that genuinely need judgment: understanding intent, choosing an approach, reasoning about trade-offs, writing novel logic.&lt;/p&gt;

&lt;p&gt;Not every problem needs this treatment. For exploratory work, prototyping, or genuinely novel problems, letting the agent roam is the right call. But for the repeatable parts of your workflow, reach for a tool.&lt;/p&gt;

&lt;p&gt;The best AI workflows I've built look like Unix pipelines. Small, focused tools. A smart orchestrator composing them. The AI's value isn't in doing everything. It's in knowing what to do and calling the right tool to do it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Thanks to &lt;a href="https://www.linkedin.com/in/carmenmardiros/" rel="noopener noreferrer"&gt;Carmen Mardiros&lt;/a&gt; whose &lt;a href="https://www.meetup.com/data-engineers-london/events/313209661/" rel="noopener noreferrer"&gt;talk at Data Engineers London&lt;/a&gt; helped crystallize this thinking.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>intelligent Engineering: In Practice</title>
      <dc:creator>Karun Japhet</dc:creator>
      <pubDate>Sat, 03 Jan 2026 18:32:40 +0000</pubDate>
      <link>https://dev.to/javatarz/intelligent-engineering-in-practice-41kf</link>
      <guid>https://dev.to/javatarz/intelligent-engineering-in-practice-41kf</guid>
      <description>&lt;p&gt;Principles are easy. Application is hard.&lt;/p&gt;

&lt;p&gt;I've written about &lt;a href="https://dev.to/javatarz/intelligent-engineering-principles-for-building-with-ai-34aa"&gt;intelligent Engineering principles&lt;/a&gt; and &lt;a href="https://dev.to/javatarz/intelligent-engineering-a-skill-map-for-learning-ai-assisted-development-3kaj"&gt;the skills needed to build with AI&lt;/a&gt;. But I kept getting the same question: "How do I actually set this up on a real project?"&lt;/p&gt;

&lt;p&gt;This post answers that question. I'll walk through the complete setup, using a real repository as a worked example. Not a toy project. Not a weekend experiment. A codebase with architectural decisions, test coverage, documentation, and a clear development workflow.&lt;/p&gt;

&lt;p&gt;Here's what it looks like in action:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/oK0N7pQ5rIY"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  The intelligent Engineering Stack
&lt;/h2&gt;

&lt;p&gt;Before diving into details, here's the mental model I use. intelligent Engineering isn't one thing. It's layers that enable each other:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://karun.me/assets/images/posts/2026-01-02-intelligent-engineering-in-practice/ie-stack.svg"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fkarun.me%2Fassets%2Fimages%2Fposts%2F2026-01-02-intelligent-engineering-in-practice%2Fie-stack.svg" alt="The intelligent Engineering Stack: four layers from Foundation at the bottom, through Context, Interaction, to Workflow at the top"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This diagram shows &lt;a href="https://claude.ai/code/" rel="noopener noreferrer"&gt;Claude Code's&lt;/a&gt; primitives. Other AI assistants have different building blocks: Cursor has rules and &lt;code&gt;.cursorrules&lt;/code&gt;, Windsurf has Cascade workflows. The layers matter more than the specific implementation.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The screencast showed the workflow. The rest of this post explains what makes it work, layer by layer from top to bottom.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two Phases of intelligent Engineering
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Shaping AI&lt;/strong&gt; is preparation. You define agentic workflows, set up tooling, provide context, and build a prompt library. Context includes coding guidelines, architecture patterns, and deployment patterns. This is the work before you write code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Leading AI&lt;/strong&gt; is execution. You move through the development cycle: ideation, research, feature specs, architecture, stories, code, test, deploy. This is the work of building software, with AI as a collaborator.&lt;/p&gt;

&lt;p&gt;The screencast showed Leading AI in action. But Leading AI only works well when Shaping AI is done right. Most of this post focuses on Shaping AI, the setup that makes everything else possible.&lt;/p&gt;

&lt;p&gt;We've written more about this framework on the &lt;a href="https://sahaj.ai/featured-article/realising-efficiency-and-productivity-through-intelligent-engineering/" rel="noopener noreferrer"&gt;Sahaj website&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 4: The Workflow
&lt;/h2&gt;

&lt;p&gt;This workflow mirrors how many teams already work: pick up a card, groom it, discuss the approach, TDD the implementation. Your team might work differently. intelligent Engineering is about shaping AI to match your existing practices, not adopting a prescribed workflow. The stack exists to encode how your team works. You should be &lt;a href="https://dev.to/javatarz/intelligent-engineering-principles-for-building-with-ai-34aa#shape-ai-deliberately"&gt;shaping AI&lt;/a&gt;, not letting it shape you. This example shows one approach; yours will look different.&lt;/p&gt;

&lt;p&gt;Here's what the screencast showed, step by step:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Pick Up Work
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; /pickup

## Available Stories by Priority

### P0 - Critical
- #23 - S01.3: Profile Completion [module:customer]

Which card would you like to pick up?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Design Discussion
&lt;/h3&gt;

&lt;p&gt;Here's what the design discussion looks like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I've read #23 and the customer domain docs. Here's my understanding:

**What we're building:**
- Profile completion endpoint (PUT /api/v1/customers/{id}/profile)
- Validation: name, DOB (18+), SSN format, address
- SSN encryption before storage

**Proposed approach:**
1. Add ProfileRequest DTO with validation annotations
2. Extend CustomerService with completeProfile method
3. Add encryption service for SSN

**Questions:**
- Should SSN validation reject obvious invalid patterns (000-00-0000)?
- Should we allow profile updates after initial completion?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is Leading AI in action. The AI proposes, the human validates and refines. Agreement on approach before code.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. TDD Implementation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://martinfowler.com/bliki/TestDrivenDevelopment.html" rel="noopener noreferrer"&gt;Test-Driven Development&lt;/a&gt; (TDD) is a discipline where you write tests before code. The cycle is Red-Green-Refactor: write a failing test (red), write the minimum code to make it pass (green), clean up (refactor). Kent Beck's &lt;a href="https://www.oreilly.com/library/view/test-driven-development/0321146530/" rel="noopener noreferrer"&gt;Test-Driven Development by Example&lt;/a&gt; is the canonical reference.&lt;/p&gt;

&lt;p&gt;With agreement on approach, the TDD skill takes over:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Running in interactive mode...

RED: Writing test for profile completion with valid data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://github.com/javatarz/credit-card-lending/blob/main/.claude/skills/tdd/SKILL.md" rel="noopener noreferrer"&gt;TDD skill&lt;/a&gt; supports multiple review modes that control when you review the AI's work. Interactive mode pauses after each red-green cycle; batch and autonomous modes give more flow with less oversight.&lt;/p&gt;

&lt;p&gt;The AI writes a failing test:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Test&lt;/span&gt;
&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;shouldCompleteProfile_whenAllFieldsValid&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aProfileRequest&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withFirstName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"John"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withLastName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Doe"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withDateOfBirth&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;LocalDate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1990&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withSsn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"123-45-6789"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;customerService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;completeProfile&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customerId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;assertThat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="o"&gt;()).&lt;/span&gt;&lt;span class="na"&gt;isEqualTo&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PROFILE_COMPLETE&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And runs it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Test written. Running...
❌ FAILED: completeProfile method doesn't exist

RED confirmed. Ready for GREEN?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This continues through each acceptance criterion with small commits after each cycle.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Context Updates
&lt;/h3&gt;

&lt;p&gt;If you use &lt;code&gt;/pickup&lt;/code&gt; to start work, context updates happen automatically when the story completes. The &lt;code&gt;/update-context&lt;/code&gt; command is for when you need to update docs outside the standard workflow:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; /update-context

Recent changes:
- Added ProfileRequest DTO
- Added completeProfile to CustomerService
- Added SSN encryption

Context docs to review:
- docs/context/domain/customer.md (profile rules)
- docs/context/modules/customer-module.md (new endpoint)
- docs/context/current-state.md (mark feature built)

Proceed with updates?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Push to Main
&lt;/h3&gt;

&lt;p&gt;Trunk-based development. Small commits. Direct to main.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git push origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Story auto-closes via the &lt;code&gt;Closes #23&lt;/code&gt; footer in the final commit.&lt;/p&gt;




&lt;p&gt;That's the workflow. Now let's understand what makes it possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 3: The Interaction Layer
&lt;/h2&gt;

&lt;p&gt;This is how you interact with the AI during development. The examples use Claude Code primitives, but the concepts transfer to other tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Equivalents&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cursor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://cursor.com/docs/context/rules#rules" rel="noopener noreferrer"&gt;Rules&lt;/a&gt; (&lt;code&gt;.cursor/rules/&lt;/code&gt;, or the legacy &lt;code&gt;.cursorrules&lt;/code&gt; file), custom instructions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub Copilot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.github.com/copilot/customizing-copilot/adding-custom-instructions-for-github-copilot" rel="noopener noreferrer"&gt;Custom instructions&lt;/a&gt; (&lt;code&gt;.github/copilot-instructions.md&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Windsurf&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.windsurf.com/windsurf/cascade/workflows" rel="noopener noreferrer"&gt;Workflows&lt;/a&gt;, &lt;a href="https://docs.windsurf.com/windsurf/cascade/memories#memories-and-rules" rel="noopener noreferrer"&gt;rules&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Codex&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://developers.openai.com/codex/guides/agents-md/" rel="noopener noreferrer"&gt;AGENTS.md&lt;/a&gt;, &lt;a href="https://developers.openai.com/codex/skills/" rel="noopener noreferrer"&gt;skills&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Claude Code organizes these into distinct primitives: &lt;a href="https://code.claude.com/docs/en/slash-commands" rel="noopener noreferrer"&gt;commands&lt;/a&gt;, &lt;a href="https://code.claude.com/docs/en/skills" rel="noopener noreferrer"&gt;skills&lt;/a&gt;, and &lt;a href="https://code.claude.com/docs/en/hooks" rel="noopener noreferrer"&gt;hooks&lt;/a&gt;. Each serves a different purpose.&lt;/p&gt;

&lt;h3&gt;
  
  
  Design Principles
&lt;/h3&gt;

&lt;p&gt;Whether you use Claude Code, Cursor, or another tool, these principles apply:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Description quality is critical.&lt;/strong&gt; AI tools use descriptions to discover which skill to activate. Vague descriptions mean skills never get triggered. Include what the skill does AND when to use it, with specific trigger terms users would naturally say.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Bad&lt;/span&gt;
description: Helps with testing

&lt;span class="gh"&gt;# Good&lt;/span&gt;
description: Enforces Red-Green-Refactor discipline for code changes.
             Use when implementing features, fixing bugs, or writing code.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Single responsibility.&lt;/strong&gt; Each command or skill does one thing. &lt;code&gt;/pickup&lt;/code&gt; selects work. &lt;code&gt;/start-dev&lt;/code&gt; begins development. Combining them makes both harder to discover and maintain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Give goals, not steps.&lt;/strong&gt; Let the AI decide specifics. "Sort by priority and present options" beats a rigid sequence of exact commands. The AI can adapt to context you didn't anticipate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Include escape hatches.&lt;/strong&gt; "If blocked, ask the user" prevents infinite loops. AI will try to solve problems; give it permission to ask for help instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Progressive disclosure.&lt;/strong&gt; Keep the main instruction file concise. Put detailed references in separate files that load on-demand. Context windows are shared: your skill competes with conversation history for space.&lt;/p&gt;
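
&lt;p&gt;One way to apply this, sketched under the assumption that the skill lives in its own directory under &lt;code&gt;.claude/skills/&lt;/code&gt;. Every file name other than &lt;code&gt;SKILL.md&lt;/code&gt; is hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.claude/skills/tdd/
├── SKILL.md                 # Short: what the skill does, when to use it, the core loop
├── test-naming.md           # Hypothetical reference, opened only when naming tests
└── review-modes.md          # Hypothetical reference, opened only when a mode is chosen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this layout, &lt;code&gt;SKILL.md&lt;/code&gt; would mention the other two files by name so the AI reads them only when the task calls for it.&lt;/p&gt;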

&lt;p&gt;&lt;strong&gt;Match freedom to fragility.&lt;/strong&gt; Some tasks need exact steps (database migrations). Others benefit from AI judgment (refactoring). Use specific scripts for fragile operations and flexible instructions for judgment calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test across models.&lt;/strong&gt; What works with a powerful model may need more guidance for a faster one. If you switch models for cost or speed, verify your skills still work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Commands
&lt;/h3&gt;

&lt;p&gt;Commands are user-invoked. You type &lt;code&gt;/pickup&lt;/code&gt; and something happens.&lt;/p&gt;

&lt;p&gt;Here's the command set I use:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/pickup&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Select next issue from backlog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/start-dev&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Begin TDD workflow on assigned issue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/update-context&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Review and update context docs after work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/check-drift&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Detect misalignment between docs and code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/tour&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Onboard newcomers to the project&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each command is a markdown file in &lt;code&gt;.claude/commands/&lt;/code&gt; with instructions for the AI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Pick Up Next Card&lt;/span&gt;

You are helping the user pick up the next prioritized story.

&lt;span class="gu"&gt;## Instructions&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; Fetch open stories using GitHub CLI
&lt;span class="p"&gt;2.&lt;/span&gt; Sort by priority (P0 first, then P1, P2)
&lt;span class="p"&gt;3.&lt;/span&gt; Present options to the user
&lt;span class="p"&gt;4.&lt;/span&gt; When selected, assign the issue
&lt;span class="p"&gt;5.&lt;/span&gt; Show issue details to begin work
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;/tour&lt;/code&gt; command walks through project architecture, module structure, coding conventions, testing approach, and domain glossary. It turns context docs into an interactive onboarding experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Skills
&lt;/h3&gt;

&lt;p&gt;Skills are model-invoked. The AI activates them automatically based on context. If I ask to "implement the registration endpoint," the TDD skill activates without me saying &lt;code&gt;/tdd&lt;/code&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Triggers On&lt;/th&gt;
&lt;th&gt;Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tdd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Code implementation requests&lt;/td&gt;
&lt;td&gt;Enforces Red-Green-Refactor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;After code changes&lt;/td&gt;
&lt;td&gt;Structured quality assessment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;wiki&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Wiki read/write requests&lt;/td&gt;
&lt;td&gt;Manages wiki access&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The TDD skill&lt;/strong&gt; is the one I use most:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trigger&lt;/strong&gt;: User asks to implement something, fix a bug, or write code&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workflow&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;RED&lt;/strong&gt;: Write a failing test, run it, confirm it fails&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GREEN&lt;/strong&gt;: Write minimum code to pass, run tests, confirm green&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;REFACTOR&lt;/strong&gt;: Clean up while keeping tests green&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;COMMIT&lt;/strong&gt;: Small commit with issue reference&lt;/li&gt;
&lt;/ol&gt;
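
&lt;p&gt;As a sketch of how that workflow could be written down, here is a hypothetical &lt;code&gt;.claude/skills/tdd/SKILL.md&lt;/code&gt;. It is not the file from the repository: the frontmatter reuses the description from the Design Principles section, and the body restates the four steps plus an escape hatch.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;---
name: tdd
description: Enforces Red-Green-Refactor discipline for code changes. Use when implementing features, fixing bugs, or writing code.
---

&lt;span class="gh"&gt;# TDD&lt;/span&gt;

&lt;span class="gu"&gt;## Workflow&lt;/span&gt;

&lt;span class="p"&gt;1.&lt;/span&gt; RED: write a failing test, run it, confirm it fails
&lt;span class="p"&gt;2.&lt;/span&gt; GREEN: write the minimum code to pass, run the tests, confirm green
&lt;span class="p"&gt;3.&lt;/span&gt; REFACTOR: clean up while the tests stay green
&lt;span class="p"&gt;4.&lt;/span&gt; COMMIT: make a small commit that references the issue

&lt;span class="gu"&gt;## If Blocked&lt;/span&gt;

Stop and ask the user instead of guessing.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;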

&lt;p&gt;&lt;strong&gt;Review modes&lt;/strong&gt; control how much human oversight the workflow gets:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Review Point&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Interactive&lt;/td&gt;
&lt;td&gt;Each Red-Green cycle&lt;/td&gt;
&lt;td&gt;Learning, complex logic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch AC&lt;/td&gt;
&lt;td&gt;After each acceptance criterion&lt;/td&gt;
&lt;td&gt;Moderate oversight&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch Story&lt;/td&gt;
&lt;td&gt;After all criteria complete&lt;/td&gt;
&lt;td&gt;Maximum flow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Autonomous&lt;/td&gt;
&lt;td&gt;Agent reviews continuously&lt;/td&gt;
&lt;td&gt;Speed with quality gates&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I typically use interactive mode for unfamiliar code and batch-ac mode for well-understood patterns. I mostly use batch-story and autonomous modes for demos, though they'd suit repetitive work with well-established patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The review skill&lt;/strong&gt; provides structured feedback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Code Review: normal mode&lt;/span&gt;

&lt;span class="gu"&gt;### Blockers (0 found)&lt;/span&gt;

&lt;span class="gu"&gt;### Warnings (2 found)&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; &lt;span class="gs"&gt;**CustomerService.java:45**&lt;/span&gt; Method exceeds 20 lines
&lt;span class="p"&gt;   -&lt;/span&gt; Consider extracting validation logic

&lt;span class="gu"&gt;### Suggestions (1 found)&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; &lt;span class="gs"&gt;**CustomerServiceTest.java:112**&lt;/span&gt; Test name could be more specific

&lt;span class="gu"&gt;### Summary&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Blockers: 0
&lt;span class="p"&gt;-&lt;/span&gt; Warnings: 2
&lt;span class="p"&gt;-&lt;/span&gt; Suggestions: 1
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Verdict**&lt;/span&gt;: NEEDS ATTENTION
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The autonomous TDD mode uses this skill with configurable thresholds. "Strict" interrupts on any finding. "Relaxed" only stops for blockers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hooks
&lt;/h3&gt;

&lt;p&gt;Hooks are event-driven. They run shell commands or LLM prompts at specific lifecycle events: before a tool runs, after a file is written, when Claude asks for permission.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Event&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PostToolUse&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Auto-format files after writes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PreToolUse&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Block sensitive operations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;UserPromptSubmit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Validate prompts before execution&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Example: auto-format with Prettier after every file write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PostToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Write|Edit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"jq -r '.tool_input.file_path' | xargs npx prettier --write"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
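
&lt;p&gt;A &lt;code&gt;PreToolUse&lt;/code&gt; hook for blocking sensitive operations could follow the same shape. This is a sketch, not something from the project: the script name &lt;code&gt;block-sensitive.sh&lt;/code&gt; is hypothetical, and you would write it yourself. As I understand the hooks contract, Claude Code passes the pending tool call to the script as JSON on stdin, and an exit status of 2 blocks the call.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/block-sensitive.sh"
      }]
    }]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;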



&lt;p&gt;The &lt;a href="https://github.com/javatarz/credit-card-lending" rel="noopener noreferrer"&gt;credit-card-lending&lt;/a&gt; project doesn't use hooks yet. They're next on the list.&lt;/p&gt;

&lt;h3&gt;
  
  
  Other Primitives
&lt;/h3&gt;

&lt;p&gt;Claude Code has additional constructs I haven't used in this project:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Primitive&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;When to Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://code.claude.com/docs/en/sub-agents" rel="noopener noreferrer"&gt;Subagents&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Specialized delegates with separate context&lt;/td&gt;
&lt;td&gt;Complex multi-step tasks, context isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://code.claude.com/docs/en/mcp" rel="noopener noreferrer"&gt;MCP&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;External tool integrations&lt;/td&gt;
&lt;td&gt;Database access, APIs, custom tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://code.claude.com/docs/en/output-styles" rel="noopener noreferrer"&gt;Output Styles&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Custom system prompts&lt;/td&gt;
&lt;td&gt;Non-engineering tasks (teaching, writing)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://code.claude.com/docs/en/plugins" rel="noopener noreferrer"&gt;Plugins&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Bundled primitives for distribution&lt;/td&gt;
&lt;td&gt;Team-wide deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Start with commands, skills, and context docs. Add the others as your needs grow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 2: Context Documentation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://dev.to/javatarz/context-engineering-for-ai-assisted-development-b8i"&gt;Context&lt;/a&gt; is what the AI knows about your project. I've seen teams underinvest here. They write a README and call it done, then wonder why AI assistants keep making the same mistakes.&lt;/p&gt;

&lt;p&gt;What's missing is your engineering culture. The hardest part isn't the tools; it's capturing what your team actually does. For example, code reviews are hard because most of the time goes to style, not substance. "Why isn't this using our logging pattern?" "We don't structure tests that way here." Without codification, AI applies its own defaults. The code might work, but it doesn't feel like &lt;em&gt;your&lt;/em&gt; code.&lt;/p&gt;

&lt;p&gt;When you codify your team's preferences, AI follows YOUR patterns instead of its defaults. Style debates &lt;a href="https://en.wikipedia.org/wiki/Shift-left_testing" rel="noopener noreferrer"&gt;shift left&lt;/a&gt;: instead of the same argument across a dozen pull requests, you debate once over a document. Once the document reflects consensus, it's settled.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to Document
&lt;/h3&gt;

&lt;p&gt;I've settled on this structure:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;overview.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Architecture, tech stack, module boundaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;conventions.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Code patterns, naming, git workflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;testing.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;TDD approach, test structure, tooling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;glossary.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Domain terms with precise definitions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;current-state.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;What's built vs planned&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;domain/*.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Business rules for each domain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;modules/*.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Technical details for each module&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;a href="https://github.com/javatarz/credit-card-lending" rel="noopener noreferrer"&gt;credit-card-lending&lt;/a&gt; project extends this with &lt;code&gt;integrations.md&lt;/code&gt; (external systems) and &lt;code&gt;metrics.md&lt;/code&gt; (measuring intelligent Engineering effectiveness). Adapt the structure to your domain's needs.&lt;/p&gt;

&lt;p&gt;These docs exist for both AI and human consumption, but discoverability matters. New team members shouldn't have to hunt through &lt;code&gt;docs/context/&lt;/code&gt; to understand what exists. The &lt;a href="https://github.com/javatarz/credit-card-lending" rel="noopener noreferrer"&gt;credit-card-lending&lt;/a&gt; project solves this with a &lt;code&gt;/tour&lt;/code&gt; command: run it and get an AI-guided walkthrough covering architecture, conventions, testing, and domain knowledge. This transforms static documentation into an interactive onboarding flow. Context docs become working tools, not forgotten reference material.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context Doc Anatomy
&lt;/h3&gt;

&lt;p&gt;Every context doc starts with "Why Read This?" and prerequisites:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Testing Strategy&lt;/span&gt;

&lt;span class="gu"&gt;## Why Read This?&lt;/span&gt;

TDD principles, test pyramid, and testing tools.
Read when writing tests or understanding the test approach.

&lt;span class="gs"&gt;**Prerequisites:**&lt;/span&gt; conventions.md for code style
&lt;span class="gs"&gt;**Related:**&lt;/span&gt; domain/ for business rules being tested
&lt;span class="p"&gt;
---
&lt;/span&gt;
&lt;span class="gu"&gt;## Philosophy&lt;/span&gt;

We practice Test-Driven Development as our primary approach.
Tests drive design and provide confidence for change.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helps AI tools (and humans) know whether they need this file and what to read first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dense facts beat explanatory prose.&lt;/strong&gt; Compare:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Our testing philosophy emphasizes the importance of test-driven development. We believe that writing tests first leads to better design..."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;vs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"TDD: Red-Green-Refactor. Tests before code. One assertion per test. Naming: &lt;code&gt;should{Expected}_when{Condition}&lt;/code&gt;."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The second version is what AI tools need. Save the narrative for human-focused documentation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Living Documentation
&lt;/h3&gt;

&lt;p&gt;Stale documentation lies confidently. It states things that are no longer true. You write tests to catch broken code. Your documentation needs the same capability.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/javatarz/credit-card-lending" rel="noopener noreferrer"&gt;credit-card-lending&lt;/a&gt; project handles this in two ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Definition of Done includes context updates&lt;/strong&gt;: Every story card lists which context docs to review. The AI won't let you forget. You can bypass it by working without your AI pair or deleting the prompt, but the default path nudges you toward keeping docs current.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drift detection&lt;/strong&gt;: A &lt;code&gt;/check-drift&lt;/code&gt; command compares docs against code and flags where they disagree.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The second point catches what the first misses. I've seen projects where features get built but &lt;code&gt;current-state.md&lt;/code&gt; still shows them as planned. Regular drift checks catch this before it causes confusion.&lt;/p&gt;
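
&lt;p&gt;A drift check can be written as an ordinary command file. The sketch below is a hypothetical &lt;code&gt;.claude/commands/check-drift.md&lt;/code&gt;, not the one in the repository; it assumes the context docs live under &lt;code&gt;docs/context/&lt;/code&gt; as described above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Check Drift&lt;/span&gt;

You are helping the user find places where context docs no longer match the code.

&lt;span class="gu"&gt;## Instructions&lt;/span&gt;

&lt;span class="p"&gt;1.&lt;/span&gt; Read the docs under docs/context/
&lt;span class="p"&gt;2.&lt;/span&gt; Compare each factual claim against the current code
&lt;span class="p"&gt;3.&lt;/span&gt; Flag features that current-state.md lists as planned but that already exist
&lt;span class="p"&gt;4.&lt;/span&gt; Report each mismatch with the doc and code locations; do not edit anything yet
&lt;span class="p"&gt;5.&lt;/span&gt; If a mismatch is ambiguous, ask the user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;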

&lt;h3&gt;
  
  
  Patterns for Teams
&lt;/h3&gt;

&lt;p&gt;The examples above work within a single repository. At team and org level:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shared context repository&lt;/strong&gt;: A company-wide repo with organization-level conventions, security requirements, architectural patterns. Each project references it but can override.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team-level customization&lt;/strong&gt;: Team-specific &lt;code&gt;CLAUDE.md&lt;/code&gt; additions for their domain, their tools, their workflow quirks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt library&lt;/strong&gt;: Reusable prompts for common tasks. "Review this PR for security issues" with the right context attached.&lt;/p&gt;
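
&lt;p&gt;The first two patterns can meet in a project's &lt;code&gt;CLAUDE.md&lt;/code&gt;, which can pull in other files with the &lt;code&gt;@path&lt;/code&gt; import syntax. The sketch below is illustrative only: both paths are hypothetical, and it assumes the shared context repository is checked out next to the project.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# CLAUDE.md&lt;/span&gt;

&lt;span class="gu"&gt;## Organization Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Shared standards: @../engineering-context/conventions.md

&lt;span class="gu"&gt;## Team Additions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Domain rules for this team: @docs/context/domain/customer.md
&lt;span class="p"&gt;-&lt;/span&gt; Where a team rule conflicts with a shared one, the team rule wins
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;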

&lt;h2&gt;
  
  
  Level 1: Foundation
&lt;/h2&gt;

&lt;p&gt;The foundation is what the AI sees when it first encounters your project.&lt;/p&gt;

&lt;h3&gt;
  
  
  CLAUDE.md
&lt;/h3&gt;

&lt;p&gt;This is your project's instruction manual for AI assistants. It goes in the repository root and contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Project context&lt;/strong&gt;: What this is, what it does&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Git workflow&lt;/strong&gt;: Commit conventions, branching strategy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context file references&lt;/strong&gt;: Where to find domain knowledge, conventions, architecture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool-specific instructions&lt;/strong&gt;: Commands, scripts, common tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's an excerpt from the &lt;a href="https://github.com/javatarz/credit-card-lending/blob/main/CLAUDE.md" rel="noopener noreferrer"&gt;credit-card-lending CLAUDE.md&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# CLAUDE.md&lt;/span&gt;

&lt;span class="gu"&gt;## Project Context&lt;/span&gt;
Credit card lending platform built with Java 25 and Spring Boot 4.
Modular monolith architecture with clear module boundaries.

&lt;span class="gu"&gt;## Git Workflow&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Trunk-based development: push to main, no PRs for standard work
&lt;span class="p"&gt;-&lt;/span&gt; Small commits (&amp;lt;200 lines) with descriptive messages
&lt;span class="p"&gt;-&lt;/span&gt; Reference issue numbers in commits

&lt;span class="gu"&gt;## Context Files&lt;/span&gt;
Read these before working on specific areas:
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`docs/context/overview.md`&lt;/span&gt; - Architecture and module structure
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`docs/context/conventions.md`&lt;/span&gt; - Code standards and patterns
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`docs/context/testing.md`&lt;/span&gt; - TDD principles and test strategy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CLAUDE.md is dense and factual, not explanatory. It tells the AI what to do, not why. The "why" lives in context docs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project Structure
&lt;/h3&gt;

&lt;p&gt;Structure matters because AI tools use file paths to understand context. I've found this layout works well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;project/
├── CLAUDE.md                    # AI instruction manual
├── .claude/
│   ├── commands/                # User-invoked slash commands
│   └── skills/                  # Model-invoked capabilities
├── docs/
│   ├── context/                 # Dense reference documentation
│   │   ├── overview.md
│   │   ├── conventions.md
│   │   ├── testing.md
│   │   └── domain/
│   ├── wiki/                    # Narrative documentation
│   └── adr/                     # Architectural decisions
└── src/                         # Your code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The separation between &lt;code&gt;context/&lt;/code&gt; (for AI consumption) and &lt;code&gt;wiki/&lt;/code&gt; (for humans) is intentional. Context docs are dense facts. &lt;a href="https://github.com/javatarz/credit-card-lending/wiki" rel="noopener noreferrer"&gt;Wiki pages&lt;/a&gt; explain concepts with diagrams and narrative. &lt;a href="https://adr.github.io" rel="noopener noreferrer"&gt;ADRs&lt;/a&gt; (Architectural Decision Records) capture why significant decisions were made. This context prevents future teams from wondering "why did they do it this way?"&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/javatarz/credit-card-lending" rel="noopener noreferrer"&gt;credit-card-lending&lt;/a&gt; repository demonstrates everything discussed above. Here's what I learned applying it.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Worked
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Small batches&lt;/strong&gt;: Most commits are under 100 lines. This makes review meaningful and rollbacks clean.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context primacy&lt;/strong&gt;: The AI reads &lt;code&gt;conventions.md&lt;/code&gt; before writing code. It knows our test naming patterns, package structure, and error handling approach without me repeating it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TDD skill with review modes&lt;/strong&gt;: Interactive mode for complex validation logic. Batch-ac mode for straightforward CRUD operations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Living documentation&lt;/strong&gt;: Every completed story updates &lt;code&gt;current-state.md&lt;/code&gt;. I know what's built by reading one file.&lt;/p&gt;

&lt;h3&gt;
  
  
  What We Learned
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Context docs need maintenance&lt;/strong&gt;: Early on, I'd update code without updating context docs. The AI would then generate code following outdated patterns. The &lt;code&gt;/check-drift&lt;/code&gt; command catches this now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills are better than scripts&lt;/strong&gt;: I started with bash scripts for workflows. Moving to skills let the AI adapt to context instead of following rigid steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design discussion matters&lt;/strong&gt;: Agreeing on approach before coding feels slow. In reality, it saves rework.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Ready to try this? Here's a path:&lt;/p&gt;

&lt;h3&gt;
  
  
  If You're Starting Fresh
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Create &lt;code&gt;CLAUDE.md&lt;/code&gt; with your project context&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;docs/context/conventions.md&lt;/code&gt; with your coding standards&lt;/li&gt;
&lt;li&gt;Start with one command: &lt;code&gt;/start-dev&lt;/code&gt; for TDD workflow&lt;/li&gt;
&lt;li&gt;Add context docs as you need them&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  If You Have an Existing Project
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Create &lt;code&gt;CLAUDE.md&lt;/code&gt; capturing how you want the project worked on&lt;/li&gt;
&lt;li&gt;Document your most important conventions&lt;/li&gt;
&lt;li&gt;Add the &lt;code&gt;/update-context&lt;/code&gt; command so documentation stays current&lt;/li&gt;
&lt;li&gt;Gradually expand context as you work&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Try It Yourself
&lt;/h3&gt;

&lt;p&gt;Clone the example repository and explore:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/javatarz/credit-card-lending
&lt;span class="nb"&gt;cd &lt;/span&gt;credit-card-lending
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;/tour&lt;/code&gt; to get an interactive walkthrough of the project structure, setup, and key concepts. Then try &lt;code&gt;/pickup&lt;/code&gt; to see available work or &lt;code&gt;/start-dev&lt;/code&gt; to see TDD in action.&lt;/p&gt;

&lt;p&gt;The branch &lt;code&gt;blog-ie-setup-jan2025&lt;/code&gt; contains the exact state referenced in this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;If you try this approach, I'd like to hear what works and what doesn't. The practices here evolved from experimentation. They'll keep evolving.&lt;/p&gt;

&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The intelligent Engineering framework was developed in collaboration with &lt;a href="https://www.linkedin.com/in/anandiyengar/" rel="noopener noreferrer"&gt;Anand Iyengar&lt;/a&gt; and other Sahajeevis. It was originally published on the &lt;a href="https://sahaj.ai/featured-article/realising-efficiency-and-productivity-through-intelligent-engineering/" rel="noopener noreferrer"&gt;Sahaj website&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>intelligent Engineering: A Skill Map for Learning AI-Assisted Development</title>
      <dc:creator>Karun Japhet</dc:creator>
      <pubDate>Thu, 01 Jan 2026 05:58:05 +0000</pubDate>
      <link>https://dev.to/javatarz/intelligent-engineering-a-skill-map-for-learning-ai-assisted-development-3kaj</link>
      <guid>https://dev.to/javatarz/intelligent-engineering-a-skill-map-for-learning-ai-assisted-development-3kaj</guid>
      <description>&lt;p&gt;Principles are useful, but they don't tell you what to practice.&lt;/p&gt;

&lt;p&gt;In my previous post on &lt;a href="https://dev.to/javatarz/intelligent-engineering-principles-for-building-with-ai-34aa"&gt;intelligent Engineering principles&lt;/a&gt;, I outlined the ideas that guide how I build software with AI. Since then, I've had people ask: "Where do I start? What skills should I build first?"&lt;/p&gt;

&lt;p&gt;This post answers that: a map of the skills that make up intelligent Engineering, organised into a learning path you can follow whether you're an individual contributor looking to level up or a tech leader building your team's AI fluency.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is intelligent Engineering?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://sahaj.ai/intelligent-engineering/" rel="noopener noreferrer"&gt;intelligent Engineering&lt;/a&gt; is a framework for integrating AI across the entire software development lifecycle, not just code generation.&lt;/p&gt;

&lt;p&gt;Writing code represents only 10-20% of software development effort. The rest is research, analysis, design, testing, deployment, and maintenance. intelligent Engineering applies AI across all of these stages while keeping humans accountable for outcomes.&lt;/p&gt;

&lt;p&gt;I've already written about the &lt;a href="https://dev.to/javatarz/intelligent-engineering-principles-for-building-with-ai-34aa"&gt;five core principles&lt;/a&gt; in detail. This post focuses on the skills that make those principles actionable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Skill Map
&lt;/h2&gt;

&lt;p&gt;&lt;a href="/assets/images/posts/2026-01-01-skill-map/skill-progression.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cboy16yilvjufbsck1e.png" alt="Skill progression map showing four stages: Foundations, AI Interaction, Workflow Integration, and Advanced/Agentic" width="800" height="551"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Master the skills at each stage before moving to the next. Skipping ahead creates gaps that AI will expose.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Foundations
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://dora.dev/research/2025/dora-report/" rel="noopener noreferrer"&gt;2025 DORA report&lt;/a&gt; confirmed what many suspected: AI amplifies your existing capability, magnifying both strengths and weaknesses.&lt;/p&gt;

&lt;p&gt;If your fundamentals are weak, AI won't fix them. It will make the cracks more visible, faster.&lt;/p&gt;

&lt;p&gt;This map assumes you already have solid computer science fundamentals: data structures, algorithms, and an understanding of how systems work (processors, memory, networking, databases, etc.). AI doesn't replace the need to know these.&lt;/p&gt;

&lt;h4&gt;
  
  
  Version control fluency
&lt;/h4&gt;

&lt;p&gt;Git workflows, meaningful commits, safe experimentation with branches. AI generates code quickly. If you can't safely integrate and roll back changes, you'll spend more time cleaning up than you save.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you haven't used branches and pull requests regularly, start a side project that forces you to&lt;/li&gt;
&lt;li&gt;Read &lt;a href="https://git-scm.com/book/en/v2" rel="noopener noreferrer"&gt;Pro Git&lt;/a&gt; (free online) - chapters 1-3 cover the essentials&lt;/li&gt;
&lt;li&gt;Learn &lt;a href="https://git-scm.com/docs/git-worktree" rel="noopener noreferrer"&gt;git worktrees&lt;/a&gt; - you'll need them for multi-agent workflows in the Advanced section&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Testing fundamentals
&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://martinfowler.com/articles/practical-test-pyramid.html" rel="noopener noreferrer"&gt;test pyramid&lt;/a&gt; still applies. Unit, integration, end-to-end. AI can generate tests, but knowing which tests matter, when to push tests up or down the pyramid, and reviewing their quality is your job. Build intuition for what belongs at each layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Practice writing tests before code (TDD) on a small project&lt;/li&gt;
&lt;li&gt;Read &lt;a href="https://www.oreilly.com/library/view/test-driven-development/0321146530/" rel="noopener noreferrer"&gt;Test-Driven Development: By Example&lt;/a&gt; by Kent Beck, the foundational TDD book&lt;/li&gt;
&lt;li&gt;Read &lt;a href="https://www.pearson.com/en-us/subject-catalog/p/growing-object-oriented-software-guided-by-tests/P200000009298/" rel="noopener noreferrer"&gt;Growing Object-Oriented Software, Guided by Tests&lt;/a&gt; by Steve Freeman and Nat Pryce for TDD in practice&lt;/li&gt;
&lt;li&gt;Apply &lt;a href="https://martinfowler.com/bliki/TestPyramid.html" rel="noopener noreferrer"&gt;Martin Fowler's test pyramid rule&lt;/a&gt;: if a unit test covers it, don't duplicate at higher levels. Push tests down: unit test business logic, integration test service interactions, end-to-end only for critical user paths&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Code review discipline
&lt;/h4&gt;

&lt;p&gt;You'll review more code than ever. AI-generated code often looks plausible but handles edge cases incorrectly. Strengthen your eye for subtle bugs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to watch for in AI-generated code:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Security vulnerabilities&lt;/strong&gt;: SQL injection, unsafe data handling, hardcoded secrets. AI often generates patterns that work but aren't secure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge cases&lt;/strong&gt;: Null handling, empty collections, boundary conditions. AI tends to handle the happy path well but miss edge cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business logic errors&lt;/strong&gt;: AI can't understand your domain. Verify that the code does what the business actually needs, not just what the prompt described.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architectural violations&lt;/strong&gt;: Does the code respect your layer boundaries? Does it follow your ADRs? AI doesn't know your architectural constraints unless you tell it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code smells&lt;/strong&gt;: Duplicated logic, overly complex methods, inconsistent patterns. AI doesn't always match your codebase conventions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucinated APIs&lt;/strong&gt;: Functions or methods that look real but don't exist. Always verify imports and dependencies.&lt;/li&gt;
&lt;/ul&gt;
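
&lt;p&gt;To make the first item concrete, here is a minimal Python sketch of the kind of SQL injection a reviewer should catch, using the standard library's sqlite3 module. The users table, the payload, and all function names are hypothetical, invented purely for illustration.&lt;/p&gt;

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Plausible-looking but injectable: user input is pasted into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Parameterized query: the driver treats `name` strictly as data.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def demo() -> tuple[list[tuple], list[tuple]]:
    # Hypothetical in-memory table with two rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    payload = "nobody' OR '1'='1"
    return find_user_unsafe(conn, payload), find_user_safe(conn, payload)
```

&lt;p&gt;With the injected payload, the unsafe version matches every row while the parameterized version matches none. Both versions "work" on the happy path, which is why this pattern slips through a casual review.&lt;/p&gt;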

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review pull requests on open source projects you use&lt;/li&gt;
&lt;li&gt;Read &lt;a href="https://google.github.io/eng-practices/review/" rel="noopener noreferrer"&gt;Code Review Guidelines&lt;/a&gt; from Google's engineering practices&lt;/li&gt;
&lt;li&gt;Practice the "trust but verify" mindset: assume AI code needs checking, not approval&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Code quality intuition
&lt;/h4&gt;

&lt;p&gt;Can you tell maintainable, clean code from code that is technically correct but messy? AI generates code fast. If you can't tell good from bad, you'll accept garbage that costs you later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read &lt;a href="https://www.oreilly.com/library/view/clean-code-a/9780136083238/" rel="noopener noreferrer"&gt;Clean Code&lt;/a&gt; by Robert Martin&lt;/li&gt;
&lt;li&gt;Refactor old code you wrote, or practice on &lt;a href="https://github.com/emilybache/GildedRose-Refactoring-Kata" rel="noopener noreferrer"&gt;clean code katas&lt;/a&gt; - notice what makes code hard to change&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Documentation practices
&lt;/h4&gt;

&lt;p&gt;Documentation becomes AI context. Quality documentation going in means quality AI output coming out. Poor docs mean the AI hallucinates or makes wrong assumptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Document a project you're working on as if a new teammate needs to understand it&lt;/li&gt;
&lt;li&gt;Read &lt;a href="https://docsfordevelopers.com/" rel="noopener noreferrer"&gt;Docs for Developers&lt;/a&gt; for practical guidance&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Architecture understanding
&lt;/h4&gt;

&lt;p&gt;Data flow, component boundaries, dependency management. AI tools need you to describe constraints clearly. If you don't understand the architecture, you can't provide good context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Draw architecture diagrams for systems you work with&lt;/li&gt;
&lt;li&gt;Read &lt;a href="https://www.oreilly.com/library/view/fundamentals-of-software/9781492043447/" rel="noopener noreferrer"&gt;Fundamentals of Software Architecture&lt;/a&gt; by Richards and Ford for trade-offs and patterns&lt;/li&gt;
&lt;li&gt;Read &lt;a href="https://dataintensive.net/" rel="noopener noreferrer"&gt;Designing Data-Intensive Applications&lt;/a&gt; by Kleppmann for distributed systems and data architecture&lt;/li&gt;
&lt;li&gt;For microservices specifically, read &lt;a href="https://www.oreilly.com/library/view/building-microservices-2nd/9781492034018/" rel="noopener noreferrer"&gt;Building Microservices&lt;/a&gt; by Sam Newman&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2. AI Interaction
&lt;/h3&gt;

&lt;p&gt;The skills specific to working with AI systems. You're learning to communicate with a system that's capable but context-limited, confident but sometimes wrong.&lt;/p&gt;

&lt;h4&gt;
  
  
  Prompt engineering basics
&lt;/h4&gt;

&lt;p&gt;Specificity matters. Vague requests get vague results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Write a function to parse dates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Good prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Write a Python function that:
- Parses ISO 8601 date strings (e.g., "2025-12-31T14:30:00Z")
- Handles timezone offsets
- Returns None for invalid input
- Includes a docstring and type hints
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference isn't cleverness - it's precision.&lt;/p&gt;
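
&lt;p&gt;For reference, a response that satisfies the good prompt might look roughly like the sketch below. The function name parse_iso8601 is my own choice. The sketch assumes the standard library's datetime.fromisoformat is acceptable and that only a trailing "Z" needs normalising for older Python versions. Input with no offset comes back as a naive datetime, which the prompt does not rule out.&lt;/p&gt;

```python
from datetime import datetime
from typing import Optional


def parse_iso8601(value: str) -> Optional[datetime]:
    """Parse an ISO 8601 date-time string, returning None for invalid input.

    Accepts a trailing "Z" (UTC) or a numeric offset such as "+05:30".
    """
    if not isinstance(value, str):
        return None
    # Older Python versions reject "Z" in fromisoformat, so normalise it first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None
```

&lt;p&gt;Every line of this maps back to a bullet in the prompt, which is what makes the output easy to review. The vague prompt gives you nothing to check the result against.&lt;/p&gt;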

&lt;p&gt;&lt;strong&gt;Key techniques:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;What It Is&lt;/th&gt;
&lt;th&gt;When to Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Specificity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Precise requirements over vague requests&lt;/td&gt;
&lt;td&gt;Always - the biggest lever&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Few-shot prompting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Show 1-3 examples of input → output&lt;/td&gt;
&lt;td&gt;Team patterns, consistent formatting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chain of thought&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Think step-by-step: analyze, identify, explain, then fix"&lt;/td&gt;
&lt;td&gt;Debugging, complex reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Role prompting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Act as a senior security engineer reviewing for vulnerabilities"&lt;/td&gt;
&lt;td&gt;When expertise framing helps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Meta prompting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Prompts that generate or refine other prompts&lt;/td&gt;
&lt;td&gt;Org-level standards, team templates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Explicit constraints&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Don't use external libraries. Keep it under 50 lines."&lt;/td&gt;
&lt;td&gt;Avoiding common failure modes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Few-shot example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Convert these function names from camelCase to snake_case:

Example 1: getUserById -&amp;gt; get_user_by_id
Example 2: validateEmailAddress -&amp;gt; validate_email_address

Now convert: fetchAllActiveUsers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
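
&lt;p&gt;If your team reuses few-shot prompts, it can help to assemble them from data rather than retyping them. The helper below is a hypothetical sketch, not part of any tool's API, and it rebuilds the prompt above from a list of example pairs.&lt;/p&gt;

```python
def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt: instruction, worked examples, then the new input."""
    lines = [task, ""]
    for number, (source, target) in enumerate(examples, start=1):
        lines.append(f"Example {number}: {source} -> {target}")
    lines.append("")
    lines.append(f"Now convert: {query}")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Convert these function names from camelCase to snake_case:",
    [
        ("getUserById", "get_user_by_id"),
        ("validateEmailAddress", "validate_email_address"),
    ],
    "fetchAllActiveUsers",
)
```

&lt;p&gt;Keeping the examples as data means a team can version them alongside the code they describe.&lt;/p&gt;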



&lt;p&gt;&lt;strong&gt;Chain of thought example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Debug this function. Think step-by-step:
1. What is this function supposed to do?
2. Trace through with input X - what happens at each line?
3. Where does the actual behavior differ from expected?
4. What's the fix?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spend a week being deliberate about prompts. Write down what you asked, what you got, and what you wish you'd asked.&lt;/li&gt;
&lt;li&gt;Read &lt;a href="https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/overview" rel="noopener noreferrer"&gt;Anthropic's Prompt Engineering Guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Reference &lt;a href="https://www.promptingguide.ai/" rel="noopener noreferrer"&gt;promptingguide.ai&lt;/a&gt; for comprehensive techniques&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Context engineering
&lt;/h4&gt;

&lt;p&gt;A clever prompt won't fix bad context. Context engineering is about curating what information the model sees: project constraints, coding standards, relevant examples, what you've already tried.&lt;/p&gt;

&lt;p&gt;This is 80% of the skill. Prompt engineering is maybe 20%.&lt;/p&gt;

&lt;p&gt;I've written a detailed guide on this: &lt;a href="https://dev.to/javatarz/context-engineering-for-ai-assisted-development-b8i"&gt;Context Engineering for AI-Assisted Development&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a project-level context file (e.g., CLAUDE.md) for your current codebase&lt;/li&gt;
&lt;li&gt;Add coding standards, architectural constraints, common patterns&lt;/li&gt;
&lt;li&gt;Notice when AI output improves because of better context&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Understanding model behaviour
&lt;/h4&gt;

&lt;p&gt;You don't need to become an ML engineer, but knowing the basics helps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to understand:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Why It Matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context windows&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Why your 50-file codebase overwhelms the model. Why it "forgets" earlier instructions. (&lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/context-windows" rel="noopener noreferrer"&gt;Anthropic's context window docs&lt;/a&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Training data &amp;amp; fine-tuning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Why Claude excels at code review. Why some models are verbose, others concise.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Knowledge cutoff&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Why the model doesn't know about libraries released last month.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hallucinations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Models confidently generate plausible-looking nonsense. Verify APIs exist. Test edge cases.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost per token&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Why Opus is expensive for exploration but worth it for complex reasoning. (&lt;a href="https://www.anthropic.com/pricing" rel="noopener noreferrer"&gt;Anthropic pricing&lt;/a&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Model strengths (from my experience):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Strengths&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;Thoughtful about edge cases, good at following complex instructions, strong code review&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT&lt;/td&gt;
&lt;td&gt;Fast, good at general tasks, wide knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;Larger context windows, good at multimodal tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These observations come from my own work. Models evolve quickly - what's true today may change next quarter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try the same task with different models. Note where each excels.&lt;/li&gt;
&lt;li&gt;Read model release notes when new versions come out&lt;/li&gt;
&lt;li&gt;Track which models work best for your common tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Understanding tool behaviour
&lt;/h4&gt;

&lt;p&gt;Here's something that trips people up: &lt;strong&gt;the same model behaves differently in different tools&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Cursor's Claude is not the same as Claude Code's Claude is not the same as Windsurf's Claude. Why? Each tool wraps the model with its own system prompt.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Model Nuances (Intrinsic)&lt;/th&gt;
&lt;th&gt;Tool Nuances (Extrinsic)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;What it is&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Differences baked into the model itself&lt;/td&gt;
&lt;td&gt;Differences from how the tool wraps the model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Examples&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Context window, reasoning style, training data, cost&lt;/td&gt;
&lt;td&gt;System prompts, UI, context injection, available commands&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;What to learn&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Model strengths for different tasks&lt;/td&gt;
&lt;td&gt;How your tool injects context, what its system prompt optimizes for&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This means: instructions that work well in Claude Code might not work the same in Cursor, even with the same underlying model. The tool's system prompt and context injection change the behaviour.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try the same prompt in multiple tools. Notice the differences.&lt;/li&gt;
&lt;li&gt;Read your tool's documentation on how it manages context&lt;/li&gt;
&lt;li&gt;Understand what your tool's system prompt optimizes for (coding, conversation, etc.)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3. Workflow Integration
&lt;/h3&gt;

&lt;p&gt;Making AI a standard part of how you build software, not a novelty you occasionally use.&lt;/p&gt;

&lt;h4&gt;
  
  
  Tool configuration
&lt;/h4&gt;

&lt;p&gt;Configure your AI tools for your team's context. This isn't a one-time setup. Rules files need tuning. Context evolves. Tools update frequently.&lt;/p&gt;

&lt;p&gt;Each tool has its own configuration mechanism:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code uses &lt;a href="https://code.claude.com/docs/en/memory" rel="noopener noreferrer"&gt;CLAUDE.md files&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Cursor uses &lt;a href="https://cursor.directory" rel="noopener noreferrer"&gt;rules&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Windsurf uses &lt;a href="https://docs.windsurf.com/windsurf/cascade/memories" rel="noopener noreferrer"&gt;memories&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instructions that work in one tool won't transfer directly to another because system prompts differ.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Document your configuration so teammates can get productive quickly&lt;/li&gt;
&lt;li&gt;Review and update configuration monthly as tools evolve&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Specs-before-implementation
&lt;/h4&gt;

&lt;p&gt;Define what to build before AI generates code. AI is good at generating code that matches a spec. It struggles to determine what the spec should be.&lt;/p&gt;

&lt;p&gt;Write the spec first - acceptance criteria, edge cases, constraints. Then let AI implement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Practice writing specs for features before touching code&lt;/li&gt;
&lt;li&gt;Include: what it should do, what it shouldn't do, edge cases to handle&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Test-driven mindset with AI
&lt;/h4&gt;

&lt;p&gt;Write tests first. Let AI implement to pass them. This flips the usual flow: instead of "generate code, then test it", you "define the contract, then fill it in."&lt;/p&gt;

&lt;p&gt;The tests become your spec. When AI has an executable target (tests that must pass), it produces better code than when interpreting prose requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try TDD on a small feature: write failing tests, then ask AI to make them pass&lt;/li&gt;
&lt;li&gt;Review the generated code - does it just satisfy the tests or is it actually good?&lt;/li&gt;
&lt;/ul&gt;
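
&lt;p&gt;As an illustration of the flow, the sketch below uses a hypothetical slugify function. The tests are written first and act as the contract. The implementation underneath is the kind of thing you would then ask the AI to produce and review yourself. The function and its rules are assumptions for this example only.&lt;/p&gt;

```python
import re
import unittest


class SlugifyTests(unittest.TestCase):
    # Written first: these tests are the contract handed to the AI.
    def test_lowercases_and_joins_words_with_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_drops_punctuation(self):
        self.assertEqual(slugify("What's new?"), "whats-new")

    def test_empty_input_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")


def slugify(title: str) -> str:
    # Written second (by you or the AI), then reviewed line by line.
    cleaned = re.sub(r"[^a-z0-9\s-]", "", title.lower())
    return re.sub(r"[\s-]+", "-", cleaned).strip("-")
```

&lt;p&gt;Passing tests only show the contract is met. Whether the implementation is readable and handles cases you forgot to test is still a review question.&lt;/p&gt;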

&lt;h4&gt;
  
  
  Human review gates
&lt;/h4&gt;

&lt;p&gt;AI-generated code requires the same (or stricter) review as human-written code. Build the habit of treating AI output like code from a confident junior developer: often correct, sometimes subtly wrong, occasionally completely off base.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set a personal rule: no AI-generated code merged without reviewing every line&lt;/li&gt;
&lt;li&gt;Track your AI acceptance rate. If you're accepting &amp;gt;80% without modification, you might be over-trusting.&lt;/li&gt;
&lt;/ul&gt;
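
&lt;p&gt;Tracking the acceptance rate needs nothing more than two counters. A minimal sketch follows, with hypothetical function names. How you count a "suggestion" is up to you and your tool, and the 80% threshold is the rule of thumb from the list above, not a hard limit.&lt;/p&gt;

```python
def acceptance_rate(accepted_unmodified: int, total_suggestions: int) -> float:
    """Share of AI suggestions merged without any modification (0.0 to 1.0)."""
    if total_suggestions == 0:
        return 0.0
    return accepted_unmodified / total_suggestions


def might_be_over_trusting(rate: float, threshold: float = 0.80) -> bool:
    # The default threshold is a rule of thumb; adjust it to your own context.
    return rate > threshold
```

&lt;p&gt;For example, accepting 45 of 50 suggestions untouched gives a rate of 0.9, which would trip the check. Accepting 30 of 50 gives 0.6, which would not.&lt;/p&gt;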

&lt;h4&gt;
  
  
  Small batches
&lt;/h4&gt;

&lt;p&gt;Generate less, review more. A 1000-line AI diff is harder to review than a 100-line one. Work in small chunks. Commit often.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Break tasks into steps that produce &amp;lt;200 lines of change&lt;/li&gt;
&lt;li&gt;Commit after each step passes review&lt;/li&gt;
&lt;/ul&gt;
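
&lt;p&gt;One way to keep yourself honest about batch size is to check the diff before committing. The sketch below assumes you pass it the text output of git diff --numstat, which prints added lines, deleted lines, and path separated by tabs, with "-" for binary files. The function names and the default limit of 200 are illustrative.&lt;/p&gt;

```python
def changed_lines(numstat_output: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        added, deleted, _path = parts
        # Binary files are reported as "-" for both counts; skip them.
        if added.isdigit() and deleted.isdigit():
            total += int(added) + int(deleted)
    return total


def within_batch_limit(numstat_output: str, limit: int = 200) -> bool:
    # 200 is the rough ceiling suggested above; tune it for your team.
    return limit > changed_lines(numstat_output)
```

&lt;p&gt;A check like this could run in a pre-commit hook as a warning, not a hard block, since some legitimate changes (renames, generated files) are large by nature.&lt;/p&gt;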

&lt;h4&gt;
  
  
  Quality guardrails
&lt;/h4&gt;

&lt;p&gt;Integrate linting, static analysis, and security scanning into your workflow. These catch issues AI introduces. Shift left. Catch problems early.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up pre-commit hooks for linting and formatting&lt;/li&gt;
&lt;li&gt;Add security scanning to CI (e.g., &lt;a href="https://snyk.io/" rel="noopener noreferrer"&gt;Snyk&lt;/a&gt;, &lt;a href="https://semgrep.dev/" rel="noopener noreferrer"&gt;Semgrep&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Living documentation
&lt;/h4&gt;

&lt;p&gt;Documentation updated atomically with code changes. When code changes, docs change in the same commit. This keeps your AI context current.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Include doc updates in your definition of done&lt;/li&gt;
&lt;li&gt;Review PRs for documentation staleness&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  4. Advanced / Agentic
&lt;/h3&gt;

&lt;p&gt;Skills for autonomous AI workflows. These are powerful but risky - more autonomy needs stronger guardrails.&lt;/p&gt;

&lt;h4&gt;
  
  
  Agentic workflow design
&lt;/h4&gt;

&lt;p&gt;Tools like Claude Code, Cursor, and Windsurf can run shell commands, edit files, and chain actions. Know what your tool can do and design workflows that leverage it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with supervised agents - review each step before allowing the next&lt;/li&gt;
&lt;li&gt;Read &lt;a href="https://code.claude.com/docs/en/github-actions" rel="noopener noreferrer"&gt;Claude Code's GitHub Actions integration&lt;/a&gt; for CI/CD examples&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Task decomposition
&lt;/h4&gt;

&lt;p&gt;Break complex work into subtasks an agent can handle. Good decomposition is a skill in itself. Too big and the agent loses focus. Too small and you spend all your time orchestrating.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Practice breaking features into agent-sized tasks (~30 min of work each)&lt;/li&gt;
&lt;li&gt;Notice which decompositions lead to better agent output&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Guardrails for agents
&lt;/h4&gt;

&lt;p&gt;More autonomy needs stronger guardrails. Sandboxing, approval gates, rollback procedures. Agents make mistakes. Build systems that catch them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Never give agents write access to production&lt;/li&gt;
&lt;li&gt;Implement approval gates for destructive operations&lt;/li&gt;
&lt;/ul&gt;
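
&lt;p&gt;An approval gate can start very small. The sketch below is a simplified illustration, not how any particular agent tool implements it. The deny-lists are hypothetical and incomplete: they would miss a command wrapped in sudo or hidden in a pipeline, so a real setup would likely prefer an allow-list plus sandboxing.&lt;/p&gt;

```python
import shlex

# Hypothetical deny-lists for illustration; real ones depend on your environment.
DESTRUCTIVE_COMMANDS = {"rm", "dd", "mkfs", "shutdown"}
DESTRUCTIVE_GIT_SUBCOMMANDS = {"push", "reset", "clean"}


def needs_approval(command: str) -> bool:
    """Return True when an agent-proposed shell command should wait for a human."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # Unparseable input is treated as risky.
    if not tokens:
        return False
    if tokens[0] in DESTRUCTIVE_COMMANDS:
        return True
    if tokens[0] == "git" and len(tokens) > 1:
        return tokens[1] in DESTRUCTIVE_GIT_SUBCOMMANDS
    return False


def run_with_gate(command: str, approve) -> str:
    # `approve` is any callable that asks a human and returns True or False.
    if needs_approval(command) and not approve(command):
        return "blocked"
    return "allowed"  # A real agent runner would execute the command here.
```

&lt;p&gt;The design choice worth copying is the default: anything the gate cannot parse is treated as needing approval, so failures err on the side of asking a human.&lt;/p&gt;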

&lt;h4&gt;
  
  
  Engineering culture codification
&lt;/h4&gt;

&lt;p&gt;Turn your team's standards, patterns, and guidelines into structured artifacts that AI can use. This is how you scale intelligent Engineering beyond individuals.&lt;/p&gt;

&lt;p&gt;When you document coding standards, architectural patterns, and review checklists in a format AI can consume, every team member (and AI tool) operates from the same playbook.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with a CLAUDE.md (or equivalent) that captures your team's conventions&lt;/li&gt;
&lt;li&gt;Add architectural decision records (ADRs) that AI can reference&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Multi-agent orchestration
&lt;/h4&gt;

&lt;p&gt;Running parallel agents (e.g., using git worktrees). Coordinating results. This is emerging territory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try running two agents on independent tasks&lt;/li&gt;
&lt;li&gt;Notice coordination challenges and develop patterns for handling them&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  CI/CD integration
&lt;/h4&gt;

&lt;p&gt;Running AI reviews on pull requests. Automated code analysis. Scheduled agents for maintenance tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up &lt;a href="https://docs.github.com/en/copilot/how-tos/agents/copilot-code-review/using-copilot-code-review" rel="noopener noreferrer"&gt;Copilot code review&lt;/a&gt; or similar on your repo&lt;/li&gt;
&lt;li&gt;Start with comment-only (no auto-merge) until you trust it&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Learning Paths
&lt;/h2&gt;

&lt;p&gt;Not everyone starts from the same place.&lt;/p&gt;

&lt;h3&gt;
  
  
  For Developers New to AI Tools
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Start here:&lt;/strong&gt; Foundations + AI Interaction basics&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Get comfortable with one AI tool. GitHub Copilot is a good starting point for its low cost and tight editor integration. For open source alternatives, try &lt;a href="https://aider.chat/" rel="noopener noreferrer"&gt;Aider&lt;/a&gt; or &lt;a href="https://github.com/sst/opencode" rel="noopener noreferrer"&gt;OpenCode&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Spend 2-4 weeks using it for completion and simple generation.&lt;/li&gt;
&lt;li&gt;Practice prompting: be specific, iterate, learn what works.&lt;/li&gt;
&lt;li&gt;Move to a more capable tool (Claude Code, Cursor, Windsurf) once you're comfortable.&lt;/li&gt;
&lt;li&gt;Build your first context file.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Expected ramp-up:&lt;/strong&gt; 4-8 weeks to feel productive.&lt;/p&gt;

&lt;h3&gt;
  
  
  For Developers Experienced With AI
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Start here:&lt;/strong&gt; Workflow Integration + Advanced&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Audit your current workflow. Where are you using AI effectively? Where are you over-trusting?&lt;/li&gt;
&lt;li&gt;Strengthen context engineering. Create comprehensive project context files.&lt;/li&gt;
&lt;li&gt;Set up guardrails: linting, security scanning, review checklists.&lt;/li&gt;
&lt;li&gt;Experiment with agentic workflows under supervision.&lt;/li&gt;
&lt;li&gt;Integrate AI into CI/CD.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Expected ramp-up:&lt;/strong&gt; 2-4 weeks to significantly improve your workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  For Tech Leaders Building Team Capability
&lt;/h3&gt;

&lt;p&gt;Whether you're a Tech Lead, Engineering Manager, Principal Engineer, or anyone else responsible for growing your team's capability, this section is for you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start here:&lt;/strong&gt; The &lt;a href="https://cloud.google.com/resources/content/2025-dora-ai-capabilities-model-report" rel="noopener noreferrer"&gt;2025 DORA AI Capabilities Model&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The report identified seven practices that amplify AI's positive impact:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Clear AI stance&lt;/strong&gt;: Establish expectations for how your team uses AI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Healthy data ecosystem&lt;/strong&gt;: Quality documentation enables quality AI outputs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strong version control&lt;/strong&gt;: Rollback capability provides a safety net for experimentation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Small batches&lt;/strong&gt;: Enable quick course corrections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User-centric focus&lt;/strong&gt;: Clear goals improve AI output quality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality internal platforms&lt;/strong&gt;: Standardised tooling scales AI benefits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-accessible data&lt;/strong&gt;: Make context available to AI tools.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Actions:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Assess your team against these practices. Where are the gaps?&lt;/li&gt;
&lt;li&gt;Don't change everything at once. Introduce AI at one delivery stage at a time.&lt;/li&gt;
&lt;li&gt;Expect a learning curve: 2-4 weeks of reduced productivity before gains appear.&lt;/li&gt;
&lt;li&gt;Invest in guardrails before acceleration.&lt;/li&gt;
&lt;li&gt;Measure impact with DORA metrics: deployment frequency, lead time for changes, change failure rate, and time to restore service.&lt;/li&gt;
&lt;/ol&gt;
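
&lt;p&gt;Three of the four DORA metrics are simple arithmetic once you have deployment records. The sketch below assumes a hypothetical Deployment record with commit time, deploy time, and a failure flag. Your CI/CD system will expose this data differently, and time to restore is left out because it needs incident data. The functions assume a non-empty list.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deployment:
    committed_at: datetime   # when the change was committed
    deployed_at: datetime    # when it reached production
    failed: bool = False     # did it cause a failure needing remediation?


def deployment_frequency(deployments: list[Deployment], period_days: int) -> float:
    """Average deployments per day over the period."""
    return len(deployments) / period_days


def mean_lead_time(deployments: list[Deployment]) -> timedelta:
    """Average time from commit to production."""
    total = sum((d.deployed_at - d.committed_at for d in deployments), timedelta())
    return total / len(deployments)


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Fraction of deployments that caused a failure."""
    return sum(d.failed for d in deployments) / len(deployments)
```

&lt;p&gt;Run the same calculation on the months before you introduce AI tooling. Without that baseline, the later numbers have nothing to be compared against.&lt;/p&gt;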




&lt;h2&gt;
  
  
  Common Pitfalls
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Starting with advanced tools&lt;/strong&gt;: If you skip fundamentals, you'll produce more code, faster, with worse quality. The problems compound.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ignoring context engineering&lt;/strong&gt;: Most teams spend all their energy on prompt engineering. Context engineering matters far more. Good context makes mediocre prompts work; perfect prompts can't fix missing context. And context scales: set it up once, benefit every interaction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Over-trusting AI&lt;/strong&gt;: "The AI suggested it" is not an acceptable answer in a post-mortem. &lt;a href="https://dev.to/javatarz/intelligent-engineering-principles-for-building-with-ai-34aa#ai-augments-humans-stay-accountable"&gt;You're accountable for what ships&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Under-trusting AI&lt;/strong&gt;: Some developers refuse to adopt AI tools, treating them as a passing fad. The productivity gap is real. Healthy skepticism is fine, but refusing to engage is risky. For tech leaders: &lt;a href="https://dora.dev/ai/research-insights/adopt-gen-ai/" rel="noopener noreferrer"&gt;DORA's research on AI adoption&lt;/a&gt; shows that addressing anxieties directly and providing dedicated exploration time significantly improves adoption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No guardrails&lt;/strong&gt;: AI makes it easy to move fast. Without automated quality checks, you'll ship bugs faster too. &lt;a href="https://dev.to/javatarz/intelligent-engineering-principles-for-building-with-ai-34aa#smarter-ai-needs-smarter-guardrails"&gt;Smarter AI needs smarter guardrails&lt;/a&gt;. If you don't have linting, security scanning, and CI checks, add them before increasing your AI usage. For legacy codebases without tests, start with &lt;a href="https://understandlegacycode.com/blog/best-way-to-start-testing-untested-code/" rel="noopener noreferrer"&gt;characterization tests&lt;/a&gt; to capture current behaviour before refactoring. Michael Feathers' &lt;a href="https://www.oreilly.com/library/view/working-effectively-with/0131177052/" rel="noopener noreferrer"&gt;Working Effectively with Legacy Code&lt;/a&gt; is the definitive guide here. AI can accelerate this process, but verify every generated test passes against the real system without any changes to production code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Confusing model and tool behaviour&lt;/strong&gt;: When AI output is wrong, is it the model's limitation or the tool's system prompt? Knowing the difference helps you fix it. To diagnose: try the same prompt in a different tool or the raw API. If the problem persists across tools, it's likely a model limitation. If it only happens in one tool, check how that tool injects context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trying to measure productivity improvement without baselines&lt;/strong&gt;: You can't prove AI made your team faster if you weren't measuring before. Worse, once estimates become targets for measuring AI impact, &lt;a href="https://www.linkedin.com/feed/update/urn:li:activity:7405299770233135105/" rel="noopener noreferrer"&gt;developers adjust their estimates&lt;/a&gt; (consciously or not). Skip the productivity theatre. Instead, measure what matters: features shipped, customer value delivered, time from idea to production, team satisfaction.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This skill map is a snapshot. The tools evolve weekly. New capabilities emerge monthly.&lt;/p&gt;

&lt;p&gt;If you're on this journey, I'd like to hear what's working for you. What skills have I missed? What resources have you found valuable?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coming up:&lt;/strong&gt; Putting these skills into practice. I'll walk through setting up intelligent Engineering on a real project, covering tool configuration, context files, and workflow patterns that work.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>career</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Context Engineering for AI-Assisted Development</title>
      <dc:creator>Karun Japhet</dc:creator>
      <pubDate>Thu, 01 Jan 2026 05:57:49 +0000</pubDate>
      <link>https://dev.to/javatarz/context-engineering-for-ai-assisted-development-b8i</link>
      <guid>https://dev.to/javatarz/context-engineering-for-ai-assisted-development-b8i</guid>
      <description>&lt;p&gt;Same model, different tools, different results.&lt;/p&gt;

&lt;p&gt;If you've used Claude Sonnet in &lt;a href="https://claude.ai/code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;, &lt;a href="https://cursor.com" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, &lt;a href="https://github.com/features/copilot" rel="noopener noreferrer"&gt;Copilot&lt;/a&gt;, and &lt;a href="https://windsurf.com" rel="noopener noreferrer"&gt;Windsurf&lt;/a&gt;, you've noticed this. The model is identical, but the behavior varies. This isn't magic. It's context engineering.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://karun.me/assets/images/posts/2025-12-31-context-engineering/cover.jpg" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fkarun.me%2Fassets%2Fimages%2Fposts%2F2025-12-31-context-engineering%2Fcover.jpg" alt="Two people collaborating at a whiteboard with diagrams and notes" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/javatarz/intelligent-engineering-principles-for-building-with-ai-34aa"&gt;intelligent Engineering: Principles for Building With AI&lt;/a&gt;, I mentioned that "context is everything" and that "context engineering matters more than prompt engineering." But I didn't explain what that means or how to do it. This post fills that gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Whiteboard
&lt;/h2&gt;

&lt;p&gt;Imagine you're in a day-long strategy meeting. There's one whiteboard in the room. That's all the shared space you have.&lt;/p&gt;

&lt;p&gt;Your teammate is brilliant. They can see everything on the board and reason about it. But here's the thing: they have no memory outside this whiteboard. What's written is all they know. Erase something, and it's gone.&lt;/p&gt;

&lt;p&gt;Before the meeting started, someone wrote ground rules at the top: "Focus on Q1 priorities. Be specific. No tangents." This section doesn't get erased. It frames everything that follows. (That's the system prompt.)&lt;/p&gt;

&lt;p&gt;The meeting begins. You add notes, diagrams, decisions. The board fills up. You need to add something new, but there's no space. What do you erase? The detailed debate from 9am, or the decision it produced? You keep the decision, erase the discussion. (That's compaction.)&lt;/p&gt;

&lt;p&gt;Three hours in, you notice something odd. Your teammate keeps referencing the top and bottom of the board, but seems to miss what's in the middle. Important context from 10:30am is right there, but they're not looking at it. The middle of the board gets less attention.&lt;/p&gt;

&lt;p&gt;Someone raises a topic that needs last quarter's data. Do you copy the entire Q4 report onto the board? No. You flip open your notebook, find the one relevant chart, add it to the board, discuss it, then erase it when you move on. (That's just-in-time retrieval.) The notebook stays on the table. You reference it when needed, but it doesn't consume board space.&lt;/p&gt;

&lt;p&gt;By afternoon, old notes are causing problems. A 9am assumption turned out to be wrong, but it's still on the board. Your teammate keeps building on it. The board is poisoned with outdated information. You need to actively clean it up.&lt;/p&gt;

&lt;p&gt;There's too much on the board now. Some notes are written in shorthand. Others are cramped into corners with tiny handwriting. Your teammate can technically see it all, but finding anything takes effort. Attention is diluted. (That's context distraction.)&lt;/p&gt;

&lt;p&gt;For a complex sub-problem, you send two people to side rooms with fresh whiteboards. They work independently, then return with one-page summaries. You add the summaries to your board and integrate the findings. You never needed their full whiteboards. (That's sub-agents.)&lt;/p&gt;

&lt;p&gt;The whiteboard is your teammate's entire context window. What's on it is all they can work with. Your job is to curate what goes on the board so they can focus on what matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means Technically
&lt;/h2&gt;

&lt;p&gt;The whiteboard story maps directly to how AI models process information.&lt;/p&gt;

&lt;h3&gt;
  
  
  System Prompts vs User Prompts
&lt;/h3&gt;

&lt;p&gt;The ground rules at the top of the board are the &lt;strong&gt;system prompt&lt;/strong&gt;. You didn't write them. They were there when you walked in, set by whoever built the tool. They define how the model behaves, what it prioritizes, what it can do.&lt;/p&gt;

&lt;p&gt;What you add during the meeting is the &lt;strong&gt;user prompt&lt;/strong&gt;. Your requests, your context, your questions. It works within the frame the system prompt establishes.&lt;/p&gt;

&lt;p&gt;The model sees both. But system prompts typically carry more weight: they come first and frame everything that follows.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Context Window
&lt;/h3&gt;

&lt;p&gt;The whiteboard's physical dimensions are the &lt;strong&gt;context window&lt;/strong&gt;. There's a fixed amount of space. Everything competes for it: system instructions, conversation history, files you've pulled in, tool definitions, and the model's own output. When it fills up, something has to go.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lost in the Middle
&lt;/h3&gt;

&lt;p&gt;Remember how your teammate focused on the top and bottom of the board but missed the middle? That's a real phenomenon. Research on long-context models (Liu et al., "Lost in the Middle", 2023) shows a U-shaped curve: information at the start and end of context gets used more reliably than information in the middle.&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/images/posts/2025-12-31-context-engineering/attention-curve.svg"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fkarun.me%2Fassets%2Fimages%2Fposts%2F2025-12-31-context-engineering%2Fattention-curve.svg" alt="U-shaped attention curve showing high attention at start and end of context, with 'Lost in the Middle' highlighting the attention dip" width="500" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cramming everything into context can hurt performance&lt;/li&gt;
&lt;li&gt;Position matters: put important information first or last&lt;/li&gt;
&lt;li&gt;As context grows, accuracy often decreases&lt;/li&gt;
&lt;/ul&gt;
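&lt;p&gt;One way to act on the "first or last" advice, as a rough sketch. The function name &lt;code&gt;order_for_edges&lt;/code&gt; and the &lt;code&gt;(priority, text)&lt;/code&gt; pairs are assumptions made for illustration; no tool exposes this exact API. It places the most important items at the edges of the context and lets the least important ones fall in the middle.&lt;/p&gt;

```python
def order_for_edges(items):
    """Arrange context items so the highest-priority ones sit at the start and end.

    `items` is a list of (priority, text) pairs; a higher priority means more important.
    """
    ranked = sorted(items, key=lambda pair: pair[0], reverse=True)
    front, back = [], []
    for index, (_, text) in enumerate(ranked):
        # Alternate: the most important goes first, the second most important goes last.
        if index % 2 == 0:
            front.append(text)
        else:
            back.append(text)
    # Reverse the back half so importance rises again towards the end.
    return front + back[::-1]


ordered = order_for_edges([(3, "A"), (1, "C"), (2, "B"), (0, "D")])
```

&lt;p&gt;Here &lt;code&gt;ordered&lt;/code&gt; starts with "A" and ends with "B", the two highest-priority items, with "C" and "D" in between.&lt;/p&gt;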

&lt;p&gt;In &lt;a href="https://dev.to/javatarz/patterns-for-ai-assisted-software-development-4ga2"&gt;Patterns for AI-assisted Software Development&lt;/a&gt;, I described LLMs as "teammates with anterograde amnesia." They can hold information, but only within the context window. Understanding how to manage that window is key.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Attention Budget
&lt;/h3&gt;

&lt;p&gt;Even with everything visible on the board, your teammate can only actively focus on so much while reasoning. Each item costs attention. Add more, and something else gets less focus. Think of it as a budget: every token you add depletes some of the model's capacity to focus on what matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Different Tools Set Up the Room
&lt;/h2&gt;

&lt;p&gt;Here's why the same model behaves differently across tools: different rooms have different ground rules at the top of the board.&lt;/p&gt;

&lt;p&gt;Take Claude Sonnet 4.5. Same teammate. But put them in different rooms:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Room (Tool)&lt;/th&gt;
&lt;th&gt;Top of the board says&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;"Work autonomously. Read files, run terminal commands, complete multi-step tasks."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;"Stay in the editor. Complete code inline, understand the open file, suggest edits."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Copilot&lt;/td&gt;
&lt;td&gt;"Autocomplete as they type. Quick suggestions, stay out of the way."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Windsurf&lt;/td&gt;
&lt;td&gt;"Maintain flow. Remember preferences across sessions, keep continuity."&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Your teammate reads the top of the board and behaves accordingly. That's why the same model feels different in each tool. The system prompt shapes everything.&lt;/p&gt;

&lt;p&gt;This also explains why prompts don't transfer directly between tools. A prompt that works well in Claude Code might fail in Cursor because the framing is different.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Goes Wrong
&lt;/h2&gt;

&lt;p&gt;When context fails, it fails in predictable ways. Recognizing these patterns helps you diagnose problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context Poisoning
&lt;/h3&gt;

&lt;p&gt;Early errors compound. Your teammate builds on incorrect assumptions, reinforcing mistakes with each exchange. By the time you notice, the board is thoroughly polluted with wrong information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Use backtrack to undo recent turns. &lt;a href="https://code.claude.com/docs/en/checkpointing" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;, &lt;a href="https://cursor.com/docs/agent/chat/checkpoints" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, and &lt;a href="https://docs.windsurf.com/windsurf/cascade/cascade#named-checkpoints-and-reverts" rel="noopener noreferrer"&gt;Windsurf&lt;/a&gt; all support this. If the pollution runs deeper, compact to summarize past the bad section. Clear is the nuclear option when context is unsalvageable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context Distraction
&lt;/h3&gt;

&lt;p&gt;Too much information competes for attention. The model can technically process it all, but signal gets lost in noise.&lt;/p&gt;

&lt;p&gt;On the whiteboard: shorthand, tiny writing, notes crammed into corners. Your teammate can see it all, but finding anything takes effort.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Keep context lean. Compact proactively. Don't dump everything onto the board.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context Confusion
&lt;/h3&gt;

&lt;p&gt;Mixed content types muddle the model's understanding. Code snippets, prose explanations, JSON configs, and error logs all blur together. The model can't distinguish what's an instruction versus an example versus context.&lt;/p&gt;

&lt;p&gt;On the whiteboard: sticky notes, diagrams, tables, arrows, different colored markers. Your teammate can't parse what type of information to use for what purpose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Use focused tools. Don't overload the board with too many formats or capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context Clash
&lt;/h3&gt;

&lt;p&gt;Contradictory instructions coexist. "Prioritize speed" in one corner. "Prioritize quality" in another. Your teammate sees both, doesn't know which to follow, and produces something incoherent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Keep instructions centralized and current. Review your context files periodically for contradictions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Managing Context Well
&lt;/h2&gt;

&lt;p&gt;Five techniques make a difference.&lt;/p&gt;

&lt;h3&gt;
  
  
  Just-in-Time Retrieval
&lt;/h3&gt;

&lt;p&gt;Don't paste your whole codebase onto the board. Reference specific files and let the tool search.&lt;/p&gt;

&lt;p&gt;Bad: "Here's my entire src/ directory. Now fix the bug."&lt;br&gt;
Good: "There's a bug in the date parser. Check src/utils/dates.ts."&lt;/p&gt;

&lt;p&gt;The notebook stays on the table. You flip it open when needed, find the relevant page, add it to the discussion, then move on.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compaction
&lt;/h3&gt;

&lt;p&gt;Context fills up during long sessions. Compaction summarizes conversation history, preserving key decisions while discarding noise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to compact:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After completing a major task (before starting the next one)&lt;/li&gt;
&lt;li&gt;During long sessions when you notice drift&lt;/li&gt;
&lt;li&gt;Before context hits limits (proactively, not reactively)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can provide custom instructions when compacting: "focus on architectural decisions" or "preserve the error messages we encountered." This guides what gets kept versus summarized away.&lt;/p&gt;

&lt;p&gt;My preference hierarchy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Small tasks with &lt;code&gt;/clear&lt;/code&gt;&lt;/strong&gt; - fresh context beats compressed context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Early compaction with custom instructions&lt;/strong&gt; - you control what matters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Early compaction with default prompt&lt;/strong&gt; - still gives thinking room&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Late compaction&lt;/strong&gt; - avoid this&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Late compaction (waiting until 95% capacity) is the worst option. The model has no thinking room, and the automatic summarization is opaque. You lose nuance without knowing what disappeared. Early compaction, ideally with custom instructions, gives you control and leaves space for the model to reason. Steve Kinney's &lt;a href="https://stevekinney.com/courses/ai-development/claude-code-compaction" rel="noopener noreferrer"&gt;guide to Claude Code compaction&lt;/a&gt; covers the mechanics well.&lt;/p&gt;
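&lt;p&gt;The "compact early" rule can be sketched as a small check. This is an illustration under stated assumptions: &lt;code&gt;should_compact&lt;/code&gt; is a hypothetical helper, and the 60% default threshold is an arbitrary example value rather than a recommendation from any tool. Token counts are whatever your tool reports.&lt;/p&gt;

```python
def should_compact(used_tokens, window_tokens, task_finished=False, early_threshold=0.6):
    """Return True when it is a good moment to compact proactively.

    `early_threshold` is an assumed example value, well below the 95% "late" zone.
    """
    if not window_tokens > 0:
        raise ValueError("window_tokens must be positive")
    usage = used_tokens / window_tokens
    # Compact after a finished task, or once usage passes the early threshold.
    return task_finished or usage >= early_threshold
```

&lt;p&gt;For example, &lt;code&gt;should_compact(130_000, 200_000)&lt;/code&gt; is true at 65% usage, while &lt;code&gt;should_compact(10_000, 200_000)&lt;/code&gt; is false unless a task has just finished.&lt;/p&gt;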

&lt;h3&gt;
  
  
  Structured Note-Taking
&lt;/h3&gt;

&lt;p&gt;For complex, multi-hour work, maintain notes outside the conversation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A NOTES.md file tracking progress&lt;/li&gt;
&lt;li&gt;Decision logs capturing why you chose specific approaches&lt;/li&gt;
&lt;li&gt;TODO lists that persist across compactions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model can reference these files when needed, but they're not consuming context constantly. The notebook on the table, not copied onto the board.&lt;/p&gt;
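&lt;p&gt;A decision log can be as simple as an append-only Markdown file. The sketch below assumes a hypothetical helper named &lt;code&gt;log_decision&lt;/code&gt; and an entry format chosen only for illustration; adapt both to your own conventions.&lt;/p&gt;

```python
from datetime import date
from pathlib import Path


def log_decision(notes_path, decision, reason, today=None):
    """Append one dated decision entry to a Markdown notes file."""
    stamp = (today or date.today()).isoformat()
    entry = f"- {stamp}: {decision} (why: {reason})\n"
    path = Path(notes_path)
    # Create the file with a heading on first use, then always append.
    if not path.exists():
        path.write_text("# Decision log\n\n", encoding="utf-8")
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry
```

&lt;p&gt;Because the file lives on disk, it survives compaction and &lt;code&gt;/clear&lt;/code&gt;; the model reads it only when asked to.&lt;/p&gt;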

&lt;h3&gt;
  
  
  Sub-Agents
&lt;/h3&gt;

&lt;p&gt;For large tasks, send people to side rooms with fresh whiteboards:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Main agent coordinates the overall task&lt;/li&gt;
&lt;li&gt;Sub-agents handle specific, focused work with clean context&lt;/li&gt;
&lt;li&gt;Sub-agents return condensed summaries&lt;/li&gt;
&lt;li&gt;Main agent integrates results without carrying full sub-task context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="/assets/images/posts/2025-12-31-context-engineering/sub-agents.svg"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fkarun.me%2Fassets%2Fimages%2Fposts%2F2025-12-31-context-engineering%2Fsub-agents.svg" alt="Sub-agent workflow: main agent delegates tasks to sub-agents with fresh context, receives summaries back, and integrates results" width="500" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This mirrors how teams work: delegate, get summaries, integrate. Claude Code supports this pattern for &lt;a href="https://www.geeky-gadgets.com/how-to-use-git-worktrees-with-claude-code-for-seamless-multitasking/" rel="noopener noreferrer"&gt;parallel issue work&lt;/a&gt; using git worktrees.&lt;/p&gt;
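&lt;p&gt;The delegate, summarize, integrate loop can be sketched in a few lines. This is not how any particular tool implements sub-agents: &lt;code&gt;delegate&lt;/code&gt; and &lt;code&gt;run_subagent&lt;/code&gt; are hypothetical names, and the sketch assumes each worker returns a dictionary with a short &lt;code&gt;summary&lt;/code&gt; and a long &lt;code&gt;transcript&lt;/code&gt;.&lt;/p&gt;

```python
def delegate(tasks, run_subagent):
    """Run each task in a fresh worker and keep only its summary in the main context.

    `run_subagent` is a hypothetical callable returning
    {"summary": str, "transcript": str} for a single task.
    """
    main_context = []
    for task in tasks:
        # The worker starts clean: it receives only its own task.
        result = run_subagent(task)
        # Only the condensed summary returns; the full transcript is dropped.
        main_context.append(f"{task}: {result['summary']}")
    return main_context


def fake_subagent(task):
    # Stand-in worker used only to demonstrate the flow.
    return {"summary": "done", "transcript": "long working notes " * 100}


board = delegate(["fix parser", "update docs"], fake_subagent)
```

&lt;p&gt;The main board ends up with two short lines, however long each worker's transcript was.&lt;/p&gt;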

&lt;h3&gt;
  
  
  Tool-Specific Tips
&lt;/h3&gt;

&lt;p&gt;Each tool has different mechanisms for managing what goes on the board.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CLAUDE.md files load automatically at session start. Keep them focused and current.&lt;/li&gt;
&lt;li&gt;Hierarchical loading: user-level, project-level, directory-level. More specific overrides more general.&lt;/li&gt;
&lt;li&gt;Trust the tool's search. Don't paste file contents manually unless retrieval fails.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;/compact&lt;/code&gt; between logical units of work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cursor:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rules files inject instructions with different scopes: global, project, file-type specific.&lt;/li&gt;
&lt;li&gt;Use @-mentions deliberately. More files isn't better; relevant files are better.&lt;/li&gt;
&lt;li&gt;Keep rule files short. They add to every interaction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Copilot:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lighter touch. Works best for autocomplete and quick suggestions.&lt;/li&gt;
&lt;li&gt;Less configurable context, so prompt quality matters more.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Windsurf:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memories persist across sessions automatically.&lt;/li&gt;
&lt;li&gt;Good for maintaining preferences and patterns over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Aider, Cline, and similar terminal-based tools&lt;/strong&gt; follow the same principles. Different mechanisms, same underlying constraints. For a deeper comparison, see &lt;a href="https://dev.to/javatarz/how-to-choose-your-coding-assistants-90k"&gt;How to choose your coding assistants&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Principle
&lt;/h2&gt;

&lt;p&gt;Anthropic's engineering team puts it well in their &lt;a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" rel="noopener noreferrer"&gt;guide to context engineering&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Find the smallest set of high-signal tokens that maximize the likelihood of your desired outcome.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;More context isn't better. Relevant context is better. Your job is to curate what goes on the board so your teammate can focus on what matters.&lt;/p&gt;

&lt;p&gt;Context drives quality. But "quality context" doesn't mean volume. It means signal: information the model needs to reason correctly. Everything else dilutes attention.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Context engineering is a skill that develops with practice. Start by noticing when your tools perform well and when they drift. Ask why. Usually, the answer is in the context.&lt;/p&gt;

&lt;p&gt;Take a few minutes to examine how your tool handles context. Where do instructions go? How do files get included? What happens during long sessions?&lt;/p&gt;

&lt;p&gt;Understanding this is the difference between fighting your tools and working with them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coming up:&lt;/strong&gt; Context engineering is one piece of the puzzle. In &lt;a href="https://dev.to/javatarz/intelligent-engineering-a-skill-map-for-learning-ai-assisted-development-3kaj"&gt;intelligent Engineering: A Skill Map for Learning AI-Assisted Development&lt;/a&gt;, I map out the full landscape of skills worth building.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>career</category>
    </item>
    <item>
      <title>intelligent Engineering: Principles for Building With AI</title>
      <dc:creator>Karun Japhet</dc:creator>
      <pubDate>Sat, 27 Dec 2025 17:46:56 +0000</pubDate>
      <link>https://dev.to/javatarz/intelligent-engineering-principles-for-building-with-ai-34aa</link>
      <guid>https://dev.to/javatarz/intelligent-engineering-principles-for-building-with-ai-34aa</guid>
      <description>&lt;p&gt;Software engineering is changing. Again.&lt;/p&gt;

&lt;p&gt;I've spent the last two years applying AI across prototyping, internal tools, production systems, and team workflows. I've watched it generate elegant solutions in seconds and confidently produce complete nonsense. I've seen it save hours on boilerplate and cost hours debugging hallucinated APIs.&lt;/p&gt;

&lt;p&gt;One thing has become clear: AI doesn't make engineering easier. It shifts where the hard parts are.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://karun.me/assets/images/posts/2025-11-06-intelligent-engineering-building-skills-and-shaping-principles/cover.jpg" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fkarun.me%2Fassets%2Fimages%2Fposts%2F2025-11-06-intelligent-engineering-building-skills-and-shaping-principles%2Fcover.jpg" alt="AI and human collaboration in software engineering" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The teams I've seen succeed with AI aren't the ones using it everywhere. They're the ones using it deliberately, knowing when to trust it, when to verify, and when to ignore it entirely.&lt;/p&gt;

&lt;p&gt;Here's a working set of principles I've found useful. They aren't finished and will evolve with the tools. But they help keep me grounded in what actually matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  intelligent Engineering Principles
&lt;/h2&gt;

&lt;p&gt;These principles fall into two buckets: what is new, and what remains timeless but more important than ever.&lt;/p&gt;

&lt;h3&gt;
  
  
  AI-Native Principles
&lt;/h3&gt;

&lt;p&gt;These principles exist because of AI. They address challenges that didn't matter before.&lt;/p&gt;

&lt;h4&gt;
  
  
  AI augments, humans stay accountable.
&lt;/h4&gt;

&lt;p&gt;AI can help you move faster and see options you'd miss on your own. But it can't own the outcome. Engineering judgment stays with you. When something breaks in production, "the AI suggested it" isn't an acceptable answer.&lt;/p&gt;

&lt;h4&gt;
  
  
  Context is everything.
&lt;/h4&gt;

&lt;p&gt;AI output reflects what you put in. Vague requests get vague results. Bring useful context: project constraints, coding standards, relevant examples, what you've already tried.&lt;/p&gt;

&lt;p&gt;As systems grow, context management becomes a discipline of its own. How do new teammates get AI tools primed with the right information? How do you keep that context current? When context exceeds what fits in a prompt, you'll need solutions like modular documentation.&lt;/p&gt;

&lt;h4&gt;
  
  
  Smarter AI needs smarter guardrails.
&lt;/h4&gt;

&lt;p&gt;Faster generation demands sharper review. AI-produced code still needs validation: Is it correct? Secure? Does it solve the right problem?&lt;/p&gt;

&lt;h4&gt;
  
  
  Shape AI deliberately.
&lt;/h4&gt;

&lt;p&gt;I've seen teams adopt whatever AI tools are trending without asking whether they fit. Six months later, half the codebase assumed Copilot's import ordering, onboarding docs referenced prompts that no longer worked, and no one remembered why. Decide upfront: where does AI help us? Where does it not? What happens when we switch tools?&lt;/p&gt;

&lt;h4&gt;
  
  
  Learning never stops.
&lt;/h4&gt;

&lt;p&gt;At the start of 2025, AI practices evolved weekly. By year's end, monthly. That's still faster than most teams are used to. What didn't work three months ago might work now. The only way to know is to keep experimenting.&lt;/p&gt;

&lt;p&gt;I've settled on 90% getting work done, 10% experimenting. Try new ways to solve the same problem. Revisit old problems to see if there's a simpler solution now. Check if techniques you learned last quarter still make sense.&lt;/p&gt;

&lt;h3&gt;
  
  
  Timeless Foundations
&lt;/h3&gt;

&lt;p&gt;These aren't new, but AI makes them more important.&lt;/p&gt;

&lt;h4&gt;
  
  
  Learn fast, adapt continuously.
&lt;/h4&gt;

&lt;p&gt;Start small, validate often, and shorten feedback loops. If an AI-assisted workflow isn't helping, change it. Don't let sunk cost keep you on a bad path.&lt;/p&gt;

&lt;h4&gt;
  
  
  Fast doesn't mean good.
&lt;/h4&gt;

&lt;p&gt;AI makes it easy to generate code fast. That doesn't mean the code is worth keeping. Unmaintainable, insecure, or rigid solutions cost more than they save. Build the right thing, not just the quick thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;Here's what this means day-to-day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I use AI to draft implementations, then spend more time reviewing than I saved generating. The review is where the real work happens.&lt;/li&gt;
&lt;li&gt;When AI suggests an approach, I ask "why?" If I can't explain the choice to a teammate, I don't use it.&lt;/li&gt;
&lt;li&gt;I've learned to be specific. "Write a function to parse dates" gets garbage. "Parse ISO 8601 dates, handle timezone offsets, return None for invalid input" gets something useful.&lt;/li&gt;
&lt;li&gt;I treat AI output like code from a confident junior developer: often correct, sometimes subtly wrong, occasionally completely off base.&lt;/li&gt;
&lt;/ul&gt;
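&lt;p&gt;To show how much the specific prompt pins down, here is one plausible result of "Parse ISO 8601 dates, handle timezone offsets, return None for invalid input". It is a sketch, not the output of any particular model. The name &lt;code&gt;parse_iso_date&lt;/code&gt; is made up, and it leans on &lt;code&gt;datetime.fromisoformat&lt;/code&gt;, which accepts only a subset of ISO 8601 before Python 3.11 (for example, no trailing "Z").&lt;/p&gt;

```python
from datetime import datetime


def parse_iso_date(value):
    """Parse an ISO 8601 string into a datetime, or return None if it is invalid."""
    try:
        # Timezone offsets such as +05:30 are kept on the returned datetime.
        return datetime.fromisoformat(value)
    except (ValueError, TypeError):
        # ValueError: malformed string. TypeError: input was not a string at all.
        return None
```

&lt;p&gt;Each clause of the prompt maps to something you can check in review: the accepted format, the preserved offset, and the &lt;code&gt;None&lt;/code&gt; return on bad input.&lt;/p&gt;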

&lt;p&gt;The craft hasn't changed. I still need to understand the problem, reason about edge cases, and take responsibility for what ships.&lt;/p&gt;

&lt;h2&gt;
  
  
  Skills Worth Building
&lt;/h2&gt;

&lt;p&gt;Principles guide decisions. Skills make them possible.&lt;/p&gt;

&lt;p&gt;Here's what I've found worth investing in:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context engineering matters more than prompt engineering.&lt;/strong&gt; A clever prompt won't fix bad context. I spend more time curating what information the model sees than crafting how I ask for things. Project documentation, coding standards, relevant examples. These matter more than prompt tricks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding tokens and context windows helps.&lt;/strong&gt; You don't need to become an ML engineer. But it helps to know why your 50-file codebase overwhelms the model, or why it "forgets" earlier instructions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agentic workflow primitives matter more than AI theory.&lt;/strong&gt; You won't build RAG systems from scratch. You'll use tools with these built in. What matters is configuring them: hooks that customize behavior, skills that extend capabilities, context management that keeps information relevant. I spend more time learning how my tools' hooks work or how to structure context files than reading ML papers.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For a comprehensive guide to building these skills, see &lt;a href="https://dev.to/javatarz/intelligent-engineering-a-skill-map-for-learning-ai-assisted-development-3kaj"&gt;A Skill Map for Learning AI-Assisted Development&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;I've seen what happens when teams adopt AI without thinking it through. Prototypes that demo well but collapse under real load. Codebases where no one understands why decisions were made because "the AI suggested it." Bugs that take days to track down because the generated code looked plausible but handled edge cases incorrectly.&lt;/p&gt;

&lt;p&gt;The failure mode isn't dramatic. It's slow erosion: teams that gradually stop reasoning deeply because the model provides answers quickly.&lt;/p&gt;

&lt;p&gt;The alternative isn't avoiding AI. It's using it with intention. The engineers I've seen do this well have gotten faster &lt;em&gt;and&lt;/em&gt; more thoughtful. They use AI to handle the routine and focus on the hard problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;These principles aren't final. I expect to revise them as tools improve and as I learn what actually works versus what sounds good in theory.&lt;/p&gt;

&lt;p&gt;If you're experimenting with AI in your engineering work, I'd be curious to hear what's working for you. What would you add? What would you challenge?&lt;/p&gt;

&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;This post would not have been possible without&lt;/em&gt; &lt;a href="https://www.linkedin.com/in/greg-reiser-6910462/" rel="noopener noreferrer"&gt;&lt;em&gt;Greg Reiser&lt;/em&gt;&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://www.linkedin.com/in/gsong/" rel="noopener noreferrer"&gt;&lt;em&gt;George Song&lt;/em&gt;&lt;/a&gt; &lt;em&gt;and&lt;/em&gt; &lt;a href="https://www.linkedin.com/in/karthika-vijayan/" rel="noopener noreferrer"&gt;&lt;em&gt;Karthika Vijayan&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, who reviewed multiple versions of it and provided patient feedback 😀.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This content has been written on the shoulders of giants (at and outside&lt;/em&gt; &lt;a href="https://sahaj.ai" rel="noopener noreferrer"&gt;&lt;em&gt;Sahaj&lt;/em&gt;&lt;/a&gt;&lt;em&gt;).&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>career</category>
    </item>
    <item>
      <title>Level Up Code Quality with an AI Assistant</title>
      <dc:creator>Karun Japhet</dc:creator>
      <pubDate>Sat, 27 Dec 2025 17:46:40 +0000</pubDate>
      <link>https://dev.to/javatarz/level-up-code-quality-with-an-ai-assistant-5cdn</link>
      <guid>https://dev.to/javatarz/level-up-code-quality-with-an-ai-assistant-5cdn</guid>
      <description>&lt;p&gt;Using AI coding assistants to introduce, automate, and evolve quality checks in your project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://karun.me/assets/images/uploads/code-quality-with-ai-cover-art.png" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx2goflob85nsv9o387f2.png" alt="Choosing Coding Assistants Cover Art: Choose your tool" width="650" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have talked about a &lt;a href="https://dev.to/javatarz/what-makes-developer-experience-world-class-4l3i"&gt;world-class developer experience&lt;/a&gt; being a prerequisite for a well-functioning team. When teams lack such a setup, the most common reason given is a lack of time or stakeholder buy-in to build these things. With &lt;a href="https://dev.to/javatarz/how-to-choose-your-coding-assistants-90k"&gt;AI coding assistants being readily available to most developers today&lt;/a&gt;, both the engineering effort and the cost to the business are lower, which reduces the barrier to entry.&lt;/p&gt;

&lt;h1&gt;
  
  
  Current State
&lt;/h1&gt;

&lt;p&gt;This post showcases an actual codebase that has not been actively maintained for over 5 years but runs a product that is actively used. It is business critical but did not have the necessary safety nets in place. Let us walk through the journey, prompts included, of improving the code quality of this repository, one prompt at a time.&lt;/p&gt;

&lt;p&gt;The project is a Django backend application that exposes APIs. We start off with a quick overview of the code and notice that there are tests and some documentation but no consistent way to run and test the application.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Journey
&lt;/h1&gt;

&lt;p&gt;I am assuming you are running these commands using Claude Code (with Claude Sonnet 4 in most cases). This is equally applicable across any coding assistant. Results will vary based on your choices of models, prompts and the codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up Basic Documentation and Some Automation
&lt;/h2&gt;

&lt;p&gt;If you are using a tool like Claude Code, run &lt;code&gt;/init&lt;/code&gt; in your repository and you will get a significant part of this documentation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Can you analyse the code and write up documentation in README.md that
 clearly summarises how to setup, run, test and lint the application.
Please make sure the file is concise and does not repeat itself. 
Write it like technical documentation. Short and sweet.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The next step is to set up some automation (such as a justfile) to make the project easier to use. This will take a couple of attempts to get right, but here is a prompt you can start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Please write up a just file. I would like the following commands
`just setup` - set up all the dependencies of the project
`just run` - start up the applications including any dependencies
`just test` - run all tests
If you require clarifications, please ask questions. 
Think hard about what other requirements I need to fulfill. 
Be critical and question everything. 
Do not make code changes till you are clear on what needs to be done.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will give you a base structure that you can quickly modify to get up and running. If your &lt;code&gt;README.md&lt;/code&gt; has a preferred way to run the application (locally vs docker), the justfile will automatically use it. If not, you will have to provide clarification.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up pre-commit for Early Feedback
&lt;/h2&gt;

&lt;p&gt;Let’s start small and build on it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Please setup pre-commit with a single task to run all tests on every push.
Update the just script to ensure pre-commit hooks are installed locally
 during the setup process.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We probably didn’t need to be this explicit, but I find that managing context and keeping tasks small means I move a lot quicker.&lt;/p&gt;

&lt;h2&gt;
  
  
  Curating Code Quality Tools
&lt;/h2&gt;

&lt;p&gt;Let’s begin by finding good tools to use, creating a plan for the change and then executing it. Start off by moving Claude Code to &lt;code&gt;Plan mode&lt;/code&gt; (shift+tab twice).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What's a good tool to check the complexity of the python code this
 repository has and lint on it to provide the team feedback as a 
 pre-commit hook?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It came back with a set of tools I liked but it assumed that the commit will immediately go green. In an existing large codebase with tech debt, this will not happen. Let’s break this down further.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The list of tools you're suggesting sound good. 
The codebase currently will have a very large number of violations. 
I want the ability to incrementally improve things with every commit. 
How do we achieve this?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Creating a Plan
&lt;/h2&gt;

&lt;p&gt;After you iterate on the previous prompt with the agent, you will get a plan that you’ll be happy with. The AI assistant will ask for permission to move forward and execute the plan, but before doing so it is worth creating a save state. Think of this as a video game save: if something goes wrong, you can come back and restore from this point. This also allows you to clear context since everything is dumped to markdown files on disk.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Can you create a plan that is executable in steps?
Write that plan to `docs/code-quality-improvement`.
Try to use multiple background agents if it helps speed up this process.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Give it a few minutes to analyse the code. In my case, the following files were created. &lt;code&gt;README.md&lt;/code&gt; says that “Tasks within the same phase can be executed in parallel by multiple Claude Code assistants, as long as prerequisites are met”. You are ready to hit &lt;code&gt;/clear&lt;/code&gt; and clear out the context window.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fovxjoy0gaqqgiu4ox25b.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fovxjoy0gaqqgiu4ox25b.jpg" alt="Plan as tasks" width="607" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Phase 1 sets up the basic tools, phase 2 configures them, phase 3 focuses on integration and automation and phase 4 adds monitoring and focuses on improving the code quality.&lt;/p&gt;

&lt;p&gt;Before executing the plan, I commit the plan (&lt;code&gt;docs/code-quality-improvement&lt;/code&gt;). This allows me to track any changes that have been made. When executing the plan, I do not check in the changes made to the plan. This allows me to drop the plan at the end of the process. As a team, we have discussed potentially keeping the plan around as an artifact. To do so, you would have to ask Claude Code to use relative paths (it uses absolute paths when asking for files to be updated in the plan).&lt;/p&gt;

&lt;h2&gt;
  
  
  Executing the Plan
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I would like to improve code quality and I have come up with a plan to do 
so under `docs/code-quality-improvement`.
Can you analyse the plan and start executing it? The `README.md` has a 
quick start section which talks about how to execute different phases of the 
plan. As you execute the plan, mark tasks as done to track state.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will notice that Claude Code adds dependencies to &lt;code&gt;requirements-dev.txt&lt;/code&gt; and tries to run things without installing them. It will also add dependencies that do not exist. Stop the execution (by pressing &lt;code&gt;Esc&lt;/code&gt;) and use the following prompt to course correct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;For every pip dependency you add to `requirements-dev.txt`, please run 
`pip install`. 
Before adding a dependency to the dependency file, please check if it is 
available on `pip`.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
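
&lt;p&gt;If you would rather verify this yourself than rely on the assistant, the check can be sketched in a few lines of Python. This is a minimal sketch, not what Claude Code runs: the function names are hypothetical, and it assumes the public PyPI JSON endpoint (&lt;code&gt;https://pypi.org/pypi/NAME/json&lt;/code&gt;) answers with a 404 for packages that do not exist.&lt;/p&gt;

```python
import re
import urllib.error
import urllib.request

# Matches the leading package name of a requirement line such as "flake8==7.0.0".
NAME_PATTERN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_name(line):
    """Return the package name from one requirements line, or None for
    blank lines, comments and option lines such as "-r base.txt"."""
    stripped = line.strip()
    if not stripped or stripped.startswith(("#", "-")):
        return None
    match = NAME_PATTERN.match(stripped)
    return match.group(0) if match else None


def pypi_url(name):
    """Build the PyPI JSON API URL for a package name."""
    return f"https://pypi.org/pypi/{name}/json"


def exists_on_pypi(name, timeout=10):
    """Return True if PyPI knows the package, False on a 404 (needs network access)."""
    try:
        with urllib.request.urlopen(pypi_url(name), timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise  # Any other HTTP error is unexpected, so surface it.
```

&lt;p&gt;Running &lt;code&gt;requirement_name&lt;/code&gt; over each line of &lt;code&gt;requirements-dev.txt&lt;/code&gt; and passing the results to &lt;code&gt;exists_on_pypi&lt;/code&gt; flags invented dependencies before &lt;code&gt;pip install&lt;/code&gt; fails on them.&lt;/p&gt;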



&lt;p&gt;Once phase 1 and phase 2 of the plan are complete, the following files are created and ready to be committed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6xbydjyi1v45ix5bkgjv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6xbydjyi1v45ix5bkgjv.jpg" alt="Linting tools setup" width="253" height="142"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When the quality gates are added in phase 3, run the command once to test that everything works and create another commit. After this, I had to prompt it once more to integrate the lint steps into a simplified developer experience.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Please add `just lint` as a command to run all quality checks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test the brand new lint command and then create a commit. Ask Claude Code to proceed to phase 4.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzo9ktrp5b8z4dk901cv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzo9ktrp5b8z4dk901cv.jpg" alt="Claude Code’s self doubt" width="538" height="319"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You might see Claude Code doubt a plan that it has created. It is a good question because the system is &lt;em&gt;functional&lt;/em&gt;, but if we prefer the more advanced checks, we should ask it to push on with the phase 4 implementation.&lt;/p&gt;

&lt;p&gt;After phase 4, we have a codebase that checks code quality every time a developer pushes code. Our repository has pre-commit hooks for linting and runs all quality checks once before pushing. The quality checks fail if the added code has unformatted files, imports in the wrong order, &lt;code&gt;flake8&lt;/code&gt; lint issues or functions with high code complexity. They run only on the files being touched (because we told the assistant that we had debt that needs to be reduced and that not all checks would pass by default).&lt;/p&gt;

&lt;p&gt;You still have debt. Let’s go over fixing it in the next step.&lt;/p&gt;
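
&lt;p&gt;To make the idea of checking only touched files concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the script Claude Code generated for this repository: the function names are hypothetical, and it assumes &lt;code&gt;git&lt;/code&gt; and &lt;code&gt;flake8&lt;/code&gt; are on the path.&lt;/p&gt;

```python
import subprocess


def python_files(diff_output):
    """Keep only the Python paths from `git diff --name-only` output."""
    paths = [line.strip() for line in diff_output.splitlines()]
    return [path for path in paths if path.endswith(".py")]


def staged_python_files():
    """Ask git for staged files that were added, copied or modified."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    )
    return python_files(result.stdout)


def flake8_command(files, max_complexity=20):
    """Build a flake8 invocation limited to the given files."""
    return ["flake8", f"--max-complexity={max_complexity}", *files]


if __name__ == "__main__":
    files = staged_python_files()
    if files:
        # A non-zero exit code from flake8 fails the hook.
        raise SystemExit(subprocess.run(flake8_command(files)).returncode)
```

&lt;p&gt;Because untouched files are never passed to the linter, existing violations elsewhere in the codebase do not block a commit, while every file a developer edits has to meet the bar.&lt;/p&gt;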

&lt;h2&gt;
  
  
  Fixing Existing Debt
&lt;/h2&gt;

&lt;p&gt;Tools like &lt;code&gt;isort&lt;/code&gt; can both highlight issues and fix them. Start by running such commands to fix the code. On most codebases, this will touch almost all of the files. The challenge is that the issues these tools cannot fix automatically (like wildcard imports) still need to be fixed. This is where you choose between fixing them by hand and having a coding assistant fix them. If you’re using Claude Code to fix these issues and there is a large number of them, you’re probably going to pay upwards of $10 for this session on any decent-sized codebase. I recommend moving to GitHub Copilot’s agent to help push down costs here.&lt;/p&gt;

&lt;p&gt;Ask your coding assistant of choice to run the lint command and fix the issues. Most of them will stop after 1–2 attempts because the list is large. You can tell it to “keep doing this task till there are no linting errors left. DO NOT stop till the lint command passes”. If your context file (&lt;code&gt;CLAUDE.md&lt;/code&gt;) does not talk about how to lint, be explicit and tell your coding assistant which command to run.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Left?
&lt;/h2&gt;

&lt;p&gt;If you look at the &lt;code&gt;gradual-tightening&lt;/code&gt; task, it created a command to analyse the code and gradually make the checks stricter. This command can be run either manually or automatically on a pipeline. One of the parameters it changes is &lt;code&gt;max-complexity&lt;/code&gt;, which is set to 20 by default. This threshold will be reduced over time. Similarly, the complexity check tasks start with a lower bar, which should be raised periodically to tighten the quality guidelines on this repository.&lt;/p&gt;
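
&lt;p&gt;The tightening itself is simple arithmetic, and a small Python sketch shows the shape of it. The starting value of 20 comes from this repository; the step of 2, the floor of 10 and the function names are assumptions made for illustration, not what the generated command uses.&lt;/p&gt;

```python
def next_max_complexity(current, step=2, floor=10):
    """Lower the allowed complexity by `step`, never going below `floor`."""
    if not step > 0:
        raise ValueError("step must be positive")
    return max(floor, current - step)


def tightening_schedule(start=20, step=2, floor=10):
    """List the thresholds from `start` down to `floor`, one per tightening run."""
    schedule = [start]
    while schedule[-1] != floor:
        schedule.append(next_max_complexity(schedule[-1], step, floor))
    return schedule
```

&lt;p&gt;Each run of the tightening command would move the threshold one entry along this schedule, so the team always knows the next target and the point at which tightening stops.&lt;/p&gt;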

&lt;p&gt;While our AI coding pair has helped design and improve the code quality to a large extent, the last mile has to be walked by all of our teammates. We now have a strong feedback mechanism for bad code that will fail the pipeline and stop code from being committed or pushed. The last bit requires team culture to be built. On one of my teams, we had a soft check in every retro to see if every member had made the codebase a little bit better in a sprint. A sprint is 10 days and “a little bit” can include refactoring a tiny 2–3 line function and making it better. The bar is really low but the social pressure of wanting to make things better motivated all of us to drive positive change.&lt;/p&gt;

&lt;p&gt;Having a high quality codebase with a good developer experience is not a pipe dream and making it a reality is easier than ever with AI coding assistants like Claude Code or Copilot. What have you been able to improve recently? 😃&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>testing</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
