<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ian Johnson</title>
    <description>The latest articles on DEV Community by Ian Johnson (@tacoda).</description>
    <link>https://dev.to/tacoda</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F171498%2Fb1207a6e-f740-43c4-bb64-c675e3b3ce1d.jpeg</url>
      <title>DEV Community: Ian Johnson</title>
      <link>https://dev.to/tacoda</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tacoda"/>
    <language>en</language>
    <item>
      <title>Harness Engineering — Building Reliable Workflows Around Non-Deterministic Agents</title>
      <dc:creator>Ian Johnson</dc:creator>
      <pubDate>Sun, 21 Jun 2026 22:55:28 +0000</pubDate>
      <link>https://dev.to/tacoda/harness-engineering-building-reliable-workflows-around-non-deterministic-agents-3f5m</link>
      <guid>https://dev.to/tacoda/harness-engineering-building-reliable-workflows-around-non-deterministic-agents-3f5m</guid>
      <description>&lt;p&gt;Six months ago this was a blog post. Then a second blog post. Then Bridle, Sellier, intent-driven-delivery, and eventually Keystone.&lt;/p&gt;

&lt;p&gt;Now it’s a book: &lt;em&gt;Harness Engineering — Building Reliable Workflows Around Non-Deterministic Agents.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What surprised me writing it: the experiments earned their place. Bridle taught me the flywheel. Sellier taught me defaults beat flexibility. IDD taught me the team layer can be reified. Keystone is the synthesis they pointed at.&lt;/p&gt;

&lt;p&gt;It’s on LeanPub now. Pay what you want, starts at $10.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://leanpub.com/harness-engineering" rel="noopener noreferrer"&gt;https://leanpub.com/harness-engineering&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agenticworkflow</category>
      <category>books</category>
      <category>softwareengineering</category>
      <category>harnessengineering</category>
    </item>
    <item>
      <title>Sensors: The Other Half of the Harness</title>
      <dc:creator>Ian Johnson</dc:creator>
      <pubDate>Sat, 20 Jun 2026 15:15:50 +0000</pubDate>
      <link>https://dev.to/tacoda/sensors-the-other-half-of-the-harness-3mml</link>
      <guid>https://dev.to/tacoda/sensors-the-other-half-of-the-harness-3mml</guid>
      <description>&lt;p&gt;The pre-commit hook caught a migration last Tuesday that would have shipped to staging green and broken production on the next deploy. The migration dropped a foreign key constraint without naming a backfill plan, and the rule that said &lt;em&gt;every constraint change needs a backfill plan&lt;/em&gt; had been sitting in migrations.md for four months. The rule didn’t stop the agent. The rule didn’t stop the human reviewer either. The diff was 38 files and the constraint drop was a single line. What stopped it was a four-line shell script wired to pre-commit that grepped the staged migration for DROP CONSTRAINT and exited non-zero if no --backfill: comment followed it.&lt;/p&gt;

&lt;p&gt;That four-line script is what I want to talk about. Most of what gets written about agent harnesses is about rules — what CLAUDE.md should say, where to put it, how to scope it, when to version it. Rules are half the harness. The other half is the set of checks that fire when a rule gets broken. I call them sensors, and they get talked about a lot less than they should.&lt;/p&gt;

&lt;h3&gt;
  
  
  The asymmetry in the conversation
&lt;/h3&gt;

&lt;p&gt;Read any of the well-circulated posts on agent harnesses and count the words spent on each side. Rules are the centerpiece. Sensors are glossed over, as if they’re so obvious they don’t deserve their own treatment. They aren’t obvious, and the assumption that they’re already there is the most expensive assumption in the field.&lt;/p&gt;

&lt;p&gt;The asymmetry has a reason. Rules are easier to write. A rule is a paragraph. A sensor is a script, a hook, a CI step, a pre-commit config, a custom check. Rules sit at the cognitive level a writer naturally operates at; sensors sit at the level of plumbing. The first one is a lot more fun to write than the second.&lt;/p&gt;

&lt;p&gt;The asymmetry also has a cost. &lt;em&gt;Rules without sensors are vibes.&lt;/em&gt; The agent reads them, claims to follow them, and gets graded on whether the resulting code &lt;em&gt;looks&lt;/em&gt; like it followed them. &lt;em&gt;Looks like&lt;/em&gt; is the failure mode. The agent is a probabilistic system. It will obey a rule most of the time and skip it some of the time, and the times it skips are the times the rule mattered most: the awkward branch, the migration nobody likes to think about, or the edge case the rule was written to catch.&lt;/p&gt;

&lt;p&gt;A sensor changes the contract. The rule is no longer a hope. It’s a falsifiable check.&lt;/p&gt;

&lt;h3&gt;
  
  
  What a sensor actually is
&lt;/h3&gt;

&lt;p&gt;A sensor is anything that can detect whether a rule was followed and signal that detection in a way the workflow respects. The shape is narrow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It runs deterministically.&lt;/li&gt;
&lt;li&gt;It returns a clear pass or fail.&lt;/li&gt;
&lt;li&gt;It fires at a point in the workflow where its result still matters.&lt;/li&gt;
&lt;li&gt;It’s cheap enough that nobody is tempted to skip it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of the sensors in a working harness aren’t fancy. A grep with set -e. A line in a linter config. A pytest fixture that asserts the database fixture hasn’t drifted. A pre-commit hook that runs mypy on changed files. A GitHub Actions job that fails the PR if the migration directory has a file without a paired rollback. None of these is impressive on its own. The point isn’t the individual sensor; &lt;em&gt;it’s the discipline of having one for every rule that matters.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Five places a sensor fires
&lt;/h3&gt;

&lt;p&gt;There are five workflow positions where sensors earn their pay. Each catches a different class of failure. A working harness has sensors at most of them.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pre-edit
&lt;/h4&gt;

&lt;p&gt;The earliest sensor is the one the agent encounters before it writes code. The rule is loaded into context; the sensor is something the agent can run to check its own work before producing output. A type stub generator, a schema dump, a --dry-run command that simulates the change. The agent doesn’t need approval; it just needs to see what its proposed change &lt;em&gt;would&lt;/em&gt; do.&lt;/p&gt;

&lt;p&gt;Pre-edit sensors are the rarest of the five, because they require giving the agent tools, not just rules. The pay-off is that they catch errors before any bytes hit disk. The agent that can run prisma validate on a proposed schema change is going to make fewer broken commits than the agent that can only read the rule that says &lt;em&gt;make sure your schema is valid&lt;/em&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pre-commit
&lt;/h4&gt;

&lt;p&gt;Pre-commit is where most teams put their first sensor, because Git makes it easy. A .pre-commit-config.yaml or a husky hook fires the moment the agent (or human) tries to land a change. The check has the staged diff to look at, the rest of the repo for context, and a hard exit code that the workflow respects.&lt;/p&gt;

&lt;p&gt;This is the sensor layer for rules that constrain the &lt;em&gt;artifact&lt;/em&gt; — what the code looks like, not what it does. Format, lint, type, dead code, banned imports, naming patterns, the migration backfill check above. Cheap, fast, local. The agent that breaks a pre-commit hook either fixes the violation or doesn’t commit. Either outcome beats &lt;em&gt;the agent commits and nobody notices&lt;/em&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pre-PR
&lt;/h4&gt;

&lt;p&gt;A pre-PR sensor is one that runs after the commits exist but before the diff goes up for review. The check has more to work with: a full branch, a base ref, a set of changed files that’s no longer one commit at a time. It’s the right layer for cross-file checks. &lt;em&gt;Did you change this API without updating its consumers? Did you add a new migration without bumping the schema version? Does the test coverage diff drop more than two points?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Pre-PR runs cheap on CI and slow locally. Most teams run it on the branch push, which catches the agent’s work right as it’s offering the work for human attention. That timing matters. A check that fires after the human has started reviewing is a check that’s been skipped.&lt;/p&gt;

&lt;h4&gt;
  
  
  Post-merge
&lt;/h4&gt;

&lt;p&gt;Post-merge sensors are the ones that run against main. They aren’t strictly part of the agent’s loop, but they’re part of the harness because they detect when the loop produced something that broke once integrated. Smoke tests, end-to-end suites, schema-diff jobs that compare staging to production, query-plan monitors that fire when a slow query joins the rotation.&lt;/p&gt;

&lt;p&gt;The instinct is to think of post-merge as &lt;em&gt;CI&lt;/em&gt; and not as part of the harness. That’s a mistake. The agent’s behavior is shaped by what fails &lt;em&gt;after&lt;/em&gt; it shipped. A post-merge sensor that catches a regression and gets traced back to a missing rule is the most valuable kind of sensor in the whole stack. It tells you something the inner sensors missed, and it tells you what to add upstream.&lt;/p&gt;

&lt;h4&gt;
  
  
  Drift
&lt;/h4&gt;

&lt;p&gt;The fifth and most-skipped sensor is the one that checks whether the rules themselves are still true. The codebase moves. The framework version bumps. The pattern the rule used to describe gets replaced by a new pattern. The rule is now wrong, and nobody notices because nothing fires.&lt;/p&gt;

&lt;p&gt;A drift sensor is the check that wakes up periodically and asks: does this rule still match reality? Sometimes that’s a literal grep for the pattern the rule references. Sometimes it’s a count of how many files in the codebase still match the rule’s example. Sometimes it’s a last-modified audit that flags any rule older than six months for review. The cadence is monthly or quarterly; the goal is to keep the harness honest as the code beneath it changes.&lt;/p&gt;
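
&lt;p&gt;The last-modified audit is the easiest of the three to sketch. The snippet below is a minimal illustration, not a finished tool: the function name stale_rules, the 183-day window, and the idea of passing in a path-to-date mapping are all assumptions made for the example. How the dates get collected (git log, file mtime) is left out on purpose.&lt;/p&gt;

```python
from datetime import date, timedelta


def stale_rules(last_modified, today, max_age_days=183):
    """Return rule files whose last edit is older than the review window.

    last_modified: dict of rule path to the date it was last edited
                   (hypothetical input; gather it however your repo allows).
    """
    cutoff = today - timedelta(days=max_age_days)
    # A rule is stale when its last edit falls before the cutoff date.
    return sorted(path for path, edited in last_modified.items() if cutoff > edited)
```

&lt;p&gt;Run on a monthly schedule, a non-empty result is the signal to reread those rules against the current code.&lt;/p&gt;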

&lt;p&gt;I’ll write a whole post about rule-rot soon. The short version: &lt;em&gt;a drift sensor is the difference between a harness that’s six months old and helpful and a harness that’s six months old and lying.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The pairing rule
&lt;/h3&gt;

&lt;p&gt;If a rule matters, it has a sensor. That’s the principle. It sounds rigid, and it is, on purpose.&lt;/p&gt;

&lt;p&gt;The discipline works because it forces a real question every time a rule gets written: &lt;em&gt;how would I know if this got violated?&lt;/em&gt; If the answer is &lt;em&gt;I’d see it in code review&lt;/em&gt;, the rule is on probation. Code review is slow, expensive, and inconsistent. It is also the wrong layer for rules that produce mechanical violations.&lt;/p&gt;

&lt;p&gt;If the answer is &lt;em&gt;the linter would catch it&lt;/em&gt;, the rule probably doesn’t need to be a rule. &lt;strong&gt;The linter is the rule. Just configure the linter and skip the prose.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The interesting cases are the rules where the answer is &lt;em&gt;nothing currently checks for this, but I could write a check&lt;/em&gt;. Those are the rules that earn a sensor. The sensor doesn’t have to ship the same day the rule ships, but it should be on the list and have a date.&lt;/p&gt;

&lt;p&gt;The rules that aren’t worth a sensor are the rules that aren’t worth writing down. &lt;em&gt;Be thoughtful&lt;/em&gt; is not a rule. &lt;em&gt;Bullet points should be sentence fragments unless they form a list of full sentences&lt;/em&gt; is also not a rule, because nothing catches a violation and the cost of writing it down is more than the cost of fixing the rare slip in editing.&lt;/p&gt;

&lt;p&gt;A rule with no sensor is either too vague to matter or important enough that someone should pair it within the month. &lt;em&gt;Either way, the absence of a sensor is information.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What good sensors share
&lt;/h3&gt;

&lt;p&gt;The sensors that survive in production look alike. They share five traits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They run in seconds.&lt;/strong&gt; A pre-commit hook that takes 90 seconds gets bypassed within two weeks. A pre-PR check that takes 12 minutes gets the --skip-checks shortcut added to the team Slack. The check has to be fast enough that running it is cheaper than figuring out how to avoid running it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They produce clear messages.&lt;/strong&gt; A failed sensor that prints EXIT 1 is a failed sensor that’s about to get ignored. A failed sensor that prints &lt;em&gt;migrations/20260620_drop_users_index.sql dropped an index without a paired backfill comment. Add a comment starting with --backfill: explaining the plan, or annotate the migration with --no-backfill-needed:&lt;/em&gt; is a sensor that teaches the agent (or the human) how to fix the violation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They run locally.&lt;/strong&gt; A check that only exists in CI is a check that the agent finds out about ten minutes after it pushed. &lt;em&gt;Running the same check locally before push closes the loop.&lt;/em&gt; The pre-commit framework and npm run check-style scripts both make this easy; the discipline is to make sure CI runs the &lt;em&gt;same&lt;/em&gt; scripts the developer can run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They cite the rule they enforce.&lt;/strong&gt; Every sensor message that earns its weight names the rule. &lt;em&gt;Failed: rule MIGRATIONS-04 (foreign-key changes require a backfill plan).&lt;/em&gt; The agent reads the message; if the rule is wrong, the agent can find it and propose a change. The sensor is the gate, but the rule is the explanation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They fail loud and pass quiet.&lt;/strong&gt; A sensor that prints OK checked 47 migration files in 1.2s every commit is a sensor whose output everyone learns to ignore. Pass silently. Fail with a paragraph. Save the human’s attention for the cases that need it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where sensors live in the project
&lt;/h3&gt;

&lt;p&gt;A working sensor layout has four physical homes. Each holds the sensors for a different workflow position.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.husky/ # or .pre-commit-config.yaml
  pre-commit # fast, staged-files only
  commit-msg # commit message linting
.github/workflows/
  pr-checks.yml # pre-PR, runs on push
  post-merge.yml # post-merge, runs on main
.claude/
  hooks/ # agent-facing pre-edit checks
  sensors/ # one shell script per drift check
scripts/check-*.sh # the actual checks, invoked from above
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The shape isn’t sacred. &lt;em&gt;What matters is that each sensor has a single home and the workflow knows where to call it.&lt;/em&gt; The anti-pattern is the same check running in three places with slightly different logic and the team unable to remember which one is authoritative.&lt;/p&gt;

&lt;p&gt;The sensors themselves go in scripts/ (or wherever your project keeps tooling) and get called by whichever runner needs them. That separation means the same check can fire from pre-commit, from CI, and from a drift-audit cron without three copies. One script, three invocations.&lt;/p&gt;

&lt;p&gt;Personally, I prefer to wrap these scripts in make tasks. The benefit is that the agent always knows to reach for make, which makes everything easier to find and more consistent. I use make in CI too. I usually work with docker, so the two pair naturally. &lt;em&gt;Isolate the environment with docker; isolate the commands with make.&lt;/em&gt; That makes it easier for the human &lt;em&gt;and&lt;/em&gt; the agent to work with the repository.&lt;/p&gt;

&lt;h3&gt;
  
  
  Two sensors worth writing this month
&lt;/h3&gt;

&lt;p&gt;The two highest-value sensors for a team starting from zero are both small.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The migration safeguard.&lt;/strong&gt; Whatever your migration tool is, the team’s hardest-to-catch bugs ship through it. A pre-commit script that knows the patterns your migrations are supposed to avoid — DROP COLUMN without paired backfill, ADD COLUMN NOT NULL without a default, schema-breaking renames without a two-phase plan — catches more production issues than any rule on its own. The agent will write migrations that look right and miss the constraint. The sensor sees what the human reviewer doesn’t.&lt;/p&gt;
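
&lt;p&gt;As a rough sketch of the idea, here is one way the core check could look. It is written in Python rather than shell so it is easy to test, and it is an illustration under assumptions, not the script from the story: the function name check_migration, the two patterns it looks for, and the choice to accept a marker on the same line or the next one are all invented for the example.&lt;/p&gt;

```python
import re

# Marker comments borrowed from the example failure message in this post.
BACKFILL_MARKERS = ("--backfill:", "--no-backfill-needed:")
# Assumed set of risky statements; extend it to match your own migrations.
RISKY = re.compile(r"\bDROP\s+(CONSTRAINT|COLUMN)\b", re.IGNORECASE)


def check_migration(path, sql):
    """Return a list of failure messages; an empty list means the check passed."""
    failures = []
    lines = sql.splitlines()
    for number, line in enumerate(lines, start=1):
        if not RISKY.search(line):
            continue
        # Accept a marker on the risky line itself or on the line after it.
        window = lines[number - 1:number + 1]
        if any(marker in text for text in window for marker in BACKFILL_MARKERS):
            continue
        failures.append(
            f"{path}:{number}: destructive change without a backfill plan. "
            "Add a comment starting with --backfill: explaining the plan, "
            "or --no-backfill-needed: if none is required."
        )
    return failures
```

&lt;p&gt;A pre-commit wrapper would call this on each staged migration file, print any messages, and exit non-zero when the list is not empty.&lt;/p&gt;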

&lt;p&gt;&lt;strong&gt;The diff-budget gate.&lt;/strong&gt; A pre-PR check that fails if the diff exceeds N files or M lines. The number is yours to pick. I use 15 files and 400 lines, which roughly maps to a PR a human can review in one sitting. The rule that says &lt;em&gt;keep PRs small&lt;/em&gt; is exactly the kind of rule that gets nodded at and then drowned in the next month. The sensor that fails the build at file 16 is the rule with teeth.&lt;/p&gt;
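
&lt;p&gt;A minimal sketch of the gate, assuming the input is the text produced by git diff --numstat against the base branch (tab-separated added, deleted, path). The function name check_diff_budget is hypothetical, and the defaults are simply the 15 files and 400 lines mentioned above.&lt;/p&gt;

```python
MAX_FILES = 15   # budget from the post; pick your own numbers
MAX_LINES = 400


def check_diff_budget(numstat, max_files=MAX_FILES, max_lines=MAX_LINES):
    """Check `git diff --numstat` output against a size budget.

    Returns None when the diff fits, otherwise a failure message.
    """
    files = 0
    changed = 0
    for row in numstat.splitlines():
        parts = row.split("\t")
        if len(parts) != 3:
            continue  # skip blank or unexpected rows
        added, deleted, _path = parts
        files += 1
        # Binary files report "-" for both counts: count the file, not lines.
        if added.isdigit() and deleted.isdigit():
            changed += int(added) + int(deleted)
    if files > max_files or changed > max_lines:
        return (
            f"Diff budget exceeded: {files} files, {changed} lines "
            f"(budget: {max_files} files, {max_lines} lines). Split the PR."
        )
    return None
```

&lt;p&gt;Returning a message instead of a bare boolean keeps the failure self-explanatory, which matters more than the threshold itself.&lt;/p&gt;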

&lt;p&gt;Both are under fifty lines of shell. Both pay for themselves the first week.&lt;/p&gt;

&lt;h3&gt;
  
  
  The anti-patterns to watch
&lt;/h3&gt;

&lt;p&gt;A few sensor failure modes recur often enough to name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The unrunnable sensor.&lt;/strong&gt; It lives in CI, takes 14 minutes, and depends on a secret only the deploy bot has. Nobody on the team can run it locally. By the time it fires, the human reviewer has already mentally signed off. The fix: factor out the locally runnable piece and run &lt;em&gt;that&lt;/em&gt; on every commit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The sensor that always passes.&lt;/strong&gt; Someone wrote it months ago, the inputs changed, the assertions still hold, the failure case is no longer reachable. The sensor is a green light glued to the dashboard. The fix: every sensor needs at least one &lt;em&gt;known-bad&lt;/em&gt; fixture that the test suite runs against it to confirm it still detects the violation it was written for.&lt;/p&gt;
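
&lt;p&gt;The known-bad fixture idea can be sketched in a few lines. Everything here is illustrative: sensor_no_throw is a toy sensor standing in for a real check, and sensor_still_detects is a hypothetical helper name. The point is only that the sensor must fail a sample that violates the rule and pass one that does not.&lt;/p&gt;

```python
import re


def sensor_no_throw(source):
    """Toy sensor: pass (True) only if the source contains no `throw`."""
    return re.search(r"\bthrow\b", source) is None


def sensor_still_detects(sensor, known_bad, known_good):
    """Confirm the sensor fails the bad fixture and passes the good one."""
    return sensor(known_bad) is False and sensor(known_good) is True
```

&lt;p&gt;A sensor that has quietly turned into a constant green light fails this self-test, because it passes the known-bad fixture too.&lt;/p&gt;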

&lt;p&gt;&lt;strong&gt;The flaky sensor.&lt;/strong&gt; It fails sometimes and passes the rest. The team learns to rerun. Within a month, the rerun habit has spread to the other checks, and a real failure gets bypassed because it &lt;em&gt;looked&lt;/em&gt; flaky. The fix: &lt;strong&gt;a flaky sensor is broken.&lt;/strong&gt; Pull it from the gate, fix it, or delete it. Do not leave it firing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rule with no sensor.&lt;/strong&gt; The whole point of this post. A rule that lives in CLAUDE.md and has no check is a rule that gets followed when the agent feels like it. The fix is the pairing rule above: if the rule matters, pair it within a month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The sensor with no rule.&lt;/strong&gt; Less common but worth naming. A check fires, a developer doesn’t know why, the rule it’s enforcing isn’t written down. The check is correct on its own terms but unteachable. The fix: cite the rule in the failure message. If you can’t, write the rule.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pair one rule to a sensor this week
&lt;/h3&gt;

&lt;p&gt;Pick one rule from your CLAUDE.md — the one you find yourself reminding the agent about most. Open a file. Write a check. Wire it to pre-commit. Ship it.&lt;/p&gt;

&lt;p&gt;The right rule for the exercise is one with a clear violation pattern. &lt;em&gt;Always use the Result type for fallible operations&lt;/em&gt; is a fine candidate; you can grep for throw in the changed files and warn. &lt;em&gt;Match the existing code style&lt;/em&gt; is a poor candidate; the violation pattern is too diffuse for a small sensor to catch.&lt;/p&gt;

&lt;p&gt;When the sensor catches its first violation, two things happen. First, the rule becomes load-bearing: the agent now has to satisfy it instead of just acknowledging it. Second, you find out whether the rule was right. Sensors that fail a lot on changes that look reasonable in review are sensors enforcing rules that need to change. &lt;em&gt;The sensor is the feedback loop; the rule is the hypothesis.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Add a second one next month. By the end of the quarter your harness will feel different. Not because the rules are smarter, but because the rules can be checked.&lt;/p&gt;

&lt;p&gt;The rules are the part everyone writes. The sensors are the part that makes them true.&lt;/p&gt;

</description>
      <category>agenticai</category>
      <category>softwareengineering</category>
      <category>harnessengineering</category>
    </item>
    <item>
      <title>The Accidental Framework</title>
      <dc:creator>Ian Johnson</dc:creator>
      <pubDate>Fri, 19 Jun 2026 15:23:25 +0000</pubDate>
      <link>https://dev.to/tacoda/the-accidental-framework-40l5</link>
      <guid>https://dev.to/tacoda/the-accidental-framework-40l5</guid>
      <description>&lt;p&gt;The thing I built was a Go binary that copied markdown files into a repository. I called it an installer. The README called it an installer. The CLI verb was init. For several months in early 2026, that framing held. Then I sat down to plan the 1.0 release, started typing, and could not write the next paragraph without using the word &lt;em&gt;framework&lt;/em&gt;. Not in the marketing-blurb sense. In the load-bearing sense: a small runtime, a set of named extension points, conventional file layouts, a resolver that picked a winner among competing sources, a lockfile, a migration runner, per-backend adapters. The thing I had been calling an installer for four months had been doing framework work the whole time, and I just had not noticed.&lt;/p&gt;

&lt;p&gt;This post is about that moment of recognition and what came after. It is about the refactor that turned an installer-shaped framework into a framework-shaped framework. The most surprising part was not the rewrite. It was how few of the &lt;em&gt;concepts&lt;/em&gt; had to change. The names were already right. The directories needed to move. The runtime needed a physical line down the middle between framework code and content. But the abstractions I had been using (guides, corpus, sensors, actions, playbooks, adapters) held up under the new framing without revision. The work I had done naming things over the previous months turned out to be the work that made the refactor cheap.&lt;/p&gt;

&lt;p&gt;I want to walk through how that happened, what tipped me off, and the one wrong turn the refactor almost made before backing out of it. The story has a tidy ending; the lesson does not, and I will get to that.&lt;/p&gt;

&lt;h3&gt;
  
  
  The shape was there before the word
&lt;/h3&gt;

&lt;p&gt;Keystone started as keystone init — a binary you ran in a fresh repo and it laid down a harness/ directory full of markdown. Rules for the agent. A small lifecycle. A few sensors that ran your existing lint and test commands. Nothing especially framework-ish. The 0.1 README said &lt;em&gt;installer&lt;/em&gt; four times.&lt;/p&gt;

&lt;p&gt;Then I started adding things. Each addition felt like a feature, not an architectural shift. The commit log reads like a feature log:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;0.2: install-time options. Pick your agent. Add a target.&lt;/li&gt;
&lt;li&gt;0.3: restructured the harness into corpus, guides, sensors, flywheels. Named taxonomy for the first time.&lt;/li&gt;
&lt;li&gt;0.4: a kind taxonomy on guides and sensors. Different kinds, different load behavior.&lt;/li&gt;
&lt;li&gt;0.5: a migrate command for forward-compatible upgrades. The harness had a schema now, and the schema had a version.&lt;/li&gt;
&lt;li&gt;0.10: policy plugins. Org-level rules that projects pulled in.&lt;/li&gt;
&lt;li&gt;0.11: the policy cascade. Org → Team → Project. Strict and non-strict layers.&lt;/li&gt;
&lt;li&gt;0.12: playbooks. Ordered chains of actions.&lt;/li&gt;
&lt;li&gt;0.13: sensors as a tier-aware policy kind.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Read that list once and it looks like product growth. Read it twice and the shape gets harder to miss. By 0.13 there was a runtime with conventions, plugins, a cascade, a lockfile, migrations, and per-agent rendering. The README still said &lt;em&gt;installer&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The thing about a shape forming under a name is that the name is the last thing to change. Each individual commit looks like a small feature; the architectural mass piles up underneath and nobody is watching the mass. I was not, anyway. I was writing the next feature.&lt;/p&gt;

&lt;p&gt;When I opened a fresh document called PLAN-10.md and tried to summarize where things stood, the first sentence I wrote was: &lt;em&gt;Convert Keystone from a harness installer with org policy plugins into a harness framework.&lt;/em&gt; I stared at that for a while. The conversion was not a conversion. &lt;strong&gt;It was an admission.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What an installer was pretending not to be
&lt;/h3&gt;

&lt;p&gt;Here is what makes the installer label dishonest in retrospect. An installer drops files. It is done. The user takes it from there. The framework that Keystone had become did all of this on top:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It defined named extension points.&lt;/strong&gt; Guides, corpus, sensors, actions, playbooks, adapters per agent. Each one had a path convention, a frontmatter contract, and a load behavior. That is an API, not a directory listing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It resolved conflicts across sources.&lt;/strong&gt; The same /.md could exist in the project, in a team plugin, in an org plugin. Exactly one won. The resolution order mattered, and the ordering rules were stable across versions. That is a runtime, not a copy step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It tracked drift.&lt;/strong&gt; A lockfile pinned per-source SHAs and per-file hashes. The binary would notice when plugin files had been edited under it. That is integrity enforcement, not file-laying.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It migrated old layouts forward.&lt;/strong&gt; keystone migrate would walk an installed harness, apply numbered transforms, and bring it to the current schema. That is a schema migrator, with all the implications of having a schema.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It rendered the same harness into multiple targets.&lt;/strong&gt; Claude Code’s CLAUDE.md shape, Codex’s AGENTS.md, Cursor’s .cursor/rules/. The adapter layer translated one source into many. That is a code generator, not a paste-once-and-done.&lt;/li&gt;
&lt;/ul&gt;
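
&lt;p&gt;To make the drift-tracking bullet concrete, here is a small sketch of per-file hash checking. It is written in Python for brevity and is not Keystone’s implementation (Keystone is a Go binary): the function names, the use of SHA-256, and the in-memory dictionaries standing in for a lockfile and a directory are all assumptions made for the example.&lt;/p&gt;

```python
import hashlib


def file_hash(content):
    """Hex digest of a file's bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


def drifted(locked, current):
    """Compare recorded hashes with the files as they are now.

    locked:  dict of path to the hex digest recorded at install time.
    current: dict of path to the file's current bytes.
    Returns the paths that were edited or removed, sorted.
    """
    changed = []
    for path, digest in locked.items():
        content = current.get(path)
        if content is None or file_hash(content) != digest:
            changed.append(path)
    return sorted(changed)
```

&lt;p&gt;An installer never needs the recorded side of that comparison. Keeping it around is what turns a copy step into integrity enforcement.&lt;/p&gt;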

&lt;p&gt;Take any one of those in isolation and you can argue it was a feature an installer happened to include. Take them together and the argument falls over. Installers do not carry adapter layers. They do not have lockfiles. They do not have migration runners. They certainly do not have a cascade resolver with strict and non-strict overrides.&lt;/p&gt;

&lt;p&gt;The give-away, when I went looking, was the layout of the source tree. Go files at the repo root next to a harness/ directory full of markdown that the binary embedded. No physical boundary. A single Go module mixing the runtime with the content the runtime shipped. If you asked me to point at “the framework” and “the content” in the 0.x tree, I would have had to do it with words, not directories. &lt;em&gt;That is a tell.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The vocabulary did the heavy lifting
&lt;/h3&gt;

&lt;p&gt;Here is the part of the story that surprised me most. I was bracing for a rewrite — the kind where you sit down with a blank internal/framework/ directory and start naming concepts from scratch, then translate the old code into them, then translate again because the first names were wrong. That is the usual cost of recognizing a shape late.&lt;/p&gt;

&lt;p&gt;It did not happen. The abstractions held.&lt;/p&gt;

&lt;p&gt;When I sat down to write the ports-and-adapters layer of the 1.0 plan, I had a table to draft. One row per port. For each port, a name, a path convention, an activation rule. I wrote it expecting to discover gaps. Here is what I found instead.&lt;/p&gt;

&lt;p&gt;Every port already had a name. Every name was already in use. Every path was already conventional. &lt;em&gt;Guide&lt;/em&gt; meant a rule loaded on every turn at guides//.md. &lt;em&gt;Corpus&lt;/em&gt; meant reasoning loaded on demand at corpus//.md. &lt;em&gt;Sensor&lt;/em&gt; meant an automated check at sensors/.md. &lt;em&gt;Action&lt;/em&gt; meant a single unit of lifecycle work. &lt;em&gt;Playbook&lt;/em&gt; meant an ordered chain of actions. &lt;em&gt;Adapter&lt;/em&gt; meant the per-agent binding at adapters//…. The table I sat down to invent had been the truth of the code since half-way to 1.0. I was writing it down, not designing it.&lt;/p&gt;

&lt;p&gt;Writing it down still mattered. The contract was implicit until that table existed. New ports would have slipped in by accident the same way old ones had. But the &lt;em&gt;design&lt;/em&gt; work had already happened, one commit at a time, while I thought I was just shipping features.&lt;/p&gt;

&lt;p&gt;That is the angle worth chewing on. The names I had given things while building features carried more architectural weight than any single commit they appeared in. &lt;em&gt;Guide&lt;/em&gt; and &lt;em&gt;corpus&lt;/em&gt; drew the line between rules-loaded-every-turn and reasoning-loaded-on-demand. &lt;em&gt;Sensor&lt;/em&gt; drew the line between an automated check and an aspirational rule. &lt;em&gt;Action&lt;/em&gt; and &lt;em&gt;playbook&lt;/em&gt; drew the line between an atomic unit of work and an ordered chain of them. Every one of those lines was a load-bearing distinction in the runtime, and I had named them all before I knew they were load-bearing.&lt;/p&gt;

&lt;p&gt;The corollary is the part to hold onto: when the refactor came, &lt;strong&gt;I did not have to invent a single new concept&lt;/strong&gt;. I had to relocate the code that &lt;em&gt;implemented&lt;/em&gt; the concepts. The concepts themselves stayed put. Naming had front-loaded the design work. The refactor was a directory move on top of a vocabulary that was already correct.&lt;/p&gt;

&lt;p&gt;This is the underrated payoff of taking names seriously while building. If you have named the abstractions well in version 0.3, you can refactor the runtime in version 1.0 without renaming anything. If you have named them badly, version 1.0 starts with a six-month tax to fix the words before you can fix the code. The cheap version of a hard refactor is the one where the user-facing vocabulary is already correct and only the implementation moves.&lt;/p&gt;

&lt;p&gt;It also changes the social cost of the refactor. A user who learned &lt;em&gt;guide&lt;/em&gt; and &lt;em&gt;corpus&lt;/em&gt; in 0.3 still knows what those words mean in 1.0. The blog post they wrote about the harness in 0.5 is still accurate. The wiki page in their team’s Notion has not gone stale. Refactor-without-renames is the kind a user does not feel.&lt;/p&gt;

&lt;h3&gt;
  
  
  Writing it down made it real
&lt;/h3&gt;

&lt;p&gt;Once I admitted what was happening, the next move was to make the framing physical. Not “we say it is a framework now.” That is marketing. The framework had to &lt;em&gt;look&lt;/em&gt; like a framework when you opened the source tree, and behave like one when you read the code.&lt;/p&gt;

&lt;p&gt;The plan went through six phases. Each had a small handful of commits, none too clever, each one a small structural improvement. Order mattered, because some moves enabled others.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fzn1bd0kdy7phslii0puq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fzn1bd0kdy7phslii0puq.png" width="800" height="45"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The phased refactor, in order. Each arrow is a phase the next one depends on.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A few things about the order are worth pulling out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 0 was the cheapest and the most important.&lt;/strong&gt; Before any code moved, I wrote eight architecture decision records and one port contract per abstraction. Each ADR was a single page: context, decision, consequences, alternatives considered. Each port contract was a single page: path convention, required frontmatter, cascade behavior, an example, the command that scaffolds it. The total page count was around twenty. Writing them took two days. They turned out to be the most useful pages in the whole project, because every subsequent phase referenced them. When a phase started, I re-read the relevant ADR; when a phase finished, I re-read it again to check we had stayed inside the lines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1 was the physical line down the middle.&lt;/strong&gt; Every Go file under the repo root got moved into internal/framework/. The CLI entrypoint moved to cmd/keystone/. The template tree relocated to internal/framework/scaffold/templates/. After Phase 1, you could point at the framework in the directory tree without using words. That is a small thing that turns out to be a big thing. Two months later, when a contributor asks “is this framework behavior or content?”, the answer is a path, not a paragraph.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phases 2 through 5 were small structural improvements stacking on top of that line.&lt;/strong&gt; JSON-only config (Phase 2), because YAML had drifted across half a dozen schemas during 0.x. Vendored read-only plugins (Phase 3), because the plugin model had been “edit the markdown in harness/policies/” and that turned out to be the wrong default. Projects would silently diverge from upstream and nobody would notice. Conventions, generators, and a doctor command (Phase 4), because once you have named ports you can write generators that scaffold an adapter for each port. Per-port token budgets (Phase 5), because once you have named ports you can also count tokens per port and tell the user when one is bloated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 6 was the documentation phase.&lt;/strong&gt; An upgrade guide for 0.x users. A compatibility doc spelling out what 1.x promised to keep stable. Nothing structural changed in Phase 6. The point was to make a contract that the next year’s worth of changes had to respect.&lt;/p&gt;

&lt;p&gt;Each phase landed in a separate set of commits. Each commit was small enough to read in one sitting. None of them were a “big refactor commit.” That is the texture of a refactor that respects the work already there: many small surgical moves, each one preserving behavior, with the architecture emerging from the sum.&lt;/p&gt;

&lt;h3&gt;
  
  
  The plugin draft we threw away
&lt;/h3&gt;

&lt;p&gt;I want to talk about a wrong turn, because every refactor has one, and the ones that do not get talked about tend to be the ones that bite later.&lt;/p&gt;

&lt;p&gt;The first draft of the 1.0 plan made the built-in defaults (universal guides, lifecycle playbook, default sensors, per-agent adapters) into &lt;em&gt;first-class plugins shipped embedded in the binary&lt;/em&gt;. The plan read:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The universal engineering corpus/guides, the lifecycle actions, the task playbook, and the default sensors all become first-class policy plugins — same shape as user-installed plugins, loaded by the same engine.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The promise was symmetry. Built-ins and external policy would travel the same pipeline. One loader, one cascade, no special cases. It was clean. It was elegant. It also had a giant cost buried in the elegance: the moment defaults are loaded as plugins, &lt;em&gt;editing a default&lt;/em&gt; stops being “edit a markdown file in your repo” and becomes “fork the embedded plugin or override it from a higher layer.”&lt;/p&gt;

&lt;p&gt;That breaks the Rails-style ergonomics I wanted at the project layer. The whole point of conventions over configuration is that the conventional file is just sitting there in the repo, your repo, your git, ready to be edited like any other file. If the default lives inside the binary and only appears in the consumer’s repo as a &lt;em&gt;shadow&lt;/em&gt; that overrides it, you have turned a one-line edit into a four-step debugging session. &lt;em&gt;Where is this rule actually coming from? Is the override winning? Why is my edit not taking effect?&lt;/em&gt; I have seen that pattern in tools that load defaults from inside their distribution, and the answer is always the same: it is a constant low-level tax on every user, paid forever.&lt;/p&gt;

&lt;p&gt;I caught it because I sat with the plan for a day before starting any code. Two of the ADRs (number five, &lt;em&gt;Conventions, not plugins&lt;/em&gt;, and number two, &lt;em&gt;Framework / client boundary&lt;/em&gt;) ended up being the place where the wrong turn got walked back. The decision was: defaults are &lt;em&gt;scaffolded&lt;/em&gt; into the consumer’s harness// on keystone init, from embedded templates. From that moment on, defaults are project content. The user edits them as markdown files in their own git. There is no override mechanism for defaults, because there is nothing to override. There is just one file sitting on their disk.&lt;/p&gt;

&lt;p&gt;Plugins still exist. They do one job: share policy across projects. Read-only, vendored, hash-verified, drift-reset on the next run. They are not the mechanism for shipping defaults, and the mechanism for shipping defaults is not the mechanism for sharing policy. Two concerns, two mechanisms. The earlier symmetry was buying elegance with the user’s debugging time.&lt;/p&gt;
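
&lt;p&gt;A minimal Go sketch of what “hash-verified, drift-reset” could look like for a single vendored file. Everything in it is an assumption made for illustration: the function names, the choice of SHA-256, the file name, and the idea of keeping the pinned upstream bytes at hand are not taken from Keystone’s source.&lt;/p&gt;

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// digest returns the hex-encoded SHA-256 of data.
func digest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// resetIfDrifted reads the vendored file at path and compares its hash with
// the pinned one. On a mismatch it rewrites the file from upstream and
// reports true. Hypothetical helper, not Keystone's API.
func resetIfDrifted(path, pinned string, upstream []byte) (bool, error) {
	local, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	if digest(local) == pinned {
		return false, nil
	}
	if err := os.WriteFile(path, upstream, 0o644); err != nil {
		return false, err
	}
	return true, nil
}

// demo simulates a locally edited vendored file and runs the check twice.
func demo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "vendored")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	upstream := []byte("always test migrations against a real database\n")
	path := filepath.Join(dir, "policy.md") // hypothetical file name
	// Simulated drift: the local copy no longer matches upstream.
	if err := os.WriteFile(path, []byte("edited locally\n"), 0o644); err != nil {
		return false, false, err
	}
	first, err := resetIfDrifted(path, digest(upstream), upstream)
	if err != nil {
		return false, false, err
	}
	second, err := resetIfDrifted(path, digest(upstream), upstream)
	return first, second, err
}

func main() {
	first, second, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("first run reset:", first)
	fmt.Println("second run reset:", second)
}
```

&lt;p&gt;The first run finds the edit and restores upstream; the second finds nothing to do. That is the read-only contract in miniature: a local edit to shared policy does not survive the next run, which is exactly why defaults the user is meant to edit should not travel through this mechanism.&lt;/p&gt;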

&lt;p&gt;The lesson I want to draw from this is narrower than “avoid premature symmetry.” It is: when two mechanisms produce a surface that looks identical to the user, you should still ask whether the &lt;em&gt;editing model&lt;/em&gt; is identical. If the user edits both the same way, symmetry is paying its rent. If the user edits one of them and merely &lt;em&gt;reads&lt;/em&gt; the other, you have two concerns that happen to share a shape, and unifying them costs you the editing UX of the one the user actually edits.&lt;/p&gt;

&lt;p&gt;I had to throw away two days of plan-writing to walk that back. It was worth it. &lt;em&gt;The walkback is what made the 1.0 surface usable.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Code moved, concepts did not
&lt;/h3&gt;

&lt;p&gt;When 1.0 shipped, I went back through the changelog to count. The framework had moved every Go file into a new location. It had dropped YAML. It had rewritten the cascade resolver. It had added a vendored plugin model, a doctor command, a budget command, a port-level scaffold generator, and per-agent adapter regeneration. The repo layout looked nothing like 0.x.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The user-facing vocabulary had not changed.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A 0.x user who never read the 1.0 plan could open the new docs and find every word they already knew. &lt;em&gt;Guide&lt;/em&gt; still meant rules loaded every turn. &lt;em&gt;Corpus&lt;/em&gt; still meant reasoning loaded on demand. &lt;em&gt;Sensor&lt;/em&gt; still meant an automated check. &lt;em&gt;Action&lt;/em&gt; and &lt;em&gt;playbook&lt;/em&gt; still meant what they meant. The only new word at 1.0 was &lt;em&gt;port&lt;/em&gt;, and &lt;em&gt;port&lt;/em&gt; was a name for something the user already understood without the name: the named slot a piece of content lives in.&lt;/p&gt;

&lt;p&gt;This is the payoff I underestimated when I was naming things in 0.2 and 0.3. The cost of naming a thing well at the start is that you have to actually sit with it for a few minutes and ask whether the name says what the thing does. The benefit is that later, when you refactor the runtime, you do not have to renegotiate the contract with every user who learned the old word. &lt;em&gt;The user’s&lt;/em&gt; mental model does not change. Only the implementation does.&lt;/p&gt;

&lt;p&gt;A refactor that does not rename anything is the cheapest kind for a user. They keep their docs, their wiki, their onboarding training. They keep the words they say out loud when describing the product to a teammate. The internals can move freely as long as the names stay fixed.&lt;/p&gt;

&lt;h3&gt;
  
  
  How shapes form under names
&lt;/h3&gt;

&lt;p&gt;Step back from Keystone for a second. The pattern is broader than one project.&lt;/p&gt;

&lt;p&gt;Tools grow into frameworks. They almost always do, if they live long enough. The path looks like this: a small utility solves a small problem. It picks up a configuration file. The config file grows extension points. The extension points need naming conventions. The naming conventions need a resolver. The resolver needs ordering rules. The ordering rules need a way to opt out, then a way to opt back in. Suddenly there is a runtime, and the runtime has a contract with everything plugged into it, &lt;em&gt;and the contract is a framework whether you call it one or not&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The interesting question is when to notice. Too early and you are over-designing: naming abstractions before you have three concrete uses for them, building cascade rules for a single tier, writing port contracts for one port. Too late and the runtime has accumulated implicit contracts that nobody wrote down, and the cost of writing them down is paid in surprise regressions when you try to clean up.&lt;/p&gt;

&lt;p&gt;The way I think about it now: the moment you find yourself adding a &lt;em&gt;resolver&lt;/em&gt; (anything that picks a winner among multiple sources of the same concept) you are in framework territory. Lockfiles and migrations are also strong signals. Per-target rendering (the adapter layer) is a hard signal. Any one of those, taken alone, is not enough. Two of them together is. Three of them together and the question is not &lt;em&gt;if&lt;/em&gt; you are building a framework, it is whether the framework is going to be &lt;em&gt;honest&lt;/em&gt; about itself in its README.&lt;/p&gt;
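
&lt;p&gt;To make “resolver” concrete, here is a small Go sketch of one: several sources define the same concept, and a fixed precedence order picks the winner. The tier names (project, team, org) and the types are hypothetical and are not Keystone’s actual cascade.&lt;/p&gt;

```go
package main

import "fmt"

// Source is one candidate definition of a concept, such as a guide with an id.
type Source struct {
	Tier string // hypothetical tier name
	ID   string
	Body string
}

// precedence lists tiers from strongest to weakest. Illustrative only.
var precedence = []string{"project", "team", "org"}

// resolve returns the definition of id from the strongest tier that has one.
// The second result is false when no tier defines id.
func resolve(id string, sources []Source) (Source, bool) {
	for _, tier := range precedence {
		for _, s := range sources {
			if s.ID != id {
				continue
			}
			if s.Tier == tier {
				return s, true
			}
		}
	}
	return Source{}, false
}

// exampleSources defines the same guide at two tiers.
func exampleSources() []Source {
	return []Source{
		{Tier: "org", ID: "logging", Body: "use the structured logger"},
		{Tier: "project", ID: "logging", Body: "use the structured logger with request ids"},
	}
}

func main() {
	winner, ok := resolve("logging", exampleSources())
	fmt.Println(ok, winner.Tier, winner.Body)
}
```

&lt;p&gt;Twenty lines of code, and it already implies ordering rules, an opt-out story, and a debugging question (“which tier won?”). That implied contract is why a resolver counts as a signal.&lt;/p&gt;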

&lt;p&gt;If you only catch it at three, the way I did, you are not in trouble. You are in the spot where naming the shape costs you a few days of writing and a few weeks of refactoring, and the user-facing surface comes out untouched on the other side. If you catch it at six or seven signals deep, the cost is higher, because by then there are real users with real assumptions and the cleanup runs into compatibility costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Three things to do this week
&lt;/h3&gt;

&lt;p&gt;If you are reading this and recognizing your own tool in the description, here is what I would do this week, in order.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write the port table.&lt;/strong&gt; Open a new document. List every named extension point in the tool. For each one, write the path convention, the activation rule, and the shape of the file (or config) that goes there. Do not stop until the table is exhaustive. The table is the easiest possible draft of the framework contract. If it takes an hour, you were already a framework and your vocabulary was holding the design up; if it takes a week and you keep discovering anonymous concepts, you have found the work that needs naming first. Either way, this is the highest-value hour you can spend on the project this week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Draw the physical boundary.&lt;/strong&gt; Look at your source tree. Can you point at “framework code” and “content the framework happens to ship” using a path, or do you have to use words? If it is words, move directories until it is a path. This is the single most clarifying refactor for a maturing tool, and the tests will catch you if you break anything along the way. One day’s work for a small project, a week for a medium one. From the moment it lands, every conversation about “where does this go?” gets shorter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write ADRs for the decisions you have already made.&lt;/strong&gt; Not the future decisions, but the ones already locked in by the code. Three to ten ADRs of one page each. Context, decision, consequences, alternatives considered. Future you will thank present you the next time someone asks why X is the way it is. The act of writing them often surfaces a wrong turn you can still back out of cheaply, exactly like the plugin draft I walked back. Better to find that during ADR-writing than during a refactor PR.&lt;/p&gt;
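
&lt;p&gt;The port table from the first step does not need tooling, but a sketch in Go shows how little structure it takes. The rows, paths, and field names below are hypothetical placeholders, not Keystone’s real conventions; the only point is that an empty cell is something you can detect mechanically.&lt;/p&gt;

```go
package main

import "fmt"

// Port is one row of the port table: a named extension point and its contract.
type Port struct {
	Name       string
	Path       string // path convention
	Activation string // when the content is loaded or run
	Shape      string // shape of the file or config that goes there
}

// incomplete returns the names of rows that still have an empty column.
func incomplete(table []Port) []string {
	var missing []string
	for _, p := range table {
		if p.Path == "" || p.Activation == "" || p.Shape == "" {
			missing = append(missing, p.Name)
		}
	}
	return missing
}

// exampleTable is a made-up, partly filled table. All paths are assumptions.
func exampleTable() []Port {
	return []Port{
		{Name: "guide", Path: "harness/guides/*.md", Activation: "loaded every turn", Shape: "markdown with frontmatter"},
		{Name: "corpus", Path: "harness/corpus/*.md", Activation: "loaded on demand", Shape: "markdown with frontmatter"},
		{Name: "sensor", Activation: "runs as an automated check"},
	}
}

func main() {
	fmt.Println("rows still to fill in:", incomplete(exampleTable()))
}
```

&lt;p&gt;A row you cannot complete is the interesting output. It marks a concept the code implements but nobody has written down yet.&lt;/p&gt;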

&lt;p&gt;After those three, you will have a clear picture of what you have, where it lives, and why. The work that comes next is the work that actually changes things — the renaming, if any, the directory moves, the new scaffolders, the migration path for existing users. That work is real and it takes time. But you can scope it, plan it, and ship it in phases that do not break users, because the contract you wrote down in the first three steps is the contract every subsequent phase has to honor.&lt;/p&gt;

&lt;p&gt;The thing I keep coming back to from the Keystone refactor is how much of the work had been done already by past versions of me when I named &lt;em&gt;guide&lt;/em&gt; and &lt;em&gt;corpus&lt;/em&gt; and &lt;em&gt;sensor&lt;/em&gt; and &lt;em&gt;action&lt;/em&gt; in different commits weeks apart. They were not designing a framework. They were each adding one feature and giving it a name. The framework emerged from the sum of those names. The 1.0 refactor was the moment I looked up and noticed.&lt;/p&gt;

&lt;p&gt;If you have been adding features and naming them well, your framework is already there. It is just waiting for you to write it down.&lt;/p&gt;

&lt;h3&gt;
  
  
  Keystone
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.tacoda.dev/keystone/" rel="noopener noreferrer"&gt;Keytsone&lt;/a&gt; is now 2.1.1! It is open source and MIT-licensed. I am very open to feedback, so if you have any, please create a discussion or an issue on &lt;a href="https://github.com/tacoda/keystone" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. It has a &lt;em&gt;real&lt;/em&gt; chance of landing.&lt;/p&gt;

&lt;p&gt;A lot landed between the 1.0 cut and 2.1.1, and all of it sits on top of the framework/client split 1.0 introduced. The shape of the project didn’t change. The surfaces around it did.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.0&lt;/strong&gt; was the big one, shipped June 17. A new primitive taxonomy ran through the harness end to end. An in-binary &lt;strong&gt;MCP server&lt;/strong&gt; moved keystone onto the same wire its agents already speak, so there’s no separate process to babysit. A &lt;strong&gt;localhost dashboard&lt;/strong&gt; surfaced live state for the first time. An &lt;strong&gt;eval suite&lt;/strong&gt; came with it. The disk layout moved alongside all of that, and the upgrade was still one command: keystone migrate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.1.0&lt;/strong&gt; followed the next day. The old patches mechanism retired in favor of a versioned migrations subsystem with paired Up/Down transforms under migrations//, run with keystone migrate up | down | status. The plugin→policy rename finished across Go source, JSON schemas, and docs in the same release. Installs that hadn’t migrated yet started warning and continuing instead of breaking, so the upgrade path stays soft.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.1.1&lt;/strong&gt; reshaped the dashboard. It’s now an HTMX SPA with a single swap target, real back/forward navigation, and fragment responses keyed off HX-Request. The 14 single-purpose nav links collapsed into five sections: &lt;strong&gt;Observability, Harness, Sources, Flywheels, Quality.&lt;/strong&gt; SSE topic narrowing means a widget only refreshes when its own path changes. A per-session audit log lands at .keystone/state/audit/session--.jsonl, opened with O_CREATE|O_EXCL so nothing ever overwrites a prior run. &lt;em&gt;Cmd+K&lt;/em&gt; (&lt;em&gt;Ctrl+K&lt;/em&gt; on Windows and Linux) opens a search popover from any page.&lt;/p&gt;
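
&lt;p&gt;The O_CREATE|O_EXCL guarantee is easy to see in isolation. The Go sketch below uses only the standard library; the helper name and the log file name are made up for the example and do not reflect Keystone’s code or its real naming scheme.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// createExclusive creates path for writing and fails if it already exists.
func createExclusive(path string) (*os.File, error) {
	return os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
}

// demo creates a log file, then tries to create it again. It reports whether
// the second attempt was refused because the file already existed.
func demo() (bool, error) {
	dir, err := os.MkdirTemp("", "audit")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "session-example.jsonl") // hypothetical name
	f, err := createExclusive(path)
	if err != nil {
		return false, err
	}
	if _, err := f.WriteString(`{"event":"start"}` + "\n"); err != nil {
		f.Close()
		return false, err
	}
	if err := f.Close(); err != nil {
		return false, err
	}

	// The second open must fail instead of truncating the first run's log.
	_, err = createExclusive(path)
	return errors.Is(err, fs.ErrExist), nil
}

func main() {
	refused, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("second create refused:", refused)
}
```

&lt;p&gt;With O_EXCL the existence check and the create happen in one system call, so two sessions racing for the same name cannot both win. One gets the file and the other gets an error it can handle.&lt;/p&gt;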

&lt;p&gt;If you’re still on 1.x, the upgrade is the same as the install (except Brew, which has an upgrade subcommand). It carries you the whole way to 2.1.1. To bring a project harness up to date with the new core framework structure, run keystone migrate up. This will patch core files only and will never touch user-edited files.&lt;/p&gt;

</description>
      <category>softwareengineering</category>
      <category>harnessengineering</category>
      <category>agenticai</category>
      <category>agentharness</category>
    </item>
    <item>
      <title>The Harness Is Also Onboarding</title>
      <dc:creator>Ian Johnson</dc:creator>
      <pubDate>Thu, 18 Jun 2026 15:14:48 +0000</pubDate>
      <link>https://dev.to/tacoda/the-harness-is-also-onboarding-1lba</link>
      <guid>https://dev.to/tacoda/the-harness-is-also-onboarding-1lba</guid>
      <description>&lt;p&gt;A new engineer joins the team. Their first question (even if they don’t know it): &lt;em&gt;Where should I read first to understand how things work here?&lt;/em&gt; I started to answer with the usual list (the README, the architecture doc, the deployment runbook) and stopped halfway through. The most useful document in the repo, by a wide margin, was the project’s CLAUDE.md. I had written it for an agent, not a human. It still answered most questions better than the docs I had written for humans.&lt;/p&gt;

&lt;p&gt;After reading it, a new team member has sharper questions than the previous three hires asked in their first week. The questions are sharper because the harness has taught them the team’s conventions, naming, escalation patterns, and what not to do. The things that take months to absorb from code review and tribal memory have been compressed into a file they read on day two.&lt;/p&gt;

&lt;p&gt;The harness is also onboarding. We had not designed it that way, but it was working that way anyway, and once I noticed it I started writing it that way on purpose.&lt;/p&gt;

&lt;p&gt;This is the post about what changed when I noticed. Not a lot, in terms of effort. A lot, in terms of who the file was for and how the team treated it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The README is aspirational; the harness is forensic
&lt;/h3&gt;

&lt;p&gt;A harness captures the things the team has learned, mostly by paying for them. Every rule has an incident behind it. Every annotation explains a choice that was not obvious at the time. Every scope adjustment reflects a misread that the team made, noticed, and corrected. The file accumulates the way a scar accumulates; one event at a time, each event leaving a mark.&lt;/p&gt;

&lt;p&gt;A traditional onboarding doc is written before the team has made the mistakes. It says what the architecture is supposed to look like, what the deploy process is supposed to be, what the conventions are supposed to be. It is aspirational. The harness is forensic.&lt;/p&gt;

&lt;p&gt;The forensic document is more useful for a new hire because it tells them what actually happens when you do the wrong thing, not what you are supposed to do in theory. The new hire’s first PR is going to break three conventions. The harness tells them which three, and why, before they write the PR. The architecture doc does not. &lt;em&gt;It cannot&lt;/em&gt;, because the conventions did not exist when it was written.&lt;/p&gt;

&lt;p&gt;The accidental onboarding works because the harness is honest about the team’s reality in a way the docs built for humans tend not to be. A new hire does not need to know what the system &lt;em&gt;should&lt;/em&gt; be. They need to know what happens when they touch it.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the README leaves out
&lt;/h3&gt;

&lt;p&gt;The README has the project’s pitch and the commands to run. The architecture doc has the diagrams. The contributing guide has the PR template. None of these tell a new engineer what the team has learned to be careful about.&lt;/p&gt;

&lt;p&gt;The harness does. Here are four kinds of lines the harness tends to carry and the human docs tend not to:&lt;/p&gt;

&lt;p&gt;A rule that says &lt;em&gt;in the migrations directory, always test against a real database&lt;/em&gt;. That line tells the new hire this team has paid for a mocked-migration mistake. They learn the lesson without paying for it.&lt;/p&gt;

&lt;p&gt;A rule that says &lt;em&gt;errors in this service propagate as Result types, not exceptions&lt;/em&gt;. That line tells the new hire about a convention that pervades the codebase and would otherwise be invisible until their first review came back red.&lt;/p&gt;

&lt;p&gt;A rule that says &lt;em&gt;do not edit anything in&lt;/em&gt; &lt;em&gt;legacy/; coordinate in&lt;/em&gt; &lt;em&gt;#team-platform first&lt;/em&gt;. That line tells the new hire about a political-technical boundary that nobody put in a doc but everyone on the team knows.&lt;/p&gt;

&lt;p&gt;A rule that says &lt;em&gt;the test for this module hits a real Redis; bring it up with&lt;/em&gt; &lt;em&gt;docker-compose up redis&lt;/em&gt;. That line gives them the exact command they would otherwise spend twenty minutes searching for.&lt;/p&gt;

&lt;p&gt;The harness is dense. Every line is something the team chose to say. A new hire reads density they can use; they do not read fluff. That density is also why the harness wins against the README on day two. The README has the bird’s-eye view, but the new hire does not need a bird’s-eye view to ship their first PR. They need the things that will get them in trouble.&lt;/p&gt;

&lt;h3&gt;
  
  
  Writing for two readers at once
&lt;/h3&gt;

&lt;p&gt;Once you accept the dual role, a few writing choices change. None of them make the harness longer. They make it more useful per line.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Annotate the reason.&lt;/strong&gt; A rule that says &lt;em&gt;use the structured logger&lt;/em&gt; is enough for the agent. A rule that says &lt;em&gt;use the structured logger; we had a parsing outage in February because grep-style logs broke the alert pipeline&lt;/em&gt; tells the new hire the why. The agent does not need the why to apply the rule. The new hire does. The cost is one extra line. The benefit is a harness that explains itself the first time it is read.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Name the conventions, not just the rules.&lt;/strong&gt; The harness gets sharper for the agent if rules are imperative — &lt;em&gt;always X&lt;/em&gt;, &lt;em&gt;never Y&lt;/em&gt;. The harness gets sharper for the new hire if the conventions have names. &lt;em&gt;We call this the import-flow contract&lt;/em&gt; is more useful for a human than a list of three imperatives in the import directory, even if the latter is more usable for the agent. The trick is to do both: name the convention in the section header, then list the imperatives below.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make the boundaries explicit.&lt;/strong&gt; The harness should say which modules are owned by which sub-team, which directories require coordination, which areas are stable and which are in flux. The agent uses this to know where to be careful. The new hire uses it to know who to ask. The same line does both jobs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep it readable end to end.&lt;/strong&gt; A harness that is sectioned and skimmable is a harness a new hire can actually finish in an hour. A harness that is one unbroken list of rules is a harness the new hire gives up on at minute eleven. The agent does not care about pacing. The human reader does, and the human reader is the one who closes the file if it loses them.&lt;/p&gt;

&lt;p&gt;The dual-role harness is not longer than the agent-only harness. It is the same length, with two or three extra lines per major rule and a structure that supports both readers.&lt;/p&gt;

&lt;h3&gt;
  
  
  The onboarding flow that puts the harness first
&lt;/h3&gt;

&lt;p&gt;Our current first-week flow for a new engineer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day one: environment.&lt;/strong&gt; Clone the repo, install dependencies, run the test suite locally. This is the boring stuff and it has not changed. The harness does not help with this; a good make bootstrap does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day two: read the project CLAUDE.md end to end.&lt;/strong&gt; Read it slowly. Ask questions in the team channel as they come up. The questions become annotations the harness was missing. The whole day is for this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day three: read one subdirectory’s CLAUDE.md.&lt;/strong&gt; The one closest to the engineer’s first task. The path-scoped harness has the local conventions, which are denser and more specific than the project-level rules. By the end of the day the engineer has a working model of the area they are about to change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day four: a small PR, with the agent assisting.&lt;/strong&gt; The harness catches the failure modes the engineer would otherwise walk into. The human review is on the substance of the work, not on the conventions, because the harness handled the conventions before the PR was opened.&lt;/p&gt;

&lt;p&gt;The first week ends with the engineer having absorbed more of the team’s accumulated practice than they would have in two weeks under the previous flow. The cost of writing the harness for the dual role was small — three or four hours of editing, spread over a couple of evenings. The return compounds on every subsequent new hire.&lt;/p&gt;

&lt;h3&gt;
  
  
  A new hire’s questions are a free audit
&lt;/h3&gt;

&lt;p&gt;The thing I underestimated, before noticing this pattern, was how much signal a new hire’s questions are for the harness itself.&lt;/p&gt;

&lt;p&gt;A new hire asking &lt;em&gt;why does this module have a different error pattern than the others&lt;/em&gt; is telling me the harness does not name the convention clearly enough, or does not name it at all. A new hire confused about which directory a particular kind of code belongs in is telling me the harness does not encode the structural rule. A new hire surprised by a CI failure is telling me the pre-commit hooks are not communicating their constraints clearly. None of these are gaps in the new hire’s understanding. They are gaps in my document.&lt;/p&gt;

&lt;p&gt;I treat the new hire’s first month as a free audit of the harness. Their questions are the gaps. Their confusions are the rules that need to be sharper. I patch as the questions come in, which means by the time the next new hire arrives, the document is measurably better. The cycle has run six times now and the file has gotten better every single round.&lt;/p&gt;

&lt;p&gt;This is the second dividend of the dual role. The harness is onboarding. The onboarding feedback is harness improvement. The flywheel runs in both directions, and it runs for free. The new hire is going to ask the questions either way.&lt;/p&gt;

&lt;h3&gt;
  
  
  What this does to the architecture doc
&lt;/h3&gt;

&lt;p&gt;The first time I noticed all this, I asked whether the architecture doc was still earning its place. The answer turned out to be yes, but a narrower yes than before.&lt;/p&gt;

&lt;p&gt;The architecture doc has the diagrams, the high-level shape of the system, the names of the services and how they communicate. It is the map. The harness is the ground truth: the conventions, the constraints, the practices the map does not capture.&lt;/p&gt;

&lt;p&gt;The new hire reads the map first to know what they are looking at, then the ground truth to know how to walk on it. Both are still required. The harness did not replace the architecture doc; the harness made the architecture doc a smaller document, because the harness was carrying the load the architecture doc had been trying to carry and failing.&lt;/p&gt;

&lt;p&gt;The map is short. The ground truth is detailed. That division turns out to be the right one. The diagram I used to update every quarter to keep up with the conventions is now stable for a year at a time, because the conventions live in the file that gets edited every week anyway.&lt;/p&gt;

&lt;h3&gt;
  
  
  Read your harness like a new hire this week
&lt;/h3&gt;

&lt;p&gt;If you are running an AGENTS.md or CLAUDE.md or any equivalent file on a team project, three things to do this week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open the harness and read it cold.&lt;/strong&gt; Pretend you joined the team last Monday. You do not know what the codebase does. You do not know who owns what. Read every line. Note every sentence that assumes context a stranger would not have. That list is your patch list. Annotate the reasons. Name the conventions. Fix the unexplained jargon. Do this in one sitting if you can; the cold-reader perspective decays fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hand the patched harness to the next hire instead of writing them a new onboarding doc.&lt;/strong&gt; Sit with them while they read it. Write down every question they ask out loud. Their questions are your next patch list. Do not defend the existing document. Edit it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Move one section out of the architecture doc and into the harness.&lt;/strong&gt; Pick the section that has gone stale fastest — usually the conventions section, or the section listing which team owns which module. Put it where the file is actually updated. Delete the now-empty section in the architecture doc, or leave a one-line pointer. The architecture doc gets smaller. The harness gets one more thing it is honest about.&lt;/p&gt;

&lt;p&gt;After those three, the harness is doing on purpose what it had already started doing by accident. A document that is read by both the agent and the new hire has to earn its place twice. The rules that earn their place twice are the rules worth keeping. The rules that earn their place only for the agent are still load-bearing, but the dual-role audit makes them visible, which is the first step to making them better.&lt;/p&gt;

&lt;p&gt;The harness is doing more work than you think. Notice the second job. Then write it on purpose.&lt;/p&gt;

</description>
      <category>knowledgesharing</category>
      <category>agenticai</category>
      <category>harnessengineering</category>
      <category>employeeonboarding</category>
    </item>
    <item>
      <title>Keystone 2.0 — A Worthy 2.0</title>
      <dc:creator>Ian Johnson</dc:creator>
      <pubDate>Wed, 17 Jun 2026 19:41:27 +0000</pubDate>
      <link>https://dev.to/tacoda/keystone-20-a-worthy-20-2jhp</link>
      <guid>https://dev.to/tacoda/keystone-20-a-worthy-20-2jhp</guid>
      <description>&lt;p&gt;&lt;a href="https://www.tacoda.dev/keystone/" rel="noopener noreferrer"&gt;https://www.tacoda.dev/keystone/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A major version should mean something. If you ship 2.0 and a user opens the repo to find the same shape with a new number on it, you have wasted their attention. The shelf is already full of frameworks that did that. So when I started planning the next Keystone release, the question I kept asking was simple: what would make a developer say, out loud, “this is a different tool now”?&lt;/p&gt;

&lt;p&gt;Two answers held up. The first was &lt;em&gt;shape&lt;/em&gt;. Keystone 1.x had the right ideas (guides, corpus, sensors, actions, playbooks, adapters) but the taxonomy stopped just short of being a real framework vocabulary. A user could not look at the abstractions and immediately know where the next thing they needed lived. The second was &lt;em&gt;observability&lt;/em&gt;. The harness was healthy, but you had to take my word for it. There was no operator view. No dashboard. No way to see, at a glance, what the agent saw.&lt;/p&gt;

&lt;p&gt;2.0 fixes both.&lt;/p&gt;

&lt;h3&gt;
  
  
  A real framework vocabulary
&lt;/h3&gt;

&lt;p&gt;Keystone is the agent harness framework. That has been the pitch since 1.0. The Rails analogy is the right one. A working set of components, conventions, and slots, so the team building on top isn’t inventing the world from scratch each Monday morning.&lt;/p&gt;

&lt;p&gt;2.0 makes the vocabulary explicit. &lt;strong&gt;Twelve primitive kinds in two layers&lt;/strong&gt;: Framework (guide, corpus, sensor, action, playbook, eval, source) and Agent (rule, skill, subagent, command, persona). Every file in .keystone/harness/ carries canonical frontmatter declaring its kind, id, and per-kind required fields. The walker emits a single .keystone/INDEX.json that every tool reads first. You stop searching the directory tree for where a thing lives. You ask the index.&lt;/p&gt;

&lt;p&gt;This is the part that makes 2.0 feel like a different tool. The old harness/ layout worked. The new .keystone/harness/ layout &lt;em&gt;teaches&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  An operator view
&lt;/h3&gt;

&lt;p&gt;keystone web serve opens localhost:4773. The dashboard is fourteen pages of insight into the harness you just installed: home, metrics, insights, primitives, policies, investigator, sources, verify, prune, inbox, flywheels, evals, search, graph. Same binary. HTMX plus SSE; fsnotify on .keystone/ swaps fragments when files change. Open it in a browser, edit a guide in your editor, and watch the dashboard update without a refresh.&lt;/p&gt;

&lt;p&gt;A harness you can see is a harness you will actually maintain. Before 2.0, “is the harness healthy?” was a question you answered by reading files. Now it’s a tab you keep open.&lt;/p&gt;

&lt;h3&gt;
  
  
  The other things worth knowing
&lt;/h3&gt;

&lt;p&gt;A few more pieces ship in 2.0 that earn their own mention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A built-in MCP server.&lt;/strong&gt; keystone mcp install --agent claude-code writes .mcp.json in one shot. Twenty-one tools, four prompts, resources for index, primitives, sources, and skills. The same binary that authors the harness now dispatches it to the agent over the Model Context Protocol. One source of truth, one runtime contract.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evals with baseline diffs.&lt;/strong&gt; A new framework primitive lives at .keystone/harness/evals//EVAL.md. Static and sensor levels in 2.0; agent level reserved for 2.1. The interesting verb is keystone eval run --baseline  — it materializes the ref in a git worktree, runs both sides, and diffs the results into a regression report. Your harness gets its own test suite, and the suite knows what “last week” looked like.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A slash-command surface in 2.0.1.&lt;/strong&gt; Every Keystone action ships as a /keystone:* skill, projected into .claude/skills/ on init. /keystone:bootstrap, /keystone:learn, /keystone:synthesize, /keystone:audit, /keystone:spec, /keystone:orient, /keystone:review. The agent already knows how to call them. You just say the word.&lt;/p&gt;

&lt;p&gt;There’s more: keystone search over every primitive, keystone graph --format mermaid|dot for a relationship view, keystone watch for an fsnotify loop that re-indexes on save, the plugin → policy rename, and the retirement of --harness-root in favor of a fixed framework path. The website has the full tour.&lt;/p&gt;

&lt;h3&gt;
  
  
  Moving from 1.x
&lt;/h3&gt;

&lt;p&gt;One command. keystone migrate moves harness/ to .keystone/harness/, renames plugins/ to policies/, rewrites keystone.json to the v2 schema, regenerates the index, and refreshes host projections. It is idempotent. Pair it with keystone snapshot save --label pre-2.0 for insurance and the rollback is a single restore away.&lt;/p&gt;

&lt;h3&gt;
  
  
  Try it
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;tacoda/tap/keystone
keystone init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full walkthrough (every primitive kind, every CLI verb, the MCP tool surface, the dashboard tour) lives at &lt;a href="https://www.tacoda.dev/keystone/" rel="noopener noreferrer"&gt;tacoda.dev/keystone&lt;/a&gt;. Keystone is MIT-licensed and agent-agnostic; Claude Code, Cursor, Codex, Aider, Continue, Cline, Goose — whichever one you’ve already settled on, the same harness drives it.&lt;/p&gt;

&lt;p&gt;2.0 is the version where the framework shape and the operator view both showed up. 1.0 was the right idea; 2.0 is the one that earns the number.&lt;/p&gt;

</description>
      <category>harnessengineering</category>
      <category>softwareengineering</category>
      <category>agenticaiarchitectur</category>
      <category>aitools</category>
    </item>
    <item>
      <title>Versioning the Harness Itself</title>
      <dc:creator>Ian Johnson</dc:creator>
      <pubDate>Wed, 17 Jun 2026 14:36:56 +0000</pubDate>
      <link>https://dev.to/tacoda/versioning-the-harness-itself-4db7</link>
      <guid>https://dev.to/tacoda/versioning-the-harness-itself-4db7</guid>
      <description>&lt;p&gt;The harness is code. The team executes it the way the team executes code: in production, on real tasks, with stakes. Code gets versioning, change review, and migration discipline. The harness usually has none of that, and the absence costs us.&lt;/p&gt;

&lt;h3&gt;
  
  
  What versioning actually means
&lt;/h3&gt;

&lt;p&gt;I do not mean a version number in the file header. That would be theater. The harness lives in git already; commit hashes are the version.&lt;/p&gt;

&lt;p&gt;I mean the practice around changes: how a change is proposed, how it is reviewed, how it is communicated, how it is rolled out, and how it is rolled back. The discipline that surrounds any other change to shared code. The harness deserves the same discipline because it has the same blast radius. A rule change reaches every engineer the next time they start a session.&lt;/p&gt;

&lt;p&gt;The harness is infrastructure. Treating it casually because it is a markdown file is the category error.&lt;/p&gt;

&lt;h3&gt;
  
  
  The change review nobody runs
&lt;/h3&gt;

&lt;p&gt;Most teams I have talked to do not review harness changes the way they review code. The change lands in a PR, someone glances at the diff, the rule reads sensibly in isolation, the PR is approved. The conflict check, the scope check, the &lt;em&gt;who does this affect&lt;/em&gt; check — none of it happens.&lt;/p&gt;

&lt;p&gt;The minimum review I run now, on any non-trivial harness change:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does this conflict with an existing rule.&lt;/strong&gt; A grep against the harness for the same concepts, the same file paths, the same workflows. If the new rule overlaps with an old one, the change is to merge or replace, not to add.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is the scope right.&lt;/strong&gt; Project root, subdirectory, path-glob. A rule about the API code should not be in the project root if it never applies elsewhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What workflows is this change going to affect.&lt;/strong&gt; Not “what tasks.” What recurring workflows. A rule that says &lt;em&gt;always do X in the import flow&lt;/em&gt; affects every engineer who touches the import flow next week. The change is a coordination event.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is this reversible.&lt;/strong&gt; Some rule changes are: delete the rule, the agent goes back to whatever it did before. Some are not. The rule taught the team a habit, and the habit will persist after the rule is gone, possibly carrying the rule’s mistake with it.&lt;/p&gt;

&lt;p&gt;The review takes ten minutes. It catches most of the failures before they ship.&lt;/p&gt;
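&lt;p&gt;The conflict check is the easiest of the four to script. What follows is a minimal sketch, assuming the harness is a tree of markdown files. The function name find_overlaps, the demo directory layout, and the search terms are all hypothetical; none of them come from a real tool.&lt;/p&gt;

```python
import tempfile
from pathlib import Path


def find_overlaps(harness_root, terms):
    """Return (path, line_number, line) for every harness line mentioning a term."""
    lowered = [term.lower() for term in terms]
    hits = []
    for path in sorted(Path(harness_root).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            if any(term in line.lower() for term in lowered):
                hits.append((str(path), number, line.strip()))
    return hits


def demo():
    # Hypothetical harness: one existing rule that mentions the import flow.
    with tempfile.TemporaryDirectory() as root:
        rules = Path(root) / "api" / "CLAUDE.md"
        rules.parent.mkdir(parents=True)
        rules.write_text(
            "# API rules\nAlways validate rows in the import flow.\n",
            encoding="utf-8",
        )
        return find_overlaps(root, ["import flow", "migrations/"])


if __name__ == "__main__":
    for path, number, line in demo():
        print(f"{path}:{number}: {line}")
```

&lt;p&gt;Any hit is a prompt to read the existing rule before adding a new one. The script only finds candidates; deciding whether to merge or replace is still the reviewer’s job.&lt;/p&gt;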

&lt;h3&gt;
  
  
  When a rule change is actually a migration
&lt;/h3&gt;

&lt;p&gt;The harness sometimes needs more than a rule change. It needs a migration. The structure shifts; a section moves; the conventions for how rules are written change. Those changes break every contributor’s mental model at once, the same way a directory restructure breaks every IDE’s open tabs at once.&lt;/p&gt;

&lt;p&gt;The teams that handle this well treat it the same way they handle a code migration: announced ahead of time, deployed on a Monday, with a written migration note in the PR. The note says what changed, why, what to update if you have in-flight work, and who to ask if something looks broken.&lt;/p&gt;

&lt;p&gt;The teams that handle this poorly merge the migration on a Friday afternoon and answer DMs about it for the next week.&lt;/p&gt;

&lt;p&gt;The cost of the announcement is fifteen minutes of writing. The cost of not announcing is hours of fragmented confusion across the team, plus the slow erosion of trust in the harness that comes from being surprised by it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rolling out without breaking everyone
&lt;/h3&gt;

&lt;p&gt;The rollout discipline matches the change’s blast radius.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Additive change.&lt;/strong&gt; A new rule that does not conflict with anything, scoped to a path that already has a CLAUDE.md, applies to new work only. Merge it. The team notices the next time the agent acts in that path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Change to an active workflow.&lt;/strong&gt; Post in the team channel before merging, name the workflow, name the change, give people a chance to push back or flag in-flight work. Wait a day if anyone has a branch in that workflow. The cost is one day of delay; the benefit is that the rule lands without breaking active work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structural change.&lt;/strong&gt; Write a migration note, schedule it for a Monday morning, batch other harness changes into the same window if they overlap. The team gets one event to update their mental model rather than five spread over the week.&lt;/p&gt;

&lt;p&gt;A rule about commit messages is not the same blast radius as a rule about how the API layer handles errors. Treating them with the same care is overkill on one and insufficient on the other.&lt;/p&gt;

&lt;h3&gt;
  
  
  The rollback that earns its place
&lt;/h3&gt;

&lt;p&gt;The rollback is easy because the harness is in git. Revert the commit, the agent reads the old rule on the next session, the workflow recovers.&lt;/p&gt;

&lt;p&gt;The rollback that is not easy is the one where the rule taught the team a behavior that survived the rule. A rule that said &lt;em&gt;name files like X&lt;/em&gt; sat in the harness for two months. The team adopted the convention. The rule got rolled back when the convention turned out wrong for a different module. The agent now sees the convention everywhere in the codebase and treats it as fact, even though the harness no longer says to follow it.&lt;/p&gt;

&lt;p&gt;Some rules have persistent effects, and rolling them back is not enough. The rollback has to be paired with a correction. A new rule that explicitly contradicts the old one. A test that catches the old behavior. A note in the team channel that the convention is gone. Otherwise the agent and the team both keep doing what the rule used to say.&lt;/p&gt;

&lt;p&gt;The check before merging any non-trivial rule: if this rule turned out to be wrong, what would I have to undo? If the answer is &lt;em&gt;revert the commit&lt;/em&gt;, fine. If the answer is &lt;em&gt;revert the commit plus untrain the team plus catch the residual cases in review&lt;/em&gt;, the rule needs more thought before it lands.&lt;/p&gt;

&lt;h3&gt;
  
  
  The one-paragraph summary
&lt;/h3&gt;

&lt;p&gt;Every harness change merges with a one-paragraph summary in the team channel. Not a link to the PR. A summary the team can read in fifteen seconds.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Added a rule about testing the migration path against a real database, not a mock. Affects anyone working in the migrations directory. Prompted by the incident two weeks ago. Push back in this thread if it conflicts with something I missed.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The summary takes two minutes to write. It does three things: it tells the team something changed, it gives them the context to evaluate the change, and it gives them a place to push back if I got it wrong.&lt;/p&gt;

&lt;p&gt;Most pushback comes within a day. Most of it is useful. Sometimes it reveals the rule should be scoped down, or that it conflicts with an in-flight piece of work, or that the team has a better way of expressing the same intent. The discussion happens in public, and the next maintainer reading the commit history can see the reasoning.&lt;/p&gt;

&lt;h3&gt;
  
  
  A monthly cadence for big changes
&lt;/h3&gt;

&lt;p&gt;Small changes go in whenever. Big changes, like ones that touch the harness’s structure or change a rule the team has built habits around, go in on a regular cadence. We do it monthly.&lt;/p&gt;

&lt;p&gt;The cadence does two things. It batches the disruption: the team’s mental model updates once a month, not constantly. And it forces a backlog of harness changes to accumulate, which surfaces patterns. Three of the changes pending this month all point in the same direction; the actual change should be a single bigger move that subsumes them, not three small ones.&lt;/p&gt;

&lt;p&gt;The discipline is to resist landing big changes between cadences. The cost of waiting two weeks is small. The cost of breaking everyone’s flow on a random Tuesday is large.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ship your next harness change on a Monday
&lt;/h3&gt;

&lt;p&gt;The harness has the same blast radius as a build configuration or a CI pipeline. The teams that treat it that way get a harness that improves. The teams that do not get one that drifts.&lt;/p&gt;

&lt;p&gt;Three things to try this week:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Before merging the next non-trivial rule change, grep the harness for the same concepts and paths. If the new rule overlaps with an old one, merge or replace; do not add.
&lt;/li&gt;
&lt;li&gt;Write a one-paragraph summary and post it in the team channel before merging. Wait a day if the change affects an active workflow.
&lt;/li&gt;
&lt;li&gt;Put a monthly slot on the calendar for structural changes. Push big changes to that slot. Land small ones whenever.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>harnessengineering</category>
      <category>software</category>
      <category>architecture</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>The Daemon in the Middle</title>
      <dc:creator>Ian Johnson</dc:creator>
      <pubDate>Tue, 16 Jun 2026 15:02:01 +0000</pubDate>
      <link>https://dev.to/tacoda/the-daemon-in-the-middle-17hk</link>
      <guid>https://dev.to/tacoda/the-daemon-in-the-middle-17hk</guid>
      <description>&lt;p&gt;My laptop runs a Python process that does almost nothing. Every 60 seconds it asks Jira for the list of open tickets, looks at each one, and decides whether anything needs to happen. Most of the time, nothing does. The whole tick takes under a second. Then it sleeps.&lt;/p&gt;

&lt;p&gt;I recently wrote about intent-driven delivery. A customer signal turns into a contract on a Jira ticket. An agent reads the contract, does the work, posts evidence. A human approves the result. Six steps, three of them mechanical. The middle was supposed to run without a human babysitting it.&lt;/p&gt;

&lt;p&gt;The daemon is what runs the middle. It’s called iddd (Intent-Driven Delivery Daemon) and it does almost nothing. That’s the whole design.&lt;/p&gt;

&lt;p&gt;This post is about how the pieces fit. The daemon is not the agent. The daemon doesn’t generate code, doesn’t decide correctness, doesn’t read the contract. The daemon notices a ticket moved into a status that needs action and pokes the agent to come look. Everything interesting happens inside the spawned agent process. The daemon is plumbing.&lt;/p&gt;

&lt;p&gt;That split is the design. Most projects I’ve seen in this space conflate the orchestrator with the agent. They build something smart at the wrong layer. The agent ends up tangled with retry logic, the orchestrator tangled with prompt engineering, and neither is easy to change. Split them and the daemon stays mechanical and testable. The agent stays a Claude Code process spawned with a known command and known inputs.&lt;/p&gt;

&lt;h3&gt;
  
  
  The pieces
&lt;/h3&gt;

&lt;p&gt;The Python package is iddd/. A handful of files, none of them long:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reconciler.py — a loop on a 60-second interval.&lt;/li&gt;
&lt;li&gt;derive.py — one pure function that takes a Jira issue and returns the next action.&lt;/li&gt;
&lt;li&gt;queue.py — a SQLite table with a UNIQUE constraint on a dedupe key.&lt;/li&gt;
&lt;li&gt;worker.py — pulls jobs and spawns claude -p in a fresh repo clone.&lt;/li&gt;
&lt;li&gt;jira.py and github.py — adapters. REST and the gh CLI.&lt;/li&gt;
&lt;li&gt;cli.py — iddd run, iddd status, iddd drain, iddd tail.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The deps are small on purpose. apscheduler for the loop. requests for Jira REST. pyyaml for config. sqlite3 from the stdlib for the queue. Nothing else.&lt;/p&gt;

&lt;h3&gt;
  
  
  The shape of a tick
&lt;/h3&gt;

&lt;p&gt;A tick is what happens every 60 seconds. The reconciler runs a JQL query against Jira for all open issues in the project. For each one it calls derive_action(issue). That function returns either None (nothing to do, ticket is in a steady state) or a tuple like &lt;code&gt;("idd-dispatch", "PROJ-123")&lt;/code&gt; meaning “this ticket is ready to be picked up by the dispatch agent.” The reconciler then enqueues the action against the SQLite queue.&lt;/p&gt;

&lt;p&gt;That’s the whole brain. A polling loop, a pure function, a queue.&lt;/p&gt;

&lt;p&gt;The pure function is the part I’m proudest of. derive_action doesn’t talk to the network on its own. It takes the issue payload — status, labels, comments, description — and returns the next action by inspection. That makes it easy to test. I have a directory of fixture issues — backlog_no_intent.json, needs_details_with_draft.json, to_do_approved.json, in_progress_pr_open.json — and a pytest run that asserts the right action for each. No mocks, no fakes, no fragile mocking-the-mocking. The function reads inputs and returns a decision. When a bug shows up in the field, I capture the issue payload that triggered it, drop it in fixtures, write a failing test, and fix the function.&lt;/p&gt;
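&lt;p&gt;To make the shape concrete, here is a minimal sketch of what a function like derive_action could look like. It is not the real iddd code. The flattened payload (a dict with key, status, and a list of comment bodies, oldest first) and the rule that only the latest comment counts are assumptions made for illustration, and later statuses are left out.&lt;/p&gt;

```python
# Assumed mapping from a comment's first-line marker to the next action.
MARKER_ACTIONS = {"idd:feedback": "idd-draft", "idd:approve": "idd-approve"}


def first_line(body):
    """Return the first non-empty line of a comment body, stripped."""
    lines = body.strip().splitlines()
    return lines[0].strip() if lines else ""


def derive_action(issue):
    """Pure function: inspect the payload, return (action, issue_key) or None."""
    key = issue["key"]
    status = issue["status"]
    if status == "Backlog":
        return None  # awaiting human triage
    if status == "Needs Details":
        comments = issue.get("comments", [])
        if not comments:
            return None  # awaiting feedback
        # Assumption: only the most recent comment can trigger work, so an
        # agent notification posted after a draft leaves the ticket idle.
        action = MARKER_ACTIONS.get(first_line(comments[-1]))
        return (action, key) if action else None
    if status == "To Do":
        return ("idd-dispatch", key)
    return None  # statuses past To Do are out of scope for this sketch


if __name__ == "__main__":
    fixture = {
        "key": "PROJ-123",
        "status": "Needs Details",
        "comments": ["idd:feedback\nAnswers to the open questions."],
    }
    print(derive_action(fixture))
```

&lt;p&gt;Because the function touches nothing but its argument, each fixture file becomes one assertion: load the JSON, call the function, compare the result.&lt;/p&gt;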

&lt;p&gt;The queue is the second thing I’m proud of, and it’s the smaller idea. The table looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;dedupe_key&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;issue_key&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;state&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;attempts&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;dedupe_key is {issue_key}:{action} — for example, PROJ-123:idd-dispatch. enqueue() does INSERT OR IGNORE. If a tick fires while the previous tick’s enqueue is still pending in the worker, the second enqueue is a no-op. Dedupe is free, and it’s stored in the database, not in process memory. The daemon can restart mid-job and the queue is intact.&lt;/p&gt;
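&lt;p&gt;A minimal sketch of that enqueue path, using sqlite3 from the stdlib and the table above. The function signature, the JSON-encoded payload, and the boolean return value are assumptions for illustration; the real queue.py may differ.&lt;/p&gt;

```python
import json
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
  id INTEGER PRIMARY KEY,
  dedupe_key TEXT UNIQUE NOT NULL,
  issue_key TEXT NOT NULL,
  action TEXT NOT NULL,
  payload TEXT NOT NULL,
  state TEXT NOT NULL,
  attempts INTEGER NOT NULL DEFAULT 0,
  created_at INTEGER NOT NULL,
  updated_at INTEGER NOT NULL
)
"""


def enqueue(conn, issue_key, action, payload=None):
    """Insert a pending job; return True if a row was added, False if deduped."""
    now = int(time.time())
    cur = conn.execute(
        "INSERT OR IGNORE INTO jobs "
        "(dedupe_key, issue_key, action, payload, state, created_at, updated_at) "
        "VALUES (?, ?, ?, ?, 'pending', ?, ?)",
        (f"{issue_key}:{action}", issue_key, action,
         json.dumps(payload or {}), now, now),
    )
    conn.commit()
    # rowcount is 0 when the UNIQUE constraint caused the insert to be ignored.
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    print(enqueue(conn, "PROJ-123", "idd-dispatch"))  # first insert lands
    print(enqueue(conn, "PROJ-123", "idd-dispatch"))  # duplicate is ignored
```

&lt;p&gt;The uniqueness check lives in the table definition, so the caller never has to look before it inserts.&lt;/p&gt;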

&lt;p&gt;The state column is pending | running | done | failed. A worker grabs the lowest-id pending row whose issue_key isn’t already in the running set, transitions it to running, does the work, transitions it to done or failed. Per-issue serialization comes from that “not in the running set” clause: one ticket can only have one job in flight at a time, even though the pool runs multiple workers in parallel across different tickets.&lt;/p&gt;
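&lt;p&gt;The “not in the running set” clause fits in one query. The sketch below assumes a single process claims jobs and hands them to its worker pool; several claiming processes would need a stricter transaction. The claim_next name and the trimmed demo table are hypothetical.&lt;/p&gt;

```python
import sqlite3
import time

CLAIM_SQL = """
SELECT id, issue_key, action FROM jobs
WHERE state = 'pending'
  AND issue_key NOT IN (SELECT issue_key FROM jobs WHERE state = 'running')
ORDER BY id
LIMIT 1
"""


def claim_next(conn):
    """Mark the oldest eligible pending job as running and return it, else None."""
    row = conn.execute(CLAIM_SQL).fetchone()
    if row is None:
        return None
    with conn:  # commits the UPDATE on success, rolls back on error
        conn.execute(
            "UPDATE jobs SET state = 'running', updated_at = ? WHERE id = ?",
            (int(time.time()), row[0]),
        )
    return row


def make_demo_db():
    # Trimmed version of the jobs table, enough to show the claim order.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, dedupe_key TEXT UNIQUE NOT NULL, "
        "issue_key TEXT NOT NULL, action TEXT NOT NULL, state TEXT NOT NULL, "
        "updated_at INTEGER NOT NULL)"
    )
    jobs = [("PROJ-1", "idd-draft"), ("PROJ-1", "idd-approve"), ("PROJ-2", "idd-dispatch")]
    for issue_key, action in jobs:
        conn.execute(
            "INSERT INTO jobs (dedupe_key, issue_key, action, state, updated_at) "
            "VALUES (?, ?, ?, 'pending', 0)",
            (f"{issue_key}:{action}", issue_key, action),
        )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(claim_next(db))
    print(claim_next(db))
    print(claim_next(db))
```

&lt;p&gt;In the demo, the second claim skips the other PROJ-1 job because PROJ-1 already has one running, and takes the PROJ-2 job instead. The third claim finds nothing eligible.&lt;/p&gt;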

&lt;h3&gt;
  
  
  What the worker does
&lt;/h3&gt;

&lt;p&gt;Here’s the worker’s core:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="n"&gt;clone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/iddd-clone/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;issue_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
  &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GITHUB_REPO&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;env&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CODE_REPO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--output-format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream-json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--dangerously-skip-permissions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;issue_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cwd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IDDD_HOME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stream_to_log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rmtree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ignore_errors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A fresh gh repo clone per job. A claude -p headless invocation. A 30-minute timeout. Stream the agent’s JSON output into a log file. Tear the clone down when the job ends, succeeded or failed.&lt;/p&gt;

&lt;p&gt;The fresh clone is cheap and worth it. Workers running in parallel never see each other’s tree. A bad agent that mangles its working directory can’t poison the next job. Worktrees could be used for this, but I’m experimenting with on-demand clones to see how they fare.&lt;/p&gt;

&lt;p&gt;The slash command — /idd-dispatch, /idd-new, /idd-complete — is where the actual work lives. The daemon hands the agent a ticket key and a verb. The agent does everything else: reads the contract from the ticket, runs the harness, writes the code, opens the PR, posts evidence back to the ticket. The agent uses MCP servers internally for Jira and GitHub. The daemon does not.&lt;/p&gt;

&lt;p&gt;That last point matters. The daemon talks to Jira via REST and to GitHub via the gh CLI. It does not use MCP. The reason is testability: MCP servers expect a live claude process to mediate them, which means out-of-band callers, like a polling daemon, can’t reach them cleanly. REST and gh are stable, well-documented, and easy to fake in tests. MCP is for the agent. Everything outside the agent talks to the source systems directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  How a ticket flows through
&lt;/h3&gt;

&lt;p&gt;Here’s a full lap. A teammate files a signal: a one-line problem statement that lands as a Jira ticket in Backlog, tagged with the workflow label and carrying a YAML intent block at the top of the description. That YAML block holds the impact / urgency / clarity scores, the signal source, and a list of open questions the agent needs me to answer before it can write a falsifiable contract. The next tick fires. derive_action looks at the ticket: status Backlog. The decision is no-op — awaiting human triage. The daemon does nothing until I move the ticket.&lt;/p&gt;

&lt;p&gt;I drag it from Backlog to Needs Details. That’s the first human gate. I’m looking at the seed, deciding if it’s worth carrying forward. If yes, I move it. If no, I edit the description or trash the ticket.&lt;/p&gt;

&lt;p&gt;Once in Needs Details, the next tick sees the new status. There’s no idd:feedback comment yet, so derive_action returns no-op — awaiting feedback. I post a Jira comment whose first line is the marker idd:feedback, with answers to the seed’s open questions in the body. The next tick picks up the marker. The action is idd-draft. A worker grabs the job, clones the harness repo, spawns claude -p /idd-draft PROJ-123. The agent reads the description, folds my answers into Outcome, Signal, Scope in, Scope out, numbered Acceptance criteria, Release Gating, and Risk Surface, then rewrites the Jira description with the full contract. The Open Questions section is empty — every question is resolved. The agent posts a TP2 notification comment and exits. The ticket stays in Needs Details.&lt;/p&gt;

&lt;p&gt;I read the contract. If I want to push back — the Outcome is too broad, an Acceptance item isn’t falsifiable, the Scope in is wrong — I post another idd:feedback comment with the design notes I want folded in. The daemon picks that up too and the agent re-renders the contract on the next tick. I iterate until it reads right.&lt;/p&gt;

&lt;p&gt;When I’m happy, I approve. And here’s the part that’s still in flux.&lt;/p&gt;

&lt;p&gt;Approval is a comment, for now. The daemon watches for a comment whose first line is the marker idd:approve and treats that as the approval signal. The reconciler enqueues /idd-approve, the agent runs the full approval checklist (Outcome is one present-tense sentence, Acceptance has ≥2 falsifiable items, Scope out is non-empty, releasability holds, flag gating named if user-visible), and only then transitions the ticket Needs Details -&amp;gt; To Do.&lt;/p&gt;

&lt;p&gt;The reason it’s a comment is a question I haven’t answered: &lt;em&gt;who is the bot user?&lt;/em&gt; If the daemon transitions tickets through the real workflow, it needs Jira credentials with permission to do that, which means a service account, which means a license seat, which means a budget conversation. Using a comment-as-approval lets me prove the rest of the loop works while I sort out the identity story. It’s a temporary cheat, and I want to be honest that it’s a cheat — but the alternative was to block the whole project on a procurement question. I’d rather ship and assess. I’m sure comment-as-approval is not the right long-term answer, and I’m honest about that with myself every time I use it.&lt;/p&gt;

&lt;p&gt;Once approved, the success hook fires an immediate reconciler tick. The ticket is now in To Do. The action is idd-dispatch. The agent reads the contract, transitions Jira To Do -&amp;gt; In Progress, branches off main with a name like PROJ-123/expired-card-msg, and runs an inline chain of three skills. /idd-plan writes a per-acceptance test+code plan with no code. /idd-implement writes failing tests first, then the smallest change that turns them all green inside Scope in. /idd-review captures green test output (and screenshots if the change is user-visible), runs parallel review agents, runs pre-commit, commits with Conventional Commits, pushes, and opens the PR with the Test Plan inlined in the body. The agent then posts an idd:completion comment on the Jira card carrying acceptance evidence, transitions Jira In Progress -&amp;gt; In Review, and fires the TP3 notification with the PR URL.&lt;/p&gt;

&lt;p&gt;I do a code review on the PR. If I’m happy, I comment idd:accept. If I’m not, I comment idd:reject on the PR with the one-sentence reason, or idd:changes-requested with a revision brief. Reject closes the contract and files a new gap signal. Changes-requested triggers another executor pass on the same PR. On a clean merge the daemon watches the post-merge CI run on main via gh run watch --exit-status, and only on green transitions the ticket In Review -&amp;gt; In Staging. That’s the terminal state for the workflow. UAT happens there. The eventual Done transition is a separate human business decision the harness never makes. &lt;em&gt;Humans&lt;/em&gt; own that decision.&lt;/p&gt;

&lt;p&gt;Six phases. Three with my eyes on the work — promoting the seed, approving the contract, reviewing the PR. The other three the daemon and the agent handle between themselves.&lt;/p&gt;

&lt;h3&gt;
  
  
  The dashboard
&lt;/h3&gt;

&lt;p&gt;The daemon also serves a small local web dashboard on a loopback port, accessible only from the same machine. It’s a single-page app, refreshed live over Server-Sent Events so the view changes within a couple of seconds of any state movement. There’s no build step, no client framework — the daemon emits HTML fragments and htmx swaps them into place on each tick.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;Quick actions&lt;/strong&gt; panel at the top exposes two short forms. One starts a new signal from a one-line problem statement, the other adopts an existing ticket into the workflow. Submitting either form enqueues the corresponding command.&lt;/p&gt;

&lt;p&gt;Below that, a &lt;strong&gt;Pipeline&lt;/strong&gt; band shows every ticket the workflow currently owns, grouped into columns by status (Backlog, Needs Details, To Do, In Progress, In Review, In Staging, Done). Each ticket renders as a small card with its key, summary, assignee, and a colored action-state tag along the bottom edge: needs-feedback, approval-requested, interrupt-blocked, review-requested, uat-pending, agent-running, and so on. The tag color is mirrored as the card’s left-border accent so the eye can pick out “what needs me right now?” without reading the words. A red interrupt-blocked card stands out instantly. A green agent-running card recedes into the background.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;Ticket inspector&lt;/strong&gt; lets me type a Jira key and pull up the queue history and live state for that specific ticket.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Active&lt;/strong&gt;, &lt;strong&gt;Pending&lt;/strong&gt;, and &lt;strong&gt;Recent&lt;/strong&gt; tables list the daemon’s queue jobs. Each row shows job id, state badge, action (the slash command being run), attempts, age, and a one-line snippet of the most recent activity event captured from the agent’s stream. Clicking any row reveals a panel that shows full job metadata, the last error if any, a chronological activity timeline of every event the agent emitted, and three small buttons (retry, force-fail, drop) for unsticking jobs manually.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;Metrics&lt;/strong&gt; card sits below: running and pending counts, done and failed totals over the last 24 hours, throughput per hour, failure rate as a percentage, p50 and p95 duration. A per-action breakdown shows total / done / failed / average time for the top job types.&lt;/p&gt;
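&lt;p&gt;For readers who want the arithmetic behind a card like that, here is a small Python sketch. The Job shape and the summarize function are hypothetical, and the percentiles use the nearest-rank definition, which is one of several common choices and may not be the one the dashboard uses.&lt;/p&gt;

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical finished queue job."""
    state: str         # "done" or "failed"
    duration_s: float  # wall-clock seconds


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct percent
    of all samples at or below it."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(jobs, window_hours=24.0):
    """Summarize the jobs that finished inside the window."""
    finished = [j for j in jobs if j.state in ("done", "failed")]
    if not finished:
        return {}
    failed = sum(1 for j in finished if j.state == "failed")
    durations = sorted(j.duration_s for j in finished)
    return {
        "done": len(finished) - failed,
        "failed": failed,
        "throughput_per_hour": len(finished) / window_hours,
        "failure_rate_pct": 100 * failed / len(finished),
        "p50_s": nearest_rank(durations, 50),
        "p95_s": nearest_rank(durations, 95),
    }


sample = [Job("done", 10), Job("done", 20), Job("failed", 30), Job("done", 40)]
print(summarize(sample))
```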

&lt;p&gt;A live &lt;strong&gt;log tail&lt;/strong&gt; at the bottom streams the daemon’s structured output, with WARNING and ERROR lines color-highlighted.&lt;/p&gt;

&lt;p&gt;I don’t &lt;em&gt;need&lt;/em&gt; the dashboard to run the workflow. Everything happens through Jira comments and PR comments and Jira status transitions. But when I’m looking at five tickets at once and trying to remember which one is waiting on me versus which one is waiting on the agent, the colored tag along the bottom of each card is the difference between “context switch and read three tickets” and “look once and know.”&lt;/p&gt;

&lt;h3&gt;
  
  
  A note on going remote
&lt;/h3&gt;

&lt;p&gt;The next move is to put the daemon on a remote host so it doesn’t depend on my laptop being awake. The shape stays the same: reconciler, queue, worker, headless agent. The host moves to a remote machine. The queue can stay SQLite a while longer. The architecture grows a real webhook receiver so we don’t wait 60 seconds for the next tick.&lt;/p&gt;

&lt;p&gt;I’ll write about the remote build separately. The reason I’m flagging it is that the local design is deliberately shaped to make the remote move boring. The daemon already speaks REST and gh instead of relying on a local-only API. The queue is already durable. The worker already isolates jobs in clones. None of those choices were free on a single laptop, but they all pay off when the host stops being a laptop.&lt;/p&gt;

&lt;h3&gt;
  
  
  What surprised me
&lt;/h3&gt;

&lt;p&gt;A few things that didn’t go the way I expected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pure function is the whole game.&lt;/strong&gt; I thought the daemon would end up being mostly worker logic. It isn’t. The worker is fifteen lines. derive_action is where the bugs live, and the fixture-driven tests for it are what I iterate against. When I want to change how the workflow behaves, I add a fixture for the new case and update derive_action. The rest of the daemon doesn’t move.&lt;/p&gt;
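&lt;p&gt;To make that loop concrete, here is a minimal Python sketch of a derive_action-style function with its fixtures. It is a reconstruction from the behavior described in this post, not the real code: the Comment type, the rule that only the most recent comment decides, and the TP2 notification text are all assumptions, and the In Review markers are left out.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    """Hypothetical stand-in for a Jira comment."""
    body: str


def first_line(comment):
    lines = comment.body.strip().splitlines()
    return lines[0].strip() if lines else ""


def derive_action(status, comments):
    """Pure function: ticket status and comment history in, next action out."""
    # Assumption: only the newest comment matters, so the agent's own
    # notification comment naturally "consumes" an earlier marker.
    latest = first_line(comments[-1]) if comments else ""
    if status == "Needs Details":
        if latest == "idd:approve":
            return "idd-approve"
        if latest == "idd:feedback":
            return "idd-draft"
        return "no-op"
    if status == "To Do":
        return "idd-dispatch"
    return "no-op"


# Each fixture is (status, comments, expected action).
FIXTURES = [
    ("Needs Details", [], "no-op"),
    ("Needs Details", [Comment("idd:feedback\nReuse the existing banner.")], "idd-draft"),
    ("Needs Details",
     [Comment("idd:feedback\nReuse the existing banner."), Comment("TP2: contract ready")],
     "no-op"),
    ("Needs Details", [Comment("idd:approve")], "idd-approve"),
    ("To Do", [], "idd-dispatch"),
    ("In Staging", [], "no-op"),
]

for status, comments, expected in FIXTURES:
    assert derive_action(status, comments) == expected, (status, expected)
```

&lt;p&gt;Because the function takes plain data and returns a string, a new workflow case is one more tuple in the fixture table, with no Jira or queue needed to exercise it.&lt;/p&gt;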

&lt;p&gt;&lt;strong&gt;Comments make a fine approval mechanism even when they’re a cheat.&lt;/strong&gt; I expected the comment-as-approval pattern to feel hacky. It mostly doesn’t. Comments are scoped, threaded, timestamped, tied to a user — everything an approval needs. The reason I’m hesitating to make them the permanent answer is bot-user identity, not the UX. The UX is fine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQLite is the right size of database for this.&lt;/strong&gt; A single-file durable queue with no daemon to run and no schema migration to write is what a local automation wants. I’ll outgrow it when the daemon goes remote. I haven’t outgrown it yet.&lt;/p&gt;
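&lt;p&gt;A single-file queue of that kind can be sketched with nothing but Python’s standard sqlite3 module. The table layout and the function names below are assumptions made for illustration, not the daemon’s actual schema.&lt;/p&gt;

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    ticket   TEXT NOT NULL,
    action   TEXT NOT NULL,
    state    TEXT NOT NULL DEFAULT 'pending',
    attempts INTEGER NOT NULL DEFAULT 0
)
"""


def open_queue(path=":memory:"):
    # isolation_level=None gives autocommit mode, so the explicit
    # BEGIN IMMEDIATE in claim_next controls the transaction itself.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    return conn


def enqueue(conn, ticket, action):
    cur = conn.execute(
        "INSERT INTO jobs (ticket, action) VALUES (?, ?)", (ticket, action)
    )
    return cur.lastrowid


def claim_next(conn):
    """Move the oldest pending job to 'running' and return it, or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, ticket, action FROM jobs "
            "WHERE state = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET state = 'running', attempts = attempts + 1 "
                "WHERE id = ?",
                (row[0],),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


queue = open_queue()
enqueue(queue, "PROJ-123", "idd-draft")
print(claim_next(queue))
```

&lt;p&gt;Passing a file path instead of the in-memory default is what makes the queue survive a restart. BEGIN IMMEDIATE keeps two workers from claiming the same row.&lt;/p&gt;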

&lt;p&gt;&lt;strong&gt;The hardest part was deciding what &lt;em&gt;not&lt;/em&gt; to put in the daemon.&lt;/strong&gt; I kept catching myself adding features to the orchestrator instead of the agent. Every time, I had to talk myself out of it. The daemon stays dumb. The agent gets smart. The seam between them is the queue and the slash command name. If the seam blurs, the testability story falls apart.&lt;/p&gt;

</description>
      <category>softwareengineering</category>
      <category>harnessengineering</category>
      <category>agenticworkflow</category>
      <category>aiorchestration</category>
    </item>
    <item>
      <title>Intent-Driven Delivery</title>
      <dc:creator>Ian Johnson</dc:creator>
      <pubDate>Mon, 15 Jun 2026 14:00:36 +0000</pubDate>
      <link>https://dev.to/tacoda/intent-driven-delivery-3lh9</link>
      <guid>https://dev.to/tacoda/intent-driven-delivery-3lh9</guid>
      <description>&lt;p&gt;Two poles dominate how teams talk about AI in their workflow right now. On one end, vibe coding: open a chat, describe the change, accept what comes back, move on. On the other end, spec-driven development: write the formal spec, generate the code from it, treat the spec as the source of truth. Both have a real argument. Both miss the same thing.&lt;/p&gt;

&lt;p&gt;Vibe coding is fast and unreliable. The work that comes back looks fine and is wrong in ways nobody can name, because nobody wrote down what right was. Spec-driven development is reliable and slow. The spec gets so heavy that writing it becomes the project, and the code ends up locked to the spec rather than to the outcome.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Waterfall has the same problems whether the actor is a human or an agent.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intent-driven delivery&lt;/strong&gt; is the middle path. It is not a coding style. It is a delivery model: how a change moves through the whole system, from idea to production, in a way an agent can run and a human can trust. The unit of work is an &lt;strong&gt;intent contract&lt;/strong&gt;: a small, falsifiable spec for one outcome, with named scope, acceptance, and a feature flag. Humans approve at three gates inside the cycle and one more at production. Agents do the middle. The flow is defined by constraints, not vibes and not waterfalls.&lt;/p&gt;

&lt;p&gt;It answers four questions at once. What do agents do, if not vibe code or generate from a heavy spec? How does work get tracked when the ticket isn’t the unit of work anymore? Who decides a change is safe for users to see? And what’s the shape of a delivery pipeline when both the org and the project have to constrain it? One model covers all four because they’re the same problem in four costumes — the gap between human intent and shipped behavior, with an agent in between.&lt;/p&gt;

&lt;p&gt;I’ve been running this against a real project for a few months. The skeleton holds. One part is still soft and I’ll say where.&lt;/p&gt;

&lt;h3&gt;
  
  
  Not vibe coding, not spec-driven development
&lt;/h3&gt;

&lt;p&gt;The point isn’t which pole is right. They’re answering different questions. Vibe coding asks how &lt;em&gt;fast&lt;/em&gt; a change can ship if you trust the agent. Spec-driven development asks how &lt;em&gt;reliable&lt;/em&gt; a change can be if you write everything down up front.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Neither&lt;/em&gt; asks how a customer signal becomes a green deploy without burning out the team.&lt;/p&gt;

&lt;p&gt;Intent-driven delivery picks a small contract per outcome — &lt;em&gt;much smaller than a full spec&lt;/em&gt; — and then trusts the constraints around the contract to do the work the spec would have done in a heavier model. The harness catches what the contract didn’t say. The feature flag absorbs the rollout risk. The three touchpoints place humans where humans add value, and nowhere else.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is closer to agile than to either pole.&lt;/em&gt; Small batches, working software, falsifiable acceptance, continuous delivery, retros on what’s slowing the team down. None of that is new. What’s new is that the unit of work is structured enough for an agent to read it, and the flow is constrained enough for the agent to run the middle without a human babysitting. Agile assumed every step had a human in it.&lt;/p&gt;

&lt;p&gt;Intent-driven delivery keeps the cadence and the values, and lifts the human out of the steps where the human was a bottleneck instead of a contributor.&lt;/p&gt;

&lt;p&gt;That is also where it splits from spec-driven development. Spec-driven development wants the spec to drive the code generation end-to-end. Intent-driven delivery wants the contract to drive one outcome, then get out of the way. The next contract gets its own spec. The system never accumulates a giant document that has to stay in sync with the code.&lt;/p&gt;

&lt;h3&gt;
  
  
  A delivery model, not a coding style
&lt;/h3&gt;

&lt;p&gt;A coding style stops at the editor. A delivery model covers the whole pipe.&lt;/p&gt;

&lt;p&gt;In this one, the pipe is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A signal lands — support ticket, metric, incident, customer request.
&lt;/li&gt;
&lt;li&gt;A human turns the signal into a contract.
&lt;/li&gt;
&lt;li&gt;An agent reads the contract and does the work.
&lt;/li&gt;
&lt;li&gt;The agent posts evidence against the contract’s acceptance.
&lt;/li&gt;
&lt;li&gt;A human checks the evidence against the contract.
&lt;/li&gt;
&lt;li&gt;The change merges, CI runs on main, the flag stays off until the rollout is ready.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Six steps. Three of them involve a human — seed review before step 2, the contract approval at step 2, and the completion review at step 5. Three of them are mechanical. The model is the whole pipe, including the bits that are usually invisible — post-merge CI, the flag, the rollback story. If any of those is missing, the model leaks somewhere and the team eats the cost.&lt;/p&gt;

&lt;p&gt;The thing to keep in mind: well-defined constraints are what hold the pipe together. The contract constrains &lt;em&gt;what&lt;/em&gt; the agent may touch and what done looks like. The harness constrains &lt;em&gt;how&lt;/em&gt; the work gets done. The flag constrains &lt;em&gt;when&lt;/em&gt; users see it. The CI watch constrains &lt;em&gt;when&lt;/em&gt; the contract closes. Pull any one of those out and the middle stops being safe.&lt;/p&gt;

&lt;h3&gt;
  
  
  What this assumes
&lt;/h3&gt;

&lt;p&gt;This isn’t a stand-alone proposal. Three things have to be in place before the contract model earns its keep.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;An agentic flow.&lt;/strong&gt; One agent or a fleet. Babysitting a single tab or dispatching dozens. The shape doesn’t matter; the existence does. If the team is still writing every line by hand, the contract is overkill. The contract pays for itself when an agent is reading it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Continuous delivery.&lt;/strong&gt; Squash-merge, green main, deploy-on-merge. If a contract gets accepted on Tuesday and ships on Friday after a release-train meeting, the feedback loop is too long for the model to work. The contract presumes the merge is the deploy, with the feature flag keeping the change dark until rollout.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A project harness with guides and sensors.&lt;/strong&gt; Guides are the rules the agent reads — coding standards, patterns to prefer, files to leave alone. Sensors are the checks that fire when a rule is broken — linters, tests, type checks, custom inspections. Without guides, the agent invents conventions. Without sensors, nobody catches it. The harness is what makes the agent’s freedom inside a contract safe.&lt;/p&gt;

&lt;p&gt;Miss any of the three and the contract becomes paperwork. Have all three and the contract becomes the steering wheel.&lt;/p&gt;

&lt;h3&gt;
  
  
  The contract
&lt;/h3&gt;

&lt;p&gt;The replacement for the vague ticket is a contract with named sections. Not a wishlist. A spec the agent and the human can both check against.&lt;/p&gt;

&lt;p&gt;The sections that earn their place:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Outcome.&lt;/strong&gt; One sentence. The observable end state. “Users on an expired card see a clear renewal message instead of a generic error.” Not “fix the expired card issue.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Signal.&lt;/strong&gt; Where this came from. A support ticket, a metric, an incident, an internal request. Keeps the context attached so reviewers can sanity-check the framing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scope in.&lt;/strong&gt; The files, modules, or surfaces the work is allowed to touch. Concrete paths or named components. Not “the billing area.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scope out.&lt;/strong&gt; The places the work must not touch, even if they look related. This one is load-bearing. An empty Scope out is a draft defect — it means the author didn’t think about boundaries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Acceptance.&lt;/strong&gt; A numbered list of falsifiable conditions. Each one has to be the kind of thing a test or a command can decide. “Renewal message appears” is acceptable. “Improved error UX” is not. “Improved” is a draft defect. Block the contract until it’s rewritten.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Release gating.&lt;/strong&gt; Name the feature flag and its default. If the change is user-visible and there’s no flag, the contract can’t be approved. Default is off. Always.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Risk surface.&lt;/strong&gt; What this could plausibly break. Auth, billing, data integrity, third-party dependencies. The reviewer uses this to decide how hard to look.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open Questions.&lt;/strong&gt; Things the author couldn’t resolve before drafting. A contract with non-empty Open Questions cannot be approved. They have to be answered first, even if the answer is “we accept this risk.”&lt;/p&gt;

&lt;p&gt;Why each section is there: the contract has to be falsifiable end-to-end. If a section is empty, either the section doesn’t apply (rare) or the author is hiding a decision. The contract works because the gaps are visible.&lt;/p&gt;

&lt;p&gt;The shift in mindset: the ticket used to be a prompt. &lt;em&gt;Now it’s a test.&lt;/em&gt; If the work passes the contract, it’s done. If it doesn’t, it isn’t. The reviewer’s eyeballs are no longer the rubric.&lt;/p&gt;

&lt;h3&gt;
  
  
  Four phases, three touchpoints
&lt;/h3&gt;

&lt;p&gt;The cycle around the contract has four phases. Humans show up at three named gates. Nowhere else.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pxjxuzluo16xio72f26.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pxjxuzluo16xio72f26.png" width="800" height="94"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Three touchpoints, four phases. Notice the agent stretch in the middle — that’s the part that used to be hallway conversations.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The thing to notice in the diagram: between TP2 and TP3, no human is in the loop. If the agent gets stuck, it doesn’t ask Slack. It files a structured interrupt against the contract, with options and a recommendation, and stops. The human resolves the interrupt in writing, and the agent resumes. Interrupts attach to the contract and become part of the record.&lt;/p&gt;

&lt;p&gt;This matters. Most of what makes the old model expensive is the unstructured interruption: “quick question on the dashboard ticket?” pinged twelve times a day, no record kept, no decision attributed. Force the interrupt into a structured artifact and the cost of asking goes up enough that the agent learns to only ask when it matters, and the answer is preserved for the next contract. It can also serve as a source for agent post-mortems to fix failure modes.&lt;/p&gt;

&lt;p&gt;Three touchpoints, no more. Seed review at TP1 (is this worth doing?), contract approval at TP2 (is this spec good enough?), completion review at TP3 (did we get what we asked for?). Anyone who tries to insert a fourth gate in the middle is reinventing the status field.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybh6tbpp9fsaiclvbl07.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybh6tbpp9fsaiclvbl07.png" width="792" height="1420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A more granular look at the Signal to Completion flow.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Where the contract lives
&lt;/h3&gt;

&lt;p&gt;I made a wrong call here and want to flag it.&lt;/p&gt;

&lt;p&gt;First version: contracts lived as markdown files in a separate repo. Clean, version-controlled, diffable, lovely. Nobody read them. The issue tracker still had the old ticket, with a vague title and no acceptance, and the team kept working from the ticket because that’s where the search lives and that’s where the dashboards point. The contracts drifted from the tickets.&lt;/p&gt;

&lt;p&gt;The fix is to make the issue description &lt;strong&gt;be&lt;/strong&gt; the contract. The whole contract, in the description field, with a tagged YAML block for the structured parts. The tracker is where everyone already looks. The contract goes there.&lt;/p&gt;

&lt;p&gt;The intent block at the top of the issue:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;intent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IC-PROJ-42&lt;/span&gt;
 &lt;span class="na"&gt;outcome&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Users on an expired card see a clear renewal message instead of a generic error.&lt;/span&gt;
 &lt;span class="na"&gt;flag&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;FEATURE_EXPIRED_CARD_MSG&lt;/span&gt;
 &lt;span class="na"&gt;flag_default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;off&lt;/span&gt;
 &lt;span class="na"&gt;scope_in&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;app/Billing/ExpiredCardNotice.php&lt;/span&gt;
   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;resources/views/billing/notice.blade.php&lt;/span&gt;
 &lt;span class="na"&gt;scope_out&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;app/Billing/PaymentGateway.php&lt;/span&gt;
 &lt;span class="na"&gt;acceptance&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;When the card is expired, the renewal message renders with the flag on.&lt;/span&gt;
   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;When the flag is off, behavior is identical to current main.&lt;/span&gt;
   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;No change to the payment gateway code path.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Completion and interrupts go in as comments on the same issue, with a leading marker so the harness can find them programmatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!-- intent:completion --&amp;gt;
Acceptance:
1. Pass - renewal message renders under flag on. See test BillingNoticeTest::testExpiredCardWithFlag.
2. Pass - flag off path unchanged. See test ExpiredCardLegacyPathTest.
3. Pass - payment gateway file untouched in diff.
Interrupts: none.
Change summary: added ExpiredCardNotice service and a Blade partial behind FEATURE_EXPIRED_CARD_MSG.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The trick is the marker. &amp;lt;!-- intent:completion --&amp;gt; and &amp;lt;!-- intent:interrupt --&amp;gt; are invisible in the tracker UI but addressable by the harness. The humans see a normal comment. The agent sees a structured artifact.&lt;/p&gt;

&lt;p&gt;This is the part that answers “what comes after ticket trackers.” The tracker doesn’t go away — that’s where the org’s gravity is, where search lives, where the internal stakeholders look. What changes is what lives inside it. The vague ticket is gone; the structured contract takes its place. Status isn’t typed by humans dragging cards — it’s derived from artifacts on the contract. A contract is “in progress” because a branch exists and no completion comment has landed. It’s “verified” because the completion comment is there and post-merge CI went green. It’s “rolled out” because the rollout marker is in the comments. The kanban board still renders. It just reads truth from the contract instead of asking a human what they did yesterday.&lt;/p&gt;
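&lt;p&gt;Derived status is easy to picture as a pure function over the artifacts. The Python sketch below covers only the three states named above. The Artifacts type and its field names are hypothetical, and any combination the text does not define is reported as unspecified rather than guessed.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifacts:
    """Hypothetical snapshot of what exists for one contract."""
    branch_exists: bool = False
    completion_comment: bool = False
    post_merge_ci_green: bool = False
    rollout_marker: bool = False


def derive_status(artifacts):
    """Compute the board status from evidence, checking the latest stage first."""
    if artifacts.rollout_marker:
        return "rolled out"
    if artifacts.completion_comment and artifacts.post_merge_ci_green:
        return "verified"
    if artifacts.branch_exists and not artifacts.completion_comment:
        return "in progress"
    # Combinations the post does not describe, such as a completion
    # comment with CI still running, are left undefined in this sketch.
    return "unspecified"


print(derive_status(Artifacts(branch_exists=True)))
```

&lt;p&gt;Note that “verified” does not require the branch to exist, since the squash-merge deletes it.&lt;/p&gt;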

&lt;h3&gt;
  
  
  Ports, adapters, and not getting locked in
&lt;/h3&gt;

&lt;p&gt;The contract is a data shape. The phases are a workflow. Neither cares which vendor you’re using.&lt;/p&gt;

&lt;p&gt;Four ports do all the I/O:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;signal-source: where signals come from and where contracts live. Issue tracker.&lt;/li&gt;
&lt;li&gt;contract-store: where the markdown body of the contract is rendered and parsed. Often the same as signal-source.&lt;/li&gt;
&lt;li&gt;code-host: where the branches and PRs live.&lt;/li&gt;
&lt;li&gt;notification: where the agent posts when a touchpoint is ready for a human.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The install I run uses one major tracker and one major code host. The adapters are about 200 lines each. If I had to move to a different tracker or a different code host tomorrow, I’d write new adapters and the commands would not change. The contract format wouldn’t change. The phases wouldn’t change.&lt;/p&gt;
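&lt;p&gt;One way to express ports like these in Python is with typing.Protocol, so a command depends on a shape and not on a vendor. Everything named below (the method names, the toy adapters, publish_draft) is invented for illustration. The post names the four ports but not their interfaces.&lt;/p&gt;

```python
from typing import Protocol


class SignalSource(Protocol):
    def fetch_signal(self, key: str) -> str: ...


class ContractStore(Protocol):
    def read_contract(self, key: str) -> str: ...
    def write_contract(self, key: str, markdown: str) -> None: ...


class CodeHost(Protocol):
    def open_pr(self, branch: str, title: str, body: str) -> str: ...


class Notification(Protocol):
    def notify(self, touchpoint: str, message: str) -> None: ...


class InMemoryTracker:
    """Toy adapter that satisfies ContractStore without any real tracker."""

    def __init__(self):
        self.descriptions = {}

    def read_contract(self, key):
        return self.descriptions.get(key, "")

    def write_contract(self, key, markdown):
        self.descriptions[key] = markdown


class ListNotifier:
    """Toy adapter that records notifications instead of sending them."""

    def __init__(self):
        self.sent = []

    def notify(self, touchpoint, message):
        self.sent.append((touchpoint, message))


def publish_draft(store: ContractStore, notifier: Notification,
                  key: str, markdown: str) -> None:
    """A command written against ports only; swapping adapters leaves it unchanged."""
    store.write_contract(key, markdown)
    notifier.notify("TP2", f"{key}: contract ready for approval")


tracker, notifier = InMemoryTracker(), ListNotifier()
publish_draft(tracker, notifier, "PROJ-42", "Outcome: renewal message shown.")
print(notifier.sent)
```

&lt;p&gt;Protocols use structural typing, so an adapter never has to import or inherit from the port definition. That keeps each adapter a small, independent file.&lt;/p&gt;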

&lt;p&gt;This is the part that should make people relax about the proposal. It’s not “switch to this product.” It’s a shape and a set of touchpoints. You pick the providers. The shape stays.&lt;/p&gt;

&lt;p&gt;You can overlay onto the services and tools that &lt;em&gt;you already use.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you decide to use a completely different way to manage issue information, you can plug that in. Instead of backing that with Jira or Linear, you could do it with Notion or Markdown files or an SQLite database. If you want it, just write an adapter for it.&lt;/p&gt;

&lt;h3&gt;
  
  
  What runs through the code host
&lt;/h3&gt;

&lt;p&gt;The code-host adapter has more moving parts than the others, because the agent does most of its work there. The conventions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Branch name intent/IC-PROJ-42. The branch is the contract.&lt;/li&gt;
&lt;li&gt;One PR per contract. No stacked PRs across contracts. Every PR branches from main.&lt;/li&gt;
&lt;li&gt;PR label intent-driven. So the team can filter agent work from human work.&lt;/li&gt;
&lt;li&gt;No GitHub assignee. The contract is the owner; the assignee field invites the old model back in. &lt;em&gt;Note: this is a temporary constraint until I have fully explored assignee and approval flows.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The verify step has a sharp edge. Before doing anything else, it asks the code host for the PR’s mergeable state. If it’s CONFLICTING, the agent attempts an auto-rebase onto origin/main and pushes the result with --force-with-lease. If the rebase is clean, work continues. If there are file-level conflicts, the agent stops and files an interrupt. Humans decide how to resolve, not the agent.&lt;/p&gt;
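&lt;p&gt;A rough Python sketch of that check follows. It assumes the gh CLI’s mergeable JSON field on pr view and plain git for the rebase, and it assumes --force-with-lease belongs to the push that follows the rebase, since that is where git accepts the flag. The function names and the injectable run parameter are illustrative, not the harness’s code.&lt;/p&gt;

```python
import json
import subprocess


def mergeable_state(pr_url):
    """Ask GitHub for the PR's mergeable state through the gh CLI."""
    result = subprocess.run(
        ["gh", "pr", "view", pr_url, "--json", "mergeable"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["mergeable"]


def rebase_commands(branch):
    """The git steps for an auto-rebase, in order."""
    return [
        ["git", "fetch", "origin", "main"],
        ["git", "rebase", "origin/main"],
        ["git", "push", "--force-with-lease", "origin", branch],
    ]


def try_auto_rebase(branch, run=subprocess.run):
    """Return True when the rebase is clean and pushed.

    False means the caller should stop and file an interrupt.
    """
    for cmd in rebase_commands(branch):
        if run(cmd).returncode != 0:
            if cmd[1] == "rebase":
                # Leave the working tree clean for the human who picks this up.
                run(["git", "rebase", "--abort"])
            return False
    return True
```

&lt;p&gt;A caller would invoke try_auto_rebase only when mergeable_state reports CONFLICTING. Taking run as a parameter lets the decision logic be tested with a fake runner, without touching a real repository.&lt;/p&gt;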

&lt;p&gt;When acceptance is verified, the merge is one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh &lt;span class="nb"&gt;pr &lt;/span&gt;merge https://github.com/org/some-app/pull/123 &lt;span class="nt"&gt;--squash&lt;/span&gt; &lt;span class="nt"&gt;--delete-branch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Squash-merge keeps main linear and one-commit-per-contract. The branch is gone the moment it lands.&lt;/p&gt;

&lt;p&gt;Then comes the part that I’ve seen skipped everywhere else and which matters more than the merge itself. Post-merge CI is watched:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh run watch &amp;lt;run-id&amp;gt; &lt;span class="nt"&gt;--exit-status&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The issue tracker does not advance the contract to “verified” until CI returns green on main. If main goes red after the merge, the contract is still open and the agent owns the cleanup. This closes the loophole where the PR is “merged” and the team finds out three hours later that main is broken.&lt;/p&gt;

&lt;h3&gt;
  
  
  The production gate
&lt;/h3&gt;

&lt;p&gt;The merge isn’t the release. That’s the whole point of the flag, and it’s where the fourth human gate lives. The merge ships dark — code on main, flag off, no user impact. &lt;em&gt;The rollout is a separate decision on a separate cadence.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The three in-cycle touchpoints decide whether the change is right. The production gate decides whether the change is ready for users. Different question, different rubric, often a different reviewer. Engineering can ship a contract that’s technically correct on a Tuesday and still wait until Thursday to flip the flag, because support is staffed differently or a marketing email goes out that morning. That decision doesn’t belong in the contract review.&lt;/p&gt;

&lt;p&gt;In practice: the contract closes when post-merge CI on main is green and the flag is still off. A named human — product owner, on-call, whoever the team has agreed on — flips the flag when deployment health, support readiness, and customer comms line up. If something goes sideways at rollout, the fix is to flip the flag back, not to revert the merge. The code stays on main. The decision being rolled back is the rollout, not the change.&lt;/p&gt;

&lt;p&gt;The flip is recorded on the contract with the same marker pattern as completion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!-- intent:rollout --&amp;gt;
Flag FEATURE_EXPIRED_CARD_MSG flipped on at 2026-06-12 14:02 UTC by @sara.
Cohort: 100%. No rollback as of 24h post-flip.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That marker is what closes the contract for good. Until it lands, the change exists but doesn’t ship. This is the gate that prevents agent-driven middles from quietly putting things in front of customers.&lt;/p&gt;

&lt;p&gt;I chose an explicit gate at production because of how many horror stories involve an agent misusing production access. Our solution? &lt;em&gt;Never&lt;/em&gt; give the agent production access. Production releases are human-triggered business decisions, just as in the Continuous Delivery model.&lt;/p&gt;

&lt;h3&gt;
  
  
  The honest part: bot identity
&lt;/h3&gt;

&lt;p&gt;Here’s where the design is still soft.&lt;/p&gt;

&lt;p&gt;GitHub blocks the PR author from approving their own PR, and from being requested as its reviewer. If you try to request that review, the API returns:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Review cannot be requested from pull request author.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For a single-identity agent setup — where the agent opens the PR under the same user account as the human reviewer — this is a wall. You can’t have a separate human approve, because there’s no separate identity.&lt;/p&gt;

&lt;p&gt;The clean fix is to give the agent its own bot identity. The bot opens the PR. The human reviews and approves it. Two GitHub users, no API conflict, normal review flow. The harness I run has the bot config commented out as a follow-up. It works in single-identity mode for solo projects and breaks the moment another reviewer is involved.&lt;/p&gt;

&lt;p&gt;I’m calling this out because the rest of the design is tight and this part isn’t. The fix is operational, not architectural — register a bot, give it write access to the repo, point the code-host adapter at its token. A day of work. I haven’t done it yet. If you build this for a team, do it on day one.&lt;/p&gt;

&lt;p&gt;This part also does not &lt;em&gt;necessarily&lt;/em&gt; need to be automated. It could, in fact, be a human PR review, and some organizations may want exactly that. We chose not to because (a) our harness is mature, and (b) we practice continuous delivery, not continuous deployment, so merging into main triggers the next human step: user acceptance testing in a staging environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  What changes for the team
&lt;/h3&gt;

&lt;p&gt;The rituals shift in ways that are mostly good.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stand-ups become read-throughs.&lt;/strong&gt; Instead of “what did you do yesterday,” you walk the contracts in flight. Each one has a state, a current touchpoint, and an open interrupt or not. The status comes from the contract, not the human’s memory. Meetings shorten.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Estimation sharpens.&lt;/strong&gt; A contract has falsifiable acceptance. Estimation against falsifiable acceptance is meaningfully easier than estimation against “improve performance.” The team gets better at sizing because the targets are sharper.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TP3 review becomes real.&lt;/strong&gt; When the work comes back, the reviewer reads the acceptance, reads the completion comment, and checks each numbered item. The review takes ten or fifteen minutes for most contracts because the rubric is already written. The reviewer isn’t deciding what done means; they’re checking whether done was reached. &lt;em&gt;This also opens an opportunity for exploratory testing, which has a high product value and customer impact.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Status becomes derived.&lt;/strong&gt; No more “drag the card.” The contract’s state is computed from artifacts — does a completion comment exist, did CI go green, are there open interrupts. The Kanban board still renders, but nobody curates it. It reads the truth from the contracts.&lt;/p&gt;

&lt;p&gt;The thing that doesn’t change: the human is still the source of intent. The contract is what the human wants. The phases protect the human from being interrupted constantly. &lt;em&gt;They don’t replace the human.&lt;/em&gt;&lt;/p&gt;
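
&lt;p&gt;To make “status becomes derived” concrete, here is a minimal sketch of computing a contract’s state from its artifacts. The field names and the four status labels are hypothetical, chosen for illustration; they are not the names my harness uses.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractArtifacts:
    """Observable facts about one contract. Field names are hypothetical."""
    has_completion_comment: bool
    ci_green: bool
    open_interrupts: int


def derive_status(artifacts: ContractArtifacts) -> str:
    """Compute a board status from artifacts instead of a dragged card."""
    if artifacts.open_interrupts > 0:
        return "blocked"            # a human decision is pending
    if artifacts.has_completion_comment and artifacts.ci_green:
        return "ready-for-review"   # the completion review can start
    if artifacts.has_completion_comment:
        return "failing"            # the agent says done, CI disagrees
    return "in-progress"


print(derive_status(ContractArtifacts(True, True, 0)))  # ready-for-review
```

&lt;p&gt;The board renders whatever this function returns. Nobody edits the status by hand, so it cannot drift from the artifacts.&lt;/p&gt;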

&lt;h3&gt;
  
  
  Org norms and project specifics
&lt;/h3&gt;

&lt;p&gt;A delivery pipeline has two halves that usually get conflated. The “what” — organizational norms, product strategy, the standards the company holds itself to. The “how” — project conventions, code patterns, the sensors that catch the agent doing the wrong thing in a particular codebase. Most teams encode the “how” in their repos (linters, tests, docs) and the “what” in human heads (roadmaps, all-hands slides, Slack threads). Agents can read the “how.” They can’t read the heads.&lt;/p&gt;

&lt;p&gt;Intent-driven delivery uses the same contract shape at both altitudes.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;project harness&lt;/strong&gt; is what I’ve assumed through most of this post. Guides and sensors local to one repository. Files to touch, patterns to follow, checks to pass. Acceptance falsifiable against a test runner.&lt;/p&gt;

&lt;p&gt;The contract works at the project level because the project harness constrains how the work gets done. The “how” is local. The “what” is not. The “what” is product. What the org wants, who it’s for, why it matters. Today that lives in roadmaps, PRD drives, Slack threads, and a few people’s heads. Same problem as the old ticket, one altitude higher.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;org harness&lt;/strong&gt; sits one layer up. Same primitives — guides and sensors — but the subject is the company, not the code. Guides describe what the org cares about: who the customer is, which metrics matter, what changes need product review, what data classifications mean. Sensors run against contracts before they reach a project: does this align with a current org intent, does it touch a regulated surface, does the outcome map to a tracked metric. An org intent (“reduce billing-related support volume by 30% this quarter”) decomposes into project intents (“show a clear renewal message when the card is expired”), each with its own contract and gates.&lt;/p&gt;

&lt;p&gt;Product and engineering stop being two languages stapled together. They share a contract format, they share gates, they share the same definition of falsifiable. An agent running inside a project harness inherits the org’s standards by reference instead of by reminder. The CEO opening the tracker doesn’t see “improve performance” — they see an intent that traces up to a quarterly outcome and down to a green CI run.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to try this on your next ticket
&lt;/h3&gt;

&lt;p&gt;You don’t need to install anything to test the idea. Pick a real ticket from your current sprint and do this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rewrite acceptance so every line is falsifiable.&lt;/strong&gt; If a line says “improved,” “cleaner,” or “better,” rewrite it as a check a test or a command could run. If you can’t, the line wasn’t acceptance — it was a wish.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add an explicit Scope out section.&lt;/strong&gt; Three to five files, modules, or surfaces that the work must not touch. If you can’t think of any, you haven’t thought about the boundaries yet. The exercise is worth ten minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Name the feature flag.&lt;/strong&gt; If the change is user-visible and there’s no flag, add one before you start. Default off. The flag is part of the same change, not a follow-up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Push back on tickets where “done” is what the reviewer says.&lt;/strong&gt; If the next ticket coming your way doesn’t have falsifiable acceptance, return it. Don’t start work. The cost of writing the acceptance up front is much lower than the cost of arguing about “done” at the end.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run one cycle where humans show up at three gates only.&lt;/strong&gt; Seed, contract approval, completion. No mid-execution check-ins. Watch what happens to the interruption volume. Watch what happens to review time. Watch what happens to the diff size.&lt;/p&gt;
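
&lt;p&gt;The first step in the list above can be partly mechanized. This is a rough sketch, assuming acceptance is a plain list of lines; the function name is made up for illustration. It only flags the three words named above. It cannot prove a line is falsifiable, only point at the ones that obviously are not.&lt;/p&gt;

```python
import re

# Words called out above as unfalsifiable; extend the set for your team.
VAGUE_WORDS = {"improved", "cleaner", "better"}


def vague_lines(acceptance: list[str]) -> list[tuple[int, str]]:
    """Return (1-based line number, offending word) for each wish-like line."""
    hits = []
    for number, line in enumerate(acceptance, start=1):
        words = set(re.findall(r"[a-z]+", line.lower()))
        for word in sorted(words.intersection(VAGUE_WORDS)):
            hits.append((number, word))
    return hits


acceptance = [
    "Checkout page feels better on mobile",
    "GET /health returns 200 within 100 ms",
]
print(vague_lines(acceptance))  # [(1, 'better')]
```

&lt;p&gt;Any line it flags gets rewritten as a check a test or a command could run, or it gets dropped from acceptance.&lt;/p&gt;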

&lt;p&gt;Most of the value shows up on the first ticket you rewrite, before any tooling exists. The contract is the lever. The phases are how to operate it without burning out the team. The flow — signal in, deploy out, three in-cycle touchpoints plus a production gate, constraints from the org and the project pressing in from both sides — is what makes intent-driven delivery a delivery model and not another spec. One shape answers what to do, how to track it, who decides it ships, and where the org and the project meet.&lt;/p&gt;

&lt;p&gt;That’s the bet.&lt;/p&gt;

</description>
      <category>agenticworkflow</category>
      <category>agenticai</category>
      <category>harnessengineering</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Ports and Adapters for Prose</title>
      <dc:creator>Ian Johnson</dc:creator>
      <pubDate>Sun, 14 Jun 2026 15:48:27 +0000</pubDate>
      <link>https://dev.to/tacoda/ports-and-adapters-for-prose-3lpf</link>
      <guid>https://dev.to/tacoda/ports-and-adapters-for-prose-3lpf</guid>
      <description>&lt;p&gt;I rewrote the same skill file three times last month. Each time I changed the explanation. The trigger conditions never changed. I’d baked them into the body so deeply that touching the explanation meant re-typing the trigger from memory, which meant the trigger drifted, which meant the skill fired in places it shouldn’t, which meant a fourth rewrite.&lt;/p&gt;

&lt;p&gt;The trigger was right the first time. The body wasn’t. I rewrote the wrong half of the file three times because the two halves weren’t separable.&lt;/p&gt;

&lt;p&gt;That afternoon I went back through every skill file, slash command, and harness rule I’d written in the last quarter and looked for the same shape. It was everywhere. The contract — what the artifact must accomplish — was tangled together with the prose explaining how to do it. I couldn’t change one without rewriting the other. The seam wasn’t there.&lt;/p&gt;

&lt;p&gt;This is the ports-and-adapters pattern, applied to prose.&lt;/p&gt;

&lt;h3&gt;
  
  
  The pattern, translated
&lt;/h3&gt;

&lt;p&gt;Ports and adapters comes from software architecture. You define an interface — the port — and write multiple implementations behind it — the adapters. The interface is stable. The implementations swap freely. The point is the seam: code on one side of it doesn’t know or care about the code on the other side.&lt;/p&gt;

&lt;p&gt;The same shape applies to the prose we write for LLMs. A prompt, a skill file, an agent definition, a harness rule — each one has two parts authors usually conflate.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;port&lt;/strong&gt; is what the prose must accomplish. The preconditions, the inputs, the outputs, the failure modes, what counts as satisfied, what counts as out of scope. The contract the prose has to honor, stated independently of how you’d explain it.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;adapter&lt;/strong&gt; is the actual wording. The metaphors, the worked examples, the tone, the level of detail. One specific way to satisfy the port.&lt;/p&gt;

&lt;p&gt;Most prose-for-LLMs collapses these. You sit down to write a skill or a system prompt and what comes out is a tangle of “what this does” with “how I’m explaining it.” Six months later you want to change the explanation without changing the contract, or vice versa. The seam isn’t there, so you rewrite the whole thing.&lt;/p&gt;

&lt;p&gt;The pattern is to separate them at the point of authoring. Write the port explicitly first — a short list of inputs, outputs, invariants, failure modes. Then write the adapter as one specific way to satisfy the port. Multiple adapters can satisfy the same port: terse, verbose, different audience, different model.&lt;/p&gt;

&lt;h3&gt;
  
  
  What a port looks like, written down
&lt;/h3&gt;

&lt;p&gt;Here is a port for a skill file that decides whether a code review should run. I write the port as a small block before I write the body.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;review-decision&lt;/span&gt;
&lt;span class="na"&gt;triggers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;user asks "review this", "look at the diff", "/review"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;PR is open and unreviewed&lt;/span&gt;
&lt;span class="na"&gt;preconditions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;a diff exists against a base ref&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;the diff is not empty&lt;/span&gt;
&lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;a list of findings, each with a file path and line&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;a verdict -&amp;gt; ok | suggest-changes | block&lt;/span&gt;
&lt;span class="na"&gt;invariants&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;never edits the diff&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;never claims pass without evidence&lt;/span&gt;
&lt;span class="na"&gt;out-of-scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;running tests&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;refactoring adjacent code&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s the port. It’s sixteen lines and it says nothing about tone, examples, metaphors, or the model on the other end. It is, deliberately, boring.&lt;/p&gt;

&lt;p&gt;The first time I wrote a port this way I felt like I was missing something. The contract is so spare. There’s nothing in it to chew on. Then I tried writing two adapters against it and the spareness paid for itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Two adapters, same port
&lt;/h3&gt;

&lt;p&gt;Here’s adapter A — terse, for a model that infers well from short instructions:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Review the diff against base. For each file, list findings as path:line — issue.&lt;br&gt;&lt;br&gt;
Verdict: ok / suggest-changes / block. Don’t edit. Don’t run tests. Show evidence for any “block.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here’s adapter B — verbose, for a model that benefits from worked examples:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You’re reviewing a diff against a base ref. Walk through each changed file. For&lt;br&gt;&lt;br&gt;
every issue you find, write one line: the file path, the line number, then a&lt;br&gt;&lt;br&gt;
short description of the problem.&lt;/p&gt;

&lt;p&gt;End with a verdict: “ok” if nothing meaningful, “suggest-changes” for findings&lt;br&gt;&lt;br&gt;
that aren’t blockers, “block” for findings that should stop the merge.&lt;/p&gt;

&lt;p&gt;You won’t edit the diff. You won’t run tests. If you call something a blocker,&lt;br&gt;&lt;br&gt;
quote the line that proves it.&lt;/p&gt;

&lt;p&gt;Example finding:&lt;br&gt;&lt;br&gt;
src/auth.ts:42 — token compared with ==, allows type coercion&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Both adapters honor the same port. The findings have a path and a line. The verdict is one of the three values. The invariants — no editing, no test-running, evidence for blocks — survive in both. The difference between them is texture, not contract.&lt;/p&gt;

&lt;p&gt;Now: which one do I edit when I want to change the tone? Adapter A or B. Which one do I edit when I want to change what “blocker” means? The port, and then both adapters fall out of date in a way I can see. The seam tells me where to touch.&lt;/p&gt;
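
&lt;p&gt;Because the port is stated independently of the wording, the same check can run against the output of either adapter. Here is a minimal sketch, assuming the review output has already been split into finding lines and a verdict string. The function name is hypothetical, and treating “block with no findings” as a violation is my reading of the evidence invariant, not something the port spells out.&lt;/p&gt;

```python
import re

VERDICTS = {"ok", "suggest-changes", "block"}
# A finding starts with path:line, followed by whitespace and the issue text.
FINDING = re.compile(r"^([^\s:]+):(\d+)\s+(.+)$")


def check_review(findings: list[str], verdict: str) -> list[str]:
    """Return port violations; an empty list means the output conforms."""
    problems = []
    if verdict not in VERDICTS:
        problems.append(f"verdict {verdict!r} is not one of {sorted(VERDICTS)}")
    for text in findings:
        if FINDING.match(text) is None:
            problems.append(f"finding lacks path:line: {text!r}")
    if verdict == "block" and not findings:
        problems.append("block verdict without evidence")
    return problems


print(check_review(["src/auth.ts:42 - token compared with =="], "block"))  # []
```

&lt;p&gt;The checker knows nothing about tone or length. That is the point: it reads the port, never the adapter.&lt;/p&gt;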

&lt;h3&gt;
  
  
  Where the seam shows up in real harness work
&lt;/h3&gt;

&lt;p&gt;Once I started looking, the port-adapter split was hiding in every artifact type I had.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skill files.&lt;/strong&gt; A skill has triggers — the conditions under which it fires — and a body — what to do once it fires. The triggers are the port. The body is the adapter. I had been writing both in one stream of prose, with the trigger conditions described inline (“when the user asks about X, do Y”). Splitting them out — frontmatter or top-of-file list for triggers, body for behavior — meant I could change the body without touching the trigger, and rewrite the trigger without disturbing the body. The skill files I rewrote three times stopped needing rewrites.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Slash commands.&lt;/strong&gt; The frontmatter — description, argument hints, required tools — is part of the port. The body is the adapter explaining how to behave. The frontmatter is what the harness reads to decide whether to surface the command; the body is what the model reads once it’s invoked. Those are two different readers, and the seam respects that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent definitions.&lt;/strong&gt; The tools list is a port: what capabilities does this agent have. The system prompt has both port content (invariants the agent must hold) and adapter content (worked examples, voice, the way you’d explain the job to a new hire). The most useful thing I did with my agent files was move the invariants to the top in a small list, separate from the worked examples below. The invariants stopped getting accidentally edited when I improved an example.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Harness rules.&lt;/strong&gt; A rule has a scope line — when this rule applies, which paths it covers — and a body — what to do when in scope. Scope is the port. Body is the adapter. I had been writing rules with the scope baked into the body (“when working in the API layer, never…”). Pulling the scope into a header line meant the rule could be rewritten without re-deriving the scope, and the harness could decide whether to load the rule without reading the body.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intent contracts.&lt;/strong&gt; In an intent-driven workflow, the acceptance criteria are the port for “done.” The body of the contract — the explanation, the worked context, the references — is the adapter. The acceptance is what a verifier reads. The body is what the executor reads. Two readers, two purposes, one seam.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdhhhpmpyaqpk09crodk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdhhhpmpyaqpk09crodk.png" width="487" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;One port, many adapters. The port is the expensive part. The adapters are cheap.&lt;/em&gt;&lt;/p&gt;
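
&lt;p&gt;For harness rules, the payoff of a scope header is that a loader can select rules without reading their bodies. This is a sketch under assumptions: the &lt;code&gt;Rule&lt;/code&gt; shape and &lt;code&gt;rules_to_load&lt;/code&gt; are hypothetical names, and scope is modeled as shell-style glob patterns matched with Python’s &lt;code&gt;fnmatch&lt;/code&gt;, where &lt;code&gt;*&lt;/code&gt; also matches path separators.&lt;/p&gt;

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Rule:
    scope: tuple[str, ...]  # the port: glob patterns the rule covers
    body: str               # the adapter: prose the model reads


def rules_to_load(rules: list[Rule], touched: list[str]) -> list[Rule]:
    """Select rules by scope alone; the body is never inspected."""
    return [
        rule
        for rule in rules
        if any(fnmatch(path, pattern) for path in touched for pattern in rule.scope)
    ]


api = Rule(("src/api/*",), "Keep the validation schema in sync.")
ui = Rule(("src/components/*",), "Use functional components.")
print([r.body for r in rules_to_load([api, ui], ["src/api/users.py"])])
```

&lt;p&gt;Rewriting either body changes nothing about when the rule loads, and changing a scope pattern changes nothing about what the rule says.&lt;/p&gt;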

&lt;h3&gt;
  
  
  Why this pays off
&lt;/h3&gt;

&lt;p&gt;The argument for the pattern is the second-draft argument. First drafts get slower. You’re writing the port explicitly, which feels like overhead. You’d write the body faster without it.&lt;/p&gt;

&lt;p&gt;Second drafts get faster. You’re changing the explanation without re-deriving the contract, or rewriting the contract without disturbing the explanation that already works. The seam means each edit touches one side at a time.&lt;/p&gt;

&lt;p&gt;Most prose-for-LLMs gets edited more than it gets written. A skill file you author in fifteen minutes you’ll edit twenty times over its life. A system prompt you draft in an hour you’ll tune for months. The cumulative cost is in the edits. The seam lowers the cost of every edit after the first.&lt;/p&gt;

&lt;p&gt;The other argument is the multiple-readers argument. Prose-for-LLMs has more readers than people realize.&lt;/p&gt;

&lt;p&gt;You, today, writing it.&lt;/p&gt;

&lt;p&gt;You, in six months, having forgotten what it does.&lt;/p&gt;

&lt;p&gt;Another engineer who inherits it.&lt;/p&gt;

&lt;p&gt;A different model than the one you wrote it against: first Sonnet, then Opus, Fable for a few days, then back to Opus, and something smaller for cost reasons next quarter.&lt;/p&gt;

&lt;p&gt;Each reader wants a different adapter against the same port. The terse adapter works for the smart model and the engineer who already knows the domain. The verbose adapter works for the smaller model and the engineer who’s new to the area. Without a port, you can’t write a second adapter; you have to rewrite the whole thing, and you’ll get the contract subtly wrong on the rewrite.&lt;/p&gt;

&lt;p&gt;With a port, you write one adapter for each reader you care about. They all satisfy the same contract. They diverge in texture and they agree in substance.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to tell if your seam is real
&lt;/h3&gt;

&lt;p&gt;Two tests. Both are cheap to run.&lt;/p&gt;

&lt;p&gt;The first test: rewrite the adapter without changing what’s true. Take your skill file, your prompt, your rule. Cut its length in half, or double it. Change the metaphors. Move from second person to first person. Now read the port and ask: is anything in the port newly out of date? If yes, you weren’t actually working only on the adapter — adapter territory was leaking into the port. If no, the seam held.&lt;/p&gt;

&lt;p&gt;The second test: rewrite the port and watch the adapter fall out of date. Change an input. Change a failure mode. Add an invariant. Look at the adapter and find the parts that now contradict the port. The places the adapter contradicts the port are the places where the seam was leaking the other way — the adapter was specifying contract details. If the adapter falls out of date in a clean, visible way, the seam is real. If you can’t tell which parts of the adapter to update, the seam isn’t really there yet.&lt;/p&gt;

&lt;p&gt;The failure mode to watch for is ports that try to specify the adapter. “Must use a friendly tone.” “Must include three examples.” “Must explain by analogy.” Those are adapter concerns dressed up as contract. They belong in the adapter, where they can vary. The port is what’s true; the adapter is how it sounds. Mixing those up is what got you into the tangle in the first place.&lt;/p&gt;

&lt;p&gt;The other failure mode is adapters that try to redefine the port. The adapter that says “actually, blockers can also mean stylistic issues if the team agrees” is rewriting the contract from inside the explanation. You won’t see it until two adapters drift apart and you can’t tell which one is right.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where the pattern doesn’t fit
&lt;/h3&gt;

&lt;p&gt;The pattern is overhead. For one-off prompts you’ll use once and discard, skip it. For short scripts you’re throwing away after a single run, skip it. The seam pays off over edits, and prose you’ll never edit doesn’t need a seam.&lt;/p&gt;

&lt;p&gt;The other place to be careful: when the port is genuinely entangled with the adapter — when the &lt;em&gt;way&lt;/em&gt; you explain something is part of what you’re trying to do. Writing about tone, voice, style, examples-as-pedagogy. There the adapter is the contract. Forcing a separation is theater.&lt;/p&gt;

&lt;p&gt;For everything else — skills, slash commands, agent definitions, harness rules, prompts you’ll tune over time, contracts that other readers will inherit — the pattern earns its keep within a week.&lt;/p&gt;

&lt;h3&gt;
  
  
  Write your next prompt against an interface
&lt;/h3&gt;

&lt;p&gt;Pick one piece of prose-for-LLMs you’ve edited more than twice. A skill file, a system prompt, a rule, whichever.&lt;/p&gt;

&lt;p&gt;Open it and write the port at the top. Eight to twelve lines. Inputs, outputs, invariants, failure modes, out-of-scope. Don’t worry about wording the body yet — just write down what’s true about the contract.&lt;/p&gt;

&lt;p&gt;Read the existing body against the new port. Find the parts of the body that are stating contract details (“only fires when X”) and move them up to the port. Find the parts of the port you accidentally wrote that are really about tone or examples (“respond in a friendly voice”) and move them down to the body. Be honest about which side each line belongs on.&lt;/p&gt;

&lt;p&gt;Now do the second test. Make a small change to the port — add an invariant, tighten an input. Watch the body. The parts of the body that contradict the new port are the parts where the seam was leaking. Fix them. Repeat once.&lt;/p&gt;

&lt;p&gt;Then leave the file alone for a week. The next time you edit it, notice whether you can change the body without re-deriving the contract, or the port without disturbing the body. If yes, the seam is doing its work. If no, the seam isn’t real yet — find the leak and fix it.&lt;/p&gt;

&lt;p&gt;The artifacts I rewrote three times don’t get rewritten three times anymore. The ports are stable. The adapters change as I learn what works. That’s the whole shape of it.&lt;/p&gt;

</description>
      <category>softwareengineering</category>
      <category>agenticworkflow</category>
      <category>portsandadapters</category>
      <category>agenticai</category>
    </item>
    <item>
      <title>Scoping Rules: Global, Project, Path-Glob</title>
      <dc:creator>Ian Johnson</dc:creator>
      <pubDate>Sat, 13 Jun 2026 16:55:48 +0000</pubDate>
      <link>https://dev.to/tacoda/scoping-rules-global-project-path-glob-dd6</link>
      <guid>https://dev.to/tacoda/scoping-rules-global-project-path-glob-dd6</guid>
      <description>&lt;p&gt;A rule that fires on every file is a tax. A rule that never fires is dead code. The cost of getting scope wrong is paid every session, on every task, by every contributor, and almost nobody talks about it.&lt;/p&gt;

&lt;p&gt;I want to talk about it.&lt;/p&gt;

&lt;p&gt;The three scopes in Claude Code are global (~/.claude/CLAUDE.md), project (./CLAUDE.md), and path-scoped (subdirectory CLAUDE.md, or feature docs loaded by path match). They look like a hierarchy. They behave like three different tools.&lt;/p&gt;

&lt;p&gt;The default mistake is to put everything in the project CLAUDE.md and call it done. The file grows. It crosses 500 lines. It crosses 1,000. The agent reads it on every task, including the ones where 90% of the file is irrelevant. The team stops reading it because it is too long to absorb. The new contributor never reads it at all.&lt;/p&gt;

&lt;p&gt;Tokens are the easy half of the cost. Attention is the harder one. Every line of CLAUDE.md is a line the agent has to weigh against every other line, and a line the human has to skip past when they are looking for the one rule they actually need.&lt;/p&gt;

&lt;h3&gt;
  
  
  Global: the rules that follow you
&lt;/h3&gt;

&lt;p&gt;Global rules apply to every project on your machine. The bar is very high. If a rule does not survive the test &lt;em&gt;would I want this in a Python repo, a TypeScript repo, and a shell-script repo&lt;/em&gt;, it does not belong in global.&lt;/p&gt;

&lt;p&gt;What I keep there are personal preferences about how the agent should communicate. Be terse. Do not narrate intermediate reasoning. Do not add commentary to commit messages. Do not write documentation files unless I ask. None of those depend on the language, the framework, or the project. They are about the working relationship, not the code.&lt;/p&gt;

&lt;p&gt;What does not belong in global: anything about a specific language, framework, library, or convention. &lt;em&gt;Use functional components in React&lt;/em&gt; is a fine rule. It does not belong in global. The moment I open a backend repo, the rule is firing against code that does not have React in it. The agent has to either ignore it (wasted tokens) or get confused (worse).&lt;/p&gt;

&lt;p&gt;If you cannot phrase the rule in language that applies equally to any repo, the rule is not global.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project: the rules that define the codebase
&lt;/h3&gt;

&lt;p&gt;Project rules apply to every file in the project. They are for conventions true everywhere in this codebase. Naming. Layout. Build commands. The library you use for X. The framework’s pitfalls.&lt;/p&gt;

&lt;p&gt;The bar for a project rule is &lt;em&gt;does this apply to most of the files in the repo&lt;/em&gt;. If the rule only applies to the frontend, or only the database layer, or only one feature, it does not belong at the project root. It belongs in a path-scoped file.&lt;/p&gt;

&lt;p&gt;A good project rule: &lt;em&gt;Use pnpm, not npm&lt;/em&gt;. Applies everywhere. Cheap for the agent to honor, cheap for a human to verify. &lt;em&gt;Tests live in tests/, not __tests__/&lt;/em&gt;. Same shape. &lt;em&gt;All public exports go through index.ts&lt;/em&gt;. Same shape.&lt;/p&gt;

&lt;p&gt;A bad project rule, the one I see most often: &lt;em&gt;When working in the API module, make sure the validation schema is updated to match&lt;/em&gt;. Applies to maybe 10% of the files. The agent reads it on every other task. It earns its keep one out of ten sessions; the other nine, it is noise.&lt;/p&gt;

&lt;p&gt;That rule belongs in a subdirectory CLAUDE.md inside the API module, where it loads when the agent is touching files in that module and stays silent when it is not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Path-scoped: the rules that earn their keep
&lt;/h3&gt;

&lt;p&gt;Path-scoped rules are the underused primitive. Most teams know they exist and do not use them, because the project root is the path of least resistance and &lt;em&gt;I will move it later&lt;/em&gt; is the path of &lt;em&gt;I will never move it&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The mental model is simple. The agent loads context based on what it is looking at. Edit a file in src/components/, the CLAUDE.md in src/components/ loads. Edit a file in src/api/, the CLAUDE.md in src/api/ loads. Rules that are only true for that subtree go there.&lt;/p&gt;
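
&lt;p&gt;That mental model can be written down in a few lines. This is a sketch of the model only, not Claude Code’s actual loading logic, which I have not inspected. The function name is hypothetical, and it assumes POSIX-style paths relative to the project root.&lt;/p&gt;

```python
from pathlib import PurePosixPath


def candidate_rule_files(edited_file: str, filename: str = "CLAUDE.md") -> list[str]:
    """List rule files from the project root down to the edited file's directory."""
    directory = PurePosixPath(edited_file).parent
    # parents runs nearest-first and ends at "."; reverse it to go root-first.
    chain = [directory, *directory.parents][::-1]
    return [str(d / filename) for d in chain]


print(candidate_rule_files("src/api/users/create.ts"))
# ['CLAUDE.md', 'src/CLAUDE.md', 'src/api/CLAUDE.md', 'src/api/users/CLAUDE.md']
```

&lt;p&gt;A file in src/components/ yields a different chain, so a rule in src/api/CLAUDE.md never appears in it.&lt;/p&gt;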

&lt;p&gt;The discipline pays off three ways.&lt;/p&gt;

&lt;p&gt;The agent’s context window is smaller on every task. The rules it loads are the rules that apply. Nothing else.&lt;/p&gt;

&lt;p&gt;The rules themselves can be specific. A rule that only fires inside src/api/ does not have to caveat itself. It does not have to say &lt;em&gt;in the API module, do X&lt;/em&gt;. It just says &lt;em&gt;do X&lt;/em&gt;, because the file location is doing the scoping. The rule reads more directly. The agent applies it more reliably.&lt;/p&gt;

&lt;p&gt;The rules are colocated with the code they constrain. A new contributor opening src/api/ sees the rules for that module right there. The convention is documented where the convention lives.&lt;/p&gt;

&lt;h3&gt;
  
  
  The test for moving a rule down
&lt;/h3&gt;

&lt;p&gt;The quick check I run on every rule in the project root: &lt;em&gt;would this rule still apply if I deleted half the repo?&lt;/em&gt; If the answer is no, the rule is mis-scoped. It is pretending to be a project rule when it is a module rule.&lt;/p&gt;

&lt;p&gt;The fix is to find the narrowest directory that contains every file the rule applies to, and move the rule there. Sometimes that directory is the project root, and the rule stays. More often it is two or three levels down, and the rule moves with it.&lt;/p&gt;
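
&lt;p&gt;Finding the narrowest directory is a common-prefix computation over the files the rule applies to. Here is a minimal sketch, assuming you can list those files as POSIX-style paths relative to the project root; the function name is hypothetical.&lt;/p&gt;

```python
import posixpath


def narrowest_directory(files: list[str]) -> str:
    """Deepest directory containing every file the rule applies to."""
    directories = [posixpath.dirname(f) for f in files]
    # commonpath returns "" when the only shared ancestor is the root.
    return posixpath.commonpath(directories) or "."


print(narrowest_directory(["src/api/users/create.ts", "src/api/orders/list.ts"]))
# src/api
```

&lt;p&gt;When the result is &lt;code&gt;.&lt;/code&gt;, the rule really is a project rule and stays at the root.&lt;/p&gt;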

&lt;p&gt;The move is rarely complex. Cut three lines from CLAUDE.md at the root, paste them into a new CLAUDE.md inside the relevant directory, commit. The agent’s behavior does not change on tasks that touch files in that directory. The agent’s behavior does change on tasks that do not, because the rule is no longer firing where it is not needed.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;That second effect is the whole point.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Audit your project root this weekend
&lt;/h3&gt;

&lt;p&gt;Open your project CLAUDE.md. Walk each rule. Ask the &lt;em&gt;delete half the repo&lt;/em&gt; question. Move the ones that fail down to the narrowest directory that owns them.&lt;/p&gt;

&lt;p&gt;The first audit usually moves four or five rules. The root file gets shorter. The subdirectories get a CLAUDE.md they did not have before, and the conventions of each module become legible to the next person who opens it.&lt;/p&gt;

&lt;p&gt;The harness is not the file at the root. It is the tree.&lt;/p&gt;

</description>
      <category>agenticai</category>
      <category>agenticworkflow</category>
      <category>rules</category>
      <category>harnessengineering</category>
    </item>
    <item>
      <title>Post-Mortems for Agent Runs</title>
      <dc:creator>Ian Johnson</dc:creator>
      <pubDate>Fri, 12 Jun 2026 16:00:34 +0000</pubDate>
      <link>https://dev.to/tacoda/post-mortems-for-agent-runs-32bd</link>
      <guid>https://dev.to/tacoda/post-mortems-for-agent-runs-32bd</guid>
      <description>&lt;p&gt;The agent burned five hours on a refactor that should have taken one. The first hour was fine. The second, it was rewriting a module nobody had asked it to touch. The third through fifth went to rolling back, re-planning, and producing the smaller diff we should have ended up with at the start. The work landed late. The team was annoyed. The lesson was sitting there waiting to be picked up.&lt;/p&gt;

&lt;p&gt;We did not pick it up. The work shipped, the agent moved on, and the same failure mode showed up two weeks later in a different module. We paid for the first incident twice because we had treated it as an annoyance instead of a learning moment.&lt;/p&gt;

&lt;p&gt;The fix was obvious in retrospect. Agent failures want post-mortems, the same way human incidents do. The practice does not transfer automatically, and most teams have not built the habit.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the post-mortem is actually for
&lt;/h3&gt;

&lt;p&gt;The point of a post-mortem is not to assign blame. With agents, that is even more true. There is no person to blame, and pretending the agent is a person leads to bad analysis. The agent did what the agent does. The question is where the team’s setup failed to catch it.&lt;/p&gt;

&lt;p&gt;The setup is a stack of layers: the task description, the harness rules, the hooks, the tests, the code review, and the human supervising. A failure is a place where one of those layers should have caught the problem and did not.&lt;/p&gt;

&lt;p&gt;The post-mortem’s job is to find which layer missed and patch it. The deliverable is a small set of changes — a rule added or modified, a hook tightened, a feature doc written, a task-writing practice updated. Not a list of “things to remember.” A list of changes to the system.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to run one
&lt;/h3&gt;

&lt;p&gt;Not every failure deserves a full post-mortem. The cost is real and the bar should be too.&lt;/p&gt;

&lt;p&gt;I run one when the failure cost at least an hour of redo, or when the same failure type has shown up twice in two weeks. The first condition catches the expensive single incidents. The second catches the cheap repeats that compound into expensive patterns.&lt;/p&gt;

&lt;p&gt;The other trigger: a near-miss that would have been a real incident if a particular reviewer had not caught it. Near-misses are the most underused signal in agent work. The harness was almost not enough. The reviewer was the last line of defense. The next failure of the same kind might not have that reviewer in the room.&lt;/p&gt;

&lt;p&gt;A near-miss is a free post-mortem. The cost has not been paid yet. The lesson is available for the price of the analysis.&lt;/p&gt;
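
&lt;p&gt;The three triggers are simple enough to write down as a predicate. A minimal Ruby sketch, assuming a hypothetical Failure record; the struct, its field names, and the post_mortem? helper are invented for illustration and do not come from any real tool.&lt;/p&gt;

```ruby
# Hypothetical record of one agent failure; the field names are illustrative.
Failure = Struct.new(:redo_hours, :repeats_in_two_weeks, :near_miss, keyword_init: true)

# True when any of the three triggers described above applies.
def post_mortem?(failure)
  failure.redo_hours >= 1 ||             # expensive single incident
    failure.repeats_in_two_weeks >= 2 || # cheap repeat
    failure.near_miss                    # caught only by a reviewer
end

post_mortem?(Failure.new(redo_hours: 4, repeats_in_two_weeks: 1, near_miss: false))    # => true
post_mortem?(Failure.new(redo_hours: 0.25, repeats_in_two_weeks: 1, near_miss: false)) # => false
```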

&lt;h3&gt;
  
  
  Five sections, one page
&lt;/h3&gt;

&lt;p&gt;The shape that works for me is short and rigid.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happened.&lt;/strong&gt; Three sentences, no commentary. &lt;em&gt;The agent was asked to add a validation step to the user-import flow. It produced a 600-line PR that rewrote the entire flow. The PR was caught in review and rolled back.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent was working from.&lt;/strong&gt; The task description. The harness rules that fired. The context it had loaded. If the failure came from a missing rule, the absence is the finding. If it came from a misread task, the description is the finding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where the layers failed.&lt;/strong&gt; Walk down the stack. For each layer, ask: was this failure mode in scope, and if so, why did the layer not catch it? This is the load-bearing section. The answers are the changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What changes.&lt;/strong&gt; One or two specific edits. A rule added. A hook tightened. A feature doc scoped to a path. A change to how tickets are written for that kind of work. Small enough to land the same week as the post-mortem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we are not doing.&lt;/strong&gt; This is the part teams skip and then regret. A post-mortem produces a temptation to add three rules, two hooks, and a process. Most of them will be wrong. Name the ones you considered and rejected, with the reason. The next post-mortem on a similar failure can revisit them, with evidence.&lt;/p&gt;

&lt;p&gt;The whole document is one page. If it grows past a page, the failure is being over-analyzed and the changes are being padded.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where most teams miss the lesson
&lt;/h3&gt;

&lt;p&gt;The pattern I see in teams that adopt agents but do not improve over time: failures get framed as the agent’s failures. The response is “the agent is not ready for that,” and the rope gets pulled back without a corresponding investment in the harness.&lt;/p&gt;

&lt;p&gt;The next failure of the same shape happens to the next contributor, who has no record of the first one. They make the same trust adjustment, alone. The team’s collective knowledge of where the agent is reliable stays the same. The harness does not improve because nobody wrote down what should change.&lt;/p&gt;

&lt;p&gt;The fix is the post-mortem and the patching. The failures are the agent’s, in a literal sense. They are also the team’s, in the sense that the team’s setup let the failure through. The “team’s failure” framing is what produces the patching. The “agent’s failure” framing produces resignation.&lt;/p&gt;

&lt;h3&gt;
  
  
  The patterns that keep showing up
&lt;/h3&gt;

&lt;p&gt;A few categories of finding come up over and over, across the teams I have talked to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent acted on an ambiguous request.&lt;/strong&gt; The task description had two readings and the agent picked one silently. The patch is on the task-writing side, on the harness side (a rule that flags ambiguity), or both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent worked in an unfamiliar area without orienting.&lt;/strong&gt; The diff touched a module the agent had not read. The conventions of that module did not survive the touch. The patch is a feature doc for the module, or a rule about orienting before editing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent followed an aspirational rule too literally.&lt;/strong&gt; The harness said &lt;em&gt;prefer composition&lt;/em&gt; and the agent rewrote a working inheritance hierarchy. The patch is to scope the rule or soften it. State it as a preference for new code, not a rewrite trigger.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent escalated when it should have proceeded, or proceeded when it should have escalated.&lt;/strong&gt; The escalation ladder is mistuned. The patch is on the ladder itself: a new rung, a tightened rung, a removed rung.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent’s mental model of the code drifted from reality.&lt;/strong&gt; A refactor renamed something. The harness still references the old name. The agent reads the harness and confidently uses a name that no longer exists. The patch is to update the harness when the rename happens. The harness is a document. Documents need maintenance.&lt;/p&gt;

&lt;p&gt;None of these are exotic. They are failures the team already half-knows about. The post-mortem makes the half-knowledge explicit and ships a change.&lt;/p&gt;

&lt;h3&gt;
  
  
  Once a week, once a quarter
&lt;/h3&gt;

&lt;p&gt;I write the post-mortem within a week of the failure. Later than that and the details fade; the analysis becomes a story rather than a forensic exercise. The changes I produce, I land the same week.&lt;/p&gt;

&lt;p&gt;I read the last quarter of post-mortems once a quarter, looking for cross-cutting patterns. Sometimes three of them all point to the same gap and the right patch is a single change that subsumes the three smaller patches. The quarterly review is where that consolidation happens.&lt;/p&gt;

&lt;p&gt;The cost is small. An hour for the post-mortem itself. An hour or two for the patches. Half a day per quarter for the review. The compounding return is most of why our harness is the shape it is today.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run your first post-mortem this week
&lt;/h3&gt;

&lt;p&gt;The first post-mortem is the hardest. There is no template, the team is not sure what they are looking for, and the temptation is to make it about the agent’s competence instead of the team’s setup. Lean against that temptation. The practice only pays off if it produces changes to the setup.&lt;/p&gt;

&lt;p&gt;The next one comes a little easier. The shape emerges by the third or fourth document. The structure above is what mine settled into after about six months; yours will look slightly different, and that is fine.&lt;/p&gt;

&lt;p&gt;Pick one signal to start. If &lt;em&gt;the agent burned more than an hour going the wrong direction&lt;/em&gt; is too specific, try &lt;em&gt;any time someone says the agent made the work harder this week&lt;/em&gt;. Looser triggers, narrower analysis. The point is to make the failure visible and the patch concrete.&lt;/p&gt;

&lt;p&gt;Three things to do this week:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pick the most recent failure where somebody said &lt;em&gt;that took way too long&lt;/em&gt;. Write one page. What happened, what the agent had loaded, which layer should have caught it, what one change you are making.
&lt;/li&gt;
&lt;li&gt;Land the change before Friday. A rule. A hook. A line in the task template. Whatever the post-mortem named.
&lt;/li&gt;
&lt;li&gt;Set a calendar reminder for ninety days out: read the post-mortems in order and look for a pattern.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The agent’s failures are not noise. &lt;strong&gt;They are the most accurate map you have of where your setup is weak.&lt;/strong&gt; Stop letting them go unanalyzed.&lt;/p&gt;

&lt;p&gt;The next failure will tell you something. The post-mortem is how you hear it.&lt;/p&gt;

</description>
      <category>postmortem</category>
      <category>softwareengineering</category>
      <category>failuremodeanalysis</category>
      <category>agenticai</category>
    </item>
    <item>
      <title>Lisp’s Influence on Ruby</title>
      <dc:creator>Ian Johnson</dc:creator>
      <pubDate>Thu, 11 Jun 2026 14:38:48 +0000</pubDate>
      <link>https://dev.to/tacoda/lisps-influence-on-ruby-4j6d</link>
      <guid>https://dev.to/tacoda/lisps-influence-on-ruby-4j6d</guid>
      <description>&lt;p&gt;Once I wrote users.select { |u| u.admin? }.map(&amp;amp;:email) and realized I’d written Lisp.&lt;/p&gt;

&lt;p&gt;Not literally. The parentheses are gone, the prefix notation is gone, the lambdas are syntactic blocks. But the shape of the code (chain a filter onto a transform, ask each element a yes-or-no question with ?, build the result without mutating anything) is Lisp. Ruby just put it in business casual.&lt;/p&gt;

&lt;p&gt;Matz has said as much. He’s described Ruby’s design as starting from a simple Lisp, stripping out macros and s-expressions, then adding an object system, blocks, and Smalltalk-style methods. The features most Rubyists fall in love with aren’t the object-oriented ones. They’re the functional ones, dressed in friendlier clothes.&lt;/p&gt;

&lt;p&gt;Here is the list I think about often, and why each one matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Method names with question marks
&lt;/h3&gt;

&lt;p&gt;The convention that predicates end in ? came from Scheme. zero?, nil?, empty?, respond_to?, valid?. The mark tells you, at a glance, that the method answers a yes-or-no question. It does not mutate. It does not perform an action. It tells you something true or false about the receiver.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;nil?&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;admin?&lt;/span&gt;
&lt;span class="n"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribed?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can read those three lines as English because the &lt;code&gt;?&lt;/code&gt; does the heavy lifting. The same convention shows up as ! for methods that mutate or raise: save!, sort!, compact!. Both marks come from Scheme, where null?, pair?, and set! work the same way.&lt;/p&gt;

&lt;p&gt;A small syntactic borrow, but it threads through the whole language. Reading Ruby is faster because of those two characters.&lt;/p&gt;
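
&lt;p&gt;The two marks are conventions, not syntax the interpreter enforces, so your own classes can follow them. A small sketch, assuming a hypothetical Inbox class made up for this example:&lt;/p&gt;

```ruby
# Hypothetical Inbox class, for illustration only.
class Inbox
  def initialize(messages)
    @messages = messages
  end

  # Predicate: answers a question and changes nothing.
  def empty?
    @messages.empty?
  end

  # Bang method: mutates the receiver in place, then returns it.
  def clear!
    @messages.clear
    self
  end
end

inbox = Inbox.new(["hello"])
inbox.empty?        # => false
inbox.clear!.empty? # => true
```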

&lt;h3&gt;
  
  
  Closures and blocks
&lt;/h3&gt;

&lt;p&gt;Blocks are the feature most Rubyists name first when asked what they love about the language. They’re closures: chunks of code that capture their surrounding scope and can be passed around as values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;each&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="c1"&gt;# =&amp;gt; 6&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The block closes over total. That is the closure pattern: a function value that remembers the environment it was defined in. Lisp had closures decades before Ruby. Scheme made them first-class objects you could pass to anything. Ruby kept the idea and added the lighter syntax. A block, with do…end or curly braces, is a closure with the parentheses stripped off.&lt;/p&gt;

&lt;p&gt;Procs and lambdas are the same idea with the parentheses back on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;square&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# =&amp;gt; [1, 4, 9]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That arrow syntax is Ruby’s lambda. The word itself is Lisp’s, from Church’s lambda calculus, plumbed into a working programming language for the first time in 1958.&lt;/p&gt;
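
&lt;p&gt;Because a lambda is a closure, it can outlive the method that created it and still see that method’s local variables. A minimal sketch; make_counter is a hypothetical helper named for this example:&lt;/p&gt;

```ruby
# make_counter is a hypothetical helper. The lambda it returns keeps a
# reference to the local variable count after the method has returned.
def make_counter
  count = 0
  -> { count += 1 }
end

counter = make_counter
counter.call # => 1
counter.call # => 2

other = make_counter
other.call   # => 1, each call to make_counter captures a fresh count
```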

&lt;h3&gt;
  
  
  First-class functions
&lt;/h3&gt;

&lt;p&gt;Once you can name a closure and pass it around, functions become values. You can store them in arrays, return them from methods, attach them to objects. Ruby’s Method and Proc classes make this explicit. So does &amp;amp;:method_name, which converts a symbol into a block: Ruby calls to_proc on the symbol, and the resulting proc calls that method on each object it is handed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;emails&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;admins&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:admin?&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &amp;amp;:foo is a small piece of magic, and it works because functions are values in Ruby. The symbol gets coerced into a proc, the proc gets passed as a block, the block gets called on each element. First-class functions all the way down.&lt;/p&gt;
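
&lt;p&gt;The coercion can be done by hand, which makes each step visible. This sketch uses only core methods (Symbol#to_proc, Object#method, Proc#call); the variable names and the handlers hash are illustrative.&lt;/p&gt;

```ruby
# Symbol#to_proc builds a proc that sends the named method to its argument.
upcase = :upcase.to_proc
upcase.call("lisp") # => "LISP"

# The shorthand form is equivalent to passing this explicit block.
%w[lisp ruby].map { |word| upcase.call(word) } # => ["LISP", "RUBY"]

# Method objects are values too: look one up now, call it later.
length_of_lisp = "lisp".method(:length)
length_of_lisp.call # => 4

# Functions stored in a hash, like any other value.
handlers = { shout: :upcase.to_proc, flip: :reverse.to_proc }
handlers[:flip].call("ruby") # => "ybur"
```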

&lt;p&gt;This is Lisp’s foundational idea: programs are built by composing functions. Ruby borrows the composition and dresses it up in dot-chains.&lt;/p&gt;

&lt;h3&gt;
  
  
  Symbols
&lt;/h3&gt;

&lt;p&gt;:foo is a symbol. It looks like a string with a colon, but it’s a different kind of value. Symbols are interned: every time you write :foo, you get the same object. Two strings that look the same are usually two separate objects in memory; two symbols that look the same are always one.&lt;/p&gt;

&lt;p&gt;That property comes from Lisp. Lisp symbols (atoms, in some dialects) are the original interned values. The reader sees foo, looks it up in a symbol table, and either returns the existing symbol or creates a new one and remembers it. After that, all references to foo point to the same object.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="ss"&gt;:status&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;equal?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# =&amp;gt; true&lt;/span&gt;
&lt;span class="s2"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;equal?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# =&amp;gt; false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What it buys you in Ruby: fast comparison, free hashing, and a clean syntax for names that aren’t strings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;host: &lt;/span&gt;&lt;span class="s2"&gt;"localhost"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;port: &lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;ssl: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:host&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hash keys are the obvious case, but the deeper use is method names. method_name and :method_name are the same idea at two levels. send(:save) calls the save method. define_method(:fetch) {…} defines one. respond_to?(:to_s) asks if one exists. Symbols are how Ruby refers to methods reflectively, which is how the metaprogramming works.&lt;/p&gt;
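
&lt;p&gt;Those three reflective calls fit in a few lines. A sketch, assuming a hypothetical Document class invented for the example:&lt;/p&gt;

```ruby
# Hypothetical Document class, for illustration only.
class Document
  # define_method takes the new method's name as a symbol.
  define_method(:title) { "Untitled" }
end

doc = Document.new
doc.respond_to?(:title)          # => true
doc.send(:title)                 # => "Untitled"
doc.respond_to?(:missing)        # => false
Document.instance_methods(false) # => [:title]
```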

&lt;p&gt;The &amp;amp;:foo shortcut from the last section is the same idea on a closer pass: a symbol naming a method, coerced into a callable. Symbols carry the names; Ruby looks them up.&lt;/p&gt;

&lt;h3&gt;
  
  
  Collection methods
&lt;/h3&gt;

&lt;p&gt;map, select, reject, reduce, each, flat_map, zip, partition, chunk_while. The Enumerable module is the part of Ruby I would miss most if I had to leave. It’s also the part most directly descended from Lisp.&lt;/p&gt;

&lt;p&gt;Lisp gave us mapcar and reduce, along with filter (spelled remove-if-not in Common Lisp). The shape is the same: take a collection, apply a function, get a collection back. No indices. No off-by-ones. No accumulator variable to forget to reset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;orders&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;placed_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;week&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ago&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform_values&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;group&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That snippet would be five for-loops and a hash in a less expressive language. In Ruby it’s a paragraph that reads top-to-bottom. The chain is doing the same thing a series of nested Lisp &lt;code&gt;map&lt;/code&gt;s and &lt;code&gt;reduce&lt;/code&gt;s would do; the syntax is dotted instead of parenthesized.&lt;/p&gt;
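
&lt;p&gt;The snippet leans on 1.week.ago, which comes from ActiveSupport, not core Ruby. Here is a version that should run in plain Ruby, assuming a hypothetical Order struct and sample data made up for the example:&lt;/p&gt;

```ruby
# Order is a hypothetical struct standing in for whatever model the snippet uses.
Order = Struct.new(:customer_id, :total, :placed_at)

now = Time.now
week = 7 * 24 * 60 * 60 # one week in seconds

orders = [
  Order.new(1, 20, now - 3600),
  Order.new(1, 5,  now - 7200),
  Order.new(2, 12, now - 3600),
  Order.new(2, 99, now - 2 * week) # older than a week, filtered out
]

totals = orders
  .select { |o| o.placed_at > now - week }
  .group_by { |o| o.customer_id }
  .transform_values { |group| group.sum { |o| o.total } }

totals # maps customer 1 to 25 and customer 2 to 12
```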

&lt;p&gt;When Rubyists say “the language reads like English,” what they usually mean is “the collection methods compose into sentences.” That’s Lisp’s gift, with Ruby’s punctuation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lazy enumerators
&lt;/h3&gt;

&lt;p&gt;Eager collection methods build the whole result, then return it. [1, 2, 3].map { |n| n * 2 } allocates a new array, fills it, hands it back. Fine for small lists. For large or infinite ones it’s a problem.&lt;/p&gt;

&lt;p&gt;Lisp solved this with lazy evaluation and streams. Scheme’s delay and force, Clojure’s lazy sequences, Haskell’s &lt;em&gt;everything&lt;/em&gt;. The idea: don’t compute the result until someone asks for it. A list isn’t an array sitting in memory; it’s a recipe for producing one element at a time.&lt;/p&gt;

&lt;p&gt;Ruby has the same trick. Enumerable#lazy returns an enumerator that pipes operations together without materializing the intermediate collections.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="no"&gt;Float&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;INFINITY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lazy&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# =&amp;gt; [9, 36, 81, 144, 225]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That pipeline reads from an infinite range. Without lazy, the select would try to scan the whole range before passing it on; the program would never finish. With lazy, each value flows through the chain one at a time, and only five of them are ever computed.&lt;/p&gt;

&lt;p&gt;The mechanics are pure Lisp. A lazy enumerator is a closure over the source plus a transformation. Each request for a value advances that closure by one step. first(5) pulls five values through the chain, then stops. Everything else stays uncomputed.&lt;/p&gt;
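
&lt;p&gt;You can watch the on-demand behavior with a hand-built source. This sketch uses Enumerator.new from core Ruby; the naturals, computed, and squares names are illustrative, and the computed array exists only to record which elements were actually touched.&lt;/p&gt;

```ruby
# An infinite source. The block runs only as far as a consumer demands.
naturals = Enumerator.new do |yielder|
  n = 1
  loop do
    yielder.yield n
    n += 1
  end
end

naturals.next # => 1
naturals.next # => 2

# Record every element the map block actually sees.
computed = []
squares = naturals.lazy.map { |n| computed.push(n); n * n }

squares.first(3) # => [1, 4, 9]
computed         # => [1, 2, 3], nothing past the third element ran
```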

&lt;p&gt;You don’t reach for it often. When you do (paging through a large file, generating combinations until you find one that fits, walking a tree without flattening it), there’s nothing else in Ruby that does the job as cleanly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Duck typing
&lt;/h3&gt;

&lt;p&gt;If it walks like a duck and quacks like a duck, treat it like a duck. Don’t check its type. Send it the message and see what happens.&lt;/p&gt;

&lt;p&gt;Smalltalk shares the credit here. Smalltalk’s “send any message to any object” is closer to duck typing than Lisp’s typed-but-dynamic approach. But Lisp’s tradition of dynamic typing, where values know their types and variables don’t, is part of the same lineage. The idea that a function should care about behavior, not class, runs through both.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;thing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_s&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That method works for anything that responds to to_s. Strings, integers, custom objects, nil. The method does not ask what thing &lt;em&gt;is&lt;/em&gt;. It asks what thing can &lt;em&gt;do&lt;/em&gt;. That posture (behavior over identity) is part of what makes Ruby feel forgiving.&lt;/p&gt;
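
&lt;p&gt;A custom class joins in by defining the method, with no shared superclass or interface declaration. A sketch, assuming a hypothetical Money class invented for the example:&lt;/p&gt;

```ruby
# Hypothetical Money class. It declares no relationship to String or Integer;
# render only needs it to respond to to_s.
class Money
  def initialize(cents)
    @cents = cents
  end

  def to_s
    format("$%.2f", @cents / 100.0)
  end
end

def render(thing)
  thing.to_s
end

[42, "text", nil, Money.new(1999)].map { |thing| render(thing) }
# => ["42", "text", "", "$19.99"]
```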

&lt;h3&gt;
  
  
  Expression-oriented design
&lt;/h3&gt;

&lt;p&gt;Almost everything in Ruby is an expression that returns a value. if returns a value. case returns a value. A method returns its last expression. A block returns its last expression.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;code&lt;/span&gt;
  &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;299&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="ss"&gt;:ok&lt;/span&gt;
  &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;499&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="ss"&gt;:client_error&lt;/span&gt;
  &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;599&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="ss"&gt;:server_error&lt;/span&gt;
  &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="ss"&gt;:unknown&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s Lisp. Lisp has no statements, only expressions. Every form evaluates to something. Ruby kept the discipline without keeping the parentheses, and the result is code that composes. You can drop any expression into any slot.&lt;/p&gt;

&lt;p&gt;Languages with statements ask you to write extra lines. if (x) { result = a; } else { result = b; } is three lines for what should be one. Ruby and Lisp both reject the split. result = if x then a else b end. One less variable, one less assignment to forget.&lt;/p&gt;
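
&lt;p&gt;A few more constructs that evaluate to a value, as a runnable sketch. The variable names and the label method are illustrative.&lt;/p&gt;

```ruby
x = 7

# if as an expression: the chosen branch is the value.
parity = if x.even? then :even else :odd end # => :odd

# begin/rescue is an expression too.
parsed = begin
  Integer("not a number")
rescue ArgumentError
  0
end # => 0

# A method returns its last expression, here the value of the case.
def label(code)
  case code
  when 200..299 then :ok
  else :unknown
  end
end

label(204) # => :ok
```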

&lt;h3&gt;
  
  
  Code that writes code
&lt;/h3&gt;

&lt;p&gt;Lisp’s signature trick is that code is data. Programs are lists, and lists are values, so a program can take a program and return a program. Macros, Lisp’s most-imitated and least-replicated feature, are functions that operate on code before it runs.&lt;/p&gt;

&lt;p&gt;Ruby doesn’t have macros. It has the next-best thing: a metaobject protocol that lets you reshape classes at runtime. define_method, method_missing, class_eval, instance_eval, open classes. None of it is as elegant as Lisp’s macros. All of it solves the same kinds of problems.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Status&lt;/span&gt;
  &lt;span class="sx"&gt;%i[draft published archived]&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="n"&gt;define_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;?"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="vi"&gt;@state&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That generates three predicate methods at class-definition time. In a language without first-class metaprogramming, you’d write the three methods by hand and accept the duplication. The fact that you can write a loop that defines methods is a direct descendant of “code is data.” It’s the same idea, narrower, in a language that traded macros for blocks.&lt;/p&gt;
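
&lt;p&gt;The class as written never sets @state, so every predicate would return false. Here is a runnable variant; the initialize method and the STATES constant are my additions for illustration, not part of the original snippet.&lt;/p&gt;

```ruby
class Status
  STATES = %i[draft published archived].freeze

  # Assumed constructor: stores the current state as a symbol.
  def initialize(state)
    @state = state
  end

  # One predicate per state, defined in a loop at class-definition time.
  STATES.each do |state|
    define_method("#{state}?") { @state == state }
  end
end

status = Status.new(:published)
status.published? # => true
status.draft?     # => false
Status.instance_methods(false).sort # => [:archived?, :draft?, :published?]
```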

&lt;p&gt;This is why DSLs are easy in Ruby. RSpec, Rails routing, Rake, Sinatra. They look like English because Ruby’s syntax bends. They bend because the underlying model is closer to Lisp than to C. The closer you look at a Ruby DSL, the more you see method calls all the way down: receivers and messages like Smalltalk, with metaprogramming carving the shape like Lisp.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why FP and OOP aren’t a fight
&lt;/h3&gt;

&lt;p&gt;It’s tempting to read all of the above as “Ruby is secretly a functional language.” It isn’t. Ruby is an object-oriented language with a functional accent, and the accent is where most of the joy lives.&lt;/p&gt;

&lt;p&gt;The functional-versus-object-oriented debate is mostly a category error. The two paradigms answer different questions. OOP picks an abstraction (usually a domain noun, a thing with state and behavior) and builds from there. FP picks a different abstraction (a function, a transformation, a composition) and builds from that. The choice is which abstraction sits at the center.&lt;/p&gt;

&lt;p&gt;Ruby picks the object. Then it lets you call map on it.&lt;/p&gt;

&lt;p&gt;You can write functional code in Ruby all day. users.map(&amp;amp;:email).reject(&amp;amp;:empty?).sort.uniq is pure functional pipelining. No mutation, no shared state, no surprise. You can also write deeply object-oriented Ruby: domain models, ActiveRecord, service objects, dependency injection. The two styles share the file. Sometimes they share the line.&lt;/p&gt;

&lt;p&gt;Lisp had this conversation first. The Common Lisp Object System is one of the most powerful OO systems ever shipped, and it sits inside a language people usually call functional. Scheme has objects when you want them; they’re closures with a dispatch table. The two paradigms have always been compatible. Hostility between them is a story we tell ourselves.&lt;/p&gt;
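
&lt;p&gt;The closures-with-a-dispatch-table idea translates to Ruby almost word for word. A sketch of the technique, not of any particular Scheme library; make_account and its message names are hypothetical.&lt;/p&gt;

```ruby
# make_account is a hypothetical constructor. The "object" is a lambda that
# closes over balance and looks each message up in a hash of lambdas.
def make_account(balance)
  dispatch = {
    deposit: ->(amount) { balance += amount },
    balance: -> { balance }
  }
  ->(message, *args) { dispatch.fetch(message).call(*args) }
end

account = make_account(100)
account.call(:deposit, 50) # => 150
account.call(:balance)     # => 150
```

&lt;p&gt;State lives in the captured variable and behavior lives in the table, which is roughly what an object with one instance variable and two methods gives you.&lt;/p&gt;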

&lt;p&gt;What matters is the main abstraction. Pick the one that fits the problem. If the domain is full of behaviors-with-state, lead with objects and use functional methods to operate on collections of them. If the domain is a pipeline of transformations, lead with functions and use objects to carry data through the pipeline. Ruby supports both, because Lisp and Smalltalk both supported both, and Ruby is the language Matz built by taking the best parts of each.&lt;/p&gt;

&lt;h3&gt;
  
  
  Same shapes, different paint
&lt;/h3&gt;

&lt;p&gt;The expressiveness people love about Ruby isn’t original to Ruby. It’s a careful selection from older languages, with Lisp as the largest single source. Knowing where the ideas came from makes them easier to use deliberately, and it makes the next language easier to learn, because the ideas show up again in Clojure, in Elixir, in Scheme, in OCaml. Same shapes, different paint.&lt;/p&gt;

</description>
      <category>functionalprogrammin</category>
      <category>ruby</category>
      <category>oop</category>
      <category>softwaredevelopment</category>
    </item>
  </channel>
</rss>
