<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: arun rajkumar</title>
    <description>The latest articles on DEV Community by arun rajkumar (@mickyarun).</description>
    <link>https://dev.to/mickyarun</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3835684%2F4771b603-8faa-42b1-9e0e-0687faea63a3.jpg</url>
      <title>DEV Community: arun rajkumar</title>
      <link>https://dev.to/mickyarun</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mickyarun"/>
    <language>en</language>
    <item>
      <title>We're Still Designing for Eyes. The Thing Reading Our Apps Now Doesn't Have Any.</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Fri, 03 Jul 2026 07:50:41 +0000</pubDate>
      <link>https://dev.to/mickyarun/were-still-designing-for-eyes-the-thing-reading-our-apps-now-doesnt-have-any-hnp</link>
      <guid>https://dev.to/mickyarun/were-still-designing-for-eyes-the-thing-reading-our-apps-now-doesnt-have-any-hnp</guid>
      <description>&lt;p&gt;I spent about an hour of WWDC watching Apple tell developers to make their apps more beautiful.&lt;/p&gt;

&lt;p&gt;Liquid Glass everywhere. A new transparency slider. Refined edges, better legibility. And this year you don't get to opt out. Recompile with Xcode 27 and your app adopts the new look whether you asked for it or not.&lt;/p&gt;

&lt;p&gt;Then, in more or less the same breath, Apple told us the look might not matter much longer.&lt;/p&gt;

&lt;p&gt;Because the other half of that keynote was App Intents. New entity and intent schemas that let your app push its content into Spotlight's semantic index, so Siri can find it and act on it through plain language. A View Annotations API so Siri can reach into what's on screen and do something with it. Foundation Models going open source. Xcode itself running coding agents from Anthropic, Google and OpenAI, wired in over MCP.&lt;/p&gt;

&lt;p&gt;Read those two stories next to each other and the message is hard to miss. Make the screen prettier. Also, assume something that can't see the screen is about to become your real user.&lt;/p&gt;

&lt;p&gt;Nobody at Apple stood up and said "the UI is dead." They rarely say the quiet part out loud. But it doesn't have to land in the App Review Guidelines as a hard rule to be true.&lt;/p&gt;

&lt;h2&gt;The front door moved&lt;/h2&gt;

&lt;p&gt;Here's what I keep coming back to as someone who ships product.&lt;/p&gt;

&lt;p&gt;For years the front door to software was a screen. You designed the screen, you argued over the screen, you A/B tested the button on the screen. The screen was the product.&lt;/p&gt;

&lt;p&gt;That's shifting under us. The front door is becoming an agent. Siri, Copilot, Claude, whatever your customer happens to be talking to. It reads on their behalf and makes the call on their behalf. Increasingly it acts on their behalf too. It does not care about your hero animation. It cares whether it can understand what you do, and then do it.&lt;/p&gt;

&lt;p&gt;I ran into this with our own product. We're a payments company. We built an MCP server for our platform so an agent can look up a payment or kick off a refund without anyone opening the dashboard. Building that changed how I think about the whole thing. In that flow, our carefully designed dashboard wasn't the product. The clean, machine-readable version of it was.&lt;/p&gt;

&lt;h2&gt;Then everyone rushes to llms.txt and gets it half wrong&lt;/h2&gt;

&lt;p&gt;The reflex, once this clicks, is to drop an &lt;code&gt;llms.txt&lt;/code&gt; on your site and call yourself future-proofed.&lt;/p&gt;

&lt;p&gt;I had that instinct too. And clean, structured text genuinely helps. Serving markdown instead of a wall of HTML can cut the tokens an agent burns reading your page by half or more, and some teams report reductions closer to 10x. Fewer tokens means the model is far more likely to finish your page instead of bailing halfway through. That part is real.&lt;/p&gt;

&lt;p&gt;But be honest about what &lt;code&gt;llms.txt&lt;/code&gt; is and isn't. After roughly eighteen months of noise, it's on about one in ten sites, and a big slice of that is Shopify quietly switching it on for every store by default. The bigger problem: the major crawlers from OpenAI, Google and Anthropic mostly don't fetch it. A study across 300,000 domains found it doesn't measurably move your AI citations. If you're adding it as an SEO cheat code, you'll be let down.&lt;/p&gt;

&lt;p&gt;Where it actually earns its place today is narrower and more useful. It works as a clean map for coding agents. Cursor, Claude Code, Copilot and the rest read it. Documentation sites get real mileage from it. That's the tell. The people getting value from agent-readable content treat it as plumbing, not marketing.&lt;/p&gt;

&lt;h2&gt;What I'm telling my team instead&lt;/h2&gt;

&lt;p&gt;So I'm not chasing &lt;code&gt;llms.txt&lt;/code&gt; for citations. I'm asking for something duller and, I think, more durable.&lt;/p&gt;

&lt;p&gt;Treat the machine-readable version of every important surface as a real output, not an afterthought. If a page or a screen matters, there should be a clean text representation an agent can consume without scraping your DOM and guessing.&lt;/p&gt;

&lt;p&gt;Put a short summary at the top of everything. Two or three lines, plain language, saying what this page is and what you can do here. A summary block helps a human skimming on their phone. It also happens to be the first thing a model reads before deciding whether the rest of your content is worth its context window. You write it once and both readers win.&lt;/p&gt;
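
&lt;p&gt;Here is a minimal TypeScript sketch of that idea. It assumes a hypothetical &lt;code&gt;PageModel&lt;/code&gt;, the data a screen is already rendered from. The &lt;code&gt;renderForAgents&lt;/code&gt; helper is illustrative and not a framework API. It emits a markdown twin of the page in this order: the title, then the short summary, then the content, then the verbs a reader can use. The sample page is made up.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
// Hypothetical shape of what a page already knows about itself.
interface PageModel {
  title: string;
  summary: string[]; // two or three plain-language lines
  sections: { heading: string; text: string }[];
  actions: string[]; // what a reader can do here, phrased as verbs
}

// Render the agent-facing markdown twin of a page. The summary sits right
// under the title, so a model can decide early whether to keep reading.
function renderForAgents(page: PageModel): string {
  const out: string[] = ["# " + page.title, ""];
  out.push("Summary: " + page.summary.join(" "), "");
  for (const section of page.sections) {
    out.push("## " + section.heading, section.text, "");
  }
  if (page.actions.length !== 0) {
    out.push("## What you can do here");
    for (const action of page.actions) {
      out.push("- " + action);
    }
  }
  return out.join("\n").trim() + "\n";
}

// Hypothetical sample page.
const paymentsPage: PageModel = {
  title: "Payments",
  summary: ["Every payment taken on your account.", "Look one up or start a refund."],
  sections: [{ heading: "Statuses", text: "A payment is pending, complete, or refunded." }],
  actions: ["Look up a payment by ID", "Refund a completed payment"],
};

console.log(renderForAgents(paymentsPage));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The design choice that matters is generating both outputs from one model. That way the text twin can't quietly fall behind the screen.&lt;/p&gt;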

&lt;p&gt;Expose actions, not just words. Content is turning into table stakes. What an agent actually wants is a verb. "Refund this." "Book that." "Show me last month." App Intents on Apple's side, MCP on ours, a boring documented API underneath. The teams that do well in the agent era won't be the ones with the prettiest content. They'll be the ones whose product can be operated with no human in the room.&lt;/p&gt;

&lt;p&gt;And measure the thing that now matters. Can an agent complete your core task, start to finish, without a screenshot? We've started treating that as a first-class test, the same way we treat "can a new engineer run the whole stack in five minutes." If the answer is no, no amount of Liquid Glass is going to save you.&lt;/p&gt;
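
&lt;p&gt;The last two points fit in one small sketch. Each action is a named verb with a description and a handler that takes plain JSON-style input. The &lt;code&gt;agentCanRefund&lt;/code&gt; function then runs a core task using nothing but those names. Everything here is hypothetical and transport-agnostic. In a real product the same registry might sit behind an MCP server or App Intents, but that wiring, and any real payments API, is deliberately left out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
// Hypothetical domain type and in-memory data standing in for a real backend.
type Payment = { id: string; amount: number; status: "complete" | "refunded" };

const payments: { [id: string]: Payment } = {
  pay_1: { id: "pay_1", amount: 2500, status: "complete" },
};

type Input = { [key: string]: unknown };

// An action is a verb an agent can call by name, with a description it can read.
interface Action {
  name: string;
  description: string;
  run(input: Input): unknown;
}

function findPayment(input: Input): Payment {
  const id = String(input.id);
  const payment = payments[id];
  if (!payment) throw new Error("No payment " + id);
  return payment;
}

const actions: Action[] = [
  { name: "get_payment", description: "Look up a payment by ID.", run: findPayment },
  {
    name: "refund_payment",
    description: "Refund a completed payment in full.",
    run(input) {
      const payment = findPayment(input);
      if (payment.status !== "complete") {
        throw new Error("Only complete payments can be refunded");
      }
      payment.status = "refunded";
      return payment;
    },
  },
];

function callAction(name: string, input: Input): unknown {
  const action = actions.find(function (a) { return a.name === name; });
  if (!action) throw new Error("Unknown action " + name);
  return action.run(input);
}

// The "no screenshot" test: complete the core task through actions only.
// No DOM, no pixels, just names and JSON-style inputs.
function agentCanRefund(paymentId: string): boolean {
  const before = callAction("get_payment", { id: paymentId }) as Payment;
  if (before.status !== "complete") return false;
  callAction("refund_payment", { id: paymentId });
  const after = callAction("get_payment", { id: paymentId }) as Payment;
  return after.status === "refunded";
}

console.log(agentCanRefund("pay_1")); // true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Suppose a check like this can only be written by scraping a page. Then the product fails the test described above.&lt;/p&gt;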

&lt;h2&gt;The UI got demoted, not buried&lt;/h2&gt;

&lt;p&gt;I don't think the UI is dead. I've got designers I'd go to bat for and screens I'm proud of. But it's been demoted. It used to be the product. Now it's one interface among several, and often the one your customer touches least.&lt;/p&gt;

&lt;p&gt;Apple just spent a keynote making the screen prettier and, at the same time, making it optional. That's not a contradiction. That's the shift, sitting in one room.&lt;/p&gt;

&lt;p&gt;So here's the question I keep putting to my team, and I'll put it to you. If the agent is the new user, what does your product look like to something that can't see?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>webdev</category>
      <category>startup</category>
    </item>
    <item>
      <title>I Deleted Our Confluence. The Code Is the Wiki Now.</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Mon, 29 Jun 2026 13:19:47 +0000</pubDate>
      <link>https://dev.to/mickyarun/i-deleted-our-confluence-the-code-is-the-wiki-now-ja9</link>
      <guid>https://dev.to/mickyarun/i-deleted-our-confluence-the-code-is-the-wiki-now-ja9</guid>
      <description>&lt;p&gt;A feature needed context. I opened the wiki.&lt;/p&gt;

&lt;p&gt;The page was six months old. It described an architecture we'd rebuilt twice since. It was wrong, top to bottom, and confident about it. Three engineers had already shipped decisions that week based on it.&lt;/p&gt;

&lt;p&gt;Nobody messed up. The page just got left behind by the code, the way they all do.&lt;/p&gt;

&lt;p&gt;So I deleted it. Then I deleted the habit that kept making it.&lt;/p&gt;

&lt;p&gt;This is the part of Bodhiorchard, the open-source dev workflow I've been building on my own, that I get the most pushback on. So let me actually make the case. Including where it doesn't hold.&lt;/p&gt;

&lt;h2&gt;Every wiki you own is lying to you&lt;/h2&gt;

&lt;p&gt;You just haven't caught it yet.&lt;/p&gt;

&lt;p&gt;It starts honest. Someone writes up how the payments service works, and that day it's perfect. Then the code moves. A field gets renamed. A flow reroutes. A service splits in two.&lt;/p&gt;

&lt;p&gt;Nobody updates the page. Not out of laziness. Updating the page is everyone's job, so it's nobody's.&lt;/p&gt;

&lt;p&gt;Six months later a new hire reads it, believes it, and ships a bug.&lt;/p&gt;

&lt;p&gt;By then the wiki isn't documentation. It's a museum of how things used to work.&lt;/p&gt;

&lt;p&gt;We've all quietly signed up for the same deal: pretend the wiki is current, then grep the code when you actually need an answer. The grep is the real documentation. The page is decoration.&lt;/p&gt;

&lt;h2&gt;The reason docs rot isn't discipline&lt;/h2&gt;

&lt;p&gt;It's that a wiki is a second copy of the truth.&lt;/p&gt;

&lt;p&gt;You write it by hand. You keep it next to the thing it describes. The two drift apart. Always. Two copies of one fact, updated by different people on different days, will diverge. That's not a team failing, it's just entropy.&lt;/p&gt;

&lt;p&gt;Code can't drift from itself. It's the thing that runs. When the doc and the code disagree, the code wins, because the code is what's live at 2am.&lt;/p&gt;

&lt;p&gt;So the whole thing rests on one line:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The code is the wiki.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not "we document well." Not "the docs sit near the code." The list of what your system does gets generated from the source, on its own, and never gets hand-written in the first place. You can't forget to update something that was never separate.&lt;/p&gt;

&lt;h2&gt;The baseline scan&lt;/h2&gt;

&lt;p&gt;Connect a repo and Bodhiorchard scans it. It doesn't ask you to describe anything. It reads the call graph, the module boundaries, the route handlers, the history, and pulls the live features straight out.&lt;/p&gt;

&lt;p&gt;My own run came back with this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Baseline scan complete.
  4 repositories indexed
  19 active features extracted
  312 code locations linked
  0 pages written by a human
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nineteen features. Each one traceable to the exact files that back it. I typed none of it, and nobody has to keep it current by hand.&lt;/p&gt;

&lt;p&gt;Two things make this more than a fancy grep.&lt;/p&gt;

&lt;p&gt;First, it indexes code locations, not just words. Every fact points back to a real file and symbol:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"feature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Idempotent refund webhook handling"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bank may resend the same refund webhook up to 3x; processed once."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"code_locations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"payments/src/webhooks/refund.handler.ts:24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"payments/src/guards/refundGuard.ts:8"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"shared/db/migrations/0041_refund_dedupe.sql"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"linked_repos"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"payments"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"shared"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"last_synced_commit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"9f2c1ab"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ask "how does refund idempotency work?" and the answer gets rebuilt from that. Not from a paragraph someone wrote at launch.&lt;/p&gt;
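
&lt;p&gt;Here is a rough sketch of answering a question straight from records shaped like the JSON above, which makes "rebuilt from that" concrete. The field names come from that record. The keyword-prefix matching is a deliberately crude stand-in for the semantic search the pipeline uses. The &lt;code&gt;answer&lt;/code&gt; function is a hypothetical helper, not Bodhiorchard's API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
// Same fields as the feature record above.
interface FeatureRecord {
  feature: string;
  summary: string;
  code_locations: string[];
  linked_repos: string[];
  last_synced_commit: string;
}

const records: FeatureRecord[] = [
  {
    feature: "Idempotent refund webhook handling",
    summary: "Bank may resend the same refund webhook up to 3x; processed once.",
    code_locations: [
      "payments/src/webhooks/refund.handler.ts:24",
      "payments/src/guards/refundGuard.ts:8",
      "shared/db/migrations/0041_refund_dedupe.sql",
    ],
    linked_repos: ["payments", "shared"],
    last_synced_commit: "9f2c1ab",
  },
];

const STOP_WORDS = ["how", "does", "do", "what", "is", "the", "a", "an", "to", "of", "work", "works"];

// Lowercase, drop stop words, keep a six-letter prefix so that
// "idempotency" and "idempotent" compare equal.
function stems(text: string): string[] {
  const words: string[] = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  return words
    .filter(function (w) { return !STOP_WORDS.includes(w); })
    .map(function (w) { return w.slice(0, 6); });
}

function answer(question: string): string {
  const asked = stems(question);
  const ranked = records
    .map(function (record) {
      const known = stems(record.feature + " " + record.summary);
      const score = asked.filter(function (s) { return known.includes(s); }).length;
      return { record: record, score: score };
    })
    .filter(function (r) { return r.score !== 0; })
    .sort(function (a, b) { return b.score - a.score; });
  if (ranked.length === 0) return "No feature matches; ask a human.";
  const hit = ranked[0].record;
  // The answer always carries the code it came from and the commit it reflects.
  return hit.feature + ": " + hit.summary +
    "\nSee: " + hit.code_locations.join(", ") +
    " (synced at " + hit.last_synced_commit + ")";
}

console.log(answer("How does refund idempotency work?"));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;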

&lt;p&gt;Second, it links across repos. A frontend call gets wired to the backend handler it actually hits. The thing wikis are worst at, "where does this logic continue," is the thing a code graph is best at.&lt;/p&gt;
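
&lt;p&gt;Here is a toy version of that cross-repo link, under loud assumptions. The scanner has already pulled HTTP calls out of a frontend repo and route patterns out of a backend repo. The file paths below are made up. Linking is just matching the method plus path segments, with &lt;code&gt;:param&lt;/code&gt; placeholders matching anything.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
// Hypothetical scanner output from two repos.
interface FrontendCall { file: string; method: string; path: string }
interface BackendRoute { file: string; method: string; pattern: string }

const calls: FrontendCall[] = [
  { file: "web/src/api/refunds.ts:12", method: "POST", path: "/payments/pay_123/refunds" },
  { file: "web/src/api/legacy.ts:40", method: "GET", path: "/v1/old-report" },
];

const routes: BackendRoute[] = [
  { file: "payments/src/refunds/refunds.controller.ts:18", method: "POST", pattern: "/payments/:id/refunds" },
  { file: "payments/src/payments/payments.controller.ts:9", method: "GET", pattern: "/payments/:id" },
];

// A pattern matches a path when the segment counts agree and every
// segment is either identical or a ":param" placeholder.
function matches(pattern: string, path: string): boolean {
  const want = pattern.split("/");
  const got = path.split("/");
  if (want.length !== got.length) return false;
  return want.every(function (seg, i) { return seg.startsWith(":") || seg === got[i]; });
}

// Link each frontend call to the backend handler it actually hits,
// and flag calls that hit nothing the scan knows about.
function linkAcrossRepos(allCalls: FrontendCall[], allRoutes: BackendRoute[]): string[] {
  return allCalls.map(function (call) {
    const hit = allRoutes.find(function (route) {
      if (route.method !== call.method) return false;
      return matches(route.pattern, call.path);
    });
    return hit
      ? call.file + " hits " + hit.file
      : call.file + " hits no known route (flag it)";
  });
}

console.log(linkAcrossRepos(calls, routes).join("\n"));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The unmatched call is as useful as the matched one. It is exactly the "where does this logic continue" question a wiki page can't answer.&lt;/p&gt;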

&lt;p&gt;And it doesn't go stale, because staleness is designed out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;on PR merge to main&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="s"&gt;-&amp;gt; re-scan the affected paths&lt;/span&gt;
  &lt;span class="s"&gt;-&amp;gt; update the feature's code_locations + commit history&lt;/span&gt;
  &lt;span class="s"&gt;-&amp;gt; re-embed for semantic search&lt;/span&gt;
  &lt;span class="s"&gt;-&amp;gt; flag anything the diff orphaned&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every merge updates the feature that changed. The next person, or agent, to touch it inherits today's truth. A daily job sweeps for drift, so it shows up as a flag instead of a production incident.&lt;/p&gt;
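
&lt;p&gt;The merge step in that pipeline boils down to a diff-to-features lookup. Here is a minimal sketch. It assumes feature records carry &lt;code&gt;code_locations&lt;/code&gt; like the JSON earlier, and that the merge hands over lists of changed and deleted file paths. &lt;code&gt;planRescan&lt;/code&gt; and the second sample feature are hypothetical. The function returns the features to re-scan and the locations the diff orphaned, and the flag is built from those orphaned locations.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
interface Feature {
  feature: string;
  code_locations: string[]; // "path" or "path:line", as in the record above
}

interface MergeDiff {
  changed: string[]; // files modified by the merge
  deleted: string[]; // files removed by the merge
}

// "payments/src/guards/refundGuard.ts:8" becomes "payments/src/guards/refundGuard.ts"
function fileOf(location: string): string {
  return location.replace(/:\d+$/, "");
}

function planRescan(features: Feature[], diff: MergeDiff) {
  const toRescan: string[] = [];
  const orphaned: string[] = [];
  for (const f of features) {
    let touched = false;
    for (const loc of f.code_locations) {
      const file = fileOf(loc);
      if (diff.deleted.includes(file)) {
        // The feature still points here, but the file is gone: flag it.
        orphaned.push(f.feature + " @ " + loc);
        touched = true;
      } else if (diff.changed.includes(file)) {
        touched = true;
      }
    }
    if (touched) toRescan.push(f.feature);
  }
  return { toRescan: toRescan, orphaned: orphaned };
}

const features: Feature[] = [
  {
    feature: "Idempotent refund webhook handling",
    code_locations: [
      "payments/src/webhooks/refund.handler.ts:24",
      "payments/src/guards/refundGuard.ts:8",
      "shared/db/migrations/0041_refund_dedupe.sql",
    ],
  },
  { feature: "Payout scheduling", code_locations: ["payouts/src/scheduler.ts:40"] }, // hypothetical
];

const plan = planRescan(features, {
  changed: ["payments/src/webhooks/refund.handler.ts"],
  deleted: ["payments/src/guards/refundGuard.ts"],
});
console.log(plan.toRescan);
console.log(plan.orphaned);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;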

&lt;p&gt;You don't even open a dashboard to read any of this. You ask in plain English in Slack, and the answer comes back with a link to the code it came from.&lt;/p&gt;

&lt;h2&gt;What it changes day to day&lt;/h2&gt;

&lt;p&gt;Onboarding stops starting from a lie. A new engineer asks what the system does and gets an answer built from what it does today, files attached. Not the version from two refactors ago.&lt;/p&gt;

&lt;p&gt;Status questions get answered from reality. "Are we on track for the P3 item, and what's the go-live date?" comes back from the live record, not a number someone typed into a board last Tuesday.&lt;/p&gt;

&lt;p&gt;And your AI agents stop reasoning from stale context. This is the one I underrated. An agent is only as good as the context you hand it. Feed it a six-month-old wiki page and it'll cheerfully build on an architecture that's already gone. You get a clean PR that's wrong in a way that costs you an afternoon to spot.&lt;/p&gt;

&lt;p&gt;That's why I stopped treating AI context files and human docs as two separate things. They're one thing with two readers. Generate them from one source, or watch them split apart the way the wiki did.&lt;/p&gt;

&lt;h2&gt;Where it stops&lt;/h2&gt;

&lt;p&gt;I'm not going to tell you this kills documentation. It doesn't. And where it stops matters.&lt;/p&gt;

&lt;p&gt;Auto-extraction is great at what exists and where. It's useless at why.&lt;/p&gt;

&lt;p&gt;A scan can tell you there's an idempotency guard on the refund webhook and point you at the line. It can't tell you the guard is there because one bank resends webhooks three times and double-refunded a customer back in March. That's intent. You earn it through pain, and no scan will ever infer it.&lt;/p&gt;

&lt;p&gt;So the why still needs a person. In Bodhiorchard it lives in the BUD: one markdown file per feature, holding the intent, the criteria, the decisions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# BUD-241 · Idempotent webhook handler for refunds&lt;/span&gt;

&lt;span class="gu"&gt;## Intent&lt;/span&gt;
Bank resends the same refund webhook up to 3x. We must process exactly once.

&lt;span class="gu"&gt;## Why this is hard (the part a scan can't infer)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; 2026-03 incident: duplicate webhook -&amp;gt; double refund. Don't regress this.
&lt;span class="p"&gt;-&lt;/span&gt; An already-refunded txn must be REJECTED, not silently retried.

&lt;span class="gu"&gt;## Acceptance criteria&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Duplicate webhook IDs are a no-op (return 200, no state change)
&lt;span class="p"&gt;-&lt;/span&gt; Illegal transition complete -&amp;gt; pending is impossible
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
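
&lt;p&gt;Those acceptance criteria are specific enough to sketch. The TypeScript below is a minimal, in-memory illustration, not the handler in &lt;code&gt;refund.handler.ts&lt;/code&gt;. The state names and status codes are assumptions, as is the choice to reject refunds of still-pending transactions. Here is what the sketch guarantees:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A duplicate webhook ID returns 200 without touching state.&lt;/li&gt;
&lt;li&gt;A new webhook for an already-refunded transaction is rejected.&lt;/li&gt;
&lt;li&gt;A transition table makes &lt;code&gt;complete&lt;/code&gt; to &lt;code&gt;pending&lt;/code&gt; impossible.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
type TxnState = "pending" | "complete" | "refunded";

// Allowed transitions. Anything not listed, such as complete to pending, is illegal.
const ALLOWED: { [from: string]: TxnState[] } = {
  pending: ["complete"],
  complete: ["refunded"],
  refunded: [],
};

function transition(from: TxnState, to: TxnState): TxnState {
  if (!ALLOWED[from].includes(to)) {
    throw new Error("Illegal transition " + from + " to " + to);
  }
  return to;
}

interface RefundWebhook {
  webhookId: string;
  txnId: string;
}

// In-memory stand-ins. In production both would be database rows, with the
// webhook ID recorded in the same transaction as the state change (a unique
// constraint on it is a common way to make duplicates impossible).
const processed: { [webhookId: string]: boolean } = {};
const txns: { [txnId: string]: TxnState } = { txn_1: "complete", txn_2: "pending" };

// Returns the HTTP status to send back to the bank.
function handleRefundWebhook(hook: RefundWebhook): number {
  // Duplicate delivery: acknowledge with 200 and change nothing.
  if (processed[hook.webhookId]) return 200;

  const current = txns[hook.txnId];
  if (current === undefined) return 404;

  // A new webhook for an already-refunded transaction is rejected, not retried.
  if (current === "refunded") return 409;

  try {
    txns[hook.txnId] = transition(current, "refunded");
  } catch {
    return 409; // e.g. a refund for a transaction that is still pending
  }
  processed[hook.webhookId] = true;
  return 200;
}

console.log(handleRefundWebhook({ webhookId: "wh_1", txnId: "txn_1" })); // 200, txn_1 is now refunded
console.log(handleRefundWebhook({ webhookId: "wh_1", txnId: "txn_1" })); // 200, duplicate: no state change
console.log(handleRefundWebhook({ webhookId: "wh_2", txnId: "txn_1" })); // 409, already refunded
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;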



&lt;p&gt;The BUD is written by a human. But it's versioned, it's searchable, and it travels with the code as context for every agent. So even the human part stays alive instead of rotting in a tool nobody opens.&lt;/p&gt;

&lt;p&gt;That split is the point. Machines maintain what rots fastest. People own what machines can't. The busywork dies, the judgment stays.&lt;/p&gt;

&lt;h2&gt;Try it, and tell me where it breaks&lt;/h2&gt;

&lt;p&gt;Bodhiorchard is Apache 2.0, self-hosted, and runs on a Mac mini in the corner of my room. Your repos, embeddings, and audit log never leave your hardware. I built it on my own time. I felt this pain at the regulated fintech where I run engineering, but the fintech doesn't own the code.&lt;/p&gt;

&lt;p&gt;Repo, with six demo videos and four sample repos to point it at: &lt;a href="https://github.com/mickyarun/bodhiorchard" rel="noopener noreferrer"&gt;https://github.com/mickyarun/bodhiorchard&lt;/a&gt;&lt;br&gt;
Full methodology: &lt;a href="https://bodhiorchard.ai/" rel="noopener noreferrer"&gt;https://bodhiorchard.ai/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm not after stars. I'm stuck on one question.&lt;/p&gt;

&lt;p&gt;Does "the code is the wiki" survive contact with your team? Or does the why sprawl back across five tools no matter how good the extraction gets? And if it does, where does it leak first?&lt;/p&gt;

&lt;p&gt;I spent fifteen years maintaining wikis that were wrong by Friday. I finally stopped. If you've killed a doc tool and lived to tell it, what broke first: the tool, or your team's trust in it?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Arun, CTO and co-founder of Atoa, a UK open banking payments platform, and the solo author of Bodhiorchard. I write about what building with AI is actually like, not what the conference slides say. Find me on &lt;a href="https://x.com/mickyarun" rel="noopener noreferrer"&gt;X @mickyarun&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>We Keep Our Architecture Rules in the Repo. The AI and the New Hire Read the Same File.</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Tue, 23 Jun 2026 13:23:04 +0000</pubDate>
      <link>https://dev.to/mickyarun/we-keep-our-architecture-rules-in-the-repo-the-ai-and-the-new-hire-read-the-same-file-48me</link>
      <guid>https://dev.to/mickyarun/we-keep-our-architecture-rules-in-the-repo-the-ai-and-the-new-hire-read-the-same-file-48me</guid>
      <description>&lt;p&gt;A few weeks ago I watched an agent build a feature beautifully.&lt;/p&gt;

&lt;p&gt;Clean code. Tests passed. Did exactly what I asked.&lt;/p&gt;

&lt;p&gt;Three sessions later, I opened the same service and didn't recognise it. Nothing was &lt;em&gt;wrong&lt;/em&gt;, exactly. Every individual decision was reasonable. Stacked together, they'd quietly walked the codebase somewhere I would never have designed it.&lt;/p&gt;

&lt;p&gt;That's when it clicked. The problem wasn't the model. The model was great. The problem was that every session started from zero — no memory of the boundaries I'd been protecting, no idea which patterns were load-bearing, no clue about the trade-off I made three weeks ago for a reason I never wrote down.&lt;/p&gt;

&lt;p&gt;I'd seen this exact failure before. Just with humans.&lt;/p&gt;

&lt;h2&gt;The new hire who never read the docs&lt;/h2&gt;

&lt;p&gt;You know the version. Someone joins. They're sharp. They ship something in week one that works — and breaks an unwritten rule nobody told them about. Not their fault. The rule lived in my head, or in a Slack thread from last March, or in the muscle memory of whoever's been here longest.&lt;/p&gt;

&lt;p&gt;So we'd explain it. Then explain it again to the next person. The "why" never made it anywhere durable. It just got re-explained, badly, on demand.&lt;/p&gt;

&lt;p&gt;An AI session is a new hire who shows up brilliant, fast, and with total amnesia. Every single time. Re-explaining the codebase to a fresh chat window every morning is exhausting, and honestly I'd forget half of it under pressure anyway.&lt;/p&gt;

&lt;p&gt;Same problem. I'd just been solving it twice.&lt;/p&gt;

&lt;h2&gt;One source of truth, two audiences&lt;/h2&gt;

&lt;p&gt;Here's the shift that fixed it for us: the context an AI agent needs to not wreck your codebase is &lt;em&gt;the same context&lt;/em&gt; a new engineer needs on day one.&lt;/p&gt;

&lt;p&gt;Not similar. The same.&lt;/p&gt;

&lt;p&gt;Which patterns are deliberate. Where the boundaries are and why crossing them costs you. What "done" means here. Which decisions are settled and which are still up for debate. A human needs that to contribute without breaking things. An agent needs that to contribute without breaking things.&lt;/p&gt;

&lt;p&gt;So we stopped keeping it in our heads and started keeping it in the repo. Plain markdown. Committed. Versioned with the code it describes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;service-payments/
├── CLAUDE.md          # how this service works, and why
├── src/
├── test/
└── ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A root &lt;code&gt;CLAUDE.md&lt;/code&gt; (or &lt;code&gt;AGENTS.md&lt;/code&gt;; pick the convention your tools read) carries the project-wide principles. Each service that has real rules of its own gets its own file, next to its code. When an agent opens that service, it loads the file automatically. When a human opens that service, the same file is sitting right there, written so a person can actually read it.&lt;/p&gt;

&lt;p&gt;One file. Two readers. Nothing to keep in sync, because there's only one of it.&lt;/p&gt;
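
&lt;p&gt;Here's a small sketch of that layering, written as a hypothetical helper. Given the file being edited and a list of rule files that exist, it returns the ones in scope, root first. The sample paths are made up. Claude Code and other tools each have their own documented loading rules, so treat this as an illustration of the convention, not a description of any tool's behaviour.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
// Which file names count as rule files is an assumption; match your tools.
const RULE_FILE_NAMES = ["CLAUDE.md", "AGENTS.md"];

// Return the rule files that apply to filePath, from the repo root down to
// the file's own directory. "existing" stands in for a filesystem check.
function ruleFilesFor(filePath: string, existing: string[]): string[] {
  const dirs = filePath.split("/").slice(0, -1); // drop the file name
  // "", "service-payments/", "service-payments/src/", ...
  const prefixes = [""].concat(
    dirs.map(function (_, i) { return dirs.slice(0, i + 1).join("/") + "/"; }),
  );
  const found: string[] = [];
  for (const prefix of prefixes) {
    for (const name of RULE_FILE_NAMES) {
      if (existing.includes(prefix + name)) found.push(prefix + name);
    }
  }
  return found;
}

const repoFiles = ["CLAUDE.md", "service-payments/CLAUDE.md", "service-ledger/CLAUDE.md"];
console.log(ruleFilesFor("service-payments/src/refunds/refund.service.ts", repoFiles));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Root first, then narrower scopes. A service file can refine the project-wide rules without repeating them.&lt;/p&gt;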

&lt;h2&gt;What actually goes in the file&lt;/h2&gt;

&lt;p&gt;This is where most teams get it wrong. They dump the obvious in there — "we use TypeScript," "run the tests before you push" — and the file becomes noise everyone scrolls past.&lt;/p&gt;

&lt;p&gt;The rule I use: &lt;strong&gt;a line earns its place only if leaving it out would let a plausible, reasonable-looking mistake sail through review.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That filter kills most of what you'd be tempted to write. What survives is the good stuff:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Boundaries&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Services talk over the message bus, never direct HTTP to each other.
  If you need another service's data synchronously, that's a design
  smell — raise it, don't route around it.

&lt;span class="gu"&gt;## Validation&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Every inbound payload is parsed by a schema at the edge. No raw
  request bodies past the controller. A missing field fails loud on
  the way in, not three layers deep at runtime.

&lt;span class="gu"&gt;## What "done" means here&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; A feature isn't done when it works. It's done when the next person
  can tell what it does without asking you.

&lt;span class="gu"&gt;## Settled vs open&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; SETTLED: how we do idempotency. Don't reinvent it; copy the pattern.
&lt;span class="p"&gt;-&lt;/span&gt; OPEN: our caching story. If you touch it, expect a conversation.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice none of those are syntax. They're &lt;em&gt;judgment&lt;/em&gt;. The settled-vs-open split alone has saved me hours, because it tells both the agent and the human where to copy an existing pattern versus where to stop and ask a person.&lt;/p&gt;

&lt;p&gt;The "why" matters as much as the rule. "Don't call services directly" is an order. "Don't call services directly, because the moment you do you've created a hidden dependency that nobody can see until it breaks at 2am" is something a human will actually remember and an agent will actually respect.&lt;/p&gt;
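
&lt;p&gt;The validation rule is the easiest one in that file to show in code. Here is a minimal sketch. It uses no schema library so it stays self-contained. In a NestJS service, a validation pipe or a schema library would usually do this job. The &lt;code&gt;RefundRequest&lt;/code&gt; fields are hypothetical. The controller gets either a typed value or a loud error naming the bad field, never a raw body.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
// Hypothetical request shape for a refund endpoint.
interface RefundRequest {
  paymentId: string;
  amountMinor: number; // amount in minor units, e.g. pence
  reason?: string;
}

class ValidationError extends Error {}

// Turn an untrusted body into a typed value at the edge, or throw with the
// name of the field that is wrong. Nothing past this point sees the raw body.
function parseRefundRequest(body: unknown): RefundRequest {
  if (typeof body !== "object" || body === null) {
    throw new ValidationError("body must be a JSON object");
  }
  const raw = body as { [key: string]: unknown };
  const paymentId = raw.paymentId;
  const amountMinor = raw.amountMinor;
  const reason = raw.reason;

  if (typeof paymentId !== "string" || paymentId.trim() === "") {
    throw new ValidationError("paymentId is required");
  }
  if (typeof amountMinor !== "number" || !Number.isInteger(amountMinor) || Math.sign(amountMinor) !== 1) {
    throw new ValidationError("amountMinor must be a positive integer");
  }
  if (!(reason === undefined || typeof reason === "string")) {
    throw new ValidationError("reason must be a string when present");
  }
  return { paymentId, amountMinor, reason };
}

const ok = parseRefundRequest({ paymentId: "pay_1", amountMinor: 2500 });
console.log(ok.paymentId, ok.amountMinor); // pay_1 2500

try {
  parseRefundRequest({ amountMinor: 2500 }); // no paymentId
} catch (err) {
  console.log((err as Error).message); // paymentId is required
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;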

&lt;h2&gt;The honest part&lt;/h2&gt;

&lt;p&gt;This isn't free, and it isn't a silver bullet.&lt;/p&gt;

&lt;p&gt;The files rot if you let them. A rule that's no longer true is worse than no rule — it teaches the wrong thing to two audiences at once. So they get reviewed like code, because they &lt;em&gt;are&lt;/em&gt; code now. When a settled decision changes, the file changes in the same PR. If that feels like overhead, it's the overhead you were already paying in repeated explanations — just made visible.&lt;/p&gt;

&lt;p&gt;And no, I didn't stop using AI tools while waiting for the perfect setup. The point isn't to control the agent. It's to make sure my long-term thinking is still present in a codebase that's increasingly being written by someone — something — that wasn't in the room when the decisions got made.&lt;/p&gt;

&lt;h2&gt;When one file stops being enough&lt;/h2&gt;

&lt;p&gt;A markdown file works until it doesn't. One service, a handful of rules — &lt;code&gt;CLAUDE.md&lt;/code&gt; is perfect. But spread it across a dozen services and the &lt;em&gt;tending&lt;/em&gt; becomes the whole job. The rules drift from the code they describe. The "why" behind a feature ends up split across a spec, a ticket, a PR description, and a conversation nobody can find six weeks later. The flat file can't keep up, and a stale rule teaches the wrong thing to both readers at once.&lt;/p&gt;

&lt;p&gt;That rot is what pushed me to build &lt;a href="https://bodhiorchard.ai" rel="noopener noreferrer"&gt;Bodhiorchard&lt;/a&gt; — an open-source, self-hosted project I've been working on independently (Apache 2.0, runs on Claude Code). Same idea as the file, taken further. Instead of one feature's knowledge scattered across tickets, everything for that feature lives in a single living document: the spec, the tech spec, the test plan, the acceptance criteria, the full history — tied to the actual code. And the knowledge layer stays current by syncing from the code and those docs automatically, so it's semantically searchable by a human and fed straight into every agent's prompt.&lt;/p&gt;

&lt;p&gt;That's the part I care about most. Not a wiki you have to remember to update. A wiki that's current &lt;em&gt;because&lt;/em&gt; it's wired into where the work already happens. Confluence goes stale the day you write it. This doesn't, because nobody's job is to keep it alive by hand.&lt;/p&gt;

&lt;p&gt;The principle never changed. One source of truth, two audiences. I just got tired of being the sync engine.&lt;/p&gt;

&lt;h2&gt;What changed&lt;/h2&gt;

&lt;p&gt;Less re-explaining. That's the boring, real win.&lt;/p&gt;

&lt;p&gt;New engineers read the same file the agent reads, and it turns out the document you write for a machine that takes everything literally is a &lt;em&gt;really&lt;/em&gt; good onboarding doc. No assumed context. No "you'll pick it up." Just the actual shape of the thing.&lt;/p&gt;

&lt;p&gt;And the drift slowed down. Not because the agent got smarter — because it finally knew which walls were load-bearing.&lt;/p&gt;

&lt;p&gt;I used to think the codebase lived in the code. It doesn't. Half of it lives in the decisions &lt;em&gt;around&lt;/em&gt; the code — and for years I kept that half in my head, where exactly one reader could access it.&lt;/p&gt;

&lt;p&gt;Now it's in the repo. Where everyone can. Human or not.&lt;/p&gt;

&lt;p&gt;If you're letting AI write a meaningful share of your code: where do the rules of your codebase actually live right now? And who can read them?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nestjs</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Same Prompt, Four AI Tools, One Cricket Banner: ChatGPT Won the Image, Grok Won the Video, and Claude Built a Website Again</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Tue, 16 Jun 2026 12:30:00 +0000</pubDate>
      <link>https://dev.to/mickyarun/same-prompt-four-ai-tools-one-cricket-banner-chatgpt-won-the-image-grok-won-the-video-and-1gba</link>
      <guid>https://dev.to/mickyarun/same-prompt-four-ai-tools-one-cricket-banner-chatgpt-won-the-image-grok-won-the-video-and-1gba</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — A few weeks ago I tested four AI tools on a &lt;em&gt;build&lt;/em&gt; job: a website for my son's cricket academy. This time the job had nothing to do with code. The coach just wanted a banner he could post. Same four tools, totally different result. ChatGPT made the best image, Grok made the best video, Gemini wouldn't make anything, and Claude tried to solve a graphics problem by writing HTML.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you read &lt;a href="https://dev.to/mickyarun/i-asked-three-coding-agents-to-build-my-sons-cricket-coach-a-website-the-result-wasnt-decided-by-3fam"&gt;the last post&lt;/a&gt;, you've met my son's cricket coach. He runs MMCA — Maverick Master's Cricket Academy. Started in 2020, based in Bengaluru, genuinely good with the kids.&lt;/p&gt;

&lt;p&gt;The website is live now and parents have started messaging him on WhatsApp. So last weekend he came back with the next thing he needed, which is the thing every small academy actually runs on:&lt;/p&gt;

&lt;p&gt;"Can you make me a weekend batch banner? Something I can post in the parent groups."&lt;/p&gt;

&lt;p&gt;Now, this is a completely different job from the last one. That first experiment was design and development — agents writing real code, running tests, deploying to Cloudflare. This one is just graphics. No repo, no deploy, nobody reviewing a pull request. Just: here's my logo, here's a sample I like, make me something I'd be happy to send out.&lt;/p&gt;

&lt;p&gt;So I figured I'd run the same four tools again and see what happened. Same brief, same logo, everything on the &lt;strong&gt;default model with no special settings&lt;/strong&gt;: ChatGPT, Claude, Gemini, Grok.&lt;/p&gt;

&lt;p&gt;Here's roughly what I typed, the way a normal client would brief you:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Similar to this banner, make one for MMCA Academy (since 2020, logo attached). Weekend batch Sat 4:30—7, Sun 7—9:30pm. Add a small phrase like the sample. Be creative, keep it simple, but don't copy the sample exactly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The whole test really came down to one instruction: be creative, but don't copy. Whatever each tool did with that told me everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 1: the static banner
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ChatGPT&lt;/strong&gt; got it on the first go. "WEEKEND BATCH. TRAIN. PLAY. GROW." Logo top-left, the "Since 2020" bit kept, timings in clean little cards, an enrol number, three badges across the bottom for coaching, skill, discipline. It clearly understood this was a flyer a coach would hand out at a school gate, and that's exactly what it gave me.&lt;/p&gt;

&lt;p&gt;Then I asked for one more version — smaller logo, add a human hero this time — and it came back with "DREAM. PRACTICE. PERFORM." and a photoreal batsman walking out under stadium lights. Looked like a film poster. Two prompts, two banners I'd genuinely use, no arguing with it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fssnseem4i6oz3cwdo7xi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fssnseem4i6oz3cwdo7xi.png" alt=" " width="800" height="1428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhal3bsgsxxtsxqkd3ezz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhal3bsgsxxtsxqkd3ezz.png" alt=" " width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude&lt;/strong&gt; is the one that made me laugh. I asked for a banner. It told me it would "create the MMCA banner as an HTML file you can download," ran six commands, and gave me a dark navy web page: "TRAIN HARDER. Play Smarter. Win Together," with Saturday/Sunday cards and a Register button. To be fair, it looked nice, with the same eye for design that won me over on the website build.&lt;/p&gt;

&lt;p&gt;But it's not a banner. It's a landing section. You can't drop a &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; into a WhatsApp group and call it a poster.&lt;/p&gt;

&lt;p&gt;That's the part worth sitting with. The tool I actually shipped the website with, the one with the best taste when the medium is code, defaulted straight back to code the moment I asked for graphics. It's wired to build. Ask it to build something and it's brilliant. Ask it to draw something and it quietly turns your design job back into an engineering one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyw29dpksvoegs0s4cmfk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyw29dpksvoegs0s4cmfk.png" alt=" " width="800" height="1265"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Grok&lt;/strong&gt; at least made an actual image, which already put it ahead of half the field. Problem was it threw everything at the wall — three overlapping player photos, mismatched fonts, text everywhere, faded cricketers bleeding through the background. The exact opposite of "keep it simple." It knew what a banner was. It just didn't know when to stop.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1ndew9k9ghtozmnjomv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1ndew9k9ghtozmnjomv.png" alt=" " width="800" height="985"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini&lt;/strong&gt; gave me three attempts and three different ways of saying it couldn't quite generate that. The image guardrails kept tripping over what was, I'll remind you, a children's cricket flyer. I gave up after the third try. For a test that's purely about making graphics, a tool that won't make graphics doesn't really place.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcvd5nuj4hgk7qi3jw35.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcvd5nuj4hgk7qi3jw35.png" alt=" " width="800" height="889"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Round 1 goes to ChatGPT,&lt;/strong&gt; and it's not close. Two prompts, two finished banners, on brand, logo respected, the restraint Grok didn't have and an actual image Gemini wouldn't produce.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 2: make it move
&lt;/h2&gt;

&lt;p&gt;Static was only half of what I was curious about. The real 2026 question is whether I can turn a flat banner into a short clip with a voiceover. So I took the good banner and gave two tools a single line: "make this banner live."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Grok&lt;/strong&gt; nailed it. One prompt, six seconds. The banner came alive — the batsman moving, the paint-splash colours animating in, the timings resolving on screen — and over the top, a clean Indian-accented voiceover reading the academy out. "Where practice meets purpose." Honestly it looked like something an agency would charge ₹15,000 for.&lt;/p&gt;

&lt;p&gt;One caveat, and it's the CTO in me talking: check the details. The phone number that showed up in the video wasn't the one I'd put in the source banner. Motion tools will happily rewrite text you wanted left alone, so you proof it before it goes out. Gorgeous result, but you don't post it blind.&lt;/p&gt;


&lt;div&gt;
    &lt;iframe src="https://www.youtube.com/embed/5qiRJGbioDk"&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Gemini&lt;/strong&gt;, the same tool that wouldn't make me a still image, decided it &lt;em&gt;would&lt;/em&gt; make me a video. Ten seconds of abstract paint-splash motion, a "WEEKEND BATCH" title card, a voiceover — but disconnected from the actual banner, and the on-screen text kept breaking apart and reflowing into gibberish. The idea was there, the execution wasn't. A trailer for a banner that never got made.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/Camj5qn0CpU"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Round 2 goes to Grok.&lt;/strong&gt; For motion plus voice off a single prompt, nothing else came close.&lt;/p&gt;

&lt;h2&gt;
  
  
  The scorecard
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Static banner&lt;/th&gt;
&lt;th&gt;Live video&lt;/th&gt;
&lt;th&gt;Read the brief?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT&lt;/td&gt;
&lt;td&gt;Won it in 2 prompts&lt;/td&gt;
&lt;td&gt;Not tested&lt;/td&gt;
&lt;td&gt;Yes, with restraint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grok&lt;/td&gt;
&lt;td&gt;Cluttered, no restraint&lt;/td&gt;
&lt;td&gt;Won it, voice + motion&lt;/td&gt;
&lt;td&gt;Half — strong on motion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;No output, guardrails&lt;/td&gt;
&lt;td&gt;Out of context, broken text&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;Built an HTML page&lt;/td&gt;
&lt;td&gt;Not tested&lt;/td&gt;
&lt;td&gt;Wrong medium, nice taste&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Image goes to ChatGPT. Video goes to Grok.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed from the coding test, and what didn't
&lt;/h2&gt;

&lt;p&gt;Last time, on a development job, Claude was the tool I actually wanted to keep working with. The taste that made its website the nicest in that lineup is the same taste that made its HTML "banner" pleasant to look at here. None of that's a knock — it was just answering a question I hadn't asked. Point a code-native model at a design brief and it reaches for the thing it's best at.&lt;/p&gt;

&lt;p&gt;Switch the job from build to draw and the whole ranking flips. A few things I'm taking away:&lt;/p&gt;

&lt;p&gt;The best coding agent isn't automatically the best graphics agent. Obvious once you say it out loud, easy to forget when your whole team has standardised on one tool. The model that shipped my website couldn't make me a poster.&lt;/p&gt;

&lt;p&gt;Guardrails are a product decision you feel as a user. Gemini turned down a kids' cricket flyer three times and then made a broken video of it. Whatever the safety reasoning, what I experienced was a tool that wouldn't do the simplest creative thing I needed.&lt;/p&gt;

&lt;p&gt;And "done" beats "impressive." ChatGPT's banners weren't the flashiest pixels I've ever seen. They were finished, and I could post them in two prompts. Grok's video was the flashiest thing in the whole test and still needed me to catch a wrong phone number. For graphics, I'll take the tool that hands me something I can ship today over the one that wows me in the demo.&lt;/p&gt;

&lt;p&gt;So I'm not picking a winner overall. I'm picking one per medium. ChatGPT makes the still, Grok makes it move. That's the actual workflow now — not one model to rule them all, just the right one for the thing in front of you.&lt;/p&gt;

&lt;p&gt;The coach got his banner. Two of them, really, plus a video. Cost me a handful of prompts and one careful read-through.&lt;/p&gt;

&lt;p&gt;Last time the lesson was that taste decided a coding job. Turns out taste decides a graphics job too. It just lives in different tools.&lt;/p&gt;

&lt;p&gt;Which one are you reaching for when the job is graphics and not code? And has your favourite coding agent ever quietly tried to turn a design task back into a dev task on you? Curious to hear it.&lt;/p&gt;

&lt;p&gt;#AI #DesignTools #BuildInPublic #GenAI #Startup&lt;/p&gt;

</description>
      <category>ai</category>
      <category>designtools</category>
      <category>genai</category>
      <category>startup</category>
    </item>
    <item>
      <title>Open Banking vs Card Rails: Latency, Cost, and Developer Experience</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Wed, 10 Jun 2026 08:08:10 +0000</pubDate>
      <link>https://dev.to/mickyarun/open-banking-vs-card-rails-latency-cost-and-developer-experience-2knh</link>
      <guid>https://dev.to/mickyarun/open-banking-vs-card-rails-latency-cost-and-developer-experience-2knh</guid>
      <description>&lt;p&gt;I've integrated both. Cards and open banking. In production. Moving real money for real UK merchants.&lt;/p&gt;

&lt;p&gt;So when developers ask me "which is actually better to build on?" I don't give them the marketing answer. I give them the three numbers that decide it: how much it costs, how fast the money moves, and how much of your life you lose to the integration.&lt;/p&gt;

&lt;p&gt;Let me walk through all three. Honestly. Including where cards still win.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Cost — and why it's not close
&lt;/h2&gt;

&lt;p&gt;Here's the part nobody at the card networks wants in a headline.&lt;/p&gt;

&lt;p&gt;A UK card payment costs you somewhere between &lt;strong&gt;1.5% and 3%+&lt;/strong&gt; per transaction once you stack interchange, scheme fees, and your processor's margin. The "0.2% debit interchange cap" everyone quotes is the floor of the floor — it's not what lands on your statement.&lt;/p&gt;

&lt;p&gt;An open banking payment costs roughly &lt;strong&gt;0.1%–1.0%, or a flat 20p–50p&lt;/strong&gt;. No interchange. No scheme fee. Because there's no scheme. The customer authorises the payment inside their own banking app and the bank moves the money directly.&lt;/p&gt;

&lt;p&gt;The concrete version: a local garage takes £500 for a repair. A 1.5% card fee costs them £7.50. The same payment over open banking can cost around 10p. (&lt;a href="https://noda.live/articles/open-banking-costs-uk" rel="noopener noreferrer"&gt;Noda&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;That's not a rounding difference. That's the difference between a payments line item you tolerate and one you forget exists.&lt;/p&gt;
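&lt;p&gt;To make the arithmetic concrete, here's a minimal sketch that prices the same payment under both models. The rates are assumptions taken from the low end of the ranges above (1.5% for cards, 0.1% or a flat 20p for open banking), not any provider's actual tariff, and the function names are illustrative. Amounts stay in integer pence, the same minor-units convention the API example further down uses, so you never round a float of pounds.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Illustrative rates only: real fees depend on your processor, card mix and contract.
const CARD_BPS = 150;      // 1.5% expressed in basis points
const OB_BPS = 10;         // 0.1% expressed in basis points
const OB_FLAT_PENCE = 20;  // 20p flat fee

// Percentage fee on an integer amount in pence, rounded to the nearest penny.
function percentFee(amountPence, bps) {
  return Math.round((amountPence * bps) / 10000);
}

// Price one payment under each assumed model, all results in pence.
function compareFees(amountPence) {
  return {
    card: percentFee(amountPence, CARD_BPS),
    openBankingPercent: percentFee(amountPence, OB_BPS),
    openBankingFlat: OB_FLAT_PENCE,
  };
}

const garage = compareFees(50000); // the £500 repair, in pence
console.log(garage); // { card: 750, openBankingPercent: 50, openBankingFlat: 20 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The card figure reproduces the £7.50 above. At these assumed rates the open banking side comes out at 20p to 50p; the roughly 10p figure implies a cheaper tariff still. Either way, the percentage fee scales with the basket and a flat fee doesn't.&lt;/p&gt;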

&lt;p&gt;And there's a second-order cost cards carry that nobody puts in the pricing table: &lt;strong&gt;chargebacks&lt;/strong&gt;. £20 a pop, plus the engineering time to fight them, plus the fraud surface you have to defend. Open banking payments are bank-authenticated at source. There's no card number to steal and no "I didn't authorise this" dispute when the customer tapped approve in their own banking app. The fraud surface is smaller, so the price &lt;em&gt;can&lt;/em&gt; be lower. The two facts are connected.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Latency — settlement, not the spinner
&lt;/h2&gt;

&lt;p&gt;This is where developers get the comparison wrong, so let me split it cleanly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authorisation latency&lt;/strong&gt;, the spinner the user stares at, is comparable. Both flows take a few seconds. A card auth round-trips the network; an open banking payment redirects the user to their bank for SCA (Strong Customer Authentication) and back. From the user's chair, both feel like "tap, wait, done."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Settlement latency&lt;/strong&gt; is where they diverge violently.&lt;/p&gt;

&lt;p&gt;A card payment authorises instantly but &lt;em&gt;settles&lt;/em&gt; in ~2 business days. The money is promised, then it sits in limbo, then it arrives — minus fees, and reversible for months.&lt;/p&gt;

&lt;p&gt;An open banking payment runs over &lt;strong&gt;Faster Payments&lt;/strong&gt;. Settlement is near-instant — seconds to minutes — straight bank-to-bank, 24/7. There's no two-day float, no "pending payout" dashboard, no reconciling Tuesday's sales against Thursday's deposit. (&lt;a href="https://payop.com/business/the-role-of-open-banking-in-enabling-faster-payments-and-real-time-settlement/" rel="noopener noreferrer"&gt;Payop&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;If you've ever written reconciliation code, you already feel why this matters. Half the complexity in payments tooling exists to model the gap between "authorised" and "settled." Close that gap to near-zero and a whole category of state machine — pending, settling, settled, partially-reversed — collapses into one event: paid.&lt;/p&gt;
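&lt;p&gt;Here's a rough sketch of what that collapse looks like in code: two hypothetical transition tables, one for a card payment's lifecycle and one for an open banking payment, plus a guard that rejects illegal transitions. The state names are illustrative, not any processor's or provider's real enum, and some open banking providers may still expose an intermediate pending state while the bank confirms.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical card lifecycle: "authorised" is a promise, not money.
const CARD_TRANSITIONS = {
  authorised: ['captured', 'voided'],
  captured: ['settling'],
  settling: ['settled'],
  settled: ['partially_reversed', 'charged_back'],
  partially_reversed: ['charged_back'],
  voided: [],
  charged_back: [],
};

// Hypothetical open banking lifecycle: the customer approves in their bank,
// Faster Payments moves the money, and the payment is either paid or failed.
const OPEN_BANKING_TRANSITIONS = {
  created: ['paid', 'failed'],
  paid: [],
  failed: [],
};

// Reject transitions the model doesn't allow, e.g. an out-of-order webhook.
function canTransition(model, from, to) {
  const next = model[from];
  if (!next) throw new Error('Unknown state: ' + from);
  return next.includes(to);
}

console.log(Object.keys(CARD_TRANSITIONS).length);         // 7
console.log(Object.keys(OPEN_BANKING_TRANSITIONS).length); // 3
console.log(canTransition(CARD_TRANSITIONS, 'authorised', 'settled'));   // false
console.log(canTransition(OPEN_BANKING_TRANSITIONS, 'created', 'paid')); // true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Refunds on the open banking side are typically a separate outbound payment rather than a state on the original one, which is part of why its table stays so small.&lt;/p&gt;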

&lt;h2&gt;
  
  
  3. Developer experience — where I'll be honest both ways
&lt;/h2&gt;

&lt;p&gt;Let me give cards their due first.&lt;/p&gt;

&lt;p&gt;Card SDKs are mature. Stripe's docs are art. The card flow is a solved, copy-paste problem with twenty years of Stack Overflow behind it. If you're doing global, card-first commerce, that maturity is worth real money. I'd still reach for cards there.&lt;/p&gt;

&lt;p&gt;Open banking is younger, and the early ecosystem was genuinely painful — you were integrating against dozens of bank APIs, each with its own quirks, its own auth dance, its own downtime. That's the part that earned open banking its "hard to build on" reputation a few years ago.&lt;/p&gt;

&lt;p&gt;But that reputation is now outdated, and here's why: the bank-by-bank mess is exactly what a good PISP (payment initiation service provider) abstracts away. You don't integrate 40 banks. You integrate &lt;strong&gt;one API&lt;/strong&gt; that speaks Payment Initiation, handles SCA, manages the consent lifecycle, and fans out to Faster Payments for you.&lt;/p&gt;

&lt;p&gt;In practice, the flow is short:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 1. Create a payment — you describe intent, not card mechanics&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;payment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;atoa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;processPayment&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4999&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;// £49.99 in minor units&lt;/span&gt;
  &lt;span class="na"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GBP&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;reference&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;order_10472&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;redirectUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://yourapp.com/return&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// 2. Send the customer to their bank to authorise (SCA happens here)&lt;/span&gt;
&lt;span class="nf"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;authorisationUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 3. The bank moves the money over Faster Payments.&lt;/span&gt;
&lt;span class="c1"&gt;//    You get told when it's actually settled — not "authorised, check back Thursday."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Webhook: the event you actually care about is real, not a promise&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhooks/atoa&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;              &lt;span class="c1"&gt;// verify signature&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;payment.completed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;fulfilOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reference&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;    &lt;span class="c1"&gt;// money is already in the bank&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what's &lt;em&gt;missing&lt;/em&gt; from that code. No card object. No PCI scope to inherit. No tokenisation vault to secure. No &lt;code&gt;requires_capture&lt;/code&gt; → &lt;code&gt;capture&lt;/code&gt; two-step. No chargeback webhook to handle. You describe a payment, the customer approves it in their bank, the money arrives, you fulfil. The thing you're modelling is the thing that actually happens.&lt;/p&gt;

&lt;p&gt;That's the DX argument in one sentence: &lt;strong&gt;open banking lets you write code that matches reality instead of code that models a 1970s settlement delay.&lt;/strong&gt;&lt;/p&gt;
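&lt;p&gt;One thing the webhook snippet glosses over is what &lt;code&gt;verify(req)&lt;/code&gt; and "fulfil once" actually involve. Below is a minimal sketch using Node's built-in &lt;code&gt;crypto&lt;/code&gt; module. It assumes the provider signs the raw request body with HMAC-SHA256 and sends a hex digest; that scheme, the secret, and the helper names are assumptions for illustration, so check your PISP's documentation for the real header and algorithm. It also guards against the same event arriving twice, since webhook senders generally retry.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const crypto = require('crypto');

// Assumed scheme: hex-encoded HMAC-SHA256 of the raw body with a shared secret.
// Verify against the raw bytes, not re-serialised JSON, or the digest won't match.
function verifySignature(rawBody, signatureHex, secret) {
  if (typeof signatureHex !== 'string') return false;
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws if lengths differ, so check that first.
  if (received.length !== expected.length) return false;
  return crypto.timingSafeEqual(received, expected);
}

// Hypothetical in-memory idempotency guard. In production this belongs in your
// database, e.g. a unique constraint on the payment reference.
const fulfilled = new Set();

function handlePaymentEvent(event, fulfilOrder) {
  if (event.type !== 'payment.completed') return 'ignored';
  const ref = event.data.reference;
  if (fulfilled.has(ref)) return 'duplicate';
  fulfilOrder(ref);
  // Record only after fulfilment succeeds, so a failed attempt can be retried.
  fulfilled.add(ref);
  return 'fulfilled';
}

// Usage with a made-up secret and payload.
const secret = 'test_secret_only';
const rawBody = JSON.stringify({ type: 'payment.completed', data: { reference: 'order_10472' } });
const signature = crypto.createHmac('sha256', secret).update(rawBody).digest('hex');

if (verifySignature(rawBody, signature, secret)) {
  const event = JSON.parse(rawBody);
  console.log(handlePaymentEvent(event, (ref) =&amp;gt; console.log('fulfilling', ref))); // fulfilling order_10472, then fulfilled
  console.log(handlePaymentEvent(event, () =&amp;gt; {})); // duplicate
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The order is the design choice worth copying: verify the signature, check whether this reference was already handled, fulfil, then record it. Acknowledge with a 200 only once that sequence finishes; if anything throws, the provider's retry typically gets another attempt, and the recorded reference stops a successful event being fulfilled twice.&lt;/p&gt;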

&lt;h2&gt;
  
  
  The honest scorecard
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Card rails&lt;/th&gt;
&lt;th&gt;Open banking&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost per txn&lt;/td&gt;
&lt;td&gt;1.5%–3%+&lt;/td&gt;
&lt;td&gt;0.1%–1%, or 20p–50p flat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Settlement&lt;/td&gt;
&lt;td&gt;~2 business days&lt;/td&gt;
&lt;td&gt;Near-instant (Faster Payments)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chargebacks&lt;/td&gt;
&lt;td&gt;Yes, £20+ each&lt;/td&gt;
&lt;td&gt;Structurally absent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PCI scope&lt;/td&gt;
&lt;td&gt;Yours to carry&lt;/td&gt;
&lt;td&gt;Not your problem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SDK maturity&lt;/td&gt;
&lt;td&gt;Excellent, 20 yrs&lt;/td&gt;
&lt;td&gt;Younger, but abstracted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Global, card-first&lt;/td&gt;
&lt;td&gt;UK consumers, instant settlement&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cards aren't dead. If your customers are international and card-native, build on cards. I mean that.&lt;/p&gt;

&lt;p&gt;But if you're a UK SaaS, marketplace, or merchant tool charging UK consumers — you're paying card prices and eating card latency for an experience your users don't need. The numbers back the switch, and they're moving in one direction: UK open banking payments are up &lt;strong&gt;53% year on year&lt;/strong&gt;, with nearly 1 in 3 adults already using it. (&lt;a href="https://www.openbanking.org.uk/news/open-banking-surges-to-15-million-uk-users-as-july-marks-record-adoption/" rel="noopener noreferrer"&gt;Open Banking Ltd&lt;/a&gt;)&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;The fastest way to feel the difference is to build it. Atoa's sandbox gives you a real Payment Initiation API, real Faster Payments settlement, and webhooks that fire when money actually moves — not when it's promised. &lt;a href="https://docs.atoa.me" rel="noopener noreferrer"&gt;docs.atoa.me&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Spin up a test payment. Watch it settle in seconds instead of days. Then look at what you'd have paid in card fees for the same transaction.&lt;/p&gt;

&lt;p&gt;That second number is the one that changes your mind.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you shipped both card and open banking flows in production? I want to hear where the DX actually broke down for you — the messy bits, not the brochure version.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;#OpenBanking #Payments #Fintech #API #BuildInPublic&lt;/p&gt;

</description>
      <category>openbanking</category>
      <category>payments</category>
      <category>fintech</category>
      <category>api</category>
    </item>
    <item>
      <title>I Replaced Scrum, Jira, and Our Wiki With 12 AI Agents on a Mac Mini</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Mon, 08 Jun 2026 16:33:49 +0000</pubDate>
      <link>https://dev.to/mickyarun/i-replaced-scrum-jira-and-our-wiki-with-12-ai-agents-on-a-mac-mini-o7o</link>
      <guid>https://dev.to/mickyarun/i-replaced-scrum-jira-and-our-wiki-with-12-ai-agents-on-a-mac-mini-o7o</guid>
      <description>&lt;p&gt;A survey last week put it at 54%. More than half the code shipped today is AI-generated.&lt;/p&gt;

&lt;p&gt;In my own work the number is probably higher. AI writes the first draft. AI estimates the work. AI generates the tests. I've written before about &lt;a href="https://bodhiorchard.ai/" rel="noopener noreferrer"&gt;the dangerous 20%&lt;/a&gt; — the edge cases, the illegal state transitions, the judgment AI quietly skips. That 20% is why I still need senior engineers.&lt;/p&gt;

&lt;p&gt;But there's a second 20% problem nobody talks about. Not in the code. Around it.&lt;/p&gt;

&lt;p&gt;Sprints. Story points. Standups. Jira boards no one updates. Confluence pages that went stale the day they were written. Every one of those tools assumes a human does the work and another human tracks the work.&lt;/p&gt;

&lt;p&gt;That's not my team anymore.&lt;/p&gt;

&lt;p&gt;So I stopped bending decades-old process around an AI-native team. I built my own way of working and open-sourced it. It runs on a Mac mini in the corner of my room. This is what's inside.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvi1z23hgj1eqg5uuabxo.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvi1z23hgj1eqg5uuabxo.jpg" alt=" " width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Your whole org as a grove. Each repo is a tree, each feature a branch, each teammate present in the world. More on this below — but yes, that's the actual dashboard.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The thing that finally broke me: the wiki
&lt;/h2&gt;

&lt;p&gt;Here's the moment it clicked.&lt;/p&gt;

&lt;p&gt;A new feature needed context. I opened our wiki. The page was six months old. It described an architecture we'd refactored twice since. The "source of truth" was confidently, completely wrong — and three engineers had made decisions based on it that week.&lt;/p&gt;

&lt;p&gt;Documentation lies the moment you stop maintaining it. And nobody maintains it, because maintaining it is the busywork we all silently agree to skip.&lt;/p&gt;

&lt;p&gt;Source code doesn't lie. It can't. It's the thing that actually runs.&lt;/p&gt;

&lt;p&gt;So the first rule of the system I built: &lt;strong&gt;the code is the wiki.&lt;/strong&gt; Knowledge is extracted from the repository — the call graph, the module boundaries, the patterns, the history — and indexed continuously. When an agent or a human asks "how does settlement work?", the answer is reconstructed from what's true &lt;em&gt;right now&lt;/em&gt;, not from a page someone wrote last quarter and abandoned.&lt;/p&gt;

&lt;p&gt;No Confluence. No Notion graveyard. The only document that's allowed to be authoritative is the one that compiles.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fprdkldeibebkbp5iwzjw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fprdkldeibebkbp5iwzjw.jpg" alt=" " width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Nobody wrote this wiki. A baseline scan read the repositories and produced it — 19 live features across 4 repos, each one traceable to the code that backs it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And you don't even open the dashboard to read it. Ask in Slack, in plain English ("are we progressing on the P3 backlog item? what's the go-live date?"), and a bot answers from the live BUD, the single record the system keeps per feature, with status, assignee, target date and a link back to the source. Not a number someone typed into a board last Tuesday. The thing that's actually true, right now.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Finmclkcrllcudohyp0be.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Finmclkcrllcudohyp0be.jpg" alt=" " width="800" height="1031"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The same emoji-react, thread-reply Slack you already live in — except the answers come from the source of truth, not from memory.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So "the code is the wiki" isn't a slogan — it's an architecture. Knowledge lives in four layers that stay in sync on their own:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The repos themselves&lt;/strong&gt; — source code plus a per-repo &lt;code&gt;CLAUDE.md&lt;/code&gt;, synced on every PR merge to main.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent skills&lt;/strong&gt; — org standards, design guidelines, API patterns; synced on change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The central store&lt;/strong&gt; — BUDs, enterprise rules, architecture decisions; real-time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector search&lt;/strong&gt; — semantic search across all of it, auto-indexed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Two things make this more than a fancy &lt;code&gt;grep&lt;/code&gt;. First, it indexes &lt;strong&gt;code locations&lt;/strong&gt; and links them &lt;strong&gt;across repos&lt;/strong&gt;: any knowledge captured during development points back to the exact file and symbol it came from, and a frontend call is connected to the backend handler it actually hits instead of being left as two disconnected facts in two different wikis. Second, it never goes stale: after every PR merge, the affected feature is automatically updated with the new commit history and code locations, so the next agent that touches it inherits the &lt;em&gt;current&lt;/em&gt; truth, not last month's.&lt;/p&gt;
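&lt;p&gt;To make the cross-repo link less abstract, here's a small sketch of the idea. It assumes an indexer has already extracted frontend call sites and backend route handlers, each with its file and symbol; the repo names, paths and record shapes are hypothetical, and a real index would build them from parsed source rather than hand-written arrays. Matching on HTTP method plus an Express-style route pattern is one simple way to connect the two sides, not necessarily how the system in this post does it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical index entries. A real indexer would extract these from parsed source.
const backendHandlers = [
  { repo: 'payments-api', file: 'src/routes/payments.ts', symbol: 'getPayment', method: 'GET', route: '/payments/:id' },
  { repo: 'payments-api', file: 'src/routes/payments.ts', symbol: 'createPayment', method: 'POST', route: '/payments' },
];

const frontendCalls = [
  { repo: 'merchant-web', file: 'src/api/payments.js', symbol: 'fetchPayment', method: 'GET', path: '/payments/pay_123' },
  { repo: 'merchant-web', file: 'src/api/refunds.js', symbol: 'fetchRefund', method: 'GET', path: '/refunds/ref_9' },
];

// Match a concrete path against a route where ':name' segments are parameters.
function routeMatches(route, path) {
  const r = route.split('/');
  const p = path.split('/');
  if (r.length !== p.length) return false;
  return r.every((seg, i) =&amp;gt; seg.startsWith(':') || seg === p[i]);
}

// A code location as repo:file#symbol.
const loc = (entry) =&amp;gt; `${entry.repo}:${entry.file}#${entry.symbol}`;

// Link each frontend call to the backend handler it hits; unmatched calls get null.
function linkAcrossRepos(calls, handlers) {
  return calls.map((call) =&amp;gt; {
    const handler = handlers
      .filter((h) =&amp;gt; h.method === call.method)
      .find((h) =&amp;gt; routeMatches(h.route, call.path));
    return { from: loc(call), to: handler ? loc(handler) : null };
  });
}

console.log(linkAcrossRepos(frontendCalls, backendHandlers));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The first call links to &lt;code&gt;getPayment&lt;/code&gt; in the backend repo. The second comes back unmatched, which is itself useful knowledge: either the index is missing a handler or the frontend is calling an endpoint that doesn't exist.&lt;/p&gt;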

&lt;p&gt;That's the whole pitch against Confluence — auto-synced from source instead of hand-maintained, semantically searchable instead of keyword-matched, always current with daily staleness detection, and wired straight into the agents' prompts so they're never reasoning from a stale page.&lt;/p&gt;
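&lt;p&gt;Staleness detection can be as simple as comparing what the store last recorded against what git says now. In this sketch, each feature keeps the commit it last synced for every file it points at, and anything whose file has moved on (or disappeared) gets flagged for re-sync. The record shape and commit hashes are made up; in a real setup the "latest" map might come from running &lt;code&gt;git log -1 --format=%H -- path/to/file&lt;/code&gt; for each tracked file.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical feature records: the commit each code location was last synced at.
const features = [
  { id: 'settlement', locations: [{ file: 'src/settlement.ts', commit: 'a1b2c3d' }] },
  { id: 'refunds', locations: [{ file: 'src/refunds.ts', commit: 'd4e5f6a' }] },
  { id: 'payouts', locations: [{ file: 'src/payouts.ts', commit: '0c9b8a7' }] },
];

// Latest commit touching each file right now. src/payouts.ts has been deleted.
const latestCommitByFile = {
  'src/settlement.ts': 'a1b2c3d',
  'src/refunds.ts': '9f8e7d6',
};

// A feature is stale if any file it references changed or no longer exists.
function findStaleFeatures(features, latest) {
  return features
    .filter((f) =&amp;gt; f.locations.some((l) =&amp;gt; latest[l.file] !== l.commit))
    .map((f) =&amp;gt; f.id);
}

console.log(findStaleFeatures(features, latestCommitByFile)); // [ 'refunds', 'payouts' ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;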




&lt;h2&gt;
  
  
  Agent-Driven Development, in one table
&lt;/h2&gt;

&lt;p&gt;I call the methodology Agent-Driven Development (ADD). The simplest way to explain it is to put it next to the thing it replaces.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agile ceremony&lt;/th&gt;
&lt;th&gt;What it assumed&lt;/th&gt;
&lt;th&gt;Agent-Driven Development&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sprint planning&lt;/td&gt;
&lt;td&gt;Humans do all the work, so plan their hours&lt;/td&gt;
&lt;td&gt;Agents draft; humans decide what's worth building&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Story points / planning poker&lt;/td&gt;
&lt;td&gt;Gut-feel proxy for time&lt;/td&gt;
&lt;td&gt;AI-PERT + Monte Carlo → real P50/P70/P85 &lt;strong&gt;dates&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jira tickets&lt;/td&gt;
&lt;td&gt;Work scattered across a board&lt;/td&gt;
&lt;td&gt;One &lt;strong&gt;BUD&lt;/strong&gt; per feature: spec + tech plan + tests + history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Confluence / wiki&lt;/td&gt;
&lt;td&gt;Someone keeps docs current (nobody does)&lt;/td&gt;
&lt;td&gt;Knowledge syncs from the source code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Daily standup&lt;/td&gt;
&lt;td&gt;Humans report status out loud&lt;/td&gt;
&lt;td&gt;A Status Agent reads the PRs and tells you what moved&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrospective&lt;/td&gt;
&lt;td&gt;A meeting you forget by Friday&lt;/td&gt;
&lt;td&gt;A Learning Agent mines the actual diffs and incidents&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern underneath all six rows is the same: &lt;strong&gt;let the machines handle the noise, so humans spend their judgment where judgment actually matters.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The 12 agents
&lt;/h2&gt;

&lt;p&gt;Here's the whole cycle on one diagram before I break it down — twelve agents around a loop, with a human reviewing at the centre and at every gate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frshmr16yk8wnd5wm50gm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frshmr16yk8wnd5wm50gm.jpg" alt=" " width="800" height="614"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Chat Intake (Triage) → BUD → Design → Tech Architecture (Tech Lead reviews; Smart Assignment picks the dev) → Development (AI + Human) → Test Generation → Testing (QA) → UAT &amp;amp; Deploy (Status) → Feature → Learning &amp;amp; Skills. An external bug reopens the feature. The loop never pretends it's a straight line.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;ADD runs a feature from a chat message to production through a chain of specialised agents. Each owns one phase. A human reviews and decides at every gate — this is human-in-the-loop by design, not lights-out automation.&lt;/p&gt;

&lt;p&gt;It starts in Slack. You drop a request; the &lt;strong&gt;Intake agent&lt;/strong&gt; doesn't just file it — it checks for existing features and BUDs so you don't build a duplicate, then asks the questions a good PM would: who is this for, why now, what's the timeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3tweovntidxq7zzu2wf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3tweovntidxq7zzu2wf.jpg" alt=" " width="800" height="621"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Change the notification icon to modern design?" → the agent checks for duplicates, then interrogates the intent before a single line is written.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;From there, every feature moves through the same seven-phase lifecycle, each phase a tab on its BUD:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Slack idea → Intake → Requirements → Design → Tech Spec
   → Development → Code Review → Testing → Prod
        ↑ estimation, status, learning and skills run alongside ↑
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3pb4wyc64lt9ch3isj1n.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3pb4wyc64lt9ch3isj1n.jpg" alt=" " width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Every phase can run on an agent — or you flip it off and drive it yourself from your local AI via MCP. "Stage agents are off, you're driving this BUD" is a real toggle, per phase, per assignee. That's what human-in-the-loop actually looks like.&lt;/em&gt;&lt;/p&gt;
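&lt;p&gt;One way to model the per-phase toggle is as a small piece of state on each BUD. In this minimal sketch, &lt;code&gt;Phase&lt;/code&gt; mirrors the seven phases in the lifecycle diagram. &lt;code&gt;BudPhaseConfig&lt;/code&gt; and its methods are hypothetical names, not the product's API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from enum import Enum


class Phase(Enum):
    # The seven lifecycle phases from the diagram, in order.
    REQUIREMENTS = "requirements"
    DESIGN = "design"
    TECH_SPEC = "tech_spec"
    DEVELOPMENT = "development"
    CODE_REVIEW = "code_review"
    TESTING = "testing"
    PROD = "prod"


class BudPhaseConfig:
    def __init__(self, assignee: str):
        self.assignee = assignee
        # Agents are on by default for every phase of this BUD.
        self.agent_enabled = {phase: True for phase in Phase}

    def drive_manually(self, phase: Phase) -> None:
        # "Stage agents are off, you're driving this BUD" for one phase.
        self.agent_enabled[phase] = False

    def who_drives(self, phase: Phase) -> str:
        if self.agent_enabled[phase]:
            return f"{phase.value}-agent"
        return f"{self.assignee} (local AI via MCP)"

    def next_phase(self, phase: Phase):
        # Phases advance in declaration order; PROD is terminal.
        if phase is Phase.PROD:
            return None
        order = list(Phase)
        return order[order.index(phase) + 1]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;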

&lt;p&gt;Around that spine sit the agents that kill the ceremonies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Estimation&lt;/strong&gt; — AI-PERT + Monte Carlo instead of story points (below).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Status&lt;/strong&gt; — reads the PRs so you never run another standup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning&lt;/strong&gt; — mines the real diffs and incidents when a BUD closes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills&lt;/strong&gt; — profiles who's strong at what from git history, and feeds it back into estimation and routing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agents do the busywork. You do the deciding. That division is the whole philosophy.&lt;/p&gt;




&lt;h2&gt;
  
  
  The standup reads the work, not the people
&lt;/h2&gt;

&lt;p&gt;I haven't run a status standup in months. The Status Agent does it at 08:30 on a cron, but the interesting part is &lt;em&gt;where it reads from&lt;/em&gt;. It doesn't ask anyone "what did you do yesterday." It reads what actually happened.&lt;/p&gt;

&lt;p&gt;Hooks and an MCP server in each dev's local setup post the real signal back to the BUD: the prompts, the commits, the sessions. A TODO gets auto-claimed when work starts on it and auto-marked done when the agent finishes the code — so the board reflects reality without anyone updating it. The agent then aggregates the git, PR, bug and chat activity into a summary with risk flags on anything lagging.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyb8wuyjxldlffnofdi85.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyb8wuyjxldlffnofdi85.jpg" alt=" " width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjfqenbi7dao1rdvuxo2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjfqenbi7dao1rdvuxo2.jpg" alt=" " width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Four file-level TODOs, all ticked by the work itself. PR #50 merged, 4 commits, 2 files, 5 sessions, 0 errors — captured from hooks, not typed into a board. The status is a side effect of building, not a separate chore.&lt;/em&gt;&lt;/p&gt;
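&lt;p&gt;Auto-claiming and auto-completing TODOs amounts to a small state machine driven by hook events. This minimal sketch uses hypothetical event names (&lt;code&gt;session_started&lt;/code&gt;, &lt;code&gt;code_finished&lt;/code&gt;). It also assumes a simple risk rule: flag anything claimed but unfinished for more than two days. The real hooks, payloads and thresholds may differ.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Todo:
    path: str
    status: str = "open"  # open -> in_progress -> done
    claimed_by: Optional[str] = None
    claimed_at: Optional[datetime] = None


def apply_hook_event(todo: Todo, event: str, dev: str, at: datetime) -> None:
    # Hypothetical event names posted by a dev's local hooks or MCP server.
    if event == "session_started" and todo.status == "open":
        todo.status, todo.claimed_by, todo.claimed_at = "in_progress", dev, at
    elif event == "code_finished" and todo.status == "in_progress":
        todo.status = "done"


def risk_flags(todos: list[Todo], now: datetime, max_days: int = 2) -> list[str]:
    # Flag TODOs that were claimed but are still unfinished after max_days.
    return [
        t.path
        for t in todos
        if t.status == "in_progress" and now - t.claimed_at > timedelta(days=max_days)
    ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;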

&lt;p&gt;And because the Design Agent generates wireframes from your project's &lt;strong&gt;design system extracted out of the code&lt;/strong&gt; — the real CSS tokens, not a guess — what it produces is on-brand by construction. Same with the tech spec: it's written against your actual architecture and tokens, so "follows the brand guidelines" stops being a review comment and becomes the default.&lt;/p&gt;




&lt;h2&gt;
  
  
  The quality loop that reassigns itself
&lt;/h2&gt;

&lt;p&gt;This is the part I'm proudest of, because it's where most teams quietly accumulate debt.&lt;/p&gt;

&lt;p&gt;The Test Plan Agent auto-generates the test plan from the BUD's acceptance criteria and the code — Playwright e2e, unit and integration, security, and the &lt;strong&gt;manual&lt;/strong&gt; UAT cases a human still has to sign off. An MCP token wires your QA automation repo in, so test commits flow straight back to the BUD.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F85p31iol8hr6j0dzvw6u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F85p31iol8hr6j0dzvw6u.jpg" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dgrs9v452zspng225w7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dgrs9v452zspng225w7.jpg" alt=" " width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;24 test cases for one small feature — and notice the manual ones marked "neither can ship as silent regressions, require human sign-off." The agent writes the tests; it doesn't get to wave them through.&lt;/em&gt;&lt;/p&gt;
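&lt;p&gt;The rule that the agent "doesn't get to wave them through" can be written as a release gate. This minimal sketch uses a hypothetical &lt;code&gt;PlannedTest&lt;/code&gt; record. Every case must pass, and manual UAT cases also need a named human sign-off.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass
from typing import Optional


@dataclass
class PlannedTest:
    name: str
    kind: str  # "e2e", "unit", "integration", "security" or "manual"
    passed: bool = False
    signed_off_by: Optional[str] = None  # only meaningful for manual UAT cases


def release_blockers(cases: list[PlannedTest]) -> list[str]:
    # Names of cases that still block the release.
    blockers = []
    for case in cases:
        if not case.passed:
            blockers.append(case.name)
        elif case.kind == "manual" and not case.signed_off_by:
            # A manual case that "passed" still needs a human's name on it.
            blockers.append(case.name)
    return blockers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;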

&lt;p&gt;Code review is auto-triggered against your org's rules and submitted back on the PR. And here's the loop that closes itself: testing has a &lt;strong&gt;bug threshold&lt;/strong&gt; — complexity × a configurable multiplier. Cross it, and the work auto-reassigns. The original developer moves to bug review, QA rotates to the next waiting BUD, and each bug is auto-classified as a &lt;em&gt;missed feature&lt;/em&gt; versus a &lt;em&gt;development bug&lt;/em&gt; so it takes the right fix path. Quality debt doesn't pile up quietly, because the system reacts to it before a human notices.&lt;/p&gt;
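&lt;p&gt;The threshold itself is simple arithmetic. This minimal sketch assumes complexity is a small integer and uses a default multiplier of 1.5; the post only says the multiplier is configurable. The classification rule is also an assumption: a bug filed against an acceptance criterion that was never implemented counts as a missed feature.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def react_to_bugs(
    bugs: dict[str, str],
    complexity: int,
    implemented: set[str],
    multiplier: float = 1.5,
) -> dict:
    """bugs maps a bug id to the acceptance criterion it was filed against."""
    threshold = complexity * multiplier  # complexity x configurable multiplier
    classified = {
        # Assumption: a bug against a criterion that was never implemented is a
        # missed feature; anything else is a development bug.
        bug_id: "development_bug" if criterion in implemented else "missed_feature"
        for bug_id, criterion in bugs.items()
    }
    return {
        # Crossing the threshold moves the developer to bug review and
        # rotates QA to the next waiting BUD.
        "reassign": len(bugs) > threshold,
        "threshold": threshold,
        "classified": classified,
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;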




&lt;h2&gt;
  
  
  The BUD: one document instead of three tools
&lt;/h2&gt;

&lt;p&gt;Every feature lives in a single markdown document called a &lt;strong&gt;BUD&lt;/strong&gt; — Business Understanding Document. Spec, technical spec, test plan, and decision history, all in one place, vector-indexed so any agent can pull it as context.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# BUD-241 · Idempotent webhook handler for refunds&lt;/span&gt;

&lt;span class="gu"&gt;## Intent&lt;/span&gt;
Bank sends the same refund webhook up to 3x. We must process once.

&lt;span class="gu"&gt;## Acceptance criteria&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Duplicate webhook IDs are a no-op (return 200, no state change)
&lt;span class="p"&gt;-&lt;/span&gt; A refund on an already-refunded txn is rejected, not retried
&lt;span class="p"&gt;-&lt;/span&gt; Illegal transition complete → pending is impossible

&lt;span class="gu"&gt;## Tech plan&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Dedup key: (provider, webhook_id) unique in Postgres
&lt;span class="p"&gt;-&lt;/span&gt; Reuse shared &lt;span class="sb"&gt;`refundGuard`&lt;/span&gt; util — do NOT reinvent

&lt;span class="gu"&gt;## History&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; 2026-06-05 design approved (human gate)
&lt;span class="p"&gt;-&lt;/span&gt; 2026-06-05 estimation: P70 = 2 days
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole feature. No ticket in Jira, no spec in Confluence, no test plan in a Google Doc that nobody opens. One file. It travels with the code, and it's the context every agent reads before it touches anything.&lt;/p&gt;
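&lt;p&gt;To give a feel for what the agents might build from BUD-241's tech plan, here is a minimal sketch that uses Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; as a stand-in for Postgres. The table layout, status names and &lt;code&gt;handle_refund_webhook&lt;/code&gt; are hypothetical, and the shared &lt;code&gt;refundGuard&lt;/code&gt; util is not reproduced. The composite primary key turns a duplicate delivery into a no-op. An explicit transition table rejects anything not listed, including complete → pending.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3

# Allowed status transitions; anything not listed (e.g. complete -> pending) is illegal.
ALLOWED = {
    "pending": {"complete"},
    "complete": {"refunded"},
    "refunded": set(),
}


def can_transition(current: str, target: str) -> bool:
    return target in ALLOWED.get(current, set())


def setup(conn: sqlite3.Connection) -> None:
    # Composite primary key = the BUD's (provider, webhook_id) dedup key.
    conn.execute(
        "CREATE TABLE webhooks (provider TEXT, webhook_id TEXT,"
        " PRIMARY KEY (provider, webhook_id))"
    )
    conn.execute("CREATE TABLE txns (id TEXT PRIMARY KEY, status TEXT)")


def handle_refund_webhook(conn: sqlite3.Connection, provider: str,
                          webhook_id: str, txn_id: str) -> int:
    with conn:  # dedup insert and status change commit (or roll back) together
        try:
            conn.execute("INSERT INTO webhooks VALUES (?, ?)", (provider, webhook_id))
        except sqlite3.IntegrityError:
            return 200  # duplicate delivery: no-op, no state change
        (status,) = conn.execute(
            "SELECT status FROM txns WHERE id = ?", (txn_id,)
        ).fetchone()
        if not can_transition(status, "refunded"):
            return 409  # e.g. already refunded: reject, don't retry
        conn.execute("UPDATE txns SET status = 'refunded' WHERE id = ?", (txn_id,))
    return 200
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In Postgres, the same dedup would typically be a unique constraint plus &lt;code&gt;INSERT ... ON CONFLICT DO NOTHING&lt;/code&gt;, followed by a check of whether a row was actually inserted.&lt;/p&gt;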




&lt;h2&gt;
  
  
  Killing story points with statistics
&lt;/h2&gt;

&lt;p&gt;Story points always bothered me. They're a proxy for time that we then pretend isn't a proxy for time, and they don't compose across a team where one person knows a module cold and another has never opened it.&lt;/p&gt;

&lt;p&gt;ADD replaces them with AI-PERT plus a Monte Carlo simulation.&lt;/p&gt;

&lt;p&gt;For each phase the model generates optimistic / likely / pessimistic estimates — classic PERT — but weighted by a per-developer, per-module &lt;strong&gt;skill score&lt;/strong&gt; (0–1.0, derived from git and BUD history), current load, and backlog depth. Then 10,000 simulated runs turn that distribution into dates with confidence intervals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="kd"&gt;Feature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Idempotent refund webhooks
  P50  →  Jun 9   (50% chance done by)
  P70  →  Jun 10  (70% chance done by)
  P85  →  Jun 12  (85% chance done by)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"85% confident by the 12th" is the shape a stakeholder actually wants. It's also honest in a way "8 points" never was — it shows you the uncertainty instead of hiding it inside a fake integer.&lt;/p&gt;

&lt;p&gt;Where do those skill scores come from? Git history. The system reads who has actually shipped what, per module, and builds a profile — expertise you can see instead of guess at.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzfztbj1g43qsv4iz0b9u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzfztbj1g43qsv4iz0b9u.jpg" alt=" " width="800" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Five developers, eighteen modules, scored from real commits. This is what feeds estimation and routing — not a manager's hunch about who "knows the auth code."&lt;/em&gt;&lt;/p&gt;
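&lt;p&gt;A crude per-module skill score can be derived from commit counts alone. This minimal sketch assumes you already have (author, module) pairs parsed from &lt;code&gt;git log --name-only&lt;/code&gt;. It scores each developer relative to the module's most active contributor, which is an illustrative choice rather than the post's formula. It also shares the same weakness: it only measures who happened to touch what.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter, defaultdict


def skill_scores(commits: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """commits: (author, module) pairs, e.g. parsed from `git log --name-only`."""
    per_module: dict[str, Counter] = defaultdict(Counter)
    for author, module in commits:
        per_module[module][author] += 1
    scores: dict[str, dict[str, float]] = defaultdict(dict)
    for module, counts in per_module.items():
        top = max(counts.values())
        for author, n in counts.items():
            # 1.0 for the module's most active contributor, proportional below that.
            scores[author][module] = round(n / top, 2)
    return dict(scores)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;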

&lt;p&gt;Is the skill-score input perfect? No. It's derived from who happened to touch what, so it can encode bias. That's something I especially want feedback on.&lt;/p&gt;

&lt;p&gt;And the loop closes itself. When a BUD ships, the &lt;strong&gt;Learning Agent&lt;/strong&gt; writes the retrospective from the actual diffs — including an estimated-vs-actual table that tells you exactly where the model was wrong, so the next estimate is better.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc3oksjjoeczf99vxesju.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc3oksjjoeczf99vxesju.jpg" alt=" " width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;No retro meeting. The agent reads the merges and the timeline and hands you the drift — Design −25%, Development +603% — so estimation actually learns.&lt;/em&gt;&lt;/p&gt;
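&lt;p&gt;The drift figures are plain percentage differences between estimate and actual. In this minimal sketch, the phase numbers are invented so the output matches the two percentages in the caption. They are not read from the screenshot.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def drift_report(phases: dict[str, tuple[float, float]]) -> dict[str, str]:
    """phases maps a phase name to (estimated, actual) in the same unit."""
    report = {}
    for phase, (estimated, actual) in phases.items():
        pct = round((actual - estimated) / estimated * 100)
        # Signed percentage: negative means the phase finished faster than estimated.
        report[phase] = f"{pct:+d}%"
    return report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With (estimated, actual) pairs of (4.0, 3.0) for Design and (100.0, 703.0) for Development, this returns &lt;code&gt;-25%&lt;/code&gt; and &lt;code&gt;+603%&lt;/code&gt;.&lt;/p&gt;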




&lt;h2&gt;
  
  
  The part that sounds whimsical and isn't: the virtual world
&lt;/h2&gt;

&lt;p&gt;The whole organisation renders as a living 3D world — and it's &lt;strong&gt;multiplayer&lt;/strong&gt;. Not a dashboard you look at. A place your team is actually &lt;em&gt;in&lt;/em&gt;, together.&lt;/p&gt;

&lt;p&gt;Each repository is a tree. Each feature is a branch. Each agent is an orchardist tending the grove. A feature in progress is a branch growing; a merged one bears fruit; a stalled one needs pruning. Health is visible at a glance: a thriving tree versus one quietly dying.&lt;/p&gt;

&lt;p&gt;And every teammate is there with you. You walk around with WASD, sprint, jump, orbit the camera over the grove. Your colleagues are avatars with their own houses, present in real time. You can wave, cheer, greet, invite someone over. It sounds like a game because part of it is one — but the effect is presence. A standup is people reading status out loud. This is people standing in the same place, looking at the same living map of the work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4go4l7gvz2ef3ykkqrht.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4go4l7gvz2ef3ykkqrht.jpg" alt=" " width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Your team, present. Move, sprint, wave, cheer, invite. The status bar is real controls, not decoration.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It started as a visualisation. It became the most honest org chart I've ever had — because it's drawn from the code, not from a slide. &lt;a href="https://youtu.be/OxoqBI7BNxU" rel="noopener noreferrer"&gt;Here's a walkthrough.&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Shipping quality is the game
&lt;/h2&gt;

&lt;p&gt;Here's the part I didn't expect to care about and now love.&lt;/p&gt;

&lt;p&gt;The world is gamified — but it rewards the &lt;em&gt;right&lt;/em&gt; thing. You earn XP and Skill Points, level up, unlock vehicles, upgrade your house. Crucially, the economy is tuned to quality, not output. Ship a BUD to production: &lt;strong&gt;+1 SP&lt;/strong&gt;. Give a code review: &lt;strong&gt;+0.25&lt;/strong&gt;. Quality score above 80%: &lt;strong&gt;+0.5&lt;/strong&gt;. Bug found in testing: &lt;strong&gt;−0.25&lt;/strong&gt;. Bug found in &lt;em&gt;production&lt;/em&gt;: &lt;strong&gt;−1&lt;/strong&gt;. And the points for shipping don't pay out until the BUD actually reaches CLOSED — through testing, UAT, prod. You don't get rewarded for the green checkmark. You get rewarded for the thing surviving contact with reality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6vgo2bglkoghgy6a0y5m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6vgo2bglkoghgy6a0y5m.jpg" alt=" " width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Read the numbers: a production bug wipes out everything shipping earns. That's the whole point. In a world where AI can churn out code that passes tests, the scoreboard has to reward what AI is bad at: code that holds up.&lt;/em&gt;&lt;/p&gt;
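&lt;p&gt;The scoring rules fit in a few lines. This minimal sketch uses the point values from the post; the event names and &lt;code&gt;skill_points&lt;/code&gt; are hypothetical. As described, shipping pays out only once the BUD's status is CLOSED.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Point values from the post; event names are illustrative.
POINTS = {
    "code_review": 0.25,
    "quality_above_80": 0.5,
    "bug_in_testing": -0.25,
    "bug_in_production": -1.0,
}
SHIP_POINTS = 1.0


def skill_points(events: list[str], bud_status: str) -> float:
    total = sum(POINTS[e] for e in events if e in POINTS)
    # Shipping only pays out once the BUD has reached CLOSED.
    if "shipped" in events and bud_status == "CLOSED":
        total += SHIP_POINTS
    return total
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;skill_points(["shipped", "bug_in_production"], "CLOSED")&lt;/code&gt; comes to 0.0, because the production bug cancels the shipping point.&lt;/p&gt;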

&lt;p&gt;That ties straight back to where I started. AI nails the 80%. The 20% — the part that doesn't blow up in production — is what we actually want to incentivise. So that's what the game scores.&lt;/p&gt;




&lt;h2&gt;
  
  
  It runs on a Mac mini, and your data never leaves it
&lt;/h2&gt;

&lt;p&gt;This is the part I care about most, and the part most "AI dev platform" pitches skip.&lt;/p&gt;

&lt;p&gt;Bodhiorchard is &lt;strong&gt;self-hosted by design.&lt;/strong&gt; Postgres with pgvector, your repositories, the embeddings, and the full audit log live on your hardware. For me, that hardware is a Mac mini. No repo content is shipped to anyone's cloud. For a regulated shop — and I lead engineering at an FCA-authorised fintech, so this is not theoretical for me — that's the difference between "interesting demo" and "allowed to exist."&lt;/p&gt;

&lt;p&gt;Inference is your choice. It runs on Claude Code today; Ollama (for fully air-gapped setups) and OpenAI are on the roadmap. The agent layer is engine-independent, so swapping the model is API rewiring, not a redeploy.&lt;/p&gt;
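&lt;p&gt;Engine independence usually comes down to agents depending on a narrow interface instead of a vendor SDK. This minimal sketch uses &lt;code&gt;typing.Protocol&lt;/code&gt;. &lt;code&gt;InferenceEngine&lt;/code&gt;, &lt;code&gt;StatusAgent&lt;/code&gt; and both engine classes are hypothetical stand-ins; real adapters would call Claude Code, Ollama or OpenAI. With this shape, switching engines is a config change rather than an agent rewrite.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from typing import Protocol


class InferenceEngine(Protocol):
    def complete(self, prompt: str) -> str: ...


class EchoEngine:
    # Stand-in for a hosted engine adapter (hypothetical).
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperEngine:
    # Stand-in for a local engine adapter, e.g. one wrapping Ollama (hypothetical).
    def complete(self, prompt: str) -> str:
        return prompt.upper()


ENGINES: dict[str, type] = {"hosted": EchoEngine, "local": UpperEngine}


class StatusAgent:
    def __init__(self, engine: InferenceEngine):
        self.engine = engine  # the agent never imports a vendor SDK directly

    def summarise(self, activity: list[str]) -> str:
        return self.engine.complete("; ".join(activity))


def build_agent(config: dict) -> StatusAgent:
    # Swapping the model means changing config["engine"], not the agent.
    return StatusAgent(ENGINES[config["engine"]]())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;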

&lt;p&gt;The stack, for the curious:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Backend   FastAPI · Python 3.12
Frontend  Vue 3 · PlayCanvas (the 3D world)
Data      Postgres + pgvector · Redis
Agents    Local MCP server (read + bounded write tools)
License   Apache 2.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's also built for real orgs, not just a solo demo: detailed roles and permissions, multi-org support out of the box, and capacity planning baked into triage and assignment — the Triage Agent defers work when the team is full, and Smart Assignment balances by real-time utilisation rather than who shouts loudest. So the "self-hosted toy" worry doesn't really hold; it'll sit inside an org's access model on day one.&lt;/p&gt;
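&lt;p&gt;The capacity rules can be sketched as two small functions. Triage defers a new BUD when everyone is full. Assignment picks the least-utilised eligible developer and breaks ties by skill. The 0.9 "full" cut-off, the utilisation scale and the tie-break are assumptions for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass
class Dev:
    name: str
    utilisation: float  # 0.0 = idle, 1.0 = fully booked
    skill: float  # per-module skill score, 0 to 1.0


def triage(devs: list[Dev], full_at: float = 0.9) -> str:
    # Defer new work when every developer is at or above the cut-off.
    return "defer" if all(d.utilisation >= full_at for d in devs) else "accept"


def smart_assign(devs: list[Dev], full_at: float = 0.9):
    candidates = [d for d in devs if full_at > d.utilisation]
    if not candidates:
        return None
    # Least-loaded first; higher skill wins a tie.
    return min(candidates, key=lambda d: (d.utilisation, -d.skill)).name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;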




&lt;h2&gt;
  
  
  Honest status, because HN will ask anyway
&lt;/h2&gt;

&lt;p&gt;I'd rather tell you this up front than have you find it.&lt;/p&gt;

&lt;p&gt;What's live today: the platform, the BUD lifecycle, the MCP write-path, repository and code-graph indexing, skill profiling, and the 3D living-tree dashboard. The agents are real and they work with a human in the loop at every gate.&lt;/p&gt;

&lt;p&gt;What I'm still building: the fully autonomous execution loop. The direction I'm taking it is deliberately narrow — auto mode first for &lt;em&gt;small, low-risk BUDs&lt;/em&gt;, where one agent chain runs tech spec → code → code review → test → deploy end to end, then stops and waits for a human to approve the release. Not "point the swarm at production and walk away." Lights-out on the small stuff, a human gate where it counts. That's the active work, not a shipped claim. So today this is &lt;em&gt;agents-assisted, human-in-the-loop&lt;/em&gt;, and anyone who tells you their agent swarm ships production code fully unattended is selling something.&lt;/p&gt;
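&lt;p&gt;The planned auto mode is mostly an eligibility gate plus a hard stop before release. This minimal sketch models the policy as described. Because this loop is still being built, the &lt;code&gt;risk&lt;/code&gt; and &lt;code&gt;complexity&lt;/code&gt; fields, the cut-off of 2 and the &lt;code&gt;run_stage&lt;/code&gt; callback are all hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;AUTO_STAGES = ["tech_spec", "code", "code_review", "test", "deploy"]


def eligible_for_auto(bud: dict, max_complexity: int = 2) -> bool:
    # Only small, low-risk BUDs qualify for the unattended chain.
    return bud["risk"] == "low" and max_complexity >= bud["complexity"]


def run_auto(bud: dict, run_stage) -> dict:
    if not eligible_for_auto(bud):
        return {"mode": "human_in_the_loop", "completed": []}
    completed = []
    for stage in AUTO_STAGES:
        if not run_stage(bud, stage):
            # Any failed stage hands the BUD back to a human.
            return {"mode": "handed_back", "completed": completed}
        completed.append(stage)
    # Never release unattended: stop and wait for a human to approve the release.
    return {"mode": "awaiting_release_approval", "completed": completed}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;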

&lt;p&gt;This is an independent project. I built it solo, on my own time, not affiliated with any employer — the fintech is where I felt the pain, not the thing that owns the code.&lt;/p&gt;




&lt;h2&gt;
  
  
  You don't have to start from zero
&lt;/h2&gt;

&lt;p&gt;If you're on Jira today, you don't throw your backlog away. Connect Jira Cloud and import your existing issues straight into BUDs — point Bodhiorchard at the work you already have and watch the grove fill in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl3w0z37pm641e4pxe486.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl3w0z37pm641e4pxe486.jpg" alt=" " width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The on-ramp is a migration, not a rewrite. Your tickets become BUDs; the agents take it from there.&lt;/em&gt;&lt;/p&gt;
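&lt;p&gt;Conceptually, the import maps each Jira issue onto BUD sections. This minimal sketch assumes the issue has already been fetched as JSON with &lt;code&gt;key&lt;/code&gt;, &lt;code&gt;fields.summary&lt;/code&gt;, &lt;code&gt;fields.created&lt;/code&gt; and a plain-text description. Jira Cloud's v3 REST API actually returns the description in Atlassian Document Format, so it would need flattening first. &lt;code&gt;jira_issue_to_bud&lt;/code&gt; and the section mapping are an illustrative guess, not Bodhiorchard's importer.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def jira_issue_to_bud(issue: dict, bud_number: int) -> str:
    """Render one already-fetched Jira issue as a BUD markdown skeleton."""
    fields = issue["fields"]
    description = fields.get("description") or "(no description in Jira)"
    lines = [
        f"# BUD-{bud_number} · {fields['summary']}",
        "",
        "## Intent",
        description.strip(),
        "",
        "## Acceptance criteria",
        "- (to be drafted by the agents)",
        "",
        "## History",
        # Keep the original creation date and the Jira key for traceability.
        f"- {fields['created'][:10]} imported from Jira {issue['key']}",
    ]
    return "\n".join(lines)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;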

&lt;p&gt;There's also a cross-repo graph view — bus-factor analysis, threat detection, BUD-stage filtering across every repo — for when you want the dependency map instead of the grove. Same data, different lens.&lt;/p&gt;
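&lt;p&gt;One common bus-factor heuristic is the smallest number of people who together account for more than half of a repo's commits. This minimal sketch implements that heuristic. The 50% share and the choice to count commits rather than changed lines are assumptions, since the post doesn't say which variant the graph view uses.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter


def bus_factor(authors: list[str], share: float = 0.5) -> int:
    """Smallest number of authors whose commits exceed `share` of the total."""
    counts = Counter(authors)  # one entry per commit author
    total = sum(counts.values())
    covered = 0
    for n_people, (_, n_commits) in enumerate(counts.most_common(), start=1):
        covered += n_commits
        if covered > share * total:
            return n_people
    return 0  # no commits at all
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;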




&lt;h2&gt;
  
  
  What I actually want from you
&lt;/h2&gt;

&lt;p&gt;Not stars. Feedback. Two questions I'm genuinely stuck on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Does "the BUD is the single source of truth" survive contact with your reality?&lt;/strong&gt; Or does real-world ticketing always sprawl back across five tools no matter what you do?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Where would self-hosted + bring-your-own-inference actually change your mind&lt;/strong&gt; versus a hosted SaaS PM tool — and where is it just more ops burden you don't want?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The full methodology is written up at &lt;strong&gt;&lt;a href="https://bodhiorchard.ai/" rel="noopener noreferrer"&gt;bodhiorchard.ai&lt;/a&gt;&lt;/strong&gt; — the twelve agents, the manifesto, the Agile-vs-ADD table, all of it. The repo has six demo videos and four sample repositories you can point it at: &lt;strong&gt;&lt;a href="https://github.com/mickyarun/bodhiorchard" rel="noopener noreferrer"&gt;https://github.com/mickyarun/bodhiorchard&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I spent fifteen years being told the ceremony &lt;em&gt;was&lt;/em&gt; the engineering. Sprints felt broken long before AI. AI just made it impossible to keep pretending.&lt;/p&gt;

&lt;p&gt;So I replaced them. If you've killed a ceremony and lived to tell the tale — which one did you kill first?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Arun — CTO &amp;amp; Co-Founder of Atoa, a UK open banking payments platform, and the solo author of Bodhiorchard. I write about what building with AI is actually like, not what the conference slides say. Find me on &lt;a href="https://x.com/mickyarun" rel="noopener noreferrer"&gt;X @mickyarun&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How We Hire for the 20% AI Can't Do (And Why We Stopped Asking Candidates to Code From Scratch)</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Tue, 02 Jun 2026 14:43:11 +0000</pubDate>
      <link>https://dev.to/mickyarun/how-we-hire-for-the-20-ai-cant-do-and-why-we-stopped-asking-candidates-to-code-from-scratch-1ida</link>
      <guid>https://dev.to/mickyarun/how-we-hire-for-the-20-ai-cant-do-and-why-we-stopped-asking-candidates-to-code-from-scratch-1ida</guid>
      <description>&lt;p&gt;A few weeks ago I published a piece called &lt;a href="https://dev.to/mickyarun/ai-agents-are-great-at-80-of-our-code-the-other-20-is-why-we-still-need-seniors-3lh5"&gt;"AI Agents Are Great at 80% of Our Code. The Other 20% Is Why We Still Need Seniors."&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It got 25 reactions and 34 comments. Several of those comments asked the same question in different words:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"How do you actually measure that 20% when you're hiring?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Fair question. I dodged it in the first article because I didn't have a clean answer yet. Now I do. Or at least, we have a process that's working better than anything we tried before.&lt;/p&gt;

&lt;p&gt;This is that answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The old interview was broken before AI made it obvious
&lt;/h2&gt;

&lt;p&gt;For years, we ran the standard playbook. Whiteboard problem. Timed coding exercise. "Build a REST endpoint in 45 minutes." You know the drill.&lt;/p&gt;

&lt;p&gt;Here's what that interview actually tested: can this person write syntactically correct code under pressure, from memory, with someone watching?&lt;/p&gt;

&lt;p&gt;That's a real skill. It's just not the skill that matters anymore.&lt;/p&gt;

&lt;p&gt;AI handles the code-writing part. I don't mean it handles it perfectly — I wrote a whole article about the 20% it gets wrong. But the 80% that is boilerplate, CRUD, API wrappers, standard patterns? An agent will generate that in seconds. Clean. Typed. Probably with better variable names than I'd pick.&lt;/p&gt;

&lt;p&gt;So if I'm hiring someone and my interview tests whether they can do what an agent already does faster — what exactly am I learning?&lt;/p&gt;

&lt;p&gt;That they can type under pressure. Great. So can the agent. And it doesn't get nervous.&lt;/p&gt;

&lt;h2&gt;
  
  
  The interview we actually run now
&lt;/h2&gt;

&lt;p&gt;We stopped asking candidates to write code from scratch. Instead, we hand them code an AI agent already wrote.&lt;/p&gt;

&lt;p&gt;The code looks fine. It passes the tests we included. The variable names are clean. The types are correct. A junior looking at it would say "ship it."&lt;/p&gt;

&lt;p&gt;But it's wrong.&lt;/p&gt;

&lt;p&gt;Not wrong in a way that crashes. Wrong in a way that costs money three weeks later. Wrong in a way that only someone who thinks about &lt;em&gt;consequences&lt;/em&gt; would catch.&lt;/p&gt;

&lt;p&gt;Here's the shape of it. We give candidates a webhook handler for processing payment confirmations. The handler works. It receives the event, updates the database, returns a 200. Clean code.&lt;/p&gt;

&lt;p&gt;What's missing: idempotency. If the bank retries the webhook — and banks &lt;em&gt;always&lt;/em&gt; retry — the handler processes the payment twice. The customer gets charged twice. We get an FCA complaint. The code is correct. The system is broken.&lt;/p&gt;

&lt;p&gt;Or we show them a payment flow with state transitions. &lt;code&gt;pending&lt;/code&gt; to &lt;code&gt;authorised&lt;/code&gt; to &lt;code&gt;settled&lt;/code&gt;. Looks right. But there's a path where a payment can go from &lt;code&gt;settled&lt;/code&gt; back to &lt;code&gt;pending&lt;/code&gt;. That's an illegal state transition. In our domain, that means money that was already in a merchant's account could theoretically get pulled back without a refund record. No test catches it because no test was written for a transition that shouldn't exist.&lt;/p&gt;

&lt;p&gt;We ask candidates to review this code. Not write it. Review it.&lt;/p&gt;

&lt;p&gt;The ones who have the 20% find these things. Not always immediately. Sometimes they stare at it for five minutes and then say "wait — what happens if this gets called twice?" That moment is worth more than any algorithm they could whiteboard.&lt;/p&gt;
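&lt;p&gt;The settled → pending path is the kind of bug that an explicit transition table makes hard to write. This minimal sketch assumes the three states named above plus a hypothetical &lt;code&gt;refunded&lt;/code&gt; state. Any transition not listed raises an error instead of being applied.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;ALLOWED_TRANSITIONS = {
    "pending": {"authorised"},
    "authorised": {"settled"},
    "settled": {"refunded"},  # hypothetical: money only leaves via a refund record
    "refunded": set(),
}


class IllegalTransition(Exception):
    pass


def transition(current: str, target: str) -> str:
    # Reject anything the table doesn't explicitly allow.
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise IllegalTransition(f"{current} -> {target}")
    return target
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the table is data, a test can loop over every pair of states and assert that only the listed transitions succeed. That covers the transitions nobody thought to write a test for.&lt;/p&gt;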

&lt;h2&gt;
  
  
  It's not what they'd change. It's why.
&lt;/h2&gt;

&lt;p&gt;We added a second part to the interview. Once a candidate identifies issues in the AI-generated code, we ask them to walk us through a PR rejection.&lt;/p&gt;

&lt;p&gt;Not "what would you change." We already know what needs to change. We want to hear &lt;em&gt;why they'd reject it&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This is where you separate pattern-matchers from engineers.&lt;/p&gt;

&lt;p&gt;A pattern-matcher says: "There's no idempotency key. You should add one." Correct. Also surface-level. They've seen the pattern before and recognised its absence. That's good, but it's not enough.&lt;/p&gt;

&lt;p&gt;An engineer says: "There's no idempotency key, which means a network retry from the bank will double-process the payment. The customer sees two debits. Your support team gets a ticket. You file a dispute with the acquiring bank. The refund takes 5-7 business days. And if this happens at volume, you've got a regulatory reporting obligation."&lt;/p&gt;

&lt;p&gt;Same observation. Completely different depth. The first person knows the pattern. The second person knows what happens downstream when the pattern is missing.&lt;/p&gt;

&lt;p&gt;That downstream awareness — the ability to trace a bug forward through the business — is the 20%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hire for intent, not resumes
&lt;/h2&gt;

&lt;p&gt;Our interview process changed because our hiring philosophy changed first.&lt;/p&gt;

&lt;p&gt;We don't hire resumes. We hire intent. Let me give you three examples.&lt;/p&gt;

&lt;p&gt;One developer's resume listed one skill under "Technical Proficiency": Googling. I'm not paraphrasing. That's what it said. B.Sc. No fancy internships. No side projects on GitHub. Just someone who was honest about what they knew and relentless about learning what they didn't. Today they own our merchant-facing app. The whole thing.&lt;/p&gt;

&lt;p&gt;Another cold-messaged us asking for a job. No referral. No warm intro. Just a direct message. In the interview, they were quiet. Not shy — quiet. Listened more than they talked. When they did talk, they went straight to the solution. No preamble, no hedging, no "well it depends." Just: here's the problem, here's how I'd fix it, here's what could go wrong.&lt;/p&gt;

&lt;p&gt;A third started as an intern. They're now building our Open Banking integration end-to-end. Not assisting. Not maintaining. Building.&lt;/p&gt;

&lt;p&gt;The common thread isn't a degree or a tech stack or years of experience. It's three things: curiosity, ownership, and willingness to be wrong.&lt;/p&gt;

&lt;p&gt;The first didn't pretend they knew things they didn't. The second didn't try to impress with volume — they impressed with clarity. The third didn't wait for someone to assign harder problems — they grew into them because the problems were there and they weren't afraid to try.&lt;/p&gt;

&lt;p&gt;None of them would have passed the old coding interview particularly well. All of them are exactly the kind of engineer you want reviewing an AI agent's output.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 20% isn't just code — it's design thinking
&lt;/h2&gt;

&lt;p&gt;Here's something most "AI and hiring" articles miss: the 20% that matters isn't only about catching bugs in payment logic. It's about knowing how to think about problems that don't have a spec yet.&lt;/p&gt;

&lt;p&gt;Before Atoa, I spent years in design thinking, working with clients who were showcasing products at CES. One project sticks with me. A world-leading chocolate manufacturer wanted to launch a series of chocolates based on human emotions: Anger, Disgust, Sad, Happy, Wimpy. The brief: build software that captures a person's emotion in real time, recommends the matching chocolate, and makes it go viral on social media.&lt;/p&gt;

&lt;p&gt;Now imagine the marketing manager walks into your office and drops this on your desk. The brief is: make it viral. Which platform do you build on? Where does the experience live? What's the feature that makes someone &lt;em&gt;want&lt;/em&gt; to share it?&lt;/p&gt;

&lt;p&gt;Assume technically anything is possible. The technology isn't the constraint. Your thinking is.&lt;/p&gt;

&lt;p&gt;Here's the filter: if your first answer is "I'll build a mobile app and a web app" — that's a straight reject. Not because mobile and web are wrong technologies. But because you jumped to &lt;em&gt;how&lt;/em&gt; before you thought about &lt;em&gt;why&lt;/em&gt;. You're solving for delivery before you've solved for virality. You're thinking like a developer when the brief asked you to think like a designer.&lt;/p&gt;

&lt;p&gt;The interesting answers start with questions. Who's the audience? Where do they already spend time? What makes someone stop scrolling and share something? What's the 3-second hook? How does the chocolate brand benefit from every share? What's the mechanic that makes this grow without paid media?&lt;/p&gt;

&lt;p&gt;Now here's my challenge to you: how would you approach this? Drop it in the comments. Not the tech stack — the &lt;em&gt;thinking&lt;/em&gt;. How do you decompose this brief into something that actually goes viral?&lt;/p&gt;

&lt;p&gt;There's no single right answer. That's the point. This is a design thinking exercise — the kind of problem where the 20% lives. The brief is intentionally vague. The constraints are real. And the interesting part isn't the technology you pick. It's &lt;em&gt;how you think about a problem before you write a single line of code&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;No AI agent is writing a spec for that. No benchmark captures the ability to look at a brief like "emotion-based chocolate recommendation engine for CES" and turn it into a system. That's design thinking. The ability to hold a vague, human problem in your head and translate it into technical architecture — while keeping the user experience front and centre.&lt;/p&gt;

&lt;p&gt;I look for this in interviews too. Not the ability to solve a well-defined problem. The ability to &lt;em&gt;define&lt;/em&gt; the problem in the first place. When I ask a candidate "how would you approach this?" and they immediately start writing code — that tells me something. When they first ask "who's using this, where, and what does success look like?" — that tells me something very different.&lt;/p&gt;

&lt;p&gt;The 20% is judgment about code. But it's also judgment about products, users, and what should exist in the world. AI can generate solutions. It can't ask the right question.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the 20% actually looks like in an interview
&lt;/h2&gt;

&lt;p&gt;Here's what I'm watching for when a candidate reviews code. It's not a checklist — it's a set of signals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do they think about what shouldn't happen?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most engineers think about the happy path. The payment goes through. The webhook fires. The database updates. Done.&lt;/p&gt;

&lt;p&gt;The 20% engineers think about the unhappy path &lt;em&gt;first&lt;/em&gt;. What happens when the webhook fires twice? What happens when the database write succeeds but the response times out? What happens when the bank says "yes" and our system says "no" and now the money exists in a state neither side agrees on?&lt;/p&gt;

&lt;p&gt;If a candidate's first instinct is "how does this work?" — that's fine. If their first instinct is "how does this &lt;em&gt;break&lt;/em&gt;?" — that's the signal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do they ask about failure modes before writing anything?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We've started noticing this in walkthroughs. Some candidates immediately start typing fixes. Others ask questions first. "What's the retry policy on these webhooks?" "Is there a dead letter queue?" "What happens to in-flight payments if this service goes down?"&lt;/p&gt;

&lt;p&gt;The ones who ask first are almost always better engineers. Not because asking is inherently better than doing. But because in the 20% territory — the code that handles edge cases, race conditions, regulatory requirements — the cost of building the wrong thing is higher than the cost of asking one more question.&lt;/p&gt;
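
&lt;p&gt;Those questions have concrete answers in code. Here's a minimal sketch of a bounded retry policy that parks exhausted webhook events in a dead letter queue instead of dropping them. &lt;code&gt;deliverWithRetry&lt;/code&gt;, the event shape and the in-memory &lt;code&gt;deadLetterQueue&lt;/code&gt; are hypothetical, and it retries immediately where a real consumer would wait between attempts.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical webhook event and handler shapes.
interface WebhookEvent {
  id: string;
  type: string;
}

interface Handler {
  (event: WebhookEvent): void;
}

interface DeadLetter {
  event: WebhookEvent;
  attempts: number;
  lastError: string;
}

// In-memory stand-in for a real dead letter queue (a table, topic or queue).
const deadLetterQueue: DeadLetter[] = [];

function deliverWithRetry(event: WebhookEvent, handler: Handler, maxAttempts: number): boolean {
  if (!Number.isInteger(maxAttempts) || Math.sign(maxAttempts) !== 1) {
    throw new RangeError("maxAttempts must be a positive integer");
  }
  let attempts = 0;
  let lastError = "";
  while (attempts !== maxAttempts) {
    attempts += 1;
    try {
      handler(event);
      return true; // processed; nothing to park
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // A real consumer would wait here (with backoff) before retrying.
    }
  }
  // Retries exhausted: keep the event for inspection and replay, never drop it.
  deadLetterQueue.push({ event, attempts, lastError });
  return false;
}

// Usage: a flaky downstream that recovers, and one that never will.
let calls = 0;
const flaky: Handler = function (event) {
  calls += 1;
  if (calls !== 3) throw new Error(`downstream unavailable for ${event.id}`);
};
console.log(deliverWithRetry({ id: "evt_1", type: "payment.completed" }, flaky, 5)); // true

const broken: Handler = function () {
  throw new Error("schema mismatch");
};
console.log(deliverWithRetry({ id: "evt_2", type: "payment.failed" }, broken, 3)); // false
console.log(deadLetterQueue.length); // 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;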

&lt;p&gt;&lt;strong&gt;Can they explain a tradeoff they made, not just what they chose?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the question I ask every candidate, regardless of seniority: "Tell me about a technical decision where you chose the worse option on purpose."&lt;/p&gt;

&lt;p&gt;The interesting candidates have an answer. "We chose synchronous calls between two services because the audit trail was easier to reason about, even though async would have been more resilient." "We kept a manual process instead of automating it because the edge cases weren't well understood yet and we didn't want to automate the wrong thing."&lt;/p&gt;

&lt;p&gt;The 20% is full of decisions like this. The right answer isn't always the technically superior one. Sometimes the right answer is the one that's easier to debug at 2am, or the one that produces a cleaner audit trail, or the one that a new engineer can understand without reading three pages of context.&lt;/p&gt;

&lt;h2&gt;
  
  
  The junior training pipeline problem
&lt;/h2&gt;

&lt;p&gt;Here's the question that kept me up after the first article: if AI handles 80% of the code, how do juniors ever build the judgment that makes seniors valuable?&lt;/p&gt;

&lt;p&gt;The 80% used to be the training ground. You learn to write CRUD endpoints. You learn to wire up a database. You learn to handle HTTP errors. You make mistakes in the boring code, you get them caught in review, and slowly you develop an instinct for the less boring code.&lt;/p&gt;

&lt;p&gt;If an agent writes all of that for you on day one, what are you actually learning?&lt;/p&gt;

&lt;p&gt;This is a real problem. And "just let them use AI" isn't the answer, because using AI well requires the judgment you're supposed to be building.&lt;/p&gt;

&lt;p&gt;I'll be honest — I've had to let someone go because of this exact gap. They were using AI for everything. But they were using the default model in Cursor while the rest of the team had moved to Opus for anything that touched critical code. They weren't thinking about &lt;em&gt;which&lt;/em&gt; tool to use &lt;em&gt;when&lt;/em&gt;. They were just pressing tab and shipping. The code looked fine. The judgment wasn't there. And in a payment system, that's not a skill gap you can coach around — it's a risk.&lt;/p&gt;

&lt;p&gt;At Atoa, we pair juniors with seniors on the hard problems. Not the 80% problems. The 20% ones. The payment state machine that handles twelve edge cases. The webhook handler that has to be idempotent across retries, timeouts, and partial failures. The reconciliation logic where our system says one thing and the bank says another.&lt;/p&gt;

&lt;p&gt;The senior doesn't watch the output. They watch the process. They're looking for two things.&lt;/p&gt;

&lt;p&gt;First: "What did you skip?" Not what did you get wrong — what did you not even consider? That gap is where the learning lives. A junior who writes a webhook handler and doesn't think about idempotency hasn't made a mistake. They have a blind spot. Mistakes you can catch in tests. Blind spots you can only catch by asking the right question at the right time. That's what the senior is there for.&lt;/p&gt;

&lt;p&gt;Second: "What happens when this fails?" Not "did you handle the error." Did you think about what the &lt;em&gt;system&lt;/em&gt; does when this component fails? Does the rest of the pipeline stall? Does the customer see a broken state? Does the merchant lose money? The junior doesn't need to have the answer. They need to have the habit of asking the question.&lt;/p&gt;

&lt;p&gt;The painful lessons still happen. They just happen faster because the senior is there to compress the feedback loop from "you'll figure this out in three years" to "let me show you why this matters right now."&lt;/p&gt;
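
&lt;p&gt;The reconciliation case from the pairing list above is a good example of where the thinking lives: the loop is trivial, the classification is not. A minimal sketch, assuming both sides can be reduced to records keyed by a shared payment reference; the record shape, the issue names and &lt;code&gt;reconcile&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical record shape, produced from both our ledger and the bank's report.
interface PaymentRecord {
  reference: string;
  amountMinor: number;
  status: "pending" | "complete" | "failed";
}

interface Discrepancy {
  reference: string;
  issue: "missing_at_bank" | "missing_internally" | "amount_mismatch" | "status_mismatch";
}

function reconcile(ours: PaymentRecord[], bank: PaymentRecord[]): Discrepancy[] {
  const bankByRef = new Map();
  for (const record of bank) bankByRef.set(record.reference, record);

  const issues: Discrepancy[] = [];
  for (const record of ours) {
    const theirs = bankByRef.get(record.reference);
    if (theirs === undefined) {
      issues.push({ reference: record.reference, issue: "missing_at_bank" });
    } else if (theirs.amountMinor !== record.amountMinor) {
      issues.push({ reference: record.reference, issue: "amount_mismatch" });
    } else if (theirs.status !== record.status) {
      issues.push({ reference: record.reference, issue: "status_mismatch" });
    }
    bankByRef.delete(record.reference);
  }
  // Whatever is left exists at the bank but not in our ledger.
  bankByRef.forEach(function (_record, reference) {
    issues.push({ reference, issue: "missing_internally" });
  });
  return issues;
}

// Usage: A2 is the case where the bank has completed a payment we still record as pending.
const ours: PaymentRecord[] = [
  { reference: "A1", amountMinor: 1000, status: "complete" },
  { reference: "A2", amountMinor: 500, status: "pending" },
  { reference: "A3", amountMinor: 750, status: "complete" },
];
const bank: PaymentRecord[] = [
  { reference: "A1", amountMinor: 1000, status: "complete" },
  { reference: "A2", amountMinor: 500, status: "complete" },
  { reference: "B9", amountMinor: 300, status: "complete" },
];
for (const d of reconcile(ours, bank)) console.log(d.reference, d.issue);
// A2 status_mismatch
// A3 missing_at_bank
// B9 missing_internally
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;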

&lt;h2&gt;
  
  
  The best hire isn't the best coder anymore
&lt;/h2&gt;

&lt;p&gt;Three years ago I'd have hired the candidate who wrote the cleanest code the fastest. That person is still good. They're just not rare anymore. An AI agent writes clean code fast. That's table stakes.&lt;/p&gt;

&lt;p&gt;The hire I'm looking for now is the person who reads an AI agent's clean, well-typed, properly structured code — and says "this will break in production, and here's exactly how."&lt;/p&gt;

&lt;p&gt;That person can tell an agent what it got wrong. More importantly, they can explain &lt;em&gt;why it matters&lt;/em&gt;. Not just "add an idempotency key" but "add an idempotency key because the bank will retry, and without it, this elegant code will charge a customer twice."&lt;/p&gt;

&lt;p&gt;The 20% was never about writing harder code. It's about knowing which code is dangerous.&lt;/p&gt;

&lt;p&gt;We changed our interview because the job changed. The job isn't writing code anymore. The job is judgment.&lt;/p&gt;

&lt;p&gt;And judgment is the one thing you can't generate with a prompt.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is a sequel to &lt;a href="https://dev.to/mickyarun/ai-agents-are-great-at-80-of-our-code-the-other-20-is-why-we-still-need-seniors-3lh5"&gt;AI Agents Are Great at 80% of Our Code. The Other 20% Is Why We Still Need Seniors&lt;/a&gt;. If you're building a team that works with AI agents, I'd love to hear how your hiring process has changed. Drop a comment or find me on X &lt;a href="https://twitter.com/mickyarun" rel="noopener noreferrer"&gt;@mickyarun&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>career</category>
      <category>ai</category>
      <category>hiring</category>
      <category>webdev</category>
    </item>
    <item>
      <title>AI Agents Are Great at 80% of Our Code. The Other 20% Is Why We Still Need Seniors.</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Thu, 28 May 2026 06:20:38 +0000</pubDate>
      <link>https://dev.to/mickyarun/ai-agents-are-great-at-80-of-our-code-the-other-20-is-why-we-still-need-seniors-3lh5</link>
      <guid>https://dev.to/mickyarun/ai-agents-are-great-at-80-of-our-code-the-other-20-is-why-we-still-need-seniors-3lh5</guid>
      <description>&lt;p&gt;&lt;em&gt;We let AI agents loose on a payment platform. They crushed the boring stuff. Then they silently broke the stuff that matters.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;A survey came out last week. 54% of all code is now AI-generated. Up from 28% last year.&lt;/p&gt;

&lt;p&gt;I read that number and thought: yeah, that tracks. We're probably in that range too.&lt;/p&gt;

&lt;p&gt;But here's the thing nobody's asking — which 54%?&lt;/p&gt;

&lt;p&gt;Not all code carries equal weight. A CRUD endpoint for fetching merchant details? Low risk. The webhook handler that transitions a payment from &lt;code&gt;pending&lt;/code&gt; to &lt;code&gt;complete&lt;/code&gt;? That's someone's rent. Someone's payroll. Get that wrong and money moves where it shouldn't, or worse, money doesn't move at all.&lt;/p&gt;

&lt;p&gt;I'm the CTO of a payment platform. FCA-authorised, processing real money, real merchants, real consequences. We run NestJS microservices, Docker, Traefik — the usual stack. And we've been using AI agents aggressively for over a year now.&lt;/p&gt;

&lt;p&gt;I'm not here to tell you AI is dangerous. It's not.&lt;/p&gt;

&lt;p&gt;I'm here to tell you it's dangerous when you forget what it's actually good at.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 80% Where AI Agents Are Genuinely Brilliant
&lt;/h2&gt;

&lt;p&gt;Let me give credit where it's due. AI agents have made our team faster in ways that would have seemed absurd two years ago.&lt;/p&gt;

&lt;p&gt;API scaffolding. Generating service boilerplate. Writing Zod validation schemas. Spinning up new endpoints. Creating test stubs. Refactoring imports. Migrating patterns across repos.&lt;/p&gt;

&lt;p&gt;We run multiple microservices. When we need a new service, an agent can scaffold the entire thing — module structure, base configuration, Docker setup, Traefik labels — in minutes. What used to be a half-day of copy-paste-and-tweak is now a conversation.&lt;/p&gt;

&lt;p&gt;When we overhauled our env management across all repos, AI agents did the grunt work. They mapped every &lt;code&gt;.env&lt;/code&gt; file, found naming conflicts, identified common variables, and generated a unified Zod schema. What would have taken a team days of grep-and-spreadsheet work took hours.&lt;/p&gt;
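
&lt;p&gt;The mapping step is mostly mechanical, which is why it suits an agent. Here's a minimal sketch of that step only (it stops short of generating the Zod schema, to stay dependency-free), assuming plain &lt;code&gt;KEY=VALUE&lt;/code&gt; files and treating names that differ only by case or underscores as a naming conflict. &lt;code&gt;mapEnvFiles&lt;/code&gt;, the repo names and that heuristic are all hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Extract keys from simple KEY=VALUE lines; skips blanks and # comments.
// No quoting or multi-line values: a sketch, not a dotenv parser.
function parseEnvKeys(text: string): string[] {
  const keys: string[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    keys.push(line.slice(0, eq).trim());
  }
  return keys;
}

// Hypothetical heuristic: names differing only by case or underscores likely mean the same thing.
function normalise(key: string): string {
  return key.toLowerCase().split("_").join("");
}

interface EnvReport {
  common: string[];      // keys defined in every repo
  conflicts: string[][]; // different spellings of the same normalised name
}

function mapEnvFiles(files: { [repo: string]: string }): EnvReport {
  const repos = Object.keys(files);
  const presence = new Map();  // key to number of repos defining it
  const spellings = new Map(); // normalised name to spellings seen
  for (const repo of repos) {
    const unique = parseEnvKeys(files[repo]).filter(function (k, i, all) {
      return all.indexOf(k) === i;
    });
    for (const key of unique) {
      presence.set(key, (presence.get(key) || 0) + 1);
      const norm = normalise(key);
      const seen: string[] = spellings.get(norm) || [];
      if (seen.indexOf(key) === -1) seen.push(key);
      spellings.set(norm, seen);
    }
  }
  const common: string[] = [];
  presence.forEach(function (count, key) {
    if (count === repos.length) common.push(key);
  });
  const conflicts: string[][] = [];
  spellings.forEach(function (variants) {
    if (variants.length !== 1) conflicts.push(variants);
  });
  return { common: common.sort(), conflicts };
}

// Usage with two hypothetical repos.
const report = mapEnvFiles({
  "payments-api": "DB_HOST=db\nREDIS_URL=redis://r\n# shared\nLOG_LEVEL=info",
  "merchant-api": "DB_HOST=db\nRedis_Url=redis://r\nLOG_LEVEL=debug",
});
console.log(JSON.stringify(report.common));    // ["DB_HOST","LOG_LEVEL"]
console.log(JSON.stringify(report.conflicts)); // [["REDIS_URL","Redis_Url"]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;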

&lt;p&gt;For this 80% of the codebase — the predictable, pattern-following, structurally repetitive code — AI agents are the best junior developers money can buy. Tireless. Cheap. No ego. Almost never make a mistake on the stuff they're good at.&lt;/p&gt;

&lt;p&gt;An army of juniors sitting at your terminal.&lt;/p&gt;




&lt;h2&gt;
  
  
  Then You Hit the Other 20%
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting.&lt;/p&gt;

&lt;p&gt;We had an agent build out a webhook handler. Webhooks in payments are critical — they're how you know a payment succeeded, failed, or needs attention. The agent wrote the handler. It looked clean. Tests passed.&lt;/p&gt;

&lt;p&gt;But it silently ignored the edge cases.&lt;/p&gt;

&lt;p&gt;Status transitions have rules. A payment can go from &lt;code&gt;pending&lt;/code&gt; to &lt;code&gt;complete&lt;/code&gt;. It cannot go from &lt;code&gt;complete&lt;/code&gt; back to &lt;code&gt;pending&lt;/code&gt;. When a human developer builds this, they think about the illegal transitions because they've seen what happens when money moves backwards. They build the guard because they've felt the pain of not having it.&lt;/p&gt;

&lt;p&gt;The agent didn't care about that. It built the happy path beautifully and treated the edge cases like they didn't exist.&lt;/p&gt;

&lt;p&gt;When we do this work manually, this type of error never happens. A senior developer who has worked in payments for years doesn't forget the impossible transitions. It's not in their code — it's in their bones.&lt;/p&gt;
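
&lt;p&gt;The guard itself is small, which is what makes its absence so costly. Here's a minimal sketch of an explicit transition table, assuming four states: only &lt;code&gt;pending&lt;/code&gt; and &lt;code&gt;complete&lt;/code&gt; come from the text above, while &lt;code&gt;failed&lt;/code&gt;, &lt;code&gt;refunded&lt;/code&gt; and the &lt;code&gt;transition&lt;/code&gt; function are illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// "failed" and "refunded" are illustrative; only pending and complete come from the text.
type PaymentState = "pending" | "complete" | "failed" | "refunded";

// Allow-list of legal moves. Anything not listed (such as complete back to
// pending) is rejected, so every legal move is an explicit decision.
const ALLOWED: { [S in PaymentState]: PaymentState[] } = {
  pending: ["complete", "failed"],
  complete: ["refunded"],
  failed: [],
  refunded: [],
};

function transition(current: PaymentState, next: PaymentState): PaymentState {
  if (current === next) {
    return current; // duplicate webhook for the same state: a no-op, not an error
  }
  if (ALLOWED[current].indexOf(next) === -1) {
    throw new Error(`Illegal transition: ${current} to ${next}`);
  }
  return next;
}

// Usage
console.log(transition("pending", "complete")); // complete
try {
  transition("complete", "pending");
} catch (err) {
  console.log((err as Error).message); // Illegal transition: complete to pending
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Treating a repeat of the current state as a no-op is a design choice: it makes duplicate webhooks harmless while still rejecting anything that would move money backwards.&lt;/p&gt;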




&lt;h2&gt;
  
  
  The Pattern I Keep Seeing
&lt;/h2&gt;

&lt;p&gt;This isn't a one-off. After months of working with AI agents on a regulated payment stack, one pattern is consistent:&lt;/p&gt;

&lt;p&gt;AI agents optimise for completion, not correctness.&lt;/p&gt;

&lt;p&gt;They want to finish the feature. Get to the green checkmark. And to get there efficiently, they take shortcuts that look reasonable on the surface.&lt;/p&gt;

&lt;p&gt;The agent builds what should happen. It rarely builds what should &lt;em&gt;not&lt;/em&gt; happen. In payments, the negative cases are where all the real risk lives. What happens when a webhook arrives twice? What happens when a refund is requested on an already-refunded transaction? What happens when the bank returns an unexpected status code? The agent doesn't think about any of that unless you explicitly tell it to.&lt;/p&gt;
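
&lt;p&gt;The last of those questions is the easiest to get silently wrong, because an unmapped status quietly falls through to some default. Here's a minimal sketch that routes anything unrecognised to manual review instead of guessing. The status strings are illustrative (modelled on UK Open Banking payment statuses; check your provider's documentation for the real set), and &lt;code&gt;mapBankStatus&lt;/code&gt; is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;type Outcome = "complete" | "pending" | "failed" | "needs_review";

// Illustrative bank statuses; the real set depends on the bank or provider.
const KNOWN_STATUSES: { [bankStatus: string]: Outcome } = {
  AcceptedSettlementCompleted: "complete",
  AcceptedSettlementInProcess: "pending",
  Pending: "pending",
  Rejected: "failed",
};

function mapBankStatus(bankStatus: string): Outcome {
  // hasOwnProperty so inherited keys such as "toString" are not treated as known.
  if (Object.prototype.hasOwnProperty.call(KNOWN_STATUSES, bankStatus)) {
    return KNOWN_STATUSES[bankStatus];
  }
  // Unknown status: never guess success or failure. Park it for a human.
  return "needs_review";
}

console.log(mapBankStatus("Rejected"));               // failed
console.log(mapBankStatus("AcceptedWithoutPosting")); // needs_review (not in this map)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;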

&lt;p&gt;Then there's the reusability problem. We have shared utility packages. Helper functions. Common patterns that the team has standardised on over years. The agent doesn't care. It writes its own version from scratch. It works, but now you have two implementations of the same logic — one tested and trusted in production, one freshly generated and untested. The agent is focused on completing &lt;em&gt;this&lt;/em&gt; feature, not maintaining the architecture.&lt;/p&gt;

&lt;p&gt;And the subtlest one — agents seem to optimise for fewer back-and-forth turns. It looks like they're saving cost, saving context. Complex validation? Skip it, the basic case works. Error handling for a rare edge case? Not worth the tokens. The result is code that passes every test you wrote but fails on the scenarios you didn't think to test — because those are exactly the scenarios the agent also didn't think about.&lt;/p&gt;




&lt;h2&gt;
  
  
  Juniors Don't Ship Products. They Write Code.
&lt;/h2&gt;

&lt;p&gt;Here's the frame that made this click for me.&lt;/p&gt;

&lt;p&gt;Claude — or any coding agent — is the best junior developer money can buy. An army of juniors. Tireless, cheap, no ego, near-zero error rate on routine work.&lt;/p&gt;

&lt;p&gt;But juniors don't ship products. They write code.&lt;/p&gt;

&lt;p&gt;The difference between code and a product is judgment. Knowing which transitions are illegal. Knowing that the retry logic has a specific backoff curve because you've been burned by what happens when it doesn't. Knowing that the webhook handler needs idempotency because banks sometimes send the same notification three times.&lt;/p&gt;

&lt;p&gt;That knowledge doesn't come from training data. It comes from years of operating a system, debugging at 2am, explaining to a merchant why their settlement was delayed.&lt;/p&gt;
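
&lt;p&gt;For a sense of what a deliberate backoff curve looks like, here's a minimal sketch of capped exponential backoff with full jitter. It's one common choice, not necessarily the curve described above; &lt;code&gt;backoffDelay&lt;/code&gt;, the 200 ms base and the 5 s cap are assumptions, and the random source is injected so the curve can be checked deterministically.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;interface BackoffOptions {
  baseMs: number;   // window for the first retry (200 ms below is an assumption)
  maxMs: number;    // ceiling, so delays stop growing (5 s below is an assumption)
  random(): number; // a value in [0, 1); Math.random in production
}

// Full jitter: wait a random time between 0 and min(maxMs, baseMs * 2^attempt).
// Randomising spreads retries out, so many clients don't hit a recovering
// bank API at the same instant.
function backoffDelay(attempt: number, opts: BackoffOptions): number {
  const ceiling = Math.min(opts.maxMs, opts.baseMs * Math.pow(2, attempt));
  return Math.floor(opts.random() * ceiling);
}

// Usage: pin random() to 1 to print the upper edge of each retry window.
const upperEdge: BackoffOptions = { baseMs: 200, maxMs: 5000, random: function () { return 1; } };
const windows: number[] = [];
for (let attempt = 0; attempt !== 6; attempt++) {
  windows.push(backoffDelay(attempt, upperEdge));
}
console.log(windows.join(", ")); // 200, 400, 800, 1600, 3200, 5000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;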

&lt;p&gt;The most dangerous mistake a CTO can make in 2026 is buying AI to replace senior engineers. The right move is buying AI to enable them.&lt;/p&gt;

&lt;p&gt;Replace your senior with AI? You get speed plus silent disasters.&lt;/p&gt;

&lt;p&gt;Enable your senior with AI? You get an architect with an army.&lt;/p&gt;




&lt;h2&gt;
  
  
  What We Actually Do About It
&lt;/h2&gt;

&lt;p&gt;I'm not writing this to complain about AI. I'm writing this because we've built a system that works, and it might help you too.&lt;/p&gt;

&lt;p&gt;The first thing we did was make our architecture machine-readable. We extract design patterns and architecture rules into formats that agents can consume. When an agent works on our codebase, it doesn't just see code — it sees boundaries, patterns, rules about what belongs where. Not documentation nobody reads. Lints and constraints that the agent can't ignore.&lt;/p&gt;
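
&lt;p&gt;One way to make a rule like "controllers don't talk to repositories" something an agent can't ignore is to express it as data and fail CI on violations. Here's a minimal sketch, assuming NestJS-style layer names and an import graph already extracted from the code. The rules, &lt;code&gt;checkBoundaries&lt;/code&gt; and the file names are hypothetical; off-the-shelf options such as ESLint's &lt;code&gt;no-restricted-imports&lt;/code&gt; rule cover simpler cases.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical layering rules: each layer lists the layers it may import from.
const RULES: { [layer: string]: string[] } = {
  controller: ["service", "dto"],
  service: ["repository", "dto", "shared"],
  repository: ["shared"],
  dto: [],
  shared: [],
};

interface ImportEdge {
  fromFile: string;
  fromLayer: string;
  toLayer: string;
}

// Returns human-readable violations; a CI step would fail the build on any.
function checkBoundaries(edges: ImportEdge[]): string[] {
  const violations: string[] = [];
  for (const edge of edges) {
    if (edge.fromLayer === edge.toLayer) continue; // same-layer imports are fine
    const allowed = RULES[edge.fromLayer] || []; // unknown layers may import no other layer
    if (allowed.indexOf(edge.toLayer) === -1) {
      violations.push(`${edge.fromFile}: ${edge.fromLayer} must not import ${edge.toLayer}`);
    }
  }
  return violations;
}

// Usage: the second edge skips the service layer.
const edges: ImportEdge[] = [
  { fromFile: "payments.controller.ts", fromLayer: "controller", toLayer: "service" },
  { fromFile: "payments.controller.ts", fromLayer: "controller", toLayer: "repository" },
];
console.log(checkBoundaries(edges));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;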

&lt;p&gt;Then we invested heavily in testing the negative cases. Every PR — human or AI — runs through the same suite. But we specifically built tests for the stuff agents skip: illegal state transitions, duplicate webhook handling, idempotency checks. If the agent silently drops a negative case, the tests catch it before it ships.&lt;/p&gt;

&lt;p&gt;And seniors still review everything that touches money. No AI-generated payment logic ships without a senior looking at it. Not because we don't trust AI — because we know exactly where it's blind. The review isn't checking syntax. It's checking judgment. Did the agent handle the ambiguous bank status? Did it respect our existing retry logic? Did it use the shared utility or reinvent the wheel?&lt;/p&gt;

&lt;p&gt;This problem bothered me enough that I started building &lt;a href="https://bodhiorchard.ai/" rel="noopener noreferrer"&gt;Bodhi Orchard&lt;/a&gt; — an open-source agentic development framework. The core idea: don't just let agents write code. Feed them the full context — architecture, design patterns, test plans, existing utilities — so they stop making the same blind-spot mistakes. Human decisions over human busywork, with guardrails that actually enforce quality.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Question for 2026
&lt;/h2&gt;

&lt;p&gt;The survey says 54% of code is AI-generated. I believe it.&lt;/p&gt;

&lt;p&gt;But here's my question: what percentage of &lt;em&gt;bugs&lt;/em&gt; in 2026 will be AI-generated?&lt;/p&gt;

&lt;p&gt;And more importantly — who's going to find them?&lt;/p&gt;

&lt;p&gt;Not the agents. They wrote the bugs in the first place. Not the juniors — they won't know enough to spot what's missing.&lt;/p&gt;

&lt;p&gt;It's going to be the seniors. The architects. The people who've operated these systems long enough to know where the bodies are buried.&lt;/p&gt;

&lt;p&gt;The 80% is solved. AI won. Celebrate that.&lt;/p&gt;

&lt;p&gt;Now invest in the humans who understand the other 20%. Because that's where your product lives or dies.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Arun, CTO &amp;amp; Co-Founder of Atoa — a UK open banking payment platform. I write about what it's actually like to build fintech with AI, not what the conference slides say it's like. If this resonated, follow me here or on &lt;a href="https://x.com/mickyarun" rel="noopener noreferrer"&gt;X @mickyarun&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;And if you're curious about building AI-native development with proper guardrails, check out &lt;a href="https://bodhiorchard.ai/" rel="noopener noreferrer"&gt;Bodhi Orchard&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>webdev</category>
      <category>startup</category>
    </item>
    <item>
      <title>Your Payment API Wasn't Built for AI Agents. Open Banking Might Be the Fix.</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Wed, 27 May 2026 11:09:19 +0000</pubDate>
      <link>https://dev.to/mickyarun/your-payment-api-wasnt-built-for-ai-agents-open-banking-might-be-the-fix-43cg</link>
      <guid>https://dev.to/mickyarun/your-payment-api-wasnt-built-for-ai-agents-open-banking-might-be-the-fix-43cg</guid>
      <description>&lt;p&gt;Stripe just shipped an Agentic Commerce Suite. PayPal launched Agent Ready. Visa predicts millions of consumers will use AI agents to complete purchases by the 2026 holiday season. Mastercard introduced Agent Pay with its own verification layer. Google launched the Agent Payments Protocol with 60+ partners.&lt;/p&gt;

&lt;p&gt;Everyone is scrambling to make payments work for AI agents.&lt;/p&gt;

&lt;p&gt;And I keep looking at all of it thinking: you're bolting agent support onto a stack that was designed for a human staring at a checkout page. That's not an integration. That's a retrofit.&lt;/p&gt;

&lt;p&gt;I run payments infrastructure. Our platform processes open banking payments for UK merchants — the kind where money moves directly from the customer's bank account, no card network in between. And from where I sit, the agentic payments conversation has a blind spot the size of Visa's interchange fee.&lt;/p&gt;

&lt;p&gt;Let me explain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The card stack assumes a human is present
&lt;/h2&gt;

&lt;p&gt;Here's what happens when you process a card payment today:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User types card number into a form (or taps a saved card)&lt;/li&gt;
&lt;li&gt;Your frontend collects PAN data, sends it to a tokenisation layer&lt;/li&gt;
&lt;li&gt;3D Secure kicks in: if the issuing bank decides to challenge, the user gets a push notification or SMS OTP&lt;/li&gt;
&lt;li&gt;The token goes to an acquirer, who talks to the card network, who talks to the issuing bank&lt;/li&gt;
&lt;li&gt;The issuing bank authorises (or declines)&lt;/li&gt;
&lt;li&gt;Settlement happens days later&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That flow was designed around one assumption: &lt;strong&gt;a human is sitting at a screen, ready to respond to authentication challenges.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now replace the human with an AI agent.&lt;/p&gt;

&lt;p&gt;The agent doesn't have eyes to read an OTP. It can't tap "approve" on a banking app push notification. It can't solve a CAPTCHA. It can't parse a 3DS iframe that renders differently on every issuing bank's domain.&lt;/p&gt;

&lt;p&gt;So what do the card networks do? They build workarounds. Stripe's agentic suite generates virtual cards. Mastercard's Agent Pay pre-registers agents and skips some auth steps. Everyone is finding clever ways to route around the authentication wall that they built.&lt;/p&gt;

&lt;p&gt;That's the tell. When your entire ecosystem is engineering around its own security layer, the architecture has a problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open banking was built for machines talking to machines
&lt;/h2&gt;

&lt;p&gt;Open banking works differently. There's no card number. No PAN. No tokenisation vault. No 3DS iframe.&lt;/p&gt;

&lt;p&gt;Here's the flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your app (or agent) calls a payment initiation API&lt;/li&gt;
&lt;li&gt;The API returns a redirect URL to the customer's bank&lt;/li&gt;
&lt;li&gt;The customer authenticates directly with their bank (biometrics, app-based SCA)&lt;/li&gt;
&lt;li&gt;The bank confirms the payment&lt;/li&gt;
&lt;li&gt;Money moves. Settlement is instant or same-day.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Notice what's different. The payment API is a clean, stateless request-response interface. The authentication happens &lt;strong&gt;on the bank's side&lt;/strong&gt;, not inside your checkout flow. Your code never sees a card number, never handles an OTP, never renders an auth challenge.&lt;/p&gt;

&lt;p&gt;For a human using a checkout page, this is a nice UX improvement. For an AI agent calling a payment API, this is a structural advantage.&lt;/p&gt;

&lt;p&gt;The agent makes an API call. It gets back a payment URL. It hands that URL to the human for one-time bank authentication. Done. The agent doesn't need to handle auth. The bank does. The separation of concerns is clean.&lt;/p&gt;

&lt;p&gt;And with VRPs (Variable Recurring Payments) now live in the UK, it gets even better. The human authenticates once, sets spending limits, and the agent can initiate payments within those limits without any further human interaction. No virtual cards. No pre-registered agent identities. Just an API call against a mandate.&lt;/p&gt;

&lt;p&gt;That's not a workaround. That's the architecture actually working as designed.&lt;/p&gt;

&lt;h2&gt;
  
  
  What breaks when agents use card APIs
&lt;/h2&gt;

&lt;p&gt;I've been watching developers try to build agentic payment flows on card rails. Here's what keeps going wrong:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PCI scope explosion.&lt;/strong&gt; If your agent is generating virtual cards, storing card tokens, or managing card-on-file relationships, your PCI compliance surface just grew. AI agents that handle card data need the same compliance posture as any system that touches PANs. That's not a small thing. That's PCI DSS assessment scope, penetration testing, quarterly vulnerability scans, all for an agent that could've made a bank-to-bank API call instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication is the bottleneck.&lt;/strong&gt; 3D Secure was designed as a human-in-the-loop check. Every attempt to skip it for agents either weakens security (bad) or creates a parallel auth system (complex and fragile). Open banking's approach — SCA happens at the bank, not at your checkout — means the agent never needs to authenticate. It just calls the API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Settlement lag creates state management nightmares.&lt;/strong&gt; Card payments settle in days. When an agent is orchestrating a multi-step workflow (compare prices → select vendor → pay → confirm delivery), it needs to know whether payment actually landed. With cards, you get an authorisation that might reverse, a settlement that arrives Tuesday, and a chargeback window that stays open for months. With open banking, payment confirmation is real-time. The state machine is simpler because the money actually moved.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Micro-payments don't work on card rails.&lt;/strong&gt; An agentic workflow can generate many small transactions in a single session. The fixed per-transaction fee that most card pricing carries, on top of interchange and scheme fees, makes sub-pound transactions economically absurd. Open banking payments never touch a card network, so there's no interchange or scheme fee in the stack at all, which is why it works for the agentic pattern of many small, frequent payments.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd actually build
&lt;/h2&gt;

&lt;p&gt;If I were starting an agentic commerce integration today for UK customers, here's the architecture I'd reach for:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For one-off agent-initiated purchases:&lt;/strong&gt; Open banking payment initiation. The agent creates a payment request via API, gets a consent URL, passes it to the user. One SCA event, then the money moves. No card data anywhere in the stack.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent → Payment API (create payment request)
      → User gets bank auth link
      → User approves in banking app (biometrics)
      → Webhook confirms payment
      → Agent continues workflow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
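
&lt;p&gt;The same flow as a TypeScript sketch, with an in-memory stand-in for the provider so it runs on its own. &lt;code&gt;createPaymentRequest&lt;/code&gt;, &lt;code&gt;onPaymentWebhook&lt;/code&gt;, the statuses and the &lt;code&gt;bank.example&lt;/code&gt; URL are hypothetical; a real integration would call the provider's HTTP API and verify webhook signatures. The point it illustrates: the agent never authenticates anyone, it just waits for the webhook before continuing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;type InitiationStatus = "awaiting_user" | "completed" | "failed";

// Hypothetical payment request, as a payment initiation API might return it.
interface InitiatedPayment {
  id: string;
  reference: string;
  amountMinor: number;
  authUrl: string; // link the user opens to approve in their banking app
  status: InitiationStatus;
}

// In-memory stand-in for the provider's side.
const requests = new Map();

function createPaymentRequest(amountMinor: number, reference: string): InitiatedPayment {
  const id = `pr_${requests.size + 1}`;
  const req: InitiatedPayment = {
    id,
    reference,
    amountMinor,
    authUrl: "https://bank.example/authorise/" + id,
    status: "awaiting_user",
  };
  requests.set(id, req);
  return req;
}

// Agent side: act on the webhook rather than assuming the user approved.
function onPaymentWebhook(id: string, status: "completed" | "failed"): string {
  const req: InitiatedPayment | undefined = requests.get(id);
  if (req === undefined) return "ignored: unknown payment";
  if (req.status !== "awaiting_user") return "ignored: already final"; // duplicate delivery
  req.status = status;
  return status === "completed" ? "agent continues workflow" : "agent asks the user what to do";
}

// Usage
const pr = createPaymentRequest(1299, "order-42");
console.log(`Ask the user to approve: ${pr.authUrl}`);
// ...the user approves in their banking app; the provider sends a webhook...
console.log(onPaymentWebhook(pr.id, "completed")); // agent continues workflow
console.log(onPaymentWebhook(pr.id, "completed")); // ignored: already final
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;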



&lt;p&gt;&lt;strong&gt;For recurring agent-managed spending:&lt;/strong&gt; VRP mandates. User sets up a mandate once — monthly cap, per-transaction cap, approved merchant. The agent calls the payment API within those bounds. No re-authentication. No virtual cards. No 3DS.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User → Sets up VRP mandate (one-time SCA)
Agent → Calls payment API within mandate limits
      → Instant confirmation via webhook
      → No further user interaction needed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
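
&lt;p&gt;On the agent side, the useful habit is checking a payment against the mandate before calling the API, so it doesn't fire requests that will be refused. The user's bank still enforces the mandate; this is only a pre-check. Here's a minimal sketch, where the &lt;code&gt;Mandate&lt;/code&gt; shape, &lt;code&gt;checkAgainstMandate&lt;/code&gt; and the amounts (in pence) are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical mandate shape; the real parameters are agreed with the user's bank.
interface Mandate {
  merchantId: string;
  perPaymentCapMinor: number;  // per-transaction cap, in pence
  monthlyCapMinor: number;     // monthly cap, in pence
  spentThisMonthMinor: number; // already paid under this mandate this month
}

type Decision = { ok: true } | { ok: false; reason: string };

// True when value is strictly greater than limit.
function isOver(value: number, limit: number): boolean {
  return Math.sign(value - limit) === 1;
}

function checkAgainstMandate(m: Mandate, merchantId: string, amountMinor: number): Decision {
  if (merchantId !== m.merchantId) return { ok: false, reason: "merchant not covered by mandate" };
  if (!Number.isInteger(amountMinor) || Math.sign(amountMinor) !== 1) {
    return { ok: false, reason: "invalid amount" };
  }
  if (isOver(amountMinor, m.perPaymentCapMinor)) return { ok: false, reason: "over per-payment cap" };
  if (isOver(m.spentThisMonthMinor + amountMinor, m.monthlyCapMinor)) {
    return { ok: false, reason: "over monthly cap" };
  }
  return { ok: true };
}

// Usage: £100 already spent against a £120 monthly cap.
const mandate: Mandate = {
  merchantId: "energy-co",
  perPaymentCapMinor: 5000,
  monthlyCapMinor: 12000,
  spentThisMonthMinor: 10000,
};
console.log(JSON.stringify(checkAgainstMandate(mandate, "energy-co", 1500))); // {"ok":true}
console.log(JSON.stringify(checkAgainstMandate(mandate, "energy-co", 2500))); // {"ok":false,"reason":"over monthly cap"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;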



&lt;p&gt;&lt;strong&gt;For the auth layer:&lt;/strong&gt; Don't build one. Seriously. The bank handles it. Your agent's job is to orchestrate the payment flow, not to authenticate the user. That's separation of concerns, and it's the right call whether you're building for agents or humans.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real question no one's asking
&lt;/h2&gt;

&lt;p&gt;Everyone is asking "how do we make card payments work for AI agents?"&lt;/p&gt;

&lt;p&gt;I think that's the wrong question.&lt;/p&gt;

&lt;p&gt;The right question is: &lt;strong&gt;why are we starting with the payment rail that requires the most human interaction, then engineering the human out?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open banking was built on the premise that software talks to bank APIs. The authentication layer lives at the bank. The payment instruction is a clean API call. Settlement is instant or same-day. There's no card number to protect, no 3DS challenge to render, no interchange fee eating your margin on micro-transactions.&lt;/p&gt;

&lt;p&gt;It wasn't designed for AI agents. But its architecture fits the agentic pattern better than anything the card networks are retrofitting.&lt;/p&gt;

&lt;p&gt;Sometimes the future doesn't come from the company with the biggest R&amp;amp;D budget. Sometimes it comes from the infrastructure that was built on the right abstraction in the first place.&lt;/p&gt;

&lt;p&gt;The UK has live open banking rails, live VRPs, and an FCA that's actively building the regulatory framework for what comes next. If you're a developer building agentic commerce for UK customers, you have a head start. Use it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Arun, CTO at Atoa. We build open banking payment infrastructure for UK merchants. If you want to see what the API looks like: &lt;a href="https://docs.atoa.me" rel="noopener noreferrer"&gt;docs.atoa.me&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Try the sandbox → &lt;a href="https://docs.atoa.me" rel="noopener noreferrer"&gt;docs.atoa.me&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>fintech</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>FCA Just Rewrote the Open Banking Playbook for 2026. Here's What UK Payment Developers Actually Need to Know</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Wed, 13 May 2026 10:08:03 +0000</pubDate>
      <link>https://dev.to/mickyarun/fca-just-rewrote-the-open-banking-playbook-for-2026-heres-what-uk-payment-developers-actually-8gg</link>
      <guid>https://dev.to/mickyarun/fca-just-rewrote-the-open-banking-playbook-for-2026-heres-what-uk-payment-developers-actually-8gg</guid>
      <description>&lt;p&gt;I'm the CTO of an FCA-authorised Payment Institution. I spend most of my week either writing payments code or reading FCA / PSR consultation papers so my team doesn't have to.&lt;/p&gt;

&lt;p&gt;2026 is the year that work stopped being optional for everyone else.&lt;/p&gt;

&lt;p&gt;Three things shifted in the first half of the year, and if you ship anything that touches UK payments — checkouts, wallets, invoicing, recurring billing — at least two of them affect your roadmap. Most developer threads I read are still pattern-matching open banking onto "OAuth, but for banks." The actual changes are more interesting than that, and a lot more consequential.&lt;/p&gt;

&lt;p&gt;Here's the part I wish someone had handed me as a one-pager.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Variable Recurring Payments went from "spec" to "shipping"
&lt;/h2&gt;

&lt;p&gt;The FCA confirmed that the first commercial Variable Recurring Payments (VRPs) under the UK Payments Initiative scheme started flowing in Q1 2026. Phase 1 covers utilities, financial services top-ups, and payments to local and central government. (&lt;a href="https://www.regulationtomorrow.com/eu/fca-psr-joint-statement-on-open-banking-pricing-models/" rel="noopener noreferrer"&gt;FCA / PSR joint statement on open banking pricing models&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;For developers, this is the line where open banking stops being a one-off-payment toy and starts looking like a real subscription rail.&lt;/p&gt;

&lt;p&gt;What changes in your code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consent has duration now.&lt;/strong&gt; A VRP consent is the closest thing UK open banking has ever had to "card-on-file." The user authorises a mandate — amount caps per period, total cap, expiry — and you can initiate payments against it without a fresh SCA dance every time. That's a different state machine than single-payment PIS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need to store mandate IDs, not card tokens.&lt;/strong&gt; The shape of the entity you persist is closer to a Direct Debit instruction than a Stripe &lt;code&gt;payment_method&lt;/code&gt;. Caps, frequency, last-used timestamp, status, revocation timestamp.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Revocation is bank-side, not app-side.&lt;/strong&gt; Users can kill a VRP mandate inside their banking app and you get a webhook telling you it's gone. If your retry logic assumes the mandate is still good because the user didn't cancel in &lt;em&gt;your&lt;/em&gt; UI, you'll log a lot of 401s.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing is now regulated.&lt;/strong&gt; The FCA / PSR joint statement in 2026 explicitly addressed VRP pricing models. Translation: this is no longer a back-room commercial conversation between ASPSPs and TPPs. The rate card has rails.&lt;/li&gt;
&lt;/ul&gt;
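&lt;p&gt;A minimal sketch of the mandate record described above, assuming you persist it yourself. The field names, the &lt;code&gt;MandateStatus&lt;/code&gt; values, and the &lt;code&gt;markRevoked&lt;/code&gt;, &lt;code&gt;isChargeable&lt;/code&gt; and &lt;code&gt;markUsed&lt;/code&gt; helpers are hypothetical, not any provider's schema. The point is that charge and retry logic reads your stored status, and a bank-side revocation webhook is what updates it.&lt;/p&gt;

```typescript
// Hypothetical persisted record for a VRP mandate. Field names are illustrative.
type MandateStatus = 'pending' | 'active' | 'revoked' | 'expired';

interface MandateRecord {
  mandateId: string;          // provider's mandate ID: stored instead of a card token
  status: MandateStatus;
  period: string;             // e.g. 'monthly'
  maxAmountPerPeriod: number; // pence
  maxTotalAmount: number;     // pence
  expiresAt: Date;
  lastUsedAt: Date | null;
  revokedAt: Date | null;
}

// Apply a bank-side revocation (e.g. delivered by webhook). Idempotent:
// a repeated notice keeps the original revokedAt.
function markRevoked(mandate: MandateRecord, revokedAt: Date): MandateRecord {
  if (mandate.status === 'revoked') return mandate;
  return { ...mandate, status: 'revoked', revokedAt };
}

// Gate every charge and every retry on stored state, not on whether the
// user cancelled in your own UI.
function isChargeable(mandate: MandateRecord, now: Date): boolean {
  if (mandate.status !== 'active') return false;
  return mandate.expiresAt.getTime() > now.getTime();
}

// Record a successful charge so lastUsedAt stays accurate.
function markUsed(mandate: MandateRecord, at: Date): MandateRecord {
  return { ...mandate, lastUsedAt: at };
}
```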

&lt;p&gt;If you run a monthly-billed SaaS product and your customers are UK consumers, this is the first year where "switch from cards to open banking" is a real engineering conversation, not a 2027 prediction.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The FCA is getting actual rule-making powers over open banking
&lt;/h2&gt;

&lt;p&gt;The other big shift: the Data (Use and Access) Act gives the FCA new statutory powers to set open banking rules directly, rather than acting through the legacy CMA-mandated framework. The FCA has said it will consult on the long-term regulatory framework before the end of 2026, with workshops over the summer and autumn. (&lt;a href="https://www.fca.org.uk/news/news-stories/open-banking-2025-progress" rel="noopener noreferrer"&gt;FCA: Open banking — a year of progress&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;A developer-facing translation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Open Banking Implementation Entity (OBIE) era is winding down. A "Future Entity" is being stood up — the FCA wrote to trade associations in late 2025 and expected the process to kick off around February 2026. (&lt;a href="https://www.regulationtomorrow.com/2026/01/open-banking-process-to-establish-a-future-entity/" rel="noopener noreferrer"&gt;Open banking — process to establish a Future Entity&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;The Open Banking Standard you've been integrating against will get formalised into FCA rules instead of CMA orders. Same spec, harder enforcement, broader scope.&lt;/li&gt;
&lt;li&gt;Expect new categories of regulated activity to land in scope: open finance (savings, pensions, investments, insurance) is being framed as the next phase, and the FCA published a vision paper for it. (&lt;a href="https://www.fca.org.uk/news/press-releases/fca-sets-out-vision-open-finance" rel="noopener noreferrer"&gt;FCA: vision for open finance&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're betting on open banking, you're now betting on a rail with a clearer regulator on top of it. That tends to be good for serious builders and bad for people who were treating PISP authorisation as optional.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The architecture decision hasn't changed, but the stakes have
&lt;/h2&gt;

&lt;p&gt;The architectural choice for any UK payment developer is still binary, and it's the same one I wrote about in &lt;a href="https://dev.to/mickyarun"&gt;What Developers Get Wrong About PSD2&lt;/a&gt; a few weeks ago:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Become an FCA-authorised PISP yourself.&lt;/strong&gt; Months of compliance work. A regulated entity. Ongoing capital requirements. Direct ASPSP relationships. Worth it if payments is your business.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrate against an authorised provider.&lt;/strong&gt; Days, not months. One API. Webhook semantics that don't change every time a CMA9 bank updates their consent flow.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What 2026 changes is the &lt;em&gt;cost of getting this wrong&lt;/em&gt;. With VRPs going live, open finance on the roadmap, and the FCA now sitting on direct rule-making powers, the compliance surface is going to grow. If you've shipped a bank-by-bank integration, every spec revision has to be re-implemented once per bank, so that surface grows with every bank you support. We learned this running payments services across the CMA9: every quarter, something changes.&lt;/p&gt;

&lt;p&gt;If you're not a payments company, don't become one by accident.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the integration actually looks like
&lt;/h2&gt;

&lt;p&gt;Same minimal pattern as a single-payment PIS, slightly different envelope for VRP. A mandate creation against Atoa looks roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 1. Create a VRP mandate&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mandate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.atoa.me/v1/mandates&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;ATOA_API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;reference&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;subscription&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;monthly&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;max_amount_per_period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4999&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// £49.99 cap per month&lt;/span&gt;
    &lt;span class="na"&gt;max_total_amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;599999&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;// £5,999.99 lifetime cap&lt;/span&gt;
    &lt;span class="na"&gt;expires_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2027-05-13&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;redirect_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://app.example.com/mandates/return&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="c1"&gt;// 2. Send the customer to mandate.authorisation_url&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mandate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;authorisation_url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 3. Later, charge against the mandate without re-authing the user&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;payment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`https://api.atoa.me/v1/mandates/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;mandate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/payments`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;ATOA_API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4999&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GBP&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;reference&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
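&lt;p&gt;The amounts in that snippet are integer minor units: &lt;code&gt;4999&lt;/code&gt; is £49.99. If prices reach your code as decimal strings, a small converter avoids floating-point drift (in JavaScript, &lt;code&gt;4.35 * 100&lt;/code&gt; evaluates to &lt;code&gt;434.99999999999994&lt;/code&gt;). &lt;code&gt;toPence&lt;/code&gt; and &lt;code&gt;formatPence&lt;/code&gt; below are hypothetical helpers, not part of the Atoa API.&lt;/p&gt;

```typescript
// Hypothetical helpers for GBP minor units (pence).

// Parse "49.99", "10" or "0.5" into integer pence without floating-point maths.
function toPence(pounds: string): number {
  const match = /^(\d+)(?:\.(\d{1,2}))?$/.exec(pounds.trim());
  if (match === null) {
    throw new Error(`Not a valid GBP amount: "${pounds}"`);
  }
  const whole = Number(match[1]);
  const fraction = (match[2] ?? '').padEnd(2, '0'); // "5" becomes "50", "" becomes "00"
  return whole * 100 + Number(fraction);
}

// Format integer pence for display, e.g. 599999 as "£5999.99".
function formatPence(pence: number): string {
  if (!Number.isInteger(pence) || !(pence >= 0)) {
    throw new Error(`Expected non-negative integer pence, got ${pence}`);
  }
  const pounds = Math.floor(pence / 100);
  const rest = String(pence % 100).padStart(2, '0');
  return `£${pounds}.${rest}`;
}

console.log(toPence('49.99'));    // 4999
console.log(formatPence(599999)); // £5999.99
```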



&lt;p&gt;The user does SCA once, when the mandate is created. Every subsequent charge is a single API call. No 3DS dialogs. No saved-card vault. No PAN data on your servers.&lt;/p&gt;

&lt;p&gt;This is the thing developers don't realise until they ship it: the new open banking rail isn't a worse Stripe — it's a different shape, and once VRPs are live, that shape covers most of the cases card-on-file used to.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do this quarter
&lt;/h2&gt;

&lt;p&gt;If you're shipping UK payments in 2026, three concrete moves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add VRP to your payments roadmap, even if you don't ship it this quarter.&lt;/strong&gt; The schema you'd need to model mandates is going to land in your codebase eventually. Sketch the data model now while you're still designing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decide whether you're integrating directly or through an aggregator before the FCA consultation lands.&lt;/strong&gt; The new rules will reset the bar for "compliant" — if you're piecing together bank integrations, you've now got a regulatory clock on top of an engineering one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subscribe to the FCA consultation feed.&lt;/strong&gt; Genuinely. The 2026 papers will set the rails for the next five years. (&lt;a href="https://www.fca.org.uk/news/press-releases/fca-sets-out-vision-open-finance" rel="noopener noreferrer"&gt;FCA: vision for open finance&lt;/a&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the moment open banking stops being "the alternative" and starts being "the rail." If you're a developer shipping UK payments, the bet you're being asked to make this year is whether you want to build on the rail before it's mainstream, or after.&lt;/p&gt;

&lt;p&gt;I'd build on it now.&lt;/p&gt;




&lt;p&gt;If you want to try the API and skip the regulatory headache, the sandbox is open: &lt;a href="https://docs.atoa.me" rel="noopener noreferrer"&gt;docs.atoa.me&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>api</category>
      <category>backend</category>
      <category>news</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Payment Webhooks Will Lie To You. Here's How We Built Ones That Don't (in NestJS)</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Wed, 29 Apr 2026 11:48:31 +0000</pubDate>
      <link>https://dev.to/mickyarun/payment-webhooks-will-lie-to-you-heres-how-we-built-ones-that-dont-in-nestjs-30g9</link>
      <guid>https://dev.to/mickyarun/payment-webhooks-will-lie-to-you-heres-how-we-built-ones-that-dont-in-nestjs-30g9</guid>
      <description>&lt;p&gt;A payment webhook fires once. You miss it. The customer thinks they paid. Your dashboard says they didn't.&lt;/p&gt;

&lt;p&gt;Welcome to my Tuesday morning, two years ago.&lt;/p&gt;

&lt;p&gt;I've shipped four payment webhook systems in my career. The first three taught me everything I now refuse to do again. The fourth — the one running inside Atoa today — handles open banking payment notifications across our Node.js services without a single missed event in the last 14 months.&lt;/p&gt;

&lt;p&gt;Here's the boring, opinionated, production-tested pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  The lie webhooks tell you
&lt;/h2&gt;

&lt;p&gt;Every payment platform sells webhooks the same way:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"We'll notify your endpoint the moment the payment status changes."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What they don't sell you on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Webhooks &lt;strong&gt;retry&lt;/strong&gt;. Sometimes 8 times. Sometimes never.&lt;/li&gt;
&lt;li&gt;Webhooks &lt;strong&gt;arrive out of order&lt;/strong&gt;. &lt;code&gt;failed&lt;/code&gt; can land before &lt;code&gt;pending&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Webhooks &lt;strong&gt;lie about idempotency&lt;/strong&gt;. Two &lt;code&gt;succeeded&lt;/code&gt; events for the same payment is normal, not a bug.&lt;/li&gt;
&lt;li&gt;Webhooks &lt;strong&gt;drop&lt;/strong&gt;. A network blip, a pod restart, a bad DNS lookup: one missed delivery and your reconciliation is wrong.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your webhook handler is a 30-line controller that updates a row in your database, you don't have a payment system. You have a hope.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four-layer pattern
&lt;/h2&gt;

&lt;p&gt;Every webhook flow we run at Atoa has four layers. Skip any one and you'll be reconciling spreadsheets at midnight.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Verify the signature &lt;em&gt;before&lt;/em&gt; you parse the body
&lt;/h3&gt;

&lt;p&gt;The most common bug I see in code reviews from junior devs: parsing the JSON before checking the HMAC.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// webhook.controller.ts&lt;/span&gt;
&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;atoa&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Headers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-atoa-signature&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;RawBody&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// raw, not parsed&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;UnauthorizedException&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;received&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two non-negotiables:&lt;/p&gt;

&lt;ul&gt;
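&lt;li&gt;Use the &lt;strong&gt;raw body&lt;/strong&gt; for HMAC verification. The signature covers the exact bytes the sender transmitted; once NestJS's default JSON parser has turned them into an object, re-serialising it won't reproduce those bytes (whitespace, escaping, number formatting), and your signature check will fail. Enable &lt;code&gt;rawBody: true&lt;/code&gt; when creating the app (&lt;code&gt;NestFactory.create(AppModule, { rawBody: true })&lt;/code&gt;) so the unparsed buffer is still available.&lt;/li&gt;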
&lt;li&gt;Reject before you do &lt;em&gt;anything else&lt;/em&gt;. No DB hits, no logging the payload at info level, nothing.&lt;/li&gt;
&lt;/ul&gt;
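&lt;p&gt;A minimal sketch of what a &lt;code&gt;verify(body, signature, secret)&lt;/code&gt; helper might do, assuming the sender signs the raw body with HMAC-SHA256 and sends a hex digest in the header. That scheme is an assumption: check your provider's docs for the actual algorithm, encoding, and header name. It uses Node's &lt;code&gt;crypto.timingSafeEqual&lt;/code&gt; so the comparison doesn't leak how many bytes matched.&lt;/p&gt;

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical signature check. Assumes HMAC-SHA256 over the raw body, sent as
// a hex digest; confirm the real scheme with your provider.
function verifySignature(
  rawBody: Buffer,
  signatureHex: string | undefined,
  secret: string,
): boolean {
  if (typeof signatureHex !== 'string') return false; // header missing
  const expected = createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws if the lengths differ, so check that first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

&lt;p&gt;Note that a body with the same JSON content but different whitespace produces a different digest, which is exactly why the check has to run on the raw buffer.&lt;/p&gt;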

&lt;h3&gt;
  
  
  2. Acknowledge fast. Process slow.
&lt;/h3&gt;

&lt;p&gt;The webhook controller does two things: verify, enqueue. That's it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// verify (above)&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;payment.webhook&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;received&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;  &lt;span class="c1"&gt;// 200 within ~50ms&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your handler takes 8 seconds because you're hitting Stripe + your DB + sending an email, the sender will time out and retry. Now you have two events. Then four. Then the on-call engineer.&lt;/p&gt;

&lt;p&gt;We use BullMQ on Redis. You can use SQS, NATS, Kafka — pick your poison. The point is: &lt;strong&gt;the HTTP response is decoupled from the work&lt;/strong&gt;.&lt;/p&gt;
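&lt;p&gt;The split can be sketched without any infrastructure. &lt;code&gt;InMemoryQueue&lt;/code&gt;, &lt;code&gt;handleWebhook&lt;/code&gt; and &lt;code&gt;drain&lt;/code&gt; are hypothetical stand-ins: the handler does constant-time work and returns, and the slow work runs later in a worker loop. Unlike BullMQ or SQS, this queue has no persistence or retries, so treat it as an illustration of the shape only.&lt;/p&gt;

```typescript
// Hypothetical in-memory stand-in for BullMQ/SQS. No persistence, no retries:
// it only shows where the work happens.
type WebhookEvent = { event_id: string; payment_id: string; status: string };

class InMemoryQueue {
  private readonly jobs: WebhookEvent[] = [];

  enqueue(event: WebhookEvent): void {
    this.jobs.push(event);
  }

  dequeue(): WebhookEvent | undefined {
    return this.jobs.shift();
  }
}

// Request path: after the signature check, only enqueue and respond.
function handleWebhook(queue: InMemoryQueue, event: WebhookEvent): { received: boolean } {
  queue.enqueue(event);
  return { received: true };
}

// Worker path: runs outside the HTTP request, so slow work (DB writes, emails,
// downstream APIs) can't push the sender past its timeout.
async function drain(queue: InMemoryQueue, work: (event: WebhookEvent) => unknown) {
  let processed = 0;
  for (let job = queue.dequeue(); job !== undefined; job = queue.dequeue()) {
    await work(job);
    processed += 1;
  }
  return processed;
}
```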

&lt;h3&gt;
  
  
  3. Idempotency keys are not optional
&lt;/h3&gt;

&lt;p&gt;Every event has an &lt;code&gt;event_id&lt;/code&gt;. Before you do anything in your worker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Processor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;payment.webhook&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WebhookProcessor&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Job&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;WebhookEvent&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;payment_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;seen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;firstSeen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;seen&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Duplicate event &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; — skipping`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;applyStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payment_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



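&lt;p&gt;&lt;code&gt;firstSeen&lt;/code&gt; is an insert into a Postgres table with &lt;code&gt;event_id&lt;/code&gt; as the primary key. If the insert succeeds, this is the first time we've seen this event. If it conflicts, we've processed it before. Run that insert in the same transaction as the status update that follows; otherwise a worker that crashes between the two leaves the event marked as seen, and the retry gets skipped as a duplicate. No race conditions, no Redis dance: just let the database do the work it's good at.&lt;/p&gt;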
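&lt;p&gt;A minimal sketch of the &lt;code&gt;firstSeen&lt;/code&gt; semantics, assuming a table named &lt;code&gt;webhook_events&lt;/code&gt; (hypothetical). In Postgres it is one &lt;code&gt;INSERT ... ON CONFLICT (event_id) DO NOTHING&lt;/code&gt;, where the affected row count tells you whether the row is new; here a &lt;code&gt;Set&lt;/code&gt; stands in for the primary-key index so the example runs without a database. &lt;code&gt;EventLedger&lt;/code&gt; is an illustrative name, not part of any library.&lt;/p&gt;

```typescript
// Hypothetical sketch of firstSeen. In Postgres this is a single statement:
//   INSERT INTO webhook_events (event_id) VALUES ($1)
//   ON CONFLICT (event_id) DO NOTHING
// and the affected row count is 1 for a new event_id, 0 for a duplicate.
// A Set stands in for the primary-key index so this runs without a database.
class EventLedger {
  private readonly ids = new Set();

  // Mirrors the INSERT: reports how many rows were written.
  async insert(eventId: string) {
    if (this.ids.has(eventId)) return { rowCount: 0 };
    this.ids.add(eventId);
    return { rowCount: 1 };
  }

  // True only the first time a given event_id is recorded.
  async firstSeen(eventId: string) {
    const { rowCount } = await this.insert(eventId);
    return rowCount === 1;
  }
}
```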

&lt;h3&gt;
  
  
  4. State machines, not status updates
&lt;/h3&gt;

&lt;p&gt;This is the one that took me three failed payment systems to learn.&lt;/p&gt;

&lt;p&gt;A payment doesn't have a "status field." It has a &lt;strong&gt;state machine&lt;/strong&gt;. Some transitions are legal. Most aren't.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ALLOWED&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PaymentStatus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;PaymentStatus&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;initiated&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;authorising&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;authorising&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;succeeded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;succeeded&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;            &lt;span class="c1"&gt;// terminal&lt;/span&gt;
  &lt;span class="na"&gt;failed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;               &lt;span class="c1"&gt;// terminal&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;applyStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;PaymentStatus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;payment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;ALLOWED&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;payment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Illegal transition: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;payment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; → &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// do not update, do not throw — this is normal&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why this matters: webhooks will arrive out of order, so a stale &lt;code&gt;failed&lt;/code&gt; event can land after &lt;code&gt;succeeded&lt;/code&gt; has already been applied. Your code must not downgrade a &lt;code&gt;succeeded&lt;/code&gt; payment to &lt;code&gt;failed&lt;/code&gt;. With a state machine, the invalid transition is dropped, the reconciler picks it up later, and the customer's payment stays correct.&lt;/p&gt;
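&lt;p&gt;Here is a minimal, framework-free sketch that shows the guard dropping an out-of-order event. It reuses the &lt;code&gt;ALLOWED&lt;/code&gt; table from the snippet above and feeds it a list of webhook events. &lt;code&gt;WebhookEvent&lt;/code&gt; and &lt;code&gt;applyAll&lt;/code&gt; are hypothetical names, and a local variable stands in for the database row.&lt;/p&gt;

```typescript
type PaymentStatus = "initiated" | "authorising" | "succeeded" | "failed";

// Same transition table as the snippet above.
const ALLOWED: { [S in PaymentStatus]: PaymentStatus[] } = {
  initiated: ["authorising", "failed"],
  authorising: ["succeeded", "failed"],
  succeeded: [], // terminal
  failed: [], // terminal
};

// Hypothetical event shape; real provider payloads will differ.
interface WebhookEvent {
  eventId: string;
  status: PaymentStatus;
}

// Applies events in arrival order. Illegal transitions are logged and skipped,
// mirroring the early return in applyStatus: no update, no throw.
function applyAll(start: PaymentStatus, events: WebhookEvent[]): PaymentStatus {
  let current = start;
  for (const event of events) {
    if (ALLOWED[current].indexOf(event.status) === -1) {
      console.warn(`Illegal transition: ${current} -> ${event.status} (${event.eventId})`);
      continue;
    }
    current = event.status;
  }
  return current;
}

// A stale "failed" that arrives after "succeeded" cannot downgrade the payment.
const finalStatus = applyAll("initiated", [
  { eventId: "evt_1", status: "authorising" },
  { eventId: "evt_2", status: "succeeded" },
  { eventId: "evt_3", status: "failed" },
]);
console.log(finalStatus); // succeeded
```

&lt;p&gt;Terminal states have empty transition lists, so anything arriving after them is ignored by construction. No special-case code is needed for late events.&lt;/p&gt;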

&lt;h2&gt;
  
  
  What we'd never do again
&lt;/h2&gt;

&lt;p&gt;Three patterns I see in the wild that I had to unlearn:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Polling instead of webhooks&lt;/strong&gt;. "We'll just check the status every 30 seconds." Sure — and you'll burn rate limits, miss the 5-second window where a customer is staring at the spinner, and pay for compute that does nothing 99% of the time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replaying webhooks by re-running the handler&lt;/strong&gt;. If the handler does five things, replaying it does five things again. Idempotency keys mean replays are free.&lt;/li&gt;
&lt;li&gt;
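&lt;strong&gt;Logging the full payload at info level&lt;/strong&gt;. Payment payloads can carry payer names and account details, so full-payload logs put personal data (in the GDPR sense) into your log store. Log the event_id and the status. Nothing else.&lt;/li&gt;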
&lt;/ol&gt;
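&lt;p&gt;Points two and three fit in a few lines. This is a sketch under one assumption: every webhook carries a unique &lt;code&gt;event_id&lt;/code&gt;. &lt;code&gt;PaymentWebhook&lt;/code&gt;, &lt;code&gt;handleWebhook&lt;/code&gt; and the in-memory &lt;code&gt;processedEvents&lt;/code&gt; map are hypothetical. A real system would use a table with a unique constraint instead.&lt;/p&gt;

```typescript
// Hypothetical webhook shape; field names are illustrative only.
interface PaymentWebhook {
  event_id: string;
  payment_id: string;
  status: string;
  payer_name?: string; // personal data: fine to process, never to log
}

// Hypothetical in-memory dedupe store. In production this would be a table with a
// unique constraint on event_id, written in the same transaction as the status update.
const processedEvents: { [eventId: string]: boolean } = {};
const sideEffects: string[] = []; // stands in for DB writes, emails, ledger entries

function handleWebhook(event: PaymentWebhook): "processed" | "duplicate" {
  if (processedEvents[event.event_id] === true) {
    // Provider retry or manual replay: acknowledge it, repeat nothing.
    return "duplicate";
  }
  processedEvents[event.event_id] = true;
  sideEffects.push(`set ${event.payment_id} to ${event.status}`);
  // Log the identifiers and the status only, never the raw payload.
  console.info(JSON.stringify({ event_id: event.event_id, status: event.status }));
  return "processed";
}

const delivery: PaymentWebhook = {
  event_id: "evt_42",
  payment_id: "pay_1",
  status: "succeeded",
  payer_name: "A. Parent",
};
const first = handleWebhook(delivery);
const replay = handleWebhook(delivery); // the same event delivered a second time
console.log(first, replay, sideEffects.length); // processed duplicate 1
```

&lt;p&gt;Because the dedupe check comes before any side effect, replaying a webhook is just a second call that returns early.&lt;/p&gt;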

&lt;h2&gt;
  
  
  Where this gets you
&lt;/h2&gt;

&lt;p&gt;We process open banking payment notifications across dozens of UK merchants on this exact pattern. Zero missed events in 14 months. Reconciliation runs once a day and finds nothing to reconcile.&lt;/p&gt;

&lt;p&gt;The pattern doesn't care which payment provider you use. Stripe, GoCardless, Atoa — same four layers.&lt;/p&gt;

&lt;p&gt;If you want to see what these webhooks look like on the open banking side, our API docs walk through the full payment lifecycle and the webhook events we fire: &lt;a href="https://docs.atoa.me/api-reference/Payment/process-payment" rel="noopener noreferrer"&gt;docs.atoa.me&lt;/a&gt;. Sandbox is free, no card needed.&lt;/p&gt;

&lt;p&gt;Build the boring layers first. Sleep through Tuesday mornings.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Arun is co-founder &amp;amp; CTO of &lt;a href="https://paywithatoa.co.uk" rel="noopener noreferrer"&gt;Atoa&lt;/a&gt;, a UK open banking payments platform. He's &lt;a class="mentioned-user" href="https://dev.to/mickyarun"&gt;@mickyarun&lt;/a&gt; on X and dev.to. Driven by passion.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>nestjs</category>
      <category>webhooks</category>
      <category>openbanking</category>
      <category>node</category>
    </item>
    <item>
      <title>I Asked Three Coding Agents to Build My Son's Cricket Coach a Website. The Result Wasn't Decided by the Model — It Was Decided by Taste.</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Tue, 28 Apr 2026 08:51:56 +0000</pubDate>
      <link>https://dev.to/mickyarun/i-asked-three-coding-agents-to-build-my-sons-cricket-coach-a-website-the-result-wasnt-decided-by-3fam</link>
      <guid>https://dev.to/mickyarun/i-asked-three-coding-agents-to-build-my-sons-cricket-coach-a-website-the-result-wasnt-decided-by-3fam</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fog0y03vew7mee6anv6xy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fog0y03vew7mee6anv6xy.png" alt=" " width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — Codex GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro. Same prompt. Same 18 photos. Five total runs across different effort budgets. The one that won wasn't the prettiest. It was the one that understood the job: parents in Bengaluru enquire on WhatsApp, not contact forms.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;My son's cricket coach asked me for a website.&lt;/p&gt;

&lt;p&gt;Saturday afternoon. He runs &lt;strong&gt;Bangalore Royal Cricket Academy&lt;/strong&gt; — a small but seriously good cricket academy for kids. He had two phone numbers, a folder of 18 WhatsApp photos taken by parents, and a single line of brief: &lt;em&gt;"Like a real cricket academy, parents should be able to call or WhatsApp from their phone."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I'm a CTO. I'm in the trenches with AI coding agents most weeks. This felt like a clean, low-stakes test.&lt;/p&gt;

&lt;p&gt;So I gave the &lt;strong&gt;exact same prompt&lt;/strong&gt; and the &lt;strong&gt;exact same 18 photos&lt;/strong&gt; to three coding agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Codex&lt;/strong&gt; (GPT-5.5, medium effort)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic Claude Opus 4.7&lt;/strong&gt; (low effort, then re-run on medium)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Gemini 3.1 Pro&lt;/strong&gt; (low effort, then re-run on high)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Five outputs. One Saturday. Five very different opinions on what "a cricket academy website" actually is.&lt;/p&gt;

&lt;p&gt;I went in expecting a verdict on visual quality. I didn't get one. I got something more interesting.&lt;/p&gt;




&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;The prompt was deliberately short:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Build a single-page website for Bangalore Royal Cricket Academy. Brand line: "Nurturing champions, one delivery at a time." Programs: Summer Camp, Weekday Batch, Weekend Batch, Intensive (elite). Two phone numbers. The photos are in &lt;code&gt;/photos for website&lt;/code&gt;. Parents should be able to contact us easily from their phone.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's it. No design system. No colour palette. No mention of WhatsApp by name. No mention of tests, deployment, SEO meta, or Cloudflare. Whatever each agent decided "easily contact us from their phone" meant — that was on the agent.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I got back, in five outputs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Claude Opus 4.7, low effort
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F761vdk2apqzybhinstvj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F761vdk2apqzybhinstvj.png" alt=" " width="800" height="2503"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Single-file HTML, Tailwind via CDN, Bebas Neue display font, royal navy + gold palette.&lt;/p&gt;

&lt;p&gt;The headline made me sit up: &lt;strong&gt;"CHAMPIONS ARE / BUILT HERE."&lt;/strong&gt; with the second half in gold. The hero felt like it belonged on a printed flyer the coach would hand out at a school. Visually polished.&lt;/p&gt;

&lt;p&gt;Engineering-wise, thin: no tests, no OG tags (just a &lt;code&gt;&amp;lt;meta description&amp;gt;&lt;/code&gt;), photos referenced as &lt;code&gt;img-01.jpg&lt;/code&gt;…&lt;code&gt;img-18.jpg&lt;/code&gt;, all 14 used in a uniform 4-column grid. &lt;code&gt;tel:&lt;/code&gt; links only. No WhatsApp.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Claude Opus 4.7, medium effort
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ulaugzkrw1uou3urocp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ulaugzkrw1uou3urocp.png" alt=" " width="800" height="2381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Same starting point, completely different output.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;section&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"top"&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"relative h-screen min-h-[640px] w-full overflow-hidden"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"absolute inset-0"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;img&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"assets/photos/brca-01.jpeg"&lt;/span&gt; &lt;span class="na"&gt;alt=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"kenburns"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"absolute inset-0 bg-gradient-to-b from-navy-deep/85 via-navy/70 to-navy-deep/95"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
  ...
&lt;span class="nt"&gt;&amp;lt;/section&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full-screen hero with a &lt;strong&gt;Ken Burns animation&lt;/strong&gt; on the image. A scroll indicator with an animated dot inside a mouse outline. A &lt;strong&gt;gold cricket-seam pattern divider&lt;/strong&gt; between sections — actual dashed lines that look like ball stitching. Two-image collage in the about section with offset margins. CSS-columns masonry gallery using all 15 photos. Inline-SVG favicon as a data URI (one fewer request). OG tags. &lt;code&gt;theme-color&lt;/code&gt;. WhatsApp deep-link button on the contact section.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;a&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"https://wa.me/917337726777?text=Hi%20BRCA%2C%20I%27d%20like%20to%20know%20more%20about%20your%20programs."&lt;/span&gt;
   &lt;span class="na"&gt;target=&lt;/span&gt;&lt;span class="s"&gt;"_blank"&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"noopener"&lt;/span&gt;
   &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"bg-gold text-navy font-semibold px-6 py-3.5 rounded-md"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  💬 Message us on WhatsApp
&lt;span class="nt"&gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was the prettiest output of the five. By a clear margin. Bebas Neue + Inter, Ken Burns, gold seam, masonry — the only one I'd let near a printer.&lt;/p&gt;

&lt;p&gt;Still Tailwind via CDN. Still no test suite. Still no automated deploy. Photos renamed with a brand prefix (&lt;code&gt;brca-01.jpeg&lt;/code&gt;), though still numbered rather than named by content.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Codex GPT-5.5, medium effort
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnu67o7yrce765tiz7n4z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnu67o7yrce765tiz7n4z.png" alt=" " width="800" height="2949"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vanilla HTML + 800-line vanilla CSS + 16 lines of vanilla JS. White-and-navy local-business layout. Numbered "01–04" feature blocks. WhatsApp green CTAs in the contact section.&lt;/p&gt;

&lt;p&gt;It looks less editorial than Claude-medium. It also does five things none of the others matched.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One.&lt;/strong&gt; It picked &lt;strong&gt;6 photos&lt;/strong&gt; out of 18 and renamed them by content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brca-team-ground.jpeg
brca-trophy-team.jpeg
brca-trophy-presentation.jpeg
brca-young-achievers.jpeg
brca-coaching-moment.jpeg
brca-floodlight-batch.jpeg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's editorial judgement encoded in code output. It chose; it didn't dump everything into a grid.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two.&lt;/strong&gt; It wrote a &lt;code&gt;_headers&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;/*
  &lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Content&lt;/span&gt;-&lt;span class="n"&gt;Type&lt;/span&gt;-&lt;span class="n"&gt;Options&lt;/span&gt;: &lt;span class="n"&gt;nosniff&lt;/span&gt;
  &lt;span class="n"&gt;Referrer&lt;/span&gt;-&lt;span class="n"&gt;Policy&lt;/span&gt;: &lt;span class="n"&gt;strict&lt;/span&gt;-&lt;span class="n"&gt;origin&lt;/span&gt;-&lt;span class="n"&gt;when&lt;/span&gt;-&lt;span class="n"&gt;cross&lt;/span&gt;-&lt;span class="n"&gt;origin&lt;/span&gt;
  &lt;span class="n"&gt;Permissions&lt;/span&gt;-&lt;span class="n"&gt;Policy&lt;/span&gt;: &lt;span class="n"&gt;camera&lt;/span&gt;=(), &lt;span class="n"&gt;microphone&lt;/span&gt;=(), &lt;span class="n"&gt;geolocation&lt;/span&gt;=()

/&lt;span class="n"&gt;assets&lt;/span&gt;/*
  &lt;span class="n"&gt;Cache&lt;/span&gt;-&lt;span class="n"&gt;Control&lt;/span&gt;: &lt;span class="n"&gt;public&lt;/span&gt;, &lt;span class="n"&gt;max&lt;/span&gt;-&lt;span class="n"&gt;age&lt;/span&gt;=&lt;span class="m"&gt;31536000&lt;/span&gt;, &lt;span class="n"&gt;immutable&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Security headers and cache rules. I didn't ask for them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three.&lt;/strong&gt; It wrote a real test suite using &lt;code&gt;node:test&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
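&lt;pre class="highlight javascript"&gt;&lt;code&gt;import { test } from 'node:test';
import assert from 'node:assert/strict';
import { existsSync } from 'node:fs';
import { readFile } from 'node:fs/promises';

// Hypothetical helpers (not shown in the excerpt): assumes this file sits one folder below the site root.
const root = new URL('../', import.meta.url);
const text = (file) =&amp;gt; readFile(new URL(file, root), 'utf8');

&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;home page exposes call and WhatsApp enrollment links&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;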
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;index.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/href="tel:&lt;/span&gt;&lt;span class="se"&gt;\+&lt;/span&gt;&lt;span class="sr"&gt;917337726777"/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/href="tel:&lt;/span&gt;&lt;span class="se"&gt;\+&lt;/span&gt;&lt;span class="sr"&gt;917337736777"/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/https:&lt;/span&gt;&lt;span class="se"&gt;\/\/&lt;/span&gt;&lt;span class="sr"&gt;wa&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="sr"&gt;me&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;917337726777/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/https:&lt;/span&gt;&lt;span class="se"&gt;\/\/&lt;/span&gt;&lt;span class="sr"&gt;wa&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="sr"&gt;me&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;917337736777/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

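&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;referenced local assets and Cloudflare Pages config exist&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;index.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;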
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;imageRefs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;matchAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/src="&lt;/span&gt;&lt;span class="se"&gt;([^&lt;/span&gt;&lt;span class="sr"&gt;"&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;\.(?:&lt;/span&gt;&lt;span class="sr"&gt;jpg|jpeg|png|webp&lt;/span&gt;&lt;span class="se"&gt;))&lt;/span&gt;&lt;span class="sr"&gt;"/gi&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;imageRefs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;at least six academy photos are used&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ref&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;imageRefs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;existsSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; exists`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three tests in all (two are excerpted above). Together they assert the brand text, both phone numbers, both WhatsApp links, that the security headers file exists, the responsive CSS, and that &lt;strong&gt;every referenced image actually exists on disk&lt;/strong&gt;. That last one is the one I respect most: it catches the single most common silent break in a static site.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Four.&lt;/strong&gt; Every primary CTA is a &lt;code&gt;wa.me&lt;/code&gt; deep link with &lt;strong&gt;prefilled message text&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;a&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"contact-link whatsapp"&lt;/span&gt;
   &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"https://wa.me/917337726777?text=Hi%20BRCA%2C%20I%20would%20like%20to%20know%20more%20about%20cricket%20training."&lt;/span&gt;
   &lt;span class="na"&gt;target=&lt;/span&gt;&lt;span class="s"&gt;"_blank"&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"noopener"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;span&amp;gt;&lt;/span&gt;WhatsApp&lt;span class="nt"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;strong&amp;gt;&lt;/span&gt;7337726777&lt;span class="nt"&gt;&amp;lt;/strong&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not just &lt;code&gt;wa.me/91…&lt;/code&gt;. &lt;strong&gt;Pre-filled message text.&lt;/strong&gt; Parent taps. Message lands. Zero typing.&lt;/p&gt;
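&lt;p&gt;Generating these links takes about one line per number. The sketch below assumes a 10-digit Indian mobile number and WhatsApp's click-to-chat format, &lt;code&gt;https://wa.me/&amp;lt;number&amp;gt;?text=...&lt;/code&gt;. &lt;code&gt;contactLinks&lt;/code&gt; is a hypothetical helper, not code from any of the agents.&lt;/p&gt;

```typescript
// Hypothetical helper, not taken from any agent's output. Assumes a 10-digit
// Indian mobile number; "91" is India's country calling code.
function contactLinks(localNumber: string, message: string) {
  // Keep digits only, then drop any leading trunk-prefix zeros (e.g. "073377 26777").
  const digits = localNumber.replace(/\D/g, "").replace(/^0+/, "");
  const international = "91" + digits;
  return {
    tel: "tel:+" + international,
    // wa.me takes the full number with no "+", and the prefilled
    // message URL-encoded in the `text` query parameter.
    whatsapp: "https://wa.me/" + international + "?text=" + encodeURIComponent(message),
  };
}

const links = contactLinks("73377 26777", "Hi BRCA, I would like to know more about cricket training.");
console.log(links.tel); // tel:+917337726777
console.log(links.whatsapp);
// https://wa.me/917337726777?text=Hi%20BRCA%2C%20I%20would%20like%20to%20know%20more%20about%20cricket%20training.
```

&lt;p&gt;One small difference from Claude-medium's hand-written link: &lt;code&gt;encodeURIComponent&lt;/code&gt; leaves apostrophes unencoded, so "I'd" stays as-is rather than becoming &lt;code&gt;%27&lt;/code&gt;. Both forms decode to the same message.&lt;/p&gt;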

&lt;p&gt;&lt;strong&gt;Five.&lt;/strong&gt; It deployed it. It opened my browser, walked me through a Cloudflare OAuth handshake, then pushed the build to Cloudflare Pages. The &lt;code&gt;.wrangler/cache/pages.json&lt;/code&gt; left behind:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"account_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"project_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"brca-academy"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most coding agents stop at &lt;em&gt;"here's the HTML."&lt;/em&gt; Codex stopped at a live URL. That distinction — treating &lt;em&gt;"build a website"&lt;/em&gt; as a unit of work that includes shipping, not just generating markup — is what made me rate it the most production-ready output of the five.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Gemini 3.1 Pro, low effort
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzfgauyi415d82r7gqrar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzfgauyi415d82r7gqrar.png" alt=" " width="800" height="2586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Dark slate background. Electric blue + amber accents. 60 lines of vanilla JS with an IntersectionObserver scroll-reveal effect.&lt;/p&gt;

&lt;p&gt;It looked like a SaaS analytics dashboard. Wrong audience by about ten years. Photos referenced as &lt;code&gt;photo_1.jpeg&lt;/code&gt;…&lt;code&gt;photo_18.jpeg&lt;/code&gt;. Tel: only.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Gemini 3.1 Pro, high effort
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmtvs96ps1v54ua89a04.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmtvs96ps1v54ua89a04.png" alt=" " width="800" height="2183"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Palette fixed: navy + amber. Playfair Display + Outfit for typography. About section with an image collage and an "Elite Training Facility" badge. Wider elite-program card with a dedicated highlights box. Mobile menu with hamburger.&lt;/p&gt;

&lt;p&gt;Visually, a different website from the low-effort version. Genuinely better.&lt;/p&gt;

&lt;p&gt;What it still didn't have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WhatsApp deep links. Anywhere. Tel: only.&lt;/li&gt;
&lt;li&gt;OG tags or &lt;code&gt;theme-color&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;A test suite.&lt;/li&gt;
&lt;li&gt;A deployment config.&lt;/li&gt;
&lt;li&gt;Semantic photo names — still &lt;code&gt;img1.jpeg&lt;/code&gt; through &lt;code&gt;img8.jpeg&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More budget bought better visuals. It didn't buy better judgement about what a Bengaluru cricket academy website is &lt;em&gt;for&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What actually decided it
&lt;/h2&gt;

&lt;p&gt;Not the prettiest hero. Not the cleverest animation.&lt;/p&gt;

&lt;p&gt;This:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;In Bengaluru, parents enquire on WhatsApp. Not email. Not contact forms. Not phone calls until they've messaged first.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The single biggest conversion lever for an Indian local business website is &lt;code&gt;wa.me&lt;/code&gt; deep linking with prefilled message text. Parent opens the page. Parent taps the button. WhatsApp opens with "Hi BRCA, I would like to know more about cricket training" already typed. They send. Coach gets a notification.&lt;/p&gt;

&lt;p&gt;Codex did this on every primary CTA. Claude-medium did it as one button at the bottom of the contact section. Claude-low, Gemini-low, and Gemini-high didn't do it at all.&lt;/p&gt;

&lt;p&gt;That single decision was worth more than the prettiest hero in the comparison.&lt;/p&gt;




&lt;h2&gt;
  
  
  The thing I wasn't expecting
&lt;/h2&gt;

&lt;p&gt;I went in assuming effort budget would be the variable that explained quality differences.&lt;/p&gt;

&lt;p&gt;Compare what happened when I raised the effort budget on the two models I re-ran:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude (low → medium):&lt;/strong&gt; The visual quality jumped from "pretty" to "editorial-grade". It added Ken Burns animation, a masonry gallery, OG tags, a &lt;code&gt;theme-color&lt;/code&gt;, &lt;strong&gt;and a WhatsApp button&lt;/strong&gt;, and it renamed photos from &lt;code&gt;img-XX.jpg&lt;/code&gt; to &lt;code&gt;brca-XX.jpeg&lt;/code&gt;. The model used the extra budget to upgrade both taste &lt;em&gt;and&lt;/em&gt; product judgement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini (low → high):&lt;/strong&gt; The visual quality jumped. The palette got fixed. The typography got upgraded. The layout got more sophisticated.&lt;/p&gt;

&lt;p&gt;It still didn't add WhatsApp.&lt;/p&gt;

&lt;p&gt;It still didn't write tests.&lt;/p&gt;

&lt;p&gt;It still didn't deploy.&lt;/p&gt;

&lt;p&gt;It still left photos as &lt;code&gt;img1.jpeg&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;More budget didn't teach the model what the website was &lt;em&gt;for&lt;/em&gt;. It only taught it to make the wrong website prettier.&lt;/p&gt;

&lt;p&gt;The headline isn't &lt;em&gt;Codex won because GPT-5.5 is the best model&lt;/em&gt;. The headline is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Effort budget isn't the variable that explains output quality. Taste is.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Codex on a single medium run produced more production-ready output than Gemini on high. Claude on medium produced the most beautiful site in the lineup. Gemini on high produced a much-improved-but-still-fundamentally-misjudged website.&lt;/p&gt;

&lt;p&gt;The extra budget surfaced what each model already understood about the job. It didn't change what the model thought the job was.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sidebar: Two paths to a Cloudflare token
&lt;/h2&gt;

&lt;p&gt;Worth mentioning because it's the kind of thing CTOs care about.&lt;/p&gt;

&lt;p&gt;When each agent needed to deploy to Cloudflare Pages, they took one of two paths:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Path A — silent OAuth.&lt;/strong&gt; Codex (medium) and Gemini (low) opened my browser, walked me through Cloudflare's OAuth flow, and got a session. Fast. Smooth. I never saw the token. The agent now has access to my entire Cloudflare account for the duration of that session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Path B — paste-your-own-token.&lt;/strong&gt; Claude (at every effort level) and Gemini (at high effort) said: "Go to Cloudflare → My Profile → API Tokens → Create Token with these specific scopes — Account: Cloudflare Pages: Edit — and paste it here. I won't see your account session." More friction at setup time. Also more control: the token is scoped, I can see exactly what I gave the agent, and I can rotate or revoke it without touching my main session.&lt;/p&gt;

&lt;p&gt;Both are defensible. Path A optimises for time-to-deploy. Path B optimises for credential hygiene.&lt;/p&gt;

&lt;p&gt;If you're a solo developer building a side project, Path A is probably fine. If you're running production infrastructure for a fintech and an AI agent is asking for credentials, Path B is the only answer. Two of the three agents landed on Path B at higher effort levels (Claude always, Gemini at medium and above), which suggests their "thoughtful" modes are also more security-aware. Codex stuck with silent OAuth even at medium. Worth knowing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this means for picking a coding agent in 2026
&lt;/h2&gt;

&lt;p&gt;Three takeaways, none of them about benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One. Test the agent on a job, not on a problem.&lt;/strong&gt; "Build a website" and "build a website that converts WhatsApp leads for an Indian local business" are different evaluations. The first is a syntax exercise. The second tells you whether the agent can read the room.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two. Effort budgets are amplifiers, not teachers.&lt;/strong&gt; They make a model more of what it already is. If a model doesn't understand the job at low effort, high effort will produce a more polished version of the wrong thing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three. Production scaffolding is the cheapest signal of seriousness.&lt;/strong&gt; Tests. Headers. OG meta. A &lt;code&gt;404.html&lt;/code&gt;. Curated photos with content-aware filenames. None of these were in my prompt. The agent that wrote all of them on its own is the one I trust with code I can't review line by line.&lt;/p&gt;




&lt;h2&gt;
  
  
  Coda — what actually shipped
&lt;/h2&gt;

&lt;p&gt;I have to be honest about something the single-shot benchmark couldn't capture.&lt;/p&gt;

&lt;p&gt;Codex won my engineering eval. That stands. It's the one I'd hand a junior dev and say &lt;em&gt;"ship it."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But the one I reached for next was &lt;strong&gt;Claude&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It took two more prompts to the medium-effort Claude (&lt;em&gt;"add a persistent WhatsApp floating button," "add a three-card contact section like a real local business, with primary office / coaching desk / WhatsApp"&lt;/em&gt;), plus a bit of browser automation to handle the Cloudflare deploy and DNS, before the site went live at &lt;strong&gt;&lt;a href="https://brca.in/" rel="noopener noreferrer"&gt;brca.in&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's the version the coach is using today. WhatsApp floating button. Three contact cards. A "Free trial session available" pill the coach asked for after the first parent enquiry. A schedule strip. Custom domain. Live HTTPS.&lt;/p&gt;

&lt;p&gt;Why Claude and not Codex, given my own engineering verdict?&lt;/p&gt;

&lt;p&gt;Because the single-shot test answers &lt;em&gt;"which agent has the best instincts."&lt;/em&gt; The shipping test answers &lt;em&gt;"which agent do I want as a collaborator."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Those are different questions. They had different answers for me.&lt;/p&gt;

&lt;p&gt;Claude was the one I wanted to keep editing. The Bebas Neue and gold-seam aesthetic, the masonry gallery, the Ken Burns hero: those were the parts of the design I didn't want to throw away. Codex's output was more correct. Claude's output was the one I had a relationship with.&lt;/p&gt;

&lt;p&gt;That's a real signal. Worth saying out loud.&lt;/p&gt;




&lt;h2&gt;
  
  
  The closer
&lt;/h2&gt;

&lt;p&gt;The coach got a website. Parents got a WhatsApp button. The site is live at &lt;strong&gt;&lt;a href="https://brca.in/" rel="noopener noreferrer"&gt;brca.in&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The first parent message landed in the inbox before sundown. &lt;em&gt;"Hi BRCA, I would like to know more about cricket training."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The one-shot finding holds: at first contact, taste decided the comparison. Codex's instinct for what an Indian local business website needed to do was sharper than that of any other model in the lineup.&lt;/p&gt;

&lt;p&gt;But the part of the comparison nobody benchmarks is the part that matters most after the demo: &lt;strong&gt;which agent do you actually want to keep working with&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For me, on this job, it was Claude.&lt;/p&gt;

&lt;p&gt;Champions are built here. Apparently websites too.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Site live at &lt;a href="https://brca.in/" rel="noopener noreferrer"&gt;brca.in&lt;/a&gt;. Drop a comment if you'd like the source code for all five runs; I'm happy to share the GitHub repo.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I'd love to know:&lt;/strong&gt; which agent are you reaching for in 2026, and what's the smallest job you've used to test whether it actually understands the room? Reply below.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>webdev</category>
      <category>startup</category>
    </item>
  </channel>
</rss>
