<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: arun rajkumar</title>
    <description>The latest articles on DEV Community by arun rajkumar (@mickyarun).</description>
    <link>https://dev.to/mickyarun</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3835684%2F4771b603-8faa-42b1-9e0e-0687faea63a3.jpg</url>
      <title>DEV Community: arun rajkumar</title>
      <link>https://dev.to/mickyarun</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mickyarun"/>
    <language>en</language>
    <item>
      <title>Open Banking vs Card Rails: Latency, Cost, and Developer Experience</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Wed, 10 Jun 2026 08:08:10 +0000</pubDate>
      <link>https://dev.to/mickyarun/open-banking-vs-card-rails-latency-cost-and-developer-experience-2knh</link>
      <guid>https://dev.to/mickyarun/open-banking-vs-card-rails-latency-cost-and-developer-experience-2knh</guid>
      <description>&lt;p&gt;I've integrated both. Cards and open banking. In production. Moving real money for real UK merchants.&lt;/p&gt;

&lt;p&gt;So when developers ask me "which is actually better to build on?" I don't give them the marketing answer. I give them the three numbers that decide it: how much it costs, how fast the money moves, and how much of your life you lose to the integration.&lt;/p&gt;

&lt;p&gt;Let me walk through all three. Honestly. Including where cards still win.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Cost — and why it's not close
&lt;/h2&gt;

&lt;p&gt;Here's the part nobody at the card networks wants in a headline.&lt;/p&gt;

&lt;p&gt;A UK card payment costs you somewhere between &lt;strong&gt;1.5% and 3%+&lt;/strong&gt; per transaction once you stack interchange, scheme fees, and your processor's margin. The "0.2% debit interchange cap" everyone quotes is the floor of the floor — it's not what lands on your statement.&lt;/p&gt;

&lt;p&gt;An open banking payment costs roughly &lt;strong&gt;0.1%–1.0%, or a flat 20p–50p&lt;/strong&gt;. No interchange. No scheme fee. Because there's no scheme. The customer authorises the payment inside their own banking app and the bank moves the money directly.&lt;/p&gt;

&lt;p&gt;The concrete version: a local garage takes £500 for a repair. A 1.5% card fee costs them £7.50. The same payment over open banking can cost around 10p. (&lt;a href="https://noda.live/articles/open-banking-costs-uk" rel="noopener noreferrer"&gt;Noda&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;That's not a rounding difference. That's the difference between a payments line item you tolerate and one you forget exists.&lt;/p&gt;
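&lt;p&gt;To rerun that arithmetic with your own numbers, here is a minimal sketch that works in integer pence and basis points. The rates are simply the percentage ranges quoted in this section, not any provider's price list, and &lt;code&gt;percentageFee&lt;/code&gt; and &lt;code&gt;formatPounds&lt;/code&gt; are hypothetical helper names.&lt;/p&gt;

```javascript
// Fee comparison in integer pence, using the percentage ranges quoted above.
// Rates are illustrative assumptions, not a real price list.
function percentageFee(amountPence, basisPoints) {
  // 100 basis points = 1%, so divide by 10,000 to apply the rate
  return Math.round((amountPence * basisPoints) / 10000);
}

function formatPounds(pence) {
  return '£' + (pence / 100).toFixed(2);
}

const repair = 50000; // the £500 garage repair, in pence

const cardLow = percentageFee(repair, 150);         // 1.5%
const cardHigh = percentageFee(repair, 300);        // 3%
const openBankingLow = percentageFee(repair, 10);   // 0.1%
const openBankingHigh = percentageFee(repair, 100); // 1.0%

console.log('card:', formatPounds(cardLow), 'to', formatPounds(cardHigh));
// card: £7.50 to £15.00
console.log('open banking:', formatPounds(openBankingLow), 'to', formatPounds(openBankingHigh));
// open banking: £0.50 to £5.00
```

&lt;p&gt;Working in pence and basis points keeps every intermediate value an integer, which avoids the floating-point drift you get from multiplying pounds by 0.015.&lt;/p&gt;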

&lt;p&gt;And there's a second-order cost cards carry that nobody puts in the pricing table: &lt;strong&gt;chargebacks&lt;/strong&gt;. £20 a pop, plus the engineering time to fight them, plus the fraud surface you have to defend. Open banking payments are bank-authenticated at source. There's no card number to steal and no "I didn't authorise this" dispute when the customer tapped approve in their own banking app. The fraud surface is smaller, so the price &lt;em&gt;can&lt;/em&gt; be lower. The two facts are connected.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Latency — settlement, not the spinner
&lt;/h2&gt;

&lt;p&gt;This is where developers get the comparison wrong, so let me split it cleanly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authorisation latency&lt;/strong&gt; — the spinner the user stares at — is comparable. Both flows take a few seconds. A card auth round-trips the network; an open banking payment redirects the user to their bank's SCA and back. From the user's chair, both feel like "tap, wait, done."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Settlement latency&lt;/strong&gt; is where they diverge violently.&lt;/p&gt;

&lt;p&gt;A card payment authorises instantly but &lt;em&gt;settles&lt;/em&gt; in ~2 business days. The money is promised, then it sits in limbo, then it arrives — minus fees, and reversible for months.&lt;/p&gt;

&lt;p&gt;An open banking payment runs over &lt;strong&gt;Faster Payments&lt;/strong&gt;. Settlement is near-instant — seconds to minutes — straight bank-to-bank, 24/7. There's no two-day float, no "pending payout" dashboard, no reconciling Tuesday's sales against Thursday's deposit. (&lt;a href="https://payop.com/business/the-role-of-open-banking-in-enabling-faster-payments-and-real-time-settlement/" rel="noopener noreferrer"&gt;Payop&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;If you've ever written reconciliation code, you already feel why this matters. Half the complexity in payments tooling exists to model the gap between "authorised" and "settled." Close that gap to near-zero and a whole category of state machine — pending, settling, settled, partially-reversed — collapses into one event: paid.&lt;/p&gt;
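&lt;p&gt;One way to picture that collapse is to write both lifecycles as transition tables. This is a rough sketch only: the state names are illustrative and do not come from any processor's API, and &lt;code&gt;canTransition&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```javascript
// Illustrative state names only; real processors use their own vocabularies.
const cardTransitions = {
  authorised: ['captured', 'voided'],
  captured: ['settling'],
  settling: ['settled'],
  settled: ['partially_reversed', 'reversed'],
  partially_reversed: ['reversed'],
  voided: [],
  reversed: [],
};

const openBankingTransitions = {
  awaiting_authorisation: ['paid', 'failed'],
  paid: [],
  failed: [],
};

// True when the table allows moving from one state to another
function canTransition(table, from, to) {
  const allowed = table[from] || [];
  return allowed.includes(to);
}

console.log(Object.keys(cardTransitions).length);        // 7
console.log(Object.keys(openBankingTransitions).length); // 3
console.log(canTransition(openBankingTransitions, 'awaiting_authorisation', 'paid')); // true
console.log(canTransition(openBankingTransitions, 'paid', 'awaiting_authorisation')); // false
```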

&lt;h2&gt;
  
  
  3. Developer experience — where I'll be honest both ways
&lt;/h2&gt;

&lt;p&gt;Let me give cards their due first.&lt;/p&gt;

&lt;p&gt;Card SDKs are mature. Stripe's docs are art. The card flow is a solved, copy-paste problem with twenty years of Stack Overflow behind it. If you're doing global, card-first commerce, that maturity is worth real money. I'd still reach for cards there.&lt;/p&gt;

&lt;p&gt;Open banking is younger, and the early ecosystem was genuinely painful — you were integrating against dozens of bank APIs, each with its own quirks, its own auth dance, its own downtime. That's the part that earned open banking its "hard to build on" reputation a few years ago.&lt;/p&gt;

&lt;p&gt;But that reputation is now outdated, and here's why: the bank-by-bank mess is exactly what a good PISP abstracts away. You don't integrate 40 banks. You integrate &lt;strong&gt;one API&lt;/strong&gt; that speaks Payment Initiation, handles SCA, manages the consent lifecycle, and fans out to Faster Payments for you.&lt;/p&gt;

&lt;p&gt;In practice, the flow is short:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 1. Create a payment — you describe intent, not card mechanics&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;payment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;atoa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;processPayment&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4999&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;// £49.99 in minor units&lt;/span&gt;
  &lt;span class="na"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GBP&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;reference&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;order_10472&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;redirectUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://yourapp.com/return&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// 2. Send the customer to their bank to authorise (SCA happens here)&lt;/span&gt;
&lt;span class="nf"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;authorisationUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 3. The bank moves the money over Faster Payments.&lt;/span&gt;
&lt;span class="c1"&gt;//    You get told when it's actually settled — not "authorised, check back Thursday."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Webhook: the event you actually care about is real, not a promise&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhooks/atoa&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;              &lt;span class="c1"&gt;// verify signature&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;payment.completed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;fulfilOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reference&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;    &lt;span class="c1"&gt;// money is already in the bank&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what's &lt;em&gt;missing&lt;/em&gt; from that code. No card object. No PCI scope to inherit. No tokenisation vault to secure. No &lt;code&gt;requires_capture&lt;/code&gt; → &lt;code&gt;capture&lt;/code&gt; two-step. No chargeback webhook to handle. You describe a payment, the customer approves it in their bank, the money arrives, you fulfil. The thing you're modelling is the thing that actually happens.&lt;/p&gt;

&lt;p&gt;That's the DX argument in one sentence: &lt;strong&gt;open banking lets you write code that matches reality instead of code that models a 1970s settlement delay.&lt;/strong&gt;&lt;/p&gt;
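&lt;p&gt;The webhook handler above calls &lt;code&gt;verify(req)&lt;/code&gt; without showing it. As a rough sketch of what such a helper often looks like, the example below assumes the provider signs the raw request body with HMAC-SHA256 and sends a hex digest. That scheme, the secret format, and the name &lt;code&gt;verifySignature&lt;/code&gt; are all assumptions for illustration, so check the provider's documentation for the real one.&lt;/p&gt;

```javascript
const crypto = require('crypto');

// Hypothetical helper: assumes an HMAC-SHA256 hex digest of the raw body.
// The actual signature scheme is not described in this post.
function verifySignature(rawBody, signatureHex, secret) {
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected, 'utf8');
  const b = Buffer.from(String(signatureHex), 'utf8');
  // timingSafeEqual throws on unequal lengths, so compare lengths first
  if (a.length !== b.length) return false;
  return crypto.timingSafeEqual(a, b);
}

// Usage with a made-up secret and payload
const secret = 'whsec_example';
const body = JSON.stringify({ type: 'payment.completed', data: { reference: 'order_10472' } });
const goodSignature = crypto.createHmac('sha256', secret).update(body).digest('hex');

console.log(verifySignature(body, goodSignature, secret)); // true
console.log(verifySignature(body, 'tampered', secret));    // false
```

&lt;p&gt;The constant-time comparison is the part worth keeping whatever the scheme turns out to be: a plain string comparison can leak how many leading characters matched.&lt;/p&gt;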

&lt;h2&gt;
  
  
  The honest scorecard
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Card rails&lt;/th&gt;
&lt;th&gt;Open banking&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost per txn&lt;/td&gt;
&lt;td&gt;1.5%–3%+&lt;/td&gt;
&lt;td&gt;0.1%–1% / 20p–50p&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Settlement&lt;/td&gt;
&lt;td&gt;~2 business days&lt;/td&gt;
&lt;td&gt;Near-instant (Faster Payments)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chargebacks&lt;/td&gt;
&lt;td&gt;Yes, £20+ each&lt;/td&gt;
&lt;td&gt;Structurally absent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PCI scope&lt;/td&gt;
&lt;td&gt;Yours to carry&lt;/td&gt;
&lt;td&gt;Not your problem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SDK maturity&lt;/td&gt;
&lt;td&gt;Excellent, 20 yrs&lt;/td&gt;
&lt;td&gt;Younger, but abstracted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Global, card-first&lt;/td&gt;
&lt;td&gt;UK consumers, instant settlement&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cards aren't dead. If your customers are international and card-native, build on cards. I mean that.&lt;/p&gt;

&lt;p&gt;But if you're a UK SaaS, marketplace, or merchant tool charging UK consumers, you're paying card prices and eating card latency for an experience your users don't need. The numbers back the switch, and they're moving in one direction: UK open banking payments are up &lt;strong&gt;53% year on year&lt;/strong&gt;, with nearly 1 in 3 adults already using open banking. (&lt;a href="https://www.openbanking.org.uk/news/open-banking-surges-to-15-million-uk-users-as-july-marks-record-adoption/" rel="noopener noreferrer"&gt;Open Banking Ltd&lt;/a&gt;)&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;The fastest way to feel the difference is to build it. Atoa's sandbox gives you a real Payment Initiation API, real Faster Payments settlement, and webhooks that fire when money actually moves — not when it's promised. &lt;a href="https://docs.atoa.me" rel="noopener noreferrer"&gt;docs.atoa.me&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Spin up a test payment. Watch it settle in seconds instead of days. Then look at what you'd have paid in card fees for the same transaction.&lt;/p&gt;

&lt;p&gt;That second number is the one that changes your mind.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you shipped both card and open banking flows in production? I want to hear where the DX actually broke down for you — the messy bits, not the brochure version.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;#OpenBanking #Payments #Fintech #API #BuildInPublic&lt;/p&gt;

</description>
      <category>openbanking</category>
      <category>payments</category>
      <category>fintech</category>
      <category>api</category>
    </item>
    <item>
      <title>I Replaced Scrum, Jira, and Our Wiki With 12 AI Agents on a Mac Mini</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Mon, 08 Jun 2026 16:33:49 +0000</pubDate>
      <link>https://dev.to/mickyarun/i-replaced-scrum-jira-and-our-wiki-with-12-ai-agents-on-a-mac-mini-o7o</link>
      <guid>https://dev.to/mickyarun/i-replaced-scrum-jira-and-our-wiki-with-12-ai-agents-on-a-mac-mini-o7o</guid>
      <description>&lt;p&gt;A survey last week put it at 54%. More than half the code shipped today is AI-generated.&lt;/p&gt;

&lt;p&gt;In my own work the number is probably higher. AI writes the first draft. AI estimates the work. AI generates the tests. I've written before about &lt;a href="https://bodhiorchard.ai/" rel="noopener noreferrer"&gt;the dangerous 20%&lt;/a&gt; — the edge cases, the illegal state transitions, the judgment AI quietly skips. That 20% is why I still need senior engineers.&lt;/p&gt;

&lt;p&gt;But there's a second 20% problem nobody talks about. Not in the code. Around it.&lt;/p&gt;

&lt;p&gt;Sprints. Story points. Standups. Jira boards no one updates. Confluence pages that went stale the day they were written. Every one of those tools assumes a human does the work and another human tracks the work.&lt;/p&gt;

&lt;p&gt;That's not my team anymore.&lt;/p&gt;

&lt;p&gt;So I stopped bending fifteen-year-old process around an AI-native team. I built my own way of working and open-sourced it. It runs on a Mac mini in the corner of my room. This is what's inside.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvi1z23hgj1eqg5uuabxo.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvi1z23hgj1eqg5uuabxo.jpg" alt=" " width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Your whole org as a grove. Each repo is a tree, each feature a branch, each teammate present in the world. More on this below — but yes, that's the actual dashboard.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The thing that finally broke me: the wiki
&lt;/h2&gt;

&lt;p&gt;Here's the moment it clicked.&lt;/p&gt;

&lt;p&gt;A new feature needed context. I opened our wiki. The page was six months old. It described an architecture we'd refactored twice since. The "source of truth" was confidently, completely wrong — and three engineers had made decisions based on it that week.&lt;/p&gt;

&lt;p&gt;Documentation lies the moment you stop maintaining it. And nobody maintains it, because maintaining it is the busywork we all silently agree to skip.&lt;/p&gt;

&lt;p&gt;Source code doesn't lie. It can't. It's the thing that actually runs.&lt;/p&gt;

&lt;p&gt;So the first rule of the system I built: &lt;strong&gt;the code is the wiki.&lt;/strong&gt; Knowledge is extracted from the repository — the call graph, the module boundaries, the patterns, the history — and indexed continuously. When an agent or a human asks "how does settlement work?", the answer is reconstructed from what's true &lt;em&gt;right now&lt;/em&gt;, not from a page someone wrote last quarter and abandoned.&lt;/p&gt;

&lt;p&gt;No Confluence. No Notion graveyard. The only document that's allowed to be authoritative is the one that compiles.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fprdkldeibebkbp5iwzjw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fprdkldeibebkbp5iwzjw.jpg" alt=" " width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Nobody wrote this wiki. A baseline scan read the repositories and produced it — 19 live features across 4 repos, each one traceable to the code that backs it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And you don't even open the dashboard to read it. Ask in Slack, in plain English ("are we progressing on the P3 backlog item? what's the go-live date?") and a bot answers from the live BUD, the per-feature Business Understanding Document: status, assignee, target date, a link back to the source. Not a number someone typed into a board last Tuesday. The thing that's actually true, right now.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Finmclkcrllcudohyp0be.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Finmclkcrllcudohyp0be.jpg" alt=" " width="800" height="1031"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The same emoji-react, thread-reply Slack you already live in — except the answers come from the source of truth, not from memory.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So "the code is the wiki" isn't a slogan — it's an architecture. Knowledge lives in four layers that stay in sync on their own:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The repos themselves&lt;/strong&gt; — source code plus a per-repo &lt;code&gt;CLAUDE.md&lt;/code&gt;, synced on every PR merge to main.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent skills&lt;/strong&gt; — org standards, design guidelines, API patterns; synced on change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The central store&lt;/strong&gt; — BUDs, enterprise rules, architecture decisions; real-time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector search&lt;/strong&gt; — semantic search across all of it, auto-indexed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Three things make this more than a fancy &lt;code&gt;grep&lt;/code&gt;. It indexes &lt;strong&gt;code locations&lt;/strong&gt;, so any knowledge captured during development points back to the exact file and symbol it came from. It links &lt;strong&gt;across repos&lt;/strong&gt;, so a frontend call is connected to the backend handler it actually hits, not left as two disconnected facts in two different wikis. And it stays current: after every PR merge, the affected feature is updated with the new commit history and the new code locations automatically, so the next agent that touches it inherits the &lt;em&gt;current&lt;/em&gt; truth, not last month's.&lt;/p&gt;

&lt;p&gt;That's the whole pitch against Confluence — auto-synced from source instead of hand-maintained, semantically searchable instead of keyword-matched, always current with daily staleness detection, and wired straight into the agents' prompts so they're never reasoning from a stale page.&lt;/p&gt;




&lt;h2&gt;
  
  
  Agent-Driven Development, in one table
&lt;/h2&gt;

&lt;p&gt;I call the methodology Agent-Driven Development (ADD). The simplest way to explain it is to put it next to the thing it replaces.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agile ceremony&lt;/th&gt;
&lt;th&gt;What it assumed&lt;/th&gt;
&lt;th&gt;Agent-Driven Development&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sprint planning&lt;/td&gt;
&lt;td&gt;Humans do all the work, so plan their hours&lt;/td&gt;
&lt;td&gt;Agents draft; humans decide what's worth building&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Story points / planning poker&lt;/td&gt;
&lt;td&gt;Gut-feel proxy for time&lt;/td&gt;
&lt;td&gt;AI-PERT + Monte Carlo → real P50/P70/P85 &lt;strong&gt;dates&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jira tickets&lt;/td&gt;
&lt;td&gt;Work scattered across a board&lt;/td&gt;
&lt;td&gt;One &lt;strong&gt;BUD&lt;/strong&gt; per feature: spec + tech plan + tests + history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Confluence / wiki&lt;/td&gt;
&lt;td&gt;Someone keeps docs current (nobody does)&lt;/td&gt;
&lt;td&gt;Knowledge syncs from the source code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Daily standup&lt;/td&gt;
&lt;td&gt;Humans report status out loud&lt;/td&gt;
&lt;td&gt;A Status Agent reads the PRs and tells you what moved&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrospective&lt;/td&gt;
&lt;td&gt;A meeting you forget by Friday&lt;/td&gt;
&lt;td&gt;A Learning Agent mines the actual diffs and incidents&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern underneath all six rows is the same: &lt;strong&gt;let the machines handle the noise, so humans spend their judgment where judgment actually matters.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The 12 agents
&lt;/h2&gt;

&lt;p&gt;Here's the whole cycle on one diagram before I break it down — twelve agents around a loop, with a human reviewing at the centre and at every gate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frshmr16yk8wnd5wm50gm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frshmr16yk8wnd5wm50gm.jpg" alt=" " width="800" height="614"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Chat Intake (Triage) → BUD → Design → Tech Architecture (Tech Lead reviews; Smart Assignment picks the dev) → Development (AI + Human) → Test Generation → Testing (QA) → UAT &amp;amp; Deploy (Status) → Feature → Learning &amp;amp; Skills. An external bug reopens the feature. The loop never pretends it's a straight line.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;ADD runs a feature from a chat message to production through a chain of specialised agents. Each owns one phase. A human reviews and decides at every gate — this is human-in-the-loop by design, not lights-out automation.&lt;/p&gt;

&lt;p&gt;It starts in Slack. You drop a request; the &lt;strong&gt;Intake agent&lt;/strong&gt; doesn't just file it — it checks for existing features and BUDs so you don't build a duplicate, then asks the questions a good PM would: who is this for, why now, what's the timeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3tweovntidxq7zzu2wf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3tweovntidxq7zzu2wf.jpg" alt=" " width="800" height="621"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Change the notification icon to modern design?" → the agent checks for duplicates, then interrogates the intent before a single line is written.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;From there, every feature moves through the same seven-phase lifecycle, each phase a tab on its BUD:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Slack idea → Intake → Requirements → Design → Tech Spec
   → Development → Code Review → Testing → Prod
        ↑ estimation, status, learning and skills run alongside ↑
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3pb4wyc64lt9ch3isj1n.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3pb4wyc64lt9ch3isj1n.jpg" alt=" " width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Every phase can run on an agent — or you flip it off and drive it yourself from your local AI via MCP. "Stage agents are off, you're driving this BUD" is a real toggle, per phase, per assignee. That's what human-in-the-loop actually looks like.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Around that spine sit the agents that kill the ceremonies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Estimation&lt;/strong&gt; — AI-PERT + Monte Carlo instead of story points (below).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Status&lt;/strong&gt; — reads the PRs so you never run another standup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning&lt;/strong&gt; — mines the real diffs and incidents when a BUD closes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills&lt;/strong&gt; — profiles who's strong at what from git history, and feeds it back into estimation and routing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agents do the busywork. You do the deciding. That division is the whole philosophy.&lt;/p&gt;




&lt;h2&gt;
  
  
  The standup reads the work, not the people
&lt;/h2&gt;

&lt;p&gt;I haven't run a status standup in months. The Status Agent runs it at 08:30 on a cron, but the interesting part is &lt;em&gt;where it reads from&lt;/em&gt;. It doesn't ask anyone "what did you do yesterday." It reads what actually happened.&lt;/p&gt;

&lt;p&gt;Hooks and an MCP server in each dev's local setup post the real signal back to the BUD: the prompts, the commits, the sessions. A TODO gets auto-claimed when work starts on it and auto-marked done when the agent finishes the code — so the board reflects reality without anyone updating it. The agent then aggregates the git, PR, bug and chat activity into a summary with risk flags on anything lagging.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyb8wuyjxldlffnofdi85.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyb8wuyjxldlffnofdi85.jpg" alt=" " width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjfqenbi7dao1rdvuxo2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjfqenbi7dao1rdvuxo2.jpg" alt=" " width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Four file-level TODOs, all ticked by the work itself. PR #50 merged, 4 commits, 2 files, 5 sessions, 0 errors — captured from hooks, not typed into a board. The status is a side effect of building, not a separate chore.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And because the Design Agent generates wireframes from your project's &lt;strong&gt;design system extracted out of the code&lt;/strong&gt; — the real CSS tokens, not a guess — what it produces is on-brand by construction. Same with the tech spec: it's written against your actual architecture and tokens, so "follows the brand guidelines" stops being a review comment and becomes the default.&lt;/p&gt;




&lt;h2&gt;
  
  
  The quality loop that reassigns itself
&lt;/h2&gt;

&lt;p&gt;This is the part I'm proudest of, because it's where most teams quietly accumulate debt.&lt;/p&gt;

&lt;p&gt;The Test Plan Agent auto-generates the test plan from the BUD's acceptance criteria and the code — Playwright e2e, unit and integration, security, and the &lt;strong&gt;manual&lt;/strong&gt; UAT cases a human still has to sign off. An MCP token wires your QA automation repo in, so test commits flow straight back to the BUD.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F85p31iol8hr6j0dzvw6u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F85p31iol8hr6j0dzvw6u.jpg" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dgrs9v452zspng225w7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dgrs9v452zspng225w7.jpg" alt=" " width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;24 test cases for one small feature — and notice the manual ones marked "neither can ship as silent regressions, require human sign-off." The agent writes the tests; it doesn't get to wave them through.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Code review is auto-triggered against your org's rules and submitted back on the PR. And here's the loop that closes itself: testing has a &lt;strong&gt;bug threshold&lt;/strong&gt; — complexity × a configurable multiplier. Cross it, and the work auto-reassigns. The original developer moves to bug review, QA rotates to the next waiting BUD, and each bug is auto-classified as a &lt;em&gt;missed feature&lt;/em&gt; versus a &lt;em&gt;development bug&lt;/em&gt; so it takes the right fix path. Quality debt doesn't pile up quietly, because the system reacts to it before a human notices.&lt;/p&gt;
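&lt;p&gt;The threshold rule is simple enough to sketch. The example below assumes a BUD record with &lt;code&gt;complexity&lt;/code&gt; and &lt;code&gt;openBugs&lt;/code&gt; fields and made-up task labels. None of those names come from the actual system, and the bug classification step is left out because the post does not say how it is decided.&lt;/p&gt;

```javascript
// Hypothetical data model: the post gives the formula (complexity times a
// configurable multiplier) but not the field names or task labels used here.
function bugThreshold(complexity, multiplier) {
  return complexity * multiplier;
}

function applyBugThreshold(bud, multiplier) {
  const threshold = bugThreshold(bud.complexity, multiplier);
  // "Cross it" is read here as strictly more open bugs than the threshold
  if (bud.openBugs > threshold) {
    return { ...bud, developerTask: 'bug_review', qaTask: 'next_waiting_bud' };
  }
  return bud;
}

// Made-up numbers for illustration
const bud = { id: 'BUD-241', complexity: 3, openBugs: 7, developerTask: 'development', qaTask: 'testing' };

console.log(applyBugThreshold(bud, 2).developerTask); // bug_review (7 bugs, threshold 6)
console.log(applyBugThreshold(bud, 3).developerTask); // development (7 bugs, threshold 9)
```

&lt;p&gt;Returning a new object and leaving the input untouched keeps the reassignment easy to log as a before and after pair.&lt;/p&gt;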




&lt;h2&gt;
  
  
  The BUD: one document instead of three tools
&lt;/h2&gt;

&lt;p&gt;Every feature lives in a single markdown document called a &lt;strong&gt;BUD&lt;/strong&gt; — Business Understanding Document. Spec, technical spec, test plan, and decision history, all in one place, vector-indexed so any agent can pull it as context.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# BUD-241 · Idempotent webhook handler for refunds&lt;/span&gt;

&lt;span class="gu"&gt;## Intent&lt;/span&gt;
Bank sends the same refund webhook up to 3x. We must process once.

&lt;span class="gu"&gt;## Acceptance criteria&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Duplicate webhook IDs are a no-op (return 200, no state change)
&lt;span class="p"&gt;-&lt;/span&gt; A refund on an already-refunded txn is rejected, not retried
&lt;span class="p"&gt;-&lt;/span&gt; Illegal transition complete → pending is impossible

&lt;span class="gu"&gt;## Tech plan&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Dedup key: (provider, webhook_id) unique in Postgres
&lt;span class="p"&gt;-&lt;/span&gt; Reuse shared &lt;span class="sb"&gt;`refundGuard`&lt;/span&gt; util — do NOT reinvent

&lt;span class="gu"&gt;## History&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; 2026-06-05 design approved (human gate)
&lt;span class="p"&gt;-&lt;/span&gt; 2026-06-05 estimation: P70 = 2 days
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole feature. No ticket in Jira, no spec in Confluence, no test plan in a Google Doc that nobody opens. One file. It travels with the code, and it's the context every agent reads before it touches anything.&lt;/p&gt;




&lt;h2&gt;
  
  
  Killing story points with statistics
&lt;/h2&gt;

&lt;p&gt;Story points always bothered me. They're a proxy for time that we then pretend isn't a proxy for time, and they don't compose across a team where one person knows a module cold and another has never opened it.&lt;/p&gt;

&lt;p&gt;ADD replaces them with AI-PERT plus a Monte Carlo simulation.&lt;/p&gt;

&lt;p&gt;For each phase the model generates optimistic / likely / pessimistic estimates (classic PERT), weighted by a per-developer, per-module &lt;strong&gt;skill score&lt;/strong&gt; (0–1.0, derived from git and BUD history), current load, and backlog depth. Then 10,000 simulated runs turn that distribution into completion dates at stated confidence levels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="kd"&gt;Feature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Idempotent refund webhooks
  P50  →  Jun 9   (50% chance done by)
  P70  →  Jun 10  (70% chance done by)
  P85  →  Jun 12  (85% chance done by)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"85% confident by the 12th" is the shape a stakeholder actually wants. It's also honest in a way "8 points" never was — it shows you the uncertainty instead of hiding it inside a fake integer.&lt;/p&gt;

&lt;p&gt;Where do those skill scores come from? Git history. The system reads who has actually shipped what, per module, and builds a profile — expertise you can see instead of guess at.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzfztbj1g43qsv4iz0b9u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzfztbj1g43qsv4iz0b9u.jpg" alt=" " width="800" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Five developers, eighteen modules, scored from real commits. This is what feeds estimation and routing — not a manager's hunch about who "knows the auth code."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Is the skill-score input perfect? No. It's derived from who happened to touch what, so it can encode bias. That's one of the two things I most want feedback on.&lt;/p&gt;

&lt;p&gt;And the loop closes itself. When a BUD ships, the &lt;strong&gt;Learning Agent&lt;/strong&gt; writes the retrospective from the actual diffs — including an estimated-vs-actual table that tells you exactly where the model was wrong, so the next estimate is better.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc3oksjjoeczf99vxesju.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc3oksjjoeczf99vxesju.jpg" alt=" " width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;No retro meeting. The agent reads the merges and the timeline and hands you the drift — Design −25%, Development +603% — so estimation actually learns.&lt;/em&gt;&lt;/p&gt;
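&lt;p&gt;The drift figures are plain signed percentages against the estimate. A small sketch, with the function name and the hours invented purely to reproduce the two numbers quoted above:&lt;/p&gt;

```python
def drift_percent(estimated, actual):
    # Signed drift relative to the estimate; negative means finished early.
    return round((actual - estimated) / estimated * 100)


# Made-up hours chosen to reproduce the two figures in the caption.
phases = {"Design": (4.0, 3.0), "Development": (1.0, 7.03)}
for name, (estimated, actual) in phases.items():
    print(f"{name}: {drift_percent(estimated, actual):+d}%")
```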




&lt;h2&gt;
  
  
  The part that sounds whimsical and isn't: the virtual world
&lt;/h2&gt;

&lt;p&gt;The whole organisation renders as a living 3D world — and it's &lt;strong&gt;multiplayer&lt;/strong&gt;. Not a dashboard you look at. A place your team is actually &lt;em&gt;in&lt;/em&gt;, together.&lt;/p&gt;

&lt;p&gt;Each repository is a tree. Each feature is a branch. Each agent is an orchardist tending the grove. A feature in progress is a branch growing; a merged one bears fruit; a stalled one needs pruning. Health is visible at a glance: a thriving tree versus one quietly dying.&lt;/p&gt;

&lt;p&gt;And every teammate is there with you. You walk around with WASD, sprint, jump, orbit the camera over the grove. Your colleagues are avatars with their own houses, present in real time. You can wave, cheer, greet, invite someone over. It sounds like a game because part of it is one — but the effect is presence. A standup is people reading status out loud. This is people standing in the same place, looking at the same living map of the work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4go4l7gvz2ef3ykkqrht.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4go4l7gvz2ef3ykkqrht.jpg" alt=" " width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Your team, present. Move, sprint, wave, cheer, invite. The status bar is real controls, not decoration.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It started as a visualisation. It became the most honest org chart I've ever had — because it's drawn from the code, not from a slide. &lt;a href="https://youtu.be/OxoqBI7BNxU" rel="noopener noreferrer"&gt;Here's a walkthrough.&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Shipping quality is the game
&lt;/h2&gt;

&lt;p&gt;Here's the part I didn't expect to care about and now love.&lt;/p&gt;

&lt;p&gt;The world is gamified — but it rewards the &lt;em&gt;right&lt;/em&gt; thing. You earn XP and Skill Points, level up, unlock vehicles, upgrade your house. Crucially, the economy is tuned to quality, not output. Ship a BUD to production: &lt;strong&gt;+1 SP&lt;/strong&gt;. Give a code review: &lt;strong&gt;+0.25&lt;/strong&gt;. Quality score above 80%: &lt;strong&gt;+0.5&lt;/strong&gt;. Bug found in testing: &lt;strong&gt;−0.25&lt;/strong&gt;. Bug found in &lt;em&gt;production&lt;/em&gt;: &lt;strong&gt;−1&lt;/strong&gt;. And the points for shipping don't pay out until the BUD actually reaches CLOSED — through testing, UAT, prod. You don't get rewarded for the green checkmark. You get rewarded for the thing surviving contact with reality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6vgo2bglkoghgy6a0y5m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6vgo2bglkoghgy6a0y5m.jpg" alt=" " width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Read the numbers: one production bug cancels out everything a shipped BUD earns. That's the whole point. In a world where AI can churn out code that passes tests, the scoreboard has to reward what AI is bad at: code that holds up.&lt;/em&gt;&lt;/p&gt;
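&lt;p&gt;As a sketch, the scoring rules read like a lookup table with one gate. The point values come from the paragraph above; the event names, the function, and the status string check are hypothetical.&lt;/p&gt;

```python
# Point values as listed in the article; event names are made up for this sketch.
SKILL_POINTS = {
    "bud_shipped": 1.0,
    "code_review_given": 0.25,
    "quality_above_80": 0.5,
    "bug_in_testing": -0.25,
    "bug_in_production": -1.0,
}


def skill_points(events, bud_status):
    """Sum SP for a BUD's events; shipping pays out only once the BUD is CLOSED."""
    total = 0.0
    for event in events:
        if event == "bud_shipped" and bud_status != "CLOSED":
            continue  # held back until the BUD survives testing, UAT and prod
        total += SKILL_POINTS[event]
    return total


print(skill_points(["bud_shipped", "quality_above_80"], "CLOSED"))
print(skill_points(["bud_shipped", "bug_in_production"], "CLOSED"))
```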

&lt;p&gt;That ties straight back to where I started. AI nails the 80%. The 20% — the part that doesn't blow up in production — is what we actually want to incentivise. So that's what the game scores.&lt;/p&gt;




&lt;h2&gt;
  
  
  It runs on a Mac mini, and your data never leaves it
&lt;/h2&gt;

&lt;p&gt;This is the part I care about most, and the part most "AI dev platform" pitches skip.&lt;/p&gt;

&lt;p&gt;Bodhiorchard is &lt;strong&gt;self-hosted by design.&lt;/strong&gt; Postgres with pgvector, your repositories, the embeddings, and the full audit log live on your hardware. For me, that hardware is a Mac mini. No repo content is shipped to anyone's cloud. For a regulated shop — and I lead engineering at an FCA-authorised fintech, so this is not theoretical for me — that's the difference between "interesting demo" and "allowed to exist."&lt;/p&gt;

&lt;p&gt;Inference is your choice. It runs on Claude Code today; OpenAI is on the roadmap, as is Ollama for fully air-gapped setups. The agent layer is engine-independent: swapping the model is API rewiring, not a redeploy.&lt;/p&gt;

&lt;p&gt;The stack, for the curious:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Backend   FastAPI · Python 3.12
Frontend  Vue 3 · PlayCanvas (the 3D world)
Data      Postgres + pgvector · Redis
Agents    Local MCP server (read + bounded write tools)
License   Apache 2.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's also built for real orgs, not just a solo demo: detailed roles and permissions, multi-org support out of the box, and capacity planning baked into triage and assignment — the Triage Agent defers work when the team is full, and Smart Assignment balances by real-time utilisation rather than who shouts loudest. So the "self-hosted toy" worry doesn't really hold; it'll sit inside an org's access model on day one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Honest status, because HN will ask anyway
&lt;/h2&gt;

&lt;p&gt;I'd rather tell you this up front than have you find it.&lt;/p&gt;

&lt;p&gt;What's live today: the platform, the BUD lifecycle, the MCP write-path, repository and code-graph indexing, skill profiling, and the 3D living-tree dashboard. The agents are real and they work with a human in the loop at every gate.&lt;/p&gt;

&lt;p&gt;What I'm still building: the fully autonomous execution loop. The direction I'm taking it is deliberately narrow: auto mode first for &lt;em&gt;small, low-risk BUDs&lt;/em&gt;, where one agent chain runs tech spec → code → code review → test → deploy end to end, then stops and waits for a human to approve the release. Not "point the swarm at production and walk away." Lights-out on the small stuff, a human gate where it counts. That's the active work, not a shipped claim. So today this is &lt;em&gt;agent-assisted, human-in-the-loop&lt;/em&gt;, and anyone who tells you their agent swarm ships production code fully unattended is selling something.&lt;/p&gt;

&lt;p&gt;This is an independent project. I built it solo, on my own time, not affiliated with any employer — the fintech is where I felt the pain, not the thing that owns the code.&lt;/p&gt;




&lt;h2&gt;
  
  
  You don't have to start from zero
&lt;/h2&gt;

&lt;p&gt;If you're on Jira today, you don't throw your backlog away. Connect Jira Cloud and import your existing issues straight into BUDs — point Bodhiorchard at the work you already have and watch the grove fill in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl3w0z37pm641e4pxe486.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl3w0z37pm641e4pxe486.jpg" alt=" " width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The on-ramp is a migration, not a rewrite. Your tickets become BUDs; the agents take it from there.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There's also a cross-repo graph view — bus-factor analysis, threat detection, BUD-stage filtering across every repo — for when you want the dependency map instead of the grove. Same data, different lens.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I actually want from you
&lt;/h2&gt;

&lt;p&gt;Not stars. Feedback. Two questions I'm genuinely stuck on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Does "the BUD is the single source of truth" survive contact with your reality?&lt;/strong&gt; Or does real-world ticketing always sprawl back across five tools no matter what you do?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Where would self-hosted + bring-your-own-inference actually change your mind&lt;/strong&gt; versus a hosted SaaS PM tool — and where is it just more ops burden you don't want?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The full methodology is written up at &lt;strong&gt;&lt;a href="https://bodhiorchard.ai/" rel="noopener noreferrer"&gt;bodhiorchard.ai&lt;/a&gt;&lt;/strong&gt; — the twelve agents, the manifesto, the Agile-vs-ADD table, all of it. The repo has six demo videos and four sample repositories you can point it at: &lt;strong&gt;&lt;a href="https://github.com/mickyarun/bodhiorchard" rel="noopener noreferrer"&gt;https://github.com/mickyarun/bodhiorchard&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I spent fifteen years being told the ceremony &lt;em&gt;was&lt;/em&gt; the engineering. Sprints felt broken long before AI. AI just made it impossible to keep pretending.&lt;/p&gt;

&lt;p&gt;So I replaced them. If you've killed a ceremony and lived to tell the tale — which one did you kill first?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Arun — CTO &amp;amp; Co-Founder of Atoa, a UK open banking payments platform, and the solo author of Bodhiorchard. I write about what building with AI is actually like, not what the conference slides say. Find me on &lt;a href="https://x.com/mickyarun" rel="noopener noreferrer"&gt;X @mickyarun&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How We Hire for the 20% AI Can't Do (And Why We Stopped Asking Candidates to Code From Scratch)</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Tue, 02 Jun 2026 14:43:11 +0000</pubDate>
      <link>https://dev.to/mickyarun/how-we-hire-for-the-20-ai-cant-do-and-why-we-stopped-asking-candidates-to-code-from-scratch-1ida</link>
      <guid>https://dev.to/mickyarun/how-we-hire-for-the-20-ai-cant-do-and-why-we-stopped-asking-candidates-to-code-from-scratch-1ida</guid>
      <description>&lt;p&gt;A few weeks ago I published a piece called &lt;a href="https://dev.to/mickyarun/ai-agents-are-great-at-80-of-our-code-the-other-20-is-why-we-still-need-seniors-3lh5"&gt;"AI Agents Are Great at 80% of Our Code. The Other 20% Is Why We Still Need Seniors."&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It got 25 reactions and 34 comments. Several of those comments asked the same question in different words:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"How do you actually measure that 20% when you're hiring?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Fair question. I dodged it in the first article because I didn't have a clean answer yet. Now I do. Or at least, we have a process that's working better than anything we tried before.&lt;/p&gt;

&lt;p&gt;This is that answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The old interview was broken before AI made it obvious
&lt;/h2&gt;

&lt;p&gt;For years, we ran the standard playbook. Whiteboard problem. Timed coding exercise. "Build a REST endpoint in 45 minutes." You know the drill.&lt;/p&gt;

&lt;p&gt;Here's what that interview actually tested: can this person write syntactically correct code under pressure, from memory, with someone watching?&lt;/p&gt;

&lt;p&gt;That's a real skill. It's just not the skill that matters anymore.&lt;/p&gt;

&lt;p&gt;AI handles the code-writing part. I don't mean it handles it perfectly — I wrote a whole article about the 20% it gets wrong. But the 80% that is boilerplate, CRUD, API wrappers, standard patterns? An agent will generate that in seconds. Clean. Typed. Probably with better variable names than I'd pick.&lt;/p&gt;

&lt;p&gt;So if I'm hiring someone and my interview tests whether they can do what an agent already does faster — what exactly am I learning?&lt;/p&gt;

&lt;p&gt;That they can type under pressure. Great. So can the agent. And it doesn't get nervous.&lt;/p&gt;

&lt;h2&gt;
  
  
  The interview we actually run now
&lt;/h2&gt;

&lt;p&gt;We stopped asking candidates to write code from scratch. Instead, we hand them code an AI agent already wrote.&lt;/p&gt;

&lt;p&gt;The code looks fine. It passes the tests we included. The variable names are clean. The types are correct. A junior looking at it would say "ship it."&lt;/p&gt;

&lt;p&gt;But it's wrong.&lt;/p&gt;

&lt;p&gt;Not wrong in a way that crashes. Wrong in a way that costs money three weeks later. Wrong in a way that only someone who thinks about &lt;em&gt;consequences&lt;/em&gt; would catch.&lt;/p&gt;

&lt;p&gt;Here's the shape of it. We give candidates a webhook handler for processing payment confirmations. The handler works. It receives the event, updates the database, returns a 200. Clean code.&lt;/p&gt;

&lt;p&gt;What's missing: idempotency. If the bank retries the webhook — and banks &lt;em&gt;always&lt;/em&gt; retry — the handler processes the payment twice. The customer gets charged twice. We get an FCA complaint. The code is correct. The system is broken.&lt;/p&gt;
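&lt;p&gt;For reference, one common shape of the fix is a dedup record written in the same transaction as the state change. The sketch below is illustrative only: in-memory SQLite stands in for the real database, and the table names, column names, and function signature are all assumptions, not our production code.&lt;/p&gt;

```python
import sqlite3


def make_store():
    # In-memory SQLite stands in for the production database (an assumption).
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE processed_webhooks ("
        " provider TEXT NOT NULL,"
        " webhook_id TEXT NOT NULL,"
        " PRIMARY KEY (provider, webhook_id))"
    )
    db.execute(
        "CREATE TABLE ledger (payment_id TEXT NOT NULL, amount_pence INTEGER NOT NULL)"
    )
    return db


def handle_confirmation(db, provider, webhook_id, payment_id, amount_pence):
    """Acknowledge every delivery with 200, but record the payment only once."""
    with db:  # dedup row and ledger row commit (or roll back) together
        cur = db.execute(
            "INSERT OR IGNORE INTO processed_webhooks (provider, webhook_id) VALUES (?, ?)",
            (provider, webhook_id),
        )
        if cur.rowcount == 0:
            return 200  # duplicate delivery: no state change
        db.execute(
            "INSERT INTO ledger (payment_id, amount_pence) VALUES (?, ?)",
            (payment_id, amount_pence),
        )
    return 200
```

&lt;p&gt;Calling the handler twice with the same &lt;code&gt;webhook_id&lt;/code&gt; leaves a single ledger row, which is exactly the behaviour the reviewed code lacks.&lt;/p&gt;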

&lt;p&gt;Or we show them a payment flow with state transitions. &lt;code&gt;pending&lt;/code&gt; to &lt;code&gt;authorised&lt;/code&gt; to &lt;code&gt;settled&lt;/code&gt;. Looks right. But there's a path where a payment can go from &lt;code&gt;settled&lt;/code&gt; back to &lt;code&gt;pending&lt;/code&gt;. That's an illegal state transition. In our domain, that means money that was already in a merchant's account could theoretically get pulled back without a refund record. No test catches it because no test was written for a transition that shouldn't exist.&lt;/p&gt;
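&lt;p&gt;A guard against that kind of path can be as small as an explicit table of allowed transitions. This is a simplified sketch using only the three states named above; a real payment flow would have more (failures, cancellations), and the names here are hypothetical.&lt;/p&gt;

```python
# Only the three states from the example; anything not listed is illegal.
ALLOWED_TRANSITIONS = {
    "pending": {"authorised"},
    "authorised": {"settled"},
    "settled": set(),  # terminal: the money has reached the merchant
}


class IllegalTransition(Exception):
    """Raised when a payment is asked to make a move the table does not allow."""


def transition(current, target):
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise IllegalTransition(f"{current} to {target} is not allowed")
    return target


def illegal_pairs():
    # Enumerate every (from, to) pair the table forbids, so a test can assert on each.
    states = list(ALLOWED_TRANSITIONS)
    return [(a, b) for a in states for b in states if b not in ALLOWED_TRANSITIONS[a]]
```

&lt;p&gt;Enumerating the forbidden pairs is what lets you write the test for the transition that shouldn't exist.&lt;/p&gt;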

&lt;p&gt;We ask candidates to review this code. Not write it. Review it.&lt;/p&gt;

&lt;p&gt;The ones who have the 20% find these things. Not always immediately. Sometimes they stare at it for five minutes and then say "wait — what happens if this gets called twice?" That moment is worth more than any algorithm they could whiteboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  It's not what they'd change. It's why.
&lt;/h2&gt;

&lt;p&gt;We added a second part to the interview. Once a candidate identifies issues in the AI-generated code, we ask them to walk us through a PR rejection.&lt;/p&gt;

&lt;p&gt;Not "what would you change." We already know what needs to change. We want to hear &lt;em&gt;why they'd reject it&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This is where you separate pattern-matchers from engineers.&lt;/p&gt;

&lt;p&gt;A pattern-matcher says: "There's no idempotency key. You should add one." Correct. Also surface-level. They've seen the pattern before and recognised its absence. That's good, but it's not enough.&lt;/p&gt;

&lt;p&gt;An engineer says: "There's no idempotency key, which means a network retry from the bank will double-process the payment. The customer sees two debits. Your support team gets a ticket. You file a dispute with the acquiring bank. The refund takes 5-7 business days. And if this happens at volume, you've got a regulatory reporting obligation."&lt;/p&gt;

&lt;p&gt;Same observation. Completely different depth. The first person knows the pattern. The second person knows what happens downstream when the pattern is missing.&lt;/p&gt;

&lt;p&gt;That downstream awareness — the ability to trace a bug forward through the business — is the 20%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hire for intent, not resumes
&lt;/h2&gt;

&lt;p&gt;Our interview process changed because our hiring philosophy changed first.&lt;/p&gt;

&lt;p&gt;We don't hire resumes. We hire intent. Let me give you three examples.&lt;/p&gt;

&lt;p&gt;One developer's resume listed one skill under "Technical Proficiency": Googling. I'm not paraphrasing. That's what it said. B.Sc. No fancy internships. No side projects on GitHub. Just someone who was honest about what they knew and relentless about learning what they didn't. Today they own our merchant-facing app. The whole thing.&lt;/p&gt;

&lt;p&gt;Another cold-messaged us asking for a job. No referral. No warm intro. Just a direct message. In the interview, they were quiet. Not shy — quiet. Listened more than they talked. When they did talk, they went straight to the solution. No preamble, no hedging, no "well it depends." Just: here's the problem, here's how I'd fix it, here's what could go wrong.&lt;/p&gt;

&lt;p&gt;A third started as an intern. They're now building our Open Banking integration end-to-end. Not assisting. Not maintaining. Building.&lt;/p&gt;

&lt;p&gt;The common thread isn't a degree or a tech stack or years of experience. It's three things: curiosity, ownership, and willingness to be wrong.&lt;/p&gt;

&lt;p&gt;The first didn't pretend they knew things they didn't. The second didn't try to impress with volume — they impressed with clarity. The third didn't wait for someone to assign harder problems — they grew into them because the problems were there and they weren't afraid to try.&lt;/p&gt;

&lt;p&gt;None of them would have passed the old coding interview particularly well. All of them are exactly the kind of engineer you want reviewing an AI agent's output.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 20% isn't just code — it's design thinking
&lt;/h2&gt;

&lt;p&gt;Here's something most "AI and hiring" articles miss: the 20% that matters isn't only about catching bugs in payment logic. It's about knowing how to think about problems that don't have a spec yet.&lt;/p&gt;

&lt;p&gt;Before Atoa, I spent years in design thinking — working with clients who were showcasing products at CES. One project sticks with me. A world-leading chocolate manufacturer wanted to launch a series of chocolates based on human emotions — Anger, Disgust, Sad, Happy, Wimpy. The brief: build software that captures a person's emotion in real-time, recommends the matching chocolate, and makes it go viral on social media.&lt;/p&gt;

&lt;p&gt;Now imagine the marketing manager walks into your office and drops this on your desk. The brief is: make it viral. Which platform do you build on? Where does the experience live? What's the feature that makes someone &lt;em&gt;want&lt;/em&gt; to share it?&lt;/p&gt;

&lt;p&gt;Assume technically anything is possible. The technology isn't the constraint. Your thinking is.&lt;/p&gt;

&lt;p&gt;Here's the filter: if your first answer is "I'll build a mobile app and a web app" — that's a straight reject. Not because mobile and web are wrong technologies. But because you jumped to &lt;em&gt;how&lt;/em&gt; before you thought about &lt;em&gt;why&lt;/em&gt;. You're solving for delivery before you've solved for virality. You're thinking like a developer when the brief asked you to think like a designer.&lt;/p&gt;

&lt;p&gt;The interesting answers start with questions. Who's the audience? Where do they already spend time? What makes someone stop scrolling and share something? What's the 3-second hook? How does the chocolate brand benefit from every share? What's the mechanic that makes this grow without paid media?&lt;/p&gt;

&lt;p&gt;Now here's my challenge to you: how would you approach this? Drop it in the comments. Not the tech stack — the &lt;em&gt;thinking&lt;/em&gt;. How do you decompose this brief into something that actually goes viral?&lt;/p&gt;

&lt;p&gt;There's no single right answer. That's the point. This is a design thinking exercise — the kind of problem where the 20% lives. The brief is intentionally vague. The constraints are real. And the interesting part isn't the technology you pick. It's &lt;em&gt;how you think about a problem before you write a single line of code&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;No AI agent is writing a spec for that. No benchmark captures the ability to look at a brief like "emotion-based chocolate recommendation engine for CES" and turn it into a system. That's design thinking. The ability to hold a vague, human problem in your head and translate it into technical architecture — while keeping the user experience front and centre.&lt;/p&gt;

&lt;p&gt;I look for this in interviews too. Not the ability to solve a well-defined problem. The ability to &lt;em&gt;define&lt;/em&gt; the problem in the first place. When I ask a candidate "how would you approach this?" and they immediately start writing code — that tells me something. When they first ask "who's using this, where, and what does success look like?" — that tells me something very different.&lt;/p&gt;

&lt;p&gt;The 20% is judgment about code. But it's also judgment about products, users, and what should exist in the world. AI can generate solutions. It can't ask the right question.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the 20% actually looks like in an interview
&lt;/h2&gt;

&lt;p&gt;Here's what I'm watching for when a candidate reviews code. It's not a checklist — it's a set of signals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do they think about what shouldn't happen?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most engineers think about the happy path. The payment goes through. The webhook fires. The database updates. Done.&lt;/p&gt;

&lt;p&gt;The 20% engineers think about the unhappy path &lt;em&gt;first&lt;/em&gt;. What happens when the webhook fires twice? What happens when the database write succeeds but the response times out? What happens when the bank says "yes" and our system says "no" and now the money exists in a state neither side agrees on?&lt;/p&gt;

&lt;p&gt;If a candidate's first instinct is "how does this work?" — that's fine. If their first instinct is "how does this &lt;em&gt;break&lt;/em&gt;?" — that's the signal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do they ask about failure modes before writing anything?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We've started noticing this in walkthroughs. Some candidates immediately start typing fixes. Others ask questions first. "What's the retry policy on these webhooks?" "Is there a dead letter queue?" "What happens to in-flight payments if this service goes down?"&lt;/p&gt;

&lt;p&gt;The ones who ask first are almost always better engineers. Not because asking is inherently better than doing. But because in the 20% territory — the code that handles edge cases, race conditions, regulatory requirements — the cost of building the wrong thing is higher than the cost of asking one more question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can they explain a tradeoff they made, not just what they chose?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the question I ask every candidate, regardless of seniority: "Tell me about a technical decision where you chose the worse option on purpose."&lt;/p&gt;

&lt;p&gt;The interesting candidates have an answer. "We chose synchronous calls between two services because the audit trail was easier to reason about, even though async would have been more resilient." "We kept a manual process instead of automating it because the edge cases weren't well understood yet and we didn't want to automate the wrong thing."&lt;/p&gt;

&lt;p&gt;The 20% is full of decisions like this. The right answer isn't always the technically superior one. Sometimes the right answer is the one that's easier to debug at 2am, or the one that produces a cleaner audit trail, or the one that a new engineer can understand without reading three pages of context.&lt;/p&gt;

&lt;h2&gt;
  
  
  The junior training pipeline problem
&lt;/h2&gt;

&lt;p&gt;Here's the question that kept me up after the first article: if AI handles 80% of the code, how do juniors ever build the judgment that makes seniors valuable?&lt;/p&gt;

&lt;p&gt;The 80% used to be the training ground. You learn to write CRUD endpoints. You learn to wire up a database. You learn to handle HTTP errors. You make mistakes in the boring code, you get them caught in review, and slowly you develop an instinct for the less boring code.&lt;/p&gt;

&lt;p&gt;If an agent writes all of that for you on day one, what are you actually learning?&lt;/p&gt;

&lt;p&gt;This is a real problem. And "just let them use AI" isn't the answer, because using AI well requires the judgment you're supposed to be building.&lt;/p&gt;

&lt;p&gt;I'll be honest — I've had to let someone go because of this exact gap. They were using AI for everything. But they were using the default model in Cursor while the rest of the team had moved to Opus for anything that touched critical code. They weren't thinking about &lt;em&gt;which&lt;/em&gt; tool to use &lt;em&gt;when&lt;/em&gt;. They were just pressing tab and shipping. The code looked fine. The judgment wasn't there. And in a payment system, that's not a skill gap you can coach around — it's a risk.&lt;/p&gt;

&lt;p&gt;At Atoa, we pair juniors with seniors on the hard problems. Not the 80% problems. The 20% ones. The payment state machine that handles twelve edge cases. The webhook handler that has to be idempotent across retries, timeouts, and partial failures. The reconciliation logic where our system says one thing and the bank says another.&lt;/p&gt;

&lt;p&gt;The senior doesn't watch the output. They watch the process. They're looking for two things.&lt;/p&gt;

&lt;p&gt;First: "What did you skip?" Not what did you get wrong — what did you not even consider? That gap is where the learning lives. A junior who writes a webhook handler and doesn't think about idempotency hasn't made a mistake. They have a blind spot. Mistakes you can catch in tests. Blind spots you can only catch by asking the right question at the right time. That's what the senior is there for.&lt;/p&gt;

&lt;p&gt;Second: "What happens when this fails?" Not "did you handle the error." Did you think about what the &lt;em&gt;system&lt;/em&gt; does when this component fails? Does the rest of the pipeline stall? Does the customer see a broken state? Does the merchant lose money? The junior doesn't need to have the answer. They need to have the habit of asking the question.&lt;/p&gt;

&lt;p&gt;The painful lessons still happen. They just happen faster because the senior is there to compress the feedback loop from "you'll figure this out in three years" to "let me show you why this matters right now."&lt;/p&gt;

&lt;h2&gt;
  
  
  The best hire isn't the best coder anymore
&lt;/h2&gt;

&lt;p&gt;Three years ago I'd have hired the candidate who wrote the cleanest code the fastest. That person is still good. They're just not rare anymore. An AI agent writes clean code fast. That's table stakes.&lt;/p&gt;

&lt;p&gt;The hire I'm looking for now is the person who reads an AI agent's clean, well-typed, properly structured code — and says "this will break in production, and here's exactly how."&lt;/p&gt;

&lt;p&gt;That person can tell an agent what it got wrong. More importantly, they can explain &lt;em&gt;why it matters&lt;/em&gt;. Not just "add an idempotency key" but "add an idempotency key because the bank will retry, and without it, this elegant code will charge a customer twice."&lt;/p&gt;

&lt;p&gt;The 20% was never about writing harder code. It's about knowing which code is dangerous.&lt;/p&gt;

&lt;p&gt;We changed our interview because the job changed. The job isn't writing code anymore. The job is judgment.&lt;/p&gt;

&lt;p&gt;And judgment is the one thing you can't generate with a prompt.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is a sequel to &lt;a href="https://dev.to/mickyarun/ai-agents-are-great-at-80-of-our-code-the-other-20-is-why-we-still-need-seniors-3lh5"&gt;AI Agents Are Great at 80% of Our Code. The Other 20% Is Why We Still Need Seniors&lt;/a&gt;. If you're building a team that works with AI agents, I'd love to hear how your hiring process has changed. Drop a comment or find me on X &lt;a href="https://twitter.com/mickyarun" rel="noopener noreferrer"&gt;@mickyarun&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>career</category>
      <category>ai</category>
      <category>hiring</category>
      <category>webdev</category>
    </item>
    <item>
      <title>AI Agents Are Great at 80% of Our Code. The Other 20% Is Why We Still Need Seniors.</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Thu, 28 May 2026 06:20:38 +0000</pubDate>
      <link>https://dev.to/mickyarun/ai-agents-are-great-at-80-of-our-code-the-other-20-is-why-we-still-need-seniors-3lh5</link>
      <guid>https://dev.to/mickyarun/ai-agents-are-great-at-80-of-our-code-the-other-20-is-why-we-still-need-seniors-3lh5</guid>
      <description>&lt;p&gt;&lt;em&gt;We let AI agents loose on a payment platform. They crushed the boring stuff. Then they silently broke the stuff that matters.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;A survey came out last week. 54% of all code is now AI-generated. Up from 28% last year.&lt;/p&gt;

&lt;p&gt;I read that number and thought: yeah, that tracks. We're probably in that range too.&lt;/p&gt;

&lt;p&gt;But here's the thing nobody's asking — which 54%?&lt;/p&gt;

&lt;p&gt;Not all code carries equal weight. A CRUD endpoint for fetching merchant details? Low risk. The webhook handler that transitions a payment from &lt;code&gt;pending&lt;/code&gt; to &lt;code&gt;complete&lt;/code&gt;? That's someone's rent. Someone's payroll. Get that wrong and money moves where it shouldn't, or worse, money doesn't move at all.&lt;/p&gt;

&lt;p&gt;I'm the CTO of a payment platform. FCA-authorised, processing real money, real merchants, real consequences. We run NestJS microservices, Docker, Traefik — the usual stack. And we've been using AI agents aggressively for over a year now.&lt;/p&gt;

&lt;p&gt;I'm not here to tell you AI is dangerous. It's not.&lt;/p&gt;

&lt;p&gt;I'm here to tell you it's dangerous when you forget what it's actually good at.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 80% Where AI Agents Are Genuinely Brilliant
&lt;/h2&gt;

&lt;p&gt;Let me give credit where it's due. AI agents have made our team faster in ways that would have seemed absurd two years ago.&lt;/p&gt;

&lt;p&gt;API scaffolding. Generating service boilerplate. Writing Zod validation schemas. Spinning up new endpoints. Creating test stubs. Refactoring imports. Migrating patterns across repos.&lt;/p&gt;

&lt;p&gt;We run multiple microservices. When we need a new service, an agent can scaffold the entire thing — module structure, base configuration, Docker setup, Traefik labels — in minutes. What used to be a half-day of copy-paste-and-tweak is now a conversation.&lt;/p&gt;

&lt;p&gt;When we overhauled our env management across all repos, AI agents did the grunt work. They mapped every &lt;code&gt;.env&lt;/code&gt; file, found naming conflicts, identified common variables, and generated a unified Zod schema. What would have taken a team days of grep-and-spreadsheet work took hours.&lt;/p&gt;

&lt;p&gt;For this 80% of the codebase — the predictable, pattern-following, structurally repetitive code — AI agents are the best junior developers money can buy. Tireless. Cheap. No ego. Almost never make a mistake on the stuff they're good at.&lt;/p&gt;

&lt;p&gt;An army of juniors sitting at your terminal.&lt;/p&gt;




&lt;h2&gt;
  
  
  Then You Hit the Other 20%
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting.&lt;/p&gt;

&lt;p&gt;We had an agent build out a webhook handler. Webhooks in payments are critical — they're how you know a payment succeeded, failed, or needs attention. The agent wrote the handler. It looked clean. Tests passed.&lt;/p&gt;

&lt;p&gt;But it silently ignored the edge cases.&lt;/p&gt;

&lt;p&gt;Status transitions have rules. A payment can go from &lt;code&gt;pending&lt;/code&gt; to &lt;code&gt;complete&lt;/code&gt;. It cannot go from &lt;code&gt;complete&lt;/code&gt; back to &lt;code&gt;pending&lt;/code&gt;. When a human developer builds this, they think about the illegal transitions because they've seen what happens when money moves backwards. They build the guard because they've felt the pain of not having it.&lt;/p&gt;
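
&lt;p&gt;To make that guard concrete, here is a minimal sketch of a transition table. Only &lt;code&gt;pending&lt;/code&gt; and &lt;code&gt;complete&lt;/code&gt; come from the description above; the &lt;code&gt;failed&lt;/code&gt; status and the function names are assumptions for illustration, not our production code.&lt;/p&gt;

```typescript
// Hypothetical status set: "failed" is an assumption added for illustration.
type PaymentStatus = "pending" | "complete" | "failed";

// For each status, the statuses it may move to.
// Terminal states map to an empty list, so a payment can never move backwards.
const allowedTransitions: { [from in PaymentStatus]: PaymentStatus[] } = {
  pending: ["complete", "failed"],
  complete: [],
  failed: [],
};

// True only when the move is explicitly listed in the table.
function canTransition(from: PaymentStatus, to: PaymentStatus): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}

// Returns the new status, or throws instead of silently applying an illegal move.
function applyTransition(from: PaymentStatus, to: PaymentStatus): PaymentStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal payment transition: ${from} -> ${to}`);
  }
  return to;
}
```

&lt;p&gt;The point of the table is that illegal moves are rejected by default: anything not listed throws, so a forgotten case fails loudly rather than quietly moving a completed payment back to pending.&lt;/p&gt;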

&lt;p&gt;The agent didn't care about that. It built the happy path beautifully and treated the edge cases like they didn't exist.&lt;/p&gt;

&lt;p&gt;When we do this work manually, this type of error never happens. A senior developer who has worked in payments for years doesn't forget the impossible transitions. It's not in their code — it's in their bones.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Pattern I Keep Seeing
&lt;/h2&gt;

&lt;p&gt;This isn't a one-off. After months of working with AI agents on a regulated payment stack, one pattern is consistent:&lt;/p&gt;

&lt;p&gt;AI agents optimise for completion, not correctness.&lt;/p&gt;

&lt;p&gt;They want to finish the feature. Get to the green checkmark. And to get there efficiently, they take shortcuts that look reasonable on the surface.&lt;/p&gt;

&lt;p&gt;The agent builds what should happen. It rarely builds what should &lt;em&gt;not&lt;/em&gt; happen. In payments, the negative cases are where all the real risk lives. What happens when a webhook arrives twice? What happens when a refund is requested on an already-refunded transaction? What happens when the bank returns an unexpected status code? The agent doesn't think about any of that unless you explicitly tell it to.&lt;/p&gt;
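
&lt;p&gt;Take the first of those questions, the webhook that arrives twice. A rough sketch of the negative case looks like this. The event shape, the &lt;code&gt;eventId&lt;/code&gt; field, and the in-memory store are all assumptions for illustration; a real provider's payload will differ, and a real system would use a durable store with a unique constraint.&lt;/p&gt;

```typescript
// Hypothetical event shape: field names are assumptions, not a real provider's payload.
interface WebhookEvent {
  eventId: string; // assumed to be reused when the sender retries the same notification
  paymentId: string;
  status: string;
}

type HandleResult = "processed" | "duplicate";

// In-memory stand-in for a durable store with a unique constraint on eventId.
class ProcessedEvents {
  private seen: { [eventId: string]: boolean } = Object.create(null);

  // Returns true only the first time an eventId is recorded.
  markIfNew(eventId: string): boolean {
    if (this.seen[eventId] === true) {
      return false;
    }
    this.seen[eventId] = true;
    return true;
  }
}

// Applies the side effect at most once per eventId.
function handleWebhook(
  event: WebhookEvent,
  store: ProcessedEvents,
  applyEffect: (event: WebhookEvent) => void
): HandleResult {
  if (!store.markIfNew(event.eventId)) {
    return "duplicate"; // acknowledge the retry, but do not apply the effect again
  }
  applyEffect(event);
  return "processed";
}
```

&lt;p&gt;In production the "mark" and the side effect would need to commit together, otherwise a crash between the two either loses the event or applies it twice. The sketch only shows the shape of the check the agent tends to leave out.&lt;/p&gt;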

&lt;p&gt;Then there's the reusability problem. We have shared utility packages. Helper functions. Common patterns that the team has standardised on over years. The agent doesn't care. It writes its own version from scratch. It works, but now you have two implementations of the same logic — one tested and trusted in production, one freshly generated and untested. The agent is focused on completing &lt;em&gt;this&lt;/em&gt; feature, not maintaining the architecture.&lt;/p&gt;

&lt;p&gt;And the subtlest one — agents seem to optimise for fewer back-and-forth turns. It looks like they're saving cost, saving context. Complex validation? Skip it, the basic case works. Error handling for a rare edge case? Not worth the tokens. The result is code that passes every test you wrote but fails on the scenarios you didn't think to test — because those are exactly the scenarios the agent also didn't think about.&lt;/p&gt;




&lt;h2&gt;
  
  
  Juniors Don't Ship Products. They Write Code.
&lt;/h2&gt;

&lt;p&gt;Here's the frame that made this click for me.&lt;/p&gt;

&lt;p&gt;Claude — or any coding agent — is the best junior developer money can buy. An army of juniors. Tireless, cheap, no ego, near-zero error rate on routine work.&lt;/p&gt;

&lt;p&gt;But juniors don't ship products. They write code.&lt;/p&gt;

&lt;p&gt;The difference between code and a product is judgment. Knowing which transitions are illegal. Knowing that the retry logic has a specific backoff curve because you've been burned by what happens when it doesn't. Knowing that the webhook handler needs idempotency because banks sometimes send the same notification three times.&lt;/p&gt;

&lt;p&gt;That knowledge doesn't come from training data. It comes from years of operating a system, debugging at 2am, explaining to a merchant why their settlement was delayed.&lt;/p&gt;

&lt;p&gt;The most dangerous mistake a CTO can make in 2026 is buying AI to replace senior engineers. The right move is buying AI to enable them.&lt;/p&gt;

&lt;p&gt;Replace your senior with AI? You get speed plus silent disasters.&lt;/p&gt;

&lt;p&gt;Enable your senior with AI? You get an architect with an army.&lt;/p&gt;




&lt;h2&gt;
  
  
  What We Actually Do About It
&lt;/h2&gt;

&lt;p&gt;I'm not writing this to complain about AI. I'm writing this because we've built a system that works, and it might help you too.&lt;/p&gt;

&lt;p&gt;The first thing we did was make our architecture machine-readable. We extract design patterns and architecture rules into formats that agents can consume. When an agent works on our codebase, it doesn't just see code — it sees boundaries, patterns, rules about what belongs where. Not documentation nobody reads. Lints and constraints that the agent can't ignore.&lt;/p&gt;

&lt;p&gt;Then we invested heavily in testing the negative cases. Every PR — human or AI — runs through the same suite. But we specifically built tests for the stuff agents skip: illegal state transitions, duplicate webhook handling, idempotency checks. If the agent silently drops a negative case, the tests catch it before it ships.&lt;/p&gt;

&lt;p&gt;And seniors still review everything that touches money. No AI-generated payment logic ships without a senior looking at it. Not because we don't trust AI — because we know exactly where it's blind. The review isn't checking syntax. It's checking judgment. Did the agent handle the ambiguous bank status? Did it respect our existing retry logic? Did it use the shared utility or reinvent the wheel?&lt;/p&gt;

&lt;p&gt;This problem bothered me enough that I started building &lt;a href="https://bodhiorchard.ai/" rel="noopener noreferrer"&gt;Bodhi Orchard&lt;/a&gt; — an open-source agentic development framework. The core idea: don't just let agents write code. Feed them the full context — architecture, design patterns, test plans, existing utilities — so they stop making the same blind-spot mistakes. Human decisions over human busywork, with guardrails that actually enforce quality.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Question for 2026
&lt;/h2&gt;

&lt;p&gt;The survey says 54% of code is AI-generated. I believe it.&lt;/p&gt;

&lt;p&gt;But here's my question: what percentage of &lt;em&gt;bugs&lt;/em&gt; in 2026 will be AI-generated?&lt;/p&gt;

&lt;p&gt;And more importantly — who's going to find them?&lt;/p&gt;

&lt;p&gt;Not the agents. They wrote the bugs in the first place. Not the juniors — they won't know enough to spot what's missing.&lt;/p&gt;

&lt;p&gt;It's going to be the seniors. The architects. The people who've operated these systems long enough to know where the bodies are buried.&lt;/p&gt;

&lt;p&gt;The 80% is solved. AI won. Celebrate that.&lt;/p&gt;

&lt;p&gt;Now invest in the humans who understand the other 20%. Because that's where your product lives or dies.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Arun, CTO &amp;amp; Co-Founder of Atoa — a UK open banking payment platform. I write about what it's actually like to build fintech with AI, not what the conference slides say it's like. If this resonated, follow me here or on &lt;a href="https://x.com/mickyarun" rel="noopener noreferrer"&gt;X @mickyarun&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;And if you're curious about building AI-native development with proper guardrails, check out &lt;a href="https://bodhiorchard.ai/" rel="noopener noreferrer"&gt;Bodhi Orchard&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>webdev</category>
      <category>startup</category>
    </item>
    <item>
      <title>Your Payment API Wasn't Built for AI Agents. Open Banking Might Be the Fix.</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Wed, 27 May 2026 11:09:19 +0000</pubDate>
      <link>https://dev.to/mickyarun/your-payment-api-wasnt-built-for-ai-agents-open-banking-might-be-the-fix-43cg</link>
      <guid>https://dev.to/mickyarun/your-payment-api-wasnt-built-for-ai-agents-open-banking-might-be-the-fix-43cg</guid>
      <description>&lt;p&gt;Stripe just shipped an Agentic Commerce Suite. PayPal launched Agent Ready. Visa predicts millions of consumers will use AI agents to complete purchases by the 2026 holiday season. Mastercard introduced Agent Pay with its own verification layer. Google launched the Agent Payments Protocol with 60+ partners.&lt;/p&gt;

&lt;p&gt;Everyone is scrambling to make payments work for AI agents.&lt;/p&gt;

&lt;p&gt;And I keep looking at all of it thinking: you're bolting agent support onto a stack that was designed for a human staring at a checkout page. That's not an integration. That's a retrofit.&lt;/p&gt;

&lt;p&gt;I run payments infrastructure. Our platform processes open banking payments for UK merchants — the kind where money moves directly from the customer's bank account, no card network in between. And from where I sit, the agentic payments conversation has a blind spot the size of Visa's interchange fee.&lt;/p&gt;

&lt;p&gt;Let me explain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The card stack assumes a human is present
&lt;/h2&gt;

&lt;p&gt;Here's what happens when you process a card payment today:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User types card number into a form (or taps a saved card)&lt;/li&gt;
&lt;li&gt;Your frontend collects PAN data, sends it to a tokenisation layer&lt;/li&gt;
&lt;li&gt;The token goes to an acquirer, who talks to the card network, who talks to the issuing bank&lt;/li&gt;
&lt;li&gt;3D Secure kicks in — the user gets a push notification or SMS OTP&lt;/li&gt;
&lt;li&gt;The issuing bank authorises (or declines)&lt;/li&gt;
&lt;li&gt;Settlement happens days later&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That flow was designed around one assumption: &lt;strong&gt;a human is sitting at a screen, ready to respond to authentication challenges.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now replace the human with an AI agent.&lt;/p&gt;

&lt;p&gt;The agent doesn't have eyes to read an OTP. It can't tap "approve" on a banking app push notification. It can't solve a CAPTCHA. It can't parse a 3DS iframe that renders differently on every issuing bank's domain.&lt;/p&gt;

&lt;p&gt;So what do the card networks do? They build workarounds. Stripe's agentic suite generates virtual cards. Mastercard's Agent Pay pre-registers agents and skips some auth steps. Everyone is finding clever ways to route around the authentication wall that they built.&lt;/p&gt;

&lt;p&gt;That's the tell. When your entire ecosystem is engineering around its own security layer, the architecture has a problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open banking was built for machines talking to machines
&lt;/h2&gt;

&lt;p&gt;Open banking works differently. There's no card number. No PAN. No tokenisation vault. No 3DS iframe.&lt;/p&gt;

&lt;p&gt;Here's the flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your app (or agent) calls a payment initiation API&lt;/li&gt;
&lt;li&gt;The API returns a redirect URL to the customer's bank&lt;/li&gt;
&lt;li&gt;The customer authenticates directly with their bank (biometrics, app-based SCA)&lt;/li&gt;
&lt;li&gt;The bank confirms the payment&lt;/li&gt;
&lt;li&gt;Money moves. Settlement is instant or same-day.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Notice what's different. The payment API is a clean, stateless request-response interface. The authentication happens &lt;strong&gt;on the bank's side&lt;/strong&gt;, not inside your checkout flow. Your code never sees a card number, never handles an OTP, never renders an auth challenge.&lt;/p&gt;

&lt;p&gt;For a human using a checkout page, this is a nice UX improvement. For an AI agent calling a payment API, this is a structural advantage.&lt;/p&gt;

&lt;p&gt;The agent makes an API call. It gets back a payment URL. It hands that URL to the human for one-time bank authentication. Done. The agent doesn't need to handle auth. The bank does. The separation of concerns is clean.&lt;/p&gt;

&lt;p&gt;And with VRPs (Variable Recurring Payments) now live in the UK, it gets even better. The human authenticates once, sets spending limits, and the agent can initiate payments within those limits without any further human interaction. No virtual cards. No pre-registered agent identities. Just an API call against a mandate.&lt;/p&gt;

&lt;p&gt;That's not a workaround. That's the architecture actually working as designed.&lt;/p&gt;

&lt;h2&gt;
  
  
  What breaks when agents use card APIs
&lt;/h2&gt;

&lt;p&gt;I've been watching developers try to build agentic payment flows on card rails. Here's what keeps going wrong:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PCI scope explosion.&lt;/strong&gt; If your agent is generating virtual cards, storing card tokens, or managing card-on-file relationships, your PCI compliance surface just grew. AI agents that handle card data need the same compliance posture as any system that touches PANs. That's not a small thing. That's PCI DSS assessment scope, penetration testing, and quarterly vulnerability scans, all for an agent that could've made a bank-to-bank API call instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication is the bottleneck.&lt;/strong&gt; 3D Secure was designed as a human-in-the-loop check. Every attempt to skip it for agents either weakens security (bad) or creates a parallel auth system (complex and fragile). Open banking's approach, where SCA happens at the bank rather than at your checkout, means the agent never has to handle the authentication challenge itself. It calls the API and hands the bank link to the human.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Settlement lag creates state management nightmares.&lt;/strong&gt; Card payments settle in days. When an agent is orchestrating a multi-step workflow (compare prices → select vendor → pay → confirm delivery), it needs to know whether payment actually landed. With cards, you get an authorisation that might reverse, a settlement that arrives Tuesday, and a chargeback window that stays open for months. With open banking, payment confirmation is real-time. The state machine is simpler because the money actually moved.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Micro-payments don't work on card rails.&lt;/strong&gt; AI agents can generate hundreds of micro-transactions per session. Card pricing typically adds a fixed per-transaction fee on top of the percentage, which makes sub-pound transactions economically absurd. Open banking's fee structure is flat or percentage-based without the card stack's minimum charge, which is why it actually works for the agentic pattern of many small, frequent payments.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd actually build
&lt;/h2&gt;

&lt;p&gt;If I were starting an agentic commerce integration today for UK customers, here's the architecture I'd reach for:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For one-off agent-initiated purchases:&lt;/strong&gt; Open banking payment initiation. The agent creates a payment request via API, gets a consent URL, passes it to the user. One SCA event, then the money moves. No card data anywhere in the stack.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent → Payment API (create payment request)
      → User gets bank auth link
      → User approves in banking app (biometrics)
      → Webhook confirms payment
      → Agent continues workflow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For recurring agent-managed spending:&lt;/strong&gt; VRP mandates. User sets up a mandate once — monthly cap, per-transaction cap, approved merchant. The agent calls the payment API within those bounds. No re-authentication. No virtual cards. No 3DS.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User → Sets up VRP mandate (one-time SCA)
Agent → Calls payment API within mandate limits
      → Instant confirmation via webhook
      → No further user interaction needed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
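
&lt;p&gt;On the agent side, "within mandate limits" is worth checking before the API call ever goes out. A minimal sketch of that pre-flight check, assuming the three bounds named above (monthly cap, per-transaction cap, approved merchant). The type and field names are hypothetical, and the bank or provider still enforces the mandate on its side regardless.&lt;/p&gt;

```typescript
// Hypothetical mandate shape. Amounts are whole pence to avoid floating-point drift.
interface AgentMandate {
  perTransactionCapPence: number;
  monthlyCapPence: number;
  approvedMerchantId: string;
  spentThisMonthPence: number;
}

type MandateDecision =
  | { allowed: true }
  | { allowed: false; reason: string };

// Local pre-flight check an agent could run before calling the payment API.
function checkAgainstMandate(
  mandate: AgentMandate,
  merchantId: string,
  amountPence: number
): MandateDecision {
  // Reject NaN, fractions, zero, and negative amounts up front.
  if (Math.floor(amountPence) !== amountPence || 0 >= amountPence) {
    return { allowed: false, reason: "amount must be a positive whole number of pence" };
  }
  if (merchantId !== mandate.approvedMerchantId) {
    return { allowed: false, reason: "merchant is not covered by the mandate" };
  }
  if (amountPence > mandate.perTransactionCapPence) {
    return { allowed: false, reason: "amount exceeds the per-transaction cap" };
  }
  if (mandate.spentThisMonthPence + amountPence > mandate.monthlyCapPence) {
    return { allowed: false, reason: "amount exceeds the remaining monthly cap" };
  }
  return { allowed: true };
}
```

&lt;p&gt;Returning a reason rather than a bare boolean gives the agent something useful to report back to the human when a payment is refused.&lt;/p&gt;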



&lt;p&gt;&lt;strong&gt;For the auth layer:&lt;/strong&gt; Don't build one. Seriously. The bank handles it. Your agent's job is to orchestrate the payment flow, not to authenticate the user. That's separation of concerns, and it's the right call whether you're building for agents or humans.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real question no one's asking
&lt;/h2&gt;

&lt;p&gt;Everyone is asking "how do we make card payments work for AI agents?"&lt;/p&gt;

&lt;p&gt;I think that's the wrong question.&lt;/p&gt;

&lt;p&gt;The right question is: &lt;strong&gt;why are we starting with the payment rail that requires the most human interaction, then engineering the human out?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open banking was built on the premise that software talks to bank APIs. The authentication layer lives at the bank. The payment instruction is a clean API call. Settlement is instant or same-day. There's no card number to protect, no 3DS challenge to render, no interchange fee eating your margin on micro-transactions.&lt;/p&gt;

&lt;p&gt;It wasn't designed for AI agents. But its architecture fits the agentic pattern better than anything the card networks are retrofitting.&lt;/p&gt;

&lt;p&gt;Sometimes the future doesn't come from the company with the biggest R&amp;amp;D budget. Sometimes it comes from the infrastructure that was built on the right abstraction in the first place.&lt;/p&gt;

&lt;p&gt;The UK has live open banking rails, live VRPs, and an FCA that's actively building the regulatory framework for what comes next. If you're a developer building agentic commerce for UK customers, you have a head start. Use it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Arun, CTO at Atoa. We build open banking payment infrastructure for UK merchants. If you want to see what the API looks like: &lt;a href="https://docs.atoa.me" rel="noopener noreferrer"&gt;docs.atoa.me&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Try the sandbox → &lt;a href="https://docs.atoa.me" rel="noopener noreferrer"&gt;docs.atoa.me&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>fintech</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>FCA Just Rewrote the Open Banking Playbook for 2026. Here's What UK Payment Developers Actually Need to Know</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Wed, 13 May 2026 10:08:03 +0000</pubDate>
      <link>https://dev.to/mickyarun/fca-just-rewrote-the-open-banking-playbook-for-2026-heres-what-uk-payment-developers-actually-8gg</link>
      <guid>https://dev.to/mickyarun/fca-just-rewrote-the-open-banking-playbook-for-2026-heres-what-uk-payment-developers-actually-8gg</guid>
      <description>&lt;p&gt;I'm the CTO of an FCA-authorised Payment Institution. I spend most of my week either writing payments code or reading FCA / PSR consultation papers so my team doesn't have to.&lt;/p&gt;

&lt;p&gt;2026 is the year that work stopped being optional for everyone else.&lt;/p&gt;

&lt;p&gt;Three things shifted in the first half of the year, and if you ship anything that touches UK payments — checkouts, wallets, invoicing, recurring billing — at least two of them affect your roadmap. Most developer threads I read are still pattern-matching open banking onto "OAuth, but for banks." The actual changes are more interesting than that, and a lot more consequential.&lt;/p&gt;

&lt;p&gt;Here's the part I wish someone had handed me as a one-pager.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Variable Recurring Payments went from "spec" to "shipping"
&lt;/h2&gt;

&lt;p&gt;The FCA confirmed that the first commercial Variable Recurring Payments (VRPs) under the UK Payments Initiative scheme started flowing in Q1 2026. Phase 1 covers utilities, financial services top-ups, and payments to local and central government. (&lt;a href="https://www.regulationtomorrow.com/eu/fca-psr-joint-statement-on-open-banking-pricing-models/" rel="noopener noreferrer"&gt;FCA / PSR joint statement on open banking pricing models&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;For developers, this is the line where open banking stops being a one-off-payment toy and starts looking like a real subscription rail.&lt;/p&gt;

&lt;p&gt;What changes in your code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consent has duration now.&lt;/strong&gt; A VRP consent is the closest thing UK open banking has ever had to "card-on-file." The user authorises a mandate — amount caps per period, total cap, expiry — and you can initiate payments against it without a fresh SCA dance every time. That's a different state machine than single-payment PIS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need to store mandate IDs, not card tokens.&lt;/strong&gt; The shape of the entity you persist is closer to a Direct Debit instruction than a Stripe &lt;code&gt;payment_method&lt;/code&gt;. Caps, frequency, last-used timestamp, status, revocation timestamp.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Revocation is bank-side, not app-side.&lt;/strong&gt; Users can kill a VRP mandate inside their banking app and you get a webhook telling you it's gone. If your retry logic assumes the mandate is still good because the user didn't cancel in &lt;em&gt;your&lt;/em&gt; UI, you'll log a lot of 401s.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing is now regulated.&lt;/strong&gt; The FCA / PSR joint statement in 2026 explicitly addressed VRP pricing models. Translation: this is no longer a back-room commercial conversation between ASPSPs and TPPs. The rate card has rails.&lt;/li&gt;
&lt;/ul&gt;
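
&lt;p&gt;As a rough sketch of the second and third points, here is what the persisted entity and a bank-side revocation might look like. The record shape follows the fields listed above, but every name here is an assumption for illustration, not a provider's schema.&lt;/p&gt;

```typescript
// Hypothetical record shape built from the fields listed above.
type MandateStatus = "active" | "revoked" | "expired";

interface MandateRecord {
  mandateId: string;
  maxAmountPerPeriodPence: number;
  period: "monthly";
  status: MandateStatus;
  lastUsedAt: string | null; // ISO 8601 timestamp, null if never charged
  revokedAt: string | null; // ISO 8601 timestamp, null while the mandate is live
}

// Applies a bank-side revocation notification.
// Returns a new record instead of mutating the one passed in.
function applyRevocation(record: MandateRecord, revokedAt: string): MandateRecord {
  if (record.status === "revoked") {
    return record; // repeated notification: keep the original revocation time
  }
  return { ...record, status: "revoked", revokedAt: revokedAt };
}

// Retry logic should ask this before every attempt,
// not assume the mandate is still good because nobody cancelled in your UI.
function canCharge(record: MandateRecord): boolean {
  return record.status === "active";
}
```

&lt;p&gt;The useful habit is making the revocation handler safe to run twice and making every retry path go through &lt;code&gt;canCharge&lt;/code&gt; first.&lt;/p&gt;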

&lt;p&gt;If you're SaaS-billed-monthly and your customers are UK consumers, this is the first year where "switch from cards to open banking" is a real engineering conversation, not a 2027 prediction.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The FCA is getting actual rule-making powers over open banking
&lt;/h2&gt;

&lt;p&gt;The other big shift: the Data (Use and Access) Act gives the FCA new statutory powers to set open banking rules directly, rather than acting through the legacy CMA-mandated framework. The FCA has said it will consult on the long-term regulatory framework before the end of 2026, with workshops over the summer and autumn. (&lt;a href="https://www.fca.org.uk/news/news-stories/open-banking-2025-progress" rel="noopener noreferrer"&gt;FCA: Open banking — a year of progress&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;A developer-facing translation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Open Banking Implementation Entity (OBIE) era is winding down. A "Future Entity" is being stood up — the FCA wrote to trade associations in late 2025 and expected the process to kick off around February 2026. (&lt;a href="https://www.regulationtomorrow.com/2026/01/open-banking-process-to-establish-a-future-entity/" rel="noopener noreferrer"&gt;Open banking — process to establish a Future Entity&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;The Open Banking Standard you've been integrating against will get formalised into FCA rules instead of CMA orders. Same spec, harder enforcement, broader scope.&lt;/li&gt;
&lt;li&gt;Expect new categories of regulated activity to land in scope: open finance (savings, pensions, investments, insurance) is being framed as the next phase, and the FCA published a vision paper for it. (&lt;a href="https://www.fca.org.uk/news/press-releases/fca-sets-out-vision-open-finance" rel="noopener noreferrer"&gt;FCA: vision for open finance&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're betting on open banking, you're now betting on a rail with a clearer regulator on top of it. That tends to be good for serious builders and bad for people who were treating PISP authorisation as optional.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The architecture decision hasn't changed, but the stakes have
&lt;/h2&gt;

&lt;p&gt;The architectural choice for any UK payment developer is still binary, and it's the same one I wrote about in &lt;a href="https://dev.to/mickyarun"&gt;What Developers Get Wrong About PSD2&lt;/a&gt; a few weeks ago:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Become an FCA-authorised PISP yourself.&lt;/strong&gt; Months of compliance work. A regulated entity. Ongoing capital requirements. Direct ASPSP relationships. Worth it if payments is your business.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrate against an authorised provider.&lt;/strong&gt; Days, not months. One API. Webhook semantics that don't change every time a CMA9 bank updates their consent flow.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What 2026 changes is the &lt;em&gt;cost of getting this wrong&lt;/em&gt;. With VRPs going live, open finance on the roadmap, and the FCA now sitting on direct rule-making powers, the compliance surface is going to grow. If you've shipped a bank-by-bank integration, that surface grows linearly with every spec revision. We learned this running payments services across the CMA9 — every quarter, something changes.&lt;/p&gt;

&lt;p&gt;If you're not a payments company, don't become one by accident.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the integration actually looks like
&lt;/h2&gt;

&lt;p&gt;Same minimal pattern as a single-payment PIS, slightly different envelope for VRP. A mandate creation against Atoa looks roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 1. Create a VRP mandate&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mandate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.atoa.me/v1/mandates&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;ATOA_API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;reference&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;subscription&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;monthly&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;max_amount_per_period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4999&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// £49.99 cap per month&lt;/span&gt;
    &lt;span class="na"&gt;max_total_amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;599999&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;// £5,999.99 lifetime cap&lt;/span&gt;
    &lt;span class="na"&gt;expires_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2027-05-13&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;redirect_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://app.example.com/mandates/return&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="c1"&gt;// 2. Send the customer to mandate.authorisation_url&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mandate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;authorisation_url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 3. Later, charge against the mandate without re-authing the user&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;payment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`https://api.atoa.me/v1/mandates/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;mandate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/payments`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;ATOA_API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4999&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GBP&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;reference&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user does SCA once, when the mandate is created. Every subsequent charge is a single API call. No 3DS dialogs. No saved-card vault. No PAN data on your servers.&lt;/p&gt;

&lt;p&gt;This is the thing developers don't realise until they ship it: the new open banking rail isn't a worse Stripe — it's a different shape, and once VRPs are live, that shape covers most of the cases card-on-file used to.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do this quarter
&lt;/h2&gt;

&lt;p&gt;If you're shipping UK payments in 2026, three concrete moves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add VRP to your payments roadmap, even if you don't ship it this quarter.&lt;/strong&gt; The schema you'd need to model mandates is going to land in your codebase eventually. Sketch the data model now while you're still designing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decide whether you're integrating directly or through an aggregator before the FCA consultation lands.&lt;/strong&gt; The new rules will reset the bar for "compliant" — if you're piecing together bank integrations, you've now got a regulatory clock on top of an engineering one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subscribe to the FCA consultation feed.&lt;/strong&gt; Genuinely. The 2026 papers will set the rails for the next five years. (&lt;a href="https://www.fca.org.uk/news/press-releases/fca-sets-out-vision-open-finance" rel="noopener noreferrer"&gt;FCA: vision for open finance&lt;/a&gt;)&lt;/li&gt;
&lt;/ol&gt;
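
&lt;p&gt;To make the first move concrete, here is a minimal TypeScript sketch of a mandate record. It is an illustration, not Atoa's schema: the cap and expiry fields mirror the mandate request shown earlier, while &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;chargedThisPeriod&lt;/code&gt;, &lt;code&gt;chargedTotal&lt;/code&gt; and the &lt;code&gt;canCharge&lt;/code&gt; helper are hypothetical names added for the example.&lt;/p&gt;

```typescript
// Hypothetical mandate record. Amounts are integers in pence, as in the request body.
export interface Mandate {
  id: string;
  reference: string;
  period: 'monthly';
  maxAmountPerPeriod: number; // e.g. 4999 for a cap of 49.99 GBP per month
  maxTotalAmount: number;     // lifetime cap
  expiresAt: string;          // ISO date, e.g. '2027-05-13'
  status: 'pending' | 'active' | 'revoked'; // assumed lifecycle, not from any API
  chargedThisPeriod: number;  // running total for the current period
  chargedTotal: number;       // running total over the mandate's life
}

// Local pre-check before calling the payments endpoint.
// Assumption: the mandate is still usable on the expiry date itself.
export function canCharge(m: Mandate, amount: number, today: string): boolean {
  if (m.status !== 'active') return false;
  if (today > m.expiresAt) return false; // ISO dates compare correctly as strings
  if (m.chargedThisPeriod + amount > m.maxAmountPerPeriod) return false;
  if (m.chargedTotal + amount > m.maxTotalAmount) return false;
  return amount > 0;
}
```

&lt;p&gt;Storing the running totals next to the caps lets you reject an over-limit charge locally instead of finding out from an API error. The provider's own check still has the final say.&lt;/p&gt;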

&lt;p&gt;This is the moment open banking stops being "the alternative" and starts being "the rail." If you're a developer shipping UK payments, the bet you're being asked to make this year is whether you want to build on the rail before it's mainstream, or after.&lt;/p&gt;

&lt;p&gt;I'd build on it now.&lt;/p&gt;




&lt;p&gt;If you want to try the API and skip the regulatory headache, the sandbox is open: &lt;a href="https://docs.atoa.me" rel="noopener noreferrer"&gt;docs.atoa.me&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>api</category>
      <category>backend</category>
      <category>news</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Payment Webhooks Will Lie To You. Here's How We Built Ones That Don't (in NestJS)</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Wed, 29 Apr 2026 11:48:31 +0000</pubDate>
      <link>https://dev.to/mickyarun/payment-webhooks-will-lie-to-you-heres-how-we-built-ones-that-dont-in-nestjs-30g9</link>
      <guid>https://dev.to/mickyarun/payment-webhooks-will-lie-to-you-heres-how-we-built-ones-that-dont-in-nestjs-30g9</guid>
      <description>&lt;p&gt;A payment webhook fires once. You miss it. The customer thinks they paid. Your dashboard says they didn't.&lt;/p&gt;

&lt;p&gt;Welcome to my Tuesday morning, two years ago.&lt;/p&gt;

&lt;p&gt;I've shipped four payment webhook systems in my career. The first three taught me everything I now refuse to do again. The fourth — the one running inside Atoa today — handles open banking payment notifications across our Node.js services without a single missed event in the last 14 months.&lt;/p&gt;

&lt;p&gt;Here's the boring, opinionated, production-tested pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  The lie webhooks tell you
&lt;/h2&gt;

&lt;p&gt;Every payment platform sells webhooks the same way:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"We'll notify your endpoint the moment the payment status changes."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What they don't sell you on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Webhooks &lt;strong&gt;retry&lt;/strong&gt;. Sometimes 8 times. Sometimes never.&lt;/li&gt;
&lt;li&gt;Webhooks &lt;strong&gt;arrive out of order&lt;/strong&gt;. &lt;code&gt;failed&lt;/code&gt; can land before &lt;code&gt;pending&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Webhooks &lt;strong&gt;lie about idempotency&lt;/strong&gt;. Two &lt;code&gt;succeeded&lt;/code&gt; events for the same payment is normal, not a bug.&lt;/li&gt;
&lt;li&gt;Webhooks &lt;strong&gt;drop&lt;/strong&gt;. Network blip, your pod restart, a bad DNS lookup — one missed delivery and your reconciliation is wrong.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your webhook handler is a 30-line controller that updates a row in your database, you don't have a payment system. You have a hope.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four-layer pattern
&lt;/h2&gt;

&lt;p&gt;Every webhook flow we run at Atoa has four layers. Skip any one and you'll be reconciling spreadsheets at midnight.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Verify the signature &lt;em&gt;before&lt;/em&gt; you parse the body
&lt;/h3&gt;

&lt;p&gt;The most common bug I see in code reviews from junior devs: parsing the JSON before checking the HMAC.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// webhook.controller.ts&lt;/span&gt;
&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;atoa&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Headers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-atoa-signature&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;RawBody&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// raw, not parsed&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;UnauthorizedException&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;payment.webhook&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;received&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two non-negotiables:&lt;/p&gt;

&lt;ul&gt;
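&lt;li&gt;Use the &lt;strong&gt;raw body&lt;/strong&gt; for HMAC verification. NestJS's default JSON parser hands you a parsed object, and re-serialising that object rarely reproduces the exact bytes the sender signed (whitespace and key order can differ), so the signature check fails. Enable &lt;code&gt;rawBody: true&lt;/code&gt; when you create the app.&lt;/li&gt;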
&lt;li&gt;Reject before you do &lt;em&gt;anything else&lt;/em&gt;. No DB hits, no logging the payload at info level, nothing.&lt;/li&gt;
&lt;/ul&gt;
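
&lt;p&gt;For reference, here is roughly what a helper like &lt;code&gt;this.crypto.verify&lt;/code&gt; could look like. This is a sketch under an assumption: the header carries a hex-encoded HMAC-SHA256 of the raw body. The real algorithm and encoding come from your provider's docs, and &lt;code&gt;verifySignature&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Assumption: signature = hex(HMAC-SHA256(secret, rawBody)). Check your provider's docs.
export function verifySignature(
  rawBody: Buffer,
  signature: string | undefined,
  secret: string,
): boolean {
  // A missing header is a failed check, not an exception.
  if (typeof signature !== 'string') {
    return false;
  }
  const expected = createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signature, 'hex');
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  if (received.length !== expected.length) {
    return false;
  }
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(received, expected);
}
```

&lt;p&gt;Because the HMAC is computed over bytes, a body that differs by a single space fails the check. That is why the controller needs the raw buffer, not a re-serialised object.&lt;/p&gt;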

&lt;h3&gt;
  
  
  2. Acknowledge fast. Process slow.
&lt;/h3&gt;

&lt;p&gt;The webhook controller does two things: verify, enqueue. That's it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// verify (above)&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;payment.webhook&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;received&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;  &lt;span class="c1"&gt;// 2xx within ~50ms (Nest answers POST with 201 unless you add @HttpCode(200))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your handler takes 8 seconds because you're hitting Stripe + your DB + sending an email, the sender will time out and retry. Now you have two events. Then four. Then the on-call engineer.&lt;/p&gt;

&lt;p&gt;We use BullMQ on Redis. You can use SQS, NATS, Kafka — pick your poison. The point is: &lt;strong&gt;the HTTP response is decoupled from the work&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Idempotency keys are not optional
&lt;/h3&gt;

&lt;p&gt;Every event has an &lt;code&gt;event_id&lt;/code&gt;. Before you do anything in your worker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Processor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;payment.webhook&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WebhookProcessor&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Job&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;WebhookEvent&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;payment_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isFirstDelivery&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;firstSeen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;isFirstDelivery&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Duplicate event &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; — skipping`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;applyStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payment_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;firstSeen&lt;/code&gt; is a write to a Postgres table with &lt;code&gt;event_id&lt;/code&gt; as the primary key. If the insert succeeds, this is the first time we've seen this event. If it conflicts, we've processed it before. No race conditions, no Redis dance — just let the database do the work it's good at.&lt;/p&gt;
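
&lt;p&gt;A minimal sketch of that insert, with assumptions stated up front: the table name &lt;code&gt;webhook_events&lt;/code&gt; is made up, and &lt;code&gt;db&lt;/code&gt; is any client whose &lt;code&gt;query&lt;/code&gt; method resolves to a result with a &lt;code&gt;rowCount&lt;/code&gt;, which is the shape node-postgres returns. It is typed loosely here to keep the example dependency-free.&lt;/p&gt;

```typescript
// Hypothetical table: CREATE TABLE webhook_events (event_id text PRIMARY KEY);
export const FIRST_SEEN_SQL =
  'INSERT INTO webhook_events (event_id) VALUES ($1) ON CONFLICT (event_id) DO NOTHING';

// Resolves to true only for the delivery that actually inserted the row.
// `db.query` is typed as `any` on purpose so the sketch does not depend on a driver.
export async function firstSeen(
  db: { query: (text: string, values: string[]) => any },
  eventId: string,
) {
  const result = await db.query(FIRST_SEEN_SQL, [eventId]);
  // One row inserted: first delivery. Zero rows: the primary key already existed.
  return result.rowCount === 1;
}
```

&lt;p&gt;One design choice to weigh: if the insert commits but the status update after it fails, a retry will be skipped as a duplicate. Running both in the same transaction is one way to avoid that.&lt;/p&gt;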

&lt;h3&gt;
  
  
  4. State machines, not status updates
&lt;/h3&gt;

&lt;p&gt;This is the one that took me three failed payment systems to learn.&lt;/p&gt;

&lt;p&gt;A payment doesn't have a "status field." It has a &lt;strong&gt;state machine&lt;/strong&gt;. Some transitions are legal. Most aren't.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ALLOWED&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PaymentStatus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;PaymentStatus&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;initiated&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;authorising&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;authorising&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;succeeded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;succeeded&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;            &lt;span class="c1"&gt;// terminal&lt;/span&gt;
  &lt;span class="na"&gt;failed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;               &lt;span class="c1"&gt;// terminal&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;applyStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;PaymentStatus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;payment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;ALLOWED&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;payment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Illegal transition: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;payment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; → &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// do not update, do not throw — this is normal&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why this matters: when events arrive out of order (and they will), a late &lt;code&gt;failed&lt;/code&gt; must never downgrade a payment that has already &lt;code&gt;succeeded&lt;/code&gt;. With a state machine, the illegal transition is logged and dropped, the reconciler picks it up later, and the customer's payment stays correct.&lt;/p&gt;
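
&lt;p&gt;A small worked example helps here. The sketch below replays out-of-order deliveries through the same transition table as above, with no database involved. &lt;code&gt;nextStatus&lt;/code&gt; and &lt;code&gt;replay&lt;/code&gt; are hypothetical helper names used only for this illustration.&lt;/p&gt;

```typescript
type PaymentStatus = 'initiated' | 'authorising' | 'succeeded' | 'failed';

// Same transition table as in the post, typed without generics.
const ALLOWED: { [S in PaymentStatus]: PaymentStatus[] } = {
  initiated: ['authorising', 'failed'],
  authorising: ['succeeded', 'failed'],
  succeeded: [], // terminal
  failed: [],    // terminal
};

// Returns the incoming status if the transition is legal, otherwise keeps the current one.
export function nextStatus(current: PaymentStatus, incoming: PaymentStatus): PaymentStatus {
  return ALLOWED[current].includes(incoming) ? incoming : current;
}

// Applies events in arrival order, starting from 'initiated'.
export function replay(events: PaymentStatus[]): PaymentStatus {
  let status: PaymentStatus = 'initiated';
  for (const incoming of events) {
    status = nextStatus(status, incoming);
  }
  return status;
}
```

&lt;p&gt;Feeding it &lt;code&gt;['authorising', 'succeeded', 'failed']&lt;/code&gt; ends on &lt;code&gt;succeeded&lt;/code&gt;: the late &lt;code&gt;failed&lt;/code&gt; is dropped. Feeding it &lt;code&gt;['succeeded', 'authorising']&lt;/code&gt; ends on &lt;code&gt;authorising&lt;/code&gt;, because the early &lt;code&gt;succeeded&lt;/code&gt; was not yet a legal transition. That second case is the one the reconciler exists to repair.&lt;/p&gt;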

&lt;h2&gt;
  
  
  What we'd never do again
&lt;/h2&gt;

&lt;p&gt;Three patterns I see in the wild that I had to unlearn:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Polling instead of webhooks&lt;/strong&gt;. "We'll just check the status every 30 seconds." Sure — and you'll burn rate limits, miss the 5-second window where a customer is staring at the spinner, and pay for compute that does nothing 99% of the time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replaying webhooks by re-running the handler&lt;/strong&gt;. If the handler does five things, replaying it does five things again. Idempotency keys mean replays are free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging the full payload at info level&lt;/strong&gt;. Payment payloads carry personal data, so under UK GDPR your logs are a PII store now. Log the &lt;code&gt;event_id&lt;/code&gt; and the status. Nothing else.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Where this gets you
&lt;/h2&gt;

&lt;p&gt;We process open banking payment notifications across dozens of UK merchants on this exact pattern. Zero missed events in 14 months. Reconciliation runs once a day and finds nothing to reconcile.&lt;/p&gt;

&lt;p&gt;The pattern doesn't care which payment provider you use. Stripe, GoCardless, Atoa — same four layers.&lt;/p&gt;

&lt;p&gt;If you want to see what these webhooks look like on the open banking side, our API docs walk through the full payment lifecycle and the webhook events we fire: &lt;a href="https://docs.atoa.me/api-reference/Payment/process-payment" rel="noopener noreferrer"&gt;docs.atoa.me&lt;/a&gt;. Sandbox is free, no card needed.&lt;/p&gt;

&lt;p&gt;Build the boring layers first. Sleep through Tuesday mornings.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Arun is co-founder &amp;amp; CTO of &lt;a href="https://paywithatoa.co.uk" rel="noopener noreferrer"&gt;Atoa&lt;/a&gt;, a UK open banking payments platform. He's &lt;a class="mentioned-user" href="https://dev.to/mickyarun"&gt;@mickyarun&lt;/a&gt; on X and dev.to. Driven by passion.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>nestjs</category>
      <category>webhooks</category>
      <category>openbanking</category>
      <category>node</category>
    </item>
    <item>
      <title>I Asked Three Coding Agents to Build My Son's Cricket Coach a Website. The Result Wasn't Decided by the Model — It Was Decided by Taste.</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Tue, 28 Apr 2026 08:51:56 +0000</pubDate>
      <link>https://dev.to/mickyarun/i-asked-three-coding-agents-to-build-my-sons-cricket-coach-a-website-the-result-wasnt-decided-by-3fam</link>
      <guid>https://dev.to/mickyarun/i-asked-three-coding-agents-to-build-my-sons-cricket-coach-a-website-the-result-wasnt-decided-by-3fam</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fog0y03vew7mee6anv6xy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fog0y03vew7mee6anv6xy.png" alt=" " width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — Codex GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro. Same prompt. Same 18 photos. Five total runs across different effort budgets. The one that won wasn't the prettiest. It was the one that understood the job: parents in Bengaluru enquire on WhatsApp, not contact forms.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;My son's cricket coach asked me for a website.&lt;/p&gt;

&lt;p&gt;Saturday afternoon. He runs &lt;strong&gt;Bangalore Royal Cricket Academy&lt;/strong&gt; — a small but seriously good cricket academy for kids. He had two phone numbers, a folder of 18 WhatsApp photos taken by parents, and a single line of brief: &lt;em&gt;"Like a real cricket academy, parents should be able to call or WhatsApp from their phone."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I'm a CTO. I'm in the trenches with AI coding agents most weeks. This felt like a clean, low-stakes test.&lt;/p&gt;

&lt;p&gt;So I gave the &lt;strong&gt;exact same prompt&lt;/strong&gt; and the &lt;strong&gt;exact same 18 photos&lt;/strong&gt; to three coding agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Codex&lt;/strong&gt; (GPT-5.5, medium effort)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic Claude Opus 4.7&lt;/strong&gt; (low effort, then re-run on medium)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Gemini 3.1 Pro&lt;/strong&gt; (low effort, then re-run on high)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Five outputs. One Saturday. Five very different opinions on what "a cricket academy website" actually is.&lt;/p&gt;

&lt;p&gt;I went in expecting a verdict on visual quality. I didn't get one. I got something more interesting.&lt;/p&gt;




&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;The prompt was deliberately short:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Build a single-page website for Bangalore Royal Cricket Academy. Brand line: "Nurturing champions, one delivery at a time." Programs: Summer Camp, Weekday Batch, Weekend Batch, Intensive (elite). Two phone numbers. The photos are in &lt;code&gt;/photos for website&lt;/code&gt;. Parents should be able to contact us easily from their phone.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's it. No design system. No colour palette. No mention of WhatsApp by name. No mention of tests, deployment, SEO meta, or Cloudflare. Whatever each agent decided "easily contact us from their phone" meant — that was on the agent.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I got back, in five outputs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Claude Opus 4.7, low effort
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F761vdk2apqzybhinstvj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F761vdk2apqzybhinstvj.png" alt=" " width="800" height="2503"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Single-file HTML, Tailwind via CDN, Bebas Neue display font, royal navy + gold palette.&lt;/p&gt;

&lt;p&gt;The headline made me sit up: &lt;strong&gt;"CHAMPIONS ARE / BUILT HERE."&lt;/strong&gt; with the second half in gold. It was the only one of the five where the hero felt like it belonged on a printed flyer the coach would hand out at a school. Visually polished.&lt;/p&gt;

&lt;p&gt;Engineering-wise, thin: no tests, no OG tags beyond a &lt;code&gt;&amp;lt;meta description&amp;gt;&lt;/code&gt;, photos referenced as &lt;code&gt;img-01.jpg&lt;/code&gt;…&lt;code&gt;img-18.jpg&lt;/code&gt;, all 14 used in a uniform 4-column grid. Tel: links only. No WhatsApp.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Claude Opus 4.7, medium effort
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ulaugzkrw1uou3urocp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ulaugzkrw1uou3urocp.png" alt=" " width="800" height="2381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Same starting point, completely different output.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;section&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"top"&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"relative h-screen min-h-[640px] w-full overflow-hidden"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"absolute inset-0"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;img&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"assets/photos/brca-01.jpeg"&lt;/span&gt; &lt;span class="na"&gt;alt=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"kenburns"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"absolute inset-0 bg-gradient-to-b from-navy-deep/85 via-navy/70 to-navy-deep/95"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
  ...
&lt;span class="nt"&gt;&amp;lt;/section&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full-screen hero with a &lt;strong&gt;Ken Burns animation&lt;/strong&gt; on the image. A scroll indicator with an animated dot inside a mouse outline. A &lt;strong&gt;gold cricket-seam pattern divider&lt;/strong&gt; between sections: actual dashed lines that look like ball stitching. Two-image collage in the about section with offset margins. CSS-columns masonry gallery using all 18 photos. Inline-SVG favicon as a data URI (one fewer request). OG tags. &lt;code&gt;theme-color&lt;/code&gt;. WhatsApp deep-link button in the contact section.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;a&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"https://wa.me/917337726777?text=Hi%20BRCA%2C%20I%27d%20like%20to%20know%20more%20about%20your%20programs."&lt;/span&gt;
   &lt;span class="na"&gt;target=&lt;/span&gt;&lt;span class="s"&gt;"_blank"&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"noopener"&lt;/span&gt;
   &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"bg-gold text-navy font-semibold px-6 py-3.5 rounded-md"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  💬 Message us on WhatsApp
&lt;span class="nt"&gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was the prettiest output of the five. By a clear margin. Bebas Neue + Inter, Ken Burns, gold seam, masonry — the only one I'd let near a printer.&lt;/p&gt;

&lt;p&gt;Still Tailwind via CDN. Still no test suite. Still no automated deploy. Photos renamed semantically (&lt;code&gt;brca-01.jpeg&lt;/code&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Codex GPT-5.5, medium effort
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnu67o7yrce765tiz7n4z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnu67o7yrce765tiz7n4z.png" alt=" " width="800" height="2949"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vanilla HTML + 800-line vanilla CSS + 16 lines of vanilla JS. White-and-navy local-business layout. Numbered "01–04" feature blocks. WhatsApp green CTAs in the contact section.&lt;/p&gt;

&lt;p&gt;It looks less editorial than Claude-medium. It also does five things none of the others did.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One.&lt;/strong&gt; It picked &lt;strong&gt;6 photos&lt;/strong&gt; out of 18 and renamed them by content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brca-team-ground.jpeg
brca-trophy-team.jpeg
brca-trophy-presentation.jpeg
brca-young-achievers.jpeg
brca-coaching-moment.jpeg
brca-floodlight-batch.jpeg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's editorial judgement encoded in code output. It chose; it didn't dump everything into a grid.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two.&lt;/strong&gt; It wrote a &lt;code&gt;_headers&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;/*
  &lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Content&lt;/span&gt;-&lt;span class="n"&gt;Type&lt;/span&gt;-&lt;span class="n"&gt;Options&lt;/span&gt;: &lt;span class="n"&gt;nosniff&lt;/span&gt;
  &lt;span class="n"&gt;Referrer&lt;/span&gt;-&lt;span class="n"&gt;Policy&lt;/span&gt;: &lt;span class="n"&gt;strict&lt;/span&gt;-&lt;span class="n"&gt;origin&lt;/span&gt;-&lt;span class="n"&gt;when&lt;/span&gt;-&lt;span class="n"&gt;cross&lt;/span&gt;-&lt;span class="n"&gt;origin&lt;/span&gt;
  &lt;span class="n"&gt;Permissions&lt;/span&gt;-&lt;span class="n"&gt;Policy&lt;/span&gt;: &lt;span class="n"&gt;camera&lt;/span&gt;=(), &lt;span class="n"&gt;microphone&lt;/span&gt;=(), &lt;span class="n"&gt;geolocation&lt;/span&gt;=()

/&lt;span class="n"&gt;assets&lt;/span&gt;/*
  &lt;span class="n"&gt;Cache&lt;/span&gt;-&lt;span class="n"&gt;Control&lt;/span&gt;: &lt;span class="n"&gt;public&lt;/span&gt;, &lt;span class="n"&gt;max&lt;/span&gt;-&lt;span class="n"&gt;age&lt;/span&gt;=&lt;span class="m"&gt;31536000&lt;/span&gt;, &lt;span class="n"&gt;immutable&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Security headers and cache rules. I didn't ask for them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three.&lt;/strong&gt; It wrote a real test suite using &lt;code&gt;node:test&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;home page exposes call and WhatsApp enrollment links&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;index.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/href="tel:&lt;/span&gt;&lt;span class="se"&gt;\+&lt;/span&gt;&lt;span class="sr"&gt;917337726777"/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/href="tel:&lt;/span&gt;&lt;span class="se"&gt;\+&lt;/span&gt;&lt;span class="sr"&gt;917337736777"/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/https:&lt;/span&gt;&lt;span class="se"&gt;\/\/&lt;/span&gt;&lt;span class="sr"&gt;wa&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="sr"&gt;me&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;917337726777/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/https:&lt;/span&gt;&lt;span class="se"&gt;\/\/&lt;/span&gt;&lt;span class="sr"&gt;wa&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="sr"&gt;me&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;917337736777/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;referenced local assets and Cloudflare Pages config exist&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;imageRefs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;matchAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/src="&lt;/span&gt;&lt;span class="se"&gt;([^&lt;/span&gt;&lt;span class="sr"&gt;"&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;\.(?:&lt;/span&gt;&lt;span class="sr"&gt;jpg|jpeg|png|webp&lt;/span&gt;&lt;span class="se"&gt;))&lt;/span&gt;&lt;span class="sr"&gt;"/gi&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;imageRefs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;at least six academy photos are used&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ref&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;imageRefs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;existsSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; exists`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three tests. They assert brand text, both phone numbers, both WhatsApp links, security file existence, responsive CSS, and that &lt;strong&gt;every referenced image actually exists on disk&lt;/strong&gt;. That last one is the one I respect most. It catches the single most common silent break in a static site.&lt;/p&gt;
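
&lt;p&gt;The excerpt above leaves out its imports and the &lt;code&gt;text&lt;/code&gt; and &lt;code&gt;root&lt;/code&gt; helpers, so here is a minimal, self-contained sketch of the image-existence check on its own. Only the regex comes from the excerpt. The function names &lt;code&gt;imageRefs&lt;/code&gt; and &lt;code&gt;missingImages&lt;/code&gt; and the injectable &lt;code&gt;exists&lt;/code&gt; parameter are assumptions for illustration, not what Codex wrote.&lt;/p&gt;

```javascript
// Hypothetical helpers; only the regex is taken from the excerpt above.
const { existsSync } = require('node:fs');
const path = require('node:path');

// Collect every image path referenced by a src="..." attribute.
function imageRefs(html) {
  const pattern = /src="([^"]+\.(?:jpg|jpeg|png|webp))"/gi;
  return [...html.matchAll(pattern)].map((m) => m[1]);
}

// Return the referenced images that are absent under siteRoot.
// `exists` is injectable so the check can run without touching disk.
function missingImages(html, siteRoot, exists = existsSync) {
  return imageRefs(html).filter((ref) => !exists(path.join(siteRoot, ref)));
}
```

&lt;p&gt;An empty array back from &lt;code&gt;missingImages(html, siteRoot)&lt;/code&gt; means every referenced photo is on disk.&lt;/p&gt;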

&lt;p&gt;&lt;strong&gt;Four.&lt;/strong&gt; Every primary CTA is a &lt;code&gt;wa.me&lt;/code&gt; deep link with &lt;strong&gt;prefilled message text&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;a&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"contact-link whatsapp"&lt;/span&gt;
   &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"https://wa.me/917337726777?text=Hi%20BRCA%2C%20I%20would%20like%20to%20know%20more%20about%20cricket%20training."&lt;/span&gt;
   &lt;span class="na"&gt;target=&lt;/span&gt;&lt;span class="s"&gt;"_blank"&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"noopener"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;span&amp;gt;&lt;/span&gt;WhatsApp&lt;span class="nt"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;strong&amp;gt;&lt;/span&gt;7337726777&lt;span class="nt"&gt;&amp;lt;/strong&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not just &lt;code&gt;wa.me/91…&lt;/code&gt;. &lt;strong&gt;Pre-filled message text.&lt;/strong&gt; Parent taps. Message lands. Zero typing.&lt;/p&gt;
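
&lt;p&gt;If you want to generate these links rather than hand-write them, a small helper is enough. This is a sketch, not Codex's code: the name &lt;code&gt;whatsappLink&lt;/code&gt; is made up, and it assumes the number is passed with its country code.&lt;/p&gt;

```javascript
// Build a wa.me deep link with prefilled message text.
// The phone number must include the country code; anything that is
// not a digit (plus sign, spaces, dashes) is stripped before use.
function whatsappLink(phone, message) {
  const digits = String(phone).replace(/\D/g, '');
  return `https://wa.me/${digits}?text=${encodeURIComponent(message)}`;
}

const link = whatsappLink(
  '+91 73377 26777',
  'Hi BRCA, I would like to know more about cricket training.'
);
console.log(link);
```

&lt;p&gt;With the first number and the message from the markup above, this should produce the same &lt;code&gt;href&lt;/code&gt;.&lt;/p&gt;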

&lt;p&gt;&lt;strong&gt;Five.&lt;/strong&gt; It deployed it. It opened my browser, walked me through a Cloudflare OAuth handshake, then pushed the build to Cloudflare Pages. The &lt;code&gt;.wrangler/cache/pages.json&lt;/code&gt; left behind:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"account_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"project_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"brca-academy"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most coding agents stop at &lt;em&gt;"here's the HTML."&lt;/em&gt; Codex stopped at a live URL. That distinction — treating &lt;em&gt;"build a website"&lt;/em&gt; as a unit of work that includes shipping, not just generating markup — is what made me rate it the most production-ready output of the five.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Gemini 3.1 Pro, low effort
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzfgauyi415d82r7gqrar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzfgauyi415d82r7gqrar.png" alt=" " width="800" height="2586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Dark slate background. Electric blue + amber accents. 60 lines of vanilla JS with an IntersectionObserver scroll-reveal effect.&lt;/p&gt;

&lt;p&gt;It looked like a SaaS analytics dashboard. Wrong audience by about ten years. Photos referenced as &lt;code&gt;photo_1.jpeg&lt;/code&gt;…&lt;code&gt;photo_18.jpeg&lt;/code&gt;. Tel: only.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Gemini 3.1 Pro, high effort
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmtvs96ps1v54ua89a04.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmtvs96ps1v54ua89a04.png" alt=" " width="800" height="2183"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Palette fixed: navy + amber. Playfair Display + Outfit for typography. About section with an image collage and an "Elite Training Facility" badge. Wider elite-program card with a dedicated highlights box. Mobile menu with hamburger.&lt;/p&gt;

&lt;p&gt;Visually, a different website from the low-effort version. Genuinely better.&lt;/p&gt;

&lt;p&gt;What it still didn't have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WhatsApp deep links. Anywhere. Tel: only.&lt;/li&gt;
&lt;li&gt;OG tags or &lt;code&gt;theme-color&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;A test suite.&lt;/li&gt;
&lt;li&gt;A deployment config.&lt;/li&gt;
&lt;li&gt;Semantic photo names — still &lt;code&gt;img1.jpeg&lt;/code&gt; through &lt;code&gt;img8.jpeg&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More budget bought better visuals. It didn't buy better judgement about what a Bengaluru cricket academy website is &lt;em&gt;for&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What actually decided it
&lt;/h2&gt;

&lt;p&gt;Not the prettiest hero. Not the cleverest animation.&lt;/p&gt;

&lt;p&gt;This:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;In Bengaluru, parents enquire on WhatsApp. Not email. Not contact forms. Not phone calls until they've messaged first.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The single biggest conversion lever for an Indian local business website is &lt;code&gt;wa.me&lt;/code&gt; deep linking with prefilled message text. Parent opens the page. Parent taps the button. WhatsApp opens with "Hi BRCA, I would like to know more about cricket training" already typed. They send. Coach gets a notification.&lt;/p&gt;

&lt;p&gt;Codex did this on every primary CTA. Claude-medium did it as one button at the bottom of the contact section. Claude-low, Gemini-low, and Gemini-high didn't do it at all.&lt;/p&gt;

&lt;p&gt;That single decision was worth more than the prettiest hero in the comparison.&lt;/p&gt;




&lt;h2&gt;
  
  
  The thing I wasn't expecting
&lt;/h2&gt;

&lt;p&gt;I went in assuming effort budget would be the variable that explained quality differences.&lt;/p&gt;

&lt;p&gt;Compare what happened when I raised the effort budget on each model:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude (low → medium):&lt;/strong&gt; The visual quality jumped from "pretty" to "editorial-grade". It added a Ken Burns animation, a masonry gallery, OG tags, a &lt;code&gt;theme-color&lt;/code&gt;, &lt;strong&gt;and a WhatsApp button&lt;/strong&gt;. It also renamed the photos from &lt;code&gt;img-XX.jpg&lt;/code&gt; to &lt;code&gt;brca-XX.jpeg&lt;/code&gt;. The model used the extra budget to upgrade both taste &lt;em&gt;and&lt;/em&gt; product judgement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini (low → high):&lt;/strong&gt; The visual quality jumped. The palette got fixed. The typography got upgraded. The layout got more sophisticated.&lt;/p&gt;

&lt;p&gt;It still didn't add WhatsApp.&lt;/p&gt;

&lt;p&gt;It still didn't write tests.&lt;/p&gt;

&lt;p&gt;It still didn't deploy.&lt;/p&gt;

&lt;p&gt;It still left photos as &lt;code&gt;img1.jpeg&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;More budget didn't teach the model what the website was &lt;em&gt;for&lt;/em&gt;. It only taught it to make the wrong website prettier.&lt;/p&gt;

&lt;p&gt;The headline isn't &lt;em&gt;Codex won because GPT-5.5 is the best model&lt;/em&gt;. The headline is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Effort budget isn't the variable that explains output quality. Taste is.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Codex on a single medium run produced more production-ready output than Gemini on high. Claude on medium produced the most beautiful site in the lineup. Gemini on high produced a much-improved-but-still-fundamentally-misjudged website.&lt;/p&gt;

&lt;p&gt;The extra budget surfaced what each model already understood about the job. It didn't change what the model thought the job was.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sidebar: Two paths to a Cloudflare token
&lt;/h2&gt;

&lt;p&gt;Worth mentioning because it's the kind of thing CTOs care about.&lt;/p&gt;

&lt;p&gt;When each agent needed to deploy to Cloudflare Pages, they took one of two paths:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Path A — silent OAuth.&lt;/strong&gt; Codex (medium) and Gemini (low) opened my browser, walked me through Cloudflare's OAuth flow, and got a session. Fast. Smooth. I never saw the token. The agent now has access to my entire Cloudflare account for the duration of that session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Path B — paste-your-own-token.&lt;/strong&gt; Claude (at every effort level) and Gemini (at medium effort) said: "Go to Cloudflare → My Profile → API Tokens → Create Token with these specific scopes — Account: Cloudflare Pages: Edit — and paste it here. I won't see your account session." More friction at install time. Also more control: the token is scoped, I can see exactly what I gave the agent, I can rotate or revoke it without touching my main session.&lt;/p&gt;

&lt;p&gt;Both are defensible. Path A optimises for time-to-deploy. Path B optimises for credential hygiene.&lt;/p&gt;

&lt;p&gt;If you're a solo developer building a side project, Path A is probably fine. If you're running production infrastructure for a fintech and an AI agent is asking for credentials, Path B is the only answer. The fact that two of three agents converge on Path B at higher effort levels (Claude always, Gemini at medium and above) suggests their "thoughtful" mode is more security-aware. Codex stayed silent-OAuth even at medium. Worth knowing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this means for picking a coding agent in 2026
&lt;/h2&gt;

&lt;p&gt;Three takeaways, none of them about benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One. Test the agent on a job, not on a problem.&lt;/strong&gt; "Build a website" and "build a website that converts WhatsApp leads for an Indian local business" are different evaluations. The first is a syntax exercise. The second tells you whether the agent can read the room.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two. Effort budgets are amplifiers, not teachers.&lt;/strong&gt; They make a model more of what it already is. If a model doesn't understand the job at low effort, high effort will produce a more polished version of the wrong thing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three. Production scaffolding is the cheapest signal of seriousness.&lt;/strong&gt; Tests. Headers. OG meta. A &lt;code&gt;404.html&lt;/code&gt;. Curated photos with content-aware filenames. None of these were in my prompt. The agent that wrote all of them on its own is the one I trust with code I can't review line by line.&lt;/p&gt;




&lt;h2&gt;
  
  
  Coda — what actually shipped
&lt;/h2&gt;

&lt;p&gt;I have to be honest about something the single-shot benchmark couldn't capture.&lt;/p&gt;

&lt;p&gt;Codex won my engineering eval. That stands. It's the one I'd hand a junior dev and say &lt;em&gt;"ship it."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But the one I reached for next was &lt;strong&gt;Claude&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Two more prompts with the medium-effort Claude — &lt;em&gt;"add a persistent WhatsApp floating button," "add a three-card contact section like a real local business, with primary office / coaching desk / WhatsApp"&lt;/em&gt; — and a bit of browser automation to handle the Cloudflare deploy and DNS, and the site went live at &lt;strong&gt;&lt;a href="https://brca.in/" rel="noopener noreferrer"&gt;brca.in&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's the version the coach is using today. WhatsApp floating button. Three contact cards. A "Free trial session available" pill the coach asked for after the first parent enquiry. A schedule strip. Custom domain. Live HTTPS.&lt;/p&gt;

&lt;p&gt;Why Claude, not Codex — given my own engineering verdict?&lt;/p&gt;

&lt;p&gt;Because the single-shot test answers &lt;em&gt;"which agent has the best instincts."&lt;/em&gt; The shipping test answers &lt;em&gt;"which agent do I want as a collaborator."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Those are different questions. They had different answers for me.&lt;/p&gt;

&lt;p&gt;Claude was the one I wanted to keep editing. The Bebas Neue + gold-seam aesthetic, the masonry gallery, the Ken Burns hero — those are the parts of the design I didn't want to throw away. Codex's output was more correct. Claude's output was the one I had a relationship with.&lt;/p&gt;

&lt;p&gt;That's a real signal. Worth saying out loud.&lt;/p&gt;




&lt;h2&gt;
  
  
  The closer
&lt;/h2&gt;

&lt;p&gt;The coach got a website. Parents got a WhatsApp button. The site is live at &lt;strong&gt;&lt;a href="https://brca.in/" rel="noopener noreferrer"&gt;brca.in&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The first parent message landed in the inbox before sundown. &lt;em&gt;"Hi BRCA, I would like to know more about cricket training."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The one-shot finding holds: at first contact, taste decided the comparison. Codex's instinct for what an Indian local business website needed to do was sharper than any other model in the lineup.&lt;/p&gt;

&lt;p&gt;But the part of the comparison nobody benchmarks is the part that matters most after the demo: &lt;strong&gt;which agent do you actually want to keep working with&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For me, on this job — it was Claude.&lt;/p&gt;

&lt;p&gt;Champions are built here. Apparently websites too.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Site live at &lt;a href="https://brca.in/" rel="noopener noreferrer"&gt;brca.in&lt;/a&gt;. Drop a comment if you'd like the source code for all five runs — happy to share the GitHub repo.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I'd love to know:&lt;/strong&gt; which agent are you reaching for in 2026 — and what's the smallest job you've used to test whether it actually understands the room? Reply below.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>webdev</category>
      <category>startup</category>
    </item>
    <item>
      <title>Your Agent Doesn't Need a Better Model — It Needs a Context Layer</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Fri, 24 Apr 2026 12:49:42 +0000</pubDate>
      <link>https://dev.to/mickyarun/your-agent-doesnt-need-a-better-model-it-needs-a-context-layer-41pg</link>
      <guid>https://dev.to/mickyarun/your-agent-doesnt-need-a-better-model-it-needs-a-context-layer-41pg</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq0i9qwqxxtonx0cw5pm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq0i9qwqxxtonx0cw5pm.png" alt=" " width="800" height="447"&gt;&lt;/a&gt;We stopped trying to find a better model.&lt;/p&gt;

&lt;p&gt;We built a better context surface. Different problem. Different fix.&lt;/p&gt;

&lt;p&gt;Here's the story of how we got there, and why I think most teams in 2026 are optimising the wrong side of the equation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 1,200-line PR
&lt;/h2&gt;

&lt;p&gt;A few months ago, one of our engineers asked an AI agent to help add a new refund flow to our merchant service. The agent returned a PR. 1,200 lines. It compiled. The tests passed.&lt;/p&gt;

&lt;p&gt;It also did three things we'd explicitly decided, months earlier, to never do in this codebase:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It created a new service-to-service HTTP client instead of using our internal &lt;code&gt;ServiceBus&lt;/code&gt; abstraction.&lt;/li&gt;
&lt;li&gt;It persisted refund state in the merchant service's own database instead of emitting a domain event for the ledger service to consume.&lt;/li&gt;
&lt;li&gt;It wrote a retry loop with &lt;code&gt;setTimeout&lt;/code&gt; instead of using our &lt;code&gt;@Retryable&lt;/code&gt; decorator, which has backoff policies tied to our SLOs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of this is in the agent's training data. Nothing in the README told it either. And the reviewer — doing the review at 6pm on a Friday — skimmed the diff, saw green CI, and approved.&lt;/p&gt;

&lt;p&gt;Two weeks later we had a duplicate-refund incident. One hour of debugging to find the cause. Not a bug in the agent's code. A design-pattern violation the agent had no way to know existed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The realisation
&lt;/h2&gt;

&lt;p&gt;Here's the uncomfortable part.&lt;/p&gt;

&lt;p&gt;The agent didn't do anything wrong. It did exactly what a capable junior engineer would have done if dropped into the repo for the first time with no context. Which is: it solved the immediate problem with reasonable-looking code, using the patterns it had seen in its training data.&lt;/p&gt;

&lt;p&gt;Our new hires did the same thing. I went back and checked. In the six months before that incident, we'd had three separate PRs from three different authors (two human, one AI), all creating bespoke HTTP clients instead of using &lt;code&gt;ServiceBus&lt;/code&gt;. All of them reviewed by people who knew better but missed it under time pressure.&lt;/p&gt;

&lt;p&gt;The bug wasn't the model. The bug was that &lt;strong&gt;the knowledge of which patterns we'd consciously chosen to standardise lived nowhere an agent could read it, and only half-lived in the heads of senior engineers who weren't always in the review.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So we stopped chasing model quality and started building the thing that was actually missing: a context layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "context layer" actually means
&lt;/h2&gt;

&lt;p&gt;The phrase gets thrown around loosely since MCP took off, so let me be concrete.&lt;/p&gt;

&lt;p&gt;In our stack, a context layer is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A single, versioned source of truth&lt;/strong&gt; for architectural decisions, design patterns, and merchant-domain invariants.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured as machine-readable documents&lt;/strong&gt; (MDX with frontmatter, not free-form Confluence pages).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Served over MCP&lt;/strong&gt; so the same corpus is queryable by every AI tool on the team — Claude, Cursor, Copilot, our internal agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enforced by CI&lt;/strong&gt; through design-pattern lints that fail the build when any PR — human-authored or AI-authored — violates a recorded pattern.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The enforcement layer is what most teams skip. The context on its own is a wiki nobody reads. The lints on their own are arbitrary rules nobody remembers the reason for. Pairing them is where the leverage lives.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three files that made it work
&lt;/h2&gt;

&lt;p&gt;Here's the minimum structure we settled on, with real examples from our monorepo.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;code&gt;adr/*.mdx&lt;/code&gt; — architectural decisions, machine-readable
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
id: ADR-0047
title: "Service-to-service communication goes through ServiceBus"
status: accepted
date: 2025-11-12
tags: [microservices, inter-service, nestjs]
supersedes: null
lint_rule: no-direct-http-client
---

## Context
15 NestJS microservices. Two years ago, every service had its own
Axios instance. Retry semantics drifted. Timeouts drifted. Tracing
headers got dropped. Incidents had no consistent trail.

## Decision
All service-to-service calls go through @atoa/service-bus, which
wraps Axios with retries, circuit breaking, OpenTelemetry tracing,
and our standard auth header injection.

## Rationale
- Retry policies live in one place, tied to SLOs.
- Every call is traced by default.
- Failures surface consistently in Grafana.

## Enforcement
eslint rule: no-direct-http-client (see lint-rules/)
CI gate: fail on import of 'axios', 'undici', or Node's built-in http/https modules in service code.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every ADR has a &lt;code&gt;lint_rule&lt;/code&gt; pointer. No ADR ships without one, unless explicitly marked &lt;code&gt;advisory&lt;/code&gt;.&lt;/p&gt;
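
&lt;p&gt;The "no ADR ships without a &lt;code&gt;lint_rule&lt;/code&gt;" rule can itself be checked in CI. What follows is a minimal sketch rather than a copy of production tooling: the &lt;code&gt;advisory: true&lt;/code&gt; frontmatter key and the function names are assumptions for illustration, and the parser only handles flat &lt;code&gt;key: value&lt;/code&gt; lines.&lt;/p&gt;

```typescript
// Hypothetical CI check: every ADR needs a lint_rule unless it is advisory.
// The "advisory: true" key is an assumed convention, not taken from the ADR above.

type Frontmatter = { [key: string]: string };

// Parse the flat "key: value" lines between the first two "---" markers.
function parseFrontmatter(source: string): Frontmatter {
  const lines = source.split("\n");
  const result: Frontmatter = {};
  if ((lines[0] ?? "").trim() !== "---") return result;
  for (const line of lines.slice(1)) {
    if (line.trim() === "---") break;
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    result[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return result;
}

// A field counts as set when it is present, non-empty, and not the literal "null".
function hasValue(value: string | undefined): boolean {
  if (value === undefined || value === "") return false;
  return value !== "null";
}

// Returns a list of problems; an empty list means the ADR may ship.
function validateAdr(source: string): string[] {
  const fm = parseFrontmatter(source);
  const problems: string[] = [];
  if (!hasValue(fm["id"])) problems.push("missing id");
  const advisory = fm["advisory"] === "true";
  if (!advisory) {
    if (!hasValue(fm["lint_rule"])) {
      problems.push("missing lint_rule (or mark the ADR advisory)");
    }
  }
  return problems;
}
```

&lt;p&gt;Run over every file in &lt;code&gt;adr/&lt;/code&gt;, a non-empty result from &lt;code&gt;validateAdr&lt;/code&gt; would fail the build the same way a pattern violation does.&lt;/p&gt;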

&lt;h3&gt;
  
  
  2. &lt;code&gt;lint-rules/no-direct-http-client.ts&lt;/code&gt; — the actual enforcement
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;TSESTree&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;TSESLint&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@typescript-eslint/utils&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

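&lt;span class="c1"&gt;// Bare and node:-prefixed specifiers resolve to the same built-ins, so ban both forms.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;BANNED&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;axios&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;node:http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;node:https&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;undici&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;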
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ALLOWED_PATHS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;libs/service-bus/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;libs/http-primitives/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TSESLint&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RuleModule&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;useServiceBus&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;problem&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;useServiceBus&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Direct HTTP clients are banned. Use @atoa/service-bus. See ADR-0047.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;defaultOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getFilename&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ALLOWED_PATHS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nc"&gt;ImportDeclaration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;node&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TSESTree&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ImportDeclaration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;BANNED&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;report&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;messageId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;useServiceBus&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing clever. The point is: when an agent (or a human) ships the banned pattern, the PR cannot land. Not "a reviewer will notice." The build fails. Every time.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;code&gt;context.mcp.json&lt;/code&gt; — what we expose to every tool
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"atoa-engineering-context"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.4.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resources"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"adr://*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Architectural decisions with enforcement status"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pattern://*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Approved design patterns with code examples"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"domain://merchant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Merchant domain invariants and flows"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"domain://payments"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Payment flow state machines"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"check_pattern"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Given a code snippet, return any ADR violations it would trigger"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"find_precedent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Search for prior implementations of a similar pattern in our codebase"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every AI tool our team uses mounts this MCP server. When an engineer asks Claude to "add a refund flow," the model has the ADRs in retrieval &lt;em&gt;before&lt;/em&gt; it starts writing code. When the model needs to know "how have we handled async retries in the past," &lt;code&gt;find_precedent&lt;/code&gt; returns the real decorator, not something that looks plausible.&lt;/p&gt;

&lt;p&gt;The agent stopped hallucinating patterns not because the model got smarter. Because we gave it somewhere to look.&lt;/p&gt;

&lt;h2&gt;
  
  
  What happened in the last 30 days
&lt;/h2&gt;

&lt;p&gt;We've been running this layer across the full engineering team — 18 people, mix of AI-heavy and AI-light workflows — for just under a quarter now. Last month's numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;23 pattern violations caught&lt;/strong&gt; by design-pattern lints before merge. 14 from human-authored PRs. 9 from AI-authored PRs. The ratio surprised me. I'd expected AI to dominate the violation list. It did not.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2 architectural regressions avoided&lt;/strong&gt; that would previously have shipped. One was a would-be duplicate-refund bug in the same area as the Friday-night incident. The lint caught what the reviewer under time pressure would have missed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Onboarding time for a new engineer down from 2 weeks to 4 days&lt;/strong&gt; on the local-dev side, which is a separate story, but the context layer helped here too. New hires read the ADR corpus once, then let the MCP server answer their day-to-day "does this already exist?" questions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero arguments in code review about "is this the right pattern."&lt;/strong&gt; When a disagreement happens, the question becomes "is there an ADR for this?" If yes, the lint decides. If no, we write the ADR.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one is the quiet win. Code review time on architectural questions dropped by roughly a third, because we stopped relitigating decisions we'd already made.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part most teams get wrong
&lt;/h2&gt;

&lt;p&gt;Two patterns I see repeatedly on teams that try to build this and don't get the leverage:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Context without enforcement.&lt;/strong&gt; A beautiful ADR wiki nobody reads. Every violation still ships because there's no gate. This is where most teams stop because the wiki felt like the real deliverable. It is not. The lint is the real deliverable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Enforcement without context.&lt;/strong&gt; A forest of lint rules nobody understands. The first time someone hits a red CI gate with a rule they've never seen, they open a Slack channel and ask why. If the lint points to an ADR with a clear rationale, the question answers itself. If it points to a rule that just says "forbidden," you've built a political problem disguised as infrastructure.&lt;/p&gt;

&lt;p&gt;Pairing them is not optional. Either one alone is worse than nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for "model quality" debates in 2026
&lt;/h2&gt;

&lt;p&gt;Every week there's a new "is Claude 4.6 better than Opus 4.5 at code" thread. I read them. I have opinions. But in terms of what actually moved the needle on our shipping velocity this quarter — it wasn't the model.&lt;/p&gt;

&lt;p&gt;It was the retrieval surface.&lt;/p&gt;

&lt;p&gt;The model doesn't need to be smarter. It needs to read the right thing before it answers. And once the context layer is good enough, the difference between "good model" and "great model" collapses, because both are now looking at the same authoritative source.&lt;/p&gt;

&lt;p&gt;For 2026, if I had to pick one place to invest a quarter of engineering time to improve AI-native development, it wouldn't be better prompts. It wouldn't be a new IDE extension. It would be this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write down the patterns you've actually chosen. Make them machine-readable. Serve them over MCP. Enforce them in CI. Stop relying on tribal knowledge to survive code review.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent isn't the bottleneck. The knowledge surface is.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Arun, CTO and co-founder at &lt;a href="https://paywithatoa.co.uk" rel="noopener noreferrer"&gt;Atoa&lt;/a&gt; — we build open banking payments for the UK. We run 15 NestJS microservices in production and I write about the things we've learned the hard way. Find me on X &lt;a href="https://x.com/mickyarun" rel="noopener noreferrer"&gt;@mickyarun&lt;/a&gt; if you want to argue about any of this.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>ai</category>
      <category>agents</category>
      <category>sre</category>
    </item>
    <item>
      <title>What Developers Get Wrong About PSD2 and Payment Initiation</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Wed, 22 Apr 2026 06:16:57 +0000</pubDate>
      <link>https://dev.to/mickyarun/what-developers-get-wrong-about-psd2-and-payment-initiation-2o3m</link>
      <guid>https://dev.to/mickyarun/what-developers-get-wrong-about-psd2-and-payment-initiation-2o3m</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fur71t93pqnivlb3xigu8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fur71t93pqnivlb3xigu8.png" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;br&gt;
I've spent UK FinTech Week (April 20–24) reading developer threads about open banking. Same misconceptions every time.&lt;/p&gt;

&lt;p&gt;PSD2 is "just OAuth for banks." Payment Initiation Services are "basically a bank transfer." The whole open banking stack is "Stripe with worse DX."&lt;/p&gt;

&lt;p&gt;None of that is right. And the gap matters, because the developers carrying these assumptions are the ones building the next wave of UK checkouts. If you're shipping payments code in 2026, here's what I'd want you to know before you write the first line.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. PSD2 is not OAuth
&lt;/h2&gt;

&lt;p&gt;The flow looks like OAuth. It is not OAuth.&lt;/p&gt;

&lt;p&gt;OAuth gives an app permission to read or write data on behalf of a user. PSD2's Payment Initiation Service (PIS) gives a regulated third party — the PISP — the legal right to instruct a payment from the user's bank account, with the bank legally obligated to execute it.&lt;/p&gt;

&lt;p&gt;That is a fundamentally different contract.&lt;/p&gt;

&lt;p&gt;The bank is not "letting your app do something." The bank is being compelled by regulation to act on a payment instruction from a licensed PISP, after Strong Customer Authentication (SCA) has been completed. The user authenticates inside their banking app — biometrics, PIN, or device binding — and the bank moves the money. No card network. No tokenisation. No 3DS dance.&lt;/p&gt;

&lt;p&gt;If you treat PIS like OAuth, you'll over-engineer the consent layer and under-engineer the settlement layer. They're different problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. "Just hit the bank API" is not a real architecture
&lt;/h2&gt;

&lt;p&gt;I see a lot of "we'll integrate directly with each bank's API." Sure. The CMA9 alone is nine banks. Add the building societies, the challenger banks, and the EU PSD2 obligations if you're cross-border.&lt;/p&gt;

&lt;p&gt;Each bank exposes a slightly different flavour of the Open Banking Standard. Different consent expiry rules. Different ASPSP redirect quirks. Different webhook delivery patterns. Different rate limits.&lt;/p&gt;

&lt;p&gt;We learned this the hard way running 15 microservices for UK payment flows. Bank-by-bank integration is not a feature. It is a maintenance liability that grows linearly with every new bank you add and exponentially with every spec revision the OBIE pushes.&lt;/p&gt;

&lt;p&gt;The architectural choice is binary: become an FCA-authorised PISP yourself (months of compliance work, a regulated entity, ongoing capital requirements), or integrate against an aggregator who's already done it.&lt;/p&gt;

&lt;p&gt;If you're not building a payments company, do not become a PISP. Use one.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. SCA is not a checkbox
&lt;/h2&gt;

&lt;p&gt;Strong Customer Authentication is the single biggest thing developers underestimate.&lt;/p&gt;

&lt;p&gt;You don't add SCA to a payment flow. SCA &lt;em&gt;is&lt;/em&gt; the payment flow.&lt;/p&gt;

&lt;p&gt;Every payment initiation in the UK requires two of three factors: knowledge (PIN), possession (device), inherence (biometrics). The user has to authenticate inside their bank, on every payment, unless an exemption applies — and the exemption rules are tighter than most teams realise. Low-value contactless. Recurring TPP-managed VRPs. Trusted beneficiaries. That's mostly it.&lt;/p&gt;

&lt;p&gt;If your UX assumes "save the bank, charge silently next time" the way Stripe lets you save a card — you're going to ship a flow the bank will block.&lt;/p&gt;

&lt;p&gt;This is also why commercial Variable Recurring Payments (cVRP) is the most-watched topic at UK FinTech Week this week. UK Finance proposed the cVRP Wave 2 commercial model earlier this month. cVRP is the legitimate, regulator-blessed answer to "how do I take recurring open banking payments without making the user re-auth every time." It's coming. Build for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. The webhook is the source of truth, not the redirect
&lt;/h2&gt;

&lt;p&gt;This one breaks junior payments code more than anything else.&lt;/p&gt;

&lt;p&gt;The user completes authentication in their bank. The bank redirects them back to your &lt;code&gt;redirectUrl&lt;/code&gt;. Your app shows "Payment successful."&lt;/p&gt;

&lt;p&gt;Wrong. The redirect is a UX hint. It is not a payment confirmation.&lt;/p&gt;

&lt;p&gt;The actual payment status — &lt;code&gt;COMPLETED&lt;/code&gt;, &lt;code&gt;PENDING&lt;/code&gt;, &lt;code&gt;FAILED&lt;/code&gt;, &lt;code&gt;CANCELLED&lt;/code&gt; — comes from a server-to-server webhook the PISP fires once the bank has settled (or refused) the payment instruction. Sometimes that's instant. Sometimes there's a delay if the bank is doing fraud checks. Sometimes the user closes the browser before the redirect fires but the payment still completes.&lt;/p&gt;

&lt;p&gt;If your fulfilment logic depends on the redirect, you will eventually ship orders for payments that never landed, or refuse orders for payments that actually did. We had to retrofit this in our merchant app early on. Webhook-first, redirect-second. Always.&lt;/p&gt;
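
&lt;p&gt;A minimal sketch of webhook-first, redirect-second handling follows. The status names come from the list above; the event shape, the in-memory stores, and the function names are hypothetical stand-ins for a real database and your PISP's actual payload.&lt;/p&gt;

```typescript
// Statuses as named in the text. Everything else here is illustrative.
type PaymentStatus = "COMPLETED" | "PENDING" | "FAILED" | "CANCELLED";
type WebhookEvent = { paymentId: string; status: PaymentStatus };

// In-memory stand-ins for a payments table and an orders table.
const latestStatus: { [paymentId: string]: PaymentStatus } = {};
const fulfilledOrders: string[] = [];

// Called by the server-to-server webhook handler. This is the only place
// that triggers fulfilment, and it is idempotent: a repeated COMPLETED
// event for the same payment does not fulfil the order twice.
function handleWebhook(event: WebhookEvent): void {
  latestStatus[event.paymentId] = event.status;
  if (event.status !== "COMPLETED") return;
  if (fulfilledOrders.includes(event.paymentId)) return;
  fulfilledOrders.push(event.paymentId);
}

// Called when the bank redirects the user back. It only reads state that
// the webhook wrote; it never marks a payment as paid on its own.
function messageForRedirect(paymentId: string): string {
  const status = latestStatus[paymentId];
  if (status === "COMPLETED") return "Payment successful";
  if (status === "FAILED" || status === "CANCELLED") return "Payment not completed";
  return "Payment processing, we will confirm shortly";
}
```

&lt;p&gt;If the redirect arrives before the webhook, the user sees a neutral "processing" message instead of a premature success, and the order is fulfilled only once the webhook reports &lt;code&gt;COMPLETED&lt;/code&gt;.&lt;/p&gt;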

&lt;h2&gt;
  
  
  5. Open banking is not "Stripe but cheaper"
&lt;/h2&gt;

&lt;p&gt;I'll be opinionated here because I think the framing matters.&lt;/p&gt;

&lt;p&gt;Stripe is a magnificent product. It abstracts card networks beautifully. It is also, structurally, a card-rails product paying Visa and Mastercard interchange on every transaction. That's why UK card processing costs sit at 1.5–2.9%. The interchange is a tax built into the rail.&lt;/p&gt;

&lt;p&gt;Open banking is a different rail. There is no interchange. The money moves over Faster Payments (UK) or SEPA Instant (EU). The cost structure is fundamentally different — flat fee, not percentage. Atoa is roughly half the cost of cards because we're not paying Visa for the privilege of moving the money.&lt;/p&gt;

&lt;p&gt;The right mental model is not "Stripe alternative." It is "second payment rail, with different economics, different latency, different UX, different fraud profile, different settlement guarantees."&lt;/p&gt;

&lt;p&gt;For high-ticket B2B invoices, open banking is dramatically better. For impulse e-commerce, cards still win on conversion friction. For UK SaaS doing recurring billing, cVRP is about to flip the calculus. Pick the rail that fits the use case. Don't pick the rail that fits last year's mental model.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR for developers shipping in 2026
&lt;/h2&gt;

&lt;p&gt;PSD2 is not OAuth. Don't integrate banks directly. SCA isn't optional. Webhooks are the source of truth. Open banking is its own rail, not a Stripe replacement.&lt;/p&gt;

&lt;p&gt;If you're at UK FinTech Week this week and want to see this in code, the Atoa sandbox takes 5 minutes to set up. We've spent years getting the integration down to a single API call so you don't have to relearn what we already did wrong.&lt;/p&gt;

&lt;p&gt;Try it: &lt;a href="https://docs.atoa.me" rel="noopener noreferrer"&gt;docs.atoa.me&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What's the misconception about open banking you keep hearing from your engineering team?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Arun Rajkumar is Co-Founder &amp;amp; CTO of Atoa, an FCA-authorised UK open banking payments platform. He writes about CTO lessons, microservices, and what we're learning building a payments rail outside the card networks.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>psd2</category>
      <category>openbanking</category>
      <category>fintech</category>
      <category>api</category>
    </item>
    <item>
      <title>We Built an Open-Source Coding Exam Platform Because Every Vendor Let Us Down</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Sat, 11 Apr 2026 01:05:35 +0000</pubDate>
      <link>https://dev.to/mickyarun/we-built-an-open-source-coding-exam-platform-because-every-vendor-let-us-down-a7m</link>
      <guid>https://dev.to/mickyarun/we-built-an-open-source-coding-exam-platform-because-every-vendor-let-us-down-a7m</guid>
      <description>&lt;p&gt;Every year, our team visits engineering colleges across India to hire freshers. The first round is always an online coding test — 300+ students, one shot at finding the ones who can actually think.&lt;/p&gt;

&lt;p&gt;We tried Coderbyte. Fifty concurrent user limit. So we'd split students into batches, stagger timings, juggle schedules between college coordinators and our engineers.&lt;/p&gt;

&lt;p&gt;We tried HackerRank's community edition. Different tool, different headache.&lt;/p&gt;

&lt;p&gt;Every vendor had a ceiling — concurrency limits, inflexible problem formats, generic DSA questions that tested memorization over problem-solving. And the pricing? Designed for companies ten times our size.&lt;/p&gt;

&lt;p&gt;I was ranting about this to my engineering team. Out loud. In our standup. Trying to find yet another vendor to evaluate.&lt;/p&gt;

&lt;p&gt;My engineers — most of them freshers themselves just a couple years ago — went quiet. Said nothing for a few days.&lt;/p&gt;

&lt;p&gt;Then they shipped a product. Two engineers. One weekend. AI-assisted development. And two days of intensive testing before it went live.&lt;/p&gt;




&lt;h2&gt;
  
  
  What They Built
&lt;/h2&gt;

&lt;p&gt;A full-stack, self-hosted coding exam platform. Not a toy. Not a prototype. A production system we ran 300+ students through this hiring season.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiyvl3yshlh9suzimdvts.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiyvl3yshlh9suzimdvts.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6qwhxaxfkjr2k7gup70.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6qwhxaxfkjr2k7gup70.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's what's under the hood:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monaco Editor&lt;/strong&gt; — the same engine that powers VS Code. Syntax highlighting, autocomplete, multi-language support. Students write real code, not paste answers into a textarea.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Judge0 Sandboxed Execution&lt;/strong&gt; — every submission runs inside a sandboxed Judge0 instance. Test cases execute in parallel with automatic batching. Students get instant, per-test-case verdicts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ICPC-Style Scoring&lt;/strong&gt; — not just pass/fail. Penalty points for wrong attempts. Time-based ranking. Race-condition-safe writes to the database. The leaderboard feels like a competitive programming contest, not a homework checker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live Leaderboard&lt;/strong&gt; — backed by a PostgreSQL materialized view that refreshes after every accepted submission. O(1) rank queries. Students watch themselves climb in real-time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API-Based Challenges&lt;/strong&gt; — beyond traditional stdin/stdout problems, we built support for API-format challenges where students interact with real endpoints. This lets us test how candidates think about integration, not just algorithms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Server-Synced Timer&lt;/strong&gt; — the countdown runs on server time, not the client clock. No inspect-element tricks. Configurable start/end windows with server-enforced access guards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Autosave&lt;/strong&gt; — code drafts are debounce-saved to the server every few seconds. Browser crash? Tab closed? The student picks up right where they left off.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;White-Label Ready&lt;/strong&gt; — app name, logo, brand colors, copyright — all configurable via environment variables. Zero code changes. We use it as our own branded platform; anyone can make it theirs.&lt;/p&gt;
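
&lt;p&gt;The ICPC-style scoring mentioned above can be illustrated with the conventional ICPC rules: rank by problems solved, break ties on penalty time, and add 20 minutes for each wrong attempt on a problem that is eventually solved. The platform's exact formula and constants are not shown here, so treat the type names and the 20-minute value as assumptions.&lt;/p&gt;

```typescript
// Conventional ICPC scoring, as an illustration only.
type ProblemResult = { solvedAtMinute: number | null; wrongAttempts: number };
type Entry = { name: string; problems: ProblemResult[] };

// Assumed penalty per rejected attempt, following the usual ICPC convention.
const WRONG_ATTEMPT_PENALTY_MINUTES = 20;

// Penalty counts only for solved problems: minutes until the accepted
// submission plus a fixed penalty for each wrong attempt before it.
function score(entry: Entry): { solved: number; penalty: number } {
  let solved = 0;
  let penalty = 0;
  for (const p of entry.problems) {
    if (p.solvedAtMinute === null) continue; // unsolved problems add nothing
    solved += 1;
    penalty += p.solvedAtMinute + WRONG_ATTEMPT_PENALTY_MINUTES * p.wrongAttempts;
  }
  return { solved, penalty };
}

// More problems solved ranks first; ties go to the lower penalty.
function rank(entries: Entry[]): string[] {
  const scored = entries.map(function (e) {
    return { name: e.name, ...score(e) };
  });
  scored.sort(function (a, b) {
    if (a.solved !== b.solved) return b.solved - a.solved;
    return a.penalty - b.penalty;
  });
  return scored.map(function (s) {
    return s.name;
  });
}
```

&lt;p&gt;In a real system this calculation would run inside the same transaction that records the submission, which is where the race-condition-safe writes matter.&lt;/p&gt;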




&lt;h2&gt;
  
  
  Architecture at a Glance
&lt;/h2&gt;

&lt;p&gt;The platform is a monorepo with two core applications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;client/   → Vue 3 SPA (student exam UI + admin panel)
server/   → NestJS REST API (auth, exam logic, code execution, scoring)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In production, the NestJS server serves the client's compiled static build directly, so no separate web server or CDN is needed.&lt;/p&gt;

&lt;p&gt;The submission flow works like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Student writes code in the Monaco editor and hits Submit&lt;/li&gt;
&lt;li&gt;The Vue client POSTs to the API with the code and language&lt;/li&gt;
&lt;li&gt;The SubmissionsService fetches all test cases and sends batch requests to Judge0, automatically chunking to stay within limits&lt;/li&gt;
&lt;li&gt;The server polls Judge0 tokens until all results resolve&lt;/li&gt;
&lt;li&gt;The ScoringService applies the ICPC penalty formula and updates the score using a pessimistic database lock&lt;/li&gt;
&lt;li&gt;The LeaderboardService refreshes the materialized view&lt;/li&gt;
&lt;li&gt;Results return to the client with per-test-case verdicts and an updated leaderboard&lt;/li&gt;
&lt;/ol&gt;
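
&lt;p&gt;Step 5 can be sketched roughly like this. It is not the platform's actual ScoringService: it assumes the conventional ICPC rule (minutes to the accepted submission, plus 20 penalty minutes per earlier wrong attempt on a solved problem), and every name in it is hypothetical.&lt;/p&gt;

```typescript
// Hypothetical sketch of ICPC-style scoring. The 20-minute penalty is the
// conventional ICPC value, assumed here rather than taken from the platform.
const WRONG_ATTEMPT_PENALTY_MINUTES = 20;

interface ProblemResult {
  solved: boolean;
  wrongAttempts: number;    // rejected submissions before the accepted one
  acceptedAtMinute: number; // minutes from exam start to the accepted submission
}

interface Standing {
  student: string;
  solved: number;
  penalty: number;
}

// Only solved problems contribute penalty time.
function score(student: string, results: ProblemResult[]): Standing {
  const solvedResults = results.filter((r) => r.solved);
  const penalty = solvedResults.reduce(
    (sum, r) => sum + r.acceptedAtMinute + r.wrongAttempts * WRONG_ATTEMPT_PENALTY_MINUTES,
    0,
  );
  return { student, solved: solvedResults.length, penalty };
}

// More problems solved ranks higher; ties go to the lower penalty.
function rank(standings: Standing[]): Standing[] {
  return [...standings].sort((a, b) => b.solved - a.solved || a.penalty - b.penalty);
}
```

&lt;p&gt;Under these assumptions, a student who solves one problem at minute 30 after two wrong attempts carries a penalty of 70, and ranks below a student who solved one problem cleanly at minute 50.&lt;/p&gt;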

&lt;p&gt;All of this happens in seconds, even under load.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Tech Stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Frontend:&lt;/strong&gt; Vue 3 (Composition API), Vite 8, TypeScript 5.9, Pinia 3 for state, Monaco Editor 0.55, Brotli compression&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backend:&lt;/strong&gt; NestJS 11, TypeScript 5.7, TypeORM 0.3, Passport JWT, Swagger/OpenAPI docs, rate limiting via @nestjs/throttler&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Database:&lt;/strong&gt; PostgreSQL 17 for the application, PostgreSQL 16 + Redis 7.2 for Judge0's internal queue&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure:&lt;/strong&gt; Docker Compose orchestrates six services — app, app-db, judge0-server, judge0-worker, judge0-db, and judge0-redis. Multi-stage Dockerfile produces a minimal Node 22-alpine image running as a non-root user.&lt;/p&gt;




&lt;h2&gt;
  
  
  Features That Matter
&lt;/h2&gt;

&lt;p&gt;Here's what we built because we needed it, not because a product manager spec'd it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multiple concurrent exams&lt;/strong&gt; — run several exams at once; students pick which to enter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixed formats&lt;/strong&gt; — MCQs alongside coding problems in the same exam&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Admin panel&lt;/strong&gt; — create exams, duplicate them, manage problems with visible/hidden test cases, configure weights&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safe Exam Browser detection&lt;/strong&gt; — a composable detects whether students are in a locked-down browser&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in API docs&lt;/strong&gt; — interactive API reference baked right into the student UI for API-format challenges&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QA role opt-in&lt;/strong&gt; — students can flag interest in QA engineering during registration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run mode&lt;/strong&gt; — execute code against sample inputs without scoring; lets students experiment before committing&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why Open Source?
&lt;/h2&gt;

&lt;p&gt;We're a fintech startup. Thirty-odd people. We didn't build this to sell it.&lt;/p&gt;

&lt;p&gt;We built it because we were tired of bending our hiring process around someone else's product limitations. And once we had it, we realized every small company visiting colleges faces the exact same problem.&lt;/p&gt;

&lt;p&gt;Here's the thing that makes this story worth telling: two engineers built this in a weekend, with AI doing the heavy lifting on scaffolding, boilerplate, and iteration. Then two days of intensive testing to harden it for production. That's the power of AI-assisted development — it doesn't replace engineers, it turns two of them into ten.&lt;/p&gt;

&lt;p&gt;In the AI era, expensive hiring software shouldn't be a gate that keeps small teams from finding great talent. If two engineers with AI tools can build a platform that handles 300+ concurrent students with ICPC scoring and sandboxed execution in a weekend, there's no reason that capability should be locked behind enterprise pricing.&lt;/p&gt;

&lt;p&gt;The whole thing is AGPL-3.0 licensed. Fork it, brand it, run it on your own infrastructure — just keep your modifications open too.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;The fastest path is Docker Compose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Set DB_PASSWORD, JWT_SECRET, ADMIN_SETUP_KEY&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Six services start in dependency order. The app waits for the database health check, runs migrations automatically, and you're live.&lt;/p&gt;

&lt;p&gt;For local development without Docker, you'll need PostgreSQL 17 and a Judge0 instance. The README walks through every step — database creation, migrations, environment variables, and running the frontend and backend separately.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;We're cleaning up a few things before the public launch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Finishing the test suite (Jest is installed and configured, specs are being added)&lt;/li&gt;
&lt;li&gt;Polishing the contributor docs&lt;/li&gt;
&lt;li&gt;Adding a demo mode so people can try it without setting up Judge0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're interested, &lt;strong&gt;follow me here&lt;/strong&gt; — I'll drop the GitHub link as soon as the repo goes public.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Lesson
&lt;/h2&gt;

&lt;p&gt;I went looking for a vendor. My team handed me a product.&lt;/p&gt;

&lt;p&gt;Two engineers. One weekend of building. Two days of intensive testing. Powered by AI-assisted development. A platform that replaced two commercial tools and produced measurably better candidate quality in round two.&lt;/p&gt;

&lt;p&gt;That's what happens when you hire people for intent over resumes — and then get out of their way.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with Vue 3, NestJS, PostgreSQL, Judge0, and a healthy disregard for vendor lock-in.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Star the repo when it drops. Or better yet — fork it and run your own hiring season on it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>hiring</category>
      <category>webdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>Open Banking Was Built for the Wrong Future — and That's Why It's Perfect for AI Agents</title>
      <dc:creator>arun rajkumar</dc:creator>
      <pubDate>Tue, 07 Apr 2026 19:35:42 +0000</pubDate>
      <link>https://dev.to/mickyarun/open-banking-was-built-for-the-wrong-future-and-thats-why-its-perfect-for-ai-agents-4oha</link>
      <guid>https://dev.to/mickyarun/open-banking-was-built-for-the-wrong-future-and-thats-why-its-perfect-for-ai-agents-4oha</guid>
      <description>&lt;p&gt;Visa announced infrastructure for AI agents to make payments without asking you first.&lt;/p&gt;

&lt;p&gt;GoCardless shipped an MCP server in February so developers can talk to their payment platform in natural language.&lt;/p&gt;

&lt;p&gt;I build open banking payment infrastructure for the UK. I've been watching both of these announcements very closely.&lt;/p&gt;

&lt;p&gt;And I have a counterintuitive take: &lt;strong&gt;the payment rail everyone called "too complicated for normal users" might be the only one that actually works for AI agents.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With Cards and AI Agents
&lt;/h2&gt;

&lt;p&gt;When an AI agent needs to make a payment on your behalf, the obvious infrastructure is what already exists. Cards. Stored credentials. The same rails your Netflix subscription uses.&lt;/p&gt;

&lt;p&gt;Here's the problem.&lt;/p&gt;

&lt;p&gt;Card authorisation is broad. When a merchant stores your card token through a processor like Stripe, that token can be charged for whatever the merchant submits, subject to 3DS, fraud rules, and limits. The authorisation scope isn't bound to a specific action.&lt;/p&gt;

&lt;p&gt;For an AI agent, that's dangerous.&lt;/p&gt;

&lt;p&gt;You want an agent to book a flight. It has your card token. Nothing technically stops it from booking the wrong flight, adding seat upgrades you didn't ask for, or — if the prompt is maliciously crafted — doing something you absolutely didn't intend.&lt;/p&gt;

&lt;p&gt;Visa understands this. Their Trusted Agent Protocol exists precisely to solve it: a way for merchants to verify that an agent is legitimate and acting within its authorised scope. It's clever engineering. But it's being bolted onto rails that weren't designed for it.&lt;/p&gt;

&lt;p&gt;Open banking wasn't designed for AI agents either. But its constraints happen to be exactly the right shape.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Open Banking Consent Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;When a customer pays via open banking — the way Atoa processes payments — here's what actually happens under the hood:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Merchant creates a payment consent object
   → amount: £49.99
   → merchant: Atoa test merchant
   → purpose: "Coffee subscription - April"

2. Customer is redirected to their bank
   → Bank shows: "Atoa wants to take £49.99 from your account"
   → Customer approves or declines
   → Bank issues a single-use authorisation code

3. Atoa exchanges the code for the payment
   → One payment. Specific amount. Specific purpose.
   → The authorisation is consumed. It cannot be reused.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every payment is its own consent event. Every consent is scoped to a specific amount and purpose. You can't overcharge. You can't quietly add extras. You can't reuse the authorisation.&lt;/p&gt;
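
&lt;p&gt;As a toy illustration of that property, here is an in-memory model of a single-use, amount-bound authorisation. It is a sketch only: the names are hypothetical and it does not reflect Atoa's or any bank's real API.&lt;/p&gt;

```typescript
// Toy in-memory model of a single-use, amount-bound authorisation.
// All names are hypothetical; real open banking APIs differ in detail.
interface Consent {
  id: string;
  amountPence: number;
  purpose: string;
  consumed: boolean;
}

function createConsent(id: string, amountPence: number, purpose: string): Consent {
  return { id, amountPence, purpose, consumed: false };
}

// Executes the payment only if the consent is unused and the amount matches exactly.
function redeem(consent: Consent, amountPence: number): string {
  if (consent.consumed) {
    throw new Error(`consent ${consent.id} has already been used`);
  }
  if (amountPence !== consent.amountPence) {
    throw new Error(`amount ${amountPence} does not match consented ${consent.amountPence}`);
  }
  consent.consumed = true; // the authorisation cannot be replayed
  return `paid ${amountPence}p for "${consent.purpose}"`;
}
```

&lt;p&gt;An overcharge attempt and a replay both throw in this model, rather than silently going through.&lt;/p&gt;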

&lt;p&gt;For a human user, this is friction. That's why open banking adoption was slow. Nobody wants to log into their banking app every time they buy something.&lt;/p&gt;

&lt;p&gt;For an AI agent, this friction is a feature.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why the Consent Model Fits AI Agents
&lt;/h2&gt;

&lt;p&gt;Think about what you actually want when an AI agent makes a payment on your behalf.&lt;/p&gt;

&lt;p&gt;You want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It to charge exactly the amount you authorised&lt;/li&gt;
&lt;li&gt;The scope to be limited to what you asked it to do&lt;/li&gt;
&lt;li&gt;The ability to revoke access without cancelling your card&lt;/li&gt;
&lt;li&gt;A clear audit trail showing what was authorised and when&lt;/li&gt;
&lt;li&gt;The payment to fail loudly if anything is out of scope — not silently proceed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open banking gives you all of that by default.&lt;/p&gt;

&lt;p&gt;Cards give you none of it by default; you have to engineer it in.&lt;/p&gt;

&lt;p&gt;The FCA even made it better recently. They removed the requirement for customers to re-authenticate with their bank every 90 days to keep account access connected, a step that was causing 20-40% customer drop-off for third-party providers. Customers now reconfirm consent with the provider instead, so long-lived consent granted to an agent no longer forces a bank re-authentication loop.&lt;/p&gt;

&lt;p&gt;That's a massive unlock for agentic payment flows.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Engineering Challenge: Consent Lifecycle for Agents
&lt;/h2&gt;

&lt;p&gt;Here's where it gets genuinely hard.&lt;/p&gt;

&lt;p&gt;When a human completes an open banking payment, the flow is synchronous: they go to the bank, they approve, they come back. Done.&lt;/p&gt;

&lt;p&gt;When an AI agent initiates a payment, the flow might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Book the Bangalore to London flight if the price drops below £600"

Agent: [Monitors prices for 3 days]
Agent: [Price hits £598 on a Wednesday morning]
Agent: [Attempts to initiate payment]
         → But the user's open banking consent was granted for a specific session
         → That session token expired 6 hours ago
         → Payment fails

Result: Agent missed the window. User wakes up to a "couldn't book your flight" message.
        Price is now £640.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Consent that's synchronous and session-bound doesn't work for agents that act asynchronously.&lt;/p&gt;

&lt;p&gt;This is the real engineering problem. Not "can AI agents make payments?" — they clearly can. But "what does the consent model look like for an agent that might act hours or days after the user gave permission?"&lt;/p&gt;

&lt;p&gt;There are a few approaches:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 1: Pre-authorised payment mandates&lt;/strong&gt;&lt;br&gt;
Open banking supports Variable Recurring Payments (VRPs) — essentially mandates where the user sets a maximum amount and time window, and the payment provider can initiate within those bounds without re-authentication.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Conceptual structure of a VRP mandate for an agent&lt;/span&gt;
&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;AgentPaymentMandate&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;maxAmountPence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;// Agent cannot exceed this&lt;/span&gt;
  &lt;span class="nl"&gt;validUntil&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;           &lt;span class="c1"&gt;// Time-bounded consent&lt;/span&gt;
  &lt;span class="nl"&gt;allowedMerchants&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt; &lt;span class="c1"&gt;// Scope: only these merchants&lt;/span&gt;
  &lt;span class="nl"&gt;purpose&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;            &lt;span class="c1"&gt;// What this mandate is for&lt;/span&gt;
  &lt;span class="nl"&gt;requiresConfirmation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Some actions still need approval&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent operates within a pre-defined envelope. The user sets the boundaries once. The agent acts within them.&lt;/p&gt;
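
&lt;p&gt;A minimal sketch of what enforcing that envelope might look like before a payment is initiated. The interface repeats the conceptual mandate above so the example is self-contained; AgentPaymentRequest and checkMandate are hypothetical names, not part of any VRP specification.&lt;/p&gt;

```typescript
// Sketch of enforcing the mandate envelope before initiating a payment.
// AgentPaymentRequest and checkMandate are hypothetical, for illustration only.
interface AgentPaymentMandate {
  agentId: string;
  maxAmountPence: number;
  validUntil: Date;
  allowedMerchants: string[];
  purpose: string;
  requiresConfirmation: boolean;
}

interface AgentPaymentRequest {
  agentId: string;
  merchant: string;
  amountPence: number;
  requestedAt: Date;
}

// Returns the list of violated bounds; an empty list means the request is in scope.
function checkMandate(mandate: AgentPaymentMandate, request: AgentPaymentRequest): string[] {
  const violations: string[] = [];
  if (request.agentId !== mandate.agentId) violations.push('wrong_agent');
  if (request.amountPence > mandate.maxAmountPence) violations.push('amount_exceeds_cap');
  if (request.requestedAt.getTime() > mandate.validUntil.getTime()) violations.push('mandate_expired');
  if (!mandate.allowedMerchants.includes(request.merchant)) violations.push('merchant_not_allowed');
  return violations;
}
```

&lt;p&gt;Returning every violated bound, rather than stopping at the first, gives the agent and the audit log a complete reason for the refusal.&lt;/p&gt;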

&lt;p&gt;&lt;strong&gt;Option 2: Payment intent + human gate&lt;/strong&gt;&lt;br&gt;
Agent identifies a payment opportunity, creates a payment intent, notifies the user. User approves in one tap. Agent executes.&lt;/p&gt;

&lt;p&gt;This is the pattern we're building toward at Atoa — merchant describes what they need in natural language, agent proposes the payment, human approves in one tap, open banking rails execute it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// What an agent workflow might look like&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;intent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;paymentAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;proposePayment&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;merchant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;flight-booking-service&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;59800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// £598.00 in pence&lt;/span&gt;
  &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;BLR→LHR flight, price hit target of £600&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;expiresAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;// 30 min window&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// User gets notified: "Your agent wants to book your flight for £598. Approve?"&lt;/span&gt;
&lt;span class="c1"&gt;// One tap. Payment executes.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
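
&lt;p&gt;The human gate itself could be modelled as a small state check. This is a sketch under assumptions, not Atoa's implementation: the status values and function names are hypothetical.&lt;/p&gt;

```typescript
// Hypothetical state handling for the human gate: an intent can only be
// executed after the user approves it, and only before it expires.
type IntentStatus = 'pending' | 'approved' | 'expired';

interface ProposedIntent {
  amountPence: number;
  reason: string;
  expiresAt: Date;
  status: IntentStatus;
}

// Called when the user taps "Approve"; a late approval expires the intent instead.
function approve(intent: ProposedIntent, now: Date): ProposedIntent {
  if (intent.status !== 'pending') {
    throw new Error(`cannot approve an intent that is ${intent.status}`);
  }
  const late = now.getTime() > intent.expiresAt.getTime();
  return { ...intent, status: late ? 'expired' : 'approved' };
}

// The payment rail is only ever reached with an approved intent.
function canExecute(intent: ProposedIntent): boolean {
  return intent.status === 'approved';
}
```

&lt;p&gt;Expiring late approvals, instead of honouring them, keeps the user from authorising a price that has already moved.&lt;/p&gt;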



&lt;p&gt;&lt;strong&gt;Option 3: Programmatic consent with audit trail&lt;/strong&gt;&lt;br&gt;
For fully autonomous agents — the Visa Intelligent Commerce model — the agent holds delegated credentials scoped to specific actions, with every payment logged against the authorisation that permitted it.&lt;/p&gt;

&lt;p&gt;We're not there yet in open banking. But the architecture exists to get there.&lt;/p&gt;




&lt;h2&gt;
  
  
  What We're Thinking About at Atoa
&lt;/h2&gt;

&lt;p&gt;We build open banking payment infrastructure. POS terminals, payment links, invoicing, online checkouts. Everything goes through bank payment rails.&lt;/p&gt;

&lt;p&gt;We're thinking about this differently to most — our payment surfaces each have different agent-readiness, and that's shaped how we're approaching the consent problem. When Visa announced Intelligent Commerce, our first question wasn't "can we compete with this?" It was: "which of our surfaces are ready right now, and which ones need the architecture to change?"&lt;/p&gt;

&lt;p&gt;Here's our honest assessment:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pay by Link — probably the most agent-ready thing we have.&lt;/strong&gt; An agent could generate a payment link, send it to a customer, and monitor completion. The consent event is triggered by the link recipient, not the agent. The agent just facilitates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Payment Pages — also strong.&lt;/strong&gt; A merchant's agent could build and publish a payment page with specific parameters. No card infrastructure needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;POS Terminal — hardest.&lt;/strong&gt; The consent flow requires physical presence for SCA. An agent isn't physically present. This one needs new thinking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Invoicing — interesting.&lt;/strong&gt; An agent managing a merchant's books could issue invoices and track payment status. The open banking payment confirmation is machine-readable. This is real today.&lt;/p&gt;

&lt;p&gt;The shape of "agentic commerce" looks different for each surface. There's no one-size-fits-all answer.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Question Every Open Banking Developer Should Be Asking
&lt;/h2&gt;

&lt;p&gt;GoCardless shipping an MCP server tells you something important: &lt;strong&gt;payment infrastructure companies are now thinking about developers' AI workflows as a first-class use case.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not just "can humans use our API?" but "can an AI agent use our API safely?"&lt;/p&gt;

&lt;p&gt;That's a different design question. An API designed for humans assumes there's a human reading error messages, handling edge cases, making judgment calls. An API designed for agents needs those things to be machine-readable, scoped, and predictable.&lt;/p&gt;
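
&lt;p&gt;One way to picture "machine-readable, scoped, and predictable" is a result type the agent can branch on without parsing prose. The shape below is purely illustrative: the field names and decline codes are assumptions, not any provider's actual API.&lt;/p&gt;

```typescript
// Hypothetical shape for an agent-facing payment result: a discriminated
// union the agent can branch on instead of parsing a human error message.
type PaymentResult =
  | { status: 'executed'; paymentId: string }
  | { status: 'declined'; code: 'consent_expired' | 'amount_out_of_scope'; retryable: boolean };

// Decide the agent's next step purely from structured fields.
function nextAction(result: PaymentResult): string {
  switch (result.status) {
    case 'executed':
      return 'record_receipt';
    case 'declined':
      return result.retryable ? 'request_fresh_consent' : 'escalate_to_user';
  }
}
```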

&lt;p&gt;Open banking has a head start here. The consent model is explicit. The amounts are bounded. The authorisation chain is auditable. Every payment has a "why" attached to it.&lt;/p&gt;

&lt;p&gt;The engineers who figure out the consent lifecycle problem — how do you grant an agent payment permissions that are time-bounded, amount-bounded, and purpose-bounded, without requiring the human to be present at the moment of execution — will be building the infrastructure that the next decade of agentic commerce runs on.&lt;/p&gt;

&lt;p&gt;That's the problem I'm thinking about.&lt;/p&gt;

&lt;p&gt;What's your take — does the card world catch up to open banking here, or does the consent model give open banking a structural advantage in the agent era?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Arun Rajkumar is CTO &amp;amp; Co-Founder of &lt;a href="https://paywithatoa.co.uk" rel="noopener noreferrer"&gt;Atoa&lt;/a&gt;, an FCA-authorised open banking payments platform in the UK. He writes about payments, fintech engineering, and building for the UK from India. &lt;a href="https://dev.to/mickyarun"&gt;@mickyarun&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>openbanking</category>
      <category>fintech</category>
    </item>
  </channel>
</rss>
