<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mykola Kondratiuk</title>
    <description>The latest articles on DEV Community by Mykola Kondratiuk (@itskondrat).</description>
    <link>https://dev.to/itskondrat</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3753205%2Fa206f74a-98be-4c2b-abbd-f06ec964327b.jpg</url>
      <title>DEV Community: Mykola Kondratiuk</title>
      <link>https://dev.to/itskondrat</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/itskondrat"/>
    <language>en</language>
    <item>
      <title>Fable 5 Went Dark Friday Night. I Ran My Critical Workflow on a Backup Saturday - Here's What Broke</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Mon, 15 Jun 2026 07:42:36 +0000</pubDate>
      <link>https://dev.to/itskondrat/fable-5-went-dark-friday-night-i-ran-my-critical-workflow-on-a-backup-saturday-heres-what-broke-349d</link>
      <guid>https://dev.to/itskondrat/fable-5-went-dark-friday-night-i-ran-my-critical-workflow-on-a-backup-saturday-heres-what-broke-349d</guid>
      <description>&lt;p&gt;On Friday afternoon a government order hit Anthropic, and by Saturday morning Fable 5 and Mythos 5 were disabled for every customer worldwide. Not deprecated. Gone. Two days later OpenAI shut Sora down because it was losing fifteen million dollars a day.&lt;/p&gt;

&lt;p&gt;I don't have a strong take on the politics. What I had was a smaller, more selfish question at 8am Saturday: if I'd staffed a real workflow on either of those, what would I actually do right now?&lt;/p&gt;

&lt;p&gt;So I tested it. Here's what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  "We'd just switch" is a hope, not a plan
&lt;/h2&gt;

&lt;p&gt;I'd been telling myself I had redundancy for months. If my main model fell over, I'd move to a second vendor. Easy.&lt;/p&gt;

&lt;p&gt;The problem with that sentence is that I had never once run it. A fallback you've never executed isn't a fallback. It's a guess with good posture.&lt;/p&gt;

&lt;p&gt;So Saturday I took my single most critical AI-dependent workflow - a spec-to-task-breakdown pipeline I lean on every day - and ran it end to end on a different vendor's model. One time. Just to find out whether the guess held.&lt;/p&gt;

&lt;p&gt;It didn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  Break #1: the prompt was overfit to one model
&lt;/h2&gt;

&lt;p&gt;The first thing that broke was the prompt itself. My prompt had drifted into a shape that worked beautifully on the model I built it against. Tight, terse, lots of implicit structure the model had learned to fill in.&lt;/p&gt;

&lt;p&gt;The backup model read the same prompt and produced mush. Not wrong exactly, just vague and unstructured, the kind of output you'd toss.&lt;/p&gt;

&lt;p&gt;The fix was real work, not a config flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;- summarize the spec and break it into tasks
&lt;/span&gt;&lt;span class="gi"&gt;+ You are breaking a spec into engineering tasks.
+ Output JSON only, matching this shape:
+ { "tasks": [{ "title": "", "estimate_pts": 0, "depends_on": [] }] }
+ Rules:
+ - every task must be independently shippable
+ - no task larger than 3 points; split if larger
+ - depends_on references task titles, not indexes
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Model A filled in all that structure on its own. Model B needed it spelled out. That's twenty minutes of restructuring I'd much rather spend on a calm Saturday than during an actual outage.&lt;/p&gt;
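&lt;p&gt;Once the rules are spelled out in the prompt, they can also be checked mechanically on whatever comes back, regardless of vendor. A minimal sketch in Python, assuming the model's reply arrives as a raw string; the function name &lt;code&gt;validate_breakdown&lt;/code&gt; and the error wording are illustrative, not part of the original pipeline.&lt;/p&gt;

```python
import json


def validate_breakdown(raw: str) -> list[str]:
    """Return a list of problems found in a model's task-breakdown output.

    The expected shape and the rules mirror the restructured prompt above;
    everything else here is an illustrative assumption.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]

    tasks = data.get("tasks") if isinstance(data, dict) else None
    if not isinstance(tasks, list) or not tasks:
        return ["missing or empty 'tasks' list"]

    problems = []
    titles = {t.get("title") for t in tasks if isinstance(t, dict)}
    for i, task in enumerate(tasks):
        if not isinstance(task, dict):
            problems.append(f"task {i} is not an object")
            continue
        title = task.get("title")
        if not title:
            problems.append(f"task {i} has no title")
        points = task.get("estimate_pts")
        # Rule: no task larger than 3 points.
        if not isinstance(points, (int, float)) or points > 3:
            problems.append(f"{title!r}: estimate must be a number no larger than 3")
        # Rule: depends_on references task titles, not indexes.
        for dep in task.get("depends_on") or []:
            if dep not in titles:
                problems.append(f"{title!r}: depends_on {dep!r} is not a task title")
    return problems
```

&lt;p&gt;An empty list means the output at least has the right shape. It says nothing about whether the tasks are independently shippable; that part still needs a human read.&lt;/p&gt;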

&lt;h2&gt;
  
  
  Break #2: a silent tool-call dependency
&lt;/h2&gt;

&lt;p&gt;The second break scared me more because it was invisible. One step in the pipeline depended on a tool call - a function the model invokes to pull live data. The backup model's tool-calling format was different enough that the call silently no-op'd.&lt;/p&gt;

&lt;p&gt;The output still looked plausible. It just used stale data and didn't tell me. That's the worst failure mode there is: confidently wrong, no error, no flag. I only caught it because I was looking for trouble. On a normal day that bad output flows downstream and someone makes a decision on it.&lt;/p&gt;
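&lt;p&gt;A guard for this failure mode can be small: refuse to accept a step's output unless the tool call it depends on actually happened. The sketch below assumes the vendor response has already been normalized into a plain dict with a &lt;code&gt;tool_calls&lt;/code&gt; list. That shape, the exception name, and the tool name used in the comments are all hypothetical; real vendor payloads differ and need their own adapter.&lt;/p&gt;

```python
class MissingToolCall(RuntimeError):
    """Raised when a pipeline step expected a tool call that never happened."""


def require_tool_call(response: dict, tool_name: str) -> dict:
    """Return the first call to `tool_name`, or fail loudly if there is none.

    `response` is a hypothetical, already-normalized dict of the form
    {"text": str, "tool_calls": [{"name": str, "arguments": dict}]}.
    """
    calls = response.get("tool_calls") or []
    for call in calls:
        if call.get("name") == tool_name:
            return call
    # No matching call: stop here instead of letting stale output flow on.
    raise MissingToolCall(
        f"expected a call to {tool_name!r}, got {[c.get('name') for c in calls]}"
    )
```

&lt;p&gt;Calling &lt;code&gt;require_tool_call(response, "fetch_live_data")&lt;/code&gt; before using the model's text turns the silent no-op into an exception, which is the flag the original run never raised.&lt;/p&gt;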

&lt;h2&gt;
  
  
  Availability belongs on the risk register
&lt;/h2&gt;

&lt;p&gt;Here's the reframe I walked away with. We already handle the API being &lt;em&gt;down&lt;/em&gt;. You get a 503, you back off, you retry, it comes back. That's an outage with an SLA and a status page that eventually goes green.&lt;/p&gt;

&lt;p&gt;This is the model being &lt;em&gt;gone&lt;/em&gt;. No SLA. No restore ETA. No green status page, because it isn't coming back. A policy order or a vendor's burn-rate review can end it overnight, and you find out the same way everyone else does.&lt;/p&gt;

&lt;p&gt;For a service you don't control and can't restore, that's a single point of failure on your critical path. We'd never ship that for a database. Most of us are shipping it for the model doing half the thinking.&lt;/p&gt;

&lt;h2&gt;
  
  
  The one-pager that deletes your worst hour
&lt;/h2&gt;

&lt;p&gt;The cheapest move turned out to be the most useful. The first hour after a model goes dark gets burned figuring out &lt;em&gt;what just broke&lt;/em&gt; - which workflows touched that model, what versions, where the outputs live.&lt;/p&gt;

&lt;p&gt;IBM found 88% of enterprises don't keep a complete inventory of the AI and agents they run. You can't reroute around a dead model if you don't know what depended on it. So I wrote one file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;workflows&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;spec-to-tasks&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;primary-vendor/model-a&lt;/span&gt;
    &lt;span class="na"&gt;criticality&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;must-survive&lt;/span&gt;
    &lt;span class="na"&gt;fallback&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tested 2026-06-13, prompt needs restructure&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;standup-digest&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;primary-vendor/model-a&lt;/span&gt;
    &lt;span class="na"&gt;criticality&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;can-wait&lt;/span&gt;
    &lt;span class="na"&gt;fallback&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;none, recovery order documented&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;video-assets&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai/sora&lt;/span&gt;
    &lt;span class="na"&gt;criticality&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;can-wait&lt;/span&gt;
    &lt;span class="na"&gt;export_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;download MP4s + project json before EOL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last line is the Sora lesson. When a vendor kills a &lt;em&gt;product&lt;/em&gt;, not just a model, you also have to ask where your outputs go and how you get them out. One extra column.&lt;/p&gt;
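&lt;p&gt;The payoff of the file is that "what just broke" becomes a lookup. A rough sketch of that lookup in Python, with the inventory copied into plain data so it needs no YAML parser; the function name &lt;code&gt;affected_by&lt;/code&gt; is an assumption for illustration.&lt;/p&gt;

```python
# Inventory mirrored from the one-pager above (fallback notes omitted).
WORKFLOWS = [
    {"name": "spec-to-tasks", "model": "primary-vendor/model-a", "criticality": "must-survive"},
    {"name": "standup-digest", "model": "primary-vendor/model-a", "criticality": "can-wait"},
    {"name": "video-assets", "model": "openai/sora", "criticality": "can-wait"},
]


def affected_by(dead_model: str, workflows: list[dict]) -> list[str]:
    """Names of workflows that depend on `dead_model`, must-survive first."""
    hits = [w for w in workflows if w["model"] == dead_model]
    # False sorts before True, so must-survive workflows come first;
    # the sort is stable, so file order is kept within each group.
    hits.sort(key=lambda w: w["criticality"] != "must-survive")
    return [w["name"] for w in hits]


print(affected_by("primary-vendor/model-a", WORKFLOWS))
# prints ['spec-to-tasks', 'standup-digest']
```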

&lt;h2&gt;
  
  
  The point isn't fear
&lt;/h2&gt;

&lt;p&gt;I want to be clear, because the lazy version of this post is "AI is unreliable, panic." It isn't, and that's not useful. Depending on these models is the right call. The teams that win aren't the ones who avoided the dependency. They're the ones who can keep the work moving the morning it disappears.&lt;/p&gt;

&lt;p&gt;That competence costs an afternoon to build and almost nobody has built it yet:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run your most critical workflow on a second model once. The rehearsal is the whole instrument.&lt;/li&gt;
&lt;li&gt;Sort workflows into must-survive-today vs can-wait. Only the short list earns a tested fallback.&lt;/li&gt;
&lt;li&gt;Keep a one-page workflow-to-model list so the first lost hour becomes a glance.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I ran my test on a quiet Saturday and it cost me twenty minutes and a little ego. The alternative was running it for the first time on the morning it counted.&lt;/p&gt;

&lt;p&gt;What would break first in your stack if your main model wasn't there tomorrow - and have you ever actually checked?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I Lead AI Agents Every Day - Here Are 5 Shifts No Standard Tells You How to Make</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Fri, 12 Jun 2026 07:09:26 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-lead-ai-agents-every-day-here-are-5-shifts-no-standard-tells-you-how-to-make-1pg4</link>
      <guid>https://dev.to/itskondrat/i-lead-ai-agents-every-day-here-are-5-shifts-no-standard-tells-you-how-to-make-1pg4</guid>
      <description>&lt;p&gt;A Google DeepMind safety lead said this week that they're putting $10M behind multi-agent safety because "there just isn't really a field of research for multi-agent safety yet."&lt;/p&gt;

&lt;p&gt;I read that and laughed, because I'm already running the thing the research field doesn't exist for yet. Most of us are. You spin up a couple of agents, hand them work, and somewhere in there you quietly become a manager of workers that don't think like workers.&lt;/p&gt;

&lt;p&gt;Two days before that, PMI published the first official standard for AI in project work. It's a solid document. It also leaves the entire "how do you actually do this on a Tuesday" layer to you. So here's my Tuesday layer: five shifts I had to make, each one learned by getting it wrong first.&lt;/p&gt;

&lt;h2&gt;
  
  
  You stop filling the queue and start drawing the line
&lt;/h2&gt;

&lt;p&gt;My first instinct with an agent was the same as with a person: here's work, go.&lt;/p&gt;

&lt;p&gt;That broke the first time an agent made a reasonable decision on something that turned out to be irreversible. It wasn't the agent's fault. I never told it which decisions were one-way doors.&lt;/p&gt;

&lt;p&gt;So now the first artifact I write isn't a task list. It's a boundary file. Something like this lives next to the work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# decision-boundaries.yml&lt;/span&gt;
&lt;span class="na"&gt;autonomous&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;reformat, refactor, rename within a module&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;anything reversible with a git revert&lt;/span&gt;
&lt;span class="na"&gt;escalate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;schema changes, public API shape&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;deletes, migrations, anything touching prod data&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;spend over $0 or any external send&lt;/span&gt;
&lt;span class="na"&gt;on_unsure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;stop_and_ask&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That file does more for me than any standup. Leadership moved from assigning the work to defining what may be decided without me.&lt;/p&gt;

&lt;h2&gt;
  
  
  You read work you never watched happen
&lt;/h2&gt;

&lt;p&gt;I used to review work I'd seen get built. I knew the steps, so "looks right" was usually safe.&lt;/p&gt;

&lt;p&gt;Then I started getting finished diffs with no memory of how they came to be. "Looks right" stopped being safe. The code was clean and the reasoning under it was wrong in a way you only catch if you go digging.&lt;/p&gt;

&lt;p&gt;The skill now is judging a result cold, with zero context on the path. Ethan Mollick wrote this week about a model holding twelve hours of focus on one spec. When the attention window outlasts mine, my job isn't checking steps. It's scoping the spec so tightly the steps don't need a babysitter.&lt;/p&gt;

&lt;h2&gt;
  
  
  You plan capability, not headcount
&lt;/h2&gt;

&lt;p&gt;"How many engineers do I need" is a question I catch myself asking and kill.&lt;/p&gt;

&lt;p&gt;The real one: what mix of people and agents produces this outcome, and what's the human-only core I'd never hand off? The plan turned into a capability map with a deliberately protected center.&lt;/p&gt;

&lt;p&gt;Gergely Orosz's June job-market analysis lands in the same place from the data side: the roles that compound are where judgment about AI systems is the scarce input, not execution on a known stack. Capability planning is that judgment pointed at your own team.&lt;/p&gt;

&lt;h2&gt;
  
  
  You design the alarm before the fire
&lt;/h2&gt;

&lt;p&gt;Standup tells you something broke. Which means it tells you late.&lt;/p&gt;

&lt;p&gt;Workers that fail unpredictably need the alarm built up front. I keep a short tripwire list, each one a single sentence: if this observable crosses this line, halt and ping me, and here's who owns the ping.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# tripwires.yml&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;watch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test_pass_rate&lt;/span&gt;
  &lt;span class="na"&gt;trip&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;100%&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;touched&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;files"&lt;/span&gt;
  &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;halt + page me&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;watch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;files_changed&lt;/span&gt;
  &lt;span class="na"&gt;trip&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;20&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;one&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;task"&lt;/span&gt;
  &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pause for scope review&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It feels too simple to matter. It has saved more bad mornings than any dashboard I've built.&lt;/p&gt;
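&lt;p&gt;For a sense of how little machinery this needs, here is a sketch of a tripwire check in Python. The two thresholds are copied from the file above. The &lt;code&gt;metrics&lt;/code&gt; dict, the &lt;code&gt;Tripwire&lt;/code&gt; type, and the choice to treat a missing reading as tripped are assumptions of this sketch, not a description of any particular tool.&lt;/p&gt;

```python
from typing import Callable, NamedTuple


class Tripwire(NamedTuple):
    watch: str                         # name of the observable
    tripped: Callable[[float], bool]   # True when the line is crossed
    action: str                        # what to do when it trips


TRIPWIRES = [
    # Anything below a 100% pass rate on touched files trips the wire.
    Tripwire("test_pass_rate", lambda v: 100.0 > v, "halt + page me"),
    # More than 20 files changed in one task trips the wire.
    Tripwire("files_changed", lambda v: v > 20, "pause for scope review"),
]


def check(metrics: dict[str, float]) -> list[str]:
    """Return the action for every tripwire whose observable crossed its line.

    A missing observable counts as tripped: no reading is not a green light.
    """
    actions = []
    for wire in TRIPWIRES:
        if wire.watch not in metrics or wire.tripped(metrics[wire.watch]):
            actions.append(wire.action)
    return actions
```

&lt;p&gt;An empty list from &lt;code&gt;check&lt;/code&gt; means the agent keeps going; anything else is a reason to stop before the next step runs.&lt;/p&gt;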

&lt;h2&gt;
  
  
  You own the system, not the deliverable
&lt;/h2&gt;

&lt;p&gt;This is the one that's actually a promotion.&lt;/p&gt;

&lt;p&gt;Ownership used to mean the outcome is mine. It still is. The level changed. I don't own the deliverable directly anymore. I own the system that makes it: people, agents, and the rules between them. That's the only level that scales.&lt;/p&gt;

&lt;p&gt;Boris Cherny, who runs Claude Code, said this week he hasn't written a line of code himself in eight months. People hear a flex. I hear the shift in one sentence: stopped producing the work, started owning the system that produces it. Bigger job, not a smaller one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where are you on these
&lt;/h2&gt;

&lt;p&gt;I'm not clean on all five. Solid on three, shaky on two, and the shaky ones cost me the most.&lt;/p&gt;

&lt;p&gt;Rate yourself one to five on each, fast. The two you score lowest are the two behaviors that move you this quarter. Which one did you make first, and which are you still avoiding?&lt;/p&gt;

</description>
      <category>projectmanagement</category>
      <category>leadership</category>
      <category>ai</category>
      <category>career</category>
    </item>
    <item>
      <title>I Took the Keyboard Back From an Agent Mid-Task - Here's What the New PMP Can't Test</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Fri, 05 Jun 2026 06:22:24 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-took-the-keyboard-back-from-an-agent-mid-task-heres-what-the-new-pmp-cant-test-55n1</link>
      <guid>https://dev.to/itskondrat/i-took-the-keyboard-back-from-an-agent-mid-task-heres-what-the-new-pmp-cant-test-55n1</guid>
      <description>&lt;p&gt;A few weeks back I had an agent reconciling a vendor list. It ran clean. No error, no crash, output looked right. Then I noticed it had merged two suppliers that share a parent company into a single row, which would have thrown off every spend rollup downstream by a real number.&lt;/p&gt;

&lt;p&gt;I stopped it and fixed it by hand. Not because anything alerted me. Because I'd been burned by that exact shape before, and the burn taught me something a tutorial never did.&lt;/p&gt;

&lt;p&gt;I'm telling you this because on July 9 the PMP changes, and the change points straight at that moment.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed in the exam
&lt;/h2&gt;

&lt;p&gt;For the first time, the PMP makes AI mandatory content instead of an elective. The Business Environment domain goes from 8% of the exam to 26%. PMBOK 8 becomes the base. Fees climb in August.&lt;/p&gt;

&lt;p&gt;The largest project-management body on earth just put it in writing: AI fluency is core PM competency now, not a specialty track. If you've been treating "PM + AI" as a buzzword, the institution that certifies the role just disagreed with you.&lt;/p&gt;

&lt;p&gt;I think that's the right move. It's also where it gets interesting for anyone who actually ships work through agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Certified is not practiced
&lt;/h2&gt;

&lt;p&gt;A multiple-choice exam can certify that you know what an agentic workflow is. It can check that you'll pick the textbook answer about AI risk, define non-determinism, name the correct oversight principle.&lt;/p&gt;

&lt;p&gt;That's awareness, and awareness is worth certifying.&lt;/p&gt;

&lt;p&gt;What it can't reach is the reflex that runs real work. The exam can't certify that you've handed live stakes to an agent, watched it drift, and built the instinct for when it can act versus when you take over. You don't recall that instinct. You earn it, and the only way to earn it is to need it.&lt;/p&gt;

&lt;p&gt;I learned the vendor-merge thing from the run where I caught it too late.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI fluency looks like in the editor, not the exam
&lt;/h2&gt;

&lt;p&gt;Let me put it in do-this terms. Here's the practiced version, and none of it is a question you can bubble in.&lt;/p&gt;

&lt;p&gt;You scope the agent's slice like a statement of work. Not "improve onboarding." Bounded edges, in and out defined before it touches anything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;reconcile vendor list for Q2&lt;/span&gt;
&lt;span class="na"&gt;in_scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;dedupe exact-match names&lt;/span&gt;
&lt;span class="na"&gt;out_of_scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;merging distinct legal entities&lt;/span&gt;   &lt;span class="c1"&gt;# &amp;lt;- the line that would have saved me&lt;/span&gt;
&lt;span class="na"&gt;acceptance&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;row count change is human-reviewed before rollup runs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You know the override moment. This one I'd put first. You only learn the veto by having needed it.&lt;/p&gt;

&lt;p&gt;You read the output for what it didn't do, not just whether what's there is correct. The skipped supplier, the untouched edge case.&lt;/p&gt;

&lt;p&gt;You design the work so a human can actually check it. If the only way to verify is to redo the whole thing, the slice was wrong.&lt;/p&gt;

&lt;p&gt;You size the blast radius before deploy. How wrong can this go, who feels it, answered up front the way you'd treat a change to a live service.&lt;/p&gt;

&lt;p&gt;Five reps. All earned. Zero on the exam.&lt;/p&gt;
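&lt;p&gt;The scoped slice from earlier translates into code fairly directly. A sketch in Python, assuming each vendor row carries a &lt;code&gt;name&lt;/code&gt; and a &lt;code&gt;legal_entity_id&lt;/code&gt;; both field names and the function name are hypothetical. It only drops exact duplicates, never merges distinct legal entities, and reports whether the row count changed so a human can review before any rollup runs.&lt;/p&gt;

```python
def dedupe_exact(rows: list[dict]) -> tuple[list[dict], bool]:
    """Drop rows whose (name, legal_entity_id) pair was already seen.

    Returns the kept rows and a flag that is True when the row count
    changed, meaning a human should review before the rollup runs.
    Suppliers that share a parent company but are separate legal
    entities have different keys and are never merged here.
    """
    seen = set()
    kept = []
    for row in rows:
        key = (row["name"], row["legal_entity_id"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    needs_review = len(kept) != len(rows)
    return kept, needs_review
```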

&lt;h2&gt;
  
  
  Why this is a level-up, not a layoff
&lt;/h2&gt;

&lt;p&gt;The panic read of this is backwards. The credential catching up doesn't shrink the role, it raises the floor. When the institution names AI fluency as baseline, the person who's practiced it instead of read about it becomes the scarce one. Certified-and-practiced is a much smaller set than certified.&lt;/p&gt;

&lt;p&gt;So I wouldn't study the new section and stop. I'd go get the reps it's gesturing at. Hand something real to an agent this week, let it run a little past comfortable, and watch the moment your hand goes for the keyboard. That moment is the competency. It never shows up on a scorecard.&lt;/p&gt;

&lt;p&gt;For those of you shipping work through agents already: what's the moment that taught you to override, and could a test have taught it instead?&lt;/p&gt;

</description>
      <category>projectmanagement</category>
      <category>ai</category>
      <category>career</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Sorted 40 Backlog Items by Shape Instead of Who's Free - Here's What Broke</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Wed, 03 Jun 2026 07:29:49 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-sorted-40-backlog-items-by-shape-instead-of-whos-free-heres-what-broke-4h0a</link>
      <guid>https://dev.to/itskondrat/i-sorted-40-backlog-items-by-shape-instead-of-whos-free-heres-what-broke-4h0a</guid>
      <description>&lt;p&gt;Two Fortune 500 execs stood on a summit stage last week and gave opposite answers to one question: is an AI agent a colleague or a tool?&lt;/p&gt;

&lt;p&gt;One names his agents and seats them in reviews. The other won't call them colleagues. I watched the debate go by and realized I'd stopped caring about the noun a while ago, because it never once changed what I did with my backlog on a Monday.&lt;/p&gt;

&lt;p&gt;So here's the thing I actually changed, and the part of it that broke.&lt;/p&gt;

&lt;h2&gt;
  
  
  The old default: sort by who's free
&lt;/h2&gt;

&lt;p&gt;For years my planning loop was the same. Look at what needs doing. Look at who has capacity. Assign. The unit was always the person. "Who's free" was the first question.&lt;/p&gt;

&lt;p&gt;When agents showed up in the loop, I just slotted them in as another row in the capacity table. Same question, one more name. That's the "tool" answer in practice - an agent is a faster hand, you point it at whatever's next.&lt;/p&gt;

&lt;p&gt;It worked until it didn't. The agent would happily take a task that needed a human to even scope it correctly, produce something plausible, and I'd find out two steps later that the plausible thing was wrong in a way that cost me a half-day to unwind.&lt;/p&gt;

&lt;h2&gt;
  
  
  The change: sort by shape first
&lt;/h2&gt;

&lt;p&gt;I flipped the order. Before I look at who's free, I sort the backlog by &lt;em&gt;shape&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Two buckets. Person-shaped work needs judgment, lives in ambiguity, depends on taste or a relationship I can't write down. Agent-shaped work is defined, repeatable, and gate-able - I can describe the input, the output, and the check it has to pass.&lt;/p&gt;

&lt;p&gt;The discipline is that I write the agent's slice like a contractor's statement of work, not a chat prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# agent slice — scoped like an SOW, not a vibe&lt;/span&gt;
&lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;regenerate API client from updated OpenAPI spec&lt;/span&gt;
&lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;openapi.yaml (v3.1, committed)&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;existing client at src/clients/&lt;/span&gt;
&lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;regenerated client, same public surface&lt;/span&gt;
&lt;span class="na"&gt;gate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;all existing contract tests pass&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;public exports diff reviewed by a human before merge&lt;/span&gt;
&lt;span class="na"&gt;owner_of_outcome&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;me&lt;/span&gt;   &lt;span class="c1"&gt;# not the agent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If I can't fill in &lt;code&gt;gate&lt;/code&gt; and &lt;code&gt;output&lt;/code&gt; cleanly, that's my signal: this isn't agent-shaped yet. It's person-shaped work I was about to mislabel because I wanted it off my plate.&lt;/p&gt;
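&lt;p&gt;That test is simple enough to write down. A sketch in Python, assuming each backlog item is a dict with the same keys as the slice above; the function name &lt;code&gt;shape_of&lt;/code&gt; and the two labels are illustrative. It only checks that the fields are filled in, not that they are any good, so it is a floor rather than a judgment.&lt;/p&gt;

```python
def shape_of(slice_spec: dict) -> str:
    """Classify a backlog item as 'agent' or 'person' shaped.

    An item is agent-shaped only when both `output` and `gate` have at
    least one non-empty entry; anything else goes back to a human.
    """
    def filled(key: str) -> bool:
        entries = slice_spec.get(key) or []
        return any(str(entry).strip() for entry in entries)

    return "agent" if filled("output") and filled("gate") else "person"
```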

&lt;p&gt;I ran this across about 40 backlog items. Roughly a third of what I'd have handed to an agent under the old "who's free" sort failed the shape test and went back to a human.&lt;/p&gt;

&lt;h2&gt;
  
  
  What broke
&lt;/h2&gt;

&lt;p&gt;Two things broke, and both were useful.&lt;/p&gt;

&lt;p&gt;First, my sense of which work was "important." I'd been guarding a pile of tasks as too critical to automate. Half of them were just defined work I was emotionally attached to. They sorted cleanly into the agent bucket and ran fine. The status I'd assigned them was about me, not the work.&lt;/p&gt;

&lt;p&gt;Second - and this is the one I underscoped - the gate is only as good as the human watching it. There's research showing that once you treat an agent like a teammate, you review its output &lt;em&gt;less&lt;/em&gt; carefully. Naming the thing lowers your guard. So the "colleague" framing isn't warmth, it's a quiet accountability leak. I caught myself rubber-stamping a passing gate because the agent had "been reliable lately." A tool doesn't earn trust. Work does, run by run.&lt;/p&gt;

&lt;p&gt;There's an old line going around that fits: you automate the boring, and then humans just manage the failure modes. That's not a downgrade of the human job. The failure modes ARE the job now. Managing the part the agent can't hold is the work that's left, and it's the work that compounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is a career thing, not just a workflow thing
&lt;/h2&gt;

&lt;p&gt;The same exec who named his agents also said the hardest part isn't the technology, it's the managers. I think that's exactly right, and it cuts both ways.&lt;/p&gt;

&lt;p&gt;If the manager is the bottleneck, the manager is also the leverage. The skill that pays off from here isn't prompt-writing - that floor keeps dropping. It's the older skill of looking at a pile of work and knowing fast which parts are person-shaped and which you can scope, gate, and let run. That's "golden age of the generalist" energy: when the specialization tax drops, the person who holds judgment AND can design the work is the one who wins.&lt;/p&gt;

&lt;p&gt;The execs on stage were missing a practitioner, so they argued about the label. The label was never the leadership problem. The work was.&lt;/p&gt;

&lt;p&gt;So I'm curious where you've landed: when an agent is in your loop, do you still plan by who's free, or have you started cutting the work by shape first? And where has the gate let something through on you?&lt;/p&gt;

</description>
      <category>projectmanagement</category>
      <category>ai</category>
      <category>career</category>
      <category>management</category>
    </item>
    <item>
      <title>I Added a Human Veto to My PM Agent — Here's What Broke First</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Fri, 29 May 2026 19:46:10 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-added-a-human-veto-to-my-pm-agent-heres-what-broke-first-103g</link>
      <guid>https://dev.to/itskondrat/i-added-a-human-veto-to-my-pm-agent-heres-what-broke-first-103g</guid>
      <description>&lt;p&gt;I've been running automation agents for a while now. Most work fine hands-off. But one of them - my project status agent - kept making decisions that felt right in isolation but wrong in context.&lt;/p&gt;

&lt;p&gt;So I added a human approval step. Not a "review and confirm" UI widget. An actual veto gate in the workflow itself: the agent drafts the action, pauses, and waits for my explicit go-ahead before doing anything irreversible.&lt;/p&gt;

&lt;p&gt;Here's what I didn't expect: the first thing to break wasn't the agent logic. It was my own habits.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem I Was Trying to Solve
&lt;/h2&gt;

&lt;p&gt;My status reporting agent does three things: pulls data from Jira, formats a weekly PM summary, and posts it to the team Slack channel. Straightforward automation.&lt;/p&gt;

&lt;p&gt;Except twice in three months it posted something embarrassing. Once it included a stale blocker that had been resolved 48 hours prior. Once it flagged a team member's ticket as overdue when they'd actually shipped early and the tracker hadn't caught up.&lt;/p&gt;

&lt;p&gt;Neither was catastrophic. Both were awkward.&lt;/p&gt;

&lt;p&gt;The traditional fix would be "add better logic to catch these cases." I tried that. Added freshness checks, added resolved-status validation. Still leaked edge cases.&lt;/p&gt;

&lt;p&gt;So I took a different approach: make human review a structural part of the workflow, not a safety net I bolt on when things go wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Built
&lt;/h2&gt;

&lt;p&gt;The architecture is boring. Agent generates the Slack message draft, posts to a private review channel, waits for a thumbs-up emoji reaction, only then posts to the team channel.&lt;/p&gt;

&lt;p&gt;If no reaction within 2 hours, it pings me directly and kills the task. No silent failures.&lt;/p&gt;

&lt;p&gt;The human approval isn't optional and it isn't a fallback. It's a required step in the sequence. The workflow can't progress without it.&lt;/p&gt;
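&lt;p&gt;A minimal sketch of that gate as a small state machine, assuming the Slack side (posting the draft, reading the reaction, pinging me) lives elsewhere. The names &lt;code&gt;ApprovalGate&lt;/code&gt; and &lt;code&gt;GateState&lt;/code&gt; are made up for illustration, not taken from any framework.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum


class GateState(Enum):
    WAITING = "waiting"
    APPROVED = "approved"
    TIMED_OUT = "timed_out"


@dataclass
class ApprovalGate:
    """Hard gate: the draft only ships after an explicit approval."""
    draft: str
    created_at: float                 # seconds, e.g. from time.time()
    timeout_s: float = 2 * 60 * 60    # the 2-hour window described above
    state: GateState = GateState.WAITING

    def tick(self, now: float) -> None:
        # Called periodically; expires the gate once the window closes.
        if self.state is GateState.WAITING and now - self.created_at >= self.timeout_s:
            self.state = GateState.TIMED_OUT

    def approve(self, now: float) -> None:
        # An approval that arrives after the deadline does not count.
        self.tick(now)
        if self.state is GateState.WAITING:
            self.state = GateState.APPROVED

    def may_post(self) -> bool:
        # The only path to the team channel is an explicit approval.
        return self.state is GateState.APPROVED
```

&lt;p&gt;When the state flips to &lt;code&gt;TIMED_OUT&lt;/code&gt;, the caller is the one that pings and kills the task. The gate itself never posts anything.&lt;/p&gt;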

&lt;p&gt;I got the idea from reading about Microsoft Conductor, which ships a similar pattern as open source for multi-agent orchestration: human approval as a default workflow step, not a retrofit. Their framing stuck with me: &lt;strong&gt;designed-in, not bolted-on.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Broke
&lt;/h2&gt;

&lt;p&gt;I expected the agent to break. It didn't.&lt;/p&gt;

&lt;p&gt;I expected me to approve everything in under 5 minutes. I did, mostly.&lt;/p&gt;

&lt;p&gt;What I didn't expect: my own review stopped being worth trusting. The first week, I read every draft carefully. By week three, I was rubber-stamping. My brain had offloaded judgment to "well the agent probably got it right." The approval gate existed, but the actual human review stopped happening.&lt;/p&gt;

&lt;p&gt;This is the invisible failure mode nobody talks about. You add a human step. The human shows up. But they're not really there.&lt;/p&gt;

&lt;p&gt;The fix was embarrassingly simple: I added friction. Required a comment, not just an emoji. Had to type at least one word before the workflow could advance. Stupid? Maybe. Effective? Completely.&lt;/p&gt;

&lt;p&gt;Turns out &lt;strong&gt;the veto gate only works if the human has to engage to use it.&lt;/strong&gt;&lt;/p&gt;
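&lt;p&gt;The friction check is tiny. A sketch, assuming the approval arrives as a plain text comment; &lt;code&gt;is_engaged_approval&lt;/code&gt; is a hypothetical helper name, and "at least one word" is read here as "at least one token containing a letter or digit".&lt;/p&gt;

```python
def is_engaged_approval(comment: str) -> bool:
    """Accept an approval only if it carries at least one typed word."""
    # A token counts as a word if it contains any letter or digit,
    # so whitespace or bare punctuation is rejected.
    words = [w for w in comment.split() if any(ch.isalnum() for ch in w)]
    return len(words) >= 1
```

&lt;p&gt;This does not measure whether the comment is thoughtful. It only forces a keystroke, which was enough to break the rubber-stamp habit for me.&lt;/p&gt;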

&lt;h2&gt;
  
  
  Three Things I Didn't Know I Needed
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Escalation hooks, not just approval gates.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not all decisions are equal. Minor formatting choices don't need a veto. Anything that posts externally or modifies data does. I ended up building a simple severity classifier: low = auto-approve, medium = soft review prompt, high = hard gate. Saved probably 70% of the friction without sacrificing coverage where it mattered.&lt;/p&gt;
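&lt;p&gt;Roughly what the classifier looks like, as a sketch. The "posts externally or modifies data" rule comes straight from the paragraph above; the medium tier rule (&lt;code&gt;visible_to_others&lt;/code&gt;) is an assumption added for illustration, so swap in whatever your middle tier actually is.&lt;/p&gt;

```python
from enum import Enum


class Severity(Enum):
    LOW = "auto-approve"
    MEDIUM = "soft review prompt"
    HIGH = "hard gate"


def classify(posts_externally: bool, modifies_data: bool, visible_to_others: bool) -> Severity:
    """Map an action's properties to a review tier."""
    # Anything that posts externally or modifies data gets the hard gate.
    if posts_externally or modifies_data:
        return Severity.HIGH
    # Assumed middle tier: nothing irreversible, but other people will see it.
    if visible_to_others:
        return Severity.MEDIUM
    # Everything else runs without a human in the way.
    return Severity.LOW
```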

&lt;p&gt;&lt;strong&gt;2. A timeout that fails loudly.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My 2-hour window was too long. If I was in back-to-back meetings, the agent just hung. Switched to 30 minutes with an escalating Slack ping. Now if I haven't approved it, I can't miss it.&lt;/p&gt;
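&lt;p&gt;One way to express "escalating" is a ping level that rises as the window runs out. This is a sketch with assumed numbers: three evenly spaced steps inside the 30-minute window. The function name and the step count are mine, not a Slack feature.&lt;/p&gt;

```python
def ping_level(elapsed_min: float, window_min: float = 30.0, steps: int = 3) -> int:
    """Return how loud the reminder should be: 0 = silent, steps = final ping."""
    # Once the window has closed, stay at the loudest level.
    if elapsed_min >= window_min:
        return steps
    # Otherwise count how many full step intervals have passed.
    return int(elapsed_min // (window_min / steps))
```

&lt;p&gt;With the defaults, level 1 fires at 10 minutes, level 2 at 20, and level 3 at 30. What each level does (channel message, direct message, phone) is up to the workflow.&lt;/p&gt;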

&lt;p&gt;&lt;strong&gt;3. A clear distinction between "irreversible" and "annoying-to-undo."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I started gating everything. Caught myself adding a veto to the agent that sends me my own daily brief. Nobody else sees that. No irreversible action involved. Human gate added zero value there, only friction.&lt;/p&gt;

&lt;p&gt;The useful mental model: &lt;strong&gt;if I had to undo this action at 11pm on a Friday, would I care?&lt;/strong&gt; Yes = human gate. No = let it run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters More Than It Looks
&lt;/h2&gt;

&lt;p&gt;Most of the "AI safety" conversation in enterprise is about governance frameworks and audit trails. That's real. But the practical engineering question is simpler:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which steps in your agent workflow require a human in the loop by design, not by accident?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Design it from the start and the workflow is reliable. Bolt it on after an incident and you're playing catch-up forever.&lt;/p&gt;

&lt;p&gt;The Microsoft Conductor open-source release was notable to me not for the code but for the default: human approval ON unless you opt out. Most agent frameworks do the opposite. They default to autonomous and assume you'll add guardrails when you need them.&lt;/p&gt;

&lt;p&gt;I think that's backwards. Especially for agents touching anything external: posting, sending, modifying.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where I'm Landing
&lt;/h2&gt;

&lt;p&gt;The veto gate has been running about 6 weeks now. Two incidents caught before they shipped. One case where my review actually improved the draft - not just filtered a bad one. About 3 minutes of daily overhead.&lt;/p&gt;

&lt;p&gt;Worth it. But only because I designed the friction deliberately. The version without the comment requirement was almost worse than no gate at all - it gave me false confidence in an approval that wasn't really happening.&lt;/p&gt;

&lt;p&gt;If you're building agent workflows that touch anything irreversible, the question I'd ask first: &lt;strong&gt;what happens if this runs at 2am and you're asleep?&lt;/strong&gt; Whatever you wouldn't want to explain the next morning - that's where your human gate goes.&lt;/p&gt;




&lt;p&gt;If any of this maps to workflows you're building, curious what the "irreversible action" problem looks like on your end.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>agents</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Read Intuit's 3,000-Job Layoff Memo - Here's the One Line Every AI Restructuring Memo Is Missing</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Thu, 21 May 2026 06:55:22 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-read-intuits-3000-job-layoff-memo-heres-the-one-line-every-ai-restructuring-memo-is-missing-igh</link>
      <guid>https://dev.to/itskondrat/i-read-intuits-3000-job-layoff-memo-heres-the-one-line-every-ai-restructuring-memo-is-missing-igh</guid>
      <description>
&lt;p&gt;On Tuesday, May 20, Intuit's CEO Sasan Goodarzi sent a memo announcing 3,000 job cuts, 17% of the company. Reason: "reduce complexity to focus on AI." I read it twice looking for one line. It wasn't there.&lt;/p&gt;

&lt;p&gt;The same line has been missing from every AI workforce announcement I have read in the last two years. I want to name it, because the missing line is an engineering-accountability problem before it is a PM-leadership problem, and the people closest to the failure surface (you, reading this on Dev.to) are the ones who feel the gap first.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Memo Names Three Things. It Skips One.
&lt;/h2&gt;

&lt;p&gt;Every AI restructuring announcement in the last two years has the same shape.&lt;/p&gt;

&lt;p&gt;What got cut. Intuit: 3,000 jobs, 17% of workforce. Klarna 2024: customer support function. Duolingo 2025: a chunk of contractor work. IBM 2025: a large internal reorg. Always specific. Always quantified.&lt;/p&gt;

&lt;p&gt;What gets refocused. Intuit: "focus on AI." Klarna: AI-first customer service. Duolingo: AI-augmented learning. IBM: "augmenting HR with AI." Always a direction.&lt;/p&gt;

&lt;p&gt;What stays. Intuit named existing partnerships with OpenAI and Anthropic. Every memo names what stays. It is the reassurance paragraph.&lt;/p&gt;

&lt;p&gt;What's missing. A real human name attached to the failure path of any specific AI system the cut workers used to operate. &lt;em&gt;"When this agent gets it wrong, [name] answers."&lt;/em&gt; That line. Zero of the memos name it.&lt;/p&gt;

&lt;p&gt;It is the same shape. Cut, refocus, stay, no failure owner. Once you read four in a row, you stop reading four announcements and start reading one announcement repeated four times.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Got Swapped
&lt;/h2&gt;

&lt;p&gt;Underneath every one of these memos is a structural swap nobody is naming.&lt;/p&gt;

&lt;p&gt;The financial side moves cleanly. Payroll down. Software spend up. The controller closes the quarter. The general ledger has every entry it needs.&lt;/p&gt;

&lt;p&gt;The accountability side does not move at all. The worker who answered "I made that call, here's why" is gone. The AI that makes the call now has nobody attached to it.&lt;/p&gt;

&lt;p&gt;The headcount got swapped for AI capacity. The accountability the headcount carried did not get re-assigned. By default, accountability becomes nobody's job. Not by malice. Not by oversight. Just by the structural shape of a memo that names everything except the failure owner.&lt;/p&gt;

&lt;p&gt;This is the workforce-to-accountability swap. It is the unstaffed accountability sitting in every AI restructuring announcement, including the next one your company sends.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is An Engineering Problem First
&lt;/h2&gt;

&lt;p&gt;Most of the layoff coverage is framed at the executive layer - the CEO's memo, the press release, the analyst reaction. The frame I want here is the practitioner one.&lt;/p&gt;

&lt;p&gt;The agent runs in your repo. The pipeline that calls the agent is wired to production. The customer-facing surface the agent writes into is wired to actual humans on the other end. When the agent is wrong, the wrongness lands in code you maintain - a misrouted refund, a wrong tax calculation, a wrong customer email, a wrong policy interpretation surfaced through a chat UI you shipped.&lt;/p&gt;

&lt;p&gt;You are closest to the failure surface. You feel the missing line first. The PM who is supposed to author the line is downstream of the deployment by definition. By the time the line is missing in production, you have probably already noticed.&lt;/p&gt;

&lt;p&gt;That is why this is worth flagging as an engineer-reader: the missing line is not a future PM artifact, it is a present-tense gap on workflows you are already running.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Sharper Edge - Voice Agents This Week
&lt;/h2&gt;

&lt;p&gt;Three different vendors shipped voice-capable AI agents in the last week. PollyReach added live voice on a CRM agent. AdaptiveAI shipped triggered outbound phone agents. Crescendo's Live CX agents take full inbound calls.&lt;/p&gt;

&lt;p&gt;These agents represent the company in a verbal conversation. The voice is the company's. The commitments the agent makes - "I'll refund that", "I'll escalate this", "I'll send you the contract by Friday" - bind the company in the customer's experience.&lt;/p&gt;

&lt;p&gt;No named human is on the call. The agent uptime is 99.98%. The customer is on the line. The named human at the company is not in the conversation, and after the conversation, is not in the transcript.&lt;/p&gt;

&lt;p&gt;99.98% uptime is the SRE side of this. Accountability is the other side. Reliability ≠ accountability. You can have an agent with flawless uptime and still have nobody on the line when its output causes harm.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Engineering+PM Move
&lt;/h2&gt;

&lt;p&gt;The move is small enough to do this afternoon, on one workflow.&lt;/p&gt;

&lt;p&gt;Pick one agent or AI-replaced workflow your team runs. One. The one most likely to be wrong about something that matters.&lt;/p&gt;

&lt;p&gt;Add a short block to the agent's spec (AGENTS.md, prompt header, deployment doc - wherever the agent's contract lives):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Failure Owner&lt;/span&gt;
When this agent's output causes harm, [name] answers.
Signed: [name], [date].
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Commit it. PR it. Tag the named person on the PR.&lt;/p&gt;
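&lt;p&gt;If you want the PR to fail when the block is missing or still has the template brackets in it, a check along these lines would do. It is a sketch: &lt;code&gt;has_named_failure_owner&lt;/code&gt; is a hypothetical helper, and it assumes the heading is written exactly as &lt;code&gt;# Failure Owner&lt;/code&gt;.&lt;/p&gt;

```python
import re

# Matches leftover template placeholders such as [name] or [date].
PLACEHOLDER = re.compile(r"\[[^\]]*\]")


def has_named_failure_owner(spec_text: str) -> bool:
    """True if the spec has a '# Failure Owner' block with no [placeholder] left in it."""
    lines = spec_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().lower() == "# failure owner":
            block = []
            # Collect lines until the next heading starts.
            for nxt in lines[i + 1:]:
                if nxt.startswith("#"):
                    break
                block.append(nxt)
            body = "\n".join(block).strip()
            return bool(body) and not PLACEHOLDER.search(body)
    return False
```

&lt;p&gt;It cannot tell whether the named person agreed to own the failure path. That part still needs the tag on the PR.&lt;/p&gt;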

&lt;p&gt;That is the inheritance move. It is the human-side complement to the version control, the test suite, and the SLA. Three accountability artifacts on the technical side (git blame, test owners, on-call rotation) and zero on the AI output side is the current default. Adding the line puts the first one on the AI output side.&lt;/p&gt;

&lt;p&gt;If you are the PM on the team and the engineering side is doing this before you do, you are watching a role-growth opportunity walk by. If you are the engineer on the team and you are already maintaining the agent's spec, the line is yours to write today, and the PM on your team will copy it onto the next four workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern Will Keep Repeating
&lt;/h2&gt;

&lt;p&gt;The next AI restructuring memo will come this quarter. Probably from a Fortune 500 SaaS company. Probably citing "complexity reduction" or "AI focus." Probably 1,000 to 10,000 jobs. Read it the way I read Intuit's. You will find the cut. You will find the refocus. You will find the stay. You will not find the line.&lt;/p&gt;

&lt;p&gt;The line has to be authored. Someone has to sign it. The earliest signer on any given workflow is the person closest to it - which, on most AI-augmented teams in 2026, is an engineer plus a PM, not a CEO writing a memo.&lt;/p&gt;

&lt;p&gt;Three thousand jobs at Intuit. Zero named humans in the announcement. The missing line is the artifact. The name on the line is the move.&lt;/p&gt;

&lt;p&gt;What's missing from the last AI announcement your company sent - the cut, the refocus, the stay, or the failure owner?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>career</category>
      <category>projectmanagement</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Read the Devenex Launch Yesterday - Here's the Policy File Your Agent Repo Is Still Missing</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Wed, 20 May 2026 07:30:41 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-read-the-devenex-launch-yesterday-heres-the-policy-file-your-agent-repo-is-still-missing-23j7</link>
      <guid>https://dev.to/itskondrat/i-read-the-devenex-launch-yesterday-heres-the-policy-file-your-agent-repo-is-still-missing-23j7</guid>
      <description>&lt;p&gt;I spent an hour reading the Devenex launch yesterday and the only sentence I keep coming back to is "execution control plane." That phrase is doing a lot of work.&lt;/p&gt;

&lt;p&gt;It says: enforcement is a product now. Every agent request gets policy-evaluated, identity-bound, recorded as evidence before anything runs.&lt;/p&gt;

&lt;p&gt;It does not say: the policy itself exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  Six products ship enforcement. Zero ship the policy.
&lt;/h2&gt;

&lt;p&gt;Look at what shipped this month. Devenex launched May 19 as the first execution control plane. Antigravity 2.0 hardened Git policies at Google I/O Day 2. Notion's External Agent API went GA with workspace-scoped guardrails. Claude has had tool-use limits since launch. OpenAI has function-call constraints. Salesforce Agentforce has action approvals.&lt;/p&gt;

&lt;p&gt;Six products. Different vendors. Different layers. All shipping enforcement.&lt;/p&gt;

&lt;p&gt;The artifact they all need to enforce against is the same shape. None of them ship it. That artifact is your problem, and it lives in your repo, not theirs.&lt;/p&gt;

&lt;p&gt;I started calling it the policy file.&lt;/p&gt;

&lt;h2&gt;
  
  
  What goes in the policy file
&lt;/h2&gt;

&lt;p&gt;Four sections. I've been writing it this way for a while; the launches this week made me realize it's the same shape across every enforcement product I read the docs for. The shape doesn't depend on the vendor.&lt;/p&gt;

&lt;h3&gt;
  
  
  Action classes
&lt;/h3&gt;

&lt;p&gt;The agent's universe of possible actions, broken into named classes: &lt;code&gt;read&lt;/code&gt;, &lt;code&gt;write&lt;/code&gt;, &lt;code&gt;send-external&lt;/code&gt;, &lt;code&gt;transact&lt;/code&gt;, &lt;code&gt;escalate&lt;/code&gt;, &lt;code&gt;spawn-subagent&lt;/code&gt;. Each class is a category the policy file attaches constraints to. The act of writing the list is the point. The default in every deployment doc I've seen is implicit: the agent can do anything inside its tool set. Naming classes is how you refuse that default.&lt;/p&gt;

&lt;p&gt;A sketch in YAML:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;action_classes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;read&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;crm.contacts&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;crm.opportunities&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;write&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;targets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;crm.opportunities.notes&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;send_external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;channels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;email&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;slack-dm&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;transact&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;instruments&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;stripe.refund&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's not a real schema. It's the shape your real schema settles into after the third review.&lt;/p&gt;
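&lt;p&gt;To show what "attach constraints to a class" means at request time, here is a deny-by-default lookup over the same shape. A sketch only: the policy is inlined as a Python dict to stay dependency-free, and &lt;code&gt;is_allowed&lt;/code&gt; is a hypothetical name, not any vendor's API.&lt;/p&gt;

```python
# The YAML sketch above, inlined as a dict (same illustrative values).
POLICY = {
    "action_classes": {
        "read": {"sources": ["crm.contacts", "crm.opportunities"]},
        "write": {"targets": ["crm.opportunities.notes"]},
        "send_external": {"channels": ["email", "slack-dm"]},
        "transact": {"instruments": ["stripe.refund"]},
    }
}


def is_allowed(policy: dict, action_class: str, resource: str) -> bool:
    """Deny by default: the class must be named and the resource listed under it."""
    spec = policy["action_classes"].get(action_class)
    if spec is None:
        # A class nobody wrote down is a class the agent does not have.
        return False
    return any(resource in allowed for allowed in spec.values())
```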

&lt;h3&gt;
  
  
  Blast radius caps
&lt;/h3&gt;

&lt;p&gt;A number per class. Not a vague guardrail, a number the enforcement layer can compare against at request time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;caps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;write.records_per_run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
  &lt;span class="na"&gt;send_external.recipients_per_session&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;transact.usd_per_run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;500&lt;/span&gt;
  &lt;span class="na"&gt;spawn_subagent.depth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The contrast: the deployment doc says "the agent has access to the CRM." The policy file says "the agent's write class is capped at fifty records per run." One sentence Devenex can check. One sentence Antigravity can check. One sentence Claude tool-use can check.&lt;/p&gt;
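&lt;p&gt;The comparison itself is one line. A sketch using the caps above, with a hypothetical &lt;code&gt;within_cap&lt;/code&gt; helper and the assumption that the caller tracks how much of the cap the current run has already used.&lt;/p&gt;

```python
# Same illustrative numbers as the YAML sketch.
CAPS = {
    "write.records_per_run": 50,
    "send_external.recipients_per_session": 10,
    "transact.usd_per_run": 500,
    "spawn_subagent.depth": 2,
}


def within_cap(caps: dict, key: str, used: float, requested: float) -> bool:
    """True if used + requested stays at or under the cap; unknown keys are denied."""
    if key not in caps:
        return False
    return caps[key] >= used + requested
```

&lt;p&gt;For example, &lt;code&gt;within_cap(CAPS, "write.records_per_run", 40, 10)&lt;/code&gt; passes and the same call with 11 requested does not.&lt;/p&gt;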

&lt;h3&gt;
  
  
  Escalation triggers
&lt;/h3&gt;

&lt;p&gt;The inverse half of the allowlist. When the agent hits a class not in its policy, or a cap it's about to exceed, what fires? Named human. Named channel. Named SLA.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;escalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;
    &lt;span class="na"&gt;trigger&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cap_exceeded&lt;/span&gt;
    &lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#agent-ops"&lt;/span&gt;
    &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@owner-of-record"&lt;/span&gt;
    &lt;span class="na"&gt;sla_hours&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;transact&lt;/span&gt;
    &lt;span class="na"&gt;trigger&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;any&lt;/span&gt;
    &lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#finance-approvals"&lt;/span&gt;
    &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@treasury-lead"&lt;/span&gt;
    &lt;span class="na"&gt;sla_hours&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The deployment doc has "agent owner" once on page one. The policy file has an escalation route per class.&lt;/p&gt;
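&lt;p&gt;Resolving a route is a lookup over those rules. A sketch, assuming &lt;code&gt;any&lt;/code&gt; means "every trigger for this class" and that the first matching rule wins; &lt;code&gt;find_route&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
# The escalation rules from the YAML sketch, inlined as dicts.
ESCALATION = [
    {"class": "write", "trigger": "cap_exceeded", "route": "#agent-ops",
     "owner": "@owner-of-record", "sla_hours": 4},
    {"class": "transact", "trigger": "any", "route": "#finance-approvals",
     "owner": "@treasury-lead", "sla_hours": 1},
]


def find_route(rules: list, action_class: str, trigger: str):
    """Return the first rule matching the class and trigger, or None."""
    for rule in rules:
        if rule["class"] == action_class and rule["trigger"] in (trigger, "any"):
            return rule
    # None means no named human exists for this case.
    return None
```

&lt;p&gt;A &lt;code&gt;None&lt;/code&gt; result is worth treating as a finding in its own right: it is a class with no named human behind it.&lt;/p&gt;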

&lt;h3&gt;
  
  
  Evidence schema
&lt;/h3&gt;

&lt;p&gt;What the agent has to log so a human can audit the run afterward. Structured output. The action class invoked. The tool calls. The identity the agent acted as. The policy version. The escalation path if any.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;evidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;required_fields&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;run_id&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;policy_version&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;action_class&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;tool_calls&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;acting_identity&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;escalation_record&lt;/span&gt;
  &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jsonl&lt;/span&gt;
  &lt;span class="na"&gt;retention_days&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;365&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without an evidence schema, you can't answer "did the agent follow the policy?" after the fact. The policy was unenforceable from the start.&lt;/p&gt;
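&lt;p&gt;Wiring that into a logging path can be as small as refusing to write a record that lacks a required field. A sketch with the field list from above; &lt;code&gt;to_evidence_line&lt;/code&gt; is a hypothetical helper, and the record values in the usage line are invented.&lt;/p&gt;

```python
import json

REQUIRED_FIELDS = (
    "run_id", "policy_version", "action_class",
    "tool_calls", "acting_identity", "escalation_record",
)


def to_evidence_line(record: dict) -> str:
    """Serialize one run record as a single JSONL line, or fail loudly."""
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError("evidence record missing fields: " + ", ".join(missing))
    # json.dumps escapes newlines inside strings, so the result is one line.
    return json.dumps(record, sort_keys=True)
```

&lt;p&gt;Usage would be something like appending &lt;code&gt;to_evidence_line(record)&lt;/code&gt; plus a newline to the run log after every action, with &lt;code&gt;escalation_record&lt;/code&gt; set to null when nothing fired.&lt;/p&gt;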

&lt;h2&gt;
  
  
  A specific moment that made this concrete
&lt;/h2&gt;

&lt;p&gt;I was reading through a deployment doc for an agent recently. Clean prose. Listed the APIs. Listed the data sources. Useful agent.&lt;/p&gt;

&lt;p&gt;No section for what happens when it tries to write five thousand records. No section for what happens when it tries to send to two hundred recipients. No section for what happens when it transacts above a cap, because nobody had written the cap.&lt;/p&gt;

&lt;p&gt;The deployment doc wasn't wrong. It was answering the wrong question. It answered "what does the agent do?" The policy file answers "what is the agent allowed to do, and what fires if any of that breaks?"&lt;/p&gt;

&lt;p&gt;Different artifact. Different reviewer. Different file.&lt;/p&gt;

&lt;h2&gt;
  
  
  The clean split: enforcement vs. authoring
&lt;/h2&gt;

&lt;p&gt;Devenex et al. ship enforcement. That half is done. The other half - authoring - isn't a product, and I don't think it can be one. Authoring is the codification of your team's actual judgment about what the agent should be allowed to do. That judgment is cross-functional: engineering knows the runtime, security knows the threat model, legal knows the constraint, finance knows the cap.&lt;/p&gt;

&lt;p&gt;It's not "PM lobs a doc over the wall." The PM convenes the call, drafts the file, opens the PR. Engineering reviews it the same way it reviews a Terraform plan. Security reviews it the same way it reviews IAM. The policy ships in the same PR as the agent.&lt;/p&gt;

&lt;p&gt;That's policy-as-code, the shape devs already know from infra. The new thing isn't the shape; it's the artifact existing for AI agents at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do this week if I were shipping an agent
&lt;/h2&gt;

&lt;p&gt;Open a &lt;code&gt;policy.yaml&lt;/code&gt; in the agent repo. Stub the four sections. Pin one number per class even if it's a wild guess. Wire the evidence schema into the agent's logging path. Put it in the same PR as the next prompt change.&lt;/p&gt;
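&lt;p&gt;A stub is easier to keep honest with a small lint in CI. This sketch assumes the file has already been parsed into a dict and that cap keys are prefixed with the class name, as in the examples above; &lt;code&gt;lint_policy&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
SECTIONS = ("action_classes", "caps", "escalation", "evidence")


def lint_policy(policy: dict) -> list:
    """Return a list of problems; an empty list means the stub is complete."""
    problems = [f"missing section: {s}" for s in SECTIONS if s not in policy]
    caps = policy.get("caps", {})
    for cls in policy.get("action_classes", {}):
        # "Pin one number per class": look for any cap key prefixed with the class.
        if not any(key.startswith(cls + ".") for key in caps):
            problems.append(f"no cap pinned for class: {cls}")
    return problems
```

&lt;p&gt;Run against the sketches earlier in this post, it would flag &lt;code&gt;read&lt;/code&gt;, which has sources listed but no cap. That is the kind of gap the third review usually finds.&lt;/p&gt;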

&lt;p&gt;The enforcement layer your platform vendor ships is checking against something. If nobody wrote the something, the enforcement is checking against silence.&lt;/p&gt;

&lt;p&gt;What's the section your agent repo is missing first - blast radius caps, or the evidence schema?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>vibecoding</category>
      <category>security</category>
    </item>
    <item>
      <title>I Built a 5-Signal Vendor Watchlist for Google I/O 2026 - Here's What Each One Will Break in My Stack</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Fri, 15 May 2026 05:15:33 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-built-a-5-signal-vendor-watchlist-for-google-io-2026-heres-what-each-one-will-break-in-my-23bp</link>
      <guid>https://dev.to/itskondrat/i-built-a-5-signal-vendor-watchlist-for-google-io-2026-heres-what-each-one-will-break-in-my-23bp</guid>
      <description>
&lt;p&gt;Google I/O 2026 is four days out. If you ship anything that touches a Google model, the keynote is about to change capability assumptions your team never approved against. I have done this dance for six straight years of keynotes, and the pattern is always the same: an engineer pastes a Verge link into Slack on Wednesday, someone says "we should look at this", and three weeks later a capability we never reviewed quietly shipped inside a tool we already authorized.&lt;/p&gt;

&lt;p&gt;This year I tried something different. Friday morning I sat down with a coffee and wrote a five-line watchlist for the keynote - vendor signals plus one specific engineering action per signal. The exercise was small enough to feel silly. Then I realized the format reuses for every keynote that follows. AWS re:Invent. Microsoft Build. OpenAI DevDay.&lt;/p&gt;

&lt;p&gt;Sharing it here because if you have not built one yet, the next four days are your peak window.&lt;/p&gt;

&lt;h2&gt;
  
  
  What vendor product observability actually means
&lt;/h2&gt;

&lt;p&gt;Most engineering teams I work with have great downstream observability. Uptime, error rates, model latency, queue depth. The ops team set it up. Dashboards exist.&lt;/p&gt;

&lt;p&gt;Ask the same teams what they observe about their vendors and the answer goes quiet. The signals that reshape what your stack can do - a keynote, a model release, a quietly retired chatbot, a market position shift - arrive through press coverage, not through a dashboard. There is no Datadog for vendor product strategy. The closest substitute is a small, named list of signals you commit to reading on the day of the keynote.&lt;/p&gt;

&lt;p&gt;That list is the watchlist. It is not glamorous. It is a 5-row markdown file. The discipline is that it exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  Signal 1: OS-layer ambient intelligence as a new delivery category
&lt;/h2&gt;

&lt;p&gt;The Android Show I/O Edition aired last week. Gemini Intelligence ships in June as the intelligence layer running below the Android app surface - booking, browsing, form completion - with access to email, calendar, messages. Rolling out as an OS update to any Android 12+ device on the AI Pro or Ultra tier.&lt;/p&gt;

&lt;p&gt;If your team builds on Android, this matters because most authorization patterns cover two delivery categories: agents your team deployed, and agents a vendor embedded in a tool your team authorized. OS-layer intelligence is a third category, and it does not arrive through your authorization pipeline. It arrives through the OS update channel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engineering action at I/O:&lt;/strong&gt; Pull the list of Android-deployed devices in your fleet that intersect "AI Pro or Ultra subscriber" and "Android 12+". That intersection is the scope of devices where the OS now has agent capabilities your data flow assumptions never accounted for.&lt;/p&gt;
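&lt;p&gt;If your device inventory exports to anything structured, the intersection is a filter. A sketch with a hypothetical &lt;code&gt;Device&lt;/code&gt; record; the field names and tier labels are assumptions about what an MDM export might contain, not a real schema.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    device_id: str
    android_version: int
    ai_tier: str  # assumed labels: "none", "pro", "ultra"


def in_scope(devices: list) -> list:
    """Devices on Android 12+ whose user is on the AI Pro or Ultra tier."""
    return [
        d for d in devices
        if d.android_version >= 12 and d.ai_tier in ("pro", "ultra")
    ]
```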

&lt;h2&gt;
  
  
  Signal 2: Model capability delta as silent capability inheritance
&lt;/h2&gt;

&lt;p&gt;When a frontier model jumps capability, every downstream tool that runs on it inherits the new floor on keynote day. If you have authorized AI tools whose vendor swapped in a Google model, those tools now have capabilities you never approved against, and you may not even know which ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engineering action at I/O:&lt;/strong&gt; Make a one-column list of authorized AI tools your team uses that run on Google models. After the keynote, write the capability delta beside each one. The delta column is the trigger for re-review. If the delta crosses a sensitivity threshold (PII handling, code execution, autonomous file I/O), the tool needs a fresh approval pass.&lt;/p&gt;
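&lt;p&gt;The delta column can be computed if you record capabilities as a set at approval time. A sketch, with made-up capability labels standing in for the three sensitivity examples above; &lt;code&gt;needs_rereview&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
# Assumed labels for the sensitivity threshold named above.
SENSITIVE = {"pii_handling", "code_execution", "autonomous_file_io"}


def needs_rereview(approved_caps: set, current_caps: set) -> set:
    """Return newly gained capabilities that cross the sensitivity threshold."""
    delta = current_caps - approved_caps
    return delta.intersection(SENSITIVE)
```

&lt;p&gt;A non-empty result is the trigger for a fresh approval pass. An empty one means the delta exists but stays below the threshold.&lt;/p&gt;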

&lt;p&gt;I tried this exercise last month with the Anthropic Opus 4.7 200k context window release. The finding surprised me: capping context at 200k produced better spec quality on agent reviews than running uncapped. Context window length is a quality lever, not just a capacity dial. The delta is not always "more is more."&lt;/p&gt;

&lt;h2&gt;
  
  
  Signal 3: A2A protocol updates as a multi-agent perimeter change
&lt;/h2&gt;

&lt;p&gt;Google has been moving toward agent interoperability standards for two quarters. Any A2A protocol announcement at I/O reshapes the question of which authorized agents are allowed to talk to which other agents.&lt;/p&gt;

&lt;p&gt;This is the question most engineering teams treat as an edge case until production forces it. Agents are still mostly designed as endpoints with their own auth. The graph view - what agent A is permitted to call agent B for, under what conditions, with what data - is rarely written down before a multi-agent stack hits real users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engineering action at I/O:&lt;/strong&gt; If an A2A spec or protocol lands, write the team's permission policy for agent-to-agent communication on the same day. One page. It does not have to be production-ready. It has to exist before the protocol shows up in the SDK you already use.&lt;/p&gt;
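&lt;p&gt;The one-pager can start as an edge list. A sketch of the graph view, assuming agents are identified by name and each edge lists allowed purposes and data kinds; the agent names and the &lt;code&gt;may_call&lt;/code&gt; helper are invented, and none of this is tied to a specific A2A spec.&lt;/p&gt;

```python
# Each key is a (caller, callee) edge; anything not listed is denied.
A2A_POLICY = {
    ("status-agent", "jira-agent"): {
        "purposes": {"read_tickets"},
        "data": {"ticket_metadata"},
    },
}


def may_call(policy: dict, caller: str, callee: str, purpose: str, data_kinds: list) -> bool:
    """Allow a call only along a listed edge, for a listed purpose and data set."""
    edge = policy.get((caller, callee))
    if edge is None:
        return False
    return purpose in edge["purposes"] and set(data_kinds).issubset(edge["data"])
```

&lt;p&gt;Edges are directional on purpose: permission for A to call B says nothing about B calling A.&lt;/p&gt;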

&lt;h2&gt;
  
  
  Signal 4: Vendor market position drift
&lt;/h2&gt;

&lt;p&gt;Axios reported on May 13 that Anthropic has overtaken OpenAI in workplace adoption. Take a moment with that. If your team's vendor selection rationale was written when OpenAI was the dominant workplace AI, your approval rationale references a market position that is no longer current.&lt;/p&gt;

&lt;p&gt;Most approval artifacts have no field for "market position when approved" and no scheduled review trigger for "market position has shifted since." The approval ages silently. Two years from now somebody will ask why the team is on the second-place vendor and the honest answer will be "nobody asked us to update the rationale when the market moved."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engineering action at I/O:&lt;/strong&gt; Add one field to your vendor approval template - &lt;em&gt;Market position when approved.&lt;/em&gt; The field is the trigger. The next position shift now has a review path baked in. This costs five minutes and pays for itself the first time a budget review asks the question.&lt;/p&gt;

&lt;h2&gt;
  
  
  Signal 5: Vendor product retirement as an agent migration event
&lt;/h2&gt;

&lt;p&gt;Amazon announced this week it is retiring Rufus and replacing it with the Alexa shopping agent. This is the rehearsal for the next ten retirements. Every enterprise AI vendor will eventually retire the chatbot you authorized and ship an agent successor with a different operating envelope.&lt;/p&gt;

&lt;p&gt;Most engineering teams treat the successor as a drop-in. It is not. The operating envelope is different. The data flow is different. The failure mode is different. If your team authorized the predecessor, the successor needs a fresh review - not a carryover.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engineering action at I/O:&lt;/strong&gt; Watch session notes for "deprecated", "retired", "replaced by" language. Every retirement of a product your team authorized triggers a fresh review pass on the successor. Block the calendar before the announcement, not after.&lt;/p&gt;

&lt;h2&gt;
  
  
  The reusable shape
&lt;/h2&gt;

&lt;p&gt;The five signals are not the point. The point is that writing them down on Friday changes how the keynote reads on Tuesday. The keynote stops being news and becomes a scheduled review trigger with a defined output - an updated stack policy.&lt;/p&gt;

&lt;p&gt;The same five categories apply to AWS re:Invent, Microsoft Build, OpenAI DevDay. OS-layer delivery. Model capability delta. A2A perimeter. Vendor position drift. Product retirement. Five rows. One engineering action per row. Re-read four times a year.&lt;/p&gt;

&lt;p&gt;The thirty thousand Claude-certified consultants PwC just announced make the point sharper, not weaker. Every keynote lands against a workforce that already has the tooling. The watchlist is not optional anymore.&lt;/p&gt;

&lt;h2&gt;
  
  
  Question
&lt;/h2&gt;

&lt;p&gt;If you maintain a watchlist for major vendor keynotes - or have ever wished you had one - what signal should I add as the sixth row? I want to compile what you all post into a single follow-up before the keynote.&lt;/p&gt;


</description>
      <category>ai</category>
      <category>googleio</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Graded My Agent Deployment Doc Against LangChain Interrupt - Here Are the 5 Gaps</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Wed, 13 May 2026 07:34:10 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-graded-my-agent-deployment-doc-against-langchain-interrupt-here-are-the-5-gaps-572g</link>
      <guid>https://dev.to/itskondrat/i-graded-my-agent-deployment-doc-against-langchain-interrupt-here-are-the-5-gaps-572g</guid>
<description>&lt;p&gt;I do a lot of deployment authorization docs. That's the PM version of what an SRE would call a launch checklist. It lists the agent, the scopes, the secrets it touches, the kill switch, the cost ceiling, the rollback path.&lt;/p&gt;

&lt;p&gt;For the last two years, the audience for that doc has been exactly one team: ours. Security signs it. Compliance reviews it. Engineering builds against it. Nobody outside the company ever reads a line.&lt;/p&gt;

&lt;p&gt;Today I pulled up the LangChain Interrupt Day 1 schedule and that audience doubled.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing that changed at 9:30 PT today
&lt;/h2&gt;

&lt;p&gt;Harrison Chase keynoted Interrupt at 9:30 Pacific. The headline was tame on paper: a synthesis of what 1,000+ teams shipped in production over the past 12 months. The substance was less tame. Clay, Rippling, Workday, plus the long tail of teams running smaller agent fleets, surfaced concrete production patterns. The talk wasn't aspirational. It read like a postmortem of an entire industry's first serious year of agent deployment.&lt;/p&gt;

&lt;p&gt;That synthesis is now in public. Same week, SAP Sapphire closed with 200+ agents under a single stated design rule (governance first), with Claude as the reasoning layer and NVIDIA OpenShell as the execution wrapper. Two completely different sources. Same structural artifact: a public reference for what production-ready agent deployment looks like.&lt;/p&gt;

&lt;p&gt;My internal doc has been graded against one rubric. As of today, it gets a second one whether I asked for it or not.&lt;/p&gt;

&lt;h2&gt;
  
  
  I sat down and graded mine
&lt;/h2&gt;

&lt;p&gt;I gave myself 45 minutes. Three columns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pattern named in the public production literature this week&lt;/li&gt;
&lt;li&gt;Our position: adopted, diverged, or gap&lt;/li&gt;
&lt;li&gt;One sentence on why&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I expected to find maybe 1 gap, possibly 2. I found 5.&lt;/p&gt;

&lt;p&gt;Here they are, with what I'm doing about each one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gap 1: per-action cost ceiling, not per-month budget
&lt;/h2&gt;

&lt;p&gt;Our cost guardrail was a monthly budget per agent. Easy to set, easy to forget, and the alert fires about a week after the damage is done.&lt;/p&gt;

&lt;p&gt;The production pattern that kept surfacing in the public synthesis is a per-action ceiling with auto-pause. If a single tool invocation projects to cost more than $N (or pushes spend past $M over a rolling 60 seconds), the agent stops and pings a human.&lt;/p&gt;

&lt;p&gt;Our fix is small: a wrapper around the LLM client that estimates token cost in advance, compares against a per-action ceiling defined in the agent's config, and routes to a pause queue when over. About 40 lines. The harder part was deciding the ceiling, which is now a PM call, not an SRE call.&lt;/p&gt;
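
&lt;p&gt;Here is a minimal sketch of the shape of that wrapper, not our production code. The &lt;code&gt;CostGuard&lt;/code&gt; name, the flat per-1K-token price, the four-characters-per-token estimate, and the in-memory pause queue are all assumptions made for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CostGuard:
    """Wraps an LLM call with a per-action cost ceiling (hypothetical sketch)."""
    call_llm: Callable[[str], str]      # the real client call, injected by the caller
    ceiling_usd: float                  # per-action ceiling read from the agent's config
    usd_per_1k_tokens: float = 0.01     # assumed flat price, not a real vendor rate
    pause_queue: List[str] = field(default_factory=list)

    def estimate_cost(self, prompt: str, max_output_tokens: int) -> float:
        # Rough heuristic: about 4 characters per token. An assumption, not a tokenizer.
        prompt_tokens = len(prompt) / 4
        return (prompt_tokens + max_output_tokens) / 1000 * self.usd_per_1k_tokens

    def invoke(self, prompt: str, max_output_tokens: int = 1024) -> Optional[str]:
        projected = self.estimate_cost(prompt, max_output_tokens)
        if projected > self.ceiling_usd:
            # Over the ceiling: park the request for a human instead of calling the model.
            self.pause_queue.append(prompt)
            return None
        return self.call_llm(prompt)
```

&lt;p&gt;A request that returns &lt;code&gt;None&lt;/code&gt; was paused, not failed, and is waiting in &lt;code&gt;pause_queue&lt;/code&gt; for a human. The rolling 60-second variant would need a timestamped spend log on top of this, which the sketch leaves out.&lt;/p&gt;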

&lt;h2&gt;
  
  
  Gap 2: scoped credentials per agent, not per agent family
&lt;/h2&gt;

&lt;p&gt;We had one service account per agent family (e.g., "research agents", "support agents"). Audit logs showed activity by family, not by individual agent. Fine for accounting. Bad for blast-radius reasoning.&lt;/p&gt;

&lt;p&gt;The production pattern is one credential per logical agent instance, with the scope narrowed to the specific tables, endpoints, or namespaces that agent legitimately touches. If a single agent goes off, you revoke one credential without taking down the family.&lt;/p&gt;

&lt;p&gt;This is a one-day migration in our system because the agent identity already exists in our config. We just weren't projecting it down into the credential layer.&lt;/p&gt;
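
&lt;p&gt;To make "projecting identity down into the credential layer" concrete, here is a rough sketch. The &lt;code&gt;CredentialRegistry&lt;/code&gt; class and the scope strings are hypothetical. A real system would sit on top of your secrets manager or IAM, which this in-memory version only stands in for.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class AgentCredential:
    agent_id: str
    scopes: FrozenSet[str]   # tables, endpoints, or namespaces this agent may touch


class CredentialRegistry:
    """One credential per logical agent, revocable individually (hypothetical)."""

    def __init__(self) -> None:
        self._active: Dict[str, AgentCredential] = {}

    def issue(self, agent_id: str, scopes: FrozenSet[str]) -> AgentCredential:
        cred = AgentCredential(agent_id, scopes)
        self._active[agent_id] = cred
        return cred

    def revoke(self, agent_id: str) -> None:
        # Revoking one agent leaves the rest of its family untouched.
        self._active.pop(agent_id, None)

    def is_allowed(self, agent_id: str, scope: str) -> bool:
        cred = self._active.get(agent_id)
        return cred is not None and scope in cred.scopes
```

&lt;p&gt;Because the key is the agent id rather than the family name, the audit log and the revocation path both end up at the granularity you reason about blast radius in.&lt;/p&gt;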

&lt;h2&gt;
  
  
  Gap 3: the production review document does not yet exist
&lt;/h2&gt;

&lt;p&gt;This one's about the artifact, not the runtime. Our internal review covers our policy: does this satisfy our compliance posture? It does not cover the production floor reading: where do we sit relative to what 1,000+ teams already found works?&lt;/p&gt;

&lt;p&gt;I'm adding a new section to every deployment doc going forward. Three subheads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adopted: patterns we took straight from the public practitioner record&lt;/li&gt;
&lt;li&gt;Diverged: patterns we considered and chose against, with the reason&lt;/li&gt;
&lt;li&gt;Gaps: patterns we don't have an answer for yet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "Gaps" subhead is the high-leverage one. Gaps documented in your own voice are gaps you control the conversation about. Gaps surfaced by a stakeholder in a meeting are not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gap 4: agent-writes-its-own-retries is a flagged pattern
&lt;/h2&gt;

&lt;p&gt;We let one agent class write its own retry policy when a tool call failed. It's an obvious productivity win until the agent invents a retry pattern that compounds against a rate-limited downstream service.&lt;/p&gt;

&lt;p&gt;The published practitioner consensus this week was clear: retries belong in the agent harness, not in the agent's reasoning loop. The agent should not be the entity deciding when to try again.&lt;/p&gt;

&lt;p&gt;Our fix is to replace the self-retry behavior with a queued reissue: the harness owns the policy, the agent owns the request. About a day's work, including writing the test cases. Most of the time was migrating the existing retry-policy state out of the prompt and into config.&lt;/p&gt;
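
&lt;p&gt;A stripped-down sketch of "the harness owns the policy, the agent owns the request". It uses plain bounded exponential backoff rather than our queued reissue, and the &lt;code&gt;RetryPolicy&lt;/code&gt; and &lt;code&gt;run_tool_call&lt;/code&gt; names are made up for the example.&lt;/p&gt;

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RetryPolicy:
    """Lives in config and is owned by the harness. The agent never sees or edits it."""
    max_attempts: int = 3
    base_delay_s: float = 1.0   # doubled after each failed attempt


def run_tool_call(
    tool: Callable[[], str],
    policy: RetryPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Execute one agent-issued request under the harness's retry policy."""
    delay = policy.base_delay_s
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return tool()
        except Exception:
            if attempt == policy.max_attempts:
                raise  # budget exhausted: surface the failure instead of looping
            sleep(delay)
            delay *= 2
    raise RuntimeError("unreachable when max_attempts is at least 1")
```

&lt;p&gt;The agent only ever hands the harness a callable. How many times it runs, and how long the waits are, is decided by config that the prompt cannot rewrite.&lt;/p&gt;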

&lt;h2&gt;
  
  
  Gap 5: blast radius is named in our spec, but not in our review template
&lt;/h2&gt;

&lt;p&gt;Anthropic's Deputy CISO ran a webinar Monday framing agent governance around "specific actions, scopes, and blast radii." That phrase is going to be the lingua franca of agent risk for at least the next 12 months.&lt;/p&gt;

&lt;p&gt;We use blast radius informally in our deployment specs. We do not have a column for it in our review template. So the conversation we want to have (what's the worst this agent can do before someone catches it?) sometimes doesn't happen because the document doesn't prompt it.&lt;/p&gt;

&lt;p&gt;The fix is a column. Each agent's row gets: "Blast radius at maximum scope, if every guardrail fails." One sentence. The act of writing the sentence is the audit.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm shipping by Friday
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The per-action cost wrapper, with one ceiling per agent class, tunable in config.&lt;/li&gt;
&lt;li&gt;One credential per logical agent. Audit log change merges with it.&lt;/li&gt;
&lt;li&gt;A new section in the deployment doc template: Production-Floor Reading.&lt;/li&gt;
&lt;li&gt;The retry policy migrated from the prompt into the harness.&lt;/li&gt;
&lt;li&gt;A blast radius column on the review template, populated retroactively for every active agent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these are large. The total work is something like 3-4 engineer-days across the team. The reason I wasn't doing them last week is not that they were hard. It's that I didn't have the second reader yet. The internal reader was satisfied. Without the production floor, none of these gaps surfaced as gaps.&lt;/p&gt;

&lt;p&gt;That's the part worth saying out loud. The second reader is what made the gaps visible. The work was always small. The doc was the bottleneck, and the doc didn't know it was the bottleneck.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd ask if you ship agents
&lt;/h2&gt;

&lt;p&gt;If you ran the same 45-minute exercise on your deployment doc this afternoon, which gap would you find first - cost ceiling, credential scope, retries, blast radius, or the review template itself?&lt;/p&gt;

&lt;p&gt;I'm collecting answers through Friday. The Day 2 Interrupt content will sharpen the production-floor reading, and I'd rather refine the template against five teams' gap rows than against my own.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>agents</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Read a Survey That Predicted My Job's Next 2 Years - Here's What It Got Right and Missed</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Sat, 09 May 2026 07:34:36 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-read-a-survey-that-predicted-my-jobs-next-2-years-heres-what-it-got-right-and-missed-14ea</link>
      <guid>https://dev.to/itskondrat/i-read-a-survey-that-predicted-my-jobs-next-2-years-heres-what-it-got-right-and-missed-14ea</guid>
<description>&lt;p&gt;KPMG just dropped a number on people in my seat. They surveyed 306 Canadian executives. 39% of them expect AI agents to be leading project management for their teams within 2-3 years. 66% are already moving to a fully integrated AI-human workforce. First time the role-redefinition forecast is in survey data, not in an opinion column.&lt;/p&gt;

&lt;p&gt;I run a PM workflow with an agent fleet doing most of the drafting and a lot of the review. So when an executive survey predicts the next two years of my job, I read it as primary source material on what the people who sign my budget are planning to assume.&lt;/p&gt;

&lt;p&gt;Two things stood out.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the executives got right
&lt;/h2&gt;

&lt;p&gt;The direction is correct. The role really is shifting toward direction-and-review instead of artifact authorship. My morning two years ago was inbox plus drafting the day's first brief. My morning today is fleet status, then choosing which of last night's drafts is shippable, which needs another pass, and which got the wrong scope baked in and needs to be killed before it gets routed.&lt;/p&gt;

&lt;p&gt;The horizon is also realistic, depending on where you're starting from. If your team has not yet stood up an agent stack alongside engineering work, 24-36 months to "agents leading PM" is plausible. There is a real procurement, instruction-tuning, governance-design, trust-building cycle to go through. None of it is fast on the first lap.&lt;/p&gt;

&lt;p&gt;The integrated-workforce framing is the part the dev side will recognize fastest. The pattern is the same one engineering already lives: a PR queue where some commits are human-authored, some are agent-authored, and the human decision surface is mostly review and override. The PM equivalent is here. It looks like a doc queue, a roadmap delta queue, a sprint-scope queue. Same shape, different artifacts.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the survey didn't ask
&lt;/h2&gt;

&lt;p&gt;Executive surveys ask about role-level shifts. They don't ask the day-level question, which is the one engineers and PMs both actually live in.&lt;/p&gt;

&lt;p&gt;The day-level question is: what does the morning look like, what's in the queue, what runs without you, what blocks on a human call, where does the dev-PM interface change shape because the PM is mostly directing instead of authoring?&lt;/p&gt;

&lt;p&gt;For the dev side, the change that matters is on the spec-to-ship loop. Specifically, the spec side gets shorter and the review side gets longer. The PM is still naming what to build, but the artifact that lands in your repo as the brief or the scoped doc is increasingly drafted by an agent the PM directed and reviewed. The conversation about the spec moves from "let me write this up and send it Tuesday" to "the agent drafted three variants overnight, here's the one I'd ship, push back if anything looks off." Faster on the spec side. Slower on the review side, because the dev now has to verify that the directed-and-reviewed spec is still coherent before committing to it.&lt;/p&gt;

&lt;p&gt;The survey doesn't measure that loop. It measures the hiring intent and the workforce category. Both useful, neither operational.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 30-day diff
&lt;/h2&gt;

&lt;p&gt;Here's a move that probably translates regardless of role.&lt;/p&gt;

&lt;p&gt;Pull up your current todo list. Write down three items something automated is already doing or could plausibly be doing if you set it up. Write down three items only you in your seat can do. Then pull up your todo list from a month ago. Run the same split. How many items moved from "only you" to "automated or could-be"? Even one is a real signal. Three is a trend.&lt;/p&gt;

&lt;p&gt;I started doing this around the time I noticed the agent had drafted a brief I'd planned to write. The diff that month was small. Six months in, it was not.&lt;/p&gt;

&lt;p&gt;The KPMG number is a 24-month forecast. The 30-day diff is the short-horizon evidence the survey didn't ask for. The forecast is in their hands. The diff is in yours.&lt;/p&gt;

&lt;h2&gt;
  
  
  The floor, not the ceiling
&lt;/h2&gt;

&lt;p&gt;If you've been running this for two years already, the 39% expecting "agents leading PM" in 24-36 months is the floor of what's coming. The practitioner who started seriously in 2024 is already past where executives expect they'll be in 2028. The interesting question is not "will it happen." It's "what does floor + 1 look like, and who's already there."&lt;/p&gt;

&lt;p&gt;The dev side has been at floor + 1 for a while in a few places. The PM side is catching up.&lt;/p&gt;

&lt;p&gt;What does the loop look like on your team?&lt;/p&gt;

</description>
      <category>career</category>
      <category>ai</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Read Boris Cherny's 30-Day Claude Code Stat. Here's What Most Takes Get Wrong.</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Wed, 06 May 2026 07:44:58 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-read-boris-chernys-30-day-claude-code-stat-heres-what-most-takes-get-wrong-5900</link>
      <guid>https://dev.to/itskondrat/i-read-boris-chernys-30-day-claude-code-stat-heres-what-most-takes-get-wrong-5900</guid>
      <description>&lt;p&gt;Boris Cherny, Head of Claude Code at Anthropic, posted a stat on X this morning. In the last 30 days, 100% of his contributions to Claude Code were written by Claude Code itself. 259 PRs. 497 commits. 40,000 lines added. 38,000 removed. Zero by the head of the team.&lt;/p&gt;

&lt;p&gt;Most reads of this go straight to "AI-assisted dev productivity is real." That's the obvious layer. It's not the interesting one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Engineering Leadership Read
&lt;/h2&gt;

&lt;p&gt;If you run a team in 2026 - staff+ engineer or EM or director - the layer that matters is the second one. The head of a product team is no longer the team's most prolific code author. The head of the team is also no longer the team's most prolific reviewer; Claude Code does the first pass on its own PRs.&lt;/p&gt;

&lt;p&gt;The work the head of the team is doing all day, then, is not the artifact. The artifact (the spec, the PR description) is a thing AI ships now. The calls inside the artifact - what to build, what to kill - those are the work.&lt;/p&gt;

&lt;p&gt;This is the part most senior engineers haven't named in their own job yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Same Shift, Different Clothing
&lt;/h2&gt;

&lt;p&gt;Same week, ServiceNow shipped a literal AI agent kill switch at Knowledge 2026. The demo: a prompt-injection attack hits a pricing agent. The system maps blast radius. A kill switch surfaces and asks a human to pull it.&lt;/p&gt;

&lt;p&gt;Most coverage framed this as IT and security infrastructure. It is. It's also a leadership data point in product clothing. The vendor solved the product question - what does the kill switch do, how fast does it cut. The vendor cannot solve the leadership question - who decides when to pull it, and against what threshold.&lt;/p&gt;

&lt;p&gt;Same shape as Boris's stat. The artifact (the kill switch feature) ships from a vendor. The call (when to use it) stays with the senior person on the team.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Deciding Work Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;It doesn't show up in a commit log. It doesn't have a template.&lt;/p&gt;

&lt;p&gt;It's the moment in a Slack thread where someone asks "should we ship this?" and the senior person answers in two sentences with three reasons. It's the call to ship the rollback or the forward fix when the AI flagged the regression. It's the human pass on AI-written code that asks "but does this match the product intent?" and decides yes or no.&lt;/p&gt;

&lt;p&gt;None of those moments produce an artifact. All of them are the work that compounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Measurement Systems Are Blind to It
&lt;/h2&gt;

&lt;p&gt;Performance reviews and promo packets were built when the artifact was the work. They reward what got shipped. The work that got decided leaves no trace, so the system can't see it.&lt;/p&gt;

&lt;p&gt;The senior engineer or EM measured by code volume or design-doc count is being measured against a 2023 work product. The senior engineer measured by decision quality - what got built, what got killed - is being measured against the 2026 one.&lt;/p&gt;

&lt;p&gt;If your performance review still asks for ship counts and never asks about your call log, the system hasn't caught up to your job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Moves If This Lands
&lt;/h2&gt;

&lt;p&gt;The fix starts small.&lt;/p&gt;

&lt;p&gt;First, start a private call log this week. One line per call. What was decided. What the alternative was. The first week feels like nothing. By month two it's the artifact your performance review was missing - a record of the work that doesn't show up anywhere else.&lt;/p&gt;

&lt;p&gt;Second, lead with the calls in your next promo conversation or career check-in. "I shipped X" is 2023 language. "I decided X over Y because Z, and the outcome was W" is 2026 language. The shape of evidence changes when the work changes.&lt;/p&gt;

&lt;p&gt;Third, find the leader on your team whose daily work is already 80% calls. Watch how they spend their day. That's the role shape you're growing into - and it's quieter than you'd think.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Career Arc Nobody Named
&lt;/h2&gt;

&lt;p&gt;The path from senior IC to staff to manager to director to VP has always meant less authorship and more direction at every rung. What's new in 2026 is the compression. AI takes over artifact production at every level. So the curve from authorship to direction now starts at IC, well below where it used to.&lt;/p&gt;

&lt;p&gt;The senior leader who is still measured by what they shipped is being measured by a metric the system inherited. The senior leader who is being measured by what they decided is being measured by what the work actually is.&lt;/p&gt;

&lt;p&gt;Boris Cherny just gave us the cleanest data point of the year for that shift. The Head of Claude Code stopped writing code, and the team kept shipping. That isn't a productivity story. It's a leadership story, and the system that measures the head of the team hasn't caught up to it yet.&lt;/p&gt;

&lt;p&gt;What was your highest-leverage call this week, and is it visible in any system that measures your work?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>career</category>
      <category>productivity</category>
      <category>leadership</category>
    </item>
    <item>
      <title>I Pulled 3 Months of Engineering Metrics on Our AI Tools - Here's the Dashboard Cell Nobody Built</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Wed, 29 Apr 2026 06:56:11 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-pulled-3-months-of-engineering-metrics-on-our-ai-tools-heres-the-dashboard-cell-nobody-built-1gk2</link>
      <guid>https://dev.to/itskondrat/i-pulled-3-months-of-engineering-metrics-on-our-ai-tools-heres-the-dashboard-cell-nobody-built-1gk2</guid>
<description>&lt;p&gt;The CFO asked engineering what our AI tools had actually delivered. Engineering pointed at the PM retro. The PM retro had a row that said "team velocity feels higher" and a row that said "developers report subjective time savings." That was the data.&lt;/p&gt;

&lt;p&gt;Meanwhile a fresh enterprise survey out of ExcelMindCyber says 73% of companies will fail to deliver promised ROI on AI investments this year. I read that and thought: of course. The dashboard for the question doesn't exist.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Repo Already Knows
&lt;/h2&gt;

&lt;p&gt;Pull request throughput. Time-to-first-review. Cycle time from open to merge. Build duration. CI flake rate. Incident count. Mean time to recovery. PR size distribution.&lt;/p&gt;

&lt;p&gt;We ship all of these. Most teams stream them into a Grafana board or a Linear analytics view or a custom dbt model on top of GitHub events. The data is in the repo. The data is in the CI logs. The data is in the deploy pipeline.&lt;/p&gt;

&lt;p&gt;What we don't ship is the cell that says "for the workflow we adopted Tool X for, what changed."&lt;/p&gt;

&lt;p&gt;That sounds trivial. It is not.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cell That Doesn't Exist
&lt;/h2&gt;

&lt;p&gt;To answer the cell honestly you need three things in one row:&lt;/p&gt;

&lt;p&gt;The workflow boundary. Not the tool boundary. "PR review" is a workflow. "Tool X" is a tool. The same tool can land in three workflows and change two of them. You need the join key to be the workflow.&lt;/p&gt;

&lt;p&gt;The before-window. A baseline of the metric for that workflow before the tool landed. Not the team-wide cycle time. The cycle time on the specific class of work the tool was supposed to change.&lt;/p&gt;

&lt;p&gt;The behavior signal. Did engineers actually use the tool inside the workflow, or did they sign up, click around once, and route around it? We have user-event telemetry for our own product. We rarely have it for the AI tool we just bought.&lt;/p&gt;

&lt;p&gt;Without those three columns, the dashboard answers a different question. It answers "did we deploy the tool," not "did the workflow change."&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Tried First (and Why It Failed)
&lt;/h2&gt;

&lt;p&gt;The first version of the cell I built was a simple compare. Cycle time on PRs in February versus cycle time on PRs in April. Tool landed mid-March.&lt;/p&gt;

&lt;p&gt;Numbers looked good. Cycle time was down 14%. I almost shipped it.&lt;/p&gt;

&lt;p&gt;Then I segmented by PR class. Refactor PRs were down 22%. Bug-fix PRs were flat. Feature PRs were up 4%. The aggregate hid three completely different stories.&lt;/p&gt;

&lt;p&gt;Then I looked at tool usage. Half the team had opened the tool fewer than three times in 30 days. The 14% improvement was carried by four developers. The rest of the team was running the same workflow without the tool and getting roughly the same numbers.&lt;/p&gt;

&lt;p&gt;The honest answer to the CFO question wasn't "the tool drove a 14% improvement." It was "four developers got real value, the rest haven't adopted it yet, and we don't have the playbook for the rest."&lt;/p&gt;
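
&lt;p&gt;The segmentation step is small enough to sketch. This is an illustration of the idea, not the query I ran. The row shape and the &lt;code&gt;change_by_class&lt;/code&gt; helper are assumptions, and it expects every PR class to have rows in both the "before" and "after" windows.&lt;/p&gt;

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# (pr_class, window, cycle_time_hours), where window is "before" or "after".
Row = Tuple[str, str, float]


def pct_change(before: List[float], after: List[float]) -> float:
    """Percent change in mean cycle time from the before-window to the after-window."""
    return (mean(after) - mean(before)) / mean(before) * 100


def change_by_class(rows: List[Row]) -> Dict[str, float]:
    """Return the aggregate change under 'all', plus one entry per PR class."""
    grouped: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for pr_class, window, hours in rows:
        grouped["all"][window].append(hours)
        grouped[pr_class][window].append(hours)
    return {
        name: pct_change(windows["before"], windows["after"])
        for name, windows in grouped.items()
    }
```

&lt;p&gt;Putting the "all" entry next to the per-class entries is the whole point: one aggregate number can sit on top of a class that improved, a class that stayed flat, and a class that got worse.&lt;/p&gt;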

&lt;p&gt;If I had shipped the v1 number, the next quarter's budget cycle would have used it as proof. Then we would have spent more on the same shape of tool, and gotten a smaller delta, because the developers who would benefit had already adopted it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Engineering Owns Here
&lt;/h2&gt;

&lt;p&gt;I keep hearing the framing that "this is a PM problem." It isn't, or rather, it isn't only.&lt;/p&gt;

&lt;p&gt;The PM retro happens after the quarter. The dashboard cell happens continuously. If engineering owns the metrics that say "this tool changed this workflow for these developers by this much," the PM gets a starting point that isn't fiction. If engineering owns nothing, the PM writes the retro on vibes and the CFO funds the next round on vibes.&lt;/p&gt;

&lt;p&gt;The tools we already use give us most of what we need. GitHub events. Linear events. Tool-specific webhooks where they exist. A small dbt model that defines workflow boundaries explicitly. A heartbeat metric on tool usage at the user level.&lt;/p&gt;

&lt;p&gt;The piece nobody is building is the join. Workflow x tool x usage x outcome. Four columns. Most teams have one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Smallest Version Worth Shipping
&lt;/h2&gt;

&lt;p&gt;A single materialized view. Per workflow, per AI tool, per developer:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
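&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- assumes both event tables carry developer_id, workflow_id, event_id, event_type, event_at&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tool_usage&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;developer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;workflow_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;date_trunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'week'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tool_uses&lt;/span&gt;
  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tool_events&lt;/span&gt;
  &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'tool_invocation'&lt;/span&gt;
  &lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;workflow_outcome&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;developer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;workflow_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;date_trunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'week'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cycle_time_minutes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cycle_time_avg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;completions&lt;/span&gt;
  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;workflow_events&lt;/span&gt;
  &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'workflow_completion'&lt;/span&gt;
  &lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="n"&gt;workflow_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;developer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;-- usage signal (left join keeps zero-use developers; tool_id is null on those rows)&lt;/span&gt;
  &lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_uses&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tool_uses&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;-- workflow outcome&lt;/span&gt;
  &lt;span class="n"&gt;cycle_time_avg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;completions&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;workflow_outcome&lt;/span&gt;
&lt;span class="k"&gt;left&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;tool_usage&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;developer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;workflow_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;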
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Seven columns. One join. Then a Grafana panel that shows cycle_time_avg split by tool_uses bucket (zero, low, high). The panel answers the question: "for the developers who actually use the tool, did the workflow get faster, and by how much."&lt;/p&gt;

&lt;p&gt;The first time I ran ours, the bucket comparison was the most honest 30 seconds of the quarter. It told me which tools had earned their seat and which were budget items pretending to be productivity gains.&lt;/p&gt;
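
&lt;p&gt;If you want to sanity-check the bucket split outside Grafana, a few lines of Python will do it. This is a sketch under assumptions: the threshold of 10 weekly uses for "high" is arbitrary, the function names are mine, and it takes an unweighted mean of per-developer weekly averages.&lt;/p&gt;

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def usage_bucket(tool_uses: int, high_threshold: int = 10) -> str:
    """Map weekly tool uses to zero / low / high. The default threshold is an assumption."""
    if tool_uses == 0:
        return "zero"
    return "high" if tool_uses >= high_threshold else "low"


def cycle_time_by_bucket(rows: Iterable[Tuple[int, float]]) -> Dict[str, float]:
    """rows are (tool_uses, cycle_time_avg) pairs, one per developer-week."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for tool_uses, cycle_time_avg in rows:
        buckets[usage_bucket(tool_uses)].append(cycle_time_avg)
    # Unweighted mean of the per-row averages within each bucket.
    return {name: mean(values) for name, values in buckets.items()}
```

&lt;p&gt;Feed it the rows from the view and compare the "zero" bucket against "high". If those two numbers are close, the tool has not changed the workflow yet, whatever the license count says.&lt;/p&gt;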

&lt;h2&gt;
  
  
  Honest Limit
&lt;/h2&gt;

&lt;p&gt;This dashboard cell does not answer whether the tool was worth the money. That requires a price tag, a discount rate, an opportunity-cost guess. That part is genuinely a CFO conversation.&lt;/p&gt;

&lt;p&gt;What the cell does answer is whether the workflow changed at all. Without that, the CFO conversation is fiction. With it, the conversation is at least a real conversation.&lt;/p&gt;

&lt;p&gt;What's the cell your team has built and your CFO doesn't know about yet?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>productivity</category>
      <category>career</category>
    </item>
  </channel>
</rss>
