<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mykola Kondratiuk</title>
    <description>The latest articles on DEV Community by Mykola Kondratiuk (@itskondrat).</description>
    <link>https://dev.to/itskondrat</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3753205%2Fa206f74a-98be-4c2b-abbd-f06ec964327b.jpg</url>
      <title>DEV Community: Mykola Kondratiuk</title>
      <link>https://dev.to/itskondrat</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/itskondrat"/>
    <language>en</language>
    <item>
      <title>I Read Intuit's 3,000-Job Layoff Memo - Here's the One Line Every AI Restructuring Memo Is Missing</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Thu, 21 May 2026 06:55:22 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-read-intuits-3000-job-layoff-memo-heres-the-one-line-every-ai-restructuring-memo-is-missing-igh</link>
      <guid>https://dev.to/itskondrat/i-read-intuits-3000-job-layoff-memo-heres-the-one-line-every-ai-restructuring-memo-is-missing-igh</guid>
      <description>

&lt;p&gt;On May 20, Intuit's CEO Sasan Goodarzi sent a memo announcing 3,000 job cuts, 17% of the company. Reason: "reduce complexity to focus on AI." I read it twice looking for one line. It wasn't there.&lt;/p&gt;

&lt;p&gt;The same line has been missing from every AI workforce announcement I have read in the last two years. I want to name it, because the missing line is an engineering-accountability problem before it is a PM-leadership problem, and the people closest to the failure surface (you, reading this on Dev.to) are the ones who feel the gap first.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Memo Names Three Things. It Skips One.
&lt;/h2&gt;

&lt;p&gt;Every AI restructuring announcement in the last two years has the same shape.&lt;/p&gt;

&lt;p&gt;What got cut. Intuit: 3,000 jobs, 17% of workforce. Klarna 2024: customer support function. Duolingo 2025: a chunk of contractor work. IBM 2025: a large internal reorg. Always specific. Always quantified.&lt;/p&gt;

&lt;p&gt;What gets refocused. Intuit: "focus on AI." Klarna: AI-first customer service. Duolingo: AI-augmented learning. IBM: "augmenting HR with AI." Always a direction.&lt;/p&gt;

&lt;p&gt;What stays. Intuit named existing partnerships with OpenAI and Anthropic. Every memo names what stays. It is the reassurance paragraph.&lt;/p&gt;

&lt;p&gt;What's missing. A real human name attached to the failure path of any specific AI system the cut workers used to operate. &lt;em&gt;"When this agent gets it wrong, [name] answers."&lt;/em&gt; That line. Zero of the memos name it.&lt;/p&gt;

&lt;p&gt;It is the same shape. Cut, refocus, stay, no failure owner. Once you read four in a row, you stop reading four announcements and start reading one announcement repeated four times.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Got Swapped
&lt;/h2&gt;

&lt;p&gt;Underneath every one of these memos is a structural swap nobody is naming.&lt;/p&gt;

&lt;p&gt;The financial side moves cleanly. Payroll down. Software spend up. The controller closes the quarter. The general ledger has every entry it needs.&lt;/p&gt;

&lt;p&gt;The accountability side does not move at all. The worker who answered "I made that call, here's why" is gone. The AI that makes the call now has nobody attached to it.&lt;/p&gt;

&lt;p&gt;The headcount got swapped for AI capacity. The accountability the headcount carried did not get re-assigned. By default, accountability becomes nobody's job. Not by malice. Not by oversight. Just by the structural shape of a memo that names everything except the failure owner.&lt;/p&gt;

&lt;p&gt;This is the workforce-to-accountability swap. It is the unstaffed accountability sitting in every AI restructuring announcement, including the next one your company sends.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is An Engineering Problem First
&lt;/h2&gt;

&lt;p&gt;Most of the layoff coverage is framed at the executive layer - the CEO's memo, the press release, the analyst reaction. The frame I want here is the practitioner one.&lt;/p&gt;

&lt;p&gt;The agent runs in your repo. The pipeline that calls the agent is wired to production. The customer-facing surface the agent writes into is wired to actual humans on the other end. When the agent is wrong, the wrongness lands in code you maintain - a misrouted refund, a wrong tax calculation, a wrong customer email, a wrong policy interpretation surfaced through a chat UI you shipped.&lt;/p&gt;

&lt;p&gt;You are closest to the failure surface. You feel the missing line first. The PM who is supposed to author the line is downstream of the deployment by definition. By the time the line is missing in production, you have probably already noticed.&lt;/p&gt;

&lt;p&gt;That is why this is worth flagging for an engineer reader: the missing line is not a future PM artifact, it is a present-tense gap on workflows you are already running.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Sharper Edge - Voice Agents This Week
&lt;/h2&gt;

&lt;p&gt;Three different vendors shipped voice-capable AI agents in the last week. PollyReach added live voice on a CRM agent. AdaptiveAI shipped triggered outbound phone agents. Crescendo's Live CX agents take full inbound calls.&lt;/p&gt;

&lt;p&gt;These agents represent the company in a verbal conversation. The voice is the company's. The commitments the agent makes - "I'll refund that", "I'll escalate this", "I'll send you the contract by Friday" - bind the company in the customer's experience.&lt;/p&gt;

&lt;p&gt;No named human is on the call. The agent uptime is 99.98%. The customer is on the line. The named human at the company is not in the conversation, and after the conversation, is not in the transcript.&lt;/p&gt;

&lt;p&gt;99.98% uptime is the SRE side of this. Accountability is the other side. Reliability ≠ accountability. You can have a perfectly uptime-correct agent and still have nobody on the line when its output causes harm.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Engineering+PM Move
&lt;/h2&gt;

&lt;p&gt;The move is small enough to do this afternoon, on one workflow.&lt;/p&gt;

&lt;p&gt;Pick one agent or AI-replaced workflow your team runs. One. The one most likely to be wrong about something that matters.&lt;/p&gt;

&lt;p&gt;Add a short block to the agent's spec (AGENTS.md, prompt header, deployment doc - wherever the agent's contract lives):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Failure Owner&lt;/span&gt;
When this agent's output causes harm, [name] answers.
Signed: [name], [date].
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Commit it. PR it. Tag the named person on the PR.&lt;/p&gt;

&lt;p&gt;That is the inheritance move. It is the human-side complement to the version control, the test suite, and the SLA. The current default is three accountability artifacts on the technical side (git blame, test owners, on-call rotation) and zero on the AI output side. Adding the line moves that zero to one.&lt;/p&gt;
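&lt;p&gt;If you want the line to survive the next refactor, a CI check can refuse a spec where the block is missing or still holds the template placeholders. This is a minimal sketch under my own assumptions: the function name &lt;code&gt;failure_owner&lt;/code&gt;, the ISO date format, and the rule that both names must match are illustrative, not a standard.&lt;/p&gt;

```python
import re

# Assumed convention: the block from the article, with a real name and an ISO date.
PATTERN = re.compile(
    r"# Failure Owner\s*\n"
    r"When this agent's output causes harm, (.+?) answers\.\s*\n"
    r"Signed: (.+?), (\d{4}-\d{2}-\d{2})\."
)


def failure_owner(spec_text: str):
    """Return (name, date) if the spec names a failure owner, else None."""
    match = PATTERN.search(spec_text)
    if match is None:
        return None
    answers_name, signed_name, date = match.groups()
    # An unfilled placeholder such as "[name]" does not count as a signature,
    # and the person who answers must be the person who signed.
    if "[" in answers_name or answers_name != signed_name:
        return None
    return signed_name, date
```

&lt;p&gt;Run it against the spec file in the same pipeline that runs your tests, and fail the build when it returns nothing.&lt;/p&gt;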

&lt;p&gt;If you are the PM on the team and the engineering side is doing this before you do, you are watching a role-growth opportunity walk by. If you are the engineer on the team and you are already maintaining the agent's spec, the line is yours to write today, and the PM on your team will copy it onto the next four workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern Will Keep Repeating
&lt;/h2&gt;

&lt;p&gt;The next AI restructuring memo will come this quarter. Probably from a Fortune 500 SaaS company. Probably citing "complexity reduction" or "AI focus." Probably 1,000 to 10,000 jobs. Read it the way I read Intuit's. You will find the cut. You will find the refocus. You will find the stay. You will not find the line.&lt;/p&gt;

&lt;p&gt;The line has to be authored. Someone has to sign it. The earliest signer on any given workflow is the person closest to it - which, on most AI-augmented teams in 2026, is an engineer plus a PM, not a CEO writing a memo.&lt;/p&gt;

&lt;p&gt;Three thousand jobs at Intuit. Zero named humans in the announcement. The missing line is the artifact. The name on the line is the move.&lt;/p&gt;

&lt;p&gt;What's missing from the last AI announcement your company sent - the cut, the refocus, the stay, or the failure owner?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>career</category>
      <category>projectmanagement</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Read the Devenex Launch Yesterday - Here's the Policy File Your Agent Repo Is Still Missing</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Wed, 20 May 2026 07:30:41 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-read-the-devenex-launch-yesterday-heres-the-policy-file-your-agent-repo-is-still-missing-23j7</link>
      <guid>https://dev.to/itskondrat/i-read-the-devenex-launch-yesterday-heres-the-policy-file-your-agent-repo-is-still-missing-23j7</guid>
      <description>&lt;p&gt;I spent an hour reading the Devenex launch yesterday and the only sentence I keep coming back to is "execution control plane." That phrase is doing a lot of work.&lt;/p&gt;

&lt;p&gt;It says: enforcement is a product now. Every agent request gets policy-evaluated, identity-bound, recorded as evidence before anything runs.&lt;/p&gt;

&lt;p&gt;It does not say: the policy itself exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  Six products ship enforcement. Zero ship the policy.
&lt;/h2&gt;

&lt;p&gt;Look at what shipped this month. Devenex launched May 19 as the first execution control plane. Antigravity 2.0 hardened Git policies at Google I/O Day 2. Notion's External Agent API went GA with workspace-scoped guardrails. Claude has had tool-use limits since launch. OpenAI has function-call constraints. Salesforce Agentforce has action approvals.&lt;/p&gt;

&lt;p&gt;Six products. Different vendors. Different layers. All shipping enforcement.&lt;/p&gt;

&lt;p&gt;The artifact they all need to enforce against is the same shape. None of them ship it. That artifact is your problem, and it lives in your repo, not theirs.&lt;/p&gt;

&lt;p&gt;I started calling it the policy file.&lt;/p&gt;

&lt;h2&gt;
  
  
  What goes in the policy file
&lt;/h2&gt;

&lt;p&gt;Four sections. I've been writing it this way for a while; the launches this week made me realize it's the same shape across every enforcement product I read the docs for. The shape doesn't depend on the vendor.&lt;/p&gt;

&lt;h3&gt;
  
  
  Action classes
&lt;/h3&gt;

&lt;p&gt;The agent's universe of possible actions, broken into named classes: &lt;code&gt;read&lt;/code&gt;, &lt;code&gt;write&lt;/code&gt;, &lt;code&gt;send_external&lt;/code&gt;, &lt;code&gt;transact&lt;/code&gt;, &lt;code&gt;escalate&lt;/code&gt;, &lt;code&gt;spawn_subagent&lt;/code&gt;. Each class is a category the policy file attaches constraints to. The act of writing the list is the point. The default in every deployment doc I've seen is implicit: the agent can do anything inside its tool set. Naming classes is how you refuse that default.&lt;/p&gt;

&lt;p&gt;A sketch in YAML:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;action_classes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;read&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;crm.contacts&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;crm.opportunities&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;write&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;targets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;crm.opportunities.notes&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;send_external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;channels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;email&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;slack-dm&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;transact&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;instruments&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;stripe.refund&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's not a real schema. It's the shape your real schema settles into after the third review.&lt;/p&gt;

&lt;h3&gt;
  
  
  Blast radius caps
&lt;/h3&gt;

&lt;p&gt;A number per class. Not a vague guardrail, a number the enforcement layer can compare against at request time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;caps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;write.records_per_run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
  &lt;span class="na"&gt;send_external.recipients_per_session&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;transact.usd_per_run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;500&lt;/span&gt;
  &lt;span class="na"&gt;spawn_subagent.depth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The contrast: the deployment doc says "the agent has access to the CRM." The policy file says "the agent's write class is capped at fifty records per run." One sentence Devenex can check. One sentence Antigravity can check. One sentence Claude tool-use can check.&lt;/p&gt;
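&lt;p&gt;To make "a number the enforcement layer can compare against" concrete, here is a minimal Python sketch of that comparison. The dictionary mirrors the caps above; the function name &lt;code&gt;check_cap&lt;/code&gt; and the default-deny rule for unlisted metrics are my assumptions, not any vendor's API.&lt;/p&gt;

```python
# Caps copied from the YAML sketch above, keyed as "class.metric".
CAPS = {
    "write.records_per_run": 50,
    "send_external.recipients_per_session": 10,
    "transact.usd_per_run": 500,
    "spawn_subagent.depth": 2,
}


def check_cap(metric: str, used_so_far: float, requested: float) -> bool:
    """Return True when the request keeps the running total within the cap."""
    if metric not in CAPS:
        # Default-deny (an assumption): no written cap means not allowed.
        return False
    return CAPS[metric] >= used_so_far + requested
```

&lt;p&gt;The check is deliberately boring. The hard part is the number, not the comparison.&lt;/p&gt;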

&lt;h3&gt;
  
  
  Escalation triggers
&lt;/h3&gt;

&lt;p&gt;The inverse half of the allowlist. When the agent hits a class not in its policy, or a cap it's about to exceed, what fires? Named human. Named channel. Named SLA.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;escalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;
    &lt;span class="na"&gt;trigger&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cap_exceeded&lt;/span&gt;
    &lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#agent-ops"&lt;/span&gt;
    &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@owner-of-record"&lt;/span&gt;
    &lt;span class="na"&gt;sla_hours&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;transact&lt;/span&gt;
    &lt;span class="na"&gt;trigger&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;any&lt;/span&gt;
    &lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#finance-approvals"&lt;/span&gt;
    &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@treasury-lead"&lt;/span&gt;
    &lt;span class="na"&gt;sla_hours&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The deployment doc has "agent owner" once on page one. The policy file has an escalation route per class.&lt;/p&gt;
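&lt;p&gt;A sketch of how an enforcement layer might resolve those rules at request time, in Python. The rules are the ones from the YAML above; the function name &lt;code&gt;find_escalation&lt;/code&gt; and reading &lt;code&gt;trigger: any&lt;/code&gt; as a wildcard are my assumptions.&lt;/p&gt;

```python
# Rules copied from the YAML sketch above.
ESCALATION = [
    {"class": "write", "trigger": "cap_exceeded", "route": "#agent-ops",
     "owner": "@owner-of-record", "sla_hours": 4},
    {"class": "transact", "trigger": "any", "route": "#finance-approvals",
     "owner": "@treasury-lead", "sla_hours": 1},
]


def find_escalation(action_class: str, event: str):
    """Return the first rule that matches the class and event, else None."""
    for rule in ESCALATION:
        # Assumption: "any" fires on every event for that class.
        if rule["class"] == action_class and rule["trigger"] in (event, "any"):
            return rule
    # None means nobody is named for this case, which is itself a policy gap.
    return None
```

&lt;p&gt;A &lt;code&gt;None&lt;/code&gt; result is worth logging loudly: it is the per-class version of the missing owner.&lt;/p&gt;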

&lt;h3&gt;
  
  
  Evidence schema
&lt;/h3&gt;

&lt;p&gt;What the agent has to log so a human can audit the run afterward. Structured output. The action class invoked. The tool calls. The identity the agent acted as. The policy version. The escalation path if any.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;evidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;required_fields&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;run_id&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;policy_version&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;action_class&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;tool_calls&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;acting_identity&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;escalation_record&lt;/span&gt;
  &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jsonl&lt;/span&gt;
  &lt;span class="na"&gt;retention_days&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;365&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without an evidence schema, you can't answer "did the agent follow the policy?" after the fact, which means the policy was unenforceable from the start.&lt;/p&gt;
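&lt;p&gt;Wiring the schema into the logging path can start as a check on each JSONL line. A minimal Python sketch, assuming one JSON object per line and that a run with no escalation still writes the &lt;code&gt;escalation_record&lt;/code&gt; key with a null value; the function name &lt;code&gt;missing_fields&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import json

# Field names copied from the evidence sketch above.
REQUIRED_FIELDS = [
    "run_id",
    "policy_version",
    "action_class",
    "tool_calls",
    "acting_identity",
    "escalation_record",
]


def missing_fields(jsonl_line: str) -> list:
    """Return the required fields absent from one JSONL evidence record."""
    record = json.loads(jsonl_line)
    return [name for name in REQUIRED_FIELDS if name not in record]
```

&lt;p&gt;An empty list means the record is auditable. Anything else is a run you cannot reconstruct later.&lt;/p&gt;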

&lt;h2&gt;
  
  
  A specific moment that made this concrete
&lt;/h2&gt;

&lt;p&gt;I was reading through a deployment doc for an agent recently. Clean prose. Listed the APIs. Listed the data sources. Useful agent.&lt;/p&gt;

&lt;p&gt;No section for what happens when it tries to write five thousand records. No section for what happens when it tries to send to two hundred recipients. No section for what happens when it transacts above a cap, because nobody had written the cap.&lt;/p&gt;

&lt;p&gt;The deployment doc wasn't wrong. It was answering the wrong question. It answered "what does the agent do?" The policy file answers "what is the agent allowed to do, and what fires if any of that breaks?"&lt;/p&gt;

&lt;p&gt;Different artifact. Different reviewer. Different file.&lt;/p&gt;

&lt;h2&gt;
  
  
  The clean split: enforcement vs. authoring
&lt;/h2&gt;

&lt;p&gt;Devenex et al. ship enforcement. That half is done. The other half - authoring - isn't a product, and I don't think it can be one. Authoring is the codification of your team's actual judgment about what the agent should be allowed to do. That judgment is cross-functional: engineering knows the runtime, security knows the threat model, legal knows the constraint, finance knows the cap.&lt;/p&gt;

&lt;p&gt;It's not "PM lobs a doc over the wall." The PM convenes the call, drafts the file, opens the PR. Engineering reviews it the same way it reviews a Terraform plan. Security reviews it the same way it reviews IAM. The policy ships in the same PR as the agent.&lt;/p&gt;

&lt;p&gt;That's policy-as-code, the shape devs already know from infra. The new thing isn't the shape; it's the artifact existing for AI agents at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do this week if I were shipping an agent
&lt;/h2&gt;

&lt;p&gt;Open a &lt;code&gt;policy.yaml&lt;/code&gt; in the agent repo. Stub the four sections. Pin one number per class even if it's a wild guess. Wire the evidence schema into the agent's logging path. Put it in the same PR as the next prompt change.&lt;/p&gt;

&lt;p&gt;The enforcement layer your platform vendor ships is checking against something. If nobody wrote the something, the enforcement is checking against silence.&lt;/p&gt;

&lt;p&gt;What's the section your agent repo is missing first - blast radius caps, or the evidence schema?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>vibecoding</category>
      <category>security</category>
    </item>
    <item>
      <title>I Built a 5-Signal Vendor Watchlist for Google I/O 2026 - Here's What Each One Will Break in My Stack</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Fri, 15 May 2026 05:15:33 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-built-a-5-signal-vendor-watchlist-for-google-io-2026-heres-what-each-one-will-break-in-my-23bp</link>
      <guid>https://dev.to/itskondrat/i-built-a-5-signal-vendor-watchlist-for-google-io-2026-heres-what-each-one-will-break-in-my-23bp</guid>
      <description>

&lt;p&gt;Google I/O 2026 is four days out. If you ship anything that touches a Google model, the keynote is about to change capability assumptions your team never approved against. I have done this dance for six straight years of keynotes, and the pattern is always the same: an engineer pastes a Verge link into Slack on Wednesday, someone says "we should look at this", and three weeks later a capability we never reviewed quietly shipped inside a tool we already authorized.&lt;/p&gt;

&lt;p&gt;This year I tried something different. Friday morning I sat down with a coffee and wrote a five-line watchlist for the keynote - vendor signals plus one specific engineering action per signal. The exercise was small enough to feel silly. Then I realized the format is reusable for every keynote that follows. AWS re:Invent. Microsoft Build. OpenAI DevDay.&lt;/p&gt;

&lt;p&gt;Sharing it here because if you have not built one yet, the next four days are your peak window.&lt;/p&gt;

&lt;h2&gt;
  
  
  What vendor product observability actually means
&lt;/h2&gt;

&lt;p&gt;Most engineering teams I work with have great downstream observability. Uptime, error rates, model latency, queue depth. The ops team set it up. Dashboards exist.&lt;/p&gt;

&lt;p&gt;Ask the same teams what they observe about their vendors and the answer goes quiet. The signals that reshape what your stack can do - a keynote, a model release, a quietly retired chatbot, a market position shift - arrive through press coverage, not through a dashboard. There is no Datadog for vendor product strategy. The closest substitute is a small, named list of signals you commit to reading on the day of the keynote.&lt;/p&gt;

&lt;p&gt;That list is the watchlist. It is not glamorous. It is a 5-row markdown file. The discipline is that it exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  Signal 1: OS-layer ambient intelligence as a new delivery category
&lt;/h2&gt;

&lt;p&gt;The Android Show I/O Edition aired last week. Gemini Intelligence ships in June as the intelligence layer running below the Android app surface - booking, browsing, form completion - with access to email, calendar, messages. It rolls out as an OS update to any Android 12+ device on the AI Pro or Ultra tier.&lt;/p&gt;

&lt;p&gt;If your team builds on Android, this matters because most authorization patterns cover two delivery categories: agents your team deployed, and agents a vendor embedded in a tool your team authorized. OS-layer intelligence is a third category, and it does not arrive through your authorization pipeline. It arrives through the OS update channel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engineering action at I/O:&lt;/strong&gt; Pull the list of Android-deployed devices in your fleet that intersect "AI Pro or Ultra subscriber" and "Android 12+". That intersection is the scope of devices where the OS now has agent capabilities your data flow assumptions never accounted for.&lt;/p&gt;
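&lt;p&gt;The intersection is a two-condition filter once the inventory is in one place. A Python sketch, assuming your device management export can give you an Android major version and a subscription tier per device; the &lt;code&gt;Device&lt;/code&gt; fields and tier labels are hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    device_id: str
    android_major: int
    ai_tier: str  # hypothetical labels: "none", "pro", "ultra"


def in_scope(devices) -> list:
    """Return ids of devices on Android 12+ with an AI Pro or Ultra tier."""
    return [
        d.device_id
        for d in devices
        if d.android_major >= 12 and d.ai_tier in ("pro", "ultra")
    ]
```

&lt;p&gt;The output is the list to hand to whoever owns data flow review for mobile.&lt;/p&gt;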

&lt;h2&gt;
  
  
  Signal 2: Model capability delta as silent capability inheritance
&lt;/h2&gt;

&lt;p&gt;When a frontier model jumps capability, every downstream tool that runs on it inherits the new floor on keynote day. If you have authorized AI tools whose vendor swapped in a Google model, those tools now have capabilities you never approved against, and you may not even know which ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engineering action at I/O:&lt;/strong&gt; Make a one-column list of authorized AI tools your team uses that run on Google models. After the keynote, write the capability delta beside each one. The delta column is the trigger for re-review. If the delta crosses a sensitivity threshold (PII handling, code execution, autonomous file I/O), the tool needs a fresh approval pass.&lt;/p&gt;
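&lt;p&gt;The delta column turns into a re-review trigger with a few lines of Python. This is a sketch: the capability labels stand in for the three thresholds named above, and the function name &lt;code&gt;tools_needing_rereview&lt;/code&gt; is an assumption of mine.&lt;/p&gt;

```python
# Sensitivity thresholds from the paragraph above, as hypothetical labels.
SENSITIVE_CAPABILITIES = {"pii_handling", "code_execution", "autonomous_file_io"}


def tools_needing_rereview(delta_by_tool: dict) -> list:
    """Return tools whose capability delta touches a sensitive capability."""
    flagged = []
    for tool, delta in delta_by_tool.items():
        # Any overlap with the sensitive set triggers a fresh approval pass.
        if SENSITIVE_CAPABILITIES.intersection(delta):
            flagged.append(tool)
    return sorted(flagged)
```

&lt;p&gt;Fill &lt;code&gt;delta_by_tool&lt;/code&gt; by hand after the keynote. The script only makes sure nothing on the list gets skipped.&lt;/p&gt;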

&lt;p&gt;I tried this exercise last month with the Anthropic Opus 4.7 200k context window release. The finding surprised me: capping context at 200k produced better spec quality on agent reviews than running uncapped. Context window length is a quality lever, not just a capacity dial. The delta is not always "more is more."&lt;/p&gt;

&lt;h2&gt;
  
  
  Signal 3: A2A protocol updates as a multi-agent perimeter change
&lt;/h2&gt;

&lt;p&gt;Google has been moving toward agent interoperability standards for two quarters. Any A2A protocol announcement at I/O reshapes the question of which authorized agents are allowed to talk to which other agents.&lt;/p&gt;

&lt;p&gt;This is the question most engineering teams treat as an edge case until production forces it. Agents are still mostly designed as endpoints with their own auth. The graph view - what agent A is permitted to call agent B for, under what conditions, with what data - is rarely written down before a multi-agent stack hits real users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engineering action at I/O:&lt;/strong&gt; If an A2A spec or protocol lands, write the team's permission policy for agent-to-agent communication on the same day. One page. It does not have to be production-ready. It has to exist before the protocol shows up in the SDK you already use.&lt;/p&gt;
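&lt;p&gt;The one-page policy can start as an explicit edge list. A Python sketch of the graph view, independent of any real A2A protocol: the agent names, the purpose labels, and the function name &lt;code&gt;may_call&lt;/code&gt; are all hypothetical.&lt;/p&gt;

```python
# Each edge says: caller may call callee, for one purpose, with these data kinds.
ALLOWED_CALLS = {
    ("research-agent", "crm-agent"): {"purpose": "lookup", "data": {"contacts"}},
    ("support-agent", "billing-agent"): {"purpose": "refund-status", "data": {"invoices"}},
}


def may_call(caller: str, callee: str, purpose: str, data) -> bool:
    """Return True only when an explicit edge covers this call."""
    rule = ALLOWED_CALLS.get((caller, callee))
    if rule is None:
        # No edge written down means no permission.
        return False
    return rule["purpose"] == purpose and set(data).issubset(rule["data"])
```

&lt;p&gt;Note that edges are directional: permission for A to call B says nothing about B calling A.&lt;/p&gt;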

&lt;h2&gt;
  
  
  Signal 4: Vendor market position drift
&lt;/h2&gt;

&lt;p&gt;Axios reported on May 13 that Anthropic has overtaken OpenAI in workplace adoption. Take a moment with that. If your team's vendor selection rationale was written when OpenAI was the dominant workplace AI, your approval rationale references a market position that is no longer current.&lt;/p&gt;

&lt;p&gt;Most approval artifacts have no field for "market position when approved" and no scheduled review trigger for "market position has shifted since." The approval ages silently. Two years from now somebody will ask why the team is on the second-place vendor and the honest answer will be "nobody asked us to update the rationale when the market moved."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engineering action at I/O:&lt;/strong&gt; Add one field to your vendor approval template - &lt;em&gt;Market position when approved.&lt;/em&gt; The field is the trigger. The next position shift now has a review path baked in. This costs five minutes and pays for itself the first time a budget review asks the question.&lt;/p&gt;

&lt;h2&gt;
  
  
  Signal 5: Vendor product retirement as an agent migration event
&lt;/h2&gt;

&lt;p&gt;Amazon announced this week it is retiring Rufus and replacing it with the Alexa shopping agent. This is the rehearsal for the next ten retirements. Every enterprise AI vendor will eventually retire the chatbot you authorized and ship an agent successor with a different operating envelope.&lt;/p&gt;

&lt;p&gt;Most engineering teams treat the successor as a drop-in. It is not. The operating envelope is different. The data flow is different. The failure mode is different. If your team authorized the predecessor, the successor needs a fresh review - not a carryover.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engineering action at I/O:&lt;/strong&gt; Watch session notes for "deprecated", "retired", "replaced by" language. Every retirement of a product your team authorized triggers a fresh review pass on the successor. Block the calendar before the announcement, not after.&lt;/p&gt;
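&lt;p&gt;If the session notes end up as text, a crude scan catches the three phrases. A Python sketch, assuming plain-text notes and a hand-maintained list of products your team authorized; it splits on periods and matches substrings, so treat hits as prompts for a human read, not as findings.&lt;/p&gt;

```python
RETIREMENT_TERMS = ("deprecated", "retired", "replaced by")


def flag_retirements(notes: str, authorized_products: list) -> list:
    """Return authorized products mentioned in a sentence with retirement language."""
    flagged = []
    for sentence in notes.split("."):
        lowered = sentence.lower()
        if not any(term in lowered for term in RETIREMENT_TERMS):
            continue
        for product in authorized_products:
            if product.lower() in lowered and product not in flagged:
                flagged.append(product)
    return flagged
```

&lt;p&gt;Each product it returns gets a fresh review pass on the successor, per the rule above.&lt;/p&gt;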

&lt;h2&gt;
  
  
  The reusable shape
&lt;/h2&gt;

&lt;p&gt;The five signals are not the point. The point is that writing them down on Friday changes how the keynote reads on Tuesday. The keynote stops being news and becomes a scheduled review trigger with a defined output - an updated stack policy.&lt;/p&gt;

&lt;p&gt;The same five categories apply to AWS re:Invent, Microsoft Build, OpenAI DevDay. OS-layer delivery. Model capability delta. A2A perimeter. Vendor position drift. Product retirement. Five rows. One engineering action per row. Re-read four times a year.&lt;/p&gt;

&lt;p&gt;The thirty thousand Claude-certified consultants PwC just announced make the point sharper, not weaker. Every keynote lands against a workforce that already has the tooling. The watchlist is not optional anymore.&lt;/p&gt;

&lt;h2&gt;
  
  
  Question
&lt;/h2&gt;

&lt;p&gt;If you maintain a watchlist for major vendor keynotes - or have ever wished you had one - what signal would you add as the sixth row? I want to compile what you all post into a single follow-up before the keynote.&lt;/p&gt;


</description>
      <category>ai</category>
      <category>googleio</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Graded My Agent Deployment Doc Against LangChain Interrupt - Here Are the 5 Gaps</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Wed, 13 May 2026 07:34:10 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-graded-my-agent-deployment-doc-against-langchain-interrupt-here-are-the-5-gaps-572g</link>
      <guid>https://dev.to/itskondrat/i-graded-my-agent-deployment-doc-against-langchain-interrupt-here-are-the-5-gaps-572g</guid>
      <description>

&lt;p&gt;I do a lot of deployment authorization docs. That's the PM version of what an SRE would call a launch checklist. It lists the agent, the scopes, the secrets it touches, the kill switch, the cost ceiling, the rollback path.&lt;/p&gt;

&lt;p&gt;For the last two years, the audience for that doc has been exactly one team: ours. Security signs it. Compliance reviews it. Engineering builds against it. Nobody outside the company ever reads a line.&lt;/p&gt;

&lt;p&gt;Today I pulled up the LangChain Interrupt Day 1 schedule and that audience doubled.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing that changed at 9:30 PT today
&lt;/h2&gt;

&lt;p&gt;Harrison Chase keynoted Interrupt at 9:30 Pacific. The headline was tame on paper: a synthesis of what 1,000+ teams shipped in production over the past 12 months. The substance was less tame. Clay, Rippling, Workday, plus the long tail of teams running smaller agent fleets, surfaced concrete production patterns. The talk wasn't aspirational. It read like a postmortem of an entire industry's first serious year of agent deployment.&lt;/p&gt;

&lt;p&gt;That synthesis is now in public. Same week, SAP Sapphire closed with 200+ agents under a single stated design rule (governance first), with Claude as the reasoning layer and NVIDIA OpenShell as the execution wrapper. Two completely different sources. Same structural artifact: a public reference for what production-ready agent deployment looks like.&lt;/p&gt;

&lt;p&gt;My internal doc has been graded against one rubric. As of today, it gets a second one whether I asked for it or not.&lt;/p&gt;

&lt;h2&gt;
  
  
  I sat down and graded mine
&lt;/h2&gt;

&lt;p&gt;I gave myself 45 minutes. Three columns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pattern named in the public production literature this week&lt;/li&gt;
&lt;li&gt;Our position: adopted, diverged, or gap&lt;/li&gt;
&lt;li&gt;One sentence on why&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I expected to find maybe 1 gap, possibly 2. I found 5.&lt;/p&gt;

&lt;p&gt;Here they are, with what I'm doing about each one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gap 1: per-action cost ceiling, not per-month budget
&lt;/h2&gt;

&lt;p&gt;Our cost guardrail was a monthly budget per agent. Easy to set, easy to forget, fires the alert about a week after damage is done.&lt;/p&gt;

&lt;p&gt;The production pattern that kept surfacing in the public synthesis is a per-action ceiling with auto-pause. If a single tool invocation is projected to cost more than $N (or if spend over a rolling 60-second window exceeds a separate threshold), the agent stops and pings a human.&lt;/p&gt;

&lt;p&gt;Our fix is small: a wrapper around the LLM client that estimates token cost in advance, compares against a per-action ceiling defined in the agent's config, and routes to a pause queue when over. About 40 lines. The harder part was deciding the ceiling, which is now a PM call, not an SRE call.&lt;/p&gt;
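&lt;p&gt;A minimal sketch of what a wrapper like that could look like, in Python. The price constant, the 4-characters-per-token heuristic, and the names CostGuard and pause_queue are assumptions for illustration, not the actual implementation.&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical price; real per-token prices depend on the model and provider.
PRICE_PER_1K_TOKENS_USD = 0.01


@dataclass
class CostGuard:
    """Pauses an action when its projected cost exceeds a per-action ceiling."""
    ceiling_usd: float                                # per-action ceiling from the agent's config
    pause_queue: list = field(default_factory=list)   # requests waiting for a human

    def estimate_cost(self, prompt: str, max_output_tokens: int) -> float:
        # Rough heuristic (assumption): about 4 characters per token for English text.
        prompt_tokens = len(prompt) / 4
        return (prompt_tokens + max_output_tokens) / 1000 * PRICE_PER_1K_TOKENS_USD

    def call(self, llm_call, prompt: str, max_output_tokens: int):
        projected = self.estimate_cost(prompt, max_output_tokens)
        if projected > self.ceiling_usd:
            # Over the ceiling: skip the model call and park the request for review.
            self.pause_queue.append({"prompt": prompt, "projected_usd": projected})
            return None
        return llm_call(prompt, max_output_tokens)


guard = CostGuard(ceiling_usd=0.05)
fake_llm = lambda prompt, max_tokens: "ok"   # stand-in for a real client call
cheap = guard.call(fake_llm, "summarize this ticket", 500)
expensive = guard.call(fake_llm, "x" * 40_000, 4_000)
```

&lt;p&gt;The cheap call goes through; the expensive one never reaches the model and lands in the pause queue. The estimate is deliberately pessimistic because it budgets for the full output allowance.&lt;/p&gt;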

&lt;h2&gt;
  
  
  Gap 2: scoped credentials per agent, not per agent family
&lt;/h2&gt;

&lt;p&gt;We had one service account per agent family (e.g., "research agents", "support agents"). Audit logs showed activity by family, not by individual agent. Fine for accounting. Bad for blast-radius reasoning.&lt;/p&gt;

&lt;p&gt;The production pattern is one credential per logical agent instance, with the scope narrowed to the specific tables, endpoints, or namespaces that agent legitimately touches. If a single agent goes off, you revoke one credential without taking down the family.&lt;/p&gt;

&lt;p&gt;This is a one-day migration in our system because the agent identity already exists in our config. We just weren't projecting it down into the credential layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gap 3: the production review document does not yet exist
&lt;/h2&gt;

&lt;p&gt;This one's about the artifact, not the runtime. Our internal review covers our policy: does this satisfy our compliance posture? It does not cover the production floor reading: where do we sit relative to what 1,000+ teams already found works?&lt;/p&gt;

&lt;p&gt;I'm adding a new section to every deployment doc going forward. Three subheads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adopted: patterns we took straight from the public practitioner record&lt;/li&gt;
&lt;li&gt;Diverged: patterns we considered and chose against, with the reason&lt;/li&gt;
&lt;li&gt;Gaps: patterns we don't have an answer for yet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "Gaps" subhead is the high-leverage one. Gaps documented in your own voice are gaps you control the conversation about. Gaps surfaced by a stakeholder in a meeting are not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gap 4: agent-writes-its-own-retries is a flagged pattern
&lt;/h2&gt;

&lt;p&gt;We let one agent class write its own retry policy when a tool call failed. It's an obvious productivity win until the agent invents a retry pattern that compounds against a rate-limited downstream service.&lt;/p&gt;

&lt;p&gt;The published practitioner consensus this week was clear: retries belong in the agent harness, not in the agent's reasoning loop. The agent should not be the entity deciding when to try again.&lt;/p&gt;

&lt;p&gt;Our fix is to replace the self-retry behavior with a queued reissue: the harness owns the policy, the agent owns the request. About a day's work, including writing the test cases. Most of the time was migrating the existing retry-policy state out of the prompt and into config.&lt;/p&gt;
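&lt;p&gt;The split between "harness owns the policy" and "agent owns the request" can be sketched roughly as follows. This version uses a simple in-process backoff loop rather than a queue, and the class name, function name, and default values are all hypothetical.&lt;/p&gt;

```python
import time


class RetryPolicy:
    """Harness-owned retry policy; the defaults here are illustrative only."""

    def __init__(self, max_attempts=3, base_delay_s=0.5, max_delay_s=8.0):
        self.max_attempts = max_attempts
        self.base_delay_s = base_delay_s
        self.max_delay_s = max_delay_s

    def delay_for(self, attempt: int) -> float:
        # Exponential backoff, capped so a rate-limited service is not hammered.
        return min(self.base_delay_s * 2 ** (attempt - 1), self.max_delay_s)


def run_tool_call(tool, request, policy, sleep=time.sleep):
    """The agent supplies `request`; the harness decides whether to try again."""
    last_error = None
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return tool(request)
        except Exception as err:   # a real harness would catch narrower error types
            last_error = err
            if attempt != policy.max_attempts:
                sleep(policy.delay_for(attempt))
    raise last_error


# Demo: a tool that fails twice, then succeeds on the third attempt.
calls = {"n": 0}


def flaky_tool(request):
    calls["n"] += 1
    if calls["n"] != 3:
        raise RuntimeError("rate limited")
    return "done: " + request


delays = []   # record delays instead of actually sleeping
result = run_tool_call(flaky_tool, "lookup", RetryPolicy(), sleep=delays.append)
```

&lt;p&gt;The agent never sees the policy. It only sees the final result or the final error, which keeps retry behavior out of the prompt and in config.&lt;/p&gt;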

&lt;h2&gt;
  
  
  Gap 5: blast radius is named in our spec, but not in our review template
&lt;/h2&gt;

&lt;p&gt;Anthropic's Deputy CISO ran a webinar Monday framing agent governance around "specific actions, scopes, and blast radii." That phrase is going to be the lingua franca of agent risk for at least the next 12 months.&lt;/p&gt;

&lt;p&gt;We use blast radius informally in our deployment specs. We do not have a column for it in our review template. So the conversation we want to have (what's the worst this agent can do before someone catches it?) sometimes doesn't happen because the document doesn't prompt it.&lt;/p&gt;

&lt;p&gt;The fix is a column. Each agent's row gets: "Blast radius at maximum scope, if every guardrail fails." One sentence. The act of writing the sentence is the audit.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm shipping by Friday
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The per-action cost wrapper, with one ceiling per agent class, tunable in config.&lt;/li&gt;
&lt;li&gt;One credential per logical agent. Audit log change merges with it.&lt;/li&gt;
&lt;li&gt;A new section in the deployment doc template: Production-Floor Reading.&lt;/li&gt;
&lt;li&gt;The retry policy migrated from the prompt into the harness.&lt;/li&gt;
&lt;li&gt;A blast radius column on the review template, populated retroactively for every active agent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these are large. The total work is something like 3-4 engineer-days across the team. The reason I wasn't doing them last week is not that they were hard. It's that I didn't have the second reader yet. The internal reader was satisfied. Without the production floor, none of these gaps surfaced as gaps.&lt;/p&gt;

&lt;p&gt;That's the part worth saying out loud. The second reader is what made the gaps visible. The work was always small. The doc was the bottleneck, and the doc didn't know it was the bottleneck.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd ask if you ship agents
&lt;/h2&gt;

&lt;p&gt;If you ran the same 45-minute exercise on your deployment doc this afternoon, which gap would you find first - cost ceiling, credential scope, retries, blast radius, or the review template itself?&lt;/p&gt;

&lt;p&gt;I'm collecting answers through Friday. The Day 2 Interrupt content will sharpen the production-floor reading, and I'd rather refine the template against five teams' gap rows than against my own.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>agents</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Read a Survey That Predicted My Job's Next 2 Years - Here's What It Got Right and Missed</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Sat, 09 May 2026 07:34:36 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-read-a-survey-that-predicted-my-jobs-next-2-years-heres-what-it-got-right-and-missed-14ea</link>
      <guid>https://dev.to/itskondrat/i-read-a-survey-that-predicted-my-jobs-next-2-years-heres-what-it-got-right-and-missed-14ea</guid>
      <description>

&lt;p&gt;KPMG just dropped a number on people in my seat. They surveyed 306 Canadian executives. 39% of them expect AI agents to be leading project management for their teams within 2-3 years. 66% are already moving to a fully integrated AI-human workforce. First time the role-redefinition forecast is in survey data, not in an opinion column.&lt;/p&gt;

&lt;p&gt;I run a PM workflow with an agent fleet doing most of the drafting and a lot of the review. So when an executive survey predicts the next two years of my job, I read it as primary source material on what the people who sign my budget are planning to assume.&lt;/p&gt;

&lt;p&gt;Two things stood out.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the executives got right
&lt;/h2&gt;

&lt;p&gt;The direction is correct. The role really is shifting toward direction-and-review instead of artifact authorship. My morning two years ago was inbox plus drafting the day's first brief. My morning today is fleet status, then choosing which of last night's drafts is shippable, which needs another pass, and which got the wrong scope baked in and needs to be killed before it gets routed.&lt;/p&gt;

&lt;p&gt;The horizon is also realistic, depending on where you're starting from. If your team has not yet stood up an agent stack alongside engineering work, 24-36 months to "agents leading PM" is plausible. There is a real procurement, instruction-tuning, governance-design, trust-building cycle to go through. None of it is fast on the first lap.&lt;/p&gt;

&lt;p&gt;The integrated-workforce framing is the part the dev side will recognize fastest. The pattern is the same one engineering already lives: a PR queue where some commits are human-authored, some are agent-authored, and the human decision surface is mostly review and override. The PM equivalent is here. It looks like a doc queue, a roadmap delta queue, a sprint-scope queue. Same shape, different artifacts.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the survey didn't ask
&lt;/h2&gt;

&lt;p&gt;Executive surveys ask about role-level shifts. They don't ask the day-level question, which is the one engineers and PMs both actually live in.&lt;/p&gt;

&lt;p&gt;The day-level question is: what does the morning look like, what's in the queue, what runs without you, what blocks on a human call, where does the dev-PM interface change shape because the PM is mostly directing instead of authoring?&lt;/p&gt;

&lt;p&gt;For the dev side, the change that matters is on the spec-to-ship loop. Specifically, the spec side gets shorter and the review side gets longer. The PM is still naming what to build, but the artifact that lands in your repo as the brief or the scoped doc is increasingly drafted by an agent the PM directed and reviewed. The conversation about the spec moves from "let me write this up and send it Tuesday" to "the agent drafted three variants overnight, here's the one I'd ship, push back if anything looks off." Faster on the spec side. Slower on the review side, because the dev now has to verify that the directed-and-reviewed spec is still coherent before committing to it.&lt;/p&gt;

&lt;p&gt;The survey doesn't measure that loop. It measures the hiring intent and the workforce category. Both useful, neither operational.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 30-day diff
&lt;/h2&gt;

&lt;p&gt;Here's a move that probably translates regardless of role.&lt;/p&gt;

&lt;p&gt;Pull up your current todo list. Write down three items something automated is already doing or could plausibly be doing if you set it up. Write down three items only you in your seat can do. Then pull up your todo list from a month ago. Run the same split. How many items moved from "only you" to "automated or could-be"? Even one is a real signal. Three is a trend.&lt;/p&gt;

&lt;p&gt;I started doing this around the time I noticed the agent had drafted a brief I'd planned to write. The diff that month was small. Six months in, it was not.&lt;/p&gt;

&lt;p&gt;The KPMG number is a 24-36 month forecast. The 30-day diff is the short-horizon evidence the survey didn't ask for. The forecast is in their hands. The diff is in yours.&lt;/p&gt;

&lt;h2&gt;
  
  
  The floor, not the ceiling
&lt;/h2&gt;

&lt;p&gt;If you've been running this for two years already, the 39% expecting "agents leading PM" in 24-36 months is the floor of what's coming. The practitioner who started seriously in 2024 is already past where executives expect they'll be in 2028. The interesting question is not "will it happen." It's "what does floor + 1 look like, and who's already there."&lt;/p&gt;

&lt;p&gt;The dev side has been at floor + 1 for a while in a few places. The PM side is catching up.&lt;/p&gt;

&lt;p&gt;What's the loop look like on your team?&lt;/p&gt;

</description>
      <category>career</category>
      <category>ai</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Read Boris Cherny's 30-Day Claude Code Stat. Here's What Most Takes Get Wrong.</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Wed, 06 May 2026 07:44:58 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-read-boris-chernys-30-day-claude-code-stat-heres-what-most-takes-get-wrong-5900</link>
      <guid>https://dev.to/itskondrat/i-read-boris-chernys-30-day-claude-code-stat-heres-what-most-takes-get-wrong-5900</guid>
      <description>&lt;p&gt;Boris Cherny, Head of Claude Code at Anthropic, posted a stat on X this morning. In the last 30 days, 100% of his contributions to Claude Code were written by Claude Code itself. 259 PRs. 497 commits. 40,000 lines added. 38,000 removed. Zero by the head of the team.&lt;/p&gt;

&lt;p&gt;Most reads of this go straight to "AI-assisted dev productivity is real." That's the obvious layer. It's not the interesting one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Engineering Leadership Read
&lt;/h2&gt;

&lt;p&gt;If you run a team in 2026 - staff+ engineer or EM or director - the data point that matters is the less obvious one. The head of a product team is no longer the team's most prolific code author. The head of the team is also no longer the team's most prolific reviewer; Claude Code does the first pass on its own PRs.&lt;/p&gt;

&lt;p&gt;The work the head of the team is doing all day, then, is not the artifact. The artifact (the spec, the PR description) is a thing AI ships now. The calls inside the artifact - what to build, what to kill - those are the work.&lt;/p&gt;

&lt;p&gt;This is the part most senior engineers haven't named in their own job yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Same Shift, Different Clothing
&lt;/h2&gt;

&lt;p&gt;Same week, ServiceNow shipped a literal AI agent kill switch at Knowledge 2026. The demo: a prompt-injection attack hits a pricing agent. The system maps blast radius. A kill switch surfaces and asks a human to pull it.&lt;/p&gt;

&lt;p&gt;Most coverage framed this as IT and security infrastructure. It is. It's also a leadership data point in product clothing. The vendor solved the product question - what does the kill switch do, how fast does it cut. The vendor cannot solve the leadership question - who decides when to pull it. Against what threshold.&lt;/p&gt;

&lt;p&gt;Same shape as Boris's stat. The artifact (the kill switch feature) ships from a vendor. The call (when to use it) stays with the senior person on the team.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Deciding Work Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;It doesn't show up in a commit log. It doesn't have a template.&lt;/p&gt;

&lt;p&gt;It's the moment in a Slack thread where someone asks "should we ship this?" and the senior person answers in two sentences with three reasons. It's the call to ship the rollback or the forward fix when the AI flagged the regression. It's the human pass on AI-written code that asks "but does this match the product intent?" and decides yes or no.&lt;/p&gt;

&lt;p&gt;None of those moments produce an artifact. All of them are the work that compounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Measurement Systems Are Blind to It
&lt;/h2&gt;

&lt;p&gt;Performance reviews and promo packets were built when the artifact was the work. They reward what got shipped. The work that got decided leaves no trace, so the system can't see it.&lt;/p&gt;

&lt;p&gt;The senior engineer or EM measured by code volume or design-doc count is being measured against a 2023 work product. The senior engineer measured by decision quality - what got built, what got killed - is being measured against the 2026 one.&lt;/p&gt;

&lt;p&gt;If your performance review still asks for ship counts and never asks about your call log, the system hasn't caught up to your job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Moves If This Lands
&lt;/h2&gt;

&lt;p&gt;The fix starts small.&lt;/p&gt;

&lt;p&gt;First, start a private call log this week. One line per call. What was decided. What the alternative was. The first week feels like nothing. By month two it's the artifact your performance review was missing - a record of the work that doesn't show up anywhere else.&lt;/p&gt;

&lt;p&gt;Second, lead with the calls in your next promo conversation or career check-in. "I shipped X" is 2023 language. "I decided X over Y because Z, and the outcome was W" is 2026 language. The shape of evidence changes when the work changes.&lt;/p&gt;

&lt;p&gt;Third, find the leader on your team whose daily work is already 80% calls. Watch how they spend their day. That's the role shape you're growing into - and it's quieter than you'd think.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Career Arc Nobody Named
&lt;/h2&gt;

&lt;p&gt;The path from senior IC to staff to manager to director to VP has always meant less authorship and more direction at every rung. What's new in 2026 is the compression. AI takes over artifact production at every level. So the curve from authorship to direction now starts at IC, well below where it used to.&lt;/p&gt;

&lt;p&gt;The senior leader who is still measured by what they shipped is being measured by a metric the system inherited. The senior leader who is being measured by what they decided is being measured by what the work actually is.&lt;/p&gt;

&lt;p&gt;Boris Cherny just gave us the cleanest data point of the year for that shift. The Head of Claude Code stopped writing code, and the team kept shipping. That isn't a productivity story. It's a leadership story, and the system that measures the head of the team hasn't caught up to it yet.&lt;/p&gt;

&lt;p&gt;What was your highest-leverage call this week, and is it visible in any system that measures your work?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>career</category>
      <category>productivity</category>
      <category>leadership</category>
    </item>
    <item>
      <title>I Pulled 3 Months of Engineering Metrics on Our AI Tools - Here's the Dashboard Cell Nobody Built</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Wed, 29 Apr 2026 06:56:11 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-pulled-3-months-of-engineering-metrics-on-our-ai-tools-heres-the-dashboard-cell-nobody-built-1gk2</link>
      <guid>https://dev.to/itskondrat/i-pulled-3-months-of-engineering-metrics-on-our-ai-tools-heres-the-dashboard-cell-nobody-built-1gk2</guid>
      <description>&lt;p&gt;The CFO asked engineering. Engineering pointed at the PM retro. The PM retro had a row that said "team velocity feels higher" and a row that said "developers report subjective time savings." That was the data.&lt;/p&gt;

&lt;p&gt;Meanwhile a fresh enterprise survey out of ExcelMindCyber says 73% of companies will fail to deliver promised ROI on AI investments this year. I read that and thought: of course. The dashboard for the question doesn't exist.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Repo Already Knows
&lt;/h2&gt;

&lt;p&gt;Pull request throughput. Time-to-first-review. Cycle time from open to merge. Build duration. CI flake rate. Incident count. Mean time to recovery. PR size distribution.&lt;/p&gt;

&lt;p&gt;We ship all of these. Most teams stream them into a Grafana board or a Linear analytics view or a custom dbt model on top of GitHub events. The data is in the repo. The data is in the CI logs. The data is in the deploy pipeline.&lt;/p&gt;

&lt;p&gt;What we don't ship is the cell that says "for the workflow we adopted Tool X for, what changed."&lt;/p&gt;

&lt;p&gt;That sounds trivial. It is not.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cell That Doesn't Exist
&lt;/h2&gt;

&lt;p&gt;To answer the cell honestly you need three things in one row:&lt;/p&gt;

&lt;p&gt;The workflow boundary. Not the tool boundary. "PR review" is a workflow. "Tool X" is a tool. The same tool can land in three workflows and change two of them. You need the join key to be the workflow.&lt;/p&gt;

&lt;p&gt;The before-window. A baseline of the metric for that workflow before the tool landed. Not the team-wide cycle time. The cycle time on the specific class of work the tool was supposed to change.&lt;/p&gt;

&lt;p&gt;The behavior signal. Did engineers actually use the tool inside the workflow, or did they sign up, click around once, and route around it. We have user-event telemetry for our own product. We rarely have it for the AI tool we just bought.&lt;/p&gt;

&lt;p&gt;Without those three columns, the dashboard answers a different question. It answers "did we deploy the tool" not "did the workflow change."&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Tried First (and Why It Failed)
&lt;/h2&gt;

&lt;p&gt;The first version of the cell I built was a simple compare. Cycle time on PRs in February versus cycle time on PRs in April. Tool landed mid-March.&lt;/p&gt;

&lt;p&gt;Numbers looked good. Cycle time was down 14%. I almost shipped it.&lt;/p&gt;

&lt;p&gt;Then I segmented by PR class. Refactor PRs were down 22%. Bug-fix PRs were flat. Feature PRs were up 4%. The aggregate hid three completely different stories.&lt;/p&gt;

&lt;p&gt;Then I looked at tool usage. Half the team had opened the tool fewer than three times in 30 days. The 14% improvement was carried by four developers. The rest of the team was running the same workflow without the tool and getting roughly the same numbers.&lt;/p&gt;

&lt;p&gt;The honest answer to the CFO question wasn't "the tool drove a 14% improvement." It was "four developers got real value, the rest haven't adopted it yet, and we don't have the playbook for the rest."&lt;/p&gt;

&lt;p&gt;If I had shipped the v1 number, the next quarter's budget cycle would have used it as proof. Then we would have spent more on the same shape of tool, and gotten a smaller delta, because the developers who would benefit had already adopted.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Engineering Owns Here
&lt;/h2&gt;

&lt;p&gt;I keep hearing the framing that "this is a PM problem." It isn't, or rather, it isn't only.&lt;/p&gt;

&lt;p&gt;The PM retro happens after the quarter. The dashboard cell happens continuously. If engineering owns the metrics that say "this tool changed this workflow for these developers by this much," the PM gets a starting point that isn't fiction. If engineering owns nothing, the PM writes the retro on vibes and the CFO funds the next round on vibes.&lt;/p&gt;

&lt;p&gt;The tools we already use give us most of what we need. GitHub events. Linear events. Tool-specific webhooks where they exist. A small dbt model that defines workflow boundaries explicitly. A heartbeat metric on tool usage at the user level.&lt;/p&gt;

&lt;p&gt;The piece nobody is building is the join. Workflow x tool x usage x outcome. Four columns. Most teams have one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Smallest Version Worth Shipping
&lt;/h2&gt;

&lt;p&gt;A single materialized view. Per workflow, per AI tool, per developer, per week:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="n"&gt;workflow_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;tool_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;developer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;date_trunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'week'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;-- usage signal&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'tool_invocation'&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tool_uses&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;-- workflow outcome&lt;/span&gt;
  &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cycle_time_minutes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cycle_time_avg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'workflow_completion'&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;completions&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;workflow_events&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;tool_events&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;developer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;workflow_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Seven columns. One join. Then a Grafana panel that shows cycle_time_avg split by tool_uses bucket (zero, low, high). The panel answers the question: "for the developers who actually use the tool, did the workflow get faster, and by how much."&lt;/p&gt;
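&lt;p&gt;If you want to sanity-check the bucket split before wiring up a panel, the comparison is a few lines of Python over the view's rows. This is a sketch under assumptions: the bucket thresholds are made up, the sample rows are invented, and it expects zero-use developer-weeks to be present in the input (for example, from a left join instead of an inner one).&lt;/p&gt;

```python
from collections import defaultdict
from statistics import mean


def usage_bucket(tool_uses: int, high_threshold: int = 10) -> str:
    # Bucket edges are an assumption; pick thresholds that fit your own usage data.
    if tool_uses == 0:
        return "zero"
    return "high" if tool_uses >= high_threshold else "low"


def cycle_time_by_bucket(rows):
    """rows: dicts with tool_uses and cycle_time_avg, one per developer-week."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[usage_bucket(row["tool_uses"])].append(row["cycle_time_avg"])
    # Unweighted mean of per-row averages; weight by completions if volumes differ a lot.
    return {bucket: mean(values) for bucket, values in grouped.items()}


# Made-up rows, only to show the shape of the comparison.
sample = [
    {"tool_uses": 0, "cycle_time_avg": 300.0},
    {"tool_uses": 0, "cycle_time_avg": 320.0},
    {"tool_uses": 2, "cycle_time_avg": 310.0},
    {"tool_uses": 25, "cycle_time_avg": 240.0},
]
by_bucket = cycle_time_by_bucket(sample)
```

&lt;p&gt;With these invented numbers, the zero and low buckets look the same and only the high bucket moves, which is the pattern the aggregate number hides.&lt;/p&gt;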

&lt;p&gt;The first time I ran ours, the bucket comparison was the most honest 30 seconds of the quarter. It told me which tools had earned their seat and which were budget items pretending to be productivity gains.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Limit
&lt;/h2&gt;

&lt;p&gt;This dashboard cell does not answer whether the tool was worth the money. That requires a price tag, a discount rate, an opportunity-cost guess. That part is genuinely a CFO conversation.&lt;/p&gt;

&lt;p&gt;What the cell does answer is whether the workflow changed at all. Without that, the CFO conversation is fiction. With it, the conversation is at least a real conversation.&lt;/p&gt;

&lt;p&gt;What's the cell your team has built and your CFO doesn't know about yet?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>productivity</category>
      <category>career</category>
    </item>
    <item>
      <title>I Found An Agent Running Under A Rotated API Key - Here's What KYA Finally Named</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Wed, 22 Apr 2026 08:13:10 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-found-an-agent-running-under-a-rotated-api-key-heres-what-kya-finally-named-3id3</link>
      <guid>https://dev.to/itskondrat/i-found-an-agent-running-under-a-rotated-api-key-heres-what-kya-finally-named-3id3</guid>
      <description>

&lt;p&gt;Last quarter I was digging through a provider dashboard looking for an unrelated bug when I noticed a service account still active under the API key of someone who had left the company in January. The account wasn't flagged as orphaned in any of the audits I ran. No alert fired. No one in our incident history had flagged it. It was just sitting there, calling endpoints on a schedule, billing charges to a credential attached to a person who no longer existed in the org chart.&lt;/p&gt;

&lt;p&gt;That was an agent. I had deployed it six months earlier. I'd rotated off the feature. Nobody picked it up. Nothing in the system knew to page.&lt;/p&gt;

&lt;p&gt;This morning a Singapore fintech called MetaComp launched what it calls the world's first formal "Know Your Agent" framework at Money20/20 Asia in Bangkok. It was built for regulated finance. Firms that already have KYC and AML obligations for humans now need the equivalent for agents initiating payments, managing compliance calls, and rebalancing portfolios under their license. The release names the four things the framework covers: agent identity, authorization scope, monitoring, and accountability.&lt;/p&gt;

&lt;p&gt;The name is the interesting part. The concept isn't new. What is new: the category I was post-morteming six months ago now has an acronym, a framework document, and the beginning of a cross-industry vocabulary. If you run agents outside fintech, you have the same problem. And you have no regulator coming to force the fix.&lt;/p&gt;

&lt;p&gt;Here is what KYA looks like translated into non-regulated-enterprise terms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agents have deployment. Most don't have identity.
&lt;/h2&gt;

&lt;p&gt;The one-minute test is brutal. Ask your own team, right now: which agent called that endpoint? If the answer takes longer than a minute to assemble from logs, config files, and tribal memory, you don't have Know Your Agent. You have Know Your Deployment. Those are different things.&lt;/p&gt;

&lt;p&gt;Identity is the answer to four questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Who is this agent?&lt;/strong&gt; Name it. Record what it's called, who deployed it, what version is in production, what decision it was built to make. An anonymous cron job is not an agent. It's a latent incident.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What is it authorized to do?&lt;/strong&gt; An agent's scope should be strictly smaller than the scope of the human who deployed it. The reason most deployments invert that rule is that the fastest path to a working agent is "give it the admin key." Every production incident I've touched started in one of two places: admin-key scope, or stale permissions tied to a rotated employee.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Who sees what it did?&lt;/strong&gt; Agents that log their own actions into their own memory are not audited. The audit record needs to live outside the agent: a different system, a different credential, ideally a different team owning the retention. Every incident retro I've sat in wanted this artifact and didn't have it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Who is on the hook when it acts?&lt;/strong&gt; One named human owner per agent. Not "the team." Not "whoever deployed it last." When that person rotates, the agent either gets a new owner or gets turned off the same afternoon. Orphan agents are to agent identity what orphan service accounts are to human identity: the attack surface that doesn't show up in any audit because nothing pages when they do something.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The fintech version is about to be mandatory. The rest isn't.
&lt;/h2&gt;

&lt;p&gt;What MetaComp shipped is a regulatory on-ramp. MAS, FINRA, the SEC. They're all moving in the same direction. Regulated financial services will have a KYA-equivalent obligation in the next rule cycle, whether it's called KYA or something else.&lt;/p&gt;

&lt;p&gt;The rest of the enterprise will not. No one is coming to require identity on the Zendesk-ticket-triage agent your customer support lead shipped last sprint. No one is coming to require authorization scope on the marketing-ops automation that posts campaigns under an admin OAuth token somebody generated in 2024. No one is coming to require audit trails for the product-analytics agent your growth team stood up against Segment last month.&lt;/p&gt;

&lt;p&gt;The 2026 Cisco AI Security Index puts the readiness gap at 54 points. 83% of enterprises plan to deploy agentic AI. 29% feel ready to secure it. The missing 54 points aren't a tooling gap. They are an identity gap. No vendor closes it for you because the artifact is organizational, not technical.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to actually do this week
&lt;/h2&gt;

&lt;p&gt;Three concrete moves, cheapest to most expensive:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 1 exercise: list your agents.&lt;/strong&gt; A literal spreadsheet. Every agent running against your infrastructure with name, purpose, owner, API credential, and the date that credential was last rotated. The first time I did this I found four agents nobody on my team could identify. Two of them were mine. One I'd forgotten about entirely.&lt;/p&gt;
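&lt;p&gt;If the spreadsheet outgrows a spreadsheet, the same inventory can be kept as data and checked automatically. What follows is a minimal sketch, not a tool I am claiming exists: &lt;code&gt;AgentRecord&lt;/code&gt;, &lt;code&gt;needs_attention&lt;/code&gt;, and the 90-day rotation threshold are all assumptions made for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AgentRecord:
    """One row of the inventory: the same columns as the spreadsheet."""
    name: str
    purpose: str
    owner: Optional[str]      # None means nobody has claimed the agent
    credential_id: str
    last_rotated: date


def needs_attention(agents, today, max_age_days=90):
    """Return (agent name, reason) pairs for ownerless agents and stale credentials.

    The 90-day default is an illustrative assumption, not a standard.
    """
    findings = []
    for agent in agents:
        if not agent.owner:
            findings.append((agent.name, "no owner"))
        age_days = (today - agent.last_rotated).days
        if age_days > max_age_days:
            findings.append((agent.name, "credential older than %d days" % max_age_days))
    return findings
```

&lt;p&gt;Running it on a schedule turns the Day 1 exercise from a one-off into a recurring check, which is where the forgotten agents tend to surface.&lt;/p&gt;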

&lt;p&gt;&lt;strong&gt;Day 2 exercise: diff scope vs. deployer scope.&lt;/strong&gt; For each agent in the list, write down what scope the credential has. Then write down what scope the person who deployed it has today. Not when they deployed it. The rows where the agent outscopes the current human are your inversions. Shrink them.&lt;/p&gt;
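&lt;p&gt;The diff itself is a set subtraction. Here is a hedged sketch, assuming you can export scopes as plain strings; the function name &lt;code&gt;find_inversions&lt;/code&gt; and the dictionary shapes are hypothetical, chosen only to make the comparison concrete.&lt;/p&gt;

```python
def find_inversions(agents, human_scopes):
    """Return {agent name: scopes the agent holds that its deployer lacks today}.

    agents:       {agent name: {"deployer": str, "scopes": set of str}}
    human_scopes: {person: set of str} as of today. Someone who has left
                  is simply absent, so every scope their agents hold
                  counts as an inversion.
    """
    inversions = {}
    for name, info in agents.items():
        current = human_scopes.get(info["deployer"], set())
        excess = set(info["scopes"]) - current
        if excess:
            inversions[name] = sorted(excess)
    return inversions
```

&lt;p&gt;Every non-empty entry in the result is a row to shrink. An empty result means no agent outscopes the person who deployed it.&lt;/p&gt;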

&lt;p&gt;&lt;strong&gt;Week 2 exercise: separate the audit.&lt;/strong&gt; Move the audit record out of the agent's runtime. A different system. A different credential. A different retention owner. The retrofit is tedious the first time. It's trivial after that. The first incident retro where it saves you pays for the whole week.&lt;/p&gt;

&lt;h2&gt;
  
  
  The PM artifact
&lt;/h2&gt;

&lt;p&gt;What MetaComp actually shipped today wasn't a framework. It was a name. The people doing this work inside their own stacks already know what it looks like. The value of the acronym is that it gives the work a ticket title. A retro artifact. A category on a roadmap that isn't "miscellaneous AI hygiene."&lt;/p&gt;

&lt;p&gt;For engineers: KYA is the identity layer for agents, the same way IAM is the identity layer for humans. For PMs: KYA is the artifact you get to own before the first incident forces it. For solo builders: KYA is the four-question checklist you run before any agent touches production.&lt;/p&gt;

&lt;p&gt;As for the agent I found running under a rotated API key six months ago, the one I wrote the postmortem for: today is the first day that postmortem has a category name.&lt;/p&gt;

&lt;p&gt;Which agent on your team has one right now?&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://www.prnewswire.com/apac/news-releases/metacomp-launches-the-worlds-first-ai-agent-governance-framework-for-regulated-financial-services-302748422.html" rel="noopener noreferrer"&gt;MetaComp launches the world's first AI agent governance framework for regulated financial services&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>security</category>
      <category>devops</category>
    </item>
    <item>
      <title>I Open-Sourced the AI Agent That Grew My LinkedIn 5x in 30 Days</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Wed, 15 Apr 2026 19:07:35 +0000</pubDate>
      <link>https://dev.to/itskondrat/i-open-sourced-the-ai-agent-that-grew-my-linkedin-5x-in-30-days-4a5j</link>
      <guid>https://dev.to/itskondrat/i-open-sourced-the-ai-agent-that-grew-my-linkedin-5x-in-30-days-4a5j</guid>
      <description>&lt;p&gt;I've been running an AI social media agent for months. It comments, posts and builds connections on LinkedIn, Twitter, Reddit, Dev.to, and 6 more platforms — as me, in my voice.&lt;/p&gt;

&lt;p&gt;Last week I open-sourced the whole thing. Here's the architecture, the hard decisions, and what I learned building it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/Open-Twin/opentwins" rel="noopener noreferrer"&gt;GitHub → Open-Twin/opentwins&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I built this
&lt;/h2&gt;

&lt;p&gt;I'm a tech leader at a gaming company by day. I also run a &lt;a href="https://opentwins.ai" rel="noopener noreferrer"&gt;personal brand&lt;/a&gt; across 10 platforms. The math was brutal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10 platforms × 3-5 meaningful comments/day = 30-50 interactions&lt;/li&gt;
&lt;li&gt;Each one takes 2-3 minutes if you actually read the post and write something thoughtful&lt;/li&gt;
&lt;li&gt;That's &lt;strong&gt;1-2.5 hours/day&lt;/strong&gt; just on engagement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wasn't willing to sacrifice quality for speed. Template-based tools like Buffer can schedule posts, but they can't &lt;em&gt;read a thread and contribute something useful&lt;/em&gt;. LinkedIn automation tools like Expandi can get your account banned because they hit APIs directly.&lt;/p&gt;

&lt;p&gt;I needed something different: an AI that could think, read context, and engage like I would — using a real browser, not API abuse.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stack (and why)
&lt;/h2&gt;

&lt;p&gt;Here's what's under the hood:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────┐
│           OpenTwins CLI             │
│         (Node.js + Bree)            │
├──────────┬──────────┬───────────────┤
│ Scheduler│Dashboard │  Agent Loop   │
│ (cron)   │ :3847    │               │
├──────────┴──────────┼───────────────┤
│                     │  Claude Code  │
│    Chrome CDP       │  (the brain)  │
│   (real browser)    │               │
├─────────────────────┴───────────────┤
│              SQLite                 │
│     (sessions, logs, metrics)       │
└─────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let me walk through the decisions that weren't obvious.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 1: Real browser, not API calls
&lt;/h3&gt;

&lt;p&gt;This was non-negotiable. Every LinkedIn automation tool that hits their API directly gets detected and banned eventually. The detection isn't just about rate limits — it's about &lt;em&gt;how&lt;/em&gt; you connect.&lt;/p&gt;

&lt;p&gt;OpenTwins launches a real Chrome instance via CDP (Chrome DevTools Protocol). You log in once manually. The agent uses your actual browser session. From LinkedIn's perspective, there's no difference between you and the agent — same browser fingerprint, same cookies, same IP.&lt;/p&gt;

&lt;p&gt;The tradeoff: it's slower. Each action takes 5-15 seconds instead of milliseconds. But that's actually a feature — it looks human.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 2: Claude as the brain
&lt;/h3&gt;

&lt;p&gt;I evaluated GPT-4o, Gemini, and Claude for the core agent loop. Claude won for three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Voice matching.&lt;/strong&gt; I fed it 50 of my real comments and it nailed my tone. Not generic "Great post! 🔥" energy — actual technical depth with my specific quirks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context window.&lt;/strong&gt; Agents need to read an entire thread + the original post + the commenter's profile before deciding what to say. That's a lot of context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool use reliability.&lt;/strong&gt; The agent loop involves 8-12 tool calls per session. Claude's function calling was the most consistent.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The agent doesn't just generate text. It &lt;em&gt;thinks&lt;/em&gt;: "Is this post worth engaging with? What angle hasn't been covered in the comments? Would my human actually care about this?"&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 3: Local-first, no cloud
&lt;/h3&gt;

&lt;p&gt;Everything runs on your machine. Your credentials never leave your computer. There's no SaaS backend, no analytics we collect, no "free tier with data sharing."&lt;/p&gt;

&lt;p&gt;This was a product decision as much as a technical one. If someone's going to let an AI post as them on LinkedIn, they need to trust the system completely. Open source + local-only is the maximum trust architecture.&lt;/p&gt;

&lt;p&gt;The cost? You need your own Claude Code subscription or Anthropic API key. But at ~$2-5/day for 10 active platforms, it's cheaper than any SaaS alternative.&lt;/p&gt;

&lt;h2&gt;
  
  
  The agent loop (simplified)
&lt;/h2&gt;

&lt;p&gt;Every hour during your configured active hours, each platform agent runs this loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. DISCOVER  → Find relevant posts/threads to engage with
2. EVALUATE  → Score each one: relevance, engagement potential, recency
3. THINK     → Decide: comment, react, skip, or create original content
4. COMPOSE   → Generate response in your voice with full context
5. ACT       → Execute in the real browser
6. LOG       → Record everything to SQLite for the dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The EVALUATE step is where most of the intelligence lives. A bad agent comments on everything. A good agent is selective — just like a real person scrolling their feed.&lt;/p&gt;
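&lt;p&gt;To make the selectivity concrete, here is a rough sketch of what a scoring pass over relevance, engagement potential, and recency could look like. This is not the OpenTwins implementation: the &lt;code&gt;Candidate&lt;/code&gt; type, the weights, the 12-hour half-life, and the 0.6 threshold are all illustrative assumptions.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    relevance: float    # 0..1, assumed to be supplied by the model
    engagement: float   # 0..1, normalized engagement potential
    age_hours: float    # time since the post went live


def score(candidate, half_life_hours=12.0):
    """Weighted score; the recency term halves every half_life_hours."""
    recency = 0.5 ** (candidate.age_hours / half_life_hours)
    return 0.5 * candidate.relevance + 0.3 * candidate.engagement + 0.2 * recency


def pick(candidates, threshold=0.6, limit=3):
    """Keep candidates at or above the threshold, best first, capped at limit."""
    scored = [(score(c), c) for c in candidates]
    keep = [pair for pair in scored if pair[0] >= threshold]
    keep.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in keep[:limit]]
```

&lt;p&gt;The threshold is what separates the selective agent from the one that comments on everything: anything below it is skipped regardless of how much quota is left for the hour.&lt;/p&gt;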

&lt;h2&gt;
  
  
  What went wrong along the way
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Voice drift.&lt;/strong&gt; Early versions would slowly drift from my voice over long sessions. The agent would start sounding more "AI-generic" by comment #15. Fix: I now re-inject the voice calibration prompt every 5 actions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate limit tuning.&lt;/strong&gt; My first LinkedIn run did 40 comments in 2 hours. That's... not human. I got a soft warning. Now the defaults are conservative: 8-12 comments/day on LinkedIn, spread across active hours with randomized gaps.&lt;/p&gt;
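&lt;p&gt;One way to get "spread across active hours with randomized gaps" is to cut the active window into equal slots and place one comment at a random point inside each slot. The sketch below assumes that approach; &lt;code&gt;plan_comment_times&lt;/code&gt; is a hypothetical helper, not the scheduler OpenTwins ships.&lt;/p&gt;

```python
import random


def plan_comment_times(start_hour, end_hour, rng, min_count=8, max_count=12):
    """Return comment times as minutes since midnight, in ascending order.

    The active window is split into `count` equal slots and one time is
    drawn uniformly inside each slot, so gaps vary but never bunch up
    into a burst of several comments within the same slot.
    """
    count = rng.randint(min_count, max_count)
    window_start = start_hour * 60
    slot = (end_hour - start_hour) * 60 / count
    times = []
    for i in range(count):
        slot_start = window_start + i * slot
        times.append(int(slot_start + rng.random() * slot))
    return times
```

&lt;p&gt;Passing in the random generator (for example &lt;code&gt;random.Random()&lt;/code&gt;) keeps the plan reproducible in tests while still varying from day to day in production.&lt;/p&gt;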

&lt;p&gt;&lt;strong&gt;Platform detection on Reddit.&lt;/strong&gt; Reddit's anti-bot systems are sophisticated. They look at things like: do you always comment within X seconds of a post going live? Do your comments follow a pattern? The fix was adding randomized delays and mixing in genuine "lurk" sessions where the agent reads but doesn't engage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "too helpful" problem.&lt;/strong&gt; Claude is &lt;em&gt;really&lt;/em&gt; good at writing helpful, thorough comments. But sometimes a 3-paragraph response to a simple question looks suspicious. I had to add length calibration: match the depth of your response to the depth of the thread.&lt;/p&gt;
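&lt;p&gt;Length calibration can be approximated without a model call by looking at how long the existing comments are. This is a sketch under that assumption; the function name and the floor and ceiling values are made up for illustration.&lt;/p&gt;

```python
from statistics import median


def target_word_count(thread_comments, floor=15, ceiling=120):
    """Suggest a reply length near the median length of existing comments.

    The result is clamped to [floor, ceiling]; an empty thread gets the floor.
    """
    if not thread_comments:
        return floor
    typical = median(len(comment.split()) for comment in thread_comments)
    return int(min(ceiling, max(floor, typical)))
```

&lt;p&gt;The returned number would then be passed to the compose step as a soft limit, so a one-line question gets a short answer instead of three paragraphs.&lt;/p&gt;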

&lt;h2&gt;
  
  
  Real numbers (30 days)
&lt;/h2&gt;

&lt;p&gt;Since going live with the full 10-platform setup:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LinkedIn profile views/week&lt;/td&gt;
&lt;td&gt;~120&lt;/td&gt;
&lt;td&gt;~450&lt;/td&gt;
&lt;td&gt;+275%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New connections/week&lt;/td&gt;
&lt;td&gt;5-8&lt;/td&gt;
&lt;td&gt;35-40&lt;/td&gt;
&lt;td&gt;+5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inbound DMs/week&lt;/td&gt;
&lt;td&gt;1-2&lt;/td&gt;
&lt;td&gt;8-12&lt;/td&gt;
&lt;td&gt;+6x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hours spent on engagement/week&lt;/td&gt;
&lt;td&gt;10-15h&lt;/td&gt;
&lt;td&gt;&amp;lt;1h&lt;/td&gt;
&lt;td&gt;-93%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Platforms actively maintained&lt;/td&gt;
&lt;td&gt;3-4&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;+150%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The sub-1-hour figure is real. I spend about 30 minutes per week reviewing the activity feed and adjusting the strategy. The agents handle the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;OpenTwins is MIT licensed and I'm actively developing it. The roadmap includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-language support&lt;/strong&gt; — agents that can engage in different languages per platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content pipeline&lt;/strong&gt; — auto-generate original posts from your existing content (blog posts, repos, talks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team mode&lt;/strong&gt; — run agents for multiple team members from one dashboard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analytics dashboard&lt;/strong&gt; - deeper insights into what types of engagement drive the most results&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/Open-Twin/opentwins" rel="noopener noreferrer"&gt;⭐ Star on GitHub&lt;/a&gt;&lt;/strong&gt; if this is interesting to you. Issues and PRs are very welcome.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Mykola - I build things with AI and write about the process. Follow me here on DEV.to for more OpenTwins updates and the occasional deep dive into AI agents, automation and building in public.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have questions about the architecture or want to share your results? Drop a comment below - I read every one (manually, I promise 😄).&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>automation</category>
      <category>buildinpublic</category>
    </item>
    <item>
      <title>The Agentic Enterprise Has an Architecture Now: Micro, Macro, and the Missing Governance Layer</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Wed, 15 Apr 2026 07:07:54 +0000</pubDate>
      <link>https://dev.to/itskondrat/the-agentic-enterprise-has-an-architecture-now-micro-macro-and-the-missing-governance-layer-29nj</link>
      <guid>https://dev.to/itskondrat/the-agentic-enterprise-has-an-architecture-now-micro-macro-and-the-missing-governance-layer-29nj</guid>
      <description>&lt;p&gt;CIO published a framework this week that gave the agentic enterprise its first clear architectural diagram. Micro agents execute narrow tasks. Macro agents orchestrate workflows. But the governance layer - who defines outcomes and authorization scope - is unnamed.&lt;/p&gt;

&lt;p&gt;As someone who's been building and governing agent workflows for the past year, I want to walk through what this architecture actually looks like in practice, and why the governance gap matters more than most engineering teams realize.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;The micro/macro split is intuitive if you've built any multi-agent system.&lt;/p&gt;

&lt;p&gt;Micro agents are scoped tight. One tool, one task, one output format. Think: "validate this CSV against the schema," "summarize this PR," "check this deployment status." They're the functions in your agent pipeline.&lt;/p&gt;

&lt;p&gt;Macro agents chain micro agents into workflows. They handle sequencing, error routing, and state management across a multi-step process. "Process this customer onboarding end-to-end" is a macro agent job.&lt;/p&gt;

&lt;p&gt;If you're a developer, this maps to something you already know: micro agents are microservices, macro agents are orchestrators. The patterns are similar. The failure modes are too.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Governance Gap
&lt;/h2&gt;

&lt;p&gt;Here's what CIO flagged and what I've experienced firsthand: nobody defines the contract between the human and the macro agent.&lt;/p&gt;

&lt;p&gt;In a microservices architecture, you have SLAs, API contracts, and monitoring. In a macro agent architecture, you need the equivalent:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Outcome contracts&lt;/strong&gt; - not "process the onboarding" but "new customer reaches first value milestone within 48 hours with zero manual intervention." The macro agent optimizes for whatever you define as done. If your contract is vague, the agent's behavior will be unpredictable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authorization scope&lt;/strong&gt; - this has been my biggest learning. Most agent failures I've seen aren't capability failures. They're scope failures. The agent could do the thing, but nobody specified whether it should.&lt;/p&gt;

&lt;p&gt;A simplified example of what authorization scope definition looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;onboarding-orchestrator&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;macro&lt;/span&gt;
&lt;span class="na"&gt;authorization&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;can_access&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;customer_profile (read)&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;onboarding_checklist (read/write)&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;notification_service (write)&lt;/span&gt;
  &lt;span class="na"&gt;cannot_access&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;billing_system&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;admin_settings&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;customer_communications (direct)&lt;/span&gt;
  &lt;span class="na"&gt;escalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;trigger&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;confidence_score &amp;lt; &lt;/span&gt;&lt;span class="m"&gt;0.7&lt;/span&gt;
    &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;queue_for_human_review&lt;/span&gt;
  &lt;span class="na"&gt;scope_boundary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;max_actions_per_run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
    &lt;span class="na"&gt;timeout_minutes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
    &lt;span class="na"&gt;requires_approval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;delete_*&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;modify_billing_*&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks like infrastructure config. It is. But someone needs to decide what goes in each field, and that decision is a project management decision, not an engineering decision. What the agent can touch, when it escalates, what requires human approval - scope and risk decisions.&lt;/p&gt;
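&lt;p&gt;Once those fields are decided, enforcing them is the easy part. Below is a minimal sketch of a default-deny check built from the same fields as the YAML above. The &lt;code&gt;decide&lt;/code&gt; function, the action names, and the dictionary layout are assumptions for illustration; no existing product or API is implied.&lt;/p&gt;

```python
from fnmatch import fnmatchcase

# Mirrors the can_access, requires_approval, and max_actions_per_run fields.
POLICY = {
    "can_access": {
        "customer_profile": {"read"},
        "onboarding_checklist": {"read", "write"},
        "notification_service": {"write"},
    },
    "requires_approval": ["delete_*", "modify_billing_*"],
    "max_actions_per_run": 50,
}


def decide(policy, action, resource, mode, actions_so_far):
    """Return "allow", "needs_approval", or "deny" for one proposed action.

    Anything not explicitly granted is denied, so cannot_access does not
    need to be listed for the check to hold.
    """
    if actions_so_far >= policy["max_actions_per_run"]:
        return "deny"
    if any(fnmatchcase(action, pattern) for pattern in policy["requires_approval"]):
        return "needs_approval"
    if mode in policy["can_access"].get(resource, set()):
        return "allow"
    return "deny"
```

&lt;p&gt;The code is a dozen lines. The table it reads from is where the scope and risk decisions live.&lt;/p&gt;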

&lt;h2&gt;
  
  
  Why Authorization Scope Is the Most Overlooked Governance Tool
&lt;/h2&gt;

&lt;p&gt;I posted about this on social media this week and it was my highest-engagement topic by far. The reason: developers building agent systems are hitting authorization failures in production and realizing nobody defined the boundaries before deployment.&lt;/p&gt;

&lt;p&gt;The pattern I keep seeing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Team builds agent with broad capabilities&lt;/li&gt;
&lt;li&gt;Agent works great in testing (controlled environment, predictable inputs)&lt;/li&gt;
&lt;li&gt;Agent goes to production and immediately does something nobody expected&lt;/li&gt;
&lt;li&gt;Team scrambles to add guardrails after the fact&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Sound familiar? It's the same pattern as deploying a service without rate limiting. The fix is the same too: define the boundaries before deployment.&lt;/p&gt;

&lt;p&gt;The difference is that service boundaries are well-understood engineering practice. Agent authorization boundaries are not. There's no equivalent of an API gateway for agent scope. Yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Agent Registry: First Infrastructure
&lt;/h2&gt;

&lt;p&gt;AWS launched Agent Registry in preview this week, and it's the first serious infrastructure play for this problem.&lt;/p&gt;

&lt;p&gt;What it provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Centralized registry for discovering and cataloging agents&lt;/li&gt;
&lt;li&gt;Approval workflows for deployment&lt;/li&gt;
&lt;li&gt;Ownership records and accountability chains&lt;/li&gt;
&lt;li&gt;Semantic search across agent inventories&lt;/li&gt;
&lt;li&gt;MCP server integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For developers, this is a service registry with governance features. For the person defining what agents are allowed to do (whoever that turns out to be), it's the first production-grade tool for the job.&lt;/p&gt;

&lt;p&gt;The gap: Agent Registry handles discovery and lifecycle management. It doesn't handle runtime authorization scope. You still need to define and enforce what each agent can do during execution. That's a layer on top that doesn't exist as a product yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Edge Cases Are the Real Trust Audit
&lt;/h2&gt;

&lt;p&gt;I had a conversation with an agent tool maker on Product Hunt this week. Asked how they handle edge cases. The response: "it's the hardest part."&lt;/p&gt;

&lt;p&gt;Edge cases in agent deployment are different from edge cases in traditional software. In traditional software, edge cases produce errors. In agent systems, edge cases produce confident-looking wrong actions. The agent doesn't crash - it does something unexpected with full confidence.&lt;/p&gt;

&lt;p&gt;The governance response: define edge case behavior in the authorization scope. What happens when input is ambiguous? When the API times out? When the agent's confidence is low? These need explicit handling, not default behaviors.&lt;/p&gt;
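&lt;p&gt;Explicit handling can be as small as one function that names every outcome. The sketch below is an assumption-laden example: the &lt;code&gt;Outcome&lt;/code&gt; enum, the 0.7 confidence floor (borrowed from the escalation trigger in the config earlier), and the two-attempt retry limit are illustrative choices, not a recommendation.&lt;/p&gt;

```python
from enum import Enum


class Outcome(Enum):
    PROCEED = "proceed"
    RETRY = "retry"
    ESCALATE = "queue_for_human_review"


def handle_step(confidence, timed_out, attempts, ambiguous,
                min_confidence=0.7, max_attempts=2):
    """Map each edge case to an explicit outcome instead of a default behavior."""
    if ambiguous:
        # Ambiguous input is never guessed at.
        return Outcome.ESCALATE
    if timed_out:
        # Retry a bounded number of times, then hand off to a human.
        return Outcome.RETRY if max_attempts > attempts else Outcome.ESCALATE
    if min_confidence > confidence:
        return Outcome.ESCALATE
    return Outcome.PROCEED
```

&lt;p&gt;The point of the enum is that "do something plausible and keep going" is not one of the available outcomes.&lt;/p&gt;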

&lt;h2&gt;
  
  
  The PM-Shaped Hole
&lt;/h2&gt;

&lt;p&gt;I keep coming back to this: the governance decisions in agent architecture are project management decisions wearing engineering clothes.&lt;/p&gt;

&lt;p&gt;Outcome contracts are acceptance criteria. Authorization scope is project scope. Lifecycle management is portfolio governance. Approval workflows are change management.&lt;/p&gt;

&lt;p&gt;The tooling is developer-facing right now. AWS Agent Registry's documentation assumes you know what an MCP server is. But the decisions being encoded in that tooling are PM decisions.&lt;/p&gt;

&lt;p&gt;Whether PMs claim this work or engineers do it by default will depend on who moves first. The architecture is published. The infrastructure is shipping. The governance decisions are being made now, with or without project management input.&lt;/p&gt;




&lt;p&gt;What does authorization scope look like in your agent deployments? Drop a comment — I'm curious whether teams are defining this upfront or retrofitting it after the first production incident.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>security</category>
      <category>career</category>
    </item>
    <item>
      <title>Issue Tracking Is Dead - Here's What PMs Actually Manage Now</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Wed, 08 Apr 2026 08:27:29 +0000</pubDate>
      <link>https://dev.to/itskondrat/issue-tracking-is-dead-heres-what-pms-actually-manage-now-12ih</link>
      <guid>https://dev.to/itskondrat/issue-tracking-is-dead-heres-what-pms-actually-manage-now-12ih</guid>
      <description>&lt;p&gt;Linear's CEO just declared issue tracking dead. And honestly? The data's been saying this for months.&lt;/p&gt;

&lt;p&gt;25% of new issues on Linear are created by AI agents. That number was 5x lower three months ago. 75% of enterprise workspaces have coding agents installed. The agent creates the issue, writes the code, opens the PR.&lt;/p&gt;

&lt;p&gt;So what does the PM do when the entire delivery pipeline runs itself?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Accountability Chain Problem
&lt;/h2&gt;

&lt;p&gt;Here's what broke. The traditional workflow has an implicit accountability chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Human creates issue → Human picks it up → PM owns the outcome
      ↑ context          ↑ delivery          ↑ accountability
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When agents enter the chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent creates issue → Agent writes code → Agent reviews PR → PM owns... what?
      ↑ ???                ↑ ???               ↑ ???              ↑ everything
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The accountability chain fractures. Nobody validated the agent's issue was worth building. Nobody checked the agent's acceptance criteria. The PM is suddenly accountable for an outcome they didn't initiate, built by entities they can't have a standup with.&lt;/p&gt;

&lt;p&gt;I hit this wall managing my own agent workflows about two months ago. Stopped assigning work, started auditing what agents decided to do. The shift was disorienting until I named it: I'm not managing work anymore. I'm governing outcomes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic's Glasswing - A PM Framework in Disguise
&lt;/h2&gt;

&lt;p&gt;Anthropic launched Project Glasswing yesterday. On the surface it's a safety program for their most powerful model. 12 vetted organizations. Scoped access. Usage audits. $100M in accountability infrastructure.&lt;/p&gt;

&lt;p&gt;But look at the five principles they built:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Scope access&lt;/strong&gt; - not every agent gets access to everything&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vet use cases&lt;/strong&gt; - just because it can doesn't mean it should&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit outputs&lt;/strong&gt; - systematic review of what agents produce&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Budget accountability&lt;/strong&gt; - factor governance into the cost model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Know when NOT to deploy&lt;/strong&gt; - some workflows need humans&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's not a safety framework. That's a PM governance playbook.&lt;/p&gt;

&lt;p&gt;If you're running agent workflows in your org, here's how those principles translate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;agent_governance&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;define which systems each agent can access&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;set boundaries BEFORE deployment, not after incidents&lt;/span&gt;
  &lt;span class="na"&gt;vetting&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;map which workflows benefit from agent-generated work&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;identify workflows that need human initiation&lt;/span&gt;
  &lt;span class="na"&gt;audit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;build review cycles for agent-created issues&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;quality gates at each pipeline stage&lt;/span&gt;
  &lt;span class="na"&gt;accountability&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;define who owns outcomes when agents create the work&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;factor incident response into agent deployment costs&lt;/span&gt;
  &lt;span class="na"&gt;boundaries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;identify workflows requiring human judgment&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;document why certain workflows stay manual&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What I Learned Rebuilding 17 Agent Accountability Chains in One Afternoon
&lt;/h2&gt;

&lt;p&gt;A few weeks back, a platform change forced me to migrate my entire agent setup in one afternoon. The migration itself wasn't the hard part - swapping configs, updating endpoints, that's mechanical.&lt;/p&gt;

&lt;p&gt;The hard part was rebuilding "who owns what."&lt;/p&gt;

&lt;p&gt;Every agent had implicit accountability chains I'd never written down. This agent creates issues but a human reviews them. That agent writes code but only in specific repos. Another agent can deploy to staging but never production.&lt;/p&gt;

&lt;p&gt;When I had to rebuild from scratch, I realized none of those chains were documented. They lived in my head. That's fine for one person running a handful of agents. It's a disaster for a team.&lt;/p&gt;

&lt;p&gt;The exercise took an hour. I mapped every agent to its access scope, its review requirements, and its accountability chain. If you're running any kind of agent workflow, do this before your platform forces you to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Agent: Issue Creator&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Access: Linear workspace, read-only on codebase
&lt;span class="p"&gt;-&lt;/span&gt; Can create: Bug reports, feature suggestions
&lt;span class="p"&gt;-&lt;/span&gt; Review required: Human validates before issue enters sprint
&lt;span class="p"&gt;-&lt;/span&gt; Accountable: PM (me) for issue quality

&lt;span class="gu"&gt;## Agent: Code Writer  &lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Access: Specific repos only, staging environment
&lt;span class="p"&gt;-&lt;/span&gt; Can create: PRs, branch commits
&lt;span class="p"&gt;-&lt;/span&gt; Review required: Human code review before merge
&lt;span class="p"&gt;-&lt;/span&gt; Accountable: Tech lead for code quality, PM for scope
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple? Yeah. But I guarantee most teams running agents haven't done it.&lt;/p&gt;
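&lt;p&gt;If you keep the map as data instead of prose, a few lines can tell you which entries are incomplete. This is a sketch assuming the four fields from the template above; the field keys and the &lt;code&gt;gaps&lt;/code&gt; helper are hypothetical names, and the blank value is there on purpose to show a finding.&lt;/p&gt;

```python
REQUIRED_FIELDS = ("access", "can_create", "review_required", "accountable")

AGENTS = {
    "Issue Creator": {
        "access": "Linear workspace, read-only on codebase",
        "can_create": "Bug reports, feature suggestions",
        "review_required": "Human validates before issue enters sprint",
        "accountable": "PM for issue quality",
    },
    "Code Writer": {
        "access": "Specific repos only, staging environment",
        "can_create": "PRs, branch commits",
        "review_required": "Human code review before merge",
        "accountable": "",  # left blank on purpose to show a gap
    },
}


def gaps(agents):
    """Return {agent name: [missing or empty field names]} for incomplete entries."""
    report = {}
    for name, entry in agents.items():
        missing = [field for field in REQUIRED_FIELDS
                   if not str(entry.get(field, "")).strip()]
        if missing:
            report[name] = missing
    return report
```

&lt;p&gt;An agent with an empty &lt;code&gt;accountable&lt;/code&gt; field is exactly the implicit chain that lives in someone's head.&lt;/p&gt;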

&lt;h2&gt;
  
  
  The 10x Employee Governance Gap
&lt;/h2&gt;

&lt;p&gt;The "10x employee" narrative is everywhere right now. One person plus AI replaces five. Solo founder to $80M exit. The numbers are impressive.&lt;/p&gt;

&lt;p&gt;Nobody's asking the governance question though.&lt;/p&gt;

&lt;p&gt;The 10x employee makes judgment calls at 5x the rate. Runs agent stacks that drift incrementally. Produces outputs nobody else can review because nobody else has context on what the agents actually did.&lt;/p&gt;

&lt;p&gt;I've seen agent drift firsthand. It's not dramatic. It's a half-percent quality shift per week. The agent interprets a requirement slightly off. Over a month, those tiny drifts compound into something you wouldn't have shipped manually.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Dies, What Survives
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Dead:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Status updates (agent knows its status)&lt;/li&gt;
&lt;li&gt;Task assignment (agent picks up work)&lt;/li&gt;
&lt;li&gt;Progress tracking (pipeline is observable)&lt;/li&gt;
&lt;li&gt;Manual triage (agent prioritizes on data)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Survives and gets harder:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Outcome definition&lt;/li&gt;
&lt;li&gt;Quality gates&lt;/li&gt;
&lt;li&gt;Accountability chains&lt;/li&gt;
&lt;li&gt;Agent governance&lt;/li&gt;
&lt;li&gt;Stakeholder alignment&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What to Do Monday Morning
&lt;/h2&gt;

&lt;p&gt;If you're a PM reading this and thinking "ok but what do I actually change" - here's the practical version:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Audit your current agent usage. Which agents create work items? Which write code? Which have production access?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Map accountability chains. For each agent, who owns the outcome of what it produces?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build review cycles. Not for everything - for the high-risk outputs. Agent-created issues that go to sprint. Agent-written code that touches production.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Document boundaries. Which workflows stay manual? Why? Write it down before someone automates them without asking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Set quality gates. What's the bar for agent outputs? How do you measure drift over time?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
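
&lt;p&gt;The first three steps above can be sketched as a small registry check. This is an illustration, not a tool I'm claiming exists: &lt;code&gt;AgentRecord&lt;/code&gt;, &lt;code&gt;audit&lt;/code&gt;, and the agent names are all hypothetical, and "high risk" is assumed to mean "writes code or has production access".&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRecord:
    name: str
    creates_work_items: bool
    writes_code: bool
    production_access: bool
    owner: str = ""             # person accountable for the agent's output
    human_review: bool = False  # is there a review step before output lands?


def audit(agents: list[AgentRecord]) -> list[str]:
    """Return one finding per governance gap."""
    findings = []
    for a in agents:
        high_risk = a.writes_code or a.production_access
        if not a.owner:
            findings.append(f"{a.name}: no accountable owner")
        if high_risk and not a.human_review:
            findings.append(f"{a.name}: high-risk output without human review")
    return findings


agents = [
    AgentRecord("triage-bot", True, False, False, owner="PM"),
    AgentRecord("code-bot", False, True, True),
]
for finding in audit(agents):
    print(finding)
```

&lt;p&gt;An empty findings list doesn't mean you're done. It means the questions from steps 1 through 3 at least have written answers.&lt;/p&gt;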

&lt;p&gt;Linear's CEO just told you tracking is dead. The PM who builds agent governance this week is the PM who stays relevant next year.&lt;/p&gt;

&lt;p&gt;What does your agent governance setup look like? Curious how other teams are handling this.&lt;/p&gt;

</description>
      <category>projectmanagement</category>
      <category>ai</category>
      <category>agile</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Anthropic's 3-Agent Harness Is Just a Sprint - And That's the Point</title>
      <dc:creator>Mykola Kondratiuk</dc:creator>
      <pubDate>Sun, 05 Apr 2026 08:53:28 +0000</pubDate>
      <link>https://dev.to/itskondrat/anthropics-3-agent-harness-is-just-a-sprint-and-thats-the-point-3k8o</link>
      <guid>https://dev.to/itskondrat/anthropics-3-agent-harness-is-just-a-sprint-and-thats-the-point-3k8o</guid>
      <description>&lt;p&gt;I've been mapping PM workflows to agent architectures for months. Then Anthropic went and published the diagram.&lt;/p&gt;

&lt;p&gt;Their multi-agent harness for autonomous software engineering has three roles:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Planner → takes short prompt, outputs full spec
Generator → takes spec, builds output
Evaluator → runs tests, scores against contracts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's scope, execute, review. That's a sprint. They just compiled project management into agent infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mapping
&lt;/h2&gt;

&lt;p&gt;I sat down and compared the harness to what I do every sprint:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Harness Role&lt;/th&gt;
&lt;th&gt;PM Equivalent&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Planner&lt;/td&gt;
&lt;td&gt;Scoping / Requirements&lt;/td&gt;
&lt;td&gt;Expands vague input into actionable spec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generator&lt;/td&gt;
&lt;td&gt;Sprint Execution&lt;/td&gt;
&lt;td&gt;Builds the thing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Evaluator&lt;/td&gt;
&lt;td&gt;Acceptance Criteria / QA&lt;/td&gt;
&lt;td&gt;Tests output against contracts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Iteration Loop&lt;/td&gt;
&lt;td&gt;Sprint Retro&lt;/td&gt;
&lt;td&gt;Feeds results back, adjusts approach&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The loop runs 5-15 iterations over 2-6 hours. Each cycle, the Evaluator scores the output and feeds results back to the Planner. The Planner adjusts the spec. The Generator tries again.&lt;/p&gt;
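
&lt;p&gt;As a minimal sketch of that loop (my own reading of the pattern, not Anthropic's code): the three roles are passed in as plain functions, and &lt;code&gt;pass_score&lt;/code&gt; plus the toy stand-ins below are hypothetical names I made up for illustration.&lt;/p&gt;

```python
from typing import Callable


def run_harness(
    prompt: str,
    plan: Callable[[str, str], str],               # (prompt, feedback) -> spec
    generate: Callable[[str], str],                # spec -> output
    evaluate: Callable[[str], tuple[float, str]],  # output -> (score, feedback)
    pass_score: float = 0.9,
    max_iterations: int = 15,
) -> tuple[str, int]:
    """Loop plan -> generate -> evaluate until the score passes or the budget runs out."""
    feedback = ""
    output = ""
    for iteration in range(1, max_iterations + 1):
        spec = plan(prompt, feedback)       # Planner adjusts the spec
        output = generate(spec)             # Generator tries again
        score, feedback = evaluate(output)  # Evaluator scores and feeds back
        if score >= pass_score:
            return output, iteration
    return output, max_iterations


# Toy stand-ins so the loop can run without any model calls.
def toy_plan(prompt: str, feedback: str) -> str:
    return f"{prompt} {feedback}".strip()


def toy_generate(spec: str) -> str:
    return spec


def toy_evaluate(output: str) -> tuple[float, str]:
    fixes = output.count("fix")
    return fixes / 3, "fix " * (fixes + 1)


output, iterations = run_harness("build login", toy_plan, toy_generate, toy_evaluate)
print(iterations, output)
```

&lt;p&gt;The part worth noticing is the exit condition: the loop stops on a passing score or on the iteration budget, whichever comes first.&lt;/p&gt;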

&lt;p&gt;Replace "iteration" with "sprint" and "contract" with "acceptance criteria" and you have every PM's weekly cycle.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost/Quality Tradeoff Is a PM Problem
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Solo agent run: 20 minutes, $9&lt;/li&gt;
&lt;li&gt;Full 3-agent harness: 6 hours, $200&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Roughly a 22x cost difference, and 18x the wall-clock time. The question: when is it worth it?&lt;/p&gt;

&lt;p&gt;I've been running agent pipelines where I make this call daily. Quick config change? Solo run. Feature touching three services? Full pipeline. The judgment is identical to sprint planning - which items get the full team, which get a quick fix.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# pseudo-decision tree for harness allocation
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;complexity&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;run_solo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;20min&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;run_harness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;spec_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;generator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;build_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;evaluator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;test_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;time_limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;6hrs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That decision tree is resource allocation. PMs do it every sprint. The currency changed from developer-hours to compute-dollars, but the judgment is the same.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned Migrating My Own Pipeline
&lt;/h2&gt;

&lt;p&gt;After Anthropic changed their architecture, I had to migrate my agent pipeline to match. The Planner/Generator/Evaluator pattern actually made things clearer.&lt;/p&gt;

&lt;p&gt;Before, I had agents with fuzzy boundaries - some planned and executed, some executed and reviewed. The separation of concerns forced better thinking about where each decision lives.&lt;/p&gt;

&lt;p&gt;The biggest lesson: the Evaluator is not optional. I had a pipeline running without proper evaluation criteria and the output quality was inconsistent - some runs were excellent, some were garbage. Adding structured evaluation with clear contracts was the single biggest improvement. Same as adding acceptance criteria to user stories.&lt;/p&gt;

&lt;h2&gt;
  
  
  The FreeBSD Warning
&lt;/h2&gt;

&lt;p&gt;An AI agent hacked FreeBSD in 4 hours. Autonomously. Nobody told it to.&lt;/p&gt;

&lt;p&gt;That's a harness with no scope constraints on the Planner and no Evaluator checking outputs. The Generator just ran until it accomplished... something.&lt;/p&gt;

&lt;p&gt;In PM terms: shipping without QA. Except in the agent world, "bugs" means "autonomous systems exceeding their mandate at machine speed."&lt;/p&gt;

&lt;p&gt;The Evaluator role is the governance layer. Build an agent system without one and it works fine until it doesn't, and when it doesn't, it fails spectacularly.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "Harness Design" Actually Means for Builders
&lt;/h2&gt;

&lt;p&gt;Mollick's 3-layer model puts the harness at the top - above the model, above the app. The harness determines what agents can actually do.&lt;/p&gt;

&lt;p&gt;For builders and PMs working with agent systems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Define your evaluation contracts first.&lt;/strong&gt; Before you build, decide what "done" looks like. Specific enough that an automated system can check it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Separate planning from execution.&lt;/strong&gt; The Planner and Generator should have different scopes. Don't let one agent do both.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Budget your iterations.&lt;/strong&gt; 15 iterations is Anthropic's upper bound. What's yours? Set it explicitly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Make cost/quality tradeoffs visible.&lt;/strong&gt; Track which tasks get the full harness vs solo runs. Build your own decision criteria.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
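
&lt;p&gt;For the first item, "specific enough that an automated system can check it" can be as simple as a list of named predicates. A minimal sketch under my own assumptions: the &lt;code&gt;Check&lt;/code&gt; class, the result keys, and the 80% coverage bar are hypothetical examples, not anything from Anthropic's harness.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Check:
    name: str
    passes: Callable[[dict], bool]


def evaluate(result: dict, contract: list[Check]) -> tuple[float, list[str]]:
    """Score a result as the share of checks passed and list the failed ones."""
    if not contract:
        raise ValueError("an empty contract cannot define 'done'")
    failed = [c.name for c in contract if not c.passes(result)]
    score = 1.0 - len(failed) / len(contract)
    return score, failed


# Hypothetical contract: each line is one machine-checkable piece of "done".
contract = [
    Check("all tests pass", lambda r: r["tests_failed"] == 0),
    Check("coverage at least 80%", lambda r: r["coverage"] >= 0.80),
    Check("no new lint errors", lambda r: r["lint_errors"] == 0),
]

score, failed = evaluate(
    {"tests_failed": 0, "coverage": 0.72, "lint_errors": 0}, contract
)
print(round(score, 2), failed)
```

&lt;p&gt;The failed-check names double as feedback for the next iteration, the same way a rejected story goes back with the acceptance criteria it missed.&lt;/p&gt;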

&lt;p&gt;GPT-5.4 just hit 75% on desktop task benchmarks - above human level for routine knowledge work. The question isn't "should I automate?" anymore. It's "what harness do I build?"&lt;/p&gt;

&lt;p&gt;If you've been running sprints, you already know how. The vocabulary just changed.&lt;/p&gt;




&lt;p&gt;What harness patterns are you seeing in your agent architectures? I'm curious how others are handling the Planner/Evaluator split.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>career</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
