<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Todd Linnertz</title>
    <description>The latest articles on DEV Community by Todd Linnertz (@todd_linnertz_871a076f68e).</description>
    <link>https://dev.to/todd_linnertz_871a076f68e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3861685%2F9d1bf0cc-474e-4ed5-8ed8-39902bf50cc0.png</url>
      <title>DEV Community: Todd Linnertz</title>
      <link>https://dev.to/todd_linnertz_871a076f68e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/todd_linnertz_871a076f68e"/>
    <language>en</language>
    <item>
      <title>The Agent Is 20% of the Work. The Platform Is the Other 80%.</title>
      <dc:creator>Todd Linnertz</dc:creator>
      <pubDate>Sun, 17 May 2026 04:56:38 +0000</pubDate>
      <link>https://dev.to/todd_linnertz_871a076f68e/the-agent-is-20-of-the-work-the-platform-is-the-other-80-4cf8</link>
      <guid>https://dev.to/todd_linnertz_871a076f68e/the-agent-is-20-of-the-work-the-platform-is-the-other-80-4cf8</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://devopsdiary.blog/agent-is-20-percent-of-the-work" rel="noopener noreferrer"&gt;devopsdiary.blog&lt;/a&gt;. Post F-AID1 in the "Governing AI in the Enterprise" series.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A payroll team shipped a production AI agent last year. Real workload, not a demo: processing 3,000+ emails a day, classifying them, extracting data and entering payroll. Six distinct steps, end to end.&lt;/p&gt;

&lt;p&gt;Their test accuracy: 94%. Good enough to ship.&lt;/p&gt;

&lt;p&gt;Their production accuracy: 70%.&lt;/p&gt;

&lt;p&gt;That's the talk I keep thinking about from AI Dev 26. The drop itself isn't news. What they did about it is.&lt;/p&gt;

&lt;h2&gt;The accuracy gap has a cause&lt;/h2&gt;

&lt;p&gt;The 94% looked clean because the test set was curated. It covered the cases the team had thought of. Production didn't care about that. It sent typos. Impossible numbers. Screenshots. Hand-drawn notes. Vague references with no context. Conflicting instructions from two people in the same email thread.&lt;/p&gt;

&lt;p&gt;The test distribution and the production distribution weren't the same. They almost never are.&lt;/p&gt;

&lt;p&gt;A better model didn't close the gap. They ran shadow testing: the agent processed real production emails alongside their human team for four weeks, generating payroll entries but not submitting them. Humans reviewed the shadow outputs. Edge cases surfaced. New tests got written.&lt;/p&gt;
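
&lt;p&gt;A minimal Python sketch of that shadow-mode loop, under stated assumptions: the names &lt;code&gt;shadow_run&lt;/code&gt;, &lt;code&gt;agent&lt;/code&gt; and &lt;code&gt;human_entries&lt;/code&gt; are hypothetical and are not the payroll team's actual code. The agent's proposed entry is recorded next to what a human entered for the same email, and nothing is written to the payroll system.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ShadowResult:
    email_id: str
    agent_entry: dict
    human_entry: dict

    @property
    def matches(self) -> bool:
        return self.agent_entry == self.human_entry


def shadow_run(emails: dict, human_entries: dict, agent: Callable[[str], dict]) -> list:
    """Run the agent on real emails without submitting anything.

    Each agent output is paired with the entry a human made for the
    same email, so disagreements can be reviewed later.
    """
    results = []
    for email_id, body in emails.items():
        proposed = agent(body)  # generated, never written to payroll
        results.append(ShadowResult(email_id, proposed, human_entries[email_id]))
    return results


def agreement_rate(results: list) -> float:
    """Share of emails where agent and human produced the same entry."""
    if not results:
        return 0.0
    return sum(r.matches for r in results) / len(results)


def disagreements(results: list) -> list:
    """Email ids to review; each is a candidate for a new test case."""
    return [r.email_id for r in results if not r.matches]
```

&lt;p&gt;In this sketch the disagreement list is the useful output: each id is an edge case a reviewer can turn into a new test.&lt;/p&gt;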

&lt;p&gt;Final accuracy: 98%. The agent didn't change. The scaffolding around it did.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Month&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;M1&lt;/td&gt;
&lt;td&gt;55%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M2&lt;/td&gt;
&lt;td&gt;97%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M3&lt;/td&gt;
&lt;td&gt;87%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M4&lt;/td&gt;
&lt;td&gt;94%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M5 (live)&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M6 (shadow)&lt;/td&gt;
&lt;td&gt;98%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Six months of accuracy data from the payroll agent. The dip at M5 is what shipping without production-distribution testing looks like.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;The 20/80 problem&lt;/h2&gt;

&lt;p&gt;The final slide from that talk had a number I wrote down immediately: agent engine = 20% of the work. The durable system around the agent = 80%.&lt;/p&gt;

&lt;p&gt;That ratio feels off if you've spent most of your time thinking about which model to use, how to prompt it, how to evaluate it against a benchmark. Those things matter. They're just not where a production AI project lives or dies.&lt;/p&gt;

&lt;p&gt;The 80% is the multi-stage evaluation pipeline. Shadow testing infrastructure. The control tower that gives ops and leadership visibility into what the agent is actually doing. Input governance for the weird formats production throws at you. The routing logic that decides which step of the workflow a given input actually belongs in.&lt;/p&gt;

&lt;p&gt;None of that is prompt engineering. All of it is platform work.&lt;/p&gt;

&lt;p&gt;I've spent 30 years watching organizations adopt new technology and invest heavily in the visible capability while underbuilding the infrastructure that makes it last. The pattern is consistent. AI isn't running a different play.&lt;/p&gt;

&lt;h2&gt;What breaks without the infrastructure&lt;/h2&gt;

&lt;p&gt;Enterprise AI conversations split fast once you get past the demo stage. Some teams want to know about governance, evaluation pipelines, how outputs get reviewed before they do anything irreversible. Most are asking which model to use and when they can ship.&lt;/p&gt;

&lt;p&gt;Drops like that one, from 94% in test to 70% in production, happen. Without a control tower to surface them, teams find out through complaints, not metrics.&lt;/p&gt;

&lt;p&gt;That's a platform problem. Someone has to own the pipeline, not just the agent.&lt;/p&gt;

&lt;h2&gt;The line I can't stop thinking about&lt;/h2&gt;

&lt;p&gt;Day two had a closing panel. Loose, riffing. One panelist dropped a line that's been with me since:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"If you don't own your harness, you don't own your memory."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It took a beat to unpack. Your harness is your evaluation infrastructure: the test pipelines, the shadow mode, the tooling that decides what "good" looks like for your specific agents on your specific workloads. Your memory is what that harness teaches you over time: where your agents fail, which prompts hold up under real traffic, what your actual production distribution looks like.&lt;/p&gt;

&lt;p&gt;Outsource the harness to a vendor and the vendor runs your evaluation loop. They see your production failures first. Every edge case your agents surface builds their system's understanding, not yours.&lt;/p&gt;

&lt;p&gt;Most teams are focused on which LLM provider to pick, which coding assistant to standardize on. The harness question comes later, usually when a vendor relationship turns complicated and they realize how hard it is to move.&lt;/p&gt;

&lt;p&gt;The payroll team built their own. Multi-stage evals, shadow infrastructure, control tower, four weeks of real production traffic before anything touched the write path. That's why they landed at 98%. And that's why the knowledge of how to get there belongs to them.&lt;/p&gt;

&lt;p&gt;Twenty percent for the agent. Eighty percent for the system around it. Teams that understand that ratio are the ones shipping agents that stick.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>platformengineering</category>
      <category>devops</category>
      <category>mlops</category>
    </item>
    <item>
      <title>What DevOps Taught Me About Running a Function</title>
      <dc:creator>Todd Linnertz</dc:creator>
      <pubDate>Thu, 23 Apr 2026 03:32:38 +0000</pubDate>
      <link>https://dev.to/todd_linnertz_871a076f68e/what-devops-taught-me-about-running-a-function-2b6a</link>
      <guid>https://dev.to/todd_linnertz_871a076f68e/what-devops-taught-me-about-running-a-function-2b6a</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://devopsdiary.blog" rel="noopener noreferrer"&gt;devopsdiary.blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most engineering orgs staff platform teams like project teams and then measure them like project teams. Both halves are wrong, and the second one is what kills them. Here are the three metrics that actually tell you if a platform function is working.&lt;/p&gt;

&lt;p&gt;The first time I inherited a platform team I asked the obvious question. How is the platform doing? Uptime green, deploys up, tickets closing faster than they were opening. Two months later I knew none of those numbers had told me anything about whether the team was actually doing its job.&lt;/p&gt;

&lt;p&gt;Once you see that gap, you can’t run a platform org any other way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A function is not a project team&lt;/strong&gt;&lt;br&gt;
Most engineering organizations staff platform teams like project teams and then measure them like project teams. Both halves are wrong, and the second one is what kills them.&lt;/p&gt;

&lt;p&gt;A project team exists to ship a thing. You measure it by whether the thing shipped, when, and how well it works. The metrics are about the team because the output is the team’s output.&lt;/p&gt;

&lt;p&gt;A function is different. Platform engineering, DevOps, security, developer productivity: these are functions. A function exists to change the slope of everyone else’s work. Its output is not its own output. The thing you measure is what becomes possible across the rest of the org because the function exists.&lt;/p&gt;

&lt;p&gt;If you measure a function the way you measure a project team you’ll get a team that ships beautiful internal artifacts nobody uses. Green dashboards and rising attrition. A platform org that looks healthy from the inside and is quietly failing from the outside, and you won’t see the failure until the consumer teams stop pretending.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metric 1: adoption velocity&lt;/strong&gt;&lt;br&gt;
Adoption velocity is the percentage of consumer teams that move to the platform’s current standard within ninety days of release. Not whether they all get there eventually. The shape of the curve in the first quarter.&lt;/p&gt;
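
&lt;p&gt;As a rough sketch, the definition above could be computed like this. The function name, the &lt;code&gt;adoption_dates&lt;/code&gt; mapping and the choice to count the release day itself are assumptions for illustration, not a standard formula.&lt;/p&gt;

```python
from datetime import date, timedelta


def adoption_velocity(release_date: date, adoption_dates: dict, window_days: int = 90) -> float:
    """Percentage of consumer teams on the new standard within the window.

    adoption_dates maps team name to the date it adopted, or None if it
    has not adopted yet.
    """
    if not adoption_dates:
        return 0.0
    deadline = release_date + timedelta(days=window_days)
    adopted_in_window = sum(
        1 for adopted in adoption_dates.values()
        if adopted is not None and deadline >= adopted >= release_date
    )
    return 100.0 * adopted_in_window / len(adoption_dates)
```

&lt;p&gt;Teams that adopt after the window still count in the denominator, which is what keeps the number honest about the first quarter.&lt;/p&gt;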

&lt;p&gt;This is the metric that tells you whether the gap between built and adopted is closing or widening. A platform team can ship excellent technical work and still fail if the curve is flat. Worse, a flat curve means the platform team is generating debt at the same rate as the rest of the org, because every standard they release that nobody adopts becomes another version the team has to support forever.&lt;/p&gt;

&lt;p&gt;When I led GitOps adoption, the first quarter looked great. Teams onboarded. We had momentum, we had a story to tell, the architecture review board was happy. In the second quarter we had the same platform, same docs and same support model, but the curve had stalled, and nobody on the team noticed because the dashboards were full of green.&lt;/p&gt;

&lt;p&gt;I went and talked to the teams that hadn’t adopted. Almost none of their reasons were technical. The blockers were political. Once I knew that, the fix was a half-day of negotiation with the product owners. The curve unstalled the next sprint.&lt;/p&gt;

&lt;p&gt;Without an adoption curve I would have kept measuring uptime and deploy counts and concluded the team was crushing it. The team was crushing it. The platform was failing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metric 2: time to first success for a new consumer&lt;/strong&gt;&lt;br&gt;
This one is the cleanest signal in the set. How long does it take a brand-new team (one that has never touched the platform) to get from “we’re adopting this” to “we shipped something to production using it”?&lt;/p&gt;
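
&lt;p&gt;One way to put a number on it, as a sketch: the two dates per team (when they committed and when they first shipped to production) are assumed inputs you would have to collect, and the function names are hypothetical.&lt;/p&gt;

```python
from datetime import date
from statistics import median


def time_to_first_success(committed: date, first_prod_deploy: date) -> int:
    """Days from 'we're adopting this' to the first production ship."""
    return (first_prod_deploy - committed).days


def median_time_to_first_success(onboardings: list) -> float:
    """Median days across (committed, first_prod_deploy) pairs."""
    return median(time_to_first_success(c, d) for c, d in onboardings)
```

&lt;p&gt;The median is used here so that one slow onboarding doesn't hide the typical experience. That is a design choice, not the only option.&lt;/p&gt;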

&lt;p&gt;Time to first success is the only proxy I trust for whether the documentation, the onboarding model and the support story actually work. It’s also the metric most platform teams are catastrophically wrong about, because they’ve never measured it. They ask each other whether the platform is intuitive and they all agree it is, because they built it.&lt;/p&gt;

&lt;p&gt;Earlier in my career I inherited operational workflows where new teams were taking six weeks to onboard. Six weeks is a structural problem dressed up as an onboarding problem. The platform team had been adding documentation and the number hadn’t moved. Their theory was that the new teams weren’t reading carefully enough.&lt;/p&gt;

&lt;p&gt;We didn’t write more docs. We restructured the handoffs. Of the four points where new teams were stalling, we collapsed two, automated one and put a single owner on the fourth. New teams started shipping in four days. Defect rates dropped, and throughput improved.&lt;/p&gt;

&lt;p&gt;None of that came from better tooling. All of it came from going to look at a number the team wasn’t measuring and refusing to accept that the existing onboarding was working just because the team said it was.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metric 3: support ratio&lt;/strong&gt;&lt;br&gt;
The third metric is the percentage of platform-team engineering hours going to consumer support, hand-holding and break-fix versus platform development. Healthy platform teams trend toward more development over time as the platform matures. Unhealthy teams trend the other way and don’t notice until the burnout hits and the senior engineers start interviewing.&lt;/p&gt;
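
&lt;p&gt;A small sketch of the calculation, assuming hours are already tagged as either support or development. The function names and the "rose every period" rule are illustrative assumptions.&lt;/p&gt;

```python
def support_ratio(support_hours: float, development_hours: float) -> float:
    """Percentage of platform-team hours spent on support work."""
    total = support_hours + development_hours
    if total == 0:
        return 0.0
    return 100.0 * support_hours / total


def is_trending_unhealthy(ratios_by_period: list) -> bool:
    """True when the support share rose in every consecutive period."""
    pairs = zip(ratios_by_period, ratios_by_period[1:])
    return len(ratios_by_period) > 1 and all(later > earlier for earlier, later in pairs)
```

&lt;p&gt;The trend matters more than any single reading, which is why the second function looks at a series of periods.&lt;/p&gt;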

&lt;p&gt;Support ratio is the leading indicator for every organizational failure mode in platform engineering. Burnout. Attrition. Scope creep. Feature stagnation. The eventual quiet rebellion of the consumer teams who have been getting worse responses every month and have stopped expecting better. If you only get to watch one number on a platform org, watch this one.&lt;/p&gt;

&lt;p&gt;It’s also the only metric that tells you whether the team’s design (interfaces, automation, self-service) is actually reducing toil or just relocating it. A team that ships a self-service portal and watches the support ratio climb has built a portal consumers can’t use.&lt;/p&gt;

&lt;p&gt;This is the metric that convinced me the next generation of platform engineering needs structural governance. Better tools won’t save it. When AI generation accelerates the rate at which consumer teams produce work, the support ratio explodes unless the platform itself produces frozen, validated artifacts that the consumers can trust without a human in the loop. That conviction is why I’ve spent the last few months building AIEOS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What these metrics force you to do&lt;/strong&gt;&lt;br&gt;
Once these three numbers are on your dashboard, the leadership job changes. You stop measuring your team by what they shipped and start measuring them by what the rest of the org shipped because of them. That sounds small. It isn’t.&lt;/p&gt;

&lt;p&gt;The roadmap shifts, because you become willing to deprecate your own team’s work when adoption stalls instead of doubling down on a thing nobody is using. The way you spend political capital shifts, because you start defending the platform team’s time against the constant pressure to absorb every adjacent problem in the org.&lt;/p&gt;

&lt;p&gt;It also changes the conversations you have with your own leadership. You stop reporting up on what your team built and start reporting up on what your team made possible. Those are different sentences. The second one is the one Directors and VPs are paid to say.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The hard part&lt;/strong&gt;&lt;br&gt;
The hardest thing about running a function is that the work is invisible until it isn’t. A team that’s quietly doing it right looks identical to a team that’s quietly burning down. Velocity charts won’t tell you which is which. Neither will uptime or deploy counts. These three metrics are how I tell the difference, and I can usually tell within the first month of taking over.&lt;/p&gt;

&lt;p&gt;If you’re running a platform org and these aren’t on your dashboard, they should be. And if you’re hiring someone to run one, they should already be talking about them.&lt;/p&gt;




&lt;p&gt;Todd Linnertz is a Senior Technology Leader with deep experience in enterprise architecture and DevOps. He is the creator of AIEOS, an open-source AI governance system for software delivery teams. Find him at &lt;a href="//devopsdiary.blog"&gt;devopsdiary.blog&lt;/a&gt; and &lt;a href="//github.com/wtlinnertz"&gt;github.com/wtlinnertz&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>platformengineering</category>
      <category>leadership</category>
      <category>management</category>
    </item>
    <item>
      <title>Why I Stopped Writing (And What Happened Since)</title>
      <dc:creator>Todd Linnertz</dc:creator>
      <pubDate>Wed, 15 Apr 2026 17:14:04 +0000</pubDate>
      <link>https://dev.to/todd_linnertz_871a076f68e/why-i-stopped-writing-and-what-happened-since-33of</link>
      <guid>https://dev.to/todd_linnertz_871a076f68e/why-i-stopped-writing-and-what-happened-since-33of</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://devopsdiary.blog" rel="noopener noreferrer"&gt;devopsdiary.blog&lt;/a&gt;. Series opener for "The Quiet Years," a retrospective on the work between August 2022 and now.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The last post on this blog went up in August 2022. Three and a half years later, here's why the silence happened and why it's ending now.&lt;br&gt;
April 14, 2026 · Todd&lt;br&gt;
One of the last posts on this blog went up in August 2022. Time to restore service. Twenty-eight articles in nine months, and then nothing for three and a half years.&lt;/p&gt;

&lt;p&gt;I wasn’t burned out. I didn’t lose interest. The blog went quiet because I took a new job two weeks later, and the work ate the writing.&lt;/p&gt;

&lt;p&gt;That’s the honest version. The strategic version, the one that matters now, is that the work itself was the foundation I needed for what I’m doing today. I just couldn’t see that while I was inside it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The work that ate the blog&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In August 2022 I started a new Technical Architect role. I thought I’d be enabling DevOps practice. What I ended up doing was a lot of the day-to-day firefighting that comes with a large enterprise.&lt;/p&gt;

&lt;p&gt;I spent the better part of 2023 in conference calls explaining why declarative deployments didn’t violate change management.&lt;/p&gt;

&lt;p&gt;While that was happening, I was also running vendor evaluations and designing the configuration automation for our public cloud alongside an existing CloudBees installation. I built dashboards nobody wanted to see and I figured out what to do when Anaconda changed their licensing and hundreds of developers were impacted. A dev container solution I prototyped for my own team ended up getting adopted.&lt;/p&gt;

&lt;p&gt;None of that looked like blog material at the time. It felt like work. It was the daily grind of making enterprise engineering slightly less terrible one approval at a time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I didn’t see coming&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Somewhere in the middle of that stretch, ChatGPT showed up. Then Copilot. Then a flood of other tools that could generate code faster than any human could review it.&lt;/p&gt;

&lt;p&gt;My first reaction was skeptical. My second reaction was something like “how the hell are we going to govern this?” The output wasn’t bad. It was often impressive. But it was also a black box. It was a new source of engineering artifacts that could be produced at scale, but with no clear way to validate them or trace them back to the decisions that led to them. Architecture documents, design specifications, PRDs, test cases, deployment scripts. All of it could be generated by AI, but none of it could be governed by the processes that had been in place for human-generated artifacts.&lt;/p&gt;

&lt;p&gt;That observation is where the rest of my career bent.&lt;/p&gt;

&lt;p&gt;The governance instincts I’d been building (immutable artifacts, structured handoffs, validation that produces verdicts instead of suggestions, measurement that becomes gating) turned out to be the vocabulary AI-assisted software delivery needed. And almost nobody was connecting those dots. The MLOps world was building model training pipelines. The AI safety world was talking about alignment. The engineering leadership world was dreaming about productivity gains.&lt;/p&gt;

&lt;p&gt;The gap in the middle was empty. Nobody was writing about what governance looks like when AI generates engineering artifacts at scale. That gap is where I’ve been living since early 2026.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why now&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In February I started building AIEOS, an open-source governance system for AI-assisted software delivery. I wrote the first post about it two weeks ago. That post is the reason this one exists.&lt;/p&gt;

&lt;p&gt;I can’t keep writing forward-looking pieces about AI governance without also explaining where the ideas came from. They didn’t show up in February. They came from watching engineers try to absorb new tooling while keeping regulatory commitments, audit trails and production reliability intact. That’s the blog I didn’t write while I was living it.&lt;/p&gt;

&lt;p&gt;So I’m going to write it now, in retrospect. This retrospective won’t read like greatest hits. Several of these posts are about things that didn’t work. A couple are about decisions I’d make differently today. I’m not trying to stack up wins. I want to show the actual path from doing enterprise governance to building AI governance infrastructure, because that path is shorter than most people think, and a lot of engineers are walking it right now without realizing it.&lt;/p&gt;

&lt;p&gt;If you’re one of them, this series is for you.&lt;/p&gt;




&lt;p&gt;Todd Linnertz is the creator of AIEOS, an open-source AI governance system for software delivery teams. Find him at &lt;a href="https://devopsdiary.blog" rel="noopener noreferrer"&gt;devopsdiary.blog&lt;/a&gt; and &lt;a href="//github.com/wtlinnertz"&gt;github.com/wtlinnertz&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>platformengineering</category>
      <category>ai</category>
      <category>career</category>
    </item>
    <item>
      <title>AI Doesn't Fix Your Development Problems. It Accelerates Them.</title>
      <dc:creator>Todd Linnertz</dc:creator>
      <pubDate>Tue, 07 Apr 2026 12:21:50 +0000</pubDate>
      <link>https://dev.to/todd_linnertz_871a076f68e/ai-doesnt-fix-your-development-problems-it-accelerates-them-3m4o</link>
      <guid>https://dev.to/todd_linnertz_871a076f68e/ai-doesnt-fix-your-development-problems-it-accelerates-them-3m4o</guid>
      <description>&lt;p&gt;I've watched the same failure pattern play out across every technology wave of my career.&lt;/p&gt;

&lt;p&gt;Team gets a new tool that promises to change everything. Productivity numbers go up. Everyone celebrates. Six months later, they're drowning in the same late-stage rework they were drowning in before. Just more of it, arriving faster.&lt;/p&gt;

&lt;p&gt;I saw it with CASE tools in the nineties. With offshore development in the 2000s. With Agile transformations in the 2010s. With DevOps automation in the 2020s.&lt;/p&gt;

&lt;p&gt;AI code generation is the most powerful version of this pattern I've ever seen. And most engineering organizations are walking straight into it.&lt;/p&gt;

&lt;h2&gt;The Illusion Looks Like This&lt;/h2&gt;

&lt;p&gt;Your team adopts GitHub Copilot or a similar tool. A developer asks it to implement a user authentication module. In forty seconds, it produces three hundred lines of code, complete with error handling, tests and documentation comments.&lt;/p&gt;

&lt;p&gt;It looks like progress. It genuinely feels like the future.&lt;/p&gt;

&lt;p&gt;Most teams never stop to ask whether the spec for that authentication module was unambiguous.&lt;/p&gt;

&lt;p&gt;Because if the acceptance criteria were vague, if the security requirements weren't spelled out, if the integration assumptions weren't documented, you didn't just get a module in forty seconds. You got a module built on a foundation of ambiguity in forty seconds. The rework that's coming is exactly the same size it would have been without AI, compressed into a shorter timeline, with more generated code to sort through.&lt;/p&gt;

&lt;p&gt;This is what I mean when I say AI accelerates the appearance of progress while the underlying causes of late-stage rework remain unchanged.&lt;/p&gt;

&lt;h2&gt;The Real Source of the Problem&lt;/h2&gt;

&lt;p&gt;Late-stage rework has never been caused by slow typing.&lt;/p&gt;

&lt;p&gt;After five companies and more failed projects than I can count, I can say this with confidence: rework happens because of &lt;em&gt;process failures&lt;/em&gt;, not &lt;em&gt;speed deficits&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The real culprits are consistent:&lt;/p&gt;

&lt;p&gt;Ambiguous specifications that leave developers filling in the blanks with assumptions that won't survive contact with the product team.&lt;/p&gt;

&lt;p&gt;Unstable upstream artifacts. The architecture document that's still being revised while the engineering team is implementing against it.&lt;/p&gt;

&lt;p&gt;No separation between generation and judgment. The same person (or tool) that produces the artifact is asked to validate it. The result is rationalization, not evaluation.&lt;/p&gt;

&lt;p&gt;Missing governance at handoff points. Work flows from planning to design to implementation with no formal freeze points and no immutable record of what was decided and when.&lt;/p&gt;

&lt;p&gt;These process failures predate AI by decades. I saw every one of them long before anyone had a code assistant. What AI does is make them faster, and worse. When a developer could only produce two hundred lines of code per day, bad process produced two hundred lines of rework per day. When AI can produce two thousand lines of code per day, bad process produces two thousand lines of rework per day.&lt;/p&gt;

&lt;p&gt;The throughput multiplied. The problem did not diminish.&lt;/p&gt;

&lt;h2&gt;What Most Teams Do About It&lt;/h2&gt;

&lt;p&gt;Most teams respond to this by trying to write better prompts.&lt;/p&gt;

&lt;p&gt;That's the wrong level of the problem. Better prompts improve the quality of AI output within a session. They do nothing about the structural issues that make that output drift, conflict with upstream decisions, or fail validation three weeks later.&lt;/p&gt;

&lt;p&gt;Some teams add code review. That helps at the implementation level, but it doesn't address the artifact chain. AI-generated architecture documents, PRDs, and design specifications have the same ambiguity problem as AI-generated code, and often create it earlier in the cycle where the blast radius is larger.&lt;/p&gt;

&lt;p&gt;The instinct to treat AI governance as a prompt engineering problem is understandable. Prompt engineering is visible and immediate. The structural failures that cause rework aren't. They hide until you're already underwater.&lt;/p&gt;

&lt;h2&gt;What Actually Fixes It&lt;/h2&gt;

&lt;p&gt;After watching the same failure patterns repeat, and then watching them accelerate as my teams started adopting AI tooling, I concluded that the fix requires three structural changes, none of which are about prompting.&lt;/p&gt;

&lt;p&gt;Treat AI as a generation engine, not a decision-maker. AI is extraordinarily good at producing artifacts: code, documentation, architecture drafts, test plans. It is not good at determining whether those artifacts are correct relative to upstream decisions it may not fully understand. The organizations that get this right separate generation (what AI does) from judgment (what humans and structured validators do). These are different activities and they need different infrastructure.&lt;/p&gt;

&lt;p&gt;Freeze artifacts before downstream work begins. An architecture document that can change while engineering is implementing against it is a liability, plain and simple. Frozen artifacts create an immutable record of what was decided. When something downstream breaks, you know whether the upstream artifact shifted or whether the implementation deviated. Without freeze semantics, this is guesswork.&lt;/p&gt;
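
&lt;p&gt;To make freeze semantics concrete, here is a minimal sketch that uses a content hash as the immutable record. This is an illustration under my own assumptions, not how AIEOS implements it; &lt;code&gt;FrozenArtifact&lt;/code&gt;, &lt;code&gt;freeze&lt;/code&gt; and &lt;code&gt;upstream_shifted&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenArtifact:
    name: str
    content: str
    sha256: str


def freeze(name: str, content: str) -> FrozenArtifact:
    """Record the artifact's content and its hash at freeze time."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return FrozenArtifact(name, content, digest)


def upstream_shifted(frozen: FrozenArtifact, current_content: str) -> bool:
    """True if the upstream document no longer matches the frozen record."""
    current = hashlib.sha256(current_content.encode("utf-8")).hexdigest()
    return current != frozen.sha256
```

&lt;p&gt;When something breaks downstream, a check like &lt;code&gt;upstream_shifted&lt;/code&gt; answers the first question: did the upstream artifact move, or did the implementation deviate?&lt;/p&gt;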

&lt;p&gt;Make validation produce verdicts, not suggestions. When you ask an AI to review its own output, it will find ways to explain why what it generated is reasonable. That's rationalization, not validation. Real validation produces a binary result: the artifact meets the required criteria, or it doesn't. Anything softer than that is a governance gap dressed up as a process.&lt;/p&gt;
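
&lt;p&gt;A sketch of what a verdict-producing validator might look like, assuming criteria can be written as explicit checks that run separately from whatever generated the artifact. The &lt;code&gt;Verdict&lt;/code&gt;, &lt;code&gt;Criterion&lt;/code&gt; and &lt;code&gt;validate&lt;/code&gt; names are hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"


@dataclass(frozen=True)
class Criterion:
    name: str
    check: Callable[[str], bool]


def validate(artifact: str, criteria: list) -> tuple:
    """Return a binary verdict plus the names of any failed criteria."""
    failed = [c.name for c in criteria if not c.check(artifact)]
    verdict = Verdict.PASS if not failed else Verdict.FAIL
    return verdict, failed
```

&lt;p&gt;The failed-criteria list explains a FAIL, but it never softens it: the result is still pass or fail.&lt;/p&gt;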

&lt;p&gt;At a previous company, I inherited four operational workflows where the same rework patterns were burning cycles everywhere. We didn't buy new tools or speed anything up. We restructured the handoffs and built validation into each transition point. Defect rates dropped 50%. Throughput improved between 35 and 57 percent across all four areas. None of that came from faster tooling. All of it came from fixing the process around the work.&lt;/p&gt;

&lt;p&gt;These aren't novel ideas. They're the same principles that make CI/CD pipelines reliable: automated gates, immutable artifacts, clear separation of build and deploy. The insight is that they apply just as well to AI-assisted software delivery as they do to code deployment pipelines. Maybe more so.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftb7d2hjdf0txefns5niv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftb7d2hjdf0txefns5niv.png" alt="AI Governance Flow" width="800" height="351"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The difference is structure around the generation.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;The Framework I Built&lt;/h2&gt;

&lt;p&gt;When I led GitOps adoption at my current company, the technology was the easy part. Getting architecture review board approval, building deployment standards and creating the governance structure that let teams adopt safely took ten times longer. The teams that tried to skip the governance stalled out. The ones that went through it shipped to production. That experience confirmed something I already suspected: the structure around adoption matters more than the tool being adopted.&lt;/p&gt;

&lt;p&gt;In early 2026, I formalized these ideas into an open-source framework called AIEOS (AI-Enabled Operating System).&lt;/p&gt;

&lt;p&gt;AIEOS structures how engineering artifacts are produced, validated and connected across the full software development lifecycle when AI is involved in generating them. It's built across 24 repositories: an 8-layer model covering the full value-delivery cycle from strategic direction through operational diagnostics, a multi-agent orchestration harness and a guided console for running governance workflows.&lt;/p&gt;

&lt;p&gt;The design reflects a simple premise: when AI generates engineering artifacts, the quality of the output depends on the quality of the structure around it. Better prompts help. Better governance infrastructure is what makes the results repeatable, auditable and trustworthy at scale.&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/wtlinnertz" rel="noopener noreferrer"&gt;github.com/wtlinnertz&lt;/a&gt;. It's open source, and the rest of this series will dig into how it's designed and why.&lt;/p&gt;

&lt;h2&gt;What's Coming in This Series&lt;/h2&gt;

&lt;p&gt;Over the coming posts in this series, I'll cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The eight questions every AI-assisted engineering team must be able to answer and how they map to a governance architecture&lt;/li&gt;
&lt;li&gt;The three non-negotiable rules for trustworthy AI-generated code&lt;/li&gt;
&lt;li&gt;What DevOps taught me about AI governance (and why that background is an advantage)&lt;/li&gt;
&lt;li&gt;Inside AIEOS: how multi-agent orchestration runs governance workflows&lt;/li&gt;
&lt;li&gt;AI governance in financial services and why the compliance context changes everything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've been watching AI tooling arrive in your organization and wondering why the rework isn't going away, this series is for you.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Todd Linnertz is a Senior Technology Leader with deep experience in enterprise architecture and DevOps. He is the creator of AIEOS, an open-source AI governance system for software delivery teams. Find him at &lt;a href="https://devopsdiary.blog" rel="noopener noreferrer"&gt;devopsdiary.blog&lt;/a&gt; and &lt;a href="https://github.com/wtlinnertz" rel="noopener noreferrer"&gt;github.com/wtlinnertz&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>governance</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
