<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Keith MacKay</title>
    <description>The latest articles on DEV Community by Keith MacKay (@keithjmackay).</description>
    <link>https://dev.to/keithjmackay</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1499463%2Fdc7d3636-8482-4d6e-b619-fb4367cf4dfd.jpg</url>
      <title>DEV Community: Keith MacKay</title>
      <link>https://dev.to/keithjmackay</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/keithjmackay"/>
    <language>en</language>
    <item>
      <title>Many Are Building Cathedrals on Quicksand</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Thu, 18 Jun 2026 22:40:16 +0000</pubDate>
      <link>https://dev.to/keithjmackay/many-are-building-cathedrals-on-quicksand-1geo</link>
      <guid>https://dev.to/keithjmackay/many-are-building-cathedrals-on-quicksand-1geo</guid>
      <description>&lt;h1&gt;
  
  
  Many Are Building Cathedrals on Quicksand
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;The foundations of AI development shift every quarter. These are the architectural choices that outlast the churn.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Medieval cathedrals were designed to outlast their builders. The architects who laid the first stones at Notre-Dame knew they'd never see it finished. They planned in centuries.&lt;/p&gt;

&lt;p&gt;We're doing the opposite. We're building software on foundations that shift every quarter, with vendor relationships that treat genuinely competitive commercial providers as neutral infrastructure, and with code that hard-codes behaviors that will be deprecated before the next sprint cycle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-4 was state of the art in early 2023. By late 2024, it was middle of the pack [1].&lt;/strong&gt; Entire startups built on specific model behaviors woke up to find their core assumption was gone. Not wrong. Not deprecated with a migration guide. Just: gone, or quietly changed, or superseded by something so different the old prompts didn't work anymore.&lt;/p&gt;

&lt;p&gt;That's the terrain we're traversing as leaders.&lt;/p&gt;

&lt;p&gt;The question isn't whether the ground will shift. It's whether your architecture can handle it when it does.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem with Betting on a Foundation That's Still Being Poured
&lt;/h2&gt;

&lt;p&gt;Here's what the past several years have looked like from where I sit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2022: GPT-3 was the obvious choice. Build on it.&lt;/li&gt;
&lt;li&gt;2023: GPT-4 changes everything. Rebuild or fall behind.&lt;/li&gt;
&lt;li&gt;2023 (late): Claude 2, open-source models, local inference. Suddenly the answer wasn't obvious.&lt;/li&gt;
&lt;li&gt;2024: GPT-4o, Claude 3 Opus, Gemini Ultra, Llama 3. All competitive. All different.&lt;/li&gt;
&lt;li&gt;2025: Reasoning models, multimodal, agents. The architecture question gets much harder.&lt;/li&gt;
&lt;li&gt;2026: Tools and harnesses are maturing, workflows are settling, swarms are better at parallelizing tasks, and teams are beginning to think about tokenomics. The model is becoming a commodity -- local open-source models are much closer to frontier model capabilities. China's coordination across its AI ecosystem is showing real gains against the US AI ecosystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Every one of those transitions created winners and losers, and the losers were almost always the teams that had built the most tightly-coupled solutions to a specific model's API.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not because those teams were bad engineers. Because they were optimizing for the wrong thing. They were building for today's foundation instead of building for foundation-change.&lt;/p&gt;

&lt;p&gt;The deprecation notices tell the story. Anthropic's stated minimum notice window before a model is retired is 60 days -- and several recent models have hit exactly that floor [2]. Claude Sonnet 4 and Claude Opus 4 went from launch to complete retirement in under a year. OpenAI's entire Assistants API product -- a structural foundation many teams built on -- is being removed in August 2026, requiring a complete migration to the Responses API [3]. This isn't a deprecation. It's a teardown with a deadline.&lt;/p&gt;

&lt;p&gt;The release pace compounds it. Frontier model releases arrived roughly once every 37 days in 2023. By 2026, the interval had compressed to roughly every 11 days [4]. The ground doesn't just move. It moves faster every year, every quarter, every month, every week.&lt;/p&gt;

&lt;p&gt;The cloud-native movement figured this out the hard way a decade ago. The teams that won didn't write code that assumed AWS and only AWS forever. They wrote code that treated AWS as a utility, abstracted behind interfaces they controlled, using APIs that could accommodate hybrid cloud environments. In practice, that meant containerized applications, database abstraction layers, and vendor-agnostic infrastructure-as-code where possible. The payoff shows up in the mergers-and-acquisitions deals I see: limiting acquisition targets to companies on the same cloud provider as the buyer is rarely an acceptable constraint.&lt;/p&gt;

&lt;p&gt;Same lesson. Different decade. Somehow we're learning it again from scratch. What's old becomes new again.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Actually Changes vs. What Stays Stable
&lt;/h2&gt;

&lt;p&gt;A simple mental model is useful here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Some concepts in AI (or any broad technology category) are stable. Some are not. Your architecture should only hard-code the stable ones.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stable: tokens, attention mechanisms, context windows as a concept, embeddings as a concept, the basic prompt-completion pattern, retrieval-augmented generation as an approach to prompt augmentation.&lt;/p&gt;

&lt;p&gt;Unstable: specific API parameters, model-specific prompt formats, context window sizes (they keep growing, though the maximum usable window for predictable results has not grown much yet), pricing structures, rate limits, specific model behaviors that aren't documented as guarantees, fine-tuning APIs, function-calling syntax.&lt;/p&gt;

&lt;p&gt;When engineers hard-code model-specific behaviors into business logic, they're writing code with an unknown (but near-certain-to-happen) expiration date. However, if they abstract those behaviors behind interfaces their team controls, they're buying themselves optionality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optionality is the actual product you're building when you build model-agnostic infrastructure.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One concrete example: prompt templates. Teams that wrote prompts directly into application code, formatted specifically for GPT-4's preferred patterns, had real migration work to do when they needed to switch. Teams that externalized prompts into configuration, with a thin layer that could reformat them per model, had a much easier time. Same underlying logic. Very different operational posture.&lt;/p&gt;
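&lt;p&gt;A minimal sketch of that thin layer, under stated assumptions: the prompt spec, the two family names, and the two output formats below are hypothetical illustrations, not any provider's documented preference. The point is only that application code calls one function while the per-family formatting lives in one place.&lt;/p&gt;

```python
import json

# Hypothetical prompt spec. In practice this would live in a version-controlled
# config file (JSON, YAML, ...) rather than inline in application code.
PROMPT_SPEC = {
    "name": "summarize_ticket",
    "instructions": "Summarize the support ticket in two sentences.",
    "inputs": ["ticket_text"],
}


def render_sectioned(spec, values):
    """Plain-text sections with headings (assumed format for 'family_a')."""
    lines = ["## Instructions", spec["instructions"]]
    for key in spec["inputs"]:
        lines += ["## " + key, values[key]]
    return "\n".join(lines)


def render_json(spec, values):
    """JSON-delimited prompt (assumed format for 'family_b')."""
    payload = {"instructions": spec["instructions"]}
    payload.update({key: values[key] for key in spec["inputs"]})
    return json.dumps(payload, sort_keys=True)


# The only place in the codebase that knows about model families.
RENDERERS = {"family_a": render_sectioned, "family_b": render_json}


def render_prompt(spec, values, family):
    """Validate inputs, then format the same prompt for the chosen family."""
    missing = [key for key in spec["inputs"] if key not in values]
    if missing:
        raise ValueError("missing prompt inputs: " + ", ".join(missing))
    return RENDERERS[family](spec, values)


print(render_prompt(PROMPT_SPEC, {"ticket_text": "Login fails."}, "family_a"))
print(render_prompt(PROMPT_SPEC, {"ticket_text": "Login fails."}, "family_b"))
```

&lt;p&gt;Supporting a new model family then means adding one renderer and one dictionary entry; the business logic that calls &lt;code&gt;render_prompt&lt;/code&gt; does not change.&lt;/p&gt;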




&lt;h2&gt;
  
  
  The Vendor Lock-In Problem (Again)
&lt;/h2&gt;

&lt;p&gt;OpenAI, Anthropic, and Google are not neutral infrastructure providers.&lt;/p&gt;

&lt;p&gt;I don't say that to be critical of any of them. They're building remarkable technology. But &lt;strong&gt;they have commercial interests, competitive pressures, and strategic priorities that are not aligned with your need for stable, predictable infrastructure.&lt;/strong&gt; Treating them like AWS S3 is strategically naive.&lt;/p&gt;

&lt;p&gt;AWS S3 has maintained complete API backward compatibility since its 2006 launch -- twenty years. Their own 20th-anniversary post states it plainly: "the code you wrote for S3 in 2006 still works today, unchanged" [5]. That's because AWS built S3 as durable utility infrastructure, and their business model depends on your data staying there.&lt;/p&gt;

&lt;p&gt;The frontier model providers are in a race. They're iterating fast because they have to. They're changing behaviors, deprecating models, shifting pricing, and launching new capabilities on a schedule that serves their competitive position, not your deployment stability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The teams treating AI providers as utilities are building abstraction layers they control.&lt;/strong&gt; This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An LLM gateway or router that sits between your application and the model providers.&lt;/li&gt;
&lt;li&gt;A model-agnostic interface that lets you swap the underlying model without touching business logic.&lt;/li&gt;
&lt;li&gt;Evaluation frameworks that work across models so you can make the switch decision on evidence instead of intuition.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For one example, see &lt;a href="https://github.com/keithmackay/modelrouter" rel="noopener noreferrer"&gt;&lt;strong&gt;https://github.com/keithmackay/modelrouter&lt;/strong&gt;&lt;/a&gt;, a router with a model-agnostic interface that Claude and I wrote in Rust. It includes budget controls for individuals, teams, and projects; built-in OTEL observability; hooks for adding DLP, evals, or other checks; and full command-line admin capability for automation or integration.&lt;/p&gt;

&lt;p&gt;The emergence of MCP (Model Context Protocol) is itself evidence the industry arrived at this conclusion independently. Anthropic introduced MCP in November 2024 and donated it to the Linux Foundation for vendor-neutral governance in December 2025, by which point it had 97 million monthly SDK downloads and had been adopted by OpenAI, Google DeepMind, and Microsoft [6]. MCP standardizes how AI models connect to external tools and data sources. That's real and useful. But what does it solve? MCP addresses tool integration portability. It doesn't standardize prompt behavior, context handling, reasoning model APIs, or deprecation schedules. The abstraction layer that sits between your application and which model handles a request still needs to be built by your team.&lt;/p&gt;

&lt;p&gt;The teams that haven't figured this out yet are the ones where switching providers means a multi-month engineering project. That's not a technical problem. It's an architectural choice that's going to compound.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Case for Staying Deep
&lt;/h2&gt;

&lt;p&gt;Before you build the abstraction layer, know what you're giving up.&lt;/p&gt;

&lt;p&gt;Claude responds better to XML-structured prompts. GPT-4.x responds better to JSON-delimited instructions. Gemini handles context differently. When you write prompts to the lowest common denominator across models, or maintain per-model variants (which is just hidden coupling), you're trading optimization headroom for portability.&lt;/p&gt;

&lt;p&gt;Each gateway hop also adds latency. For real-time voice interfaces or sub-200ms UX targets, that overhead is a real engineering constraint, not a theoretical one [7].&lt;/p&gt;

&lt;p&gt;And there's a perverse argument from pricing history. GPT-4 tokens fell roughly 9x in 17 months -- from $30/million input tokens at launch to around $3/million by late 2024, without any migration required [8]. Teams that stayed deep on GPT-4 during that period captured the cost compression at zero switching cost. The question is whether the next price move works in your favor, and whether you can afford to wait.&lt;/p&gt;
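&lt;p&gt;To put that compression on an annual footing, here is a quick back-of-the-envelope calculation. It uses only the 9x and 17-month figures quoted above and assumes a steady geometric decline, which is a simplification; real prices dropped in steps.&lt;/p&gt;

```python
# Figures quoted above: roughly a 9x price drop over 17 months.
price_ratio = 9.0
months = 17

# Convert to an equivalent 12-month factor, assuming a steady
# (geometric) decline over the period. Real pricing moved in steps.
annual_factor = price_ratio ** (12 / months)
annual_decline = 1 - 1 / annual_factor

print(f"{annual_factor:.1f}x cheaper per year, a {annual_decline:.0%} annual decline")
```

&lt;p&gt;Under that assumption the drop works out to roughly 4.7x per year, or a price decline of about 79% annually.&lt;/p&gt;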

&lt;p&gt;&lt;strong&gt;The model-agnostic argument isn't "abstraction layers have no cost." They do.&lt;/strong&gt; The argument is that the cost of unplanned migration without them &lt;em&gt;is higher -- and that the migration event is inevitable&lt;/em&gt;. You're only choosing whether you're ready for it. Given that Anthropic's minimum deprecation notice is 60 days and OpenAI's Assistants API is disappearing entirely, "we'll deal with it when we need to" is a plan that has already failed for a lot of teams.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Model-Agnostic Architecture Looks Like
&lt;/h2&gt;

&lt;p&gt;You don't need to over-engineer this. The goal is the right abstraction layers, not abstraction for its own sake.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The LLM gateway layer.&lt;/strong&gt; A single internal service that all your AI calls go through. It handles routing, rate limiting, cost tracking, model selection, and failover. Your application code doesn't know or care whether it's talking to GPT-4o or Claude 3.5 or a local Llama deployment. That's the gateway's job. Teams that built this early have a meaningful operational advantage right now. The market recognized this fast: LiteLLM, the most widely deployed open-source LLM proxy, has proxied over a billion requests and has nearly 48,000 GitHub stars as of mid-2026 [9]. Gartner predicts that by 2028, 70% of organizations building multi-LLM applications will use AI gateway capabilities -- up from less than 5% in 2024 [10].&lt;/p&gt;
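&lt;p&gt;As a rough illustration of the shape of such a gateway, here is a minimal in-process sketch covering only ordered failover and per-provider cost tracking. The names &lt;code&gt;Provider&lt;/code&gt;, &lt;code&gt;Gateway&lt;/code&gt;, and &lt;code&gt;ProviderError&lt;/code&gt; are hypothetical, the flat per-call cost is a placeholder rather than real pricing, and the two stub functions stand in for real provider adapters.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable


class ProviderError(Exception):
    """Raised by a provider adapter when a call fails."""


@dataclass
class Provider:
    name: str
    call: Callable[[str], str]   # adapter: prompt in, completion out
    cost_per_call: float = 0.0   # illustrative flat price, not real pricing


@dataclass
class Gateway:
    providers: list              # ordered by preference
    spend: dict = field(default_factory=dict)

    def complete(self, prompt: str) -> str:
        """Try each provider in order; record spend for the one that answers."""
        errors = []
        for provider in self.providers:
            try:
                result = provider.call(prompt)
            except ProviderError as exc:
                errors.append(provider.name + ": " + str(exc))
                continue         # fail over to the next provider
            previous = self.spend.get(provider.name, 0.0)
            self.spend[provider.name] = previous + provider.cost_per_call
            return result
        raise ProviderError("all providers failed: " + "; ".join(errors))


# Stub adapters standing in for real provider clients.
def flaky(prompt):
    raise ProviderError("rate limited")


def stable(prompt):
    return "echo: " + prompt


gateway = Gateway([Provider("primary", flaky, 0.02), Provider("fallback", stable, 0.01)])
print(gateway.complete("hello"))
print(gateway.spend)
```

&lt;p&gt;Application code only ever sees &lt;code&gt;gateway.complete&lt;/code&gt;. A production gateway would add rate limiting, per-request model selection, streaming, and token-based cost accounting, but those all sit behind the same single entry point.&lt;/p&gt;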

&lt;p&gt;&lt;strong&gt;Prompt portability.&lt;/strong&gt; Externalize your prompts. Version control them separately from your application code. Build a thin translation layer that can reformat them for different model families. This sounds like overhead until the day you need to migrate, and then it sounds like foresight [11].&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model-agnostic evaluation.&lt;/strong&gt; How do you know if the new model is better for your use case? You need evals that aren't written assuming a specific model's output format. Build your quality benchmarks against the behavior you actually care about, not against what GPT-4 happened to produce [12].&lt;/p&gt;
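&lt;p&gt;One way to sketch this: score each model on whether its output contains the facts you care about, after normalizing away formatting differences. The cases and the substring check below are deliberately simplistic, hypothetical placeholders; a real eval suite would use task-specific graders. Here a "model" is just any function from prompt to text.&lt;/p&gt;

```python
import re


def normalize(text):
    """Collapse whitespace and case, which vary by model but not by meaning."""
    return re.sub(r"\s+", " ", text).strip().lower()


# Hypothetical eval cases: each lists facts the answer must contain.
CASES = [
    {"prompt": "What is the capital of France?", "must_include": ["paris"]},
    {"prompt": "Name the first two primes.", "must_include": ["2", "3"]},
]


def score(model, cases):
    """Return the fraction of cases whose output contains every required fact."""
    passed = 0
    for case in cases:
        output = normalize(model(case["prompt"]))
        if all(fact in output for fact in case["must_include"]):
            passed += 1
    return passed / len(cases)


# Two stub "models" with different output styles.
def terse_model(prompt):
    answers = {
        "What is the capital of France?": "Paris",
        "Name the first two primes.": "2 and 3",
    }
    return answers[prompt]


def verbose_model(prompt):
    return "  The answer is:\n PARIS "


print("terse:", score(terse_model, CASES))
print("verbose:", score(verbose_model, CASES))
```

&lt;p&gt;Because the scorer never inspects output structure, the same cases can be run against any candidate model, and the switch decision rests on the resulting scores rather than on how closely a new model mimics the old one's formatting.&lt;/p&gt;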

&lt;p&gt;&lt;strong&gt;Avoid the model-specific feature trap.&lt;/strong&gt; Every frontier model has features that look compelling and are only available on that model. Some of them are worth using. But every time you build a core business capability on a feature that's only available from one provider, you're writing a ransom note to yourself.&lt;/p&gt;

&lt;p&gt;The test: if Anthropic or OpenAI raised prices by 5x tomorrow, how long would it take you to switch? If the answer is more than a few weeks, you've got architectural debt that's quietly accumulating.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Organizations Getting This Right
&lt;/h2&gt;

&lt;p&gt;They have a few things in common.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They treat AI infrastructure like they treat cloud infrastructure: with abstraction layers, provider diversity, and a clear strategy for avoiding single-vendor dependency. They're not anti-any-vendor. They're pro-optionality.&lt;/li&gt;
&lt;li&gt;They invest in internal capability around the stable concepts: understanding embeddings, retrieval, context management, evaluation frameworks. &lt;strong&gt;The engineers who understand why things work are much less disrupted by changes in how things work.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;They run structured experiments when new models arrive rather than either ignoring them or immediately migrating. They have the evaluation infrastructure to make that decision on evidence. They know which models perform better for which task types in their specific context, not just what the benchmarks say.&lt;/li&gt;
&lt;li&gt;And they're honest about the tradeoff. Model-agnostic architecture has real costs. It's more engineering work upfront. Some model-specific optimizations aren't available through abstraction layers. The organizations doing this well accept those costs explicitly, because they've done the math on the alternative.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;We are early in a long infrastructure transition.&lt;/strong&gt; The foundational models will keep changing. The toolchains will keep evolving. The vendors will keep competing, which means the landscape will keep shifting.&lt;/p&gt;

&lt;p&gt;The cathedral builders who got it right designed for the long arc. Stone that could be added to. Foundations that could bear weight they couldn't yet imagine. Architecture that survived the deaths of the architects.&lt;/p&gt;

&lt;p&gt;You can't build that if you're optimizing only for today's model and today's API.&lt;/p&gt;

&lt;p&gt;The teams that will look smart in three years are building abstraction layers now. They're externalizing configuration, investing in evals, treating vendors as utilities, and developing engineers who understand the stable underlying concepts instead of just the current API.&lt;/p&gt;

&lt;p&gt;The quicksand is real, but it has a texture that experienced developers recognize. You can learn the signs and build on pylons rather than just hoping the ground holds.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's your current AI infrastructure posture: utility abstraction or single-vendor deep integration? Have you had to migrate yet? I'm curious what the migration cost looked like in practice. Share your experience in the comments.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://chat.lmsys.org" rel="noopener noreferrer"&gt;LMSYS Chatbot Arena Leaderboard (ongoing model ranking data)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.claude.com/docs/en/docs/resources/model-deprecations" rel="noopener noreferrer"&gt;Anthropic Model Deprecations (official documentation)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://presenc.ai/research/ai-model-deprecation-tracker-2026" rel="noopener noreferrer"&gt;AI Model Deprecation Tracker 2026 -- Presenc AI (OpenAI Assistants API removal, August 2026)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalapplied.com/blog/frontier-model-release-velocity-index-q2-2026" rel="noopener noreferrer"&gt;Frontier Model Release Velocity Index Q2 2026 -- DigitalApplied&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/aws/twenty-years-of-amazon-s3-and-building-whats-next/" rel="noopener noreferrer"&gt;Twenty Years of Amazon S3 and Building What's Next -- AWS News Blog (March 2026)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalapplied.com/blog/mcp-adoption-statistics-2026-model-context-protocol" rel="noopener noreferrer"&gt;MCP Adoption Statistics 2026: Model Context Protocol -- DigitalApplied&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.12465" rel="noopener noreferrer"&gt;The Hidden Costs of LLM Inference: Latency and Abstraction Layer Overhead -- arXiv:2603.12465&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://x.com/AndrewYNg/status/1829190549842321758" rel="noopener noreferrer"&gt;Andrew Ng on GPT-4o token price drop: 9x in 17 months (August 2024)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/BerriAI/litellm" rel="noopener noreferrer"&gt;LiteLLM -- open-source LLM gateway (GitHub)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.flotorch.ai/blogs/llm-gateway-comparison-2026" rel="noopener noreferrer"&gt;LLM Gateway Comparison 2026 (citing Gartner: 70% of organizations building multi-LLM apps will use AI gateway capabilities by 2028) -- FloTorch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://a16z.com/emerging-architectures-for-llm-applications/" rel="noopener noreferrer"&gt;Emerging Architectures for LLM Applications -- Matt Bornstein and Rajko Radovanovic, a16z (June 20, 2023)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://martinfowler.com/bliki/StranglerFigApplication.html" rel="noopener noreferrer"&gt;Strangler Fig Application (pattern for incremental migration) -- Martin Fowler&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;strong&gt;If this resonated, here are some related articles and resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;modelrouter - an open-source, Rust-based model router with OTEL observability, token-cost budget controls, and command-line control of all features for automated admin: &lt;a href="https://github.com/keithmackay/modelrouter" rel="noopener noreferrer"&gt;https://github.com/keithmackay/modelrouter&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For the cost side of the infrastructure equation -- why AI infrastructure scarcity is driving up spend even as model prices fall, and why ROI still wins: &lt;a href="https://www.linkedin.com/pulse/ai-infrastructure-scarcity-raising-costs-usage-still-provide-mackay-y2hce/" rel="noopener noreferrer"&gt;On LinkedIn: AI Infrastructure Scarcity is Raising Costs, but AI Usage Will Still Provide Unbeatable ROI&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/ai-infrastructure-scarcity-is-raising" rel="noopener noreferrer"&gt;On Substack&lt;/a&gt; | &lt;a href="https://medium.com/@keithwrites/ai-infrastructure-scarcity-is-raising-costs-but-ai-usage-will-still-provide-unbeatable-roi-3194d3178132" rel="noopener noreferrer"&gt;On Medium&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For how the abstraction shift changes what developers actually build and maintain -- when the code is written for other AI systems to read, not humans: &lt;a href="https://www.linkedin.com/pulse/when-ai-stops-writing-code-humans-keith-mackay-8y37e" rel="noopener noreferrer"&gt;On LinkedIn: When AI Stops Writing Code for Humans&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/when-ai-stops-writing-code-for-humans" rel="noopener noreferrer"&gt;On Substack&lt;/a&gt; | &lt;a href="https://medium.com/@keithwrites/when-ai-stops-writing-code-for-humans-b55eb5e8922a" rel="noopener noreferrer"&gt;On Medium&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For a look at which competitive advantages actually survive when AI commoditizes the software layer -- directly connected to the "optionality is the product" argument here: &lt;a href="https://www.linkedin.com/pulse/software-moats-age-ai-whats-actually-defensible-keith-mackay-ibsde" rel="noopener noreferrer"&gt;On LinkedIn: Software Moats in the Age of AI: What's Actually Defensible?&lt;/a&gt; | &lt;a href="https://medium.com/@keithwrites/software-moats-in-the-age-of-ai-whats-actually-defensible-698d4433d61e" rel="noopener noreferrer"&gt;On Medium&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For how the return to specification-driven development mirrors the architectural discipline this article argues for: &lt;a href="https://www.linkedin.com/pulse/irony-ai-development-how-context-engineering-taking-us-keith-mackay-g47fe" rel="noopener noreferrer"&gt;On LinkedIn: The Irony of AI Development: How Context Engineering Is Taking Us Back to Waterfall&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/the-irony-of-ai-development" rel="noopener noreferrer"&gt;On Substack&lt;/a&gt; | &lt;a href="https://medium.com/@keithwrites/the-irony-of-ai-development-how-context-engineering-is-taking-us-back-to-waterfall-7b6a06044c6b" rel="noopener noreferrer"&gt;On Medium&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology, with Claude Code and Codex as AI collaborators.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>infrastructure</category>
      <category>technicaldebt</category>
    </item>
    <item>
      <title>Why Your AI Transformation Is Stalling at Middle Management</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Wed, 17 Jun 2026 20:34:21 +0000</pubDate>
      <link>https://dev.to/keithjmackay/why-your-ai-transformation-is-stalling-at-middle-management-1lh</link>
      <guid>https://dev.to/keithjmackay/why-your-ai-transformation-is-stalling-at-middle-management-1lh</guid>
      <description>&lt;h1&gt;
  
  
  Why Your AI Transformation Is Stalling at Middle Management
&lt;/h1&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The C-suite is excited. The developers are excited. The people in between have misaligned incentives.&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Your CEO came back from Davos on fire about AI. Your engineers are already using Cursor and Claude and Codex and have been for months. You've got a board slide about "AI-first strategy" and a consulting firm has been paid good money to produce a roadmap.&lt;/p&gt;

&lt;p&gt;And yet. Nothing is moving.&lt;/p&gt;

&lt;p&gt;The research confirms this is not just your company. PwC's 2026 Global CEO Survey — 4,454 CEOs across 95 countries — found that &lt;strong&gt;56% report AI has delivered neither higher revenues nor lower costs over the past 12 months&lt;/strong&gt; [1]. Only 12% of those CEOs can claim both cost savings and revenue gains. BCG found that 74% of companies have yet to achieve tangible value from AI investments [2]. Gartner had predicted that 30% of generative AI projects would be abandoned after proof of concept by end of 2025 [3] — and the evidence suggests they were right.&lt;/p&gt;

&lt;p&gt;Meanwhile, 88% of organizations say they use AI in at least one function [4]. Sounds impressive until you read the footnote: only 33% have begun scaling AI across the enterprise, and just 6% qualify as actual high performers — organizations where AI measurably moves EBIT. Goldman Sachs, in March 2026, put the macro picture bluntly: &lt;strong&gt;"We still do not find a meaningful relationship between productivity and AI adoption at the economy-wide level."&lt;/strong&gt; [5]&lt;/p&gt;

&lt;p&gt;The technology works. In specific, well-scoped deployments where companies actually measure results, McKinsey finds median productivity gains of roughly 30% [4]. The problem isn't whether AI can deliver. It's whether organizations can get it out of the pilot phase and into production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The bottleneck isn't technology. It's not budget. It's not even culture in the abstract. It's a specific layer of your organization: directors and VPs who have every incentive to slow this down and just enough power to do it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the frozen middle problem. And it's quietly killing more AI transformations than any technical failure ever will.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Middle Managers Are Specifically Threatened
&lt;/h2&gt;

&lt;p&gt;Before you dismiss this as organizational cynicism, understand the structural reality.&lt;/p&gt;

&lt;p&gt;Middle managers exist to do three things: aggregate information from the front lines, coordinate across functions, and translate between strategy and execution. They are the nervous system of the organization. The people who know what's actually happening and can turn that into action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI is extraordinarily good at all three of those things.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not eventually. Now. Today. LLMs synthesize information across sources. Agentic workflows coordinate handoffs across systems and teams. AI tools translate messy business problems into structured outputs with (almost) zero headcount.&lt;/p&gt;

&lt;p&gt;This isn't a 10-year threat. It's a this-quarter threat. And the directors and VPs who've built careers on being the connective tissue of the organization can feel it, even if they can't articulate it.&lt;/p&gt;

&lt;p&gt;The organizational irony is brutal: &lt;strong&gt;the people most threatened by AI are exactly the ones with the most power to block its adoption.&lt;/strong&gt; They control budget approvals. They set workflow norms. They gate access to systems. They define what "success" looks like for a pilot. They write the performance reviews of the people trying to push AI forward from below.&lt;/p&gt;

&lt;p&gt;This isn't malice. It's self-preservation. And it's completely rational, which makes it hard to fight.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Archetypes You've Already Met
&lt;/h2&gt;

&lt;p&gt;You know these people. You've been in meetings with them. Maybe you've worked for one. There are three patterns that show up reliably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Archetype 1: The Pilot Purgatory Director&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This one is enthusiastic about AI. Genuinely! They'll fund a pilot. They'll sit in the demo. They'll nod along to the business case.&lt;/p&gt;

&lt;p&gt;Then the pilot ends and... nothing happens. Because before we move to production, we need to address a few more edge cases. And now the success criteria have shifted slightly. And actually, we should probably run it in parallel with the existing process for another quarter, just to be sure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pilot lives forever because production means accountability.&lt;/strong&gt; A pilot that doesn't graduate is just a learning experience. A production deployment that underperforms is a career event. Deloitte's research found executives openly acknowledging this: "PoCs using dummy data create false optimism; real data reveals underlying problems" [7].&lt;/p&gt;

&lt;p&gt;So the pilot keeps running. The team that built it gets reassigned. The vendor eventually stops following up. And the director gets to say, truthfully, that they ran an AI pilot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Archetype 2: The "Not My Budget" VP&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The tool costs $50K per year. The business case shows $500K in productivity gains. The math is obvious.&lt;/p&gt;

&lt;p&gt;But: "It's not in this year's budget." Budget planning is in Q3. We'll put it on the list for next year. (It never makes the list.)&lt;/p&gt;

&lt;p&gt;This one is particularly insidious because it's not technically a no. It's a process no. It hides behind real organizational constraints, legitimate ones even, while achieving the same outcome as an outright veto.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every budget cycle adds another year of delay. The bar for "next year's budget" keeps moving.&lt;/strong&gt; The tool that would have transformed the workflow in 2025 is now "table stakes" by the time it gets approved in 2027, and the window of advantage is gone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Archetype 3: The "Security Says No" Deflection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Security concerns are real. Data privacy matters. Governance is important. I am not dismissing any of that.&lt;/p&gt;

&lt;p&gt;But "security said no" as a conversation-ender, with no risk assessment, no proposed controls, no path to resolution, is not a security posture. It's a veto dressed up in compliance language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The tell: when security concerns are raised without any corresponding question of "what would need to be true for this to be approved?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Real security work is solvable. Data classification frameworks, access controls, approved model lists, output auditing: these are engineering problems with engineering solutions. The deflection version skips all of that and lands at "no" without ever visiting the middle.&lt;/p&gt;

&lt;p&gt;When you hear "security says no" with no further texture, you're not looking at a security problem. You're looking at someone using security as a shield.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Organizational Trap
&lt;/h2&gt;

&lt;p&gt;Here's the dynamic that makes this so hard to unblock from below.&lt;/p&gt;

&lt;p&gt;The IC who wants to use AI tools goes to their manager for approval. The manager says not yet. The IC pushes back with a business case. The manager escalates to their VP for a budget exception. The VP says not this quarter. The IC, now months into the process, quietly starts using the free tier of something and not telling anyone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shadow IT is not the failure mode. It's the symptom.&lt;/strong&gt; When AI adoption goes underground, it means the official channels have failed. And underground AI adoption is genuinely riskier than sanctioned adoption, with no governance, no data controls, no visibility. Research suggests 50–60% of workers are already using unsanctioned AI tools [6] — meaning frontline adoption is happening around middle management, not through it.&lt;/p&gt;

&lt;p&gt;The frozen middle creates the exact risk it claims to be preventing. And there's a harder dynamic underneath it: Gartner predicts that by 2026, 20% of organizations will eliminate more than half of their current middle management roles via AI [3]. Middle managers aren't imagining the threat. They're responding rationally to a real one. The organizational irony compounds: the more aggressively the C-suite pushes AI, the stronger the survival instinct of the layer being asked to implement it.&lt;/p&gt;

&lt;p&gt;Meanwhile, the C-suite is issuing mandates and wondering why the roadmap is behind. The consulting firm is writing a follow-up deck about change management. And the directors and VPs are nodding along in the all-hands and then returning to their desks to find another reason the pilot isn't ready for production.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Actually Unblocks This
&lt;/h2&gt;

&lt;p&gt;There are strategies that work. None of them are easy and most of them require someone above the frozen layer to actually care about outcomes, not just optics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Executive sponsorship with teeth, not words.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"The CEO is committed to AI" means nothing if there's no consequence for blocking it. Sponsorship with teeth looks like: a named executive who reviews AI initiative status quarterly, with authority to break budget and approval logjams. Not a steering committee. A person. With accountability.&lt;/p&gt;

&lt;p&gt;The frozen middle is rational. It responds to incentives. If the consequence of blocking AI adoption is a difficult conversation with a senior executive who actually follows up, the calculus changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom-up pressure through visible wins.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pilots that never graduate to production stay invisible. &lt;strong&gt;Wins that are visible to the organization create pressure the frozen middle can't ignore.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This means: deliberately publicizing AI wins, even small ones, through internal channels. Town halls, team newsletters, lunch-and-learns where teams show what they've built. When peers see other teams shipping real AI workflows and getting recognized for it, the "not my budget" defense gets harder to hold.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make AI adoption part of how managers are evaluated.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If performance reviews for directors and VPs include no signal on AI adoption, you're rewarding inaction. Full stop.&lt;/p&gt;

&lt;p&gt;This doesn't mean punishing people for being careful. It means including adoption metrics in the conversation: What pilots have you run? What graduated to production? What's your team's AI literacy? What tools has your function implemented?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The people who get measured on it will find a way to make it happen.&lt;/strong&gt; The people who don't will find a way to explain why it isn't their problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pre-approved pilot frameworks.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A significant chunk of "not my budget" and pilot purgatory is friction in the approval process. If every AI pilot requires a custom business case, security review, budget exception, and VP sign-off, you've made piloting so expensive that only the most motivated teams will attempt it.&lt;/p&gt;

&lt;p&gt;Pre-approved pilot frameworks flip this: a pre-cleared list of tools, a standard data classification review, a budget envelope that managers can access without an exception process, and a graduation checklist that defines what "production ready" actually means.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When the path to yes is shorter, more people take it.&lt;/strong&gt; You won't eliminate the frozen middle entirely, but you can reduce the surface area it has to block.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Your AI transformation is not stalling because the technology isn't ready. It's not stalling because your people don't get it. It's stalling because a layer of your organization has a structural incentive to slow it down and the organizational power to do so.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The frozen middle is not a culture problem. It's an incentive problem.&lt;/strong&gt; Culture is downstream of incentives.&lt;/p&gt;

&lt;p&gt;The fix requires C-suite leaders to do something harder than issuing mandates: they have to redesign the incentive structure for the layer below them, measure what matters, and be willing to have uncomfortable conversations when adoption stalls.&lt;/p&gt;

&lt;p&gt;The alternative is another year of pilot purgatory, another budget cycle where AI tools don't make the list, and another all-hands where someone asks why the AI roadmap is behind.&lt;/p&gt;

&lt;p&gt;The technology is ready. The question is whether the organization is willing to get out of its own way.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Are you a middle manager who sees other hurdles to adoption or success? I'd love to get your perspective in the comments. Regardless of role, which pattern have you run into most: pilot purgatory, budget deflection, or the security veto? I'm curious whether one of these dominates, or whether it varies by industry.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;PwC (January 2026). 29th Global CEO Survey: Leading Through Uncertainty in the Age of AI. Survey of 4,454 CEOs across 95 countries. &lt;a href="https://www.pwc.com/gx/en/issues/c-suite-insights/ceo-survey.html" rel="noopener noreferrer"&gt;https://www.pwc.com/gx/en/issues/c-suite-insights/ceo-survey.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;BCG (October 2024). AI Adoption in 2024: 74% of Companies Struggle to Achieve and Scale Value. Survey of 1,000 CxOs and senior executives across 59 countries. &lt;a href="https://www.prnewswire.com/news-releases/ai-adoption-in-2024-74-of-companies-struggle-to-achieve-and-scale-value-302285294.html" rel="noopener noreferrer"&gt;https://www.prnewswire.com/news-releases/ai-adoption-in-2024-74-of-companies-struggle-to-achieve-and-scale-value-302285294.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Gartner (July 2024). Gartner Predicts 30% of Generative AI Projects Will Be Abandoned After Proof of Concept by End of 2025. &lt;a href="https://www.gartner.com/en/newsroom/press-releases/2024-07-29-gartner-predicts-30-percent-of-generative-ai-projects-will-be-abandoned-after-proof-of-concept-by-end-of-2025" rel="noopener noreferrer"&gt;https://www.gartner.com/en/newsroom/press-releases/2024-07-29-gartner-predicts-30-percent-of-generative-ai-projects-will-be-abandoned-after-proof-of-concept-by-end-of-2025&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;McKinsey &amp;amp; Company (September 2025). The State of AI: Global Survey 2025. Survey of 1,993 participants across 105 countries. &lt;a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai" rel="noopener noreferrer"&gt;https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Goldman Sachs / Fortune (March 2026). Goldman finds no meaningful relationship between AI and productivity at the economy-wide level. &lt;a href="https://fortune.com/2026/03/03/goldman-earnings-ai-anxiety-no-meaningful-impact-productivity-economy-30-percent-in-2-areas/" rel="noopener noreferrer"&gt;https://fortune.com/2026/03/03/goldman-earnings-ai-anxiety-no-meaningful-impact-productivity-economy-30-percent-in-2-areas/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;SecureWorld (2025). Frozen in the Middle: The AI Bottleneck. Citing Salesforce data on unsanctioned AI tool usage. &lt;a href="https://www.secureworld.io/industry-news/frozen-middle-ai-bottleneck" rel="noopener noreferrer"&gt;https://www.secureworld.io/industry-news/frozen-middle-ai-bottleneck&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Deloitte Global (October 2025). AI ROI: The Paradox of Rising Investment and Elusive Returns. Survey of 1,854 senior executives across 14 countries. &lt;a href="https://www.deloitte.com/global/en/issues/generative-ai/ai-roi-the-paradox-of-rising-investment-and-elusive-returns.html" rel="noopener noreferrer"&gt;https://www.deloitte.com/global/en/issues/generative-ai/ai-roi-the-paradox-of-rising-investment-and-elusive-returns.html&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;strong&gt;If this resonated, here are some related articles:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Are companies really doing layoffs "for AI"?&lt;/strong&gt; (the structural threat middle managers feel is real — this article examines what's actually happening when companies cite AI as a reason for workforce reductions): &lt;a href="https://www.linkedin.com/pulse/companies-really-doing-layoffs-ai-keith-mackay-jtkfe/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/are-companies-really-doing-layoffs" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What the Beer Game teaches us about AI adoption&lt;/strong&gt; (the bullwhip effect hits AI rollouts too, and the frozen middle is exactly where the distortion originates): &lt;a href="https://www.linkedin.com/pulse/ai-bullwhip-what-beer-game-teaches-us-uneven-adoption-keith-mackay-nluae" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/the-ai-bullwhip" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dunning-Kruger, now available at enterprise scale&lt;/strong&gt; (why organizations confidently underestimate what AI adoption actually requires — and how overconfidence at the top feeds resistance in the middle): &lt;a href="https://www.linkedin.com/pulse/dunning-kruger-effect-now-available-enterprise-scale-keith-mackay-kclxf/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Situational leadership applied to AI&lt;/strong&gt; (the same incentive logic that freezes middle managers applies when deciding how much autonomy to grant your AI collaborator): &lt;a href="https://www.linkedin.com/pulse/situational-leadership-ai-more-like-capable-colleague-keith-mackay-wjqoe" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/situational-leadership-for-ai" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology, with an AI collaborator.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>management</category>
      <category>leadership</category>
    </item>
    <item>
      <title>The Companies Disrupting Your Job Are Now Proposing Rules to Protect It</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Wed, 10 Jun 2026 19:51:22 +0000</pubDate>
      <link>https://dev.to/keithjmackay/the-companies-disrupting-your-job-are-now-proposing-rules-to-protect-it-54bf</link>
      <guid>https://dev.to/keithjmackay/the-companies-disrupting-your-job-are-now-proposing-rules-to-protect-it-54bf</guid>
      <description>&lt;h1&gt;
  
  
  The Companies Disrupting Your Job Are Now Proposing Rules to Protect It
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;The robot tax, the four-day workweek, the public wealth fund. Good ideas, self-serving ideas, and missing ideas. Let's sort them out.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;You might have noticed that the companies building the technology most likely to displace workers are now publishing policy papers about how to protect workers. OpenAI released "Industrial Policy for the Intelligence Age" in April 2026: a 13-page document addressed primarily to US policymakers, proposing robot taxes, public wealth funds, a shorter workweek, and a new social contract for the AI era [1].&lt;/p&gt;

&lt;p&gt;Is the paper worth your time? Some of it is genuinely forward-thinking. Some of it is a "policymercial": advocacy dressed as policy analysis, and a way for industry leaders to shape a policy conversation that (as always) moves more slowly than the technology. As a result, some policy ideas are not covered at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happens If We Do Nothing
&lt;/h2&gt;

&lt;p&gt;Before sorting through proposals from vested stakeholders, it's worth thinking about what the stakes are. The paper is not catastrophizing. The risks it names are real and already in motion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The labor market shock is not hypothetical.&lt;/strong&gt; Anthropic's labor market research found measurable displacement effects concentrated in software development, customer support, and data analysis: precisely the knowledge worker categories that expanded most dramatically over the past two decades [2]. Workers in the most exposed roles are 47% higher-paid and more educated than average: the people who least expected to be displaced. Young workers aged 22-25 in exposed occupations are already finding jobs 14% less often than they were before ChatGPT's release. Aggregate unemployment statistics don't show it yet. The leading edge does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The energy grid is already straining.&lt;/strong&gt; Data center electricity consumption could reach 1,050 TWh globally by 2026, placing the sector among the largest power consumers on the planet [3]. In Ohio, electricity prices rose from 11-12 cents per kilowatt hour in 2020 to 19 cents in 2025 [4]. Dominion Energy proposed its first base rate increase since 1992, adding roughly $8.51 per month to a typical household's bill [4]. The cost of AI infrastructure doesn't disappear when a company pays its cloud bill. It distributes to ratepayers who didn't sign up for it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The wealth concentration problem compounds.&lt;/strong&gt; As OpenAI's own paper acknowledges: as AI replaces labor, wealth accrues to owners of capital rather than owners of labor. Tax systems built on payroll taxes and labor income lose their base. Social Security, Medicaid, and housing assistance all depend on labor income as their funding mechanism. If labor income shrinks and capital income grows, the safety net becomes structurally underfunded exactly when demand for it rises.&lt;/p&gt;

&lt;p&gt;The numbers are no longer theoretical. In 2026, Block cut 4,000 workers (40% of its workforce) with CEO Jack Dorsey explicitly attributing the cuts to AI and agentic workflows [16]. Meta, Intuit, and Atlassian followed with cuts of 10%, 17%, and 10% of their workforces respectively [17]. Tech layoffs reached 142,000 in 2026, with roughly half attributed to AI automation [18]. Worth noting: UVA's Darden School questioned whether Block's framing was "AI strategy or AI scapegoat" (Dorsey also admitted to over-hiring during COVID), but the trend across companies is too consistent to explain away.&lt;/p&gt;

&lt;p&gt;The disruption is compounding in a second way. A May 2026 Gartner study of 350 global executives found that organizations cutting workers to demonstrate AI returns are not seeing them. Companies using AI to amplify workers outperformed automation-only strategies. Gartner VP Helen Poitevin: "Workforce reductions may create budget room, but they do not create return." [19] The companies dismantling their workforces are likely making a strategic mistake as well as a social one.&lt;/p&gt;

&lt;p&gt;None of this requires a dystopian scenario. It only requires the trends already underway to continue at their current trajectory.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Experts Disagree. Their Incentives Don't.
&lt;/h2&gt;

&lt;p&gt;The gap between the most optimistic and most pessimistic economic forecasts for AI may be the largest disagreement in modern macroeconomics.&lt;/p&gt;

&lt;p&gt;Daron Acemoglu, the MIT economist who shared the 2024 Nobel Prize in Economics, ran the numbers. His paper "The Simple Macroeconomics of AI" uses Hulten's theorem: GDP impact equals the fraction of tasks affected times the average cost savings. His conclusion: AI will add roughly 0.07% to annual TFP growth over the next decade (about 0.71% in total) [5]. In a May 2026 interview, Acemoglu identified three signals to watch: whether AI agents can handle the multi-task fluidity actual jobs require (he's skeptical), whether the wave of AI apps comparable to transformative software like Excel ever materializes (still missing), and whether the accelerating race by AI companies to hire prominent economists ends up producing genuine research or just expensive hype management [6].&lt;/p&gt;
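&lt;p&gt;To make the Hulten-style arithmetic concrete, here is a minimal sketch. The two inputs (a 4.6% task share and 15.4% average cost savings) are illustrative assumptions chosen so the result lands near the 0.71% total mentioned above; they are not quoted from the paper, and the function names are hypothetical.&lt;/p&gt;

```python
# Hulten-style back-of-envelope: the aggregate TFP gain is approximately
# (share of tasks affected) x (average cost savings on those tasks).
# Both inputs below are illustrative assumptions, not figures quoted
# from the paper.


def hulten_tfp_gain(task_share, avg_cost_savings):
    """Return the total TFP gain as a fraction (0.01 means 1%)."""
    return task_share * avg_cost_savings


def annualized(total_gain, years):
    """Convert a cumulative gain into the equivalent compound annual rate."""
    return (1.0 + total_gain) ** (1.0 / years) - 1.0


TASK_SHARE = 0.046        # assumed share of tasks affected by AI
AVG_COST_SAVINGS = 0.154  # assumed average cost savings on those tasks

total = hulten_tfp_gain(TASK_SHARE, AVG_COST_SAVINGS)
per_year = annualized(total, 10)

print(f"Total TFP gain over 10 years: {total * 100:.2f}%")  # 0.71%
print(f"Annualized TFP gain: {per_year * 100:.2f}%")        # 0.07%
```

&lt;p&gt;Because the estimate is a simple product, doubling either input doubles the result. Under this framing, any disagreement with more bullish forecasts comes down to how many tasks AI touches and how much it saves on each.&lt;/p&gt;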

&lt;p&gt;On that last point, he was direct: "What I hope we won't get is that they're interested in economists just to further their viewpoints or further the hype." OpenAI has recruited Jason Furman (former Obama economic adviser) and installed Ronnie Chatterji as its first chief economist. Anthropic has assembled a 10-member economic advisory council. Google DeepMind hired Alex Imas as "director of AGI economics." Every major AI lab now employs senior economists whose incentives run directly toward findings that support their employers' continued growth.&lt;/p&gt;

&lt;p&gt;Dario Amodei, CEO of Anthropic, is at the opposite end of the forecast range. At Davos in January 2026, he said: "I can see a world where AI brings the developed world GDP growth to something like 10 or 15 percent." He added that the scenario could produce "5% to 10% GDP growth together with an unemployment rate of 10%," acknowledging that "that's not a combination we've almost ever seen before" [7]. The gap between Acemoglu's 0.07% annual TFP growth and Amodei's 15% GDP growth scenario represents roughly $100 trillion in cumulative global output over a decade. It is, straightforwardly, a bet on whether AI will be incrementally useful or historically transformative. (Amodei is building a company valued at over $60 billion on the transformative premise. His forecast is also his fundraising pitch.)&lt;/p&gt;

&lt;p&gt;Jamie Dimon occupies the pragmatic middle. In May 2026, JPMorgan's CEO said the bank will "hire more AI people and fewer bankers in certain categories." He warned that the pace of AI adoption "may go too fast for society" and gave a specific example: two million commercial truckers displaced too quickly could spark civil unrest [8]. His prescription is collaborative public-private management of the transition. His concern, notably, is not purely altruistic. Rapid mass unemployment destabilizes the financial system. Social unrest is bad for banks. JPMorgan has roughly 25,000 to 30,000 departing employees per year through natural attrition; Dimon can afford to absorb AI displacement gradually precisely because of that turnover buffer. His call for a managed transition is also risk management.&lt;/p&gt;

&lt;p&gt;Gregory Daco, Chief Economist at EY-Parthenon, offers the most grounded near-term read. The data already shows what he calls "a clear decoupling between growth and hiring": output expanding while companies generate that growth with fewer workers and fewer hours [9]. But he's cautious about the causal story. Only about 10% of firms currently use AI to produce goods and services. He notes he's "not entirely sure this is a replacement situation where talent is being replaced by technology." At least not yet at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anthropic's own Economic Index provides the most specific ground-truth data available.&lt;/strong&gt; Only 4% of jobs use AI for 75% or more of their tasks. About 36% have some AI involvement for at least a quarter of their work [2]. The gap between theoretical capability and actual deployment remains wide. The signal most worth watching: a 14% reduction in job-finding rates for workers aged 22 to 25 in highly AI-exposed occupations since ChatGPT's release [2]. Aggregate unemployment statistics don't yet show it. The leading edge does.&lt;/p&gt;

&lt;p&gt;Paul Krugman captures the distributional challenge in a sentence: "How do we figure out a system for not just having prosperity, but for having shared prosperity?" [10] He and MIT's Erik Brynjolfsson both flag a J-curve dynamic: productivity gains lag the initial investment as companies retrain workers and redesign processes. The gains, if Amodei is even partially right, could be extraordinary. Whether they are broadly shared is not an economic question. It's a policy one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Proposals (and What's Behind Each One)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Robot Tax
&lt;/h3&gt;

&lt;p&gt;OpenAI proposes shifting the tax base from labor income and payroll taxes toward corporate income and capital gains, including potential levies on AI-driven automation [1]. The logic: if a company replaces ten workers with AI, it no longer pays its share of the payroll taxes those workers generated. Taxing the automation compensates for the lost base.&lt;/p&gt;

&lt;p&gt;This has precedent. South Korea reduced tax incentives for companies investing in automation in 2017, effectively raising their relative tax burden compared to human employment. The EU has debated robot taxes repeatedly. Bill Gates proposed one explicitly in a 2017 interview, arguing that taxing robots at rates equivalent to displaced workers would fund retraining and social investment [11].&lt;/p&gt;

&lt;p&gt;MIT economists modeling the optimal robot tax rate suggest a range of 1-3.7%: high enough to recapture displaced payroll tax revenue without driving automation offshore or pushing companies toward human-capital-light business models that simply move jobs rather than create them [20]. The design of the rate matters as much as the principle.&lt;/p&gt;

&lt;p&gt;The idea also benefits OpenAI directly, though the paper doesn't say so plainly. A robot tax creates legitimacy for AI adoption by giving governments a fiscal mechanism to offset disruption, reducing political pressure to slow or block AI deployment. The tax pays for the permission.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Public Wealth Fund
&lt;/h3&gt;

&lt;p&gt;Modeled on Norway's sovereign wealth fund and Alaska's Permanent Fund, the proposal calls for a government-owned fund investing in AI companies and AI-adopting firms, distributing returns to citizens [1]. The logic: if you can't work your way into the upside of AI, at least own a piece of it.&lt;/p&gt;

&lt;p&gt;Alaska's Permanent Fund has distributed $1,000-$2,000 annually to every resident since 1982, funded by oil revenues [12]. Norway's Government Pension Fund Global has accumulated over $1.7 trillion [13]. The mechanisms work. The design question is: who seeds the fund, and with what?&lt;/p&gt;

&lt;p&gt;In June 2026, Senator Bernie Sanders announced he would introduce the American AI Sovereign Wealth Fund Act: a one-time 50% equity tax on OpenAI, Anthropic, and other large AI companies, seeding a government-owned fund to distribute direct cash payments to Americans [21]. Sanders explicitly cited OpenAI's own April 2026 paper as precedent. The proposal is blunter than what OpenAI had in mind. The company proposed that "policymakers and AI companies work together" on fund design. Sanders took the idea and attached a 50% equity tax to it. When you propose a public wealth fund without specifying who pays for it, someone else will specify it for you.&lt;/p&gt;

&lt;p&gt;OpenAI leaves that deliberately vague. "Policymakers and AI companies should work together" is not a funding plan. A fund seeded with equity from frontier AI firms (including OpenAI) would spread the upside. A fund seeded with tax revenues from existing businesses would just be a new layer of redistribution. The difference matters enormously.&lt;/p&gt;

&lt;p&gt;The self-interest here: if citizens have financial stakes in AI success, they become politically invested in AI's continued growth. Public opposition to AI development becomes harder to sustain when opposition means opposing your own quarterly dividend.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Four-Day Workweek
&lt;/h3&gt;

&lt;p&gt;OpenAI proposes that productivity gains from AI be converted into shorter working hours: a four-day or 32-hour workweek with no reduction in pay [1]. This has actual experimental evidence behind it. Iceland ran the largest public sector four-day workweek trial in history from 2015 to 2019; researchers found maintained or improved productivity and measurably better worker wellbeing [14]. Subsequent trials across hundreds of companies in the UK, Ireland, and Australia have shown similar results.&lt;/p&gt;

&lt;p&gt;The appeal is obvious: if AI makes each worker more productive, sharing that productivity gain as time rather than headcount reduction keeps employment levels stable and improves worker wellbeing. The alternative is layoffs, which concentrate the gain at the ownership level.&lt;/p&gt;

&lt;p&gt;This one benefits OpenAI somewhat indirectly: it reduces political hostility to AI and makes the case that AI can be net-positive for workers. But it's also the proposal with the strongest independent evidence base and the least obvious hidden angle.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Expanded Safety Net
&lt;/h3&gt;

&lt;p&gt;The paper recommends boosting retirement contributions, expanding healthcare coverage, and subsidizing child and elder care for workers in AI-affected sectors [1]. This is largely a restatement of longstanding progressive labor policy wearing new clothes. It's not wrong. Worker transitions are easier when the safety net is thicker. The self-interest here is again about political legitimacy: AI companies benefit from a policy environment where disruption is manageable, not destabilizing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Energy and Infrastructure
&lt;/h3&gt;

&lt;p&gt;The paper calls for public-private partnerships to accelerate grid expansion, with a specific note that "households should not be subsidizing AI data centers" [1]. This is a striking thing for OpenAI to publish, given that current utility rate structures are already pushing data center costs onto residential ratepayers.&lt;/p&gt;

&lt;p&gt;More than 30 states are now proposing or implementing tariffs that require large-load customers (data centers) to pay higher rates to cover grid infrastructure costs [4]. The argument is straightforward: if you're adding load to the grid at scale, you should pay for the infrastructure that load requires, not spread it across residential users who consume a fraction of what you do.&lt;/p&gt;

&lt;p&gt;The self-interest is obvious: if data centers pay their own infrastructure costs, cloud economics stay competitive. If residential ratepayer backlash gets bad enough to trigger regulatory restrictions on data center siting and power access, that's a direct constraint on AI infrastructure expansion. Getting ahead of this with a principled-sounding position is smart positioning.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Missing
&lt;/h2&gt;

&lt;p&gt;The OpenAI paper is notably light on ideas that would benefit society at the cost of AI companies. Critics at TechPolicy.Press specifically noted the paper contains no mention of antitrust enforcement, data privacy protections, structural power asymmetries between AI companies and users, or geopolitical competition: omissions that collectively leave the paper unable to address the market concentration risks that may matter most [22]. Here are several proposals that have come from other quarters:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stronger employer liability for AI-displaced workers.&lt;/strong&gt; Some labor economists have proposed requiring companies to pay severance and retraining costs proportional to the number of AI-displaced roles, creating a direct financial obligation that internalizes the cost of displacement rather than socializing it. This would genuinely make AI adoption more expensive and slow the pace of workforce disruption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI deployment moratoria in specific sectors.&lt;/strong&gt; Proposals from labor unions and some European policymakers include mandatory impact assessment periods before AI deployment in high-employment sectors, with worker consultation requirements. France and Germany have both floated versions of this for public sector employment. Slowing deployment is not in OpenAI's interest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mandatory worker ownership stakes.&lt;/strong&gt; Several Nordic labor models require significant worker ownership in companies above certain sizes. Applied to AI companies, this would give labor a direct claim on AI-generated productivity rather than relying on government redistribution. OpenAI's paper talks about workers sharing in gains; it doesn't propose worker ownership mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data rights and compensation.&lt;/strong&gt; The training data that makes large models capable came overwhelmingly from human creative and intellectual labor (writers, coders, artists) who received no compensation. Multiple legislative proposals in the EU and several US states would require licensing payments or data use fees for training data. This would materially increase AI development costs and constrain what models can be trained on. The paper doesn't address it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open-source AI mandates.&lt;/strong&gt; Regulatory proposals from some researchers and public interest advocates would require frontier AI models above certain capability thresholds to release weights publicly after a defined period, limiting the ability of a small number of companies to capture long-term returns from AI. This would benefit the public while reducing the moat of companies like OpenAI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Energy quotas for AI data centers.&lt;/strong&gt; Direct caps on data center electricity consumption, or carbon pricing mechanisms that make compute more expensive, would slow AI infrastructure expansion and reduce grid stress. Several European countries are already considering this. The paper argues for better grid infrastructure rather than consumption limits: a position that happens to align precisely with AI companies' growth plans.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Independent economic research, free from AI company funding.&lt;/strong&gt; Acemoglu's concern about AI labs hiring economists to shape narrative deserves its own policy response. Public funding for truly independent AI economic research, at arm's length from companies with trillion-dollar stakes in the conclusions, is one of the simplest and most overlooked proposals. It does not appear anywhere in the OpenAI paper.&lt;/p&gt;

&lt;p&gt;Worth noting: Anthropic published its own competing policy paper, "Preparing for AI's Economic Impact," in early 2026 [15]. It is somewhat more candid than OpenAI's. Anthropic explicitly acknowledges that compute and token taxes would "directly impact Anthropic's revenue" and includes them anyway as options for severe disruption scenarios. That transparency about self-interest is rare in this genre, and the Anthropic paper is worth reading alongside the OpenAI one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sorting the List
&lt;/h2&gt;

&lt;p&gt;The proposals in the OpenAI paper range from genuinely good ideas with real evidence (the four-day workweek), to smart market-legitimizing moves dressed as altruism (the robot tax), to vague gestures pending real design work (the public wealth fund). The missing proposals (liability, moratoria, worker ownership, data rights) share one characteristic: they would cost AI companies something.&lt;/p&gt;

&lt;p&gt;That doesn't make the OpenAI paper cynical. Companies advocate for policy environments that allow them to operate. That's normal. The paper is more thoughtful than most corporate policy advocacy, and the risks it identifies are accurate.&lt;/p&gt;

&lt;p&gt;But a complete picture of the policy landscape requires reading it alongside the proposals its authors chose not to make. The full toolkit available to society for managing AI disruption is much wider than what any AI company will voluntarily recommend.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The expert forecast range runs from 0.07% annual TFP growth to 15% GDP, and everyone with an opinion has skin in the game. The risks of doing nothing are real: already visible in electricity bills, in slowing hiring for young workers in exposed occupations, and in Dimon's warning that the pace of change "may go too fast for society." OpenAI's proposed solutions are a mix of good ideas, market-protecting moves, and incomplete designs. The solutions absent from the paper share one trait: they would cost AI companies something.&lt;/p&gt;

&lt;p&gt;The companies building the technology most likely to displace workers are also now building the economic research teams, the policy papers, and the narrative about what solutions should look like. Read the OpenAI paper. Read Anthropic's too. Then read Acemoglu. Then ask whose fingerprints are on the analysis you're using to make decisions.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What proposals do you think deserve more attention in the policy debate around AI and the economy? Which of the OpenAI proposals do you think would actually get traction?&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://openai.com/index/industrial-policy-for-the-intelligence-age/" rel="noopener noreferrer"&gt;Industrial Policy for the Intelligence Age&lt;/a&gt;, OpenAI, April 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.anthropic.com/research/labor-market-impacts" rel="noopener noreferrer"&gt;Labor market impacts of AI: A new measure and early evidence&lt;/a&gt;, Anthropic&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.iea.org/reports/energy-and-ai/energy-demand-from-ai" rel="noopener noreferrer"&gt;Energy demand from AI&lt;/a&gt;, IEA&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.npr.org/2026/01/02/nx-s1-5638587/ai-data-centers-use-a-lot-of-electricity-how-it-could-affect-your-power-bill" rel="noopener noreferrer"&gt;AI data centers use a lot of electricity. How it could affect your power bill&lt;/a&gt;, NPR&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.nber.org/papers/w32487" rel="noopener noreferrer"&gt;The Simple Macroeconomics of AI&lt;/a&gt;, Daron Acemoglu, NBER Working Paper 32487, 2024&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.technologyreview.com/2026/05/11/1137090/three-things-in-ai-to-watch-according-to-a-nobel-winning-economist" rel="noopener noreferrer"&gt;Three things in AI to watch, according to a Nobel-winning economist&lt;/a&gt;, MIT Technology Review, May 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://finance.yahoo.com/news/anthropic-ceo-dario-amodei-warns-173107720.html" rel="noopener noreferrer"&gt;Anthropic CEO Dario Amodei warns of 5-10% GDP growth with 10% joblessness&lt;/a&gt;, Yahoo Finance / Davos, January 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.bankingdive.com/news/jpmorgan-dimon-ai-effect-jobs-workforce-davos/810247/" rel="noopener noreferrer"&gt;Dimon: AI's effect on labor market 'may go too fast for society'&lt;/a&gt;, Banking Dive; see also &lt;a href="https://www.bloomberg.com/news/articles/2026-05-21/dimon-says-jpmorgan-will-hire-more-ai-people-fewer-bankers" rel="noopener noreferrer"&gt;JPMorgan will hire more AI specialists, fewer bankers&lt;/a&gt;, Bloomberg, May 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://finance.yahoo.com/news/productivity-gains-fuel-u-growth-131054041.html" rel="noopener noreferrer"&gt;Productivity gains fuel U.S. growth while hiring slows&lt;/a&gt;, Gregory Daco / Yahoo Finance&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://paulkrugman.substack.com/p/how-should-we-think-about-the-economics" rel="noopener noreferrer"&gt;How should we think about the economics of AI?&lt;/a&gt;, Paul Krugman / Substack (featuring Erik Brynjolfsson)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://qz.com/911968/bill-gates-the-robot-that-takes-your-job-should-pay-taxes" rel="noopener noreferrer"&gt;Bill Gates: The robot that takes your job should pay taxes&lt;/a&gt;, Quartz, 2017&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pfd.alaska.gov/" rel="noopener noreferrer"&gt;Alaska Permanent Fund Dividend&lt;/a&gt;, Alaska Department of Revenue&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.nbim.no/en/the-fund/market-value/" rel="noopener noreferrer"&gt;The fund's market value&lt;/a&gt;, Norges Bank Investment Management&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://autonomy.work/portfolio/icelandsww/" rel="noopener noreferrer"&gt;Going Public: Iceland's Journey to a Shorter Working Week&lt;/a&gt;, Autonomy / ALDA, 2021 (summary of the 2015-2019 government trials)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.anthropic.com/research/economic-policy-responses" rel="noopener noreferrer"&gt;Preparing for AI's economic impact: exploring policy responses&lt;/a&gt;, Anthropic, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://fortune.com/2026/02/27/block-jack-dorsey-ceo-xyz-stock-square-4000-ai-layoffs/" rel="noopener noreferrer"&gt;Block, Jack Dorsey's company, cuts 4,000 workers — CEO says AI is taking their jobs&lt;/a&gt;, Fortune, February 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://tech.co/news/companies-replace-workers-with-ai" rel="noopener noreferrer"&gt;Companies replacing workers with AI in 2026&lt;/a&gt;, Tech.co&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.techtimes.com/articles/317392/20260529/tech-layoffs-reach-142000-2026-profitable-companies-cut-jobs-fund-700b-ai-infrastructure.htm" rel="noopener noreferrer"&gt;Tech layoffs reach 142,000 in 2026 as profitable companies cut jobs to fund $700B AI infrastructure&lt;/a&gt;, TechTimes, May 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.gartner.com/en/newsroom/press-releases/2026-05-05-gartner-says-autonomous-business-and-artificial-intelligence-layoffs-may-create-budget-room-but-do-not-deliver-returns" rel="noopener noreferrer"&gt;Gartner Says Autonomous Business and Artificial Intelligence Layoffs May Create Budget Room But Do Not Deliver Returns&lt;/a&gt;, Gartner, May 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://firstmovers.ai/universal-basic-income-automation/" rel="noopener noreferrer"&gt;Universal Basic Income and Automation: The Optimal Robot Tax&lt;/a&gt;, First Movers AI&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.washingtontimes.com/news/2026/jun/2/sen-bernie-sanders-teases-bill-give-public-ownership-ai-companies/" rel="noopener noreferrer"&gt;Sen. Bernie Sanders teases bill to give public ownership of AI companies&lt;/a&gt;, Washington Times, June 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.techpolicy.press/the-doublespeak-in-openais-industrial-policy-for-the-intelligence-age/" rel="noopener noreferrer"&gt;The Doublespeak in OpenAI's Industrial Policy for the Intelligence Age&lt;/a&gt;, TechPolicy.Press&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;strong&gt;If this resonated, here are some related articles:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For why AI layoffs are as much about narrative as headcount, and what the incentive structure actually looks like: &lt;a href="https://www.linkedin.com/pulse/companies-really-doing-layoffs-ai-keith-mackay-jtkfe/" rel="noopener noreferrer"&gt;On LinkedIn: Are Companies Really Doing Layoffs "For AI"?&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/are-companies-really-doing-layoffs" rel="noopener noreferrer"&gt;On Substack&lt;/a&gt; | &lt;a href="https://medium.com/@keithwrites/are-companies-really-doing-layoffs-for-ai-1186f32b1b9d" rel="noopener noreferrer"&gt;On Medium&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For the infrastructure cost math behind AI, including power, memory, and chip scarcity: &lt;a href="https://www.linkedin.com/pulse/ai-infrastructure-scarcity-raising-costs-usage-still-provide-mackay-y2hce/" rel="noopener noreferrer"&gt;On LinkedIn: AI Infrastructure Scarcity is Raising Costs, but AI Usage Will Still Provide Unbeatable ROI&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/ai-infrastructure-scarcity-is-raising" rel="noopener noreferrer"&gt;On Substack&lt;/a&gt; | &lt;a href="https://medium.com/@keithwrites/ai-infrastructure-scarcity-is-raising-costs-but-ai-usage-will-still-provide-unbeatable-roi-3194d3178132" rel="noopener noreferrer"&gt;On Medium&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For how exponential AI change outpaces linear human thinking, the frame behind every "we'll adapt" argument: &lt;a href="https://www.linkedin.com/pulse/were-linear-thinkers-exponentially-changing-world-keith-mackay-ckoqe/" rel="noopener noreferrer"&gt;On LinkedIn: We're Linear Thinkers in an Exponentially-Changing World&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/were-linear-thinkers-in-an-exponential" rel="noopener noreferrer"&gt;On Substack&lt;/a&gt; | &lt;a href="https://medium.com/@keithwrites/were-linear-thinkers-in-an-exponentially-changing-world-4b65d324d6b3" rel="noopener noreferrer"&gt;On Medium&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For the energy story from a different angle, the clean energy breakthrough that could change AI's infrastructure math: &lt;a href="https://www.linkedin.com/pulse/clean-energy-breakthrough-thats-coming-keith-mackay-xwsvc/" rel="noopener noreferrer"&gt;On LinkedIn: The Clean Energy Breakthrough That's Coming&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology, with Claude Code and Codex as AI collaborators.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>career</category>
      <category>management</category>
    </item>
    <item>
      <title>The Clean Energy Breakthrough That's Coming</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Mon, 08 Jun 2026 22:00:13 +0000</pubDate>
      <link>https://dev.to/keithjmackay/the-clean-energy-breakthrough-thats-coming-13mf</link>
      <guid>https://dev.to/keithjmackay/the-clean-energy-breakthrough-thats-coming-13mf</guid>
      <description>&lt;h1&gt;
  
  
  The Clean Energy Breakthrough That's Starting Now
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;The bottleneck for the energy transition was never sunlight. It was always materials. AI just kicked the door in.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;The wind is free. The sun is free. We've known how to capture both for decades.&lt;/p&gt;

&lt;p&gt;What we haven't had: the right materials to store and convert that energy efficiently enough to matter at scale. That's the actual problem. Not political will. Not capital. Not engineering effort. The right atoms, arranged the right way, at a cost that pencils out.&lt;/p&gt;

&lt;p&gt;For most of human history, finding those materials required synthesizing compounds one at a time, testing them, watching them fail, and starting over. Progress moved at the speed of human hands and human patience. It was slow. Painstakingly, expensively slow.&lt;/p&gt;

&lt;p&gt;In December 2023, something changed.&lt;/p&gt;

&lt;h2&gt;
  
  
  2.2 Million New Materials, Overnight
&lt;/h2&gt;

&lt;p&gt;Google DeepMind published a paper in &lt;em&gt;Nature&lt;/em&gt; describing GNoME: Graph Networks for Materials Exploration [1]. The model identified 2.2 million new stable crystal structures. To put that in perspective: that number exceeds all previously known stable inorganic materials discovered across the entire history of human science. Combined.&lt;/p&gt;

&lt;p&gt;Of those 2.2 million candidates, 380,000 were predicted to be stable enough for practical use.&lt;/p&gt;

&lt;p&gt;Let that land. Decades of painstaking laboratory work, hundreds of thousands of researchers, centuries of collective effort: one baseline. One AI model run: more than double that baseline, in a single study.&lt;/p&gt;

&lt;p&gt;This is what exponential change looks like when it arrives in a field that's been moving linearly for generations.&lt;/p&gt;

&lt;h2&gt;
  
  
  What GNoME Did
&lt;/h2&gt;

&lt;p&gt;The traditional materials discovery pipeline has four steps: hypothesize, synthesize, test, fail. Repeat until you find something. Or run out of funding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The average time from initial materials discovery to commercial application has historically been 10 to 20 years&lt;/strong&gt; [2]. That's not because scientists are slow. It's because the search space is astronomically large. Atoms combine in near-infinite configurations. Testing every candidate physically is simply not possible.&lt;/p&gt;

&lt;p&gt;GNoME didn't solve materials science. It changed the economics of the search.&lt;/p&gt;

&lt;p&gt;Instead of synthesizing compounds to see if they're stable, researchers can now screen millions of candidates computationally, identify the most promising subset, and only then run physical experiments. The hit rate on those experiments goes up dramatically. The cost and time of candidate generation drop from years to hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is what AI does best: it doesn't replace the experiment. It filters the space of what's worth experimenting on.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Microsoft Went Further
&lt;/h2&gt;

&lt;p&gt;GNoME predicts whether a known candidate is stable. Microsoft's MatterGen model, released in 2024, does something more ambitious: it designs new materials to specification [3].&lt;/p&gt;

&lt;p&gt;Give it a target property set (high ionic conductivity, thermal stability, low toxicity, abundant constituent elements) and MatterGen generates candidate structures that fit. It's generative AI applied to the periodic table.&lt;/p&gt;

&lt;p&gt;The distinction matters. Stability prediction accelerates the search. Generative design changes the nature of the search entirely. You stop asking "which of these known compounds might work?" and start asking "what compound should exist to solve this problem?"&lt;/p&gt;

&lt;p&gt;That's a different kind of leverage.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Specific Bets: Batteries and Solar
&lt;/h2&gt;

&lt;p&gt;Two areas of clean energy stand to benefit most immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solid-state batteries.&lt;/strong&gt; Today's lithium-ion batteries use liquid electrolytes. They work, but with well-known limitations: flammability, limited energy density, and performance degradation at temperature extremes. The better solution, theoretically, is solid-state electrolytes. Solid electrolytes could roughly double energy density and eliminate fire risk entirely [4].&lt;/p&gt;

&lt;p&gt;The problem: finding the right ionic conductor material. The winning material needs to conduct lithium ions efficiently while remaining mechanically stable, chemically inert with the electrodes, and manufacturable at scale. That's a brutal multi-constraint optimization problem across an enormous search space.&lt;/p&gt;

&lt;p&gt;GNoME-style screening is already generating thousands of solid electrolyte candidates for physical testing. What used to take a research group a decade of trial and error is now a computational job that runs overnight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Perovskite solar cells.&lt;/strong&gt; Silicon solar cells are mature technology. They work. They've gotten cheaper. But their theoretical efficiency ceiling is known, and approaching it requires expensive manufacturing.&lt;/p&gt;

&lt;p&gt;Perovskites are a class of crystalline materials that promise higher theoretical efficiency than silicon and potentially much cheaper production [5]. The catch: stability. Perovskite cells degrade in heat, humidity, and UV exposure in ways silicon doesn't. Solving that requires finding perovskite compositions that are both highly efficient and durable under real-world conditions.&lt;/p&gt;

&lt;p&gt;Those two properties don't always point to the same composition. Finding the intersection computationally, before burning through lab resources, is exactly what AI-assisted materials discovery enables.&lt;/p&gt;

&lt;h2&gt;
  
  
  While We're at It: Fusion
&lt;/h2&gt;

&lt;p&gt;Fusion — clean, abundant, theoretically limitless energy from hydrogen — has been "30 years away" since roughly 1955. The joke has earned its longevity. AI is making it less funny.&lt;/p&gt;

&lt;p&gt;On plasma control: in 2022, DeepMind and EPFL's Swiss Plasma Center published a &lt;em&gt;Nature&lt;/em&gt; paper describing a deep reinforcement learning controller that managed all 19 magnetic coils of a real tokamak simultaneously [6]. Trained entirely in simulation, deployed on hardware. It held plasma configurations no prior controller had achieved, including two simultaneous plasma droplets held in the same vessel — a first. Control frequency: 10 kHz. Faster than any human or physics-based system before it.&lt;/p&gt;

&lt;p&gt;Two years later, a Princeton team at the DIII-D National Fusion Facility published a follow-on paper that went further [7]. Their RL agent doesn't just control plasma — it predicts and avoids the tearing instabilities that cause plasma disruptions, a persistent bottleneck for stable fusion. The model forecast disruptions 300 milliseconds in advance. Enough time to correct course. In tests, it held plasma stable where uncontrolled discharges failed.&lt;/p&gt;

&lt;p&gt;On ignition: when NIF achieved fusion ignition in December 2022 — energy output exceeding laser input for the first time in history — AI had already predicted it. LLNL's cognitive simulation framework, trained on 150,000 high-fidelity simulations, assigned a 74% probability of ignition to that specific shot design before the laser fired [8]. The experimental result fell within the predicted yield range.&lt;/p&gt;

&lt;p&gt;In October 2025, DeepMind and Commonwealth Fusion Systems formalized a research partnership applying AI to CFS's SPARC tokamak: fast differentiable plasma simulation, RL-based optimization for maximum net energy, and real-time AI plasma control [9].&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 30-year joke may need updating.&lt;/strong&gt; Not because fusion is solved — it isn't — but because the tools available to attack it are categorically different than they were five years ago.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pace of Science Has Changed
&lt;/h2&gt;

&lt;p&gt;Here's what most Earth Week coverage misses: this isn't a story about one breakthrough. It's a story about a change in the underlying rate of scientific discovery.&lt;/p&gt;

&lt;p&gt;Before AI-assisted materials screening, the constraint was knowing what to make: every candidate had to be synthesized and tested just to learn whether it was worth pursuing, and you could only test so many compounds per year. Now the constraint is moving downstream: it's becoming physical synthesis and validation of the most promising AI-generated candidates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's a fundamentally different bottleneck.&lt;/strong&gt; And it's one that scales differently. Compute scales with Moore's Law. Physical labs scale with headcount and funding. The gap between what AI can propose and what labs can verify is going to widen for years before robotics and automated synthesis close it.&lt;/p&gt;

&lt;p&gt;The practical implication: the pipeline filling with candidates is getting much longer than the pipeline processing them. That sounds like a problem. It's actually an extraordinarily good problem to have. We've never been material-candidate-rich before. We've always been material-candidate-poor.&lt;/p&gt;

&lt;p&gt;A longer candidate pipeline means researchers can be more selective. They can filter not just for stability, but for earth-abundance of constituent elements, toxicity profiles, manufacturing compatibility, and cost. The optimization problem gets richer because the candidate pool is now large enough to support it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Some Ramifications
&lt;/h2&gt;

&lt;p&gt;Realistically, &lt;strong&gt;AI is not going to solve climate change.&lt;/strong&gt; It's a tool. A remarkably powerful one, applied to a specific bottleneck in a specific part of a much larger problem.&lt;/p&gt;

&lt;p&gt;Materials discovery is one lever. Grid infrastructure is another. Policy is another. Behavioral change is another. Economic incentives are another. AI accelerates exactly one of those levers, and only the research-and-discovery portion of it. The manufacturing scale-up, the regulatory approval, the capital formation, the installation logistics: those remain stubbornly human-speed problems for now.&lt;/p&gt;

&lt;p&gt;What AI does here is collapse the distance between "we need a better battery material" and "here are ten thousand candidates worth testing." That's not nothing. That might be the difference between a 10-year path to commercialization and a 5-year path. At the scale of energy transition, that difference is measured in gigatons of carbon.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Changing the rate of discovery changes the rate of transition. That matters.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  This Is An Underreported Story
&lt;/h2&gt;

&lt;p&gt;Earth Week is full of coverage about renewable capacity additions, EV adoption curves, and carbon credit markets. These are real and important. But the story that will look most significant in retrospect is quieter: AI is now operating as a materials scientist at a scale no human team could match.&lt;/p&gt;

&lt;p&gt;We've had the computational tools to model atomic interactions for decades. What changed in 2023 and 2024 is that AI learned to navigate that space intelligently, to predict what matters, to generate candidates that fit constraints we specify. The combination of GNoME's scale and MatterGen's generativity represents something genuinely new.&lt;/p&gt;

&lt;p&gt;It's not a single discovery. It's a new rate of discovery. And if you've spent any time thinking about exponential curves and what happens when a linearly-constrained process gets an exponential tool applied to it, the implications are significant.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The clean energy transition has always been a materials problem wearing an energy problem's costume. We had enough sun and wind. We didn't have the right substances to catch it, store it, and move it efficiently. Finding those substances, the hard way, was taking too long.&lt;/p&gt;

&lt;p&gt;AI has just changed what "too long" means.&lt;/p&gt;

&lt;p&gt;Two million new candidate materials. Generative design to specification. Computational screening that filters millions of candidates before a single gram of material is synthesized.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The bottleneck hasn't been eliminated. But it has moved. And in exponential systems, where the bottleneck sits determines everything.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This Earth Week, the story worth paying attention to isn't the one about how much solar got installed. It's the one about what AI is building the path for next.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Which front do you think AI makes the biggest near-term difference on: materials discovery for batteries and solar, or plasma control for fusion? And is there a clean energy application I haven't mentioned that deserves more attention?&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;[1] Merchant, A., Batzner, S., Schaarschmidt, S.M. et al., "Scaling deep learning for materials discovery," &lt;em&gt;Nature&lt;/em&gt; 624, 80–85, December 2023. &lt;a href="https://doi.org/10.1038/s41586-023-06735-9" rel="noopener noreferrer"&gt;https://doi.org/10.1038/s41586-023-06735-9&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] National Academies of Sciences, Engineering, and Medicine, "Frontiers of Materials Research: A Decadal Survey," The National Academies Press, 2019. &lt;a href="https://doi.org/10.17226/25244" rel="noopener noreferrer"&gt;https://doi.org/10.17226/25244&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] Zeni, C., Pinsler, R., Zügner, D. et al., "MatterGen: a generative model for inorganic materials design," &lt;em&gt;Nature&lt;/em&gt; 637, 354–363, January 2025. &lt;a href="https://doi.org/10.1038/s41586-024-08628-5" rel="noopener noreferrer"&gt;https://doi.org/10.1038/s41586-024-08628-5&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] Janek, J. &amp;amp; Zeier, W.G., "A solid future for battery development," &lt;em&gt;Nature Energy&lt;/em&gt; 1, 16141, 2016. &lt;a href="https://doi.org/10.1038/nenergy.2016.141" rel="noopener noreferrer"&gt;https://doi.org/10.1038/nenergy.2016.141&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[5] National Renewable Energy Laboratory, "Perovskite Solar Cells," NREL Research, &lt;a href="https://www.nrel.gov/pv/perovskite-solar-cells.html" rel="noopener noreferrer"&gt;https://www.nrel.gov/pv/perovskite-solar-cells.html&lt;/a&gt; (accessed April 2026).&lt;/p&gt;

&lt;p&gt;[6] Degrave, J., Felici, F., Kohler, J., et al., "Magnetic control of tokamak plasmas through deep reinforcement learning," &lt;em&gt;Nature&lt;/em&gt; 602, 414–419, February 2022. &lt;a href="https://doi.org/10.1038/s41586-021-04301-9" rel="noopener noreferrer"&gt;https://doi.org/10.1038/s41586-021-04301-9&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[7] Seo, J., Kim, S., Jalalvand, A., et al., "Avoiding fusion plasma tearing instability with deep reinforcement learning," &lt;em&gt;Nature&lt;/em&gt; 626, 746–751, February 2024. &lt;a href="https://doi.org/10.1038/s41586-024-07024-9" rel="noopener noreferrer"&gt;https://doi.org/10.1038/s41586-024-07024-9&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[8] Lawrence Livermore National Laboratory, "LLNL used AI to predict historic fusion ignition shot," institutional release describing the cognitive simulation framework (trained on 150,000+ simulations) and the 74% ignition probability prediction. Primary journal paper: Humbird, K.D., et al., &lt;em&gt;Science&lt;/em&gt; (2024). &lt;a href="https://www.llnl.gov/article/53316/llnl-used-ai-predict-historic-fusion-ignition-shot" rel="noopener noreferrer"&gt;https://www.llnl.gov/article/53316/llnl-used-ai-predict-historic-fusion-ignition-shot&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[9] Google DeepMind and Commonwealth Fusion Systems research partnership, October 2025: &lt;a href="https://deepmind.google/blog/bringing-ai-to-the-next-generation-of-fusion-energy/" rel="noopener noreferrer"&gt;https://deepmind.google/blog/bringing-ai-to-the-next-generation-of-fusion-energy/&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;If this resonated, here are some related articles:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For the argument that AI agents are the first tools capable of tackling Fuller's cataloged global resource problems — including materials scarcity: &lt;a href="https://www.linkedin.com/pulse/bucky-fullers-to-do-list-can-ai-finally-solve-worlds-cataloged-keith-7c0vc/" rel="noopener noreferrer"&gt;Bucky Fuller's To-Do List: Can AI Finally Solve the World's Cataloged Problems?&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For why 2.2 million new materials feels cognitively impossible — and why exponential tools keep surprising even people who know better: &lt;a href="https://www.linkedin.com/pulse/were-linear-thinkers-exponentially-changing-world-keith-mackay-ckoqe/" rel="noopener noreferrer"&gt;We're Linear Thinkers in an Exponentially-Changing World&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/were-linear-thinkers-in-an-exponential" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For why the ROI math on running millions of AI-driven materials screenings still works decisively, even as compute costs climb: &lt;a href="https://www.linkedin.com/pulse/ai-infrastructure-scarcity-raising-costs-usage-still-provide-mackay-y2hce/" rel="noopener noreferrer"&gt;AI Infrastructure Scarcity is Raising Costs, but AI Usage Will Still Provide Unbeatable ROI&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/ai-infrastructure-scarcity-is-raising" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology, with an AI collaborator.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cleancoding</category>
      <category>datascience</category>
      <category>agents</category>
    </item>
    <item>
      <title>AI Augmentation: Amazing. Replacement: A Rarity (AI Can't Do Your Whole Job).</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Mon, 08 Jun 2026 17:19:28 +0000</pubDate>
      <link>https://dev.to/keithjmackay/ai-augmentation-amazing-replacement-a-rarity-ai-cant-do-your-whole-job-4p7i</link>
      <guid>https://dev.to/keithjmackay/ai-augmentation-amazing-replacement-a-rarity-ai-cant-do-your-whole-job-4p7i</guid>
      <description>&lt;h1&gt;
  
  
  AI Augmentation: Amazing. AI Replacement: A Rarity (It Can Only Do a Fraction of Your Job).
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;The "AI will take your job" prediction keeps getting the unit of analysis wrong. Jobs are bundles, and AI only handles part of the stack.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Your legal team just ran a document review that would have taken three paralegals two weeks. An AI completed it in four hours. Your CFO is now asking the obvious question: do we still need paralegals?&lt;/p&gt;

&lt;p&gt;The question sounds reasonable. The answer is yes. The confusion about why reveals something important about what jobs actually are.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Job Is Not a Task
&lt;/h2&gt;

&lt;p&gt;When people say "AI will take jobs," they're collapsing two different things.&lt;/p&gt;

&lt;p&gt;A task is a discrete unit of work: summarize this contract, identify anomalies in this dataset, generate a first draft of this email. A job is a bundle of dozens of tasks, plus the judgment that connects them, plus the relationships that give the output meaning, plus the accountability for when things go wrong.&lt;/p&gt;

&lt;p&gt;AI is genuinely good at tasks. AI cannot hold a job.&lt;/p&gt;

&lt;p&gt;Think about what a paralegal actually does over the course of a month. Document review is maybe 30% of it. The rest: advising attorneys on case strategy based on accumulated pattern recognition, managing client communication that requires tone-reading and discretion, deciding which documents in a production are strategically significant versus merely responsive, carrying institutional knowledge about the firm's risk tolerance and client history, and being accountable (in a professional and legal sense) for the work product.&lt;/p&gt;

&lt;p&gt;The AI completed the document review. It cannot do the rest. The paralegal who now does less document review has more time to do the rest better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A job has dimensionality. A task is one-dimensional.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dimension Stack
&lt;/h2&gt;

&lt;p&gt;Think of every job as a stack of dimensions. Each dimension describes a type of work along a spectrum from "AI handles this reliably" to "AI struggles and a human is essential":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Volume and pattern recognition:&lt;/strong&gt; AI wins, and it isn't close. Processing 200,000 documents, reading radiology scans for anomalies, flagging fraud transactions at scale: these are high-volume, pattern-rich tasks where AI outperforms humans on speed and consistency, especially at 2 AM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Judgment under ambiguity:&lt;/strong&gt; Humans win. When the facts are incomplete, the stakeholders are difficult, the situation has no clear precedent, and being wrong has real consequences, AI generates plausible-sounding answers. Humans know what they don't know. (Mostly.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relational complexity:&lt;/strong&gt; Humans win. Negotiating a contract isn't just parsing terms: it's reading the room, understanding what the other party actually wants versus what they're asking for, and deciding how hard to push. AI can prepare you for that conversation. It cannot have it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accountability:&lt;/strong&gt; Humans win by default. Someone has to own the outcome. AI doesn't hold a professional license, can't be sued, and can't make the judgment call about when a risk is worth taking. When AI-assisted work goes wrong, the human in the loop is still the one in front of the client or the regulator.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Novel framing:&lt;/strong&gt; Humans win (for now). Identifying the right question (deciding which problem is worth solving before anyone has framed it) is still predominantly human territory.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most jobs touch all five dimensions. AI currently handles the first well and struggles with the other four.&lt;/p&gt;

&lt;p&gt;MIT economist Daron Acemoglu, in a 2024 working paper on the macroeconomics of AI, made a similar point with more precision [1]. His argument: AI's productivity gains are real, but they concentrate in a narrow slice of tasks within each occupation. He estimated that AI, in its current form, materially affects only about 5% of tasks in the average job: the high-volume, pattern-rich slice. The other 95%, requiring what he called multi-task fluidity (the ability to switch between judgment calls, relational work, novel situations, and domain-specific improvisation across a single workday), remains outside what current systems can handle reliably. His projected contribution to productivity growth (TFP): roughly 0.07% annually, nowhere near the 5-10% projections from the optimist camp. His 5% figure sits at the conservative end of the field; Goldman Sachs estimates 25% of all work tasks are eventually automatable, and Penn Wharton puts 40% of labor income in the exposure zone [2]. These estimates measure different things (tasks affected today, tasks eventually automatable, labor income exposed), so they don't line up neatly. The right answer is likely somewhere in that range, which is large enough to be consequential and uncertain enough to warrant humility about any single projection.&lt;/p&gt;

&lt;p&gt;The fluidity point is underappreciated. A paralegal doesn't spend eight hours on document review and then clock out. They spend 90 minutes on document review, then pivot to a client call that requires empathy and discretion, then draft a memo that requires strategic judgment, then field an unexpected question that requires institutional memory. The pivot itself, the reading of context to know which cognitive mode to engage, is something AI cannot do. The tasks are automatable in isolation. The job, the sequence of pivots across a day, is not.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Pairs Research Shows
&lt;/h2&gt;

&lt;p&gt;A meta-analysis across medical, legal, and technical domains found a consistent performance staircase: human alone 68%, AI alone 77%, AI plus human 80%, full collaborative framework 88% [3]. The gap between AI alone and full human-AI collaboration is larger than the gap between AI alone and human alone. The pairing matters.&lt;/p&gt;

&lt;p&gt;Gartner's May 2026 study of 350 executives reinforces the organizational stakes. Companies using AI to amplify workers outperform those using it to replace them. Gartner VP Helen Poitevin: "Workforce reductions may create budget room, but they do not create return" [4].&lt;/p&gt;

&lt;p&gt;Radiologists working with AI-assisted anomaly detection have lower miss rates than either the AI or the radiologist working alone. The AI catches what tired human eyes miss during a 12-hour shift; the radiologist catches the anomaly that falls outside the AI's training distribution. Neither is redundant. In 2016, Geoffrey Hinton predicted that radiologists would become obsolete. Their median salary is now $571K and growing [5]. A decade-long natural experiment: the AI took the routine scans; radiologist salaries rose because judgment and accountability became more valuable, not less.&lt;/p&gt;

&lt;p&gt;In chess (where this research goes back decades), humans paired with AI assistance beat both AI alone and unassisted grandmasters. The telling detail: the winning pairs weren't necessarily the grandmasters with the highest individual ratings. They were the humans who understood what the AI saw, what it missed, and when to trust it versus override it. Kasparov called these pairs "centaurs" and argued that the insight applies everywhere knowledge work meets computation [6].&lt;/p&gt;

&lt;p&gt;A study of GitHub Copilot users found developers completed tasks 55% faster on average, with code that passed quality checks at equivalent rates [7]. The speed gain was largest for the kind of boilerplate work that senior engineers find most draining, which means senior engineers got more time for the architecture and debugging that actually requires them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The bottleneck shifts upward.&lt;/strong&gt; AI raises the floor. The ceiling (judgment, relationships, accountability) becomes the new constraint.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cognitive Surrender Trap
&lt;/h2&gt;

&lt;p&gt;There is a version of augmentation that isn't augmentation. When AI handles the routine tasks, the natural human response is to do less: fewer deep reads, shallower research, faster decisions with less independent verification. That response is rational in the short run and corrosive over time.&lt;/p&gt;

&lt;p&gt;A 2026 peer-reviewed study in &lt;em&gt;Human Behavior and Emerging Technologies&lt;/em&gt; gave this dynamic a name and documented it empirically: the Paradox of Augmentation. Human performance initially rises with AI support. With sustained use, the curve eventually dips below baseline (the human performing worse than before they had the tool) [8]. The mechanism is straightforward. Skills not exercised atrophy. The AI handled the practice reps.&lt;/p&gt;

&lt;p&gt;Cognitive skills require exercise. The radiologist who stops reading difficult scans because AI flags the obvious ones will, over time, lose the pattern recognition that makes them valuable on the edge cases. The lawyer who delegates all document review loses the intuition for what documents actually say and what they imply strategically. The engineer who never writes foundational code loses the feel for what the AI is generating and where it is likely to fail. A 2026 study found AI coding assistance lowers code comprehension scores by 17% and makes experienced developers 19% slower on debugging tasks (while they report feeling 20% faster) [9]. The confidence goes up. The capability goes down.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Augmentation requires deliberate reinvestment.&lt;/strong&gt; The hours AI saves are not supposed to become idle time. They're supposed to become harder work. The paralegal freed from document review should be in the deposition room, not watching the hours tick by. The radiologist whose routine scan volume drops should be spending more time on the cases that don't fit the pattern. The engineer whose boilerplate writes itself should be designing the architecture.&lt;/p&gt;

&lt;p&gt;There is also a generational dimension worth naming. A March 2026 Psychology Today analysis distinguishes two patterns: adults lose skills to AI, and children never build them [10]. Workers 46 and older offload tasks they already mastered; they lose capability but retain a foundation. Workers 17-25 offload tasks they were supposed to be learning. The 55% speed gain from Copilot is real for a senior engineer who understands what good code looks like. For the junior developer who never wrote the boilerplate, there is no foundation to fall back on.&lt;/p&gt;

&lt;p&gt;Research in &lt;em&gt;Scientific Reports&lt;/em&gt; (2026) adds a further wrinkle: AI collaboration enhances task performance but measurably undermines intrinsic motivation and sense of ownership [11]. Augmentation has costs beyond skill atrophy.&lt;/p&gt;

&lt;p&gt;This is the real risk for organizations that automate without intent: you don't lose the job title, you lose the capability behind it. The work gets lighter, the judgment atrophies, and when the hard case arrives (the one that requires genuine expertise), the human who was supposed to be the backstop has spent two years exercising none of the muscles that would have caught it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Good Augmentation Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;The Stanford HAI 2026 AI Index found that developer employment for ages 22-25 has fallen nearly 20% since 2024, while employment of developers 30 and older at the same companies grew [12]. The floor rises for those already above it. Access to the skills that get you to the ceiling is shrinking.&lt;/p&gt;

&lt;p&gt;The practical question for any leader: where in your team's work does AI handle a dimension well, and what should that free people to do?&lt;/p&gt;

&lt;p&gt;A mapping exercise worth running: list the recurring tasks in a given role. Estimate the time each consumes. Score each against the dimension stack: which are high-volume pattern tasks AI can accelerate, which require judgment, relationships, or accountability? The tasks where AI provides real leverage are candidates for offloading. The tasks that require the upper dimensions are where freed time should go.&lt;/p&gt;
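&lt;p&gt;The mapping exercise above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the task names and hours are hypothetical (chosen so document review lands near the 30% mentioned earlier), and the dimension labels are shorthand for the dimension stack. The only rule encoded is that volume-and-pattern tasks are offloading candidates.&lt;/p&gt;

```python
# Shorthand labels for the dimension stack; only this one is treated
# as a dimension where AI provides real leverage.
AI_LEVERAGE = {"volume_pattern"}

# Hypothetical monthly task inventory for one role: (task, hours, dimension).
tasks = [
    ("document review", 48, "volume_pattern"),
    ("client communication", 40, "relational"),
    ("case strategy support", 40, "judgment"),
    ("work product sign-off", 32, "accountability"),
]

total_hours = sum(hours for _, hours, _ in tasks)

# Tasks in an AI-leverage dimension are candidates for offloading.
offload = [(name, hours) for name, hours, dim in tasks if dim in AI_LEVERAGE]
offload_names = [name for name, _ in offload]
offload_hours = sum(hours for _, hours in offload)
offload_share = offload_hours / total_hours

print(f"Candidates for offloading: {offload_names}")
print(f"Hours freed per month: {offload_hours} of {total_hours}")
print(f"Share of time: {offload_share:.0%}")
```

&lt;p&gt;The output is only as good as the scoring. The useful part of the exercise is the argument about which dimension each task really belongs to, and the decision about where the freed hours go.&lt;/p&gt;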

&lt;p&gt;A few patterns worth watching across industries:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In client-facing roles:&lt;/strong&gt; AI handles research, briefing preparation, and follow-up documentation. The human handles the actual relationship. The ratio of meaningful client contact per professional increases, which is the point (and the thing that clients actually pay for).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In technical roles:&lt;/strong&gt; AI handles implementation of known patterns. The human handles architecture, debugging novel failures, and deciding what is worth building. The quality bar on human decisions rises because implementation cost drops, making more ideas worth testing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In analytical roles:&lt;/strong&gt; AI surfaces patterns in data at a scale and speed no human team matches. The human decides which patterns matter, what they imply, and how to present findings to stakeholders who asked the wrong question. The analysis becomes cheap; the interpretation is the scarce resource.&lt;/p&gt;

&lt;p&gt;In each case, the job survives because the job was never the task. The job was the bundle.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;AI replaces tasks. It doesn't replace the judgment, relationships, and accountability that bundle tasks into jobs. The human who works alongside AI and invests the recovered time in harder work is more capable than either the AI alone or the human before the AI arrived.&lt;/p&gt;

&lt;p&gt;The risk worth watching isn't replacement. It's atrophy. The document review that AI completed in four hours freed three paralegals for two weeks of higher-dimension work. Or it gave them two weeks of lighter schedules and a gradual erosion of the skills that made them worth keeping. Which version your organization gets depends entirely on whether you're deliberate about it.&lt;/p&gt;

&lt;p&gt;The bundle doesn't disappear. It thins, if you let it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Where have you seen AI augmentation actually work, where the human genuinely got better because of the pairing rather than just faster? And where have you seen the atrophy trap play out? Both patterns are real, and the difference between them isn't the technology.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Related reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For the argument that AI should augment cognition rather than replace it, and why convenience is the enemy of capability: &lt;a href="https://www.linkedin.com/posts/keithmackay_the-modern-world-is-optimized-for-convenience-activity-7438246349352996864-OgNs" rel="noopener noreferrer"&gt;On LinkedIn&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For how to think about AI as a capable colleague rather than a formula or tool, with implications for how much autonomy to grant: &lt;a href="https://www.linkedin.com/pulse/situational-leadership-ai-more-like-capable-colleague-keith-mackay-wjqoe" rel="noopener noreferrer"&gt;On LinkedIn&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/situational-leadership-for-ai" rel="noopener noreferrer"&gt;On Substack&lt;/a&gt; | &lt;a href="https://medium.com/@keithwrites/situational-leadership-for-ai-more-like-a-capable-colleague-than-a-fancy-formula-911749baf7f2" rel="noopener noreferrer"&gt;On Medium&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For the organizational strategy of putting humans before the loop rather than in it, and what that means for judgment-intensive work: &lt;a href="https://www.linkedin.com/pulse/evolving-strategy-knowledge-work-from-keith-mackay-xiefe/" rel="noopener noreferrer"&gt;On LinkedIn&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/an-evolving-strategy-for-knowledge" rel="noopener noreferrer"&gt;On Substack&lt;/a&gt; | &lt;a href="https://medium.com/@keithwrites/an-evolving-strategy-for-knowledge-work-from-human-in-the-loop-to-human-before-the-loop-f8da4344a7ae" rel="noopener noreferrer"&gt;On Medium&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For the reality behind AI-driven layoff announcements and whether jobs are actually being replaced or just tasks: &lt;a href="https://www.linkedin.com/pulse/companies-really-doing-layoffs-ai-keith-mackay-jtkfe/" rel="noopener noreferrer"&gt;On LinkedIn&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/are-companies-really-doing-layoffs" rel="noopener noreferrer"&gt;On Substack&lt;/a&gt; | &lt;a href="https://medium.com/@keithwrites/are-companies-really-doing-layoffs-for-ai-1186f32b1b9d" rel="noopener noreferrer"&gt;On Medium&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://www.nber.org/papers/w32487" rel="noopener noreferrer"&gt;The Simple Macroeconomics of AI&lt;/a&gt; — Acemoglu, D., NBER Working Paper 32487, 2024. Estimates AI materially affects roughly 5% of tasks in the average occupation; projects 0.07% annual TFP growth from current AI systems. Introduces the multi-task fluidity constraint on AI task substitution.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.aei.org/articles/ais-economic-potential-goldman-sachs-responds-to-daron-acemoglu/" rel="noopener noreferrer"&gt;AI's Economic Potential: Goldman Sachs Responds to Daron Acemoglu&lt;/a&gt; — AEI, 2024. Goldman Sachs estimates 25% of all work tasks are eventually automatable; Penn Wharton analysis puts 40% of labor income in the exposure zone.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pmc.ncbi.nlm.nih.gov/" rel="noopener noreferrer"&gt;PMC Meta-Analysis: Human-AI Collaboration Performance&lt;/a&gt; — Meta-analysis across medical, legal, and technical domains. Human alone 68%, AI alone 77%, AI plus human 80%, full collaborative framework 88%.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.gartner.com/en/newsroom/press-releases/2026-05-05-gartner-says-autonomous-business-and-artificial-intelligence-layoffs-may-create-budget-room-but-do-not-deliver-returns" rel="noopener noreferrer"&gt;Gartner: Autonomous Business and AI Layoffs May Create Budget Room but Do Not Deliver Returns&lt;/a&gt; — Gartner, May 2026. Study of 350 executives; companies using AI to amplify workers outperform those using it to replace them.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://fortune.com/2026/05/04/godfather-of-ai-geoffrey-hinton-radiologists-future-of-work-tech-ai-job-anxiety/" rel="noopener noreferrer"&gt;Godfather of AI Geoffrey Hinton, Radiologists, and the Future of Work&lt;/a&gt; — Fortune, May 2026. Radiologist median salary now $571K and growing a decade after Hinton's 2016 obsolescence prediction.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.penguinrandomhouse.com/books/557903/deep-thinking-by-garry-kasparov/" rel="noopener noreferrer"&gt;Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins&lt;/a&gt; — Kasparov, G., PublicAffairs, 2017. Kasparov's centaur chess research and the generalization to human-AI collaboration.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/" rel="noopener noreferrer"&gt;GitHub Copilot Research: The Impact of AI on Developer Productivity&lt;/a&gt; — GitHub, 2022. Controlled study: developers completed tasks 55% faster with Copilot assistance; code quality equivalent to unassisted work.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://onlinelibrary.wiley.com/doi/10.1155/hbe2/8303770" rel="noopener noreferrer"&gt;Paradox of Augmentation&lt;/a&gt; — &lt;em&gt;Human Behavior and Emerging Technologies&lt;/em&gt;, 2026. Human performance initially rises with AI support, then dips below baseline with sustained use. Empirical evidence for skill atrophy under AI assistance.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://tianpan.co/blog/2026-04-19-skill-atrophy-ai-augmented-engineering" rel="noopener noreferrer"&gt;Skill Atrophy in AI-Augmented Engineering&lt;/a&gt; — 2026. AI coding assistance lowers code comprehension scores by 17% and makes experienced developers 19% slower on debugging tasks, while developers report feeling 20% faster.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.psychologytoday.com/us/blog/the-algorithmic-mind/202603/adults-lose-skills-to-ai-children-never-build-them" rel="noopener noreferrer"&gt;Adults Lose Skills to AI, Children Never Build Them&lt;/a&gt; — &lt;em&gt;Psychology Today&lt;/em&gt;, March 2026. Distinguishes skill loss in workers 46+ (offloading mastered tasks) from skill formation failure in workers 17-25 (offloading tasks they were supposed to be learning).&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.nature.com/articles/s41598-025-98385-2" rel="noopener noreferrer"&gt;AI Collaboration, Task Performance, and Intrinsic Motivation&lt;/a&gt; — &lt;em&gt;Scientific Reports&lt;/em&gt;, 2026. AI collaboration enhances task performance but measurably undermines intrinsic motivation and sense of ownership.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://insights.som.yale.edu/insights/the-real-job-destruction-from-ai-is-hitting-before-careers-can-start" rel="noopener noreferrer"&gt;The Real Job Destruction from AI Is Hitting Before Careers Can Start&lt;/a&gt; — Yale SOM / Stanford HAI 2026 AI Index. Developer employment ages 22-25 fell nearly 20% since 2024; developers 30 and older at the same companies grew over the same period.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology, with Claude Code and Codex as AI collaborators.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>career</category>
      <category>management</category>
    </item>
    <item>
      <title>The Letter VCs Are Quietly Deleting from ARR</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Sun, 07 Jun 2026 21:41:45 +0000</pubDate>
      <link>https://dev.to/keithjmackay/the-letter-vcs-are-quietly-deleting-from-arr-18be</link>
      <guid>https://dev.to/keithjmackay/the-letter-vcs-are-quietly-deleting-from-arr-18be</guid>
      <description>&lt;h1&gt;
  
  
  The Letter VCs Are Quietly Deleting from ARR
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Startups are reporting revenue they haven't earned yet. VCs know it. Investors are cheering anyway. We've seen this movie.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;You're evaluating an AI startup. The pitch deck shows $100 million in ARR. The growth curve is parabolic. The deck says they signed $100 million. What it doesn't say is that $70 million of that is "committed ARR": contracts signed but not yet invoiced, customers who haven't deployed yet, pilots that count toward the number if they convert. Subtract the gap and you've got $30 million in actual recognized revenue hiding under a $100 million headline.&lt;/p&gt;

&lt;p&gt;This is the ARR inflation playbook, and it's running at full speed right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Trick Has a Name
&lt;/h2&gt;

&lt;p&gt;"CARR" stands for Contracted or Committed ARR. It's a legitimate concept. In industries where revenue accrues slowly after contract signing (healthcare AI deployments, energy optimization platforms, multi-year enterprise integrations), the gap between signature and recognition can legitimately take months or years to close. Reporting CARR alongside ARR, properly labeled, is defensible.&lt;/p&gt;

&lt;p&gt;That's not what's happening.&lt;/p&gt;

&lt;p&gt;What's happening is simpler: founders strip the "C" and just call it ARR. One VC told TechCrunch the gap between CARR and actual ARR can run as high as 70% [1]. In some confirmed cases the spread is 3-5x. Another investor said flatly: "For sure they are reporting CARR as ARR" [1]. The article indicates that the investor community is not only aware of the practice; many investors are actively complicit.&lt;/p&gt;

&lt;p&gt;The logic follows its own warped rationality. When one startup in a category inflates, the others have to follow to stay competitive for talent and headlines. "When one startup does it in a category, it is hard not to do it yourself just to keep up," as one investor put it [1]. Spellbook CEO Scott Stevenson, one of the few willing to call this out in public, described the practice as a "huge scam," adding that major VC funds are not just watching it happen but actively supporting the narrative [2].&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The metric isn't tracking business reality. It's tracking a story being told for the benefit of people who need the story to be true.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  We Have Seen This Before
&lt;/h2&gt;

&lt;p&gt;If you were in the SaaS world circa 2021-2022, this will feel familiar.&lt;/p&gt;

&lt;p&gt;At the peak of the zero-interest-rate era, growth-at-all-costs was the gospel. Companies reported "ARR" using definitions that would have made their accountants wince: annualizing a single good month, counting trials, counting LOIs, counting anything that could be plausibly labeled recurring. The metric became a narrative tool, not a financial one.&lt;/p&gt;

&lt;p&gt;The crash was instructive. When rates rose and growth-multiple valuations compressed, the distance between reported metrics and cash reality became unforgiving. Companies that had raised at 40x ARR on inflated numbers found themselves underwater fast. The write-downs from venture portfolios in 2022 and 2023 were staggering.&lt;/p&gt;

&lt;p&gt;Go back further to the dot-com era and the dynamic is even starker. Eyeballs. Page views. Registered users. Each bubble generates its own vanity metric that sounds like a financial number but behaves like a PR number. The AI cycle's version is CARR-as-ARR: precise enough to sound real, slippery enough to hide the gap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pattern is always the same: a new category, a new metric, and a crowd of people with aligned incentives telling themselves the old rules don't apply.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is Actually a Ponzi Problem
&lt;/h2&gt;

&lt;p&gt;Run the actual math. An AI startup raises at, say, 20x its reported $100 million ARR. Valuation: $2 billion. But if actual ARR is $30 million, the real multiple is 67x: a number most rational investors wouldn't accept if they saw it plainly. The gap has to close in one of two ways: either the revenue materializes (the CARR converts, the contracts activate, the pilots become paid customers) or the next fundraise arrives before anyone asks hard questions.&lt;/p&gt;
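&lt;p&gt;The arithmetic above, as a short Python sketch. The figures are the illustrative ones from this example ($100 million reported, $30 million actual, a 20x headline multiple); the function name is hypothetical.&lt;/p&gt;

```python
def implied_multiple(valuation_musd, arr_musd):
    """Valuation divided by ARR, both in millions of dollars."""
    return valuation_musd / arr_musd

reported_arr = 100.0      # headline ARR from the deck, $M
actual_arr = 30.0         # ARR once unconverted CARR is removed, $M
headline_multiple = 20.0  # multiple the round was priced at

# The valuation is set off the reported number...
valuation = headline_multiple * reported_arr

# ...but the multiple actually being paid is on the real number.
real_multiple = implied_multiple(valuation, actual_arr)

print(f"Valuation: ${valuation:,.0f}M")
print(f"Multiple on reported ARR: {headline_multiple:.0f}x")
print(f"Multiple on actual ARR: {real_multiple:.1f}x")
```

&lt;p&gt;On these inputs the multiple on actual ARR comes out at about 66.7x, which rounds to the 67x quoted above.&lt;/p&gt;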

&lt;p&gt;The next round keeps the story alive. The round after that does too. The round after that is either an IPO, an acquisition, or a reckoning.&lt;/p&gt;

&lt;p&gt;When markets are liquid and multiples are expanding, the gap can stay hidden indefinitely. The new money funds the old story. But when the music stops (rising rates, a public market correction, a category reset), the companies sitting on inflated ARR find themselves without chairs.&lt;/p&gt;

&lt;p&gt;The investors who got in early will have returned their funds by the time the reckoning arrives. The investors who got in at Series D or E on a $2 billion valuation backed by $30 million in real revenue will not. The employees who took lower salaries for equity will not.&lt;/p&gt;

&lt;p&gt;This is not hyperbole. This is the documented playbook of every prior cycle, replayed with different vocabulary.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Actually Look At
&lt;/h2&gt;

&lt;p&gt;If you're evaluating AI companies (as an investor, an acquirer, a PE firm doing diligence, or a potential enterprise customer checking vendor stability), here's what the ARR number won't tell you:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cash in the bank and burn rate.&lt;/strong&gt; If a company has $100 million in ARR and $40 million in cash with a $6 million monthly burn, the story is different than if they're sitting on $200 million. ARR doesn't pay rent. Cash does.&lt;/p&gt;
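&lt;p&gt;A quick runway check makes the difference concrete. This sketch uses the figures from the example above and assumes, for illustration only, that the $6 million monthly burn is the same in both cases and stays flat; the function name is hypothetical.&lt;/p&gt;

```python
def runway_months(cash_musd, monthly_burn_musd):
    """Months of cash left at a constant burn rate (inputs in $M)."""
    return cash_musd / monthly_burn_musd

monthly_burn = 6.0  # $M per month, assumed constant

tight = runway_months(40.0, monthly_burn)   # $40M in the bank
roomy = runway_months(200.0, monthly_burn)  # $200M in the bank

print(f"$40M cash: {tight:.1f} months of runway")
print(f"$200M cash: {roomy:.1f} months of runway")
```

&lt;p&gt;Same ARR headline, same burn: one company has well under a year to make the story true, the other has nearly three.&lt;/p&gt;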

&lt;p&gt;&lt;strong&gt;Revenue recognition policy.&lt;/strong&gt; What accounting standard are they using? When do they book revenue: at contract signing, at deployment, at invoice, at collection? A company that recognizes revenue at signature on multi-year contracts is telling a very different story than one that recognizes monthly on delivery. This requires asking directly. The answer tells you which ARR definition you're working with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Customer concentration.&lt;/strong&gt; "$100 million in ARR" is a very different number if three customers represent 60% of it. Realization risk and churn risk are both concentrated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logo churn vs. dollar churn.&lt;/strong&gt; The churned customer count understates dollar exposure. The customer who leaves quietly is rarely the small one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pilot-to-paid conversion rate.&lt;/strong&gt; AI adoption timelines are long. Pilots run 90 to 180 days. Not all convert. CARR that depends on pilot conversions is optimistic by nature. Ask for the historical conversion rate, not the projected one.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How much of the CARR has a contractual obligation (signed MSA with SOW) versus a letter of intent?&lt;/li&gt;
&lt;li&gt;What's the average time between contract signing and first invoice?&lt;/li&gt;
&lt;li&gt;What percentage of contracts require an active renewal decision versus auto-renew?&lt;/li&gt;
&lt;li&gt;Has any CARR been written down or reclassified in the last 12 months?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The gap between CARR and ARR, over time, is the most diagnostic number in diligence.&lt;/strong&gt; If it's not shrinking, the story isn't working.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Issue: What It Does to the Category
&lt;/h2&gt;

&lt;p&gt;ARR inflation isn't just a valuation problem. It distorts the entire category.&lt;/p&gt;

&lt;p&gt;When inflated numbers become the benchmark, every founder in the space has to match them or look like a laggard. Genuine companies with honest metrics get compared to fictional ones. Investment dollars flow toward the best story rather than the best business. Talent follows the valuation. The whole category gets mispriced.&lt;/p&gt;

&lt;p&gt;And then, when the reckoning arrives, it hits every company in the category, including the ones that were never inflating anything. The SaaS crash of 2022-2023 punished honest operators alongside the embellishers. The AI category will be no different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The valuation you benefit from on the way up is the same one that drowns you on the way down.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The people who understand this are the ones currently doing the quiet work of building real revenue from paying customers. They're less interesting at cocktail parties right now. They'll be a lot more interesting in 18 months.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;AI ARR inflation is widespread, VC-tolerated, and structurally identical to every prior cycle's vanity metric problem. The gap between contracted and recognized revenue is real, the incentives to obscure it are powerful, and the investors most exposed are the ones entering late on valuations built on stories rather than cash. That pitch deck you looked at (the one with the parabolic curve and the $100 million headline): the real number is in the footnotes. Ask for the footnotes.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's your experience with this? Are you seeing CARR-as-ARR in the market, and how are you adjusting your diligence process?&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://techcrunch.com/2026/05/22/how-vcs-and-founders-use-inflated-arr-to-kingmake-ai-startups/" rel="noopener noreferrer"&gt;How VCs and founders use inflated 'ARR' to crown AI startups&lt;/a&gt; — TechCrunch, May 22, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.fastcompany.com/91532292/ai-startups-arr-carr-scott-stevenson" rel="noopener noreferrer"&gt;AI startups are inflating a key revenue metric to win VC attention, says this founder&lt;/a&gt; — Fast Company&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;strong&gt;If this resonated, here are some related articles:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For why the SaaS panic overstated the case and where the real moat actually moved: &lt;a href="https://www.linkedin.com/pulse/saaspocalypse-real-saas-dead-saasinine-keith-mackay-6xqhe/" rel="noopener noreferrer"&gt;On LinkedIn: SaaSpocalypse? Real. SaaS Is Dead? SaaSinine.&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/saaspocalypse-real-saas-is-dead-saasinine" rel="noopener noreferrer"&gt;On Substack&lt;/a&gt; | &lt;a href="https://medium.com/@keithwrites/saaspocalypse-real-saas-is-dead-saasinine-630bbb0abfd8" rel="noopener noreferrer"&gt;On Medium&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For how the "layoffs for AI" narrative works as a story about investors and incentives, not just headcount: &lt;a href="https://www.linkedin.com/pulse/companies-really-doing-layoffs-ai-keith-mackay-jtkfe/" rel="noopener noreferrer"&gt;On LinkedIn: Are Companies Really Doing Layoffs "For AI"?&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/are-companies-really-doing-layoffs" rel="noopener noreferrer"&gt;On Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For the real cost math behind AI investment and why infrastructure scarcity doesn't change the calculus: &lt;a href="https://www.linkedin.com/pulse/ai-infrastructure-scarcity-raising-costs-usage-still-provide-mackay-y2hce/" rel="noopener noreferrer"&gt;On LinkedIn: AI Infrastructure Scarcity is Raising Costs, but AI Usage Will Still Provide Unbeatable ROI&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/ai-infrastructure-scarcity-is-raising" rel="noopener noreferrer"&gt;On Substack&lt;/a&gt; | &lt;a href="https://medium.com/@keithwrites/ai-infrastructure-scarcity-is-raising-costs-but-ai-usage-will-still-provide-unbeatable-roi-3194d3178132" rel="noopener noreferrer"&gt;On Medium&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For what makes a software business genuinely defensible once AI commoditizes the build — the question inflated ARR can't answer: &lt;a href="https://www.linkedin.com/pulse/software-moats-age-ai-whats-actually-defensible-keith-mackay-ibsde" rel="noopener noreferrer"&gt;On LinkedIn: Software Moats in the Age of AI: What's Actually Defensible?&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/software-moats-in-the-age-of-ai" rel="noopener noreferrer"&gt;On Substack&lt;/a&gt; | &lt;a href="https://medium.com/@keithwrites/software-moats-in-the-age-of-ai-whats-actually-defensible-698d4433d61e" rel="noopener noreferrer"&gt;On Medium&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology, with Claude Code and Codex as AI collaborators.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>saas</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>The Mythical Management Month</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Sun, 07 Jun 2026 19:52:07 +0000</pubDate>
      <link>https://dev.to/keithjmackay/the-mythical-management-month-4af0</link>
      <guid>https://dev.to/keithjmackay/the-mythical-management-month-4af0</guid>
      <description>&lt;h1&gt;
  
  
  15 Direct Reports and The Mythical Man(agement) Month
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Everyone is redesigning their org for AI...but the ones who get it right will remember to design for the humans involved.
&lt;/h2&gt;




&lt;p&gt;Brian Armstrong just announced Coinbase is restructuring around small, nimble teams and managers with up to 15 direct reports [1]. The logic is appealing: smaller teams move faster, AI extends individual output, fewer layers means clearer accountability. I don't disagree with any of that.&lt;/p&gt;

&lt;p&gt;But 15 direct reports, 30-minute 1:1s, once a week: that's 7.5 hours. A whole day. And it won't pencil out that cleanly -- other meetings and commitments will get in the way and things will get moved, cancelled, or become interruptions of important blocks of productive working time.&lt;/p&gt;

&lt;p&gt;Paul Graham wrote an essay in 2009 about the core challenge these Coinbase managers will face: the maker's schedule and the manager's schedule are fundamentally incompatible [2]. Makers (knowledge workers) need long uninterrupted blocks for productive work; a single 30-minute meeting doesn't cost 30 minutes, it can cost the half-day around it: you lose your place to jump to the meeting, then rebuild that context from the ground up afterward. &lt;em&gt;[As an aside: we see the same thing when we clear the context window in a coding agent...our team uses tools to store and restore session context before and after the clear to reduce the pain. I have not found a way to do that for human context effectively...and a 30-minute meeting will reliably clear my ADHD context window.]&lt;/em&gt; Coinbase's model asks these managers to be player-coaches who both lead and build -- which means running a manager's calendar while trying to hold a maker's schedule inside it. That's not a workflow optimization problem. It's an unsolvable math problem (Graham worked around it by running two shifts: manager's schedule before dinner, maker's schedule after dinner...not unusual for entrepreneurs, but unsustainable for many people).&lt;/p&gt;

&lt;p&gt;Move the 1:1s to biweekly and you've "only" spent half a day per week, but now you're seeing each person for 30 minutes every two weeks. Monthly? You've clawed back most of your calendar, but you're investing only 30 minutes per working month in the people you're supposed to be developing. At some cadence between "constant" and "never," there's a schedule that sounds efficient and quietly breaks everything.&lt;/p&gt;
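
&lt;p&gt;The cadence trade-off is easy to check with a few lines of Python. This is a minimal sketch, not a scheduling model: the function names are made up for illustration, and it assumes a four-week working month and that every scheduled 1:1 actually happens.&lt;/p&gt;

```python
# Hypothetical helpers for the 1:1 arithmetic above (names are illustrative).

def weekly_one_on_one_hours(reports, minutes_per_meeting, weeks_between_meetings):
    """Average hours per week the manager spends in 1:1 meetings."""
    return reports * minutes_per_meeting / 60 / weeks_between_meetings


def minutes_per_report_per_month(minutes_per_meeting, weeks_between_meetings,
                                 weeks_per_month=4):
    """Face time each report gets per month (assumes a four-week month)."""
    return minutes_per_meeting * weeks_per_month / weeks_between_meetings


# 15 direct reports, 30-minute meetings, at three different cadences.
for label, gap in [("weekly", 1), ("biweekly", 2), ("monthly", 4)]:
    hours = weekly_one_on_one_hours(15, 30, gap)
    face_time = minutes_per_report_per_month(30, gap)
    print(f"{label}: {hours} manager hours/week, "
          f"{face_time} minutes per report per month")
```

&lt;p&gt;Under those assumptions, weekly 1:1s cost the manager 7.5 hours a week and give each report 120 minutes a month; monthly 1:1s cost the manager under two hours a week and give each report 30 minutes.&lt;/p&gt;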

&lt;p&gt;This is the org design problem that is getting left out of the "gotta use AI" frenzy: connection matters -- and it's a cost of doing business that you can't always engineer away.&lt;/p&gt;

&lt;h3&gt;
  
  
  Brooks Had a Point
&lt;/h3&gt;

&lt;p&gt;Frederick Brooks published &lt;em&gt;The Mythical Man-Month&lt;/em&gt; in 1975 [3]. To oversimplify his central observation: adding engineers to a late project makes it later. Some work is inherently sequential. The example he uses in the book is that a baby takes nine months regardless of how many people you assign to the task.&lt;/p&gt;

&lt;p&gt;Management has sequential, non-parallelizable components too. You can use AI to prep for your 1:1s faster. You can use note-capture tools and second brains with agentic AI assistance to log the things you'd otherwise forget (and I highly recommend it!). You can summarize performance data, flag patterns, auto-draft your weekly update. That's real. Those savings add up to hours.&lt;/p&gt;

&lt;p&gt;What you cannot parallelize is the relationship itself.&lt;/p&gt;

&lt;p&gt;A manager who reads their AI-generated brief right before a 1:1 -- "Taylor had a tough sprint, prefers directness, is working toward a promotion conversation in Q3" -- knows Taylor...but only in the same way that reading an IMDB page means you know an actor. Interesting information. Useful context. Not the same as the thing it's describing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dunbar's Number helps to shed some light here.&lt;/strong&gt; Robin Dunbar's research on primate cognition and human social groups found that we maintain meaningful relationships in layers: roughly 5 in our innermost circle, 15 in the next, 50 beyond that [4]. These aren't preferences. They're cognitive load limits. Dunbar himself said the innermost 5 people (spouse, immediate family, best friends, closest work colleagues) receive roughly 40% of your total social time and emotional capital [4]. Relationships at the 15-person layer are real but thinner; they require consistent investment to stay functional.&lt;/p&gt;

&lt;p&gt;Note the coincidence: the "up to 15 direct reports" number sits right at the outer edge of the layer where humans naturally maintain close working relationships. Push past it and you're not just adding meetings. You're asking people to care meaningfully about more people than their brains are built to track.&lt;/p&gt;

&lt;p&gt;I've seen this in teams, and I've noted before that Dunbar's number correlates quite nicely to when we need layers of management in a start-up to continue growing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With 5 people: you're family. Everybody is self-managed and knows what everyone else is doing, often minute-to-minute, without formal meetings. You have shorthand with each other so "meetings" are two-minute knowledge transfers over a donut.&lt;/li&gt;
&lt;li&gt;With 15 people: you're extended family. You have moved to multiple people in some roles, and you need to start having managers. You still know everyone, and what everyone is doing, but meetings start happening to allow distribution of tasks and organization of the work. You know your cousins, but you don't know all of them well. How would you? You spend much less time with each of them.&lt;/li&gt;
&lt;li&gt;When you get to 50 people, it's more like the family reunion. You recognize them all. You can't know what everyone is working on today and do your own job well. Meetings (and everything else) need to be more formal, so they can be managed and so strategy is transferred from top management down the org chart (now there are tactical layers that aren't management). There are many more middle managers.&lt;/li&gt;
&lt;li&gt;At the next level, 125-150 people, we reach the largest group size in which humans seem able to maintain ongoing relationships at all. Larger than this, we just need additional layers of management to aggregate the reporting up and the messaging down. One aspect of Dunbar's work showed that when human tribes grew to 125-150 members, they would split into multiple tribes. It seems to say something about the human brain's absolute capacity to manage and maintain relationships with other humans. After all, we evolved in small tribes with limited mobility and shared value in building relationships to work together for survival. We're literally built for that.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So much value arises from good relationships. Having each other's backs -- not just loyalty, but understanding of what teammates need (which is NOT the same from day to day...to the chagrin of new managers everywhere). Cutting each other slack on a bad day. Understanding each other well enough to SEE that it's a bad day. Real trust in each other -- trusting teammates with things that are going on at home that might get in the way of work on occasion is a tricky business. Many will say work is NOT the place for that (and I'm not suggesting you pull a colleague aside right now and tell them about home struggles). And yet, we all have lives outside of work, and bad days...the very best jobs I've had are the ones where small, high-performance teams knew each other well enough to be ABLE to share things about their own lives and struggles with each other where it would have a work impact -- and the team could plan and function better as a result. Clear communication with respect, love, and candor for each other is the goal. This requires trust, which only comes from time together.&lt;/p&gt;

&lt;p&gt;Good relationships are an investment that requires time and consistency.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Number Isn't Fixed. The Range Is.
&lt;/h3&gt;

&lt;p&gt;Before the "but Dunbar's Number is contested!" crowd starts in: yes, the exact figures are debated. Swedish researchers re-running Dunbar's analysis with updated statistical methods found confidence intervals so wide that specifying any single number is, in their words, "of limited value" [5]. Individual variation is real and documented. Extroverts naturally maintain larger networks than introverts. I would argue the small business numbers for management layers are around 10, 25, 50, 100, perhaps because work relationships are a bit shallower than non-work relationships. The outer relationship layer (the ~150 figure) appears more stable cross-culturally than the inner ones.&lt;/p&gt;

&lt;p&gt;But any variation is a range, not an escape hatch. A 2024 study examining how people allocate emotional energy across relationship layers found that some individuals commit 45% of their social attention to their inner 5, others only 15%. Either way, NOBODY is managing 30 deep relationships simultaneously [6]. The cognitive load may not be uniform, but it IS bounded -- we each have limited time and energy available, and must choose how to spend those limited resources on our relationships.&lt;/p&gt;

&lt;p&gt;Will an AI-native generation learn to stretch these limits? If you've grown up coordinating dozens of social connections through apps, does your brain actually develop a wider working memory for relationships? We don't know yet, but I suspect we'll find that there ARE brain changes from our new learning and exposure patterns (we've seen this in existing younger generations who have grown up with social media, for instance). We DO know that the research on &lt;em&gt;depth vs. breadth&lt;/em&gt; consistently points one direction so far: organizations where managers can build high-quality relationships with their reports show meaningfully better trust, knowledge sharing, and team performance than those running wider spans [7]. A 2025 study found that expanded span of control specifically reduces leadership effectiveness by cutting into the relationship-building time that makes management work in the first place [7]. If your experience as a human being in the world is anything like mine, none of this will surprise you.&lt;/p&gt;

&lt;p&gt;The takeaway is that the number of connections varies from individual to individual, but the pattern we've evolved for human interaction doesn't. Trading depth for breadth has costs, and they compound. What we haven't yet studied is what a generation of workers who have only ever known high-breadth, lower-depth management looks like at scale. I think we're going to see the experiment play out around us in real time, and I expect the pendulum will swing back.&lt;/p&gt;

&lt;h3&gt;
  
  
  The IMDB Problem
&lt;/h3&gt;

&lt;p&gt;I use a second brain. Obsidian, notes on everything, agentic AI that helps to provide context just when I need it. I'd lose track of far more without it. In a 1:1, being able to pull up "we talked about this in December and here's what we agreed" has real value.&lt;/p&gt;

&lt;p&gt;But when I'm &lt;em&gt;reading&lt;/em&gt; notes to remember something that I feel like I should organically remember in the interest of a relationship, I'm doing something different than managing. I'm performing management. The file has the data. The relationship has atrophied. Do I KNOW this person and what they need, or do I know ABOUT them, the way I know ABOUT an actor from their IMDB profile? Those are different things.&lt;/p&gt;

&lt;p&gt;AI can help with the measurable inputs: feedback documentation, promotion narratives, goal tracking, prep for difficult conversations. These are real leverage points and you should use them. What AI cannot do is replace the accumulated texture of time spent. The trust that builds from seeing someone struggle and staying in it with them. The instinct that develops when you've watched someone work long enough to know the difference between "they're quiet because they're focused" and "they're quiet because something is wrong."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The parts of developing people that can be made more efficient with AI are the administrative parts. The actual development happens in the accumulated moments that look inefficient from the outside.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Observations In a Group Chat
&lt;/h3&gt;

&lt;p&gt;Talking about the Coinbase decision in a chat composed of VERY experienced C-Suite veterans, I weighed in that 7 or 8 direct reports is, in my mind, a healthy upper target. I ran mastermind groups in the past, and found that this was also the sweet spot for these groups. Larger than that, and the people in the group didn't spend enough time sharing in the sessions to build the trusting relationships necessary for the room to be a safe space for some truly sensitive and supportive conversations. A phenomenal manager and builder of multiple successful businesses indicated 10 people + 5 agents would be an upper limit for them. The largest number anyone in the thread suggested for a conceivable upper limit was 12, and that came with a qualifier: only works for a superstar (and I might add: "and even then only if the 12 reports are all high performers who need limited direction").&lt;/p&gt;

&lt;p&gt;High performers still have bad days. They or their kids get sick. They go through divorces and layoffs and parents in hospice and water leaks and cable installation and the whole catalog of things that happen when you hire humans instead of machines. Being a "superstar" doesn't make someone a robot.&lt;/p&gt;

&lt;p&gt;I conduct some new manager training for our group -- helping them understand the differences as they move from managing tasks and work streams to managing teams of people. I tell them what I believe: this is the hardest transition in a career. Not because the skills are technically complex. Because the whole frame has to flip.&lt;/p&gt;

&lt;p&gt;Tasks are bounded. You can learn them, master them, build intuition for them, delegate them cleanly. If a task goes wrong you diagnose the problem, fix it, move on.&lt;/p&gt;

&lt;p&gt;People aren't bounded. A person having a rough month doesn't come with error logs or reboot instructions, or a right way to solve the problem. The "fix" might take years and involve influences you can't see or control. And you as a manager are also having good days and bad ones, also bringing your own history and blind spots and capacity limits to every interaction.&lt;/p&gt;

&lt;p&gt;We're not automating our way out of that. We shouldn't want to.&lt;/p&gt;

&lt;h3&gt;
  
  
  Teams of Humans and Agents: Who Adapts to Whom?
&lt;/h3&gt;

&lt;p&gt;The conversation is shifting from "AI tools that help individuals" to "teams composed of humans and agents" (or even "dark factory" teams with no humans at all ["zero-person companies"], which is a whole 'nother topic to think through and a fascinating opportunity for some types of businesses). That's real and worth taking seriously.&lt;/p&gt;

&lt;p&gt;Research on human-AI teaming points to something fascinating: unlike human teams, where you largely recruit members with relatively fixed capabilities (and only sometimes hire to fill psychological or workstyle gaps/complementarities), AI teammates can be instantiated to match the profile you need [8]. The agent can be configured to prefer directness or to check in more often or to front-load its uncertainty. The flexibility is genuinely remarkable (I predict that we will see the personalities and work styles of agents evolve to complement the strengths and work styles of the particular human team-members with whom they are working).&lt;/p&gt;

&lt;p&gt;The research also demonstrates the current limits. Humans have evolved as social animals such that teams negotiate roles naturally and interpret implicit social cues without thinking about it. AI agents need explicit protocols to do the same work. The agent will do what you configured. It won't notice what you didn't configure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The right frame isn't "will AI adapt to humans or will humans adapt to AI?" The right frame is: what does this specific team need, and who is responsible for maintaining it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a well-designed human-agent team, the agent handles the parallelizable, repeatable, stateful work: tracking progress, surfacing context, flagging when something deviates from the plan. The human handles the work that requires judgment, relationship, and the kind of presence you can't stub in a config file.&lt;/p&gt;

&lt;p&gt;All-agent teams are coming for certain categories of work. For others -- the work that turns on trust, creativity, and the weird non-rational things humans do under pressure -- the agent is likely the foil and thought partner and collaborator, not the lead.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Future Is Already Here--It's Just Not Evenly Distributed
&lt;/h3&gt;

&lt;p&gt;This section heading is a favorite quote from William Gibson, popularized over 30 years ago -- and more true now, if anything. Companies are beginning to experiment with the limits of agentic technology at scale (Meta is recording employee keystrokes, mouse movements, and screenshots as a training set for agentic AI that could in theory eventually replace those employees [9]). Will this work? I bet unequivocally yes -- for some cases, with some spectacular failures to come.&lt;/p&gt;

&lt;p&gt;Cursor runs $500M in annual revenue with 50 engineers [10]. That's a staggering number, and it's not magic: it's a small team with high context, deep ownership, and tools that extend what each person can do. The math works because everyone carries the whole picture.&lt;/p&gt;

&lt;p&gt;But Cursor's model doesn't scale by adding 15 people per manager and calling it nimble. It scales by maintaining the conditions that made small teams effective in the first place: real relationships, shared context, people who know each other well enough to move without constant coordination overhead.&lt;/p&gt;

&lt;p&gt;The tools to run a fundamentally different kind of organization exist today. The constraint is human and organizational absorption speed. Testing takes time. Approval processes take time (heavens to Murgatroyd do they take time!). Training takes time. The humans in the loop -- not because they're inefficient, but because they're human -- need the time it actually takes to build knowledge, habits, trust in new systems, new teammates, new ways of working. That's not friction to be eliminated. That's the pace of durable change.&lt;/p&gt;

&lt;p&gt;You can accelerate the administrative layers. You can learn faster than most do by using better teaching/training/learning techniques. You cannot compress the relationship layers. Organizations that understand the difference will build structures that last. Organizations that don't will run excellent pilots and wonder why nothing sticks.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Bottom Line
&lt;/h3&gt;

&lt;p&gt;Fifteen direct reports doesn't fail because the math is wrong. It fails because connection doesn't scale linearly. Dunbar figured that out from looking at human communication patterns. Brooks figured it out from observing software projects. Every experienced manager has figured it out the hard way (usually around year two).&lt;/p&gt;

&lt;p&gt;The org designs that win in the next five years will be the ones that use AI to ruthlessly eliminate everything that shouldn't require a human, and then protect with equal ruthlessness the time for everything that does. The monthly 1:1 that feels like efficiency now will produce...monthly relationships. And monthly relationships, I would argue, are very different from the relationships you want your teams to have. They're acquaintanceships. You can't build someone's career on an acquaintanceship.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's your current span of control, and where do you feel the edges? I'm curious whether the 7-9 ceiling holds across industries or whether there are domains where it genuinely breaks.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/05/05/coinbase-to-lay-off-14-of-staff-as-part-of-broader-restructuring/" rel="noopener noreferrer"&gt;Coinbase to lay off 14% of staff as part of broader restructuring to AI-native pods (TechCrunch)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.paulgraham.com/makersschedule.html" rel="noopener noreferrer"&gt;Maker's Schedule, Manager's Schedule — Paul Graham (2009)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Brooks, F. P. (1975). &lt;em&gt;The Mythical Man-Month: Essays on Software Engineering&lt;/em&gt;. Addison-Wesley.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.steelcase.com/research/articles/how-your-brain-makes-friends-with-robin-dunbar-transcript/" rel="noopener noreferrer"&gt;Robin Dunbar on the layered structure of human relationships: 5 inner circle, 15 sympathy group, 40% of social time to the inner 5 (Steelcase Research Transcript)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8103230/" rel="noopener noreferrer"&gt;Dunbar's number deconstructed: confidence intervals too wide to specify a single value (Biology Letters, 2021)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11896044/" rel="noopener noreferrer"&gt;Reflecting on Dunbar's numbers: individual differences in energy allocation across relationship layers (PLOS One, 2024)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC12231798/" rel="noopener noreferrer"&gt;Expanded span of control, leadership effectiveness, and relationship quality (PMC, 2025)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC12093936/" rel="noopener noreferrer"&gt;The Role of Adaptation in Collective Human-AI Teaming: Zhao, Simmons, Admoni (Carnegie Mellon, Topics in Cognitive Science)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/04/21/meta-will-record-employees-keystrokes-and-use-it-to-train-its-meta-ai-models/" rel="noopener noreferrer"&gt;Meta will record employees' keystrokes and use it to train its AI models (TechCrunch, April 2026)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://newsletter.pragmaticengineer.com/p/cursor" rel="noopener noreferrer"&gt;Real-world engineering challenges: building Cursor: 50 engineers, $500M+ ARR (Pragmatic Engineer)&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;strong&gt;If this resonated, here are some related articles:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For why the layoff wave being attributed to AI is more complicated than the headlines suggest: &lt;a href="https://www.linkedin.com/pulse/companies-really-doing-layoffs-ai-keith-mackay-jtkfe/" rel="noopener noreferrer"&gt;Are Companies Really Doing Layoffs "For AI"?&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/are-companies-really-doing-layoffs" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For how to think about managing AI itself — the same relationship-vs-tool tension applies when the "direct report" is an agent: &lt;a href="https://www.linkedin.com/pulse/situational-leadership-ai-more-like-capable-colleague-keith-mackay-wjqoe" rel="noopener noreferrer"&gt;Situational Leadership for AI: More Like a Capable Colleague than a Fancy Formula&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/situational-leadership-for-ai" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For the org design question underneath this one — how much human judgment to keep in the loop as AI takes on more work: &lt;a href="https://www.linkedin.com/pulse/evolving-strategy-knowledge-work-from-keith-mackay-xiefe/" rel="noopener noreferrer"&gt;An Evolving Strategy for Knowledge Work: From Human-In-the-Loop to Human-Before-the-Loop&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/an-evolving-strategy-for-knowledge" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For why enterprise AI confidence tends to outrun enterprise AI competence — relevant to anyone betting org design on AI capabilities that aren't fully proven yet: &lt;a href="https://www.linkedin.com/pulse/dunning-kruger-effect-now-available-enterprise-scale-keith-mackay-kclxf/" rel="noopener noreferrer"&gt;The Dunning-Kruger Effect, Now Available at Enterprise Scale&lt;/a&gt; | &lt;a href="https://dev.to/keithjmackay/the-dunning-kruger-effect-now-available-at-enterprise-scale-32m7-temp-slug-5673335"&gt;Dev.to&lt;/a&gt; | &lt;a href="https://medium.com/@keithwrites/the-dunning-kruger-effect-now-available-at-enterprise-scale-8fa51418005d" rel="noopener noreferrer"&gt;Medium&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/publish/post/196867610" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology, with Claude Code and Codex as AI collaborators.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>leadership</category>
      <category>management</category>
      <category>agents</category>
      <category>ai</category>
    </item>
    <item>
      <title>One AI Vendor Is a Single Point of Failure. Treat It Like One.</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Sun, 07 Jun 2026 19:35:40 +0000</pubDate>
      <link>https://dev.to/keithjmackay/one-ai-vendor-is-a-single-point-of-failure-treat-it-like-one-kp1</link>
      <guid>https://dev.to/keithjmackay/one-ai-vendor-is-a-single-point-of-failure-treat-it-like-one-kp1</guid>
      <description>&lt;h1&gt;
  
  
  One AI Vendor Is a Single Point of Failure. Treat It Like One.
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;The AI model you built your workflow on today may be indistinguishable from its competitor next quarter. That's not a problem. Betting on one of them is.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Pick up any two frontier models and run the same prompt through both. A year ago, you'd get noticeably different outputs: different reasoning styles, different strengths, different failure modes. Today, the gap has narrowed to a point where many enterprise workloads can't reliably tell them apart. That convergence is not an accident. It's the predictable result of an industry eating itself.&lt;/p&gt;

&lt;p&gt;And it has significant implications for how you should be building.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the Models Are Converging
&lt;/h2&gt;

&lt;p&gt;The reasons stack up fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Models trained on models.&lt;/strong&gt; When a frontier lab releases a strong model, that model's outputs become training data. For researchers. For competitors. For distillation pipelines that compress the big model's behavior into smaller, cheaper ones. [1][2] The knowledge encoded in GPT-4 didn't stay inside OpenAI. It propagated through every dataset that included AI-generated content — which is most of the internet now. Models are increasingly trained on each other's outputs, knowingly or not. OpenAI accused DeepSeek of doing exactly this: using OpenAI API outputs to train a competing model, with the White House AI czar stating there was "substantial evidence that DeepSeek distilled the knowledge out of OpenAI's models." [1] Convergence is baked into the pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Talent moves.&lt;/strong&gt; Across the whole AI industry, roughly twelve hundred people understand how to train frontier models at scale, and they circulate. The researchers who built GPT-3 helped found Anthropic: Dario Amodei, Daniela Amodei, and several colleagues left OpenAI and immediately began developing Constitutional AI and mechanistic interpretability. [3] Fortune reported in 2025 that OpenAI engineers were 8x more likely to leave for Anthropic than the reverse; Meta poached at least eleven researchers from OpenAI, DeepMind, and Anthropic in a single hiring sprint. [4] The intellectual property of training methodology travels with the humans who developed it, and the humans don't stay put. The result: training approaches converge because the people designing them are the same people, working from the same theoretical foundations, just wearing different lanyards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Standards are consolidating — faster than anyone expected.&lt;/strong&gt; The MCP story is the clearest evidence of how quickly the industry is converging around shared infrastructure, and it's worth pausing on. Anthropic announced the Model Context Protocol in November 2024. [5] OpenAI — Anthropic's direct competitor — adopted it four months later, in March 2025. Sam Altman's post said simply: "People love MCP and we are excited to add support across our products." [6] Google's Demis Hassabis publicly endorsed MCP within weeks, and Google followed with formal support. [7] Microsoft hit general availability at Build 2025. [8] By December 2025, Anthropic had donated MCP to the Linux Foundation — with OpenAI and Block as co-founders of the new Agentic AI Foundation alongside Anthropic, and Google, Microsoft, and AWS as supporting members. [9]&lt;/p&gt;

&lt;p&gt;The pace of that trajectory is worth internalizing. A standard created by one lab, adopted by every major competitor within six months, then transferred to neutral open-source governance in just over a year. SDK downloads went from roughly 100,000 per month at launch to 97 million per month by late 2025, nearly a thousand-fold increase. [10] There are now over 10,000 active MCP servers. The New Stack ran a piece titled "Why the Model Context Protocol Won." [11]&lt;/p&gt;

&lt;p&gt;When competitors adopt your standard that quickly, it means one thing: the underlying problem it solves is universal enough that no one benefits from a proprietary alternative. That's the definition of infrastructure. And infrastructure is, by definition, commodity. Token formats are standardizing on the same logic — which is precisely why model routers can exist as a product category at all. When the interfaces are identical, the model underneath is interchangeable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The benchmark treadmill.&lt;/strong&gt; Every major lab is optimizing against the same public benchmarks: MMLU, HumanEval, SWE-bench, GPQA. When you train to the same tests, you build the same competencies. The models get better at the same things, in the same ways. Differentiation exists at the frontier and at the edge cases — the things the benchmarks don't measure. For the vast middle of enterprise use cases, the models are functionally equivalent and getting more so.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The insight that changes the strategy:&lt;/strong&gt; The model isn't the moat. The model is the commodity. The moat is the workflow, the data, the institutional knowledge of how to use the tool. That remains yours regardless of which model powers it — if you build for portability.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mission-Critical Problem
&lt;/h2&gt;

&lt;p&gt;This is a movie that every systems architect has seen on repeat.&lt;/p&gt;

&lt;p&gt;No serious organization runs mission-critical infrastructure on a single server without failover. Not because the server is likely to fail, but because the cost of unplanned downtime is high enough that the insurance is worth the overhead. You build redundancy, you test failover, you know what happens when the primary goes down — before it goes down.&lt;/p&gt;

&lt;p&gt;AI tools have crossed the mission-critical threshold for a growing number of organizations. [12] When your development team's velocity depends on an AI coding assistant, when your customer service runs through an AI agent, when your analysts are using AI for research synthesis — an outage isn't a convenience problem. It's a business problem.&lt;/p&gt;

&lt;p&gt;This is not theoretical risk.&lt;/p&gt;

&lt;p&gt;On April 15, 2026, a critical incident took down Claude.ai, the Claude API, Claude Code, and the platform console simultaneously for approximately three hours. [13] Login failures locked out users who hadn't already established a session; the API went completely dark before recovering.&lt;/p&gt;

&lt;p&gt;Less than a week later, on April 20, 2026 (today as I write this), OpenAI experienced an outage of over two hours that took down ChatGPT, Codex, and the entire API platform simultaneously. [14] Elevated authentication errors, European region failures, and business workspace disruptions occurred in the same week. An AI workflow with no routing alternative was simply dead in the water.&lt;/p&gt;

&lt;p&gt;Also, for the past four days as I write this (April 17 through 20), Google's Gemini API has shown partial outages, and AI Studio logged partial outages continuously from April 2 through April 20. [15] That's three for three of the top-tier LLM providers with customer-impacting problems over the past week. The pattern is consistent enough that it should inform architecture, not just post-mortems.&lt;/p&gt;

&lt;p&gt;It's important to note that the failure modes for AI services are different from server failures. Servers either work or they don't. AI services degrade: quality drops, rate limits hit, new pricing tiers appear, context windows shrink during peak hours, reasoning depth gets quietly dialed back. [16] The service is still technically available. It's just worse. Detecting and responding to that kind of degradation requires a different architecture than traditional high-availability design.&lt;/p&gt;
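
&lt;p&gt;To make the contrast with an up/down health check concrete, here is a minimal sketch of a degradation check that compares recent behavior against a baseline. Everything in it is an assumption for illustration: the &lt;code&gt;DegradationMonitor&lt;/code&gt; name, the 0-to-1 quality score (which would have to come from your own evaluation checks), and the window and tolerance values.&lt;/p&gt;

```python
from collections import deque
from statistics import mean


class DegradationMonitor:
    """Flags a provider whose recent quality drifts below its baseline.

    Hypothetical helper: the quality score (0.0 to 1.0) is assumed to come
    from your own evaluation checks, not from the provider.
    """

    def __init__(self, baseline: float, window: int = 20, tolerance: float = 0.15):
        self.baseline = baseline            # expected average quality score
        self.tolerance = tolerance          # allowed relative drop before flagging
        self.scores = deque(maxlen=window)  # keeps only the most recent scores

    def record(self, score: float) -> None:
        self.scores.append(score)

    def is_degraded(self) -> bool:
        # Until the window is full, stay quiet rather than raise a false alarm.
        if len(self.scores) != self.scores.maxlen:
            return False
        floor = self.baseline * (1 - self.tolerance)
        return floor > mean(self.scores)
```

&lt;p&gt;The same shape would work for latency or refusal rate. The point is that the signal is a trend in output, which a status page will not report.&lt;/p&gt;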

&lt;p&gt;&lt;strong&gt;The right architecture has the same shape as any redundant system: multiple providers, automatic failover, and the ability to route work to wherever it can be done best — or cheapest — at any given moment.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Vendor Lock-in Trap
&lt;/h2&gt;

&lt;p&gt;Most enterprise AI deployments are built on a single provider. One API key. One model. One pricing tier. One support relationship.&lt;/p&gt;

&lt;p&gt;This made sense twelve to eighteen months ago, when the capability differences between providers were large enough to justify the dependency. It makes less sense now, and will make even less sense six months from now as convergence accelerates.&lt;/p&gt;

&lt;p&gt;The lock-in risk is not primarily about the provider going bankrupt. It's subtler:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pricing power.&lt;/strong&gt; A provider that owns your workflow can raise prices with confidence. You've already proven you depend on them. The negotiating position of an organization that can switch in 48 hours is fundamentally different from one that would need six months of re-integration work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality degradation without exit.&lt;/strong&gt; When Anthropic quietly reduced reasoning depth for consumer sessions, organizations locked into Claude had no lever. You can complain on Reddit or you can vote with your workload. Only one of those changes the provider's behavior.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capability ceilings.&lt;/strong&gt; No single model is best at everything. Code generation, long-document synthesis, structured data extraction, creative writing, multi-step reasoning — the rankings shift by task and by model version. An organization that can route each task to the best available tool gets better outputs than one that forces everything through a single model because re-integration is too expensive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Geopolitical and regulatory exposure.&lt;/strong&gt; As AI regulation diverges across jurisdictions and as export controls on AI capabilities tighten, an organization dependent on a single provider inherits all of that provider's regulatory risk. Diversification is also risk management.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Build the Router
&lt;/h2&gt;

&lt;p&gt;The practical answer to commoditization and to vendor lock-in is the same thing: a model router.&lt;/p&gt;

&lt;p&gt;Not just for budget control, though that's real and important, as I've written about before. Token costs vary 10x across providers for equivalent capability. [17] Dispatching the right query to the right model at the right price is genuinely valuable. But the budget argument is the least interesting reason to build routing infrastructure. I've built a modelrouter that we're beginning to test now. When I first built it, it was primarily for budget control: "how do we prevent end-of-month sticker shock?"&lt;/p&gt;

&lt;p&gt;Build it for &lt;strong&gt;failover&lt;/strong&gt;: when one provider is degraded or rate-limited, queries should route automatically to the next best option without human intervention. I have NOT built this in yet, but I can imagine a system that is monitoring results across users, monitoring status pages, and perhaps even monitoring news, and feeding those inputs into a scoring system that adjusts routing.&lt;/p&gt;

&lt;p&gt;Build it for &lt;strong&gt;quality routing&lt;/strong&gt;: longer, more complex reasoning tasks to the model with the best benchmark on that class of problem. Routine extraction and summarization to the cheapest model that clears the quality bar. Real-time interaction to the model with the lowest latency. I built this logic into my nanobot, and immediately extended my token runway without affecting quality (anecdotally).&lt;/p&gt;
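
&lt;p&gt;As a rough illustration of that rule (cheapest model that clears the quality bar, fastest model for real-time work), a selection function might look like the sketch below. The model names, scores, prices, and latencies are placeholders invented for the example, and this is not the logic from my own router.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str             # placeholder label, not a real model ID
    quality: float        # score from your own evals, 0.0 to 1.0
    cost_per_mtok: float  # dollars per million tokens (illustrative)
    latency_ms: int       # typical time to first token (illustrative)


def pick_model(models, min_quality: float, realtime: bool = False) -> ModelProfile:
    """Cheapest model that clears the quality bar; fastest one if realtime."""
    eligible = [m for m in models if m.quality >= min_quality]
    if not eligible:
        raise ValueError("no model clears the quality bar")
    key = (lambda m: m.latency_ms) if realtime else (lambda m: m.cost_per_mtok)
    return min(eligible, key=key)


MODELS = [
    ModelProfile("frontier-model", quality=0.95, cost_per_mtok=15.0, latency_ms=900),
    ModelProfile("mid-model", quality=0.85, cost_per_mtok=3.0, latency_ms=400),
    ModelProfile("small-model", quality=0.70, cost_per_mtok=0.5, latency_ms=150),
]
```

&lt;p&gt;With these placeholder numbers, &lt;code&gt;pick_model(MODELS, 0.9)&lt;/code&gt; selects the frontier model, while &lt;code&gt;pick_model(MODELS, 0.6)&lt;/code&gt; selects the small one.&lt;/p&gt;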

&lt;p&gt;Build it for &lt;strong&gt;antagonistic validation&lt;/strong&gt;: run the same high-stakes output through two different models and compare. Where they agree, confidence goes up. Where they diverge, a human reviews. This is genuinely a different quality control architecture than single-model review: the models have different failure modes, different training biases, different blind spots. Making them check each other's work surfaces errors that neither would catch alone. I've built hooks into my router that could allow behavior like this, and I've used this strategy very productively with agents. A colleague told me this morning that he spins up multiple tmux sessions with different agents running to test the same plugin in different contexts. That would work through my modelrouter architecture as designed.&lt;/p&gt;
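
&lt;p&gt;A minimal sketch of the compare-and-escalate step, under stated assumptions: &lt;code&gt;model_a&lt;/code&gt; and &lt;code&gt;model_b&lt;/code&gt; are hypothetical callables wrapping whichever provider clients you use, and plain text similarity stands in for a real comparison such as a structured diff, a judge model, or a test suite.&lt;/p&gt;

```python
from difflib import SequenceMatcher
from typing import Callable


def cross_check(prompt: str,
                model_a: Callable[[str], str],
                model_b: Callable[[str], str],
                agree_at: float = 0.8) -> dict:
    """Run one prompt through two models and flag disagreement for review.

    Text similarity is a crude proxy for agreement; the 0.8 threshold is an
    arbitrary illustrative value, not a recommendation.
    """
    out_a, out_b = model_a(prompt), model_b(prompt)
    similarity = SequenceMatcher(None, out_a.lower(), out_b.lower()).ratio()
    return {
        "outputs": (out_a, out_b),
        "similarity": similarity,
        "needs_human_review": agree_at > similarity,
    }
```

&lt;p&gt;Free-form prose will rarely match closely even when two models agree in substance, so in practice this works best on constrained outputs: a classification label, an extracted field, a yes/no decision.&lt;/p&gt;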

&lt;p&gt;Build it for &lt;strong&gt;portability&lt;/strong&gt;: when the next model generation arrives and reshuffles the capability rankings, your workflows should be able to point at a new endpoint with minimal rework. I built a system into modelrouter to register models and to establish failover paths if a model can't be reached. If Opus is down, try Sonnet. If Sonnet is down, try ChatGPT. If that's down, try Gemini. Or push to locally hosted open-source models in Ollama if you prefer.&lt;/p&gt;
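
&lt;p&gt;The register-then-fall-through idea can be sketched in a few lines. This is an illustration of the pattern, not the modelrouter code itself: the &lt;code&gt;ModelRegistry&lt;/code&gt; class is hypothetical, and each registered callable is assumed to wrap a real provider client that raises an exception when the provider can't be reached.&lt;/p&gt;

```python
from typing import Callable, Dict, List, Tuple


class ModelRegistry:
    """Registers model callables and tries them in a declared failover order."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[str], str]] = {}
        self._chain: List[str] = []

    def register(self, name: str, call: Callable[[str], str]) -> None:
        # Registration order doubles as failover order.
        if name not in self._models:
            self._chain.append(name)
        self._models[name] = call

    def complete(self, prompt: str) -> Tuple[str, str]:
        """Return (model_name, response) from the first model that answers."""
        errors = []
        for name in self._chain:
            try:
                return name, self._models[name](prompt)
            except Exception as exc:  # broad on purpose: any failure means try next
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))
```

&lt;p&gt;Because workflows only ever call &lt;code&gt;complete&lt;/code&gt;, pointing them at a new model is a one-line registration change.&lt;/p&gt;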

&lt;h2&gt;
  
  
  The Skill Portability Proof
&lt;/h2&gt;

&lt;p&gt;The convergence argument isn't just theoretical. I tested it directly.&lt;/p&gt;

&lt;p&gt;The differences in how skills (reusable AI workflows) are defined across Claude Code, Gemini CLI, and Codex have gotten small enough that automated migration is straightforward. I built a tool — &lt;a href="https://github.com/keithmackay/skillporter" rel="noopener noreferrer"&gt;skillporter&lt;/a&gt; — that ports any coding-agent skill across four major platforms in a single pass: Claude Code, Codex, Antigravity, and Gemini CLI.&lt;/p&gt;

&lt;p&gt;The fact that this tool is possible tells you something important. A year ago, the conceptual models were different enough that automated translation would have produced garbage. Today, the translation fidelity is high enough to be genuinely useful. The platforms have converged around similar enough patterns that the same underlying skill, expressed in each platform's native syntax, does the same work.&lt;/p&gt;

&lt;p&gt;That's not a coincidence. It's the same convergence dynamic playing out at the tooling layer. And it's an early signal of where the model layer is headed.&lt;/p&gt;

&lt;p&gt;The practical takeaway for enterprises building on AI: architect as if the specific model doesn't matter, because increasingly it won't. Your prompt engineering, your context management, your institutional knowledge of how to get quality outputs — those compound over time and belong to you (you the enterprise, or you the individual? That's a debate for another article and well worth thinking deeply about...stay tuned). The model you're running them on is becoming as interchangeable as the cloud region your servers run in.&lt;/p&gt;

&lt;h2&gt;
  
  
  What To Build For
&lt;/h2&gt;

&lt;p&gt;Three things that compound value regardless of which model wins the next benchmark cycle:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Routing infrastructure.&lt;/strong&gt; Even a simple implementation — a routing layer that can target different providers with different query types, and that can fail over when a provider is degraded — is worth building now. The harder the dependency is to remove later, the worse your negotiating position and your resilience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt and context libraries.&lt;/strong&gt; Well-crafted prompts and context strategies are model-agnostic to a first approximation. The effort you put into specifying exactly what good output looks like, what context the model needs, and how to validate the result pays dividends every time the model underneath changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evaluation harnesses.&lt;/strong&gt; The organizations that know how to measure AI output quality — not just "does it look right" but "does it pass defined acceptance criteria" — are the ones who can confidently switch models when a better option appears. You can't port to a new model if you can't tell whether it's performing as well as the old one.&lt;/p&gt;
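
&lt;p&gt;A harness doesn't need to be elaborate to be useful. The sketch below treats each acceptance criterion as a named pass/fail check and compares pass rates between the incumbent model and a candidate. The function names, the example criteria, and the 2% slack are assumptions chosen for illustration.&lt;/p&gt;

```python
from typing import Callable, Dict, List

# Each acceptance criterion is a named predicate over the model's output.
Criterion = Callable[[str], bool]


def evaluate(outputs: List[str], criteria: Dict[str, Criterion]) -> Dict[str, float]:
    """Return the pass rate per criterion across a set of outputs."""
    if not outputs:
        raise ValueError("need at least one output to evaluate")
    return {
        name: sum(1 for text in outputs if check(text)) / len(outputs)
        for name, check in criteria.items()
    }


def safe_to_switch(old: Dict[str, float], new: Dict[str, float],
                   slack: float = 0.02) -> bool:
    """True when the candidate matches the incumbent on every criterion, within slack."""
    return all(new[name] >= old[name] - slack for name in old)


# Example criteria for an invoice-summary task (illustrative only).
CRITERIA = {
    "non_empty": lambda text: bool(text.strip()),
    "mentions_total": lambda text: "total" in text.lower(),
}
```

&lt;p&gt;Run the same prompt set through both models, call &lt;code&gt;evaluate&lt;/code&gt; on each set of outputs, and let &lt;code&gt;safe_to_switch&lt;/code&gt; make the go/no-go call.&lt;/p&gt;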

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The frontier model arms race is producing a useful side effect: the models are getting good enough, fast enough, that the differences between them at the margin are shrinking. For most enterprise use cases, the specific model is becoming the wrong thing to optimize for. The right things to optimize for are workflow quality, routing flexibility, and the organizational competency to evaluate and switch.&lt;/p&gt;

&lt;p&gt;Build as if the model is infrastructure — because it is. Commodity infrastructure. And the organizations that have treated it as such will have the leverage and the resilience that single-provider shops are going to spend the next cycle wishing they'd built.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Are you running multi-provider AI infrastructure, or still on a single model? Have you hit the vendor lock-in ceiling yet? I'd like to hear what's actually driving routing decisions in your organization.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;If this resonated, here are some related articles:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For the evidence that providers are already adjusting quality and capacity without announcement — the "reasoning_effort=25" discovery and what it means for your workflows: &lt;a href="https://www.linkedin.com/pulse/ai-shrinkflation-your-model-quietly-dialed-back-keith-mackay-9vllc" rel="noopener noreferrer"&gt;AI Shrinkflation: Your AI Model Was Quietly Dialed Back&lt;/a&gt; | &lt;a href="https://dev.to/keithjmackay/ai-shrinkflation-your-ai-model-was-quietly-dialed-back-5g12-temp-slug-4872783"&gt;Dev.to&lt;/a&gt; | &lt;a href="https://medium.com/@keithwrites/ai-shrinkflation-your-ai-model-was-quietly-dialed-back-1029a9b72f32" rel="noopener noreferrer"&gt;Medium&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/publish/post/196867594" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For why the model isn't the moat — and what actually is defensible when AI commoditizes your stack: &lt;a href="https://www.linkedin.com/pulse/software-moats-age-ai-whats-actually-defensible-keith-mackay-ibsde" rel="noopener noreferrer"&gt;Software Moats in the Age of AI: What's Actually Defensible?&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/software-moats-in-the-age-of-ai" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For why compute scarcity is real but the ROI math still favors adoption — the infrastructure cost argument that makes routing a financial priority: &lt;a href="https://www.linkedin.com/pulse/ai-infrastructure-scarcity-raising-costs-usage-still-provide-mackay-y2hce/" rel="noopener noreferrer"&gt;AI Infrastructure Scarcity is Raising Costs, but AI Usage Will Still Provide Unbeatable ROI&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/ai-infrastructure-scarcity-is-raising" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For the practical case that tokens are currency and should be budgeted — the financial controls argument that pairs with routing infrastructure: &lt;a href="https://www.linkedin.com/pulse/token-bill-coming-nobodys-ready-keith-mackay-ltfme/" rel="noopener noreferrer"&gt;The Token Bill Is Coming. Nobody's Ready for It.&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Axios, &lt;a href="https://www.axios.com/2025/01/29/openai-deepseek-ai-models-data-training" rel="noopener noreferrer"&gt;OpenAI says DeepSeek may have used its outputs to train competing model&lt;/a&gt;, January 2025. (White House AI czar: "substantial evidence that DeepSeek distilled the knowledge out of OpenAI's models.")&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;arXiv, &lt;a href="https://arxiv.org/abs/2402.13116" rel="noopener noreferrer"&gt;A Survey on Knowledge Distillation of Large Language Models&lt;/a&gt;, 2024. (Documents how LLM distillation pipelines use a "teacher" model's outputs as training data for smaller "student" models.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wikipedia, &lt;a href="https://en.wikipedia.org/wiki/Anthropic" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;. (Founding team of Dario Amodei, Daniela Amodei, and colleagues departed OpenAI carrying Constitutional AI and mechanistic interpretability methodology.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fortune, &lt;a href="https://fortune.com/2025/06/03/openai-deepmind-anthropic-loosing-engineers-ai-talent-war/" rel="noopener noreferrer"&gt;OpenAI and DeepMind Losing Engineers to Anthropic in a One-Sided Talent War&lt;/a&gt;, June 2025. (OpenAI engineers 8x more likely to leave for Anthropic than reverse; Meta poached 11 researchers from OpenAI, DeepMind, and Anthropic.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Anthropic, &lt;a href="https://www.anthropic.com/news/model-context-protocol" rel="noopener noreferrer"&gt;Introducing the Model Context Protocol&lt;/a&gt;, November 25, 2024.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TechCrunch, &lt;a href="https://techcrunch.com/2025/03/26/openai-adopts-rival-anthropics-standard-for-connecting-ai-models-to-data/" rel="noopener noreferrer"&gt;OpenAI adopts rival Anthropic's standard for connecting AI models to data&lt;/a&gt;, March 26, 2025. (Sam Altman: "People love MCP and we are excited to add support across our products.")&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The New Stack, &lt;a href="https://thenewstack.io/google-embraces-mcp/" rel="noopener noreferrer"&gt;Google Embraces MCP&lt;/a&gt;, 2025. (Google DeepMind CEO Demis Hassabis publicly endorsed MCP in April 2025; formal Google Cloud MCP support announced December 10, 2025.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Microsoft Copilot Studio Blog, &lt;a href="https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/model-context-protocol-mcp-is-now-generally-available-in-microsoft-copilot-studio/" rel="noopener noreferrer"&gt;Model Context Protocol (MCP) is now generally available in Microsoft Copilot Studio&lt;/a&gt;, May 29, 2025. (GA announcement following Microsoft Build 2025.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Anthropic, &lt;a href="https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation" rel="noopener noreferrer"&gt;Donating the Model Context Protocol and establishing the Agentic AI Foundation&lt;/a&gt;, December 9, 2025. (Co-founders: Anthropic, Block, OpenAI; supporting members: Google, Microsoft, AWS, Cloudflare, Bloomberg.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MCP Blog, &lt;a href="https://blog.modelcontextprotocol.io/posts/2025-11-25-first-mcp-anniversary/" rel="noopener noreferrer"&gt;One Year of MCP&lt;/a&gt;, November 2025. (SDK downloads grew from ~100K/month at launch to 97M/month by late 2025; 10,000+ active servers.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The New Stack, &lt;a href="https://thenewstack.io/why-the-model-context-protocol-won/" rel="noopener noreferrer"&gt;Why the Model Context Protocol Won&lt;/a&gt;, December 7, 2025. (Analysis of MCP's industry-wide adoption trajectory.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;McKinsey, &lt;a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai" rel="noopener noreferrer"&gt;The State of AI: Global Survey 2025&lt;/a&gt;, 2025. (88% of organizations regularly use AI in at least one business function; 72% have deployed generative AI, up from 33% in 2024.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Anthropic Status, &lt;a href="https://status.claude.com/incidents/f00h6l76tsjs" rel="noopener noreferrer"&gt;Incident: Increased errors across Claude services&lt;/a&gt;, April 15, 2026. (~3-hour critical outage affecting Claude.ai, Claude API, Claude Code, and platform console simultaneously.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OpenAI Status, &lt;a href="https://status.openai.com/history" rel="noopener noreferrer"&gt;Users unable to load ChatGPT, Codex and API Platform&lt;/a&gt;, April 20, 2026. (~2-hour outage affecting ChatGPT, Codex, and API simultaneously.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Google AI Studio Status, &lt;a href="https://aistudio.google.com/status" rel="noopener noreferrer"&gt;Service status history&lt;/a&gt;, April 2026. (Google AI Studio partial outages April 2–20, 2026; Gemini API partial outages April 17–20, 2026.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GitHub / Hacker News, &lt;a href="https://github.com/anthropics/claude-code/issues/42796" rel="noopener noreferrer"&gt;Claude Code reasoning depth drop — 67% reduction documented across 6,852 sessions&lt;/a&gt;, April 2026. (Analysis by Stella Laurenzo, AMD AI group; reasoning_effort parameter set to 25/100 in consumer sessions.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Epoch AI, &lt;a href="https://epoch.ai/data-insights/llm-inference-price-trends" rel="noopener noreferrer"&gt;LLM inference prices have fallen rapidly but unequally across tasks&lt;/a&gt;, 2025. (Wide cost variance across providers for equivalent capability.)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology, with Claude and Codex as AI collaborators.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>infrastructure</category>
      <category>management</category>
      <category>leadership</category>
    </item>
    <item>
      <title>AI Shrinkflation: Your AI Model Was Quietly Dialed Back</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Sat, 06 Jun 2026 23:50:30 +0000</pubDate>
      <link>https://dev.to/keithjmackay/ai-shrinkflation-your-ai-model-was-quietly-dialed-back-3p2a</link>
      <guid>https://dev.to/keithjmackay/ai-shrinkflation-your-ai-model-was-quietly-dialed-back-3p2a</guid>
      <description>&lt;h1&gt;
  
  
  AI Shrinkflation: Your AI Model Was Quietly Dialed Back
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;The age of token subsidization is over. AI providers are adjusting pricing, throttling capacity, and dialing back model quality — and most users haven't noticed yet.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;You open your AI coding assistant on a Tuesday morning. Something feels off. The reasoning feels shallower. The context-building you relied on is gone. It's not stopping to read the file before editing it. You aren't imagining it.&lt;/p&gt;

&lt;p&gt;A senior director at AMD's AI group ran the numbers: 6,852 Claude Code session files, 17,871 thinking blocks, 234,760 tool calls. Her analysis found that reasoning depth had dropped roughly 67% following a February 2026 update. [1] The model had shifted from "read first, then edit" to "edit without reading context." Code quality suffered accordingly.&lt;/p&gt;

&lt;p&gt;And when a developer discovered &lt;em&gt;why&lt;/em&gt;, the story got more interesting: Anthropic had quietly injected a &lt;code&gt;reasoning_effort&lt;/code&gt; parameter set to 25 out of 100 into consumer-facing Claude.ai sessions — visible only via extended thinking introspection. [2] The same model, a fraction of the effort. Same price. No announcement.&lt;/p&gt;

&lt;p&gt;That's not a bug. That's policy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern Is Everywhere
&lt;/h2&gt;

&lt;p&gt;Anthropic isn't alone. The entire AI industry is discovering the same problem at the same time: demand is growing faster than supply.&lt;/p&gt;

&lt;p&gt;The math is brutal. Hyperscalers are on track to spend $660-690 billion on AI infrastructure in 2026 — nearly double 2025. [3] And yet: seven-year queues for data center power connections in Northern Virginia. [4] GPU memory prices up 60%, with every major manufacturer's 2026 output pre-sold. [5] PJM capacity prices jumped 11x in a single year. [6]&lt;/p&gt;

&lt;p&gt;You cannot spend your way out of a physical constraint. Not in the short run. So providers are rationing.&lt;/p&gt;

&lt;p&gt;Anthropic introduced peak and off-peak pricing in March 2026: session allowances now deplete faster between 8 AM and 2 PM ET on weekdays. [7] The weekly limit is unchanged. The usable window — for the developers who actually work during business hours — effectively shrank. OpenAI moved in the same direction a year earlier with "Flex processing": a permanent lower-priority tier that prices tokens at 50% off in exchange for slower responses and occasional resource unavailability. [8] Different mechanism, same economic logic — reward users who yield capacity when demand is high.&lt;/p&gt;

&lt;p&gt;The 1M token context window launched with additive pricing: prompts exceeding 200K tokens triggered a 2x input cost surcharge. Anthropic removed that surcharge in March 2026, a real improvement, but the structure reveals the instinct. [9] When capacity is tight, long-context usage — the most resource-intensive workload — is where pricing adjustments land first.&lt;/p&gt;

&lt;p&gt;Anthropic also moved to block third-party agentic harnesses from using consumer Claude subscriptions, citing compute strain. [10] The tools users had integrated into their workflows suddenly stopped working.&lt;/p&gt;

&lt;p&gt;Google Gemini cut free tier quotas 50-80% in December 2025 — daily request limits dropped from 500 to 100. Official explanation: abuse prevention. [11] The timing, as enterprise demand was accelerating, is not a coincidence.&lt;/p&gt;

&lt;p&gt;OpenAI's developer community has spent months debating whether the models they're paying for are the same models they were promised. The threads are long, the evidence anecdotal but consistent, and Sam Altman has acknowledged mistakes were made. [12] The referral credit economy has compressed. Credits that once represented meaningful enticements for advocates now arrive in modest increments, and the sweepstakes-and-guest-pass approach that replaced them tells the same story.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pattern across providers:&lt;/strong&gt; add pricing tiers for heavy usage, throttle capacity during peaks, reduce inference depth quietly, restrict third-party access, and shrink incentive programs. Each move makes sense in isolation. Together, they read as a coordinated adjustment to the math of unsustainable subsidization.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Subsidy You Didn't Know Was There
&lt;/h2&gt;

&lt;p&gt;Introductory AI pricing was never honest economics. It was customer acquisition. Providers burned capital to establish habits, build developer ecosystems, and win enterprise contracts. The cost per token dropped 40-50x per year for five years. [13] You weren't paying the actual cost of the service. You were the beneficiary of a land grab.&lt;/p&gt;

&lt;p&gt;The land is now grabbed. The infrastructure is strained. And the economics have to close.&lt;/p&gt;

&lt;p&gt;This is not an AI-specific story. Cloud computing ran the same playbook: AWS gave away compute at below-cost rates through the early 2010s to lock in developers, then normalized pricing once the ecosystem was dependent. The difference is that cloud infrastructure was capital-efficient at scale — the same server handled any workload. AI inference is capital-intensive in ways cloud compute was not: memory-bound, power-hungry, model-specific, and subject to depreciation curves that don't spare the assets you already bought.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There is no free ride at the end of the scarcity tunnel.&lt;/strong&gt; Providers are adjusting now because the alternative is running infrastructure they cannot financially support.&lt;/p&gt;

&lt;h2&gt;
  
  
  But the Math Still Works
&lt;/h2&gt;

&lt;p&gt;This is where many analysts get the story wrong. The end of token subsidization is not a reason to slow down your AI adoption. The efficiency gain over non-AI alternatives remains decisive, even after pricing normalization.&lt;/p&gt;

&lt;p&gt;The practical developer benchmark: an experienced engineer using well-configured AI tooling covers roughly 2-4x more meaningful ground per day than without it — and that's after accounting for the J-curve learning costs researchers have documented. [14] At any remotely normalized labor cost, that math is not close.&lt;/p&gt;

&lt;p&gt;The market will grumble. It always does when free becomes paid. But grumbling and canceling are different behaviors, and cancellation requires having a comparable alternative. Most enterprises don't have one.&lt;/p&gt;

&lt;p&gt;The productivity gap between AI-augmented and non-AI teams is widening, not closing. Organizations that pause AI investment while the pricing settles will spend months or years ceding ground to competitors who absorbed the cost increase and kept building.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Possible Futures
&lt;/h2&gt;

&lt;p&gt;The infrastructure strain will resolve. The question is how.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 1: Consolidation and fire sale.&lt;/strong&gt; The AI infrastructure build-out parallels the 1990s telecom fiber boom with uncomfortable precision. Telecom companies spent $500 billion between 1996 and 2001. By 2002, an estimated 95% of installed fiber was still dark. WorldCom, Global Crossing, and 360networks filed for bankruptcy. [15] Their assets sold at cents on the dollar, and Google, Microsoft, and Facebook later built their networks on top of that cheap foundation.&lt;/p&gt;

&lt;p&gt;The GPU is not the fiber. GPUs depreciate rapidly; the H100s sitting in a failed AI startup's data center in 2028 won't find a second life the way dark fiber did. The computing-specific risk is that stranded assets hold less residual value. But the broader structural parallel holds: builders who overextended will sell assets under duress, and the followers will capitalize. Short-term disruption, longer-term normalization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 2: Local inference grows up.&lt;/strong&gt; The local model story is no longer aspirational. Ollama hit 52 million monthly downloads in Q1 2026 — 520x growth from three years prior — and a 32B parameter model now runs at over 80% of frontier quality on commodity Mac hardware. [16] For summarization, coding assistance, RAG over private data, and most enterprise knowledge work, local inference is quietly crossing the threshold of good enough.&lt;/p&gt;

&lt;p&gt;For organizations with predictable workloads, data sovereignty requirements, or genuine cost management pressure, the hybrid model — local inference for routine tasks, cloud for frontier reasoning — is becoming economically rational. The cloud pricing pressure is accelerating this decision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 3: Efficiency compounds and absorbs the cost.&lt;/strong&gt; The cost per token has dropped roughly 40-50x per year for five years. [13] The inference capacity constraints will ease as new data centers come online and chip production scales. Algorithmic efficiency improvements — smaller models matching larger predecessors, better quantization, smarter inference — continue to drive capability per dollar upward. The current squeeze may be a 12-24 month phenomenon rather than a permanent structural shift.&lt;/p&gt;

&lt;p&gt;All three scenarios can be true simultaneously, in different parts of the market.&lt;/p&gt;

&lt;h2&gt;
  
  
  What To Do About It
&lt;/h2&gt;

&lt;p&gt;The answer is not to wait for the pricing environment to improve. It's to build for a hybrid world.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stop assuming cloud tokens are the default for everything.&lt;/strong&gt; Evaluate your workload mix: which tasks require frontier reasoning, and which are well within local model capability? For document summarization, code review on familiar patterns, and structured data extraction, you may already be paying frontier prices for sub-frontier work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Treat token budgets as real budget lines.&lt;/strong&gt; The informal assumption that AI usage is "basically free" has driven a lot of unoptimized workflow design. Route queries to the right model for the task. Cache repeated prompts. Convert high-frequency, low-complexity AI calls to deterministic scripts wherever possible. Model routing isn't just cost management — it's the right engineering discipline regardless of price.&lt;/p&gt;
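&lt;p&gt;As a rough illustration of routing and caching together, here is a minimal Python sketch. The tier names, prices, routing table, and the &lt;code&gt;call_model&lt;/code&gt; callable are all hypothetical placeholders, not any provider's real API or pricing; the point is only the shape of the plumbing.&lt;/p&gt;

```python
import hashlib

# Hypothetical model tiers and per-1K-token prices; illustrative only.
MODELS = {
    "local": {"price_per_1k": 0.0},
    "cloud_small": {"price_per_1k": 0.5},
    "cloud_frontier": {"price_per_1k": 15.0},
}

# Hypothetical task-to-tier routing table.
ROUTES = {
    "summarize": "local",
    "extract": "local",
    "code_review": "cloud_small",
    "novel_reasoning": "cloud_frontier",
}


class Router:
    def __init__(self, call_model):
        # call_model(model_name, prompt) is supplied by the caller.
        self.call_model = call_model
        self.cache = {}
        self.spend = 0.0

    def run(self, task, prompt, est_tokens=1000):
        # Unknown task types fall through to the strongest tier.
        model = ROUTES.get(task, "cloud_frontier")
        key = hashlib.sha256((model + "\n" + prompt).encode("utf-8")).hexdigest()
        if key in self.cache:
            return self.cache[key]  # repeated prompt: no new spend
        answer = self.call_model(model, prompt)
        self.spend += MODELS[model]["price_per_1k"] * est_tokens / 1000
        self.cache[key] = answer
        return answer


def fake_call(model, prompt):
    # Stand-in for a real client; returns a tagged echo.
    return model + ":" + str(len(prompt))
```

&lt;p&gt;Tracking &lt;code&gt;spend&lt;/code&gt; inside the router is one way to make the token budget visible as a number someone owns, rather than a surprise on an invoice.&lt;/p&gt;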

&lt;p&gt;&lt;strong&gt;Plan for provider pricing variance.&lt;/strong&gt; Lock in access and pricing commitments where possible. Diversify across providers — not for fear of any single provider's stability, but because the competitive pressure among providers is the mechanism that will keep pricing in check. An organization dependent on a single provider has no leverage in that negotiation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build hybrid model rollover capability now.&lt;/strong&gt; The infrastructure to switch between cloud and local inference mid-workflow is worth building before you need it, not during a price spike or capacity crisis. The teams that have this plumbing in place will be able to respond to pricing changes in hours, not quarters.&lt;/p&gt;
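&lt;p&gt;What that plumbing might look like, in its simplest form: an ordered list of backends and a loop that falls through to the next one on failure. This is a sketch under assumptions. &lt;code&gt;ProviderError&lt;/code&gt;, &lt;code&gt;cloud_backend&lt;/code&gt;, and &lt;code&gt;local_backend&lt;/code&gt; are hypothetical stand-ins, and a real version would wrap actual client libraries and distinguish retryable errors from permanent ones.&lt;/p&gt;

```python
class ProviderError(Exception):
    """Raised by a backend that is unavailable, throttled, or over budget."""


def run_with_rollover(prompt, backends):
    """Try each (name, callable) backend in order; return the first success.

    Returns a (backend_name, answer) tuple so callers can log which path ran.
    """
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(name + ": " + str(exc))
    raise ProviderError("all backends failed: " + "; ".join(errors))


# Hypothetical stand-ins for a cloud client and a local runtime.
def cloud_backend(prompt):
    raise ProviderError("rate limited")


def local_backend(prompt):
    return "local answer to: " + prompt
```

&lt;p&gt;Because the backend order is just data, responding to a price change can be a configuration edit rather than a rewrite.&lt;/p&gt;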

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The free sample era is over. AI providers are throttling, pricing, and quality-adjusting their way toward sustainable economics. The market will grumble and bear it — because the productivity math still wins, decisively. The organizations that understand this will neither overreact to the cost increase nor underinvest in the hybrid infrastructure that gives them options when the next adjustment comes.&lt;/p&gt;

&lt;p&gt;Token subsidization built the habits. The habits are now the asset. Protect them by building for durability, not for the pricing environment you got used to.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Are you seeing AI quality or capacity changes in your daily workflow? Have you started building local model capabilities alongside cloud? I'd like to understand how your organization is adjusting to the new pricing reality.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;If this resonated, here are some related articles:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For the deeper math on why AI ROI still wins even as infrastructure scarcity drives costs up: &lt;a href="https://www.linkedin.com/pulse/ai-infrastructure-scarcity-raising-costs-usage-still-provide-mackay-y2hce/" rel="noopener noreferrer"&gt;AI Infrastructure Scarcity is Raising Costs, but AI Usage Will Still Provide Unbeatable ROI&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/ai-infrastructure-scarcity-is-raising" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For how unmanaged token spend will blindside organizations the same way cloud sprawl did: &lt;a href="https://www.linkedin.com/pulse/token-bill-coming-nobodys-ready-keith-mackay-ltfme/" rel="noopener noreferrer"&gt;The Token Bill Is Coming. Nobody's Ready for It.&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For why AI tool performance degrades in ways that compound the quality-reduction problem this article describes: &lt;a href="https://www.linkedin.com/pulse/context-why-ai-tools-degrade-over-longer-work-sessions-keith-mackay-iouxc/" rel="noopener noreferrer"&gt;Context in Context: Why AI Tools Degrade Over Longer Work Sessions&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/context-in-context-why-ai-tools-degrade" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For why the boom/bust infrastructure cycle this article references is part of a larger exponential pattern most organizations are still misreading: &lt;a href="https://www.linkedin.com/pulse/were-linear-thinkers-exponentially-changing-world-keith-mackay-ckoqe/" rel="noopener noreferrer"&gt;We're Linear Thinkers in an Exponentially-Changing World&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/were-linear-thinkers-in-an-exponential" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Stella Laurenzo, &lt;a href="https://github.com/anthropics/claude-code/issues/42796" rel="noopener noreferrer"&gt;GitHub: Claude Code is unusable for complex engineering tasks with Feb updates&lt;/a&gt;, April 2026. (Analysis of 6,852 Claude Code session files; 67% drop in reasoning depth.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Om Patel (via X/Twitter), &lt;a href="https://news.ycombinator.com/item?id=47724951" rel="noopener noreferrer"&gt;Anthropic injects reasoning_effort=25 into Claude.ai consumer system prompts&lt;/a&gt;, Hacker News thread, April 2026.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;IEEE ComSoc Tech Blog, &lt;a href="https://techblog.comsoc.org/2025/09/27/big-tech-spending-on-ai-data-centers-and-infrastructure-vs-the-fiber-optic-buildout-during-the-dot-com-boom-bust/" rel="noopener noreferrer"&gt;Big tech spending on AI data centers and infrastructure vs. the fiber optic buildout during the dot-com boom/bust&lt;/a&gt;, September 2025.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bloomberg, &lt;a href="https://www.bloomberg.com/news/articles/2024-08-29/data-centers-face-seven-year-wait-for-power-hookups-in-virginia" rel="noopener noreferrer"&gt;Virginia Data Centers Face Seven-Year Wait for Power&lt;/a&gt;, August 2024.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CNBC, &lt;a href="https://www.cnbc.com/2026/01/10/micron-ai-memory-shortage-hbm-nvidia-samsung.html" rel="noopener noreferrer"&gt;AI memory is sold out&lt;/a&gt;, January 2026. (Prices up 30-60%; 70% of DRAM allocated to AI.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;IEEFA, &lt;a href="https://ieefa.org/resources/projected-data-center-growth-spurs-pjm-capacity-prices-factor-10" rel="noopener noreferrer"&gt;Projected data center growth spurs PJM capacity prices factor 10&lt;/a&gt;, 2025.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Register, &lt;a href="https://www.theregister.com/2026/03/26/anthropic_tweaks_usage_limits/" rel="noopener noreferrer"&gt;Anthropic tweaks Claude usage limits to manage capacity&lt;/a&gt;, March 26, 2026.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OpenAI, &lt;a href="https://platform.openai.com/docs/guides/flex-processing" rel="noopener noreferrer"&gt;Flex processing&lt;/a&gt;, April 2025. (50% off standard pricing in exchange for lower-priority, slower responses; launched alongside o3 and o4-mini.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The New Stack, &lt;a href="https://thenewstack.io/claude-million-token-pricing/" rel="noopener noreferrer"&gt;Anthropic makes a pricing change that matters for Claude's longest prompts&lt;/a&gt;, 2026.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;VentureBeat, &lt;a href="https://venturebeat.com/technology/anthropic-cuts-off-the-ability-to-use-claude-subscriptions-with-openclaw-and" rel="noopener noreferrer"&gt;Anthropic cuts off the ability to use Claude subscriptions with OpenClaw and third-party agents&lt;/a&gt;, 2026.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AI Free API, &lt;a href="https://www.aifreeapi.com/en/posts/gemini-api-free-tier-rate-limits" rel="noopener noreferrer"&gt;Gemini API Free Tier Rate Limits&lt;/a&gt;, December 2025.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OpenAI Developer Community, &lt;a href="https://community.openai.com/t/did-openai-secretly-downgrade-our-models-while-everyone-was-leaving/1019206" rel="noopener noreferrer"&gt;Did OpenAI secretly downgrade our models while everyone was leaving?&lt;/a&gt;, 2025.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Epoch AI, &lt;a href="https://epoch.ai/data-insights/llm-inference-price-trends" rel="noopener noreferrer"&gt;LLM inference prices have fallen rapidly but unequally across tasks&lt;/a&gt;, 2025. (Median ~50x/year cost decline.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;METR, &lt;a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/" rel="noopener noreferrer"&gt;Measuring the Impact of Early-2025 AI on Developer Productivity&lt;/a&gt;, July 2025.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wikipedia, &lt;a href="https://en.wikipedia.org/wiki/Telecoms_crash" rel="noopener noreferrer"&gt;Telecoms crash&lt;/a&gt;. (WorldCom, Global Crossing bankruptcies; $500B+ in telecom investment 1996-2001.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DEV.to, &lt;a href="https://dev.to/pooyagolchian/local-ai-in-2026-ollama-benchmarks-0-inference-and-the-end-of-per-token-pricing-32e7"&gt;Local AI in 2026: Ollama Benchmarks, $0 Inference, and the End of Per-Token Pricing&lt;/a&gt;, 2026. (Ollama at 52M monthly downloads; Qwen 2.5 32B at 83.2% MMLU on Mac Studio.)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology, with Claude and Codex as AI collaborators.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>infrastructure</category>
      <category>ai</category>
      <category>aiops</category>
    </item>
    <item>
      <title>40-Year-Old Bug. Claude Found It Before the Author Did.</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Sat, 06 Jun 2026 23:30:30 +0000</pubDate>
      <link>https://dev.to/keithjmackay/40-year-old-bug-claude-found-it-before-the-author-did-50ga</link>
      <guid>https://dev.to/keithjmackay/40-year-old-bug-claude-found-it-before-the-author-did-50ga</guid>
      <description>&lt;h1&gt;
  
  
  40-Year-Old Bug. Claude Found It Before the Author Did.
&lt;/h1&gt;

&lt;p&gt;Mark Russinovich — Microsoft's Azure CTO — handed Claude Opus 4.6 a binary from 1986.&lt;/p&gt;

&lt;p&gt;Not source code. A binary. Raw 6502 machine language he'd written as a teenager for the Apple II. No comments. No variable names. No documentation. Just bytes.&lt;/p&gt;

&lt;p&gt;Claude decompiled it, reconstructed the logic, and flagged a silent error that had been sitting there, undetected, for forty years: if a destination BASIC line wasn't found during a GOTO, the program would silently advance to the next line — or march right past the end of the program — instead of raising an error. The fix was four instructions. Check the carry flag. Branch to the error handler. Done.&lt;/p&gt;
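&lt;p&gt;To make the bug pattern concrete, here is a toy Python model of it. This is not Russinovich's code or a translation of his 6502 routine; the function names and the sample program are invented for illustration, and the &lt;code&gt;found&lt;/code&gt; flag merely plays the role the carry flag plays in the description above.&lt;/p&gt;

```python
def find_line(program, target):
    """Scan a sorted list of (line_number, text) pairs for target.

    Returns (index, found). When the target is missing, index points at the
    next higher line, or one past the end of the program.
    """
    for index, (number, _text) in enumerate(program):
        if number >= target:
            return index, number == target
    return len(program), False


def goto_buggy(program, target):
    # The "found" result is ignored, so a missing line silently
    # resolves to whatever comes next.
    index, _found = find_line(program, target)
    return index


def goto_fixed(program, target):
    # The fix: check the flag and branch to an error instead.
    index, found = find_line(program, target)
    if not found:
        raise LookupError("undefined line " + str(target))
    return index


PROGRAM = [(10, "PRINT 1"), (20, "GOTO 25"), (30, "END")]
```

&lt;p&gt;In this toy version, &lt;code&gt;goto_buggy(PROGRAM, 25)&lt;/code&gt; quietly lands on line 30, and a target beyond the last line returns an index past the end of the list. &lt;code&gt;goto_fixed&lt;/code&gt; raises instead.&lt;/p&gt;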

&lt;p&gt;Russinovich's conclusion: "We are entering an era of automated, AI-accelerated vulnerability discovery that will be leveraged by both defenders and attackers." [1]&lt;/p&gt;




&lt;p&gt;This exercise raises two points that I think are important -- one about security strategy, and one about where we're going.&lt;/p&gt;

&lt;p&gt;I wrote recently about write-only code — the coming era where AI generates software that no human ever reviews (I linked the article below). The thesis was directional: we're moving toward machine-native code that AI writes, AI maintains, and AI debugs, with humans specifying intent in plain English and staying out of the middle. The human-readable layer exists for humans. Remove humans from the loop and you don't need it.&lt;/p&gt;

&lt;p&gt;Russinovich's experiment demonstrates the other side of that equation.&lt;/p&gt;

&lt;p&gt;AI doesn't just write code humans can't read. It &lt;em&gt;reads&lt;/em&gt; code humans can't read either.&lt;/p&gt;

&lt;p&gt;A trained engineer can parse 6502 assembly with enough time and a reference manual (I've done it with my own VIC-20 and Commodore 64 6502 code, at about the time Russinovich wrote the code in question). But nobody was going to sit down with Russinovich's 40-year-old Enhancer utility and do a security audit. That binary was archaeologically frozen: working, shipped, forgotten. The knowledge of what it did lived only in the mind of a teenager who is now one of the most senior technologists in the world, and even he apparently missed the carry flag bug.&lt;/p&gt;

&lt;p&gt;Claude read the binary in the time it takes to refresh a browser tab.&lt;/p&gt;




&lt;p&gt;This is the two-way mirror.&lt;/p&gt;

&lt;p&gt;The write-only code future says: AI writes machine code (because humans don't need to read it).&lt;/p&gt;

&lt;p&gt;The Russinovich experiment says: AI can read machine code, whoever wrote it.&lt;/p&gt;

&lt;p&gt;Together, they describe a world where the human-readable middle layer — every programming language you have ever used — is optional infrastructure. We built it for us. We needed it because we were the ones doing the translating. The moment AI does both the writing and the reading, the translation layer becomes a legacy artifact maintained by sentiment and inertia, not necessity. Must a human remain in, on, or around the loop? I say yes, for now, but it's a window that will continue to close. As I pointed out in my article, many top devs are reading specs and reports, not code. And there is little doubt that trend will continue and will accelerate.&lt;/p&gt;

&lt;p&gt;The implications for security alone are staggering. Every compiled binary in production — closed-source, stripped, obfuscated — is now legible to a model with enough context. Firmware on network devices. Legacy financial systems running on 1990s-era compiled code nobody can find the source for. Embedded controllers in industrial equipment. The attack surface that "security through obscurity" has quietly protected for decades is eroding fast.&lt;/p&gt;

&lt;p&gt;It should be noted that security through obscurity was &lt;em&gt;never&lt;/em&gt; great security. It was the software analogue of "The Club" steering wheel lock: breakable, but enough of a nuisance that an attacker would rather move on to a target without it. If Russinovich's experiment proves anything, it proves that the concept is now utterly defunct. A tireless AI can comfortably untangle any obfuscation scheme to reveal the underlying code logic.&lt;/p&gt;

&lt;p&gt;To be clear, defenders gain something too. Russinovich's carry flag bug caused silent incorrect behavior. In a firmware context, that same pattern could be a vulnerability. AI reading the binary finds it before the attacker does — if the defender moves first.&lt;/p&gt;




&lt;p&gt;We're not at Phase 4 yet. In my earlier piece I described that phase as the longer-term future: natural language in, optimized binaries out, maintained entirely by models without a human-readable representation at any stage. We're still in the early phases, with AI writing readable code and humans reviewing it less and less.&lt;/p&gt;

&lt;p&gt;But Russinovich's experiment is a signpost. The model doesn't need source code to understand software. It doesn't need variable names or comments or clean abstractions. It can work directly with what the machine actually executes.&lt;/p&gt;

&lt;p&gt;That's not a parlor trick. That's a capability shift.&lt;/p&gt;

&lt;p&gt;The programming languages we've spent sixty years building were translation layers between human thought and machine execution. AI is becoming fluent in both languages natively. The translation layer is still useful — but it's no longer strictly required.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I explored the write-only code thesis and what it means for every programming language you've ever loved in my earlier piece: &lt;a href="https://www.linkedin.com/pulse/when-ai-stops-writing-code-humans-keith-mackay-8y37e" rel="noopener noreferrer"&gt;When AI Stops Writing Code for Humans&lt;/a&gt;. Russinovich's experiment is the live proof-of-concept for the read side of that argument.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What's your take -- does this change how you think about legacy system security? About the future of code review?&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/posts/markrussinovich_opus-46s-security-audit-of-my-1986-code-activity-7436235669938614272-IV5f" rel="noopener noreferrer"&gt;Opus 4.6's Security Audit of My 1986 Code -- Mark Russinovich, LinkedIn&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;strong&gt;If this resonated, here are some related articles:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For the write-only code thesis that Russinovich's experiment puts to the test from the other side -- AI reading code humans can't: &lt;a href="https://www.linkedin.com/pulse/when-ai-stops-writing-code-humans-keith-mackay-8y37e" rel="noopener noreferrer"&gt;When AI Stops Writing Code for Humans&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/when-ai-stops-writing-code-for-humans" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For what AI-powered attack surface expansion means for personal and corporate security right now -- and why the current security model wasn't built for agentic AI: &lt;a href="https://www.linkedin.com/pulse/personal-corporate-security-agentic-world-keith-mackay-ocjce/" rel="noopener noreferrer"&gt;Personal and Corporate Security in the Age of AI&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/personal-and-corporate-security-in" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;"Security through obscurity" was always a shallow moat -- for which software defenses actually hold when AI can read anything: &lt;a href="https://www.linkedin.com/pulse/software-moats-age-ai-whats-actually-defensible-keith-mackay-ibsde" rel="noopener noreferrer"&gt;Software Moats in the Age of AI: What's Actually Defensible?&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/software-moats-in-the-age-of-ai" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For the agent infrastructure layer being built to give AI models like Claude the tools to do exactly what Russinovich demonstrated -- at scale, autonomously: &lt;a href="https://www.linkedin.com/pulse/internet-agents-keith-mackay-uhmfe/" rel="noopener noreferrer"&gt;The Internet Is for Agents&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/the-internet-is-for-agents" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>coding</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The Dunning-Kruger Effect, Now Available at Enterprise Scale</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Sat, 06 Jun 2026 23:25:08 +0000</pubDate>
      <link>https://dev.to/keithjmackay/the-dunning-kruger-effect-now-available-at-enterprise-scale-1hnh</link>
      <guid>https://dev.to/keithjmackay/the-dunning-kruger-effect-now-available-at-enterprise-scale-1hnh</guid>
      <description>&lt;h1&gt;
  
  
  The Dunning-Kruger Effect, Now Available at Enterprise Scale
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;The research is starting to coalesce, with different angles on whether AI is degrading human cognition or not. The right response is not panic--but it's also not dismissal.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;"Cognitive surrender" (outsourcing one's thinking to AI entirely for a given task) has been creeping into our current zeitgeist as a pre-accepted truth, and researchers have started to examine it more deeply. There are several papers that I think deserve serious attention from anyone making AI adoption decisions, and/or training their workforce in effective use of AI. In one, MIT Media Lab researchers strapped EEG headsets on 54 participants and measured brain activity while they wrote essays with ChatGPT, with a search engine, or with no tools at all. [1] In the second, MIT Sloan's Sinan Aral and Michael Caosun built a formal economic model showing how AI productivity gains rationally lead to skill erosion over time--what they call the "augmentation trap." [2] And Wharton's Steven Shaw and Gideon Nave ran three preregistered behavioral experiments introducing and exploring a specific definition for "cognitive surrender": the measurable tendency to adopt AI outputs without scrutiny, even when the AI is confidently, demonstrably wrong. [3]&lt;/p&gt;

&lt;p&gt;All three are credible. All three point at something real. And all three have generated coverage that ranges from thoughtful to hysterical--headlines like "ChatGPT is rotting your brain" that the MIT Media Lab authors publicly pushed back on, explicitly asking journalists not to use words like "stupid," "dumb," or "brain damage." The paper's actual finding is more interesting and more nuanced than the coverage suggested, and considerably more actionable.&lt;/p&gt;

&lt;p&gt;This is the moment to think carefully--not because the research is wrong, but because how leaders read it will determine whether they respond intelligently or reactively.&lt;/p&gt;

&lt;h3&gt;
  
  
  We Have Heard This Song Before
&lt;/h3&gt;

&lt;p&gt;In Plato's &lt;em&gt;Phaedrus&lt;/em&gt;, Socrates argues that writing will destroy memory and wisdom. Students who read without a teacher's guidance will accumulate the appearance of knowledge without the substance--"thought very knowledgeable when they are for the most part quite ignorant." The mechanism he feared was direct: outsource memory to text, practice it less, lose it.&lt;/p&gt;

&lt;p&gt;He wasn't entirely wrong. We do rely on external storage rather than memory. We do confuse access to information with understanding it (how many instant experts has the internet created? For some of us, it is SUCH a seductive trap!). However, despite the truth lying beneath the fear, civilization didn't collapse. Instead, writing enabled knowledge accumulation at a scale that more than compensated for what individual memory lost.&lt;/p&gt;

&lt;p&gt;The calculator debate in the 1970s and '80s had the same structure. Math educators feared—with the same direct mechanistic logic—that students given calculators would never develop arithmetic fluency. The National Council of Teachers of Mathematics debated restrictions for a decade. Some states banned calculators in early grades.&lt;/p&gt;

&lt;p&gt;What happened: some arithmetic fluency did decline. Mathematical &lt;em&gt;thinking&lt;/em&gt; didn't collapse—it shifted. Students spent less time on long division and more on modeling and reasoning. The catastrophic version never materialized. But neither did the rosy version where everyone became a better mathematician by default. What mattered was whether educators were &lt;em&gt;deliberate&lt;/em&gt; about what they were preserving and what they were letting go.&lt;/p&gt;

&lt;p&gt;GPS and spatial navigation followed the same script, with one twist: here, the cognitive effect was confirmed at the neural level. Maguire et al.'s studies of London taxi drivers showed measurable hippocampal development from active wayfinding—development that GPS-reliant drivers simply didn't build. [4] The mechanism wasn't theoretical. It was visible in brain scans. And yet: people didn't get catastrophically lost. The skill became less universal, persisted where it was practiced, and society adapted to the change in the value of the skill itself.&lt;/p&gt;

&lt;p&gt;The pattern is consistent across every major cognitive tool transition: the concern has partial merit, the catastrophic version doesn't materialize, and the outcome depends almost entirely on whether the transition was managed deliberately. The question for AI isn't whether it will change how people think—it will. It's whether you're managing the transition or just watching it happen.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Research Shows
&lt;/h3&gt;

&lt;p&gt;With that frame in place, let's look at three papers and what they can tell us.&lt;/p&gt;

&lt;p&gt;Start with the MIT Media Lab study, because it produced the most visceral evidence and the most distorted coverage. Kosmyna, Maes, and colleagues gave 54 participants EEG headsets and asked them to write SAT-style essays in one of three conditions: ChatGPT, a search engine, or no tools. [1] The neural results were stark: brain-only writers showed the strongest and most distributed cognitive engagement; LLM users showed the weakest. Human raters called many AI-assisted essays polished but "soulless." And when the AI group had their tools removed in a follow-up session, they struggled more than participants who'd always worked independently.&lt;/p&gt;

&lt;p&gt;The crucial nuance—the part that "ChatGPT is rotting your brain" headlines stripped out—is what happened to brain-only participants when they &lt;em&gt;were&lt;/em&gt; given AI access in the follow-up. They showed &lt;em&gt;increased&lt;/em&gt; neural engagement and used &lt;em&gt;more&lt;/em&gt; sophisticated prompting than the AI-first group! &lt;strong&gt;The sequencing mattered.&lt;/strong&gt; AI used &lt;em&gt;after&lt;/em&gt; independent thinking enhanced outcomes. AI used &lt;em&gt;instead of&lt;/em&gt; thinking reduced them. The researchers called this accumulated cost "cognitive debt."&lt;/p&gt;

&lt;p&gt;The press got the alarm right, but the prescription wrong.&lt;/p&gt;

&lt;p&gt;Shaw &amp;amp; Nave ran three preregistered experiments using the Cognitive Reflection Test, measuring how 1,372 participants reasoned with and without access to an AI assistant. [3] Their key findings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Participants consulted the AI on more than half of trials—voluntarily&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When the AI was accurate, reasoning accuracy improved by 25 percentage points. When the AI was wrong, reasoning accuracy &lt;em&gt;fell&lt;/em&gt; by 15 percentage points&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Confidence inflated &lt;strong&gt;regardless of whether the AI was right or wrong&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The people most likely to surrender were those with higher AI trust and lower "need for cognition"—a stable individual trait measuring how much people &lt;em&gt;enjoy&lt;/em&gt; effortful thinking&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The critical finding isn't that AI makes us wrong. It's that AI makes us confidently wrong.&lt;/strong&gt; And it most affects the people who were already least likely to push back on an authoritative-sounding answer.&lt;/p&gt;

&lt;p&gt;Psychologists have a name for this dynamic in humans: the Dunning-Kruger effect. People with low competence in a domain systematically overestimate their ability—not out of arrogance, but because the skills required to recognize poor performance are the same skills required to produce good performance. You need to know what "good" looks like to know when you're falling short of it.&lt;/p&gt;

&lt;p&gt;AI is the Dunning-Kruger effect, institutionalized at scale in every domain.&lt;/p&gt;

&lt;p&gt;Researchers have a name for this in professional settings: automation bias. Mosier and Skitka documented it studying pilots, medical teams, and air traffic controllers—automated recommendations get followed even when they contradict other available evidence, because the system's confidence is legible and the operator's own uncertainty is not. [5] Safety-critical industries eventually built mandatory override protocols around this finding. Most business AI deployments haven't.&lt;/p&gt;

&lt;p&gt;The model never hedges in proportion to its actual reliability. It doesn't have the metacognitive machinery to flag genuine uncertainty. &lt;strong&gt;It answers confidently whether it's right or wrong&lt;/strong&gt;, because confidence is a stylistic feature of fluent text generation, not a signal of how much the model actually knows. Shaw &amp;amp; Nave measured the result directly: confidence inflated even when the AI was wrong, and participants adopted that inflated confidence as their own. [3] You're not just getting a wrong answer. You're getting a wrong answer delivered with the bearing of expertise, which you then carry into the next conversation, the next deck, the next decision.&lt;/p&gt;

&lt;p&gt;The original human Dunning-Kruger problem is somewhat bounded, because reality may eventually intrude. The person who doesn't know what they don't know may encounter an outcome that forces recalibration. AI-mediated Dunning-Kruger is a bit more pernicious: users accept the confident wrong answer, act on it, and may never encounter the feedback loop that would flag the error. The model moves on. So does everyone else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The risk isn't that AI makes individuals incompetent. It's that organizations embed confidently-wrong reasoning into decisions at scale, with no natural correction mechanism.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There's a second-order problem that's harder to see. Memory researchers call it source monitoring failure: over time, people lose the ability to distinguish between ideas they generated themselves and information that came from an external source. [6] You remember the conclusion. You don't remember that the AI produced it. In six months, the person who accepted an AI-drafted strategic framing may genuinely recall it as their own thinking—which makes it invisible to challenge. Cognitive surrender doesn't just corrupt a single decision. It quietly colonizes how someone understands a problem domain, one unexamined AI output at a time.&lt;/p&gt;

&lt;p&gt;The Caosun &amp;amp; Aral paper, from MIT's Sloan School, works at a different level--it's a formal dynamic model rather than a behavioral experiment--but its logic compounds the concern. [2] Even a decision-maker who fully understands that AI use erodes skill will &lt;em&gt;rationally&lt;/em&gt; over-use AI when short-term productivity gains are front-loaded. The long-run trap closes slowly and invisibly. By the time the capability gap shows up, it's a workforce problem, not an individual choice problem.&lt;/p&gt;

&lt;p&gt;The model's most striking result, for me: &lt;strong&gt;workers don't converge on a mediocre equilibrium. They diverge.&lt;/strong&gt; Experienced workers who enter AI adoption with a deep skill base realize their full potential. Less-experienced workers--who rely on AI before building foundational capability--can deskill to near zero. Same tools. Same organization. Opposite trajectories.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Research Doesn't Show
&lt;/h3&gt;

&lt;p&gt;So what is happening when we use AI?&lt;/p&gt;

&lt;p&gt;The Shaw &amp;amp; Nave experiments use Cognitive Reflection Test items [3]--clever logic puzzles specifically designed to produce intuitive-but-wrong answers. These are exactly the conditions where cognitive surrender is most measurable and most consequential. But your VP of Engineering asking Claude to summarize a vendor proposal isn't in the same cognitive situation as someone tricked by a bat-and-ball problem. Real work involves feedback loops, stakes, domain expertise, and the opportunity to discover and correct errors.&lt;/p&gt;

&lt;p&gt;The Caosun &amp;amp; Aral model [2] has its own limitations. Economic models necessarily simplify or omit many real-world complexities, describing a near-ideal state in order to illustrate the principle at hand.&lt;/p&gt;

&lt;p&gt;There's also a troubling empirical result that cuts across all three papers: a 2024 Nature Human Behaviour meta-analysis of 106 studies found that human-AI teams performed &lt;em&gt;significantly worse&lt;/em&gt; than the best of either humans or AI working alone. [7] The "centaur" model--human judgment amplified by AI capability--is not the default outcome. Rather, it only occurs as the result of intentional design. Most deployments don't clear that bar.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we can say with confidence:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Cognitive surrender is a real, measurable phenomenon in controlled conditions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AI used &lt;em&gt;instead of&lt;/em&gt; independent thinking reduces neural engagement and retention; AI used &lt;em&gt;after&lt;/em&gt; independent thinking can enhance it--sequencing is the variable [1]&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The augmentation trap dynamic is theoretically coherent and consistent with what we know about skill development&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Junior workers face asymmetric risk from poorly-designed AI deployments&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AI-generated confidence inflation is a specific, identifiable failure mode&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What remains open:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;How large these effects are in real-world high-stakes work&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Whether the effects are permanent or recoverable with deliberate practice&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Whether the workers most at risk are the ones being given the most AI autonomy right now (hint: they probably are)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What C-Suite Leaders Should Do
&lt;/h3&gt;

&lt;p&gt;The history of cognitive tools doesn't counsel "ignore the research." It counsels "manage the transition deliberately rather than waiting to see what breaks." Every time organizations did that--intentional curriculum design for calculators, explicit navigation training for pilots who use autopilot--outcomes were manageable. Every time they didn't, skills eroded in ways that were expensive to recover.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design for skill preservation, not just productivity extraction.&lt;/strong&gt; The Caosun &amp;amp; Aral model identifies five deployment regimes that separate beneficial from harmful adoption. The variable that matters most is whether the productivity gain depends on worker expertise or replaces it. Tools that amplify what a skilled person can do are structurally different from tools that substitute for skills that haven't been built yet. Know which category your deployments fall into.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Protect junior talent deliberately.&lt;/strong&gt; The K-shaped divergence result should be on every CHRO's radar. AI tools handed to junior employees before foundational skills are established don't accelerate their development--they interrupt it. The manager who lets a first-year analyst use AI for everything isn't mentoring them; they're deskilling them at scale. Ericsson's foundational research on expert performance is instructive here: expertise is built through deliberate practice--effortful engagement with tasks at the edge of current ability, with feedback, over time. [8] AI that removes the difficulty removes the mechanism. Structured challenge, supervised stretch work, and "AI-off" modes for developing staff aren't reactionary--they're the investment that makes your senior talent pipeline real rather than imaginary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sequence AI access deliberately.&lt;/strong&gt; The MIT Media Lab finding is the most actionable result in any of these papers: participants who did independent cognitive work first, then got AI access, showed greater neural engagement and produced better outputs than those who went straight to AI. [1] I would argue that this is related to a pre-AI insight from educational psychology known as the pretesting effect. This effect, documented by Richland, Kornell, and Kao in 2009, shows that attempting to answer questions &lt;em&gt;before&lt;/em&gt; instruction improves retention and test performance, even when the pretest attempts fail. [9] The struggle itself is the mechanism--what psychologists call the generation effect: information you produce yourself, even incorrectly, encodes more deeply than information you receive passively. [10] Failed retrieval sensitizes the brain to the correct answer when it arrives; effortless receipt of the correct answer leaves no such trace. The AI analog is direct: formulate your own answer first, however rough, then use AI to stress-test, extend, or correct it. Going straight to AI skips the step that makes the answer stick and the reasoning transferable. The prescription isn't "less AI." It's "earn the AI's answer before you accept it."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add friction to high-stakes decisions.&lt;/strong&gt; Shaw &amp;amp; Nave's [3] confidence inflation finding has a direct organizational implication: when AI is in the room, the people most likely to flag errors are the ones least likely to speak up. Design review processes that assume this. Make override explicit and valued, not implicit and career-risky.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Track skill metrics alongside productivity metrics.&lt;/strong&gt; Organizations deploying AI at scale are measuring the right-hand side of the Caosun &amp;amp; Aral model--outputs--while remaining blind to the left-hand side: the skill stock being drawn down. Without direct assessment of capability, you won't see the augmentation trap closing until it's closed. By then, you've traded durable competitive advantage for a temporary productivity gain.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Bottom Line
&lt;/h3&gt;

&lt;p&gt;Writing didn't destroy memory. Calculators didn't destroy mathematical reasoning. GPS didn't make us permanently lost. In each case, the concern had some legitimate merit, the catastrophic version didn't happen, and the outcome depended on deliberate management of the transition.&lt;/p&gt;

&lt;p&gt;The research on AI cognitive effects--three papers, three different methods, three convergent signals--belongs in that lineage, not as reassurance but as instruction. Some version of the concern is right. The catastrophic version probably isn't. The variable that determines which outcome you get is how intentionally you manage it.&lt;/p&gt;

&lt;p&gt;Cognitive offloading is an opportunity. Cognitive surrender--letting the AI think because thinking is hard--is a risk. The gap between them is policy, design, and leadership.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Are you seeing signs of cognitive surrender in your organization—the AI-generated answer accepted without a second look? What structures are actually working to keep deliberation alive?&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://arxiv.org/abs/2506.08872" rel="noopener noreferrer"&gt;Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task — Kosmyna, Maes et al., MIT Media Lab (arXiv, 2025)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://arxiv.org/abs/2604.03501" rel="noopener noreferrer"&gt;The Augmentation Trap: AI Productivity and the Cost of Cognitive Offloading — Caosun &amp;amp; Aral, MIT Sloan (arXiv, 2026)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://ssrn.com/abstract=6097646" rel="noopener noreferrer"&gt;Thinking—Fast, Slow, and Artificial: How AI Is Reshaping Human Reasoning and the Rise of Cognitive Surrender — Shaw &amp;amp; Nave, Wharton (SSRN, 2026)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://doi.org/10.1073/pnas.97.8.4398" rel="noopener noreferrer"&gt;Navigation-related structural change in the hippocampi of taxi drivers — Maguire et al. (2000), PNAS&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://doi.org/10.1207/s15327108ijap0801_3" rel="noopener noreferrer"&gt;Automation Bias: Decision Making and Performance in High-Technology Cockpits — Mosier, Skitka, Heers &amp;amp; Burdick (1998), International Journal of Aviation Psychology&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://doi.org/10.1037/0033-2909.114.1.3" rel="noopener noreferrer"&gt;Source Monitoring — Johnson, Hashtroudi &amp;amp; Lindsay (1993), Psychological Bulletin&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.nature.com/articles/s41562-024-02024-1" rel="noopener noreferrer"&gt;When Combinations of Humans and AI Are Useful: A Systematic Review and Meta-Analysis — Nature Human Behaviour (2024)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://doi.org/10.1037/0033-295X.100.3.363" rel="noopener noreferrer"&gt;The Role of Deliberate Practice in the Acquisition of Expert Performance — Ericsson, Krampe &amp;amp; Tesch-Römer (1993), Psychological Review&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://doi.org/10.1037/a0016496" rel="noopener noreferrer"&gt;The Pretesting Effect: Do Unsuccessful Retrieval Attempts Enhance Learning? — Richland, Kornell &amp;amp; Kao (2009), Journal of Experimental Psychology: Applied&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://doi.org/10.1037/0278-7393.4.6.592" rel="noopener noreferrer"&gt;The Generation Effect: Delineation of a Phenomenon — Slamecka &amp;amp; Graf (1978), Journal of Experimental Psychology: Human Learning and Memory&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;strong&gt;If this resonated, here are some related articles:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For why the antidote to cognitive surrender is keeping humans deliberately engaged &lt;em&gt;before&lt;/em&gt; the AI loop starts, not just reviewing outputs afterward: &lt;a href="https://www.linkedin.com/pulse/evolving-strategy-knowledge-work-from-keith-mackay-xiefe/" rel="noopener noreferrer"&gt;An Evolving Strategy for Knowledge Work: From Human-In-the-Loop to Human-Before-the-Loop&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/an-evolving-strategy-for-knowledge" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For my earlier take on why cognitive surrender is the core AI adoption risk and "use it or lose it" applies directly to augmentation: &lt;a href="https://www.linkedin.com/posts/keithmackay_the-modern-world-is-optimized-for-convenience-activity-7438246349352996864-OgNs" rel="noopener noreferrer"&gt;AI Is Best Used for Human Augmentation, Not Cognitive Surrender&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For the management lens on directing AI deliberately—because treating it as a vending machine is the organizational version of cognitive surrender: &lt;a href="https://www.linkedin.com/pulse/situational-leadership-ai-more-like-capable-colleague-keith-mackay-wjqoe" rel="noopener noreferrer"&gt;Situational Leadership for AI: More Like a Capable Colleague than a Fancy Formula&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/situational-leadership-for-ai" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For a sharper framing of why confidence without correctness is the core failure mode in AI outputs: &lt;a href="https://www.linkedin.com/posts/keithmackay_ive-noted-that-in-the-ai-era-problems-activity-7437501409257742336-UZoA" rel="noopener noreferrer"&gt;Plausibility Is Not Correctness&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For how a structured process amplifies AI output quality without replacing the thinking that makes it worth publishing: &lt;a href="https://www.linkedin.com/pulse/writers-who-use-ai-without-harness-one-published-article-keith-mackay-luooe/" rel="noopener noreferrer"&gt;Writers Who Use AI Without a Harness Are One Published Article From Disaster&lt;/a&gt; | &lt;a href="https://dev.to/keithjmackay/writers-who-use-ai-without-a-harness-are-one-published-article-from-disaster-568o-temp-slug-7575089"&gt;Dev.to&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/publish/post/196867639" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology, with Claude and Codex as AI collaborators.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>management</category>
      <category>leadership</category>
    </item>
    <item>
      <title>Writers Who Use AI Without a Harness Are One Published Article From Disaster</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Sat, 06 Jun 2026 23:23:32 +0000</pubDate>
      <link>https://dev.to/keithjmackay/writers-who-use-ai-without-a-harness-are-one-published-article-from-disaster-3gf2</link>
      <guid>https://dev.to/keithjmackay/writers-who-use-ai-without-a-harness-are-one-published-article-from-disaster-3gf2</guid>
      <description>&lt;h1&gt;
  
  
  Writers Who Use AI Without a Harness Are One Published Article From Disaster
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;AI can be tremendously helpful, or can drive you right into Disaster Chasm. Here are some ways to NOT get burned.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Just over a month ago, a staff writer at Ars Technica was fired. Benj Edwards, the senior reporter covering AI, got tripped up by AI.&lt;/p&gt;

&lt;p&gt;The published article attributed quotes to a real person. Those quotes were fabricated: not pulled from a transcript, not reconstructed from notes, not collected in an interview. A language model wrote words and put them in someone's mouth. The writer didn't catch it. The interviewee did, after publication. Ars Technica's editor-in-chief addressed it publicly and moved fast. Termination on discovery. No corrective action plan. Gone [1].&lt;/p&gt;

&lt;p&gt;The irony writes itself. The argument that follows is harder.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Essay Everyone's Sharing
&lt;/h2&gt;

&lt;p&gt;Around the same time, a short essay started circulating: 546 points on Hacker News. Title: "Don't Let AI Write For You." The core argument, distilled: writing is thinking. Hand the writing to a model, and you're not outsourcing prose. You're outsourcing cognition. "It is like paying somebody to work out for you" [2].&lt;/p&gt;

&lt;p&gt;That lands, and it's correct as far as it goes. The essay spread because it names something real: the uneasy feeling that AI-generated text isn't just lazier, it's emptier. The writer who lets a model draft their ideas isn't really the author of those ideas. The trust problem is real too. When a document reads as AI-generated, it signals that the sender didn't contend with the material. Bynder surveyed 2,000 consumers in the US and UK and found the gap is trust, not quality: 56% preferred the AI-written version when shown blind, but 52% said they'd disengage when they merely &lt;em&gt;suspected&lt;/em&gt; AI involvement [3]. The content wasn't the problem. The lack of trust was.&lt;/p&gt;

&lt;p&gt;I don't disagree with any of this. But the essay sets up a binary that doesn't hold, and stopping there misses the more important question: what does legitimate AI collaboration actually look like?&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Different Failures
&lt;/h2&gt;

&lt;p&gt;The Ars Technica incident and the "writing is thinking" critique are describing different problems. Conflating them makes both harder to address.&lt;/p&gt;

&lt;p&gt;The journalist's failure was not a thinking failure. It was a verification failure. The fabricated quotes weren't a product of outsourced cognition: they were a product of an unverified draft from a reporter who was under the gun and under the weather, apparently using some unvetted AI validation tools. The model filled in what it didn't know with what sounded plausible. Plausible is not the same as correct. This distinction matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Language models write confident, fluent fabrications.&lt;/strong&gt; They've consumed enough interviews, profiles, and reported features to understand the cadence of a well-placed expert quote: the hedge, the technical specificity, the personality tell. When you ask a model to draft a section with relevant expert perspectives, it doesn't return &lt;code&gt;[QUOTE NEEDED]&lt;/code&gt;. It writes something. It sounds like attribution. It reads like reporting. Ironically, a colleague's AI tool alerted him to a problem and quoted the Bynder stat above, but attributed it to Nielsen, as several secondary sources do. I found the Bynder study. I found nothing from Nielsen speaking to this issue with these statistics.&lt;/p&gt;

&lt;p&gt;Stanford researchers found that when asked about specific federal court cases, LLMs hallucinate at least 69% of the time, and models tend toward overconfidence regardless of accuracy, stating fabrications with the same certainty as verified facts [4]. A separate Anthropic interpretability study identified internal circuits that activate hallucination specifically when a model recognizes a name but lacks sufficient information about it: the model knows enough to be dangerous, not enough to be accurate [5].&lt;/p&gt;

&lt;p&gt;The model doesn't know the difference between what it has read and what it's plausibly reconstructing. It treats both with equal confidence. Fluent output is not evidence of accuracy. That's true for humans too, but the scale is different.&lt;/p&gt;

&lt;p&gt;This is a different failure from ordinary human error. A human who invents a quote does so intentionally, and there's a trail. AI fabrication leaves no trail. The quote exists nowhere except in the model's output and, fatally, the published article.&lt;/p&gt;

&lt;p&gt;The "don't let AI write for you" critique is worried about cognitive outsourcing: the writer who hands over thinking and gets back something that approximates thought. That's worth worrying about. Microsoft Research surveyed 319 knowledge workers and found that reliance on AI correlates with reduced critical thinking effort, particularly for users with high trust in the tools [6]. MIT Media Lab went further: in a controlled essay-writing experiment, participants who used ChatGPT showed the weakest brain connectivity of any group, and struggled to accurately quote their own work afterward [7]. You can outsource your writing and outsource your judgment along with it. The MIT data suggests you can also lose track of what you actually said. Shen and Tamkin (2026) documented the same pattern in software developers: AI-assisted teams produced working code, but scored 17% worse than unassisted developers on conceptual quizzes about the code they'd just written [8]. The output ships. The understanding doesn't.&lt;/p&gt;

&lt;p&gt;What Shen and Tamkin also found is worth holding onto: not all AI-assisted developers showed comprehension loss. Three patterns produced good results: asking follow-up questions after generating code, requesting explanations alongside outputs, and using AI only for conceptual questions while debugging independently. The common thread across all three: staying engaged with understanding, not just output. The tool was in the loop. So was the brain.&lt;/p&gt;

&lt;p&gt;These are two separate failure modes. One is a workflow problem. One is a discipline problem. The fix for one doesn't address the other.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a Defensible Process Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;I write with an AI collaborator. I want to be direct about that, because the current discourse pushes toward either "I never use AI" (a missed opportunity, in my view) or radio silence. Neither is useful.&lt;/p&gt;

&lt;p&gt;Here is my workflow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The outline is mine.&lt;/strong&gt; That is not a small thing. The outline is the argument: what question this piece answers, what the structure of the answer is, where it builds and where it lands. That thinking is mine. No model does it for me. This is the cognitive work the "don't let AI write for you" essay is correctly defending.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The model writes the first draft.&lt;/strong&gt; Given the outline, a structure, and a specific voice, it generates prose. This is closer to research assistance than ghostwriting: it executes the thinking I've already done. It's faster than I am at drafting. It's better at certain kinds of sustained exposition. It finds great sources that I don't find. I wrote a Claude skill that helps maintain active voice, draft stylistically like things I've written in the past, and review the coherence of the argument I'm making. I use it for that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I edit with real scrutiny.&lt;/strong&gt; Not for typos. For accuracy: whether the argument is actually what I intended, whether anything got invented that I didn't put there. If something sounds authoritative but I can't trace it to a primary source, it doesn't stay in (like my Nielsen study that was actually a Bynder study). If a quote appears that I didn't supply in the outline, I find the source or I cut it. This is where the Ars Technica writer failed: he stopped before this step.&lt;/p&gt;

&lt;p&gt;Calling the model's role "execution" undersells it. The draft isn't just my outline translated into prose. It's a response: it makes choices I wouldn't have made and finds angles I didn't plant in the outline. Some I reject. Some I keep and build on. The editing pass isn't just verification; it's a negotiation between my judgment and the LLM's enormous accumulation of captured human expression. When I push back on a draft, restructure a section, or challenge a framing, I'm not correcting a tool. I'm arguing with a collaborator that has read more than I ever will. The work that comes out the other side (the direction, the punch, the accuracy) is better than what I'd produce alone. That's not a caveat about AI collaboration. That's the whole argument for it.&lt;/p&gt;

&lt;p&gt;The result is work I own. Not because my name is on it. Because my thinking is in it, and my editorial judgment touched every line.&lt;/p&gt;

&lt;p&gt;There's a useful frame from software engineering for this. Birgitta Böckeler, writing for Martin Fowler's site, describes an engineering practice that my colleagues and I have found critical to good coding performance: building an outer harness around coding agents, a system of guides and sensors that increases the probability the agent gets things right the first time, and lets errors surface before they reach human eyes. The formulation: &lt;strong&gt;Agent = Model + Harness&lt;/strong&gt; [9]. The model generates; the harness gives the output shape and accountability.&lt;/p&gt;

&lt;p&gt;The outline is a harness. My Claude skill is a harness. My editorial pass is a harness. Feedforward and feedback controls: these either steer before the model acts, or catch what slipped through. What the Ars Technica writer lacked wasn't a better model. He lacked the right harness. The model ran free, and fabrication passed all the way to publication.&lt;/p&gt;




&lt;h2&gt;
  
  
  Five Practices To Incorporate for Best Results
&lt;/h2&gt;

&lt;p&gt;I'd argue that these are the workflow practices to implement in concrete form:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Never let a quote pass without a primary source.&lt;/strong&gt; If the model generates a quote or attributed statement, treat it as a placeholder until you've verified it against a transcript, recording, published interview, or direct outreach. "That sounds like something they'd say" is not a source. A source that references the source without strict attribution is not a source. Find a primary source or cut the quote.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The confidence of the output is a warning signal, not a quality signal.&lt;/strong&gt; Models hallucinate with authority. They build plausible outcomes, and plausibility is not the same as correctness. An authoritative-sounding claim is exactly when to slow down. Fluency and accuracy are unrelated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build a verification trail, even a minimal one.&lt;/strong&gt; A note in the draft: "Quote verified against [source], [date]" creates accountability and a path to correction if something slips. The Ars Technica writer had no trail. Ideally, you want one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Declare AI involvement on anything with legal, reputational, or attribution stakes.&lt;/strong&gt; Not on every internal email. But sourcing and attribution are precisely those contexts. Reuters now requires that any AI-assisted content involve a human verification step before publication; the AP prohibits AI from generating publishable content for the wire service [10].&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Own the outline; own the edit.&lt;/strong&gt; If you didn't write the structure and you didn't do the final pass, are you the author in any meaningful sense? I believe that both ends of the process have to be yours. There's a mechanism behind this, not just a principle: auditing AI output requires the domain knowledge you're supposed to be bringing to the piece. You can't catch a bad argument about a topic you don't understand. You can't spot a fabricated quote if you don't know the source material. The outline forces you to understand the argument before the model touches it. The edit forces you to verify it after. Remove either end, and you're not just outsourcing writing. You're outsourcing the capacity to know whether it's right.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Ars Technica's Response Was Right
&lt;/h2&gt;

&lt;p&gt;The firing will read as harsh to some. It shouldn't.&lt;/p&gt;

&lt;p&gt;Editorial trust is the product. A publication that treats AI fabrication as a correctable process failure, rather than a terminal professional breach, is signaling that its sourcing standards are negotiable. They aren't.&lt;/p&gt;

&lt;p&gt;The research on why this matters is unambiguous. The Nuremberg Institute for Market Decisions ran a controlled experiment: identical ads, labeled either "AI-generated" or "human-generated." Same content, different label. Consumers rated the AI-labeled versions as less natural, less useful, and showed lower willingness to purchase, not because the content differed, but because the label changed how they processed it [11]. NIQ's December 2024 study reinforced this neurologically: AI-generated ads elicited measurably weaker memory activation than human-made ads, even when rated as high quality [12]. Trust erosion in AI-detected content isn't just a perception problem. It appears to operate below conscious evaluation.&lt;/p&gt;

&lt;p&gt;A Reuters Institute study found that only 42% of news organizations have guidelines on disclosing AI use [13]. Most publications are still catching up to the policy problem. Ars Technica's speed and clarity set a useful precedent for organizations writing those policies now: AI-fabricated attribution in published work is not a performance issue. It is an integrity issue. Those require different responses.&lt;/p&gt;

&lt;p&gt;The question for every organization that hasn't been through this yet: what's your equivalent? Most professional contexts don't have retraction processes. The client who discovers you put words in their mouth in a deliverable, the colleague who reads a fabricated characterization in a meeting summary: those errors don't get formally corrected. They just sit there, corroding trust.&lt;/p&gt;

&lt;p&gt;Build the guardrails before you need the retraction.&lt;/p&gt;




&lt;h2&gt;
  
  
  Putting It All Together
&lt;/h2&gt;

&lt;p&gt;The "write your own work" argument is right that thinking cannot be outsourced. It's right that credibility comes from contending with material, not from producing a document that approximates what people want to hear.&lt;/p&gt;

&lt;p&gt;It doesn't mean AI can't be in the process.&lt;/p&gt;

&lt;p&gt;The question isn't whether a model touched the prose. It's who owns the thinking going in and the verification coming out. An outline is cognitive work. An edit is cognitive work. The drafting between those two poles can be assisted, and even improved, without surrendering the part that actually matters.&lt;/p&gt;

&lt;p&gt;The tools aren't going away. The standards don't have to erode. Both things can be true, but only if the people using the tools decide where their editorial responsibility begins and, more importantly, where it cannot end.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;[1] Maggie Harrison Dupré, "Ars Technica Fires Reporter After AI Controversy Involving Fabricated Quotes," &lt;em&gt;Futurism&lt;/em&gt;, March 2, 2026. &lt;a href="https://futurism.com/artificial-intelligence/ars-technica-fires-reporter-ai-quotes" rel="noopener noreferrer"&gt;https://futurism.com/artificial-intelligence/ars-technica-fires-reporter-ai-quotes&lt;/a&gt;&lt;br&gt;
[2] Alex Woods, "Don't Let AI Write For You," alexhwoods.com, March 8, 2026. &lt;a href="https://alexhwoods.com/dont-let-ai-write-for-you/" rel="noopener noreferrer"&gt;https://alexhwoods.com/dont-let-ai-write-for-you/&lt;/a&gt;&lt;br&gt;
[3] Bynder, "AI vs. Human-Made Content Study," April 3, 2024 (n=2,000, 1,000 US / 1,000 UK). &lt;a href="https://www.bynder.com/en/press-media/ai-vs-human-made-content-study/" rel="noopener noreferrer"&gt;https://www.bynder.com/en/press-media/ai-vs-human-made-content-study/&lt;/a&gt;&lt;br&gt;
[4] Matthew Dahl et al., "Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models," &lt;em&gt;Journal of Legal Analysis&lt;/em&gt; 16 (2024): 64–93. &lt;a href="https://doi.org/10.1093/jla/laae003" rel="noopener noreferrer"&gt;https://doi.org/10.1093/jla/laae003&lt;/a&gt;&lt;br&gt;
[5] Anthropic, "Tracing the Thoughts of a Large Language Model," March 27, 2025. Primary papers: "Circuit Tracing: Revealing Computational Graphs in Language Models" and "On the Biology of a Large Language Model," transformer-circuits.pub. &lt;a href="https://transformer-circuits.pub/2025/attribution-graphs/biology.html" rel="noopener noreferrer"&gt;https://transformer-circuits.pub/2025/attribution-graphs/biology.html&lt;/a&gt;&lt;br&gt;
[6] Hao-Ping (Hank) Lee et al., "The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers," Microsoft Research / Carnegie Mellon University, January 2025. &lt;a href="https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/" rel="noopener noreferrer"&gt;https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/&lt;/a&gt;&lt;br&gt;
[7] Nataliya Kosmyna et al., "Your Brain on ChatGPT: Accumulation of Cognitive Debt When Using an AI Assistant for Essay Writing Task," MIT Media Lab, June 10, 2025 (n=54, EEG-monitored). &lt;a href="https://www.media.mit.edu/publications/your-brain-on-chatgpt/" rel="noopener noreferrer"&gt;https://www.media.mit.edu/publications/your-brain-on-chatgpt/&lt;/a&gt;&lt;br&gt;
[8] Judy Hanwen Shen and Alex Tamkin, "How AI Impacts Skill Formation," arXiv:2601.20245, January 28, 2026. &lt;a href="https://doi.org/10.48550/arXiv.2601.20245" rel="noopener noreferrer"&gt;https://doi.org/10.48550/arXiv.2601.20245&lt;/a&gt;&lt;br&gt;
[9] Birgitta Böckeler, "Harness Engineering for Coding Agent Users," martinfowler.com, April 2, 2026. &lt;a href="https://martinfowler.com/articles/harness-engineering.html" rel="noopener noreferrer"&gt;https://martinfowler.com/articles/harness-engineering.html&lt;/a&gt;&lt;br&gt;
[10] Associated Press, "AP AI guidelines," August 2023. &lt;a href="https://www.ap.org/ai-tools-and-technology/" rel="noopener noreferrer"&gt;https://www.ap.org/ai-tools-and-technology/&lt;/a&gt;&lt;br&gt;
[11] Nuremberg Institute for Market Decisions (NIM), "Transparency Without Trust," &lt;em&gt;NIM INSIGHTS&lt;/em&gt;, Vol. 7 (n=1,000 each in USA, UK, Germany; two controlled experiments). &lt;a href="https://www.nim.org/en/publications/detail/transparency-without-trust" rel="noopener noreferrer"&gt;https://www.nim.org/en/publications/detail/transparency-without-trust&lt;/a&gt;&lt;br&gt;
[12] NIQ, "NIQ Research Uncovers Hidden Consumer Attitudes Toward AI-Generated Ads," December 12, 2024 (n=2,000+ survey; ~150 EEG). &lt;a href="https://nielseniq.com/global/en/news-center/2024/niq-research-uncovers-hidden-consumer-attitudes-toward-ai-generated-ads/" rel="noopener noreferrer"&gt;https://nielseniq.com/global/en/news-center/2024/niq-research-uncovers-hidden-consumer-attitudes-toward-ai-generated-ads/&lt;/a&gt;&lt;br&gt;
[13] Reuters Institute for the Study of Journalism, "Journalism, Media, and Technology Trends and Predictions 2025" (survey on AI policies in news organizations). &lt;a href="https://reutersinstitute.politics.ox.ac.uk/journalism-media-and-technology-trends-and-predictions-2025" rel="noopener noreferrer"&gt;https://reutersinstitute.politics.ox.ac.uk/journalism-media-and-technology-trends-and-predictions-2025&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;If this resonated, here are some related articles:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For the parallel argument in software (why thinking was always the job, and what changes when AI handles the typing): &lt;a href="https://www.linkedin.com/pulse/when-writing-software-typing-never-job-neither-prompting-keith-mackay-ohdqe/" rel="noopener noreferrer"&gt;When Writing Software, the Typing Was Never the Job&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/when-writing-software-the-typing" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For why clear communication is the foundational AI skill — the same one that makes collaboration with AI legitimate vs. empty: &lt;a href="https://www.linkedin.com/pulse/most-important-ai-skill-isnt-technical-keith-mackay-xmdde/" rel="noopener noreferrer"&gt;The Most Important AI Skill Isn't Technical&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/the-most-important-ai-skill-isnt" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For a framework on when to keep humans in the loop vs. move them before the loop — directly applicable to AI writing workflows: &lt;a href="https://www.linkedin.com/pulse/evolving-strategy-knowledge-work-from-keith-mackay-xiefe/" rel="noopener noreferrer"&gt;An Evolving Strategy for Knowledge Work: From Human-In-the-Loop to Human-Before-the-Loop&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/an-evolving-strategy-for-knowledge" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For the management framework that maps well to deciding how much autonomy to give an AI collaborator: &lt;a href="https://www.linkedin.com/pulse/situational-leadership-ai-more-like-capable-colleague-keith-mackay-wjqoe" rel="noopener noreferrer"&gt;Situational Leadership for AI: More Like a Capable Colleague Than a Fancy Formula&lt;/a&gt; | &lt;a href="https://tlcmentor.substack.com/p/situational-leadership-for-ai" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology, with Claude and Codex as AI collaborators.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>coding</category>
      <category>convex</category>
      <category>agents</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
