<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kento IKEDA</title>
    <description>The latest articles on DEV Community by Kento IKEDA (@ikenyal).</description>
    <link>https://dev.to/ikenyal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F634239%2Fefa867e9-b872-436c-a450-2c5115bd4394.jpg</url>
      <title>DEV Community: Kento IKEDA</title>
      <link>https://dev.to/ikenyal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ikenyal"/>
    <language>en</language>
    <item>
      <title>What is AWS Blocks? How it differs from Amplify and App Studio, and what each one is aiming for</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Fri, 19 Jun 2026 22:12:14 +0000</pubDate>
      <link>https://dev.to/aws-builders/what-is-aws-blocks-how-it-differs-from-amplify-and-app-studio-and-what-each-one-is-aiming-for-2kn0</link>
      <guid>https://dev.to/aws-builders/what-is-aws-blocks-how-it-differs-from-amplify-and-app-studio-and-what-each-one-is-aiming-for-2kn0</guid>
      <description>&lt;p&gt;On June 16, 2026, AWS announced &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/06/aws-blocks-preview/" rel="noopener noreferrer"&gt;AWS Blocks&lt;/a&gt; as a public preview. It is an open-source framework where the TypeScript you write for your backend becomes the AWS infrastructure that runs it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/aws-devtools-labs/aws-blocks" rel="noopener noreferrer"&gt;https://github.com/aws-devtools-labs/aws-blocks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first thing many AWS users will think is "another full-stack tool." If you want a tool for frontend developers to build full-stack apps in TypeScript, Amplify Gen2 already exists. For people who don't write code, there is App Studio. I compared those three earlier in &lt;a href="https://zenn.dev/ikenyal/articles/c2c3ccf9fdd0cf" rel="noopener noreferrer"&gt;a write-up on App Studio, Amplify Gen1, and Amplify Gen2&lt;/a&gt;. Now Blocks joins them.&lt;/p&gt;

&lt;p&gt;This article organizes what AWS Blocks is from the official docs and the open-source code, lines it up against Amplify and App Studio, and finally sketches a map of what each one is aiming for. Rather than stopping at a feature-diff table, I want to get at why AWS is offering multiple entry points into full-stack development.&lt;/p&gt;

&lt;p&gt;A note: this is based on the official docs right after the preview announcement and on reading the open-source code. It is not based on long-term production use of Blocks. Read it with the understanding that implementation details may change.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is AWS Blocks
&lt;/h2&gt;

&lt;p&gt;Here is how the official docs define it. &lt;a href="https://docs.aws.amazon.com/blocks/latest/devguide/what-is-blocks.html" rel="noopener noreferrer"&gt;AWS Blocks&lt;/a&gt; is a backend toolkit for full-stack applications, where each Block is a self-contained backend capability that bundles the application code, a local development setup, and the infrastructure to run it. Pick the Blocks you need and compose them, and the infrastructure following AWS best practices is defined automatically.&lt;/p&gt;

&lt;p&gt;Type safety doesn't stop inside the backend. Types flow from the backend all the way to the client, reaching web frameworks (Next.js, Nuxt, Astro, React, Vue, Svelte, Angular) and native targets (Swift, Kotlin, Dart/Flutter). From a single backend, you can generate typed client code for both web and mobile.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/aws-devtools-labs/aws-blocks" rel="noopener noreferrer"&gt;https://github.com/aws-devtools-labs/aws-blocks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That said, at the time of the preview announcement the frontends officially listed are SPAs (Vite + React) and SSR frameworks (Next.js, Nuxt, Astro), with support expected to widen over time. Blocks itself adds no extra charge; you pay only for the AWS services you use, and you can deploy to all commercial AWS Regions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/06/aws-blocks-preview/" rel="noopener noreferrer"&gt;https://aws.amazon.com/about-aws/whats-new/2026/06/aws-blocks-preview/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From here, let's walk through the concepts you can't skip to understand Blocks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Block = an npm package bundling infrastructure, runtime, and local implementation
&lt;/h3&gt;

&lt;p&gt;One Block is one npm package, holding the cloud resources, runtime code, and local implementation for a single capability. Instantiate one &lt;code&gt;KVStore&lt;/code&gt;, for example, and you get all at once: a DynamoDB table auto-provisioned at deploy time, runtime code that runs on Lambda, and an in-memory implementation for local development.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/blocks/latest/devguide/concepts.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/blocks/latest/devguide/concepts.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The available Blocks cover most backend needs: data (&lt;code&gt;KVStore&lt;/code&gt;, &lt;code&gt;DistributedTable&lt;/code&gt;, &lt;code&gt;Database&lt;/code&gt;, &lt;code&gt;DistributedDatabase&lt;/code&gt;, &lt;code&gt;FileBucket&lt;/code&gt;), auth (&lt;code&gt;AuthBasic&lt;/code&gt;, &lt;code&gt;AuthCognito&lt;/code&gt;, &lt;code&gt;AuthOIDC&lt;/code&gt;), async work (&lt;code&gt;AsyncJob&lt;/code&gt;, &lt;code&gt;CronJob&lt;/code&gt;), AI (&lt;code&gt;Agent&lt;/code&gt;, &lt;code&gt;KnowledgeBase&lt;/code&gt;), communication (&lt;code&gt;Realtime&lt;/code&gt;, &lt;code&gt;EmailClient&lt;/code&gt;), configuration (&lt;code&gt;AppSetting&lt;/code&gt;), observability (&lt;code&gt;Logger&lt;/code&gt;, &lt;code&gt;Metrics&lt;/code&gt;, &lt;code&gt;Tracer&lt;/code&gt;, &lt;code&gt;Dashboard&lt;/code&gt;), and hosting (&lt;code&gt;Hosting&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/blocks/latest/devguide/what-is-blocks.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/blocks/latest/devguide/what-is-blocks.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What the official overview page doesn't give you, though, is a list of which AWS service each Block actually becomes inside. I was curious, so I read the open-source code (&lt;a href="https://github.com/aws-devtools-labs/aws-blocks" rel="noopener noreferrer"&gt;aws-devtools-labs/aws-blocks&lt;/a&gt;) to confirm the real services behind the main Blocks.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Block&lt;/th&gt;
&lt;th&gt;AWS service inside&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;KVStore&lt;/code&gt; / &lt;code&gt;DistributedTable&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;DynamoDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Database&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Aurora Serverless v2 (PostgreSQL 16.4)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DistributedDatabase&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Aurora DSQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;FileBucket&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AuthBasic&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;no dedicated infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;AuthCognito&lt;/code&gt; / &lt;code&gt;AuthOIDC&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Cognito&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AsyncJob&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SQS + Lambda&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CronJob&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;EventBridge Scheduler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Agent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Strands Agents SDK + Bedrock&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;KnowledgeBase&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bedrock + S3 Vectors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Realtime&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API Gateway v2 (WebSocket)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;EmailClient&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SES&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AppSetting&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SSM Parameter Store&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Logger&lt;/code&gt; / &lt;code&gt;Metrics&lt;/code&gt; / &lt;code&gt;Tracer&lt;/code&gt; / &lt;code&gt;Dashboard&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;CloudWatch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Hosting&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;CloudFront + S3 + WAF + Route 53 + ACM&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few findings the overview alone doesn't reveal:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;DistributedDatabase&lt;/code&gt; is Aurora DSQL, so DSQL's own constraints surface directly in the development experience. You can't mix DDL and DML in the same transaction, and only one DDL statement is allowed per transaction. Blocks rejects these on the client side with validation.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;KnowledgeBase&lt;/code&gt; uses S3 Vectors, which arrived in 2025, as the vector store for RAG.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Agent&lt;/code&gt; sits on top of Strands Agents SDK, AWS's open-source agent framework.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Hosting&lt;/code&gt; is less "CloudFront + S3" and more a heavier part that also bundles WAF, Route 53, and ACM.&lt;/p&gt;

&lt;h3&gt;
  
  
  IFC layer = the entry point where code becomes infrastructure
&lt;/h3&gt;

&lt;p&gt;The backend entry point of Blocks lives in a single file, &lt;code&gt;aws-blocks/index.ts&lt;/code&gt;. Instantiate Blocks and define your API there, and the infrastructure is derived directly from that code. No separate file for infrastructure definitions.&lt;/p&gt;

&lt;p&gt;This idea of deriving infrastructure from code is called Infrastructure from Code (IFC), and in the source this backend part was named the IFC subpackage.&lt;/p&gt;

&lt;p&gt;In other words, the code that describes infrastructure and the code that describes the app aren't separated. Infrastructure grows out of the app's code. This is the heart of Blocks' philosophy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conditional exports = the same import switches implementation by context
&lt;/h3&gt;

&lt;p&gt;The reason a single file can play three roles at once, namely local development, deploy, and production runtime, is Node.js conditional exports, which route the same import statement to a different implementation per context.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Resolved implementation&lt;/th&gt;
&lt;th&gt;What happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Local development&lt;/td&gt;
&lt;td&gt;in-memory / filesystem&lt;/td&gt;
&lt;td&gt;the app runs on localhost alone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CDK synthesis&lt;/td&gt;
&lt;td&gt;CDK constructs&lt;/td&gt;
&lt;td&gt;infrastructure is defined for CloudFormation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda runtime&lt;/td&gt;
&lt;td&gt;AWS SDK&lt;/td&gt;
&lt;td&gt;real services are called in production&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TypeScript / IDE&lt;/td&gt;
&lt;td&gt;type definitions&lt;/td&gt;
&lt;td&gt;completion and type checking work&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The same line &lt;code&gt;new KVStore(scope, 'todos')&lt;/code&gt; becomes a local store in development, a DynamoDB table at deploy time, and an SDK call in production. You never change the code. Without writing any configuration by hand, module resolution picks the implementation per context. Reading the source package.json, they are switched by conditions like &lt;code&gt;cdk&lt;/code&gt;, &lt;code&gt;aws-runtime&lt;/code&gt;, &lt;code&gt;types&lt;/code&gt;, and &lt;code&gt;default&lt;/code&gt; (the local mock).&lt;/p&gt;

&lt;h3&gt;
  
  
  ApiNamespace = type-safe RPC with no code generation
&lt;/h3&gt;

&lt;p&gt;The part that calls the backend from the frontend is handled by ApiNamespace. The frontend imports and calls methods defined in the backend directly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Frontend: import the backend API directly&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../aws-blocks/index.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;World&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// TypeScript knows the return type too&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No code generation step, no API client initialization, no URL configuration. Change a backend method signature and the frontend gets a compile error instantly. Locally it goes through an HTTP server; in production it reaches Lambda through API Gateway.&lt;/p&gt;

&lt;p&gt;The transport underneath is JSON-RPC 2.0. When you call a typed method, it is converted into a JSON-RPC request internally and delivered to the backend. The developer never assembles a payload by hand; the transport stays hidden behind the types.&lt;/p&gt;

&lt;h3&gt;
  
  
  Local-first = everything runs without an AWS account
&lt;/h3&gt;

&lt;p&gt;Run &lt;code&gt;npm run dev&lt;/code&gt; and the whole app starts locally. Blocks resolve to local implementations (an in-memory store, local auth, an embedded DB), running at &lt;code&gt;http://localhost:3000&lt;/code&gt; with hot reload. No AWS account, no internet connection, no cloud billing. Local data is persisted under &lt;code&gt;.bb-data/&lt;/code&gt; at the project root.&lt;/p&gt;

&lt;p&gt;The "embedded DB" here turns out to be PGlite (a WebAssembly build of PostgreSQL). Not only &lt;code&gt;Database&lt;/code&gt; but also &lt;code&gt;DistributedDatabase&lt;/code&gt;, which uses Aurora DSQL, falls back to PGlite locally. It works because DSQL is PostgreSQL-compatible, so a near-real Postgres comes up locally without an AWS account.&lt;/p&gt;

&lt;p&gt;When you want to check behavior against real cloud services, &lt;code&gt;npm run sandbox&lt;/code&gt; deploys to a fast, disposable environment using hot-swap to Lambda. The mocks are swapped for real AWS services (DynamoDB, Aurora, S3, Lambda, and so on), and once you are done you can tear it all down with &lt;code&gt;npm run sandbox:destroy&lt;/code&gt;. To ship to production, you run &lt;code&gt;npm run deploy&lt;/code&gt;. The same backend code runs in all three.&lt;/p&gt;

&lt;h3&gt;
  
  
  Direct CDK access = if Blocks isn't enough, write it yourself
&lt;/h3&gt;

&lt;p&gt;Every Blocks app is a CDK app. You can use arbitrary CDK constructs alongside Blocks, and you can embed Blocks into an existing CDK stack.&lt;/p&gt;

&lt;p&gt;When you want to add a resource Blocks doesn't provide (SNS, Step Functions, and the like) or set up a custom domain, you write &lt;code&gt;aws-blocks/index.cdk.ts&lt;/code&gt; and access CDK constructs directly. Normally the infrastructure is derived from the backend definition (&lt;code&gt;aws-blocks/index.ts&lt;/code&gt;), so you don't need to touch CDK directly. When you do need it, you just write CDK and can build as far as you like, beyond the edges of the framework. By design, it is structurally hard to get trapped and stuck inside the abstraction.&lt;/p&gt;

&lt;h3&gt;
  
  
  AGENTS.md bundled = agents write correct code from the start
&lt;/h3&gt;

&lt;p&gt;Blocks ships an agent-facing guide inside the npm package. Without adding a plugin, an AI coding agent is said to be steered toward writing correct code from the start.&lt;/p&gt;

&lt;p&gt;Its actual form turned out to be an &lt;code&gt;AGENTS.md&lt;/code&gt; placed in the project. Reading it, rather than carrying the full guide inline, it is a pointer to references. The detailed explanation, how to choose a Block, and how to use each Block live under &lt;code&gt;node_modules/@aws-blocks/blocks/&lt;/code&gt; in README.md, docs/index.md, and docs/.md, and it tells the agent to read those. Alongside, the Rules forbid anti-patterns: always use a Block for persistence (no local files, in-memory arrays, or local DBs), and don't assemble JSON-RPC payloads by hand.&lt;/p&gt;

&lt;p&gt;Rather than pouring everything into the agent's context, you have it read only the docs it needs when it needs them. This pointer approach is sound as a design for agent-facing docs, and it is worth noting that an official AWS framework ships with it.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it differs from Amplify and App Studio
&lt;/h2&gt;

&lt;p&gt;Let me line up what we have covered against Amplify Gen2 and App Studio.&lt;/p&gt;

&lt;p&gt;Amplify Gen2 is a development platform where you describe data models, business logic, and authn/authz in TypeScript and the appropriate cloud resources are auto-provisioned. It is built on CDK internally and offers categories like Data, Auth, Storage, and Functions out of the box. Per-developer cloud sandboxes, shared environments mapped one-to-one to Git branches, and the Amplify Console that bundles hosting and CI/CD are its hallmarks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.amplify.aws/react/how-amplify-works/concepts/" rel="noopener noreferrer"&gt;https://docs.amplify.aws/react/how-amplify-works/concepts/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;App Studio sits on the low-code side, letting non-developers and beginners with little coding experience design and build apps. Where Amplify Gen2 and Blocks target developers, App Studio aims at a fundamentally different audience. I have organized the product details in the write-up I mentioned at the top.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zenn.dev/ikenyal/articles/c2c3ccf9fdd0cf" rel="noopener noreferrer"&gt;https://zenn.dev/ikenyal/articles/c2c3ccf9fdd0cf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Lined up by aspect:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;App Studio&lt;/th&gt;
&lt;th&gt;Amplify Gen2&lt;/th&gt;
&lt;th&gt;AWS Blocks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Primary audience&lt;/td&gt;
&lt;td&gt;non-developers&lt;/td&gt;
&lt;td&gt;frontend developers&lt;/td&gt;
&lt;td&gt;developers who also write backends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;How infrastructure is handled&lt;/td&gt;
&lt;td&gt;fully hidden&lt;/td&gt;
&lt;td&gt;abstracted by category&lt;/td&gt;
&lt;td&gt;derived from code (IFC)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local development&lt;/td&gt;
&lt;td&gt;cloud-first&lt;/td&gt;
&lt;td&gt;per-developer cloud environment&lt;/td&gt;
&lt;td&gt;fully local, no AWS account&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hosting / CI-CD&lt;/td&gt;
&lt;td&gt;built in&lt;/td&gt;
&lt;td&gt;bundled in the Amplify Console&lt;/td&gt;
&lt;td&gt;Hosting is one Block, CI/CD is your own&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;How types flow&lt;/td&gt;
&lt;td&gt;types aren't a concern&lt;/td&gt;
&lt;td&gt;schema-driven Data types&lt;/td&gt;
&lt;td&gt;type-safe RPC, no generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Distance from CDK&lt;/td&gt;
&lt;td&gt;far&lt;/td&gt;
&lt;td&gt;used for extensions when needed&lt;/td&gt;
&lt;td&gt;a CDK app from the start, write CDK directly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI coding agents&lt;/td&gt;
&lt;td&gt;out of scope&lt;/td&gt;
&lt;td&gt;little explicit support&lt;/td&gt;
&lt;td&gt;bundled AGENTS.md as a premise&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A shared obsession with iteration speed shows up too. Amplify Gen2 advertises up to 8x faster iteration than Gen1, speeding up cloud-side reflection via hot-swap to per-developer sandboxes. Blocks, on the other hand, completes locally, so in many cases there is no round trip to the cloud at all. The direction of "faster" differs, but the aim of shortening the write-then-check loop is shared.&lt;/p&gt;

&lt;h2&gt;
  
  
  What each one is aiming for
&lt;/h2&gt;

&lt;p&gt;Rather than feature diffs, let me put into words what the three are trying to achieve.&lt;/p&gt;

&lt;p&gt;What App Studio aims for is delivering apps to people who don't write code. Its abstraction is the highest, and it minimizes the developer's involvement the most.&lt;/p&gt;

&lt;p&gt;What Amplify aims for is freeing frontend developers from infrastructure. Even though Gen2 was rebuilt on CDK, what developers face day to day are categories like Data, Auth, and Storage, and that managed mass covers up the infrastructure underneath. It takes care of hosting and CI/CD together, freeing developers from stitching individual AWS services by hand. The key is to hide.&lt;/p&gt;

&lt;p&gt;What Blocks aims for looks a little different. The entry point of making infrastructure tools unnecessary to learn is similar to Amplify, but its means leans toward making things transparent with code and types rather than hiding them. Infrastructure grows from the app's code, the same code runs both locally and in the cloud, and you can write CDK directly when needed. Rather than ease through concealment, it aims for reassurance through transparency and the absence of a ceiling.&lt;/p&gt;

&lt;p&gt;In short, Amplify reaches the shared goal of making frontend developers' lives easier by keeping infrastructure out of mind, while Blocks does it by letting you stay aware of infrastructure without requiring you to. They take two routes to the same place. The former thickens the wall of abstraction; the latter makes it thin and transparent.&lt;/p&gt;

&lt;p&gt;And Blocks carries one more premise that is distinctive of 2026. It takes for granted that AI agents write code, and the framework itself carries the correct way to write from the start. This is a design that lowers not only the cost of humans learning but the cost of agents making mistakes, and its starting point looks different from Amplify's design philosophy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where is Amplify headed
&lt;/h2&gt;

&lt;p&gt;What follows is interpretation, not fact. I write it on the premise that none of it is certain.&lt;/p&gt;

&lt;p&gt;The official docs explicitly call the relationship between Blocks and Amplify complementary. Amplify provides hosting, CI/CD, and a managed backend experience; Blocks focuses on type-safe infrastructure-from-code and local-first development.&lt;/p&gt;

&lt;p&gt;Still, the overlap is not small. Amplify Gen2 also defines backends code-first in TypeScript, on top of CDK. Blocks' IFC layer and Amplify Gen2's backend definition are both a "write your backend in TypeScript" experience standing on CDK, so they sit close in philosophy.&lt;/p&gt;

&lt;p&gt;The closeness shows on the implementation side too. Blocks' project-creation CLI auto-detects an existing Amplify Gen2 project and integrates with it, and there is even a dedicated &lt;code&gt;amplify&lt;/code&gt; template. Adding Blocks to an Amplify Gen2 backend is an entry path assumed from the start. The complementary relationship is not just a line in the docs; it is implemented as tool behavior.&lt;/p&gt;

&lt;p&gt;One possibility I read from this: Amplify shifts its center of gravity toward the hosting and managed-experience layer, while Blocks takes the composable-backend-parts and local-development layer. In fact, Hosting in Blocks is treated as one part bundling CloudFront, S3, and even WAF, and the integrated hosting and CI/CD experience remains a strength on Amplify's side.&lt;/p&gt;

&lt;p&gt;What matters is that this doesn't necessarily mean Amplify is shrinking. It reads more naturally as AWS deliberately keeping multiple entry points into full-stack development. App Studio for non-developers, Amplify for those who want a managed experience, Blocks for those who want to command infrastructure with code and types. Rather than converging on a single right answer, it looks like a strategy of preparing a different door for each developer's stance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;AWS Blocks turned out to be not a replacement for Amplify but one more entry point into full-stack development. Amplify, which makes things easy by hiding; Blocks, which makes things transparent and hands you the controls. Which you choose is also a statement of how you want to relate to infrastructure.&lt;/p&gt;

&lt;p&gt;AWS is not converging on a single answer. When the doors multiply, the question isn't which tool is better, but which kind of developer you want to be.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>typescript</category>
      <category>amplify</category>
      <category>cdk</category>
    </item>
    <item>
      <title>S3 annotations and the question of where object metadata should live</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Thu, 18 Jun 2026 23:59:16 +0000</pubDate>
      <link>https://dev.to/aws-builders/s3-annotations-and-the-question-of-where-object-metadata-should-live-44h3</link>
      <guid>https://dev.to/aws-builders/s3-annotations-and-the-question-of-where-object-metadata-should-live-44h3</guid>
      <description>&lt;p&gt;On June 17, 2026, AWS Summit New York ran a long line of announcements that filled in the infrastructure for running agents in production. The official blog has the full roundup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/aws/top-announcements-of-the-aws-summit-in-new-york-2026/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/aws/top-announcements-of-the-aws-summit-in-new-york-2026/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of them reads as plain from the headline alone: S3 annotations. You can attach up to 1 GB of information directly to an object, so the first impression is "tags just got bigger."&lt;/p&gt;

&lt;p&gt;Reading it as a capacity story misses the point. This changes a decision: where the information attached to an S3 object should live. For a long time, the answer was usually "outside the object." You kept it in an external database or a sidecar file and reconciled the two with a sync process. S3 annotations add another option to that answer: don't move it out. If you have kept data in S3 while managing its metadata off to the side, you know the cost of keeping the two in sync.&lt;/p&gt;

&lt;p&gt;The official announcement is here:&lt;br&gt;
&lt;a href="https://aws.amazon.com/blogs/aws/amazon-s3-annotations-attach-rich-queryable-context-directly-to-your-objects/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/aws/amazon-s3-annotations-attach-rich-queryable-context-directly-to-your-objects/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The difference is character, not capacity
&lt;/h2&gt;

&lt;p&gt;Line up annotations against the existing ways to describe an object, and what matters is the difference in character, not the numbers.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mechanism&lt;/th&gt;
&lt;th&gt;How much&lt;/th&gt;
&lt;th&gt;Mutable&lt;/th&gt;
&lt;th&gt;Character&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;System-defined metadata&lt;/td&gt;
&lt;td&gt;Fixed fields&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Intrinsic properties of the object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User-defined metadata&lt;/td&gt;
&lt;td&gt;2 KB total, set at upload&lt;/td&gt;
&lt;td&gt;Effectively no&lt;/td&gt;
&lt;td&gt;Small incidental notes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Object tags&lt;/td&gt;
&lt;td&gt;Up to 10&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Labels for access control and lifecycle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;annotations&lt;/td&gt;
&lt;td&gt;Up to 1,000, 1 GB total&lt;/td&gt;
&lt;td&gt;Yes, without rewriting&lt;/td&gt;
&lt;td&gt;Structured knowledge that grows over time&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The counts and sizes come from the official blog. The clear gap is in the bottom two rows. Tags are key-and-value labels, meant for access control and cost allocation. annotations carry structure in JSON or YAML, and you can rewrite them as often as you like without rewriting the object. They travel with the object on copy and replication, and they are removed when the object is deleted.&lt;/p&gt;

&lt;p&gt;A different character means a different job. Tags describe how to handle an object. annotations hold what the object is and what is known about it, the kind of knowledge that accumulates after the fact: an AI-generated summary, an inference confidence score, processing history. That information does not fit in 2 KB, and you want to update it as the data changes. Until now, meeting that requirement meant moving the context outside the object.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AWS says, and one step past it
&lt;/h2&gt;

&lt;p&gt;The official blog describes annotations as removing the need for a separate metadata system. It is true that the pattern of double-writing to DynamoDB for cross-object search, syncing with Lambda, and watching for drift can be retired for some use cases. The annotations you attach flow through S3 Metadata into Apache Iceberg tables and become queryable from Amazon Athena.&lt;/p&gt;

&lt;p&gt;But stopping at "you no longer need an external database" only repeats what AWS already said. The part worth pressing on is what else moves along with the location. When the context lived outside the object, who could touch it was decided directly by the access control on that external database. Once it sits on the object, you have to redesign who is allowed to change which context. Adding and reading annotations requires the IAM actions &lt;code&gt;s3:PutObjectAnnotation&lt;/code&gt; and &lt;code&gt;s3:GetObjectAnnotation&lt;/code&gt;. Coupling context tightly to data is the benefit. It also means tampering with the context turns directly into a misreading of the data.&lt;/p&gt;

&lt;p&gt;The other thing that moves is responsibility for structure. Key-and-value tags left almost no room for design, but being able to hold structure means you have to decide which key represents what and at what granularity to split things. If an agent is meant to read it, that decision drives search quality. Dump everything in, and you end up with annotations that are useless in a later cross-object query. The freedom of capacity arrives bundled with the responsibility of design.&lt;/p&gt;

&lt;h2&gt;
  
  
  How much of your sync layer can you actually retire
&lt;/h2&gt;

&lt;p&gt;"Then move everything to annotations" does not follow. There is a line on what you can move over.&lt;/p&gt;

&lt;p&gt;In a verification by Classmethod, the Annotation Table took about 25 minutes to become active even in a small environment. That is a third-party measurement, but it lines up with the spec: reflection into the table is asynchronous. What you attach is not reflected into cross-object search the instant you write it.&lt;br&gt;
&lt;a href="https://dev.classmethod.jp/articles/s3-annotations-crud-athena-search/" rel="noopener noreferrer"&gt;https://dev.classmethod.jp/articles/s3-annotations-crud-athena-search/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The plain conclusion is about fit with low-latency reads. Information you want to pull in milliseconds on every screen render, or that needs a secondary-index lookup, still fits the older DynamoDB design better. Context that gets updated later but does not need immediacy, such as compliance status, history, or summaries, leans toward annotations. What you retire is part of the sync layer, not all of it.&lt;/p&gt;

&lt;p&gt;Cost does not move in one direction either. You can let go of the sync layer that watches for drift, but storing and reading annotations is billed at the same rates as S3 Standard storage and requests, and the S3 Metadata and Annotation Tables behind cross-object search carry their own processing and storage charges. You weigh the cost you remove against the cost you add. The pricing page has the breakdown.&lt;br&gt;
&lt;a href="https://aws.amazon.com/s3/pricing/" rel="noopener noreferrer"&gt;https://aws.amazon.com/s3/pricing/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Region coverage differs too. Per the official blog, attaching an annotation works in nearly all Regions, while the Annotation Table used for cross-object search is limited to Regions where S3 Metadata is available. S3 Metadata coverage has been expanding in waves.&lt;br&gt;
&lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/amazon-s3-metadata-expands-22-regions/" rel="noopener noreferrer"&gt;https://aws.amazon.com/about-aws/whats-new/2025/11/amazon-s3-metadata-expands-22-regions/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Coverage and per-feature availability change, so the documentation is the reliable place to check the current state.&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/annotations-overview.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonS3/latest/userguide/annotations-overview.html&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Is an agent a precondition
&lt;/h2&gt;

&lt;p&gt;annotations are designed for agents, and the official explanation is written with that in mind. It is tempting to read "I don't run AI agents, so this isn't for me" and skip the rest, but hold off for a moment.&lt;/p&gt;

&lt;p&gt;Agents or not, if you operate sidecar files or an external metadata database today, the shift in location is where the benefit shows up. What annotations really are is an improvement to metadata management itself: structured context attached close to the object, queryable across objects. Natural-language search can be added later through the S3 Tables MCP server, so retiring the sync layer first is a fine way in. The agent is a possible first reader of the context you place, not a precondition.&lt;/p&gt;

&lt;h2&gt;
  
  
  From object-level to organization-level context
&lt;/h2&gt;

&lt;p&gt;Widen the view, and annotations look less like a standalone feature and more like part of a movement. At the same Summit, AWS previewed AWS Context, which maps the data relationships across an organization into a knowledge graph. Its availability is stated as forthcoming.&lt;br&gt;
&lt;a href="https://aws.amazon.com/blogs/machine-learning/context-intelligence-for-your-data-and-ai-agents-at-scale/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/machine-learning/context-intelligence-for-your-data-and-ai-agents-at-scale/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If S3 annotations are context at the object level, AWS Context is context at the organization level. Both are designed to surface in Apache Iceberg format in S3, so they read as continuous. The movement is from each team holding its own context for RAG toward a shared context layer with managed access for the whole organization. annotations cover the lowest layer of that, the part where context is attached close to the object.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to start
&lt;/h2&gt;

&lt;p&gt;The first step is an inventory of the context you currently hold in external databases or sidecars. Pull out the context described above, the kind that gets updated later but does not need immediacy, and make it a candidate for annotations. For schema, deciding a handful of keys you want an agent to read on a single bucket is enough to begin. As long as you hold the premise that responsibility for structure is now yours, you lower the odds of getting stuck in a later cross-object query.&lt;/p&gt;

&lt;p&gt;For a long time, keeping context outside the object was simply the premise. There is meaning in being able to question that premise at all. This is not a story about more capacity. It is a story about who is responsible for the context, and that answer has come back to the side of the object.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>s3</category>
      <category>ai</category>
      <category>metadata</category>
    </item>
    <item>
      <title>Your Claude automation starts metering today (June 15). A quick checklist to avoid surprise charges</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Sun, 14 Jun 2026 23:00:00 +0000</pubDate>
      <link>https://dev.to/aws-builders/your-claude-automation-starts-metering-today-june-15-a-quick-checklist-to-avoid-surprise-charges-3c3k</link>
      <guid>https://dev.to/aws-builders/your-claude-automation-starts-metering-today-june-15-a-quick-checklist-to-avoid-surprise-charges-3c3k</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Update (June 16, 2026):&lt;/strong&gt; After this article was published, Anthropic emailed eligible users to say it is not making this change, which had been set for June 15. For now, Agent SDK and &lt;code&gt;claude -p&lt;/code&gt; usage continues to work with your subscription exactly as before, and there is no credit to claim. Subscription usage limits are unchanged. Anthropic says it will give advance notice before any future change takes effect.&lt;/p&gt;

&lt;p&gt;So the "conversation vs automation" split described below is postponed for now. Whether the direction itself is withdrawn, or will return on a revised plan and timeline, is not clear from the current notice. The body below remains as an explanation of what was originally announced. Note that the official help page had not been updated at the time of writing and still shows the June 15 date.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A change to Claude's paid plans takes effect today, June 15, 2026. Your monthly fee is not going up. What changes is that usage which used to come out of a single pool splits into two: the part where a person is in a conversation, and the part where a program runs on its own. The details are in Anthropic's help center, and every number and scope in this post comes from that page.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Starting June 15, 2026, Claude Agent SDK and &lt;code&gt;claude -p&lt;/code&gt; usage no longer counts toward your Claude plan's usage limits. Your subscription usage limits stay the same and stay reserved for interactive use of Claude Code, Claude Cowork, and Claude.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://support.claude.com/en/articles/15036540-use-the-claude-agent-sdk-with-your-claude-plan" rel="noopener noreferrer"&gt;https://support.claude.com/en/articles/15036540-use-the-claude-agent-sdk-with-your-claude-plan&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The side that gets affected is the second one. If you have been running automation inside your subscription plan, part of that usage leaves the flat-rate pool today and starts metering. To be precise, what you spend beyond a newly granted monthly credit either bills at standard API rates or stops. If you only chat through the app or the terminal, nothing changes for you.&lt;/p&gt;

&lt;p&gt;The first thing to get right is which of your own usage is in scope. Get this wrong and you risk either stopping automation you wanted to keep running, or triggering charges you did not expect. This post gives you a checklist to avoid surprise charges as a same-day fix, then connects it to the larger question of how to design your automation from here.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is changing
&lt;/h2&gt;

&lt;p&gt;There is one line to draw: is a person typing and waiting for a response, or is a program calling Claude on their behalf? The former keeps using your subscription usage limits exactly as before. The latter is what the new monthly credit covers.&lt;/p&gt;

&lt;p&gt;Here is the official breakdown.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;How you use it&lt;/th&gt;
&lt;th&gt;What happens from June 15&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude conversations on web, desktop, or mobile apps&lt;/td&gt;
&lt;td&gt;Stays on subscription. No change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Interactive Claude Code in the terminal or IDE&lt;/td&gt;
&lt;td&gt;Stays on subscription. No change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Cowork&lt;/td&gt;
&lt;td&gt;Stays on subscription. No change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;claude -p&lt;/code&gt; (non-interactive mode)&lt;/td&gt;
&lt;td&gt;Moves to the monthly credit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Agent SDK (Python or TypeScript)&lt;/td&gt;
&lt;td&gt;Moves to the monthly credit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code GitHub Actions integration&lt;/td&gt;
&lt;td&gt;Moves to the monthly credit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Third-party apps that authenticate through the Agent SDK&lt;/td&gt;
&lt;td&gt;Moves to the monthly credit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you open the app and chat, or type into Claude Code in the terminal and wait for replies, June 15 looks the same as before. Those keep drawing from your subscription usage limits and never touch the new credit. If that is you, there is nothing to do today.&lt;/p&gt;

&lt;p&gt;The dividing line is whether you operate Claude by hand or have code operate it for you. Running &lt;code&gt;claude -p&lt;/code&gt; from a shell script or cron, having a homegrown app built on the Agent SDK, wiring Claude Code into a GitHub Actions CI step. If any of these sound familiar, the checklist below is for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it split into two
&lt;/h2&gt;

&lt;p&gt;Before the checklist, the background helps you make the call. Note that Anthropic has not stated the reason outright; what follows is my reading of what the announcement implies.&lt;/p&gt;

&lt;p&gt;The flat-rate subscription pool let programmatic use, which should run on metered API pricing, run far more cheaply than it otherwise would. Unlike a conversation, automation is not bound by human typing speed. A script or an agent fires requests back to back without pausing, so token consumption climbs even within the same monthly fee. The flat pool absorbed that non-stop consumption. Run the same work through metered API pricing instead, and the gap widens the longer it runs.&lt;/p&gt;

&lt;p&gt;In other words, the flat-rate plan was effectively subsidizing the cost of automation. Conversation has a natural ceiling because the human pauses; automation has no such brake. Keep both in the same pool, and the heaviest automation users get the biggest discount, while the gap between cost and price keeps growing.&lt;/p&gt;

&lt;p&gt;Read the split as ending that subsidy and it makes sense. The conversational part stays inside the flat rate; the part where code runs on its own moves toward metering that tracks real cost. Conversation and automation went into separate wallets because their cost structures were different to begin with. It is less a price hike than a separation of things that had been mixed together.&lt;/p&gt;

&lt;p&gt;This reading carries into the decisions later in the post. Automation is no longer an extra bundled into a flat rate; it has become something you design with cost in mind.&lt;/p&gt;

&lt;h2&gt;
  
  
  A checklist to avoid surprise charges
&lt;/h2&gt;

&lt;p&gt;To sort out whether you are in scope and to avoid surprise charges or unexpected stops, here is what to check today. Go through it in order and you will know which side you are on and what to set.&lt;/p&gt;

&lt;p&gt;First, check whether your day-to-day use is interactive or non-interactive. Chatting on the app or web, typing into Claude Code in the terminal or IDE and waiting for a reply, Claude Cowork. If that is all, you are out of scope and the rest is unnecessary. Most people stop here.&lt;/p&gt;

&lt;p&gt;Second, find every place that runs on its own. Are you calling &lt;code&gt;claude -p&lt;/code&gt; from cron or a shell script? Is there a homegrown script or app (Python or TypeScript) built on the Agent SDK? Is Claude Code wired into CI, GitHub Actions in particular? All of these are in scope. The ones running quietly in the background are the easiest to miss, so it is worth opening your job definitions, crontab, and repository workflows once to check.&lt;/p&gt;

&lt;p&gt;Third, check the authentication path of any third-party tools. If an editor extension or an external agent tool is linked to your Claude account and authenticates through the Agent SDK on your subscription, it is in scope. Even if you never wrote &lt;code&gt;claude -p&lt;/code&gt; yourself, a tool making non-interactive calls behind the scenes counts.&lt;/p&gt;

&lt;p&gt;Fourth, check whether those authenticate via subscription or an API key. Automation that already authenticates with a Claude Platform API key is out of scope for this change. Pay-as-you-go billing continues, unaffected by the monthly credit. This is the final fork for whether you are in scope.&lt;/p&gt;

&lt;p&gt;If you only use it interactively, there is nothing to do today. Only if you fall into automation, third-party tools, or the Agent SDK, and you are running on subscription authentication, do you move on to the next step.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you are affected, what to do
&lt;/h2&gt;

&lt;p&gt;What you do divides into two stages on a time axis: the same-day fix you finish today, and the deeper rework you take your time over. The first keeps automation from stopping, but it is a stopgap; the real solution is in the second.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick fix: claim the credit and decide how it stops
&lt;/h3&gt;

&lt;p&gt;Eligible plans get a monthly credit. Officially it is called the "Agent SDK credit," but it is not limited to people writing the SDK directly. The &lt;code&gt;claude -p&lt;/code&gt; calls in cron or CI, the GitHub Actions integration, and the third-party tools you checked above all draw from this same credit. Even if you never touch the SDK, if something runs Claude non-interactively, you are in scope. The credit equals your plan's monthly fee.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Monthly credit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max 5x&lt;/td&gt;
&lt;td&gt;$100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max 20x&lt;/td&gt;
&lt;td&gt;$200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team (Standard)&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team (Premium)&lt;/td&gt;
&lt;td&gt;$100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise (usage-based)&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise (seat-based Premium)&lt;/td&gt;
&lt;td&gt;$200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise (seat-based Standard)&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On seat-based Enterprise plans, Premium seats are eligible but Standard seats are not. You claim the credit once through your account (a one-time opt-in), and it refreshes automatically each billing cycle after that. From June 15, eligible users are said to receive an email with claim instructions.&lt;/p&gt;

&lt;p&gt;Here the path forks. If you plan to keep using automation, the first move is to not miss that email and claim the credit. If instead you would rather avoid being charged for automation, or you are fine letting it stop, there is no rush to claim. It starts with deciding whether you will keep running automation at all.&lt;/p&gt;

&lt;p&gt;Next, decide what happens once the credit runs out. Agent SDK usage draws from this credit first, and what happens beyond it depends on a setting. With "Extra Usage" enabled, the overage bills at standard API rates and automation keeps running. Without it, requests stop the moment the credit is exhausted and stay stopped until it refreshes.&lt;/p&gt;

&lt;p&gt;For a production workflow you cannot afford to stop, enabling extra usage to keep it running is the safer choice, though it leaves room for an unexpectedly large bill. For experimental automation that can stop without hurting, you might deliberately leave it off and let the credit act as a spending ceiling that is never crossed. You weigh "how bad is it if it stops" against "how scared am I of surprise charges."&lt;/p&gt;

&lt;p&gt;While you are at it, note the credit's constraints. It is granted per user and cannot be pooled or shared across a team. It resets monthly and unused credit does not roll over. If your team runs automation, the idea of "everyone drawing from one person's credit" does not work, and that is worth keeping in mind.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deeper rework: choose the authentication path as a design decision
&lt;/h3&gt;

&lt;p&gt;The quick fix was about getting by within the credit you are given. The deeper rework is reconsidering, from the ground up, which authentication path your automation rides on.&lt;/p&gt;

&lt;p&gt;Anthropic positions the monthly credit as "sized for individual experimentation and automation." Read the other way, shared production automation is not expected to fit within the credit. In fact, Anthropic points teams running production automation to a Claude Platform API key. API-key usage is out of scope for this change: pay-as-you-go continues and no monthly credit is granted.&lt;/p&gt;

&lt;p&gt;This is where the earlier "automation is no longer an extra" reading pays off. When you build automation, which path you put it on, subscription auth or API-key auth, has become a design decision that shapes both cost structure and behavior at the limit.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Auth path&lt;/th&gt;
&lt;th&gt;How cost reads&lt;/th&gt;
&lt;th&gt;When you hit the limit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Subscription auth (Agent SDK credit)&lt;/td&gt;
&lt;td&gt;Monthly credit equal to your plan fee, resets monthly&lt;/td&gt;
&lt;td&gt;Stops if extra usage is off, continues at standard API rates if on&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API-key auth (Claude Platform)&lt;/td&gt;
&lt;td&gt;Consumed pay-as-you-go from prepaid credit balance&lt;/td&gt;
&lt;td&gt;Stops when the balance runs out, can continue via auto-reload&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For light experiments or personal helper automation, the subscription credit covers it. The way it stops when the credit runs out works as a spending ceiling you never cross. For production automation that hurts when it stops, or workflows shared across a team, API-key auth is easier to handle: consumption shows up directly as billing so it reads cleanly, and you can manage the balance with auto-reload and usage alerts. Both stop when the balance runs out, but subscription auth is a flat pool tied to your plan fee, while the API key is a balance you fund yourself. To run something steadily in production, the API key, whose ceiling is not pinned to your plan fee, gives more operational freedom.&lt;/p&gt;

&lt;h2&gt;
  
  
  Objections worth raising
&lt;/h2&gt;

&lt;p&gt;A few objections come to mind against the picture above. Let me raise them and answer each.&lt;/p&gt;

&lt;h3&gt;
  
  
  Objection 1: Just put everything on an API key
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;So why not put everything on an API key and stop worrying about two wallets?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There is something to this. Unify on an API key and conversation and automation both run on the same pay-as-you-go billing, with no need to watch a subscription credit and an API balance separately. But for conversation, moving to an API key tends to cost more. The subscription is designed so that, within your usage limits, conversation runs flat-rate and effectively unlimited. Move that to metered API pricing and every single conversational turn that used to be covered by your monthly fee turns into per-token billing. Keep conversation on the subscription's flat rate and split only the heavy automation onto an API key. That is the realistic landing point.&lt;/p&gt;

&lt;h3&gt;
  
  
  Objection 2: My automation is light, so it fits
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;My automation is light, so it fits within the credit I am granted, right?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In many cases, yes. If you only run a script now and then, the granted credit is enough. The problem is estimating "it probably fits" without knowing your own consumption. If you keep an agent running constantly or call it often in CI, you can use up the credit faster than expected. Even when the judgment that it fits turns out right, it is safer to confirm the basis once. Take the run frequency you found in the checklist and reconcile it against actual consumption on the billing screen after the change takes effect.&lt;/p&gt;

&lt;h3&gt;
  
  
  Objection 3: It is a price hike in the end
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Whatever you call it, is this not a price hike in the end?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For users who ran heavy automation on the flat-rate pool, it is indeed a real increase in burden, because the part that was cheap moves closer to real cost. But this is closer to correcting a distortion than an unreasonable hike. Until now the flat-rate subscription subsidized the cost of automation, and the heavier the user, the larger the benefit. The split moves things toward how they arguably should be: conversation flat, automation at real cost. Conversation-centric users see no change in usage limits or feel; the ones affected are limited to the layer that had been benefiting most. It is not an across-the-board hike but the removal of a subsidy that had been riding along.&lt;/p&gt;

&lt;h2&gt;
  
  
  A good moment to revisit your automation design
&lt;/h2&gt;

&lt;p&gt;Once the checking and the fix are done, what remains is a design question. This change is a good moment to revisit how you design your automation.&lt;/p&gt;

&lt;p&gt;Even before this, there was a split in practice: API keys for automation built into production systems, subscription-authenticated Claude Code for scripts at hand. But as long as you ran on subscription auth, that choice never showed up in cost, because the flat pool absorbed it. From June 15, the choice of path feeds straight into cost and availability.&lt;/p&gt;

&lt;p&gt;The more you build and run your own agents, the bigger the impact. There are three axes to judge by.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does it hurt if it stops&lt;/li&gt;
&lt;li&gt;Do you need consumption to be readable&lt;/li&gt;
&lt;li&gt;Is it shared across a team&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Against these three, you sort which workflows stay on the subscription credit and which move to an API key. Light experiments that can stop use the credit as a spending ceiling; production or shared workflows you cannot stop go on an API key where consumption reads cleanly. The change is a good occasion to inventory your automation once.&lt;/p&gt;

&lt;p&gt;Put plainly, the era of treating automation as something that "just runs, loosely, inside a flat rate" ends here. Until now, running an agent left its cost dissolved into the flat rate, out of sight. From here, cost follows automation everywhere. Since it costs more the more it runs, you will weigh, from the design stage, whether a given automation earns its cost.&lt;/p&gt;

&lt;p&gt;The same-day fix is simple. Check whether you are affected, and if so, claim the credit and decide how it stops. That is all. What remains beyond it is the homework of how to watch the cost and value of your automation. The cost of automation, which had been hidden inside the flat rate, has come into the open. That, I think, is the real substance of this change.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>ai</category>
      <category>devops</category>
      <category>anthropic</category>
    </item>
    <item>
      <title>Datadog and AWS Shipped Ops Agents on the Same Day. What Are They Fighting Over?</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Fri, 12 Jun 2026 00:34:47 +0000</pubDate>
      <link>https://dev.to/aws-builders/datadog-and-aws-shipped-ops-agents-on-the-same-day-what-are-they-fighting-over-29p6</link>
      <guid>https://dev.to/aws-builders/datadog-and-aws-shipped-ops-agents-on-the-same-day-what-are-they-fighting-over-29p6</guid>
      <description>&lt;p&gt;On June 9, 2026 (US time), two big announcements landed on the same day.&lt;/p&gt;

&lt;p&gt;At the keynote of Datadog's annual event DASH 2026 in New York, the Bits AI family expanded significantly: Detection, Investigation, Remediation, Infrastructure, Code, Release, Testing, Data Analysis, Chat, Memories, and Evals. Counting by agent, that is more than ten, with over 100 new features announced together. The full picture is laid out in the keynote roundup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.datadoghq.com/blog/dash-2026-new-feature-roundup-keynote/" rel="noopener noreferrer"&gt;https://www.datadoghq.com/blog/dash-2026-new-feature-roundup-keynote/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The same day, AWS announced FinOps Agent as a public preview. It bundles four data sources, Cost Explorer, Cost Anomaly Detection, Cost Optimization Hub, and Compute Optimizer, and delivers automated cost-anomaly investigation, natural-language cost questions, periodic cost reports, and aggregated optimization opportunities straight into Slack and Jira. The details are in the AWS blog.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/aws-cloud-financial-management/aws-finops-agent-is-now-public-preview/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/aws-cloud-financial-management/aws-finops-agent-is-now-public-preview/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS DevOps Agent had already gone GA in March, handling incident response. With FinOps Agent now added, AWS-built standard agents line up across the main operational domains. That said, DevOps Agent also covers multicloud and on-premises environments, so its scope differs from FinOps Agent, which targets AWS cost data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/mt/announcing-general-availability-of-aws-devops-agent/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/mt/announcing-general-availability-of-aws-devops-agent/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the surface, this looks like two separate stories: Datadog the monitoring platform, AWS the cloud provider. But read the two announcements side by side, and you see both reaching for the same territory, Ops, through different entrances. Line up their features and most of them overlap, so a surface spec comparison won't show the difference. This article sorts out the same-day releases by the two companies' positioning, asks what these very similar agent lineups are actually fighting over, and goes as far as the axes for telling them apart and the predictions that follow.&lt;/p&gt;

&lt;p&gt;This is written for people working in SRE, FinOps, and Platform Engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the "Ops Agent" Category Is Taking Shape Now
&lt;/h2&gt;

&lt;p&gt;FinOps, DevOps, and SRE go by different names, but they share the same structural problem: the round-trip cost between the operator who notices an anomaly and the developer who actually fixes it.&lt;/p&gt;

&lt;p&gt;Concretely, this kind of flow happens every day. Take cost management. Cost Anomaly Detection raises an alert about a cost anomaly. Someone opens a dashboard and looks at the per-service breakdown. They isolate whether the spike came from EC2 or RDS, open CloudTrail, and cross-check who deployed what. They confirm with stakeholders on Slack, then open the repository to find the relevant IaC code. By this point you have moved back and forth across many screens and tools. Writing the fix and opening a PR comes after all that.&lt;/p&gt;

&lt;p&gt;Incident response follows the same flow. An alert fires, you open a dashboard, read logs, follow APM traces, pull stakeholders into Slack, and apply a fix. It is a chain of context switches, and it wears you down even more when the response runs into the middle of the night.&lt;/p&gt;

&lt;p&gt;What changed over the past year or so is that LLM accuracy reached a level where it can take on this interpretation-and-round-trip work. Reading monitoring data, interpreting the cause, and proposing a fix can now be run on your behalf. The desire to make monitoring and action seamless has been around for a while; the shift is simply that it can finally be shipped in the form of an autonomous agent. In fact, AWS started lining up its operational agents around re:Invent 2025 late last year, when it previewed DevOps Agent and Security Agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/12/devops-agent-preview-frontier-agent-operational-excellence/" rel="noopener noreferrer"&gt;https://aws.amazon.com/about-aws/whats-new/2025/12/devops-agent-preview-frontier-agent-operational-excellence/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That DASH 2026 and AWS FinOps Agent landed on the same day is probably not a coincidence. Two companies from different territories are evolving in the same direction at the same time. It was a day that showed exactly that.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Datadog and AWS Differ in Strategy
&lt;/h2&gt;

&lt;p&gt;Let's compare the two strategies along three axes: data source, coverage, and how each positions its agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Source
&lt;/h3&gt;

&lt;p&gt;AWS owns cost data and CloudTrail as first-party data. This is data that only AWS holds completely.&lt;/p&gt;

&lt;p&gt;AWS FinOps Agent's anomaly investigation is designed to leverage this directly. It starts from a Cost Anomaly Detection event and correlates the CloudTrail events around it. Because CloudTrail records who called which API and when, it can automate all the way to identifying the operation that drove the cost change and the IAM user or role behind it, the owner who bears responsibility. The output is not interpretation that reaches into business factors, but a Jira ticket or Slack notification delivered straight to the responsible engineer.&lt;/p&gt;

&lt;p&gt;This is a flow you cannot build unless you hold both the place where cost is generated and the place where resources are operated. Third-party FinOps SaaS can ingest the CUR (Cost and Usage Report), but matching it against CloudTrail at the same precision is not as easy as it is for AWS itself, given data lag and permissions. AWS can capture its own API calls in near real time, and there lies a structural advantage.&lt;/p&gt;

&lt;p&gt;Datadog owns the telemetry of APM, logs, traces, RUM, and profilers as first-party data. Bits Detection looks at real-time metrics and logs plus historical baselines, service topology, ownership information, and source-code context, and continuously judges whether the current state is healthy. Bits Investigation traces the root cause, and Bits Code takes on the fix, producing a production-ready PR on GitHub.&lt;/p&gt;

&lt;p&gt;Because the kind of data at the source differs, the anomalies each is good at differ too. AWS is strong at "money anomalies," Datadog at "performance and behavior anomalies." Both call themselves general-purpose Ops agents, but as long as the underlying data source differs, the center of gravity of what they are good at naturally differs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Coverage
&lt;/h3&gt;

&lt;p&gt;AWS stays within its own cloud. It does not cover multicloud or SaaS.&lt;/p&gt;

&lt;p&gt;This is a weakness and a strength at once. For an organization that runs entirely on AWS, everything from cost to execution logs sits inside the same cloud. Conversely, for an organization where Snowflake, Databricks, Google Cloud, Vercel, and others are mixed in, AWS FinOps Agent alone cannot see the organization-wide cloud cost anomalies.&lt;/p&gt;

&lt;p&gt;Datadog spans multicloud and SaaS, and on top of that pulls in log infrastructure placed on your own infrastructure. In addition to BYOC Logs, which lets you search logs while keeping them in your own cloud environment, DASH 2026 introduced Federated Logs, which lets you search across external data stores like Databricks and ClickHouse from Log Explorer using the same syntax. It is expanding in the direction of searching from Datadog no matter where logs are scattered.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.datadoghq.com/blog/introducing-datadog-byoc-logs/" rel="noopener noreferrer"&gt;https://www.datadoghq.com/blog/introducing-datadog-byoc-logs/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Which side you choose depends on how far your operation spans. For an organization that is 100% AWS, the former is enough; for one with multiple clouds or SaaS mixed in, you need the latter's reach. Even an organization using both can land on a split: AWS FinOps Agent for cost optimization, Datadog Bits for performance and availability.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Each Positions Its Agents
&lt;/h3&gt;

&lt;p&gt;This difference is what I most want to convey in this article.&lt;/p&gt;

&lt;p&gt;AWS is trying to treat AI agents as building blocks that run on its own platform. Bedrock AgentCore is that foundation, with components coming together: Memory, Identity, Observability, Gateway, Runtime, and built-in tools like Browser Tool and Code Interpreter. DevOps Agent and FinOps Agent are positioned as AWS-built standard agents that ride on top of this layer.&lt;/p&gt;

&lt;p&gt;In other words, AWS holds the execution environment for agents itself and rolls out concrete agents on top of it over time. AgentCore also supports third-party frameworks like LangGraph and CrewAI, so third-party agents are expected to ride the same foundation.&lt;/p&gt;

&lt;p&gt;Datadog is trying to look at AI agents from the outside, as objects to monitor. The Datadog Agent Console announced the same day is a mechanism that gives a cross-cutting view of the coding agents used across an organization (Claude Code, Cursor, GitHub Copilot, and so on) alongside Datadog's own Bits AI. It answers, in a single UI, questions like who is using which agent and how much, whether it is paying off compared to not using AI, and where cost is being wasted and what can be fixed.&lt;/p&gt;

&lt;p&gt;What matters here is that Datadog's own Bits AI is also among the monitored targets. Your own agents and competitors' agents can be compared side by side in the same UI. The stance is not "use our agent," but "we'll give you visibility into your organization's agent operations themselves."&lt;/p&gt;

&lt;p&gt;AI Guard sits in the same line of thinking. It protects AI agents from attacks like prompt injection, tool misuse, and data exfiltration, but the protection is not limited to Datadog-made agents. It discovers unprotected agents in the environment and includes externally built agents as targets. Datadog is positioning itself to monitor, protect, and govern agents.&lt;/p&gt;

&lt;p&gt;AWS, which tries to own agents, and Datadog, which tries to monitor them. This difference is the core of the two strategies.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Mapping of the Territory Agents Cover
&lt;/h2&gt;

&lt;p&gt;Here is how each company realizes the territory that AI agents cover, from operations through development. This is not a comparison of the clouds as a whole. The top half is the day-to-day work of operations and development (from detection, investigation, and remediation through code change, release, and testing); the bottom half is the foundation that supports it (memory, evaluation, monitoring, protection, and execution environment). In this table, a slash means multiple products are candidates, and a plus means several are combined.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Datadog&lt;/th&gt;
&lt;th&gt;AWS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Anomaly detection&lt;/td&gt;
&lt;td&gt;Bits Detection&lt;/td&gt;
&lt;td&gt;Cost Anomaly Detection / DevOps Agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anomaly investigation&lt;/td&gt;
&lt;td&gt;Bits Investigation&lt;/td&gt;
&lt;td&gt;FinOps Agent + CloudTrail / DevOps Agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto-remediation&lt;/td&gt;
&lt;td&gt;Bits Infrastructure (infra ops &amp;amp; remediation)&lt;/td&gt;
&lt;td&gt;DevOps Agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code change&lt;/td&gt;
&lt;td&gt;Bits Code&lt;/td&gt;
&lt;td&gt;Kiro (spec-driven dev environment)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Release validation&lt;/td&gt;
&lt;td&gt;Bits Release&lt;/td&gt;
&lt;td&gt;CodePipeline / CodeBuild&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test automation&lt;/td&gt;
&lt;td&gt;Bits Testing&lt;/td&gt;
&lt;td&gt;Kiro&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Natural-language questions&lt;/td&gt;
&lt;td&gt;Bits Chat / Bits Data Analysis&lt;/td&gt;
&lt;td&gt;FinOps Agent (cost only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knowledge / memory&lt;/td&gt;
&lt;td&gt;Bits Memories&lt;/td&gt;
&lt;td&gt;AgentCore Memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent evaluation&lt;/td&gt;
&lt;td&gt;Agent Evals&lt;/td&gt;
&lt;td&gt;Bedrock's evaluation features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent monitoring&lt;/td&gt;
&lt;td&gt;Datadog Agent Console (includes external)&lt;/td&gt;
&lt;td&gt;AgentCore Observability (internal)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent protection&lt;/td&gt;
&lt;td&gt;AI Guard&lt;/td&gt;
&lt;td&gt;AgentCore Identity / Bedrock Guardrails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent execution base&lt;/td&gt;
&lt;td&gt;The Datadog platform itself&lt;/td&gt;
&lt;td&gt;AgentCore Runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most of the roles overlap. The strengths show up in the areas that don't overlap, or where the strength clearly differs.&lt;/p&gt;

&lt;p&gt;Datadog has the reach to bring even external agents under its monitoring, plus a design, via Bits Code and Bits Release, that closes the loop from detection through fix and validation inside Datadog.&lt;/p&gt;

&lt;p&gt;AWS has root-cause identification for cost anomalies, sourced from data it owns in CloudTrail, plus AgentCore Runtime as a foundation for pulling third-party agents onto its own platform. The latter is a device for inviting third-party developers to "run your agents on AWS."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tipping Point Is "Who Changes the Code"
&lt;/h2&gt;

&lt;p&gt;Look at the mapping and one point of contention surfaces: how much of the flow from detection through investigation, code change, and release validation each company completes inside its own UI. That is where the turf war between the two lies.&lt;/p&gt;

&lt;p&gt;Datadog's Bits Code is the emblematic product on this front. According to the official description, Bits Code takes signals from Error Tracking, APM Recommendations, Continuous Profiler, Test Optimization, Code Security, and Bits Investigation, and takes over the chain of work an engineer usually does by hand: triaging the problem, locating the relevant code, writing the fix, running tests, and opening the PR. It handles all of it in one stroke.&lt;/p&gt;

&lt;p&gt;In other words, it is trying to pull into Datadog the division of labor that used to be "monitoring tools alert you, fixes are the job of the IDE and GitHub." The repository itself lives on GitHub or GitLab, but the entity generating the PR is Datadog.&lt;/p&gt;

&lt;p&gt;AWS DevOps Agent also goes from root-cause identification through fix proposal. But when the fix reaches into application code, the main stage for that code is not necessarily on AWS's own services, so the degree of closure here is weaker. AWS's main horse for coding is Kiro, which is good at spec-driven development, or third-party agents via Bedrock, and DevOps Agent ends up on the side that coordinates with them. AWS's strategy is a different path: hold AgentCore as the agent execution environment, and put everything, including third parties, onto its own platform.&lt;/p&gt;

&lt;p&gt;This is where the Datadog Agent Console comes in. By bringing the Claude Code, Cursor, and GitHub Copilot that an organization uses under monitoring, even the activity of coding agents running outside Datadog comes into view. Because Bits Code itself is also a monitored target, in-house and third-party can be compared side by side in the same UI.&lt;/p&gt;

&lt;p&gt;So Datadog is taking a position where, even without owning the coding agent itself, it can hold the initiative by taking the stance of monitoring it. AWS, on the other hand, takes the stance that if everything is completed on top of AgentCore, monitoring is handled in-house too.&lt;/p&gt;

&lt;p&gt;Which strategy works depends on where engineers spend their time. A team that stays long in the AWS console leans AWS; a team whose main stage is the Datadog UI and Slack leans Datadog. For organizations with many resources that can only be touched directly via the AWS console (IAM, VPC, Cost Management), AWS's advantage tends to hold. For organizations whose operations revolve around Slack notifications and PR reviews, the range Datadog can cover keeps extending.&lt;/p&gt;

&lt;h2&gt;
  
  
  Predictions from the Facts
&lt;/h2&gt;

&lt;p&gt;From the facts laid out so far, here are a few predictions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prediction 1: AgentCore Repeats Bedrock's Strategy for Agents
&lt;/h3&gt;

&lt;p&gt;AWS's strategy looks like a replay of its Bedrock strategy. Bedrock grew into a foundation that hosts models from Anthropic, OpenAI, Meta, and others on AWS, places them on the same footing as AWS's own Nova models, and puts everything on the AWS bill. Rather than betting on which model wins, it secures a share of the AI market by holding the execution environment and billing where models run.&lt;/p&gt;

&lt;p&gt;AgentCore Runtime is trying to reproduce this same structure in the agent market. Rather than building a large number of agents in-house, it is a device for steering third-party agent developers to "run it on AWS." DevOps Agent and FinOps Agent are positioned as reference implementations that run on top of that foundation. Components like AgentCore Memory, Identity, Observability, and Gateway look like a setup where AWS takes on the common parts that agent developers find tedious to build themselves.&lt;/p&gt;

&lt;p&gt;Whether this strategy works hinges on whether agent developers accept AgentCore's constraints. Just as Bedrock succeeded in pulling in model providers, if AgentCore succeeds in pulling in agent providers, AWS can keep monitoring, execution, and billing all inside its own house.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prediction 2: Bits Code and Bits Release Together Signal Datadog's Closure Strategy
&lt;/h3&gt;

&lt;p&gt;At DASH 2026, Datadog lined up Bits Code, which handles code change, and Bits Release, which handles release validation, in the same family. Bits Release is an agent that analyzes the intended impact of a code change, builds a validation plan, runs checks in staging, and watches the rollout. The two lining up together is itself a sign of strategic intent.&lt;/p&gt;

&lt;p&gt;To complete the loop from detection through investigation, code change, and release validation inside your own walls, holding the code change (the fix PR) is the starting point. Once you hold the code change, release validation becomes a natural extension: "Datadog watches over the PR Datadog produced at release time." Conversely, if the code change is taken by another company's IDE-side agent or GitHub Copilot, holding release validation alone in-house does not make a closed loop.&lt;/p&gt;

&lt;p&gt;Hold the loop's entry point with Bits Code, and pull its exit into your own walls with Bits Release. With these two lined up, a path to completing detection through fix and release inside Datadog's UI has come into view. Some of these features are still in preview, but the odds are high that Datadog finishes building the whole thing end to end.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prediction 3: Integration into Chat and Tickets Is Part of a Big Shift to "Push UX"
&lt;/h3&gt;

&lt;p&gt;Both companies emphasize chat and ticket-management tools as the output destination for their agents. AWS FinOps Agent lets you specify Slack or Jira as the output destination, and Datadog Bits also delivers investigation results and notifications to tools of the same kind.&lt;/p&gt;

&lt;p&gt;This shows a big shift from a "go look at the dashboard" UX to a "delivered where the engineer already is" UX. Until now, both monitoring tools and FinOps tools were built on the premise that the user goes to look at the dashboard. The alert arrives by email, and you check the details on the dashboard.&lt;/p&gt;

&lt;p&gt;In the age of agents, this flips. When a problem occurs, the interpretation and recommended action arrive directly in chat. You open the dashboard only when you want to dig deeper, or when you can't accept the agent's judgment. The dashboard drops to the position of a reference that quietly backs up the agent's judgment.&lt;/p&gt;

&lt;p&gt;This change is reaching the billing model too. On top of the conventional per-view or per-seat billing, Datadog has already adopted AI credits tied to the agent's workload, and consumption-based billing per investigation. Not how much you look at the dashboard, but how much you put the agent to work. The axis of monitoring billing is starting to move that way.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prediction 4: Coding-Agent Operations Move from "Individual Tool" to "Organizational Tool"
&lt;/h3&gt;

&lt;p&gt;The very fact that the Datadog Agent Console launched shows that coding-agent operations are taking shape as a genre. When Claude Code, Cursor, and GitHub Copilot are adopted ad hoc, it is easy to end up in a state where no one can see who is using how much.&lt;/p&gt;

&lt;p&gt;Each tool, too, is moving to bundle organizational usage, like GitHub Copilot's enterprise management features or Cursor's team features. But an individual tool's management screen is basically closed within that product. The Datadog Agent Console is trying to put a cross-cutting management layer on top of that. It is a different direction from, but close in aim to, GitHub Copilot starting to pull other companies' coding agents onto its own platform.&lt;/p&gt;

&lt;p&gt;The category of an "organizational coding-agent operations dashboard" is already taking shape. What isn't visible yet is the next step: industry-standard tags for cost allocation, and per-agent productivity metrics (PR acceptance rate, error rate, cost per accepted PR, and so on).&lt;/p&gt;

&lt;h3&gt;
  
  
  Prediction 5: The Initiative War Develops into a Three-Way Contest
&lt;/h3&gt;

&lt;p&gt;So far I've written in terms of the two parties, Datadog and AWS, but the real initiative war looks like a three-way contest. GitHub and GitLab, which hold the home of source code; AWS, Google Cloud, and Azure, which hold the execution platform; Datadog, Grafana Labs, and New Relic, which hold the monitoring platform. Each is after the initiative in the age of agents.&lt;/p&gt;

&lt;p&gt;GitHub is moving in the direction of completing the loop from code change through release to monitoring inside GitHub, with the combination of Copilot and Actions. GitHub Advanced Security includes Copilot Autofix, which is already built out to the point of opening a PR with a proposed fix for a vulnerability detected by CodeQL.&lt;/p&gt;

&lt;p&gt;The repository side, the execution-base side, and the monitoring side are each reaching, from the territory they own, for the loop of detection, fix, and release. Rather than any one of the three winning outright, the using organization picks the lead based on its own scale, industry, and existing stack. The winner splits by organization, that kind of outcome looks likely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prediction 6: An Organizational Shift in FinOps
&lt;/h3&gt;

&lt;p&gt;What AWS FinOps Agent changes most may be not the technology but the shape of the organization.&lt;/p&gt;

&lt;p&gt;FinOps has so far mostly run on a central FinOps team. A group of specialists builds cost reports and talks with the development teams in a monthly review. In recent years, the central team has been moving toward setting standards and guardrails while delegating responsibility to each team, but the carriers of that work were still people. AWS FinOps Agent is designed so individual engineers can ask about cost in natural language, and so that when an anomaly occurs, a Jira ticket arrives directly to the engineer.&lt;/p&gt;

&lt;p&gt;For an organization that wants to lean toward self-service, this is a device that advances that shift a step, because an agent can take on the decentralization that people used to carry. If you take this path, the FinOps team's job moves from teaching everyone toward mapping accounts to owners, setting tag conventions, and designing the context handed to the agent. The FinOps specialist doesn't disappear; the substance of the work changes toward shaping the foundation. The view is that such an option has become realistic.&lt;/p&gt;

&lt;p&gt;A similar structure shows up in other areas. In SRE, there is the choice between watching monitoring directly and shaping the monitoring system. In Platform Engineering, between each team handling its own infrastructure and a team laying down a standard path with guardrails so they can. In security, between a specialist team reviewing one item at a time and embedding policy into the platform's templates. In each, there is a choice: does the specialist team handle the field directly, or move to the side that shapes a system the field can run on its own.&lt;/p&gt;

&lt;p&gt;Neither is the right answer. There are organizations where full centralization fits, and organizations that delegate heavily to the field. Many settle in the middle, where a center provides standards and guardrails and the field moves on its own within that range. AI agents have arrived as an option that can realize the "field moves on its own" side at a lower cost than human effort. The self-service shift in FinOps is one example. Adopting AI agents widens not only the how of monitoring and operation, but the very set of options for where an organization places the center of gravity of its roles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anticipated Objections
&lt;/h2&gt;

&lt;p&gt;A few objections come to mind against the picture so far. Let me raise them and answer each.&lt;/p&gt;

&lt;h3&gt;
  
  
  Objection 1: Datadog's Multicloud Advantage
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Datadog can see multicloud and SaaS too, so won't it win on coverage in the end?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The breadth of multicloud coverage is indeed Datadog's strength. But AWS has an area, cost data and CloudTrail, where only AWS holds the first-party data. The advantage that AWS itself can trace the basis for a cost change at the highest precision remains. Even a multicloud organization can land on a split where it leans only its AWS cost governance onto AWS FinOps Agent. Unifying monitoring versus precision of the data source: these two hard-to-reconcile axes are what's at stake here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Objection 2: The Disadvantage of Staying Inside Its Own Cloud
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;AWS closes itself inside its own cloud. Doesn't the narrow coverage put it at a disadvantage?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Narrow coverage is indeed a disadvantage. But developers who use AWS as their main platform already spend long hours in the AWS console. Being able to slip into the screen engineers already touch is a strong weapon for lowering the cost of behavior change. Not having to move people to a new UI works strongly in the field of tool adoption.&lt;/p&gt;

&lt;h3&gt;
  
  
  Objection 3: The Both-at-Once Option
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Rather than picking one, why not just adopt both?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In fact, this is a realistic option. In coding, the practice of using multiple agents in parallel has already spread: refining the design with Claude Code, handing implementation to Codex, and having one review the other's output. Use agents with different strengths, and have one verify the other's results. For Ops agents too, the same worldview holds up well.&lt;/p&gt;

&lt;p&gt;But it holds up because humans have the judgment axis of "which to use for what, and which output to trust." Ops agents overlap heavily in function; anomaly investigation, auto-remediation, and natural-language questions can all be done by both. Put both in without a judgment axis, and all that's left is the cost of carrying the same experience in two places, with engineers wondering which one to ask. It is the same structure as the monitoring-tool sprawl of the past. Whether you make the most of using both comes down to the design of how you divide them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Objection 4: The Possibility That Datadog Sees Everything Anyway
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;If agents on AgentCore are visible from Datadog too, won't Datadog hold it all in the end?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is a future well worth considering. AgentCore Observability is OpenTelemetry-compatible and can already send telemetry to external monitoring tools, including Datadog. Even if AWS holds the execution base and billing, a split where the main stage of monitoring stays on the Datadog side is possible. Holding execution but opening monitoring to the outside: that gradient looks like the long-term fault line for initiative.&lt;/p&gt;

&lt;h3&gt;
  
  
  Objection 5: GitHub Entering the Field
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Won't GitHub, which holds the repository, build the same loop first?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This connects directly to the three-way contest from Prediction 5. GitHub is actually moving in this direction, building a code-origin loop with the combination of Copilot Autofix and Actions. Datadog's Bits Code starts from monitoring, GitHub Copilot Autofix from code, AWS DevOps Agent from the execution base. The same loop, with each reaching from a different entrance, is how it can be organized.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Start
&lt;/h2&gt;

&lt;p&gt;The axes for telling them apart are now in place: data source, coverage, and how each positions its agents. With those three, you can read each agent's character. So let's move to what to actually do first. I'll organize this assuming a common setup: AWS as the execution base, Datadog for monitoring, GitHub for code management.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Put into Words What Your Operation Runs "From"
&lt;/h3&gt;

&lt;p&gt;The first thing to do is not tool selection or a proof of concept, but putting the current state into words. When an incident or cost anomaly occurs, where do you look first, where do you identify the cause, and where do you apply the fix? Write out this round-trip path once, and you start to see which vendor your organization is most compatible with.&lt;/p&gt;

&lt;p&gt;The judgment axis can be organized like this.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Central issue&lt;/th&gt;
&lt;th&gt;Lead agent&lt;/th&gt;
&lt;th&gt;Basis for the call&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost optimization / cost-anomaly investigation&lt;/td&gt;
&lt;td&gt;AWS FinOps Agent&lt;/td&gt;
&lt;td&gt;Matching cost data against CloudTrail at this precision is hard for third parties to produce&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure failures / resource anomalies inside AWS&lt;/td&gt;
&lt;td&gt;AWS DevOps Agent&lt;/td&gt;
&lt;td&gt;If it stays inside AWS, even execution logs are reachable via its own APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multicloud or cross-SaaS performance and availability&lt;/td&gt;
&lt;td&gt;Datadog Bits&lt;/td&gt;
&lt;td&gt;The amount of telemetry owned and the breadth of coverage pay off&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visibility into coding-agent operations&lt;/td&gt;
&lt;td&gt;Datadog Agent Console&lt;/td&gt;
&lt;td&gt;Seeing external agents across one view is currently nearly unique&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code-origin security fixes&lt;/td&gt;
&lt;td&gt;GitHub Copilot Autofix&lt;/td&gt;
&lt;td&gt;A fix flow tightly coupled to the repository is easy to run&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In many organizations, several of these apply at once. If you use AWS as the execution base, Datadog for monitoring, and GitHub for code management, you probably match most rows of the table. That's exactly why, rather than putting everything in at once, you need to prioritize by which path has the highest round-trip cost right now.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Agentify Just the One Highest-Round-Trip Path First
&lt;/h3&gt;

&lt;p&gt;Try to agentify everything at once and you end up just trying a bunch of agents at once with no settled evaluation axis. The realistic move is to narrow to the one path from Step 1 where the human round-trip happens most.&lt;/p&gt;

&lt;p&gt;For example, an organization spending serious hours each month on cost-anomaly investigation could start by trying AWS FinOps Agent's public preview, configured with Slack notifications and a threshold filter (only investigate anomalies above a certain amount). If the initial investigation for incident response has become a late-night burden, pick either AWS DevOps Agent or Datadog Bits Investigation, choosing based on whether your incidents stay inside AWS or reach external SaaS. Narrow to one path, and you can concretely evaluate what got faster compared to doing it by hand, and what you can't yet trust.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Sort Out Context and Tag Conventions During the Trial
&lt;/h3&gt;

&lt;p&gt;An agent's accuracy changes greatly with the context you hand it. AWS FinOps Agent is designed to take context files like account-to-owner mappings, team definitions, tag conventions, and review cycles. Whether it can resolve "what is this team's cost" to the right set of accounts hinges on this context work.&lt;/p&gt;

&lt;p&gt;What matters here is that sorting out context and tag conventions is an investment that won't go to waste no matter which vendor you choose. The account-to-owner table, tag conventions for cost allocation, and service-to-team mappings are included. These work as the prerequisite information handed to the agent, whether for AWS FinOps Agent or for Datadog. Sort it out while you're still in the trial phase, and it carries over as is, whether you move to full adoption or switch to a different vendor. Sorting out the metadata at your feet before the agent itself is a shortcut that looks like a detour.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Start Visibility into Coding-Agent Operations Early, as a Separate Track
&lt;/h3&gt;

&lt;p&gt;This is a slightly different topic from Ops or FinOps, but for running an organization's AI use in a healthy way, it's worth moving on in parallel.&lt;/p&gt;

&lt;p&gt;If you use Claude Code, Cursor, and GitHub Copilot across the organization: who uses how much, how much it costs, and how many PRs get accepted. Aren't there many organizations struggling to get these numbers? Use that started as an individual trial eventually spreads to teams and the whole organization. So as not to scramble at that stage, you want to anticipate team and enterprise use from the start. Now that a visibility layer for agent operations like the Datadog Agent Console has appeared, even if adoption is deferred, it's safe to at least start the discussion early on how to make your organization's coding-agent use visible.&lt;/p&gt;

&lt;p&gt;The metrics worth tracking are around the number of users and frequency of use, spend per agent, PR acceptance rate, and rework rate of fixes. With those visible, you can judge by the numbers which agent to weight investment toward, and where waste is occurring.&lt;/p&gt;

&lt;h3&gt;
  
  
  Because It's a Transitional Period, Build Switchability into the Design
&lt;/h3&gt;

&lt;p&gt;What's common across these steps is the stance of not going all-in yet. Most Ops agents are still in preview, and their features and billing models will change.&lt;/p&gt;

&lt;p&gt;That's exactly why you should avoid building deeply dependent on a specific vendor's conventions, and instead thicken assets that work no matter where you land, like the context and tag conventions from Step 3. Put the data source and round-trip path into words, sort out the metadata, and validate one path at a time. Proceed in that order, and whether you move from trial to full adoption, or the initiative landscape shifts, you can move without panic.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Happening Now
&lt;/h2&gt;

&lt;p&gt;That the two companies' announcements landed on the same day is a signal that the Ops-agent category has entered a phase where multiple vendors reach for the same territory at the same time. Datadog, strong in monitoring, and AWS, with the cloud base, are each reaching for the initiative from the source of their own strength. The structure where the one who holds the most data also holds the action that follows is a movement that applies to AI agents generally.&lt;/p&gt;

&lt;p&gt;AWS's strategy is platform-type: hold the lower layer of the agent execution environment, line up the agents that ride on it with its own products over time, and invite outside developers onto the same foundation. Datadog's strategy is meta-platform-type: hold the outer layer of the monitored target, and make both its own and others' products visible in the same UI. And as the third party in the three-way contest, GitHub and GitLab, which hold the home of source code, are entering the same territory.&lt;/p&gt;

&lt;p&gt;Which strategy comes out ahead changes with what an organization runs its operations from, so there is no single answer. But the structure of this competition itself is a microcosm of the initiative war in the age of agents. Each reaches, from its own territory, for what lies beyond. The one who owns the infrastructure reaches for the execution environment of the software that runs on it. The one who monitors reaches for the action beyond the expanding monitored target. The one who keeps the code reaches for the loop of code generation and release.&lt;/p&gt;

&lt;p&gt;What's being fought over is the initiative of where, in which UI, to complete the loop from detection through fix to release. The same-day release of June 9, 2026 was the day that fight came clearly into the open.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>datadog</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>Why Coding Stays in Human-AI Collaboration: A Paradox in Stanford's 51 Deployments</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Sat, 06 Jun 2026 04:09:33 +0000</pubDate>
      <link>https://dev.to/aws-builders/why-coding-stays-in-human-ai-collaboration-a-paradox-in-stanfords-51-deployments-1kpi</link>
      <guid>https://dev.to/aws-builders/why-coding-stays-in-human-ai-collaboration-a-paradox-in-stanfords-51-deployments-1kpi</guid>
      <description>&lt;p&gt;"We rolled out AI and saw no results" and "AI made our development dramatically faster" are being said in the same year, often inside the same company. Where does that gap come from?&lt;/p&gt;

&lt;p&gt;Stanford Digital Economy Lab's &lt;a href="https://digitaleconomy.stanford.edu/publication/enterprise-ai-playbook" rel="noopener noreferrer"&gt;The Enterprise AI Playbook: Lessons from 51 Successful Deployments&lt;/a&gt; (April 2026) goes after that question with real data. It analyzes 51 production deployments across 41 organizations and 9 industries, drawing on structured interviews and internal documents to separate what made deployments succeed from what made them fail.&lt;/p&gt;

&lt;p&gt;Most of the coverage so far reads the report from a management angle: AI adoption as an organizational-change problem, the importance of process redesign and executive commitment. That framing is accurate. But the report also spans customer support, software engineering, marketing, and more, and there is plenty in there about software engineering that the management-focused takes barely touch.&lt;/p&gt;

&lt;p&gt;Read it with an engineer's eye and one paradox jumps out. While customer support and IT operations move toward autonomous AI, &lt;strong&gt;coding alone stays in "human-AI collaboration."&lt;/strong&gt; That runs against the prevailing mood that "AI coding is the frontier."&lt;/p&gt;

&lt;p&gt;This post starts from that paradox. First I'll walk through the report's method and key findings, then analyze the structure that keeps coding in collaboration, and finally re-read the 51 cases from three vantage points: the individual engineer, the engineering lead, and whoever owns org-level development. I'll stay close to the report's findings and then push past them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the 51 cases actually are
&lt;/h2&gt;

&lt;p&gt;A quick look at the study first, so the interpretation later lands properly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Authors and background
&lt;/h3&gt;

&lt;p&gt;The authors are Elisa Pereira, Alvin Wang Graylin, and Erik Brynjolfsson. Brynjolfsson is one of the most-cited researchers in the economics of information, known for early work measuring the productivity effects of IT investment. The "Productivity J-Curve" that Brynjolfsson and colleagues laid out in 2021 is one of the foundations of this study.&lt;/p&gt;

&lt;p&gt;The J-Curve goes like this. A general-purpose technology like AI doesn't raise productivity just by being deployed. It needs complementary investment in intangibles: process redesign, training, reorganization. During that investment, productivity actually dips. Only once the investment pays off does productivity jump. The curve dips into a trough before springing up, hence the "J." The report's recurring message that organization matters more than technology rests on this premise.&lt;/p&gt;

&lt;h3&gt;
  
  
  Method
&lt;/h3&gt;

&lt;p&gt;The study only looks at deployments that moved past the pilot stage into production and produced measurable business value. The selection criteria were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stable operation, integrated into real workflows&lt;/li&gt;
&lt;li&gt;Used in decision-making by multiple teams for at least 3 months&lt;/li&gt;
&lt;li&gt;Clear outcomes in productivity, revenue, or customer satisfaction&lt;/li&gt;
&lt;li&gt;Replicable to other teams or regions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interviews ran from August 2025 to February 2026, with at least one 60-minute structured interview per company, supplemented by internal metrics, project plans, and financial documents. The sample skews toward manufacturing, financial services, and technology.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the study is telling us
&lt;/h3&gt;

&lt;p&gt;The conclusion is simple. Using the same technology for the same purpose, outcomes varied widely by organization. What made the difference was not the AI model. It was how prepared the organization was, what processes it had, how its leaders engaged, and whether it had a culture that tolerated failure.&lt;/p&gt;

&lt;p&gt;The findings most relevant to an engineering organization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;77% of the hardest challenges were intangible costs: change management, data quality, process redesign. The technology itself was consistently rated "the easiest part."&lt;/li&gt;
&lt;li&gt;61% of successful projects had a failed AI project before the one that worked. Those sunk costs never show up in the success case's ROI.&lt;/li&gt;
&lt;li&gt;For similar use cases, one company took weeks and another took years. The difference was not technology but executive engagement, existing infrastructure, and user willingness.&lt;/li&gt;
&lt;li&gt;The "escalation model," where AI autonomously handles 80%+ and humans review only exceptions, had a median productivity gain of 71%, well above the 30% of approval-based models. (The report notes this gap may also partly reflect differences in task characteristics.)&lt;/li&gt;
&lt;li&gt;Agentic AI implementations were still only 20% of cases, but their median productivity gain was 71%, above the 40% of high-automation approaches.&lt;/li&gt;
&lt;li&gt;In 42% of cases, model choice was fully interchangeable. The durable advantage sat in the orchestration layer, not the foundation model.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The selection bias to keep in mind
&lt;/h3&gt;

&lt;p&gt;Worth stressing: this is a study of successful deployments only. The report is explicit about the selection bias. Companies were asked about past failures and abandoned pilots too, but what ends up analyzed are the cases that created value.&lt;/p&gt;

&lt;p&gt;So this study shows "what success looks like and what it takes to get there," not "how common success is." The report cites MIT's NANDA initiative study from 2025, "The GenAI Divide: State of AI in Business 2025" (which reported that 95% of generative-AI pilots produced no measurable financial impact), and positions itself as the inverse: a deep look at the side that succeeded. Read it with that asymmetry in mind.&lt;/p&gt;

&lt;h2&gt;
  
  
  The coding paradox
&lt;/h2&gt;

&lt;p&gt;Here's the core. Chapter 3 of the report has a table organizing human-in-the-loop (HITL) involvement by business function. Read it as an engineer and the table feels off.&lt;/p&gt;

&lt;h3&gt;
  
  
  The three HITL levels
&lt;/h3&gt;

&lt;p&gt;The report splits HITL into three levels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Escalation&lt;/strong&gt;: AI autonomously handles 80%+; humans review only exceptions (20% or less sampled)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approval&lt;/strong&gt;: AI does the work; a human approves each output before it executes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collaboration&lt;/strong&gt;: humans and AI both work hands-on, task by task&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Autonomy is highest at escalation and lowest at collaboration. By function:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;HITL level&lt;/th&gt;
&lt;th&gt;Median productivity gain&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;IT operations&lt;/td&gt;
&lt;td&gt;Escalation&lt;/td&gt;
&lt;td&gt;90%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customer support&lt;/td&gt;
&lt;td&gt;Escalation&lt;/td&gt;
&lt;td&gt;71%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claims processing&lt;/td&gt;
&lt;td&gt;Escalation&lt;/td&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Field service&lt;/td&gt;
&lt;td&gt;Approval&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Clinical documentation&lt;/td&gt;
&lt;td&gt;Approval&lt;/td&gt;
&lt;td&gt;66%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Coding&lt;/td&gt;
&lt;td&gt;Collaboration&lt;/td&gt;
&lt;td&gt;54%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;(from Chapter 3, "How much human oversight is optimal?")&lt;/p&gt;

&lt;p&gt;Coding is the only function in the collaboration tier. Clinical documentation sits in approval because medical records are legal documents a physician has to sign off on, one by one. Claims processing and customer support can move to escalation because they're high-volume, have clear success criteria, and tolerate recoverable mistakes.&lt;/p&gt;

&lt;p&gt;So why does coding stay in collaboration? No regulation pins AI down here. And yet humans and AI keep working task by task.&lt;/p&gt;

&lt;h3&gt;
  
  
  The role shifted from "writing" to "reviewing"
&lt;/h3&gt;

&lt;p&gt;The report describes the change on the coding floor like this: rather than completing a whole task themselves, engineers increasingly review AI-generated changes, make small adjustments, and merge the PR. At one Latin American fintech, AI agents migrated millions of lines of legacy code in a system serving 100M+ customers, compressing work originally estimated at 18 months and 1,000+ people into a few weeks. At an insurer, a legacy rebuild scoped at 5,000 hours, 7 people, and a 2027 finish was done in 600 hours with 3 people.&lt;/p&gt;

&lt;p&gt;So coding isn't "not getting faster." The role moved from writing to reviewing, and productivity is up 54%. It just hasn't reached the full autonomy other functions have. There's a structural reason.&lt;/p&gt;

&lt;h3&gt;
  
  
  Checking coding against the 4 conditions for agentic success
&lt;/h3&gt;

&lt;p&gt;The report lists four conditions under which agentic AI delivers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-volume, repetitive tasks&lt;/li&gt;
&lt;li&gt;Clear success criteria&lt;/li&gt;
&lt;li&gt;Recoverable errors&lt;/li&gt;
&lt;li&gt;Access to data across multiple systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hold coding up against these four and the reason it stays in collaboration comes into view.&lt;/p&gt;

&lt;p&gt;Procurement and alert triage cleanly satisfy all four. High volume, a clear right/wrong, recoverable mistakes. So they move to full autonomy.&lt;/p&gt;

&lt;p&gt;Coding? Routine refactors, test generation, dependency bumps tend to satisfy the four. But feature work and architectural change break them. "Tests pass" isn't a sufficient success criterion when readability, maintainability, and fit with existing design are in play. And production migrations or schema changes can produce unrecoverable errors. Two of the conditions, "clear success criteria" and "recoverable errors," fail across a wide swath of coding.&lt;/p&gt;

&lt;p&gt;Per the METR measurements the report cites (METR is a research org that measures AI autonomy), the length of software tasks frontier models can complete autonomously has been doubling roughly every 7 months, reaching about 15 human-expert-hours in early 2026. Anthropic, meanwhile, warns that around the 3.5-hour mark, API success rates drop below 50%. Coding agents that run autonomously for days and emit tens of thousands of lines are no longer rare, but production reliability falls off as tasks get longer and more complex. That's exactly why the human involvement of engineer review still governs quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why engineering is a natural fit for collaboration
&lt;/h3&gt;

&lt;p&gt;One step further. Coding stays in collaboration not because AI is weak or engineers are behind, but because software engineering already has a deeply layered culture of verification.&lt;/p&gt;

&lt;p&gt;Type systems, unit tests, code review, CI, static analysis, canary releases. Engineering spent decades building a culture that distrusts even human-written code and puts it through layers of verification before production. Adding review on top of AI-written code is the most natural extension of that culture.&lt;/p&gt;

&lt;p&gt;The flip side: full autonomy (escalation) collides head-on with that verification culture. "Only a human reviews 20% of samples" works for alert triage, but "review only 20%" against production code runs against most engineers' instincts. Coding stays in collaboration partly as a technical limit and partly because engineering, as a discipline, is built around verification.&lt;/p&gt;

&lt;p&gt;Seen this way, HITL level isn't a simple matter of "as models improve, things automatically advance to escalation." The stronger the verification culture in a domain, the longer collaboration persists. Coding's path to autonomy depends not just on model performance but on how much of the verification you can hand to AI itself, specifically, how you design the layer where AI writes tests and AI reviews. Whether review itself can be handed to AI is something I'll come back to later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Re-reading from three vantage points
&lt;/h2&gt;

&lt;p&gt;What coding-stays-in-collaboration means depends on where you stand. Three vantage points: the individual engineer, the engineering lead, and whoever owns org-level development. The same study hands each a different assignment.&lt;/p&gt;

&lt;h3&gt;
  
  
  The individual engineer: reviewing becomes the precondition for productivity
&lt;/h3&gt;

&lt;p&gt;The change the report describes means an individual engineer's daily work is already shifting. Instead of writing from scratch, you read AI-generated changes, judge them, adjust, and merge. Time spent writing code gives way to time spent evaluating code.&lt;/p&gt;

&lt;p&gt;This is where the nature of the collaboration model bites. Under escalation, a human only looks at exceptions. Under collaboration, human judgment governs quality on every single output. Let review get sloppy and defects in generated code flow straight to production.&lt;/p&gt;

&lt;p&gt;The awkward part: review gets harder as AI gets better. The more "plausible" the output looks, the more humans skip the details. Automation complacency, the long-known phenomenon in aviation and process industries where over-trusting an automated system erodes attention, shows up in code review too. Obviously wrong code is easy to catch; code that's 90% right with a subtle 10% wrong slips past a skim. Collaboration's 54%, the most modest gain among the functions, can be read partly as that "cost of review" offsetting the productivity gain.&lt;/p&gt;

&lt;p&gt;The question for the individual engineer is where to draw the line between what to delegate to AI and what to keep under your own judgment. The data says coding is still in the collaboration stage, meaning human judgment is directly tied to quality. Not taking AI output at face value, not treating review as a formality, these become preconditions for productivity at this stage. The direction of skill shifts too. More than the ability to write code from zero, the ability to spot defects in code others (or AI) wrote, and to read the intent behind a design, grows relatively more important.&lt;/p&gt;

&lt;h3&gt;
  
  
  The engineering lead: designing the HITL levels
&lt;/h3&gt;

&lt;p&gt;For a lead, the job becomes designing which development tasks run at which HITL level. The report frames HITL choice as determined by error tolerance, regulatory requirements, and task complexity. That maps directly onto a team's design guidance.&lt;/p&gt;

&lt;p&gt;The four conditions for agentic success (high-volume/repetitive, clear success criteria, recoverable errors, multi-system data access) double as a way to sort development tasks. Tied to practice, roughly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lean toward escalation&lt;/strong&gt;: dependency bumps, lint/format autofixes, boilerplate generation, filling test-coverage gaps. High-volume, mechanically checkable success criteria, and CI stops failures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collaboration fits&lt;/strong&gt;: feature implementation, refactoring, bug fixes. Success criteria involve readability and design fit, and human review governs quality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Push toward approval&lt;/strong&gt;: production migrations, schema changes, auth and billing. Errors can be unrecoverable, so a human approves each one even when AI does the work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Leave that sorting vague and pick "let AI do everything" or "humans review everything," and the former invites incidents while the latter caps productivity. Varying the autonomy level by the nature of the task is where a lead earns their keep.&lt;/p&gt;

&lt;p&gt;There's another important finding. In 42% of cases model choice was interchangeable, and the durable competitive advantage was not the underlying large model (the foundation model) itself but the design of how you combine and use it, the orchestration layer. Which model you use is becoming a commodity for many use cases. What separates teams is how you decompose tasks, where you insert human involvement, and how you wire up multiple tools and data sources. In the report's words, the advantage isn't the model but how you compose it.&lt;/p&gt;

&lt;p&gt;This is actually good news for the people doing the designing. You don't have to win the race of chasing the latest model; you can build an edge on the quality of task decomposition, HITL design, and tool integration. How you build the verification layer, the multi-stage scheme where AI writes tests and AI reviews, is exactly this orchestration-layer question, and it's the key to moving coding from collaboration to the next stage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Org-level development: how to use the freed capacity and make it stick
&lt;/h3&gt;

&lt;p&gt;At the org-development level, the question is what happens after productivity rises. The report shows, with concrete cases, that what you do with the freed capacity is decided by organizational choice, not technology.&lt;/p&gt;

&lt;p&gt;The report lays out three strategies for that capacity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Accelerate&lt;/strong&gt;: keep headcount, pour it into development speed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redeploy&lt;/strong&gt;: move people from automated work to higher-value work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduce&lt;/strong&gt;: cut headcount directly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At one PE-owned company, an 88% productivity gain in coding led to cutting the development team from 7 to 3. At an edtech company, a 20-30% improvement in coding went not to layoffs but to accelerating the roadmap: with a large product backlog, shipping features faster was worth more than trimming the team. The report notes growth-stage companies lean toward acceleration, while cost-focused ownership (PE, turnaround) leans toward reduction. The same productivity gain can tip toward reduction or acceleration. What decides is not technology but organizational strategy.&lt;/p&gt;

&lt;p&gt;A second case, security operations, reads as a model of redeployment. A 6-person SOC (security operations center) team at one tech company was buried under 1,500 alerts a month. After automating first-pass triage with AI, the required headcount dropped to the equivalent of 1.5 people, but no one was laid off. The freed 4.5 FTE were redeployed to proactive threat hunting, security-design review, and team skill-building. The executive who led it put it this way: "AI isn't replacing the person you have; it's replacing the person you don't need to hire." In areas like SRE and security, chronically understaffed with a backlog of "things we want to do but can't," redeployment rather than reduction is the natural choice.&lt;/p&gt;

&lt;p&gt;Org-development also has to think about how to bring cautious departments along. Per the report, the most cautious about AI adoption were not frontline users (23%) but staff functions like legal, HR, risk, and compliance (35%). Different stances care about different things: executives want ROI you can see in numbers, staff functions worry about procedural risk and where the blame lands, and the frontline fears losing their jobs. Each needs a different move. Spreading AI inside an engineering org likewise means looking past the dev team's own walls. How you bring legal, security, and HR along shapes how fast you can roll out. The report has several cases where handing those staff functions a governance role turned the cautious departments into active champions.&lt;/p&gt;

&lt;p&gt;Executive involvement comes in stages too, the report says. Hands-on engagement, checking progress weekly and clearing blockers, accounted for 58% of the successful cases. The 7 cases that reached company-wide transformation all wired AI adoption into a corporate OKR and tied it to evaluation and compensation. For an engineering leader as well, the condition for making it stick is to connect AI use to organizational goals rather than leaving it as a "clever trick on the floor." And one more thing the report stresses repeatedly: a culture that doesn't punish failure. 61% of successful cases had a prior failure, and in none of the cases studied was anyone punished for a failed AI project. Since putting AI into production presupposes trial and error, building a culture that can forgive failure is one of the most important jobs at the org-development level.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to add when you read the report
&lt;/h2&gt;

&lt;p&gt;I've walked through the report's findings. But what's in the report isn't enough on its own. To carry it into practice, a few things need reading-in.&lt;/p&gt;

&lt;p&gt;First, selection bias. As noted, this is a study of successful deployments. The report itself admits it doesn't show "how common success is." Take it as a description of patterns shared by organizations that succeeded, not a guarantee that the same approach will work. MIT's "95% fail" number and this study's "51 that succeeded" are the same phenomenon seen from opposite sides. Only by overlaying both do you get the full picture.&lt;/p&gt;

&lt;p&gt;Second, the handling of reliability. As &lt;a href="https://www.themarketai.com/post/stanford-s-enterprise-ai-playbook-what-works-and-what-s-missing" rel="noopener noreferrer"&gt;one critique&lt;/a&gt; points out, the report claims "messy data isn't a blocker if you design around it," yet flags reliability problems in 27% of cases while never once using the word "hallucination."&lt;/p&gt;

&lt;p&gt;To an engineer, "you can design around it" doesn't quite hold for messy data and model instability. In coding especially, hallucination shows up as "plausible but wrong code" that can slip past review. Take the report's view on board, but also look hard and soberly at your own data quality and model reliability. The "vague success criteria" and "unrecoverable errors" I named earlier as reasons coding stays in collaboration are, in fact, another face of this same reliability problem.&lt;/p&gt;

&lt;p&gt;Third, the time axis. The report's data collection ran from late 2024 to early 2025, when agentic AI was still nascent, and at the time of the study agentic implementations were only 20% of cases. The report itself notes that the redeployment and hiring-freeze patterns observed here are characteristics of an early-adoption phase, and that the distribution may shift as models mature and cost pressure builds. It also references separate Brynjolfsson-et-al. research showing that employment of younger workers in AI-exposed roles has already declined in relative terms, and warns this is an early sign of a larger shift. The coding-stays-in-collaboration picture here is likewise not fixed; read it as a snapshot of this moment. Given how fast the length of tasks AI can complete autonomously is growing (per METR), the boundaries should keep moving, from collaboration to approval, from approval to escalation.&lt;/p&gt;

&lt;p&gt;Finally, separate from those three points, let me answer an objection likely to be aimed at this post's own argument. Even if coding stays in collaboration, why not just hand review to AI too? In fact, AI code review has spread fast, and AI now catches style violations and common bugs. For low-risk changes, some teams already run on AI review alone.&lt;/p&gt;

&lt;p&gt;But having AI review AI-written code has a trap. The same model tends to share the same blind spots, so "plausible but wrong code" can be missed by both the author-AI and the reviewer-AI. The automation complacency from earlier stacks up, multiplied, between AIs. The one who ultimately decides the merge and owns the soundness of the design is, for now, a human. So even as AI review advances, it's more realistic to expect that human review won't go to zero but will narrow to the high-risk areas. Low-risk changes go to AI; areas carrying unrecoverable risk stay with humans. That lines up exactly with this post's view: vary the HITL level by task.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Coding stays in human-AI collaboration not because the technology is immature or engineers are slow to change. Task complexity, the risk of unrecoverable errors, and the fact that software engineering already has a multi-layered culture of verification: these three overlap to keep coding at a stage where human judgment still governs quality.&lt;/p&gt;

&lt;p&gt;That fact hands each vantage point a different assignment. The individual engineer: the responsibility to keep engaging with review that gets harder as AI advances. The lead: the role of deciding which task sits at which autonomy level and how to design the verification layer. The org-development owner: the work of choosing, as strategy, what to do with the freed capacity, and of building a culture that forgives failure plus org-wide adoption that sticks.&lt;/p&gt;

&lt;p&gt;What Stanford's 51 cases show again and again is one thing: what separates outcomes is not technology but organization. When coding's autonomy moves to its next stage, what decides it won't be model performance but how the engineering org designs, judges, and builds in verification. The shift has already begun. The organizations that don't put off that design work are the ones that will pull ahead.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>"Reinstalling Won't Fix It": A Cross-App Shared-Auth Deadlock After Switching Phones</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Sat, 30 May 2026 07:26:30 +0000</pubDate>
      <link>https://dev.to/aws-builders/reinstalling-wont-fix-it-a-cross-app-shared-auth-deadlock-after-switching-phones-40do</link>
      <guid>https://dev.to/aws-builders/reinstalling-wont-fix-it-a-cross-app-shared-auth-deadlock-after-switching-phones-40do</guid>
      <description>&lt;p&gt;After migrating to a new Android phone, a few specific apps stopped launching. Amazon Shopping and Kindle would freeze on a blank white or black screen for a while, then close on their own. Reinstalling, clearing storage, updating the OS — none of it helped. Going through the usual support steps changed nothing.&lt;/p&gt;

&lt;p&gt;What finally fixed it was clearing the storage of &lt;em&gt;every&lt;/em&gt; Amazon app at once. Tracing the cause through ADB logs, it turned out that authentication data shared across multiple apps had become inconsistent, and the auth-retrieval step at startup was deadlocking.&lt;/p&gt;

&lt;p&gt;The incident itself happened with a specific Pixel-and-Amazon combination, but structurally it's a pattern that can hit any app where "authentication data shared across apps" meets "many subsystems initialized in parallel at startup for speed." It's worth knowing about whether you design SDKs, build apps, or handle ops and support, so I'm leaving it here as a case study.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: This happened on my own device, with my own account. I'm reading the diagnostic output that the OS itself wrote out — not decompiling any app.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What happened after switching phones
&lt;/h2&gt;

&lt;p&gt;Right after migrating data to a Pixel 10a, only certain apps refused to launch.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Affected: Amazon Shopping, Kindle&lt;/li&gt;
&lt;li&gt;Symptom: open the app, it freezes on a blank white or black screen for about ten seconds, then closes on its own&lt;/li&gt;
&lt;li&gt;Everything else: other apps work fine&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The officially suggested remedies are generic — "reinstall the app," "clear the cache," "update the OS" — and none of them worked. When the root cause is in the OS or the data migration rather than the app itself, the standard support path has a hard time isolating it.&lt;/p&gt;

&lt;p&gt;At first I couldn't even tell whether it was an OS problem, an app problem, or an Amazon account problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  When nothing works, change the framing
&lt;/h2&gt;

&lt;p&gt;I tried all the standard fixes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reinstalling the affected apps&lt;/li&gt;
&lt;li&gt;Clearing storage of just the affected apps&lt;/li&gt;
&lt;li&gt;Clearing the cache of the affected apps&lt;/li&gt;
&lt;li&gt;Updating every app in the Play Store&lt;/li&gt;
&lt;li&gt;Updating the OS&lt;/li&gt;
&lt;li&gt;Restarting the device&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The affected apps still wouldn't launch, no matter what.&lt;/p&gt;

&lt;p&gt;The key realization: as long as you think of it as "a problem with one app," you won't fix it. Reinstalling just the affected app, or clearing just that app's storage, changes nothing. If that's the case, the cause probably isn't contained within a single app.&lt;/p&gt;

&lt;p&gt;Only once you reframe it as "a problem across a group of apps" does the solution come into view.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hypothesis: the shared authentication data is corrupted
&lt;/h2&gt;

&lt;p&gt;Let me start with a hypothesis built only from public information. I'll back it up with logs in the second half.&lt;/p&gt;

&lt;p&gt;Amazon's Login with Amazon SDK has a mechanism that lets other apps reuse the Amazon Shopping app's logged-in state. This is documented officially.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developer.amazon.com/docs/login-with-amazon/customer-experience-android.html" rel="noopener noreferrer"&gt;https://developer.amazon.com/docs/login-with-amazon/customer-experience-android.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;According to the docs, if the user is already signed in to the Amazon Shopping app, an app that integrates Login with Amazon won't ask them to re-enter account details — the SDK recognizes and reuses the auth state of the Amazon Shopping app or the Fire OS device. That's single sign-on (SSO). The SDK's internal package name is &lt;code&gt;com.amazon.identity.auth.map.device&lt;/code&gt;, which also appears in Amazon's official migration guide.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developer.amazon.com/docs/login-with-amazon/upgrade-android-sdk.html" rel="noopener noreferrer"&gt;https://developer.amazon.com/docs/login-with-amazon/upgrade-android-sdk.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What we can say from this is that Amazon's authentication layer (referred to internally in the SDK as MAP) is designed so that other apps can reference the Amazon Shopping app's logged-in state. What the docs directly describe is SSO for apps using Login with Amazon, but it's reasonable to think that Amazon's own apps such as Kindle and Prime Video share the same auth layer too.&lt;/p&gt;

&lt;p&gt;The hypothesis, then: during data migration, only part of the authentication data was carried over in a corrupted state, and the startup process that tries to fetch that shared data is getting stuck. If that's right, it explains why nothing short of wiping the apps that hold the shared data will fix it.&lt;/p&gt;

&lt;p&gt;That said, specifics like "the first-installed app holds the auth data as the representative" or "only one particular app is the source" can't be asserted from public information. I'll check how solid the hypothesis is by reading the ANR trace in the second half.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix: clear storage for the whole group of related apps at once
&lt;/h2&gt;

&lt;p&gt;Here's the fix up front. The concrete steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open &lt;code&gt;Settings &amp;gt; Apps &amp;gt; See all XX apps&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;List every Amazon app installed&lt;/li&gt;
&lt;li&gt;For each one, run &lt;code&gt;Storage &amp;amp; cache &amp;gt; Clear storage&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Once they're all cleared, restart the device&lt;/li&gt;
&lt;li&gt;Then open Amazon Shopping or Kindle — if a login screen appears, you're good&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Examples of apps to target:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon Shopping&lt;/li&gt;
&lt;li&gt;Kindle&lt;/li&gt;
&lt;li&gt;Amazon Prime Video&lt;/li&gt;
&lt;li&gt;Amazon Music&lt;/li&gt;
&lt;li&gt;Amazon Photos&lt;/li&gt;
&lt;li&gt;Amazon Alexa&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important point is that what you need to wipe is not "the app that's failing" but "every app that shares the authentication data." In my case, the one that finally did it was clearing Prime Video's storage. It was an app I barely ever opened, and until I cleared it, clearing the other Amazon apps did nothing. It may well have been the source of the shared data.&lt;/p&gt;

&lt;p&gt;Migration tools restore apps automatically from the old device's app list. As a result, you can end up with Amazon apps you haven't used in ages — ones you've forgotten you ever installed. In the user's mind it's an "app I don't use," but in the authentication-sharing network it's a full-fledged node, and the corrupted data sitting there drags down the apps you are launching. The instinct to "only clear the app that's failing" or "only clear the apps I actually use" backfires here.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verifying the hypothesis with the ANR trace
&lt;/h2&gt;

&lt;p&gt;Now the main part. Let's verify whether this really is a shared-auth deadlock, using the ANR (Application Not Responding) stack trace.&lt;/p&gt;

&lt;p&gt;The first thing to establish: what's killing the app is an "ANR," not a "crash." A crash throws an exception and the process drops immediately; an ANR is the system force-closing the app after the main thread has failed to respond for some time (roughly several seconds to ten). Freezing on a blank screen and then closing is the classic ANR symptom — not an exception, but a timeout while waiting for a response.&lt;/p&gt;

&lt;p&gt;Since this was happening on my own device with my own account, I connected the Pixel to a Mac over ADB and pulled the diagnostic log (the stack trace) that the OS wrote out when the ANR occurred. Again, I'm not decompiling the app — just reading the diagnostic output the OS left behind.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;adb shell dumpsys dropbox &lt;span class="nt"&gt;--print&lt;/span&gt; data_app_anr | &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A&lt;/span&gt; 200 &lt;span class="s2"&gt;"Process: com.amazon.mShop.android.shopping"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "DropBox" in &lt;code&gt;dumpsys dropbox&lt;/code&gt; refers to &lt;code&gt;DropBoxManager&lt;/code&gt;, the Android system-log mechanism that stores diagnostic entries (crashes, ANRs, and so on) over time. It has nothing to do with the cloud storage service of the same name. &lt;code&gt;--print data_app_anr&lt;/code&gt; pulls only the entries tagged as app ANRs, filtered here by Amazon Shopping's process name.&lt;/p&gt;

&lt;p&gt;The trace recorded several threads running in parallel at startup. The key part: they were waiting on each other's locks. Let's read them in order.&lt;/p&gt;

&lt;h3&gt;
  
  
  main thread (tid=1): the UI itself, stuck
&lt;/h3&gt;

&lt;p&gt;The main thread was stuck while running a startup task called &lt;code&gt;AndroidComponentDetectTask&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"main" prio=5 tid=1 Blocked
  at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(...)
  - waiting to lock &amp;lt;0x00eb4d79&amp;gt; held by thread 37
  at com.amazon.platform.service.ServiceRegistryImpl.getService(...)
  at com.amazon.mShop.appStart.AndroidComponentDetectTask.apply(...)
  ...
  at android.app.ActivityThread.handleBindApplication(...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's trying to acquire a lock, &lt;code&gt;&amp;lt;0x00eb4d79&amp;gt;&lt;/code&gt;, and waiting for thread 37 to release it. This lock is one the Service Registry (the common registry where each subsystem registers and retrieves itself) takes internally when fetching or creating a service. On Android the main thread &lt;em&gt;is&lt;/em&gt; the UI thread, so when it stops here, nothing gets drawn and the screen stays blank.&lt;/p&gt;

&lt;h3&gt;
  
  
  thread 36: collateral damage waiting on the same lock
&lt;/h3&gt;

&lt;p&gt;The error-reporting init task (thread 36) was waiting on the exact same lock as main.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"StagedExecutor2-pool-19-thread-1" prio=5 tid=36 Blocked
  at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(...)
  - waiting to lock &amp;lt;0x00eb4d79&amp;gt; held by thread 37
  at com.amazon.platform.service.ServiceRegistryImpl.getService(...)
  at com.amazon.mShop.sam.log.SAMLogManager.initialize(...)
  at com.amazon.mShop.errorReporting.ErrorReporter.startSession(...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also waiting on &lt;code&gt;&amp;lt;0x00eb4d79&amp;gt;&lt;/code&gt;. This Service Registry lock is a congestion point that multiple threads fight over at startup.&lt;/p&gt;

&lt;h3&gt;
  
  
  thread 37: the culprit, holding a lock while waiting on auth data
&lt;/h3&gt;

&lt;p&gt;The problem thread is thread 37. It was holding &lt;code&gt;&amp;lt;0x00eb4d79&amp;gt;&lt;/code&gt; (the Service Registry lock) while trying to acquire another lock, &lt;code&gt;&amp;lt;0x004a4835&amp;gt;&lt;/code&gt;, and getting stuck.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"StagedExecutor3-pool-20-thread-1" prio=5 tid=37 Blocked
  - waiting to lock &amp;lt;0x004a4835&amp;gt; held by thread 62
  at com.amazon.identity.auth.device.api.MAPAccountManager.getAccount(...)
  at com.amazon.mShop.minerva.MinervaWrapperMAPClient.fetchAndSetAccountAttributeForTeen(...)
  at com.amazon.mShop.minerva.MinervaWrapperMAPClient.&amp;lt;init&amp;gt;(...)
  at com.amazon.mShop.minerva.MinervaWrapperServiceImpl.initializeMinervaClientIfNeeded(...)
  at com.amazon.platform.service.ServiceRegistryImpl.instantiateService(...)
  at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(...)
  - locked &amp;lt;0x00eb4d79&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reading bottom to top, the sequence is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Service Registry takes its internal lock &lt;code&gt;&amp;lt;0x00eb4d79&amp;gt;&lt;/code&gt; to create a service (at this moment, main and thread 36 are made to wait)&lt;/li&gt;
&lt;li&gt;Still holding that lock, it proceeds into initializing the metrics SDK (Minerva) client&lt;/li&gt;
&lt;li&gt;Inside that, it calls &lt;code&gt;MAPAccountManager.getAccount&lt;/code&gt; to fetch the currently logged-in account&lt;/li&gt;
&lt;li&gt;The auth SDK (MAP) tries to take another lock, &lt;code&gt;&amp;lt;0x004a4835&amp;gt;&lt;/code&gt;, internally&lt;/li&gt;
&lt;li&gt;But that lock is held by thread 62 and never comes back&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Thread 37 sits in the decisive position that triggers the deadlock: holding the Service Registry lock while frozen waiting for auth data. Because it won't release the lock it holds, main and thread 36 — which are waiting on it — stall in turn.&lt;/p&gt;

&lt;h3&gt;
  
  
  thread 27: another one waiting on the auth lock
&lt;/h3&gt;

&lt;p&gt;On top of that, thread 27 (Weblab, fetching A/B-test flags) was also waiting on the same auth lock &lt;code&gt;&amp;lt;0x004a4835&amp;gt;&lt;/code&gt; as thread 37.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"StagedExecutor1-pool-15-thread-1" prio=5 tid=27 Blocked
  - waiting to lock &amp;lt;0x004a4835&amp;gt; held by thread 62
  at com.amazon.identity.auth.device.api.MultipleAccountManager.getAccountForMapping(...)
  at com.amazon.mShop.sso.SSOUtil.getCurrentAccountFromDisk(...)
  at com.amazon.mShop.core.features.weblab.WeblabServiceImpl.getTreatmentAndCacheForAppStartWithTrigger(...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Weblab also needs auth information at startup, and via &lt;code&gt;getAccountForMapping&lt;/code&gt; it's waiting on the same auth lock to be released. Note that thread 37 reaches the lock through &lt;code&gt;MAPAccountManager&lt;/code&gt; and thread 27 through &lt;code&gt;MultipleAccountManager&lt;/code&gt; — two different APIs converging on one internal lock.&lt;/p&gt;

&lt;h3&gt;
  
  
  The big picture: auth-data retrieval is where every task converges
&lt;/h3&gt;

&lt;p&gt;Laid out, the dependencies look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7572jtxat9h9s5dec7vs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7572jtxat9h9s5dec7vs.png" alt=" " width="800" height="774"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What stands out is that the tasks meant to run in parallel at startup (metrics, error reporting, A/B testing, component detection) all ultimately converge on a single point: "fetching the MAP auth data." Minerva and Weblab are supposed to be independent features, yet somewhere in initialization each one reaches for the same auth SDK to find out "who is logged in right now."&lt;/p&gt;

&lt;p&gt;That auth-data retrieval never returns, because the shared data is corrupted. Every task that needs auth stalls; and because the task &lt;em&gt;holding&lt;/em&gt; the Service Registry lock has stalled, even tasks unrelated to auth (main, error reporting) get dragged down. That's the full chain that leaves the screen blank until the ANR fires.&lt;/p&gt;

&lt;p&gt;Following thread 62 — the one stuck while holding the auth lock — it was sending a query to another process via a ContentProvider and waiting for the response. A ContentProvider is Android's mechanism for sharing data between apps, and Amazon's apps appear to use it to pass authentication data around. It seems thread 62 was stuck holding the auth lock because one of the sharing sources never returned a response. Which app, and why it didn't respond, can't be pinned down from this trace alone. But the structure — "go fetch the shared auth data from the source, and it never comes back" — is consistent with the fact that wiping every Amazon app's storage fixed it.&lt;/p&gt;

&lt;p&gt;Strictly speaking, this isn't a circular wait where two threads grab each other's locks (the textbook deadlock). It's a hang: a thread holding a lock freezes waiting on an external process, and the threads waiting on it stall in a chain. But since the outcome — "stuck holding a lock, with everyone waiting on it blocked forever" — is no different from a deadlock, I'm calling it a deadlock in this article.&lt;/p&gt;

&lt;h2&gt;
  
  
  The design pitfalls this case reveals
&lt;/h2&gt;

&lt;p&gt;The textbook lessons — "acquire locks in a consistent order," "don't block the main thread" — apply here too, of course. But what the trace really surfaced is the pitfall that emerges when well-intentioned design decisions pile up. Parallelization for speed, auth lookups for functionality, cross-app data sharing for convenience. Each is reasonable on its own, but stacked together they become the following three pitfalls.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 1: parallel-init speedups backfire on shared resources
&lt;/h3&gt;

&lt;p&gt;Initializing subsystems in parallel to speed up startup looks like a correct optimization. Indeed, the trace recorded several init tasks running concurrently on separate threads — metrics, error reporting, A/B testing, component detection.&lt;/p&gt;

&lt;p&gt;The thing is, many of them internally call the same shared operations: "register with the Service Registry" and "fetch the current logged-in account." Even run in parallel, they end up serialized on the shared resource's lock. On its own that just makes startup slower — but when the thread holding a lock stalls on something else, everyone waiting gets swept up all at once, as happened here.&lt;/p&gt;

&lt;p&gt;Parallelization aimed at speed becomes effectively serial under shared-resource contention, and in the worst case deadlocks. The most dangerous spot is the assumption that "it parallelized, so it must be faster." When you add startup tasks, you have to look at how each one touches shared resources (registry, auth, settings store) as a set — otherwise you not only fail to get the scaling benefit, you raise the odds of a deadlock.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 2: auth has become an implicit dependency of every feature
&lt;/h3&gt;

&lt;p&gt;The most surprising thing in the trace was that both metrics and A/B testing — features that look unrelated to auth — were reaching for "who is logged in right now" during initialization. Metrics wants to attach user attributes; A/B testing wants to bucket by account. The reasons are each fair enough, but the result is that the auth SDK has become an implicit dependency point for the entire app.&lt;/p&gt;

&lt;p&gt;When auth-data retrieval jams at a single point, it's not auth itself that stops — it's every feature that referenced auth, stalling in a chain. You need to recognize that auth isn't "a concern around the login screen" but "a critical path of the entire startup sequence." If you count how many subsystems call auth-data retrieval at startup in your own app, the number may be higher than you'd imagine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 3: ownership of shared data goes adrift during migration
&lt;/h3&gt;

&lt;p&gt;A design where multiple apps share authentication data is convenient for users — sign in once and you don't need to log in again on the other apps. The problem is that "who owns this shared data, and who fixes it when it breaks" is left implicit.&lt;/p&gt;

&lt;p&gt;Suppose there's an implicit rule like "the first-installed app is the representative." If migration doesn't reproduce the install order or state, the ownership relationship goes adrift. The owner sits there holding corrupted data while other apps go to reference it. The fact that nothing was fixed until I cleared Prime Video this time may have this ownership ambiguity in the background. Shared data needs a fallback — another app taking over, or safely regenerating the data — for when the owner disappears or its data breaks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons for support and for users
&lt;/h2&gt;

&lt;p&gt;Even if you're not in a position to change the design, knowing this structure changes how fast you can respond.&lt;/p&gt;

&lt;p&gt;For support: keep in mind that "please reinstall" only works when the problem is contained within a single app. For a post-migration report of "only certain apps won't launch," suspect that migration left the shared data corrupted, and being able to offer the next move — "clear storage for the related apps as a group" — changes the opening response. Even just asking "did this start right after switching phones?" up front can sometimes narrow the investigation considerably.&lt;/p&gt;

&lt;p&gt;For users: think of the cleanup target as "every app from the same provider," not "the app that's giving you trouble." Keeping in mind that even an app you don't use is a node in the sharing network, and that corruption there causes collateral damage, raises your odds of getting out of it on your own.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other cases where the same pattern can occur
&lt;/h2&gt;

&lt;p&gt;It's not just Amazon — there are plenty of designs where multiple apps share authentication information.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sharing auth tokens across apps via Android's standard &lt;code&gt;AccountManager&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Sharing login information across same-signature apps through a &lt;code&gt;ContentProvider&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Groups of apps with a common account platform (cross-app login across several apps from the same company, for example)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When these combine with a design that "initializes many subsystems in parallel at startup," the same conditions line up as in this case: when the shared data breaks, every app chain-fails to launch, and a single reinstall won't fix it. If your own app group meets these two conditions, it's worth checking once how you guarantee consistency across a device migration, and how you degrade or regenerate when the sharing source breaks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;If you run into apps not launching after switching phones, first try "if a single clear doesn't fix it, clear the group of related apps." That's the shortest fix from the user's side. When the sharing source is broken, you have to wipe the source along with the rest.&lt;/p&gt;

&lt;p&gt;From the design side, the three pitfalls this trace surfaced are worth remembering: parallel init can backfire on shared resources; auth tends to become an implicit critical path for every feature; and ownership of shared data goes adrift during migration. Each is a "well-intentioned design" on its own, yet combined they produce an app that won't start.&lt;/p&gt;

&lt;p&gt;"Please reinstall" only holds up as a universal fix for designs contained within a single app. For apps that hold shared data and carry a complex startup sequence, the same symptom recurs even after reinstalling. Just knowing this one structure makes a real difference in how fast you respond the next time you hit the same incident.&lt;/p&gt;

</description>
      <category>android</category>
      <category>adb</category>
      <category>mobile</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>What AgentCore Managed Harness Takes Over, What It Leaves to You</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Fri, 08 May 2026 21:03:13 +0000</pubDate>
      <link>https://dev.to/aws-builders/what-agentcore-managed-harness-takes-over-what-it-leaves-to-you-1je6</link>
      <guid>https://dev.to/aws-builders/what-agentcore-managed-harness-takes-over-what-it-leaves-to-you-1je6</guid>
      <description>&lt;p&gt;On April 22, 2026, AWS added a "managed agent harness" (preview) to Amazon Bedrock AgentCore. With this feature, you declare the model, system prompt, and tools as configuration, and the agent runs—the orchestration code lives on the AWS side as managed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/get-to-your-first-working-agent-in-minutes-announcing-new-features-in-amazon-bedrock-agentcore/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/machine-learning/get-to-your-first-working-agent-in-minutes-announcing-new-features-in-amazon-bedrock-agentcore/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What stands out about this release is less the feature itself and more AWS's adoption of the term "agent harness." Since Martin Fowler wrote his harness engineering essay in February 2026, Anthropic and OpenAI have started using "harness" officially, and now a cloud vendor has applied the same word to its own service.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://martinfowler.com/articles/harness-engineering.html" rel="noopener noreferrer"&gt;https://martinfowler.com/articles/harness-engineering.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the perspective of someone who has been assembling a harness by hand, the question becomes: what does managed harness take over, and what stays in my hands? This article sorts out that dividing line. Drawing on experience running business-automation agents with Claude Desktop, multiple MCP servers, and Markdown-based knowledge, I lay out the correspondence with AgentCore managed harness.&lt;/p&gt;

&lt;p&gt;A few "tried it out" articles have already been published, so this article positions itself as the prequel: it offers material for deciding whether to adopt, not adopt, or how to phase in. Drawing on the official blog, documentation, and existing explanatory articles as sources, I sort out the correspondence and the judgment criteria that emerge from self-built operation.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS released "managed harness"
&lt;/h2&gt;

&lt;p&gt;The official blog mentioned above lays out the structure: every agent has an orchestration layer, and running that layer requires compute, a sandbox to safely execute code, tool connections, persistent storage, and error recovery as the underlying infrastructure—bundled together, they form the agent harness. Managed harness is AWS providing this harness as a managed offering, where the user declares the model, system prompt, and tools as configuration, and a working agent is the result.&lt;/p&gt;

&lt;p&gt;Let me first align on what the word "harness" refers to. The term gets used both for what the vendor builds in (internal) and for what the user assembles around the agent (external), and the meaning shifts with context. In addition to Fowler's framing, watany has organized the internal/external confusion in a Zenn article.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zenn.dev/watany/articles/d8b692bbca65a3" rel="noopener noreferrer"&gt;https://zenn.dev/watany/articles/d8b692bbca65a3&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article is written from the position of "someone who has been assembling the external environment by hand"—the user-side harness, in operation. AgentCore managed harness can be read as the vendor-side internal harness now offered as managed, but from the user's perspective, it can also be read as: part of what we used to build for ourselves can now be delegated. This duality is the starting point for thinking about where responsibilities split with self-built operation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Self-built harness composition, the four blank layers
&lt;/h2&gt;

&lt;p&gt;Let me map my self-built harness to AgentCore's components. The environment I've been operating consists, broadly, of three elements, and I'll lay out how each one corresponds to something on the AgentCore side.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Self-built harness&lt;/th&gt;
&lt;th&gt;AgentCore side&lt;/th&gt;
&lt;th&gt;Degree of correspondence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Markdown knowledge files (under &lt;code&gt;agents/&lt;/code&gt;, &lt;code&gt;knowledge/&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;AgentCore Memory&lt;/td&gt;
&lt;td&gt;Similar role; persistence and retrieval mechanisms differ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP servers (task management / calendar / chat / document management, etc.)&lt;/td&gt;
&lt;td&gt;AgentCore Gateway&lt;/td&gt;
&lt;td&gt;MCP is becoming the standard, so they're close&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Desktop&lt;/td&gt;
&lt;td&gt;AgentCore Runtime&lt;/td&gt;
&lt;td&gt;The execution base for the agent loop, at a different scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(none)&lt;/td&gt;
&lt;td&gt;AgentCore Identity&lt;/td&gt;
&lt;td&gt;Not implemented in self-built&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(none)&lt;/td&gt;
&lt;td&gt;AgentCore Policy&lt;/td&gt;
&lt;td&gt;Not implemented in self-built&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(none)&lt;/td&gt;
&lt;td&gt;AgentCore Observability&lt;/td&gt;
&lt;td&gt;Not implemented in self-built&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(none)&lt;/td&gt;
&lt;td&gt;AgentCore Evaluations&lt;/td&gt;
&lt;td&gt;Not implemented in self-built&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The top three are the correspondence between "what I assembled by hand" and "what AgentCore provides as managed for the same role." The bottom four are blank layers in the self-built harness—components AgentCore offers that aren't covered by my operation.&lt;/p&gt;

&lt;p&gt;The natural question here is whether these four blank layers are "things I didn't write because I didn't need them" or "things I wanted but had given up on." The two are different. For the former, introducing managed harness yields little value; for the latter, it brings value.&lt;/p&gt;

&lt;p&gt;Let me go through the four layers in order.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Identity&lt;/strong&gt; is for managing authentication and permissions when multiple users access the agent. Since my self-built harness runs on a personal device, authentication can rely on the device login, and per-agent authentication wasn't necessary. This is unnecessary "as long as it's just me." The moment you try to share an agent across an organization, controlling who can call which MCP for what becomes a problem, and the gap surfaces in the form of resignation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Policy&lt;/strong&gt; is the mechanism for declaratively defining boundaries when the agent calls tools. It's based on Cedar, AWS's open-source policy language, and you can generate policies from natural language. In my self-built harness, I draw loose boundaries through MCP server scopes and by documenting "what not to do" in the knowledge files—but this is discipline, not enforcement. I had wanted to write strong, enforceable boundaries, but didn't have the motivation to build a Cedar-equivalent system myself, so I had given up on this area.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability&lt;/strong&gt; is the mechanism for emitting agent execution logs, traces, and metrics to CloudWatch for visualization. In my self-built harness, I have the conversation history in Claude Desktop and individual logs from each MCP server, but no mechanism to track "which agent called what when, and how it failed" across the board. For solo use, looking at the chat screen suffices, but this becomes necessary in organizational deployment, and falls into the resignation category.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evaluations&lt;/strong&gt; is the mechanism for continuously evaluating the agent's response quality, with built-in evaluators for dimensions like helpfulness, tool-selection accuracy, and correctness. In my self-built harness, I check subjectively through knowledge-file improvement history and daily work logs, but I have no quantitative quality monitoring. For solo use, subjective is enough; but for organizational operation or paid services, this becomes essential.&lt;/p&gt;

&lt;p&gt;Looking back at the four layers, only Identity falls into "unnecessary as long as it's just me," while the other three fall into "would have been nice, but had given up on as self-built." The fact that the meaning of "blank" differs by layer affects the judgment of whether to adopt managed harness.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layers managed harness takes over, layers it leaves
&lt;/h2&gt;

&lt;p&gt;When you use managed harness, what stops being something you write, and what continues to require writing? This can be derived as fact from the official blog and documentation, so let me sort it out first.&lt;/p&gt;

&lt;p&gt;What managed harness takes over is the following range:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent loop: calling the model, selecting tools, returning results, managing context, and recovering from errors&lt;/li&gt;
&lt;li&gt;A microVM, filesystem, and shell isolated per session&lt;/li&gt;
&lt;li&gt;Tool-connection orchestration via AgentCore Gateway&lt;/li&gt;
&lt;li&gt;The framework portion based on Strands Agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Conversely, what users still need to write even when using managed harness is the following range:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which model to use&lt;/li&gt;
&lt;li&gt;What to write in the system prompt&lt;/li&gt;
&lt;li&gt;Which tools to make callable&lt;/li&gt;
&lt;li&gt;What goes into AgentCore Memory and what doesn't&lt;/li&gt;
&lt;li&gt;What boundaries to declare in AgentCore Policy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since declaration-based configuration suffices, the amount of code drops significantly. However, the five items above are simply "what you write as configuration changes"—the judgments themselves don't go away. They just shift into the form of the &lt;code&gt;harness.json&lt;/code&gt; configuration file. Reading preview validation articles by people who have actually tried managed harness, you'll see that &lt;code&gt;harness.json&lt;/code&gt; lists the model and tool list as declarations, while a separate &lt;code&gt;system-prompt.md&lt;/code&gt; file holds the system prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.classmethod.jp/articles/bedrock-agentcore-managed-harness-preview/" rel="noopener noreferrer"&gt;https://dev.classmethod.jp/articles/bedrock-agentcore-managed-harness-preview/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/aws-samples/sample-AgentCore-Managed-Harness-News" rel="noopener noreferrer"&gt;https://github.com/aws-samples/sample-AgentCore-Managed-Harness-News&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This looks like what was previously written as Markdown system-prompt files and MCP connection definitions in the self-built harness, repackaged into AWS's configuration file format.&lt;/p&gt;

&lt;p&gt;In other words, what managed harness takes over is "the labor of writing orchestration code," not "the judgment of designing the agent." Design judgments still rest with the user. AWS expresses this as removing the infrastructure barrier, but the non-infrastructure part—"what is this agent for, and how far should it be allowed to go"—remains on the human side, whether it's managed or self-built.&lt;/p&gt;

&lt;p&gt;This distinction is an important perspective when judging whether to adopt managed harness. The pitch "you don't have to write code" is accurate, but reading it as "you don't have to think" makes it inaccurate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where self-built operation can articulate "the place of design judgments"
&lt;/h2&gt;

&lt;p&gt;When you operate a self-built harness, you accumulate judgments about "where it's okay to move things, and where you must not." These don't go away when you adopt managed harness. The place where they appear shifts to the contents of &lt;code&gt;harness.json&lt;/code&gt;, but the judgments themselves continue to rest on the human side. Let me name a few representative ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Knowledge file granularity.&lt;/strong&gt; Whether to split your Markdown knowledge "by role" or "by task" is a judgment that, once made, eases subsequent operation. Splitting by role lets agent dispatch fall naturally out of context. Splitting by task scatters cross-task knowledge. There's no simple winner; the optimum depends on the number of agents you operate and how tasks overlap. Even with managed harness, the same question—what to combine in Memory and what to separate—remains.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP server combination design.&lt;/strong&gt; This is the line between "how far to wire up as tools via MCP" and "how far to handle through local file operations." For example, task management is better suited to MCP via API for automation, while sensitive tasks are safer kept as local file operations—judgments that emerge through use. Managed harness's Gateway has to answer the same question, just translated into declarations in a tool list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent-to-agent responsibility split.&lt;/strong&gt; This is the design choice between having a coordinator agent that judges context and dispatches to specialist agents, or calling specialist agents directly from the start. The coordinator style depends on context-judgment accuracy; the direct-call style puts the discrimination burden on the user. This too remains as a design judgment in managed harness, in the form of how to arrange and connect multiple harnesses.&lt;/p&gt;

&lt;p&gt;These three are judgments that are hard to articulate without operating self-built first. If you start from managed harness, these judgments end up looking "as if they were optimally placed from the beginning." In reality, you've just fixed the premises, but inside fixed premises, the existence of design judgments themselves becomes harder to see.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why not just use managed harness from the start?
&lt;/h3&gt;

&lt;p&gt;Here's a counterargument I anticipate: "If we just use managed harness from the start, we won't need to build anything ourselves."&lt;/p&gt;

&lt;p&gt;I partially agree with this counterargument. If you're building a new agent for organizational production from zero, going in through managed harness is faster, I think. However, the design of an agent to run in production rarely "is visible from the start." Only by actually using the agent do the granularity of knowledge, the over- and under-supply of tools, and the boundaries of responsibility come into view. Whether you run this discovery flow on top of a managed harness with set boundaries, or on a self-built harness with high freedom, changes the amount of learning you get.&lt;/p&gt;

&lt;p&gt;Another perspective: judgments gained from self-built operation can be reused as a blueprint when you migrate to managed harness. If you go into managed harness without a blueprint, you can produce something that appears to work, but a system remains where it's hard to explain why it was structured that way. Whether "let's just put it on managed harness and improve it as we go" works depends on whether one person is improving or multiple people are improving. For one person, the iteration speed gap between self-built and managed may be small; but at the stage where multiple people improve, the declarative changes in &lt;code&gt;harness.json&lt;/code&gt; and the deploy-unit iteration cycle start to take a toll as operational debt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Order of adoption: where personal and organizational use diverge
&lt;/h2&gt;

&lt;p&gt;Whether to adopt managed harness can naturally branch by operational scale. Let me go through three stages.&lt;/p&gt;

&lt;p&gt;In the personal-use stage, where one person is using the agent, the self-built harness is often sufficient. The editing and use of knowledge files are tightly coupled, and the iteration of "rewrite Markdown the moment you notice something while using it" runs fast. Both Identity and Observability are hard to recognize as gaps as long as you're operating solo, and end up in the "would-be-nice-to-have, maybe" zone. In the experimental stage, this freedom directly translates into learning speed.&lt;/p&gt;

&lt;p&gt;At the stage of expanding to organizational operation where multiple people use the agent, the four blank layers all surface as problems at once. You need audit logs of who used which agent how (Observability); you start running into situations where shared environments must not allow tools to be called freely, so boundaries become necessary (Policy); you need to manage credentials per member (Identity); you want to continuously measure agent response quality (Evaluations). At this stage, the value of managed harness comes to the fore. Comparing the labor of writing the four layers yourself versus putting them on AgentCore, the latter becomes practical.&lt;/p&gt;

&lt;p&gt;In the transition phase, you can take a hybrid strategy. Continue the personal exploration stage with a self-built harness, and put only the confirmed paths used in organizational operation onto managed harness. Move agents whose design has settled to AgentCore in order, and keep agents that are still being learned on while running close at hand.&lt;/p&gt;

&lt;p&gt;There's also a guideline for the order of adoption. The first things needed for organizational deployment are Identity and Observability, then Policy, and finally Evaluations. Without Identity, sharing itself doesn't get established. Without Observability, the organization can't make operational judgments. Policy is often too late after an incident, so placing it early in organizational deployment is safer. Evaluations can come in the order of "after operation gets going, then introduce quality measurement"—that's fine.&lt;/p&gt;

&lt;p&gt;The harness was originally a concept lying at the boundary between those who build agents and those who use them. With AWS releasing managed harness, part of what we used to assemble by hand has shifted into a mechanism that runs simply by declaring it as configuration. The fact that layers like Identity, Observability, and Policy—which I had given up on as self-built—have come within reach is no small thing.&lt;/p&gt;

&lt;p&gt;Even so, design judgments such as "what is this agent for," "what to leave in the knowledge," and "how far to grant tools authority" haven't been put into a form you can declare as configuration. The basis of these judgments will continue to live in the commit history and work logs of one's own repository. The experience of having built a self-built harness leaves behind, in your hands, knowledge that doesn't lose its value when you migrate to managed. With the arrival of managed harness, the boundary between "the layers we build ourselves" and "the layers only human judgment can carry" has become more clearly visible than before, you might say.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>agents</category>
      <category>ai</category>
      <category>bedrock</category>
    </item>
    <item>
      <title>Claude Managed Agents: The Layer That Disappears, The Layer That Stays — A View from Business Automation Agents</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Tue, 05 May 2026 07:29:36 +0000</pubDate>
      <link>https://dev.to/aws-builders/claude-managed-agents-the-layer-that-disappears-the-layer-that-stays-a-view-from-business-4n0</link>
      <guid>https://dev.to/aws-builders/claude-managed-agents-the-layer-that-disappears-the-layer-that-stays-a-view-from-business-4n0</guid>
      <description>&lt;p&gt;On April 8, 2026, Anthropic released Claude Managed Agents. The official framing is "meta-harness," and the engineering blog reports infrastructure-level improvements: p50 TTFT down about 60%, p95 down more than 90%. TTFT is the time from request to first response, where p50 is the median and p95 covers the slowest 5%. Cut the median by 60%, cut the slow tail by 90%. These aren't numbers you get from a minor optimization — they're the kind of numbers an architectural change produces. Early adopters include Notion, Rakuten, Asana, Sentry, and Vibecode.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.anthropic.com/engineering/managed-agents" rel="noopener noreferrer"&gt;https://www.anthropic.com/engineering/managed-agents&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are already several Japanese articles covering this — terminology breakdowns by watany, builder/user harness classifications by Mr. Katayama (paiza), and trial reports by kumamo_tone and galirage, among others.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zenn.dev/watany/articles/d8b692bbca65a3" rel="noopener noreferrer"&gt;https://zenn.dev/watany/articles/d8b692bbca65a3&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://note.com/rk611/n/n8424c56f4fa5" rel="noopener noreferrer"&gt;https://note.com/rk611/n/n8424c56f4fa5&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zenn.dev/kumamo_tone/articles/365845d65e6cf4" rel="noopener noreferrer"&gt;https://zenn.dev/kumamo_tone/articles/365845d65e6cf4&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zenn.dev/galirage/articles/claude-managed-agents-quickstart" rel="noopener noreferrer"&gt;https://zenn.dev/galirage/articles/claude-managed-agents-quickstart&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But the existing discussion almost entirely assumes coding agents. Notion (coding, spreadsheets, slides), Sentry (bug → PR automation), Vibecode (code generation infrastructure) — they all line up as coding-style use cases. For someone building business automation agents (morning briefings, monthly accounting, QA operations, style audits) with Markdown and MCP, where does Managed Agents fit? That perspective hasn't really been laid out yet.&lt;/p&gt;

&lt;p&gt;I run a personal repository where &lt;code&gt;agents/&lt;/code&gt; holds 15 role-specific agent instructions, &lt;code&gt;knowledge/&lt;/code&gt; holds 40+ knowledge files, and &lt;code&gt;prompts/&lt;/code&gt; holds task templates. I run my work through Claude Desktop and MCP. The "harness engineering with Markdown only" idea I wrote about in a previous article is exactly this kind of setup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/aws-builders/harness-engineering-with-nothing-but-markdown-g6b"&gt;https://dev.to/aws-builders/harness-engineering-with-nothing-but-markdown-g6b&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With Managed Agents arriving, what happens to this self-hosted harness? Does all of it get replaced? Just part? Or is this a different conversation entirely?&lt;/p&gt;

&lt;p&gt;What this article covers is the boundary between the layer Managed Agents provides and the layer you keep yourself. It's not just about what's technically possible to put on Managed Agents — it's also about why, even when it's possible, keeping it yourself can still be the better call. I'll lay this out in five points. The latter half of the article uses my own agents as a worked example, classifying them as "fits / partial fit / doesn't fit."&lt;/p&gt;

&lt;h2&gt;
  
  
  Coding Agents and Business Automation Agents Make Different Demands on the Harness
&lt;/h2&gt;

&lt;p&gt;Before evaluating Managed Agents from a business automation angle, it's worth checking the assumptions behind the existing case studies.&lt;/p&gt;

&lt;p&gt;The early adopters Anthropic highlights are these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Notion: parallel coding, spreadsheet, and slide tasks delegated within the Notion workspace&lt;/li&gt;
&lt;li&gt;Rakuten: specialist agents per department, each shipped within a week&lt;/li&gt;
&lt;li&gt;Asana: AI Teammates picking up tasks inside projects&lt;/li&gt;
&lt;li&gt;Sentry: bugs detected and turned autonomously into pull requests&lt;/li&gt;
&lt;li&gt;Vibecode: code generation infrastructure with 10x faster setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mix runs from coding-leaning examples (Notion, Sentry, Vibecode) to department-business types (Rakuten, Asana), but they all share one thing: long-running, autonomous tasks that involve continuous operations on files and resources. Managed Agents features like "$0.08/session-hour," "checkpointing for long-running tasks," and "sandboxed code execution" are tuned for this kind of workload.&lt;/p&gt;

&lt;p&gt;On the other hand, here are the kinds of uses you might see from someone building business automation agents with Markdown and MCP. Drawing from my own active and planned setups:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Morning briefings (calendar, Slack mentions, Gmail, news summaries pulled together each morning)&lt;/li&gt;
&lt;li&gt;Monthly accounting support (pulling transactions from freee, aggregating in Excel, sharing with stakeholders) — under construction&lt;/li&gt;
&lt;li&gt;QA operations (reviewing MagicPod test runs, recording problematic test cases in Confluence, sharing in Slack)&lt;/li&gt;
&lt;li&gt;Style audits (checking article drafts against &lt;code&gt;writing-style-guide.md&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;1on1 prep (consolidating past notes, organizing discussion points)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lining the two up, the demands on the harness can be sorted into roughly four points:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Coding agent&lt;/th&gt;
&lt;th&gt;Business automation agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Main targets of operation&lt;/td&gt;
&lt;td&gt;File system + repositories&lt;/td&gt;
&lt;td&gt;SaaS APIs (Slack, Calendar, Gmail, freee, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution time&lt;/td&gt;
&lt;td&gt;Long-running tasks of minutes to hours&lt;/td&gt;
&lt;td&gt;Repeated short tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Where state lives&lt;/td&gt;
&lt;td&gt;File state inside the container persists&lt;/td&gt;
&lt;td&gt;State lives on the SaaS side, local state is transient&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Triggers&lt;/td&gt;
&lt;td&gt;Initiated by humans (chat UI)&lt;/td&gt;
&lt;td&gt;Mix of schedules, events, and human prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The way Managed Agents is designed fits use cases where the file system persists and tasks run for a long time. Sandboxes, checkpoints, and session-runtime billing all make sense in that context. For business automation use — many short tasks each calling SaaS APIs — most of these features won't get fully used.&lt;/p&gt;

&lt;p&gt;This isn't to say "business automation doesn't fit Managed Agents." Different use cases mean different things you actually get out of Managed Agents. With that in mind, the next sections cover how to combine it with a self-hosted harness.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Managed Agents meta-harness Actually Provides
&lt;/h2&gt;

&lt;p&gt;Reading the official engineering post "Scaling Managed Agents: Decoupling the brain from the hands" (April 8, 2026) makes the design philosophy of Managed Agents clear.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.anthropic.com/engineering/managed-agents" rel="noopener noreferrer"&gt;https://www.anthropic.com/engineering/managed-agents&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The opening sentence captures the whole thing.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Harnesses encode assumptions that go stale as models improve.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As a concrete example, Sonnet 4.5 had a behavior where it would wrap up tasks early just before hitting the context limit ("context anxiety"), and harnesses implemented context resets to compensate. Run the same harness on Opus 4.5, and the behavior is gone — the resets become dead weight. Corrections you bake into the harness become unnecessary as the model gets smarter. That's the observation.&lt;/p&gt;

&lt;p&gt;So Anthropic chose to abstract the harness itself. Just as an OS virtualizes hardware behind abstractions like &lt;code&gt;process&lt;/code&gt; and &lt;code&gt;file&lt;/code&gt;, Managed Agents separates an agent into three pieces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session: an append-only event log. The source of truth for everything that happened&lt;/li&gt;
&lt;li&gt;Harness: a stateless loop that calls Claude and routes tool calls&lt;/li&gt;
&lt;li&gt;Sandbox: the execution environment for code and file operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The harness calls the sandbox through a simple &lt;code&gt;execute(name, input) → string&lt;/code&gt; interface. Containers, smartphones, Pokémon emulators — anything fits behind the same abstraction, as the official post puts it.&lt;/p&gt;

&lt;p&gt;Where this decoupling pays off is that containers go from "pet" to "cattle." A pet is a uniquely cared-for, named individual; cattle are interchangeable, managed by number — that's the infrastructure ops metaphor. If a container dies, the harness receives it as a tool call error and provisions a new container. If the harness itself dies, you can &lt;code&gt;wake(sessionId)&lt;/code&gt;, call &lt;code&gt;getSession(id)&lt;/code&gt; to retrieve the event log, and resume from the last event. Only the session log is persisted. That's the design.&lt;/p&gt;

&lt;p&gt;The TTFT improvements mentioned at the start (p50 down about 60%, p95 down more than 90%) come from this decoupling. Inference can begin without waiting for container provisioning.&lt;/p&gt;

&lt;p&gt;Anthropic positions its own service as a "meta-harness." Quoting the conclusion of the article:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Managed Agents is a meta-harness in the same spirit, unopinionated about the specific harness that Claude will need in the future.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In other words, what Anthropic provides is "a stable interface that any harness can sit on top of" (the virtualization of session/harness/sandbox), not a prescription for "this is your harness." Claude Code, task-specific harnesses, custom harnesses — all of them are meant to run on top of it.&lt;/p&gt;

&lt;p&gt;That's what Managed Agents actually is.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://platform.claude.com/docs/en/managed-agents/overview" rel="noopener noreferrer"&gt;https://platform.claude.com/docs/en/managed-agents/overview&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next section moves on to what sits on top: the contents of the self-hosted harness.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Self-Hosted Harness Is Two Layers: Managed Agents Replaces the Bottom Half, You Keep the Top
&lt;/h2&gt;

&lt;p&gt;What the official meta-harness design provides is the three abstractions: &lt;code&gt;session / harness / sandbox&lt;/code&gt;. In other words, the OS layer that makes an agent run — the substrate that hosts what's above, like processes, file systems, and memory.&lt;/p&gt;

&lt;p&gt;So what does a self-hosted harness put on top of that? In my own setup, it looks like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ikenyal-ai-agents/
├── agents/                  # role-specific agent instructions
│   ├── executive-assistant.md
│   └── ...                  # other role definitions
├── knowledge/               # knowledge base
│   ├── writing-style-guide.md
│   ├── article-strategy.md
│   └── ...                  # various contexts
├── prompts/                 # task templates
│   ├── morning-briefing.md
│   └── 1on1-prep.md
├── tasks/                   # task definitions
├── scripts/                 # analysis and automation scripts
├── docs/                    # work logs and operational docs
└── README.md                # root instructions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These sit on a different layer than the OS layer Managed Agents provides. They express what the agent "knows," "how it should behave," and "what's off limits" — the territory you might call the knowledge layer.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Contents&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OS layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed Agents (meta-harness)&lt;/td&gt;
&lt;td&gt;agent loop, tool execution, sandbox, session persistence&lt;/td&gt;
&lt;td&gt;the three abstractions of &lt;code&gt;session / harness / sandbox&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Knowledge layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;self-hosted repository&lt;/td&gt;
&lt;td&gt;agent behavior instructions, organizational context, domain knowledge, style conventions&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;agents/&lt;/code&gt;, &lt;code&gt;knowledge/&lt;/code&gt;, &lt;code&gt;prompts/&lt;/code&gt;, &lt;code&gt;CLAUDE.md&lt;/code&gt;, &lt;code&gt;AGENTS.md&lt;/code&gt;, &lt;code&gt;SKILL.md&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The existing Japanese articles (watany's terminology breakdown, Mr. Katayama's builder/user harness classification) discuss the harness as one big thing. To read accurately what Managed Agents actually changes, you need to see this two-layer structure.&lt;/p&gt;

&lt;p&gt;What Managed Agents replaces is the OS layer only. The knowledge layer stays as is.&lt;/p&gt;

&lt;p&gt;That was the technical-fact part. From here, this article gets into its main argument. Managed Agents lets you register Skills (&lt;code&gt;SKILL.md&lt;/code&gt;) and agent definitions, so technically you can put parts of the knowledge layer on it too. Even so, why is keeping it self-hosted the better choice? The next section breaks it down across five points.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why You Shouldn't Hand the Knowledge Layer to Anthropic — Five Points
&lt;/h2&gt;

&lt;p&gt;Reading the official Managed Agents docs, you can register the &lt;code&gt;mcp_servers&lt;/code&gt; definition, the choice of &lt;code&gt;tools&lt;/code&gt;, the &lt;code&gt;system&lt;/code&gt; prompt, and Skills (&lt;code&gt;SKILL.md&lt;/code&gt;) all as part of the agent definition. Technically, the knowledge layer can ride on Managed Agents too.&lt;/p&gt;

&lt;p&gt;The argument of this article is that even so, keeping it self-hosted is the better call. Five reasons.&lt;/p&gt;

&lt;h3&gt;
  
  
  Point 1: Where the data lives changes
&lt;/h3&gt;

&lt;p&gt;Business automation agents often include organizational context. In my case, &lt;code&gt;agents/&lt;/code&gt; contains things like the structure of my organization, operational know-how, contact information, and various judgment criteria for business decisions. If you register all of this as part of Managed Agents, it becomes a resource on Anthropic's side. Until you explicitly delete it, it stays there.&lt;/p&gt;

&lt;p&gt;What's worth being aware of is the data retention character of Managed Agents. The official "API and data retention" doc states clearly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://platform.claude.com/docs/en/build-with-claude/api-and-data-retention" rel="noopener noreferrer"&gt;https://platform.claude.com/docs/en/build-with-claude/api-and-data-retention&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Claude Managed Agents is a stateful resource. You can delete session transcripts, but there is no automatic deletion.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There's a similar note for Skills (&lt;code&gt;SKILL.md&lt;/code&gt;):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview" rel="noopener noreferrer"&gt;https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Agent Skills is not covered by ZDR arrangements. Data is retained according to the feature's standard retention policy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;ZDR (Zero Data Retention) is a contractual option Anthropic offers to enterprise API customers who go through Anthropic's review and sign individually. It guarantees that data sent through the API isn't retained on Anthropic's side. It's often cited as a precondition for handling internal data with AI. Managed Agents and Agent Skills are out of scope even under that strictest contract — that's the current positioning.&lt;/p&gt;

&lt;p&gt;Whether or not you have a ZDR arrangement, your agent definitions, sessions, and skills are all retained on Anthropic's side. Unless you explicitly delete them, they don't go away on their own.&lt;/p&gt;

&lt;p&gt;This isn't a "this is absolutely a no-go" kind of statement — it's a question of where the data lives and how you can move it around. Git management also typically uses external services like GitHub, so the storage being external is the same. The difference is that with git, you can choose the storage (GitHub, GitLab, self-hosted, or local-only without a remote), and the content stays as Markdown that can move anywhere. Once you register something in Managed Agents, the location is Anthropic, and the format is Anthropic's proprietary JSON structure — that's the fixed shape. When designing an agent that handles internal data, this difference becomes a factor in the decision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Point 2: Friction in the Edit-and-Run Workflow
&lt;/h3&gt;

&lt;p&gt;With a self-hosted harness, the update cycle for the knowledge layer goes like this. Open the editor, edit &lt;code&gt;agents/executive-assistant.md&lt;/code&gt;, save. Claude Desktop picks it up on the next session — instant reflection. The whole thing takes seconds.&lt;/p&gt;

&lt;p&gt;With Managed Agents, you edit the file, then make an API call (&lt;code&gt;create / update agent&lt;/code&gt;) and restart the session. It's not instant — the API call adds a step in the middle.&lt;/p&gt;

&lt;p&gt;Where this cost actually shows up is when edits happen "the moment you notice something while using it." While running an agent, you realize "this instruction is too verbose" or "I want to add this here," switch to the editor, fix the relevant file, save, see it on the next message — that flow happens routinely.&lt;/p&gt;

&lt;p&gt;The bigger difference isn't the time itself, but whether the flow of thought breaks. With a self-hosted harness, edit-and-reflect is part of the "using it" flow. With Managed Agents, the API-call step interrupts. There's an option to write a "Markdown → API sync script" yourself, but that script then becomes its own maintenance target.&lt;/p&gt;

&lt;h3&gt;
  
  
  Point 3: Losing the Benefits of Git Management
&lt;/h3&gt;

&lt;p&gt;The knowledge layer is a continuous loop of trial and error. You rewrite an agent's instructions, see what happens, rewrite again. &lt;code&gt;git diff&lt;/code&gt; shows you what changed, &lt;code&gt;git log&lt;/code&gt; lets you trace history, &lt;code&gt;git blame&lt;/code&gt; tells you why something was added. If you don't like where it's going, branch off and experiment.&lt;/p&gt;

&lt;p&gt;None of this works through a Managed Agents agent-definition API. Anthropic's side likely has version control of some kind, but the wider &lt;code&gt;git&lt;/code&gt; toolchain ecosystem (GitHub, PRs, CI, code review, cherry-pick, rebase) doesn't apply.&lt;/p&gt;

&lt;p&gt;The evolution of the knowledge layer has value when you can look back at it through git history. Being able to trace "when and why was that one line added to &lt;code&gt;executive-assistant.md&lt;/code&gt;" alongside the commit message — that's a small thing that quietly props up your operational confidence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Point 4: Open-Standard Portability
&lt;/h3&gt;

&lt;p&gt;This is the point I personally weight the most.&lt;/p&gt;

&lt;p&gt;In my previous DESIGN.md article, I covered how &lt;code&gt;AGENTS.md&lt;/code&gt; and &lt;code&gt;SKILL.md&lt;/code&gt; are open standards.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/aws-builders/agentsmd-skillmd-designmd-how-ai-instructions-split-into-three-layers-d0g"&gt;https://dev.to/aws-builders/agentsmd-skillmd-designmd-how-ai-instructions-split-into-three-layers-d0g&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;AGENTS.md&lt;/code&gt; is jointly promoted by OpenAI, Google, Sourcegraph, Cursor, Factory and others, and was donated to the Linux Foundation in December 2025. &lt;code&gt;SKILL.md&lt;/code&gt; is the core of the Agent Skills standardized by agentskills.io.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agents.md/" rel="noopener noreferrer"&gt;https://agents.md/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agentskills.io/" rel="noopener noreferrer"&gt;https://agentskills.io/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Multiple AI agents — Codex, Claude Code, Cursor, GitHub Copilot — read the same files.&lt;/p&gt;

&lt;p&gt;Managed Agents agent definitions, on the other hand, are an Anthropic-proprietary JSON structure that bundles fields like &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;system&lt;/code&gt;, &lt;code&gt;tools&lt;/code&gt;, &lt;code&gt;mcp_servers&lt;/code&gt;, &lt;code&gt;skills&lt;/code&gt;, etc. Registering &lt;code&gt;SKILL.md&lt;/code&gt; to Managed Agents makes it work, but it's a registration confined to Anthropic — Codex and Cursor can't see it.&lt;/p&gt;

&lt;p&gt;That's less "vendor lock-in" and more like a re-lock-in of something that just got standardized into one specific implementation. Against the trend of &lt;code&gt;AGENTS.md&lt;/code&gt; / &lt;code&gt;SKILL.md&lt;/code&gt; spreading as open standards, choosing to confine your own knowledge to a vendor-specific format doesn't have a compelling reason to actively pick.&lt;/p&gt;

&lt;p&gt;A repo that holds &lt;code&gt;AGENTS.md&lt;/code&gt; / &lt;code&gt;SKILL.md&lt;/code&gt; itself is a way to keep a "neutral location" — referenceable equally from Managed Agents, Codex, Cursor, and other AI agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Point 5: Speed of Testing and Iteration
&lt;/h3&gt;

&lt;p&gt;In the "growing it" phase of an agent, cycle speed is what determines quality. Rewrite a line of instruction, try it, see what happens, rewrite again. The faster that loop, the higher the agent's accuracy ends up.&lt;/p&gt;

&lt;p&gt;With a self-hosted harness (Claude Desktop + Markdown), you rewrite, save, see it on the next message — seconds.&lt;/p&gt;

&lt;p&gt;With Managed Agents, you call the agent-update API, rebuild the environment, restart the session, then test. Every cycle has API-mediated steps in it. For a "growing it" phase, that tends to work against you.&lt;/p&gt;

&lt;p&gt;For long-running production tasks (sessions of multiple hours and up), Managed Agents' stability and scalability really shine. But many business automation agents stay in the "fed and grown daily" phase for a long time. My &lt;code&gt;executive-assistant.md&lt;/code&gt; has been getting some kind of weekly tweak for several months now.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Comes into View When You Bundle These Five
&lt;/h3&gt;

&lt;p&gt;"Can be put on" and "should be put on" are different problems. Just as the official Managed Agents design points toward a &lt;code&gt;meta-harness&lt;/code&gt;, the knowledge layer that sits on top can also reasonably stay outside the meta-harness, judged across the points above.&lt;/p&gt;

&lt;p&gt;Just as the official side chose three-way separation (session / harness / sandbox) and a design that doesn't dictate the shape of what goes on top, the user side can equally make the choice of "keep the knowledge layer self-hosted" without dictating its shape. That feels like a natural conclusion when you see things through the two-layer structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Classifying My Own Agents into "Fits / Partial / Doesn't Fit"
&lt;/h2&gt;

&lt;p&gt;Mapping the discussion onto my own agents: classifying them across "fits / partial fit / doesn't fit" on Managed Agents, the patterns roughly look like this.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type of use&lt;/th&gt;
&lt;th&gt;Main tools&lt;/th&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Personal-assistant style (calendar, mail, chat, local files)&lt;/td&gt;
&lt;td&gt;calendar, mail, chat, ticket management, web search, local file ops&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Main MCPs are remote; local file-ops MCPs don't exist as remote, that part doesn't fit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure-ops support&lt;/td&gt;
&lt;td&gt;cloud APIs, chat, docs&lt;/td&gt;
&lt;td&gt;Fits&lt;/td&gt;
&lt;td&gt;All SaaS, long-running tasks are also a comfortable assumption&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project / org-ops support&lt;/td&gt;
&lt;td&gt;chat, ticket management&lt;/td&gt;
&lt;td&gt;Fits&lt;/td&gt;
&lt;td&gt;All main tools complete via remote MCP, no local dependency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test-quality ops&lt;/td&gt;
&lt;td&gt;test automation tool MCPs, chat, docs&lt;/td&gt;
&lt;td&gt;Doesn't fit&lt;/td&gt;
&lt;td&gt;Main test automation tool MCPs are local stdio-only, so they can't be called from Managed Agents which assumes remote&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Department functions (accounting, legal, etc.)&lt;/td&gt;
&lt;td&gt;SaaS APIs, chat, local Excel or doc references&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;The SaaS API side is fine on remote MCPs, but local Excel and doc references don't ride along, and how internal data is handled also needs an organizational governance call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conversational thinking-organization (morning briefings, 1on1 prep, etc.)&lt;/td&gt;
&lt;td&gt;calendar, chat, web search&lt;/td&gt;
&lt;td&gt;Doesn't fit&lt;/td&gt;
&lt;td&gt;Designed in tandem with the Claude Desktop conversational experience (digging in by dialog) — autonomous Managed Agents isn't the right fit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What's notable is that the knowledge layer stays self-hosted across every pattern. Even the agents I judged as "fits" still keep their role definitions, organizational context, and style conventions in the self-hosted repo. What's placed on the Managed Agents side is the OS-layer functionality (sandbox, harness loop, session persistence, auth vault).&lt;/p&gt;

&lt;p&gt;This is the concrete instance of the two-layer structure shown in earlier sections. The OS layer is something you can hand off to Anthropic; the knowledge layer stays in your hands. Per agent, you decide "OS layer handed off / OS layer kept" — that's the shape of the call.&lt;/p&gt;

&lt;p&gt;The judgment axes for fitting your own agents into this look like four:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whether the main tools complete on remote MCPs, or depend on local tools&lt;/li&gt;
&lt;li&gt;Whether the data being handled includes internal-organization data, or doesn't&lt;/li&gt;
&lt;li&gt;Long-running tasks, or repeated short tasks&lt;/li&gt;
&lt;li&gt;"Growing it" phase, or "operating it" phase&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Looking through these, decide per agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anticipated Counterarguments and Responses
&lt;/h2&gt;

&lt;p&gt;Three counterarguments, taken in turn.&lt;/p&gt;

&lt;h3&gt;
  
  
  Counter 1: If the main MCPs are remote-capable, why not put it all on?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Main MCPs (Slack, Atlassian, Calendar, Gmail, freee) are remote-capable, so why not put all the business automation on Managed Agents?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;True — as of May 2026, many of the main MCPs you'd want for business automation are remote-capable: Atlassian Rovo, Slack, Google Calendar/Gmail, freee, and so on. The pool of "agents you could put on it" is growing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://platform.claude.com/docs/en/managed-agents/mcp-connector" rel="noopener noreferrer"&gt;https://platform.claude.com/docs/en/managed-agents/mcp-connector&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://platform.claude.com/docs/en/agents-and-tools/remote-mcp-servers" rel="noopener noreferrer"&gt;https://platform.claude.com/docs/en/agents-and-tools/remote-mcp-servers&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That said, what this article argues isn't "don't put it on because you can't" — it's "even when you can, you don't always should." The OS layer is a candidate for putting on; whether to put the knowledge layer on it has to be judged individually across the five points (where data lives, edit workflow, git management, open standards, iteration speed).&lt;/p&gt;

&lt;h3&gt;
  
  
  Counter 2: $0.08/hour seems acceptable, doesn't it?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;$0.08/hour seems acceptable, doesn't it?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For short tasks, no problem. If a morning briefing finishes in 10 minutes, then 20 working days × 10 minutes — about 3.3 hours × $0.08 = $0.26, plus token billing. At that scale, fine.&lt;/p&gt;

&lt;p&gt;The question is whether you move the agents you use daily on Claude Desktop. Use that's typical of "open all day during work hours" doesn't translate to Managed Agents cleanly: session billing and token billing scale directly with running time. Same usage, same outputs — the cost is likely to go up.&lt;/p&gt;

&lt;p&gt;"Put it on Managed Agents / use it on Claude Desktop / use both" is something to decide per agent based on use case and cost structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Counter 3: With existing case studies, why not put business automation on it too?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;With case studies like Notion, Sentry, Vibecode, why not put business automation on it too?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These case studies are all coding-style (code generation, bug fixing, spreadsheet operations). Business automation agents typify a different shape (SaaS integration, monthly reports, QA ops) — the demands on the harness are different, as covered earlier.&lt;/p&gt;

&lt;p&gt;And in fact, none of these case studies are entirely closed inside Managed Agents either. Notion runs in Notion, Sentry in Sentry's infrastructure, Vibecode on Vibecode's platform — each with their own knowledge and UX. Managed Agents functions as the OS layer underneath. That lines up with the two-layer structure this article argues for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Place the First Move
&lt;/h2&gt;

&lt;p&gt;If you're going to actually start somewhere, this kind of flow makes sense.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pick out the SaaS-integration-centric agents in your own collection&lt;/strong&gt;: agents without local file ops or desktop integration are the candidates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check whether the MCPs each agent uses are remote-capable&lt;/strong&gt;: Slack, Atlassian, Calendar, Gmail, freee are already remote; others, check individually&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Put just one agent on Managed Agents and run it&lt;/strong&gt;: don't migrate everything at once — get the operational feel from one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep the knowledge layer in the self-hosted repository&lt;/strong&gt;: register the agent definition through the API, but keep &lt;code&gt;agents/&lt;/code&gt;, &lt;code&gt;knowledge/&lt;/code&gt;, &lt;code&gt;prompts/&lt;/code&gt; in git. Treat the Markdown as canonical, the API registration as a mirror&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expand the put-on range gradually, or don't&lt;/strong&gt;: after running one, if cost, speed, and edit workflow are fine, move to the next; if they aren't, keep it self-hosted&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;"All rewritten" and "all self-hosted" are both extremes. Per agent — that feels like the realistic landing.&lt;/p&gt;

&lt;p&gt;Harnesses encode assumptions that go stale as models improve. Those are the official words. With Managed Agents arriving, self-hosted OS-layer implementations do go stale. There's no longer a need to write the sandbox, agent loop, and session persistence yourself.&lt;/p&gt;

&lt;p&gt;But the knowledge layer above continues to be the place where organizational and personal context lives. &lt;code&gt;AGENTS.md&lt;/code&gt; and &lt;code&gt;SKILL.md&lt;/code&gt; are referenced as open standards by multiple AI agents. Managed by git, edited in the editor in seconds. Things you've grown like a &lt;code&gt;writing-style-guide.md&lt;/code&gt; of your own keep evolving in your own repository, not as a stateful resource on Anthropic's side.&lt;/p&gt;

&lt;p&gt;The next step in harness engineering starts from thinking in layers.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>claude</category>
      <category>mcp</category>
    </item>
    <item>
      <title>AGENTS.md, SKILL.md, DESIGN.md: How AI Instructions Split into Three Layers</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Sat, 02 May 2026 21:35:11 +0000</pubDate>
      <link>https://dev.to/aws-builders/agentsmd-skillmd-designmd-how-ai-instructions-split-into-three-layers-d0g</link>
      <guid>https://dev.to/aws-builders/agentsmd-skillmd-designmd-how-ai-instructions-split-into-three-layers-d0g</guid>
      <description>&lt;p&gt;In April 2026, Google Labs released a spec called &lt;code&gt;DESIGN.md&lt;/code&gt;. It's a design system specification readable by AI agents, packaged with a CLI validator: &lt;code&gt;npx @google/design.md lint&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;DESIGN.md&lt;/code&gt; in the picture, we now have three different file types for instructing AI agents. &lt;code&gt;AGENTS.md&lt;/code&gt; has been spreading as an industry standard since 2025 (jointly developed by OpenAI, Google, Sourcegraph, Cursor, and Factory; donated to the Linux Foundation in December 2025). &lt;code&gt;SKILL.md&lt;/code&gt; sits at the core of Anthropic's Claude Skills. And now &lt;code&gt;DESIGN.md&lt;/code&gt;. The three handle different concerns and don't overlap.&lt;/p&gt;

&lt;p&gt;This article is for developers using coding agents like Claude Code, Cursor, or Codex in their work, and for tech leads operating natural-language instruction files like CLAUDE.md and style guides. If your team is doing Spec-Driven Development (SDD), this should also reach you.&lt;/p&gt;

&lt;p&gt;What I want to lay out is two things: how AI instructions are starting to split across three layers — behavior, individual tasks, and visual appearance — and how that connects with SDD as a parallel movement.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Old Pattern: Natural-Language Documents
&lt;/h2&gt;

&lt;p&gt;A few years into the ChatGPT era, most engineers have written some form of "rules I want the AI to follow" in a Markdown file. CLAUDE.md, styleguide.md, CONTRIBUTING.md, internal coding conventions. The locations vary, but the format is roughly the same: unstructured natural language.&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;writing-style-guide.md&lt;/code&gt; file I've been building over the past few months is a typical example. It's a style guide I use when writing technical articles with Claude — a list of patterns common in AI-generated text, written down as forbidden phrases. By making Claude Desktop read it every session, the tone of my output stays consistent. It's part of a personal repository (&lt;code&gt;ikenyal-ai-agents&lt;/code&gt;) I use as the harness for my business automation agents — the one I covered in my previous post.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/aws-builders/harness-engineering-with-nothing-but-markdown-g6b"&gt;https://dev.to/aws-builders/harness-engineering-with-nothing-but-markdown-g6b&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The file contains roughly 150 lines: rules like "don't use em dashes," "avoid invitations like 'let's try…!'," "drop AI-style preambles like 'what's interesting is…'." The same repository has 15 instruction files under &lt;code&gt;agents/&lt;/code&gt;, organized by team and role: &lt;code&gt;executive-assistant.md&lt;/code&gt;, &lt;code&gt;sre-support.md&lt;/code&gt;, &lt;code&gt;qa-support.md&lt;/code&gt;, &lt;code&gt;accounting.md&lt;/code&gt;. Each describes "the assumptions to operate under as this role" in plain natural language.&lt;/p&gt;

&lt;p&gt;This approach has clear benefits. You can articulate tone, stance, and implicit rules. New team members can read the files and pick up the expectations. With CLAUDE.md, Claude Code reads it every session, so persona-level instructions land consistently.&lt;/p&gt;

&lt;p&gt;There are limits, too. First, validation falls on humans. Whether a rule was followed or not gets decided by a human reading the output. Second, individual judgment leaks in. "Write politely" means different things to different reviewers.&lt;/p&gt;

&lt;p&gt;The third limit is the actual subject of this article. Rules that are formally verifiable (forbidden phrases, em-dash usage, specific pattern matches) and rules that require judgment (tone, structural choices, how to open with empathy) sit in the same file. So even the verifiable parts end up depending on human review. That's the problem the three new file types are addressing.&lt;/p&gt;

&lt;h2&gt;
  
  
  New Type 1: How DESIGN.md (Google Labs) Specifies Visual Appearance
&lt;/h2&gt;

&lt;p&gt;On April 10, 2026, Google Labs published the &lt;code&gt;DESIGN.md&lt;/code&gt; specification at &lt;code&gt;google-labs-code/design.md&lt;/code&gt;. As of early May, the repo has over 11,000 stars. It's the reference implementation for Google Stitch (&lt;code&gt;stitch.withgoogle.com&lt;/code&gt;), an AI-driven UI generation product.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/google-labs-code/design.md" rel="noopener noreferrer"&gt;https://github.com/google-labs-code/design.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The specification doc lives on the Stitch side.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://stitch.withgoogle.com/docs/design-md/specification" rel="noopener noreferrer"&gt;https://stitch.withgoogle.com/docs/design-md/specification&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What &lt;code&gt;DESIGN.md&lt;/code&gt; covers is the design system specification. You write machine-readable design tokens in YAML at the top of the file (colors, typography, spacing, components), and human-readable design intent in the Markdown body underneath. Both live in the same file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Heritage&lt;/span&gt;
&lt;span class="na"&gt;colors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;primary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#1A1C1E"&lt;/span&gt;
  &lt;span class="na"&gt;tertiary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#B8422E"&lt;/span&gt;
&lt;span class="na"&gt;typography&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;h1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;fontFamily&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Public Sans&lt;/span&gt;
    &lt;span class="na"&gt;fontSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3rem&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## Overview&lt;/span&gt;

Architectural Minimalism meets Journalistic Gravitas.

&lt;span class="gu"&gt;## Colors&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Primary (#1A1C1E): Deep ink for headlines and core text.
&lt;span class="p"&gt;-&lt;/span&gt; Tertiary (#B8422E): "Boston Clay", the sole driver for interaction.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The headline feature of this format is the CLI validator that ships with it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @google/design.md lint DESIGN.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This checks token reference integrity, WCAG contrast ratios, and structural rule compliance, returning the result as JSON. Wire it into CI and you can verify design system consistency on every pull request. There's also a &lt;code&gt;diff&lt;/code&gt; command that compares two &lt;code&gt;DESIGN.md&lt;/code&gt; files and returns token-level changes in a structured form. Design system version control — historically a manual process — gains a verifiable layer.&lt;/p&gt;

&lt;p&gt;For Japanese UIs, the Google Labs spec alone falls short. It doesn't define the typography requirements specific to Japanese (CJK font fallback chains, line height, letter-spacing, kinsoku shori, mixed typesetting). The gap is filled by &lt;code&gt;kzhrknt/awesome-design-md-jp&lt;/code&gt;, which publishes Japan-localized &lt;code&gt;DESIGN.md&lt;/code&gt; files for over 10 services including Apple Japan, SmartHR, freee, note, MUJI, Mercari, LINE, and Toyota. For Japanese products, using both the Google Labs spec and the Japan edition together is the practical approach.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kzhrknt/awesome-design-md-jp" rel="noopener noreferrer"&gt;https://github.com/kzhrknt/awesome-design-md-jp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What &lt;code&gt;DESIGN.md&lt;/code&gt; carries is the design system that used to be scattered across Figma files and style guide PDFs, now consolidated into a single file with both machine-readable and human-readable parts. Think of it as the spec foundation that lets AI agents generate UIs with a consistent look every time.&lt;/p&gt;

&lt;h2&gt;
  
  
  New Type 2: How SKILL.md (Anthropic) and AGENTS.md Specify Behavior
&lt;/h2&gt;

&lt;p&gt;While &lt;code&gt;DESIGN.md&lt;/code&gt; covers "appearance," &lt;code&gt;SKILL.md&lt;/code&gt; and &lt;code&gt;AGENTS.md&lt;/code&gt; cover "behavior" — defining what the agent is trying to do, how it should proceed, and what it must not do.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SKILL.md&lt;/code&gt; is the file format standardized by agentskills.io as part of the Agent Skills open standard. Anthropic's Claude Skills is one implementation of this standard; the same &lt;code&gt;SKILL.md&lt;/code&gt; works across Claude Code, Claude.ai, and the Agent SDK. Because it's standards-compliant, the same file is also readable by other agents like OpenClaw and Hermes. The structure: declare metadata (skill name, description, allowed tools) in the YAML at the top of the file, and write the task procedure or domain knowledge in the Markdown body below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agentskills.io/home" rel="noopener noreferrer"&gt;https://agentskills.io/home&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A clear example of &lt;code&gt;SKILL.md&lt;/code&gt; is &lt;code&gt;conorbronsdon/avoid-ai-writing&lt;/code&gt;. It's an English-only skill that detects and rewrites AI patterns in English text — transition phrases like "Moreover," significance inflation like "watershed moment," and roundabout verb constructions like "serves as." It uses a 100+ word replacement table organized into 3 tiers (Tier 1 always replaces, Tier 2 flags when 2+ words appear in the same paragraph, Tier 3 flags only at high density), and audits 36 pattern categories. Two modes: &lt;code&gt;detect&lt;/code&gt; and &lt;code&gt;rewrite&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/conorbronsdon/avoid-ai-writing" rel="noopener noreferrer"&gt;https://github.com/conorbronsdon/avoid-ai-writing&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What sets it apart from a one-shot prompt is the structured audit it returns. In &lt;code&gt;rewrite&lt;/code&gt; mode, you get four discrete sections: identified issues, the rewritten text, a summary of changes, and a second-pass audit. What changed and why becomes transparent.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;AGENTS.md&lt;/code&gt; covers the agent's overall behavior. Project assumptions, roles, prohibitions, escalation rules. As I mentioned at the top, it started with the Amp team at Sourcegraph; today OpenAI, Google, Cursor, and Factory jointly drive it, and it was donated to the Linux Foundation in December 2025. Think of &lt;code&gt;CLAUDE.md&lt;/code&gt; as the Claude-specific version of &lt;code&gt;AGENTS.md&lt;/code&gt;. Claude Code reads &lt;code&gt;CLAUDE.md&lt;/code&gt; rather than &lt;code&gt;AGENTS.md&lt;/code&gt; in its spec, but the pattern recommended by &lt;code&gt;agents.md&lt;/code&gt; is to make &lt;code&gt;AGENTS.md&lt;/code&gt; the actual file and symlink &lt;code&gt;CLAUDE.md&lt;/code&gt; to it. In the personal repository I introduced earlier, the files under &lt;code&gt;agents/&lt;/code&gt; belong to this layer.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SKILL.md&lt;/code&gt; and &lt;code&gt;AGENTS.md&lt;/code&gt; cover different ranges. &lt;code&gt;AGENTS.md&lt;/code&gt; handles "overall context and boundaries." &lt;code&gt;SKILL.md&lt;/code&gt; handles "an executable unit for a specific task."&lt;/p&gt;

&lt;p&gt;The avoid-ai-writing English style auditor I mentioned is a specific task, so it ships as &lt;code&gt;SKILL.md&lt;/code&gt;. A file like &lt;code&gt;agents/genda/qa-support.md&lt;/code&gt;, which describes the assumptions and engagement style of a QA role, defines the agent's boundary — that goes on the &lt;code&gt;AGENTS.md&lt;/code&gt; side.&lt;/p&gt;

&lt;p&gt;The shared concern of these formats is "behavior and procedure," not visual appearance. What the agent knows, what it's tasked with, what it must avoid. That's a movement to fix these in a verifiable form.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three-Layer Split
&lt;/h2&gt;

&lt;p&gt;Lining up the three file types, the layers each one handles become clear.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;What it carries&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Behavior&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;AGENTS.md&lt;/code&gt; / &lt;code&gt;CLAUDE.md&lt;/code&gt; (natural language + rules)&lt;/td&gt;
&lt;td&gt;Overall context, roles, prohibitions&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;CLAUDE.md&lt;/code&gt;, role-specific files like &lt;code&gt;agents/genda/qa-support.md&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Individual task&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SKILL.md&lt;/code&gt; (YAML at top + Markdown body)&lt;/td&gt;
&lt;td&gt;Reusable tasks, procedures, domain knowledge&lt;/td&gt;
&lt;td&gt;avoid-ai-writing, in-house procedure skills&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Appearance&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;DESIGN.md&lt;/code&gt; (YAML at top + Markdown body)&lt;/td&gt;
&lt;td&gt;Design system spec, verifiable visual rules&lt;/td&gt;
&lt;td&gt;The Google Labs reference, individual service files in &lt;code&gt;kzhrknt/awesome-design-md-jp&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The three are complementary, not competing. CLIs like &lt;code&gt;bergside/typeui&lt;/code&gt; are emerging as tools that can generate or update either &lt;code&gt;SKILL.md&lt;/code&gt; or &lt;code&gt;DESIGN.md&lt;/code&gt;, depending on what you choose — a sign of tooling that assumes the division of labor.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/bergside/typeui" rel="noopener noreferrer"&gt;https://github.com/bergside/typeui&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What's actually different across the layers is "where to place the balance between machine-readable and human-readable." &lt;code&gt;AGENTS.md&lt;/code&gt; skews almost entirely human-readable; over-structuring it would block the contextual judgment and nuance it needs to convey. &lt;code&gt;SKILL.md&lt;/code&gt; is partially structured by the YAML at the top, but the body stays human-readable — task granularity has to be readable by humans before it can be instructed. &lt;code&gt;DESIGN.md&lt;/code&gt; puts machine-readable design tokens in the top YAML and human-readable design intent in the body, with the two cleanly separated.&lt;/p&gt;

&lt;p&gt;The center of gravity between "machine-readable" and "human-readable" sits in different places per layer. That's just the standard structuring principle — "manage things at different layers in different files" — applied to AI agents. The file names themselves spell out the division: &lt;code&gt;AGENTS.md&lt;/code&gt; ("instructions to the agent"), &lt;code&gt;SKILL.md&lt;/code&gt; ("a reusable skill"), &lt;code&gt;DESIGN.md&lt;/code&gt; ("the design system"). The names match what each one carries.&lt;/p&gt;

&lt;p&gt;Teams that have been packing all their "AI rules" into a single &lt;code&gt;CLAUDE.md&lt;/code&gt; now face a split decision. Open up your &lt;code&gt;CLAUDE.md&lt;/code&gt; and run these questions against it — splits start to surface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is there a section writing design system rules? → If yes, that goes to &lt;code&gt;DESIGN.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Are specific task procedures in there (monthly aggregation, test review, contract review)? → If yes, those go to &lt;code&gt;SKILL.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;What's left is overall agent context and boundaries (roles, prohibitions, escalation criteria) → that's the &lt;code&gt;AGENTS.md&lt;/code&gt; equivalent that stays&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The three-layer split works as a framework for splitting your file.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting with SDD
&lt;/h2&gt;

&lt;p&gt;Stepping back to look at the bigger picture: how does the three-layer split relate to the broader movement of "specs for AI"?&lt;/p&gt;

&lt;p&gt;SDD is a development style where you write the spec — requirements, design, tasks, implementation — before generating the code. The underlying idea: "specs aren't disposable scaffolding, they're executable artifacts that produce code." AWS's Kiro provides a workflow that generates &lt;code&gt;requirements.md&lt;/code&gt;, &lt;code&gt;design.md&lt;/code&gt;, and &lt;code&gt;tasks.md&lt;/code&gt; in order under &lt;code&gt;.kiro/specs/{feature}/&lt;/code&gt;. GitHub's Spec Kit (over 90,000 stars) supports the same flow with slash commands like &lt;code&gt;/specify&lt;/code&gt;, &lt;code&gt;/plan&lt;/code&gt;, &lt;code&gt;/tasks&lt;/code&gt;, &lt;code&gt;/implement&lt;/code&gt;. The EARS notation (Easy Approach to Requirements Syntax) used by Kiro reduces ambiguity by formatting requirements into 5 fixed templates. SDD has spread quickly between 2025 and 2026.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kiro.dev/" rel="noopener noreferrer"&gt;https://kiro.dev/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/github/spec-kit" rel="noopener noreferrer"&gt;https://github.com/github/spec-kit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The three-layer split (&lt;code&gt;AGENTS.md&lt;/code&gt; / &lt;code&gt;SKILL.md&lt;/code&gt; / &lt;code&gt;DESIGN.md&lt;/code&gt;) and SDD look like separate movements on the surface. The SDD community concentrates on Kiro and spec-kit usage; the &lt;code&gt;DESIGN.md&lt;/code&gt; side concentrates on formal specs and validation tooling. You don't see many articles bridging the two.&lt;/p&gt;

&lt;p&gt;But put their philosophies side by side and the overlap is striking.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Shared philosophy&lt;/th&gt;
&lt;th&gt;SDD (Kiro etc.)&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;DESIGN.md&lt;/code&gt; / &lt;code&gt;SKILL.md&lt;/code&gt; / &lt;code&gt;AGENTS.md&lt;/code&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Specify before implementing&lt;/td&gt;
&lt;td&gt;requirements → design → tasks → implementation&lt;/td&gt;
&lt;td&gt;behavior → implementation, appearance → implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Mix machine-readable + human-readable&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;requirements.md&lt;/code&gt; (EARS notation) + natural language&lt;/td&gt;
&lt;td&gt;YAML at top + Markdown body&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Persistent context for the AI&lt;/td&gt;
&lt;td&gt;reference &lt;code&gt;.kiro/specs/{feature}/&lt;/code&gt; every time&lt;/td&gt;
&lt;td&gt;reference &lt;code&gt;DESIGN.md&lt;/code&gt; / &lt;code&gt;AGENTS.md&lt;/code&gt; every time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Reduce ambiguity through structured syntax&lt;/td&gt;
&lt;td&gt;EARS notation structures requirements (5 templates)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;lint&lt;/code&gt; validates WCAG contrast ratios and structural rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Fix "decisions made" as a place&lt;/td&gt;
&lt;td&gt;spec files are where decisions live&lt;/td&gt;
&lt;td&gt;spec files are where decisions live&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both sit inside the larger "specs for AI" movement and share the same underlying philosophy.&lt;/p&gt;

&lt;p&gt;That said, they're not the same thing. The biggest difference, in one phrase: time horizon.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Axis&lt;/th&gt;
&lt;th&gt;SDD&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;DESIGN.md&lt;/code&gt; / &lt;code&gt;SKILL.md&lt;/code&gt; / &lt;code&gt;AGENTS.md&lt;/code&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Time horizon&lt;/td&gt;
&lt;td&gt;Describes "what to build next"&lt;/td&gt;
&lt;td&gt;Describes "rules that already exist"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Scope&lt;/td&gt;
&lt;td&gt;Single feature / project lifecycle&lt;/td&gt;
&lt;td&gt;Persistent rules and styles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Update rhythm&lt;/td&gt;
&lt;td&gt;New per feature → consume → archive&lt;/td&gt;
&lt;td&gt;Long-term maintenance, gradual growth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Subject&lt;/td&gt;
&lt;td&gt;Requirements, design, tasks (procedure for action)&lt;/td&gt;
&lt;td&gt;Rules for behavior, individual tasks, appearance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;SDD specs describe "what we're going to build." &lt;code&gt;requirements.md&lt;/code&gt; is "what this feature needs to satisfy"; &lt;code&gt;design.md&lt;/code&gt; is "how to implement this feature"; &lt;code&gt;tasks.md&lt;/code&gt; is "how to break the feature into work." Once the feature ships, they finish their job and get archived.&lt;/p&gt;

&lt;p&gt;The three-layer specs describe "what should always hold." &lt;code&gt;DESIGN.md&lt;/code&gt; provides the color and typography rules every time you generate a UI; &lt;code&gt;AGENTS.md&lt;/code&gt; provides the agent's assumptions across every session. They get maintained long-term and grow incrementally.&lt;/p&gt;

&lt;p&gt;This time-horizon difference is why the two don't compete. Transient specs and persistent specs coexist in the same project. They can also reference each other. Imagine writing "use &lt;code&gt;{colors.tertiary}&lt;/code&gt; for the button" inside &lt;code&gt;.kiro/specs/checkout-feature/design.md&lt;/code&gt; — that lets a transient feature spec reference a color token from a persistent &lt;code&gt;DESIGN.md&lt;/code&gt;. The pattern isn't widely established yet, but the structure fits cleanly.&lt;/p&gt;

&lt;p&gt;One thing worth noting: as of May 2026, the active areas of SDD (the Kiro community and similar) and the active areas of &lt;code&gt;DESIGN.md&lt;/code&gt; / &lt;code&gt;SKILL.md&lt;/code&gt; / &lt;code&gt;AGENTS.md&lt;/code&gt; haven't really crossed paths. The SDD side concentrates on "how to build a feature"; the three-layer side concentrates on "how to deliver the rules."&lt;/p&gt;

&lt;p&gt;You don't have to be doing SDD to start with the three-layer split — the split alone gets you to the door of "specs for AI." If your team is already on SDD, start referencing &lt;code&gt;DESIGN.md&lt;/code&gt; tokens from inside your feature specs and you avoid maintaining the same rules in two places. The two movements look set to converge in the next phase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Not Everything Becomes a Spec
&lt;/h2&gt;

&lt;p&gt;The discussion of the three-layer split tends to drift toward "shouldn't we just spec everything," but in practice, that doesn't happen.&lt;/p&gt;

&lt;p&gt;Rules that can't be formally verified stay as natural-language documents. Tone, structural choices, cultural nuance. Things like "how to open an article with empathy" or "how to give an ending the right amount of resonance" — judgment-based qualities. The cost of speccing them isn't the issue; the essence falls out when you try.&lt;/p&gt;

&lt;p&gt;The judgment is straightforward: "is this formally verifiable?"&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Color contrast ratios (verifiable) → &lt;code&gt;DESIGN.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Word substitutions like "leverage → use" (verifiable) → &lt;code&gt;SKILL.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Tone (soft assertions, not textbook-sounding), overall stance (not teaching, just organizing) and similar (not verifiable) → stays in &lt;code&gt;AGENTS.md&lt;/code&gt; / &lt;code&gt;CLAUDE.md&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For small teams, "one natural-language file" is often enough. If &lt;code&gt;CLAUDE.md&lt;/code&gt; alone is keeping things running, there's no need to force a split. The trade-off between the cost of speccing and the load of operating it depends on team size and how long the operation has to last.&lt;/p&gt;

&lt;p&gt;The three-layer split is something you adopt incrementally, just like SDD — you don't need to spec everything at once. Start with the complex areas, the areas where verification helps most.&lt;/p&gt;

&lt;p&gt;In other words, the three-layer split isn't a goal. It's an option you adopt when the situation calls for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Start
&lt;/h2&gt;

&lt;p&gt;A few options come into view from this overview.&lt;/p&gt;

&lt;p&gt;A reasonable first move is to open your &lt;code&gt;CLAUDE.md&lt;/code&gt; or style guide and sort it into "formally verifiable" and "judgment-based" sections. Color and typography rules, word substitution lists, structural rules. If a useful amount of verifiable content sits there, pick one to break out into either &lt;code&gt;DESIGN.md&lt;/code&gt; (appearance) or &lt;code&gt;SKILL.md&lt;/code&gt; (task). Don't try to split everything at once — start with the most independent piece.&lt;/p&gt;

&lt;p&gt;Pulling in external skills is another route. Drop a ready-made &lt;code&gt;SKILL.md&lt;/code&gt; like &lt;code&gt;avoid-ai-writing&lt;/code&gt; into &lt;code&gt;~/.claude/skills/&lt;/code&gt; and your stance as a writer doesn't change — only the verification gets handed off to the machine.&lt;/p&gt;

&lt;p&gt;Teams already running Kiro or spec-kit are probably at the stage where they could try referencing &lt;code&gt;DESIGN.md&lt;/code&gt; tokens from inside &lt;code&gt;.kiro/specs/{feature}/design.md&lt;/code&gt;. The cross-reference between feature specs and persistent specs is still a thin area in terms of public examples.&lt;/p&gt;

&lt;p&gt;The shared stance: don't try to spec everything at once. Document split → operational trial → speccing — staged migration is the realistic path. The three-layer split isn't a finished form. It's a movement still in progress, and that's the safer way to read it.&lt;/p&gt;

&lt;p&gt;AI rules started splitting from a single natural-language document into three spec formats. That's another side of the same movement as SDD.&lt;/p&gt;

&lt;p&gt;Not everything becomes a spec, but managing different roles in different files — that ordinary structuring is starting to apply to AI agents, too.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>agents</category>
      <category>designsystem</category>
    </item>
    <item>
      <title>Harness Engineering with Nothing but Markdown</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Sun, 26 Apr 2026 23:54:18 +0000</pubDate>
      <link>https://dev.to/aws-builders/harness-engineering-with-nothing-but-markdown-g6b</link>
      <guid>https://dev.to/aws-builders/harness-engineering-with-nothing-but-markdown-g6b</guid>
      <description>&lt;p&gt;If coding agents aren't your primary battlefield, "harness engineering" probably feels like a distant concept. Scrolling through a timeline full of articles written for Claude Code and Codex users, you may have thought, "This isn't about me."&lt;/p&gt;

&lt;p&gt;My own agent use wasn't centered on coding either, so none of the articles out there seemed to apply to my case. But looking back, I'd been doing the same thing — it just didn't have a name yet.&lt;/p&gt;

&lt;p&gt;I've been running a business automation agent via Claude Desktop (through MCP servers) for several months now. It gathers information across multiple work tools like Slack, Confluence, and Google Calendar, switches judgment criteria based on context, and produces outputs accordingly. What the agent refers to goes beyond surface-level rules — accumulated knowledge such as understanding of organizational structure, past decision-making history, and writing style guidelines forms the foundation for its judgment.&lt;/p&gt;

&lt;p&gt;I haven't written a single line of code. All I write is Markdown. And most of that Markdown is generated by the agent itself — I just approve or give revision instructions through chat. I almost never open the files directly to edit them.&lt;/p&gt;

&lt;p&gt;This article isn't for people already practicing harness engineering. It's for those who've heard the term but thought, "That's a coding thing, right?" — I'm sharing the structure I've found. Each example includes a ready-to-use sample, so if you're running a business automation agent with MCP, you can try them as-is.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Harness Engineering?
&lt;/h2&gt;

&lt;p&gt;Let me set the foundation.&lt;/p&gt;

&lt;p&gt;Mitchell Hashimoto, co-founder of HashiCorp, gave the name "Engineer the Harness" to a practice he'd cultivated in his AI agent workflow, in a February 2026 blog post. The approach: when an agent makes a mistake, instead of fixing the prompt, build an environment where the same mistake can't happen again.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://mitchellh.com/writing/my-ai-adoption-journey" rel="noopener noreferrer"&gt;https://mitchellh.com/writing/my-ai-adoption-journey&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Days later, OpenAI published a practice report titled "Harness engineering." A small engineering team spent five months building a product using only Codex agents with zero hand-written code, and the repository reached roughly one million lines. The back-to-back publication of Hashimoto's blog and this report cemented "harness engineering" as a term.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://openai.com/index/harness-engineering/" rel="noopener noreferrer"&gt;https://openai.com/index/harness-engineering/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the coding agent context, this translates to implementations like banning specific patterns with ESLint, defining commands in &lt;code&gt;AGENTS.md&lt;/code&gt;, and running automated reviews via pre-commit hooks.&lt;/p&gt;

&lt;p&gt;From "asking" (prompts) to "building" (environment). That's the core.&lt;/p&gt;

&lt;p&gt;Up to this point, the story seems confined to the world of coding agents. But in 2025, MCP became widespread and rapidly expanded the practical scope of non-coding agents. Once agents gained direct access to business tools like Slack, Confluence, Google Calendar, and Jira, the risk of "agents making mistakes on their own" spilled beyond coding. Harnesses are no longer just for coding agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  I Kept Rewriting Prompts
&lt;/h2&gt;

&lt;p&gt;When you incorporate agents into business workflows, you run into experiences like these.&lt;/p&gt;

&lt;p&gt;You write "don't make financial judgments" and it makes them anyway. You write "don't post directly to Slack — create a draft" and it tries to post. You write "commit and push at the end of the session" and it forgets.&lt;/p&gt;

&lt;p&gt;Each time, I'd rewrite the prompt. Under the assumption that "if I write it more clearly, it'll understand."&lt;/p&gt;

&lt;p&gt;At some point, I realized the assumption itself was wrong. No matter how much you polish a prompt, the agent makes the same mistake in the next session. Instructions get buried in long contexts. When the session ends, memory disappears entirely. Requests are volatile.&lt;/p&gt;

&lt;p&gt;Stop expecting the agent to remember. Change the environment instead. Looking back, this was the entry point to harness engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Harnesses for Non-Coding Agents
&lt;/h2&gt;

&lt;p&gt;When I lined up what I'd been doing in my repository, the same structure as coding agent harnesses emerged.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Coding Agent Environment&lt;/th&gt;
&lt;th&gt;Non-Coding Agent Environment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ESLint / TypeScript strict type enforcement&lt;/td&gt;
&lt;td&gt;Prohibited actions section under &lt;code&gt;agents/&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;AGENTS.md&lt;/code&gt; command definitions&lt;/td&gt;
&lt;td&gt;Context routing rules in instruction files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pre-commit hooks&lt;/td&gt;
&lt;td&gt;Mandatory actions at session end&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI gates (can't merge unless tests pass)&lt;/td&gt;
&lt;td&gt;Forced knowledge accumulation rules under &lt;code&gt;knowledge/&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The materials on each side are completely different. One uses linters and hooks, the other uses Markdown files. But the design intent is the same: building an environment outside the agent where the agent can behave correctly.&lt;/p&gt;

&lt;p&gt;One prerequisite to note: most AI chat tools have a designated place for instruction files that are automatically loaded at session start. In Claude Desktop it's Project Knowledge; in ChatGPT it's Custom Instructions. What I call "instruction files" in this article are Markdown files placed in this mechanism. Unlike writing in the prompt each time, they're automatically placed in a position that's hard to bury even as conversations grow longer.&lt;/p&gt;

&lt;p&gt;Here are three concrete examples, each with a ready-to-use sample.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structuring Prohibited Actions
&lt;/h3&gt;

&lt;p&gt;Say you've delegated Slack posting to your agent. Even if you write "don't post directly — create a draft" in the prompt, it forgets across sessions.&lt;/p&gt;

&lt;p&gt;The solution is to create a prohibited actions section in the instruction file and structure it so it's loaded every session. Move the instruction's location from prompt (volatile) to file (persistent).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Prohibited Actions

Follow these without exception.

- Do not auto-post to company Slack (draft only; user handles posting)
- Do not make definitive financial judgments (always ask user for confirmation)
- Do not treat replies to clients as final versions (always get user approval)
- Do not make judgments about personnel evaluations or compensation
- When including confidential information (salaries, contract amounts, etc.) in summaries, explicitly note this
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of telling someone verbally each time, place rules in a fixed location and reference them every time. It's that simple, but it changes the lifespan of rules from per-session to permanent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Forcing Actions at Session End
&lt;/h3&gt;

&lt;p&gt;You want to leave a work log at the end of each session. Even if you write "create a work log and commit &amp;amp; push at the end" in the prompt, the agent just wraps up when the conversation gets lively.&lt;/p&gt;

&lt;p&gt;The solution is to define trigger conditions and mandatory actions as a set in the instruction file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Mandatory Actions at Session End

When the user indicates work completion with phrases like "done," "thanks," or "commit,"
execute the following. Skipping is prohibited.

1. Create a work log at `docs/work-logs/YYYY-MM-DD-{topic}.md`
   - Include: background, options considered, key decisions, deliverables, next steps
2. Append a summary of changes to `CHANGELOG.md`
3. Execute git commit &amp;amp; push
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference from the prohibited actions example is that trigger conditions for "when to fire" are also defined. By explicitly stating end signals like "done," "thanks," and "commit," the agent can more easily judge "this is the moment." It's not perfect, but the firing rate goes up significantly compared to writing "execute at the appropriate timing" with vague triggers.&lt;/p&gt;

&lt;p&gt;The key is the single line: "Skipping is prohibited." If you leave room for the agent to judge, it will decide on its own that "it's probably fine to skip this time" when conversations get long. Removing discretion stabilizes behavior.&lt;/p&gt;

&lt;p&gt;There's a secondary benefit too. When rules are defined in the instruction file, a simple "leave a log" or "commit" is enough for the agent to instantly understand "that action." No need to explain from scratch each time. The instruction file becomes shared vocabulary between human and agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Forced Knowledge Accumulation
&lt;/h3&gt;

&lt;p&gt;The third is an example of a "can't proceed without passing the check" structure.&lt;/p&gt;

&lt;p&gt;In conversations with agents, information worth accumulating comes up frequently — things decided in meetings, conclusions from tool selection, facts discovered during troubleshooting. Even if you write "save important information" in the prompt, it predictably forgets.&lt;/p&gt;

&lt;p&gt;The solution is to embed a "knowledge check" protocol in the instruction file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Knowledge Accumulation (Mandatory Check)

Before each response, internally execute the following check. Skipping is prohibited.

Check: Does the user's immediately preceding statement, or your own response,
contain new information matching any of the following?

1. Factual information: team composition, tech stack, account info, environment configuration
2. Decisions: architecture selection, tool adoption, policy changes
3. Learnings: facts discovered during troubleshooting, gotchas, operational tips
4. Client-specific: contact names, contact info, project progress

→ If applicable: In addition to the normal response, append the following at the end.

💾 Knowledge capture proposal:
  File: knowledge/{project-name}/{filename}.md
  Content: (summary of content to add)
  Reason: (why this should be accumulated)

→ If not applicable: Append nothing.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The intended structure is "can't produce a response without passing the check." Of course, LLMs can skip instructions, so the enforcement isn't as strong as a mechanical gate. Still, by embedding the check into the system, the probability of capturing information rises significantly even when the human forgets to say "save that."&lt;/p&gt;

&lt;p&gt;Since implementing this system, knowledge files have been steadily accumulating in the knowledge directory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Acknowledging the Enforcement Gap
&lt;/h2&gt;

&lt;p&gt;Let me address the strongest counterargument upfront. "Markdown prohibitions don't have the same enforcement power as a linter." That's correct.&lt;/p&gt;

&lt;p&gt;Linters and type checkers mechanically detect rule violations. Depending on configuration, they can even block builds and merges entirely. Markdown prohibitions, on the other hand, carry the risk of the agent reading past them. If buried in a long instruction file, effectiveness drops.&lt;/p&gt;

&lt;p&gt;However, the comparison here isn't against "mechanical enforcement" — it's against "writing it in the prompt each time." Why does writing in a file work better than a prompt? Two reasons.&lt;/p&gt;

&lt;p&gt;First, the "reference mechanism is different." As noted earlier, instructions placed in Project Knowledge or Custom Instructions are passed to the agent in a separate channel from regular messages. They're placed in a position that's harder to bury even as conversations grow longer, structurally increasing the probability of being referenced.&lt;/p&gt;

&lt;p&gt;Second, "accumulation becomes irreversible." Instructions written in a prompt don't exist in the next session. Write them in a file, and they persist unless deleted. The cycle of "write a good instruction → forget → write again" becomes "write a good instruction → append to file → automatically referenced from then on."&lt;/p&gt;

&lt;p&gt;Lining up enforcement strength from weakest to strongest:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Write in the prompt each time" → "Place in a persistent file and reference every time" → "Mechanically block with linters and hooks"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Non-coding agents are currently at the middle position. Definitely stronger than the left end, doesn't reach the right end. But moving to the middle is still better for agent stability than staying at the left.&lt;/p&gt;

&lt;h2&gt;
  
  
  Repository Structure as a Design Decision
&lt;/h2&gt;

&lt;p&gt;So far I've written about individual rules, but the "where to put" the rules is itself a design decision.&lt;/p&gt;

&lt;p&gt;The repository structure that solidified through operation looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ai-agents/
├── agents/                  # Role-specific instruction files
│   ├── assistant.md         # Main instructions (prohibitions, mandatory actions)
│   ├── project-a/
│   │   ├── sre-support.md   # SRE-specific instructions
│   │   ├── qa-support.md    # QA-specific instructions
│   │   └── ...
│   └── project-b/
│       ├── accounting.md    # Accounting-specific instructions
│       └── ...
├── knowledge/               # Accumulated knowledge
│   ├── project-a/
│   ├── project-b/
│   └── writing-style-guide.md
├── docs/work-logs/          # Per-session work logs
└── CHANGELOG.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structure shares two principles with coding agent harness design.&lt;/p&gt;

&lt;p&gt;The first is "separation of concerns." &lt;a href="https://openai.com/index/harness-engineering/" rel="noopener noreferrer"&gt;OpenAI's report&lt;/a&gt; documents the experience of a monolithic &lt;code&gt;AGENTS.md&lt;/code&gt; not working well. When everything in the context is "important," nothing is important. In my own repository too, I initially crammed everything into a single Markdown file. Separating files by role and having the agent reference only what's needed improved instruction effectiveness.&lt;/p&gt;

&lt;p&gt;What enables this is context routing rules. Define routing in the main instruction file so the agent can reference the appropriate specialized instructions based on conversation content.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Context Routing Rules
Judge which context the user's statement belongs to and reference the appropriate specialized instructions.

- Project A context signals: AWS, infrastructure, SRE, QA, team member names → Reference files under `agents/project-a/`
- Project B context signals: billing, contracts, accounting, legal → Reference files under `agents/project-b/`
- Ambiguous: Ask which project this is about
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the same structure as the &lt;code&gt;AGENTS.md&lt;/code&gt; "design as a pointer" principle. The main file handles routing only, delegating details to specialized files. OpenAI's report describes keeping &lt;code&gt;AGENTS.md&lt;/code&gt; to roughly 100 lines, functioning as a map. For non-coding agents, I've observed the same tendency — the longer the instruction file, the more effectiveness drops.&lt;/p&gt;

&lt;p&gt;The second is "version control." By placing instruction files in a Git repository, change history is preserved. "When was this prohibition added?" "Which rule change made things stable?" — all traceable via diff. Slack messages and ad-hoc prompts don't preserve this history. Additionally, since it's a Git repository, you're not tied to a specific PC. Keep it on a remote, and you can launch the same harness from any device.&lt;/p&gt;

&lt;p&gt;OpenAI's team makes the same point. Slack discussions, Google Docs content — if it's not in the repository, it's inaccessible to the agent and might as well not exist. This applies equally to non-coding agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;You don't need to structure everything from the start when beginning harness engineering for non-coding agents.&lt;/p&gt;

&lt;p&gt;In my case too, the early days were spent rewriting prompts. The order in which structure solidified was:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When the agent makes the same mistake twice, write it in a file instead of a prompt&lt;/li&gt;
&lt;li&gt;When the file gets bloated, split by role&lt;/li&gt;
&lt;li&gt;When information is lost between sessions, build an accumulation system&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's the same pattern Mitchell Hashimoto describes. "When the agent makes a mistake, build a system where that mistake can't happen again." For coding, you build it with linters and hooks. For non-coding, you build it with Markdown file structure. The material differs, but the thinking loop is the same.&lt;/p&gt;

&lt;p&gt;Here's a minimal starter template. Place it in Claude Desktop's Project Knowledge or ChatGPT's Custom Instructions and it works as-is.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Assistant Instructions

## Your Role

An AI assistant that supports user workflows.
Use MCP tools like Slack, Google Calendar, and Confluence for information gathering and organization.

## Prohibited Actions

- Do not auto-post to company Slack (draft only)
- Do not make definitive financial judgments (always ask user for confirmation)
- When including confidential information in summaries, explicitly note this

## Mandatory Actions at Session End

When the user indicates work completion, execute the following. Skipping is prohibited.

1. Create a work log at `docs/work-logs/YYYY-MM-DD-{topic}.md`
2. If there are changes, execute git commit &amp;amp; push

## Knowledge Accumulation (Mandatory Check)

Before each response, internally execute the following check. Skipping is prohibited.

Check: Does the immediately preceding conversation contain new information matching any of the following?
1. Factual information (team composition, tech stack, environment configuration)
2. Decisions (architecture selection, tool adoption, policy changes)
3. Learnings (facts discovered during troubleshooting, gotchas)

→ If applicable: Append a knowledge capture proposal at the end
→ If not applicable: Append nothing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This template is roughly 30 lines. Start here, and add one line to the prohibited actions every time the agent makes a mistake. In a few months, you'll have a harness built specifically for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Question Harnesses Share
&lt;/h2&gt;

&lt;p&gt;Harness engineering isn't a coding-specific technique. It's a design philosophy: giving agents a reliable execution environment.&lt;/p&gt;

&lt;p&gt;Coding agents build that environment with types, linters, and hooks. Non-coding agents build it with structured Markdown and forced referencing. The materials differ, but the question is the same: "When this agent makes a mistake, where is the system that prevents it from happening a second time?"&lt;/p&gt;

&lt;p&gt;Since shifting from "I just need to write better prompts" to "I need to build a structure where the same mistake can't happen," my agents have been running more stably.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>llm</category>
      <category>automation</category>
    </item>
    <item>
      <title>What Changes and What Stays the Same for SRE with AWS Frontier Agents</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Mon, 13 Apr 2026 20:13:46 +0000</pubDate>
      <link>https://dev.to/aws-builders/what-changes-and-what-stays-the-same-for-sre-with-aws-frontier-agents-23aj</link>
      <guid>https://dev.to/aws-builders/what-changes-and-what-stays-the-same-for-sre-with-aws-frontier-agents-23aj</guid>
      <description>&lt;p&gt;On March 31, 2026, AWS made DevOps Agent and Security Agent generally available — the first two of the autonomous AI agents announced at re:Invent 2025 under the "Frontier Agents" brand. A 2-month free trial is included, after which pay-as-you-go pricing kicks in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/mt/announcing-general-availability-of-aws-devops-agent/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/mt/announcing-general-availability-of-aws-devops-agent/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/aws-launches-frontier-agents-for-security-testing-and-cloud-operations/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/machine-learning/aws-launches-frontier-agents-for-security-testing-and-cloud-operations/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The official announcements highlight numbers like "up to 75% MTTR reduction" and "penetration testing compressed from weeks to hours." The question that matters more is: how does this change the day-to-day work of an SRE team? Feature overviews are already plentiful, so this article focuses on what shifts to agents and what stays with humans.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is a Frontier Agent?
&lt;/h2&gt;

&lt;p&gt;AWS announced three Frontier Agents at re:Invent 2025: Kiro Autonomous Agent (software development), DevOps Agent (operations), and Security Agent (security). Of these, DevOps Agent and Security Agent are now GA. Kiro Autonomous Agent remains in preview.&lt;/p&gt;

&lt;p&gt;AWS &lt;a href="https://aws.amazon.com/blogs/machine-learning/aws-launches-frontier-agents-for-security-testing-and-cloud-operations/" rel="noopener noreferrer"&gt;defines Frontier Agents&lt;/a&gt; as systems that "work independently to achieve goals, scale massively to tackle concurrent tasks, and run persistently for hours or days." Frankly, that description could apply to existing AI agents like Claude Code or Devin. What AWS emphasizes is delivering "complete outcomes" rather than assisting with individual tasks, but this feels like a difference of degree, not kind.&lt;/p&gt;

&lt;p&gt;In practice, it's probably best to think of them as domain-specialized autonomous agents — deeply integrated with DevOps and security workflows. "Frontier" is more of a marketing brand than a technical category: "AWS's first-party, domain-specific agent products" is a fair characterization.&lt;/p&gt;

&lt;p&gt;What matters isn't the naming — it's how these agents affect SRE work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What DevOps Agent Does and Doesn't Do
&lt;/h2&gt;

&lt;p&gt;AWS describes DevOps Agent as an "always-available operations teammate." However, since it requires human approval for fixes and can't make business decisions, the reality is closer to an "always-on SRE apprentice" — it investigates and proposes, but can't decide or execute. Here's where that boundary lies.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Agent Does
&lt;/h3&gt;

&lt;p&gt;Imagine an alert fires at 2 AM. Traditionally, the on-call engineer wakes up from a Datadog alert, opens their laptop, checks dashboards for metric anomalies, digs through logs, cross-references deployment history, and identifies the root cause. DevOps Agent automates this entire initial investigation.&lt;/p&gt;

&lt;p&gt;Specifically, it correlates metrics and logs from monitoring tools (CloudWatch, Datadog, Dynatrace, New Relic, Splunk, Grafana), code repositories (GitHub, GitLab, Azure DevOps), and CI/CD deployment histories to build hypotheses like "this code change introduced in this deployment correlates with this metric anomaly." Investigation progress is shared via the web console and Slack, where you can ask follow-up questions or redirect the investigation.&lt;/p&gt;

&lt;p&gt;The GA release adds Azure and on-premises environments as investigation targets. On-premises tools connect via MCP (Model Context Protocol), enabling consistent investigation across multicloud and hybrid setups.&lt;/p&gt;

&lt;p&gt;Beyond incident response, DevOps Agent also provides proactive improvement recommendations — analyzing historical incident patterns to identify gaps in alert coverage, test coverage, code quality, and infrastructure configuration.&lt;/p&gt;

&lt;p&gt;It's worth noting that Datadog's Bits AI SRE offers quite similar capabilities: autonomous alert investigation, source code analysis, and deployment correlation. The key difference is that DevOps Agent can simultaneously span multiple observability tools (Datadog + CloudWatch + Splunk, etc.) and include Azure and on-premises environments via MCP. If your organization is entirely within the Datadog ecosystem, Bits AI SRE may be sufficient. If you have multiple tools or a multicloud setup, DevOps Agent's cross-platform analysis adds value. More on this in the "How Does the Relationship with Existing Tools Change?" section.&lt;/p&gt;

&lt;p&gt;The GA release also introduced "Learned Skills" and "Custom Skills." Learned Skills let the agent learn from your organization's investigation patterns and tool usage, improving accuracy over time. Custom Skills let you add organization-specific investigation procedures and best practices, configurable per incident type (triage, root cause analysis, mitigation). Code Indexing also enables code-level fix suggestions based on repository understanding.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Agent Doesn't Do
&lt;/h3&gt;

&lt;p&gt;This is the critical part. DevOps Agent investigates and proposes — executing fixes is up to humans.&lt;/p&gt;

&lt;p&gt;It can generate fix proposals and work with Kiro or Claude Code to produce fix code, but applying changes to production requires human approval. This is intentional — AWS has made the design decision that "an agent that modifies production without approval won't be trusted."&lt;/p&gt;

&lt;p&gt;The other thing the agent doesn't do is business judgment. "This incident has major customer impact, so we need a company-wide response." "It's Friday night — let's apply a workaround and do the root cause fix Monday." These decisions require human context. Identifying the technical root cause and deciding how to respond are separate jobs. DevOps Agent handles the former; humans own the latter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0swk42fyd4v2tj2ovtzo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0swk42fyd4v2tj2ovtzo.png" alt=" " width="800" height="797"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Security Agent Does and Doesn't Do
&lt;/h2&gt;

&lt;p&gt;Security Agent is the security counterpart — an "always-on penetration tester and security reviewer." It has three main capabilities: on-demand penetration testing, design document security review (Design Review), and PR security review (Code Review).&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Agent Does
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Penetration Testing
&lt;/h4&gt;

&lt;p&gt;Traditionally, penetration testing meant "once or twice a year, for the most critical applications only, outsourced to specialists, taking weeks." Cost and time constraints leave most of the application portfolio untested. As the &lt;a href="https://aws.amazon.com/security-agent/faqs/" rel="noopener noreferrer"&gt;Security Agent FAQ&lt;/a&gt; notes, "most organizations limit manual penetration testing to their most critical applications and conduct these tests periodically." And even tested applications become partially unverified the moment new code is deployed.&lt;/p&gt;

&lt;p&gt;Security Agent changes this structure. You create an "Agent Space," connect your GitHub repository, and the agent reads source code, architecture documents, and design docs to understand the application's structure before running automated penetration tests against endpoints.&lt;/p&gt;

&lt;p&gt;The key difference from simply running a scanner: Security Agent validates discovered vulnerabilities by actually sending payloads to confirm exploitability. Reports include reproduction steps, dramatically reducing false positives. Per the &lt;a href="https://aws.amazon.com/security-agent/features/" rel="noopener noreferrer"&gt;official features page&lt;/a&gt;, testing covers OWASP Top 10 vulnerability types plus business logic flaws. According to the &lt;a href="https://aws.amazon.com/blogs/aws/aws-weekly-roundup-aws-devops-agent-security-agent-ga-product-lifecycle-updates-and-more-april-6-2026/" rel="noopener noreferrer"&gt;GA announcement blog&lt;/a&gt;, LG CNS reported significant false positive reduction, over 50% faster testing, and roughly 30% cost reduction.&lt;/p&gt;

&lt;h4&gt;
  
  
  Design Review
&lt;/h4&gt;

&lt;p&gt;This capability reviews architecture documents and design docs from a security perspective before any code is written. It checks against AWS best practices and your organization's custom security requirements. Catching issues at the design stage avoids costly rework after implementation.&lt;/p&gt;

&lt;h4&gt;
  
  
  PR Review
&lt;/h4&gt;

&lt;p&gt;Pull Request-level security review is the third capability. As of GA, it supports GitHub PRs — Security Agent automatically reviews PRs for security issues when they're created. You can configure it to check custom security requirement compliance, common security vulnerabilities, or both.&lt;/p&gt;

&lt;p&gt;PR security checks aren't new — many organizations already have Claude Code or Codex review PRs with security instructions via CLAUDE.md, or have SAST tools in their CI/CD pipeline. Security Agent's difference is operational: security requirements are defined once in the console and automatically applied across all repositories. This removes the overhead of maintaining per-repository md files, but it doesn't enable something technically impossible before. Of the three capabilities, penetration test automation is where the real differentiation lies.&lt;/p&gt;

&lt;p&gt;Design reviews are free up to 200/month, and code reviews up to 1,000/month. Only penetration testing is paid ($50/task-hour) — more on this in the "Cost Structure" section.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Agent Doesn't Do
&lt;/h3&gt;

&lt;p&gt;Security Agent automates "discovery and validation" — "judgment and response" remain human territory.&lt;/p&gt;

&lt;p&gt;Security policy decisions are human work. "Which risks to accept and which to address," "how to interpret compliance requirements" — these are outside the agent's scope. For example, "fixing this vulnerability requires a breaking API change, but we need to consider the impact on a major customer's release schedule" is a business trade-off that requires human judgment.&lt;/p&gt;

&lt;p&gt;Social engineering (tricking employees into granting access) and vulnerabilities that can only be discovered by understanding the entire business workflow are also difficult to cover with automated testing alone. While the official documentation says business logic flaws are included in the test scope, the agent doesn't fully replace a human penetration tester's judgment of "what this operation means in the context of this business flow." Security Agent's strength is "broad, frequent, systematic testing" — complementing, not replacing, "deep, creative testing" by human experts.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does the Relationship with Existing Tools Change?
&lt;/h2&gt;

&lt;p&gt;"Will Datadog or PagerDuty become unnecessary?" Short answer: no. The relationship changes, not the need.&lt;/p&gt;

&lt;p&gt;Here's how the human steps change, using a late-night alert as an example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk06ru8yvqvkmggtmyjrz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk06ru8yvqvkmggtmyjrz.png" alt=" " width="800" height="923"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Previously humans handled all 5 steps; after adoption, human steps drop to 2. Red = previously all-human work, green = shifts to the agent, blue = remains human.&lt;/p&gt;

&lt;h3&gt;
  
  
  Monitoring Tools (Datadog / CloudWatch, etc.)
&lt;/h3&gt;

&lt;p&gt;DevOps Agent doesn't "replace" these tools — it uses them as "data sources." GA supports integrations with CloudWatch, Datadog, Dynatrace, New Relic, Splunk, and Grafana. This isn't about canceling Datadog and switching to DevOps Agent — it's about DevOps Agent analyzing the metrics and logs that Datadog collects.&lt;/p&gt;

&lt;p&gt;You might wonder: "Could we drop Datadog, consolidate on CloudWatch, and use DevOps Agent / Security Agent to save costs?" DevOps Agent handles incident investigation and improvement recommendations — it doesn't include day-to-day monitoring features like APM, RUM, distributed tracing, or dashboards. Datadog's value extends beyond incident response, so a simple swap doesn't work. That said, there is overlap between Datadog's Bits AI SRE and DevOps Agent's incident investigation capabilities, so whether you need to pay for both is worth evaluating.&lt;/p&gt;

&lt;p&gt;In fact, the quality of your monitoring setup directly affects DevOps Agent's effectiveness. The agent can only analyze what your tools collect. Sparse metrics and logs mean less accurate analysis. The direction isn't "adopt the agent so we can invest less in monitoring" — it's "invest in monitoring to maximize the agent's effectiveness."&lt;/p&gt;

&lt;h3&gt;
  
  
  Incident Notification and On-Call Management (PagerDuty / Datadog On-Call, etc.)
&lt;/h3&gt;

&lt;p&gt;Alert routing and on-call management roles remain unchanged. DevOps Agent starts investigating the moment an alert fires, completing initial investigation before the notified human even logs in. On-call scheduling, escalation, and incident lifecycle management continue to be handled by tools like PagerDuty or Datadog On-Call. The GA release added PagerDuty integration as well.&lt;/p&gt;

&lt;p&gt;What changes is "the first thing the on-call engineer does." Instead of "open the dashboard and check metrics," it becomes "read the Agent's investigation results shared in Slack." Per the &lt;a href="https://aws.amazon.com/blogs/mt/announcing-general-availability-of-aws-devops-agent/" rel="noopener noreferrer"&gt;official GA blog&lt;/a&gt;, Zenchef (a restaurant technology platform) submitted an issue to DevOps Agent during a hackathon and had the root cause identified in 20–30 minutes — an investigation that would normally take 1–2 hours, completed while the engineers stayed focused on the hackathon.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub (Security Agent PR Review)
&lt;/h3&gt;

&lt;p&gt;Security Agent automatically posts security review comments on GitHub PRs. Developers can review and address findings without leaving the PR interface. Merge decisions remain human. Details and differentiation points are covered in the "What Security Agent Does and Doesn't Do" section above.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Structure
&lt;/h2&gt;

&lt;p&gt;Frontier Agents have a distinctive pricing model, especially DevOps Agent's tie-in with AWS Support plans.&lt;/p&gt;

&lt;h3&gt;
  
  
  DevOps Agent: Support Credits
&lt;/h3&gt;

&lt;p&gt;DevOps Agent costs $0.0083/agent-second (roughly $30/hr). The &lt;a href="https://aws.amazon.com/devops-agent/pricing/" rel="noopener noreferrer"&gt;official pricing page&lt;/a&gt; shows usage examples: a small team (10 incident investigations/month, 8 minutes each) at ~$40/month, and an enterprise (500 incidents, 10 Agent Spaces) at ~$2,300/month.&lt;/p&gt;

&lt;p&gt;On top of this, &lt;a href="https://aws.amazon.com/devops-agent/pricing/" rel="noopener noreferrer"&gt;per the pricing page&lt;/a&gt;, AWS Support customers receive monthly credits based on the prior month's Support spend:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Support Plan&lt;/th&gt;
&lt;th&gt;Credit Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Unified Operations&lt;/td&gt;
&lt;td&gt;100% of prior month's Support spend&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise Support&lt;/td&gt;
&lt;td&gt;75% of prior month's Support spend&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Business Support+&lt;/td&gt;
&lt;td&gt;30% of prior month's Support spend&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For example, an organization paying $15,000/month for Enterprise Support would receive $11,250/month in DevOps Agent credits. If usage stays within that, the incremental cost is zero.&lt;/p&gt;

&lt;p&gt;Behind this credit structure is the relationship with TAM (Technical Account Manager) in Enterprise Support. Traditionally, Enterprise Support customers get a TAM who provides architecture reviews and operational guidance. The &lt;a href="https://aws.amazon.com/premiumsupport/plans/enterprise/" rel="noopener noreferrer"&gt;Enterprise Support page&lt;/a&gt; now presents TAM and DevOps Agent side by side — TAM handles strategic guidance, DevOps Agent handles 24/7 automated investigation and improvement proposals. DevOps Agent is positioned as an "extension" of Support, not a replacement, which explains why credits come from Support spend.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Agent: Penetration Testing Cost Transformation
&lt;/h3&gt;

&lt;p&gt;Security Agent penetration testing is billed at $50/task-hour (design and code reviews have free tiers as mentioned above). Per the &lt;a href="https://aws.amazon.com/security-agent/faqs/" rel="noopener noreferrer"&gt;official FAQ&lt;/a&gt;, a 2-month free trial is included post-GA.&lt;/p&gt;

&lt;p&gt;Traditional third-party penetration testing typically costs hundreds of thousands of yen (tens of thousands of dollars) per engagement, taking weeks. This "high unit cost × low frequency" structure transforms into "$50/task-hour × high frequency."&lt;/p&gt;

&lt;p&gt;The implication: it becomes economically viable to expand penetration testing coverage. Organizations that could only test their most critical applications due to cost constraints can now continuously test across their entire portfolio.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does SRE Team Management Change?
&lt;/h2&gt;

&lt;p&gt;Frontier Agents adoption has implications for how SRE teams operate.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Nature of On-Call Pain Changes
&lt;/h3&gt;

&lt;p&gt;DevOps Agent's biggest impact is automating initial investigation for late-night alerts. Per the &lt;a href="https://aws.amazon.com/blogs/mt/announcing-general-availability-of-aws-devops-agent/" rel="noopener noreferrer"&gt;official GA blog&lt;/a&gt;, WGU (Western Governors University) deployed it to production during preview, reducing estimated 2-hour investigations to 28 minutes.&lt;/p&gt;

&lt;p&gt;Traditional on-call pain comes from being woken up and having to start investigating from scratch in a less-than-ideal state. Opening dashboards, hunting for metric anomalies, pulling related logs, cross-referencing deployment history — this alone can take 30 minutes to an hour.&lt;/p&gt;

&lt;p&gt;After DevOps Agent adoption, this "starting from zero" disappears. The on-call engineer's first action becomes "read the Agent's findings." The agent presents "this code change in this deployment correlates with this metric anomaly — here's the proposed fix" as the starting point.&lt;/p&gt;

&lt;p&gt;However, a new kind of pressure may emerge: "Is the root cause the Agent identified actually correct? Are there blind spots?" The tension between trusting the agent's output and risking an incorrect fix, versus distrusting it and re-investigating from scratch. Worth discussing as a team before it happens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Required Skill Sets Change
&lt;/h3&gt;

&lt;p&gt;The shift goes from "investigate from scratch when alerted" to "review the Agent's findings and judge whether anything was missed." It's similar to when CI/CD became standard — the emphasis moved from "memorize manual deployment procedures" to "design pipelines and make judgment calls when they fail." As automation advances, the ability to audit automated output and handle exceptions that automation can't becomes more important than hands-on execution skills.&lt;/p&gt;

&lt;p&gt;For less experienced team members, learning design becomes necessary. Previously, "investigating incidents yourself" was the primary way to build incident response skills. With agents handling initial investigation, these "learn by doing" opportunities shrink.&lt;/p&gt;

&lt;p&gt;Possible approaches include "form your own hypothesis before looking at the Agent's findings," "review Agent output with intentionally injected errors," or "regular incident response drills without the Agent." The same thinking as understanding manual deployment procedures even when you rely on CI/CD.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can You Reduce Headcount?
&lt;/h3&gt;

&lt;p&gt;For SRE team managers, this question is unavoidable. Can Frontier Agents adoption mean fewer people?&lt;/p&gt;

&lt;p&gt;In the short term, "same headcount, broader coverage" is more realistic. Agent-handled initial investigations free up SRE team time. Whether that freed time goes to "headcount reduction" or "proactive improvements that were previously backlogged (SLO reviews, chaos engineering, architecture improvements)" is the real question. The latter likely delivers more organizational value.&lt;/p&gt;

&lt;p&gt;SRE teams perpetually stuck in reactive incident response mode (the "firefighter" state) is a common challenge. Frontier Agents adoption is a catalyst for accelerating the shift from "firefighter" to "fire prevention engineer."&lt;/p&gt;

&lt;h2&gt;
  
  
  Constraints and Caveats
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Reading the Numbers
&lt;/h3&gt;

&lt;p&gt;AWS's published figures — "up to 75% MTTR reduction," "up to 80% faster investigations," "94% root cause accuracy" — are all preview-period customer-reported values, explicitly qualified with "up to." Whether your environment sees similar results depends on application complexity, monitoring maturity, and incident characteristics. Treat them as reference points and validate in your own environment.&lt;/p&gt;

&lt;p&gt;WGU's "estimated 2 hours → 28 minutes" and LG CNS's "over 50% faster testing" are also results from specific situations. This article cites these numbers as material for understanding implications, not as guarantees that generalize.&lt;/p&gt;

&lt;h3&gt;
  
  
  Region Limitations
&lt;/h3&gt;

&lt;p&gt;Both DevOps Agent and Security Agent are available in the same six regions at GA: US East (N. Virginia), US West (Oregon), Europe (Frankfurt / Ireland), and Asia Pacific (Sydney / Tokyo). Tokyo region availability is a plus for teams in Japan.&lt;/p&gt;

&lt;p&gt;However, per &lt;a href="https://newclawtimes.com/articles/aws-frontier-agents-devops-security-autonomous-operations-enterprise/" rel="noopener noreferrer"&gt;New Claw Times&lt;/a&gt; analysis, DevOps Agent inference processing occurs in US regions regardless of the selected region. Organizations with strict GDPR or data residency requirements should verify this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Free Trial Limitations
&lt;/h3&gt;

&lt;p&gt;Both agents include a 2-month free trial, but DevOps Agent has monthly caps. Per &lt;a href="https://aws.amazon.com/devops-agent/pricing/" rel="noopener noreferrer"&gt;official pricing&lt;/a&gt;, the trial period allows up to 10 Agent Spaces, 20 hours of incident investigation, 15 hours of prevention evaluations, and 20 hours of on-demand SRE tasks per month. Excess usage incurs standard charges. Sufficient for pre-production evaluation, but watch the limits for large-scale PoCs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multicloud Support Reality
&lt;/h3&gt;

&lt;p&gt;DevOps Agent supports AWS, Azure, and on-premises. On-premises connection uses MCP, requiring access configuration for target tools. "It doesn't just see other environments without setup." Note that DevOps Agent does not explicitly support Google Cloud environments at GA.&lt;/p&gt;

&lt;p&gt;Security Agent's penetration testing supports &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/03/aws-security-agent-ondemand-penetration/" rel="noopener noreferrer"&gt;per the official GA announcement&lt;/a&gt; "AWS, Azure, GCP, other cloud-providers, and on-premises" — it can test any reachable endpoint regardless of cloud provider.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Agent: GitHub Only
&lt;/h3&gt;

&lt;p&gt;As mentioned earlier, Security Agent's PR review (Code Review) only supports GitHub at GA. Organizations primarily using GitLab or Bitbucket need to factor in this constraint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary: Not "Replacement" but "Redesigning the Division of Labor"
&lt;/h2&gt;

&lt;p&gt;Frontier Agents don't "eliminate" SRE work — they "partition" it.&lt;/p&gt;

&lt;p&gt;Work that shifts to agents centers on pattern recognition and correlation analysis: detecting anomalies across metrics and logs, matching against historical incidents to form hypotheses, systematically scanning code for vulnerabilities. This "intellectual labor that demands volume and speed" is where agents excel.&lt;/p&gt;

&lt;p&gt;Work that stays human centers on judgment and decision-making: whether to apply a fix, how to assess business impact, how much risk to accept, how to evolve the architecture. These are context-dependent, requiring organizational knowledge and business priority judgment — outside the agent's scope.&lt;/p&gt;

&lt;p&gt;For SRE team management, this is an opportunity to redesign team skill composition and on-call structure around this new division of labor. Not "agents mean we need fewer people," but "agents handle more of the routine, so humans can focus on work that demands greater judgment."&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>sre</category>
      <category>security</category>
    </item>
    <item>
      <title>Architecture Layers That S3 Files Eliminates — and Creates</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Thu, 09 Apr 2026 06:56:45 +0000</pubDate>
      <link>https://dev.to/aws-builders/architecture-layers-that-s3-files-eliminates-and-creates-16ke</link>
      <guid>https://dev.to/aws-builders/architecture-layers-that-s3-files-eliminates-and-creates-16ke</guid>
      <description>&lt;p&gt;On April 7, 2026, AWS made Amazon S3 Files generally available. It lets you mount S3 buckets as NFS v4.1/v4.2 file systems from EC2, EKS, ECS, and Lambda.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://aws.amazon.com/blogs/aws/launching-s3-files-making-s3-buckets-accessible-as-file-systems/" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd2908q01vomqb2.cloudfront.net%2Fda4b9237bacccdf19c0760cab7aec4a8359010b0%2F2026%2F04%2F07%2FScreenshot-2026-04-06-at-3.50.49%25E2%2580%25AFPM.png" height="433" class="m-0" width="800"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://aws.amazon.com/blogs/aws/launching-s3-files-making-s3-buckets-accessible-as-file-systems/" rel="noopener noreferrer" class="c-link"&gt;
            Launching S3 Files, making S3 buckets accessible as file systems | AWS News Blog
          &lt;/a&gt;
        &lt;/h2&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fa0.awsstatic.com%2Fmain%2Fimages%2Fsite%2Ffav%2Ffavicon.ico" width="16" height="16"&gt;
          aws.amazon.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;There are already plenty of setup guides and first-look posts. This article focuses on something different: what becomes unnecessary and what becomes possible in your architecture.&lt;/p&gt;

&lt;p&gt;If you use S3 regularly and are wondering "this sounds big, but how does it actually affect my architecture?" — this is for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem S3 Files Is Solving
&lt;/h2&gt;

&lt;p&gt;Let's start with a shared understanding.&lt;/p&gt;

&lt;p&gt;Say an ML team needs to preprocess training data. The raw data is in S3. They want to use pandas. While &lt;code&gt;pd.read_csv("s3://my-bucket/data.csv")&lt;/code&gt; works, under the hood boto3 issues GET requests and loads data into memory. Writing results back requires PUT. This is fundamentally different from &lt;code&gt;open("./data.csv")&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;At scale, this becomes an architectural problem. Many organizations operate pipelines like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fojovark7e9sap1xag7k5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fojovark7e9sap1xag7k5.png" alt=" " width="800" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Copy from S3 to EFS/EBS, process, write results back to S3. This "middle copy layer" exists solely to bridge the I/O model gap between object storage and file systems. Maintaining sync scripts, managing consistency during copies, and provisioning EFS — all of this overhead comes from that gap.&lt;/p&gt;

&lt;p&gt;S3 Files aims to eliminate this gap entirely.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxczxqawaej6wt02f41a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxczxqawaej6wt02f41a.png" alt=" " width="800" height="183"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the application's perspective, S3 data appears as a local directory. &lt;code&gt;pd.read_csv("/mnt/s3files/data.csv")&lt;/code&gt; reads from S3 behind the scenes, and &lt;code&gt;df.to_csv("/mnt/s3files/result.csv")&lt;/code&gt; automatically commits changes back.&lt;/p&gt;

&lt;p&gt;The full technical overview is in the official documentation.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files.html" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;docs.aws.amazon.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;h2&gt;
  
  
  Why This Isn't Just Another Mount Feature
&lt;/h2&gt;

&lt;p&gt;If "mount S3" sounds familiar, you might be thinking of Mountpoint for Amazon S3 or Google Cloud's Cloud Storage FUSE (gcsfuse). S3 Files has a fundamentally different architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Difference from FUSE-Based Tools
&lt;/h3&gt;

&lt;p&gt;FUSE-based tools emulate file system behavior on top of S3's API. In Mountpoint for Amazon S3, for example, overwriting a file means deleting the old object and PUTting a new one. Partial file writes — a basic file system operation — aren't supported. Directories don't actually exist, leading to inconsistencies with empty directories.&lt;/p&gt;

&lt;p&gt;S3 Files doesn't emulate. It connects EFS (Elastic File System), a real NFS file system, to S3. The file system side provides real NFS semantics, and the S3 side remains real S3 objects. Two distinct systems coexist with an explicit synchronization layer between them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgep96c8nftvluyqe4r9h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgep96c8nftvluyqe4r9h.png" alt=" " width="800" height="661"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This matters in practice: appending to a WAL (Write-Ahead Log) or editing part of a config file works with byte-level writes on the file system side, periodically synced to S3 as whole objects. With FUSE, these operations require re-PUTting the entire object.&lt;/p&gt;

&lt;h3&gt;
  
  
  What "Stage and Commit" Actually Does
&lt;/h3&gt;

&lt;p&gt;Andy Warfield, VP and Distinguished Engineer at AWS, describes the sync model as "stage and commit" in his post on All Things Distributed, explicitly noting it's "a term borrowed from version control systems like git" (official documentation uses "synchronization" instead).&lt;/p&gt;

&lt;p&gt;File system changes are like working directory changes in Git. They aren't immediately reflected in S3 — instead, they're batched and committed as S3 PUTs approximately every 60 seconds. In the other direction, when objects are updated in S3 (e.g., via PutObject from another application), the official documentation states changes are reflected in the file system "typically within seconds." DevelopersIO's hands-on testing measured approximately 30 seconds.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://dev.classmethod.jp/articles/amazon-s3-files-ga-mount-and-compare-efs/" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.ctfassets.net%2Fct0aopd36mqt%2Fwp-thumbnail-066beb776f0c57ce64255fadcc072f60%2F82b2f6687ab5774cd73f9176dcac7855%2Famazon-s3" height="630" class="m-0" width="1200"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://dev.classmethod.jp/articles/amazon-s3-files-ga-mount-and-compare-efs/" rel="noopener noreferrer" class="c-link"&gt;
            Amazon S3 Files が GA — S3 バケットをファイルシステムとしてマウント、EFS と比較してみた | DevelopersIO
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            2026年4月提供開始のAmazon S3 Filesは、S3バケットをNFS v4.2でマウント可能にする新サービス。EC2/Lambda/EKS/ECSから利用でき、既存レガシーアプリケーションのコード変更なしでS3を活用できます。

          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev.classmethod.jp%2Ffavicon.ico" width="48" height="48"&gt;
          dev.classmethod.jp
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;If both sides modify the same file simultaneously, S3 is the source of truth. The file system version is moved to a lost+found directory, with a CloudWatch metric indicating the conflict.&lt;/p&gt;

&lt;p&gt;This is a deliberate tradeoff: not a real-time shared file system, but one that tolerates tens of seconds of delay in exchange for preserving both file and object semantics without compromise.&lt;/p&gt;

&lt;p&gt;According to Warfield's post, the team initially tried to make the boundary between files and objects invisible, but every approach forced unacceptable compromises on one side or the other. They ultimately decided to make the boundary itself an explicit, well-designed feature. His post is essential reading for understanding the "why" behind S3 Files.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.allthingsdistributed.com/2026/04/s3-files-and-the-changing-face-of-s3.html" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.allthingsdistributed.com%2Fimages%2Fsunflowers.jpg" height="570" class="m-0" width="800"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.allthingsdistributed.com/2026/04/s3-files-and-the-changing-face-of-s3.html" rel="noopener noreferrer" class="c-link"&gt;
            S3 Files and the changing face of S3 | All Things Distributed
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            Andy Warfield writes about the hard-won lessons dealing with data friction that lead to S3 Files
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/data%3Aimage%2Fpng%3Bbase64%2CiVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNk%2BA8AAQUBAScY42YAAAAASUVORK5CYII%3D" width="1" height="1"&gt;
          allthingsdistributed.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;h2&gt;
  
  
  Architecture Layers That Disappear
&lt;/h2&gt;

&lt;p&gt;Here's the core of this article: what specific architectural patterns does S3 Files make unnecessary?&lt;/p&gt;

&lt;h3&gt;
  
  
  1. S3 → EFS/EBS Staging Pipelines
&lt;/h3&gt;

&lt;p&gt;Consider a daily retraining pipeline for a recommendation model. Purchase logs accumulate in S3, and preprocessing involves data cleansing → feature generation → format conversion.&lt;/p&gt;

&lt;p&gt;Previously, every time an EC2 or SageMaker Processing Job starts, it first downloads data from S3 to EBS. For 100GB of training data, depending on instance network bandwidth, the download alone takes several minutes. After processing, results are uploaded back to S3, and the EBS volume is cleaned up. Of the four steps — download → process → upload → cleanup — only "process" is the actual work.&lt;/p&gt;

&lt;p&gt;With S3 Files, you mount the S3 prefix (e.g., &lt;code&gt;s3://ml-data/purchase-logs/&lt;/code&gt;) and your processing script reads and writes &lt;code&gt;/mnt/s3files/purchase-logs/&lt;/code&gt; directly. Download, upload, and cleanup steps disappear.&lt;/p&gt;

&lt;p&gt;Note: if a downstream job needs to read results via the S3 API immediately, the ~60-second commit delay matters. If both jobs use the same mount point, this isn't an issue. For S3 API consumers, design around S3 event notifications or explicit waits.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Lambda's "/tmp Download" Pattern
&lt;/h3&gt;

&lt;p&gt;Consider a Lambda function that generates thumbnails when images are uploaded to S3. The traditional implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Traditional: Download → Process → Upload
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;

&lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bucket&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;download_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/tmp/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;download_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;download_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;download_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;thumbnail&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;thumb_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/tmp/thumb_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thumb_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thumb_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;thumbnails/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;With S3 Files mounted:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# S3 Files: Operate directly on mounted paths
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/mnt/s3files/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;thumbnail&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/mnt/s3files/thumbnails/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You don't even need to import boto3. The same code you'd write for local development works as-is.&lt;/p&gt;

&lt;p&gt;Beyond code simplicity, Lambda functions are freed from &lt;code&gt;/tmp&lt;/code&gt; capacity constraints (default 512MB, max 10GB). For functions referencing multi-GB ML models, cold start download time directly impacted latency. S3 Files pre-fetches files below a configurable threshold (default 128KB) alongside metadata, and fetches larger files on demand. Warfield calls this "lazy hydration" in his post — you can start working immediately even with millions of objects in the bucket.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-mounting-lambda.html" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;docs.aws.amazon.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Self-Managed EFS + S3 Sync
&lt;/h3&gt;

&lt;p&gt;If your organization uses S3 as a data lake but needs EFS for real-time processing or interactive analysis, you likely have DataSync, Step Functions, or cron scripts bridging the two. Maintaining this sync logic — detecting new objects, identifying diffs, retry on failure, consistency during sync, cleanup of stale EFS files — is a significant operational burden.&lt;/p&gt;

&lt;p&gt;S3 Files replaces this with managed synchronization. Per the official documentation, import from S3 runs at up to 2,400 objects/second, and export to S3 uses ~60-second batch windows. Unused file data is automatically evicted from the file system cache (configurable from 1 to 365 days, default 30) but never deleted from S3. File system storage costs scale with your active working set, not your total dataset.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-synchronization.html" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;docs.aws.amazon.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;h3&gt;
  
  
  4. Adapter Layers for Legacy Applications
&lt;/h3&gt;

&lt;p&gt;Log aggregation tools watching &lt;code&gt;/var/log/&lt;/code&gt;, build systems reading from &lt;code&gt;/src/&lt;/code&gt;, config management tools writing to &lt;code&gt;/etc/&lt;/code&gt; — these applications assume &lt;code&gt;open()&lt;/code&gt; / &lt;code&gt;read()&lt;/code&gt; / &lt;code&gt;write()&lt;/code&gt; and rewriting them for the S3 SDK is often impractical.&lt;/p&gt;

&lt;p&gt;Previously, "put files on EFS, back up to S3 as needed" was the pragmatic solution. S3 Files lets you keep S3 as primary storage while applications access it via NFS mount. POSIX permissions and file locking (flock) are supported, making migration possible with a mount point change and zero code changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  New Architecture Patterns
&lt;/h2&gt;

&lt;p&gt;What becomes practically feasible for the first time?&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Two-Tier Read Optimization
&lt;/h3&gt;

&lt;p&gt;S3 Files uses a two-tier architecture internally. The first tier, "high-performance storage," caches small, frequently accessed files with sub-millisecond to single-digit millisecond latency per the official documentation. The second tier is S3 itself — reads of 1MB or larger are streamed directly from S3 even if data is cached locally, because S3 is optimized for throughput. Notably, these large reads incur only S3 GET request costs with no file system access charge.&lt;/p&gt;

&lt;p&gt;Official performance specifications:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Max read throughput per client&lt;/td&gt;
&lt;td&gt;3 GiB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aggregate read throughput per file system&lt;/td&gt;
&lt;td&gt;Terabytes/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max read IOPS per file system&lt;/td&gt;
&lt;td&gt;250,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aggregate write throughput per file system&lt;/td&gt;
&lt;td&gt;1–5 GiB/s (varies by region)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max write IOPS per file system&lt;/td&gt;
&lt;td&gt;50,000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-performance.html" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;docs.aws.amazon.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;For context: EBS gp3 provides 125 MiB/s baseline throughput, scalable to 2,000 MiB/s (~2 GB/s) with additional provisioning. io2 Block Express maxes out at 4 GB/s. S3 Files delivers comparable read throughput without any volume provisioning.&lt;/p&gt;

&lt;p&gt;From spec values alone: reading a 100GB dataset sequentially takes ~13 minutes at gp3 default (125 MiB/s) versus ~33 seconds at S3 Files maximum (3 GiB/s). Actual throughput depends on workload and instance type, but the order-of-magnitude difference matters. And since 1MB+ reads are billed at S3 GET rates only, heavy sequential reads essentially incur no file system charges.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Large Reference Data in Lambda
&lt;/h3&gt;

&lt;p&gt;Previously, Lambda functions using large reference data had three options: container images with embedded models (max 10GB, rebuild on every model update), EFS mounts (requires VPC, tends to increase cold starts), or S3 downloads to &lt;code&gt;/tmp&lt;/code&gt; (max 10GB, download time added to cold starts). S3 Files is a fourth option: mount the S3 prefix, read model files via the file system. Model updates require only an S3 upload — no Lambda redeployment needed.&lt;/p&gt;

&lt;p&gt;Unlike EFS mounts, the backend is your standard S3 bucket, so S3-native features like versioning, lifecycle policies, and cross-region replication work as-is.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. AI Agent Access to S3 Data
&lt;/h3&gt;

&lt;p&gt;Coding agents like Claude Code, Codex, Kiro, and Cursor use file system operations as their primary data access method: &lt;code&gt;ls&lt;/code&gt; to list files, &lt;code&gt;cat&lt;/code&gt; to read, editor to modify and save. It's the Unix toolchain.&lt;/p&gt;

&lt;p&gt;Of course, agents can access S3 through other means — running aws cli commands, calling S3 APIs via MCP servers or Skills/Powers, generating boto3 code. But all of these are indirect compared to file operations and add reasoning steps. To search S3 logs, a file system lets you write &lt;code&gt;grep -r "ERROR" /mnt/s3files/logs/&lt;/code&gt; in one line, while the S3 API requires listing objects, downloading individually, and searching locally.&lt;/p&gt;

&lt;p&gt;With S3 Files mounting the bucket, this indirection disappears. To the agent, S3 data is just another directory under &lt;code&gt;/mnt/s3files/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4r8yd6ripkk8murb0d61.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4r8yd6ripkk8murb0d61.png" alt=" " width="489" height="1662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Warfield's post describes AWS engineering teams using Kiro and Claude Code hitting the problem of agent context windows compacting and losing session state. With S3 Files, agents write investigation notes and task summaries to shared directories, and other agents read them. When sessions end, state persists on the file system for the next session.&lt;/p&gt;

&lt;p&gt;File locking (flock) supports mutual exclusion across agents and processes. However, S3 API access bypasses file locks — if you write from both the file system and S3 API simultaneously, locking won't protect you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Constraints and Decision Criteria
&lt;/h2&gt;

&lt;p&gt;S3 Files isn't universal. Key constraints to evaluate:&lt;/p&gt;

&lt;h3&gt;
  
  
  Commit Interval: ~60 Seconds (By Design)
&lt;/h3&gt;

&lt;p&gt;Writes take ~60 seconds to appear as S3 objects. If job B reads via S3 API immediately after job A writes via the file system, job B may see stale data.&lt;/p&gt;

&lt;p&gt;This isn't just a limitation — it's a cost optimization. Per the official documentation, consecutive writes to the same file are aggregated within the 60-second window and committed as a single S3 PUT, reducing S3 request costs and versioning storage overhead.&lt;/p&gt;

&lt;p&gt;Sync throughput per the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-performance.html" rel="noopener noreferrer"&gt;official performance specification&lt;/a&gt;: S3 → file system at up to 2,400 objects/s and 700 MB/s; file system → S3 at up to 800 files/s and 2,700 MB/s.&lt;/p&gt;

&lt;p&gt;No "commit now" API exists at GA. Warfield mentions this as an area for future improvement. Workarounds: pass data between jobs via the file system (same mount point), or trigger downstream jobs via S3 event notifications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rename Costs
&lt;/h3&gt;

&lt;p&gt;S3 has no native rename. File system renames are implemented as copy + delete internally. Per the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-performance.html" rel="noopener noreferrer"&gt;official performance specification&lt;/a&gt;, renaming a directory of 100,000 files completes instantly on the file system, but takes several minutes to reflect in the S3 bucket. During that window, the file system shows the new path while S3 still has the old keys. S3-side request costs (100K CopyObject + 100K DeleteObject) are also non-trivial.&lt;/p&gt;

&lt;h3&gt;
  
  
  Buckets Exceeding 50 Million Objects
&lt;/h3&gt;

&lt;p&gt;Warfield's post warns about mounting buckets with more than 50 million objects (this figure doesn't currently appear on the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-quotas.html" rel="noopener noreferrer"&gt;official quotas page&lt;/a&gt;). Consider mounting a specific prefix to narrow the scope.&lt;/p&gt;

&lt;h3&gt;
  
  
  VPC Requirement
&lt;/h3&gt;

&lt;p&gt;Mount targets live inside a VPC. Lambda functions and EC2 instances must connect from subnets in the same AZ as the mount target. Per the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files.html" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;, supported compute services are EC2, Lambda, EKS, and ECS. On-premises or cross-cloud resources are not in the supported list.&lt;/p&gt;

&lt;h3&gt;
  
  
  Namespace Incompatibilities
&lt;/h3&gt;

&lt;p&gt;Some S3 object keys can't be represented as POSIX filenames: keys ending with &lt;code&gt;/&lt;/code&gt;, keys containing POSIX-invalid characters, or path components exceeding 255 bytes. See the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-quotas.html" rel="noopener noreferrer"&gt;official quotas page&lt;/a&gt; for the full list.&lt;/p&gt;

&lt;p&gt;This is intentional. Per Warfield's post, the team chose to pass through the vast majority of keys that work in both worlds and emit events for incompatible ones rather than silently converting them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Versioning Required
&lt;/h3&gt;

&lt;p&gt;S3 Files &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-prereq-policies.html" rel="noopener noreferrer"&gt;requires S3 bucket versioning&lt;/a&gt;. For existing buckets, evaluate the storage cost impact (old versions are retained) and compatibility with existing lifecycle rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision Flowchart
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp33nbpcbl3hn946f2xrj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp33nbpcbl3hn946f2xrj.png" alt=" " width="800" height="1066"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Review in Your Existing Architecture
&lt;/h2&gt;

&lt;p&gt;First, inventory your pipelines for "copy from S3, process, write back to S3" patterns. Batch processing and ML preprocessing pipelines with EBS/EFS staging layers are prime candidates for replacement.&lt;/p&gt;

&lt;p&gt;Second, consider how storage choices change for new projects. "Put it in S3 now, access it as a file system later" is now a viable strategy, reducing the urgency of early "object vs. file system" decisions.&lt;/p&gt;

&lt;p&gt;Third, audit Lambda functions that explicitly download to / upload from &lt;code&gt;/tmp&lt;/code&gt;. Functions handling large reference data or sharing data across invocations are worth evaluating.&lt;/p&gt;

&lt;p&gt;S3 started 20 years ago as an object store. With Tables, Vectors, and now Files, it has expanded how data can be accessed. S3 Files removes one more architectural constraint imposed by storage choices. It won't apply to every workload, but for organizations where "the data is in S3 but the tools need a file system" — and that's a lot of organizations — the impact is significant.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>s3</category>
      <category>lambda</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
