<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kento IKEDA</title>
    <description>The latest articles on DEV Community by Kento IKEDA (@ikenyal).</description>
    <link>https://dev.to/ikenyal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F634239%2Fefa867e9-b872-436c-a450-2c5115bd4394.jpg</url>
      <title>DEV Community: Kento IKEDA</title>
      <link>https://dev.to/ikenyal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ikenyal"/>
    <language>en</language>
    <item>
      <title>What AgentCore Managed Harness Takes Over, What It Leaves to You</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Fri, 08 May 2026 21:03:13 +0000</pubDate>
      <link>https://dev.to/aws-builders/what-agentcore-managed-harness-takes-over-what-it-leaves-to-you-1je6</link>
      <guid>https://dev.to/aws-builders/what-agentcore-managed-harness-takes-over-what-it-leaves-to-you-1je6</guid>
      <description>&lt;p&gt;On April 22, 2026, AWS added a "managed agent harness" (preview) to Amazon Bedrock AgentCore. With this feature, you declare the model, system prompt, and tools as configuration, and the agent runs; the orchestration code lives on the AWS side as a managed service.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/get-to-your-first-working-agent-in-minutes-announcing-new-features-in-amazon-bedrock-agentcore/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/machine-learning/get-to-your-first-working-agent-in-minutes-announcing-new-features-in-amazon-bedrock-agentcore/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What stands out about this release is less the feature itself and more AWS's adoption of the term "agent harness." Since Martin Fowler wrote his harness engineering essay in February 2026, Anthropic and OpenAI have started using "harness" officially, and now a cloud vendor has applied the same word to its own service.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://martinfowler.com/articles/harness-engineering.html" rel="noopener noreferrer"&gt;https://martinfowler.com/articles/harness-engineering.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the perspective of someone who has been assembling a harness by hand, the question becomes: what does managed harness take over, and what stays in my hands? This article sorts out that dividing line. Drawing on experience running business-automation agents with Claude Desktop, multiple MCP servers, and Markdown-based knowledge, I lay out the correspondence with AgentCore managed harness.&lt;/p&gt;

&lt;p&gt;A few "tried it out" articles have already been published, so this article positions itself as the step that comes before them: it offers material for deciding whether to adopt, hold off, or phase in gradually. Using the official blog, documentation, and existing explanatory articles as sources, I sort out the correspondence and the decision criteria that emerge from operating a self-built harness.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS released "managed harness"
&lt;/h2&gt;

&lt;p&gt;The official blog mentioned above lays out the structure: every agent has an orchestration layer, and running that layer requires compute, a sandbox to safely execute code, tool connections, persistent storage, and error recovery as the underlying infrastructure—bundled together, they form the agent harness. Managed harness is AWS providing this harness as a managed offering, where the user declares the model, system prompt, and tools as configuration, and a working agent is the result.&lt;/p&gt;

&lt;p&gt;Let me first align on what the word "harness" refers to. The term gets used both for what the vendor builds in (internal) and for what the user assembles around the agent (external), and the meaning shifts with context. In addition to Fowler's framing, watany has organized the internal/external confusion in a Zenn article.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zenn.dev/watany/articles/d8b692bbca65a3" rel="noopener noreferrer"&gt;https://zenn.dev/watany/articles/d8b692bbca65a3&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article is written from the position of "someone who has been assembling the external environment by hand"—the user-side harness, in operation. AgentCore managed harness can be read as the vendor-side internal harness now offered as managed, but from the user's perspective, it can also be read as: part of what we used to build for ourselves can now be delegated. This duality is the starting point for thinking about where responsibilities split with self-built operation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Self-built harness composition, the four blank layers
&lt;/h2&gt;

&lt;p&gt;Let me map my self-built harness to AgentCore's components. The environment I've been operating consists, broadly, of three elements, and I'll lay out how each one corresponds to something on the AgentCore side.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Self-built harness&lt;/th&gt;
&lt;th&gt;AgentCore side&lt;/th&gt;
&lt;th&gt;Degree of correspondence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Markdown knowledge files (under &lt;code&gt;agents/&lt;/code&gt;, &lt;code&gt;knowledge/&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;AgentCore Memory&lt;/td&gt;
&lt;td&gt;Similar role; persistence and retrieval mechanisms differ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP servers (task management / calendar / chat / document management, etc.)&lt;/td&gt;
&lt;td&gt;AgentCore Gateway&lt;/td&gt;
&lt;td&gt;MCP is becoming the standard, so they're close&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Desktop&lt;/td&gt;
&lt;td&gt;AgentCore Runtime&lt;/td&gt;
&lt;td&gt;The execution base for the agent loop, at a different scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(none)&lt;/td&gt;
&lt;td&gt;AgentCore Identity&lt;/td&gt;
&lt;td&gt;Not implemented in self-built&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(none)&lt;/td&gt;
&lt;td&gt;AgentCore Policy&lt;/td&gt;
&lt;td&gt;Not implemented in self-built&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(none)&lt;/td&gt;
&lt;td&gt;AgentCore Observability&lt;/td&gt;
&lt;td&gt;Not implemented in self-built&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(none)&lt;/td&gt;
&lt;td&gt;AgentCore Evaluations&lt;/td&gt;
&lt;td&gt;Not implemented in self-built&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The top three are the correspondence between "what I assembled by hand" and "what AgentCore provides as managed for the same role." The bottom four are blank layers in the self-built harness—components AgentCore offers that aren't covered by my operation.&lt;/p&gt;

&lt;p&gt;The natural question here is whether these four blank layers are "things I didn't write because I didn't need them" or "things I wanted but had given up on." The two are different. For the former, introducing managed harness yields little value; for the latter, it brings value.&lt;/p&gt;

&lt;p&gt;Let me go through the four layers in order.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Identity&lt;/strong&gt; is for managing authentication and permissions when multiple users access the agent. Since my self-built harness runs on a personal device, authentication can rely on the device login, and per-agent authentication wasn't necessary. This layer is unnecessary "as long as it's just me." The moment you try to share an agent across an organization, controlling who can call which MCP server for what purpose becomes a problem, and the gap turns into something you resign yourself to not having.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Policy&lt;/strong&gt; is the mechanism for declaratively defining boundaries when the agent calls tools. It's based on Cedar, AWS's open-source policy language, and you can generate policies from natural language. In my self-built harness, I draw loose boundaries through MCP server scopes and by documenting "what not to do" in the knowledge files—but this is discipline, not enforcement. I had wanted to write strong, enforceable boundaries, but didn't have the motivation to build a Cedar-equivalent system myself, so I had given up on this area.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability&lt;/strong&gt; is the mechanism for emitting agent execution logs, traces, and metrics to CloudWatch for visualization. In my self-built harness, I have the conversation history in Claude Desktop and individual logs from each MCP server, but no mechanism to track "which agent called what, when, and how it failed" across all of them. For solo use, looking at the chat screen suffices, but cross-cutting tracking becomes necessary in organizational deployment, so this layer falls into the "wanted but gave up on" category.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evaluations&lt;/strong&gt; is the mechanism for continuously evaluating the agent's response quality, with built-in evaluators for dimensions like helpfulness, tool-selection accuracy, and correctness. In my self-built harness, I check subjectively through knowledge-file improvement history and daily work logs, but I have no quantitative quality monitoring. For solo use, subjective is enough; but for organizational operation or paid services, this becomes essential.&lt;/p&gt;

&lt;p&gt;Looking back at the four layers, only Identity falls into "unnecessary as long as it's just me," while the other three fall into "would have been nice, but had given up on as self-built." The fact that the meaning of "blank" differs by layer affects the judgment of whether to adopt managed harness.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layers managed harness takes over, layers it leaves
&lt;/h2&gt;

&lt;p&gt;When you use managed harness, what stops being something you write, and what continues to require writing? This can be derived as fact from the official blog and documentation, so let me sort it out first.&lt;/p&gt;

&lt;p&gt;What managed harness takes over is the following range:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent loop: calling the model, selecting tools, returning results, managing context, and recovering from errors&lt;/li&gt;
&lt;li&gt;A microVM, filesystem, and shell isolated per session&lt;/li&gt;
&lt;li&gt;Tool-connection orchestration via AgentCore Gateway&lt;/li&gt;
&lt;li&gt;The framework portion based on Strands Agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Conversely, what users still need to write even when using managed harness is the following range:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which model to use&lt;/li&gt;
&lt;li&gt;What to write in the system prompt&lt;/li&gt;
&lt;li&gt;Which tools to make callable&lt;/li&gt;
&lt;li&gt;What goes into AgentCore Memory and what doesn't&lt;/li&gt;
&lt;li&gt;What boundaries to declare in AgentCore Policy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since declaration-based configuration suffices, the amount of code drops significantly. However, for the five items above, only the place where you write them changes; the judgments themselves don't go away. They just move into the &lt;code&gt;harness.json&lt;/code&gt; configuration file. Hands-on write-ups from people who have tried the managed harness preview show that &lt;code&gt;harness.json&lt;/code&gt; declares the model and the tool list, while a separate &lt;code&gt;system-prompt.md&lt;/code&gt; file holds the system prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.classmethod.jp/articles/bedrock-agentcore-managed-harness-preview/" rel="noopener noreferrer"&gt;https://dev.classmethod.jp/articles/bedrock-agentcore-managed-harness-preview/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/aws-samples/sample-AgentCore-Managed-Harness-News" rel="noopener noreferrer"&gt;https://github.com/aws-samples/sample-AgentCore-Managed-Harness-News&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This looks like what was previously written as Markdown system-prompt files and MCP connection definitions in the self-built harness, repackaged into AWS's configuration file format.&lt;/p&gt;

&lt;p&gt;In other words, what managed harness takes over is "the labor of writing orchestration code," not "the judgment of designing the agent." Design judgments still rest with the user. AWS expresses this as removing the infrastructure barrier, but the non-infrastructure part—"what is this agent for, and how far should it be allowed to go"—remains on the human side, whether it's managed or self-built.&lt;/p&gt;

&lt;p&gt;This distinction is an important perspective when judging whether to adopt managed harness. The pitch "you don't have to write code" is accurate, but reading it as "you don't have to think" makes it inaccurate.&lt;/p&gt;

&lt;h2&gt;
  
  
  What self-built operation lets you articulate: where design judgments live
&lt;/h2&gt;

&lt;p&gt;When you operate a self-built harness, you accumulate judgments about "where it's okay to move things, and where you must not." These don't go away when you adopt managed harness. The place where they appear shifts to the contents of &lt;code&gt;harness.json&lt;/code&gt;, but the judgments themselves continue to rest on the human side. Let me name a few representative ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Knowledge file granularity.&lt;/strong&gt; Whether to split your Markdown knowledge "by role" or "by task" is a judgment that, once made, eases subsequent operation. Splitting by role lets agent dispatch fall naturally out of context. Splitting by task scatters cross-task knowledge. There's no simple winner; the optimum depends on the number of agents you operate and how tasks overlap. Even with managed harness, the same question—what to combine in Memory and what to separate—remains.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP server combination design.&lt;/strong&gt; This is the line between "how far to wire up as tools via MCP" and "how far to handle through local file operations." For example, task management is better suited to MCP via API for automation, while sensitive tasks are safer kept as local file operations—judgments that emerge through use. Managed harness's Gateway has to answer the same question, just translated into declarations in a tool list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent-to-agent responsibility split.&lt;/strong&gt; This is the design choice between having a coordinator agent that judges context and dispatches to specialist agents, or calling specialist agents directly from the start. The coordinator style depends on context-judgment accuracy; the direct-call style puts the discrimination burden on the user. This too remains as a design judgment in managed harness, in the form of how to arrange and connect multiple harnesses.&lt;/p&gt;

&lt;p&gt;These three are judgments that are hard to articulate without operating self-built first. If you start from managed harness, these judgments end up looking "as if they were optimally placed from the beginning." In reality, you've just fixed the premises, but inside fixed premises, the existence of design judgments themselves becomes harder to see.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why not just use managed harness from the start?
&lt;/h3&gt;

&lt;p&gt;Here's a counterargument I anticipate: "If we just use managed harness from the start, we won't need to build anything ourselves."&lt;/p&gt;

&lt;p&gt;I partially agree with this counterargument. If you're building a new agent for organizational production from zero, I think going in through managed harness is faster. However, the design of an agent that will run in production is rarely visible from the start. Only by actually using the agent do the granularity of knowledge, the over- and under-supply of tools, and the boundaries of responsibility come into view. Whether you run this discovery process on top of a managed harness with fixed boundaries or on a self-built harness with high freedom changes how much you learn.&lt;/p&gt;

&lt;p&gt;Another perspective: judgments gained from self-built operation can be reused as a blueprint when you migrate to managed harness. If you go into managed harness without a blueprint, you can produce something that appears to work, but you are left with a system whose structure is hard to justify. Whether "let's just put it on managed harness and improve it as we go" works depends on whether one person or several people are doing the improving. For one person, the iteration speed gap between self-built and managed may be small; but once multiple people are improving the agent, declarative changes to &lt;code&gt;harness.json&lt;/code&gt; and an iteration cycle tied to deploys start to accumulate as operational debt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Order of adoption: where personal and organizational use diverge
&lt;/h2&gt;

&lt;p&gt;Whether to adopt managed harness can naturally branch by operational scale. Let me go through three stages.&lt;/p&gt;

&lt;p&gt;In the personal-use stage, where one person is using the agent, the self-built harness is often sufficient. The editing and use of knowledge files are tightly coupled, and the iteration of "rewrite Markdown the moment you notice something while using it" runs fast. Both Identity and Observability are hard to recognize as gaps as long as you're operating solo, and end up in the "would-be-nice-to-have, maybe" zone. In the experimental stage, this freedom directly translates into learning speed.&lt;/p&gt;

&lt;p&gt;At the stage of expanding to organizational operation where multiple people use the agent, the four blank layers all surface as problems at once. You need audit logs of who used which agent and how (Observability); shared environments must not allow tools to be called freely, so boundaries become necessary (Policy); you need to manage credentials per member (Identity); and you want to continuously measure agent response quality (Evaluations). At this stage, the value of managed harness comes to the fore. Compared with the labor of writing the four layers yourself, putting them on AgentCore becomes the practical choice.&lt;/p&gt;

&lt;p&gt;In the transition phase, you can take a hybrid strategy. Continue personal exploration on the self-built harness, and put only the settled paths used in organizational operation onto managed harness. Move agents whose design has settled to AgentCore one by one, and keep agents you are still learning from running close at hand.&lt;/p&gt;

&lt;p&gt;There's also a guideline for the order of adoption. The first things needed for organizational deployment are Identity and Observability, then Policy, and finally Evaluations. Without Identity, sharing itself can't be established. Without Observability, the organization can't make operational judgments. Introducing Policy only after an incident is often too late, so placing it early in organizational deployment is safer. Evaluations can wait until operation is under way; introducing quality measurement at that point is fine.&lt;/p&gt;

&lt;p&gt;The harness was originally a concept lying at the boundary between those who build agents and those who use them. With AWS releasing managed harness, part of what we used to assemble by hand has shifted into a mechanism that runs simply by declaring it as configuration. The fact that layers like Identity, Observability, and Policy—which I had given up on as self-built—have come within reach is no small thing.&lt;/p&gt;

&lt;p&gt;Even so, design judgments such as "what is this agent for," "what to leave in the knowledge," and "how far to grant tools authority" haven't been put into a form you can declare as configuration. The basis of these judgments will continue to live in the commit history and work logs of one's own repository. The experience of having built a self-built harness leaves behind, in your hands, knowledge that doesn't lose its value when you migrate to managed. With the arrival of managed harness, the boundary between "the layers we build ourselves" and "the layers only human judgment can carry" has become more clearly visible than before, you might say.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>agents</category>
      <category>ai</category>
      <category>bedrock</category>
    </item>
    <item>
      <title>Claude Managed Agents: The Layer That Disappears, The Layer That Stays — A View from Business Automation Agents</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Tue, 05 May 2026 07:29:36 +0000</pubDate>
      <link>https://dev.to/aws-builders/claude-managed-agents-the-layer-that-disappears-the-layer-that-stays-a-view-from-business-4n0</link>
      <guid>https://dev.to/aws-builders/claude-managed-agents-the-layer-that-disappears-the-layer-that-stays-a-view-from-business-4n0</guid>
      <description>&lt;p&gt;On April 8, 2026, Anthropic released Claude Managed Agents. The official framing is "meta-harness," and the engineering blog reports infrastructure-level improvements: p50 TTFT down about 60%, p95 down more than 90%. TTFT (time to first token) is the time from request to the first token of the response; p50 is the median, and p95 is the latency that 95% of requests stay under, so it reflects the slow tail. In other words, the median was cut by about 60% and the slow tail by more than 90%. These aren't numbers you get from a minor optimization; they're the kind of numbers an architectural change produces. Early adopters include Notion, Rakuten, Asana, Sentry, and Vibecode.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.anthropic.com/engineering/managed-agents" rel="noopener noreferrer"&gt;https://www.anthropic.com/engineering/managed-agents&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are already several Japanese articles covering this — terminology breakdowns by watany, builder/user harness classifications by Mr. Katayama (paiza), and trial reports by kumamo_tone and galirage, among others.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zenn.dev/watany/articles/d8b692bbca65a3" rel="noopener noreferrer"&gt;https://zenn.dev/watany/articles/d8b692bbca65a3&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://note.com/rk611/n/n8424c56f4fa5" rel="noopener noreferrer"&gt;https://note.com/rk611/n/n8424c56f4fa5&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zenn.dev/kumamo_tone/articles/365845d65e6cf4" rel="noopener noreferrer"&gt;https://zenn.dev/kumamo_tone/articles/365845d65e6cf4&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zenn.dev/galirage/articles/claude-managed-agents-quickstart" rel="noopener noreferrer"&gt;https://zenn.dev/galirage/articles/claude-managed-agents-quickstart&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But the existing discussion almost entirely assumes coding agents. Notion (coding, spreadsheets, slides), Sentry (bug → PR automation), Vibecode (code generation infrastructure) — they all line up as coding-style use cases. For someone building business automation agents (morning briefings, monthly accounting, QA operations, style audits) with Markdown and MCP, where does Managed Agents fit? That perspective hasn't really been laid out yet.&lt;/p&gt;

&lt;p&gt;I run a personal repository where &lt;code&gt;agents/&lt;/code&gt; holds 15 role-specific agent instructions, &lt;code&gt;knowledge/&lt;/code&gt; holds 40+ knowledge files, and &lt;code&gt;prompts/&lt;/code&gt; holds task templates. I run my work through Claude Desktop and MCP. The "harness engineering with Markdown only" idea I wrote about in a previous article is exactly this kind of setup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/aws-builders/harness-engineering-with-nothing-but-markdown-g6b"&gt;https://dev.to/aws-builders/harness-engineering-with-nothing-but-markdown-g6b&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With Managed Agents arriving, what happens to this self-hosted harness? Does all of it get replaced? Just part? Or is this a different conversation entirely?&lt;/p&gt;

&lt;p&gt;What this article covers is the boundary between the layer Managed Agents provides and the layer you keep yourself. It's not just about what's technically possible to put on Managed Agents — it's also about why, even when it's possible, keeping it yourself can still be the better call. I'll lay this out in five points. The latter half of the article uses my own agents as a worked example, classifying them as "fits / partial fit / doesn't fit."&lt;/p&gt;

&lt;h2&gt;
  
  
  Coding Agents and Business Automation Agents Make Different Demands on the Harness
&lt;/h2&gt;

&lt;p&gt;Before evaluating Managed Agents from a business automation angle, it's worth checking the assumptions behind the existing case studies.&lt;/p&gt;

&lt;p&gt;The early adopters Anthropic highlights are these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Notion: parallel coding, spreadsheet, and slide tasks delegated within the Notion workspace&lt;/li&gt;
&lt;li&gt;Rakuten: specialist agents per department, each shipped within a week&lt;/li&gt;
&lt;li&gt;Asana: AI Teammates picking up tasks inside projects&lt;/li&gt;
&lt;li&gt;Sentry: bugs detected and turned autonomously into pull requests&lt;/li&gt;
&lt;li&gt;Vibecode: code generation infrastructure with 10x faster setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mix runs from coding-leaning examples (Notion, Sentry, Vibecode) to department-business types (Rakuten, Asana), but they all share one thing: long-running, autonomous tasks that involve continuous operations on files and resources. Managed Agents features like "$0.08/session-hour," "checkpointing for long-running tasks," and "sandboxed code execution" are tuned for this kind of workload.&lt;/p&gt;

&lt;p&gt;On the other hand, here are the kinds of uses you might see from someone building business automation agents with Markdown and MCP. Drawing from my own active and planned setups:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Morning briefings (calendar, Slack mentions, Gmail, news summaries pulled together each morning)&lt;/li&gt;
&lt;li&gt;Monthly accounting support (pulling transactions from freee, aggregating in Excel, sharing with stakeholders) — under construction&lt;/li&gt;
&lt;li&gt;QA operations (reviewing MagicPod test runs, recording problematic test cases in Confluence, sharing in Slack)&lt;/li&gt;
&lt;li&gt;Style audits (checking article drafts against &lt;code&gt;writing-style-guide.md&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;1on1 prep (consolidating past notes, organizing discussion points)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lining the two up, the demands on the harness can be sorted into roughly four points:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Coding agent&lt;/th&gt;
&lt;th&gt;Business automation agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Main targets of operation&lt;/td&gt;
&lt;td&gt;File system + repositories&lt;/td&gt;
&lt;td&gt;SaaS APIs (Slack, Calendar, Gmail, freee, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution time&lt;/td&gt;
&lt;td&gt;Long-running tasks of minutes to hours&lt;/td&gt;
&lt;td&gt;Repeated short tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Where state lives&lt;/td&gt;
&lt;td&gt;File state inside the container persists&lt;/td&gt;
&lt;td&gt;State lives on the SaaS side, local state is transient&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Triggers&lt;/td&gt;
&lt;td&gt;Initiated by humans (chat UI)&lt;/td&gt;
&lt;td&gt;Mix of schedules, events, and human prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The way Managed Agents is designed fits use cases where the file system persists and tasks run for a long time. Sandboxes, checkpoints, and session-runtime billing all make sense in that context. For business automation use — many short tasks each calling SaaS APIs — most of these features won't get fully used.&lt;/p&gt;
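
&lt;p&gt;To put the session-hour pricing quoted earlier in perspective, here is a rough back-of-the-envelope sketch. The workloads are hypothetical (a 5-minute briefing and a 3-hour coding session, each run on 22 working days a month), the function name is my own, and the calculation covers runtime only: token charges and the actual metering granularity are not modeled.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;RATE_PER_SESSION_HOUR = 0.08  # USD per session-hour, the figure quoted earlier


def monthly_runtime_cost(minutes_per_run, runs_per_month, rate=RATE_PER_SESSION_HOUR):
    """Runtime-only cost; token charges and billing granularity are ignored."""
    session_hours = minutes_per_run * runs_per_month / 60
    return session_hours * rate


# Hypothetical workloads, 22 working days per month.
briefing_cost = monthly_runtime_cost(minutes_per_run=5, runs_per_month=22)
coding_cost = monthly_runtime_cost(minutes_per_run=180, runs_per_month=22)

print(round(briefing_cost, 2), round(coding_cost, 2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under these assumptions the briefing comes to about $0.15 a month in runtime and the coding session to about $5.28. For short, repeated tasks the runtime meter barely moves, which is another sign that the pricing model is built around long sessions.&lt;/p&gt;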

&lt;p&gt;This isn't to say "business automation doesn't fit Managed Agents." Different use cases simply get different things out of Managed Agents. With that in mind, the next sections cover how to combine it with a self-hosted harness.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Managed Agents Meta-Harness Actually Provides
&lt;/h2&gt;

&lt;p&gt;Reading the official engineering post "Scaling Managed Agents: Decoupling the brain from the hands" (April 8, 2026) makes the design philosophy of Managed Agents clear.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.anthropic.com/engineering/managed-agents" rel="noopener noreferrer"&gt;https://www.anthropic.com/engineering/managed-agents&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The opening sentence captures the whole thing.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Harnesses encode assumptions that go stale as models improve.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As a concrete example, Sonnet 4.5 had a behavior where it would wrap up tasks early just before hitting the context limit ("context anxiety"), and harnesses implemented context resets to compensate. Run the same harness on Opus 4.5, and the behavior is gone — the resets become dead weight. Corrections you bake into the harness become unnecessary as the model gets smarter. That's the observation.&lt;/p&gt;

&lt;p&gt;So Anthropic chose to abstract the harness itself. Just as an OS virtualizes hardware behind abstractions like &lt;code&gt;process&lt;/code&gt; and &lt;code&gt;file&lt;/code&gt;, Managed Agents separates an agent into three pieces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session: an append-only event log. The source of truth for everything that happened&lt;/li&gt;
&lt;li&gt;Harness: a stateless loop that calls Claude and routes tool calls&lt;/li&gt;
&lt;li&gt;Sandbox: the execution environment for code and file operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The harness calls the sandbox through a simple &lt;code&gt;execute(name, input) → string&lt;/code&gt; interface. Containers, smartphones, Pokémon emulators — anything fits behind the same abstraction, as the official post puts it.&lt;/p&gt;

&lt;p&gt;Where this decoupling pays off is that containers go from "pet" to "cattle." A pet is a uniquely cared-for, named individual; cattle are interchangeable, managed by number — that's the infrastructure ops metaphor. If a container dies, the harness receives it as a tool call error and provisions a new container. If the harness itself dies, you can &lt;code&gt;wake(sessionId)&lt;/code&gt;, call &lt;code&gt;getSession(id)&lt;/code&gt; to retrieve the event log, and resume from the last event. Only the session log is persisted. That's the design.&lt;/p&gt;
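
&lt;p&gt;To make the recovery path concrete, here is a minimal Python sketch of the idea. It is not Anthropic's actual API: &lt;code&gt;Session&lt;/code&gt;, &lt;code&gt;Sandbox&lt;/code&gt;, &lt;code&gt;run_harness&lt;/code&gt;, and &lt;code&gt;plan&lt;/code&gt; are hypothetical names, and a fixed list of tool calls stands in for Claude's decisions. It only illustrates the shape described above: an append-only log as the sole persisted state, a sandbox behind an &lt;code&gt;execute(name, input)&lt;/code&gt; call that returns a string, and a harness that rebuilds its progress from the log every time it is woken.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json


class Session:
    """Append-only event log: the only state that has to survive a crash."""

    def __init__(self):
        self._events = []

    def append(self, event):
        # Serialize so later mutation of the dict cannot rewrite history.
        self._events.append(json.dumps(event))

    def events(self):
        return [json.loads(raw) for raw in self._events]


class Sandbox:
    """Hypothetical sandbox behind an execute(name, input) interface."""

    def __init__(self, healthy=True):
        self.healthy = healthy

    def execute(self, name, tool_input):
        if not self.healthy:
            raise RuntimeError("sandbox unavailable")
        if name == "upper":
            return tool_input.upper()
        return "unknown tool: " + name


def run_harness(session, sandbox, plan):
    """Stateless loop: progress is derived from the log, never kept in memory.

    `plan` is a fixed list of (tool, input) pairs standing in for the
    model's tool-call decisions. Returns True when the plan is finished.
    """
    done = sum(1 for e in session.events() if e["type"] == "tool_result")
    for name, tool_input in plan[done:]:
        try:
            output = sandbox.execute(name, tool_input)
        except RuntimeError as err:
            # A dead sandbox is recorded as an ordinary tool error.
            session.append({"type": "tool_error", "tool": name, "error": str(err)})
            return False
        session.append({"type": "tool_result", "tool": name, "output": output})
    return True


plan = [("upper", "first"), ("upper", "second")]
session = Session()

# First attempt: the sandbox is down, so the step fails and is logged.
finished_first = run_harness(session, Sandbox(healthy=False), plan)

# "Wake": a fresh harness call with a fresh sandbox resumes from the log.
finished_second = run_harness(session, Sandbox(healthy=True), plan)

outputs = [e["output"] for e in session.events() if e["type"] == "tool_result"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this sketch the first call records a &lt;code&gt;tool_error&lt;/code&gt; and stops. The second call, given a fresh sandbox, reads the log, sees that no step has completed, and runs the plan to the end. Nothing about the harness itself had to be saved between the two calls, which is the property that lets containers be treated as cattle.&lt;/p&gt;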

&lt;p&gt;The TTFT improvements mentioned at the start (p50 down about 60%, p95 down more than 90%) come from this decoupling. Inference can begin without waiting for container provisioning.&lt;/p&gt;
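
&lt;p&gt;For readers less used to percentile latency figures, the following sketch shows how p50 and p95 are read off a set of samples and how a percentage drop is computed. The TTFT samples are made up for illustration and are not Anthropic's measurements; the helper names are my own. It uses &lt;code&gt;statistics.quantiles&lt;/code&gt; from the Python standard library.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import statistics


def percentile(samples, p):
    """Return the p-th percentile (p from 1 to 99) of the samples."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[p - 1]


def reduction_percent(before, after):
    """Percentage by which a latency figure dropped, rounded to an integer."""
    return round((1 - after / before) * 100)


# Made-up TTFT samples in milliseconds, 21 requests each.
ttft_before = [400] * 10 + [500] + [600] * 8 + [5000, 6000]
ttft_after = [150] * 10 + [200] + [250] * 8 + [450, 500]

p50_drop = reduction_percent(percentile(ttft_before, 50), percentile(ttft_after, 50))
p95_drop = reduction_percent(percentile(ttft_before, 95), percentile(ttft_after, 95))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these invented samples the median drops by 60% and the p95 by 91%. The point is that a handful of very slow requests dominates p95 while leaving p50 untouched, so a change that removes a slow step from the critical path, such as waiting for a container, shows up most strongly in the tail.&lt;/p&gt;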

&lt;p&gt;Anthropic positions its own service as a "meta-harness." Quoting the conclusion of the article:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Managed Agents is a meta-harness in the same spirit, unopinionated about the specific harness that Claude will need in the future.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In other words, what Anthropic provides is "a stable interface that any harness can sit on top of" (the virtualization of session/harness/sandbox), not a prescription for "this is your harness." Claude Code, task-specific harnesses, custom harnesses — all of them are meant to run on top of it.&lt;/p&gt;

&lt;p&gt;That's what Managed Agents actually is.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://platform.claude.com/docs/en/managed-agents/overview" rel="noopener noreferrer"&gt;https://platform.claude.com/docs/en/managed-agents/overview&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next section moves on to what sits on top: the contents of the self-hosted harness.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Self-Hosted Harness Is Two Layers: Managed Agents Replaces the Bottom Half, You Keep the Top
&lt;/h2&gt;

&lt;p&gt;What the official meta-harness design provides is the three abstractions: &lt;code&gt;session / harness / sandbox&lt;/code&gt;. In other words, it is the OS layer that makes an agent run: the substrate that hosts what sits above it, much as an OS provides processes, file systems, and memory.&lt;/p&gt;

&lt;p&gt;So what does a self-hosted harness put on top of that? In my own setup, it looks like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ikenyal-ai-agents/
├── agents/                  # role-specific agent instructions
│   ├── executive-assistant.md
│   └── ...                  # other role definitions
├── knowledge/               # knowledge base
│   ├── writing-style-guide.md
│   ├── article-strategy.md
│   └── ...                  # various contexts
├── prompts/                 # task templates
│   ├── morning-briefing.md
│   └── 1on1-prep.md
├── tasks/                   # task definitions
├── scripts/                 # analysis and automation scripts
├── docs/                    # work logs and operational docs
└── README.md                # root instructions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These sit on a different layer than the OS layer Managed Agents provides. They express what the agent "knows," "how it should behave," and "what's off limits" — the territory you might call the knowledge layer.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Contents&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OS layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed Agents (meta-harness)&lt;/td&gt;
&lt;td&gt;agent loop, tool execution, sandbox, session persistence&lt;/td&gt;
&lt;td&gt;the three abstractions of &lt;code&gt;session / harness / sandbox&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Knowledge layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;self-hosted repository&lt;/td&gt;
&lt;td&gt;agent behavior instructions, organizational context, domain knowledge, style conventions&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;agents/&lt;/code&gt;, &lt;code&gt;knowledge/&lt;/code&gt;, &lt;code&gt;prompts/&lt;/code&gt;, &lt;code&gt;CLAUDE.md&lt;/code&gt;, &lt;code&gt;AGENTS.md&lt;/code&gt;, &lt;code&gt;SKILL.md&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The existing Japanese articles (watany's terminology breakdown, Mr. Katayama's builder/user harness classification) discuss the harness as one big thing. To read accurately what Managed Agents actually changes, you need to see this two-layer structure.&lt;/p&gt;

&lt;p&gt;What Managed Agents replaces is the OS layer only. The knowledge layer stays as is.&lt;/p&gt;

&lt;p&gt;That was the technical-fact part. From here, this article gets into its main argument. Managed Agents lets you register Skills (&lt;code&gt;SKILL.md&lt;/code&gt;) and agent definitions, so technically you can put parts of the knowledge layer on it too. Even so, why is keeping it self-hosted the better choice? The next section breaks it down across five points.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why You Shouldn't Hand the Knowledge Layer to Anthropic — Five Points
&lt;/h2&gt;

&lt;p&gt;Reading the official Managed Agents docs, you can register the &lt;code&gt;mcp_servers&lt;/code&gt; definition, the choice of &lt;code&gt;tools&lt;/code&gt;, the &lt;code&gt;system&lt;/code&gt; prompt, and Skills (&lt;code&gt;SKILL.md&lt;/code&gt;) all as part of the agent definition. Technically, the knowledge layer can ride on Managed Agents too.&lt;/p&gt;

&lt;p&gt;The argument of this article is that even so, keeping it self-hosted is the better call. Five reasons.&lt;/p&gt;

&lt;h3&gt;
  
  
  Point 1: Where the data lives changes
&lt;/h3&gt;

&lt;p&gt;Business automation agents often include organizational context. In my case, &lt;code&gt;agents/&lt;/code&gt; contains things like the structure of my organization, operational know-how, contact information, and various judgment criteria for business decisions. If you register all of this as part of Managed Agents, it becomes a resource on Anthropic's side. Until you explicitly delete it, it stays there.&lt;/p&gt;

&lt;p&gt;What's worth being aware of is the data retention character of Managed Agents. The official "API and data retention" doc states clearly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://platform.claude.com/docs/en/build-with-claude/api-and-data-retention" rel="noopener noreferrer"&gt;https://platform.claude.com/docs/en/build-with-claude/api-and-data-retention&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Claude Managed Agents is a stateful resource. You can delete session transcripts, but there is no automatic deletion.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There's a similar note for Skills (&lt;code&gt;SKILL.md&lt;/code&gt;):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview" rel="noopener noreferrer"&gt;https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Agent Skills is not covered by ZDR arrangements. Data is retained according to the feature's standard retention policy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;ZDR (Zero Data Retention) is a contractual option Anthropic offers to enterprise API customers who go through Anthropic's review and sign individually. It guarantees that data sent through the API isn't retained on Anthropic's side. It's often cited as a precondition for handling internal data with AI. Managed Agents and Agent Skills are out of scope even under that strictest contract — that's the current positioning.&lt;/p&gt;

&lt;p&gt;Whether or not you have a ZDR arrangement, your agent definitions, sessions, and skills are all retained on Anthropic's side. Unless you explicitly delete them, they don't go away on their own.&lt;/p&gt;

&lt;p&gt;None of this makes Managed Agents an absolute no-go. The question is where the data lives and how easily you can move it. Git-managed repositories also usually sit on external services like GitHub, so external storage is nothing new. The difference is that with git you choose the storage (GitHub, GitLab, self-hosted, or local-only without a remote), and the content stays as Markdown that can move anywhere. Once you register something in Managed Agents, the location is fixed to Anthropic and the format is fixed to Anthropic's proprietary JSON structure. When designing an agent that handles internal data, this difference becomes a factor in the decision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Point 2: Friction in the Edit-and-Run Workflow
&lt;/h3&gt;

&lt;p&gt;With a self-hosted harness, the update cycle for the knowledge layer goes like this. Open the editor, edit &lt;code&gt;agents/executive-assistant.md&lt;/code&gt;, save. Claude Desktop picks it up on the next session — instant reflection. The whole thing takes seconds.&lt;/p&gt;

&lt;p&gt;With Managed Agents, you edit the file, then make an API call (&lt;code&gt;create / update agent&lt;/code&gt;) and restart the session. It's not instant — the API call adds a step in the middle.&lt;/p&gt;

&lt;p&gt;Where this cost actually shows up is when edits happen "the moment you notice something while using it." While running an agent, you realize "this instruction is too verbose" or "I want to add this here," switch to the editor, fix the relevant file, save, see it on the next message — that flow happens routinely.&lt;/p&gt;

&lt;p&gt;The bigger difference isn't the time itself, but whether the flow of thought breaks. With a self-hosted harness, edit-and-reflect is part of the "using it" flow. With Managed Agents, the API-call step interrupts. There's an option to write a "Markdown → API sync script" yourself, but that script then becomes its own maintenance target.&lt;/p&gt;

&lt;h3&gt;
  
  
  Point 3: Losing the Benefits of Git Management
&lt;/h3&gt;

&lt;p&gt;The knowledge layer is a continuous loop of trial and error. You rewrite an agent's instructions, see what happens, rewrite again. &lt;code&gt;git diff&lt;/code&gt; shows you what changed, &lt;code&gt;git log&lt;/code&gt; lets you trace history, &lt;code&gt;git blame&lt;/code&gt; tells you why something was added. If you don't like where it's going, branch off and experiment.&lt;/p&gt;

&lt;p&gt;None of this works through a Managed Agents agent-definition API. Anthropic's side likely has version control of some kind, but the wider &lt;code&gt;git&lt;/code&gt; toolchain ecosystem (GitHub, PRs, CI, code review, cherry-pick, rebase) doesn't apply.&lt;/p&gt;

&lt;p&gt;The evolution of the knowledge layer has value when you can look back at it through git history. Being able to trace "when and why was that one line added to &lt;code&gt;executive-assistant.md&lt;/code&gt;" alongside the commit message — that's a small thing that quietly props up your operational confidence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Point 4: Open-Standard Portability
&lt;/h3&gt;

&lt;p&gt;This is the point I personally weight the most.&lt;/p&gt;

&lt;p&gt;In my previous DESIGN.md article, I covered how &lt;code&gt;AGENTS.md&lt;/code&gt; and &lt;code&gt;SKILL.md&lt;/code&gt; are open standards.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/aws-builders/agentsmd-skillmd-designmd-how-ai-instructions-split-into-three-layers-d0g"&gt;https://dev.to/aws-builders/agentsmd-skillmd-designmd-how-ai-instructions-split-into-three-layers-d0g&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;AGENTS.md&lt;/code&gt; is jointly promoted by OpenAI, Google, Sourcegraph, Cursor, Factory and others, and was donated to the Linux Foundation in December 2025. &lt;code&gt;SKILL.md&lt;/code&gt; is the core of the Agent Skills standardized by agentskills.io.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agents.md/" rel="noopener noreferrer"&gt;https://agents.md/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agentskills.io/" rel="noopener noreferrer"&gt;https://agentskills.io/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Multiple AI agents — Codex, Claude Code, Cursor, GitHub Copilot — read the same files.&lt;/p&gt;

&lt;p&gt;Managed Agents agent definitions, on the other hand, are an Anthropic-proprietary JSON structure that bundles fields like &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;system&lt;/code&gt;, &lt;code&gt;tools&lt;/code&gt;, &lt;code&gt;mcp_servers&lt;/code&gt;, &lt;code&gt;skills&lt;/code&gt;, etc. Registering &lt;code&gt;SKILL.md&lt;/code&gt; to Managed Agents makes it work, but it's a registration confined to Anthropic — Codex and Cursor can't see it.&lt;/p&gt;

&lt;p&gt;That's less "vendor lock-in" and more a re-lock-in: something that was just standardized gets tied back to one specific implementation. With &lt;code&gt;AGENTS.md&lt;/code&gt; / &lt;code&gt;SKILL.md&lt;/code&gt; spreading as open standards, there's no compelling reason to confine your own knowledge to a vendor-specific format.&lt;/p&gt;

&lt;p&gt;A repo that holds &lt;code&gt;AGENTS.md&lt;/code&gt; / &lt;code&gt;SKILL.md&lt;/code&gt; itself is a way to keep a "neutral location" — referenceable equally from Managed Agents, Codex, Cursor, and other AI agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Point 5: Speed of Testing and Iteration
&lt;/h3&gt;

&lt;p&gt;In the "growing it" phase of an agent, cycle speed is what determines quality. Rewrite a line of instruction, try it, see what happens, rewrite again. The faster that loop, the higher the agent's accuracy ends up.&lt;/p&gt;

&lt;p&gt;With a self-hosted harness (Claude Desktop + Markdown), you rewrite, save, see it on the next message — seconds.&lt;/p&gt;

&lt;p&gt;With Managed Agents, you call the agent-update API, rebuild the environment, restart the session, then test. Every cycle has API-mediated steps in it. For a "growing it" phase, that tends to work against you.&lt;/p&gt;

&lt;p&gt;For long-running production tasks (sessions of multiple hours and up), Managed Agents' stability and scalability really shine. But many business automation agents stay in the "fed and grown daily" phase for a long time. My &lt;code&gt;executive-assistant.md&lt;/code&gt; has been getting some kind of weekly tweak for several months now.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Comes into View When You Bundle These Five
&lt;/h3&gt;

&lt;p&gt;"Can be put on" and "should be put on" are different problems. Just as the official Managed Agents design points toward a &lt;code&gt;meta-harness&lt;/code&gt;, the knowledge layer that sits on top can also reasonably stay outside the meta-harness, judged across the points above.&lt;/p&gt;

&lt;p&gt;Just as the official side chose three-way separation (session / harness / sandbox) and a design that doesn't dictate the shape of what goes on top, the user side can equally make the choice of "keep the knowledge layer self-hosted" without dictating its shape. That feels like a natural conclusion when you see things through the two-layer structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Classifying My Own Agents into "Fits / Partial / Doesn't Fit"
&lt;/h2&gt;

&lt;p&gt;Mapping the discussion onto my own agents and sorting them into "fits / partial / doesn't fit" for Managed Agents, the patterns look roughly like this.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type of use&lt;/th&gt;
&lt;th&gt;Main tools&lt;/th&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Personal-assistant style (calendar, mail, chat, local files)&lt;/td&gt;
&lt;td&gt;calendar, mail, chat, ticket management, web search, local file ops&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Main MCPs are remote, but local file-ops MCPs have no remote equivalent, so that part doesn't fit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure-ops support&lt;/td&gt;
&lt;td&gt;cloud APIs, chat, docs&lt;/td&gt;
&lt;td&gt;Fits&lt;/td&gt;
&lt;td&gt;All tools are SaaS, and long-running tasks are a natural fit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project / org-ops support&lt;/td&gt;
&lt;td&gt;chat, ticket management&lt;/td&gt;
&lt;td&gt;Fits&lt;/td&gt;
&lt;td&gt;All main tools complete via remote MCP, no local dependency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test-quality ops&lt;/td&gt;
&lt;td&gt;test automation tool MCPs, chat, docs&lt;/td&gt;
&lt;td&gt;Doesn't fit&lt;/td&gt;
&lt;td&gt;Main test automation tool MCPs are local stdio-only, so they can't be called from Managed Agents which assumes remote&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Department functions (accounting, legal, etc.)&lt;/td&gt;
&lt;td&gt;SaaS APIs, chat, local Excel or doc references&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;The SaaS API side is fine on remote MCPs, but local Excel and doc references don't ride along, and how internal data is handled also needs an organizational governance call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conversational thinking-organization (morning briefings, 1on1 prep, etc.)&lt;/td&gt;
&lt;td&gt;calendar, chat, web search&lt;/td&gt;
&lt;td&gt;Doesn't fit&lt;/td&gt;
&lt;td&gt;Designed in tandem with the Claude Desktop conversational experience (digging in by dialog) — autonomous Managed Agents isn't the right fit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What's notable is that the knowledge layer stays self-hosted across every pattern. Even the agents I judged as "fits" still keep their role definitions, organizational context, and style conventions in the self-hosted repo. What's placed on the Managed Agents side is the OS-layer functionality (sandbox, harness loop, session persistence, auth vault).&lt;/p&gt;

&lt;p&gt;This is the concrete instance of the two-layer structure shown in earlier sections. The OS layer is something you can hand off to Anthropic; the knowledge layer stays in your hands. Per agent, you decide "OS layer handed off / OS layer kept" — that's the shape of the call.&lt;/p&gt;

&lt;p&gt;There are four axes for judging where your own agents fall:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whether the main tools complete on remote MCPs, or depend on local tools&lt;/li&gt;
&lt;li&gt;Whether the data being handled includes internal-organization data, or doesn't&lt;/li&gt;
&lt;li&gt;Long-running tasks, or repeated short tasks&lt;/li&gt;
&lt;li&gt;"Growing it" phase, or "operating it" phase&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Work through these and decide per agent.&lt;/p&gt;
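&lt;p&gt;As a rough illustration, the first axis can be turned into a small check. This is a simplified reading of the table above, not an official rule: it covers only tool locality plus the conversational case from the table, and the class and field names (&lt;code&gt;AgentProfile&lt;/code&gt;, &lt;code&gt;main_tools_remote&lt;/code&gt;, and so on) are made up for this sketch. Data sensitivity, task length, and phase still need a human call.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class AgentProfile:
    """Hypothetical description of one agent, for illustration only."""
    name: str
    main_tools_remote: bool      # do the main tools work over remote MCP?
    has_local_dependency: bool   # local files, Excel, desktop integration
    conversational: bool         # built around interactive dialog


def verdict(agent):
    """Simplified reading of the fit table; not an official rule."""
    if agent.conversational or not agent.main_tools_remote:
        return "Doesn't fit"
    if agent.has_local_dependency:
        return "Partial"
    return "Fits"


agents = [
    AgentProfile("personal-assistant", True, True, False),
    AgentProfile("infra-ops", True, False, False),
    AgentProfile("test-quality", False, False, False),
]
for agent in agents:
    print(agent.name, verdict(agent))
```

&lt;p&gt;Writing the rule down this way mostly serves to expose which inputs you are actually deciding on. The verdict itself stays a judgment call.&lt;/p&gt;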

&lt;h2&gt;
  
  
  Anticipated Counterarguments and Responses
&lt;/h2&gt;

&lt;p&gt;Three counterarguments, taken in turn.&lt;/p&gt;

&lt;h3&gt;
  
  
  Counter 1: If the main MCPs are remote-capable, why not put it all on?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Main MCPs (Slack, Atlassian, Calendar, Gmail, freee) are remote-capable, so why not put all the business automation on Managed Agents?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;True — as of May 2026, many of the main MCPs you'd want for business automation are remote-capable: Atlassian Rovo, Slack, Google Calendar/Gmail, freee, and so on. The pool of "agents you could put on it" is growing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://platform.claude.com/docs/en/managed-agents/mcp-connector" rel="noopener noreferrer"&gt;https://platform.claude.com/docs/en/managed-agents/mcp-connector&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://platform.claude.com/docs/en/agents-and-tools/remote-mcp-servers" rel="noopener noreferrer"&gt;https://platform.claude.com/docs/en/agents-and-tools/remote-mcp-servers&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That said, what this article argues isn't "don't put it on because you can't." It's "even when you can, that doesn't always mean you should." The OS layer is a candidate for handing off; whether to hand off the knowledge layer has to be judged individually across the five points (where data lives, edit workflow, git management, open standards, iteration speed).&lt;/p&gt;

&lt;h3&gt;
  
  
  Counter 2: $0.08/hour seems acceptable, doesn't it?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;$0.08/hour seems acceptable, doesn't it?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For short tasks, no problem. If a morning briefing finishes in 10 minutes, then 20 working days × 10 minutes is about 3.3 session hours a month. At $0.08 per hour that comes to roughly $0.26, plus token billing. At that scale, fine.&lt;/p&gt;
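&lt;p&gt;The same arithmetic as a small sketch, using the $0.08 per session hour rate quoted above. The 8-hour "open all day" scenario is an assumption added here for comparison, and token billing is left out entirely.&lt;/p&gt;

```python
RATE_PER_SESSION_HOUR = 0.08  # USD, the session rate quoted in this section


def monthly_session_cost(working_days, minutes_per_day, rate=RATE_PER_SESSION_HOUR):
    """Session-hour cost for one month, excluding token billing."""
    hours = working_days * minutes_per_day / 60
    return hours * rate


briefing = monthly_session_cost(20, 10)       # short morning briefing
all_day = monthly_session_cost(20, 8 * 60)    # assumed 8-hour open session

print(f"briefing: ${briefing:.2f}/month")
print(f"all day:  ${all_day:.2f}/month")
```

&lt;p&gt;With unrounded hours the briefing lands a cent above the rounded 3.3-hour estimate. The all-day case is about 48 times larger simply because the session runs 48 times longer.&lt;/p&gt;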

&lt;p&gt;The question is whether you move the agents you use daily on Claude Desktop. The "open all day during work hours" style of use doesn't translate cleanly to Managed Agents, because session billing and token billing scale directly with running time. For the same usage and the same outputs, the cost is likely to go up.&lt;/p&gt;

&lt;p&gt;"Put it on Managed Agents / use it on Claude Desktop / use both" is something to decide per agent based on use case and cost structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Counter 3: With existing case studies, why not put business automation on it too?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;With case studies like Notion, Sentry, Vibecode, why not put business automation on it too?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These case studies are all coding-style work (code generation, bug fixing, spreadsheet operations). Business automation agents have a different shape (SaaS integration, monthly reports, QA ops), and as covered earlier, they place different demands on the harness.&lt;/p&gt;

&lt;p&gt;And in fact, none of these case studies live entirely inside Managed Agents either. Notion runs in Notion, Sentry in Sentry's infrastructure, Vibecode on Vibecode's platform, each with its own knowledge and UX. Managed Agents functions as the OS layer underneath. That lines up with the two-layer structure this article argues for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Place the First Move
&lt;/h2&gt;

&lt;p&gt;If you're going to actually start somewhere, this kind of flow makes sense.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pick out the SaaS-integration-centric agents in your own collection&lt;/strong&gt;: agents without local file ops or desktop integration are the candidates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check whether the MCPs each agent uses are remote-capable&lt;/strong&gt;: Slack, Atlassian, Calendar, Gmail, freee are already remote; others, check individually&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Put just one agent on Managed Agents and run it&lt;/strong&gt;: don't migrate everything at once — get the operational feel from one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep the knowledge layer in the self-hosted repository&lt;/strong&gt;: register the agent definition through the API, but keep &lt;code&gt;agents/&lt;/code&gt;, &lt;code&gt;knowledge/&lt;/code&gt;, &lt;code&gt;prompts/&lt;/code&gt; in git. Treat the Markdown as canonical, the API registration as a mirror&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expand what you hand off gradually, or don't&lt;/strong&gt;: after running one, if cost, speed, and edit workflow are fine, move to the next; if they aren't, keep it self-hosted&lt;/li&gt;
&lt;/ol&gt;
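&lt;p&gt;Step 4 (Markdown as canonical, API registration as a mirror) can be sketched as a local drift check. This is a minimal sketch under assumptions: the payload uses only the &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;model&lt;/code&gt;, and &lt;code&gt;system&lt;/code&gt; fields mentioned earlier, the function names are made up here, and the actual registration call is deliberately left out because its exact shape should come from the official docs.&lt;/p&gt;

```python
import hashlib
import json
from pathlib import Path


def build_definition(md_path, model):
    """Build a mirror payload from the canonical Markdown file."""
    path = Path(md_path)
    return {
        "name": path.stem,   # e.g. "executive-assistant"
        "model": model,      # placeholder value supplied by the caller
        "system": path.read_text(encoding="utf-8"),
    }


def fingerprint(definition):
    """Stable hash of the payload, used to detect drift from the mirror."""
    blob = json.dumps(definition, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def needs_sync(definition, last_synced_fingerprint):
    """True when the Markdown changed since the last registration."""
    return fingerprint(definition) != last_synced_fingerprint
```

&lt;p&gt;Storing the last fingerprint next to the repository (for example in an untracked file) keeps git as the source of truth: the mirror is only touched when the Markdown actually changed.&lt;/p&gt;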

&lt;p&gt;"All rewritten" and "all self-hosted" are both extremes. Per agent — that feels like the realistic landing.&lt;/p&gt;

&lt;p&gt;Harnesses encode assumptions that go stale as models improve. Those are the official words. With Managed Agents arriving, self-hosted OS-layer implementations do go stale. There's no longer a need to write the sandbox, agent loop, and session persistence yourself.&lt;/p&gt;

&lt;p&gt;But the knowledge layer above continues to be the place where organizational and personal context lives. &lt;code&gt;AGENTS.md&lt;/code&gt; and &lt;code&gt;SKILL.md&lt;/code&gt; are referenced as open standards by multiple AI agents. Managed by git, edited in the editor in seconds. Things you've grown like a &lt;code&gt;writing-style-guide.md&lt;/code&gt; of your own keep evolving in your own repository, not as a stateful resource on Anthropic's side.&lt;/p&gt;

&lt;p&gt;The next step in harness engineering starts from thinking in layers.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>claude</category>
      <category>mcp</category>
    </item>
    <item>
      <title>AGENTS.md, SKILL.md, DESIGN.md: How AI Instructions Split into Three Layers</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Sat, 02 May 2026 21:35:11 +0000</pubDate>
      <link>https://dev.to/aws-builders/agentsmd-skillmd-designmd-how-ai-instructions-split-into-three-layers-d0g</link>
      <guid>https://dev.to/aws-builders/agentsmd-skillmd-designmd-how-ai-instructions-split-into-three-layers-d0g</guid>
      <description>&lt;p&gt;In April 2026, Google Labs released a spec called &lt;code&gt;DESIGN.md&lt;/code&gt;. It's a design system specification readable by AI agents, packaged with a CLI validator: &lt;code&gt;npx @google/design.md lint&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;DESIGN.md&lt;/code&gt; in the picture, we now have three different file types for instructing AI agents. &lt;code&gt;AGENTS.md&lt;/code&gt; has been spreading as an industry standard since 2025 (jointly developed by OpenAI, Google, Sourcegraph, Cursor, and Factory; donated to the Linux Foundation in December 2025). &lt;code&gt;SKILL.md&lt;/code&gt; sits at the core of Anthropic's Claude Skills. And now &lt;code&gt;DESIGN.md&lt;/code&gt;. The three handle different concerns and don't overlap.&lt;/p&gt;

&lt;p&gt;This article is for developers using coding agents like Claude Code, Cursor, or Codex in their work, and for tech leads operating natural-language instruction files like CLAUDE.md and style guides. If your team is doing Spec-Driven Development (SDD), this should also reach you.&lt;/p&gt;

&lt;p&gt;What I want to lay out is two things: how AI instructions are starting to split across three layers — behavior, individual tasks, and visual appearance — and how that connects with SDD as a parallel movement.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Old Pattern: Natural-Language Documents
&lt;/h2&gt;

&lt;p&gt;A few years into the ChatGPT era, most engineers have written some form of "rules I want the AI to follow" in a Markdown file. CLAUDE.md, styleguide.md, CONTRIBUTING.md, internal coding conventions. The locations vary, but the format is roughly the same: unstructured natural language.&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;writing-style-guide.md&lt;/code&gt; file I've been building over the past few months is a typical example. It's a style guide I use when writing technical articles with Claude — a list of patterns common in AI-generated text, written down as forbidden phrases. By making Claude Desktop read it every session, the tone of my output stays consistent. It's part of a personal repository (&lt;code&gt;ikenyal-ai-agents&lt;/code&gt;) I use as the harness for my business automation agents — the one I covered in my previous post.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/aws-builders/harness-engineering-with-nothing-but-markdown-g6b"&gt;https://dev.to/aws-builders/harness-engineering-with-nothing-but-markdown-g6b&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The file contains roughly 150 lines: rules like "don't use em dashes," "avoid invitations like 'let's try…!'," "drop AI-style preambles like 'what's interesting is…'." The same repository has 15 instruction files under &lt;code&gt;agents/&lt;/code&gt;, organized by team and role: &lt;code&gt;executive-assistant.md&lt;/code&gt;, &lt;code&gt;sre-support.md&lt;/code&gt;, &lt;code&gt;qa-support.md&lt;/code&gt;, &lt;code&gt;accounting.md&lt;/code&gt;. Each describes "the assumptions to operate under as this role" in plain natural language.&lt;/p&gt;

&lt;p&gt;This approach has clear benefits. You can articulate tone, stance, and implicit rules. New team members can read the files and pick up the expectations. With CLAUDE.md, Claude Code reads it every session, so persona-level instructions land consistently.&lt;/p&gt;

&lt;p&gt;There are limits, too. First, validation falls on humans. Whether a rule was followed or not gets decided by a human reading the output. Second, individual judgment leaks in. "Write politely" means different things to different reviewers.&lt;/p&gt;

&lt;p&gt;The third limit is the actual subject of this article. Rules that are formally verifiable (forbidden phrases, em-dash usage, specific pattern matches) and rules that require judgment (tone, structural choices, how to open with empathy) sit in the same file. So even the verifiable parts end up depending on human review. That's the problem the three new file types are addressing.&lt;/p&gt;

&lt;h2&gt;
  
  
  New Type 1: How DESIGN.md (Google Labs) Specifies Visual Appearance
&lt;/h2&gt;

&lt;p&gt;On April 10, 2026, Google Labs published the &lt;code&gt;DESIGN.md&lt;/code&gt; specification at &lt;code&gt;google-labs-code/design.md&lt;/code&gt;. As of early May, the repo has over 11,000 stars. It's the reference implementation for Google Stitch (&lt;code&gt;stitch.withgoogle.com&lt;/code&gt;), an AI-driven UI generation product.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/google-labs-code/design.md" rel="noopener noreferrer"&gt;https://github.com/google-labs-code/design.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The specification doc lives on the Stitch side.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://stitch.withgoogle.com/docs/design-md/specification" rel="noopener noreferrer"&gt;https://stitch.withgoogle.com/docs/design-md/specification&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What &lt;code&gt;DESIGN.md&lt;/code&gt; covers is the design system specification. You write machine-readable design tokens in YAML at the top of the file (colors, typography, spacing, components), and human-readable design intent in the Markdown body underneath. Both live in the same file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Heritage&lt;/span&gt;
&lt;span class="na"&gt;colors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;primary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#1A1C1E"&lt;/span&gt;
  &lt;span class="na"&gt;tertiary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#B8422E"&lt;/span&gt;
&lt;span class="na"&gt;typography&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;h1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;fontFamily&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Public Sans&lt;/span&gt;
    &lt;span class="na"&gt;fontSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3rem&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## Overview&lt;/span&gt;

Architectural Minimalism meets Journalistic Gravitas.

&lt;span class="gu"&gt;## Colors&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Primary (#1A1C1E): Deep ink for headlines and core text.
&lt;span class="p"&gt;-&lt;/span&gt; Tertiary (#B8422E): "Boston Clay", the sole driver for interaction.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
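&lt;p&gt;The layout described above (machine-readable YAML between two &lt;code&gt;---&lt;/code&gt; lines, human-readable Markdown underneath) can be separated with a few lines of code. This is a minimal sketch, not the validator's logic: the function name is made up here, and it only splits the text. Actually parsing the tokens would need a YAML library.&lt;/p&gt;

```python
def split_front_matter(text):
    """Return (front_matter, body) for a file that opens with a --- block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text  # no front matter: everything is body
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text  # no closing delimiter: treat everything as body


sample = "---\nname: Heritage\n---\n\n## Overview\n"
front_matter, body = split_front_matter(sample)
print(front_matter)
```

&lt;p&gt;The split is what lets tooling check the token half mechanically while the prose half stays free-form for human readers.&lt;/p&gt;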



&lt;p&gt;The headline feature of this format is the CLI validator that ships with it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @google/design.md lint DESIGN.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This checks token reference integrity, WCAG contrast ratios, and structural rule compliance, returning the result as JSON. Wire it into CI and you can verify design system consistency on every pull request. There's also a &lt;code&gt;diff&lt;/code&gt; command that compares two &lt;code&gt;DESIGN.md&lt;/code&gt; files and returns token-level changes in a structured form. Design system version control — historically a manual process — gains a verifiable layer.&lt;/p&gt;

&lt;p&gt;For Japanese UIs, the Google Labs spec alone falls short. It doesn't define the typography requirements specific to Japanese (CJK font fallback chains, line height, letter-spacing, kinsoku shori, mixed typesetting). The gap is filled by &lt;code&gt;kzhrknt/awesome-design-md-jp&lt;/code&gt;, which publishes Japan-localized &lt;code&gt;DESIGN.md&lt;/code&gt; files for over 10 services including Apple Japan, SmartHR, freee, note, MUJI, Mercari, LINE, and Toyota. For Japanese products, using both the Google Labs spec and the Japan edition together is the practical approach.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kzhrknt/awesome-design-md-jp" rel="noopener noreferrer"&gt;https://github.com/kzhrknt/awesome-design-md-jp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What &lt;code&gt;DESIGN.md&lt;/code&gt; carries is the design system that used to be scattered across Figma files and style guide PDFs, now consolidated into a single file with both machine-readable and human-readable parts. Think of it as the spec foundation that lets AI agents generate UIs with a consistent look every time.&lt;/p&gt;

&lt;h2&gt;
  
  
  New Type 2: How SKILL.md (Anthropic) and AGENTS.md Specify Behavior
&lt;/h2&gt;

&lt;p&gt;While &lt;code&gt;DESIGN.md&lt;/code&gt; covers "appearance," &lt;code&gt;SKILL.md&lt;/code&gt; and &lt;code&gt;AGENTS.md&lt;/code&gt; cover "behavior" — defining what the agent is trying to do, how it should proceed, and what it must not do.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SKILL.md&lt;/code&gt; is the file format standardized by agentskills.io as part of the Agent Skills open standard. Anthropic's Claude Skills is one implementation of this standard; the same &lt;code&gt;SKILL.md&lt;/code&gt; works across Claude Code, Claude.ai, and the Agent SDK. Because it's standards-compliant, the same file is also readable by other agents like OpenClaw and Hermes. The structure: declare metadata (skill name, description, allowed tools) in the YAML at the top of the file, and write the task procedure or domain knowledge in the Markdown body below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agentskills.io/home" rel="noopener noreferrer"&gt;https://agentskills.io/home&lt;/a&gt;&lt;/p&gt;
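
&lt;p&gt;As a rough sketch of that layout, a minimal &lt;code&gt;SKILL.md&lt;/code&gt; could look like the following. The &lt;code&gt;monthly-report&lt;/code&gt; skill and its steps are hypothetical, and the key names (&lt;code&gt;name&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, &lt;code&gt;allowed-tools&lt;/code&gt;) are my reading of the standard, so check them against agentskills.io before relying on them.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
name: monthly-report
description: Aggregates monthly figures into a summary. Use when the user asks for a monthly aggregation.
allowed-tools: Read
---

# Monthly report

## Procedure

1. Collect the source figures for the target month
2. Aggregate them by category
3. Write the summary and list any figures that could not be confirmed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The YAML block is the part a tool can parse to decide when the skill applies; everything below it is ordinary Markdown that a person can review.&lt;/p&gt;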

&lt;p&gt;A clear example of &lt;code&gt;SKILL.md&lt;/code&gt; is &lt;code&gt;conorbronsdon/avoid-ai-writing&lt;/code&gt;. It's an English-only skill that detects and rewrites telltale AI writing patterns: transition phrases like "Moreover," significance inflation like "watershed moment," and roundabout verb constructions like "serves as." It uses a 100+ word replacement table organized into 3 tiers (Tier 1 always replaces, Tier 2 flags when 2+ words appear in the same paragraph, Tier 3 flags only at high density), and audits 36 pattern categories. It has two modes: &lt;code&gt;detect&lt;/code&gt; and &lt;code&gt;rewrite&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/conorbronsdon/avoid-ai-writing" rel="noopener noreferrer"&gt;https://github.com/conorbronsdon/avoid-ai-writing&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What sets it apart from a one-shot prompt is the structured audit it returns. In &lt;code&gt;rewrite&lt;/code&gt; mode, you get four discrete sections: identified issues, the rewritten text, a summary of changes, and a second-pass audit. What changed and why becomes transparent.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;AGENTS.md&lt;/code&gt; covers the agent's overall behavior: project assumptions, roles, prohibitions, and escalation rules. As I mentioned at the top, it started with the Amp team at Sourcegraph; today OpenAI, Google, Cursor, and Factory jointly drive it, and it was donated to the Linux Foundation in December 2025. Think of &lt;code&gt;CLAUDE.md&lt;/code&gt; as the Claude-specific version of &lt;code&gt;AGENTS.md&lt;/code&gt;. Claude Code is specified to read &lt;code&gt;CLAUDE.md&lt;/code&gt; rather than &lt;code&gt;AGENTS.md&lt;/code&gt;, but the pattern recommended by &lt;code&gt;agents.md&lt;/code&gt; is to make &lt;code&gt;AGENTS.md&lt;/code&gt; the actual file and symlink &lt;code&gt;CLAUDE.md&lt;/code&gt; to it. In the personal repository I introduced earlier, the files under &lt;code&gt;agents/&lt;/code&gt; belong to this layer.&lt;/p&gt;
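
&lt;p&gt;A sketch of that symlink pattern, with &lt;code&gt;my-project/&lt;/code&gt; as a placeholder name. On macOS or Linux the link can be created with &lt;code&gt;ln -s AGENTS.md CLAUDE.md&lt;/code&gt; from the repository root.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-project/
├── AGENTS.md    # the actual file, edited directly
└── CLAUDE.md    # symlink to AGENTS.md, so Claude Code picks up the same content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;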

&lt;p&gt;&lt;code&gt;SKILL.md&lt;/code&gt; and &lt;code&gt;AGENTS.md&lt;/code&gt; cover different ranges. &lt;code&gt;AGENTS.md&lt;/code&gt; handles "overall context and boundaries." &lt;code&gt;SKILL.md&lt;/code&gt; handles "an executable unit for a specific task."&lt;/p&gt;

&lt;p&gt;The avoid-ai-writing English style auditor I mentioned is a specific task, so it ships as &lt;code&gt;SKILL.md&lt;/code&gt;. A file like &lt;code&gt;agents/genda/qa-support.md&lt;/code&gt;, which describes the assumptions and engagement style of a QA role, defines the agent's boundary — that goes on the &lt;code&gt;AGENTS.md&lt;/code&gt; side.&lt;/p&gt;

&lt;p&gt;The shared concern of these formats is "behavior and procedure," not visual appearance: what the agent knows, what it's tasked with, what it must avoid. Both formats are attempts to pin these down in a verifiable form.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three-Layer Split
&lt;/h2&gt;

&lt;p&gt;Lining up the three file types, the layers each one handles become clear.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;What it carries&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Behavior&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;AGENTS.md&lt;/code&gt; / &lt;code&gt;CLAUDE.md&lt;/code&gt; (natural language + rules)&lt;/td&gt;
&lt;td&gt;Overall context, roles, prohibitions&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;CLAUDE.md&lt;/code&gt;, role-specific files like &lt;code&gt;agents/genda/qa-support.md&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Individual task&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SKILL.md&lt;/code&gt; (YAML at top + Markdown body)&lt;/td&gt;
&lt;td&gt;Reusable tasks, procedures, domain knowledge&lt;/td&gt;
&lt;td&gt;avoid-ai-writing, in-house procedure skills&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Appearance&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;DESIGN.md&lt;/code&gt; (YAML at top + Markdown body)&lt;/td&gt;
&lt;td&gt;Design system spec, verifiable visual rules&lt;/td&gt;
&lt;td&gt;The Google Labs reference, individual service files in &lt;code&gt;kzhrknt/awesome-design-md-jp&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The three are complementary, not competing. CLIs like &lt;code&gt;bergside/typeui&lt;/code&gt; are emerging as tools that can generate or update either &lt;code&gt;SKILL.md&lt;/code&gt; or &lt;code&gt;DESIGN.md&lt;/code&gt;, depending on what you choose — a sign of tooling that assumes the division of labor.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/bergside/typeui" rel="noopener noreferrer"&gt;https://github.com/bergside/typeui&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What actually differs across the layers is where each one strikes the balance between machine-readable and human-readable. &lt;code&gt;AGENTS.md&lt;/code&gt; skews almost entirely human-readable; over-structuring it would block the contextual judgment and nuance it needs to convey. &lt;code&gt;SKILL.md&lt;/code&gt; is partially structured by the YAML at the top, but the body stays human-readable, because a person has to be able to read and judge the scope of a task before handing it to an agent. &lt;code&gt;DESIGN.md&lt;/code&gt; puts machine-readable design tokens in the top YAML and human-readable design intent in the body, with the two cleanly separated.&lt;/p&gt;

&lt;p&gt;The center of gravity between "machine-readable" and "human-readable" sits in different places per layer. That's just the standard structuring principle — "manage things at different layers in different files" — applied to AI agents. The file names themselves spell out the division: &lt;code&gt;AGENTS.md&lt;/code&gt; ("instructions to the agent"), &lt;code&gt;SKILL.md&lt;/code&gt; ("a reusable skill"), &lt;code&gt;DESIGN.md&lt;/code&gt; ("the design system"). The names match what each one carries.&lt;/p&gt;

&lt;p&gt;Teams that have been packing all their "AI rules" into a single &lt;code&gt;CLAUDE.md&lt;/code&gt; now face a split decision. Open up your &lt;code&gt;CLAUDE.md&lt;/code&gt; and run these questions against it — splits start to surface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is there a section writing design system rules? → If yes, that goes to &lt;code&gt;DESIGN.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Are specific task procedures in there (monthly aggregation, test review, contract review)? → If yes, those go to &lt;code&gt;SKILL.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;What's left is overall agent context and boundaries (roles, prohibitions, escalation criteria) → that's the &lt;code&gt;AGENTS.md&lt;/code&gt; equivalent that stays&lt;/li&gt;
&lt;/ul&gt;
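
&lt;p&gt;Applied to a hypothetical &lt;code&gt;CLAUDE.md&lt;/code&gt;, those three questions might sort the sections like this (the section names are made up for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CLAUDE.md (before the split)
├── ## Brand colors and typography     → DESIGN.md
├── ## Monthly aggregation procedure   → SKILL.md
├── ## Contract review checklist       → SKILL.md (a separate skill)
├── ## Roles and prohibitions          → stays
└── ## Escalation criteria             → stays
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;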

&lt;p&gt;Used this way, the three-layer split doubles as a checklist for breaking up a monolithic file.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting with SDD
&lt;/h2&gt;

&lt;p&gt;Stepping back to look at the bigger picture: how does the three-layer split relate to the broader movement of "specs for AI"?&lt;/p&gt;

&lt;p&gt;SDD (spec-driven development) is a development style where you write the spec before generating the code, moving in order through requirements, design, tasks, and implementation. The underlying idea: "specs aren't disposable scaffolding, they're executable artifacts that produce code." AWS's Kiro provides a workflow that generates &lt;code&gt;requirements.md&lt;/code&gt;, &lt;code&gt;design.md&lt;/code&gt;, and &lt;code&gt;tasks.md&lt;/code&gt; in order under &lt;code&gt;.kiro/specs/{feature}/&lt;/code&gt;. GitHub's Spec Kit (over 90,000 stars) supports the same flow with slash commands like &lt;code&gt;/specify&lt;/code&gt;, &lt;code&gt;/plan&lt;/code&gt;, &lt;code&gt;/tasks&lt;/code&gt;, and &lt;code&gt;/implement&lt;/code&gt;. The EARS notation (Easy Approach to Requirements Syntax) used by Kiro reduces ambiguity by fitting each requirement into one of 5 fixed templates. SDD spread quickly over 2025 and 2026.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kiro.dev/" rel="noopener noreferrer"&gt;https://kiro.dev/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/github/spec-kit" rel="noopener noreferrer"&gt;https://github.com/github/spec-kit&lt;/a&gt;&lt;/p&gt;
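
&lt;p&gt;For reference, the five EARS patterns are usually named ubiquitous, event-driven, state-driven, unwanted behavior, and optional feature. The requirements below are invented for a hypothetical checkout feature and only illustrate the sentence shapes; they are not taken from Kiro's output.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ubiquitous:         The checkout service shall display prices including tax.
Event-driven:       When the user submits an order, the checkout service shall send a confirmation email.
State-driven:       While the cart is empty, the checkout service shall disable the order button.
Unwanted behavior:  If payment authorization fails, then the checkout service shall keep the cart contents.
Optional feature:   Where gift wrapping is enabled, the checkout service shall show a gift message field.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each sentence has a fixed keyword (When, While, If/then, Where) and a single "shall" clause, which is what makes the requirements easy to check one by one.&lt;/p&gt;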

&lt;p&gt;The three-layer split (&lt;code&gt;AGENTS.md&lt;/code&gt; / &lt;code&gt;SKILL.md&lt;/code&gt; / &lt;code&gt;DESIGN.md&lt;/code&gt;) and SDD look like separate movements on the surface. The SDD community concentrates on Kiro and spec-kit usage; the &lt;code&gt;DESIGN.md&lt;/code&gt; side concentrates on formal specs and validation tooling. You don't see many articles bridging the two.&lt;/p&gt;

&lt;p&gt;But put their philosophies side by side and the overlap is striking.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Shared philosophy&lt;/th&gt;
&lt;th&gt;SDD (Kiro etc.)&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;DESIGN.md&lt;/code&gt; / &lt;code&gt;SKILL.md&lt;/code&gt; / &lt;code&gt;AGENTS.md&lt;/code&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Specify before implementing&lt;/td&gt;
&lt;td&gt;requirements → design → tasks → implementation&lt;/td&gt;
&lt;td&gt;behavior → implementation, appearance → implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Mix machine-readable + human-readable&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;requirements.md&lt;/code&gt; (EARS notation) + natural language&lt;/td&gt;
&lt;td&gt;YAML at top + Markdown body&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Persistent context for the AI&lt;/td&gt;
&lt;td&gt;reference &lt;code&gt;.kiro/specs/{feature}/&lt;/code&gt; every time&lt;/td&gt;
&lt;td&gt;reference &lt;code&gt;DESIGN.md&lt;/code&gt; / &lt;code&gt;AGENTS.md&lt;/code&gt; every time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Reduce ambiguity through structured syntax&lt;/td&gt;
&lt;td&gt;EARS notation structures requirements (5 templates)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;lint&lt;/code&gt; validates WCAG contrast ratios and structural rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Give "decisions made" a fixed home&lt;/td&gt;
&lt;td&gt;spec files are where decisions live&lt;/td&gt;
&lt;td&gt;spec files are where decisions live&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both sit inside the larger "specs for AI" movement and share the same underlying philosophy.&lt;/p&gt;

&lt;p&gt;That said, they're not the same thing. The biggest difference, in one phrase: time horizon.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Axis&lt;/th&gt;
&lt;th&gt;SDD&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;DESIGN.md&lt;/code&gt; / &lt;code&gt;SKILL.md&lt;/code&gt; / &lt;code&gt;AGENTS.md&lt;/code&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Time horizon&lt;/td&gt;
&lt;td&gt;Describes "what to build next"&lt;/td&gt;
&lt;td&gt;Describes "rules that already exist"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Scope&lt;/td&gt;
&lt;td&gt;Single feature / project lifecycle&lt;/td&gt;
&lt;td&gt;Persistent rules and styles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Update rhythm&lt;/td&gt;
&lt;td&gt;New per feature → consume → archive&lt;/td&gt;
&lt;td&gt;Long-term maintenance, gradual growth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Subject&lt;/td&gt;
&lt;td&gt;Requirements, design, tasks (procedure for action)&lt;/td&gt;
&lt;td&gt;Rules for behavior, individual tasks, appearance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;SDD specs describe "what we're going to build." &lt;code&gt;requirements.md&lt;/code&gt; is "what this feature needs to satisfy"; &lt;code&gt;design.md&lt;/code&gt; is "how to implement this feature"; &lt;code&gt;tasks.md&lt;/code&gt; is "how to break the feature into work." Once the feature ships, they finish their job and get archived.&lt;/p&gt;

&lt;p&gt;The three-layer specs describe "what should always hold." &lt;code&gt;DESIGN.md&lt;/code&gt; provides the color and typography rules every time you generate a UI; &lt;code&gt;AGENTS.md&lt;/code&gt; provides the agent's assumptions across every session. They get maintained long-term and grow incrementally.&lt;/p&gt;

&lt;p&gt;This time-horizon difference is why the two don't compete. Transient specs and persistent specs coexist in the same project. They can also reference each other. Imagine writing "use &lt;code&gt;{colors.tertiary}&lt;/code&gt; for the button" inside &lt;code&gt;.kiro/specs/checkout-feature/design.md&lt;/code&gt; — that lets a transient feature spec reference a color token from a persistent &lt;code&gt;DESIGN.md&lt;/code&gt;. The pattern isn't widely established yet, but the structure fits cleanly.&lt;/p&gt;
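
&lt;p&gt;To make that concrete, here is a sketch of the two files side by side. It is illustrative only: the token layout under &lt;code&gt;colors&lt;/code&gt; and the hex value are assumptions, not quoted from the Google Labs spec, and nothing here implies that Kiro resolves the &lt;code&gt;{colors.tertiary}&lt;/code&gt; reference automatically.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# DESIGN.md (persistent), excerpt from the top YAML
colors:
  tertiary: "#0F766E"

# .kiro/specs/checkout-feature/design.md (transient), excerpt
## Order button

- Background color: {colors.tertiary}
- Do not hard-code the hex value here; DESIGN.md remains the single source
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;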

&lt;p&gt;One thing worth noting: as of May 2026, the active areas of SDD (the Kiro community and similar) and the active areas of &lt;code&gt;DESIGN.md&lt;/code&gt; / &lt;code&gt;SKILL.md&lt;/code&gt; / &lt;code&gt;AGENTS.md&lt;/code&gt; haven't really crossed paths. The SDD side concentrates on "how to build a feature"; the three-layer side concentrates on "how to deliver the rules."&lt;/p&gt;

&lt;p&gt;You don't have to be doing SDD to start with the three-layer split — the split alone gets you to the door of "specs for AI." If your team is already on SDD, start referencing &lt;code&gt;DESIGN.md&lt;/code&gt; tokens from inside your feature specs and you avoid maintaining the same rules in two places. The two movements look set to converge in the next phase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Not Everything Becomes a Spec
&lt;/h2&gt;

&lt;p&gt;The discussion of the three-layer split tends to drift toward "shouldn't we just spec everything," but in practice, that doesn't happen.&lt;/p&gt;

&lt;p&gt;Rules that can't be formally verified stay as natural-language documents: tone, structural choices, cultural nuance. Things like "how to open an article with empathy" or "how to give an ending the right amount of resonance" are judgment-based qualities. The cost of speccing them isn't the issue; their essence gets lost when you try.&lt;/p&gt;

&lt;p&gt;The test is straightforward: "is this formally verifiable?"&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Color contrast ratios (verifiable) → &lt;code&gt;DESIGN.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Word substitutions like "leverage → use" (verifiable) → &lt;code&gt;SKILL.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Tone (soft assertions, not textbook-sounding), overall stance (organizing rather than teaching), and similar qualities (not verifiable) → stay in &lt;code&gt;AGENTS.md&lt;/code&gt; / &lt;code&gt;CLAUDE.md&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For small teams, "one natural-language file" is often enough. If &lt;code&gt;CLAUDE.md&lt;/code&gt; alone is keeping things running, there's no need to force a split. The trade-off between the cost of speccing and the load of operating it depends on team size and how long the operation has to last.&lt;/p&gt;

&lt;p&gt;The three-layer split is something you adopt incrementally, just like SDD — you don't need to spec everything at once. Start with the complex areas, the areas where verification helps most.&lt;/p&gt;

&lt;p&gt;In other words, the three-layer split isn't a goal. It's an option you adopt when the situation calls for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Start
&lt;/h2&gt;

&lt;p&gt;A few options come into view from this overview.&lt;/p&gt;

&lt;p&gt;A reasonable first move is to open your &lt;code&gt;CLAUDE.md&lt;/code&gt; or style guide and sort it into "formally verifiable" and "judgment-based" sections. Color and typography rules, word substitution lists, and structural rules fall on the verifiable side. If a useful amount of verifiable content sits there, pick one piece to break out into either &lt;code&gt;DESIGN.md&lt;/code&gt; (appearance) or &lt;code&gt;SKILL.md&lt;/code&gt; (task). Don't try to split everything at once; start with the most independent piece.&lt;/p&gt;

&lt;p&gt;Pulling in external skills is another route. Drop a ready-made &lt;code&gt;SKILL.md&lt;/code&gt; like &lt;code&gt;avoid-ai-writing&lt;/code&gt; into &lt;code&gt;~/.claude/skills/&lt;/code&gt; and your stance as a writer doesn't change — only the verification gets handed off to the machine.&lt;/p&gt;
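
&lt;p&gt;The resulting layout would look roughly like this, assuming the one-directory-per-skill convention that Claude Code uses for personal skills. The directory name is my assumption; follow the repository's README for the actual install steps.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.claude/skills/
└── avoid-ai-writing/
    └── SKILL.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;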

&lt;p&gt;Teams already running Kiro or spec-kit are probably at the stage where they could try referencing &lt;code&gt;DESIGN.md&lt;/code&gt; tokens from inside &lt;code&gt;.kiro/specs/{feature}/design.md&lt;/code&gt;. The cross-reference between feature specs and persistent specs is still a thin area in terms of public examples.&lt;/p&gt;

&lt;p&gt;The shared stance: don't try to spec everything at once. Document split → operational trial → speccing — staged migration is the realistic path. The three-layer split isn't a finished form. It's a movement still in progress, and that's the safer way to read it.&lt;/p&gt;

&lt;p&gt;AI rules have started splitting from a single natural-language document into three spec formats. That split is another facet of the same movement as SDD.&lt;/p&gt;

&lt;p&gt;Not everything becomes a spec, but managing different roles in different files — that ordinary structuring is starting to apply to AI agents, too.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>agents</category>
      <category>designsystem</category>
    </item>
    <item>
      <title>Harness Engineering with Nothing but Markdown</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Sun, 26 Apr 2026 23:54:18 +0000</pubDate>
      <link>https://dev.to/aws-builders/harness-engineering-with-nothing-but-markdown-g6b</link>
      <guid>https://dev.to/aws-builders/harness-engineering-with-nothing-but-markdown-g6b</guid>
      <description>&lt;p&gt;If coding agents aren't your primary battlefield, "harness engineering" probably feels like a distant concept. Scrolling through a timeline full of articles written for Claude Code and Codex users, you may have thought, "This isn't about me."&lt;/p&gt;

&lt;p&gt;My own agent use wasn't centered on coding either, so none of the articles out there seemed to apply to my case. But looking back, I'd been doing the same thing — it just didn't have a name yet.&lt;/p&gt;

&lt;p&gt;I've been running a business automation agent via Claude Desktop (through MCP servers) for several months now. It gathers information across multiple work tools like Slack, Confluence, and Google Calendar, switches judgment criteria based on context, and produces outputs accordingly. What the agent refers to goes beyond surface-level rules — accumulated knowledge such as understanding of organizational structure, past decision-making history, and writing style guidelines forms the foundation for its judgment.&lt;/p&gt;

&lt;p&gt;I haven't written a single line of code. All I write is Markdown. And most of that Markdown is generated by the agent itself — I just approve or give revision instructions through chat. I almost never open the files directly to edit them.&lt;/p&gt;

&lt;p&gt;This article isn't for people already practicing harness engineering. It's for those who've heard the term but thought, "That's a coding thing, right?" — I'm sharing the structure I've found. Each example includes a ready-to-use sample, so if you're running a business automation agent with MCP, you can try them as-is.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Harness Engineering?
&lt;/h2&gt;

&lt;p&gt;Let me set the foundation.&lt;/p&gt;

&lt;p&gt;Mitchell Hashimoto, co-founder of HashiCorp, gave the name "Engineer the Harness" to a practice he'd cultivated in his AI agent workflow, in a February 2026 blog post. The approach: when an agent makes a mistake, instead of fixing the prompt, build an environment where the same mistake can't happen again.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://mitchellh.com/writing/my-ai-adoption-journey" rel="noopener noreferrer"&gt;https://mitchellh.com/writing/my-ai-adoption-journey&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Days later, OpenAI published a practice report titled "Harness engineering." A small engineering team spent five months building a product using only Codex agents with zero hand-written code, and the repository reached roughly one million lines. The back-to-back publication of Hashimoto's blog and this report cemented "harness engineering" as a term.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://openai.com/index/harness-engineering/" rel="noopener noreferrer"&gt;https://openai.com/index/harness-engineering/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the coding agent context, this translates to implementations like banning specific patterns with ESLint, defining commands in &lt;code&gt;AGENTS.md&lt;/code&gt;, and running automated reviews via pre-commit hooks.&lt;/p&gt;

&lt;p&gt;From "asking" (prompts) to "building" (environment). That's the core.&lt;/p&gt;

&lt;p&gt;Up to this point, the story seems confined to the world of coding agents. But in 2025, MCP became widespread and rapidly expanded the practical scope of non-coding agents. Once agents gained direct access to business tools like Slack, Confluence, Google Calendar, and Jira, the risk of "agents making mistakes on their own" spilled beyond coding. Harnesses are no longer just for coding agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  I Kept Rewriting Prompts
&lt;/h2&gt;

&lt;p&gt;When you incorporate agents into business workflows, you run into experiences like these.&lt;/p&gt;

&lt;p&gt;You write "don't make financial judgments" and it makes them anyway. You write "don't post directly to Slack — create a draft" and it tries to post. You write "commit and push at the end of the session" and it forgets.&lt;/p&gt;

&lt;p&gt;Each time, I'd rewrite the prompt, on the assumption that "if I write it more clearly, it'll understand."&lt;/p&gt;

&lt;p&gt;At some point, I realized the assumption itself was wrong. No matter how much you polish a prompt, the agent makes the same mistake in the next session. Instructions get buried in long contexts. When the session ends, memory disappears entirely. Requests are volatile.&lt;/p&gt;

&lt;p&gt;Stop expecting the agent to remember. Change the environment instead. Looking back, this was the entry point to harness engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Harnesses for Non-Coding Agents
&lt;/h2&gt;

&lt;p&gt;When I lined up what I'd been doing in my repository, the same structure as coding agent harnesses emerged.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Coding Agent Environment&lt;/th&gt;
&lt;th&gt;Non-Coding Agent Environment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ESLint / TypeScript strict type enforcement&lt;/td&gt;
&lt;td&gt;Prohibited actions section under &lt;code&gt;agents/&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;AGENTS.md&lt;/code&gt; command definitions&lt;/td&gt;
&lt;td&gt;Context routing rules in instruction files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pre-commit hooks&lt;/td&gt;
&lt;td&gt;Mandatory actions at session end&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI gates (can't merge unless tests pass)&lt;/td&gt;
&lt;td&gt;Forced knowledge accumulation rules under &lt;code&gt;knowledge/&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The materials on each side are completely different. One uses linters and hooks, the other uses Markdown files. But the design intent is the same: building an environment outside the agent where the agent can behave correctly.&lt;/p&gt;

&lt;p&gt;One prerequisite to note: most AI chat tools have a designated place for instruction files that are automatically loaded at session start. In Claude Desktop it's Project Knowledge; in ChatGPT it's Custom Instructions. What I call "instruction files" in this article are Markdown files placed in this mechanism. Unlike instructions typed into the prompt each time, they're loaded automatically and sit where they're less likely to get buried as conversations grow longer.&lt;/p&gt;

&lt;p&gt;Here are three concrete examples, each with a ready-to-use sample.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structuring Prohibited Actions
&lt;/h3&gt;

&lt;p&gt;Say you've delegated Slack posting to your agent. Even if you write "don't post directly — create a draft" in the prompt, it forgets across sessions.&lt;/p&gt;

&lt;p&gt;The solution is to create a prohibited actions section in the instruction file and structure it so it's loaded every session. Move the instruction's location from prompt (volatile) to file (persistent).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Prohibited Actions

Follow these without exception.

- Do not auto-post to company Slack (draft only; user handles posting)
- Do not make definitive financial judgments (always ask user for confirmation)
- Do not treat replies to clients as final versions (always get user approval)
- Do not make judgments about personnel evaluations or compensation
- Do not include confidential information (salaries, contract amounts, etc.) in summaries without explicitly flagging it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of telling someone verbally each time, place rules in a fixed location and reference them every time. It's that simple, but it changes the lifespan of rules from per-session to permanent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Forcing Actions at Session End
&lt;/h3&gt;

&lt;p&gt;You want to leave a work log at the end of each session. Even if you write "create a work log and commit &amp;amp; push at the end" in the prompt, the agent just wraps up when the conversation gets lively.&lt;/p&gt;

&lt;p&gt;The solution is to define trigger conditions and mandatory actions as a set in the instruction file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Mandatory Actions at Session End

When the user indicates work completion with phrases like "done," "thanks," or "commit,"
execute the following. Skipping is prohibited.

1. Create a work log at `docs/work-logs/YYYY-MM-DD-{topic}.md`
   - Include: background, options considered, key decisions, deliverables, next steps
2. Append a summary of changes to `CHANGELOG.md`
3. Execute git commit &amp;amp; push
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference from the prohibited actions example is that trigger conditions for "when to fire" are also defined. By explicitly stating end signals like "done," "thanks," and "commit," the agent can more easily judge "this is the moment." It's not perfect, but the firing rate goes up significantly compared to writing "execute at the appropriate timing" with vague triggers.&lt;/p&gt;

&lt;p&gt;The key is the single line: "Skipping is prohibited." If you leave room for the agent to judge, it will decide on its own that "it's probably fine to skip this time" when conversations get long. Removing discretion stabilizes behavior.&lt;/p&gt;

&lt;p&gt;There's a secondary benefit too. When rules are defined in the instruction file, a simple "leave a log" or "commit" is enough for the agent to instantly understand "that action." No need to explain from scratch each time. The instruction file becomes shared vocabulary between human and agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Forced Knowledge Accumulation
&lt;/h3&gt;

&lt;p&gt;The third is an example of a "can't proceed without passing the check" structure.&lt;/p&gt;

&lt;p&gt;In conversations with agents, information worth accumulating comes up frequently — things decided in meetings, conclusions from tool selection, facts discovered during troubleshooting. Even if you write "save important information" in the prompt, it predictably forgets.&lt;/p&gt;

&lt;p&gt;The solution is to embed a "knowledge check" protocol in the instruction file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Knowledge Accumulation (Mandatory Check)

Before each response, internally execute the following check. Skipping is prohibited.

Check: Does the user's immediately preceding statement, or your own response,
contain new information matching any of the following?

1. Factual information: team composition, tech stack, account info, environment configuration
2. Decisions: architecture selection, tool adoption, policy changes
3. Learnings: facts discovered during troubleshooting, gotchas, operational tips
4. Client-specific: contact names, contact info, project progress

→ If applicable: In addition to the normal response, append the following at the end.

💾 Knowledge capture proposal:
  File: knowledge/{project-name}/{filename}.md
  Content: (summary of content to add)
  Reason: (why this should be accumulated)

→ If not applicable: Append nothing.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The intended structure is "can't produce a response without passing the check." Of course, LLMs can skip instructions, so the enforcement isn't as strong as a mechanical gate. Still, by embedding the check into the system, the probability of capturing information rises significantly even when the human forgets to say "save that."&lt;/p&gt;

&lt;p&gt;Since implementing this system, knowledge files have been steadily accumulating in the knowledge directory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Acknowledging the Enforcement Gap
&lt;/h2&gt;

&lt;p&gt;Let me address the strongest counterargument upfront. "Markdown prohibitions don't have the same enforcement power as a linter." That's correct.&lt;/p&gt;

&lt;p&gt;Linters and type checkers mechanically detect rule violations. Depending on configuration, they can even block builds and merges entirely. Markdown prohibitions, on the other hand, carry the risk of the agent reading past them. If buried in a long instruction file, effectiveness drops.&lt;/p&gt;

&lt;p&gt;However, the comparison here isn't against "mechanical enforcement" — it's against "writing it in the prompt each time." Why does writing in a file work better than a prompt? Two reasons.&lt;/p&gt;

&lt;p&gt;First, the reference mechanism is different. As noted earlier, instructions placed in Project Knowledge or Custom Instructions are passed to the agent in a separate channel from regular messages. They sit where they're less likely to get buried as conversations grow longer, which structurally raises the probability of being referenced.&lt;/p&gt;

&lt;p&gt;Second, accumulation becomes irreversible. Instructions written in a prompt don't exist in the next session. Write them in a file, and they persist unless deleted. The cycle of "write a good instruction → forget → write again" becomes "write a good instruction → append to file → automatically referenced from then on."&lt;/p&gt;

&lt;p&gt;Lining up enforcement strength from weakest to strongest:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Write in the prompt each time" → "Place in a persistent file and reference every time" → "Mechanically block with linters and hooks"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Non-coding agents are currently at the middle position: definitely stronger than the left end, but short of the right end. Even so, moving to the middle is better for agent stability than staying at the left.&lt;/p&gt;

&lt;h2&gt;
  
  
  Repository Structure as a Design Decision
&lt;/h2&gt;

&lt;p&gt;So far I've written about individual rules, but where to put the rules is itself a design decision.&lt;/p&gt;

&lt;p&gt;The repository structure that solidified through operation looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ai-agents/
├── agents/                  # Role-specific instruction files
│   ├── assistant.md         # Main instructions (prohibitions, mandatory actions)
│   ├── project-a/
│   │   ├── sre-support.md   # SRE-specific instructions
│   │   ├── qa-support.md    # QA-specific instructions
│   │   └── ...
│   └── project-b/
│       ├── accounting.md    # Accounting-specific instructions
│       └── ...
├── knowledge/               # Accumulated knowledge
│   ├── project-a/
│   ├── project-b/
│   └── writing-style-guide.md
├── docs/work-logs/          # Per-session work logs
└── CHANGELOG.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structure shares two principles with coding agent harness design.&lt;/p&gt;

&lt;p&gt;The first is "separation of concerns." &lt;a href="https://openai.com/index/harness-engineering/" rel="noopener noreferrer"&gt;OpenAI's report&lt;/a&gt; documents the experience of a monolithic &lt;code&gt;AGENTS.md&lt;/code&gt; not working well. When everything in the context is "important," nothing is important. In my own repository too, I initially crammed everything into a single Markdown file. Separating files by role and having the agent reference only what's needed improved instruction effectiveness.&lt;/p&gt;

&lt;p&gt;What enables this is context routing rules. Define routing in the main instruction file so the agent can reference the appropriate specialized instructions based on conversation content.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Context Routing Rules
Judge which context the user's statement belongs to and reference the appropriate specialized instructions.

- Project A context signals: AWS, infrastructure, SRE, QA, team member names → Reference files under `agents/project-a/`
- Project B context signals: billing, contracts, accounting, legal → Reference files under `agents/project-b/`
- Ambiguous: Ask which project this is about
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the same structure as the &lt;code&gt;AGENTS.md&lt;/code&gt; "design as a pointer" principle. The main file handles routing only, delegating details to specialized files. OpenAI's report describes keeping &lt;code&gt;AGENTS.md&lt;/code&gt; to roughly 100 lines, functioning as a map. For non-coding agents, I've observed the same tendency — the longer the instruction file, the more effectiveness drops.&lt;/p&gt;
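
&lt;p&gt;As a rough illustration of the routing rules above, here is a minimal Python sketch of the decision they ask the agent to make. In practice the agent makes this judgment from the Markdown instructions, not from code. The function name &lt;code&gt;route_context&lt;/code&gt;, the &lt;code&gt;ROUTES&lt;/code&gt; table, and the whole-word matching are assumptions made for illustration, and team member names are left out because they are specific to each team.&lt;/p&gt;

```python
import re

# Hypothetical sketch: the signals mirror the routing rules above,
# but the matching logic is an illustrative assumption.
ROUTES = {
    "agents/project-a/": ["aws", "infrastructure", "sre", "qa"],
    "agents/project-b/": ["billing", "contracts", "accounting", "legal"],
}


def route_context(message):
    """Return the instruction directory for a message, or None if ambiguous."""
    # Compare whole lowercase words so "sre" does not match inside "pressure".
    words = set(re.findall(r"[a-z]+", message.lower()))
    matched = [
        directory
        for directory, signals in ROUTES.items()
        if any(signal in words for signal in signals)
    ]
    # Exactly one match routes; zero or several means "ask which project".
    return matched[0] if len(matched) == 1 else None
```

&lt;p&gt;A return value of &lt;code&gt;None&lt;/code&gt; corresponds to the "Ambiguous" rule: the agent should ask which project the request is about instead of guessing.&lt;/p&gt;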

&lt;p&gt;The second is "version control." By placing instruction files in a Git repository, change history is preserved. "When was this prohibition added?" "Which rule change made things stable?" — all traceable via diff. Slack messages and ad-hoc prompts don't preserve this history. Additionally, since it's a Git repository, you're not tied to a specific PC. Keep it on a remote, and you can launch the same harness from any device.&lt;/p&gt;

&lt;p&gt;OpenAI's team makes the same point. Slack discussions, Google Docs content — if it's not in the repository, it's inaccessible to the agent and might as well not exist. This applies equally to non-coding agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;You don't need to structure everything from the start when beginning harness engineering for non-coding agents.&lt;/p&gt;

&lt;p&gt;In my case too, the early days were spent rewriting prompts. The order in which structure solidified was:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When the agent makes the same mistake twice, write it in a file instead of a prompt&lt;/li&gt;
&lt;li&gt;When the file gets bloated, split by role&lt;/li&gt;
&lt;li&gt;When information is lost between sessions, build an accumulation system&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's the same pattern Mitchell Hashimoto describes. "When the agent makes a mistake, build a system where that mistake can't happen again." For coding, you build it with linters and hooks. For non-coding, you build it with Markdown file structure. The material differs, but the thinking loop is the same.&lt;/p&gt;

&lt;p&gt;Here's a minimal starter template. Place it in Claude Desktop's Project Knowledge or ChatGPT's Custom Instructions and it works as-is.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Assistant Instructions

## Your Role

An AI assistant that supports user workflows.
Use MCP tools like Slack, Google Calendar, and Confluence for information gathering and organization.

## Prohibited Actions

- Do not auto-post to company Slack (draft only)
- Do not make definitive financial judgments (always ask user for confirmation)
- When including confidential information in summaries, explicitly note this

## Mandatory Actions at Session End

When the user indicates work completion, execute the following. Skipping is prohibited.

1. Create a work log at `docs/work-logs/YYYY-MM-DD-{topic}.md`
2. If there are changes, execute git commit &amp;amp; push

## Knowledge Accumulation (Mandatory Check)

Before each response, internally execute the following check. Skipping is prohibited.

Check: Does the immediately preceding conversation contain new information matching any of the following?
1. Factual information (team composition, tech stack, environment configuration)
2. Decisions (architecture selection, tool adoption, policy changes)
3. Learnings (facts discovered during troubleshooting, gotchas)

→ If applicable: Append a knowledge capture proposal at the end
→ If not applicable: Append nothing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This template is roughly 30 lines. Start here, and add one line to the prohibited actions every time the agent makes a mistake. In a few months, you'll have a harness built specifically for you.&lt;/p&gt;
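
&lt;p&gt;The work-log naming convention in the template, &lt;code&gt;docs/work-logs/YYYY-MM-DD-{topic}.md&lt;/code&gt;, can be sketched as a small Python helper. The template itself leaves the exact form of &lt;code&gt;{topic}&lt;/code&gt; to the agent, so the function name &lt;code&gt;work_log_path&lt;/code&gt; and the lowercase, hyphen-separated slug rule are assumptions.&lt;/p&gt;

```python
import re
from datetime import date


def work_log_path(topic, day):
    """Build docs/work-logs/YYYY-MM-DD-{topic}.md from a topic and a date."""
    # Slug rule (lowercase, hyphen-separated) is an assumption,
    # not something the template specifies.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return f"docs/work-logs/{day.isoformat()}-{slug}.md"


example = work_log_path("SRE On-Call Review", date(2026, 4, 13))
```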

&lt;h2&gt;
  
  
  The Question Harnesses Share
&lt;/h2&gt;

&lt;p&gt;Harness engineering isn't a coding-specific technique. It's a design philosophy: giving agents a reliable execution environment.&lt;/p&gt;

&lt;p&gt;Coding agents build that environment with types, linters, and hooks. Non-coding agents build it with structured Markdown and forced referencing. The materials differ, but the question is the same: "When this agent makes a mistake, where is the system that prevents it from happening a second time?"&lt;/p&gt;

&lt;p&gt;Since shifting from "I just need to write better prompts" to "I need to build a structure where the same mistake can't happen," my agents have been running more stably.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>llm</category>
      <category>automation</category>
    </item>
    <item>
      <title>What Changes and What Stays the Same for SRE with AWS Frontier Agents</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Mon, 13 Apr 2026 20:13:46 +0000</pubDate>
      <link>https://dev.to/aws-builders/what-changes-and-what-stays-the-same-for-sre-with-aws-frontier-agents-23aj</link>
      <guid>https://dev.to/aws-builders/what-changes-and-what-stays-the-same-for-sre-with-aws-frontier-agents-23aj</guid>
      <description>&lt;p&gt;On March 31, 2026, AWS made DevOps Agent and Security Agent generally available — the first two of the autonomous AI agents announced at re:Invent 2025 under the "Frontier Agents" brand. A 2-month free trial is included, after which pay-as-you-go pricing kicks in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/mt/announcing-general-availability-of-aws-devops-agent/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/mt/announcing-general-availability-of-aws-devops-agent/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/aws-launches-frontier-agents-for-security-testing-and-cloud-operations/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/machine-learning/aws-launches-frontier-agents-for-security-testing-and-cloud-operations/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The official announcements highlight numbers like "up to 75% MTTR reduction" and "penetration testing compressed from weeks to hours." The question that matters more is: how does this change the day-to-day work of an SRE team? Feature overviews are already plentiful, so this article focuses on what shifts to agents and what stays with humans.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is a Frontier Agent?
&lt;/h2&gt;

&lt;p&gt;AWS announced three Frontier Agents at re:Invent 2025: Kiro Autonomous Agent (software development), DevOps Agent (operations), and Security Agent (security). Of these, DevOps Agent and Security Agent are now GA. Kiro Autonomous Agent remains in preview.&lt;/p&gt;

&lt;p&gt;AWS &lt;a href="https://aws.amazon.com/blogs/machine-learning/aws-launches-frontier-agents-for-security-testing-and-cloud-operations/" rel="noopener noreferrer"&gt;defines Frontier Agents&lt;/a&gt; as systems that "work independently to achieve goals, scale massively to tackle concurrent tasks, and run persistently for hours or days." Frankly, that description could apply to existing AI agents like Claude Code or Devin. What AWS emphasizes is delivering "complete outcomes" rather than assisting with individual tasks, but this feels like a difference of degree, not kind.&lt;/p&gt;

&lt;p&gt;In practice, it's probably best to think of them as domain-specialized autonomous agents — deeply integrated with DevOps and security workflows. "Frontier" is more of a marketing brand than a technical category: "AWS's first-party, domain-specific agent products" is a fair characterization.&lt;/p&gt;

&lt;p&gt;What matters isn't the naming — it's how these agents affect SRE work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What DevOps Agent Does and Doesn't Do
&lt;/h2&gt;

&lt;p&gt;AWS describes DevOps Agent as an "always-available operations teammate." However, since it requires human approval for fixes and can't make business decisions, the reality is closer to an "always-on SRE apprentice" — it investigates and proposes, but can't decide or execute. Here's where that boundary lies.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Agent Does
&lt;/h3&gt;

&lt;p&gt;Imagine an alert fires at 2 AM. Traditionally, the on-call engineer wakes up from a Datadog alert, opens their laptop, checks dashboards for metric anomalies, digs through logs, cross-references deployment history, and identifies the root cause. DevOps Agent automates this entire initial investigation.&lt;/p&gt;

&lt;p&gt;Specifically, it correlates metrics and logs from monitoring tools (CloudWatch, Datadog, Dynatrace, New Relic, Splunk, Grafana), code repositories (GitHub, GitLab, Azure DevOps), and CI/CD deployment histories to build hypotheses like "this code change introduced in this deployment correlates with this metric anomaly." Investigation progress is shared via the web console and Slack, where you can ask follow-up questions or redirect the investigation.&lt;/p&gt;

&lt;p&gt;The GA release adds Azure and on-premises environments as investigation targets. On-premises tools connect via MCP (Model Context Protocol), enabling consistent investigation across multicloud and hybrid setups.&lt;/p&gt;

&lt;p&gt;Beyond incident response, DevOps Agent also provides proactive improvement recommendations — analyzing historical incident patterns to identify gaps in alert coverage, test coverage, code quality, and infrastructure configuration.&lt;/p&gt;

&lt;p&gt;It's worth noting that Datadog's Bits AI SRE offers quite similar capabilities: autonomous alert investigation, source code analysis, and deployment correlation. The key difference is that DevOps Agent can simultaneously span multiple observability tools (Datadog + CloudWatch + Splunk, etc.) and include Azure and on-premises environments via MCP. If your organization is entirely within the Datadog ecosystem, Bits AI SRE may be sufficient. If you have multiple tools or a multicloud setup, DevOps Agent's cross-platform analysis adds value. More on this in the "How Does the Relationship with Existing Tools Change?" section.&lt;/p&gt;

&lt;p&gt;The GA release also introduced "Learned Skills" and "Custom Skills." Learned Skills let the agent learn from your organization's investigation patterns and tool usage, improving accuracy over time. Custom Skills let you add organization-specific investigation procedures and best practices, configurable per incident type (triage, root cause analysis, mitigation). Code Indexing also enables code-level fix suggestions based on repository understanding.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Agent Doesn't Do
&lt;/h3&gt;

&lt;p&gt;This is the critical part. DevOps Agent investigates and proposes — executing fixes is up to humans.&lt;/p&gt;

&lt;p&gt;It can generate fix proposals and work with Kiro or Claude Code to produce fix code, but applying changes to production requires human approval. This is intentional — AWS has made the design decision that "an agent that modifies production without approval won't be trusted."&lt;/p&gt;

&lt;p&gt;The other thing the agent doesn't do is business judgment. "This incident has major customer impact, so we need a company-wide response." "It's Friday night — let's apply a workaround and do the root cause fix Monday." These decisions require human context. Identifying the technical root cause and deciding how to respond are separate jobs. DevOps Agent handles the former; humans own the latter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0swk42fyd4v2tj2ovtzo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0swk42fyd4v2tj2ovtzo.png" alt=" " width="800" height="797"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Security Agent Does and Doesn't Do
&lt;/h2&gt;

&lt;p&gt;Security Agent is the security counterpart — an "always-on penetration tester and security reviewer." It has three main capabilities: on-demand penetration testing, design document security review (Design Review), and PR security review (Code Review).&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Agent Does
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Penetration Testing
&lt;/h4&gt;

&lt;p&gt;Traditionally, penetration testing meant "once or twice a year, for the most critical applications only, outsourced to specialists, taking weeks." Cost and time constraints leave most of the application portfolio untested. As the &lt;a href="https://aws.amazon.com/security-agent/faqs/" rel="noopener noreferrer"&gt;Security Agent FAQ&lt;/a&gt; notes, "most organizations limit manual penetration testing to their most critical applications and conduct these tests periodically." And even tested applications become partially unverified the moment new code is deployed.&lt;/p&gt;

&lt;p&gt;Security Agent changes this structure. You create an "Agent Space," connect your GitHub repository, and the agent reads source code, architecture documents, and design docs to understand the application's structure before running automated penetration tests against endpoints.&lt;/p&gt;

&lt;p&gt;The key difference from simply running a scanner: Security Agent validates discovered vulnerabilities by actually sending payloads to confirm exploitability. Reports include reproduction steps, dramatically reducing false positives. Per the &lt;a href="https://aws.amazon.com/security-agent/features/" rel="noopener noreferrer"&gt;official features page&lt;/a&gt;, testing covers OWASP Top 10 vulnerability types plus business logic flaws. According to the &lt;a href="https://aws.amazon.com/blogs/aws/aws-weekly-roundup-aws-devops-agent-security-agent-ga-product-lifecycle-updates-and-more-april-6-2026/" rel="noopener noreferrer"&gt;GA announcement blog&lt;/a&gt;, LG CNS reported significant false positive reduction, over 50% faster testing, and roughly 30% cost reduction.&lt;/p&gt;

&lt;h4&gt;
  
  
  Design Review
&lt;/h4&gt;

&lt;p&gt;This capability reviews architecture documents and design docs from a security perspective before any code is written. It checks against AWS best practices and your organization's custom security requirements. Catching issues at the design stage avoids costly rework after implementation.&lt;/p&gt;

&lt;h4&gt;
  
  
  PR Review
&lt;/h4&gt;

&lt;p&gt;Pull Request-level security review is the third capability. As of GA, it supports GitHub PRs — Security Agent automatically reviews PRs for security issues when they're created. You can configure it to check custom security requirement compliance, common security vulnerabilities, or both.&lt;/p&gt;

&lt;p&gt;PR security checks aren't new — many organizations already have Claude Code or Codex review PRs with security instructions via CLAUDE.md, or have SAST tools in their CI/CD pipeline. Security Agent's difference is operational: security requirements are defined once in the console and automatically applied across all repositories. This removes the overhead of maintaining per-repository md files, but it doesn't enable something technically impossible before. Of the three capabilities, penetration test automation is where the real differentiation lies.&lt;/p&gt;

&lt;p&gt;Design reviews are free up to 200/month, and code reviews up to 1,000/month. Only penetration testing is paid ($50/task-hour) — more on this in the "Cost Structure" section.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Agent Doesn't Do
&lt;/h3&gt;

&lt;p&gt;Security Agent automates "discovery and validation" — "judgment and response" remain human territory.&lt;/p&gt;

&lt;p&gt;Security policy decisions are human work. "Which risks to accept and which to address," "how to interpret compliance requirements" — these are outside the agent's scope. For example, "fixing this vulnerability requires a breaking API change, but we need to consider the impact on a major customer's release schedule" is a business trade-off that requires human judgment.&lt;/p&gt;

&lt;p&gt;Social engineering (tricking employees into granting access) and vulnerabilities that can only be discovered by understanding the entire business workflow are also difficult to cover with automated testing alone. While the official documentation says business logic flaws are included in the test scope, the agent doesn't fully replace a human penetration tester's judgment of "what this operation means in the context of this business flow." Security Agent's strength is "broad, frequent, systematic testing" — complementing, not replacing, "deep, creative testing" by human experts.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does the Relationship with Existing Tools Change?
&lt;/h2&gt;

&lt;p&gt;"Will Datadog or PagerDuty become unnecessary?" Short answer: no. The relationship changes, not the need.&lt;/p&gt;

&lt;p&gt;Here's how the human steps change, using a late-night alert as an example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk06ru8yvqvkmggtmyjrz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk06ru8yvqvkmggtmyjrz.png" alt=" " width="800" height="923"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Previously humans handled all 5 steps; after adoption, human steps drop to 2. Red = previously all-human work, green = shifts to the agent, blue = remains human.&lt;/p&gt;

&lt;h3&gt;
  
  
  Monitoring Tools (Datadog / CloudWatch, etc.)
&lt;/h3&gt;

&lt;p&gt;DevOps Agent doesn't "replace" these tools — it uses them as "data sources." GA supports integrations with CloudWatch, Datadog, Dynatrace, New Relic, Splunk, and Grafana. This isn't about canceling Datadog and switching to DevOps Agent — it's about DevOps Agent analyzing the metrics and logs that Datadog collects.&lt;/p&gt;

&lt;p&gt;You might wonder: "Could we drop Datadog, consolidate on CloudWatch, and use DevOps Agent / Security Agent to save costs?" DevOps Agent handles incident investigation and improvement recommendations — it doesn't include day-to-day monitoring features like APM, RUM, distributed tracing, or dashboards. Datadog's value extends beyond incident response, so a simple swap doesn't work. That said, there is overlap between Datadog's Bits AI SRE and DevOps Agent's incident investigation capabilities, so whether you need to pay for both is worth evaluating.&lt;/p&gt;

&lt;p&gt;In fact, the quality of your monitoring setup directly affects DevOps Agent's effectiveness. The agent can only analyze what your tools collect. Sparse metrics and logs mean less accurate analysis. The direction isn't "adopt the agent so we can invest less in monitoring" — it's "invest in monitoring to maximize the agent's effectiveness."&lt;/p&gt;

&lt;h3&gt;
  
  
  Incident Notification and On-Call Management (PagerDuty / Datadog On-Call, etc.)
&lt;/h3&gt;

&lt;p&gt;Alert routing and on-call management roles remain unchanged. DevOps Agent starts investigating the moment an alert fires, completing initial investigation before the notified human even logs in. On-call scheduling, escalation, and incident lifecycle management continue to be handled by tools like PagerDuty or Datadog On-Call. The GA release added PagerDuty integration as well.&lt;/p&gt;

&lt;p&gt;What changes is "the first thing the on-call engineer does." Instead of "open the dashboard and check metrics," it becomes "read the Agent's investigation results shared in Slack." Per the &lt;a href="https://aws.amazon.com/blogs/mt/announcing-general-availability-of-aws-devops-agent/" rel="noopener noreferrer"&gt;official GA blog&lt;/a&gt;, Zenchef (a restaurant technology platform) submitted an issue to DevOps Agent during a hackathon and had the root cause identified in 20–30 minutes — an investigation that would normally take 1–2 hours, completed while the engineers stayed focused on the hackathon.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub (Security Agent PR Review)
&lt;/h3&gt;

&lt;p&gt;Security Agent automatically posts security review comments on GitHub PRs. Developers can review and address findings without leaving the PR interface. Merge decisions remain human. Details and differentiation points are covered in the "What Security Agent Does and Doesn't Do" section above.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Structure
&lt;/h2&gt;

&lt;p&gt;Frontier Agents have a distinctive pricing model, especially DevOps Agent's tie-in with AWS Support plans.&lt;/p&gt;

&lt;h3&gt;
  
  
  DevOps Agent: Support Credits
&lt;/h3&gt;

&lt;p&gt;DevOps Agent costs $0.0083/agent-second (roughly $30/hr). The &lt;a href="https://aws.amazon.com/devops-agent/pricing/" rel="noopener noreferrer"&gt;official pricing page&lt;/a&gt; shows usage examples: a small team (10 incident investigations/month, 8 minutes each) at ~$40/month, and an enterprise (500 incidents, 10 Agent Spaces) at ~$2,300/month.&lt;/p&gt;
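
&lt;p&gt;The arithmetic behind these figures is easy to check. The sketch below uses only the per-second rate and the small-team example quoted above. The helper name &lt;code&gt;monthly_cost&lt;/code&gt; is hypothetical, and the enterprise example is not reproduced because the per-incident duration is not given here.&lt;/p&gt;

```python
# Rate quoted in the article; the helper name is a hypothetical illustration.
RATE_PER_AGENT_SECOND = 0.0083  # USD


def monthly_cost(investigations, minutes_each):
    """Estimate monthly DevOps Agent cost in USD for incident investigations."""
    agent_seconds = investigations * minutes_each * 60
    return agent_seconds * RATE_PER_AGENT_SECOND


hourly_rate = RATE_PER_AGENT_SECOND * 3600  # about 29.88 USD per hour
small_team = monthly_cost(10, 8)            # about 39.84 USD per month
```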

&lt;p&gt;On top of this, &lt;a href="https://aws.amazon.com/devops-agent/pricing/" rel="noopener noreferrer"&gt;per the pricing page&lt;/a&gt;, AWS Support customers receive monthly credits based on the prior month's Support spend:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Support Plan&lt;/th&gt;
&lt;th&gt;Credit Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Unified Operations&lt;/td&gt;
&lt;td&gt;100% of prior month's Support spend&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise Support&lt;/td&gt;
&lt;td&gt;75% of prior month's Support spend&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Business Support+&lt;/td&gt;
&lt;td&gt;30% of prior month's Support spend&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For example, an organization paying $15,000/month for Enterprise Support would receive $11,250/month in DevOps Agent credits. If usage stays within that, the incremental cost is zero.&lt;/p&gt;
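
&lt;p&gt;The same calculation as a minimal Python sketch, using the credit rates from the table above. The names &lt;code&gt;CREDIT_RATES&lt;/code&gt; and &lt;code&gt;incremental_cost&lt;/code&gt; are hypothetical, and the sketch assumes credits simply offset that month's DevOps Agent usage. Check the pricing page for the actual credit terms.&lt;/p&gt;

```python
# Credit rates from the table above; the names here are hypothetical.
CREDIT_RATES = {
    "Unified Operations": 1.00,
    "Enterprise Support": 0.75,
    "Business Support+": 0.30,
}


def incremental_cost(plan, prior_month_support_spend, agent_usage):
    """Return (credit, out-of-pocket cost) in USD for one month.

    Assumes credits offset usage one-to-one and never go below zero.
    """
    credit = prior_month_support_spend * CREDIT_RATES[plan]
    return credit, max(agent_usage - credit, 0.0)


# 15,000 USD of Enterprise Support spend, with a usage figure chosen for illustration.
credit, extra = incremental_cost("Enterprise Support", 15000, 2300)
```

&lt;p&gt;With these inputs the credit is $11,250 and the out-of-pocket cost is zero, matching the example above.&lt;/p&gt;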

&lt;p&gt;Behind this credit structure is the relationship with TAM (Technical Account Manager) in Enterprise Support. Traditionally, Enterprise Support customers get a TAM who provides architecture reviews and operational guidance. The &lt;a href="https://aws.amazon.com/premiumsupport/plans/enterprise/" rel="noopener noreferrer"&gt;Enterprise Support page&lt;/a&gt; now presents TAM and DevOps Agent side by side — TAM handles strategic guidance, DevOps Agent handles 24/7 automated investigation and improvement proposals. DevOps Agent is positioned as an "extension" of Support, not a replacement, which explains why credits come from Support spend.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Agent: Penetration Testing Cost Transformation
&lt;/h3&gt;

&lt;p&gt;Security Agent penetration testing is billed at $50/task-hour (design and code reviews have free tiers as mentioned above). Per the &lt;a href="https://aws.amazon.com/security-agent/faqs/" rel="noopener noreferrer"&gt;official FAQ&lt;/a&gt;, a 2-month free trial is included post-GA.&lt;/p&gt;

&lt;p&gt;Traditional third-party penetration testing typically costs several million yen (tens of thousands of dollars) per engagement and takes weeks. This "high unit cost × low frequency" structure transforms into "$50/task-hour × high frequency."&lt;/p&gt;

&lt;p&gt;The implication: it becomes economically viable to expand penetration testing coverage. Organizations that could only test their most critical applications due to cost constraints can now continuously test across their entire portfolio.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does SRE Team Management Change?
&lt;/h2&gt;

&lt;p&gt;Frontier Agents adoption has implications for how SRE teams operate.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Nature of On-Call Pain Changes
&lt;/h3&gt;

&lt;p&gt;DevOps Agent's biggest impact is automating initial investigation for late-night alerts. Per the &lt;a href="https://aws.amazon.com/blogs/mt/announcing-general-availability-of-aws-devops-agent/" rel="noopener noreferrer"&gt;official GA blog&lt;/a&gt;, WGU (Western Governors University) deployed it to production during preview, reducing estimated 2-hour investigations to 28 minutes.&lt;/p&gt;

&lt;p&gt;Traditional on-call pain comes from being woken up and having to start investigating from scratch in a less-than-ideal state. Opening dashboards, hunting for metric anomalies, pulling related logs, cross-referencing deployment history — this alone can take 30 minutes to an hour.&lt;/p&gt;

&lt;p&gt;After DevOps Agent adoption, this "starting from zero" disappears. The on-call engineer's first action becomes "read the Agent's findings." The agent presents "this code change in this deployment correlates with this metric anomaly — here's the proposed fix" as the starting point.&lt;/p&gt;

&lt;p&gt;However, a new kind of pressure may emerge: "Is the root cause the Agent identified actually correct? Are there blind spots?" The on-call engineer is caught between trusting the agent's output and risking an incorrect fix, or distrusting it and re-investigating from scratch. How to handle that tension is worth discussing as a team before it happens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Required Skill Sets Change
&lt;/h3&gt;

&lt;p&gt;The shift goes from "investigate from scratch when alerted" to "review the Agent's findings and judge whether anything was missed." It's similar to when CI/CD became standard — the emphasis moved from "memorize manual deployment procedures" to "design pipelines and make judgment calls when they fail." As automation advances, the ability to audit automated output and handle exceptions that automation can't becomes more important than hands-on execution skills.&lt;/p&gt;

&lt;p&gt;For less experienced team members, learning opportunities now have to be designed deliberately. Previously, "investigating incidents yourself" was the primary way to build incident response skills. With agents handling initial investigation, these "learn by doing" opportunities shrink.&lt;/p&gt;

&lt;p&gt;Possible approaches include "form your own hypothesis before looking at the Agent's findings," "review Agent output with intentionally injected errors," or "regular incident response drills without the Agent." This is the same thinking as understanding manual deployment procedures even when you rely on CI/CD.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can You Reduce Headcount?
&lt;/h3&gt;

&lt;p&gt;For SRE team managers, this question is unavoidable. Can Frontier Agents adoption mean fewer people?&lt;/p&gt;

&lt;p&gt;In the short term, "same headcount, broader coverage" is more realistic. Agent-handled initial investigations free up SRE team time. Whether that freed time goes to "headcount reduction" or "proactive improvements that were previously backlogged (SLO reviews, chaos engineering, architecture improvements)" is the real question. The latter likely delivers more organizational value.&lt;/p&gt;

&lt;p&gt;SRE teams being perpetually stuck in reactive incident response mode (the "firefighter" state) is a common challenge. Frontier Agents adoption is a catalyst for accelerating the shift from "firefighter" to "fire prevention engineer."&lt;/p&gt;

&lt;h2&gt;
  
  
  Constraints and Caveats
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Reading the Numbers
&lt;/h3&gt;

&lt;p&gt;AWS's published figures — "up to 75% MTTR reduction," "up to 80% faster investigations," "94% root cause accuracy" — are all preview-period customer-reported values, explicitly qualified with "up to." Whether your environment sees similar results depends on application complexity, monitoring maturity, and incident characteristics. Treat them as reference points and validate in your own environment.&lt;/p&gt;

&lt;p&gt;WGU's "estimated 2 hours → 28 minutes" and LG CNS's "over 50% faster testing" are also results from specific situations. This article cites these numbers as material for understanding implications, not as guarantees that generalize.&lt;/p&gt;

&lt;h3&gt;
  
  
  Region Limitations
&lt;/h3&gt;

&lt;p&gt;Both DevOps Agent and Security Agent are available in the same six regions at GA: US East (N. Virginia), US West (Oregon), Europe (Frankfurt / Ireland), and Asia Pacific (Sydney / Tokyo). Tokyo region availability is a plus for teams in Japan.&lt;/p&gt;

&lt;p&gt;However, per &lt;a href="https://newclawtimes.com/articles/aws-frontier-agents-devops-security-autonomous-operations-enterprise/" rel="noopener noreferrer"&gt;New Claw Times&lt;/a&gt; analysis, DevOps Agent inference processing occurs in US regions regardless of the selected region. Organizations with strict GDPR or data residency requirements should verify this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Free Trial Limitations
&lt;/h3&gt;

&lt;p&gt;Both agents include a 2-month free trial, but DevOps Agent has monthly caps. Per &lt;a href="https://aws.amazon.com/devops-agent/pricing/" rel="noopener noreferrer"&gt;official pricing&lt;/a&gt;, the trial period allows up to 10 Agent Spaces, 20 hours of incident investigation, 15 hours of prevention evaluations, and 20 hours of on-demand SRE tasks per month. Excess usage incurs standard charges. That is sufficient for pre-production evaluation, but watch the limits for large-scale PoCs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multicloud Support Reality
&lt;/h3&gt;

&lt;p&gt;DevOps Agent supports AWS, Azure, and on-premises. On-premises connection uses MCP, which requires access configuration for the target tools; the agent does not see other environments without setup. Note that DevOps Agent does not explicitly support Google Cloud environments at GA.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/03/aws-security-agent-ondemand-penetration/" rel="noopener noreferrer"&gt;Per the official GA announcement&lt;/a&gt;, Security Agent's penetration testing supports "AWS, Azure, GCP, other cloud-providers, and on-premises"; it can test any reachable endpoint regardless of cloud provider.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Agent: GitHub Only
&lt;/h3&gt;

&lt;p&gt;As mentioned earlier, Security Agent's PR review (Code Review) only supports GitHub at GA. Organizations primarily using GitLab or Bitbucket need to factor in this constraint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary: Not "Replacement" but "Redesigning the Division of Labor"
&lt;/h2&gt;

&lt;p&gt;Frontier Agents don't "eliminate" SRE work — they "partition" it.&lt;/p&gt;

&lt;p&gt;Work that shifts to agents centers on pattern recognition and correlation analysis: detecting anomalies across metrics and logs, matching against historical incidents to form hypotheses, systematically scanning code for vulnerabilities. This "intellectual labor that demands volume and speed" is where agents excel.&lt;/p&gt;

&lt;p&gt;Work that stays human centers on judgment and decision-making: whether to apply a fix, how to assess business impact, how much risk to accept, how to evolve the architecture. These are context-dependent, requiring organizational knowledge and business priority judgment — outside the agent's scope.&lt;/p&gt;

&lt;p&gt;For SRE team management, this is an opportunity to redesign team skill composition and on-call structure around this new division of labor. Not "agents mean we need fewer people," but "agents handle more of the routine, so humans can focus on work that demands greater judgment."&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>sre</category>
      <category>security</category>
    </item>
    <item>
      <title>Architecture Layers That S3 Files Eliminates — and Creates</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Thu, 09 Apr 2026 06:56:45 +0000</pubDate>
      <link>https://dev.to/aws-builders/architecture-layers-that-s3-files-eliminates-and-creates-16ke</link>
      <guid>https://dev.to/aws-builders/architecture-layers-that-s3-files-eliminates-and-creates-16ke</guid>
      <description>&lt;p&gt;On April 7, 2026, AWS made Amazon S3 Files generally available. It lets you mount S3 buckets as NFS v4.1/v4.2 file systems from EC2, EKS, ECS, and Lambda.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://aws.amazon.com/blogs/aws/launching-s3-files-making-s3-buckets-accessible-as-file-systems/" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd2908q01vomqb2.cloudfront.net%2Fda4b9237bacccdf19c0760cab7aec4a8359010b0%2F2026%2F04%2F07%2FScreenshot-2026-04-06-at-3.50.49%25E2%2580%25AFPM.png" height="433" class="m-0" width="800"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://aws.amazon.com/blogs/aws/launching-s3-files-making-s3-buckets-accessible-as-file-systems/" rel="noopener noreferrer" class="c-link"&gt;
            Launching S3 Files, making S3 buckets accessible as file systems | AWS News Blog
          &lt;/a&gt;
        &lt;/h2&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fa0.awsstatic.com%2Fmain%2Fimages%2Fsite%2Ffav%2Ffavicon.ico" width="16" height="16"&gt;
          aws.amazon.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;There are already plenty of setup guides and first-look posts. This article focuses on something different: what becomes unnecessary and what becomes possible in your architecture.&lt;/p&gt;

&lt;p&gt;If you use S3 regularly and are wondering "this sounds big, but how does it actually affect my architecture?" — this is for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem S3 Files Is Solving
&lt;/h2&gt;

&lt;p&gt;Let's start with a shared understanding.&lt;/p&gt;

&lt;p&gt;Say an ML team needs to preprocess training data. The raw data is in S3. They want to use pandas. While &lt;code&gt;pd.read_csv("s3://my-bucket/data.csv")&lt;/code&gt; works, under the hood pandas hands the URL to s3fs (through fsspec), which issues S3 GET requests and loads the data into memory. Writing results back requires PUT. This is fundamentally different from &lt;code&gt;open("./data.csv")&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;At scale, this becomes an architectural problem. Many organizations operate pipelines like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fojovark7e9sap1xag7k5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fojovark7e9sap1xag7k5.png" alt=" " width="800" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Copy from S3 to EFS/EBS, process, write results back to S3. This "middle copy layer" exists solely to bridge the I/O model gap between object storage and file systems. Maintaining sync scripts, managing consistency during copies, and provisioning EFS — all of this overhead comes from that gap.&lt;/p&gt;

&lt;p&gt;S3 Files aims to eliminate this gap entirely.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxczxqawaej6wt02f41a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxczxqawaej6wt02f41a.png" alt=" " width="800" height="183"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the application's perspective, S3 data appears as a local directory. &lt;code&gt;pd.read_csv("/mnt/s3files/data.csv")&lt;/code&gt; reads from S3 behind the scenes, and &lt;code&gt;df.to_csv("/mnt/s3files/result.csv")&lt;/code&gt; automatically commits changes back.&lt;/p&gt;

&lt;p&gt;The full technical overview is in the official documentation.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files.html" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;docs.aws.amazon.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;h2&gt;
  
  
  Why This Isn't Just Another Mount Feature
&lt;/h2&gt;

&lt;p&gt;If "mount S3" sounds familiar, you might be thinking of Mountpoint for Amazon S3 or Google Cloud's Cloud Storage FUSE (gcsfuse). S3 Files has a fundamentally different architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Difference from FUSE-Based Tools
&lt;/h3&gt;

&lt;p&gt;FUSE-based tools emulate file system behavior on top of S3's API. In Mountpoint for Amazon S3, for example, overwriting a file means deleting the old object and PUTting a new one. Partial file writes — a basic file system operation — aren't supported. Directories don't actually exist, leading to inconsistencies with empty directories.&lt;/p&gt;

&lt;p&gt;S3 Files doesn't emulate. It connects EFS (Elastic File System), a real NFS file system, to S3. The file system side provides real NFS semantics, and the S3 side remains real S3 objects. Two distinct systems coexist with an explicit synchronization layer between them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgep96c8nftvluyqe4r9h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgep96c8nftvluyqe4r9h.png" alt=" " width="800" height="661"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This matters in practice: appending to a WAL (Write-Ahead Log) or editing part of a config file works with byte-level writes on the file system side, periodically synced to S3 as whole objects. With FUSE, these operations require re-PUTting the entire object.&lt;/p&gt;

&lt;h3&gt;
  
  
  What "Stage and Commit" Actually Does
&lt;/h3&gt;

&lt;p&gt;Andy Warfield, VP and Distinguished Engineer at AWS, describes the sync model as "stage and commit" in his post on All Things Distributed, explicitly noting it's "a term borrowed from version control systems like git" (official documentation uses "synchronization" instead).&lt;/p&gt;

&lt;p&gt;File system changes are like working directory changes in Git. They aren't immediately reflected in S3 — instead, they're batched and committed as S3 PUTs approximately every 60 seconds. In the other direction, when objects are updated in S3 (e.g., via PutObject from another application), the official documentation states changes are reflected in the file system "typically within seconds." DevelopersIO's hands-on testing measured approximately 30 seconds.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://dev.classmethod.jp/articles/amazon-s3-files-ga-mount-and-compare-efs/" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.ctfassets.net%2Fct0aopd36mqt%2Fwp-thumbnail-066beb776f0c57ce64255fadcc072f60%2F82b2f6687ab5774cd73f9176dcac7855%2Famazon-s3" height="630" class="m-0" width="1200"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://dev.classmethod.jp/articles/amazon-s3-files-ga-mount-and-compare-efs/" rel="noopener noreferrer" class="c-link"&gt;
            Amazon S3 Files is GA: mounting an S3 bucket as a file system and comparing it with EFS | DevelopersIO
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
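            Amazon S3 Files, launched in April 2026, is a new service that lets you mount S3 buckets over NFS v4.2. It is available from EC2, Lambda, EKS, and ECS, and lets existing legacy applications use S3 without code changes.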

          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev.classmethod.jp%2Ffavicon.ico" width="48" height="48"&gt;
          dev.classmethod.jp
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;If both sides modify the same file simultaneously, S3 is the source of truth. The file system version is moved to a lost+found directory, with a CloudWatch metric indicating the conflict.&lt;/p&gt;

&lt;p&gt;This is a deliberate tradeoff: not a real-time shared file system, but one that tolerates tens of seconds of delay in exchange for preserving both file and object semantics without compromise.&lt;/p&gt;

&lt;p&gt;According to Warfield's post, the team initially tried to make the boundary between files and objects invisible, but every approach forced unacceptable compromises on one side or the other. They ultimately decided to make the boundary itself an explicit, well-designed feature. His post is essential reading for understanding the "why" behind S3 Files.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.allthingsdistributed.com/2026/04/s3-files-and-the-changing-face-of-s3.html" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.allthingsdistributed.com%2Fimages%2Fsunflowers.jpg" height="570" class="m-0" width="800"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.allthingsdistributed.com/2026/04/s3-files-and-the-changing-face-of-s3.html" rel="noopener noreferrer" class="c-link"&gt;
            S3 Files and the changing face of S3 | All Things Distributed
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            Andy Warfield writes about the hard-won lessons dealing with data friction that lead to S3 Files
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/data%3Aimage%2Fpng%3Bbase64%2CiVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNk%2BA8AAQUBAScY42YAAAAASUVORK5CYII%3D" width="1" height="1"&gt;
          allthingsdistributed.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;h2&gt;
  
  
  Architecture Layers That Disappear
&lt;/h2&gt;

&lt;p&gt;Here's the core of this article: what specific architectural patterns does S3 Files make unnecessary?&lt;/p&gt;

&lt;h3&gt;
  
  
  1. S3 → EFS/EBS Staging Pipelines
&lt;/h3&gt;

&lt;p&gt;Consider a daily retraining pipeline for a recommendation model. Purchase logs accumulate in S3, and preprocessing involves data cleansing → feature generation → format conversion.&lt;/p&gt;

&lt;p&gt;Previously, every time an EC2 instance or SageMaker Processing Job started, it first downloaded data from S3 to EBS. For 100GB of training data, depending on instance network bandwidth, the download alone took several minutes. After processing, results were uploaded back to S3, and the EBS volume was cleaned up. Of the four steps (download → process → upload → cleanup), only "process" is the actual work.&lt;/p&gt;

&lt;p&gt;With S3 Files, you mount the S3 prefix (e.g., &lt;code&gt;s3://ml-data/purchase-logs/&lt;/code&gt;) and your processing script reads and writes &lt;code&gt;/mnt/s3files/purchase-logs/&lt;/code&gt; directly. Download, upload, and cleanup steps disappear.&lt;/p&gt;

&lt;p&gt;Note: if a downstream job needs to read results via the S3 API immediately, the ~60-second commit delay matters. If both jobs use the same mount point, this isn't an issue. For S3 API consumers, design around S3 event notifications or explicit waits.&lt;/p&gt;
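
&lt;p&gt;For the explicit-wait option, a minimal polling sketch might look like the following. It assumes the caller supplies an &lt;code&gt;exists&lt;/code&gt; callable (a hypothetical name; in practice it could wrap an S3 HeadObject request), and the default timeout is an illustrative value chosen to sit above the ~60-second commit window.&lt;/p&gt;

```python
import time


def wait_for_object(exists, timeout_s=120.0, interval_s=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll until exists() returns True or timeout_s elapses.

    exists is a hypothetical zero-argument callable supplied by the caller,
    for example a wrapper around an S3 HeadObject request.
    Returns True if the object showed up in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if exists():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated usage: the object becomes visible on the third check.
checks = iter([False, False, True])
found = wait_for_object(lambda: next(checks), timeout_s=1.0, interval_s=0.0)
print(found)
```

&lt;p&gt;Polling is the simpler of the two options; S3 event notifications avoid the repeated requests at the cost of extra wiring.&lt;/p&gt;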

&lt;h3&gt;
  
  
  2. Lambda's "/tmp Download" Pattern
&lt;/h3&gt;

&lt;p&gt;Consider a Lambda function that generates thumbnails when images are uploaded to S3. The traditional implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Traditional: Download → Process → Upload
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;urllib.parse&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;unquote_plus&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;

&lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bucket&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="c1"&gt;# Object keys in S3 event notifications are URL-encoded
&lt;/span&gt;    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;unquote_plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;download_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/tmp/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;download_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;download_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;download_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;thumbnail&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;thumb_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/tmp/thumb_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thumb_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thumb_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;thumbnails/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;With S3 Files mounted:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# S3 Files: Operate directly on mounted paths
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;urllib.parse&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;unquote_plus&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Object keys in S3 event notifications are URL-encoded
&lt;/span&gt;    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;unquote_plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/mnt/s3files/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;thumbnail&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;thumb_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/mnt/s3files/thumbnails/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="c1"&gt;# A real file system needs the parent directory to exist
&lt;/span&gt;    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makedirs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thumb_path&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thumb_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You don't even need to import boto3. The same code you'd write for local development works as-is.&lt;/p&gt;

&lt;p&gt;Beyond code simplicity, Lambda functions are freed from &lt;code&gt;/tmp&lt;/code&gt; capacity constraints (default 512MB, max 10GB). For functions referencing multi-GB ML models, cold start download time directly impacted latency. S3 Files pre-fetches files below a configurable threshold (default 128KB) alongside metadata, and fetches larger files on demand. Warfield calls this "lazy hydration" in his post — you can start working immediately even with millions of objects in the bucket.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-mounting-lambda.html" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;docs.aws.amazon.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Self-Managed EFS + S3 Sync
&lt;/h3&gt;

&lt;p&gt;If your organization uses S3 as a data lake but needs EFS for real-time processing or interactive analysis, you likely have DataSync, Step Functions, or cron scripts bridging the two. Maintaining this sync logic (detecting new objects, identifying diffs, retrying on failure, keeping data consistent during sync, cleaning up stale EFS files) is a significant operational burden.&lt;/p&gt;

&lt;p&gt;S3 Files replaces this with managed synchronization. Per the official documentation, import from S3 runs at up to 2,400 objects/second, and export to S3 uses ~60-second batch windows. Unused file data is automatically evicted from the file system cache (configurable from 1 to 365 days, default 30) but never deleted from S3. File system storage costs scale with your active working set, not your total dataset.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-synchronization.html" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;docs.aws.amazon.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;h3&gt;
  
  
  4. Adapter Layers for Legacy Applications
&lt;/h3&gt;

&lt;p&gt;Log aggregation tools watching &lt;code&gt;/var/log/&lt;/code&gt;, build systems reading from &lt;code&gt;/src/&lt;/code&gt;, config management tools writing to &lt;code&gt;/etc/&lt;/code&gt; — these applications assume &lt;code&gt;open()&lt;/code&gt; / &lt;code&gt;read()&lt;/code&gt; / &lt;code&gt;write()&lt;/code&gt; and rewriting them for the S3 SDK is often impractical.&lt;/p&gt;

&lt;p&gt;Previously, "put files on EFS, back up to S3 as needed" was the pragmatic solution. S3 Files lets you keep S3 as primary storage while applications access it via NFS mount. POSIX permissions and file locking (flock) are supported, making migration possible with a mount point change and zero code changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  New Architecture Patterns
&lt;/h2&gt;

&lt;p&gt;What becomes practically feasible for the first time?&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Two-Tier Read Optimization
&lt;/h3&gt;

&lt;p&gt;S3 Files uses a two-tier architecture internally. The first tier, "high-performance storage," caches small, frequently accessed files with sub-millisecond to single-digit millisecond latency per the official documentation. The second tier is S3 itself — reads of 1MB or larger are streamed directly from S3 even if data is cached locally, because S3 is optimized for throughput. Notably, these large reads incur only S3 GET request costs with no file system access charge.&lt;/p&gt;

&lt;p&gt;Official performance specifications:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Max read throughput per client&lt;/td&gt;
&lt;td&gt;3 GiB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aggregate read throughput per file system&lt;/td&gt;
&lt;td&gt;Terabytes/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max read IOPS per file system&lt;/td&gt;
&lt;td&gt;250,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aggregate write throughput per file system&lt;/td&gt;
&lt;td&gt;1–5 GiB/s (varies by region)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max write IOPS per file system&lt;/td&gt;
&lt;td&gt;50,000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-performance.html" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;docs.aws.amazon.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;For context: EBS gp3 provides 125 MiB/s baseline throughput, scalable to 2,000 MiB/s (~2 GB/s) with additional provisioning. io2 Block Express maxes out at 4 GB/s. S3 Files delivers comparable read throughput without any volume provisioning.&lt;/p&gt;

&lt;p&gt;From spec values alone: reading a 100 GiB dataset sequentially takes ~14 minutes at the gp3 default (125 MiB/s) versus ~33 seconds at the S3 Files maximum (3 GiB/s). Actual throughput depends on workload and instance type, but the order-of-magnitude difference matters. And since 1MB+ reads are billed at S3 GET rates only, heavy sequential reads essentially incur no file system charges.&lt;/p&gt;
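
&lt;p&gt;The arithmetic behind that comparison can be reproduced in a few lines. This sketch treats the dataset as 100 GiB (an assumption about the units) and uses only the spec-sheet throughput figures quoted above; &lt;code&gt;read_seconds&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def read_seconds(size_bytes, bytes_per_second):
    """Time to read size_bytes sequentially at a constant throughput."""
    return size_bytes / bytes_per_second


dataset = 100 * GIB  # assumption: the dataset is 100 GiB
gp3_default = read_seconds(dataset, 125 * MIB)
s3_files_max = read_seconds(dataset, 3 * GIB)

print(f"gp3 default:  {gp3_default / 60:.1f} min")
print(f"S3 Files max: {s3_files_max:.1f} s")
print(f"ratio:        {gp3_default / s3_files_max:.1f}x")
```

&lt;p&gt;Under these assumptions the gap is roughly 13.7 minutes versus 33.3 seconds, a factor of about 24.6. These are ceilings from the spec sheet, not measurements.&lt;/p&gt;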

&lt;h3&gt;
  
  
  2. Large Reference Data in Lambda
&lt;/h3&gt;

&lt;p&gt;Previously, Lambda functions using large reference data had three options: container images with embedded models (max 10GB, rebuild on every model update), EFS mounts (requires VPC, tends to increase cold starts), or S3 downloads to &lt;code&gt;/tmp&lt;/code&gt; (max 10GB, download time added to cold starts). S3 Files is a fourth option: mount the S3 prefix, read model files via the file system. Model updates require only an S3 upload — no Lambda redeployment needed.&lt;/p&gt;

&lt;p&gt;Unlike EFS mounts, the backend is your standard S3 bucket, so S3-native features like versioning, lifecycle policies, and cross-region replication work as-is.&lt;/p&gt;
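
&lt;p&gt;One way a function could pick up a new model version without redeployment is to re-read the file only when its modification time changes. The sketch below assumes that an S3 upload surfaces on the mount as a changed mtime (worth verifying in your environment); &lt;code&gt;load_model&lt;/code&gt; and the module-level cache are hypothetical names, and a temporary directory stands in for the mount.&lt;/p&gt;

```python
import os
import tempfile

# Module-level cache, reused across invocations of a warm execution environment.
_cache = {"mtime": None, "data": None}


def load_model(path):
    """Return the file contents, re-reading only when the mtime changes."""
    mtime = os.stat(path).st_mtime_ns
    if _cache["mtime"] != mtime:
        with open(path, "rb") as handle:
            _cache["data"] = handle.read()
        _cache["mtime"] = mtime
    return _cache["data"]


# Demo on a temporary directory standing in for /mnt/s3files/.
with tempfile.TemporaryDirectory() as mount:
    model_path = os.path.join(mount, "model.bin")
    with open(model_path, "wb") as handle:
        handle.write(b"v1")
    first = load_model(model_path)

    # Simulate a model update, then bump the mtime explicitly so the demo
    # does not depend on file system timestamp granularity.
    with open(model_path, "wb") as handle:
        handle.write(b"v2")
    stamp = os.stat(model_path).st_mtime_ns + 1_000_000_000
    os.utime(model_path, ns=(stamp, stamp))
    second = load_model(model_path)

print(first, second)
```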

&lt;h3&gt;
  
  
  3. AI Agent Access to S3 Data
&lt;/h3&gt;

&lt;p&gt;Coding agents like Claude Code, Codex, Kiro, and Cursor use file system operations as their primary data access method: &lt;code&gt;ls&lt;/code&gt; to list files, &lt;code&gt;cat&lt;/code&gt; to read, an editor to modify and save. It's the Unix toolchain.&lt;/p&gt;

&lt;p&gt;Of course, agents can access S3 through other means: running AWS CLI commands, calling S3 APIs via MCP servers or Skills/Powers, or generating boto3 code. But all of these are indirect compared to file operations and add reasoning steps. To search S3 logs, a file system lets you write &lt;code&gt;grep -r "ERROR" /mnt/s3files/logs/&lt;/code&gt; in one line, while the S3 API requires listing objects, downloading individually, and searching locally.&lt;/p&gt;

&lt;p&gt;With S3 Files mounting the bucket, this indirection disappears. To the agent, S3 data is just another directory under &lt;code&gt;/mnt/s3files/&lt;/code&gt;.&lt;/p&gt;
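
&lt;p&gt;The same holds when an agent writes code instead of shell commands: plain file APIs are enough. Below is a rough Python equivalent of the &lt;code&gt;grep -r&lt;/code&gt; search, run against a temporary directory that stands in for the mounted log prefix; &lt;code&gt;find_lines&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
import os
import tempfile


def find_lines(root, needle):
    """Yield (path, line_number, line) for every line containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.rstrip("\n")


# Demo against a temporary directory standing in for /mnt/s3files/logs/.
with tempfile.TemporaryDirectory() as logs:
    with open(os.path.join(logs, "app.log"), "w", encoding="utf-8") as f:
        f.write("INFO start\nERROR disk full\nINFO done\n")
    hits = [(os.path.basename(p), n, text)
            for p, n, text in find_lines(logs, "ERROR")]

print(hits)
```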

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4r8yd6ripkk8murb0d61.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4r8yd6ripkk8murb0d61.png" alt=" " width="489" height="1662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Warfield's post describes AWS engineering teams using Kiro and Claude Code hitting the problem of agent context windows compacting and losing session state. With S3 Files, agents write investigation notes and task summaries to shared directories, and other agents read them. When sessions end, state persists on the file system for the next session.&lt;/p&gt;

&lt;p&gt;File locking (flock) supports mutual exclusion across agents and processes. However, S3 API access bypasses file locks — if you write from both the file system and S3 API simultaneously, locking won't protect you.&lt;/p&gt;
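&lt;p&gt;A minimal sketch of what cooperative locking could look like for shared agent notes, assuming every writer goes through the file system and calls &lt;code&gt;flock&lt;/code&gt; itself (the lock is advisory). The helper name &lt;code&gt;append_note&lt;/code&gt; and the file name are hypothetical, and the demo runs against a local temporary directory rather than a real mount.&lt;/p&gt;

```python
import fcntl
import os
import tempfile


def append_note(path, text):
    """Append one line to path while holding an exclusive advisory lock."""
    with open(path, "a", encoding="utf-8") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            handle.write(text.rstrip("\n") + "\n")
            handle.flush()
            os.fsync(handle.fileno())
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


if __name__ == "__main__":
    # Demo on a local temp file; on a real mount this would be a shared path.
    demo_path = os.path.join(tempfile.mkdtemp(), "notes.md")
    append_note(demo_path, "agent-a: checked the ingest logs")
    append_note(demo_path, "agent-b: reproduced the failure")
    with open(demo_path, encoding="utf-8") as handle:
        print(handle.read(), end="")
```

&lt;p&gt;A process that skips the &lt;code&gt;flock&lt;/code&gt; call, or a writer that goes through the S3 API, is not held back by this lock.&lt;/p&gt;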

&lt;h2&gt;
  
  
  Constraints and Decision Criteria
&lt;/h2&gt;

&lt;p&gt;S3 Files isn't a universal solution. Key constraints to evaluate:&lt;/p&gt;

&lt;h3&gt;
  
  
  Commit Interval: ~60 Seconds (By Design)
&lt;/h3&gt;

&lt;p&gt;Writes take ~60 seconds to appear as S3 objects. If job B reads via S3 API immediately after job A writes via the file system, job B may see stale data.&lt;/p&gt;

&lt;p&gt;This isn't just a limitation — it's a cost optimization. Per the official documentation, consecutive writes to the same file are aggregated within the 60-second window and committed as a single S3 PUT, reducing S3 request costs and versioning storage overhead.&lt;/p&gt;

&lt;p&gt;Sync throughput per the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-performance.html" rel="noopener noreferrer"&gt;official performance specification&lt;/a&gt;: S3 → file system at up to 2,400 objects/s and 700 MB/s; file system → S3 at up to 800 files/s and 2,700 MB/s.&lt;/p&gt;

&lt;p&gt;No "commit now" API exists at GA. Warfield mentions this as an area for future improvement. Workarounds: pass data between jobs via the file system (same mount point), or trigger downstream jobs via S3 event notifications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rename Costs
&lt;/h3&gt;

&lt;p&gt;S3 has no native rename. File system renames are implemented as copy + delete internally. Per the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-performance.html" rel="noopener noreferrer"&gt;official performance specification&lt;/a&gt;, renaming a directory of 100,000 files completes instantly on the file system, but takes several minutes to reflect in the S3 bucket. During that window, the file system shows the new path while S3 still has the old keys. S3-side request costs (100K CopyObject + 100K DeleteObject) are also non-trivial.&lt;/p&gt;

&lt;h3&gt;
  
  
  Buckets Exceeding 50 Million Objects
&lt;/h3&gt;

&lt;p&gt;Warfield's post warns about mounting buckets with more than 50 million objects (this figure doesn't currently appear on the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-quotas.html" rel="noopener noreferrer"&gt;official quotas page&lt;/a&gt;). Consider mounting a specific prefix to narrow the scope.&lt;/p&gt;

&lt;h3&gt;
  
  
  VPC Requirement
&lt;/h3&gt;

&lt;p&gt;Mount targets live inside a VPC. Lambda functions and EC2 instances must connect from subnets in the same AZ as the mount target. Per the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files.html" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;, supported compute services are EC2, Lambda, EKS, and ECS. On-premises or cross-cloud resources are not in the supported list.&lt;/p&gt;

&lt;h3&gt;
  
  
  Namespace Incompatibilities
&lt;/h3&gt;

&lt;p&gt;Some S3 object keys can't be represented as POSIX filenames: keys ending with &lt;code&gt;/&lt;/code&gt;, keys containing POSIX-invalid characters, or path components exceeding 255 bytes. See the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-quotas.html" rel="noopener noreferrer"&gt;official quotas page&lt;/a&gt; for the full list.&lt;/p&gt;

&lt;p&gt;This is intentional. Per Warfield's post, the team chose to pass through the vast majority of keys that work in both worlds and emit events for incompatible ones rather than silently converting them.&lt;/p&gt;
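&lt;p&gt;Before mounting an existing bucket, it can help to scan its keys for the patterns named above. The following is a rough approximation, not the official rule set: the function name &lt;code&gt;posix_incompatibility&lt;/code&gt; is hypothetical, and treating NUL bytes and empty, &lt;code&gt;.&lt;/code&gt;, or &lt;code&gt;..&lt;/code&gt; components as invalid is an assumption. The quotas page remains the authoritative list.&lt;/p&gt;

```python
MAX_COMPONENT_BYTES = 255  # per-component limit named in the text


def posix_incompatibility(key):
    """Return a reason the key may not map to a POSIX path, or None."""
    if key.endswith("/"):
        return "key ends with '/'"
    if "\x00" in key:
        # Assumption: NUL is the character treated as POSIX-invalid here.
        return "key contains a NUL byte"
    for part in key.split("/"):
        if part in ("", ".", ".."):
            # Assumption: empty, '.' and '..' components cannot be file names.
            return f"path component {part!r} is not a usable file name"
        if len(part.encode("utf-8")) > MAX_COMPONENT_BYTES:
            return "path component exceeds 255 bytes"
    return None


if __name__ == "__main__":
    for key in ["logs/2026/app.log", "logs/", "a//b", "x" * 256]:
        print(repr(key[:20]), posix_incompatibility(key))
```

&lt;p&gt;The length check counts UTF-8 bytes rather than characters, so a component of 128 two-byte characters is already over the limit.&lt;/p&gt;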

&lt;h3&gt;
  
  
  Versioning Required
&lt;/h3&gt;

&lt;p&gt;S3 Files &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-prereq-policies.html" rel="noopener noreferrer"&gt;requires S3 bucket versioning&lt;/a&gt;. For existing buckets, evaluate the storage cost impact (old versions are retained) and compatibility with existing lifecycle rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision Flowchart
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp33nbpcbl3hn946f2xrj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp33nbpcbl3hn946f2xrj.png" alt=" " width="800" height="1066"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Review in Your Existing Architecture
&lt;/h2&gt;

&lt;p&gt;First, inventory your pipelines for "copy from S3, process, write back to S3" patterns. Batch processing and ML preprocessing pipelines with EBS/EFS staging layers are prime candidates for replacement.&lt;/p&gt;

&lt;p&gt;Second, consider how storage choices change for new projects. "Put it in S3 now, access it as a file system later" is now a viable strategy, reducing the urgency of early "object vs. file system" decisions.&lt;/p&gt;

&lt;p&gt;Third, audit Lambda functions that explicitly download to / upload from &lt;code&gt;/tmp&lt;/code&gt;. Functions handling large reference data or sharing data across invocations are worth evaluating.&lt;/p&gt;

&lt;p&gt;S3 started 20 years ago as an object store. With Tables, Vectors, and now Files, it has expanded how data can be accessed. S3 Files removes one more architectural constraint imposed by storage choices. It won't apply to every workload, but for organizations where "the data is in S3 but the tools need a file system" — and that's a lot of organizations — the impact is significant.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>s3</category>
      <category>lambda</category>
      <category>architecture</category>
    </item>
    <item>
      <title>AI-Powered Test Case Review with MagicPod MCP Server and Claude</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Mon, 30 Mar 2026 03:55:38 +0000</pubDate>
      <link>https://dev.to/aws-builders/ai-powered-test-case-review-with-magicpod-mcp-server-and-claude-4123</link>
      <guid>https://dev.to/aws-builders/ai-powered-test-case-review-with-magicpod-mcp-server-and-claude-4123</guid>
      <description>&lt;h2&gt;
  
  
  Test Automation Has Advanced. Is Review Also Part of the Loop?
&lt;/h2&gt;

&lt;p&gt;More and more teams are automating E2E tests with no-code tools like MagicPod, and the barrier to creating test cases has clearly come down. But who ensures the quality of those test cases, and how?&lt;/p&gt;

&lt;p&gt;Product source code has a place where code management and review are integrated, such as GitHub Pull Requests. No-code test automation tools often lack an equivalent, so establishing a review process is left to each team's initiative and ingenuity. Often the creator of a test case remains the only person who understands it in depth. It is worth asking whether this is turning test cases into knowledge silos that depend on individuals.&lt;/p&gt;

&lt;p&gt;In this article, I introduce a mechanism for AI review of test cases that combines the official MCP server provided by MagicPod with Claude. I have written it up in a reproducible form, from setup to the actual review.&lt;/p&gt;

&lt;h3&gt;
  
  
  Target Audience of This Article
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Teams operating test cases in MagicPod but without a functioning review process.&lt;/li&gt;
&lt;li&gt;QA engineers, developers, and PdMs who feel there are challenges with test quality.&lt;/li&gt;
&lt;li&gt;Those interested in MCP servers and AI agents but lacking a concrete picture of how to use them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To ensure that even those unfamiliar with terminal operations can reproduce this, I have also included GUI-based procedures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview: What to Do and How
&lt;/h2&gt;

&lt;p&gt;The overall structure of the mechanism is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Connect the MagicPod MCP server to Claude Desktop.&lt;/li&gt;
&lt;li&gt;Prepare a Skill file that defines the review criteria.&lt;/li&gt;
&lt;li&gt;Simply instruct "Review this," and the AI will fetch, analyze, and output a report of the test cases.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;a href="https://support.magic-pod.com/hc/en-us/articles/46186888063769-MagicPod-MCP-Server" rel="noopener noreferrer"&gt;MagicPod MCP Server&lt;/a&gt; is an official module for operating various MagicPod functions from AI agents (Claude, Cursor, Cline, etc.). It is &lt;a href="https://github.com/Magic-Pod/magicpod-mcp-server" rel="noopener noreferrer"&gt;published on GitHub as MIT-licensed OSS&lt;/a&gt;, and no additional costs are incurred on the MagicPod side.&lt;/p&gt;

&lt;p&gt;As of March 2026, there are mainly four things you can do via the MCP server:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Execute tests via Web API.&lt;/li&gt;
&lt;li&gt;Retrieve test execution information (statistical information, etc.).&lt;/li&gt;
&lt;li&gt;Refer to help pages to suggest usage or troubleshooting.&lt;/li&gt;
&lt;li&gt;Create and edit test cases in natural language via Autopilot.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI review described in this article relies only on the retrieval capability among these: it reads test cases and does not change existing tests. Keep in mind that test creation, editing, and execution are supported only in cloud environments, not in local PC environments. All you need is a MagicPod contract and a Claude subscription.&lt;/p&gt;

&lt;p&gt;Furthermore, the official help also introduces &lt;a href="https://support.magic-pod.com/hc/en-us/articles/52768069495321" rel="noopener noreferrer"&gt;how to identify unstable locators&lt;/a&gt; using the MCP server. This content is also included in the review criteria of this article, showing that MagicPod also envisions improving test case quality via MCP.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important Note:&lt;/strong&gt; The &lt;a href="https://support.magic-pod.com/hc/en-us/articles/46186888063769-MagicPod-MCP-Server" rel="noopener noreferrer"&gt;official help&lt;/a&gt; explicitly states that user information is not used for machine learning via the MCP server, and MagicPod does not retain the entered prompt information. The Web API token is written in the MCP server's config file but is designed not to be passed to the AI agent.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Why Explain with Claude Desktop?
&lt;/h3&gt;

&lt;p&gt;This article explains the procedure using Claude Desktop because it is easy to adopt even for those not used to the CLI or terminal. The only step outside the chat UI is a single edit to a configuration file. Setup is done through the GUI, and the MCP connection status can be checked on screen, which makes it a good first step for anyone new to MCP servers.&lt;/p&gt;

&lt;h3&gt;
  
  
  If You Use Other Tools
&lt;/h3&gt;

&lt;p&gt;Since the MagicPod MCP server complies with MCP (Model Context Protocol), it can be used from any AI tool that supports MCP. The review criteria and prompts in this article are designed to be general-purpose and tool-independent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you use Cursor&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cursor has a mechanism called Project Rules, and you can place the Skill file from this article directly as a Rule. An article by Hacobu ("&lt;a href="https://zenn.dev/hacobu/articles/dd7715c7cd38c2" rel="noopener noreferrer"&gt;Trying to create an AI review mechanism with Cursor × MagicPod MCP Server&lt;/a&gt;") is a useful prior example for Cursor users. Refer to Cursor's official documentation for how to set up the MCP server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you use Claude Code&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude Code also supports MCP servers. You can get equivalent behavior by writing the contents of the Skill file in &lt;code&gt;CLAUDE.md&lt;/code&gt; and specifying the MCP server with the &lt;code&gt;--mcp-config&lt;/code&gt; option. This may be easier for those comfortable with the terminal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you use ChatGPT&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Since September 2025, ChatGPT has supported MCP servers in Developer Mode. However, it supports only remote servers (SSE / Streamable HTTP), not local execution (stdio). Because the MagicPod MCP server runs locally over stdio, launched via &lt;code&gt;npx&lt;/code&gt;, connecting to it from ChatGPT requires a tunnel created with something like ngrok. Claude Desktop and Cursor are simpler to set up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Other MCP-compatible tools (Cline, Windsurf, etc.)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you add the MagicPod MCP server to the MCP server configuration file (equivalent to &lt;code&gt;claude_desktop_config.json&lt;/code&gt;), it will work with any tool. The review criteria prompts are plain text, so there is no need to convert them to tool-specific formats.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Although the steps in this article assume Claude Desktop, the design of the review criteria and the content of the Skill file are the essence of this mechanism. Choose the tool based on your preference.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Setup (5 minutes)
&lt;/h2&gt;

&lt;p&gt;The following steps are for the first time only. Once the setup is complete, the AI will autonomously execute reviews according to the Skill file.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A MagicPod account with access to the project to be reviewed (a &lt;a href="https://app.magicpod.com/accounts/trial-request/" rel="noopener noreferrer"&gt;free trial&lt;/a&gt; also works).&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://claude.ai/download" rel="noopener noreferrer"&gt;Claude Desktop&lt;/a&gt; installed (Mac / Windows).&lt;/li&gt;
&lt;li&gt;A Claude Pro subscription or higher.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1: Get MagicPod API Token
&lt;/h3&gt;

&lt;p&gt;Access &lt;a href="https://app.magicpod.com/accounts/api-token/" rel="noopener noreferrer"&gt;https://app.magicpod.com/accounts/api-token/&lt;/a&gt;, issue a token, and copy it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Install Node.js (if not already installed)
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;npx&lt;/code&gt; is required to run the MCP server. Please install the LTS version from &lt;a href="https://nodejs.org/" rel="noopener noreferrer"&gt;https://nodejs.org/&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Edit Claude Desktop Configuration File
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Open Claude Desktop.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Claude → Settings...&lt;/strong&gt; from the menu bar.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Developer&lt;/strong&gt; on the left menu → &lt;strong&gt;Edit Config&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The folder containing the configuration file will open in Finder (or Explorer on Windows). Open &lt;code&gt;claude_desktop_config.json&lt;/code&gt; with a text editor.&lt;/li&gt;
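&lt;li&gt;Replace the content with the following and save. If the file already defines other MCP servers, add the &lt;code&gt;magicpod-mcp-server&lt;/code&gt; entry under the existing &lt;code&gt;mcpServers&lt;/code&gt; key instead of overwriting the file:
&lt;/li&gt;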
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"magicpod-mcp-server"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"magicpod-mcp-server"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--api-token=PASTE_YOUR_TOKEN_FROM_STEP_1_HERE"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;PASTE_YOUR_TOKEN_FROM_STEP_1_HERE&lt;/code&gt; with the API token string you copied in Step 1. The token should directly follow &lt;code&gt;--api-token=&lt;/code&gt; (e.g., &lt;code&gt;--api-token=abc123def456&lt;/code&gt;).&lt;/p&gt;
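&lt;p&gt;If the configuration file already lists other MCP servers, overwriting it wholesale would drop them. As one possible approach, the sketch below merges the entry into whatever is already there. The function name &lt;code&gt;add_magicpod_server&lt;/code&gt; is hypothetical; the entry it writes mirrors the JSON above, and the path argument is the &lt;code&gt;claude_desktop_config.json&lt;/code&gt; file opened in step 4.&lt;/p&gt;

```python
import json
from pathlib import Path


def add_magicpod_server(config_path, api_token):
    """Add the magicpod-mcp-server entry while keeping existing servers."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # A missing or empty file is treated as an empty configuration.
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["magicpod-mcp-server"] = {
        "command": "npx",
        "args": ["-y", "magicpod-mcp-server", f"--api-token={api_token}"],
    }
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

&lt;p&gt;Call it with the path to your own configuration file and the token from Step 1, then continue with the restart in Step 4. Because the token ends up in the file in plain text, treat the file as a secret.&lt;/p&gt;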

&lt;h3&gt;
  
  
  Step 4: Restart and Confirm Connection
&lt;/h3&gt;

&lt;p&gt;Completely quit Claude Desktop and restart it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; To check the MCP connection after restarting, open &lt;strong&gt;+&lt;/strong&gt; → &lt;strong&gt;Connectors&lt;/strong&gt; below the chat input field. If &lt;code&gt;magicpod-mcp-server&lt;/code&gt; is listed, the connection is working. If it doesn't appear in an already open chat, try opening a new chat.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As a functional check, type the following into the chat:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Are you connected to MagicPod? Please tell me the list of available organizations and projects.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw5d6s6ngn5jg21qc2by7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw5d6s6ngn5jg21qc2by7.png" alt=" " width="800" height="511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the organization name and project list are returned, the setup is complete.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing the Skill File
&lt;/h2&gt;

&lt;p&gt;Once setup is finished, the next thing to do is place the Skill file.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is a Skill File?
&lt;/h3&gt;

&lt;p&gt;A Skill file is a Markdown file that teaches an AI agent how to perform a specific task. Based on the &lt;a href="https://code.claude.com/docs/en/skills" rel="noopener noreferrer"&gt;Agent Skills Open Standard&lt;/a&gt;, the same &lt;code&gt;SKILL.md&lt;/code&gt; format can be used across multiple tools like Claude Code, Cursor, Gemini CLI, and Codex CLI.&lt;/p&gt;

&lt;p&gt;The Skill file provided in this article combines the following into a single file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Target organization and project names for review.&lt;/li&gt;
&lt;li&gt;Review criteria (what to check, importance levels).&lt;/li&gt;
&lt;li&gt;Output format (findings for each test case + summary).&lt;/li&gt;
&lt;li&gt;Report generation (sharing formats for Slack/Confluence).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Placement Method
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Placement Location&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Desktop / claude.ai&lt;/td&gt;
&lt;td&gt;Create a project and add the file as Knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;Place at &lt;code&gt;.claude/skills/magicpod-review/SKILL.md&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;Place in &lt;code&gt;.cursor/rules&lt;/code&gt; (Loaded as Project Rules)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other MCP Tools&lt;/td&gt;
&lt;td&gt;Add to each tool's rule/knowledge location&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;Copy the full text of the Skill file at the end of this article.&lt;/li&gt;
&lt;li&gt;Save it with the filename &lt;code&gt;magicpod-review.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Rewrite &lt;code&gt;{Organization Name}&lt;/code&gt; and &lt;code&gt;{Project Name}&lt;/code&gt; in the file.&lt;/li&gt;
&lt;li&gt;Add it to the placement location for your tool according to the table above.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you have multiple projects, rewrite the "Basic Settings" section of the Skill file as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Basic Settings&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Target Organization: MyOrganization
&lt;span class="p"&gt;-&lt;/span&gt; Target Projects (Review all projects in order if none specified):
&lt;span class="p"&gt;  -&lt;/span&gt; ProjectA (Browser)
&lt;span class="p"&gt;  -&lt;/span&gt; ProjectB (Mobile App)
&lt;span class="p"&gt;  -&lt;/span&gt; ProjectC (Browser)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Claude Desktop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new project from "Projects" (e.g., "MagicPod Review").&lt;/li&gt;
&lt;li&gt;Select "Files" and upload &lt;code&gt;magicpod-review.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Open a new chat within that project.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Contents of the Skill File: Designing Review Perspectives
&lt;/h2&gt;

&lt;p&gt;The core of a Skill file is the definition of review perspectives. Here, the 10 perspectives from &lt;a href="https://magicpod.com/blog/testcase-review-idea/" rel="noopener noreferrer"&gt;the MagicPod official blog post&lt;/a&gt;, "10 Ideas for Test Automation Review Perspectives," are divided into those that are easy for AI to detect and those that require human judgment.&lt;/p&gt;

&lt;p&gt;This classification largely determines the accuracy of the Skill file. An AI will still produce output if you simply ask it to check everything, but the results mix irrelevant comments with useful suggestions, and filtering them takes time. Deciding in advance what to leave to the AI and what a human should review makes the output more reliable, to the point where review results can be shared with the team nearly as-is.&lt;/p&gt;

&lt;h3&gt;
  
  
  AI-Detectable Criteria
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;What to detect&lt;/th&gt;
&lt;th&gt;Importance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Missing Assertions&lt;/td&gt;
&lt;td&gt;Test cases with zero verification commands&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fixed Wait Times&lt;/td&gt;
&lt;td&gt;Use of "Wait X seconds" type commands&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unstable Locators&lt;/td&gt;
&lt;td&gt;XPath index dependency (e.g., &lt;code&gt;div[2]/button[1]&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Naming Issues&lt;/td&gt;
&lt;td&gt;Empty test case names or description fields&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mechanical Names&lt;/td&gt;
&lt;td&gt;Auto-generated names like "Button Element (1)"&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Abandoned Steps&lt;/td&gt;
&lt;td&gt;Steps left as "Temporarily Disabled"&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long Tests&lt;/td&gt;
&lt;td&gt;Over 200 steps&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unused Shared Steps&lt;/td&gt;
&lt;td&gt;Repeated patterns not using shared steps&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
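&lt;p&gt;In this article the detection itself is done by the AI following the Skill file, not by hand-written rules. Still, two of the criteria above are simple enough to illustrate what "easy to detect" means. The sketch below is only that illustration: the step strings are invented examples, the real format of step text returned by MagicPod may differ, and the names &lt;code&gt;scan_steps&lt;/code&gt;, &lt;code&gt;FIXED_WAIT&lt;/code&gt;, and &lt;code&gt;XPATH_INDEX&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import re

# "Wait 3 seconds", "wait 0.5 second", and similar fixed waits.
FIXED_WAIT = re.compile(r"\bWait\s+\d+(?:\.\d+)?\s+seconds?\b", re.IGNORECASE)
# Positional XPath segments such as div[2] or button[1].
XPATH_INDEX = re.compile(r"\w+\[\d+\]")


def scan_steps(steps):
    """Return (step_text, finding) pairs for steps that match a heuristic."""
    findings = []
    for step in steps:
        if FIXED_WAIT.search(step):
            findings.append((step, "fixed wait time"))
        if XPATH_INDEX.search(step):
            findings.append((step, "index-dependent XPath"))
    return findings


if __name__ == "__main__":
    sample = [
        "Click xpath=//div[2]/button[1]",  # invented example step
        "Wait 3 seconds",
        "Assert that the page title is Dashboard",
    ]
    for step, finding in scan_steps(sample):
        print(f"{finding}: {step}")
```

&lt;p&gt;Findings are keyed by step text rather than by line number, since the step content is what is available to point at.&lt;/p&gt;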

&lt;h3&gt;
  
  
  Criteria Requiring Human Judgment (AI Only Provides Information)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Information provided by AI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Validity of Assertions&lt;/td&gt;
&lt;td&gt;List of assertions in use (Team decides if appropriate)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependencies&lt;/td&gt;
&lt;td&gt;List of test cases not using session restart&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; MagicPod Web API's &lt;code&gt;human_readable_steps&lt;/code&gt; does not include step line numbers and cannot distinguish whether a step is disabled. Identifying the specific location of a finding must be based on step content.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Are These 10 Perspectives Sufficient?
&lt;/h3&gt;

&lt;p&gt;In short, they work well as a first-pass screening in AI reviews, but they are not a silver bullet. Based on the results of reviewing over 60 test cases across multiple projects, here are the strengths and limitations of these 10 perspectives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Well-covered areas: Structural issues&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Among the 10 perspectives, issues such as missing assertions (Perspective 1), naming deficiencies (Perspective 4), overly long tests (Perspective 7), and underutilized shared steps (Perspective 8) can be identified from the structure of the test case alone. AI excels at this kind of pattern matching and detected these with high accuracy in actual reviews. Empty description fields in particular turned up in many test cases, and that single perspective alone made the review worthwhile.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blind spots: Validity of test design&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On the other hand, consistency with test design documents and the validity of test data, which the official blog mentions as additional perspectives, are not among these 10. They are hard to judge from a MagicPod review alone because they require cross-referencing test design documents or product specifications, so for now they are beyond what AI review via MCP handles well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Findings outside the 10 perspectives&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this review, points were raised that were not included in the original 10 perspectives. These were all observations regarding test cases created to verify MagicPod's behavior, but it is noteworthy that the AI picked up on such remnants.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Broken steps left incomplete (UI element: not configured).&lt;/li&gt;
&lt;li&gt;Test data containing passwords in plain text.&lt;/li&gt;
&lt;li&gt;A mixture of tests for checking MagicPod behavior and production regression tests within the same project.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are cases where the AI picked up additional issues from context without being bound by the list of perspectives in the prompt. In other words, even with the 10 perspectives specified, the AI has room to go beyond them on its own. For practical operation of AI reviews, I find it important not to restrict the perspectives too strictly and to leave the AI some latitude.&lt;/p&gt;

&lt;p&gt;In summary, the 10 perspectives are well-balanced for covering test case implementation quality and can be recommended as a baseline for AI reviews. Reviews that require test design validity or product-specific domain knowledge remain the role of humans.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Use
&lt;/h2&gt;

&lt;p&gt;Once the Skill file is in place, usage is simple.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basics: Running a Review
&lt;/h3&gt;

&lt;p&gt;Just type one of the following into the chat:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Review this
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Check the test cases
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbgqe6bwx3tkrep88j034.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbgqe6bwx3tkrep88j034.png" alt=" " width="800" height="847"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since the organization name, project name, review criteria, and output format are all defined in the Skill file, that one phrase is enough for the AI to autonomously perform the following steps: "Retrieve test case list -&amp;gt; Analyze each step -&amp;gt; Check against the criteria -&amp;gt; Output according to the format."&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Please execute this from the chat within the project where the Skill file is located. If you type "Review this" in a regular chat (outside of the project), the AI will ask "What would you like me to review?" because the Skill file will not be loaded.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Sharing the Report
&lt;/h3&gt;

&lt;p&gt;Once the review results are available, you can also generate reports for sharing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Summarize for Slack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create a report in a format that can be pasted into Confluence
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Skill file includes templates for both Slack mrkdwn and standard Markdown formats, ensuring the output matches your intended destination.&lt;/p&gt;

&lt;p&gt;In environments where the Slack MCP connector is connected, entering "Summarize for Slack" will present you with options such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Post directly to a channel (posts immediately if a channel name is specified)&lt;/li&gt;
&lt;li&gt;Save as a draft (for when you want to review the content before posting)&lt;/li&gt;
&lt;li&gt;Output only the mrkdwn text (for manual copying and pasting)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the Confluence or Jira MCP connectors are connected, you can also create Confluence pages or Jira tickets directly from the chat. Even in environments without active MCP connectors, text output for copying and pasting is always available.&lt;/p&gt;

&lt;p&gt;The following is an example of a report output for Slack (project names and other details have been replaced with placeholders).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🔍 MagicPod Test Case AI Review Results
Date: 2026/03/28 | Target: SampleOrg (4 projects / Over 60 cases)

🔴 High: 3 cases 🟡 Medium: Many 🟢 Low: 0 cases
━━━━━━━━━━━━━━━━━━━━━━━
🚨 Top 3 Priority Improvement Actions

1. Description Field Entries - Medium
The description field is empty in many of the target test cases.

2. Organization of Behavior Verification Tests - Medium
Tests used to verify behavior during initial setup still remain in the old folder.

3. Replacement of Fixed Wait Times - Medium
Wait 3 seconds and Wait 5 seconds are used extensively throughout ProjectD.
━━━━━━━━━━━━━━━━━━━━━━━
1. ProjectA_Web (22 cases)
- Description field is empty: Almost all cases
- Mechanical UI element names: Area(8), Area(119), etc.
- Regression tests have substantial assertions and appropriate comment separation.

2. ProjectB_Web (23 cases)
- Description field is empty: All cases
- Shared steps not utilized: Screen verification patterns are identical in 10 or more cases.
- Naming conventions are well-consistent across all test cases.

(Details for the other 2 projects are in the thread below 👇)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The report features a three-tier structure: Priority Improvement Actions, followed by Project Summaries, and then Details in Threads. This format allows the team to understand exactly what to do first at a glance. By also including positive points (💡), I have ensured the report provides encouragement rather than just pointing out issues.&lt;/p&gt;
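
&lt;p&gt;The split between the main post and the thread can be sketched roughly as follows. This is only an illustration of the structure, not how the Skill file or the Slack connector implements it. The function name &lt;code&gt;split_for_slack&lt;/code&gt; and the &lt;code&gt;inline_limit&lt;/code&gt; parameter are made up for this example.&lt;/p&gt;

```python
def split_for_slack(header, actions, project_summaries, inline_limit=2):
    """Split a report into a main Slack post and a list of thread replies.

    inline_limit is a hypothetical knob: how many project summaries
    stay in the main post before the rest move to the thread.
    """
    main_parts = [header, *actions, *project_summaries[:inline_limit]]
    thread = project_summaries[inline_limit:]
    if thread:
        # Tell readers that more detail follows in the thread.
        main_parts.append(
            f"(Details for the other {len(thread)} projects are in the thread below)"
        )
    return "\n".join(main_parts), thread


main_post, thread_replies = split_for_slack(
    "MagicPod Test Case AI Review Results",
    ["1. Description Field Entries - Medium"],
    ["ProjectA_Web", "ProjectB_Web", "ProjectC", "ProjectD"],
)
print(main_post)
print(thread_replies)
```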

&lt;h3&gt;
  
  
  Customization Examples
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Review while excluding the old folder.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Review only ProjectB_Web.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Show the differences from the last time.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Skill file defines rules to handle these types of instructions, so it works simply by adding conditions in natural language.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results of the Actual Implementation
&lt;/h2&gt;

&lt;p&gt;We executed AI reviews on more than 60 test cases across 4 projects. The following are the results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Most Common Finding: Numerous Empty Description Fields
&lt;/h3&gt;

&lt;p&gt;A large number of the target test cases had empty description fields. This is likely a common situation across many teams. When only the test case name is provided, the intent is not conveyed effectively, which leads to lower accuracy in reviews, handovers, and AI utilization.&lt;/p&gt;

&lt;p&gt;While MagicPod features an AI summarization function, it summarizes the actions of the steps. The underlying purpose, such as what specifically needs to be verified in the test, must be written by a human.&lt;/p&gt;
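
&lt;p&gt;If you want a number rather than "many," the share of filled-in descriptions can be computed per project. The sketch below assumes each test case has already been exported as a dictionary with &lt;code&gt;project&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt; keys. That shape is an assumption for illustration, not the MagicPod API response format, and the sample data is invented.&lt;/p&gt;

```python
from collections import defaultdict


def description_completion_rate(test_cases):
    """Return {project: share of test cases with a non-empty description}."""
    totals = defaultdict(int)
    filled = defaultdict(int)
    for case in test_cases:
        totals[case["project"]] += 1
        # Whitespace-only or missing descriptions count as empty.
        if (case.get("description") or "").strip():
            filled[case["project"]] += 1
    return {project: filled[project] / totals[project] for project in totals}


# Invented sample data, for illustration only.
sample = [
    {"project": "ProjectA_Web", "description": "Verifies login with a valid user"},
    {"project": "ProjectA_Web", "description": ""},
    {"project": "ProjectB_Web", "description": None},
]
print(description_completion_rate(sample))
```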

&lt;h3&gt;
  
  
  Typical Findings in Learning Test Cases
&lt;/h3&gt;

&lt;p&gt;Test cases created at the beginning of projects to verify MagicPod's behavior were still present. Although they were stored in an old folder separate from production regression tests, the AI review accurately detected issues with them.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A test named &lt;code&gt;Test 1&lt;/code&gt;. The purpose was unclear, and the description field was empty.&lt;/li&gt;
&lt;li&gt;Missing assertions. The test only performed login operations without a step to verify that the login was successful. It ran through the operations without guaranteeing anything.&lt;/li&gt;
&lt;li&gt;UI element names like &lt;code&gt;Area(119)&lt;/code&gt;. Mechanical names automatically generated by MagicPod remained, making it impossible for a third party to identify what the element was.&lt;/li&gt;
&lt;li&gt;Neglected unconfigured steps. In one test case, a broken step labeled &lt;code&gt;UI element: not configured&lt;/code&gt; was left behind.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these were remnants from when MagicPod's behavior was being tested. In the production regression tests, naming conventions were unified and assertions were properly implemented. Even so, it was very useful that the AI pointed out that these leftover experiments could become noise during bulk execution.&lt;/p&gt;

&lt;p&gt;These are issues that anyone would notice during a review, but if no review is conducted they stay neglected indefinitely. The value of an AI review lies in mechanically picking up problems that are obvious on inspection but that nobody is inspecting.&lt;/p&gt;
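
&lt;p&gt;Mechanical names are a good example of something a simple pattern can catch. The sketch below uses a heuristic regular expression of my own: a label followed by a number in parentheses. It is an assumption about what auto-generated names look like, based only on the examples in this article, and it is not an official MagicPod rule.&lt;/p&gt;

```python
import re

# Heuristic (assumption): letters and spaces followed by "(number)",
# e.g. "Area(119)" or "Button element (1)".
AUTO_NAME = re.compile(r"^[A-Za-z ]+\(\d+\)$")


def is_mechanical_name(name):
    """Return True if a UI element name looks auto-generated."""
    return bool(AUTO_NAME.match(name.strip()))


for candidate in ["Area(119)", "Button element (1)", "Login button"]:
    print(candidate, is_mechanical_name(candidate))
```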

&lt;h3&gt;
  
  
  Discoveries Outside the Defined Perspectives
&lt;/h3&gt;

&lt;p&gt;Furthermore, some findings emerged that were not included in the ten initial perspectives.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frequent use of fixed wait times. In one project, wait 3 seconds and wait 5 seconds were scattered across multiple test cases.&lt;/li&gt;
&lt;li&gt;Visualization of quality differences between projects. By reviewing across four projects, it became immediately clear that the rate of description completion varied significantly by project.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While following the list of perspectives, the AI sometimes picks up additional issues from the context. This behavior is similar to that of a human reviewer and can be seen as an advantage of not strictly limiting the prompts.&lt;/p&gt;
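
&lt;p&gt;Fixed waits are also easy to find once the steps are available as text. The following is a minimal sketch that assumes each step is a plain string worded like "Wait 3 seconds". The real wording returned by the MagicPod MCP server may differ, so treat the pattern as a starting point.&lt;/p&gt;

```python
import re

# Assumed step wording: "Wait N seconds" (case-insensitive).
FIXED_WAIT = re.compile(r"\bwait\s+(\d+(?:\.\d+)?)\s+seconds?\b", re.IGNORECASE)


def fixed_wait_seconds(steps):
    """Return the fixed wait durations (in seconds) found in step texts."""
    found = []
    for step in steps:
        match = FIXED_WAIT.search(step)
        if match:
            found.append(float(match.group(1)))
    return found


steps = [
    "Click Login button",
    "Wait 3 seconds",
    "Wait until element is displayed",
    "wait 5 seconds",
]
print(fixed_wait_seconds(steps))
```

&lt;p&gt;Condition-based waits such as "Wait until element is displayed" do not match, because the pattern requires a number directly after "wait".&lt;/p&gt;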

&lt;h3&gt;
  
  
  Detecting Positive Aspects
&lt;/h3&gt;

&lt;p&gt;AI reviews were also effective for detecting good practices, not just pointing out flaws.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test cases where sections were appropriately divided using comments (&lt;code&gt;//&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Configurations where shared steps were utilized, keeping the login process DRY (Don't Repeat Yourself).&lt;/li&gt;
&lt;li&gt;Projects where test case naming conventions were unified across all test cases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reports that only contain criticisms can lower team motivation. Including notes on what is well-done increases the overall acceptability of the report.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations and Proper Usage
&lt;/h2&gt;

&lt;p&gt;AI review is not a silver bullet. It is necessary to recognize the following limitations in advance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What AI is good at:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structural issues that can be detected via pattern matching (naming conventions, presence of assertions, number of steps)&lt;/li&gt;
&lt;li&gt;Bulk analysis and comparison across projects&lt;/li&gt;
&lt;li&gt;Consistency in maintaining the same check standards no matter how many times it is executed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What AI is not good at:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Semantic-level judgment, such as whether this assertion is appropriate for the purpose of this test&lt;/li&gt;
&lt;li&gt;Checking consistency with test design documents (when those documents are outside of MagicPod)&lt;/li&gt;
&lt;li&gt;Team-specific context (e.g., there is an unavoidable reason for this fixed-time wait)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, a realistic approach is a two-tier structure: use AI review for self-review or primary screening, and then have a human review only the points that require human judgment. A workflow is becoming widespread where an AI review runs first on a code Pull Request, and humans focus on making decisions after reviewing the summary. The same structure can be applied to test case reviews.&lt;/p&gt;
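
&lt;p&gt;The two-tier structure can be expressed as a small routing step. This sketch follows the split used in the Skill file in the appendix, where Perspectives 1 to 8 are detected automatically and Perspectives 9 and 10 are left to human judgment. The &lt;code&gt;findings&lt;/code&gt; data shape and the function name are hypothetical.&lt;/p&gt;

```python
AUTOMATIC = set(range(1, 9))  # Perspectives 1-8: reported automatically


def triage(findings):
    """Split findings into (automatic, needs_human) lists."""
    automatic, needs_human = [], []
    for finding in findings:
        if finding["perspective"] in AUTOMATIC:
            automatic.append(finding)
        else:
            # Perspectives 9-10, and anything unrecognized, go to a person.
            needs_human.append(finding)
    return automatic, needs_human


findings = [
    {"perspective": 1, "note": "No verification commands exist"},
    {"perspective": 9, "note": "Only a title check is used"},
    {"perspective": 2, "note": "Wait 5 seconds"},
]
automatic, needs_human = triage(findings)
print(len(automatic), "automatic,", len(needs_human), "for human review")
```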

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The reason test case reviews become dependent on specific individuals often stems from the lack of a formal review system. When review criteria are vague and reviewers lack sufficient time, the reviews themselves eventually stop happening.&lt;/p&gt;

&lt;p&gt;The MagicPod MCP server combined with AI offers a solution to this problem, providing consistent primary screening rather than a perfect review. In this experiment, the fact that many test case description fields were empty is something a human reviewer could also point out. What the AI did was simply look at them.&lt;/p&gt;

&lt;p&gt;If you are already using MagicPod, setup takes only five minutes, or about ten minutes including placing the Skill file. The only additional cost is the fee for Claude. Why not start by typing "Review this" in one of your projects and seeing what kind of feedback you get?&lt;/p&gt;




&lt;h2&gt;
  
  
  Appendix: Full Text of the Skill File
&lt;/h2&gt;

&lt;p&gt;The review perspectives, output formats, and report sharing features explained in this article are all included in this single file. Please replace &lt;code&gt;{Organization Name}&lt;/code&gt; and &lt;code&gt;{Project Name}&lt;/code&gt; with your own environment and add the file to the location mentioned earlier.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# MagicPod Test Case AI Review Skill&lt;/span&gt;

&lt;span class="gu"&gt;## About This File&lt;/span&gt;

Placing this file in Claude Desktop's Project Knowledge or CLAUDE.md enables automated review of test cases via the MagicPod MCP server.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; The MagicPod MCP server must be connected.
&lt;span class="p"&gt;-&lt;/span&gt; You must know the target Organization Name and Project Name.

&lt;span class="gu"&gt;## Basic Settings&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Target Organization: {Organization Name}
&lt;span class="p"&gt;-&lt;/span&gt; Target Project: {Project Name}
&lt;span class="p"&gt;-&lt;/span&gt; Review Output Language: English

&lt;span class="gu"&gt;## How to Execute a Review&lt;/span&gt;

Execute the review when the user gives any of the following instructions:
&lt;span class="p"&gt;-&lt;/span&gt; "Review this"
&lt;span class="p"&gt;-&lt;/span&gt; "Check the test cases"
&lt;span class="p"&gt;-&lt;/span&gt; "Please do a MagicPod review"

&lt;span class="gu"&gt;### Execution Steps&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; Retrieve the list of test cases for the target project via the MagicPod MCP server.
&lt;span class="p"&gt;2.&lt;/span&gt; Retrieve step details for each test case.
&lt;span class="p"&gt;3.&lt;/span&gt; Check them according to the review perspectives below.
&lt;span class="p"&gt;4.&lt;/span&gt; Output the results according to the specified format.

&lt;span class="gu"&gt;## Review Perspectives&lt;/span&gt;

Reference: https://magicpod.com/blog/testcase-review-idea/

The following perspectives are categorized into "Automatic Detection" and "Human Judgment Support." 
For automatic detection, always point out any relevant findings. 
For human judgment support, present the information and leave the final decision to the user.

&lt;span class="gu"&gt;### Automatic Detection (Point out if applicable)&lt;/span&gt;

&lt;span class="gu"&gt;#### Perspective 1: Missing Assertions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Detect test cases that do not contain any verification commands (assert-type or "verify"-type).
&lt;span class="p"&gt;-&lt;/span&gt; Tests consisting only of operations do not guarantee anything.
&lt;span class="p"&gt;-&lt;/span&gt; Severity: High

&lt;span class="gu"&gt;#### Perspective 2: Use of Fixed-Time Waits&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Detect locations using "Wait X seconds" or "Fixed-time wait" commands.
&lt;span class="p"&gt;-&lt;/span&gt; Recommend replacing them with condition-based waits (e.g., "Wait until element is displayed").
&lt;span class="p"&gt;-&lt;/span&gt; If their use is unavoidable, a reason should be left in the comments.
&lt;span class="p"&gt;-&lt;/span&gt; Severity: Medium

&lt;span class="gu"&gt;#### Perspective 3: Unstable Locators&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Detect index-dependent locators in XPath (e.g., //div[@id='root']/div[2]/button[1]).
&lt;span class="p"&gt;-&lt;/span&gt; Only individually created locators are subject to check (AI self-healing targets have lower priority).
&lt;span class="p"&gt;-&lt;/span&gt; Severity: Low (Since these are naturally discovered if run daily).

&lt;span class="gu"&gt;#### Perspective 4: Deficient Test Case Names or Descriptions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Test case name is empty or does not follow naming conventions.
&lt;span class="p"&gt;-&lt;/span&gt; Description field is empty (the purpose of the test is not described in text).
&lt;span class="p"&gt;-&lt;/span&gt; Severity: Medium

&lt;span class="gu"&gt;#### Perspective 5: Remaining Mechanical UI Element Names&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Automatically generated names like "Button element (1)" or "Text input (2)" remain as-is.
&lt;span class="p"&gt;-&lt;/span&gt; Meaningful names like "Email address input field" or "Login button" are ideal.
&lt;span class="p"&gt;-&lt;/span&gt; Severity: Medium

&lt;span class="gu"&gt;#### Perspective 6: Neglected Disabled Steps&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Detect steps that remain in a "Temporarily disabled" state.
&lt;span class="p"&gt;-&lt;/span&gt; Unnecessary steps hinder the understanding of the test's intent.
&lt;span class="p"&gt;-&lt;/span&gt; Severity: Low

&lt;span class="gu"&gt;#### Perspective 7: Overly Long Tests&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Detect test cases exceeding 200 steps.
&lt;span class="p"&gt;-&lt;/span&gt; Negatively affects readability, stability, and maintainability.
&lt;span class="p"&gt;-&lt;/span&gt; Severity: High

&lt;span class="gu"&gt;#### Perspective 8: Underutilization of Shared Steps&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Detect suspected cases where the same operation patterns are hard-coded across multiple test cases.
&lt;span class="p"&gt;-&lt;/span&gt; Typical examples include login processes or test data initialization.
&lt;span class="p"&gt;-&lt;/span&gt; Severity: Medium

&lt;span class="gu"&gt;### Human Judgment Support (Information presentation only)&lt;/span&gt;

&lt;span class="gu"&gt;#### Perspective 9: Validity of Assertions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; List the assertions (verification commands) used in each test case.
&lt;span class="p"&gt;-&lt;/span&gt; Leave it to the user to judge whether they are appropriate for the objective.
&lt;span class="p"&gt;-&lt;/span&gt; Identify whether URL verification, image diff, element value verification, visibility check, or title check is used.

&lt;span class="gu"&gt;#### Perspective 10: Dependencies Between Test Cases&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; List test cases that do not use session restart.
&lt;span class="p"&gt;-&lt;/span&gt; If dependencies exist, check if that fact is noted in the description field.

&lt;span class="gu"&gt;### Constraints on API Specifications&lt;/span&gt;

The &lt;span class="sb"&gt;`human_readable_steps`&lt;/span&gt; in the MagicPod Web API has the following constraints:
&lt;span class="p"&gt;-&lt;/span&gt; Step line numbers are not returned: When pointing out issues, identify them by step content rather than "Step X."
&lt;span class="p"&gt;-&lt;/span&gt; It is unclear if a step is disabled: Add a note that the detection accuracy for Perspective 6 is limited.

&lt;span class="gu"&gt;## Output Format&lt;/span&gt;

&lt;span class="gu"&gt;### Findings per Test Case&lt;/span&gt;

&lt;span class="p"&gt;```&lt;/span&gt;&lt;span class="nl"&gt;markdown&lt;/span&gt;
## Test Case: {Test Case Name}

| # | Perspective | Finding | Severity |
|---|-------------|---------|----------|
| 1 | Missing Assertions | No verification commands exist | High |
| 4 | Naming Deficiency | Description field is empty | Medium |


&lt;span class="p"&gt;```&lt;/span&gt;

&lt;span class="gu"&gt;### Summary (Always output at the end of the review)&lt;/span&gt;

&lt;span class="p"&gt;```&lt;/span&gt;&lt;span class="nl"&gt;markdown&lt;/span&gt;
## Review Summary

- Total Test Cases: X
- Cases with Findings: Y
- Cases without Findings: Z

### Tally by Perspective
| Perspective | Count |
|-------------|-------|
| Missing Assertions | X |
| Fixed-Time Wait | X |
| ... | ... |

### Top 3 Priority Improvement Actions
1. ...
2. ...
3. ...


&lt;span class="p"&gt;```&lt;/span&gt;

&lt;span class="gu"&gt;## Setup Guide (No terminal required)&lt;/span&gt;

&lt;span class="gu"&gt;### Step 1: Obtain MagicPod API Token&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Access https://app.magicpod.com/accounts/api-token/
&lt;span class="p"&gt;2.&lt;/span&gt; Copy the token string.

&lt;span class="gu"&gt;### Step 2: Install Node.js (Only if not already installed)&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Access https://nodejs.org/
&lt;span class="p"&gt;2.&lt;/span&gt; Download and install the LTS version.
&lt;span class="p"&gt;3.&lt;/span&gt; This enables &lt;span class="sb"&gt;`npx`&lt;/span&gt;, which is required to run the MCP server.

&lt;span class="gu"&gt;### Step 3: Edit Claude Desktop Configuration File&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Open Claude Desktop.
&lt;span class="p"&gt;2.&lt;/span&gt; Select Claude &amp;gt; Settings... from the menu bar.
&lt;span class="p"&gt;3.&lt;/span&gt; Select "Developer" from the left menu.
&lt;span class="p"&gt;4.&lt;/span&gt; Click the "Edit Config" button. The folder containing the config file will open in Finder/Explorer.
&lt;span class="p"&gt;5.&lt;/span&gt; Replace the content with the following and save:

&lt;span class="p"&gt;```&lt;/span&gt;&lt;span class="nl"&gt;json&lt;/span&gt;
{
  "mcpServers": {
    "magicpod-mcp-server": {
      "command": "npx",
      "args": ["-y", "magicpod-mcp-server", "--api-token=PASTE_TOKEN_HERE"]
    }
  }
}


&lt;span class="p"&gt;```&lt;/span&gt;

Note: If other MCP servers are already configured, add this entry inside the &lt;span class="sb"&gt;`mcpServers`&lt;/span&gt; object.

&lt;span class="gu"&gt;### Step 4: Restart Claude Desktop&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Completely quit Claude Desktop (Cmd+Q).
&lt;span class="p"&gt;2.&lt;/span&gt; Start it again.
&lt;span class="p"&gt;3.&lt;/span&gt; Connection is successful if the MCP tool icon appears below the chat input field.

&lt;span class="gu"&gt;### Step 5: Verify Operation&lt;/span&gt;
Enter the following in the chat:
&lt;span class="p"&gt;```&lt;/span&gt;&lt;span class="nl"&gt;plaintext&lt;/span&gt;
Are you connected to MagicPod? Please tell me the list of available organizations and projects.


&lt;span class="p"&gt;```&lt;/span&gt;

&lt;span class="gu"&gt;### About Costs&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; MagicPod MCP server itself: Free (MIT licensed OSS)
&lt;span class="p"&gt;-&lt;/span&gt; Required: MagicPod main subscription + Claude Pro subscription
&lt;span class="p"&gt;-&lt;/span&gt; For review purposes (read-only), no additional MagicPod costs are incurred.&lt;span class="sb"&gt;


&lt;/span&gt;&lt;span class="gu"&gt;## Report Generation and Sharing&lt;/span&gt;

After the review is complete, generate a report for sharing if the user gives any of the following instructions:
&lt;span class="p"&gt;-&lt;/span&gt; "Make a report" / "Summarize this"
&lt;span class="p"&gt;-&lt;/span&gt; "I want to share this on Slack" / "I want to post this to Confluence"
&lt;span class="p"&gt;-&lt;/span&gt; "Summarize for sharing"

&lt;span class="gu"&gt;### Format Based on Destination&lt;/span&gt;

&lt;span class="gu"&gt;#### For Slack Posting&lt;/span&gt;

Output using Slack mrkdwn syntax. Assuming long texts will be split into threads, the main body should contain only key points.

&lt;span class="p"&gt;```&lt;/span&gt;&lt;span class="nl"&gt;markdown&lt;/span&gt;
*🔍 MagicPod Test Case AI Review Results*
Execution Date: YYYY-MM-DD

*Target*
• Project: {Project Name}
• Test Cases: X

*Summary*
• Findings: Y / No findings: Z
• 🔴 High: X  🟡 Medium: X  🟢 Low: X

*Top 3 Priority Improvement Actions*
1. ...
2. ...
3. ...

Details in thread 👇


&lt;span class="p"&gt;```&lt;/span&gt;

Post detailed findings for each test case in the thread.

&lt;span class="gu"&gt;#### For Confluence / Documentation&lt;/span&gt;

Output a full report in Markdown format. This can be used as-is with Confluence's Markdown paste feature.

&lt;span class="p"&gt;```&lt;/span&gt;&lt;span class="nl"&gt;markdown&lt;/span&gt;
# MagicPod Test Case AI Review Report

## Basic Information
- Execution Date: YYYY-MM-DD
- Target Organization: {Organization Name}
- Target Project: {Project Name}
- Total Test Cases: X
- Review Perspectives: Based on MagicPod Official Blog Top 10 + Additional Perspectives

## Summary

| Metric | Value |
|--------|-------|
| Test Cases with Findings | Y (XX%) |
| Test Cases without Findings | Z (XX%) |
| High Severity Findings | X |
| Medium Severity Findings | X |
| Low Severity Findings | X |


&lt;span class="p"&gt;```&lt;/span&gt;

&lt;span class="p"&gt;```&lt;/span&gt;&lt;span class="nl"&gt;markdown&lt;/span&gt;
## Tally by Perspective

| # | Perspective | Count | Severity |
|---|-------------|-------|----------|
| 1 | Missing Assertions | X | High |
| 2 | Fixed-Time Wait | X | Medium |
| 3 | Unstable Locators | X | Low |
| 4 | Naming Deficiency (Name/Desc) | X | Medium |
| 5 | Mechanical UI Element Names | X | Medium |
| 6 | Neglected Disabled Steps | X | Low |
| 7 | Long Tests (Over 200) | X | High |
| 8 | Underutilization of Shared Steps | X | Medium |
| + | Security (Plaintext passwords, etc.)| X | High |
| + | Broken Steps (not configured) | X | High |

## Top 3 Priority Improvement Actions

1. **[Action Name]** — [Reason and Expected Effect]
2. **[Action Name]** — [Reason and Expected Effect]
3. **[Action Name]** — [Reason and Expected Effect]

## Good Practices

(Include good practices detected by AI. Provide praise as well as improvements.)

## Details by Test Case

(Expand the list of findings for each test case here.)

## Supplement: Basis for Review Perspectives

This review is based on the perspectives from the MagicPod official blog "Reviewing Automated Tests? 10 Review Perspective Ideas" 
(https://magicpod.com/blog/testcase-review-idea/), reconstructed in a format suitable for AI detection.


&lt;span class="p"&gt;```&lt;/span&gt;

&lt;span class="gu"&gt;### Instructions for Sharing Execution&lt;/span&gt;

Handling cases where the user specifies a concrete sharing destination:

&lt;span class="gu"&gt;#### "Post to Slack"&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; If Slack MCP server is connected: Post directly to the channel using &lt;span class="sb"&gt;`slack_send_message`&lt;/span&gt;.
&lt;span class="p"&gt;2.&lt;/span&gt; If not connected: Output text in Slack mrkdwn format and prompt the user to copy-paste.

&lt;span class="gu"&gt;#### "Post to Confluence"&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; If Atlassian MCP server is connected: Create a page using &lt;span class="sb"&gt;`createConfluencePage`&lt;/span&gt;.
&lt;span class="p"&gt;2.&lt;/span&gt; If not connected: Output in Markdown format and prompt the user to use Confluence's Markdown paste.

&lt;span class="gu"&gt;#### "Make a Jira ticket"&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Propose creating Jira tickets for findings with High severity.
&lt;span class="p"&gt;2.&lt;/span&gt; After user approval, create them using &lt;span class="sb"&gt;`createJiraIssue`&lt;/span&gt;.

&lt;span class="gu"&gt;### Report Customization&lt;/span&gt;

Adjust the report if the user specifies the following:
&lt;span class="p"&gt;-&lt;/span&gt; "Exclude the 'old' folder" -&amp;gt; Generate report only for production tests.
&lt;span class="p"&gt;-&lt;/span&gt; "Summarize across projects" -&amp;gt; Generate a cross-project summary.
&lt;span class="p"&gt;-&lt;/span&gt; "Show only the good points" -&amp;gt; Generate a "praise" report rather than an improvement report.
&lt;span class="p"&gt;-&lt;/span&gt; "Show the difference from last time" -&amp;gt; Compare with the previous report (saved in Knowledge).



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
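
&lt;p&gt;One caution about Step 3 of the setup guide in the Skill file: if other MCP servers are already configured, replacing the whole file would drop them. The following sketch shows one way to merge the MagicPod entry into an existing configuration instead. The entry itself is copied from the Skill file, while &lt;code&gt;other-server&lt;/code&gt; and the function name are placeholders invented for this example.&lt;/p&gt;

```python
import json


def add_magicpod_server(config, api_token):
    """Return a copy of the config with the MagicPod entry added.

    Existing entries under "mcpServers" are kept as they are.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["magicpod-mcp-server"] = {
        "command": "npx",
        "args": ["-y", "magicpod-mcp-server", f"--api-token={api_token}"],
    }
    merged["mcpServers"] = servers
    return merged


# "other-server" is a hypothetical, already-configured MCP server.
existing = {
    "mcpServers": {
        "other-server": {"command": "npx", "args": ["-y", "other-server"]}
    }
}
print(json.dumps(add_magicpod_server(existing, "PASTE_TOKEN_HERE"), indent=2))
```

&lt;p&gt;The function returns a new dictionary and leaves the input untouched, so you can inspect the result before writing it back to the config file.&lt;/p&gt;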

</description>
      <category>ai</category>
      <category>testing</category>
      <category>mcp</category>
      <category>magicpod</category>
    </item>
    <item>
      <title>When Software Development Common Sense Flips: The Law of Decreasing Generation Costs</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Sat, 14 Mar 2026 07:07:10 +0000</pubDate>
      <link>https://dev.to/aws-builders/when-software-development-common-sense-flips-the-law-of-decreasing-generation-costs-506j</link>
      <guid>https://dev.to/aws-builders/when-software-development-common-sense-flips-the-law-of-decreasing-generation-costs-506j</guid>
      <description>&lt;p&gt;CLAUDE.md, .cursor/rules, Kiro Specs, Devin Playbooks. The past year has seen an explosion of instruction files for AI. While every new tool comes with a new name, doesn't what they are doing feel like deja vu?&lt;/p&gt;

&lt;p&gt;Isn't this a requirements definition document? Isn't this onboarding material? Isn't this a runbook?&lt;/p&gt;

&lt;p&gt;Anyone involved in software development for a long time likely feels the same way. At the same time, you might feel that some practices previously considered common sense are starting to become a hindrance in modern development.&lt;/p&gt;

&lt;p&gt;This article explores the true nature of this deja vu and sense of misalignment.&lt;/p&gt;

&lt;p&gt;To state the conclusion first: the design philosophy of AI coding tools is a reinvention of practices built over 20 years of human team development. In this process, part of software development common sense is structurally flipping. There appears to be a simple law to categorize what flips and what remains universal.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are Instruction Files Reinventing?
&lt;/h2&gt;

&lt;p&gt;First, let's organize the facts.&lt;/p&gt;

&lt;p&gt;If we map the ways to provide instructions in various AI tools to what humans have done in team development for years, it looks like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;AI Tool Concept&lt;/th&gt;
&lt;th&gt;Human Team Development Equivalent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Kiro Specs&lt;/td&gt;
&lt;td&gt;PRD / Requirements Definition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLAUDE.md&lt;/td&gt;
&lt;td&gt;Onboarding Documentation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;.cursor/rules&lt;/td&gt;
&lt;td&gt;Coding Conventions (.eslintrc, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Devin Playbooks&lt;/td&gt;
&lt;td&gt;Operating Procedures / Runbooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hooks (PreToolUse, etc.)&lt;/td&gt;
&lt;td&gt;Git hooks / CI Pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skills&lt;/td&gt;
&lt;td&gt;Internal Common Libraries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plugins&lt;/td&gt;
&lt;td&gt;Templates distributed via npm/pip&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Many of you have likely noticed this correspondence.&lt;/p&gt;

&lt;p&gt;The important part is what comes next. In human team development, there was a long period of trial and error regarding the design of what to provide. Giving only coding conventions doesn't help a new member get moving. Giving only a PRD doesn't convey design intent. Giving only a procedure manual doesn't allow for exceptional judgments. Over many years, teams learned that a person becomes effective only when given several layers together: why this design was chosen, what must not be touched, and how to decide when stuck.&lt;/p&gt;

&lt;p&gt;The instruction design for AI tools is exactly a reinvention of this knowledge.&lt;/p&gt;

&lt;p&gt;Furthermore, the differences in design philosophy among tools can be translated into differences in what kind of organization provides what to a new member first.&lt;/p&gt;

&lt;p&gt;Kiro is like a company that starts with specifications. Let's structure the requirements and clarify acceptance criteria before starting implementation. The design of having three stages—requirements.md, design.md, and tasks.md—aims for a middle ground between pre-definition and iteration. (Note: Kiro provides Requirements-First and Design-First workflows; here, we focus mainly on the Requirements-First flow).&lt;/p&gt;

&lt;p&gt;Claude Code is like a company that emphasizes verbalizing implicit knowledge. The guideline to write persistent context that cannot be inferred from code is spreading as a best practice for CLAUDE.md, and that is the essence of onboarding material. The idea is to record the technology stack, the reasons behind past design decisions, and hidden pitfalls that cannot be read from the code itself.&lt;/p&gt;

&lt;p&gt;Cursor is like a company that gives rules and lets people work freely. You write project rules flatly in .cursor/rules and leave the rest to the agent's judgment. Because the degree of freedom is high, the quality of the rules directly affects the output quality. Conversely, since the ability to write rules serves as leverage for the team's output quality, the operational design of rules (who updates them and when) becomes implicitly important.&lt;/p&gt;

&lt;p&gt;Devin is like a company that makes people follow procedures. It is a design where humans control the task execution flow by explicitly defining steps in Playbooks.&lt;/p&gt;

&lt;p&gt;None of these is the single correct answer; rather, they represent design differences based on the nature of the team and the task.&lt;/p&gt;

&lt;h2&gt;
  
  
  Predecessors Hit the Same Walls
&lt;/h2&gt;

&lt;p&gt;Now for the main topic.&lt;/p&gt;

&lt;p&gt;Looking at the failure patterns occurring in AI tool instruction design, we notice they are structurally identical to the failures repeated in human team development over the last 20 years.&lt;/p&gt;

&lt;h3&gt;
  
  
  Collapse Without Documentation
&lt;/h3&gt;

&lt;p&gt;Using Claude Code without writing a CLAUDE.md, and repeatedly re-instructing it because the output is "not right," has almost the same structure as the silos that form when a project starts without design documents.&lt;/p&gt;

&lt;p&gt;In human teams, if the basis for design decisions is not documented, you end up in a state where nobody knows the answer unless they ask one particular person. For AI, it is even simpler: context disappears between sessions. As a result, you repeat the same explanation, yet get a different output than before. This is the same challenge people described over a decade ago: without documentation, implicit knowledge evaporates.&lt;/p&gt;

&lt;p&gt;The efforts of Kiro to structure requirements.md in EARS format and Claude Code to persist prerequisite knowledge via CLAUDE.md are architectural answers to this evaporation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Paralysis by Information Overload
&lt;/h3&gt;

&lt;p&gt;Adding ten MCP servers only to have the response quality collapse is another common pattern.&lt;/p&gt;

&lt;p&gt;Defining too many tools crowds the context window, and the agent begins to skim the results. As pointed out in Anthropic's article "Code execution with MCP," direct calls to MCP tools can cause token consumption to explode. One example mentioned is a case where transferring a transcript of a two-hour sales meeting from Google Drive to Salesforce could consume an additional 50,000 tokens.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.anthropic.com/engineering/code-execution-with-mcp" rel="noopener noreferrer"&gt;https://www.anthropic.com/engineering/code-execution-with-mcp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the same structure as inviting a new hire to 30 Slack channels on their first day and causing them to freeze from information overload. Both humans and AI have limits to the amount of information they can process, necessitating a design that provides necessary information at the appropriate granularity. The evolution of Claude Code Skills, Cursor Rules, and Kiro Powers as mechanisms to provide only necessary information when needed is a response to this problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chaos Without Specifications
&lt;/h3&gt;

&lt;p&gt;Kiro Specs' design philosophy of writing specifications first is an answer to the classic failure pattern of starting development without specs and seeing the project go up in flames in the latter half.&lt;/p&gt;

&lt;p&gt;Since AI generates code probabilistically, it will fill in the blanks of ambiguous specifications on its own. You won't know if that matches your intent until you verify the output. If you run with ambiguous specs, a game of whack-a-mole style bug fixing begins. This is the same for both human and AI development.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Inevitability of Automation
&lt;/h3&gt;

&lt;p&gt;Forcing automatic formatting or linting via Hooks upon saving files is a reenactment of the history where we moved from relying on human goodwill for reviews and style consistency to automating it with CI.&lt;/p&gt;

&lt;p&gt;Ten to fifteen years ago, themes like Jenkins implementation, test automation, and continuous integration were frequent at web conferences. Discussions on securing quality through systems became popular, leading to build servers, the cultivation of automated testing cultures, and the establishment of deployment pipelines. The rapid development of Hooks and Skills in the context of AI coding today is exactly a reenactment of this history.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Law: What Happens When Generation Costs Drop?
&lt;/h2&gt;

&lt;p&gt;The summary so far is that history repeats itself. The second point is that some common sense is flipping.&lt;/p&gt;

&lt;p&gt;To explain this flip, I propose a law:&lt;/p&gt;

&lt;p&gt;When generation cost decreases, the center of gravity for value shifts from the product to the intent.&lt;/p&gt;

&lt;p&gt;The cost of writing code has dropped dramatically. Instruct an AI, and hundreds of lines of code appear in seconds. Let's consider the structural consequences of this change.&lt;/p&gt;

&lt;p&gt;It is the same structure as the movement of bottlenecks in the Theory of Constraints (TOC). When the cost of one process drops sharply, the bottleneck moves to another process. Ten years ago, implementation man-hours were often the bottleneck, so it was rational to invest in technologies that improved code quality, such as test automation, refactoring, and coding conventions.&lt;/p&gt;

&lt;p&gt;Now, writing code itself has become cheap. Consequently, the bottleneck moves to what to make it write. The process of verbalizing specifications, recording design decisions, and organizing context—in other words, the process of refining intent—is becoming the bottleneck for productivity.&lt;/p&gt;
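&lt;p&gt;The bottleneck shift described above can be sketched in a few lines. This is only an illustration: the stage names and hour figures are assumptions chosen to show the mechanism, not measurements.&lt;/p&gt;

```python
def bottleneck(stage_hours: dict[str, float]) -> str:
    """Return the stage with the largest cost, i.e. the current constraint."""
    return max(stage_hours, key=stage_hours.get)


# Hypothetical hours per feature; the numbers are illustrative only.
ten_years_ago = {"refine_intent": 8.0, "write_code": 20.0, "verify": 6.0}

# Assume AI generation cuts the cost of writing code sharply
# while the other stages stay the same.
today = {**ten_years_ago, "write_code": 1.0}

print(bottleneck(ten_years_ago))  # write_code
print(bottleneck(today))          # refine_intent
```

&lt;p&gt;Nothing about refining intent got more expensive in this sketch. It became the constraint only because the stage next to it got cheap.&lt;/p&gt;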

&lt;p&gt;Note that this law relies on the current asymmetry where generation costs have dropped, but verification costs have not decreased at the same rate. If AI-driven code review or formal verification advances significantly in the future, verification costs may also drop dramatically, moving the bottleneck to yet another process. In that sense, this is not a permanent law of the universe but a law explaining current structural dynamics.&lt;/p&gt;

&lt;p&gt;Using this law, let's categorize development common sense into what flips and what remains universal.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Flips: Refining the Product (Code)
&lt;/h2&gt;

&lt;p&gt;These are things considered a given for any good developer over the past 20 years, but for which the ROI is changing in the AI era.&lt;/p&gt;

&lt;p&gt;Note that these haven't become meaningless; rather, they have changed from "things you should always invest in" to "things you judge based on the situation."&lt;/p&gt;

&lt;h3&gt;
  
  
  DRY Principle (Flip Degree: Medium to High)
&lt;/h3&gt;

&lt;p&gt;DRY (Don't Repeat Yourself) is a design principle to eliminate code duplication and centralize changes. This was a rational investment when humans were maintaining the code. Duplication leads to missed updates and becomes a hotbed for bugs.&lt;/p&gt;

&lt;p&gt;However, in the context of AI generating code, abstraction for the sake of DRY can sometimes become a risk.&lt;/p&gt;

&lt;p&gt;Introducing abstraction layers for unification increases the chance of the AI misreading the context. If the responsibility of a common function is unclear, the AI might use that function for unintended purposes or call it in inappropriate places. Concrete, self-contained code is often less likely to be misunderstood by AI.&lt;/p&gt;

&lt;p&gt;Of course, DRY remains important in core parts of the codebase that humans will maintain long-term. The criteria for judgment have changed: we must now discern whether this code is a core component for long-term maintenance or a peripheral one intended for regeneration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Refactoring (Flip Degree: Medium)
&lt;/h3&gt;

&lt;p&gt;Refactoring is the process of improving internal structure without changing external behavior. It has long been valued as a means of paying off technical debt.&lt;/p&gt;

&lt;p&gt;As code generation costs drop, the calculation changes. Which is faster and more reliable: carefully refactoring existing code over a week, or clarifying specifications and having it regenerated? The latter is increasingly becoming a realistic option.&lt;/p&gt;

&lt;p&gt;However, this doesn't mean refactoring is unnecessary. Regeneration is only effective within the scope where specifications can be clearly verbalized. Code containing implicit specifications accumulated over years of operation—such as edge case handling not written in docs or performance tuning results—risks being lost during regeneration.&lt;/p&gt;

&lt;p&gt;The axis of investment has shifted from "always refactor" to "refactor or regenerate, depending on the case."&lt;/p&gt;

&lt;h3&gt;
  
  
  Test Coverage (Flip Degree: Low to Medium)
&lt;/h3&gt;

&lt;p&gt;The value of writing tests itself does not change. What has changed is the ROI of humans writing test code by hand. What is flipping here is the allocation of investment toward metrics, not the design principle.&lt;/p&gt;

&lt;p&gt;If you leave identifying test cases and generating test code to AI, comprehensive tests appear in seconds. What humans should do here is design the test strategy—judging what should be tested—rather than implementing the test code.&lt;/p&gt;

&lt;p&gt;Spending time on determining whether a test verifies a truly meaningful specification is becoming higher ROI than spending time just to raise coverage numbers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Naming and Readability (Flip Degree: Low)
&lt;/h3&gt;

&lt;p&gt;The degree of flip for investment in readability is smaller compared to other items, but judgment is beginning to change depending on who is reading. If the code is for humans to read and understand, beautiful naming and clever comments remain important.&lt;/p&gt;

&lt;p&gt;On the other hand, in code primarily read and written by AI, type information and intent explanations in JSDoc can improve AI understanding more than beautiful naming intended for humans. The definition of readability itself is changing depending on the reader.&lt;/p&gt;
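&lt;p&gt;As a rough illustration of "type information plus an explanation of intent," here is a minimal sketch. It uses Python type hints and a docstring as a stand-in for the JSDoc mentioned above, and the &lt;code&gt;Invoice&lt;/code&gt; type and the rounding rule are hypothetical examples, not taken from any real project.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    subtotal_cents: int    # money is kept in integer cents, never floats
    tax_rate_percent: int  # whole percent, e.g. 10 means 10%


def total_cents(invoice: Invoice) -> int:
    """Return the amount to charge, in cents.

    Intent (hypothetical business rule): tax is computed once per invoice
    and rounded down, so the customer is never charged a fractional cent.
    """
    tax = invoice.subtotal_cents * invoice.tax_rate_percent // 100
    return invoice.subtotal_cents + tax


print(total_cents(Invoice(subtotal_cents=1999, tax_rate_percent=10)))  # 2198
```

&lt;p&gt;The names here are plain rather than elegant. What carries the meaning is the types and the stated rule, which a reader (human or AI) cannot infer from the arithmetic alone.&lt;/p&gt;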

&lt;h2&gt;
  
  
  What is Universal: Refining the Intent
&lt;/h2&gt;

&lt;p&gt;While some things flip, others have always been important and are even more worth investing in now. According to the law, these are things related to the definition of intent and judgment—processes AI cannot do.&lt;/p&gt;

&lt;h3&gt;
  
  
  Verbalizing Specifications and Intent
&lt;/h3&gt;

&lt;p&gt;The ability to accurately verbalize what to make and why is actually increasing in value in the AI era.&lt;/p&gt;

&lt;p&gt;It is no coincidence that Kiro Specs is designed to have you write specifications first. If you throw ambiguous specs at an AI, it will probabilistically fill in the blanks, resulting in code that "sort of works, but you can't tell whether it meets the spec." The effort to verify and fix this often exceeds the effort of writing specifications from the start.&lt;/p&gt;

&lt;p&gt;Starting development without a PRD leads to chaos later. This lesson from 10 years ago is perfectly valid for AI development. In fact, because AI generation speed is fast, ambiguous specs can lead to a large amount of rework in a short time, arguably making the importance of specifications greater than before.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recording Design Decisions
&lt;/h3&gt;

&lt;p&gt;The culture of ADR (Architecture Decision Records)—why this configuration was chosen, what alternatives existed, and why they were rejected—is being rediscovered in the context of CLAUDE.md.&lt;/p&gt;

&lt;p&gt;The guideline that CLAUDE.md should hold "persistent context that cannot be inferred from code" is exactly the practice of recording design decisions. Information such as &lt;code&gt;packages/shared/src/legacy/ is a compatibility layer for old APIs. You will want to refactor it, but do not touch it (three external systems depend on it, scheduled for removal in 2026 Q3)&lt;/code&gt; cannot be inferred no matter how closely an AI reads the code.&lt;/p&gt;

&lt;p&gt;To have AI write correct code, we must provide the background of these design decisions as context. This is the same structure as onboarding a new human member; the verbalization and persistence of prerequisite knowledge is universally important across eras.&lt;/p&gt;

&lt;h3&gt;
  
  
  Review and Verification Ability
&lt;/h3&gt;

&lt;p&gt;The value of the ability to read and judge is rising relative to the ability to write.&lt;/p&gt;

&lt;p&gt;In the era when CI meant "if the build passes, it's okay," the quality gate was the success or failure of the build. Now, the phase where humans judge whether the AI output meets the requirements is becoming the quality bottleneck.&lt;/p&gt;

&lt;p&gt;Reviewing AI-generated code differs in kind from reviewing human-written code. AI errors are inconsistent, and it is not rare for a model to state something false convincingly. To spot library version mismatches, calls to non-existent APIs, or logic that ignores instructions, one needs an understanding of the entire codebase and a deep understanding of specifications.&lt;/p&gt;

&lt;p&gt;Since review is becoming the process that determines quality, investment here is universally important.&lt;/p&gt;

&lt;h3&gt;
  
  
  Organizing Context
&lt;/h3&gt;

&lt;p&gt;CLAUDE.md, Steering, Rules. Though the names differ, they all represent the act of organizing prerequisite knowledge to provide to the AI.&lt;/p&gt;

&lt;p&gt;We are in an era where the most effort should be spent on organizing preconditions before writing code. This is a direct consequence of the law: when generation cost drops, the bottleneck moves to intent. Code can be regenerated as many times as you like, but if the preconditions are wrong, it will never be correct no matter how many times you regenerate it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Categorizing by the Law
&lt;/h2&gt;

&lt;p&gt;If we look at the summary so far along one axis, the criteria for categorization are simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What Flips = Investment in processes AI can now do cheaply (Refining the product)&lt;/li&gt;
&lt;li&gt;What is Universal = Investment in processes AI still cannot do (Refining the intent)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As generation costs drop, the ROI of the former decreases, and the ROI of the latter increases. This can be called a structural law that does not depend on specific tools or languages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Predicting the Next from History
&lt;/h2&gt;

&lt;p&gt;Combining this law with patterns from software development history, we can see several problems that have not yet fully materialized but are almost certain to come in the context of AI coding. While each of these themes is deep enough to be explored in its own separate article, I will only outline their structures here.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem of Document Obsolescence
&lt;/h3&gt;

&lt;p&gt;The issue where no one updates the internal Wiki and a new member causes an accident using old procedures has occurred repeatedly in technical organizations.&lt;/p&gt;

&lt;p&gt;The same will happen with CLAUDE.md or Kiro Specs. Code is generated and updated by AI, but the maintenance of instruction files is done by humans. This asymmetry will eventually become a problem. A state where the code is up-to-date but the instruction file is a lie is the classic problem of "the code works, but the docs are wrong."&lt;/p&gt;

&lt;p&gt;Kiro's Agent Hooks is a mechanism that can trigger agents based on events such as file changes, allowing for the configuration of tasks like automatic document updates. However, as pointed out in the SDD (Spec-Driven Development) tool comparison article published on martinfowler.com, drift between specification and implementation remains a challenge. Addressing this issue may become a key factor in the future competition among tools.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html" rel="noopener noreferrer"&gt;https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Return of "It Works on My Machine"
&lt;/h3&gt;

&lt;p&gt;Before Docker, deployments often broke due to differences in development environments. "It works on my machine" hindered team development because environments were not reproducible.&lt;/p&gt;

&lt;p&gt;A similar problem is appearing in AI tool instruction design. On GitHub, dotfiles-style repositories with names like "My Claude Code Settings" or "My AI Agent Setup" are surging, creating a culture of sharing personally optimized CLAUDE.md files, skills, and hooks. While healthy for individual productivity, "It works with my CLAUDE.md" could eventually become an obstacle in team development.&lt;/p&gt;

&lt;h3&gt;
  
  
  The AI Version of the DevOps Problem
&lt;/h3&gt;

&lt;p&gt;DevOps was born out of the disconnect between those who build and those who run. In the future, the same wall may arise in the structure of humans operating code in production that was written by AI.&lt;/p&gt;

&lt;p&gt;Lisanne Bainbridge's 1983 paper "Ironies of Automation" pointed out that as automation becomes more sophisticated, operators lose manual skills and become unable to respond to exceptions. This insight, repeatedly confirmed in aviation and process control, also applies to software development. As AI writes more code, humans understand the details of the codebase less, and may become unable to handle the failures that the AI cannot handle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Ironies_of_Automation" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Ironies_of_Automation&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Orchestration Problem of Multi-Agents
&lt;/h3&gt;

&lt;p&gt;With the decentralization of microservices, how to maintain consistency between services became a major challenge. The same could happen with Agent Teams. Running 10 agents in parallel might decrease efficiency due to the overhead of sharing context. This is the Agent version of Brooks's Law—adding manpower to a late software project makes it later—proposed by Fred Brooks in The Mythical Man-Month in 1975.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Brooks%27s_law" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Brooks%27s_law&lt;/a&gt;&lt;/p&gt;
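&lt;p&gt;The coordination overhead behind this concern can be made concrete with the pairwise-channel count n(n-1)/2 that Brooks uses for intercommunication effort. The sketch below assumes the worst case, where every pair of agents may need to share context. A lead-and-workers topology would need fewer channels, so treat this as an upper bound rather than a description of any specific tool.&lt;/p&gt;

```python
def pairwise_channels(agents: int) -> int:
    """Number of distinct agent pairs that might need to share context: n(n-1)/2."""
    return agents * (agents - 1) // 2


for n in (2, 5, 10):
    print(n, pairwise_channels(n))
# 2 1
# 5 10
# 10 45
```

&lt;p&gt;Going from 5 to 10 agents doubles the workers but more than quadruples the potential channels, which is why adding agents does not translate linearly into throughput.&lt;/p&gt;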

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The design philosophy of AI coding tools is a reinvention of practices built over 20 years of human team development. Some parts of history repeat, while some common sense flips.&lt;/p&gt;

&lt;p&gt;The law for categorization is simple: When generation cost decreases, the center of gravity for value shifts from the product to the intent.&lt;/p&gt;

&lt;p&gt;Refining code—DRY, refactoring, naming, coverage—is changing from things you should always invest in to things you judge based on the situation. Refining intent—verbalizing specifications, recording design decisions, review ability, organizing context—remains as important as ever, and its weight as a bottleneck has increased.&lt;/p&gt;

&lt;p&gt;And if you know the patterns of history, you can see through new tools and concepts, recognizing them as "that thing from 10 years ago." If you can see through them, you can apply the lessons left by your predecessors as they are.&lt;/p&gt;

&lt;p&gt;Problems that look new may boil down to known patterns when viewed structurally. Thinking this way, there might not be that many truly unknown challenges after all.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Why Kiro Looks Unassuming: Organizing Design Philosophy Differences in the Age of Claude Code and Cursor</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Mon, 23 Feb 2026 23:44:15 +0000</pubDate>
      <link>https://dev.to/aws-builders/why-kiro-looks-unassuming-organizing-design-philosophy-differences-in-the-age-of-claude-code-and-1dp9</link>
      <guid>https://dev.to/aws-builders/why-kiro-looks-unassuming-organizing-design-philosophy-differences-in-the-age-of-claude-code-and-1dp9</guid>
      <description>&lt;p&gt;As AI-assisted development becomes the norm, AI coding tools are rapidly shifting from simple "code completion" to "autonomous agent development."&lt;/p&gt;

&lt;p&gt;By early 2026, major tools have all evolved toward a direction where "agents autonomously write, test, and fix code"—exemplified by Claude Code’s Agent Teams, Cursor’s parallel Subagents, GitHub Copilot’s Coding Agent, and Windsurf’s parallel Cascade.&lt;/p&gt;

&lt;p&gt;In this landscape, using Kiro might honestly feel a bit underwhelming.&lt;/p&gt;

&lt;p&gt;However, if we discuss this "unassuming" nature without breaking down which layer of development it addresses, Kiro risks being dismissed as merely a "weak IDE."&lt;/p&gt;

&lt;p&gt;This article is intended for developers who use Claude Code or Cursor daily. For those already reaping the benefits of agents, I want to organize the differences in the problems Kiro is trying to solve.&lt;/p&gt;

&lt;h1&gt;
  
  
  Premise: What Makes 2026 AI Coding Tools Strong?
&lt;/h1&gt;

&lt;p&gt;First, let’s align our understanding of the starting point.&lt;/p&gt;

&lt;p&gt;The assessment that "Claude is superior in terms of overall raw capability" is likely based on practical observation rather than emotion. However, by early 2026, it’s fair to say that we are no longer in a situation where "only Claude is strong." The current reality is that major tools are evolving rapidly in distinct directions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code: Reasoning Power x Autonomous Loops x Multi-Agents
&lt;/h2&gt;

&lt;p&gt;Claude Code’s strength can be described in three layers.&lt;/p&gt;

&lt;p&gt;First is its massive context processing capability. With a context window of up to 1 million tokens (beta), it can fit design docs, implementation code, tests, logs, and diff histories into a single reasoning space. This moves the needle from "local optimization" toward "semi-global reasoning," making cross-dependency analysis and side-effect detection possible within realistic timeframes.&lt;/p&gt;

&lt;p&gt;Second is the autonomous execution loop. Partially automating the loop of Implementation → Testing → Error Reading → Root Cause Hypothesis → Fixing changes the very structure of productivity by expanding human cognitive bandwidth.&lt;/p&gt;

&lt;p&gt;Third is Agent Teams, released alongside Opus 4.6 in February 2026. A lead agent decomposes tasks, and multiple workers execute them in parallel within independent contexts. Sequential work that used to take hours now completes in minutes through parallelization. This is particularly effective for large-scale refactoring or feature implementations spanning multiple modules.&lt;/p&gt;

&lt;p&gt;Furthermore, with structured instructions via &lt;code&gt;CLAUDE.md&lt;/code&gt; or &lt;code&gt;AGENTS.md&lt;/code&gt;, tool integration through MCP, and context continuity between sessions via an automated memory system, it has evolved from a mere reasoning engine into an agent development platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cursor: Parallel Subagents and Proprietary Models
&lt;/h2&gt;

&lt;p&gt;Cursor changed significantly with version 2.0. By integrating the proprietary coding model "Composer," it achieved four times the generation speed of previous versions. Even more noteworthy are the asynchronous Subagents. Multiple sub-agents run in parallel without blocking the parent agent, forming a tree structure where sub-agents can spawn further sub-agents. Test results in review articles have reported that migrating a router in an 8,000-line Next.js app was reduced from 17 minutes (sequential) to 9 minutes (parallel).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://myaiverdict.com/cursor-ai-review/" rel="noopener noreferrer"&gt;https://myaiverdict.com/cursor-ai-review/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With Background Agents, an agent can now work on an independent branch and open a Pull Request while the developer is busy with other tasks. It also features process control mechanisms like Plan Mode, Rules, and Hooks.&lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub Copilot: Issue-Driven Autonomous Agents
&lt;/h2&gt;

&lt;p&gt;GitHub Copilot’s Coding Agent creates a Draft Pull Request autonomously in the background simply by assigning an Issue to &lt;code&gt;@copilot&lt;/code&gt;. Operating within the GitHub Actions environment, it automates everything from branch creation to commits and opening PRs. It can share project-specific instructions via &lt;code&gt;AGENTS.md&lt;/code&gt; and link tools via MCP. It is designed to iterate on automatic fixes based on feedback from PR reviews.&lt;/p&gt;

&lt;h2&gt;
  
  
  Windsurf: Parallel Cascade and SWE-1.5
&lt;/h2&gt;

&lt;p&gt;Windsurf released parallel multi-agent sessions in Wave 13. Using Git worktrees, multiple Cascade agents can operate simultaneously within the same repository without branch conflicts. SWE-1.5 demonstrated performance rivaling frontier models on SWE-Bench-Pro and was offered for free to all users for three months following its December 2025 release. Workflow automation via Cascade Hooks has also been added.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Trends
&lt;/h2&gt;

&lt;p&gt;The trend here is clear: all major tools are competing on "autonomy and parallelism of reasoning." Processing more files, faster, with less human intervention. This is the dominant competitive axis for AI coding tools in early 2026.&lt;/p&gt;

&lt;p&gt;As long as we evaluate based on this axis, Kiro currently occupies a latecomer position.&lt;/p&gt;

&lt;p&gt;That is exactly why we need to precisely understand what Kiro is actually optimizing for.&lt;/p&gt;

&lt;h1&gt;
  
  
  What is Kiro Optimizing For?
&lt;/h1&gt;

&lt;p&gt;While major tools optimize for "autonomy and parallelism of reasoning," Kiro optimizes for "state management of the development process."&lt;/p&gt;

&lt;p&gt;This may sound abstract, so let’s look at Kiro’s specific mechanisms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Spec: A Mechanism to Fix Specifications as Structure
&lt;/h2&gt;

&lt;p&gt;The core feature of Kiro is Spec (Specification-Driven Development). From natural language prompts, it generates three Markdown files in stages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;requirements.md&lt;/code&gt;: Describes user stories and acceptance criteria using the EARS (Easy Approach to Requirements Syntax) notation. EARS was originally developed by Rolls-Royce for airworthiness regulation analysis of jet engines. It eliminates ambiguity through a structured format: &lt;code&gt;WHEN [condition] THE SYSTEM SHALL [behavior]&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;design.md&lt;/code&gt;: Documents technical architecture, sequence diagrams, and implementation considerations.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tasks.md&lt;/code&gt;: Breaks down the implementation plan into discrete tasks, explicitly tracing each task back to specific requirements in &lt;code&gt;requirements.md&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Crucially, these three files are not independent documents; they are a linked structure. Changing &lt;code&gt;requirements.md&lt;/code&gt; updates the design via "Refine" in &lt;code&gt;design.md&lt;/code&gt;, which then maps new tasks to requirements through "Update tasks" in &lt;code&gt;tasks.md&lt;/code&gt;. It is designed so that the impact of a spec change propagates structurally.&lt;/p&gt;
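&lt;p&gt;The traceability described above can be pictured as a lookup from requirement IDs to tasks. The sketch below is a hypothetical in-memory view. The task titles, requirement numbers, and the &lt;code&gt;tasks_affected_by&lt;/code&gt; helper are invented for illustration and do not reflect Kiro's actual file format or implementation.&lt;/p&gt;

```python
# Hypothetical view of tasks.md: each task title maps to the requirement IDs it traces to.
TASKS: dict[str, list[str]] = {
    "Add rate-limit middleware": ["1.1", "1.2"],
    "Return 429 with Retry-After header": ["1.2"],
    "Write audit log entry": ["2.1"],
}


def tasks_affected_by(changed_requirement: str, tasks: dict[str, list[str]]) -> list[str]:
    """List the tasks that trace back to the requirement that changed."""
    return [title for title, reqs in tasks.items() if changed_requirement in reqs]


print(tasks_affected_by("1.2", TASKS))
```

&lt;p&gt;Because each task carries its requirement IDs explicitly, the scope of a spec change is a query over the structure rather than something a person has to remember.&lt;/p&gt;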

&lt;h2&gt;
  
  
  Steering: A Mechanism to Turn Implicit Knowledge into Persistent Context
&lt;/h2&gt;

&lt;p&gt;In many projects, critical information exists only as implicit knowledge:&lt;br&gt;
&lt;em&gt;This constraint stems from audit requirements; this cache design assumes future scalability; this async process requires order guarantees; this API depends on an external contract...&lt;/em&gt;&lt;br&gt;
Much of this is scattered across code, reviews, or verbal explanations, rarely maintained as a formal structure.&lt;/p&gt;

&lt;p&gt;Kiro’s "Steering" fixes these as Markdown files under &lt;code&gt;.kiro/steering/&lt;/code&gt;. By default, it generates three files: &lt;code&gt;product.md&lt;/code&gt; (product goals and business context), &lt;code&gt;tech.md&lt;/code&gt; (tech stack and constraints), and &lt;code&gt;structure.md&lt;/code&gt; (project structure and naming conventions).&lt;/p&gt;

&lt;p&gt;Steering files have three inclusion modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;always&lt;/code&gt;: Included in all interactions.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;fileMatch&lt;/code&gt;: Included only when matching specific file patterns.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;manual&lt;/code&gt;: Included only when explicitly referenced.&lt;/li&gt;
&lt;/ul&gt;
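&lt;p&gt;The three inclusion modes can be sketched as a simple selection function. Everything below is an assumption for illustration: the &lt;code&gt;SteeringFile&lt;/code&gt; class, the glob matching via Python's &lt;code&gt;fnmatch&lt;/code&gt;, and the file names other than &lt;code&gt;product.md&lt;/code&gt; are hypothetical, and Kiro's real matching rules may differ.&lt;/p&gt;

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class SteeringFile:
    name: str
    inclusion: str     # "always", "fileMatch", or "manual"
    pattern: str = ""  # glob pattern, used only when inclusion is "fileMatch"


def select_steering(files: list[SteeringFile], active_path: str, referenced: set[str]) -> list[str]:
    """Return the names of steering files that would enter the context."""
    selected = []
    for f in files:
        if f.inclusion == "always":
            selected.append(f.name)
        elif f.inclusion == "fileMatch" and fnmatch(active_path, f.pattern):
            selected.append(f.name)
        elif f.inclusion == "manual" and f.name in referenced:
            selected.append(f.name)
    return selected


files = [
    SteeringFile("product.md", "always"),
    SteeringFile("api-standards.md", "fileMatch", "src/api/*.ts"),
    SteeringFile("release-checklist.md", "manual"),
]

print(select_steering(files, "src/api/users.ts", set()))
print(select_steering(files, "README.md", {"release-checklist.md"}))
```

&lt;p&gt;Only the &lt;code&gt;always&lt;/code&gt; file is paid for on every interaction. The other two cost context only when the active file or an explicit reference calls for them.&lt;/p&gt;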

&lt;p&gt;This allows for controlling context window consumption while ensuring the AI receives the necessary premises. It is also designed for team deployment, allowing priority control between global and workspace Steering, and distribution via MDM like Jamf.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hooks: Systematizing Process Automation
&lt;/h2&gt;

&lt;p&gt;Agent Hooks are triggers that automatically execute predefined agent actions in response to events like file creation, saving, or deletion. You can systematize routine tasks in an event-driven manner—such as automatically generating tests upon saving a file, syncing documentation when code changes, or running security checks.&lt;/p&gt;
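&lt;p&gt;The event-driven idea can be sketched as a small registry that maps events to actions. This is a generic illustration, not Kiro's hook configuration format. The &lt;code&gt;HookRegistry&lt;/code&gt; class and the event name &lt;code&gt;file_saved&lt;/code&gt; are hypothetical, and the handlers return a description of the action instead of invoking an agent.&lt;/p&gt;

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str], str]


class HookRegistry:
    """Maps event names to the actions that should run when the event fires."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, path: str) -> list[str]:
        # Run every handler registered for this event, in registration order.
        return [handler(path) for handler in self._handlers.get(event, [])]


hooks = HookRegistry()
hooks.on("file_saved", lambda path: f"generate tests for {path}")
hooks.on("file_saved", lambda path: f"sync docs for {path}")

print(hooks.emit("file_saved", "src/billing.py"))
print(hooks.emit("file_deleted", "src/old.py"))  # no handlers registered, so an empty list
```

&lt;p&gt;The routine work is attached to the event once, so it runs whether or not anyone remembers to ask for it.&lt;/p&gt;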

&lt;h2&gt;
  
  
  What This All Means
&lt;/h2&gt;

&lt;p&gt;Looking at Spec, Steering, and Hooks together, it’s clear that Kiro isn’t just about generating spec docs. It’s about clarifying requirements (&lt;code&gt;requirements.md&lt;/code&gt;), fixing design constraints (&lt;code&gt;design.md&lt;/code&gt; + Steering), task decomposition and traceability (&lt;code&gt;tasks.md&lt;/code&gt;), impact propagation during changes (Refine/Update tasks flow), persisting implicit knowledge (Steering), and automating routine processes (Hooks).&lt;/p&gt;

&lt;p&gt;In short, Kiro’s approach is not about making reasoning faster or stronger, but about clarifying the premises upon which reasoning depends and managing the development process as a state.&lt;/p&gt;

&lt;h1&gt;
  
  
  Reasoning Engines and State Management Belong to Different Layers
&lt;/h1&gt;

&lt;p&gt;This is the fundamental difference.&lt;/p&gt;

&lt;p&gt;LLMs are probabilistic reasoning models. They generate the most plausible output based on the given context. Whether you parallelize with Agent Teams or build tree structures with Subagents, each agent is performing "reasoning within a given context."&lt;/p&gt;

&lt;p&gt;"State management" here refers to a mechanism that explicitly maintains invariants, manages dependencies as a structure, and can recalculate the scope of impact when changes occur.&lt;/p&gt;

&lt;p&gt;While LLMs can refer to context, they only reason "within the provided scope." Any premise outside that context is treated as if it doesn’t exist.&lt;/p&gt;

&lt;p&gt;For example, consider a case where you change a rate-limiting specification. If you give the change to Claude Code, Agent Teams can simultaneously fix code, update tests, and modify documentation. However, if the consistency with future multi-tenant plans or audit requirements isn't in the context, they won't be part of the reasoning. Parallelization doesn't solve this.&lt;/p&gt;

&lt;p&gt;The issue here isn't the model's intelligence or speed, but "where the premises are held."&lt;/p&gt;

&lt;p&gt;Kiro fixes constraints in EARS format in &lt;code&gt;requirements.md&lt;/code&gt;, documents design decisions in &lt;code&gt;design.md&lt;/code&gt;, and persists project-specific premises in Steering. When specs change, the impact propagates structurally from requirements to design to tasks.&lt;/p&gt;

&lt;p&gt;While the major tools of 2026 compete on "how fast, autonomously, and in parallel reasoning can run," Kiro competes on "how structurally the context required for reasoning can be maintained." They are competing on different layers.&lt;/p&gt;

&lt;h1&gt;
  
  
  Isn't "Existing Tools + SDD" Enough?
&lt;/h1&gt;

&lt;p&gt;This is the strongest counterargument, and it has gained even more weight from late 2025 into 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  SDD is No Longer Unique to Kiro
&lt;/h2&gt;

&lt;p&gt;Specification-Driven Development (SDD) is no longer an approach exclusive to Kiro.&lt;/p&gt;

&lt;p&gt;GitHub released "Spec Kit" as open source. It’s a CLI tool that allows the same requirements → design → tasks flow as Kiro to be used across multiple agents like Claude Code, Cursor, GitHub Copilot, Gemini CLI, and Windsurf. It also supports requirement definition via EARS.&lt;/p&gt;

&lt;p&gt;Furthermore, each tool has begun to incorporate its own process control mechanisms. Claude Code has instruction structuring via &lt;code&gt;CLAUDE.md&lt;/code&gt; and &lt;code&gt;AGENTS.md&lt;/code&gt;. Cursor has Rules and Plan Mode. GitHub Copilot has &lt;code&gt;AGENTS.md&lt;/code&gt; and custom agent definitions. Windsurf has Rules and Workflows.&lt;/p&gt;

&lt;p&gt;The argument that "if you want SDD, just use GitHub's Spec Kit with your tool of choice" or "process control can be done to some extent in any tool" is now quite realistic.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Kiro Still Provides
&lt;/h2&gt;

&lt;p&gt;So, what is Kiro’s differentiator?&lt;/p&gt;

&lt;p&gt;While GitHub's Spec Kit provides the "SDD workflow" in an agent-agnostic way, the synchronization between specifications and implementation remains a human responsibility. If you change a &lt;code&gt;requirements.md&lt;/code&gt; generated by Spec Kit, the propagation of that impact to &lt;code&gt;design.md&lt;/code&gt; or &lt;code&gt;tasks.md&lt;/code&gt; is not automated.&lt;/p&gt;

&lt;p&gt;While Rules or &lt;code&gt;AGENTS.md&lt;/code&gt; in other tools structure instructions for agents, they aren't mechanisms for managing the three-layer structure of specs, design, and tasks in a linked fashion within a single IDE.&lt;/p&gt;

&lt;p&gt;Kiro’s edge lies in the "seamless integration of Spec, Steering, and Hooks within the IDE." The flow of &lt;code&gt;requirements.md&lt;/code&gt; change → &lt;code&gt;design.md&lt;/code&gt; Refine → &lt;code&gt;tasks.md&lt;/code&gt; Update → Task execution → Auto-verification via Hooks is completed within a single environment. Spec files are Git-manageable and can be reviewed and shared by the team.&lt;/p&gt;

&lt;p&gt;This is the difference between the "concept of SDD" and the "operation of SDD." While the concept can be realized with Spec Kit, the cost of maintaining it as a daily operation depends heavily on the degree of tool integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Addressing the "Expert Engineer + Claude Code" Ultimate Combo
&lt;/h2&gt;

&lt;p&gt;Let's go further: "Isn't an expert engineer writing specs themselves and having Claude Code's Agent Teams implement them the ultimate setup?"&lt;/p&gt;

&lt;p&gt;That is indeed a powerful configuration, but the question is where the responsibility for state management lies.&lt;/p&gt;

&lt;p&gt;In this setup, judgments about "which invariants exist," "which design constraints are vital," and "which changes are breaking" remain as implicit knowledge on the human side. This rarely causes issues in short-term projects. However, as spec changes accumulate, members rotate, or maintenance occurs six months later, the degradation of implicit knowledge is inevitable.&lt;/p&gt;

&lt;p&gt;Kiro provides concrete mechanisms against this degradation. EARS format in &lt;code&gt;requirements.md&lt;/code&gt; fixes acceptance criteria without ambiguity, each task in &lt;code&gt;tasks.md&lt;/code&gt; traces back to a requirement number, and Steering persists project premises in a Git-manageable form. These are versioned, reviewable, and accessible even if the person in charge changes.&lt;/p&gt;
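
&lt;p&gt;As a rough illustration of that traceability, the following Python sketch checks that a requirement follows the event-driven EARS pattern and that every task points at a known requirement ID. The &lt;code&gt;Requirement&lt;/code&gt; and &lt;code&gt;Task&lt;/code&gt; classes, the ID scheme, and the sample data are hypothetical. This is not Kiro's file format or implementation, only a minimal sketch of the idea.&lt;/p&gt;

```python
import re
from dataclasses import dataclass

# Event-driven EARS pattern: "WHEN trigger, THE SYSTEM SHALL response".
# Other EARS patterns (WHILE, IF/THEN, WHERE) are omitted from this sketch.
EARS_EVENT = re.compile(r"^WHEN (.+?),? THE SYSTEM SHALL (.+)$", re.IGNORECASE)


@dataclass(frozen=True)
class Requirement:
    req_id: str  # hypothetical numbering, e.g. "1.1"
    text: str


@dataclass(frozen=True)
class Task:
    title: str
    req_ids: tuple[str, ...]  # requirement IDs this task claims to implement


def is_ears_event(text: str) -> bool:
    """Return True if the text matches the event-driven EARS pattern."""
    return EARS_EVENT.match(text.strip()) is not None


def untraceable_tasks(requirements: list[Requirement], tasks: list[Task]) -> list[str]:
    """Return titles of tasks with no requirement IDs or with unknown ones."""
    known = {r.req_id for r in requirements}
    return [t.title for t in tasks if not t.req_ids or not set(t.req_ids).issubset(known)]


# Hypothetical sample data.
requirements = [
    Requirement("1.1", "WHEN the user submits the form, THE SYSTEM SHALL validate the email address"),
]
tasks = [
    Task("Implement email validation", ("1.1",)),
    Task("Add retry logic", ("2.3",)),  # points at a requirement that does not exist
]

print(is_ears_event(requirements[0].text))
print(untraceable_tasks(requirements, tasks))
```

&lt;p&gt;A check like this could run in review or CI, so that a task without a requirement behind it is flagged while the person who wrote it is still around to explain it.&lt;/p&gt;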

&lt;h2&gt;
  
  
  Kiro’s Current Limitations
&lt;/h2&gt;

&lt;p&gt;To be fair, Kiro has clear limitations.&lt;/p&gt;

&lt;p&gt;The synchronization between Spec and implementation code isn't fully automated. As noted in a review on Martin Fowler’s site, drift between specs and implementation remains a challenge. There are also reports of Spec granularity not fitting project scale: the "sledgehammer to crack a nut" problem, where four user stories and sixteen acceptance criteria are generated for a tiny bug fix.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html" rel="noopener noreferrer"&gt;https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kiro’s default (Auto mode) uses routing based on the Sonnet 4 series, but Pro users and above can select Opus 4.6. It also features parallel execution via custom Subagents and an Autonomous Agent (preview). The gap in model performance and agent features is closing fast; Kiro’s differentiation will likely remain in its integration of process state management via Spec, Steering, and Hooks, rather than on the axis of "reasoning power and parallelism."&lt;/p&gt;

&lt;p&gt;Kiro's pricing ranges from Free (50 credits/month) to Power ($200/month), using a credit-based usage model.&lt;/p&gt;

&lt;h1&gt;
  
  
  Why Does Kiro Still Look Unassuming?
&lt;/h1&gt;

&lt;p&gt;The evaluation criteria for AI coding tools in 2026 are clear.&lt;/p&gt;

&lt;p&gt;How many minutes did it take to implement with Agent Teams? How many files were modified in parallel with Subagents? How many hours after assigning an Issue to a Coding Agent was a PR opened? How many bugs were squashed simultaneously with parallel Cascade?&lt;/p&gt;

&lt;p&gt;On this axis, speed and autonomy are what look attractive. The impact of "Agent Teams modified 5 modules simultaneously" is overwhelmingly larger as an experience than "Requirements were structured in EARS notation."&lt;/p&gt;

&lt;p&gt;Conversely, Kiro’s value lies in preventing accidents: suppressing spec deviation, preventing missed dependencies, and making design constraints explicit. Generally, "the absence of accidents" is difficult to appreciate.&lt;/p&gt;

&lt;p&gt;As a result, Kiro isn't weak; it just stands in a "position that is hard to evaluate."&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion: So, What Should You Do?
&lt;/h1&gt;

&lt;p&gt;Let’s translate this into a practical decision-making framework.&lt;/p&gt;

&lt;p&gt;First, we must acknowledge that in early 2026, AI coding tools are evolving rapidly in reasoning autonomy and parallelism. In terms of raw implementation speed on this axis, Kiro is not in the same league.&lt;/p&gt;

&lt;p&gt;So when do you need a Kiro-like approach? The criterion is "what is the primary bottleneck in this project?"&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Maximize Agent Reasoning Power
&lt;/h2&gt;

&lt;p&gt;The more the following conditions apply, the more rational it is to stick with Claude Code, Cursor, or GitHub Copilot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Limited scope&lt;/li&gt;
&lt;li&gt;Relatively stable specifications&lt;/li&gt;
&lt;li&gt;PoC or exploratory phases&lt;/li&gt;
&lt;li&gt;Small team with shared implicit knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In these cases, the bottleneck is "implementation speed." You gain more by maximizing agent reasoning and generation than by fixing the process.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Threshold for Process State Management
&lt;/h2&gt;

&lt;p&gt;On the other hand, things change when you see these signs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frequent specification changes&lt;/li&gt;
&lt;li&gt;Non-functional requirements introduced late in the game&lt;/li&gt;
&lt;li&gt;Handovers or long-term maintenance are expected&lt;/li&gt;
&lt;li&gt;Inconsistencies between tests and specs begin to increase&lt;/li&gt;
&lt;li&gt;The labor to verify agent output starts to exceed the labor of implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this point, the bottleneck shifts to "consistency maintenance." Here, where you pin down the state matters more than the agent's reasoning capability.&lt;/p&gt;

&lt;p&gt;Having requirements fixed in EARS in &lt;code&gt;requirements.md&lt;/code&gt;, tasks traced back to requirements in &lt;code&gt;tasks.md&lt;/code&gt;, and project premises persisted in Steering becomes a safety net for your future self or your successor six months down the line.&lt;/p&gt;

&lt;p&gt;Whether you adopt Kiro as an IDE or incorporate Kiro’s philosophy into your toolchain via GitHub's Spec Kit is a separate decision. The important thing is to "recognize the existence of the layer called process state management."&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Summary
&lt;/h2&gt;

&lt;p&gt;Claude Code is powerful. Cursor is fast. GitHub Copilot is deeply integrated with GitHub. Windsurf excels in parallelism. These are undeniable strengths. But that strength lies in "autonomy, speed, and parallelism of reasoning and implementation."&lt;/p&gt;

&lt;p&gt;Kiro is not flashy, but its value lies in the "structural fixation of process state."&lt;/p&gt;

&lt;p&gt;If you compare them simply without understanding this difference, Kiro will always lose. But if you change the evaluation axis, the view changes.&lt;/p&gt;

&lt;p&gt;In 2026, all AI coding tools are running toward "making reasoning faster, stronger, and more autonomous." Kiro cannot win on that competitive axis. However, phases dominated by reasoning and phases dominated by consistency maintenance costs are different things. The moment the latter becomes dominant, Kiro’s fixation of process state through Spec, Steering, and Hooks begins to show its worth.&lt;/p&gt;

&lt;p&gt;It’s not about superiority; it’s about a difference in layers. Recognizing that difference and choosing what your project needs might be the most realistic solution for 2026.&lt;/p&gt;

</description>
      <category>kiro</category>
      <category>claudecode</category>
      <category>cursor</category>
      <category>ai</category>
    </item>
    <item>
      <title>Beyond Assistance: The Executive Power of "Agent Plugins for AWS"</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Wed, 18 Feb 2026 04:41:45 +0000</pubDate>
      <link>https://dev.to/aws-builders/beyond-assistance-the-executive-power-of-agent-plugins-for-aws-5hnh</link>
      <guid>https://dev.to/aws-builders/beyond-assistance-the-executive-power-of-agent-plugins-for-aws-5hnh</guid>
      <description>&lt;p&gt;Released by AWS Labs on GitHub in February 2026, &lt;a href="https://github.com/awslabs/agent-plugins" rel="noopener noreferrer"&gt;Agent Plugins for AWS&lt;/a&gt; is a "plugin library that grants executable skill sets to AI agents," distinguishing itself from simple code completion or natural language assistance.&lt;/p&gt;

&lt;p&gt;In addition to being officially supported as a Claude Code plugin, this OSS can be installed and used via the official marketplace in Cursor. It supports the entire workflow end-to-end—from AI-driven design assistance, recommendations, and cost estimation to IaC generation and deployment to AWS.&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/awslabs" rel="noopener noreferrer"&gt;
        awslabs
      &lt;/a&gt; / &lt;a href="https://github.com/awslabs/agent-plugins" rel="noopener noreferrer"&gt;
        agent-plugins
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Agent Plugins for AWS equip AI coding agents with the skills to help you architect, deploy, and operate on AWS.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Agent Plugins for AWS&lt;/h1&gt;
&lt;/div&gt;
&lt;div class="markdown-alert markdown-alert-important"&gt;
&lt;p class="markdown-alert-title"&gt;Important&lt;/p&gt;
&lt;p&gt;Generative AI can make mistakes. You should consider reviewing all output and costs generated by your chosen AI model and agentic coding assistant. See &lt;a href="https://aws.amazon.com/ai/responsible-ai/policy/" rel="nofollow noopener noreferrer"&gt;AWS Responsible AI Policy&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Agent Plugins for AWS equip AI coding agents with the skills to help you architect, deploy, and operate on AWS. Agent plugins are currently supported by Claude Code and Cursor.&lt;/p&gt;
&lt;p&gt;AI coding agents are increasingly used in software development, helping developers write, review, and deploy code more efficiently. Agent skills and the broader agent plugin packaging model are emerging as best practices for steering coding agents toward reliable outcomes without bloating model context. Instead of repeatedly pasting long AWS guidance into prompts, developers can now encode that guidance as reusable, versioned capabilities that agents invoke when relevant. This improves determinism, reduces context overhead, and makes agent behavior easier to standardize across teams. Agent plugins act as…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/awslabs/agent-plugins" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;




&lt;p&gt;If we consider only the aspect of "execution," existing CLI-based agents are capable of similar actions. However, Agent Plugins goes beyond mere task automation: it is provided as a capability layer that systematizes the design process.&lt;/p&gt;

&lt;p&gt;This article explores "what it achieves," "why it is important," and "what value it creates."&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It’s More Than Just a "Completion Tool"
&lt;/h2&gt;

&lt;p&gt;Previously, development support using LLMs typically followed these patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Providing natural language advice in response to questions like, "How should I implement this architecture on AWS?"&lt;/li&gt;
&lt;li&gt;Humans designing and coding based on that advice.&lt;/li&gt;
&lt;li&gt;Deployment and testing performed manually by humans.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this flow, the AI's role is centered on "design assistance," while the actual application to the environment depends on humans.&lt;/p&gt;

&lt;p&gt;In contrast, "Agent Plugins for AWS" is a suite of plugins that allows AI agents to proactively handle everything from Design → Recommendations → Cost Estimation → IaC Generation → Deployment. This is qualitatively different from automation tools that simply suggest command completions or trigger a CLI via natural language.&lt;/p&gt;

&lt;h2&gt;
  
  
  Differences from CLI Automation: Why It's Not Just "Terminal Operations"
&lt;/h2&gt;

&lt;p&gt;Many might think that with current tools like Claude Code or various AI CLIs, one can already:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate IaC&lt;/li&gt;
&lt;li&gt;Execute the AWS CLI&lt;/li&gt;
&lt;li&gt;Proceed to deployment without manual intervention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this works provided that agent mode is enabled with appropriate permissions, and many teams are likely already doing it.&lt;/p&gt;

&lt;p&gt;If we look solely at "whether it can be executed," traditional CLI-based agent environments can perform similar tasks.&lt;/p&gt;

&lt;p&gt;So, what is the differentiator for "Agent Plugins for AWS"?&lt;br&gt;
The difference lies not in "executability," but in "at which layer the capability is integrated."&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Ad-hoc Inference vs. Structured Capability
&lt;/h3&gt;

&lt;p&gt;In CLI-based automation, the agent reasons based on a prompt each time to generate and execute necessary commands. Design decisions and service selections tend to rely on the model's internal knowledge and the immediate context.&lt;/p&gt;

&lt;p&gt;In contrast, Agent Plugins for AWS is characterized by explicitly defining the AWS design workflow itself—&lt;strong&gt;Analyze, Recommend, Estimate, Generate, Deploy&lt;/strong&gt;—as extended capabilities of the agent. This is not just a sequence of commands to execute a task; it is a capability that stages the entire design process.&lt;/p&gt;

&lt;p&gt;In other words, it represents a shift from a model that "executes operations thought up on the spot" to one that "internalizes the AWS design process as a structured capability."&lt;/p&gt;

&lt;p&gt;While mechanisms like Claude's Skills or Kiro Powers can indeed grant agents additional specialized knowledge or scripts, they act as modules to enhance behavior or knowledge for specific domains. They do not systematize the entire design workflow.&lt;/p&gt;

&lt;p&gt;"Agent Plugins for AWS" takes a fundamentally different role by officially packaging the sequence from design to execution and providing it as a capability integrated with live data (pricing, documentation, etc.) via MCP.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Inference-Centric vs. Live Data-Connected
&lt;/h3&gt;

&lt;p&gt;While recommendations and generation are possible via CLI execution alone, those judgments tend to rely on the model's internal training data.&lt;/p&gt;

&lt;p&gt;Agent Plugins connects via MCP servers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;awsknowledge&lt;/strong&gt; (Official documentation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;awspricing&lt;/strong&gt; (Real-time pricing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;aws-iac-mcp&lt;/strong&gt; (IaC best practices)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This structure ensures that design recommendations and cost estimates are tied to the latest official information and real-world data. The difference is not "whether it can be done," but "whether the information sources backing the judgment are systematically integrated."&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Operational Automation vs. Domain Capability Expansion
&lt;/h3&gt;

&lt;p&gt;CLI automation is primarily about making "operations" more efficient.&lt;br&gt;
Agent Plugins grants the agent specific AWS domain knowledge, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Service selection logic&lt;/li&gt;
&lt;li&gt;Cost evaluation flows&lt;/li&gt;
&lt;li&gt;IaC output patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This can be viewed as an attempt to "expand the design capability within the AWS domain" rather than just automating command execution.&lt;/p&gt;
&lt;h3&gt;
  
  
  Positioning in the Capability Stack
&lt;/h3&gt;

&lt;p&gt;Structurally, Agent Plugins is positioned in the following layer:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff6af4sw4i8v8c8plv1g5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff6af4sw4i8v8c8plv1g5.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While CLI automation optimizes the "CLI / API Execution Layer," Agent Plugins adds an "AWS Domain Capability" intermediate layer above it. This design elevates the agent's capability stack rather than just adding another tool.&lt;/p&gt;
&lt;h3&gt;
  
  
  Organizational Perspective
&lt;/h3&gt;

&lt;p&gt;CLI automation improves individual efficiency. Agent Plugins standardizes the design workflow. This difference may seem small but becomes significant at the organizational level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reproducibility of designs&lt;/li&gt;
&lt;li&gt;Consistency in cost evaluation&lt;/li&gt;
&lt;li&gt;Uniformity of IaC output&lt;/li&gt;
&lt;li&gt;Reviewable rationales for recommendations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agent Plugins contributes to the "standardization of processes." Therefore, its differentiator is not "replacing the CLI," but "stacking AWS-specific capabilities on top of the CLI." If CLI automation is the "execution foundation," Agent Plugins is the "capability extension layer" built upon it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Basic Structure and Workflow of Agent Plugins
&lt;/h2&gt;

&lt;p&gt;Agent Plugins for AWS is a collection of plugin modules that grant AWS-related functionalities to AI agents. According to the README, the goal is to provide skills that allow AI coding agents to assist with everything from AWS design and deployment to operations.&lt;/p&gt;
&lt;h3&gt;
  
  
  The 5-Step Workflow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Analyze&lt;/strong&gt;: Analyzes source code and project structure to identify frameworks, dependencies, and data stores.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommend&lt;/strong&gt;: Suggests appropriate AWS service configurations and provides the reasoning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Estimate&lt;/strong&gt;: References real-time pricing via the AWS Pricing MCP server to estimate the cost of the recommended setup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate&lt;/strong&gt;: Converts the design into IaC (CDK or CloudFormation).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy&lt;/strong&gt;: Applies the generated IaC to the AWS environment after user approval.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This creates a path for the AI agent to proactively drive the workflow from design through implementation to deployment.&lt;/p&gt;
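
&lt;p&gt;The five steps above can be sketched as a staged pipeline with an approval gate in front of the last stage. The stage names come from the workflow described here; the &lt;code&gt;run_workflow&lt;/code&gt; function, the handler signatures, and the placeholder values are assumptions made for illustration and do not reflect how Agent Plugins is implemented internally.&lt;/p&gt;

```python
from typing import Callable

# Stage names follow the article's five steps; everything else is hypothetical.
STAGES = ["analyze", "recommend", "estimate", "generate", "deploy"]

Handler = Callable[[dict], dict]


def run_workflow(handlers: dict[str, Handler], approve: Callable[[dict], bool]) -> dict:
    """Run stages in order, asking for approval before the deploy stage."""
    state: dict = {"completed": []}
    for stage in STAGES:
        if stage == "deploy" and not approve(state):
            # The user declined: record where the run stopped and do nothing else.
            state["stopped_before"] = stage
            break
        state.update(handlers[stage](state))
        state["completed"].append(stage)
    return state


# Placeholder handlers that only record illustrative values.
handlers: dict[str, Handler] = {
    "analyze": lambda s: {"framework": "example-framework"},
    "recommend": lambda s: {"services": ["example-service"]},
    "estimate": lambda s: {"monthly_cost_usd": 0.0},
    "generate": lambda s: {"iac": "placeholder template"},
    "deploy": lambda s: {"deployed": True},
}

result = run_workflow(handlers, approve=lambda s: False)
print(result["completed"], result.get("stopped_before"))
```

&lt;p&gt;Keeping the approval check outside the individual handlers means the deploy step cannot run unless the caller explicitly says yes, whatever the earlier stages produced.&lt;/p&gt;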
&lt;h2&gt;
  
  
  Real-Data Integration Powered by MCP Servers
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol (MCP) server is the crucial underlying mechanism supporting the utility of Agent Plugins. MCP is a standardized protocol for connecting AI models to external data sources and tools. AWS-side MCP servers provide official documentation, pricing, and best practices.&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://awslabs.github.io/mcp/" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fawslabs.github.io%2Fmcp%2Fimg%2Faws-logo.svg" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://awslabs.github.io/mcp/" rel="noopener noreferrer" class="c-link"&gt;
            Welcome to Open Source MCP Servers for AWS
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            Get started with open source MCP Servers for AWS and learn core features.
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fawslabs.github.io%2Fmcp%2Fimg%2Faws-logo.svg"&gt;
          awslabs.github.io
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;




&lt;h3&gt;
  
  
  Key MCP Servers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;MCP Server&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;awsknowledge&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS documentation, architecture guides, best practices.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;awspricing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real-time AWS pricing information.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;aws-iac-mcp&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IaC (CDK/CloudFormation) best practices.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This allows the agent to refer to the latest live data rather than relying solely on the model's internal knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Value
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) Cloud Migration &amp;amp; Architecture Design Support
&lt;/h3&gt;

&lt;p&gt;In traditional cloud migration, humans had to handle multiple phases: analyzing current setups, selecting services, making cost-based decisions, designing IaC, and deploying.&lt;br&gt;
With Agent Plugins, a single natural-language instruction to the agent automates much of this.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I want to optimize this project for an AWS serverless architecture and deploy it."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This single instruction can lead to recommendations, cost comparisons, IaC, and execution, significantly reducing manual effort and helping keep designs consistent.&lt;/p&gt;

&lt;h3&gt;
  
  
  2) Formalizing Team Knowledge
&lt;/h3&gt;

&lt;p&gt;The tacit knowledge of veteran designers often leads to siloing. Because Agent Plugins outputs the rationale for recommendations, costs, and IaC, knowledge sharing and reviews become much easier. This results in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Transparency in the design decision process.&lt;/li&gt;
&lt;li&gt;Formalization of best practices.&lt;/li&gt;
&lt;li&gt;Reduced learning costs for new members.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3) Integration with CI/CD and Quality Evaluation
&lt;/h3&gt;

&lt;p&gt;Generated IaC and configurations can be integrated directly into CI/CD pipelines.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically validating IaC in pull requests.&lt;/li&gt;
&lt;li&gt;Attaching cost comparison reports to the review stage.&lt;/li&gt;
&lt;li&gt;Linking to automated deployment approval workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Considerations and Risks
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model Errors and Recency&lt;/strong&gt;: As stated in the official README, outputs may contain errors, and all results require human review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security and Permissions&lt;/strong&gt;: Careful design of AWS CLI and IAM settings is essential. Risks increase with excessive permissions; establishing proper approval flows for automated deployments is vital.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Future Outlook
&lt;/h2&gt;

&lt;p&gt;Agent Plugins for AWS is a foundation for evolving AI agents from "explanatory assistants" into "orchestration engines for execution." The underlying MCP servers and ecosystem (Claude, Cursor, etc.) are continuously developing, potentially leading to further automation of cloud operations.&lt;/p&gt;

&lt;p&gt;Furthermore, AWS has announced the preview of the &lt;strong&gt;AWS MCP Server&lt;/strong&gt;, a remote/fully-managed Model Context Protocol server. This suggests a direction where governance—such as authentication/authorization via IAM and log collection via CloudTrail—will be natively supported.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/aws-mcp-server/" rel="noopener noreferrer"&gt;https://aws.amazon.com/about-aws/whats-new/2025/11/aws-mcp-server/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Agent Plugins for AWS is a significant evolution that moves the role of AI from "assistance" to "execution." By providing a foundation based on real-time data, consistent workflows, and reasoned support, it enables both productivity and quality in cloud design, migration, and operations.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>aiops</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>Solving Parenting Pain Points with Generative AI — A Potty-Training Support Device Built with Kiro and M5Stack</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Sat, 30 Aug 2025 08:47:02 +0000</pubDate>
      <link>https://dev.to/aws-builders/solving-parenting-pain-points-with-generative-ai-a-potty-training-support-device-built-with-kiro-3e2o</link>
      <guid>https://dev.to/aws-builders/solving-parenting-pain-points-with-generative-ai-a-potty-training-support-device-built-with-kiro-3e2o</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Potty training can be quite a trial for parents. Our family was no exception—we were stuck. At daycare our child could go frequently, but at home the toilet was boycotted. Sometimes the motivation appeared, but it just wouldn’t stick as a habit. That’s when I thought, “If this is a motivation problem, maybe a little ‘mechanism’ could change the situation,” which became the spark for this project.&lt;/p&gt;

&lt;p&gt;There are plenty of commercial potty-training products, but kids get used to them quickly and the motivation fades. So I decided to make an original device that shows a favorite animal character and praises the child. If the novelty wears off, I can add more animal variations—or switch to vehicles or something else—so the device can keep their interest almost indefinitely.&lt;/p&gt;

&lt;p&gt;It also happened to be a period when I was evaluating Kiro (an AI IDE), and I wanted to test its capabilities. I usually develop web applications and rarely do embedded development with hardware, but I hoped that with AI’s help I could pull it off.&lt;/p&gt;

&lt;p&gt;In the end, I could rely on Kiro far more than expected and let AI handle almost every step. From tech selection and implementation to debugging—and even generating images and audio—what would normally take a few days was finished in about an hour on a Sunday morning. That was astonishing.&lt;/p&gt;

&lt;p&gt;In this post, I’ll review the development process, share what it was like to build with Kiro, highlight where human judgment matters, and reflect on what AI-driven development could look like going forward.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;It’s a “Potty-Training Cheer Device” using an M5Stack Grey. It’s a palm-sized gadget packed with features kids love.&lt;/p&gt;

&lt;p&gt;The usage is simple: when the child succeeds at the toilet, they press a button. Then an animal character (rabbit, penguin, or cat) appears at random on the screen and plays a sound.&lt;/p&gt;

&lt;p&gt;The UI is in Japanese, with easy-to-read hiragana like “といれ できた？” (“Did you use the toilet?”) and “すごい！” (“Great!”). Even if the child hasn’t learned hiragana yet, heart and star symbols are displayed, so the child can react to those as well as to the animals.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fds8atvdrprkydracaggq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fds8atvdrprkydracaggq.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpkhm7808jsjhva6gfjls.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpkhm7808jsjhva6gfjls.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2j9bgt2n87xg04snf68v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2j9bgt2n87xg04snf68v.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzepv19kux7726yzxfkrj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzepv19kux7726yzxfkrj.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Freu8bi1zcch8qmm0nlbl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Freu8bi1zcch8qmm0nlbl.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Functionally, the device automatically counts successes and saves the count to EEPROM, so the record persists even if power is turned off. On special counts—first time, fifth time, tenth time, etc.—a celebratory screen with a rainbow background is shown to help maintain motivation.&lt;/p&gt;
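
&lt;p&gt;The counting and milestone logic can be sketched in a few lines of Python. This is not the device's firmware: a JSON file stands in for the EEPROM, the function names are hypothetical, and the rule for milestones beyond the tenth success (every multiple of five) is an assumption, since the actual rule is not spelled out here.&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path


def is_milestone(count: int) -> bool:
    """Assumed rule: the first success, then every multiple of five."""
    return count == 1 or (count > 0 and count % 5 == 0)


def record_success(store: Path) -> tuple[int, bool]:
    """Load the saved count, add one, save it again, and report milestones."""
    count = json.loads(store.read_text())["count"] if store.exists() else 0
    count += 1
    # Writing after every success keeps the record across restarts,
    # which is the role the EEPROM plays on the real device.
    store.write_text(json.dumps({"count": count}))
    return count, is_milestone(count)


# A temporary file stands in for persistent storage in this sketch.
store = Path(tempfile.mkdtemp()) / "count.json"
for _ in range(5):
    print(record_success(store))
```

&lt;p&gt;On real hardware, writing on every button press also wears the storage a little each time, which is acceptable at the rate a toddler uses the toilet but worth keeping in mind for busier counters.&lt;/p&gt;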

&lt;p&gt;For practicality, I added a volume control, a battery level check, and a long-press reset. The battery check was especially helpful while I was still getting used to the M5Stack; when behavior seemed odd during development, I could quickly assess the state by pressing the left and middle buttons simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1wdxpbhoauvwztot31b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1wdxpbhoauvwztot31b.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc75pannhf98lkkddms9d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc75pannhf98lkkddms9d.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgvb9cmnl0x1rv4rrx9w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgvb9cmnl0x1rv4rrx9w.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I kept the operation as simple as possible, used clear Japanese for the display, and chose catchy electronic sounds—prioritizing usability from a child’s point of view.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building with Kiro
&lt;/h2&gt;

&lt;h3&gt;
  
  
  A Kiro-Led Development Flow
&lt;/h3&gt;

&lt;p&gt;This time I took the reverse of the typical approach. Normally, a human would prepare requirements and a development policy up front, for example in a file like &lt;code&gt;AGENTS.md&lt;/code&gt;. But to test Kiro’s abilities, I intentionally prepared nothing. I started by telling it only that I wanted to make a potty-training device for my child, that I had an M5Stack Grey, and roughly how old my child was.&lt;/p&gt;

&lt;h3&gt;
  
  
  Expected Approach vs. Actual Approach
&lt;/h3&gt;

&lt;p&gt;Kiro is known as an “AI IDE that’s good at planning,” so I expected a structured spec-writing process like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Requirements (requirements.md)&lt;/strong&gt; — user stories and EARS-style acceptance criteria&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design (design.md)&lt;/strong&gt; — architecture, data flow, and rationale for tech choices&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementation plan (tasks.md)&lt;/strong&gt; — stepwise implementation and task breakdown&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In practice, Kiro chose a more hands-on approach rather than waterfall-style documentation.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why the Gap?
&lt;/h4&gt;

&lt;p&gt;I can only speculate about the reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No explicit request to write specs&lt;/strong&gt; — My vague “I want to try Kiro’s capabilities” may not have triggered a formal spec process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prototyping context detected&lt;/strong&gt; — It likely recognized this as a personal, experimental project and avoided heavyweight enterprise-style process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reading my expectations&lt;/strong&gt; — It might have sensed “I want something working quickly,” prioritizing implementation over documents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learned behavior as an AI IDE&lt;/strong&gt; — From similar projects, it may have learned that implementation-first often yields higher satisfaction for solo developers.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Was That the Right Call?
&lt;/h4&gt;

&lt;p&gt;Looking back, it felt appropriate. If we had started with a detailed requirements doc, we might have wasted time on unknowable questions like “a UI kids will love” or “electronic tones that resemble animal calls.” Those are better answered by building and testing.&lt;/p&gt;

&lt;p&gt;That said, we didn’t fully exploit Kiro’s strength in planning. Ideally, we would have moved from a structured spec to a staged implementation to play to that strength.&lt;/p&gt;

&lt;h4&gt;
  
  
  A Pragmatic Documentation Strategy
&lt;/h4&gt;

&lt;p&gt;Instead of large requirements and design docs, Kiro created lean, practical docs incrementally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;setup.md&lt;/strong&gt; — concrete steps for environment setup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;testing.md&lt;/strong&gt; — how to verify behavior and debug&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;voice_creation.md&lt;/strong&gt; — automating voice file generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;image_preparation.md&lt;/strong&gt; — image generation and format conversion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;troubleshooting.md&lt;/strong&gt; — common issues and fixes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This was spot-on for solo prototyping. In a big company, requirements and design docs matter; for a small one-person prototype, “how to set up,” “how to test,” and “what to do when things break” are far more valuable.&lt;/p&gt;

&lt;h4&gt;
  
  
  Agile by Nature
&lt;/h4&gt;

&lt;p&gt;Rather than writing perfect docs first, we organized information as working code emerged. When we needed sounds, we created &lt;code&gt;voice_creation.md&lt;/code&gt;. When the Mac failed to recognize the device, we expanded &lt;code&gt;troubleshooting.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It felt like Kiro understood my development style and context and chose an optimal approach. Whether that was intentional or simply a bias toward implementation is hard to say.&lt;/p&gt;

&lt;h4&gt;
  
  
  Transparency of AI Decisions
&lt;/h4&gt;

&lt;p&gt;One thing I noticed: the reasons behind Kiro’s choices aren’t always clear. I can’t tell whether it chose implementation-first due to nuanced situational judgment or simply because code generation is a strong suit. This opacity can be a challenge in AI-driven development, where a human developer would normally explain their rationale.&lt;/p&gt;

&lt;h3&gt;
  
  
  From First Chat to a Working Device
&lt;/h3&gt;

&lt;p&gt;We started with the vague idea of “a gadget where pressing a button does something.” As Kiro proposed features and I saw them working, more concrete requirements surfaced.&lt;/p&gt;

&lt;p&gt;Kiro built something minimal, I tried it, then I gave feedback. For example, the “smart random” logic—preventing the same character from appearing twice in a row—came from me noticing during testing that kids might get bored without change.&lt;/p&gt;
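
&lt;p&gt;The “smart random” rule is small enough to show in full. The following is a Python sketch of the idea rather than the C++ on the device; &lt;code&gt;pick_next&lt;/code&gt; and the character names in the usage note are hypothetical.&lt;/p&gt;

```python
import random


def pick_next(characters, previous=None, rng=random):
    """Pick a random character, never repeating the previous one when avoidable."""
    candidates = [c for c in characters if c != previous]
    if not candidates:  # only one character exists, so a repeat is unavoidable
        candidates = list(characters)
    return rng.choice(candidates)
```

&lt;p&gt;Calling &lt;code&gt;pick_next(["dog", "cat", "bird"], previous="cat")&lt;/code&gt; returns either &lt;code&gt;"dog"&lt;/code&gt; or &lt;code&gt;"bird"&lt;/code&gt;. Filtering the previous pick out of the candidate list keeps the choice uniform among the rest, which is simpler than re-rolling until the result differs.&lt;/p&gt;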

&lt;p&gt;The long-press reset was also my suggestion, considering I’d be developing on the same device I’d be using in real life.&lt;/p&gt;

&lt;h4&gt;
  
  
  “Think While Building”
&lt;/h4&gt;

&lt;p&gt;Traditional development fixes requirements (to some degree) before implementation. Kiro suggested, “Let’s build the basics first and improve as we go.” That proved highly effective. Seeing something real made it easy to spot improvements like “use larger text here” or “a volume control would help.”&lt;/p&gt;

&lt;p&gt;Especially in solo projects, the motivation to “just make it run” can outweigh careful up-front planning. This approach created a positive loop: get it running fast, then keep improving.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Generation Quality
&lt;/h3&gt;

&lt;p&gt;Kiro generated everything consistently—from PlatformIO settings and C++ for M5Stack to Python scripts for image generation. I didn’t write a single character of code.&lt;/p&gt;

&lt;h4&gt;
  
  
  Library Selection and Setup
&lt;/h4&gt;

&lt;p&gt;It selected and configured appropriate libraries: M5Stack libraries, ESP8266Audio for audio playback, and SPIFFS for the filesystem. The platformio.ini—from board selection to library dependencies—was completed without my intervention.&lt;/p&gt;

&lt;h4&gt;
  
  
  Error Handling
&lt;/h4&gt;

&lt;p&gt;We added fallbacks when asset files were missing and handled audio playback failures—folding in feedback from real-device testing.&lt;/p&gt;

&lt;h4&gt;
  
  
  Memory Management
&lt;/h4&gt;

&lt;p&gt;The implementation accounted for the ESP32’s limited memory: proper freeing of dynamically allocated memory, chunked reads for large files, and other embedded-specific considerations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automated Image &amp;amp; Audio Generation
&lt;/h3&gt;

&lt;p&gt;Automatic generation for images and audio was especially helpful.&lt;/p&gt;

&lt;p&gt;Finding or crafting assets takes time, so automation was practically a must-have. It also matters for future extensibility when adding more assets.&lt;/p&gt;

&lt;h4&gt;
  
  
  Japanese Text as Images
&lt;/h4&gt;

&lt;p&gt;At first I planned to render Japanese text like “といれ できた？” and “すごい！” directly on the M5Stack. But out of the box, M5Stack can’t display Japanese fonts without extra libraries.&lt;/p&gt;

&lt;p&gt;While Kiro initially claimed “this will display Japanese,” it was garbled. I directed it to follow a blog post to support Japanese, but after multiple failed attempts, I pivoted: “Let’s render the text to images instead,” which avoids Japanese font support on-device. Kiro then generated a Python script to produce BMPs that display well on M5Stack. For a personal, kid-facing project, pixel-perfect beauty wasn’t required—the results were readable and “good enough.”&lt;/p&gt;
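
&lt;p&gt;The generated script is not reproduced here. As a rough illustration of the format-conversion half of the job, this standard-library Python sketch encodes rows of RGB pixels as an uncompressed 24-bit BMP. Rasterizing the Japanese text into pixels (for example with an imaging library) is assumed to happen beforehand and is left out, and &lt;code&gt;le&lt;/code&gt; and &lt;code&gt;encode_bmp&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
def le(value, size):
    """Encode an unsigned integer as little-endian bytes."""
    return value.to_bytes(size, "little")


def encode_bmp(pixels):
    """Encode rows of (r, g, b) tuples, top row first, as a 24-bit BMP."""
    height = len(pixels)
    width = len(pixels[0])
    padding = (4 - (width * 3) % 4) % 4  # each row is padded to a multiple of 4 bytes
    body = bytearray()
    for row in reversed(pixels):         # BMP stores the bottom row first
        for r, g, b in row:
            body += bytes((b, g, r))     # channel order in the file is blue, green, red
        body += bytes(padding)
    header_size = 14 + 40                # file header plus BITMAPINFOHEADER
    file_header = b"BM" + le(header_size + len(body), 4) + le(0, 4) + le(header_size, 4)
    info_header = (
        le(40, 4) + le(width, 4) + le(height, 4)
        + le(1, 2) + le(24, 2)           # one colour plane, 24 bits per pixel
        + le(0, 4) + le(len(body), 4)    # no compression, size of the pixel data
        + le(2835, 4) + le(2835, 4)      # about 72 DPI, in pixels per metre
        + le(0, 4) + le(0, 4)            # no palette
    )
    return file_header + info_header + bytes(body)
```

&lt;p&gt;The two details that most often trip up hand-written BMP output are the bottom-up row order and the row padding, which is why both are spelled out in the comments.&lt;/p&gt;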

&lt;h4&gt;
  
  
  Auto-Generated Sounds
&lt;/h4&gt;

&lt;p&gt;I asked for automatically generated animal-like calls—but this fell short of expectations.&lt;/p&gt;

&lt;p&gt;Kiro synthesized plausible effects in Python, but they didn’t quite sound like animal calls. It tried techniques like ADSR envelopes for natural attack/decay and frequency changes tailored to each animal’s “character,” but in the end, the sounds didn’t truly feel like calls. Still, I appreciated the attempt to generate audio mathematically without external libraries.&lt;/p&gt;
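
&lt;p&gt;For readers unfamiliar with the technique, here is a minimal sketch of ADSR-shaped tone synthesis using only Python’s standard library. It is not the script Kiro produced: the sample rate, the envelope timings, and the linear pitch glide are all assumptions chosen for illustration, and the function names are hypothetical. The envelope assumes the duration is longer than attack, decay, and release combined.&lt;/p&gt;

```python
import math
import wave

SAMPLE_RATE = 16000  # Hz; an assumed rate, not taken from the article


def adsr(t, duration, attack=0.02, decay=0.05, sustain=0.6, release=0.1):
    """Amplitude (0.0 to 1.0) of an ADSR envelope at time t seconds."""
    if t >= duration:
        return 0.0
    if t >= duration - release:   # release: fade from the sustain level to silence
        return sustain * (duration - t) / release
    if t >= attack + decay:       # sustain: hold a constant level
        return sustain
    if t >= attack:               # decay: fall from full level to the sustain level
        return 1.0 - (1.0 - sustain) * (t - attack) / decay
    return t / attack             # attack: ramp up from silence


def synth_call(start_hz, end_hz, duration):
    """Return 16-bit samples of a tone that glides from start_hz to end_hz."""
    samples = []
    phase = 0.0
    for n in range(int(SAMPLE_RATE * duration)):
        t = n / SAMPLE_RATE
        freq = start_hz + (end_hz - start_hz) * t / duration  # linear pitch glide
        phase += 2.0 * math.pi * freq / SAMPLE_RATE
        samples.append(int(32767 * adsr(t, duration) * math.sin(phase)))
    return samples


def write_wav(path, samples):
    """Write mono 16-bit PCM samples to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"".join(s.to_bytes(2, "little", signed=True) for s in samples))
```

&lt;p&gt;A single sine wave with a pitch glide and an envelope is about as far as this approach goes, which is consistent with the result above: real animal calls have rich harmonics and noise that a formula this simple does not capture.&lt;/p&gt;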

&lt;p&gt;It also generated spoken lines like “すごい といれ できたね” as audio files.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Natural Development Flow
&lt;/h3&gt;

&lt;p&gt;The loop of “run what Kiro built → test → give feedback” progressed as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Initialize the PlatformIO project&lt;/strong&gt; — M5Stack-specific configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Basic display&lt;/strong&gt; — simple “Hello World”-level verification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audio playback&lt;/strong&gt; — play a simple audio file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image display&lt;/strong&gt; — show Japanese text as images&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Randomization&lt;/strong&gt; — character selection logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence&lt;/strong&gt; — saving success counts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Special features&lt;/strong&gt; — celebration screen and battery check&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The beauty of this flow: you have something working at each stage. If text appears on the M5Stack screen, wiring is likely correct. If sound plays, the speaker and library settings are likely correct.&lt;/p&gt;

&lt;p&gt;Kiro consistently took the “get it working first, then improve it” approach. Rather than chasing perfection, it raised quality step by step—very much like an experienced developer.&lt;/p&gt;

&lt;p&gt;That human-like flow kept collaboration smooth. Kiro didn’t try to create a perfect solution in one shot; it suggested iterative steps. When problems occurred, its behavior resembled an experienced engineer: investigate based on the symptoms I shared, fix the issue, and document it—exactly how you’d want someone to handle a bug ticket.&lt;/p&gt;

&lt;p&gt;Working incrementally with continuous verification felt like pairing with a human developer, giving me a glimpse of a new mode of AI pair programming.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Humans Should Intervene
&lt;/h2&gt;

&lt;p&gt;Not everything should be left to AI. There were several moments where human judgment mattered.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prioritizing Requirements
&lt;/h3&gt;

&lt;p&gt;Kiro proposed many features, but humans must set priorities. We adjusted the implementation order based on child usability and actual need, even though features like Wi-Fi connectivity and cloud sync sounded attractive.&lt;/p&gt;

&lt;p&gt;For potty training, Wi-Fi isn’t essential; reliability and simplicity matter more. Practical choices—handling dead batteries or preventing accidental resets—trumped data sync.&lt;/p&gt;

&lt;p&gt;Humans are best at imagining real usage contexts and balancing technical possibilities with real-world constraints. That role still belongs to us.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardware-Specific Issues
&lt;/h3&gt;

&lt;p&gt;M5Stack units vary from one to another, SPIFFS has size limits, and power management has quirks. These are things you only notice on real hardware. A human needs to observe them and report back to Kiro.&lt;/p&gt;

&lt;p&gt;For example, when a heart image didn’t display, I first suspected the code. It turned out the file may not have been uploaded to SPIFFS correctly. After I shared serial logs and error messages, Kiro proposed stepwise debugging, and we solved it by adding file-existence checks and fallbacks.&lt;/p&gt;
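
&lt;p&gt;The existence check plus fallback boils down to the following pattern. This is a Python sketch of the logic only, not the on-device C++ that reads from SPIFFS, and &lt;code&gt;resolve_asset&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
import os


def resolve_asset(path, fallback_path, exists=os.path.exists):
    """Return the asset to use: the requested file, a fallback file, or None."""
    for candidate in (path, fallback_path):
        if candidate and exists(candidate):
            return candidate
    return None  # the caller should draw a plain placeholder and log the miss
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; instead of raising keeps the device usable when an upload silently fails, while the log line still makes the missing file visible during debugging.&lt;/p&gt;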

&lt;p&gt;This kind of cooperation—human observation on a real device plus AI-suggested fixes—worked very well.&lt;/p&gt;

&lt;h3&gt;
  
  
  Final Design Decisions
&lt;/h3&gt;

&lt;p&gt;Color choices, layout, and sound length—these sensory judgments were made by me. Kiro can implement technically correct solutions, but the question “Will a child enjoy this?” is where human sensibility shines.&lt;/p&gt;

&lt;p&gt;The initial effect sounded like a warning beep—attention-grabbing but not pleasant. Noticing and adjusting that is still a human strength.&lt;/p&gt;

&lt;h2&gt;
  
  
  Productivity Gains
&lt;/h2&gt;

&lt;p&gt;Compared to conventional development, several areas improved dramatically—especially the time from initial research to a working prototype.&lt;/p&gt;

&lt;h3&gt;
  
  
  Faster Research
&lt;/h3&gt;

&lt;p&gt;How to use the M5Stack, set up PlatformIO, and work with ESP8266Audio—Kiro provided the right information instantly, not just API references but project-ready sample code. I could move straight to verification.&lt;/p&gt;

&lt;p&gt;It also covered tricky integration details that usually bite beginners—like memory management when using ESP8266Audio with SPIFFS and settings for the M5Stack’s built-in speaker.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rapid Prototyping
&lt;/h3&gt;

&lt;p&gt;The “get something working” phase was vastly faster. Kiro generated basic functionality in minutes, and I could test on the M5Stack right away.&lt;/p&gt;

&lt;p&gt;Previously, I’d have to hunt for sample code, adapt it, resolve build errors, and so on. This time, I started with runnable code and focused immediately on validating features.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automated Tooling Around the Edges
&lt;/h3&gt;

&lt;p&gt;Image generation, audio generation, format conversion—tasks that normally require separate tools or manual work—were delivered as automation scripts.&lt;/p&gt;

&lt;p&gt;For example, the script that converts Japanese text into M5Stack-friendly BMPs handled font choice, sizing, background color, and format conversion. Otherwise I’d have been manually crafting each image in an editor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;This project made me feel how impactful tools like Kiro can be on development. Especially for solo work and small prototypes, AI can dramatically improve efficiency.&lt;/p&gt;

&lt;p&gt;Hardware projects used to feel daunting—particularly for those without embedded experience. With AI helping from tech selection to implementation, even web engineers can take on hardware.&lt;/p&gt;

&lt;p&gt;At the same time, the importance of human decisions became clearer: defining requirements, prioritizing, judging quality—especially designing from the actual user’s perspective (in this case, a child). With the right division of labor between AI and humans, we can make better things.&lt;/p&gt;

&lt;p&gt;What intrigued me most was not just the technical implementation, but how AI also raised the product’s overall completeness—image generation, voice synthesis, and documentation included. It suggests that even solo developers can ship something reasonably polished, quickly.&lt;/p&gt;

&lt;p&gt;This started as a small, everyday parenting challenge—potty training—but turned into a great hands-on exploration of the possibilities and challenges of AI-driven development. Most importantly, from the very day we finished, our child started going to the toilet. They’re eager to press the button—“Can I press it?”—and it’s clearly boosting motivation. I truly felt how technology can improve parenting.&lt;/p&gt;

&lt;p&gt;As technology advances, the distance from idea to implementation keeps shrinking. Little sparks like this can now take shape in just a few hours over a weekend. That means many more opportunities to unleash individual creativity—and many more diverse, delightful products appearing in the world.&lt;/p&gt;

</description>
      <category>kiro</category>
      <category>m5stack</category>
      <category>iot</category>
      <category>ai</category>
    </item>
    <item>
      <title>Fastest MVP with Amplify × Kiro × Amazon Q Developer: Practical playbook for startups and enterprise innovation teams</title>
      <dc:creator>Kento IKEDA</dc:creator>
      <pubDate>Sat, 09 Aug 2025 05:39:33 +0000</pubDate>
      <link>https://dev.to/aws-builders/fastest-mvp-with-amplify-x-kiro-x-amazon-q-developer-practical-playbook-for-startups-and-aki</link>
      <guid>https://dev.to/aws-builders/fastest-mvp-with-amplify-x-kiro-x-amazon-q-developer-practical-playbook-for-startups-and-aki</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;If your goal is to compress the design → release → improvement loop and ship an MVP as fast as possible, a pragmatic approach in 2025 is to lean on three AWS tools that work well together.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amplify Gen 2

&lt;ul&gt;
&lt;li&gt;A code-first platform where defining requirements in TypeScript (data models / auth / functions) automatically provisions the AWS resources you need.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Kiro

&lt;ul&gt;
&lt;li&gt;AWS’s agentic AI IDE (preview) that runs the pipeline end-to-end: spec → design → coding → tests → documentation.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Amazon Q Developer

&lt;ul&gt;
&lt;li&gt;Your IDE “pair-programmer” for code understanding, doc/test generation, and design guidance.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This article explains how to combine these three to shorten the path to MVP for both startups and enterprise innovation teams.&lt;/p&gt;

&lt;h1&gt;
  
  
  Why center the stack on Amplify Gen 2
&lt;/h1&gt;

&lt;p&gt;Amplify Gen 2 is built around one idea: describe your backend requirements in TypeScript and the infrastructure is provisioned for you. From your laptop, &lt;code&gt;npx ampx sandbox&lt;/code&gt; launches a personal cloud sandbox, and changes under &lt;code&gt;amplify/&lt;/code&gt; are applied as you save (via CDK hot-swap). You also get a full-stack preview for each pull request.&lt;/p&gt;

&lt;p&gt;Key takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the first few days, you can keep the requirements ↔ implementation loop unbroken

&lt;ul&gt;
&lt;li&gt;Use Sandbox + PR previews&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Future headroom is handled with CDK

&lt;ul&gt;
&lt;li&gt;Gen 2 backends are built on CDK, so custom extensions live in the same repository&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Docs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.amplify.aws/react/deploy-and-host/sandbox-environments/setup/" rel="noopener noreferrer"&gt;https://docs.amplify.aws/react/deploy-and-host/sandbox-environments/setup/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.amplify.aws/react/deploy-and-host/fullstack-branching/pr-previews/" rel="noopener noreferrer"&gt;https://docs.amplify.aws/react/deploy-and-host/fullstack-branching/pr-previews/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/ja_jp/amplify/latest/userguide/pr-previews.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/ja_jp/amplify/latest/userguide/pr-previews.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Startup lens — go fast, frugal, and scalable at once
&lt;/h1&gt;

&lt;p&gt;For a small team, being up and running in about 30 minutes is a big deal. A Japanese health-tech startup, KAKEHASHI Inc., publicly documented how they hosted multiple apps on Amplify Console, offloading CI/CD and hosting to AWS and shortening their development cycle. That case study predates Gen 2, but the value of managed hosting combined with CI/CD carries over today.&lt;/p&gt;

&lt;p&gt;On costs, there’s a tailwind: new accounts get up to $200 in Free Tier credits, which materially lowers the cost of learning and early experiments.&lt;/p&gt;

&lt;p&gt;Refs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/jp/blogs/startup/casestudy_kakehashi/" rel="noopener noreferrer"&gt;https://aws.amazon.com/jp/blogs/startup/casestudy_kakehashi/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/jp/about-aws/whats-new/2025/07/aws-free-tier-credits-month-free-plan/" rel="noopener noreferrer"&gt;https://aws.amazon.com/jp/about-aws/whats-new/2025/07/aws-free-tier-credits-month-free-plan/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Enterprise innovation lens — balance speed × governance
&lt;/h1&gt;

&lt;p&gt;If you need multiple PoCs running in parallel across departments, create owner-scoped sandboxes and institutionalize “build and discard.” With &lt;code&gt;ampx sandbox&lt;/code&gt;, every save is applied immediately, so requirement validation moves fast.&lt;/p&gt;

&lt;p&gt;Governance hinges on the fact that Gen 2 backends are CDK-based. That aligns cleanly with Control Tower / GuardDuty and lets you add VPC / PrivateLink / legacy integrations in the same stack. The flip side is that Amplify’s autogenerated resource names can be noisy in audit logs, so define naming conventions and tag standards up front and enforce them in CI.&lt;/p&gt;

&lt;p&gt;Costs follow the startup case: the Free Tier credits are handy for early exploration at the department level.&lt;/p&gt;

&lt;p&gt;Refs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.amplify.aws/react/deploy-and-host/sandbox-environments/setup/" rel="noopener noreferrer"&gt;https://docs.amplify.aws/react/deploy-and-host/sandbox-environments/setup/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/jp/about-aws/whats-new/2025/07/aws-free-tier-credits-month-free-plan/" rel="noopener noreferrer"&gt;https://aws.amazon.com/jp/about-aws/whats-new/2025/07/aws-free-tier-credits-month-free-plan/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Bringing generative AI in: Amplify AI Kit × Bedrock
&lt;/h1&gt;

&lt;p&gt;Amplify AI Kit adds AI routes—Conversation and Generation—as TypeScript definitions. In minutes, you get a front-end scaffold and a Bedrock connection. The design is intentionally TypeScript-first, not a one-off magic CLI incantation.&lt;/p&gt;

&lt;p&gt;Fastest setup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scaffold with &lt;code&gt;npm create amplify@latest&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Reflect changes instantly with &lt;code&gt;npx ampx sandbox&lt;/code&gt; (hot-swap)&lt;/li&gt;
&lt;li&gt;Add AI routes in TypeScript and wire up Bedrock&lt;/li&gt;
&lt;li&gt;Iterate using the preview&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Refs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/jp/blogs/news/build-fullstack-ai-apps-in-minutes-with-the-new-amplify-ai-kit/" rel="noopener noreferrer"&gt;https://aws.amazon.com/jp/blogs/news/build-fullstack-ai-apps-in-minutes-with-the-new-amplify-ai-kit/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.amplify.aws/react/ai/" rel="noopener noreferrer"&gt;https://docs.amplify.aws/react/ai/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.amplify.aws/react/deploy-and-host/sandbox-environments/setup/" rel="noopener noreferrer"&gt;https://docs.amplify.aws/react/deploy-and-host/sandbox-environments/setup/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  “If we have Kiro, do we still need Amplify?” — the three-in-one division of roles
&lt;/h1&gt;

&lt;p&gt;Let’s recap what each tool owns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kiro — draws the blueprint and work plan

&lt;ul&gt;
&lt;li&gt;Spec-driven pipeline that runs requirements → design → coding → tests → docs end-to-end&lt;/li&gt;
&lt;li&gt;MCP support to connect external knowledge bases and tools (preview)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Amplify — the land and utilities to run it

&lt;ul&gt;
&lt;li&gt;Hosting, CI/CD, auth, data, plus Sandbox and PR previews to keep the loop turning&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Amazon Q Developer — your teammate on the ground

&lt;ul&gt;
&lt;li&gt;Code understanding, doc/test generation, design guidance&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;In short: Kiro drives design→code aggressively forward, but where you run it safely is a separate question. Amplify gives you a production-grade cloud execution base quickly, and Q Developer keeps daily improvements flowing. Together, they close the design → release → improvement loop.&lt;/p&gt;

&lt;p&gt;A minimal example flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scaffold: &lt;code&gt;npm create amplify@latest&lt;/code&gt; and write backend requirements in TypeScript&lt;/li&gt;
&lt;li&gt;Instant verify: &lt;code&gt;npx ampx sandbox&lt;/code&gt; and hot-swap on every save&lt;/li&gt;
&lt;li&gt;Review: use full-stack PR previews&lt;/li&gt;
&lt;li&gt;Add AI: define AI routes with Amplify AI Kit in TypeScript and connect Bedrock&lt;/li&gt;
&lt;li&gt;Flesh out in the IDE: firm up specs/tests with Kiro; generate docs/tests with Amazon Q Developer&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Startups can accelerate hypothesis testing with Amplify + AI Kit, and Free Tier credits make new bets cheaper.&lt;/p&gt;

&lt;p&gt;Enterprises can balance speed × governance with owner-scoped sandboxes plus naming/tag standards.&lt;/p&gt;

&lt;p&gt;Kiro (design) × Amplify (execution) × Q Developer (improvement) shortens the distance to MVP. They’re complementary, not substitutes.&lt;/p&gt;

</description>
      <category>kiro</category>
      <category>amplify</category>
      <category>qdeveloper</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
