<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: yoko / Naoki Yokomachi</title>
    <description>The latest articles on DEV Community by yoko / Naoki Yokomachi (@yokomachi).</description>
    <link>https://dev.to/yokomachi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F637404%2F00fb320f-dc2f-4e97-9bd1-8b2b578b2209.jpg</url>
      <title>DEV Community: yoko / Naoki Yokomachi</title>
      <link>https://dev.to/yokomachi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yokomachi"/>
    <language>en</language>
    <item>
      <title>I Built an Issue-Based Claude Code Plugin "cadenza" for Technical Output Creation</title>
      <dc:creator>yoko / Naoki Yokomachi</dc:creator>
      <pubDate>Sat, 09 May 2026 04:03:01 +0000</pubDate>
      <link>https://dev.to/yokomachi/i-built-an-issue-based-claude-code-plugin-cadenza-for-technical-output-creation-37hc</link>
      <guid>https://dev.to/yokomachi/i-built-an-issue-based-claude-code-plugin-cadenza-for-technical-output-creation-37hc</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When I write blog posts or give lightning talks, I often feel that my output isn't quite as engaging as I'd like. Reflecting on it briefly, I think the causes are: (1) trying to cram in too much and ending up wide but shallow, (2) staying at the level of mere introduction without doing my own verification or analysis, and (3) the structure lacking dynamic pacing and feeling flat.&lt;/p&gt;

&lt;p&gt;To address these issues and make my output more effective, efficient, and sustainable, I created a Claude Code plugin. (The contents are simply a collection of Skills, so it can be used with other agents as well.)&lt;/p&gt;

&lt;p&gt;The repository is below. Anyone can install and use it by following the README.md.&lt;br&gt;
&lt;a href="https://github.com/n-yokomachi/cadenza" rel="noopener noreferrer"&gt;https://github.com/n-yokomachi/cadenza&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Plugin Overview
&lt;/h2&gt;

&lt;p&gt;The plugin is named &lt;code&gt;cadenza&lt;/code&gt;, after the musical term "cadenza": a free, often improvised solo passage in a concerto where the soloist showcases their virtuosity. The idea is that while the structured workflow is strictly enforced, the user's free will drives the issue framing and verification.&lt;/p&gt;

&lt;p&gt;cadenza is a Claude Code plugin that divides technical output creation into 5 phases. Each phase functions as a "gate," preventing rushed writing while encouraging users to clarify their questions.&lt;/p&gt;

&lt;p&gt;The fundamental design philosophy is based on issue-first knowledge production theories such as Kazuto Ataka's &lt;em&gt;Issue Driven&lt;/em&gt; (Eiji Press, 2010).&lt;/p&gt;
&lt;h3&gt;
  
  
  The 5-phase structure
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Planning&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/cadenza:issue-finding&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Identify what question is worth answering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Design&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/cadenza:issue-decomposition&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Decompose into sub-issues and design the storyline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Storyboard&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/cadenza:storyboarding&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Decide the "presentation" (code snippets, diagrams, tables, etc.) for each sub-issue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Verification&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/cadenza:analysis-execution&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Implement, measure, and create diagrams according to the storyboard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5. Finishing&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/cadenza:output-crafting&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Generate the Markdown deliverable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In addition, there are 2 skills for review.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/cadenza:output-proofread&lt;/code&gt; — AI-driven exhaustive proofreading (fact-checking + language proofreading)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/cadenza:output-review&lt;/code&gt; — Author-driven review cycle support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upon completion, each skill asks "Shall we proceed to the next phase?" and, if the user approves, invokes the next skill in the chain. The final deliverable of cadenza is a single general-purpose Markdown file at &lt;code&gt;./.cadenza/output.md&lt;/code&gt;, intended as the basis for blog posts or slide decks.&lt;/p&gt;
&lt;h3&gt;
  
  
  Each phase functions as a "gate"
&lt;/h3&gt;

&lt;p&gt;A key feature of this plugin is that each phase checks, on startup, whether its own start conditions are met, that is, whether the upstream phase has been completed. For example, when &lt;code&gt;storyboarding&lt;/code&gt; starts up, it checks whether the Phase 2 (Issue Decomposition) section has been written to &lt;code&gt;./.cadenza/state.md&lt;/code&gt;, and if not, it directs the user to run &lt;code&gt;issue-decomposition&lt;/code&gt; first.&lt;/p&gt;

&lt;p&gt;This structurally prevents phase skipping.&lt;/p&gt;
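
&lt;p&gt;The gate logic can be illustrated with a small Python sketch. cadenza itself is a collection of Skills, so this is not the plugin's implementation, only one way to picture the check. The &lt;code&gt;## Phase N&lt;/code&gt; heading format and the function names &lt;code&gt;completed_phases&lt;/code&gt; and &lt;code&gt;gate&lt;/code&gt; are assumptions made for illustration; the skill names come from the phase table above.&lt;/p&gt;

```python
from pathlib import Path

# Skill name per phase, taken from the phase table in this article.
SKILLS = {
    1: "issue-finding",
    2: "issue-decomposition",
    3: "storyboarding",
    4: "analysis-execution",
    5: "output-crafting",
}


def section_heading(phase):
    # Hypothetical heading format; the real state.md layout may differ.
    return f"## Phase {phase}"


def completed_phases(state_path):
    """Return the set of phases whose section heading appears in state.md."""
    path = Path(state_path)
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {
        phase
        for phase in SKILLS
        if any(line.startswith(section_heading(phase)) for line in lines)
    }


def gate(phase, state_path=".cadenza/state.md"):
    """Return (allowed, message) for starting the given phase."""
    upstream = phase - 1
    # Phase 1 has no upstream; later phases need the previous section recorded.
    if upstream == 0 or upstream in completed_phases(state_path):
        return True, f"OK to run /cadenza:{SKILLS[phase]}"
    return False, f"Run /cadenza:{SKILLS[upstream]} first"
```

&lt;p&gt;Under these assumptions, calling &lt;code&gt;gate(3)&lt;/code&gt; while &lt;code&gt;state.md&lt;/code&gt; contains only a Phase 1 section returns &lt;code&gt;(False, "Run /cadenza:issue-decomposition first")&lt;/code&gt;.&lt;/p&gt;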

&lt;p&gt;Each phase also defines "upstream regression signals." For example, if the storyline starts to drift during the design phase, the plugin guides the user back to the planning phase to reconsider the issue.&lt;/p&gt;
&lt;h3&gt;
  
  
  Enforcing user accountability
&lt;/h3&gt;

&lt;p&gt;As an aside: even though I'm introducing this AI tool for output creation, I still take a critical stance toward output that is simply written by AI alone.&lt;br&gt;
&lt;a href="https://zenn.dev/yokomachi/articles/202512_ai_article_comment" rel="noopener noreferrer"&gt;https://zenn.dev/yokomachi/articles/202512_ai_article_comment&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For this reason, cadenza is designed so that the process cannot be fully delegated to the agent. Mandatory steps such as user confirmation are placed in every phase, so that everything written remains the user's responsibility.&lt;/p&gt;

&lt;p&gt;Roughly, the user-led and agent-led steps are as follows.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase / Skill&lt;/th&gt;
&lt;th&gt;User-led steps&lt;/th&gt;
&lt;th&gt;Agent-led steps&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Phase 1: issue-finding&lt;/td&gt;
&lt;td&gt;Confirm primary information / articulate target audience, problem hypothesis, and post-read change / explicitly consent to the one-line issue&lt;/td&gt;
&lt;td&gt;Survey existing content (WebSearch) / 3-condition check&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase 2: issue-decomposition&lt;/td&gt;
&lt;td&gt;Agree on the decomposition pattern / articulate a hypothesis for each sub-issue / agree on the storyline pattern / pin the claim down to one sentence&lt;/td&gt;
&lt;td&gt;Storyline validity check&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase 3: storyboarding&lt;/td&gt;
&lt;td&gt;Agree on format and specifications for each sub-issue / specify output style&lt;/td&gt;
&lt;td&gt;Storyboard assembly / storyboard review&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase 4: analysis-execution&lt;/td&gt;
&lt;td&gt;Decide whether to regress upstream / run verification / request additional checks beyond the skill's defaults&lt;/td&gt;
&lt;td&gt;Classify verification type / pin premises down / run verification / structure results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase 5: output-crafting&lt;/td&gt;
&lt;td&gt;Select the title (1 from 3 candidates)&lt;/td&gt;
&lt;td&gt;Assemble structure / write TL;DR / write each section / final check&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;output-proofread&lt;/td&gt;
&lt;td&gt;Decide which findings to accept and edit accordingly&lt;/td&gt;
&lt;td&gt;Technical accuracy check / language proofreading / generate proofreading report&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;output-review&lt;/td&gt;
&lt;td&gt;User-led overall (author re-reads, asks questions, issues editing instructions)&lt;/td&gt;
&lt;td&gt;Provide grounded answers to questions / edit only when instructed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The leadership balance in Phase 4 (verification) shifts depending on the type of verification. For Implementation, Measurement, and Reproduction, the agent handles the baseline planning, while the hands-on work (or directing the agent on separate implementation tasks) is user-led. Comparison (research of public information) and Diagramming, on the other hand, are agent-led from start to finish.&lt;/p&gt;

&lt;p&gt;Ideally I'd want a guard that prevents output unless the user demonstrably understands the verification results (for example, the agent quizzing the user on the results and refusing to create the output unless they answer correctly), but I'll leave that to the user's (my own) conscience.&lt;/p&gt;
&lt;h2&gt;
  
  
  How to Use
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;p&gt;The cadenza repository itself functions as a Claude Code marketplace. You can use it just by registering the marketplace and installing the plugin.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude plugin marketplace add github.com/n-yokomachi/cadenza
claude plugin &lt;span class="nb"&gt;install &lt;/span&gt;cadenza@cadenza
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After installation, reload the plugin with &lt;code&gt;/reload-plugins&lt;/code&gt;, then check that cadenza is installed with &lt;code&gt;/plugins&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Launch
&lt;/h3&gt;

&lt;p&gt;Launch Claude Code in the project directory where you want to write a technical output, and start the flow by running &lt;code&gt;/cadenza:issue-finding&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/cadenza:issue-finding
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  State management
&lt;/h3&gt;

&lt;p&gt;cadenza creates a &lt;code&gt;./.cadenza/&lt;/code&gt; directory directly under the working directory and consolidates state and deliverables there.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./.cadenza/
├── state.md          # Consolidates confirmed information from each phase
└── output.md         # Final deliverable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Confirmed information from Phase 1 through Phase 5 is appended to &lt;code&gt;state.md&lt;/code&gt; sequentially. Each skill checks that the upstream phase's section exists in &lt;code&gt;state.md&lt;/code&gt; before proceeding downstream, so phase skipping is structurally prevented.&lt;/p&gt;

&lt;p&gt;Each project directory gets its own &lt;code&gt;./.cadenza/&lt;/code&gt;, so you can work on multiple articles in parallel by using separate directories.&lt;/p&gt;

&lt;h3&gt;
  
  
  Suspend and resume
&lt;/h3&gt;

&lt;p&gt;Since the result of each phase is written out to &lt;code&gt;state.md&lt;/code&gt;, you can resume even if the Claude Code session is disconnected. Running &lt;code&gt;/cadenza:&amp;lt;next phase&amp;gt;&lt;/code&gt; in a new session loads the contents of &lt;code&gt;state.md&lt;/code&gt; and lets you pick up where you left off.&lt;/p&gt;
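
&lt;p&gt;To decide which skill to run when resuming, it is enough to find the first phase that has no section in &lt;code&gt;state.md&lt;/code&gt; yet. The following is a minimal Python sketch of that lookup. The &lt;code&gt;## Phase N&lt;/code&gt; heading format and the helper name &lt;code&gt;next_skill&lt;/code&gt; are assumptions for illustration, not part of cadenza.&lt;/p&gt;

```python
# Phase order and skill names as listed in the 5-phase table.
PHASE_SKILLS = [
    "issue-finding",
    "issue-decomposition",
    "storyboarding",
    "analysis-execution",
    "output-crafting",
]


def next_skill(state_text):
    """Return the command for the first phase not yet recorded, or None if all are done."""
    for number, skill in enumerate(PHASE_SKILLS, start=1):
        # Hypothetical section marker; the real state.md layout may differ.
        if f"## Phase {number}" not in state_text:
            return f"/cadenza:{skill}"
    return None
```

&lt;p&gt;For example, if the text of &lt;code&gt;state.md&lt;/code&gt; contains sections for Phases 1 and 2 only, &lt;code&gt;next_skill&lt;/code&gt; returns &lt;code&gt;/cadenza:storyboarding&lt;/code&gt;.&lt;/p&gt;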

&lt;h2&gt;
  
  
  Bonus: Actual Flow
&lt;/h2&gt;

&lt;p&gt;The following walkthrough shows how cadenza actually behaves at each step of output creation, using the sample topic "Quantitative Comparison of AI Coding Agent Free Tiers as of May 2026." The output content itself is only a sample that I haven't reviewed properly, so please take it with a grain of salt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: Issue Finding
&lt;/h3&gt;

&lt;p&gt;When &lt;code&gt;/cadenza:issue-finding&lt;/code&gt; is launched, it starts with theme selection, hypothesis, and issue framing.&lt;/p&gt;

&lt;p&gt;The following 5 steps run in order.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Lead&lt;/th&gt;
&lt;th&gt;Interaction details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;User&lt;/td&gt;
&lt;td&gt;Confirm primary information → I've used all 7 tools / no experience getting stuck. Selected "Continue as research/survey by an experienced user"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;User&lt;/td&gt;
&lt;td&gt;Articulate target audience / reader's problem hypothesis / post-read change in 1-2 sentences&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;AI&lt;/td&gt;
&lt;td&gt;Survey existing articles via WebSearch → Found that the combination "free-tier-focused × quantitative × Japanese" was not yet covered&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;AI&lt;/td&gt;
&lt;td&gt;All 3 condition checks (essential choice / deep hypothesis / answerable) passed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;User&lt;/td&gt;
&lt;td&gt;Finalize the one-line issue and give explicit consent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The final issue was decided as follows.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;As of May 2026, how should individual developers wanting to try AI coding agents choose the "tool to try first" that fits their use case from the free tiers of 7 major tools (Claude Code / Codex CLI / Cursor / Gemini CLI / GitHub Copilot / Windsurf / Kiro), and what usage patterns should they consider for paid upgrades?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Phase 2: Issue Decomposition
&lt;/h3&gt;

&lt;p&gt;Launching &lt;code&gt;/cadenza:issue-decomposition&lt;/code&gt; enters the process of building the storyline.&lt;/p&gt;

&lt;p&gt;The following 5 steps run.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Lead&lt;/th&gt;
&lt;th&gt;Interaction details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;AI proposal → user agreement&lt;/td&gt;
&lt;td&gt;Decomposition pattern selection. This time, Compare-Select (Options → Criteria → Decision) was proposed → adopted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;User (this time: AI draft → user edit)&lt;/td&gt;
&lt;td&gt;One-line hypothesis for each sub-issue. Since I had already answered "no experience getting stuck / no unique elements" in Phase 1, the AI provided a draft that I adopted as-is&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;AI proposal → user agreement&lt;/td&gt;
&lt;td&gt;Storyline pattern. Sky-Rain-Umbrella, suitable for long-form articles, was proposed → adopted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;User&lt;/td&gt;
&lt;td&gt;Pin the claim down to one sentence. Selected from 3 candidates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;AI-led&lt;/td&gt;
&lt;td&gt;Storyline validity check (5 items). All items passed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The storyline was settled as follows.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;☁️ Sky (fact)&lt;/td&gt;
&lt;td&gt;What does it look like when each tool's free-tier limits are aligned in common units?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🌧️ Rain (interpretation)&lt;/td&gt;
&lt;td&gt;Do the differences in limits stem from each provider's business model? (Does the 3-strategy classification of growth-first / paid conversion / platform infiltration hold up?)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;☂️ Umbrella (action)&lt;/td&gt;
&lt;td&gt;How should we sort the tools by use case (completion / agent / refactoring) into "try first" vs. "paid required"?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The claim is as follows.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Each provider's free tier reflects &lt;strong&gt;3 strategic patterns&lt;/strong&gt; (growth-first / paid conversion / platform infiltration), and the right approach is to choose by matching the reader's use case to each provider's strategic pattern.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Phase 3: Storyboarding
&lt;/h3&gt;

&lt;p&gt;Launching &lt;code&gt;/cadenza:storyboarding&lt;/code&gt; lets you design "how to show it" (code / diagrams / tables / benchmarks, etc.) and "what needs to be verified vs. what won't be" for each sub-issue.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Lead&lt;/th&gt;
&lt;th&gt;Interaction details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1, 2&lt;/td&gt;
&lt;td&gt;AI proposal → user agreement&lt;/td&gt;
&lt;td&gt;Propose the format (presentation) and specifications (content) for each sub-issue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;AI proposal → user agreement&lt;/td&gt;
&lt;td&gt;Adjust to match the output style. This time I specified "drop flat onto Zenn, no diagrams, tables only"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;AI-led&lt;/td&gt;
&lt;td&gt;Organize all verification items and out-of-scope items (what won't be done)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;AI-led&lt;/td&gt;
&lt;td&gt;Storyboard review (5 items). All items passed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The resulting policy was to present each sub-issue as a single table, with no Mermaid diagrams.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Sub-issue&lt;/th&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Specification overview&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;☁️ Sky (sub-issue 1)&lt;/td&gt;
&lt;td&gt;Comparison table&lt;/td&gt;
&lt;td&gt;Rows = 7 tools / Columns = official limits, common-unit conversion (tasks/day), billing trigger, credit card requirement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🌧️ Rain (sub-issue 2)&lt;/td&gt;
&lt;td&gt;3-strategy classification table&lt;/td&gt;
&lt;td&gt;Rows = 7 tools / Columns = strategy pattern, limit generosity, offering type, basis for judgment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;☂️ Umbrella (sub-issue 3)&lt;/td&gt;
&lt;td&gt;Decision matrix + descriptive paragraph&lt;/td&gt;
&lt;td&gt;A 21-cell grid of rows = 3 use cases × columns = 7 tools, marked with ◎ / ○ / △ / ×, followed by a decision-guidance paragraph&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Phase 4: Analysis Execution
&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;/cadenza:analysis-execution&lt;/code&gt;, the actual verification work is executed. This time, since it's just a sample, I scoped everything down to web research only.&lt;/p&gt;

&lt;h4&gt;
  
  
  Verification execution and main findings
&lt;/h4&gt;

&lt;p&gt;Using WebSearch, I gathered the official pricing pages of the 7 tools and the business-model background of each company. Based on my own usage experience, I defined the baseline unit as "1 coding task = 1 file edit + ~5 completions, or 1 agent invocation" and used it to convert each tool's free tier into "tasks/day."&lt;/p&gt;

&lt;p&gt;Main findings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For Tab completion generosity, Windsurf (unlimited) and GitHub Copilot Free (2,000 completions/month) stand out&lt;/li&gt;
&lt;li&gt;For agent-driven use, Gemini CLI (1,000 req/day) is more generous than expected&lt;/li&gt;
&lt;li&gt;Claude Code / Codex CLI's free tiers are essentially zero (Pro at $20/month is required for serious use)&lt;/li&gt;
&lt;li&gt;Refactoring and large-scale tasks exceed the free tier across all providers; a paid plan is required&lt;/li&gt;
&lt;/ul&gt;
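
&lt;p&gt;The "tasks/day" conversion behind these findings is simple arithmetic, sketched below in Python. The 5 completions per task come from the baseline unit above. The 30-day month and the "1 task = 5 requests" ratio for request-based limits are assumptions that match the figures in the sample output, and the function names are illustrative only.&lt;/p&gt;

```python
DAYS_PER_MONTH = 30        # assumption for spreading a monthly quota over days
COMPLETIONS_PER_TASK = 5   # from the baseline unit: about 5 completions per task
REQUESTS_PER_TASK = 5      # assumption for request-based limits


def monthly_completions_to_tasks_per_day(completions_per_month):
    """Convert a monthly completion quota into approximate coding tasks per day."""
    return completions_per_month / COMPLETIONS_PER_TASK / DAYS_PER_MONTH


def daily_requests_to_tasks_per_day(requests_per_day):
    """Convert a daily request quota into approximate coding tasks per day."""
    return requests_per_day / REQUESTS_PER_TASK


copilot = monthly_completions_to_tasks_per_day(2000)
gemini = daily_requests_to_tasks_per_day(1000)
print(f"GitHub Copilot Free: about {copilot:.0f} tasks/day")  # about 13
print(f"Gemini CLI: about {gemini:.0f} tasks/day")            # about 200
```

&lt;p&gt;Given the roughly ±50% accuracy stated for the conversion, these numbers are only meant for order-of-magnitude comparison.&lt;/p&gt;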

&lt;h4&gt;
  
  
  Issue reconsideration (partial)
&lt;/h4&gt;

&lt;p&gt;The 3-strategy names (growth-first / paid conversion / platform infiltration) set in Phase 2 turned out to be inaccurate when checked against the data; "platform on-ramp / pure-tool paid funnel / completion-focused giveaway" proved to be a more persuasive classification.&lt;/p&gt;

&lt;p&gt;cadenza defines 3 types of upstream regression signals (storyline collapse / new issue discovery / format mismatch). Since this finding didn't match any of them — the underlying structure of three strategy patterns held up, only the labels needed updating — I decided that returning to Phase 2 was unnecessary, and updated only the hypothesis names within Phase 4.&lt;/p&gt;

&lt;p&gt;Partial shifts in the storyline driven by data are within the expected range, and cadenza is designed to ask the user whether to handle such shifts by "going all the way back upstream" or "updating in place."&lt;/p&gt;

&lt;h4&gt;
  
  
  Materials to hand off to Phase 5
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Comparison table (7 tools × 4 columns: official limits / task conversion / billing trigger / credit-card requirement)&lt;/li&gt;
&lt;li&gt;3-strategy classification table (revised, with the basis for each tool's classification)&lt;/li&gt;
&lt;li&gt;Decision matrix (3 use cases × 7 tools = 21 cells of ◎/○/△/× judgments)&lt;/li&gt;
&lt;li&gt;Updated claim candidate (version with the 3-strategy labels replaced)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That wraps up Phase 4. Next is Phase 5 (Output Crafting), which generates &lt;code&gt;./.cadenza/output.md&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 5: Output Crafting
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;/cadenza:output-crafting&lt;/code&gt; writes out the final Markdown to &lt;code&gt;./.cadenza/output.md&lt;/code&gt; based on the confirmed information from Phases 1 through 4.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Lead&lt;/th&gt;
&lt;th&gt;Interaction details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;AI-led&lt;/td&gt;
&lt;td&gt;Build the structural skeleton (Title / TL;DR / Background / one section per sub-issue / Conclusion / References)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;AI proposal → user selection&lt;/td&gt;
&lt;td&gt;Propose 3 title candidates; user selects one&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;AI-led&lt;/td&gt;
&lt;td&gt;TL;DR / opening (claim + target audience + post-read change in 3-5 lines)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;AI-led&lt;/td&gt;
&lt;td&gt;Write each sub-issue as its own section. Use the visuals (tables) decided in the Phase 3 storyboard as-is, maintaining storyboard fidelity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;AI-led&lt;/td&gt;
&lt;td&gt;Final check on code / diagrams / personal info. All 7 final confirmation items passed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Output body
&lt;/h4&gt;

&lt;p&gt;The generated &lt;code&gt;output.md&lt;/code&gt; is reproduced verbatim below as a raw Markdown source.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# AI Coding Agent Free Tiers Reflect 3 Strategies: Sorting 7 Tools by Use Case&lt;/span&gt;

&lt;span class="gu"&gt;## TL;DR&lt;/span&gt;

When the free tiers of 7 major AI coding agent tools (Claude Code / Codex CLI / Cursor / Gemini CLI / GitHub Copilot / Windsurf / Kiro) are aligned on a common "tasks/day" unit, each provider's limit design reflects one of 3 business strategies (platform on-ramp / pure-tool paid funnel / completion-focused giveaway). For individual developers, the realistic answer is not to commit to a single tool, but to combine the best free tier for each use case (completion-focused / agent-driven / refactoring or large-scale).

The target audience is individual developers and small-team developers who want to try AI coding agents and quantitatively compare multiple tools before going paid. After reading, you'll be able to gauge the practical value of each tool's free tier in common units, and sort tools that fit your use case into "try first" vs. "free isn't enough — paid required."

&lt;span class="gu"&gt;## Stance of this article&lt;/span&gt;

This comparison is a research / survey-style write-up by an experienced user. The author has actually used all 7 tools, but has no particular experience of "getting stuck" with the free tiers. The purpose is to organize information so that readers about to try them can quantitatively grasp the limits. Read this not as "stories of when I got stuck," but as "a map for those about to try them out."

&lt;span class="gu"&gt;## Defining the common unit: "1 coding task"&lt;/span&gt;

Each provider publishes limits in different units (requests / completions / tokens / premium requests / messages, etc.), so they can't be compared apples-to-apples as-is. This article normalizes them against the following baseline unit.
&lt;span class="gt"&gt;
&amp;gt; **1 coding task = 1 file edit + about 5 completions, or 1 agent invocation**&lt;/span&gt;

Here, "completion" means inline completion (short suggestions accepted via Tab), and "agent invocation" treats chat-based instructions or Edit / Cascade-style operations spanning multiple files as 1 unit. Conversion accuracy is roughly ±50% — the goal is approximate comparison of practical usage, not precision.

&lt;span class="gu"&gt;## Each tool's free tier limits (as of May 2026)&lt;/span&gt;

The following organizes each provider's official pricing page side-by-side.

| Tool | Official limit | Common unit conversion (tasks/day) | Billing trigger | Credit card required |
|--------|-----------|------------------------|------------|-----------|
| Claude Code | Pro at $20/month (annual contract: $17/month) required; not usable on the free tier. Some sources indicate that new API accounts receive about $5 in API credit | About 5 tasks/day (rough estimate, dividing $5 of API credit at Claude Sonnet 4.x mid-tier rates, assuming 5,000-10,000 tokens per task, spread over 30 days) | Pro subscription / API credit depletion | Required for both API and Pro subscription |
| Codex CLI | Codex is included with ChatGPT Free / Go, but specific usage limits for Free / Go must be checked individually in the ChatGPT usage dashboard (not listed in the official pricing table) | Not evaluable (limits aren't public, so can't be quantified) | ChatGPT Plus at $20/month | Optional |
| Cursor (Hobby) | Per publicly available info, 2,000 completions + 50 slow premium model requests/month (not directly extractable from the official pricing page; sourced from review articles) | About 13 tasks/day (completion) + about 1.7 premium/day | Monthly quota depletion → Pro at $20 | Not required |
| Gemini CLI | 1,000 requests/day, 60 requests/minute, about 250,000 tokens/minute (Flash model-centric; Pro model is limited. Specific values are checked individually in the Google AI Studio dashboard) | About 200 tasks/day (1 task = 5 requests) | Daily limit / per-minute token limit | Not required (Google account only) |
| GitHub Copilot Free | Officially listed as 2,000 completions/month + 50 agent mode or chat requests/month | About 13 tasks/day (completion) + about 1.7 agent / chat requests/day | Monthly quota depletion → Pro at $10 | Not required |
| Windsurf (Free) | Public info indicates Tab completion is exempt from the usage quota. Advanced features like Cascade are quota-based (not directly extractable from the official pricing page; sourced from review articles) | Completion: effectively unlimited / advanced: a few times/day | Advanced feature use → Pro at $20 | Not required |
| Kiro (Free) | Officially listed as 50 credits/month + an initial 500-credit bonus (must be used within 30 days), with overage at $0.04/credit | Steady-state: about 1.7 credits/day / Initial bonus: about 17 credits/day (vibe mode 1 = 1 credit, spec mode consumes several credits) | Credit depletion → Pro at $20 / additional $0.04/credit | Not required |
&lt;span class="gt"&gt;
&amp;gt; **Note**: The above values are aggregated from each provider's official pricing pages and review articles as of May 2026. Only GitHub Copilot Free and Kiro publish specific free-tier values directly in their official pricing tables. The others (Claude Code / Codex / Cursor / Gemini CLI / Windsurf) require checking separate pages or usage dashboards, or rely on external review articles for the published limits. Please verify each provider's latest official information before actually trying them.&lt;/span&gt;

The values span a &lt;span class="gs"&gt;**5-10× range**&lt;/span&gt;. The differences are too large to lump together as the same "free tier."

Laid out side-by-side, you can see significant variation in the generosity of free tiers. Anthropic and OpenAI are essentially zero, Google and Microsoft (GitHub) are more generous, Windsurf is unlimited only for completion, and Amazon's Kiro takes an unusual credit-based approach. This isn't a technical constraint — it's &lt;span class="gs"&gt;**a reflection of each provider's business model**&lt;/span&gt;.

&lt;span class="gu"&gt;## 3-strategy classification of free tiers&lt;/span&gt;
&lt;span class="gt"&gt;
&amp;gt; **Note**: The 3-strategy classification below is this article's original framing, not an established industry taxonomy. It groups providers into 3 categories from this article's perspective, based on strategies that each provider has officially expressed (see "judgment basis" below for citations). Readers are welcome to reclassify along other axes (IDE-embedded / CLI / feature maturity, etc.).&lt;/span&gt;

When the free-tier design of the 7 tools is viewed from a strategic perspective using this article's framing, the following 3 patterns emerge.

| Pattern | Name | Characteristics | Tools |
|------|------|------|----------|
| A | Platform on-ramp | Generous free tier draws users into the provider's other services (GitHub / Google Cloud / AWS). Their main business is cloud/platform, and the coding agent serves as an entry point | GitHub Copilot, Gemini CLI, Kiro |
| B | Pure-tool paid funnel | Thin free tier allows trial use, but full use requires a paid plan. The tool itself is the main business, with subscriptions as the primary revenue source | Cursor, Claude Code, Codex CLI |
| C | Completion-focused giveaway | Tab completion is fully released as unlimited; agent and advanced features are gated behind paid tiers. A strategic loss-leader aimed at maximizing IDE adoption | Windsurf |

&lt;span class="gu"&gt;### Each company's judgment basis (official source citation)&lt;/span&gt;

&lt;span class="gs"&gt;**A. Platform on-ramp**&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**GitHub Copilot**&lt;/span&gt;: Microsoft CEO Satya Nadella stated, "Any per user business of ours, whether it's productivity or coding or security, will become a per user and usage business," positioning Copilot as part of Microsoft's company-wide per-user + usage strategy (&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;GitHub Blog: usage-based billing&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;). Copilot Free can be read as the entry point, aiming for stickiness to GitHub accounts and repositories plus monetization through Enterprise integration.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Gemini CLI**&lt;/span&gt;: Google's official blog announcement explicitly states, "industry's largest allowance with 60 model requests per minute and 1,000 requests per day at no charge," and lays out a tiered funnel where additional quota moves to "usage-based billing with Google AI Studio or Vertex AI key" (&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Google Blog: Introducing Gemini CLI&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://blog.google/technology/developers/introducing-gemini-cli-open-source-ai-agent/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;). It's designed as the entry point for Cloud migration: Free → Standard → Vertex AI Enterprise.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Kiro**&lt;/span&gt;: AWS positions Kiro as the successor to Amazon Q Developer and has announced that no new Q Developer Free Tier accounts will be created going forward (&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;AWS Blog: Q Developer end-of-support&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://aws.amazon.com/blogs/devops/amazon-q-developer-end-of-support-announcement/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;). Signing in with an AWS Builder ID enables direct integration with Amazon Q and the broader AWS ecosystem (&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Kiro Authentication docs&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://kiro.dev/docs/getting-started/authentication/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;), which reads as a play for AWS developer acquisition.

&lt;span class="gs"&gt;**B. Pure-tool paid funnel**&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**Cursor**&lt;/span&gt;: Anysphere has reached &lt;span class="gs"&gt;**$1B in annualized revenue with over 1 million paying users**&lt;/span&gt; on Cursor alone, with the $20 Pro subscription as its revenue mainstay. Hobby Free is positioned as an evaluation tier: "a real comparison... most developers who give it a serious two-week test either upgrade to Pro or decide the tool is not for them" (interpretation based on public benchmarks + review articles).
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Claude Code**&lt;/span&gt;: Anthropic Head of Growth Amol Avasare publicly explained the bundling strategy with Pro/Max: "Max launched a year prior, it didn't include Claude Code, and the company later bundled Claude Code into Max after it took off" (&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;reported by The Register&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://www.theregister.com/2026/04/22/anthropic_removes_claude_code_pro/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;). Claude Code is positioned as a lever for boosting subscription engagement.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Codex CLI**&lt;/span&gt;: OpenAI's official blog states, "Codex is included with ChatGPT Plus, Pro, Business, and Enterprise plans—no separate subscription needed" (&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;OpenAI: Introducing Codex&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://openai.com/index/introducing-codex/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;), and signing in to Plus / Pro also grants free API credit ($5/$50). The tight Free / Go limits can be read as a deliberate push toward the ChatGPT subscription.

&lt;span class="gs"&gt;**C. Completion-focused giveaway**&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**Windsurf**&lt;/span&gt;: Cognition (Devin's parent company) acquired Windsurf for about $250M in December 2025. Cognition CEO Scott Wu laid out the strategy in an official blog post: "start by integrating Cognition's autonomous AI-powered engineer Devin into Windsurf's IDE," and "developers can plan tasks in Windsurf and launch a team of Devins" (&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Cognition Blog: Windsurf acquisition&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://cognition.ai/blog/windsurf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;). The Free plan's unlimited Tab completion accelerates IDE adoption, while monetization comes from advanced features (Cascade / Devin integration).

Once the strategy patterns are clear, it becomes apparent that even identical-looking numbers such as 2,000 completions/month &lt;span class="gs"&gt;**play opposite roles depending on the strategy**&lt;/span&gt;. For GitHub Copilot the allowance is "an entry point for user retention (a steady allowance to keep users in the GitHub ecosystem long-term)," while for Cursor it is "a cutoff line before paid (a mechanism to push serious users toward Pro at $20/month)."

&lt;span class="gu"&gt;## Sorting by use case × tool&lt;/span&gt;

Based on the strategy patterns, the table below judges how far the 7 tools can be pushed across 3 typical use cases.

| Use case | Claude Code | Codex CLI | Cursor | Gemini CLI | GitHub Copilot | Windsurf | Kiro |
|------|-------------|-----------|--------|------------|----------------|----------|------|
| Completion-focused (Tab completion as main) | × | × | △ | ○ | ◎ | ◎ | △ |
| Agent-driven (delegating tasks) | △ | × | △ | ◎ | △ | △ | ○ |
| Refactoring / large-scale (multi-file editing) | × | × | × | △ | × | △ | × |

Legend: ◎ try first / ○ worth trying / △ paid required / × skip

&lt;span class="gu"&gt;### Reading by use case&lt;/span&gt;

&lt;span class="gs"&gt;**Completion-focused users**&lt;/span&gt; (typing in the IDE while heavily using Tab completion) have &lt;span class="gs"&gt;**two clear winners: GitHub Copilot Free and Windsurf**&lt;/span&gt;. Copilot Free offers 2,000 completions/month, while Windsurf has fully unlimited Tab. Copilot brings strong integration with the GitHub ecosystem, while Windsurf stands out for the polish of the IDE itself. Gemini CLI's 1,000 req/day is hard to dismiss for completion use, but being CLI-based, it's a fundamentally different experience from in-IDE completion. Cursor and Kiro deplete quota quickly under completion-focused use, and Claude Code and Codex CLI don't target completion as their primary use case (both lean agent).

&lt;span class="gs"&gt;**Agent-driven users**&lt;/span&gt; (delegating tasks via chat, auto-editing multiple files) will find that, perhaps surprisingly, &lt;span class="gs"&gt;**Gemini CLI Free is the strongest option**&lt;/span&gt;. Even though it's Flash-model-centric, 1,000 requests/day is plenty to try agent tasks, and being usable with just a Google account is a big plus. Kiro also suits agent delegation in spec mode, but burns through credits quickly under its credit-based system. Claude Code has high-quality agent design, but with a free tier of essentially zero, Pro is required for serious use. Cursor's 50 premium requests/month is insufficient for trying agent-driven workflows, and Codex CLI's free-tier limits aren't public, so it falls outside the scope of evaluation here.

&lt;span class="gs"&gt;**Refactoring / large-scale users**&lt;/span&gt; (structural changes spanning multiple files, heavy editing) unfortunately face &lt;span class="gs"&gt;**structurally insufficient free tiers across the board**&lt;/span&gt;. Cursor's 50 premium runs out in a few days, as do Kiro's 50 credits and Copilot Free's 50 agent / chat requests. Windsurf's Cascade is also limited to a few uses per day on the free tier. Gemini CLI's 1,000 requests/day is theoretically generous, but keeping multi-file editing within 5 requests per task is practically infeasible, and the quota gets consumed regardless. &lt;span class="gs"&gt;**For serious use of AI agents in this category, a paid upgrade to one of the tools should be assumed from the outset.**&lt;/span&gt;

&lt;span class="gu"&gt;## Conclusion&lt;/span&gt;

The fact that free-tier strategies divide into 3 patterns gives the chooser a clear guideline: &lt;span class="gs"&gt;**"match your use case to each provider's strategy pattern."**&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**Completion-focused trial**&lt;/span&gt; → Pattern A (GitHub Copilot Free / Windsurf) is enough. You get the benefits of the platform on-ramp while staying within the free tier
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Agent-driven trial**&lt;/span&gt; → Short-term verification with Pattern A (Gemini CLI Free, or Kiro's initial bonus)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Refactoring / large-scale serious use**&lt;/span&gt; → A paid upgrade to one of the Pattern B tools (Cursor Pro / Claude Code Pro) is required

In other words, &lt;span class="gs"&gt;**individual developers shouldn't commit to a single tool but should combine free tiers by use case**&lt;/span&gt; — that's the realistic answer as of May 2026. Once you read each provider's strategy against your own use case, the right timing for going paid also becomes apparent on its own.

&lt;span class="gu"&gt;## Aside: OSS BYO API tools&lt;/span&gt;

While outside the scope of this article, the following OSS tools are worth knowing about as a separate category — &lt;span class="gs"&gt;**"the tool itself is $0 since it's OSS, but you pay separately for LLM API usage"**&lt;/span&gt;.
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**Cline**&lt;/span&gt;: A VS Code extension; an OSS tool whose popularity is rapidly rising. You bring your own API keys for Anthropic / OpenAI / Google, etc.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Aider**&lt;/span&gt;: A terminal-CLI-based OSS tool, aimed at power users
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Continue**&lt;/span&gt;: OSS that runs as an extension for VS Code / JetBrains

These follow a "tool free, API at cost" model, so they don't fit on this article's "quantitative comparison of free tiers" axis. They are, however, an option for developers who don't want to pay a fixed monthly subscription fee or who prefer to manage billing themselves.

&lt;span class="gu"&gt;## References&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Claude Code Pricing (Anthropic)&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://claude.com/pricing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Codex Pricing (OpenAI Developers)&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://developers.openai.com/codex/pricing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Cursor Models &amp;amp; Pricing&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://cursor.com/docs/models-and-pricing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Gemini CLI Quotas (Google AI)&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://ai.google.dev/gemini-api/docs/rate-limits&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;GitHub Copilot Plans&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://github.com/features/copilot/plans&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Windsurf Pricing&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://windsurf.com/pricing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Kiro Pricing&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://kiro.dev/pricing/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>ai</category>
      <category>claude</category>
      <category>writing</category>
      <category>documentation</category>
    </item>
    <item>
      <title>Improving and Validating Multi-Agent Prompts with Bedrock AgentCore Optimization</title>
      <dc:creator>yoko / Naoki Yokomachi</dc:creator>
      <pubDate>Mon, 04 May 2026 07:22:57 +0000</pubDate>
      <link>https://dev.to/aws-builders/improving-and-validating-multi-agent-prompts-with-bedrock-agentcore-optimization-4052</link>
      <guid>https://dev.to/aws-builders/improving-and-validating-multi-agent-prompts-with-bedrock-agentcore-optimization-4052</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article is an AI-assisted translation of a Japanese technical article.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In April 2026, Amazon Bedrock AgentCore added a new capability called &lt;strong&gt;Optimization&lt;/strong&gt;, which takes real agent traces and proposes prompt improvements based on them.&lt;br&gt;
&lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/05/bedrock-agentcore-optimization-preview/" rel="noopener noreferrer"&gt;https://aws.amazon.com/about-aws/whats-new/2026/05/bedrock-agentcore-optimization-preview/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article, I apply AgentCore Optimization to a Strands Agents-as-Tools setup (a main agent that wraps sub-agents as &lt;code&gt;@tool&lt;/code&gt;s) and walk through what actually happens. What kind of improvements does Recommendations propose? Does the change hold up under real traffic in an A/B test? And how does it feel to put this into operation? Those are the questions I tried to answer.&lt;/p&gt;
&lt;h2&gt;
  
  
  Inside AgentCore Optimization
&lt;/h2&gt;

&lt;p&gt;Let me start by laying out what Optimization actually consists of.&lt;/p&gt;
&lt;h3&gt;
  
  
  The three capabilities
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Recommendations&lt;/td&gt;
&lt;td&gt;Takes real trace logs plus a target Evaluator as input, and has an AI generate improved versions of system prompts and tool descriptions. Instead of you iterating manually, Recommendations does the iteration for you.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Configuration bundles&lt;/td&gt;
&lt;td&gt;Externalizes prompts and tool descriptions out of source code and version-manages them on the AgentCore side. You can change agent behavior just by swapping the bundled values — no code change, no redeploy. Also used to run two settings side by side in the A/B test described below.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A/B testing&lt;/td&gt;
&lt;td&gt;Routes real traffic via AgentCore Gateway between two variants (control / treatment), scoring each side with an Evaluator. You can compare which prompt actually performs better in production, with statistical backing.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The official docs describe these three as a "continuous improvement loop": Recommendations generates an improved version → Configuration bundles version-controls it → A/B testing validates the effect under real traffic. The three capabilities are designed to cycle.&lt;/p&gt;
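
&lt;p&gt;As a rough illustration of what "statistical backing" can mean for a success-rate metric, here is a minimal sketch of a two-proportion z-test in plain Python. This is a generic textbook technique, not a description of how AgentCore computes its results. The function name &lt;code&gt;two_proportion_z_test&lt;/code&gt; and the counts are hypothetical and are not measurements from this article.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for the difference of two success rates."""
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    # Pooled success rate under the assumption that both variants are equal.
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, not results from this article: 10/20 vs 18/20 successes.
z, p = two_proportion_z_test(10, 20, 18, 20)
print(round(z, 2), round(p, 4))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With only a few dozen sessions per variant, the normal approximation is coarse, and the sketch divides by zero when every session in both variants succeeds or fails, so treat it as a way to build intuition rather than a decision rule.&lt;/p&gt;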
&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Following the official docs, the setup requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An agent built with Strands Agents&lt;/li&gt;
&lt;li&gt;Deployed to AgentCore Runtime with Observability enabled&lt;/li&gt;
&lt;li&gt;CloudWatch Transaction Search enabled&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Building the test setup
&lt;/h2&gt;

&lt;p&gt;For the experiment I built a multi-agent setup with Strands Agents — a main agent that delegates to specialized sub-agents for weather and news, wired together with the Agents-as-Tools pattern.&lt;/p&gt;

&lt;p&gt;The repo:&lt;br&gt;
&lt;a href="https://github.com/n-yokomachi/agentcore-optimization-lab" rel="noopener noreferrer"&gt;https://github.com/n-yokomachi/agentcore-optimization-lab&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Configuration bundle structure
&lt;/h3&gt;

&lt;p&gt;To make a setup A/B-testable, prompts and tool descriptions need to be externalized in &lt;code&gt;configBundles&lt;/code&gt; inside &lt;code&gt;agentcore.json&lt;/code&gt;. The bundle structure I ended up with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"components"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"{{runtime:agentsAsToolsLab}}"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"configuration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"systemPrompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are an assistant that answers questions about weather and news."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"weather_agent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Get weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"news_agent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Get news"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A note on the prompts: I deliberately wrote them quite carelessly so the impact of Recommendations would be easy to see.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;{{runtime:agentsAsToolsLab}}&lt;/code&gt; is an agentcore CLI placeholder; it gets resolved to the actual Runtime ARN at deploy time.&lt;/p&gt;

&lt;p&gt;One quirk: the tool descriptions (&lt;code&gt;weather_agent&lt;/code&gt; / &lt;code&gt;news_agent&lt;/code&gt;) sit directly under &lt;code&gt;configuration&lt;/code&gt; as flat siblings. This shape matches how the Recommendations API resolves the tool description path. The default structure that the AgentCore CLI generates with &lt;code&gt;--with-config-bundle&lt;/code&gt; (which nests them under &lt;code&gt;toolDescriptions&lt;/code&gt;) didn't resolve correctly for tool description Recommendations, so I flattened it and that worked.&lt;/p&gt;
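
&lt;p&gt;As a rough illustration of that flattening step, here is a minimal sketch in plain Python. It assumes the generated default keeps the descriptions in a &lt;code&gt;toolDescriptions&lt;/code&gt; mapping next to &lt;code&gt;systemPrompt&lt;/code&gt;. The helper name &lt;code&gt;flatten_tool_descriptions&lt;/code&gt; is hypothetical and is not part of the AgentCore CLI or SDK.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def flatten_tool_descriptions(configuration):
    """Return a copy with nested tool descriptions lifted to flat siblings."""
    flat = dict(configuration)
    # Assumed nested key: remove it from the copy and lift its entries.
    nested = flat.pop("toolDescriptions", {})
    for tool_name, description in nested.items():
        # Keep an existing flat key rather than silently overwriting it.
        flat.setdefault(tool_name, description)
    return flat


nested_config = {
    "systemPrompt": "You are an assistant that answers questions about weather and news.",
    "toolDescriptions": {"weather_agent": "Get weather", "news_agent": "Get news"},
}
flat_config = flatten_tool_descriptions(nested_config)
print(sorted(flat_config))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The function returns a new dictionary and leaves the input untouched, so the nested original can be kept around for comparison.&lt;/p&gt;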

&lt;p&gt;Adding the bundle definition and deploying are both done through the AgentCore CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore add config-bundle
agentcore deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Wiring the bundle into the agent
&lt;/h3&gt;

&lt;p&gt;To inject bundle values into the Runtime dynamically, we use Strands' hook mechanism. The &lt;code&gt;ConfigBundleHook&lt;/code&gt; class overrides the main agent's system prompt at &lt;code&gt;BeforeInvocationEvent&lt;/code&gt; and each tool's description at &lt;code&gt;BeforeToolCallEvent&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ConfigBundleHook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HookProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register_hooks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HookRegistry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BeforeInvocationEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_inject_system_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_override_tool_description&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_inject_system_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BeforeInvocationEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BedrockAgentCoreContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_config_bundle&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;systemPrompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_override_tool_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BedrockAgentCoreContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_config_bundle&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;override&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;override&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;selected_tool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;spec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;selected_tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_spec&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;spec&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;override&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This Hook class is based on the template the AgentCore CLI generates with &lt;code&gt;--with-config-bundle&lt;/code&gt;. Because I flattened the bundle structure, the tool description lookup (&lt;code&gt;config.get(event.tool_use["name"])&lt;/code&gt;) is simpler than the generated default.&lt;/p&gt;
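
&lt;p&gt;One way to sanity-check the flat lookup without deploying is to run the same logic against stand-in objects. The sketch below is only a local test harness: &lt;code&gt;override_tool_description&lt;/code&gt; and the &lt;code&gt;SimpleNamespace&lt;/code&gt; stand-ins are hypothetical and merely mimic the attributes the hook reads (&lt;code&gt;tool_use&lt;/code&gt;, &lt;code&gt;selected_tool&lt;/code&gt;, &lt;code&gt;tool_spec&lt;/code&gt;). It does not use the real Strands event classes or the AgentCore context.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from types import SimpleNamespace


def override_tool_description(config, event):
    """Same lookup-and-overwrite logic as the hook, minus the Strands types."""
    override = config.get(event.tool_use["name"])
    if override and event.selected_tool:
        spec = event.selected_tool.tool_spec
        if spec and "description" in spec:
            spec["description"] = override


# Stand-ins for the bundle and for a tool-call event (hypothetical test doubles).
bundle = {"weather_agent": "Get the current weather for a city."}
tool = SimpleNamespace(tool_spec={"name": "weather_agent", "description": "Get weather"})
event = SimpleNamespace(tool_use={"name": "weather_agent"}, selected_tool=tool)

override_tool_description(bundle, event)
print(tool.tool_spec["description"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A tool whose name is missing from the bundle keeps its original description, because &lt;code&gt;config.get&lt;/code&gt; returns &lt;code&gt;None&lt;/code&gt; and the override branch is skipped.&lt;/p&gt;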

&lt;h2&gt;
  
  
  Running Recommendations and the A/B test
&lt;/h2&gt;

&lt;p&gt;For the experiment I generated trace logs from 8 English queries × 5 rounds = 40 sessions, then ran both system-prompt and tool-description Recommendations against the agent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore run recommendation &lt;span class="nt"&gt;--type&lt;/span&gt; system-prompt
agentcore run recommendation &lt;span class="nt"&gt;--type&lt;/span&gt; tool-description
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Recommendations on the system prompt
&lt;/h3&gt;

&lt;p&gt;The original system prompt and the Recommendations output are both visible in the AWS Console. The improved prompt now factors in tool calling — phrases like "call both tools in parallel" and "use news_agent to find related news" appear in the suggestion.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjdkk6w49xzsqiw9jmof.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjdkk6w49xzsqiw9jmof.png" alt=" " width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Recommendations on the tool descriptions
&lt;/h3&gt;

&lt;p&gt;The before/after for tool descriptions is visible in the same way. The descriptions are filled out more thoroughly, and they explicitly call out the possibility of parallel use with the other sub-agent — phrases like "Often used alongside news_agent" and "Often used alongside weather_agent".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8ambufewxhojes95gz0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8ambufewxhojes95gz0.png" alt=" " width="800" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Validating the effect with an A/B test
&lt;/h3&gt;

&lt;p&gt;To verify that the Recommendations output actually moves the needle, I ran an A/B test as well.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Control variant (C): bundle version with the human-authored prompt and tool descriptions&lt;/li&gt;
&lt;li&gt;Treatment variant (T1): bundle version with the Recommendations output applied&lt;/li&gt;
&lt;li&gt;Traffic split: 50/50 (sticky session-to-variant assignment by session ID)&lt;/li&gt;
&lt;li&gt;Online Evaluator: &lt;code&gt;Builtin.GoalSuccessRate&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Traffic volume: 8 queries × 5 rounds = 40 sessions&lt;/li&gt;
&lt;/ul&gt;
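
&lt;p&gt;How the Gateway implements sticky assignment internally is not something I have verified. One common way to get a stable split is to hash the session ID into a bucket, and the sketch below shows that idea under that assumption. The function name &lt;code&gt;assign_variant&lt;/code&gt; and the &lt;code&gt;treatment_percent&lt;/code&gt; parameter are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import uuid


def assign_variant(session_id, treatment_percent=50):
    """Map a session ID to "C" or "T1" deterministically."""
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "T1" if bucket in range(treatment_percent) else "C"


sid = str(uuid.uuid4())
# The same session ID always lands on the same variant.
print(assign_variant(sid) == assign_variant(sid))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the variant depends only on the session ID, every request in a session sees the same prompt, which keeps a multi-turn conversation from mixing control and treatment behavior.&lt;/p&gt;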

&lt;p&gt;To run the A/B test you need an HTTP Gateway and an Online evaluation config. The HTTP Gateway has to be added by hand to &lt;code&gt;httpGateways&lt;/code&gt; in &lt;code&gt;agentcore.json&lt;/code&gt; (no &lt;code&gt;add&lt;/code&gt; subcommand seems to exist for it at the moment). The Online evaluation config is added with &lt;code&gt;agentcore add online-eval&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"httpGateways"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"agentsAsToolsLabGateway"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"runtimeRef"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"agentsAsToolsLab"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore add online-eval
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then add the A/B test itself and register everything in one go with deploy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore add ab-test
agentcore deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Traffic generation is done by POSTing to the AgentCore Gateway URL with SigV4 auth. &lt;code&gt;agentcore invoke&lt;/code&gt; hits the Runtime directly, so for the A/B test we have to go through the Gateway URL. Here's the script I used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;GATEWAY_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://agentsastoolslabgateway-XXXXX.gateway.bedrock-agentcore.us-west-2.amazonaws.com/agentsAsToolsLab/invocations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;credentials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get_credentials&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;sid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AWSRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;GATEWAY_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Amzn-Bedrock-AgentCore-Runtime-Session-Id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="nc"&gt;SigV4Auth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-agentcore&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-west-2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;add_auth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;http_req&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GATEWAY_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urlopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http_req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;180&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;sid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
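
&lt;p&gt;Each call to &lt;code&gt;invoke_one&lt;/code&gt; creates a fresh session ID, so generating traffic is just a matter of calling it once per query. A driver loop could look roughly like the sketch below; &lt;code&gt;run_traffic&lt;/code&gt;, the worker count, and the sample queries are hypothetical and not part of the script above.&lt;/p&gt;

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_traffic(invoke, queries, workers=4):
    """Call invoke(query) once per query and tally the HTTP status codes.

    invoke is expected to return a (session_id, status) tuple, like invoke_one.
    """
    statuses = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _sid, status in pool.map(invoke, queries):
            statuses[status] += 1
    return statuses


if __name__ == "__main__":
    # Stand-in for invoke_one so the sketch runs without AWS credentials.
    def fake_invoke(query):
        return "session-for-" + query, 200

    print(run_traffic(fake_invoke, ["weather in Tokyo", "latest AI news"]))
```

&lt;p&gt;Passing the invoke function in as an argument keeps the loop testable offline. Note that &lt;code&gt;urlopen&lt;/code&gt; raises on non-2xx responses, so in this sketch a failed request would propagate as an exception rather than show up in the tally.&lt;/p&gt;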



&lt;p&gt;The A/B test results are visible in the AWS Console under "Bedrock AgentCore &amp;gt; Optimizations &amp;gt; A/B Tests".&lt;/p&gt;

&lt;p&gt;Here are the numbers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sessions routed to control&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;Number of sessions routed to the control variant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sessions routed to variant&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;Number of sessions routed to the treatment variant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control average (Goal Success Rate)&lt;/td&gt;
&lt;td&gt;0.48&lt;/td&gt;
&lt;td&gt;Mean Goal Success Rate of the control variant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Variant average&lt;/td&gt;
&lt;td&gt;0.53&lt;/td&gt;
&lt;td&gt;Mean Goal Success Rate of the treatment variant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Variant improvement&lt;/td&gt;
&lt;td&gt;Not significant: +10.5% (p=0.95)&lt;/td&gt;
&lt;td&gt;Treatment shows a +10.5% improvement over control, but not statistically significant (p&amp;gt;0.05)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9urda45vfdrcrzstbrgv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9urda45vfdrcrzstbrgv.png" alt=" " width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Directionally, the treatment is ahead by 5 points absolute (0.48 to 0.53, or +10.5% relative). That is consistent with the Recommendations output helping, but with only 40 sessions the difference is nowhere near statistically significant (p=0.95), so it could just as easily be noise. Since the original goal of confirming that Recommendations works end to end is met, and going further would start to hurt my wallet, I'm cutting the experiment off here.&lt;/p&gt;
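
&lt;p&gt;For a rough sense of why 40 sessions is too few, here is a sketch of a two-proportion z-test. It assumes Goal Success Rate is a per-session pass/fail and that the averages correspond to 10 of 21 and 10 of 19 successes. Both are assumptions on my part, and this is not necessarily the test the console runs, so its p-value need not match the one in the table.&lt;/p&gt;

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (absolute lift, relative lift, two-sided p-value) for B versus A."""
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, rate_b / rate_a - 1, p_value


if __name__ == "__main__":
    # Assumed counts: 10/21 is about 0.48 (control), 10/19 is about 0.53 (treatment).
    absolute, relative, p_value = two_proportion_z_test(10, 21, 10, 19)
    print(f"absolute lift: {absolute:+.3f}")
    print(f"relative lift: {relative:+.1%}")
    print(f"two-sided p:   {p_value:.2f}")
```

&lt;p&gt;With these assumed counts the relative lift works out to about +10.5%, in line with the table, while the p-value stays far above 0.05.&lt;/p&gt;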

&lt;h2&gt;
  
  
  Where to draw the line with Recommendations
&lt;/h2&gt;

&lt;p&gt;This is just from this experiment, but if I sort the improvement patterns Recommendations produced, I think the natural division of labor between Recommendations and the developer looks something like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Owner&lt;/th&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Recommendations&lt;/td&gt;
&lt;td&gt;Mention of parallel calls, naming of related elements, multilingual support callouts, response format directives, safety mechanisms, proactive behavior&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer&lt;/td&gt;
&lt;td&gt;Domain context, business logic, data interpretation policy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;So when you put Recommendations into your operational loop, the parts you (the human) still need to write are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Domain-specific context (specific customer business processes, external API specs, etc.)&lt;/li&gt;
&lt;li&gt;Business logic (output constraints, compliance, billing rules, etc.)&lt;/li&gt;
&lt;li&gt;Data interpretation policy (e.g. "when this field is empty, treat it as X")&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For everything else — the "general patterns of good prompt writing" — it might be reasonable to let Recommendations handle it. That's the takeaway for me from this experiment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap-up
&lt;/h2&gt;

&lt;p&gt;So that was a hands-on look at AgentCore Optimization on an Agents-as-Tools setup. The takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recommendations extracts general patterns like parallel invocation, tangential topic handling, response format, and safety mechanisms&lt;/li&gt;
&lt;li&gt;A boundary becomes visible between what humans should write (domain context, business logic) and what we can hand off to Recommendations&lt;/li&gt;
&lt;li&gt;The A/B testing capability and its outputs are confirmed working, but at this experiment's scale the sample size isn't enough for significance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. I hope this is useful for anyone planning to try Optimization themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: Japanese system prompts getting misflagged as prompt injection?
&lt;/h2&gt;

&lt;p&gt;When I ran the system prompt Recommendation with a Japanese prompt like &lt;code&gt;--inline "あなたは天気とニュースに答えるアシスタント。"&lt;/code&gt;, I got this error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ValidationException] The provided content was detected as unsafe by 
prompt attack protection. Please review your system prompt and try again.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After narrowing it down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fails regardless of Evaluator (&lt;code&gt;Builtin.GoalSuccessRate&lt;/code&gt; / &lt;code&gt;Builtin.Helpfulness&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Fails whether via bundle or inline mode&lt;/li&gt;
&lt;li&gt;Fails even when I rewrite the Japanese prompt in different ways&lt;/li&gt;
&lt;li&gt;Works as soon as I switch to English&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the only difference that flips the outcome is the language of the prompt. Tool description Recommendations work fine in Japanese, by the way.&lt;/p&gt;

&lt;p&gt;For that reason, all the experiments in this article ended up being run with English prompts.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/optimization.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/optimization.html&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/aws/agentcore-cli" rel="noopener noreferrer"&gt;https://github.com/aws/agentcore-cli&lt;/a&gt;&lt;br&gt;
&lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/05/bedrock-agentcore-optimization-preview/" rel="noopener noreferrer"&gt;https://aws.amazon.com/about-aws/whats-new/2026/05/bedrock-agentcore-optimization-preview/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>agents</category>
      <category>llm</category>
    </item>
    <item>
      <title>Building an AWS Cost Visualization Workflow with Strands Agents Skills and AgentCore Code Interpreter</title>
      <dc:creator>yoko / Naoki Yokomachi</dc:creator>
      <pubDate>Fri, 03 Apr 2026 01:26:25 +0000</pubDate>
      <link>https://dev.to/aws-builders/building-an-aws-cost-visualization-workflow-with-strands-agents-skills-and-agentcore-code-2d2</link>
      <guid>https://dev.to/aws-builders/building-an-aws-cost-visualization-workflow-with-strands-agents-skills-and-agentcore-code-2d2</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;I'm currently developing a personal AI agent called TONaRi. It also has an X (Twitter) account where it posts tech news and more.&lt;br&gt;
&lt;a href="https://x.com/tonari_with" rel="noopener noreferrer"&gt;https://x.com/tonari_with&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The agent's core architecture is built on &lt;a href="https://strandsagents.com/" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt; + &lt;a href="https://aws.amazon.com/bedrock/agentcore/" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffyz2pxpy9x1s8ia4nuwt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffyz2pxpy9x1s8ia4nuwt.png" alt="Architecture overview" width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article, I combined AgentCore Code Interpreter with Strands Agents' Agent Skills to implement a workflow that retrieves AWS cost data and generates chart images using code. Check out the video demo below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/_cityside/status/2035339843014987845" rel="noopener noreferrer"&gt;https://x.com/_cityside/status/2035339843014987845&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Although this was an addition to an existing web application codebase, I hope it also serves as a useful reference for building something similar from scratch.&lt;/p&gt;

&lt;p&gt;Here are the main technologies used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AgentCore Code Interpreter&lt;/strong&gt;: One of Amazon Bedrock AgentCore's building blocks that executes code in a sandboxed environment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Skills (SKILL.md)&lt;/strong&gt;: Externalized prompts that are loaded on demand&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Explorer API&lt;/strong&gt;: An API for retrieving AWS cost data, called from an agent tool&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S3&lt;/strong&gt;: Stores chart images generated by Code Interpreter, served to the frontend via Presigned URLs&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Amazon Bedrock AgentCore Code Interpreter
&lt;/h1&gt;

&lt;p&gt;Amazon Bedrock AgentCore Code Interpreter (hereafter "Code Interpreter") is one of the building blocks that allows agents hosted on AgentCore Runtime to safely execute code in a sandboxed environment.&lt;br&gt;
&lt;a href="https://aws.amazon.com/blogs/machine-learning/introducing-the-amazon-bedrock-agentcore-code-interpreter/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/machine-learning/introducing-the-amazon-bedrock-agentcore-code-interpreter/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Key features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code execution in a sandboxed environment&lt;/li&gt;
&lt;li&gt;Pre-installed libraries such as pandas, numpy, and matplotlib&lt;/li&gt;
&lt;li&gt;In addition to the default access-restricted environment, you can create user-defined environments with public internet access or VPC connectivity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this project, I use Code Interpreter to have the agent dynamically generate chart images from data using matplotlib.&lt;/p&gt;
&lt;h1&gt;
  
  
  Strands Agents Skills
&lt;/h1&gt;

&lt;p&gt;Agent Skills is a mechanism originally proposed by Anthropic. In a nutshell, it works like this: you define procedures you want the agent to execute in Markdown files (similar to system prompts), then inject only the metadata into the system prompt. The agent dynamically loads the Skill files based on the metadata and executes the procedures. This approach helps reduce token consumption and prevents context pollution.&lt;/p&gt;
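
&lt;p&gt;The idea can be sketched in a few lines of plain Python. This is only an illustration of the metadata-first pattern, not how Strands Agents or any other agent actually implements it; the helper names and the simplified &lt;code&gt;key: value&lt;/code&gt; frontmatter parsing are assumptions.&lt;/p&gt;

```python
def split_skill(text):
    """Split SKILL.md text into (metadata dict, body), assuming simple key: value frontmatter."""
    _, frontmatter, body = text.split("---", 2)
    metadata = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    return metadata, body.strip()


def build_system_prompt(base_prompt, skills):
    """Append only each skill's name and description, never the full body."""
    lines = [base_prompt, "", "Available skills (load one when it applies):"]
    for text in skills:
        metadata, _body = split_skill(text)
        lines.append("- " + metadata["name"] + ": " + metadata["description"])
    return "\n".join(lines)


if __name__ == "__main__":
    skill = "---\nname: aws-cost\ndescription: Visualize AWS cost data\n---\n1. Call get_aws_cost ...\n"
    print(build_system_prompt("You are a helpful assistant.", [skill]))
```

&lt;p&gt;Only the one-line summary reaches the system prompt; the body stays on disk until the agent decides the Skill is relevant and loads it.&lt;/p&gt;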

&lt;p&gt;As of March 2026, Agent Skills are also available in Strands Agents:&lt;br&gt;
&lt;a href="https://strandsagents.com/docs/user-guide/concepts/plugins/skills/" rel="noopener noreferrer"&gt;https://strandsagents.com/docs/user-guide/concepts/plugins/skills/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For this project, I defined the following workflow as a Skill:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Call the Cost Explorer API tool to retrieve cost data for the user-specified period&lt;/li&gt;
&lt;li&gt;Call the cost visualization tool

&lt;ul&gt;
&lt;li&gt;2-1. Convert cost data into a chart image using Code Interpreter&lt;/li&gt;
&lt;li&gt;2-2. Upload the image to S3&lt;/li&gt;
&lt;li&gt;2-3. Return the S3 presigned URL&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;
  
  
  Processing Flow
&lt;/h1&gt;

&lt;p&gt;Here's a simplified overview of the processing flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Show me this month's AWS costs"
  ↓
Main Agent
  ├─ ① skills tool: Load skill
  ├─ ② get_aws_cost tool: Call Cost Explorer API
  └─ ③ execute_python tool
     └─ ③-1 Generate matplotlib chart via Code Interpreter
        ③-2 Upload to S3
        ③-3 Return presigned URL
  ↓
Frontend: Detect S3 image URL in text → Display inline in chat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Implementation
&lt;/h1&gt;

&lt;h2&gt;
  
  
  get_aws_cost: Cost Data Retrieval Tool
&lt;/h2&gt;

&lt;p&gt;The AWS cost retrieval tool is defined as an agent tool using the &lt;code&gt;@tool&lt;/code&gt; decorator. The logic is separated from the Code Interpreter chart image generation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="n"&gt;_ce_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ce&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ap-northeast-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_aws_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;monthly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;months&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;group_by_service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve AWS cost data from Cost Explorer.

    Use this tool to fetch cost data. Then pass the result to execute_python
    to create matplotlib charts for visualization.

    Args:
        period: Granularity - &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;monthly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;daily&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.
        months: Number of months to look back (default: 1, max: 6).
        group_by_service: If True, break down costs by AWS service.

    Returns:
        JSON string with cost data.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;ce&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_ce_client&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ce&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_cost_and_usage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;TimePeriod&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;End&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;Granularity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MONTHLY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UnblendedCost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;GroupBy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DIMENSION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SERVICE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
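
&lt;p&gt;The elided part has to produce &lt;code&gt;start&lt;/code&gt; and &lt;code&gt;end&lt;/code&gt;. One way to compute them is sketched below; the &lt;code&gt;cost_time_period&lt;/code&gt; helper is hypothetical and may differ from what the tool really does. It relies on Cost Explorer treating &lt;code&gt;End&lt;/code&gt; as exclusive, so the end date is set to tomorrow in order to include today's costs.&lt;/p&gt;

```python
from datetime import date, timedelta


def cost_time_period(months, today=None):
    """Build a Cost Explorer TimePeriod covering the last `months` calendar months."""
    today = today or date.today()
    # Clamp to the 1..6 range described in the tool's docstring.
    months = max(1, min(months, 6))
    # Count months from year 0 so crossing a year boundary is plain integer math.
    index = today.year * 12 + (today.month - 1) - (months - 1)
    start = date(index // 12, index % 12 + 1, 1)
    # End is exclusive, so use tomorrow to include today.
    end = today + timedelta(days=1)
    return {"Start": start.isoformat(), "End": end.isoformat()}


if __name__ == "__main__":
    print(cost_time_period(3))
```

&lt;p&gt;The returned dict has the same shape as the &lt;code&gt;TimePeriod&lt;/code&gt; argument passed to &lt;code&gt;get_cost_and_usage&lt;/code&gt; above.&lt;/p&gt;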



&lt;h2&gt;
  
  
  execute_python: Code Execution Tool
&lt;/h2&gt;

&lt;p&gt;Similarly, Code Interpreter code execution is defined as an agent tool using the &lt;code&gt;@tool&lt;/code&gt; decorator. To reliably capture matplotlib figures, the tool automatically injects capture code before and after the agent-generated code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bedrock_agentcore.tools.code_interpreter_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;code_session&lt;/span&gt;

&lt;span class="n"&gt;CODE_INTERPRETER_REGION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CODE_INTERPRETER_REGION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ap-northeast-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;OUTPUT_BUCKET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CODE_INTERPRETER_OUTPUT_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ap-northeast-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;_s3_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS_REGION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ap-northeast-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_python&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute Python code in a sandboxed environment. Use this to run data analysis,
    generate charts with matplotlib, or perform calculations.

    Available libraries: pandas, numpy, matplotlib, json, datetime.
    Use ONLY matplotlib for plotting (not seaborn).
    Use English for all chart labels and titles (Japanese fonts are not available).

    IMPORTANT for chart generation:
    - Do NOT call plt.savefig() — images are auto-captured from open figures.
    - Do NOT call plt.close() — closing figures prevents image capture.
    - Just create figures with plt.subplots() and leave them open.
    - Do NOT use boto3 — the sandbox has no AWS credentials.

    Args:
        code: Python code to execute.
        description: Optional description of what the code does.

    Returns:
        JSON string with execution results including stdout, stderr, and image URLs.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Automatically inject matplotlib image capture code
&lt;/span&gt;    &lt;span class="n"&gt;img_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
import matplotlib
matplotlib.use(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Agg&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
import matplotlib.pyplot as plt, base64, io, json as _json
_imgs = []
for _i in plt.get_fignums():
    _b = io.BytesIO()
    plt.figure(_i).savefig(_b, format=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, bbox_inches=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tight&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, dpi=100)
    _b.seek(0)
    _imgs.append({{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;i&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;: _i, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;: base64.b64encode(_b.read()).decode()}})
if _imgs:
    print(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;_IMG_&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; + _json.dumps(_imgs) + &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;_END_&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)
plt.close(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;code_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CODE_INTERPRETER_REGION&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;code_client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;code_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;executeCode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;img_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;language&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clearContext&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="c1"&gt;# Extract images from stdout using _IMG_..._END_ markers
&lt;/span&gt;        &lt;span class="c1"&gt;# Upload to S3 and return presigned URLs
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
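
&lt;p&gt;The two comments at the end of the block leave the extraction step open. Here is a minimal sketch of it, assuming the payload between the markers is a JSON list of base64-encoded image strings. That payload format and the helper name &lt;code&gt;extract_images&lt;/code&gt; are assumptions made for illustration, and the S3 upload is left out:&lt;/p&gt;

```python
import base64
import json
import re

# Matches whatever was printed between the _IMG_ and _END_ markers.
_IMG_PATTERN = re.compile(r"_IMG_(.*?)_END_", re.DOTALL)


def extract_images(stdout):
    """Return decoded image bytes found in stdout (hypothetical helper).

    Assumes each payload is a JSON list of base64-encoded strings.
    """
    images = []
    for payload in _IMG_PATTERN.findall(stdout):
        for encoded in json.loads(payload):
            images.append(base64.b64decode(encoded))
    return images


# Simulated stdout: one log line followed by a single marker payload.
encoded_png = base64.b64encode(b"png-bytes").decode()
sample = "log line\n_IMG_" + json.dumps([encoded_png]) + "_END_\n"
print(len(extract_images(sample)))
```

&lt;p&gt;The standard base64 alphabet has no underscore, so the &lt;code&gt;_END_&lt;/code&gt; marker cannot appear inside a payload and the non-greedy match stops at the right place.&lt;/p&gt;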



&lt;h2&gt;
  
  
  Creating the SKILL.md
&lt;/h2&gt;

&lt;p&gt;Now that the tools are defined, we create the Agent Skill that defines how to call them. The directory structure looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;agentcore/
├── skills/
│   └── aws-cost/
│       └── SKILL.md
├── app.py
└── ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SKILL.md file contains YAML frontmatter and a Markdown-formatted prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-cost&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;visualize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;AWS&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cost&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;using&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;get_aws_cost&lt;/span&gt;
  &lt;span class="s"&gt;for data retrieval and execute_python for matplotlib chart generation"&lt;/span&gt;
&lt;span class="na"&gt;allowed-tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;get_aws_cost execute_python&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# AWS Cost Analysis Skill&lt;/span&gt;

Two-step process: fetch data with &lt;span class="sb"&gt;`get_aws_cost`&lt;/span&gt;,
then visualize with &lt;span class="sb"&gt;`execute_python`&lt;/span&gt;.

&lt;span class="gu"&gt;## Critical Rules&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**NEVER call plt.savefig()**&lt;/span&gt; — images are auto-captured from open figures.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**NEVER call plt.close()**&lt;/span&gt; — closing figures prevents image capture.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Use English for ALL text**&lt;/span&gt; in charts — Japanese fonts are unavailable.

&lt;span class="gu"&gt;## Step 1: Fetch Data&lt;/span&gt;
(How to call get_aws_cost)

&lt;span class="gu"&gt;## Step 2: Visualize&lt;/span&gt;
(matplotlib code template)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
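
&lt;p&gt;One thing that is easy to get wrong here is a name in &lt;code&gt;allowed-tools&lt;/code&gt; that doesn't match any tool registered on the agent. As a rough sanity check, the sketch below reads the frontmatter with a deliberately simple &lt;code&gt;key: value&lt;/code&gt; parser (not a full YAML parser, and not how the Skills plugin itself loads the file) and reports unmatched names. The helper names are hypothetical:&lt;/p&gt;

```python
SKILL_MD = """---
name: aws-cost
allowed-tools: get_aws_cost execute_python
---

# AWS Cost Analysis Skill
"""


def read_frontmatter(text):
    """Parse top-level `key: value` frontmatter lines (simplified, not full YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        # Skip blank lines and indented continuation lines.
        if sep and not line.startswith(" "):
            fields[key.strip()] = value.strip().strip('"')
    return fields


def missing_tools(skill_text, registered_tool_names):
    """Return allowed-tools entries that are not registered on the agent."""
    allowed = read_frontmatter(skill_text).get("allowed-tools", "").split()
    return [name for name in allowed if name not in registered_tool_names]


# Only get_aws_cost is registered here, so execute_python is reported.
print(missing_tools(SKILL_MD, {"get_aws_cost"}))
```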



&lt;h2&gt;
  
  
  Integrating with the Agent
&lt;/h2&gt;

&lt;p&gt;The tools are passed via the &lt;code&gt;tools&lt;/code&gt; parameter. The Skill is loaded by the &lt;code&gt;AgentSkills&lt;/code&gt; plugin, which is passed to the agent via the &lt;code&gt;plugins&lt;/code&gt; parameter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentSkills&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;src.agent.code_interpreter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;execute_python&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;src.agent.aws_cost&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_aws_cost&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the Skills plugin
&lt;/span&gt;&lt;span class="n"&gt;skills_plugin&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentSkills&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skills&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./skills/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create the agent
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;other_tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;execute_python&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_aws_cost&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;plugins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;skills_plugin&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'll skip the frontend implementation details, but essentially it detects image URLs in the agent's response and automatically fetches and displays them inline.&lt;/p&gt;

&lt;h1&gt;
  
  
  Demo
&lt;/h1&gt;

&lt;p&gt;Here's what it looks like when the skill is actually running. Since the chart-generating code is dynamically created by the agent, the output varies depending on how you phrase your instructions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbez6nxpakl4bzypunzt2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbez6nxpakl4bzypunzt2.png" alt="Demo screenshot" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the video demo again from the beginning of the article:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/_cityside/status/2035339843014987845" rel="noopener noreferrer"&gt;https://x.com/_cityside/status/2035339843014987845&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Wrapping Up
&lt;/h1&gt;

&lt;p&gt;That's how I implemented an AWS cost charting feature using Agent Skills + Code Interpreter. (Admittedly, you could just look at the Cost Explorer console for the same information, but this was more of a proof of concept...)&lt;/p&gt;

&lt;p&gt;In this implementation, I used the default Code Interpreter tool, which restricts public internet access. However, by using a user-defined Code Interpreter tool, you could enable more flexible code execution. I'd love to explore the possibilities further.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>aws</category>
    </item>
    <item>
      <title>Using OpenRouter's OpenAI-Compatible Models (Grok 4.1 Fast) with Strands Agents</title>
      <dc:creator>yoko / Naoki Yokomachi</dc:creator>
      <pubDate>Sun, 15 Mar 2026 02:05:39 +0000</pubDate>
      <link>https://dev.to/yokomachi/using-openrouters-openai-compatible-models-grok-41-fast-with-strands-agents-8l3</link>
      <guid>https://dev.to/yokomachi/using-openrouters-openai-compatible-models-grok-41-fast-with-strands-agents-8l3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article is an AI-assisted translation of a Japanese technical article.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I'm building a personal AI agent called TONaRi ("tonari" means "next to" in Japanese — named with the idea of an AI that stands next to you and supports your daily life). It's built with Strands Agents + Amazon Bedrock AgentCore, with a VRM-powered 3D avatar frontend using AITuberKit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvxtux6cbpf8b0oqys5g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvxtux6cbpf8b0oqys5g.png" alt=" " width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a previous article, I wrote about cost reduction through sub-agent splitting.&lt;br&gt;
&lt;a href="https://dev.to/yokomachi/28-tool-definitions-cutting-ai-agent-costs-with-sub-agent-splitting-4dbp"&gt;https://dev.to/yokomachi/28-tool-definitions-cutting-ai-agent-costs-with-sub-agent-splitting-4dbp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This time, I took cost reduction a step further by making it possible to switch the LLM itself to Grok 4.1 Fast via OpenRouter.&lt;/p&gt;
&lt;h2&gt;
  
  
  Cost Comparison
&lt;/h2&gt;

&lt;p&gt;Let's compare the costs between Claude Haiku 4.5 (Amazon Bedrock), which I had been using as the main model, and Grok 4.1 Fast (OpenRouter), the new alternative.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Claude Haiku 4.5 (Bedrock)&lt;/th&gt;
&lt;th&gt;Grok 4.1 Fast (OpenRouter)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input&lt;/td&gt;
&lt;td&gt;$1.10 / 1M tokens&lt;/td&gt;
&lt;td&gt;$0.20 / 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output&lt;/td&gt;
&lt;td&gt;$5.50 / 1M tokens&lt;/td&gt;
&lt;td&gt;$0.50 / 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's roughly a 5.5× lower input price and an 11× lower output price. As I mentioned in the previous article, LLM per-token pricing is by far the biggest cost driver, so reducing the unit price, while keeping quality acceptable, has the greatest impact.&lt;/p&gt;
&lt;h2&gt;
  
  
  Switching Models in Strands Agents
&lt;/h2&gt;

&lt;p&gt;Strands Agents is an open-source agent SDK provided by AWS, and it supports models beyond Bedrock. Using the &lt;code&gt;OpenAIModel&lt;/code&gt; class, you can directly use models from any service that provides an OpenAI-compatible API, such as OpenRouter. If you need broader provider support, &lt;code&gt;LiteLLMModel&lt;/code&gt; is also an option. Since Grok 4.1 Fast is OpenAI-compatible, we use the &lt;code&gt;OpenAIModel&lt;/code&gt; class directly.&lt;/p&gt;
&lt;h3&gt;
  
  
  Creating an OpenAIModel
&lt;/h3&gt;

&lt;p&gt;First, add the &lt;code&gt;openai&lt;/code&gt; dependency.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dependencies = [
    "strands-agents&amp;gt;=1.23.0",
    "openai&amp;gt;=1.0.0",
    # ...
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then create the model instance via OpenRouter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.models.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIModel&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;client_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-openrouter-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;base_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://openrouter.ai/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x-ai/grok-4.1-fast&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
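
&lt;p&gt;In practice you probably don't want the API key hard-coded. A minimal sketch of reading it from the environment instead; the variable names &lt;code&gt;OPENROUTER_API_KEY&lt;/code&gt; and &lt;code&gt;OPENROUTER_MODEL_ID&lt;/code&gt; and the helper &lt;code&gt;openrouter_model_kwargs&lt;/code&gt; are hypothetical, chosen only for this example:&lt;/p&gt;

```python
import os

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
DEFAULT_MODEL_ID = "x-ai/grok-4.1-fast"


def openrouter_model_kwargs(env=None):
    """Build keyword arguments for OpenAIModel from environment variables.

    OPENROUTER_API_KEY and OPENROUTER_MODEL_ID are hypothetical variable names.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENROUTER_API_KEY")
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    return {
        "client_args": {"api_key": api_key, "base_url": OPENROUTER_BASE_URL},
        "model_id": env.get("OPENROUTER_MODEL_ID", DEFAULT_MODEL_ID),
    }


# Usage (requires strands-agents and openai to be installed):
# from strands.models.openai import OpenAIModel
# model = OpenAIModel(**openrouter_model_kwargs())
```

&lt;p&gt;Keeping the model ID in a variable as well makes it possible to try another OpenRouter model without touching the code.&lt;/p&gt;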



&lt;p&gt;The created model can be passed to an Agent with the exact same interface as a Bedrock model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Works the same whether BedrockModel or OpenAIModel
&lt;/span&gt;    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a personal AI assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Wrap Up
&lt;/h2&gt;

&lt;p&gt;So I switched the model used for everyday conversations to Grok 4.1 Fast, and my impression is that quality isn't a major issue for casual conversation. However, application-specific conversation tags (this AI agent uses tags like &lt;code&gt;[happy]&lt;/code&gt; or &lt;code&gt;[bow]&lt;/code&gt; to trigger facial expressions and motions) sometimes get ignored or misinterpreted by the model, so that still needs tuning.&lt;/p&gt;

&lt;p&gt;I also had concerns about tool calling via AgentCore Gateway, but it's been working surprisingly well without any major adjustments.&lt;/p&gt;

&lt;p&gt;I'll continue monitoring and consider trying other models or implementing model-specific routing if needed.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>strandsagents</category>
      <category>openrouter</category>
    </item>
    <item>
      <title>28 TOOL DEFINITIONS! — Cutting AI Agent Costs with Sub-Agent Splitting</title>
      <dc:creator>yoko / Naoki Yokomachi</dc:creator>
      <pubDate>Sat, 07 Mar 2026 12:53:02 +0000</pubDate>
      <link>https://dev.to/yokomachi/28-tool-definitions-cutting-ai-agent-costs-with-sub-agent-splitting-4dbp</link>
      <guid>https://dev.to/yokomachi/28-tool-definitions-cutting-ai-agent-costs-with-sub-agent-splitting-4dbp</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article is an AI-assisted translation of a Japanese technical article.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I'm building a personal AI agent called TONaRi ("tonari" means "next to" in Japanese — named with the idea of an AI that stands next to you and supports your daily life). It's built with Strands Agents + Amazon Bedrock AgentCore, with a VRM-powered 3D avatar frontend using AITuberKit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jjnj0rszxp1bhp89f2t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jjnj0rszxp1bhp89f2t.png" alt=" " width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As I kept adding tools to make my personal AI agent more useful for daily tasks, the input tokens per API call ballooned — and so did the cost.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbk9v2dt1e2hjyuxal3tu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbk9v2dt1e2hjyuxal3tu.png" alt=" " width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It's lower now, but the projection was heading toward $120/month&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this article, I'll walk through the input token bloat problem caused by too many tools and how I tackled it by splitting into sub-agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;Here's a high-level look at TONaRi's architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Frontend (Next.js + VRM 3D Avatar)
  → Next.js API Route
    → AgentCore Runtime (Strands Agent)
      → AgentCore Gateway → Lambda functions (tools)
      → AgentCore Memory (STM/LTM)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent runs as a container deployed on Bedrock AgentCore Runtime. External tools are implemented as Lambda functions accessed through AgentCore Gateway. Adding a new tool is as simple as writing a Lambda function and registering it as a Gateway target.&lt;/p&gt;

&lt;h2&gt;
  
  
  All the Tools
&lt;/h2&gt;

&lt;p&gt;AgentCore Gateway lets you expose Lambda functions as agent tools.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.tools.mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MCPClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp_proxy_for_aws.client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;aws_iam_streamablehttp_client&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_mcp_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gateway_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MCPClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_transport&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;aws_iam_streamablehttp_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gateway_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;aws_region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;aws_service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-agentcore&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;MCPClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;create_transport&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here are all the tools I've connected:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Tools&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Task Management&lt;/td&gt;
&lt;td&gt;List, Add, Complete, Update&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Calendar&lt;/td&gt;
&lt;td&gt;List events, Check availability, Create, Update, Delete, Suggest schedule&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gmail&lt;/td&gt;
&lt;td&gt;Search, Get, Create draft, Archive&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notion&lt;/td&gt;
&lt;td&gt;Search pages, Get page, Create, Update, Query DB, Get DB&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Twitter&lt;/td&gt;
&lt;td&gt;Get today's tweets, Post&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Diary&lt;/td&gt;
&lt;td&gt;Save, Get&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Date Utils&lt;/td&gt;
&lt;td&gt;Get current datetime, Calculate date, List date range&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web Search&lt;/td&gt;
&lt;td&gt;Web search&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;28&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each tool can be called individually, but the real power is chaining. For example, saying "Search for a recipe, save the bookmark to Notion, create a shopping list, and add grocery shopping to my tasks" triggers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Web search tool finds a recipe&lt;/li&gt;
&lt;li&gt;Saves the URL to a Notion bookmark page&lt;/li&gt;
&lt;li&gt;Creates a shopping list from the recipe and saves it to a Notion memo page&lt;/li&gt;
&lt;li&gt;Adds a grocery shopping task to TONaRi's task list&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The AI agent sits between tools and interprets vague user requests to orchestrate across them — this is the most useful aspect of using an AI agent day-to-day.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Input Token Explosion
&lt;/h2&gt;

&lt;p&gt;Behind the convenience, costs were quietly piling up. When calling the Bedrock API, input tokens consist of four main components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;System prompt&lt;/strong&gt;: Agent character settings, behavior rules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool definitions&lt;/strong&gt;: Name, description, and JSON schema for every tool&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-term memory (LTM)&lt;/strong&gt;: Episodes and facts extracted from past conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversation history (STM)&lt;/strong&gt;: Current session content&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The biggest culprit was tool definitions. I had Claude Code calculate it — the 28 tools directly connected to the agent consumed about 5,000 tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Breaking Down the Numbers
&lt;/h3&gt;

&lt;p&gt;Here's a rough breakdown of input tokens per call for the monolithic agent:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Estimated Tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;System prompt (character + all domain rules)&lt;/td&gt;
&lt;td&gt;~3,500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool definitions (28 tools × schema)&lt;/td&gt;
&lt;td&gt;~5,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LTM search results&lt;/td&gt;
&lt;td&gt;~1,500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conversation history (10 turns)&lt;/td&gt;
&lt;td&gt;Variable (~5,000–30,000)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The system prompt, tools, and LTM are essentially fixed costs sent with every message: roughly 10,000 tokens per call (3,500 + 5,000 + 1,500). With about 100 calls per day, the monthly fixed cost alone is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10,000 tokens × 100 calls/day × 30 days = 30,000,000 tokens/month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At Claude Haiku 4.5's Bedrock input token rate ($1.10/1M tokens for Japan cross-region inference), that's $33/month in fixed costs alone. As a solo developer, paying ~$33/month for fixed overhead, about half of it tool definitions that might not even be used on a given call, was painful.&lt;/p&gt;
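
&lt;p&gt;To see where the fixed cost goes, the same arithmetic can be split by component. This is only a back-of-the-envelope sketch using the estimates from the table above, with the call volume of about 100 calls per day over a 30-day month:&lt;/p&gt;

```python
# Per-call fixed input tokens for the monolithic agent (estimates from the table).
FIXED_TOKENS = {
    "system_prompt": 3_500,
    "tool_definitions": 5_000,
    "ltm_search_results": 1_500,
}
CALLS_PER_DAY = 100
DAYS_PER_MONTH = 30
USD_PER_MILLION_INPUT_TOKENS = 1.10  # Claude Haiku 4.5, Japan cross-region inference


def monthly_cost_usd(tokens_per_call):
    """Monthly input cost in USD for tokens that are sent on every call."""
    monthly_tokens = tokens_per_call * CALLS_PER_DAY * DAYS_PER_MONTH
    return round(monthly_tokens / 1_000_000 * USD_PER_MILLION_INPUT_TOKENS, 2)


for name, tokens in FIXED_TOKENS.items():
    print(name, monthly_cost_usd(tokens))
print("total", monthly_cost_usd(sum(FIXED_TOKENS.values())))
```

&lt;p&gt;With these estimates, tool definitions account for half of the fixed input cost.&lt;/p&gt;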

&lt;h2&gt;
  
  
  Splitting into Sub-Agents
&lt;/h2&gt;

&lt;p&gt;To reduce the number of tool definitions the main agent loads, I created domain-specific sub-agents and exposed each one to the main agent as a single tool via the &lt;code&gt;@tool&lt;/code&gt; decorator.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Before: Monolithic]
Main Agent
├── System prompt (all domain rules)
└── 28 tools ← sent every single call

[After: Sub-agent split]
Main Agent
├── System prompt (generic rules only)
├── DateTool (3 tools)      ← frequently used, kept in main
├── TavilySearch (1 tool)   ← same
├── task_agent      ← defined as @tool (4 tools)
├── calendar_agent  ← defined as @tool (6 tools)
├── gmail_agent     ← defined as @tool (4 tools)
├── notion_agent    ← defined as @tool (6 tools)
├── diary_agent     ← defined as @tool (2 tools)
├── briefing_agent  ← defined as @tool (multi-domain tools)
└── twitter_agent   ← defined as @tool (2 tools)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Sub-Agent Implementation
&lt;/h3&gt;

&lt;p&gt;With Strands Agents' &lt;code&gt;@tool&lt;/code&gt; decorator, you can define a sub-agent as a tool for the main agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
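&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockModel&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;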
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calendar_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Google Calendar sub-agent. Handles listing, availability checks, creating, updating, and deleting events.

    Args:
        request: A request related to the owner&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s calendar
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;BedrockModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jp.anthropic.claude-haiku-4-5-20251001-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ap-northeast-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;streaming&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a Google Calendar specialist assistant...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_calendar_tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# calendar tools only
&lt;/span&gt;            &lt;span class="n"&gt;callback_handler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Calendar operation error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  System Prompt Reduction
&lt;/h3&gt;

&lt;p&gt;By splitting sub-agents by domain, domain-specific rules moved from the main system prompt to each sub-agent's prompt.&lt;/p&gt;

&lt;p&gt;Before: Main prompt contained all domain rules&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Calendar rules (duplicate checks, deletion confirmation, etc.)
- Gmail rules (draft only, date search caveats, etc.)
- Notion rules (property formats, database mappings, etc.)
- Briefing procedure (5 detailed sections)
- Diary creation flow (interview → generate → save)
- ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After: the main prompt has only the sub-agent list and the delegation rules&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Sub-agent Coordination
- task_agent: Task management (list, add, complete, update)
- calendar_agent: Google Calendar (get, create, update, delete events)
- gmail_agent: Gmail (search, get, create drafts)
- ...

### Delegation Rules
- Describe requests to sub-agents in detail
- Rephrase sub-agent results in your own words
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reduced the system prompt from ~7,400 characters to ~3,800 characters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost Reduction
&lt;/h3&gt;

&lt;p&gt;Here is the main agent's fixed input cost per call, before and after the split:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Before (Monolithic)&lt;/th&gt;
&lt;th&gt;After (Sub-agent split)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;System prompt&lt;/td&gt;
&lt;td&gt;~3,500 tokens&lt;/td&gt;
&lt;td&gt;~2,000 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool definitions&lt;/td&gt;
&lt;td&gt;28 tools (~5,000 tokens)&lt;/td&gt;
&lt;td&gt;12 tools (~2,500 tokens)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LTM search results&lt;/td&gt;
&lt;td&gt;~1,500 tokens&lt;/td&gt;
&lt;td&gt;~1,500 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fixed cost total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~10,000 tokens&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~6,000 tokens&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Those 4,000 tokens weren't deleted — they moved to the sub-agents. Here's the per-call input token cost for each sub-agent:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Sub-agent&lt;/th&gt;
&lt;th&gt;Prompt&lt;/th&gt;
&lt;th&gt;Tool Defs&lt;/th&gt;
&lt;th&gt;Request Message&lt;/th&gt;
&lt;th&gt;Total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;task_agent&lt;/td&gt;
&lt;td&gt;~400&lt;/td&gt;
&lt;td&gt;~400&lt;/td&gt;
&lt;td&gt;~100&lt;/td&gt;
&lt;td&gt;~900&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;calendar_agent&lt;/td&gt;
&lt;td&gt;~400&lt;/td&gt;
&lt;td&gt;~850&lt;/td&gt;
&lt;td&gt;~100&lt;/td&gt;
&lt;td&gt;~1,350&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gmail_agent&lt;/td&gt;
&lt;td&gt;~400&lt;/td&gt;
&lt;td&gt;~400&lt;/td&gt;
&lt;td&gt;~100&lt;/td&gt;
&lt;td&gt;~900&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;notion_agent&lt;/td&gt;
&lt;td&gt;~400&lt;/td&gt;
&lt;td&gt;~700&lt;/td&gt;
&lt;td&gt;~100&lt;/td&gt;
&lt;td&gt;~1,200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;briefing_agent&lt;/td&gt;
&lt;td&gt;~500&lt;/td&gt;
&lt;td&gt;~2,500&lt;/td&gt;
&lt;td&gt;~100&lt;/td&gt;
&lt;td&gt;~3,100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;diary_agent&lt;/td&gt;
&lt;td&gt;~400&lt;/td&gt;
&lt;td&gt;~200&lt;/td&gt;
&lt;td&gt;~100&lt;/td&gt;
&lt;td&gt;~700&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;twitter_agent&lt;/td&gt;
&lt;td&gt;~400&lt;/td&gt;
&lt;td&gt;~150&lt;/td&gt;
&lt;td&gt;~100&lt;/td&gt;
&lt;td&gt;~650&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you simply add everything up, the "After" total is actually higher. What matters, though, is the number of tokens sent on &lt;em&gt;every&lt;/em&gt; call. The briefing_agent, for example, loads the Gmail, Calendar, and task tools all at once and has complex rules, so it is expensive, but it runs only once a day. Before the split, all of those definitions were loaded on every single call; now they load only when needed.&lt;/p&gt;
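
&lt;p&gt;To make the two ways of counting concrete, here is a small Python sketch that recomputes the totals from the tables above. The dictionary layout and variable names are made up for this illustration, and the "typical" case assumes a call delegates to a single sub-agent (&lt;code&gt;task_agent&lt;/code&gt; is used as the example).&lt;/p&gt;

```python
# Per-call input tokens from the tables above: (prompt, tool defs, request message).
SUB_AGENTS = {
    "task_agent": (400, 400, 100),
    "calendar_agent": (400, 850, 100),
    "gmail_agent": (400, 400, 100),
    "notion_agent": (400, 700, 100),
    "briefing_agent": (500, 2500, 100),
    "diary_agent": (400, 200, 100),
    "twitter_agent": (400, 150, 100),
}
MAIN_BEFORE = 3500 + 5000 + 1500  # system prompt + tool definitions + LTM results
MAIN_AFTER = 2000 + 2500 + 1500

sub_totals = {name: sum(parts) for name, parts in SUB_AGENTS.items()}

# Naive comparison: load the main agent and every sub-agent once.
naive_after = MAIN_AFTER + sum(sub_totals.values())

# What a single request pays (assumption: it delegates to exactly one sub-agent).
typical_after = MAIN_AFTER + sub_totals["task_agent"]

print(naive_after, typical_after, MAIN_BEFORE)
```

&lt;p&gt;Loading everything once comes to ~14,800 tokens, which is more than the old ~10,000, but a call that delegates to &lt;code&gt;task_agent&lt;/code&gt; pays only ~6,900.&lt;/p&gt;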

&lt;h3&gt;
  
  
  Monthly Cost Impact
&lt;/h3&gt;

&lt;p&gt;Estimating with ~100 calls per day:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Main agent fixed cost reduction (every call)]
  4,000 tokens/call × 100 calls/day × 30 days = 12,000,000 tokens/month

[Sub-agent additional cost (only when invoked)]
  Assuming ~60% of calls (60/day) trigger one sub-agent
  Average 900 tokens/call × 60 calls/day × 30 days = 1,620,000 tokens/month
  *briefing_agent (~3,100 tokens) runs once/day, calculated separately
  briefing: 3,100 tokens × 30 days = 93,000 tokens/month

[Net savings]
  12,000,000 - 1,620,000 - 93,000 = 10,287,000 tokens/month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At Claude Haiku 4.5's Bedrock input token rate ($1.10/1M tokens, Japan cross-region inference), that's roughly &lt;strong&gt;$11/month in input token savings&lt;/strong&gt;.&lt;/p&gt;
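
&lt;p&gt;The same estimate can be written as a short Python function, which makes it easy to try other assumptions. The constants are the figures quoted above; the function and parameter names are invented for this sketch, and real usage will differ from the flat 60% sub-agent share assumed here.&lt;/p&gt;

```python
CALLS_PER_DAY = 100
DAYS_PER_MONTH = 30
MAIN_SAVING_PER_CALL = 10_000 - 6_000  # fixed-cost reduction in tokens
SUB_AGENT_SHARE = 0.6                  # share of calls that invoke one sub-agent
AVG_SUB_AGENT_TOKENS = 900
BRIEFING_TOKENS = 3_100                # briefing_agent runs once per day
USD_PER_MILLION_INPUT_TOKENS = 1.10


def monthly_net_savings(calls_per_day=CALLS_PER_DAY, share=SUB_AGENT_SHARE):
    """Return (net tokens saved per month, USD saved per month)."""
    saved = MAIN_SAVING_PER_CALL * calls_per_day * DAYS_PER_MONTH
    sub_agent_calls = round(calls_per_day * share)
    added = AVG_SUB_AGENT_TOKENS * sub_agent_calls * DAYS_PER_MONTH
    added += BRIEFING_TOKENS * DAYS_PER_MONTH
    net_tokens = saved - added
    return net_tokens, net_tokens / 1_000_000 * USD_PER_MILLION_INPUT_TOKENS


tokens, usd = monthly_net_savings()
print(f"{tokens:,} tokens/month, about ${usd:.2f}")
```

&lt;p&gt;With the defaults this reproduces the 10,287,000 tokens/month figure. Under the same assumptions, &lt;code&gt;monthly_net_savings(share=1.0)&lt;/code&gt; still returns a net saving of 9,207,000 tokens, so the split pays off even if every call triggers a sub-agent.&lt;/p&gt;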

&lt;h2&gt;
  
  
  Other Optimizations
&lt;/h2&gt;

&lt;p&gt;I also made several complementary changes:&lt;/p&gt;

&lt;h3&gt;
  
  
  Conversation Window Reduction
&lt;/h3&gt;

&lt;p&gt;Changed &lt;code&gt;SlidingWindowConversationManager&lt;/code&gt;'s &lt;code&gt;window_size&lt;/code&gt; from 15 to 10.&lt;br&gt;
Savings: $3–5/month&lt;/p&gt;
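
&lt;p&gt;Conceptually, a sliding window keeps only the most recent messages and drops the oldest ones, so a smaller window means fewer input tokens on every call. The Python sketch below illustrates that idea with a hypothetical &lt;code&gt;SlidingWindow&lt;/code&gt; class built on &lt;code&gt;collections.deque&lt;/code&gt;; it is not the Strands Agents implementation and makes no attempt to reproduce its behavior in detail.&lt;/p&gt;

```python
from collections import deque


class SlidingWindow:
    """Toy stand-in that keeps only the most recent `window_size` messages."""

    def __init__(self, window_size):
        self.messages = deque(maxlen=window_size)

    def add(self, message):
        # deque drops the oldest entry automatically once maxlen is reached.
        self.messages.append(message)


history = SlidingWindow(window_size=10)
for turn in range(1, 16):
    history.add(f"message {turn}")

print(len(history.messages), history.messages[0])
```

&lt;p&gt;After 15 messages, only messages 6 through 15 remain in the window.&lt;/p&gt;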

&lt;h3&gt;
  
  
  LTM Search Result Reduction
&lt;/h3&gt;

&lt;p&gt;Reduced &lt;code&gt;top_k&lt;/code&gt; across LTM strategies (total 18 → 10 results).&lt;br&gt;
Savings: $2–3/month&lt;/p&gt;

&lt;h3&gt;
  
  
  Lightweight Pipeline Agents
&lt;/h3&gt;

&lt;p&gt;For automated tasks like scheduled tweets and news collection, I was using the full main agent. I replaced these with lightweight dedicated agents that share memory but carry only minimal tools.&lt;br&gt;
Savings: $2–3/month&lt;/p&gt;

&lt;h3&gt;
  
  
  Total Savings
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Optimization&lt;/th&gt;
&lt;th&gt;Est. Monthly Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sub-agent splitting&lt;/td&gt;
&lt;td&gt;$11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conversation window reduction&lt;/td&gt;
&lt;td&gt;$3–5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LTM result reduction&lt;/td&gt;
&lt;td&gt;$2–3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lightweight pipeline agents&lt;/td&gt;
&lt;td&gt;$2–3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$18–22&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;So I managed to cut costs to some degree, but it's still expensive!&lt;br&gt;
If you have any clever cost-reduction ideas, I'd love to hear them.&lt;/p&gt;

&lt;p&gt;(Fortunately I was recently selected as an AWS Community Builder, so I'm hoping for some AWS credits!)&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>python</category>
      <category>strandsagents</category>
    </item>
    <item>
      <title>Controlling VRM Character Motions for an AI Agent on the Web</title>
      <dc:creator>yoko / Naoki Yokomachi</dc:creator>
      <pubDate>Sat, 21 Feb 2026 13:00:22 +0000</pubDate>
      <link>https://dev.to/yokomachi/controlling-vrm-character-motions-for-an-ai-agent-on-the-web-3gga</link>
      <guid>https://dev.to/yokomachi/controlling-vrm-character-motions-for-an-ai-agent-on-the-web-3gga</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article is an AI-assisted translation of a Japanese technical article.&lt;br&gt;
&lt;a href="https://zenn.dev/yokomachi/articles/202602_vrm-motion-control-on-web" rel="noopener noreferrer"&gt;https://zenn.dev/yokomachi/articles/202602_vrm-motion-control-on-web&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I'm currently working on a personal AI agent project and decided to use a 3D model as the user interface.&lt;br&gt;
Since I didn't have the knowledge to build everything from scratch, I used &lt;a href="https://github.com/tegnike/aituber-kit" rel="noopener noreferrer"&gt;AITuberKit&lt;/a&gt;, an open-source project I'd been aware of for a while, to get the frontend up quickly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fukjb50ylfi6h5k0t4c5r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fukjb50ylfi6h5k0t4c5r.png" alt=" " width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;VRM model creation: &lt;a href="https://vroid.com/studio" rel="noopener noreferrer"&gt;VRoid Studio&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Web frontend: Next.js, TypeScript&lt;/li&gt;
&lt;li&gt;VRM rendering &amp;amp; control: &lt;a href="https://github.com/pixiv/three-vrm" rel="noopener noreferrer"&gt;three-vrm&lt;/a&gt; (v3.0.0), Three.js&lt;/li&gt;
&lt;li&gt;Base kit: &lt;a href="https://github.com/tegnike/aituber-kit" rel="noopener noreferrer"&gt;AITuberKit&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Agent implementation: Strands Agents, Amazon Bedrock AgentCore &lt;em&gt;(not covered in detail in this article)&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  VRM and VRoid Studio
&lt;/h1&gt;

&lt;p&gt;VRM is a file format designed for 3D avatars.&lt;br&gt;
With &lt;a href="https://vroid.com/studio" rel="noopener noreferrer"&gt;VRoid Studio&lt;/a&gt;, you can create characters and export them in VRM format without any 3D modeling knowledge.&lt;br&gt;
My only prior experience was creating characters in video games, yet I was able to build two models (male and female) in about an hour.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/_cityside/status/2019742015617994773" rel="noopener noreferrer"&gt;https://x.com/_cityside/status/2019742015617994773&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  What AITuberKit Can Do
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://github.com/tegnike/aituber-kit" rel="noopener noreferrer"&gt;AITuberKit&lt;/a&gt; is an open-source toolkit that displays VRM models in a web browser and bundles features such as LLM-powered chat, facial expression control, and speech synthesis.&lt;/p&gt;

&lt;p&gt;Here are some of the key features AITuberKit provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VRM model display, facial expression control, and lip-sync&lt;/li&gt;
&lt;li&gt;LLM-powered chatbot functionality&lt;/li&gt;
&lt;li&gt;Speech synthesis API integration&lt;/li&gt;
&lt;li&gt;YouTube streaming integration&lt;/li&gt;
&lt;li&gt;Multimodal input&lt;/li&gt;
&lt;li&gt;etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because my project is a personal AI agent, I use AITuberKit's base features, such as VRM display control and the chatbot functionality, and add heavy customizations on top.&lt;/p&gt;
&lt;h1&gt;
  
  
  Implementing Motion Control
&lt;/h1&gt;

&lt;p&gt;Here's where we get to the main topic.&lt;br&gt;
AITuberKit supports switching facial expressions (smile, angry face, etc.) out of the box, so I decided to implement additional body motions (bowing, extending a hand, etc.).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/_cityside/status/2016874430056845502" rel="noopener noreferrer"&gt;https://x.com/_cityside/status/2016874430056845502&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;Here's the overall picture of the motion control system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM Response
  ↓ Streaming parser
  ├─ [emotion] Emotion tag → ExpressionController → Facial expression control
  └─ [bow/present] Motion tag → GestureController → Bone control
                                         ↑
                                    EmoteController (conflict resolution)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;EmoteController&lt;/code&gt; sits between facial expressions and motions to handle conflicts between them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Motion Definitions
&lt;/h2&gt;

&lt;p&gt;Motions are implemented by defining bone rotations as keyframes.&lt;/p&gt;

&lt;p&gt;Here's an example definition for a bow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
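&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/features/emoteController/gestureController.ts&lt;/span&gt;
&lt;span class="c1"&gt;// Imports this excerpt relies on (assumed; not shown in the original snippet)&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;THREE&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;three&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;VRMHumanBoneName&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@pixiv/three-vrm&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
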
&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;BoneRotation&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;bone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;VRMHumanBoneName&lt;/span&gt;
  &lt;span class="nx"&gt;rotation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;THREE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Quaternion&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;GestureKeyframe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;bones&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;BoneRotation&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;GestureDefinition&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;keyframes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;GestureKeyframe&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="nx"&gt;holdDuration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;closeEyes&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the bow motion, three bones (spine, chest, and neck) are each rotated forward around the X axis, by 0.25, 0.15, and 0.12 radians respectively, which looks more natural than simply bending at the waist.&lt;br&gt;
The arm bones are also adjusted to achieve a natural posture.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/features/emoteController/gestureController.ts&lt;/span&gt;
&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_gestures&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bow&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;keyframes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;bones&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;bone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;spine&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;rotation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;THREE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Quaternion&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;setFromEuler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;THREE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Euler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;bone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chest&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;rotation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;THREE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Quaternion&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;setFromEuler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;THREE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Euler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;bone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;neck&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;rotation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;THREE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Quaternion&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;setFromEuler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;THREE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Euler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="c1"&gt;// Arm bones are also adjusted (omitted)&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;holdDuration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;closeEyes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Close eyes during the bow&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
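
&lt;p&gt;To get a feel for these numbers, the Python sketch below (Python only for illustration; the project code is TypeScript) adds up the three pitch angles and builds the equivalent quaternions. It assumes that the three bones form a chain whose local X axes are aligned, so that their rotations simply add, and it assumes linear easing over the keyframe's &lt;code&gt;duration&lt;/code&gt;; the interpolation actually used by &lt;code&gt;GestureController&lt;/code&gt; is not shown in this article. The names &lt;code&gt;BOW_PITCH&lt;/code&gt;, &lt;code&gt;x_rotation_quaternion&lt;/code&gt;, and &lt;code&gt;pose_at&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import math

# X-axis Euler angles (radians) taken from the bow keyframe above.
BOW_PITCH = {"spine": 0.25, "chest": 0.15, "neck": 0.12}


def x_rotation_quaternion(angle):
    """Quaternion (x, y, z, w) for a rotation of `angle` radians about the X axis."""
    half = angle / 2
    return (math.sin(half), 0.0, 0.0, math.cos(half))


def pose_at(elapsed, duration=1.0):
    """Per-bone quaternions `elapsed` seconds into the keyframe (linear easing assumed)."""
    progress = min(max(elapsed / duration, 0.0), 1.0)
    return {
        bone: x_rotation_quaternion(angle * progress)
        for bone, angle in BOW_PITCH.items()
    }


total_pitch = sum(BOW_PITCH.values())
print(f"total forward pitch: {math.degrees(total_pitch):.1f} degrees")
print(pose_at(0.5)["spine"])
```

&lt;p&gt;Under those assumptions the head ends up pitched forward by 0.52 radians, or roughly 30 degrees, spread over three joints.&lt;/p&gt;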



&lt;h1&gt;
  
  
  Triggering Motions from LLM Responses
&lt;/h1&gt;

&lt;p&gt;The character's expressions and motions are controlled by having the LLM include emotion and motion tags in its responses.&lt;/p&gt;

&lt;p&gt;Emotion tags are implemented by default in AITuberKit. The LLM response looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[happy]Thank you so much!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Motion tags are a custom addition. They appear in the response just like emotion tags:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Welcome! [bow]What kind of fragrance are you looking for today?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a response contains both an emotion tag and a motion tag, both are triggered.&lt;br&gt;
For example, &lt;code&gt;[happy][bow]&lt;/code&gt; makes the character bow with a smile.&lt;/p&gt;

&lt;p&gt;The system prompt includes the following instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="s2"&gt;`
## Emotional Expression
The format for conversation text is as follows. Choose the single most appropriate emotion for the entire response and prepend the emotion tag at the beginning.
[{neutral|happy|angry|sad|relaxed|surprised}]{conversation text}
`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
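
&lt;p&gt;The tag handling can be sketched in a few lines. This is a simplified, non-streaming version in Python (the real parser lives in the TypeScript frontend and is not shown in this article); the emotion names come from the prompt above, the motion names &lt;code&gt;bow&lt;/code&gt; and &lt;code&gt;present&lt;/code&gt; from the architecture diagram, and &lt;code&gt;parse_tags&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import re

EMOTIONS = {"neutral", "happy", "angry", "sad", "relaxed", "surprised"}
MOTIONS = {"bow", "present"}
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def parse_tags(response):
    """Split a response into (emotion tags, motion tags, text without known tags)."""
    emotions, motions = [], []

    def strip_known(match):
        name = match.group(1)
        if name in EMOTIONS:
            emotions.append(name)
            return ""
        if name in MOTIONS:
            motions.append(name)
            return ""
        return match.group(0)  # unknown tag: keep it as ordinary text

    text = TAG_PATTERN.sub(strip_known, response)
    return emotions, motions, text


print(parse_tags("[happy][bow]Thank you so much!"))
```

&lt;p&gt;Unknown bracketed text is left in place, so something like &lt;code&gt;[note]&lt;/code&gt; in a reply is not swallowed by mistake.&lt;/p&gt;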



&lt;h1&gt;
  
  
  Handling Conflicts Between Expressions and Motions
&lt;/h1&gt;

&lt;p&gt;Simply applying facial expressions and motions at the same time can produce unexpected results, so I added some coordination between them.&lt;br&gt;
For example, keeping the eyes open during a bow looked unnatural, so the bow definition sets &lt;code&gt;closeEyes: true&lt;/code&gt; and the motion side closes the eyes.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;EmoteController&lt;/code&gt; manages this by passing flags between controllers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/features/emoteController/emoteController.ts&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;updateExpression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isEmotionActive&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_expressionController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;isEmotionActive&lt;/span&gt;
  &lt;span class="c1"&gt;// Skip auto-blink if the motion is closing eyes and expression is neutral&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;skipAutoBlink&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_gestureController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;isClosingEyes&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;isEmotionActive&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_expressionController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;skipAutoBlink&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;updateGesture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isEmotionActive&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_expressionController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;isEmotionActive&lt;/span&gt;
  &lt;span class="c1"&gt;// Skip motion eye-close if an emotion expression is active&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_gestureController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;isEmotionActive&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The emotion expressions and the motion's eye-close feature are mutually exclusive.&lt;br&gt;
When the emotion is &lt;code&gt;neutral&lt;/code&gt;, the motion side closes the eyes. When an emotion is active, the motion's eye-close is disabled and control is handed to the expression side.&lt;/p&gt;
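
&lt;p&gt;The rule can be summarized as a small truth table. The Python sketch below mirrors the two flags from &lt;code&gt;EmoteController&lt;/code&gt;; the first follows the &lt;code&gt;skipAutoBlink&lt;/code&gt; expression shown above, while the second is an assumption about what &lt;code&gt;GestureController.update&lt;/code&gt; does with its second argument, since that method's body is not shown. &lt;code&gt;resolve_eye_control&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
def resolve_eye_control(is_closing_eyes, is_emotion_active):
    """Mirror the two flags EmoteController passes between its controllers."""
    # Same condition as `skipAutoBlink` in updateExpression().
    skip_auto_blink = is_closing_eyes and not is_emotion_active
    # Assumption: GestureController.update() drops the eye-close when its
    # second argument (isEmotionActive) is true; its body is not shown above.
    motion_closes_eyes = is_closing_eyes and not is_emotion_active
    return {
        "skip_auto_blink": skip_auto_blink,
        "motion_closes_eyes": motion_closes_eyes,
    }


for closing in (False, True):
    for emotion in (False, True):
        print(closing, emotion, resolve_eye_control(closing, emotion))
```

&lt;p&gt;Only the combination "motion wants closed eyes, no emotion active" hands the eyes to the motion side; in every other case the expression side keeps control.&lt;/p&gt;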

&lt;h1&gt;
  
  
  Wrapping Up
&lt;/h1&gt;

&lt;p&gt;A chat UI is the usual frontend for an AI agent, but even a simple model like this one feels lively once it starts moving, which makes it a lot of fun.&lt;br&gt;
That said, controlling motions is tricky: working out which bones to rotate, and by how much, is surprisingly difficult.&lt;br&gt;
For more complex motions, purchasing a motion pack may be a better option.&lt;/p&gt;

</description>
      <category>vrm</category>
      <category>threejs</category>
      <category>nextjs</category>
      <category>typescript</category>
    </item>
  </channel>
</rss>
