<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gabriel Koo</title>
    <description>The latest articles on DEV Community by Gabriel Koo (@gabrielkoo).</description>
    <link>https://dev.to/gabrielkoo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F915460%2Fd0dc6e3b-e400-47c8-b6fc-ecc9cf80be4b.jpeg</url>
      <title>DEV Community: Gabriel Koo</title>
      <link>https://dev.to/gabrielkoo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gabrielkoo"/>
    <language>en</language>
    <item>
      <title>Bedrock for AI Coding Tools: Mantle vs Gateway vs LiteLLM — A Decision Guide for AWS Credit Burners</title>
      <dc:creator>Gabriel Koo</dc:creator>
      <pubDate>Sun, 22 Mar 2026 08:59:54 +0000</pubDate>
      <link>https://dev.to/aws-builders/bedrock-for-ai-coding-tools-mantle-vs-gateway-vs-litellm-a-decision-guide-for-aws-credit-burners-1h01</link>
      <guid>https://dev.to/aws-builders/bedrock-for-ai-coding-tools-mantle-vs-gateway-vs-litellm-a-decision-guide-for-aws-credit-burners-1h01</guid>
      <description>&lt;p&gt;You have AWS credits. You want to use them on AI coding tools — OpenCode, Codex CLI, Claude Code, whatever. Amazon Bedrock has the models. But how do you actually connect them?&lt;/p&gt;

&lt;p&gt;There are three approaches, and picking the wrong one wastes time. Here's the decision guide I wish I had.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;All data in this post is as of March 2026. Model counts and API support may change — check &lt;a href="https://amazonbedrockmodels.github.io" rel="noopener noreferrer"&gt;amazonbedrockmodels.github.io&lt;/a&gt; for the latest.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;TL;DR&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Just want it to work?&lt;/strong&gt; Mantle + OpenCode. Five minutes, zero infra.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need Claude models via OpenAI API?&lt;/strong&gt; bedrock-access-gateway on Lambda.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need Claude Code specifically?&lt;/strong&gt; LiteLLM. It's the only path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI?&lt;/strong&gt; Broken with all three. Wait for LiteLLM to fix a tool translation bug.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The three paths&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sln5ko8ufwm73r5r8tb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sln5ko8ufwm73r5r8tb.png" alt="Decision flowchart: Mantle vs bedrock-access-gateway vs LiteLLM" width="678" height="1295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One thing all three have in common: your API keys and code context stay within your AWS account or your own infrastructure.&lt;/strong&gt; Third-party AI gateways exist (Bifrost, Portkey, etc.), but they require routing your Bedrock API keys and code context through someone else's servers. Self-hosted or AWS-native — that's the baseline.&lt;/p&gt;

&lt;h3&gt;1. Bedrock Mantle — no self-hosted infra required&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-mantle.html" rel="noopener noreferrer"&gt;Mantle&lt;/a&gt; is AWS's native OpenAI-compatible endpoint. No Lambda, no container, no proxy — just set your base URL and API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://bedrock-mantle.us-east-1.api.aws/v1"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-bedrock-api-key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What's on Mantle:&lt;/strong&gt; 38 open-weight models — DeepSeek, Mistral, Qwen, GLM, NVIDIA Nemotron, MiniMax, Moonshot Kimi, Google Gemma, OpenAI gpt-oss, and Writer Palmyra.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's NOT on Mantle:&lt;/strong&gt; Anthropic Claude, Amazon Nova, Meta Llama, AI21, Cohere — the proprietary/first-party models are absent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API coverage:&lt;/strong&gt; Mantle exposes Chat Completions (&lt;code&gt;/v1/chat/completions&lt;/code&gt;) and Responses API (&lt;code&gt;/v1/responses&lt;/code&gt;). No Anthropic Messages API (&lt;code&gt;/v1/messages&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The Responses API is limited — only 4 models support it: &lt;code&gt;openai.gpt-oss-120b-1:0&lt;/code&gt;, &lt;code&gt;openai.gpt-oss-20b-1:0&lt;/code&gt;, &lt;code&gt;openai.gpt-oss-120b&lt;/code&gt;, and &lt;code&gt;openai.gpt-oss-20b&lt;/code&gt;. Every other model is Chat Completions only. I verified this by scraping all 102 model card pages in the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html" rel="noopener noreferrer"&gt;AWS documentation&lt;/a&gt;.&lt;/p&gt;
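&lt;p&gt;In client code, this split means routing per model. A minimal sketch (the base URL and model IDs are the ones above; the helper itself is hypothetical, not part of any SDK):&lt;/p&gt;

```python
# Sketch: route a Mantle model to the endpoint it actually supports.
# Assumes the four gpt-oss IDs listed above are the only
# Responses-capable models; everything else is Chat Completions only.

MANTLE_BASE = "https://bedrock-mantle.us-east-1.api.aws/v1"

RESPONSES_CAPABLE = {
    "openai.gpt-oss-120b-1:0",
    "openai.gpt-oss-20b-1:0",
    "openai.gpt-oss-120b",
    "openai.gpt-oss-20b",
}

def mantle_endpoint(model_id: str) -> str:
    """Return the Mantle endpoint URL for a given model ID."""
    if model_id in RESPONSES_CAPABLE:
        return MANTLE_BASE + "/responses"
    return MANTLE_BASE + "/chat/completions"
```

&lt;p&gt;So &lt;code&gt;mantle_endpoint("deepseek.v3.2")&lt;/code&gt; resolves to the Chat Completions URL, while the gpt-oss IDs resolve to &lt;code&gt;/responses&lt;/code&gt;.&lt;/p&gt;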

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Standard Bedrock on-demand pricing. No gateway markup, no infra costs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; OpenCode or any tool that speaks OpenAI Chat Completions.&lt;/p&gt;

&lt;h3&gt;2. bedrock-access-gateway — self-hosted, all models&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/aws-samples/bedrock-access-gateway" rel="noopener noreferrer"&gt;bedrock-access-gateway&lt;/a&gt; (or my fork, &lt;a href="https://github.com/gabrielkoo/bedrock-access-gateway-function-url" rel="noopener noreferrer"&gt;bedrock-access-gateway-function-url&lt;/a&gt;) gives you an OpenAI-compatible proxy backed by all Bedrock models — including Claude, Nova, and Llama.&lt;/p&gt;

&lt;p&gt;Deploy it as a Lambda Function URL or on ECS, and you get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://your-lambda-url.lambda-url.us-west-2.on.aws/api/v1"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-gateway-api-key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tradeoff: you maintain infrastructure. But you get access to every Bedrock model through a single OpenAI-compatible endpoint.&lt;/p&gt;
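&lt;p&gt;Under the hood it's just the OpenAI wire format with your gateway key as the bearer token. A sketch of the raw request (the Function URL, key, and model ID are placeholders, not real values):&lt;/p&gt;

```python
import json

def build_chat_request(base_url, api_key, model, prompt):
    """Build an OpenAI-style chat completion request for the gateway.
    Returns (url, headers, body) ready to hand to any HTTP client."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": "Bearer " + api_key,
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return url, headers, body

# Placeholder values: substitute your own Function URL and gateway key.
url, headers, body = build_chat_request(
    "https://your-lambda-url.lambda-url.us-west-2.on.aws/api/v1",
    "your-gateway-api-key",
    "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "Hello",
)
```

&lt;p&gt;Pass the three values to whatever HTTP client you prefer; the gateway translates the call to Bedrock on your behalf.&lt;/p&gt;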

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Bedrock on-demand pricing + Lambda/ECS compute costs (minimal for Lambda Function URLs — you pay per invocation).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; When you need Claude or Nova through OpenAI-compatible tools, or want full control over routing, caching, and logging.&lt;/p&gt;

&lt;h3&gt;3. LiteLLM — the universal translator&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/BerriAI/litellm" rel="noopener noreferrer"&gt;LiteLLM&lt;/a&gt; is the Swiss Army knife. It translates between API schemas — OpenAI, Anthropic, Bedrock native, and more. It's the only option that gives you Anthropic Messages API (&lt;code&gt;/v1/messages&lt;/code&gt;) compatibility with Bedrock models.&lt;/p&gt;

&lt;p&gt;This matters because &lt;strong&gt;Claude Code uses the Anthropic API schema&lt;/strong&gt;, not OpenAI's. If you want to run Claude Code against Bedrock, LiteLLM is your best (and arguably only) option. I tested this end-to-end: Claude Code CLI → LiteLLM → Bedrock Converse API — it works, including streaming responses.&lt;/p&gt;
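&lt;p&gt;As a sketch, a minimal LiteLLM &lt;code&gt;config.yaml&lt;/code&gt; for this path might look like the following (the model ID and region are illustrative; check LiteLLM's Bedrock docs for the models enabled in your account):&lt;/p&gt;

```yaml
model_list:
  - model_name: claude-sonnet   # the alias Claude Code will request
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_region_name: us-west-2
```

&lt;p&gt;Run &lt;code&gt;litellm --config config.yaml&lt;/code&gt; and point Claude Code at the proxy (for example via &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt;).&lt;/p&gt;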

&lt;p&gt;Is it perfect? No. Setup is more complex (Python process or Docker container, optional PostgreSQL for analytics), and you're adding another layer of abstraction. But it's the most flexible gateway available.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Bedrock on-demand pricing + your compute costs for hosting LiteLLM. No per-call markup from LiteLLM itself (open source).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Claude Code, or when you need both OpenAI and Anthropic API compatibility from a single proxy.&lt;/p&gt;

&lt;h2&gt;Tool compatibility matrix&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;API Schema&lt;/th&gt;
&lt;th&gt;Mantle&lt;/th&gt;
&lt;th&gt;bedrock-access-gateway&lt;/th&gt;
&lt;th&gt;LiteLLM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenCode&lt;/td&gt;
&lt;td&gt;OpenAI Chat&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex CLI&lt;/td&gt;
&lt;td&gt;OpenAI Responses&lt;/td&gt;
&lt;td&gt;❌ Auth issues&lt;/td&gt;
&lt;td&gt;❌ No Responses API&lt;/td&gt;
&lt;td&gt;⚠️ Tool bug&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;Anthropic Messages&lt;/td&gt;
&lt;td&gt;❌ No support&lt;/td&gt;
&lt;td&gt;❌ Wrong schema&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Note: Anthropic-native tools like Kiro CLI also work through LiteLLM's Anthropic Messages API translation.&lt;/p&gt;

&lt;h3&gt;A note on Codex CLI&lt;/h3&gt;

&lt;p&gt;Codex CLI requires the Responses API (&lt;code&gt;/v1/responses&lt;/code&gt;), which limits your options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mantle:&lt;/strong&gt; Only the 4 OpenAI gpt-oss models support Responses API. Even with those, I hit 401 auth errors (Bearer token not passed correctly through Codex's HTTPS transport) and tool type rejections (&lt;code&gt;web_search&lt;/code&gt; type not supported — only &lt;code&gt;function&lt;/code&gt; and &lt;code&gt;mcp&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;bedrock-access-gateway:&lt;/strong&gt; No Responses API at all — &lt;code&gt;/v1/responses&lt;/code&gt; returns 404. The gateway only implements Chat Completions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LiteLLM:&lt;/strong&gt; Supports Responses API (&lt;a href="https://github.com/BerriAI/litellm/releases" rel="noopener noreferrer"&gt;v1.66.3+&lt;/a&gt;) and has an &lt;a href="https://docs.litellm.ai/docs/tutorials/openai_codex" rel="noopener noreferrer"&gt;official Codex CLI tutorial&lt;/a&gt;. However, as of v1.82.5, there's a tool translation bug: Codex CLI sends built-in tool types that LiteLLM converts to Bedrock Converse format with empty &lt;code&gt;toolSpec.name&lt;/code&gt; fields, causing Bedrock validation errors. The Responses API itself works fine when tested with standard function tools. This should be fixable on the LiteLLM side.&lt;/li&gt;
&lt;/ul&gt;
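&lt;p&gt;For reference, the shape that does work: a standard Responses API function tool carries a top-level &lt;code&gt;name&lt;/code&gt;, which is exactly the field that ends up empty after the buggy translation. A sketch (the tool itself is hypothetical):&lt;/p&gt;

```python
def responses_function_tool(name, description, parameters):
    """Build a standard Responses-API function tool definition.
    A non-empty name is what Bedrock's toolSpec validation requires."""
    assert name, "tool name must be non-empty"
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": parameters,
    }

# Hypothetical example tool.
tool = responses_function_tool(
    "read_file",
    "Read a file from the workspace",
    {"type": "object", "properties": {"path": {"type": "string"}}},
)
```

&lt;p&gt;Tools built like this went through LiteLLM to Bedrock without issue in my testing; it's only Codex CLI's built-in tool types that trip the translation.&lt;/p&gt;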

&lt;h2&gt;Quick setup: OpenCode + Mantle&lt;/h2&gt;

&lt;p&gt;If you just want to burn AWS credits on a coding CLI today, here's the fastest path. &lt;a href="https://opencode.ai" rel="noopener noreferrer"&gt;OpenCode&lt;/a&gt; (v1.2.27+) works with Mantle out of the box:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://opencode.ai/config.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"bedrock-mantle"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"npm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@ai-sdk/openai-compatible"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bedrock Mantle"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"baseURL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://bedrock-mantle.us-east-1.api.aws/v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{env:BEDROCK_API_KEY}"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"openai.gpt-oss-120b"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GPT OSS 120B"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"zai.glm-5"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GLM 5 (744B/40B MoE)"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"qwen.qwen3-coder-480b-a35b-instruct"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Qwen3 Coder 480B"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"deepseek.v3.2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek V3.2"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"mistral.mistral-large-3-675b-instruct"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Mistral Large 3"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bedrock-mantle/openai.gpt-oss-120b"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save to &lt;code&gt;~/.config/opencode/opencode.json&lt;/code&gt;, set &lt;code&gt;BEDROCK_API_KEY&lt;/code&gt;, and you're coding.&lt;/p&gt;

&lt;h2&gt;The bottom line&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Mantle&lt;/th&gt;
&lt;th&gt;bedrock-access-gateway&lt;/th&gt;
&lt;th&gt;LiteLLM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Infra to maintain&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Lambda/ECS&lt;/td&gt;
&lt;td&gt;Container/process&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Models available&lt;/td&gt;
&lt;td&gt;38 (open-weight)&lt;/td&gt;
&lt;td&gt;All Bedrock&lt;/td&gt;
&lt;td&gt;All Bedrock&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI Chat API&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI Responses API&lt;/td&gt;
&lt;td&gt;⚠️ gpt-oss only&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic Messages API&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenCode&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex CLI&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;⚠️ Tool bug&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extra cost&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;~$0 (Lambda)&lt;/td&gt;
&lt;td&gt;Your compute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup time&lt;/td&gt;
&lt;td&gt;5 min&lt;/td&gt;
&lt;td&gt;30 min&lt;/td&gt;
&lt;td&gt;1 hr&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;Track available Mantle models&lt;/h2&gt;

&lt;p&gt;I maintain &lt;a href="https://amazonbedrockmodels.github.io" rel="noopener noreferrer"&gt;amazonbedrockmodels.github.io&lt;/a&gt; — a catalog of every Bedrock model with API support badges and endpoint support (Mantle vs Runtime), scraped from the AWS documentation.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Burning AWS credits on something interesting? I'd love to hear what tools and models you're using — drop a comment.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>ai</category>
      <category>openai</category>
    </item>
    <item>
      <title>From 3-Minute Cold Starts to ~20 Seconds: Whisper on AWS Lambda + EFS for OpenClaw</title>
      <dc:creator>Gabriel Koo</dc:creator>
      <pubDate>Fri, 13 Mar 2026 01:27:21 +0000</pubDate>
      <link>https://dev.to/aws-builders/from-3-minute-cold-starts-to-20-seconds-whisper-on-aws-lambda-efs-for-openclaw-9c5</link>
      <guid>https://dev.to/aws-builders/from-3-minute-cold-starts-to-20-seconds-whisper-on-aws-lambda-efs-for-openclaw-9c5</guid>
      <description>&lt;p&gt;&lt;em&gt;Part 3 of my series on building a low-cost personal AI stack on AWS.&lt;/em&gt; &lt;br&gt;
&lt;em&gt;&lt;a href="https://dev.to/aws-builders/i-squeezed-my-1k-monthly-openclaw-api-bill-with-20month-in-aws-credits-heres-the-exact-setup-3gj4"&gt;Part 1 — Squeezing my $1k/month API bill to $20/month with AWS Credits&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://dev.to/aws-builders/drop-in-perplexity-sonar-replacement-with-aws-bedrock-nova-grounding-35o9"&gt;Part 2 — Drop-in Perplexity Sonar replacement with AWS Bedrock Nova Grounding&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;


&lt;h2&gt;TL;DR&lt;/h2&gt;

&lt;p&gt;I built a self-hosted speech-to-text API on AWS Lambda using &lt;a href="https://github.com/SYSTRAN/faster-whisper" rel="noopener noreferrer"&gt;faster-whisper&lt;/a&gt;. After trying Amazon Transcribe, SageMaker Serverless, and Lambda with a bundled model, I landed on a &lt;strong&gt;Lambda + EFS + S3&lt;/strong&gt; architecture that achieves ~20-30 second cold starts (once the model is cached on EFS) for ~$0.21/month in storage costs. Once warm, specifying the language drops response time to ~10s.&lt;/p&gt;

&lt;p&gt;Open source: &lt;a href="https://github.com/gabrielkoo/aws-lambda-whisper-adaptor" rel="noopener noreferrer"&gt;gabrielkoo/aws-lambda-whisper-adaptor&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;I wanted to automatically transcribe Telegram voice messages. The requirements were simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy&lt;/strong&gt;: Good enough for Cantonese&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: Pay-per-use, scales to zero when idle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt;: Cold start under 60 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's a fourth constraint that's easy to overlook outside Hong Kong: &lt;strong&gt;most managed STT APIs simply aren't available here&lt;/strong&gt;. OpenAI's Whisper API is unavailable in Hong Kong, which falls under the same notorious regional restrictions OpenAI applies to mainland China. Google's Gemini models are available and actually competitive on both accuracy and price — Gemini 3 Flash achieves 3.1% WER at ~$1.92/1000 minutes (&lt;a href="https://artificialanalysis.ai/speech-to-text" rel="noopener noreferrer"&gt;Artificial Analysis STT leaderboard&lt;/a&gt;), cheaper than OpenAI's Whisper API and competitive with Lambda at low volume. The real reason I went with Lambda: AWS Credits from the Community Builder program (same theme as the rest of this series) make it effectively free.&lt;/p&gt;

&lt;p&gt;Simple enough. Except it took four attempts to get there.&lt;/p&gt;


&lt;h2&gt;What I Tried (and Why It Didn't Work)&lt;/h2&gt;
&lt;h3&gt;Option 1: Amazon Transcribe&lt;/h3&gt;

&lt;p&gt;The obvious first choice — fully managed, pay-per-use, native AWS integration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why I rejected it before even trying:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Amazon Transcribe &lt;a href="https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html" rel="noopener noreferrer"&gt;supports &lt;code&gt;zh-CN&lt;/code&gt; and &lt;code&gt;zh-TW&lt;/code&gt;, but not &lt;code&gt;yue&lt;/code&gt; (Cantonese)&lt;/a&gt;. Whisper large-v3-turbo handles Cantonese significantly better, and accuracy matters more than convenience here.&lt;/p&gt;


&lt;h3&gt;Option 2: SageMaker Serverless Inference&lt;/h3&gt;

&lt;p&gt;SageMaker Serverless scales to zero and handles model serving — sounds perfect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I deployed a SageMaker Serverless endpoint with faster-whisper. The first invocation after idle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Container provisioning: ~30s&lt;/li&gt;
&lt;li&gt;Model loading: ~45-60s&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total cold start: 60-90 seconds&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a voice message that's 5-10 seconds long, waiting 90 seconds is a terrible experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 6GB memory wall:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SageMaker Serverless &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html" rel="noopener noreferrer"&gt;maxes out at 6144 MB (6 GB) RAM&lt;/a&gt;. Here's why that's a problem for Whisper:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/Zoont/faster-whisper-large-v3-turbo-int8-ct2" rel="noopener noreferrer"&gt;&lt;code&gt;whisper-large-v3-turbo&lt;/code&gt; (INT8)&lt;/a&gt;: ~780MB model + ~2GB Python/runtime overhead ≈ 2.8GB minimum&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/Systran/faster-whisper-large-v3" rel="noopener noreferrer"&gt;&lt;code&gt;whisper-large-v3&lt;/code&gt; (FP16)&lt;/a&gt;: ~3GB model alone — barely fits, zero headroom for audio processing&lt;/li&gt;
&lt;li&gt;Any concurrent requests? You're OOM.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-memory.html" rel="noopener noreferrer"&gt;Lambda goes up to 10,240 MB&lt;/a&gt;. That headroom matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost comparison:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/sagemaker/pricing/" rel="noopener noreferrer"&gt;SageMaker Serverless bills per GB-second&lt;/a&gt; of inference time. For sporadic voice message transcription (~10s per request, a few times a day), Lambda's per-invocation pricing is significantly cheaper. My Lambda setup costs ~$0.21/month in storage — the compute is essentially free at this volume.&lt;/p&gt;

&lt;p&gt;I deleted the endpoint after testing.&lt;/p&gt;


&lt;h3&gt;Option 2b: Bedrock Marketplace&lt;/h3&gt;

&lt;p&gt;AWS Bedrock Marketplace &lt;a href="https://aws.amazon.com/blogs/machine-learning/build-a-serverless-audio-summarization-solution-with-amazon-bedrock-and-whisper/" rel="noopener noreferrer"&gt;does list Whisper Large V3 Turbo&lt;/a&gt; — but it deploys on a &lt;strong&gt;dedicated endpoint instance&lt;/strong&gt;. Auto-scaling is available (including scale-to-zero), but that creates a different problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Keep minimum 1 instance&lt;/strong&gt;: always paying for idle time, even at 3am&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scale to zero&lt;/strong&gt;: cold starts when traffic resumes — SageMaker cold starts are measured in &lt;strong&gt;minutes&lt;/strong&gt;, not seconds&lt;/li&gt;
&lt;li&gt;Not token/usage-based pricing either way&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a Telegram bot that gets a few voice messages a day, you're either burning money on idle instances or waiting minutes for the first message to transcribe. Lambda's ~20-30s cold start looks great by comparison.&lt;/p&gt;


&lt;h3&gt;Option 3: Lambda with Bundled Model&lt;/h3&gt;

&lt;p&gt;Next idea: bundle the model directly into the Docker image. No external dependencies, simple architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Download model during build&lt;/span&gt;
&lt;span class="c"&gt;# Note: using openai/whisper-large-v3-turbo converted to int8 via sync-model workflow&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;python &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"from faster_whisper import WhisperModel; WhisperModel('openai/whisper-large-v3-turbo')"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Docker image size: &lt;strong&gt;~10GB&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;ECR push time: &lt;strong&gt;5+ minutes&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Lambda cold start: &lt;strong&gt;2 minutes 51 seconds&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cold start is dominated by Lambda pulling the 10GB image from ECR. AWS Lambda caches images, but any cold start after the cache expires hits this wall.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it didn't work:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3-minute cold start is unusable for interactive transcription&lt;/li&gt;
&lt;li&gt;Every code change requires rebuilding and pushing a 10GB image&lt;/li&gt;
&lt;li&gt;ECR storage: ~$1/month just for the image&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;Option 4: Lambda + S3 (No EFS)&lt;/h3&gt;

&lt;p&gt;What if Lambda downloads the model from S3 on cold start, storing it in &lt;code&gt;/tmp&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Lambda's &lt;code&gt;/tmp&lt;/code&gt; is ephemeral. Every cold start re-downloads the model from S3:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 download for 1.6GB FP16 model: &lt;strong&gt;30-60 seconds&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;S3 download for 780MB INT8 model: &lt;strong&gt;15-30 seconds&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is better than the bundled model approach, but there's a bigger issue: &lt;strong&gt;no caching between Lambda instances&lt;/strong&gt;. If you have 3 concurrent invocations, all 3 download the model independently. You're paying for S3 transfer on every cold start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What about Lambda SnapStart or Durable Functions?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AWS has since shipped two features that sound relevant here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SnapStart for Python&lt;/strong&gt; (Nov 2024): snapshots the initialized execution environment — sounds perfect for caching a loaded model. The catch: &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html" rel="noopener noreferrer"&gt;SnapStart doesn't support container images&lt;/a&gt;. This adaptor is container-based, so it's off the table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/12/lambda-durable-multi-step-applications-ai-workflows/" rel="noopener noreferrer"&gt;Lambda Durable Functions&lt;/a&gt;&lt;/strong&gt; (re:Invent 2025): enables multi-step workflows with automatic checkpointing, pause/resume for up to one year, and failure recovery. This is workflow orchestration (think Azure Durable Functions) — useful for multi-step AI pipelines, but not for persisting a 780MB model binary between cold starts.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;EFS remains the right solution for model caching.&lt;/p&gt;




&lt;h2&gt;What Actually Worked: Lambda + EFS + S3&lt;/h2&gt;

&lt;p&gt;The solution: use &lt;strong&gt;EFS as a persistent model cache&lt;/strong&gt;, bootstrapped from S3. I've used EFS for &lt;a href="https://dev.to/aws-builders/scale-a-stateful-streamlit-chatbot-with-aws-ecs-and-efs-48gm"&gt;persistent Streamlit state on ECS&lt;/a&gt; before — same pattern, different compute layer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request → Lambda Function URL
               ↓
          Lambda (VPC)
               ↓ first cold start only: S3 → EFS
              EFS (model cached here permanently)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwezctzw3zlsjpf19q5f1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwezctzw3zlsjpf19q5f1.png" alt="Logic Flow" width="800" height="1846"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;First cold start&lt;/strong&gt; (once per model): Lambda checks for a marker file on EFS. If missing, downloads model from S3 to EFS (~55s for INT8). Writes marker file. Then loads model into RAM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subsequent cold starts&lt;/strong&gt; (new container, model already on EFS): Marker file exists → load model from EFS into RAM (~20-30s for INT8).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Warm invocations&lt;/strong&gt; (same container reused): Model already in memory → transcription-only time (~10-22s depending on audio length and whether language is specified).
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;HF_MODEL_REPO&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;HF_MODEL_REPO&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;openai/whisper-large-v3-turbo&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;MODEL_SLUG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;HF_MODEL_REPO&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;EFS_MODEL_DIR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/mnt/whisper-models/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;MODEL_SLUG&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;MODEL_MARKER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/mnt/whisper-models/.ready-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;MODEL_SLUG&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;bootstrap_model&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MODEL_MARKER&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;WhisperModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EFS_MODEL_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compute_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;int8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# First run: sync model from S3 to EFS
&lt;/span&gt;    &lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;models/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;MODEL_SLUG&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makedirs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EFS_MODEL_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;paginator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_paginator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;list_objects_v2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;paginator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;paginate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MODEL_S3_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Contents&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
            &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;local_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EFS_MODEL_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;):])&lt;/span&gt;
            &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makedirs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;local_path&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;download_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MODEL_S3_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;local_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MODEL_MARKER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Mark as ready
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;WhisperModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EFS_MODEL_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compute_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;int8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bootstrap_model&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Runs at Lambda init time, cached for warm invocations
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why EFS works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EFS persists across Lambda instances — model is downloaded &lt;strong&gt;once&lt;/strong&gt;, reused forever&lt;/li&gt;
&lt;li&gt;EFS is mounted at &lt;code&gt;/mnt/whisper-models&lt;/code&gt; — Lambda reads it like a local filesystem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S3 VPC Gateway Endpoint is free&lt;/strong&gt; — no NAT Gateway needed (saves ~$32/month)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero internet egress&lt;/strong&gt; — Lambda → S3 via VPC Gateway Endpoint, Lambda → EFS within VPC. The Lambda function never reaches the internet. This is a meaningful security benefit when using third-party models from HuggingFace — model weights never leave the AWS network once synced to S3.&lt;/li&gt;
&lt;li&gt;EFS storage: ~$0.19/month for the 780MB INT8 model&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;🔒 &lt;strong&gt;Security note:&lt;/strong&gt; The Lambda runs in a VPC with &lt;strong&gt;no internet access&lt;/strong&gt; — no NAT Gateway, no public subnet. It can only reach EFS (VPC-internal) and S3 (via the free VPC Gateway Endpoint). This means even if you're using a third-party HuggingFace model, the model weights and your audio data never leave the AWS network. No data exfiltration risk, no outbound calls to unknown endpoints.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  INT8 vs FP16: The Model Size Trade-off
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;openai/whisper-large-v3-turbo&lt;/code&gt; model on HuggingFace needs conversion to CTranslate2 format. The &lt;code&gt;sync-model&lt;/code&gt; workflow handles this, converting to INT8 and fixing the &lt;code&gt;num_mel_bins&lt;/code&gt; config. Alternatively, use &lt;a href="https://huggingface.co/Zoont/faster-whisper-large-v3-turbo-int8-ct2" rel="noopener noreferrer"&gt;&lt;code&gt;Zoont/faster-whisper-large-v3-turbo-int8-ct2&lt;/code&gt;&lt;/a&gt; — a pre-converted CTranslate2 INT8 model that works out of the box with &lt;code&gt;quantization=none&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Size (EFS)&lt;/th&gt;
&lt;th&gt;First Bootstrap&lt;/th&gt;
&lt;th&gt;EFS Cold Start&lt;/th&gt;
&lt;th&gt;Warm (2.5s audio)&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Zoont/faster-whisper-large-v3-turbo-int8-ct2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~780MB&lt;/td&gt;
&lt;td&gt;~55s&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~22s&lt;/strong&gt; ✅&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~10s&lt;/strong&gt; ✅&lt;/td&gt;
&lt;td&gt;~2.8GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;openai/whisper-large-v3-turbo&lt;/code&gt; (INT8, via sync-model)&lt;/td&gt;
&lt;td&gt;~780MB&lt;/td&gt;
&lt;td&gt;~55s&lt;/td&gt;
&lt;td&gt;~22s&lt;/td&gt;
&lt;td&gt;~10s&lt;/td&gt;
&lt;td&gt;~2.8GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;openai/whisper-large-v3-turbo&lt;/code&gt; (FP16)&lt;/td&gt;
&lt;td&gt;~1.5GB&lt;/td&gt;
&lt;td&gt;~126s&lt;/td&gt;
&lt;td&gt;~40s&lt;/td&gt;
&lt;td&gt;~15s&lt;/td&gt;
&lt;td&gt;~4GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Systran/faster-whisper-large-v3&lt;/code&gt; (FP16, loaded as int8)&lt;/td&gt;
&lt;td&gt;~1.6GB&lt;/td&gt;
&lt;td&gt;~54s&lt;/td&gt;
&lt;td&gt;~30s&lt;/td&gt;
&lt;td&gt;~13s&lt;/td&gt;
&lt;td&gt;6GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Recommended:&lt;/strong&gt; &lt;code&gt;Zoont/faster-whisper-large-v3-turbo-int8-ct2&lt;/code&gt; — no conversion step needed, identical performance to the openai model converted to INT8. Use &lt;code&gt;quantization=none&lt;/code&gt; in the sync-model workflow since it's already in CTranslate2 format.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cost Breakdown
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;EFS storage (780MB INT8)&lt;/td&gt;
&lt;td&gt;~$0.19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 storage (780MB)&lt;/td&gt;
&lt;td&gt;~$0.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda compute&lt;/td&gt;
&lt;td&gt;~$0.00167/warm invocation*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 VPC Gateway Endpoint&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Free&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NAT Gateway&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Not needed ($0)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total (storage only)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$0.21/month&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;*10GB × 10s = 100 GB-seconds per warm invocation. The &lt;a href="https://aws.amazon.com/lambda/pricing/" rel="noopener noreferrer"&gt;Lambda free tier&lt;/a&gt; covers &lt;strong&gt;400,000 GB-seconds/month&lt;/strong&gt; — roughly 4,000 warm invocations. For a personal bot, compute cost is effectively &lt;strong&gt;$0&lt;/strong&gt;. Storage dominates.&lt;/em&gt;&lt;/p&gt;
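&lt;p&gt;The arithmetic, spelled out ($0.0000166667/GB-second is the x86 Lambda rate in most regions; check the pricing page for yours):&lt;/p&gt;

```python
PRICE_PER_GB_SECOND = 0.0000166667  # x86 Lambda price, most regions
FREE_TIER_GB_SECONDS = 400_000      # free tier, per month

memory_gb = 10     # configured memory
warm_seconds = 10  # typical warm transcription time

gb_seconds = memory_gb * warm_seconds                   # 100 GB-seconds/invocation
cost_per_invocation = gb_seconds * PRICE_PER_GB_SECOND  # ~$0.00167
free_invocations = FREE_TIER_GB_SECONDS // gb_seconds   # 4,000 per month

print(gb_seconds, round(cost_per_invocation, 5), free_invocations)
```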

&lt;p&gt;Compare to SageMaker Serverless: minimum ~$5-10/month for similar workloads, plus the 60-90s cold start penalty.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why not Provisioned Concurrency?&lt;/strong&gt; PC keeps Lambda permanently warm (no cold starts), but costs ~$0.0000097222/GB-second. For a 10GB function running 24/7: ~$252/month. Even a minimal 4GB setup runs ~$100/month — roughly 500x more than the $0.21 storage approach. For a personal bot with a few voice messages a day, the occasional ~60s cold start is a fine trade-off.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  vs. OpenAI Whisper API
&lt;/h3&gt;

&lt;p&gt;OpenAI's Whisper API costs &lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;&lt;strong&gt;$0.006/minute&lt;/strong&gt;&lt;/a&gt;. Here's how it compares for a bot averaging 15s voice messages:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Volume&lt;/th&gt;
&lt;th&gt;OpenAI Whisper API&lt;/th&gt;
&lt;th&gt;Self-hosted Lambda&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;50 msgs/month&lt;/td&gt;
&lt;td&gt;$0.08&lt;/td&gt;
&lt;td&gt;$0.21 (storage only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;140 msgs/month&lt;/td&gt;
&lt;td&gt;$0.21&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$0.21&lt;/strong&gt; ← break-even&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;500 msgs/month&lt;/td&gt;
&lt;td&gt;$0.75&lt;/td&gt;
&lt;td&gt;$0.21 (storage only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,000 msgs/month&lt;/td&gt;
&lt;td&gt;$1.50&lt;/td&gt;
&lt;td&gt;$0.21 (storage only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,000 msgs/month&lt;/td&gt;
&lt;td&gt;$6.00&lt;/td&gt;
&lt;td&gt;$0.21 (storage only)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Lambda compute is free within the free tier (~4,000 warm invocations/month). Beyond that it's ~$0.00167 per invocation — and 4,000 messages a month is already heavy use for a personal bot.&lt;/p&gt;

&lt;p&gt;Break-even: &lt;strong&gt;~140 messages/month&lt;/strong&gt;. Above that, Lambda wins on cost.&lt;/p&gt;
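&lt;p&gt;The break-even point falls straight out of the per-message price:&lt;/p&gt;

```python
OPENAI_PER_MINUTE = 0.006   # Whisper API price, $/minute
AVG_MESSAGE_SECONDS = 15    # typical voice message
MONTHLY_STORAGE = 0.21      # EFS + S3, from the cost table

per_message = OPENAI_PER_MINUTE * AVG_MESSAGE_SECONDS / 60  # $0.0015 per message
break_even_messages = MONTHLY_STORAGE / per_message         # 140 messages/month

print(per_message, round(break_even_messages))
```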

&lt;p&gt;But cost isn't the only reason to self-host:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Geographic availability&lt;/strong&gt;: OpenAI's API is not available in Hong Kong — HK falls under China's regional restriction. Azure OpenAI does offer Whisper, but &lt;a href="https://learn.microsoft.com/en-us/answers/questions/2237575/new-announced-speech-to-text-models-for-realtime" rel="noopener noreferrer"&gt;only &lt;code&gt;whisper-1&lt;/code&gt; (large-v2 based)&lt;/a&gt; — large-v3 and large-v3-turbo are not available. If you're in HK (or other restricted regions), this approach isn't just cheaper — it's the only option for v3-quality transcription.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cantonese accuracy&lt;/strong&gt;: &lt;code&gt;language=yue&lt;/code&gt; with Whisper large-v3-turbo is noticeably better than the managed API for Cantonese&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt;: audio never leaves your infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No rate limits&lt;/strong&gt;: Lambda scales independently&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Telegram voice message
        ↓
   OpenClaw (gateway)
        ↓
Lambda Function URL (auth via token)
        ↓
Lambda (VPC, 10GB RAM, 900s timeout)
        ↓
EFS /mnt/whisper-models/{model-slug}
        ↓
faster-whisper (CTranslate2, INT8)
        ↓
    Transcript
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lambda configuration:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory: 10,240 MB — actual usage is &lt;strong&gt;~2.2GB&lt;/strong&gt; (INT8 model), but &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-memory.html" rel="noopener noreferrer"&gt;Lambda allocates CPU proportional to memory&lt;/a&gt;. 10GB gives ~6 vCPUs vs ~2.3 vCPUs at 4GB, cutting warm transcription from ~16s to ~10s. You're paying for CPU, not RAM.&lt;/li&gt;
&lt;li&gt;Timeout: 900s (handles long audio files)&lt;/li&gt;
&lt;li&gt;VPC: Default VPC (no NAT Gateway)&lt;/li&gt;
&lt;li&gt;EFS: Mounted at &lt;code&gt;/mnt/whisper-models&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
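&lt;p&gt;The vCPU figures follow from Lambda's documented allocation of roughly one full vCPU per 1,769 MB of configured memory:&lt;/p&gt;

```python
FULL_VCPU_MB = 1769  # Lambda grants ~1 full vCPU per 1,769 MB configured

vcpus = {mb: mb / FULL_VCPU_MB for mb in (4096, 6144, 8192, 10240)}
for mb, v in vcpus.items():
    print(f'{mb} MB -> ~{v:.1f} vCPUs')
```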

&lt;p&gt;&lt;strong&gt;Memory vs. cost trade-off (tested, 3 runs each):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Config&lt;/th&gt;
&lt;th&gt;Cold Start&lt;/th&gt;
&lt;th&gt;Warm (2.5s audio)&lt;/th&gt;
&lt;th&gt;GB-seconds/invocation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096 MB&lt;/td&gt;
&lt;td&gt;~30s&lt;/td&gt;
&lt;td&gt;~21s&lt;/td&gt;
&lt;td&gt;84 (~$0.00140)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6,144 MB&lt;/td&gt;
&lt;td&gt;~25s&lt;/td&gt;
&lt;td&gt;~16s&lt;/td&gt;
&lt;td&gt;96 (~$0.00160)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8,192 MB&lt;/td&gt;
&lt;td&gt;~24s&lt;/td&gt;
&lt;td&gt;~18s&lt;/td&gt;
&lt;td&gt;144 (~$0.00240)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10,240 MB&lt;/td&gt;
&lt;td&gt;~22s&lt;/td&gt;
&lt;td&gt;~15s&lt;/td&gt;
&lt;td&gt;150 (~$0.00250)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cold start is ~20-30s across all configs — it's EFS I/O bound, not CPU bound, so more memory doesn't help much here. Warm inference time does scale with memory (more vCPUs = faster CTranslate2 decoding). Interestingly, 4GB is the cheapest per invocation — the warm time savings at higher memory don't offset the extra GB-seconds. Within the free tier, cost differences are negligible regardless.&lt;/p&gt;
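&lt;p&gt;Recomputing the table's last column from the warm timings shows why the smallest config wins on per-invocation cost:&lt;/p&gt;

```python
PRICE_PER_GB_SECOND = 0.0000166667                        # x86 Lambda rate
warm_seconds = {4096: 21, 6144: 16, 8192: 18, 10240: 15}  # MB -> seconds, from the table

costs = {mb: (mb / 1024) * secs * PRICE_PER_GB_SECOND for mb, secs in warm_seconds.items()}
for mb in sorted(costs):
    print(f'{mb} MB: ~${costs[mb]:.5f}/warm invocation')

print('cheapest:', min(costs, key=costs.get), 'MB')
```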




&lt;h2&gt;
  
  
  API Compatibility
&lt;/h2&gt;

&lt;p&gt;The adaptor exposes two endpoints so it works as a drop-in replacement for existing integrations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI compatible&lt;/strong&gt; (&lt;code&gt;/v1/audio/transcriptions&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://&amp;lt;&lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="nt"&gt;-url&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/v1/audio/transcriptions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Token &amp;lt;secret&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"file=@audio.ogg"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"language=yue"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"transcript here"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Deepgram compatible&lt;/strong&gt; (&lt;code&gt;/v1/listen&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://&amp;lt;&lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="nt"&gt;-url&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/v1/listen?language&lt;span class="o"&gt;=&lt;/span&gt;yue &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Token &amp;lt;secret&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: audio/ogg"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data-binary&lt;/span&gt; @audio.ogg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
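&lt;p&gt;Dispatching between the two endpoints is just a path check on the Function URL event. A sketch (handler names are illustrative; note that query parameters arrive separately in &lt;code&gt;rawQueryString&lt;/code&gt;, not in &lt;code&gt;rawPath&lt;/code&gt;):&lt;/p&gt;

```python
def route(raw_path: str) -> str:
    """Map a Function URL rawPath to a handler name (illustrative)."""
    if raw_path == '/v1/audio/transcriptions':
        return 'openai_transcription'  # multipart form body
    if raw_path == '/v1/listen':
        return 'deepgram_listen'       # raw binary body
    return 'not_found'

print(route('/v1/listen'))  # -> deepgram_listen
```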






&lt;h2&gt;
  
  
  Model Management API
&lt;/h2&gt;

&lt;p&gt;Once you've synced multiple models to EFS, there's no SSH access to inspect the filesystem or clean it up. I added two non-standard endpoints for that:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;List models on EFS:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://&amp;lt;&lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="nt"&gt;-url&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/v1/models &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Token &amp;lt;secret&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"list"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai/whisper-large-v3-turbo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"owned_by"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Systran/faster-distil-whisper-large-v3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"owned_by"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Systran"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Delete a model from EFS&lt;/strong&gt; (the currently loaded model returns 409):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; DELETE https://&amp;lt;&lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="nt"&gt;-url&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/v1/models/Systran/faster-distil-whisper-large-v3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Token &amp;lt;secret&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Systran/faster-distil-whisper-large-v3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"deleted"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Slashes in model IDs work naturally — &lt;code&gt;rawPath&lt;/code&gt; preserves the full path, so &lt;code&gt;DELETE /v1/models/openai/whisper-large-v3-turbo&lt;/code&gt; correctly maps to model ID &lt;code&gt;openai/whisper-large-v3-turbo&lt;/code&gt;.&lt;/p&gt;
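&lt;p&gt;A sketch of that mapping (a hypothetical helper, not the adaptor's exact code): strip the route prefix from &lt;code&gt;rawPath&lt;/code&gt; and keep the remainder verbatim, slashes included.&lt;/p&gt;

```python
PREFIX = '/v1/models/'

def model_id_from_path(raw_path: str):
    """Extract a slash-containing model ID from a Function URL rawPath,
    or return None when the path doesn't match the models route."""
    if not raw_path.startswith(PREFIX):
        return None
    return raw_path[len(PREFIX):] or None

print(model_id_from_path('/v1/models/openai/whisper-large-v3-turbo'))
# -> openai/whisper-large-v3-turbo
```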




&lt;h2&gt;
  
  
  Performance Tip: Always Specify Language
&lt;/h2&gt;

&lt;p&gt;When no language is specified, Whisper runs language detection on the first audio chunk — adding noticeable overhead. For a 2.5s voice message:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Request&lt;/th&gt;
&lt;th&gt;Response Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No language (auto-detect)&lt;/td&gt;
&lt;td&gt;~22s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;language=yue&lt;/code&gt; (Cantonese)&lt;/td&gt;
&lt;td&gt;~10s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's a &lt;strong&gt;2x speedup&lt;/strong&gt; just from passing a language hint. Two ways to do it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A — per-request query param&lt;/strong&gt; (recommended, keeps Lambda language-agnostic):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Deepgram endpoint&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://&amp;lt;&lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="nt"&gt;-url&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/v1/listen?language&lt;span class="o"&gt;=&lt;/span&gt;yue &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Token &amp;lt;secret&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: audio/ogg"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data-binary&lt;/span&gt; @audio.ogg

&lt;span class="c"&gt;# OpenAI endpoint&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://&amp;lt;&lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="nt"&gt;-url&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/v1/audio/transcriptions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Token &amp;lt;secret&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"file=@audio.ogg"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"language=yue"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Option B — Lambda env var&lt;/strong&gt; (simpler if you only ever transcribe one language):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;WHISPER_LANGUAGE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;yue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I use Option A — the language is set in my OpenClaw config (&lt;code&gt;language: "yue"&lt;/code&gt; in the audio model), which passes it as &lt;code&gt;?language=yue&lt;/code&gt; to the Lambda on every request.&lt;/p&gt;
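&lt;p&gt;The precedence the two options imply fits in one function (a hypothetical helper: per-request value first, env var as the fallback, &lt;code&gt;None&lt;/code&gt; meaning auto-detect):&lt;/p&gt;

```python
import os

def resolve_language(query_params: dict, form_fields: dict):
    """Pick the transcription language, or None to fall back to auto-detect."""
    return (
        query_params.get('language')           # Deepgram-style ?language=yue
        or form_fields.get('language')         # OpenAI-style multipart field
        or os.environ.get('WHISPER_LANGUAGE')  # Option B env var
    )

print(resolve_language({'language': 'yue'}, {}))  # -> yue
```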

&lt;h3&gt;
  
  
  Real-time Factor
&lt;/h3&gt;

&lt;p&gt;Once warm, the Lambda transcribes faster than real-time for typical voice messages:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Audio Duration&lt;/th&gt;
&lt;th&gt;Warm Response Time&lt;/th&gt;
&lt;th&gt;Real-time Factor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2.5s&lt;/td&gt;
&lt;td&gt;~10s&lt;/td&gt;
&lt;td&gt;4x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;33s&lt;/td&gt;
&lt;td&gt;~23s&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0.68x&lt;/strong&gt; ✅ faster than real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 2.5s result looks slow (4x), but Whisper processes audio in 30-second chunks — the overhead is fixed regardless of audio length. For longer messages, the real-time factor drops well below 1x.&lt;/p&gt;
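&lt;p&gt;A quick back-of-envelope fit of the two data points above makes this concrete: model response time as a fixed overhead plus a per-second processing cost. This is my own illustrative fit, not instrumented from the Lambda itself:&lt;/p&gt;

```python
# Fit response_time = overhead + duration / speed to the two measurements
# in the table above. (Illustrative back-of-envelope, not instrumented.)

def fit_latency_model(points):
    """points: [(audio_seconds, response_seconds), ...] from two runs.
    Solves for the fixed per-invocation overhead and processing speed."""
    (d1, r1), (d2, r2) = points
    speed = (d2 - d1) / (r2 - r1)   # audio seconds transcribed per wall second
    overhead = r1 - d1 / speed      # fixed cost paid on every invocation
    return overhead, speed

overhead, speed = fit_latency_model([(2.5, 10.0), (33.0, 23.0)])
print(f"fixed overhead ~{overhead:.1f}s, processing ~{speed:.1f}x real-time")
```

&lt;p&gt;Under this fit, roughly 9 seconds of each warm invocation is fixed overhead, with the model itself chewing through audio at about 2.3x real-time; the crossover to faster-than-real-time lands around a 15-16 second clip, consistent with the 33s row above.&lt;/p&gt;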




&lt;h2&gt;
  
  
  Open Source
&lt;/h2&gt;

&lt;p&gt;The project is open source at &lt;a href="https://github.com/gabrielkoo/aws-lambda-whisper-adaptor" rel="noopener noreferrer"&gt;gabrielkoo/aws-lambda-whisper-adaptor&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Key features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Any &lt;a href="https://huggingface.co/models?search=faster-whisper" rel="noopener noreferrer"&gt;faster-whisper&lt;/a&gt; model via &lt;code&gt;HF_MODEL_REPO&lt;/code&gt; env var&lt;/li&gt;
&lt;li&gt;GitHub Actions workflow to sync models from HuggingFace → S3 (&lt;code&gt;quantization=int8&lt;/code&gt; for HF-format models, &lt;code&gt;quantization=none&lt;/code&gt; for pre-converted CTranslate2 models)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /v1/models&lt;/code&gt; — list all models currently on EFS&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DELETE /v1/models/{owner}/{model}&lt;/code&gt; — remove a model from EFS on demand&lt;/li&gt;
&lt;li&gt;Pre-built Docker image: &lt;code&gt;ghcr.io/gabrielkoo/aws-lambda-whisper-adaptor:latest&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Configurable language detection via &lt;code&gt;WHISPER_LANGUAGE&lt;/code&gt; env var or per-request parameter&lt;/li&gt;
&lt;/ul&gt;
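&lt;p&gt;The two management endpoints are easy to drive from a script. Here's a minimal sketch using only the standard library; the base URL and token are placeholders for your own deployment, and the commented &lt;code&gt;urlopen&lt;/code&gt; call is what actually sends the request:&lt;/p&gt;

```python
# Sketch: driving the adaptor's model-management endpoints from a script.
# FUNCTION_URL and API_SECRET are placeholders for your own deployment.
import urllib.request

FUNCTION_URL = "https://example.lambda-url.us-east-1.on.aws"  # placeholder
API_SECRET = "replace-me"                                     # placeholder

def build_request(method: str, path: str) -> urllib.request.Request:
    """Build an authenticated request against the adaptor's API."""
    return urllib.request.Request(
        FUNCTION_URL + path,
        method=method,
        headers={"Authorization": f"Token {API_SECRET}"},
    )

# GET /v1/models — list models currently cached on EFS
list_req = build_request("GET", "/v1/models")

# DELETE /v1/models/{owner}/{model} — evict a model to reclaim EFS space
evict_req = build_request(
    "DELETE", "/v1/models/Zoont/faster-whisper-large-v3-turbo-int8-ct2"
)

# To actually send one: urllib.request.urlopen(list_req).read()
print(list_req.get_method(), list_req.full_url)
```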




&lt;h2&gt;
  
  
  Pre-warming
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;For OpenClaw voice prompts:&lt;/strong&gt; the ~20-30s cold start is often negligible in practice — if you're asking the agent to run a multi-step job, it'll take a few minutes anyway. Pre-warming only matters if you need the very first transcription to be fast.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Cold starts happen when Lambda hasn't been invoked recently. For predictable usage patterns (e.g. a morning standup bot), pre-warm the Lambda before you need it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# prewarm.sh — trigger Lambda init before expected usage&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$WHISPER_LAMBDA_URL&lt;/span&gt;&lt;span class="s2"&gt;/v1/listen?language=yue"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Token &lt;/span&gt;&lt;span class="nv"&gt;$WHISPER_API_SECRET&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: audio/ogg"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data-binary&lt;/span&gt; @sample.ogg
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Lambda pre-warmed"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Schedule with cron: &lt;code&gt;0 8 * * * /path/to/prewarm.sh&lt;/code&gt; (runs at 8am daily).&lt;/p&gt;

&lt;p&gt;Alternatively, use an EventBridge rule to ping the Lambda every few minutes — though at that frequency, Provisioned Concurrency starts making more sense cost-wise.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Lambda + EFS + S3 architecture achieves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;~20-30s cold start&lt;/strong&gt; (INT8 model on EFS); first-ever bootstrap from S3 takes ~55s (one-time only)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~10s warm invocations&lt;/strong&gt; with &lt;code&gt;language=yue&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~$0.21/month&lt;/strong&gt; storage cost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero idle cost&lt;/strong&gt; (scales to zero)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deepgram- and OpenAI-compatible&lt;/strong&gt; APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key insight: &lt;strong&gt;EFS is the missing piece&lt;/strong&gt;. It provides persistent, fast storage that Lambda can access without a NAT Gateway (using the free S3 VPC Gateway Endpoint for bootstrapping).&lt;/p&gt;

&lt;p&gt;I couldn't find any existing write-up of Whisper on Lambda using EFS for persistent model caching — most approaches either bundle the model in Docker (3-minute cold starts) or re-download from S3 on every cold start (no caching between instances). If you've seen this done before, I'd love to know.&lt;/p&gt;

&lt;p&gt;Two things worth knowing before you deploy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use &lt;code&gt;Zoont/faster-whisper-large-v3-turbo-int8-ct2&lt;/code&gt; with &lt;code&gt;quantization=none&lt;/code&gt; in the sync-model workflow — it's pre-converted to CTranslate2 INT8 and works out of the box (the &lt;code&gt;openai/whisper-large-v3-turbo&lt;/code&gt; model requires conversion and can hit &lt;code&gt;num_mel_bins&lt;/code&gt; config issues)&lt;/li&gt;
&lt;li&gt;Always pass a &lt;code&gt;language&lt;/code&gt; parameter if you know it — cuts response time roughly in half&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you're building voice transcription on AWS and want Whisper-quality accuracy without the SageMaker complexity, give it a try.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Using EFS as a persistent model cache follows the same pattern I used earlier for &lt;a href="https://dev.to/aws-builders/scale-a-stateful-streamlit-chatbot-with-aws-ecs-and-efs-48gm"&gt;scaling a stateful Streamlit chatbot with ECS + EFS&lt;/a&gt; — if you're building other stateful workloads on AWS, that one's worth a look too.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>whisper</category>
      <category>openclaw</category>
      <category>aws</category>
      <category>efs</category>
    </item>
    <item>
      <title>I Squeezed My $1k Monthly OpenClaw API Bill with ~$20/Month in AWS Credits — Here's the Exact Setup</title>
      <dc:creator>Gabriel Koo</dc:creator>
      <pubDate>Sat, 21 Feb 2026 14:44:45 +0000</pubDate>
      <link>https://dev.to/aws-builders/i-squeezed-my-1k-monthly-openclaw-api-bill-with-20month-in-aws-credits-heres-the-exact-setup-3gj4</link>
      <guid>https://dev.to/aws-builders/i-squeezed-my-1k-monthly-openclaw-api-bill-with-20month-in-aws-credits-heres-the-exact-setup-3gj4</guid>
      <description>&lt;p&gt;&lt;em&gt;Part 1 of my series on building a low-cost personal AI stack on AWS.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://dev.to/aws-builders/drop-in-perplexity-sonar-replacement-with-aws-bedrock-nova-grounding-35o9"&gt;Part 2 — Drop-in Perplexity Sonar replacement with AWS Bedrock Nova Grounding&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://dev.to/aws-builders/from-3-minute-cold-starts-to-20-seconds-whisper-on-aws-lambda-efs-for-openclaw-9c5"&gt;Part 3 — From 3-Minute Cold Starts to ~20 Seconds: Whisper on AWS Lambda + EFS&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;



&lt;p&gt;I've got OpenClaw (formerly MoltBot, formerly ClawdBot) running locally on a Raspberry Pi, where compute is scarce; it's gone unresponsive on me more than a few times. But even on constrained hardware, every chat turn, every memory search, every web lookup hits a paid API. The bill is small at first, then it isn't. After a week or two on &lt;code&gt;qwen3-coder-480b&lt;/code&gt;, my daily cost skyrocketed to as much as $50.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Assumption:&lt;/strong&gt; OpenClaw is running on hardware you already own or pay for separately — a Raspberry Pi, home server, or existing cloud instance. The compute cost of the host itself isn't counted in the ~$20/month figure here.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you've picked up AWS Credits from events, the &lt;a href="https://builder.aws.com/content/32g2lQ7kc3Py8kKIYGS15Pe8VSS/aws-community-builders-program" rel="noopener noreferrer"&gt;AWS Community Builder program&lt;/a&gt; ($500/year), or AWS Activate — or if your company prefers to keep spend within AWS rather than onboarding yet another SaaS API provider — there's a way to run the whole OpenClaw stack on credits.&lt;/p&gt;

&lt;p&gt;This is how I did it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;: The crux of this hack relies on Amazon Q Developer Pro's undocumented (and generously high) usage ceiling, while it lasts. If it's eventually deprecated, the fallback is a Kiro plan with overage pricing: still covered by AWS Credits, just at a worse cost-per-token ratio.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  Who This Is For
&lt;/h2&gt;

&lt;p&gt;Two very different reasons to care about this setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you have AWS Credits to burn:&lt;/strong&gt;&lt;br&gt;
Credits from re:Invent, AWS Community Builder, AWS Activate, or customer programs come with expiry dates. Running your AI assistant stack on them is one of the most practical ways to put idle credits to work — at ~$20/month, $100 in credits covers 5 months of the full stack. If you're sitting on a few hundred dollars with an end-of-year deadline, this is a productive use before they lapse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're in a company with procurement or compliance requirements:&lt;/strong&gt;&lt;br&gt;
Every new SaaS vendor is a TPRM exercise. OpenAI for embeddings, Perplexity for web search, Anthropic for Claude — each one is a separate vendor assessment, a separate DPA, and a separate conversation with your security team. For FSI and regulated industries, that's not just overhead — it can be a blocker.&lt;/p&gt;

&lt;p&gt;AWS is likely already in your vendor register. Consolidating on Bedrock means single billing, fewer third-party relationships to manage, and data residency you control. For anything touching customer data in banking, insurance, or healthcare, that's the difference between a quick internal approval and a 3-month procurement cycle.&lt;/p&gt;


&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS account&lt;/strong&gt; with Bedrock access enabled in &lt;code&gt;us-east-1&lt;/code&gt; (or another US region)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS credentials&lt;/strong&gt; — a &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/api-keys.html" rel="noopener noreferrer"&gt;Bedrock API key&lt;/a&gt; is the simplest option if your account supports it. Otherwise, a long-term IAM access key/secret key pair works fine and is easier to manage than SSO. IAM Identity Center is only required for the Q Developer Pro layer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.10+&lt;/strong&gt; — used by kiro-gateway, LiteLLM, and the Nova grounding proxy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Q Developer Pro subscription&lt;/strong&gt; ($19/user/month, credit-eligible) — required for Layer 1 (kiro-gateway). Kiro Pro, Pro+, or Power plans also work but are credit-based with overage charges — Q Developer Pro is the better deal.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  What Actually Costs Money in OpenClaw?
&lt;/h2&gt;

&lt;p&gt;Before reaching for solutions, it helps to know exactly where the spend goes. OpenClaw has five distinct cost centers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Main model (LLM)&lt;/strong&gt;&lt;br&gt;
Every chat turn, every agent action, every tool call — all routed through your primary LLM. This is the biggest variable cost. On a busy day it adds up fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Memory search (embeddings)&lt;/strong&gt;&lt;br&gt;
OpenClaw's &lt;code&gt;memory_search&lt;/code&gt; tool converts your memory files into vector embeddings and queries them semantically. Every search = an embedding API call. Low cost per call, but it runs constantly in the background.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Web search&lt;/strong&gt;&lt;br&gt;
The &lt;code&gt;web_search&lt;/code&gt; tool hits Perplexity or Brave APIs. Perplexity charges per query on paid plans; Brave gives you $5/month free then charges beyond that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Browser automation&lt;/strong&gt;&lt;br&gt;
The &lt;code&gt;browser&lt;/code&gt; tool spins up a Chromium instance for web scraping, form filling, and screenshots. Running a full browser on a low-compute machine (Raspberry Pi, t4g.small) is heavy — and cloud browser options cost per session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Speech-to-text (STT)&lt;/strong&gt;&lt;br&gt;
Voice messages transcribed via your STT provider. OpenAI Whisper API charges per minute of audio — self-hosting on Lambda eliminates this entirely.&lt;/p&gt;
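&lt;p&gt;For a sense of scale, here's a rough estimate at a per-minute hosted Whisper rate. Both the rate ($0.006/minute is the commonly cited figure; check the provider's current pricing page) and the usage numbers are assumptions:&lt;/p&gt;

```python
# Rough monthly STT cost at a per-minute API rate. The rate and usage
# figures are assumptions — verify against current provider pricing.
RATE_PER_MINUTE = 0.006      # USD/min, hosted Whisper API (assumed)
messages_per_day = 30
avg_message_minutes = 0.5    # ~30-second voice notes

monthly_minutes = messages_per_day * avg_message_minutes * 30
monthly_cost = monthly_minutes * RATE_PER_MINUTE
print(f"{monthly_minutes:.0f} min/month -> ${monthly_cost:.2f}")
```

&lt;p&gt;A few dollars a month, not a budget-breaker on its own — but it's a recurring third-party dependency, and self-hosting drops it to near-zero.&lt;/p&gt;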

&lt;p&gt;That's it. Five layers. The goal: drive variable cost to zero.&lt;/p&gt;


&lt;h2&gt;
  
  
  My Config: All 5 Layers on AWS Credits
&lt;/h2&gt;

&lt;p&gt;Here's the full picture before we go deep:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;th&gt;Credit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Main model&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/jwadow/kiro-gateway" rel="noopener noreferrer"&gt;kiro-gateway&lt;/a&gt; → Amazon Q Developer Pro&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/Jwadow" rel="noopener noreferrer"&gt;@Jwadow&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory search&lt;/td&gt;
&lt;td&gt;Native Bedrock embeddings via &lt;a href="https://github.com/openclaw/openclaw/pull/20191" rel="noopener noreferrer"&gt;PR #20191&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/gabrielkoo" rel="noopener noreferrer"&gt;@gabrielkoo&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web search&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/gabrielkoo/bedrock-web-search-proxy" rel="noopener noreferrer"&gt;bedrock-web-search-proxy&lt;/a&gt; — Nova Grounding as Perplexity drop-in&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/gabrielkoo" rel="noopener noreferrer"&gt;@gabrielkoo&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/vercel-labs/agent-browser/pull/397" rel="noopener noreferrer"&gt;agent-browser + AgentCore provider&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://x.com/pahudnet" rel="noopener noreferrer"&gt;@pahudnet&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speech-to-text&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/gabrielkoo/aws-lambda-whisper-adaptor" rel="noopener noreferrer"&gt;&lt;code&gt;aws-lambda-whisper-adaptor&lt;/code&gt;&lt;/a&gt; — Whisper on Lambda + EFS&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/gabrielkoo" rel="noopener noreferrer"&gt;@gabrielkoo&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three of these I built myself. Two were built by other community members. All five are open source.&lt;/p&gt;


&lt;h2&gt;
  
  
  Layer 1: Main Model + Image Analysis — Kiro CLI — Covered by AWS Credits
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Amazon Q Developer Pro: flat-rate access to Claude
&lt;/h3&gt;

&lt;p&gt;The key difference between Amazon Q Developer Pro and Kiro Pro is the billing model. Kiro Pro is credit-based — 1,000 credits/month, pay more if you exceed them. Amazon Q Developer Pro is a flat monthly subscription: &lt;strong&gt;$19/user/month, no per-token billing, no surprise overages.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Usage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Kiro Free&lt;/td&gt;
&lt;td&gt;$0/mo&lt;/td&gt;
&lt;td&gt;50 credits/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kiro Pro&lt;/td&gt;
&lt;td&gt;$20/mo&lt;/td&gt;
&lt;td&gt;1,000 credits + $0.04/credit overage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kiro Pro+&lt;/td&gt;
&lt;td&gt;$40/mo&lt;/td&gt;
&lt;td&gt;2,000 credits + $0.04/credit overage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kiro Power&lt;/td&gt;
&lt;td&gt;$200/mo&lt;/td&gt;
&lt;td&gt;10,000 credits + $0.04/credit overage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amazon Q Developer Pro (legacy)&lt;/td&gt;
&lt;td&gt;$19/user/mo&lt;/td&gt;
&lt;td&gt;Flat-rate, not credit-capped&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Amazon Q Developer Pro is now a legacy plan in the Kiro ecosystem. AWS has stopped allowing new Builder ID subscriptions to Q Developer Pro — new users can only subscribe through Kiro plans. The undocumented usage limits on Q Pro are likely part of why AWS made this transition. If you're already on Q Developer Pro, you retain access and it remains the better deal for OpenClaw.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Your Q Developer Pro subscription grants access to &lt;code&gt;kiro-cli&lt;/code&gt;. The documented quota is &lt;a href="https://docs.aws.amazon.com/general/latest/gr/amazonqdev.html" rel="noopener noreferrer"&gt;10,000 inference calls/month&lt;/a&gt; — for a personal AI assistant, that's more than enough.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Real-world cost check:&lt;/strong&gt; In 4 days of active OpenClaw usage after switching to kiro-gateway, I consumed ~40M input tokens and ~865K output tokens with Claude Sonnet. OpenClaw loads memory files, system prompts, and tool results into every turn — the context window fills up fast. At standard Bedrock pricing ($3/1M input, $15/1M output), that's ~$135 for 4 days, or roughly &lt;strong&gt;$1,000/month&lt;/strong&gt;. Q Developer Pro covers all of it for $19/month flat.&lt;/p&gt;
&lt;/blockquote&gt;
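&lt;p&gt;The arithmetic behind that estimate checks out — about $133 for the 4 days, or just under $1,000/month extrapolated:&lt;/p&gt;

```python
# Verify the back-of-envelope in the quote above, using the standard
# Bedrock pay-per-token rates for Claude Sonnet stated in the post.
input_tokens = 40_000_000
output_tokens = 865_000
input_rate = 3 / 1_000_000    # $3 per 1M input tokens
output_rate = 15 / 1_000_000  # $15 per 1M output tokens

four_day_cost = input_tokens * input_rate + output_tokens * output_rate
monthly_cost = four_day_cost / 4 * 30
print(f"4 days: ${four_day_cost:.0f}, extrapolated month: ${monthly_cost:.0f}")
```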

&lt;p&gt;In practice, I've been running Kiro CLI with OpenClaw daily and haven't hit any rate limits. One caveat: the &lt;code&gt;/usage&lt;/code&gt; command isn't available under the Q Developer Pro plan, so you have to monitor consumption via the AWS console instead. Oddly, after several days of running OpenClaw through kiro-gateway, the Q Developer usage metrics in the console hadn't moved at all. It's unclear whether Kiro CLI usage counts against the same quota as Q Developer's agentic requests or is tracked separately. The &lt;a href="https://aws.amazon.com/q/developer/pricing/" rel="noopener noreferrer"&gt;Amazon Q Developer pricing page&lt;/a&gt; only states "Included (with limits)" for the Pro tier, with no specifics on what those limits are or how Kiro CLI calls are metered.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Q Developer Pro requires &lt;a href="https://docs.aws.amazon.com/singlesignon/latest/userguide/what-is.html" rel="noopener noreferrer"&gt;AWS IAM Identity Center&lt;/a&gt; (SSO) — you can't use it with a free Builder ID. If you're already set up with Identity Center (common in enterprise teams and AWS Community Builders with corporate accounts), you're good to go.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Standard AWS Credits don't cover per-token Claude usage via Anthropic's marketplace agreement. But the Q Developer Pro subscription fee itself &lt;strong&gt;is&lt;/strong&gt; credit-eligible — making the whole stack fundable with AWS credits. Kiro's flat-rate subscription is currently the only practical way to run Claude in OpenClaw without per-token billing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;New AWS accounts:&lt;/strong&gt; Even if you'd prefer to pay per-token via direct Bedrock API, new accounts often come with &lt;a href="https://dev.to/aws-builders/ultra-low-bedrock-llm-rate-limits-for-new-aws-accounts-time-to-wake-up-your-inactive-aws-accounts-3no0"&gt;ultra-low default rate limits&lt;/a&gt; that can't reliably serve OpenClaw — even when you're willing to pay. The flat-rate Q Developer Pro route sidesteps this entirely.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  kiro-gateway: the bridge
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/jwadow/kiro-gateway" rel="noopener noreferrer"&gt;kiro-gateway&lt;/a&gt; — built by &lt;a href="https://github.com/Jwadow" rel="noopener noreferrer"&gt;@Jwadow&lt;/a&gt; — wraps Kiro CLI and exposes OpenAI-compatible and Anthropic-compatible API endpoints. OpenClaw talks to it like any other provider.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/jwadow/kiro-gateway
&lt;span class="nb"&gt;cd &lt;/span&gt;kiro-gateway
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Edit &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROXY_API_KEY="your-secret-key"
KIRO_CREDS_FILE="~/.aws/sso/cache/kiro-auth-token.json"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;kiro-cli login&lt;/code&gt; once to authenticate — this populates &lt;code&gt;KIRO_CREDS_FILE&lt;/code&gt; automatically. (&lt;code&gt;kiro-cli&lt;/code&gt; is only needed for this initial login; &lt;code&gt;kiro-gateway&lt;/code&gt; reads the token it generates. Re-run if your token expires.) Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python main.py &lt;span class="nt"&gt;--port&lt;/span&gt; 9000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Heads up:&lt;/strong&gt; kiro-gateway's hardcoded fallback model list may lag behind new Claude releases. If a model isn't showing up at &lt;code&gt;/v1/models&lt;/code&gt;, add it manually to &lt;code&gt;FALLBACK_MODELS&lt;/code&gt; in &lt;code&gt;kiro/config.py&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Available models via Q Developer Pro:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-sonnet-4.6&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;General tasks, coding, writing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-haiku-4.5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fast, lightweight responses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4.6&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Complex reasoning, long context&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;OpenClaw config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"providers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"kiro"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"baseUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:9000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-secret-key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic-messages"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"kiro/claude-sonnet-4.6"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"imageModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"kiro/claude-sonnet-4.6"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Bonus:&lt;/strong&gt; kiro-gateway works with any tool that supports OpenAI or Anthropic APIs — not just OpenClaw. To use it with Claude Code: &lt;code&gt;ANTHROPIC_BASE_URL=http://localhost:9000&lt;/code&gt; and &lt;code&gt;ANTHROPIC_API_KEY=your-secret-key&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Layer 2: Memory Search — Bedrock Embeddings — Covered by AWS Credits
&lt;/h2&gt;

&lt;p&gt;OpenClaw's &lt;code&gt;memory_search&lt;/code&gt; needs an embedding model. &lt;a href="https://docs.aws.amazon.com/nova/latest/userguide/nova-embeddings.html" rel="noopener noreferrer"&gt;Amazon Nova Multimodal Embeddings&lt;/a&gt; costs ~$0.00014 per 1K tokens — fractions of a cent per query, and covered by AWS Credits.&lt;/p&gt;
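&lt;p&gt;At that rate, even heavy memory-search usage amounts to pennies. A back-of-envelope estimate (the per-query token count and search volume are my assumptions, not measured from OpenClaw):&lt;/p&gt;

```python
# Embedding cost at ~$0.00014 per 1K tokens (rate from the Nova docs
# cited above); per-query token counts are illustrative assumptions.
RATE_PER_1K_TOKENS = 0.00014
searches_per_day = 200
tokens_per_search = 500      # query text plus matched memory chunks

monthly_tokens = searches_per_day * tokens_per_search * 30
monthly_cost = monthly_tokens / 1000 * RATE_PER_1K_TOKENS
print(f"{monthly_tokens:,} tokens/month -> ${monthly_cost:.2f}")
```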

&lt;p&gt;OpenClaw's native Bedrock provider doesn't wire up embeddings cleanly yet. &lt;a href="https://github.com/openclaw/openclaw/pull/24892" rel="noopener noreferrer"&gt;PR #24892&lt;/a&gt; is pending merge (it supersedes my earlier &lt;a href="https://github.com/openclaw/openclaw/pull/20191" rel="noopener noreferrer"&gt;PR #20191&lt;/a&gt;, where I made a novice mistake). Until then, you'll need a local OpenAI-compatible proxy in front of Bedrock. Two options:&lt;/p&gt;

&lt;h3&gt;
  
  
  Option A: LiteLLM
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# litellm_config.yaml&lt;/span&gt;
&lt;span class="na"&gt;model_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;model_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nova-2-multimodal-embeddings-v1.0&lt;/span&gt;
    &lt;span class="na"&gt;litellm_params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bedrock/amazon.nova-2-multimodal-embeddings-v1:0&lt;/span&gt;
      &lt;span class="na"&gt;aws_region_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-east-1&lt;/span&gt;

&lt;span class="na"&gt;litellm_settings&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;drop_params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;master_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local-only"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'litellm[proxy]'&lt;/span&gt;
litellm &lt;span class="nt"&gt;--config&lt;/span&gt; litellm_config.yaml &lt;span class="nt"&gt;--port&lt;/span&gt; 4000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"memorySearch"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"remote"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"baseUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:4000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"local-only"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nova-2-multimodal-embeddings-v1.0"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option B: bedrock-access-gateway-function-url (serverless, no fixed cost)
&lt;/h3&gt;

&lt;p&gt;My own fork of the original &lt;code&gt;bedrock-access-gateway&lt;/code&gt; — deployed as a Lambda Function URL instead of ALB+Fargate, so there's no $16+/month fixed cost. Full writeup: &lt;a href="https://dev.to/aws-builders/use-amazon-bedrock-models-via-an-openai-api-compatible-serverless-endpoint-now-without-fixed-cost-5hf5"&gt;Use Amazon Bedrock Models with OpenAI SDKs with a Serverless Proxy Endpoint&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; My &lt;a href="https://github.com/aws-samples/bedrock-access-gateway/pull/222" rel="noopener noreferrer"&gt;PR #222&lt;/a&gt; for Nova 2 embedding support against the original &lt;code&gt;bedrock-access-gateway&lt;/code&gt; project has been merged — so my fork pulls from this upstream automatically via &lt;code&gt;prepare_source.sh&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone &lt;span class="nt"&gt;--depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 https://github.com/gabrielkoo/bedrock-access-gateway-function-url
&lt;span class="nb"&gt;cd &lt;/span&gt;bedrock-access-gateway-function-url
./prepare_source.sh
sam build
sam deploy &lt;span class="nt"&gt;--guided&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Grab the &lt;code&gt;FunctionUrl&lt;/code&gt; output after deploy, then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"memorySearch"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"remote"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"baseUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://&amp;lt;your-function-url&amp;gt;.lambda-url.us-east-1.on.aws"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-api-key"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"amazon.nova-2-multimodal-embeddings-v1:0"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Region note:&lt;/strong&gt; &lt;code&gt;amazon.nova-2-multimodal-embeddings-v1:0&lt;/code&gt; availability varies — check the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html" rel="noopener noreferrer"&gt;Bedrock model availability page&lt;/a&gt;. Make sure your IAM credentials have &lt;code&gt;bedrock:InvokeModel&lt;/code&gt; in your target region.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once &lt;a href="https://github.com/openclaw/openclaw/pull/24892" rel="noopener noreferrer"&gt;PR #24892&lt;/a&gt; merges, no proxy needed — the config simplifies to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"memorySearch"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bedrock"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"amazon.nova-2-multimodal-embeddings-v1:0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"region"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Layer 3: Web Search — Nova Grounding Proxy — Covered by AWS Credits
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/gabrielkoo/bedrock-web-search-proxy" rel="noopener noreferrer"&gt;&lt;code&gt;bedrock-web-search-proxy&lt;/code&gt;&lt;/a&gt; — a FastAPI wrapper that makes Bedrock Nova Grounding look like the Perplexity Sonar API. No Perplexity or Brave API key needed. Runs entirely on AWS Credits.&lt;/p&gt;

&lt;p&gt;Full writeup: &lt;a href="https://dev.to/aws-builders/drop-in-perplexity-sonar-replacement-with-aws-bedrock-nova-grounding-35o9"&gt;Drop-in Perplexity Sonar Replacement with AWS Bedrock Nova Grounding&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option A: Run locally
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/gabrielkoo/bedrock-web-search-proxy
&lt;span class="nb"&gt;cd &lt;/span&gt;bedrock-web-search-proxy
pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi uvicorn boto3
uvicorn main:app &lt;span class="nt"&gt;--port&lt;/span&gt; 7000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option B: Lambda Function URL (zero idle cost)
&lt;/h3&gt;

&lt;p&gt;See the &lt;a href="https://github.com/gabrielkoo/bedrock-web-search-proxy" rel="noopener noreferrer"&gt;deployment guide in the repo&lt;/a&gt; — SAM-based, arm64, python3.13. Once deployed, you get a persistent HTTPS endpoint with no local process to manage.&lt;/p&gt;

&lt;p&gt;OpenClaw config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"web"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"search"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"perplexity"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"perplexity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-proxy-key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"baseUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:7000/v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sonar-pro"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;All US Nova CRIS (cross-Region inference) profiles support web grounding (&lt;code&gt;us.amazon.nova-premier-v1:0&lt;/code&gt;, &lt;code&gt;us.amazon.nova-pro-v1:0&lt;/code&gt;, etc.). Native model IDs without the &lt;code&gt;us.&lt;/code&gt; prefix do NOT work; you must use the CRIS profiles. Web grounding is available in US regions only (us-east-1, us-east-2, us-west-2).&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Layer 4: Cloud Browser — Bedrock AgentCore — Covered by AWS Credits
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/vercel-labs/agent-browser" rel="noopener noreferrer"&gt;&lt;code&gt;agent-browser&lt;/code&gt;&lt;/a&gt; by Vercel Labs, with the AgentCore provider contributed by &lt;a href="https://github.com/pahud" rel="noopener noreferrer"&gt;Pahud Hsieh&lt;/a&gt; (&lt;a href="https://x.com/pahudnet" rel="noopener noreferrer"&gt;@pahudnet&lt;/a&gt;) — &lt;a href="https://github.com/vercel-labs/agent-browser/pull/397" rel="noopener noreferrer"&gt;PR #397&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The browser runs in AWS — no local Chromium needed. Particularly useful on low-compute instances (Pi, t4g.small) where running a local browser would be too heavy. Covered by AWS Credits.&lt;/p&gt;

&lt;p&gt;Node.js and pnpm required. Since &lt;a href="https://github.com/vercel-labs/agent-browser/pull/397" rel="noopener noreferrer"&gt;PR #397&lt;/a&gt; isn't merged yet, check out the branch directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/vercel-labs/agent-browser
&lt;span class="nb"&gt;cd &lt;/span&gt;agent-browser
git fetch origin pull/397/head:agentcore
git checkout agentcore
pnpm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pnpm build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then use it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agent-browser &lt;span class="nt"&gt;-p&lt;/span&gt; agentcore open https://example.com
agent-browser close
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your AWS identity needs these IAM permissions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;bedrock-agentcore:StartBrowserSession&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bedrock-agentcore:ConnectBrowserAutomationStream&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bedrock-agentcore:StopBrowserSession&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
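&lt;p&gt;Expressed as a minimal identity-based policy, that list looks like this (the &lt;code&gt;Resource&lt;/code&gt; wildcard is illustrative only; scope it down to your account and region in practice):&lt;/p&gt;

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock-agentcore:StartBrowserSession",
        "bedrock-agentcore:ConnectBrowserAutomationStream",
        "bedrock-agentcore:StopBrowserSession"
      ],
      "Resource": "*"
    }
  ]
}
```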

&lt;blockquote&gt;
&lt;p&gt;On a desktop machine with enough RAM, local CDP (OpenClaw's built-in browser) is free and works fine. AgentCore is the play for headless/low-compute setups.&lt;/p&gt;
&lt;/blockquote&gt;







&lt;h2&gt;
  
  
  Layer 5: Speech-to-Text — Whisper on Lambda — Covered by AWS Credits
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/gabrielkoo/aws-lambda-whisper-adaptor" rel="noopener noreferrer"&gt;&lt;code&gt;aws-lambda-whisper-adaptor&lt;/code&gt;&lt;/a&gt; — self-hosted &lt;a href="https://github.com/SYSTRAN/faster-whisper" rel="noopener noreferrer"&gt;faster-whisper&lt;/a&gt; on AWS Lambda, with Deepgram-compatible and OpenAI-compatible transcription endpoints. EFS-backed model storage, pay-per-use, scales to zero.&lt;/p&gt;

&lt;p&gt;Full setup guide: &lt;a href="https://dev.to/aws-builders/from-3-minute-cold-starts-to-20-seconds-whisper-on-aws-lambda-efs-for-openclaw-9c5"&gt;From 3-Minute Cold Starts to ~20 Seconds: Whisper on AWS Lambda + EFS&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;

&lt;p&gt;Use the pre-built image — no build step needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker pull ghcr.io/gabrielkoo/aws-lambda-whisper-adaptor:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use this image URI when creating your Lambda function. The repo includes a SAM template for the full VPC + EFS setup.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Lambda runs in VPC for EFS access — no NAT Gateway needed (free S3 VPC Gateway Endpoint). Cold start is ~20–30s on first invocation after a model download; subsequent calls are fast.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Cost Math
&lt;/h2&gt;

&lt;p&gt;Without this setup, Claude Sonnet alone runs ~&lt;strong&gt;$1,000/month&lt;/strong&gt; at standard Bedrock pricing — based on real token usage from my own sessions. OpenClaw's large context window (memory files, system prompts, tool results loaded every turn) means the token bill compounds fast.&lt;/p&gt;

&lt;p&gt;The full stack with this setup runs at &lt;strong&gt;~$20/month&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;$19/mo&lt;/strong&gt; — Amazon Q Developer Pro (flat-rate, covers all LLM calls)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;≤$1/mo&lt;/strong&gt; — Bedrock embeddings for memory search (Nova 2 at $0.00014/1K tokens)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Web search, browser automation, and speech-to-text are covered by AWS Credits — no separate line item.&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;$100 in AWS Credits&lt;/strong&gt;, you cover roughly &lt;strong&gt;5 months&lt;/strong&gt; of the full stack. Both the Q Developer Pro subscription and Bedrock embeddings are credit-eligible — if you're an AWS Community Builder, that $500/year allocation more than covers it.&lt;/p&gt;
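&lt;p&gt;A quick back-of-envelope check of those numbers (the 5M-token monthly embedding volume is an illustrative assumption, not measured usage):&lt;/p&gt;

```python
# Sanity-check the monthly cost figures quoted above.
# The embedding token volume is an assumed, illustrative number.
q_developer_pro = 19.00              # USD/month, flat rate
embed_price_per_1k_tokens = 0.00014  # USD, Nova 2 embeddings
embed_tokens_per_month = 5_000_000   # assumed memory-search volume

embed_cost = embed_tokens_per_month / 1_000 * embed_price_per_1k_tokens
monthly_total = q_developer_pro + embed_cost
months_covered = 100 / monthly_total

print(f"embeddings: ${embed_cost:.2f}/mo")        # $0.70/mo
print(f"stack total: ${monthly_total:.2f}/mo")    # $19.70/mo
print(f"$100 covers ~{months_covered:.1f} months")
```

&lt;p&gt;Even at a few million embedding tokens a month, the embeddings stay well under the $1 line item, so the flat $19 Q Developer Pro fee dominates.&lt;/p&gt;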

&lt;h3&gt;
  
  
  Where AWS Credits Come From
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS event participant/speaker&lt;/strong&gt; — re:Invent, Summit, local user groups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Community Builder&lt;/strong&gt; — $500/year for active builders (&lt;a href="https://builder.aws.com/content/32g2lQ7kc3Py8kKIYGS15Pe8VSS/aws-community-builders-program" rel="noopener noreferrer"&gt;builder.aws.com&lt;/a&gt;). The application opens a few rounds per year — I'm one of the builders in the program.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Customer Council&lt;/strong&gt; — participation typically includes credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Activate&lt;/strong&gt; (startups) — up to $100K&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Educate / Academy&lt;/strong&gt; — educators and students&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check your balance: &lt;a href="https://console.aws.amazon.com/billing/home#/credits" rel="noopener noreferrer"&gt;console.aws.amazon.com/billing/home#/credits&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Five layers. Two built by community members, three I built myself. All open source, all running on AWS Credits.&lt;/p&gt;

&lt;p&gt;To be clear: &lt;strong&gt;kiro-gateway is the most crucial piece here.&lt;/strong&gt; &lt;a href="https://github.com/Jwadow" rel="noopener noreferrer"&gt;@Jwadow&lt;/a&gt; built the bridge that makes Claude accessible without per-token billing — I built the embedding proxy, web search proxy, and Whisper gateway to fill the remaining gaps. &lt;a href="https://github.com/gabrielkoo/bedrock-web-search-proxy" rel="noopener noreferrer"&gt;Web search&lt;/a&gt; and &lt;a href="https://github.com/vercel-labs/agent-browser" rel="noopener noreferrer"&gt;cloud browser&lt;/a&gt; (Layers 3 and 4) need no subscription at all: their per-token Bedrock usage is well covered by AWS Credits.&lt;/p&gt;

&lt;p&gt;If you're already an AWS Community Builder or have credits sitting in your account, there's no reason to be paying per-token for a personal AI assistant. Wire it up once, and the stack runs itself.&lt;/p&gt;

&lt;p&gt;Put those credits to work.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>openclaw</category>
      <category>kiro</category>
    </item>
    <item>
      <title>Drop-in Perplexity Sonar Replacement with AWS Bedrock Nova Grounding</title>
      <dc:creator>Gabriel Koo</dc:creator>
      <pubDate>Fri, 20 Feb 2026 16:26:37 +0000</pubDate>
      <link>https://dev.to/aws-builders/drop-in-perplexity-sonar-replacement-with-aws-bedrock-nova-grounding-35o9</link>
      <guid>https://dev.to/aws-builders/drop-in-perplexity-sonar-replacement-with-aws-bedrock-nova-grounding-35o9</guid>
      <description>&lt;p&gt;&lt;em&gt;Part 2 of my series on building a low-cost personal AI stack on AWS.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://dev.to/aws-builders/i-squeezed-my-1k-monthly-openclaw-api-bill-with-20month-in-aws-credits-heres-the-exact-setup-3gj4"&gt;Part 1 — I Squeezed My $1k Monthly OpenClaw API Bill with ~$20/Month in AWS Credits&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://dev.to/aws-builders/from-3-minute-cold-starts-to-20-seconds-whisper-on-aws-lambda-efs-for-openclaw-9c5"&gt;Part 3 — From 3-Minute Cold Starts to ~20 Seconds: Whisper on AWS Lambda + EFS&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;



&lt;p&gt;If you're running an AI assistant or agent framework that uses Perplexity's Sonar API for web search, you're paying per query — or burning through your monthly credit allocation faster than you'd like.&lt;/p&gt;

&lt;p&gt;I'm on Perplexity Pro, which comes with $5/month in API credits. Sounds fine until you hit mid-month and realize OpenClaw has quietly burned through all of it. I wanted something uncapped that didn't add another bill. If you're an AWS user with any credits sitting around — that $25 from a workshop, an event promo, or re:Invent swag — there's a better option: route those queries through Amazon Bedrock's Nova Premier grounding instead.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/gabrielkoo/bedrock-web-search-proxy" rel="noopener noreferrer"&gt;&lt;code&gt;bedrock-web-search-proxy&lt;/code&gt;&lt;/a&gt;, a FastAPI proxy that makes Bedrock Nova Premier look exactly like the Perplexity Sonar API. Change one URL, keep everything else the same.&lt;/p&gt;
&lt;h2&gt;
  
  
  What is Nova Grounding?
&lt;/h2&gt;

&lt;p&gt;Amazon Nova Premier supports a &lt;code&gt;nova_grounding&lt;/code&gt; system tool that lets the model search the web in real-time and return answers with citations — similar to Perplexity Sonar. The difference: it runs on Bedrock, so it counts against your AWS credits rather than a separate Perplexity subscription.&lt;/p&gt;
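&lt;p&gt;Mechanically, grounding is enabled by declaring the system tool in the Converse request's &lt;code&gt;toolConfig&lt;/code&gt;. A sketch of the request body (the model ID, e.g. &lt;code&gt;us.amazon.nova-premier-v1:0&lt;/code&gt;, is passed separately as the Converse &lt;code&gt;modelId&lt;/code&gt; parameter; check the Nova documentation for the current schema):&lt;/p&gt;

```json
{
  "messages": [
    {
      "role": "user",
      "content": [{ "text": "What is the Bitcoin price right now?" }]
    }
  ],
  "toolConfig": {
    "tools": [
      { "systemTool": { "name": "nova_grounding" } }
    ]
  }
}
```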
&lt;h2&gt;
  
  
  Why Not Just Use Brave Search's Free Tier?
&lt;/h2&gt;

&lt;p&gt;Brave does have an AI Answers API that returns synthesized answers with citations — similar to Perplexity. Two catches though:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Credit card required&lt;/strong&gt; — even the $5/month free tier needs a card on file as an anti-fraud measure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Undocumented model&lt;/strong&gt; — Brave doesn't clearly disclose which LLM powers the answers, so you're trusting a black box&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With Nova grounding, you know exactly what's running (Nova Premier on Bedrock), and it counts against AWS credits you likely already have. No new billing relationship, no mystery model.&lt;/p&gt;
&lt;h2&gt;
  
  
  Apps That Use Perplexity API
&lt;/h2&gt;

&lt;p&gt;The wrapper is a drop-in for any app that supports Perplexity as a provider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt; — &lt;code&gt;tools.web.search.perplexity.baseUrl&lt;/code&gt; config&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open WebUI&lt;/strong&gt; — web search integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LibreChat&lt;/strong&gt; — via Perplexity MCP server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cursor&lt;/strong&gt; — Perplexity MCP for web research&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continue.dev&lt;/strong&gt; — Sonar models for codebase context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AnythingLLM&lt;/strong&gt; — Perplexity as cloud LLM provider&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LiteLLM&lt;/strong&gt; — web search interception&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Proof It's Actually Grounded (Not Hallucinated)
&lt;/h2&gt;

&lt;p&gt;Here's a direct API call asking for the current Bitcoin price:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:7000/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "nova-premier-web-grounding",
    "messages": [{"role": "user", "content": "What is the Bitcoin price right now?"}],
    "max_tokens": 200
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The current price of Bitcoin (BTC) is $67,254.57 USD, reflecting a 0.54% increase in the last 24 hours. Last updated February 20, 2026 at 04:34 UTC."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"citations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"https://www.latestly.com/technology/bitcoin-price-today-february-20-2026-btc-price-at-usd-67243-up-compared-to-yesterdays-usd-66941-mark-7321498.html"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"https://www.binance.com/en/price/bitcoin"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The citation URL contains today's date in the slug. Not hallucinated — Nova Premier actually fetched and synthesized live web content.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Install and run (one line, no cloning needed)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvx &lt;span class="nt"&gt;--from&lt;/span&gt; git+https://github.com/gabrielkoo/bedrock-web-search-proxy bedrock-web-search-proxy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or run directly from the raw script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run https://raw.githubusercontent.com/gabrielkoo/bedrock-web-search-proxy/main/main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both require &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt; and AWS credentials with &lt;code&gt;bedrock:InvokeModel&lt;/code&gt; on &lt;code&gt;us.amazon.nova-premier-v1:0&lt;/code&gt;. Region defaults to &lt;code&gt;us-east-1&lt;/code&gt; — override with &lt;code&gt;AWS_DEFAULT_REGION&lt;/code&gt; if needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Configure your app
&lt;/h3&gt;

&lt;p&gt;For &lt;strong&gt;OpenClaw&lt;/strong&gt;, update &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"web"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"search"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"perplexity"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"perplexity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"baseUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:7000/v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nova-grounding"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nova-premier-web-grounding"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ The &lt;code&gt;apiKey&lt;/code&gt; must &lt;strong&gt;not&lt;/strong&gt; be a real &lt;code&gt;pplx-&lt;/code&gt; key — OpenClaw detects that prefix and overrides &lt;code&gt;baseUrl&lt;/code&gt; back to Perplexity's servers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For other apps, just point the Perplexity base URL to &lt;code&gt;http://your-host:7000/v1&lt;/code&gt; and use any model name — the wrapper routes everything to Nova Premier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Model Aliases
&lt;/h2&gt;

&lt;p&gt;All standard Perplexity model names are accepted and routed to Nova Premier (the only Nova model that currently supports the grounding tool):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Request model&lt;/th&gt;
&lt;th&gt;Bedrock model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;nova-premier-web-grounding&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;us.amazon.nova-premier-v1:0&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;sonar-pro&lt;/code&gt;, &lt;code&gt;sonar-pro-online&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;us.amazon.nova-premier-v1:0&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;sonar&lt;/code&gt;, &lt;code&gt;sonar-mini&lt;/code&gt;, &lt;code&gt;sonar-turbo&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;us.amazon.nova-premier-v1:0&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
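&lt;p&gt;Since every alias lands on the same Bedrock model, the routing boils down to a dictionary lookup with a fallback. A simplified sketch (not the proxy's actual code):&lt;/p&gt;

```python
# Simplified sketch of the alias routing in the table above;
# not the proxy's actual implementation.
NOVA_PREMIER = "us.amazon.nova-premier-v1:0"

ALIASES = {
    "nova-premier-web-grounding": NOVA_PREMIER,
    "sonar-pro": NOVA_PREMIER,
    "sonar-pro-online": NOVA_PREMIER,
    "sonar": NOVA_PREMIER,
    "sonar-mini": NOVA_PREMIER,
    "sonar-turbo": NOVA_PREMIER,
}

def resolve_model(requested: str) -> str:
    # Unknown names also fall through to Nova Premier, matching the
    # "use any model name" behaviour described earlier.
    return ALIASES.get(requested, NOVA_PREMIER)

print(resolve_model("sonar-pro"))  # us.amazon.nova-premier-v1:0
```

&lt;p&gt;The fallback is what lets other apps send any model name and still get Nova Premier behind the scenes.&lt;/p&gt;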

&lt;h2&gt;
  
  
  Cost
&lt;/h2&gt;

&lt;p&gt;Nova Premier usage counts against your AWS credits — so if you have any sitting around (a $25 promo from an AWS event, workshop, or re:Invent swag bag), this is effectively free. Check your &lt;a href="https://console.aws.amazon.com/billing/home#/credits" rel="noopener noreferrer"&gt;Billing console&lt;/a&gt; — you might have more than you think.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Community Builders&lt;/strong&gt;: covered by $500/year credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Others with AWS credits&lt;/strong&gt;: same deal — credits apply&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No credits&lt;/strong&gt;: check the &lt;a href="https://aws.amazon.com/bedrock/pricing/" rel="noopener noreferrer"&gt;Bedrock pricing page&lt;/a&gt; for current Nova Premier rates&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Caveats
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Streaming doesn't return &lt;code&gt;citations[]&lt;/code&gt;&lt;/strong&gt; — Nova limitation. Non-streaming works fine, and OpenClaw's &lt;code&gt;web_search&lt;/code&gt; tool uses non-streaming.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;MAX_CONCURRENT&lt;/code&gt; semaphore&lt;/strong&gt; defaults to 5 — tune via env var if needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Region&lt;/strong&gt;: Nova Premier grounding requires &lt;code&gt;us-east-1&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;If you're already on AWS Bedrock for your LLM workloads, there's no reason to pay Perplexity separately for web-grounded search. The wrapper is ~350 lines of Python, has 44 tests, and is OpenAI SDK-compatible — so it works with anything that speaks the Perplexity or OpenAI chat completions API.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/gabrielkoo/bedrock-web-search-proxy" rel="noopener noreferrer"&gt;github.com/gabrielkoo/bedrock-web-search-proxy&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>python</category>
      <category>webdev</category>
    </item>
    <item>
      <title>AWS Silently Releases Kimi K2.5 and GLM 4.7 Models to Bedrock</title>
      <dc:creator>Gabriel Koo</dc:creator>
      <pubDate>Sun, 08 Feb 2026 15:31:31 +0000</pubDate>
      <link>https://dev.to/aws-builders/aws-silently-releases-kimi-k25-and-glm-47-models-to-bedrock-1514</link>
      <guid>https://dev.to/aws-builders/aws-silently-releases-kimi-k25-and-glm-47-models-to-bedrock-1514</guid>
      <description>&lt;p&gt;&lt;strong&gt;[UPDATE 10 Feb 2026]&lt;/strong&gt; - It seems it was part of AWS’s rollout plan for rolling out open weight models for Kiro and Kiro CLI! &lt;a href="https://kiro.dev/blog/open-weight-models/" rel="noopener noreferrer"&gt;Open weight models are here: more choice, more speed, less cost&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But interestingly, of the models covered in this article, only DeepSeek v3.2, MiniMax 2.1 and Qwen3 Coder Next made that announcement. Moonshot K2.5 and GLM-4.7 were missing.&lt;br&gt;
————&lt;br&gt;
I was refreshing my &lt;a href="https://amazonbedrockmodels.github.io" rel="noopener noreferrer"&gt;Bedrock model catalog&lt;/a&gt; script out of random curiosity when a few unfamiliar model IDs showed up in us-east-1. No AWS blog post. No tweet thread. Just a few new entries in the API response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you've been waiting for a Claude-adjacent model you could swap in seamlessly via AWS credits — this is it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But there’s a drawback for early adopters, so do read to the end to learn about the flaw!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kimi K2.5&lt;/strong&gt; (by Moonshot AI), &lt;strong&gt;GLM 4.7&lt;/strong&gt; (by Zhipu AI), and several other new models like DeepSeek 3.2 and Qwen3 Coder Next are now live on Bedrock, all with full support for the Converse API, tool calling, and — in Kimi K2.5's case — native image understanding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faecm7jquxdar4sdneiz2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faecm7jquxdar4sdneiz2.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbac1lp3pf87i383xmqk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbac1lp3pf87i383xmqk.png" alt=" " width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick note:&lt;/strong&gt; These models aren't listed in the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html" rel="noopener noreferrer"&gt;AWS Bedrock models-supported documentation&lt;/a&gt;, yet they're fully functional via the Converse API. That's the whole "silent release" thing — available in production, just not reflected in the canonical docs yet. Worth bookmarking your region's actual model list from the Bedrock console instead of relying solely on the written guides.&lt;/p&gt;
&lt;h2&gt;
  
  
  The models: what just landed
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbe18txuqhsi9nu8y0o2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbe18txuqhsi9nu8y0o2.png" alt=" " width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kimi K2.5&lt;/strong&gt; (&lt;a href="https://www.kimi.com/blog/kimi-k2.5.html" rel="noopener noreferrer"&gt;Moonshot AI blog&lt;/a&gt;) is the eye-catcher here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool calling (function calling):&lt;/strong&gt; ✓ Fully supported via Bedrock Converse API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image understanding:&lt;/strong&gt; ✓ Native image inputs (base64 or URL)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code generation:&lt;/strong&gt; In my testing, it held its own against Claude 4.5 Sonnet on typical coding prompts — it handled a multi-file refactor of a FastAPI router cleanly on the first try&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bedrock Model ID:&lt;/strong&gt; &lt;code&gt;moonshotai.kimi-k2.5&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Availability:&lt;/strong&gt; us-east-1, us-west-2 (and expanding)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use case fit:&lt;/strong&gt; Drop-in replacement for Claude if you're already on AWS credits&lt;/li&gt;
&lt;/ul&gt;
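&lt;p&gt;To sanity-check the model ID yourself, the Converse request has the same shape as for any other Bedrock model. A minimal sketch (the prompt and inference settings here are illustrative; sending it assumes you have boto3 credentials with Bedrock access in us-east-1):&lt;/p&gt;

```python
# Build a Converse API request for the newly listed Kimi K2.5 model ID.
def build_converse_request(prompt):
    return {
        "modelId": "moonshotai.kimi-k2.5",
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
    }

# Sending it requires AWS credentials with Bedrock access:
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   response = client.converse(**build_converse_request("Summarize this diff."))
#   print(response["output"]["message"]["content"][0]["text"])
```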

&lt;p&gt;&lt;strong&gt;GLM 4.7&lt;/strong&gt; (&lt;a href="https://z.ai/blog/glm-4.7" rel="noopener noreferrer"&gt;Zhipu AI blog&lt;/a&gt;) fills a quieter but useful role:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool calling:&lt;/strong&gt; ✓ Supported, though less aggressively tested in my flows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code generation:&lt;/strong&gt; Strong; competitive with DeepSeek for certain workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bedrock Model ID:&lt;/strong&gt; &lt;code&gt;zai.glm-4.7(-flash)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Availability:&lt;/strong&gt; us-east-1, us-west-2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use case fit:&lt;/strong&gt; Solid all-arounder; good for prompts that don't strictly require image handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The real unlock:&lt;/strong&gt; Both are live on the &lt;code&gt;Converse&lt;/code&gt; API, which means they work seamlessly with Bedrock's function-calling infrastructure.&lt;/p&gt;
&lt;h3&gt;
  
  
  When to pick which
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Need&lt;/th&gt;
&lt;th&gt;Pick&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Image understanding + tool calling&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Kimi K2.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Only Bedrock open-weight flagship model with both&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text-only tasks, cost-conscious&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;GLM 4.7&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Solid all-arounder, no vision overhead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maximum reliability &amp;amp; ecosystem&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Claude&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Battle-tested, widest documentation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Why this matters for vibe coding
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;"Vibe coding"&lt;/em&gt; — the practice of rapidly iterating on code with LLM assistance, swapping models mid-session, and optimizing for flow over perfection — lives or dies on how frictionless your model-switching is.&lt;/p&gt;

&lt;p&gt;If you're sitting on expiring AWS credits (I've got ~$700 by July 2026), the bottleneck isn't usually "which model is smartest?" — it's "how fast can I swap without rewriting everything?"&lt;/p&gt;

&lt;p&gt;Kimi K2.5 solves a real pain point: until now, if you wanted image understanding + tool calling + AWS-native billing, you were stuck with Claude. And the only way to get Claude on AWS billing was a Kiro CLI subscription ($20/$40/$200 per month), since Anthropic Claude models are not covered by the typical AWS Credits.&lt;/p&gt;

&lt;p&gt;I initially considered subscribing to the $200 Kiro Power plan, but I wasn't confident I could utilize it fully every month; at the same time, I worried that sticking to the $20/$40 plans would mean paying for Kiro credit overages (roughly double the per-credit price of the plans themselves). A pay-as-you-go option therefore fits my usage pattern perfectly.&lt;/p&gt;

&lt;p&gt;So now you have an option that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Bills directly to your AWS account&lt;/strong&gt; — no vendor intermediary, no separate API key, just your existing credits burning down&lt;/li&gt;
&lt;li&gt;Runs on the same Bedrock Converse API &lt;/li&gt;
&lt;li&gt;Calls tools reliably&lt;/li&gt;
&lt;li&gt;Natively understands images&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For experimentation loops (refactors, code generation, visual analysis), that's a genuinely useful escape hatch.&lt;/p&gt;
&lt;h2&gt;
  
  
  The lightweight setup: local LiteLLM gateway
&lt;/h2&gt;

&lt;p&gt;You don't need a complex setup. My entire gateway is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Python venv with &lt;code&gt;litellm&lt;/code&gt; installed&lt;/li&gt;
&lt;li&gt;A single YAML config file (shown in the next section)&lt;/li&gt;
&lt;li&gt;A systemd unit to keep it running on port 4000&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No containers, no Kubernetes. One command to install, one service file to manage. Once running, any client on that machine calls &lt;code&gt;http://localhost:4000/chat/completions&lt;/code&gt; with the standard OpenAI format, and LiteLLM translates it to Bedrock Converse API automatically.&lt;/p&gt;
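&lt;p&gt;As an illustration, a client-side call needs nothing beyond the Python standard library (a sketch; the &lt;code&gt;kimi-k2.5&lt;/code&gt; alias and the API key placeholder come from my gateway config and are assumptions about your setup):&lt;/p&gt;

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:4000/chat/completions"  # local LiteLLM proxy

def build_chat_request(model, prompt):
    # Standard OpenAI chat-completions payload; LiteLLM translates it to Converse.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def call_gateway(payload, api_key="sk-your-litellm-key"):
    req = urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With the gateway running:
#   result = call_gateway(build_chat_request("kimi-k2.5", "Refactor this function"))
#   print(result["choices"][0]["message"]["content"])
```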

&lt;p&gt;&lt;strong&gt;Performance note:&lt;/strong&gt; In my testing, the LiteLLM translation layer adds negligible latency (~20–50ms overhead). Streaming responses from Kimi K2.5 feel comparable to calling Claude directly — first tokens arrive within 1–2 seconds for typical prompts.&lt;/p&gt;
&lt;h2&gt;
  
  
  Bonus: Claude Code &amp;amp; OpenCode integration
&lt;/h2&gt;

&lt;p&gt;Here's the slightly cheeky part: you can point &lt;a href="https://docs.anthropic.com/en/docs/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; or &lt;a href="https://opencode.ai/" rel="noopener noreferrer"&gt;OpenCode&lt;/a&gt; at your local LiteLLM gateway and route requests through to Kimi K2.5 or GLM 4.7 on Bedrock — all while staying on your AWS credits.&lt;/p&gt;

&lt;p&gt;LiteLLM supports the Anthropic &lt;code&gt;/v1/messages&lt;/code&gt; API endpoint, so it takes just a few environment variables to set up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:4000
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_AUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-your-litellm-key
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;kimi-k2.5
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DISABLE_PROMPT_CACHING&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;DISABLE_PROMPT_CACHING=true&lt;/code&gt; is essential here (special thanks to my colleague &lt;em&gt;&lt;strong&gt;[at]Marty&lt;/strong&gt;&lt;/em&gt; for troubleshooting and fixing that): by default Claude Code tries to apply prompt caching for speed, but not every model on Amazon Bedrock supports it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1cl3hjaw2gf1mlhttp8m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1cl3hjaw2gf1mlhttp8m.png" alt=" " width="800" height="132"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then launch Claude Code or OpenCode as usual. LiteLLM intercepts the Anthropic-format requests and translates them to Bedrock Converse calls. It's not officially blessed by Anthropic, but it works cleanly for local experimentation — and your AWS credits take the hit instead of your Anthropic billing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmqqccoiz00icpxzuj6ao.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmqqccoiz00icpxzuj6ao.png" alt=" " width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Config: explicit about capabilities
&lt;/h2&gt;

&lt;p&gt;Here's how I route Kimi K2.5 and GLM 4.7:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;model_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;model_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kimi-k2.5&lt;/span&gt;
    &lt;span class="na"&gt;litellm_params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bedrock/converse/moonshotai.kimi-k2.5&lt;/span&gt;
      &lt;span class="na"&gt;aws_region_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-east-1&lt;/span&gt;
      &lt;span class="na"&gt;allowed_openai_params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;reasoning_effort'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tools'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tool_choice'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;model_info&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;completion&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;model_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;glm-4.7&lt;/span&gt;
    &lt;span class="na"&gt;litellm_params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bedrock/converse/zai.glm-4.7&lt;/span&gt;
      &lt;span class="na"&gt;aws_region_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-east-1&lt;/span&gt;
      &lt;span class="na"&gt;allowed_openai_params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;reasoning_effort'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tools'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tool_choice'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;    
    &lt;span class="na"&gt;model_info&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;completion&lt;/span&gt;

&lt;span class="na"&gt;litellm_settings&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;modify_params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;log_responses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key patterns here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Friendly names&lt;/strong&gt; (&lt;code&gt;kimi-k2.5&lt;/code&gt;, &lt;code&gt;glm-4.7&lt;/code&gt;) instead of long model IDs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explicit capability flags&lt;/strong&gt; (&lt;code&gt;supports_function_calling&lt;/code&gt;, &lt;code&gt;supports_vision&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;modify_params: true&lt;/code&gt;&lt;/strong&gt; for Bedrock edge-case smoothing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single region&lt;/strong&gt; (us-east-1) since both models are there&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The capability flags matter. They let your orchestration layer (or agent framework) gracefully degrade if a model can't do tools or images. No more "half-attempt to call a function and fail mysteriously."&lt;/p&gt;
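&lt;p&gt;In practice, that graceful degradation can be a one-line gate before each request. A hypothetical sketch (the capability table below simply mirrors the flags client-side; it is not a LiteLLM API):&lt;/p&gt;

```python
# Hypothetical client-side capability table mirroring the model_info flags.
MODEL_CAPS = {
    "kimi-k2.5": {"tools": True, "vision": True},
    "glm-4.7": {"tools": True, "vision": False},
}

def prepare_request(model, messages, tools=None, has_images=False):
    # Degrade gracefully: drop tools the model cannot call, reject images early.
    caps = MODEL_CAPS.get(model, {})
    if has_images and not caps.get("vision"):
        raise ValueError(model + " does not accept image inputs")
    request = {"model": model, "messages": messages}
    if tools and caps.get("tools"):
        request["tools"] = tools  # only attach tools the model can actually call
    return request
```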

&lt;h2&gt;
  
  
  Testing: quick verification
&lt;/h2&gt;

&lt;p&gt;I tested both models with the Bedrock Converse API in us-east-1. Here's what actually happened:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kimi K2.5:&lt;/strong&gt; I threw a "get current weather in Tokyo" tool spec at it — standard JSON Schema function definition, nothing fancy. It correctly structured the function call on the first attempt, including proper argument types in the response. For code generation, I asked it to refactor a Python CLI script into async; the output was clean and ran without edits. Image support is declared in the model schema but I haven't validated it hands-on yet — that's next on my list.&lt;/p&gt;
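&lt;p&gt;For reference, the weather tool spec I threw at it was the textbook JSON Schema shape, expressed here in the standard OpenAI function-calling format (LiteLLM converts this into Bedrock's &lt;code&gt;toolSpec&lt;/code&gt; form; nothing below is Kimi-specific):&lt;/p&gt;

```python
# "Get current weather" tool definition in OpenAI function-calling format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Tokyo"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# Passed as tools=[WEATHER_TOOL] in the chat-completions payload; a model that
# handles it well responds with a tool call whose arguments include "city".
```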

&lt;p&gt;&lt;strong&gt;GLM 4.7:&lt;/strong&gt; Solid on text queries and code generation. Tool calling works, though it was slightly less eager to invoke tools unprompted compared to Kimi — it sometimes answered directly when I expected a function call. No image support, as expected; Zhipu hasn't added vision capabilities to GLM 4.7.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the quiet release?
&lt;/h2&gt;

&lt;p&gt;These aren't show-stopping announcements. They're bread-and-butter additions to Bedrock's model portfolio. AWS likely brought them in as part of an ongoing expansion to reduce vendor lock-in on "you have to use Claude for everything." That's healthy — more options, better pricing pressure, cleaner credit utilization.&lt;/p&gt;

&lt;p&gt;This is a pattern, not an anomaly. Anthropic Claude, AI21 Jamba, and several Mistral variants all appeared on Bedrock before official blog posts or documentation updates. If you're only checking AWS launch announcements, you're always behind.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;📌 That's exactly why I built &lt;a href="https://amazonbedrockmodels.github.io" rel="noopener noreferrer"&gt;amazonbedrockmodels.github.io&lt;/a&gt;&lt;/strong&gt; — a living catalog of what's &lt;em&gt;actually&lt;/em&gt; available on Bedrock, in which regions, and what each model can do. Bookmark it. It updates faster than the docs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The model-swapping checklist
&lt;/h2&gt;

&lt;p&gt;If you want to swap models without rewriting your code:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use a gateway&lt;/strong&gt; (LiteLLM, LLMProxy, or similar) to normalize requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pin the Bedrock route explicitly&lt;/strong&gt; (&lt;code&gt;bedrock/converse/modelid&lt;/code&gt;) in your config&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mark capability per model&lt;/strong&gt; (tool calling, vision, etc.) — don't assume&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test the tool spec&lt;/strong&gt; — even "supported" models sometimes have quirky implementations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep a catalog&lt;/strong&gt; so you don't rediscover the same model twice&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Kimi K2.5 fits this playbook cleanly. It's a genuine Claude replacement for Bedrock users, not a "wait and see if it works" experiment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;If you're on AWS credits:&lt;/strong&gt; Spin up a local LiteLLM instance and try both models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If you find more quietly-available models:&lt;/strong&gt; Open a PR against the catalog or message me&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If you're in an AWS org:&lt;/strong&gt; Check your Bedrock region — availability is still expanding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AWS credits will expire whether you use them or not. Might as well pick models that fit your workflow instead of forcing your workflow around Kiro CLI only.&lt;/p&gt;




&lt;p&gt;P.S. During my testing I did observe occasionally longer LLM inference times, but I expect AWS is still provisioning more compute capacity for these newer models.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;[1] &lt;a href="https://www.kimi.com/blog/kimi-k2.5.html" rel="noopener noreferrer"&gt;Moonshot AI — Kimi K2.5 Announcement&lt;/a&gt;&lt;br&gt;
[2] &lt;a href="https://z.ai/blog/glm-4.7" rel="noopener noreferrer"&gt;Zhipu AI — GLM 4.7 Announcement&lt;/a&gt;&lt;br&gt;
[3] &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/bedrock-runtime/converse.html" rel="noopener noreferrer"&gt;AWS Bedrock — Converse API Reference&lt;/a&gt;&lt;br&gt;
[4] &lt;a href="https://docs.litellm.ai/docs/providers/bedrock" rel="noopener noreferrer"&gt;LiteLLM — AWS Bedrock Provider Documentation&lt;/a&gt;&lt;br&gt;
[5] &lt;a href="https://amazonbedrockmodels.github.io" rel="noopener noreferrer"&gt;Unofficial - Amazon Bedrock Model Catalog I Created&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>vibecoding</category>
      <category>ai</category>
    </item>
    <item>
      <title>Ultra Low Bedrock LLM Rate Limits for New AWS Accounts? Time to Wake Up Your Inactive AWS Accounts!</title>
      <dc:creator>Gabriel Koo</dc:creator>
      <pubDate>Tue, 25 Nov 2025 17:00:02 +0000</pubDate>
      <link>https://dev.to/aws-builders/ultra-low-bedrock-llm-rate-limits-for-new-aws-accounts-time-to-wake-up-your-inactive-aws-accounts-3no0</link>
      <guid>https://dev.to/aws-builders/ultra-low-bedrock-llm-rate-limits-for-new-aws-accounts-time-to-wake-up-your-inactive-aws-accounts-3no0</guid>
      <description>&lt;h2&gt;
  
  
  Are You Struggling With Amazon Bedrock’s Ultra-Low Quotas on New AWS Accounts? 🤯
&lt;/h2&gt;

&lt;p&gt;Are you hitting painfully low rate limits when running LLMs on Amazon Bedrock from a newly created AWS account? You’re definitely not alone — many developers are discovering that new accounts often start with &lt;strong&gt;extremely restrictive quotas&lt;/strong&gt;, sometimes as low as &lt;strong&gt;2 requests per minute&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Official guidance usually suggests contacting an account manager to escalate your limits, but for startups, hobby projects, or personal experimentation, that path is far from simple.&lt;/p&gt;

&lt;h2&gt;
  
  
  New Account? Big Ambitions, Tiny Quota 🤏🚧
&lt;/h2&gt;

&lt;p&gt;Sometime in 2024 or 2025, AWS quietly adjusted Bedrock’s default model access for newly created accounts, alongside other lowered account defaults such as a maximum concurrency of 10 for AWS Lambda. Even when using global endpoints, many fresh accounts get &lt;strong&gt;just a few requests per minute&lt;/strong&gt; (e.g., 2 rpm for Claude 4.5 Sonnet). This severely slows down prototyping or early-stage AI development.&lt;/p&gt;

&lt;p&gt;Meanwhile, &lt;strong&gt;older AWS accounts&lt;/strong&gt; — even ones that never touched Bedrock before — often start with dramatically higher limits, approaching &lt;strong&gt;200+ rpm&lt;/strong&gt; for the exact same models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrfvny41yyg8psazvfhu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrfvny41yyg8psazvfhu.png" alt="Sonnet 4.5 Rates" width="800" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This creates a real operational advantage for teams with access to aged accounts.&lt;/p&gt;
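&lt;p&gt;You can compare your own accounts directly via the Service Quotas API. A sketch (the "requests per minute" substring match is a heuristic of mine; quota names vary by model and may change):&lt;/p&gt;

```python
def filter_rpm_quotas(quotas):
    # Keep only per-minute request quotas from a list_service_quotas response.
    return {
        q["QuotaName"]: q["Value"]
        for q in quotas
        if "requests per minute" in q["QuotaName"].lower()
    }

# Against a live account (requires boto3 and credentials):
#   import boto3
#   client = boto3.client("service-quotas", region_name="us-east-1")
#   pages = client.get_paginator("list_service_quotas").paginate(ServiceCode="bedrock")
#   for page in pages:
#       print(filter_rpm_quotas(page["Quotas"]))
```

Running this in both a fresh and an aged account makes the gap described above immediately visible.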

&lt;h2&gt;
  
  
  The Elder Account Advantage 🕰️✨
&lt;/h2&gt;

&lt;p&gt;AWS appears to apply significantly stricter defaults to newer accounts while preserving far more permissive limits for older ones. This aligns with AWS’s long-standing pattern of maintaining stable experiences for long-time customers.&lt;/p&gt;

&lt;p&gt;In practice, a dormant but years-old AWS account can immediately receive much &lt;strong&gt;higher Bedrock limits&lt;/strong&gt; purely due to its age.&lt;/p&gt;

&lt;p&gt;Below is a striking example: &lt;strong&gt;3 rpm&lt;/strong&gt; for new accounts vs &lt;strong&gt;250 rpm&lt;/strong&gt; for older accounts on Claude 4.5 Opus.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftseif1z5fgpdlpbrxyej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftseif1z5fgpdlpbrxyej.png" alt="Opus 4.5 Rates" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AWS Is Unlikely to Reduce Older Accounts’ Limits 🔒🏢
&lt;/h2&gt;

&lt;p&gt;Reducing quotas for older accounts would create major risk and break expectations for long-standing customers — especially enterprises.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Many organizations have stable, long-lived workloads.&lt;/li&gt;
&lt;li&gt;Retroactively lowering quotas could break pipelines and violate performance assumptions.&lt;/li&gt;
&lt;li&gt;AWS historically avoids backward-incompatible changes unless absolutely necessary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of this, it’s unlikely AWS will apply newer, stricter defaults to older accounts. Teams spinning up new AWS accounts face a much steeper ramp for experimenting with Bedrock.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding and Reusing Older AWS Accounts 🔎📦
&lt;/h2&gt;

&lt;p&gt;If your organization has older AWS accounts lying around, they may offer instant scaling advantages. With the new &lt;strong&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/aws-organizations-direct-account-transfers/" rel="noopener noreferrer"&gt;AWS Organizations Direct Account Transfer&lt;/a&gt;&lt;/strong&gt; feature, accounts can move between Organizations without removing payment methods or performing the old, painful detachment workflow.&lt;/p&gt;

&lt;p&gt;When moving such accounts, remember to update:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Legal entity name&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Root user email&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Addresses and billing contacts&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tax information&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your organization uses &lt;a href="https://docs.aws.amazon.com/accounts/latest/reference/using-orgs-trusted-access.html" rel="noopener noreferrer"&gt;Trusted Access for AWS Account Management&lt;/a&gt;, these updates are straightforward. Also make sure to audit the account for any leftover resources before dedicating it to GenAI workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why New Accounts? Why Not Run Everything in One AWS Account? 🧩💼
&lt;/h2&gt;

&lt;p&gt;Putting Bedrock experiments, different production services, and dev workloads into a single account sounds convenient — until it isn’t. Combining everything into one single AWS Account creates unnecessary risk and operational noise.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Isolation Protects You 🔥🧱&lt;br&gt;
Account boundaries are AWS’s strongest safety net. One bad experiment or IAM mistake shouldn’t touch production. Isolation limits accidental data access, cost spikes, and incident blast radius.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Governance Stays Clean 📜✨&lt;br&gt;
Different workloads need different controls. Separate accounts keep audits simpler, give clear ownership, and let you apply SCPs and guardrails without compromise.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Costs Stay Transparent 💸📊&lt;br&gt;
Mixing Bedrock prototyping with core services muddies cost reporting. Individual accounts let you track usage, set budgets, and avoid team-to-team disputes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Experiments Move Faster ⚡🔬&lt;br&gt;
A sandbox (ideally an older account with higher limits) lets you test models, tweak IAM, and push boundaries freely — without risking production stability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It Matches AWS Best Practices 🏗️📚&lt;br&gt;
AWS recommends multi-account setups for lifecycle separation and blast-radius control. Using older accounts for Bedrock while keeping core workloads isolated follows this playbook perfectly.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How to Break Free From Bedrock’s Slow Lane 🚀💡
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Identify older AWS accounts that haven't been used recently.&lt;/li&gt;
&lt;li&gt;Transfer them into your AWS Organization using the streamlined Direct Account Transfer workflow.&lt;/li&gt;
&lt;li&gt;Update all account metadata for compliance.&lt;/li&gt;
&lt;li&gt;Deploy your Bedrock workloads — and unlock higher default limits instantly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach helps teams accelerate their AI development journey despite the strict constraints placed on newly created AWS accounts.&lt;/p&gt;

&lt;p&gt;Have you seen similar quota differences in your environment? Share your experience — more data points help the community understand the pattern! 🙌&lt;/p&gt;

</description>
      <category>bedrock</category>
      <category>ai</category>
      <category>aws</category>
    </item>
    <item>
      <title>Use OpenAI Codex CLI with Amazon Bedrock Models - Pay As You Go</title>
      <dc:creator>Gabriel Koo</dc:creator>
      <pubDate>Wed, 27 Aug 2025 15:39:16 +0000</pubDate>
      <link>https://dev.to/aws-builders/use-openai-codex-cli-with-amazon-bedrock-models-pay-as-you-go-48eb</link>
      <guid>https://dev.to/aws-builders/use-openai-codex-cli-with-amazon-bedrock-models-pay-as-you-go-48eb</guid>
      <description>&lt;p&gt;&lt;strong&gt;NEW&lt;/strong&gt; (2026 Jan) Newer versions of Codex use the &lt;code&gt;/v1/responses&lt;/code&gt; API by default and have dropped support for the chat completions endpoint. You'll need to add &lt;code&gt;wire_api = "responses"&lt;/code&gt; to your existing config to use the new endpoint instead: &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-mantle.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-mantle.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI Codex CLI on Amazon Bedrock Models: Why Bother?
&lt;/h2&gt;

&lt;p&gt;Here’s why Codex plus the Amazon Bedrock models makes sense in some cases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pay as you go&lt;/strong&gt;: No fixed cost—just pay for Bedrock tokens and Lambda invocations, with no monthly minimum. While Amazon Q Developer CLI has a free-tier quota, you must upgrade to the $19 USD/month paid plan once you exceed it, and even that plan &lt;em&gt;still&lt;/em&gt; enforces usage caps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use your own fine-tuned models&lt;/strong&gt;: Swap model endpoints easily; the gateway can even route to your own Amazon Bedrock fine-tunes (e.g. Nova) without friction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparent logging&lt;/strong&gt;: Codex’s request/response logs give you full visibility — a plus for debugging and cost tracking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No AWS IAM/Identity required - Perfect for Headless Workloads&lt;/strong&gt;: You only need your Bedrock Access Gateway API key; there's no need to log in via an inconvenient console authentication flow with your AWS Identity Center user or Builder ID (great for CI/CD and ephemeral cloud instances).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regional flexibility&lt;/strong&gt;: Yes, you could use &lt;a href="https://docs.anthropic.com/en/docs/claude-code/amazon-bedrock" rel="noopener noreferrer"&gt;Claude Code with Amazon Bedrock&lt;/a&gt;, but I live in Hong Kong, where Claude model usage is not allowed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Nova Micro: Price King&lt;/strong&gt;: For simple text-only LLM tasks, swapping Sonnet 4 for Nova Micro cuts costs by a factor of roughly 85.&lt;/li&gt;
&lt;li&gt;If you have a bunch of AWS Credits from AWS events, your usage of the &lt;code&gt;gpt-oss&lt;/code&gt; / Nova family of models is covered!&lt;/li&gt;
&lt;/ol&gt;
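&lt;p&gt;That factor-of-85 figure follows directly from list prices. A quick sanity check (the per-million-token prices below are my recollection of the on-demand rates at the time; always verify against the current Bedrock pricing page):&lt;/p&gt;

```python
# Approximate on-demand prices in USD per 1M input tokens (assumed values;
# check the current Bedrock pricing page before relying on them).
SONNET_4_INPUT_PER_M = 3.00     # Claude Sonnet 4, input tokens
NOVA_MICRO_INPUT_PER_M = 0.035  # Amazon Nova Micro, input tokens

# Roughly the factor-of-85 savings quoted above (about 85.7x on input tokens).
cost_ratio = SONNET_4_INPUT_PER_M / NOVA_MICRO_INPUT_PER_M
```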

&lt;h2&gt;
  
  
  Setup: Codex CLI + Bedrock Gateway
&lt;/h2&gt;

&lt;p&gt;(UPDATE: Deprecated after Codex v0.80.0 &lt;a href="https://github.com/openai/codex/discussions/7782" rel="noopener noreferrer"&gt;https://github.com/openai/codex/discussions/7782&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Get your Lambda Gateway Function URL and API Key after deployment. (Check my earlier article for a step-by-step guide to get it running on Lambda via AWS SAM: &lt;a href="https://dev.to/aws-builders/use-amazon-bedrock-models-via-an-openai-api-compatible-serverless-endpoint-now-without-fixed-cost-5hf5"&gt;https://dev.to/aws-builders/use-amazon-bedrock-models-via-an-openai-api-compatible-serverless-endpoint-now-without-fixed-cost-5hf5&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Here's a no-brainer if you want to skip my article and deploy it right away:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;
  &lt;span class="nb"&gt;cd&lt;/span&gt; /tmp &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  git clone &lt;span class="nt"&gt;--depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 https://github.com/gabrielkoo/bedrock-access-gateway-function-url &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;cd &lt;/span&gt;bedrock-access-gateway-function-url &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  ./prepare_source.sh &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  sam build &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  sam deploy &lt;span class="nt"&gt;--guided&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now &lt;a href="https://github.com/openai/codex?tab=readme-ov-file#installing-and-running-codex-cli" rel="noopener noreferrer"&gt;install Codex&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm i &lt;span class="nt"&gt;-g&lt;/span&gt; @openai/codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure Codex like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# ~/.codex/config.toml&lt;/span&gt;
&lt;span class="py"&gt;profile&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'bedrock'&lt;/span&gt;

&lt;span class="nn"&gt;[profiles.bedrock]&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'openai.gpt-oss-120b-1:0'&lt;/span&gt;
&lt;span class="c"&gt;# OR&lt;/span&gt;
&lt;span class="c"&gt;# model = 'us.amazon.nova-premier-v1:0'&lt;/span&gt;
&lt;span class="py"&gt;model_provider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'bedrock'&lt;/span&gt;
&lt;span class="py"&gt;model_reasoning_effort&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"low"&lt;/span&gt;
&lt;span class="c"&gt;# NEW! Newer versions of Codex uses /v1/responses API by default.&lt;/span&gt;
&lt;span class="py"&gt;wire_api&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"chat"&lt;/span&gt;

&lt;span class="nn"&gt;[model_providers.bedrock]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'bedrock'&lt;/span&gt;
&lt;span class="py"&gt;base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'https://RANDOM_HASH_HERE.lambda-url.AWS_REGION.on.aws/api/v1'&lt;/span&gt;
&lt;span class="py"&gt;env_key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'CODEX_OPENAI_API_KEY'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
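&lt;p&gt;Before pointing Codex at the gateway, it is worth a quick smoke test that your Function URL and API key actually work. A minimal sketch - the URL and key below are placeholders for your own deployment values:&lt;/p&gt;

```shell
# Placeholders - substitute the Function URL and API key from your own deployment.
export CODEX_OPENAI_API_KEY="your-gateway-api-key"
BASE_URL="https://RANDOM_HASH_HERE.lambda-url.AWS_REGION.on.aws/api/v1"

# The gateway is OpenAI-compatible, so /models should list the Bedrock models it exposes.
curl -s --max-time 15 -H "Authorization: Bearer ${CODEX_OPENAI_API_KEY}" \
  "${BASE_URL}/models" || echo "Gateway not reachable - check the URL and key."
```

&lt;p&gt;If a JSON model list comes back, Codex will work with the same values.&lt;/p&gt;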



&lt;p&gt;Alternatively, if you only need the &lt;code&gt;gpt-oss&lt;/code&gt; models (and not, say, the Claude/Nova families), you can use the latest official OpenAI-compatible endpoint with an Amazon Bedrock API key instead - there is no need to host the Bedrock Access Gateway at all:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="err"&gt;...&lt;/span&gt;
&lt;span class="py"&gt;web_search&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"disabled"&lt;/span&gt;

&lt;span class="nn"&gt;[model_providers.bedrock]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"AmazonBedrock"&lt;/span&gt;
&lt;span class="py"&gt;base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://bedrock-mantle.us-west-1.api.aws/v1"&lt;/span&gt;
&lt;span class="py"&gt;env_key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ENV_KEY_FOR_YOUR_BEDROCK_API_KEY"&lt;/span&gt;

&lt;span class="err"&gt;...&lt;/span&gt;

&lt;span class="nn"&gt;[profiles.gpt-oss]&lt;/span&gt;
&lt;span class="c"&gt;# NOTE: The model ID is truncated if you use the responses API.&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"openai.gpt-oss-120b"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Query the LLM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex &lt;span class="nt"&gt;--profile&lt;/span&gt; bedrock &lt;span class="s2"&gt;"What is my public IP address?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1iv3f12t7c5g34cmsmti.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1iv3f12t7c5g34cmsmti.png" alt="Codex with Bedrock model in action" width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Model Support
&lt;/h2&gt;

&lt;p&gt;Note that not all Bedrock models work over the gateway. Models must support &lt;strong&gt;tool calls&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT OSS (20b/120b)&lt;/strong&gt;: Optimized for Codex.&lt;br&gt;
&lt;strong&gt;Nova family (Premier, Pro, Lite, Micro):&lt;/strong&gt; All tested and working.&lt;br&gt;
&lt;strong&gt;Claude, Llama, Mistral, Command R:&lt;/strong&gt; Working, subject to regional restrictions (e.g. Hong Kong).&lt;/p&gt;

&lt;h2&gt;
  
  
  Amazon Q Developer CLI vs Codex CLI on Bedrock
&lt;/h2&gt;

&lt;p&gt;Amazon Q Developer CLI is indeed &lt;strong&gt;officially supported in Hong Kong&lt;/strong&gt; — but after your free usage (&lt;a href="https://aws.amazon.com/q/developer/pricing/" rel="noopener noreferrer"&gt;50 agentic chats/month&lt;/a&gt;), you'll need the $19/month paid plan, and may hit quotas even then.&lt;/p&gt;

&lt;p&gt;Codex CLI via Amazon Bedrock gives &lt;em&gt;unmetered usage&lt;/em&gt; (subject only to whatever quotas you have on Amazon Bedrock itself and Lambda), with no AWS login required - you just need the API key that you defined yourself when you deployed the Bedrock Access Gateway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not Just Use the New OpenAI-Compatible Endpoint?
&lt;/h2&gt;

&lt;p&gt;As covered in my other blog article &lt;a href="https://dev.to/aws-builders/aws-launches-openai-compatible-api-for-bedrock-and-i-did-some-tests-49cd"&gt;AWS Launches OpenAI-Compatible API for Bedrock (and I Did Some Tests!)&lt;/a&gt;, the new OpenAI-compatible Amazon Bedrock API endpoint supports &lt;code&gt;gpt-oss&lt;/code&gt; 20b as well as 120b out of the box, but other models like Nova or Claude are not supported.&lt;/p&gt;

&lt;p&gt;So by wrapping the calls via a Bedrock Access Gateway, you can switch to any other supported model whenever you want.&lt;/p&gt;
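&lt;p&gt;In practice, switching models is just another Codex profile reusing the same gateway provider. A hypothetical example:&lt;/p&gt;

```toml
# ~/.codex/config.toml - add alongside the existing [profiles.bedrock] block.
[profiles.nova-micro]
model = 'us.amazon.nova-micro-v1:0'
model_provider = 'bedrock'
wire_api = 'chat'
```

&lt;p&gt;Then run &lt;code&gt;codex --profile nova-micro "..."&lt;/code&gt; for cheap text-only tasks.&lt;/p&gt;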

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Codex CLI + Amazon Bedrock (via an OpenAI-compatible gateway) gives developers a way to use pay-as-you-go agentic CLI agents, swap fine-tuned models easily, and avoid the region/pricing issues present in other AWS or Anthropic tooling. For minimal cost, Nova Micro is unbeatable for text workloads. And yes, my serverless gateway solution is the backbone — but more about that in &lt;a href="https://dev.to/aws-builders/use-amazon-bedrock-models-via-an-openai-api-compatible-serverless-endpoint-now-without-fixed-cost-5hf5"&gt;my previous blog&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>genai</category>
      <category>aws</category>
    </item>
    <item>
      <title>🚀 AWS Launches OpenAI-Compatible API for Bedrock (and I Did Some Tests!)</title>
      <dc:creator>Gabriel Koo</dc:creator>
      <pubDate>Wed, 06 Aug 2025 16:45:12 +0000</pubDate>
      <link>https://dev.to/aws-builders/aws-launches-openai-compatible-api-for-bedrock-and-i-did-some-tests-49cd</link>
      <guid>https://dev.to/aws-builders/aws-launches-openai-compatible-api-for-bedrock-and-i-did-some-tests-49cd</guid>
      <description>&lt;h2&gt;
  
  
  🚀 AWS OpenAI-Compatible API for Bedrock OSS GPT: Real-World Dev Tests &amp;amp; Insights
&lt;/h2&gt;

&lt;p&gt;AWS just dropped a &lt;a href="https://aws.amazon.com/blogs/aws/openai-open-weight-models-now-available-on-aws/" rel="noopener noreferrer"&gt;bombshell&lt;/a&gt; by launching open-weight GPT models (&lt;code&gt;gpt-oss-120b&lt;/code&gt;, &lt;code&gt;gpt-oss-20b&lt;/code&gt;) on Amazon Bedrock as a serverless, pay-as-you-go alternative to self-hosting — this caught a lot of headlines 👀.&lt;/p&gt;

&lt;p&gt;But as someone obsessed with &lt;strong&gt;Developer Experience (DX)&lt;/strong&gt;, I was even more stoked that AWS &lt;strong&gt;finally&lt;/strong&gt; launched &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/inference-chat-completions.html" rel="noopener noreferrer"&gt;an official OpenAI-compatible endpoint&lt;/a&gt;:  &lt;/p&gt;

&lt;p&gt;&lt;code&gt;https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1&lt;/code&gt;  &lt;/p&gt;

&lt;p&gt;This puts AWS right alongside Gemini/VertexAI and Anthropic as companies providing first-party OpenAI SDK compatibility, and finally brings native OpenAI "plug-and-play" infra to AWS 🤩&lt;/p&gt;

&lt;p&gt;Before we dive in, &lt;strong&gt;here’s my previous deep-dive blog for context on using Amazon Bedrock models with OpenAI compatibility (without fixed costs):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://dev.to/aws-builders/use-amazon-bedrock-models-via-an-openai-api-compatible-serverless-endpoint-now-without-fixed-cost-5hf5"&gt;https://dev.to/aws-builders/use-amazon-bedrock-models-via-an-openai-api-compatible-serverless-endpoint-now-without-fixed-cost-5hf5&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  😎 Wait—Does This Make the Bedrock Proxy Gateway Projects Stale?
&lt;/h2&gt;

&lt;p&gt;Nope! While simple OpenAI API calls still run perfectly against these new official OpenAI-compatible Amazon Bedrock Runtime endpoints, &lt;strong&gt;there are critical compatibility gaps&lt;/strong&gt; — so custom proxies/gateways like my &lt;a href="https://github.com/gabrielkoo/bedrock-access-gateway-function-url" rel="noopener noreferrer"&gt;bedrock-access-gateway-function-url&lt;/a&gt; project and the official &lt;a href="https://github.com/aws-samples/bedrock-access-gateway/" rel="noopener noreferrer"&gt;bedrock-access-gateway&lt;/a&gt; project are nowhere near obsolete yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  By the Way, AWS Credits Do Cover the &lt;code&gt;gpt-oss&lt;/code&gt; Models!
&lt;/h2&gt;

&lt;p&gt;It has long been a pity that typical AWS credits do not cover models like Claude or Mistral. It's great that this time, with the official partnership between AWS and OpenAI, usage of the &lt;code&gt;gpt-oss&lt;/code&gt; models finally seems to be covered:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjygmjdd89l6pen8jdot.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjygmjdd89l6pen8jdot.png" alt=" " width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🧑‍💻 I Vibe-Coded the Tests with Kiro AI IDE — Here’s What Happened
&lt;/h2&gt;

&lt;p&gt;You can check my &lt;strong&gt;actual repo and test scripts here:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
📦 &lt;a href="https://github.com/gabrielkoo/test-official-amazon-bedrock-openai-compatible-endpoint" rel="noopener noreferrer"&gt;https://github.com/gabrielkoo/test-official-amazon-bedrock-openai-compatible-endpoint&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS Official Docs:&lt;/strong&gt; &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/inference-chat-completions.html" rel="noopener noreferrer"&gt;AWS Bedrock Docs: OpenAI Chat Completions API&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;(Note: The docs didn’t mention that support is limited to the latest OpenAI &lt;code&gt;gpt-oss&lt;/code&gt; models only, nor did they directly address support for the &lt;code&gt;tool_call&lt;/code&gt; or &lt;code&gt;response_format&lt;/code&gt; features/parameters, so do read on for my real-world dev findings!)&lt;/em&gt;&lt;/p&gt;
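&lt;p&gt;For reference, the test scripts largely boil down to calls like the following. This sketch assumes you have generated a Bedrock API key and exported it as &lt;code&gt;AWS_BEARER_TOKEN_BEDROCK&lt;/code&gt; - note the use of &lt;code&gt;max_completion_tokens&lt;/code&gt; rather than &lt;code&gt;max_tokens&lt;/code&gt;:&lt;/p&gt;

```shell
# Assumes a Bedrock API key exported as AWS_BEARER_TOKEN_BEDROCK.
payload='{
  "model": "openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Say hello in one word."}],
  "max_completion_tokens": 100
}'

curl -s --max-time 30 \
  "https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1/chat/completions" \
  -H "Authorization: Bearer ${AWS_BEARER_TOKEN_BEDROCK}" \
  -H "Content-Type: application/json" \
  -d "$payload" || echo "Request failed - check your Bedrock API key and region."
```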

&lt;h2&gt;
  
  
  Summary Table:
&lt;/h2&gt;

&lt;p&gt;As a control test, I also ran the same testing script with my local Ollama &lt;code&gt;gpt-oss:20b&lt;/code&gt; model:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;Bedrock OpenAI Endpoint&lt;/th&gt;
&lt;th&gt;Local Ollama + &lt;code&gt;gpt-oss&lt;/code&gt;
&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Basic Chat Completion + Reasoning&lt;/td&gt;
&lt;td&gt;✅ Works: Step-wise reasoning included&lt;/td&gt;
&lt;td&gt;✅ Works, chain-of-thought&lt;/td&gt;
&lt;td&gt;Bedrock wraps reasoning in &lt;code&gt;&amp;lt;reasoning&amp;gt;&lt;/code&gt; tags, both are great for math/logic explanations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;reasoning_effort&lt;/code&gt; parameter&lt;/td&gt;
&lt;td&gt;✅ Supported, adjusts explanation depth&lt;/td&gt;
&lt;td&gt;✅ Supported, works&lt;/td&gt;
&lt;td&gt;Fine-tune output detail—rare outside OpenAI OSS models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔧 Tool/Function Calling (&lt;code&gt;tools&lt;/code&gt;/&lt;code&gt;tool_choice&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;❌ Only intent as JSON, &lt;em&gt;not&lt;/em&gt; native &lt;code&gt;tool_calls&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;✅ True OpenAI tool_call schema&lt;/td&gt;
&lt;td&gt;Just like the initial state of &lt;code&gt;o1-mini&lt;/code&gt;: AWS "gets" the tools, but you can’t delegate natively&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;response_format&lt;/code&gt; (JSON schema)&lt;/td&gt;
&lt;td&gt;❌ Not supported, API error&lt;/td&gt;
&lt;td&gt;✅ Works, emits JSON&lt;/td&gt;
&lt;td&gt;Bedrock doesn’t do response schema yet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token param (&lt;code&gt;max_tokens&lt;/code&gt; vs &lt;code&gt;max_completion_tokens&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;❌ Must use &lt;code&gt;max_completion_tokens&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;✅ Accepts either&lt;/td&gt;
&lt;td&gt;Bedrock is stricter, adjust scripts accordingly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Non-OSS Model (&lt;code&gt;Nova&lt;/code&gt;, &lt;code&gt;Titan&lt;/code&gt;, etc.)&lt;/td&gt;
&lt;td&gt;✅ "Model not found", as expected&lt;/td&gt;
&lt;td&gt;✅ Same&lt;/td&gt;
&lt;td&gt;Endpoint only exposes OSS GPT models (&lt;code&gt;gpt-oss-*&lt;/code&gt;) for now&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🌟 Key DX Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API Parity, But Not Quite Full Compatibility:&lt;/strong&gt;
Any OpenAI SDK code (and most OpenAI agent frameworks) can now point at AWS’s Bedrock Runtime &lt;code&gt;/openai/v1&lt;/code&gt; endpoint with just env var tweaks — no code rewriting. This UX is 🔥 for engineers and innovation teams!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparent, Adjustable Reasoning:&lt;/strong&gt;
Get full chain-of-thought out-of-the-box, plus parametric reasoning level control. Perfect for math, science, coding, and explainability applications!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Calling Support—Almost There:&lt;/strong&gt;
AWS Bedrock OSS GPT models &lt;em&gt;recognize&lt;/em&gt; function call intent (return JSON), but don’t yet emit proper &lt;code&gt;tool_calls&lt;/code&gt; blocks in the response. (While the same test case passed for my local Ollama test on the same &lt;code&gt;gpt-oss&lt;/code&gt; model). This feels just like the early days of the &lt;code&gt;o1-mini&lt;/code&gt; models, so: expect better in future, but for now, your OpenAI tool-calling apps will need adapters or proxies. 🙂&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strict OpenAI Param Parsing:&lt;/strong&gt;
If your code uses &lt;code&gt;max_tokens&lt;/code&gt;, it’ll break here — swap to &lt;code&gt;max_completion_tokens&lt;/code&gt;, as &lt;code&gt;gpt-oss&lt;/code&gt; is a reasoning model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Registry/Open Weight Only:&lt;/strong&gt;
The official endpoint supports only &lt;code&gt;gpt-oss&lt;/code&gt; models so far (&lt;code&gt;gpt-oss-120b&lt;/code&gt;, &lt;code&gt;gpt-oss-20b&lt;/code&gt;). Other Amazon Bedrock serverless models like &lt;code&gt;Nova&lt;/code&gt; will get you a fast fail as of now.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Still Need Dev Proxy Power:&lt;/strong&gt;
&lt;a href="https://github.com/gabrielkoo/bedrock-access-gateway-function-url" rel="noopener noreferrer"&gt;bedrock-access-gateway-function-url&lt;/a&gt; is &lt;em&gt;not&lt;/em&gt; obsolete! It remains highly useful for compatibility workarounds or for non-&lt;code&gt;gpt-oss&lt;/code&gt; models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a disclaimer, the model documentation for &lt;code&gt;gpt-oss&lt;/code&gt; on AWS (&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-openai.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-openai.html&lt;/a&gt;) notes that tool calls are not directly supported for the Bedrock-hosted version as of now, so it’s entirely possible that once AWS rolls out more model support for the new OpenAI endpoint, these limitations will no longer be a concern!&lt;/p&gt;




&lt;h2&gt;
  
  
  💡 DX Tips for Devs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Add backward/forward compatibility adapters&lt;/strong&gt; if you’re building serious agentic stacks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch the docs&lt;/strong&gt; (&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/inference-chat-completions.html" rel="noopener noreferrer"&gt;Invoke a model with the OpenAI Chat Completions API&lt;/a&gt;), but always test yourself — AWS hasn't yet clearly listed all the current limitations in the documentation!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Port your scripts with care:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Reasoning, logic, coding, and OpenAI-style chat: 👍
&lt;/li&gt;
&lt;li&gt;Tool use, output schemas: ⚠️ adapter or proxy needed today!&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;OSS all the things!&lt;/strong&gt; Making the full open-weight LLM stack &lt;em&gt;API-portable&lt;/em&gt; is a big win for multi-cloud, local/cloud hybrid, and plain old sovereignty nerds. It feels like a new era for "bring your own infra, but keep your SDKs."&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Thanks to &lt;a href="https://kiro.dev/" rel="noopener noreferrer"&gt;Kiro AI IDE&lt;/a&gt; for the vibe-code collab, and to the AWS team for finally launching the OpenAI-compatible endpoint!&lt;/p&gt;

&lt;p&gt;AWS’s new official OpenAI-compatible endpoint for &lt;code&gt;gpt-oss&lt;/code&gt; models is a major leap for cloud/dev portability, even if a few “classic” OpenAI features like native function calling aren’t fully there (yet). If you need adapters, or want total control, proxies like mine remain essential. But for standard chat/reasoning patterns, Bedrock is now ready for your existing OpenAI SDK code.&lt;/p&gt;

&lt;p&gt;Happy building! 🚀🤖&lt;/p&gt;

&lt;p&gt;P.S. This article was written on 2025-08-06, one day after the new endpoint launched, so it's entirely possible that by the time you read it, AWS will have rolled out support for more of the common OpenAI SDK features!&lt;/p&gt;

</description>
      <category>bedrock</category>
      <category>aws</category>
      <category>openai</category>
    </item>
    <item>
      <title>Regain access to Amazon Q CLI in CloudShell with This Simple Trick</title>
      <dc:creator>Gabriel Koo</dc:creator>
      <pubDate>Sat, 02 Aug 2025 16:53:56 +0000</pubDate>
      <link>https://dev.to/aws-builders/regain-access-to-amazon-q-cli-in-cloudshell-with-this-simple-trick-58m3</link>
      <guid>https://dev.to/aws-builders/regain-access-to-amazon-q-cli-in-cloudshell-with-this-simple-trick-58m3</guid>
      <description>&lt;p&gt;If you’ve tried to use the Amazon Q CLI a.k.a. &lt;code&gt;q chat&lt;/code&gt; command in &lt;strong&gt;AWS CloudShell&lt;/strong&gt; recently, you might have seen the frustrating message 🤯:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Q CLI integration is temporarily disabled. For continued access to Q Chat, please use the integrated version in your AWS Console.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is especially annoying for me as a true lover of Amazon Q CLI, which is so far the best GenAI CLI agent for daily AWS users.&lt;/p&gt;

&lt;p&gt;The good news: here’s a guide on how to force-reinstall Amazon Q CLI and log in with your own account. Plus, some perspective on why this happens, and why Amazon Q CLI is still a must-have for me 🛠️ — especially if you’re experimenting with GenAI in cloud environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Happening?
&lt;/h2&gt;

&lt;p&gt;As of time of writing (2025-08-02), AWS has temporarily disabled Amazon Q chat CLI features in CloudShell due to an internal issue (officially documented by AWS &lt;a href="https://docs.aws.amazon.com/cloudshell/latest/userguide/q-cli-features-in-cloudshell.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;). Q-based inline suggestions and the actual chat functionality are both disabled in CloudShell. ⛔️&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Is This an Issue (My Own Speculation)?
&lt;/h2&gt;

&lt;p&gt;I saw a user reporting the issue ⚠️ as early as 2025-05-23: &lt;a href="https://github.com/aws/amazon-q-developer-cli/issues/1944" rel="noopener noreferrer"&gt;https://github.com/aws/amazon-q-developer-cli/issues/1944&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here’s why I think it’s happening 🧠:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CloudShell&lt;/strong&gt; authenticates you for AWS credentials natively using &lt;strong&gt;AWS IAM users, roles, or Identity Center Access Role sessions&lt;/strong&gt; — so that you can do API calls like &lt;code&gt;aws sts get-caller-identity&lt;/code&gt; within CloudShell without the hassle of configuring the credentials.&lt;/p&gt;

&lt;p&gt;In contrast, &lt;strong&gt;Amazon Q CLI&lt;/strong&gt; expects authentication via &lt;strong&gt;AWS Builder ID&lt;/strong&gt; or &lt;strong&gt;AWS IAM Identity Center&lt;/strong&gt; for Pro plan.&lt;/p&gt;

&lt;p&gt;These authentication models may not be exactly compatible, especially for chat-based workflows requiring licensing or entitlements outside basic AWS IAM integration.&lt;/p&gt;

&lt;p&gt;One last possibility is that it is actually quite hard to apply limits to, or attribute, Amazon Q CLI usage. Imagine one’s Amazon Q CLI usage under a particular IAM user in a CloudShell session exceeding a fair pre-defined limit: one could simply create another IAM user and have the rate limits reset as a separate user. One could even set this up programmatically for literally unlimited Amazon Q CLI usage quota (absent any AWS-account-wide limit). 💸&lt;/p&gt;

&lt;p&gt;Until AWS aligns these authentication models behind the scenes, expect the fix to take the Amazon Q CLI team some more time.&lt;/p&gt;

&lt;h2&gt;
  
  
  But Amazon Q CLI is Still Cool!
&lt;/h2&gt;

&lt;p&gt;Even with this hiccup, Amazon Q CLI remains hugely valuable ⚡️:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI-driven CLI suggestions right where you work&lt;/li&gt;
&lt;li&gt;Natural language command generation&lt;/li&gt;
&lt;li&gt;Instant code explanations and diagnostics&lt;/li&gt;
&lt;li&gt;Makes security and automation work faster and less error-prone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you use it outside CloudShell, it truly supercharges your workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Force Reinstall &amp;amp; Login Steps
&lt;/h2&gt;

&lt;p&gt;Now to the main topic - how to force a clean install and log in with your own identity, bypassing CloudShell’s built-in integration.&lt;/p&gt;

&lt;p&gt;Easy. Since the AWS CloudShell compute environment is just x86_64 Amazon Linux under the hood, just force-uninstall it and then install it according to the &lt;a href="https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/command-line-installing-ssh-setup-autocomplete.html#command-line-download-install-file" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# This removes the existing installation&lt;/span&gt;
&lt;span class="nb"&gt;sudo rm&lt;/span&gt; &lt;span class="sb"&gt;`&lt;/span&gt;which q&lt;span class="sb"&gt;`&lt;/span&gt;
&lt;span class="c"&gt;# Download the latest ZIP-based installer&lt;/span&gt;
curl &lt;span class="nt"&gt;--proto&lt;/span&gt; &lt;span class="s1"&gt;'=https'&lt;/span&gt; &lt;span class="nt"&gt;--tlsv1&lt;/span&gt;.2 &lt;span class="nt"&gt;-sSf&lt;/span&gt; &lt;span class="s2"&gt;"https://desktop-release.q.us-east-1.amazonaws.com/latest/q-x86_64-linux-musl.zip"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="s2"&gt;"q.zip"&lt;/span&gt;

unzip q.zip
./q/install.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When prompted:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do you want q to modify your shell config (you will have to manually do this otherwise)?&lt;br&gt;
Answer: &lt;strong&gt;Yes&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This helps set up the &lt;strong&gt;Amazon Q CLI&lt;/strong&gt; and autocompletion back.&lt;/p&gt;

&lt;p&gt;Next, authenticate with your preferred credentials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;q login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll be guided through a browser-based process (powered by Builder ID, Identity Center, etc.), enter the code, and your CLI session will be authenticated. Once authorized, you get full access (outside limitations of CloudShell).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgm0ikm5c2x1mm5n8u95.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgm0ikm5c2x1mm5n8u95.jpeg" alt=" " width="800" height="219"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let’s test it out. Try:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;q &lt;span class="nb"&gt;help
&lt;/span&gt;q inline &lt;span class="nb"&gt;enable&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With proper authentication, you’ll see &lt;strong&gt;Amazon Q CLI&lt;/strong&gt; in all its glory (in supported environments) 🚀.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ente2yojzkpgmlmnnio.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ente2yojzkpgmlmnnio.jpeg" alt=" " width="800" height="543"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Important Points to Note:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;With my method, your Amazon Q CLI usage in CloudShell is bound to your own AWS Builder ID account / your Identity Center user. Be aware of usage rate limits. 🧾&lt;/li&gt;
&lt;li&gt;AWS CloudShell environments are automatically deleted after &lt;strong&gt;120 days of inactivity&lt;/strong&gt;. You might need to redo this hack if you haven’t used a particular CloudShell environment for more than 120 days. 🗑️&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Don’t stress if &lt;code&gt;q chat&lt;/code&gt; is broken in CloudShell—it’s a temporary AWS-side limitation.&lt;/li&gt;
&lt;li&gt;Use my method to force-reinstall and log in with your own identity, bypassing the built-in &lt;strong&gt;AWS CloudShell&lt;/strong&gt; integration, for a working setup.&lt;/li&gt;
&lt;li&gt;Integration hurdles are likely caused by the difference in authentication flows: IAM / Identity Center built into CloudShell vs. Builder ID or Pro Plan requirements for Amazon Q CLI.&lt;/li&gt;
&lt;li&gt;Amazon Q CLI is a great tool—just leverage it where AWS supports full features, and watch for future CloudShell improvements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Happy hacking — enjoy more intelligent, secure, and productive time in your CloudShell terminal! ☁️&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>amazonqcli</category>
    </item>
    <item>
      <title>Building GitHub-Style Contribution Grids for dev.to Articles with Kiro AI IDE</title>
      <dc:creator>Gabriel Koo</dc:creator>
      <pubDate>Sat, 02 Aug 2025 03:04:21 +0000</pubDate>
      <link>https://dev.to/kirodotdev/building-github-style-contribution-grids-for-devto-articles-with-ai-3fpn</link>
      <guid>https://dev.to/kirodotdev/building-github-style-contribution-grids-for-devto-articles-with-ai-3fpn</guid>
      <description>&lt;p&gt;&lt;em&gt;How I created beautiful GitHub-style contribution graphs and traffic analytics for my dev.to articles using Kiro.dev - without writing much code myself&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcnti335gilug6n93fd9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcnti335gilug6n93fd9.png" alt=" " width="800" height="707"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Understanding Your Content's Impact
&lt;/h2&gt;

&lt;p&gt;As a technical writer publishing on dev.to, I found myself constantly wondering about my articles' performance. Sure, dev.to provides basic stats, but I wanted something more comprehensive - something that could show me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub-style contribution grids&lt;/strong&gt; for my writing activity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traffic source analysis&lt;/strong&gt; to understand where readers discover my content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Historical trends&lt;/strong&gt; to see how articles perform over time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Beautiful visualizations&lt;/strong&gt; I could embed anywhere&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Working frequently with SEO teams, I was particularly interested in understanding traffic sources. Were my articles getting discovered through Google searches? LinkedIn shares? Twitter threads? Newsletter mentions? This data would be crucial for optimizing my content distribution strategy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: GitHub-Style Analytics with Traffic Insights
&lt;/h2&gt;

&lt;p&gt;What started as a simple idea turned into a comprehensive analytics system that fetches data from dev.to's API and generates beautiful SVG visualizations. The system creates GitHub-style contribution grids and tracks:&lt;/p&gt;

&lt;h3&gt;
  
  
  📊 Core Metrics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Views&lt;/strong&gt;: Daily article view counts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reactions&lt;/strong&gt;: Likes, unicorns, and bookmarks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comments&lt;/strong&gt;: Reader engagement levels&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Combined Activity&lt;/strong&gt;: Weighted scoring system&lt;/li&gt;
&lt;/ul&gt;
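&lt;p&gt;The &lt;em&gt;combined activity&lt;/em&gt; score above can be sketched as a simple weighted sum. The weights below are illustrative assumptions, not the project's actual values:&lt;/p&gt;

```python
# Hypothetical weights for the combined activity score; the project's
# actual weighting may differ.
WEIGHTS = {"views": 1, "reactions": 5, "comments": 10}

def activity_score(views: int, reactions: int, comments: int) -> int:
    """Collapse one day's metrics into a single weighted activity value."""
    return (WEIGHTS["views"] * views
            + WEIGHTS["reactions"] * reactions
            + WEIGHTS["comments"] * comments)

print(activity_score(views=120, reactions=4, comments=2))  # 160
```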

&lt;h3&gt;
  
  
  🌐 Traffic Source Analysis
&lt;/h3&gt;

&lt;p&gt;The system tracks where your readers come from and visualizes it in a beautiful pie chart:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Direct traffic&lt;/strong&gt; (users typing your URL directly)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search engines&lt;/strong&gt; (Google, Bing, DuckDuckGo)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Social media&lt;/strong&gt; (LinkedIn, Twitter, Facebook)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer platforms&lt;/strong&gt; (GitHub, dev.to internal traffic)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Newsletters and blogs&lt;/strong&gt; (Substack, personal blogs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Professional tools&lt;/strong&gt; (Slack, Microsoft Office)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8cu4i8oc6msf1a0exy5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8cu4i8oc6msf1a0exy5.png" alt="Traffic sources pie chart showing distribution of 19,096 total views" width="800" height="638"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  📈 GitHub-Style Contribution Grids
&lt;/h3&gt;

&lt;p&gt;The system generates six types of beautiful SVG visualizations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Views Activity Grid&lt;/strong&gt; - GitHub-style contribution graph with green color scheme&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reactions Activity Grid&lt;/strong&gt; - Purple-themed grid showing engagement patterns
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Combined Activity Grid&lt;/strong&gt; - Orange-themed weighted activity visualization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top Articles by Views&lt;/strong&gt; - Horizontal bar chart of most-viewed content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top Articles by Reactions&lt;/strong&gt; - Bar chart highlighting most engaging articles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traffic Sources Pie Chart&lt;/strong&gt; - Visual breakdown of where your readers come from&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Magic: Built with Kiro.dev
&lt;/h2&gt;

&lt;p&gt;Here's where the story gets interesting - &lt;strong&gt;I didn't write most of this code myself&lt;/strong&gt;. The entire project was created using &lt;a href="https://kiro.dev/" rel="noopener noreferrer"&gt;Kiro.dev&lt;/a&gt;, an AI-powered development environment that acts as your coding partner.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Kiro.dev Transformed My Workflow
&lt;/h3&gt;

&lt;p&gt;Instead of spending weeks researching APIs, designing data structures, and debugging visualization code, I simply described what I wanted:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I want to create GitHub-style contribution grids for my dev.to articles, track traffic sources, and generate beautiful SVG visualizations I can embed anywhere."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Kiro.dev understood the requirements and generated the complete system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Data fetching&lt;/strong&gt; - &lt;code&gt;fetch_stats.py&lt;/code&gt; handles API integration with error handling and rate limiting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contribution grids&lt;/strong&gt; - &lt;code&gt;generate_advanced_graph.py&lt;/code&gt; creates GitHub-style activity visualizations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top articles charts&lt;/strong&gt; - &lt;code&gt;generate_top_articles.py&lt;/code&gt; generates ranking visualizations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traffic analysis&lt;/strong&gt; - &lt;code&gt;generate_traffic_pie_chart.py&lt;/code&gt; creates traffic source breakdowns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation&lt;/strong&gt; - &lt;code&gt;.github/workflows/&lt;/code&gt; sets up daily updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data management&lt;/strong&gt; - Smart incremental updates and referrer tracking&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Development Experience
&lt;/h3&gt;

&lt;p&gt;Working with Kiro.dev felt like pair programming with an expert developer who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Understood context&lt;/strong&gt; - Grasped the full project scope from minimal descriptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Made smart decisions&lt;/strong&gt; - Chose appropriate data structures and algorithms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wrote clean code&lt;/strong&gt; - Generated well-documented, maintainable Python scripts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anticipated needs&lt;/strong&gt; - Added features I hadn't even thought of, like referrer tracking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI didn't just write code - it architected a complete solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Data Architecture
&lt;/h3&gt;

&lt;p&gt;The system organizes data efficiently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data/
├── articles/           # Individual article analytics
│   └── {id}-{slug}.json
├── account.json        # Aggregated account stats
└── top_articles.json   # Rankings by metrics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each article file contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic metrics (views, comments, reactions)&lt;/li&gt;
&lt;li&gt;Daily breakdown of activity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Referrer data&lt;/strong&gt; showing traffic sources&lt;/li&gt;
&lt;li&gt;Metadata for URL construction&lt;/li&gt;
&lt;/ul&gt;
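&lt;p&gt;For illustration, a single article file might look like the sketch below. The field names are assumptions based on the description above, not the project's exact schema:&lt;/p&gt;

```python
import json

# Hypothetical contents of a data/articles/{id}-{slug}.json file; the
# real field names in the project may differ.
article = {
    "id": 123456,
    "slug": "my-article",
    "url": "https://dev.to/user/my-article",
    "totals": {"views": 9021, "reactions": 310, "comments": 42},
    "daily": {
        "2026-03-01": {"views": 87, "reactions": 3, "comments": 1},
    },
    "referrers": {"google.com": 450, "t.co": 120, "direct": 900},
}

# Round-trip through JSON, as the fetcher would when persisting to disk.
restored = json.loads(json.dumps(article))
print(restored["totals"]["views"])  # 9021
```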

&lt;h3&gt;
  
  
  API Integration
&lt;/h3&gt;

&lt;p&gt;The system uses three dev.to API endpoints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/analytics/historical&lt;/code&gt; - Historical analytics data&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/analytics/referrers&lt;/code&gt; - Traffic source information&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/articles/me/published&lt;/code&gt; - Article listings&lt;/li&gt;
&lt;/ul&gt;
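&lt;p&gt;A minimal client for those endpoints could look like this. The &lt;code&gt;api-key&lt;/code&gt; header and the example parameter names are my assumptions about the dev.to (Forem) API, so check the official API docs before relying on them:&lt;/p&gt;

```python
import json
import os
import urllib.parse
import urllib.request

API_BASE = "https://dev.to/api"

def build_url(endpoint: str, **params) -> str:
    """Compose a request URL for one of the endpoints listed above."""
    query = urllib.parse.urlencode(params)
    return API_BASE + endpoint + ("?" + query if query else "")

def fetch(endpoint: str, **params):
    """GET an endpoint, authenticating with the api-key header."""
    req = urllib.request.Request(
        build_url(endpoint, **params),
        headers={"api-key": os.environ["DEVTO_API_KEY"]},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example calls (parameter names are illustrative):
# fetch("/articles/me/published", per_page=100)
# fetch("/analytics/referrers", article_id=123456)
```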

&lt;h3&gt;
  
  
  Smart Updates
&lt;/h3&gt;

&lt;p&gt;The fetcher implements intelligent incremental updates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only fetches new data since last update&lt;/li&gt;
&lt;li&gt;Refreshes the second-to-last day to catch delayed analytics&lt;/li&gt;
&lt;li&gt;Handles API rate limits gracefully&lt;/li&gt;
&lt;li&gt;Maintains data integrity with backup strategies&lt;/li&gt;
&lt;/ul&gt;
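&lt;p&gt;A simplified reading of that update window: fetch everything after the last stored day, and re-fetch one earlier day to pick up delayed analytics. A sketch:&lt;/p&gt;

```python
from datetime import date, timedelta

def dates_to_fetch(last_fetched: date, today: date) -> list:
    """Days to (re)fetch: one day before the last stored day, through today."""
    start = last_fetched - timedelta(days=1)  # re-fetch to catch delayed stats
    n_days = (today - start).days
    return [start + timedelta(days=i) for i in range(n_days + 1)]

days = dates_to_fetch(date(2026, 3, 20), date(2026, 3, 22))
print([d.isoformat() for d in days])
# ['2026-03-19', '2026-03-20', '2026-03-21', '2026-03-22']
```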

&lt;h3&gt;
  
  
  Visualization Engine
&lt;/h3&gt;

&lt;p&gt;The SVG generation system creates publication-ready graphics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Responsive design&lt;/strong&gt; that works at any size&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive tooltips&lt;/strong&gt; with detailed information&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clickable elements&lt;/strong&gt; linking to original articles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple color schemes&lt;/strong&gt; for different contexts&lt;/li&gt;
&lt;/ul&gt;
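&lt;p&gt;At its core, a contribution grid maps each day's count to a color bucket and emits one SVG rect per day. A minimal sketch, with an illustrative palette and bucketing rule (not the project's exact values):&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

# GitHub-like green palette; colors and thresholds are illustrative.
GREENS = ["#ebedf0", "#9be9a8", "#40c463", "#30a14e", "#216e39"]

def bucket(count: int, max_count: int) -> int:
    """0 = empty cell; 1-4 = intensity relative to the max observed count."""
    if count == 0 or max_count == 0:
        return 0
    return min(4, 1 + (4 * (count - 1)) // max_count)

def cell(x: int, y: int, count: int, max_count: int) -> str:
    """One day's square in the grid, serialized as an SVG rect element."""
    rect = ET.Element("rect", {
        "x": str(x), "y": str(y), "width": "10", "height": "10",
        "rx": "2", "fill": GREENS[bucket(count, max_count)],
    })
    return ET.tostring(rect, encoding="unicode")

print(cell(0, 0, 20, 20))
```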

&lt;h2&gt;
  
  
  Key Insights from Traffic Analysis
&lt;/h2&gt;

&lt;p&gt;After implementing this system, I discovered fascinating patterns in my content's reach:&lt;/p&gt;

&lt;h3&gt;
  
  
  Traffic Source Breakdown
&lt;/h3&gt;

&lt;p&gt;From my 19,096 total views, the top 10 referrers account for 14,974 views (78.4%):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Direct traffic (41.6%)&lt;/strong&gt; - 7,939 views from users directly accessing articles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google.com (24.3%)&lt;/strong&gt; - 4,639 views from organic search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;t.co (3.2%)&lt;/strong&gt; - 619 views from Twitter links&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn.com (3.0%)&lt;/strong&gt; - 582 views from professional network shares&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;dev.to (2.4%)&lt;/strong&gt; - 459 views from platform internal discovery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;office.net (0.9%)&lt;/strong&gt; - 172 views from Microsoft Office applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;duckduckgo.com (0.9%)&lt;/strong&gt; - 172 views from privacy-focused search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;bing.com (0.8%)&lt;/strong&gt; - 158 views from Microsoft search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;facebook.com (0.6%)&lt;/strong&gt; - 121 views from social media shares&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;mrugalski.pl (0.6%)&lt;/strong&gt; - 113 views from a personal blog referral&lt;/li&gt;
&lt;/ol&gt;
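&lt;p&gt;The percentages above follow directly from the raw counts, which is easy to sanity-check:&lt;/p&gt;

```python
# Recomputing the shares quoted above from the raw view counts.
total_views = 19096
top_referrers = {
    "direct": 7939, "google.com": 4639, "t.co": 619, "linkedin.com": 582,
    "dev.to": 459, "office.net": 172, "duckduckgo.com": 172,
    "bing.com": 158, "facebook.com": 121, "mrugalski.pl": 113,
}

top_total = sum(top_referrers.values())
print(top_total)                                              # 14974
print(round(100 * top_total / total_views, 1))                # 78.4
print(round(100 * top_referrers["direct"] / total_views, 1))  # 41.6
```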

&lt;h3&gt;
  
  
  SEO Team Goldmine
&lt;/h3&gt;

&lt;p&gt;This data proved invaluable for SEO strategy discussions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Search dominance&lt;/strong&gt; - Google alone drives 24.3% of traffic, with DuckDuckGo (0.9%) and Bing (0.8%) adding to the search total&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Social amplification&lt;/strong&gt; - Twitter (3.2%) and LinkedIn (3.0%) show strong professional sharing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform dynamics&lt;/strong&gt; - Only 2.4% comes from dev.to's internal discovery, showing external reach&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct engagement&lt;/strong&gt; - 41.6% direct traffic indicates strong brand recognition and bookmarking&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Content Performance Patterns
&lt;/h3&gt;

&lt;p&gt;The top-performing article "I bought us-east-1.com" generated 9,021 views with fascinating traffic patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;71% direct traffic&lt;/strong&gt; - Indicating strong word-of-mouth sharing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;6.6% Twitter traffic&lt;/strong&gt; - Viral social media spread&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5.7% LinkedIn traffic&lt;/strong&gt; - Professional network engagement&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Showcasing Results: Raw GitHub Content Integration
&lt;/h2&gt;

&lt;p&gt;One of the most powerful features is the ability to embed these visualizations anywhere using GitHub's raw content URLs. Here's how:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Generate Your Visualizations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# GitHub-style contribution grids&lt;/span&gt;
python3 generate_advanced_graph.py &lt;span class="nt"&gt;--metric&lt;/span&gt; views &lt;span class="nt"&gt;--color&lt;/span&gt; github
python3 generate_advanced_graph.py &lt;span class="nt"&gt;--metric&lt;/span&gt; reactions &lt;span class="nt"&gt;--color&lt;/span&gt; purple

&lt;span class="c"&gt;# Top articles charts  &lt;/span&gt;
python3 generate_top_articles.py &lt;span class="nt"&gt;--metric&lt;/span&gt; reactions &lt;span class="nt"&gt;--count&lt;/span&gt; 3

&lt;span class="c"&gt;# Traffic sources analysis&lt;/span&gt;
python3 generate_traffic_pie_chart.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Embed in README Files
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="p"&gt;![&lt;/span&gt;&lt;span class="nv"&gt;Views Activity&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://raw.githubusercontent.com/yourusername/yourrepo/main/graphs/devto_views_graph.svg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;![&lt;/span&gt;&lt;span class="nv"&gt;Top Articles&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://raw.githubusercontent.com/yourusername/yourrepo/main/graphs/top_3_reactions.svg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;![&lt;/span&gt;&lt;span class="nv"&gt;Traffic Sources&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://raw.githubusercontent.com/yourusername/yourrepo/main/graphs/traffic_sources_pie.svg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Use in Documentation
&lt;/h3&gt;

&lt;p&gt;The SVGs work perfectly in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub README files&lt;/strong&gt; - Showcase your writing activity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portfolio websites&lt;/strong&gt; - Demonstrate content creation consistency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blog posts&lt;/strong&gt; - Visual proof of engagement metrics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Social media&lt;/strong&gt; - Eye-catching performance summaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskmwik785v31ggx0ecul.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskmwik785v31ggx0ecul.png" alt="README file showing embedded SVG graphs" width="800" height="1070"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Pro Tips for Display
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Repository structure&lt;/strong&gt; - Keep graphs in a dedicated &lt;code&gt;/graphs&lt;/code&gt; folder&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Naming convention&lt;/strong&gt; - Use descriptive filenames like &lt;code&gt;devto_views_graph.svg&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update automation&lt;/strong&gt; - GitHub Actions ensures graphs stay current&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple formats&lt;/strong&gt; - Generate different metrics and color schemes for various contexts&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Automation Layer
&lt;/h2&gt;

&lt;p&gt;GitHub Actions keeps everything current with daily updates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Runs daily at midnight UTC&lt;/span&gt;
&lt;span class="c1"&gt;# Fetches latest analytics&lt;/span&gt;
&lt;span class="c1"&gt;# Generates updated visualizations&lt;/span&gt;
&lt;span class="c1"&gt;# Commits changes automatically&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
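&lt;p&gt;In full, the workflow might look something like this. Step names, paths, and the secret name are illustrative; the repository's actual workflow file may differ:&lt;/p&gt;

```yaml
# Illustrative GitHub Actions workflow, not the repo's exact file.
name: Update dev.to stats
on:
  schedule:
    - cron: "0 0 * * *"   # daily at midnight UTC
  workflow_dispatch: {}    # allow manual runs
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Fetch latest analytics
        run: python3 fetch_stats.py
        env:
          DEVTO_API_KEY: ${{ secrets.DEVTO_API_KEY }}
      - name: Regenerate visualizations
        run: python3 generate_advanced_graph.py --metric views --color github
      - name: Commit changes
        run: |
          git config user.name "github-actions"
          git config user.email "actions@users.noreply.github.com"
          git add data/ graphs/
          git commit -m "chore: update stats" || echo "No changes"
          git push
```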



&lt;p&gt;This means your analytics dashboard stays fresh without any manual intervention.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. AI-Assisted Development is Transformative
&lt;/h3&gt;

&lt;p&gt;Kiro.dev didn't just speed up development - it elevated the entire solution. The AI suggested features and optimizations I wouldn't have considered, like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logarithmic scaling for better visualization distribution&lt;/li&gt;
&lt;li&gt;Referrer data aggregation for traffic source analysis&lt;/li&gt;
&lt;li&gt;Incremental update strategies for API efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Data Visualization Drives Insights
&lt;/h3&gt;

&lt;p&gt;Having visual representations of my writing activity revealed patterns invisible in raw numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consistency gaps in publishing schedule&lt;/li&gt;
&lt;li&gt;Correlation between article topics and engagement&lt;/li&gt;
&lt;li&gt;Traffic source diversity indicating content reach&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. SEO Integration is Crucial
&lt;/h3&gt;

&lt;p&gt;The referrer tracking proved invaluable for SEO discussions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identifying high-performing traffic sources&lt;/li&gt;
&lt;li&gt;Understanding content discovery patterns&lt;/li&gt;
&lt;li&gt;Optimizing distribution strategies&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Want to build your own analytics dashboard? The complete system is available as an open-source project. Here's how to get started:&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Setup
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Fork the repository - 
&lt;strong&gt;&lt;a href="https://github.com/gabrielkoo/devto-stats-github-action" rel="noopener noreferrer"&gt;📊 devto-stats-github-action&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Add your dev.to API key to GitHub Actions Secrets&lt;/li&gt;
&lt;li&gt;Enable GitHub Actions for automatic updates&lt;/li&gt;
&lt;li&gt;Customize visualizations to match your brand&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Customization Options
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Color schemes&lt;/strong&gt; - GitHub green, purple, blue, or orange themes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics focus&lt;/strong&gt; - Emphasize views, reactions, or combined activity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time ranges&lt;/strong&gt; - Adjust historical data collection periods&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph types&lt;/strong&gt; - Modify visualization styles and layouts&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building this analytics dashboard taught me that the future of development isn't about replacing developers - it's about amplifying our capabilities. With Kiro.dev as my coding partner, I created a sophisticated system that would have taken weeks to build manually.&lt;/p&gt;

&lt;p&gt;The insights gained from traffic source analysis have already improved my content strategy, and the beautiful visualizations provide compelling evidence of writing consistency and engagement.&lt;/p&gt;

&lt;p&gt;Most importantly, this project demonstrates how AI-assisted development can help creators focus on what matters most - creating great content - while still building the tools needed to understand and optimize their impact.&lt;/p&gt;

&lt;p&gt;Whether you're a technical writer, developer advocate, or content creator, having detailed analytics about your work's reach and impact is invaluable. And with tools like Kiro.dev, building these systems is more accessible than ever.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Ready to build your own analytics dashboard? Check out the complete source code and start tracking your content's impact today.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/gabrielkoo/devto-stats-github-action" rel="noopener noreferrer"&gt;📊 devto-stats-github-action&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcaprspsujgieu9vx7tuo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcaprspsujgieu9vx7tuo.png" alt="Sample" width="800" height="564"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devto</category>
      <category>kirodev</category>
      <category>seo</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Why I Built a Web UI for Amazon Q Developer CLI (And How I Vibe-Coded It)</title>
      <dc:creator>Gabriel Koo</dc:creator>
      <pubDate>Thu, 26 Jun 2025 00:52:11 +0000</pubDate>
      <link>https://dev.to/aws-builders/why-i-built-a-web-ui-for-amazon-q-developer-cli-and-how-i-vibe-coded-it-54d6</link>
      <guid>https://dev.to/aws-builders/why-i-built-a-web-ui-for-amazon-q-developer-cli-and-how-i-vibe-coded-it-54d6</guid>
      <description>&lt;h2&gt;
  
  
  🚀 &lt;strong&gt;Beyond the Console: Why I Built a Web UI for the Amazon Q Developer CLI&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/q/" rel="noopener noreferrer"&gt;Amazon Q&lt;/a&gt; is a &lt;strong&gt;game-changer for developers&lt;/strong&gt;, but if you've only used it inside the &lt;strong&gt;AWS Console&lt;/strong&gt;, you're only seeing part of the picture. Using Q in the Console is great for general AWS questions, but when it comes to &lt;strong&gt;interacting with your own code and environment&lt;/strong&gt;, it hits fundamental limits.&lt;/p&gt;

&lt;p&gt;My project started not from a corporate need, but from a personal one familiar to many developers: the &lt;strong&gt;homelab&lt;/strong&gt;. I wanted a way to make quick fixes on my server from anywhere, but was constantly frustrated by the &lt;strong&gt;clunky experience of mobile terminals&lt;/strong&gt;. This is the story of why I built the &lt;strong&gt;Amazon Q Developer CLI WebUI&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  📱 &lt;strong&gt;The Real Problem: The Mobile Terminal Nightmare&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Picture this: you're on the couch with your phone and have an idea for a quick change on your homelab server. You log in through a mobile web terminal and immediately hit a wall. Have you ever tried to &lt;strong&gt;paste a multiline command&lt;/strong&gt; into one? Or &lt;strong&gt;send a &lt;code&gt;Ctrl+C&lt;/code&gt; to stop a runaway process&lt;/strong&gt;? It's a nightmare of awkward UIs, missed keystrokes, and immense frustration.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧱 &lt;strong&gt;Example 1&lt;/strong&gt;: The Multi-Line &lt;code&gt;docker run&lt;/code&gt; Command
&lt;/h3&gt;

&lt;p&gt;A common task in a homelab is running a new Docker container with specific ports and volumes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-app &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:80 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;/data:/app/data &lt;span class="se"&gt;\&lt;/span&gt;
  nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Mobile Frustrations&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Line Breaks&lt;/strong&gt;: Typing a backslash (&lt;code&gt;\&lt;/code&gt;) then hitting Enter is clumsy and error-prone on mobile keyboards.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Special Characters&lt;/strong&gt;: Juggling &lt;code&gt;-&lt;/code&gt;, &lt;code&gt;:&lt;/code&gt;, &lt;code&gt;=&lt;/code&gt;, and &lt;code&gt;$&lt;/code&gt; requires constant switching between keyboard symbol layouts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Editing&lt;/strong&gt;: Fixing a typo in the middle of this command on a touchscreen is &lt;strong&gt;incredibly difficult&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🔍 &lt;strong&gt;Example 2&lt;/strong&gt;: The Piped &lt;code&gt;find&lt;/code&gt; and &lt;code&gt;grep&lt;/code&gt; Command
&lt;/h3&gt;

&lt;p&gt;You need to search for a specific configuration line within all &lt;code&gt;.conf&lt;/code&gt; files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;find &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s2"&gt;"*.conf"&lt;/span&gt; | xargs &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"listen_port"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Mobile Frustrations&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Keyboard Gymnastics&lt;/strong&gt;: This short command is packed with hard-to-access symbols on mobile keyboards: &lt;code&gt;.&lt;/code&gt;, &lt;code&gt;*&lt;/code&gt;, &lt;code&gt;"&lt;/code&gt;, and especially &lt;code&gt;|&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Pipe (&lt;code&gt;|&lt;/code&gt;)&lt;/strong&gt;: Often deeply hidden, requiring multiple taps to find and type.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No &lt;code&gt;Ctrl&lt;/code&gt; Key&lt;/strong&gt;: If a command hangs, the lack of an easy &lt;code&gt;Ctrl+C&lt;/code&gt; makes stopping it a nightmare.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This frustration becomes a &lt;strong&gt;major barrier&lt;/strong&gt; when using powerful tools like the &lt;strong&gt;Amazon Q Developer CLI&lt;/strong&gt;. The CLI is the perfect assistant for the job—but it's &lt;strong&gt;shackled to a terminal&lt;/strong&gt; that mobile browsers just can't handle.&lt;/p&gt;

&lt;h2&gt;
  
  
  🤯 Other Reasons for Building This Project
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;You cannot install the great Amazon Q Developer CLI on &lt;strong&gt;Windows&lt;/strong&gt; 🪟 unless you set up Windows Subsystem for Linux (WSL).&lt;/li&gt;
&lt;li&gt;Other AI agent alternatives usually fall into one of the cases below:

&lt;ul&gt;
&lt;li&gt;Pure server-side AI agents - like Gemini/ChatGPT, which have access to tools/canvas and can even run code for you, but are not connected to your computer or servers&lt;/li&gt;
&lt;li&gt;CLI AI agents - like Codex CLI, which require a bring-your-own-key setup, so you have to watch your context length carefully and worry about token usage&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  🤖 &lt;strong&gt;Q Developer CLI: An Agent with Better Tools&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Think of the &lt;a href="https://github.com/aws/amazon-q-developer-cli" rel="noopener noreferrer"&gt;Amazon Q Developer CLI&lt;/a&gt; as a &lt;strong&gt;specialized AI agent&lt;/strong&gt; with more powerful "tools" than the AWS Console version. Its biggest advantage? &lt;strong&gt;File system access&lt;/strong&gt;, giving it the &lt;strong&gt;context it needs to be a true coding partner&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Also look at how I vibe coded a Star Wars inspired lightsaber duel game: &lt;a href="https://dev.to/aws-builders/building-a-plasma-sword-fighter-game-with-amazon-q-cli-279g"&gt;Building A Plasma Sword Fighter Game with Amazon Q CLI&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🔍 &lt;strong&gt;Access Comparison&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Q in AWS Console&lt;/th&gt;
&lt;th&gt;Amazon Q Developer CLI&lt;/th&gt;
&lt;th&gt;Amazon Q CLI via WebUI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary Use Case&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;General AWS service Q&amp;amp;A&lt;/td&gt;
&lt;td&gt;Deep, contextual code development&lt;/td&gt;
&lt;td&gt;Contextual development on any device&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code Awareness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ No File System Access&lt;/td&gt;
&lt;td&gt;✅ Full Project Context&lt;/td&gt;
&lt;td&gt;✅ Full Project Context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Interaction Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GUI Chat Window&lt;/td&gt;
&lt;td&gt;Command-Line Interface&lt;/td&gt;
&lt;td&gt;Browser-Based Terminal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accessibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Requires AWS login&lt;/td&gt;
&lt;td&gt;Requires desktop terminal&lt;/td&gt;
&lt;td&gt;Access from any browser&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mobile Usability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Serviceable UI&lt;/td&gt;
&lt;td&gt;❌ Very Difficult&lt;/td&gt;
&lt;td&gt;✅ Designed for Web/Mobile&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The table shows the gap: the &lt;strong&gt;most powerful version&lt;/strong&gt; (CLI) is the &lt;strong&gt;least accessible&lt;/strong&gt;—unless we bridge that gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  🛠️ &lt;strong&gt;The Solution: A Self-Hosted Web Interface for the Q CLI&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwyz4n5083wa83h1knhse.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwyz4n5083wa83h1knhse.jpg" alt=" " width="800" height="921"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See it in action: &lt;a href="https://github.com/user-attachments/assets/99053791-17c5-4f09-bddb-d5b9ecd61cc0" rel="noopener noreferrer"&gt;https://github.com/user-attachments/assets/99053791-17c5-4f09-bddb-d5b9ecd61cc0&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My vibe-coded solution: &lt;strong&gt;Amazon Q Developer CLI WebUI&lt;/strong&gt;—a bridge to the full CLI that runs in &lt;strong&gt;any modern browser&lt;/strong&gt;. It wraps around the native &lt;code&gt;q&lt;/code&gt; command, exposing its full feature set without needing a traditional terminal.&lt;/p&gt;

&lt;p&gt;I started by telling the Amazon Q Developer CLI &lt;em&gt;"I want to re-create the Amazon Q Developer CLI experience in a web UI"&lt;/em&gt;. Q came up with the plan and the initial tech stack, and after a few rounds of iteration and testing, the final solution emerged in less than an hour:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backend: &lt;code&gt;Node.js&lt;/code&gt;, &lt;code&gt;Express&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Real-Time Communication: &lt;code&gt;socket.io&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Terminal Emulation: &lt;code&gt;node-pty&lt;/code&gt; for spawning a real &lt;code&gt;pty&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
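&lt;p&gt;The key idea - spawning the CLI inside a real pseudo-terminal and streaming its bytes elsewhere - can be sketched with Python's &lt;code&gt;pty&lt;/code&gt; module (a Unix-only sketch; the project itself does this in Node.js with &lt;code&gt;node-pty&lt;/code&gt; and &lt;code&gt;socket.io&lt;/code&gt;):&lt;/p&gt;

```python
import os
import pty
import subprocess

# Allocate a pseudo-terminal pair so the child believes it is attached
# to a real terminal (colors, prompts, and Ctrl+C handling all work).
master, slave = pty.openpty()
proc = subprocess.Popen(
    ["echo", "hello from a pty"],  # stand-in for the real `q chat` process
    stdin=slave, stdout=slave, stderr=slave, close_fds=True,
)
os.close(slave)

# In the WebUI these bytes would be pushed to the browser over a WebSocket,
# and keystrokes from the browser would be written back to `master`.
output = os.read(master, 1024)
proc.wait()
os.close(master)
print(output.decode(errors="replace").strip())
```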

&lt;p&gt;The result? A &lt;strong&gt;fluid, interactive CLI experience&lt;/strong&gt; from a browser—even on mobile.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 &lt;strong&gt;Getting Started&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Make sure you have &lt;a href="https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/command-line-installing.html" rel="noopener noreferrer"&gt;&lt;code&gt;Amazon Q Developer CLI&lt;/code&gt; installed&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone &lt;span class="nt"&gt;--depth&lt;/span&gt; 1 https://github.com/gabrielkoo/amazon-q-developer-cli-webui
&lt;span class="nb"&gt;cd &lt;/span&gt;amazon-q-developer-cli-webui
npm &lt;span class="nb"&gt;install
&lt;/span&gt;q login &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm start  &lt;span class="c"&gt;# --host 0.0.0.0&lt;/span&gt;
&lt;span class="c"&gt;#&lt;/span&gt;
&lt;span class="c"&gt;# &amp;gt; amazon-q-cli-webui@0.0.1 start&lt;/span&gt;
&lt;span class="c"&gt;# &amp;gt; node server.js&lt;/span&gt;
&lt;span class="c"&gt;#&lt;/span&gt;
&lt;span class="c"&gt;# Server running on http://localhost:3000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🌐 &lt;strong&gt;Making Web UI Available On Your Mobile&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Use any of the typical homelab methods to expose &lt;code&gt;localhost:3000&lt;/code&gt; online:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🌍 &lt;strong&gt;Tailscale&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;☁️ &lt;strong&gt;Cloudflare Tunnel&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🚪 &lt;strong&gt;ngrok&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🔐 &lt;strong&gt;Open port 3000 + IP Whitelisting&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first two options are preferred if security is a priority, since they avoid exposing an open port directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  📲 &lt;strong&gt;Unlocking a Truly Mobile Workflow&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let’s revisit the homelab scenario. You’re reviewing a Python script that needs a fix.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Before&lt;/strong&gt;: You’d struggle with a mobile SSH app, fighting to type commands.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After&lt;/strong&gt;: You just open a browser tab to your WebUI. Type: &lt;em&gt;"Review &lt;code&gt;fix_script.py&lt;/code&gt;, identify bugs, and suggest improvements."&lt;/em&gt; Copy, paste, and interact—&lt;strong&gt;touch-friendly and frustration-free&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc45nhndm23zkdrravnbb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc45nhndm23zkdrravnbb.png" alt=" " width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwnlsjwqj812vlmpzs9zr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwnlsjwqj812vlmpzs9zr.png" alt=" " width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuiehysd2feowmczdkigx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuiehysd2feowmczdkigx.png" alt=" " width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Live demo: &lt;a href="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2z9oxc4bm8waeor2nriy.gif" rel="noopener noreferrer"&gt;https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2z9oxc4bm8waeor2nriy.gif&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  ✨ &lt;strong&gt;More Than a Wrapper: An Enhanced User Experience&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;I focused on &lt;strong&gt;authenticity and comfort&lt;/strong&gt;, solving the mobile pain points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📝 &lt;strong&gt;Proper Input&lt;/strong&gt;: A real multiline &lt;code&gt;&amp;lt;textarea&amp;gt;&lt;/code&gt; makes command editing natural.&lt;/li&gt;
&lt;li&gt;💬 &lt;strong&gt;Streaming Token Display&lt;/strong&gt;: Outputs appear with a "typing" effect, just like the CLI.&lt;/li&gt;
&lt;li&gt;🎨 &lt;strong&gt;Full ANSI Support&lt;/strong&gt;: Colors, bolding, and formatting are preserved for readability.&lt;/li&gt;
&lt;li&gt;🤖 &lt;strong&gt;Let Q Work for You&lt;/strong&gt;: Don’t sweat every keystroke—&lt;strong&gt;Amazon Q can fix and run code for you&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project was born from a simple need: &lt;strong&gt;use my favorite tools anywhere&lt;/strong&gt;. It’s about &lt;strong&gt;unlocking the full power of Amazon Q&lt;/strong&gt;, even far from a real keyboard.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;Check it out here&lt;/strong&gt;: &lt;a href="https://github.com/gabrielkoo/amazon-q-developer-cli-webui" rel="noopener noreferrer"&gt;https://github.com/gabrielkoo/amazon-q-developer-cli-webui&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>vibecoding</category>
      <category>devex</category>
    </item>
    <item>
      <title>Building A Plasma Sword Fighter Game with Amazon Q CLI</title>
      <dc:creator>Gabriel Koo</dc:creator>
      <pubDate>Wed, 18 Jun 2025 15:35:30 +0000</pubDate>
      <link>https://dev.to/aws-builders/building-a-plasma-sword-fighter-game-with-amazon-q-cli-279g</link>
      <guid>https://dev.to/aws-builders/building-a-plasma-sword-fighter-game-with-amazon-q-cli-279g</guid>
      <description>&lt;p&gt;As a &lt;strong&gt;DevSecOps engineer&lt;/strong&gt;, my daily grind usually involves &lt;strong&gt;CI/CD pipelines&lt;/strong&gt;, &lt;strong&gt;security audits&lt;/strong&gt;, and &lt;strong&gt;infrastructure as code&lt;/strong&gt;. So, when the "Build Games with Amazon Q CLI" campaign popped up, it was a refreshing detour from the usual. The idea of conjuring a game with just &lt;strong&gt;conversational prompts&lt;/strong&gt;, powered by &lt;strong&gt;Amazon Q CLI's Claude 4 large language model&lt;/strong&gt;, was too intriguing to pass up. This isn't the usual realm of an "enthusiast" for me, but more of an exploration into how &lt;strong&gt;AI can augment a developer's toolkit&lt;/strong&gt;, even outside their primary domain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Game: A "Space Trilogy" Inspired Plasma Sword Fighter ✨
&lt;/h2&gt;

&lt;p&gt;My concept for the game was heavily inspired by the classic &lt;strong&gt;"Space Trilogy" narratives&lt;/strong&gt; (you know the ones 😉), where the good guys wield &lt;strong&gt;blue "light swords"&lt;/strong&gt; and the antagonists opt for menacing &lt;strong&gt;red ones&lt;/strong&gt;. I wanted to capture that classic duel vibe, with players having &lt;strong&gt;supernatural "force push" abilities&lt;/strong&gt; to add another layer to the combat. The result is a &lt;strong&gt;"Plasma Sword Fighter" game&lt;/strong&gt; – a &lt;strong&gt;two-player combat experience&lt;/strong&gt; featuring &lt;strong&gt;real-time sword mechanics&lt;/strong&gt; and &lt;strong&gt;tactical force pushes&lt;/strong&gt;. It's designed to be &lt;strong&gt;intuitive, visually engaging&lt;/strong&gt;, and, importantly, &lt;strong&gt;free from copyright entanglements&lt;/strong&gt; by using generic names.&lt;/p&gt;

&lt;h2&gt;
  
  
  Effective Prompting: Speaking the AI's Language 🗣️
&lt;/h2&gt;

&lt;p&gt;My interaction with Amazon Q CLI was a rapid learning curve in &lt;strong&gt;effective prompting&lt;/strong&gt;. Here’s what I found worked best:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context is King&lt;/strong&gt;: I started by setting the scene: "write a pygame on a streetfight style of star war light sword game, but don't use real names to avoid copyright." This broad stroke gave the AI its initial direction.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1c9nrj4sfohctkqhmqa4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1c9nrj4sfohctkqhmqa4.png" alt="Initial Prompt" width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feature by Feature&lt;/strong&gt;: Instead of overwhelming the AI with one massive request, I let it design an initial version first, then followed up with bug reports and feature requests. This allowed the AI to build the game incrementally.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l3xgvoxo6o2yjz5mhnm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l3xgvoxo6o2yjz5mhnm.png" alt="Initial Features" width="800" height="586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Leveraging Error Messages&lt;/strong&gt;: When things inevitably went sideways (as they do in development 🐛), I found Amazon Q CLI doing a good job on &lt;strong&gt;auto-identifying the errors&lt;/strong&gt; from the command outputs in its initial runs, and it was able to &lt;strong&gt;resolve the errors by itself&lt;/strong&gt; without my intervention.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbqx43uw9aot8a82fvyb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbqx43uw9aot8a82fvyb.png" alt="Auto resolved package error" width="800" height="982"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Refining Game Logic&lt;/strong&gt;: One of the more nuanced challenges was ensuring &lt;strong&gt;continuous damage&lt;/strong&gt; when an opponent remained in the sword's active area. My prompt, &lt;em&gt;"now there's a problem, the opponent's HP doesn't decrease for a 2nd time if the opponent stayed in the attack area of the plasma sword"&lt;/em&gt;, drew the reply &lt;em&gt;"You're right! The issue is that the combat detection only triggers once per attack due to the &lt;code&gt;last_hit_time&lt;/code&gt; check. When a player holds down the attack button and the opponent stays in range, it should continue dealing damage. Let me fix this:"&lt;/em&gt; and guided Amazon Q CLI to implement &lt;strong&gt;time-based hit detection&lt;/strong&gt;, allowing sustained damage while preventing hit-spamming with &lt;strong&gt;invulnerability frames&lt;/strong&gt;.&lt;/p&gt;
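
&lt;p&gt;The fix boils down to a simple pattern: record when each fighter was last hit, and only register a new hit once a short invulnerability window has elapsed. A minimal sketch of that pattern (the class name and cooldown value are hypothetical, not taken verbatim from the generated game):&lt;/p&gt;

```python
import time

HIT_COOLDOWN = 0.5  # seconds of invulnerability after a hit (illustrative value)

class Fighter:
    def __init__(self, hp=100):
        self.hp = hp
        self.last_hit_time = 0.0

    def take_hit(self, damage, now=None):
        """Apply damage only if the invulnerability window has elapsed."""
        now = time.monotonic() if now is None else now
        if now - self.last_hit_time >= HIT_COOLDOWN:
            self.hp = max(0, self.hp - damage)
            self.last_hit_time = now
            return True   # hit landed
        return False      # still invulnerable, no damage
```

&lt;p&gt;Calling &lt;code&gt;take_hit&lt;/code&gt; every frame while the opponent stays inside the sword's area then yields sustained damage at a controlled rate, rather than a single hit per attack.&lt;/p&gt;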

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhi4tpj6zyro2xm96ssnv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhi4tpj6zyro2xm96ssnv.png" alt="Fixing the " width="800" height="726"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AI as a Development Accelerator / Quick Prototype Generator ⚡
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Amazon Q CLI&lt;/strong&gt;, recently powered by &lt;strong&gt;Claude 4&lt;/strong&gt;, proved to be an &lt;strong&gt;invaluable development partner&lt;/strong&gt;. It automated much of the heavy lifting, significantly reducing my &lt;strong&gt;development time&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Boilerplate Generation&lt;/strong&gt;: The initial &lt;code&gt;pygame&lt;/code&gt; setup, including window creation, basic event loops, and constant definitions, was generated almost instantly. This freed me from the mundane setup tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core Game Mechanics&lt;/strong&gt;: From player movement and sword activation to force push mechanics and health management, the AI took my &lt;strong&gt;high-level descriptions&lt;/strong&gt; and translated them into &lt;strong&gt;functional code&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Debugging&lt;/strong&gt;: The AI's ability to not only identify errors but also &lt;strong&gt;suggest and implement fixes&lt;/strong&gt;, like installing missing libraries or correcting logical flaws in combat detection, was a &lt;strong&gt;major time-saver&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iterative Refinement&lt;/strong&gt;: The &lt;strong&gt;back-and-forth process of prompting, testing, and refining&lt;/strong&gt; allowed for quick iterations and continuous improvement of the game's mechanics.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  About the Code 💻
&lt;/h2&gt;

&lt;p&gt;The Python code for the "Plasma Sword Fighter" game is straightforward and relies solely on the &lt;strong&gt;&lt;code&gt;pygame&lt;/code&gt; library&lt;/strong&gt;. While some of the combat and AI logic might appear "raw" to a seasoned game developer, offering room for more sophisticated refactoring (e.g., using state machines for AI), the current structure is &lt;strong&gt;remarkably readable&lt;/strong&gt;. This clarity is a testament to the AI's ability to produce &lt;strong&gt;understandable code&lt;/strong&gt;, even when generating complex interactions.&lt;/p&gt;

&lt;p&gt;The full code is hosted on GitHub here: &lt;a href="https://github.com/gabrielkoo/amazonq-plasma-sword-fighter-game/" rel="noopener noreferrer"&gt;https://github.com/gabrielkoo/amazonq-plasma-sword-fighter-game/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Screenshots and Gameplay 🎮
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6eih31o2u6onogy0nk7r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6eih31o2u6onogy0nk7r.png" alt=" " width="800" height="526"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5b6tjgotzhclogqukcq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5b6tjgotzhclogqukcq.png" alt=" " width="800" height="554"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy2waqf2tg91ryc5uujl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy2waqf2tg91ryc5uujl.png" alt=" " width="800" height="554"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6oboqfe1wz9wslm24uld.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6oboqfe1wz9wslm24uld.png" alt=" " width="800" height="554"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are some snapshots from the "Plasma Sword Fighter" battles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ready for Battle&lt;/strong&gt;: The game's initial screen, featuring two fighters against a cosmic backdrop, their health bars poised for action.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mid-Combat&lt;/strong&gt;: A dynamic shot showing the glowing plasma swords in action, with players engaged in a fierce duel.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Game Features:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Two-player combat&lt;/strong&gt; with &lt;strong&gt;glowing plasma swords&lt;/strong&gt; (avoiding copyright)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time combat system&lt;/strong&gt; with sword swinging and blocking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supernatural "force push" ability&lt;/strong&gt; with cooldown mechanics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health system&lt;/strong&gt; with visual health bars&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invulnerability frames&lt;/strong&gt; after taking damage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual effects&lt;/strong&gt; including sword glow and hit flashes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Starfield background&lt;/strong&gt; for an immersive space combat feel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI opponent&lt;/strong&gt; with adjustable difficulty (Easy, Medium, Hard)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to Play:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Player 1 (Blue)&lt;/strong&gt;: WASD to move, SPACE to activate sword, SHIFT to attack, Q for force push, T to toggle targeting mode (mouse vs. auto-target).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Player 2 (Red)&lt;/strong&gt;: Arrow keys to move, Right CTRL to activate sword, Right SHIFT to attack, ENTER for force push, P to toggle targeting mode (mouse vs. auto-target).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Difficulty&lt;/strong&gt;: Press 1 for Easy, 2 for Medium, 3 for Hard.&lt;/li&gt;
&lt;/ul&gt;
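
&lt;p&gt;Two control schemes like these typically reduce to a single movement routine driven by per-player key maps. A tiny, hypothetical illustration of that structure, using plain strings in place of &lt;code&gt;pygame&lt;/code&gt; key constants so it runs anywhere:&lt;/p&gt;

```python
# Hypothetical key maps mirroring the two control schemes above.
PLAYER1_KEYS = {"up": "w", "down": "s", "left": "a", "right": "d"}
PLAYER2_KEYS = {"up": "UP", "down": "DOWN", "left": "LEFT", "right": "RIGHT"}

def read_movement(pressed, keys, speed=5):
    """Turn the set of currently pressed keys into a (dx, dy) vector.

    In the real game, `pressed` would come from pygame.key.get_pressed();
    here it is just a set of key names.
    """
    dx = speed * ((keys["right"] in pressed) - (keys["left"] in pressed))
    dy = speed * ((keys["down"] in pressed) - (keys["up"] in pressed))
    return dx, dy
```

&lt;p&gt;The same routine serves both players, so adding a third control scheme would just mean adding another key map.&lt;/p&gt;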

&lt;h3&gt;
  
  
  Combat Mechanics:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Activate your plasma sword and maneuver close to your opponent.&lt;/li&gt;
&lt;li&gt;Swing your sword to deal damage (&lt;strong&gt;10 HP per hit&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;Utilize &lt;strong&gt;force push&lt;/strong&gt; to knock back enemies and inflict minor damage (&lt;strong&gt;5 HP + knockback&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;Each player starts with &lt;strong&gt;100 HP&lt;/strong&gt;; the first to reach 0 loses.&lt;/li&gt;
&lt;li&gt;Brief invulnerability periods after taking damage prevent spam attacks.&lt;/li&gt;
&lt;/ul&gt;
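
&lt;p&gt;The force push mechanic, for instance, reduces to a direction vector from attacker to target, scaled to a fixed knockback distance, plus the minor damage. A hypothetical sketch of that calculation (the constants are illustrative, not the game's actual tuning):&lt;/p&gt;

```python
import math

FORCE_PUSH_DAMAGE = 5
FORCE_PUSH_KNOCKBACK = 120  # pixels of knockback (illustrative)

def apply_force_push(attacker_pos, target_pos, target_hp):
    """Push the target directly away from the attacker and deal minor damage."""
    dx = target_pos[0] - attacker_pos[0]
    dy = target_pos[1] - attacker_pos[1]
    dist = math.hypot(dx, dy) or 1.0  # avoid division by zero when overlapping
    new_pos = (target_pos[0] + dx / dist * FORCE_PUSH_KNOCKBACK,
               target_pos[1] + dy / dist * FORCE_PUSH_KNOCKBACK)
    return new_pos, max(0, target_hp - FORCE_PUSH_DAMAGE)
```

&lt;p&gt;Normalizing the vector before scaling keeps the knockback distance constant regardless of how far apart the fighters are.&lt;/p&gt;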

&lt;h3&gt;
  
  
  Game Controls:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Press &lt;strong&gt;R to restart&lt;/strong&gt; after a game over.&lt;/li&gt;
&lt;li&gt;Press &lt;strong&gt;ESC to quit&lt;/strong&gt; anytime.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "Plasma Sword Fighter" game captures the essence of classic space duels without infringing on any existing intellectual property. The visual effects create that iconic glowing sword aesthetic, offering a fun and engaging combat experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts 💡
&lt;/h2&gt;

&lt;p&gt;This experience with Amazon Q CLI wasn't just about building a game; it was about understanding the practical applications of &lt;strong&gt;GenAI in accelerating software development&lt;/strong&gt;. &lt;strong&gt;Amazon Q CLI&lt;/strong&gt;, recently leveraging &lt;strong&gt;Claude 4&lt;/strong&gt;, is a &lt;strong&gt;powerful tool&lt;/strong&gt; that can significantly enhance productivity, even for those working outside traditional software development domains. It's a clear example of how &lt;strong&gt;GenAI can democratize development&lt;/strong&gt;, allowing anyone with an idea to bring it to life with guided assistance. I'm genuinely impressed and encourage others to experiment with Amazon Q CLI to discover its potential firsthand.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>awschallenge</category>
      <category>genai</category>
    </item>
  </channel>
</rss>
