I've got OpenClaw (formerly MoltBot, formerly ClawdBot) running locally on a Raspberry Pi, where compute is scarce, and it has gone unresponsive on me more than a few times. But even on constrained hardware, every chat turn, every memory search, and every web lookup hits a paid API. The bill is small at first; then it isn't. After a week or two of running qwen3-coder-480b, my daily cost had climbed as high as $50.
Assumption: OpenClaw is running on hardware you already own or pay for separately — a Raspberry Pi, home server, or existing cloud instance. The compute cost of the host itself isn't counted in the ~$20/month figure here.
If you've picked up AWS Credits from events, the AWS Community Builder program ($500/year), or AWS Activate — or if your company prefers to keep spend within AWS rather than onboarding yet another SaaS API provider — there's a way to run the whole OpenClaw stack on credits.
This is how I did it.
Disclaimer: The crux of this hack relies heavily on Amazon Q Developer Pro's undocumented but generously high usage ceiling, for as long as it lasts. If the plan is eventually deprecated, the fallback is a Kiro plan with overage pricing: still covered by AWS Credits, just at a worse cost-per-token ratio.
Who This Is For
Two very different reasons to care about this setup.
If you have AWS Credits to burn:
Credits from re:Invent, AWS Community Builder, AWS Activate, or customer programs come with expiry dates. Running your AI assistant stack on them is one of the most practical ways to put idle credits to work — at ~$20/month, $100 in credits covers 5 months of the full stack. If you're sitting on a few hundred dollars with an end-of-year deadline, this is a productive use before they lapse.
If you're in a company with procurement or compliance requirements:
Every new SaaS vendor is a TPRM exercise. OpenAI for embeddings, Perplexity for web search, Anthropic for Claude — each one is a separate vendor assessment, a separate DPA, and a separate conversation with your security team. For FSI and regulated industries, that's not just overhead — it can be a blocker.
AWS is likely already in your vendor register. Consolidating on Bedrock means single billing, fewer third-party relationships to manage, and data residency you control. For anything touching customer data in banking, insurance, or healthcare, that's the difference between a quick internal approval and a 3-month procurement cycle.
Prerequisites
- AWS account with Bedrock access enabled in `us-east-1` (or another US region)
- AWS credentials — a Bedrock API key is the simplest option if your account supports it. Otherwise, a long-term IAM access key/secret key pair works fine and is easier to manage than SSO. IAM Identity Center is only required for the Q Developer Pro layer.
- Python 3.10+ — used by kiro-gateway, LiteLLM, and the Nova grounding proxy
- Amazon Q Developer Pro subscription ($19/user/month, credit-eligible) — required for Layer 1 (kiro-gateway). Kiro Pro, Pro+, or Power plans also work but are credit-based with overage charges — Q Developer Pro is the better deal.
What Actually Costs Money in OpenClaw?
Before reaching for solutions, it helps to know exactly where the spend goes. OpenClaw has four distinct cost centers:
1. Main model (LLM)
Every chat turn, every agent action, every tool call — all routed through your primary LLM. This is the biggest variable cost. On a busy day it adds up fast.
2. Memory search (embeddings)
OpenClaw's memory_search tool converts your memory files into vector embeddings and queries them semantically. Every search = an embedding API call. Low cost per call, but it runs constantly in the background.
3. Web search
The web_search tool hits Perplexity or Brave APIs. Perplexity charges per query on paid plans; Brave gives you $5/month free then charges beyond that.
4. Browser automation
The browser tool spins up a Chromium instance for web scraping, form filling, and screenshots. Running a full browser on a low-compute machine (Raspberry Pi, t4g.small) is heavy — and cloud browser options cost per session.
That's it. Four layers. The goal: drive variable cost to zero.
My Config: All 4 Layers on AWS Credits
Here's the full picture before we go deep:
| Layer | Solution | Credit (author) |
|---|---|---|
| Main model | kiro-gateway → Amazon Q Developer Pro | @Jwadow |
| Memory search | Native Bedrock embeddings via PR #24892 | @gabrielkoo |
| Web search | bedrock-web-search-proxy — Nova Grounding as Perplexity drop-in | @gabrielkoo |
| Browser | agent-browser + AgentCore provider | @pahudnet |
Two of these I built myself. Two were built by other community members. All four are open source.
Layer 1: Main Model + Image Analysis — Kiro CLI — Covered by AWS Credits
Amazon Q Developer Pro: flat-rate access to Claude
The key difference between Amazon Q Developer Pro and Kiro Pro is the billing model. Kiro Pro is credit-based — 1,000 credits/month, pay more if you exceed them. Amazon Q Developer Pro is a flat monthly subscription: $19/user/month, no per-token billing, no surprise overages.
| Plan | Cost | Usage |
|---|---|---|
| Kiro Free | $0/mo | 50 credits/month |
| Kiro Pro | $20/mo | 1,000 credits + $0.04/credit overage |
| Kiro Pro+ | $40/mo | 2,000 credits + $0.04/credit overage |
| Kiro Power | $200/mo | 10,000 credits + $0.04/credit overage |
| Amazon Q Developer Pro (legacy) | $19/user/mo | Flat-rate, not credit-capped |
Note: Amazon Q Developer Pro is now a legacy plan in the Kiro ecosystem. AWS has stopped allowing new Builder ID subscriptions to Q Developer Pro — new users can only subscribe through Kiro plans. The undocumented usage limits on Q Pro are likely part of why AWS made this transition. If you're already on Q Developer Pro, you retain access and it remains the better deal for OpenClaw.
Your Q Developer Pro subscription grants access to kiro-cli. The documented quota is 10,000 inference calls/month — for a personal AI assistant, that's more than enough.
Real-world cost check: In 4 days of active OpenClaw usage after switching to kiro-gateway, I consumed ~40M input tokens and ~865K output tokens with Claude Sonnet. OpenClaw loads memory files, system prompts, and tool results into every turn — the context window fills up fast. At standard Bedrock pricing ($3/1M input, $15/1M output), that's ~$135 for 4 days, or roughly $1,000/month. Q Developer Pro covers all of it for $19/month flat.
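The back-of-envelope math can be reproduced directly. Token counts below are the ones from my own 4-day sample; prices are standard Bedrock on-demand rates for Claude Sonnet:

```python
# Reproduce the 4-day Claude Sonnet cost estimate at standard
# Bedrock on-demand pricing ($3 per 1M input, $15 per 1M output).
input_tokens = 40_000_000   # ~40M input tokens over 4 days
output_tokens = 865_000     # ~865K output tokens over 4 days

cost_4_days = input_tokens / 1e6 * 3.00 + output_tokens / 1e6 * 15.00
cost_per_month = cost_4_days / 4 * 30

print(f"4-day cost:   ${cost_4_days:.2f}")     # ~$133
print(f"monthly cost: ${cost_per_month:.0f}")  # ~$997
```

The exact result depends on how the token counts are rounded, which is why I quote it as "roughly $1,000/month" above.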
In practice, I've been running Kiro CLI with OpenClaw daily and haven't hit any rate limits in active use. Note: the /usage command isn't available under the Q Developer Pro plan — monitor your usage via the AWS console instead. That said, after running OpenClaw with kiro-gateway for several days, I checked the Q Developer usage metrics in the AWS console and the figures hadn't moved at all. It's unclear whether Kiro CLI usage is counted against the same quota as Q Developer's agentic requests, or tracked separately. The Amazon Q Developer pricing page only states "Included (with limits)" for the Pro tier — no specifics on what those limits are or how Kiro CLI calls are metered.
Note: Q Developer Pro requires AWS IAM Identity Center (SSO) — you can't use it with a free Builder ID. If you're already set up with Identity Center (common in enterprise teams and AWS Community Builders with corporate accounts), you're good to go.
Important: Standard AWS Credits don't cover per-token Claude usage via Anthropic's marketplace agreement. But the Q Developer Pro subscription fee itself is credit-eligible — making the whole stack fundable with AWS credits. Kiro's flat-rate subscription is currently the only practical way to run Claude in OpenClaw without per-token billing.
New AWS accounts: Even if you'd prefer to pay per-token via direct Bedrock API, new accounts often come with ultra-low default rate limits that can't reliably serve OpenClaw — even when you're willing to pay. The flat-rate Q Developer Pro route sidesteps this entirely.
kiro-gateway: the bridge
kiro-gateway — built by @Jwadow — wraps Kiro CLI and exposes OpenAI-compatible and Anthropic-compatible API endpoints. OpenClaw talks to it like any other provider.
git clone https://github.com/jwadow/kiro-gateway
cd kiro-gateway
pip install -r requirements.txt
cp .env.example .env
Edit .env:
PROXY_API_KEY="your-secret-key"
KIRO_CREDS_FILE="~/.aws/sso/cache/kiro-auth-token.json"
Run kiro-cli login once to authenticate — this populates KIRO_CREDS_FILE automatically. (kiro-cli is only needed for this initial login; kiro-gateway reads the token it generates. Re-run if your token expires.) Then:
python main.py --port 9000
Heads up: kiro-gateway's hardcoded fallback model list may lag behind new Claude releases. If a model isn't showing up at `/v1/models`, add it manually to `FALLBACK_MODELS` in `kiro/config.py`.
Available models via Q Developer Pro:
| Model | Best for |
|---|---|
| `claude-sonnet-4.6` | General tasks, coding, writing |
| `claude-haiku-4.5` | Fast, lightweight responses |
| `claude-opus-4.6` | Complex reasoning, long context |
OpenClaw config:
{
"models": {
"providers": {
"kiro": {
"baseUrl": "http://localhost:9000",
"apiKey": "your-secret-key",
"api": "anthropic-messages"
}
}
},
"agents": {
"defaults": {
"model": {
"primary": "kiro/claude-sonnet-4.6"
},
"imageModel": {
"primary": "kiro/claude-sonnet-4.6"
}
}
}
}
Bonus: kiro-gateway works with any tool that supports OpenAI or Anthropic APIs — not just OpenClaw. To use it with Claude Code, set `ANTHROPIC_BASE_URL=http://localhost:9000` and `ANTHROPIC_API_KEY=your-secret-key`.
Layer 2: Memory Search — Bedrock Embeddings — Covered by AWS Credits
OpenClaw's memory_search needs an embedding model. Amazon Nova Multimodal Embeddings costs ~$0.00014 per 1K tokens — fractions of a cent per query, and covered by AWS Credits.
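How negligible? A quick sanity check, assuming a typical memory-search query embeds on the order of a few hundred tokens (the query size and volume here are assumptions, not measured figures):

```python
# Nova Multimodal Embeddings: ~$0.00014 per 1K tokens.
price_per_1k_tokens = 0.00014
tokens_per_query = 500   # assumed typical memory_search query size
queries_per_day = 200    # assumed heavy daily usage

monthly_cost = price_per_1k_tokens * (tokens_per_query / 1000) * queries_per_day * 30
print(f"${monthly_cost:.3f}/month")  # $0.420/month at 200 queries/day
```

Even at an aggressive 200 queries a day, the embedding layer stays well under a dollar a month.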
OpenClaw's native Bedrock provider doesn't wire up embeddings cleanly yet — PR #24892 (which supersedes my earlier attempt, PR #20191, where I made a novice mistake) is pending merge. Until then, you'll need a local OpenAI-compatible proxy in front of Bedrock. Two options:
Option A: LiteLLM
# litellm_config.yaml
model_list:
- model_name: nova-2-multimodal-embeddings-v1.0
litellm_params:
model: bedrock/amazon.nova-2-multimodal-embeddings-v1:0
aws_region_name: us-east-1
litellm_settings:
drop_params: true
master_key: "local-only"
pip install 'litellm[proxy]'
litellm --config litellm_config.yaml --port 4000
"memorySearch": {
"enabled": true,
"provider": "openai",
"remote": { "baseUrl": "http://localhost:4000", "apiKey": "local-only" },
"model": "nova-2-multimodal-embeddings-v1.0"
}
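Under the hood, OpenClaw's `openai` memory-search provider simply POSTs to the proxy's `/v1/embeddings` endpoint with the standard OpenAI request shape. A minimal sketch of the request body, where the model name matches the LiteLLM alias defined above (the query string is just an illustration):

```python
import json

# The OpenAI-compatible embeddings request body the proxy expects.
# "model" must match the model_name alias in litellm_config.yaml.
payload = {
    "model": "nova-2-multimodal-embeddings-v1.0",
    "input": ["what did I say about AWS credits last week?"],
}

body = json.dumps(payload)
# The response comes back in the standard OpenAI shape:
# {"data": [{"embedding": [...], "index": 0}], "model": ..., "usage": ...}
print(body)
```

Because both LiteLLM and the Lambda fork below speak this same contract, you can swap one proxy for the other without touching OpenClaw's config beyond the `baseUrl`.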
Option B: bedrock-access-gateway-function-url (serverless, no fixed cost)
My own fork of the original bedrock-access-gateway — deployed as a Lambda Function URL instead of ALB+Fargate, so there's no $16+/month fixed cost. Full writeup: Use Amazon Bedrock Models with OpenAI SDKs with a Serverless Proxy Endpoint.
Note: My PR #222 for Nova 2 embedding support against the original `bedrock-access-gateway` project has been merged — so my fork pulls from this upstream automatically via `prepare_source.sh`.
git clone --depth=1 https://github.com/gabrielkoo/bedrock-access-gateway-function-url
cd bedrock-access-gateway-function-url
./prepare_source.sh
sam build
sam deploy --guided
Grab the FunctionUrl output after deploy, then:
"memorySearch": {
"enabled": true,
"provider": "openai",
"remote": { "baseUrl": "https://<your-function-url>.lambda-url.us-east-1.on.aws", "apiKey": "your-api-key" },
"model": "amazon.nova-2-multimodal-embeddings-v1:0"
}
Region note: `amazon.nova-2-multimodal-embeddings-v1:0` availability varies — check the Bedrock model availability page. Make sure your IAM credentials have `bedrock:InvokeModel` in your target region.
Once PR #24892 merges, no proxy needed — the config simplifies to:
"memorySearch": {
"enabled": true,
"provider": "bedrock",
"model": "amazon.nova-2-multimodal-embeddings-v1:0",
"region": "us-east-1"
}
Layer 3: Web Search — Nova Grounding Proxy — Covered by AWS Credits
I built bedrock-web-search-proxy — a FastAPI wrapper that makes Bedrock Nova Grounding look like the Perplexity Sonar API. No Perplexity or Brave API key needed. Runs entirely on AWS Credits.
Full writeup: Drop-in Perplexity Sonar Replacement with AWS Bedrock Nova Grounding.
Option A: Run locally
git clone https://github.com/gabrielkoo/bedrock-web-search-proxy
cd bedrock-web-search-proxy
pip install fastapi uvicorn boto3
uvicorn main:app --port 7000
Option B: Lambda Function URL (zero idle cost)
See the deployment guide in the repo — SAM-based, arm64, python3.13. Once deployed, you get a persistent HTTPS endpoint with no local process to manage.
OpenClaw config:
{
"tools": {
"web": {
"search": {
"provider": "perplexity",
"perplexity": {
"apiKey": "your-proxy-key",
"baseUrl": "http://localhost:7000/v1",
"model": "sonar-pro"
}
}
}
}
}
All US Nova cross-region inference (CRIS) profiles support web grounding (`us.amazon.nova-premier-v1:0`, `us.amazon.nova-pro-v1:0`, etc.). Native model IDs without the `us.` prefix do NOT work — you must use CRIS profiles. Web grounding is US regions only (us-east-1, us-east-2, us-west-2).
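If you script against the proxy, it's worth rejecting invalid combinations before they reach Bedrock. A hypothetical helper (not part of bedrock-web-search-proxy) that encodes the two constraints above:

```python
# Hypothetical validation helper: Nova web grounding only works with
# US cross-region inference (CRIS) profile IDs (the "us." prefix),
# and only in the three US regions listed in the article.
GROUNDING_REGIONS = {"us-east-1", "us-east-2", "us-west-2"}

def is_grounding_capable(model_id: str, region: str) -> bool:
    """True if this model ID / region pair can use Nova web grounding."""
    return model_id.startswith("us.amazon.nova-") and region in GROUNDING_REGIONS

print(is_grounding_capable("us.amazon.nova-pro-v1:0", "us-east-1"))  # True
print(is_grounding_capable("amazon.nova-pro-v1:0", "us-east-1"))     # False: no CRIS prefix
print(is_grounding_capable("us.amazon.nova-pro-v1:0", "eu-west-1"))  # False: not a US region
```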
Layer 4: Cloud Browser — Bedrock AgentCore — Covered by AWS Credits
agent-browser by Vercel Labs, with the AgentCore provider contributed by Pahud Hsieh (@pahudnet) — PR #397.
The browser runs in AWS — no local Chromium needed. Particularly useful on low-compute instances (Pi, t4g.small) where running a local browser would be too heavy. Covered by AWS Credits.
Node.js and pnpm required. Since PR #397 isn't merged yet, check out the branch directly:
git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
git fetch origin pull/397/head:agentcore
git checkout agentcore
pnpm install && pnpm build
Then use it:
agent-browser -p agentcore open https://example.com
agent-browser close
Your AWS identity needs these IAM permissions:
- `bedrock-agentcore:StartBrowserSession`
- `bedrock-agentcore:ConnectBrowserAutomationStream`
- `bedrock-agentcore:StopBrowserSession`
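If you attach these as an inline IAM policy, a minimal statement looks like the sketch below. `Resource` is set to `*` purely for brevity; in production, scope it down to your specific AgentCore browser ARNs:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock-agentcore:StartBrowserSession",
        "bedrock-agentcore:ConnectBrowserAutomationStream",
        "bedrock-agentcore:StopBrowserSession"
      ],
      "Resource": "*"
    }
  ]
}
```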
On a desktop machine with enough RAM, local CDP (OpenClaw's built-in browser) is free and works fine. AgentCore is the play for headless/low-compute setups.
The Cost Math
Without this setup, Claude Sonnet alone runs ~$1,000/month at standard Bedrock pricing — based on real token usage from my own sessions. OpenClaw's large context window (memory files, system prompts, tool results loaded every turn) means the token bill compounds fast.
The full stack with this setup runs at ~$20/month:
- $19/mo — Amazon Q Developer Pro (flat-rate, covers all LLM calls)
- ≤$1/mo — Bedrock embeddings for memory search (Nova 2 at $0.00014/1K tokens)
Web search and browser automation are covered by AWS Credits — no separate line item.
With $100 in AWS Credits, you cover roughly 5 months of the full stack. Both the Q Developer Pro subscription and Bedrock embeddings are credit-eligible — if you're an AWS Community Builder, that $500/year allocation more than covers it.
Where AWS Credits Come From
- AWS event participant/speaker — re:Invent, Summit, local user groups
- AWS Community Builder — $500/year for active builders (builder.aws.com). The application opens a few rounds per year — I'm one of the builders in the program.
- AWS Customer Council — participation typically includes credits
- AWS Activate (startups) — up to $100K
- AWS Educate / Academy — educators and students
Check your balance: console.aws.amazon.com/billing/home#/credits
Closing
Four layers. Two built by community members, two I built myself. All open source, all running on AWS Credits.
To be clear: kiro-gateway is the most crucial piece here. @Jwadow built the bridge that makes Claude accessible without per-token billing — I built the embedding proxy and web search proxy to fill the remaining gaps. Web search and cloud browser (Layers 3 and 4) involve no subscription at all; their per-token and per-session billing is fully covered by AWS Credits.
If you're already an AWS Community Builder or have credits sitting in your account, there's no reason to be paying per-token for a personal AI assistant. Wire it up once, and the stack runs itself.
Put those credits to work.