The Problem: Coding Tasks Are Token Killers
Every month I get a bill from Anthropic for my personal AI assistant. Most of it comes from coding tasks — not because I write a lot of code, but because Claude charges $15/1M output tokens, and writing a Flask API with tests burns ~3,500 output tokens. Every time.
Openclaw (my agent platform) can execute code — it has shell tool calls, so it runs the code, reads the error, and tries to fix it automatically. No manual copy-paste needed. But here's the problem: every single step in that loop (generate → run → read output → fix → run again) is a separate Claude API call. A simple "write and validate" task that takes 5 iterations = 5 round-trips = 5× the token cost. The bill compounds fast.
$54/month — and most of it is just Openclaw iterating on code fixes, each round burning $3–15 per million tokens.
Then I found Kiro CLI's ACP protocol. Here's what I built — and what the numbers actually look like.
TL;DR
- Calling Claude API directly for coding tasks burns ~9,000 tokens per task (~$0.18); Openclaw can execute code but each iteration adds a full Claude round-trip
- ACP (Agent Client Protocol) lets you route coding tasks to Kiro CLI — which uses its own Credits, not Claude tokens
- Your main agent (Claude) only spends ~600–2,000 tokens on routing + summarizing — a 60–80% reduction in Claude API token usage (Kiro Credits for the actual coding are billed separately)
- The ACP client is pure Python stdlib, zero pip dependencies, ~300 lines
- Key trick: `_kiro.dev/metadata` events give you real-time Credits and context usage — use them to manage sessions proactively
If you've built a personal AI assistant on Claude API — handling messages, emails, calendar — you've probably noticed that coding requests are disproportionately expensive:
User: "Write a Flask REST API with JWT auth and PostgreSQL"
Claude needs to:
├── Understand requirements (~500 input tokens)
├── Generate full code (~3,000 output tokens)
├── Process user feedback (~2,000 input tokens)
└── Regenerate revised version (~3,500 output tokens)
────────────────────────────────────────────────
Total: ~9,000 tokens per task
At claude-sonnet-4 pricing (input: $3/1M, output: $15/1M), that breakdown works out to about $0.11 — call it ~$0.18 per coding task once an extra revision round is factored in.
10 coding tasks/day × $0.18 × 30 days = $54/month — and while Openclaw can execute code via shell tools, each iteration (run → read output → fix → run again) burns another Claude API round-trip. Tokens stack up fast.
Two problems compound here:
- Cost: Output tokens at $15/1M add up fast for code generation
- Token compounding: Openclaw can execute code via shell tool calls, but each iteration (run → parse output → fix → run again) requires a new Claude API round-trip — so a five-iteration task costs roughly five times a single-shot one
The Solution: ACP + Kiro CLI as a Coding Sub-Agent
What is ACP?
Agent Client Protocol (ACP) is a JSON-RPC 2.0 based protocol for agent-to-agent communication over stdio. Kiro CLI exposes it natively via `kiro-cli acp`.
Key ACP methods used in this integration:
| Method | Direction | Purpose |
|---|---|---|
| `initialize` | Client → Kiro | Handshake, declare capabilities |
| `session/new` | Client → Kiro | Create a new coding session |
| `session/load` | Client → Kiro | Resume existing session (preserves context) |
| `session/prompt` | Client → Kiro | Send a task, block until complete |
| `session/request_permission` | Kiro → Client | Request approval for sensitive ops |
| `session/update` (notify) | Kiro → Client | Stream code chunks and tool call status |
| `_kiro.dev/metadata` (notify) | Kiro → Client | Real-time Credits + context usage |
What is Kiro CLI?
Amazon Kiro is an AI coding tool from AWS with:
- Native file read/write and terminal execution capabilities
- Independent Kiro Credits billing (completely separate from Claude API tokens)
- ACP support via the `kiro-cli acp` subcommand
The Architecture
┌──────────────────────────────────────────────────────────┐
│ User (Feishu / Signal / Telegram) │
└─────────────────────────┬────────────────────────────────┘
│ message
▼
┌──────────────────────────────────────────────────────────┐
│ Main Agent (Openclaw / your agent) │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────────┐ │
│ │ Intent recog │ │ Memory mgmt │ │ Task routing │ │
│ │ (Claude API) │ │ MEMORY.md │ │ SKILL.md │ │
│ └──────────────┘ └──────────────┘ └───────┬────────┘ │
└─────────────────────────────────────────────┼────────────┘
│ ACP JSON-RPC 2.0
┌─────────────────────▼────────────┐
│ acp_client.py │
│ initialize / session/new │
│ session/prompt │
│ session/request_permission │
│ _kiro.dev/metadata (usage push) │
└─────────────────────┬────────────┘
│ stdio (subprocess)
┌─────────────────────▼────────────┐
│ kiro-cli (acp mode) │
│ ┌───────────┐ ┌─────────────┐ │
│ │ Code gen │ │ Tool exec │ │
│ │ (Kiro AI) │ │ fs/terminal │ │
│ └───────────┘ └─────────────┘ │
└──────────────────────────────────┘
The main agent (Claude API) handles intent recognition and task routing (~600 tokens). The actual coding work — code generation, file writes, test runs — all happens inside Kiro, billed as Kiro Credits.
Why ACP Over Other Approaches
| Approach | Pros | Cons |
|---|---|---|
| Direct Claude API | Simple, no deps | Expensive; each iteration = new API round-trip |
| `subprocess` + `kiro-cli chat --no-interactive` | Easy to implement | No session state, brittle output parsing |
| ACP JSON-RPC (this approach) | Bidirectional, session mgmt, real-time usage | Need to implement JSON-RPC client |
| MCP protocol | Standardized tool calls | Unidirectional, wrong fit for Kiro as executor |
ACP wins because:
- Kiro natively supports it (`kiro-cli acp` subcommand)
- Session persistence via `session/load` — reuse context across tasks
- `_kiro.dev/metadata` notifications give real-time Credits + context %
- `session/request_permission` enables fine-grained control over sensitive operations
One-liner to remember: MCP makes Kiro a tool your agent controls. ACP makes Kiro a peer agent. If you're serious about multi-agent systems, that distinction is everything.
Implementation
Installation
# Install Kiro CLI
curl -fsSL https://kiro.dev/install.sh | sh
# Verify
kiro-cli --version
# kiro-cli 1.24.1
# Login with AWS Builder ID
kiro-cli auth login
# Test ACP mode
echo '{
"jsonrpc":"2.0","id":1,"method":"initialize",
"params":{
"protocolVersion":1,
"clientCapabilities":{},
"clientInfo":{"name":"test","version":"0.1"}
}
}' | kiro-cli acp
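Under the hood, every ACP message is one newline-delimited JSON-RPC 2.0 object on stdin/stdout. A minimal sketch of that framing — the helper names here are my own, not part of any spec — which the full client below builds on:

```python
import json

def encode_message(req_id: int, method: str, params: dict) -> bytes:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    return (json.dumps(msg, ensure_ascii=False) + "\n").encode()

def decode_message(line: bytes):
    """Parse one incoming line; blank or non-JSON lines yield None."""
    text = line.decode(errors="replace").strip()
    if not text:
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

# The same initialize handshake as the shell test above, built in Python:
wire = encode_message(1, "initialize", {
    "protocolVersion": 1,
    "clientCapabilities": {},
    "clientInfo": {"name": "test", "version": "0.1"},
})
```

Writing `wire` to the subprocess's stdin and feeding each stdout line through `decode_message` is the whole transport — everything else is dispatch logic.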
Recommended directory layout:
your-agent/
├── skills/kiro-cli/
│ ├── acp_client.py # Core ACP client (stdlib only, no pip)
│ ├── kiro_bridge.py # Production wrapper with session + usage mgmt
│ ├── usage_tracker.py # Dual-track: Kiro Credits + Claude tokens
│ └── SKILL.md # Routing rules for your agent
├── token_stats.py # Claude API token usage stats
└── usage_stats.json # Persisted usage data
The ACP Client (acp_client.py)
Pure Python stdlib, no pip required. The full implementation with comments:
"""
acp_client.py — JSON-RPC 2.0 over stdio client for kiro-cli.
No external dependencies. Drop this file into your project.
"""
import json, logging, os, signal, subprocess, threading
from dataclasses import dataclass, field
from typing import Callable
log = logging.getLogger(__name__)
_BUF_SIZE = 4 * 1024 * 1024 # 4MB buffer — prevent OOM on large responses
@dataclass
class ToolCallInfo:
"""A single tool call executed by Kiro (file write, terminal cmd, etc.)"""
tool_call_id: str = ""
title: str = "" # Human-readable: "Creating app.py", "Running pytest"
kind: str = "" # "edit" / "execute" / "read"
status: str = "pending"
content: str = ""
@dataclass
class PromptResult:
"""Complete result of a session/prompt call."""
text: str = ""
tool_calls: list = field(default_factory=list)
stop_reason: str = ""
kiro_context_pct: float = 0.0 # Context window usage % (0-100)
kiro_credits: float = 0.0 # Kiro Credits consumed this call
@dataclass
class PermissionRequest:
"""Kiro asks permission before sensitive operations."""
session_id: str
tool_call_id: str
title: str
options: list # [{"optionId": "allow_once", "name": "Yes"}, ...]
class ACPClient:
def __init__(self, cli_path: str = "kiro-cli"):
self._cli_path = cli_path
self._proc = None
self._req_id = 0
self._lock = threading.Lock()
self._pending: dict[int, tuple] = {} # id -> (Event, result_holder)
self._session_updates: dict[str, list] = {} # session -> update buffer
self._permission_handler: Callable | None = None
self._session_metadata: dict[str, dict] = {} # from _kiro.dev/metadata
self._running = False
# ── Lifecycle ─────────────────────────────────────────
def start(self, cwd: str | None = None):
"""
Launch kiro-cli in ACP mode and complete the JSON-RPC handshake.
Args:
cwd: Working directory. Kiro reads .kiro/settings/mcp.json from here.
Use this to scope MCP servers and skills per project.
"""
self._proc = subprocess.Popen(
[self._cli_path, "acp"],
cwd=cwd,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
bufsize=0, # unbuffered — critical for real-time streaming
)
self._running = True
# Separate threads for stdout and stderr — never block the main thread
threading.Thread(target=self._read_loop, daemon=True).start()
threading.Thread(target=self._read_stderr, daemon=True).start()
# ACP handshake: declare our capabilities to Kiro
result = self._send_request("initialize", {
"protocolVersion": 1,
"clientCapabilities": {
"fs": {"readTextFile": True, "writeTextFile": True},
"terminal": True,
},
"clientInfo": {"name": "your-agent-kiro-bridge", "version": "1.0.0"},
})
log.info("[ACP] Handshake OK: %s", json.dumps(result)[:200])
return result
def stop(self):
"""
Graceful shutdown: kill child processes first (MCP servers, etc.),
then close stdin and wait for the main kiro process to exit.
Prevents zombie MCP server processes.
"""
self._running = False
if self._proc and self._proc.poll() is None:
self._kill_children(self._proc.pid)
self._proc.stdin.close()
try:
self._proc.wait(timeout=5)
except subprocess.TimeoutExpired:
self._proc.kill()
def _kill_children(self, parent_pid: int):
"""Recursively SIGTERM all child processes (MCP servers, compilers, etc.)"""
try:
r = subprocess.run(["pgrep", "-P", str(parent_pid)],
capture_output=True, text=True)
for pid_str in r.stdout.strip().split('\n'):
if pid_str:
child_pid = int(pid_str)
self._kill_children(child_pid) # recurse first
try:
os.kill(child_pid, signal.SIGTERM)
except ProcessLookupError:
pass
except Exception as e:
log.debug("[ACP] Child cleanup error: %s", e)
def is_running(self) -> bool:
return self._running and self._proc is not None and self._proc.poll() is None
# ── Session Management ────────────────────────────────
def session_new(self, cwd: str) -> tuple[str, dict]:
"""
Create a new Kiro session.
Note: MCP servers are NOT configured via params here.
Kiro automatically reads {cwd}/.kiro/settings/mcp.json.
Use different cwd values to scope different MCP configs per project.
"""
result = self._send_request("session/new", {
"cwd": cwd,
"mcpServers": [], # required field, but actual config is filesystem-based
})
session_id = result.get("sessionId", "")
if not session_id:
raise RuntimeError(f"session/new returned no sessionId: {result}")
return session_id, result.get("modes", {})
def session_load(self, session_id: str, cwd: str) -> dict:
"""
Resume an existing session — preserves full conversation context.
Key optimization: reuse sessions across tasks to avoid re-explaining
project structure every time. Each session/load skips re-reading
files Kiro already has in context.
"""
return self._send_request("session/load", {
"sessionId": session_id,
"cwd": cwd,
"mcpServers": [],
})
# ── Core: Send a Prompt ───────────────────────────────
def session_prompt(
self,
session_id: str,
text: str,
images: list[tuple[str, str]] | None = None,
timeout: float = 300,
) -> PromptResult:
"""
Send a prompt and block until Kiro completes the response.
Args:
session_id: From session_new() or session_load()
text: The instruction/task text
images: List of (base64_data, mime_type) for multimodal input
timeout: Seconds to wait. Use 60s for simple tasks, 600s for large refactors.
Returns:
PromptResult with text, tool_calls list, and usage metrics
IMPORTANT KIRO-SPECIFIC QUIRKS:
1. Kiro uses "prompt" field, NOT "content" (unlike standard ACP spec)
2. Always include at least one text block — Kiro returns Internal error
if you send only images without a text block
"""
self._session_updates[session_id] = []
req_id = self._next_id()
prompt_content = []
if images:
for b64, mime in images:
prompt_content.append({"type": "image", "data": b64, "mimeType": mime})
if text:
prompt_content.append({"type": "text", "text": text})
elif images:
prompt_content.append({"type": "text", "text": "?"}) # Kiro quirk: needs text
result = self._send_request_with_id("session/prompt", {
"sessionId": session_id,
"prompt": prompt_content, # ← "prompt" not "content"
}, req_id, timeout=timeout)
return self._build_prompt_result(session_id, result)
# ── Permission Control ────────────────────────────────
def on_permission_request(self, handler):
"""
Register a permission decision callback.
handler(PermissionRequest) should return:
- "allow_once" allow this specific operation
- "allow_always" always allow this tool type
- "deny" reject the operation
If no handler is registered: auto-approve everything (headless mode).
"""
self._permission_handler = handler
def _handle_permission_request(self, msg_id, params: dict):
title = params.get("toolCall", {}).get("title", "Unknown")
if self._permission_handler is None:
self._send_permission_response(
msg_id, params.get("sessionId", ""), "allow_once"
)
return
request = PermissionRequest(
session_id=params.get("sessionId", ""),
tool_call_id=params.get("toolCall", {}).get("toolCallId", ""),
title=title,
options=params.get("options", []),
)
def handle_async():
decision = self._permission_handler(request) or "deny"
self._send_permission_response(msg_id, request.session_id, decision)
threading.Thread(target=handle_async, daemon=True).start()
# ── Internal: JSON-RPC Transport ──────────────────────
def _next_id(self) -> int:
with self._lock:
self._req_id += 1
return self._req_id
def _send_request(self, method, params, timeout=300):
return self._send_request_with_id(method, params, self._next_id(), timeout)
def _send_request_with_id(self, method, params, req_id, timeout=300):
msg = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
evt = threading.Event()
holder: list = []
self._pending[req_id] = (evt, holder)
self._proc.stdin.write((json.dumps(msg, ensure_ascii=False) + "\n").encode())
self._proc.stdin.flush()
if not evt.wait(timeout=timeout):
self._pending.pop(req_id, None)
raise TimeoutError(f"{method} (id={req_id}) timed out after {timeout}s")
self._pending.pop(req_id, None)
if len(holder) == 2 and holder[0] is None:
raise RuntimeError(f"RPC error {holder[1].get('code')}: {holder[1].get('message')}")
return holder[0] if holder else {}
def _read_loop(self):
"""Continuously read kiro stdout and dispatch messages."""
while self._running:
try:
line = self._proc.stdout.readline(_BUF_SIZE)
if not line:
break
self._handle_line(line.decode(errors="replace").strip())
except Exception as e:
if self._running:
log.error("[ACP] Read error: %s", e)
break
def _read_stderr(self):
while self._running:
try:
line = self._proc.stderr.readline()
if not line:
break
log.debug("[ACP stderr] %s", line.decode(errors="replace").strip())
except Exception:
break
def _handle_line(self, line: str):
"""
Dispatch incoming JSON-RPC messages into three categories:
1. Responses (have id, no method) — wake up waiting _send_request call
2. Requests from Kiro (have id AND method) — e.g. permission requests
3. Notifications (no id) — streaming updates, metadata
"""
if not line:
return
try:
msg = json.loads(line)
except json.JSONDecodeError:
return
msg_id = msg.get("id")
method = msg.get("method")
# Category 1: Response to our pending request
if msg_id is not None and method is None:
pending = self._pending.get(msg_id)
if pending:
evt, holder = pending
if msg.get("error"):
holder.extend([None, msg["error"]])
else:
holder.append(msg.get("result", {}))
evt.set()
return
# Category 2: Kiro asking us for permission
if msg_id is not None and method == "session/request_permission":
self._handle_permission_request(msg_id, msg.get("params", {}))
return
# Category 3: Notifications (streaming)
if method and msg_id is None:
params = msg.get("params", {})
session_id = params.get("sessionId", "")
if method == "session/update" and session_id:
# Code generation chunks, tool call status updates
updates = self._session_updates.get(session_id)
if updates is not None:
updates.append(params.get("update", {}))
elif method == "_kiro.dev/metadata" and session_id:
# ⭐ Real-time Credits consumed + context window usage %
# Use contextUsagePercentage to decide when to start a new session
meta = self._session_metadata.get(session_id, {})
meta.update(params)
self._session_metadata[session_id] = meta
def _send_permission_response(self, msg_id, session_id, option_id):
response = {
"jsonrpc": "2.0", "id": msg_id,
"result": {
"outcome": (
{"outcome": "cancelled"}
if option_id == "deny"
else {"outcome": "selected", "optionId": option_id}
)
}
}
self._proc.stdin.write((json.dumps(response) + "\n").encode())
self._proc.stdin.flush()
def _build_prompt_result(self, session_id, rpc_result) -> PromptResult:
"""Reconstruct full response from the session/update notification stream."""
updates = self._session_updates.pop(session_id, [])
meta = self._session_metadata.get(session_id, {})
result = PromptResult(
stop_reason=rpc_result.get("stopReason", ""),
kiro_context_pct=meta.get("contextUsagePercentage", 0.0),
kiro_credits=meta.get("credits", 0.0),
)
text_parts = []
tool_calls: dict[str, ToolCallInfo] = {}
for update in updates:
st = update.get("sessionUpdate", "")
if st == "agent_message_chunk":
c = update.get("content", {})
if isinstance(c, dict) and c.get("type") == "text":
text_parts.append(c.get("text", ""))
elif st == "tool_call":
tc_id = update.get("toolCallId", "")
tool_calls[tc_id] = ToolCallInfo(
tool_call_id=tc_id,
title=update.get("title", ""),
kind=update.get("kind", ""),
status=update.get("status", "pending"),
)
elif st == "tool_call_update":
tc_id = update.get("toolCallId", "")
if tc := tool_calls.get(tc_id):
tc.status = update.get("status", tc.status)
for c in update.get("content", []):
if isinstance(c, dict):
inner = c.get("content", {})
if isinstance(inner, dict) and inner.get("type") == "text":
tc.content = inner.get("text", "")
result.text = "".join(text_parts)
result.tool_calls = list(tool_calls.values())
return result
The Bridge (kiro_bridge.py)
Production-grade wrapper with lazy start, session reuse, and auto context management:
"""kiro_bridge.py — Production bridge between your agent and Kiro CLI."""
import logging, os, sys, threading
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from acp_client import ACPClient, PromptResult, PermissionRequest
KIRO_CLI_PATH = os.environ.get("KIRO_CLI_PATH", "/home/ubuntu/.local/bin/kiro-cli")
WORKING_DIR = os.environ.get("KIRO_WORKING_DIR", "/home/ubuntu/your-project")
log = logging.getLogger(__name__)
class KiroBridge:
"""
Production bridge with four key features:
1. Lazy start — kiro-cli process only starts on first actual call
2. Session reuse — default session persists across tasks (cheaper context)
3. Auto context management — proactively starts new session at 80% usage
4. Dual usage tracking — Kiro Credits + Claude API tokens both recorded
"""
def __init__(self):
self._acp: ACPClient | None = None
self._acp_lock = threading.Lock()
self._sessions: dict[str, str] = {}
self._sessions_lock = threading.Lock()
def _start_acp(self):
with self._acp_lock:
if self._acp is not None and self._acp.is_running():
return
self._acp = ACPClient(cli_path=KIRO_CLI_PATH)
self._acp.start(cwd=WORKING_DIR)
# Headless mode: auto-approve all tool permissions
# For production, replace with a real handler (see Security section)
self._acp.on_permission_request(lambda req: "allow_once")
log.info("✅ Kiro ACP started (PID: %s)", self._acp._proc.pid)
def _ensure_acp(self) -> ACPClient:
self._start_acp()
return self._acp
def _get_default_session(self) -> str:
with self._sessions_lock:
if "default" in self._sessions:
return self._sessions["default"]
acp = self._ensure_acp()
session_id, _ = acp.session_new(WORKING_DIR)
with self._sessions_lock:
self._sessions["default"] = session_id
return session_id
def prompt(self, text: str, session_id: str | None = None,
task_name: str | None = None) -> dict:
"""Send a coding task. Returns structured result with usage data."""
acp = self._ensure_acp()
sid = session_id or self._get_default_session()
# Proactive context management — don't wait for overflow
meta = acp._session_metadata.get(sid, {})
if meta.get("contextUsagePercentage", 0) > 80:
log.warning("Context at %.1f%%, starting fresh session",
meta["contextUsagePercentage"])
sid = acp.session_new(WORKING_DIR)[0]
with self._sessions_lock:
self._sessions["default"] = sid
result: PromptResult = acp.session_prompt(sid, text)
# Record usage for both billing tracks
from usage_tracker import record_task
entry = record_task(
task_name=task_name or text[:80],
kiro_credits=result.kiro_credits,
kiro_context_pct=result.kiro_context_pct,
kiro_tool_calls=len(result.tool_calls),
)
return {
"success": True,
"text": result.text,
"tool_calls": [
{"kind": tc.kind, "title": tc.title, "status": tc.status}
for tc in result.tool_calls
],
"usage": {
"kiro_credits": result.kiro_credits,
"kiro_context_pct": result.kiro_context_pct,
"kiro_tool_calls": len(result.tool_calls),
},
}
def stop(self):
if self._acp:
self._acp.stop()
self._acp = None
with self._sessions_lock:
self._sessions.clear()
Usage Tracking (usage_tracker.py)
Track both billing dimensions so you know exactly what you're spending:
"""usage_tracker.py — Dual-track: Kiro Credits + Claude API tokens."""
import json, os
from datetime import datetime, timezone
from pathlib import Path
STATS_FILE = os.environ.get("USAGE_STATS_FILE", "usage_stats.json")
# Claude API pricing (per 1M tokens, USD) — update if pricing changes
CLAUDE_PRICING = {
"input": 3.00, # claude-sonnet-4
"output": 15.00,
"cache_read": 0.30, # Prompt Cache saves 90%
}
def record_task(task_name, kiro_credits=0.0, kiro_context_pct=0.0,
kiro_tool_calls=0, claude_input=0, claude_output=0,
claude_cache_read=0) -> dict:
data = _load()
entry = {
"id": len(data["tasks"]) + 1,
"task": task_name,
"ts": datetime.now(timezone.utc).isoformat(),
"kiro": {"credits": kiro_credits, "context_pct": kiro_context_pct,
"tool_calls": kiro_tool_calls},
"claude": {
"input": claude_input, "output": claude_output,
"cache_read": claude_cache_read,
"cost_usd": round(
claude_input * CLAUDE_PRICING["input"] / 1e6
+ claude_output * CLAUDE_PRICING["output"] / 1e6
+ claude_cache_read * CLAUDE_PRICING["cache_read"] / 1e6, 6),
},
}
data["tasks"].append(entry)
t = data["totals"]
t["kiro_credits"] = t.get("kiro_credits", 0) + kiro_credits
t["claude_input"] = t.get("claude_input", 0) + claude_input
t["claude_output"] = t.get("claude_output", 0) + claude_output
    _save(data)
    return entry

# Minimal persistence helpers referenced above — kept simple on purpose.
def _load() -> dict:
    """Load persisted stats, creating an empty structure on first run."""
    p = Path(STATS_FILE)
    if p.exists():
        return json.loads(p.read_text())
    return {"tasks": [], "totals": {}}

def _save(data: dict):
    Path(STATS_FILE).write_text(json.dumps(data, ensure_ascii=False, indent=2))
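The Observability section below runs `usage_tracker.py summary`; that entry point is omitted from the snippet above, so here's a hedged sketch of what it might look like (the output format is illustrative, and it re-reads the stats file directly so it stands alone):

```python
import json
from pathlib import Path

def print_summary(stats_file: str = "usage_stats.json"):
    """Print aggregate Kiro + Claude usage from the persisted stats file."""
    p = Path(stats_file)
    data = json.loads(p.read_text()) if p.exists() else {"tasks": [], "totals": {}}
    t = data["totals"]
    print("📊 Usage (Kiro + Claude API)")
    print(f"  Total tasks:         {len(data['tasks'])}")
    print(f"  Kiro Credits:        {t.get('kiro_credits', 0)}")
    print(f"  Claude input tokens: {t.get('claude_input', 0):,}")

if __name__ == "__main__":
    print_summary()
```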
Agent Routing Rule (SKILL.md)
The key to making this work is a clear routing rule in your agent's skill definition:
## Task Routing
### → Send to Kiro CLI (billed as Kiro Credits)
- Write any code (scripts, APIs, tools, tests)
- Create or modify files
- System config, install dependencies
- Multi-step tasks requiring command execution + verification
### → Handle directly via Claude API
- Conversational responses, information lookup
- Sending messages (Feishu, Slack, email)
- Simple one-liners (< 3 lines, one-shot)
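In code, those routing rules amount to a small classifier in front of your dispatch. A hedged sketch — the keyword heuristics and function names are mine, not Openclaw's; a real agent would use the LLM's own intent classification rather than substring matching:

```python
# Hypothetical router implementing the SKILL.md rules above.
# CODING_HINTS is illustrative only.
CODING_HINTS = ("write a", "script", "api", "refactor", "install", "unit test")

def route_task(user_message: str) -> str:
    """Return 'kiro' for coding work, 'claude' for everything else."""
    msg = user_message.lower()
    if any(hint in msg for hint in CODING_HINTS):
        return "kiro"    # executed by Kiro, billed as Kiro Credits
    return "claude"      # conversation / messaging stays on Claude API
```

A real deployment would hand the `"kiro"` branch to `KiroBridge.prompt()` and answer the rest with a normal Claude call.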
End-to-End Example
from acp_client import ACPClient
# Initialize
acp = ACPClient(cli_path='/home/ubuntu/.local/bin/kiro-cli')
acp.start(cwd='/home/ubuntu/my-project')
session_id, _ = acp.session_new('/home/ubuntu/my-project')
# Send coding task
result = acp.session_prompt(session_id, """
Write a Flask REST API with:
- JWT auth (/login, /refresh endpoints)
- PostgreSQL via SQLAlchemy ORM
- User CRUD (/users)
- Full error handling and logging
Save to /home/ubuntu/my-project/app/
""", timeout=300)
# Results
print(result.text)
for tc in result.tool_calls:
print(f'[{tc.status}] {tc.kind}: {tc.title}')
# Real output:
# [completed] edit: Creating app/__init__.py
# [completed] edit: Creating app/models.py
# [completed] edit: Creating app/routes/auth.py
# [completed] execute: Running: pip install flask sqlalchemy pyjwt
# [completed] execute: Running: python3 -m pytest tests/ -v
# Check usage
print(f"Kiro Credits used: {result.kiro_credits}")
print(f"Context window: {result.kiro_context_pct:.1f}%")
acp.stop()
How the Cost Reduction Works
The Dual Billing Model
| Billing Track | What it handles | Typical cost |
|---|---|---|
| Claude API (expensive) | Intent recognition (~200 tokens), task dispatch (~100 tokens), result summary (~300 tokens) | ~600 tokens/coding task ≈ $0.012 |
| Kiro Credits (separate) | Code generation, file read/write, terminal execution, multi-round iteration | ~8 Credits/coding task (independent pricing) |
The Math
Before (pure Claude API):
~2,500 input × $3/1M + ~6,500 output × $15/1M ≈ $0.11/task; closer to $0.18 with an extra revision round (code only, excl. system prompt)
With system prompt (~5,000 tokens cached): add ~$0.0015/task (cache read)
10 tasks/day × 30 days ≈ $54/month
After (Claude routes → Kiro executes):
Claude API: ~600–2,000 tokens/task ≈ $0.006–$0.018/task
+ Kiro Credits: ~8 Credits/task (separate billing, subscription-based)
10 tasks/day × 30 days ≈ $3–6/month Claude API + Kiro subscription
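The per-task figures can be sanity-checked in a few lines. This computes the single-pass breakdown only — revision rounds multiply the "before" number, which is how the ~$0.18/$54 working figures arise — with the 10-tasks/day volume assumed above:

```python
IN_PRICE, OUT_PRICE = 3.00, 15.00   # USD per 1M tokens (claude-sonnet-4)
TASKS_PER_MONTH = 10 * 30

def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Claude API cost in USD for one task's token usage."""
    return (input_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1e6

before = task_cost(2_500, 6_500)    # full coding task on Claude, single pass
after = task_cost(300, 300)         # routing + summary only (~600 tokens)

print(f"before: ${before:.3f}/task, after: ${after:.4f}/task")
print(f"monthly Claude spend: ${before * TASKS_PER_MONTH:.2f} → ${after * TASKS_PER_MONTH:.2f}")
```

Whether the total (Claude + Kiro subscription) comes out ahead still depends on your Kiro tier, as the caveat below notes.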
⚠️ Important caveat: The 60–80% reduction applies to Claude API token costs only. Kiro Credits for the actual code generation are billed separately — Kiro doesn't publish per-task pricing publicly. Whether your total cost is lower depends on your Kiro subscription tier. The core value proposition is: lower Claude bills and code that actually executes and self-verifies.
Session Reuse + Prompt Cache
When you reuse a session (`session_load`), Kiro retains project context from previous turns. Watch `contextUsagePercentage` from `_kiro.dev/metadata`:
# After each prompt, check context health
meta = acp._session_metadata.get(session_id, {})
print(f"Context: {meta.get('contextUsagePercentage', 0):.1f}%")
print(f"Credits used: {meta.get('credits', 0)}")
Your main agent (Claude) also benefits from Prompt Cache — system prompts, MEMORY.md, SKILL.md all get cached after the first turn, reducing effective input cost by ~90%.
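On the Claude side, caching is opt-in per content block. A sketch of the request shape — the prompt contents are placeholders, and you should check Anthropic's prompt-caching docs for current minimum lengths and block limits:

```python
def build_request(system_prompt: str, memory_md: str, user_msg: str) -> dict:
    """Build a Messages API payload whose stable prefix (system prompt +
    MEMORY.md) is marked cacheable; only the user turn changes per request."""
    return {
        "model": "claude-sonnet-4",
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": system_prompt},
            # cache_control on the last stable block caches everything up to
            # here; later calls pay the ~90%-cheaper cache-read rate for it.
            {"type": "text", "text": memory_md,
             "cache_control": {"type": "ephemeral"}},
        ],
        "messages": [{"role": "user", "content": user_msg}],
    }
```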
Benchmarks
Tested on: Ubuntu 22.04, kiro-cli 1.24.1, claude-sonnet-4
Task: Generate a Flask CRUD API with authentication and tests from scratch.
| Approach | Claude tokens/task | Kiro Credits/task | Latency | Monthly cost (10 tasks/day) | Auto-executes |
|---|---|---|---|---|---|
| Pure Claude API | ~9,000 | 0 | ~15s | $54 | ⚠️ manual iteration |
| Claude + Kiro ACP | ~600 | ~8 | ~25s | $3.6 + subscription | ✅ |
| `subprocess --no-interactive` | ~600 | ~8 | ~30s | $3.6 + subscription | ✅ |
| Savings | 60–80% ↓ | — | +10s overhead | ~$48/month (Claude part) | — |
The +10s latency is "productive delay" — Kiro actually runs the code and verifies it works. That's not overhead, that's value.
A Note on Stability
Before you ship this to production, the honest answer to "is `_kiro.dev/metadata` stable?":
`_kiro.dev/metadata` is a Kiro-specific extension — it's not in the public ACP spec. It has been stable across kiro-cli versions 1.20–1.24. Pin your kiro-cli version in production and run integration tests before upgrades. The core ACP methods (`session/new`, `session/prompt`, `session/load`) follow the public spec and are stable.
Design your code defensively: if `_kiro.dev/metadata` stops firing, your usage tracking breaks but the coding tasks still work. That's the right failure mode.
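Concretely, "defensive" just means every metadata read gets a safe default, so a missing notification degrades to zeros instead of exceptions. A small sketch (the function name is mine):

```python
def session_health(metadata_by_session: dict, session_id: str) -> dict:
    """Read Kiro usage metadata defensively — works even if the
    _kiro.dev/metadata notification never fired for this session."""
    meta = metadata_by_session.get(session_id) or {}
    return {
        "context_pct": float(meta.get("contextUsagePercentage", 0.0)),
        "credits": float(meta.get("credits", 0.0)),
        # Flag so callers know tracking data is absent, not actually zero.
        "tracked": bool(meta),
    }
```

A caller like `KiroBridge.prompt()` can then skip the 80% context check when `tracked` is `False` instead of trusting a fake 0%.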
Security & Permissions
The Permission Mechanism
Before any sensitive operation (file deletion, network request, package install), Kiro sends a session/request_permission call. Your handler decides:
def production_permission_handler(req: PermissionRequest) -> str:
title = req.title.lower()
# High-risk: destructive operations — deny and notify user
if any(kw in title for kw in ["delete", "rm ", "drop table", "sudo"]):
send_alert_to_user(f"⚠️ Kiro wants to: {req.title}")
return "deny"
# Medium-risk: installs — allow this once, keep audit trail
if any(kw in title for kw in ["pip install", "npm install", "apt"]):
log.info("Allowing install: %s", req.title)
return "allow_once"
# Low-risk: file writes in project dir, test runs — approve
return "allow_once"
acp.on_permission_request(production_permission_handler)
Working Directory Isolation
Use cwd to scope Kiro's file access:
# Each project gets its own session with its own cwd
session_backend, _ = acp.session_new("/projects/backend")
session_frontend, _ = acp.session_new("/projects/frontend")
# Relative path operations stay inside each project directory
Credential Security
- Kiro uses AWS Builder ID OAuth — credentials stored in `~/.kiro/`
- Never hardcode credentials; Kiro handles auth automatically after `kiro-cli auth login`
- Rotate Builder ID credentials periodically in production
Observability
Key Metrics to Monitor
| Metric | Healthy | Alert threshold | Action |
|---|---|---|---|
| Context usage % | < 60% | > 80% | Start new session |
| Credits per task | 5–15 | > 30 | Split the task |
| Task timeout rate | < 5% | > 20% | Check Kiro service / network |
| Claude tokens/task | 300–800 | > 2,000 | Trim system prompt |
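The thresholds in that table translate directly into an alert check. A minimal sketch — threshold values are copied from the table, and the alerting transport (Feishu, email, etc.) is left out:

```python
# Alert thresholds from the metrics table above.
THRESHOLDS = {
    "context_pct": 80,                # start a new session beyond this
    "credits_per_task": 30,           # split the task
    "timeout_rate": 0.20,             # check Kiro service / network
    "claude_tokens_per_task": 2_000,  # trim the system prompt
}

def check_metrics(metrics: dict) -> list[str]:
    """Return an alert string for every metric over its threshold."""
    alerts = []
    for key, limit in THRESHOLDS.items():
        value = metrics.get(key, 0)
        if value > limit:
            alerts.append(f"{key}={value} exceeds {limit}")
    return alerts
```

Run it against the latest entry in `usage_stats.json` after each task and forward any non-empty result to your notification channel.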
CLI Dashboard
# Claude API token usage (reads session transcript JSONL directly)
python3 token_stats.py
# 📊 Current session Claude Token Stats
# API calls: 47
# Input tokens: 12,340
# Output tokens: 8,920
# Cache read: 89,600 (90% cheaper than input)
# Estimated cost: $0.0823
# Kiro + Claude combined
python3 skills/kiro-cli/usage_tracker.py summary
# 📊 Usage (Kiro + Claude API)
# Total tasks: 23
# Kiro Credits: 184.5
# Claude input tokens: 13,800
# Claude estimated cost: $0.047
Production Best Practices
1. Proactive Context Management
# Don't wait for context overflow — act at 80%
# KiroBridge.prompt() already does this, but here's the logic:
meta = acp._session_metadata.get(session_id, {})
if meta.get("contextUsagePercentage", 0) > 80:
session_id = acp.session_new(cwd)[0]
2. Timeout by Task Complexity
# Simple: fix a bug, add a field
result = acp.session_prompt(sid, simple_task, timeout=60)
# Complex: build a feature from scratch
result = acp.session_prompt(sid, complex_task, timeout=600)
3. Session Isolation per Project
# Different projects = different cwd = isolated context + MCP config
session_a, _ = acp.session_new("/projects/api-service") # reads api-service/.kiro/
session_b, _ = acp.session_new("/projects/data-pipeline") # reads data-pipeline/.kiro/
4. Classify Errors, Don't Just Retry
try:
result = acp.session_prompt(session_id, task)
except TimeoutError:
# Split the task into smaller pieces and retry
result = run_in_chunks(task)
except RuntimeError as e:
if "context" in str(e).lower():
# Context overflow — fresh session
session_id = acp.session_new(cwd)[0]
result = acp.session_prompt(session_id, task)
else:
raise
5. Always Clean Up Processes
import atexit
bridge = KiroBridge()
atexit.register(bridge.stop) # ensures kiro + MCP servers die on exit
# Or use as context manager:
class KiroBridge:
def __enter__(self): return self
def __exit__(self, *args): self.stop()
with KiroBridge() as bridge:
result = bridge.prompt("Write unit tests for auth.py")
6. Control Task Granularity
# ❌ Too big — high context usage, unpredictable output
acp.session_prompt(sid, "Refactor the entire codebase across 50 files...")
# ✅ Right size — predictable, verifiable, context-efficient
acp.session_prompt(sid, "Refactor models.py only. Tell me when done.")
acp.session_prompt(sid, "Now refactor routes/auth.py, keeping the models.py interface.")
Conclusion & Next Steps
What You've Built
By routing coding tasks through ACP to Kiro CLI:
- 60–80% reduction in Claude API token usage for coding tasks (Kiro Credits billed separately)
- Full code execution — Kiro writes files, runs tests, installs packages
- Unified cost visibility — dual-track tracking: Kiro Credits + Claude tokens
- Zero pip dependencies — `acp_client.py` is pure stdlib, drop-in anywhere
Limitations
- Kiro requires login: AWS Builder ID OAuth — can't fully headless-deploy without user auth
- Credits quota: Free tier has limits; high-frequency use needs paid subscription
- Protocol stability: `_kiro.dev/metadata` and other `_kiro.dev/*` extensions may change with kiro-cli versions
What to Explore Next
- MCP integration: Add `{cwd}/.kiro/settings/mcp.json` to give Kiro direct DB access, API tools, etc.
- Concurrent sessions: Maintain a session pool — separate sessions for frontend/backend/testing tasks running in parallel
- Cost alerts: Build alerts on top of `usage_stats.json` — daily Credits + token spend notifications
- Compare with Amazon Q Developer: Q Developer is another AWS coding tool worth benchmarking against; different pricing model, different trade-offs