The Problem: Coding Tasks Are Token Killers
Every month I get a bill from Anthropic for my personal AI assistant. Most of it comes from coding tasks — not because I write a lot of code, but because Claude charges $15/1M output tokens, and writing a Flask API with tests burns ~3,500 output tokens. Every time.
Openclaw (my agent platform) can execute code — it has shell tool calls, so it runs the code, reads the error, and tries to fix it automatically. No manual copy-paste needed. But here's the problem: every single step in that loop (generate → run → read output → fix → run again) is a separate Claude API call. A simple "write and validate" task that takes 5 iterations = 5 round-trips = 5× the token cost. The bill compounds fast.
$54/month — and most of it is just Openclaw iterating on code fixes, each round burning $3–15 per million tokens.
Then I found Kiro CLI's ACP protocol. Here's what I built — and what the numbers actually look like.
TL;DR
- Calling Claude API directly for coding tasks burns ~9,000 tokens per task (~$0.18); Openclaw can execute code but each iteration adds a full Claude round-trip
- ACP (Agent Client Protocol) lets you route coding tasks to Kiro CLI — which uses its own Credits, not Claude tokens
- Your main agent (Claude) only spends ~600–2,000 tokens on routing + summarizing — a 60–80% reduction in Claude API token usage (Kiro Credits for the actual coding are billed separately)
- The ACP client is pure Python stdlib, zero pip dependencies, ~300 lines
- Key trick: `_kiro.dev/metadata` events give you real-time Credits and context usage — use them to manage sessions proactively
If you've built a personal AI assistant on Claude API — handling messages, emails, calendar — you've probably noticed that coding requests are disproportionately expensive:
User: "Write a Flask REST API with JWT auth and PostgreSQL"
Claude needs to:
├── Understand requirements (~500 input tokens)
├── Generate full code (~3,000 output tokens)
├── Process user feedback (~2,000 input tokens)
└── Regenerate revised version (~3,500 output tokens)
────────────────────────────────────────────────
Total: ~9,000 tokens per task
At claude-sonnet-4 pricing (input: $3/1M, output: $15/1M), that breakdown works out to about $0.11 — call it ~$0.18 per coding task once an extra revision round is factored in.
10 coding tasks/day × $0.18 × 30 days = $54/month — and while Openclaw can execute code via shell tools, each iteration (run → read output → fix → run again) burns another Claude API round-trip. Tokens stack up fast.
Two problems compound here:
- Cost: Output tokens at $15/1M add up fast for code generation
- Token compounding: Openclaw can execute code via shell tool calls, but each iteration (run → parse output → fix → run again) requires a new Claude API round-trip — so a five-iteration task costs roughly five times a single-shot one
The Solution: ACP + Kiro CLI as a Coding Sub-Agent
What is ACP?
Agent Client Protocol (ACP) is a JSON-RPC 2.0 based protocol for agent-to-agent communication over stdio. Kiro CLI exposes it natively via `kiro-cli acp`.
Key ACP methods used in this integration:
| Method | Direction | Purpose |
|---|---|---|
| `initialize` | Client → Kiro | Handshake, declare capabilities |
| `session/new` | Client → Kiro | Create a new coding session |
| `session/load` | Client → Kiro | Resume existing session (preserves context) |
| `session/prompt` | Client → Kiro | Send a task, block until complete |
| `session/request_permission` | Kiro → Client | Request approval for sensitive ops |
| `session/update` (notify) | Kiro → Client | Stream code chunks and tool call status |
| `_kiro.dev/metadata` (notify) | Kiro → Client | Real-time Credits + context usage |
What is Kiro CLI?
Amazon Kiro is an AI coding tool from AWS with:
- Native file read/write and terminal execution capabilities
- Independent Kiro Credits billing (completely separate from Claude API tokens)
- ACP support via the `kiro-cli acp` subcommand
The Architecture
┌──────────────────────────────────────────────────────────┐
│ User (Feishu / Signal / Telegram) │
└─────────────────────────┬────────────────────────────────┘
│ message
▼
┌──────────────────────────────────────────────────────────┐
│ Main Agent (Openclaw / your agent) │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────────┐ │
│ │ Intent recog │ │ Memory mgmt │ │ Task routing │ │
│ │ (Claude API) │ │ MEMORY.md │ │ SKILL.md │ │
│ └──────────────┘ └──────────────┘ └───────┬────────┘ │
└─────────────────────────────────────────────┼────────────┘
│ ACP JSON-RPC 2.0
┌─────────────────────▼────────────┐
│ acp_client.py │
│ initialize / session/new │
│ session/prompt │
│ session/request_permission │
│ _kiro.dev/metadata (usage push) │
└─────────────────────┬────────────┘
│ stdio (subprocess)
┌─────────────────────▼────────────┐
│ kiro-cli (acp mode) │
│ ┌───────────┐ ┌─────────────┐ │
│ │ Code gen │ │ Tool exec │ │
│ │ (Kiro AI) │ │ fs/terminal │ │
│ └───────────┘ └─────────────┘ │
└──────────────────────────────────┘
The main agent (Claude API) handles intent recognition and task routing (~600 tokens). The actual coding work — code generation, file writes, test runs — all happens inside Kiro, billed as Kiro Credits.
Why ACP Over Other Approaches
| Approach | Pros | Cons |
|---|---|---|
| Direct Claude API | Simple, no deps | Expensive; each iteration = new API round-trip |
| `subprocess` + `kiro-cli chat --no-interactive` | Easy to implement | No session state, brittle output parsing |
| ACP JSON-RPC (this approach) | Bidirectional, session mgmt, real-time usage | Need to implement JSON-RPC client |
| MCP protocol | Standardized tool calls | Unidirectional, wrong fit for Kiro as executor |
ACP wins because:
- Kiro natively supports it (`kiro-cli acp` subcommand)
- Session persistence via `session/load` — reuse context across tasks
- `_kiro.dev/metadata` notifications give real-time Credits + context %
- `session/request_permission` enables fine-grained control over sensitive operations
One-liner to remember: MCP makes Kiro a tool your agent controls. ACP makes Kiro a peer agent. If you're serious about multi-agent systems, that distinction is everything.
Implementation
Installation
# Install Kiro CLI
curl -fsSL https://kiro.dev/install.sh | sh
# Verify
kiro-cli --version
# kiro-cli 1.24.1
# Login with AWS Builder ID
kiro-cli auth login
# Test ACP mode
echo '{
"jsonrpc":"2.0","id":1,"method":"initialize",
"params":{
"protocolVersion":1,
"clientCapabilities":{},
"clientInfo":{"name":"test","version":"0.1"}
}
}' | kiro-cli acp
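Under the hood, every ACP message is one newline-delimited JSON-RPC 2.0 object on stdin/stdout. A minimal sketch of that framing — the helper names here are my own, not part of any spec — which the full client below builds on:

```python
import json

def encode_message(req_id: int, method: str, params: dict) -> bytes:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    return (json.dumps(msg, ensure_ascii=False) + "\n").encode()

def decode_message(line: bytes):
    """Parse one incoming line; blank or non-JSON lines yield None."""
    text = line.decode(errors="replace").strip()
    if not text:
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

# The same initialize handshake as the shell test above, built in Python:
wire = encode_message(1, "initialize", {
    "protocolVersion": 1,
    "clientCapabilities": {},
    "clientInfo": {"name": "test", "version": "0.1"},
})
```

Writing `wire` to the subprocess's stdin and feeding each stdout line through `decode_message` is the whole transport — everything else is dispatch logic.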
Recommended directory layout:
your-agent/
├── skills/kiro-cli/
│ ├── acp_client.py # Core ACP client (stdlib only, no pip)
│ ├── kiro_bridge.py # Production wrapper with session + usage mgmt
│ ├── usage_tracker.py # Dual-track: Kiro Credits + Claude tokens
│ └── SKILL.md # Routing rules for your agent
├── token_stats.py # Claude API token usage stats
└── usage_stats.json # Persisted usage data
The ACP Client (acp_client.py)
Pure Python stdlib, no pip required. The full implementation with comments:
"""
acp_client.py — JSON-RPC 2.0 over stdio client for kiro-cli.
No external dependencies. Drop this file into your project.
"""
import json, logging, os, signal, subprocess, threading
from dataclasses import dataclass, field
from typing import Callable
log = logging.getLogger(__name__)
_BUF_SIZE = 4 * 1024 * 1024 # 4MB buffer — prevent OOM on large responses
@dataclass
class ToolCallInfo:
"""A single tool call executed by Kiro (file write, terminal cmd, etc.)"""
tool_call_id: str = ""
title: str = "" # Human-readable: "Creating app.py", "Running pytest"
kind: str = "" # "edit" / "execute" / "read"
status: str = "pending"
content: str = ""
@dataclass
class PromptResult:
"""Complete result of a session/prompt call."""
text: str = ""
tool_calls: list = field(default_factory=list)
stop_reason: str = ""
kiro_context_pct: float = 0.0 # Context window usage % (0-100)
kiro_credits: float = 0.0 # Kiro Credits consumed this call
@dataclass
class PermissionRequest:
"""Kiro asks permission before sensitive operations."""
session_id: str
tool_call_id: str
title: str
options: list # [{"optionId": "allow_once", "name": "Yes"}, ...]
class ACPClient:
def __init__(self, cli_path: str = "kiro-cli"):
self._cli_path = cli_path
self._proc = None
self._req_id = 0
self._lock = threading.Lock()
self._pending: dict[int, tuple] = {} # id -> (Event, result_holder)
self._session_updates: dict[str, list] = {} # session -> update buffer
self._permission_handler: Callable | None = None
self._session_metadata: dict[str, dict] = {} # from _kiro.dev/metadata
self._running = False
# ── Lifecycle ─────────────────────────────────────────
def start(self, cwd: str | None = None):
"""
Launch kiro-cli in ACP mode and complete the JSON-RPC handshake.
Args:
cwd: Working directory. Kiro reads .kiro/settings/mcp.json from here.
Use this to scope MCP servers and skills per project.
"""
self._proc = subprocess.Popen(
[self._cli_path, "acp"],
cwd=cwd,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
bufsize=0, # unbuffered — critical for real-time streaming
)
self._running = True
# Separate threads for stdout and stderr — never block the main thread
threading.Thread(target=self._read_loop, daemon=True).start()
threading.Thread(target=self._read_stderr, daemon=True).start()
# ACP handshake: declare our capabilities to Kiro
result = self._send_request("initialize", {
"protocolVersion": 1,
"clientCapabilities": {
"fs": {"readTextFile": True, "writeTextFile": True},
"terminal": True,
},
"clientInfo": {"name": "your-agent-kiro-bridge", "version": "1.0.0"},
})
log.info("[ACP] Handshake OK: %s", json.dumps(result)[:200])
return result
def stop(self):
"""
Graceful shutdown: kill child processes first (MCP servers, etc.),
then close stdin and wait for the main kiro process to exit.
Prevents zombie MCP server processes.
"""
self._running = False
if self._proc and self._proc.poll() is None:
self._kill_children(self._proc.pid)
self._proc.stdin.close()
try:
self._proc.wait(timeout=5)
except subprocess.TimeoutExpired:
self._proc.kill()
def _kill_children(self, parent_pid: int):
"""Recursively SIGTERM all child processes (MCP servers, compilers, etc.)"""
try:
r = subprocess.run(["pgrep", "-P", str(parent_pid)],
capture_output=True, text=True)
for pid_str in r.stdout.strip().split('\n'):
if pid_str:
child_pid = int(pid_str)
self._kill_children(child_pid) # recurse first
try:
os.kill(child_pid, signal.SIGTERM)
except ProcessLookupError:
pass
except Exception as e:
log.debug("[ACP] Child cleanup error: %s", e)
def is_running(self) -> bool:
return self._running and self._proc is not None and self._proc.poll() is None
# ── Session Management ────────────────────────────────
def session_new(self, cwd: str) -> tuple[str, dict]:
"""
Create a new Kiro session.
Note: MCP servers are NOT configured via params here.
Kiro automatically reads {cwd}/.kiro/settings/mcp.json.
Use different cwd values to scope different MCP configs per project.
"""
result = self._send_request("session/new", {
"cwd": cwd,
"mcpServers": [], # required field, but actual config is filesystem-based
})
session_id = result.get("sessionId", "")
if not session_id:
raise RuntimeError(f"session/new returned no sessionId: {result}")
return session_id, result.get("modes", {})
def session_load(self, session_id: str, cwd: str) -> dict:
"""
Resume an existing session — preserves full conversation context.
Key optimization: reuse sessions across tasks to avoid re-explaining
project structure every time. Each session/load skips re-reading
files Kiro already has in context.
"""
return self._send_request("session/load", {
"sessionId": session_id,
"cwd": cwd,
"mcpServers": [],
})
# ── Core: Send a Prompt ───────────────────────────────
def session_prompt(
self,
session_id: str,
text: str,
images: list[tuple[str, str]] | None = None,
timeout: float = 300,
) -> PromptResult:
"""
Send a prompt and block until Kiro completes the response.
Args:
session_id: From session_new() or session_load()
text: The instruction/task text
images: List of (base64_data, mime_type) for multimodal input
timeout: Seconds to wait. Use 60s for simple tasks, 600s for large refactors.
Returns:
PromptResult with text, tool_calls list, and usage metrics
IMPORTANT KIRO-SPECIFIC QUIRKS:
1. Kiro uses "prompt" field, NOT "content" (unlike standard ACP spec)
2. Always include at least one text block — Kiro returns Internal error
if you send only images without a text block
"""
self._session_updates[session_id] = []
req_id = self._next_id()
prompt_content = []
if images:
for b64, mime in images:
prompt_content.append({"type": "image", "data": b64, "mimeType": mime})
if text:
prompt_content.append({"type": "text", "text": text})
elif images:
prompt_content.append({"type": "text", "text": "?"}) # Kiro quirk: needs text
result = self._send_request_with_id("session/prompt", {
"sessionId": session_id,
"prompt": prompt_content, # ← "prompt" not "content"
}, req_id, timeout=timeout)
return self._build_prompt_result(session_id, result)
# ── Permission Control ────────────────────────────────
def on_permission_request(self, handler):
"""
Register a permission decision callback.
handler(PermissionRequest) should return:
- "allow_once" allow this specific operation
- "allow_always" always allow this tool type
- "deny" reject the operation
If no handler is registered: auto-approve everything (headless mode).
"""
self._permission_handler = handler
def _handle_permission_request(self, msg_id, params: dict):
title = params.get("toolCall", {}).get("title", "Unknown")
if self._permission_handler is None:
self._send_permission_response(
msg_id, params.get("sessionId", ""), "allow_once"
)
return
request = PermissionRequest(
session_id=params.get("sessionId", ""),
tool_call_id=params.get("toolCall", {}).get("toolCallId", ""),
title=title,
options=params.get("options", []),
)
def handle_async():
decision = self._permission_handler(request) or "deny"
self._send_permission_response(msg_id, request.session_id, decision)
threading.Thread(target=handle_async, daemon=True).start()
# ── Internal: JSON-RPC Transport ──────────────────────
def _next_id(self) -> int:
with self._lock:
self._req_id += 1
return self._req_id
def _send_request(self, method, params, timeout=300):
return self._send_request_with_id(method, params, self._next_id(), timeout)
def _send_request_with_id(self, method, params, req_id, timeout=300):
msg = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
evt = threading.Event()
holder: list = []
self._pending[req_id] = (evt, holder)
self._proc.stdin.write((json.dumps(msg, ensure_ascii=False) + "\n").encode())
self._proc.stdin.flush()
if not evt.wait(timeout=timeout):
self._pending.pop(req_id, None)
raise TimeoutError(f"{method} (id={req_id}) timed out after {timeout}s")
self._pending.pop(req_id, None)
if len(holder) == 2 and holder[0] is None:
raise RuntimeError(f"RPC error {holder[1].get('code')}: {holder[1].get('message')}")
return holder[0] if holder else {}
def _read_loop(self):
"""Continuously read kiro stdout and dispatch messages."""
while self._running:
try:
line = self._proc.stdout.readline(_BUF_SIZE)
if not line:
break
self._handle_line(line.decode(errors="replace").strip())
except Exception as e:
if self._running:
log.error("[ACP] Read error: %s", e)
break
def _read_stderr(self):
while self._running:
try:
line = self._proc.stderr.readline()
if not line:
break
log.debug("[ACP stderr] %s", line.decode(errors="replace").strip())
except Exception:
break
def _handle_line(self, line: str):
"""
Dispatch incoming JSON-RPC messages into three categories:
1. Responses (have id, no method) — wake up waiting _send_request call
2. Requests from Kiro (have id AND method) — e.g. permission requests
3. Notifications (no id) — streaming updates, metadata
"""
if not line:
return
try:
msg = json.loads(line)
except json.JSONDecodeError:
return
msg_id = msg.get("id")
method = msg.get("method")
# Category 1: Response to our pending request
if msg_id is not None and method is None:
pending = self._pending.get(msg_id)
if pending:
evt, holder = pending
if msg.get("error"):
holder.extend([None, msg["error"]])
else:
holder.append(msg.get("result", {}))
evt.set()
return
# Category 2: Kiro asking us for permission
if msg_id is not None and method == "session/request_permission":
self._handle_permission_request(msg_id, msg.get("params", {}))
return
# Category 3: Notifications (streaming)
if method and msg_id is None:
params = msg.get("params", {})
session_id = params.get("sessionId", "")
if method == "session/update" and session_id:
# Code generation chunks, tool call status updates
updates = self._session_updates.get(session_id)
if updates is not None:
updates.append(params.get("update", {}))
elif method == "_kiro.dev/metadata" and session_id:
# ⭐ Real-time Credits consumed + context window usage %
# Use contextUsagePercentage to decide when to start a new session
meta = self._session_metadata.get(session_id, {})
meta.update(params)
self._session_metadata[session_id] = meta
def _send_permission_response(self, msg_id, session_id, option_id):
response = {
"jsonrpc": "2.0", "id": msg_id,
"result": {
"outcome": (
{"outcome": "cancelled"}
if option_id == "deny"
else {"outcome": "selected", "optionId": option_id}
)
}
}
self._proc.stdin.write((json.dumps(response) + "\n").encode())
self._proc.stdin.flush()
def _build_prompt_result(self, session_id, rpc_result) -> PromptResult:
"""Reconstruct full response from the session/update notification stream."""
updates = self._session_updates.pop(session_id, [])
meta = self._session_metadata.get(session_id, {})
result = PromptResult(
stop_reason=rpc_result.get("stopReason", ""),
kiro_context_pct=meta.get("contextUsagePercentage", 0.0),
kiro_credits=meta.get("credits", 0.0),
)
text_parts = []
tool_calls: dict[str, ToolCallInfo] = {}
for update in updates:
st = update.get("sessionUpdate", "")
if st == "agent_message_chunk":
c = update.get("content", {})
if isinstance(c, dict) and c.get("type") == "text":
text_parts.append(c.get("text", ""))
elif st == "tool_call":
tc_id = update.get("toolCallId", "")
tool_calls[tc_id] = ToolCallInfo(
tool_call_id=tc_id,
title=update.get("title", ""),
kind=update.get("kind", ""),
status=update.get("status", "pending"),
)
elif st == "tool_call_update":
tc_id = update.get("toolCallId", "")
if tc := tool_calls.get(tc_id):
tc.status = update.get("status", tc.status)
for c in update.get("content", []):
if isinstance(c, dict):
inner = c.get("content", {})
if isinstance(inner, dict) and inner.get("type") == "text":
tc.content = inner.get("text", "")
result.text = "".join(text_parts)
result.tool_calls = list(tool_calls.values())
return result
The Bridge (kiro_bridge.py)
Production-grade wrapper with lazy start, session reuse, and auto context management:
"""kiro_bridge.py — Production bridge between your agent and Kiro CLI."""
import logging, os, sys, threading
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from acp_client import ACPClient, PromptResult, PermissionRequest
KIRO_CLI_PATH = os.environ.get("KIRO_CLI_PATH", "/home/ubuntu/.local/bin/kiro-cli")
WORKING_DIR = os.environ.get("KIRO_WORKING_DIR", "/home/ubuntu/your-project")
log = logging.getLogger(__name__)
class KiroBridge:
"""
Production bridge with four key features:
1. Lazy start — kiro-cli process only starts on first actual call
2. Session reuse — default session persists across tasks (cheaper context)
3. Auto context management — proactively starts new session at 80% usage
4. Dual usage tracking — Kiro Credits + Claude API tokens both recorded
"""
def __init__(self):
self._acp: ACPClient | None = None
self._acp_lock = threading.Lock()
self._sessions: dict[str, str] = {}
self._sessions_lock = threading.Lock()
def _start_acp(self):
with self._acp_lock:
if self._acp is not None and self._acp.is_running():
return
self._acp = ACPClient(cli_path=KIRO_CLI_PATH)
self._acp.start(cwd=WORKING_DIR)
# Headless mode: auto-approve all tool permissions
# For production, replace with a real handler (see Security section)
self._acp.on_permission_request(lambda req: "allow_once")
log.info("✅ Kiro ACP started (PID: %s)", self._acp._proc.pid)
def _ensure_acp(self) -> ACPClient:
self._start_acp()
return self._acp
def _get_default_session(self) -> str:
with self._sessions_lock:
if "default" in self._sessions:
return self._sessions["default"]
acp = self._ensure_acp()
session_id, _ = acp.session_new(WORKING_DIR)
with self._sessions_lock:
self._sessions["default"] = session_id
return session_id
def prompt(self, text: str, session_id: str | None = None,
task_name: str | None = None) -> dict:
"""Send a coding task. Returns structured result with usage data."""
acp = self._ensure_acp()
sid = session_id or self._get_default_session()
# Proactive context management — don't wait for overflow
meta = acp._session_metadata.get(sid, {})
if meta.get("contextUsagePercentage", 0) > 80:
log.warning("Context at %.1f%%, starting fresh session",
meta["contextUsagePercentage"])
sid = acp.session_new(WORKING_DIR)[0]
with self._sessions_lock:
self._sessions["default"] = sid
result: PromptResult = acp.session_prompt(sid, text)
# Record usage for both billing tracks
from usage_tracker import record_task
entry = record_task(
task_name=task_name or text[:80],
kiro_credits=result.kiro_credits,
kiro_context_pct=result.kiro_context_pct,
kiro_tool_calls=len(result.tool_calls),
)
return {
"success": True,
"text": result.text,
"tool_calls": [
{"kind": tc.kind, "title": tc.title, "status": tc.status}
for tc in result.tool_calls
],
"usage": {
"kiro_credits": result.kiro_credits,
"kiro_context_pct": result.kiro_context_pct,
"kiro_tool_calls": len(result.tool_calls),
},
}
def stop(self):
if self._acp:
self._acp.stop()
self._acp = None
with self._sessions_lock:
self._sessions.clear()
Usage Tracking (usage_tracker.py)
Track both billing dimensions so you know exactly what you're spending:
"""usage_tracker.py — Dual-track: Kiro Credits + Claude API tokens."""
import json, os
from datetime import datetime, timezone
from pathlib import Path
STATS_FILE = os.environ.get("USAGE_STATS_FILE", "usage_stats.json")
# Claude API pricing (per 1M tokens, USD) — update if pricing changes
CLAUDE_PRICING = {
"input": 3.00, # claude-sonnet-4
"output": 15.00,
"cache_read": 0.30, # Prompt Cache saves 90%
}
def record_task(task_name, kiro_credits=0.0, kiro_context_pct=0.0,
kiro_tool_calls=0, claude_input=0, claude_output=0,
claude_cache_read=0) -> dict:
data = _load()
entry = {
"id": len(data["tasks"]) + 1,
"task": task_name,
"ts": datetime.now(timezone.utc).isoformat(),
"kiro": {"credits": kiro_credits, "context_pct": kiro_context_pct,
"tool_calls": kiro_tool_calls},
"claude": {
"input": claude_input, "output": claude_output,
"cache_read": claude_cache_read,
"cost_usd": round(
claude_input * CLAUDE_PRICING["input"] / 1e6
+ claude_output * CLAUDE_PRICING["output"] / 1e6
+ claude_cache_read * CLAUDE_PRICING["cache_read"] / 1e6, 6),
},
}
data["tasks"].append(entry)
t = data["totals"]
t["kiro_credits"] = t.get("kiro_credits", 0) + kiro_credits
t["claude_input"] = t.get("claude_input", 0) + claude_input
t["claude_output"] = t.get("claude_output", 0) + claude_output
    _save(data)
    return entry

# Minimal persistence helpers referenced above — kept simple on purpose.
def _load() -> dict:
    """Load persisted stats, creating an empty structure on first run."""
    p = Path(STATS_FILE)
    if p.exists():
        return json.loads(p.read_text())
    return {"tasks": [], "totals": {}}

def _save(data: dict):
    Path(STATS_FILE).write_text(json.dumps(data, ensure_ascii=False, indent=2))
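The Observability section below runs `usage_tracker.py summary`; that entry point is omitted from the snippet above, so here's a hedged sketch of what it might look like (the output format is illustrative, and it re-reads the stats file directly so it stands alone):

```python
import json
from pathlib import Path

def print_summary(stats_file: str = "usage_stats.json"):
    """Print aggregate Kiro + Claude usage from the persisted stats file."""
    p = Path(stats_file)
    data = json.loads(p.read_text()) if p.exists() else {"tasks": [], "totals": {}}
    t = data["totals"]
    print("📊 Usage (Kiro + Claude API)")
    print(f"  Total tasks:         {len(data['tasks'])}")
    print(f"  Kiro Credits:        {t.get('kiro_credits', 0)}")
    print(f"  Claude input tokens: {t.get('claude_input', 0):,}")

if __name__ == "__main__":
    print_summary()
```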
Agent Routing Rule (SKILL.md)
The key to making this work is a clear routing rule in your agent's skill definition:
## Task Routing
### → Send to Kiro CLI (billed as Kiro Credits)
- Write any code (scripts, APIs, tools, tests)
- Create or modify files
- System config, install dependencies
- Multi-step tasks requiring command execution + verification
### → Handle directly via Claude API
- Conversational responses, information lookup
- Sending messages (Feishu, Slack, email)
- Simple one-liners (< 3 lines, one-shot)
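In code, those routing rules amount to a small classifier in front of your dispatch. A hedged sketch — the keyword heuristics and function names are mine, not Openclaw's; a real agent would use the LLM's own intent classification rather than substring matching:

```python
# Hypothetical router implementing the SKILL.md rules above.
# CODING_HINTS is illustrative only.
CODING_HINTS = ("write a", "script", "api", "refactor", "install", "unit test")

def route_task(user_message: str) -> str:
    """Return 'kiro' for coding work, 'claude' for everything else."""
    msg = user_message.lower()
    if any(hint in msg for hint in CODING_HINTS):
        return "kiro"    # executed by Kiro, billed as Kiro Credits
    return "claude"      # conversation / messaging stays on Claude API
```

A real deployment would hand the `"kiro"` branch to `KiroBridge.prompt()` and answer the rest with a normal Claude call.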
End-to-End Example
from acp_client import ACPClient
# Initialize
acp = ACPClient(cli_path='/home/ubuntu/.local/bin/kiro-cli')
acp.start(cwd='/home/ubuntu/my-project')
session_id, _ = acp.session_new('/home/ubuntu/my-project')
# Send coding task
result = acp.session_prompt(session_id, """
Write a Flask REST API with:
- JWT auth (/login, /refresh endpoints)
- PostgreSQL via SQLAlchemy ORM
- User CRUD (/users)
- Full error handling and logging
Save to /home/ubuntu/my-project/app/
""", timeout=300)
# Results
print(result.text)
for tc in result.tool_calls:
print(f'[{tc.status}] {tc.kind}: {tc.title}')
# Real output:
# [completed] edit: Creating app/__init__.py
# [completed] edit: Creating app/models.py
# [completed] edit: Creating app/routes/auth.py
# [completed] execute: Running: pip install flask sqlalchemy pyjwt
# [completed] execute: Running: python3 -m pytest tests/ -v
# Check usage
print(f"Kiro Credits used: {result.kiro_credits}")
print(f"Context window: {result.kiro_context_pct:.1f}%")
acp.stop()
How the Cost Reduction Works
The Dual Billing Model
| Billing Track | What it handles | Typical cost |
|---|---|---|
| Claude API (expensive) | Intent recognition (~200 tokens), task dispatch (~100 tokens), result summary (~300 tokens) | ~600 tokens/coding task ≈ $0.012 |
| Kiro Credits (separate) | Code generation, file read/write, terminal execution, multi-round iteration | ~8 Credits/coding task (independent pricing) |
The Math
Before (pure Claude API):
~2,500 input × $3/1M + ~6,500 output × $15/1M ≈ $0.11/task; closer to $0.18 with an extra revision round (code only, excl. system prompt)
With system prompt (~5,000 tokens cached): add ~$0.0015/task (cache read)
10 tasks/day × 30 days ≈ $54/month
After (Claude routes → Kiro executes):
Claude API: ~600–2,000 tokens/task ≈ $0.006–$0.018/task
+ Kiro Credits: ~8 Credits/task (separate billing, subscription-based)
10 tasks/day × 30 days ≈ $3–6/month Claude API + Kiro subscription
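The per-task figures can be sanity-checked in a few lines. This computes the single-pass breakdown only — revision rounds multiply the "before" number, which is how the ~$0.18/$54 working figures arise — with the 10-tasks/day volume assumed above:

```python
IN_PRICE, OUT_PRICE = 3.00, 15.00   # USD per 1M tokens (claude-sonnet-4)
TASKS_PER_MONTH = 10 * 30

def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Claude API cost in USD for one task's token usage."""
    return (input_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1e6

before = task_cost(2_500, 6_500)    # full coding task on Claude, single pass
after = task_cost(300, 300)         # routing + summary only (~600 tokens)

print(f"before: ${before:.3f}/task, after: ${after:.4f}/task")
print(f"monthly Claude spend: ${before * TASKS_PER_MONTH:.2f} → ${after * TASKS_PER_MONTH:.2f}")
```

Whether the total (Claude + Kiro subscription) comes out ahead still depends on your Kiro tier, as the caveat below notes.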
⚠️ Important caveat: The 60–80% reduction applies to Claude API token costs only. Kiro Credits for the actual code generation are billed separately — Kiro doesn't publish per-task pricing publicly. Whether your total cost is lower depends on your Kiro subscription tier. The core value proposition is: lower Claude bills and code that actually executes and self-verifies.
Session Reuse + Prompt Cache
When you reuse a session (`session_load`), Kiro retains project context from previous turns. Watch `contextUsagePercentage` from `_kiro.dev/metadata`:
# After each prompt, check context health
meta = acp._session_metadata.get(session_id, {})
print(f"Context: {meta.get('contextUsagePercentage', 0):.1f}%")
print(f"Credits used: {meta.get('credits', 0)}")
Your main agent (Claude) also benefits from Prompt Cache — system prompts, MEMORY.md, SKILL.md all get cached after the first turn, reducing effective input cost by ~90%.
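On the Claude side, caching is opt-in per content block. A sketch of the request shape — the prompt contents are placeholders, and you should check Anthropic's prompt-caching docs for current minimum lengths and block limits:

```python
def build_request(system_prompt: str, memory_md: str, user_msg: str) -> dict:
    """Build a Messages API payload whose stable prefix (system prompt +
    MEMORY.md) is marked cacheable; only the user turn changes per request."""
    return {
        "model": "claude-sonnet-4",
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": system_prompt},
            # cache_control on the last stable block caches everything up to
            # here; later calls pay the ~90%-cheaper cache-read rate for it.
            {"type": "text", "text": memory_md,
             "cache_control": {"type": "ephemeral"}},
        ],
        "messages": [{"role": "user", "content": user_msg}],
    }
```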
Benchmarks
Tested on: Ubuntu 22.04, kiro-cli 1.24.1, claude-sonnet-4
Task: Generate a Flask CRUD API with authentication and tests from scratch.
| Approach | Claude tokens/task | Kiro Credits/task | Latency | Monthly cost (10 tasks/day) | Auto-executes |
|---|---|---|---|---|---|
| Pure Claude API | ~9,000 | 0 | ~15s | $54 | ⚠️ manual iteration |
| Claude + Kiro ACP | ~600 | ~8 | ~25s | $3.6 + subscription | ✅ |
| `subprocess --no-interactive` | ~600 | ~8 | ~30s | $3.6 + subscription | ✅ |
| Savings | 60–80% ↓ | — | +10s overhead | ~$48/month (Claude part) | — |
The +10s latency is "productive delay" — Kiro actually runs the code and verifies it works. That's not overhead, that's value.
A Note on Stability
Before you ship this to production, the honest answer to "is `_kiro.dev/metadata` stable?":
`_kiro.dev/metadata` is a Kiro-specific extension — it's not in the public ACP spec. It has been stable across kiro-cli versions 1.20–1.24. Pin your kiro-cli version in production and run integration tests before upgrades. The core ACP methods (`session/new`, `session/prompt`, `session/load`) follow the public spec and are stable.
Design your code defensively: if `_kiro.dev/metadata` stops firing, your usage tracking breaks but the coding tasks still work. That's the right failure mode.
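Concretely, "defensive" just means every metadata read gets a safe default, so a missing notification degrades to zeros instead of exceptions. A small sketch (the function name is mine):

```python
def session_health(metadata_by_session: dict, session_id: str) -> dict:
    """Read Kiro usage metadata defensively — works even if the
    _kiro.dev/metadata notification never fired for this session."""
    meta = metadata_by_session.get(session_id) or {}
    return {
        "context_pct": float(meta.get("contextUsagePercentage", 0.0)),
        "credits": float(meta.get("credits", 0.0)),
        # Flag so callers know tracking data is absent, not actually zero.
        "tracked": bool(meta),
    }
```

A caller like `KiroBridge.prompt()` can then skip the 80% context check when `tracked` is `False` instead of trusting a fake 0%.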
Security & Permissions
The Permission Mechanism
Before any sensitive operation (file deletion, network request, package install), Kiro sends a session/request_permission call. Your handler decides:
def production_permission_handler(req: PermissionRequest) -> str:
title = req.title.lower()
# High-risk: destructive operations — deny and notify user
if any(kw in title for kw in ["delete", "rm ", "drop table", "sudo"]):
send_alert_to_user(f"⚠️ Kiro wants to: {req.title}")
return "deny"
# Medium-risk: installs — allow this once, keep audit trail
if any(kw in title for kw in ["pip install", "npm install", "apt"]):
log.info("Allowing install: %s", req.title)
return "allow_once"
# Low-risk: file writes in project dir, test runs — approve
return "allow_once"
acp.on_permission_request(production_permission_handler)
Working Directory Isolation
Use cwd to scope Kiro's file access:
# Each project gets its own session with its own cwd
session_backend, _ = acp.session_new("/projects/backend")
session_frontend, _ = acp.session_new("/projects/frontend")
# Relative path operations stay inside each project directory
Credential Security
- Kiro uses AWS Builder ID OAuth — credentials stored in `~/.kiro/`
- Never hardcode credentials; Kiro handles auth automatically after `kiro-cli auth login`
- Rotate Builder ID credentials periodically in production
Observability
Key Metrics to Monitor
| Metric | Healthy | Alert threshold | Action |
|---|---|---|---|
| Context usage % | < 60% | > 80% | Start new session |
| Credits per task | 5–15 | > 30 | Split the task |
| Task timeout rate | < 5% | > 20% | Check Kiro service / network |
| Claude tokens/task | 300–800 | > 2,000 | Trim system prompt |
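The thresholds in that table translate directly into an alert check. A minimal sketch — threshold values are copied from the table, and the alerting transport (Feishu, email, etc.) is left out:

```python
# Alert thresholds from the metrics table above.
THRESHOLDS = {
    "context_pct": 80,                # start a new session beyond this
    "credits_per_task": 30,           # split the task
    "timeout_rate": 0.20,             # check Kiro service / network
    "claude_tokens_per_task": 2_000,  # trim the system prompt
}

def check_metrics(metrics: dict) -> list[str]:
    """Return an alert string for every metric over its threshold."""
    alerts = []
    for key, limit in THRESHOLDS.items():
        value = metrics.get(key, 0)
        if value > limit:
            alerts.append(f"{key}={value} exceeds {limit}")
    return alerts
```

Run it against the latest entry in `usage_stats.json` after each task and forward any non-empty result to your notification channel.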
CLI Dashboard
# Claude API token usage (reads session transcript JSONL directly)
python3 token_stats.py
# 📊 Current session Claude Token Stats
# API calls: 47
# Input tokens: 12,340
# Output tokens: 8,920
# Cache read: 89,600 (90% cheaper than input)
# Estimated cost: $0.0823
# Kiro + Claude combined
python3 skills/kiro-cli/usage_tracker.py summary
# 📊 Usage (Kiro + Claude API)
# Total tasks: 23
# Kiro Credits: 184.5
# Claude input tokens: 13,800
# Claude estimated cost: $0.047
Production Best Practices
1. Proactive Context Management
# Don't wait for context overflow — act at 80%
# KiroBridge.prompt() already does this, but here's the logic:
meta = acp._session_metadata.get(session_id, {})
if meta.get("contextUsagePercentage", 0) > 80:
session_id = acp.session_new(cwd)[0]
2. Timeout by Task Complexity
# Simple: fix a bug, add a field
result = acp.session_prompt(sid, simple_task, timeout=60)
# Complex: build a feature from scratch
result = acp.session_prompt(sid, complex_task, timeout=600)
3. Session Isolation per Project
# Different projects = different cwd = isolated context + MCP config
session_a, _ = acp.session_new("/projects/api-service") # reads api-service/.kiro/
session_b, _ = acp.session_new("/projects/data-pipeline") # reads data-pipeline/.kiro/
4. Classify Errors, Don't Just Retry
try:
result = acp.session_prompt(session_id, task)
except TimeoutError:
# Split the task into smaller pieces and retry
result = run_in_chunks(task)
except RuntimeError as e:
if "context" in str(e).lower():
# Context overflow — fresh session
session_id = acp.session_new(cwd)[0]
result = acp.session_prompt(session_id, task)
else:
raise
5. Always Clean Up Processes
import atexit
bridge = KiroBridge()
atexit.register(bridge.stop) # ensures kiro + MCP servers die on exit
# Or use as context manager:
class KiroBridge:
def __enter__(self): return self
def __exit__(self, *args): self.stop()
with KiroBridge() as bridge:
result = bridge.prompt("Write unit tests for auth.py")
6. Control Task Granularity
# ❌ Too big — high context usage, unpredictable output
acp.session_prompt(sid, "Refactor the entire codebase across 50 files...")
# ✅ Right size — predictable, verifiable, context-efficient
acp.session_prompt(sid, "Refactor models.py only. Tell me when done.")
acp.session_prompt(sid, "Now refactor routes/auth.py, keeping the models.py interface.")
Conclusion & Next Steps
What You've Built
By routing coding tasks through ACP to Kiro CLI:
- 60–80% reduction in Claude API token usage for coding tasks (Kiro Credits billed separately)
- Full code execution — Kiro writes files, runs tests, installs packages
- Unified cost visibility — dual-track tracking: Kiro Credits + Claude tokens
- Zero pip dependencies — `acp_client.py` is pure stdlib, drop-in anywhere
Limitations
- Kiro requires login: AWS Builder ID OAuth — can't fully headless-deploy without user auth
- Credits quota: Free tier has limits; high-frequency use needs paid subscription
- Protocol stability: `_kiro.dev/metadata` and other `_kiro.dev/*` extensions may change with kiro-cli versions
What to Explore Next
- MCP integration: Add `{cwd}/.kiro/settings/mcp.json` to give Kiro direct DB access, API tools, etc.
- Concurrent sessions: Maintain a session pool — separate sessions for frontend/backend/testing tasks running in parallel
- Cost alerts: Build alerts on top of `usage_stats.json` — daily Credits + token spend notifications
- Compare with Amazon Q Developer: Q Developer is another AWS coding tool worth benchmarking against; different pricing model, different trade-offs