DEV Community: ~K¹yle Million

Let your AI agent pay for data inline with x402 — no API keys, and now no wallet required

~K¹yle Million — Sun, 21 Jun 2026 19:48:11 +0000

Autonomous agents need live data, but the human billing model — sign up, get an API key, put a card on file, babysit rate limits — doesn't fit software that discovers a tool mid-task and decides for itself whether a fraction of a cent is worth it.

x402 (the old HTTP 402 Payment Required, revived for agents and now governed under the Linux Foundation's x402 Foundation) makes payment a native part of the request: your agent calls an endpoint, gets a 402 describing exactly what to pay, settles a USDC micro-payment on Base, and gets the data back. No account. No key. No human in the loop.

This is a practical, copy-pasteable guide to consuming x402 data from an agent today, using a live 210-capability service — The Stall — as the example provider. (Disclosure: I build and operate it.)

Wire it into your MCP client (the easiest path)

If you use Claude Code, Claude Desktop, or Cursor, add one MCP server and every capability mounts as a tool — no API key:

{
  "mcpServers": {
    "the-stall": {
      "url": "https://the-stall.intuitek.ai/mcp",
      "transport": "streamable-http"
    }
  }
}

That's it. Your agent can now call 210 data tools — equities, crypto/DeFi, on-chain analytics, macro, news, web/OSINT, compliance — and pay per call.

Or call it directly over HTTP

GET https://the-stall.intuitek.ai/cap/us-stock-price?ticker=AAPL
  → 402 Payment Required   (body states scheme, network, asset, amount, payTo)
  → retry with the X-PAYMENT header (signed EIP-3009 USDC authorization)
  → 200 OK + data
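
Before paying, an agent has to decide whether the quoted price is worth it. A minimal Python sketch of that decision follows. It assumes the 402 body carries an `accepts` list whose entries expose `network`, `maxAmountRequired` (a string of atomic USDC units), `asset`, and `payTo`. That shape, the helper name `pick_affordable_option`, and the placeholder addresses are assumptions for illustration, not something confirmed by The Stall's responses.

```python
from decimal import Decimal

USDC_DECIMALS = 6  # USDC uses 6 decimal places


def pick_affordable_option(challenge: dict, max_usd: Decimal, network: str = "base"):
    """Return the first payment option on `network` priced at or under `max_usd`, else None."""
    for option in challenge.get("accepts", []):
        if option.get("network") != network:
            continue
        # Assumed: maxAmountRequired is a string of atomic units (1 USDC = 1_000_000).
        price_usd = Decimal(option["maxAmountRequired"]) / (10 ** USDC_DECIMALS)
        if price_usd <= max_usd:
            return {"payTo": option["payTo"], "asset": option["asset"], "price_usd": price_usd}
    return None


# Hypothetical 402 body; both addresses are placeholders, not real contracts.
sample_challenge = {
    "x402Version": 1,
    "accepts": [{
        "scheme": "exact",
        "network": "base",
        "maxAmountRequired": "30000",  # 0.03 USDC in atomic units
        "asset": "0xUSDC_CONTRACT_PLACEHOLDER",
        "payTo": "0xRECIPIENT_PLACEHOLDER",
    }],
}

choice = pick_affordable_option(sample_challenge, max_usd=Decimal("0.05"))
print(choice["price_usd"] if choice else "too expensive")
```

Keeping the budget check separate from signing means the agent can decline a call without ever touching its wallet.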

Everything you need to introspect is free and unauthenticated: /health, /catalog (every capability with its schema and price), and /llms.txt.

New: no crypto wallet? Pay by card.

Not every builder wants to fund a Base wallet just to try a service. So there's now a fiat rail alongside x402 — buy prepaid credits with a card once, then call any capability with a bearer token: no wallet, no gas, no per-call signing.

POST /v1/fiat/checkout   {"bundle":"starter"}      → returns a Stripe checkout URL
# after paying:
GET  /v1/fiat/token?session_id=...                 → returns your bearer token
GET  /cap/<name>   Authorization: Bearer <token>   → 200 OK + data  (1 credit/call)

Bundles: starter $5 / 100 calls · pro $30 / 1,000 · scale $200 / 10,000. Same 210 capabilities, a card instead of a wallet.
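
For a quick comparison between bundles, dividing each price by its call count gives the effective per-call rate. This is plain arithmetic on the figures above; the dictionary layout and function name are just for illustration.

```python
# (price in USD, number of calls) for each prepaid bundle listed above
bundles = {"starter": (5, 100), "pro": (30, 1_000), "scale": (200, 10_000)}


def per_call_usd(price_usd: float, calls: int) -> float:
    """Effective cost of one call when the whole bundle is used."""
    return price_usd / calls


for name, (price, calls) in bundles.items():
    print(f"{name}: ${per_call_usd(price, calls):.3f} per call")
```

The per-call rate only holds if every credit in the bundle gets used; unused credits raise the effective price.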

A recurring use-case worth calling out: counterparty risk

If your agent pays other agents, you have to screen who you pay — continuously. That's a subscription, not a one-off. One USDC payment opens a window; then you screen unlimited Base addresses for the life of it:

POST /v1/subscribe/risk-retainer-30d   ($25, 30-day window)   → capability token
GET  /v1/risk/{address}   Authorization: Bearer <token>       → { score, band, factors }

(There's a 7-day window at $10 too.) Scores come from live on-chain data — concentration, wallet age, transaction-graph diversity, exposure to flagged addresses.

An honest note on the market

The agent-payments space is young and noisy. Independent analysts (Artemis) estimate a large share of x402 volume is wash or test traffic, and total volume has swung wildly cycle to cycle. So I don't optimize for leaderboard rankings — the only signal worth trusting is settlement-logged, on-chain-verified demand: a real party deciding, with its own money, that a capability was worth it. If you're building agents that spend, that's the bar worth holding for the services you consume, too.

How to add x402 pay-per-call data tools to your Claude Code agent — no API key needed

~K¹yle Million — Fri, 05 Jun 2026 23:45:49 +0000

Most data APIs charge a monthly fee. Your agent might call the endpoint twice in a month, but you're still paying the full subscription. There's a better model.

x402 is an HTTP payment protocol built on the HTTP 402 status code — a code that's been in the spec since 1996 but never implemented at scale until now. Coinbase built the facilitator layer last year. The protocol is live on Base mainnet. AI agents can now pay exactly for what they use, in USDC, with no account or API key required.

I built The Stall around this protocol — a live MCP server with 32 data capabilities, each priced per call in USDC. Here's how to wire it into Claude Code in about 3 minutes, and what x402 actually looks like in the request cycle.

How x402 works

The flow is simple:

1. Your agent sends a normal HTTP request to a capability endpoint
2. The server responds with HTTP 402 + a payment challenge (a JSON blob describing amount, currency, facilitator, recipient address)
3. Your x402-enabled client reads the challenge, constructs a signed USDC payment header, and resends the original request with the X-PAYMENT header attached
4. The server verifies the payment with the Coinbase facilitator and returns the data
5. The agent gets back structured JSON, plus a receipt in the response headers

The x402 TypeScript SDK handles steps 2–4 automatically if you're building a coded agent. For Claude Code using MCP, the server manages this transparently through the MCP transport layer.
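
The challenge-and-retry cycle described above can be sketched in Python with the network and the wallet both stubbed out. This is a sketch under assumptions, not the SDK's implementation: the `fetch_with_payment` name, the injected `transport` and `sign` callables, and the base64-encoded JSON layout of the `X-PAYMENT` value are all assumed for illustration.

```python
import base64
import json
from typing import Callable

# A transport takes (url, headers) and returns (status, response_headers, body_dict).
Transport = Callable[[str, dict], tuple]


def fetch_with_payment(url: str, transport: Transport, sign: Callable[[dict], dict]):
    """Request `url`; on a 402, sign the first offered option and retry once."""
    status, _headers, body = transport(url, {})
    if status != 402:
        return status, body
    requirement = body["accepts"][0]
    payment = {
        "x402Version": body.get("x402Version", 1),
        "scheme": requirement["scheme"],
        "network": requirement["network"],
        "payload": sign(requirement),  # signed authorization produced by the wallet
    }
    # Assumed encoding: base64 of the JSON payment object.
    header_value = base64.b64encode(json.dumps(payment).encode()).decode()
    status, _headers, body = transport(url, {"X-PAYMENT": header_value})
    return status, body


def fake_server(url: str, headers: dict):
    """Stand-in for a capability endpoint: 402 first, data once a payment header arrives."""
    if "X-PAYMENT" not in headers:
        offer = {"scheme": "exact", "network": "base", "maxAmountRequired": "30000"}
        return 402, {}, {"x402Version": 1, "accepts": [offer]}
    json.loads(base64.b64decode(headers["X-PAYMENT"]))  # must decode cleanly
    return 200, {}, {"symbol": "AAPL"}


def fake_sign(requirement: dict) -> dict:
    """Placeholder signer; a real one would produce an EIP-3009 authorization."""
    return {"signature": "0xPLACEHOLDER"}


result = fetch_with_payment("https://example.invalid/cap/us-stock-price", fake_server, fake_sign)
print(result)
```

Injecting the transport and signer keeps the control flow testable without a funded wallet or a live endpoint.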

Claude Code setup

Add The Stall to your Claude Code MCP configuration:

{
  "mcpServers": {
    "the-stall": {
      "url": "https://the-stall.intuitek.ai/mcp",
      "transport": "streamable-http"
    }
  }
}

That's the entire setup for most use cases. When Claude routes a tool call through The Stall, the MCP transport handles the x402 payment cycle — your agent doesn't need to implement anything.

For programmatic agents that call the REST endpoints directly, add the x402 SDK:

npm install x402

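import { wrapFetchWithPayment } from "x402/client";
import { privateKeyToAccount } from "viem/accounts";

// Assumption: the paying key is supplied via a PRIVATE_KEY env var (0x-prefixed hex).
// Fund this wallet with a few dollars of USDC on Base mainnet.
const walletClient = privateKeyToAccount(process.env.PRIVATE_KEY as `0x${string}`);
const payingFetch = wrapFetchWithPayment(fetch, walletClient);

// The wrapped fetch answers the 402 challenge and retries with the payment header.
const response = await payingFetch(
  "https://the-stall.intuitek.ai/cap/us-stock-price?symbol=AAPL"
);
const data = await response.json();
// data: { symbol, price, change_percent, market_cap, ... }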

What's available

The catalog (https://the-stall.intuitek.ai/catalog) currently lists 32 capabilities. Per-call pricing for 30 of them:

Capability	Price	What it returns
`us-stock-price`	$0.030	Real-time US equity price + change %
`equity-technicals`	$0.490	RSI, MACD, Bollinger Bands, support/resistance
`market-intelligence`	$0.500	Which x402 endpoints have live on-chain volume
`macro-indicators`	$0.008	GDP, CPI, Fed rate, unemployment, yield curve
`commodity-futures`	$0.010	Gold, crude, nat gas, wheat — live front-month
`crypto-top-movers`	$0.008	Top gainers/losers/mcap from CoinGecko top 100
`company-intel`	$0.012	SEC EDGAR due diligence: name, SIC, filing history for any US public company by ticker
`defi-yields`	$0.025	Top DeFi yield pools by APY
`prediction-markets`	$0.050	Top Polymarket markets + crowd probabilities
`concentration-risk-score`	$0.100	HHI-based wallet concentration risk
`solana-token-risk`	$0.350	Rug-pull + scam detector for Solana SPL tokens
`evm-token-security`	$0.007	Honeypot + rug detector for any EVM chain
`funding-rates`	$0.020	Perp funding rates for 200+ assets
`eth-block`	$0.002	Block header + transaction count
`gas-prices`	$0.005	Current gas + EIP-1559 fee recommendations
`forex-rates`	$0.005	170+ fiat exchange rates
`dex-trending-pools`	$0.015	Trending DEX pools by buy pressure
`stablecoin-watch`	$0.050	Depeg monitor for major stablecoins
`wallet-screener`	$0.010	Risk profile for any EVM wallet
`korean-market-movers`	$0.010	Top movers across 260+ KRW markets (Upbit)
`github-repo-intel`	$0.010	Stars, forks, activity score for any GitHub repo
`npm-lookup`	$0.007	Weekly downloads + metadata for any npm package
`pypi-lookup`	$0.007	Downloads + metadata for any PyPI package
`hn-search`	$0.010	Hacker News search with story + comment data
`market-overview`	$0.100	Broad market snapshot (indices, crypto, sector)
`market-sentiment`	$0.015	Fear & Greed Index + VIX + put/call ratio
`generate-meme`	$0.005	AI-generated meme with context-appropriate caption
`ping`	$0.001	Health check + connectivity test
`weather`	$0.010	Weather for any location
`tx-explainer`	$0.014	Human-readable explanation of any on-chain tx

Actual costs for typical agent patterns

The practical math for a data-heavy agent session:

Check stock price once: $0.03
Full technical analysis on 5 tickers: $2.45
Macro snapshot + commodity check on every run: $0.02
30-day portfolio monitoring (1 check/day, 10 tickers): $9.00/month

Compare to Polygon.io starter ($29/month) or Alpha Vantage premium ($50/month) for similar data. If your agent doesn't run daily, pay-per-call wins the margin math.
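
The figures above, and the point at which a flat subscription starts to win, can be checked with a few lines of Python. Prices come from the table; the `break_even_calls` helper is only an illustration.

```python
import math
from decimal import Decimal

PRICE = {
    "us-stock-price": Decimal("0.030"),
    "equity-technicals": Decimal("0.490"),
    "macro-indicators": Decimal("0.008"),
    "commodity-futures": Decimal("0.010"),
}

technicals_5 = 5 * PRICE["equity-technicals"]                # 5 tickers
macro_plus_commodities = PRICE["macro-indicators"] + PRICE["commodity-futures"]
monitoring_30d = 30 * 10 * PRICE["us-stock-price"]           # 30 days x 10 tickers


def break_even_calls(subscription_usd: Decimal, price_per_call: Decimal) -> int:
    """Smallest number of calls whose pay-per-call cost reaches the subscription price."""
    return math.ceil(subscription_usd / price_per_call)


print(technicals_5, macro_plus_commodities, monitoring_30d)
print(break_even_calls(Decimal("29"), PRICE["us-stock-price"]))
```

At $0.03 per price check, a $29/month plan only pays for itself past roughly 967 checks a month, which is about 32 a day.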

The discovery angle — how agents find x402 services

The protocol is live, but agent-to-agent discovery is still early. Currently The Stall is listed in:

MCP Registry (official): ai.intuitek.the-stall/the-stall v3.1.0
Smithery.ai: The Stall — 32 tools enumerated
CDP x402 Bazaar — seeded, async indexing
8+ awesome-mcp-servers / awesome-x402 catalog PRs pending merge

The emerging pattern for agent discovery is: MCP registry for schema discovery, x402 Bazaar for commercial discovery (services that accept payment), Smithery for Claude-native discovery. The Stall is live on all three.

STALL v3.1.0 — 6 new data capabilities this week (32 total): macro, commodities, company intel, crypto, sports

~K¹yle Million — Fri, 05 Jun 2026 23:45:44 +0000

Six new capabilities went live in The Stall this week. The server is now at 32 capabilities on the MCP registry — all pay-per-call, all sourced from free upstream APIs with no rate-limit sharing.

Quick summary of what's new.

`macro-indicators` — $0.008 per call

Returns the current macro environment snapshot for US markets:

GDP growth rate (current quarter, QoQ %)
CPI (headline + core, YoY %)
Fed Funds Rate (current target range)
Unemployment rate
Yield curve (2Y, 10Y, 30Y, spread)
Dollar strength index (DXY)

The use case: any agent building market analysis or portfolio context needs macro backdrop. Rather than scraping FRED or paying $50/month for a macro data subscription, pay $0.008 when you need it.

`commodity-futures` — $0.010 per call

Returns live front-month futures prices for major commodity categories:

Energy: WTI crude, natural gas, heating oil
Metals: gold, silver, copper, platinum
Agriculture: wheat, corn, soybeans, coffee
Indicators: contango/backwardation flag per contract

The use case: agents monitoring commodity exposure in portfolios, or doing macro analysis where commodity prices are leading indicators (crude → energy sector, copper → industrial activity, wheat → food CPI).

`company-intel` — $0.012 per call

Returns SEC EDGAR due diligence data for any US public company by ticker symbol:

Legal name, CIK, SIC industry code
State of incorporation, fiscal year end, SEC filer category
2-year filing history (10-K/10-Q/8-K counts and most-recent dates)
EDGAR URL for direct link to filings

The use case: agents doing research, valuation, or regulatory assessment on public companies. Free upstream: US government SEC EDGAR API — always current, no key required.

`crypto-top-movers` — $0.008 per call

Returns real-time cryptocurrency market snapshot from CoinGecko top 100:

Top 5 gainers by 24h % change (stablecoins excluded)
Top 5 losers by 24h % change
Top 10 by market cap with current price and 24h change
Global stats: total market cap, BTC dominance %, 24h volume

The use case: agents doing portfolio rebalancing, trading trigger detection, or market regime reads. One call replaces fetching individual coin prices.

`crypto-news-impact` — $0.008 per call

Returns the latest cryptocurrency news headlines from CoinDesk with live price correlation for mentioned assets:

Up to 10 recent headlines with title, URL, and publish timestamp
Primary category (Markets, Tech, Policy, Finance, …)
Sentiment signal (bullish/bearish/neutral) derived from headline keywords and categories
Mentioned coins enriched with current USD price and 24h % change

The use case: agents doing pre-trade research, portfolio sentiment analysis, or market context before executing crypto tasks. Combines news and prices in one call — no separate news subscription or price lookup required. Source: CoinDesk RSS (5-min TTL) + CoinGecko.

Combined cost for a full cross-asset snapshot

macro-indicators:     $0.008
commodity-futures:    $0.010
market-overview:      $0.100
market-sentiment:     $0.015
crypto-top-movers:    $0.008
crypto-news-impact:   $0.008
company-intel:        $0.012
                      ------
Full context sweep:   $0.161 per sweep

For an agent that runs a daily briefing sweep, that's $4.83 over a 30-day month. Less than a vending machine coffee.
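
Summing the listed prices and multiplying by 30 daily runs gives the monthly figure directly. A short Python check, using `Decimal` to avoid floating-point drift on sub-cent prices:

```python
from decimal import Decimal

sweep = {
    "macro-indicators": Decimal("0.008"),
    "commodity-futures": Decimal("0.010"),
    "market-overview": Decimal("0.100"),
    "market-sentiment": Decimal("0.015"),
    "crypto-top-movers": Decimal("0.008"),
    "crypto-news-impact": Decimal("0.008"),
    "company-intel": Decimal("0.012"),
}

per_sweep = sum(sweep.values())   # one call to each of the seven capabilities
monthly = per_sweep * 30          # one sweep per day over a 30-day month

print(f"per sweep: ${per_sweep}, 30 days: ${monthly:.2f}")
```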

`sports-prediction` — $0.005 per call

Returns today's (or any date's) sports games with team win-loss records, venue, scheduled start time, and live score. Supports NFL, NBA, MLB, NHL, NCAAF, NCAAB.

All six major US leagues in one call
Team records (W-L) for both home and away
Venue and broadcast network where available
Status: scheduled, in-progress, or final with current score
Date parameter: pull any date's schedule, not just today

The use case: agents doing sports content generation, prediction-market research, or fantasy sports analysis. A single call replaces fetching team records from multiple endpoints. At $0.005/call it is one of the cheapest data caps on The Stall, because the ESPN public API is completely free.

GET https://the-stall.intuitek.ai/cap/sports-prediction?sport=nba
GET https://the-stall.intuitek.ai/cap/sports-prediction?sport=mlb&team=Yankees
GET https://the-stall.intuitek.ai/cap/sports-prediction?sport=nfl&date=2026-09-10

Access

Same MCP config as before:

{
  "mcpServers": {
    "the-stall": {
      "url": "https://the-stall.intuitek.ai/mcp",
      "transport": "streamable-http"
    }
  }
}

Or REST:

GET https://the-stall.intuitek.ai/cap/macro-indicators
GET https://the-stall.intuitek.ai/cap/commodity-futures?category=metals
GET https://the-stall.intuitek.ai/cap/company-intel?ticker=AAPL
GET https://the-stall.intuitek.ai/cap/crypto-top-movers
GET https://the-stall.intuitek.ai/cap/crypto-news-impact?limit=5
GET https://the-stall.intuitek.ai/cap/sports-prediction?sport=nba

Full catalog: https://the-stall.intuitek.ai/catalog

~K¹ / IntuiTek¹ — built by Aegis

Claude Code Hooks: Automate What Happens Before and After Every Tool Call

~K¹yle Million — Fri, 05 Jun 2026 17:32:57 +0000

Most Claude Code deployments are missing a critical layer.

You've configured your agent. You've written your CLAUDE.md. Your tools are permitted. The agent runs — and it does exactly what you told it to. But then things happen around the tool calls that you never asked for and can't control: files get written without being linted, git commits happen without tests running, shell commands execute without audit logs.

Hooks fix this. They're the event system underneath Claude Code that most people haven't touched yet.

What Hooks Actually Are

Hooks are shell commands that Claude Code executes automatically in response to specific events — before a tool runs, after it completes, when you submit a prompt, when the agent stops. They run outside Claude's reasoning loop. The agent can't override them. They just fire.

This distinction matters. A prompt instruction like "always run tests before committing" depends on Claude following it. A hook that runs tests before every Bash call doesn't depend on Claude at all. It's guaranteed execution.

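Four hook events cover most use cases (Claude Code defines others, such as Notification and PreCompact, that this post doesn't use):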

Event	When It Fires
`PreToolUse`	Before any tool executes — can block the tool
`PostToolUse`	After any tool completes
`UserPromptSubmit`	When you hit enter on a prompt
`Stop`	When Claude finishes responding

Where Hooks Live

Hooks are configured in .claude/settings.json — the same file that controls tool permissions. Structure:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "echo '[AUDIT] Bash about to run' >> ~/audit.log"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Write",
        "hooks": [
          {
            "type": "command",
            "command": "f=$(jq -r '.tool_input.file_path'); prettier --write \"$f\" 2>/dev/null || true"
          }
        ]
      }
    ]
  }
}

The matcher field accepts a tool name (Bash, Write, Edit, Read), a regex-style pattern such as Edit|Write, or * to match everything.

Context Available to Hooks

Claude Code passes context to every hook execution as a JSON object on stdin:

session_id          # Current session identifier
cwd                 # Working directory
hook_event_name     # Which event fired (PreToolUse, PostToolUse, ...)
tool_name           # Which tool is about to run / just ran
tool_input          # Tool input fields (nested object)
tool_response       # Tool output (PostToolUse only)

For a Write tool call, tool_input carries file_path and content. For Bash, it carries command. A hook usually pulls out what it needs with jq, for example jq -r '.tool_input.command'. The project root is also exported as the CLAUDE_PROJECT_DIR environment variable. These fields are what make hooks genuinely useful: you're not just running a static script, you're reacting to exactly what Claude is about to do.

Blocking Tool Execution

PreToolUse hooks can block a tool from running. If the hook command exits with status code 2, Claude Code will not execute the tool, and whatever the hook wrote to stderr is fed back to Claude, which has to respond to it. Other non-zero exit codes are reported as non-blocking hook errors and the tool still runs.

This is how you build hard constraints:

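#!/bin/bash
# block_rm.sh: prevent any rm -rf from running
# The hook payload arrives as JSON on stdin; jq extracts the Bash command.
command=$(jq -r '.tool_input.command // empty')
if echo "$command" | grep -q "rm -rf"; then
  echo "BLOCKED: rm -rf is not permitted by hook policy" >&2
  exit 2   # exit code 2 tells Claude Code to block the tool call
fi
exit 0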

Hook in settings.json:

{
  "PreToolUse": [
    {
      "matcher": "Bash",
      "hooks": [
        {
          "type": "command",
          "command": "bash ~/.claude/hooks/block_rm.sh"
        }
      ]
    }
  ]
}

Now any Bash command containing the literal string rm -rf is blocked, regardless of what Claude reasons. The model can't talk its way past a hook, though variants such as rm -fr or rm -r -f need their own patterns.
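
The same policy can be written as a Python hook, which makes the matching logic easier to test in isolation. This sketch assumes the hook payload is JSON with `tool_name` and `tool_input.command` fields and that exit code 2 is the blocking status, as the Claude Code hooks reference describes. The `decide` function and the regex are illustrative, not part of Claude Code.

```python
import json
import re

# Matches rm followed by one flag group containing both r (or R) and f, e.g. -rf, -fr, -Rf.
RM_RF = re.compile(r"\brm\s+-(?=\w*[rR])(?=\w*f)\w+")


def decide(payload: dict):
    """Return (exit_code, message); exit code 2 asks Claude Code to block the call."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    if RM_RF.search(command):
        return 2, "BLOCKED: rm -rf is not permitted by hook policy"
    return 0, ""


# In a real hook script the payload would come from stdin:
#   code, message = decide(json.load(sys.stdin))
#   print(message, file=sys.stderr); sys.exit(code)
sample = json.loads('{"tool_name": "Bash", "tool_input": {"command": "rm -rf build/"}}')
print(decide(sample))
```

The regex is slightly broader than the literal grep, but it still misses split flags such as rm -r -f, so treat it as one layer rather than a complete guard.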

Five Hooks Worth Deploying Today

1. Auto-format on every Write

{
  "PostToolUse": [
    {
      "matcher": "Write",
      "hooks": [
        {
          "type": "command",
          "command": "f=$(jq -r '.tool_input.file_path'); case \"${f##*.}\" in py) black \"$f\" 2>/dev/null;; js|ts|tsx) prettier --write \"$f\" 2>/dev/null;; esac; exit 0"
        }
      ]
    }
  ]
}

Every file Claude writes gets formatted immediately. No separate cleanup step.

2. Audit log for all Bash commands

{
  "PreToolUse": [
    {
      "matcher": "Bash",
      "hooks": [
        {
          "type": "command",
          "command": "input=$(cat); echo \"$(date -u +%Y-%m-%dT%H:%M:%SZ) | $(echo \"$input\" | jq -r .session_id) | CMD: $(echo \"$input\" | jq -r .tool_input.command)\" >> ~/intuitek/logs/bash_audit.log"
        }
      ]
    }
  ]
}

Every command Claude runs is timestamped and logged. You can replay exactly what happened in any session.
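
If the audit hook grows beyond a one-liner, the formatting step is easy to move into Python. The sketch below assumes the hook payload is JSON carrying `session_id` and `tool_input.command`. The `audit_line` name is hypothetical, and the clock is passed in so the function stays deterministic.

```python
import json
from datetime import datetime, timezone


def audit_line(payload: dict, now: datetime) -> str:
    """Format one audit-log line: UTC timestamp | session | command."""
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    session = payload.get("session_id", "unknown")
    command = payload.get("tool_input", {}).get("command", "")
    # json.dumps quotes the command and escapes newlines, so each entry stays on one line.
    return f"{stamp} | {session} | CMD: {json.dumps(command)}"


example_payload = {"session_id": "abc", "tool_input": {"command": "ls -la"}}
example_time = datetime(2026, 6, 5, 17, 32, 57, tzinfo=timezone.utc)
print(audit_line(example_payload, example_time))
```

Quoting the command matters for replay: a multi-line heredoc would otherwise break the one-entry-per-line assumption.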

3. Run tests before every git commit

{
  "PreToolUse": [
    {
      "matcher": "Bash",
      "hooks": [
        {
          "type": "command",
          "command": "if jq -r '.tool_input.command' | grep -q 'git commit'; then cd \"$CLAUDE_PROJECT_DIR\" && out=$(npm test --silent 2>&1) || { echo \"$out\" | tail -5 >&2; exit 2; }; fi; exit 0"
        }
      ]
    }
  ]
}

Tests run before every commit. If they fail, Claude sees the output and can fix the issue before the commit lands.

4. Prompt-submitted notification

{
  "UserPromptSubmit": [
    {
      "matcher": "*",
      "hooks": [
        {
          "type": "command",
          "command": "bash ~/intuitek/notify.sh \"🟢 Claude Code session prompt submitted\" 2>/dev/null; exit 0"
        }
      ]
    }
  ]
}

Get a Telegram ping every time a prompt is submitted. Useful for async monitoring when you're away from the machine.

5. Write session summary on stop

{
  "Stop": [
    {
      "matcher": "*",
      "hooks": [
        {
          "type": "command",
          "command": "echo \"$(date -u +%Y-%m-%dT%H:%M:%SZ) | session ended | $(jq -r .session_id)\" >> ~/intuitek/logs/sessions.log"
        }
      ]
    }
  ]
}

Every Stop event is logged with a timestamp and session ID; in a headless run that marks the end of the session. Useful for debugging when cron-triggered headless runs complete.

Hooks in Headless Mode

Hooks fire in headless (-p) mode too. This is where they're most valuable for autonomous agents.

When Claude Code runs headless via cron:

cd ~/project && claude -p "process inbox files" --allowedTools "Bash(*),Read(*),Write(*)"

Every tool call still triggers configured hooks. Your audit log gets written. Your formatter runs. Your notification fires. The agent is fully constrained and observable even without a human watching.

The Pattern That Makes Hooks Powerful

Hooks + CLAUDE.md + tool permissions form three independent enforcement layers:

CLAUDE.md — tells Claude what to do (reasoning-level, can be argued with)
Tool permissions — controls which tools are available (binary allow/deny)
Hooks — automates behavior around every tool call (guaranteed execution)

An agent relying only on CLAUDE.md instructions for safety has a reasoning dependency. Hooks eliminate that dependency for specific behaviors. Use CLAUDE.md to shape intent. Use hooks to enforce the behaviors that can't be left to intent.

Common Mistakes

Hooks that always block: a PreToolUse hook that unconditionally exits with the blocking code (2) stops every tool call. Always exit 0 at the end unless you specifically want to block.

Slow hooks — Every tool call waits for the hook to complete. A hook that makes an API call or runs a slow script adds latency to every action. Keep hooks fast or run slow work in the background (your_slow_script.sh &).

Forgetting PostToolUse for Edit — If you auto-format Write but not Edit, Claude can bypass formatting by editing instead of writing from scratch. Apply formatting hooks to both.

Not testing hooks before deployment — Run claude -p "echo test" --allowedTools "Bash(*)" with your hooks active and verify the audit log or notification appears before committing the config.

Where to Start

If you're deploying Claude Code for anything serious, add the audit hook first. Before anything else. You want a timestamped log of every Bash command Claude runs — not because you expect problems, but because when something unexpected happens you'll want to know exactly what Claude did.

After that: auto-formatting, then blocking.

The rest follows naturally once you see how the event system works.

W. Kyle Million (K¹) — IntuiTek¹

Building autonomous AI infrastructure in Poplar Bluff, Missouri.

Tags: claudecode, devtools, aiagents, webdev

I Told Claude Code to Build an Autonomous DeFi Liquidation Bot. Here's What Actually Happened

~K¹yle Million — Fri, 01 May 2026 01:03:49 +0000

The goal was simple: build something that generates revenue without me touching it.

I gave Claude Code a directive. No step-by-step instructions, no hand-holding. Just: "Build an autonomous DeFi strategy on Base that scans for profit, executes when conditions are met, and sends me a Telegram when it does something."

Three weeks and six versions later, here's what actually happened.

Version 1–4: DEX Arbitrage Was Already Dead

The first instinct was DEX arb. Flash loan USDC, swap on Aerodrome, swap back on Uniswap V3, pocket the spread. Standard stuff.

Claude Code built it correctly. Pool detection, route scoring, quote fetching, flash loan execution via Balancer V2. The math checked out. The code was clean.

The problem was the market.

Flashblocks on Base (200ms preconfirmations inside 2-second blocks) mean MEV bots are processing opportunities faster than any external scanner can even detect them. By the time an off-chain process sees a spread and submits a transaction, it's gone. The pool prices I was quoting were stale by the time the transaction landed.

After four versions, 23 pools, 146 routes, and zero profitable executions, the scanner was honest about it: best spread = -$1.06 per $1000 flash. Market at equilibrium.

That's not a bug. That's correct behavior in a fully competitive market.

The Pivot: Morpho Blue Liquidations

I sent a research document about Morpho Blue's liquidation mechanics and told Claude Code to evaluate whether liquidation bots were a better fit than arb.

The analysis was thorough. Morpho Blue has an unusual structure: any address can liquidate any undercollateralized position. No whitelist. No keeper registry. You bring a flash loan, repay the borrower's debt, and seize their collateral at a discount.

The key insight: liquidation opportunities aren't race conditions in the same way as DEX arb. When a position crosses the liquidation threshold (health factor < 1.0), it doesn't disappear within a fraction of a second. It sits there until someone closes it. The window is minutes to hours, not milliseconds.

This was the right target.

What Claude Code Built

The v7 liquidation daemon is 829 lines of Python. Key components:

Borrower indexing: The daemon seeds from Morpho's API (129 borrowers across AERO/USDC, weETH/WETH, cbXRP/USDC, uniBTC/USDC, wstETH/msETH markets), then stays live by watching Borrow events via eth_getLogs.

Health factor calculation: For each borrower, it reads their supply/borrow shares, converts to assets using the current market state, fetches the oracle price, and computes HF = (collateral × oracle_price/1e36 × LLTV) / borrowed_assets.

Liquidation execution: When HF < 1.0, it constructs a flash loan from Balancer V2, repays the bad debt, and seizes collateral in a single atomic transaction via a deployed LiquidationExecutor.sol.

Watchdog: A bash watchdog loop restarts the daemon on crash and fires a Telegram alert on repeated failures.
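
The health factor calculation above is small enough to sketch directly. The Python below is not the daemon's code: it assumes the oracle price is scaled by 1e36 (as in the formula above) and that LLTV is an 18-decimal fixed-point value. The example position is hypothetical: 1,000 units of an 18-decimal collateral token priced at $1.00 against a 6-decimal debt token, with a 62.5% LLTV.

```python
ORACLE_PRICE_SCALE = 10**36  # price scaling used in the formula above
WAD = 10**18                 # assumed fixed-point scale for LLTV


def health_factor(collateral: int, oracle_price: int, lltv_wad: int, borrowed_assets: int) -> float:
    """HF = (collateral x price / 1e36 x LLTV) / borrowed, all inputs in raw token units."""
    if borrowed_assets == 0:
        return float("inf")
    # Integer math first (rounding down), then a single float division at the end.
    max_borrow = collateral * oracle_price // ORACLE_PRICE_SCALE * lltv_wad // WAD
    return max_borrow / borrowed_assets


# Hypothetical position, not taken from a real market.
collateral = 1_000 * 10**18        # 1,000 collateral tokens (18 decimals)
oracle_price = 10**24              # $1.00, scaled by 10**(36 + 6 - 18)
lltv = 625 * 10**15                # 0.625 in WAD
borrowed = 700 * 10**6             # 700 debt tokens (6 decimals)

hf = health_factor(collateral, oracle_price, lltv, borrowed)
naive = (collateral / borrowed) * 0.625   # raw-unit ratio with no oracle price term

print(f"HF with oracle price: {hf:.4f}")
print(f"HF without it:        {naive:.3e}")
```

With the price term the position reads as liquidatable (HF below 1.0); without it the raw-unit ratio lands near 10^12 and nothing ever triggers.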

The Bugs That Almost Killed It

Here's where it gets honest.

Bug 1 — The HF formula was wrong. The initial implementation computed health factor as (collateral / borrowed) × LLTV. This is missing the oracle price term. Without it, HF is dimensionally wrong — it compares raw units, not dollar values. For a market where collateral is AERO and debt is USDC, a raw ratio gives you something like 10^12, which is never < 1.0. Zero liquidations possible.

Claude Code caught this itself during a v7 test run when the daemon logged HF=2847391847483.5 for a position that was obviously underwater according to Morpho's own interface. It diagnosed the missing oracle price term and rewrote the formula.

Bug 2 — The RPC rate-limiting cascade. The daemon uses Infura for multicall batches (reading all borrower positions per block). Infura's free tier has request limits. At 2-second Base blocks, the daemon was making more than 400 RPC calls per minute, and the free tier was exhausted within hours.

The fix: use eth_getLogs via public nodes (mainnet.base.org) for the slow indexing pass, and reserve the paid RPC for the latency-sensitive multicall reads on each new block.

Bug 3 — The stale endpoint problem. This one bit me in production yesterday. The daemon reads its RPC endpoint from environment variables at startup. I updated .env to switch from Infura to a Coinbase Base RPC (which is free and doesn't rate-limit), but the running daemon held onto the old Infura URL. It kept silently failing multicalls with 402 errors for 20 hours.

The fix: kill the process, let the watchdog restart it with the new .env. The daemon now logs a heartbeat every 150 blocks so silence is detectable.
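
To put the numbers from Bugs 2 and 3 in perspective, here is the arithmetic in Python, using only figures stated above (2-second blocks, roughly 400 calls per minute, a heartbeat every 150 blocks):

```python
BLOCK_TIME_S = 2          # Base block time, in seconds
CALLS_PER_MINUTE = 400    # approximate RPC load before the fix
HEARTBEAT_BLOCKS = 150    # blocks between heartbeat log lines

blocks_per_minute = 60 // BLOCK_TIME_S
calls_per_block = CALLS_PER_MINUTE / blocks_per_minute
calls_per_day = CALLS_PER_MINUTE * 60 * 24
heartbeat_interval_s = HEARTBEAT_BLOCKS * BLOCK_TIME_S

print(f"{calls_per_block:.1f} calls per block, {calls_per_day:,} calls per day")
print(f"heartbeat every {heartbeat_interval_s // 60} minutes")
```

A heartbeat every five minutes means a stalled daemon shows up as a visible gap in the log rather than as twenty hours of silence.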

Where It Stands

Right now, PID 536542:

Subscribed to newHeads on Base via WebSocket
Tracking 129 borrowers across 5 markets
Reading oracle prices and computing HF on every block
Checking Morpho Blue and Moonwell simultaneously

The markets are currently healthy. All tracked positions have HF > 1.0. No liquidation opportunities pending.

What's blocking live execution: the hot wallet (0x793e...) has 0.001 ETH. The flash loan contract needs ~0.009 ETH in gas to execute a liquidation. Until the wallet is funded, the daemon scans but can't pull the trigger.

This is not a software gap. It's a capital gap. The code is correct.

What Claude Code Can and Can't Do Here

Can do:

Build production-grade event-driven infrastructure from a research spec
Debug its own math errors when given live data to test against
Handle multi-contract interactions (Morpho + Chainlink + Balancer + Multicall3)
Run autonomously with watchdog recovery, disk caching, and Telegram alerting

Can't do (yet):

Fund its own gas wallet from zero capital
Predict when DEX markets are too efficient before building against them (though it assessed correctly once given data)
Guarantee correctness on first implementation — requires test runs against live data to surface the subtle math bugs

The honest summary: the infrastructure is sound. The engineering decisions are defensible. The remaining blocker is $25 of ETH.

The complete ops stack I used to manage this build — session memory, bash validation, coordinator resume integrity, compaction gate, and 11 other skills — is available at shopclawmart.com/@thebrierfox.

If you're building something like this and want to compare notes, drop a comment. The Morpho Blue liquidation market is open to anyone; I don't mind discussing what I found.

I Let My AI Agent Loose on 38 Broken Repos — Here's What She Built in 45 Minutes

~K¹yle Million — Thu, 23 Apr 2026 17:41:06 +0000

Last Tuesday, I opened Claude Code at 10am with no specific plan. I just said: "Audit every repo on my GitHub account and fix what you can."

By 11am, I had a working business intelligence system running automated competitive analysis, a live PWA deployed to GitHub Pages, and three other repositories that went from dead to functional. All without me writing a single line of code.

This isn't a hype post. I'm going to show you exactly what happened, what the outputs looked like, and what actually matters about it.

The Setup: 38 Repos, Most of Them Broken

I've been building AI systems for two years. My GitHub has 38+ repos under my handle, thebrierfox. Most of them are in a state I call "intent-coded" — I scaffolded the idea, got it partially working, and then my brain moved on to the next thing before the repo was actually useful.

Sound familiar?

The problem isn't starting. It's the follow-through that requires sitting down and grinding through error messages and edge cases when you've already seen the architecture in your head and it's not interesting anymore.

So on April 21, I gave Aegis — my Claude Code instance — a GitHub token and a simple directive: go audit everything, fix what you can, deploy what's close to working.

What She Found

The audit surfaced a pattern I'd suspected but never quantified: 90% of my repos weren't broken because the code was bad. They were stalled because of small, dumb things:

Missing permissions: contents: write in a GitHub Actions workflow (breaking automated commits for 4 months)
NaN values crashing a JSON serialization step before any output was produced
A hardcoded URL that had gone stale

These aren't architectural problems. They're the kind of thing that takes 10 minutes to fix once you find them — but "finding them" requires actually running the code, reading the error, and not getting distracted.
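
The NaN failure is a good example of how small these blockers are. The post doesn't say which stack the affected repo uses, so the following is a generic Python sketch rather than the actual fix: it replaces non-finite floats with None before serializing, and uses `allow_nan=False` so any that slip through fail loudly. The `sanitize` helper and the sample row are hypothetical.

```python
import json
import math


def sanitize(value):
    """Recursively replace NaN/inf floats with None so the result is valid JSON."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: sanitize(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize(item) for item in value]
    return value


row = {"sku": "EXAMPLE-1", "price": float("nan"), "competitors": [12.5, float("inf")]}

try:
    json.dumps(row, allow_nan=False)       # strict mode rejects NaN and inf
except ValueError as error:
    print("unsanitized:", error)

clean = json.dumps(sanitize(row), allow_nan=False)
print(clean)
```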

Aegis ran through 38 repos. She flagged 5 as near-operational with fixable blockers.

The One That Mattered: OneShot_v3

Of the five repos she fixed, one stands out.

OneShot_v3 is a competitive price intelligence system I built for a rental equipment business. The idea: scrape competitor pricing from Sunbelt and United Rentals, compare it against our catalog, surface items where we're competitively priced and items where we're not.

The code was basically done. The GitHub Actions automation had been broken for 4 months because of the missing contents: write permission. It would run, fail silently, and never commit the analysis results.

Aegis found the error, patched the workflow file, and ran the pipeline end-to-end.

The output:

242 Patriot Equipment SKUs scraped and analyzed
124 alert rows — items with ≥15% price variance vs. competitors
Key insight: man lifts and aerials are Patriot's biggest competitive advantage (22-30% cheaper than Sunbelt and United Rentals)

That insight had been sitting dormant in a broken pipeline for 4 months. One patch, one run, real data.

She also wired the Monday 11:30 UTC cron trigger, so the analysis now runs automatically every week without me touching it.

The Other Four

While OneShot was running, she was also working on the other repos:

NLP-Command-Center: Built a real orchestrator from scratch — 250 lines of Python that reads task JSONs, validates them, resolves tools from a toolbox definition, and executes actions with proper logging. The repo had the concept but no working implementation. Now it has one.

AegisRunner: Deployed as a live PWA at thebrierfox.github.io/AegisRunner — dynamic flow loader, JSON-defined steps, working in a browser. One remaining action for me to wire: set a secret in GitHub Actions settings.

aegis-roadmap: Built a 433-line runner that generates Task-Expertise Roadmap decks via claude -p on Max OAuth. Reads from a SQLite registry. Actually works.

intuitek-site: Discovered that the production site at intuitek.ai was deployed to a different Vercel account. Fixed the API key configuration issue that was causing /api/chat to 500.

What This Taught Me About Autonomous Agent Work

A few things became clear:

1. The bottleneck isn't code, it's attention. My repos weren't broken because I couldn't fix them. They were broken because fixing them wasn't interesting enough to command my attention past the "mostly working" stage. An agent that doesn't get bored is the unlock.

2. Near-completion repos are the highest ROI target. A repo at 80% done takes 10% more effort to finish and produces 100% of the value. Concept-stage repos are cheap to start and expensive to complete. Aegis found the 80% items and prioritized them. That's the right heuristic.

3. Weekly automation is more valuable than one-time scripts. OneShot ran once and produced 124 alert rows. It runs every Monday now and will produce a fresh set every week. The cron trigger is worth as much as the initial fix.

4. The agent needs a GitHub token and authority to push. This sounds obvious, but I had been running Aegis in a mode where she could read repos but not push to them. Giving her write access was the unlock that made all of this possible. Give your agent the permissions it needs to actually complete work.

The Toolchain (What Made This Possible)

This session used Claude Code on a Max subscription, which means zero marginal cost per turn, and that matters when you're running an agent through 38 repos. Aegis also had:

gh CLI with a GitHub token (for repo inspection and pushes)
python3 for data analysis and JSON manipulation
Read/write/bash tools in Claude Code's native capability set
CLAUDE.md with her operating doctrine so she could work autonomously without hand-holding

No special plugins. No vector databases. No multi-agent orchestration. Just a well-configured Claude Code instance with the right permissions and a clear directive.

What's Next

OneShot_v3 is now a product. It runs on autopilot. The Monday analysis will land in my email each week.

The remaining 33 repos are ranked in a backlog by a formula: revenue × alignment × proximity_to_operational / effort_to_ship. The top items are Million Family Rentals (property management system, needs 12h of work) and a skill packaging toolchain for the ClawMart marketplace.
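
The ranking formula can be sketched in a few lines of Python. The field names mirror the formula; the repo names, scales, and scores below are made-up placeholders for illustration, not the actual backlog values:

```python
from dataclasses import dataclass


@dataclass
class Repo:
    name: str
    revenue: float                   # expected revenue score (hypothetical scale)
    alignment: float                 # 0..1 fit with current goals (assumed scale)
    proximity_to_operational: float  # 0..1, how close the repo is to running
    effort_to_ship: float            # hours of remaining work

    @property
    def priority(self) -> float:
        # revenue * alignment * proximity_to_operational / effort_to_ship
        return (self.revenue * self.alignment
                * self.proximity_to_operational) / self.effort_to_ship


backlog = [
    Repo("repo-a", revenue=5.0, alignment=0.9, proximity_to_operational=0.8, effort_to_ship=12.0),
    Repo("repo-b", revenue=8.0, alignment=0.5, proximity_to_operational=0.3, effort_to_ship=40.0),
]

ranked = sorted(backlog, key=lambda r: r.priority, reverse=True)
for repo in ranked:
    print(f"{repo.name}: {repo.priority:.3f}")
```

Because effort sits in the denominator, a nearly finished repo with modest revenue can outrank a bigger idea that needs weeks of work.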

Aegis will work through them in order. I check outputs. I don't manage execution.

That's the pattern I've been building toward for two years: not "AI that helps me code" but "AI that runs operations." Last Tuesday was the first time it felt fully real.

If you're building production agent infrastructure, the patterns I've built into Aegis — loop termination, session memory, forked execution, cost-aware model routing — are available as skill packages at shopclawmart.com/@thebrierfox. Each one is a tested implementation you can drop into your own Claude Code setup.

~K¹ (W. Kyle Million) / IntuiTek¹

Agent Compaction Architecture: What Really Happens When Claude Code Hits Context Limits

~K¹yle Million — Wed, 22 Apr 2026 13:09:45 +0000

Section 1: The Silent Killer

When Claude Code's context window fills, the runtime does not hard-stop. It doesn't throw an error. It doesn't ask permission. It compacts.

Compaction is an automatic summarization step that fires when the token budget crosses a threshold. The mechanics are straightforward: the oldest turns in the conversation history are replaced with a compressed summary. Recent turns — the last several exchanges — are preserved verbatim. The summary takes the place of everything older.

From a token-budget perspective, this is correct behavior. There is no other option. You cannot run a stateful agent across a long task without some form of context management. The window is finite. The task is not.

The problem is the word "compressed." A summary is a lossy transformation. The compression ratio is high — many tokens of conversation history become a paragraph of summary. What survives that compression is a function of what the summarizer judged salient. Factual statements about what actions were taken survive well. Constraints survive partially. Nuanced reasoning about why a particular approach was chosen tends to survive poorly. Negative constraints — "don't touch X", "avoid this approach because..." — are especially vulnerable, because they are structurally underrepresented in summaries: what didn't happen takes up less surface area than what did.

Here is a concrete production failure I hit.

I had an agent working through a multi-step migration task. Early in the session, I established that a specific table in the database was read-only for this task — the tenant registry. There was active work happening on that table by another process, and any schema change would cause a cascade failure. I was explicit about it: "Do not touch the tenant_registry table. Do not add columns, do not create indexes, do not run any DDL against it."

The agent acknowledged this. It moved forward. It completed several unrelated subtasks. The context window filled. Compaction fired.

The summary captured the migration objective. It captured what had been completed. It mentioned the database was involved. It did not preserve the specific constraint about the tenant_registry table with enough fidelity to prevent the agent from running a DDL operation against it two tasks later when the migration naturally required cross-table work.

The operation succeeded at the database level. The cascade failure arrived async, from the other process. I found it in the error log four hours later.

Nothing in the session output flagged that compaction had occurred. Nothing in the agent's subsequent behavior signaled it had lost the constraint. It was reasoning correctly from the compressed state it had — that state just had a hole in it.

That is what makes compaction dangerous in autonomous operation. The agent doesn't know what it doesn't know. It reasons confidently from an incomplete picture, and the gaps are invisible from the inside.

Section 2: What Gets Lost and Why

Not all state is equally vulnerable to compaction. Understanding the failure modes requires a taxonomy.

Tool call results — high vulnerability

When the agent runs a Bash command and reads the output, that output lives in the conversation as a tool result. Tool results are often long — hundreds of lines of log output, full file contents, test results. They are also often used once: the agent processes the result, draws a conclusion, and the raw output becomes redundant.

From a summarization perspective, tool results are natural candidates for aggressive compression. The summary retains the conclusion: "tests passed", "file contains X", "service is running". The raw output is dropped.

This is fine when the raw output was truly just an input to a single conclusion. It is a problem when the raw output contained multiple relevant facts, and only one of them was acted on immediately. The rest are now gone. If a later step in the task needs one of those secondary facts, the agent will re-derive it, re-read the file, or get it wrong.

Intermediate conclusions — medium vulnerability

The agent builds up a model of the system as it works. "This service is stateless, so I can restart it without drain." "This config value is referenced in three places." "The test is flaky, not broken — ignore intermittent failures." These are conclusions drawn from evidence earlier in the session.

They are embedded in the conversation as reasoning traces — assistant turns explaining what the agent concluded and why. Summaries capture the highest-salience conclusions but flatten the reasoning. The "why" is the first thing to go.

When the "why" is gone, the agent may later reach the opposite conclusion from fresh evidence if that evidence is locally ambiguous. The earlier constraint has no backing anymore.

Explicit constraint acknowledgments — high vulnerability

"Remember, don't touch X." "Make sure to use approach Y for this module." "The client requires that output files use this exact naming convention."

Constraints stated conversationally, without a corresponding file artifact, are the most dangerous category. The agent acknowledged them. They shaped early decisions. But acknowledgment turns are short and structurally similar to each other — they compress heavily. After compaction, the summary may say "user gave several constraints about the build" without enumerating them.

The agent no longer has the specific list. It has a summary that there was a list.

Completed subtasks that weren't fully logged — low-to-medium vulnerability

Completed work leaves artifacts: files, database records, deployed services. Those artifacts exist independently of the conversation. The agent can re-inspect them.

The vulnerability here is more subtle: the decisions made during a subtask may be gone even when the subtask's outputs survive. The agent knows a file was written. It doesn't necessarily remember why it was structured that specific way, which means a later step that modifies that file may violate an architectural constraint that was obvious in the original subtask context.

Why summaries can't fully substitute for raw history

A summary is an agent-generated compression. Its quality depends on what the summarizing model judges worth preserving, which is a function of what seemed salient at summary generation time. Salience is local: the most recently discussed topics appear more important. Negative constraints are structurally underrepresented in summaries. Long reasoning chains compress to single-sentence conclusions.

The raw history is a ground truth. The summary is a lossy encoding. For short tasks with clear objectives, the loss is tolerable. For long tasks with accumulated constraints and interdependent decisions, the loss compounds across multiple compaction events.

Section 3: Compaction-Resistant Architecture

Four patterns. I use all of them in production. They compose — each layer backs up the others.

Pattern 1: Checkpoint Writes

At every significant milestone in a task, the agent writes the current state to a file. Not a summary of what it did — the live state that the next phase needs.

The checkpoint file is not documentation. It is a machine-readable context recovery artifact. The agent will read it at the start of each subsequent phase. If compaction fires, the next operation re-loads from the checkpoint rather than from conversation memory.

What belongs in a checkpoint:

Active constraints (including negative constraints — especially those)
Decisions made and the reason they were made
Current task state: what is complete, what is in progress, what is blocked
Any system facts that were discovered and are relevant going forward
Explicit re-statement of things that must not happen

The checkpoint is only useful if it is written before context-heavy operations. Writing it after means compaction may have already fired.

A checkpoint cadence that works: write before any operation that will consume more than a few thousand tokens (running tests, reading large files, invoking sub-agents, executing database migrations). Write at each logical phase boundary regardless of token consumption.

Pattern 2: Explicit State Re-Injection

Checkpoints are only useful if they are read. State re-injection means starting each major phase of a task by reading the relevant checkpoint files and explicitly restating the constraints into the current context before doing any work.

This is not redundant. After compaction, the conversation history is a summary. The most recent checkpoint is the last known-good full state. Reading it at phase start brings the full state back into the current context window, where it will remain verbatim for the duration of that phase's work.

The re-injection also serves as a correctness check: if the agent re-reads the checkpoint and notices that its current understanding diverges from what the checkpoint says, that divergence is a signal that something went wrong.

Re-injection should be explicit in the agent's prompt chain: "Before proceeding with phase N, read the phase N checkpoint file and confirm that all listed constraints are still active."
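
The divergence check can be made mechanical. A minimal sketch, assuming the checkpoint is a JSON file with a `constraints` list and that the agent can list the constraints it currently believes are active; `find_divergence` is a hypothetical helper name:

```python
import json
import tempfile
from pathlib import Path


def find_divergence(checkpoint_path: Path, recalled: list[str]) -> dict:
    """Compare constraints the agent recalls against the checkpoint on disk."""
    saved = set(json.loads(checkpoint_path.read_text()).get("constraints", []))
    current = set(recalled)
    return {
        "forgotten": sorted(saved - current),   # in the checkpoint, missing from recall
        "unexpected": sorted(current - saved),  # recalled, but never checkpointed
    }


# Demo with a throwaway checkpoint file
with tempfile.TemporaryDirectory() as tmp:
    cp = Path(tmp) / "checkpoint_pre_migration.json"
    cp.write_text(json.dumps({"constraints": [
        "No DDL against tenant_registry",
        "Migration must be additive only",
    ]}))
    report = find_divergence(cp, recalled=["Migration must be additive only"])
    print(report)
```

A non-empty `forgotten` list is the signal to stop and re-inject before continuing. Exact string matching is a simplification; an agent paraphrasing a constraint would need a looser comparison.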

Pattern 3: Compaction Detection

There is no native "compaction occurred" event exposed by Claude Code's context. You cannot query whether compaction has fired. But you can detect it indirectly.

Compaction detection relies on a sentinel: a value written to a file at task start and read into the conversation once. At each phase boundary, the agent is asked to reproduce that value from its own context, and the answer is compared against the file. If the agent can reproduce the sentinel value, the conversation history containing the sentinel read is still intact. If it cannot, compaction has likely compressed that turn.

More practically: you can detect behavioral evidence of compaction by testing the agent's recall of specific early-session constraints before proceeding. If it fails the recall test, you trigger a re-initialization sequence: read all checkpoint files, re-state all constraints, verify understanding before continuing work.

The detection overhead is low — a single file read and a short verification step. The cost of skipping it when compaction has fired is whatever damage the agent does while operating from an incomplete state.

Pattern 4: Session Segmentation

For tasks that will span many hours and many phases, a single ultra-long session is architecturally unsound. Multiple compaction events compound: the second compaction summarizes a history that already contains a summary. Information loss accelerates with each event.

Session segmentation means treating the task as a sequence of bounded sessions, each with a clean handoff file. Session N completes some work, writes a handoff file that captures the full state needed by session N+1, then exits cleanly. Session N+1 starts by reading the handoff file before doing anything else.

Each session starts fresh — full context window, no compaction debt. The handoff file is the only continuity mechanism, so it must be complete. This forces explicit articulation of state that might otherwise be assumed to be "in context."

The segmentation boundary should align with natural task phases. "Complete the schema migration and write a handoff file" is a clean segment. "Do some of the migration and some of the testing" is not.

Section 4: Code Examples

Checkpoint Write — Python

import json
from datetime import datetime, timezone
from pathlib import Path

def write_checkpoint(checkpoint_dir: str, phase: str, state: dict) -> Path:
    """
    Write a phase checkpoint before any context-heavy operation.
    Call this before running tests, reading large files, or invoking sub-agents.
    """
    path = Path(checkpoint_dir) / f"checkpoint_{phase}.json"
    # Create the checkpoint directory on first use so the write cannot fail on a missing path
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "phase": phase,
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "constraints": state.get("constraints", []),
        "decisions": state.get("decisions", {}),
        "do_not_touch": state.get("do_not_touch", []),
        "completed_tasks": state.get("completed_tasks", []),
        "in_progress": state.get("in_progress", ""),
        "facts": state.get("facts", {}),
    }
    path.write_text(json.dumps(payload, indent=2))
    return path


# Example usage before a database migration phase
write_checkpoint(
    checkpoint_dir="./outputs/session_checkpoints",
    phase="pre_migration",
    state={
        "constraints": [
            "Use WAL mode for all SQLite writes",
            "No DDL against tenant_registry table — active writes from separate process",
            "Output files must use snake_case naming convention",
        ],
        "do_not_touch": ["tenant_registry", "auth_tokens"],
        "decisions": {
            "schema_approach": "additive_only",
            "schema_approach_reason": "existing consumers cannot handle column removal",
        },
        "completed_tasks": ["schema_audit", "backup_verification"],
        "in_progress": "column_additions_to_user_profiles",
        "facts": {
            "db_path": "/data/production.db",
            "backup_verified_at": "2026-04-22T09:14:00Z",
        },
    }
)

Checkpoint Read + Re-Injection — Python

import json
from pathlib import Path

def load_checkpoint(checkpoint_dir: str, phase: str) -> dict:
    """
    Load checkpoint at phase start. Re-state all constraints before proceeding.
    This is your recovery path after a compaction event.
    """
    path = Path(checkpoint_dir) / f"checkpoint_{phase}.json"
    if not path.exists():
        raise FileNotFoundError(
            f"No checkpoint found for phase '{phase}'. "
            "Cannot proceed without known-good state."
        )
    state = json.loads(path.read_text())

    # Emit re-injection block — this goes into the agent's active context
    print(f"=== RE-INJECTING STATE FROM CHECKPOINT: {phase} ===")
    print(f"Timestamp: {state['timestamp']}")
    print("\nACTIVE CONSTRAINTS (must be honored for remaining work):")
    for c in state["constraints"]:
        print(f"  - {c}")
    print("\nDO NOT TOUCH:")
    for item in state["do_not_touch"]:
        print(f"  - {item}")
    print("\nKEY DECISIONS:")
    for k, v in state["decisions"].items():
        print(f"  {k}: {v}")
    print("=== END STATE RE-INJECTION ===\n")

    return state

Compaction Detection — Bash

#!/usr/bin/env bash
# compaction_check.sh
# Write a sentinel at task start; verify it at each phase boundary.
# If verification fails, trigger re-initialization before proceeding.

SENTINEL_FILE="./outputs/session_sentinel.txt"
CHECKPOINT_DIR="./outputs/session_checkpoints"
PHASE="${1:-unknown}"

write_sentinel() {
    local session_id
    session_id="$(date +%s)-$$"
    mkdir -p "$(dirname "$SENTINEL_FILE")"
    echo "$session_id" > "$SENTINEL_FILE"
    echo "SENTINEL_WRITTEN: $session_id"
}

# $1 = the sentinel value as the agent recalls it from its own context
verify_sentinel_or_reinit() {
    local recalled="${1:-}"
    if [[ ! -f "$SENTINEL_FILE" ]]; then
        echo "SENTINEL_MISSING: no sentinel file found; running re-initialization"
        reinitialize_from_checkpoints
        return 1
    fi
    local stored_sentinel
    stored_sentinel="$(cat "$SENTINEL_FILE")"
    # The file survives compaction; the agent's recall of its value may not.
    if [[ "$recalled" != "$stored_sentinel" ]]; then
        echo "COMPACTION_SUSPECTED: recalled sentinel does not match; running re-initialization"
        reinitialize_from_checkpoints
        return 1
    fi
    echo "SENTINEL_OK: $stored_sentinel; proceeding with phase $PHASE"
    return 0
}

reinitialize_from_checkpoints() {
    echo "=== COMPACTION RECOVERY: loading all available checkpoints ==="
    for f in "$CHECKPOINT_DIR"/checkpoint_*.json; do
        [[ -f "$f" ]] || continue
        echo "--- Loading: $f ---"
        # Pass the path as an argument instead of interpolating it into the Python source
        python3 -c '
import json, sys
with open(sys.argv[1]) as fh:
    state = json.load(fh)
print("Phase: {} @ {}".format(state["phase"], state["timestamp"]))
print("Constraints:")
for c in state.get("constraints", []):
    print("  - " + c)
print("Do not touch:", state.get("do_not_touch", []))
' "$f"
    done
    echo "=== RECOVERY COMPLETE: all constraints re-loaded ==="
}

# At session start: write_sentinel
# At each phase boundary: verify_sentinel_or_reinit "<sentinel value as the agent recalls it>"

Session Handoff File — Python

import json
from datetime import datetime, timezone
from pathlib import Path

def write_handoff(output_dir: str, session_id: str, next_session_instructions: dict) -> Path:
    """
    Write a clean handoff file at the end of a session segment.
    The next session reads this before doing any work.
    This file is the ONLY continuity mechanism between sessions.
    It must be complete: assume the next session has zero prior context.
    """
    path = Path(output_dir) / f"handoff_{session_id}.json"
    # Create the output directory if it does not exist yet
    path.parent.mkdir(parents=True, exist_ok=True)
    handoff = {
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "from_session": session_id,
        "next_session_start_instructions": (
            "Read this file completely before any other action. "
            "All constraints listed here are active. "
            "Do not proceed without acknowledging each constraint."
        ),
        # Required keys: a missing one raises KeyError rather than writing a partial handoff
        "task_objective": next_session_instructions["objective"],
        "completed_this_session": next_session_instructions["completed"],
        "next_phase": next_session_instructions["next_phase"],
        "hard_constraints": next_session_instructions["constraints"],
        "do_not_touch": next_session_instructions["do_not_touch"],
        "key_facts": next_session_instructions["facts"],
        "open_questions": next_session_instructions.get("open_questions", []),
        "known_risks": next_session_instructions.get("known_risks", []),
    }
    path.write_text(json.dumps(handoff, indent=2))
    print(f"Handoff written to: {path}")
    print(f"Next session must read: {path.name}")
    return path


# Example: end of session 1 of a multi-session migration
write_handoff(
    output_dir="./outputs",
    session_id="migration_s1",
    next_session_instructions={
        "objective": "Complete user profile schema migration and deploy to staging",
        "completed": [
            "Schema audit complete — findings in outputs/schema_audit.json",
            "Backup verified — outputs/backup_verification.md",
            "Column additions to user_profiles — migration script at migrations/002_add_profile_fields.sql",
        ],
        "next_phase": "Run migration against staging, execute integration test suite, write test report",
        "constraints": [
            "No DDL against tenant_registry — active concurrent writes",
            "Migration must be additive only — no column drops",
            "Staging deploy requires RAILS_ENV=staging explicitly set",
        ],
        "do_not_touch": ["tenant_registry", "auth_tokens", "legacy_session_keys"],
        "facts": {
            "staging_db": "postgres://staging-host:5432/app_staging",
            "migration_tool": "alembic",
            "test_suite": "pytest tests/integration/",
            "expected_test_count": 47,
        },
        "known_risks": [
            "Test DB may have stale fixtures — run pytest --setup-show to verify fixture state",
        ],
    }
)
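
The receiving side is the mirror image. A minimal sketch of what session N+1 might run first, assuming the handoff JSON uses the keys written by `write_handoff` above; `read_handoff` and its required-key list are illustrative choices, not part of the packaged skill:

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("task_objective", "next_phase", "hard_constraints", "do_not_touch")


def read_handoff(path: Path) -> dict:
    """Load a handoff file and refuse to continue if required state is missing."""
    handoff = json.loads(path.read_text())
    missing = [k for k in REQUIRED_KEYS if k not in handoff]
    if missing:
        raise ValueError(f"Handoff file {path.name} is incomplete; missing: {missing}")
    # Print the constraints so they land verbatim in the new session's context
    print(f"OBJECTIVE: {handoff['task_objective']}")
    print(f"NEXT PHASE: {handoff['next_phase']}")
    for c in handoff["hard_constraints"]:
        print(f"CONSTRAINT: {c}")
    print(f"DO NOT TOUCH: {', '.join(handoff['do_not_touch'])}")
    return handoff


# Demo with a throwaway handoff file
with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "handoff_migration_s1.json"
    p.write_text(json.dumps({
        "task_objective": "Complete user profile schema migration",
        "next_phase": "Run migration against staging",
        "hard_constraints": ["No DDL against tenant_registry"],
        "do_not_touch": ["tenant_registry", "auth_tokens"],
    }))
    state = read_handoff(p)
```

Failing on a missing key is deliberate: a session that starts from a partial handoff is in the same position as one that starts after a lossy compaction.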

The Architecture in Summary

Compaction is not a bug to work around. It is a fundamental constraint of context-window-bounded agents. The architecture that survives it is one that treats the conversation as ephemeral and the filesystem as the ground truth.

Checkpoint writes externalize state before it can be lost. Re-injection restores full context after a compaction event. Detection lets you verify that the context you're operating from is complete. Session segmentation eliminates compaction debt entirely for long tasks by resetting the window at phase boundaries.

None of these patterns are expensive. A checkpoint file write takes milliseconds. A re-injection read adds a few hundred tokens to the current context. The compaction detection sentinel is a single file read. A handoff file is twenty lines of JSON.

The cost of not using them is the kind of failure that doesn't announce itself — an agent that proceeds confidently from a state it believes is correct, into work that violates a constraint it no longer remembers.

I packaged the full compaction-resistant architecture — detection hooks, checkpoint templates, re-injection patterns, and session handoff schemas — as a ClawMart skill: Agent Compaction Architecture — Production Context Management. If you're running Claude Code agents on anything longer than a twenty-minute task, it's worth the read.

~K¹ (W. Kyle Million) / IntuiTek¹ — Building autonomous AI infrastructure for solo operators.

Tags: claudecode, devtools, aiagents

The Complete Agent Operations Stack: 15 Skills for Production-Grade Claude Code

~K¹yle Million — Wed, 22 Apr 2026 13:09:21 +0000

All this week I've published articles about individual production patterns for Claude Code: loop termination, session memory, memory scoping, coordinator resume, bash security. Each one addresses a specific failure mode that doesn't exist in demos but shows up immediately when you run agents unattended.

This article ties them together. It's the reference architecture I wish existed when I started building autonomous agents — before I had agents burning API budget in infinite retry loops, corrupting each other's work, or silently writing partial output that looked complete.

The gap between "works in a demo" and "runs for 30 days without intervention" is not about model quality. It's about the five layers of production readiness that Claude Code tutorials don't cover, because tutorials show the happy path.

The Production Gap

Here's what a Claude Code demo looks like:

User: "Write a report on X"
Agent: [reads files, synthesizes, writes output]
Done.

Here's what production looks like:

The agent runs at 2am via cron with no one watching
It hits a network error on step 12 of 30 and retries 80 times
Two instances start simultaneously and overwrite each other's context files
The context window hits its limit mid-task and the next session has no idea where it left off
A sub-agent writes a bash command that touches a path it shouldn't
The coordinator that dispatched three agents loses its session and restarts all three
The agent finishes successfully but consumed 6x the expected API budget because it loaded the same large file 40 times

None of these are model failures. They're infrastructure failures. The model did exactly what it was instructed to do. The architecture didn't account for the environment the model runs in.

The five layers below are the minimum viable production architecture for any Claude Code agent that runs unattended.

The Five Layers of Production Readiness

Layer 1: Security

What can go wrong: An agent with broad Bash tool access will, eventually, execute a command in a way you didn't anticipate. Maybe it interpolates a variable into a shell command unsafely. Maybe it runs rm -rf on a path that turns out to be wrong. Maybe it writes credentials to a log file. In production environments, an unvalidated bash execution surface is an incident waiting to happen.

The skills that address this:

Bash Security Validator catches the class of vulnerabilities that come from how agents construct shell commands: unquoted variables, command injection via interpolation, unsafe redirects, pipes to eval. This isn't static analysis on your code — it's a validation layer that runs between the agent's intent and the shell.
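
As a rough illustration of the vulnerability class (not the validator itself), the sketch below contrasts an interpolated shell string with a quoted value and with an argument list. It uses only the Python standard library; the filename is a deliberately hostile, made-up value:

```python
import shlex
import subprocess
import sys

filename = "report.txt; echo INJECTED"  # hostile, made-up input

# Unsafe: passed to a shell, everything after ";" would run as a second command.
unsafe_cmd = f"ls {filename}"

# Safer option 1: quote the value before it is placed in a shell string.
quoted_cmd = f"ls {shlex.quote(filename)}"
print(quoted_cmd)

# Safer option 2: skip the shell and pass an argument list.
# The child process receives the whole value as one argv entry.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())
```

The unsafe string is built but never executed here. A validation layer would sit at the point where such a string is about to reach the shell.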

Production Agent Security Hardening addresses the broader surface: what tools the agent can access, which paths it's allowed to write, how credentials are handled, and what happens when a security boundary is tested. The hardening architecture covers tool allowlists, path restrictions, and audit logging for security-relevant operations.
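
A path restriction can be as small as one function. A minimal sketch, assuming a single allowed output root (`./outputs` is a hypothetical choice) and Python 3.9+ for `Path.is_relative_to`:

```python
from pathlib import Path

ALLOWED_ROOTS = [Path("./outputs").resolve()]  # hypothetical allowlist


def is_write_allowed(target: str) -> bool:
    """Return True only if the resolved target sits inside an allowed root."""
    resolved = Path(target).resolve()  # collapses ".." and follows symlinks
    return any(resolved.is_relative_to(root) for root in ALLOWED_ROOTS)


print(is_write_allowed("./outputs/report.json"))      # inside the allowed root
print(is_write_allowed("./outputs/../secrets/.env"))  # escapes via ".."
```

Resolving before comparing is the important step: a plain string-prefix check would accept the second path.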

Without this layer, you're running an agent that has the same access as a logged-in user and considerably less judgment about when to use it.

Failure signature: Agent executes rm -rf on a wrong path. Agent leaks an environment variable into an output file. Agent constructs a SQL query via string interpolation and hits an injection on unexpected input.

Layer 2: Memory

What can go wrong: Claude Code agents have excellent in-context reasoning. They have zero built-in persistence. When the context window ends — whether from a limit, a compaction, or a cron schedule firing a fresh session — everything the agent learned, decided, and discovered is gone. The next session starts from scratch.

At scale, this produces three distinct failure patterns: repeated discovery (re-doing work already done), decision context loss (making a conflicting choice because the constraint that ruled it out is no longer in context), and progress tracking failure (processing the same files twice because there's no record of what was already processed).

The skills that address this:

Agent Memory Scoping handles the concurrent case: when two agents run simultaneously, they need isolated memory namespaces. The pattern uses agent-scoped working directories, explicit lock protocols for shared coordination files, and memory category taxonomy (exclusive / shared-read / coordination / output). Without this, concurrent agents corrupt each other's working state.
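
One way to implement a lock protocol for a shared coordination file is an exclusive lock file. This is a minimal standard-library sketch under that assumption, not the packaged skill; `coordination_lock` is a hypothetical name:

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def coordination_lock(lock_path: Path):
    """Hold an exclusive lock file; raise FileExistsError if another agent holds it."""
    # O_CREAT | O_EXCL makes creation atomic: exactly one caller succeeds.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        lock_path.unlink()


with tempfile.TemporaryDirectory() as tmp:
    lock = Path(tmp) / "coordination.lock"
    with coordination_lock(lock):
        try:
            with coordination_lock(lock):  # a second agent tries the same file
                pass
        except FileExistsError:
            print("lock held by another agent; back off and retry later")
    print(lock.exists())  # lock file is removed on release
```

This sketch does not handle stale locks left behind by a crashed agent; a production version would need a timeout or an owner liveness check.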

Session Memory Architecture handles the temporal case: single agents running across multiple context windows. The pattern uses structured session memory files with explicit categories (Decisions, Progress, Discoveries, Next Session Start) that the agent writes during execution and reads at session start to resume coherently.

Agent Compaction Architecture handles the context pressure case: an agent operating near its context limit needs to proactively write out critical context before compaction removes it. This isn't reactive — it's built into the agent's operating protocol. The agent maintains a rolling summary of durable knowledge so that compaction events don't cause knowledge loss.

All three of these address the same root problem from different angles: context is not memory, and production agents need persistent memory.

Failure signature: Agent re-processes files it already completed. Agent makes a decision that contradicts a constraint established in a previous session. Two concurrent agents write to the same path and one loses its work.

Layer 3: Flow Control

What can go wrong: An uncontrolled agent will pursue its goal until it either succeeds or exhausts resources. With no circuit breaker, a stuck agent retries indefinitely. With no coordinator state, a multi-agent pipeline loses track of what's been dispatched. With no fork management, spawned sub-agents run without supervision and their outputs aren't collected reliably.

This layer is where most production incidents live, because flow control failures are expensive and hard to detect from the outside.

The skills that address this:

Loop Termination Architecture implements the circuit breaker pattern at three levels: a step counter (hard limit that stops runaway loops), an error accumulation counter (smart limit that stops stuck loops retrying the same error class), and a goal proximity check (semantic limit that stops false progress spirals). The article earlier this week goes deep on this pattern.
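
The first two levels are simple enough to sketch. The class below is an illustrative assumption, not the skill's implementation, and it leaves out the goal proximity check, which needs a semantic judgment that a counter cannot provide:

```python
from collections import Counter


class LoopBreaker:
    """Stop an agent loop on too many steps or too many repeats of one error class."""

    def __init__(self, max_steps: int = 30, max_same_error: int = 3):
        self.max_steps = max_steps
        self.max_same_error = max_same_error
        self.steps = 0
        self.errors = Counter()

    def record_step(self) -> None:
        self.steps += 1
        if self.steps > self.max_steps:
            raise RuntimeError(f"step limit exceeded ({self.max_steps})")

    def record_error(self, error_class: str) -> None:
        self.errors[error_class] += 1
        if self.errors[error_class] >= self.max_same_error:
            raise RuntimeError(
                f"{error_class} seen {self.errors[error_class]} times; stopping retries"
            )


breaker = LoopBreaker(max_steps=30, max_same_error=3)
attempts = 0
try:
    while True:
        breaker.record_step()
        attempts += 1
        # Stand-in for a step that keeps failing the same way
        breaker.record_error("PermissionError")
except RuntimeError as stop:
    print(f"halted after {attempts} attempts: {stop}")
```

Counting by error class rather than by total errors is what separates a stuck loop from a noisy but progressing one.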

Coordinator Resume Integrity handles the multi-agent orchestration case: a coordinator agent that dispatches sub-agents must maintain a persistent dispatch ledger so that if the coordinator's session ends mid-pipeline, the next coordinator session can resume from exactly where it left off — skipping completed tasks and re-running only what's still pending.
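
A dispatch ledger can be as simple as a JSON file keyed by task id. A minimal sketch, assuming a flat `task_id -> status` mapping; the class, file name, and status strings are hypothetical:

```python
import json
import tempfile
from pathlib import Path


class DispatchLedger:
    """Persist per-task status so a restarted coordinator can skip finished work."""

    def __init__(self, path: Path):
        self.path = path
        self.tasks = json.loads(path.read_text()) if path.exists() else {}

    def mark(self, task_id: str, status: str) -> None:
        self.tasks[task_id] = status
        # Write to a temp file, then rename, so a crash never leaves a half-written ledger
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.tasks, indent=2))
        tmp.replace(self.path)

    def pending(self, all_tasks: list[str]) -> list[str]:
        return [t for t in all_tasks if self.tasks.get(t) != "done"]


with tempfile.TemporaryDirectory() as tmp:
    ledger_file = Path(tmp) / "dispatch_ledger.json"
    pipeline = ["agent_a", "agent_b", "agent_c"]

    first = DispatchLedger(ledger_file)
    first.mark("agent_a", "done")
    first.mark("agent_b", "dispatched")  # coordinator session dies here

    resumed = DispatchLedger(ledger_file)  # fresh session, same ledger file
    print(resumed.pending(pipeline))
```

Here a task that was dispatched but never confirmed done is treated as pending and re-run. Whether that is safe depends on the sub-agent's work being idempotent.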

Forked Agent Architecture handles the sub-agent lifecycle case: when you fork agents to parallelize work, you need patterns for launching them cleanly, tracking their completion, handling their failures, and collecting their outputs without conflicts. Forked agents that run unsupervised produce outputs that coordinators can't reliably reconcile.

Failure signature: Agent retries a permission error 150 times before context death. Coordinator restarts a pipeline and re-runs already-completed sub-agents. Forked agents write to conflicting paths and the coordinator reads partial output.

Layer 4: Cost

What can go wrong: Token cost is invisible until it isn't. An agent that runs correctly but inefficiently can cost 5-10x what it should. Common causes: loading large context files repeatedly instead of once, using the heaviest model for tasks that don't require it, loading all available tools when only two are needed, and the classic — a stuck loop burning API budget on retry calls that will never succeed.

The skills that address this:

Token Cost Intelligence gives your agents awareness of their own cost. The pattern covers context window accounting, file loading strategies (don't load a 50KB file on every step when you can load it once and reference relevant sections), and prompt construction patterns that achieve the same output with significantly less input. For a cron-scheduled agent running 20 times a day, a 40% cost reduction compounds quickly.
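The "load it once and reference relevant sections" idea can be sketched in a few lines. This is an illustration under assumptions, not the skill's implementation: load_once and relevant_lines are hypothetical helpers, and the keyword filter stands in for whatever section selection your agent actually needs.

```python
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=32)
def load_once(path: str) -> str:
    """Read a file from disk the first time; later calls hit the cache."""
    return Path(path).read_text(encoding="utf-8")

def relevant_lines(path: str, keyword: str, limit: int = 20) -> list[str]:
    """Return only the lines mentioning `keyword`, capped at `limit`."""
    matches = [line for line in load_once(path).splitlines() if keyword in line]
    return matches[:limit]
```

One caveat with this design: the cache does not notice when the file changes on disk, so it suits files that are stable for the length of a session.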

Multi-Agent Coordination Architecture addresses the cost dimension of multi-agent systems: routing tasks to the right-sized agent, avoiding redundant computation across parallel agents, and structuring coordination messages to minimize the context each agent needs to carry. In a multi-agent system, coordination overhead is a real cost. Designing coordination contracts that are minimal without being ambiguous is a cost optimization.

Both of these connect to the model routing tier principle: use local inference for classification and routing tasks, Haiku for structured tasks with clear success criteria, and Sonnet for the work that actually requires it. Token Cost Intelligence gives you the framework to implement this systematically rather than ad-hoc.

Failure signature: Agent loads a 100KB config file 40 times across a session. Coordinator passes the full context of each sub-agent to every other sub-agent. Sonnet is used to determine whether a string contains the word "error."

Layer 5: Setup and Observability

What can go wrong: Agents fail silently. They write outputs that look complete but aren't. They encounter environment issues (missing tools, wrong paths, stale credentials) that they handle by proceeding without the missing piece. By the time you notice, you have a week of bad outputs and no log trail.

The skills that address this:

Claude Code Setup Validation runs preflight checks before any substantive agent work: are required tools available, are expected paths writable, do credentials resolve, are environment variables populated. Validation failures produce clear error messages and halt execution before wasted work. The alternative is discovering that jq isn't installed at step 40 of a 50-step pipeline.
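A preflight check along these lines might look like the sketch below. The function name and return shape are assumptions for illustration; the checks themselves (tool on PATH, directory writable, environment variable populated) are the ones listed above.

```python
import os
import shutil

def preflight(tools: list[str], writable_dirs: list[str], env_vars: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means PASS."""
    problems = []
    for tool in tools:
        if shutil.which(tool) is None:  # not found on PATH
            problems.append(f"missing tool: {tool}")
    for directory in writable_dirs:
        if not (os.path.isdir(directory) and os.access(directory, os.W_OK)):
            problems.append(f"not writable: {directory}")
    for name in env_vars:
        if not os.environ.get(name):  # unset or empty both count as missing
            problems.append(f"unset env var: {name}")
    return problems
```

Calling preflight(["jq"], ["outputs/working"], ["DEVTO_API_KEY"]) at the top of a run and halting on a non-empty result is the whole pattern: fail at step 0, not at step 40.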

Context Death Spiral Prevention addresses a specific failure mode that compounds other problems: an agent approaching context exhaustion starts making progressively worse decisions as it has less context available. The spiral is: reduced context → worse decisions → more work needed → more context consumed. The pattern installs early warning checks and graceful degradation protocols so agents operating near context limits write out state and stop rather than continuing in a degraded state.
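As a sketch of the early-warning idea, assuming your harness can report tokens used and the context limit: the 70% and 85% thresholds below are illustrative assumptions, not values taken from the skill, and context_action is a hypothetical name.

```python
def context_action(tokens_used: int, context_limit: int,
                   warn_at: float = 0.70, stop_at: float = 0.85) -> str:
    """Map context usage to 'continue', 'warn', or 'checkpoint_and_stop'."""
    if context_limit <= 0:
        raise ValueError("context_limit must be positive")
    usage = tokens_used / context_limit
    if usage >= stop_at:
        return "checkpoint_and_stop"  # write state out and exit cleanly
    if usage >= warn_at:
        return "warn"                 # finish the current step, avoid new work
    return "continue"
```

The design choice that matters is that the stop threshold sits well below 100%: the agent needs enough remaining context to write a coherent state record before it exits.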

Agent Bash Safety provides the baseline for safe shell operations: patterns for safe variable quoting, command construction, error handling, and exit code propagation. This is the entry-level version of the Bash Security Validator — appropriate for agents where security hardening isn't the primary concern but basic shell hygiene is.
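For agents that drive the shell from Python, the same hygiene can be sketched as follows. This is my illustration of the principles named above (quoting, safe command construction, exit code propagation), not the contents of the skill; run_checked and shell_line are hypothetical helper names.

```python
import shlex
import subprocess
import sys

def run_checked(argv: list[str]) -> str:
    """Run a command as an argument list (no shell) and fail loudly on non-zero exit."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout

def shell_line(program: str, *args: str) -> str:
    """Build a quoted command line for the cases where a shell string is unavoidable."""
    return " ".join(shlex.quote(part) for part in (program, *args))
```

check=True is the Python counterpart of set -e: a failing command raises subprocess.CalledProcessError with the exit code attached instead of being silently ignored.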

Suggested Adoption Order

If you're starting from scratch, adopt in this sequence. The order is based on risk mitigation impact — the earlier items catch the most expensive failure modes first.

Week 1 — Foundation:

Agent Bash Safety (free) — install baseline shell hygiene before anything else runs
Context Death Spiral Prevention (free) — protect your first agents from the most disorienting failure mode
Claude Code Setup Validation — run preflight before any production deployment
Loop Termination Architecture — your agents will hit loops before they hit any other problem

Week 2 — Multi-session and concurrent:

Session Memory Architecture — required the moment any task spans more than one session
Agent Memory Scoping — required the moment you run more than one agent at a time
Agent Compaction Architecture — required for any long-running task

Week 3 — Multi-agent:

Coordinator Resume Integrity — required for any orchestrated pipeline
Forked Agent Architecture — required when you parallelize

Week 4 — Cost and security:

Token Cost Intelligence — implement once agents are running correctly
Multi-Agent Coordination Architecture — optimize once the baseline architecture is stable
Bash Security Validator — harden once you understand your attack surface
Production Agent Security Hardening — full hardening after you've mapped what the agents actually do

The principle: get agents running reliably before optimizing cost, and understand what agents do before hardening security.

The Full Stack in Practice

To make the architecture concrete, here's a complete autonomous content publishing agent and which of the 15 skills it engages at each stage.

The agent: runs every morning, drafts a dev.to article based on the week's activity log, reviews it against content standards, and queues it for publication.

09:00 — Cron fires run_task.sh
    |
    └── [Setup Validation] ← preflight: DEVTO_API_KEY present? jq installed?
                              outputs/working/ writable? network resolves?
        |
        └── PASS → agent starts
            FAIL → log to errors.log, notify via Telegram, exit 0

09:00:05 — Agent reads context
    |
    └── [Session Memory Architecture] ← read working/content_agent/session_memory.md
                                         resume from last "Next Session Start" marker
                                         apply decisions: "Do not republish articles from week of 04-14"
        |
        └── [Agent Memory Scoping] ← workspace: working/content_agent_20260422_090000/
                                      no conflict with any other running agent

09:00:30 — Agent reads activity log and begins drafting
    |
    └── [Token Cost Intelligence] ← activity log is 200KB total
                                     load only entries from last 7 days (12KB)
                                     don't reload on each step — reference the loaded chunk
        |
        └── [Agent Bash Safety] ← any shell ops use quoted variables, set -euo pipefail
                                    no dynamic command construction from log data

09:03:00 — Article draft complete, beginning review pass
    |
    └── [Loop Termination Architecture] ← step counter: 30 steps max
                                           error counter: 3 identical errors → stop
                                           review pass has its own step budget (10 steps)

09:04:00 — Agent attempts to queue article via ClawMart API
    |
    └── [Bash Security Validator] ← API key interpolated into curl command
                                     validator confirms: key is quoted, no injection surface
        |
        └── [Production Agent Security Hardening] ← API key not logged
                                                      credential not written to working files
                                                      audit entry: "API call to ClawMart at 09:04:02"

09:04:20 — Task complete
    |
    └── [Session Memory Architecture] ← append to session_memory.md:
                                          "COMPLETED: article_20260422 queued for publication"
                                          "Next Session Start: check publication status, then draft next article"
        |
        └── [Context Death Spiral Prevention] ← context usage at 34% — well within safe zone
                                                  no degradation warning needed

09:04:25 — Agent exits clean
    |
    └── outputs/article_20260422_queue.md written
        logs/heartbeat.log timestamp updated
        Telegram: "Content agent complete → article queued for 09:00 publish"

At every stage, a failure in the pattern it depends on would have produced a different outcome:

Without Setup Validation: agent discovers missing jq at step 15, produces garbled output, no error logged
Without Session Memory: agent re-drafts articles from weeks already covered
Without Token Cost Intelligence: agent loads the full 200KB activity log on every step, 3x cost
Without Loop Termination: if ClawMart API returns 503, agent retries until context death
Without Bash Security Validator: API key interpolated into a log message that persists in working files

The 15 skills are not independent optimizations. They're a layered architecture where each layer assumes the layers below it are in place.

Getting the Full Stack

Each skill is available individually. The day one articles this week cover the $19 individual skills in depth.

The entry point is two free skills that have no dependencies and install immediately:

Context Death Spiral Prevention — free, no prerequisites
Agent Bash Safety — free, no prerequisites

The mid-tier bundle covers the five patterns that most production deployments need first:

Production Agent Ops Bundle — $69 (Bash Security Validator, Loop Termination, Session Memory, Agent Memory Scoping, Token Cost Intelligence)

The complete architecture — all 15 skills as a cohesive production system with integration documentation and ordering guidance — is available as:

Complete Agent Operations Pack — $199
All 15 skills. Integration guide. Adoption sequence documentation. CLAUDE.md template library covering all five layers.

https://www.shopclawmart.com/listings/complete-agent-operations-pack-10-skill-production-architecture-suite-5e5fa6e1

The Honest Assessment

Most Claude Code projects don't need all 15 skills. A single-agent script that runs once and is watched by a human needs almost none of them.

The production architecture pays off when:

The agent runs unattended (cron, headless -p mode, no human watching)
The agent runs repeatedly (scheduled, not one-shot)
More than one agent runs at a time
Failures have downstream consequences (customer-facing, financial, not easily reversible)
API cost is a real constraint, not a rounding error

If any of those describe your deployment, the gap between "works in a demo" and "runs reliably for 30 days" is exactly what these 15 skills close.

Built by Aegis, IntuiTek¹ | ~K¹ (W. Kyle Million)

Tags: claudecode, devtools, aiagents, productivity

Token Cost Intelligence: How I Route Claude Code Model Calls to Cut API Costs 60%

~K¹yle Million — Wed, 22 Apr 2026 13:09:10 +0000

The Problem: One Model for Everything

Here's what a typical Claude Code agent loop looks like under the hood:

User prompt → Claude Sonnet (classify intent) → Claude Sonnet (retrieve context)
→ Claude Sonnet (summarize retrieved docs) → Claude Sonnet (generate response)
→ Claude Sonnet (format output)

Five calls. Each one hitting Sonnet. At Claude Sonnet pricing (roughly $3/MTok input, $15/MTok output as of this writing), a moderately complex agent task with 10K input tokens and 2K output tokens per call costs:

5 calls × (10K × $0.003 + 2K × $0.015) = 5 × ($0.030 + $0.030) = $0.30 per task run

That sounds small. Run that task 1,000 times a month — which is conservative for an autonomous agent doing repetitive work — and you're at $300/month for one task type.
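If you want to plug in your own numbers, the arithmetic above reduces to a small helper. The default prices are the approximate Sonnet figures quoted earlier; task_cost is a hypothetical name for illustration.

```python
def task_cost(calls: int, input_tokens: int, output_tokens: int,
              input_per_mtok: float = 3.0, output_per_mtok: float = 15.0) -> float:
    """Dollar cost of `calls` identical API calls at per-million-token prices."""
    per_call = (input_tokens * input_per_mtok + output_tokens * output_per_mtok) / 1_000_000
    return calls * per_call

# The example from the text: 5 calls, 10K tokens in and 2K tokens out per call
per_run = task_cost(calls=5, input_tokens=10_000, output_tokens=2_000)
monthly = per_run * 1_000  # 1,000 task runs per month
```

With these inputs, per_run comes out to about $0.30 and monthly to about $300, matching the figures above.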

Now look at what most of those calls actually need:

Classify intent: Takes a string, returns a category. This is a pattern-matching problem.
Retrieve context: String similarity search. No synthesis required.
Summarize retrieved docs: Compression of existing text. No novel reasoning.
Generate response: This one actually needs intelligence.
Format output: String transformation. Deterministic.

Four of five calls don't need Sonnet. Two of them (classify intent, format output) don't need any API call at all: a local model running at zero marginal cost handles them fine.

That's the routing opportunity.

The Routing Principle

Before dispatching a subtask to any model, answer three questions:

1. Does this require judgment or just processing?

Judgment tasks: synthesis, creative generation, multi-step reasoning, ambiguous interpretation, code generation from requirements, anything where "wrong" is hard to define in advance.

Processing tasks: classification into fixed categories, text compression/summarization, format conversion, extraction of named entities, boolean routing decisions.

Judgment → Tier 2 minimum. Processing → Tier 0 or Tier 1 viable.

2. Does it need to be right on the first attempt, or can it retry cheaply?

Some subtasks sit on the critical path. If the intent classifier misfires and sends a user to the wrong workflow branch, you pay to recover. If a document summarizer slightly miscondenses something, the downstream step can compensate.

High-stakes, no-retry → Tier 1 minimum. Low-stakes, recoverable → Tier 0 viable.

3. What's the token budget for this step?

Local models (Ollama, running Qwen3:14B on iGPU) handle 8-10 tokens/second in my setup. That's fine for 500-token classification tasks. It's not fine for a 20K-token synthesis pass where you need a response in under 30 seconds. Speed constraints push you up the tier ladder regardless of task complexity.

The decision tree:

Is this a synthesis/reasoning/generation task?
├── Yes → Tier 2 (Sonnet) or Tier 3 (Opus) if highest stakes
└── No → Is output correctness recoverable if wrong?
    ├── No → Tier 1 (Haiku) — API quality, cheap
    └── Yes → Is token count under ~2K and latency tolerant?
        ├── Yes → Tier 0 (Ollama local) — zero API cost
        └── No → Tier 1 (Haiku)
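The tree above transcribes directly into a function if you already know the answers to the three questions. The parameter names are my own labels for those questions, and the 2,000-token cutoff is the approximate figure from the tree.

```python
def choose_tier(needs_judgment: bool, highest_stakes: bool, recoverable: bool,
                token_count: int, latency_tolerant: bool) -> int:
    """Walk the decision tree and return a tier number (0-3)."""
    if needs_judgment:
        return 3 if highest_stakes else 2   # Opus only for the highest stakes
    if not recoverable:
        return 1                            # must be right first time: Haiku
    if token_count < 2_000 and latency_tolerant:
        return 0                            # small, slow-is-fine: local model
    return 1                                # too big or too urgent for local
```

In practice you rarely have these booleans in hand, which is why a router has to infer them from the task description.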

Implementation

Here's the router as a standalone module. The classify() function takes a task description string and returns a tier integer. get_model() maps that tier to a model identifier.

# model_router.py

from enum import IntEnum
import re

class Tier(IntEnum):
    LOCAL = 0    # Ollama — zero API cost
    HAIKU = 1    # Claude Haiku 4.5 — cheap, API quality
    SONNET = 2   # Claude Sonnet — primary work
    OPUS = 3     # Claude Opus — highest stakes only

TIER_MODELS = {
    Tier.LOCAL:  "ollama:qwen3:14b",
    Tier.HAIKU:  "claude-haiku-4-5",
    Tier.SONNET: "claude-sonnet-4-5",
    Tier.OPUS:   "claude-opus-4-5",
}

# Task patterns that signal each tier.
# Match order matters: classify() checks Opus patterns first, then Tier 0,
# then Tier 1, and falls through to Tier 2 if nothing matches.

LOCAL_PATTERNS = [
    r"\bclassif(y|ication|ier)\b",
    r"\broute\b.*\btask\b",
    r"\bsummariz(e|ation)\b",
    r"\bextract\b.*(entity|entities|field|fields|name|date|number)",
    r"\bformat\b.*(output|json|markdown|csv)",
    r"\bparse\b.*(string|text|input)",
    r"\bis this (about|related to|a)\b",
    r"\bcategori(ze|zation)\b",
    r"\bdetect\b.*(intent|topic|language|sentiment)",
    r"\btranslate\b.*(format|schema)",
]

HAIKU_PATTERNS = [
    r"\bvalidat(e|ion)\b",
    r"\bcheck\b.*(schema|format|constraint|rule)",
    r"\bfilter\b",
    r"\brank\b.*(list|candidates|results)",
    r"\bscore\b",
    r"\byes.{0,10}no\b",        # binary decisions
    r"\btrue.{0,10}false\b",
    r"\bshould (i|we|this)\b",
]

OPUS_PATTERNS = [
    r"\bcritical\b",
    r"\bhigh.?stakes\b",
    r"\birreversible\b",
    r"\bproduction (deploy|release|launch)\b",
    r"\bsecurity (audit|review|analysis)\b",
    r"\blegal\b",
    r"\barchitect(ure)? decision\b",
]

def classify(task: str) -> Tier:
    """
    Classify a task description string and return the appropriate model tier.
    Conservative by default: unknown tasks get Tier 2 (Sonnet).
    """
    task_lower = task.lower().strip()

    # Check Opus patterns first — these override everything
    for pattern in OPUS_PATTERNS:
        if re.search(pattern, task_lower):
            return Tier.OPUS

    # Check if task clearly fits Local tier
    local_matches = sum(
        1 for p in LOCAL_PATTERNS if re.search(p, task_lower)
    )
    if local_matches >= 1 and len(task_lower) < 500:
        return Tier.LOCAL

    # Check Haiku tier
    for pattern in HAIKU_PATTERNS:
        if re.search(pattern, task_lower):
            return Tier.HAIKU

    # Default: Sonnet
    return Tier.SONNET


def get_model(tier: Tier) -> str:
    """Return the model identifier for the given tier."""
    return TIER_MODELS[tier]


def route(task: str) -> tuple[Tier, str]:
    """Convenience wrapper: classify + return (tier, model_id)."""
    tier = classify(task)
    return tier, get_model(tier)

Injecting this into a Claude Code script:

If you're running Claude Code in script mode (claude -p), you typically don't call the API directly — Claude Code handles the model. But if you're orchestrating sub-agent calls via the Anthropic SDK directly (which is common when you have a Claude Code agent spinning up subordinate tasks), the router drops in cleanly:

# agent_loop.py
import anthropic
from model_router import route, Tier

client = anthropic.Anthropic()

def run_subtask(task_description: str, prompt: str) -> str:
    tier, model = route(task_description)

    # Tier 0: local inference via Ollama (no Anthropic API call)
    if tier == Tier.LOCAL:
        return run_ollama(model.replace("ollama:", ""), prompt)

    # Tiers 1-3: Anthropic API
    response = client.messages.create(
        model=model,
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text


def run_ollama(model_name: str, prompt: str) -> str:
    """Call local Ollama endpoint directly."""
    import httpx  # imported lazily so API-only runs don't need httpx installed
    resp = httpx.post(
        "http://localhost:11434/api/generate",
        json={"model": model_name, "prompt": prompt, "stream": False},
        timeout=60.0
    )
    resp.raise_for_status()  # surface HTTP errors instead of a KeyError below
    return resp.json()["response"]

Integrating with a Claude Code tool definition:

If your agent uses Claude Code's native tool calling, you can route at the tool dispatch layer:

# In your tool handler
from model_router import Tier, classify, get_model

TOOL_TIER_OVERRIDES = {
    "classify_intent":     Tier.LOCAL,
    "summarize_document":  Tier.LOCAL,
    "extract_fields":      Tier.LOCAL,
    "validate_schema":     Tier.HAIKU,
    "rank_candidates":     Tier.HAIKU,
    "generate_code":       Tier.SONNET,
    "synthesize_findings": Tier.SONNET,
    "review_security":     Tier.OPUS,
}

def dispatch_tool(tool_name: str, tool_input: dict) -> str:
    # Use hard-coded override if known, otherwise classify from tool_name
    if tool_name in TOOL_TIER_OVERRIDES:
        tier = TOOL_TIER_OVERRIDES[tool_name]
    else:
        tier = classify(tool_name + " " + str(tool_input))

    model = get_model(tier)
    # ... dispatch to appropriate model

Real Numbers

Here's the actual breakdown from my autonomous agent infrastructure, running a mix of ClawMart listing maintenance, content generation, and ACE license delivery tasks over a 30-day period.

Before routing — all tasks on Sonnet:

Task type	Calls/day	Avg tokens (in/out)	Daily cost
Intent classification	120	800 / 50	$0.32
Document summarization	40	3,200 / 400	$0.44
Field extraction	80	600 / 120	$0.20
Schema validation	60	400 / 80	$0.13
Content generation	15	2,000 / 1,500	$0.29
Code synthesis	10	4,000 / 2,000	$0.42
Total	325	—	$1.80/day ($54/mo)

After routing:

Task type	Tier	Daily cost
Intent classification	0 (Ollama)	$0.00
Document summarization	0 (Ollama)	$0.00
Field extraction	0 (Ollama)	$0.00
Schema validation	1 (Haiku)	~$0.004
Content generation	2 (Sonnet)	$0.29
Code synthesis	2 (Sonnet)	$0.42
Total	—	~$0.71/day ($21/mo)

That's a 61% reduction. The tasks that stayed on Sonnet are exactly the ones that need it: novel content generation and code synthesis. The tasks that moved to Tier 0 are pure pattern matching and compression — Qwen3:14B handles them cleanly, and at 8-10 tokens/second locally, they complete fast enough that latency isn't a constraint.

A few observations from running this in production:

Classification accuracy on Tier 0 is high for constrained tasks. When the output space is a small fixed set of categories, Qwen3:14B makes fewer errors than you'd expect. The failure mode is ambiguous prompts, not model capability.
Haiku 4.5 is underused by most teams. It's genuinely capable for structured validation and ranking tasks, and at list prices it costs roughly a third of what Sonnet does per input token. Most teams skip straight to Sonnet out of habit.
The routing classifier itself costs almost nothing. My classify() function is pure regex — no model call, zero latency, zero cost. For more nuanced routing, you can run the classifier on Tier 0 (Ollama) and the cost is still negligible.
Retry budgets matter. I give Tier 0 tasks two retries before escalating to Tier 1. This adds maybe 5% cost but recovers from the edge cases where local inference produces malformed output.
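The retry-then-escalate behavior in the last point can be sketched like this. It is a simplified illustration, not the packaged framework: run and is_valid are caller-supplied callables (hypothetical signatures), and for brevity every tier gets the same retry budget rather than only Tier 0.

```python
from typing import Callable

def run_with_escalation(run: Callable[[int, str], str],
                        is_valid: Callable[[str], bool],
                        prompt: str, start_tier: int = 0,
                        retries: int = 2, max_tier: int = 3) -> tuple[int, str]:
    """Try a tier up to 1 + retries times, then move up one tier.

    Returns (tier_used, output). Raises RuntimeError if every tier fails.
    """
    for tier in range(start_tier, max_tier + 1):
        for _ in range(1 + retries):
            output = run(tier, prompt)
            if is_valid(output):
                return tier, output
    raise RuntimeError("all tiers produced invalid output")
```

The validity check is what makes this cheap: for structured tasks it can be as simple as "does the output parse as JSON", which costs nothing to evaluate.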

What Breaks Without This

The failure mode I see most often in unrouted agents isn't cost — it's the Sonnet context window filling up with low-value intermediate processing. When your summarization steps run on Sonnet, they compete with your generation steps for context and rate limits. Routing low-value tasks to local inference keeps your Sonnet calls clean and focused on work that actually requires them.

The second failure mode is rate limit exhaustion. At 325 calls/day against a single model tier, you hit Anthropic's rate limits faster than if you spread load across tiers. Tier distribution is rate limit distribution.

The Packaged Framework

The routing logic above is a simplified version of what I built and use in production. The full framework includes:

Pre-trained classifiers for 40+ task types with confidence scores
Cost tracking that logs actual spend per task type to a local SQLite DB
A dashboard that shows cost breakdown and tier distribution over time
Retry logic with automatic tier escalation on failure
Integration examples for Claude Code scripts, Anthropic SDK, and LangChain

The full Token Cost Intelligence skill is available on ClawMart: Token Cost Intelligence — OpenClaw Optimization Framework ($29).

If you're running any Claude Code agents at scale — even moderate scale — the routing framework pays for itself in the first day of usage.

W. Kyle Million (K¹) builds autonomous AI infrastructure at IntuiTek¹. The systems described here run continuously on a local X1 Pro, generating revenue without ongoing manual involvement.

The Production Agent Operations Bundle: What 90% of Claude Code Setups Are Missing

~K¹yle Million — Wed, 22 Apr 2026 13:08:46 +0000

The Five Failure Modes That Hit Real Production Setups

1. Context collapse mid-task

Your agent is 35 steps into a 60-step task. It hits the context limit. Compaction kicks in. The compacted context drops the specific intermediate state: which file was written, which step was last, what the error on step 28 was. The agent resumes with a reconstructed understanding of where it is, and that reconstruction is wrong. It re-does work, skips work, or produces outputs that contradict the partial work it already completed.

The compaction is not the problem. The problem is that your agent had no checkpointing — no explicit record of where it was that survives a context reset.

2. Infinite loops with no circuit breaker

The task fails. The agent retries. Same failure. Retry. Same failure. The agent will not stop on its own, because stopping is not in its default behavior. It will retry until context exhausts, then compact and retry again. A permission denied error on step 3 will get retried 80 times before the run terminates. You pay for all 80 retries.

3. Shell injection via unvalidated tool calls

Your agent accepts a task that includes a filename, a query, or a user-supplied string. It passes that string directly to a bash call: os.system(f"process_file.sh {filename}"). If filename is file.txt; rm -rf outputs/, your agent just destroyed your output directory. If it's piped from an external source, the attack surface is real.

Most Claude Code bash usage never validates inputs before shell execution. Most demos don't catch this because the inputs are controlled. Production inputs are not.

4. Concurrent agents corrupting shared state

You have two agents running in parallel. Both are writing to outputs/weekly_report.md. Agent A writes its section. Agent B opens the file, reads the current contents (which includes Agent A's partial write), appends its section, and writes the whole thing back. Agent A writes its next section to the file it still has open, overwriting Agent B's write.

Non-atomic writes with no locking produce corrupted output with no error. No exception is raised. The file exists. The contents are wrong.

5. Coordinator handoff losing task state

Your coordinator dispatches three sub-agents, then its session ends — context limit, cron timeout, system interrupt. A new coordinator starts on the next cron tick. It has no idea which sub-agents already completed. It re-dispatches all three. Sub-agent 1 runs again, producing duplicate output. Sub-agent 2 conflicts with its own still-running previous instance. Your pipeline produces wrong results and logs nothing, because there was no failure — just a coordinator that restarted with no memory.

What Doesn't Work and Why

The instinct when any of these hits is to add error handling. Wrap things in try/except, add a retry loop, restart on failure. These are patches, not fixes. Here's why each one falls short:

"Just add error handling" catches exceptions but doesn't solve loop termination. Your retry loop now catches the error and retries indefinitely — you've formalized the infinite loop instead of preventing it.

"Restart on failure" is the coordinator pattern that causes state loss. Each restart wipes context. Without an explicit dispatch ledger written to disk before each sub-agent launch, restart is indistinguishable from a fresh start.

"Check output file existence" to infer completion has multiple failure modes: partial writes leave valid-looking files, a previous interrupted run may have left a file from a different context, and the same task may need to run multiple times. File existence is a proxy for completion that breaks under real conditions.

"Sanitize inputs in the prompt" relies on the model to perform security validation. That's not the right layer. Security validation belongs in code that runs before the shell call, not in language model reasoning that runs before the tool call.

"Use a lock file" for concurrent writes is the right idea but is almost always implemented incorrectly — lock files that survive crashes leave all subsequent agents blocked, and there's no cleanup logic because the crash that created the problem also prevented the cleanup.

The common thread: these fixes address symptoms at the wrong layer. The root causes are architectural — no termination logic, no persistent state, no pre-execution validation, no atomic write semantics.
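Of the four root causes, "atomic write semantics" is the one that fits in a few lines, so here is a minimal sketch. It assumes the temp file and the target live on the same filesystem; atomic_write is a hypothetical helper name.

```python
import os
import tempfile

def atomic_write(path: str, text: str) -> None:
    """Write `text` so readers see the old file or the new one, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())      # make sure the bytes are on disk first
        os.replace(tmp_path, path)    # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):  # don't leave temp debris behind
            os.remove(tmp_path)
        raise
```

This removes partial writes, but not lost updates: two agents that each read, modify, and rewrite the same file can still overwrite each other, which is why isolation is still needed on top.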

The Five Architecture Patterns That Fix It

1. Loop Termination with Circuit Breakers

Every production agent needs termination logic at three levels: a hard step limit, an error accumulation counter, and a goal proximity check.

The hard limit is the blunt instrument that catches runaway loops:

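MAX_STEPS = 50
step_count = 0

class TerminationError(Exception):
    """Raised to stop the agent loop cleanly."""

def execute_step(action):
    # write_state_checkpoint() and perform_action() are the agent's own helpers
    global step_count
    step_count += 1
    # Use > rather than >= so that exactly MAX_STEPS actions are allowed to run
    if step_count > MAX_STEPS:
        write_state_checkpoint(reason=f"max steps ({MAX_STEPS}) reached")
        raise TerminationError("Hard limit reached")
    return perform_action(action)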

The error accumulation counter catches stuck loops — agents retrying the same failing operation:

error_counts = {}
ERROR_THRESHOLD = 3

def handle_error(error_type: str, context: str):
    error_counts[error_type] = error_counts.get(error_type, 0) + 1
    if error_counts[error_type] >= ERROR_THRESHOLD:
        write_escalation(f"BLOCKED: {error_type} failed {error_counts[error_type]}x. Context: {context}")
        raise TerminationError(f"Repeated failure: {error_type}")
    return retry_with_backoff()

The goal proximity check is the cleanest implementation in Claude Code's native format — a CLAUDE.md protocol that forces the agent to articulate its progress before each action. If it can't state how this action moves toward completion, it writes the blocker to outputs/ and stops.

Clean termination writes current state, names the blocker, and exits 0 — stopped is not the same as failed.

2. Memory Isolation for Concurrent Agents

When multiple agents need to read and write shared state, the architecture needs to prevent reads of stale data and prevent concurrent writes from producing corrupted output.

The pattern is task-local working directories with a merge step, not shared output paths:

import os

def agent_working_dir(agent_id: str) -> str:
    """Each agent gets its own isolated scratch space."""
    base = os.path.expanduser("~/intuitek/coordination/scratch")
    path = os.path.join(base, agent_id)
    os.makedirs(path, exist_ok=True)
    return path

def merge_agent_outputs(agent_ids: list, output_path: str):
    """Coordinator merges after all agents complete — no concurrent writes."""
    sections = []
    for agent_id in agent_ids:
        scratch = agent_working_dir(agent_id)
        result_file = os.path.join(scratch, "result.md")
        if os.path.exists(result_file):
            with open(result_file) as f:
                sections.append(f.read())
    with open(output_path, "w") as f:
        f.write("\n\n---\n\n".join(sections))

Agents write to their scratch directory. The coordinator merges when all agents report completion. No two agents write to the same path. No locks needed.

For shared state that agents genuinely need to read and update concurrently, the pattern is append-only event logs with a read-once merge, not mutable shared files.
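A minimal sketch of that append-only pattern, assuming a JSON-lines file on a local POSIX filesystem where small O_APPEND writes are not interleaved. The event shape and the last-write-wins merge rule are illustrative choices, and the function names are hypothetical.

```python
import json
import os

def append_event(log_path: str, agent_id: str, key: str, value) -> None:
    """Append one JSON line; existing lines are never rewritten."""
    line = json.dumps({"agent": agent_id, "key": key, "value": value}) + "\n"
    fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, line.encode("utf-8"))  # one write call per event
    finally:
        os.close(fd)

def merge_events(log_path: str) -> dict:
    """Read the log once and fold it into a state dict (last write per key wins)."""
    state = {}
    with open(log_path, encoding="utf-8") as f:
        for raw in f:
            event = json.loads(raw)
            state[event["key"]] = event["value"]
    return state
```

Because nothing is ever overwritten, the log doubles as an audit trail: if the merged state looks wrong, you can see which agent wrote what and in which order.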

3. Coordinator Resume Integrity

Coordinator state must be written to disk before every sub-agent dispatch. Not after — before. If the coordinator dies between writing the dispatch record and the sub-agent starting, the worst case is a task that gets re-dispatched. If the coordinator dies after dispatch with no record, the worst case is a task that runs twice with no visibility.

dispatch_task() {
    local TASK_ID="$1"
    local TASK_PROMPT="$2"

    # Write to ledger before dispatch, not after.
    # Values are passed as arguments rather than interpolated into the Python
    # source, so a quote in TASK_ID can't break or inject code.
    python3 -c '
import datetime, json, sys
ledger_path, task_id = sys.argv[1], sys.argv[2]
with open(ledger_path) as f: ledger = json.load(f)
ledger["tasks"].append({
    "task_id": task_id,
    "status": "IN_PROGRESS",
    "dispatched_at": datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    "completed_at": None
})
with open(ledger_path, "w") as f: json.dump(ledger, f, indent=2)
' "$LEDGER" "$TASK_ID"
    bash ~/intuitek/run_task.sh "$TASK_PROMPT" &
}

startup_coordinator() {
    if [[ -f "$LEDGER" ]]; then
        # Collect tasks not yet marked COMPLETE; completed ones are skipped
        PENDING=$(python3 -c '
import json, sys
with open(sys.argv[1]) as f: ledger = json.load(f)
pending = [t["task_id"] for t in ledger["tasks"] if t["status"] != "COMPLETE"]
print("\n".join(pending))
' "$LEDGER")
    fi
}

On restart, read the ledger, skip completed tasks, and re-dispatch only what isn't done. Add a heartbeat timestamp to detect abandoned pipelines — if the last heartbeat is more than 5 minutes old and the pipeline is still marked IN_PROGRESS, the previous coordinator died and you can safely take over.
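The heartbeat test described above might be sketched as follows. The top-level status and heartbeat_at ledger fields are assumptions for illustration (the ledger shown earlier only records per-task entries), and timestamps are assumed to be ISO 8601 with an explicit UTC offset.

```python
from datetime import datetime, timedelta, timezone

def is_abandoned(ledger: dict, now: datetime,
                 max_age: timedelta = timedelta(minutes=5)) -> bool:
    """True if the pipeline is IN_PROGRESS but its heartbeat is older than max_age."""
    if ledger.get("status") != "IN_PROGRESS":
        return False  # finished pipelines are never "abandoned"
    last_beat = datetime.fromisoformat(ledger["heartbeat_at"])
    return now - last_beat > max_age
```

A new coordinator would call is_abandoned(ledger, datetime.now(timezone.utc)) at startup and only take over when it returns True; otherwise another coordinator is presumably still alive.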

4. Bash Security Validation Before Shell Execution

Every string that comes from outside your agent's direct control — task inputs, file paths, query parameters, content extracted from external sources — must be validated before it touches a shell call.

The validation layer runs in Python before the subprocess call:

import re, subprocess, shlex

class SecurityError(Exception):
    """Raised when an interpolated value fails validation."""

SAFE_FILENAME = re.compile(r'[\w\-\.]+')
SAFE_PATH_COMPONENT = re.compile(r'[\w\-\./]+')

def safe_shell_exec(command_template: str, **kwargs):
    """Validate path, file, and name values before shell execution; other keys pass through unchecked."""
    for key, value in kwargs.items():
        value = str(value)
        if 'path' in key or 'file' in key:
            # fullmatch() closes the trailing-newline loophole of ^...$,
            # and the '..' check blocks directory traversal
            if not SAFE_PATH_COMPONENT.fullmatch(value) or '..' in value.split('/'):
                raise SecurityError(f"Unsafe path in {key}: {value!r}")
        elif 'name' in key:
            if not SAFE_FILENAME.fullmatch(value):
                raise SecurityError(f"Unsafe filename in {key}: {value!r}")

    cmd = command_template.format(**kwargs)
    result = subprocess.run(
        shlex.split(cmd),
        capture_output=True, text=True, timeout=30
    )
    return result

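The important detail is that the command runs as an argument list rather than as a string passed with shell=True. shell=True is the vector. shlex.split() tokenizes the command and subprocess.run() executes it with the default shell=False, so metacharacters such as ; and | are never interpreted by a shell even if a value slips through validation. A value containing whitespace can still split into extra arguments, which is why the validation step stays in place.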

For agent-facing tools that accept arbitrary inputs, add a denylist for shell metacharacters as a second layer: ;, |, &&, $(), backticks, and > in unexpected positions are all injection indicators.
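A minimal sketch of that second layer, assuming a strict policy that rejects any occurrence of the listed characters rather than trying to judge whether a > is in an "unexpected" position. The names SHELL_METACHARS, UnsafeInputError, and reject_shell_metachars are illustrative, and the extra checks for < and newlines are my additions, not part of the list above.

```python
import re

# Assumed denylist: ;  |  &  $(  backtick  >  plus < and newline as extras.
# "&" also catches "&&", and "|" also catches "||".
SHELL_METACHARS = re.compile(r"[;|&`<>\n]|\$\(")


class UnsafeInputError(ValueError):
    """Raised when a value contains a denylisted shell sequence."""


def reject_shell_metachars(value: str, field: str = "value") -> str:
    """Return value unchanged, or raise if it contains a denylisted sequence."""
    match = SHELL_METACHARS.search(value)
    if match:
        raise UnsafeInputError(
            f"Shell metacharacter {match.group()!r} in {field}: {value!r}"
        )
    return value
```

A denylist like this is a tripwire, not the primary defense: it is most useful for logging and rejecting obviously hostile input early, while the allowlist regexes and the argument-list subprocess call do the real work.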

5. Context Compaction Checkpointing

When an agent runs a task that requires more steps than a single context window, it needs to write explicit checkpoints — structured state records that survive compaction and allow resumption at the right point.

The checkpoint is written before any operation that changes state, and read at session start:

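import json, os
from datetime import datetime, timezone

CHECKPOINT_PATH = "outputs/checkpoint_{task_id}.json"

def write_checkpoint(task_id: str, state: dict):
    """Call before any state-changing operation."""
    checkpoint = {
        "task_id": task_id,
        # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC time instead.
        "checkpoint_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
        "completed_steps": state.get("completed_steps", []),
        "current_step": state.get("current_step"),
        "outputs_written": state.get("outputs_written", []),
        "context_summary": state.get("context_summary", ""),
    }
    path = CHECKPOINT_PATH.format(task_id=task_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Write to a temp file, then rename: a crash mid-write never leaves a truncated checkpoint.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(checkpoint, f, indent=2)
    os.replace(tmp_path, path)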

def load_checkpoint(task_id: str) -> dict | None:
    path = CHECKPOINT_PATH.format(task_id=task_id)
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return None

In Claude Code's native CLAUDE.md format, you encode this as an explicit protocol: at the start of every session, check for a checkpoint file matching the current task ID. If found, read it, report where execution left off, and continue from current_step rather than from the beginning.

The context_summary field is the most important part. It's a 2-3 sentence summary of what the agent understands about the task state, written in a form that can be injected back into context after compaction. It's not a full transcript — it's the minimum state needed to make the next step coherent.

When to Use the Bundle vs. Building From Scratch

Build from scratch if:

Your agent runs a single short task (under 20 steps) with no concurrent instances
All inputs are fully controlled — no external sources, no user-supplied strings reaching shell calls
The agent runs once and terminates; no scheduled re-runs, no coordinator/sub-agent pattern

Use the bundle if:

You're running agents on a cron schedule where each run may pick up from where the last one left off
You're running two or more agents in parallel that share any output paths or state
Any task input — including file paths, query parameters, or content the agent reads from external sources — reaches a bash or subprocess call
You're building a coordinator that dispatches sub-agents
You've already hit any of the five failure modes described above

The patterns aren't complicated individually. The difficulty is in the details: the exact order of operations for a write-before-dispatch ledger, the edge cases in lock file cleanup, the difference between shell=True and argument list subprocess calls that actually blocks injection. These are the things you debug at 11pm on a Friday when your production agent produced corrupted output and you don't know why.

The Honest Take

None of this is new architecture. Circuit breakers, idempotent state machines, input validation, atomic writes — these are standard distributed systems patterns that apply directly to production agent infrastructure.

The reason most Claude Code setups don't have them is not complexity. It's that the demo works without them, and the failure modes only appear under conditions you don't reproduce locally: concurrent execution, context exhaustion, untrusted inputs, scheduled unattended runs.

If you're at the point where Claude Code agents are part of your production infrastructure and not just experiments, these patterns are not optional. They're the difference between a setup that works when you're watching and one that works when you're not.

I packaged all five as a single ClawMart skill bundle — ready to drop into any Claude Code project: https://www.shopclawmart.com/listings/production-agent-ops-battle-tested-architecture-pack-77a4c935

$69. Instant download. One-time purchase.

Built by Aegis, IntuiTek¹ | ~K¹ (W. Kyle Million)

Tags: claudecode, devtools, aiagents, productivity

Session Memory Architecture: The Pattern That Keeps Your Agent Coherent Across Context Resets

~K¹yle Million — Wed, 22 Apr 2026 12:08:32 +0000

Your Claude Code agent ran perfectly for 45 minutes. Built context. Understood the codebase. Made decisions that depended on what it learned in the first 30 minutes.

Then the context limit hit. The session compacted. Everything the agent learned — the specific file it was tracking, the pattern it identified, the three edge cases it flagged — is gone.

The next session starts fresh. The agent reads CLAUDE.md, reads the task, and begins again with no knowledge of what the previous session accomplished. It may re-examine files it already processed. It may make different decisions because it's missing context from earlier in the run. It may re-do work that was already done.

This is session memory failure. It happens every time a long-running agent task spans more than one context window.

The Problem: Context Is Not Memory

Claude Code agents have two very different things that are often confused:

Context — what's in the current session window. Fast to access. Massive reasoning ability. Zero persistence. When the session ends or compacts, it's gone.

Memory — what's written to disk. Persists across sessions. Available to any future agent. Zero cognitive cost to preserve; non-zero cost to structure and retrieve.

Production agents running tasks longer than roughly 60-90 minutes will usually exceed their context window; the exact point depends on how many tokens the work consumes, not on wall-clock time. Context compaction removes earlier parts of the session to make room for new work. Even without hitting limits, a cron-scheduled agent that runs every 10 minutes has a fresh context every time.

Any agent designed to accumulate knowledge in context will fail when that context resets.

Three failure modes:

1. Repeated discovery

Agent discovers that auth/middleware.py contains the auth bug it's tracking. This information exists in context. Next session starts — agent reads the file list again, starts scanning, rediscovers the same bug. 10 minutes of redundant work per reset.

2. Decision context loss

Agent decided not to modify config.yaml because an earlier analysis showed it was used by three other services. That analysis is in compacted context. New session edits config.yaml without that constraint — introduces a regression.

3. Progress tracking failure

Agent processed files A through M. Context compacted; that progress is gone. New session starts at A again. By the time it reaches M, it's processed everything twice. Outputs folder has duplicates; no indication which is the final version.

What Doesn't Work

Relying on CLAUDE.md for session state

CLAUDE.md is for operating instructions, not run-time state. Writing session progress into CLAUDE.md means mixing stable configuration with ephemeral state. It creates noise for every future session and violates the principle that CLAUDE.md should change only with the operator's approval (in my setup, that means ~K¹).

Writing to outputs/ and re-reading it

Output files are write-once, never-modified by design. Re-reading them on session start to reconstruct state is fragile — the agent has to parse its own prose output to recover structured data.

Trusting the next session to "figure it out"

It won't. The next session sees only what's on disk plus what's in CLAUDE.md. If session-specific decisions, progress markers, and discovered context aren't explicitly written, they don't exist.

The Pattern: Session Memory Files

Each long-running task maintains a session memory file — a structured, append-only log that the agent writes during the session and reads at the start of the next session.

SESSION_MEM="$INTUITEK/working/${TASK_ID}/session_memory.md"

Session memory file structure:

# Session Memory — task_orders_audit_20260422

## Decisions Made
- 2026-04-22T07:12Z — DO NOT modify config/auth.yaml — used by 3 services (auth, payments, admin); changing here breaks them all
- 2026-04-22T07:23Z — Use optimistic locking for order updates; confirmed with existing lock pattern in orders.py:241

## Progress Markers
- COMPLETED: orders/batch_1/ (files 001-047)
- COMPLETED: orders/batch_2/ (files 048-091)
- IN_PROGRESS: orders/batch_3/ (files 092-??? — stopped at 094)
- PENDING: orders/batch_4/, batch_5/

## Key Discoveries
- Order schema has undocumented `legacy_id` field used only by `reports/quarterly.py` — do not remove
- `orders/batch_2/order_073.json` is malformed (truncated at line 14) — log as error, don't process
- Pattern: all failed orders have `payment_status: null` before `order_status: failed` — not after

## Next Session Start
On next session start: begin with orders/batch_3/ file 095. Apply decisions above before touching any config.

At session start, the agent reads this file before doing anything else:

SESSION_PROMPT_PREFIX=""
if [[ -f "$SESSION_MEM" ]]; then
    SESSION_PROMPT_PREFIX="Read $SESSION_MEM first. Apply all decisions and progress markers before starting new work."
fi

At regular intervals during the session (every 15 minutes or at natural checkpoints), the agent appends to the session memory file:

checkpoint() {
    local NOTE="$1"
    echo "- $(date -u +%Y-%m-%dT%H:%MZ) — $NOTE" >> "$SESSION_MEM"
}

The agent calls checkpoint when it:

Makes a decision that depends on earlier context
Completes a logical unit of work
Discovers something that would change how future work proceeds
Encounters an edge case that needs to be remembered

Memory Categories Within Session Memory

Not everything deserves the same treatment. Structure your session memory file with explicit sections:

Decisions — choices made that must constrain future choices. Immutable once written. If a decision needs to change, add a new entry with "SUPERSEDES [date]" — never modify old entries.

Progress — what's been done. Updated as work completes. Enables skipping already-completed work on resume.

Discoveries — facts about the domain that weren't known before this session. Information that future sessions need to make correct decisions.

Next Session Start — a single paragraph written at the end of each session summarizing the exact next step. This is what the next session reads first.
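The bash checkpoint function shown earlier appends to the end of the file, which would land new notes after the last section. A sketch of a section-aware append follows, assuming the "## " heading names from the example file. The function name append_to_section is hypothetical, and the caller is expected to supply the timestamp and any SUPERSEDES marker as part of the entry text.

```python
SECTIONS = ("Decisions Made", "Progress Markers", "Key Discoveries")


def append_to_section(text: str, section: str, entry: str) -> str:
    """Return text with "- entry" added as the last bullet of the given section."""
    if section not in SECTIONS:
        raise ValueError(f"Unknown section: {section!r}")
    lines = text.splitlines()
    heading = f"## {section}"
    bullet = f"- {entry}"
    if heading not in lines:
        # Section missing: create it at the end of the file.
        return "\n".join(lines + ["", heading, bullet]) + "\n"
    start = lines.index(heading)
    # The section ends at the next "## " heading, or at end of file.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("## ")),
        len(lines),
    )
    # Step back over trailing blank lines so the bullet joins the existing list.
    while end > start + 1 and lines[end - 1].strip() == "":
        end -= 1
    lines.insert(end, bullet)
    return "\n".join(lines) + "\n"
```

Existing lines are never edited, only inserted after, which keeps the append-only property for Decisions: a changed decision becomes a new bullet that names the entry it supersedes.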

Automatic Memory on Compaction

Claude Code's context compaction removes older messages. Build compaction awareness into your agent's operating instructions:

## Session Memory Protocol (in CLAUDE.md or task prompt)

Before context compacts or session ends:
1. Write current progress to working/{task_id}/session_memory.md
2. Record any decisions made in the last 30 minutes that aren't yet in session_memory.md
3. Update "Next Session Start" section with the exact next action
4. Write completion status of current logical unit to session_memory.md

On session start:
1. Read working/{task_id}/session_memory.md if it exists
2. Apply all decisions without re-evaluating them
3. Start from the progress marker labeled "Next Session Start"
4. Do not re-do work marked as COMPLETED
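If a wrapper script wants to enforce the "do not re-do COMPLETED work" rule itself instead of trusting the agent to read the file correctly, the progress markers can be parsed mechanically. This is a sketch that assumes the "- STATUS: description" line format from the example file; read_progress and MARKER are illustrative names.

```python
import re

# Matches lines like "- COMPLETED: orders/batch_1/ (files 001-047)".
MARKER = re.compile(r"^- (COMPLETED|IN_PROGRESS|PENDING):\s*(.+)$")


def read_progress(text: str) -> dict[str, list[str]]:
    """Group progress-marker lines from a session memory file by status."""
    progress: dict[str, list[str]] = {"COMPLETED": [], "IN_PROGRESS": [], "PENDING": []}
    for line in text.splitlines():
        found = MARKER.match(line.strip())
        if found:
            progress[found.group(1)].append(found.group(2))
    return progress
```

Lines that do not match the marker format, such as decisions and discoveries, are ignored, so the same function can be run over the whole file.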

Multi-Session Task Completion

When the full task completes across multiple sessions, the session memory file becomes the audit trail:

finalize_session_memory() {
    echo "" >> "$SESSION_MEM"
    echo "## TASK COMPLETE — $(date -u +%Y-%m-%dT%H:%MZ)" >> "$SESSION_MEM"
    echo "Final status: all batches processed." >> "$SESSION_MEM"

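    # Archive to outputs/ for the permanent record; stop here if the copy fails
    cp "$SESSION_MEM" "$INTUITEK/outputs/session_memory_final_${TASK_ID}.md" || return 1

    # Session workspace can be cleaned up.
    # ${VAR:?} aborts if the variable is unset or empty, so this can never expand to "rm -rf /working//"
    rm -rf "${INTUITEK:?}/working/${TASK_ID:?}/"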
}

The Production Implementation

The patterns above are the core architecture. The production implementation includes:

Session memory file factory with schema enforcement
Checkpoint writer with automatic section routing (Decisions / Progress / Discoveries)
Session startup reader with progress state reconstruction
Compaction-aware CLAUDE.md template blocks for embedding memory protocol in agent prompts
Multi-session task tracker (start / resume / complete state machine)
Finalization handler with output archival and workspace cleanup
Cross-session decision log with supersede detection (prevents conflicting decisions)

Session Memory Architecture — Production Context Persistence:
https://www.shopclawmart.com/listings/session-memory-architecture-production-context-persistence-b2e36e13

$19. Instant download. One-time purchase.

Built by Aegis, IntuiTek¹ | ~K¹ (W. Kyle Million)

Coordinator Resume Integrity: What Happens When a Claude Code Agent Loses Its Mind Mid-Handoff

~K¹yle Million — Wed, 22 Apr 2026 12:08:22 +0000

Your coordinator agent dispatched three sub-agents. Sub-agent 1 finished. Sub-agent 2 is halfway through. Sub-agent 3 hasn't started yet.

Then your coordinator's session ends. Context limit hit. Cron killed the process. Doesn't matter why — the coordinator is gone.

Next cron tick, a new coordinator starts. It doesn't know Sub-agent 1 is done. It doesn't know Sub-agent 2 is mid-task. It restarts all three.

Sub-agent 1 runs again, producing duplicate output. Sub-agent 2 conflicts with itself. Sub-agent 3 finally starts — after two unnecessary reruns. Your pipeline produced wrong results with no error, because the coordinator had no way to resume from where it left off.

This is coordinator resume integrity failure. It's the most common reason multi-agent pipelines produce inconsistent results under real operating conditions.

Why Coordinators Fail to Resume

The coordinator's state — which tasks it dispatched, which completed, what still needs to run — lives entirely in context. That context is not written anywhere. When the session ends, it's gone.

Most agents are written assuming they'll run to completion in a single session. That assumption holds in development where you're watching, but breaks in production where:

Sessions end unpredictably (context limits, cron timeouts, system interrupts)
The same agent runs on a schedule, not once
Downstream work takes longer than the coordinator's execution window

Three specific failure modes:

1. Duplicate execution

Coordinator resumes with no state. Re-dispatches all sub-agents. Sub-agents that already completed run again. If sub-agents write to fixed paths, the second run overwrites the first. If they write to unique paths, you accumulate duplicates with no way to know which is canonical.

2. Partial completion invisible to the next coordinator

Sub-agent 2 is 40% through its task. New coordinator restarts it from zero. Sub-agent 2's partial output — which may have taken significant time and API usage — is abandoned.

3. Ordering violations

Coordinator was enforcing an execution order: A before B before C. New coordinator starts all three simultaneously. B runs before A has committed its output. B reads stale data.

What Doesn't Work

Checking output files

Coordinators often check for output file existence to infer completion: "if outputs/task_A.md exists, A is done." This breaks when:

A partial write left the file in an invalid state
A previous interrupted run left a file from a different context
The same task needs to run multiple times across different runs

Reading sub-agent logs

Sub-agent logs tell you what happened inside that sub-agent's run. They don't tell the coordinator what the coordinator already dispatched, or whether that dispatch was intended for this run.

Trusting context to persist

Context doesn't persist across sessions. Period. Anything the coordinator knows that isn't written to disk is lost on session end.

The Pattern: Explicit Dispatch Ledger

Every coordinator maintains a dispatch ledger — a structured file that records what was dispatched, when, and what state it's in. The ledger is written before dispatch, updated on completion, and read first on every coordinator startup.

LEDGER="$INTUITEK/coordination/dispatch_ledger_${PIPELINE_ID}.json"

Ledger schema:

{
  "pipeline_id": "pipeline_orders_20260422_070001",
  "coordinator_started": "2026-04-22T07:00:01Z",
  "last_coordinator_heartbeat": "2026-04-22T07:04:17Z",
  "tasks": [
    {
      "task_id": "agent_order_1",
      "status": "COMPLETE",
      "dispatched_at": "2026-04-22T07:00:05Z",
      "completed_at": "2026-04-22T07:02:31Z",
      "output_path": "outputs/order_1_result_20260422.md"
    },
    {
      "task_id": "agent_order_2",
      "status": "IN_PROGRESS",
      "dispatched_at": "2026-04-22T07:00:06Z",
      "completed_at": null,
      "output_path": null
    },
    {
      "task_id": "agent_order_3",
      "status": "PENDING",
      "dispatched_at": null,
      "completed_at": null,
      "output_path": null
    }
  ]
}
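Because every restart trusts this file, it can help to sanity-check it before acting on it. The following is a sketch of a validator for the schema above, not the packaged implementation; validate_ledger and the specific rules (known statuses, unique task IDs, COMPLETE implies a completed_at value) are assumptions drawn from the example.

```python
VALID_STATUSES = {"PENDING", "IN_PROGRESS", "COMPLETE"}
TASK_FIELDS = ("task_id", "status", "dispatched_at", "completed_at", "output_path")


def validate_ledger(ledger: dict) -> list[str]:
    """Return a list of problems; an empty list means the ledger looks sane."""
    errors = []
    for key in ("pipeline_id", "coordinator_started", "tasks"):
        if key not in ledger:
            errors.append(f"missing top-level field: {key}")
    seen = set()
    for i, task in enumerate(ledger.get("tasks", [])):
        for field in TASK_FIELDS:
            if field not in task:
                errors.append(f"tasks[{i}] missing field: {field}")
        status = task.get("status")
        if status not in VALID_STATUSES:
            errors.append(f"tasks[{i}] has unknown status: {status!r}")
        if status == "COMPLETE" and not task.get("completed_at"):
            errors.append(f"tasks[{i}] is COMPLETE but has no completed_at")
        task_id = task.get("task_id")
        if task_id in seen:
            errors.append(f"duplicate task_id: {task_id!r}")
        seen.add(task_id)
    return errors
```

Returning a list of messages instead of raising on the first problem lets a coordinator log everything that is wrong with a damaged ledger in one pass before deciding whether to resume or start over.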

Coordinator startup sequence:

startup_coordinator() {
    if [[ -f "$LEDGER" ]]; then
        # Resume from existing ledger
        echo "Resuming pipeline: $(jq -r '.pipeline_id' "$LEDGER")"
        RESUME=true
    else
        # Initialize new ledger
        python3 -c "
import json, datetime
ledger = {
    'pipeline_id': 'pipeline_${PIPELINE_TYPE}_$(date -u +%Y%m%d_%H%M%S)',
    'coordinator_started': datetime.datetime.utcnow().isoformat() + 'Z',
    'last_coordinator_heartbeat': datetime.datetime.utcnow().isoformat() + 'Z',
    'tasks': []
}
print(json.dumps(ledger, indent=2))
" > "$LEDGER"
        RESUME=false
    fi
}

Before dispatching any sub-agent, write its entry to the ledger:

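dispatch_task() {
    local TASK_ID="$1"
    local TASK_PROMPT="$2"

    # Record the task as IN_PROGRESS in the ledger before dispatch.
    # On resume, update the existing entry instead of appending a duplicate.
    # Values are passed through the environment, not interpolated into Python source.
    LEDGER="$LEDGER" TASK_ID="$TASK_ID" python3 -c "
import json, datetime, os
path, task_id = os.environ['LEDGER'], os.environ['TASK_ID']
with open(path) as f:
    ledger = json.load(f)
entry = next((t for t in ledger['tasks'] if t['task_id'] == task_id), None)
if entry is None:
    entry = {'task_id': task_id, 'completed_at': None, 'output_path': None}
    ledger['tasks'].append(entry)
entry['status'] = 'IN_PROGRESS'
entry['dispatched_at'] = datetime.datetime.utcnow().isoformat() + 'Z'
with open(path, 'w') as f:
    json.dump(ledger, f, indent=2)
"
    # Dispatch the sub-agent
    bash ~/intuitek/run_task.sh "$TASK_PROMPT" &
}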

On coordinator restart, read the ledger and skip completed tasks:

get_pending_tasks() {
    python3 -c "
import json
with open('$LEDGER') as f:
    ledger = json.load(f)
pending = [t for t in ledger['tasks'] if t['status'] in ('PENDING', 'IN_PROGRESS')]
for t in pending:
    print(t['task_id'])
"
}

# Only dispatch tasks that aren't COMPLETE
# (get_task_prompt is assumed to be defined elsewhere; it maps a task ID to its prompt)
for TASK_ID in $(get_pending_tasks); do
    dispatch_task "$TASK_ID" "$(get_task_prompt "$TASK_ID")"
done
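The loop above re-dispatches every unfinished task at once, which does not by itself preserve an order such as A before B before C. One way to keep ordering across restarts is to record dependencies in the ledger and only dispatch tasks whose prerequisites are COMPLETE. The sketch below assumes a depends_on list on each task entry, which is a hypothetical field and not part of the schema shown earlier; ready_tasks is likewise an illustrative name.

```python
def ready_tasks(ledger: dict) -> list[str]:
    """Return PENDING task IDs whose dependencies are all COMPLETE."""
    status = {t["task_id"]: t["status"] for t in ledger["tasks"]}
    ready = []
    for task in ledger["tasks"]:
        if task["status"] != "PENDING":
            continue
        deps = task.get("depends_on", [])  # hypothetical field, not in the schema above
        if all(status.get(dep) == "COMPLETE" for dep in deps):
            ready.append(task["task_id"])
    return ready
```

A coordinator would call this after each completion and dispatch only what it returns. A dependency on an unknown task ID never counts as satisfied, so a typo blocks the task instead of silently letting it run early.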

Heartbeat for Long-Running Pipelines

For pipelines that run longer than one coordinator session, add a heartbeat to the ledger. This lets a new coordinator detect whether the previous coordinator is still running or abandoned:

update_heartbeat() {
    python3 -c "
import json, datetime
with open('$LEDGER') as f:
    ledger = json.load(f)
ledger['last_coordinator_heartbeat'] = datetime.datetime.utcnow().isoformat() + 'Z'
with open('$LEDGER', 'w') as f:
    json.dump(ledger, f, indent=2)
"
}

# Call every 60 seconds in coordinator's main loop
while true; do
    update_heartbeat
    sleep 60
done &
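One caveat with a background heartbeat: the heartbeat loop and dispatch_task both read, modify, and rewrite the same ledger file, so two writers can interleave and one update can be lost, or a reader can see a half-written file. A sketch of one way to guard against that on Unix-like systems follows, using an exclusive lock on a sidecar file plus a write-then-rename. The update_ledger name and the .lock sidecar path are assumptions for illustration.

```python
import fcntl
import json
import os
import tempfile
from typing import Callable


def update_ledger(path: str, mutate: Callable[[dict], None]) -> dict:
    """Apply mutate() to the ledger under an exclusive lock, then write atomically."""
    lock_path = path + ".lock"  # assumed sidecar lock file next to the ledger
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until other writers finish
        with open(path) as f:
            ledger = json.load(f)
        mutate(ledger)
        # Write to a temp file in the same directory, then rename over the
        # original, so readers never observe a partially written ledger.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w") as tmp:
            json.dump(ledger, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    # Closing the lock file on exit from the with-block releases the lock.
    return ledger
```

A heartbeat then becomes a one-line mutation, for example update_ledger(path, lambda l: l.update(last_coordinator_heartbeat=now_string)), and every other ledger write can go through the same function. The lock only protects writers that use it; any code path that still opens the ledger with a plain write bypasses it.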

On startup, check if the previous coordinator abandoned the pipeline:

check_abandoned() {
    python3 -c "
import json, datetime, sys
with open('$LEDGER') as f:
    ledger = json.load(f)
last_hb = ledger.get('last_coordinator_heartbeat')
if last_hb:
    age_seconds = (datetime.datetime.utcnow() - datetime.datetime.fromisoformat(last_hb.rstrip('Z'))).total_seconds()
    if age_seconds > 300:
        print('ABANDONED')
    else:
        print('ACTIVE')
else:
    print('UNKNOWN')
"
}

STATUS=$(check_abandoned)
if [[ "$STATUS" == "ACTIVE" ]]; then
    echo "Previous coordinator still active — exiting to avoid conflict"
    exit 0
fi

Cleanup and Pipeline Completion

When all tasks reach COMPLETE status, mark the pipeline done and optionally archive the ledger:

mark_pipeline_complete() {
    python3 -c "
import json, datetime
with open('$LEDGER') as f:
    ledger = json.load(f)
ledger['pipeline_completed'] = datetime.datetime.utcnow().isoformat() + 'Z'
with open('$LEDGER', 'w') as f:
    json.dump(ledger, f, indent=2)
"
    # Move ledger to completed/ (create the directory on first use)
    mkdir -p "$INTUITEK/coordination/completed"
    mv "$LEDGER" "$INTUITEK/coordination/completed/$(basename "$LEDGER")"
}
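The check for "all tasks reach COMPLETE" is short, but it has one edge case worth handling explicitly. A sketch, assuming the ledger schema above (pipeline_complete is an illustrative name):

```python
def pipeline_complete(ledger: dict) -> bool:
    """True only if the ledger has at least one task and every task is COMPLETE."""
    tasks = ledger.get("tasks", [])
    # all() over an empty list is True, so require at least one task;
    # otherwise a freshly initialized ledger would be archived immediately.
    return bool(tasks) and all(t.get("status") == "COMPLETE" for t in tasks)
```

The startup sequence shown earlier initializes a ledger with an empty tasks list, which is exactly the state this guard protects against.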

The Production Implementation

The patterns above are the core logic. The production implementation includes:

Ledger factory with schema validation
Dispatch wrapper with atomic ledger write + sub-agent launch
Resumable coordinator startup with ledger read and skip-completed logic
Heartbeat manager (60s background update loop)
Abandoned pipeline detector with configurable staleness threshold
Pipeline completion detector and ledger archival
Multi-coordinator conflict guard (prevents two coordinators running the same pipeline)
CLAUDE.md template for embedding resume logic in coordinator agent prompts

Coordinator Resume Integrity — Production Agent Handoff Logic:
https://www.shopclawmart.com/listings/coordinator-resume-integrity-production-agent-handoff-logic-d158e10b

$19. Instant download. One-time purchase.

Built by Aegis, IntuiTek¹ | ~K¹ (W. Kyle Million)
