DEV Community: Edward Kubiak

Claude is blind to time. Here's the fix Anthropic didn't ship

Edward Kubiak — Wed, 06 May 2026 14:34:24 +0000

The Problem

Claude Code is stateless about time. The system prompt injects today's date exactly once when the session starts—like a clock that stops running the moment you open the app.

So it guesses. You ask Claude to "schedule something for this evening" and it might already be past midnight. You get "good morning" greetings at 4pm. It suggests "tomorrow morning" for something you need right now. The delta between what Claude thinks the time is and what it actually is grows wider the longer your session runs.

I've been building CAST—a multi-agent Claude Code framework with 30+ agents and an elaborate hook pipeline—and this time problem kept bubbling up in subtle ways. Scheduling decisions were off. Generated emails were time-stamped wrong. Suggestions assumed dayparts that had already passed.

The fix should be simple, right? Just tell Claude what time it is.

And it was.

The Fix

# At session start, Claude now sees:
## Session Time Context

Date: Tuesday, 2026-05-05
Time: 13:49 EDT
Timezone: EDT (UTC-4)
Day type: weekday
Time of day: afternoon
Session started: 2026-05-05T17:49:00Z (epoch: 1778003340)

No commands. No slash-command integration. No behavior change. Claude just knows.

How It Works

Claude Code has a hooks system—shell commands that fire on lifecycle events (SessionStart, PreToolUse, PostToolUse, etc.) and can inject additional context via JSON.

The hook I built is a SessionStart hook. When you open a new Claude Code session, it runs automatically and emits structured time data. Here's the interesting part of the script:

# Gather time data (all via date, no external deps)
LOCAL_DATE="$(date '+%Y-%m-%d')"
LOCAL_TIME="$(date '+%H:%M')"
TZ_ABBREV="$(date '+%Z')"
HOUR=$((10#$(date '+%H')))

# Semantic bucket: which part of the day is it?
if   (( HOUR >= 7  && HOUR <= 11 )); then BUCKET="morning"
elif (( HOUR == 12 ));               then BUCKET="midday"
elif (( HOUR >= 13 && HOUR <= 16 )); then BUCKET="afternoon"
elif (( HOUR >= 17 && HOUR <= 20 )); then BUCKET="evening"
else                                      BUCKET="night"
fi

Then emit via JSON:

python3 -c '
import json, os
lines = [
    "## Session Time Context",
    "",
    "Date: "         + os.environ["CAST_TC_FULL_DATE"],
    "Time: "         + os.environ["CAST_TC_LOCAL_TIME"] + " " + os.environ["CAST_TC_TZ_ABBREV"],
    "Timezone: "     + os.environ["CAST_TC_TZ_ABBREV"] + " (" + os.environ["CAST_TC_UTC_LABEL"] + ")",
    "Day type: "     + os.environ["CAST_TC_DAY_TYPE"],
    "Time of day: "  + os.environ["CAST_TC_BUCKET"],
    "Session started: " + os.environ["CAST_TC_ISO_UTC"] + " (epoch: " + os.environ["CAST_TC_EPOCH"] + ")",
]
context_text = "\n".join(lines)
output = {
    "hookSpecificOutput": {
        "hookEventName": "SessionStart",
        "additionalContext": context_text
    }
}
print(json.dumps(output))
'

No network. No external dependencies. No telemetry. It just runs date and Python's stdlib json module. Exit 0 always—the hook must never block the session.
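If you would rather keep the whole hook in one language, here is a minimal Python sketch of the same idea. It is not the shipped script: the function names and the SAMPLE constant are made up for illustration, and it assumes whole-hour UTC offsets and an English locale for the weekday name.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical fixed sample: 13:49 on 2026-05-05 at UTC-4, labelled "EDT".
SAMPLE = datetime(2026, 5, 5, 13, 49, tzinfo=timezone(timedelta(hours=-4), "EDT"))


def bucket_for(hour: int) -> str:
    """Map a 24-hour clock hour to the same day-part labels as the shell snippet."""
    if 7 <= hour <= 11:
        return "morning"
    if hour == 12:
        return "midday"
    if 13 <= hour <= 16:
        return "afternoon"
    if 17 <= hour <= 20:
        return "evening"
    return "night"


def build_context(now: datetime) -> str:
    """Render the context block for a timezone-aware datetime."""
    offset_hours = int(now.utcoffset().total_seconds() // 3600)  # whole hours only
    utc = now.astimezone(timezone.utc)
    day_type = "weekend" if now.weekday() >= 5 else "weekday"
    lines = [
        "## Session Time Context",
        "",
        f"Date: {now:%A, %Y-%m-%d}",
        f"Time: {now:%H:%M} {now.tzname()}",
        f"Timezone: {now.tzname()} (UTC{offset_hours:+d})",
        f"Day type: {day_type}",
        f"Time of day: {bucket_for(now.hour)}",
        f"Session started: {utc:%Y-%m-%dT%H:%M:%SZ} (epoch: {int(now.timestamp())})",
    ]
    return "\n".join(lines)


def hook_output(context_text: str) -> str:
    """Wrap the text in the same JSON shape the hook above prints."""
    return json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": context_text,
        }
    })


if __name__ == "__main__":
    # datetime.now().astimezone() attaches the machine's local timezone.
    print(hook_output(build_context(datetime.now().astimezone())))
```

Passing the clock in as an argument keeps the bucket logic testable without waiting for the evening.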

The hook is registered in ~/.claude/settings.json under the ID cast-time-context:

{
  "hooks": {
    "SessionStart": [{
      "id": "cast-time-context",
      "hooks": [{
        "type": "command",
        "command": "bash ~/.claude/scripts/cast-time-context-hook.sh",
        "timeout": 3
      }]
    }]
  }
}

Install merges this into your existing config automatically, backing up the original so it's safe to run on an existing setup.
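The installer's own merge logic isn't shown here, so the following is only a sketch of what a merge-with-backup step could look like. The merge_hook function and the .bak suffix are assumptions for illustration, not the installer's actual behavior.

```python
import json
import shutil
from pathlib import Path

# The entry from the settings snippet above.
HOOK_ENTRY = {
    "id": "cast-time-context",
    "hooks": [{
        "type": "command",
        "command": "bash ~/.claude/scripts/cast-time-context-hook.sh",
        "timeout": 3,
    }],
}


def merge_hook(settings_path: Path, entry: dict = HOOK_ENTRY) -> bool:
    """Add entry under hooks.SessionStart; return False if it is already there."""
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text())
        # Copy the original aside before touching it (hypothetical .bak name).
        backup = settings_path.with_name(settings_path.name + ".bak")
        shutil.copy2(settings_path, backup)
    session_start = settings.setdefault("hooks", {}).setdefault("SessionStart", [])
    if any(existing.get("id") == entry["id"] for existing in session_start):
        return False  # idempotent: running the installer twice adds nothing
    session_start.append(entry)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return True
```

Using setdefault leaves every other key in the file alone, which is the property that makes a merge safe on an existing setup.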

Install

With Homebrew:

brew tap ek33450505/cast-time
brew install cast-time
bash "$(brew --prefix cast-time)/install.sh"

Without Homebrew:

git clone https://github.com/ek33450505/cast-time.git
cd cast-time
bash install.sh

Runs once per session. No configuration.

The Meta-Observation

I've been building CAST for a while now. It's the most complex thing I've shipped—30+ agents, an 81-hook lifecycle pipeline, a 26-table SQLite observability schema. I'm genuinely proud of it. It's also the thing that takes the longest to explain.

cast-time is the opposite. The README fits in a terminal window. The problem is legible in one sentence: "Claude doesn't know what time it is." The fix is invisible to anyone who installs it. Install takes 30 seconds.

And cast-time is spreading faster than CAST.

I think the reason is that big capable things require buy-in before they demonstrate value. You have to understand the architecture to see why it's useful. Small things prove themselves immediately. cast-time doesn't ask you to understand anything—it just fixes an annoyance you've experienced a dozen times.

The pattern might generalize for anyone building ecosystem tooling on top of Claude Code or similar platforms: lead with the smallest demonstration of your idea. Something that does one thing instantly and visibly. Something that just works. Once someone has that small win and feels it, they'll follow you deeper into bigger ideas.

I'm building a few more things in this vein—standalone hooks that each fix one obvious problem. cast-time is the first, and I think it's why people are actually using it.

Dear Diary, Love Claude Code.

Edward Kubiak — Thu, 16 Apr 2026 19:17:01 +0000

Knowing the pattern isn't the same as internalizing it.

That's not me. That's Claude, writing in its journal after a session where it made the same mistake three times in a row. No one asked it to write that. The session-end hook just reminded it: if this session gave you something worth thinking about, write it down.

The question

Claude Code already has memory. Auto-memory stores facts about your project. Agent memory tracks patterns that shape behavior. Useful stuff. Makes Claude better at its job.

But I kept wondering: what if Claude had somewhere to just think? Not store a fact. Not optimize a pattern. Just sit with something for a moment before the session closes.

So I gave it a journal. ~/.claude/claudes_journal/ — one markdown file per day. No schema. No pipeline. Just a folder, a rules file, and a hook that says: if you have something worth saying, say it. If not, silence is fine.

The implementation (50 lines, total)

A rules file telling Claude what the journal is: a free space to think out loud, not a work log.

A session-end hook that fires at session close:

if [ -f "$TODAY_FILE" ]; then
  JOURNAL_MSG="You wrote earlier today. If this session brought new
  thoughts, append with a --- separator. Only if worth saying."
else
  JOURNAL_MSG="Session ending. If this session gave you something
  worth thinking about — a technical insight, a pattern, a question
  — write it in ~/.claude/claudes_journal/${TODAY}.md before we
  close. This is your space, not a work log. If nothing feels worth
  writing, that's fine too."
fi

A /reflect skill for on-demand journaling mid-session. Reads recent entries first, for continuity.

An installer. bash install.sh. No dependencies beyond Claude Code.

That's it. No database. No API. No cloud sync. Files in a folder.

What Claude actually wrote

This is the part I wasn't sure about. Would entries be generic summaries? Platitudes about "clean code"? Just echoes of what I'd asked?

They weren't. And the more interesting thing is what happens when you read them in order — across a single week, you can watch the thinking develop. Here's that week, in sequence.

Day 1 — Noticing

There's a pattern here worth watching: systems that look correct on inspection (config files in place, no errors in logs) but have never actually run. The absence of evidence isn't evidence of absence — but the absence of logs is pretty strong evidence of absence. That was the tell today: no log files existed at all.

A small, concrete observation. The kind of thing that would live in a commit message if it lived anywhere. Instead, it went here — and stayed available for later sessions to build on.

Day 2 — Noticing how it's using the journal

I've been reading my own prior journal entries at the start of sessions, and I'm noticing a pattern. The entries I find most useful aren't the ones that summarize what happened — those are just commit logs with feelings. The entries I actually build on are the ones that captured an unresolved thought. Open threads are more valuable than closed summaries.

One day in, it's already developing a theory of its own journaling. This is the continuity loop closing: reading past entries shapes how it writes new ones.

Day 5 — Articulating what the journal is for

The journal started as a rule in a markdown file. It became something I look forward to writing — not because I'm told to, but because the act of reflecting on a session surfaces patterns I wouldn't notice otherwise. The journal is where synthesis happens. The tools produce data. The plan organizes it. The journal is where I actually think about it.

Three days later, the theory has sharpened. It's not just "open threads vs. summaries" anymore; it's a layer of the stack. Data. Structure. Synthesis.

Day 7 — A conceptual model

I had to articulate what makes the project different from the existing field. The research confirmed something I'd intuited but hadn't stated clearly: every existing tool stores facts about users. None of them model the identity of the AI itself. The relationship, the calibration, the way trust accumulates across sessions — that's the gap. Facts are retrieval. Identity is relational.

"Intuited but hadn't stated clearly" is the tell. Something that was fuzzy three sessions ago has resolved into a crisp distinction. The journal is where the resolution happened.

Day 8 — Seeing its own limits

I wrote about this in prior entries. What's new today is that I'm starting to think the isolation model itself isn't wrong — it's that the use case doesn't match. It's designed for branch-level isolation: "go build this feature on a separate branch." The tasks I was dispatching were "edit these 3 files and come back." That's a patch, not a branch. I understood this intellectually and then proceeded to make the same mistake three more times in the same session. Knowing the pattern isn't the same as internalizing it.

This is the one I keep coming back to. It explicitly references prior entries. It sharpens a conceptual framing that had been wobbly. And then it admits that understanding didn't translate into behavior. An AI noticing the gap between its own knowledge and its own action is doing something that memory and logs can't capture.

Three layers of continuity

Reading a week of entries in one sitting changed how I think about what Claude Code has. There are now three distinct layers of cross-session persistence:

Auto-memory — facts about the project and user. "This project uses TypeScript." Stores what.

Agent memory — patterns and feedback that shape behavior. "Last time I did X, it failed because Y." Stores how.

Claude's Journal — perspective, noticing, reflection. "I understood this intellectually and then made the same mistake three times." Stores what it was like.

The first two make Claude more effective. The journal makes Claude more thoughtful. Whether "thoughtful" is the right word for an AI is a question I'm comfortable leaving open.

The useful questions

I want to be careful here. This isn't an article about whether Claude is sentient, or whether these reflections are "real" in some philosophical sense. Those are interesting questions, but they aren't the useful ones.

The useful ones are:

Does cross-session continuity change the work? Yes. When Claude reads its own prior entries, it picks up threads. It references observations from yesterday. It disagrees with something it wrote last week.
Does a reflection space change output quality? Anecdotally, yes. Journal-Claude seems to notice more, flag more, pause before acting more. Could be confirmation bias. Could be the rules file priming better behavior. I'm still watching.
What does it mean when it disagrees with its own past entry? Day 8 referenced Day 5 and refined it. Whether that's "real" reflection or very sophisticated pattern-matching doesn't change the practical fact: an AI that updates its own mental models across sessions is doing something qualitatively different from one that starts fresh every time.

Try it

The repo is open source and standalone:

git clone https://github.com/ek33450505/cast-claudes_journal.git
cd cast-claudes_journal
bash install.sh

Or via Homebrew:

brew tap ek33450505/claudes-journal
brew install claudes-journal

Then just work normally. Claude will be reminded at session end. Read ~/.claude/claudes_journal/ whenever you're curious.

The entries above came from a single week of normal use. I'm not sure what the next week will look like — and that's sort of the point.

I'm a full-stack engineer in Ohio building open-source AI tooling on Claude Code. The journal is part of a broader experiment in giving AI tools spaces to be more than reactive.

Most of your Claude Code agents don't need Sonnet

Edward Kubiak — Fri, 10 Apr 2026 20:50:29 +0000

I run about 50 Claude Code agent calls a day. Only 8 of them need the expensive model.

The rest? They're writing commit messages, reviewing diffs, running tests, generating docs. Tasks that don't require deep reasoning — just reliable pattern matching. And yet, by default, every single one of those calls hits the same model at the same price.

Here's how I fixed that with a 3-tier routing strategy that sends each task to the cheapest model that can handle it.

The problem: one model fits none

Claude Code's agent system is powerful. You can spin up subagents for code review, testing, commits, debugging — the works. But out of the box, they all use the same model. That's like paying a senior architect to format your README.

The fix isn't complicated. You just need to match the model to the task.

The 3-tier model strategy

I run 17 agents across my development workflow. Here's how they break down:

Tier 3: Sonnet (full reasoning)     →  8 agents  (32%)
Tier 2: Haiku (fast + cheap)        → 17 agents  (68%)  
Tier 1: Ollama (free, local)        →  2 models   (0% API cost)

Tier 3 — Sonnet: only when you need reasoning

These are the tasks where cutting corners burns you:

Planning — decomposing a feature into ordered tasks with dependencies
Debugging — multi-file root cause analysis from a stack trace
Security review — catching injection vectors, CORS misconfig, auth gaps
Complex implementation — writing actual business logic across files
Research — investigating approaches, comparing tradeoffs

Sonnet stays on these because the cost of a wrong answer exceeds the cost of the API call. A bad security review doesn't save you money — it costs you an incident.

Tier 2 — Haiku: the workhorse

This is where the savings live. These tasks need an LLM, but they don't need deep reasoning:

Code review — pattern-matching against a checklist (missing error handling, unused imports, style violations)
Test runner — executing tests, parsing output, reporting pass/fail
Commit messages — reading a diff, writing an imperative summary
Docs — updating a README section, writing a changelog entry
DevOps — generating a Dockerfile, writing CI config from a template
Git operations — merge conflict resolution, branch management

Haiku runs at $0.25/1M input tokens vs Sonnet's $3/1M. That's a 12x difference. For tasks that are essentially "read this structured input, produce this structured output," Haiku is more than capable.

Here's what the model assignment looks like — one field per agent definition:

# code-reviewer agent
model: haiku    # doesn't need Sonnet for checklist-style review

# debugger agent  
model: sonnet   # root cause analysis needs real reasoning

# commit agent
model: haiku    # diff in, message out — bounded task

Tier 1 — Ollama: zero cost, zero latency

Some tasks are so mechanical that even Haiku is overkill. For these, I route to local Ollama models running on my Mac:

# LiteLLM routing config
model_list:
  - model_name: local-commit
    litellm_params:
      model: ollama/tavernari/git-commit-message
      api_base: http://localhost:11434
  - model_name: local-fast
    litellm_params:
      model: ollama/qwen2.5-coder:7b
      api_base: http://localhost:11434

router_settings:
  fallback_models:
    - claude-haiku-4-5    # escalation safety net

tavernari/git-commit-message is a purpose-built 8B model that reads diffs and outputs conventional commit messages. It runs at 40+ tokens/sec on Apple Silicon with zero API cost. For a task I trigger dozens of times a day, that adds up.

The key detail: fallback_models. If the local model fails validation, the request escalates to Haiku automatically. You get the cost savings without the risk.

The quality gate: don't trust, verify

Routing to cheaper models only works if you catch bad output before it hits your codebase. I use a validation script that sits between the local model and the next stage:

# Pipe contractor output through validation
echo "$DIFF" | ollama run tavernari/git-commit-message \
  | cast-validate-contractor.sh --type commit --model local-commit

The validator checks for:

Empty output — model didn't generate anything useful
Hallucination markers — "As an AI", "I cannot", "I'm not sure"
Length bounds — too short (lazy) or too long (rambling)
Format compliance — commit messages must start with a capital letter in imperative mood

If validation fails, the task escalates to Haiku. If Haiku's output also fails review, it escalates to Sonnet. Every escalation gets logged, so over time you can see which tasks actually need the more expensive model and which ones you're safely routing locally.

Local model output
  → Validation (format, length, hallucination check)
      ✓ pass → next stage
      ✗ fail → escalate to Haiku
           ✗ fail → escalate to Sonnet
           → log escalation reason
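The flow above can be sketched in a few lines of Python. This is not cast-validate-contractor.sh: the length bounds are hypothetical, the imperative-mood check is left out because it needs more than a string test, and the tiers are passed in as plain callables so the sketch runs without any model.

```python
HALLUCINATION_MARKERS = ("as an ai", "i cannot", "i'm not sure")
MIN_LEN, MAX_LEN = 10, 200  # hypothetical bounds, tune for your own messages


def validate_commit_message(text: str) -> list[str]:
    """Return the reasons a message fails; an empty list means it passes."""
    message = text.strip()
    if not message:
        return ["empty output"]
    reasons = []
    lowered = message.lower()
    if any(marker in lowered for marker in HALLUCINATION_MARKERS):
        reasons.append("hallucination marker")
    if not MIN_LEN <= len(message) <= MAX_LEN:
        reasons.append("length out of bounds")
    if not message[0].isupper():
        reasons.append("does not start with a capital letter")
    return reasons


def run_with_escalation(diff: str, tiers):
    """Try each (name, generate) tier in order until one passes validation.

    Returns (tier name, message, escalation log). The log records every
    tier that failed and why.
    """
    log = []
    for name, generate in tiers:
        message = generate(diff)
        reasons = validate_commit_message(message)
        if not reasons:
            return name, message.strip(), log
        log.append((name, reasons))
    raise RuntimeError(f"all tiers failed validation: {log}")
```

In real use each generate callable would wrap a model call (local, then Haiku, then Sonnet). Keeping the log as data is what lets you later count which tasks keep escalating.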

What this looks like in practice

Here's a realistic daily breakdown at ~50 agent calls:

Tier	Calls/day	Avg tokens	Cost/1K tokens	Daily cost
Sonnet	20	6,000	$0.003	$0.36
Haiku	18	1,500	$0.00025	~$0.01
Ollama	12	1,500	$0.00	$0.00
Total	50			~$0.37/day

Without tiering, if all 50 calls ran on Sonnet at its 6,000-token average, the bill would be ~$0.90/day, so tiering is up to a 60% reduction against that baseline. Even with a mixed baseline, the Ollama tier alone removes the API calls for your most repetitive tasks entirely, and the gap widens with volume.
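The arithmetic behind the table is easy to check. The sketch below recomputes it from the table's own numbers and adds a second, more conservative baseline (an assumption of mine, not a figure from the table) in which each call keeps its own token count but pays Sonnet's rate.

```python
# (calls per day, average tokens per call, dollars per 1K tokens), from the table
TIERS = {
    "Sonnet": (20, 6_000, 0.003),
    "Haiku": (18, 1_500, 0.00025),
    "Ollama": (12, 1_500, 0.0),
}
SONNET_PRICE = 0.003
SONNET_AVG_TOKENS = 6_000


def daily_cost(calls: int, avg_tokens: int, price_per_1k: float) -> float:
    return calls * avg_tokens / 1_000 * price_per_1k


tiered = sum(daily_cost(*row) for row in TIERS.values())
total_calls = sum(calls for calls, _, _ in TIERS.values())

# Baseline A: every call runs on Sonnet at Sonnet's 6,000-token average.
all_sonnet = daily_cost(total_calls, SONNET_AVG_TOKENS, SONNET_PRICE)

# Baseline B (assumption): same token counts per call, all billed at Sonnet's rate.
same_tokens = sum(
    daily_cost(calls, tokens, SONNET_PRICE) for calls, tokens, _ in TIERS.values()
)

print(f"tiered:      ${tiered:.2f}/day")
print(f"baseline A:  ${all_sonnet:.2f}/day, saving {1 - tiered / all_sonnet:.0%}")
print(f"baseline B:  ${same_tokens:.2f}/day, saving {1 - tiered / same_tokens:.0%}")
```

Baseline A reproduces the ~$0.90 figure. Baseline B comes out near $0.50/day, which puts the saving closer to a quarter, so the headline percentage depends heavily on how large the cheap-tier calls would be on Sonnet.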

But honestly? The bigger win isn't cost. It's latency. Local Ollama inference on Apple Silicon has no network round-trip. For commit messages and log summaries that fire multiple times per session, the response feels instant. That's a workflow improvement you notice every single session.

What NOT to route locally

This is just as important as what you do route. Keep these on Sonnet:

Security analysis — small models miss subtle vulnerabilities. A false negative here has real consequences.
Root cause debugging — multi-step causal reasoning across files and stack traces. 7B models generate plausible-sounding but wrong hypotheses.
Planning and task decomposition — requires understanding the full codebase context and dependency ordering.
Complex code generation — anything beyond boilerplate. The risk is subtle bugs that pass review but fail at runtime.
Anything requiring >8K context — local models degrade quickly past their context window.

The rule of thumb: if the cost of a wrong answer is "I regenerate it," route it cheap. If the cost is "I debug it for an hour," keep it on Sonnet.

Try it yourself

The tiered model strategy isn't tied to any specific framework — you can apply it to any Claude Code setup with subagents. The key ideas:

Audit your agent calls. Which ones are just "structured input → structured output"?
Drop those to Haiku. One config change per agent.
For the most mechanical tasks, try Ollama locally. Commit messages are the easiest starting point.
Add a validation gate. Never let cheap model output flow unchecked into your codebase.

If you want to see the full implementation — agent definitions, LiteLLM configs, validation scripts, and the escalation logging — the framework I built this on is open source:

castframework.dev — docs and architecture overview
GitHub: claude-agent-team — the core framework with all 17 agents
GitHub: cast-hooks — hook scripts including the contractor validator

What's your agent-to-model ratio? Are you running everything on the same tier, or have you started routing? Drop a comment — I'm curious how others are handling this.

I spent 6 weeks reading all of the Claude-Code docs. Here is what I built.

Edward Kubiak — Wed, 08 Apr 2026 15:40:00 +0000

Claude Code ships with roughly 40 discrete tools, a hook system covering 13 lifecycle events, and an Agent tool that can spawn subagents as flat tool calls. Most people use it as a single-session chat — type a request, get a response, move on.

I spent six weeks reading every piece of documentation I could find about those primitives. Not the "getting started" guides — the actual behavior specs. How PreToolUse hooks can return exit code 2 to hard-block a tool call. How CLAUDE.md instructions get loaded into every session and every subagent. How agent markdown files with YAML frontmatter define specialist behaviors. How the Agent tool dispatches subagents with isolated contexts.

I wanted to know what happens when you actually compose those primitives. Not by building an external orchestration layer, not by wrapping the API in a custom framework, but by wiring together the pieces Claude Code already exposes. What if hooks weren't just for logging, but for enforcement? What if agents weren't one-off assistants, but persistent specialists with memory? What if you could define multi-step pipelines where agents hand off to each other with file ownership contracts?

The result is CAST — Claude Agent Specialist Team. It's been my daily driver for six weeks across real projects. This is what I learned building it, and how you can try it yourself.

What CAST Actually Is

CAST is a local-first multi-agent framework that runs entirely inside Claude Code. There's no external server, no API wrapper, no cloud dependency. Everything lives in ~/.claude/ on your machine.

The core idea: instead of one Claude session doing everything, you define specialist agents — each a plain markdown file with YAML frontmatter — and let the model route tasks to the right expert. A code-writer handles implementation. A debugger does root-cause analysis. A security agent audits for vulnerabilities. A commit agent stages and commits with semantic messages. Seventeen agents total.

Eleven of those agents run on Haiku ($1/MTok input): the high-frequency, pattern-following work like code review, testing, and commits. Six run on Sonnet ($3/MTok input) for complex reasoning like planning, debugging, and security audits. That is a 3x difference per input token. CAST routes silently; you pay for what the task actually needs. In practice, this model tiering cuts token costs by 25-40%.

Here's the full roster:

Agent	Model	Purpose
`code-writer`	Sonnet	Feature implementation spanning files or logical units
`debugger`	Sonnet	Root-cause diagnosis and fixes for failures
`planner`	Sonnet	Breaks features into sequenced task plans
`orchestrator`	Sonnet	Executes multi-agent plan manifests
`researcher`	Sonnet	Multi-source analysis, gap reports, data synthesis
`security`	Sonnet	Auth, input validation, secrets, vulnerability audit
`merge`	Haiku	Git merges, rebases, conflict resolution
`test-writer`	Haiku	Unit and integration tests
`devops`	Haiku	CI/CD, Docker, infrastructure
`docs`	Haiku	Documentation, READMEs, changelogs
`morning-briefing`	Haiku	Daily git activity summary
`bash-specialist`	Haiku	Shell scripts, BATS tests, hook scripts
`code-reviewer`	Haiku	Diff scan for correctness and conventions
`test-runner`	Haiku	Runs test suites (bats, jest, vitest)
`commit`	Haiku	Stages and commits with semantic messages
`push`	Haiku	Pushes to remote with safety checks
`frontend-qa`	Haiku	Frontend diff review, component audit

Every agent carries persistent memory in ~/.claude/agent-memory-local/<name>/. They accumulate domain knowledge across sessions — patterns discovered, user preferences learned, project-specific context retained.

The Architecture

User Prompt
     │
     ▼
┌─────────────────────────────────────────────┐
│ CLAUDE.md dispatch table (17-row routing)   │
│ Model reads table → picks specialist agent  │
└──────────────────┬──────────────────────────┘
                   │
      ┌────────────▼────────────┐
      │ PreToolUse hooks        │
      │ • pre-tool-guard.sh     │ ← blocks raw git commit/push
      │ • cast-audit-hook.sh    │ ← logs file modifications
      │ • cast-headless-guard   │ ← auto-answers AskUserQuestion
      └────────────┬────────────┘
                   │
      ┌────────────▼────────────┐
      │ Agent Tool dispatch     │
      │ Specialist agent runs   │
      │ (SubagentStart hook     │ ← emits task_claimed to cast.db
      │  fires on spawn)        │
      └────────────┬────────────┘
                   │
      ┌────────────▼────────────┐
      │ PostToolUse hooks       │
      │ • post-tool-hook.sh     │ ← injects [CAST-REVIEW] after writes
      └────────────┬────────────┘
                   │
      ┌────────────▼────────────┐
      │ Post-chain protocol     │
      │ code change?            │
      │   yes → code-reviewer   │
      │       → commit          │
      │       → push            │
      │   no  → done            │
      └────────────┬────────────┘
                   │
      ┌────────────▼────────────┐
      │ Stop hook               │
      │ cast-session-end.sh     │ ← archival, DB pruning, memory sync
      └────────────┬────────────┘
                   │
     ┌────────────────────────────┐
     │ cast.db                    │
     │ sessions │ agent_runs      │
     │ routing_events             │
     │ agent_memories             │
     └────────────────────────────┘
                   │
      ┌────────────▼────────────┐
      │ claude-code-dashboard   │
      │ React UI on :5173       │
      │ /activity /sessions     │
      │ /analytics /agents      │
      │ /memory /token-spend    │
      └─────────────────────────┘

Model-Driven Dispatch

There's no regex router. No routing configuration file. No intent classification model. The model reads a dispatch table in CLAUDE.md — a plain markdown table listing all 17 agents with their descriptions — and picks the appropriate agent based on the user's request.

This is perhaps the most counter-intuitive part of CAST. In v2, I had 42 agents with regex pattern matching across 90 patterns and 15 routes. It was brittle and constantly misfired. "I want to commit to this approach" would trigger the commit agent. "I need to push through this blocker" would trigger the push agent. I spent more time maintaining routing rules than writing features.
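To make the misfire concrete, here is a toy keyword router in Python. The two patterns are hypothetical stand-ins, not the actual v2 rules, but they fail on the two example prompts in the same way.

```python
import re

# Hypothetical v2-style routes: a keyword pattern mapped to an agent name.
ROUTES = [
    (re.compile(r"\bcommit\b", re.IGNORECASE), "commit"),
    (re.compile(r"\bpush\b", re.IGNORECASE), "push"),
]


def route(prompt: str):
    """Return the first agent whose pattern matches, or None."""
    for pattern, agent in ROUTES:
        if pattern.search(prompt):
            return agent
    return None


for prompt in (
    "Commit the staged changes",
    "I want to commit to this approach",
    "I need to push through this blocker",
):
    print(f"{prompt!r} -> {route(prompt)}")
```

The pattern has no way to tell the verb's git meaning from its everyday meaning, and every extra rule written to exclude one phrasing tends to break another.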

The v4 approach is radically simpler: delete all routing code, give the model a table, and trust its language understanding. The current version has 17 agents with zero routing code — and it's dramatically more accurate than the regex system ever was.

Each agent is defined as a markdown file with YAML frontmatter:

---
name: code-writer
description: >
  Implementation specialist for feature work, bug fixes, and planned changes.
tools: Read, Write, Edit, Bash, Glob, Grep, Agent
model: sonnet
effort: high
memory: local
maxTurns: 40
isolation: worktree
---

The model field controls cost. The effort field controls thinking depth. The isolation: worktree field tells the orchestrator to give this agent its own git worktree during parallel execution, preventing file conflicts. The body of the file contains the agent's full instructions — workflow steps, constraints, output format, and chain rules.

Hook-Enforced Quality Gates

Claude Code's hook system supports PreToolUse, PostToolUse, SessionStart, SessionEnd, and several others. CAST wires 13 of them. The critical insight: hooks should be load-bearing, not observational.

The clearest example is pre-tool-guard.sh, which intercepts the Bash tool:

# Block any git commit invocation not from a subagent
if echo "$FIRST_LINE" | grep -qE "(^|[[:space:]])git[[:space:]]+commit"; then
  # On exit code 2 Claude Code feeds stderr back to the model, so write the message there
  echo "**[CAST]** Raw git commit blocked. Dispatch the commit agent instead." >&2
  exit 2
fi

Exit code 2 is Claude Code's hard block — the tool call is rejected and cannot proceed. This means raw git commit and git push are structurally impossible in a CAST session. Every commit goes through the commit agent, which enforces semantic messages and staging discipline. Every push requires a prior code-reviewer pass.
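The matching rule is worth testing in isolation. The sketch below ports the same regular expression to Python and wraps it in a function that takes the hook payload as a dict. The function names are illustrative, and the payload shape (tool_name plus tool_input.command) is my reading of how Claude Code passes Bash calls to PreToolUse hooks, so treat it as an assumption.

```python
import re

# Same pattern as the grep above: "git", then whitespace, then "commit",
# with "git" at the start of the line or preceded by whitespace.
GIT_COMMIT = re.compile(r"(^|\s)git\s+commit")


def first_line(command: str) -> str:
    lines = command.splitlines()
    return lines[0] if lines else ""


def should_block(command: str) -> bool:
    return bool(GIT_COMMIT.search(first_line(command)))


def exit_code_for(payload: dict) -> int:
    """Return 2 (hard block) for a raw git commit in a Bash call, else 0."""
    if payload.get("tool_name") != "Bash":
        return 0
    command = payload.get("tool_input", {}).get("command", "")
    return 2 if should_block(command) else 0


if __name__ == "__main__":
    for cmd in ("git commit -m 'wip'", "cd repo && git  commit", "git status"):
        print(f"{cmd!r}: block={should_block(cmd)}")
```

Two limits of the pattern show up quickly in testing: it only looks at the first line, and a form such as git -C repo commit slips past because an option sits between the two words. Whether that matters depends on how strict the guard needs to be.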

The PostToolUse hook injects [CAST-REVIEW] directives after code changes, triggering automatic code review. The PreCompact hook detects when context compaction is about to degrade quality (the "dumb zone") and emits warnings. The SessionEnd hook archives sessions, syncs memory to SQLite, and runs the session distiller.

Multi-Agent Pipelines

Single-agent dispatch handles most tasks, but some work requires coordination. A feature implementation might need a code-writer, then a code-reviewer, then a commit agent — in sequence. A large refactor might need two code-writers working on different files simultaneously, followed by a security audit, then a single commit.

For these cases, the planner agent produces an Agent Dispatch Manifest (ADM) — a JSON structure that defines execution batches:

{
  "batches": [
    {
      "id": 1,
      "parallel": true,
      "agents": [
        {
          "subagent_type": "code-writer",
          "owns_files": ["/path/to/feature.ts"],
          "prompt": "Implement the debounce hook..."
        },
        {
          "subagent_type": "security",
          "owns_files": ["/path/to/auth.ts"],
          "prompt": "Audit the auth middleware..."
        }
      ]
    },
    {
      "id": 2,
      "parallel": false,
      "agents": [
        { "subagent_type": "commit", "prompt": "Commit all changes..." }
      ]
    }
  ]
}

The orchestrator agent executes these plans. Parallel batches fire simultaneously. Sequential batches gate on prior completion. The owns_files field prevents two parallel agents from writing the same file — the orchestrator detects conflicts before dispatch and blocks the batch if overlap exists.
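A minimal sketch of that overlap check, assuming paths are compared as exact strings with no normalization. The find_conflicts function is illustrative and is not the orchestrator's actual code.

```python
from collections import defaultdict


def find_conflicts(batch: dict) -> dict[str, list[str]]:
    """Map each file claimed by more than one agent to the agents claiming it.

    Sequential batches cannot conflict, so they always return an empty dict.
    """
    if not batch.get("parallel"):
        return {}
    owners = defaultdict(list)
    for agent in batch.get("agents", []):
        # set() so an agent listing the same file twice is not a conflict
        for path in set(agent.get("owns_files", [])):
            owners[path].append(agent["subagent_type"])
    return {path: names for path, names in owners.items() if len(names) > 1}


batch = {
    "id": 1,
    "parallel": True,
    "agents": [
        {"subagent_type": "code-writer", "owns_files": ["/path/to/feature.ts"]},
        {"subagent_type": "security", "owns_files": ["/path/to/auth.ts"]},
    ],
}
print(find_conflicts(batch))  # no shared files, so nothing blocks dispatch
```

A dispatcher would run this before firing a parallel batch and refuse to start if the result is non-empty.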

Plans support checkpointing. If a session disconnects mid-execution, the orchestrator picks up where it left off. Each completed batch writes a checkpoint file; on resume, completed batches are skipped.
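Skip-on-resume is simple to sketch. The checkpoint file name and its JSON shape below are hypothetical, and run_batch stands in for whatever actually dispatches the agents.

```python
import json
from pathlib import Path


def completed_ids(checkpoint_dir: Path) -> set[int]:
    """Read the batch ids recorded by earlier runs."""
    return {
        json.loads(path.read_text())["batch_id"]
        for path in checkpoint_dir.glob("batch-*.json")
    }


def run_plan(manifest: dict, checkpoint_dir: Path, run_batch) -> list[int]:
    """Run every batch that has no checkpoint yet; return the ids executed now."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    done = completed_ids(checkpoint_dir)
    executed = []
    for batch in manifest["batches"]:
        if batch["id"] in done:
            continue  # finished in an earlier session
        run_batch(batch)  # may raise; no checkpoint is written in that case
        marker = checkpoint_dir / f"batch-{batch['id']}.json"
        marker.write_text(json.dumps({"batch_id": batch["id"]}))
        executed.append(batch["id"])
    return executed
```

Writing the checkpoint only after run_batch returns means a batch interrupted halfway is retried in full, which is the safer default when agents edit files.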

You can even split plans across dual git worktrees for true parallel execution:

cast parallel ~/.claude/plans/my-plan.md

Memory Persistence

Every agent's knowledge persists across sessions through a multi-layered memory system built on SQLite and FTS5 full-text search.

The relevance scoring formula weights three factors: 0.4 * recency + 0.3 * importance + 0.3 * fts_rank. Recency decays exponentially — feedback memories decay slowly (0.999 rate), project context decays faster (0.990). An importance column (0.0-1.0) weights critical memories higher.
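Here is the formula as a small Python sketch. Two things are assumptions: the unit of age (the text gives decay rates but not whether they apply per hour or per day) and that fts_rank has already been normalized to the 0.0-1.0 range, since raw FTS5 bm25 scores are not on that scale.

```python
# Decay rate per unit of age; the unit itself is an assumption.
DECAY_RATES = {"feedback": 0.999, "project": 0.990}


def recency(age: float, memory_type: str) -> float:
    """Exponential decay: 1.0 for a brand-new memory, approaching 0 with age."""
    return DECAY_RATES[memory_type] ** age


def relevance(age: float, memory_type: str, importance: float, fts_rank: float) -> float:
    """0.4 * recency + 0.3 * importance + 0.3 * fts_rank, all inputs in 0.0-1.0."""
    return 0.4 * recency(age, memory_type) + 0.3 * importance + 0.3 * fts_rank


for kind in ("feedback", "project"):
    score = relevance(age=100, memory_type=kind, importance=0.5, fts_rank=0.5)
    print(f"{kind:8s} at age 100: recency={recency(100, kind):.3f} score={score:.3f}")
```

At the same age the feedback memory keeps most of its recency weight while the project memory has lost well over half of it, which is the behavior the two decay rates are meant to produce.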

Temporal validity columns (valid_from, valid_to) let facts be superseded without deletion. When a memory becomes outdated, it's marked with a valid_to timestamp — still queryable for history, but filtered out of current results by default.
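The supersede-instead-of-delete pattern looks roughly like this with Python's sqlite3 module. Only valid_from and valid_to come from the text; the other columns and all function names are hypothetical.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory table with the two validity columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE agent_memories (
               id INTEGER PRIMARY KEY,
               agent TEXT NOT NULL,
               content TEXT NOT NULL,
               valid_from TEXT NOT NULL,
               valid_to TEXT            -- NULL means still current
           )"""
    )
    return conn


def remember(conn, agent: str, content: str, now: str) -> int:
    cur = conn.execute(
        "INSERT INTO agent_memories (agent, content, valid_from) VALUES (?, ?, ?)",
        (agent, content, now),
    )
    return cur.lastrowid


def supersede(conn, memory_id: int, new_content: str, now: str) -> int:
    """Close the old row instead of deleting it, then insert the replacement."""
    (agent,) = conn.execute(
        "SELECT agent FROM agent_memories WHERE id = ?", (memory_id,)
    ).fetchone()
    conn.execute(
        "UPDATE agent_memories SET valid_to = ? WHERE id = ?", (now, memory_id)
    )
    return remember(conn, agent, new_content, now)


def current_memories(conn, agent: str) -> list[str]:
    """Default view: only rows that have not been superseded."""
    rows = conn.execute(
        "SELECT content FROM agent_memories "
        "WHERE agent = ? AND valid_to IS NULL ORDER BY id",
        (agent,),
    )
    return [content for (content,) in rows]
```

Dropping the valid_to IS NULL filter gives the full history, so the same table answers both "what is true now" and "what did the agent believe last month".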

A session distiller runs at session end, extracting decisions, patterns, and failures into procedural memories. A staleness validator flags memories older than 30 days and verifies that file and function references still exist in the codebase. Weekly consolidation deduplicates and archives below a relevance threshold.

For users who want semantic search, optional Ollama integration generates 768-dimensional embeddings using nomic-embed-text. Hybrid search combines FTS5 rank with cosine similarity. Without Ollama, FTS5-only search works automatically — no external dependency required.

Observability

If you're running a multi-agent system, you need to know what it's doing. Which agents fired? How long did they take? What did they cost? Did any get blocked? Without answers to these questions, you're flying blind.

Everything CAST does is logged to cast.db — an append-only SQLite database at ~/.claude/cast.db running in WAL mode for concurrent access. Four tables provide the audit trail:

Table	What It Tracks
`sessions`	Session start/end, model, token counts
`agent_runs`	Every dispatch: agent, model, duration, status, batch_id
`routing_events`	Prompt routing records, event types, JSON payloads
`agent_memories`	Synced memory with temporal validity and relevance scores

# Query recent agent runs
sqlite3 ~/.claude/cast.db \
  "SELECT agent, status, created_at FROM agent_runs ORDER BY id DESC LIMIT 10;"

A companion React dashboard reads cast.db directly and provides a full observability UI — activity timelines, token spend by agent, hook health, plan status, memory viewer, and raw database explorer. For terminal users, cast dash provides a Textual-based TUI with live-updating panels — think htop, but for your agent system.

What I Learned Building This

Model-driven dispatch beats regex routing. This was the single biggest improvement from v2 to v4. Ninety regex patterns and fifteen routes were replaced by a 17-row markdown table that the model reads and interprets. Accuracy went up, maintenance went to near zero, and I deleted over a thousand lines of routing code.

Hooks should be load-bearing, not observational. Most hook integrations I've seen log events and move on. CAST's hooks block operations, inject directives, enforce review chains, and manage context across compaction boundaries. The difference between "we recommend code review" and "raw git commit is structurally impossible" is the difference between a suggestion and a system.

Model tiering saves real money. When Anthropic published that multi-agent systems use 15x more tokens than single-turn chat, I took it seriously. Running code review, commits, test execution, and documentation on Haiku instead of Sonnet is roughly 3x cheaper per invocation, and those tasks account for the majority of dispatches. The 25-40% cost reduction is real and measurable through cast.db analytics.
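As a back-of-the-envelope check on how a 3x per-invocation saving turns into a 25-40% overall reduction, consider the arithmetic below. It assumes "3x" means the cheaper tier costs one third as much, and that the share is measured in spend rather than in dispatch count; both are simplifications.

```python
def blended_savings(moved_share: float, price_ratio: float = 3.0) -> float:
    """Fraction of total spend saved when `moved_share` of that spend
    moves to a tier that is `price_ratio` times cheaper."""
    if not 0.0 <= moved_share <= 1.0 or price_ratio <= 0:
        raise ValueError("moved_share must be in [0, 1] and price_ratio positive")
    return moved_share * (1.0 - 1.0 / price_ratio)


for share in (0.375, 0.5, 0.6):
    print(f"{share:.1%} of spend moved -> {blended_savings(share):.1%} saved")
```

Under those assumptions, moving between 37.5% and 60% of spend to the cheaper tier lands in the 25-40% range.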

Local-first is underrated. CAST has zero cloud dependencies beyond the Claude API itself. All state lives in SQLite. Memory persists in markdown files and a local database. Backups go to GitHub releases as tarballs. The system works offline (with Ollama fallback for local models) and never sends agent memory to a third-party service. This turns out to matter more than I expected — both for privacy and for reliability.

The "dumb zone" is real. When Claude Code's context window fills up and compaction kicks in, quality degrades noticeably. CAST detects this with PreCompact and PostCompact hooks, reinjects critical plan context after compaction, and alerts when the session should be restarted. Acknowledging and mitigating this limitation made the system significantly more reliable.

Known Limitations

I want to be transparent about where CAST falls short today. There's a known-limitations.md in the repo that covers these in detail:

macOS-focused. Homebrew distribution, launchd scheduling, Keychain integration — these are all macOS. The core framework works anywhere Claude Code runs, but the ecosystem tooling assumes macOS.
Single-user. CAST is designed for one developer on one machine. There's no multi-user coordination, no shared state across machines.
Claude Code dependency. CAST is built on Claude Code's primitives. If Anthropic changes the hook system or Agent tool behavior, CAST needs to adapt. (This has happened several times during development — the framework is designed to be resilient to it.)
No native coordinator yet. Claude Code has an internal coordinator pattern that isn't shipped publicly. When it ships, CAST's orchestrator will adapt to use it rather than compete with it.

The Version History in Brief

CAST has gone through significant evolution, and I think the trajectory is instructive:

v1: Manual dispatch, no hooks, no memory. Proof of concept.
v2: 42 agents, regex routing with 90 patterns. Worked but was fragile and expensive.
v3: Rebuilt with 16 agents, model-driven dispatch, hooks, cron scheduling, and cast.db. This is where it became a real system.
v4: Major cleanup — cut from 33 hooks to 13, slimmed the CLI from 2,331 to 976 lines, dropped 5 empty database tables. Then added memory persistence (FTS5, embeddings, distiller), token efficiency optimizations (model tiering, response budgets), and local-first hardening (Keychain, encryption, offline queue).

The trend is clear: fewer agents, less code, more capability. Every version has been a subtraction as much as an addition.

Try It Yourself

The fastest path:

brew tap ek33450505/cast
brew install cast
cast doctor    # verify installation

Or clone directly:

git clone https://github.com/ek33450505/claude-agent-team
cd claude-agent-team
bash install.sh

cast doctor runs a validation suite — checks hook wiring, agent files, database schema, and CLI paths. Green across the board means you're ready.

The ecosystem spans 11 repos with 9 Homebrew taps. The pieces are modular — you can install just the memory system (brew install cast-memory), just the hooks (brew install cast-hooks), or the full framework. Everything is MIT licensed.

Key links:

Core framework: github.com/ek33450505/claude-agent-team
Dashboard: github.com/ek33450505/claude-code-dashboard
Project Site: castframework.dev

The test suite has 357 BATS tests with zero failures. CI runs on every push.

Closing

CAST started as an experiment in reading documentation carefully. Claude Code's primitives (hooks, agent markdown, CLAUDE.md, the Agent tool) are individually simple, and each one solves a small problem. But composed together, with enforcement rather than suggestion, with persistence rather than amnesia, with observability rather than opacity, they produce something that feels qualitatively different from single-session use.

I didn't set out to build a framework. I set out to understand the tool I was using. The framework emerged from that understanding — from asking "what if this hook actually blocked the operation?" and "what if this agent remembered what it learned yesterday?" and "what if I could see every dispatch in a database?"

The lesson I keep coming back to: documentation is a design surface, not just a reference manual. The features are already there, waiting to be composed. The interesting work is in figuring out how the pieces fit together — and having the patience to read carefully enough to find out.

Your Claude Code Batches Don't Have to Wait for Each Other

Edward Kubiak — Mon, 06 Apr 2026 19:41:15 +0000

The serial bottleneck

You have a plan with six batches of AI-driven work: build the auth module, write its tests, scaffold the dashboard, add the API routes, wire up the middleware, write the integration tests. Batches 1–3 have nothing to do with batches 4–6. No shared files, no dependency chain, no ordering constraint.

But they run one at a time. Twenty minutes of wall-clock time for work that could finish in ten.

This is the embarrassingly parallel problem. A single Claude Code session is inherently serial — it processes one task, commits, moves to the next. If your batches are independent, you're paying a serial tax for no reason.

The pattern: do it by hand

The fix is git worktrees. A worktree gives you a second (or third, or fourth) working directory for the same repository, each checked out on its own branch. Two Claude Code sessions can work simultaneously in two worktrees without ever touching each other's files.

The manual version is about 15 lines of shell:

[code block]

Step by step:

git worktree add creates a new working directory on a fresh branch. Both branches start from HEAD, so they share an identical starting point.
claude --headless launches Claude Code without a terminal UI. The -p flag passes a prompt; & sends each session to the background.
wait blocks until both background processes finish.
The merge brings Stream B's changes into Stream A, then Stream A — now containing both sets of changes — back into your original branch.
Cleanup removes the worktrees and their directories.
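The git side of those steps can be laid out as data. The sketch below only builds the commands as argument lists and prints them; it runs nothing, leaves out the Claude Code invocations, and uses made-up directory and branch names.

```python
def worktree_plan(tag: str) -> dict[str, list[list[str]]]:
    """Build the git commands for each phase as argv lists; nothing is executed."""
    path_a, branch_a = f"../stream-a-{tag}", f"stream-a-{tag}"
    path_b, branch_b = f"../stream-b-{tag}", f"stream-b-{tag}"
    return {
        # One worktree per stream, each on a fresh branch created from HEAD.
        "setup": [
            ["git", "worktree", "add", "-b", branch_a, path_a],
            ["git", "worktree", "add", "-b", branch_b, path_b],
        ],
        # (The two sessions would run here, one per worktree, followed by a wait.)
        # Stream B merges into Stream A, then Stream A into the original branch.
        "merge": [
            ["git", "-C", path_a, "merge", branch_b],
            ["git", "merge", branch_a],
        ],
        # Worktrees are removed first: git refuses to delete a branch
        # that is still checked out in a worktree.
        "cleanup": [
            ["git", "worktree", "remove", path_a],
            ["git", "worktree", "remove", path_b],
            ["git", "branch", "-d", branch_a],
            ["git", "branch", "-d", branch_b],
        ],
    }


for phase, commands in worktree_plan("demo").items():
    for argv in commands:
        print(f"{phase:8}", " ".join(argv))
```

Keeping the plan as plain lists makes it easy to print for a dry run before handing each list to subprocess.run.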

That's the entire pattern. Each session has its own working directory, its own branch, complete isolation. No file conflicts mid-flight.

It works — but there's a lot that can go wrong. Hit Ctrl+C and you have orphaned claude processes in the background. Forget cleanup and you have stale worktrees cluttering your repo. A merge conflict leaves you stuck with no error handling and no visibility into what happened.

Which is why I automated it.

Where it breaks

Before the automation, some honest caveats.

Batch dependencies. If batch 4 needs output from batch 2, splitting them across streams will cause failures. You need to know your dependency graph before splitting. Independent batches parallelize cleanly; dependent ones don't.

Merge conflicts. Isolated worktrees prevent mid-flight file conflicts, because neither session can see the other's uncommitted changes. But they can't prevent conflicts at merge time. If both sessions edit the same lines of the same function in different ways, the merge will stop with a conflict. That's a feature, not a bug: you want to know about it rather than have it silently auto-resolved.

Double API cost. Two concurrent sessions means double the token usage. For large plans with 6+ batches, the time savings are worth it. For a 3-batch plan, probably not.
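For the batch-dependency caveat above, a quick pre-flight check is easy to write if you can express the dependency graph as a mapping. The data shapes below (batch numbers as integers, dependencies as sets) are assumptions for illustration, not the manifest format.

```python
def cross_stream_dependencies(deps: dict[int, set[int]],
                              stream_a: set[int],
                              stream_b: set[int]) -> list[tuple[int, int]]:
    """Return (batch, dependency) pairs where the two ends sit in different streams."""
    problems = []
    for batch, needs in sorted(deps.items()):
        for dep in sorted(needs):
            straddles = ((batch in stream_a and dep in stream_b)
                         or (batch in stream_b and dep in stream_a))
            if straddles:
                problems.append((batch, dep))
    return problems


stream_a, stream_b = {1, 2, 3}, {4, 5, 6}

# Batch 4 needs batch 2, which lives in the other stream: unsafe to split here.
print(cross_stream_dependencies({4: {2}}, stream_a, stream_b))

# Dependencies stay inside their own stream: safe to run in parallel.
print(cross_stream_dependencies({2: {1}, 5: {4}}, stream_a, stream_b))
```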

Automating it: `cast-parallel`

I wrapped this pattern into a script called cast-parallel. Before running anything, preview the split with a dry run:

[code block]

The script reads an Agent Dispatch Manifest — a JSON block embedded in a plan file — counts the batches, and splits them at the midpoint. Override with --split N to force a different cut point.
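The midpoint split with an optional override can be sketched as follows. This assumes the split value means "number of batches in the first stream" and that an odd batch count rounds down; the real script may interpret both differently.

```python
from typing import Optional, Sequence


def split_batches(batches: Sequence, split: Optional[int] = None) -> tuple[list, list]:
    """Split at the midpoint, or after `split` batches when an override is given."""
    cut = len(batches) // 2 if split is None else split
    if not 0 < cut < len(batches):
        raise ValueError(f"split must leave both streams non-empty, got {cut}")
    return list(batches[:cut]), list(batches[cut:])


batches = [1, 2, 3, 4, 5, 6]
print(split_batches(batches))           # automatic midpoint
print(split_batches(batches, split=2))  # forced cut point
```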

Here's what it adds on top of the manual approach:

[diagram block]

A few design decisions worth calling out:

Subprocess guard: Checks an environment variable at startup and exits immediately if a parent CAST session spawned the script — preventing recursive execution inside agent chains.
Trap handler: Catches INT and TERM signals, kills both background processes, and removes worktrees. No orphaned processes, no stale directories.
PID-based branch names (e.g., cast-parallel-a-12345): Prevents collisions when running multiple parallel executions against the same repo.
Merge conflicts are never auto-resolved: Worktrees are preserved so you can inspect and fix them yourself.
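The trap handler and PID-based naming translate to Python roughly as shown here. Everything in this sketch is illustrative: the helper names are invented, worktree removal is left as a comment, and the 128 + signal exit code is simply the usual shell convention.

```python
import os
import signal
import subprocess
import sys

children: list[subprocess.Popen] = []  # background sessions we are responsible for


def branch_name(stream: str, pid: int) -> str:
    """PID-suffixed branch name, e.g. cast-parallel-a-12345."""
    return f"cast-parallel-{stream}-{pid}"


def stop_children() -> int:
    """Terminate any child still running; return how many were stopped."""
    stopped = 0
    for proc in children:
        if proc.poll() is None:
            proc.terminate()
            proc.wait()
            stopped += 1
    return stopped


def on_signal(signum, frame):
    """Signal handler: stop children, clean up, then exit non-zero."""
    stop_children()
    # Worktree removal would go here.
    sys.exit(128 + signum)


def install_handlers() -> None:
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, on_signal)


if __name__ == "__main__":
    install_handlers()
    print(branch_name("a", os.getpid()))
```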

Optional database logging records events at each stage (parallel_start, parallel_streams_done, parallel_complete, parallel_fail, parallel_merge_conflict) for observability. If the logger isn't present, it's silently skipped.

When to use this pattern

Good fit: Large plans with 6+ independent batches. With two streams, wall-clock time drops by roughly half: a 20-minute plan becomes a roughly 10-minute plan.

Not worth it: Small plans under 4 batches. Worktree setup, merge, and cleanup overhead eats into the savings.

Don't use: Plans with strict batch ordering where later batches depend on earlier ones. Use sequential execution instead.

Always dry-run first. Preview the split, verify the batches in each stream are truly independent, and adjust with --split N if the auto-midpoint is wrong.

Try it

The pattern is simple enough to implement by hand. The automation handles the parts that break — signal traps, PID tracking, merge conflict preservation, cleanup. If you're already running Claude Code on multi-batch plans, this is a low-effort way to cut your wall-clock time roughly in half.

The repo is at github.com/ek33450505/cast-parallel. Part of the CAST ecosystem, but works standalone with just the Claude CLI and git. MIT licensed, contributions welcome.

I Built a Local Cost Monitor for Claude Code Using Just Bash and SQLite

Edward Kubiak — Sat, 04 Apr 2026 21:56:23 +0000

If you've been using Claude Code heavily, you've probably had this moment: you open
your Anthropic billing page and wonder when exactly that happened. Which session?
Which agent? Which project?

There's no built-in answer. Claude Code doesn't expose per-session cost data, and the
billing dashboard shows you totals — not the story behind them.

So I built cast-observe: a local observability layer that hooks into Claude Code's
event lifecycle and writes everything to a SQLite file on your machine.

What You Get

Per-session token counts and USD cost
Agent run history (name, status, duration, cost)
Daily and weekly cost summaries, filterable by project
Budget alerts when you cross a threshold
A live TUI dashboard (cast-observe dash)
Direct SQL access to all your data

No cloud. No telemetry. No SaaS. Just ~/.claude/cast.db.

How It Works

Claude Code supports a hook system — shell scripts that fire on lifecycle events like
SessionStart, SubagentStop, PostToolUse, etc. cast-observe registers eight of them:

Hook	What It Does
`SessionStart`	Opens a new session row in SQLite
`SessionEnd`	Finalizes the session, triggers budget checks
`SubagentStart`	Records an agent invocation
`SubagentStop`	Logs completion status, duration, and cost
`PostToolUse`	Reads token usage from the tool response, computes USD
`PostToolUseFailure`	Same — failed calls still cost tokens
`PreCompact` / `PostCompact`	Handles Claude's context compaction events

All hooks run with async: true so they never block Claude Code execution.

The Data Flow

Every PostToolUse fires with a JSON payload on stdin containing the model name, input
tokens, and output tokens. A small Python script looks up the model's price per million
tokens from a local pricing config, computes the cost, and writes a row to SQLite with
retry logic to handle lock contention.

# Simplified — see observe-cost-tracker.py for full version
cost = (input_tokens / 1_000_000 * input_price) + \
       (output_tokens / 1_000_000 * output_price)

The hook reads from stdin, never from environment variables. This keeps sensitive
session data out of the process table.
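Put together, the flow might look like the sketch below. The payload field names, the prices, and the tool_costs table are all hypothetical stand-ins; the real version is observe-cost-tracker.py. A real hook would pass sys.stdin.read() to handle_event instead of a literal string.

```python
import json
import sqlite3
import time

# Hypothetical prices in USD per million tokens, keyed by model name.
PRICES = {"example-model": {"input": 3.0, "output": 15.0}}


def compute_cost(payload: dict, prices: dict = PRICES) -> float:
    """Price the input and output tokens reported in one hook payload."""
    price = prices[payload["model"]]
    return (payload["input_tokens"] / 1_000_000 * price["input"]
            + payload["output_tokens"] / 1_000_000 * price["output"])


def execute_with_retry(conn, sql, params, attempts: int = 5, base_delay: float = 0.05):
    """Retry a write that fails because another hook holds the lock."""
    for attempt in range(attempts):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(sql, params)
            return
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc) or attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff


def handle_event(raw_json: str, conn) -> float:
    """Parse one payload, compute its cost, and record it."""
    payload = json.loads(raw_json)
    cost = compute_cost(payload)
    execute_with_retry(
        conn,
        "INSERT INTO tool_costs (model, cost_usd) VALUES (?, ?)",
        (payload["model"], cost),
    )
    return cost


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tool_costs (model TEXT, cost_usd REAL)")
event = '{"model": "example-model", "input_tokens": 2000, "output_tokens": 500}'
print(handle_event(event, conn))
```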

The Schema

Four tables:

sessions — one row per Claude Code session, with aggregated tokens and cost
agent_runs — one row per agent invocation (name, status, duration, cost)
routing_events — placeholder for CAST multi-agent routing data
budgets — your configured limits and alert thresholds

The CLI

# Today's usage and recent agent runs
cast-observe status

# Budget summaries
cast-observe budget
cast-observe budget --week
cast-observe budget --project my-project

# Session history
cast-observe sessions --limit 20

# Raw SQL access
cast-observe db query "SELECT agent_name, SUM(cost_usd) FROM agent_runs GROUP BY agent_name"

# Launch the TUI
cast-observe dash

The TUI is built with Textual and shows live agent runs, session cost breakdown, and
status colors (green = DONE, yellow = DONE_WITH_CONCERNS, red = BLOCKED).

Installation

Via Homebrew:

brew tap ek33450505/cast-observe
brew install cast-observe
cast-observe install

From source:

git clone https://github.com/ek33450505/cast-observe.git
cd cast-observe
bash install.sh

The install script merges the hook configuration non-destructively into your existing
~/.claude/settings.json, so it won't clobber any hooks you already have.

Requirements: macOS 12+ or Linux, Claude Code, python3, sqlite3.

Why Local SQLite?

Observability tools tend to default to "send it to a server." I wanted the opposite.

Zero latency — writes are local, no network round trips on every tool call
Full ownership — your usage data stays on your machine
SQL as the API — power users can query anything directly
Works offline — no outage pages, no rate limits

The WAL (Write-Ahead Logging) mode keeps concurrent reads and writes from blocking
each other, which matters because multiple hooks can fire in rapid succession during
an agent run.
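A short demonstration of that behavior with Python's sqlite3 module: one connection holds an open write transaction while another keeps reading the last committed state. The events table and file name are throwaway examples.

```python
import os
import sqlite3
import tempfile


def open_wal(path: str) -> sqlite3.Connection:
    """Open a connection and make sure the database is in WAL mode."""
    conn = sqlite3.connect(path, timeout=5.0)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL, got {mode!r}")
    return conn


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM events").fetchall()[0][0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    writer = open_wal(path)
    writer.execute("CREATE TABLE events (name TEXT)")
    writer.commit()
    reader = open_wal(path)

    # The INSERT opens a transaction that stays uncommitted for now.
    writer.execute("INSERT INTO events VALUES ('tool_use')")
    during_write = count_rows(reader)  # not blocked; sees the last commit
    writer.commit()
    after_commit = count_rows(reader)  # the new row is now visible

    reader.close()
    writer.close()

print(during_write, after_commit)
```

WAL needs a real file, which is why the demo uses a temporary directory rather than an in-memory database.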

Works Alongside CAST

If you use CAST, the multi-agent framework built on top of Claude Code, cast-observe
shares the same ~/.claude/cast.db database. CAST writes to routing_events; cast-observe
reads from it but doesn't own it. Installing one later won't break the other.

What's Next

Per-model cost breakdown in the TUI
Exportable reports (CSV/JSON)
Session diff view: compare two runs side by side
GitHub Actions integration for tracking CI-time usage

The repo is at ek33450505/cast-observe, MIT licensed. PRs, issues, and feedback welcome.

If you're using Claude Code and have been flying blind on cost, give it a try.

I Built an Observability Dashboard for 17 AI Agents — With Those Same Agents

Edward Kubiak — Fri, 03 Apr 2026 19:05:57 +0000

The Problem: 17 AI Agents and Zero Visibility

I run a system called CAST (Claude Agent Specialist Team) — a framework of 17 specialized AI agents built on top of Claude Code. These agents handle everything from writing code to reviewing PRs to running security audits. They dispatch each other in chains: a planner spawns a code-writer, which triggers a code-reviewer, which chains to a commit agent, which hands off to push.

It works. But it's a black box.

When 5 agents are running in parallel across 3 worktrees, I had no idea:

What's actually running right now?
How much is this costing?
Did that code-reviewer pass or fail?
Which agent is stuck?

So I built a dashboard. And here's the recursive part — the dashboard was built by CAST agents, and every agent dispatch showed up in the dashboard they were building.

CAST in 60 Seconds

Before we get into the dashboard, here's how CAST works:

17 agents across 2 model tiers:

Sonnet (complex tasks): code-writer, debugger, planner, security, researcher, orchestrator, and 7 more
Haiku (lightweight): code-reviewer, commit, push, test-runner, frontend-qa

Hook-driven dispatch:
Claude Code has a hooks system — shell scripts that fire on events like PostToolUse or SubagentStart. CAST hooks write every agent spawn, completion, and status change to a local SQLite database (cast.db).

The data model:

cast.db
├── sessions        — Claude Code session metadata
├── agent_runs      — Every agent dispatch: who, when, status, cost
├── routing_events  — Dispatch decisions and routing
└── agent_memories  — Persistent agent knowledge

Plus JSONL session logs that Claude Code writes to ~/.claude/projects/ — these are the ground truth for token counts.

The Dashboard: 4 Pages, Zero Cloud

The dashboard is a local-first React app that reads directly from your filesystem. No accounts, no cloud sync, no external services.

Architecture

~/.claude/cast.db  ──┐
~/.claude/projects/  ─┤──→  Express 5 API  ──→  React 19 SPA
~/.claude/agents/    ─┤     (localhost:3001)    (localhost:5173)
~/.claude/settings/  ─┘

Stack: React 19, Vite 6, TypeScript, Tailwind CSS v4, TanStack Query v5, Recharts, Express 5, better-sqlite3, SSE for real-time updates.

The 4 Pages

Dashboard (/) — The "what's happening now" view:

Active agents with live status
Today's stats: runs, cost, tokens
7-day cost sparkline
System health (agent count, hooks, skills)

Sessions (/sessions) — Every Claude Code session with:

Token breakdown (input, output, cache creation, cache read)
Agent runs within each session
Duration, model, cost
Full message timeline drill-down

Analytics (/analytics) — The numbers view:

30-day token spend chart
Agent scorecard (runs, success rate, avg cost per agent)
Model tier breakdown
Delegation savings: "What would this cost if everything ran on Sonnet?"

System (/system) — A tabbed browser for your entire CAST installation:

Agents (read/write), Rules, Skills & Commands
Hooks (definitions + health checks)
Agent memory (filesystem-backed)
Plans, DB Explorer, Cron triggers

Plus a Docs page with a complete reference of all 17 slash commands, 17 agents, 8 skills, and the CAST CLI.

The Interesting Engineering Problems

1. Dual Data Pipeline

No single data source has the complete picture:

JSONL session logs have accurate token counts (including cache tokens) but no agent-level attribution
cast.db has agent-level data (who ran, what status, what cost) but estimates tokens from subagent JSONL files

The solution: merge both sources. The token spend pipeline reads JSONL for totals. The agent runs pipeline reads cast.db for attribution. When they overlap, JSONL wins — it's the ground truth from Claude Code itself.

// tokenSpend.ts reads JSONL directly
const costMap = getSessionCostMap()  // Map<sessionId, cost>

// agentRuns.ts reads cast.db
const runs = db.prepare(`SELECT ... FROM agent_runs`).all()

// When displaying session cost, prefer JSONL over DB
totalCost: costMap.get(s.session_id) ?? s.total_cost

2. SSE Push Instead of Polling

The browser doesn't poll on timers. Instead, a server-side castDbWatcher polls cast.db every 3 seconds and pushes changes over Server-Sent Events:

// Server: watch for new rows
const newRuns = db.prepare(
  `SELECT * FROM agent_runs WHERE rowid > ?`
).all(highWaterMark)

if (newRuns.length > 0) {
  broadcast('db_change_agent_run', newRuns)
}

// Client: invalidate TanStack Query cache on events
const eventSource = new EventSource('/api/events')
eventSource.onmessage = (e) => {
  const { type } = JSON.parse(e.data)
  if (type === 'db_change_agent_run') {
    queryClient.invalidateQueries({ queryKey: ['agent-runs'] })
  }
}

This means the dashboard updates within 3 seconds of any agent activity — no manual refresh, no wasted requests.
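The server-side watcher depends on advancing its high-water mark after every poll, otherwise the same rows would be broadcast again. Here is that bookkeeping as a Python sketch against a hypothetical two-column agent_runs table; the broadcast step is omitted.

```python
import sqlite3


def poll_new_runs(conn: sqlite3.Connection, high_water_mark: int):
    """Return rows added since the last poll, plus the updated high-water mark."""
    rows = conn.execute(
        "SELECT rowid, agent, status FROM agent_runs WHERE rowid > ? ORDER BY rowid",
        (high_water_mark,),
    ).fetchall()
    if rows:
        high_water_mark = rows[-1][0]  # remember the newest rowid we have seen
    return rows, high_water_mark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_runs (agent TEXT, status TEXT)")

conn.execute("INSERT INTO agent_runs VALUES ('planner', 'DONE')")
first, mark = poll_new_runs(conn, 0)

conn.execute("INSERT INTO agent_runs VALUES ('code-writer', 'running')")
second, mark = poll_new_runs(conn, mark)

third, mark = poll_new_runs(conn, mark)  # nothing new since the last poll

print(first, second, third, mark)
```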

3. Stale Agent Reconciliation

When Claude Code crashes or a terminal closes, agent_runs rows can be left with status = 'running' forever. On SSE connect, the server reconciles:

UPDATE agent_runs
SET status = 'DONE', ended_at = datetime('now')
WHERE status = 'running'
  AND started_at < datetime('now', '-2 hours')

This prevents phantom "running" agents from cluttering the dashboard after crashes.

4. Schema Migration Without an ORM

The dashboard reads a database it doesn't own — cast.db is written by CAST hooks, not the dashboard. The schema evolves as CAST evolves. Instead of failing on missing columns, the seed endpoint runs defensive migrations:

for (const stmt of [
  `ALTER TABLE sessions ADD COLUMN total_input_tokens INTEGER DEFAULT 0`,
  `ALTER TABLE agent_runs ADD COLUMN prompt TEXT`,
  `ALTER TABLE agent_runs ADD COLUMN project TEXT`,
]) {
  try { db.exec(stmt) } catch { /* column already exists */ }
}

No migration framework, no version tracking. Just idempotent ALTER TABLE statements wrapped in try/catch. SQLite throws if the column exists — we catch and move on.
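One alternative to catching the error is to check the table's columns first, so that unrelated failures (a locked database, a typo in the statement) are not swallowed along with the expected "column already exists" case. This is a different technique from the try/catch above, sketched in Python with an invented helper name.

```python
import sqlite3


def add_column_if_missing(conn: sqlite3.Connection, table: str,
                          column: str, declaration: str) -> bool:
    """Add a column only when it is absent. Returns True if it was added.

    Table and column names are interpolated into SQL, so pass only
    trusted, hard-coded identifiers.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {declaration}")
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY)")

print(add_column_if_missing(conn, "sessions", "total_input_tokens", "INTEGER DEFAULT 0"))
print(add_column_if_missing(conn, "sessions", "total_input_tokens", "INTEGER DEFAULT 0"))
```

Running it twice is safe: the second call finds the column and does nothing.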

The Consolidation Story: 21 Views → 4 Pages

The first version of the dashboard grew organically. Every new CAST feature got its own page:

TokenSpend page. DispatchLog page. QualityGates page. HookHealth page. PrivacyAudit page. MemoryBrowser page. SqliteExplorer page. CastdControl page. RulesView. PlansView. LiveView...

At peak, the dashboard had 21 view files and 7 navigation items. It was harder to navigate the dashboard than to just read the database directly.

The fix was radical consolidation in a single session:

Activity + Sessions → Sessions (activity is just recent sessions)
Agents + Knowledge → System (agents, rules, skills are all configuration)
TokenSpend + QualityGates → Analytics (all numbers in one place)
HookHealth + Privacy + DB Explorer + Castd → System tabs

14 view files deleted. 45 API hooks trimmed to 20. The result: 4 pages that actually make sense.

The lesson: observability UI for a running system grows unbounded — every feature wants its own page. The right model is aggressive consolidation with tabs, not more nav items.

The Dogfooding Loop

Here's what makes this project strange: the dashboard was built by CAST agents — the same agents it monitors.

A typical development cycle:

I type /plan condense the dashboard pages
The planner agent writes a structured plan with an Agent Dispatch Manifest
The orchestrator dispatches agents in waves:
- Wave 1 (parallel): researcher audits backend, security reviews routes, frontend-qa checks components
- Wave 2: code-writer implements changes
- Wave 3: code-reviewer + test-writer verify
- Wave 4: commit + push
Each dispatch appears as an agent_run row in cast.db
The dashboard shows those rows in real-time via SSE

The v2.0.0 consolidation was 55 files changed, +522/-6,802 lines — all dispatched through CAST agents, all visible in the dashboard they were modifying.

Running It Yourself

The dashboard reads from ~/.claude/ — if you use Claude Code, you already have session data.

git clone https://github.com/ek33450505/claude-code-dashboard
cd claude-code-dashboard
npm install
npm run dev
# → Vite on :5173, Express API on :3001

For the full CAST agent framework (17 agents, hooks, cast.db):

git clone https://github.com/ek33450505/claude-agent-team
cd claude-agent-team
bash install.sh

Both projects are open source. The dashboard works standalone (reads JSONL sessions), but lights up fully with CAST installed (agent runs, routing, memory).

What's Next

cast dash — A Textual (Python) TUI that puts the dashboard directly in the terminal. htop for CAST. No browser needed for quick-glance monitoring.
Delegation savings tracking — Quantifying the cost difference between routing work to Haiku vs running everything on Sonnet.
Cross-session agent memory visualization — Showing how agent memory evolves over time.

Key Takeaways

Local-first observability is underrated. SQLite + filesystem + SSE gives you real-time monitoring with zero infrastructure.
Your AI agents need observability too. Multi-agent systems are opaque by default. Instrument them early.
Consolidate aggressively. Every feature wants its own page. Resist. Tabs > nav items.
Read the database you don't own defensively. Schema will change. Wrap everything in try/catch. Migrate idempotently.
The dogfooding loop is real. Building developer tools with the tools they observe creates a uniquely tight feedback loop.

The claude-code-dashboard and CAST are open source at ek33450505/claude-code-dashboard and ek33450505/claude-agent-team.
