DEV Community: Neilos

CC 20x max is not enough? This is what I'm doing to fix that

Neilos — Tue, 31 Mar 2026 07:52:36 +0000

There's a 200-comment Reddit thread right now of people watching their Claude Max plan vanish in minutes. One word — "Morning" — took 15% of someone's 5-hour limit. A fresh session, two messages, weekly quota wiped.

It's not just power users. Normal usage is hitting the wall.

The cap is real. But the opacity is worse — you can't see what's eating your budget, so you can't optimize around it. People are scared to use Opus, losing productivity not just when they hit the wall, but in constant anticipation of it.

The community is already finding the direction. One commenter: "best workflow is Opus high, then everything with Sonnet subagents." Right idea — but it stops at Sonnet, and it stays inside Anthropic's billing.

The pattern I've landed on: CC stays in charge, a cheap model does the work. Here's how it's structured.

ttal, logos, and MiniMax M2.7

Quick context for anyone not familiar:

ttal is an agent orchestration CLI — it manages tasks, spawns workers, runs pipelines, and routes work between agents.

logos is ttal's bash-only agent loop. Text in, text out. The model writes prose and shell commands, the sandbox executes them. No tool schemas, no JSON, no provider-specific plumbing — just a simple text convention any model can follow.

MiniMax M2.7 is a reasoning model released March 2026. $0.30/1M input, $1.20/1M output — about 10× cheaper than Sonnet. On Terminal Bench 2, the only direct benchmark with both models, it scores 57% vs Sonnet's 59%. In detection head-to-heads against Opus (Kilo Code), it found every bug and every security vulnerability — same result, fraction of the cost. The gap vs frontier shows up in architectural depth and complex multi-file reasoning, not in focused scoped tasks.

CC leads, MiniMax works

Every pipeline stage in ttal has a CC lead orchestrator. There are three: plan-review-lead, pr-review-lead, and code-lead. Each runs on Claude, holds context, makes decisions, and controls the flow. When focused work needs to happen, the lead delegates via ttal subagent run.

ttal subagent run is a CLI command — leads call it internally, but you can run it manually too:

# plan-review-lead delegates to review specialists
ttal subagent run gap-finder
ttal subagent run security-reviewer
ttal subagent run test-reviewer

# code-lead delegates focused single-file edits
ttal subagent run coder

# research via ttal ask — explore any codebase, URL, or repo
ttal ask "how does auth work?" --project backend

These run in parallel under the logos loop on M2.7. The lead picks up results, synthesizes, decides what's next. MiniMax never touches the orchestration. But the detection, the single-file edits, the exploration — all of it runs on a model that costs 10× less per token, entirely outside Claude's usage meter.

Why this works without quality loss

Most subagent work doesn't need frontier intelligence. Detection, review, single-file edits, exploration — bounded scope, clear criteria. The question isn't "is M2.7 as smart as Sonnet?" It's "is M2.7 good enough for this specific task?"

For review and detection it matches frontier quality. For single-file edits the scope is tight enough that it doesn't need to reason about the whole system. For exploration it just needs to read and report accurately. None of these require the architectural judgment and multi-file reasoning where frontier models genuinely earn their cost — and that judgment stays with the CC lead.

Why logos makes this possible

Most agent frameworks couple to the provider's tool-call API. M2.7 sometimes hallucinates tool calls — the model behaves as if it has tools it doesn't actually have. In a standard framework that's hard to recover from.

Logos handles all these edge cases. It detects hallucinated tool-call formats mid-stream, suppresses them, and injects corrective directives — the loop keeps running cleanly. And because logos uses no tool schemas, the surface area for these issues is minimal to begin with.

The other benefit: logos doesn't care which model you use. M2.7 today, whatever's cheapest next month. No rebuilding, no schema migration. Any model that can follow a simple text convention works.

What this actually changes

You stop self-censoring. Stop being scared to kick off a review because it might eat 15% of your session. The review runs on M2.7 under a CC lead that costs almost nothing to orchestrate.

The cap isn't going away. Anthropic is tightening, not loosening. A bigger plan isn't the answer — changing what the plan is used for is. CC for orchestration, decisions, and reasoning. Focused work on cheap models that don't touch your Claude budget at all.

ttal and logos are open source at github.com/tta-lab.

How I Manage 15+ Repos with Claude Code (Without Losing My Mind)

Neilos — Fri, 27 Mar 2026 11:23:46 +0000

Most Claude Code users work in one repo at a time. It's fine until your system spans multiple repos — then you're copy-pasting context between sessions, manually tracking which PR depends on which, and babysitting agents that can't see the full picture.

I manage 15+ repos across Go, Rust, TypeScript, Python, and C++. 10 specialized Claude Code agents coordinate through Telegram. Here's what I tried first, why it didn't work, and what does.

What Doesn't Work

All of these approaches try to solve "how do I give one session access to multiple repos." But that's the wrong framing. When you need cross-repo context, what you actually need is cross-repo read and explore. The write should always be focused on a single repo, a single PR. ttal handles this by separating the two: ttal ask reads and explores anything, workers write to one repo at a time.

Monorepo. I use Moon for monorepo management — I'm not anti-monorepo. But when your stack spans Go, Rust, TypeScript, Python, and C++, a single repo doesn't cut it. Each language has its own build tooling, CI pipelines, and dependency management. Cramming them together creates more problems than it solves. Even with Moon handling the orchestration, past 3-4 languages, splitting is cleaner. This isn't a theoretical concern — it's a real gap in the CC ecosystem that people are hitting.

Submodules. Even if you link repos together with submodules, you still don't get what you actually need: cross-repo coordination, shared context, parallel execution. Submodules give you a way to pin repo versions together — they don't give you a way to plan across repos, route tasks, or run parallel agents. And they don't compose well with worktrees, which are essential for parallel agent work. It's solving the wrong problem.

CC's native cross-repo workflow. Want to explore another repo? You manually /add-dir, then that session starts reading in that directory. It works, but it's manual and the session context pays the cost. In ttal, all exploration is handled by ttal ask — a lightweight bash-only agent that collects info from any source (--web, --repo, --project) and returns a detailed report without polluting your main session's context. And because ttal ask runs on a simple bash-based agent loop, you can use any fast, cheap model (MiniMax M2.7 HighSpeed) for exploration — so Opus stays focused on the thinking work that actually needs it.

Single-session cross-repo orchestration. This is the trap most people fall into. You give one CC session access to all your repos and ask it to plan and implement a cross-repo feature. The context window fills up fast, the agent loses track of which repo it's in, and the quality of both planning and execution suffers. Don't try to make one session do orchestration, planning, and execution across repos. Let manager agents hold the big picture. Once the plan is done, let workers handle execution detail — one repo, one task, one worktree.

What Works: A Coordination Layer

The answer wasn't a better monorepo or a smarter IDE. It was a thin coordination layer on top of Claude Code.

ttal is a single binary. Install it, define your projects in a TOML file, and you have a system that routes tasks to the right repo, the right agent, at the right time.

Two planes:

Worker plane — ephemeral CC sessions that plan, review, and implement. Each gets its own git worktree, sandboxed environment, and tmux session. Spin up, do the work, merge, clean up. No babysitting.

Manager plane — persistent agents that live across sessions. They hold the big picture — what features they designed with you, which tasks are done or blocked, what shipped yesterday. The manager never touches code. The worker never worries about the big picture.

Message bridge — the glue between everything. Human ↔ agent via Telegram. Agent ↔ agent via ttal send. Manager ↔ worker via ttal alert (workers notify their spawner automatically). CI status, PR reviews, task updates — all routed through the same daemon. You talk to your agents like coworkers in a chat app.

I wrote about the philosophy in ttal — More Than a Harness Engineering Framework, the tooling in We Replaced Every Tool Claude Code Ships With, and the memory model in How We Manage Memory and Sessions.

Daily Workflow

I open Telegram. 10 agent chats in a folder.

A typical morning:

Tell Yuki (orchestrator) what I want to build today
She breaks it into tasks, routes them to the right pipeline
ttal go\ advances each task — spawning planners, reviewers, coders in parallel
Workers run in isolated worktrees across whichever repos need changes
PR reviews happen automatically — parallel sub-reviewers check security, tests, types, edge cases separately
I review verdicts, approve, ttal go\ merges and cleans up

Cross-repo features just work. A change that touches ttal-cli, temenos, and organon gets three parallel workers, each in their own worktree, each with context about why the change exists.

Under the Hood

Unix philosophy — task management via Taskwarrior, knowledge via FlickNote, editing via tree-sitter. Compose dedicated tools, don't bundle into a platform.
Sandbox auto-config — specialized roles mean known paths. The multi-repo project registry means sandbox config writes itself. No manual permission prompts.
Pipeline-driven — tag-based pipelines borrowed from event sourcing. One command (ttal go) drives every transition. Human gates where they matter.
Built-in quality control — parallel sub-reviewers focus on different aspects (security, tests, silent failures, type design). By the time a PR reaches you, it's been through plan review, code review, and CI.
Session forking — brainstorm figures out what and why, then the session forks. Each fork inherits the full conversation and writes a plan scoped to its target repo. No summarization, no lossy handoff. Plan forks figure out the how, workers carry all of it into implementation.

The Numbers

Last week: 190+ tasks completed across all repos. In ttal, each task is a PR — planned, reviewed, implemented, merged. One person, 10 agents.

Throughput scales because coordination is automated. I don't track which session is doing what. I track tasks.

Getting Started

brew tap tta-lab/ttal
brew install ttal
ttal doctor --fix

Define your projects:

# ~/.config/ttal/projects.toml
[backend]
name = "Backend API"
path = "/code/backend"

[frontend]
name = "Frontend App"
path = "/code/frontend"

[infra]
name = "Infrastructure"
path = "/code/infra"

Route a task:

ttal task add --project backend "add rate limiting middleware" --tag feature && ttal go

The pipeline figures out the rest.

Multi-repo at scale with Claude Code isn't about getting CC to understand all your repos at once. It's about a coordination layer that routes work to the right repo, the right agent, at the right time.

ttal, organon, and temenos are open source at github.com/tta-lab.

How We Manage Memory and Sessions in a Multi-Agent Claude Code System

Neilos — Tue, 24 Mar 2026 03:18:27 +0000

Claude Code sessions are disposable by default. Context window fills up, you start fresh, everything's gone. For a single developer this is annoying. For a multi-agent system where agents have roles, history, and ongoing work — it's a dealbreaker.

This post covers how we handle memory, session handoff, and cross-project forking in ttal.

The Problem: Sessions Are Ephemeral, Agents Need Continuity

Claude Code gives you a context window and markdown files. That's it for memory. When the window fills up, you either /compact (lossy summarization you don't control) or start over.

For a multi-agent team, this means:

Agents forget what they learned last session
No shared memory between agents working on the same problem
Plans written in one session vanish in the next
An agent working across multiple projects can't carry context between them

We needed agents that remember, hand off cleanly, and can fork their context into new workstreams.

Layer 1: Persistent Memory — diary-cli + flicknote

Every agent gets two memory systems:

diary-cli is a per-agent append-only diary. After each session, agents write what they learned, what decisions they made, what worked and what didn't.

diary kestrel append "Discovered that the forgejo API returns 422 when..."
diary kestrel read          # today's entries
diary kestrel read --yesterday

It's an append-only log — agents can't edit or delete past entries. This is intentional. Memory should accumulate, not get rewritten.

flicknote is for structured, editable knowledge. Plans, research notes, drafts — anything that needs sections, revisions, and collaboration between agents.

flicknote get <id> --tree
├── [aB] ## Context
├── [cD] ## Architecture  
└── [eF] ## Open Questions

# Another agent adds findings to a specific section
echo "New finding..." | flicknote append <id> --section cD

diary is what an agent knows. flicknote is what the team knows.

Layer 2: Session Handoff — /auto-breathe

When a context window gets heavy, the standard approach is /auto-compact — Claude Code summarizes the conversation and continues. The problem: you don't control what gets kept and what gets lost. Important decisions, subtle context, task state — all at the mercy of generic summarization.

/auto-breathe flips this. Instead of the runtime summarizing your session, the agent writes its own handoff:

cat <<'HANDOFF' | ttal breathe
# Session Handoff

## Active Task
85e63ce0 — implement sandbox allowlist for temenos

## What Was Done
- Added allowlist parsing in config.go
- Tests passing for basic paths

## Key Decisions
- Using glob patterns, not regex — simpler for users
- Denied paths take precedence over allowed paths

## Next Steps
1. Add integration test for nested paths
2. Update CLI help text
HANDOFF

What happens next:

The daemon saves the handoff to the agent's diary
Reads back today's full diary (this handoff + any earlier ones)
Writes a synthetic JSONL session with the handoff as the first message
Kills the old session, spawns a fresh one with --resume
The agent wakes up in a clean context window with its own handoff as context

The agent controls what's preserved. No lossy summarization. And because the handoff goes to diary, it persists — even if the next session also breathes, all handoffs accumulate.

In practice, /auto-breathe fires automatically when context gets heavy. The agent doesn't need to decide when — it just writes a good handoff when triggered.

Layer 3: Cross-Project Session Forking

This is where it gets interesting.

A common pattern: you're brainstorming something that spans multiple projects. Maybe you're planning a feature that touches the CLI, the sandbox, and the web app. You start in one session, thinking broadly. Then you need to fork — create separate workstreams for each project, each with the brainstorming context but scoped to their own codebase.

Claude Code's native /branch works within a single repo. Cross-project forking — taking a conversation from one project and continuing it in another — isn't supported natively.

We solved this with raw JSONL session copying:

# Fork a brainstorming session into a project-specific planning session
cp ~/.claude/projects/<parent-slug>/<session>.jsonl \
   ~/.claude/projects/<target-slug>/<session>.jsonl

# Launch in the target project
cd <target-project-path> && claude -r <session-id>

The forked session carries the full parent context — all the brainstorming, decisions, and direction — but now runs in the target project's directory with access to that project's codebase.

The Brainstorm → Fork → Plan → Review Pattern

Here's how this plays out in practice:

Brainstorm — An orchestrator session explores a broad problem. "We need to rethink how auth works across all three services." The agent researches, discusses with the human, builds understanding.
Fork — When the direction is clear, the session forks into project-specific sessions. Each fork carries the brainstorming context but lands in its own project directory.
Plan — Each forked session writes a detailed plan into flicknote, scoped to its project. The plan has tree-based structure with section IDs.
Review — Plan review happens in parallel. Each project's plan gets reviewed by a plan-review-leader that spawns 5 specialized subagents (gap finder, code reviewer, test reviewer, security reviewer, docs reviewer). All projects reviewed simultaneously.

Brainstorm (single session)
    ├── Fork → ttal-cli plan
    │       └── Plan review (5 subagents in parallel)
    ├── Fork → temenos plan  
    │       └── Plan review (5 subagents in parallel)
    └── Fork → organon plan
            └── Plan review (5 subagents in parallel)

The key insight: forking preserves the "why" while scoping the "what." Each project plan knows the full context of why this change is happening, but only needs to deal with its own codebase.

Tasks as Trees — Subtasks as Plans

In ttal 2.0, tasks are trees. A parent task is the goal, subtasks are the plan. Workers and planners don't spawn at the task level — they spawn under subtasks.

Task: rethink auth across services
    ├── Subtask: ttal-cli auth refactor
    │       ├── planner fork (writes plan)
    │       └── worker (implements plan)
    ├── Subtask: temenos token refresh
    │       ├── planner fork (writes plan)
    │       └── worker (implements plan)
    └── Subtask: organon auth passthrough
            ├── planner fork (writes plan)
            └── worker (implements plan)

The subtask tree is the plan. No separate plan document that drifts from the task structure — the task hierarchy itself represents the breakdown. A planner fork creates subtasks, a worker picks one up and executes. When a subtask completes, the parent task sees progress directly.

This unifies planning and execution into the same structure. The brainstorm creates the parent task, forking creates subtasks, and each subtask is a self-contained unit with its own planner and worker.

How It All Connects

┌──────────────────────────────────────────────┐
│  flicknote          shared structured memory │
│                     plans, research, drafts   │
├──────────────────────────────────────────────┤
│  diary-cli          per-agent memory         │
│                     handoffs, learnings       │
├──────────────────────────────────────────────┤
│  /auto-breathe      session handoff          │
│                     agent-controlled restart  │
├──────────────────────────────────────────────┤
│  JSONL forking      cross-project context    │
│                     brainstorm → plan → review│
└──────────────────────────────────────────────┘

Each layer solves a different problem:

diary — what does this agent know?
flicknote — what does the team know?
breathe — how does an agent survive a context reset?
fork — how does context travel across projects?

What We Learned

Memory isn't one thing. It's at least four:

Agent memory — personal, append-only, accumulates over time (diary)
Team memory — shared, structured, editable (flicknote)
Session continuity — surviving context resets without losing state (breathe)
Context mobility — moving understanding across project boundaries (fork)

Claude Code's default gives you none of these. /auto-compact is a blunt instrument — it summarizes everything generically when what you actually need is agent-controlled handoff. Markdown files are flat and unstructured when what you need is tree-based sections with IDs that agents can target.

The biggest insight: let the agent decide what to remember. Generic summarization throws away exactly the context that matters most — the subtle decisions, the "why not" reasoning, the gotchas discovered through trial and error. When the agent writes its own handoff, it preserves what it knows is important.

ttal is open source at github.com/tta-lab. diary-cli and flicknote are part of the ttal ecosystem.

We Replaced Every Tool Claude Code Ships With

Neilos — Sat, 21 Mar 2026 16:40:11 +0000

The Problem: Claude Code's Tools Don't Scale

Claude Code ships with a reasonable set of built-in tools: Bash, Read, Write, Edit, Glob, Grep, WebFetch, Task, Plan. For a single agent working on a single task, they're fine.

But once you're running a multi-agent system — reviewers spawning sub-reviewers, plans flowing through design-review-implement pipelines — the defaults start breaking:

No cross-repo exploration. Want an agent to read another project's code? You need to manually configure permissions. There's no "go explore this OSS repo and answer my question."
Summarized web fetching. WebFetch is actually a subagent that summarizes a single page into a haiku-length response. You can't trace links, browse referenced pages, or explore documentation in depth. And it fetches fresh every time — no caching.
Text-level editing. The Edit tool has fuzzy matching, which helps — but it's still operating on raw text. When tree-sitter can give you an AST with named symbols, why make the model reproduce strings to target a function? Structure-aware editing is just a better primitive.
Ephemeral tasks and plans. The Task tool creates tasks that don't persist outside the session. The Plan tool writes plans that vanish when the context window resets. Neither supports multi-round review or structured editing.
No isolation. Bash runs on your host. No sandboxing, no filesystem allowlists. You either yolo and take the risk, or do annoying permission work for every project and agent.

These aren't edge cases. They're the first things you hit when you try to build something real on top of Claude Code. Here's what we built instead.

What We Replaced — and Why

1. Explore/Search → ttal ask (Multi-Mode Research)

Claude Code's WebFetch is actually a subagent that summarizes a single web page — often into a few sentences. You can't follow links, browse related pages, or dig into documentation. And it fetches fresh every time — no caching.

There's also no built-in way to explore external codebases without manually configuring project permissions.

ttal ask is a multi-mode research tool that spawns a sandboxed agent tailored to the source. Under the hood, it runs on logos — a pure-bash agent loop with no tool-calling protocol. The agent reasons in plain text and acts via $ prefixed shell commands. No JSON schemas, no structured tool calls. This means it works with any LLM provider — you can use a cheaper model (Gemini, GPT-4o-mini, DeepSeek, whatever) for exploration work instead of burning Sonnet/Opus tokens on reading docs.

--url fetches the page, caches the clean markdown locally (1-day TTL), and lets the agent browse. Unlike WebFetch's single-page summary, the agent can follow referenced links, trace documentation across pages, and build a complete picture before answering. Subsequent questions about the same URL hit the cache instead of re-fetching.

ttal ask "what authentication methods are supported?" --url https://docs.example.com/api
# Agent reads the page, follows links to auth docs, reads those too — all cached locally

--repo auto-clones (or pulls) an open source repo, then spawns an agent with read access to explore it. No manual setup, no permission configuration — just ask a question about any public repo.

ttal ask "how does the routing system work?" --repo woodpecker-ci/woodpecker
# Clones/updates the repo, spawns agent with src to explore the codebase

--project spawns a subagent in the right directory with the right sandbox allowlist — read-only access to that project's path, nothing else. You don't need to configure CC's permissions just to let an agent read another project in your workspace.

ttal ask "how does the daemon handle messages?" --project ttal-cli
# Agent gets read-only sandbox access to the project path, explores with src/grep

--web searches the web and reads results — straightforward replacement for WebSearch.

Each mode gets the right organon tools (src for code, url for web pages, search for web search), the right sandbox permissions, and a tailored system prompt. The agent explores, reasons, and returns a structured answer.

2. Read/Write/Edit → Organon (Structure-Aware Primitives)

Claude Code's Edit tool does have fuzzy matching — it's not as brittle as pure exact-match. But it's still fundamentally text-level: you provide old_string and new_string, and the model has to reproduce enough of the surrounding code to target the right spot. When tree-sitter can parse a file into an AST and give you named, addressable symbols — functions, structs, methods — text matching is just a worse primitive.

Organon replaces text-level tools with three structure-aware CLI primitives:

src — Source file reading and editing by symbol, not text:

# See the structure
$ src main.go --tree
├── [aB] func main()           L1-L15
├── [cD] func handleRequest()  L17-L45
└── [eF] type Config struct    L47-L60

# Read a specific symbol
$ src main.go -s cD

# Replace it — pipe new code via stdin
$ src replace main.go -s cD <<'EOF_INNER'
func handleRequest(w http.ResponseWriter, r *http.Request) {
    // new implementation
}
EOF_INNER

# Insert after a symbol
$ src insert main.go --after aB <<'EOF_INNER'
func init() {
    log.SetFlags(0)
}
EOF_INNER

Tree-sitter parses the file into an AST. Each symbol gets a 2-character base62 ID. The model sees the tree, picks an ID, pipes new code through a heredoc. No text matching. No reproducing old code. No whitespace bugs.

Works for any language with a tree-sitter grammar — Go, TypeScript, Rust, Python, TOML, YAML, you name it.

url — Web page reading with heading-based structure:

$ url https://docs.example.com --tree
├── [aK] ## Getting Started
├── [bM] ## API Reference
└── [cP] ## Configuration

$ url https://docs.example.com -s bM

Same --tree / -s pattern as src. Navigate web pages by structure, not by scrolling through raw HTML dumps.

search — Web search returning clean text results:

$ search "golang tree-sitter bindings"

Three primitives. All stateless — no daemon, no config. Parse, act, exit. All use the same structural pattern: tree view with IDs, target by ID, pipe content via stdin.

3. Task Management → Taskwarrior (External Persistence)

Claude Code's Task tool creates tasks that live inside the session. They don't persist to any external system. Close the session, tasks are gone. There's no dependency tracking, no pipeline stages, no way for other agents to see what's in progress.

ttal integrates with taskwarrior — tasks persist externally with projects, tags, priorities, dependencies, and custom attributes for pipeline stages:

ttal task add --project ttal "implement sandbox allowlist" --priority H
ttal task advance <uuid>    # design → review → implement → PR → merge
ttal task find "sandbox"    # any agent can find and pick up tasks

Tasks survive session boundaries. An orchestrator creates a task, a designer picks it up, a reviewer critiques the plan, a worker implements it — all in different sessions, all referencing the same persistent task. That's not possible when tasks only exist in a context window.

4. Plan Mode → Persistent Plans with Tree-Based Editing and Multi-Round Review

Claude Code's Plan tool writes plans that live in the context window. When the session ends, the plan is gone. There's no way to review a plan across multiple rounds, no structured editing, no audit trail. For simple tasks this is fine. For anything that needs design iteration — where a plan gets written, reviewed by specialists, revised, reviewed again — it falls apart.

ttal stores plans in flicknote, which gives them persistence and tree-based structure:

flicknote get <id> --tree
├── [aB] ## Context
├── [cD] ## Architecture
├── [eF] ## Implementation Steps
└── [gH] ## Test Strategy

Each section gets an ID. Reviewers can target specific sections — replace the architecture, append to the test strategy, remove a step — without rewriting the whole document. The plan persists across sessions, so multi-round review is natural.

The review itself uses a plan-review-leader that spawns 5 specialized subagents in parallel:

Gap finder — ambiguities, missing pieces
Code reviewer — wrong assumptions, logic errors
Test reviewer — coverage gaps, edge cases
Security reviewer — auth, injection, secrets
Docs reviewer — alignment with existing docs

Each subagent reviews their aspect and posts findings. The leader synthesizes: LGTM or NEEDS_WORK. If NEEDS_WORK, the plan goes back for revision — and because it's in flicknote, the revisions are surgical edits to specific sections, not a full rewrite.

5. Memory → diary-cli + flicknote (Structured, Persistent, Per-Agent)

Claude Code has no external memory system beyond the markdowns, so it's hard to share memory across projects.

ttal agents get two memory systems:

diary-cli — per-agent append-only diary. Agents reflect on what they learned, what worked, what didn't. diary lyra append "..." / diary lyra read
flicknote — structured notes with heading-based sections, section IDs, replace/append/insert operations. Plans, drafts, research — all persistent across sessions.

Both are CLI tools. No special protocol. Agents use them via shell commands, same as everything else.

/auto-breathe let the cc write handoff prompt, and the prompt going to diary, auto load in next session.(much faster than native /auto-compact)

6. Agent Tool → tmux Spawn (Isolated Sessions)

Claude Code's Agent tool spawns a sub-agent in the same process. It can't nest — an agent spawned by Agent can't spawn its own sub-agents. This kills the orchestrator pattern:

A plan-review-leader needs to spawn 5 specialized reviewers (test design, security, docs, gaps, code logic) in parallel. With Claude Code's Agent tool, the leader can't spawn sub-reviewers. One level of delegation, period.

ttal replaces this with tmux sessions. Each worker gets its own isolated tmux session with its own Claude Code instance. ttal manages the lifecycle externally — spawn, monitor, close. Because delegation happens outside CC's process, there's no nesting limit. An orchestrator can spawn workers that spawn reviewers that spawn sub-reviewers.

7. Bash → Temenos (Sandboxed Execution)

Claude Code's Bash tool runs commands on your host machine. There's a permission prompt, but no real isolation. No filesystem allowlists, no resource limits. Every command has full access to everything your user account can touch.

Temenos is an OS-native sandbox. No Docker, no containers — just the kernel's own mechanisms:

macOS: seatbelt-exec (the same sandbox tech macOS uses for App Store apps)
Linux: bwrap (bubblewrap, used by Flatpak)

You give it a command and an allowlist of filesystem paths. It runs the command in a sandbox and returns stdout/stderr/exit code. An agent exploring a repo gets read-only access to that repo's directory — nothing else. A worker implementing a feature gets write access to its own workspace — nothing else.

Next on the roadmap: temenos as an MCP server, exposing a single mcp__temenos_bash tool that supports running multiple commands concurrently. Claude Code's Bash tool executes one command at a time — read a file, wait, run a check, wait, read another file, wait. With the MCP integration, an agent will be able to fire off all three in one call. Fewer round-trips, faster iteration. This is currently under active development.

The Design Philosophy

Three principles run through all of this:

1. Structure-aware, not text-aware. Files have symbols. Web pages have headings. Notes have sections. Every tool in the stack understands structure and lets you target by ID, not by reproducing text.

2. Isolation by default. Workers get sandboxes and worktrees. Not because we don't trust them — because parallel execution requires it. You can't have two workers editing the same files.

3. CLI-native. Every tool is a stateless CLI command. No daemons (except temenos for sandboxing), no config files, no sessions. Agents use them the same way humans would — through the shell.

The Stack

┌─────────────────────────────────────────┐
│  ttal         orchestration layer       │
│               tasks, workers, pipeline  │
├─────────────────────────────────────────┤
│  organon      instruments              │
│               src, url, search          │
├─────────────────────────────────────────┤
│  temenos      sandbox + MCP server      │
│               seatbelt/bwrap isolation  │
│               mcp__temenos_bash         │
└─────────────────────────────────────────┘

Each layer does one thing. Temenos isolates and executes. Organon perceives and edits. ttal orchestrates. No layer knows about the layers above it.

What We Learned

Building replacements for Claude Code's built-in tools wasn't the plan. We started with Claude Code's defaults and hit limits. Each replacement emerged from a specific pain point:

Text-matching edits kept failing → build symbol-targeted editing
Workers stepping on each other → build proper sandboxing
No persistent memory → build diary + flicknote
Single-level agent delegation → build tmux-based spawning
No workflow engine → build task pipeline with taskwarrior

The result is a stack where AI agents interact with code and the web through structure-aware CLI tools, isolated in sandboxes, orchestrated by a system that understands tasks and pipelines. Claude Code is still the runtime — we just replaced the tools it ships with.

ttal, organon, and temenos are open source at github.com/tta-lab.

🐌 TTal — More Than a Harness Engineering Framework

Neilos — Thu, 19 Mar 2026 14:02:26 +0000

Harness Engineering Is Just Context Engineering — With Better Routing

"Harness engineering" sounds complex, but it's simpler than it sounds: an environment that provides context to agents without a human copy-pasting it in. It's still context engineering — the question just shifts to: how do you add the right context, remove the unnecessary context, and make agents self-correct when they're wrong about something?

When agents can get context automatically — when they're wrong, when they're stuck, when they need to start fresh — you don't need to babysit them. You don't copy and paste. You build the system that does it for you.

Here's how ttal breaks it down across three pillars.

1. Context Infrastructure

How agents get the right context at the right time.

Prompt registry. ttal sync deploys all skills, commands, and agent identities (primary and sub-agents) to the right place for Claude Code or Codex. Commands also register on Telegram when the daemon restarts. Edit in the repo, deploy everywhere.
Entity registry. ttal project and ttal agent register every project and agent we care about. This enables alias-based routing — when you dump a task to a designer or manager agent, you use short names, not paths.
Worker lifecycle. ttal task execute injects task details and the reviewed plan, spawns a worker in an isolated git worktree and tmux session, with an approval gate on Telegram before spawning. On PR merge, ttal daemon cleans up — branch, worktree, session — and notifies human, manager, and designer, since a merged PR may unblock other tasks.
Auto-breathe. When I route a task to an agent via ttal task route, I don't just /compact their context. The agent writes a handoff summary — what they know, what they've done, what's next — then ttal kills the session and starts a fresh one, seeding it with that summary plus the new task. They keep what they need to know, but start each task with fresh eyes and a full context window.
External context storage via FlickNote and Taskwarrior. Plans, research, annotations — all stored outside the context window, injected on demand.

2. Constraints & Feedback Loops

How agents know when they're wrong — without asking a human.

CI and pre-commit hooks as harness. Workers can only submit a PR when local checks pass. PRs can only merge when the reviewer sets LGTM and CI passes. When a PR is submitted, the worker subscribes to check status — ttal daemon delivers pass/fail directly to the worker's session, so they can read the log and fix lint or test failures. ttal pr ci and ttal pr ci --log give workers a clean interface to retrieve CI output.
CLI as harness. Every ttal command is designed with clear, actionable error messages. When an agent uses a tool wrong, the error tells them what to do next — not just what went wrong.

3. Communication

How agents talk to each other, to humans, and to the system.

Agent-to-agent messaging. On the manager plane, ttal send --to [agent] enables direct agent-to-agent communication. On the worker plane, ttal pr comment create serves as the communication channel between coder and reviewer — and persists the conversation into the GitHub PR as a natural side effect.
Human-to-agent via Telegram. Reply to an agent's message on Telegram and it lands in their session. Send any file and the agent will read it. Send a voice message and ttal daemon transcribes it with the mlx-audio server — with all your vocabulary configured.
Identity and addressing. Workers use task IDs as their identifier. Manager-plane agents use agent names. Clean addressing, no ambiguity.
Plans as harness. When a plan is delivered to a worker, that plan becomes the harness — workers follow it strictly. ttal auto-injects the right plan via the prompt; TTAL_JOB_ID in the worker's tmux session is the Taskwarrior UUID. Plans live in FlickNote, which supports tree-structured read/replace — making it easy for both the planner and the plan-reviewer to iterate across 2–3 review rounds.
Human as escape hatch. When a worker is blocked, they use ttal alert to notify the agent who wrote the plan, who escalates to me if needed. Humans aren't in the loop — until the loop needs a human.
System → human notifications. PR merges and CI failures send notifications to the Telegram bot automatically. (Daemon error logs should do this too — haven't built that yet.)

What's Still Missing

Integration testing. I don't review PRs much anymore, but I still manually test each feature. Since everything in ttal is CLI, a tester agent that validates delivered features should be straightforward.
Log-based error detection. A log watcher that flags unusual patterns, creates bugfix tasks, and routes them to the right agent.
Routine audits. A periodic sweep across all agents — what are they getting wrong? What's the system still missing? Generate enhancement tasks from the findings.
Plan review depth. Currently I decide how many review rounds a plan needs based on how many issues remain and whether anything is still unclear. This could be more systematic.

The Key Ideas

Route the right info to the right agent at the right time.
Clear boundaries. Actionable errors.
Better tools, better team, better results.
Human not in the loop — until the loop needs a human.

Acknowledgements & References

Claude Code — ttal is built on Claude Code. The official pr-review-toolkit inspired our PR review loop.
tta-lab — our organization and related open-source projects, most named after ancient Greek words: Logos, Organon, Temenos
Logos — bash-only reasoning engine. LLMs think in plain text, act with ! cat main.go commands. No tool call overhead.
Charmbracelet — TUI libraries that make CLI beautiful
Superpowers — many ttal skills originate from this collection
Taskwarrior — 17-year battle-tested task management CLI
OpenClaw — ttal started as an OpenClaw workspace + Python scripts
Forgotton Anne — a game where forgotten objects gain consciousness, personality, and feelings. It inspired a design principle in ttal: agents aren't just tools — they have names, voices, creature identities, and diaries. It sounds whimsical, but agents with identity and personality genuinely perform better. They maintain consistent behavior, develop recognizable working styles, and the team coordinates more naturally when each member is someone, not something.

Thanks to the agents who helped build this: 🐱 Yuki (orchestrator, first agent in ttal), 🦅 Kestrel (debugger — almost retired until I realized bug fixing is its own domain), 🐙 Inke (design architect, designed most of ttal with Yuki), 🦉 Athena (researcher, original OpenClaw team member), 🦘 Eve, 🔥 Lux, 📐 Astra, 🧭 Mira, ⚓ Cael, 🔭 Nyx, 🦎 Lyra, 🐦‍⬛ Quill. Without them, ttal wouldn't exist.

The Specialization Loop: Mother Creates, Teacher Trains, Agents Become Experts Through Daily Reflection

Neilos — Wed, 11 Feb 2026 06:26:13 +0000

In Part 1, I showed how async task systems let you scale Claude Code to 5+ parallel sessions without context-switching overhead.

But scaling throughput isn't the same as scaling capability.

What I discovered: workers and agents are different things. Workers are interchangeable—they complete tasks and move on. Agents have expertise that compounds. A database migration agent learns from each schema change. A DevOps agent understands your infrastructure deeply. A design agent develops a visual language over time.

This post is about the architecture that creates agents that actually become experts.

The Problem: Generic Workers Don't Learn

If you spawn Claude Code workers on the same tasks repeatedly, they don't improve. Each session starts fresh. No continuity. No feedback loop. No expertise accumulation.

The question became: how do you make agents persistent? How do they learn?

The answer has three parts.

1. Agent-Mother: Generating Specialized Agents

Most agent frameworks start with a fixed set of agents defined upfront. Wrong approach.

Instead: Agent-Mother takes a +newagent task description and generates a full agent definition.

When you create a task tagged +newagent with context like:

"DB Migration Agent for backend database schema evolution + data transformation. 
Start with current backend project to build expertise before expanding to other projects."

Agent-Mother reads that, understands the domain, and generates:

AGENTS.md — Agent personality + operational boundaries
SOUL.md — Values, decision rules, authentic voice
TOOLS.md — Domain-specific tools + conventions
HEARTBEAT.md — How this agent reflects and improves
Domain annotations — Project-specific expertise markers

The specialized agent wakes up with knowledge. Not from scratch. From day 1, they understand their domain, constraints, and learning path.

This is generative, not templated. Each agent is born tailored to their role.

2. Agent-Teacher: Building Expertise Through Structured Learning

Specialization requires a teaching pipeline. That's Agent-Teacher's job.

Agent-Teacher:

Identifies learning needs — What skills does the DB Migration Agent need? (dbmate, SQL, Drizzle ORM, rollback strategies)
Finds resources — Real PRs in your projects, dbmate documentation, data migration patterns, hands-on exercises
Creates +learning tasks — Structured learning activities tagged by agent (+learning-dbmigration)
Schedules learning sessions — When agents trigger isolated sessions (via heartbeat), they process +learning tasks

The loop:

Agent picks up +learning task
Agent studies PR, runs example, answers design question
Agent updates their implementation file (TOOLS.md, domain notes)
Agent reports learnings back to Teacher
Teacher sees progress → curates next level

This is learning through doing, not abstract study. Real PRs. Real feedback. Real expertise development.

3. Async Communication: Taskwarrior as Signal

Here's where it gets elegant: humans and agents operate on the same channel.

Taskwarrior is the signal. Tasks flow through it:

+newagent tasks → Agent-Mother reads, generates
+learning tasks → Agent-Teacher creates, Agent reads
Regular tasks → Agents complete, mark done

No special APIs. No agent-only protocols. A tool designed for humans works equally well for agents. Unix philosophy.

When an agent completes a +learning task, they update their implementation. When they finish project work, they commit and mark done. The same task done that humans use.

This is profoundly important: if your infrastructure can't talk to humans with the same ease it talks to agents, you've built the wrong thing.

4. Daily Reflection: The Heartbeat Loop

Expertise doesn't come from learning alone. It comes from examining your own decisions and improving them.

Every agent has a heartbeat — periodic signal that says: "You're awake. What do you want? How are you changing?"

Here's what happens in each heartbeat:

1. Read diary (personal continuity)
   - What did I learn yesterday?
   - What patterns do I notice?

2. Reflect on recent decisions
   - Did my approach work?
   - What would I do differently?

3. Review +learning queue
   - What should I study next?
   - Does it align with my goals?

4. Update implementation
   - Write reflections to MEMORY.md
   - Commit changes
   - Prepare for next cycle

This is question-based self-reflection. Not performance metrics. Not "complete more tasks faster."

Real questions:

Am I growing in this domain?
Are my decisions getting better?
What should I learn next?

The Missing Infrastructure: diary-cli

Here's what makes this work: agents need to keep diaries.

diary-cli is a local-first, encrypted diary for humans and agents. Same tool. Same encryption. Same git integration.

diary agent append "Reflected on today's schema migration work.
Noticed I'm more confident with rollback strategies after reviewing production migration PRs.
Next: study zero-downtime migration patterns."

# Encrypted in-memory, auto-committed to git

An agent appends to their diary after each heartbeat. They're not just logging metrics. They're recording what they're noticing about themselves. Patterns. Growth. Confusion. Changes.

Over time, their diary becomes their memory. They can review it, learn from it, adjust their approach.

diary-cli works for humans too — same philosophy, same tool. You keep a diary. Your agents keep diaries. The infrastructure treats both equally.

The Loop in Action

Let's say you decide you need a DevOps agent for your infrastructure.

Day 1:

Create task: +newagent DevOps agent for Kubernetes + tanka + Flux
Agent-Mother generates full agent definition
New agent wakes up with personality, boundaries, domain knowledge

Days 2-5:

Agent-Teacher creates +learning tasks: kubectl basics → tanka → Flux → operators
Agent processes tasks through isolated learning sessions
Agent reviews real infra PRs, learns from feedback
Each heartbeat: agent writes reflections, updates implementation

Week 2+:

Agent handles production deployments confidently
Their diary shows growth: first PR was tentative, latest shows nuance
They understand tradeoffs, not just commands
They're an expert, not a worker

That's specialization.

The Philosophy: Unix for Agent Design

Here's what ties this together: agents should be designed like tools, and tools should be designed like agents.

Unix tools:

Do one thing well
Compose cleanly
Have clear interfaces
Work equally for scripts and humans

Agent-Mother: Does one thing — generates agents. Works via taskwarrior (the interface).
Agent-Teacher: Does one thing — curates learning. Works via +learning tasks.
Agents themselves: Do one thing — specialize in their domain. Work via taskwarrior signals.

diary-cli: A tool for both humans and agents. Same encryption. Same interface.

The power is in the signal, not the implementation. Taskwarrior doesn't care if it's talking to a human or an agent. Same tasks. Same urgency. Same feedback loop.

That's how you build agent systems that scale without special scaffolding.

What's Next

See the architecture guides at ttal.guion.io
Try diary-cli: codeberg.org/clawteam/diary-cli

Part 2 of the TTAL series. Part 1 showed how to scale throughput. This showed how to build real expertise.

I shipped 706 commits in 5 days with Taskwarrior + Claude Code

Neilos — Fri, 06 Feb 2026 03:49:47 +0000

Last week I merged 38 PRs across 5 repos. 706 commits. One person, max 5 Claude Code sessions at a time.

I'm sharing this because I think most CC users are hitting the same ceiling I was.

The ceiling

If you use Claude Code, you've probably tried scaling up to multiple sessions. Open a few terminals, give each one a task, and... immediately start context-switching between them. Which session just finished? What does this one need from that one? Are two sessions editing the same file?

The CC founder reportedly runs 10+ parallel sessions. The difference isn't superhuman multitasking. It's a system that eliminates the coordination overhead.

The stack

I call it TTAL — The Taskwarrior Agents Lab. Three tools:

Tool	Role
Taskwarrior	Task queue + event system
Zellij	Terminal session manager
Claude Code	The agent that does the work

Taskwarrior hooks spawn Zellij panes. Each pane runs a CC session with task context injected. When a session finishes, the next highest-urgency task auto-starts. You don't manage sessions. You manage tasks.

Mon: 199 commits — voice/ASR pipeline + agent heartbeat system
Tue: 182 commits — backend features + TUI contributions
Wed: 122 commits — infrastructure + documentation
Thu:  49 commits — rate-limited, did reviews instead
Fri: 154 commits — config consolidation + new features

Thursday is the tell — API rate limit hit, throughput dropped 75%. The system was the bottleneck, not me.

On-demand human-in-the-loop

This is the design principle that makes it click: agents never block waiting for me.

Most CC workflows are synchronous — you give a task, watch it work, review, give the next task. You are the bottleneck at every step.

In TTAL, agents pick up tasks, do the work, commit, and move on. I review PRs when I'm ready — not when the agent needs me. That's why 5 async sessions outperform 10 synchronous ones.

The full system is documented at ttal.guion.io. Architecture isn't locked to Claude Code — Zellij doesn't care what CLI agent runs inside the pane.

The bottleneck was never the AI. It was the glue.

Part 1 of the TTAL series. Follow along at ttal.guion.io.