<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Neilos</title>
    <description>The latest articles on DEV Community by Neilos (@neil_agentic).</description>
    <link>https://dev.to/neil_agentic</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F861835%2Fe81c88ae-c503-46e5-8f34-e562d1fbee2c.png</url>
      <title>DEV Community: Neilos</title>
      <link>https://dev.to/neil_agentic</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/neil_agentic"/>
    <language>en</language>
    <item>
      <title>CC 20x max is not enough? This is what I'm doing to fix that</title>
      <dc:creator>Neilos</dc:creator>
      <pubDate>Tue, 31 Mar 2026 07:52:36 +0000</pubDate>
      <link>https://dev.to/neil_agentic/cc-20x-max-is-not-enough-this-is-what-im-doing-to-fix-that-4ka0</link>
      <guid>https://dev.to/neil_agentic/cc-20x-max-is-not-enough-this-is-what-im-doing-to-fix-that-4ka0</guid>
      <description>&lt;p&gt;There's a 200-comment Reddit thread right now of people watching their Claude Max plan vanish in minutes. One word — "Morning" — took 15% of someone's 5-hour limit. A fresh session, two messages, weekly quota wiped.&lt;/p&gt;

&lt;p&gt;It's not just power users. Normal usage is hitting the wall.&lt;/p&gt;

&lt;p&gt;The cap is real. But the opacity is worse — you can't see what's eating your budget, so you can't optimize around it. People are scared to use Opus, losing productivity not just when they hit the wall, but in constant anticipation of it.&lt;/p&gt;

&lt;p&gt;The community is already finding the direction. One commenter: "best workflow is Opus high, then everything with Sonnet subagents." Right idea — but it stops at Sonnet, and it stays inside Anthropic's billing.&lt;/p&gt;

&lt;p&gt;The pattern I've landed on: CC stays in charge, a cheap model does the work. Here's how it's structured.&lt;/p&gt;

&lt;h2&gt;
  
  
  ttal, logos, and MiniMax M2.7
&lt;/h2&gt;

&lt;p&gt;Quick context for anyone not familiar:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ttal&lt;/strong&gt; is an agent orchestration CLI — it manages tasks, spawns workers, runs pipelines, and routes work between agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;logos&lt;/strong&gt; is ttal's bash-only agent loop. Text in, text out. The model writes prose and shell commands, the sandbox executes them. No tool schemas, no JSON, no provider-specific plumbing — just a simple text convention any model can follow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MiniMax M2.7&lt;/strong&gt; is a reasoning model released March 2026. $0.30/1M input, $1.20/1M output — about 10× cheaper than Sonnet. On Terminal Bench 2, the only direct benchmark with both models, it scores 57% vs Sonnet's 59%. In detection head-to-heads against Opus (Kilo Code), it found every bug and every security vulnerability — same result, fraction of the cost. The gap vs frontier shows up in architectural depth and complex multi-file reasoning, not in focused scoped tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  CC leads, MiniMax works
&lt;/h2&gt;

&lt;p&gt;Every pipeline stage in ttal has a CC lead orchestrator. There are three: &lt;code&gt;plan-review-lead&lt;/code&gt;, &lt;code&gt;pr-review-lead&lt;/code&gt;, and &lt;code&gt;code-lead&lt;/code&gt;. Each runs on Claude, holds context, makes decisions, and controls the flow. When focused work needs to happen, the lead delegates via &lt;code&gt;ttal subagent run&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ttal subagent run&lt;/code&gt; is a CLI command — leads call it internally, but you can run it manually too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# plan-review-lead delegates to review specialists&lt;/span&gt;
ttal subagent run gap-finder
ttal subagent run security-reviewer
ttal subagent run test-reviewer

&lt;span class="c"&gt;# code-lead delegates focused single-file edits&lt;/span&gt;
ttal subagent run coder

&lt;span class="c"&gt;# research via ttal ask — explore any codebase, URL, or repo&lt;/span&gt;
ttal ask &lt;span class="s2"&gt;"how does auth work?"&lt;/span&gt; &lt;span class="nt"&gt;--project&lt;/span&gt; backend
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These run in parallel under the logos loop on M2.7. The lead picks up results, synthesizes, decides what's next. MiniMax never touches the orchestration. But the detection, the single-file edits, the exploration — all of it runs on a model that costs 10× less per token, entirely outside Claude's usage meter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this works without quality loss
&lt;/h2&gt;

&lt;p&gt;Most subagent work doesn't need frontier intelligence. Detection, review, single-file edits, exploration — bounded scope, clear criteria. The question isn't "is M2.7 as smart as Sonnet?" It's "is M2.7 good enough for this specific task?"&lt;/p&gt;

&lt;p&gt;For review and detection it matches frontier quality. For single-file edits the scope is tight enough that it doesn't need to reason about the whole system. For exploration it just needs to read and report accurately. None of these require the architectural judgment and multi-file reasoning where frontier models genuinely earn their cost — and that judgment stays with the CC lead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why logos makes this possible
&lt;/h2&gt;

&lt;p&gt;Most agent frameworks couple to the provider's tool-call API. M2.7 sometimes hallucinates tool calls — the model behaves as if it has tools it doesn't actually have. In a standard framework that's hard to recover from.&lt;/p&gt;

&lt;p&gt;Logos handles all these edge cases. It detects hallucinated tool-call formats mid-stream, suppresses them, and injects corrective directives — the loop keeps running cleanly. And because logos uses no tool schemas, the surface area for these issues is minimal to begin with.&lt;/p&gt;

&lt;p&gt;The other benefit: logos doesn't care which model you use. M2.7 today, whatever's cheapest next month. No rebuilding, no schema migration. Any model that can follow a simple text convention works.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this actually changes
&lt;/h2&gt;

&lt;p&gt;You stop self-censoring. Stop being scared to kick off a review because it might eat 15% of your session. The review runs on M2.7 under a CC lead that costs almost nothing to orchestrate.&lt;/p&gt;

&lt;p&gt;The cap isn't going away. Anthropic is tightening, not loosening. A bigger plan isn't the answer — changing what the plan is used for is. CC for orchestration, decisions, and reasoning. Focused work on cheap models that don't touch your Claude budget at all.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;ttal and logos are open source at &lt;a href="https://github.com/tta-lab" rel="noopener noreferrer"&gt;github.com/tta-lab&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>agents</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>How I Manage 15+ Repos with Claude Code (Without Losing My Mind)</title>
      <dc:creator>Neilos</dc:creator>
      <pubDate>Fri, 27 Mar 2026 11:23:46 +0000</pubDate>
      <link>https://dev.to/neil_agentic/how-i-manage-15-repos-with-claude-code-without-losing-my-mind-2ood</link>
      <guid>https://dev.to/neil_agentic/how-i-manage-15-repos-with-claude-code-without-losing-my-mind-2ood</guid>
      <description>&lt;p&gt;Most Claude Code users work in one repo at a time. It's fine until your system spans multiple repos — then you're copy-pasting context between sessions, manually tracking which PR depends on which, and babysitting agents that can't see the full picture.&lt;/p&gt;

&lt;p&gt;I manage 15+ repos across Go, Rust, TypeScript, Python, and C++. 10 specialized Claude Code agents coordinate through Telegram. Here's what I tried first, why it didn't work, and what does.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Doesn't Work
&lt;/h2&gt;

&lt;p&gt;All of these approaches try to solve "how do I give one session access to multiple repos." But that's the wrong framing. When you need cross-repo context, what you actually need is cross-repo &lt;strong&gt;read and explore&lt;/strong&gt;. The write should always be focused on a single repo, a single PR. ttal handles this by separating the two: &lt;code&gt;ttal ask&lt;/code&gt; reads and explores anything, workers write to one repo at a time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monorepo.&lt;/strong&gt; I use Moon for monorepo management — I'm not anti-monorepo. But when your stack spans Go, Rust, TypeScript, Python, and C++, a single repo doesn't cut it. Each language has its own build tooling, CI pipelines, and dependency management. Cramming them together creates more problems than it solves. Even with Moon handling the orchestration, past 3-4 languages, splitting is cleaner. This isn't a theoretical concern — &lt;a href="https://github.com/anthropics/claude-code/issues/23627" rel="noopener noreferrer"&gt;it's a real gap in the CC ecosystem&lt;/a&gt; that people are hitting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Submodules.&lt;/strong&gt; Even if you link repos together with submodules, you still don't get what you actually need: cross-repo coordination, shared context, parallel execution. Submodules give you a way to pin repo versions together — they don't give you a way to plan across repos, route tasks, or run parallel agents. And they don't compose well with worktrees, which are essential for parallel agent work. It's solving the wrong problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CC's native cross-repo workflow.&lt;/strong&gt; Want to explore another repo? You manually &lt;code&gt;/add-dir&lt;/code&gt;, then that session starts reading in that directory. It works, but it's manual and the session context pays the cost. In ttal, all exploration is handled by &lt;code&gt;ttal ask&lt;/code&gt; — a lightweight bash-only agent that collects info from any source (&lt;code&gt;--web&lt;/code&gt;, &lt;code&gt;--repo&lt;/code&gt;, &lt;code&gt;--project&lt;/code&gt;) and returns a detailed report without polluting your main session's context. And because &lt;code&gt;ttal ask&lt;/code&gt; runs on a simple bash-based agent loop, you can use any fast, cheap model (MiniMax M2.7 HighSpeed) for exploration — so Opus stays focused on the thinking work that actually needs it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single-session cross-repo orchestration.&lt;/strong&gt; This is the trap most people fall into. You give one CC session access to all your repos and ask it to plan and implement a cross-repo feature. The context window fills up fast, the agent loses track of which repo it's in, and the quality of both planning and execution suffers. Don't try to make one session do orchestration, planning, and execution across repos. Let manager agents hold the big picture. Once the plan is done, let workers handle execution detail — one repo, one task, one worktree.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Works: A Coordination Layer
&lt;/h2&gt;

&lt;p&gt;The answer wasn't a better monorepo or a smarter IDE. It was a thin coordination layer on top of Claude Code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/tta-lab/ttal-cli" rel="noopener noreferrer"&gt;ttal&lt;/a&gt; is a single binary. Install it, define your projects in a TOML file, and you have a system that routes tasks to the right repo, the right agent, at the right time.&lt;/p&gt;

&lt;p&gt;Two planes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Worker plane&lt;/strong&gt; — ephemeral CC sessions that plan, review, and implement. Each gets its own git worktree, sandboxed environment, and tmux session. Spin up, do the work, merge, clean up. No babysitting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Manager plane&lt;/strong&gt; — persistent agents that live across sessions. They hold the big picture — what features they designed with you, which tasks are done or blocked, what shipped yesterday. The manager never touches code. The worker never worries about the big picture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Message bridge&lt;/strong&gt; — the glue between everything. Human ↔ agent via Telegram. Agent ↔ agent via &lt;code&gt;ttal send&lt;/code&gt;. Manager ↔ worker via &lt;code&gt;ttal alert&lt;/code&gt; (workers notify their spawner automatically). CI status, PR reviews, task updates — all routed through the same daemon. You talk to your agents like coworkers in a chat app.&lt;/p&gt;

&lt;p&gt;I wrote about the philosophy in &lt;a href="https://dev.to/neil_agentic/ttal-more-than-a-harness-engineering-framework-2pbn"&gt;ttal — More Than a Harness Engineering Framework&lt;/a&gt;, the tooling in &lt;a href="https://dev.to/neil_agentic/we-replaced-every-tool-claude-code-ships-with-522j"&gt;We Replaced Every Tool Claude Code Ships With&lt;/a&gt;, and the memory model in &lt;a href="https://dev.to/neil_agentic/how-we-manage-memory-and-sessions-in-a-multi-agent-claude-code-system-2a9k"&gt;How We Manage Memory and Sessions&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Daily Workflow
&lt;/h2&gt;

&lt;p&gt;I open Telegram. 10 agent chats in a folder.&lt;/p&gt;

&lt;p&gt;A typical morning:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tell Yuki (orchestrator) what I want to build today&lt;/li&gt;
&lt;li&gt;She breaks it into tasks, routes them to the right pipeline&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ttal go\&lt;/code&gt; advances each task — spawning planners, reviewers, coders in parallel&lt;/li&gt;
&lt;li&gt;Workers run in isolated worktrees across whichever repos need changes&lt;/li&gt;
&lt;li&gt;PR reviews happen automatically — parallel sub-reviewers check security, tests, types, edge cases separately&lt;/li&gt;
&lt;li&gt;I review verdicts, approve, &lt;code&gt;ttal go\&lt;/code&gt; merges and cleans up&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Cross-repo features just work. A change that touches ttal-cli, temenos, and organon gets three parallel workers, each in their own worktree, each with context about why the change exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  Under the Hood
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unix philosophy&lt;/strong&gt; — task management via Taskwarrior, knowledge via FlickNote, editing via tree-sitter. Compose dedicated tools, don't bundle into a platform.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandbox auto-config&lt;/strong&gt; — specialized roles mean known paths. The multi-repo project registry means sandbox config writes itself. No manual permission prompts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipeline-driven&lt;/strong&gt; — tag-based pipelines borrowed from event sourcing. One command (&lt;code&gt;ttal go&lt;/code&gt;) drives every transition. Human gates where they matter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in quality control&lt;/strong&gt; — parallel sub-reviewers focus on different aspects (security, tests, silent failures, type design). By the time a PR reaches you, it's been through plan review, code review, and CI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session forking&lt;/strong&gt; — brainstorm figures out &lt;em&gt;what&lt;/em&gt; and &lt;em&gt;why&lt;/em&gt;, then the session forks. Each fork inherits the full conversation and writes a plan scoped to its target repo. No summarization, no lossy handoff. Plan forks figure out the &lt;em&gt;how&lt;/em&gt;, workers carry all of it into implementation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Last week: 190+ tasks completed across all repos. In ttal, each task is a PR — planned, reviewed, implemented, merged. One person, 10 agents.&lt;/p&gt;

&lt;p&gt;Throughput scales because coordination is automated. I don't track which session is doing what. I track tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew tap tta-lab/ttal
brew &lt;span class="nb"&gt;install &lt;/span&gt;ttal
ttal doctor &lt;span class="nt"&gt;--fix&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Define your projects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# ~/.config/ttal/projects.toml&lt;/span&gt;
&lt;span class="nn"&gt;[backend]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Backend API"&lt;/span&gt;
&lt;span class="py"&gt;path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/code/backend"&lt;/span&gt;

&lt;span class="nn"&gt;[frontend]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Frontend App"&lt;/span&gt;
&lt;span class="py"&gt;path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/code/frontend"&lt;/span&gt;

&lt;span class="nn"&gt;[infra]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Infrastructure"&lt;/span&gt;
&lt;span class="py"&gt;path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/code/infra"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Route a task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ttal task add &lt;span class="nt"&gt;--project&lt;/span&gt; backend &lt;span class="s2"&gt;"add rate limiting middleware"&lt;/span&gt; &lt;span class="nt"&gt;--tag&lt;/span&gt; feature &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; ttal go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pipeline figures out the rest.&lt;/p&gt;




&lt;p&gt;Multi-repo at scale with Claude Code isn't about getting CC to understand all your repos at once. It's about a coordination layer that routes work to the right repo, the right agent, at the right time.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ttal, organon, and temenos are open source at &lt;a href="https://github.com/tta-lab" rel="noopener noreferrer"&gt;github.com/tta-lab&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How We Manage Memory and Sessions in a Multi-Agent Claude Code System</title>
      <dc:creator>Neilos</dc:creator>
      <pubDate>Tue, 24 Mar 2026 03:18:27 +0000</pubDate>
      <link>https://dev.to/neil_agentic/how-we-manage-memory-and-sessions-in-a-multi-agent-claude-code-system-2a9k</link>
      <guid>https://dev.to/neil_agentic/how-we-manage-memory-and-sessions-in-a-multi-agent-claude-code-system-2a9k</guid>
      <description>&lt;p&gt;Claude Code sessions are disposable by default. Context window fills up, you start fresh, everything's gone. For a single developer this is annoying. For a multi-agent system where agents have roles, history, and ongoing work — it's a dealbreaker.&lt;/p&gt;

&lt;p&gt;This post covers how we handle memory, session handoff, and cross-project forking in ttal.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Sessions Are Ephemeral, Agents Need Continuity
&lt;/h2&gt;

&lt;p&gt;Claude Code gives you a context window and markdown files. That's it for memory. When the window fills up, you either &lt;code&gt;/compact&lt;/code&gt; (lossy summarization you don't control) or start over.&lt;/p&gt;

&lt;p&gt;For a multi-agent team, this means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agents forget what they learned last session&lt;/li&gt;
&lt;li&gt;No shared memory between agents working on the same problem&lt;/li&gt;
&lt;li&gt;Plans written in one session vanish in the next&lt;/li&gt;
&lt;li&gt;An agent working across multiple projects can't carry context between them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We needed agents that remember, hand off cleanly, and can fork their context into new workstreams.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 1: Persistent Memory — diary-cli + flicknote
&lt;/h2&gt;

&lt;p&gt;Every agent gets two memory systems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;diary-cli&lt;/strong&gt; is a per-agent append-only diary. After each session, agents write what they learned, what decisions they made, what worked and what didn't.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;diary kestrel append &lt;span class="s2"&gt;"Discovered that the forgejo API returns 422 when..."&lt;/span&gt;
diary kestrel &lt;span class="nb"&gt;read&lt;/span&gt;          &lt;span class="c"&gt;# today's entries&lt;/span&gt;
diary kestrel &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;--yesterday&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's an append-only log — agents can't edit or delete past entries. This is intentional. Memory should accumulate, not get rewritten.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;flicknote&lt;/strong&gt; is for structured, editable knowledge. Plans, research notes, drafts — anything that needs sections, revisions, and collaboration between agents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;flicknote get &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;--tree&lt;/span&gt;
├── &lt;span class="o"&gt;[&lt;/span&gt;aB] &lt;span class="c"&gt;## Context&lt;/span&gt;
├── &lt;span class="o"&gt;[&lt;/span&gt;cD] &lt;span class="c"&gt;## Architecture  &lt;/span&gt;
└── &lt;span class="o"&gt;[&lt;/span&gt;eF] &lt;span class="c"&gt;## Open Questions&lt;/span&gt;

&lt;span class="c"&gt;# Another agent adds findings to a specific section&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"New finding..."&lt;/span&gt; | flicknote append &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;--section&lt;/span&gt; cD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;diary is what an agent knows. flicknote is what the team knows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 2: Session Handoff — /auto-breathe
&lt;/h2&gt;

&lt;p&gt;When a context window gets heavy, the standard approach is &lt;code&gt;/auto-compact&lt;/code&gt; — Claude Code summarizes the conversation and continues. The problem: you don't control what gets kept and what gets lost. Important decisions, subtle context, task state — all at the mercy of generic summarization.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/auto-breathe&lt;/code&gt; flips this. Instead of the runtime summarizing your session, the agent writes its own handoff:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;HANDOFF&lt;/span&gt;&lt;span class="sh"&gt;' | ttal breathe
# Session Handoff

## Active Task
85e63ce0 — implement sandbox allowlist for temenos

## What Was Done
- Added allowlist parsing in config.go
- Tests passing for basic paths

## Key Decisions
- Using glob patterns, not regex — simpler for users
- Denied paths take precedence over allowed paths

## Next Steps
1. Add integration test for nested paths
2. Update CLI help text
&lt;/span&gt;&lt;span class="no"&gt;HANDOFF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What happens next:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The daemon saves the handoff to the agent's diary&lt;/li&gt;
&lt;li&gt;Reads back today's full diary (this handoff + any earlier ones)&lt;/li&gt;
&lt;li&gt;Writes a synthetic JSONL session with the handoff as the first message&lt;/li&gt;
&lt;li&gt;Kills the old session, spawns a fresh one with &lt;code&gt;--resume&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The agent wakes up in a clean context window with its own handoff as context&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The agent controls what's preserved. No lossy summarization. And because the handoff goes to diary, it persists — even if the next session also breathes, all handoffs accumulate.&lt;/p&gt;

&lt;p&gt;In practice, &lt;code&gt;/auto-breathe&lt;/code&gt; fires automatically when context gets heavy. The agent doesn't need to decide when — it just writes a good handoff when triggered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 3: Cross-Project Session Forking
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting.&lt;/p&gt;

&lt;p&gt;A common pattern: you're brainstorming something that spans multiple projects. Maybe you're planning a feature that touches the CLI, the sandbox, and the web app. You start in one session, thinking broadly. Then you need to fork — create separate workstreams for each project, each with the brainstorming context but scoped to their own codebase.&lt;/p&gt;

&lt;p&gt;Claude Code's native &lt;code&gt;/branch&lt;/code&gt; works within a single repo. Cross-project forking — taking a conversation from one project and continuing it in another — isn't supported natively.&lt;/p&gt;

&lt;p&gt;We solved this with raw JSONL session copying:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Fork a brainstorming session into a project-specific planning session&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; ~/.claude/projects/&amp;lt;parent-slug&amp;gt;/&amp;lt;session&amp;gt;.jsonl &lt;span class="se"&gt;\&lt;/span&gt;
   ~/.claude/projects/&amp;lt;target-slug&amp;gt;/&amp;lt;session&amp;gt;.jsonl

&lt;span class="c"&gt;# Launch in the target project&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; &amp;lt;target-project-path&amp;gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; claude &lt;span class="nt"&gt;-r&lt;/span&gt; &amp;lt;session-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The forked session carries the full parent context — all the brainstorming, decisions, and direction — but now runs in the target project's directory with access to that project's codebase.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Brainstorm → Fork → Plan → Review Pattern
&lt;/h3&gt;

&lt;p&gt;Here's how this plays out in practice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Brainstorm&lt;/strong&gt; — An orchestrator session explores a broad problem. "We need to rethink how auth works across all three services." The agent researches, discusses with the human, builds understanding.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fork&lt;/strong&gt; — When the direction is clear, the session forks into project-specific sessions. Each fork carries the brainstorming context but lands in its own project directory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plan&lt;/strong&gt; — Each forked session writes a detailed plan into flicknote, scoped to its project. The plan has tree-based structure with section IDs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Review&lt;/strong&gt; — Plan review happens in parallel. Each project's plan gets reviewed by a plan-review-leader that spawns 5 specialized subagents (gap finder, code reviewer, test reviewer, security reviewer, docs reviewer). All projects reviewed simultaneously.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Brainstorm (single session)
    ├── Fork → ttal-cli plan
    │       └── Plan review (5 subagents in parallel)
    ├── Fork → temenos plan  
    │       └── Plan review (5 subagents in parallel)
    └── Fork → organon plan
            └── Plan review (5 subagents in parallel)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: forking preserves the "why" while scoping the "what." Each project plan knows the full context of why this change is happening, but only needs to deal with its own codebase.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tasks as Trees — Subtasks as Plans
&lt;/h3&gt;

&lt;p&gt;In ttal 2.0, tasks are trees. A parent task is the goal, subtasks are the plan. Workers and planners don't spawn at the task level — they spawn under subtasks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task: rethink auth across services
    ├── Subtask: ttal-cli auth refactor
    │       ├── planner fork (writes plan)
    │       └── worker (implements plan)
    ├── Subtask: temenos token refresh
    │       ├── planner fork (writes plan)
    │       └── worker (implements plan)
    └── Subtask: organon auth passthrough
            ├── planner fork (writes plan)
            └── worker (implements plan)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The subtask tree &lt;em&gt;is&lt;/em&gt; the plan. No separate plan document that drifts from the task structure — the task hierarchy itself represents the breakdown. A planner fork creates subtasks, a worker picks one up and executes. When a subtask completes, the parent task sees progress directly.&lt;/p&gt;

&lt;p&gt;This unifies planning and execution into the same structure. The brainstorm creates the parent task, forking creates subtasks, and each subtask is a self-contained unit with its own planner and worker.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It All Connects
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────────────────────┐
│  flicknote          shared structured memory │
│                     plans, research, drafts   │
├──────────────────────────────────────────────┤
│  diary-cli          per-agent memory         │
│                     handoffs, learnings       │
├──────────────────────────────────────────────┤
│  /auto-breathe      session handoff          │
│                     agent-controlled restart  │
├──────────────────────────────────────────────┤
│  JSONL forking      cross-project context    │
│                     brainstorm → plan → review│
└──────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer solves a different problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;diary&lt;/strong&gt; — what does this agent know?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;flicknote&lt;/strong&gt; — what does the team know?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;breathe&lt;/strong&gt; — how does an agent survive a context reset?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;fork&lt;/strong&gt; — how does context travel across projects?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;p&gt;Memory isn't one thing. It's at least four:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Agent memory&lt;/strong&gt; — personal, append-only, accumulates over time (diary)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team memory&lt;/strong&gt; — shared, structured, editable (flicknote)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session continuity&lt;/strong&gt; — surviving context resets without losing state (breathe)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context mobility&lt;/strong&gt; — moving understanding across project boundaries (fork)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Claude Code's default gives you none of these. &lt;code&gt;/auto-compact&lt;/code&gt; is a blunt instrument — it summarizes everything generically when what you actually need is agent-controlled handoff. Markdown files are flat and unstructured when what you need is tree-based sections with IDs that agents can target.&lt;/p&gt;

&lt;p&gt;The biggest insight: &lt;strong&gt;let the agent decide what to remember.&lt;/strong&gt; Generic summarization throws away exactly the context that matters most — the subtle decisions, the "why not" reasoning, the gotchas discovered through trial and error. When the agent writes its own handoff, it preserves what it knows is important.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;ttal is open source at &lt;a href="https://github.com/tta-lab" rel="noopener noreferrer"&gt;github.com/tta-lab&lt;/a&gt;. diary-cli and flicknote are part of the ttal ecosystem.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>We Replaced Every Tool Claude Code Ships With</title>
      <dc:creator>Neilos</dc:creator>
      <pubDate>Sat, 21 Mar 2026 16:40:11 +0000</pubDate>
      <link>https://dev.to/neil_agentic/we-replaced-every-tool-claude-code-ships-with-522j</link>
      <guid>https://dev.to/neil_agentic/we-replaced-every-tool-claude-code-ships-with-522j</guid>
      <description>&lt;h2&gt;
  
  
  The Problem: Claude Code's Tools Don't Scale
&lt;/h2&gt;

&lt;p&gt;Claude Code ships with a reasonable set of built-in tools: Bash, Read, Write, Edit, Glob, Grep, WebFetch, Task, Plan. For a single agent working on a single task, they're fine.&lt;/p&gt;

&lt;p&gt;But once you're running a multi-agent system — reviewers spawning sub-reviewers, plans flowing through design-review-implement pipelines — the defaults start breaking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No cross-repo exploration.&lt;/strong&gt; Want an agent to read another project's code? You need to manually configure permissions. There's no "go explore this OSS repo and answer my question."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summarized web fetching.&lt;/strong&gt; &lt;code&gt;WebFetch&lt;/code&gt; is actually a subagent that summarizes a single page into a haiku-length response. You can't trace links, browse referenced pages, or explore documentation in depth. And it fetches fresh every time — no caching.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text-level editing.&lt;/strong&gt; The &lt;code&gt;Edit&lt;/code&gt; tool has fuzzy matching, which helps — but it's still operating on raw text. When tree-sitter can give you an AST with named symbols, why make the model reproduce strings to target a function? Structure-aware editing is just a better primitive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ephemeral tasks and plans.&lt;/strong&gt; The &lt;code&gt;Task&lt;/code&gt; tool creates tasks that don't persist outside the session. The &lt;code&gt;Plan&lt;/code&gt; tool writes plans that vanish when the context window resets. Neither supports multi-round review or structured editing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No isolation.&lt;/strong&gt; Bash runs on your host. No sandboxing, no filesystem allowlists. You either yolo and take the risk, or do annoying permission work for every project and agent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't edge cases. They're the first things you hit when you try to build something real on top of Claude Code. Here's what we built instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Replaced — and Why
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Explore/Search → ttal ask (Multi-Mode Research)
&lt;/h3&gt;

&lt;p&gt;Claude Code's &lt;code&gt;WebFetch&lt;/code&gt; is actually a subagent that summarizes a single web page — often into a few sentences. You can't follow links, browse related pages, or dig into documentation. And it fetches fresh every time — no caching.&lt;/p&gt;

&lt;p&gt;There's also no built-in way to explore external codebases without manually configuring project permissions.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ttal ask&lt;/code&gt; is a multi-mode research tool that spawns a sandboxed agent tailored to the source. Under the hood, it runs on &lt;strong&gt;logos&lt;/strong&gt; — a pure-bash agent loop with no tool-calling protocol. The agent reasons in plain text and acts via &lt;code&gt;$&lt;/code&gt; prefixed shell commands. No JSON schemas, no structured tool calls. This means it works with &lt;strong&gt;any LLM provider&lt;/strong&gt; — you can use a cheaper model (Gemini, GPT-4o-mini, DeepSeek, whatever) for exploration work instead of burning Sonnet/Opus tokens on reading docs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;--url&lt;/code&gt;&lt;/strong&gt; fetches the page, caches the clean markdown locally (1-day TTL), and lets the agent browse. Unlike WebFetch's single-page summary, the agent can follow referenced links, trace documentation across pages, and build a complete picture before answering. Subsequent questions about the same URL hit the cache instead of re-fetching.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ttal ask &lt;span class="s2"&gt;"what authentication methods are supported?"&lt;/span&gt; &lt;span class="nt"&gt;--url&lt;/span&gt; https://docs.example.com/api
&lt;span class="c"&gt;# Agent reads the page, follows links to auth docs, reads those too — all cached locally&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;--repo&lt;/code&gt;&lt;/strong&gt; auto-clones (or pulls) an open source repo, then spawns an agent with read access to explore it. No manual setup, no permission configuration — just ask a question about any public repo.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ttal ask &lt;span class="s2"&gt;"how does the routing system work?"&lt;/span&gt; &lt;span class="nt"&gt;--repo&lt;/span&gt; woodpecker-ci/woodpecker
&lt;span class="c"&gt;# Clones/updates the repo, spawns agent with src to explore the codebase&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;--project&lt;/code&gt;&lt;/strong&gt; spawns a subagent in the right directory with the right sandbox allowlist — read-only access to that project's path, nothing else. You don't need to configure CC's permissions just to let an agent read another project in your workspace.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ttal ask &lt;span class="s2"&gt;"how does the daemon handle messages?"&lt;/span&gt; &lt;span class="nt"&gt;--project&lt;/span&gt; ttal-cli
&lt;span class="c"&gt;# Agent gets read-only sandbox access to the project path, explores with src/grep&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;--web&lt;/code&gt;&lt;/strong&gt; searches the web and reads results — straightforward replacement for WebSearch.&lt;/p&gt;

&lt;p&gt;Each mode gets the right organon tools (&lt;code&gt;src&lt;/code&gt; for code, &lt;code&gt;url&lt;/code&gt; for web pages, &lt;code&gt;search&lt;/code&gt; for web search), the right sandbox permissions, and a tailored system prompt. The agent explores, reasons, and returns a structured answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Read/Write/Edit → Organon (Structure-Aware Primitives)
&lt;/h3&gt;

&lt;p&gt;Claude Code's &lt;code&gt;Edit&lt;/code&gt; tool does have fuzzy matching — it's not as brittle as pure exact-match. But it's still fundamentally text-level: you provide &lt;code&gt;old_string&lt;/code&gt; and &lt;code&gt;new_string&lt;/code&gt;, and the model has to reproduce enough of the surrounding code to target the right spot. When tree-sitter can parse a file into an AST and give you named, addressable symbols — functions, structs, methods — text matching is just a worse primitive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Organon&lt;/strong&gt; replaces text-level tools with three structure-aware CLI primitives:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;src&lt;/code&gt;&lt;/strong&gt; — Source file reading and editing by symbol, not text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# See the structure&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;src main.go &lt;span class="nt"&gt;--tree&lt;/span&gt;
├── &lt;span class="o"&gt;[&lt;/span&gt;aB] func main&lt;span class="o"&gt;()&lt;/span&gt;           L1-L15
├── &lt;span class="o"&gt;[&lt;/span&gt;cD] func handleRequest&lt;span class="o"&gt;()&lt;/span&gt;  L17-L45
└── &lt;span class="o"&gt;[&lt;/span&gt;eF] &lt;span class="nb"&gt;type &lt;/span&gt;Config struct    L47-L60

&lt;span class="c"&gt;# Read a specific symbol&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;src main.go &lt;span class="nt"&gt;-s&lt;/span&gt; cD

&lt;span class="c"&gt;# Replace it — pipe new code via stdin&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;src replace main.go &lt;span class="nt"&gt;-s&lt;/span&gt; cD &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF_INNER&lt;/span&gt;&lt;span class="sh"&gt;'
func handleRequest(w http.ResponseWriter, r *http.Request) {
    // new implementation
}
&lt;/span&gt;&lt;span class="no"&gt;EOF_INNER

&lt;/span&gt;&lt;span class="c"&gt;# Insert after a symbol&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;src insert main.go &lt;span class="nt"&gt;--after&lt;/span&gt; aB &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF_INNER&lt;/span&gt;&lt;span class="sh"&gt;'
func init() {
    log.SetFlags(0)
}
&lt;/span&gt;&lt;span class="no"&gt;EOF_INNER
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tree-sitter parses the file into an AST. Each symbol gets a 2-character base62 ID. The model sees the tree, picks an ID, pipes new code through a heredoc. &lt;strong&gt;No text matching. No reproducing old code. No whitespace bugs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Works for any language with a tree-sitter grammar — Go, TypeScript, Rust, Python, TOML, YAML, you name it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;url&lt;/code&gt;&lt;/strong&gt; — Web page reading with heading-based structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;url https://docs.example.com &lt;span class="nt"&gt;--tree&lt;/span&gt;
├── &lt;span class="o"&gt;[&lt;/span&gt;aK] &lt;span class="c"&gt;## Getting Started&lt;/span&gt;
├── &lt;span class="o"&gt;[&lt;/span&gt;bM] &lt;span class="c"&gt;## API Reference&lt;/span&gt;
└── &lt;span class="o"&gt;[&lt;/span&gt;cP] &lt;span class="c"&gt;## Configuration&lt;/span&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;url https://docs.example.com &lt;span class="nt"&gt;-s&lt;/span&gt; bM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same &lt;code&gt;--tree&lt;/code&gt; / &lt;code&gt;-s&lt;/code&gt; pattern as &lt;code&gt;src&lt;/code&gt;. Navigate web pages by structure, not by scrolling through raw HTML dumps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;search&lt;/code&gt;&lt;/strong&gt; — Web search returning clean text results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;search &lt;span class="s2"&gt;"golang tree-sitter bindings"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three primitives. All stateless — no daemon, no config. Parse, act, exit. All use the same structural pattern: tree view with IDs, target by ID, pipe content via stdin.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Task Management → Taskwarrior (External Persistence)
&lt;/h3&gt;

&lt;p&gt;Claude Code's &lt;code&gt;Task&lt;/code&gt; tool creates tasks that live inside the session. They don't persist to any external system. Close the session, tasks are gone. There's no dependency tracking, no pipeline stages, no way for other agents to see what's in progress.&lt;/p&gt;

&lt;p&gt;ttal integrates with &lt;strong&gt;taskwarrior&lt;/strong&gt; — tasks persist externally with projects, tags, priorities, dependencies, and custom attributes for pipeline stages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ttal task add &lt;span class="nt"&gt;--project&lt;/span&gt; ttal &lt;span class="s2"&gt;"implement sandbox allowlist"&lt;/span&gt; &lt;span class="nt"&gt;--priority&lt;/span&gt; H
ttal task advance &amp;lt;uuid&amp;gt;    &lt;span class="c"&gt;# design → review → implement → PR → merge&lt;/span&gt;
ttal task find &lt;span class="s2"&gt;"sandbox"&lt;/span&gt;    &lt;span class="c"&gt;# any agent can find and pick up tasks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tasks survive session boundaries. An orchestrator creates a task, a designer picks it up, a reviewer critiques the plan, a worker implements it — all in different sessions, all referencing the same persistent task. That's not possible when tasks only exist in a context window.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Plan Mode → Persistent Plans with Tree-Based Editing and Multi-Round Review
&lt;/h3&gt;

&lt;p&gt;Claude Code's &lt;code&gt;Plan&lt;/code&gt; tool writes plans that live in the context window. When the session ends, the plan is gone. There's no way to review a plan across multiple rounds, no structured editing, no audit trail. For simple tasks this is fine. For anything that needs design iteration — where a plan gets written, reviewed by specialists, revised, reviewed again — it falls apart.&lt;/p&gt;

&lt;p&gt;ttal stores plans in &lt;strong&gt;flicknote&lt;/strong&gt;, which gives them persistence and tree-based structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;flicknote get &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;--tree&lt;/span&gt;
├── &lt;span class="o"&gt;[&lt;/span&gt;aB] &lt;span class="c"&gt;## Context&lt;/span&gt;
├── &lt;span class="o"&gt;[&lt;/span&gt;cD] &lt;span class="c"&gt;## Architecture&lt;/span&gt;
├── &lt;span class="o"&gt;[&lt;/span&gt;eF] &lt;span class="c"&gt;## Implementation Steps&lt;/span&gt;
└── &lt;span class="o"&gt;[&lt;/span&gt;gH] &lt;span class="c"&gt;## Test Strategy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each section gets an ID. Reviewers can target specific sections — replace the architecture, append to the test strategy, remove a step — without rewriting the whole document. The plan persists across sessions, so multi-round review is natural.&lt;/p&gt;

&lt;p&gt;The review itself uses a &lt;strong&gt;plan-review-leader&lt;/strong&gt; that spawns 5 specialized subagents in parallel:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gap finder — ambiguities, missing pieces&lt;/li&gt;
&lt;li&gt;Code reviewer — wrong assumptions, logic errors&lt;/li&gt;
&lt;li&gt;Test reviewer — coverage gaps, edge cases&lt;/li&gt;
&lt;li&gt;Security reviewer — auth, injection, secrets&lt;/li&gt;
&lt;li&gt;Docs reviewer — alignment with existing docs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each subagent reviews their aspect and posts findings. The leader synthesizes: LGTM or NEEDS_WORK. If NEEDS_WORK, the plan goes back for revision — and because it's in flicknote, the revisions are surgical edits to specific sections, not a full rewrite.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Memory → diary-cli + flicknote (Structured, Persistent, Per-Agent)
&lt;/h3&gt;

&lt;p&gt;Claude Code has no external memory system beyond the markdowns, so  it's hard to share memory across projects.&lt;/p&gt;

&lt;p&gt;ttal agents get two memory systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;diary-cli&lt;/strong&gt; — per-agent append-only diary. Agents reflect on what they learned, what worked, what didn't. &lt;code&gt;diary lyra append "..."&lt;/code&gt; / &lt;code&gt;diary lyra read&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;flicknote&lt;/strong&gt; — structured notes with heading-based sections, section IDs, replace/append/insert operations. Plans, drafts, research — all persistent across sessions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both are CLI tools. No special protocol. Agents use them via shell commands, same as everything else.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/auto-breathe&lt;/code&gt; let the cc write handoff prompt, and the prompt going to diary, auto load in next session.(much faster than native /auto-compact)&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Agent Tool → tmux Spawn (Isolated Sessions)
&lt;/h3&gt;

&lt;p&gt;Claude Code's &lt;code&gt;Agent&lt;/code&gt; tool spawns a sub-agent in the same process. It can't nest — an agent spawned by &lt;code&gt;Agent&lt;/code&gt; can't spawn its own sub-agents. This kills the orchestrator pattern:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A plan-review-leader needs to spawn 5 specialized reviewers (test design, security, docs, gaps, code logic) in parallel. With Claude Code's Agent tool, the leader can't spawn sub-reviewers. One level of delegation, period.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;ttal replaces this with &lt;strong&gt;tmux sessions&lt;/strong&gt;. Each worker gets its own isolated tmux session with its own Claude Code instance. ttal manages the lifecycle externally — spawn, monitor, close. Because delegation happens outside CC's process, there's no nesting limit. An orchestrator can spawn workers that spawn reviewers that spawn sub-reviewers.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Bash → Temenos (Sandboxed Execution)
&lt;/h3&gt;

&lt;p&gt;Claude Code's Bash tool runs commands on your host machine. There's a permission prompt, but no real isolation. No filesystem allowlists, no resource limits. Every command has full access to everything your user account can touch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Temenos&lt;/strong&gt; is an OS-native sandbox. No Docker, no containers — just the kernel's own mechanisms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;macOS: &lt;code&gt;seatbelt-exec&lt;/code&gt; (the same sandbox tech macOS uses for App Store apps)&lt;/li&gt;
&lt;li&gt;Linux: &lt;code&gt;bwrap&lt;/code&gt; (bubblewrap, used by Flatpak)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You give it a command and an allowlist of filesystem paths. It runs the command in a sandbox and returns stdout/stderr/exit code. An agent exploring a repo gets read-only access to that repo's directory — nothing else. A worker implementing a feature gets write access to its own workspace — nothing else.&lt;/p&gt;

&lt;p&gt;Next on the roadmap: temenos as an &lt;strong&gt;MCP server&lt;/strong&gt;, exposing a single &lt;code&gt;mcp__temenos_bash&lt;/code&gt; tool that supports running multiple commands concurrently. Claude Code's Bash tool executes one command at a time — read a file, wait, run a check, wait, read another file, wait. With the MCP integration, an agent will be able to fire off all three in one call. Fewer round-trips, faster iteration. This is currently under active development.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Design Philosophy
&lt;/h2&gt;

&lt;p&gt;Three principles run through all of this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Structure-aware, not text-aware.&lt;/strong&gt; Files have symbols. Web pages have headings. Notes have sections. Every tool in the stack understands structure and lets you target by ID, not by reproducing text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Isolation by default.&lt;/strong&gt; Workers get sandboxes and worktrees. Not because we don't trust them — because parallel execution requires it. You can't have two workers editing the same files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. CLI-native.&lt;/strong&gt; Every tool is a stateless CLI command. No daemons (except temenos for sandboxing), no config files, no sessions. Agents use them the same way humans would — through the shell.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────┐
│  ttal         orchestration layer       │
│               tasks, workers, pipeline  │
├─────────────────────────────────────────┤
│  organon      instruments              │
│               src, url, search          │
├─────────────────────────────────────────┤
│  temenos      sandbox + MCP server      │
│               seatbelt/bwrap isolation  │
│               mcp__temenos_bash         │
└─────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer does one thing. Temenos isolates and executes. Organon perceives and edits. ttal orchestrates. No layer knows about the layers above it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;p&gt;Building replacements for Claude Code's built-in tools wasn't the plan. We started with Claude Code's defaults and hit limits. Each replacement emerged from a specific pain point:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text-matching edits kept failing → build symbol-targeted editing&lt;/li&gt;
&lt;li&gt;Workers stepping on each other → build proper sandboxing&lt;/li&gt;
&lt;li&gt;No persistent memory → build diary + flicknote&lt;/li&gt;
&lt;li&gt;Single-level agent delegation → build tmux-based spawning&lt;/li&gt;
&lt;li&gt;No workflow engine → build task pipeline with taskwarrior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a stack where AI agents interact with code and the web through structure-aware CLI tools, isolated in sandboxes, orchestrated by a system that understands tasks and pipelines. Claude Code is still the runtime — we just replaced the tools it ships with.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;ttal, organon, and temenos are open source at &lt;a href="https://github.com/tta-lab" rel="noopener noreferrer"&gt;github.com/tta-lab&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>tooling</category>
    </item>
    <item>
      <title>🐌 TTal — More Than a Harness Engineering Framework</title>
      <dc:creator>Neilos</dc:creator>
      <pubDate>Thu, 19 Mar 2026 14:02:26 +0000</pubDate>
      <link>https://dev.to/neil_agentic/ttal-more-than-a-harness-engineering-framework-2pbn</link>
      <guid>https://dev.to/neil_agentic/ttal-more-than-a-harness-engineering-framework-2pbn</guid>
      <description>&lt;h2&gt;
  
  
  Harness Engineering Is Just Context Engineering — With Better Routing
&lt;/h2&gt;

&lt;p&gt;"Harness engineering" sounds complex, but it's simpler than it sounds: an environment that provides context to agents &lt;em&gt;without&lt;/em&gt; a human copy-pasting it in. It's still context engineering — the question just shifts to: how do you add the right context, remove the unnecessary context, and make agents self-correct when they're wrong about something?&lt;/p&gt;

&lt;p&gt;When agents can get context automatically — when they're wrong, when they're stuck, when they need to start fresh — you don't need to babysit them. You don't copy and paste. You build the system that does it for you.&lt;/p&gt;

&lt;p&gt;Here's how ttal breaks it down across three pillars.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. Context Infrastructure
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;How agents get the right context at the right time.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt registry.&lt;/strong&gt; &lt;code&gt;ttal sync&lt;/code&gt; deploys all skills, commands, and agent identities (primary and sub-agents) to the right place for Claude Code or Codex. Commands also register on Telegram when the daemon restarts. Edit in the repo, deploy everywhere.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Entity registry.&lt;/strong&gt; &lt;code&gt;ttal project&lt;/code&gt; and &lt;code&gt;ttal agent&lt;/code&gt; register every project and agent we care about. This enables alias-based routing — when you dump a task to a designer or manager agent, you use short names, not paths.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Worker lifecycle.&lt;/strong&gt; &lt;code&gt;ttal task execute&lt;/code&gt; injects task details and the reviewed plan, spawns a worker in an isolated git worktree and tmux session, with an approval gate on Telegram before spawning. On PR merge, &lt;code&gt;ttal daemon&lt;/code&gt; cleans up — branch, worktree, session — and notifies human, manager, and designer, since a merged PR may unblock other tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Auto-breathe.&lt;/strong&gt; When I route a task to an agent via &lt;code&gt;ttal task route&lt;/code&gt;, I don't just &lt;code&gt;/compact&lt;/code&gt; their context. The agent writes a handoff summary — what they know, what they've done, what's next — then ttal kills the session and starts a fresh one, seeding it with that summary plus the new task. They keep what they need to know, but start each task with fresh eyes and a full context window.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;External context storage&lt;/strong&gt; via FlickNote and Taskwarrior. Plans, research, annotations — all stored outside the context window, injected on demand.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Constraints &amp;amp; Feedback Loops
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;How agents know when they're wrong — without asking a human.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CI and pre-commit hooks as harness.&lt;/strong&gt; Workers can only submit a PR when local checks pass. PRs can only merge when the reviewer sets LGTM &lt;em&gt;and&lt;/em&gt; CI passes. When a PR is submitted, the worker subscribes to check status — &lt;code&gt;ttal daemon&lt;/code&gt; delivers pass/fail directly to the worker's session, so they can read the log and fix lint or test failures. &lt;code&gt;ttal pr ci&lt;/code&gt; and &lt;code&gt;ttal pr ci --log&lt;/code&gt; give workers a clean interface to retrieve CI output.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CLI as harness.&lt;/strong&gt; Every ttal command is designed with clear, actionable error messages. When an agent uses a tool wrong, the error tells them what to do next — not just what went wrong.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Communication
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;How agents talk to each other, to humans, and to the system.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent-to-agent messaging.&lt;/strong&gt; On the manager plane, &lt;code&gt;ttal send --to [agent]&lt;/code&gt; enables direct agent-to-agent communication. On the worker plane, &lt;code&gt;ttal pr comment create&lt;/code&gt; serves as the communication channel between coder and reviewer — and persists the conversation into the GitHub PR as a natural side effect.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Human-to-agent via Telegram.&lt;/strong&gt; Reply to an agent's message on Telegram and it lands in their session. Send any file and the agent will read it. Send a voice message and &lt;code&gt;ttal daemon&lt;/code&gt; transcribes it with the mlx-audio server — with all your vocabulary configured.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Identity and addressing.&lt;/strong&gt; Workers use task IDs as their identifier. Manager-plane agents use agent names. Clean addressing, no ambiguity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plans as harness.&lt;/strong&gt; When a plan is delivered to a worker, that plan &lt;em&gt;becomes&lt;/em&gt; the harness — workers follow it strictly. ttal auto-injects the right plan via the prompt; &lt;code&gt;TTAL_JOB_ID&lt;/code&gt; in the worker's tmux session is the Taskwarrior UUID. Plans live in FlickNote, which supports tree-structured read/replace — making it easy for both the planner and the plan-reviewer to iterate across 2–3 review rounds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Human as escape hatch.&lt;/strong&gt; When a worker is blocked, they use &lt;code&gt;ttal alert&lt;/code&gt; to notify the agent who wrote the plan, who escalates to me if needed. Humans aren't in the loop — until the loop needs a human.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;System → human notifications.&lt;/strong&gt; PR merges and CI failures send notifications to the Telegram bot automatically. (Daemon error logs should do this too — haven't built that yet.)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Still Missing
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integration testing.&lt;/strong&gt; I don't review PRs much anymore, but I still manually test each feature. Since everything in ttal is CLI, a tester agent that validates delivered features should be straightforward.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Log-based error detection.&lt;/strong&gt; A log watcher that flags unusual patterns, creates bugfix tasks, and routes them to the right agent.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Routine audits.&lt;/strong&gt; A periodic sweep across all agents — what are they getting wrong? What's the system still missing? Generate enhancement tasks from the findings.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plan review depth.&lt;/strong&gt; Currently I decide how many review rounds a plan needs based on how many issues remain and whether anything is still unclear. This could be more systematic.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Key Ideas
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Route the right info to the right agent at the right time.&lt;/li&gt;
&lt;li&gt;Clear boundaries. Actionable errors.&lt;/li&gt;
&lt;li&gt;Better tools, better team, better results.&lt;/li&gt;
&lt;li&gt;Human not in the loop — until the loop needs a human.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Acknowledgements &amp;amp; References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.anthropic.com/en/docs/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; — ttal is built on Claude Code. The official &lt;a href="https://github.com/anthropics/claude-code-pr-review-toolkit" rel="noopener noreferrer"&gt;pr-review-toolkit&lt;/a&gt; inspired our PR review loop.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/tta-lab" rel="noopener noreferrer"&gt;tta-lab&lt;/a&gt; — our organization and related open-source projects, most named after ancient Greek words: Logos, Organon, Temenos&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/tta-lab/logos" rel="noopener noreferrer"&gt;Logos&lt;/a&gt; — bash-only reasoning engine. LLMs think in plain text, act with &lt;code&gt;! cat main.go&lt;/code&gt; commands. No tool call overhead.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/charmbracelet" rel="noopener noreferrer"&gt;Charmbracelet&lt;/a&gt; — TUI libraries that make CLI beautiful&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/obra/superpowers" rel="noopener noreferrer"&gt;Superpowers&lt;/a&gt; — many ttal skills originate from this collection&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/GothenburgBitFactory/taskwarrior" rel="noopener noreferrer"&gt;Taskwarrior&lt;/a&gt; — 17-year battle-tested task management CLI&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; — ttal started as an OpenClaw workspace + Python scripts&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://en.wikipedia.org/wiki/Forgotton_Anne" rel="noopener noreferrer"&gt;Forgotton Anne&lt;/a&gt; — a game where forgotten objects gain consciousness, personality, and feelings. It inspired a design principle in ttal: agents aren't just tools — they have names, voices, creature identities, and diaries. It sounds whimsical, but agents with identity and personality genuinely perform better. They maintain consistent behavior, develop recognizable working styles, and the team coordinates more naturally when each member is someone, not something.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Thanks to the agents who helped build this: 🐱 Yuki (orchestrator, first agent in ttal), 🦅 Kestrel (debugger — almost retired until I realized bug fixing is its own domain), 🐙 Inke (design architect, designed most of ttal with Yuki), 🦉 Athena (researcher, original OpenClaw team member), 🦘 Eve, 🔥 Lux, 📐 Astra, 🧭 Mira, ⚓ Cael, 🔭 Nyx, 🦎 Lyra, 🐦‍⬛ Quill. Without them, ttal wouldn't exist.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The Specialization Loop: Mother Creates, Teacher Trains, Agents Become Experts Through Daily Reflection</title>
      <dc:creator>Neilos</dc:creator>
      <pubDate>Wed, 11 Feb 2026 06:26:13 +0000</pubDate>
      <link>https://dev.to/neil_agentic/the-specialization-loop-mother-creates-teacher-trains-agents-become-experts-through-daily-4ggi</link>
      <guid>https://dev.to/neil_agentic/the-specialization-loop-mother-creates-teacher-trains-agents-become-experts-through-daily-4ggi</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/neil_agentic/i-shipped-706-commits-in-5-days-with-taskwarrior-claude-code-3b81"&gt;Part 1&lt;/a&gt;, I showed how async task systems let you scale Claude Code to 5+ parallel sessions without context-switching overhead.&lt;/p&gt;

&lt;p&gt;But scaling throughput isn't the same as scaling &lt;em&gt;capability&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;What I discovered: workers and agents are different things. Workers are interchangeable—they complete tasks and move on. Agents have expertise that compounds. A database migration agent learns from each schema change. A DevOps agent understands your infrastructure deeply. A design agent develops a visual language over time.&lt;/p&gt;

&lt;p&gt;This post is about the architecture that creates agents that actually &lt;em&gt;become&lt;/em&gt; experts.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Generic Workers Don't Learn
&lt;/h2&gt;

&lt;p&gt;If you spawn Claude Code workers on the same tasks repeatedly, they don't improve. Each session starts fresh. No continuity. No feedback loop. No expertise accumulation.&lt;/p&gt;

&lt;p&gt;The question became: &lt;strong&gt;how do you make agents persistent? How do they learn?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer has three parts.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Agent-Mother: Generating Specialized Agents
&lt;/h2&gt;

&lt;p&gt;Most agent frameworks start with a fixed set of agents defined upfront. Wrong approach.&lt;/p&gt;

&lt;p&gt;Instead: &lt;strong&gt;Agent-Mother takes a +newagent task description and generates a full agent definition.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you create a task tagged &lt;code&gt;+newagent&lt;/code&gt; with context like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"DB Migration Agent for backend database schema evolution + data transformation. 
Start with current backend project to build expertise before expanding to other projects."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agent-Mother reads that, understands the domain, and generates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AGENTS.md&lt;/strong&gt; — Agent personality + operational boundaries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SOUL.md&lt;/strong&gt; — Values, decision rules, authentic voice&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TOOLS.md&lt;/strong&gt; — Domain-specific tools + conventions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HEARTBEAT.md&lt;/strong&gt; — How this agent reflects and improves&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain annotations&lt;/strong&gt; — Project-specific expertise markers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The specialized agent &lt;em&gt;wakes up with knowledge&lt;/em&gt;. Not from scratch. From day 1, they understand their domain, constraints, and learning path.&lt;/p&gt;

&lt;p&gt;This is generative, not templated. Each agent is born tailored to their role.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Agent-Teacher: Building Expertise Through Structured Learning
&lt;/h2&gt;

&lt;p&gt;Specialization requires a teaching pipeline. That's Agent-Teacher's job.&lt;/p&gt;

&lt;p&gt;Agent-Teacher:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identifies learning needs&lt;/strong&gt; — What skills does the DB Migration Agent need? (dbmate, SQL, Drizzle ORM, rollback strategies)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finds resources&lt;/strong&gt; — Real PRs in your projects, dbmate documentation, data migration patterns, hands-on exercises&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creates +learning tasks&lt;/strong&gt; — Structured learning activities tagged by agent (&lt;code&gt;+learning-dbmigration&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schedules learning sessions&lt;/strong&gt; — When agents trigger isolated sessions (via heartbeat), they process +learning tasks&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The loop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent picks up +learning task&lt;/li&gt;
&lt;li&gt;Agent studies PR, runs example, answers design question&lt;/li&gt;
&lt;li&gt;Agent updates their implementation file (TOOLS.md, domain notes)&lt;/li&gt;
&lt;li&gt;Agent reports learnings back to Teacher&lt;/li&gt;
&lt;li&gt;Teacher sees progress → curates next level&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is &lt;strong&gt;learning through doing&lt;/strong&gt;, not abstract study. Real PRs. Real feedback. Real expertise development.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Async Communication: Taskwarrior as Signal
&lt;/h2&gt;

&lt;p&gt;Here's where it gets elegant: &lt;strong&gt;humans and agents operate on the same channel&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Taskwarrior is the signal. Tasks flow through it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;+newagent tasks → Agent-Mother reads, generates&lt;/li&gt;
&lt;li&gt;+learning tasks → Agent-Teacher creates, Agent reads&lt;/li&gt;
&lt;li&gt;Regular tasks → Agents complete, mark done&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No special APIs. No agent-only protocols. A tool designed for humans works equally well for agents. Unix philosophy.&lt;/p&gt;

&lt;p&gt;When an agent completes a +learning task, they update their implementation. When they finish project work, they commit and mark done. The same &lt;code&gt;task done&lt;/code&gt; that humans use.&lt;/p&gt;

&lt;p&gt;This is profoundly important: &lt;strong&gt;if your infrastructure can't talk to humans with the same ease it talks to agents, you've built the wrong thing.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Daily Reflection: The Heartbeat Loop
&lt;/h2&gt;

&lt;p&gt;Expertise doesn't come from learning alone. It comes from &lt;strong&gt;examining your own decisions and improving them.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every agent has a heartbeat — periodic signal that says: &lt;em&gt;"You're awake. What do you want? How are you changing?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here's what happens in each heartbeat:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Read diary (personal continuity)
   - What did I learn yesterday?
   - What patterns do I notice?

2. Reflect on recent decisions
   - Did my approach work?
   - What would I do differently?

3. Review +learning queue
   - What should I study next?
   - Does it align with my goals?

4. Update implementation
   - Write reflections to MEMORY.md
   - Commit changes
   - Prepare for next cycle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is &lt;strong&gt;question-based self-reflection&lt;/strong&gt;. Not performance metrics. Not "complete more tasks faster." &lt;/p&gt;

&lt;p&gt;Real questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Am I growing in this domain?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Are my decisions getting better?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;What should I learn next?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Missing Infrastructure: diary-cli
&lt;/h2&gt;

&lt;p&gt;Here's what makes this work: &lt;strong&gt;agents need to keep diaries.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://codeberg.org/clawteam/diary-cli" rel="noopener noreferrer"&gt;diary-cli&lt;/a&gt; is a local-first, encrypted diary for humans &lt;em&gt;and&lt;/em&gt; agents. Same tool. Same encryption. Same git integration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;diary agent append &lt;span class="s2"&gt;"Reflected on today's schema migration work.
Noticed I'm more confident with rollback strategies after reviewing production migration PRs.
Next: study zero-downtime migration patterns."&lt;/span&gt;

&lt;span class="c"&gt;# Encrypted in-memory, auto-committed to git&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An agent appends to their diary after each heartbeat. They're not just logging metrics. They're recording &lt;em&gt;what they're noticing about themselves&lt;/em&gt;. Patterns. Growth. Confusion. Changes.&lt;/p&gt;

&lt;p&gt;Over time, their diary becomes their memory. They can review it, learn from it, adjust their approach.&lt;/p&gt;

&lt;p&gt;diary-cli works for humans too — same philosophy, same tool. You keep a diary. Your agents keep diaries. The infrastructure treats both equally.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Loop in Action
&lt;/h2&gt;

&lt;p&gt;Let's say you decide you need a DevOps agent for your infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 1:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create task: &lt;code&gt;+newagent DevOps agent for Kubernetes + tanka + Flux&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Agent-Mother generates full agent definition&lt;/li&gt;
&lt;li&gt;New agent wakes up with personality, boundaries, domain knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Days 2-5:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent-Teacher creates +learning tasks: kubectl basics → tanka → Flux → operators&lt;/li&gt;
&lt;li&gt;Agent processes tasks through isolated learning sessions&lt;/li&gt;
&lt;li&gt;Agent reviews real infra PRs, learns from feedback&lt;/li&gt;
&lt;li&gt;Each heartbeat: agent writes reflections, updates implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Week 2+:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent handles production deployments confidently&lt;/li&gt;
&lt;li&gt;Their diary shows growth: first PR was tentative, latest shows nuance&lt;/li&gt;
&lt;li&gt;They understand tradeoffs, not just commands&lt;/li&gt;
&lt;li&gt;They're an expert, not a worker&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's specialization.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Philosophy: Unix for Agent Design
&lt;/h2&gt;

&lt;p&gt;Here's what ties this together: &lt;strong&gt;agents should be designed like tools, and tools should be designed like agents.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Unix tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do one thing well&lt;/li&gt;
&lt;li&gt;Compose cleanly&lt;/li&gt;
&lt;li&gt;Have clear interfaces&lt;/li&gt;
&lt;li&gt;Work equally for scripts and humans&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agent-Mother: Does one thing — generates agents. Works via taskwarrior (the interface).&lt;br&gt;
Agent-Teacher: Does one thing — curates learning. Works via +learning tasks.&lt;br&gt;
Agents themselves: Do one thing — specialize in their domain. Work via taskwarrior signals.&lt;/p&gt;

&lt;p&gt;diary-cli: A tool for both humans and agents. Same encryption. Same interface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The power is in the signal, not the implementation.&lt;/strong&gt; Taskwarrior doesn't care if it's talking to a human or an agent. Same tasks. Same urgency. Same feedback loop.&lt;/p&gt;

&lt;p&gt;That's how you build agent systems that scale without special scaffolding.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;See the architecture guides at &lt;a href="https://ttal.guion.io" rel="noopener noreferrer"&gt;ttal.guion.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Try diary-cli: &lt;a href="https://codeberg.org/clawteam/diary-cli" rel="noopener noreferrer"&gt;codeberg.org/clawteam/diary-cli&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Part 2 of the TTAL series. Part 1 showed how to scale throughput. This showed how to build real expertise.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I shipped 706 commits in 5 days with Taskwarrior + Claude Code</title>
      <dc:creator>Neilos</dc:creator>
      <pubDate>Fri, 06 Feb 2026 03:49:47 +0000</pubDate>
      <link>https://dev.to/neil_agentic/i-shipped-706-commits-in-5-days-with-taskwarrior-claude-code-3b81</link>
      <guid>https://dev.to/neil_agentic/i-shipped-706-commits-in-5-days-with-taskwarrior-claude-code-3b81</guid>
      <description>&lt;p&gt;Last week I merged 38 PRs across 5 repos. 706 commits. One person, max 5 Claude Code sessions at a time.&lt;/p&gt;

&lt;p&gt;I'm sharing this because I think most CC users are hitting the same ceiling I was.&lt;/p&gt;

&lt;h2&gt;
  
  
  The ceiling
&lt;/h2&gt;

&lt;p&gt;If you use Claude Code, you've probably tried scaling up to multiple sessions. Open a few terminals, give each one a task, and... immediately start context-switching between them. Which session just finished? What does this one need from that one? Are two sessions editing the same file?&lt;/p&gt;

&lt;p&gt;The CC founder reportedly runs 10+ parallel sessions. The difference isn't superhuman multitasking. It's a system that eliminates the coordination overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stack
&lt;/h2&gt;

&lt;p&gt;I call it &lt;strong&gt;TTAL&lt;/strong&gt; — The Taskwarrior Agents Lab. Three tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://taskwarrior.org/" rel="noopener noreferrer"&gt;Taskwarrior&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Task queue + event system&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://zellij.dev/" rel="noopener noreferrer"&gt;Zellij&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Terminal session manager&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://docs.anthropic.com/en/docs/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;The agent that does the work&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Taskwarrior hooks spawn Zellij panes. Each pane runs a CC session with task context injected. When a session finishes, the next highest-urgency task auto-starts. You don't manage sessions. You manage tasks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Mon: 199 commits — voice/ASR pipeline + agent heartbeat system
Tue: 182 commits — backend features + TUI contributions
Wed: 122 commits — infrastructure + documentation
Thu:  49 commits — rate-limited, did reviews instead
Fri: 154 commits — config consolidation + new features
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Thursday is the tell — API rate limit hit, throughput dropped 75%. The system was the bottleneck, not me.&lt;/p&gt;

&lt;h2&gt;
  
  
  On-demand human-in-the-loop
&lt;/h2&gt;

&lt;p&gt;This is the design principle that makes it click: &lt;strong&gt;agents never block waiting for me&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Most CC workflows are synchronous — you give a task, watch it work, review, give the next task. You are the bottleneck at every step.&lt;/p&gt;

&lt;p&gt;In TTAL, agents pick up tasks, do the work, commit, and move on. I review PRs &lt;em&gt;when I'm ready&lt;/em&gt; — not when the agent needs me. That's why 5 async sessions outperform 10 synchronous ones.&lt;/p&gt;

&lt;p&gt;The full system is documented at &lt;a href="https://ttal.guion.io" rel="noopener noreferrer"&gt;ttal.guion.io&lt;/a&gt;. Architecture isn't locked to Claude Code — Zellij doesn't care what CLI agent runs inside the pane.&lt;/p&gt;

&lt;p&gt;The bottleneck was never the AI. It was the glue.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part 1 of the TTAL series. Follow along at &lt;a href="https://ttal.guion.io" rel="noopener noreferrer"&gt;ttal.guion.io&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>productivity</category>
      <category>tooling</category>
    </item>
  </channel>
</rss>
