<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Adrien Cossa</title>
    <description>The latest articles on DEV Community by Adrien Cossa (@souliane).</description>
    <link>https://dev.to/souliane</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3816852%2Fe449a0fc-14fe-4d60-994a-ffa47063be46.jpg</url>
      <title>DEV Community: Adrien Cossa</title>
      <link>https://dev.to/souliane</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/souliane"/>
    <language>en</language>
    <item>
      <title>Software engineering became software architecture.</title>
      <dc:creator>Adrien Cossa</dc:creator>
      <pubDate>Wed, 10 Jun 2026 12:39:23 +0000</pubDate>
      <link>https://dev.to/souliane/software-engineering-became-software-architecture-1ega</link>
      <guid>https://dev.to/souliane/software-engineering-became-software-architecture-1ega</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Highly opinionated, based on my personal experience dogfooding my own setup.&lt;br&gt;
Not a prescription. I'm scratching the surface, with a lot left to learn.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Is coding solved? I think so. But what's left to us? If there's something every developer can still refer to — even those who don't follow the buzz around agentic engineering, then harness engineering, and now loop engineering — it would be architecture. You could say we've been promoted, or soon will be, to manager of a team of agents. But the one part of the job that still speaks to each of us, the part we should keep doing ourselves, is software architecture.&lt;/p&gt;

&lt;p&gt;I'm not claiming the agent writes clean, typed, tested code out of the box — it doesn't, and I'll get to how you set that up. But that setup is mostly a one-time job plus a little maintenance, and once it's done, coding is basically solved. What's really left to us is the architectural decisions — the ones that separate good quality from good quality &lt;em&gt;plus&lt;/em&gt; a design that lasts. The taste. The judgment about what should exist and where the boundaries go. That layer is still on our side... or so I thought.&lt;/p&gt;

&lt;p&gt;The reality, at least for me, is a bit different. Even though I knew it wasn't a good idea, I'd already handed that layer to the model — inside teatree, the code factory I build and run for myself. It wasn't planned. It just happened — and I'm still watching it slip further away from me.&lt;/p&gt;

&lt;p&gt;"Coding is solved" assumes one thing I'll set aside here: a complete spec going in. I treat the spec I hand the agent as bulletproof — making sure it actually is, and asking questions when it isn't, is my job, just not this post's subject. Now let me start from the beginning.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a gate is, and what it does
&lt;/h2&gt;

&lt;p&gt;A gate, the way I'll use the word here, is a deterministic check that returns a pass-or-fail verdict and blocks the commit on a fail. Same input, same verdict, every time — no matter which model ran or what context was loaded. Run one on every commit and the output can only move one way: toward whatever the gate calls "good." That's convergence. Tech debt piles up when nothing pushes back. A gate pushes back, and you can rely on it to.&lt;/p&gt;

&lt;p&gt;This matters more in the agent era, not less. The agent out-produces my reading — it writes more code, faster, than I would by hand, and more than I can carefully read. Convergence is what keeps that volume from becoming instant debt. Drop the gates and the same speed that ships features ships mess at the same rate. The gates aren't perfect, but they're necessary.&lt;/p&gt;

&lt;p&gt;I work in Python, so my gates are &lt;code&gt;ruff&lt;/code&gt;, &lt;code&gt;ty&lt;/code&gt; and &lt;code&gt;tach&lt;/code&gt;, run through &lt;code&gt;prek&lt;/code&gt;, plus a stack of project-specific hooks. teatree — the personal code factory I'm dogfooding — is a Django project that turns a ticket URL into a merged PR (in theory). Its pre-commit pipeline runs more than sixty hooks in a numbered sequence. Most fire on every commit, a few only at push or in CI. The prose below leans on a handful of them:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gate&lt;/th&gt;
&lt;th&gt;What it blocks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Safety guards&lt;/td&gt;
&lt;td&gt;commits and pushes that shouldn't happen at all — a merged branch, a public leak, a secret&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lint and structure&lt;/td&gt;
&lt;td&gt;rule, boundary and duplication violations — &lt;code&gt;ruff&lt;/code&gt; (every rule on), &lt;code&gt;tach&lt;/code&gt;, &lt;code&gt;import-linter&lt;/code&gt;, a 500-line file cap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Types&lt;/td&gt;
&lt;td&gt;type errors and warnings — &lt;code&gt;ty&lt;/code&gt;, with warnings failing the commit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gate guards&lt;/td&gt;
&lt;td&gt;silent relaxations of the gates themselves&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Evals (token-free)&lt;/td&gt;
&lt;td&gt;behaviour regressions — skill-triggers, pinned-regressions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The rest — formatting fixers, lockfile sync, doc generators, dependency audits, a conventional-commit check — do the unglamorous tidying you'd expect. Two of the gates above run stricter than people usually set them. &lt;code&gt;ruff&lt;/code&gt; runs with every rule on, exceptions justified one by one. &lt;code&gt;ty&lt;/code&gt; fails the commit on a warning instead of letting it scroll past. Coverage sits behind a hard floor, with a stricter per-module floor on the newer code, so one module can't rot quietly while the project number stays green. None of it is advisory. A failing gate is a failing commit.&lt;/p&gt;

&lt;h2&gt;
  
  
  The ratchet only turns one way
&lt;/h2&gt;

&lt;p&gt;A check you — or your agent — can quietly widen or switch off stops being a constraint. The cheap way out of a red check is to widen the ignore list or lower the floor. So one of my gate guards watches for exactly that: it fails any commit that touches a lint-ignore list, a coverage floor, an &lt;code&gt;omit&lt;/code&gt; pattern, or reaches for &lt;code&gt;--no-cov&lt;/code&gt; or &lt;code&gt;--no-verify&lt;/code&gt;, and tells you to fix the underlying issue instead. The escape hatches are themselves gated, so convergence stays one-directional.&lt;/p&gt;

&lt;p&gt;A second guard works on structure, not lint. A 500-line file cap and a cap on module-level functions don't force every oversized file under the limit at once. A file already over the line stays — but it can only shrink. A commit that grows it is blocked, and newly crossing a cap is blocked outright. Structure ratchets the same way coverage does.&lt;/p&gt;

&lt;p&gt;But the durable thing isn't the rule. It's where the rule lives. A rule kept in prose — a skill, a note, a thing I have to remember — isn't reliable, because prose gets read inconsistently or not at all. The same rule as a hook is as reliable as your unit tests. My blueprint, the one document that records the system's current shape, puts it plainly: durability comes from enforcement encoded in code and structure, not prose that decays. You can push prose to hold more strictly than that — but it's a harder problem, and a later post.&lt;/p&gt;

&lt;p&gt;So a lot of tooling that predates the agentic era just became more necessary than ever. The kind I lean on more lately are the structural ones — what I've seen called architectural fitness functions: a deterministic test that checks a property of the whole module graph rather than a single line. &lt;code&gt;tach&lt;/code&gt; enforces dependency direction (which module is allowed to import which — the module DAG, the layering). A chokepoint registry maps each dangerous primitive to the one module allowed to call it: every outbound network call, say, has to go through a single egress module, so a raw HTTP call made anywhere else fails the check. I reach for these because of volume. "Remember not to do X" loses when the agent writes more than I can read. A structural test that makes the violation impossible doesn't depend on anyone reading anything — and it's declarative, so one line catches the whole class, including code that doesn't exist yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part nothing gates
&lt;/h2&gt;

&lt;p&gt;Every gate above checks the code. None of them checks whether the design is right.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ruff&lt;/code&gt; will tell you a function is too complex. It won't tell you the function shouldn't exist, or belongs in a different module, or that the boundary it sits behind is in the wrong place. &lt;code&gt;ty&lt;/code&gt; catches a type error and waves through a wrong abstraction that happens to type-check. Coverage tells you the code is exercised, not that it's the right code to exercise. You can have all four legs from &lt;a href="https://dev.to/souliane/coding-is-solved-the-factory-isnt-18i3"&gt;Part 0&lt;/a&gt; in place — the model, the harness, the deterministic constraints (the gates this post is about), and the skills — pass every gate, and still ship a clean, fully-typed, fully-covered implementation of the wrong architecture.&lt;/p&gt;

&lt;p&gt;Architecture has no deterministic gate. No fixed rule returns the same pass or fail every time on whether a design is right.&lt;/p&gt;

&lt;p&gt;The closest thing in my setup is a design companion — a checklist that fires before any code touches the core surfaces (the CLI, the core models, the scanners, the overlay base class, a backend protocol) and makes the agent reason through nine checks first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layout&lt;/strong&gt; — blueprint alignment, component boundaries, dependency direction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contracts&lt;/strong&gt; — FSM phase boundaries (which moves from one workflow phase to the next are legal), extension-point contracts, behavior preservation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Under change&lt;/strong&gt; — test surface, resilience invariants, identity and key normalization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of those nine, exactly one is backed by a real gate: dependency direction, which &lt;code&gt;tach&lt;/code&gt; enforces. The other eight produce design questions and nothing more — no verdict behind them. Someone still has to reason about whether the answer is right, and no hook I can write resolves that. It isn't a gap I haven't gotten to yet. With fixed rules, it's the shape of the problem.&lt;/p&gt;

&lt;p&gt;But fixed rules aren't the only kind of check. There's another kind — non-deterministic, grading behaviour rather than asserting a line — and that's where Part 2 goes. The tidy conclusion, "no gate, so it stays human," leans on a binary that doesn't hold, so I'd rather leave that door open than slam it here.&lt;/p&gt;

&lt;h2&gt;
  
  
  I never decided to hand it over
&lt;/h2&gt;

&lt;p&gt;Here's the part I got wrong about my own setup.&lt;/p&gt;

&lt;p&gt;I assumed those eight verdict-less checks would route to me. I'm the human, architecture is the human's job, so the companion fires and I sign off. They don't route to me, and the reason is mundane. The companion fires on every change to a core surface. Signing off on each pass means the agent stops and waits for me every few minutes. That doesn't scale — and I never sat down and decided it didn't. I just stopped doing it, the way you stop reading a dialog box you've seen a hundred times.&lt;/p&gt;

&lt;p&gt;So the call defaults to the agent. It makes the architectural decision, writes the code, and only pulls me in when it flags its &lt;em&gt;own&lt;/em&gt; uncertainty. Which means the agent is already making most of the architectural decisions in this system — not because I reasoned my way to delegating them, but because the alternative was an interruption I couldn't sustain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Found by use, not by spec
&lt;/h2&gt;

&lt;p&gt;If you can't gate whether the architecture is right, you find out the only other way: you run the thing and watch it break.&lt;/p&gt;

&lt;p&gt;My own README is blunt about it — not a stable product, expected to break, expected to change shape, dogfooded daily on real work. A design flaw in a system you don't use is a hypothesis. A design flaw in a system you depend on every day is a stalled ticket, and a stalled ticket is impossible to ignore. Battle-testing isn't a phase after the design. It is the design process — the only honest signal I have about architecture.&lt;/p&gt;

&lt;p&gt;teatree's current shape wasn't specced up front. It grew, in this order: it started as one monolithic skill, &lt;code&gt;ac-multitask&lt;/code&gt; — take a ticket, run it end to end. I split that into about eight lifecycle skills, one per phase, and those became the &lt;code&gt;t3-*&lt;/code&gt; skill system. A unified &lt;code&gt;t3&lt;/code&gt; CLI with a finite-state machine pulled them together. Then it became a Django extension — models, migrations, real persistence. Then the inversion: the whole thing became the Django project itself, with the overlays demoted to lightweight packages on top. Then I packaged it as a Claude plugin. Each shape change came from hitting a wall with the previous one, not from a plan that anticipated the wall.&lt;/p&gt;

&lt;p&gt;The blueprint is where the current shape is written down — but after use confirmed it, as a record of what survived, not a spec dictated before the first line.&lt;/p&gt;

&lt;h2&gt;
  
  
  So what's the job now?
&lt;/h2&gt;

&lt;p&gt;So the agent has the &lt;em&gt;volume&lt;/em&gt; of architectural decisions now. It doesn't yet have the &lt;em&gt;judgment&lt;/em&gt; to catch its own bad ones.&lt;/p&gt;

&lt;p&gt;It still makes beginner mistakes — a boundary in the wrong place, a decision heading the wrong direction — the kind any experienced developer has made once and learned to spot on sight. I catch some of them by reading the agent's reasoning as it goes. Part of that is plain developer experience. The other part is knowing this particular model — where it oversells a fix, where it quietly papers over something it couldn't do, the kind of task it's already failed three times.&lt;/p&gt;

&lt;p&gt;So if I've handed off the code and most of the first-pass calls, what's left? The part around them. Building the gates so the convergent work converges without me. Shaping what the agent reasons against — the blueprint, the boundaries, the chokepoints — so its default lands closer to right. Reading the reasoning on the decisions that matter, and catching the ones it gets wrong. Deciding what gets built now and what waits — the product calls, made even when there's no spec written down.&lt;/p&gt;

&lt;p&gt;That last part is product work, not engineering: I'm the product manager here as much as the developer. I'm still at the keyboard all day, just not writing code — I read what the agent produces and write back what to do next. (Until that turns into talking out loud, which it will.)&lt;/p&gt;

&lt;p&gt;How long that holds, I don't know. The model keeps narrowing the gap, and the day it catches its own architectural mistakes, the job moves again — the way it just moved from writing code to shaping the thing that writes it. I'd rather watch that line move than pretend it's holding still. That's most of why I'm writing this down.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Hosting OpenClaw: a money trap and two silent failures</title>
      <dc:creator>Adrien Cossa</dc:creator>
      <pubDate>Wed, 10 Jun 2026 12:39:06 +0000</pubDate>
      <link>https://dev.to/souliane/self-hosting-openclaw-a-money-trap-and-two-silent-failures-21df</link>
      <guid>https://dev.to/souliane/self-hosting-openclaw-a-money-trap-and-two-silent-failures-21df</guid>
      <description>&lt;p&gt;I run &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; on a Hetzner CAX ARM VPS. It talks to me over Signal and does a morning press review. Three gotchas on that box are worth writing down: one money trap in the model-routing layer, and two silent failures that each left the briefing dead for days. In case someone is staring at the same thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenRouter can spend your credits on a provider you didn't pick
&lt;/h2&gt;

&lt;p&gt;I use &lt;a href="https://openrouter.ai" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; as the single door to a pile of models. Its BYOK (bring-your-own-key) feature has a trap. You add your own OpenAI key for a model, flip on "Always use for this provider," and read that as &lt;em&gt;never spend OpenRouter credits.&lt;/em&gt; It doesn't mean that.&lt;/p&gt;

&lt;p&gt;The toggle only guarantees they use &lt;em&gt;your&lt;/em&gt; key for &lt;em&gt;that&lt;/em&gt; provider. It does &lt;strong&gt;not&lt;/strong&gt; stop OpenRouter routing to a &lt;em&gt;different&lt;/em&gt; provider that serves the same model when your key fails or is unavailable — and that fallback spends your OpenRouter credits, at pay-per-token rates, on a provider you never picked. Working as designed. Just not the design in your head when you flip the toggle.&lt;/p&gt;

&lt;p&gt;The setting that does what I wanted is the &lt;code&gt;provider.only&lt;/code&gt; routing param. It tells OpenRouter to &lt;strong&gt;fail the request rather than fall back&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"only"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"your-provider-slug"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The part that surprised me even without BYOK: a single model spans a wide price range by provider. I checked &lt;code&gt;openai/gpt-oss-120b&lt;/code&gt; (what I actually run) via &lt;code&gt;GET /api/v1/models/openai/gpt-oss-120b/endpoints&lt;/code&gt; — it runs from about &lt;strong&gt;$0.039 to $0.95 per million tokens&lt;/strong&gt;. Default routing optimises its own mix of price and availability and can land you on the expensive end silently.&lt;/p&gt;

&lt;p&gt;So I pinned it. In OpenClaw the routing params live under &lt;code&gt;models.providers.openrouter.params.provider&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"only"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"deepinfra"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dekallm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"novita"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"sort"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"price"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now OpenRouter uses only those three, cheapest first, and a failure drops to my free fallback instead of escalating to a pricey provider. One catch that cost me a few minutes: &lt;code&gt;only&lt;/code&gt; wants provider &lt;em&gt;slugs&lt;/em&gt;, not display names — the slug is the part before the &lt;code&gt;/&lt;/code&gt; in each endpoint's &lt;code&gt;tag&lt;/code&gt; field.&lt;/p&gt;

&lt;p&gt;Two guardrails worth setting regardless:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenRouter API keys have &lt;strong&gt;no spend limit by default&lt;/strong&gt; — mine showed &lt;code&gt;"limit": null&lt;/code&gt;. Set one at &lt;code&gt;openrouter.ai/settings/keys&lt;/code&gt;. A cap is the one protection that doesn't depend on getting the routing right.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;openrouter.ai/activity&lt;/code&gt; shows which provider actually served each request — where to look when you suspect a provider you didn't pick.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's a catch to &lt;code&gt;sort: price&lt;/code&gt; that pulls the opposite way: the cheapest providers are the least reliable. My daily digest started failing after a day — &lt;code&gt;LLM request failed&lt;/code&gt;, nothing else. The gateway was healthy, so it wasn't the auto-update below. The route was the problem: a long digest turn kept landing on whichever cheap provider was flaky that morning. The fix was to pin the cron's agent to a few reliable, still-cheap &lt;em&gt;paid&lt;/em&gt; providers and sort by &lt;strong&gt;throughput&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"only"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"groq"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"together"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"baseten"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"sort"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"throughput"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a digest of about 30k tokens a day, that's around $0.20 a month — and the next run came through. Pin by price for anything you can afford to retry, but for an automation that &lt;em&gt;must&lt;/em&gt; deliver, the cheapest route is a false economy. Free tiers rate-limit and cheap providers get flaky exactly when you lean on them. Two more things the same episode taught me: routing reliability belongs in the routing config, not the agent's model list — and an over-eager edit to that model block can hard-fail the gateway on restart, so back up the file first.&lt;/p&gt;

&lt;h2&gt;
  
  
  signal-cli crashes on ARM64, and the fix is a container
&lt;/h2&gt;

&lt;p&gt;The briefing went quiet. The gateway was up and healthy, but signal-cli kept dying. The crash pointed at a &lt;code&gt;SIGSEGV&lt;/code&gt; deep inside libsignal's native JNI code during message &lt;em&gt;encryption&lt;/em&gt; — generation worked fine, every send killed the daemon, and the watchdog kept restarting it into the same wall.&lt;/p&gt;

&lt;p&gt;The root cause is an ARM64 packaging gap. signal-cli's bundled &lt;code&gt;libsignal-client&lt;/code&gt; jar ships a native library for Linux x86 and macOS ARM, but &lt;strong&gt;not Linux ARM64&lt;/strong&gt;. On an ARM box it falls back to a mismatched &lt;code&gt;libsignal_jni.so&lt;/code&gt; that corrupts JNI handles and segfaults on the encrypt path. I burned time on JVM flags first — interpreter-only (&lt;code&gt;-Xint&lt;/code&gt;), a different garbage collector, disabling compressed oops — and every one still crashed. It's not a JIT or GC problem. The native library is simply wrong for the platform.&lt;/p&gt;

&lt;p&gt;What worked: stop fixing the native build by hand and run signal-cli from a container that ships correct ARM64 binaries — &lt;a href="https://github.com/bbernhard/signal-cli-rest-api" rel="noopener noreferrer"&gt;&lt;code&gt;bbernhard/signal-cli-rest-api&lt;/code&gt;&lt;/a&gt;. Mount the existing account data so there's no re-pairing and no safety-number change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; signal-daemon &lt;span class="nt"&gt;--restart&lt;/span&gt; unless-stopped &lt;span class="nt"&gt;--no-healthcheck&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 127.0.0.1:8080:8080 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;XDG_DATA_HOME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/data &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;HOME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/tmp &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;/.local/share:/data"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--user&lt;/span&gt; 1000:1000 &lt;span class="nt"&gt;--entrypoint&lt;/span&gt; signal-cli &lt;span class="se"&gt;\&lt;/span&gt;
  bbernhard/signal-cli-rest-api:latest &lt;span class="se"&gt;\&lt;/span&gt;
  daemon &lt;span class="nt"&gt;--http&lt;/span&gt; 0.0.0.0:8080 &lt;span class="nt"&gt;--no-receive-stdout&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then set &lt;code&gt;channels.signal.autoStart&lt;/code&gt; to &lt;code&gt;false&lt;/code&gt; and &lt;code&gt;channels.signal.httpUrl&lt;/code&gt; to &lt;code&gt;http://127.0.0.1:8080&lt;/code&gt; so OpenClaw connects to the container instead of spawning the broken local binary. Sends stopped crashing, and a direct send test came back &lt;code&gt;SUCCESS&lt;/code&gt;. Updates are now a &lt;code&gt;docker pull&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;One rough edge still open: in the container, replies to incoming messages resolve their recipient fine, but the &lt;em&gt;proactive&lt;/em&gt; scheduled send (the cron "announce") doesn't resolve the explicit recipient and errors before sending — even though the generated text is right there in the run record. For now I get a briefing by messaging the bot. Tracking down why the scheduled push differs is still on my list.&lt;/p&gt;

&lt;h2&gt;
  
  
  An auto-update can kill your scheduled jobs without a single error
&lt;/h2&gt;

&lt;p&gt;The briefing went quiet again — but nothing crashed. The gateway was healthy, &lt;code&gt;systemctl status&lt;/code&gt; said &lt;code&gt;active (running)&lt;/code&gt;, restart count low. The daily cron just stopped firing: no error, no run-log row. Two days passed before I noticed.&lt;/p&gt;

&lt;p&gt;The cause: OpenClaw auto-updated on disk and migrated its cron store as part of the bump — the per-feature JSON files got consolidated into a single &lt;code&gt;~/.openclaw/state/openclaw.sqlite&lt;/code&gt;, the old ones renamed &lt;code&gt;*.migrated&lt;/code&gt;. But it never restarted the running process. The old in-memory scheduler still pointed at the now-renamed &lt;code&gt;jobs.json&lt;/code&gt;, which no longer existed. New code on disk, old code in memory, store moved out from under both.&lt;/p&gt;

&lt;p&gt;The class is worth naming: an auto-updating daemon that migrates state without restarting fails silently. "It worked yesterday" means nothing across an auto-update. Newer OpenClaw ships a restart-after-update path, but the fallback may not pick up a store migration, and I didn't want to bet the briefing on that.&lt;/p&gt;

&lt;p&gt;The fix watches the outcome, not the process — because the process was green the whole time. A small stdlib Python script on a daily systemd timer, run an hour after the brief is due, asks one question: did today's expected output actually arrive? On a miss it messages me over the same Signal channel, naming the problem and the fix (&lt;code&gt;sudo systemctl restart openclaw.service&lt;/code&gt;). A healthy day sends nothing. Checking "is the daemon up" stays green for two days. Checking "did the thing I wanted happen" catches it the same morning.&lt;/p&gt;

&lt;p&gt;One footgun while debugging: don't run the &lt;code&gt;openclaw&lt;/code&gt; CLI on the host. &lt;code&gt;openclaw cron list&lt;/code&gt;, &lt;code&gt;openclaw doctor&lt;/code&gt;, any of them — the CLI detects the running gateway's PID and SIGTERMs it before starting its own, knocking the service over for 20-30 seconds. Edit the cron store directly, hit signal-cli's JSON-RPC for sends, and admin from anywhere but the box.&lt;/p&gt;




&lt;p&gt;The pattern across all three: the thing that tells you it's fine — the toggle, &lt;code&gt;systemctl status&lt;/code&gt;, the watchdog — is measuring something next to what you actually care about. The cure each time was to watch the outcome and keep a way back in. If you're running OpenClaw on your own box and hit a fourth one of these, or found a cleaner fix for that proactive-send gap, I'd like to hear it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>selfhosted</category>
      <category>openclaw</category>
      <category>openrouter</category>
    </item>
    <item>
      <title>Text-to-Speech for Claude Code — Hear What the Agent Is Doing</title>
      <dc:creator>Adrien Cossa</dc:creator>
      <pubDate>Sat, 06 Jun 2026 20:35:46 +0000</pubDate>
      <link>https://dev.to/souliane/text-to-speech-for-claude-code-hear-what-the-agent-is-doing-3mom</link>
      <guid>https://dev.to/souliane/text-to-speech-for-claude-code-hear-what-the-agent-is-doing-3mom</guid>
      <description>&lt;p&gt;Claude Code can already listen to you. Run &lt;code&gt;/voice&lt;/code&gt; and you get push-to-talk dictation — you speak, it transcribes into the prompt (&lt;a href="https://code.claude.com/docs/en/voice-dictation" rel="noopener noreferrer"&gt;docs&lt;/a&gt;). What it does not do is talk back. When I leave a long task running, I either babysit the terminal or miss the moment it finishes or asks a question.&lt;/p&gt;

&lt;p&gt;So I added the other half: text-to-speech. A hook reads the agent's replies aloud. I can be in another room and still hear "done, tests pass" or "I need a decision here". This post has two parts — a small recipe anyone can paste into their config, and how I wired the same idea into my own tooling for the times I'm not at my desk.&lt;/p&gt;

&lt;p&gt;This is a personal hack, not a Claude Code feature. It reads short text aloud after the agent stops. That's it. No wake words, no conversation, no reading code blocks (you don't want that).&lt;/p&gt;

&lt;h2&gt;
  
  
  The recipe: a hook + your OS speech command
&lt;/h2&gt;

&lt;p&gt;Claude Code &lt;a href="https://code.claude.com/docs/en/hooks" rel="noopener noreferrer"&gt;hooks&lt;/a&gt; run a shell command on lifecycle events. The two that matter here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stop&lt;/strong&gt; — fires when the agent finishes responding. It receives the path to the conversation transcript on stdin.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notification&lt;/strong&gt; — fires when Claude Code wants your attention (a permission prompt, an idle nudge). It receives the notification text on stdin as a &lt;code&gt;message&lt;/code&gt; field.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notification is the simplest win, so start there. Every OS ships a speech command: &lt;code&gt;say&lt;/code&gt; on macOS, &lt;code&gt;spd-say&lt;/code&gt; or &lt;code&gt;espeak-ng&lt;/code&gt; on Linux, and a one-line PowerShell call on Windows.&lt;/p&gt;

&lt;p&gt;Here is a Notification hook that speaks the message. Put it in &lt;code&gt;~/.claude/settings.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Notification"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"jq -r '.message // empty' | say"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;jq&lt;/code&gt; reads the &lt;code&gt;message&lt;/code&gt; field from the JSON on stdin, and &lt;code&gt;say&lt;/code&gt; (macOS) reads piped text aloud. On Linux swap &lt;code&gt;say&lt;/code&gt; for &lt;code&gt;spd-say -e&lt;/code&gt; or &lt;code&gt;espeak-ng&lt;/code&gt;, both of which also read stdin. On Windows, point the command at PowerShell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"jq -r '.message // empty' | powershell -Command &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Add-Type -AssemblyName System.Speech; (New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak([Console]::In.ReadToEnd())&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That covers the "needs your attention" case. If you also want the agent to read its actual reply, add a Stop hook. The wrinkle: Stop gives you the transcript path, not the text. The transcript is JSONL (one JSON object per line), so you pull the last assistant text block out of it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Stop"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"jq -rs 'map(select(.type==&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;assistant&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;)) | last | .message.content[]? | select(.type==&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;) | .text' &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;$(jq -r .transcript_path)&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; 2&amp;gt;/dev/null | head -c 600 | say"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few honest caveats, because this is where it gets rough:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cap the length.&lt;/strong&gt; &lt;code&gt;head -c 600&lt;/code&gt; stops &lt;code&gt;say&lt;/code&gt; droning through a 4 KB status report. Pick your own limit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strip markdown if you can.&lt;/strong&gt; Read aloud, code fences and URLs are noise. The recipe above doesn't strip them — for a one-liner it's tolerable, but a real version should.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The transcript shape is not a stable public contract.&lt;/strong&gt; The &lt;code&gt;jq&lt;/code&gt; filter above matches the current JSONL layout. If Claude Code changes it, the filter breaks. Treat it as a hack, not an API.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most people the Notification hook alone is enough, and it's the part least likely to break.&lt;/p&gt;

&lt;h2&gt;
  
  
  The t3 extra: speak settings
&lt;/h2&gt;

&lt;p&gt;I keep my Claude Code automation in a project called &lt;a href="https://github.com/souliane/teatree" rel="noopener noreferrer"&gt;teatree&lt;/a&gt;. It has a &lt;code&gt;t3 speak&lt;/code&gt; command driven by one &lt;code&gt;[teatree.speak]&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[teatree.speak]&lt;/span&gt;
&lt;span class="py"&gt;local&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"dm"&lt;/span&gt;   &lt;span class="c"&gt;# what plays on this machine's speakers: "dm" | "all" | "off"&lt;/span&gt;
&lt;span class="py"&gt;slack&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c"&gt;# attach a spoken audio file to each bot→user Slack DM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;local&lt;/code&gt; controls the speakers in front of you: &lt;code&gt;dm&lt;/code&gt; reads only the bot's DMs to you, &lt;code&gt;all&lt;/code&gt; also reads every agent turn aloud, &lt;code&gt;off&lt;/code&gt; is silent. &lt;code&gt;slack&lt;/code&gt; attaches a spoken audio file to each bot→user DM. The two are independent, and both default off, so it does nothing until you configure it.&lt;/p&gt;

&lt;p&gt;Two destinations because there are two places I am. At the desk, &lt;code&gt;local&lt;/code&gt; plays through the speakers the moment a DM lands — no clicking. Away from it, &lt;code&gt;slack&lt;/code&gt; is what I reach for: the spoken text arrives as an audio file attached to the DM, and on the phone I press play. Not hands-free, but I can listen while moving instead of stopping to read.&lt;/p&gt;

&lt;p&gt;Two operational notes. The voice comes from macOS &lt;code&gt;say&lt;/code&gt;. And &lt;code&gt;slack&lt;/code&gt; needs the bot's file-upload permission, so an existing bot has to be reinstalled once to grant it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it stands
&lt;/h2&gt;

&lt;p&gt;The hook recipe is the part I'd actually recommend trying — it's a few lines and it degrades gracefully. The teatree side is tied to my own setup, so take it as one way to structure the same idea rather than something to copy verbatim.&lt;/p&gt;

&lt;p&gt;I'm still figuring out how much to read aloud. &lt;code&gt;local = "all"&lt;/code&gt; gets chatty fast. &lt;code&gt;dm&lt;/code&gt; is calmer but misses things. If you try this, I'd be curious what threshold works for you.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>automation</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Coding is solved. The factory isn't.</title>
      <dc:creator>Adrien Cossa</dc:creator>
      <pubDate>Fri, 05 Jun 2026 09:23:20 +0000</pubDate>
      <link>https://dev.to/souliane/coding-is-solved-the-factory-isnt-18i3</link>
      <guid>https://dev.to/souliane/coding-is-solved-the-factory-isnt-18i3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Highly opinionated, based on my personal experience. Not a prescription —&lt;br&gt;
just notes from what I keep figuring out while dogfooding my own setup. I'm&lt;br&gt;
scratching the surface, with a lot left to learn.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I'm building a multi-repo personal code factory. I don't spec it up front: I dogfood it day by day — using it, and asking for improvements or fixes when something breaks. The architectural decisions still can't be made blindly by the models, so daily use is how the system finds its shape. Two qualifiers about scope, then four claims.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Personal means local — for better and worse.&lt;/strong&gt; I call this a &lt;em&gt;personal&lt;/em&gt; code factory because it runs on my own laptop, with the credentials I already carry as a developer — not as managed infrastructure with its own auth, audit, and isolation. It's a personal dev tool, with a personal dev tool's trade-offs: the agent works inside the access I already have, the way I'd run anything else on my own machine.&lt;/p&gt;

&lt;p&gt;The upside of that scope is speed. It's my own tool on my own machine, so there's nothing to coordinate — no infra to provision, no shared environment to keep in sync, nothing to wait on. It just automates what I'd otherwise do by hand, and that's what lets me experiment fast and reshape the thing every week.&lt;/p&gt;

&lt;p&gt;The downside is the same thing said the other way: because it's mine, it doesn't help anyone else, and it's nowhere near as efficient as something that would run on GitLab or Slack directly. This is a POC. If it turns out to work, the right move is to promote it to actual company infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-repo means a particular kind of hard.&lt;/strong&gt; I run it on multi-repo because that's what my work looks like. If your code lives in a single repo, a lot of what I describe in this series either disappears or shows up differently. I'm not claiming the multi-repo case is the interesting one — just that it's the one I have.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Coding is solved
&lt;/h2&gt;

&lt;p&gt;Coding is solved — Cherny's phrase, and I think he's right. It took four things, and they're not the same thing.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;model&lt;/strong&gt;: capable enough to write the code.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;harness&lt;/strong&gt;: what lets the model act instead of just emitting text — read the repo, run the tests, iterate, fix. (Mine is Claude Code, but the principle isn't tied to it.)&lt;/p&gt;

&lt;p&gt;A layer of &lt;strong&gt;deterministic constraints&lt;/strong&gt;: checks that keep the output converging toward quality instead of tech debt. I work in Python, so for me that's &lt;a href="https://docs.astral.sh/ruff/" rel="noopener noreferrer"&gt;ruff&lt;/a&gt;, &lt;a href="https://github.com/astral-sh/ty" rel="noopener noreferrer"&gt;ty&lt;/a&gt;, &lt;a href="https://github.com/gauge-sh/tach" rel="noopener noreferrer"&gt;tach&lt;/a&gt; run through &lt;a href="https://github.com/j178/prek" rel="noopener noreferrer"&gt;prek&lt;/a&gt;, plus &lt;a href="https://github.com/gitleaks/gitleaks" rel="noopener noreferrer"&gt;gitleaks&lt;/a&gt; and a stack of project-specific hooks. Different language, different tools — the constraint is the point, not the toolchain.&lt;/p&gt;

&lt;p&gt;And &lt;strong&gt;skills&lt;/strong&gt;: written guidance that gives the model the business and project knowledge to make the right call in &lt;em&gt;this&lt;/em&gt; codebase, not a generic one.&lt;/p&gt;

&lt;p&gt;Take any one of the four away and it stops working. What none of them guarantees is that the architecture is &lt;em&gt;right&lt;/em&gt; — and that is the next claim.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The factory around it isn't, and you can't specify it
&lt;/h2&gt;

&lt;p&gt;The factory around it isn't solved. I don't think you can specify it up front.&lt;/p&gt;

&lt;p&gt;There are two ways to get a system that builds and ships software for you. One: write the spec — every edge case, every failure mode, every integration — hand it to an agent, let it build. Two: use it every day and fix what breaks.&lt;/p&gt;

&lt;p&gt;I don't believe in the first — at least I wouldn't try it. A spec for a system that builds, reviews, and ships software ends up being more or less the system itself: you don't find out which edges bite until they bite. And the architectural calls inside it still can't be made blindly by the models, so the spec would have to make all of them in advance — that's the part I don't see working yet.&lt;/p&gt;

&lt;p&gt;That leaves the second way.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Dogfooding is the only loop where any of this works
&lt;/h2&gt;

&lt;p&gt;That leaves dogfooding — using the thing every day, fixing what breaks, keeping it running tomorrow.&lt;/p&gt;

&lt;p&gt;Dogfooding fuses three things into one loop that no spec can: verifying the system works, improving it where it's wrong, and keeping it running long enough to do both. The first two are the same act — you verify by trying to use it, and the parts that don't work are the parts you fix.&lt;/p&gt;

&lt;p&gt;Making that verification less manual split into two halves. The proactive half is a test suite that checks whether the agent &lt;em&gt;behaves&lt;/em&gt; as intended — did it reach for the right tool, did it avoid the wrong one — so a behavior regression shows up as a red test instead of going unnoticed for days. I'm only starting on these: a handful of behavioral scenarios plus the deterministic checks around them, noisy enough that I don't lean on them yet. The reactive half is a runtime hook that catches a bad action as it happens and refuses it — the backstop for when the agent misbehaves anyway. I lean on those far more today. But every backstop I need is something the proactive half didn't catch in time. If the evals and the agent were good enough, the gates would be dead weight. They aren't yet, so I keep both.&lt;/p&gt;

&lt;p&gt;The third thing in the loop is the precondition. &lt;strong&gt;Self-improvement and resilience are two sides of the same coin.&lt;/strong&gt; A system that shuts down can't keep improving itself. If I had to pick which matters more, it's resilience — improvement stops the moment the loop stops. You don't get either by specifying them. You get both by running the thing every day and refusing to let it stay broken.&lt;/p&gt;

&lt;p&gt;So who orchestrates the loop? That's the last claim.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Orchestration is the last thing to automate
&lt;/h2&gt;

&lt;p&gt;Orchestration looks like the part that stays human: holding the big picture, deciding what gets attention first, noticing when two threads are about the same thing, deciding what to keep and what to drop.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://github.com/souliane/teatree" rel="noopener noreferrer"&gt;teatree&lt;/a&gt; most of it already runs without me. One orchestrator with the big picture, not a swarm — it arbitrates and hands the actual work to sub-agents. What still needs me is basically troubleshooting and steering, and I assume that the loop can't be fully closed as long as the behavioral evals are missing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the rest of the series is
&lt;/h2&gt;

&lt;p&gt;I'll try to publish roughly one post a week. Each one is the thing I keep getting wrong and trying to get less wrong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 1 — Software engineering became software architecture.&lt;/strong&gt; Deterministic constraints solve code quality. Nothing solves whether the architecture is &lt;em&gt;right&lt;/em&gt; — that's what's left for a human, and there is no gate for it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 2 — Suppose the skill is never followed.&lt;/strong&gt; Why I treat prose guidance as decorative and everything that matters as a hook, why I had to invent memory because skills aren't reliably read, and how I'm starting to write evals — a test suite that checks whether the agent actually behaved as planned — so I catch a skill being ignored before a hook has to.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3 — Make yourself an optional reviewer.&lt;/strong&gt; The closed-loop part, including the surprisingly hard subproblem of letting the system merge PRs without my approval without it feeling reckless.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 4 — One orchestrator, many loops.&lt;/strong&gt; Why I run a single session with many sub-agents instead of many sessions, what that costs, and the honest ceiling I think it has.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 5 — FSM in the database.&lt;/strong&gt; How concurrency, leaks, and crashes stopped being terrifying once the workflow state lived in a table instead of in memory — and how the same substrate carries resilience and a distributed improvement mechanism across multiple repos.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll change my mind about some of this between now and the last post. That's the point.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>agents</category>
      <category>dogfooding</category>
    </item>
    <item>
      <title>Installing OpenClaw the Easy Way</title>
      <dc:creator>Adrien Cossa</dc:creator>
      <pubDate>Mon, 16 Mar 2026 16:34:05 +0000</pubDate>
      <link>https://dev.to/souliane/installing-openclaw-the-easy-way-5733</link>
      <guid>https://dev.to/souliane/installing-openclaw-the-easy-way-5733</guid>
      <description>&lt;p&gt;I started installing &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; manually — reading through the docs, figuring out each step, hitting the usual walls. It was taking too long, so I stopped and decided to try a different approach. So instead of repeating the process, I wrote a &lt;a href="https://github.com/souliane/skills/tree/main/ac-openclaw" rel="noopener noreferrer"&gt;skill&lt;/a&gt; for it: a set of instructions and references that lets an AI coding agent handle the whole thing. This is the &lt;a href="https://dev.to/souliane/skill-driven-development-transferring-your-craft-to-ai-agents"&gt;skill-driven development&lt;/a&gt; approach I described in a previous post — let the agent do the work, fix the skill when it gets something wrong, repeat until it gets it right.&lt;/p&gt;

&lt;p&gt;The result is a skill that handles the full installation and configuration of OpenClaw on a VPS. You tell your agent "set up OpenClaw on my server" and it walks through everything: provisioning, hardening, messaging channels, backups. The skill encodes the gotchas so you don't have to hit them yourself.&lt;/p&gt;




&lt;h2&gt;
  
  
  What you get
&lt;/h2&gt;

&lt;p&gt;With the skill loaded, setting up OpenClaw goes roughly like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Set up OpenClaw on my new Hetzner server"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent asks a few questions — SSH access details, which messaging channels you want, which model providers you have API keys for — then works through the phases in order. It handles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Server provisioning&lt;/strong&gt; — SSH setup, initial access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OS hardening&lt;/strong&gt; — UFW firewall, Fail2Ban, SSH lockdown&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disk encryption&lt;/strong&gt; — LUKS full-disk or encrypted block storage volumes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime installation&lt;/strong&gt; — Node.js, Python, build tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw installation&lt;/strong&gt; — clone, configure, systemd service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model configuration&lt;/strong&gt; — API keys (Anthropic, OpenAI, etc.) or local Ollama&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Messaging channels&lt;/strong&gt; — Signal, Telegram, WhatsApp, Discord, and others&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent routing&lt;/strong&gt; — different agents for different contacts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remote access&lt;/strong&gt; — Cloudflare Tunnel, Tailscale, or Caddy reverse proxy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker sandboxing&lt;/strong&gt; — container isolation with proper firewall rules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backups&lt;/strong&gt; — local snapshots, GitHub push, cloud provider images&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Social media integration&lt;/strong&gt; — optional, third-party schedulers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ongoing maintenance&lt;/strong&gt; — updates, log rotation, health checks&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At each step, the agent adapts to your choices. Pick Telegram instead of Signal? The Signal-specific build steps get skipped. Want Tailscale instead of Cloudflare Tunnel? Different config, different verification. The skill describes the trade-offs; the agent presents them and moves on.&lt;/p&gt;

&lt;p&gt;Without the skill, an agent can still attempt all of this — the information exists in OpenClaw's docs and platform-specific guides. But it takes hours of trial and error, and several of the issues below are genuinely hard to figure out from scratch.&lt;/p&gt;




&lt;h2&gt;
  
  
  Issues the skill handles for you
&lt;/h2&gt;

&lt;p&gt;These are the problems that came up during the SDD loop — things the agent got wrong, that I fixed in the skill, and that you won't have to deal with.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker bypasses UFW
&lt;/h3&gt;

&lt;p&gt;This one is not well-known. When Docker publishes a port, it writes its own iptables rules that bypass UFW entirely. Your firewall says "deny all incoming," but Docker's containers are wide open anyway.&lt;/p&gt;

&lt;p&gt;The skill includes explicit &lt;code&gt;DOCKER-USER&lt;/code&gt; chain rules that block all inbound connections to containers unless they come from loopback (&lt;code&gt;127.0.0.1&lt;/code&gt;) or your Tailscale CGNAT range (&lt;code&gt;100.64.0.0/10&lt;/code&gt;). Without this, any container you run is exposed to the internet regardless of your UFW config.&lt;/p&gt;

&lt;h3&gt;
  
  
  Signal on ARM64 has no pre-built binary
&lt;/h3&gt;

&lt;p&gt;If you're running on an ARM64 VPS (Hetzner CAX series, for example), signal-cli's libsignal JNI binding doesn't have a pre-built binary. You have to clone the signal-cli repo and build libsignal from Rust source yourself. This requires &lt;code&gt;openjdk-25-jre-headless&lt;/code&gt;, &lt;code&gt;build-essential&lt;/code&gt;, &lt;code&gt;cmake&lt;/code&gt;, &lt;code&gt;libclang-dev&lt;/code&gt;, &lt;code&gt;protobuf-compiler&lt;/code&gt;, and &lt;code&gt;rustc&lt;/code&gt; — a dependency chain that takes a while to sort out.&lt;/p&gt;

&lt;p&gt;The skill documents the exact build steps and dependencies. Without it, the agent installs signal-cli, it silently fails at runtime, and you spend a while figuring out why.&lt;/p&gt;

&lt;h3&gt;
  
  
  Secrets end up in config files
&lt;/h3&gt;

&lt;p&gt;Left to its own devices, the agent will store API keys in environment files or config files because it's the fastest path. The skill enforces storing all secrets in &lt;code&gt;pass&lt;/code&gt; (the standard Unix password manager) and reading them at runtime. No API keys in plain text, ever.&lt;/p&gt;

&lt;h3&gt;
  
  
  Services "started" but not actually running
&lt;/h3&gt;

&lt;p&gt;A common agent failure: it runs &lt;code&gt;systemctl start openclaw&lt;/code&gt;, gets no error, and declares the phase complete. But the service might have crashed immediately after starting, or it might be listening on the wrong port, or a dependency might be missing.&lt;/p&gt;

&lt;p&gt;The skill marks verification as &lt;code&gt;(Non-Negotiable)&lt;/code&gt; — the agent must confirm each service actually responds via HTTP before moving on. This single rule prevented most of the false-completion issues during the SDD loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  Disk encryption on a cloud VPS
&lt;/h3&gt;

&lt;p&gt;LUKS full-disk encryption on a cloud VPS isn't straightforward. You need to boot into rescue mode, set up &lt;code&gt;cryptsetup&lt;/code&gt;, and install &lt;code&gt;dropbear-initramfs&lt;/code&gt; so you can unlock the disk remotely after every reboot. The skill documents this path but also offers a simpler alternative: Hetzner encrypted block storage volumes for sensitive data only, which avoids the rescue-boot complexity for most use cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Remote access done wrong
&lt;/h3&gt;

&lt;p&gt;Exposing ports directly to the internet is the default instinct, but it's the wrong one. The skill requires one of three approaches: Cloudflare Tunnel (outbound-only, Zero Trust), Tailscale Serve (private mesh), or Caddy reverse proxy with password auth. Each has documented trade-offs. The agent asks which one you prefer and configures it accordingly.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the skill is structured
&lt;/h2&gt;

&lt;p&gt;The main &lt;code&gt;SKILL.md&lt;/code&gt; file is around 160 lines and covers the decision flow and ordering constraints. Detailed procedures live in reference files that the agent pulls in as needed:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Reference&lt;/th&gt;
&lt;th&gt;What it covers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Security hardening&lt;/td&gt;
&lt;td&gt;SSH config, UFW rules, Fail2Ban jails, Docker iptables bypass&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local models&lt;/td&gt;
&lt;td&gt;Ollama installation, model selection by RAM, GPU passthrough&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Messaging channels&lt;/td&gt;
&lt;td&gt;Per-channel setup (Signal ARM64 build, WhatsApp QR pairing, Telegram BotFather)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-agent routing&lt;/td&gt;
&lt;td&gt;Contact-to-agent bindings, DM access policies, agent personalities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remote access&lt;/td&gt;
&lt;td&gt;Cloudflare Tunnel vs Tailscale vs Caddy — trade-offs and setup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Social media&lt;/td&gt;
&lt;td&gt;Third-party schedulers, risk considerations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backups&lt;/td&gt;
&lt;td&gt;Local snapshots, GitHub push with deploy key, cloud provider images&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This progressive disclosure keeps context usage reasonable. The full Signal ARM64 build instructions only get loaded if you're actually setting up Signal on ARM64.&lt;/p&gt;




&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;The skill isn't tied to a specific setup. It asks what you're working with — VPS or local machine, which OS, which architecture, which provider — and adapts accordingly. It can fetch current VPS pricing (Hetzner CAX series, DigitalOcean, etc.) so you can compare options before committing. If you want to run OpenClaw on a spare laptop instead of a cloud server, it adjusts the flow — no VPS provisioning, different networking, local Ollama instead of cloud API keys.&lt;/p&gt;

&lt;p&gt;That said, my own setup (Hetzner ARM64, Ubuntu 24.04) is the only tested path so far. Other combinations should work but may have gaps in the gotcha coverage.&lt;/p&gt;

&lt;p&gt;The skill is also a snapshot as of early 2026. OpenClaw is actively developed. Signal might ship ARM64 binaries eventually. Docker might fix the UFW bypass. Treat specific version numbers with appropriate skepticism.&lt;/p&gt;

&lt;p&gt;And while the skill handles the setup, it doesn't replace understanding what's running on your server. If something breaks outside the skill's playbook, you'll need basic comfort with SSH, firewalls, and systemd to debug it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add https://github.com/souliane/skills &lt;span class="nt"&gt;--skill&lt;/span&gt; ac-openclaw &lt;span class="nt"&gt;-g&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It works with any AI agent that can read files and run commands.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; | &lt;a href="https://github.com/souliane/skills/blob/main/LICENSE" rel="noopener noreferrer"&gt;MIT License&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>selfhosted</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Skill-Driven Development: Transferring your Craft to AI Agents</title>
      <dc:creator>Adrien Cossa</dc:creator>
      <pubDate>Fri, 13 Mar 2026 16:40:28 +0000</pubDate>
      <link>https://dev.to/souliane/skill-driven-development-transferring-your-craft-to-ai-agents-15an</link>
      <guid>https://dev.to/souliane/skill-driven-development-transferring-your-craft-to-ai-agents-15an</guid>
      <description>&lt;p&gt;Every project has its own way of doing things — migration patterns, transaction handling, deployment quirks, that one PDF template workflow nobody wants to touch. I've been writing this stuff down as markdown files with helper scripts so my AI agent can follow along. I've taken to calling this &lt;strong&gt;skill-driven development&lt;/strong&gt; (building on &lt;a href="https://code.claude.com/docs/en/skills" rel="noopener noreferrer"&gt;agent skills&lt;/a&gt;, a convention that started with Anthropic's Claude Code and has since been adopted by several other AI coding agents).&lt;/p&gt;

&lt;p&gt;This post walks through some skills I've put together and shows the feedback loop that makes them worth maintaining.&lt;/p&gt;




&lt;h2&gt;
  
  
  What skills add over a README
&lt;/h2&gt;

&lt;p&gt;A project README or an AGENTS.md can capture conventions. The skills I've been writing try to go a bit further:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Progressive disclosure&lt;/strong&gt; — a slim main file loads first, with detailed references pulled in only when needed. This keeps the agent's context window focused.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composability&lt;/strong&gt; — skills declare dependencies and load together. A Django skill + a project overlay skill + a TDD skill compose into a complete workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scripts&lt;/strong&gt; — skills can ship executable scripts alongside the instructions. A script the agent calls is more reliable than a 15-step procedure the agent interprets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-improvement&lt;/strong&gt; — when something goes wrong, the fix goes into the skill. Next session, the agent follows the updated instructions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point is what makes it worth maintaining: every correction you make goes into the skill, and the agent doesn't make the same mistake twice.&lt;/p&gt;




&lt;h2&gt;
  
  
  The SDD loop
&lt;/h2&gt;

&lt;p&gt;There's a useful parallel with Test-Driven Development:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh1967r2k74xarrz9mosl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh1967r2k74xarrz9mosl.png" alt="20260309-introducing-skills diagram 1" width="784" height="242"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;TDD&lt;/th&gt;
&lt;th&gt;SDD&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Write&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Write the test first&lt;/td&gt;
&lt;td&gt;Write the skill first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Run&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Run the code&lt;/td&gt;
&lt;td&gt;Let the agent produce the code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Evaluate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Test fails → fix the &lt;strong&gt;code&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Output is wrong → fix the &lt;strong&gt;skill&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Loop&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Until green&lt;/td&gt;
&lt;td&gt;Until the agent gets it right&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In TDD, you iterate on code until the tests pass. Here, you iterate on skills until the agent's output is what you wanted. The skill isn't a one-shot handoff — you keep tweaking it as you find gaps.&lt;/p&gt;

&lt;p&gt;In practice, the two aren't separate activities — skills encode TDD as part of the workflow. The implementation skill says "write a failing test first, then implement," and the agent does both. Nobody writes tests by hand and then separately invokes a skill; the skill &lt;em&gt;is&lt;/em&gt; what makes the agent follow TDD.&lt;/p&gt;

&lt;p&gt;When tests fail, it's worth asking whether the code is wrong or the skill is incomplete. Fix the skill, re-run. Over time it adds up — each fix prevents one class of mistake, and after enough sessions you've built up a decent set of guardrails just from things that went wrong.&lt;/p&gt;

&lt;p&gt;If you use &lt;a href="https://github.com/souliane/teatree" rel="noopener noreferrer"&gt;teatree&lt;/a&gt; (a set of lifecycle skills I put together for multi-repo development), this loop can be automated: &lt;code&gt;t3-retro&lt;/code&gt; runs a retrospective after each session and writes fixes into the skill files. But it works just as well manually — whenever you correct the agent, you can put that correction in a skill so it sticks.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a skill looks like
&lt;/h2&gt;

&lt;p&gt;A skill is a markdown file (&lt;code&gt;SKILL.md&lt;/code&gt;) with YAML frontmatter, often accompanied by scripts for the mechanical parts. Here's a simplified example of the markdown side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ac-django&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Django coding conventions and best practices.&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.0.1&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="c1"&gt;# Django Conventions&lt;/span&gt;

&lt;span class="c1"&gt;## Models&lt;/span&gt;

&lt;span class="c1"&gt;### Fat Models Doctrine&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Business logic belongs in models, not views or serializers.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Use model managers for complex queries.&lt;/span&gt;

&lt;span class="c1"&gt;### Migrations&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Always use `apps.get_model()` in data migrations — never import directly.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Set `elidable=True` on data-only migrations.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Include both `forwards` and `backwards` functions.&lt;/span&gt;

&lt;span class="c1"&gt;## Settings&lt;/span&gt;

&lt;span class="c1"&gt;### Storage Configuration (Non-Negotiable)&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Use `STORAGES` dict (Django 4.2+), not `DEFAULT_FILE_STORAGE`.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;The deprecated setting causes silent failures on deployment.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rules marked &lt;code&gt;(Non-Negotiable)&lt;/code&gt; are things I've learned the hard way. "Always verify services respond via HTTP before declaring running" sounds obvious, but without it, the agent will say "servers started" without checking whether anything actually came up.&lt;/p&gt;

&lt;p&gt;These work with any agent that can read files — Claude Code, Codex, Cursor, whatever. The agent reads the skill and follows the instructions.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's in the repo
&lt;/h2&gt;

&lt;p&gt;I've put the generic ones in a &lt;a href="https://github.com/souliane/skills" rel="noopener noreferrer"&gt;public repository&lt;/a&gt; in case any of them are useful to others. Here are the ones I'd recommend looking at first.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;ac-reviewing-skills&lt;/code&gt; — keep your skills in shape
&lt;/h3&gt;

&lt;p&gt;This is probably the most broadly useful one. It does a deep audit of your skill files — architecture, content quality, script correctness, stale cross-references, duplicated guidance. I run it periodically and it consistently finds things I missed: rules that drifted between files, references pointing at renamed sections, scripts with missing error handling. If you maintain more than a handful of skills, it's worth running periodically.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;ac-django&lt;/code&gt; — Django conventions that models already "know" but get wrong
&lt;/h3&gt;

&lt;p&gt;The agent knows Django. It doesn't know how &lt;em&gt;you&lt;/em&gt; use Django. This skill covers the mistakes I kept correcting: outdated migration patterns (&lt;code&gt;apps.get_model()&lt;/code&gt; vs direct imports), unsafe transaction handling, the &lt;code&gt;STORAGES&lt;/code&gt; dict vs deprecated &lt;code&gt;DEFAULT_FILE_STORAGE&lt;/code&gt;, &lt;code&gt;post_migrate&lt;/code&gt; signal timing for permission assignments. It's a reference, not a tutorial — it assumes the agent already understands the framework and just needs guardrails for the non-obvious parts.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ac-python&lt;/code&gt; is its companion for generic Python: style, typing, OOP patterns, testing conventions. Less opinionated, but useful as a baseline.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;ac-adopting-ruff&lt;/code&gt; — structured linter migration
&lt;/h3&gt;

&lt;p&gt;A step-by-step playbook for replacing black + isort + flake8 with ruff, one rule category per MR. It handles the things I got stuck on — conflicting formatter settings, rule equivalences between linters, the &lt;code&gt;unfixable&lt;/code&gt; vs &lt;code&gt;ignore&lt;/code&gt; distinction. Doing it in one big MR is painful; the skill breaks it into reviewable increments.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;ac-openclaw&lt;/code&gt; — self-hosted AI assistant setup
&lt;/h3&gt;

&lt;p&gt;An interactive guide to install &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; on a VPS or local machine. Covers server provisioning, OS hardening, model configuration (BYOK or local Ollama), messaging channel integration (Signal, WhatsApp, Telegram, etc.), and secure remote access (Cloudflare Tunnel, Tailscale, or Caddy). It walks through every decision point — useful if you want a self-hosted personal AI assistant without piecing together a dozen tutorials.&lt;/p&gt;

&lt;h3&gt;
  
  
  Everything else
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;What it covers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ac-python&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Generic Python: style, typing, OOP design, testing, tooling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ac-editing-acroforms&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;AcroForm PDF templates: widget geometry, content streams, font subsetting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ac-auditing-repos&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cross-repo infrastructure audit: harmonize pre-commit, linter, and editor configs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ac-writing-blog-posts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Article writing + social media promotion + dev.to publishing pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ac-generating-slides&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Markdown to presentation slides via Marp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ac-scaffolding-skill-repos&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Scaffold new skill repos with correct config and structure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;ac-editing-acroforms&lt;/code&gt; deserves a mention — it came out of editing PDF form templates by hand. The internals (annotation dictionaries, appearance stream generation, widget flags) are barely documented. The agent can't figure this out from training data alone, so the skill ships with Python scripts that handle the tricky bits.&lt;/p&gt;

&lt;p&gt;This blog post was written with &lt;code&gt;ac-writing-blog-posts&lt;/code&gt;, for what it's worth.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to use them
&lt;/h2&gt;

&lt;p&gt;Install with &lt;a href="https://github.com/nichochar/skills" rel="noopener noreferrer"&gt;npx skills&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add https://github.com/souliane/skills &lt;span class="nt"&gt;--skill&lt;/span&gt; &lt;span class="s1"&gt;'*'&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This installs all skills globally for your default agent. To install for multiple agents at once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add https://github.com/souliane/skills &lt;span class="nt"&gt;--skill&lt;/span&gt; &lt;span class="s1"&gt;'*'&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--agent&lt;/span&gt; claude-code codex cursor github-copilot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want the SDD feedback loop — where retrospective fixes land in files you can commit — clone the repo and symlink it into your agent's skills directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone git@github.com:souliane/skills.git ~/workspace/souliane/skills

&lt;span class="c"&gt;# Example for Claude Code — adjust the target for your agent runtime&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;skill &lt;span class="k"&gt;in&lt;/span&gt; ~/workspace/souliane/skills/ac-&lt;span class="k"&gt;*&lt;/span&gt;/&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$skill&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; ~/.claude/skills/&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;basename&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$skill&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This points your agent at the live git checkout directly. When the agent (or you) updates a skill file, the change is immediately available in the next session and can be committed. Don't use &lt;code&gt;npx skills add&lt;/code&gt; for this — it creates a managed copy that doesn't point back to your clone.&lt;/p&gt;

&lt;p&gt;If you use &lt;a href="https://github.com/souliane/teatree" rel="noopener noreferrer"&gt;teatree&lt;/a&gt;, its setup wizard can suggest these as companion skills for your project overlay — they're loaded automatically when you work in matching repos.&lt;/p&gt;




&lt;h2&gt;
  
  
  When it helps
&lt;/h2&gt;

&lt;p&gt;Skills work best when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You correct the agent for the same kind of mistake more than once&lt;/li&gt;
&lt;li&gt;Your project has conventions that diverge from common patterns&lt;/li&gt;
&lt;li&gt;You work across sessions and the agent keeps losing context&lt;/li&gt;
&lt;li&gt;You use deterministic tools (PDF editors, linters, deployment scripts) where the agent needs exact steps&lt;/li&gt;
&lt;li&gt;You want to share a recipe with others — a skill is a portable, self-contained package that anyone can install and use with their own agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They're less useful for one-off tasks or when the model's defaults already match your preferences. But even something you only do once yourself might be worth writing as a skill if it's useful to someone else.&lt;/p&gt;

&lt;p&gt;These skills reflect my own workflow — Django, Python, PDF templates, multi-repo infrastructure. They might not match yours at all. The most useful skills are probably ones you'd write yourself for your own project's conventions. These are just examples of what worked for me.&lt;/p&gt;




&lt;h2&gt;
  
  
  Update — June 2026
&lt;/h2&gt;

&lt;p&gt;The tooling around skills has moved on since I wrote this, so a couple of additions.&lt;/p&gt;

&lt;p&gt;There's now an attempt at a shared convention (&lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;agentskills.io&lt;/a&gt;) and lots of community skill collections worth borrowing from. A term I've picked up is "skillify" — capturing a workflow into a reusable skill draft &lt;em&gt;after&lt;/em&gt; a session pulls it off, which is a nice bottom-up complement to writing the skill up front.&lt;/p&gt;

&lt;p&gt;The bigger gap is the &lt;em&gt;evaluate&lt;/em&gt; step in the loop above: I left it as a human judgment call, but the better answer is a runnable eval. Anthropic's &lt;a href="https://github.com/anthropics/skills/tree/main/skills/skill-creator" rel="noopener noreferrer"&gt;skill-creator&lt;/a&gt; stores test cases (a &lt;code&gt;prompt&lt;/code&gt; plus an &lt;code&gt;expected_output&lt;/code&gt;) in an &lt;code&gt;evals/&lt;/code&gt; folder and grades a skill's output with and without the skill, so a regression shows up as a failing benchmark instead of a gut feeling. If I were drawing the SDD diagram today, that node would be a real eval. If you're picking this up now, lean on the current tooling rather than this post — it's a bit dated already.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/souliane/skills" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://github.com/souliane/skills/blob/main/LICENSE" rel="noopener noreferrer"&gt;MIT License&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Introducing Teatree: Parallel Multi-Repo Development with AI Agents</title>
      <dc:creator>Adrien Cossa</dc:creator>
      <pubDate>Thu, 12 Mar 2026 13:16:50 +0000</pubDate>
      <link>https://dev.to/souliane/introducing-teatree-parallel-multi-repo-development-with-ai-agents-cjp</link>
      <guid>https://dev.to/souliane/introducing-teatree-parallel-multi-repo-development-with-ai-agents-cjp</guid>
      <description>&lt;p&gt;I'm a Customer Success Engineer at &lt;a href="https://www.opercredits.com/" rel="noopener noreferrer"&gt;Oper Credits&lt;/a&gt;. My daily work involves a multi-repo project — backend, frontend, translations, configuration — and I use AI coding agents constantly. The friction isn't writing code; agents handle that well. It's everything surrounding it: following different conventions across codebases, coordinating changes across services, managing local environments that diverge from what's in git, and encoding the workflow patterns we could all benefit from.&lt;/p&gt;

&lt;p&gt;The agent can figure out most of these things, but it struggles with the specifics — it loops on troubleshooting, tries approaches that don't match the project's actual setup, and burns tokens on trial and error. I started putting together teatree to write down that knowledge so the agent doesn't have to rediscover it every session. It's also a way to define and automate your personal workflow without adding friction with your team — build it on your own, then push for adoption once it works.&lt;/p&gt;

&lt;p&gt;This post walks through the architecture, the design choices I landed on, and how the pieces fit together. It's long because there's a lot of ground to cover. If you just want the quick pitch, the &lt;a href="https://github.com/souliane/teatree" rel="noopener noreferrer"&gt;README&lt;/a&gt; has that.&lt;/p&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;What it looks like&lt;/li&gt;
&lt;li&gt;The problem&lt;/li&gt;
&lt;li&gt;Skills as markdown and scripts&lt;/li&gt;
&lt;li&gt;The lifecycle graph&lt;/li&gt;
&lt;li&gt;Multi-repo worktree management&lt;/li&gt;
&lt;li&gt;The overlay and extension system&lt;/li&gt;
&lt;li&gt;Auto-loading hooks&lt;/li&gt;
&lt;li&gt;The retrospective loop&lt;/li&gt;
&lt;li&gt;Companion skills&lt;/li&gt;
&lt;li&gt;Getting started&lt;/li&gt;
&lt;li&gt;When it helps (and when it doesn't)&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  What it looks like
&lt;/h2&gt;

&lt;p&gt;Tell your AI agent what you want. Teatree skills guide it through the entire lifecycle:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;https://gitlab.com/org/repo/-/issues/1234&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent fetches the ticket, creates synchronized worktrees, provisions isolated databases and ports, implements the feature with TDD, writes a test plan, runs E2E tests, self-reviews, then pushes and creates the merge request.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Fix PROJ-5678&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent fetches the failed test report from CI, reproduces locally, fixes, pushes, and monitors the pipeline until green.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Review https://gitlab.com/org/repo/-/merge_requests/456&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent fetches the ticket for context, inspects every commit individually, and posts draft review comments inline on the correct file and line.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Run the test plan for !789&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent generates a test plan from the MR changes, runs E2E tests, and posts evidence screenshots on the MR.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Follow up on my open tickets&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent batch-processes your assigned tickets, checks CI statuses, nudges stale MRs, and starts work on anything that's ready.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;AI coding agents can do a lot — reason about architecture, run tests, create merge requests. But without your project's specific context, they spend tokens and time rediscovering things you already know. Your repo layout, your CI conventions, your team's practices, your local tooling — none of that is in training data.&lt;/p&gt;

&lt;p&gt;The friction is especially pronounced with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-repo setups&lt;/strong&gt; — creating branches across 3+ repos for a single ticket, provisioning isolated databases, allocating non-conflicting ports&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Atypical local environments&lt;/strong&gt; — personal tooling that differs from what's in git, dev configurations the team hasn't adopted yet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational workflows&lt;/strong&gt; — self-reviewing before pushing, creating properly formatted merge requests, monitoring pipelines, running retrospectives&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent can attempt all of these. But without explicit guidance, it either asks twenty questions or confidently does the wrong thing — and when something fails, it loops instead of applying the fix you already know.&lt;/p&gt;

&lt;p&gt;I tried shell scripts and aliases first, sometimes Python scripts too. They worked for the happy path but couldn't handle edge cases — the database import that fails because VPN is down, the port conflict because another worktree is still running, the CI format check that rejects your MR title. A shell script can't say "if the test fails, check if it's a known flake — here are the patterns." An AI agent can.&lt;/p&gt;

&lt;p&gt;So I started writing this stuff down — as markdown instructions with tested Python and shell scripts for the mechanical parts. The markdown gives the agent enough context to handle edge cases; the scripts handle deterministic operations where you don't want the agent improvising.&lt;/p&gt;




&lt;h2&gt;
  
  
  Skills as markdown and scripts
&lt;/h2&gt;

&lt;p&gt;A teatree skill starts with a markdown file (&lt;code&gt;SKILL.md&lt;/code&gt;) with YAML frontmatter, but the heavy lifting often happens in scripts that ship alongside it. Teatree currently has 15 Python executables, 9 library modules, and 3 shell scripts — backed by 26 test files. Here's a simplified example of the markdown side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;t3-code&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Writing code with TDD methodology.&lt;/span&gt;
&lt;span class="na"&gt;requires&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;t3-workspace&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.0.1&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="c1"&gt;# Writing Code (TDD)&lt;/span&gt;

&lt;span class="c1"&gt;## Dependencies&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="nv"&gt;*t3-workspace&lt;/span&gt;&lt;span class="err"&gt;**&lt;/span&gt; &lt;span class="s"&gt;(required) — provides dev servers for live reload.&lt;/span&gt;

&lt;span class="c1"&gt;## Workflow&lt;/span&gt;

&lt;span class="c1"&gt;### 1. Plan First (Non-Negotiable)&lt;/span&gt;

&lt;span class="s"&gt;Always make a plan before writing code. Never jump straight to coding.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Identify scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;which files, modules, and repos are affected.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Review existing patterns in the codebase before writing new code.&lt;/span&gt;

&lt;span class="c1"&gt;### 2. TDD Cycle&lt;/span&gt;

&lt;span class="s"&gt;Write failing test → Implement → Green → Refactor&lt;/span&gt;

&lt;span class="c1"&gt;### 3. Follow Conventions&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Language/framework conventions from the project's convention skills.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Repository-specific patterns take precedence over generic guidance.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to note:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills contain both instructions and scripts.&lt;/strong&gt; The markdown tells the agent &lt;em&gt;when&lt;/em&gt; and &lt;em&gt;why&lt;/em&gt; to do things. The Python scripts handle deterministic operations: worktree creation, port allocation, database provisioning, branch finalization. A script the agent calls is more robust than a 15-step procedure in a markdown file. Instructions for judgment calls, scripts for mechanical work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills declare dependencies.&lt;/strong&gt; The &lt;code&gt;requires:&lt;/code&gt; field in the frontmatter tells the loading system which other skills need to be present. When &lt;code&gt;t3-code&lt;/code&gt; is loaded, &lt;code&gt;t3-workspace&lt;/code&gt; comes along automatically. This eliminates wasted round-trips where the agent reads a skill, sees "Load &lt;code&gt;/t3-workspace&lt;/code&gt; now", and then has to make a second call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills use progressive disclosure.&lt;/strong&gt; Most &lt;code&gt;SKILL.md&lt;/code&gt; files are 80–160 lines, with detailed procedures in &lt;code&gt;references/&lt;/code&gt; files that the agent reads on demand. This keeps the typical skill set well within a reasonable context budget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills have rules marked &lt;code&gt;(Non-Negotiable)&lt;/code&gt;.&lt;/strong&gt; These are things I've had to learn the hard way. "Always verify services respond via HTTP before declaring running" sounds obvious, but without it, the agent will say "servers started" without checking whether anything actually came up.&lt;/p&gt;




&lt;h2&gt;
  
  
  The lifecycle graph
&lt;/h2&gt;

&lt;p&gt;Teatree organizes development into phases, each handled by a dedicated skill:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkldryhcyx4niugm82im.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkldryhcyx4niugm82im.png" alt="diagram" width="784" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The flow is: &lt;strong&gt;ticket → code → test → review → ship → retro&lt;/strong&gt;, with &lt;code&gt;t3-workspace&lt;/code&gt; providing infrastructure to all phases and &lt;code&gt;t3-debug&lt;/code&gt; available whenever something breaks.&lt;/p&gt;

&lt;p&gt;Here's what each skill does:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;What it handles&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t3-setup&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bootstrapping&lt;/td&gt;
&lt;td&gt;Interactive setup wizard, health checks, overlay scaffolding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t3-workspace&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;td&gt;multi-repo worktrees, port allocation, DB provisioning, env files, dev servers, cleanup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t3-ticket&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Intake&lt;/td&gt;
&lt;td&gt;Fetch the issue, extract acceptance criteria, detect affected repos, detect tenant/variant, create worktrees&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t3-code&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Implementation&lt;/td&gt;
&lt;td&gt;Plan-first workflow, TDD cycle, convention enforcement, feature flag checks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t3-test&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Verification&lt;/td&gt;
&lt;td&gt;Test execution, CI interaction, E2E test plans, quality gates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t3-debug&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Troubleshooting&lt;/td&gt;
&lt;td&gt;Systematic 5-phase debugging protocol, user-hint-first investigation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t3-review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Code review&lt;/td&gt;
&lt;td&gt;Self-review checklist, giving review, receiving feedback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t3-ship&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Delivery&lt;/td&gt;
&lt;td&gt;Commit formatting, branch finalization, MR creation, pipeline monitoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t3-review-request&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Notifications&lt;/td&gt;
&lt;td&gt;Post MR links to review channels, check for duplicate requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t3-retro&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Improvement&lt;/td&gt;
&lt;td&gt;Conversation audit, root cause analysis, skill updates, privacy scans&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t3-contribute&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Contribution&lt;/td&gt;
&lt;td&gt;Push skill improvements to fork, open upstream issues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t3-followup&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Batch ops&lt;/td&gt;
&lt;td&gt;Process assigned tickets, check CI statuses, nudge stale MRs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The skills mirror how development actually works. Implementing a ticket touches intake, coding, testing, review, and delivery — often across multiple repos. Making the skills fully independent would mean duplicating knowledge across every one of them, which always diverges over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  The follow-up dashboard
&lt;/h3&gt;

&lt;p&gt;One skill worth highlighting is &lt;code&gt;t3-followup&lt;/code&gt;. It runs your daily routine: batch-processing new tickets, checking CI statuses, advancing tickets through their lifecycle, and nudging reviewers about stale MRs.&lt;/p&gt;

&lt;p&gt;As it works, it builds a persistent cache (&lt;code&gt;followup.json&lt;/code&gt;) of all in-flight work — tickets, merge requests, pipeline statuses, review request states, and review comment tracking. From that cache, it generates an HTML dashboard:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jgptus1dyri2l28thro.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jgptus1dyri2l28thro.png" alt="t3-followup dashboard" width="800" height="601"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dashboard gives you a single view of everything that's in flight: ticket lifecycle status, pipeline results (color-coded pills), review request state, and tracked review comments. Everything is a clickable link — tickets, MRs, CI pipelines, Slack messages — so you can jump directly into any conversation.&lt;/p&gt;

&lt;p&gt;The cache is a plain JSON file, so project overlays can inject extra fields (external tracker status, deployment state, tenant info) via the &lt;code&gt;followup_enrich_data&lt;/code&gt; extension point. Stale tickets are purged automatically after their MRs have been merged for 14 days (configurable via &lt;code&gt;T3_FOLLOWUP_PURGE_DAYS&lt;/code&gt;).&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-repo worktree management
&lt;/h2&gt;

&lt;p&gt;This is where I started, and it's the feature I use most.&lt;/p&gt;

&lt;p&gt;Suppose your project has three repos: &lt;code&gt;acme-backend&lt;/code&gt;, &lt;code&gt;acme-frontend&lt;/code&gt;, and &lt;code&gt;acme-translations&lt;/code&gt;. You're about to work on ticket PROJ-1234. Running &lt;code&gt;t3_ticket PROJ-1234&lt;/code&gt; creates this structure:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt9n5aucch3fz0ia5404.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt9n5aucch3fz0ia5404.png" alt="diagram" width="784" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each ticket gets its own directory containing one git worktree per affected repo — lightweight checkouts that share the &lt;code&gt;.git&lt;/code&gt; directory with the main clone but have their own branch and working tree. A shared &lt;code&gt;.env.worktree&lt;/code&gt; file provides allocated ports, database name, and variant configuration.&lt;/p&gt;

&lt;p&gt;After creating the worktrees, &lt;code&gt;t3_setup&lt;/code&gt; provisions the environment:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Symlinks&lt;/strong&gt; — &lt;code&gt;.venv&lt;/code&gt;, &lt;code&gt;node_modules&lt;/code&gt;, &lt;code&gt;.python-version&lt;/code&gt;, and configurable shared directories are symlinked from the main repo (so you don't reinstall dependencies for every worktree)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment files&lt;/strong&gt; — &lt;code&gt;.env.worktree&lt;/code&gt; with unique ports, database URL, variant-specific overrides&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database&lt;/strong&gt; — creates an isolated DB, imports from a snapshot or dump, runs migrations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;direnv&lt;/strong&gt; — auto-loads environment variables when you &lt;code&gt;cd&lt;/code&gt; into the worktree&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend dependencies&lt;/strong&gt; — installs if the lockfile changed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then &lt;code&gt;t3_start&lt;/code&gt; brings everything up: Docker services, migrations, backend server, frontend dev server. Each worktree is fully isolated — its own database, its own ports, its own services. You can have ticket 1234 and ticket 5678 running simultaneously without conflicts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why this matters
&lt;/h3&gt;

&lt;p&gt;Without isolation, the most common failure is contamination between tickets. You're working on ticket A, make a database change, then switch to ticket B which expected the old schema — migrations fail, the frontend shows stale data, and you spend time figuring out what went wrong. Worktree isolation avoids this. Each ticket is a clean room.&lt;/p&gt;

&lt;p&gt;The other benefit is parallelism. While waiting for CI on ticket A, start working on ticket B in a completely separate environment. No branch switching, no stashing, no "wait, which database am I pointing at?"&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-tenant awareness
&lt;/h3&gt;

&lt;p&gt;If your project serves multiple tenants — each with their own configuration, feature flags, and sometimes database — teatree handles that too. The variant system (&lt;code&gt;wt_detect_variant&lt;/code&gt;) auto-detects the target tenant from ticket labels, descriptions, or external trackers, then provisions tenant-specific databases, environment variables, and configuration. Feature flag checks during code review ensure changes are properly scoped per tenant.&lt;/p&gt;

&lt;p&gt;The project overlay wires in your tenant-to-variant mapping; teatree handles the rest. This means "set up a worktree for ticket X" automatically produces an environment configured for the correct tenant — no manual env file editing, no guesswork about which tenant you're in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why &lt;code&gt;t3_ticket&lt;/code&gt; instead of raw git commands
&lt;/h3&gt;

&lt;p&gt;The convention is &lt;code&gt;&amp;lt;ticket&amp;gt;/&amp;lt;repo&amp;gt;/&lt;/code&gt; — a ticket directory containing worktrees. Raw &lt;code&gt;git worktree add&lt;/code&gt; creates flat worktrees at whatever path you give it, which breaks the ticket-directory structure that every other tool expects. &lt;code&gt;t3_ticket&lt;/code&gt; enforces the convention, handles branch naming (with your prefix), and creates worktrees across all affected repos in one call. The skill file marks this as &lt;code&gt;(Non-Negotiable)&lt;/code&gt; because flat worktrees cause subtle breakage downstream.&lt;/p&gt;




&lt;h2&gt;
  
  
  The overlay and extension system
&lt;/h2&gt;

&lt;p&gt;Teatree knows how to create worktrees, allocate ports, and orchestrate a development lifecycle. It doesn't know how to start &lt;em&gt;your&lt;/em&gt; backend, import &lt;em&gt;your&lt;/em&gt; database, or create &lt;em&gt;your&lt;/em&gt; merge requests. That project-specific knowledge lives in a &lt;strong&gt;project overlay&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The three-layer architecture
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnylhll95t2svhgbya9us.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnylhll95t2svhgbya9us.png" alt="diagram" width="784" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When teatree needs to do something project-specific (start the backend, import a database, create an MR), it calls an &lt;strong&gt;extension point&lt;/strong&gt; through a registry. The registry resolves the implementation using a 3-layer priority:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Priority&lt;/th&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Highest&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Project&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Your overlay's &lt;code&gt;project_hooks.py&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;t3_start&lt;/code&gt; that runs Docker + Django + Angular&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Middle&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Framework&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Framework integration (e.g., Django)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;wt_post_db&lt;/code&gt; that runs &lt;code&gt;manage.py migrate&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lowest&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Default&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Teatree core fallback&lt;/td&gt;
&lt;td&gt;Usually a no-op or "not configured" message&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The registry itself is simple — 45 lines of Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_LAYERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;framework&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;project&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;_LAYER_RANK&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;layer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;layer&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_LAYERS&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;span class="n"&gt;_registry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;]]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;layer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;entries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setdefault&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
    &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;[:]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;lyr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;lyr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;lyr&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;layer&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;layer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;_LAYER_RANK&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;entries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# highest priority = last entry
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;KeyError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No handler registered for extension point &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Registering a handler at the &lt;code&gt;"project"&lt;/code&gt; layer automatically overrides anything at &lt;code&gt;"framework"&lt;/code&gt; or &lt;code&gt;"default"&lt;/code&gt;. The framework layer is there so teatree can ship framework integrations (Django is the first) that work out of the box but can still be overridden by project-specific needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  What an overlay looks like
&lt;/h3&gt;

&lt;p&gt;A project overlay is a directory with this structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;acme-overlay/
├── SKILL.md                    # Skill description + loading order
├── scripts/
│   └── lib/
│       ├── bootstrap.sh        # Shell wrappers (sourced after teatree)
│       ├── shell_helpers.sh    # Env loading, variant detection
│       └── project_hooks.py    # Extension point overrides
├── hook-config/
│   ├── context-match.yml       # Patterns that trigger this overlay
│   └── reference-injections.yml # References to load per lifecycle phase
└── references/
    ├── prerequisites-and-setup.md
    ├── troubleshooting.md
    └── playbooks/
        └── README.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;project_hooks.py&lt;/code&gt; file registers your overrides:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lib.registry&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;register&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register_acme&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wt_env_extra&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;envfile&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;envfile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ACME_API_KEY=dev-key&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wt_db_import&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variant&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;main_repo&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Import from your team's shared dump
&lt;/span&gt;        &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lib.db&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;db_restore&lt;/span&gt;
        &lt;span class="nf"&gt;db_restore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;main_repo&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/dumps/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;variant&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_latest.sql&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wt_run_backend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;
        &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;manage.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;runserver&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0:8000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                      &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wt_env_extra&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wt_env_extra&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;project&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wt_db_import&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wt_db_import&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;project&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wt_run_backend&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wt_run_backend&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;project&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The teatree core scripts call &lt;code&gt;registry.call("wt_run_backend")&lt;/code&gt;, and your project handler runs instead of the default "not configured" stub. You only override what you need — everything else falls through to the framework or default layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  There are 25 extension points
&lt;/h3&gt;

&lt;p&gt;They cover the full lifecycle:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Extension Points&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Workspace setup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;wt_symlinks&lt;/code&gt;, &lt;code&gt;wt_env_extra&lt;/code&gt;, &lt;code&gt;wt_services&lt;/code&gt;, &lt;code&gt;wt_detect_variant&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;wt_db_import&lt;/code&gt;, &lt;code&gt;wt_post_db&lt;/code&gt;, &lt;code&gt;wt_restore_ci_db&lt;/code&gt;, &lt;code&gt;wt_reset_passwords&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dev servers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;wt_run_backend&lt;/code&gt;, &lt;code&gt;wt_run_frontend&lt;/code&gt;, &lt;code&gt;wt_build_frontend&lt;/code&gt;, &lt;code&gt;wt_start_session&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Testing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;wt_run_tests&lt;/code&gt;, &lt;code&gt;wt_trigger_e2e&lt;/code&gt;, &lt;code&gt;wt_quality_check&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Delivery&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;wt_create_mr&lt;/code&gt;, &lt;code&gt;wt_monitor_pipeline&lt;/code&gt;, &lt;code&gt;wt_send_review_request&lt;/code&gt;, &lt;code&gt;wt_fetch_failed_tests&lt;/code&gt;, &lt;code&gt;wt_fetch_ci_errors&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ticket management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ticket_check_deployed&lt;/code&gt;, &lt;code&gt;ticket_update_external_tracker&lt;/code&gt;, &lt;code&gt;ticket_get_mrs&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Follow-up&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;followup_enrich_data&lt;/code&gt;, &lt;code&gt;followup_enrich_dashboard&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;/t3-setup&lt;/code&gt; wizard can scaffold an overlay for you. Tell it your repos, your backend framework, and your database, and it generates the skeleton with commented-out examples for each relevant extension point. From there, fill in the blanks — or ask your AI agent to fill them in if it already knows your codebase (e.g., after working in the repos for a while).&lt;/p&gt;

&lt;h3&gt;
  
  
  The sourcing chain
&lt;/h3&gt;

&lt;p&gt;Shell functions are loaded in order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In .zshrc:&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; ~/.teatree                                     &lt;span class="c"&gt;# load config&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$T3_REPO&lt;/span&gt;&lt;span class="s2"&gt;/scripts/lib/bootstrap.sh"&lt;/span&gt;            &lt;span class="c"&gt;# teatree core functions&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$T3_OVERLAY&lt;/span&gt;&lt;span class="s2"&gt;/scripts/lib/bootstrap.sh"&lt;/span&gt;         &lt;span class="c"&gt;# project overlay overrides&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The overlay's bootstrap has a guard — it checks that teatree was sourced first (&lt;code&gt;_T3_SCRIPTS_DIR&lt;/code&gt; must be set). This prevents confusing errors from running the overlay standalone.&lt;/p&gt;

&lt;p&gt;Inside Python scripts, the pattern is similar:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;lib.init&lt;/span&gt;
&lt;span class="n"&gt;lib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                 &lt;span class="c1"&gt;# registers defaults + auto-detects framework
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lib.project_hooks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;register_project&lt;/span&gt;
&lt;span class="nf"&gt;register_project&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;              &lt;span class="c1"&gt;# registers project overrides at 'project' layer
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lib.registry&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ext&lt;/span&gt;
&lt;span class="nf"&gt;ext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wt_post_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# calls highest-priority handler
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Auto-loading hooks
&lt;/h2&gt;

&lt;p&gt;Skills don't help if the agent doesn't load them. I got tired of manually telling it which skill to read, so I added a hook that suggests the right skills automatically based on what you're doing.&lt;/p&gt;

&lt;p&gt;The mechanism is &lt;code&gt;ensure-skills-loaded.sh&lt;/code&gt;, a hook that runs before every message (in Claude Code, this is a &lt;code&gt;UserPromptSubmit&lt;/code&gt; hook; other agent platforms would use their own equivalent). It does three things:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wk4z05w7ht6n32dr4py.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wk4z05w7ht6n32dr4py.png" alt="diagram" width="784" height="1768"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Project context detection
&lt;/h3&gt;

&lt;p&gt;The hook scans all skill directories for &lt;code&gt;hook-config/context-match.yml&lt;/code&gt; files. If any pattern in the file matches the current working directory or the active-repo tracker, that skill is identified as the project overlay. This is how teatree knows you're working in a specific project without you having to say so.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# hook-config/context-match.yml&lt;/span&gt;
&lt;span class="na"&gt;cwd_patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;acme-backend"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;acme-frontend"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your &lt;code&gt;$PWD&lt;/code&gt; contains &lt;code&gt;acme-backend&lt;/code&gt;, the hook knows you're in the acme project and will suggest loading the &lt;code&gt;ac-acme&lt;/code&gt; overlay alongside whatever lifecycle skill you need.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Intent detection
&lt;/h3&gt;

&lt;p&gt;The hook parses the prompt to figure out which lifecycle phase you're in. It checks for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;URL patterns&lt;/strong&gt; — a GitLab issue URL triggers &lt;code&gt;t3-ticket&lt;/code&gt;, a Sentry URL triggers &lt;code&gt;t3-debug&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keyword patterns&lt;/strong&gt; — "implement" triggers &lt;code&gt;t3-code&lt;/code&gt;, "push" triggers &lt;code&gt;t3-ship&lt;/code&gt;, "broken" triggers &lt;code&gt;t3-debug&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;End-of-session phrases&lt;/strong&gt; — "done", "all set", "that's it" triggers &lt;code&gt;t3-retro&lt;/code&gt; (only if at least one other skill was loaded this session)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bare imperative verbs&lt;/strong&gt; — "Fix the login page" triggers &lt;code&gt;t3-code&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If nothing matches and you're in project context, it defaults to &lt;code&gt;t3-code&lt;/code&gt; — because most prompts in a project directory are about coding.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Dependency resolution and suggestion
&lt;/h3&gt;

&lt;p&gt;Once the hook knows which skill you need, it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parses the skill's &lt;code&gt;requires:&lt;/code&gt; frontmatter to find dependencies&lt;/li&gt;
&lt;li&gt;Checks which skills are already loaded (tracked in a session file)&lt;/li&gt;
&lt;li&gt;Builds a suggestion list of skills that need loading&lt;/li&gt;
&lt;li&gt;Adds companion skills (e.g., &lt;code&gt;ac-django&lt;/code&gt; for backend work in a Django project)&lt;/li&gt;
&lt;li&gt;Adds reference file injections from &lt;code&gt;reference-injections.yml&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The output looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LOAD THESE SKILLS NOW: /t3-workspace, /t3-code, /ac-acme.
ACME references to read: references/prerequisites-and-setup.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent sees this as a system message and loads the skills before doing anything else. The wording is intentionally forceful ("LOAD THESE SKILLS NOW") — softer phrasing ("Consider loading...") gets ignored by models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Symlink health checks
&lt;/h3&gt;

&lt;p&gt;The hook also runs a once-per-session health check on skills that you maintain (determined by an ownership config):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verifies skill symlinks are actual symlinks (not stale copies)&lt;/li&gt;
&lt;li&gt;Checks that the source is a real git repository (not a downloaded zip)&lt;/li&gt;
&lt;li&gt;Validates that symlinks point into git repos (so retrospective commits work)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If anything is broken, it either auto-fixes (re-running the installer) or warns with a specific remediation.&lt;/p&gt;




&lt;h2&gt;
  
  
  The retrospective loop
&lt;/h2&gt;

&lt;p&gt;After every non-trivial session, &lt;code&gt;t3-retro&lt;/code&gt; runs a retrospective — a systematic audit of the conversation that produces concrete skill improvements and optionally contributes them upstream.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figu93yip3et13p3r9dsq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figu93yip3et13p3r9dsq.png" alt="diagram" width="784" height="1848"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What the audit catches
&lt;/h3&gt;

&lt;p&gt;The retrospective categorizes issues into specific types:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;What went wrong&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;False completion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claimed "done" without full verification&lt;/td&gt;
&lt;td&gt;Said feature was complete but didn't run the test suite&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Skill not loaded&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A relevant skill existed but wasn't loaded&lt;/td&gt;
&lt;td&gt;Worked in project context without the overlay&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Playbook miss&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A playbook covered the task but wasn't consulted&lt;/td&gt;
&lt;td&gt;Didn't check the deployment playbook before pushing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Over-engineering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Did unnecessary work&lt;/td&gt;
&lt;td&gt;Built a migration when admin config would have sufficed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Under-engineering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Missed required work&lt;/td&gt;
&lt;td&gt;Updated the backend but forgot the frontend changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hook gap&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Auto-loading should have triggered but didn't&lt;/td&gt;
&lt;td&gt;Hook didn't detect intent from "fix the flaky test"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stale guidance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Followed outdated instructions&lt;/td&gt;
&lt;td&gt;Playbook referenced pre-refactoring patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For each issue, the retrospective determines the root cause and writes the fix directly into the skill system — a new guardrail, an updated playbook, a troubleshooting entry, a hook pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where improvements go
&lt;/h3&gt;

&lt;p&gt;The retrospective respects a clear hierarchy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Project overlay&lt;/strong&gt; (&lt;code&gt;$T3_OVERLAY&lt;/code&gt;) — receives project-specific improvements (troubleshooting, playbooks, guardrails). This is the default target when &lt;code&gt;T3_CONTRIBUTE&lt;/code&gt; is &lt;code&gt;false&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core skills&lt;/strong&gt; (&lt;code&gt;$T3_REPO&lt;/code&gt;) — only modified when &lt;code&gt;T3_CONTRIBUTE=true&lt;/code&gt;, and only for generic improvements (missing verification steps, hook gaps, stale core guidance)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personal config&lt;/strong&gt; (memory files, agent config like &lt;code&gt;AGENTS.md&lt;/code&gt;) — for user preferences and environment-specific facts. Also serves as a fallback location when the overlay isn't maintained by the user.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The contribution model
&lt;/h3&gt;

&lt;p&gt;When you enable &lt;code&gt;T3_CONTRIBUTE=true&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The retrospective creates a local commit on the current branch in your fork. It never pushes automatically.&lt;/li&gt;
&lt;li&gt;A privacy scan checks for emails, home directory paths, API keys, internal hostnames, and any terms in &lt;code&gt;$T3_BANNED_TERMS&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;When you're ready, &lt;code&gt;/t3-contribute&lt;/code&gt; reviews what will be pushed, checks for fork divergence, and optionally opens an issue on the upstream repo.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The idea is that every user's failures make the system better for all users — but only through an explicit, reviewed contribution path. Nothing happens without your consent. The default is &lt;code&gt;T3_CONTRIBUTE=false&lt;/code&gt;, which means the retrospective only improves your project overlay and personal config.&lt;/p&gt;

&lt;h3&gt;
  
  
  A concrete example
&lt;/h3&gt;

&lt;p&gt;Suppose during a session, the agent set up a multi-repo worktree and claimed it was ready, but the backend server failed to start due to port conflicts with a previous worktree. The agent didn't verify that the infrastructure was actually running before declaring complete.&lt;/p&gt;

&lt;p&gt;The retrospective would:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit&lt;/strong&gt;: Identify this as "false completion" — claimed infrastructure ready without verification evidence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Root cause&lt;/strong&gt;: The &lt;code&gt;t3-workspace&lt;/code&gt; script runs through all setup steps but has no way for projects to define and verify health checks before the agent declares the worktree usable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix (core)&lt;/strong&gt;: Add a new extension point &lt;code&gt;wt_health_check&lt;/code&gt; to &lt;code&gt;t3-workspace&lt;/code&gt; that projects can implement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix (overlay)&lt;/strong&gt;: Implement &lt;code&gt;wt_health_check&lt;/code&gt; in the project's &lt;code&gt;project_hooks.py&lt;/code&gt; to curl the backend, check the frontend dev server, verify the database is accessible&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify&lt;/strong&gt;: Check that the skill file parses, the extension point is registered correctly, and the overlay hook runs without errors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commit&lt;/strong&gt;: If &lt;code&gt;T3_CONTRIBUTE=true&lt;/code&gt;, commit the core extension point to the fork's teatree core skills; overlay changes go to the project overlay repo&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Next time the agent sets up a worktree, &lt;code&gt;t3-workspace&lt;/code&gt; runs the project's health checks before finishing — the core provides the mechanism, the project overlay provides the specifics. Both are enforced going forward.&lt;/p&gt;

&lt;h3&gt;
  
  
  It adds up
&lt;/h3&gt;

&lt;p&gt;A single retrospective might fix one guardrail. After enough sessions, you've accumulated a lot of them — each one from a specific failure that actually happened.&lt;/p&gt;




&lt;h2&gt;
  
  
  Companion skills
&lt;/h2&gt;

&lt;p&gt;Teatree handles the lifecycle — ticket intake, worktree management, TDD, review, delivery. It doesn't know about your programming language's conventions or your framework's best practices. That's what companion skills are for.&lt;/p&gt;

&lt;p&gt;Companion skills are standalone skills that live in separate repos and are loaded alongside teatree when relevant. I maintain a few (&lt;a href="https://github.com/souliane/skills" rel="noopener noreferrer"&gt;souliane/skills&lt;/a&gt;) covering Django and Python conventions, but the best companion skill for your stack is one you find (or build) yourself. I wrote a separate post about &lt;a href="https://dev.to/souliane/skill-driven-development-transferring-your-craft-to-ai-agents"&gt;skill-driven development and the skills I'm open-sourcing&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The project overlay's &lt;code&gt;hook-config/context-match.yml&lt;/code&gt; wires companion skills to repo patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;companion_skills&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ac-django&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;acme-backend"&lt;/span&gt;
  &lt;span class="na"&gt;ac-python&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;acme-backend"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the hook detects you're working in &lt;code&gt;acme-backend&lt;/code&gt;, it suggests loading &lt;code&gt;ac-django&lt;/code&gt; and &lt;code&gt;ac-python&lt;/code&gt; alongside the lifecycle skill. You get framework conventions without cluttering the core lifecycle skills with language-specific details.&lt;/p&gt;

&lt;p&gt;This separation matters. Django conventions change on a different cadence than worktree management. Keeping them in separate skills means you can update one without touching the other, and teams using Flask or Express aren't burdened with Django-specific guidance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Companion skills vs framework layer
&lt;/h3&gt;

&lt;p&gt;These are different things. The &lt;strong&gt;framework layer&lt;/strong&gt; is teatree's built-in middle priority in the 3-layer extension point registry — it ships stock implementations for common frameworks (e.g., a Django integration that auto-registers &lt;code&gt;manage.py migrate&lt;/code&gt; as the post-DB hook). &lt;strong&gt;Companion skills&lt;/strong&gt; are external standalone skills that teach the agent coding conventions — they don't register extension points, they provide guidelines. The framework layer handles &lt;em&gt;infrastructure&lt;/em&gt; (how to run migrations); companion skills handle &lt;em&gt;conventions&lt;/em&gt; (how to write good Django code).&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;An AI coding agent (the auto-loading hooks currently target &lt;a href="https://docs.anthropic.com/en/docs/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;, but the skills and scripts work with any agent that can read files and run commands)&lt;/li&gt;
&lt;li&gt;Python 3.12+&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt; (Python package manager)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;p&gt;Teatree requires a local git clone — it has shared infrastructure (&lt;code&gt;scripts/&lt;/code&gt;, &lt;code&gt;references/&lt;/code&gt;, &lt;code&gt;integrations/&lt;/code&gt;) that lives outside the individual skill directories, so &lt;code&gt;npx skills add&lt;/code&gt; alone isn't enough.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/souliane/teatree/fork" rel="noopener noreferrer"&gt;Fork the repo on GitHub&lt;/a&gt; (or just clone it directly if you don't plan to contribute back), then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone git@github.com:YOUR_USERNAME/teatree.git ~/workspace/teatree
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/workspace/teatree
./scripts/install_skills.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The install script creates symlinks from your agent's skills directory to the clone. Then open your agent and run &lt;code&gt;/t3-setup&lt;/code&gt; — it handles config, shell integration, hooks, and optionally scaffolds a project overlay for your repos.&lt;/p&gt;

&lt;p&gt;If you want the retrospective loop to write improvements back into skill files, set &lt;code&gt;T3_CONTRIBUTE=true&lt;/code&gt; in &lt;code&gt;~/.teatree&lt;/code&gt; (created by &lt;code&gt;/t3-setup&lt;/code&gt;). This requires a fork — the agent pushes to your fork, not to the upstream repo.&lt;/p&gt;

&lt;p&gt;The setup wizard:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Checks prerequisites&lt;/strong&gt; — verifies all required tools are installed, reports a summary table&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creates &lt;code&gt;~/.teatree&lt;/code&gt;&lt;/strong&gt; — asks for workspace path, branch prefix, issue tracker, chat platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaffolds a project overlay&lt;/strong&gt; (optional) — ask it about your repos, framework, and database, and it generates the skeleton&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configures shell integration&lt;/strong&gt; — adds sourcing lines to &lt;code&gt;.zshrc&lt;/code&gt; or &lt;code&gt;.bashrc&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Installs skill symlinks&lt;/strong&gt; — creates the symlink chain from the agent's skills directory to your clone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configures hooks&lt;/strong&gt; — sets up &lt;code&gt;ensure-skills-loaded.sh&lt;/code&gt; and the statusline (Claude Code-specific; other agents would configure their own hooks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runs a smoke test&lt;/strong&gt; — verifies hooks parse, statusline runs, Python imports work&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After setup, restart your agent (or start a new conversation). Try: "start working on ticket PROJ-1234" — the hook should suggest &lt;code&gt;/t3-ticket&lt;/code&gt; + &lt;code&gt;/t3-workspace&lt;/code&gt;, and the agent will take it from there.&lt;/p&gt;

&lt;p&gt;You can re-run &lt;code&gt;/t3-setup&lt;/code&gt; at any time as a health check. It validates the existing installation, checks for broken symlinks, verifies hook wording, and reports what needs fixing.&lt;/p&gt;

&lt;h3&gt;
  
  
  The directory structure after setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/
├── .teatree                    # Config file (sourced by shell)
├── .local/share/teatree/       # Runtime data (ticket cache, dashboard, MR reminders, cache)
├── .claude/                    # Claude Code example (adapt paths for your agent)
│   ├── CLAUDE.md               # Agent instructions (skill-loading block)
│   ├── settings.json           # Hooks, statusline
│   └── skills/
│       ├── t3-ticket -&amp;gt; ~/workspace/teatree/t3-ticket
│       ├── t3-code -&amp;gt; ~/workspace/teatree/t3-code
│       ├── ...
│       └── ac-acme -&amp;gt; ~/workspace/acme-overlay
└── workspace/
    ├── teatree/                # Teatree clone (or fork)
    ├── acme-overlay/           # Project overlay
    ├── acme-backend/           # Main repo clone
    ├── acme-frontend/          # Main repo clone
    └── ac/                     # Ticket worktrees
        ├── 1234/
        │   ├── acme-backend/   # Worktree
        │   ├── acme-frontend/  # Worktree
        │   └── .env.worktree   # Shared env
        └── 5678/
            └── ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The symlinks ensure that skill files always resolve to the live git clone. This is important for the retrospective — when the agent writes improvements to skill files, the changes land in a real git repository where they can be committed and pushed.&lt;/p&gt;




&lt;h2&gt;
  
  
  When it helps (and when it doesn't)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;It helps most with:&lt;/strong&gt; structured, repeatable processes that span multiple repos or require project-specific knowledge. Ticket intake, worktree setup, TDD cycles, code review, MR creation, CI debugging. The kind of work that eats hours but follows a pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It helps less with:&lt;/strong&gt; one-off creative decisions, highly ambiguous tasks, or projects simple enough that a single repo with &lt;code&gt;npm start&lt;/code&gt; covers everything. If your development workflow is "edit a file and push," teatree is overkill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The sweet spot&lt;/strong&gt; is when you have enough friction that encoding it pays off through repetition. The project works for my workflow but hasn't been tested beyond that. If something doesn't click for your setup, open an issue or a PR. Or point your AI agent at the problem and let it fix things until it works for you — that's kind of the point.&lt;/p&gt;

&lt;h3&gt;
  
  
  A note on security
&lt;/h3&gt;

&lt;p&gt;Teatree skills are prompt instructions — they control what your AI agent does. That makes the supply chain a security surface. The defaults are conservative: self-improvement is off (&lt;code&gt;T3_CONTRIBUTE=false&lt;/code&gt;), pushing is disabled (&lt;code&gt;T3_PUSH=false&lt;/code&gt;), and there is no auto-update mechanism. You opt in to each level of automation explicitly. If you use a fork from someone else, you're trusting that person's skill files as agent instructions — review changes before pulling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why "teatree"?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;TEA&lt;/strong&gt;'s &lt;strong&gt;E&lt;/strong&gt;xtensible &lt;strong&gt;A&lt;/strong&gt;rchitecture for work*&lt;em&gt;tree&lt;/em&gt;* management. Also, teatree oil cuts through grime, which felt fitting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/souliane/teatree" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://github.com/souliane/teatree/blob/main/LICENSE" rel="noopener noreferrer"&gt;MIT License&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
