Adrien Cossa
Introducing Teatree: Parallel Multi-Repo Development with AI Agents

I'm a Customer Success Engineer at Oper Credits. My daily work involves a multi-repo project—backend, frontend, translations, configuration—and I use AI coding agents constantly. The friction isn't about writing code; agents handle that well. It's everything surrounding it: understanding and following different conventions across codebases, coordinating changes across services, managing local environments where our setup diverges from what's in git, and the workflow patterns we could all benefit from encoding.

The agent can figure out most of these things, but it struggles with the specifics — it loops on troubleshooting, tries approaches that don't match the project's actual setup, and burns tokens on trial and error. Teatree started as a way to encode that knowledge so the agent gets it right the first time. It's also an approach that lets you define and automate your personal workflow without creating friction for your team — build it as a proof of concept on your own, then push for adoption once it works.

This post walks through the architecture, the design choices I landed on, and how the pieces fit together. It's long because there's a lot of ground to cover. If you just want the quick pitch, the README has that.


Table of Contents

  1. What it looks like
  2. The problem
  3. Skills as structured markdown
  4. The lifecycle graph
  5. Multi-repo worktree management
  6. The overlay and extension system
  7. Auto-loading hooks
  8. The retrospective loop
  9. Companion skills
  10. Getting started
  11. When it helps (and when it doesn't)

What it looks like

Tell your AI agent what you want. Teatree skills guide it through the entire lifecycle:

> `https://gitlab.com/org/repo/-/issues/1234`

The agent fetches the ticket, creates synchronized worktrees, provisions isolated databases and ports, implements the feature with TDD, writes a test plan, runs E2E tests, self-reviews, then pushes and creates the merge request.

> `Fix PROJ-5678`

The agent fetches the failed test report from CI, reproduces locally, fixes, pushes, and monitors the pipeline until green.

> `Review https://gitlab.com/org/repo/-/merge_requests/456`

The agent fetches the ticket for context, inspects every commit individually, and posts draft review comments inline on the correct file and line.

> `Run the test plan for !789`

The agent generates a test plan from the MR changes, runs E2E tests, and posts evidence screenshots on the MR.

> `Follow up on my open tickets`

The agent batch-processes your assigned tickets, checks CI statuses, nudges stale MRs, and starts work on anything that's ready.


The problem

Modern AI coding agents can handle more than just writing code — they can reason about architecture, run tests, create merge requests, and in some cases drive a feature from ticket to delivery. The bottleneck isn't usually their capability but their efficiency: without your project's specific context, they spend tokens and time rediscovering things you already know. And the knowledge they're missing — your repo layout, your CI conventions, your team's practices, your local tooling — isn't in any training data.

The friction is especially pronounced with:

  • Multi-repo setups — creating branches across 3+ repos for a single ticket, provisioning isolated databases, allocating non-conflicting ports
  • Atypical local environments — personal tooling that differs from what's in git, dev configurations the team hasn't adopted yet
  • Operational workflows — self-reviewing before pushing, creating properly formatted merge requests, monitoring pipelines, running retrospectives

The agent can attempt all of these. But without explicit guidance, it either asks twenty questions or confidently does the wrong thing — and when something fails, it loops on troubleshooting instead of applying the fix you already know.

I tried shell scripts and aliases first, sometimes Python scripts too. They worked for the happy path but couldn't handle the edge cases — the database import that fails because VPN is down, the port conflict because another worktree is still running, the CI format check that rejects your MR title. A shell script can't say "if the test fails, check if it's a known flake — here are the patterns." An AI agent can.

That's the core idea: encode workflow knowledge in a format that an AI agent can read, interpret, and adapt to the current situation. Not as rigid scripts, but as structured instructions with enough context for the agent to handle edge cases intelligently.


Skills as structured markdown

A teatree skill is a markdown file (SKILL.md) with YAML frontmatter. Here's a simplified example:

```markdown
---
name: t3-code
description: Writing code with TDD methodology.
requires:
  - t3-workspace
metadata:
  version: 0.0.1
---

# Writing Code (TDD)

## Dependencies

- **t3-workspace** (required) — provides dev servers for live reload.

## Workflow

### 1. Plan First (Non-Negotiable)

Always make a plan before writing code. Never jump straight to coding.
- Identify scope: which files, modules, and repos are affected.
- Review existing patterns in the codebase before writing new code.

### 2. TDD Cycle

Write failing test → Implement → Green → Refactor

### 3. Follow Conventions

- Language/framework conventions from the project's convention skills.
- Repository-specific patterns take precedence over generic guidance.
```

A few things to note:

Skills typically contain both instructions and scripts. The markdown instructions tell the agent when and why to do things — like handing a capable colleague a detailed runbook. But teatree's skills also include Python scripts for deterministic operations: worktree creation, port allocation, database provisioning, branch finalization. The agent calls these scripts as tools. This is deliberate: a script the agent calls is more robust than a 15-step procedure in a markdown file. The split is practical — instructions for judgment calls, scripts for mechanical work.

Skills declare dependencies. The requires: field in the frontmatter tells the loading system which other skills need to be present. When t3-code is loaded, t3-workspace comes along automatically. This eliminates wasted round-trips where the agent reads a skill, sees "Load /t3-workspace now", and then has to make a second call.

Skills use progressive disclosure. Most SKILL.md files are 80–160 lines, with detailed procedures in references/ files that the agent reads on demand, not upfront. This matters for token economy — loading the typical set of skills for a task stays well within a reasonable context budget.

Skills have rules marked (Non-Negotiable). These are guardrails that the agent must not skip, even if they seem redundant. "Always verify services respond via HTTP before declaring running" sounds obvious, but without it, the agent will say "servers started" after launching the process — without checking whether it actually came up. These guardrails come from real failures.
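That particular guardrail can also be mechanized. A minimal sketch, assuming a plain HTTP health endpoint (the URL and timeout values are illustrative):

```python
# Poll a health URL until it responds instead of trusting that the
# process launch succeeded. Any HTTP answer, even an error status,
# proves the server is actually up and listening.
import time
import urllib.error
import urllib.request

def wait_until_healthy(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once `url` answers at all; False if the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True  # server answered with a success status
        except urllib.error.HTTPError:
            return True  # server is up, even if the status is an error
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # not accepting connections yet
    return False
```

"Servers started" becomes `wait_until_healthy(...)` returning True, not merely the process existing.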


The lifecycle graph

Teatree organizes development into phases, each handled by a dedicated skill:

```mermaid
graph LR
  ticket["t3-ticket<br/>(fetches ticket)"] --> code["t3-code<br/>(implements)"]
  code --> test["t3-test<br/>(tests)"]
  test --> review["t3-review<br/>(reviews)"]
  review --> ship["t3-ship<br/>(delivers)"]
  ship --> retro["t3-retro<br/>(improves skills)"]
  retro -.-> ticket

  ship --> rr["t3-review-request<br/>(notifies reviewers)"]
  debug["t3-debug<br/>(troubleshoots)"] -.-> code
  debug -.-> test
  followup["t3-followup<br/>(batch processes)"] -.-> ticket
  workspace["t3-workspace<br/>(provisions infra)"] -..-> code & test & review & ship
```

The flow is: ticket → code → test → review → ship → retro, with t3-workspace providing infrastructure to all phases and t3-debug available whenever something breaks.

Here's what each skill does:

| Skill | Phase | What it handles |
| --- | --- | --- |
| t3-setup | Bootstrapping | Interactive setup wizard, health checks, overlay scaffolding |
| t3-workspace | Infrastructure | Multi-repo worktrees, port allocation, DB provisioning, env files, dev servers, cleanup |
| t3-ticket | Intake | Fetch the issue, extract acceptance criteria, detect affected repos, detect tenant/variant, create worktrees |
| t3-code | Implementation | Plan-first workflow, TDD cycle, convention enforcement, feature flag checks |
| t3-test | Verification | Test execution, CI interaction, E2E test plans, quality gates |
| t3-debug | Troubleshooting | Systematic 5-phase debugging protocol, user-hint-first investigation |
| t3-review | Code review | Self-review checklist, giving review, receiving feedback |
| t3-ship | Delivery | Commit formatting, branch finalization, MR creation, pipeline monitoring |
| t3-review-request | Notifications | Post MR links to review channels, check for duplicate requests |
| t3-retro | Improvement | Conversation audit, root cause analysis, skill updates, privacy scans |
| t3-contribute | Contribution | Push skill improvements to fork, open upstream issues |
| t3-followup | Batch ops | Process assigned tickets, check CI statuses, nudge stale MRs |

The skills mirror how development actually works. Implementing a ticket touches intake, coding, testing, review, and delivery — often across multiple repos. Making the skills fully independent would mean duplicating knowledge across every one of them, which always diverges over time.

The follow-up dashboard

One skill worth highlighting is t3-followup. It runs your daily routine: batch-processing new tickets, checking CI statuses, advancing tickets through their lifecycle, and nudging reviewers about stale MRs.

As it works, it builds a persistent cache (followup.json) of all in-flight work — tickets, merge requests, pipeline statuses, review request states, and review comment tracking. From that cache, it generates an HTML dashboard:

*(Screenshot: the t3-followup dashboard.)*

The dashboard gives you a single view of everything that's in flight. Tickets show their current lifecycle status. MRs show pipeline results (color-coded pills), review request state, and which review channel they were posted to. Review comments are tracked so you know which discussions are addressed and which are waiting on the reviewer. All of this embeds clickable links — to the ticket, the MR, the CI pipeline, Slack messages, and review comments — so you can jump from the dashboard directly into any conversation or resource.

The cache means t3-followup doesn't re-fetch data unnecessarily — it only checks what's stale. And because it's a plain JSON file, project overlays can inject extra fields (external tracker status, deployment state, tenant info) via the followup_enrich_data extension point. The dashboard renders whatever's in the cache, so overlay-specific columns show up automatically.

Stale tickets are purged automatically — when all MRs for a ticket have been merged for more than 14 days (configurable via T3_FOLLOWUP_PURGE_DAYS), the ticket and its MRs are removed from the cache.


Multi-repo worktree management

This is where teatree started, and it's the feature that makes parallel ticket work possible.

Suppose your project has three repos: acme-backend, acme-frontend, and acme-translations. You're about to work on ticket PROJ-1234. Running t3_ticket PROJ-1234 creates this structure:

```mermaid
graph TD
  subgraph "$WORKSPACE_DIR"
    direction TB
    subgraph "Main repos (default branch)"
      main_be["acme-backend/"]
      main_fe["acme-frontend/"]
      main_tr["acme-translations/"]
    end
    subgraph "Ticket worktrees"
      subgraph "ac/1234/"
        wt_be["acme-backend/<br/>(worktree)"]
        wt_fe["acme-frontend/<br/>(worktree)"]
        wt_tr["acme-translations/<br/>(worktree)"]
        envfile[".env.worktree<br/>(shared ports, DB, variant)"]
      end
      subgraph "ac/5678/"
        wt2_be["acme-backend/<br/>(worktree)"]
        wt2_fe["acme-frontend/<br/>(worktree)"]
      end
    end
  end

  main_be -.->|"git worktree"| wt_be
  main_fe -.->|"git worktree"| wt_fe
  main_tr -.->|"git worktree"| wt_tr
  main_be -.->|"git worktree"| wt2_be
  main_fe -.->|"git worktree"| wt2_fe
```

Each ticket gets its own directory. Inside that directory, each affected repo gets a git worktree — a lightweight checkout that shares the same .git directory as the main clone but has its own branch and working tree. The ticket directory also gets a shared .env.worktree file with allocated ports, database name, and variant configuration.
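The layout and the git commands behind it can be sketched like this. The `ac` prefix, paths, and helper name are illustrative, not t3_ticket's actual code; a real caller would run each command with `subprocess`:

```python
# Build one `git worktree add` command per affected repo, all sharing
# the same branch name and living under <prefix>/<ticket>/<repo>/.
from pathlib import Path

def plan_worktrees(workspace: Path, ticket: str, repos: list[str],
                   prefix: str = "ac") -> list[list[str]]:
    """Plan the worktree creation commands for one ticket directory."""
    branch = f"{prefix}/{ticket}"
    ticket_dir = workspace / prefix / ticket
    return [
        ["git", "-C", str(workspace / repo), "worktree", "add",
         "-b", branch, str(ticket_dir / repo)]
        for repo in repos
    ]
```

Each worktree shares the main clone's object store but gets its own branch and working tree, which is what makes the per-ticket directories cheap.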

After creating the worktrees, t3_setup provisions the environment:

  1. **Symlinks** — `.venv`, `node_modules`, `.python-version`, and configurable shared directories are symlinked from the main repo (so you don't reinstall dependencies for every worktree)
  2. **Environment files** — `.env.worktree` with unique ports, database URL, variant-specific overrides
  3. **Database** — creates an isolated DB, imports from a snapshot or dump, runs migrations
  4. **direnv** — auto-loads environment variables when you cd into the worktree
  5. **Frontend dependencies** — installs if the lockfile changed

Then t3_start brings everything up: Docker services, migrations, backend server, frontend dev server. Each worktree is fully isolated — its own database, its own ports, its own services. You can have ticket 1234 and ticket 5678 running simultaneously without conflicts.
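The "its own ports" part boils down to finding ports that neither another worktree has claimed nor anything else on the machine is using. A minimal sketch (the range and the reservation set are illustrative, not teatree's real allocator):

```python
# Find a port that is both unreserved by other worktrees and actually
# bindable right now. Binding is the only reliable availability check.
import socket

def find_free_port(start: int, end: int, taken: set[int]) -> int:
    """Return the first port in [start, end) that is free and not reserved."""
    for port in range(start, end):
        if port in taken:
            continue  # reserved by another worktree's .env.worktree
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
            except OSError:
                continue  # something else is already listening
            return port
    raise RuntimeError(f"no free port in {start}-{end}")
```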

Why this matters

The most common failure before teatree was contamination between tickets. Working on ticket A, you make a database change. Then you switch to ticket B, which expected the old schema. Migrations fail, the frontend shows stale data, and you spend time figuring out what went wrong. Worktree isolation eliminates this entirely. Each ticket is a clean room.

The other benefit is parallelism. While waiting for CI on ticket A, you can start working on ticket B in a completely separate environment. No branch switching, no stashing, no "wait, which database am I pointing at?"

Multi-tenant awareness

If your project serves multiple tenants — each with their own configuration, feature flags, and sometimes database — teatree handles that too. The variant system (wt_detect_variant) auto-detects the target tenant from ticket labels, descriptions, or external trackers, then provisions tenant-specific databases, environment variables, and configuration. Feature flag checks during code review ensure changes are properly scoped per tenant.

The project overlay wires in your tenant-to-variant mapping; teatree handles the rest. This means "set up a worktree for ticket X" automatically produces an environment configured for the correct tenant — no manual env file editing, no guesswork about which tenant you're in.

Why t3_ticket instead of raw git commands

The convention is `<ticket>/<repo>/` — a ticket directory containing worktrees. A raw `git worktree add` creates flat worktrees at whatever path you give it, which breaks the ticket-directory structure that every other tool expects. t3_ticket enforces the convention, handles branch naming (with your prefix), and creates worktrees across all affected repos in one call. The skill file marks this as (Non-Negotiable) because flat worktrees cause subtle breakage downstream.


The overlay and extension system

Teatree is generic infrastructure. It knows how to create worktrees, allocate ports, and orchestrate a development lifecycle. But it doesn't know how to start your backend, import your database, or create your merge requests. That project-specific knowledge lives in a project overlay.

The three-layer architecture

```mermaid
graph TB
  subgraph "Extension Point Resolution"
    direction TB
    call["registry.call('wt_run_backend')"] --> resolve["Resolve highest priority"]
    resolve --> project["Project Layer<br/>(your overlay)<br/>Priority: highest"]
    resolve --> framework["Framework Layer<br/>(e.g., Django plugin)<br/>Priority: middle"]
    resolve --> default["Default Layer<br/>(teatree core)<br/>Priority: lowest"]
  end

  project -.->|"if registered"| result["Use project handler"]
  framework -.->|"if no project"| result2["Use framework handler"]
  default -.->|"fallback"| result3["Use default (usually no-op)"]

  style project fill:#c8e6c9
  style framework fill:#bbdefb
  style default fill:#f5f5f5
```

When teatree needs to do something project-specific (start the backend, import a database, create an MR), it calls an extension point through a registry. The registry resolves the implementation using a 3-layer priority:

| Priority | Layer | Source | Example |
| --- | --- | --- | --- |
| Highest | Project | Your overlay's `project_hooks.py` | `t3_start` that runs Docker + Django + Angular |
| Middle | Framework | Framework integration (e.g., Django) | `wt_post_db` that runs `manage.py migrate` |
| Lowest | Default | Teatree core fallback | Usually a no-op or "not configured" message |

The registry itself is simple — 45 lines of Python:

```python
from typing import Callable

_LAYERS = ("default", "framework", "project")
_LAYER_RANK = {layer: i for i, layer in enumerate(_LAYERS)}
_registry: dict[str, list[tuple[str, Callable]]] = {}

def register(point: str, fn: Callable, layer: str = "default") -> None:
    entries = _registry.setdefault(point, [])
    entries[:] = [(lyr, func) for lyr, func in entries if lyr != layer]
    entries.append((layer, fn))
    entries.sort(key=lambda x: _LAYER_RANK[x[0]])

def get(point: str) -> Callable | None:
    entries = _registry.get(point)
    if not entries:
        return None
    return entries[-1][1]  # highest priority = last entry

def call(point: str, *args, **kwargs):
    fn = get(point)
    if fn is None:
        raise KeyError(f"No handler registered for extension point {point!r}")
    return fn(*args, **kwargs)

Registering a handler at the "project" layer automatically overrides anything at "framework" or "default". The framework layer is there so teatree can ship framework integrations (Django is the first) that work out of the box but can still be overridden by project-specific needs.

What an overlay looks like

A project overlay is a directory with this structure:

```
acme-overlay/
├── SKILL.md                    # Skill description + loading order
├── scripts/
│   └── lib/
│       ├── bootstrap.sh        # Shell wrappers (sourced after teatree)
│       ├── shell_helpers.sh    # Env loading, variant detection
│       └── project_hooks.py    # Extension point overrides
├── hook-config/
│   ├── context-match.yml       # Patterns that trigger this overlay
│   └── reference-injections.yml # References to load per lifecycle phase
└── references/
    ├── prerequisites-and-setup.md
    ├── troubleshooting.md
    └── playbooks/
        └── README.md
```

The project_hooks.py file registers your overrides:

```python
from lib.registry import register

def register_acme():
    def wt_env_extra(envfile):
        with open(envfile, "a") as f:
            f.write("ACME_API_KEY=dev-key\n")

    def wt_db_import(db_name, variant, main_repo):
        # Import from your team's shared dump
        from lib.db import db_restore
        db_restore(db_name, f"{main_repo}/dumps/{variant}_latest.sql")
        return True

    def wt_run_backend(*args):
        import subprocess
        subprocess.run(["python", "manage.py", "runserver", "0.0.0.0:8000"],
                      check=False)

    register("wt_env_extra", wt_env_extra, "project")
    register("wt_db_import", wt_db_import, "project")
    register("wt_run_backend", wt_run_backend, "project")
```

The teatree core scripts call registry.call("wt_run_backend"), and your project handler runs instead of the default "not configured" stub. You only override what you need — everything else falls through to the framework or default layer.

There are 25 extension points

They cover the full lifecycle:

| Category | Extension points |
| --- | --- |
| Workspace setup | `wt_symlinks`, `wt_env_extra`, `wt_services`, `wt_detect_variant` |
| Database | `wt_db_import`, `wt_post_db`, `wt_restore_ci_db`, `wt_reset_passwords` |
| Dev servers | `wt_run_backend`, `wt_run_frontend`, `wt_build_frontend`, `wt_start_session` |
| Testing | `wt_run_tests`, `wt_trigger_e2e`, `wt_quality_check` |
| Delivery | `wt_create_mr`, `wt_monitor_pipeline`, `wt_send_review_request`, `wt_fetch_failed_tests`, `wt_fetch_ci_errors` |
| Ticket management | `ticket_check_deployed`, `ticket_update_external_tracker`, `ticket_get_mrs` |
| Follow-up | `followup_enrich_data`, `followup_enrich_dashboard` |

The /t3-setup wizard can scaffold an overlay for you. Tell it your repos, your backend framework, and your database, and it generates the skeleton with commented-out examples for each relevant extension point. From there, fill in the blanks — or ask your AI agent to fill them in if it already knows your codebase (e.g., after working in the repos for a while).

The sourcing chain

Shell functions are loaded in order:

```shell
# In .zshrc:
source ~/.teatree                                     # load config
source "$T3_REPO/scripts/lib/bootstrap.sh"            # teatree core functions
source "$T3_OVERLAY/scripts/lib/bootstrap.sh"         # project overlay overrides
```

The overlay's bootstrap has a guard — it checks that teatree was sourced first (_T3_SCRIPTS_DIR must be set). This prevents confusing errors from running the overlay standalone.

Inside Python scripts, the pattern is similar:

```python
import lib.init
lib.init.init()                 # registers defaults + auto-detects framework
from lib.project_hooks import register_project
register_project()              # registers project overrides at 'project' layer
from lib.registry import call as ext
ext("wt_post_db", project_dir)  # calls highest-priority handler
```

Auto-loading hooks

Skills are useless if they're not loaded. The whole point of teatree's hook system is that the right skills load automatically — you shouldn't have to think about which skill to activate before asking the agent to do something.

The mechanism is ensure-skills-loaded.sh, a hook that runs before every message (in Claude Code, this is a UserPromptSubmit hook; other agent platforms would use their own equivalent). It does three things:

```mermaid
flowchart TD
  A["User sends prompt"] --> B["Hook receives prompt + session context"]
  B --> C{"Detect project context"}
  C -->|"PWD matches overlay patterns"| D["Set project_context = true"]
  C -->|"No match"| E["Generic mode"]

  D --> F{"Detect intent from prompt"}
  E --> F

  F -->|"URL patterns"| G["gitlab.com/.../issues/123 → t3-ticket"]
  F -->|"Keyword patterns"| H["'implement' → t3-code<br/>'push' → t3-ship<br/>'broken' → t3-debug"]
  F -->|"End-of-session"| I["'done' / 'all set' → t3-retro"]
  F -->|"No match + project context"| J["Default → t3-code"]

  G & H & I & J --> K{"Resolve dependencies"}
  K --> L["t3-code requires t3-workspace"]
  K --> M["t3-ticket requires t3-workspace"]

  L & M --> N{"Check already loaded"}
  N -->|"Not loaded"| O["Add to suggestion list"]
  N -->|"Already loaded"| P["Skip"]

  O --> Q{"Project context?"}
  Q -->|"Yes"| R["Add overlay skill + companion skills"]
  Q -->|"No"| S["Skip"]

  R & S & P --> T["Output: LOAD THESE SKILLS NOW: /t3-workspace, /t3-code, /ac-acme"]
```

1. Project context detection

The hook scans all skill directories for hook-config/context-match.yml files. If any pattern in the file matches the current working directory or the active-repo tracker, that skill is identified as the project overlay. This is how teatree knows you're working in a specific project without you having to say so.

```yaml
# hook-config/context-match.yml
cwd_patterns:
  - "acme-backend"
  - "acme-frontend"
```

If your $PWD contains acme-backend, the hook knows you're in the acme project and will suggest loading the ac-acme overlay alongside whatever lifecycle skill you need.
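The matching itself can be as simple as substring checks against each overlay's patterns. A minimal sketch, with the YAML inlined as a dict for brevity (the helper name is illustrative):

```python
# Detect which project overlay, if any, owns the current working
# directory by matching it against each overlay's cwd_patterns.
def detect_overlay(cwd: str, overlays: dict[str, list[str]]) -> str | None:
    """Return the first overlay whose patterns match the working directory."""
    for overlay, patterns in overlays.items():
        if any(pat in cwd for pat in patterns):
            return overlay
    return None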

2. Intent detection

The hook parses the prompt to figure out which lifecycle phase you're in. It checks for:

  • URL patterns — a GitLab issue URL triggers t3-ticket, a Sentry URL triggers t3-debug
  • Keyword patterns — "implement" triggers t3-code, "push" triggers t3-ship, "broken" triggers t3-debug
  • End-of-session phrases — "done", "all set", "that's it" triggers t3-retro (only if at least one other skill was loaded this session)
  • Bare imperative verbs — "Fix the login page" triggers t3-code

If nothing matches and you're in project context, it defaults to t3-code — because most prompts in a project directory are about coding.

3. Dependency resolution and suggestion

Once the hook knows which skill you need, it:

  1. Parses the skill's requires: frontmatter to find dependencies
  2. Checks which skills are already loaded (tracked in a session file)
  3. Builds a suggestion list of skills that need loading
  4. Adds companion skills (e.g., ac-django for backend work in a Django project)
  5. Adds reference file injections from reference-injections.yml

The output looks like:

```
LOAD THESE SKILLS NOW: /t3-workspace, /t3-code, /ac-acme.
ACME references to read: references/prerequisites-and-setup.md
```

The agent sees this as a system message and loads the skills before doing anything else. The wording is intentionally forceful ("LOAD THESE SKILLS NOW") — softer phrasing ("Consider loading...") gets ignored by models.

Symlink health checks

The hook also runs a once-per-session health check on skills that you maintain (determined by an ownership config):

  • Verifies skill symlinks are actual symlinks (not stale copies)
  • Checks that the source is a real git repository (not a downloaded zip)
  • Validates that symlinks point into git repos (so retrospective commits work)

If anything is broken, it either auto-fixes (re-running the installer) or warns with a specific remediation.
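The checks above amount to a few filesystem predicates. A sketch, assuming a skills directory of symlinks into git clones (the classification strings are illustrative):

```python
# Classify one installed skill entry: it should be a symlink whose
# target lives inside a git repository, so retrospective commits work.
from pathlib import Path

def check_skill_link(link: Path) -> str:
    if not link.is_symlink():
        return "stale copy: reinstall as symlink"
    target = link.resolve()
    if not target.exists():
        return "dangling symlink"
    # Walk up from the target looking for a .git directory.
    for parent in [target, *target.parents]:
        if (parent / ".git").exists():
            return "ok"
    return "source is not a git repository"
```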


The retrospective loop

After every non-trivial session, t3-retro runs a retrospective. It's a systematic audit of the conversation that produces concrete skill improvements, ending with an opportunity to push your own improvements upstream.

```mermaid
flowchart TD
  A["Session ends or user triggers /t3-retro"] --> B["1. Conversation Audit"]
  B --> C["Categorize every issue:<br/>false completion, skill gap,<br/>playbook miss, over/under-engineering,<br/>hook gap, stale guidance"]

  C --> D["2. Root Cause Analysis"]
  D --> E["Why did each issue happen?<br/>Missing guardrail? Vague verification?<br/>Skill not loaded? Outdated step?"]

  E --> F["3. Fix Skills"]
  F --> G{"Where does the fix go?"}

  G -->|"Project-specific"| H["Write to $T3_OVERLAY<br/>(troubleshooting, playbooks, guardrails)"]
  G -->|"Core skill gap<br/>(T3_CONTRIBUTE=true)"| I["Write to $T3_REPO<br/>(skill files, references, hooks)"]
  G -->|"User preference"| J["Write to MEMORY.md<br/>(personal config only)"]

  H & I & J --> K["4. Quality Checks"]
  K --> L["No duplication across skills?"]
  K --> M["Single source of truth?"]
  K --> N["Pre-commit hooks pass?"]
  K --> O["Tests pass?"]

  L & M & N & O --> P{"T3_CONTRIBUTE=true?"}
  P -->|"Yes"| Q["5. Commit to Fork<br/>(local only, never auto-pushes)"]
  P -->|"No"| R["Done — overlay improved"]

  Q --> S["6. Privacy Scan<br/>(emails, paths, keys, banned terms)"]
  S --> T["Commit on current branch"]
  T --> U["User runs /t3-contribute later<br/>to review, push, open upstream issue"]
```

What the audit catches

The retrospective categorizes issues into specific types:

| Category | What went wrong | Example |
| --- | --- | --- |
| False completion | Claimed "done" without full verification | Said feature was complete but didn't run the test suite |
| Skill not loaded | A relevant skill existed but wasn't loaded | Worked in project context without the overlay |
| Playbook miss | A playbook covered the task but wasn't consulted | Didn't check the deployment playbook before pushing |
| Over-engineering | Did unnecessary work | Built a migration when admin config would have sufficed |
| Under-engineering | Missed required work | Updated the backend but forgot the frontend changes |
| Hook gap | Auto-loading should have triggered but didn't | Hook didn't detect intent from "fix the flaky test" |
| Stale guidance | Followed outdated instructions | Playbook referenced pre-refactoring patterns |

For each issue, the retrospective determines the root cause and writes the fix directly into the skill system — a new guardrail, an updated playbook, a troubleshooting entry, a hook pattern.

Where improvements go

The retrospective respects a clear hierarchy:

  • Project overlay ($T3_OVERLAY) — receives project-specific improvements (troubleshooting, playbooks, guardrails). This is the default target when T3_CONTRIBUTE is false.
  • Core skills ($T3_REPO) — only modified when T3_CONTRIBUTE=true, and only for generic improvements (missing verification steps, hook gaps, stale core guidance)
  • Personal config (memory files, agent config like AGENTS.md) — for user preferences and environment-specific facts. Also serves as a fallback location when the overlay isn't maintained by the user.

The contribution model

When you enable T3_CONTRIBUTE=true:

  1. The retrospective creates a local commit on the current branch in your fork. It never pushes automatically.
  2. A privacy scan checks for emails, home directory paths, API keys, internal hostnames, and any terms in $T3_BANNED_TERMS.
  3. When you're ready, /t3-contribute reviews what will be pushed, checks for fork divergence, and optionally opens an issue on the upstream repo.

The idea is that every user's failures make the system better for all users — but only through an explicit, reviewed contribution path. Nothing happens without your consent. The default is T3_CONTRIBUTE=false, which means the retrospective only improves your project overlay and personal config.
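A privacy scan of this kind is mostly pattern matching. A minimal sketch; the regexes are illustrative, not teatree's real scanner:

```python
# Flag emails, home-directory paths, and likely keys, plus any terms
# the user listed in T3_BANNED_TERMS (comma-separated).
import os
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "home path": re.compile(r"/(?:home|Users)/\w+"),
    "api key": re.compile(r"(?:api[_-]?key|token)\s*[:=]\s*\S+", re.I),
}

def privacy_scan(text: str, banned_env: str = "T3_BANNED_TERMS") -> list[str]:
    findings = [name for name, pat in PATTERNS.items() if pat.search(text)]
    for term in filter(None, os.environ.get(banned_env, "").split(",")):
        if term.lower() in text.lower():
            findings.append(f"banned term: {term}")
    return findings
```

A non-empty result blocks the commit until the flagged text is cleaned up or explicitly approved.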

A concrete example

Suppose during a session, the agent set up a multi-repo worktree and claimed it was ready, but the backend server failed to start due to port conflicts with a previous worktree. The agent didn't verify that the infrastructure was actually running before declaring complete.

The retrospective would:

  1. Audit: Identify this as "false completion" — claimed infrastructure ready without verification evidence
  2. Root cause: The t3-workspace script runs through all setup steps but has no way for projects to define and verify health checks before the agent declares the worktree usable
  3. Fix (core): Add a new extension point wt_health_check to t3-workspace that projects can implement
  4. Fix (overlay): Implement wt_health_check in the project's project_hooks.py to curl the backend, check the frontend dev server, verify the database is accessible
  5. Verify: Check that the skill file parses, the extension point is registered correctly, and the overlay hook runs without errors
  6. Commit: If T3_CONTRIBUTE=true, commit the core extension point to the fork's teatree core skills; overlay changes go to the project overlay repo

Next time the agent sets up a worktree, t3-workspace runs the project's health checks before finishing — the core provides the mechanism, the project overlay provides the specifics. Both are enforced going forward.

It adds up

A single retrospective might fix one guardrail. After a hundred sessions, the skill system has a hundred guardrails, each one preventing a specific failure mode that actually happened in practice. Not because the model got smarter — because the skills got better.


Companion skills

Teatree handles the lifecycle — ticket intake, worktree management, TDD, review, delivery. But it doesn't know about your programming language's conventions or your framework's best practices. That's what companion skills are for.

Companion skills are optional, standalone skills that complement the lifecycle. They're not part of teatree core — they live in separate repos and are loaded alongside teatree when relevant. I maintain a few (souliane/skills) covering Django and Python conventions (including a code review improvement skill that refines review practices through feedback), but the best companion skill for your stack is one you find (or build) yourself — the agent can search for existing public skills or help you create one from your project's conventions.

The project overlay's hook-config/context-match.yml wires companion skills to repo patterns:

companion_skills:
  ac-django:
    - "acme-backend"
  ac-python:
    - "acme-backend"

When the hook detects you're working in acme-backend, it suggests loading ac-django and ac-python alongside the lifecycle skill. You get framework conventions without cluttering the core lifecycle skills with language-specific details.

This separation matters. Django conventions change on a different cadence than worktree management. Keeping them in separate skills means you can update one without touching the other, and teams using Flask or Express aren't burdened with Django-specific guidance.
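The matching logic the hook applies can be sketched in a few lines (illustrative only; the mapping mirrors the YAML above, and glob-style patterns are an assumption about how repo patterns are matched):

```python
from fnmatch import fnmatch

# Mirrors hook-config/context-match.yml from the example above.
companion_skills = {
    "ac-django": ["acme-backend"],
    "ac-python": ["acme-backend"],
}

def skills_for_repo(repo_name: str) -> list[str]:
    """Return the companion skills whose repo patterns match the current repo."""
    return sorted(
        skill
        for skill, patterns in companion_skills.items()
        if any(fnmatch(repo_name, pattern) for pattern in patterns)
    )
```

For "acme-backend" this yields both skills; for a repo no pattern matches, it yields nothing and the hook suggests only the lifecycle skill.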

Companion skills vs framework layer

These are different things. The framework layer is teatree's built-in middle priority in the 3-layer extension point registry — it ships stock implementations for common frameworks (e.g., a Django integration that auto-registers manage.py migrate as the post-DB hook). Companion skills are external standalone skills that teach the agent coding conventions — they don't register extension points, they provide guidelines. The framework layer handles infrastructure (how to run migrations); companion skills handle conventions (how to write good Django code).
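The priority order of that 3-layer registry can be sketched as follows (illustrative only: the register/resolve names are assumptions, not teatree's actual API; the layer names come from the post):

```python
from typing import Callable

# Lowest to highest priority: a project overlay overrides the framework
# layer, which overrides teatree core.
LAYERS = ("core", "framework", "overlay")
_registry: dict[str, dict[str, Callable]] = {layer: {} for layer in LAYERS}

def register(layer: str, point: str, fn: Callable) -> None:
    """Register an implementation of an extension point at a given layer."""
    _registry[layer][point] = fn

def resolve(point: str) -> Callable:
    """Return the highest-priority implementation of an extension point."""
    for layer in reversed(LAYERS):
        if point in _registry[layer]:
            return _registry[layer][point]
    raise KeyError(f"no implementation registered for {point!r}")
```

Under this scheme a stock Django implementation of a post-DB hook is used automatically, and a project overlay that registers its own implementation silently wins.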


Getting started

Prerequisites

  • An AI coding agent (the auto-loading hooks currently target Claude Code, but the skills and scripts work with any agent that can read files and run commands)
  • Python 3.12+
  • uv (Python package manager)

Installation

Teatree requires a local git clone — it has shared infrastructure (scripts/, references/, integrations/) that lives outside the individual skill directories, so npx skills add alone isn't enough.

Fork the repo on GitHub (or just clone it directly if you don't plan to contribute back), then:

git clone git@github.com:YOUR_USERNAME/teatree.git ~/workspace/teatree
cd ~/workspace/teatree
./scripts/install_skills.sh

The install script creates symlinks from your agent's skills directory to the clone. Then open your agent and run /t3-setup — it handles config, shell integration, hooks, and optionally scaffolds a project overlay for your repos.

If you want the retrospective loop to write improvements back into skill files, set T3_CONTRIBUTE=true in ~/.teatree (created by /t3-setup). This requires a fork — the agent pushes to your fork, not to the upstream repo.

The setup wizard:

  1. Checks prerequisites — verifies all required tools are installed, reports a summary table
  2. Creates ~/.teatree — asks for workspace path, branch prefix, issue tracker, chat platform
  3. Scaffolds a project overlay (optional) — asks about your repos, framework, and database, then generates the skeleton
  4. Configures shell integration — adds sourcing lines to .zshrc or .bashrc
  5. Installs skill symlinks — creates the symlink chain from the agent's skills directory to your clone
  6. Configures hooks — sets up ensure-skills-loaded.sh and the statusline (Claude Code-specific; other agents would configure their own hooks)
  7. Runs a smoke test — verifies hooks parse, statusline runs, Python imports work

After setup, restart your agent (or start a new conversation). Try: "start working on ticket PROJ-1234" — the hook should suggest /t3-ticket + /t3-workspace, and the agent will take it from there.

You can re-run /t3-setup at any time as a health check. It validates the existing installation, checks for broken symlinks, verifies hook wording, and reports what needs fixing.

The directory structure after setup

~/
├── .teatree                    # Config file (sourced by shell)
├── .local/share/teatree/       # Runtime data (ticket cache, dashboard, MR reminders, cache)
├── .claude/                    # Claude Code example (adapt paths for your agent)
│   ├── CLAUDE.md               # Agent instructions (skill-loading block)
│   ├── settings.json           # Hooks, statusline
│   └── skills/
│       ├── t3-ticket -> ~/workspace/teatree/t3-ticket
│       ├── t3-code -> ~/workspace/teatree/t3-code
│       ├── ...
│       └── ac-acme -> ~/workspace/acme-overlay
└── workspace/
    ├── teatree/                # Teatree clone (or fork)
    ├── acme-overlay/           # Project overlay
    ├── acme-backend/           # Main repo clone
    ├── acme-frontend/          # Main repo clone
    └── ac/                     # Ticket worktrees
        ├── 1234/
        │   ├── acme-backend/   # Worktree
        │   ├── acme-frontend/  # Worktree
        │   └── .env.worktree   # Shared env
        └── 5678/
            └── ...

The symlinks ensure that skill files always resolve to the live git clone. This is important for the retrospective — when the agent writes improvements to skill files, the changes land in a real git repository where they can be committed and pushed.
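A self-contained illustration of why that matters (temporary paths standing in for the skills directory and the clone; the symlink mirrors what install_skills.sh sets up):

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# The teatree clone: a real git repo containing skill files.
clone = root / "teatree"
clone.mkdir()
(clone / "SKILL.md").write_text("v1")

# The agent's skills directory holds a symlink into the clone,
# as created by install_skills.sh.
skills_dir = root / "skills"
skills_dir.mkdir()
(skills_dir / "t3-ticket").symlink_to(clone)

# A retrospective edit made through the agent's skills path...
(skills_dir / "t3-ticket" / "SKILL.md").write_text("v2")

# ...is the same file in the clone, where git can commit and push it.
assert (clone / "SKILL.md").read_text() == "v2"
```

Had the skills been copied instead of symlinked, the edit would land in a dead copy and the improvement would never reach version control.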


When it helps (and when it doesn't)

It helps most with: structured, repeatable processes that span multiple repos or require project-specific knowledge. Ticket intake, worktree setup, TDD cycles, code review, MR creation, CI debugging. The kind of work that eats hours but follows a pattern.

It helps less with: one-off creative decisions, highly ambiguous tasks, or projects simple enough that a single repo with npm start covers everything. If your development workflow is "edit a file and push," teatree is overkill.

The sweet spot is when you have enough friction that encoding it pays off through repetition. The project is still experimental — it works for my workflow but hasn't been tested beyond that. If something doesn't click for your setup, open an issue or a PR. Or point your AI agent at the problem and let it fix things until it works for you. That's kind of the point.

A note on security

Teatree skills are prompt instructions — they control what your AI agent does. That makes the supply chain a security surface. The defaults are conservative: self-improvement is off (T3_CONTRIBUTE=false), pushing is disabled (T3_PUSH=false), and there is no auto-update mechanism. You opt in to each level of automation explicitly. If you use a fork from someone else, you're trusting that person's skill files as agent instructions — review changes before pulling.

Why "teatree"?

TEA's Extensible Architecture for work*tree* management. Also: teatree oil cuts through grime, and that's what this does to multi-repo worktree friction.

GitHub | MIT License
