
Dibyanshu kumar


Tag-Team Deep Dive: Architecture & Technical Reference


This is the companion technical reference for "How I Made Claude Code Finish Tasks That Outlast Its Memory." That post covers the problem and the concept — this one covers the full architecture, configuration, and implementation details.


Architecture Overview

Tag-team has three layers:

┌─────────────────────────────────────┐
│  Dispatcher (SKILL.md)              │  ← Your conversation. Stays lean.
│  - Parses arguments                 │
│  - Manages the relay loop           │
│  - Tracks progress                  │
├─────────────────────────────────────┤
│  Workers (spawned agents)           │  ← Disposable. One at a time.
│  - Do the actual work               │
│  - Monitor their own context        │
│  - Write handoff files on exit      │
├─────────────────────────────────────┤
│  Handoff Files (.claude/tag-team/)  │  ← Persistent state on disk.
│  - Structured markdown              │
│  - Self-contained resume context    │
│  - Append-only progress log         │
└─────────────────────────────────────┘

The dispatcher never reads large files or does real work. It spawns agents, reads their short return messages, and appends to a progress log. This is why it can run 10+ iterations without hitting its own context limit.

File Structure

After a 3-worker relay, your .claude/tag-team/ directory looks like this (the third worker finished the task, so it left no handoff file):

.claude/tag-team/
├── progress.md          # Append-only log of all iterations
├── handoff-001.md       # Worker 1's state when it handed off
├── handoff-002.md       # Worker 2's state when it handed off
└── archive-20260401-143022/  # Previous session (if any)
    ├── progress.md
    └── handoff-001.md

Configuration

config.json controls relay behavior:

{
  "max_iterations": 10,
  "handoff_dir": ".claude/tag-team",
  "handoff_prefix": "handoff",
  "progress_file": "progress.md",
  "context_thresholds": {
    "prepare_handoff": 70,
    "force_handoff": 80,
    "emergency_stop": 90
  },
  "worker": {
    "tool_call_soft_limit": 50,
    "mode": "bypassPermissions"
  }
}
| Field | What it does |
| --- | --- |
| max_iterations | Hard cap on relay workers. Prevents runaway loops. |
| context_thresholds.prepare_handoff | At this %, worker finishes current item and stops starting new work. |
| context_thresholds.force_handoff | At this %, worker stops immediately and writes handoff. |
| context_thresholds.emergency_stop | At this %, worker dumps partial state and exits. |
| worker.tool_call_soft_limit | If a worker makes this many tool calls without a context warning, it hands off proactively. Safety net for when the proxy monitoring is delayed. |
| worker.mode | Permission mode for worker agents. bypassPermissions avoids approval prompts mid-relay. |
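
To make the precedence of these thresholds concrete, here is a minimal Python sketch. It assumes the checks run from most to least severe, which the article does not state. Workers apply these rules from their instructions rather than from code, so `handoff_action` and the action names it returns are hypothetical, chosen only for illustration.

```python
import json

# Threshold values copied from the config.json shown above.
CONFIG = json.loads("""
{
  "context_thresholds": {"prepare_handoff": 70, "force_handoff": 80, "emergency_stop": 90},
  "worker": {"tool_call_soft_limit": 50}
}
""")


def handoff_action(context_pct: float, tool_calls: int, config: dict = CONFIG) -> str:
    """Return a (hypothetical) action name for the current context usage."""
    thresholds = config["context_thresholds"]
    # Check the most severe threshold first so it always wins.
    if context_pct >= thresholds["emergency_stop"]:
        return "emergency_stop"      # dump partial state and exit
    if context_pct >= thresholds["force_handoff"]:
        return "force_handoff"       # stop immediately and write the handoff
    if context_pct >= thresholds["prepare_handoff"]:
        return "prepare_handoff"     # finish the current item, start nothing new
    # Safety net: many tool calls without any context warning.
    if tool_calls >= config["worker"]["tool_call_soft_limit"]:
        return "proactive_handoff"
    return "continue"
```

For example, `handoff_action(72, 10)` returns `"prepare_handoff"`, while `handoff_action(10, 50)` returns `"proactive_handoff"` because the tool-call soft limit was reached without any context warning.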

The Dispatcher Loop (Pseudocode)

parse arguments → (fresh | resume | status)

if status:
    read progress.md, list handoff files, display, stop

if resume:
    find highest-numbered handoff file
    set iteration = that number

if fresh:
    archive old session if exists
    create .claude/tag-team/
    write initial progress.md
    set iteration = 0

load worker-instructions.md

while iteration < max_iterations:
    build prompt:
        if iteration == 0: worker instructions + task description
        if iteration > 0:  worker instructions + "read handoff-{N}.md and continue"

    spawn agent(prompt, mode=bypassPermissions)

    parse result:
        "ALL_DONE: ..." → append to progress.md, break
        "HANDOFF: ..." → append to progress.md, increment iteration, continue
        else → check for handoff file: treat as handoff if present, error if missing

display final report
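
The loop above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the actual dispatcher, which lives in SKILL.md and is executed by Claude Code itself. `spawn_agent` is a hypothetical callable standing in for agent spawning, and the zero-padded `handoff-NNN.md` naming is inferred from the file listing earlier in this post.

```python
from pathlib import Path
from typing import Callable


def run_relay(
    task: str,
    worker_instructions: str,
    spawn_agent: Callable[[str], str],  # hypothetical: takes a prompt, returns the worker's message
    handoff_dir: Path,
    start_iteration: int = 0,
    max_iterations: int = 10,
) -> str:
    """Run workers one at a time until one reports ALL_DONE or the cap is hit."""
    handoff_dir.mkdir(parents=True, exist_ok=True)
    progress = handoff_dir / "progress.md"
    iteration = start_iteration
    while iteration < max_iterations:
        if iteration == 0:
            prompt = f"{worker_instructions}\n\n{task}"
        else:
            prompt = (
                f"{worker_instructions}\n\n"
                f"Read handoff-{iteration:03d}.md and continue."
            )
        result = spawn_agent(prompt)

        worker = iteration + 1
        expected_handoff = handoff_dir / f"handoff-{worker:03d}.md"
        if result.startswith("ALL_DONE:"):
            outcome = "ALL_DONE"
        elif result.startswith("HANDOFF:") or expected_handoff.exists():
            # No prefix but a handoff file on disk still counts as a handoff.
            outcome = "HANDOFF"
        else:
            raise RuntimeError(f"Worker {worker} left no handoff file: {result}")

        # Log every iteration, including the final one, before leaving the loop.
        with progress.open("a", encoding="utf-8") as log:
            log.write(f"## Iteration {worker} (Worker-{worker})\n- Result: {outcome}\n\n")

        if outcome == "ALL_DONE":
            return "COMPLETED"
        iteration += 1
    return "MAX_ITERATIONS"
```

Note that the dispatcher only ever touches the worker's short return message and the progress log, which is what keeps its own context small.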

Handoff File Format (Complete)

Every handoff file follows this structure. The sections are mandatory — workers are instructed to fill all of them.

## Mission
- Goal: <original task, copied verbatim>
- Status: <percentage done, items completed/total>
- Next step: <single most important thing for the next worker>

## Technical State
- Files modified: <full paths of every file changed>
- Files created: <full paths of every file created>
- Active entities: <repos, directories, configs, temp files in play>
- Working directory: <cwd if relevant>

## Key Decisions
- Decisions made: <choices that the next worker MUST preserve>
- Dead ends: <approaches that FAILED; saves the next worker from repeating them>
- Constraints: <rules from the original task, copied exactly>

## Progress
- Completed: <numbered list with brief descriptions>
- In progress: <what was being worked on at handoff; include partial state>
- Remaining: <numbered list of what's left>

## Resume Instructions
<Written as a briefing for a new teammate who has never seen this task.
Explicit and actionable. Includes file paths, exact commands, specific
next steps. The next worker reads ONLY this section — it doesn't have
the conversation history.>
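
Because every section is mandatory, a handoff file can be checked mechanically before the next worker relies on it. The following is a minimal Python sketch that assumes the five `##` headings appear exactly as written in the template above; `missing_sections` is a hypothetical helper, not part of tag-team.

```python
import re

# Section headings taken from the handoff template above.
REQUIRED_SECTIONS = [
    "Mission",
    "Technical State",
    "Key Decisions",
    "Progress",
    "Resume Instructions",
]


def missing_sections(handoff_text: str) -> list[str]:
    """Return the required section names that have no '## <name>' heading."""
    found = set(re.findall(r"^## (.+?)\s*$", handoff_text, flags=re.MULTILINE))
    return [name for name in REQUIRED_SECTIONS if name not in found]
```

An empty result means the file has every expected heading. It says nothing about whether the content under each heading is useful.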

Why Each Section Exists

  • Mission: Prevents goal drift across workers. The task description is copied verbatim so Worker 5 is solving the same problem as Worker 1.
  • Technical State: Workers need to know what's on disk. Without this, they waste tool calls re-discovering the file layout.
  • Key Decisions / Dead ends: The highest-value section. Dead ends are the difference between a smart relay and a dumb restart. If Worker 1 learned that batch-reading 20 files blows up context, Worker 2 shouldn't rediscover that.
  • Progress: Explicit item-level tracking. "Completed files 1-80" is actionable. "Made good progress" is not.
  • Resume Instructions: Self-contained. Written for someone with zero context. This is what the next worker actually executes from.

Worker Rules

Workers follow five operational rules designed to prevent the most common failure modes:

1. Interleave reads and writes

The #1 cause of context exhaustion: reading 50 files into context, then trying to write outputs for all of them. By the time you start writing, you've forgotten the early files.

Instead: read 5-8 files, write their outputs, repeat. Each batch is a self-contained unit of work that survives a handoff.
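
Expressed as code, the read-a-few-then-write pattern looks roughly like this. It is a sketch only: the workers are agents following instructions, and `read` and `write` are hypothetical callables standing in for whatever tool calls the task needs. The default batch size of 6 is simply a value inside the 5-8 range mentioned above.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def batches(items: Iterable[str], size: int = 6) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def process_in_batches(
    paths: Iterable[str],
    read: Callable[[str], str],          # hypothetical: load one input
    write: Callable[[str, str], None],   # hypothetical: persist one output
    size: int = 6,
) -> int:
    """Read a small batch, write its outputs, then move on. Returns items completed."""
    completed = 0
    for batch in batches(paths, size):
        contents = [(path, read(path)) for path in batch]  # read only this batch
        for path, text in contents:
            write(path, text)  # output is on disk before more input is read
            completed += 1
    return completed
```

The returned count is the same number a worker would report in its handoff file and return message.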

2. Save work to disk frequently

Every file written to disk is progress that survives a handoff. Unwritten work that's only in the conversation context is lost when the worker exits. Write early, write often.

3. Prefer small committed units

Finish one item completely before starting the next. A half-processed file is harder to resume than an unstarted one — the next worker has to figure out what's done and what isn't.

4. Track your own progress

Workers maintain a mental count of items completed vs. total. This count goes into the handoff file and the return message. Without it, the dispatcher can't report meaningful progress.

5. Don't re-read what the previous worker summarized

If the handoff file says "file X contains a REST endpoint for user creation," trust it. Only re-read source files when you need to generate output from them. This saves context for actual work.

The --skill Flag

The composition model is simple: when --skill is provided, the task description is rewritten to invoke that skill:

# Without --skill
/tag-team "Process all 200 files"
→ Task sent to worker: "Process all 200 files"

# With --skill
/tag-team PROJ-12345 --skill develop
→ Task sent to worker: "Run /develop PROJ-12345. This is a tag-team relay
   — follow the worker instructions for context warning handling and handoff."

The wrapped skill doesn't know it's inside a relay. It just runs normally. If it exhausts context, the worker's context-warning protocol kicks in, writes a handoff, and the next worker resumes the skill from the handoff state.

This means any long-running skill becomes context-resilient without modifying the skill itself.
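
The rewrite itself is a simple string transformation. Here is a minimal Python sketch, assuming the wording shown in the example above is used verbatim; `build_task` is a hypothetical name, since the real rewrite happens inside the dispatcher's instructions.

```python
from typing import Optional

# Wording copied from the example above; \u2014 is the dash used there.
RELAY_NOTE = (
    "This is a tag-team relay \u2014 follow the worker instructions "
    "for context warning handling and handoff."
)


def build_task(task: str, skill: Optional[str] = None) -> str:
    """Return the task description sent to the worker."""
    if skill is None:
        return task  # no --skill: pass the task through unchanged
    return f"Run /{skill} {task}. {RELAY_NOTE}"
```

The wrapped skill receives an ordinary invocation. Only the trailing note tells the worker that relay rules apply.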

Error Handling

Three failure modes and how each is handled:

| Failure | Detection | Response |
| --- | --- | --- |
| Worker exits without ALL_DONE or HANDOFF prefix | Dispatcher checks if handoff file exists at expected path | If file exists: treat as handoff. If not: log error, stop relay, show full output to user. |
| Worker crashes mid-handoff | Handoff file is incomplete or missing expected sections | Dispatcher warns user and stops. Suggests /tag-team resume after manual inspection. |
| Max iterations reached | Loop counter | Dispatcher reports progress and suggests /tag-team resume to continue. |

Workers are instructed to write a handoff file even on errors — partial state is better than no state.

Session Management

Fresh Start with Existing Session

If .claude/tag-team/ already has files from a previous run, the dispatcher warns:

Found existing tag-team session with 3 handoff files.
Use `/tag-team resume` to continue, or confirm to start fresh (will archive old files).

On confirmation, existing files are moved to .claude/tag-team/archive-{timestamp}/. Nothing is deleted.
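
A minimal Python sketch of that archive step, assuming the timestamp format seen in the earlier directory listing (`archive-20260401-143022`) and that only `progress.md` and `handoff-*.md` are moved. `archive_session` is a hypothetical helper name.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def archive_session(handoff_dir: Path, now: Optional[datetime] = None) -> Optional[Path]:
    """Move the old session's files into archive-<timestamp>/. Nothing is deleted."""
    files = sorted(handoff_dir.glob("handoff-*.md"))
    progress = handoff_dir / "progress.md"
    if progress.exists():
        files.append(progress)
    if not files:
        return None  # no previous session to archive

    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    archive = handoff_dir / f"archive-{stamp}"
    archive.mkdir()
    for path in files:
        shutil.move(str(path), str(archive / path.name))
    return archive
```

Earlier archive directories are left alone because the glob only matches top-level `handoff-*.md` files.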

Resume Detection

/tag-team resume globs for handoff-*.md, sorts numerically, takes the highest number, validates the file has the expected sections, then continues the loop from that iteration.
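
The numeric sort matters because a plain string sort would misorder names that are not padded to the same width. A minimal Python sketch of the lookup follows; `latest_handoff` is a hypothetical helper name and section validation is left out.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

HANDOFF_NAME = re.compile(r"^handoff-(\d+)\.md$")


def latest_handoff(handoff_dir: Path) -> Optional[Tuple[int, Path]]:
    """Return (iteration, path) for the highest-numbered handoff file, or None."""
    numbered = []
    for path in handoff_dir.glob("handoff-*.md"):
        match = HANDOFF_NAME.match(path.name)
        if match:  # skip names like handoff-notes.md
            numbered.append((int(match.group(1)), path))
    if not numbered:
        return None
    return max(numbered, key=lambda pair: pair[0])
```

The returned iteration number is what the dispatcher loop would resume from.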

Progress File

progress.md is an append-only log. After each worker:

## Iteration 2 (Worker-2)
- Completed: 2026-04-02T14:35:22Z
- Result: HANDOFF
- Summary: Processed files 81-150. Handed off at context 78%. Next: file 151...

After relay completion:

## Relay Complete
- Total iterations: 3
- Final result: COMPLETED
- Completed: 2026-04-02T14:52:08Z
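
Writing these entries takes little code. The Python sketch below reproduces the per-iteration format shown above, assuming UTC timestamps with a trailing `Z`; `format_entry` and `append_entry` are hypothetical helper names.

```python
from datetime import datetime, timezone
from pathlib import Path


def format_entry(iteration: int, result: str, summary: str, when: datetime) -> str:
    """Render one per-iteration entry in the format shown above."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"## Iteration {iteration} (Worker-{iteration})\n"
        f"- Completed: {stamp}\n"
        f"- Result: {result}\n"
        f"- Summary: {summary}\n"
    )


def append_entry(progress_file: Path, entry: str) -> None:
    """Append an entry; opening in 'a' mode never rewrites earlier iterations."""
    with progress_file.open("a", encoding="utf-8") as log:
        log.write(entry + "\n")
```

Opening the file in append mode is what keeps the log append-only: earlier entries cannot be overwritten by a later worker's summary.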

When Tag-Team Adds Overhead vs. Value

Overhead wins (don't use tag-team):

  • Task fits in one context window
  • Task requires deep cross-file reasoning where splitting work loses coherence
  • Task is interactive and needs human input at multiple points

Value wins (use tag-team):

  • Batch processing 20+ files
  • Multi-phase workflows that routinely exhaust context
  • Any task where you've restarted a conversation more than once
  • Wrapping skills that sometimes fail due to context limits

The break-even point is roughly when you'd need 2+ manual restarts without tag-team.


This is part of a series on scaling Claude Code for enterprise workflows. Previously: How I Stopped Losing Work to Context Window Overflow, How I Taught an AI Agent to Save Its Own Progress, and Centralized Skill Management for Claude Code.
