<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rick Fleming</title>
    <description>The latest articles on DEV Community by Rick Fleming (@rickfleming).</description>
    <link>https://dev.to/rickfleming</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3845458%2F02c05de4-721d-4a96-ad24-9a0df677238f.png</url>
      <title>DEV Community: Rick Fleming</title>
      <link>https://dev.to/rickfleming</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rickfleming"/>
    <language>en</language>
    <item>
      <title>Past the Worktree</title>
      <dc:creator>Rick Fleming</dc:creator>
      <pubDate>Wed, 29 Apr 2026 03:54:20 +0000</pubDate>
      <link>https://dev.to/rickfleming/past-the-worktree-73</link>
      <guid>https://dev.to/rickfleming/past-the-worktree-73</guid>
      <description>&lt;p&gt;If you have used a desktop AI coding tool in the last several months, you have probably noticed they all solve the same problem the same way. You start a session for a task. The app spins up an isolated copy of your repo. The agent runs there. When you are done, you push, merge, or throw it away.&lt;/p&gt;

&lt;p&gt;The thing under the hood doing the isolation, in nearly every case, is &lt;code&gt;git worktree&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Pull up the docs for any of the desktop tools that wrap Claude Code, Codex, or similar agents in a multi-session GUI shell. You will find a worktrees tab in the settings, or a setup-script editor that targets per-thread environments, or a help article explaining that each agent runs in its own checked-out copy of the repo. The product names rotate. The mechanism does not.&lt;/p&gt;

&lt;p&gt;This is what isolation looks like across the category. Worktrees are how it works.&lt;/p&gt;

&lt;p&gt;The thing is, worktrees were not designed for this.&lt;/p&gt;

&lt;h2&gt;
  
  
  What worktrees were for
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;git worktree&lt;/code&gt; shipped in Git 2.5, in July 2015 (&lt;a href="https://github.blog/open-source/git/git-2-5-including-multiple-worktrees-and-triangular-workflows/" rel="noopener noreferrer"&gt;source&lt;/a&gt;). The original use case was a developer with one machine, one editor, and one annoying problem: you are hours into a feature branch, an urgent bug comes in, and you do not want to commit your half-baked work or stash it. So you spin up a second working tree on &lt;code&gt;main&lt;/code&gt; in another directory, fix the bug, push, and come back. The shape of the feature is human, occasional, short-lived. Two or three working trees, hours to a couple of days alive, paired with one developer's attention.&lt;/p&gt;
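&lt;p&gt;That flow can be sketched end to end. A scratch repo stands in for your real one, and the directory and branch names are illustrative:&lt;/p&gt;

```shell
# Scratch repo so the flow is reproducible; paths here are illustrative.
set -e
base=$(mktemp -d)
git init -q -b main "$base/repo"
cd "$base/repo"
git config user.email dev@example.com
git config user.name Dev
git commit -q --allow-empty -m "initial"

# Hours into a feature branch, with half-baked, uncommitted work:
git switch -q -c feature
echo "half-baked" > feature.txt

# The classic move: a second working tree on main, in another directory.
git worktree add -q "$base/hotfix" main
cd "$base/hotfix"
echo "fix" > bug.txt
git add bug.txt
git commit -q -m "fix urgent bug"       # a push would normally follow

# Back in the feature tree, the half-baked work is exactly as you left it.
cd "$base/repo"
cat feature.txt
git worktree remove "$base/hotfix"
```

&lt;p&gt;The feature tree never has to commit or stash anything; the second tree carries the hotfix. That is the shape the command was built for.&lt;/p&gt;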

&lt;p&gt;You can squint and tell yourself multi-agent AI work is the same shape. You would be missing what is different about it.&lt;/p&gt;

&lt;p&gt;A modern AI session might fork five attempts at the same task in a minute, or restore to a state from twenty checkpoints ago, or pick up a session another teammate started on a different machine. The cadence is wrong for worktrees. The number of branches is wrong. The lifecycle is wrong. And a few sharp limitations that humans only hit occasionally turn into near-constant friction for AI workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  What worktrees can't carry
&lt;/h2&gt;

&lt;p&gt;A working AI session is more than a copy of your repo. The agent is having a conversation with you. It took a step at minute 18 that turned out to be wrong, you steered it back at minute 22, and at minute 40 you want to walk back to minute 17 to try a different approach entirely. None of that lives in git.&lt;/p&gt;

&lt;p&gt;Worktrees only carry what git tracks. The agent's conversation history: not in git. The DVR-style recording of the session, with command output and intermediate file states scrubbable on a timeline: not in git. The decision trail (every moment you considered a different approach, every detour you abandoned, every save point you marked because the next change felt risky): not in git.&lt;/p&gt;

&lt;p&gt;Tools that are worktree-based have to bolt these onto the side. They write conversation logs to disk somewhere outside the worktree. They track session metadata in their own database. They wire up cleanup logic so a deleted worktree also deletes the conversation file that referenced it. Each piece works individually. The shape of the seams between them is where the friction lives.&lt;/p&gt;

&lt;p&gt;The fundamental version of the same point: every operation an agent or a user takes inside a worktree-based tool has to round-trip through git. Save your work? Make a commit. Revert? &lt;code&gt;git reset&lt;/code&gt; or &lt;code&gt;git checkout&lt;/code&gt; something. Switch to another attempt? Switch worktrees. Throw it away? Delete the worktree and clean up the registered ref. Decide you wanted that work after all? Hope you did not already delete it. Humans tolerate this because git is the price of admission for working in a team. AI agents do not have a stake in that price. They just want to write files, run tests, and try again.&lt;/p&gt;
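&lt;p&gt;The ceremony is concrete enough to run. A sketch of the round trip in a scratch repo (branch and tag names are illustrative):&lt;/p&gt;

```shell
set -e
base=$(mktemp -d)
git init -q -b main "$base/repo"
cd "$base/repo"
git config user.email dev@example.com
git config user.name Dev
git commit -q --allow-empty -m "base"
git tag base

# "Save your work" means make a commit.
echo "attempt" > work.txt
git add work.txt
git commit -q -m "WIP: attempt 1"

# "Switch to another attempt" means another worktree on another branch.
git worktree add -q "$base/attempt2" -b attempt2 base

# "Revert" means a hard reset.
git reset -q --hard base

# "Throw it away" means remove the worktree and prune its registration.
git worktree remove "$base/attempt2"
git worktree prune

# "Decide you wanted it after all" means hoping the reflog still has it.
git branch rescue "HEAD@{1}"
git log -1 --format=%s rescue           # the WIP commit, rescued
```

&lt;p&gt;Every step is a git verb, and the last one only works as long as the reflog entry has not expired.&lt;/p&gt;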

&lt;h2&gt;
  
  
  Branching from a moment, not a ref
&lt;/h2&gt;

&lt;p&gt;Worktrees branch off git refs. You ask for a new worktree and you point it at a branch, commit, or tag. The fork is a thing in git's mental model.&lt;/p&gt;

&lt;p&gt;A real session works on a finer grain than that. You want to fork off the moment when the agent was about to try a different approach but you told it to keep going. You want to fork off the state right before a risky refactor, regardless of whether that state was ever a commit. You want to fork off "save point twenty minutes ago, before any of this conversation went where it went." Those moments are not refs. There is nothing for git to fork from.&lt;/p&gt;
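&lt;p&gt;The constraint is easy to demonstrate: git will fork a worktree from any commit-ish, even one no branch points at, but a state that was never committed has no object to fork from. A sketch in a scratch repo:&lt;/p&gt;

```shell
set -e
base=$(mktemp -d)
git init -q -b main "$base/repo"
cd "$base/repo"
git config user.email dev@example.com
git config user.name Dev

echo "v1" > file.txt
git add file.txt
git commit -q -m "minute 10"
mid=$(git rev-parse HEAD)               # a moment that happens to be a commit

echo "v2" > file.txt                    # "minute 17": edited, never committed
git commit -q -am "minute 25"           # minute 17's state is now overwritten

# git will happily fork from any commit, referenced by a branch or not...
git worktree add -q --detach "$base/fork" "$mid"
cat "$base/fork/file.txt"               # back at minute 10

# ...but the minute-17 state was never an object in git,
# so there is nothing to pass to worktree add. That moment is gone.
```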

&lt;p&gt;In Taskeract, every checkpoint is a moment, and every moment is forkable. Click an older node in the history graph, work forward, save: a new line peels off the timeline at exactly that point. The base node is forkable. The first checkpoint is forkable. The hundredth checkpoint is forkable. Multiple lines can fan out from any single node. The graph is not a tree of refs grafted onto a parent branch. It is a tree of moments, and the entire tree is browsable, restorable, and branchable from any vertex.&lt;/p&gt;

&lt;p&gt;This is what we mean when we say it is not tied to a linear base point. Worktree models tend to assume one base, one feature branch off it, occasionally a stash or a temporary detour. Real AI work fans out. Five attempts at the same starting state. Three explorations from three different mid-session checkpoints. A fork off a fork off a fork. The graph holds all of it without any of it becoming a real branch on the remote.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we built instead
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://taskeract.com" rel="noopener noreferrer"&gt;Taskeract&lt;/a&gt;'s isolation model is not git worktrees. It is not commits, not branches, not anything else from the git repertoire. It is a separate primitive we call a checkpoint.&lt;/p&gt;

&lt;p&gt;A checkpoint is a save point. Not a commit. Whenever there is uncommitted work, a button in the session header shows the number of changes; click it (or press &lt;code&gt;Mod+K&lt;/code&gt;) and a composer opens with a title field (pre-filled with the timestamp) and an optional description. Type something and hit Enter, or just hit Enter. You can also tell the agent to take the checkpoint, in which case the composer opens with a meaningful title and description the agent thought were worth recording, ready for you to accept, edit, or discard. Either way, the checkpoint records the state of the workspace, the agent's conversation, and the position in the session's recording at that moment, and a node is added to the session's history graph.&lt;/p&gt;

&lt;p&gt;You do not write commit messages. You do not pollute the branch with WIP entries. You do not think about whether to stash or commit before switching contexts. Saves are cheap, the cost is invisible, and you can take fifty of them without any of them ever showing up in the git branch reviewers will eventually look at.&lt;/p&gt;

&lt;p&gt;Restore is the inverse. Click any node in the graph and click Restore. The workspace materializes there, the conversation jumps to the matching turn, the recording rewinds to the same point, and the session resumes as if the intervening time had not happened. No detached-HEAD warning. No &lt;code&gt;git reset --hard&lt;/code&gt;. No risk of losing uncommitted work, because anything you had unsaved gets auto-captured as a WIP checkpoint on the line you walked away from, available with a simple restore if you change your mind.&lt;/p&gt;

&lt;p&gt;The graph itself is the source of truth. Not a branch listing. Not a reflog. Not a stash entry that you might forget about. The graph shows every checkpoint as a node, every fork as a divergent line, and the active node as wherever the workspace currently is. Worktree-based tools cannot really render this view because git does not carry enough of the necessary state.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-attempt is a workflow now
&lt;/h2&gt;

&lt;p&gt;This is what makes "give me five different ways to do X" actually work. The agent itself can take checkpoints, navigate them, and produce sibling tips on the timeline graph for you to pick from. Restore to a starting state, run an attempt, save the tip, restore back, run a different attempt, save another tip. Five forks off the same starting point, all visible in the same graph, all selectable, all comparable.&lt;/p&gt;

&lt;p&gt;There is no worktree shuffle. There is no per-attempt branch ceremony. The agent does not have to convince git that this fifth experimental idea deserves a real branch name on the remote. The save points are the structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Diffs that read like code
&lt;/h2&gt;

&lt;p&gt;Every checkpoint knows what changed in it, all the way back to the base branch. Select a checkpoint, click View diff, and the panel swaps for a full-screen diff of the cumulative changes that node represents. Close, and you are back at the graph with your selection intact.&lt;/p&gt;

&lt;p&gt;The diff itself is semantic, not a wall of plus and minus signs. It understands the language you are reading: it lines up code by structure rather than by character offset, recognizes when a function moved instead of changed, and ignores the noise that text-only diff tools surface. Renaming a variable does not look like a hundred-line rewrite. Reformatting a block does not bury the one real change underneath. The diff reads more like a code review and less like a patch.&lt;/p&gt;

&lt;p&gt;You can compare two checkpoints by hopping between them. You can read what a long line of work looks like in aggregate without ever materializing it on disk. You can spot the moment a regression went in by walking the timeline. Worktree-based tools cannot really do any of this on top of git. Most do not try.&lt;/p&gt;

&lt;h2&gt;
  
  
  Save versus publish
&lt;/h2&gt;

&lt;p&gt;Saving and publishing are different operations.&lt;/p&gt;

&lt;p&gt;On Pro, your teammates are already inside the session with you. The timeline syncs live, they can open it on their own machine, take their own checkpoints on it, branch off any node in the graph. They do not have to wait for anything to be "ready" to see what is going on. The session is the shared workspace.&lt;/p&gt;

&lt;p&gt;Publish is the separate step for everything outside that boundary: pushing the work out to git as a branch that reviewers, CI, and the merge process can act on. Click Publish and your work goes out as one clean commit, ready for review or merge. Take fifty checkpoints to get there; the people looking at the PR see one tidy commit. Republish after more work and the branch updates. No rebase choreography, no squash-merge ritual, no interactive cleanup before pushing.&lt;/p&gt;
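&lt;p&gt;For comparison, the closest manual git analog to "many saves, one published commit" is a soft reset to the merge base followed by a single commit. This is not Taskeract's implementation, just the git-level equivalent of the collapse, in a scratch repo:&lt;/p&gt;

```shell
set -e
base=$(mktemp -d)
git init -q -b main "$base/repo"
cd "$base/repo"
git config user.email dev@example.com
git config user.name Dev
git commit -q --allow-empty -m "base"

git switch -q -c feature
for i in 1 2 3; do                      # stand-in for fifty WIP saves
  echo "step $i" >> work.txt
  git add work.txt
  git commit -q -m "WIP $i"
done

# Collapse everything since the merge base into one clean commit:
git reset -q --soft "$(git merge-base main feature)"
git commit -q -m "Add work.txt"
git rev-list --count main..feature      # 1
```

&lt;p&gt;The difference is that in plain git this is destructive: the WIP commits survive only in the reflog, not as a browsable timeline.&lt;/p&gt;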

&lt;p&gt;The fifty checkpoints are still in your timeline. They are still browsable, still restorable, still part of the session. They just are not in what got published, because nobody reviewing the PR needs to read fifty WIP entries to understand what changed. Worktree-based tools do not have a way to think about save and publish as separate things; in those models, every save is a commit, and every commit is something the git side will eventually see.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sessions that travel
&lt;/h2&gt;

&lt;p&gt;Pro adds the part that no worktree-based tool can really offer: the timeline travels.&lt;/p&gt;

&lt;p&gt;Every checkpoint is end-to-end encrypted on the device that took it before it leaves the machine. The encrypted timeline syncs to your other devices, and to teammates in your organization. Open the same session from a different laptop and the workspace materializes at the latest saved checkpoint, the conversation history is there, the recording plays back, the graph is the same shape. Take a new checkpoint anywhere in that group and the rest see it within seconds. Only the devices on your account hold the keys; the cloud cannot read the contents.&lt;/p&gt;

&lt;p&gt;The team angle is the strongest one. When you review a pull request, you can open the actual session that wrote it and watch the agent's full conversation, the timeline of decisions, and the recording. When you pick up an issue someone else started, the agent picker shows their existing session as a one-click attach. The work continues on your machine from where they left off, with the full graph intact. Cross-device is not a feature bolted on after the fact. It is what falls out of having a save primitive that is not tied to git's local-machine semantics.&lt;/p&gt;

&lt;p&gt;This is genuinely hard to do with worktrees. Worktrees live on a single machine's filesystem. Their state lives in the git ref database, the working tree, and untracked files that git refuses to acknowledge. Reproducing one on a second machine means cloning, branch tracking, manual setup, and hoping the result matches. Reproducing the agent's conversation that goes with it is an entirely separate problem. Reproducing the recording is a third one. Nobody has built any of it because it is not really doable inside the model.&lt;/p&gt;
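&lt;p&gt;The first gap is easy to show: a clone (standing in for the "second machine") reproduces only what git tracks, and the session state sitting beside the tree stays behind. File names here are hypothetical:&lt;/p&gt;

```shell
set -e
base=$(mktemp -d)
git init -q -b main "$base/repo"
cd "$base/repo"
git config user.email dev@example.com
git config user.name Dev
echo "tracked work" > code.txt
git add code.txt
git commit -q -m "session work"

echo "agent chat" > conversation.log    # session state git never sees

# "Second machine": the clone gets the commits, and only the commits.
git clone -q "$base/repo" "$base/machine2"
ls "$base/machine2"                     # code.txt only; no conversation.log
```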

&lt;h2&gt;
  
  
  Where this goes
&lt;/h2&gt;

&lt;p&gt;The worktree-based tools are good. They made parallel agent work tractable for the first time, and we have used them. The point of this article is not that they are bad. It is that they are a clever borrow from a different era, and the borrow is showing its limits.&lt;/p&gt;

&lt;p&gt;A real AI session is more than a checkout of your repo. It is the workspace, the agent's conversation, the recording, and the graph of moments worth coming back to. None of that fits inside &lt;code&gt;git worktree&lt;/code&gt; cleanly, because none of that was what &lt;code&gt;git worktree&lt;/code&gt; was for.&lt;/p&gt;

&lt;p&gt;We built &lt;a href="https://taskeract.com" rel="noopener noreferrer"&gt;Taskeract&lt;/a&gt; on top of git worktrees first. We hit every limit in this article, repeatedly, and a few we did not write down. So we ripped that layer out and rebuilt around the checkpoint graph instead. The save points carry everything: code, conversation, and recording, all in one node. The graph is the source of truth, every node in it is forkable, and the timeline travels across your devices and your team. Worktrees are no longer in the picture.&lt;/p&gt;

&lt;p&gt;The future state of AI development is here. The question is whether your tooling lets you reach it.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>developerworkflow</category>
      <category>checkpoints</category>
      <category>gitworktrees</category>
    </item>
    <item>
      <title>The Shift That Already Happened</title>
      <dc:creator>Rick Fleming</dc:creator>
      <pubDate>Fri, 27 Mar 2026 03:03:29 +0000</pubDate>
      <link>https://dev.to/rickfleming/the-shift-that-already-happened-1kb1</link>
      <guid>https://dev.to/rickfleming/the-shift-that-already-happened-1kb1</guid>
      <description>&lt;p&gt;A year ago, if you wanted AI help writing code, you opened your editor and used whatever was built in. Copilot, Cursor, Windsurf. They've gotten genuinely capable. Agent modes that make multi-file changes, run commands, iterate on errors. These aren't just autocomplete anymore.&lt;/p&gt;

&lt;p&gt;But something else was gaining traction at the same time, and it's quietly changing how serious AI-assisted development actually works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The CLI agents
&lt;/h2&gt;

&lt;p&gt;Claude Code, OpenAI's Codex CLI, OpenCode, and others work directly in your terminal. They operate in your actual development environment with full access to the filesystem, your shell, your toolchain. They read your codebase, create files, run tests, fix what breaks, and iterate until the job is done.&lt;/p&gt;

&lt;p&gt;Editor agent modes can do a lot of this too. The difference isn't really about what the agent can do in theory. It's about what model is doing the work and what it costs you to use it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The model economics
&lt;/h2&gt;

&lt;p&gt;Every tool in this space is grappling with the same problem: the best AI models are expensive to run, and someone has to pay for them.&lt;/p&gt;

&lt;p&gt;Editor-embedded tools bundle model access into their subscription. Cursor uses a credit system (introduced June 2025) where credits deplete at different rates depending on which model handles the request. Their Pro plan is $20/month, Ultra is $200/month with a much larger credit pool. If you exceed your credits, you pay overages at API rates. Their agent and edit features only work with Cursor's own custom models. You can't bring your own API key for those.&lt;/p&gt;

&lt;p&gt;Windsurf recently restructured to a quota system with daily and weekly refreshes. Pro is $20/month, and they added a Max tier at $200/month for heavier usage. Individual users can still bring their own API keys, but teams and enterprise users can't.&lt;/p&gt;

&lt;p&gt;GitHub Copilot uses premium request allowances. Copilot Pro+ at $39/month includes 1,500 premium requests, with overages at $0.04 each. When you exceed your allowance without paying overages, you fall back to a less capable model.&lt;/p&gt;

&lt;p&gt;CLI agents connect to model providers directly. Claude Code authenticates with a Claude Max subscription ($100 or $200/month), giving you Opus with weekly limits. Codex CLI can authenticate with your ChatGPT Pro subscription ($200/month) for GPT-5.4 Pro access. You can also use API keys with per-token billing if you prefer.&lt;/p&gt;

&lt;p&gt;At the $200/month price point, both approaches have limits. Claude Max and ChatGPT Pro use rolling time windows (5-hour and daily resets). Cursor Ultra and Windsurf Max use credit or quota pools. Both sides offer the option to pay overages at API rates when you exceed your allowance. The details of how much usage you actually get for $200/month are hard to compare directly since each platform measures differently, but anecdotally the model provider subscriptions tend to be more generous for heavy agent use than the equivalent editor tier.&lt;/p&gt;

&lt;p&gt;The other difference is model choice. With CLI agents, you pick your model and your provider. Claude Code runs Opus because that's what Anthropic offers through Claude Max. Codex CLI runs whatever OpenAI makes available through ChatGPT Pro. With editor subscriptions, you use whatever models the editor vendor has chosen to offer through their system, and for Cursor specifically, agent features only work with their custom models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;Every AI coding tool manages context, and every one of them hits limits eventually. CLI agents are no exception. Claude Code compacts conversation history when sessions get long. Auto-compaction kicks in around 80% of the context window, and earlier parts of the conversation get summarized. This is a real limitation that affects output quality over long sessions.&lt;/p&gt;

&lt;p&gt;Editor-embedded tools face the same fundamental constraint, with an additional layer of complexity. They're assembling context from codebase indexes, open files, retrieval systems, and file references. Some show a usage meter so you can see the context window filling up. The context management is sophisticated, but it's also something you end up thinking about. Which files to reference, when to start a fresh session, how to keep the AI aware of what matters.&lt;/p&gt;

&lt;p&gt;CLI agents have a more direct relationship with your project. The agent reads files from the filesystem when it needs them, rather than depending on what a retrieval system surfaced or which files happen to be open. The context window sizes are comparable when using the same underlying models, but you tend to spend less time managing the AI's awareness of your project and more time on the actual problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The workflow gap
&lt;/h2&gt;

&lt;p&gt;Here's the thing about CLI agents. They live in the terminal. You start a session, the agent works, and then you're left with a pile of changes in a directory somewhere. Turning that into a reviewed, tested, merged pull request is still on you. And if you want to run multiple agents in parallel on the same repo? Good luck managing the git conflicts.&lt;/p&gt;

&lt;p&gt;This is the gap that desktop AI workspaces fill. Not by replacing the CLI agents, but by giving them a proper environment to work in.&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://taskeract.com" rel="noopener noreferrer"&gt;Taskeract&lt;/a&gt;, we built the layer that wraps around these agents. Every session gets its own isolated git worktree, so agents never step on each other's work, or yours. You can run Claude Code in one session and Codex in another, both working on the same project simultaneously, on separate branches that won't conflict.&lt;/p&gt;
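&lt;p&gt;The mechanism is plain git underneath: one worktree and one branch per session, so parallel edits land in separate trees. A minimal sketch with hypothetical session names:&lt;/p&gt;

```shell
set -e
base=$(mktemp -d)
git init -q -b main "$base/repo"
cd "$base/repo"
git config user.email dev@example.com
git config user.name Dev
git commit -q --allow-empty -m "base"

# One worktree and one branch per agent session.
git worktree add -q "$base/session-claude" -b session-claude main
git worktree add -q "$base/session-codex" -b session-codex main

# Parallel edits land in separate trees on separate branches; no collisions.
echo "claude edit" > "$base/session-claude/app.txt"
echo "codex edit" > "$base/session-codex/app.txt"
git worktree list
```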

&lt;p&gt;But isolation is just the starting point. The real workflow starts when the code is written.&lt;/p&gt;

&lt;h2&gt;
  
  
  From issue to done
&lt;/h2&gt;

&lt;p&gt;Most tools in this space handle some of the post-coding workflow. You can create PRs, review diffs, push changes. But the full loop (starting from an issue, reviewing changes, creating a PR, monitoring CI, responding to reviewer feedback, merging, and closing the issue) still involves jumping between multiple tools. Your terminal, your browser, your git hosting provider, your issue tracker.&lt;/p&gt;

&lt;p&gt;We built &lt;a href="https://taskeract.com" rel="noopener noreferrer"&gt;Taskeract&lt;/a&gt; to cover the entire loop. Start a session from an issue in GitHub, GitLab, Jira, Linear, or Trello. The agent works in its isolated environment. When it's done, review the changes with syntax-highlighted diffs right in the app. Push, create a PR, see CI status, respond to review threads, and merge, all without leaving the window. The issue automatically advances through its workflow states as the work progresses.&lt;/p&gt;

&lt;p&gt;It's the difference between a tool that helps you write code and a tool that helps you ship code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this is going
&lt;/h2&gt;

&lt;p&gt;The autonomous agent approach with top-tier models is already producing better results at more predictable costs. And once the code is written, the workflow around it matters just as much as the code itself.&lt;/p&gt;

&lt;p&gt;If you've been feeling like AI coding hasn't quite lived up to the promise, it might not be the AI that's the bottleneck. It might be what's around it.&lt;/p&gt;

&lt;p&gt;The shift already happened. The question is whether your workflow has caught up.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>developerworkflow</category>
      <category>cliagents</category>
    </item>
  </channel>
</rss>
