AI Work Doesn't Fail All At Once. It Drifts

A few days ago I was looking at Microsoft's AI Engineering Coach project. It analyzes coding-agent logs after a session ends and surfaces patterns and anti-patterns in how developers worked with AI. The implementation isn't what caught me. It was the assumption underneath it: AI work generates operational signals. That's a bigger idea than it sounds.

For the last year, most AI tooling has focused on one of three things: better agents, better memory, or better orchestration. All three matter, but they all assume the same thing. The agent is the center of the system, and the work is whatever it hands back at the end. We judge a session by whether the final diff looks right, not by what happened on the way there, because nothing in the workflow asks us to look. Lately I've been wondering if that's backwards.

Agents Don't Work. They Take Shifts.

In a previous article, I argued that agents don't really "work." They take shifts. Claude works for a while, then Codex, then Cursor. Sometimes a human steps in, then another model takes over. The work continues. The participants rotate, and each one only sees the part of the work that happened on their own shift.

That's a continuity problem. If nobody, not the agent and not you, has a continuous view of the whole project, things get re-explained, re-decided, and occasionally undone by whoever shows up next without knowing better. That observation led me to build Holistic, a system focused on preserving project continuity across those shifts.

But closing that gap surfaced a second, harder question, and it's the one this piece is actually about. Even within a single shift (one agent, one session, nobody handing off to anyone), the work isn't always doing what it looks like it's doing. Projects rarely fail all at once. They drift. The agent gets stuck in a retry loop. Requirements slowly fall out of focus. Research expands without execution. Scope quietly grows beyond the original task. Tests keep failing for an hour while everyone hopes the next attempt will somehow be different. Nothing is technically broken. The session is still running and the agent is still producing output, but the work is degrading underneath all of it.

Drift Is Expensive, and It Hides

When work drifts, it creates waste, and not just wasted tokens. Wasted engineering time. Wasted compute. Wasted attention. The cost is often invisible because the project still appears healthy: the agent is busy, files are changing, commits are happening, progress is being reported. But underneath the surface, effort is accumulating without producing meaningful forward movement. By the time the failure becomes obvious, the waste has already been incurred.

You've seen the shape of this even if you haven't named it. A retry loop runs for an hour. A requirement gets forgotten halfway through implementation. A solution gets built, removed, and rebuilt. A new agent spends thirty minutes rediscovering decisions that were already made an hour earlier. These aren't isolated mistakes. They're forms of operational waste, and traditional manufacturing systems have spent decades trying to catch exactly this kind of thing before it compounds.

What Manufacturing Already Knows

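In lean manufacturing, an andon board is the signal on the production line that shows where something has gone wrong. Andon boards exist to surface problems while work is still in progress and the problem is still recoverable, not after the shift ends and not after production wraps. That idea feels increasingly relevant to AI engineering.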

What if we treated AI work as an operational system, where agent activity isn't just output but telemetry? A failing test isn't merely a failing test; it's a signal. Repeated edits to the same files are a signal. Reversing decisions is a signal. Expanding scope is a signal. Repeatedly asking for information already sitting in context is a signal. Viewed individually, these events look like noise. Viewed together, they become findings, and findings can drive intervention.
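To make the step from signals to findings concrete, here is a minimal sketch. It is not how Holistic or Andon actually works; the `Event` and `Finding` shapes, the event names, and the thresholds are all assumptions I made up for illustration. The only idea it demonstrates is that a single event is noise, while the same event repeated past a threshold becomes something worth surfacing.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One observation from an agent session (hypothetical schema)."""
    kind: str     # e.g. "test_failed", "file_edited", "decision_reversed"
    subject: str  # what the event is about: a test name, a file path, a decision


@dataclass(frozen=True)
class Finding:
    """A repeated pattern that may deserve attention."""
    kind: str
    subject: str
    count: int


# Assumed thresholds: how many repeats turn a signal into a finding.
THRESHOLDS = {"test_failed": 3, "file_edited": 4, "decision_reversed": 2}


def findings_from(events: list[Event]) -> list[Finding]:
    """Group events by (kind, subject) and report groups that reach their threshold."""
    counts = Counter((e.kind, e.subject) for e in events)
    return [
        Finding(kind, subject, n)
        for (kind, subject), n in counts.items()
        # Event kinds without a configured threshold never produce a finding.
        if n >= THRESHOLDS.get(kind, float("inf"))
    ]


session = (
    [Event("test_failed", "test_login")] * 3
    + [Event("file_edited", "auth.py")] * 2
    + [Event("decision_reversed", "use JWT")]
)
print(findings_from(session))
```

In this made-up session, only the test that failed three times crosses its threshold. The two edits to `auth.py` and the single reversed decision stay below theirs, so they remain noise for now.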

That's the real shift I'm pointing at: not "here's what went wrong yesterday," but "something is drifting right now." Retrospective coaching is useful. Real-time supervision changes outcomes. The goal isn't to understand drift after the fact; it's to interrupt it before the waste compounds.
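The difference between retrospective and real-time can be sketched in a few lines. This is an illustration under my own assumptions, not Andon's implementation: `watch` is a hypothetical function, and the stream of "pass"/"fail" strings stands in for whatever a real session would emit. The point is that the check runs as events arrive, so the alert fires while the rest of the session is still ahead of you.

```python
from typing import Iterable, Iterator


def watch(outcomes: Iterable[str], max_consecutive_failures: int = 3) -> Iterator[str]:
    """Yield an alert as soon as a failure streak reaches the limit.

    `outcomes` is a hypothetical stream of "pass" / "fail" test results.
    Because this is a generator, it reads only as much of the stream as it
    needs before raising the first alert.
    """
    streak = 0
    for step, outcome in enumerate(outcomes, start=1):
        streak = streak + 1 if outcome == "fail" else 0
        # Alert once per streak, at the moment the limit is reached.
        if streak == max_consecutive_failures:
            yield f"step {step}: {streak} consecutive failures, consider pausing the agent"


stream = iter(["pass", "fail", "fail", "fail", "fail", "fail"])
alert = next(watch(stream), None)
print(alert)
print(list(stream))  # events that had not happened yet when the alert fired
```

Here the alert fires at step 4, with two more failures still unconsumed in the stream. A retrospective tool would see all five failures; a supervisor gets the chance to stop after three.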

Holistic Remembers. Andon Watches.

This is the idea behind Andon, which already ships inside Holistic as an experimental add-on. It's incomplete by design right now, but the foundation is there: the pieces needed to start recognizing drift patterns and surfacing findings, not just recording checkpoints.

Holistic asks: "What does the project remember?" Andon asks: "What needs attention right now?" The stronger Holistic gets, the more context Andon has to work with. The more Andon catches, the more valuable Holistic's checkpoints become. One preserves continuity. The other watches for drift. They're two different jobs, and I don't think most tooling right now is doing either one on purpose.

I don't think this is a new agent framework, a memory system, or an orchestration platform. It's operational intelligence for AI work, and I think that's its own category.

Microsoft's Coach analyzes completed sessions and surfaces patterns that would otherwise be easy to miss. That's worth having. But if AI-generated work keeps becoming a bigger share of how software gets built, we won't just need systems that explain the wreckage after the fact. We'll need systems that catch it while it's still just a crack.

Because AI work doesn't fail all at once. It drifts, and drift creates waste. The job of supervision is catching that drift before the work becomes unrecoverable.