DEV Community: Connor Hickey

Designing for the Moment You Switch Back: Introducing Workstream Continuity Design

Connor Hickey — Tue, 23 Jun 2026 15:13:42 +0000

Most software is good at remembering where information lives.

It is much worse at remembering where the work stands.

Consider a routine interaction with an AI agent:

You delegate a task.
You switch to two other workstreams.
You return ninety seconds later.
The agent says, “Done.”

Did it merely produce a draft, or did it change durable state? Which assumptions changed? What evidence supports the result? Did the agent remain within its authority? Is a human decision required? Can the work continue safely without you?

Answering those questions usually means reconstructing context from a mixture of chat messages, records, notifications, logs, tabs, approvals, and generated artifacts.

The system retained the activity. It failed to preserve the operating picture.

That gap is the subject of Workstream Continuity Design.

The interface remembers where the work stands—at every switch.

The full public research edition is available on Zenodo:

Workstream Continuity Design: Design Bible v0.4

Software is becoming an environment of continuing work

Traditional application design often assumes that the user is the direct executor.

The user opens a page, manipulates an object, submits a form, receives a response, and moves on. Pages, records, documents, and sessions are natural units for this model.

Those structures remain useful. The operating model around them is changing.

Software can now continue working while the initiating person is elsewhere. A single user may be supervising:

An agent researching a technical question
Another agent preparing a code change
A customer case waiting for a reply
A deployment blocked by approval
A document under human review
A background workflow that has entered an exception state

The central interaction repeatedly becomes switch-in:

Enter a workstream, acquire its operating state, make or delegate the next decision, and move elsewhere.

The reconstruction cost is paid at every switch—sometimes after days, but often after only seconds or minutes.

Agents amplify this problem through parallel execution, probabilistic plans, changing assumptions, tool use, and external side effects. They also create a difficult accountability condition: a person can remain responsible for work they did not directly perform and may not have watched unfold.

What is workstream continuity?

Workstream continuity is the degree to which a system lets a person move among concurrent workstreams and reconstruct the minimum sufficient operating state within the attention available.

That operating state includes:

Intent
Meaningful change
Responsibility
Authority
Evidence
Consequence
The safest useful next action

Workstream Continuity Design, or WCD, is the practice of designing systems around that quality.

The durable unit is the workstream: a goal-directed course of work connecting relevant objects, actors, decisions, dependencies, artifacts, policies, events, and resume state over time.

A workstream may span multiple pages, records, conversations, agent runs, and sessions. Returning to its last URL is therefore insufficient. Location does not restore purpose, changed assumptions, responsibility, or permission.

The five commitments

The proposed discipline rests on five commitments.

1. Every focus transition is a first-class interaction

Products should deliberately design the moment a person enters or re-enters a workstream.

That includes a rapid switch after thirty seconds, an interruption lasting an hour, a return on another device, and an inherited handoff from another person.

2. The workstream outlives the page

The system should preserve the goal, state, actors, authority, meaningful changes, evidence, and resume point independently of navigation.

A browser session may disappear. The course of work should remain intelligible.

3. Operational state and agency remain separate

“AI active” does not tell us whether the work is ready, blocked, waiting, under review, or failing.

Similarly, one assignee field cannot adequately represent:

Who remains accountable
Who or what is acting now
Who must act next
What authority applies

These are separate dimensions and should be stored and rendered separately.
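One way to keep these dimensions apart in a data model is sketched below. The type and field names (`OperationalState`, `Agency`, `accountable`, `acting_now`, `next_actor`, `authority`) are illustrative assumptions, not definitions taken from the Design Bible.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OperationalState(Enum):
    """Condition of the work itself, independent of who is acting."""
    READY = "ready"
    BLOCKED = "blocked"
    WAITING = "waiting"
    REVIEW = "review"
    FAILING = "failing"


@dataclass(frozen=True)
class Agency:
    """Four separate answers that a single assignee field would blur."""
    accountable: str            # who remains accountable
    acting_now: Optional[str]   # who or what is acting now (None if nobody)
    next_actor: Optional[str]   # who must act next
    authority: str              # what authority applies


@dataclass(frozen=True)
class Workstream:
    goal: str
    state: OperationalState
    agency: Agency


def status_line(ws: Workstream) -> str:
    """Render state and agency as distinct fragments, not one merged badge."""
    acting = ws.agency.acting_now or "none"
    nxt = ws.agency.next_actor or "none"
    return (
        f"{ws.state.value} | owner: {ws.agency.accountable} | "
        f"acting: {acting} | next: {nxt} | authority: {ws.agency.authority}"
    )


ws = Workstream(
    goal="Qualify Northstar renewal",
    state=OperationalState.REVIEW,
    agency=Agency(
        accountable="account lead",
        acting_now=None,
        next_actor="account lead",
        authority="external send denied",
    ),
)
print(status_line(ws))
```

Because `state` and `agency` are separate fields, "AI active" can only ever describe `acting_now`; it cannot overwrite whether the work is blocked, waiting, or under review.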

4. The interface reconstructs meaningful change

Raw chronology is necessary for audit, but it is inefficient for orientation.

An operator usually needs statements such as:

Consent expired, invalidating the prepared outreach
Legal approval arrived, unblocking execution
The customer reply changed the requested scope
The agent produced the artifact, but the evidence remains incomplete
Responsibility transferred from the service to a human reviewer

The system should preserve the underlying events while presenting a reviewable semantic delta.
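A minimal sketch of that idea follows, assuming a hypothetical event shape (`seq`, `field`, `old`, `new`) and a caller-supplied set of fields that count as material. It only collapses raw events into net changes; deciding what is actually meaningful in a given domain would need richer rules than this.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Event:
    """A raw audit event. Field names are hypothetical."""
    seq: int
    field: str
    old: str
    new: str


@dataclass(frozen=True)
class Delta:
    """One reviewable statement plus the raw events that justify it."""
    statement: str
    source_events: Tuple[int, ...]


def semantic_delta(events: List[Event], material_fields: Set[str]) -> List[Delta]:
    """Collapse raw events into net changes per material field.

    Events are not discarded: each Delta keeps the sequence numbers of
    the events it summarizes, so an auditor can drill back down.
    """
    first_old = {}
    last_new = {}
    sources = {}
    for ev in sorted(events, key=lambda e: e.seq):
        if ev.field not in material_fields:
            continue  # low-level noise stays in the log only
        first_old.setdefault(ev.field, ev.old)
        last_new[ev.field] = ev.new
        sources.setdefault(ev.field, []).append(ev.seq)

    deltas = []
    for name in sources:  # dicts preserve first-seen order
        if first_old[name] == last_new[name]:
            continue  # changed and changed back: no net delta
        deltas.append(Delta(
            statement=f"{name} changed from {first_old[name]} to {last_new[name]}",
            source_events=tuple(sources[name]),
        ))
    return deltas


events = [
    Event(1, "cursor_position", "12", "40"),
    Event(2, "consent", "valid", "pending"),
    Event(3, "draft_autosave", "v3", "v4"),
    Event(4, "consent", "pending", "expired"),
]
for d in semantic_delta(events, {"consent", "legal_approval"}):
    print(d.statement, d.source_events)
```

Four raw events reduce to one statement about consent, and that statement still points at events 2 and 4 for audit.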

5. Oversight must affect what can happen

An approval button alone does not establish meaningful human oversight.

A continuity surface has to connect presentation with policy enforcement, provenance, consequence, intervention, reversibility, containment, and recovery.

The interface should explain the boundary. The architecture must maintain it.

A shared continuity grammar

Rapid switching becomes easier when equivalent situations are represented through equivalent semantics.

WCD proposes a compact continuity grammar:

GOAL · ATTN · STATE · DELTA · ACTORS · AUTH · EVIDENCE · EFFECT · NEXT

Each slot answers a specific operational question:

GOAL: What outcome are we pursuing?
ATTN: Why does this workstream deserve attention now?
STATE: What condition is the work currently in?
DELTA: What materially changed?
ACTORS: Who owns, acts now, and acts next?
AUTH: What actions are permitted, denied, expired, or unverifiable?
EVIDENCE: What supports the current state or recommendation?
EFFECT: What is the scope, externality, and reversibility?
NEXT: What is the safest useful next action or waiting condition?

A CRM workstream might look like this:

GOAL      Qualify Northstar renewal
ATTN      Human review required
STATE     Review
DELTA     Consent changed from valid to expired
ACTORS    Owner: account lead | Current: none | Next: account lead
AUTH      External send denied
EVIDENCE  Verified CRM record, 09:42
EFFECT    External and non-reversible
NEXT      Review consent evidence

This is more expressive than a traffic light and considerably more compact than replaying a transcript.

The grammar can also have multiple compression levels. A portfolio view may show only attention, state, delta, and next action. Entering the workstream expands the full grammar. Decision and audit views add sources, policy basis, alternatives, raw events, approvals, receipts, and recovery history.
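The grammar and its first two compression levels can be sketched as a small renderer. The slot names come from the grammar above; the `render` function, the `level` parameter, and the use of a plain dictionary are assumptions made for illustration. Decision and audit views are left out because they add material beyond the nine slots.

```python
# Slot order follows the grammar line above.
SLOTS = ["GOAL", "ATTN", "STATE", "DELTA", "ACTORS",
         "AUTH", "EVIDENCE", "EFFECT", "NEXT"]

# The portfolio view shows only attention, state, delta, and next action.
LEVELS = {
    "portfolio": ["ATTN", "STATE", "DELTA", "NEXT"],
    "full": SLOTS,
}


def render(record: dict, level: str = "full") -> str:
    """Render a workstream record at the requested compression level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    slots = LEVELS[level]
    missing = [s for s in slots if s not in record]
    if missing:
        # An absent slot is an error, not something to paper over silently.
        raise ValueError(f"missing slots: {missing}")
    return "\n".join(f"{slot:<10}{record[slot]}" for slot in slots)


record = {
    "GOAL": "Qualify Northstar renewal",
    "ATTN": "Human review required",
    "STATE": "Review",
    "DELTA": "Consent changed from valid to expired",
    "ACTORS": "Owner: account lead | Current: none | Next: account lead",
    "AUTH": "External send denied",
    "EVIDENCE": "Verified CRM record, 09:42",
    "EFFECT": "External and non-reversible",
    "NEXT": "Review consent evidence",
}

print(render(record, "portfolio"))
print()
print(render(record, "full"))
```

Both views read from the same record, so the portfolio summary cannot drift away from what the full view shows.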

Existing interface patterns still matter

WCD does not require every product to become a giant command center.

Dashboards, queues, tables, chats, notifications, workflow builders, and activity feeds all solve legitimate problems. Their roles need sharper boundaries under concurrent delegated work.

A dashboard can be a strong continuity surface when it is organized around obligations, meaningful changes, owners, next actors, consequences, and safe actions.

A chat can remain useful for invocation, clarification, and explanation. It should not become the sole state model for durable work.

An activity feed can preserve chronology and attribution. It should not force the operator to infer every important change from dozens of low-level events.

A notification can summon attention. Its destination should reconstruct the relevant operating context rather than opening a generic record.

The objective is one coherent operating model—not one enormous pane containing every piece of information.

Recommendation, confidence, and permission are different

Many AI interfaces visually collapse several independent questions:

What did the machine observe or produce?
What does it recommend?
How strong is the evidence?
What is currently authorized?

A high model score does not grant permission.

A completed draft does not mean an external action occurred.

An approval artifact does not necessarily authorize a modified payload, another recipient, or a later action.

A green status should not simultaneously mean healthy, accurate, approved, permitted, and complete.

WCD treats recommendation, evidence, uncertainty, consequence, approval, and authorization as separate system concepts.

For consequential work, authority should be resolved by an independent policy service rather than asserted by the agent or inferred by the browser. Execution should recheck current policy even when an earlier approval exists.
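The recheck described above might look like the following sketch. Everything named here (`PolicyService`, `Approval`, `authorize_send`, the `external_send` action string) is hypothetical; the point is only that an approval is bound to one payload, one recipient, and one time window, and that current policy is consulted again at execution time.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Approval:
    """A recorded human approval, bound to one exact payload and recipient."""
    payload_digest: str
    recipient: str
    expires_at: float  # seconds since epoch


def digest(payload: dict) -> str:
    """Stable fingerprint of the payload that was actually approved."""
    canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class PolicyService:
    """Stand-in for an independent policy service (hypothetical interface)."""

    def __init__(self, denied_actions):
        self.denied_actions = set(denied_actions)

    def is_permitted(self, action: str) -> bool:
        return action not in self.denied_actions


def authorize_send(policy, approval, payload, recipient, now):
    """Return (allowed, reason), evaluated at execution time."""
    if not policy.is_permitted("external_send"):
        return False, "current policy denies external send"
    if now >= approval.expires_at:
        return False, "approval expired"
    if approval.recipient != recipient:
        return False, "approval covers a different recipient"
    if approval.payload_digest != digest(payload):
        return False, "payload changed after approval"
    return True, "ok"


payload = {"subject": "Renewal", "body": "Hello"}
approval = Approval(digest(payload), "buyer@example.com", expires_at=1000.0)
policy = PolicyService(denied_actions=[])

print(authorize_send(policy, approval, payload, "buyer@example.com", now=500.0))

edited = dict(payload, body="Hello, with a new discount")
print(authorize_send(policy, approval, edited, "buyer@example.com", now=500.0))

policy.denied_actions.add("external_send")  # e.g. consent has since expired
print(authorize_send(policy, approval, payload, "buyer@example.com", now=500.0))
```

The third call fails even though the approval is intact: the earlier approval is evidence, while permission is whatever the policy service says at the moment of execution.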

Machine work needs an accountable surface

The continuity grammar depends on reliable underlying records.

For that reason, the Design Bible also proposes a WCD Accountable Expression Profile for operator-visible machine claims, proposals, state changes, exceptions, and actions.

The central idea is that consequential machine expressions should be:

Typed
Attributable
Connected to evidence
Bound to a durable workstream
Bounded in consequence
Resolved against current authority
Paired with an appropriate human intervention path

The agent may produce content and suggest classifications. It should not unilaterally certify that its evidence is sufficient, its action is reversible, or its authority is valid.

Identity, evidence, policy, execution, and workstream services each contribute the facts they can authoritatively validate.
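As a rough illustration of that division of labor, the sketch below records an attestation only when it comes from the service assumed to be authoritative for that fact. The fact names, the service names, and the mapping between them are assumptions for this example and are not the profile defined in the Design Bible.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Which service may attest to which fact. This mapping is an assumption.
AUTHORITATIVE_SOURCE = {
    "attribution": "identity",
    "evidence_sufficient": "evidence",
    "authority_valid": "policy",
    "reversible": "execution",
    "workstream_bound": "workstream",
}


@dataclass
class Expression:
    """An operator-visible machine expression (hypothetical shape)."""
    kind: str          # e.g. "claim", "proposal", "state_change", "action"
    content: str
    produced_by: str
    attestations: Dict[str, str] = field(default_factory=dict)

    def attest(self, fact: str, service: str) -> bool:
        """Accept an attestation only from the authoritative service."""
        if AUTHORITATIVE_SOURCE.get(fact) != service:
            return False  # e.g. the agent vouching for its own evidence
        self.attestations[fact] = service
        return True

    def missing(self) -> List[str]:
        """Facts that no authoritative service has confirmed yet."""
        return sorted(set(AUTHORITATIVE_SOURCE) - set(self.attestations))


expr = Expression(
    kind="proposal",
    content="Send renewal outreach to Northstar",
    produced_by="agent",
)

print(expr.attest("evidence_sufficient", "agent"))     # self-certification
print(expr.attest("evidence_sufficient", "evidence"))  # authoritative source
print(expr.missing())
```

The agent can still produce the content and suggest a classification through `kind`; what it cannot do in this sketch is fill in the attestations that belong to other services.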

Measuring continuity

A continuity feature should not be considered successful merely because users clicked it quickly or said they liked it.

The framework proposes measures such as:

Time to Orientation: How long does it take to identify whether attention is required and select the right workstream?
Time to Decision Readiness: How long does it take to correctly understand the decision, material delta, actors, authority, evidence, consequence, and safe options?
False-ready rate: How often does someone believe an action is ready or permitted when it is not?
Cross-workstream contamination: How often are facts, intent, authority, or evidence incorrectly imported from another active workstream?
Next-actor accuracy: Can users correctly identify who must act next?
Intervention latency: How quickly can a person detect and contain a divergent machine process?

Speed only counts as an improvement when understanding and decision quality remain accurate.
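Two of these measures can be computed from simple study records, as sketched below. The `Trial` shape is hypothetical, and the denominator chosen for the false-ready rate (trials in which the action was in fact not ready) is one reasonable reading of the definition above rather than the framework's stated formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Trial:
    """One participant judgment in a switch-in study (hypothetical shape)."""
    believed_ready: bool
    actually_ready: bool
    named_next_actor: str
    true_next_actor: str


def false_ready_rate(trials: List[Trial]) -> float:
    """Among trials where the action was NOT ready, the share judged ready."""
    not_ready = [t for t in trials if not t.actually_ready]
    if not not_ready:
        return 0.0
    return sum(t.believed_ready for t in not_ready) / len(not_ready)


def next_actor_accuracy(trials: List[Trial]) -> float:
    """Share of trials where the participant named the correct next actor."""
    if not trials:
        return 0.0
    correct = sum(t.named_next_actor == t.true_next_actor for t in trials)
    return correct / len(trials)


trials = [
    Trial(True, True, "account lead", "account lead"),
    Trial(True, False, "agent", "account lead"),
    Trial(False, False, "account lead", "account lead"),
    Trial(False, False, "legal", "legal"),
]

print(f"false-ready rate:    {false_ready_rate(trials):.2f}")
print(f"next-actor accuracy: {next_actor_accuracy(trials):.2f}")
```

Reporting these alongside the timing measures keeps a faster interface from looking better when it is only producing quicker wrong answers.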

The category is intentionally falsifiable. If conventional dashboards, queues, histories, and notifications perform equally well without explicit workstream, delta, actor, evidence, consequence, and policy models, then the proposed category should collapse back into established enterprise UX practice.

What the proposal claims

The underlying human-factors research is established: task switching, interruption recovery, goal reconstruction, situation awareness, supervisory control, appropriate reliance, distributed cognition, auditability, and graceful recovery.

Workstream Continuity Design is an original synthesis of those areas around a particular operating unit and interaction rhythm: durable concurrent work that continues across people, services, and partially autonomous software actors.

The terminology, continuity grammar, pattern library, metrics, maturity model, and accountable-expression profile remain design and standards proposals. They require prototypes, comparative studies, accessibility evaluation, field deployment, and revision.

The current document is a public, non-peer-reviewed research edition—not a declaration that the category has already been proven.

The full Design Bible

The complete Workstream Continuity Design Bible v0.4 includes:

The category definition and boundaries
Thirteen core design principles
A canonical information architecture
A workstream and agency model
The continuity grammar and operational diff
Twenty-two interaction patterns
Human-oversight and safety architecture
Evaluation metrics and study protocols
A four-level maturity model
An AI-first CRM case study
A design-review checklist
Open research and standards questions

Read the full public research edition on Zenodo

The document is licensed under CC BY 4.0.

The questions I am most interested in testing are:

Are the five commitments complete?
Is the continuity grammar the smallest stable set required for rapid switching?
Is the accountable-expression profile correctly scoped against existing protocols?
Would the proposed metrics genuinely falsify the category?
Is the boundary correctly limited to the accountable, operated surface rather than private model reasoning or open-ended conversation?

The working thesis is simple:

The interface should remember where the work stands—at every switch.

The Scaffold and the Cage: Vibe Coding, Enabled Coding, and the Fight for Judgment

Connor Hickey — Sat, 30 May 2026 15:36:45 +0000

The phrase vibe coding has become a convenient way to describe a strange new relationship between humans, machines, and software. At its simplest, vibe coding means telling an AI system what you want and letting it produce the code. The human provides intent, mood, direction, and correction. The machine produces implementation. The result may be a game prototype, a tool, a website, a mod, a script, or an entire application. The person may not understand every line. They may not even pretend to. They describe the desired artifact, test whether it feels right, and keep prompting until the thing seems to work.

The term itself is recent. Andrej Karpathy coined vibe coding in a widely shared post in February 2025, describing a way of working in which you trust the model, stop reading the diffs, and "forget that the code even exists" (Karpathy, 2025); within the year the phrase had spread far enough to be named Collins Dictionary's Word of the Year (Collins, 2025). Karpathy was candid about what the mode gives up — in its original sense, vibe coding meant precisely not reviewing the output.

That description is useful, but it is also too blunt. It collapses many different practices into one label. It treats the person who blindly accepts generated code the same as the person who uses an agent to learn, debug, test, and gradually understand a system they could not have built alone. It also risks turning "vibe coder" into a social category — almost an insult — rather than a description of a method. The term can imply that someone is merely pretending to code, that they are outsourcing the real work while borrowing the identity of a programmer.

I am not sure that label fits me. At least, not cleanly.

I do not experience agentic coding as pretending to be a programmer. I experience it as finally being able to stay inside the programming loop long enough to become one.

That, at least, is the story I want to tell. A good part of this essay is an attempt to find out whether the story is true, or whether it is the most comfortable thing I could believe about a tool I have come to depend on.

The distinction matters. For me, and likely for many others, AI-assisted or agentic coding is not simply a shortcut around skill. It is a scaffold that makes skill reachable. It lowers the activation barrier. It helps manage the blank page, the syntax wall, the debugging spiral, the architecture fog, and the working-memory demands that make programming difficult to sustain. This is especially significant for people with ADHD or other executive-function challenges. Coding is not only a technical activity; it is also a cognitive endurance task. It requires attention, sequencing, planning, error tolerance, working memory, and the ability to return to a problem after repeated failure. Agentic coding changes the shape of that task.

The more interesting question, then, is not whether AI wrote the code. That question is already becoming less useful. The better question is: who owns the intent, the judgment, and the resulting system?

Coding is shifting from line production to system stewardship. In that shift, the meaningful boundary is no longer between human-written and AI-written code. The boundary lies between artifacts the human can own and artifacts the human merely accepts.

This essay began as a defense of the purple space between vibe coding and genuine ownership: the space where an agent writes more of the code than the human could comfortably write alone, but the human is still learning, testing, questioning, and moving toward understanding the system. I still think that space exists. But I no longer think it is a natural developmental stage. Purple is not a conveyor belt from dependence to competence. It is a fork. One path uses the agent as a scaffold and deliberately preserves the difficulty required to build judgment. The other uses the agent as a cage, removing so much friction that the user gains fluency without ownership. The difference is not whether the machine writes the code. The difference is whether the human refuses to surrender evaluation.

Throughout, I will use three colors as shorthand. Red is vibe coding in the narrow sense: the human expresses desire and accepts machine output with minimal understanding. Blue is enabled coding: the human leans on agents heavily but keeps conceptual ownership, verification responsibility, and the ability to reason about the system. Purple is the contested space between them — and the rest of this essay is an argument about which way it points.

Vibe Coding as Red: Desire Without Ownership

Vibe coding begins with desire. The human says, in natural language, what they want the software to do. The prompt may be specific or vague. It may describe an interface, a mechanic, a workflow, a tool, or a feeling. "Make me a basic platformer controller." "Build a save system." "Create an inventory UI." "Fix this bug." "Make it feel smoother." "Add juice." "Make the enemy smarter." "Make this look like a real app."

The agent responds with code. The human runs it. Something breaks. The human pastes the error back. The agent patches. The human tries again. Eventually the thing works, or appears to work. The loop continues.

There is nothing inherently wrong with this process. In low-risk contexts, it can be playful, productive, and creatively liberating. A solo developer can prototype faster. A non-programmer can test an idea. A designer can make an interactive sketch. A student can get unstuck. A person who would normally never touch code can suddenly make a working artifact.

The risk appears when the artifact becomes detached from human understanding. In the red zone, the user accepts code because it appears to work, not because they understand why it works. The program becomes opaque. The user's standard of correctness is surface behavior: the button clicks, the scene loads, the function returns something plausible, the error disappears. The agent becomes the only participant with any apparent model of the implementation, and even that model may be unstable or hallucinated.

This matters because software is not only output. Software has consequences. It stores data, moves money, exposes private information, controls experiences, shapes user behavior, and breaks in ways that can be subtle. Even in small projects, code accumulates. A prototype becomes a tool. A tool becomes infrastructure. A quick fix becomes an architectural dependency. The more a system grows, the more dangerous it becomes for the human to remain outside the logic of the thing they are building.

In red, the human says: "It works, so I accept it."

That may be enough for a disposable prototype. It is not enough for ownership.

Enabled Coding as Blue: Acceleration With Ownership

Enabled coding looks similar from the outside. The human still uses an agent. The agent may still write most of the code. The human may still describe changes in natural language. The workflow may still include copy-pasting errors, asking for patches, and iterating quickly.

The difference is not the amount of AI involvement. The difference is the human's relationship to the artifact.

Enabled coding means the agent reduces the execution burden while the human retains responsibility for direction, comprehension, verification, and maintenance. The human does not need to type every line to own the system. They do need to understand the relevant behavior well enough to make decisions about it.

In blue, the human asks different questions.

Why did you choose this pattern?
What files did you change?
What assumption does this function make?
What happens if the input is null?
What breaks if there are two players instead of one?
Is this state stored globally?
Can this be simplified?
Can we add a test?
Can you explain this like I am going to maintain it next month?

These questions change the role of the agent. The agent is no longer just a code vending machine. It becomes a pair programmer, tutor, debugger, explainer, and implementation accelerator. It can still be wrong, but its wrongness becomes part of a review process rather than a hidden liability.

Enabled coding does not require total mastery. That would be an unrealistic standard. No programmer understands every layer of the stack they use. Professional developers rely on compilers, engines, frameworks, libraries, documentation, autocomplete, forums, package managers, and abstractions they do not fully control. The question is not whether the human has absolute knowledge. The question is whether the human has enough situated understanding to responsibly guide, test, and maintain the system.

This is not only how I would like experienced developers to work; it appears to be how they actually do. When researchers observed and surveyed professional developers using AI agents through 2025, they found that the experienced ones do not vibe code at all. They plan the task, supervise the agent closely, and review its output rigorously, holding onto authority over design and implementation out of a refusal to compromise on software quality (Huang et al., 2025). Expertise, in agentic coding, expresses itself not as faster acceptance but as more disciplined control.

This is where the traditional gatekeeping around programming starts to break down. If programming is defined narrowly as manually producing lines of syntax, then AI-generated code seems to threaten the identity of the programmer. But if programming is understood as designing, reasoning about, testing, maintaining, and evolving computational systems, then agentic tools do not erase programming. They shift its center of gravity.

The coder becomes less like a typist and more like a system steward.

The Purple Zone: Scaffolded Ownership

Between red and blue is purple.

Purple is the state where the agent writes more code than the human could comfortably write alone, but the human is not merely accepting magic. The human is directing, testing, questioning, and learning. They may not understand the implementation immediately, but they do not treat incomprehension as the final state. They use the agent to move toward understanding.

This is the zone where many new programmers probably live now. It is also where many solo builders, indie developers, modders, designers, domain experts, and neurodivergent creators may find themselves. They are not traditional programmers in the old sense, but they are not non-programmers either. They are becoming capable through collaboration with a machine.

Purple is easy to dismiss because it looks messy. The person may ask naive questions. They may rely heavily on the agent. They may struggle to explain the code at first. They may use imprecise language. They may build something that works before they fully understand why it works. To an experienced programmer, this can look like incompetence wearing a productivity mask.

But that judgment, I want to argue, misses the developmental nature of the process. A beginner using an agent is not necessarily bypassing learning. They may be entering learning from the other side. Instead of spending weeks blocked by syntax, setup, and error messages, they can start with a functioning artifact and then interrogate it. They can ask the agent to explain the architecture. They can trace the data flow. They can request comments. They can break the code and repair it. They can compare implementations. They can ask why one approach is better than another. They can move from outcome to mechanism.

That is not fake programming. It is scaffolded programming — and the word is not loose. In developmental psychology, scaffolding (Wood, Bruner, & Ross, 1976) names the temporary support a more capable partner supplies so that a learner can accomplish something that would be "beyond his unassisted efforts," within what Vygotsky (1978) called the zone of proximal development: the distance between what a learner can do alone and what they can do with help. But the concept carries a condition that is easy to forget. The defining feature of a scaffold, in that literature, is that it fades — it is deliberately withdrawn as the learner's competence grows. A scaffold that is never removed is not a scaffold. It is a permanent prop, and the building never learns to stand.

The distinction depends on whether the scaffold becomes a bridge or a cage. If the user remains dependent on the agent for every change, every bug, and every explanation, purple collapses back into red. The artifact remains opaque. The user can produce software but cannot own it. But if the agent helps the user build a mental model, purple moves toward blue. The user becomes more capable over time.

I have now used the word if twice in a single paragraph, and I want to flag that, because everything optimistic in this essay is hiding inside those conditionals. I have asserted that the scaffold can become a bridge. I have not yet given any reason to believe it tends to. That is the work the rest of the essay has to do, and it is harder than I would like it to be.

ADHD, Executive Function, and the Programming Loop

The ADHD angle is not incidental. It may be central.

Programming is often described as a logic skill, but in practice it is also an executive-function gauntlet. A programming task requires the developer to hold multiple layers of information in mind: the goal, the current bug, the relevant files, the syntax, the architecture, the runtime behavior, the error messages, the edge cases, and the next step. The developer has to break large tasks into smaller tasks. They have to tolerate delayed gratification. They have to recover from repeated failure. They have to remember what they were doing before the last error interrupted them.

For someone with ADHD, these demands can become the real barrier. The problem is not always lack of intelligence or lack of interest. It can be task initiation, sequencing, working memory, context switching, emotional regulation, and persistence through friction. Programming creates friction constantly. One missing semicolon, one broken dependency, one unclear error, one setup issue, one file in the wrong folder — any of these can derail momentum.

Agentic coding can function as an external executive system. It can hold context. It can summarize the next step. It can break a feature into smaller chunks. It can explain an error without the shame spiral of feeling stupid. It can offer a concrete first move when the blank page is too abstract. It can convert "I want this mechanic" into "start by creating these files and these functions." It can keep the loop alive.

For me, that matters more than I know how to say in an essay that is trying to stay analytical. The agent does not simply make coding faster. It makes coding reachable. It lets me remain in contact with the work long enough to build understanding. Instead of falling out of the loop every time the task becomes too abstract or too fragmented, I can use the agent as a stabilizer. It gives me a way back in.

This reframes the ethics of AI-assisted coding. The public conversation often treats AI coding as a question of laziness, authenticity, or cheating. Those frames are too narrow. For some people, agentic coding is closer to access technology: an external support for task initiation, sequencing, working memory, and recovery from failure. It does not remove the need for judgment, effort, or learning. It changes the conditions under which those things become possible.

That is the strongest version of my case, and I believe it. Which is exactly why I have to attack it now, because I notice that I have arranged the argument so that no one is allowed to question it. I have wrapped the claim in the language of disability and accessibility, and that language has a way of ending conversations. To doubt an accessibility tool feels like doubting the person who needs it. But I am not interested in an argument that wins by becoming unfalsifiable. So I have to ask the question the accessibility framing is designed to make me feel bad for asking.

The Counterclaim: Why the Scaffold Might Be the Cage

Here is the objection, and I am going to give it every advantage.

The danger of agentic coding is not that it removes labor. The danger is that it may remove the specific forms of labor through which judgment is formed.

There is a comforting word for the support I described a moment ago: a prosthetic. An external stand-in for a capacity I struggle to supply on my own. But if I let myself reach for that word, I inherit its darker half. A prosthetic is not a teacher. It is a substitute. We do not expect a prosthetic limb to grow a real one underneath it. A wheelchair is not a stage in learning to walk. So the moment I call the agent a prosthetic, I may be smuggling in good news that the metaphor does not actually contain. The honest version of the prosthetic framing is not hopeful at all. It is the picture of a permanent substitution for a capacity that will never develop — because the substitution removes the very stimulus that would have developed it.

Look again at what I praised, and notice what it costs.

Every executive function the agent supplies is one I do not exercise. It holds the context, so my working memory never has to stretch to hold it. It sequences the next step, so I never build the muscle of decomposition. It absorbs the failure spiral, so I am never the one who sits in the wreckage of a broken build until I understand why it broke. The agent does not strengthen these capacities by performing them for me, any more than a forklift strengthens my back. It performs them instead of me. And a function that is always performed for you is a function that quietly disappears.

The learning sciences have an uncomfortable name for the thing I have been treating as pure cost: desirable difficulty (Bjork, 1994; Bjork & Bjork, 2011). The finding, roughly, is that conditions which make a task feel harder and slower in the moment often produce more durable learning, and conditions which make a task feel fluent and easy often produce the illusion of learning without the substance. Underneath it lies a distinction Bjork draws between performance — how well you can execute right now, with support in place — and learning — the durable capability that remains once the support is gone. The two routinely move in opposite directions, which is exactly why fluency in the moment is such an unreliable signal of competence acquired. Struggle is not a bug in the process of becoming competent. In many cases struggle is the process. There is a related and equally inconvenient result, the generation effect: across decades of experiments, people remember and understand material they generate themselves far better than the same material merely shown to them (Slamecka & Graf, 1978). Reading a correct solution feels like understanding. It is not. It is recognition wearing understanding's clothes.

Now consider what an agentic coding tool actually is, mechanically. It is a fluency-maximizing machine. Its entire value proposition is the removal of difficulty. That is the product. That is what I am paying for, in money and in dependence. So if the difficulty was where the learning lived, then the tool is not protecting my learning. It is optimizing it away, and presenting me with the pleasant sensation of competence as the receipt. This gap between sensation and fact is measurable. In a 2025 randomized controlled trial, experienced open-source developers predicted that AI tools would speed them up, and reported afterward that the tools had sped them up — yet they actually completed their tasks roughly nineteen percent slower with the tools than without them (Becker et al., 2025). The feeling of acceleration and the fact of it had come apart, and the people inside the experiment could not detect the difference. If fluency can hide a slowdown that large, it can certainly hide the smaller, slower divergence between understanding a system and merely operating one.

There is an older version of this worry, from outside software, named the ironies of automation (Bainbridge, 1983). Bainbridge's observation was that when you automate the routine parts of a task and leave the human responsible for the rest, you erode the very skills the operator needs at the moments automation fails and a human must take over. The more reliable the automation, the less prepared the human it ultimately depends on. Aviation has tested this directly, and the result is precise about which skills go. When researchers had airline pilots fly routine and non-routine scenarios in a Boeing 747 simulator at varying levels of automation, they found that the manual control skills (the stick-and-rudder motor skills) held up reasonably well, but the cognitive skills of manual flight, the knowing-what-to-attend-to and deciding-what-to-do, were the ones that decayed under reliance on automation (Casner, Geven, Recker, & Schooler, 2014; see also Ebbatson, Harris, Huddlestone, & Sears, 2010). The hands remembered the airplane. The judgment did not. Generated code threatens to fail along the same fault line. The agent handles the ordinary. The human is summoned only for the catastrophe: the subtle data corruption, the security hole, the architectural dead end that no further prompting can patch. The mechanical skill of producing code may well survive; it is the judgment that quietly hollows out, invisibly, behind the comfortable hum of things mostly working.

This is the point where my essay is in real trouble, and I want to name precisely how, because it is worse than a missing caveat.

Rehabilitation science draws exactly the distinction that exposes the problem. Assistive and rehabilitative technologies are not one category but two, with opposite definitions of success (Cook & Polgar, 2015). Some assistive technology is compensatory: a wheelchair, glasses, a hearing aid. You will use it permanently, and that is completely fine, because independence was never about legs or unaided eyes. Permanent dependence on a wheelchair is not a failure of the wheelchair. It is the wheelchair working. But some assistive technology is rehabilitative: a course of physical therapy, training wheels, a scaffold around a building under construction. Its whole purpose is to be outgrown. Permanent dependence on a rehab program is not a success. It is the rehab failing.

Here is the bind. My entire gradient — red to purple to blue, movement, becoming, "a steward of the system" — is a rehabilitative claim. I am promising that you outgrow the scaffold. The word "scaffold" gave it away. So I do not get to retreat to the comfortable wheelchair defense when challenged — "it's a prosthetic, dependence is fine" — because the wheelchair defense abandons my thesis. I committed to the harder claim: that the tool is a bridge you cross and leave behind. And the harder claim is exactly the one the entire deskilling literature suggests is least likely to come true, because the easier and more fluent a scaffold makes the work, the less reason and less stimulus there is to ever step off it.

And the ADHD framing, which I leaned on as my strongest card, may be my weakest. Because if executive function is genuinely the barrier, then a tool that supplies executive function on demand removes the only conditions under which executive function gets practiced. The story I told — "it keeps me in the loop long enough to learn" — assumes the time in the loop is spent learning. But it might be spent being carried. The frictionless loop is not obviously a classroom. It may just be a more comfortable room in the same cage.

I am not going to pretend this objection is weak. It is the true center of the question, and most of the optimistic writing about AI and coding, including my own first draft of this essay, simply walks around it.

What Survives, and Under What Condition

I do not think the objection is fatal. But it changes what I am allowed to claim, and it forces me to give up ground I would rather have kept.

The counterclaim forces a narrower definition of enabled coding. I can no longer define it as AI-assisted production that happens to make me feel more capable. Nor can I define it as any process where the agent helps me stay in motion. Motion is not growth. The only defensible definition left is this: enabled coding is agentic coding in which production may be delegated, but evaluation is not.

First, the concession, and it is a real one. For the executive-function layer specifically — initiation, sequencing, the working-memory juggling, the chunking of a feature into files — I will grant the compensatory reading and stop pretending it is rehabilitative. I do not need to internalize the ability to break a task into the right four files, and I probably will not, and I have decided that is acceptable, the way a writer does not need to internalize manuscript formatting to be a writer. Ownership was never made of those parts. So if those particular muscles atrophy, they cost me nothing I needed to keep. The skeptic is right about them, and being right about them turns out not to matter.

The real question is not about executive function at all. It is about judgment. And here I have to relocate the entire argument.

Ownership, when I am honest about what it consists of, is not the ability to type or to sequence. It is the ability to evaluate. To look at a working solution and know whether it is also a correct one. To recognize the specific texture of an agent that has stopped reasoning and started guessing. To reject a patch that passes every visible test but quietly corrupts the architecture. To come back next week, reopen the project, find the relevant part, and make a controlled change without starting from zero. That faculty — judgment — is what blue is actually made of. Everything else is logistics.

So the only question worth arguing is narrow and brutal: is the agent compensatory or rehabilitative with respect to judgment?

And here the objection bites hardest, because I cannot wave it off. Judgment is built by consequence. It is the residue of having been wrong and having had to find out why. If the agent absorbs the failure spiral — and absorbing the failure spiral is exactly what I praised it for in the ADHD section — then it may absorb the error-and-consequence loop that is, as far as anyone knows, the only way judgment forms. I have to admit the painful symmetry: the feature that makes the tool an accessibility device is the same feature that threatens the one capacity I cannot afford to lose.

This is why I can no longer claim that purple tends toward blue. On the frictionless path — the path the tool is engineered to make easy — purple does not tend toward blue. It tends toward a deeper red. Judgment atrophies precisely as the skeptic predicts, and the loss is masked by the pleasant fluency of a system that keeps mostly working.

What survives is smaller, and conditional, and I think true: judgment can still form, if the human refuses to offload evaluation even while offloading production. Those two things are separable, and the separation is the whole game. I can let the agent write every line and still insist on being the one who decides whether the line deserves to exist. But that insistence is not natural. It runs directly against the grain of a tool whose entire design is to make insistence feel unnecessary, even rude — the friend who finishes your sentences so smoothly you forget you had one.

Which means the practices of ownership are not safety advice bolted onto an optimistic essay. They are the essay's actual engine, and I had them filed under the wrong heading.

Read the diff.
Ask for the explanation, then check it against the behavior instead of trusting it.
Run the code yourself.
Write the test before you believe the fix.
Keep changes small enough to understand.
Ask what breaks when the input is null, when there are two players, when the network is gone.
Refactor deliberately.
Return to old code and find out, honestly, whether you still understand it — and treat the answer as data about yourself, not the code.

Reframe these correctly and they are not hygiene. They are deliberately reintroduced difficulty. Reading the diff is refusing to offload comprehension. Writing the test is refusing to offload the definition of correct. Asking what breaks when the input is null is manufacturing, by hand, the edge-case confrontation that the happy path would otherwise have spared me. Each practice is a conscious reinjection of the friction the agent removed — and crucially, friction placed back exactly where the learning lives, rather than scattered at random across syntax and setup, where it never belonged. That is the form the optimism has to take after the objection. Not "the scaffold becomes a bridge." Rather: the scaffold becomes a bridge only for the person who keeps rebuilding, by hand, the difficulty the scaffold was selling them relief from.
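As a minimal sketch of what "asking what breaks" can look like once it is written down, the checks below confront three of the edge cases named in the list: a null input, exactly two players, and a missing network. Every name here (`next_player`, `fetch_scores`, `offline`, `check_edge_cases`) is hypothetical and invented for illustration; the point is only that the human, not the agent, decides what "correct" means at the edges.

```python
from typing import Callable, Optional, Sequence


def next_player(players: Optional[Sequence[str]], current: str) -> str:
    """Return whose turn follows `current` (hypothetical game helper)."""
    if not players:
        # Covers both None and an empty roster.
        raise ValueError("players must be a non-empty sequence")
    if current not in players:
        raise ValueError(f"unknown player: {current!r}")
    index = players.index(current)
    return players[(index + 1) % len(players)]


def fetch_scores(fetch: Callable[[], dict], fallback: dict) -> dict:
    """Call `fetch`; return `fallback` if the network is unavailable."""
    try:
        return fetch()
    except ConnectionError:
        return fallback


def offline() -> dict:
    # Stand-in for a network call made while the network is gone.
    raise ConnectionError("network is gone")


def check_edge_cases() -> list[str]:
    """Run the three edge-case checks and report which ones held."""
    results = []

    # 1. The input is null: the helper should refuse, not guess.
    try:
        next_player(None, "ana")
    except ValueError:
        results.append("null rejected")

    # 2. Exactly two players: turns must alternate and wrap around.
    assert next_player(["ana", "bo"], "ana") == "bo"
    assert next_player(["ana", "bo"], "bo") == "ana"
    results.append("two players alternate")

    # 3. The network is gone: fall back to cached scores.
    assert fetch_scores(offline, {"ana": 0}) == {"ana": 0}
    results.append("offline falls back")

    return results
```

Calling `check_edge_cases()` returns the list of checks that held. Writing these before accepting a generated fix is one way to keep the definition of correct on the human side of the loop.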

The Boundary: When Red Becomes Blue

With that correction in place, the old account of the boundary still stands, but it reads differently now. The movement from vibe coding to enabled coding is not a single moment, and it is not a current that carries you. It is a set of practices performed against the grain.

A red prompt says: "Fix this."

A purple prompt says: "Here is the bug, here is what I expected, here is what happened, and here are the files that might be involved. Help me inspect the cause."

A blue prompt says: "The problem seems to be that this state is updated before the event listener finishes. Propose a minimal patch, explain the tradeoff, and include a regression test."

The difference is not vocabulary. The difference is whether a mental model exists behind the words, and a mental model is the one thing the tool cannot hand you, because building it is the difficulty the tool removes.
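To make the blue prompt concrete, here is a minimal sketch of the kind of regression test it asks for, assuming a toy `Store` whose `save` method must not mark the state as finished until its listener has completed. The `Store` class, its state names, and `regression_check` are all hypothetical; a real project would test its own event system. What the sketch shows is that the test encodes the mental model (the listener must observe the state as it was before completion), which is the part the prompt's author had to supply.

```python
import asyncio


class Store:
    """Toy state holder (hypothetical) with one async save operation."""

    def __init__(self) -> None:
        self.state = "idle"

    async def save(self, listener) -> None:
        self.state = "saving"
        # Minimal patch: wait for the listener to finish first.
        # The buggy ordering assigned "saved" before this await.
        await listener(self)
        self.state = "saved"


async def regression_check() -> list[str]:
    """Record the state seen by the listener, then the final state."""
    observed: list[str] = []

    async def listener(store: Store) -> None:
        # Yield to the event loop once, as real I/O would.
        await asyncio.sleep(0)
        observed.append(store.state)

    store = Store()
    await store.save(listener)
    observed.append(store.state)
    return observed


def run_regression_check() -> list[str]:
    """Synchronous entry point so the check can run from a plain test."""
    return asyncio.run(regression_check())
```

With the patched ordering, `run_regression_check()` returns `["saving", "saved"]`. If the state were updated before the listener finished, the first entry would read `"saved"` and the test would fail, which is exactly the behavior the prompt described.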

The final test is delayed ownership. Can the person come back next week, reopen the project, understand the relevant parts, and continue? Can they debug without starting from zero? Can they explain the system well enough to improve it? If yes, the code is no longer merely something they accepted. It is something they are beginning to own.

But notice what that test really measures. It measures whether the friction got put back. The person who can return and continue is not the person the tool produced by default. It is the person who insisted on understanding things the tool was willing to understand for them.

Risks: When the Agent Owns the System

Everything in the standard risk inventory is real. AI-generated code can be insecure, inefficient, brittle, overcomplicated, or subtly wrong. It can introduce dependencies without explaining why. It can solve the local bug while damaging the larger design. It can pass the visible test path and fail under edge cases. It can invent APIs. It can confidently explain false reasoning. It can encourage the user to move faster than their understanding.

But after the counterclaim, I no longer think the central danger is in the code. The central danger is in the user. The risk is not primarily that the agent produces a bad artifact. It is that the agent produces a person who feels like an owner and is not one — a person whose sense of competence is calibrated to fluency rather than understanding, and who therefore cannot tell the difference between a system they command and a system that merely behaves, until the day it stops behaving.

A person can ship code they do not understand. They can collect users, data, payments, or trust with a system they cannot maintain. They can build a game or an app that becomes impossible to extend because every feature was patched into existence through disconnected prompts. They can become dependent on the agent as a repair oracle, unable to distinguish a good fix from a bad one. That is just another way of saying their judgment never formed, and that its absence was masked by years of things mostly working.

The practices above do not eliminate this. They reintroduce friction in the right places, slowing the user down just enough to keep a responsible human in the loop. That is the most they can do, and it only works if the user actually does them, against the tool's every incentive to skip them.

Against Traditional Gatekeeping

None of this rehabilitates the old gatekeeping, and I want to be careful not to let a sobering objection curdle into nostalgia.

The old image of programming centers on manual authorship: a programmer is someone who knows the language, writes the lines, fixes the errors, and builds the system through direct control. In that model, AI assistance looks like contamination. But programming was never only manual authorship. It has always involved layers of abstraction: engines no one fully understands, libraries the developer did not write, operating systems and compilers whose output is rarely inspected. A developer using a game engine or a web framework is already delegating enormous amounts of behavior to code they did not author. The question has always been how well the developer can reason within those abstractions.

Agentic coding adds a new abstraction layer: natural language as an interface to implementation. That layer is unstable and risky, but it is still an abstraction layer, and rejecting it outright because it changes the shape of labor would repeat an old mistake — confusing the tools of programming with its essence.

The essence is not typing. The essence is judgment under stewardship: forming an intention, translating it into a computational system, evaluating whether the system behaves correctly, and maintaining it as requirements change. AI can participate in all of that. It can do most of the line-level production. The human's role does not disappear — unless the human surrenders the evaluation. That, and not the volume of AI involvement, is the line between an enabled coder and a vibe coder. And after everything above, I have to add that the surrender is not a single choice. It is the default outcome of a frictionless tool, and resisting it is a daily, unnatural act.

Conclusion: The Bridge You Have to Carry Across

Vibe coding asks whether the machine can make software from my desire. The question I began with was whether the machine can help me become the kind of person who can own the software I desired.

The honest answer is harder than the one I wanted to write. The machine can make me someone who appears to own it, instantly. And that appearance is precisely the danger, because it is indistinguishable from the real thing — to me most of all — right up until the moment the system breaks and demands that I be the one who actually understands it.

The bridge from red to blue exists. I am now fairly sure of that. But the agent does not walk me across it, and it does not pull me toward it. Its gravity runs the other way, toward the comfortable, fluent, hollowing cage, because removing difficulty is what it is for. The only way across is to carry, by hand and on purpose, the very weight the agent kept offering to take — to read what it would have let me skim, to struggle where it would have let me coast, to be wrong in the specific ways that build judgment instead of letting the wrongness be quietly absorbed and patched.

So I will not say that a new kind of programmer is being formed by this technology. By default, the technology forms passive consumers, and dresses them in the feeling of mastery. What is true is smaller and entirely conditional: the technology makes available a path that a disciplined minority can take, against its grain, by manufacturing the difficulty it was built to remove.

Purple, I have to admit at the end, is not a stage you pass through on the way to blue. It is a fork, and it is a place you can fall back from at any moment. The well-lit, frictionless path leads back to red. The other path is uphill, and you build it yourself, out of the difficulty you choose to keep.

A vibe coder accepts the artifact.

An enabled coder refuses to stop understanding it, even when the machine has made understanding optional.

And because the machine is built to make understanding feel optional, it will win that argument whenever the user stops actively resisting it. That is why enabled coding cannot simply mean coding with help. It has to mean coding under a discipline: the discipline of keeping judgment human when production no longer has to be.

References

Bainbridge, L. (1983). Ironies of automation. Automatica, 19(6), 775–779. https://doi.org/10.1016/0005-1098(83)90046-8

Becker, J., et al. (2025). Measuring the impact of early-2025 AI on experienced open-source developer productivity. METR.

Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, L. M. Hough, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (pp. 56–64). Worth Publishers.

Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). MIT Press.

Casner, S. M., Geven, R. W., Recker, M. P., & Schooler, J. W. (2014). The retention of manual flying skills in the automated cockpit. Human Factors, 56(8), 1506–1516. https://doi.org/10.1177/0018720814535628

Collins Dictionary. (2025). The Collins Word of the Year 2025. HarperCollins.

Cook, A. M., & Polgar, J. M. (2015). Assistive technologies: Principles and practice (4th ed.). Elsevier/Mosby.

Ebbatson, M., Harris, D., Huddlestone, J., & Sears, R. (2010). The relationship between manual handling performance and recent flying experience in air transport pilots. Ergonomics, 53(2), 268–277. https://doi.org/10.1080/00140130903342349

Huang, R., et al. (2025). Professional software developers don't vibe, they control: AI agent use for coding in 2025. arXiv preprint arXiv:2512.14012. https://arxiv.org/abs/2512.14012

Karpathy, A. (2025, February 2). There's a new kind of coding I call "vibe coding" [Post]. X. https://x.com/karpathy/status/1886192184808149383

Slamecka, N. J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Human Learning and Memory, 4(6), 592–604. https://doi.org/10.1037/0278-7393.4.6.592

Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Harvard University Press.

Wood, D., Bruner, J. S., & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, 17(2), 89–100. https://doi.org/10.1111/j.1469-7610.1976.tb00381.x