davidvk89

Posted on Jul 2

Case File: Context Drift

#ai #softwareengineering #programming #documentation

A Thought Trail on Operational Memory, Controlled Context Flow, and AI-Assisted Development

I started with what looked like a practical AI coding problem.

The coding agent was moving too fast. Not computationally fast. Conceptually fast. It could turn a loose idea into code before I had fully decided whether that idea was stable, temporary, rejected, or just something I was thinking through.

At first, the obvious suspect was prompting. Maybe I needed clearer prompts, better task descriptions, or more detail each time.

That helped, but it did not explain the whole pattern.

Then documentation looked guilty. If the AI kept losing track of project state, maybe I needed better documentation: cleaner notes, clearer architecture descriptions, more explicit decisions, and more reminders about what not to touch.

That helped too.

But that still was not the whole mechanism.

The problem was not simply that the AI lacked context. The problem was that different parts of the workflow stopped sharing the same understanding of project state.

The human had one version of the project in mind. The assistant had another, shaped by conversation context. The coding agent had another, shaped by the prompt and files it was given. The documentation had another, depending on whether it had been updated. The codebase had the actual state, but not necessarily the intent behind that state.

That gap is what I started calling context drift.

In AI-assisted development, context drift happens when the human, assistant, coding agent, documentation, and codebase stop sharing the same understanding of project state.

When that happens, a coding agent can act on incomplete, stale, speculative, or polluted context faster than the human can mentally reorganize the project.

That was the failure mode.

The AI Did Not Need More Context. It Needed the Right Context at the Right Time.

My first useful response was not “write more documentation.”

It was separating broad reasoning from narrow implementation.

The assistant could reason broadly. It could help explore options, identify risks, compare designs, recover intent, and translate vague thoughts into engineering language.

But the coding agent needed something different.

It needed bounded implementation instructions: what to change, what to preserve, what to avoid, what to verify, what documentation to update, and what to report back.

That pressure eventually became a stricter handoff artifact: a reviewed implementation contract.

In my workflow, I called these Codex Contracts. The name matters less than the role.

The contract was not just a better prompt. It was the reduction point between broad reasoning and narrow execution.
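
To make the shape of that handoff concrete, here is a minimal sketch of a contract as a data structure. The class name, field names, and file paths are hypothetical illustrations based on the list above (change, preserve, avoid, verify, document, report), not the actual format used in my prototype.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ImplementationContract:
    """Hypothetical shape for a reviewed handoff to a coding agent."""
    task: str
    change: list[str]
    preserve: list[str] = field(default_factory=list)
    avoid: list[str] = field(default_factory=list)
    verify: list[str] = field(default_factory=list)
    docs_to_update: list[str] = field(default_factory=list)
    report_back: list[str] = field(default_factory=list)
    reviewed_by_human: bool = False  # assumption: review is recorded as a flag

    def render(self) -> str:
        """Render the contract as the only text handed to the coding agent."""
        if not self.reviewed_by_human:
            raise ValueError("contract has not been reviewed; do not hand it off")
        sections = [
            ("Change", self.change),
            ("Preserve", self.preserve),
            ("Avoid", self.avoid),
            ("Verify", self.verify),
            ("Update documentation", self.docs_to_update),
            ("Report back", self.report_back),
        ]
        lines = [f"Task: {self.task}"]
        for title, items in sections:
            lines.append(f"{title}:")
            # Empty sections are printed explicitly so omissions are visible.
            lines.extend(f"  - {item}" for item in (items or ["(none)"]))
        return "\n".join(lines)


# Example with made-up file names.
contract = ImplementationContract(
    task="Add input validation to the settings form",
    change=["settings/form.py"],
    preserve=["existing public function signatures"],
    avoid=["touching the auth module"],
    verify=["existing tests still pass"],
    docs_to_update=["docs/current_state.md"],
    report_back=["list of files changed"],
    reviewed_by_human=True,
)
print(contract.render())
```

The point of the sketch is the refusal in `render`: an unreviewed contract cannot be turned into agent instructions at all.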

A planning conversation could stay messy. It could include uncertainty, rejected ideas, half-formed thoughts, possible future directions, and things I was not ready to implement.

But the coding agent should not act directly on that whole mess.

The implementation contract translated broad reasoning into bounded authority.

That distinction became one of the most important ideas in the whole project:

Context access is not authority to act.

A coding agent may have access to project documentation, prior discussion, planning notes, future ideas, rejected options, and implementation constraints.

But not all of that context should be allowed to direct implementation.

Useful reasoning context is not automatically implementation authority.
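
One way to picture the difference between access and authority is to label each piece of context with what it is allowed to do. The sketch below is an assumption about how such labels might look; the `Authority` levels, the `ContextItem` type, and the sample sources are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Authority(Enum):
    BACKGROUND = "background"          # may inform reasoning only
    PROJECT_MEMORY = "project_memory"  # reviewed, reusable reference
    IMPLEMENTATION = "implementation"  # may direct code changes


@dataclass(frozen=True)
class ContextItem:
    source: str
    text: str
    authority: Authority


def split_for_agent(items):
    """Separate what the agent may act on from what it may only read."""
    actionable = [i for i in items if i.authority is Authority.IMPLEMENTATION]
    reference = [i for i in items if i.authority is not Authority.IMPLEMENTATION]
    return actionable, reference


# Made-up context items for illustration.
items = [
    ContextItem("planning chat", "Maybe switch to a plugin system later", Authority.BACKGROUND),
    ContextItem("docs/architecture.md", "Settings are stored per user", Authority.PROJECT_MEMORY),
    ContextItem("contract 07", "Add validation to the settings form", Authority.IMPLEMENTATION),
]
actionable, reference = split_for_agent(items)
print([i.source for i in actionable])
print([i.source for i in reference])
```

The agent can still read everything in `reference`. Only the items in `actionable` are meant to direct a change.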

Documentation Changed Role

As the workflow developed, the role of documentation started to change.

At first, documentation was just helpful notes: architecture notes, current-state summaries, known technical debt, and decisions I did not want to forget.

But once those documents were updated, reviewed, and reused as future context, they became something stronger.

They started functioning as operational project memory.

That does not mean the documentation became source code. It did not compile, execute, or replace tests, code review, or engineering judgment.

But it could guide implementation.

When a coding agent reads documentation and acts from it, documentation becomes part of the instruction surface that shapes future code changes.

That changes how seriously documentation needs to be treated.

A normal bad documentation update is already a problem because it can mislead a human. But in an AI-assisted workflow, a bad documentation update can become bad memory.

If that bad memory is later reused by an assistant or included in coding-agent context, it can preserve false project state and feed it back into future work.

That is why documentation review stopped feeling like cleanup.

It became part of the mechanism.

The documentation had to be required, bounded, reviewed, updated, and reused inside controlled context flow.

More documentation alone was not the answer.

Slow Down the AI, Speed Up the Human

The slogan I kept coming back to was:

Slow down the AI. Speed up the human.

That sounds backwards at first. The whole selling point of AI coding tools is speed. Why slow them down?

Because the AI was already fast.

The problem was that it could move faster than I could safely understand, review, and direct. The workflow did not make the AI faster. It made the AI’s speed usable.

How the loop works: broad planning context is reduced into a reviewed implementation contract, executed narrowly by the coding agent, reviewed through a trust gate, and preserved as operational project memory for the next loop.

By slowing down the transition from reasoning to implementation, I could keep more authority over what the coding agent was actually allowed to act on.

The assistant could help reason, translate, and compress context. The implementation contract could define the task boundary. The coding agent could execute narrowly. Final review could compare the result against the contract. Documentation updates could preserve accepted project state for the next task.
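
Part of "compare the result against the contract" can be mechanical. The following is a rough sketch, assuming the contract lists allowed and protected path patterns; the function name, the patterns, and the file names are hypothetical. It only checks scope, not whether the change is correct, so it supports human review rather than replacing it.

```python
from fnmatch import fnmatchcase


def review_against_contract(changed_files, allowed_patterns, protected_patterns):
    """Classify each changed path relative to the contract's stated scope."""

    def matches(path, patterns):
        # fnmatchcase: shell-style matching, where "*" also matches "/".
        return any(fnmatchcase(path, pattern) for pattern in patterns)

    report = {"in_scope": [], "out_of_scope": [], "protected_touched": []}
    for path in changed_files:
        if matches(path, protected_patterns):
            report["protected_touched"].append(path)
        elif matches(path, allowed_patterns):
            report["in_scope"].append(path)
        else:
            report["out_of_scope"].append(path)
    # A clean scope check makes the change a candidate for acceptance,
    # not an accepted change; the human still decides.
    report["accept_candidate"] = not (
        report["out_of_scope"] or report["protected_touched"]
    )
    return report


# Made-up example: the agent touched one file it was told to avoid.
report = review_against_contract(
    changed_files=["src/settings/form.py", "src/auth/session.py", "README.md"],
    allowed_patterns=["src/settings/*", "docs/*"],
    protected_patterns=["src/auth/*"],
)
print(report)
```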

My role shifted away from repeated re-explanation and cleanup, and toward intent, judgment, review, and final acceptance.

That is what “slow down the AI, speed up the human” meant in practice.

The Artifact Trail Mattered

The prototype did not prove that this workflow works generally.

It was one solo prototype, over a short development period, using a prompt-only AI-assisted workflow. It did not test large teams, multiple branches, long timelines, enterprise tooling, or multiple people updating the same project-memory layer.

So the evidence boundary has to stay narrow.

But the prototype did leave something useful behind: an artifact trail.

There were project-memory files, consecutive implementation contracts, documentation updates, final reviews, and later reuse of reviewed context across multiple development passes.

That mattered because the artifacts showed context moving through the workflow.

Broad reasoning did not move directly into coding-agent execution. It was reduced into reviewed implementation contracts. Those contracts acted as the handoff point between assistant reasoning and coding-agent execution.

After implementation, the result was reviewed, documentation was updated, and that reviewed documentation became usable context for later AI-assisted tasks.

That did not make the prototype proof.

But it did make the mechanism visible.

The value of the prototype was not that it answered the scale question.

The value was that it made the scale question concrete enough to ask.

This Is Not a Claim of Inventing Software Engineering

A fair critique is that none of this sounds completely new.

Specifications, reviews, architecture decision records, acceptance criteria, technical-debt tracking, implementation boundaries, and living documentation all existed before this.

I am not claiming otherwise.

The interesting question is narrower:

Do familiar software-engineering practices take on a different operational role when arranged around AI-assisted development, where assistants and coding agents may read, reuse, and act on project context?

In my prototype, the answer appeared to be yes.

Documentation was no longer only explanatory. It became operational. It became part of the memory layer.

Controlled context flow became the routing and authority layer.

The implementation contract became the reduction point between broad reasoning and narrow execution.

Review before reuse became the safety gate before documentation could become trusted future context.

The pieces were familiar.

The arrangement mattered because the failure mode was specific to AI-assisted development.

The problem was not only whether the AI had enough context.

The problem was whether the AI was acting from the right context, at the right time, with the right authority.

Better Tools Do Not Remove the Principle

Manual file passing was part of my prototype, but it was never the principle.

I manually dragged files around because that was what my tools allowed and because I wanted direct control over what context went where.

A better tool should reduce that manual overhead. Repo-aware coding agents should be able to inspect files, retrieve documentation, index project state, understand code structure, and work directly inside development environments.

That is the better direction.

But repo awareness does not remove the deeper problem.

Access to more context is not the same as controlled context flow.

A repo-aware agent with broad access can still act from the wrong context if it treats unreviewed planning, stale documentation, rejected ideas, or future possibilities as actionable implementation instruction.

That is why the principle is not manual file separation.

The principle is controlled separation between reasoning context, reviewed project memory, implementation authority, coding-agent instructions, documentation updates, and future reusable context.

Better tooling should reduce manual overhead while preserving or strengthening those control points.

The Security Turn

The final clue was not another workflow improvement.

It was the risk hiding inside the improvement.

If documentation could become operational project memory, then documentation was no longer only a record of the project. It could become part of the instruction surface future AI-assisted work relied on.

Not executable code.

Not a replacement for tests, review, or engineering judgment.

But still something that could influence future code changes.

That changed the case.

A bad documentation update was not just a bad note anymore.

If it passed review and entered future context, it could become bad memory.

If a coding agent treated that memory as implementation guidance, then misleading or poisoned documentation could shape code without anyone directly changing the source code.

This was not a newly discovered attack class. It was an instance of a known family of LLM security risks, showing up inside this mechanism.

OWASP describes indirect prompt injection as occurring when an LLM accepts input from external sources such as websites or files, and notes that the impact depends partly on the agency built into the system. OWASP lists possible impacts including unauthorized function access, arbitrary command execution in connected systems, and manipulation of critical decisions. (OWASP Gen AI Security Project)

OWASP’s prompt-injection prevention cheat sheet also lists RAG poisoning/retrieval attacks and agent-specific attacks among common attack types. (OWASP Cheat Sheet Series)

In my project’s terms, the risk is this:

A bad actor may not need to attack the source code directly.

They may attack the project memory layer that future AI-assisted work relies on.

That made the old distinction sharper:

Documentation access is not implementation authority.

A coding agent may be able to read broad project documentation.

That does not mean every sentence inside it should be allowed to guide implementation.

Trust Gates

This is where the trust gate mattered.

A trust gate is the point where context becomes more authoritative than it had been before.

Planning context becomes implementation authority. Draft documentation becomes operational project memory. Coding-agent output becomes accepted project state.

A future tooling system might implement that with permissions, review workflows, provenance checks, authority labels, policy enforcement, or some better mechanism I have not designed.
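
As a thought experiment only, the review-workflow version of that gate might look like the sketch below. The `Trust` levels, the `Artifact` type, and the rule that promotion requires a named reviewer are assumptions for illustration, not something my prototype implemented.

```python
from dataclasses import dataclass, replace
from enum import IntEnum
from typing import Optional


class Trust(IntEnum):
    DRAFT = 0     # produced, not yet reviewed
    REVIEWED = 1  # accepted by a named reviewer


@dataclass(frozen=True)
class Artifact:
    name: str
    origin: str  # where the content came from (provenance)
    trust: Trust = Trust.DRAFT
    reviewed_by: Optional[str] = None


def promote(artifact: Artifact, reviewer: Optional[str]) -> Artifact:
    """Cross the trust gate: only a named reviewer can raise trust."""
    if not reviewer:
        raise PermissionError(
            f"{artifact.name}: no reviewer, stays {artifact.trust.name}"
        )
    # Return a new record so the draft and its promotion stay distinguishable.
    return replace(artifact, trust=Trust.REVIEWED, reviewed_by=reviewer)


def usable_as_future_context(artifact: Artifact) -> bool:
    """Only reviewed artifacts may be reused as trusted project memory."""
    return artifact.trust >= Trust.REVIEWED


# Made-up artifact: a documentation update written by the coding agent.
draft = Artifact(name="docs/current_state.md", origin="coding-agent output")
print(usable_as_future_context(draft))
accepted = promote(draft, reviewer="human maintainer")
print(usable_as_future_context(accepted))
```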

In my prototype, the trust gate was crude and manual.

It was me dragging files.

It was me deciding what crossed the boundary.

It was me deciding what stayed background context, what became implementation instruction, and what the coding agent was actually allowed to act on.

In hindsight, manual file dragging wasn't just awkward tooling.

It was a crude chain of custody.

That is why the detective metaphor started to fit.

The important question was not only:

What does the AI know?

It was:

Where did this context come from?

Has it been reviewed?

What authority does it have?

What is allowed to act on it?

What happens if it is wrong?

That is not just a workflow question.

At scale, it becomes a security and governance question.

The Limit of the Prototype

This is also where the prototype reaches its honest limit.

The artifact trail can show the mechanism. It can make the workflow inspectable. It can explain why documentation, controlled context flow, implementation authority, and review before reuse mattered in one observed workflow.

But it cannot prove safety at scale.

The same property that makes operational documentation useful also makes it sensitive.

If it can preserve project intent, it can also preserve false authority.

If it can help guide future implementation, it can also misguide future implementation when stale, poisoned, or poorly reviewed.

If this mechanism scales, then documentation governance becomes part of the safety model.

Projects would need to know which documents are passive explanation, which are planning-only, which are trusted operational memory, and which are allowed to become implementation authority. They would need provenance, ownership, review rules, stale-memory detection, conflict reporting, and better tooling around context routing and agent authority boundaries.
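
To make one of those needs less abstract, here is a small sketch of stale-memory detection. It assumes each document records its role, owner, last review date, and the code paths it describes, and that the last change date of each code path is available from somewhere such as version control. All names, roles, and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MemoryDoc:
    path: str
    role: str  # assumed labels: "explanation", "planning", or "operational"
    owner: str
    last_reviewed: date
    covers: tuple[str, ...]  # code paths this document describes


def stale_docs(docs, last_code_change):
    """Return operational docs reviewed before the code they describe last changed."""
    stale = []
    for doc in docs:
        if doc.role != "operational":
            continue  # only trusted operational memory is checked here
        changes = [last_code_change[p] for p in doc.covers if p in last_code_change]
        newest = max(changes, default=None)
        if newest is not None and newest > doc.last_reviewed:
            stale.append((doc.path, newest))
    return stale


# Made-up data: one document was reviewed before its code changed.
docs = [
    MemoryDoc("docs/settings.md", "operational", "maintainer",
              date(2025, 6, 1), ("src/settings/form.py",)),
    MemoryDoc("docs/ideas.md", "planning", "maintainer",
              date(2025, 5, 1), ("src/settings/form.py",)),
    MemoryDoc("docs/auth.md", "operational", "maintainer",
              date(2025, 6, 20), ("src/auth/session.py",)),
]
last_code_change = {
    "src/settings/form.py": date(2025, 6, 15),
    "src/auth/session.py": date(2025, 6, 10),
}
print(stale_docs(docs, last_code_change))
```

A date comparison like this can only flag candidates for re-review. It cannot tell whether the document is actually wrong.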

That is bigger than one solo prototype.

And that is the point.

What I Think I Found

I do not think I proved a finished methodology.

I do not think I proved long-term project memory.

I do not think this exact workflow should be copied unchanged.

What I think I found is a prototype-scale mechanism:

Documentation-as-operational-memory, when maintained through controlled context flow, may support long-term project memory in AI-assisted development.

That is a hypothesis, not a conclusion.

The prototype supports a narrower claim:

In one prompt-only AI-assisted development workflow, this mechanism appeared to reduce context drift and make later work more reviewable, bounded, and easier to resume.

That is enough to make the mechanism worth testing.

It is not enough to declare it solved.

The next work belongs to larger projects, longer timelines, multiple agents, team workflows, repo-aware tooling, and better security testing.

The case stops being only mine at exactly the point where the implication becomes too large to validate honestly alone.

That is not a failure of the prototype.

That is the research boundary.

The prototype found a mechanism specific enough to inspect. The artifact trail makes it possible to challenge. The hypothesis is clear enough to test.

Scaling it belongs to better tooling, better security boundaries, and more rigorous testing than one person can honestly provide.
