Vuong Ngo

Three Kinds of AI Context: Most Tools Only Solve One

AI context failure bundles three distinct problems: personal context (who you are), product-decision context (what the product should do), and local task persistence (what work is queued). Two new tools and one Anthropic feature each solve one of these layers. None of them addresses a fourth layer, a shared, writable contract for the current open work item, and that gap is why developers who've installed all three still feel stuck.


You set up a CLAUDE.md. Maybe you wrote memory files. In the week of 8 June 2026, two tools hit Product Hunt — one for personal context, one for product decisions — and you installed those too. You are still re-explaining the project at the start of every session.

The problem is the diagnosis. "AI starts from scratch" treats one frustration as one cause. It isn't. It's at least three separate context failures that happen to produce the same symptom, and most tools solve exactly one of them.

Here's the model I've landed on after watching this category for the past six months.


Layer 1 — Who you are

This is the most static layer. Your stack, your role, your preferences, how you like commit messages formatted, what framework you avoid. It changes roughly as often as your LinkedIn headline.

Unabyss hit #1 on Product Hunt on launch day with 755 upvotes for solving exactly this. The tagline is unambiguous: "Set it up once and never re-explain yourself to AI again." It pulls structured context from LinkedIn, Notion, and Gmail, then exposes it to any tool that speaks MCP, with per-tool visibility controls.

If this is your gap, you can fix it manually today. A CLAUDE.md covering who you are is a perfectly working solution for a single assistant:

## About me

- Stack: TypeScript, Node, PostgreSQL, React
- Prefer functional React; no class components
- Testing: Vitest + Testing Library; mock at service boundaries only
- Commit style: conventional commits, imperative mood, no period
- Time zone: AEST

The limit is portability. If you run multiple agents, switch machines, or want consistent preferences across tools, file-per-tool doesn't hold. That's the gap Unabyss fills: one writable context store, readable by anything that asks.


Layer 2 — What your product should do

This layer moves slower than your task queue but faster than your identity. It's the architectural decisions already made, the approaches that were ruled out and why, the constraints that aren't visible in the code.

Brief launched this week and reached #5 with 253 upvotes. The problem statement is sharp: "AI agents can ship quickly, but without the right product context, they're often flying blind." Brief stores those decisions and serves relevant context to agents through chat, Slack, CLI, and MCP.

A CLAUDE.md can carry this layer too, but it gets unwieldy:

## Architecture decisions

- Auth: custom JWT + refresh token. Rejected Clerk (vendor lock-in concern, 2024-11).
  See: docs/ADRs/0003-auth-approach.md
- DB: Postgres. MongoDB ruled out early — our query patterns are relational.
- Background jobs: BullMQ. No migration to new runners without a spike.

At a certain point, keeping that file accurate is its own maintenance job. Tools like Brief try to automate the curation. Whether you use a tool or a disciplined ADR directory, the important thing is that this layer exists and stays current — because an assistant that doesn't know why the auth system looks the way it does will confidently propose changes you ruled out eight months ago.


Layer 3 — What work is left (on this machine)

On January 22, 2026, Anthropic shipped Claude Code Tasks — a persistent task system that survives session termination. Tasks live in ~/.claude/tasks/ as JSON:

{
  "id": "01JJ3QZWZ4R2XM6GBTF9V7Y8KP",
  "title": "Implement rate limiting on /api/v1/completions",
  "status": "in_progress",
  "dependencies": ["01JJ3QY..."],
  "owner": "claude",
  "created": "2026-01-24T08:12:00Z"
}

Before Tasks, Claude Code stored todos in session memory. They disappeared when the terminal closed. Tasks fix this: create them once, and they persist across restarts, terminal crashes, and session resets. That's a genuine improvement over the status quo.

The constraint is scope. Tasks are local: they live on one machine and don't synchronise across machines or agents. They store orchestration metadata (status, dependencies, owner) but not the content of the work: what "done" means, what the acceptance criteria are, or what artifacts prove the task is complete.
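To make that scope limit concrete, here is a minimal TypeScript sketch of what orchestration metadata alone can answer. The `LocalTask` type mirrors only the fields in the JSON above. The `pending` and `completed` status values, the short task IDs, and the `readyTasks` helper are assumptions for illustration, not Claude Code's actual schema or API.

```typescript
// Mirrors the fields shown in the example record. Only "in_progress" appears
// in the source; the other two status values are assumed for this sketch.
type TaskStatus = "pending" | "in_progress" | "completed";

interface LocalTask {
  id: string;
  title: string;
  status: TaskStatus;
  dependencies: string[]; // IDs of tasks that must finish first
  owner: string;
  created: string; // ISO 8601 timestamp
}

// Hypothetical helper: pending tasks whose dependencies are all completed.
function readyTasks(tasks: LocalTask[]): LocalTask[] {
  const completed = new Set(
    tasks.filter((t) => t.status === "completed").map((t) => t.id),
  );
  return tasks.filter(
    (t) =>
      t.status === "pending" &&
      t.dependencies.every((dep) => completed.has(dep)),
  );
}

// Illustrative data with made-up short IDs.
const tasks: LocalTask[] = [
  {
    id: "t1",
    title: "Add Redis client",
    status: "completed",
    dependencies: [],
    owner: "claude",
    created: "2026-01-24T08:00:00Z",
  },
  {
    id: "t2",
    title: "Implement rate limiting on /api/v1/completions",
    status: "pending",
    dependencies: ["t1"],
    owner: "claude",
    created: "2026-01-24T08:12:00Z",
  },
  {
    id: "t3",
    title: "Document rate limits",
    status: "pending",
    dependencies: ["t2"],
    owner: "claude",
    created: "2026-01-24T08:15:00Z",
  },
];

// Only t2 is ready: t1 is already done and t3 still waits on t2.
console.log(readyTasks(tasks).map((t) => t.id));
```

The metadata is enough to answer "what can run next". Nothing in the record says what finishing `t2` actually requires.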


The four layers, together

| Layer | What it answers | Change frequency | Example tools |
| --- | --- | --- | --- |
| 1: Personal context | Who you are, preferences, stack | Rarely (months) | Unabyss, CLAUDE.md |
| 2: Product-decision context | What should be built and why | Occasionally (weeks) | Brief, ADRs |
| 3: Local task persistence | What work is queued on this machine | Constantly (sessions) | Claude Code Tasks |
| 4: Structured current-work context | What is open, what done means, what proves it | Constantly, shared | ? |

The question mark in that last column is where most developers who've installed layers 1–3 are still stuck.

Quadrant diagram mapping the four layers of AI context by change frequency and scope. Layer 4 — structured current-work context — occupies the high-frequency, shared quadrant and is highlighted in orange as the gap most tools leave unfilled.

The four layers plotted by how often they change and who can see them. Layers 1 and 2 sit in the slow-change rows; layers 3 and 4 are in constant flux. Most tools cover the left column. The top-right cell is the gap. (Author's model.)


The AI Context Gap Nobody Names

Walk through a real session. You open a new Claude Code instance. Layer 1 tells it you prefer TypeScript and conventional commits. Layer 2 tells it why the auth system looks the way it does. Layer 3 tells it there's a task called "Implement rate limiting" in progress.

What it doesn't know: what done means for that task. What the acceptance criteria are. Whether there's a failing test waiting. Whether another agent already started the same work in a different worktree. Whether the spec changed since you queued the task.

That information isn't in your CLAUDE.md. It's not in your decisions log. It's not in the Tasks JSON. It's the contract of the work — and it needs to live somewhere shared, writable, and structured. Not a file you write once and hope stays accurate.

This is also what distinguishes Layer 4 from the others in a practical sense: the contract changes as the work progresses. An assistant needs to be able to read it at session start and write to it as evidence accumulates. Static files can't do that.


Why a longer prompt doesn't close this gap

The instinct is to paste more context into the system prompt or CLAUDE.md. It rarely helps, and there's a mechanical reason.

Qualitative U-shaped curve showing model recall accuracy by document position in a long context: highest at start and end, lowest in the middle, with the middle position highlighted as the worst retrieval point.

Model recall by position in a long context window. Acceptance criteria buried in paragraph 12 of a CLAUDE.md face the worst retrieval odds. Based on Liu et al. 2023. Y-axis values are qualitative only.

A 2023 study on how language models use long contexts, "Lost in the Middle" (Liu et al.), showed that models retrieve information most reliably from the start and end of long inputs, and that accuracy drops noticeably for content in the middle. The longer the input, the more of your carefully written CLAUDE.md sits in that low-recall middle.

Anthropic's context engineering guide for agents says it directly: "context is a critical but finite resource." The guidance is to treat it as something you curate and structure, not something you dump in bulk.

For Layer 4, the implication is concrete. If the acceptance criteria for a task are buried in paragraph 12 of a 600-line memory file, the assistant is not reliably reading them. They need to be in a distinct, retrievable record — something the assistant fetches on demand rather than scans.


What structured current-work context actually looks like

Here's the shape of the missing piece. This isn't a vendor-specific format — it's what a work item record needs to carry to be genuinely useful to an AI assistant at session start:

{
  "id": "wu_01JJ3R",
  "title": "Rate limiting on /api/v1/completions",
  "status": "active",
  "acceptanceCriteria": [
    "Returns 429 with Retry-After header when limit exceeded",
    "Limit is configurable per API key, not global",
    "Integration test covers the 429 path with a real Redis instance"
  ],
  "artifacts": [
    {
      "type": "spec",
      "label": "Rate limit spec",
      "url": "https://...",
      "linkedAt": "2026-06-08T09:00:00Z"
    },
    {
      "type": "test-result",
      "label": "Failing test run (pre-fix)",
      "url": "https://...",
      "linkedAt": "2026-06-09T11:43:00Z"
    }
  ]
}

Compare that to the Layer 3 task record. Layer 3 tells the agent that a task exists, who owns it, and whether it's in progress. Layer 4 tells it what the task means — the criteria that will constitute evidence of completion, and the evidence that already exists.
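One way to put such a record to work is to render it into a short briefing at the top of a session, where recall tends to be strongest. The TypeScript below is a sketch under assumptions: the types mirror the example record, `renderBriefing` is a hypothetical helper, and the `example.com` URLs are placeholders for the links elided in the record.

```typescript
// Types mirror the example work item record; this is not a vendor format.
interface Artifact {
  type: string; // e.g. "spec" or "test-result"
  label: string;
  url: string;
  linkedAt: string; // ISO 8601 timestamp
}

interface WorkUnit {
  id: string;
  title: string;
  status: string;
  acceptanceCriteria: string[];
  artifacts: Artifact[];
}

// Hypothetical helper: turn the record into a compact session-start briefing.
function renderBriefing(unit: WorkUnit): string {
  const criteria = unit.acceptanceCriteria.map((c, i) => `  ${i + 1}. ${c}`);
  const evidence = unit.artifacts.map((a) => `  - [${a.type}] ${a.label}`);
  return [
    `Work item ${unit.id}: ${unit.title} (${unit.status})`,
    "Done means:",
    ...criteria,
    "Evidence so far:",
    ...(evidence.length > 0 ? evidence : ["  (none linked yet)"]),
  ].join("\n");
}

const unit: WorkUnit = {
  id: "wu_01JJ3R",
  title: "Rate limiting on /api/v1/completions",
  status: "active",
  acceptanceCriteria: [
    "Returns 429 with Retry-After header when limit exceeded",
    "Limit is configurable per API key, not global",
    "Integration test covers the 429 path with a real Redis instance",
  ],
  artifacts: [
    {
      type: "spec",
      label: "Rate limit spec",
      url: "https://example.com/rate-limit-spec", // placeholder URL
      linkedAt: "2026-06-08T09:00:00Z",
    },
    {
      type: "test-result",
      label: "Failing test run (pre-fix)",
      url: "https://example.com/test-runs/pre-fix", // placeholder URL
      linkedAt: "2026-06-09T11:43:00Z",
    },
  ],
};

console.log(renderBriefing(unit));
```

The briefing stays a few lines long however large the memory file grows, because it is generated from one record instead of being scanned out of a long document.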

Wiring this up over MCP looks like any other context server:

{
  "mcpServers": {
    "project": {
      "command": "npx",
      "args": ["-y", "@your-tool/project-mcp@latest"],
      "env": {
        "PROJECT_API_KEY": "your-key"
      }
    }
  }
}

With this configured, the assistant can call get_work_unit at the start of every session and receive the full record — criteria, artifacts, status — fetched fresh. Not read from a static file that may have drifted.
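The read-at-start, write-as-you-go cycle can be sketched without any server at all. The TypeScript below uses an in-memory store as a stand-in for whatever sits behind the MCP server. `WorkUnitStore`, `getWorkUnit`, and `linkArtifact` are hypothetical names chosen for illustration, not the API of any real tool, and the `example.com` URL is a placeholder.

```typescript
interface Artifact {
  type: string;
  label: string;
  url: string;
  linkedAt: string; // ISO 8601 timestamp
}

interface WorkUnit {
  id: string;
  title: string;
  status: string;
  acceptanceCriteria: string[];
  artifacts: Artifact[];
}

// Copy a record so callers cannot mutate the shared state by accident.
function copyUnit(unit: WorkUnit): WorkUnit {
  return {
    ...unit,
    acceptanceCriteria: [...unit.acceptanceCriteria],
    artifacts: unit.artifacts.map((a) => ({ ...a })),
  };
}

// In-memory stand-in for a shared work item service (assumption: a real
// deployment would expose these operations as MCP tools).
class WorkUnitStore {
  private units = new Map<string, WorkUnit>();

  put(unit: WorkUnit): void {
    this.units.set(unit.id, copyUnit(unit));
  }

  // Hypothetical counterpart of a get_work_unit call: returns the current record.
  getWorkUnit(id: string): WorkUnit {
    const unit = this.units.get(id);
    if (!unit) throw new Error(`Unknown work unit: ${id}`);
    return copyUnit(unit);
  }

  // Hypothetical write path: attach evidence as the work progresses.
  linkArtifact(
    id: string,
    artifact: Omit<Artifact, "linkedAt">,
    now: Date = new Date(),
  ): WorkUnit {
    const unit = this.units.get(id);
    if (!unit) throw new Error(`Unknown work unit: ${id}`);
    unit.artifacts.push({ ...artifact, linkedAt: now.toISOString() });
    return copyUnit(unit);
  }
}

const store = new WorkUnitStore();
store.put({
  id: "wu_01JJ3R",
  title: "Rate limiting on /api/v1/completions",
  status: "active",
  acceptanceCriteria: ["Returns 429 with Retry-After header when limit exceeded"],
  artifacts: [],
});

// One session records evidence...
store.linkArtifact(
  "wu_01JJ3R",
  {
    type: "test-result",
    label: "Failing test run (pre-fix)",
    url: "https://example.com/test-runs/pre-fix", // placeholder URL
  },
  new Date("2026-06-09T11:43:00Z"),
);

// ...and a later session, or another agent, fetches the record fresh.
const fresh = store.getWorkUnit("wu_01JJ3R");
console.log(fresh.artifacts.map((a) => a.label));
```

Returning copies instead of live references is a deliberate choice in this sketch: every change has to go through the store, which is what makes the record a shared source of truth and not a local scratchpad.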

For a detailed breakdown of how this plays out across multiple tasks and agents, Agiflow's write-up on coordinating multi-task workflows with work units covers the model in practice.


Which layer is actually your problem

You probably don't need all four fixed today.

If the assistant keeps asking about your stack or commit style: Layer 1. A good CLAUDE.md or Unabyss solves it in an afternoon.

If it makes decisions that contradict past architecture choices: Layer 2. Start writing ADRs, or try Brief.

If it loses track of what it was doing when the session ends: Layer 3. Claude Code Tasks is already shipped, it's free, and it's local.

If it knows what work is queued but not what done means, drifts off the spec mid-session, or can't pick up where another agent left off: that's Layer 4. No static file solution handles it cleanly. You need a writable, shared, structured source of truth for the current work contract.

Most developers I've seen hit Layer 4 and diagnose it as Layer 3. They add more to CLAUDE.md, the agent still drifts, and the conclusion is "AI just isn't reliable enough yet." Sometimes that's true. More often, the right AI context structure was never there to begin with.
