Section 1: The Silent Killer
When Claude Code's context window fills, the runtime does not hard-stop. It doesn't throw an error. It doesn't ask permission. It compacts.
Compaction is an automatic summarization step that fires when the token budget crosses a threshold. The mechanics are straightforward: the oldest turns in the conversation history are replaced with a compressed summary, while the most recent exchanges are preserved verbatim. The summary stands in for everything older.
From a token-budget perspective, this is correct behavior. There is no other option. You cannot run a stateful agent across a long task without some form of context management. The window is finite. The task is not.
The problem is the word "compressed." A summary is a lossy transformation. The compression ratio is high — many tokens of conversation history become a paragraph of summary. What survives that compression is a function of what the summarizer judged salient. Factual statements about what actions were taken survive well. Constraints survive partially. Nuanced reasoning about why a particular approach was chosen tends to survive poorly. Negative constraints — "don't touch X", "avoid this approach because..." — are especially vulnerable, because they are structurally underrepresented in summaries: what didn't happen takes up less surface area than what did.
Here is a concrete production failure I hit.
I had an agent working through a multi-step migration task. Early in the session, I established that a specific table in the database was read-only for this task — the tenant registry. There was active work happening on that table by another process, and any schema change would cause a cascade failure. I was explicit about it: "Do not touch the tenant_registry table. Do not add columns, do not create indexes, do not run any DDL against it."
The agent acknowledged this. It moved forward. It completed several unrelated subtasks. The context window filled. Compaction fired.
The summary captured the migration objective. It captured what had been completed. It mentioned the database was involved. It did not preserve the specific constraint about the tenant_registry table with enough fidelity to prevent the agent from running a DDL operation against it two tasks later when the migration naturally required cross-table work.
The operation succeeded at the database level. The cascade failure arrived async, from the other process. I found it in the error log four hours later.
Nothing in the session output flagged that compaction had occurred. Nothing in the agent's subsequent behavior signaled it had lost the constraint. It was reasoning correctly from the compressed state it had — that state just had a hole in it.
That is what makes compaction dangerous in autonomous operation. The agent doesn't know what it doesn't know. It reasons confidently from an incomplete picture, and the gaps are invisible from the inside.
Section 2: What Gets Lost and Why
Not all state is equally vulnerable to compaction. Understanding the failure modes requires a taxonomy.
Tool call results — high vulnerability
When the agent runs a Bash command and reads the output, that output lives in the conversation as a tool result. Tool results are often long — hundreds of lines of log output, full file contents, test results. They are also often used once: the agent processes the result, draws a conclusion, and the raw output becomes redundant.
From a summarization perspective, tool results are natural candidates for aggressive compression. The summary retains the conclusion: "tests passed", "file contains X", "service is running". The raw output is dropped.
This is fine when the raw output was truly just an input to a single conclusion. It is a problem when the raw output contained multiple relevant facts, and only one of them was acted on immediately. The rest are now gone. If a later step in the task needs one of those secondary facts, the agent will re-derive it, re-read the file, or get it wrong.
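One mitigation is to persist secondary facts the moment they are extracted, before the raw tool output can be compacted away. Below is a minimal sketch; the `record_facts` helper, the facts-file path, and the service metrics in the example are all hypothetical, not part of any Claude Code API.

```python
import json
from pathlib import Path

FACTS_FILE = Path("./outputs/session_facts.json")  # hypothetical location

def record_facts(new_facts: dict) -> None:
    """Merge newly extracted facts into a persistent facts file.

    Call this immediately after processing a large tool result, so that
    facts not acted on right away survive compaction of the raw output.
    """
    FACTS_FILE.parent.mkdir(parents=True, exist_ok=True)
    facts = json.loads(FACTS_FILE.read_text()) if FACTS_FILE.exists() else {}
    facts.update(new_facts)
    FACTS_FILE.write_text(json.dumps(facts, indent=2))

# Example: a service-status read was done to answer one question, but the
# same output contained two secondary facts worth keeping for later steps.
record_facts({
    "api_service_state": "running",
    "api_service_uptime_days": 12,   # secondary fact from the same output
    "api_service_memory_mb": 340,    # secondary fact from the same output
})
```

The point is the timing: the write happens while the raw output is still in context, not later when a summary is all that remains.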
Intermediate conclusions — medium vulnerability
The agent builds up a model of the system as it works. "This service is stateless, so I can restart it without drain." "This config value is referenced in three places." "The test is flaky, not broken — ignore intermittent failures." These are conclusions drawn from evidence earlier in the session.
They are embedded in the conversation as reasoning traces — assistant turns explaining what the agent concluded and why. Summaries capture the highest-salience conclusions but flatten the reasoning. The "why" is the first thing to go.
When the "why" is gone, the agent may later reach the opposite conclusion from fresh evidence if that evidence is locally ambiguous. The earlier conclusion has no backing anymore.
Explicit constraint acknowledgments — high vulnerability
"Remember, don't touch X." "Make sure to use approach Y for this module." "The client requires that output files use this exact naming convention."
Constraints stated conversationally, without a corresponding file artifact, are the most dangerous category. The agent acknowledged them. They shaped early decisions. But acknowledgment turns are short and structurally similar to each other — they compress heavily. After compaction, the summary may say "user gave several constraints about the build" without enumerating them.
The agent no longer has the specific list. It has a summary that there was a list.
Completed subtasks that weren't fully logged — low-to-medium vulnerability
Completed work leaves artifacts: files, database records, deployed services. Those artifacts exist independently of the conversation. The agent can re-inspect them.
The vulnerability here is more subtle: the decisions made during a subtask may be gone even when the subtask's outputs survive. The agent knows a file was written. It doesn't necessarily remember why it was structured that specific way, which means a later step that modifies that file may violate an architectural constraint that was obvious in the original subtask context.
Why summaries can't fully substitute for raw history
A summary is an agent-generated compression. Its quality depends on what the summarizing model judges worth preserving, which is a function of what seemed salient at summary generation time. Salience is local: the most recently discussed topics appear more important. Negative constraints are structurally invisible in summaries. Long reasoning chains compress to single-sentence conclusions.
The raw history is the ground truth. The summary is a lossy encoding. For short tasks with clear objectives, the loss is tolerable. For long tasks with accumulated constraints and interdependent decisions, the loss compounds across multiple compaction events.
Section 3: Compaction-Resistant Architecture
Four patterns. I use all of them in production. They compose — each layer backs up the others.
Pattern 1: Checkpoint Writes
At every significant milestone in a task, the agent writes the current state to a file. Not a summary of what it did — the live state that the next phase needs.
The checkpoint file is not documentation. It is a machine-readable context recovery artifact. The agent will read it at the start of each subsequent phase. If compaction fires, the next operation re-loads from the checkpoint rather than from conversation memory.
What belongs in a checkpoint:
- Active constraints (including negative constraints — especially those)
- Decisions made and the reason they were made
- Current task state: what is complete, what is in progress, what is blocked
- Any system facts that were discovered and are relevant going forward
- Explicit re-statement of things that must not happen
The checkpoint is only useful if it is written before context-heavy operations. Writing it after means compaction may have already fired.
A checkpoint cadence that works: write before any operation that will consume more than a few thousand tokens (running tests, reading large files, invoking sub-agents, executing database migrations). Write at each logical phase boundary regardless of token consumption.
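That cadence can be encoded as a small gate. A minimal sketch follows; the 3,000-token threshold and the characters-per-token estimator are assumptions, not measured values.

```python
TOKEN_THRESHOLD = 3000  # assumed "few thousand tokens" cutoff

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token (rule of thumb)."""
    return len(text) // 4

def should_checkpoint(operation_input: str, is_phase_boundary: bool) -> bool:
    """Checkpoint before heavy operations, and always at phase boundaries."""
    if is_phase_boundary:
        return True
    return estimate_tokens(operation_input) > TOKEN_THRESHOLD

# A 20,000-character file read (~5,000 estimated tokens) triggers a
# checkpoint; a short shell command does not; phase boundaries always do.
assert should_checkpoint("x" * 20_000, is_phase_boundary=False)
assert not should_checkpoint("ls -la", is_phase_boundary=False)
assert should_checkpoint("", is_phase_boundary=True)
```

The gate errs toward checkpointing too often; a redundant checkpoint costs milliseconds, a missing one costs recoverability.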
Pattern 2: Explicit State Re-Injection
Checkpoints are only useful if they are read. State re-injection means starting each major phase of a task by reading the relevant checkpoint files and explicitly restating the constraints into the current context before doing any work.
This is not redundant. After compaction, the conversation history is a summary. The most recent checkpoint is the last known-good full state. Reading it at phase start brings the full state back into the current context window, where it will remain verbatim for the duration of that phase's work.
The re-injection also serves as a correctness check: if the agent re-reads the checkpoint and notices that its current understanding diverges from what the checkpoint says, that divergence is a signal that something went wrong.
Re-injection should be explicit in the agent's prompt chain: "Before proceeding with phase N, read the phase N checkpoint file and confirm that all listed constraints are still active."
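The divergence check can be made mechanical: compare the constraints the agent can currently restate against the checkpoint's list. A sketch, with the caveat that exact string matching is a simplification; a real recall check needs fuzzier comparison:

```python
def missing_constraints(current: list[str], checkpoint_constraints: list[str]) -> list[str]:
    """Return checkpoint constraints absent from the agent's current restatement.

    A non-empty result is the divergence signal: the live context no longer
    contains something the checkpoint says is active, so halt and re-inject.
    """
    current_set = {c.strip().lower() for c in current}
    return [c for c in checkpoint_constraints if c.strip().lower() not in current_set]

# The agent restates one constraint; the checkpoint lists two.
lost = missing_constraints(
    current=["Migration must be additive only — no column drops"],
    checkpoint_constraints=[
        "Migration must be additive only — no column drops",
        "No DDL against tenant_registry — active concurrent writes",
    ],
)
# `lost` now holds the tenant_registry constraint: exactly the kind of
# negative constraint that compaction tends to drop.
```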
Pattern 3: Compaction Detection
There is no native "compaction occurred" event exposed by Claude Code. You cannot query whether compaction has fired, but you can detect it indirectly.
Compaction detection relies on a sentinel: a value written to a file at task start that the agent is instructed to re-read and verify at each phase boundary. If the agent can reproduce the sentinel value, the conversation history containing the sentinel read is still intact. If it cannot, compaction has likely compressed that turn.
More practically: you can detect behavioral evidence of compaction by testing the agent's recall of specific early-session constraints before proceeding. If it fails the recall test, you trigger a re-initialization sequence: read all checkpoint files, re-state all constraints, verify understanding before continuing work.
The detection overhead is low — a single file read and a short verification step. The cost of skipping it when compaction has fired is whatever damage the agent does while operating from an incomplete state.
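A recall-based sentinel can be sketched in a few lines. The idea: generate a random value at session start, place it in the conversation, and persist it to disk as ground truth; at each phase boundary, ask the agent to reproduce it and compare. File path and function names here are illustrative.

```python
import secrets
from pathlib import Path

SENTINEL_FILE = Path("./outputs/recall_sentinel.txt")  # hypothetical path

def write_sentinel() -> str:
    """Generate a random sentinel at session start.

    The value goes into the conversation (the agent reads it) AND onto
    disk (the ground truth it is later checked against).
    """
    value = secrets.token_hex(8)
    SENTINEL_FILE.parent.mkdir(parents=True, exist_ok=True)
    SENTINEL_FILE.write_text(value)
    return value

def recall_intact(agent_reported_value: str) -> bool:
    """True if the agent can still reproduce the sentinel from context.

    A mismatch or blank answer suggests the turn containing the sentinel
    has been compacted away: trigger the re-initialization sequence.
    """
    return agent_reported_value.strip() == SENTINEL_FILE.read_text().strip()
```

Unlike a file-existence check, this tests the conversation itself: the file survives compaction by definition; the agent's memory of its contents does not.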
Pattern 4: Session Segmentation
For tasks that will span many hours and many phases, a single ultra-long session is architecturally unsound. Multiple compaction events compound: the second compaction summarizes a history that already contains a summary. Information loss accelerates with each event.
Session segmentation means treating the task as a sequence of bounded sessions, each with a clean handoff file. Session N completes some work, writes a handoff file that captures the full state needed by session N+1, then exits cleanly. Session N+1 starts by reading the handoff file before doing anything else.
Each session starts fresh — full context window, no compaction debt. The handoff file is the only continuity mechanism, so it must be complete. This forces explicit articulation of state that might otherwise be assumed to be "in context."
The segmentation boundary should align with natural task phases. "Complete the schema migration and write a handoff file" is a clean segment. "Do some of the migration and some of the testing" is not.
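A driver for the segment sequence might look like the sketch below. How a session is actually launched is abstracted behind a `run_session` callable, since that detail depends on your harness; everything else (paths, payload shape) is likewise an assumption.

```python
import json
from pathlib import Path

def run_segmented_task(segments: list[str], output_dir: str, run_session) -> None:
    """Drive a task as a sequence of bounded sessions with handoff files.

    `run_session(instructions: str) -> dict` is a placeholder for however
    you launch one agent session; it must return the handoff payload.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    prior_handoff = None
    for i, segment in enumerate(segments, start=1):
        instructions = segment
        if prior_handoff is not None:
            # Session N+1 sees the handoff before its own objective.
            instructions = (
                "Read this handoff completely before any other action:\n"
                + json.dumps(prior_handoff, indent=2)
                + "\n\nThen: " + segment
            )
        handoff = run_session(instructions)
        # Persist the handoff even though it is passed in-memory, so a
        # crashed or interrupted run can resume from disk.
        (out / f"handoff_s{i}.json").write_text(json.dumps(handoff, indent=2))
        prior_handoff = handoff
```

Each iteration starts a fresh context window; the handoff payload is the only state that crosses the boundary, which is exactly the discipline the pattern enforces.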
Section 4: Code Examples
Checkpoint Write — Python
```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_checkpoint(checkpoint_dir: str, phase: str, state: dict) -> Path:
    """
    Write a phase checkpoint before any context-heavy operation.
    Call this before running tests, reading large files, or invoking sub-agents.
    """
    directory = Path(checkpoint_dir)
    directory.mkdir(parents=True, exist_ok=True)  # first checkpoint creates the dir
    path = directory / f"checkpoint_{phase}.json"
    payload = {
        "phase": phase,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "constraints": state.get("constraints", []),
        "decisions": state.get("decisions", {}),
        "do_not_touch": state.get("do_not_touch", []),
        "completed_tasks": state.get("completed_tasks", []),
        "in_progress": state.get("in_progress", ""),
        "facts": state.get("facts", {}),
    }
    path.write_text(json.dumps(payload, indent=2))
    return path


# Example usage before a database migration phase
write_checkpoint(
    checkpoint_dir="./outputs/session_checkpoints",
    phase="pre_migration",
    state={
        "constraints": [
            "Use WAL mode for all SQLite writes",
            "No DDL against tenant_registry table — active writes from separate process",
            "Output files must use snake_case naming convention",
        ],
        "do_not_touch": ["tenant_registry", "auth_tokens"],
        "decisions": {
            "schema_approach": "additive_only",
            "schema_approach_reason": "existing consumers cannot handle column removal",
        },
        "completed_tasks": ["schema_audit", "backup_verification"],
        "in_progress": "column_additions_to_user_profiles",
        "facts": {
            "db_path": "/data/production.db",
            "backup_verified_at": "2026-04-22T09:14:00Z",
        },
    },
)
```
Checkpoint Read + Re-Injection — Python
```python
import json
from pathlib import Path


def load_checkpoint(checkpoint_dir: str, phase: str) -> dict:
    """
    Load checkpoint at phase start. Re-state all constraints before proceeding.
    This is your recovery path after a compaction event.
    """
    path = Path(checkpoint_dir) / f"checkpoint_{phase}.json"
    if not path.exists():
        raise FileNotFoundError(
            f"No checkpoint found for phase '{phase}'. "
            "Cannot proceed without known-good state."
        )
    state = json.loads(path.read_text())

    # Emit re-injection block — this goes into the agent's active context
    print(f"=== RE-INJECTING STATE FROM CHECKPOINT: {phase} ===")
    print(f"Timestamp: {state['timestamp']}")
    print("\nACTIVE CONSTRAINTS (must be honored for remaining work):")
    for c in state["constraints"]:
        print(f"  - {c}")
    print("\nDO NOT TOUCH:")
    for item in state["do_not_touch"]:
        print(f"  - {item}")
    print("\nKEY DECISIONS:")
    for k, v in state["decisions"].items():
        print(f"  {k}: {v}")
    print("=== END STATE RE-INJECTION ===\n")
    return state
```
Compaction Detection — Bash
```bash
#!/usr/bin/env bash
# compaction_check.sh
# Write a sentinel at task start; verify it at each phase boundary.
# If verification fails, trigger re-initialization before proceeding.

SENTINEL_FILE="./outputs/session_sentinel.txt"
CHECKPOINT_DIR="./outputs/session_checkpoints"
PHASE="${1:-unknown}"

write_sentinel() {
  local session_id
  session_id="$(date +%s)-$$"
  echo "$session_id" > "$SENTINEL_FILE"
  echo "SENTINEL_WRITTEN: $session_id"
}

verify_sentinel_or_reinit() {
  if [[ ! -f "$SENTINEL_FILE" ]]; then
    echo "COMPACTION_DETECTED: sentinel file missing — running re-initialization"
    reinitialize_from_checkpoints
    return 1
  fi
  local stored_sentinel
  stored_sentinel="$(cat "$SENTINEL_FILE")"
  echo "SENTINEL_OK: $stored_sentinel — proceeding with phase $PHASE"
  return 0
}

reinitialize_from_checkpoints() {
  echo "=== COMPACTION RECOVERY: loading all available checkpoints ==="
  for f in "$CHECKPOINT_DIR"/checkpoint_*.json; do
    [[ -f "$f" ]] || continue
    echo "--- Loading: $f ---"
    # Inline Python stays flush-left: leading whitespace inside the -c
    # string would be an IndentationError.
    python3 -c "
import json
state = json.load(open('$f'))
print(f'Phase: {state[\"phase\"]} @ {state[\"timestamp\"]}')
print('Constraints:')
for c in state.get('constraints', []):
    print(f' - {c}')
print('Do not touch:', state.get('do_not_touch', []))
"
  done
  echo "=== RECOVERY COMPLETE — all constraints re-loaded ==="
}

# At session start:       write_sentinel
# At each phase boundary: verify_sentinel_or_reinit
```
Session Handoff File — Python
```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_handoff(output_dir: str, session_id: str, next_session_instructions: dict) -> Path:
    """
    Write a clean handoff file at the end of a session segment.
    The next session reads this before doing any work.

    This file is the ONLY continuity mechanism between sessions.
    It must be complete — assume the next session has zero prior context.
    """
    directory = Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"handoff_{session_id}.json"
    handoff = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "from_session": session_id,
        "next_session_start_instructions": (
            "Read this file completely before any other action. "
            "All constraints listed here are active. "
            "Do not proceed without acknowledging each constraint."
        ),
        "task_objective": next_session_instructions["objective"],
        "completed_this_session": next_session_instructions["completed"],
        "next_phase": next_session_instructions["next_phase"],
        "hard_constraints": next_session_instructions["constraints"],
        "do_not_touch": next_session_instructions["do_not_touch"],
        "key_facts": next_session_instructions["facts"],
        "open_questions": next_session_instructions.get("open_questions", []),
        "known_risks": next_session_instructions.get("known_risks", []),
    }
    path.write_text(json.dumps(handoff, indent=2))
    print(f"Handoff written to: {path}")
    print(f"Next session must read: {path.name}")
    return path


# Example: end of session 1 of a multi-session migration
write_handoff(
    output_dir="./outputs",
    session_id="migration_s1",
    next_session_instructions={
        "objective": "Complete user profile schema migration and deploy to staging",
        "completed": [
            "Schema audit complete — findings in outputs/schema_audit.json",
            "Backup verified — outputs/backup_verification.md",
            "Column additions to user_profiles — migration script at migrations/002_add_profile_fields.sql",
        ],
        "next_phase": "Run migration against staging, execute integration test suite, write test report",
        "constraints": [
            "No DDL against tenant_registry — active concurrent writes",
            "Migration must be additive only — no column drops",
            "Staging deploy requires RAILS_ENV=staging explicitly set",
        ],
        "do_not_touch": ["tenant_registry", "auth_tokens", "legacy_session_keys"],
        "facts": {
            "staging_db": "postgres://staging-host:5432/app_staging",
            "migration_tool": "alembic",
            "test_suite": "pytest tests/integration/",
            "expected_test_count": 47,
        },
        "known_risks": [
            "Test DB may have stale fixtures — run pytest --setup-show to verify fixture state",
        ],
    },
)
```
The Architecture in Summary
Compaction is not a bug to work around. It is a fundamental constraint of context-window-bounded agents. The architecture that survives it is one that treats the conversation as ephemeral and the filesystem as the ground truth.
Checkpoint writes externalize state before it can be lost. Re-injection restores full context after a compaction event. Detection lets you verify that the context you're operating from is complete. Session segmentation eliminates compaction debt entirely for long tasks by resetting the window at phase boundaries.
None of these patterns are expensive. A checkpoint file write takes milliseconds. A re-injection read adds a few hundred tokens to the current context. The compaction detection sentinel is a single file read. A handoff file is twenty lines of JSON.
The cost of not using them is the kind of failure that doesn't announce itself — an agent that proceeds confidently from a state it believes is correct, into work that violates a constraint it no longer remembers.
I packaged the full compaction-resistant architecture — detection hooks, checkpoint templates, re-injection patterns, and session handoff schemas — as a ClawMart skill: Agent Compaction Architecture — Production Context Management. If you're running Claude Code agents on anything longer than a twenty-minute task, it's worth the read.
~K¹ (W. Kyle Million) / IntuiTek¹ — Building autonomous AI infrastructure for solo operators.
Tags: claudecode, devtools, aiagents