Claude Code blocks the agent while compacting. LangGraph runs compaction in the background and silently drops messages. Aider spawns a background thread and hopes for the best. Async compaction sounds like the obvious optimization — until you try to build it.
We surveyed how major frameworks handle context compaction timing — synchronous, asynchronous, or not at all — and catalogued the concurrency hazards that emerge when you move compaction off the critical path. Here's what we found.
Why compaction blocks
Most frameworks run compaction synchronously. The agent stops, the LLM summarizes, the agent continues with a shorter context. It's slow but safe.
| Framework | Approach | Agent blocked | Race risk |
|---|---|---|---|
| Claude Code | Sync at 95% capacity | Yes | None |
| LangChain | Sync after turn | Yes | None |
| AutoGen | Sync between chats | Yes | None |
| Cursor | None (manual reset) | N/A | N/A |
| ChatGPT | None (manual) | N/A | N/A |
| Aider | Background thread | No | Medium |
| Google ADK | Async event-based | No | Medium |
| LangGraph | Async background | No | High |
Five of eight frameworks either block or don't compact at all. The industry has voted with its implementations: synchronous compaction is the safe default.
The cost is real. LLM summarization takes 2–10 seconds depending on context size and model. During that window, the agent can't respond. For interactive use cases (coding assistants, chatbots), that's a noticeable hang. For background automation, it barely matters.
The five concurrency hazards
Moving compaction to a background task introduces five categories of concurrency bugs. We found evidence of all five in production frameworks.
1. Stale snapshot
Compaction reads the current message history, sends it to an LLM for summarization, and waits for the result. During that wait, new messages arrive. The compacted summary doesn't include them.
When the summary replaces the original history, the new messages are silently lost.
LangGraph's documented race: history is rebuilt from a stale snapshot then fully replaced, dropping items recorded during the compaction window. The proposed fix — version counters and generation IDs — is not yet implemented.
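The race is easy to reproduce in a few lines of asyncio. This toy sketch (all names are ours, not LangGraph's) snapshots the history, "summarizes" it during a simulated LLM delay, then replaces the history wholesale:

```python
import asyncio

async def summarize(history: list[str]) -> str:
    # Stand-in for a slow LLM call.
    await asyncio.sleep(0.05)
    return f"summary of {len(history)} messages"

async def compact_naive(session: dict) -> None:
    snapshot = list(session["history"])   # 1. snapshot the history
    summary = await summarize(snapshot)   # 2. slow summarization
    session["history"] = [summary]        # 3. full replace; drops anything newer

async def main() -> dict:
    session = {"history": [f"msg {i}" for i in range(50)]}
    task = asyncio.create_task(compact_naive(session))
    await asyncio.sleep(0.01)             # let the snapshot happen first
    session["history"].append("use pnpm instead of yarn")
    await task
    return session

session = asyncio.run(main())
# The correction arrived mid-compaction and is silently gone.
assert "use pnpm instead of yarn" not in session["history"]
```

Any message appended between the snapshot and the replace is dropped with no error, which is exactly the failure mode described above.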
2. Silent message drop
This is the consequence of stale snapshots, but it deserves its own category because of how it manifests: the agent simply "forgets" recent context with no error, no warning, no log entry.
The user says "actually, use pnpm instead of yarn." Compaction starts. The compacted summary captures the pre-change state. The user's correction vanishes.
LangGraph's three-step async operation (snapshot → summarize → replace) can fail mid-way, leaving memory and disk out of sync. A partial failure means the summary was written but the old history wasn't fully removed — or vice versa.
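The disk half of that problem has a standard mitigation: write-then-rename. A sketch of the idea (ours, not LangGraph's actual persistence code) that makes the on-disk replace step atomic:

```python
import json
import os
import tempfile

def persist_history_atomic(path: str, history: list[str]) -> None:
    # Write the new history to a temp file in the same directory,
    # then rename over the old file. os.replace is atomic, so a
    # reader sees either the complete old state or the complete new
    # state, never a half-written summary.
    dir_ = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(history, f)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

This doesn't fix the memory/disk split by itself, but it removes the window in which a crash leaves a partially written file behind.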
3. Ordering violation
If multiple WHS services or agents compact in parallel, results arrive out of order. Service A compacts messages 1–50 while Service B compacts messages 30–60 with overlapping coverage. Which result wins? How do you merge overlapping compactions?
In single-service systems this is less likely. But in walrus — where memory, search, and channels are all WHS services that may declare the Compact capability — parallel compaction is a real scenario.
4. Failed rollback
Compaction produces a bad summary — it drops a critical fact, mischaracterizes a decision, or generalizes away an edge case. In synchronous compaction, you can validate before continuing. In async compaction, the agent has already acted on the pre-compaction context. By the time you detect the bad summary, the damage is done.
No framework we surveyed implements compaction rollback. The summary is treated as authoritative the moment it's produced.
5. Double compaction
Token threshold crossed → compaction starts in background → more messages arrive → threshold crossed again → second compaction starts. Two concurrent compactions now race on the same history.
LangGraph has no `max_compact_attempts` counter — infinite compaction retries are theoretically possible. The proposed fix includes a maximum attempt limit, but it's unimplemented.
[Interactive chart — see original post]
The chart tells a clear story: synchronous compaction (walrus's current approach) has zero concurrency risk. Every async implementation introduces hazards. LangGraph's are the most severe because its async design was retrofitted onto a system that assumed sequential execution.
How three frameworks handle async
Aider: background thread with weak model
Aider runs recursive summarization in a background thread using a cheaper "weak model" — a smaller, faster LLM that handles compression while the main model continues reasoning.
What works: the main agent is never blocked. Compaction cost is reduced by using a cheaper model. Recursive summarization (summary of summaries) keeps context compact over long sessions.
What's missing: no documented handling of what happens when the agent queries content that's currently being compacted. If the background thread hasn't finished and the agent needs the old context, it reads stale data or waits — defeating the purpose of async.
Google ADK: event-based async summarization
Google ADK triggers compaction via events and runs summarization asynchronously. The result is written back as a new event. A sliding window with overlap preserves the most recent messages.
What works: the event-based architecture means compaction is just another event in the stream. The overlap window (keeping the last N messages uncompacted) prevents the worst stale-snapshot problems — recent context always survives.
What's missing: ordering guarantees when events arrive during compaction are not documented. If the compaction event completes after several new user events, the insertion point matters. Google ADK doesn't specify whether the summary event is inserted at the position where compaction started or at the current head.
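The overlap window itself is simple to express. A sketch of the idea (parameter names are ours, not Google ADK's API):

```python
def split_for_compaction(history: list[str], keep_last: int = 10):
    """Split history into a compactable head and a protected tail.

    The last `keep_last` messages are never summarized, so recent
    context survives even if the summary lands late.
    """
    if len(history) <= keep_last:
        return [], list(history)
    return history[:-keep_last], history[-keep_last:]
```

Only the head is sent to the summarizer; the tail stays in the context verbatim, which is why the worst stale-snapshot cases can't touch recent messages.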
LangGraph: async with known race conditions
LangGraph attempts true async compaction but has documented concurrency bugs:
- Silent drop: items recorded during the compaction window are lost when history is fully replaced
- Partial failure: memory and disk can get out of sync if the three-step operation (snapshot → summarize → replace) fails mid-way
- Unbounded retries: no maximum compaction attempt counter
The proposed fixes are sound — version counters, atomic replacement, max attempts — but none are implemented as of March 2026. LangGraph is the clearest evidence that async compaction is harder than it looks.
[Interactive chart — see original post]
What MemGPT got right: don't compact in the background
MemGPT (now Letta) takes a radically different approach: the agent controls its own memory tiers, like an operating system managing physical and virtual memory. The LLM context window is "physical memory." External storage is "virtual memory." The agent explicitly moves information between tiers via function calls.
No background compaction. No race conditions. The agent decides what to archive and what to recall. This is the only framework we surveyed with zero concurrency hazards.
The trade is cognitive overhead: the agent spends tokens reasoning about memory management instead of the actual task. MemGPT's approach is elegant but expensive in a different currency — model attention rather than infrastructure complexity.
The walrus problem
Walrus currently compacts synchronously. The `on_compact()` hook blocks the agent loop while WHS services return compacted context — `tokio::task::block_in_place()` bridges the async/sync gap. Each service has a 10-second timeout. Safe, but the agent hangs.
Moving to async compaction would look like this:
1. Agent loop detects context threshold → fires `CompactSession` event
2. Background tokio task dispatches to all `Compact`-capable WHS services
3. Services return compacted prompt additions
4. Results stored in session as "pending compaction"
5. Next `on_before_run()` injects pending compaction into the prompt
6. Agent continues immediately after step 1
This design uses walrus's existing event infrastructure — `DaemonEvent` variants, `tokio::spawn()`, the task watcher pattern in the task registry. No `Hook` trait changes required.
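The shape of the flow can be sketched in Python, even though walrus itself is Rust and tokio; this is an illustration of the steps above, not the implementation, and every name in it is ours:

```python
import asyncio

class Session:
    def __init__(self) -> None:
        self.history: list[str] = []
        self.pending_compaction: str | None = None
        self.compaction_task: asyncio.Task | None = None

async def compact_services(snapshot: list[str]) -> str:
    # Stand-in for steps 2-3: dispatch to Compact-capable services.
    await asyncio.sleep(0.05)
    return f"[compacted: {len(snapshot)} messages]"

def maybe_fire_compaction(session: Session, threshold: int = 40) -> None:
    # Step 1: threshold check; allow at most one compaction in flight.
    if len(session.history) < threshold or session.compaction_task is not None:
        return
    snapshot = list(session.history)

    async def run() -> None:
        # Step 4: park the result; never touch history from here.
        session.pending_compaction = await compact_services(snapshot)
        session.compaction_task = None

    session.compaction_task = asyncio.create_task(run())
    # Step 6: the caller returns immediately and keeps working.

def on_before_run(session: Session, keep_last: int = 10) -> None:
    # Step 5: inject the pending summary at a turn boundary, keeping
    # the most recent messages so mid-compaction arrivals survive.
    if session.pending_compaction is not None:
        kept = session.history[-keep_last:]
        session.history = [session.pending_compaction] + kept
        session.pending_compaction = None
```

Note the deliberate split: the background task only parks its result, and only `on_before_run()` mutates history, at a turn boundary rather than mid-turn.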
But all five hazards apply:
Stale snapshot: messages arrive between event fire (step 1) and result injection (step 5). The compacted summary doesn't include them. Fix: keep a generation counter on the session history. Reject compaction results if the generation has advanced beyond a threshold.
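A minimal version of that generation-counter guard (a sketch of the proposed fix, not shipped code):

```python
class VersionedHistory:
    """History with a generation counter.

    Every mutation bumps `generation`; a compaction result is applied
    only if the history hasn't advanced past a tolerance since the
    snapshot was taken.
    """

    def __init__(self) -> None:
        self.messages: list[str] = []
        self.generation = 0

    def append(self, msg: str) -> None:
        self.messages.append(msg)
        self.generation += 1

    def snapshot(self) -> tuple[int, list[str]]:
        return self.generation, list(self.messages)

    def try_apply_compaction(self, snap_gen: int, summary: str,
                             tolerance: int = 0) -> bool:
        if self.generation - snap_gen > tolerance:
            return False        # history moved on; reject (or merge)
        self.messages = [summary]
        self.generation += 1
        return True
```

A nonzero tolerance trades strictness for throughput: a few messages may slip past the snapshot, at which point the merge strategy below takes over.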
Silent drop: if pending compaction replaces history naively, messages from steps 2–4 vanish. Fix: merge, don't replace. Append the compaction summary alongside messages received during the compaction window, not instead of them.
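The merge rule fits in one function, assuming the compactor records how many messages its snapshot covered:

```python
def merge_compaction(history: list[str], snapshot_len: int,
                     summary: str) -> list[str]:
    # The summary covers only the first `snapshot_len` messages, the
    # ones the compactor actually saw. Anything after that index
    # arrived during the compaction window and must be kept.
    during_window = history[snapshot_len:]
    return [summary] + during_window
```
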
Ordering: multiple WHS services may compact in parallel. Their results must be serialized. Fix: the existing RPC mutex on ServiceRegistry (already used for tool dispatch) can serialize compaction results. Alternatively, sequence compaction responses by service priority.
Failed rollback: a bad summary from a WHS service corrupts context. Fix: store pre-compaction history snapshot. If the agent detects degraded quality (a heuristic, not foolproof), restore from snapshot.
Double compact: threshold crossed again before first compaction completes. Fix: at most one compaction in flight per session. New threshold crossings set a "compact pending" flag but don't spawn another task.
[Interactive chart — see original post]
The timing chart shows why async is appealing: the agent is blocked for ~50ms (event dispatch) instead of ~5,000ms (full summarization). Total wall time is similar — the LLM still takes 5+ seconds — but the agent can work during that time.
Patterns that work
From surveying frameworks and Anthropic's context engineering guide, four patterns emerge:
1. Version counters — Track a generation ID on session history. When compaction starts, record the current generation. When results arrive, check if the generation has advanced. If it has, either reject the compaction or merge it with the new messages. Proposed for LangGraph but not yet implemented.
2. Overlapping windows — Never compact the last N messages. Google ADK uses this with its sliding window. Anthropic recommends raw context over compaction over summarization — keep as much original context as possible, especially recent messages.
3. Optimistic apply with validation — Apply the async compaction result, then run a quick validation: are key facts preserved? Does the summary mention the current task? If validation fails, roll back to pre-compaction history. This adds one more LLM call but catches the worst failures.
4. Throttled compaction — At most one compaction in flight per session. New threshold crossings queue, don't spawn. This prevents double compaction entirely and simplifies the state machine. Walrus's task registry already implements similar concurrency control with its queue-and-promote pattern.
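Pattern 3 can be sketched with a cheap string-matching validator standing in for the extra LLM call (a heuristic, as noted above; all names are ours):

```python
def validate_summary(summary: str, key_facts: list[str]) -> bool:
    # Cheap check: every key entity must survive compaction verbatim.
    # A real validator might use embeddings or an LLM call instead.
    return all(fact.lower() in summary.lower() for fact in key_facts)

def apply_with_rollback(session: dict, summary: str,
                        key_facts: list[str]) -> bool:
    backup = list(session["history"])       # pre-compaction snapshot
    session["history"] = [summary]          # optimistic apply
    if not validate_summary(summary, key_facts):
        session["history"] = backup         # roll back
        return False
    return True
```
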
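Pattern 4 reduces to a tiny state machine. A sketch of the queue-and-promote idea (names are ours, not walrus's task registry API):

```python
class CompactionThrottle:
    """At most one compaction in flight; later triggers set a flag.

    When the in-flight compaction finishes, a pending trigger is
    promoted to a fresh run instead of racing the first one.
    """

    def __init__(self) -> None:
        self.in_flight = False
        self.pending = False

    def trigger(self) -> bool:
        """Returns True if the caller should start a compaction now."""
        if self.in_flight:
            self.pending = True   # remember the crossing, don't spawn
            return False
        self.in_flight = True
        return True

    def finish(self) -> bool:
        """Returns True if a queued trigger should start a new run."""
        self.in_flight = False
        if self.pending:
            self.pending = False
            self.in_flight = True
            return True
        return False
```

Because `trigger()` never spawns while a run is in flight, double compaction is impossible by construction rather than by timing luck.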
Open questions
Is the latency savings worth the complexity? Sync compaction blocks for 2–10 seconds. For interactive agents, that's annoying. For background automation, it's irrelevant. How often does compaction actually happen in practice — once per session? Once per hundred turns? If it's rare, the engineering cost of async may not pay off.
Should results be applied immediately or at a natural break? Injecting compaction results mid-turn could confuse the agent. Waiting for a natural break (tool response, user message) is safer but adds latency. Where's the right insertion point?
Can you validate a compaction summary without another LLM call? Embedding similarity between pre- and post-compaction context could catch gross information loss. String matching for key entities could catch fact drops. Neither is as reliable as LLM-based validation, but both are cheaper.
How should async compaction appear in the task registry? Walrus's task registry tracks agent tasks as a live tree visible via `walrus ps`. Should background compaction appear as a task? A session annotation? Invisible infrastructure? Observability matters for debugging.

Does MemGPT's approach eliminate the need for async compaction entirely? If the agent controls its own memory paging, there's nothing to run in the background. The trade is cognitive overhead — but with capable models, that overhead shrinks. Is agent-controlled paging the endgame, making async compaction a transitional pattern?
Further reading
- Anthropic: Effective context engineering for AI agents
- MemGPT: Towards LLMs as Operating Systems — virtual context management
- LangGraph race conditions — documented concurrency bugs
- LangChain async memory issue — the original feature request
- ACON: Optimizing Context Compression — failure-driven compression for long-horizon agents
- Claude Code compaction docs — sync approach with automatic trigger
- Aider repository map — background summarization architecture
- Our context compaction survey covers the eight frameworks at an architectural level. This post goes deeper on the async-specific challenges.
- The persistent agent memory survey covers the broader memory architecture that compaction interacts with. Mem0's extraction pipeline faces similar async challenges. Hermes's FTS5 layer must also handle concurrent writes.
Originally published at OpenWalrus.