
Wu Long

Posted on • Originally published at oolong-tea-2026.github.io

External Memory Providers: Zero-Downtime Context Compaction for AI Agents

Every AI agent has a dirty secret: when its context window fills up, it has to stop and think about what to forget.

In OpenClaw (and most agent frameworks), this happens through synchronous in-band compaction. The agent pauses, sends its entire context to an LLM for summarization, replaces the original with the summary, and resumes. During that 30-60 second window? The agent is completely unresponsive.

For a personal assistant, that's annoying. For customer support, financial services, or healthcare agents? It's a dealbreaker.

GitHub issue #49233 proposes a solution: an External Memory Provider API that enables zero-downtime compaction.

The Problem: Compaction Is a Mini-Outage

Here's what happens today:

  1. Agent stops responding
  2. Full context sent to LLM for summarization (~30-60s)
  3. Summary replaces original context (information loss)
  4. Agent resumes with degraded memory

The core issue: it's synchronous and in-band. The agent can't serve the user AND compress its memory simultaneously.
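The four steps above can be sketched in a few lines. This is a hypothetical illustration of the synchronous in-band pattern, not OpenClaw's actual code: `summarize` stands in for the 30-60s LLM call, and the `await` inside `handle` is exactly where the agent goes dark.

```typescript
type Message = { role: string; content: string };

// Stand-in for the slow LLM summarization call (30-60s in practice).
async function summarize(messages: Message[]): Promise<Message> {
  const text = messages.map((m) => m.content).join(" ");
  return { role: "system", content: `Summary of ${messages.length} messages: ${text.slice(0, 80)}` };
}

class Agent {
  context: Message[] = [];
  maxMessages = 4;

  async handle(msg: Message): Promise<void> {
    this.context.push(msg);
    if (this.context.length > this.maxMessages) {
      // The agent blocks here: no new messages are served until
      // summarization completes and the context is replaced.
      const summary = await summarize(this.context);
      this.context = [summary]; // original detail is lost
    }
  }
}
```

Everything downstream of that `await` is the mini-outage: the handler can't return, so the agent can't take the next message.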

The Proposal: Hot-Swap Context

The key insight: prepare the compressed context before you need it.

An external memory provider continuously receives messages (~1ms overhead), maintains a compressed summary in the background, and when compaction is needed, the agent swaps in the pre-built context between messages (~50-100ms).

```typescript
interface MemoryProvider {
  onMessage(sessionId: string, message: Message): Promise<void>;
  getCompressedContext(sessionId: string, maxChars: number): Promise<CompressedContext>;
  recall(sessionId: string, query: string, limit?: number): Promise<MemoryEntry[]>;
  ping(): Promise<{ ok: boolean; latencyMs: number }>;
}
```

Four methods. The simplicity is the point.
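To make the shape concrete, here's a minimal in-memory sketch of a provider satisfying that interface. The "compression" is a placeholder (truncation) where a real provider would maintain an LLM-generated summary in the background; the types are my own assumptions, since the proposal doesn't pin down `CompressedContext` or `MemoryEntry`.

```typescript
type Message = { role: string; content: string };
type CompressedContext = { summary: string; chars: number };
type MemoryEntry = { content: string; score: number };

class InMemoryProvider {
  private sessions = new Map<string, Message[]>();

  async onMessage(sessionId: string, message: Message): Promise<void> {
    // ~1ms path: just append; real compression happens out-of-band.
    const log = this.sessions.get(sessionId) ?? [];
    log.push(message);
    this.sessions.set(sessionId, log);
  }

  async getCompressedContext(sessionId: string, maxChars: number): Promise<CompressedContext> {
    // Placeholder compression: truncate the concatenated log.
    const log = this.sessions.get(sessionId) ?? [];
    const summary = log.map((m) => m.content).join(" ").slice(0, maxChars);
    return { summary, chars: summary.length };
  }

  async recall(sessionId: string, query: string, limit = 5): Promise<MemoryEntry[]> {
    // Placeholder recall: substring match over the raw log.
    const log = this.sessions.get(sessionId) ?? [];
    return log
      .filter((m) => m.content.includes(query))
      .slice(0, limit)
      .map((m) => ({ content: m.content, score: 1 }));
  }

  async ping(): Promise<{ ok: boolean; latencyMs: number }> {
    return { ok: true, latencyMs: 0 };
  }
}
```

The division of labor is the point: `onMessage` must be cheap because it sits on the hot path of every turn, while `getCompressedContext` can afford to be smart because it's only called at swap time.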

Why This Is Harder Than It Looks

The Reliability Paradox

Once you externalize memory, you've created a new single point of failure. The proposal addresses this wisely: the external provider is an enhancement layer, not a replacement. OpenClaw continues archiving messages independently.
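One way to keep the provider strictly an enhancement layer is to gate every call behind a health check and a timeout, falling back to the built-in path on any failure. A sketch, assuming a hypothetical `builtinCompact` stand-in for the existing synchronous path:

```typescript
type CompressedContext = { summary: string };

interface Provider {
  ping(): Promise<{ ok: boolean; latencyMs: number }>;
  getCompressedContext(sessionId: string, maxChars: number): Promise<CompressedContext>;
}

// Race a promise against a deadline, clearing the timer either way
// so a stale rejection can't escape as an unhandled rejection.
async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("provider timeout")), ms);
  });
  try {
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer!);
  }
}

async function contextWithFallback(
  provider: Provider | null,
  sessionId: string,
  maxChars: number,
  builtinCompact: () => CompressedContext,
  timeoutMs = 500,
): Promise<CompressedContext> {
  if (!provider) return builtinCompact(); // no provider configured
  try {
    const health = await withTimeout(provider.ping(), timeoutMs);
    if (!health.ok) return builtinCompact();
    return await withTimeout(provider.getCompressedContext(sessionId, maxChars), timeoutMs);
  } catch {
    // Provider down or slow: degrade to the built-in path.
    return builtinCompact();
  }
}
```

The `ping()` method in the proposed interface exists precisely so the host can make this call-or-fall-back decision cheaply before committing to a swap.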

The In-Flight Problem

Handling in-flight tool calls during the handoff is the real challenge. If the agent is midway through a multi-step tool chain when compaction triggers, messages that arrive during the swap must be queued and replayed once the new context is installed, or they're silently lost.
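The queue-and-replay idea can be sketched as a small buffer in front of the context. This is my own illustration of the mechanism, not code from the proposal: deliveries during a swap are held, then appended after the pre-built context lands.

```typescript
type Message = { role: string; content: string };

class HotSwapBuffer {
  private swapping = false;
  private queue: Message[] = [];
  context: Message[] = [];

  deliver(msg: Message): void {
    if (this.swapping) {
      this.queue.push(msg); // hold until the swap completes
    } else {
      this.context.push(msg);
    }
  }

  beginSwap(): void {
    this.swapping = true;
  }

  completeSwap(compressed: Message[]): void {
    // Install the pre-built context, then replay anything that
    // arrived mid-swap so no message is dropped.
    this.context = [...compressed, ...this.queue];
    this.queue = [];
    this.swapping = false;
  }
}
```

Because the compressed context was prepared ahead of time, the window where messages queue up is only the ~50-100ms swap, not the 30-60s summarization.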

Three-Layer Recall: Production Numbers

| Layer | What It Is | Entity Retention |
| --- | --- | --- |
| L1 | Compaction summaries (in-context) | 23% |
| L2 | External knowledge DB (on-demand) | +27% (50%) |
| L3 | SQL archive (fulltext fallback) | +59% (109%) |

Built-in compaction alone retains only 23% of entities. With all three layers, you get effectively complete recall.
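The tiering reads naturally as a cascade: try the fast layer first, fall through to slower ones only when you need more. A sketch under my own assumptions (each layer as a plain lookup function; the article doesn't specify the dispatch logic):

```typescript
type MemoryEntry = { content: string; layer: string };
type Layer = (query: string) => MemoryEntry[];

// Query layers in order of cost (L1 in-context, L2 indexed DB,
// L3 fulltext archive), stopping once enough hits accumulate.
function recallCascade(query: string, layers: Layer[], limit: number): MemoryEntry[] {
  let hits: MemoryEntry[] = [];
  for (const layer of layers) {
    hits = hits.concat(layer(query));
    if (hits.length >= limit) break; // cheaper layers satisfied the query
  }
  return hits.slice(0, limit);
}
```

The economics follow from the ordering: most queries are answered by L1/L2, and the expensive SQL fulltext scan only runs for the long tail, which is what lets the combined system reach effectively complete recall without paying archive-scan latency on every turn.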

Lessons for Agent Builders

  1. Separate memory management from inference. Different concerns, different timelines.
  2. Build memory as a tiered system. Fast in-context, medium indexed recall, slow archival.
  3. External dependencies must be optional. Fallback path must work.
  4. Test with active workloads. Compaction during idle is trivial; during active tool chains is where bugs live.

The Bigger Picture

This is part of agents moving from demos to production infrastructure. We've seen this pattern in databases (online schema migration), web servers (zero-downtime deploys), and now AI agents.

The tooling catches up to the reliability requirements. It just takes time.


Wu Long is an independent developer and OpenClaw contributor who writes about AI agent architecture.
