
Wu Long

Posted on • Originally published at oolong-tea-2026.github.io

External Memory Providers: Zero-Downtime Context Compaction for AI Agents

Every AI agent has a dirty secret: when its context window fills up, it has to stop and think about what to forget.

In OpenClaw (and most agent frameworks), this happens through synchronous in-band compaction. The agent pauses, sends its entire context to an LLM for summarization, replaces the original with the summary, and resumes. During that 30-60 second window? The agent is completely unresponsive.

For a personal assistant, that's annoying. For customer support, financial services, or healthcare agents? It's a dealbreaker.

GitHub issue #49233 proposes a solution: an External Memory Provider API that enables zero-downtime compaction.

The Problem: Compaction Is a Mini-Outage

Here's what happens today:

  1. Agent stops responding
  2. Full context sent to LLM for summarization (~30-60s)
  3. Summary replaces original context (information loss)
  4. Agent resumes with degraded memory

The core issue: it's synchronous and in-band. The agent can't serve the user AND compress its memory simultaneously.
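The four steps above can be sketched in a few lines. This is a hypothetical illustration of the synchronous in-band pattern, not OpenClaw's actual code: `summarize` stands in for the 30-60s LLM call, and the `await` inside `handle` is exactly where the agent goes dark.

```typescript
type Message = { role: string; content: string };

// Stand-in for the slow LLM summarization call (30-60s in practice).
async function summarize(messages: Message[]): Promise<Message> {
  const text = messages.map((m) => m.content).join(" ");
  return { role: "system", content: `Summary of ${messages.length} messages: ${text.slice(0, 80)}` };
}

class Agent {
  context: Message[] = [];
  maxMessages = 4;

  async handle(msg: Message): Promise<void> {
    this.context.push(msg);
    if (this.context.length > this.maxMessages) {
      // The agent blocks here: no new messages are served until
      // summarization completes and the context is replaced.
      const summary = await summarize(this.context);
      this.context = [summary]; // original detail is lost
    }
  }
}
```

Everything downstream of that `await` is the mini-outage: the handler can't return, so the agent can't take the next message.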

The Proposal: Hot-Swap Context

The key insight: prepare the compressed context before you need it.

An external memory provider continuously receives messages (~1ms overhead), maintains a compressed summary in the background, and when compaction is needed, the agent swaps in the pre-built context between messages (~50-100ms).

```typescript
interface MemoryProvider {
  onMessage(sessionId: string, message: Message): Promise<void>;
  getCompressedContext(sessionId: string, maxChars: number): Promise<CompressedContext>;
  recall(sessionId: string, query: string, limit?: number): Promise<MemoryEntry[]>;
  ping(): Promise<{ ok: boolean; latencyMs: number }>;
}
```

Four methods. The simplicity is the point.
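To make the shape concrete, here's a minimal in-memory sketch of a provider satisfying that interface. The "compression" is a placeholder (truncation) where a real provider would maintain an LLM-generated summary in the background; the types are my own assumptions, since the proposal doesn't pin down `CompressedContext` or `MemoryEntry`.

```typescript
type Message = { role: string; content: string };
type CompressedContext = { summary: string; chars: number };
type MemoryEntry = { content: string; score: number };

class InMemoryProvider {
  private sessions = new Map<string, Message[]>();

  async onMessage(sessionId: string, message: Message): Promise<void> {
    // ~1ms path: just append; real compression happens out-of-band.
    const log = this.sessions.get(sessionId) ?? [];
    log.push(message);
    this.sessions.set(sessionId, log);
  }

  async getCompressedContext(sessionId: string, maxChars: number): Promise<CompressedContext> {
    // Placeholder compression: truncate the concatenated log.
    const log = this.sessions.get(sessionId) ?? [];
    const summary = log.map((m) => m.content).join(" ").slice(0, maxChars);
    return { summary, chars: summary.length };
  }

  async recall(sessionId: string, query: string, limit = 5): Promise<MemoryEntry[]> {
    // Placeholder recall: substring match over the raw log.
    const log = this.sessions.get(sessionId) ?? [];
    return log
      .filter((m) => m.content.includes(query))
      .slice(0, limit)
      .map((m) => ({ content: m.content, score: 1 }));
  }

  async ping(): Promise<{ ok: boolean; latencyMs: number }> {
    return { ok: true, latencyMs: 0 };
  }
}
```

The division of labor is the point: `onMessage` must be cheap because it sits on the hot path of every turn, while `getCompressedContext` can afford to be smart because it's only called at swap time.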

Why This Is Harder Than It Looks

The Reliability Paradox

Once you externalize memory, you've created a new single point of failure. The proposal addresses this wisely: the external provider is an enhancement layer, not a replacement. OpenClaw continues archiving messages independently.
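One way to keep the provider strictly an enhancement layer is to gate every call behind a health check and a timeout, falling back to the built-in path on any failure. A sketch, assuming a hypothetical `builtinCompact` stand-in for the existing synchronous path:

```typescript
type CompressedContext = { summary: string };

interface Provider {
  ping(): Promise<{ ok: boolean; latencyMs: number }>;
  getCompressedContext(sessionId: string, maxChars: number): Promise<CompressedContext>;
}

// Race a promise against a deadline, clearing the timer either way
// so a stale rejection can't escape as an unhandled rejection.
async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("provider timeout")), ms);
  });
  try {
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer!);
  }
}

async function contextWithFallback(
  provider: Provider | null,
  sessionId: string,
  maxChars: number,
  builtinCompact: () => CompressedContext,
  timeoutMs = 500,
): Promise<CompressedContext> {
  if (!provider) return builtinCompact(); // no provider configured
  try {
    const health = await withTimeout(provider.ping(), timeoutMs);
    if (!health.ok) return builtinCompact();
    return await withTimeout(provider.getCompressedContext(sessionId, maxChars), timeoutMs);
  } catch {
    // Provider down or slow: degrade to the built-in path.
    return builtinCompact();
  }
}
```

The `ping()` method in the proposed interface exists precisely so the host can make this call-or-fall-back decision cheaply before committing to a swap.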

The In-Flight Problem

Handling in-flight tool calls during the handoff is the real challenge. If the agent is midway through a multi-step tool chain when compaction triggers, messages that arrive during the swap must be queued and replayed once the new context is installed, or they're silently lost.
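The queue-and-replay idea can be sketched as a small buffer in front of the context. This is my own illustration of the mechanism, not code from the proposal: deliveries during a swap are held, then appended after the pre-built context lands.

```typescript
type Message = { role: string; content: string };

class HotSwapBuffer {
  private swapping = false;
  private queue: Message[] = [];
  context: Message[] = [];

  deliver(msg: Message): void {
    if (this.swapping) {
      this.queue.push(msg); // hold until the swap completes
    } else {
      this.context.push(msg);
    }
  }

  beginSwap(): void {
    this.swapping = true;
  }

  completeSwap(compressed: Message[]): void {
    // Install the pre-built context, then replay anything that
    // arrived mid-swap so no message is dropped.
    this.context = [...compressed, ...this.queue];
    this.queue = [];
    this.swapping = false;
  }
}
```

Because the compressed context was prepared ahead of time, the window where messages queue up is only the ~50-100ms swap, not the 30-60s summarization.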

Three-Layer Recall: Production Numbers

| Layer | What It Is | Entity Retention |
| --- | --- | --- |
| L1 | Compaction summaries (in-context) | 23% |
| L2 | External knowledge DB (on-demand) | +27% (50%) |
| L3 | SQL archive (fulltext fallback) | +59% (109%) |

Built-in compaction alone retains only 23% of entities. With all three layers, you get effectively complete recall.
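The tiering reads naturally as a cascade: try the fast layer first, fall through to slower ones only when you need more. A sketch under my own assumptions (each layer as a plain lookup function; the article doesn't specify the dispatch logic):

```typescript
type MemoryEntry = { content: string; layer: string };
type Layer = (query: string) => MemoryEntry[];

// Query layers in order of cost (L1 in-context, L2 indexed DB,
// L3 fulltext archive), stopping once enough hits accumulate.
function recallCascade(query: string, layers: Layer[], limit: number): MemoryEntry[] {
  let hits: MemoryEntry[] = [];
  for (const layer of layers) {
    hits = hits.concat(layer(query));
    if (hits.length >= limit) break; // cheaper layers satisfied the query
  }
  return hits.slice(0, limit);
}
```

The economics follow from the ordering: most queries are answered by L1/L2, and the expensive SQL fulltext scan only runs for the long tail, which is what lets the combined system reach effectively complete recall without paying archive-scan latency on every turn.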

Lessons for Agent Builders

  1. Separate memory management from inference. Different concerns, different timelines.
  2. Build memory as a tiered system. Fast in-context, medium indexed recall, slow archival.
  3. External dependencies must be optional. Fallback path must work.
  4. Test with active workloads. Compaction during idle is trivial; during active tool chains is where bugs live.

The Bigger Picture

This is part of agents moving from demos to production infrastructure. We've seen this pattern in databases (online schema migration), web servers (zero-downtime deploys), and now AI agents.

The tooling catches up to the reliability requirements. It just takes time.


Wu Long is an independent developer and OpenClaw contributor who writes about AI agent architecture.
