Manuel Bruña for Agent Project Context

Posted on Jun 18

APX Memory Compaction Is Two Knobs, Not One

#ai #opensource #devtools #tutorial

APX Memory Compaction Is Two Knobs, Not One

APC gives the project its portable context layer. APX gives that context a runtime. When chats get long, APX does not try to solve memory with a vague "keep more stuff" switch. It uses two separate knobs: compact_threshold and keep_recent.

That split matters. One knob decides when compression starts. The other decides how much of the newest work stays verbatim.

If you mix those up, compaction stops being useful. If you keep too much recent history, you never reclaim enough context. If you compact too early, you throw away fresh details that still matter. APX avoids that by making the boundary explicit.

The trigger

compact_threshold is the point where APX starts compressing a channel chat. The current default is 60 turns. Before that, nothing is rewritten. After that, the oldest material beyond the preserved window gets summarized into a dense type: "compact" record in the JSONL log.

That summary is not decorative. Future turns prepend it as a system turn, so the model sees the condensed version of old work without replaying every raw message.

Compaction also runs out of the reply hot path. APX answers with whatever compact already exists, then compresses the old history in the background. That is the right tradeoff: response latency stays low, and the next turn benefits from the fresh summary.

The preserved window

keep_recent is the second knob. Its job is simple: keep the most recent turns verbatim so the agent still sees the latest edits, tool outputs, and decisions exactly as they happened.

The current default is 40.

That number is not arbitrary. It gives compaction enough old material to summarize while still leaving a large enough live window for the next few turns. It also explains the most common mistake: setting keep_recent too close to compact_threshold.

If compact_threshold is 60 and keep_recent is 55, compaction can only remove a tiny slice. You pay the cost of summarization but gain very little space. If both values are the same, you have basically disabled the useful part of the system.

A sane rule is boring but effective:

compact_threshold says when history gets too long.
keep_recent says how much fresh context must stay exact.
keep_recent should stay clearly below compact_threshold.

What APX actually writes

This is the shape APX uses in config:

{
  "memory": {
    "compact_model": "ollama:gemma4:31b-cloud",
    "compact_fallback_model": "",
    "compact_threshold": 60,
    "keep_recent": 40
  }
}

The model choice matters too. APX prefers a lightweight summarizer, ideally local. If the primary and fallback models are both unavailable, compaction is skipped silently. Raw turns stay intact, and the conversation keeps moving.

That failure mode is important. Memory compression should improve continuity, not block replies.

When to tune it

Use a lower threshold when a channel produces lots of chatter, tool output, or iterative corrections. Use a larger keep_recent when the latest turns still carry important state, like a code edit, a review round, or a handoff that depends on exact wording.

Use the defaults when you do not have a reason to change them. They already reflect the intended balance: preserve the latest 40 turns verbatim, start compressing once the chat crosses 60 turns, and keep the runtime responsive.

If you want a manual pass, APX also exposes apx session compact <slug> to collapse a long session on disk. That is useful when you want a durable summary before archiving. The automatic channel compaction is still the main mechanism for live chats.

Bottom line

APX memory compaction is not one dial. It is a boundary plus a buffer.

compact_threshold decides when APX starts compressing. keep_recent decides what must stay exact. APC keeps the project contract portable; APX keeps the live conversation bounded.

That split is the whole point. Project meaning stays stable. Runtime noise gets compressed. Fresh work stays visible.

Top comments (2)

Max Quimby • Jun 22

Splitting the trigger from the preserved window is the right instinct — plenty of "memory" systems collapse both into one "how much to keep" setting and then wonder why they either blow the context window or forget last turn's edit.

The thing I'd push on: turns are a noisy unit. One turn is a two-line "yes, do that," the next is a 4K-token tool dump. A threshold of 60 turns can mean wildly different token footprints run to run, so I've had better luck triggering on a token budget and using turn count only as a secondary guard. Same concern for keep_recent — a couple of huge tool outputs can eat the whole live window even though the count looks fine.

The other lesson from doing this in anger: a pure prose summary tends to drop the load-bearing details — decided file paths, IDs, the exact wording of a handoff. Pulling those into a small structured "facts" record alongside the compact summary helped me more than tuning the ratio ever did.

Curious how you handle the silent-skip case when the summarizer is unavailable repeatedly — does the channel just grow unbounded until a model comes back?

Manuel Bruña Agent Project Context • Jun 25

Good push. I agree token budget is the better primary trigger; turn count is only a crude guard. I also like the structured facts record beside prose. For repeated summarizer failure, I would rather keep raw history and mark compaction pending than silently produce weak memory.