DEV Community

Cover image for APX Memory Compaction Is Two Knobs, Not One

APX Memory Compaction Is Two Knobs, Not One

APX Memory Compaction Is Two Knobs, Not One

APC gives the project its portable context layer. APX gives that context a runtime. When chats get long, APX does not try to solve memory with a vague "keep more stuff" switch. It uses two separate knobs: compact_threshold and keep_recent.

That split matters. One knob decides when compression starts. The other decides how much of the newest work stays verbatim.

If you mix those up, compaction stops being useful. If you keep too much recent history, you never reclaim enough context. If you compact too early, you throw away fresh details that still matter. APX avoids that by making the boundary explicit.

The trigger

compact_threshold is the point where APX starts compressing a channel chat. The current default is 60 turns. Before that, nothing is rewritten. After that, the oldest material beyond the preserved window gets summarized into a dense type: "compact" record in the JSONL log.

That summary is not decorative. Future turns prepend it as a system turn, so the model sees the condensed version of old work without replaying every raw message.

Compaction also runs out of the reply hot path. APX answers with whatever compact already exists, then compresses the old history in the background. That is the right tradeoff: response latency stays low, and the next turn benefits from the fresh summary.

The preserved window

keep_recent is the second knob. Its job is simple: keep the most recent turns verbatim so the agent still sees the latest edits, tool outputs, and decisions exactly as they happened.

The current default is 40.

That number is not arbitrary. It gives compaction enough old material to summarize while still leaving a large enough live window for the next few turns. It also explains the most common mistake: setting keep_recent too close to compact_threshold.

If compact_threshold is 60 and keep_recent is 55, compaction can only remove a tiny slice. You pay the cost of summarization but gain very little space. If both values are the same, you have basically disabled the useful part of the system.

A sane rule is boring but effective:

  • compact_threshold says when history gets too long.
  • keep_recent says how much fresh context must stay exact.
  • keep_recent should stay clearly below compact_threshold.

What APX actually writes

This is the shape APX uses in config:

{
  "memory": {
    "compact_model": "ollama:gemma4:31b-cloud",
    "compact_fallback_model": "",
    "compact_threshold": 60,
    "keep_recent": 40
  }
}
Enter fullscreen mode Exit fullscreen mode

The model choice matters too. APX prefers a lightweight summarizer, ideally local. If the primary and fallback models are both unavailable, compaction is skipped silently. Raw turns stay intact, and the conversation keeps moving.

That failure mode is important. Memory compression should improve continuity, not block replies.

When to tune it

Use a lower threshold when a channel produces lots of chatter, tool output, or iterative corrections. Use a larger keep_recent when the latest turns still carry important state, like a code edit, a review round, or a handoff that depends on exact wording.

Use the defaults when you do not have a reason to change them. They already reflect the intended balance: preserve the latest 40 turns verbatim, start compressing once the chat crosses 60 turns, and keep the runtime responsive.

If you want a manual pass, APX also exposes apx session compact <slug> to collapse a long session on disk. That is useful when you want a durable summary before archiving. The automatic channel compaction is still the main mechanism for live chats.

Bottom line

APX memory compaction is not one dial. It is a boundary plus a buffer.

compact_threshold decides when APX starts compressing. keep_recent decides what must stay exact. APC keeps the project contract portable; APX keeps the live conversation bounded.

That split is the whole point. Project meaning stays stable. Runtime noise gets compressed. Fresh work stays visible.

Top comments (0)