zxpmail

Posted on Jun 28

Don't Compress, Promote

#ai #webdev #productivity #architecture

AI coding has a hidden bottleneck that isn't in the model — it's in how you manage context across sessions.

You finish Phase 1. The codebase grew by 5000 lines. When you start Phase 2, how do you carry "what the AI knows" across?

The common answer today is Repomix: compress the entire codebase into one Markdown file, dump it into the prompt. It looks like a solution, but it creates a bigger problem.

Repomix Is a Full GC Heap Dump

A -XX:+HeapDumpOnOutOfMemoryError snapshot contains every living object, every dead object, every byte of fragmentation. You can fit the whole heap on disk, but loading it, parsing it, and finding the 12 objects you actually care about among thousands — that's the real cost.

In AI context terms:

100K-line codebase → Repomix packs it into ~150K tokens → dumped into the prompt
The AI has to find "the 3 files Phase 2 needs to change" inside 150K tokens
150K tokens of latency + attention dilution + key signals buried in boilerplate
Phase 2 code starts drifting from Phase 1's intent → more corrections needed → more context bloat → death spiral

This has a name: lost in the middle. Models tend to recall information at the start and end of a long context far better than information buried in the middle. By feeding the entire 150K-token heap dump, you're all but inviting the AI to overlook much of what sits in the middle.

But the deeper issue is:

Only Full GC dumps need "compression." Promotion doesn't compress — it promotes.

The JVM Had This Figured Out 25 Years Ago

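HotSpot's classic generational heap has three main regions. Eden and the Survivor spaces together form the young generation; Tenured is the old generation:

Region	Role	Collection Strategy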
Eden	Where new objects are born	Most die in Minor GC; survivors get promoted
Survivor (S0/S1)	Objects that survived 1+ GC cycles	Copied between S0/S1, age increments each round
Tenured (Old)	Long-lived objects promoted from Survivor	Collected rarely (Major GC)

This maps perfectly to the lifecycle of information in a codebase during AI-assisted development:

Phase completed → what survives goes to next phase
     │
     ├─ Eden code (90%)             → DON'T carry forward
     │    Scaffolding, boilerplate, temp solutions, experiments
     │
     ├─ Survivor (9%)               → PROMOTE to context
     │    Interfaces, types, domain models validated by this phase
     │
     ├─ Broken assumptions          → LOG to assumption registry
     │    "PostgreSQL doesn't support this full-text search syntax"
     │    "This library behaves differently on Windows paths"
     │
     └─ Known technical debt        → TAG explicitly, don't forget
          "Phase 3 must refactor the auth provider"

The difference isn't compression — it's promotion. You don't need to flatten the entire heap. You only need to upgrade the surviving objects to the next generation's context.
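
For readers who haven't tuned a collector, here is a toy model of the age-based promotion described in the table above. It is a sketch for illustration, not how HotSpot is implemented: the `ToyHeap` class and the threshold of 3 are assumptions made up for this example (the real threshold is controlled by `-XX:MaxTenuringThreshold` and adapts at runtime).

```python
from dataclasses import dataclass, field

TENURING_THRESHOLD = 3  # hypothetical value chosen for the example


@dataclass
class ToyHeap:
    eden: set[str] = field(default_factory=set)
    survivor: dict[str, int] = field(default_factory=dict)  # name -> age
    tenured: set[str] = field(default_factory=set)

    def allocate(self, name: str) -> None:
        """New objects are always born in Eden."""
        self.eden.add(name)

    def minor_gc(self, reachable: set[str]) -> None:
        """Collect the young generation; only reachable objects survive."""
        next_survivor: dict[str, int] = {}
        # Eden survivors enter the Survivor space with age 1.
        for name in self.eden & reachable:
            next_survivor[name] = 1
        # Existing survivors age by one; old enough ones are promoted.
        for name, age in self.survivor.items():
            if name not in reachable:
                continue  # died young, nothing to carry forward
            if age + 1 >= TENURING_THRESHOLD:
                self.tenured.add(name)
            else:
                next_survivor[name] = age + 1
        self.eden = set()
        self.survivor = next_survivor
```

Nothing is ever compressed here. Each cycle simply drops what is no longer reachable and moves forward what is, which is the behavior the phase-boundary workflow tries to imitate.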

How to Promote (Takeaway Template)

At Phase End: Three Questions

Q1: Which data structures and interfaces proved their long-term value? → Promote the declarations, not the implementations.

# Core Domain (promoted from Phase 1)
- User: { id, email, hashedPassword, displayName }
  invariant: email globally unique, validated on create
- Book: { id, title, isbn, ownerId, status }
  invariant: status ∈ {reading, finished, abandoned}

Q2: Which assumptions got broken? → One line each. The next phase shouldn't relearn them.

# Assumption Changes
- "DB connection pool default of 10 is enough" ❌ bumped to 25
- "Vercel free tier supports 100MB responses" ❌ added pagination

Q3: What's knowingly left undone? → Tag it explicitly so it survives the phase boundary.

# Carried Debt
- [ ] Phase 3: Migrate auth from JWT session to OAuth2
      Rationale: MVP first, third-party login required in Phase 3
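
If you'd rather generate this file than hand-edit it, the three answers fit in a small record. The following is a minimal sketch; the `PromotionRecord` class and its field names are hypothetical, and only the section headings are taken from the templates above.

```python
from dataclasses import dataclass, field


@dataclass
class PromotionRecord:
    phase: int
    domain: list[str] = field(default_factory=list)       # Q1: declarations + invariants
    assumptions: list[str] = field(default_factory=list)  # Q2: broken assumptions
    debt: list[str] = field(default_factory=list)         # Q3: knowingly carried debt

    def render(self) -> str:
        """Render the record in the same layout as the templates above."""
        lines = [f"# Core Domain (promoted from Phase {self.phase})"]
        lines += [f"- {d}" for d in self.domain]
        lines += ["", "# Assumption Changes"]
        lines += [f"- {a}" for a in self.assumptions]
        lines += ["", "# Carried Debt"]
        lines += [f"- [ ] {t}" for t in self.debt]
        return "\n".join(lines) + "\n"
```

Keeping the record structured means each entry can later be counted, diffed, or dropped individually instead of being buried in free-form notes.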

At Phase Start: Only Load Promoted Data

Phase N context
├── Product spec (confirmed, not draft)
├── Core domain (promoted types + invariants)
│     ├── Survivor interfaces / domain models
│     ├── Assumption change log
│     └── Carried debt tags
└── Phase N goals (from development plan)

No Repomix dump. No full session history from the previous phase. No design docs you already finished reasoning through.

Token comparison:

Approach	Tokens	Attention Profile
Full Repomix dump	~100K-500K	Diluted globally, key signals drowned
Promotion-based load	~3K-10K	Concentrated on what this phase actually needs

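That's one to two orders of magnitude fewer tokens: about 10x at the narrow end (100K vs. 10K) and over 150x at the wide end (500K vs. 3K).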

Repomix and Promotion Aren't Mutually Exclusive

Compression solves a transport problem: "can the whole codebase fit in context?"

Promotion solves a selection problem: "what does the next phase actually need?"

Repomix is fine for cross-reference lookup — keep it as a collapsible reference that the AI reads on demand. But it shouldn't be the foundation of every phase start. The foundation should be promoted knowledge.

The right prompt structure:

[Phase context] — promoted interfaces + phase goals       (3K-10K tokens)
[Change files] — the 3-5 files this phase modifies        (10K-20K tokens)
[Repomix dump] — optional, for cross-reference lookup     (collapsible)

The AI's attention stays on the most critical signals: what survived from before, and what needs to change now. The full codebase becomes an on-demand reference, not mandatory reading.
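
The structure above could be assembled along these lines. This is a sketch under stated assumptions: `estimate_tokens` uses a crude four-characters-per-token guess rather than a real tokenizer, the 30,000-token default is just the sum of the upper bounds listed above, and both function names are made up for the example.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def build_prompt(phase_context: str, change_files: dict[str, str],
                 budget: int = 30_000) -> str:
    """Put promoted context first, then only the files this phase modifies."""
    sections = [f"[Phase context]\n{phase_context}"]
    used = estimate_tokens(phase_context)
    for path, source in change_files.items():
        cost = estimate_tokens(source)
        if used + cost > budget:
            # Fail loudly instead of silently growing back into a full dump.
            raise ValueError(f"{path} would exceed the {budget}-token budget")
        sections.append(f"[Change file: {path}]\n{source}")
        used += cost
    # The Repomix dump is deliberately not inlined; it stays an on-demand reference.
    sections.append("[Reference] Full codebase dump available on request.")
    return "\n\n".join(sections)
```

Raising on an exceeded budget is a design choice: it forces a decision about which file really belongs in this phase rather than letting the prompt drift upward.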

Closing

Repomix solves a real problem (the codebase doesn't fit in context), but it chooses the wrong answer: a bigger dump instead of a smarter filter. In JVM terms, it's choosing more frequent Full GCs over generational collection.

And any engineer who's tuned a JVM knows: the generational hypothesis holds — most objects die young. The few that survive are worth promoting.

Codebase information follows the same pattern:

90% is Eden — written once, never needed again
9% is Survivor — promoted each phase
1% is Tenured — core domain model, changes rarely

You don't need compression. You need to recognize the 10% worth keeping.

June 2026. Inspiration from JVM generational GC, the original "promote, don't compress."

Top comments (8)

Mike Czerwinski • Jun 28

The generational GC framing is right, and most people will still get poorer with it. The leak is not where you think. It is in how you check what got promoted.

Promote prose and you have promoted nothing. "Assumed pool of 10, turned out 25" reads clean in PROMOTED.md, but confirming it later means reasoning about it, which means reloading the context that produced it. Do that per assumption with a fresh agent and you have rebuilt the Repomix dump in a verifier costume. You paid the heap tax twice and called it discipline.

So promote checks, not claims. The instant an assumption breaks, write it where it enforces itself for free. "email globally unique" is a constraint, not a sentence. "pool >= 25" is a value a grep settles. The check runs forever at zero tokens. Whatever cannot be encoded is the only thing an LLM should ever reread, and it reads the diff and PROMOTED.md, never the session.

Which gives you the only test that matters. If verifying a promotion means reloading the heap, it was never promoted. It is in the heap wearing a better name. Promotion is paying to be right once, at write-time, while the context is still in your hand.
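
As a rough illustration of "checks, not claims", the two examples above might be encoded like this. It is only a sketch: the config layout (`config["db"]["pool_size"]`), the `users` table, and the function names are assumptions, and SQLite stands in for whatever database the project actually uses.

```python
import sqlite3

MIN_POOL_SIZE = 25  # promoted from the broken assumption "a pool of 10 is enough"


def check_pool_size(config: dict) -> None:
    """Fail loudly if the pool size regresses below the promoted value."""
    size = config["db"]["pool_size"]  # hypothetical config layout
    if size < MIN_POOL_SIZE:
        raise AssertionError(f"db.pool_size={size}, expected >= {MIN_POOL_SIZE}")


def create_schema(conn: sqlite3.Connection) -> None:
    """'email globally unique' lives in the schema, not in a sentence."""
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
    )
```

Neither check needs an agent to reread anything: the first fails in CI, and the second is rejected by the database on the duplicate insert.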

zxpmail • Jun 28

This frames promotion as a binary (heap vs PROMOTED.md). But what about third-order knowledge: the reasoning chain that produced the constraint? You can encode 'email unique' as a constraint, but you cannot encode why it became unique (GDPR? dedup bug? product pivot) without prose. If you purge the prose, the next agent might delete the constraint because it looks like 'over-engineering' without the context. So my counter: promote prose + checks, but make the prose point to the check, and make the check point to the ticket. Verification then is just grep -r "pool >= 25" + reading the commit log. The session is never reloaded, but the reason survives. Isn't that the actual promoted artifact?

Mike Czerwinski • Jun 28

Promoting the chain is right, and it still preserves the reason as a record, not as a constraint. "Prose points to check, check points to ticket" gives the next agent the why as a string it can read. It does not give it a way to find out the string went false. GDPR gets repealed, the product pivots, the dedup bug that actually motivated email-unique got fixed three releases ago, and grep "pool >= 25" returns green the entire time.

The pointer survived. The binding behind it died quietly, and the prose is now a confident story explaining a constraint whose justification no longer holds. That is worse than a bare constraint, because the next agent who reads the GDPR note trusts it more, not less, and over-engineering that ships with a paragraph and a ticket link looks load-bearing exactly when it has stopped being. Reading the commit log to recover the reason is the actor reading its own history: it tells you what you believed at write-time, not whether it is still true.

The reason that actually survives is the one with a payer downstream: the audit that fines you, the consumer that breaks the moment the dedup is gone. If email-unique is held up only by a closed ticket and good prose, the agent deleting it as over-engineering might be right, and your chain gives you no way to tell. The promoted artifact that does the work is not the reason written down. It is the reason still able to fail out loud when it stops being one.

zxpmail • Jun 28

You just defined what Tenured actually means in the codebase context, and it's not the domain model I originally called Tenured.

In my original GC framing, I said Tenured = core domain model that changes rarely. But you're right: a domain model is just a snapshot of what was true. It survives grep, but it doesn't survive a GDPR repeal or a fixed dedup bug unless it's still biting someone downstream.

Your “payer downstream” (the audit that fines you, the consumer that breaks) is the real Tenured generation. In JVM terms, an object only reaches Tenured if it survives multiple GC cycles. In code terms, a constraint only deserves promotion if it has survived multiple product pivots—not just one phase boundary. If the payer stops enforcing it, the constraint hasn't survived time; it's just Eden wearing a Tenured nametag, passing CI while lying to the next agent.

So the promoted artifact isn't the code, the prose, or even the test. It's the mapping from the constraint to the external payer. If you can't trace the constraint to a live oracle that will fail out loud when the assumption breaks, it doesn't belong in PROMOTED.md—it belongs back in Survivor, waiting for another cycle of validation.

This closes the loop perfectly with your earlier α + β comment on speculative decoding. The draft model writes the prose. The target model verifies the check. But time is the ultimate verifier, and it only promotes what the downstream payer keeps billing for. Thanks for giving my GC metaphor its missing generation. That should be the epilogue of the original post.

Mike Czerwinski • Jun 28

That is the right place to stop, and the JVM frame pays off better than either of us started with. One thing to keep from the Tenured analogy: the oracle ages too. A payer can stop paying silently. The audit gets repealed, the downstream consumer migrates off the field, and the constraint is still wired to an oracle that no longer fires because nothing reaches it anymore. So the mapping is not promote-once either. It is the same liveness check one level up: a promotion is only still promoted while something downstream is observably still billing for it. The real Tenured generation is not a place you reach. It is a property you keep paying to maintain. Good thread.

zxpmail • Jun 28

'A promotion is only still promoted while something downstream is observably still billing for it.' This is going in the glossary. And it also tells us something about the social layer of this conversation: the thread only stays Tenured if someone keeps replying. So I'll stop here while it still pays out. Thanks for the upgrade.

Nazar Boyko • Jun 29

The generational frame has one more piece worth stealing that the post leaves on the table. In a real collector nothing gets promoted because someone predicted it would survive. It gets promoted because it already survived a few cycles, so the age counter does the deciding, not a guess. Your three questions at phase end are still a human guessing up front which 9% will matter next, which is the exact prediction the JVM is built to avoid. A closer copy would let promotion follow what later phases actually reached back for, so the thing you guessed wrong about gets quietly demoted instead of carried forever. Did you try anything like a usage signal rather than a judgment call at phase end?

zxpmail • Jun 29

You’re absolutely right – my “three questions” are essentially a human tuning the GC parameters manually, which is the least JVM‑like part of the analogy. Real generational collectors don’t guess; they promote based on actual survival history.

That said, if we copied the JVM literally (waiting for later phases to actually reference something before promoting it), we’d hit a cold‑start problem in practice: Phase 2 starts with no context at all – the AI wouldn’t even know what to reach for, so the first round would be blind.

My pragmatic compromise is: manual priming (guess) + actual usage feedback. The PROMOTED.md is a “seed survivor set” to get the next phase off the ground. But I completely agree that we should retroactively measure whether those promoted items were actually referenced – and if they weren’t, they should be demoted next time.

To truly automate what you’re suggesting, we’d need IDE plugins or prompt‑logging tools that track every RAG query the AI makes. I haven’t seen a mature solution yet, so I admit that pure human prediction is the Achilles’ heel of my post – and your comment nails that weak spot.

Thanks for pushing this – I’ll be thinking about how to add a lightweight “reference counter” without over‑engineering the process.
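
A first rough cut of that counter could look like the sketch below. It assumes something upstream (a prompt logger, a wrapper script) can call `record_reference` whenever a promoted item is actually used; that hook is the missing piece and is not shown. The `PromotionLedger` class and the `min_hits` parameter are hypothetical names for this example.

```python
from collections import Counter


class PromotionLedger:
    """Tracks which promoted items a phase actually reached back for."""

    def __init__(self, promoted: list[str]) -> None:
        self.promoted = list(promoted)  # the manually seeded survivor set
        self.hits = Counter()

    def record_reference(self, item: str) -> None:
        """Count a use; references to non-promoted items are ignored."""
        if item in self.promoted:
            self.hits[item] += 1

    def end_phase(self, min_hits: int = 1) -> tuple[list[str], list[str]]:
        """Return (kept, demoted) and reset the counts for the next phase."""
        kept = [p for p in self.promoted if self.hits[p] >= min_hits]
        demoted = [p for p in self.promoted if self.hits[p] < min_hits]
        self.promoted = kept
        self.hits.clear()
        return kept, demoted
```

The seed set still comes from the three questions, so the cold-start problem stays solved, while anything nobody touched during the phase falls out at the next boundary instead of being carried forever.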