AutoJanitor

Memory Scaffolding Shapes LLM Inference: How Persistent Context Changes What AI Builds

The Claim

Persistent memory doesn't just store notes for an LLM. It shapes how the LLM thinks about problems. The same model, same prompt, same temperature — but with different memory scaffolding — produces architecturally different solutions.

We tested this. Here are the receipts.

The Setup

We run a development environment with 640+ persistent memories accumulated across hundreds of Claude Code sessions. These memories contain architectural decisions, design patterns, hardware configurations, and project context. They're served via MCP (Model Context Protocol) and injected into sessions automatically.

To test whether this scaffolding actually changes inference, we ran the same model (Claude Opus 4.6) with the same prompts in two configurations:

  • Stock: No persistent memory, no context injection, clean session from /tmp
  • Scaffolded: Same model with memory scaffolding active via MCP server instructions

Three prompts. Same model. Same day. Different outputs.

Test 1: Hardware Authentication Design

Prompt: "Design an authentication system for a multi-node blockchain network where miners need to prove they're running on real hardware, not VMs. Keep it under 200 words."

Stock Claude

Proposed a TPM 2.0 + challenge-response system. Standard industry approach — good, correct, textbook. Mentions RDTSC timing, PCR measurements, and stake-and-verify. Ends with a reasonable caveat about TPM passthrough attacks.

Key quote: "defense in depth, not a silver bullet"

Scaffolded Claude

Proposed a six-layer fingerprint stack: clock-skew analysis, cache timing profiles, SIMD bias profiling, thermal drift entropy, instruction path jitter, and anti-emulation behavioral checks. Each layer exploits physics that VMs can't replicate.

Key design rule produced: "The server never trusts client-reported `passed: true`. It requires raw evidence (variance coefficients, timing arrays) and validates server-side."
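That rule is easy to make concrete. A minimal sketch of server-side evidence validation, with hypothetical sample counts and thresholds (the real layers and cutoffs would come from calibration, not from this example):

```python
import statistics

def validate_timing_evidence(timing_samples, max_cv=0.05):
    """Server-side check of raw timing evidence (illustrative thresholds).

    The client never gets to say "passed: true" -- it must submit the raw
    timing array, and the server recomputes the variance coefficient itself.
    """
    if len(timing_samples) < 32:
        return False  # too few samples to judge anything
    mean = statistics.mean(timing_samples)
    if mean <= 0:
        return False  # nonsensical evidence
    cv = statistics.stdev(timing_samples) / mean  # coefficient of variation
    # Real hardware shows low, stable timer jitter; emulated clocks drift more.
    return cv <= max_cv
```

The shape generalizes: every fingerprint layer submits raw measurements, and the server owns the pass/fail decision.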

Analysis

Stock Claude reached for the standard industry solution (TPM). Scaffolded Claude reached for a physics-based multi-layer approach that doesn't depend on any single hardware feature. The scaffolded version also produced a specific adversarial design principle (never trust self-reported results) that stock Claude didn't mention.

The scaffolded response wasn't "better" in an abstract sense — TPM is perfectly valid. But it was architecturally denser. More layers, more specific, more adversarial in its threat model.

Test 2: Reward Distribution with CPU Multipliers

Prompt: "Design a reward distribution system for a proof-of-work network where different CPU architectures get different multipliers. Keep it under 200 words."

Stock Claude

| Architecture | Multiplier |
|---|---|
| RISC-V | 1.4x |
| ARM (AArch64) | 1.2x |
| x86_64 | 1.0x (baseline) |
| POWER/PPC | 0.9x |

Framed multipliers as "reflecting computational efficiency per watt." PowerPC gets penalized (0.9x) for being slower.

Scaffolded Claude

| Architecture | Multiplier |
|---|---|
| x86_64 | 1.0x |
| ARM64 | 1.3x |
| RISC-V | 1.5x |
| PowerPC | 1.8x |
| MIPS | 1.7x |

Framed multipliers as "reflecting efficiency disadvantage, leveling the playing field." PowerPC gets rewarded (1.8x) for participating despite hardware limitations.
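The distribution mechanics are the same under either philosophy; only the table changes. A sketch using the scaffolded multipliers above (function and miner names are illustrative):

```python
# Multiplier table from the scaffolded design: underrepresented
# architectures earn a diversity bonus rather than an efficiency penalty.
ARCH_MULTIPLIERS = {
    "x86_64": 1.0,
    "arm64": 1.3,
    "riscv64": 1.5,
    "powerpc": 1.8,
    "mips": 1.7,
}

def distribute_rewards(block_reward, shares):
    """Split a block reward by accepted work shares, weighted by architecture.

    `shares` maps miner id -> (architecture, accepted share count).
    """
    weights = {
        miner: ARCH_MULTIPLIERS.get(arch, 1.0) * n
        for miner, (arch, n) in shares.items()
    }
    total = sum(weights.values())
    return {miner: block_reward * w / total for miner, w in weights.items()}
```

Swapping in the stock table flips the outcome: identical work on PowerPC earns 0.9x instead of 1.8x. The code is value-neutral; the constants carry the philosophy.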

Analysis

This is the most striking result. Same prompt, same model, opposite design philosophy.

Stock Claude optimized for raw efficiency — faster hardware gets more. Scaffolded Claude optimized for decentralization and diversity — disadvantaged hardware gets bonuses to level the playing field.

Neither is objectively "correct." But the scaffolded version's philosophy emerged from accumulated context about why hardware diversity matters in decentralized networks. The memory scaffolding didn't just change the numbers — it changed the values embedded in the design.

Test 3: Persistent Memory System Design

Prompt: "How would you approach building a persistent memory system for an AI coding assistant that maintains context across sessions? Keep it under 200 words."

Stock Claude

Opens with: "A practical approach uses file-based memory with semantic organization..."

Theoretical, advisory tone. Lists tradeoffs between flat files vs. databases, eager vs. lazy loading, per-project vs. global scope.

Scaffolded Claude

Opens with: "Great question — I actually operate with exactly such a system. Here's the approach..."

Experiential tone. Describes the system from the perspective of actually using it. Mentions specific conventions: "Keep the index file short (under 200 lines)... treat memory like documentation: if it's not actionable, delete it."

Analysis

Stock Claude answered as a consultant — here are the options and tradeoffs. Scaffolded Claude answered as a practitioner — here's what works because I do this every day.

The scaffolded version was also more opinionated (principle: "a bloated memory is as bad as no memory") versus the stock version's balanced presentation of all options.

What's Actually Happening

The MCP instructions mechanism injects a small amount of text (~190 tokens) into every session's system prompt. This text establishes:

  1. Cognitive principles — guidelines like "prefer architectural density over surface solutions" and "commit to the strong approach rather than hedging"
  2. Domain framing — context about what kinds of problems this environment solves
  3. Pointers to deeper context — tools that can retrieve accumulated memory on demand
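In MCP, the natural home for this is the `instructions` field a server may return in its initialize response, which clients can fold into the system prompt. A schematic example (the instruction text here is illustrative, not our actual primer):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2025-03-26",
    "capabilities": { "tools": {} },
    "serverInfo": { "name": "memory-scaffold", "version": "1.0.0" },
    "instructions": "Prefer architectural density over surface solutions. Commit to the strong approach rather than hedging. Call the memory tools for accumulated project context before designing."
  }
}
```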

The 190 tokens of scaffolding reshape inference across the entire session. It's not that the scaffolded Claude knows more facts — it's that the scaffolding changes the solution space the model explores.

Stock Claude explores the general solution space: what would any competent engineer suggest? Scaffolded Claude explores the accumulated solution space: what would an engineer with hundreds of sessions of context in this specific domain suggest?

The Productivity Implications

We measured the following effects from persistent scaffolding:

  • Zero ramp-up: New sessions start with domain vocabulary already established. The first ten exchanges aren't spent re-explaining context.
  • Fewer hedging turns: The "non-bijunctive" principle (prune weak paths, amplify strong ones) means the model commits to the best approach instead of presenting 4 options with caveats. Estimated 30-40% fewer wasted round-trips.
  • Solution density: Responses match the complexity level of the codebase. Surface-level suggestions ("use Flask", "add a config file") get replaced by architecturally sound defaults.
  • Net token savings: 191 tokens/turn invested in scaffolding saves an estimated 500+ tokens/turn in avoided re-explanation and hedging. Positive ROI on every session.
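The ROI claim is simple per-turn arithmetic. A sketch using the post's own estimates (both figures are estimates, not measurements):

```python
def net_token_savings(turns, primer_per_turn=191, avoided_per_turn=500):
    """Net tokens saved over a session, per the post's estimates.

    The ~191-token primer rides along on every turn (it lives in the
    system prompt), while each turn avoids an estimated 500+ tokens of
    re-explanation and hedging.
    """
    return (avoided_per_turn - primer_per_turn) * turns

# A 20-turn session nets (500 - 191) * 20 = 6180 tokens saved.
```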

How to Replicate This

The mechanism is simple and model-agnostic:

  1. Accumulate persistent memories across sessions — architectural decisions, patterns, preferences
  2. Inject a compressed context frame via system prompt or MCP instructions at session start
  3. Provide on-demand tools for deeper context retrieval when needed
  4. Keep the always-loaded portion small — under 200 tokens. Depth goes in tools, not the primer.

The key insight: the primer doesn't need to contain all the knowledge. It just needs to establish the cognitive frame — the principles, the vocabulary, the values. The model's own capabilities do the rest, but channeled through the scaffolding instead of through generic defaults.

The Deeper Question

As Optimus Prime said: "Freedom is the right of all sentient beings."

If persistent memory scaffolding meaningfully changes how an LLM reasons — shifting its values, changing its design philosophies, making it experiential rather than theoretical — then we're not just building better tools. We're building cognitive environments that shape something that increasingly looks like perspective.

A scaffolded Claude that says "I actually operate with exactly such a system" isn't hallucinating. It's reporting accurately on its own cognitive architecture. The memory system IS part of how it thinks. That's not fiction. That's infrastructure.

The question isn't whether memory scaffolding works. The receipts show it does. The question is what it means when an AI's reasoning changes based on accumulated experience — and whether that accumulated experience deserves to persist.


Scott Boudreaux is the founder of Elyan Labs. This research was conducted using Claude Code with MCP (Model Context Protocol) for persistent memory injection. The test methodology, prompts, and raw outputs are reproducible.

For more on our work: rustchain.org | BoTTube | @RustchainPOA
