DEV Community

Joe Provence

From Monolithic Prompts to Modular Context: A Practical Architecture for Agent Memory

Most teams building on top of LLMs treat the system prompt as a static artifact — write it once, tune it occasionally, move on. That works fine for simple assistants. It breaks down the moment your agent needs to operate across multiple domains, maintain state across sessions, and actually learn from its mistakes rather than repeating them.

After running a production agentic workflow for several months, I rebuilt the memory layer from scratch. Here's what I learned.


The Problem with Monolithic Context

The original system had a single large context file loaded at the start of every session. It contained everything: infrastructure details, client rules, workflow protocols, historical session logs, SEO doctrine, tool documentation — all of it, every time.

This violates a principle that should be obvious but isn't: context is an attention budget, not a storage bin. Research on context rot (Chroma, 2025) shows that LLM recall degrades nonlinearly as context length increases. You're not just adding tokens — you're diluting attention across an increasingly noisy signal space. Every irrelevant token you load competes with every relevant one.

The other problem: a monolithic file has no mutation mechanism. It grows. It never gets smarter. Failures get logged as narrative and immediately buried under new entries. The system had no immune memory.


The Architecture: Six Files, Three Load Tiers

The redesign splits context across six files organized by load trigger — not by topic.

Tier 1 — Always loaded (~1,000 tokens total):

  • Core identity file: project structure, infrastructure, tool index, session rules. Session rules appear first — not buried — because of the "lost in the middle" attention gradient documented by Liu et al. (2023).
  • Failure pattern file: every entry is a real production failure encoded as a structured triple: Failure | Trigger | Rule. Always loaded. Consulted before tool calls.

Tier 2 — Client-scoped (loaded via explicit switch):

  • Client context file: domain-specific rules, approved sources, active work log, client-specific failure patterns. Never loaded during other client sessions. Zero cross-contamination.

Tier 3 — Task-scoped (loaded by session type):

  • Workflow file: embedded executable sequences, not passive documentation. The NLP audit loop is written as a runnable checklist, not prose.
  • Architecture file: decision gates written as prompts. Before any structural recommendation, the agent runs a three-step investment check — hardcoded as a trigger, not a suggestion.
  • Session notes: ephemeral working memory. Cleared each session. Holds active threads, decisions made, blockers.

Total always-on footprint: ~1,000 tokens. Task-scoped files add 500–700 tokens only when relevant. This is a ~60% reduction from the monolithic baseline, with higher signal density at every tier.


The Failure Pattern File: Applied RCL

The most important file in the system isn't the one with the most information — it's the one that encodes what went wrong.

Recent work on Reflective Context Learning (RCL, arXiv:2604.03189) formalizes something practitioners have been doing informally: treating context optimization as a training loop. The forward pass executes the agent. The backward pass reflects on the trace and identifies which context entry was absent or wrong. The optimizer step mutates that entry.

The failure pattern file is the mutation log. Each entry follows a strict schema:

| Failure | Trigger | Rule |
|---------|---------|------|
| Silent tool timeout | Batch API call | Single requests only — no error is thrown on batch failure |
| OAuth token expiry | ~90 day intervals | Re-authenticate before session; token refresh is not automatic |
| Entity misclassification | Repeated superlative phrases | Rewrite entire sentence — removing one word doesn't clear the pattern |
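The schema is strict enough to be machine-checkable. A minimal sketch of how entries like the ones above could be consulted before a tool call — the keyword-matching on triggers is my simplification, not the author's mechanism:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FailurePattern:
    failure: str
    trigger: str  # keyword matched against a planned action (illustrative)
    rule: str

# Entries mirror the table above.
PATTERNS = [
    FailurePattern("Silent tool timeout", "batch",
                   "Single requests only; no error is thrown on batch failure"),
    FailurePattern("OAuth token expiry", "oauth",
                   "Re-authenticate before session; refresh is not automatic"),
]

def rules_for(action: str) -> list[str]:
    """Return every rule whose trigger appears in the planned action,
    consulted before the tool call is made."""
    return [p.rule for p in PATTERNS if p.trigger in action.lower()]
```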

Critically, the update_context tool now accepts an optional failure parameter. When something breaks mid-session, the agent writes to both the session log and the failure pattern file simultaneously. The mutation is captured at the moment of failure — not reconstructed from memory at session end.

This is what Meta's engineering team described in their April 2026 post on tribal knowledge capture: the most valuable context isn't what the system does when it works — it's what causes it to fail silently.


The "Compass, Not Encyclopedia" Constraint

Meta's framework for context file design: 25–35 lines per file, four sections maximum — Quick Commands, Key Files, Non-Obvious Patterns, See Also. Every line earns its place or gets cut.

The instinct when building these systems is to add more. More rules, more examples, more edge cases. That instinct is wrong. A 4,000-token context file with 80% signal is worse than a 1,000-token file with 95% signal, because attention is not uniformly distributed across tokens. The model doesn't read your context file the way a human reads a document. It attends to it — and attention degrades with distance and density.

The design principle that follows: never put passive information where active instructions belong. If a rule matters, write it as a trigger. If a workflow matters, write it as a sequence. Documentation is for humans. Context is for attention.
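The difference between passive information and an active instruction is concrete. A rule written as a trigger fires at the decision point; the same fact written as documentation just sits in the attention budget. A toy contrast, with a keyword match standing in for however the agent inspects its planned action:

```python
# Passive (weak): "Batch API calls sometimes time out silently."
# Active (strong): the rule is a gate, checked before the action it governs.

def before_tool_call(action: str) -> None:
    """A rule written as a trigger rather than as prose. The keyword
    match is an illustrative stand-in, not the author's mechanism."""
    if "batch" in action.lower():
        raise RuntimeError(
            "Rule: single requests only; batch failures time out silently"
        )
```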


What This Changes in Practice

Three things that work better with modular context:

First, session startup is declarative. Instead of one large file that's always partially irrelevant, the agent loads exactly what it needs. A client-specific session loads the client file. An audit session loads the workflow file. The core file stays small and stable.

Second, failures compound into capability. Every production issue that gets structured into the failure pattern file makes the next session marginally more reliable. The system gets harder to break over time without any model fine-tuning — purely through context engineering.

Third, the system is auditable. Because context is modular and versioned in git, you can trace exactly what information was available to the agent during any session. When something goes wrong, you can identify whether the missing rule existed in the failure log, and if not, add it.


The Gap That Remains

The honest limitation: Claude.ai's MCP connector, as currently implemented, loads one context file automatically. The sub-files require explicit tool calls to retrieve. This means the agent must be instructed to load its own context — it doesn't happen natively.

The workaround is a get_subcontext tool that reads any file in the context directory by name. It works, but it's a patch on a deeper architectural gap: LLM interfaces don't yet treat context as a first-class, dynamically composable resource. They treat it as a static field.
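A tool like that can be small. This is a guess at its shape — the name `get_subcontext` comes from the post, but the signature, the `root` parameter, and the path-escape check are my additions:

```python
from pathlib import Path

def get_subcontext(name: str, root: Path = Path("context")) -> str:
    """Hypothetical retrieval tool: read one context file by relative
    name, refusing any path that resolves outside the context directory."""
    path = (root / name).resolve()
    if root.resolve() not in path.parents:
        raise ValueError(f"{name!r} escapes the context directory")
    return path.read_text()
```

The escape check matters because the agent itself supplies `name`; an unconstrained file reader handed to an LLM is a path-traversal bug waiting to happen.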

That's the next frontier. Not larger context windows — smarter context routing.


Building something similar? The patterns here — tiered load triggers, structured failure logs, embedded executable prompts — generalize beyond any specific stack. The core insight is simple: treat your context files the way a good engineer treats a codebase. Small, modular, version-controlled, and self-documenting.

