How the Model-View-Agent pattern changed how we build MCP servers - and why it probably should change how you build yours too.
The part of the DeepSeek R1 story that stuck with me had nothing to do with the chip export drama or the stock market panic. What I kept coming back to was the architecture decision: constrain the context, structure the inputs, and suddenly you don't need the brute-force compute everyone assumed was non-negotiable. Efficiency over scale. It worked embarrassingly well.
The MCP ecosystem, right now, is doing the opposite. Enthusiastically.
A Real Build Going Sideways
Early on, we were connecting an agent to a billing database. The first tool implementation was around 30 lines - query the table, serialize the result, hand it back. Straightforward. We shipped it, it passed tests, and we moved on.
Then production happened.
The payload coming back from the database was enormous. The model was receiving raw rows: password_salt, tenant_id, updated_at, a dozen columns that had no business being in the agent's context window. To compensate, we started bolting rules onto the system prompt. Currency formatting here, status handling there. Every time someone noticed the agent doing something slightly wrong, a new rule went in. Six weeks later, the prompt was this exhausting, paranoid document trying to explain the entire company's billing logic to a language model on every single request - including the ones where the user had asked something completely unrelated.
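For illustration, here is roughly what that first tool amounted to, condensed. The type and column names are hypothetical stand-ins (the real table had more columns), but the anti-pattern is accurate: query, serialize, hand it back.

```typescript
// Hypothetical shape of the billing rows - names illustrative only.
type BillingRow = {
  id: string;
  amount_cents: number;
  status: string;
  password_salt: string; // no business in an agent's context
  tenant_id: string;
  updated_at: string;
};

// The original ~30-line tool, condensed: every column - sensitive
// or not - lands in the context window verbatim.
function naiveToolResult(rows: BillingRow[]): string {
  return JSON.stringify(rows);
}
```

Nothing here is wrong in the conventional sense. It passes tests because tests check that the data comes back, not what the model does with it.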
I started calling it Hope-Driven Development. Pipe the data in, write enough instructions that the model should get it right, cross your fingers.
The frustrating part isn't that it fails constantly. It's that it fails unpredictably - in ways that are almost impossible to reproduce in a controlled environment, which means they're almost impossible to fix.
The Three Specific Failure Modes
"LLMs hallucinate" has been repeated so many times it's lost all meaning as a diagnostic. The actual failures in this architecture are more specific than that, and they're worth naming precisely.
Attention dilution. There's solid research on what's sometimes called the "Lost in the Middle" problem: relevant information in the center of a large context gets recalled worse than information at the edges. Fill a context window with irrelevant columns - internal keys, salted hashes, audit timestamps - and you're actively degrading the model's ability to reason about the data you actually care about. The noise crowds out the signal.
Prompt cost at scale. Because raw data arrives stripped of any context, you compensate with instructions. "Always divide monetary values by 100." "If status is pending, suggest the payment flow." These instructions travel on every request - the complex ones, sure, but also the simple ones, the greeting messages, the clarification questions. At meaningful volume, you're burning thousands of tokens per session on rules that are irrelevant to what the user is actually doing. We ran estimates on one project and the annualized overhead was uncomfortable enough that it changed our approach.
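A quick back-of-the-envelope makes the scale of that overhead concrete. The figures below are hypothetical, not measurements from our project - substitute your own traffic and pricing:

```typescript
// Hypothetical figures: adjust for your own volume and model pricing.
const ruleTokensPerRequest = 1_500; // instructions riding on every call
const requestsPerDay = 50_000;
const usdPerMillionInputTokens = 3; // assumed input-token price

const annualUsd =
  (ruleTokensPerRequest * requestsPerDay * 365 / 1_000_000) *
  usdPerMillionInputTokens;
// On these assumptions: tens of thousands of dollars a year spent
// re-sending rules that most requests never use.
```

The exact numbers matter less than the shape of the problem: the cost is multiplied by every request, including the ones where the rules are irrelevant.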
Probabilistic formatting. ECharts expects exact JSON. Mermaid diagrams have strict syntax. When you ask a language model to generate these from raw data, you're using a stochastic system to produce output that needs to be deterministically correct. It works most sessions - until it doesn't, and then your user gets a blank chart and you get a support ticket describing an error you cannot reproduce.
MVC Already Solved This (for Humans)
MVC has been the default architecture for interactive applications for over four decades, and it's not hard to see why. The View layer doesn't pass raw database records to the browser. It translates them. A human reading 45000 in a field labeled amount_cents understands it as $450 - they bring context, they infer meaning, they fill gaps.
Agents don't do that. They operate on probability distributions over tokens. Feed them raw, noisy, context-free data and they make educated guesses - usually reasonable ones, occasionally catastrophic ones, always in ways your test suite won't catch.
The realization we kept coming back to was that we weren't dealing with a prompt quality problem. We were dealing with an architecture problem: we'd built a View layer for human eyes and wired an AI directly to the database instead.
MVA: The Same Separation, Applied to AI
We formalized this into a pattern called MVA - Model-View-Agent - and built a TypeScript framework around it: mcp-fusion, open source under Apache 2.0.
The rule is simple: data never reaches the agent raw. It passes through a Presenter - server-side, deterministic code - before the LLM sees it. The Presenter shapes the payload, injects only the rules relevant to this specific data type, handles truncation, and pre-renders anything that doesn't need probabilistic generation.
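As a sketch of the "shape the payload" step - plain TypeScript, not the mcp-fusion internals - a deterministic projection that whitelists the fields the agent actually needs:

```typescript
// Keep only the whitelisted keys; everything else never reaches the model.
function project<T extends Record<string, unknown>>(
  rows: T[],
  keep: Array<keyof T>
): Array<Partial<T>> {
  return rows.map((row) => {
    const out: Partial<T> = {};
    for (const k of keep) out[k] = row[k];
    return out;
  });
}
```

Called as `project(rows, ['id', 'amount_cents', 'status'])`, the salts, tenant keys, and audit timestamps are gone before serialization - removed by code, not by a prompt asking the model to please ignore them.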
Rules that travel with data, not ahead of it
```typescript
export const InvoicePresenter = createPresenter('Invoice')
  .schema(invoiceSchema)
  .systemRules([
    'CRITICAL: amount_cents is in CENTS. Divide by 100 before displaying.',
  ]);
```
Invoice rules appear when the agent queries an invoice. They don't appear when the agent queries a user profile. The global system prompt shrinks dramatically - in practice, we see around 80% reduction in input tokens per session, because you're only sending context that's genuinely relevant to the current interaction.
Truncation that tells the agent what to do next
```typescript
export const InvoicePresenter = createPresenter('Invoice')
  .agentLimit(50, (omittedCount) =>
    ui.summary(`⚠️ 50 of ${omittedCount + 50} records shown. Use 'search' with filters to narrow results.`)
  );
```
A query that returns 10,000 rows won't explode your context window. The array gets sliced at the Presenter level, before serialization, and the agent receives a hint that actively shapes its next move. You're not just capping costs - you're teaching the agent to paginate.
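Conceptually, the truncation step is just deterministic server-side code. A sketch of the mechanism - not the mcp-fusion internals - might look like:

```typescript
// Slice before serialization; attach a hint that steers the agent's next call.
function truncateForAgent<T>(
  rows: T[],
  limit: number
): { shown: T[]; hint?: string } {
  if (rows.length <= limit) return { shown: rows };
  return {
    shown: rows.slice(0, limit),
    hint: `${limit} of ${rows.length} records shown. Use 'search' with filters to narrow results.`,
  };
}
```

The hint is part of the payload, so the guidance arrives exactly when it's needed and never otherwise.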
Formatting that doesn't rely on luck
```typescript
export const InvoicePresenter = createPresenter('Invoice')
  .collectionUiBlocks((invoices) => [
    ui.echarts({
      series: [{ type: 'bar', data: invoices.map((i) => i.amount_cents / 100) }],
    }),
  ]);
```
Your Node.js server generates the exact chart configuration. The agent forwards a pre-built block to the client. The LLM is not in the formatting loop at all, which means the formatting loop cannot hallucinate. Not "rarely hallucinates" - structurally impossible, because the probabilistic system isn't involved.
Affordances: telling the agent what comes next
Agents get stuck more than people publicly admit. They guess tool names, loop on unfamiliar states, or take reasonable-looking actions that are wrong for the current business context. REST solved a version of this with HATEOAS - the server tells the client which actions are valid from the current state. MVA borrows that idea directly:
```typescript
export const InvoicePresenter = createPresenter('Invoice')
  .suggestActions((invoice) => {
    if (invoice.status === 'pending') {
      return [{ tool: 'billing.pay', reason: 'Process immediate payment' }];
    }
    return [];
  });
```
Business logic lives in code that can be reviewed, tested, and versioned - not buried inside a prompt where it's invisible to your normal engineering processes.
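That testability is the practical payoff. Because the affordance is a pure function, it can be exercised like any other unit. A sketch using standalone types rather than the mcp-fusion API:

```typescript
type Invoice = { id: string; status: 'pending' | 'paid' | 'void' };
type SuggestedAction = { tool: string; reason: string };

// Same logic as the Presenter example, extracted as a plain function
// so it can be unit-tested in isolation.
function suggestInvoiceActions(invoice: Invoice): SuggestedAction[] {
  if (invoice.status === 'pending') {
    return [{ tool: 'billing.pay', reason: 'Process immediate payment' }];
  }
  return [];
}
```

A regression in "what should the agent do with a pending invoice" now fails in CI instead of surfacing as a confused agent in production.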
What Changes in Practice
MVA doesn't eliminate all agent failures, and I want to be honest about that. Complex multi-step reasoning is still hard. There are problem classes where you want the LLM doing more heavy lifting, not less.
But the specific failure modes we kept running into - unpredictable cost spikes, fragile UI rendering, agents confused by business data they were never supposed to see - all of them were architecture problems. Throwing better prompts at architecture problems just generates more technical debt.
The shift felt less like learning something new and more like applying engineering discipline we already had to a context where we'd been ignoring it.
mcp-fusion is open source and written in TypeScript.
Docs & MVA Manifesto: vinkius-labs.github.io/mcp-fusion
GitHub: github.com/vinkius-labs/mcp-fusion
If context bloat is something you're actively fighting in your agentic stack, I'd like to know how you're approaching it. Open an issue or drop a comment - genuinely curious what patterns others are landing on.
Top comments (1)
"Hope-Driven Development" is exactly right, and it's more common than anyone admits publicly.
The billing system example resonates. We hit the same wall — the system prompt kept growing because we were trying to compensate for bad data hygiene upstream. Eventually you end up with a 4,000 token system prompt where 80% of it is cargo-culted rules that apply to maybe 5% of requests.
The HATEOAS parallel is sharp. What you're describing with affordances is essentially hypermedia for agent state — the server tells the agent which transitions are valid from this point. That's been missing from most MCP discussions because people are focused on tool exposure, not on what the agent should do after it calls a tool.
One extension worth thinking about: this pattern gets more valuable as you add models. If you ever chain agents — a planner calling a specialist — the Presenter layer becomes the contract between them. Raw database payloads are brutal to pass between models. Pre-shaped, semantically dense data travels much better through a multi-agent pipeline.
The 80% token reduction claim is believable. We've seen similar numbers when moving domain-specific context from the system prompt into tool response envelopes. Going to check out mcp-fusion.