Most LLM systems fail not because the model is weak
but because we shove everything into the prompt and hope for magic.
If you’ve ever built a RAG or agentic system, you’ve probably tried this at least once:
- Retrieve more documents
- Increase chunk count
- Add system instructions
- Extend the prompt
- Increase context window
And yet… the answer still feels off.
That’s because context is not information. Context is relevance + timing + placement.
This article is about how mature LLM systems stop stuffing prompts
and start deciding what context they actually need.
The Core Problem: Static Prompts in a Dynamic World
Most early-stage LLM systems look like this:
User Query
→ Retrieve top K chunks
→ Stuff everything into a single prompt
→ Generate response
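In code, that naive pipeline is only a few lines. Here's a minimal sketch; `retrieve` and `generate` are placeholder callables standing in for whatever vector store and LLM client you use:

```python
# A minimal sketch of the naive pipeline. `retrieve` and `generate` are
# placeholders for your vector store and LLM client of choice.

def naive_rag(query: str, retrieve, generate, k: int = 5) -> str:
    chunks = retrieve(query, top_k=k)       # always retrieve, regardless of query
    context = "\n\n".join(chunks)           # always stuff everything in
    prompt = (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
    return generate(prompt)                 # one generic prompt for every request
```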
This works… until it doesn’t.
Why?
Because:
- Not all questions need the same context
- Not all tasks need the same instructions
- Not all users need the same depth
Yet we treat every request identically. That's where prompt routing enters the picture.
What Is Prompt Routing (Really)?
Prompt routing is decision-making before generation.
Instead of asking:
“How do I write the perfect prompt?”
You ask:
“Which prompt, context, and tools does this request require?”
Think of it as a traffic controller for LLM calls.
A routing layer decides:
- Which system prompt to use
- Which context sources to include
- Whether retrieval is even required
- Whether the model should reason, summarise, or act
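In its simplest form, a route is just a small config object chosen before any model call. A sketch (the names are illustrative, not from any particular framework):

```python
from dataclasses import dataclass, field

# Illustrative route definition: which prompt, which context sources,
# whether to retrieve at all, and what the model is expected to do.
@dataclass
class Route:
    system_prompt: str
    context_sources: list[str] = field(default_factory=list)
    needs_retrieval: bool = True
    mode: str = "answer"  # "answer" | "summarise" | "reason" | "act"

ROUTES = {
    "factual_lookup": Route("Answer strictly from the provided context.",
                            ["docs_index"]),
    "summarisation":  Route("Produce a concise, faithful summary.",
                            ["docs_index"], mode="summarise"),
    "reasoning":      Route("Weigh the evidence and state trade-offs explicitly.",
                            ["docs_index", "policy_index"], mode="reason"),
    "tool_execution": Route("Decide which tool to call and with what arguments.",
                            needs_retrieval=False, mode="act"),
}
```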
A Mental Model: LLMs Don’t Need More Context — They Need the Right Context
Consider these two queries:
- “Summarise the payment terms in this contract”
- “Can we safely terminate this contract early and what are the risks?”
Same document. Very different needs.
Query 1 needs:
- A small, focused chunk
- No reasoning
- No tools
Query 2 needs:
- Multiple clauses
- Cross-referencing
- Risk interpretation
- Possibly external policy context
If both go through the same prompt pipeline, one of them will fail.
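Using the Route sketch from earlier, the two queries might resolve to very different configurations (the index names here are hypothetical):

```python
# Hypothetical routing outcomes for the two contract queries above.
summary_route = Route(
    system_prompt="Summarise only the requested section. Do not interpret.",
    context_sources=["contract_payment_clauses"],   # one small, focused chunk set
    mode="summarise",
)

termination_route = Route(
    system_prompt="Cross-reference the relevant clauses and assess the risks.",
    context_sources=["contract_full", "policy_index"],  # broader context
    mode="reason",
)
```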
Prompt Routing in Practice (Without Buzzwords)
A practical routing layer usually classifies queries into intent buckets, such as:
- ❓ Factual lookup
- 📄 Summarisation
- 🧠 Reasoning / decision-making
- 🛠 Tool execution
- 🔁 Multi-step workflows
This classification can be:
- Rule-based (early stage)
- LLM-based (later stage)
- Hybrid (best in production)
Once intent is known, everything else follows.
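A hybrid classifier can be surprisingly small. A sketch, where `classify_with_llm` is a placeholder for a cheap, constrained model call:

```python
import re

# Hybrid intent classification: cheap rules first, LLM fallback for the rest.
RULES = [
    (re.compile(r"\b(summari[sz]e|tl;?dr|overview)\b", re.I), "summarisation"),
    (re.compile(r"\bwhat (is|are)\b|\bdefine\b", re.I), "factual_lookup"),
    (re.compile(r"\b(should we|can we|risks?|trade-?offs?)\b", re.I), "reasoning"),
    (re.compile(r"\b(send|create|delete|schedule|update)\b", re.I), "tool_execution"),
]

def classify_intent(query: str, classify_with_llm) -> str:
    for pattern, intent in RULES:
        if pattern.search(query):
            return intent                # rule hit: no model call needed
    return classify_with_llm(query)      # ambiguous: fall back to a small model
```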
Context Engineering: The Part Most People Miss
Prompt routing decides what path to take.
Context engineering decides what to inject and where.
Bad context engineering looks like:
- Dumping raw chunks
- No ordering
- No metadata
- No separation between instructions and data
Good context engineering is deliberate.
Proven patterns that actually work:
1. Instruction / Data Separation
Never mix:
- System rules
- Retrieved content
- User instructions
LLMs tend to treat early tokens as authoritative, so keep instructions first and clearly separated from the data they govern.
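In an OpenAI-style chat format, that separation might look like this (a sketch; adapt the fencing convention to your own stack):

```python
retrieved_context = "...retrieved chunks go here..."   # data, not instructions
user_question = "What are the payment terms?"

messages = [
    # System rules live alone, first, where they carry the most authority.
    {"role": "system",
     "content": "Answer only from the documents provided. "
                "If the answer is not present, say so."},
    # Retrieved content is fenced off so it can't masquerade as instructions.
    {"role": "user",
     "content": (
         "<documents>\n"
         f"{retrieved_context}\n"
         "</documents>\n\n"
         f"Question: {user_question}"
     )},
]
```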
2. Query-Aware Retrieval
Retrieve based on intent, not keywords.
A “why” question should retrieve:
- Explanations
- Rationale
- Trade-offs
A “what” question should retrieve:
- Definitions
- Tables
- Direct facts
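One way to implement this is to tag chunks at ingestion time and filter by intent at query time. A sketch, assuming each chunk carries a hypothetical `kind` metadata field:

```python
# Bias retrieval by intent using chunk metadata. Assumes chunks were
# tagged at ingestion time with a `kind` field (hypothetical schema).
INTENT_TO_KINDS = {
    "factual_lookup": {"definition", "table", "fact"},
    "reasoning":      {"explanation", "rationale", "trade_off"},
}

def query_aware_retrieve(query: str, intent: str, search, top_k: int = 8):
    candidates = search(query, top_k=top_k * 3)        # over-fetch, then filter
    wanted = INTENT_TO_KINDS.get(intent)
    if wanted:
        candidates = [c for c in candidates if c.get("kind") in wanted]
    return candidates[:top_k]
```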
3. Context Placement Matters
Important facts belong:
- At the start (primacy bias)
- Or at the end (recency bias)
Middle content is often ignored (hello, Lost in the Middle).
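A cheap mitigation is to reorder retrieved chunks so the best ones sit at the edges of the context. A sketch:

```python
def order_for_placement(chunks_by_rank: list[str]) -> list[str]:
    """Interleave ranked chunks so the most relevant land at the start and end."""
    ordered: list[str | None] = [None] * len(chunks_by_rank)
    front, back = 0, len(chunks_by_rank) - 1
    for i, chunk in enumerate(chunks_by_rank):  # index 0 = most relevant
        if i % 2 == 0:
            ordered[front] = chunk   # even ranks fill from the start (primacy)
            front += 1
        else:
            ordered[back] = chunk    # odd ranks fill from the end (recency)
            back -= 1
    return ordered
```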
Why This Is the Bridge Between RAG and Agentic Systems
Prompt routing is the missing layer between:
- Simple RAG
- Agentic RAG
Without routing:
- Agents overthink
- Simple RAG underperforms
With routing:
- Simple RAG stays simple
- Agents are invoked only when needed
This is how mature systems stay:
- Faster
- Cheaper
- Easier to debug
A Simple Rule of Thumb
If retrieval answers the question → don’t use an agent
If decisions must be made → route to reasoning
If actions are needed → allow tools
If uncertainty exists → slow the system down
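Expressed as code, that rule of thumb is a short dispatch (all four handlers below are placeholders for your own pipelines):

```python
# The rule of thumb as a dispatch. `plain_rag`, `reasoning_pipeline`,
# `agent_with_tools`, and `clarify_with_user` are placeholder pipelines.

def handle(query: str, intent: str, confidence: float):
    if confidence < 0.6:                  # uncertainty exists: slow down
        return clarify_with_user(query)
    if intent in ("factual_lookup", "summarisation"):
        return plain_rag(query)           # retrieval answers it: no agent
    if intent == "reasoning":
        return reasoning_pipeline(query)  # decisions: route to reasoning
    if intent == "tool_execution":
        return agent_with_tools(query)    # actions: allow tools
    return plain_rag(query)               # default: cheapest safe path
```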
That’s not prompt engineering.
That’s system design.
What’s Next
In the next article, we’ll explore:
Local RAG vs Cloud RAG: What Changes When You Leave the Demo
We’ll look at:
- Why local RAG feels perfect during development
- Where it quietly breaks under concurrency and scale
- What cloud RAG actually buys you (and what it doesn’t)
- How routing and context strategies behave differently in local vs managed setups
Because once your system can decide what context it needs,
the next challenge is making sure that decision is reliable, observable, and repeatable in production.