I've watched teams spend weeks rewriting the same system prompt.
Different phrasings. More examples. Clearer instructions. The agent still picks the wrong tool. Still hallucinates. Still feels broken.
Then they rename six functions and accuracy jumps 30%.
This pattern shows up constantly. The model doesn't care how clever your prompt is. It cares about what it can see.
The problem I see everywhere
Teams treat prompts like magic spells. Say the right words, get the right output.
But agents aren't following instructions. They're making predictions based on everything in context. The tool names. The API responses. The error messages. The structure of your data.
That's perception. And it matters way more than your system prompt.
Most teams optimize the wrong layer. They iterate on prompts for weeks while their tool names are handleData and processRequest. The model has no chance.
Here are 10 patterns I've seen work across the past two years of helping teams build production agents 💪
1. Tool names are the real prompt
Bad tool names are invisible to the model.
I audit client codebases and find this constantly:
// ❌ The model has no idea what this does
async function handleRequest(data: unknown) { }
// ✅ Now it knows exactly when to use this
async function createShipmentFromOrder(orderId: string) { }
One client had 47 tools. Half had names like processData or executeAction. The model was guessing.
We renamed 12 functions. Tool selection accuracy went from 60% to 87%. No prompt changes.
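A quick way to catch this in your own codebase: scan tool names for vague verbs. This is only a sketch; the verb list below is a starting point I made up, not a standard.

```typescript
// Flag tool names built from vague verbs like "handle" or "process".
// The verb list is an illustrative assumption; extend it for your codebase.
const VAGUE_VERBS = ["handle", "process", "execute", "manage", "do", "run"];

function isVagueToolName(name: string): boolean {
  const lower = name.toLowerCase();
  return VAGUE_VERBS.some((verb) => lower.startsWith(verb));
}

// Returns the subset of tool names worth renaming.
function auditToolNames(names: string[]): string[] {
  return names.filter(isVagueToolName);
}
```

Run it over your tool registry before you touch a single prompt.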
2. Tool descriptions matter more than you think
The model reads descriptions to decide which tool to pick.
I tell clients: write descriptions like you're onboarding a new developer. Because you are.
// ❌ Vague description
const tool = {
  name: "searchRecords",
  description: "Search for records in the system"
}
// ✅ Specific description with constraints
const tool = {
  name: "searchShipments",
  description: "Search shipments by tracking number, origin, destination, or date range. Returns max 50 results. Use filters to narrow results before searching."
}
Specific descriptions reduce wrong tool selection by 30-40% in my experience.
3. Passing everything into context is lazy
I've reviewed architectures where teams dump entire conversation histories into context. 20 turns. 50 tool results. Everything.
The model drowns.
What works:
- Last 3 turns by default
- Relevant retrieved docs only
- Structured summaries instead of raw data
Less context. Better decisions. Faster responses.
One team cut their context by 60% and saw answer quality improve. Counter-intuitive until you realize the model was distracted by noise.
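The trimming itself is simple. Here's a minimal sketch assuming a flat message array; the turn count and the summary step are illustrative defaults, not a fixed recipe.

```typescript
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Build a lean context: last N turns verbatim, older turns folded into
// one summary line, and only a handful of retrieved docs.
function buildContext(
  history: Message[],
  retrievedDocs: string[],
  turnsToKeep = 3
): Message[] {
  // One turn = one user message + one assistant message.
  const recent = history.slice(-turnsToKeep * 2);
  const older = history.slice(0, -turnsToKeep * 2);
  // In production you'd generate a real structured summary here.
  const summary: Message[] =
    older.length > 0
      ? [{ role: "assistant", content: `Summary of ${older.length} earlier messages` }]
      : [];
  const docs: Message[] = retrievedDocs
    .slice(0, 5)
    .map((d) => ({ role: "tool", content: d }));
  return [...summary, ...docs, ...recent];
}
```

The model sees a summary plus the recent exchange instead of 20 raw turns.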
4. Scoped retrieval beats broad retrieval
Early RAG implementations pull from everywhere. The whole knowledge base. 200+ docs. The model has no idea which ones matter.
I push clients toward module-level filtering. If someone asks about shipments, only retrieve shipment docs.
// ❌ Retrieve from everything
const docs = await retriever.search(query);
// ✅ Scope to relevant module
const docs = await retriever.search(query, {
  module: detectModule(query),
  maxResults: 5
});
Recall goes up. Hallucinations go down. Should be the default from day one.
5. Structured outputs prevent downstream chaos
If another agent or system consumes your output, structure it.
// ❌ Free text response
"I found 3 shipments that match. The first one is #12345 going to Chicago..."
// ✅ Structured response
{
  "shipments": [
    { "id": "12345", "destination": "Chicago", "status": "in_transit" },
    { "id": "12346", "destination": "Denver", "status": "delivered" }
  ],
  "total": 3,
  "hasMore": true
}
Unstructured responses compound errors. Each downstream consumer has to parse and guess. I've seen entire pipelines break because one agent returned prose instead of JSON.
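One cheap defense is to validate the agent's output before anything downstream touches it. This sketch assumes the shipment shape above; a schema library would do the same job more thoroughly.

```typescript
type Shipment = { id: string; destination: string; status: string };
type ShipmentResult = { shipments: Shipment[]; total: number; hasMore: boolean };

// Parse and shape-check the agent's raw output. Prose instead of JSON
// fails immediately here, not three consumers downstream.
function parseShipmentResult(raw: string): ShipmentResult {
  const data = JSON.parse(raw); // throws loudly on non-JSON
  if (
    !Array.isArray(data.shipments) ||
    typeof data.total !== "number" ||
    typeof data.hasMore !== "boolean"
  ) {
    throw new Error("Agent returned JSON with an unexpected shape");
  }
  return data as ShipmentResult;
}
```

Fail at the boundary, where the error message still points at the agent that caused it.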
6. Silent failures are invisible failures
The model can't fix what it can't see.
I audit error handling in every client codebase. Same pattern:
// ❌ Silent failure
if (!hasPermission) {
  return null
}
// ✅ Loud failure
if (!hasPermission) {
  return {
    error: "PERMISSION_DENIED",
    message: "User lacks 'shipments.create' permission",
    requiredPermission: "shipments.create",
    suggestedAction: "Request access from workspace admin"
  }
}
Explicit errors let the agent reason about what went wrong. And let you debug faster.
7. Real system state beats assumed state
I watched an agent confidently tell a user their shipment was delivered.
It wasn't. The agent assumed based on typical timelines. It never checked the actual record.
This happens when teams don't pass real state:
// ❌ Agent has to guess
const context = {
  orderId: "12345"
}
// ✅ Agent knows the truth
const context = {
  shipment: {
    id: "12345",
    status: "in_transit", // actual current status
    lastUpdate: "2024-01-15T10:30:00Z",
    currentLocation: "Memphis hub"
  }
}
Agents will make up state if you don't give them real state. Always.
8. Specialized agents beat one generalist
I've seen teams try to build one agent that handles everything. Customer questions. Data entry. Workflow automation. Reports.
It's mediocre at all of them.
The pattern that works:
- One agent for Q&A using org context
- One agent for record operations with strict schemas
- One agent for document extraction with specialized prompts
Each one is easier to eval. Easier to constrain. Easier to improve.
Generalist agents are harder to debug and harder to trust. I push clients toward decomposition early.
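The decomposition can start with a plain router in front of the specialized agents. The keyword matching below is a stand-in; in practice a classifier, or the model itself, picks the agent.

```typescript
type AgentKind = "qa" | "records" | "extraction";

// Route a request to one specialized agent. The patterns are
// illustrative assumptions, not production intent detection.
function routeRequest(input: string): AgentKind {
  if (/extract|parse this document/i.test(input)) return "extraction";
  if (/create|update|delete/i.test(input)) return "records";
  return "qa"; // default: answer from org context
}
```

Each branch now gets its own prompt, its own schema, and its own eval set.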
9. Guardrails should block bad things, not useful things
I've seen guardrails so aggressive they blocked legitimate business operations.
"Can you help me set up a webhook?" → BLOCKED (mentions code execution)
"What's the API endpoint for shipments?" → BLOCKED (mentions API)
The users stopped trusting the product. Not because the AI was bad. Because the guardrails were dumb.
Narrow guardrails work better. Be specific about what's actually dangerous. Allow everything else.
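In code, narrow means matching specific dangerous actions instead of broad topics. The patterns below are illustrative assumptions, not a complete list.

```typescript
// Block concrete dangerous actions, not whole topics like "API" or "code".
const BLOCKED_ACTIONS: RegExp[] = [
  /drop\s+table/i,        // destructive SQL
  /rm\s+-rf/i,            // destructive shell command
  /api[_-]?key\s*[:=]/i,  // credential exfiltration
];

function isBlocked(message: string): boolean {
  return BLOCKED_ACTIONS.some((pattern) => pattern.test(message));
}
```

Both questions from the example above pass through, while genuinely destructive requests still get stopped.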
10. Audit perception before rewriting prompts
When a client tells me their agent is underperforming, I ask these questions first:
- Can it see the right tools? Are names and descriptions clear?
- Can it see the right context? Or is it drowning in noise?
- Can it see real state? Or is it guessing?
- Can it see errors? Or do failures happen silently?
Nine times out of ten, the problem is perception. Not the prompt.
The outcome when you get this right
Teams that engineer perception instead of prompts:
- Stop the endless prompt iteration cycle
- Get measurable accuracy improvements in days, not months
- Build agents that actually work in production
- Have clear debugging paths when things break
The teams that keep tweaking prompts stay stuck. I've seen it enough times to know.
The mental model shift
Prompt engineering asks: "How do I word this better?"
Perception engineering asks: "What does the agent need to see to make a good decision?"
One has diminishing returns after a few iterations.
The other compounds as your system improves.
Stop rewriting prompts. Start auditing what your agent can perceive.
- Rename tools for clarity
- Scope your context
- Pass real state
- Make errors loud
- Use specialized agents
Your agent is only as good as what it can see 👀
If you're building agents and want a second set of eyes on your architecture, I help teams get this right. DM me on X or LinkedIn.