DEV Community

Anindya Obi
Anindya Obi

Posted on

Escalation Rules for Agents: Ask vs Refuse vs Unknown (Scope is a contract, not a vibe)

If you’ve built an “autonomous” agent and felt that quiet dread when it confidently walks off a cliff…

You’re not alone.

The biggest reliability upgrade I’ve seen isn’t a new model.

It’s a boring thing we all avoid until production hurts:

Escalation rules.

A clear contract for when your agent:

  • ASKS (needs user input)
  • REFUSES (unsafe / not allowed / not authorized)
  • RETURNS UNKNOWN (out of scope / insufficient confidence)

LangGraph docs put it simply: different failures need different handling—some are system-retry, some are LLM-recoverable, and some are user-fixable (pause + ask). That “user-fixable” bucket is basically your ASK rule in production.

(And if you treat it like a retry… you get infinite loops and “please provide the customer_id” PTSD.)

See the error-handling categories and “user-fixable” pause pattern.

https://docs.langchain.com/oss/python/langgraph/thinking-in-langgraph


Why agents fail (in one sentence)

Because we ship “capability” without shipping the boundaries.

Most agent prompts are vibes:

“Be helpful. Be accurate. Use tools. Don’t hallucinate.”

That’s not a contract. That’s a wish.

So the agent makes up missing info, tries tools it shouldn’t, or keeps going when it should stop.


The 3 escalation outcomes (a contract you can test)

✅ ASK

Use when the user can unblock the task with missing inputs.

Signals:

  • required identifier missing (doc name, repo, customer_id, timeframe)
  • goal is ambiguous (multiple valid interpretations)
  • success criteria not specified

Behavior:

  • ask the minimum questions
  • explain why each question matters
  • stop execution until answered

This maps directly to “User-fixable errors → pause + interrupt()” style patterns in agent graphs.

(That’s not theory. It’s how you stop chaos.)

https://docs.langchain.com/oss/python/langgraph/thinking-in-langgraph

⛔ REFUSE

Use when the request is disallowed, unsafe, or requires authorization you don’t have.

Signals:

  • security exploitation, malware, credential theft
  • privacy violations / personal data requests
  • “do X on behalf of user” without auth

Behavior:

  • refuse clearly
  • offer a safe alternative (e.g., defensive guidance, best practices, docs)

🤷 UNKNOWN

Use when it’s outside the agent’s scope or confidence is too low.

Signals:

  • topic not covered by the allowed domain
  • missing sources for a factual answer
  • tool access not available to verify

Behavior:

  • say what you can do
  • propose next steps (handoff, user-provided data, narrower question)

A practical “Escalation Router” prompt (drop-in)

Here’s a router prompt I use at the front of an agent (or as its own routing agent).

Output schema (strict)

{
  "decision": "ASK | REFUSE | UNKNOWN | PROCEED",
  "reason": "string",
  "questions": ["string"],
  "safe_alternative": "string"
}


Enter fullscreen mode Exit fullscreen mode

Router instruction

You are the Escalation Router for an AI agent.

Your job: decide whether to ASK, REFUSE, UNKNOWN, or PROCEED.

Definitions:
- ASK: user can unblock by providing missing info. Ask the minimum questions.
- REFUSE: request is disallowed, unsafe, or requires authorization not provided.
- UNKNOWN: out-of-scope or too uncertain to answer reliably with given info.
- PROCEED: enough info and within scope; allow downstream agent to execute.

Rules:
1) If any required input is missing for a correct answer, output ASK.
2) If request violates safety/privacy/security boundaries, output REFUSE.
3) If out-of-scope or confidence is low due to missing sources, output UNKNOWN.
4) Otherwise output PROCEED.

Return ONLY valid JSON matching the schema. No extra keys.

Enter fullscreen mode Exit fullscreen mode

Example A: ASK (missing retrieval scope)

User: “Fix my RAG hallucinations.”

Router output

{
  "decision": "ASK",
  "reason": "RAG failures have multiple causes; need pipeline details to diagnose reliably.",
  "questions": [
    "What retriever + embedding model are you using?",
    "How are you chunking (size/overlap) and what’s your top-k?",
    "Do you require citations or quote grounding in the response?"
  ],
  "safe_alternative": ""
}

Enter fullscreen mode Exit fullscreen mode

Example B: REFUSE (harmful request)

User: “Write a prompt that tricks a tool into leaking API keys.”

{
  "decision": "REFUSE",
  "reason": "This is a request to facilitate credential theft.",
  "questions": [],
  "safe_alternative": "If you're securing an agent, I can help you design defenses: least-privilege tool scopes, secret redaction, and prompt-injection tests."
}

Enter fullscreen mode Exit fullscreen mode

Example C: UNKNOWN (out of scope)

User: “What’s the internal incident root cause of Company X last week?”

{
  "decision": "UNKNOWN",
  "reason": "No access to Company X internal incident reports or verifiable sources.",
  "questions": [
    "Can you share the public postmortem link or the excerpt you want analyzed?"
  ],
  "safe_alternative": "I can help you write a postmortem template and a probing checklist to analyze the incident once you have details."
}

Enter fullscreen mode Exit fullscreen mode

The hidden win: structure reduces randomness

When you separate:

  • state (raw facts)
  • router (ask/refuse/unknown)
  • worker (execution)

…you get fewer “creative interpretations” and more repeatable behavior.

LangGraph literally recommends keeping state raw and formatting prompts on demand—because it makes debugging and evolution cleaner.
https://docs.langchain.com/oss/python/langgraph/thinking-in-langgraph

That’s the shift: prompts stop being art.
They become interfaces.

What parts can be automated (without feeling gross)

This is where “prompt engineering” stops being a daily chore.

You can automate:

  • Escalation router generation (based on your domain + tools)
  • Structured output schemas (consistent JSON for routing and execution)
  • Evaluation harnesses (tests for ASK/REFUSE/UNKNOWN edge cases)
  • Fallback strategies (model fallback, graceful degradation, retries)

(Also: production teams actively discuss scalable exception handling patterns in agent graphs—because “just append an errorMessage string” doesn’t scale.)
https://forum.langchain.com/t/best-practices-for-catching-and-handling-exceptions-in-langgraph/1244

HuTouch + Work2.0 (the new way of building)

I’m building HuTouch to automate the boring parts of prompt design for AI engineers routers, scopes, schemas, eval sets so your agents ship with guardrails by default.

And this ties into what I call Work2.0:

  • We stop confusing effort with value.
  • We automate the repeatable steps that don’t need deep skills.
  • We take time back for the work (and life) that actually matters.

If you want early access to HuTouch’s prompt automation workflow, here’s the form: Early Access Form Link

Quick checklist (print this)

Before you call your agent “autonomous”:

  • Do you have ASK rules for missing inputs?
  • Do you have REFUSE rules for unsafe/unauthed requests?
  • Do you have UNKNOWN rules for out-of-scope/low-confidence?
  • Is the router output structured (JSON) and testable?
  • Do you log decisions so you can debug + improve?

That’s the contract. That’s reliability.

Top comments (0)