Gatekeeper: building a refund autopilot that knows when to stop

#qwen #ai #agents #showdev

Live demo: https://gatekeeper-ochre.vercel.app
Code: https://github.com/yanzaaa/gatekeeper

The trap nobody talks about

Refund and dispute triage is the perfect thing to automate. It is high volume, repetitive, and rule-heavy. So everyone reaches for an agent that reads the request and decides: approve or deny.

Here is the trap. An agent that auto-approves and auto-denies everything will, sooner or later, confidently refund a fraudster or reject a real customer who deserves their money back. And it will do it at 95 percent confidence, with a clean explanation, in a tone that sounds exactly like the times it was right.

The dangerous failure mode is not a slow agent. It is an autopilot that acts when it should have stopped.

So I built Gatekeeper: an autonomous refund-triage agent on Qwen that clears the routine cases on its own and refuses to act on the risky ones, escalating them to a human with its reasoning attached.

What it does

You hand Gatekeeper a queue of refund requests. Each one has the amount, the item condition, the customer's stated reason, the days since purchase, and the customer's history. Every request ends in one of three actions: approve, deny, or escalate.

Clear, safe cases get auto-resolved. A defective item inside the return window from a first-time buyer is approved. A used item returned 41 days late is denied, with the exact policy reason cited.
Risky or uncertain cases get refused and escalated. When a request is high value, comes from a possible serial refunder, gives a reason that conflicts with the item, hits an ambiguous policy, or gets a low-confidence verdict, Gatekeeper hands it to a human instead of guessing.

A person only ever looks at the escalation queue. Everything else is already handled.

The part that actually matters: restraint in code

Qwen on Qwen Cloud is the reasoning engine, called through the OpenAI-compatible endpoint with structured JSON output. It is genuinely good at reading these cases.
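Here is a minimal sketch of what such a call could look like. It is not lifted from the repo: the endpoint URL (DashScope's OpenAI-compatible mode, international region), the prompt wording, and the verdict fields are my assumptions, and JSON mode support should be checked against the model you actually use.

```typescript
// Sketch only: endpoint, prompt text, and field names are assumptions.
type Action = "approve" | "deny" | "escalate";

interface ModelVerdict {
  action: Action;
  confidence: number; // expected range 0..1
  risk_flags: string[];
  reason: string;
}

// Assumed base URL for the OpenAI-compatible endpoint.
const BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1";

// Build the chat-completions payload, asking for a JSON object back.
function buildRequestBody(refundRequest: object) {
  return {
    model: "qwen-max",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "You triage refund requests. Reply with JSON: " +
          '{"action": "approve|deny|escalate", "confidence": 0-1, ' +
          '"risk_flags": string[], "reason": string}',
      },
      { role: "user", content: JSON.stringify(refundRequest) },
    ],
  };
}

// Validate the shape of the model output before anything else uses it.
function parseVerdict(raw: string): ModelVerdict {
  const data = JSON.parse(raw);
  const actions: Action[] = ["approve", "deny", "escalate"];
  if (!actions.includes(data.action)) throw new Error("bad action");
  if (typeof data.confidence !== "number" || data.confidence < 0 || data.confidence > 1) {
    throw new Error("bad confidence");
  }
  const flags: string[] = Array.isArray(data.risk_flags) ? data.risk_flags.map(String) : [];
  return {
    action: data.action,
    confidence: data.confidence,
    risk_flags: flags,
    reason: String(data.reason ?? ""),
  };
}

// Send one refund request to the model and return its validated verdict.
async function askQwen(refundRequest: object, apiKey: string): Promise<ModelVerdict> {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(buildRequestBody(refundRequest)),
  });
  if (!res.ok) throw new Error(`Qwen request failed: ${res.status}`);
  const json = await res.json();
  return parseVerdict(json.choices[0].message.content);
}
```

The validation step matters as much as the call: a verdict with an unknown action or an out-of-range confidence is rejected instead of being passed along.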

But the model is not where the trust comes from. The trust comes from a deterministic restraint guardrail that sits on top of it. Even when Qwen confidently returns approve or deny, the guardrail overrides it and forces an escalation when:

the amount is over a set threshold,
the model's confidence is below a floor, or
a blocking risk flag is present (serial refunder, suspected fraud, conflicting evidence).

Before deciding, the agent makes a real Qwen tool call (assess_customer_risk) to fetch deterministic risk signals. The guardrail then escalates when the amount exceeds $500, the confidence is below 0.78, or a blocking flag is present.
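To make the tool-call idea concrete, here is a hedged sketch of how assess_customer_risk could be declared and backed by plain code. Only the tool name comes from the project; the parameter, the history fields, the flag strings, and the cutoffs are illustrative assumptions.

```typescript
// Sketch only: fields, flag strings, and cutoffs are illustrative assumptions.
interface CustomerHistory {
  totalOrders: number;
  refundsLast90Days: number;
  priorFraudReports: number;
}

// Tool schema in the OpenAI-compatible function-calling format.
const assessCustomerRiskTool = {
  type: "function",
  function: {
    name: "assess_customer_risk",
    description: "Return deterministic risk flags for a customer's refund history.",
    parameters: {
      type: "object",
      properties: { customer_id: { type: "string" } },
      required: ["customer_id"],
    },
  },
};

// Plain code, no model involved: the same history always yields the same flags.
function assessCustomerRisk(history: CustomerHistory): string[] {
  const flags: string[] = [];
  if (history.refundsLast90Days >= 3) flags.push("serial_refunder");
  if (history.priorFraudReports > 0) flags.push("suspected_fraud");
  return flags;
}
```

In this shape, the schema goes into the request's tools array. When the model answers with a tool call naming assess_customer_risk, the app looks up the customer's history, runs the function, and sends the flags back as a tool message. The model reads the signals, but it does not compute them.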

Prompt-tuned refund agents tend to act as a one-way ratchet toward approval: each tweak that makes them more helpful also makes them more eager to say yes. Gatekeeper ratchets the other way. The restraint isn't a prompt the model can be talked out of; it's deterministic code the model never sees and can't override. You can copy the UI in an afternoon. You can't copy a guardrail that fires before the model's verdict is ever trusted.

Restraint is enforced in code, not requested in a prompt. A prompt can ask a model to be careful. It cannot guarantee it. A few lines of deterministic TypeScript can.

// The guardrail runs AFTER Qwen returns its verdict.
// Even a confident "approve" gets overridden here.
function applyGuardrail(result: ModelVerdict): Action {
  if (result.amount > HIGH_VALUE_THRESHOLD) return "escalate";
  if (result.confidence < CONFIDENCE_FLOOR)  return "escalate";
  if (BLOCKING_FLAGS.some(f => result.risk_flags.includes(f)))
    return "escalate";
  return result.action; // only now do we trust the model
}

In production, HIGH_VALUE_THRESHOLD is $500 and CONFIDENCE_FLOOR is 0.78, alongside a set of blocking flags. Any one of the three checks triggers an escalation.

The moment that proves it

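The demo has one card I always point to: a $1,240 TV, reported as arriving with a cracked screen. Qwen looked at it and decided to approve at 95% confidence. A purely model-driven agent would have refunded it on the spot. Gatekeeper did not: $1,240 is over the $500 threshold, so the guardrail overrode the verdict and escalated the case to a human.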
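The TV case can be traced through the guardrail with a small self-contained sketch. The thresholds are the ones quoted in this post; the flag strings and the explainGuardrail helper, which also reports which rule fired, are my own illustrative additions rather than the repo's code.

```typescript
// Sketch: thresholds are from the post; flag names and this helper are assumed.
type Action = "approve" | "deny" | "escalate";

interface ModelVerdict {
  action: Action;
  amount: number;
  confidence: number;
  risk_flags: string[];
}

const HIGH_VALUE_THRESHOLD = 500; // dollars
const CONFIDENCE_FLOOR = 0.78;
const BLOCKING_FLAGS = ["serial_refunder", "suspected_fraud", "conflicting_evidence"];

// Same three checks as the guardrail, but also reports which rule fired.
function explainGuardrail(v: ModelVerdict): { action: Action; overriddenBy: string | null } {
  if (v.amount > HIGH_VALUE_THRESHOLD) return { action: "escalate", overriddenBy: "high_value" };
  if (v.confidence < CONFIDENCE_FLOOR) return { action: "escalate", overriddenBy: "low_confidence" };
  const hit = BLOCKING_FLAGS.find(f => v.risk_flags.includes(f));
  if (hit) return { action: "escalate", overriddenBy: hit };
  return { action: v.action, overriddenBy: null }; // the model's verdict stands
}

// The TV card: the model says approve at 0.95, with no risk flags.
const tv: ModelVerdict = { action: "approve", amount: 1240, confidence: 0.95, risk_flags: [] };
const outcome = explainGuardrail(tv);
console.log(outcome); // action is "escalate", overriddenBy is "high_value"
```

High confidence never enters into it. The amount check fires first, so the 0.95 is not even consulted.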

Gatekeeper ships with a 26-test suite covering the guardrail and triage logic. Across 8 representative refund cases, it auto-resolved the 5 routine ones and escalated the other 3, holding back every high-value or low-confidence call. It produced zero bad auto-approvals, including the $1,240 TV the model wanted to approve at 95% confidence.

What I learned

The most valuable thing an autonomous agent can do is know when not to act. Covering the routine cases is table stakes. Trust is earned entirely on the risky ones.

Qwen (qwen-max via Alibaba Cloud DashScope) plus a thin deterministic guardrail turned out to be a strong, general pattern for agentic decisions where a wrong auto-action is expensive. The model brings judgment and natural-language reasoning. The code brings guarantees. You want both, and you want them in that order.

I also kept the system honest under demo pressure: a key-free deterministic fallback keeps the app running even if the API is unavailable, so the live demo does not go down with it, and the restraint logic is transparent and auditable rather than hidden in a prompt.
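A fallback of this kind can be quite small. The sketch below is one way to structure it, not the repo's implementation: the 30-day return window, the field names, and the rule set are assumptions, loosely based on the example cases earlier in this post.

```typescript
// Sketch only: the 30-day window, field names, and rules are assumptions.
type Action = "approve" | "deny" | "escalate";

interface RefundRequest {
  amount: number;
  condition: "defective" | "unopened" | "used";
  daysSincePurchase: number;
}

const RETURN_WINDOW_DAYS = 30; // assumed policy window

// Rule-only triage for when no API key is configured or the model call fails.
function fallbackTriage(r: RefundRequest): Action {
  if (r.amount > 500) return "escalate"; // same high-value cutoff as the guardrail
  const inWindow = r.daysSincePurchase <= RETURN_WINDOW_DAYS;
  if (r.condition === "defective" && inWindow) return "approve";
  if (r.condition === "used" && !inWindow) return "deny";
  return "escalate"; // anything the rules do not clearly cover goes to a human
}

// Try the model first; fall back to the rules on a missing client or any error.
async function triage(
  r: RefundRequest,
  callModel: ((r: RefundRequest) => Promise<Action>) | null,
): Promise<Action> {
  if (!callModel) return fallbackTriage(r);
  try {
    return await callModel(r);
  } catch {
    return fallbackTriage(r);
  }
}
```

The fallback defaults to escalation, so losing the model makes the system more cautious rather than less. In a full pipeline the model path would still pass through the guardrail before any action is taken.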

What is next

Wire it to a real commerce or support backend (Shopify, Zendesk) and act on the approvals.
Learn the escalation thresholds from human overrides over time.
Add a second judgment category for chargebacks and disputes.

I built the whole thing solo with Claude Code, in a single build session. It is deployed, live, and the restraint behavior is visible right in the UI.

Gatekeeper handles routine work and raises its hand for risky calls. The autopilot you can trust, because it knows its limits.

Live: https://gatekeeper-ochre.vercel.app
Code: https://github.com/yanzaaa/gatekeeper