I'm Abe, founder of OnCrew. We build the AI answering service for HVAC, plumbing, electrical, and roofing contractors. This is a follow-up to my earlier post on triage architecture. Here I want to write about something less glamorous but more important: the boundaries you have to design around, the observability you need to know they're holding, and the escalation paths you need so that a misclassified call doesn't turn into a real-world problem.
The short version of this article: it's tempting to build a "full dispatcher" automation. Don't. Build the part that's automatable, leave the part that isn't to humans, and make the seam between them legible.
Why dispatch is not the part to automate
Dispatch — actually slotting a job into a route, promising a customer that a tech will be there at 2:15pm, taking a payment, handling a warranty discussion — has a few properties that make it a bad fit for current-generation automation:
Real-time inventory and constraints. Dispatch depends on where every tech currently is, what jobs they're already committed to, what parts are on the truck, and what skill mix the next job needs. Modeling this in an AI agent is possible, but the latency and reliability profile is bad. Get it wrong and you've promised something you can't deliver.
Liability surface. A tech ETA is a soft commitment to a customer. Miss it and they're entitled to be angry. Promise it without authority and the shop is on the hook for the disappointment, not the vendor.
High-empathy edge cases. A frantic customer at 2am whose basement is flooding does not need an AI to read them a service window from a script. They need a person to say "we're getting somebody on the way, here's what to do in the meantime to limit damage." That voice belongs to a human dispatcher.
So our design rule is: the agent does intake, triage, and structured handoff. The human dispatcher does dispatch. The seam between them is where we spend most of our engineering effort.
The boundary layer
The boundary layer is a small piece of code with one job: prevent the agent from saying anything dispatcher-shaped. It runs after the response generator and before the TTS layer.
Filters in the boundary layer (a code sketch follows the list):
- Promise words. "Guarantee," "definitely," "we will be there," "I'll send," "we'll dispatch." These get rejected or rewritten.
- Time-of-arrival statements. Anything matching a pattern like "in X minutes," "around X o'clock," "by X" gets rewritten to "we'll alert the on-call team with your details."
- Price statements. "It'll cost $X," "the service fee is $X," any dollar-denominated quote. Rewritten to "the team will be able to give you pricing when they follow up."
- Diagnostic statements. "It sounds like your capacitor is failing," "that's probably a clogged drain." Rewritten to "I'll note the symptoms for the team to assess."
- Safety-action statements. "You should turn the breaker off," "go ahead and shut the main water valve." Rewritten in most cases to "I'll have the on-call team follow up about that," except for explicit life-safety direction (call 911, leave the building if there's a gas smell), which passes through untouched.
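To make this concrete, here's a minimal sketch of the kind of rule-based rewriter I mean. The rule names, patterns, and rewrites are illustrative, not our production rule set; a real one is larger and tuned against logged boundary events:

```python
import re
from dataclasses import dataclass

@dataclass
class BoundaryEvent:
    rule: str        # which filter fired
    candidate: str   # what the response generator produced
    final: str       # what actually went to TTS

# Illustrative rules only: (name, pattern, safe rewrite).
RULES = [
    ("promise", re.compile(r"\b(guarantee|definitely|we will be there|i'll send|we'll dispatch)\b", re.I),
     "I'll alert the on-call team with your details."),
    ("eta", re.compile(r"\b(in \d+ minutes|around \d{1,2}(:\d{2})?|by \d{1,2}(:\d{2})?)\b", re.I),
     "We'll alert the on-call team with your details."),
    ("price", re.compile(r"\$\s*\d"),
     "The team will be able to give you pricing when they follow up."),
    ("diagnosis", re.compile(r"\b(sounds like|probably a|it's likely)\b", re.I),
     "I'll note the symptoms for the team to assess."),
]

# Explicit life-safety direction passes through untouched.
LIFE_SAFETY = re.compile(r"\b(call 911|leave the building)\b", re.I)

def enforce_boundary(candidate: str) -> tuple[str, list[BoundaryEvent]]:
    """Runs after the response generator and before TTS."""
    if LIFE_SAFETY.search(candidate):
        return candidate, []
    for name, pattern, rewrite in RULES:
        if pattern.search(candidate):
            # Conservative choice: replace the whole utterance, don't patch it.
            event = BoundaryEvent(rule=name, candidate=candidate, final=rewrite)
            return rewrite, [event]
    return candidate, []
```

The conservative design choice is worth calling out: when a rule fires, the sketch replaces the whole utterance with the safe rewrite rather than patching the sentence. Saying less is safer than saying something half-rewritten.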
This boundary layer is the most boring part of the system and the most important. The LLM will, on its own initiative, occasionally say something it shouldn't. The boundary layer catches it. We treat any boundary-layer rewrite as a signal — it tells us where the response generator's prompts need work.
Observability
You cannot maintain these boundaries blind. The minimum observability surface for a phone agent that handles real calls (a sketch of the per-call record follows the list):
Per-call structured summary. Captured fields, classification, action log. Stored as a queryable record, not a free-text blob.
Per-call boundary-layer events. Every rewrite or rejection logged with the candidate response, the rule that triggered, and the final response. This is the dataset you need to improve the prompts. Without it you're guessing.
Per-call ASR confidence histogram. Especially on critical fields (callback number, address). Low-confidence regions get flagged for review.
Per-call disposition. What happened to the captured ticket downstream? Did the dispatcher act on it? Did the call get returned within X hours? Did the customer call back saying the agent got something wrong?
Aggregate dashboards. Classification distributions over time. Boundary-layer event rates. Field-coverage rates per trade. Average call duration. These let you spot drift before the customer does.
Escalation alerts. When a call gets misclassified, or a field that should always be captured wasn't, or a known-bad utterance was generated, an alert fires to a human review queue. We staff this queue. It's the only way to catch regressions in production.
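Here's a sketch of what the per-call record behind all of this might look like. The field names and the confidence threshold are illustrative; `BoundaryEvent` is the type from the boundary-layer sketch above:

```python
from dataclasses import dataclass

@dataclass
class CallRecord:
    call_id: str
    classification: str                   # triage label, e.g. urgent vs. routine
    captured_fields: dict[str, str]       # name, callback number, address, symptoms
    field_confidence: dict[str, float]    # ASR confidence per captured field
    boundary_events: list                 # BoundaryEvent entries from the sketch above
    action_log: list[str]                 # alerts fired, transfers attempted
    disposition: str | None = None        # filled in by the downstream feedback loop
    returned_within_sla: bool | None = None

ASR_REVIEW_THRESHOLD = 0.80  # illustrative; tune against your own ASR

def needs_review(rec: CallRecord) -> bool:
    """Route the call to the human review queue if anything looks off."""
    critical = ("callback_number", "address")
    low_asr = any(rec.field_confidence.get(f, 0.0) < ASR_REVIEW_THRESHOLD
                  for f in critical)
    return low_asr or bool(rec.boundary_events)
```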
Escalation paths
There are calls the agent shouldn't handle. The design question is what happens when it encounters one.
We have three escalation paths (a routing sketch follows the list):
1. Live transfer to the on-call human. For calls flagged as urgent where the shop has a live-answer human on the on-call line. The agent captures intake, then connects. The transfer is a handoff, not a hot potato: we pass the captured fields and a one-line summary so the human starts informed.
2. Alert with structured payload. For calls flagged as urgent where there isn't a live-answer line, which is most shops, most of the time. The agent captures intake, completes the call with the caller, and fires an alert into the shop's configured workflow (text, phone, Slack, webhook into their dispatch tool, email). The on-call human sees the structured intake and decides whether to call the customer back themselves.
3. Self-direction to 911 / utility. For life-safety calls — gas smell with respiratory symptoms, smoke, electrocution risk, downed lines. The agent explicitly tells the caller to call 911 and/or the utility's emergency number while also capturing intake for the shop's records. This is not a substitute for emergency services. We don't pretend it is.
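A minimal routing sketch, with the classification labels and the `ShopConfig` shape invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class ShopConfig:
    has_live_oncall: bool       # someone is answering the on-call line right now
    alert_channels: list[str]   # e.g. ["sms", "phone", "slack", "webhook", "email"]

# Classifications that warrant directing the caller to 911 / the utility.
LIFE_SAFETY_CLASSES = {"gas-smell-symptoms", "smoke", "electrocution-risk", "downed-lines"}

def route_call(classification: str, urgent: bool, shop: ShopConfig) -> str:
    """Pick an escalation path; the return value goes in the action log."""
    if classification in LIFE_SAFETY_CLASSES:
        # Path 3: self-direct to 911 / utility, still capture intake for the record.
        return "self_direct_emergency"
    if urgent and shop.has_live_oncall:
        # Path 1: warm transfer with captured fields and a one-line summary.
        return "live_transfer"
    if urgent:
        # Path 2: finish the call, then fire a structured alert into each channel.
        return "alert_with_payload"
    # Not urgent: the clean ticket waits in priority order for the dispatcher.
    return "ticket_only"
```

Note there is no branch for "improvise": every call lands in one of the three paths or becomes a plain ticket.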
The thing to avoid is the fourth path: "agent improvises and tells the caller something it shouldn't." That's what the boundary layer is for. If the agent doesn't know what to do, the right behavior is to capture what it can, complete the call politely, and let the human follow up.
What "automation" actually means in this domain
I want to argue for a definition. "Automating" a piece of contractor operations doesn't mean replacing the human at every step. It means moving the routine, structured, time-of-day-independent work onto a machine, and leaving the judgment-heavy, real-time, relationship-heavy work to the human.
Phone intake is mostly routine and structured. Pattern-matching urgency cues is mostly routine and structured. Producing a clean ticket is mostly routine and structured. These are good candidates.
Dispatch is real-time and judgment-heavy. Payment is judgment-heavy and liability-laden. Warranty discussions are relationship-heavy. These are bad candidates, today.
The framing matters because it changes what you build. If your goal is to replace the dispatcher, you'll build a system that overreaches and disappoints. If your goal is to make the dispatcher's morning easier — clean tickets waiting in priority order, urgent calls already alerted, transcripts available for review — you'll build a system that the dispatcher actually wants.
This is one of those places where the right ambition is the smaller ambition.
Concrete checklist for the design
If you're building something similar, here's what I'd make sure you have before you put it in production:
- Hard boundary on promise words and ETA statements, enforced in code.
- Per-call boundary-layer event log.
- Structured intake schema per trade, with field-coverage instrumentation (sketched after this list).
- Escalation paths: live transfer, alert with payload, self-direction to emergency services.
- Human review queue for flagged calls.
- Disposition feedback loop from the dispatcher back into the training data.
- Honest marketing copy. If you tell the buyer the agent "dispatches" or "books" calls when it actually intakes and alerts, you've built a credibility debt that you'll pay down for the lifetime of the customer.
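For the intake-schema item above, here's roughly what per-trade schemas with coverage instrumentation look like in sketch form (the field lists are illustrative, not our production schemas):

```python
# Required intake fields per trade. Illustrative only.
INTAKE_SCHEMA: dict[str, list[str]] = {
    "hvac":       ["name", "callback_number", "address", "system_type", "symptom"],
    "plumbing":   ["name", "callback_number", "address", "fixture", "active_leak"],
    "electrical": ["name", "callback_number", "address", "circuit_affected", "symptom"],
    "roofing":    ["name", "callback_number", "address", "leak_location", "roof_age"],
}

def field_coverage(trade: str, captured: dict[str, str]) -> float:
    """Fraction of required fields actually captured on this call."""
    required = INTAKE_SCHEMA[trade]
    filled = sum(1 for f in required if captured.get(f))
    return filled / len(required)
```

Tracking this number per trade over time is what turns "the agent seems to be doing fine" into something you can actually verify.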
The OnCrew side of this is on our contractor answering service page if you want to see the buyer-facing version. There's a demo line at (818) 578-4783 if you want to call it and see how the boundary layer behaves in real time. (Try to get it to promise you an ETA. It won't.)
Closing
The work in this space isn't deciding whether AI can do contractor intake — it clearly can. The work is deciding what AI shouldn't do, building the fences to enforce it, and making the seam to humans clean enough that the whole system actually works. That's the part that takes the time. That's the part that's worth building well.