
Abe

Contractor phone service vs contractor phone answering service: the intake layer field crews actually need

Disclosure up front: I am Abe, founder of OnCrew, which builds an AI phone answering layer for contractors. I have a bias. This post is the system design I wish I had read two years ago, written for builders and operators wiring up telephony for plumbing, HVAC, electrical, garage door, roofing, locksmith, and similar field-service shops. Pick whatever vendor you want. The architecture below holds either way.

A short note on terminology first, because it trips up almost every spec doc I have seen in this space.

The vocabulary problem

"Contractor phone service" and "contractor phone answering service" sound like the same product. They are not.

A contractor phone service is the transport and routing layer. It owns the business number, the SIP trunk or carrier integration, IVR menus, voicemail, call recording, and the rules that decide which device or queue a call rings. Twilio, RingCentral, OpenPhone, and similar products live here. They move audio. They mostly do not talk.

A contractor phone answering service is the conversation layer that sits behind the routing layer. Its job is to pick up calls the human owner cannot, ask the right questions, classify what came in, and emit a structured artifact the shop can act on. Historically this was a call-center seat. Increasingly it is a voice agent backed by an LLM with strict guardrails.

If you are building or buying in this space, the bug to avoid is treating the two as one product. The routing layer should be dumb, deterministic, and observable. The conversation layer should be small, opinionated, and constrained. Mixing them is how teams end up with an IVR that pretends to be an assistant and an "assistant" that quietly hangs up on edge cases.

The rest of this post is about the conversation layer, because that is where most of the design work lives.

The intake schema

Start from the artifact, not the conversation. Every call that reaches the answering layer should resolve to exactly one of three records: an intake, a callback request, or an explicit hangup. The intake is the rich one. Here is a working schema, simplified to what an operator actually consumes:

{
  "call_id": "uuid",
  "received_at": "2026-05-11T14:22:09Z",
  "answered_in_brand": "Maple Plumbing",
  "caller": {
    "name": "string|null",
    "callback_number": "E.164|null",
    "callback_number_source": "ani|spoken|both",
    "preferred_contact": "call|sms|either"
  },
  "service_location": {
    "address_line": "string|null",
    "city": "string|null",
    "postal_code": "string|null",
    "unit_or_suite": "string|null",
    "access_notes": "string|null"
  },
  "job": {
    "trade": "plumbing|hvac|electrical|...",
    "stated_problem": "string",
    "structured_tags": ["leak", "no_hot_water"],
    "first_noticed": "string|null",
    "prior_work_by_us": "bool|null"
  },
  "urgency": {
    "level": "emergency|same_day|scheduled|info_only",
    "reasoning": "string",
    "life_safety_flag": "bool"
  },
  "scheduling_intent": {
    "preferred_window": "string|null",
    "earliest_ok": "iso8601|null",
    "constraints": "string|null"
  },
  "channels": {
    "transcript_url": "string",
    "recording_url": "string|null",
    "summary": "string"
  },
  "routing": {
    "ring_group": "on_call_pm|after_hours|owner_cell",
    "notified_at": "iso8601|null",
    "ack_deadline_seconds": 600
  }
}

A few things to notice:

  • callback_number_source exists because automatic number identification (ANI, the number the carrier reports for the caller) and the number the caller speaks aloud often disagree, and the shop needs to know which one to trust before calling back.
  • urgency.level is a discrete enum, not a free-text guess. Downstream tools filter and route on it, and operators need to write rules against it.
  • urgency.life_safety_flag is intentionally separate from urgency.level. Gas smell, water on an electrical panel, smoke, sewage backup affecting an elderly resident: these need a distinct flag because the right action is sometimes "stop and call 911 or the gas utility," not "send a truck."
  • routing.ack_deadline_seconds is the timer that decides when an unacknowledged intake falls through to the next escalation hop.

Design the schema before the prompt. The prompt is downstream of the schema, not the other way around.
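
To make "schema first" concrete, here is a minimal Python sketch of a check that could run on every payload before it leaves the answering layer. It covers only a few fields from the schema above. The names validate_intake, UrgencyLevel, and NumberSource are hypothetical, and the E.164 pattern is a simplified shape check, not a guarantee that the number is dialable.

```python
import re
from enum import Enum


class UrgencyLevel(str, Enum):
    EMERGENCY = "emergency"
    SAME_DAY = "same_day"
    SCHEDULED = "scheduled"
    INFO_ONLY = "info_only"


class NumberSource(str, Enum):
    ANI = "ani"
    SPOKEN = "spoken"
    BOTH = "both"


# Simplified E.164 shape: "+", a non-zero digit, then up to 14 more digits.
E164 = re.compile(r"\+[1-9]\d{1,14}")


def validate_intake(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []

    caller = payload.get("caller", {})
    number = caller.get("callback_number")
    if number is not None and not E164.fullmatch(number):
        problems.append("caller.callback_number is not E.164")
    try:
        NumberSource(caller.get("callback_number_source"))
    except ValueError:
        problems.append("caller.callback_number_source is not ani|spoken|both")

    urgency = payload.get("urgency", {})
    try:
        UrgencyLevel(urgency.get("level"))
    except ValueError:
        problems.append("urgency.level is not a known enum value")
    # life_safety_flag must be a real boolean, not a truthy string.
    if not isinstance(urgency.get("life_safety_flag"), bool):
        problems.append("urgency.life_safety_flag must be a boolean")

    return problems
```

A payload that fails any of these checks is probably better held for review than delivered, since downstream routing rules are written against exactly these fields.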

Routing states

The answering layer is best modeled as a small state machine, not a free-running chat. Five states cover almost every real call.

  1. GREETING. Answer in the configured brand. Confirm the shop name. Do not start with a menu.
  2. TRIAGE. Collect the caller's name, callback number, address, and a one-sentence description of the issue.
  3. CLASSIFY. Tag urgency and life-safety. If life-safety, branch to SAFETY.
  4. CAPTURE_DETAILS. Ask the trade-specific follow-ups (water shutoff location, breaker accessible, system age, smell or color of leak) and the access notes (gate code, dog in yard, parking, lockbox).
  5. CLOSE. Confirm what happens next in honest language ("I am sending this to the on-call person now, expect a callback from a real human"), repeat the callback number back to the caller, end the call.

SAFETY is a sixth, branchable state. Direct the caller to 911 or the relevant utility, capture the rest of the intake if it is safe and quick, end the call promptly.

Two rules worth stealing for any builder writing this:

  • The agent should skip TRIAGE or CAPTURE_DETAILS fields the caller has already volunteered. Do not re-ask. Re-asking is the single most common reason real callers hang up on a voice agent.
  • CLOSE should make a promise the system can keep. "Someone will call you back" is keepable. "A technician will arrive in 30 minutes" is not, and an answering layer should never say it.
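
The states and the "do not re-ask" rule above can be sketched as a small transition table in Python. This is an illustration under assumptions, not a reference implementation: the ENDED terminal state, the SAFETY-to-CLOSE transition, and the field names in TRIAGE_FIELDS are mine.

```python
from enum import Enum, auto


class State(Enum):
    GREETING = auto()
    TRIAGE = auto()
    CLASSIFY = auto()
    SAFETY = auto()
    CAPTURE_DETAILS = auto()
    CLOSE = auto()
    ENDED = auto()  # hypothetical terminal state, not named in the list above


# Hypothetical field names for what TRIAGE collects.
TRIAGE_FIELDS = ("name", "callback_number", "address", "stated_problem")

_LINEAR = {
    State.GREETING: State.TRIAGE,
    State.TRIAGE: State.CLASSIFY,
    State.CLASSIFY: State.CAPTURE_DETAILS,
    State.CAPTURE_DETAILS: State.CLOSE,
    State.SAFETY: State.CLOSE,  # assumption: SAFETY still ends with a short CLOSE
    State.CLOSE: State.ENDED,
}


def next_state(state: State, life_safety: bool = False) -> State:
    """Deterministic transition; the only branch is CLASSIFY -> SAFETY."""
    if state is State.CLASSIFY and life_safety:
        return State.SAFETY
    return _LINEAR.get(state, State.ENDED)


def fields_to_ask(required, collected: dict) -> list:
    """Return only the fields the caller has not already volunteered."""
    return [field for field in required if not collected.get(field)]
```

For example, if the caller opens with their name and address, fields_to_ask(TRIAGE_FIELDS, {"name": "Dana", "address": "12 Elm St"}) leaves only the callback number and the problem description to ask about.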

The escalation boundary

This is the part most teams get wrong. An answering layer should not:

  • Commit a specific technician.
  • Commit an ETA or appointment window.
  • Quote a price, even "approximately."
  • Confirm an appointment as booked on the shop's calendar.
  • Diagnose the trade problem ("that is a failing capacitor").
  • Tell the caller what physical action is safe in a hazardous situation, beyond standard "call 911 or your utility" guidance.

Why so strict? Every one of those statements is a contract the shop has to honor, and the answering layer does not have the calendar, the parts inventory, the licensure, or the legal standing to make it. The job of the answering layer is to capture enough information that a human at the shop can make those commitments accurately and fast.

In code, this looks like a system prompt with a hard list of refusals, plus a deterministic post-call validator that scans the transcript and the structured output for forbidden phrases ("on the way", "will arrive at", "the price is", "we book you for"). If the validator fires, the artifact is flagged for human review before it is treated as a normal intake. Treat the validator as a load-bearing safety control, not a logging step. Failures here are the difference between an answering service that helps and one that quietly creates disputes.
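
A minimal sketch of that post-call validator in Python, using the four example phrases above. The function names and the "needs_human_review" and "normal_intake" status strings are hypothetical, and a production list would be longer and tuned against real transcripts.

```python
import re

# The four example phrases from the paragraph above.
FORBIDDEN_PHRASES = ("on the way", "will arrive at", "the price is", "we book you for")

# Match whole words, ignore case, and tolerate irregular whitespace from transcription.
_PATTERNS = [
    (phrase, re.compile(r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b", re.IGNORECASE))
    for phrase in FORBIDDEN_PHRASES
]


def find_violations(texts) -> list:
    """Return each forbidden phrase found in any of the given texts, once."""
    hits = []
    for text in texts:
        for phrase, pattern in _PATTERNS:
            if phrase not in hits and pattern.search(text):
                hits.append(phrase)
    return hits


def review_status(texts) -> str:
    """Deterministic gate: any hit sends the artifact to human review."""
    return "needs_human_review" if find_violations(texts) else "normal_intake"
```

Passing the whole transcript means a caller asking "is someone on the way?" also trips the flag. That is a conservative failure mode; scanning only the agent's turns plus the summary is a reasonable alternative if review volume gets too high.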

Urgency classification

The hardest part of the conversation layer is urgency. Operators want four buckets, not a sliding scale.

  • emergency. Active damage or safety risk happening now. Burst pipe with water spreading, gas smell, no heat in freezing weather with an infant or elderly resident in the home, locked out at night, electrical smoke. Page the on-call human immediately.
  • same_day. Not an emergency, but the caller cannot reasonably wait. No hot water in a working household, partial outage, a blocking issue for a small business's operating day. Route into the same-day queue.
  • scheduled. Routine repair, maintenance, quote request, install. Goes into the normal scheduling queue.
  • info_only. Warranty question, billing question, "do you service my zip code." Does not need a callback from a technician.

Encode the rubric in the prompt, and also encode it in a deterministic post-processing step. LLM classification drifts under load. Deterministic re-classification using the structured tags is a useful safety net. If the two disagree, escalate the higher-urgency one and log the disagreement for prompt tuning.
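
The reconciliation rule can be sketched in a few lines of Python. The tag-to-urgency table is hypothetical (only no_hot_water appears in the schema example above), and reconcile is an illustrative name; the point is the shape: take the higher urgency and report the disagreement.

```python
# Lowest to highest urgency, matching the urgency.level enum.
URGENCY_ORDER = ["info_only", "scheduled", "same_day", "emergency"]

# Hypothetical mapping from structured tags to a minimum urgency level.
TAG_FLOOR = {
    "gas_smell": "emergency",
    "burst_pipe": "emergency",
    "electrical_smoke": "emergency",
    "no_hot_water": "same_day",
    "partial_outage": "same_day",
}


def rule_based_level(tags):
    """Highest urgency implied by the tags, or None if no rule applies."""
    levels = [TAG_FLOOR[tag] for tag in tags if tag in TAG_FLOOR]
    return max(levels, key=URGENCY_ORDER.index) if levels else None


def reconcile(llm_level: str, tags):
    """Return (final_level, disagreed). On disagreement the higher urgency wins."""
    rule_level = rule_based_level(tags)
    if rule_level is None or rule_level == llm_level:
        return llm_level, False
    return max(llm_level, rule_level, key=URGENCY_ORDER.index), True
```

With this table, reconcile("scheduled", ["leak", "no_hot_water"]) returns ("same_day", True): the call is escalated, and the True flag is what gets logged for prompt tuning.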

A real subtlety: do not let urgency classification be the agent's primary goal during the call. Capture the facts first. Classify at the end, or in a separate post-call step. Agents that try to triage in real time tend to lead the witness ("does that feel like an emergency to you?") and contaminate the data.

Handoff logic

The most underrated feature of a contractor phone answering service is the part after the call ends. A practical handoff has four pieces:

  1. Structured payload delivered to the shop's system of record (CRM, dispatch tool, or a webhook). The schema above is what this should look like.
  2. Human-readable summary sent by SMS or email to the on-call number, optimized for reading on a phone in a driveway between jobs.
  3. Acknowledgement loop with a deadline. If the on-call human does not ack within the configured window, the next hop in the escalation chain gets the same payload and SMS, and the system queues a callback record so nothing is dropped.
  4. Audit trail with the transcript and, where consented and lawful, the recording. This is how you tune the agent and how you defend yourself in a dispute.
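
The acknowledgement loop in step 3 can be sketched as a pure function of time, which keeps it easy to test. This Python sketch assumes a fixed escalation chain built from the ring_group values in the schema (the order is my assumption) and the same deadline at every hop; escalation_step is a hypothetical name. It is meant to be evaluated only while the intake is still unacknowledged.

```python
from datetime import datetime, timedelta, timezone

# Assumed order; the schema lists these ring groups but not their sequence.
ESCALATION_CHAIN = ["on_call_pm", "after_hours", "owner_cell"]


def escalation_step(chain, notified_at: datetime, ack_deadline_seconds: int, now: datetime):
    """For an unacknowledged intake, return (current_hop, queue_callback_record).

    Each missed deadline moves the intake one hop down the chain; the last
    hop keeps it once the chain is exhausted. A callback record is queued
    as soon as the first deadline is missed, so nothing is dropped.
    """
    elapsed = max(0.0, (now - notified_at).total_seconds())
    missed = int(elapsed // ack_deadline_seconds)
    hop = chain[min(missed, len(chain) - 1)]
    return hop, missed >= 1


notified = datetime(2026, 5, 11, 14, 22, 9, tzinfo=timezone.utc)
# Eleven minutes after notification, with the 600-second deadline from the schema.
hop, queue_callback = escalation_step(
    ESCALATION_CHAIN, notified, 600, notified + timedelta(minutes=11)
)
```

Keeping the timer logic free of I/O means the SMS and webhook senders can be swapped without touching the escalation rules.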

A subtle point worth designing for: the handoff is also where shops learn what the agent should not be doing. Tag every artifact with what the agent said it would do ("told caller someone would call back within 30 minutes"). When the shop fails to honor those statements, you find out exactly which promises the agent should stop making.

Cost model

Pricing for the answering layer is genuinely hard to compare across vendors. Three shapes dominate:

  • Per minute, with a monthly minimum. Common for human call centers. Real cost moves with call length, which moves with how chatty your callers are.
  • Per call, with an included bucket and an overage rate. Common for AI services.
  • Per seat, for in-house staffing. Predictable if utilization is high, painful if it is not.

For reference, OnCrew runs Starter at $49 per month for 100 included calls, Pro at $149 per month for 400, Multi-Truck at $349 per month for 1,000, with overage at $0.99 per call. If you are evaluating vendors, compare the included bucket and the overage rate together. In a busy month, a low headline price with a small bucket and expensive overage can cost more than a higher headline price with a larger bucket or cheaper overage.
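
To compare bucket and overage together, it helps to compute the bill at a realistic call volume. A small Python sketch using the OnCrew figures quoted above; the function names are hypothetical, and amounts are kept in integer cents to avoid floating-point rounding.

```python
# plan name -> (monthly price in cents, included calls), from the figures above
PLANS = {
    "Starter": (4900, 100),
    "Pro": (14900, 400),
    "Multi-Truck": (34900, 1000),
}
OVERAGE_CENTS = 99  # $0.99 per call beyond the included bucket


def monthly_cost_cents(plan: str, calls: int) -> int:
    """Base price plus overage for every call past the included bucket."""
    base, included = PLANS[plan]
    return base + max(0, calls - included) * OVERAGE_CENTS


def cheapest_plan(calls: int) -> str:
    """Plan with the lowest total bill at this monthly call volume."""
    return min(PLANS, key=lambda plan: monthly_cost_cents(plan, calls))


# 250 calls in a month: Starter bills 150 overage calls, Pro stays inside its bucket.
starter_250 = monthly_cost_cents("Starter", 250) / 100  # 197.5
pro_250 = monthly_cost_cents("Pro", 250) / 100          # 149.0
```

At 250 calls the $49 plan costs $197.50 and the $149 plan costs $149.00. With these numbers, Starter stops being the cheapest option at 202 calls per month.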

Where OnCrew fits

If you are operating a small or mid-size shop and want a working example of this architecture rather than building it yourself, OnCrew implements the schema, state machine, escalation boundary, and handoff loop described above. The product page for the contractor phone answering service walks through how it answers in your brand, asks the trade-specific questions, classifies urgency, alerts the configured on-call workflow, and emits the transcript and structured summary the shop reads between stops. It does not commit a technician, an ETA, a price, or a calendar slot. That stays with the human at the shop, on purpose.

The reason I am posting the design and the price together is that the design is the part you should steal whether or not you ever pay us. The shops that win at intake over the next few years will be the ones whose phone answering layer makes specific, machine-readable promises about what it will and will not do. Build that, or pick a vendor that already has.


Built something similar, or have a war story about an answering layer overstepping its bounds? Drop it in the comments. I read every reply in this category and the failure modes are surprisingly consistent.
