I Cold-Called 50 Contractors Before I Wrote a Line of Code. The AI Was the Easy Part.

P_A — Wed, 03 Jun 2026 14:05:54 +0000

I built the first version in an afternoon.

Throw a job description at an LLM. Ask for a price. Render the number.

It was useless. And the reason it was useless is the whole point of this post.

The afternoon prototype that lied with confidence

The naive build demos beautifully. Ask it to price "repaint a 12x14 bedroom" and it hands back a clean, confident dollar figure every single time.

The problem is that the figure is fiction.

Too high, the contractor loses the job. Too low, they eat the loss out of their own pocket. And a model that hallucinates a number the contractor then says out loud to a paying customer is worse than no tool at all, because now it's their name on a guess, not mine.

So I closed the editor and started dialing.

50 phone calls beat 50 hours of market research

I cold-called about fifty handymen, carpenters, and remodelers and asked one question: what do you actually hate about running this business?

The answer was almost unanimous, and it had nothing to do with swinging a hammer. It was estimating. Drive across town, measure everything, build a detailed quote on your own time, then watch more than half of those quotes go nowhere because the customer was only ever shopping for a cheaper number.

One guy nailed it: at least 60% of his estimates died because the customer "always knows somebody being the cheapest."

That reframed the entire product. The job was never "generate a number." Any tool can generate a number. The job was generate a number a contractor will stake their reputation on. Call it the trust gap, and it's the only problem that actually mattered.

The engineering problem

I'll keep the secret sauce vague, that part is the business. But the shape of it:

Ground the model so it can't freewheel. Job details, photos, and external property data all go in as hard context it has to reason from.
Wrap the probabilistic layer in a deterministic one. The LLM is great at parsing messy human descriptions and genuinely bad at arithmetic and consistency, so it never gets the final word on the math.
Let it learn each contractor's real pricing over time, so the output converges on their numbers, not a national average.

The mental model that fixed everything: the LLM's job is to translate and interpret. The pricing decision lives somewhere it can't hallucinate. The day I stopped treating the model as the source of truth, the whole thing got reliable.

The stack

I kept it deliberately boring. Boring scales. Clever breaks at 2am.

Next.js (App Router), front and back in one place. Server Actions kept the AI and pricing logic server-side with no separate API service to babysit.
TypeScript, strict mode. AI features ship with enough nondeterminism already.
An LLM provider for the language layer (not saying which, or how it's prompted).
A third-party property-data API so the model isn't guessing square footage.
Stripe for collecting payment from the homeowner directly.
Managed Postgres and auth, because I'm one person and I'm not running my own infra.

The plot twist: I wasn't building an estimator

I thought estimating was the product. The contractors taught me it was the wedge.

That same "what would you delete forever?" question kept surfacing the same cluster: chasing late payments, scope-creep fights, scheduling collisions, invoices at 10pm, receipts lost before tax season. A tool that only estimated would have been easy to ignore.

So the build grew into everything that happens after an estimate gets accepted. Payment collection. A customer portal. Scheduling that respects the calendar. Auto-invoicing. Scope boundaries set up front so nobody fights about "I thought that was included" three weeks later.

The lesson I keep relearning: talk to users before you fall in love with your feature. The AI estimate was real. Shipping only it would have failed.

It's live at trylightwork.com if you want to see where it landed.

What was actually hard

Trust calibration. Getting someone to believe an AI number comes down to transparency as much as accuracy. Showing the inputs and the reasoning mattered as much as the figure.
Grounding against hallucination. The part I iterated on most and say the least about.
Payments. The one place where "move fast and break things" is just "lawsuit." I leaned entirely on Stripe primitives and refused to be clever.
Being solo. Nobody to catch my bad assumptions. The 50 phone calls were my code review for product decisions.

If you're building for an industry that usually gets garbage software (trades, home services, the unglamorous stuff), I think that's where the true opening is right now because the incumbents are slow and the users are starving for anything that respects their time.

So I'll throw it to you: if you've shipped an AI feature into a non-technical, high-stakes vertical, how did you actually get users to trust the output? because I'm still working on that bit.