DEV Community

Olivier EBRAHIM

Voice AI for construction: From site notes to digital devis in 30 seconds

The problem: Construction's data entry bottleneck

You're a site manager for a 12-person crew. It's 3 PM on a rainy Tuesday. The general contractor asks for a change order estimate: an additional 15 m² of interior partitioning.

What happens next in most French construction SMEs?

  1. You pull out your phone (wet hands, maybe gloves).
  2. You write notes, often illegible because of rain and cold hands.
  3. Back in the office, someone manually re-types these notes into Excel.
  4. Mistakes happen: "15 m²" becomes "50 m²", or a line gets forgotten.
  5. Email loops and corrections follow, and the final devis (the written quote sent to the client) goes out 48 hours later.

Cost per change order: €8-12 in administrative overhead alone. For a 50-person firm doing 100 devis/month, that's €800-1,200/month of pure waste.

The AI shortcut: Voice input, structured output

What if you could say into your phone (voice memo style):

"Devis pour Durand, partitioning intérieur, 15 square meters, standard plasterboard, labor 35 euros per square meter, materials from Batirama catalog."

And 30 seconds later have a structured, calculation-ready input that your software converts into:

  • Line items (labor + materials)
  • Quantities and unit costs
  • Factur-X compliant invoice format
  • Signature-ready PDF

This isn't science fiction. Using open-source speech-to-text (Whisper) plus lightweight entity extraction, you can build this workflow in under 2 weeks.

How it works: The pipeline

1. Audio capture

The user records a voice memo (10-60 seconds) in any common audio format (MP3, WAV, M4A).

2. Speech-to-text (Whisper)

  • Whisper large-v3 (open weights, free to self-host, can run on CPU): ~99.2% accuracy on French construction terminology if fine-tuned on 50 examples.
  • Alternative: a cloud service (Google Speech-to-Text, Azure Speech Services), which costs more (~€0.002 per request) but handles accents better.
  • Latency: 8-12 seconds for a 45-second audio clip on a standard server.
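
Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal sketch, assuming the open-source `openai-whisper` package and `ffmpeg` are installed; `check_memo` and `transcribe_memo` are hypothetical helper names, and the format list and 10-60 second bounds simply mirror the figures above.

```python
from pathlib import Path

# Formats and duration bounds taken from step 1 above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a"}
MIN_SECONDS, MAX_SECONDS = 10, 60


def check_memo(path: str, duration_s: float) -> list[str]:
    """Return a list of problems with a voice memo (empty list means OK)."""
    problems = []
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        problems.append(
            f"duration {duration_s:.0f}s outside {MIN_SECONDS}-{MAX_SECONDS}s"
        )
    return problems


def transcribe_memo(path: str, model_name: str = "large-v3") -> str:
    """Transcribe a French voice memo with a locally run Whisper model."""
    # Imported lazily so check_memo works even without the model installed.
    import whisper  # pip install openai-whisper (also needs ffmpeg on PATH)

    model = whisper.load_model(model_name)
    # Forcing language="fr" skips auto-detection; an assumption for French crews.
    result = model.transcribe(path, language="fr")
    return result["text"].strip()
```

Calling `check_memo` before `transcribe_memo` lets the app reject an unusable recording immediately, so the user can re-record while still on site.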

3. Named entity extraction

Parse the transcript to extract:

  • Client: "Durand", "Durand et Fils", "SARL Durand"
  • Task type: "partitioning", "flooring", "electrical", "plumbing" (taxonomy of 200 standard BTP terms)
  • Quantity & unit: "15 m2", "3 linear meters", "200 bricks"
  • Rate type: "labor per m2", "fixed price", "materials only"
  • Cost inputs: "35 euros", "from Batirama catalog" (fetch live pricing)
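
As a rough illustration of this step, the sketch below uses plain regular expressions as a stand-in for a trained entity extraction model. The keyword table, unit patterns, and the `DevisEntities` / `extract_entities` names are assumptions made for the example, covering only a tiny slice of the 200-term taxonomy.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, tiny slice of the task taxonomy (keyword -> canonical label).
TASK_KEYWORDS = {
    "partitioning": "partitioning",
    "cloison": "partitioning",
    "flooring": "flooring",
    "electrical": "electrical",
    "plumbing": "plumbing",
}

# Unit code -> regex fragment matching spoken or written forms.
UNIT_PATTERNS = {
    "m2": r"(?:m2|m²|square meters?|mètres? carrés?)",
    "lm": r"(?:linear meters?|mètres? linéaires?)",
}

NUMBER = r"(\d+(?:[.,]\d+)?)"


@dataclass
class DevisEntities:
    client: Optional[str]
    task: Optional[str]
    quantity: Optional[float]
    unit: Optional[str]
    unit_rate_eur: Optional[float]


def _to_float(text: str) -> float:
    # Accept both "12.5" and the French decimal comma "12,5".
    return float(text.replace(",", "."))


def extract_entities(transcript: str) -> DevisEntities:
    """Pull client, task, quantity, unit and unit rate out of a transcript."""
    lower = transcript.lower()

    # Client: whatever follows "devis pour" / "quote for" up to the next comma.
    client_match = re.search(
        r"(?:devis pour|quote for)\s+([^,.]+)", transcript, re.IGNORECASE
    )
    client = client_match.group(1).strip() if client_match else None

    # Task: first taxonomy keyword found in the transcript.
    task = next((label for kw, label in TASK_KEYWORDS.items() if kw in lower), None)

    # Quantity and unit: first number directly followed by a known unit.
    quantity, unit = None, None
    for code, pattern in UNIT_PATTERNS.items():
        match = re.search(NUMBER + r"\s*" + pattern, lower)
        if match:
            quantity, unit = _to_float(match.group(1)), code
            break

    # Unit rate: first number directly followed by "euro(s)" or "€".
    rate_match = re.search(NUMBER + r"\s*(?:euros?|€)", lower)
    unit_rate = _to_float(rate_match.group(1)) if rate_match else None

    return DevisEntities(client, task, quantity, unit, unit_rate)
```

Run on the sample memo from earlier in the article, this returns client "Durand", task "partitioning", quantity 15.0 in "m2", and a unit rate of 35.0. Rules like these break quickly on free-form speech, which is why a learned extractor or an LLM prompt is the more realistic choice beyond a prototype.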

4. Business rules engine

  • Validate: is 35 €/m² plausible for interior partitioning? (historical avg: 32-40 €/m²)
  • Convert to line items
  • Apply regional labor multipliers (Île-de-France +15%)
  • Fetch live material costs if mentioned
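
The first three rules can be sketched as follows. This is an illustrative sketch only: the rate range and the Île-de-France multiplier come from the bullets above, while the table layout and the names `is_plausible_rate`, `build_labor_item`, and `LineItem` are assumptions. Live catalog pricing is left out because it depends on the supplier's API.

```python
from dataclasses import dataclass

# Historical rate ranges in EUR per unit; only the article's example is filled in.
PLAUSIBLE_RATES_EUR = {"partitioning": (32.0, 40.0)}

# Regional labor multipliers; unknown regions fall back to 1.0.
REGIONAL_LABOR_MULTIPLIER = {"île-de-france": 1.15}


@dataclass
class LineItem:
    description: str
    quantity: float
    unit: str
    unit_price_eur: float

    @property
    def total_eur(self) -> float:
        return round(self.quantity * self.unit_price_eur, 2)


def is_plausible_rate(task: str, rate_eur: float) -> bool:
    """Check the spoken rate against the historical range for the task."""
    bounds = PLAUSIBLE_RATES_EUR.get(task)
    if bounds is None:
        return True  # no history for this task, so nothing to compare against
    low, high = bounds
    return low <= rate_eur <= high


def build_labor_item(
    task: str, quantity: float, unit: str, rate_eur: float, region: str
) -> LineItem:
    """Turn extracted entities into a labor line with the regional multiplier."""
    multiplier = REGIONAL_LABOR_MULTIPLIER.get(region.lower(), 1.0)
    return LineItem(f"Labor: {task}", quantity, unit, round(rate_eur * multiplier, 2))
```

Here the plausibility check runs on the rate as spoken, before the regional multiplier is applied. A rate outside the range is better flagged for human review than silently corrected.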

5. Document generation

  • Output: structured JSON + PDF devis + Factur-X XML
  • Auto-sign using company's PKI certificate
  • Push to accounting system (API integration)
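
The structured JSON part of this step might look like the sketch below. The field names (`client`, `issued_on`, `lines`, `total_excl_tax_eur`) and the `build_devis_json` function are hypothetical, not a Factur-X schema; PDF rendering, Factur-X XML, and signing are out of scope here.

```python
import json
from datetime import date


def build_devis_json(client: str, items: list[dict], issued_on: date) -> str:
    """Serialize line items into a JSON devis with per-line and overall totals."""
    lines = []
    for item in items:
        line_total = round(item["quantity"] * item["unit_price_eur"], 2)
        lines.append({**item, "total_eur": line_total})

    devis = {
        "client": client,
        "issued_on": issued_on.isoformat(),
        "lines": lines,
        # Total before tax; VAT handling is deliberately not covered here.
        "total_excl_tax_eur": round(sum(line["total_eur"] for line in lines), 2),
    }
    # ensure_ascii=False keeps accented French text readable in the output.
    return json.dumps(devis, ensure_ascii=False, indent=2)
```

A call such as `build_devis_json("Durand", [{"description": "Labor: partitioning", "quantity": 15, "unit": "m2", "unit_price_eur": 35.0}], date.today())` produces one line totalling 525.0 euros, which downstream PDF and XML generators can consume.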

Real numbers: Time savings

Task                          | Manual (hours) | With voice AI (minutes) | Savings
Capture site notes            | 0.25           | ~1 (voice memo)         | 91%
Transcription (manual or OCR) | 0.5            | 0.2 (Whisper)           | 60%
Data entry & validation       | 1.0            | 0.15 (auto-rules)       | 85%
Total per devis               | 1.75 hours     | 8 minutes               | 91% faster

For 100 devis/month (typical mid-size firm):

  • Manual: 175 hours/month = ~1 FTE dedicated
  • With voice AI: 13 hours/month = ~3 hours/week, handled by the PM during normal workflow
  • Annual savings: €28,000 (1 FTE salary) — ROI in month 2 if tool costs <€100/month
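
The monthly figures follow directly from the per-devis totals (1.75 hours manual, 8 minutes with voice AI). A short sketch of the arithmetic, with `monthly_hours` as a hypothetical helper name, makes it easy to plug in your own volumes:

```python
def monthly_hours(
    devis_per_month: int,
    manual_hours_per_devis: float,
    voice_minutes_per_devis: float,
) -> tuple[float, float, float]:
    """Return (manual hours, voice-AI hours, hours saved) per month."""
    manual = devis_per_month * manual_hours_per_devis
    voice = devis_per_month * voice_minutes_per_devis / 60
    return manual, voice, manual - voice


manual, voice, saved = monthly_hours(100, 1.75, 8)
print(f"manual: {manual:.0f} h, voice AI: {voice:.1f} h, saved: {saved:.0f} h")
# prints "manual: 175 h, voice AI: 13.3 h, saved: 162 h"
```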

Caveats & gotchas

1. Accent robustness: Whisper's training data is multilingual but heavily weighted toward English, and French regional accents (Marseille, Lyon, rural) can drop accuracy to 92-94%. Solution: fine-tune on company-specific voice samples (50-100 recordings).

2. Jargon drift: Construction terminology evolves. "Factur-X 2026" is new jargon (2025-2026). Your entity recognition model needs quarterly retraining on new terms. Budget: 2 hours/quarter.

3. Privacy & data residency: Recording audio on-site and sending it to a cloud service raises GDPR concerns. Solutions: run Whisper on-device or on on-premise servers, which is possible because the model weights are openly available. French firms often prefer local processing because it simplifies compliance under CNIL oversight. Plan €500-1,200 for infrastructure.

4. Interruptions & background noise: A noisy construction site (50+ dB ambient) degrades voice capture. Bluetooth headsets help, but users need training. Expect a 95% success rate (5% re-records due to noise).

Tools & platforms shipping this now (May 2026)

  • Anodos: Fully integrated voice devis capture (April 2025 launch). Runs in mobile app, Factur-X native, French regulatory compliant.
  • Gesy: Roadmap includes voice memo integration (Q3 2026 target, not yet live).
  • Keobat: Evaluating third-party partnerships (no native ETA).
  • Build-your-own: Replicate (Replicate.com, voice-to-structured APIs), Hugging Face (open models), or cloud (Azure AI Builder).

Getting started: DIY in a weekend

If you want to prototype:

  1. Use Replicate's hosted Whisper API (€0.01 per 60-second audio)
  2. Parse output with a simple prompt to Claude or GPT (€0.002 per request)
  3. Generate JSON devis template
  4. Push to your invoicing system via REST API

Total cost for 100 voice devis: ~€1.20 in API calls at the rates above (100 × (€0.01 + €0.002)), assuming each memo is 60 seconds or shorter. Labor: 4-6 hours of integrator time.
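
Steps 2 and 3 of the prototype can be sketched without committing to a provider. In the sketch below, `call_llm` is a hypothetical callable you supply (wrapping whichever Claude or GPT client you use), and the prompt wording and required keys are assumptions for illustration.

```python
import json
from typing import Callable

REQUIRED_KEYS = ("client", "task", "quantity", "unit", "unit_rate_eur")

PROMPT_TEMPLATE = (
    "Extract the fields client, task, quantity, unit and unit_rate_eur from "
    "this construction site voice memo transcript. Reply with a single JSON "
    "object and nothing else.\n\nTranscript: {transcript}"
)


def transcript_to_devis(transcript: str, call_llm: Callable[[str], str]) -> dict:
    """Ask an LLM for structured fields and validate the reply before use."""
    reply = call_llm(PROMPT_TEMPLATE.format(transcript=transcript))

    # json.loads raises json.JSONDecodeError (a ValueError) on malformed replies.
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object from the model")

    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"model reply is missing fields: {', '.join(missing)}")
    return data
```

Validating the reply before it reaches the invoicing system matters more than the prompt itself: a model can return well-formed text that still omits the quantity or the rate.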

Conclusion

Voice input for construction workflows isn't about automation theater — it's about reclaiming ~1.5 hours per devis that gets burned on data entry. In an industry where margins are 5-8%, removing administrative drag is a competitive moat.

By 2027, construction teams not using voice-assisted workflows could find themselves at a 10-15% cost disadvantage compared with those that do.


Olivier Ebrahim is the founder of Anodos, a voice-first construction management platform for French SMEs. This article was inspired by feedback from 50+ construction site managers on their daily bottlenecks.
