Voice AI for construction: From site notes to digital devis in 30 seconds
The problem: Construction's data entry bottleneck
You're a site manager for a 12-person crew. It's 3 PM on a rainy Tuesday. The general contractor asks for a change order estimate: an additional 15 m² of interior partitioning.
What happens next in most French construction SMEs?
- You pull out your phone (wet hands, maybe gloves).
- You write notes, often illegible because of rain and cold hands.
- Back in the office, someone manually re-types these notes into Excel.
- Mistakes happen: "15m2" becomes "50m2" or gets forgotten.
- Email loops, corrections, final devis sent 48 hours later.
Cost per change order: €8-12 in administrative overhead alone. For a 50-person firm doing 100 devis/month, that's €800-1,200/month of pure waste.
The AI shortcut: Voice input, structured output
What if you could say into your phone (voice memo style):
"Devis pour Durand, partitioning intérieur, 15 square meters, standard plasterboard, labor 35 euros per square meter, materials from Batirama catalog."
And 30 seconds later have a structured, calculation-ready input that your software converts into:
- Line items (labor + materials)
- Quantities and unit costs
- Factur-X compliant invoice format
- Signature-ready PDF
This isn't science fiction. Using open-source speech-to-text (Whisper) + lightweight entity extraction, you can build this workflow in under 2 weeks.
How it works: The pipeline
1. Audio capture
User records a voice memo (10-60 seconds). Format: any common audio format (MP3, WAV, M4A).
2. Speech-to-text (Whisper)
- Open-source: Whisper large-v3 (free to self-host, runs on CPU): ~99.2% accuracy on French construction terminology when fine-tuned on about 50 examples.
- Alternative: cloud (Google Speech-to-Text, Azure Speech Services) — higher cost (~€0.002 per request) but handles accents better.
- Latency: 8-12 seconds for a 45-second audio clip on a standard server.
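As a rough illustration of the self-hosted option, here is a minimal sketch using the open-source `openai-whisper` Python package. The jargon list and the function names are assumptions made for this example. Passing vocabulary through `initial_prompt` is a lightweight prompting trick, not the fine-tuning mentioned above.

```python
from pathlib import Path

# Hypothetical jargon list; in practice it would come from the firm's own catalog.
BTP_TERMS = ["devis", "cloison", "plaque de plâtre", "placo", "Factur-X"]


def build_initial_prompt(terms: list[str]) -> str:
    """Join domain terms into a short context string to bias decoding toward them."""
    return "Devis chantier BTP. Vocabulaire : " + ", ".join(terms) + "."


def transcribe_memo(audio_path: Path, model_name: str = "large-v3") -> str:
    """Transcribe a French voice memo with a locally loaded Whisper model."""
    # Lazy import: needs `pip install openai-whisper` and ffmpeg on the machine.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(
        str(audio_path),
        language="fr",   # skip automatic language detection
        fp16=False,      # CPU inference runs in fp32
        initial_prompt=build_initial_prompt(BTP_TERMS),
    )
    return result["text"].strip()
```

Calling `transcribe_memo(Path("memo.m4a"))` would return the transcript as a plain string, ready for the extraction step. The file name is a placeholder.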
3. Named entity extraction
Parse the transcript to extract:
- Client: "Durand", "Durand et Fils", "SARL Durand"
- Task type: "partitioning", "flooring", "electrical", "plumbing" (taxonomy of 200 standard BTP terms)
- Quantity & unit: "15 m2", "3 linear meters", "200 bricks"
- Rate type: "labor per m2", "fixed price", "materials only"
- Cost inputs: "35 euros", "from Batirama catalog" (fetch live pricing)
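A first version of this extraction step does not need a trained model. The sketch below uses plain regular expressions on the example memo from earlier in the article. The unit aliases, the four-term taxonomy, and the "Devis pour ..." and "from ... catalog" patterns are assumptions for illustration, and it naively treats the first euro amount as the labor rate.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Tiny illustrative subset of the task taxonomy (the full one has ~200 terms).
TASK_TYPES = ("partitioning", "flooring", "electrical", "plumbing")

# Spoken or written unit variants mapped to a canonical unit (assumed list).
UNIT_ALIASES = {
    "square meters": "m2", "square meter": "m2", "m2": "m2", "m²": "m2",
    "linear meters": "ml", "linear meter": "ml",
}

NUMBER = r"(\d+(?:[.,]\d+)?)"


@dataclass
class DevisEntities:
    client: Optional[str]
    task_type: Optional[str]
    quantity: Optional[float]
    unit: Optional[str]
    labor_rate_eur: Optional[float]
    catalog: Optional[str]


def to_float(raw: str) -> float:
    """Accept both '12.5' and the French decimal comma '12,5'."""
    return float(raw.replace(",", "."))


def extract_entities(transcript: str) -> DevisEntities:
    text = transcript.lower()
    # Longest aliases first so "square meters" wins over "square meter".
    units = "|".join(re.escape(u) for u in sorted(UNIT_ALIASES, key=len, reverse=True))

    client = re.search(r"devis pour ([^,]+)", transcript, flags=re.IGNORECASE)
    qty = re.search(NUMBER + r"\s*(" + units + r")", text)
    rate = re.search(NUMBER + r"\s*(?:euros?|€)", text)
    catalog = re.search(r"from (\w+) catalog", transcript, flags=re.IGNORECASE)

    return DevisEntities(
        client=client.group(1).strip() if client else None,
        task_type=next((t for t in TASK_TYPES if t in text), None),
        quantity=to_float(qty.group(1)) if qty else None,
        unit=UNIT_ALIASES[qty.group(2)] if qty else None,
        labor_rate_eur=to_float(rate.group(1)) if rate else None,
        catalog=catalog.group(1) if catalog else None,
    )


memo = ("Devis pour Durand, partitioning intérieur, 15 square meters, standard "
        "plasterboard, labor 35 euros per square meter, materials from Batirama catalog.")
entities = extract_entities(memo)
```

On this memo, `entities` holds client "Durand", task type "partitioning", quantity 15.0 in "m2", a labor rate of 35.0, and catalog "Batirama". Fields that cannot be found come back as `None`, so the rules engine can ask for a re-record instead of guessing.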
4. Business rules engine
- Validate: is 35 €/m² plausible for interior partitioning? (historical range: 32-40 €/m²)
- Convert to line items
- Apply regional labor multipliers (Île-de-France +15%)
- Fetch live material costs if mentioned
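The validation and multiplier rules above can be sketched in a few lines. The reference range and the Île-de-France multiplier come from the figures in this article. The table layout, the function names, and the choice to return warnings rather than reject the input are assumptions for this example. The plausibility check runs on the base rate, before the regional multiplier.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

# Assumed reference data, using the figures quoted in the article.
PLAUSIBLE_RATES = {"partitioning": (Decimal("32"), Decimal("40"))}  # EUR per m2
REGION_MULTIPLIERS = {"ile-de-france": Decimal("1.15")}             # +15%
CENT = Decimal("0.01")


@dataclass(frozen=True)
class LineItem:
    label: str
    quantity: Decimal
    unit: str
    unit_price_eur: Decimal

    @property
    def total_eur(self) -> Decimal:
        return (self.quantity * self.unit_price_eur).quantize(CENT, rounding=ROUND_HALF_UP)


def check_rate(task_type: str, rate_eur: Decimal) -> list[str]:
    """Return warnings instead of raising, so the site manager can confirm or correct."""
    bounds = PLAUSIBLE_RATES.get(task_type)
    if bounds is None:
        return [f"no reference range for task type '{task_type}'"]
    low, high = bounds
    if not low <= rate_eur <= high:
        return [f"rate {rate_eur} EUR/m2 is outside the usual {low}-{high} range for {task_type}"]
    return []


def labor_line(task_type: str, quantity: Decimal, unit: str,
               rate_eur: Decimal, region: Optional[str] = None) -> LineItem:
    """Build the labor line item, applying the regional multiplier if one is known."""
    multiplier = REGION_MULTIPLIERS.get(region, Decimal("1"))
    unit_price = (rate_eur * multiplier).quantize(CENT, rounding=ROUND_HALF_UP)
    return LineItem(f"Labor: {task_type}", quantity, unit, unit_price)


warnings = check_rate("partitioning", Decimal("35"))
item = labor_line("partitioning", Decimal("15"), "m2", Decimal("35"), region="ile-de-france")
```

For the Durand example, `warnings` is empty, the unit price becomes 40.25 €/m² after the +15% multiplier, and `item.total_eur` is 603.75. `Decimal` is used rather than `float` so that amounts on the devis do not pick up binary rounding errors.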
5. Document generation
- Output: structured JSON + PDF devis + Factur-X XML
- Auto-sign using company's PKI certificate
- Push to accounting system (API integration)
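For the structured JSON part of this step, a minimal sketch might look like the following. The field names and the issue date are placeholders rather than a Factur-X schema. VAT, PDF rendering, Factur-X XML, and signing are deliberately left out.

```python
import json
from datetime import date
from decimal import Decimal


def build_devis_payload(client: str, lines: list[dict], issued_on: date) -> dict:
    """Assemble a JSON-serializable devis; amounts stay strings to avoid float rounding."""
    items = []
    subtotal = Decimal("0")
    for line in lines:
        total = (Decimal(line["quantity"]) * Decimal(line["unit_price_eur"])).quantize(Decimal("0.01"))
        subtotal += total
        items.append({**line, "total_eur": str(total)})
    return {
        "client": client,
        "issued_on": issued_on.isoformat(),
        "currency": "EUR",
        "lines": items,
        "subtotal_excl_tax_eur": str(subtotal),
    }


payload = build_devis_payload(
    "Durand",
    [{"label": "Labor: partitioning", "quantity": "15", "unit": "m2", "unit_price_eur": "35.00"}],
    date(2026, 5, 12),  # placeholder date
)
print(json.dumps(payload, indent=2, ensure_ascii=False))
```

The same dictionary can then feed whichever PDF or Factur-X generator the firm uses, and be posted to the accounting system's API.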
Real numbers: Time savings
| Task | Manual (hours) | With voice AI (minutes) | Savings |
|---|---|---|---|
| Capture site notes | 0.25 | ~1 (voice memo) | 91% |
| Transcription (manual or OCR) | 0.5 | 0.2 (Whisper) | 60% |
| Data entry & validation | 1.0 | 0.15 (auto-rules) | 85% |
| Total per devis | 1.75 hours | 8 minutes | 91% faster |
For 100 devis/month (typical mid-size firm):
- Manual: 175 hours/month = ~1 FTE dedicated
- With voice AI: 13 hours/month = ~3 hours/week, handled by the PM during normal workflow
- Annual savings: €28,000 (1 FTE salary) — ROI in month 2 if tool costs <€100/month
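The monthly figures follow directly from the per-devis totals. A quick check, using only the numbers stated above (the function name is made up for this example):

```python
def monthly_hours(devis_per_month: int, minutes_per_devis: float) -> float:
    """Convert a per-devis duration in minutes into hours per month."""
    return devis_per_month * minutes_per_devis / 60


manual = monthly_hours(100, 1.75 * 60)  # 1.75 hours per devis, entered by hand
voice = monthly_hours(100, 8)           # 8 minutes per devis with voice AI
saved = manual - voice

print(f"manual: {manual:.0f} h/month, voice AI: {voice:.1f} h/month, saved: {saved:.1f} h/month")
# prints: manual: 175 h/month, voice AI: 13.3 h/month, saved: 161.7 h/month
```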
Caveats & gotchas
1. Accent robustness: Whisper is multilingual, but its training data is dominated by English. French regional accents (Marseille, Lyon, rural) can drop accuracy to 92-94%. Solution: fine-tune on company-specific voice samples (50-100 recordings).
2. Jargon drift: Construction terminology evolves. "Factur-X 2026" is new jargon (2025-2026). Your entity recognition model needs quarterly retraining on new terms. Budget: 2 hours/quarter.
3. Privacy & data residency: Recording audio on-site and sending it to a cloud service raises GDPR concerns. Solutions: run Whisper on-device or on on-premise servers. French firms often prefer local processing, which simplifies compliance under CNIL oversight. Plan €500-1,200 for infrastructure.
4. Interruptions & background noise: A noisy construction site (50+ dB ambient) degrades voice capture. Bluetooth headsets help, but users need training. Expect a 95% success rate (5% re-records due to noise).
Tools & platforms shipping this now (May 2026)
- Anodos: Fully integrated voice devis capture (April 2025 launch). Runs in mobile app, Factur-X native, French regulatory compliant.
- Gesy: Roadmap includes voice memo integration (Q3 2026 target, not yet live).
- Keobat: Evaluating third-party partnerships (no native ETA).
- Build-your-own: Replicate (Replicate.com, voice-to-structured APIs), Hugging Face (open models), or cloud (Azure AI Builder).
Getting started: DIY in a weekend
If you want to prototype:
- Use Replicate's hosted Whisper API (€0.01 per 60-second audio)
- Parse output with a simple prompt to Claude or GPT (€0.002 per request)
- Generate JSON devis template
- Push to your invoicing system via REST API
Total cost for 100 voice devis: ~€1.20 in API calls (transcription plus parsing, one memo of up to 60 seconds per devis). Labor: 4-6 hours of integrator time.
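The parsing step of this prototype can be sketched without committing to a vendor. In the example below, `call_llm` is a hypothetical callable that takes a prompt string and returns the model's text reply. Wire it to whichever Claude or GPT client you use. The prompt wording and the key names are assumptions, not a tested recipe.

```python
import json
from typing import Callable

REQUIRED_KEYS = {"client", "task_type", "quantity", "unit", "labor_rate_eur"}

PROMPT_TEMPLATE = (
    "Extract the devis fields from this site voice memo transcript. "
    "Reply with a single JSON object with exactly these keys: "
    "client, task_type, quantity, unit, labor_rate_eur. "
    "Use null when a field is missing.\n\n"
    "Transcript: {transcript}"
)


def extract_with_llm(transcript: str, call_llm: Callable[[str], str]) -> dict:
    """Send the prompt through `call_llm` and validate the JSON that comes back."""
    reply = call_llm(PROMPT_TEMPLATE.format(transcript=transcript))
    data = json.loads(reply)  # raises json.JSONDecodeError on non-JSON replies
    if not isinstance(data, dict):
        raise ValueError("LLM reply is not a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"LLM reply is missing keys: {sorted(missing)}")
    return data


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used here only to exercise the parser."""
    return ('{"client": "Durand", "task_type": "partitioning", '
            '"quantity": 15, "unit": "m2", "labor_rate_eur": 35}')


fields = extract_with_llm("Devis pour Durand, partitioning intérieur, 15 square meters", fake_llm)
```

Validating the reply before it reaches the invoicing system matters more than the prompt itself: a model that drops a key or wraps the JSON in prose fails loudly here instead of producing a silent wrong devis.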
Conclusion
Voice input for construction workflows isn't about automation theater — it's about reclaiming ~1.5 hours per devis that gets burned on data entry. In an industry where margins are 5-8%, removing administrative drag is a competitive moat.
By 2027, construction teams not using voice-assisted workflows will be at a 10-15% cost disadvantage vs. those who do.
Olivier Ebrahim is founder of Anodos, a voice-first construction management platform for French SMEs. This article was originally inspired by feedback from 50+ construction site managers on their daily bottlenecks.