Fabien Ledoux

Building Gaia: Our Journey Creating a Custom Voicebot for Customer Service

Hi everyone 👋, I’m Fabien, CEO of ContactMedia, a customer relationship center based in Avignon, France. While I’m not a developer myself, our team decided to take on a very technical challenge: building our own voicebot, which we called Gaia.

In this post, I’d like to share our journey — why we built it, what tech stack we used, the challenges we faced, and the lessons we learned along the way.

Why Build Instead of Buy?

When exploring voicebots, we quickly realized that off-the-shelf solutions were either:

Too rigid (hard to adapt to our clients’ workflows), or

Too expensive for continuous experimentation.

So we took the hard path: building our own. Our goal wasn’t to compete with big tech, but to create a tailored, flexible bot we could train and iterate on.

The Core Architecture

Like most voicebots, Gaia is powered by three main components:

Speech-to-Text (STT): Converts customer speech into text.

We experimented with Google Speech-to-Text and Whisper (by OpenAI).

Whisper provided excellent accuracy in noisy environments, but its latency was higher.

Natural Language Understanding (NLU): Extracts intent from text.

We started with Rasa NLU and also tested Dialogflow.

Rasa gave us more flexibility in training domain-specific intents.

Text-to-Speech (TTS): Turns responses back into voice.

We tested Amazon Polly and Microsoft Azure TTS.

Azure’s neural voices produced the most natural-sounding French.

We orchestrated everything via a Node.js backend, connected to SIP telephony APIs for real-time call handling.
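To make the flow concrete, here is a minimal sketch of one conversational turn in that Node.js orchestration layer. The function names and the keyword-based intent matcher are illustrative stand-ins, not our actual code or the vendor APIs; in production each stub would call out to the real STT, NLU, and TTS services.

```javascript
// STT stub: in production this streams audio to Whisper or Google
// Speech-to-Text; here the "chunk" simply carries its transcript.
async function speechToText(audioChunk) {
  return audioChunk.transcript;
}

// NLU stub: a keyword matcher standing in for Rasa/Dialogflow,
// returning an intent plus a confidence score.
function detectIntent(text) {
  if (/rendez-vous|appointment/i.test(text)) {
    return { intent: 'book_appointment', confidence: 0.92 };
  }
  return { intent: 'unknown', confidence: 0.2 };
}

// TTS stub: returns the SSML that would be sent to Polly or Azure TTS.
function textToSpeech(text) {
  return { ssml: `<speak>${text}</speak>` };
}

// One conversational turn: transcribe, classify, pick a reply.
// Low-confidence results fall back to a clarifying question.
async function handleTurn(audioChunk) {
  const text = await speechToText(audioChunk);
  const { intent, confidence } = detectIntent(text);
  const reply = (intent === 'book_appointment' && confidence > 0.7)
    ? 'Quel jour vous conviendrait ?'
    : "Désolé, je n'ai pas compris. Pouvez-vous reformuler ?";
  return textToSpeech(reply);
}
```

The confidence threshold in `handleTurn` is also where fallback behavior plugs in: anything the NLU is unsure about gets a clarifying question instead of a guessed answer.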

Handling Real-World Complexity

Some of the challenges we faced:

Accents and dialects: French has strong regional variations. Training custom language models with our call data significantly improved accuracy.

Interruptions and overlaps: Customers often talk over the bot. We had to implement “barge-in” handling so Gaia could stop talking when the caller resumed.

Fallback logic: No NLU is perfect. We designed clear fallbacks — confirming intent, asking clarifying questions, or escalating to a human agent.

Latency: Even a 500 ms delay feels unnatural in a phone conversation. Optimizing our pipeline (especially TTS latency) was critical.
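The barge-in logic above boils down to a small state machine: while the bot is playing a prompt, any detected caller speech cuts playback immediately. Here is a sketch of that idea; the `player` handle and its `play()`/`stop()` methods are hypothetical placeholders for whatever wraps the SIP media stream.

```javascript
// Sketch of barge-in handling: stop the bot's prompt as soon as the
// caller starts speaking, so Gaia never talks over the customer.
class BargeInController {
  constructor(player) {
    this.player = player;      // hypothetical playback handle
    this.botSpeaking = false;
  }

  // Start playing a TTS prompt to the caller.
  startPrompt(audio) {
    this.botSpeaking = true;
    this.player.play(audio);
  }

  // Called by voice activity detection (VAD) on inbound audio:
  // cut the prompt short and hand the floor back to the caller.
  onCallerSpeech() {
    if (this.botSpeaking) {
      this.player.stop();
      this.botSpeaking = false;
    }
  }
}
```

The guard on `botSpeaking` matters: VAD can fire repeatedly during one utterance, and only the first event should stop playback.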

Measuring Success

We didn’t want to judge Gaia only by “calls automated.” Instead, we tracked:

Task success rate: Did the bot achieve the goal (e.g., booking an appointment)?

Average handling time: Was it faster than a human?

Customer satisfaction surveys: Did callers feel the experience was acceptable?

So far, Gaia handles ~60% of appointment scheduling calls without human help.
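Aggregating those three metrics from per-call records is straightforward. A minimal sketch, assuming illustrative field names (`goalAchieved`, `durationSeconds`, `csat`) rather than our actual schema:

```javascript
// Roll up per-call records into the three metrics we track.
function summarize(calls) {
  const n = calls.length;
  const successes = calls.filter(c => c.goalAchieved).length;
  const totalSeconds = calls.reduce((sum, c) => sum + c.durationSeconds, 0);
  // Not every caller answers the satisfaction survey.
  const rated = calls.filter(c => typeof c.csat === 'number');
  return {
    taskSuccessRate: successes / n,
    avgHandlingTimeSeconds: totalSeconds / n,
    avgCsat: rated.length
      ? rated.reduce((sum, c) => sum + c.csat, 0) / rated.length
      : null,
  };
}
```

Keeping satisfaction as a separate, optional field avoids the common trap of counting non-respondents as neutral scores.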

Key Lessons

Start with narrow use cases. Appointment scheduling was the easiest entry point.

Design for escalation. Customers need to know they can reach a human.

Continuously retrain. New slang, new accents, new cases — the bot must evolve.

Latency is UX. Users forgive mistakes, but not long pauses.

What’s Next

We’re now exploring:

Adding sentiment analysis to detect frustration.

Better context handling across multi-turn conversations.

Using fine-tuned LLMs (like GPT-4o-mini) for more natural dialogues.

Final Thoughts

Building Gaia was a huge technical challenge for a mid-sized company like ours. But it showed us something important: you don’t need to be a Silicon Valley giant to experiment with conversational AI.

If you’re a dev curious about voicebots, my advice is simple: start small, iterate fast, and always design with the human in mind.
