This is a submission for the GitHub Finish-Up-A-Thon Challenge
What I Built
Ovela AI started as a side project driven by a question that kept pulling me back:
What would it take for a business to genuinely trust a voice AI system?
At first, I thought the answer was simple: make conversations sound natural.
The original prototype could answer calls, respond to questions, and carry a conversation reasonably well. From a technical perspective, it looked impressive.
But after speaking with accommodation providers and small business owners, I realized I was focused on the wrong problem.
Businesses don't trust a system because it sounds human.
They trust it because it behaves reliably.
Can it check availability correctly?
Can it update reservations safely?
Can it collect payments?
Can it transfer a call when confidence is low?
Can staff see exactly what happened afterward?
That realization changed the direction of the project completely.
Ovela AI evolved from a voice demo into an operational voice assistant designed to help businesses handle real customer interactions while keeping humans in control of important decisions.
Today, Ovela can:
- Handle inbound phone calls
- Check room availability
- Create reservations
- Process payments through Stripe
- Answer property and local information questions
- Transfer calls when needed
- Keep staff synchronized through a management dashboard
More importantly, every improvement is guided by a simple principle:
AI should support human operations, not blindly replace them.
Demo
Live Demo (Australia)
Phone: +61 3 4823 6219
Due to abuse protection and testing limits, availability may occasionally be restricted.
Try asking:
- "Do you have any rooms available this weekend?"
- "Can I make a reservation?"
- "What attractions are nearby?"
- "What's the weather like today?"
Website: https://ovela.dev
GitHub Repository: https://github.com/My-CMDhub/Ovela-AI
The Comeback Story
Like many side projects, Ovela reached a point where the prototype worked well enough to demonstrate the idea.
Then it sat untouched, not because the project failed, but because other priorities took over.
Months later, after more conversations with business owners and more exposure to real operational challenges, I came back to the project with a very different perspective.
The biggest lesson was surprisingly non-technical.
The challenge isn't making AI speak.
The challenge is making AI behave appropriately within human workflows.
A real receptionist doesn't simply answer questions.
They:
- Recognize interruptions
- Acknowledge requests before acting
- Handle uncertainty
- Understand when information is missing
- Escalate sensitive situations
- Maintain context across an entire conversation
Most voice demos don't fail because speech recognition is poor.
They fail because the operational behavior doesn't match what people expect from a trusted assistant.
That became the focus of the revival.
What Changed
Multi-Agent Architecture
The original system relied on a much simpler flow.
The new version uses a multi-agent architecture built around Google's Agent Development Kit (ADK), allowing different agents to handle reservations, business operations, and information requests independently.
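To make the idea of independent specialist agents concrete, here is a minimal sketch in plain Python. It is not the ADK API and not Ovela's actual code: the handler names, the keyword table, and the `route` function are all hypothetical, and placing payments under the operations agent is an assumption. A real coordinator would more likely use an LLM or a classifier than keyword matching.

```python
from typing import Callable, Dict

# Hypothetical specialist handlers; in a real system these would be agents.
def reservations_agent(text: str) -> str:
    return f"[reservations] handling: {text}"

def operations_agent(text: str) -> str:
    return f"[operations] handling: {text}"

def information_agent(text: str) -> str:
    return f"[information] handling: {text}"

# Assumed keyword table, for illustration only.
ROUTES: Dict[str, Callable[[str], str]] = {
    "book": reservations_agent,
    "reservation": reservations_agent,
    "available": reservations_agent,
    "payment": operations_agent,
    "transfer": operations_agent,
}

def route(text: str) -> str:
    """Send the utterance to the first matching specialist, else to information."""
    lowered = text.lower()
    for keyword, handler in ROUTES.items():
        if keyword in lowered:
            return handler(text)
    return information_agent(text)

print(route("Can I make a reservation?"))
print(route("What attractions are nearby?"))
```

The point of the split is that each handler can be changed or tested on its own, while the coordinator only decides who should answer.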
Lower Latency Conversations
Voice interactions are highly sensitive to delays.
Several architectural bottlenecks were removed to improve response times and reduce awkward pauses during calls.
Stronger Context Awareness
One of the most interesting challenges was interruption handling.
People interrupt constantly during real conversations.
The system now maintains awareness of what information has already been spoken, allowing it to continue naturally instead of restarting or losing context.
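One way to picture this kind of tracking is sketched below, assuming the reply is split into sentences and the audio layer reports when each sentence finishes playing. The `SpokenTracker` class and its method names are hypothetical and are not taken from the Ovela codebase.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SpokenTracker:
    """Tracks which sentences of a planned reply were actually played to the caller."""
    planned: List[str]
    spoken_count: int = 0

    def mark_spoken(self) -> None:
        # Called each time a sentence finishes playing.
        if self.spoken_count < len(self.planned):
            self.spoken_count += 1

    def already_said(self) -> List[str]:
        # Sentences the caller has heard in full.
        return self.planned[:self.spoken_count]

    def remaining(self) -> List[str]:
        # Sentences still unsaid if the caller interrupts now.
        return self.planned[self.spoken_count:]

tracker = SpokenTracker(["We have two rooms.", "One has a sea view.", "Shall I book it?"])
tracker.mark_spoken()  # the first sentence played, then the caller interrupted
print(tracker.already_said())
print(tracker.remaining())
```

After an interruption, the assistant can answer the caller and then pick up from `remaining()` rather than repeating what was already heard.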
Operational Reliability
Reservation workflows, payment handling, availability checks, and dashboard synchronization were rebuilt to behave more like real business processes rather than isolated AI actions.
Abuse Protection
Real phone systems attract misuse.
Rate limits, call protections, and operational safeguards were added to prevent abuse while keeping legitimate usage frictionless.
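The post doesn't describe how the limits are implemented, so the following is only a sketch of one common approach: a sliding-window limiter keyed by caller number. The class name, the limit of two calls, and the 60-second window are assumptions chosen for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

class CallRateLimiter:
    """Allows at most max_calls per caller within a sliding time window."""

    def __init__(self, max_calls: int, window_seconds: float) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, caller: str, now: Optional[float] = None) -> bool:
        # A caller-supplied timestamp makes the limiter easy to test.
        now = time.monotonic() if now is None else now
        calls = self._calls[caller]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True

limiter = CallRateLimiter(max_calls=2, window_seconds=60.0)
print(limiter.allow("caller-a", now=0.0))
print(limiter.allow("caller-a", now=1.0))
print(limiter.allow("caller-a", now=2.0))  # third call inside the window is refused
```

Because each caller has its own window, one abusive number is throttled without affecting anyone else.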
My Experience with GitHub Copilot
Returning to a codebase that has sat inactive for months is often harder than starting a brand new one. You inherit your own past decisions without fully remembering why you made them.
For the revival of Ovela AI, I didn't use GitHub Copilot as a simple autocomplete tool to write boilerplate code. Instead, I used it as a high-level engineering partner and data auditor to manage complex architectural shifts and harden my system's reliability.
Here are the two main ways Copilot helped me cross the finish line, both drawn from real interactions during development:
1. Translating Complex Systems into Architecture Diagrams
As Ovela AI transitioned to a multi-agent setup, mapping out component connections, telephony triggers, and dashboard synchronization endpoints became a major cognitive bottleneck. I asked Copilot, within my workspace, to act as a principal solutions architect. After I fed it my core file dependencies, it mapped out a clean system workflow in Mermaid.js syntax that went straight into the repository documentation.
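For readers unfamiliar with Mermaid, the sketch below shows how a list of component dependencies can be turned into flowchart text. The component names are hypothetical placeholders, not the actual Ovela architecture, and the `to_mermaid` helper is an illustration rather than anything Copilot produced.

```python
from typing import Iterable, Tuple

def to_mermaid(edges: Iterable[Tuple[str, str]]) -> str:
    """Render (source, target) pairs as a top-down Mermaid flowchart."""
    lines = ["flowchart TD"]
    for source, target in edges:
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)

# Hypothetical component names, for illustration only.
EDGES = [
    ("telephony", "coordinator"),
    ("coordinator", "reservations"),
    ("coordinator", "information"),
    ("reservations", "dashboard"),
]

print(to_mermaid(EDGES))
```

The printed text can be pasted into a Mermaid code block in a README, where it renders as a diagram.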
2. Eval Hardening (The Supreme Judge)
Building a reliable voice AI requires robust testing. I simulate conversations between two LLMs and dump the evaluation telemetry into local .json log files. However, default automated grading scripts are notoriously prone to false positives (e.g., grading a hallucinated response highly simply because it sounded polite).
I utilized Copilot as a Supreme AI Evaluation Auditor. I passed it raw JSON conversation objects, prompting it to critically audit the automated scores, spot misleading feedback, and generate an adjusted "Supreme Score" with a bulleted logical justification. This drastically reduced noise in my evaluation pipeline.
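As a rough sketch of the comparison step, assume each log record carries both the automated score and the audited score. The field names (`auto_score`, `audited_score`), the 0 to 10 scale, the gap threshold, and the sample records are all made up for illustration; the real log format isn't shown in this post.

```python
import json
from typing import Any, Dict, List

def flag_inflated_scores(raw_json: str, min_gap: float = 2.0) -> List[Dict[str, Any]]:
    """Return records whose automated score exceeds the audited score by min_gap or more."""
    records = json.loads(raw_json)
    return [r for r in records if r["auto_score"] - r["audited_score"] >= min_gap]

# Made-up sample records, for illustration only.
SAMPLE = json.dumps([
    {"id": "call-1", "auto_score": 9.0, "audited_score": 4.0},
    {"id": "call-2", "auto_score": 8.0, "audited_score": 7.5},
])

flagged = flag_inflated_scores(SAMPLE)
print([r["id"] for r in flagged])
```

Records flagged this way are the ones worth reading by hand, since a large gap suggests the automated grader rewarded tone over correctness.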
Operational Code Polish & Balancing
Beyond these two core pillars, Copilot served as an excellent "cleanup crew" throughout this journey, even after I hit rate limits. It helped track down legacy typing issues, review asynchronous edge cases, and generate clean inline documentation.
Ultimately, the biggest value Copilot provided wasn't writing lines of code faster. It was accelerating complex architectural decisions and data validation while I revived a stale codebase.
What I Learned
The most valuable lesson wasn't technical.
It was understanding the difference between a convincing demo and a useful product.
A demo succeeds when the AI says the right thing.
A business system succeeds when the right thing actually happens afterward.
That distinction changed how I think about voice AI.
Natural conversation matters.
Latency matters.
Speech quality matters.
But trust matters more.
Trust comes from reliability, transparency, and knowing when humans should remain part of the process.
I don't believe current voice AI systems perfectly replicate human interaction, and that's not really the goal.
What interests me is the space between humans and AI:
How can AI handle repetitive operational work while humans remain responsible for judgment, relationships, and important decisions?
Reviving Ovela helped me explore that question far more deeply than when I first started the project.
And honestly, that's what made finishing it worthwhile.


