DEV Community

Christopher Derrell


I built a Voice Agent that plans 5Ks & Marathons - Like Me.

Built with Google Gemini: Writing Challenge

This is a submission for the Built with Google Gemini: Writing Challenge


If you’re from the Caribbean and have ever asked for directions, you already know where the name of my app, BigTree, comes from. I'm 100% sure it's not just us, but our directions sound like "turn by the next big frangipani tree and come straight down the road." I've yet to meet anyone who says "turn at Latitude 51.5151° N, Longitude 0.2185° W," lol.

It’s:

“Go down the road, turn left at the big tree, pass the painted stone, and it’s right there.”

That’s how we think. That’s how we communicate. That’s how we move.

And as a runner and the developer behind GoodFinish (a race management platform built specifically for small, grassroots Race Directors with 50–200 runners, not corporate mega-events), I kept running into the same friction point for race organizers:

Mapping the route was the most painful part of the process.

Most tools force you into slow, tedious point-and-click plotting.
But that’s not how we describe routes.
And it’s definitely not how small-town RDs think.

So I built something different.

What I Built with Google Gemini

I built BigTree, a conversational, voice-activated route design add-on for GoodFinish that also works standalone. Instead of clicking 200 points on a map, you just talk to it. It uses Gemini 3.1 Pro and the Google Maps API on the backend, so distances should be fairly accurate. It isn't perfect yet, so try it out and check the distances it gives you.

*Try it out for yourself here: BigTree Routes*

You can say:

“Start at the local park in MY CITY. Map a 5K heading north toward the beach and loop back.”

BigTree listens.
It responds.
It suggests improvements.
It draws the polyline live on the map.
And it instantly generates a downloadable, industry-standard GPX file.

No GIS headaches.
No technical friction.
Just describe it the way you would describe it to a friend.
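To make the GPX step concrete, here is a minimal sketch of serializing an ordered list of points into a single-track GPX 1.1 document. The `RoutePoint` shape and the `toGpx` name are hypothetical (BigTree's internal types aren't published), and this shows deterministic serialization in code, not necessarily how BigTree produces its file.

```typescript
// Hypothetical point shape; BigTree's real internal type is not published.
interface RoutePoint {
  lat: number; // decimal degrees, WGS84
  lon: number; // decimal degrees, WGS84
}

// Escape characters that are not allowed raw inside XML text or attributes.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize the points as one <trk> with one <trkseg>, per the GPX 1.1 schema.
function toGpx(name: string, points: RoutePoint[]): string {
  const trkpts = points
    .map((p) => `      <trkpt lat="${p.lat.toFixed(6)}" lon="${p.lon.toFixed(6)}"/>`)
    .join("\n");
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<gpx version="1.1" creator="BigTree-sketch" xmlns="http://www.topografix.com/GPX/1/1">`,
    `  <trk>`,
    `    <name>${escapeXml(name)}</name>`,
    `    <trkseg>`,
    trkpts,
    `    </trkseg>`,
    `  </trk>`,
    `</gpx>`,
  ].join("\n");
}
```

Building the XML in code from validated coordinates means the file is well-formed by construction, whatever the model returns as prose.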


What Role Did Gemini Play?

Gemini isn’t a feature in BigTree.

It’s the brain.

Here’s how the architecture breaks down:

1️⃣ Real-Time Voice Interface

Gemini 2.5 Native Audio (Live API) powers the live conversation.

  • It listens to route descriptions in real-time.
  • It talks back (Zephyr voice) to confirm distances.
  • It suggests route alternatives.
  • It warns about disconnected roads.
  • It allows interruptions mid-conversation.

The latency is low enough that it feels natural. Not “AI-ish.” Just fluid.
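Confirming a distance out loud is only as good as the number behind it. A minimal sketch of checking a route's length independently of the model, assuming points arrive as `{ lat, lon }` objects: sum haversine great-circle distances between consecutive points and compare against the target. The function names and the 100 m tolerance are hypothetical.

```typescript
interface LatLon {
  lat: number;
  lon: number;
}

const EARTH_RADIUS_M = 6371000; // mean Earth radius in metres

const toRad = (deg: number): number => (deg * Math.PI) / 180;

// Great-circle distance between two points using the haversine formula, in metres.
function haversineMeters(a: LatLon, b: LatLon): number {
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Total polyline length: the sum of its consecutive segment lengths.
function routeLengthMeters(points: LatLon[]): number {
  let total = 0;
  for (let i = 1; i < points.length; i++) {
    total += haversineMeters(points[i - 1], points[i]);
  }
  return total;
}

// True if the route is within toleranceM of the target distance (e.g. 5000 for a 5K).
function matchesTarget(points: LatLon[], targetM: number, toleranceM = 100): boolean {
  return Math.abs(routeLengthMeters(points) - targetM) <= toleranceM;
}
```

Haversine treats the Earth as a sphere, which is plenty accurate at 5K-to-marathon scale.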


2️⃣ Spatial Reasoning Engine

Gemini 3.1 Pro handles the heavy geospatial thinking.

Using function calling:

  • The Live API passes structured intent.
  • 3.1 Pro translates natural language (including landmark-based Caribbean-style directions) into:

    • Exact latitude/longitude coordinate arrays
    • Smooth polylines
    • Raw GPX XML

Essentially, I turned an LLM into a geospatial engine.

That part was fascinating.
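Turning model output into coordinates safely needs a validation step between the function call result and the map. A minimal sketch, assuming a hypothetical `RouteResponse` shape (a `name` plus an array of `{ lat, lon }` points); the post doesn't publish BigTree's actual schema.

```typescript
// Hypothetical shapes for the structured route the model is asked to return.
interface RoutePoint {
  lat: number;
  lon: number;
}
interface RouteResponse {
  name: string;
  coordinates: RoutePoint[];
}

// A point is valid only if both values are finite numbers in legal WGS84 ranges.
function isRoutePoint(p: unknown): p is RoutePoint {
  if (typeof p !== "object" || p === null) return false;
  const { lat, lon } = p as Record<string, unknown>;
  return (
    typeof lat === "number" && Number.isFinite(lat) && Math.abs(lat) <= 90 &&
    typeof lon === "number" && Number.isFinite(lon) && Math.abs(lon) <= 180
  );
}

// Type guard for the whole response: a name plus at least two valid points.
function isRouteResponse(value: unknown): value is RouteResponse {
  if (typeof value !== "object" || value === null) return false;
  const { name, coordinates } = value as Record<string, unknown>;
  return (
    typeof name === "string" &&
    Array.isArray(coordinates) &&
    coordinates.length >= 2 && // a route needs at least two points
    coordinates.every(isRoutePoint)
  );
}
```

Rejecting bad output here, before drawing the polyline, keeps one malformed response from corrupting the map or the GPX file.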


3️⃣ Real-World Context & Search

Gemini 2.5 Flash + Maps Grounding

  • Finds real-world landmarks.
  • Handles requests like:

“Route us past a good coffee shop at mile 2.”
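Maps Grounding can find the coffee shop, but something still has to know where "mile 2" falls on the route. A minimal sketch, assuming `{ lat, lon }` points: walk the polyline, accumulating haversine distance, and interpolate inside the segment that contains the target. `pointAtDistance` is a hypothetical helper, and the linear lat/lon interpolation is an approximation that holds for short segments.

```typescript
interface LatLon {
  lat: number;
  lon: number;
}

const METERS_PER_MILE = 1609.344;
const EARTH_RADIUS_M = 6371000;
const toRad = (deg: number): number => (deg * Math.PI) / 180;

// Great-circle distance between two points (haversine formula), in metres.
function haversineMeters(a: LatLon, b: LatLon): number {
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// The point distanceM metres along the route, or null if the route is shorter.
function pointAtDistance(points: LatLon[], distanceM: number): LatLon | null {
  if (points.length === 0 || distanceM < 0) return null;
  if (distanceM === 0) return points[0];
  let travelled = 0;
  for (let i = 1; i < points.length; i++) {
    const a = points[i - 1];
    const b = points[i];
    const seg = haversineMeters(a, b);
    if (seg > 0 && travelled + seg >= distanceM) {
      const t = (distanceM - travelled) / seg; // fraction of this segment covered
      return { lat: a.lat + t * (b.lat - a.lat), lon: a.lon + t * (b.lon - a.lon) };
    }
    travelled += seg;
  }
  return null;
}
```

For the coffee-shop request, `pointAtDistance(route, 2 * METERS_PER_MILE)` gives the location to search around.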

Gemini 3 Flash Preview + Search Grounding

  • Pulls real-time data like:

    • Weather conditions for race day
    • Live environmental context

What I Learned

Technical Lessons

Streaming Audio with WebSockets
Integrating the Live API forced me deep into:

  • Web Audio API
  • PCM16 audio streaming
  • Script processor nodes
  • Raw audio chunk transmission over WebSockets

Real-time voice is not trivial. But once it works? Game-changing.
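The capture side of that pipeline can be sketched as two pure functions: a naive downsampler and a Float32-to-PCM16 converter that writes little-endian bytes with `DataView.setInt16`. This assumes the mic runs at 48 kHz and the target is 16 kHz mono; function names and rates are assumptions, not BigTree's code. In a `ScriptProcessorNode` callback, the input would come from `event.inputBuffer.getChannelData(0)`, and the resulting bytes would then be base64-encoded and sent over the WebSocket.

```typescript
// Naive downsampler: averages each block of input samples into one output sample.
// Block averaging is only a crude low-pass; a proper resampler would filter first.
function downsample(input: Float32Array, inRate: number, outRate: number): Float32Array {
  if (outRate >= inRate) return input;
  const ratio = inRate / outRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(input.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    out[i] = sum / (end - start);
  }
  return out;
}

// Convert Web Audio float samples (-1..1) to 16-bit signed PCM, little-endian bytes.
function floatTo16BitPCM(samples: Float32Array): Uint8Array {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid overflow
    // Negative full scale is -32768, positive full scale is 32767.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return new Uint8Array(buffer);
}
```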

Spatial Prompt Engineering
Getting an LLM to output:

  • Strict JSON
  • Clean coordinate arrays
  • Valid GPX XML
  • Smooth realistic route curves

…requires extremely disciplined prompting.

You can’t “kind of” structure it.
It has to be deterministic enough for production.
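One option that complements disciplined prompting is asking the API itself for JSON. This sketch builds a request body for the Gemini REST `generateContent` endpoint using `generationConfig.responseMimeType` and `responseSchema`. The schema (a `name` plus `lat`/`lon` points), the system instruction text, and the `buildRouteRequest` name are hypothetical, not BigTree's actual contract.

```typescript
// Hypothetical helper: a generateContent request body that constrains output to JSON.
function buildRouteRequest(description: string) {
  return {
    systemInstruction: {
      parts: [{ text: "You are a race route planner. Return only JSON matching the schema." }],
    },
    contents: [{ role: "user", parts: [{ text: description }] }],
    generationConfig: {
      responseMimeType: "application/json", // raw JSON instead of prose or markdown
      responseSchema: {
        type: "OBJECT",
        properties: {
          name: { type: "STRING" },
          coordinates: {
            type: "ARRAY",
            items: {
              type: "OBJECT",
              properties: { lat: { type: "NUMBER" }, lon: { type: "NUMBER" } },
              required: ["lat", "lon"],
            },
          },
        },
        required: ["name", "coordinates"],
      },
    },
  };
}
```

Even with a schema, the values still need validation (range checks, plausible spacing); the schema constrains shape, not geography.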


Unexpected Insight

Voice might be the ultimate UI for mapping.

I didn’t realize how much friction traditional map tools create until I removed the mouse.

When you can just speak and watch the route draw itself, it feels like magic.

And more importantly:

It lowers the barrier for small, community Race Directors who just want to host a great 5K — not learn GIS software.

That matters to me.

Because GoodFinish was never about enterprise race timing.

It’s about empowering the grassroots.


What Worked Well

  • Live API latency & voice quality — surprisingly natural.
  • Function calling reliability — context flowed cleanly from voice session to backend route generation.
  • The model clearly understood the difference between:

    • “Describe a route”
    • “Ask a general question”

That separation was impressive.


Where I Hit Friction

Audio Buffer Management
Capturing mic input → converting to exact PCM format → decoding returned audio streams
… was not plug-and-play.

An out-of-the-box abstraction or SDK utility for browser audio contexts would be incredible.
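The playback half can be sketched the same way: decode 16-bit little-endian PCM bytes into the Float32 samples a Web Audio `AudioBuffer` expects. This assumes the returned chunk has already been base64-decoded into bytes and is mono PCM16; `pcm16ToFloat32` is a hypothetical name.

```typescript
// Decode raw 16-bit little-endian PCM bytes into Float32 samples in -1..1.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  // Respect byteOffset in case the bytes are a view into a larger buffer.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const sampleCount = Math.floor(bytes.byteLength / 2); // ignore a trailing odd byte
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000; // true = little-endian
  }
  return out;
}
```

In the browser, the samples could then be copied into `audioContext.createBuffer(1, samples.length, 24000)` with `copyToChannel(samples, 0)` and played through an `AudioBufferSourceNode` (24 kHz is an assumed output rate here).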

Strict JSON Output
Occasionally, large GPX responses came back wrapped in markdown code fences instead of as raw JSON:

{ ... }

That breaks JSON.parse() instantly.

I had to add backend sanitization to keep the pipeline stable.
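A minimal sketch of that kind of sanitization, assuming a TypeScript backend (the post doesn't say what the backend runs on): if the text contains a fenced block, optionally tagged like `json`, parse only its body; otherwise parse the trimmed text as-is. `stripCodeFence` and `parseModelJson` are hypothetical names.

```typescript
// Matches the first fenced block (optionally tagged, e.g. json) and captures its body.
const FENCED_BLOCK = /```[\w-]*[ \t]*\r?\n([\s\S]*?)\r?\n?[ \t]*```/;

// Unwrap a markdown code fence if the model added one; otherwise return trimmed text.
function stripCodeFence(raw: string): string {
  const match = raw.match(FENCED_BLOCK);
  return (match?.[1] ?? raw).trim();
}

// Parse model output as JSON after sanitizing; JSON.parse still throws on invalid JSON.
function parseModelJson<T = unknown>(raw: string): T {
  return JSON.parse(stripCodeFence(raw)) as T;
}
```

Because it takes the first fenced block anywhere in the text, this also copes with a chatty preamble before the fence.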

The biggest one, of course, is that this is still an LLM 😆. It can hallucinate the route between different points on the map.

Production AI requires guardrails.
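One cheap guardrail against hallucinated segments, assuming routes arrive as `{ lat, lon }` points: flag any consecutive pair that is suspiciously far apart, since a long straight jump often means the model skipped over (or invented) road geometry. The 300 m default threshold and the function names are hypothetical; flagged segments could then be re-checked against a real routing service such as the Google Maps API the app already uses.

```typescript
interface LatLon {
  lat: number;
  lon: number;
}

const EARTH_RADIUS_M = 6371000;
const toRad = (deg: number): number => (deg * Math.PI) / 180;

// Great-circle distance between two points (haversine formula), in metres.
function haversineMeters(a: LatLon, b: LatLon): number {
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Returns each index i where the segment from point i-1 to point i exceeds maxGapM.
function findSuspiciousJumps(points: LatLon[], maxGapM = 300): number[] {
  const flagged: number[] = [];
  for (let i = 1; i < points.length; i++) {
    if (haversineMeters(points[i - 1], points[i]) > maxGapM) flagged.push(i);
  }
  return flagged;
}
```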


Why This One Matters to Me

I build for:

  • Small operators.
  • Grassroots events.
  • People who don’t have tech teams.
  • Communities where directions are still “turn by the big tree.”
  • Run clubs like my own, which I know for a FACT would benefit from this and love it.

BigTree feels like one of those moments where AI stops being hype and becomes utility. Finding the right blend of Gemini APIs for cost-effectiveness (!) matters for an almost-production-ready app like this one.

And we’re just getting started.

#gemini #google #webdev #running #buildinpublic
