DEV Community

Cover image for How I built an Offline-First Voice-Controlled Map Engine in JavaScript
Sanish Kumar
Sanish Kumar

Posted on

How I built an Offline-First Voice-Controlled Map Engine in JavaScript

Have you ever tried to drag a map on your phone while carrying groceries? Or tried to annotate a field survey map while wearing gloves? Traditional GIS UIs assume you always have two free hands and perfect focus. I wanted to change that.

Today I'm open-sourcing VoiceGIS — a robust, offline-capable JavaScript library that lets you control Leaflet and OpenLayers maps using natural voice commands.

Here's how I solved the hardest parts of building a production-grade voice mapping engine.


The Problem with "Just use Web Speech API"

If you've ever played with window.SpeechRecognition, you know it's a neat toy, but it has two massive flaws for production GIS apps:

  1. It requires an internet connection. If you are doing an environmental survey in the woods, or using a tablet on a remote construction site, it instantly breaks.
  2. It's deeply tied to Google/Apple servers.

To solve this, VoiceGIS ships with a hybrid engine architecture.

By default, it uses the browser's native Web Speech API. But the moment the user goes offline (or explicitly requests privacy), VoiceGIS seamlessly falls back to an on-device Whisper AI model using @huggingface/transformers.

The onnx-community/whisper-tiny.en model is downloaded directly into the browser's Cache API (~40MB). It processes your speech entirely locally using WebAssembly or WebGPU. No audio ever leaves the user's device.

The Middleware Pipeline: More Than Just Parsing

Voice commands are messy. You might say: "Zoom to Paris and show the satellite layer."

Most tutorials solve this with a massive switch statement. I wanted VoiceGIS to be extensible like an Express.js server.

So, I built a Koa-style middleware pipeline right into the execution loop. When you speak, the text is split into sequential chain links, parsed, and passed through your custom middlewares:

import { VoiceGIS, voiceFeedback } from 'voicegis';

const app = new VoiceGIS({ mapContainerId: 'map' });

// Middleware 1: Analytics logging
app.use(async (ctx, next) => {
  console.log(`User intent: ${ctx.result.intent}`);
  await next();
});

// Middleware 2: The map talks back! (Built-in TTS plugin)
app.use(voiceFeedback({ lang: 'en-US' }));
Enter fullscreen mode Exit fullscreen mode

Because it uses async/await middleware, you can even intercept commands to show confirmation modals before destructive actions, or block commands based on app state (e.g., Read-Only mode).

Handling Geospatial Context

Extracting an intent (like go_to) is easy. Extracting the payload (like "Paris") and turning it into coordinates is hard.

VoiceGIS uses an internal fuzzy geocoder that leverages the Nominatim API, but falls back to a local LRU cache and predefined aliases. It handles conversational cruft effortlessly:
"Can you please take me to the Eiffel Tower?" → Intent: GO_TO, Payload: [48.8584, 2.2945].

And because users inevitably make mistakes, every state-mutating command (like panning or zooming) automatically pushes a snapshot to the CommandHistory stack. If the map flies off to the wrong city, the user just says "undo" and the map snaps right back.

Try It Out

You can drop this into any React, Vue, or Vanilla JS app in about 3 lines of code.

npm install voicegis
Enter fullscreen mode Exit fullscreen mode

Check out the GitHub repo for complete example apps, including:

  • A Next.js Dashboard integration
  • An offline Electron Kiosk setup
  • A field survey app

I'd love to hear what you build with it! Are there any other intents or offline engines you'd like to see added?

Top comments (0)