Sanish Kumar

Posted on Jun 22

How I built an Offline-First Voice-Controlled Map Engine in JavaScript

#webdev #opensource #gis #javascript

Have you ever tried to drag a map on your phone while carrying groceries? Or tried to annotate a field survey map while wearing gloves? Traditional GIS UIs assume you always have two free hands and perfect focus. I wanted to change that.

Today I'm open-sourcing VoiceGIS — a robust, offline-capable JavaScript library that lets you control Leaflet and OpenLayers maps using natural voice commands.

Here's how I solved the hardest parts of building a production-grade voice mapping engine.

The Problem with "Just use Web Speech API"

If you've ever played with window.SpeechRecognition, you know it's a neat toy, but it has two massive flaws for production GIS apps:

It requires an internet connection. If you are doing an environmental survey in the woods, or using a tablet on a remote construction site, it instantly breaks.
It's deeply tied to Google/Apple servers, so the user's audio is sent off the device for recognition.

To solve this, VoiceGIS ships with a hybrid engine architecture.

By default, it uses the browser's native Web Speech API. But the moment the user goes offline (or explicitly requests privacy), VoiceGIS seamlessly falls back to an on-device Whisper AI model using @huggingface/transformers.
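Here is one way to express that fallback decision. It is a minimal sketch that assumes connectivity comes from navigator.onLine and the window's online/offline events. The names chooseEngine, detectEnvironment and watchConnectivity, the privacyMode flag, and the engine labels are all hypothetical, not VoiceGIS's actual API.

```javascript
// Sketch only: function names, the privacyMode flag and engine labels are hypothetical.

// Web Speech is preferred. On-device Whisper is used when the user is offline,
// has opted into privacy, or the browser has no SpeechRecognition at all.
function chooseEngine({ online, privacyMode, hasWebSpeech }) {
  if (privacyMode || !online || !hasWebSpeech) return 'whisper-local';
  return 'web-speech';
}

// Read capabilities from a window-like global object.
// Some browsers expose the API only under the webkit prefix.
function detectEnvironment(g = globalThis) {
  const nav = g.navigator;
  return {
    // onLine === false reliably means offline. true only means some network
    // interface is up, not that the internet is reachable.
    online: nav ? nav.onLine !== false : false,
    hasWebSpeech: Boolean(g.SpeechRecognition || g.webkitSpeechRecognition),
  };
}

// Re-run detection whenever the browser fires 'online' or 'offline'.
// Returns a function that removes both listeners.
function watchConnectivity(g, onChange) {
  if (typeof g.addEventListener !== 'function') return () => {};
  const handler = () => onChange(detectEnvironment(g));
  g.addEventListener('online', handler);
  g.addEventListener('offline', handler);
  return () => {
    g.removeEventListener('online', handler);
    g.removeEventListener('offline', handler);
  };
}
```

In a page, you would call chooseEngine({ ...detectEnvironment(window), privacyMode }) at startup and again from the watchConnectivity callback. navigator.onLine can report true on a network with no internet access. For that reason, a sturdier version would also switch engines when a Web Speech session fails with a 'network' error.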

The onnx-community/whisper-tiny.en model is downloaded directly into the browser's Cache API (~40MB). It processes your speech entirely locally using WebAssembly or WebGPU. No audio ever leaves the user's device.
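With @huggingface/transformers (Transformers.js v3), loading the model could look roughly like this. The library's pipeline function is passed in as an argument so the sketch can be tested without the ~40MB download. pickDevice and createLocalTranscriber are hypothetical helpers. Checking for navigator.gpu is only a coarse WebGPU probe; a stricter check would also confirm that navigator.gpu.requestAdapter() returns an adapter.

```javascript
// Sketch only: helper names are hypothetical, not part of VoiceGIS or Transformers.js.
// In a browser you would pass the real export:
//   import { pipeline } from '@huggingface/transformers';
//   const transcribe = await createLocalTranscriber(pipeline);

// Prefer WebGPU when the browser exposes it, otherwise fall back to WebAssembly.
function pickDevice(nav) {
  return nav && 'gpu' in nav ? 'webgpu' : 'wasm';
}

async function createLocalTranscriber(pipeline, nav = globalThis.navigator) {
  // By default, Transformers.js keeps downloaded model files in the browser's
  // Cache API, so later sessions can load the model without a network.
  const transcriber = await pipeline(
    'automatic-speech-recognition',
    'onnx-community/whisper-tiny.en',
    { device: pickDevice(nav) },
  );

  // Whisper expects mono 16 kHz audio as a Float32Array.
  return async function transcribe(samples16k) {
    const output = await transcriber(samples16k);
    return output.text.trim();
  };
}
```

Microphone audio usually arrives at 44.1 or 48 kHz. It must be resampled to 16 kHz before it is passed to transcribe.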

The Middleware Pipeline: More Than Just Parsing

Voice commands are messy. You might say: "Zoom to Paris and show the satellite layer."

Most tutorials solve this with a massive switch statement. I wanted VoiceGIS to be extensible like an Express.js server.

So, I built a Koa-style middleware pipeline right into the execution loop. When you speak, the text is split into sequential chain links, parsed, and passed through your custom middlewares:

import { VoiceGIS, voiceFeedback } from 'voicegis';

const app = new VoiceGIS({ mapContainerId: 'map' });

// Middleware 1: Analytics logging
app.use(async (ctx, next) => {
  console.log(`User intent: ${ctx.result.intent}`);
  await next();
});

// Middleware 2: The map talks back! (Built-in TTS plugin)
app.use(voiceFeedback({ lang: 'en-US' }));
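At its core, a Koa-style pipeline is a small recursive dispatcher. Each middleware receives the context plus a next function that invokes the one after it. The sketch below pairs a naive splitter for compound utterances with a compose function modeled on koa-compose. splitIntoChainLinks, handleUtterance and the parse callback are hypothetical, and VoiceGIS's own splitting and parsing may work differently.

```javascript
// Sketch only: names are hypothetical, not VoiceGIS internals.

// Naive splitter: treats "and", "then" and "and then" as separators.
// It would wrongly split place names such as "Trinidad and Tobago".
function splitIntoChainLinks(text) {
  return text
    .trim()
    .replace(/[.!?]+$/, '')
    .split(/\s*(?:,\s*)?\b(?:and then|then|and)\b\s*/i)
    .map((part) => part.trim())
    .filter(Boolean);
}

// Koa-style composition: each middleware gets (ctx, next). Awaiting next()
// runs everything downstream before control returns upstream.
function compose(middlewares) {
  return function run(ctx) {
    let lastIndex = -1;
    function dispatch(i) {
      if (i <= lastIndex) {
        return Promise.reject(new Error('next() called multiple times'));
      }
      lastIndex = i;
      const fn = middlewares[i];
      if (!fn) return Promise.resolve();
      try {
        return Promise.resolve(fn(ctx, () => dispatch(i + 1)));
      } catch (err) {
        return Promise.reject(err);
      }
    }
    return dispatch(0);
  };
}

// Run each chain link through the pipeline in order, with one context per link.
async function handleUtterance(text, middlewares, parse) {
  const run = compose(middlewares);
  for (const link of splitIntoChainLinks(text)) {
    await run({ text: link, result: parse(link) });
  }
}
```

With this shape, a logging middleware like the one above wraps every command. Code before await next() runs on the way in, and code after it runs once the rest of the chain has finished.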

Because it uses async/await middleware, you can even intercept commands to show confirmation modals before destructive actions, or block commands based on app state (e.g., Read-Only mode).
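Both patterns come down to whether a middleware calls next(). The sketch below assumes that, as in Koa, the map action runs only after the chain reaches its end, so returning early cancels the command. The intent names, MUTATING_INTENTS and the askUser callback are hypothetical.

```javascript
// Sketch only: intent names are hypothetical; VoiceGIS's real list may differ.
const MUTATING_INTENTS = new Set(['go_to', 'zoom_in', 'zoom_out', 'pan', 'clear_layers']);

// Block map-changing commands while the app is in read-only mode.
function readOnlyGuard(isReadOnly) {
  return async (ctx, next) => {
    if (isReadOnly() && MUTATING_INTENTS.has(ctx.result.intent)) {
      ctx.blocked = 'read-only';
      return; // not calling next() stops the rest of the chain
    }
    await next();
  };
}

// Ask for confirmation before destructive commands.
// askUser can be any async function resolving to true or false, e.g. a modal.
function confirmDestructive(askUser, destructive = new Set(['clear_layers'])) {
  return async (ctx, next) => {
    if (destructive.has(ctx.result.intent) && !(await askUser(ctx))) {
      ctx.blocked = 'cancelled';
      return;
    }
    await next();
  };
}
```

These would be registered with the app.use call shown above, e.g. app.use(readOnlyGuard(() => appState.readOnly)), where appState stands in for your own state store.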

Handling Geospatial Context

Extracting an intent (like go_to) is easy. Extracting the payload (like "Paris") and turning it into coordinates is hard.
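For a single intent, much of the payload work is peeling polite filler off both ends of the sentence. Here is a regex-based sketch that assumes a small, hypothetical list of trigger phrases. parseGoTo and the go_to label are illustrative only; a real parser would cover far more phrasings.

```javascript
// Sketch only: trigger phrases and names are hypothetical.

// Polite prefixes such as "can you please" or "could you".
const FILLERS = /^(?:(?:can|could|would) you\s+)?(?:please\s+)?/i;
// Phrases that introduce a destination; the capture group is the place.
const GO_TO_PATTERN = /^(?:take me to|go to|fly to|zoom to|navigate to|show me)\s+(.+)$/i;

function parseGoTo(utterance) {
  const cleaned = utterance
    .trim()
    .replace(/[?.!]+$/, '') // trailing punctuation
    .replace(FILLERS, '') // leading politeness
    .replace(/,?\s+please$/i, ''); // trailing "please"
  const match = cleaned.match(GO_TO_PATTERN);
  if (!match) return null;
  // Drop a leading article so "the Eiffel Tower" and "Eiffel Tower" share one alias.
  const place = match[1].replace(/^the\s+/i, '').trim();
  return { intent: 'go_to', payload: place };
}
```

At this point the payload is still just text. Resolving it to coordinates is the geocoder's job.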

VoiceGIS uses an internal fuzzy geocoder that leverages the Nominatim API, but falls back to a local LRU cache and predefined aliases. It handles conversational cruft effortlessly:
"Can you please take me to the Eiffel Tower?" → Intent: GO_TO, Payload: [48.8584, 2.2945].
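A sketch of that lookup chain, assuming the public Nominatim search endpoint (https://nominatim.openstreetmap.org/search with format=json), which returns lat and lon as strings. LRUCache, ALIASES and createGeocoder are hypothetical names. This version checks the cache before the network, which also helps stay within the public server's usage policy of at most one request per second. It then falls back to aliases when the request fails or finds nothing. The fetch function can be injected so the sketch can be tested offline.

```javascript
// Sketch only: LRUCache, ALIASES and createGeocoder are hypothetical names.

// Minimal LRU cache. A Map iterates in insertion order, so re-inserting a key
// on every read keeps the least recently used entry at the front.
class LRUCache {
  constructor(limit = 100) {
    this.limit = limit;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.limit) {
      // Evict the oldest (least recently used) entry.
      this.map.delete(this.map.keys().next().value);
    }
  }
}

// Hypothetical offline aliases, keyed by lowercased place name.
const ALIASES = new Map([['eiffel tower', [48.8584, 2.2945]]]);

function createGeocoder({ fetchFn = globalThis.fetch, cache = new LRUCache(100), aliases = ALIASES } = {}) {
  return async function geocode(place) {
    const key = place.trim().toLowerCase();
    const cached = cache.get(key);
    if (cached) return cached; // repeat lookups never hit the network

    try {
      const url = 'https://nominatim.openstreetmap.org/search?format=json&limit=1&q=' +
        encodeURIComponent(place.trim());
      const res = await fetchFn(url);
      if (res.ok) {
        const [hit] = await res.json(); // Nominatim returns an array of matches
        if (hit) {
          const coords = [Number(hit.lat), Number(hit.lon)];
          cache.set(key, coords);
          return coords;
        }
      }
    } catch {
      // Network error (for example, offline): fall through to the aliases.
    }
    return aliases.get(key) ?? null;
  };
}
```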

And because users inevitably make mistakes, every state-mutating command (like panning or zooming) automatically pushes a snapshot to the CommandHistory stack. If the map flies off to the wrong city, the user just says "undo" and the map snaps right back.
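Below is a minimal undo stack in that spirit. It assumes a Leaflet map, where getCenter, getZoom and setView are Map methods. With OpenLayers you would read and set the center and zoom through map.getView() instead. ViewHistory and its run/undo methods are hypothetical, not the CommandHistory API.

```javascript
// Sketch only: ViewHistory is a hypothetical name, not VoiceGIS's CommandHistory.
// `map` is assumed to be a Leaflet map (getCenter, getZoom, setView).
class ViewHistory {
  constructor(map, limit = 50) {
    this.map = map;
    this.limit = limit;
    this.stack = [];
  }

  // Record the current view as plain data.
  snapshot() {
    const center = this.map.getCenter(); // Leaflet returns a LatLng with lat and lng
    this.stack.push({ center: [center.lat, center.lng], zoom: this.map.getZoom() });
    if (this.stack.length > this.limit) this.stack.shift(); // drop the oldest entry
  }

  // Wrap a state-mutating command so the "before" view is saved first.
  run(command) {
    this.snapshot();
    command(this.map);
  }

  // Restore the most recent snapshot; returns false when there is nothing to undo.
  undo() {
    const previous = this.stack.pop();
    if (!previous) return false;
    this.map.setView(previous.center, previous.zoom);
    return true;
  }
}
```

The snapshot is taken before the command runs, so "undo" returns the user to where they were, not to where the map ended up. Capping the stack keeps memory bounded during long sessions.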

Try It Out

You can drop this into any React, Vue, or Vanilla JS app in about 3 lines of code.

npm install voicegis

Check out the GitHub repo for complete example apps, including:

A Next.js Dashboard integration
An offline Electron Kiosk setup
A field survey app

I'd love to hear what you build with it! Are there any other intents or offline engines you'd like to see added?

Top comments (3)

Frank • Jun 23

How did you handle caching and updating map tiles for offline use, especially in areas with limited storage capacity?

Sanish Kumar • Jun 25 • Edited

Sorry for the delayed response.
VoiceGIS actually doesn't handle map tiles at all—it just does the offline voice-to-intent stuff and leaves tile storage to the map engine. If storage is tight and you're using Leaflet, look into leaflet.offline. It lets you cache a strict bounding box so you only save the exact area you need.
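To size a download before caching it, you can count the raster tiles a bounding box needs. This sketch uses the standard Web Mercator XYZ tile formulas, the scheme OpenStreetMap-style tile servers use. countTiles and its helpers are hypothetical, not part of leaflet.offline. Tile counts roughly quadruple with each extra zoom level, so the maximum zoom usually dominates storage.

```javascript
// Sketch only: helper names are hypothetical.

// Standard XYZ tile column for a longitude at zoom z.
function lonToTileX(lon, z) {
  return Math.floor(((lon + 180) / 360) * 2 ** z);
}

// Standard XYZ tile row for a latitude at zoom z (Web Mercator).
function latToTileY(lat, z) {
  const rad = (lat * Math.PI) / 180;
  return Math.floor(((1 - Math.log(Math.tan(rad) + 1 / Math.cos(rad)) / Math.PI) / 2) * 2 ** z);
}

// Total tiles covering a bounding box for every zoom in [minZoom, maxZoom].
function countTiles({ south, west, north, east }, minZoom, maxZoom) {
  let total = 0;
  for (let z = minZoom; z <= maxZoom; z++) {
    const maxIndex = 2 ** z - 1;
    const clamp = (v) => Math.min(Math.max(v, 0), maxIndex);
    const xStart = clamp(lonToTileX(west, z));
    const xEnd = clamp(lonToTileX(east, z));
    const yStart = clamp(latToTileY(north, z)); // tile rows grow southward
    const yEnd = clamp(latToTileY(south, z));
    total += (xEnd - xStart + 1) * (yEnd - yStart + 1);
  }
  return total;
}
```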

Sanish Kumar • Jun 25

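Or if you go with OpenLayers, you can supply your own vector tiles for low storage. Vector tiles store geometry rather than pre-rendered images, and the client can keep rendering them beyond the highest zoom level you stored. As a result, they take up a fraction of the space of equivalent raster tiles.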