<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sanish Kumar</title>
    <description>The latest articles on DEV Community by Sanish Kumar (@sanish_kumar).</description>
    <link>https://dev.to/sanish_kumar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3994121%2Fdaf37d27-7930-4a3b-8505-c4c7daef0576.png</url>
      <title>DEV Community: Sanish Kumar</title>
      <link>https://dev.to/sanish_kumar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sanish_kumar"/>
    <language>en</language>
    <item>
      <title>How I built an Offline-First Voice-Controlled Map Engine in JavaScript</title>
      <dc:creator>Sanish Kumar</dc:creator>
      <pubDate>Mon, 22 Jun 2026 20:30:00 +0000</pubDate>
      <link>https://dev.to/sanish_kumar/how-i-built-an-offline-first-voice-controlled-map-engine-in-javascript-if8</link>
      <guid>https://dev.to/sanish_kumar/how-i-built-an-offline-first-voice-controlled-map-engine-in-javascript-if8</guid>
      <description>&lt;p&gt;&lt;em&gt;Have you ever tried to drag a map on your phone while carrying groceries? Or tried to annotate a field survey map while wearing gloves? Traditional GIS UIs assume you always have two free hands and perfect focus. I wanted to change that.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Today I'm open-sourcing &lt;a href="https://npmjs.com/package/voicegis" rel="noopener noreferrer"&gt;&lt;strong&gt;VoiceGIS&lt;/strong&gt;&lt;/a&gt;, an offline-capable JavaScript library that lets you control Leaflet and OpenLayers maps with natural voice commands.&lt;/p&gt;

&lt;p&gt;Here's how I solved the hardest parts of building a production-grade voice mapping engine.&lt;/p&gt;




&lt;h2&gt;The Problem with "Just use Web Speech API"&lt;/h2&gt;

&lt;p&gt;If you've ever played with &lt;code&gt;window.SpeechRecognition&lt;/code&gt; (or its prefixed form, &lt;code&gt;webkitSpeechRecognition&lt;/code&gt;), you know it's a neat toy, but it has two major flaws for production GIS apps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
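&lt;strong&gt;It usually requires an internet connection.&lt;/strong&gt; In browsers such as Chrome, recognition runs on a remote server, so if you are doing an environmental survey in the woods or using a tablet on a remote construction site, it simply stops working.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;It's deeply tied to Google/Apple servers.&lt;/strong&gt; Your audio is typically sent to the browser vendor's speech service, which is a privacy problem for sensitive field data.&lt;/li&gt;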
&lt;/ol&gt;

&lt;p&gt;To solve this, VoiceGIS ships with a &lt;strong&gt;hybrid engine architecture&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;By default, it uses the browser's native Web Speech API. But the moment the user goes offline (or explicitly requests privacy), VoiceGIS falls back to an &lt;strong&gt;on-device Whisper model&lt;/strong&gt; that runs through &lt;code&gt;@huggingface/transformers&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;onnx-community/whisper-tiny.en&lt;/code&gt; model (~40 MB) is downloaded and stored in the browser's Cache API. It then processes speech entirely locally using WebAssembly or WebGPU, so no audio ever leaves the user's device.&lt;/p&gt;
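
&lt;p&gt;To make the fallback rule concrete, here is a minimal sketch of how an engine picker could decide between the two back ends. The names &lt;code&gt;pickEngine&lt;/code&gt;, &lt;code&gt;detectEnvironment&lt;/code&gt;, &lt;code&gt;privacyMode&lt;/code&gt;, and &lt;code&gt;hasWebSpeech&lt;/code&gt; are made up for illustration and are not VoiceGIS's actual API; only &lt;code&gt;navigator.onLine&lt;/code&gt; and the two recognition constructors are real browser features.&lt;/p&gt;

```javascript
// Hypothetical engine picker: illustrative only, not the VoiceGIS API.
function pickEngine({ online, privacyMode, hasWebSpeech }) {
  // Privacy always wins: audio must stay on the device.
  if (privacyMode) return 'whisper';
  // Offline, or no native recognizer available: use the local model.
  if (!online || !hasWebSpeech) return 'whisper';
  return 'web-speech';
}

// Reads the current browser state. Guarded so it can also be loaded
// outside a browser, where it reports "offline, no native recognizer".
function detectEnvironment(privacyMode = false) {
  const inBrowser = typeof window !== 'undefined';
  const hasWebSpeech = inBrowser
    ? Boolean(window.SpeechRecognition || window.webkitSpeechRecognition)
    : false;
  const online = typeof navigator !== 'undefined'
    ? navigator.onLine === true
    : false;
  return { online, privacyMode, hasWebSpeech };
}
```

&lt;p&gt;In a real app you would probably re-run the check on the window's &lt;code&gt;online&lt;/code&gt; and &lt;code&gt;offline&lt;/code&gt; events, so the engine can switch mid-session.&lt;/p&gt;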

&lt;h2&gt;The Middleware Pipeline: More Than Just Parsing&lt;/h2&gt;

&lt;p&gt;Voice commands are messy. You might say: &lt;em&gt;"Zoom to Paris and show the satellite layer."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most tutorials solve this with a massive &lt;code&gt;switch&lt;/code&gt; statement. I wanted VoiceGIS to be extensible like an Express.js server. &lt;/p&gt;

&lt;p&gt;So I built a Koa-style middleware pipeline right into the execution loop. When you speak, the transcript is split into a chain of sequential commands, and each one is parsed and passed through your custom middleware:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;VoiceGIS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;voiceFeedback&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;voicegis&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;VoiceGIS&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;mapContainerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;map&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Middleware 1: Analytics logging&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`User intent: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Middleware 2: The map talks back! (Built-in TTS plugin)&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;voiceFeedback&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en-US&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because it uses &lt;code&gt;async/await&lt;/code&gt; middleware, you can even intercept commands to show confirmation modals before destructive actions, or block commands based on app state (e.g., Read-Only mode).&lt;/p&gt;
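
&lt;p&gt;Here is a rough sketch of what such guards could look like. To keep it self-contained it includes a tiny Koa-style &lt;code&gt;compose&lt;/code&gt; helper instead of importing VoiceGIS. The helpers &lt;code&gt;readOnlyGuard&lt;/code&gt; and &lt;code&gt;confirmBefore&lt;/code&gt;, the &lt;code&gt;blocked&lt;/code&gt; and &lt;code&gt;cancelled&lt;/code&gt; flags, and the &lt;code&gt;delete_layer&lt;/code&gt; intent are hypothetical; only &lt;code&gt;ctx.result.intent&lt;/code&gt; comes from the example above.&lt;/p&gt;

```javascript
// Minimal Koa-style composer: each middleware receives (ctx, next).
function compose(middlewares) {
  return function run(ctx) {
    function dispatch(i) {
      if (i === middlewares.length) return Promise.resolve();
      const fn = middlewares[i];
      return Promise.resolve(fn(ctx, () => dispatch(i + 1)));
    }
    return dispatch(0);
  };
}

// Hypothetical guard: blocks the listed intents while the app is read-only.
function readOnlyGuard(isReadOnly, blockedIntents) {
  return async (ctx, next) => {
    const allowed = !isReadOnly() || !blockedIntents.includes(ctx.result.intent);
    if (!allowed) {
      ctx.blocked = true;
      return; // not calling next() stops the chain here
    }
    await next();
  };
}

// Hypothetical confirmation step: confirm() could open a modal and
// resolve to true or false once the user answers.
function confirmBefore(intents, confirm) {
  return async (ctx, next) => {
    if (intents.includes(ctx.result.intent)) {
      const ok = await confirm(ctx);
      if (!ok) {
        ctx.cancelled = true;
        return;
      }
    }
    await next();
  };
}

// Example wiring: the last middleware stands in for the real executor.
const guardedPipeline = compose([
  readOnlyGuard(() => false, ['delete_layer']),
  confirmBefore(['delete_layer'], async () => true),
  async (ctx) => { ctx.executed = true; },
]);
```

&lt;p&gt;The key design point is that a middleware cancels a command simply by returning without calling &lt;code&gt;next()&lt;/code&gt;, so nothing downstream ever runs.&lt;/p&gt;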

&lt;h2&gt;Handling Geospatial Context&lt;/h2&gt;

&lt;p&gt;Extracting an intent (like &lt;code&gt;go_to&lt;/code&gt;) is easy. Extracting the &lt;em&gt;payload&lt;/em&gt; (like "Paris") and turning it into coordinates is hard.&lt;/p&gt;

&lt;p&gt;VoiceGIS uses an internal fuzzy geocoder that queries the Nominatim API and falls back to a local LRU cache and predefined aliases. It strips conversational filler before geocoding:&lt;br&gt;
&lt;em&gt;"Can you please take me to the Eiffel Tower?"&lt;/em&gt; → Intent: &lt;code&gt;GO_TO&lt;/code&gt;, Payload: &lt;code&gt;[48.8584, 2.2945]&lt;/code&gt;.&lt;/p&gt;
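
&lt;p&gt;As a simplified illustration of that flow, the sketch below strips filler words, tries a remote lookup, and falls back to a cache and an alias table when the lookup fails. Everything in it is an assumption made for the example: &lt;code&gt;LruCache&lt;/code&gt;, &lt;code&gt;extractPlace&lt;/code&gt;, &lt;code&gt;resolvePlace&lt;/code&gt;, and &lt;code&gt;ALIASES&lt;/code&gt; are hypothetical names, the filler pattern is deliberately naive, and &lt;code&gt;remoteLookup&lt;/code&gt; is a stand-in for a Nominatim request that you pass in yourself.&lt;/p&gt;

```javascript
// Tiny LRU cache built on Map, which preserves insertion order.
class LruCache {
  constructor(limit) {
    this.limit = limit;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.limit) {
      this.map.delete(this.map.keys().next().value); // evict the oldest entry
    }
  }
}

// Naive filler pattern for illustration; real speech needs a richer grammar.
const FILLER = /^(can you |could you |please |take me to |go to |the )+/i;

function extractPlace(utterance) {
  return utterance.trim().replace(/[?.!]+$/, '').replace(FILLER, '').toLowerCase();
}

// Hypothetical alias table, keyed by the normalized place name.
const ALIASES = new Map([['eiffel tower', [48.8584, 2.2945]]]);

// remoteLookup(place) is assumed to resolve to [lat, lon] or to throw.
async function resolvePlace(utterance, cache, remoteLookup) {
  const place = extractPlace(utterance);
  try {
    const coords = await remoteLookup(place);
    cache.set(place, coords);
    return coords;
  } catch (err) {
    // Lookup failed: fall back to cached results, then predefined aliases.
    return cache.get(place) || ALIASES.get(place) || null;
  }
}
```

&lt;p&gt;Passing the lookup in as a function keeps the fallback logic easy to test without a network connection.&lt;/p&gt;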

&lt;p&gt;And because users inevitably make mistakes, every state-mutating command (like panning or zooming) automatically pushes a snapshot to the &lt;code&gt;CommandHistory&lt;/code&gt; stack. If the map flies off to the wrong city, the user just says &lt;strong&gt;"undo"&lt;/strong&gt; and the map snaps right back.&lt;/p&gt;
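
&lt;p&gt;The snapshot idea can be sketched in a few lines. This is not the real &lt;code&gt;CommandHistory&lt;/code&gt; class, just a guess at its general shape: the &lt;code&gt;push&lt;/code&gt; and &lt;code&gt;undo&lt;/code&gt; methods, the &lt;code&gt;limit&lt;/code&gt; option, and the plain &lt;code&gt;view&lt;/code&gt; object standing in for a map are all assumptions.&lt;/p&gt;

```javascript
// Hypothetical undo stack; the real CommandHistory may differ.
class CommandHistory {
  constructor(limit = 50) {
    this.limit = limit;
    this.stack = [];
  }
  // Call with the current view before a state-changing command runs.
  push(snapshot) {
    this.stack.push({ center: [...snapshot.center], zoom: snapshot.zoom });
    if (this.stack.length > this.limit) this.stack.shift(); // drop the oldest
  }
  // Returns the view to restore, or null when there is nothing to undo.
  undo() {
    return this.stack.length > 0 ? this.stack.pop() : null;
  }
}

// A plain object stands in for the map's current view.
const history = new CommandHistory();
const view = { center: [48.8584, 2.2945], zoom: 12 };

history.push(view);                  // snapshot before the command runs
view.center = [40.7128, -74.006];    // the map flew to the wrong city
view.zoom = 5;

Object.assign(view, history.undo()); // "undo" restores the snapshot
```

&lt;p&gt;With Leaflet, the restore step would roughly correspond to calling &lt;code&gt;map.setView(center, zoom)&lt;/code&gt; with the popped snapshot.&lt;/p&gt;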
&lt;h2&gt;Try It Out&lt;/h2&gt;

&lt;p&gt;You can drop this into any React, Vue, or vanilla JS app in about three lines of code. Start by installing the package:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;voicegis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check out the GitHub repo for complete example apps, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Next.js Dashboard integration&lt;/li&gt;
&lt;li&gt;An offline Electron Kiosk setup&lt;/li&gt;
&lt;li&gt;A field survey app&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd love to hear what you build with it! Are there any other intents or offline engines you'd like to see added?&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>gis</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
