Quentin Merle

State-Aware Edge AI: Building a Weather-Synced Sentient Sprout

June Solstice Game Jam Submission

This is a submission for the June Solstice Game Jam

What I Built

Solstice Sprout is a cozy, sentient, real-time Tamagotchi-style browser game where your objective is to keep a little sprout alive until the Summer Solstice (June 21st).

While it looks like a Neobrutalist toy project, the core engineering goal was to solve a specific challenge: in-browser local AI state awareness. Most developers treat client-side LLMs as a novelty chatbot widget, disconnected from the application around them. Following up on the hybrid routing and local telemetry patterns explored in our Ping Prompt R&D experiments, I wanted to see if we could bridge this gap inside a real-time application: embed a local SLM (Llama-3.2-1B) directly in a web app's reactive state loop, so that the model is aware of the live DOM state, the weather at the player's location, and the procedural SVG and audio APIs.

Play local-first on GitHub Pages: https://quentinmerle.github.io/solstice-sprout/


Code

https://github.com/QuentinMerle/solstice-sprout


How I Built It

To keep the application fast, light, and private, I built it with vanilla JavaScript, styled it with custom CSS, and bundled it with Vite. Moving the AI model entirely to the Edge meant dealing with real browser constraints. Here is the breakdown of the technical obstacles and how I resolved them.

Obstacle 1: The UI Rendering Bottleneck

In game loops, logic updates are decoupled from rendering. The plant's internal state (water, happiness, and life) was calculated on a 1Hz ticker. While this was lightweight, it meant that when a user performed an action (like clicking the "Water" button), the visual state of the SVG plant did not update until the next second rolled over. The interaction felt sluggish.

The Solution: Rather than locking UI updates to the 1Hz ticker loop, I implemented a lightweight custom event bus inside the state manager. The moment an action updates the model, an 'update' event is fired to trigger immediate rendering.


// In src/state.js
update(newVals) {
  this.data = { ...this.data, ...newVals };
  this.calculateLife();
  this.save();
  this.updateUI();
  this.dispatch({ type: 'update' }); // Dispatch event instantly on action
}

// In src/main.js
state.onEvent((evt) => {
  if (evt.type === 'update') {
    plant.update(state);
    retention.update(state);
  }
});
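
The excerpt above calls dispatch() and onEvent() without showing the bus itself. As a minimal sketch of what such a bus could look like (the EventBus class name and the Set-based listener storage are illustrative assumptions, not taken from the repository):

```javascript
// Minimal synchronous event bus (illustrative; names are assumptions).
class EventBus {
  constructor() {
    this.listeners = new Set();
  }

  // Register a listener; returns a function that unregisters it.
  onEvent(listener) {
    this.listeners.add(listener);
    return () => this.listeners.delete(listener);
  }

  // Call every registered listener synchronously with the event object.
  // Iterating over a copy lets a listener unsubscribe during dispatch.
  dispatch(evt) {
    for (const listener of [...this.listeners]) {
      listener(evt);
    }
  }
}

// Usage: the listener runs immediately on dispatch, not on the next tick.
const bus = new EventBus();
const seen = [];
const unsubscribe = bus.onEvent((evt) => seen.push(evt.type));
bus.dispatch({ type: 'update' });
unsubscribe();
bus.dispatch({ type: 'update' }); // No listener left, so nothing is recorded.
```

Because dispatch is synchronous, the render call happens in the same task as the click handler, which is what removes the up-to-one-second delay.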

Obstacle 2: Asset Overhead & Bundle Bloat (Procedural Audio Synth)

I wanted the initial page load to weigh only a few kilobytes (excluding the optional local LLM weights), so shipping static MP3 files for the music clips was out of the question.

The Solution: I bypassed static files entirely by generating procedural audio on-the-fly. Using the Web Audio API, I instantiated a synthesizer that schedules notes dynamically using three distinct sound profiles (smooth triangle waves for a lullaby, square waves for an 8-bit chiptune, and pure sine waves for bell chimes).


// In src/chat.js
playMusic() {
  if (!this.audioCtx) {
    this.audioCtx = new (window.AudioContext || window.webkitAudioContext)();
  }

  const ctx = this.audioCtx;

  const melodies = [
    {
      type: 'triangle', // Flute-like
      notes: [
        { freq: 523.25, t: 0.00, dur: 0.24 }, // C5
        { freq: 659.25, t: 0.24, dur: 0.24 }, // E5
        { freq: 783.99, t: 0.48, dur: 0.24 }, // G5
        { freq: 1046.5, t: 1.20, dur: 0.60 }  // C6
      ]
    },
    {
      type: 'square', // Chiptune
      notes: [
        { freq: 523.25, t: 0.00, dur: 0.08 }, // C5
        { freq: 587.33, t: 0.08, dur: 0.08 }, // D5
        { freq: 659.25, t: 0.16, dur: 0.08 }, // E5
        { freq: 1046.5, t: 0.48, dur: 0.16 }  // C6
      ]
    },
    {
      type: 'sine', // Crystal Bells
      notes: [
        { freq: 880.00, t: 0.00, dur: 0.20 }, // A5
        { freq: 987.77, t: 0.20, dur: 0.20 }, // B5
        { freq: 1174.7, t: 0.40, dur: 0.20 }, // D6
        { freq: 1760.0, t: 0.80, dur: 0.40 }  // A6
      ]
    }
  ];

  const choice = melodies[Math.floor(Math.random() * melodies.length)];

  choice.notes.forEach(({ freq, t, dur }) => {
    const osc = ctx.createOscillator();
    const gain = ctx.createGain();
    osc.connect(gain);
    gain.connect(ctx.destination);
    osc.type = choice.type;
    osc.frequency.setValueAtTime(freq, ctx.currentTime + t);
    gain.gain.setValueAtTime(0, ctx.currentTime + t);

    // Square waves are loud, so we lower the peak gain for comfort
    const peakGain = choice.type === 'square' ? 0.05 : 0.22;
    gain.gain.linearRampToValueAtTime(peakGain, ctx.currentTime + t + 0.02);
    gain.gain.exponentialRampToValueAtTime(0.001, ctx.currentTime + t + dur - 0.02);
    osc.start(ctx.currentTime + t);
    osc.stop(ctx.currentTime + t + dur);
  });
}
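
The hard-coded frequencies in the melodies are standard equal-temperament pitches. If you want to write melodies as note names instead of raw Hz values, a small helper can derive them. This is a sketch, assuming A4 = 440 Hz tuning; midiToFreq and the NOTES table are hypothetical helpers, not part of the project:

```javascript
// Hypothetical helper: convert a MIDI note number to a frequency in Hz.
// Equal temperament: each semitone multiplies the frequency by 2^(1/12),
// anchored at A4 (MIDI note 69) = 440 Hz.
function midiToFreq(midi, a4 = 440) {
  return a4 * Math.pow(2, (midi - 69) / 12);
}

// MIDI numbers for the note names used in the three melodies above.
const NOTES = { C5: 72, D5: 74, E5: 76, G5: 79, A5: 81, B5: 83, C6: 84, D6: 86, A6: 93 };

// Build the triangle-wave arpeggio from note names instead of raw numbers.
const arpeggio = ['C5', 'E5', 'G5', 'C6'].map((name) => ({
  name,
  freq: midiToFreq(NOTES[name]),
}));
```

The values it produces match the table in playMusic() to within rounding, so the two approaches are interchangeable.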

Obstacle 3: The WebGPU-less Fallback (Mock Mode)

Not every device supports WebGPU, and not every player has the bandwidth to pull ~800MB of model weights on a train ride. To keep the experience cohesive, the game falls back to a Mock Mode. But how do we keep the plant "sentient" without a neural network?

The Solution: I built a deterministic, stat-aware pattern-matching engine (the excerpt below uses plain keyword checks). When the user asks the plant about its health, happiness, or why its petals are missing, the mock engine reads the live stats and builds a contextually accurate response.


// In src/chat.js (Mock Mode fallback)
const cleanMsg = userMessage.toLowerCase();
const asksAboutPetals = cleanMsg.includes("petal") || cleanMsg.includes("pétale") || cleanMsg.includes("flower") || cleanMsg.includes("fleur");

if (asksAboutPetals) {
  if (happiness < 20) {
    reply = `I don't have any petals because my happiness is only ${happiness.toFixed(0)}%! I need at least 20% happiness for my first petal to bloom. Try playing some music! 🎵🌸`;
  } else {
    const count = Math.min(5, Math.floor(happiness / 20));
    const noun = count === 1 ? 'petal' : 'petals'; // Avoid "1 petals"
    reply = `I've got ${count} ${noun} right now because my happiness is at ${happiness.toFixed(0)}%. Make me happier to see more bloom! 🌸✨`;
  }
}
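
Deciding when to enter Mock Mode can be sketched as a small capability check. navigator.gpu and requestAdapter() are the standard WebGPU entry points; the detectChatMode function and its 'local' / 'mock' return values are assumptions made for this example, not the project's actual code:

```javascript
// Sketch: choose between the local model and Mock Mode.
// The navigator-like object is passed in so the function is easy to test.
async function detectChatMode(nav = globalThis.navigator) {
  // No WebGPU entry point at all: older browser or unsupported platform.
  if (!nav || !nav.gpu) return 'mock';
  try {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'local' : 'mock';
  } catch {
    // Treat any failure during detection as "no WebGPU".
    return 'mock';
  }
}
```

A check like this only covers hardware support. Bandwidth is a separate decision, so it is probably worth keeping the weight download opt-in even when the result is 'local'.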

Nuances & Trade-offs: The Case for Hybridization

Let’s be honest: running a 1.2B-parameter LLM directly in the client browser is not a silver bullet. Downloading ~800MB of quantized weights on the first page load is a massive onboarding barrier. In a real-world product, that kind of wait tends to translate into a high bounce rate.

The solution isn't to force the download, but to design a hybrid architecture (similar to what we tested on other edge AI projects):

1. Onboarding with Cloud SLMs: On the first visit (or on devices without WebGPU support), route the chat prompts to a cheap hosted API such as OpenRouter, running the same model (Llama-3.2-1B) or a comparable one (Gemma-2-2b). OpenRouter serves these models for fractions of a cent (e.g., $0.07 per million tokens).
2. Background Caching: While the user is interacting with the game via the cloud fallback, spin up a background worker to progressively download and cache the model weights locally.
3. The Hot-Swap: Once the cache is ready, seamlessly hot-swap the model runner from the cloud API to WebLLM running locally on the user's GPU.
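
The three steps above can be sketched as a small router that starts on the cloud backend and swaps once the local one is ready. This is a sketch under stated assumptions: the HybridChat class, the complete(prompt) backend interface, and the stub backends are all hypothetical. In a real build the cloud backend would wrap an HTTP call and the loader would wrap the WebLLM initialization.

```javascript
// Sketch of a chat router that hot-swaps from cloud to local inference.
// A backend is any object with an async complete(prompt) method; that
// interface is an assumption made for this example.
class HybridChat {
  constructor(cloudBackend) {
    this.backend = cloudBackend;
    this.mode = 'cloud';
  }

  // Load the local backend in the background; swap only if loading succeeds.
  async warmUpLocal(loadLocalBackend) {
    try {
      const local = await loadLocalBackend(); // e.g. download and cache weights
      this.backend = local;
      this.mode = 'local';
    } catch (err) {
      // Download or GPU initialization failed: stay on the cloud backend.
      this.mode = 'cloud';
    }
    return this.mode;
  }

  // Each request goes to whichever backend is current at call time.
  send(prompt) {
    return this.backend.complete(prompt);
  }
}

// Stub backends so the sketch runs without a network or a GPU.
const cloudStub = { complete: async (prompt) => `cloud:${prompt}` };
const localStub = { complete: async (prompt) => `local:${prompt}` };
```

Calling warmUpLocal() without awaiting it lets the game keep answering through the cloud while the weights download. Because send() reads this.backend on every call, the swap needs no coordination with in-flight requests.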

The Cost-to-Performance Reality

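At a small scale, the cost of querying a cloud model through OpenRouter is negligible. However, if you scale to 50,000 active users chatting with their plant 50 times a day, that is 2.5 million messages per day. Assuming roughly 100 tokens per exchange (prompt plus reply), you are looking at around 250 million tokens per day, or about 7.5 billion per month. At $0.07/M tokens, that’s roughly $525 on every monthly bill, and it grows linearly with usage.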

By hot-swapping to local WebLLM for returning users with compatible hardware (an estimated 30% of users, given current WebGPU support), we cut cloud usage by roughly the same share. For that 30%, the marginal inference cost on our side drops to $0.00, because the GPU compute and VRAM are supplied by the client's device. The remaining users keep running on the cloud fallback.
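
For a back-of-the-envelope version of this estimate, here is a small calculator. Treat it as a sketch: the function name is hypothetical, and the 100 tokens per message is an assumed figure, since the real number depends on prompt and reply length. The user count, message rate, price, and 30% local share are the figures quoted in this section.

```javascript
// Rough monthly cloud inference cost (all inputs are estimates).
function estimateMonthlyCloudCost({
  users,
  messagesPerDay,
  tokensPerMessage,
  pricePerMillionTokens,
  localShare = 0, // Fraction of traffic served by on-device inference.
  daysPerMonth = 30,
}) {
  const tokensPerMonth = users * messagesPerDay * tokensPerMessage * daysPerMonth;
  const cloudTokens = tokensPerMonth * (1 - localShare);
  const cost = (cloudTokens / 1e6) * pricePerMillionTokens;
  // Round to cents to hide floating-point noise.
  return { tokensPerMonth, cloudTokens, cost: Math.round(cost * 100) / 100 };
}

const base = {
  users: 50000,
  messagesPerDay: 50,
  tokensPerMessage: 100, // Assumption: prompt plus reply.
  pricePerMillionTokens: 0.07,
};

const allCloud = estimateMonthlyCloudCost(base);
const hybrid = estimateMonthlyCloudCost({ ...base, localShare: 0.3 });
console.log(allCloud.cost, hybrid.cost);
```

Under these assumptions the all-cloud bill is about $525 per month and the hybrid bill about $367.50. The absolute numbers are small, so the saving only becomes significant with a pricier model, longer prompts, or a larger user base.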

The trade-off shifts from server bills to client battery drain. For a Tamagotchi-style companion, offloading computation is the ultimate privacy and cost-saving win, but progressive hybridization is the only way to make it production-ready.

What do you think? Are you using hybrid local/cloud architectures for SLMs, or are you waiting for standard browser-native APIs (like window.ai) to mature?


Prize Category

Best Google AI Usage

For this category, the entire game—from initial mechanics design, procedural SVG structures, and Web Audio API synthesizers, to CSS layout, responsive breakpoints, and local WebLLM prompt architecture—was built in a collaborative pair-programming workflow with Google's Gemini models. The AI acted as a lead game developer, ensuring clean code separation, performance optimization, and styling details.


Proudly developed in Beauce, Québec 🇨🇦
