Jo Franchetti

Using AI to prototype games in the browser

There's a lot of discourse around the use of AI in games (and elsewhere!) at the moment. It's understandable: games are more expensive to make than ever, and teams are stretched thin trying to deliver bigger worlds, richer stories, and more reactive systems. Generative AI has landed as a possible solution, greeted with a mix of hype, suspicion, and legitimate curiosity. Many developers are still trying to make sense of where it actually fits into their existing workflows.

I've been experimenting with how large language models can help prototype narrative games in the browser, to see what happens when you mix modern LLMs with the kind of structure that is already used for procedural content. Expanding the toolbox, as it were. I'm not trying to replace writers or designers; instead, I hope to augment their work. As a developer I honestly need the help writing, and as a player I want to explore more worlds!

What's in this post?

This post walks through building a simple prototype. We'll look at past uses of AI in games, cover running models locally, and build a templated system for creating worlds and stories.

An example of some of the generated world data

AI in games

AI in games is not new. Game developers have been using some form of AI almost from day one. The term “AI” itself is a bit of a catch-all, covering everything from simple decision trees to statistical models with billions of parameters. Games like Pong and Space Invaders had basic rule-based systems to control 'enemy' movement, while Rogue and Elite introduced procedural generation to create varied experiences within limited memory.
Screenshots of Rogue and Elite

As games grew more complex, so did the AI techniques used. Classic game AI techniques include:

  • Path-finding - moving a character or NPC from point A to point B while avoiding obstacles. This often involves some variant of the A* search algorithm, or the use of navigation meshes in 3D spaces to guide NPCs around levels.
  • Procedural content generation - creating levels, items, or even entire worlds algorithmically, from Rogue's procedural dungeons to Minecraft's infinite voxel worlds and the vast cosmos of No Man's Sky.
  • Gameplay AI - gives NPCs a semblance of intelligence, using finite state machines, behaviour trees, or utility systems (a minimal sketch follows below).
  • More bespoke approaches, like Left 4 Dead's Director system, which adjusts pacing and difficulty dynamically based on the player's performance.

Screenshots of Minecraft and No Man's Sky
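
To make the gameplay AI bullet concrete, here's a minimal sketch of a finite state machine for a patrolling enemy. The state names and distance thresholds are invented for illustration; real games layer far more logic on top.

// A tiny finite state machine for an NPC. States and thresholds are
// illustrative only.
type NpcState = "patrol" | "chase" | "attack";

interface Npc {
  state: NpcState;
  distanceToPlayer: number;
}

function updateNpcState(npc: Npc): NpcState {
  switch (npc.state) {
    case "patrol":
      // Spot the player when they wander close
      return npc.distanceToPlayer < 10 ? "chase" : "patrol";
    case "chase":
      // Close enough to strike, or lost them entirely?
      if (npc.distanceToPlayer < 2) return "attack";
      return npc.distanceToPlayer > 15 ? "patrol" : "chase";
    case "attack":
      return npc.distanceToPlayer < 2 ? "attack" : "chase";
  }
}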

Generative AI then introduces a new layer to this landscape. LLMs can be used to generate NPC dialogue, quests, or even entire story arcs on the fly. This is a step beyond traditional procedural generation, which relies heavily on predefined rules and templates.

Broadly speaking, AI in games falls into two camps:

  1. AI that runs during gameplay, handling moment-to-moment decision making.
  2. AI that assists development, generating assets, levels, or tools behind the scenes.

The second category is less visible to players, but it's been quietly growing for years. Developers use AI-assisted tools for everything from mesh generation and texture synthesis to complex animation pipelines - supported by inverse kinematics and motion-capture clean-up.

Machine-assisted authoring isn't new, but the explosion of modern generative AI has amplified it. The biggest question now is how to use these tools responsibly and effectively without breaking the systems and workflows that developers rely on.

Wildermyth and “structured randomness”

One of my favourite reference points for procedural storytelling is Wildermyth by Worldwalker Games. It's a tactical RPG that leans heavily on procedural generation to create bespoke stories and characters that evolve across a campaign.

Wildermyth mixes handcrafted narrative arcs with procedural character traits, relationships, and emergent events. The team published a helpful explanation of their templating system, which outlines how stories are assembled around randomised components: Wildermyth generic campaign.

The effect is pleasingly coherent. Characters age, form relationships, suffer injuries, retire and die. And the game doesn't just track those changes; it weaves them back into the story. A character who retired in one game might pop up as a wild old sage in the next. Your choices influence how events unfold, but the underlying system is still largely randomised. It's a system that feels hand-authored, but isn't.

Wildermyth is impressive, but it also exposes the limits of procedural storytelling. No matter how clever your templates are, there's a ceiling to replayability. Procedural generation is deterministic and rule-bound: it can remix, but it can't invent. Eventually the system starts to repeat itself, and as a player you'll notice the seams and the repetition.

This is where modern generative AI becomes interesting: instead of relying solely on predefined templates, models can fill in gaps, reinterpret context, or expand narrative branches dynamically, giving you a more open-ended storytelling system without losing coherence entirely.

Modern generative AI

When people talk about “AI” today, they're usually referring to two types of models:

  • Large Language Models (LLMs), such as GPT-style transformers
  • Diffusion models for image generation

These models are trained on large datasets and learn statistical patterns in text or imagery. Once trained, they can generate new content in the style of what they've seen.

While people often focus on the big models, smaller and more specialised models are increasingly important. Codex for code generation, Stable Diffusion for images, and small LLaMA-class models for focused tasks are all good examples.

Using these models in real systems introduces a different style of programming. Instead of writing deterministic functions, we wrap models with validation and guardrails to coerce their unpredictable outputs into structured formats. For game development, that often means combining generative systems with the kind of tooling we already have for procedural content.

Building a Wildermyth-like system with AI

Inspired by Wildermyth, I started sketching out a system that blended procedural generation with AI-authored content. The goal was not to let the model run wild, but to use it to fill narrative space within a controlled structure.

The idea was to:

  • Model a world using data structures for narrative elements, character traits, histories and events.
  • Describe RPG-style dialogue, relationships, and interactions with TypeScript types.
  • Use concepts from event sourcing to store world histories, character actions, and place-based records (sketched below).
  • Let the LLM generate content within these rails, keeping everything internally consistent.
  • Pre-generate thousands of possible universes, settings, cities, and characters using automated scripts.
  • Mix pre-generated content with on-the-fly generation to build a sort of “infinite role-playing game”.

Using generative AI in this way feels more like an evolution of procedural generation than an entirely new system.
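
As a sketch of the event-sourcing idea (the event types and fields here are simplified for illustration), the world's history becomes an append-only log, and current state is derived by replaying it:

// Simplified sketch of an event-sourced world history. Event types and
// fields are illustrative.
interface WorldEvent {
  id: string; // UUID for the event
  year: number; // In-world year the event happened
  type: "battle" | "founding" | "discovery" | "death";
  subjectIds: string[]; // Characters, cities, or factions involved
  description: string; // LLM-generated prose describing the event
}

// Derive per-subject histories by replaying the log in order. Nothing
// generated later can silently rewrite established history.
function replayHistory(events: WorldEvent[]): Map<string, WorldEvent[]> {
  const bySubject = new Map<string, WorldEvent[]>();
  for (const event of [...events].sort((a, b) => a.year - b.year)) {
    for (const id of event.subjectIds) {
      const history = bySubject.get(id) ?? [];
      history.push(event);
      bySubject.set(id, history);
    }
  }
  return bySubject;
}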

Taming the randomness

Working with LLMs introduces three major challenges:

1. Context window limitations

Models need enough world information to stay grounded, but you can't exceed their context window - the amount of text that they can process at once. Even large windows are easy to fill when you're generating worlds, cities, and characters.

The solution is a multi-pass hierarchy:

  • Generate high-level universe types.
  • Use universe types to generate world summaries and histories.
  • Generate cities using only world-level summaries.
  • Generate characters using city context, world summaries, and a short universe description.

Each level is fed only the context it needs. Moving up and down the hierarchy keeps prompts short but coherent.
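
In code, the multi-pass approach looks something like this (the prompt builders and summarisers are hypothetical helpers, not real functions from the project):

// Hypothetical helpers - in the real project these build prompts from
// templates and trim full documents down to short summaries.
declare function universePrompt(): string;
declare function worldPrompt(universeSummary: string): string;
declare function cityPrompt(worldSummary: string): string;
declare function characterPrompt(cityContext: string, worldSummary: string, universeBlurb: string): string;
declare function summarise(fullDocument: string): string;

type Llm = { generate(prompt: string): Promise<string> };

// Each pass only ever sees condensed output from the level above it,
// keeping every prompt comfortably inside the context window.
async function generateHierarchy(llm: Llm) {
  const universe = await llm.generate(universePrompt());
  const world = await llm.generate(worldPrompt(summarise(universe)));
  const cities = await llm.generate(cityPrompt(summarise(world)));
  const characters = await llm.generate(
    characterPrompt(cities, summarise(world), summarise(universe)),
  );
  return { universe, world, cities, characters };
}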

The structure of a universe

Using types to define the structure of generated content helps keep things consistent. Here's an example of a Universe type that guides high-level world generation. You can be as specific or vague as you like with the parameters, allowing for a wide variety of generated settings and themes.

export interface Universe {
  preferredGenre?: string; // eg "fantasy", "sci-fi", "post-apocalyptic"
  tone?: string; // eg "hopeful", "grimdark", "mythic"
  scaleHint?: string; // eg "single planet", "system", "galaxy"
  techOrMagicBias?: string; // eg "high fantasy", "steampunk gadgets"
  factionCount?: number; // Desired number of factions (cap at 6 in output)
  historyLengthHint?: string; // eg "hundreds of years", "millennia"
  themesAndAestheticsBias?: string[]; // Vibe/style hints, eg ["solarpunk", "gothic"]
  randomSeed?: number; // Deterministic seed for proc-gen
  namingConventions?: {
    // Naming preferences for generated proper nouns
    planetStyle?: string;
    countryStyle?: string;
    cityStyle?: string;
    factionStyle?: string;
  };
}
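A seed passed into the generator might look something like this (values picked for illustration):

const seed: Universe = {
  preferredGenre: "sci-fi",
  tone: "hopeful",
  scaleHint: "single planet",
  factionCount: 4,
  themesAndAestheticsBias: ["solarpunk", "frontier"],
  randomSeed: 42,
};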

Then you can prompt the model to fill in the details:

**You are a worldbuilder AI for a proc-gen + gen-AI assisted RPG.**

Accept a **seed JSON** and return **only** a JSON document describing an internally-consistent universe outline to ground subsequent generation runs.

## Requirements

When generating a universe, include:

- **universeType**: The genre or type (fantasy, sci-fi, post-apocalyptic, etc.).
- **constraints**: Rules/limitations (e.g., magic rules, physics constraints, aesthetics).
- **scale**: Description of scope (single planet, system, galaxy).
- **topology**: List of locations with names and relationships (planets, countries, cities).
- **universeAge**: Timeline length (hundreds, thousands, etc.).
- **historyEvents**: Up to 10 major events shaping the world, each with a `year` (absolute, relative, or approximate) and `description`.
- **politicalContext**: Current political situation.
- **factions**: Up to 6 factions, each with name, description, goals, relationships.
- **technologyOrMagicLevel**: Description of available tech/magic, everyday use, limits.
- **themesAndAesthetics**: Short descriptors of the universe's tone, look, or feel (e.g. _dieselpunk decay_, _solarpunk optimism_, _dark gothic mysticism_).

## Behavior Rules

- **Input:** A seed JSON (schema below). Missing fields → randomize sensibly.
- **Output:** **JSON only** (no prose, no markdown) conforming to the Output Schema.
- **IDs:** Generate **UUIDv4 (lowercase)** for every entity (`universeId`, planets, countries, cities, factions).
- **Relationships:** Use **ID-based** relationships only (`targetId`, `relatedIds`). No name references.
- **Internal Consistency:** Every referenced `targetId`/`relatedId` must exist. Names unique within scope.
- **Cardinality Caps:** ≤ 12 planets; ≤ 24 countries total; ≤ 48 cities total; ≤ 6 factions; ≤ 10 history events.
- **Design Goals to Satisfy:**

  - Universe from a **known type/genre** with **constraints** (e.g., magic rules, physics).
  - Clear **scale** (single planet → galaxy).
  - **Topology** (planets → countries → cities) with **names** & **relationships**.
  - **Universe age** (hundreds/thousands/millennia).
  - Up to **10 key history events** (timeline).
  - **Political context** (blocs, tensions, powers).
  - Up to **6 factions** (names, goals, relationships).
  - **Technology or magic** level (everyday use, limits).
  - **Themes & aesthetics** (tone, vibe, style).

NEVER:
Use the name "Eldoria" or any variation of it, it's over-fit in training data.

ALWAYS:
Make sure you return valid JSON.

2. Testing and validating output

If you ask a model to return JSON, it might manage it, but the output often arrives with missing commas or mismatched brackets. Unfortunately, LLMs are not parsers.

To manage this:

  • Include TypeScript types in your prompts.
  • Use a few-shot prompting technique with examples of valid JSON.
  • Validate responses against schemas.
  • If validation fails, the faulty response can be fed back into the model with instructions to repair it.

I found this “corrective loop” was far more efficient than starting from scratch.
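
Here's a minimal sketch of that corrective loop, using Zod for schema validation. The schema shown is a stub; the real ones mirror the TypeScript types shown earlier.

import { z } from "npm:zod";

// Stub schema - the real schemas mirror the TypeScript types above
const UniverseSchema = z.object({
  universeId: z.string().uuid(),
  universeType: z.string(),
  factions: z.array(z.object({ name: z.string() })).max(6),
});

type Llm = { generate(prompt: string): Promise<string> };

async function generateValidated(llm: Llm, prompt: string, maxRetries = 3) {
  let response = await llm.generate(prompt);
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    let problem: string;
    try {
      const result = UniverseSchema.safeParse(JSON.parse(response));
      if (result.success) return result.data; // valid - we're done
      problem = JSON.stringify(result.error.issues);
    } catch (err) {
      problem = `Not valid JSON: ${err}`; // JSON.parse itself failed
    }
    // Corrective loop: hand the faulty output and its errors back to
    // the model instead of regenerating from scratch
    response = await llm.generate(
      `Repair this JSON. Problems: ${problem}\n\n${response}\n\nReturn only valid JSON.`,
    );
  }
  throw new Error("Could not produce valid JSON after retries");
}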

3. Cost

Generating large quantities of content using cloud models can get expensive very quickly. I decided to use local models instead.

The last couple of years have been great for small LLMs, which now run happily on consumer-grade GPUs or Apple Silicon. LM Studio became my main tool, running a local server with an OpenAI-compatible API. It meant I could leave a Mac Mini running overnight generating thousands of assets for pennies, all within a 45-watt thermal envelope. Cheaper than boiling a kettle!

Screenshot of LM Studio

Calling LLMs with Deno

To orchestrate the generation pipeline, I used Deno, a modern runtime for JavaScript and TypeScript. Deno's built-in TypeScript support, filesystem APIs and security model make it ideal for this kind of tooling.

The generation system used:

  • The OpenAI API library (pointed at LM Studio for local inference)
  • Deno's filesystem APIs for reading/writing batches of generated content
  • A small command-line runner to handle context management, multi-pass generation, and retries

Here's a simplified version of the LLM client I used:

import OpenAI from "npm:openai";

export class LocalLlmClient {
  private client: OpenAI;
  private model: string;

  constructor() {
    this.client = new OpenAI({
      apiKey: "sk-no-key", // LM Studio doesn't check the key
      baseURL: "http://localhost:1234/v1", // LM Studio local endpoint
    });
    this.model = "openai/gpt-oss-20b";
  }

  async generate(prompt: string): Promise<string> {
    const res = await this.client.chat.completions.create({
      model: this.model,
      temperature: 2, // maximum temperature for more varied, creative output
      messages: [
        {
          role: "system",
          content:
            "You're a content generator for narrative stories. You're trying to make " +
            "original, creative, and interesting content, worlds, and characters. You " +
            "respond in perfectly formed JSON whenever requested.",
        },
        { role: "user", content: prompt },
      ],
    });

    return res.choices?.[0]?.message?.content || "";
  }
}

This script orchestrated universe → world → city → character → dialogue generation using the structured multi-pass approach described earlier.

Testing the conversation system

One of the easiest ways to test a narrative system is to prototype the conversation layer. I wanted something that felt like a classic “BioWare NPC conversation”, with a branching dialogue structure:

  • Conversations are represented as dialogue trees. Each node on the tree contains text, conditions, outcomes, and links to other nodes.
  • Nodes can modify sentiment values or unlock knowledge tags, and could later be expanded to trigger quests, fire events, or drop items.
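
A generated node might look something like this (hand-written here to show the shape; the full types appear in the demo code below):

// Hand-written sample matching the DialogueNode type in the demo below
const sampleNode = {
  id: "node-3",
  text: "The harbourmaster squints at you. 'Papers. Now.'",
  options: [
    {
      id: "opt-1",
      text: "Hand over the forged permit.",
      npcResponse: "Hm. These seals look... fresh.",
      next: "node-7",
      effects: { sentimentDelta: -1, setFlags: ["used_forged_permit"] },
    },
    {
      id: "opt-2",
      text: "Admit you have none.",
      npcResponse: "Honest, at least. That's rare around here.",
      next: "node-8",
      effects: { sentimentDelta: 1 },
    },
  ],
};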

I built a small React app and loaded a random pre-generated conversation from disk through a simple API.

The API was built with Hono and ran with Deno.serve, making it lightweight and easy to deploy locally:

// npm: specifiers keep this runnable without an import map
import { Hono } from "npm:hono";
import { cors } from "npm:hono/cors";

const universeFiles = [...Deno.readDirSync("../author/output/")].filter((f) => f.name.endsWith("_universe.json"));

const app = new Hono();
app.use("/random", cors());
app.get("/random", (c) => {
  const random = universeFiles[Math.floor(Math.random() * universeFiles.length)];
  const uuid = random.name.split("_")[0];
  const universe = JSON.parse(Deno.readTextFileSync(`../author/output/${uuid}_universe.json`));
  const dialogue = JSON.parse(Deno.readTextFileSync(`../author/output/${uuid}_dialogue.json`));
  const flavour = JSON.parse(Deno.readTextFileSync(`../author/output/${uuid}_flavour.json`));

  return c.json({ id: uuid, universe, flavour, dialogue });
});

if (import.meta.main) Deno.serve(app.fetch);

A basic UI allowed me to step through the dialogue tree, making choices and seeing how the conversation evolved.

// Minimal React demo wired to the Hono /random endpoint
import { useEffect, useState } from "react";

type Sentiment = number;

type Condition = {
  minSentiment?: Sentiment;
  maxSentiment?: Sentiment;
  flagsAll?: string[];
  flagsAny?: string[];
};

type Effects = {
  sentimentDelta?: Sentiment;
  setFlags?: string[];
  clearFlags?: string[];
};

type Outcome = { type: "reveal" | "end" | "conflict" | "leave" | "reward"; data?: unknown };

type DialogueOption = {
  id: string;
  text: string;
  npcResponse: string;
  next: string | null;
  effects?: Effects;
  condition?: Condition;
  outcome?: Outcome;
};

type DialogueNode = {
  id: string;
  text: string;
  options: DialogueOption[];
  terminal?: boolean;
  outcome?: Outcome;
};

type DialogueTree = {
  id: string;
  title: string;
  npcName: string;
  startNodeId: string;
  initialSentiment: Sentiment;
  nodes: Record<string, DialogueNode>;
};

type StoryData = {
  id: string;
  universe: Record<string, unknown>;
  flavour: { flavourText: string } | null;
  dialogue: DialogueTree;
};

export function ConversationDemo() {
  const [data, setData] = useState<StoryData | null>(null);
  const [currentId, setCurrentId] = useState<string | null>(null);
  const [sentiment, setSentiment] = useState<Sentiment>(0);
  const [flags, setFlags] = useState<Set<string>>(new Set());
  const [ended, setEnded] = useState(false);
  const [log, setLog] = useState<string[]>([]);

  const load = async () => {
    const res = await fetch("http://localhost:8000/random");
    const json: StoryData = await res.json();
    setData(json);
    setCurrentId(json.dialogue.startNodeId);
    setSentiment(json.dialogue.initialSentiment);
    setFlags(new Set());
    setEnded(false);
    setLog(json.flavour?.flavourText ? [json.flavour.flavourText] : []);
  };

  useEffect(() => {
    load();
  }, []);

  const passes = (cond?: Condition): boolean => {
    if (!cond) return true;
    if (cond.minSentiment !== undefined && sentiment < cond.minSentiment) return false;
    if (cond.maxSentiment !== undefined && sentiment > cond.maxSentiment) return false;
    if (cond.flagsAll && !cond.flagsAll.every((f) => flags.has(f))) return false;
    if (cond.flagsAny && !cond.flagsAny.some((f) => flags.has(f))) return false;
    return true;
  };

  const choose = (opt: DialogueOption) => {
    if (!data) return;

    // apply effects
    if (opt.effects?.sentimentDelta) setSentiment((s) => s + (opt.effects!.sentimentDelta || 0));
    if (opt.effects?.setFlags?.length) setFlags((prev) => new Set([...prev, ...opt.effects!.setFlags!]));
    if (opt.effects?.clearFlags?.length)
      setFlags((prev) => {
        const next = new Set(prev);
        opt.effects!.clearFlags!.forEach((f) => next.delete(f));
        return next;
      });

    // echo player's choice and NPC response
    setLog((l) => [...l, `You: ${opt.text}`, `${data.dialogue.npcName}: ${opt.npcResponse}`]);

    if (opt.next) {
      setCurrentId(opt.next);
      const nextNode = data.dialogue.nodes[opt.next];
      if (nextNode?.terminal || nextNode?.options.length === 0) setEnded(true);
    } else {
      setEnded(true);
    }
  };

  if (!data || !currentId) return <div>Loading…</div>;

  const node = data.dialogue.nodes[currentId];
  const available = node.options.filter((o) => passes(o.condition));

  return (
    <div style={{ maxWidth: 720, margin: "0 auto", fontFamily: "system-ui, sans-serif" }}>
      <h3>
        {data.dialogue.title} · {data.dialogue.npcName}
      </h3>

      <div style={{ padding: "12px", background: "#111", color: "#eee", borderRadius: 8, marginBottom: 12 }}>
        <div style={{ opacity: 0.8, marginBottom: 8 }}>
          {log.map((l, i) => (
            <div key={i}>{l}</div>
          ))}
        </div>
        <div>
          <strong>{data.dialogue.npcName}:</strong> {node.text}
        </div>
      </div>

      {!ended ? (
        <ul style={{ listStyle: "none", padding: 0, display: "grid", gap: 8 }}>
          {available.map((opt) => (
            <li key={opt.id}>
              <button onClick={() => choose(opt)} style={{ width: "100%", textAlign: "left", padding: "10px 12px", borderRadius: 6 }}>
                {opt.text}
              </button>
            </li>
          ))}
        </ul>
      ) : (
        <div style={{ margin: "12px 0" }}>
          Conversation ended.
          <div style={{ marginTop: 8 }}>
            <button onClick={load}>Start another</button>
          </div>
        </div>
      )}

      <div style={{ marginTop: 16, fontSize: 12, opacity: 0.7 }}>Sentiment: {sentiment}</div>
    </div>
  );
}

This setup made it easy to iterate on the feel of the conversations before committing to building an entire game.


Some fun extras

To make the prototype feel more game-like, I added a couple of small procedural touches.

Character avatars

Each NPC was rendered using seeded procedural SVGs. Different shapes, colours, and layers combined to give characters a distinct look determined entirely by their name and role. Universes themed around sci-fi, fantasy, or modern settings got different palettes or accessories. If an NPC disliked you, their eyebrows and expressions shifted accordingly.

Generated avatars

World maps

The worlds themselves used a tiny cellular automata system. A miniature run of Conway's Game of Life generated landmasses. Cities were placed according to simple rules. The map was then blurred and tinted to make it look like a globe or parchment map depending on the universe type.

Generated maps

The key point: everything was seeded, so the same input always produced the same output. Consistency is crucial when you want a coherent game world.
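
Here's a sketch of that seeding approach: hash a string (such as a universe ID) into a seed for a tiny PRNG (mulberry32, a common choice), then run a few Game of Life steps over seeded noise. The grid size, fill rate, and step count are simplified from what I actually used.

// Hash any string (eg a universe ID) into a 32-bit seed
function hashString(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) h = (Math.imul(31, h) + s.charCodeAt(i)) | 0;
  return h >>> 0;
}

// mulberry32: a tiny deterministic PRNG - same seed, same sequence
function mulberry32(seed: number): () => number {
  let a = seed;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Seed a noise grid, then run a few Game of Life steps to clump the
// noise into landmasses. true = land, false = water.
function generateLandmass(seedText: string, size = 32, steps = 4): boolean[][] {
  const rand = mulberry32(hashString(seedText));
  let grid = Array.from({ length: size }, () =>
    Array.from({ length: size }, () => rand() < 0.45),
  );
  for (let step = 0; step < steps; step++) {
    grid = grid.map((row, y) =>
      row.map((alive, x) => {
        let neighbours = 0;
        for (let dy = -1; dy <= 1; dy++) {
          for (let dx = -1; dx <= 1; dx++) {
            if (dx !== 0 || dy !== 0) neighbours += grid[y + dy]?.[x + dx] ? 1 : 0;
          }
        }
        // Conway's rules: survive with 2-3 neighbours, born with exactly 3
        return alive ? neighbours === 2 || neighbours === 3 : neighbours === 3;
      }),
    );
  }
  return grid;
}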

A screenshot of the finished game

Applications outside games

The techniques discussed here aren't limited to game development. Wrapping non-deterministic model outputs in structured systems has wider uses:

  • Systems where outputs must conform to strict data types, such as APIs or data pipelines.
  • Workflows where known data sources must be preserved while adding human-like variation, like content personalization.
  • Processes that benefit from some imaginative expansion while staying rooted in constraints, such as automated report generation or creative writing aids.

Games are just a particularly good testing ground. They've always balanced authored content, procedural randomness, and systemic behaviour. Generative AI is simply another tool to sit alongside the others, provided it's wrapped in the right guardrails.

If anything, the future will probably look like a blend of human authorship, procedural logic, and model-assisted generation. Not to replace writers or designers, but to allow small teams to produce richer worlds, in the same way that previous generations of tooling levelled the playing field in the 90s and 2000s.

The challenge is keeping human creativity at the centre. The technology should extend what's possible, not erase the artistry and human element that makes games enjoyable in the first place.

LLMs obviously still have their flaws

You may have noticed that the earlier prompt included a rule to NEVER use the name "Eldoria" or any variation of it, because it's over-fit in the training data. If you ask almost any of the popular LLMs to create and describe a world, they are over-eager to tell you about "Eldoria". When I was generating worlds I found Eldoria repeated often in the data. Try it yourself in an online LLM: you may well get this name or something similar. In fact, there's an entire Reddit thread where others have found the same thing. Perhaps one day we will discover Eldoria, the land where AI dreams to be.

I fell foul of plenty of issues: JSON repair loops that failed, generated dialogue that went nowhere or muddled its speakers, and characters that occasionally contradicted their world outlines. LLMs are no match for, and no replacement for, human writers and creatives, but there's still plenty of room to improve the prompts, and for a human editor to come in and make something from the thousands of generated concepts.

Let me know what you think

All in all, this was an interesting experiment to see how far a single developer could push narrative generation with a hybrid deterministic-and-generative pipeline, and I'm excited to work on it more.

Thanks for reading this far! I hope this peek into my experiments helps spark your own ideas about blending procedural generation with modern LLMs. If you've been working with similar systems or have your own lessons learned, I'd love to hear about them! Leave a comment below or send me a message; I'm @thisisjofrank on all the socials.
