Murni Marcus

Posted on May 25 • Originally published at vantage-digital.online

How We Built Dynamic NPC Dialogue with LLMs — Lessons from Early Access

#gamedev #ai #llm #npc

We're a small team at Vantage Digital Labs building AI tooling for game developers. Our first product is an NPC dialogue engine powered by LLMs — and we've been running it in early access for a few months now. Here's what we've learned.

The Problem

Traditional NPC dialogue is written by hand. Every line, every branch, every response to every possible player input. For a small studio making an RPG with 50 NPCs, that's thousands of lines of dialogue — and it's all static.

What if NPCs could respond dynamically? What if a merchant could actually react to what the player says, instead of cycling through 3 pre-written lines?

Our Architecture

We went with a simple but effective pipeline:

Player Input → Context Builder → LLM API → Response Parser → Game Engine
                    ↑                              |
                    └──── Memory / State ──────────┘

Context Builder — Injects the NPC's personality, location, knowledge, and recent conversation history into a system prompt.

LLM API — We started with GPT-4o-mini, then tested DeepSeek and Qwen. For cost-sensitive indie games, smaller models work surprisingly well if the prompt is good.

Response Parser — Extracts the dialogue text plus metadata like emotion tags ([emotion:happy]) and action tags ([action:wave]).

Memory — A simple relevance-scored store that lets NPCs "remember" past interactions.
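To give a feel for what "relevance-scored" can mean, here is a minimal sketch that scores stored facts by keyword overlap with the player's latest line. It is an illustration rather than the engine's actual code: the class name `NpcMemory`, the `remember` and `recall` methods, and the overlap-based score are all assumptions, and a real store might use embeddings or recency weighting instead.

```javascript
// Hypothetical sketch: a tiny memory store scored by keyword overlap.
// Class name, method names, and the scoring rule are assumptions for illustration.
class NpcMemory {
  constructor() {
    this.facts = []; // each entry: { text, words }
  }

  // Lowercase the text and collect its words into a Set for cheap lookups.
  static tokenize(text) {
    return new Set(text.toLowerCase().match(/[a-z0-9']+/g) || []);
  }

  remember(text) {
    this.facts.push({ text, words: NpcMemory.tokenize(text) });
  }

  // Return up to `limit` facts that share at least one word with the query,
  // highest overlap first.
  recall(query, limit = 3) {
    const queryWords = NpcMemory.tokenize(query);
    return this.facts
      .map((fact) => {
        let score = 0;
        for (const word of queryWords) {
          if (fact.words.has(word)) score += 1;
        }
        return { text: fact.text, score };
      })
      .filter((entry) => entry.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit)
      .map((entry) => entry.text);
  }
}

const memory = new NpcMemory();
memory.remember("The player sold a rusty sword yesterday");
memory.remember("The player asked about the dragon rumor");

// Only the single most relevant fact is pulled into the prompt here.
const topMemory = memory.recall("Any news about the dragon?", 1);
console.log(topMemory);
```

Plain word overlap counts filler words like "the" as matches, so a stop-word list would be a sensible next step if you try this approach.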

What Actually Matters

After running this for a few months, here's what we found:

1. System Prompt Engineering > Model Size

In our experience, a well-crafted system prompt with a 7B model beats a generic prompt with GPT-4. We spend more time on personality definitions and context injection than on model selection.

You are Goron, a friendly dwarven merchant who loves haggling.
Location: Marketplace
You know about: prices, rare items, local rumors
Respond in character. Keep replies under 3 sentences.

Short, specific, constrained. That's it.
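A prompt like the one above is easy to generate from per-NPC data. The sketch below assumes a hypothetical NPC definition object; the field names (`name`, `description`, `location`, `knowledge`, `maxSentences`) and the `buildSystemPrompt` helper are made up for illustration.

```javascript
// Hypothetical NPC definition shape; all field names are assumptions.
function buildSystemPrompt(npc) {
  return [
    `You are ${npc.name}, ${npc.description}.`,
    `Location: ${npc.location}`,
    `You know about: ${npc.knowledge.join(", ")}`,
    `Respond in character. Keep replies under ${npc.maxSentences} sentences.`,
  ].join("\n");
}

const goron = {
  name: "Goron",
  description: "a friendly dwarven merchant who loves haggling",
  location: "Marketplace",
  knowledge: ["prices", "rare items", "local rumors"],
  maxSentences: 3,
};

const goronPrompt = buildSystemPrompt(goron);
console.log(goronPrompt);
```

Keeping the definition as data means a designer can tweak a personality without touching prompt-assembly code.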

2. Response Parsing is Underrated

LLMs are chatty. Games need structured output. We use simple tag extraction:

const emotionMatch = raw.match(/\[emotion:(\w+)\]/i);
const actionMatch = raw.match(/\[action:([^\]]+)\]/i);
const text = raw.replace(/\[(emotion|action):[^\]]*\]/gi, '').trim();

This gives us clean dialogue text plus metadata for animation triggers.
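Wrapped into a function, the same three regular expressions might look like the sketch below. The `parseNpcResponse` name, the shape of the returned object, and the `"neutral"` and `null` fallbacks for missing tags are assumptions added for illustration.

```javascript
// Sketch: the tag extraction above, packaged as one function.
// Function name, return shape, and fallback values are assumptions.
function parseNpcResponse(raw) {
  const emotionMatch = raw.match(/\[emotion:(\w+)\]/i);
  const actionMatch = raw.match(/\[action:([^\]]+)\]/i);
  const text = raw
    .replace(/\[(emotion|action):[^\]]*\]/gi, "") // strip every tag
    .replace(/\s{2,}/g, " ") // collapse gaps left behind by removed tags
    .trim();

  return {
    text,
    emotion: emotionMatch ? emotionMatch[1].toLowerCase() : "neutral",
    action: actionMatch ? actionMatch[1].trim() : null,
  };
}

const parsed = parseNpcResponse("[emotion:happy] Welcome back, friend! [action:wave]");
console.log(parsed);
// { text: 'Welcome back, friend!', emotion: 'happy', action: 'wave' }
```

Falling back to a default emotion means the animation layer always gets a value, even when the model forgets a tag.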

3. Latency Matters More Than Quality

Players won't wait 3 seconds for an NPC to respond. We target <500ms total latency. This means:

Streaming responses (display text as it generates)
Smaller models for non-critical NPCs
Aggressive caching of common responses
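As one way to picture the caching piece, here is a minimal sketch of a per-NPC response cache keyed on normalized player input. The `ResponseCache` class, the normalization rules, the size cap, and the `cachedReply` helper (with its caller-supplied `generate` function) are all assumptions for illustration.

```javascript
// Hypothetical sketch: cache replies to common inputs, per NPC.
class ResponseCache {
  constructor(maxEntries = 500) {
    this.maxEntries = maxEntries;
    this.entries = new Map(); // Map iterates in insertion order
  }

  // Normalize so "Hello!" and "  hello " land on the same key.
  static key(npcId, playerInput) {
    const normalized = playerInput
      .toLowerCase()
      .replace(/[^a-z0-9\s]/g, "")
      .replace(/\s+/g, " ")
      .trim();
    return `${npcId}:${normalized}`;
  }

  get(npcId, playerInput) {
    return this.entries.get(ResponseCache.key(npcId, playerInput));
  }

  set(npcId, playerInput, response) {
    const key = ResponseCache.key(npcId, playerInput);
    this.entries.delete(key); // re-insert so this key becomes the newest
    this.entries.set(key, response);
    if (this.entries.size > this.maxEntries) {
      // Evict the oldest inserted entry.
      this.entries.delete(this.entries.keys().next().value);
    }
  }
}

// Only call the model on a cache miss. `generate` is a caller-supplied
// async function that takes the player input and returns the reply text.
async function cachedReply(cache, npcId, playerInput, generate) {
  const hit = cache.get(npcId, playerInput);
  if (hit !== undefined) return hit;
  const reply = await generate(playerInput);
  cache.set(npcId, playerInput, reply);
  return reply;
}

const greetingCache = new ResponseCache();
greetingCache.set("goron", "Hello!", "Well met, traveler!");
const cachedGreeting = greetingCache.get("goron", "  hello ");
console.log(cachedGreeting);
```

A cache like this only suits lines that don't depend on conversation state, such as greetings; anything context-dependent should still go to the model.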

4. Conversation History Windowing

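Sending the full conversation history is expensive and slow. We window to the last 10 exchanges (20 messages, counting one player line and one NPC reply per exchange), with a separate memory system for important facts.

// Over the 20-message cap? Drop the oldest exchange (player line + NPC reply).
while (history.length > 20) history.splice(0, 2);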

Simple, effective, cheap.
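To show how the window and the remembered facts might come together in a single request, here is a minimal sketch. It assumes chat-style messages of the form `{ role, content }`; the `buildMessages` helper, the `MAX_EXCHANGES` constant, and the "Things you remember" wording are illustrative assumptions.

```javascript
// Hypothetical sketch: assemble one request from prompt, facts, and recent history.
const MAX_EXCHANGES = 10; // one exchange = player message + NPC reply

function buildMessages(systemPrompt, facts, history, playerInput) {
  // Non-mutating window: keep only the most recent 20 messages.
  const recent = history.slice(-MAX_EXCHANGES * 2);

  // Important facts travel with the system prompt, so they outlive the window.
  const factBlock =
    facts.length > 0 ? `\nThings you remember:\n- ${facts.join("\n- ")}` : "";

  return [
    { role: "system", content: systemPrompt + factBlock },
    ...recent,
    { role: "user", content: playerInput },
  ];
}

// Fake a long conversation: 30 alternating messages.
const longHistory = Array.from({ length: 30 }, (_, i) => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: `line ${i}`,
}));

const messages = buildMessages(
  "You are Goron, a friendly dwarven merchant.",
  ["The player sold a rusty sword yesterday"],
  longHistory,
  "Got anything new today?"
);
console.log(messages.length); // 22: system + 20 recent + new player line
```

Using `slice` here leaves the stored history untouched, which is handy if the full log is also needed for analytics or memory extraction.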

Cost Reality Check

For a game with 1000 daily active players, each talking to 5 NPCs per session:

GPT-4o-mini: ~$2-5/day
DeepSeek V3: ~$0.50-1/day
Self-hosted 7B: ~$0 in API fees (running on the existing game server)

For indie games, the economics work. It's not free, but it's cheaper than hiring a dialogue writer for every language.
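If you want to run this kind of estimate for your own game, the arithmetic is simple. In the sketch below, only the player count and NPCs per session come from the scenario above; the exchanges per NPC, the token counts, and the per-million-token prices are placeholder assumptions, so substitute your own measurements and your provider's current rates.

```javascript
// Sketch of a daily API cost estimate. All default inputs below are
// placeholders, not measured values or quoted provider prices.
function estimateDailyCost({
  players,
  npcsPerPlayer,
  exchangesPerNpc,
  inputTokensPerExchange,
  outputTokensPerExchange,
  inputPricePerMillion,
  outputPricePerMillion,
}) {
  const exchanges = players * npcsPerPlayer * exchangesPerNpc;
  const inputCost =
    ((exchanges * inputTokensPerExchange) / 1e6) * inputPricePerMillion;
  const outputCost =
    ((exchanges * outputTokensPerExchange) / 1e6) * outputPricePerMillion;
  return inputCost + outputCost;
}

const dailyCost = estimateDailyCost({
  players: 1000, // from the scenario above
  npcsPerPlayer: 5, // from the scenario above
  exchangesPerNpc: 4, // assumption
  inputTokensPerExchange: 600, // assumption: prompt + windowed history
  outputTokensPerExchange: 60, // assumption: replies under 3 sentences
  inputPricePerMillion: 0.15, // placeholder rate in USD
  outputPricePerMillion: 0.6, // placeholder rate in USD
});
console.log(`$${dailyCost.toFixed(2)}/day`); // $2.52/day
```

Input tokens dominate in this example, which is another reason history windowing pays off.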

Open Questions We're Still Working On

Consistency — How do you keep an NPC's personality stable across thousands of conversations?
Multilingual — Supporting 5+ languages without maintaining 5x the prompts
Voice — Combining LLM dialogue with real-time TTS (we're experimenting with this)

Try It

We have a live demo on our website where you can talk to NPCs powered by our engine. It's running a real inference backend, not canned responses.

If you're building a game and want to experiment with AI NPCs, we're in early access and happy to chat.

Vantage Digital Labs builds AI tooling for game teams. vantage-digital.online
