Most AI story apps look believable in a demo and wrong in use.
The cause is rarely bad design. It is concurrent state management.
During a single AI-generated fiction turn, the frontend receives six distinct event types: streaming dialogue tokens every 50ms, a speaker_state expression change mid-stream, a bgm_change, a final done payload, an enriched metadata_ready a second later, and a generated background image in asset_ready five to twenty seconds after that. All of them potentially update visual state. None of them arrives in guaranteed order relative to the others.
In a naive implementation with one shared state object, these events compete. The one applied last wins, regardless of semantic correctness. And if the user is browsing conversation history while the live stream processes, the live events overwrite what they are looking at.
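A minimal sketch of that race, assuming a single mutable scene object (all names here are illustrative, not from any particular codebase):

```typescript
// Naive shared state: every event writes here, and the last write wins.
interface NaiveScene {
  expression: string;
  backgroundUrl: string | null;
}

let scene: NaiveScene = { expression: "neutral", backgroundUrl: null };

function onEvent(patch: Partial<NaiveScene>): void {
  scene = { ...scene, ...patch }; // no ordering, no isolation
}

// The user scrolls back to an old message while a live turn is running:
onEvent({ expression: "sad" });           // derived from the old message being read
onEvent({ backgroundUrl: "castle.png" }); // live asset_ready lands late
// scene now mixes the old message's expression with the live background.
```

Nothing in `onEvent` can tell a history-derived write from a live one, which is exactly why the fix has to happen at the read layer rather than inside the event handlers.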
This article is about the three patterns that solve it.
The Symptom
Open any AI companion app. Find an older conversation. Scroll back through the message history. Watch the character portrait.
In most apps, the portrait shows the character's current expression — the one from their most recent message. Not the expression they had in the moment you're reading.
This is not a CSS bug. It's an architectural symptom: a live event updated a shared state store, and the history viewport reads from the same store.
Pattern 1: Two-Snapshot Model
The store maintains two snapshots of the visual world:
liveSnapshot: updated by every incoming event, always. This is the live ground truth.
viewedSnapshot: derived from a specific historical message's accumulated data. Frozen while the user is in history mode.
```typescript
// The entire rendering isolation is one line in the ViewModel:
const sceneSnapshot = viewedSnapshot ?? liveSnapshot;
```
Live events (speaker_state, done, asset_ready, metadata_ready) always write to liveSnapshot. They do not know whether the user is browsing history. The isolation is at the read layer, not the write layer.
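A sketch of that write/read split (the type and function names here are mine, chosen for illustration):

```typescript
interface SceneSnapshot {
  expression: string | null;
  backgroundUrl: string | null;
}

interface SceneStore {
  liveSnapshot: SceneSnapshot;          // always written by live events
  viewedSnapshot: SceneSnapshot | null; // set only by history navigation
}

// Live events write unconditionally; they never check view mode.
function applyLiveEvent(store: SceneStore, patch: Partial<SceneSnapshot>): SceneStore {
  return { ...store, liveSnapshot: { ...store.liveSnapshot, ...patch } };
}

// The read layer is where isolation happens.
function renderedScene(store: SceneStore): SceneSnapshot {
  return store.viewedSnapshot ?? store.liveSnapshot;
}
```

While `viewedSnapshot` is set, live writes keep landing in `liveSnapshot` without ever reaching the screen; clearing `viewedSnapshot` is the entire "return to latest" operation.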
When the user navigates to an old message: derive viewedSnapshot from that message's patch map entry.
When the user returns to latest: clear viewedSnapshot. liveSnapshot — which has been updating correctly the whole time — immediately renders.
No reconciliation. No catch-up logic. The live stream was always tracking correctly.
Pattern 2: Per-Message Patch Map
```typescript
scenePatchMap: Map<number, SceneInfo>;
```
Each message ID accumulates visual patches from three sources:
- Message fields at load time from the database (persisted `portrait_expression`, `scene_background_url`)
- `done` payload fields at turn end
- `asset_ready` patches matched by `assistant_message_id`
On history navigation: scenePatchMap.get(messageId) → derive viewedSnapshot. Any message, any time, deterministic.
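Navigation then reduces to a lookup. A sketch, using the same field names as the merge snippet below and treating `null` as "return to latest":

```typescript
type SceneInfo = {
  sceneBackgroundUrl: string | null;
  portraitExpression: string | null;
};

// messageId === null means "return to latest".
function deriveViewedSnapshot(
  scenePatchMap: Map<number, SceneInfo>,
  messageId: number | null,
): SceneInfo | null {
  if (messageId === null) return null;         // live snapshot renders again
  return scenePatchMap.get(messageId) ?? null;  // deterministic per message
}
```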
Merge discipline: Every field uses explicit presence detection:
```typescript
sceneBackgroundUrl: hasOwnField(patch, "sceneBackgroundUrl")
  ? patch.sceneBackgroundUrl ?? null
  : current.sceneBackgroundUrl ?? null,
```
Backend payloads are sparse — done often arrives before asset_ready, without a background URL. hasOwnField distinguishes "field absent" (preserve current) from "field = null" (reset). Without it, sparse payloads blank fields they never intended to clear.
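A self-contained version of that merge discipline; the field names follow the snippet above, and `hasOwnField` is assumed to be a thin `hasOwnProperty` wrapper:

```typescript
interface SceneInfo {
  sceneBackgroundUrl: string | null;
  portraitExpression: string | null;
}

function hasOwnField(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// "Field absent" preserves the current value; "field: null" explicitly resets it.
function mergeScenePatch(current: SceneInfo, patch: Partial<SceneInfo>): SceneInfo {
  return {
    sceneBackgroundUrl: hasOwnField(patch, "sceneBackgroundUrl")
      ? patch.sceneBackgroundUrl ?? null
      : current.sceneBackgroundUrl ?? null,
    portraitExpression: hasOwnField(patch, "portraitExpression")
      ? patch.portraitExpression ?? null
      : current.portraitExpression ?? null,
  };
}
```

A plain object spread (`{ ...current, ...patch }`) cannot make this distinction, since an omitted key and a key set to `undefined` both disappear into the merge.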
Pattern 3: Preload Before Transition
Background and portrait transitions use a queuing pipeline:
- New image URL arrives in state
- `const img = new Image(); img.src = url`
- `img.onload` fires → set `queuedUrl`
- Effect on `queuedUrl` → start CSS transition (750ms for stage, 460ms for actor)
- Transition end → advance `displayedUrl`
Components render displayedUrl, not the raw state URL. The transition fires only after the image is in browser cache. No white flash on the first frame.
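The pipeline can be sketched as a small queue with an injectable loader; `TransitionQueue` and `LoadImage` are names I've invented for illustration, and in the browser the loader would wrap `new Image()` as shown in the trailing comment:

```typescript
type LoadImage = (url: string) => Promise<void>;

class TransitionQueue {
  displayedUrl: string | null = null; // what components actually render
  queuedUrl: string | null = null;    // preloaded, transition pending

  constructor(private load: LoadImage) {}

  // Preload first; only a cached image is allowed to start a transition.
  async enqueue(url: string): Promise<void> {
    await this.load(url);
    this.queuedUrl = url; // an effect watching queuedUrl starts the CSS fade
  }

  // Wire this to the element's transitionend event.
  onTransitionEnd(): void {
    if (this.queuedUrl !== null) {
      this.displayedUrl = this.queuedUrl;
      this.queuedUrl = null;
    }
  }
}

// A browser loader would look like:
//   const browserLoad: LoadImage = (url) =>
//     new Promise((resolve, reject) => {
//       const img = new Image();
//       img.onload = () => resolve();
//       img.onerror = () => reject(new Error(`failed to load ${url}`));
//       img.src = url;
//     });
```

Keeping the loader injectable also makes the queue testable without a DOM.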
Portrait transitions also distinguish between character-swap (full cross-fade, new character fades in after outbound fades out) vs expression change on the same character (overlapping transition).
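That branch can be expressed as a small classifier (a sketch; the names and shapes are mine, not from the article's codebase):

```typescript
type PortraitTransition = "expression-crossfade" | "character-swap";

interface PortraitState {
  characterId: string;
  expression: string;
}

// Same character: overlap the fades. Different character: fade out fully,
// then fade the new character in.
function classifyPortraitChange(
  prev: PortraitState,
  next: PortraitState,
): PortraitTransition {
  return prev.characterId === next.characterId
    ? "expression-crossfade"
    : "character-swap";
}
```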
Why You Can't Add These Later
These patterns require:
- `assistant_message_id` threaded through the WebSocket protocol from backend to frontend; every `asset_ready` and `metadata_ready` event carries it
- `client_request_id` generated client-side, echoed by the server on every related response, persisted on message records
- Store separation at the architecture level: separate scene store from session store from UI store, with separate write paths
- Rendering layer consuming `viewedSnapshot ?? liveSnapshot`, not raw event state
A team that starts with a simple chat state model and later wants history browsing, smooth image transitions, and optimistic message reconciliation has to make all of these changes anyway; it just discovers that after far more code is already in place.
About This Project
This architecture is built into the Novellum interactive fiction player. Concurrent event handling, history fidelity, and smooth visual transitions are default behaviors for all deployments.
This is the second article in the "Building Interactive Fiction at Scale" series:
- How Structured Stream Parsing Makes AI Interactive Fiction Feel Instant — parsing structured LLM responses in real time
- This article — concurrent state and the two-snapshot pattern
Have questions about the patterns? Leave a comment or reach out at team@novellum.live.