When we built the Novellum interactive fiction player, the first obvious move was streaming. Push LLM tokens to the frontend as they arrive. Simple.
It didn't work.
The LLM response in an interactive fiction system isn't plain text. It's a narrative envelope:
[DIALOGUE]You were brave to come here.[/DIALOGUE]
[SPEAKER_NPC_ID]42[/SPEAKER_NPC_ID]
[PORTRAIT_EXPRESSION]sad[/PORTRAIT_EXPRESSION]
[BGM_MOOD]melancholic[/BGM_MOOD]
[SCENE]A candlelit library. Rain against tall windows.[/SCENE]
Dialogue for the user. Character expression, music mood, and scene description for the rendering layer. All from a single generation pass.
Stream this raw → brackets and tag names appear mid-story.
Buffer the full response → 3-5 second loading wait.
Neither is acceptable.
The Reactive Stream Scanner
We built a scanner that processes the stream chunk-by-chunk and behaves differently per tag type:
For [DIALOGUE] content:
Emit partial tokens immediately as they accumulate. UTF-8 safe: never cut a multi-byte Chinese or Japanese character in the middle of its byte sequence.
For semantic tags (BGM_MOOD, PORTRAIT_EXPRESSION, SPEAKER_NPC_ID, SCENE):
Accumulate until the closing tag is found. Emit a single complete event.
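A minimal sketch of that two-mode loop (Python here for illustration; the class name and event tuples are mine, not the production API). DIALOGUE text streams out as soon as it can no longer be the prefix of a closing tag; semantic tags buffer until their close arrives:

```python
import re

# Tags whose content is buffered until the closing tag. Everything inside
# [DIALOGUE] is instead emitted incrementally as partial events.
SEMANTIC_TAGS = {"BGM_MOOD", "PORTRAIT_EXPRESSION", "SPEAKER_NPC_ID", "SCENE"}

class StreamScanner:
    def __init__(self):
        self.buf = ""   # text not yet emitted
        self.tag = None # tag we are currently inside, if any

    def feed(self, chunk):
        """Consume one LLM chunk, yield ('partial'|'complete', tag, text) events."""
        self.buf += chunk
        while True:
            if self.tag is None:
                # look for the next opening tag (text outside tags is skipped
                # in this sketch, since the envelope is all-tags)
                m = re.search(r"\[([A-Z_]+)\]", self.buf)
                if not m:
                    return
                self.tag = m.group(1)
                self.buf = self.buf[m.end():]
            close = f"[/{self.tag}]"
            idx = self.buf.find(close)
            if idx >= 0:
                # closing tag found; for DIALOGUE this event carries only the
                # not-yet-streamed tail, earlier text already went out as partials
                yield ("complete", self.tag, self.buf[:idx])
                self.buf = self.buf[idx + len(close):]
                self.tag = None
                continue
            if self.tag == "DIALOGUE":
                # stream everything that cannot be the start of the closing
                # tag -- a one-tag lookahead held back at the buffer's end
                safe = len(self.buf) - len(close)
                if safe > 0:
                    yield ("partial", self.tag, self.buf[:safe])
                    self.buf = self.buf[safe:]
            return
```

The held-back suffix is what keeps a `[` that might begin `[/DIALOGUE]` from being streamed prematurely.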
Each complete event triggers an immediate downstream action:
| Event | Action |
|---|---|
| DIALOGUE partial | stream WS message → text appears word-by-word |
| BGM_MOOD complete | Track lookup → bgm_change → music fades in mid-stream |
| SPEAKER_NPC_ID + PORTRAIT_EXPRESSION both present | speaker_state → portrait expression changes |
| SCENE complete | Async image generation queued → asset_ready later |
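The mapping in the table could be sketched as a dispatch function (handler names, event fields, and the `lookup_track` hook are illustrative assumptions, not the real API):

```python
def dispatch(tag, text, send, lookup_track=None):
    """Turn one completed tag into a typed WebSocket message via `send`.

    `send` is any callable accepting a dict; `lookup_track` (optional)
    resolves a mood string to a concrete track before forwarding.
    """
    if tag == "DIALOGUE":
        send({"type": "stream", "text": text})
    elif tag == "BGM_MOOD":
        # resolve the mood server-side so the client gets a concrete track
        track = lookup_track(text) if lookup_track else text
        send({"type": "bgm_change", "track": track})
    elif tag == "SCENE":
        # image generation is queued asynchronously; asset_ready arrives later
        send({"type": "scene_pending", "prompt": text})
```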
Timeline of a Single Turn
- t+0ms: User message sent
- t+100ms: First dialogue words appear ← user is already reading
- t+300ms: BGM_MOOD closed → music starts changing
- t+500ms: speaker_state fires → character expression shifts
- t+2000ms: LLM finishes generating
- t+10000ms: Scene background image arrives
By the time the LLM finishes, the user has been reading for 1-2 seconds. The music changed. The character's face settled. The world moved while the story was being written.
Notable Edge Cases
UTF-8 safety: before emitting a partial event, the scanner must find the longest prefix that ends on a valid rune boundary. East Asian content breaks if you cut at an arbitrary byte offset.
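The boundary check itself is small. A sketch (the function name is mine; the logic follows the standard UTF-8 byte layout, where continuation bytes match `0b10xxxxxx`):

```python
def utf8_safe_prefix(data: bytes) -> bytes:
    """Longest prefix of `data` that does not end mid-codepoint."""
    if not data:
        return data
    i = len(data) - 1
    # walk back over up to three continuation bytes to find the lead byte
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    lead = data[i]
    if lead < 0x80:
        need = 1          # ASCII byte, always complete
    elif lead >= 0xF0:
        need = 4          # 4-byte sequence
    elif lead >= 0xE0:
        need = 3          # 3-byte sequence (most CJK characters)
    elif lead >= 0xC0:
        need = 2          # 2-byte sequence
    else:
        return data[:i]   # stray continuation byte: cut before it
    # keep everything if the final sequence is complete, else trim it off
    return data if len(data) - i == need else data[:i]
```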
Ambiguous brackets: Inside [DIALOGUE], a [ might start a closing tag or be literal character dialogue. The scanner keeps a one-bracket lookahead before committing.
speaker_state coordination: SPEAKER_NPC_ID and PORTRAIT_EXPRESSION arrive as separate tags in no guaranteed order. The processor accumulates partial state and emits speaker_state only when both are present and valid.
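That accumulate-then-emit coordination might look like this (class name, the validation set, and the event shape are illustrative assumptions):

```python
VALID_EXPRESSIONS = {"neutral", "sad", "happy", "angry"}  # illustrative set

class SpeakerAccumulator:
    """Collects SPEAKER_NPC_ID and PORTRAIT_EXPRESSION in either order,
    emitting a speaker_state event only once both are present and valid."""

    def __init__(self):
        self.npc_id = None
        self.expression = None

    def feed(self, tag, text):
        if tag == "SPEAKER_NPC_ID" and text.isdigit():
            self.npc_id = int(text)
        elif tag == "PORTRAIT_EXPRESSION" and text in VALID_EXPRESSIONS:
            self.expression = text
        if self.npc_id is not None and self.expression is not None:
            event = {"type": "speaker_state", "npc_id": self.npc_id,
                     "expression": self.expression}
            self.npc_id = self.expression = None  # reset for the next speaker
            return event
        return None  # still waiting for the other half
```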
Fallback mode: If the buffer exceeds a threshold without recognized tag structure (malformed model output), the scanner enters fallback mode rather than stalling indefinitely.
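The trigger condition can be sketched in a few lines (the threshold value and function name are illustrative, not the production numbers):

```python
FALLBACK_THRESHOLD = 2048  # illustrative: chars buffered before giving up

def should_fall_back(buf: str, threshold: int = FALLBACK_THRESHOLD) -> bool:
    """True when the buffer has grown past the threshold without any
    opening bracket that could begin a recognized tag."""
    return len(buf) > threshold and "[" not in buf
```

Once triggered, the scanner would stream the buffer as plain dialogue rather than stall waiting for structure that never arrives.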
Why This Is Hard to Retrofit
This layer requires 4 coordinated parts:
- Prompt engineering — the model must emit the tag structure consistently
- Server-side scanner — runs on the backend so BGM and speaker lookups are resolved before events are forwarded
- Typed WebSocket protocol — not raw text, typed event envelopes per event class
- Frontend projection system — handles concurrent stream + bgm_change + speaker_state events without visual artifacts
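On the wire, the typed envelopes might look like this (field names are illustrative, inferred from the event names above):

```python
import json

# One self-describing JSON envelope per event class, instead of raw text frames.
envelopes = [
    {"type": "stream", "text": "You were brave to come here."},
    {"type": "bgm_change", "track": "rain_library_03"},
    {"type": "speaker_state", "npc_id": 42, "expression": "sad"},
    {"type": "asset_ready", "kind": "scene_bg", "url": "/assets/scenes/lib.png"},
]

# each serialized envelope becomes one WebSocket frame
frames = [json.dumps(e) for e in envelopes]
```

Because every frame carries its own `type`, the frontend can route each one to the right projection (text, audio, portrait, background) without parsing free text.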
You can't bolt this onto a chatbot architecture. You'd be replacing the interaction layer.
About Novellum
Novellum is a full-stack interactive fiction system for platform deployment. The reactive stream parsing layer is a core component of the standard player experience — included with full-system deployments alongside creator tools, operations backend, and monetization.
For the product strategy context: Why AI Chat Products Lose Users and Interactive Fiction Keeps Them