Randall

Posted on Jun 14

Yozakura's Prompt Templating System

#ai #webdev #typescript #programming

Recently I open-sourced a top-secret pet project called Yozakura. Check it out on GitHub. There are a number of interesting technical aspects I'd like to do deep dives on, and this article will discuss the first of those: the prompt templating system.

What is Yozakura?

Yozakura is an LLM-powered social simulation where many AI characters (up to dozens or even hundreds) navigate around a simple map, talk to each other, and form memories and intentions towards each other. The user can also control a character. You can read more about the core features in the documentation here.

What is a prompt templating system?

In LLM-powered software that abstracts the actual prompt text away from the user, it's often advantageous to implement a templating system to enable advanced users to customize prompt message content.

One example is an application that allows the user to enter their name during setup. The application could then have a system prompt template that includes a statement like:

You should address the user as {{context.firstName}}.

The actual configured first name is dynamically rendered into the prompt before it's sent to the LLM API, and thus the LLM knows what to call the user.

But what if the user would prefer to be called by their last name? If the application allows the user to edit the template itself, then they can update it to:

You should address the user as {{context.lastName}}.

The context is an object containing all of the dynamic data that the broader system makes available inside of the template. This could include more information like date of birth, name of the file currently being worked on, settings, or anything really.
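To make the idea concrete, here is a minimal sketch of `{{context.path}}` substitution in TypeScript. It is purely illustrative: `renderPlaceholders` and `UserContext` are hypothetical names, the sample user is made up, and Yozakura itself renders templates with Eta (covered below) rather than with anything like this.

```typescript
// Hypothetical context shape for the example application described above.
interface UserContext {
  firstName: string;
  lastName: string;
}

// Replace each {{context.some.path}} placeholder with the matching value
// from the context object. Unknown paths are left untouched so that a typo
// in a template is easy to spot in the rendered prompt.
function renderPlaceholders(template: string, context: object): string {
  return template.replace(
    /\{\{\s*context\.([\w.]+)\s*\}\}/g,
    (placeholder: string, path: string) => {
      let value: unknown = context;
      for (const key of path.split(".")) {
        if (typeof value !== "object" || value === null || !(key in value)) {
          return placeholder;
        }
        value = (value as Record<string, unknown>)[key];
      }
      return String(value);
    }
  );
}

const userContext: UserContext = { firstName: "Ada", lastName: "Lovelace" };

const byFirstName = renderPlaceholders(
  "You should address the user as {{context.firstName}}.",
  userContext
);
const byLastName = renderPlaceholders(
  "You should address the user as {{context.lastName}}.",
  userContext
);

console.log(byFirstName); // You should address the user as Ada.
console.log(byLastName); // You should address the user as Lovelace.
```

Switching from first name to last name only requires editing the template string; the context object stays the same.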

That is the essence of a prompt templating system.

Prompt Templating in Yozakura

The prompt templating system in Yozakura has a few features that make it especially powerful.

First, to ground the conversation, here's what editing a prompt template group in Yozakura looks like:

A prompt template group represents a group of templates that are rendered together to construct a sequence of prompts that are processed together by the LLM.

In this case, we're looking at the prompts used to ask the LLM "who should speak next?" when in a group chat with intelligent speaker selection mode enabled. We generate two prompts via two templates: a system role prompt containing instructions, and a user role prompt containing the actual information to process (a transcript of chat messages so far and a list of eligible next-speaker candidates).

Finally, this prompt group also includes a parser function, which transforms the output from the LLM. In the prompts, we instruct the LLM to output a first name for the next speaker. But Yozakura actually wants a character ID, not a first name. The parser allows us to transform the first name into a character ID after the LLM (hopefully) returns a first name.
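A parser of this kind might look roughly like the sketch below. This is not Yozakura's actual parser: `CandidateCharacter`, `parseNextSpeaker`, and the sample IDs are hypothetical, and the function is kept synchronous and takes the candidates as a plain argument for brevity.

```typescript
// Hypothetical minimal character shape; the real schema has more fields.
interface CandidateCharacter {
  id: string;
  firstName: string;
}

// Map the LLM's free-text answer (ideally just a first name) to a character
// ID. Returns null when no candidate matches, so the caller can fall back.
function parseNextSpeaker(
  llmResponse: string,
  candidates: CandidateCharacter[]
): string | null {
  // Tolerate stray whitespace, quotes, trailing punctuation, and casing.
  const cleaned = llmResponse
    .trim()
    .replace(/^["']+|["'.!]+$/g, "")
    .toLowerCase();
  const match = candidates.find(
    (candidate) => candidate.firstName.toLowerCase() === cleaned
  );
  return match ? match.id : null;
}

const speakerCandidates: CandidateCharacter[] = [
  { id: "char-001", firstName: "Wendy" },
  { id: "char-002", firstName: "Jill" },
];

console.log(parseNextSpeaker(' "Wendy". ', speakerCandidates)); // char-001
console.log(parseNextSpeaker("Nobody", speakerCandidates)); // null
```

Returning null for an unrecognized name is just one design choice; it leaves the fallback (for example, picking a random candidate) to the caller.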

Full JavaScript support

Prompt templates in Yozakura are rendered by Eta, a templating engine with EJS-style syntax plus performance and other improvements.

This allows full support of JavaScript code in prompt templates. If you want to inject the current date into the prompt, you could add into the template:

The current date and time is <%= new Date().toLocaleString() %>

It supports all browser APIs and even async/await. The actual implementation here is pretty simple (just instantiate Eta and pass the template string and context in) so I won't go into too much detail, but if you're interested you can see the relevant code here.

AI-first editing workflow

While Eta makes the templating system very flexible and powerful, it also makes templates a PITA to read and write when they contain advanced logic like loops, async operations, etc., and they're pretty impenetrable if you're not a coder. For this reason, Yozakura generates a context document for each template. Users can simply copy it, paste it into their favorite frontier AI, describe what changes they want, and let the AI do the hard work. This document is available via the AI Assistant Instructions button:

Here's an example of Claude Opus 4.8 Extra helping me update the next speaker selection prompt:

Me: (uploads document for next speaker selection) Can you make the character named Wendy more talkative when it's windy in real-world Chicago? But only when Jill isn't around, she makes Wendy feel intimidated.

Claude: Only the system prompt template needs to change — the parser stays exactly as-is (it still just maps a name to an ID).

Here's how I interpreted your request, so you can adjust if I got something wrong:

Wendy only gets boosted if she's actually one of the speaker candidates this turn.

"Jill isn't around" = Jill is not a participant in the current conversation (it.participants). If she's present, no boost.

"Windy in Chicago" = live wind speed ≥ 20 mph from a real weather API.

For the weather I used Open‑Meteo, because it's free, needs no API key, and sends Access-Control-Allow-Origin: *, so it works straight from the browser without a CORS proxy (which Yozakura doesn't provide). The result is cached in globalWritableContext for 10 minutes so you're not hitting the API on every single speaker selection. If the fetch ever fails, it silently falls back to normal behavior.

Update: "Moderation Next Speaker (System)" textarea

Replace the entire contents with this:

<updated prompt template, too big to include here>

Boom, now I have live weather data influencing speaker selection, just by making a simple request to Claude.

Automatic Documentation

To support the context document feature, I needed a way to generate the documents efficiently and in a way that's in sync with the actual reality of what context gets passed into the prompt. Writing and maintaining these by hand would be a nightmare.

As it often does, one of my favorite JavaScript libraries of all time, Zod, comes to the rescue. It handles validation, typing, and documentation, all from a single source of truth in my code.

Essentially, I implement a complete list of all possible context fields in a Zod object schema, using Zod's .meta() function to document the meaning of each field. That looks like this:

const contextSchemaFields = z.object({
  focusedCharacter: characterSchema.meta({
    description: `The character who is in focus for this prompt (maybe they are the speaker, maybe they are having memories generated, etc.). The exact meaning of "in focus" depends on the specific prompt template.`,
  }),
  focusedCharacterAppearance: z.string().meta({
    description: `The concatenation of the focused character's baseAppearanceTags with all of their enabled wardrobes. This is a convenience field and doesn't encode any information beyond what is already available in focusedCharacter.wardrobes.`,
  }),
  ...
  ...
  ...
});

For anyone who's not familiar with Zod, a Zod schema like the above can be used as a validator (via mySchema.parse()), or a type (via z.infer<typeof mySchema>), or, less commonly known, as a documentation source (via mySchema.toJSONSchema()). Or of course, all of the above!

After declaring all of these fields in one master context field schema, I declare various subsets of these fields, each of which is available in a different execution context that prompts can run in. These subsets include:

GlobalExecutionContext: All prompts get these context fields, which are basically just application settings.
ScenarioExecutionContext: All prompts running inside of a scenario (as opposed to the main menu, character creator, etc.). Gets information such as the user character, current map, and all other characters. Inherits from GlobalExecutionContext.
ConversationExecutionContext: All prompts running in the context of a conversation currently happening. Gets information such as conversation participants, the conversation transcript so far, etc. Inherits from ScenarioExecutionContext.
FocusedConversationExecutionContext: All prompts running in the context of a conversation currently happening, and where there is a character who is currently the "subject of focus" (they are speaking, they are having an image generated, etc.). Inherits from ConversationExecutionContext.
...A few more

For example, declaring one of these looks like this:

export const focusedConversationExecutionContextSchema =
  conversationExecutionContextSchema.and(
    contextSchemaFields
      .pick({
        focusedCharacter: true,
        focusedCharacterAppearance: true,
        rollingConversationSummariesText: true,
      })
      .meta({
        description:
          'This is the context that is available to all prompts that run in the scope of a conversation between two or more characters, and where one of those characters is currently the subject of focus. The meaning of "subject of focus" depends on the specific prompt, but it means things like "the character who is currently speaking", "the character we are currently generating an image of", etc.',
      })
  );

export type FocusedConversationExecutionContext
  = z.infer<typeof focusedConversationExecutionContextSchema>;

There are a few more, but as you can see, they are hierarchical and they also have a standardized vocabulary. When a character is speaking or having a memory generated, they are the context.focusedCharacter, rather than separately being context.currentSpeaker and context.currentMemoryGeneratingCharacter or such.

Each prompt template group declares which context it needs, and requires that exact context to be constructed and passed in to its render function. The context builder is able to source most of the required context from the global Zustand state, requiring the caller to pass in only whatever non-global context is needed, if any.

Finally, this gives me all the pieces I need for my context doc generator. Since each template group knows which context it takes, it can expose a getDocumentation() function, which essentially calls this.contextSchema.toJSONSchema() to get a JSON Schema for its context. Through the power of Zod, this JSON Schema contains all the necessary type information for the context fields, and the descriptions I wrote for them. The JSON Schema then gets rendered into the "context document template" to generate text that's ready for hand-off to your favorite AI assistant.
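The last step, turning a JSON Schema into readable text, can be sketched roughly as follows. This is an assumption-laden illustration and not Yozakura's context document template: `JsonSchemaNode`, `renderContextDocs`, and the output format are hypothetical, and the input is a small hand-written JSON Schema standing in for what Zod would generate.

```typescript
// Minimal subset of JSON Schema needed for this sketch.
interface JsonSchemaNode {
  type?: string;
  description?: string;
  properties?: Record<string, JsonSchemaNode>;
}

// Render each field as an indented "- name (type): description" line,
// recursing into nested object properties.
function renderContextDocs(schema: JsonSchemaNode, indent = ""): string[] {
  const lines: string[] = [];
  for (const [fieldName, field] of Object.entries(schema.properties ?? {})) {
    const typeLabel = field.type ?? "unknown";
    const summary = field.description ?? "(no description)";
    lines.push(`${indent}- ${fieldName} (${typeLabel}): ${summary}`);
    lines.push(...renderContextDocs(field, indent + "  "));
  }
  return lines;
}

// Hand-written stand-in for the JSON Schema of a context.
const exampleContextSchema: JsonSchemaNode = {
  type: "object",
  properties: {
    focusedCharacter: {
      type: "object",
      description: "The character who is in focus for this prompt.",
      properties: {
        firstName: { type: "string", description: "The character's first name." },
      },
    },
    focusedCharacterAppearance: {
      type: "string",
      description: "Appearance tags combined with enabled wardrobes.",
    },
  },
};

const contextDocLines = renderContextDocs(exampleContextSchema);
console.log(contextDocLines.join("\n"));
// - focusedCharacter (object): The character who is in focus for this prompt.
//   - firstName (string): The character's first name.
// - focusedCharacterAppearance (string): Appearance tags combined with enabled wardrobes.
```

Because the descriptions travel with the schema, the rendered text changes automatically whenever a field or its description changes.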

Just to reiterate, I also get type-safety in my code. If I try to build a context in a way that doesn't conform to its documented schema, I'll get TypeScript errors. So this pattern makes it nigh impossible for the documentation and the code to get out of sync.

Schema Post-Processing

The schema from toJSONSchema() is sub-optimal in a couple of ways:

It doesn't make any attempt to single-source entity definitions in a $defs section, leading to unnecessary bloat.
It has ugly, deep stacks of allOf arrays, which complicate the documentation while adding little semantic utility. This is caused by using Zod's and() function to extend schemas; extend() doesn't do this, but I've found it has its own issues for this use case.

So I do some post-processing on it to repair that.
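As a rough idea of what flattening the allOf stacks involves, here is a minimal sketch. It is not the actual post-processing code: `SchemaNode` and `flattenAllOf` are hypothetical names, it assumes every allOf branch is a plain object schema, and it leaves out the $defs de-duplication entirely.

```typescript
// Minimal subset of JSON Schema needed for this sketch.
interface SchemaNode {
  type?: string;
  description?: string;
  properties?: Record<string, SchemaNode>;
  required?: string[];
  allOf?: SchemaNode[];
}

// Recursively merge nested allOf branches into a single flat object schema.
// Later branches win when property names collide. Assumes every branch is
// an object schema; other keywords on the branches are dropped.
function flattenAllOf(node: SchemaNode): SchemaNode {
  const { allOf, ...own } = node;
  if (!allOf && !own.properties) return own; // leaf: nothing to merge

  const parts = [...(allOf ?? []).map(flattenAllOf), own];
  const properties: Record<string, SchemaNode> = {};
  const required = new Set<string>();
  for (const part of parts) {
    for (const [key, child] of Object.entries(part.properties ?? {})) {
      properties[key] = flattenAllOf(child);
    }
    for (const key of part.required ?? []) required.add(key);
  }
  return { ...own, type: "object", properties, required: Array.from(required) };
}

// Two levels of nesting, similar in shape to chained and() calls.
const nestedSchema: SchemaNode = {
  allOf: [
    {
      allOf: [
        { type: "object", properties: { settings: { type: "object" } }, required: ["settings"] },
        { type: "object", properties: { userCharacter: { type: "object" } }, required: ["userCharacter"] },
      ],
    },
    { type: "object", properties: { transcript: { type: "string" } }, required: ["transcript"] },
  ],
};

const flatSchema = flattenAllOf(nestedSchema);
console.log(Object.keys(flatSchema.properties ?? {}));
// [ 'settings', 'userCharacter', 'transcript' ]
```

The flattened result describes the same fields as the nested one, but as a single object with one properties map, which is much easier to read in generated documentation.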

The same code is also used to generate the documentation for my GitHub Pages site, which you can see here.

As an aside, I pretty much vibe coded this whole post-processing logic file, and this is exactly the kind of task I value AI coding for the most: narrow integration point, small blast radius, easy to test. I could have spent most of a day writing this monstrosity myself.

Parser

As hinted before, prompt groups can have a custom parser function to transform the output from the LLM. The parser is also async and has access to the same context. For most use cases, no custom parser is needed and the default identity parser ((response) => response) is sufficient. But the "next speaker selection" example is one instance where a custom parser makes sense, and you can get creative with it. For regular conversation messages, for example, you could instruct the LLM to return JSON with message and facialExpression properties. Your custom parser could then return just the message part, while writing the facialExpression part into globalWritableContext for later use in your image generation prompt template.
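Such a parser could be sketched as below. The names are hypothetical: `parseConversationMessage` is made up, `writableContextStub` stands in for the real globalWritableContext, and `lastFacialExpression` is an arbitrary key chosen for this example. The function is kept synchronous for brevity, although a real parser may be async.

```typescript
// Stand-in for the shared writable object that the app would provide.
const writableContextStub: Record<string, unknown> = {};

// Expect a reply such as {"message": "...", "facialExpression": "..."}.
// Return only the message and stash the expression for a later template.
// Fall back to the raw response if the reply doesn't have that shape.
function parseConversationMessage(
  llmResponse: string,
  writableContext: Record<string, unknown>
): string {
  try {
    const parsed: unknown = JSON.parse(llmResponse);
    if (typeof parsed === "object" && parsed !== null) {
      const { message, facialExpression } = parsed as Record<string, unknown>;
      if (typeof message === "string") {
        if (typeof facialExpression === "string") {
          writableContext["lastFacialExpression"] = facialExpression;
        }
        return message;
      }
    }
  } catch {
    // Not valid JSON; fall through to the identity behaviour below.
  }
  return llmResponse;
}

const spokenMessage = parseConversationMessage(
  '{"message": "Nice breeze today!", "facialExpression": "smiling"}',
  writableContextStub
);

console.log(spokenMessage); // Nice breeze today!
console.log(writableContextStub["lastFacialExpression"]); // smiling
```

Falling back to the raw response keeps the conversation going even when the model ignores the JSON instruction, at the cost of occasionally showing unparsed output.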

globalWritableContext is a shared global, writable object available to all templates/parsers, which they can use to pass information around. Claude also suggested using it above in the weather example, to cache results from the weather API, which is good judgement.
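The caching idea can be sketched with two small helpers. These are hypothetical (`readCached`, `writeCached`, and the `chicagoWindMph` key are made up for illustration, and this is not the code Claude generated); the store is a plain object standing in for globalWritableContext, and the current time is passed in explicitly so the behaviour is easy to check.

```typescript
interface CacheEntry<T> {
  value: T;
  storedAtMs: number;
}

const TEN_MINUTES_MS = 10 * 60 * 1000;

// Return the cached value, or undefined if it is missing or has expired.
function readCached<T>(
  store: Record<string, unknown>,
  key: string,
  nowMs: number,
  maxAgeMs: number = TEN_MINUTES_MS
): T | undefined {
  const entry = store[key] as CacheEntry<T> | undefined;
  if (!entry || nowMs - entry.storedAtMs > maxAgeMs) return undefined;
  return entry.value;
}

// Store a value together with the time it was written.
function writeCached<T>(
  store: Record<string, unknown>,
  key: string,
  value: T,
  nowMs: number
): void {
  const entry: CacheEntry<T> = { value, storedAtMs: nowMs };
  store[key] = entry;
}

// Plain object standing in for the shared writable context.
const sharedStore: Record<string, unknown> = {};
writeCached(sharedStore, "chicagoWindMph", 24, 0);

console.log(readCached<number>(sharedStore, "chicagoWindMph", 5 * 60 * 1000)); // 24
console.log(readCached<number>(sharedStore, "chicagoWindMph", 11 * 60 * 1000)); // undefined
```

In a template, a cache miss would trigger the weather request with Date.now() as the timestamp, and a failed request would simply skip the write so that rendering continues without the weather data.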

Why no AI editor assistant in-app?

Yozakura has completions API integration and a chat interface, so wouldn't it be natural to build the "ask AI to change your template" flow directly into Yozakura? I'd actually like to do this in principle, but consider it a mere nice-to-have. The copy-paste flow is more flexible in that:

You get to leverage whatever tools and whatnot your favorite AI platform has, like web search, thinking, etc.
Some people don't have API access to frontier models. They might have a regular monthly subscription, while using local models in Yozakura.

I think the friction of the copy-paste flow is fairly minimal for the kinds of users who will actually want to edit templates.

Security?

It's important to note that full JavaScript support in prompt templates means that copy-pasting malicious code into templates can be harmful. Yozakura does have a warning for this:

I'd like to explore executing template renders in a Web Worker, which should be more secure, though it has some significant drawbacks as well. For this type of application, just trusting the user to take care of themselves may be the best solution (hey, the license agreement does say "no warranty"!)

What's next

I'm dying to implement tool calling and an agentic loop, so that AI characters can do things like add new characters to the scenario, update location descriptions, search past conversations, and more. Wouldn't that be rad? I started working on this a little but decided not to delay a release for it. I've already scope-creeped myself enough as-is.

This will likely require extending this documentation generation system, as JSON Schema doesn't support a function data type, and my off-the-shelf documentation renderer for my GitHub Pages site presumably doesn't either.

Conclusion

I love how magical this feature can feel with the code-execution capabilities and AI-first editing workflow. You can achieve some pretty surprising things with it; it's almost like a mini modding system.

Let me know what you think or if you have any questions!

DEV Community