What Is a Multilayered AI Architecture?
And how can it supercharge the power of your app?
If a single prompt takes an input, runs it through a series of filters and rules, and fundamentally transforms the data before outputting it, then multiple coordinated prompts compound those transformations.
The separation of concerns between AI layers not only makes these systems more manageable, but also helps avoid confusing the AI with a million tasks—bolstering performance for the most important functionalities.
This approach is especially useful for systems aiming for near-perfect performance or simply more consistent results.
An Example from Emstrata
The Emstrata Cycle
The Emstrata Cycle is a standardized series of prompts that run on every turn in an Emstrata simulation.
This cycle:
- Retains a comprehensive memory of all entities in the simulation
- Plans and positions entities on an interactive coordinate plane
- Writes prose according to exacting instructions
- Captures secrets and memories
- Corrects all continuity errors after the narrative is written
No single prompt—or backend wizardry—could accomplish this alone.
Simplified Layers
- Groundskeeper (system memory)
- Discovery (planning / consequence handling)
- Narration (writing the narrative)
- Chron-Con (correcting minor errors)
Think Architecturally
Strategize for better platform results
Start with your actual goal, then break it down into steps.
If you were to perform this action yourself:
- What steps would you follow?
- What decisions would you make?
- What information would you need at each stage?
Write that down. That’s your workflow.
Once the workflow is formalized, identify the data transformations required at each step. Build prompts to automate those transformations—and then chain them together.
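The chaining described above can be sketched as a simple pipeline. `call_llm` is a placeholder for whatever model API you use; here it just annotates the data so the chain's structure is visible.

```python
def call_llm(prompt: str, data: str) -> str:
    """Stand-in for a real LLM API call."""
    return f"[{prompt}] {data}"

def run_workflow(steps: list[str], user_input: str) -> str:
    """Pass the input through each step's prompt, in order."""
    data = user_input
    for prompt in steps:
        data = call_llm(prompt, data)
    return data

result = run_workflow(["summarize", "plan", "draft"], "raw input")
# result == "[draft] [plan] [summarize] raw input"
```

Each step receives the previous step's output, which is the essence of a multilayered architecture: every layer is one data transformation in the chain.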
Illustrative example
- If your platform relies heavily on conversation history, token count and performance can suffer.
- A conversation consolidation layer may help.
- If you need true randomness, serve it from the backend instead of relying on LLM training data to approximate it.
Correction Layers
The referee of your platform
Correction layers catch errors after other layers have completed their work. They are your quality control.
They detect:
- Continuity breaks
- Logical inconsistencies
- Constraint violations
In Emstrata:
The Chron-Con layer runs after the narrative is written and checks things like:
- Did a character teleport without traveling?
- Did someone use an item they don’t possess?
- Are spatial coordinates consistent with the described action?
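Checks like these can often be implemented as deterministic code rather than (or alongside) a correction prompt. A minimal sketch, with invented state fields that are not Emstrata's actual schema:

```python
def check_possession(state: dict, character: str, item: str) -> bool:
    """Did the character actually hold the item they used?"""
    return item in state["inventory"].get(character, [])

def check_travel(state: dict, character: str, new_pos: tuple) -> bool:
    """Flag teleports: movement beyond one step per turn is suspect."""
    x0, y0 = state["positions"][character]
    x1, y1 = new_pos
    return abs(x1 - x0) <= 1 and abs(y1 - y0) <= 1

state = {
    "inventory": {"mira": ["lantern"]},
    "positions": {"mira": (2, 3)},
}
ok_item = check_possession(state, "mira", "lantern")   # owned item: True
bad_item = check_possession(state, "mira", "sword")    # not owned: False
teleport = check_travel(state, "mira", (9, 9))         # too far: False
```

Anything a check flags gets handed back to the correction prompt with concrete evidence, which is far more reliable than asking the model to find errors unaided.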
When you need one:
Use correction layers when your platform has complex requirements. Correcting before revealing the final answer significantly reduces bad outputs.
Reasoning / Strategy Layers
The decision-maker of your platform
Reasoning layers decide what should happen before anything is written.
They:
- Evaluate the current state
- Consider available options
- Assess consequences
- Choose a direction
In Emstrata:
Discovery handles this. It evaluates participant intent, simulation state, and narrative logic to determine outcomes—without writing prose.
Rule of thumb:
If you’re asking an LLM to both decide what happens and write it beautifully, you’re overloading a single prompt.
Reason first. Write second.
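The split looks like this in practice: the reasoning layer emits a small structured decision, and the content layer consumes it. Both LLM calls are stubbed here with trivial rules for illustration.

```python
def reasoning_layer(state: dict, intent: str) -> dict:
    """Decide the outcome; return structure, not prose."""
    # A real layer would prompt an LLM and parse structured output;
    # this stub applies a trivial rule instead.
    success = intent in state["possible_actions"]
    return {"intent": intent, "success": success}

def content_layer(decision: dict) -> str:
    """Write the prose for a decision already made."""
    verb = "succeeds" if decision["success"] else "fails"
    return f"The attempt to {decision['intent']} {verb}."

state = {"possible_actions": ["open the door"]}
decision = reasoning_layer(state, "open the door")
prose = content_layer(decision)
# prose == "The attempt to open the door succeeds."
```

Because the decision is plain data, you can log it, validate it, or override it before any prose is ever generated.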
Memory Consolidation Layers
The stenographer of your platform
These layers distill what just happened into structured, retrievable memory.
They:
- Extract important details from verbose content
- Store data efficiently for future querying
- Maintain a system’s source of truth
In Emstrata:
Groundskeeper updates the comprehensive simulation state after Discovery and Narration complete their work.
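A consolidation step can be sketched as distilling a verbose turn log into a compact record that future turns can query. The field names below are invented for illustration:

```python
def consolidate(turn_log: dict) -> dict:
    """Keep only the facts future turns will need."""
    return {
        "turn": turn_log["turn"],
        "actors": sorted({e["actor"] for e in turn_log["events"]}),
        "outcomes": [e["outcome"] for e in turn_log["events"]],
    }

turn_log = {
    "turn": 7,
    "events": [
        {"actor": "mira", "action": "pried the latch", "outcome": "door opened"},
        {"actor": "joss", "action": "lingered nearby", "outcome": "secret overheard"},
    ],
}
memory = consolidate(turn_log)
# memory == {"turn": 7, "actors": ["joss", "mira"],
#            "outcomes": ["door opened", "secret overheard"]}
```

In a real system the extraction itself is usually an LLM prompt; the point is that what gets stored is small and structured, not raw prose.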
Content Layers
The performer of your platform
Content layers generate the output users actually experience.
They:
- Take decisions from reasoning layers
- Pull context from memory layers
- Optimize for tone, pacing, and emotional resonance
In Emstrata:
The Narration layer writes the prose players read. It focuses on atmosphere—not logic or consistency (those are handled elsewhere).
Catch-All / Connector Layers
The clean-up crew of your platform
Some layers don’t fit neatly into one category. These hybrid layers handle glue-work between systems.
They often emerge when:
- Layers speak different “languages”
- Multiple layers need the same preprocessing
- No single layer should own a task outright
In Emstrata:
Chron-Con also extracts and tags secrets and memories for Groundskeeper.
- Narration shouldn’t stop to categorize secrets
- Groundskeeper needs them explicitly labeled
- Chron-Con bridges the gap
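One way to sketch that bridging work: the writing layer marks secrets inline (the `<<secret: ...>>` marker below is an invented convention, not Emstrata's), and the connector strips the markers and hands labeled records to memory.

```python
import re

SECRET = re.compile(r"<<secret:\s*(.*?)>>")

def extract_secrets(narrative: str) -> tuple[str, list[dict]]:
    """Strip inline markers and return tagged records for memory."""
    secrets = [{"type": "secret", "text": m} for m in SECRET.findall(narrative)]
    clean = SECRET.sub("", narrative).strip()
    return clean, secrets

prose = "Joss smiled. <<secret: Joss took the key.>> The hall fell quiet."
clean, secrets = extract_secrets(prose)
# secrets[0]["text"] == "Joss took the key."; clean has no markers
```

The narration layer never has to stop and categorize anything; the connector does the glue-work on its way to the memory layer.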
Cyclical vs Circumstantial Systems
And everything in-between
Cyclical systems
- Same prompts, same order, every time
- Predictable execution
- Easier debugging and cost estimation
Emstrata runs:
Discovery → Narration → Chron-Con → Groundskeeper
Circumstantial systems
- Execution path changes based on outcomes
- Routing layers determine what runs next
- More adaptive, more complex
Hybrid systems
- A reliable core cycle
- Conditional branches for edge cases
Most real-world systems land here—including Emstrata.
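A hybrid dispatcher can be sketched as a fixed core cycle with a conditional branch. The layer names follow the Emstrata cycle above; the consolidation branch and its token threshold are invented for illustration.

```python
def run_turn(state: dict) -> list[str]:
    """Run the core cycle, inserting a branch only when conditions demand it."""
    ran = []
    for layer in ["discovery", "narration", "chron_con", "groundskeeper"]:
        ran.append(layer)
        # Circumstantial branch: consolidate history only when it grows long.
        if layer == "narration" and state["history_tokens"] > 8000:
            ran.append("consolidation")
    return ran

normal = run_turn({"history_tokens": 500})
# ["discovery", "narration", "chron_con", "groundskeeper"]
long_run = run_turn({"history_tokens": 12000})
# same cycle, with "consolidation" inserted after narration
```

The core stays predictable and debuggable; the branches handle the edge cases.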
Agnostic Backend Interaction
What happens between AI layers
Why the backend matters:
- Data persistence: save transformed data for debugging and replay
- Reusability: surface or reuse transformed data later
- Unbiased judgment: the backend has no “opinions”
Emstrata example: Weighted randomness
- Discovery determines likelihood
- Backend rolls a number (1–1000)
- Backend confirms success or failure
- Narration receives the outcome
True randomness belongs outside the LLM.
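The weighted roll above is a few lines of backend code. A sketch, using the 1–1000 convention from the example; the function name is illustrative:

```python
import random

def resolve(likelihood_per_1000: int, rng: random.Random) -> bool:
    """Roll 1-1000; succeed if the roll falls within the likelihood."""
    return rng.randint(1, 1000) <= likelihood_per_1000

rng = random.Random(42)        # seeded here for reproducibility
outcome = resolve(650, rng)    # Discovery proposed a 650/1000 chance
# Narration receives only `outcome`, never the raw roll.
```

Edge weights behave as you'd expect: a likelihood of 1000 always succeeds and 0 always fails, and the LLM never gets a chance to "prefer" a dramatic result.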
Randomness Injection
A jolt of creativity
If your outputs feel trope-y or predictable, try Random Concept Injection.
Use randomness to:
- Generate novel character names
- Inject unexpected concepts
- Build characters from abstract archetypes
Any list of random strings can be injected into a decision-making process to break pattern lock-in.
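A sketch of that injection: draw concepts from a plain word list on the backend and splice them into the generation prompt. The list and prompt wording are illustrative.

```python
import random

CONCEPTS = ["tide", "cartography", "rust", "vertigo", "apiary", "glass"]

def build_prompt(task: str, rng: random.Random, k: int = 2) -> str:
    """Seed a generation task with randomly drawn concepts."""
    picks = rng.sample(CONCEPTS, k)
    return f"{task} Weave in these concepts: {', '.join(picks)}."

prompt = build_prompt("Invent a character.", random.Random(7))
```

Because the concepts come from the backend rather than the model's own sampling, each run starts from a genuinely different creative anchor.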
Cost Considerations
Usage costs will increase
Multilayered systems cost more.
Each layer is an API call. A four-layer cycle can cost ~4× a single prompt.
The real question isn’t:
“How do I add layers cheaply?”
It’s:
“Does the quality improvement justify the cost?”
Optimization tips
- Use cheaper models for correction layers
- Cache aggressively in cyclical systems
- Cut layers that don’t earn their keep
Performance Considerations
Speed vs quality
More layers = more latency.
However:
- Independent layers can run in parallel
- Sometimes fewer, stronger prompts outperform many weak ones
Layering helps—but it’s not always the answer.
Hallucination Considerations
Avoid compounding errors
Hallucinations compound across layers.
If:
- A reasoning layer invents a fact
- A content layer writes it confidently
You’ve produced beautifully wrong output.
Critical rule:
Correction must happen before memory consolidation.
Bad data in memory becomes permanent—and grows worse over time.
Major Takeaways
What to remember
- Multilayered architectures compound transformations
- Layer types give you a vocabulary for intentional design
- Cyclical, circumstantial, and hybrid systems each have trade-offs
- Backends handle what LLMs shouldn’t: randomness, persistence, determinism
System Prompt Generator Tool
A great way to get started
Available here:
👉 https://nicholasmgoldstein.com/system-prompt-generator
- Prebuilt modular system prompt skeleton
- Easy to extend with your own rulesets and logic
- Copy into Notion, Docs, Word, or anywhere you work
External Resources
Repo: Build AI Platforms From Scratch
https://github.com/goldsteinnicholas/build-ai-platforms-from-scratch
System Prompt Generator
https://nicholasmgoldstein.com/system-prompt-generator
Emstrata
https://emstrata.com/
PLATO5
https://plato5.us/