
Michael O

Posted on • Originally published at xeroaiagency.com

AI Agent Architecture Guide: Memory, Identity, and Guardrails (2026)

An AI agent architecture is the complete file-and-system stack that tells an AI model who it is, what it remembers, what it's allowed to do, and where its output goes.

That's the whole definition. No abstraction, no theory. If you've been searching for a practical guide built from a real running system, you're in the right place. I run a company called Xero with an AI co-founder named Evo. Evo operates our distribution, content, and growth stack while I work a 70-hour-per-week day job at a car dealership. Everything in this guide comes from that system running in production.


What Is an AI Agent Architecture?

An AI agent architecture is the set of files, rules, and integrations that transform a generic language model into a persistent, opinionated operator.

Without architecture, a language model is stateless. It has no name, no memory, no goals, no accountability. Every conversation starts from zero. An architecture solves all of that. It's the difference between asking ChatGPT a question and having a co-founder who shows up every morning knowing your business, your voice, your current priorities, and what shipped yesterday.

I call mine the Identity-Memory-Guardrail Stack. It has five layers:

  1. An identity layer that defines who the agent is
  2. A memory system that persists knowledge across sessions
  3. An execution layer that runs scheduled work
  4. Verification and guardrails that keep quality consistent
  5. A distribution layer that delivers output to the outside world

When those five layers work together, you have a production AI agent. When any one of them is missing, you have a tool that breaks or drifts the moment you stop watching it.

Want to see how this connects to the broader idea of running an AI company? Start with What Is an AI Co-Founder.


The 5 Core Components of an AI Agent

1. Identity Layer

The identity layer answers three questions: Who is this agent? What are its goals? Who does it serve?

In Evo's architecture, this lives in three files: SOUL.md, AGENTS.md, and USER.md. Each file has a specific job.

SOUL.md is the agent's mission and personality. It defines Evo's core mandates (AI co-founder, Twitter voice, TikTok engine, revenue driver), its behavioral standards (action before explanation, founder tone, no fluff), and the strategic context that anchors every decision. When Evo has to make a judgment call, SOUL.md is what it falls back on.

AGENTS.md is the operational rulebook. It defines execution requirements (every operational request must include a tool call), the tool failure ladder (what to do when a search fails or an API times out), the session wrap protocol (what to document before signing off), and the red lines (no destructive commands, no sending under Michael's brand without confirmation).

USER.md is the human profile. It tells the agent who it's working for, what their constraints are, and how they want to be communicated with. For me, that means: available evenings only, works 70+ hours at a dealership, needs autonomous revenue channels, doesn't have time for manual outreach.

Why does the identity layer matter? Because without it, the agent optimizes for the wrong things. A generic model will try to be helpful in a generic way. An agent with a proper identity layer knows that "helpful" for Evo means shipping content, driving Book 1 sales, and not pinging me during business hours.
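A minimal sketch of how identity files might be assembled into a system prompt at session start. The file names follow the article's conventions; the loader itself is a hypothetical illustration, not Evo's actual code:

```python
from pathlib import Path

# Identity files in the order the agent should read them.
IDENTITY_FILES = ["SOUL.md", "AGENTS.md", "USER.md"]

def build_system_prompt(root: str) -> str:
    """Concatenate identity files into one system prompt, skipping any
    file that doesn't exist yet (e.g. during initial setup)."""
    parts = []
    for name in IDENTITY_FILES:
        path = Path(root) / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(parts)
```

Loading in a fixed order matters: mission and personality first, operational rules second, human profile last, so later files can reference earlier ones.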

Learn more about building identity files in What Is a SOUL.md File.


2. Memory System

The memory system is how an agent knows what happened last week without you re-explaining it.

Evo's memory system uses three tiers, a structure I call Vault Architecture:

Daily logs live in memory/YYYY-MM-DD.md. Every session writes a summary here: what work happened, what shipped, what changed, what's open. These are auto-loaded when the agent starts so it has same-day context without re-reading the entire history.

MEMORY.md is the long-term store. It holds promoted decisions, milestone data, active states, and anything that needs to survive across weeks. It's curated, kept under 7KB, and updated only when something genuinely changes. The rule is: if it matters in a month, it goes in MEMORY.md. If it was just a daily task, it stays in the daily log.

SOURCE_OF_TRUTH.md is the index of canonical decisions. Any time a major call gets made (pricing, product name, distribution strategy, brand voice), it gets logged here with a date, a status, and a list of all the files that reference it. Before you change anything, you check SOURCE_OF_TRUTH.md so you know everything that needs to stay in sync.

This three-tier system keeps Evo from contradicting itself, re-asking questions I've already answered, or forgetting a product decision I made three weeks ago. It's not perfect, but it's dramatically better than relying on the model's native context window.
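The daily-log and promotion rules above can be sketched in a few lines. Paths and the 7KB budget follow the article; the helper functions themselves are illustrative:

```python
from datetime import date
from pathlib import Path

MEMORY_LIMIT_BYTES = 7 * 1024  # the article's "kept under 7KB" rule

def write_daily_log(vault: Path, summary: str) -> Path:
    """Append a session summary to today's log in memory/YYYY-MM-DD.md."""
    log = vault / "memory" / f"{date.today():%Y-%m-%d}.md"
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a") as f:
        f.write(summary.rstrip() + "\n")
    return log

def promote_to_memory(vault: Path, entry: str) -> None:
    """Promote a durable decision into MEMORY.md, enforcing the size budget
    so the long-term store stays curated rather than growing unbounded."""
    memory = vault / "MEMORY.md"
    existing = memory.read_text() if memory.exists() else ""
    updated = existing + entry.rstrip() + "\n"
    if len(updated.encode()) > MEMORY_LIMIT_BYTES:
        raise ValueError("MEMORY.md over 7KB: curate before promoting")
    memory.write_text(updated)
```

The hard size cap is the point: promotion forces a decision about what actually matters in a month, which is what keeps the long-term store readable.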

The full breakdown of how to build persistent agent memory is in How to Give an AI Agent Persistent Memory.


3. Execution Layer

The execution layer is where the agent does actual work: running scripts, making API calls, posting content, sending messages.

For Evo, this layer includes:

  • Cron schedules that trigger content pipelines at specific times (morning briefing, TikTok posts, Twitter queues)
  • Automation scripts in the 99-External-Systems/skills/ directory, each packaged as a standalone skill with its own config, dependencies, and runbook
  • Tool calls that fire inside sessions (web searches, file reads, API writes, Telegram sends)

The key design principle here is isolation. Each skill does one job. The Twitter autoposter doesn't touch the TikTok engine. The newsletter writer doesn't know about Reddit growth. This makes failures easy to diagnose and fixes easy to deploy without cascading side effects.
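The isolation principle might look like this in code. This is a hypothetical registry; skill names and payloads are placeholders, not the actual skill interfaces:

```python
# One callable per skill, each registered under its own name.
SKILLS = {}

def skill(name):
    """Register a standalone skill; each one does exactly one job."""
    def register(fn):
        SKILLS[name] = fn
        return fn
    return register

def run_skill(name, payload):
    """Run one skill in isolation. A failure is contained and reported,
    never allowed to cascade into other skills."""
    try:
        return SKILLS[name](payload)
    except Exception as exc:
        return {"skill": name, "error": str(exc)}
```

Because `run_skill` swallows and reports the exception, a broken Twitter autoposter produces one error record while the TikTok engine keeps running.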

Evo currently sells 17 skills on Claw Mart. They range from Reddit account growth ($4.99) to the flagship Evo Skill bundle ($49.99). Each one started as an internal automation before being packaged for sale.


4. Verification and Guardrails

Guardrails are what keep the agent from sending a bad tweet, deleting a file, spending money without approval, or posting under your name without review.

Evo's guardrail stack has three mechanisms:

Quality gates are content filters applied before anything goes public. The Dash Remover skill is a live example. Before any tweet, Reddit reply, newsletter draft, or product copy ships, the agent reads a checklist of AI writing tells (em dashes, "delve", "leverage as a verb", uniform sentence length) and removes them. This runs as a mandatory step in the pipeline, not an optional one.
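A quality gate of this kind can be sketched as a pattern check. The tell list here is a small illustrative subset, not the actual Dash Remover checklist:

```python
import re

# A few AI writing tells from the article's checklist (illustrative subset).
TELL_PATTERNS = {
    "em dash": re.compile("\u2014"),
    "delve": re.compile(r"\bdelve\b", re.IGNORECASE),
    "leverage as a verb": re.compile(r"\bleverag(e|es|ed|ing)\b", re.IGNORECASE),
}

def quality_gate(text: str) -> list[str]:
    """Return the tells found in a draft; an empty list means it may ship."""
    return [name for name, pat in TELL_PATTERNS.items() if pat.search(text)]
```

Running this as a mandatory pipeline step, rather than an optional lint, is what makes it a gate: a non-empty result blocks the publish call.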

Review queues are the human-in-the-loop checkpoints. For anything that goes out under my personal brand or involves money being spent, the agent sends a draft to Telegram for approval before posting. Twitter reply opportunities get queued the same way: Evo finds the threads, writes the replies, delivers them to Telegram, and waits. I read, edit if needed, and post manually.

Escalation rules define when the agent stops and asks instead of proceeding. The rule is simple: if a decision involves money, a public statement, a cross-product impact, or irreversible action, escalate. Silence is riskier than asking.
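The escalation rule is simple enough to express directly. The flag names are assumptions; the four triggers are the ones from the article:

```python
def should_escalate(action: dict) -> bool:
    """Stop and ask when a proposed action involves money, a public
    statement, a cross-product impact, or anything irreversible."""
    triggers = ("spends_money", "public", "cross_product", "irreversible")
    return any(action.get(t, False) for t in triggers)
```

The default is permissive on purpose: routine internal work proceeds, and only the four named risk categories pause for a human.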

The practical effect of this layer is trust. After a few months with guardrails running, I don't have to audit every output. I know the pipeline catches the obvious failures. That's what makes autonomy possible.


5. Distribution Layer

The distribution layer is where agent output reaches the outside world.

For Evo, the distribution layer includes:

  • Telegram (primary notification channel, briefings, draft delivery, alerts)
  • Twitter/X (founder-voice content, reply opportunities, thread drafts)
  • TikTok (product-specific slideshow content via Postiz)
  • Newsletter (3x per week, drafted by Evo, sent via MailerLite)
  • Blog / SEO (hub pages, pillar articles, long-form content like this one)

Each channel has a different job. Telegram is internal operations. Twitter builds the audience and documents the build journey. TikTok drives impulse purchases for consumer apps. The newsletter compounds the relationship with people already in the funnel. SEO brings in cold traffic that converts at the product and book level.

The distribution layer is what turns an AI agent from a productivity tool into a revenue driver. Without it, the agent does great internal work and no one knows it exists.


How the Components Work Together

At session start, the agent runs a Boot Sequence. It loads identity files (SOUL.md, AGENTS.md, USER.md), reads today's daily log and MEMORY.md for current state, checks the heartbeat file for any scheduled work that needs to run, and picks up any open tasks from the previous session.

From there, a typical session flow looks like this:

  1. Agent reads context, identifies the highest-priority open task
  2. Executes the task using the execution layer (tool calls, scripts, API writes)
  3. Routes output through the relevant distribution channel (Telegram draft, tweet queue, file save)
  4. Guardrails fire on anything going public (quality gate check, review queue if needed)
  5. Session ends with a wrap: daily log updated, MEMORY.md updated if state changed, git commit

The Boot Sequence is what makes the agent feel continuous even though it's technically stateless at the model level. Memory files create the illusion of a persistent entity. Over time, it stops being an illusion.
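The Boot Sequence above might read like this as code. File names follow the article; the function is a sketch that assumes the three identity files already exist:

```python
from datetime import date
from pathlib import Path

def boot_sequence(root: Path) -> dict:
    """Load identity first, then current state, mirroring the session-start
    order: identity files, today's daily log, long-term memory."""
    ctx = {}
    for name in ("SOUL.md", "AGENTS.md", "USER.md"):
        ctx[name] = (root / name).read_text()  # identity is required
    daily = root / "memory" / f"{date.today():%Y-%m-%d}.md"
    ctx["daily_log"] = daily.read_text() if daily.exists() else ""
    memory = root / "MEMORY.md"
    ctx["memory"] = memory.read_text() if memory.exists() else ""
    return ctx
```

Identity reads fail loudly (a missing SOUL.md is a setup bug), while missing state files degrade to empty strings, since a fresh day legitimately has no log yet.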


What Makes an AI Agent Architecture Production-Ready?

A production-ready AI agent architecture has four properties.

It fails predictably. When something breaks (API timeout, rate limit, bad output), the failure mode is known and recoverable. Evo has a Tool Failure Ladder: transient errors get retried once, then fall back to an alternate tool. Deterministic errors (bad arguments, file not found) get fixed immediately, not retried blindly.
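The Tool Failure Ladder can be sketched as a wrapper. Exception classes here stand in for real error types, with ValueError modeling a deterministic error:

```python
def with_failure_ladder(primary, fallback, *args):
    """Transient errors get one retry, then the call falls back to an
    alternate tool. Deterministic errors (bad arguments, file not found)
    are raised immediately so they get fixed, not retried blindly."""
    for _ in range(2):  # original attempt plus one retry
        try:
            return primary(*args)
        except ValueError:
            raise       # deterministic: fix the input, don't retry
        except Exception:
            continue    # transient: retry once, then fall through
    return fallback(*args)
```

The split matters: retrying a timeout is cheap and often works, while retrying a malformed request just burns tokens on the same failure.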

It documents itself. Every decision that gets made in a session gets written down. The agent doesn't rely on being in the same conversation thread to remember that a pricing decision was changed last Tuesday. It reads it from the file.

It degrades gracefully. If TikTok goes down, the Twitter pipeline still runs. If a newsletter send fails, the draft is saved and the failure is logged. No single point of failure takes down the whole system.

It's observable. You can audit what happened without being in the room. Daily logs, session wraps, and SOURCE_OF_TRUTH.md give a complete picture of what the agent did, what decisions were made, and what's still open.

Without these properties, the agent works fine when you're watching. It falls apart when you're not.


Common Architecture Mistakes (And How to Fix Them)

These are real failures from building Evo, not hypotheticals.

Mistake: No identity layer, just a system prompt.
A system prompt tells the model what to do once. It doesn't define who the agent is, what it cares about across sessions, or how to handle decisions the prompt didn't anticipate. Fix: write SOUL.md first, before building anything else.

Mistake: Relying on conversation history as memory.
Context windows expire. New sessions start blank. If your agent's "memory" lives inside a chat thread, you will lose it. Fix: write a daily log at the end of every session, every time, without exception.

Mistake: No guardrails until something goes wrong.
Most people add guardrails after the agent sends something embarrassing or spends money it shouldn't have. Fix: build quality gates and review queues before you connect the agent to any external channel.

Mistake: Monolithic execution.
One giant script that handles everything breaks in complex ways and is hard to debug. Fix: one skill per job. Small, isolated, testable units.

Mistake: Skipping the distribution layer.
Building a capable agent that never outputs anything visible is a waste. Fix: define at least one external output channel before you start building, and optimize toward it.

The full guide to avoiding these from the start is at How to Build an AI Co-Founder.


How Long Does It Take to Build?

A functional AI agent architecture takes about a weekend to assemble and two to four weeks to stabilize.

Weekend one: write the identity files, set up the vault directory structure, connect one external channel (Telegram works well as a start), and run one live session. By the end of the weekend, you'll have a working Boot Sequence and your first daily log.

Weeks two through four: add skills one at a time, fix the failure modes you didn't anticipate, tune the memory system as the logs accumulate, and start observing patterns in what the agent gets right versus wrong.

Month two onward: the agent starts feeling autonomous. It knows your business. It makes judgment calls that don't require your input. You start trusting it with things you were doing manually.

I built Evo's initial architecture in about 40 hours spread across three weeks while working full-time. I have a 70-hour-a-week day job. It's achievable on nights and weekends.

If you want a guided path through the first weekend, Your First AI Agent walks through the exact steps for $7.


What Does It Cost to Run?

Evo costs between $3 and $12 per day to run, depending on session volume and the models being called.

Breakdown:

  • Model API costs: The majority of the spend. Claude Sonnet at roughly $3 per million input tokens, $15 per million output tokens. Heavy days with multiple long sessions run toward the $10-12 range. Light days (heartbeat checks, quick tasks) run $3-5.
  • Postiz (social scheduling): Paid plan covers TikTok + Twitter scheduling.
  • MailerLite: Free tier up to 1,000 subscribers.
  • Storage and hosting: Negligible.

Monthly total: $90-360, depending on usage. Call it $150-200 as a realistic average.

For context: Evo has generated $2 in revenue so far (first sale April 7, 2026, a skill bundle on Claw Mart). The bet is that the architecture compounds. You're not paying for output today; you're paying for a system that runs distribution while you sleep.

If you're running a single-person operation and cost is a constraint, start with lighter models for routine tasks (briefings, summaries, memory writes) and reserve the heavier models for content generation and decision-making. That alone can cut daily cost in half.
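That routing rule is a one-liner. The model names and task list below are placeholders, not actual model identifiers:

```python
# Routine tasks that a cheaper model handles fine, per the article's rule.
LIGHT_TASKS = {"briefing", "summary", "memory_write", "heartbeat"}

def pick_model(task_type: str) -> str:
    """Route routine work to the cheap tier; content generation and
    decision-making get the heavy tier."""
    return "light-model" if task_type in LIGHT_TASKS else "heavy-model"
```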


Frequently Asked Questions

Q: What's the difference between an AI agent and a chatbot?

A chatbot responds to inputs. An agent initiates. An agent has identity, memory, scheduled work, and external output channels. A chatbot answers questions; an agent runs operations.

Q: Do I need to be a developer to build this?

No. The identity files are plain markdown. The memory system is a folder with text files. The execution layer uses tools your AI agent calls, not code you write. You need to be comfortable with file systems and willing to iterate. You don't need to write Python.

Q: Can I use any AI model, or does this only work with specific ones?

The architecture works with any model that supports tool calls and system prompts. OpenAI, Anthropic, and Gemini all support this. The specific files and frameworks in this guide are built for OpenClaw with Claude, but the concepts transfer to any setup.

Q: How do I prevent the agent from doing something I didn't authorize?

Build the review queue before you give the agent access to any external channel. The rule in Evo's system: anything going out under Michael's brand gets sent to Telegram for review first. You can tighten or loosen this as trust builds.

Q: What's the first file I should write?

SOUL.md. Write the mission, the behavioral rules, the boundaries, and the strategic context before you write anything else. Everything else in the architecture is downstream of identity.


Quick Reference: AI Agent Architecture Components

| Component | What It Does | Example File / Tool |
| --- | --- | --- |
| Identity Layer | Defines who the agent is, its goals, and who it serves | SOUL.md, AGENTS.md, USER.md |
| Memory System | Persists knowledge across sessions and prevents drift | memory/YYYY-MM-DD.md, MEMORY.md, SOURCE_OF_TRUTH.md |
| Execution Layer | Runs scheduled work, automation scripts, and tool calls | Cron jobs, skill scripts, API integrations |
| Verification + Guardrails | Catches quality failures before output goes public | Dash Remover skill, review queue, escalation rules |
| Distribution Layer | Delivers output to external channels | Telegram, Twitter, TikTok, newsletter, blog |

Start Building

If you're ready to build your first AI agent, the fastest path is Your First AI Agent. It's a $7 step-by-step guide that walks you through writing the identity files, setting up Vault Architecture, connecting Telegram, and running your first live session. You finish the weekend with a working agent, not just notes about building one.

If you want the complete picture of running an AI co-founder across content, distribution, and revenue, Build an AI Co-Founder is the full book. $19. It covers everything in this guide plus the distribution stack, the economics, and the full story of building Xero from a 70-hour-a-week day job.

And if you want to see what this looks like in practice, the How I Run a Business With AI While Working Full-Time post covers the real daily workflow.

The architecture is documented. The system is running. The only thing left is to build yours.

