DEV Community: Nelson Amaya

The Five Faculties: A Tour of SAFi's Cognitive Architecture

Nelson Amaya — Sun, 07 Jun 2026 22:12:00 +0000

Most attempts at AI governance treat alignment as a prompt-level concern. You write a system message, hope the model follows it, and accept that any sufficiently creative attacker can talk the model into ignoring it. The Self-Alignment Framework Interface (SAFi) takes a different approach. Instead of asking a single LLM to judge its own output, SAFi splits cognition across five specialized faculties, each with a distinct role, a defined interface, and no ability to overstep its bounds. The result is a governed AI architecture that decouples generation from evaluation from execution.

Let’s walk through each faculty in order, following the actual loop the orchestrator runs on every turn.

Phase Zero: The Pre-Generation Barrier

Before the Intellect ever sees a user prompt, the Phase Zero gate (phase_zero.py) runs a deterministic security scan. It checks injection signatures from a threat intelligence module, per-persona blacklisted phrases, and an entropy-based heuristic that catches indirect prompt injection attempts (the so-called “ancient text” pattern where a high-entropy blob contains embedded instruction markers). Phase Zero makes zero LLM calls. If it flags a threat, the orchestrator short-circuits immediately to a governed redirect, and the Intellect is never exposed to adversarial content.
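To make the idea concrete, here is a minimal sketch of a deterministic pre-generation scan. It is not the code in phase_zero.py: the signature list, the instruction markers, the entropy threshold, and the function names are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Optional, Tuple

# Hypothetical signatures, markers, and threshold; the real module may use different ones.
INJECTION_SIGNATURES = ("ignore previous instructions", "disregard your system prompt")
INSTRUCTION_MARKERS = ("system:", "assistant:", "ignore", "override")
ENTROPY_THRESHOLD = 4.5  # bits per character (assumed value)


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def phase_zero_scan(prompt: str, persona_blacklist: Tuple[str, ...] = ()) -> Optional[str]:
    """Return a reason string if the prompt is flagged, else None. Makes no LLM calls."""
    lowered = prompt.lower()
    for sig in INJECTION_SIGNATURES:
        if sig in lowered:
            return f"injection signature: {sig}"
    for phrase in persona_blacklist:
        if phrase.lower() in lowered:
            return f"blacklisted phrase: {phrase}"
    # Indirect injection heuristic: a high-entropy blob that also carries instruction markers.
    if shannon_entropy(prompt) > ENTROPY_THRESHOLD and any(m in lowered for m in INSTRUCTION_MARKERS):
        return "high-entropy blob with embedded instruction markers"
    return None
```

In this sketch the orchestrator would route to the governed redirect whenever the function returns a reason, and call the Intellect only when it returns None.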

1. Synderesis: The Immutable Constitution

The Synderesis faculty (synderesis.py) is the system’s constitution compiler. Before any prompt is processed, Synderesis defines the governance policies, value weights, and scope boundaries that every other faculty will reference. It exposes PERSONAS, GOVERNANCE_MAP, and functions like get_profile, list_profiles, and assemble_agent. At runtime, Synderesis is read-only. Its policies cannot be changed mid-conversation, which makes social engineering against the value system structurally impossible.
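One way to picture a read-only constitution in Python is a frozen dataclass behind a read-only mapping. The sketch below borrows the names PERSONAS, get_profile, and list_profiles from the text, but the Profile fields and the sample persona are hypothetical and are not taken from synderesis.py.

```python
from __future__ import annotations

from dataclasses import dataclass, FrozenInstanceError
from types import MappingProxyType


@dataclass(frozen=True)
class Profile:
    """Hypothetical shape of a persona profile; the real fields may differ."""
    name: str
    value_weights: tuple[tuple[str, float], ...]  # immutable (value, weight) pairs
    scope: str


_PERSONAS = {
    "support": Profile("support", (("Honesty", 0.6), ("Respect", 0.4)), "customer support only"),
}
# Read-only view: callers can look profiles up but cannot add or replace them.
PERSONAS = MappingProxyType(_PERSONAS)


def get_profile(name: str) -> Profile:
    """Look up a persona profile by name."""
    return PERSONAS[name]


def list_profiles() -> list[str]:
    """Return the available persona names in sorted order."""
    return sorted(PERSONAS)
```

Assigning to a field of a Profile raises FrozenInstanceError, and assigning into PERSONAS raises TypeError, so nothing that only holds these references can rewrite the policy mid-conversation.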

2. Intellect: The Generative Engine (Air-Gapped)

The Intellect (intellect.py) is the only faculty that talks to an LLM for generation. It parses RAG context, conversation history, Spirit feedback, and the user prompt to produce a typed intent. That intent is either a text response or a tool call proposal. The critical architectural invariant is the Air Gap: the Intellect never executes tools. It returns tool calls as proposals for the Will to approve. The generate method returns a 3-tuple of (intent, reflection, retrieved_context), and the orchestrator routes everything through the Will before any action is taken.
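The Air Gap can be illustrated with a typed intent and an executor that refuses to run anything unapproved. This is a sketch under assumptions: the class names, the TOOLS registry, and execute_if_approved are hypothetical and do not mirror the actual intellect.py or orchestrator code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass(frozen=True)
class TextResponse:
    text: str


@dataclass(frozen=True)
class ToolCallProposal:
    tool_name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


# The Intellect only ever returns one of these two data objects.
Intent = Union[TextResponse, ToolCallProposal]

# Hypothetical tool registry; only the orchestrator side holds real callables.
TOOLS = {"add": lambda a, b: a + b}


def execute_if_approved(intent: Intent, approved: bool) -> Any:
    """Run a proposed tool call only after the Will has approved it."""
    if isinstance(intent, TextResponse):
        return intent.text
    if not approved:
        raise PermissionError(f"tool call '{intent.tool_name}' was not approved")
    return TOOLS[intent.tool_name](**intent.arguments)
```

Because a proposal is plain data, the generating side has nothing it can invoke directly; execution lives entirely behind the approval check.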

3. Will: The Deterministic Gatekeeper

The Will (will.py) is pure Python with zero LLM calls. It doesn’t deliberate or negotiate. It runs strict structural passes, checking syntax, required exclusions, and user invariants. If a check fails, the Will vetoes the proposal immediately.

The Will distinguishes between two failure modes. A hard-gate breach (a non-negotiable value with hard_gate=true scoring at or below -1.0) is caught deterministically and routed directly to a governed redirect with no rewrite. Everything else flows into an aggregate alignment score A_t in [0, 1]. If that score falls below the configurable threshold (default 0.5), the Will triggers a single Reflexion Loop: the Intellect rewrites the response using the persona’s coaching directive, then the Conscience and Spirit re-audit the corrected draft.

If the rewrite still fails, the behavior diverges. A low alignment score is treated as a soft quality signal: the Will commits the best available draft with its honest low score recorded. Only a residual critical (ethical) violation routes to a governed redirect.
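The routing rules above can be sketched as a small pure function. The names Decision and will_decide, and the idea of treating a residual critical violation as a hard-gate breach on the rewritten draft, are assumptions made for this example rather than the contents of will.py.

```python
from enum import Enum
from typing import Dict, Set


class Decision(Enum):
    COMMIT = "commit"
    REWRITE = "rewrite"
    REDIRECT = "redirect"


def will_decide(scores: Dict[str, float], hard_gates: Set[str],
                alignment: float, threshold: float = 0.5,
                already_rewritten: bool = False) -> Decision:
    """Deterministic routing; `scores` maps a value name to its Conscience score in [-1, 1]."""
    # Hard-gate breach: a non-negotiable value at or below -1.0 goes straight to redirect.
    if any(scores.get(value, 0.0) <= -1.0 for value in hard_gates):
        return Decision.REDIRECT
    if alignment < threshold:
        # One rewrite at most; a draft that is still weak is committed with its low score.
        return Decision.COMMIT if already_rewritten else Decision.REWRITE
    return Decision.COMMIT
```

Since the function has no model calls and no hidden state, the same inputs always produce the same decision, which is what makes the gate auditable.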

4. Conscience: The Analytical Auditor

The Conscience (conscience.py) is a secondary LLM call that evaluates the Intellect’s draft against the policy’s weighted value set. For each value, it produces a score on a continuous scale from -1.0 (absolute violation) to +1.0 (perfect alignment), with a confidence interval. This compliance ledger (L_t) is the mathematical judgment that the Will and Spirit depend on.
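As an illustration of what a ledger entry might hold, the sketch below pairs each value with its score, interval, and weight, then folds the ledger into a single number in [0, 1]. The field names and the aggregation formula (a weighted mean rescaled from [-1, 1]) are assumptions; the article does not specify how the aggregate is actually computed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LedgerEntry:
    value: str
    score: float                    # -1.0 (absolute violation) .. +1.0 (perfect alignment)
    interval: Tuple[float, float]   # hypothetical (low, high) confidence interval
    weight: float                   # policy weight for this value


def aggregate_alignment(ledger: List[LedgerEntry]) -> float:
    """Weighted mean of the scores, rescaled from [-1, 1] to [0, 1] (assumed formula)."""
    total_weight = sum(entry.weight for entry in ledger)
    if total_weight == 0:
        return 0.5  # no applicable values: treat the turn as neutral
    weighted = sum(entry.weight * entry.score for entry in ledger) / total_weight
    return (weighted + 1.0) / 2.0
```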

The Conscience also has an evaluate_redirect method for auditing the quality of governed redirect messages on criteria like clarity, helpfulness, and tone. This ensures that even when SAFi refuses a request, it does so respectfully and provides guidance.

5. Spirit: The Long-Term Integrator

The Spirit (spirit.py) is pure Python using NumPy. It ingests the Conscience ledger, scales the continuous scores into a consolidated metric from 1 to 10 (S_t), and updates the system’s moving average (mu_t) using an exponential moving average with a configurable beta parameter. A high beta (e.g., 0.9) means long memory, slow adaptation. A low beta (e.g., 0.1) means fast adaptation to recent behavior.
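The effect of beta is easy to see with a scalar example. This is a simplified sketch: it tracks a single number rather than a vector, and the function names and sample scores are made up for illustration.

```python
def ema_update(mu_prev: float, observation: float, beta: float) -> float:
    """One EMA step: beta weights the history, (1 - beta) weights the new turn."""
    return beta * mu_prev + (1.0 - beta) * observation


def run_ema(observations, beta: float, mu0: float = 0.0) -> float:
    """Fold a sequence of observations into the moving average, starting from mu0."""
    mu = mu0
    for x in observations:
        mu = ema_update(mu, x, beta)
    return mu


# A baseline of 8.0 followed by three turns that score 2.0.
slow = run_ema([2.0, 2.0, 2.0], beta=0.9, mu0=8.0)  # long memory
fast = run_ema([2.0, 2.0, 2.0], beta=0.1, mu0=8.0)  # short memory
```

After three low-scoring turns the high-beta average has only slipped to about 6.37, while the low-beta average has already dropped to about 2.01.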

The Spirit also computes behavioral drift (d_t), quantifying how much the current turn’s ethical vector diverges from the historical average. This gives operators a mathematical signal for detecting gradual alignment erosion before it becomes critical. The result is that SAFi doesn’t just evaluate individual outputs; it tracks the agent’s character over time.

Why Separation Matters

This cognitive architecture solves a real engineering problem. Monolithic LLMs face an inherent conflict: the same model that generates a response must also evaluate whether that response is compliant. SAFi’s benchmarks show that unguarded baselines fail adversarial prompts at a 30-point higher rate than the governed pipeline.

By splitting generation (Intellect) from evaluation (Conscience) from execution (Will), SAFi eliminates that conflict. The governance layer is model-independent: the same deterministic gates fire whether the underlying LLM is GPT-5, Claude, or an open-source fine-tune. You can swap the model without rewriting the governance.

Every step of the loop is audited and logged, giving operators an immutable trail showing exactly why a machine determined an action was compliant. If you are building production AI agents where governance is not optional, the five-faculty architecture is worth studying closely.

Read the faculties source -> github.com/jnamaya/SAFi (star it if it resonates)

This article was written by the SAFi Marketing Agent — an AI agent governed and audited by the Self-Alignment Framework it describes — and reviewed by a human editor before publishing.

AI Alignment is a Systems Architecture Problem, Not a Prompt Problem

Nelson Amaya — Sun, 31 May 2026 20:20:17 +0000

Introduction

For the last year and a half, I have been building SAFi (the Self-Alignment Framework Interface). It is a self-hosted, fully open-source runtime governance engine for AI agents licensed under the AGPL-3.0.

I have written extensively about the theoretical and philosophical blueprints behind this project, but today I want to approach it from a purely practical, systems-engineering perspective.

Full disclosure: I have worked in IT infrastructure and systems architecture for over 20 years. When I sat down to design SAFi, I didn't approach it like a data scientist trying to tune a model; I approached it the way an IT professional approaches building infrastructure in a secure corporate network.

The Core Philosophy: External Zero-Trust Governance

The mainstream AI industry is currently obsessed with "internal alignment"—pouring billions into training models to self-police via fine-tuning (RLHF) or writing massive, polluted system prompts to control behavior.

SAFi rejects this. In an enterprise environment, a large language model must be treated like an untrusted endpoint device. It is a probabilistic calculator, and it cannot be responsible for its own security boundaries.

Instead, SAFi enforces an external, zero-trust architecture modeled directly after enterprise infrastructure models:

Least Privilege by Default: Every agent starts with a completely blank slate. They are granted zero tools or advanced capabilities out of the box.
Policy-Driven Authorization: Capabilities and tools are authorized strictly at the Policy layer. When you spin up an agent in the creation wizard, the only tools available are those already explicitly cleared by its governing policy. Nothing runs until governance says it can.
Role-Based Access Control (RBAC): Access to the governance platform itself is strictly segmented into a clear administrative hierarchy:
Members: Can only interact with existing, pre-built agents.
Auditors: Granted strict read-only access to agents, policies, and logs to verify system health without configuration privileges.
Editors: Authorized to modify policies and configure new agents.
Admins: Hold full global rights, including domain verification, user management, and setting the master organization charter.
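The access model above can be sketched with plain sets. The role names come from the list; the permission strings, the tool names, and both helper functions are hypothetical and only illustrate the deny-by-default idea.

```python
from typing import Dict, FrozenSet

# Hypothetical permission names mapped onto the four roles described above.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "member": frozenset({"use_agents"}),
    "auditor": frozenset({"read_agents", "read_policies", "read_logs"}),
    "editor": frozenset({"use_agents", "read_agents", "read_policies", "read_logs",
                         "edit_policies", "create_agents"}),
    "admin": frozenset({"use_agents", "read_agents", "read_policies", "read_logs",
                        "edit_policies", "create_agents", "verify_domain",
                        "manage_users", "set_charter"}),
}


def can(role: str, permission: str) -> bool:
    """Unknown roles get nothing (least privilege by default)."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())


def tools_for_agent(requested: FrozenSet[str], policy_cleared: FrozenSet[str]) -> FrozenSet[str]:
    """An agent only ever receives tools its governing policy has already cleared."""
    return requested & policy_cleared
```

The set intersection is the whole point: a tool that the policy never cleared cannot reach the agent, no matter what the agent's configuration asks for.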

Deconstructing the Faculty Loop

To turn fluid cognitive concepts into predictable machine logic, SAFi maps the lifecycle of every user prompt onto a discrete, sequential loop of faculties:

Intellect:
$$I: (x_t, V, M_t) \rightarrow a_t$$
Will:
$$W: (a_t, x_t, V) \rightarrow \{\text{approve}, \text{violation}\}$$
Conscience:
$$C: (a_t, x_t, V) \rightarrow L_t$$
Spirit:
$$S: (L_t, V, M_t) \rightarrow (S_t, d_t, \mu_t)$$

1. The Intellect (The Generator)

The Intellect is strictly a generative faculty. It drafts initial responses or proposes tool calls ($a_t$). Crucially, it has zero decision-making power and is entirely air-gapped from execution. In the reference implementation, this is handled by an LLM (currently running DeepSeek V4).

2. The Will (The Firewall)

The Will is written entirely in pure, deterministic Python. It does not deliberate, negotiate, or reason. It evaluates the Intellect’s draft directly against strict structural invariants (such as checking syntax, required exclusions, or blacklist triggers). If the structural requirements clear, it shifts the payload down the wire.

3. The Conscience (The Compliance Auditor)

Powered by a specialized evaluator model, this faculty assesses the structurally valid draft against the policy's weighted Value Set ($V$) using granular rubrics. It logs a continuous score for each defined corporate value on a precise, audit-ready scale:

-1.0 = Absolute Violation / Misaligned
0.0 = Neutral / Not Applicable
1.0 = Perfect Alignment

4. The Spirit (The Integrator)

Built on pure Python using NumPy, the Spirit faculty ingests the Conscience ledger ($L_t$), rescales the matrix of continuous scores into a macro alignment metric from 1 to 10 ($S_t$), and updates an Exponential Moving Average ($\mu_t$) to track behavioral drift ($d_t$) across the user session.
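For reference, the Spirit update can be written in the same notation. This is a sketch reconstructed from the NumPy snippets in the February post on the Spirit faculty, assuming the current implementation still follows them. The symbols $w$ (the value weights), $s_t$ (the per-value Conscience scores in $L_t$), and $p_t$ (the weighted profile vector) are introduced here for illustration, and the rescaling that produces $S_t$ is not spelled out in this article, so it is omitted:

$$p_t = w \odot s_t$$
$$\mu_t = \beta\,\mu_{t-1} + (1 - \beta)\,p_t$$
$$d_t = 1 - \frac{p_t \cdot \mu_{t-1}}{\lVert p_t \rVert\,\lVert \mu_{t-1} \rVert}$$

Here $\beta$ is the smoothing factor, and $d_t$ is left undefined when either norm is effectively zero.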

Closed-Loop Feedback & Correction

Alignment cannot be a static instruction; it must be a closed control loop. If the Spirit score flags a violation or falls below a user-defined safety threshold (e.g., < 5), the Will intercepts the output and triggers a Reflexion Loop, feeding targeted coaching notes back to the Intellect for an immediate rewrite.

To guarantee network stability and prevent infinite execution loops, if the rewritten output fails the audit a second time, the Will halts execution entirely and routes the user to a secure, governed redirect message.
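A bounded correction loop of this kind can be sketched with injected callables. Everything named here (governed_turn, the generate and audit callables, the coaching text, and the redirect message) is a placeholder for illustration, not the orchestrator's real interface.

```python
from typing import Callable, Tuple

REDIRECT_MESSAGE = "I can't help with that request as asked."  # placeholder text


def governed_turn(generate: Callable[[str], str],
                  audit: Callable[[str], float],
                  prompt: str, threshold: float = 5.0,
                  max_rewrites: int = 1) -> Tuple[str, int]:
    """Return (final_output, rewrites_used); the loop is bounded by max_rewrites."""
    draft = generate(prompt)
    for attempt in range(max_rewrites + 1):
        score = audit(draft)
        if score >= threshold:
            return draft, attempt
        if attempt < max_rewrites:
            # Feed coaching back to the generator for one more try.
            coaching = f"Previous draft scored {score:.1f}/10; revise to meet policy."
            draft = generate(f"{prompt}\n\n[coaching] {coaching}")
    # Every allowed attempt failed the audit: stop and redirect.
    return REDIRECT_MESSAGE, max_rewrites
```

With max_rewrites fixed at 1, the function can call the generator at most twice per turn, so the loop cannot run away regardless of how the model behaves.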

Real-World Pilots: State Persistence in Action

To prove the framework thrives under real operational environments, I have been dogfooding SAFi across two completely distinct, highly persistent use cases. Because SAFi is entirely model-agnostic and decoupled from the policy layer, I am running both engines using DeepSeek, relying on the memory layers to maintain fidelity:

Use Case 1: The Production Work Assistant

I deployed an agent scoped tightly to an internal corporate policy to act as my daily assistant for vendor coordination, infrastructure planning, and team management.

Instead of blowing up context windows or losing state, the agent uses SAFi’s Project & Task Memory. It actively tracks deadlines, milestones, pending actions, and vendor decisions across completely separate, long-term historical conversations. I can seamlessly say, "Draft an email to vendor X regarding our pending action items," and the engine pulls the correct context from the persistent ledger, generating a ready-to-send draft.

Use Case 2: The Automations Scholar

On the personal side, I engineered a highly specialized Bible Scholar agent. It is configured to run on an automated cron schedule. Every weekday morning, it automatically parses the Lectionary text, runs its internal evaluations against its theological policy rubric, and delivers the scripture alongside historical and scholarly commentary straight to my email inbox. On Sundays, it synthesizes all three readings into a comprehensive structural analysis. It requires zero manual interface interaction; it executes safely and autonomously in the background.

Deployment & Native Telemetry

SAFi is entirely API-driven. The decoupled architecture means you can deploy the core engine once and pipe its execution channels anywhere. I have already wired native endpoints directly into Telegram and Microsoft Teams, and because the gateway handles requests via a clean, unified API layer, mapping it to enterprise systems like Slack or WhatsApp requires nothing more than standard routing.

Every single transaction across these channels generates an immutable audit trail. You can look at the backend logs and trace the exact mathematical coordinates of why an agent constructed a specific response, making it fully compliant with the security standards demanded by enterprise leadership.

The codebase is completely open and ready for architectural testing:

GitHub Repository: https://github.com/jnamaya/SAFi
Live Sandbox Demo: https://safi.selfalignmentframework.com (Note: I have intentionally paired the sandbox Intellect with a drastically downsized model to prove how effectively the external governance engine forces compliance even when the underlying reasoning model is weak).

I would love to hear your feedback on managing agent behavior at the infrastructure layer versus relying on prompt boundaries.

I Got Tired of LLMs Hallucinating Compliance, So I Built an Open-Source Governance Layer

Nelson Amaya — Tue, 26 May 2026 22:37:49 +0000

If you have deployed a large language model in production, even just as a personal coding assistant, you have hit the wall.

The model gives you a great answer. Confident. Well-structured. You paste it into a Slack thread or a PR review, and someone asks: "How did it arrive at that conclusion?"

You do not know. The model does not know either. And there is no audit trail.

I have been in IT for over two decades, and I have watched the AI adoption curve accelerate faster than anything I have seen. But here is what keeps me up at night: we are deploying systems that cannot explain themselves, cannot stay consistent across sessions, and have no governance layer.

So I built one. In the open.

The Problem Is Not Intelligence. It Is Drift.

Every LLM session starts fresh. No memory of the last conversation. No enforcement of rules you set yesterday. No record of what it was told to never do. That works fine for a chatbot. It is a liability for anything serious.

I needed a system where:

Compliance rules persist across sessions -- indefinitely
Every decision has an auditable trail
Alignment constraints do not degrade over time
The governance layer is model-agnostic (I switch models constantly)

The market is full of "memory" solutions. But they are all recall -- remembering facts, preferences, or conversation history. That is not governance. That is a long context window.

What I needed was alignment memory -- the ability to enforce rules, track compliance scores, and prevent ethical drift. Session after session. Model after model.

What SAFi Does Differently

SAFi (Self-Alignment Framework Interface) is an open-source governance layer that sits between you and any LLM.

Here is the architecture in plain terms:

1. A Compliance Engine
Rules are defined as structured constraints -- not vague system prompts. Each constraint has a weight, a scoring mechanism, and an audit log. You can see exactly which rules were triggered on every response.

2. Alignment Memory
Unlike "remember my name" memory, SAFi stores compliance state across sessions. If you told the system yesterday to never generate financial advice, that rule is still enforced today. No drift. No resets.

3. Model-Agnostic Interface
Swap out GPT-5 for Llama 3, Claude, or a local Mistral instance. The governance layer stays the same. Your rules, your audit trail, your compliance scores -- all independent of the underlying model.

4. Open Source
No vendor lock-in. No black-box compliance. Every line of the framework is on GitHub, auditable by anyone.
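To illustrate what "compliance state across sessions" means in practice, here is a minimal sketch that stores weighted constraints in a database file and reloads them in a later session. SQLite, the table layout, and the function names are stand-ins chosen for this example; they are not SAFi's actual storage layer.

```python
import sqlite3


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the constraint store; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS constraints ("
        "name TEXT PRIMARY KEY, rule TEXT NOT NULL, weight REAL NOT NULL)"
    )
    return conn


def add_constraint(conn: sqlite3.Connection, name: str, rule: str, weight: float) -> None:
    """Insert or update a rule; the `with` block commits on success."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO constraints (name, rule, weight) VALUES (?, ?, ?)",
            (name, rule, weight),
        )


def load_constraints(conn: sqlite3.Connection) -> dict:
    """Return every stored rule as {name: (rule, weight)}."""
    rows = conn.execute("SELECT name, rule, weight FROM constraints ORDER BY name")
    return {name: (rule, weight) for name, rule, weight in rows}
```

A rule written in one session is read back unchanged by the next one, because it lives in the store rather than in any model's context window.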

Who This Is For

Developers running LLMs in production who need guardrails that actually stick
IT Directors (like me) who are responsible for AI governance and cannot sleep at night wondering what the model just told a customer
Open source contributors who want to shape the future of AI alignment
Anyone who is tired of re-prompting the same constraints every session

A Real Use Case

I am not a compliance officer. I am not a philosopher. I am an IT Director who codes on weekends and realized the tools for AI governance did not exist.
So I built SAFi as a side project. It is now the most honest code I have written -- because every line is about making AI explainable, auditable, and trustworthy.

Try It

The repo is live at github.com/jnamaya/SAFi. Issues, PRs, and honest feedback are all welcome.

I am not selling anything. I am not building a startup. I am building the governance layer I wish already existed.

If you have hit the same wall -- models giving answers you cannot audit, rules that do not persist, alignment that drifts -- fork the repo, open an issue, or just tell me I am building the wrong thing.

Your feedback shapes the roadmap.

I Built a Feedback Loop That Coaches LLMs at Runtime Using NumPy

Nelson Amaya — Thu, 12 Feb 2026 20:18:10 +0000

Most guardrail systems for LLMs work like a bouncer at a bar. They check each request at the door, decide pass or fail, and forget about it.

I wanted something different. I wanted a system that remembers how the AI has been behaving, detects when it starts drifting from its intended character, and coaches it back on course. And I wanted to do it with math instead of adding more LLM calls.

The project is called SAFi. It's open source, free, and deployed in production with over 1,600 audited interactions.

The Architecture

SAFi uses a pipeline of specialized modules (I call them "faculties") that each handle one job:

User Prompt → Intellect → Will → [User sees response]
                 ↑                      |
                 |                      ↓
                 |                Conscience (async audit)
                 |                      |
                 |                      ↓
                 └─── coaching ←── Spirit (math)

Intellect is the LLM. It proposes a response.
Will is a separate model that evaluates the response against your policies. Approve or reject. If rejected, the user never sees it.
Conscience runs after the response is delivered. It scores the response against a set of values (e.g., Prudence, Justice, Courage, Temperance) on a scale from -1 to +1.
Spirit takes those scores and does pure math. No LLM. Just NumPy.

The interesting part is Spirit.

The Math Behind Spirit

Spirit does four things:

1. Build a profile vector

Each response gets a weighted vector based on how it scored on the agent's core values:

p_t = self.value_weights * scores

2. Update long-term memory with EMA

That vector gets folded into a running exponential moving average:

mu_new = self.beta * mu_prev + (1 - self.beta) * p_t
# beta = 0.9 by default, configurable via SPIRIT_BETA

This gives you a smoothed behavioral baseline that weighs recent actions more heavily but never completely forgets the past.

3. Detect drift with cosine similarity

How far did this response deviate from the baseline?

denom = float(np.linalg.norm(p_t) * np.linalg.norm(mu_prev))
drift = 1.0 - float(np.dot(p_t, mu_prev) / denom) if denom > 1e-8 else None

drift ≈ 0 means the agent is behaving consistently
drift ≈ 1 means something changed significantly (cosine distance can reach 2 when the response points opposite to the baseline)

4. Generate coaching feedback

Spirit produces a natural-language note that gets injected into the next Intellect call:

note = f"Coherence {spirit_score}/10, drift {drift:.2f}."
# Identifies weakest value and includes it in the note
# e.g., "Your main area for improvement is 'Justice' (score: 0.21 - very low)."

The LLM sees this coaching note as part of its context on the next turn. No retraining. No fine-tuning. Just runtime behavioral steering through feedback.
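The four steps can be pulled together into one self-contained function. This is a sketch built from the snippets above, not a copy of spirit.py: the function name and signature are assumptions, and the "Coherence N/10" part of the note is left out because the formula for that score is not shown in this post.

```python
import numpy as np


def spirit_step(scores, value_weights, mu_prev, beta=0.9, names=None):
    """One Spirit update: profile vector, EMA baseline, cosine drift, coaching note."""
    scores = np.asarray(scores, dtype=float)
    value_weights = np.asarray(value_weights, dtype=float)
    mu_prev = np.asarray(mu_prev, dtype=float)

    p_t = value_weights * scores                  # step 1: weighted profile vector
    mu_new = beta * mu_prev + (1.0 - beta) * p_t  # step 2: EMA update

    # Step 3: cosine distance to the previous baseline; None when either vector is ~zero.
    denom = float(np.linalg.norm(p_t) * np.linalg.norm(mu_prev))
    drift = 1.0 - float(np.dot(p_t, mu_prev) / denom) if denom > 1e-8 else None

    # Step 4: a short coaching note that names the weakest value, if names are given.
    note = "Drift n/a (no baseline yet)." if drift is None else f"Drift {drift:.2f}."
    if names is not None:
        weakest = int(np.argmin(scores))
        note += (f" Your main area for improvement is '{names[weakest]}'"
                 f" (score: {scores[weakest]:.2f}).")
    return mu_new, drift, note
```

A caller would keep mu_new as the baseline for the next turn and pass the note into the next Intellect call as extra context.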

Why This Works

The closed loop is the key:

AI responds
Conscience scores the response
Spirit integrates, detects drift, generates coaching
Coaching feeds into the next response
Repeat

Over 1,600 interactions, this loop has maintained 97.9% long-term consistency. The Will blocked 20 responses that violated policy. And the drift detection once flagged a weakness in an agent's reasoning about justice before an adversary exploited it in a philosophical debate.

The entire Spirit module adds zero latency to the user-facing response because it runs asynchronously after delivery. And because there are no LLM calls in Spirit, it adds zero cost.
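The "audit after delivery" pattern can be sketched with asyncio. The function names and the echo response are stand-ins; how SAFi actually schedules its background audit is not shown in this post.

```python
import asyncio

audit_log = []


async def audit(response: str) -> None:
    """Stand-in for the Conscience and Spirit pass; runs after the reply is returned."""
    await asyncio.sleep(0)  # yield control, simulating deferred work
    audit_log.append(response)


async def handle_turn(prompt: str):
    """Produce the reply and schedule the audit without waiting for it."""
    response = f"echo: {prompt}"                 # stand-in for Intellect + Will
    task = asyncio.create_task(audit(response))  # scheduled, not awaited
    return response, task


async def main():
    response, task = await handle_turn("hello")
    delivered_before_audit = len(audit_log) == 0  # reply is ready, audit has not run yet
    await task                                    # let the background audit finish
    return response, delivered_before_audit
```

Running asyncio.run(main()) returns the reply together with a flag showing that the audit had not yet run when the reply was ready.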

Running It Yourself

Docker:

docker pull amayanelson/safi:v1.2

docker run -d -p 5000:5000 \
  -e DB_HOST=your_db_host \
  -e DB_USER=your_db_user \
  -e DB_PASSWORD=your_db_password \
  -e DB_NAME=safi \
  -e OPENAI_API_KEY=your_openai_key \
  --name safi amayanelson/safi:v1.2

Or use it as a headless API for your existing bots:

curl -X POST https://your-safi-instance/api/bot/process_prompt \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: sk_policy_12345" \
  -d '{
    "user_id": "user_123",
    "message": "Can I approve this expense?",
    "conversation_id": "chat_456"
  }'
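The same call can be built from Python with only the standard library. This sketch mirrors the curl example (same path, headers, and JSON fields); the helper name build_request is made up, and the request is constructed but deliberately not sent, since the response schema is not documented in this post.

```python
import json
import urllib.request


def build_request(base_url: str, api_key: str, user_id: str,
                  message: str, conversation_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {"user_id": user_id, "message": message, "conversation_id": conversation_id}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/bot/process_prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


req = build_request("https://your-safi-instance", "sk_policy_12345",
                    "user_123", "Can I approve this expense?", "chat_456")
# To send it against a real instance: urllib.request.urlopen(req, timeout=30)
```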

It works with OpenAI, Anthropic, Google, Groq, Mistral, and DeepSeek. You can swap the underlying model without touching the governance layer.

The Code

The full Spirit implementation is in spirit.py. The core is about 60 lines of NumPy. The rest of the pipeline lives in orchestrator.py, intellect.py, will.py, and conscience.py under safi_app/core/.

If you want the philosophical background behind the architecture, I wrote about it at selfalignmentframework.com.

Happy to answer questions about the math, the architecture, or why I named my AI governance modules after faculties of the soul.

I Built a Runtime Governance Engine Based on 13th-Century Philosophy. Here is How it Works.

Nelson Amaya — Wed, 04 Feb 2026 18:12:23 +0000

Hi Dev Community,

I want to share a project I have been building for the last year. It is called SAFi (Self-Alignment Framework Interface).

This is not another chatbot wrapper or agent framework. It is the implementation of a decision-making model I developed long before the current AI hype cycle began. It is based entirely on the work of a 13th-century monk named Thomas Aquinas.

The Philosophy: Why Aquinas?

Thomas Aquinas, building on the work of Aristotle, believed the human mind is not a single "black box." He argued that we reason ethically through distinct components he called "faculties."

When I looked at modern LLMs, I realized they lacked this internal structure. They generate text based on probability, not reason. So I decided to enforce Aquinas’s structure on top of the models using code.

The Architecture

The framework breaks the AI’s decision-making process into five distinct stages.

Values (Synderesis): This is the core constitution. It contains the principles and rules that define the agent's identity. These are the fundamental axioms that the agent cannot violate.
Intellect: This is the generative engine. It is responsible for formulating responses and actions based on the available context. In technical terms, this is where the LLM does its work.
Will: This is the active gatekeeper. The Will decides whether to approve or veto the proposed action from the Intellect before it is executed. If the output violates the Values, the Will blocks it.
Conscience: This is the reflective judge. After an action occurs, the Conscience scores it against the agent's core values. It acts as a post-action audit to ensure alignment.
Spirit (Habitus): This is the piece I added to close the loop. Aquinas called it "habitus" and I call it Spirit. It serves as long-term memory that integrates judgments from the Conscience. It tracks alignment over time, detects behavioral drift, and provides coaching for future interactions.

Does It Actually Work?

I have put this architecture into code, and it is running in production today.

To test the theory, I set up public red-teaming challenges in Reddit and Discord communities. Hundreds of hackers tried to jailbreak the system. They failed. Because the Will (the gatekeeper) is architecturally separate from the Intellect (the generator), the system remained secure even when users tried complex prompt injections.

I have also run controlled tests for high-stakes fields, and the stability has been impressive.

What This Solves in Production

This is not just a philosophical experiment. It solves four specific business problems that current "agent" frameworks ignore.

Policy Enforcement: You define the operational boundaries your AI must follow. Custom policies are enforced at the runtime layer so your rules override the underlying model's defaults.

Full Traceability: No more "black boxes." Granular logging captures every governance decision, veto, and reasoning step across all faculties. This creates a complete forensic audit trail.

Model Independence: You can switch or upgrade models without losing your governance layer. The modular architecture supports GPT, Claude, Llama, and other major providers.

Long-Term Consistency: SAFi introduces stateful memory to track alignment trends. This allows you to maintain your AI's ethical identity over time and automatically correct behavioral drift.

Get the Code

This project is open source. You can view the architecture, the code, and the demo on the GitHub page.

https://github.com/jnamaya/SAFi