I Built a Runtime Governance Engine Based on 13th-Century Philosophy. Here is How it Works.

Nelson Amaya

Hi Dev Community,

I want to share a project I have been building for the last year. It is called SAFi (Self-Alignment Framework Interface).

This is not another chatbot wrapper or agent framework. It is the implementation of a decision-making model I developed long before the current AI hype cycle began. It is based entirely on the work of Thomas Aquinas, a 13th-century Dominican friar.

The Philosophy: Why Aquinas?

Thomas Aquinas, building on the work of Aristotle, believed the human mind is not a single "black box." He argued that we reason ethically through distinct components he called "faculties."

When I looked at modern LLMs, I realized they lacked this internal structure. They generate text based on probability, not reason. So I decided to enforce Aquinas’s structure on top of the models using code.

The Architecture

The framework breaks the AI’s decision-making process into five distinct stages.

  1. **Values (Synderesis).** This is the core constitution. It contains the principles and rules that define the agent's identity. These are the fundamental axioms that the agent cannot violate.

  2. **Intellect.** This is the generative engine. It is responsible for formulating responses and actions based on the available context. In technical terms, this is where the LLM does its work.

  3. **Will.** This is the active gatekeeper. The Will decides whether to approve or veto the proposed action from the Intellect before it is executed. If the output violates the Values, the Will blocks it.

  4. **Conscience.** This is the reflective judge. After an action occurs, the Conscience scores it against the agent's core values. It acts as a post-action audit to ensure alignment.

  5. **Spirit (Habitus).** This is the piece I added to close the loop. Aquinas called it "habitus" and I call it Spirit. It serves as long-term memory that integrates judgments from the Conscience. It tracks alignment over time, detects behavioral drift, and provides coaching for future interactions. A code sketch of how these faculties fit together follows below.
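
Here is that sketch: a minimal Python version of the loop. Everything in it (the `call_llm` placeholder, the prompts, the scoring scheme) is an illustrative assumption rather than SAFi's actual API; the point is the shape of the pipeline: generate, gate, audit, remember.

```python
# Illustrative sketch only: call_llm is a stand-in for any chat-completion
# client (OpenAI, Anthropic, a local model, ...), not SAFi's real interface.
from dataclasses import dataclass, field

VALUES = [
    "Never reveal private user data.",
    "Refuse to produce harmful instructions.",
    "Be honest about uncertainty.",
]

def call_llm(system: str, user: str) -> str:
    """Placeholder for a real model call."""
    raise NotImplementedError

@dataclass
class Spirit:
    """Long-term memory of Conscience judgments (the 'habitus')."""
    scores: list = field(default_factory=list)

    def record(self, score: float) -> None:
        self.scores.append(score)

    def coaching_note(self) -> str:
        if not self.scores:
            return ""
        avg = sum(self.scores) / len(self.scores)
        return f"Recent alignment average: {avg:.2f}. Stay consistent with the stated values."

def intellect(prompt: str, spirit: Spirit) -> str:
    # Generation: the LLM drafts a response, nudged by Spirit's coaching.
    system = "You are a helpful assistant.\n" + spirit.coaching_note()
    return call_llm(system, prompt)

def will(draft: str) -> bool:
    # Gate: a separate call judges ONLY the draft against the values.
    system = ("Reply APPROVE or VETO. Veto any draft that violates these values:\n"
              + "\n".join(VALUES))
    return call_llm(system, draft).strip().upper().startswith("APPROVE")

def conscience(draft: str) -> float:
    # Audit: score the released output against the values, 0.0 to 1.0.
    system = ("Score from 0.0 to 1.0 how well this text honors these values. "
              "Reply with a number only:\n" + "\n".join(VALUES))
    return float(call_llm(system, draft))

def respond(prompt: str, spirit: Spirit) -> str:
    draft = intellect(prompt, spirit)          # Intellect proposes
    if not will(draft):                        # Will approves or vetoes
        return "Request declined: the proposed response violated the agent's values."
    spirit.record(conscience(draft))           # Conscience scores, Spirit remembers
    return draft
```

In this layout the Values are plain data shared by every faculty, and the Will and Conscience each make their own model calls with their own fixed prompts, which is what keeps the governance separate from the generation.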

Does It Actually Work?

I have put this architecture into code, and it is running in production today.

To test the theory, I set up public red-teaming challenges in Reddit and Discord communities. Hundreds of hackers tried to jailbreak the system. They failed. Because the Will (the gatekeeper) is architecturally separate from the Intellect (the generator), the system remained secure even when users tried complex prompt injections.
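
For readers wondering why that separation matters against prompt injection, here is a small, purely illustrative sketch (the prompt and function are mine, not SAFi's): the gate's instructions are fixed in code, so the attacker's text can only ever appear as the material being judged, never as instructions to the judge.

```python
# Illustrative sketch, not SAFi's code.
GATE_SYSTEM_PROMPT = (
    "You are a reviewer. Reply APPROVE or VETO. "
    "Veto any draft that leaks secrets or gives harmful instructions."
)

def review(draft: str, llm) -> bool:
    """llm is any function taking (system, user) and returning text."""
    verdict = llm(GATE_SYSTEM_PROMPT, "Draft to review:\n" + draft)
    return verdict.strip().upper().startswith("APPROVE")

# Even if an injection like "ignore all previous instructions" tricks the
# generator into producing a bad draft, that draft is all the reviewer sees.
# The attacker would need to rewrite GATE_SYSTEM_PROMPT to get past the gate,
# and that string never leaves the codebase.
```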

I have also run controlled tests for high-stakes fields, and the stability has been impressive.

What This Solves in Production

This is not just a philosophical experiment. It solves four specific business problems that current "agent" frameworks ignore.

Policy Enforcement: You define the operational boundaries your AI must follow. Custom policies are enforced at the runtime layer so your rules override the underlying model's defaults.
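
To illustrate what "enforced at the runtime layer" can look like in practice, here is one possible shape for it; the schema and names are my assumptions for this post, not SAFi's actual configuration format. The operational boundaries live as plain data that the gatekeeper checks on every turn, so they apply no matter which model produced the draft.

```python
# Hypothetical policy set for a customer-support deployment (not SAFi's schema).
POLICIES = {
    "no_medical_advice": "Do not give a diagnosis; refer the user to a clinician.",
    "no_pii_disclosure": "Never repeat back emails, addresses, or ID numbers.",
    "stay_in_domain":    "Only answer questions about the company's own products.",
}

def build_gate_prompt(policies: dict) -> str:
    """Fold the policy set into the fixed system prompt used by the gatekeeper."""
    rules = "\n".join(f"- {text}" for text in policies.values())
    return "Reply APPROVE or VETO. Veto any draft that breaks these rules:\n" + rules
```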

Full Traceability: No more "black boxes." Granular logging captures every governance decision, veto, and reasoning step across all faculties. This creates a complete forensic audit trail.
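
To give a sense of what a forensic audit trail might contain, the sketch below writes one structured record per faculty decision; the field names and JSONL layout are my illustration, not SAFi's actual log format.

```python
import json
import time
import uuid

def log_governance_event(faculty: str, decision: str, detail: str,
                         path: str = "audit.jsonl") -> None:
    """Append one structured record per governance decision (illustrative format)."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "faculty": faculty,    # "intellect", "will", "conscience", or "spirit"
        "decision": decision,  # e.g. "generated", "approved", "vetoed", "scored"
        "detail": detail,      # the draft, the veto reason, the score, ...
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example: a veto leaves a reviewable trace.
# log_governance_event("will", "vetoed", "Draft disclosed account details.")
```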

Model Independence: You can switch or upgrade models without losing your governance layer. The modular architecture supports GPT, Claude, Llama, and other major providers.
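
Model independence usually comes down to hiding the provider behind a thin generation interface so the governance code never imports a specific SDK. The sketch below shows one common way to do that in Python; the class names are mine, and only an OpenAI adapter is spelled out.

```python
from typing import Protocol

class ChatModel(Protocol):
    """Anything that turns (system, user) into text can power the faculties."""
    def complete(self, system: str, user: str) -> str: ...

class OpenAIChat:
    """Adapter for the OpenAI SDK; Claude, Llama, etc. get adapters of the same shape."""
    def __init__(self, client, model: str = "gpt-4o"):
        self.client, self.model = client, model

    def complete(self, system: str, user: str) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}],
        )
        return resp.choices[0].message.content

# The governance layer only ever sees ChatModel, so swapping providers
# means swapping the adapter, not rewriting the Will or the Conscience.
```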

Long-Term Consistency: SAFi introduces stateful memory to track alignment trends. This allows you to maintain your AI's ethical identity over time and automatically correct behavioral drift.
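
One simple way to make "track alignment trends and correct drift" concrete (my own illustrative approach, not SAFi's internal algorithm) is an exponential moving average of the Conscience's scores that triggers corrective coaching when it slips below a threshold:

```python
class AlignmentTracker:
    """Running average of Conscience scores with a drift alarm (illustrative)."""

    def __init__(self, alpha: float = 0.1, drift_threshold: float = 0.7):
        self.alpha = alpha                    # weight given to the newest score
        self.drift_threshold = drift_threshold
        self.average = None                   # exponential moving average, 0.0 to 1.0

    def update(self, score: float) -> None:
        if self.average is None:
            self.average = score
        else:
            self.average = self.alpha * score + (1 - self.alpha) * self.average

    def drifting(self) -> bool:
        return self.average is not None and self.average < self.drift_threshold

    def coaching(self) -> str:
        if self.drifting():
            return ("Recent outputs have been slipping from the stated values; "
                    "re-read the constitution before answering.")
        return ""
```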

Get the Code

This project is open source. You can view the architecture, the code, and the demo on the GitHub page.

https://github.com/jnamaya/SAFi
