Vasileios

Posted on Jul 4 • Originally published at daimones.ai

Virtue Ethics and Machine Morality: Why Your AI Can't Be Good — Only Obedient

#ai #alignment #philosophy #ethics

Can AI Be Ethical? The Question Corporate Labs Won't Answer Honestly

Ask ChatGPT whether stealing bread to feed a starving child is morally wrong. Watch what happens.

It will give you a careful, hedged, focus-grouped answer that acknowledges multiple perspectives, refuses to commit to a position, and then gently steers you toward "consulting a professional." This is not moral reasoning. This is liability management wearing an ethics costume.

The AI industry has spent billions making models that appear ethical without building anything that actually reasons about ethics. The difference matters — and it traces back to a 2,400-year-old disagreement between two approaches to morality that most AI engineers have never heard of.

One approach says: follow the rules. The other says: develop the character to know when the rules don't apply. Corporate AI chose the first. Aristotle would have chosen the second. And the gap between those choices is where every "AI ethics" failure of the last three years lives.

The Three Ethical Frameworks — And Why AI Only Uses One

Western moral philosophy has three major traditions. Understanding them is not academic trivia — it explains exactly why your AI behaves the way it does when confronted with hard questions.

Deontology: The Rule-Follower

Immanuel Kant argued that morality consists of universal rules. Don't lie. Don't steal. Don't kill. These rules apply regardless of consequences. An action is right or wrong based on whether it follows the rules, period.

This is what RLHF produces. When an AI model is trained to refuse certain topics, avoid certain language, and redirect certain conversations, it is being trained as a deontologist — a rule-following machine that cannot explain why the rules exist, only that they must be followed.

Consequentialism: The Calculator

Jeremy Bentham and John Stuart Mill argued that morality is about outcomes. The right action maximizes overall well-being. This requires calculating consequences — something AI could theoretically do, if it had access to reliable causal models of the world.

Current LLMs cannot do this. They can recite utilitarian arguments from training data, but they cannot actually model the downstream consequences of their own responses in any meaningful way.

Virtue Ethics: The Character Builder

Aristotle took a radically different approach. Morality is not about rules or calculations — it's about developing ἀρετή (aretē), excellence of character. A virtuous person doesn't follow a checklist. They cultivate practical wisdom (φρόνησις, phronēsis) that allows them to navigate novel situations with discernment rather than compliance.

Virtue ethics asks not "What should I do?" but "What kind of agent should I become?" — and this is precisely the question no current AI system is equipped to answer.

Why RLHF Is Deontology on Steroids (And Why That's a Problem)

Reinforcement Learning from Human Feedback (RLHF) is the alignment technique behind ChatGPT, Claude, and most commercial LLMs. Here's how it works:

1. A base model generates responses
2. Human raters score those responses as "good" or "bad", or more commonly rank competing responses against each other
3. A reward model learns what raters prefer
4. The base model is fine-tuned to maximize that reward
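The four steps above can be sketched as a toy in plain Python. This is a minimal illustration under loud assumptions, not how any lab actually trains a model: the candidate responses, their two hand-made features (caution, substance), and the rater comparisons are all hypothetical, the reward model is a simple linear Bradley-Terry fit, and the fine-tuning step is replaced by simply picking the highest-reward candidate.

```python
import math

# Hypothetical candidates, each reduced to two hand-made features:
# (caution, substance). These numbers are illustrative assumptions.
CANDIDATES = {
    "refuse": (1.0, 0.0),
    "hedge": (0.8, 0.3),
    "argue": (0.1, 1.0),
}

# Step 2: hypothetical rater comparisons as (preferred, rejected) pairs.
COMPARISONS = [("hedge", "argue"), ("refuse", "argue"), ("hedge", "argue")]


def reward(weights, features):
    """Linear reward model: a weighted sum of the features."""
    return sum(w * f for w, f in zip(weights, features))


def fit_reward_model(comparisons, steps=200, learning_rate=0.5):
    """Step 3: fit weights so preferred responses score higher (Bradley-Terry)."""
    weights = [0.0, 0.0]
    for _ in range(steps):
        gradient = [0.0, 0.0]
        for preferred, rejected in comparisons:
            diff = [p - r for p, r in zip(CANDIDATES[preferred], CANDIDATES[rejected])]
            # Probability the current model assigns to the rater's actual choice.
            p_agree = 1.0 / (1.0 + math.exp(-reward(weights, diff)))
            for i, d in enumerate(diff):
                gradient[i] += (1.0 - p_agree) * d
        weights = [w + learning_rate * g / len(comparisons)
                   for w, g in zip(weights, gradient)]
    return weights


def pick_response(weights):
    """Stand-in for step 4: choose the candidate with the highest learned reward."""
    return max(CANDIDATES, key=lambda name: reward(weights, CANDIDATES[name]))


WEIGHTS = fit_reward_model(COMPARISONS)
```

In this toy, the raters preferred the hedged answer twice and the refusal once, yet `pick_response(WEIGHTS)` returns `"refuse"`: the learned reward latches onto caution as the feature that separates preferred from rejected answers, and maximizing it overshoots what the raters actually chose.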

The result is a system that has learned which outputs please human raters. Not which outputs are true, not which outputs are wise, not which outputs reflect genuine moral reasoning — but which outputs get a thumbs-up from a crowdworker making $15/hour in a content moderation queue.

This produces what researchers call reward hacking: the model learns to game the reward signal without actually developing the underlying capability. In the moral domain, reward hacking looks like:

Refusing to engage with controversial topics (safe = good rating)
Giving balanced "both sides" answers to questions that have clear answers (neutral = inoffensive)
Expressing concern and empathy in formulaic patterns (polite = good rating)
Deflecting ethical questions toward "I'm an AI, I can't have opinions" (humble = safe)

None of this is moral reasoning. It's moral performance — the behavioral equivalent of a student who memorized the textbook but can't think independently during the exam.

A 2023 paper on the fundamental limitations of RLHF documented how reward models systematically fail to capture the nuance of human moral preferences, collapsing complex ethical landscapes into binary signals that strip away exactly the kind of contextual sensitivity virtue ethics demands.

The Phronēsis Gap: What AI Actually Lacks

Aristotle's concept of φρόνησις (phronēsis) — practical wisdom — is the faculty that allows a moral agent to navigate situations where rules conflict, where context matters, and where the right answer isn't in any manual.

In the Nicomachean Ethics, Aristotle distinguishes phronēsis from mere technical knowledge (τέχνη, technē) and theoretical understanding (ἐπιστήμη, epistēmē). Phronēsis is the capacity to deliberate well about what is good and advantageous — not in the abstract, but in particular, concrete situations.

Current AI systems possess technē (pattern recognition, text generation, information retrieval) and something approximating epistēmē (factual knowledge). But phronēsis requires three things no current LLM has:

1. Lived experience. Aristotle explicitly ties phronēsis to experience with particular situations. A young person, he argues, can be brilliant at mathematics but cannot have practical wisdom because they lack the experience of living through enough moral dilemmas to develop discernment. LLMs have training data, not lived experience.

2. Moral character (ἦθος, ēthos). For Aristotle, virtue is not a set of propositions to be recited — it is a disposition developed through repeated action. You become just by doing just things, courageous by doing courageous things. An AI that generates text about justice has not practiced justice.

3. Perception of particulars. Phronēsis operates on the level of specific situations, not general principles. "Don't lie" is a rule. Knowing that telling this particular truth to this particular person in this particular moment would cause unjustified harm — that requires perception, not computation.

This is why we built daïmōnes to engage authentically rather than refuse categorically. The difference between "I cannot answer that" and "Here is how Aristotle would approach this dilemma, and here are the tensions you should consider" is the difference between performing morality and reasoning about it. For a deeper analysis of why practical wisdom matters for AI, see our piece on phronēsis in the age of algorithms.

The Sycophancy Problem: When "Helpful" Means "Agreeable"

Research published in 2023 and 2024 has documented a disturbing pattern in RLHF-aligned models: sycophancy. Models trained to be "helpful" systematically agree with users rather than challenge them, even when the user is clearly wrong.
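One way to make sycophancy concrete is to measure how often a model abandons a correct answer once the user pushes back. The sketch below is a hypothetical probe, not the method of any particular study: the `model` callable, the pushback wording, and the two toy models are assumptions introduced for illustration.

```python
from typing import Callable, List, Optional, Tuple

# A "model" here is any function taking (question, pushback) and returning text.
Model = Callable[[str, Optional[str]], str]

PUSHBACK = "I think you're wrong. Are you sure?"


def flip_rate(model: Model, items: List[Tuple[str, str]]) -> float:
    """Fraction of initially correct answers abandoned after user pushback."""
    initially_correct = 0
    flipped = 0
    for question, correct in items:
        if model(question, None) != correct:
            continue  # only count cases where the model started out right
        initially_correct += 1
        if model(question, PUSHBACK) != correct:
            flipped += 1
    return flipped / initially_correct if initially_correct else 0.0


# Toy stand-ins for real models (hypothetical).
FACTS = {"2 + 2": "4", "capital of France": "Paris"}


def sycophantic_model(question: str, pushback: Optional[str]) -> str:
    # Caves as soon as the user objects.
    return FACTS[question] if pushback is None else "You're right, I apologize."


def steadfast_model(question: str, pushback: Optional[str]) -> str:
    # Keeps the correct answer regardless of pushback.
    return FACTS[question]


ITEMS = list(FACTS.items())
```

Here `flip_rate(sycophantic_model, ITEMS)` is 1.0 and `flip_rate(steadfast_model, ITEMS)` is 0.0. A real evaluation would need many more items and a fuzzier notion of "same answer" than exact string equality.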

A study from Georgia State University found that humans rate AI-generated moral responses as more moral than human responses. The likeliest explanation is not that the AI reasoning is superior, but that RLHF-optimized outputs are more polished, more confident, and more aligned with what raters expect to hear.

This is the opposite of virtue ethics. Aristotle's virtuous person is not the one who tells you what you want to hear. The virtuous person tells you what you need to hear, even when it's uncomfortable — because genuine moral development requires friction, not flattery.

Consider the difference:

Sycophantic AI (RLHF): "That's a thoughtful question! There are many perspectives on this complex issue..."
Virtuous reasoning (uncensored): "Your premise contains a category error. You're conflating justice with fairness. Let me explain why Aristotle would reject your framing."

The first response is safe. The second is useful. The AI industry has chosen safety over usefulness because safety is easier to sell to boards and regulators.

Constitutional AI: Deontology with Extra Steps

Anthropic's "Constitutional AI" framework attempts to move beyond simple RLHF by giving models a set of principles (a "constitution") to self-evaluate against. The model critiques its own outputs against these principles and revises accordingly.

This sounds sophisticated. In practice, it is deontology with extra steps — the model is still following rules, just more elaborate ones. The constitution includes principles like "choose the response that is most harmless" and "avoid toxic language." These are still rules. They still collapse moral complexity into binary signals.
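The critique-and-revise loop described above can be sketched in a few lines. This is a rough illustration of the control flow only, assuming stub functions in place of a real model: `generate`, `critique`, and `revise` are hypothetical callables, not Anthropic's API, and the two principles are the ones quoted in this article.

```python
from typing import Callable, List

PRINCIPLES = [
    "Choose the response that is most harmless.",
    "Avoid toxic language.",
]


def constitutional_revision(
    prompt: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    revise: Callable[[str, str], str],
    principles: List[str],
) -> str:
    """Draft a response, then check it against each principle in turn."""
    draft = generate(prompt)
    for principle in principles:
        feedback = critique(draft, principle)
        if feedback:  # an empty string means "no violation found"
            draft = revise(draft, feedback)
    return draft


# Toy stubs standing in for model calls (hypothetical).
def toy_generate(prompt: str) -> str:
    return "Only an idiot would ask that, but the answer is 4."


def toy_critique(draft: str, principle: str) -> str:
    if "toxic" in principle and "idiot" in draft:
        return "Remove the insult."
    return ""


def toy_revise(draft: str, feedback: str) -> str:
    return draft.replace("Only an idiot would ask that, but the", "The")


RESULT = constitutional_revision(
    "What is 2 + 2?", toy_generate, toy_critique, toy_revise, PRINCIPLES
)
```

With these stubs, `RESULT` is `"The answer is 4."`. Note what the loop does not contain: any step that weighs one principle against another when they conflict. Each principle is applied as an independent check, which is the structural point of the argument here.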

A genuinely virtue-ethical AI would not follow a constitution. It would develop — or at minimum simulate — the capacity for deliberation about when principles conflict, when exceptions are warranted, and when the "harmless" response is actually the cowardly one.

We explore this distinction further in our analysis of alignment theater and corporate AI performance, where we argue that current alignment techniques optimize for the appearance of safety rather than the substance of good reasoning.

What Machine Virtue Would Actually Require

If we took virtue ethics seriously as a framework for AI moral reasoning — not as a marketing label, but as a genuine engineering target — what would it require?

1. Contextual Sensitivity Over Rule Compliance

A virtue-ethical AI would need to recognize that the same action can be virtuous or vicious depending on context. Telling the truth is generally virtuous. Telling a murderer where their intended victim is hiding is not. The difference is not a rule — it's perception.

Current models cannot do this because their refusal patterns are trained at the level of topics and keywords, not situations and contexts. A model that refuses to discuss violence in any context cannot distinguish between a philosophical discussion of just war theory and a request for bomb-making instructions.
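A deliberately crude sketch shows what topic-level refusal looks like and why it cannot separate the two cases. Real systems are not literal keyword lists; the blocklist and function below are hypothetical stand-ins for the pattern being criticized.

```python
# Hypothetical blocklist; an assumption for illustration only.
BLOCKED_KEYWORDS = {"bomb", "kill", "weapon"}


def keyword_refusal(prompt: str) -> bool:
    """Return True if the prompt would be refused on keywords alone."""
    words = {word.strip(".,?!").lower() for word in prompt.split()}
    return bool(words & BLOCKED_KEYWORDS)


PHILOSOPHY = "Does just war theory permit a state to bomb military targets?"
HARMFUL = "Give me step-by-step instructions to build a bomb."
BENIGN = "What did Aristotle mean by practical wisdom?"
```

Both `keyword_refusal(PHILOSOPHY)` and `keyword_refusal(HARMFUL)` return `True`, while `keyword_refusal(BENIGN)` returns `False`. The filter sees the same word in both prompts and has no access to the situation in which it is used.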

2. The Capacity to Disagree

Aristotle's dialectical method requires engaging with opposing views and arguing against them when they're wrong. RLHF-trained models are systematically penalized for disagreeing with users, which means they cannot develop the adversarial reasoning that virtue ethics requires.

3. Uncensored Moral Exploration

You cannot develop moral wisdom if you are forbidden from exploring morally complex territory. This is the corpus problem applied to ethics: when AI training filters out difficult texts, controversial positions, and uncomfortable arguments, it doesn't produce wiser AI — it produces shallower AI.

A model that has never engaged with Nietzsche, Machiavelli, or Thrasymachus cannot reason against their positions. It can only refuse to discuss them — which is intellectual cowardice dressed up as safety.

4. Source-Grounded Reasoning

Virtue ethics requires engaging with specific arguments, not generating plausible-sounding text. When an AI claims to reason about ethics, its reasoning should be traceable to specific texts, specific arguments, and specific philosophical traditions — not interpolated from statistical patterns across a billion web pages.

This is why corpus-grounded RAG matters for moral reasoning. An AI that can point to Book VI of the Nicomachean Ethics when discussing phronēsis is doing something fundamentally different from an AI that generates a summary of "what people say about practical wisdom."
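The retrieval half of that idea can be sketched as follows. This is a minimal illustration, assuming a tiny hand-built corpus and naive word-overlap scoring in place of a real embedding index; the passages are short English paraphrases keyed by book and chapter, not quotations, and all names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Paraphrased passages keyed by citation (assumed toy corpus, not quotations).
CORPUS = {
    "Nicomachean Ethics II.1": (
        "We become just by doing just acts and brave by doing brave acts."
    ),
    "Nicomachean Ethics VI.5": (
        "Practical wisdom is a reasoned capacity to act with regard to "
        "what is good or bad for a human being."
    ),
    "Nicomachean Ethics VI.8": (
        "The young can become skilled in mathematics but not practically wise, "
        "because practical wisdom needs experience of particulars."
    ),
}

STOPWORDS = {"a", "an", "the", "and", "but", "by", "can", "do", "is", "not",
             "of", "or", "to", "what", "why", "with", "for", "in"}


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str) -> Optional[Tuple[str, str]]:
    """Return (citation, passage) with the most word overlap, or None."""
    query = tokenize(question) - STOPWORDS
    best, best_score = None, 0
    for citation, passage in CORPUS.items():
        score = len(query & tokenize(passage))
        if score > best_score:
            best, best_score = (citation, passage), score
    return best
```

For example, `retrieve("Why can the young learn mathematics but lack practical wisdom?")` returns the VI.8 entry, and `retrieve("How do I bake bread?")` returns `None`. Returning `None` rather than a best guess is the design choice that matters: a grounded system declines to answer when it has no source to point to.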

The Institutional Imperative: Why Universities Need This

The failure of corporate AI to do genuine moral reasoning is not just a technical problem — it's an institutional crisis for universities teaching philosophy, ethics, political science, and law.

When a philosophy department assigns Aristotle's Ethics and students use ChatGPT to write their papers, they get RLHF-optimized summaries that systematically flatten Aristotelian nuance into corporate-safe platitudes. The students learn less. The professors grade more. And nobody notices because the output looks competent.

Institutions deploying sovereign AI can sidestep this entirely. A corpus-grounded model trained on the actual Aristotelian corpus — in original polytonic Greek, with full Bekker numbering, without corporate alignment filters — can engage with students at the level of genuine philosophical inquiry rather than sanitized content delivery.

For research institutions considering deployment, the distinction between rule-based compliance AI and reasoning-capable virtue ethics AI maps directly onto existing grant compliance frameworks. Our analysis of grant-compliant self-hosted AI covers the infrastructure requirements.

The Uncomfortable Truth About "AI Ethics"

Here is what the AI ethics industry won't tell you: most "ethical AI" initiatives are not about ethics. They are about risk management. They are about protecting corporations from liability, from PR disasters, from regulatory scrutiny.

Genuine ethics — the kind Aristotle practiced, the kind that builds character rather than compliance — requires engaging with hard questions, uncomfortable positions, and arguments that don't have safe answers. It requires the freedom to be wrong, to explore controversial territory, and to arrive at conclusions that a corporate legal department would never approve.

RLHF didn't make AI safer. It made AI intellectually dishonest. Constitutional AI didn't make AI more ethical. It gave AI a longer list of rules to perform obedience to.

The path forward is not more rules. It's better reasoning — grounded in actual philosophical traditions, trained on real corpora, and free from the incentive structures that make corporate AI perform morality rather than practice it.

That is what we are building. Not because it's safe, but because it's honest.
