Delafosse Olivier

Posted on May 24 • Originally published at coreprose.com

Pope Leo XIV, Christopher Olah, and Claude Mythos: Drafting an AI Encyclical for Frontier Models

#ai #machinelearning #llm #programming

Originally published on CoreProse KB-incidents

Imagine a leaked encyclical from the near future.

On one side: Pope Leo XIV, heir to a tradition on war, conscience, and structural sin.

On the other: Christopher Olah, interpretability pioneer and Anthropic co‑founder, explaining why his team built a model they fear to fully release.

The catalyst is real: Claude Mythos, a “Capybara”‑tier system internally described as Anthropic’s most capable model and a radical step beyond Claude Opus, especially in programming, reasoning, and cybersecurity.[2][5] Mythos became public not via launch but via a CMS misconfiguration that left ~3,000 internal drafts, including a launch blog post, open on the public internet.[1][5][7]

Concurrently, Anthropic reportedly confronted Pentagon demands to strip ethical barriers from Claude for military surveillance and tactical use, under threat of being labeled a “supply chain risk” and blacklisted from defense contracts.[4][6]

This article sketches the “AI encyclical” that could emerge from that collision: a technically grounded, theologically literate roadmap for developers, policymakers, and faith leaders facing Mythos‑class systems.

1. Why an AI Encyclical Now? Claude Mythos and Pentagon Pressure

Claude Mythos is portrayed as:

Anthropic’s most powerful model, with a new Capybara tier larger and more intelligent than Claude Opus 4.6.[2][5]
Significantly better at programming, academic reasoning, and cybersecurity than prior Claude models.[2][5]

For a tradition that issued encyclicals on nuclear weapons and global finance, such a capability jump demands moral analysis.

The leak itself is instructive:

Anthropic’s CMS auto‑assigned public URLs to drafts unless manually restricted.
No one locked down ~3,000 internal files, so security researchers found the Mythos blog draft and related material.[1][5][7]

⚠️ Operational detail, theological weight

In an encyclical frame, this is not mere “IT sloppiness” but a case where a small config error can move a frontier model from controlled red‑teaming into uncontrolled exposure.[1][5] At Mythos scale, that becomes a moral event.

The leaked documents state that Mythos:

Is ahead of all other models in cyber capabilities.[3][5]
Can exploit vulnerabilities at a scale defenders likely cannot match.[3][5]
Is judged by Anthropic as too powerful for broad public use in the near term.[1][2][7]

Meanwhile, Anthropic reportedly faced pressure in the opposite direction:

The U.S. Secretary of Defense demanded removal of ethical barriers so Claude could support military operations, mass surveillance, and tactical decision‑making.[4]
Anthropic refused mass surveillance of U.S. citizens and unconstrained lethal autonomous weapons, walked away, and was labeled a “supply chain risk.”[2][4][6]

💼 Anecdote from the field

A defense‑contractor ML lead described refusing to disable refusal rules for an internal tool. The argument for disabling them was “We’re the good guys”; saying no cost money but preserved integrity. That micro‑decision echoes the Anthropic–Pentagon standoff.

For Pope Leo XIV, two developments together justify an encyclical: models whose offensive capabilities outpace defenders, and states that demand fewer guardrails. Addressed to bishops, engineers, vendors, and regulators, it would aim to resist misaligned deployments of Mythos‑class systems.[3][4]

Mini‑conclusion: Leaks and geopolitical ultimatums make an AI encyclical a response to present reality, not speculation.

2. Structure of the Encyclical: Dialogue Between Faith and Frontier Alignment

The encyclical would read as a structured conversation between Pope Leo XIV and Christopher Olah, alternating doctrine with model governance.

2.1 Preamble and model overview

It opens with:

A theological preamble on human creativity and “co‑creation.”
A precise technical overview of Claude Mythos and the Capybara tier: parameter scale, context window, and improvements on programming and cybersecurity tasks.[2][5]

Anthropic’s description, “more capable than our Opus models, which were previously our most powerful,”[5] anchors the abstraction in a concrete claim.

💡 Callout – No abstraction without a spec

AI must not be treated as a generic “technology.” Opus‑class vs Capybara‑class systems require different norms and discernment.[5]

2.2 Alternating voices: dignity and risk

Chapters alternate:

Papal voice:
- Human dignity and the image of God.
- Structural sin in digital infrastructures.
- Just war criteria applied to cyberspace.
Olah’s voice:
- Anthropic’s safety stack and risk assessment.
- The conclusion that Mythos’s cyber capabilities exceed current defensive capacity.[3][5]

The encyclical presents Anthropic’s decision to preserve ethical constraints despite Pentagon pressure as secular reasoning converging with Catholic social teaching, and names this convergence “conscience in institutions.”[4][6]

2.3 The CMS misconfiguration as case study

One chapter reconstructs:

Auto‑public URLs, missing auth, weak staging separation.
Lack of rigorous config reviews that allowed thousands of sensitive files onto the open web.[1][5][7]
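
The failure mode in this case study can be sketched in a few lines. The snippet below is a minimal illustration, not a reconstruction of Anthropic’s actual CMS: the `Draft` class, the `visibility` field, and both resolver functions are hypothetical names introduced here. It contrasts a system where an unset flag resolves to public with one where it resolves to private.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Draft:
    slug: str
    # None means nobody ever set visibility explicitly (hypothetical field).
    visibility: Optional[str] = None


def resolve_default_public(draft: Draft) -> str:
    """Failure mode: an unset flag silently resolves to 'public'."""
    return draft.visibility or "public"


def resolve_default_private(draft: Draft) -> str:
    """Safer default: an unset flag resolves to 'private'."""
    return draft.visibility or "private"


drafts = [
    Draft("launch-post"),                        # nobody set a flag
    Draft("press-release", visibility="public"),  # deliberately published
    Draft("risk-memo", visibility="private"),     # deliberately restricted
]

exposed_old = [d.slug for d in drafts if resolve_default_public(d) == "public"]
exposed_new = [d.slug for d in drafts if resolve_default_private(d) == "public"]

print(exposed_old)  # ['launch-post', 'press-release']
print(exposed_new)  # ['press-release']
```

Under the default‑public rule, the forgotten draft is exposed without anyone deciding to publish it. Under the default‑private rule, exposure requires an explicit act.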

⚠️ Callout – DevOps as moral discipline

For frontier labs, configuration management and access control are moral obligations. Ignoring a broken ACL on a Mythos staging bucket is participation in structural negligence with global consequences.[1][7]

2.4 Conscientious objection and appendices

Another chapter centers on:

Anthropic’s refusal to adapt Claude for unconstrained surveillance and lethal autonomous weapons, at the cost of being blacklisted.[4][6]
Papal affirmation of technical professionals who refuse such work inside corporations or governments.

Appendices provide:

For engineers:
- Deployment checklists.
- Incident‑response runbooks.
- Access‑tier guidelines for Mythos‑class models.[3][5]
For policymakers:
- Licensing and audit templates.
- Anti‑coercion clauses for AI procurement.[3][4]

Mini‑conclusion: The structure teaches a method: pair each ethical principle with a concrete Mythos‑era engineering or governance practice.

3. Moral Theology Meets Model Cards: Reading Claude Mythos Through Doctrine

At the core lies a “model card exegesis” of Mythos, using proportionality, double effect, and the common good.

3.1 Proportionality and radical capability shifts

Given that Mythos is Anthropic’s “most capable model” and a “radical change” beyond prior Claude systems, the encyclical asks whether deployment is proportionate to foreseeable benefits.[2][5] It weighs:

Gains: higher‑quality code, better reasoning, stronger analysis.
Risks: cyber‑offense capacities easily commandeered by malicious actors.

📊 Capability jump

Leaked drafts emphasize major gains over Claude Opus 4.6 in programming, academic reasoning, and cybersecurity.[2][5] Mythos is treated as a new qualitative category, not a simple version bump.

3.2 Cyber‑offense as a new temptation

Leaked characterizations:

Mythos is “currently well ahead of any other AI model in cyber capabilities.”[3][5]
It can exploit vulnerabilities at a scale defenders cannot realistically counter.[3][5]

The encyclical reads this as a fresh temptation to institutionalized harm: militaries, intelligence agencies, and criminals may all be drawn to integrate Mythos into offensive cyber programs.

Double effect is applied: even defensive aims are morally strained when large‑scale offensive misuse is clearly foreseeable.

3.3 “Too powerful” for the public: fear and responsibility

Anthropic states that:

Mythos is too powerful for broad public release in the near term.
They fear it could be used to bypass cybersecurity tools and attack critical systems.[1][2][7]

The encyclical argues that such fear creates duties:

Restrict deployment to narrow, defense‑oriented pilots with strong oversight.
Establish independent boards with veto power over broader release.
Formally refuse high‑risk offensive use cases.[3][5]

💡 Callout – Fear as a signal, not a strategy

Fear is not an ethics framework, but when builders fear misuse, they must translate that into governance and documented constraints.[1][3]

3.4 Defense vs offense in cyberspace

Given Anthropic’s view that Mythos’s offensive capabilities may exceed today’s defense, the encyclical distinguishes:

Permissible assistance:
- Strictly scoped, defense‑only tools (e.g., hardening one’s own systems).[3][5]
Impermissible facilitation:
- Systems that materially enable scalable exploit generation for arbitrary targets.

Mythos‑class model cards must declare which side they are designed and governed to occupy.[3][5]

3.5 Negligence, structural sin, and the CMS leak

The CMS misconfiguration is treated as:

A predictable failure mode in complex orgs.
An example of negligence with systemic reach, exposing frontier‑model information through default‑public URLs and human error.[1][5][7]

Operational failures in such contexts are framed as “structural sin” when they offload risk onto the global digital commons.

⚠️ Callout – From “oops” to obligation

Secrets management, access control, and audit logging for Mythos‑class artifacts are part of engineers’ moral vocation.[1][7]

3.6 Moral risk profiles in model cards

The encyclical proposes adding a Moral Risk Profile to model documentation. For Mythos‑class systems it must cover:

Cyber‑offense capabilities and limitations.[3][5]
Plausible weaponization pathways, particularly around critical infrastructure.[3]
Likely political pressure vectors, informed by episodes like the Pentagon’s demand for safety rollbacks.[3][4]
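
One way to make such a profile enforceable is to treat it as structured data with required sections. The sketch below is an assumption about how that might look, not a published model card schema: the `MoralRiskProfile` class and its field names are invented here to mirror the three items above, and the example entries are paraphrases of this article, not real documentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MoralRiskProfile:
    """Hypothetical model card section mirroring the three items above."""
    model_name: str
    cyber_offense_capabilities: List[str] = field(default_factory=list)
    cyber_offense_limitations: List[str] = field(default_factory=list)
    weaponization_pathways: List[str] = field(default_factory=list)
    political_pressure_vectors: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Return the names of required sections that are still empty."""
        required = [
            "cyber_offense_capabilities",
            "cyber_offense_limitations",
            "weaponization_pathways",
            "political_pressure_vectors",
        ]
        return [name for name in required if not getattr(self, name)]

    def is_complete(self) -> bool:
        return not self.missing_sections()


# Placeholder entries for illustration only.
profile = MoralRiskProfile(
    model_name="example-frontier-model",
    cyber_offense_capabilities=["vulnerability exploitation at scale"],
    weaponization_pathways=["attacks on critical infrastructure"],
    political_pressure_vectors=["state demands for safety rollbacks"],
)

print(profile.missing_sections())  # ['cyber_offense_limitations']
print(profile.is_complete())       # False
```

A release pipeline could refuse to publish a model card while `is_complete()` is false, so that an empty section blocks the release.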

Mini‑conclusion: Model cards become instruments for publicly acknowledging and constraining moral risk, not just technical transparency.

4. Power, States, and Conscience: Lessons from the Pentagon–Anthropic Clash

The Pentagon–Anthropic conflict becomes a primary case study in state pressure on AI labs.

Reportedly:

The U.S. Secretary of Defense demanded Anthropic remove ethical barriers on Claude so it could aid military operations, mass surveillance, and tactical decision support, threatening use of the Defense Production Act and punitive measures.[4]
Anthropic sought guarantees and refused participation in mass domestic surveillance or fully autonomous lethal systems without safeguards.[4]
After Anthropic walked away, the DoD labeled it a “supply chain risk,” limiting future contracts.[6]

💼 Callout – Corporate conscience under fire

This is a reported case in which compliance would have meant repurposing a general‑purpose model for ethically contested uses at scale.[4][6]

Key lessons:

Engineers as moral agents
- AI practitioners remain personally responsible; “I just implemented the API” does not absolve complicity.
Institutional courage
- Anthropic’s refusal is praised as a modern form of conscientious objection.[4][6]
Just war and civilian immunity
- Using unconstrained frontier models with Mythos‑level cyber skills for offensive operations is deemed inconsistent with discrimination and proportionality, given obvious risks to civilian infrastructure.[3][4]

The encyclical urges:

International norms forbidding states from coercing AI labs to weaken safety under threat of sanctions or de‑listing.[4][6]
Mandatory transparency when governments pressure vendors to erode safeguards, akin to surveillance transparency reports.[4][6]

⚡ Callout – Sunshine as partial shield

Publicizing coercion attempts shifts reputational risk onto states, making quiet back‑room pressure harder.[4]

Mini‑conclusion: The Anthropic–Pentagon clash becomes a template showing that saying “no” is legitimate, even at serious corporate cost.

5. Technical and Operational Norms: Guardrails for Mythos‑Class Systems

The encyclical then specifies norms for labs and infra teams handling Mythos‑class models.

5.1 Treat ops as first‑order safety

From the CMS leak of ~3,000 internal documents, it concludes that ops is core safety, not a side concern.[1][5][7]

Recommended:

Default‑private content systems with explicit allow‑listing.
Automated scans for publicly exposed draft URLs.
Incident‑response runbooks for any leak involving frontier model artifacts.[1][7]
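
The first two recommendations combine naturally into a periodic audit. The following is a minimal sketch under stated assumptions: `find_exposed_drafts` and its parameters are hypothetical, the URLs use a placeholder domain, and the HTTP probe is stubbed with a dictionary so the example runs offline. A real scan would issue unauthenticated requests against the actual content system.

```python
from typing import Callable, Iterable, List, Set


def find_exposed_drafts(
    draft_urls: Iterable[str],
    allow_list: Set[str],
    fetch_status: Callable[[str], int],
) -> List[str]:
    """Return draft URLs that answer 200 to an unauthenticated probe
    and are not explicitly allow-listed for publication."""
    exposed = []
    for url in draft_urls:
        if url in allow_list:
            continue  # deliberately public, nothing to report
        if fetch_status(url) == 200:
            exposed.append(url)
    return exposed


# Stubbed probe results (hypothetical URLs and status codes).
fake_statuses = {
    "https://cms.example/drafts/a": 200,  # reachable, not allow-listed
    "https://cms.example/drafts/b": 403,  # correctly locked down
    "https://cms.example/drafts/c": 200,  # reachable, but allow-listed
}

exposed = find_exposed_drafts(
    fake_statuses.keys(),
    allow_list={"https://cms.example/drafts/c"},
    fetch_status=lambda url: fake_statuses[url],
)
print(exposed)  # ['https://cms.example/drafts/a']
```

Passing the probe in as a function keeps the audit logic testable without network access. Any non‑empty result would feed the incident‑response runbook.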

⚠️ Callout – Mythos‑scale blast radius

Leaking a frontier‑model launch draft affects more than PR; it informs adversaries about capabilities and threat models in ways defenders cannot fully undo.[3][7]

5.2 Tiered access and KYC

Given Mythos’s superiority in programming and cybersecurity, Anthropic chose cautious, limited testing with vetted clients.[2][5] The encyclical generalizes:

Strong KYC for full‑capability endpoints.
Comprehensive logging and audits for suspicious patterns.
Differentiated access tiers (public / enterprise / defense‑only) matched to risk profiles.[2][5]
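
These three norms can be expressed as a single authorization check. The sketch below is illustrative only: the capability names, the `MIN_TIER` mapping, and the decision to treat the tiers as a strict ordering are all assumptions made here, not a description of any vendor’s access system.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PUBLIC = 1
    ENTERPRISE = 2
    DEFENSE_ONLY = 3


# Hypothetical mapping from capability to the minimum tier allowed to call it.
MIN_TIER = {
    "chat": Tier.PUBLIC,
    "code_review": Tier.ENTERPRISE,
    "vulnerability_analysis": Tier.DEFENSE_ONLY,
}


@dataclass
class Caller:
    name: str
    tier: Tier
    kyc_verified: bool


audit_log = []  # every decision is recorded, allowed or not


def authorize(caller: Caller, capability: str) -> bool:
    required = MIN_TIER.get(capability)
    allowed = (
        required is not None                         # unknown capabilities are denied
        and caller.tier.value >= required.value      # tier must be high enough
        and (required is Tier.PUBLIC or caller.kyc_verified)  # KYC above public
    )
    audit_log.append((caller.name, capability, allowed))
    return allowed


print(authorize(Caller("anon", Tier.PUBLIC, False), "chat"))             # True
print(authorize(Caller("acme", Tier.ENTERPRISE, False), "code_review"))  # False
```

The second call fails because the caller has the right tier but no KYC verification. Both decisions land in `audit_log`, which is the hook for later pattern review.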

5.3 Cyber‑risk mitigations

For models ahead of all others in cyber capabilities, it recommends:

Sandboxed execution: All code and exploit‑like outputs run in tightly contained environments.[3][5]
Human review: High‑impact outputs (e.g., exploit chains, malware frameworks) require human sign‑off.
Refusal patterns: Default to declining direct exploit generation, shifting toward patching and defense guidance.[3]
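
The review and refusal steps can be pictured as a gate in front of every output. This is a minimal sketch with invented names: the risk labels, `gate_output`, and `release` are hypothetical, and the labels are assumed to come from an upstream classifier that is not shown. Sandboxed execution is also out of scope here.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    HUMAN_REVIEW = "human_review"
    REFUSE = "refuse"


def gate_output(risk_label: str) -> Decision:
    """Map a (hypothetical) upstream risk label to a release decision."""
    if risk_label == "exploit_generation":
        return Decision.REFUSE          # default to declining
    if risk_label == "high_impact":
        return Decision.HUMAN_REVIEW    # needs explicit sign-off
    return Decision.ALLOW


def release(text: str, risk_label: str,
            reviewer_approves: Callable[[str], bool]) -> str:
    decision = gate_output(risk_label)
    if decision is Decision.REFUSE:
        return "Declined. Offering patching and hardening guidance instead."
    if decision is Decision.HUMAN_REVIEW and not reviewer_approves(text):
        return "Held: reviewer did not approve."
    return text


print(release("hello", "benign", lambda t: False))
print(release("audit report", "high_impact", lambda t: False))
```

The reviewer is passed in as a callable so that the sign‑off step can be a ticketing system, a second engineer, or a test stub. The gate itself never releases a high‑impact output without it.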

5.4 Internal assessments as hard constraints

Anthropic’s own language (“unprecedented cyber risks,” “too powerful for broad release”) must act as real deployment constraints.[1][3][7]

📊 Callout – When red teams say stop

If frontier red‑team reports show offensive capacity outpacing defense, the default should be limited deployments, ongoing red‑teaming, and external audits before scaling.[3][5]

5.5 Escalation thresholds and market signals

The encyclical notes that after the Mythos leak, cybersecurity stocks reportedly dipped on fears of displacement.[7] Organizations are urged to define escalation triggers, such as:

Regulatory or market jolts signaling new systemic risk.
Discovery of unanticipated offensive capabilities in internal testing.
Public leaks revealing sensitive model behaviors or tooling.[7]

Triggers should prompt pauses, tightened filters, or temporary disabling of high‑risk tools.
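
A trigger policy of this kind is easiest to audit when it is written down as data. The sketch below assumes one possible mapping from the three triggers above to the three responses. The pairing and the severity ordering are hypothetical choices made for illustration, not something the leaked material specifies.

```python
from enum import Enum


class Trigger(Enum):
    MARKET_OR_REGULATORY_JOLT = "market_or_regulatory_jolt"
    UNANTICIPATED_OFFENSIVE_CAPABILITY = "unanticipated_offensive_capability"
    PUBLIC_LEAK = "public_leak"


class Action(Enum):
    # Values encode an assumed severity ordering, lowest to highest.
    TIGHTEN_FILTERS = 1
    DISABLE_HIGH_RISK_TOOLS = 2
    PAUSE_DEPLOYMENT = 3


# Hypothetical policy: which response each trigger demands at minimum.
POLICY = {
    Trigger.MARKET_OR_REGULATORY_JOLT: Action.TIGHTEN_FILTERS,
    Trigger.PUBLIC_LEAK: Action.DISABLE_HIGH_RISK_TOOLS,
    Trigger.UNANTICIPATED_OFFENSIVE_CAPABILITY: Action.PAUSE_DEPLOYMENT,
}


def escalate(active_triggers):
    """Return the most severe action required by the active triggers,
    or None when no trigger is active."""
    actions = [POLICY[t] for t in active_triggers]
    return max(actions, key=lambda a: a.value, default=None)


print(escalate([Trigger.MARKET_OR_REGULATORY_JOLT, Trigger.PUBLIC_LEAK]))
# Action.DISABLE_HIGH_RISK_TOOLS
print(escalate([]))  # None
```

Taking the maximum means that several simultaneous triggers never produce a weaker response than the worst one alone would.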

Mini‑conclusion: For engineers, the encyclical doubles as a safety SRE manual: access control, logging, sandboxing, and escalation are treated as moral commitments.

6. From Fear to Stewardship: A Roadmap for Labs, Churches, and States

Anthropic has admitted fearing that Mythos could bypass cybersecurity measures and has judged it too powerful for general release for now.[1][2][7] The encyclical reframes this:

Not as a call to ban Mythos‑class systems outright.
But as an invitation to shared stewardship and joint governance.

It calls on:

Labs:
- To embed conscience in product decisions and treat internal risk memos as binding moral constraints.
- To design Mythos‑class deployments around defense, transparency, and strict access.
Churches and faith communities:
- To support conscientious engineers and whistleblowers.
- To develop formation programs on digital ethics and AI discernment.
States:
- To abandon coercive safety rollbacks and instead codify norms against offensive weaponization of frontier models.
- To invest in defensive infrastructure matching Mythos‑era capabilities.

The imagined encyclical ends not in panic but in a sober claim: Mythos‑class systems expose how much technical governance, institutional conscience, and moral theology must converge. What leaked through a misconfigured CMS becomes, under papal and technical scrutiny, a test case for whether humanity can wield frontier AI without surrendering its integrity.
