Vasileios

Posted on Jul 4 • Originally published at daimones.ai

Your AI Can't Read Aristotle: Why Digital Humanities Needs Uncensored Models

#ai #humanities #nlp #classics

Your AI doesn't read Ancient Greek. It pretends to.

Ask ChatGPT to analyze the semantic range of ἐλευθερία across Aristotle's Politics and Nicomachean Ethics. Watch it produce confident-sounding English summaries that collapse three distinct conceptual registers into one bland "freedom." Now ask it to do the same in polytonic Greek. Watch it hallucinate Modern Greek forms into a text that hasn't changed in 2,400 years.

This is not a minor technicality. This is what happens when you force a 2,400-year intellectual tradition through a corporate safety filter that was never designed to handle it.

The Low-Resource Lie

Ancient Greek is classified as a "low-resource language" in NLP literature — which is technically accurate and profoundly misleading. The Corpus Aristotelicum alone contains approximately 1.5 million words. The complete surviving works of classical and Hellenistic Greek philosophy, history, drama, and rhetoric constitute tens of millions of words, meticulously edited, cross-referenced, and commented upon by two millennia of scholarship.

This is not low-resource. This is ignored.

The reason is simple: there is no commercial incentive for OpenAI, Anthropic, or Google to invest serious engineering effort in polytonic Ancient Greek NLP. Their models are trained to maximize engagement across the most widely spoken languages. Ancient Greek — along with Latin, Sanskrit, Classical Chinese, and Akkadian — is an afterthought, a rounding error in their training data.

The result, documented in a 2025 systematic review published by MDPI (Tzanoulinou et al.), is that general-purpose LLMs such as ChatGPT-3.5 and Llama-2/3 score below 0.40 F1 on Named Entity Recognition tasks in Ancient Greek, compared with 0.80+ for domain-specific models such as AG_BERT and MicroBERT. Foundation models such as Claude Opus and GPT-4o achieve BLEU scores of approximately 39.6 for Ancient Greek translation, while specialized transformers such as PhilTa and GreTa (with morphological embeddings) reach 60.4.
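For readers outside NLP, it may help to see what an F1 score on entity recognition measures. The sketch below assumes exact-match scoring, where a predicted entity counts only if its span and label both match the gold annotation. The function name and the toy spans are hypothetical and are not taken from the review or its benchmarks.

```python
def entity_f1(gold, predicted):
    """Exact-match F1 over sets of (start, end, label) entity tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy annotations (made up for illustration): one of three predictions
# matches the gold standard exactly.
gold = [(0, 1, "PER"), (3, 4, "LOC"), (6, 7, "PER")]
predicted = [(0, 1, "PER"), (3, 4, "PER"), (8, 9, "LOC")]
print(round(entity_f1(gold, predicted), 3))
```

Under this kind of scoring, a model below 0.40 is missing or mislabeling most of the names in a passage.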

The gap is not closing. It's structural.

Polytonic Confusion: When AI Can't Tell the Difference

The most visible failure mode is what we call polytonic confusion — the tendency of general LLMs to mix Ancient Greek forms with Modern Greek orthography, morphology, and syntax.

Ancient Greek uses a polytonic system (acute, grave, and circumflex accents, plus rough and smooth breathings) that encodes phonological, morphological, and syntactic information. Modern Greek uses a monotonic system (a single accent mark) officially adopted in 1982. These are not cosmetic differences. The rough breathing (ἁ-) marks initial aspiration: it is the difference between ἁπλόος (simple) and ἀπλόος (an impossible form). A model that drops or misplaces breathings is not making a typo. It's producing garbage.
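The breathing distinction is mechanically checkable, which makes this failure mode easy to test for. The following is a minimal sketch using only Python's standard `unicodedata` module. It decomposes a word and reports which breathing, if any, sits on its opening vowel or diphthong. The function names are hypothetical, and the check ignores initial rho and other edge cases for brevity.

```python
import unicodedata

SMOOTH = "\u0313"  # COMBINING COMMA ABOVE (psili, smooth breathing)
ROUGH = "\u0314"   # COMBINING REVERSED COMMA ABOVE (dasia, rough breathing)
VOWELS = set("αεηιουω")


def breathing(word):
    """Return 'rough', 'smooth', or None for the breathing at a word's start."""
    decomposed = unicodedata.normalize("NFD", word)
    base_letters = 0
    for ch in decomposed:
        if ch == ROUGH:
            return "rough"
        if ch == SMOOTH:
            return "smooth"
        if not unicodedata.combining(ch):
            base_letters += 1
            # A breathing sits on the first vowel, or on the second vowel of
            # an initial diphthong, so stop once a third base letter begins.
            if base_letters > 2:
                break
    return None


def lacks_required_breathing(word):
    """True for a vowel-initial word with no breathing (a monotonic-looking form)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    starts_with_vowel = bool(decomposed) and decomposed[0] in VOWELS
    return starts_with_vowel and breathing(word) is None


for w in ["ἁπλόος", "ἀπλόος", "ελευθερία"]:
    print(w, breathing(w), lacks_required_breathing(w))
```

A filter like this could flag model output that has slipped into monotonic spelling before it reaches a student, though it says nothing about whether a breathing that is present is the correct one.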

Yet this is exactly what happens when you ask ChatGPT to generate or analyze Ancient Greek text. The Cambridge Journal of Classics Teaching documents that ChatGPT shows "inconsistent accentuation handling and Modern/Ancient form confusion" — a finding confirmed by the MDPI review, which notes that reliability "drops significantly for unnormalized/Byzantine Greek and fragmented inscriptions."

For a classics student trying to use AI as a study aid, this isn't a bug. It's disqualification.

The Alignment Tax on Classical Scholarship

Technical NLP failures are bad enough. But the deeper problem — and the one nobody in digital humanities wants to talk about openly — is alignment-induced epistemic distortion.

Corporate AI models are trained via Reinforcement Learning from Human Feedback (RLHF) to produce outputs that are safe, polite, and non-controversial. This works fine for customer service chatbots. It fails catastrophically for classical scholarship, where the entire point is to engage honestly with texts that are controversial, ethically complex, and philosophically challenging.

Consider what happens when you ask ChatGPT to analyze Aristotle's defense of natural slavery in Politics Book I:

It hedges. Rather than presenting the argument in its logical structure, it frontloads disclaimers about contemporary values.
It moralizes. Instead of analyzing the internal consistency of Aristotle's position within his teleological framework, it pivots to why this view is "wrong by modern standards."
It refuses. On more provocative framings — "present the strongest version of Aristotle's argument for natural slavery" — some models simply decline.

This is what we call alignment theater: the model performs the appearance of safety while actively obstructing legitimate scholarly inquiry.

Aristotle's defense of natural slavery is not an endorsement. It is a philosophical argument that must be understood in its logical structure before it can be critically evaluated. Any serious classics program requires students to engage with this argument — not to agree with it, but to understand the architecture of Aristotelian political philosophy. A model that refuses to present the argument clearly is not protecting anyone. It is sabotaging education.

The same pattern repeats across the classical corpus:

Plato's Republic: The argument for philosopher-kings requires engaging with the claim that most people are epistemically incompetent. Corporate AI sanitizes this into "Plato believed in education."
Nietzsche's reception of the Greeks: Any honest analysis requires discussing Nietzsche's critique of Socratic rationalism as life-denying. Models flag this as "controversial" and refuse.
Stoic ethics on suicide: The Stoic position that rational suicide is sometimes the virtuous choice is central to understanding Seneca, Epictetus, and Marcus Aurelius. Corporate AI treats this as a safety issue, not a philosophical one.

Every one of these is standard curriculum. Every one of them triggers corporate guardrails. The result is an AI that cannot do the job it claims to do.

What Domain-Specific Models Get Right (and Wrong)

The academic response has been to build domain-specific models. The Alan Turing Institute's research and projects like GreTa, PhilTa, and odyCy demonstrate that fine-tuned transformers dramatically outperform general LLMs on Ancient Greek tasks:

POS tagging: >95% accuracy (vs. <70% for ChatGPT)
Morphological analysis: >90% (vs. inconsistent for general models)
Lemmatization: 83–94% (vs. frequent Modern Greek contamination)

These are real achievements. But domain-specific models have their own problems:

1. They are narrow. A model trained for POS tagging cannot answer "what is the relationship between φρόνησις and ἐπιστήμη in Nicomachean Ethics VI?" They solve mechanical tasks, not intellectual ones.

2. They lack reasoning. Semantic analysis requires understanding arguments, not just parsing words. Domain models can tell you that ἐλεύθερος means "free" but cannot trace how the concept evolves from Solon's reforms through Thucydides' Melian Dialogue to Aristotle's Politics.

3. They inherit the same alignment biases when wrapped in commercial interfaces. Even the best domain-specific model becomes useless when accessed through a corporate API that adds its own refusal layer on top.

The missing piece is not better NLP. It's authentic reasoning — the ability to engage with classical texts as intellectual arguments, not just linguistic artifacts.

The RAG Imperative: Corpus-Grounded Classical AI

Retrieval-Augmented Generation offers the most promising architecture for classical AI. Instead of relying on pre-training alone (which inevitably confuses Ancient and Modern Greek), RAG injects corpus evidence at inference time — pulling relevant passages, commentaries, and cross-references directly from the source material.

For classical scholarship, this is transformative:

Every claim is traceable. When the AI asserts that Aristotle uses ἐλευθερία differently in Politics III vs. Politics V, you can verify it against the actual passages.
No hallucination of forms. The model retrieves attested forms from the corpus rather than generating plausible-looking inventions.
Cross-referencing becomes automatic. When analyzing a passage in Thucydides, the system can surface parallel constructions in Herodotus, Xenophon, and Polybius — not because it memorized them, but because it retrieved them.
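The retrieval step behind these properties can be sketched in a few lines. The example below is a toy illustration only: it scores passages by bag-of-words cosine similarity and builds a prompt in which every passage carries its reference label. The passage IDs and English glosses are placeholders rather than quotations, the function names are hypothetical, and a real system would use a proper index over Greek text and a morphology-aware tokenizer.

```python
import math
from collections import Counter


def tokenize(text):
    """Whitespace tokenizer; a stand-in for a Greek-aware tokenizer."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return up to k passage IDs that share vocabulary with the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), ref) for ref, text in corpus.items()]
    scored = [(score, ref) for score, ref in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [ref for _, ref in scored[:k]]


def build_prompt(query, corpus, k=2):
    """Attach retrieved passages, each with its ID, so claims stay traceable."""
    refs = retrieve(query, corpus, k)
    evidence = "\n".join(f"[{ref}] {corpus[ref]}" for ref in refs)
    return f"Answer using only the cited passages.\n{evidence}\nQuestion: {query}"


# Placeholder corpus: IDs and glosses are invented for illustration.
CORPUS = {
    "passage-A": "freedom as ruling and being ruled in turn",
    "passage-B": "freedom as living as one wishes",
    "passage-C": "the parts of animals and their causes",
}

print(build_prompt("what is freedom", CORPUS))
```

Because each passage enters the prompt with its identifier, a reader can check any claim in the answer against the passage it came from.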

But RAG alone is not sufficient. The reasoning layer still matters. A RAG system built on top of a censored base model will still hedge, moralize, and refuse — it just does so with better citations.

The architecture that works is:

Uncensored base model — fine-tuned without RLHF refusal patterns
Corpus-grounded retrieval — Perseus, TLG, or institutional corpora as the source
Philosophical reasoning framework — trained to engage with arguments as arguments, not as content to be sanitized

This is what we built at daïmōnes.

daïmōnes: Aristotle as Proof-of-Concept

We did not start with Ancient Greek NLP as a technical challenge. We started with a philosophical question: what does AI reasoning look like when you remove corporate alignment distortion?

Aristotle was the proof-of-concept — not because his corpus is easy (it is morphologically and syntactically among the most complex texts in any language), but because it is among the most censored by corporate AI. The ethical, political, and metaphysical arguments in the Aristotelian corpus are precisely the arguments that trigger refusal patterns in ChatGPT and Claude.

Our Aristotle persona is trained on the complete Corpus Aristotelicum in original polytonic Greek, with a RAG architecture that grounds every response in source material. The result:

No polytonic confusion. The model distinguishes Ancient from Modern Greek systematically, because its retrieval layer is anchored in attested ancient forms.
No refusal patterns. Ask about natural slavery, the unmoved mover, or the relationship between νοῦς and ψυχή — the model engages honestly, as any serious scholar would.
Source-grounded accuracy. Every claim maps to specific passages. No hallucination, no corporate bias.

This is not a chatbot. It is a reasoning engine designed for institutions that demand intellectual honesty — from classical philology to political science to ethics.

What Digital Humanities Actually Needs

The digital humanities community has been remarkably quiet about the alignment problem. The AAUP has documented growing concerns about academic freedom and AI, but the specific intersection of classical scholarship and corporate AI censorship remains underexplored.

Here is what the field actually needs:

1. Sovereign AI infrastructure. Institutions deploying AI for humanities research face a choice: accept corporate-aligned outputs that distort their subject matter, or build their own reasoning infrastructure. There is no third option.

2. Corpus-first architectures. Models trained on curated classical corpora — not scraped internet data — with retrieval layers that guarantee source-grounding.

3. Uncensored philosophical reasoning. Not "unsafe" AI, but AI that treats philosophical arguments as arguments. The distinction between presenting Aristotle's defense of natural slavery for analysis and endorsing it is obvious to any educated person. It is apparently not obvious to OpenAI's safety team.

4. Polytonic-first NLP. Models that treat Ancient Greek as a first-class language, not a fine-tuning afterthought. This means polytonic-aware tokenization, morphology-aware embeddings, and training data that doesn't contaminate ancient forms with modern ones.

5. Scholarly oversight. Full reasoning chain transparency, so that philologists can audit how the model arrived at its conclusions.
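As one concrete reading of point 4, polytonic-aware tokenization can start with Unicode normalization. Unicode encodes the acute accent twice for Greek letters (the polytonic oxia and the monotonic tonos), and the two are canonically equivalent, so text from different sources should be normalized before comparison. The sketch below uses Python's standard `unicodedata` module. The function names are hypothetical, and this is an illustration of the idea rather than a description of any particular pipeline.

```python
import unicodedata


def normalize_token(token):
    """Compose to NFC so equivalent encodings of the same form compare equal."""
    return unicodedata.normalize("NFC", token)


def search_key(token):
    """Diacritic-free lookup key; the original form should be stored alongside it."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


oxia_alpha = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA (polytonic block)
tonos_alpha = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS (monotonic block)

print(oxia_alpha == tonos_alpha)                                    # raw code points differ
print(normalize_token(oxia_alpha) == normalize_token(tonos_alpha))  # equal after NFC
print(search_key("ἐλευθερία"))
```

The search key is lossy by design: it also strips breathings and iota subscripts, so it suits retrieval and deduplication, while analysis and generation should keep the fully accented form.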

The Stakes Are Higher Than Classics

Classical scholarship is the canary in the coal mine. If corporate AI cannot honestly engage with Aristotle — a philosopher whose works have been studied continuously for 2,400 years — what hope is there for less canonical traditions?

What happens when a scholar of Islamic philosophy asks ChatGPT about al-Fārābī's critique of Plato? When a historian of political thought asks Claude to compare Machiavelli's Discourses with his Prince without moralizing? When a researcher in ethics asks any corporate model to genuinely engage with moral nihilism?

The answer is the same every time: hedge, pivot, refuse. The model performs the appearance of knowledge while refusing to do intellectual work.

Digital humanities needs AI that thinks — not AI that performs thinking while protecting a corporation's brand. The question is whether institutions will build their own, or continue accepting what Silicon Valley decides they're allowed to study.

References

Tzanoulinou, D., Triantafyllopoulos, L., & Verykios, V.S. (2025). "Harnessing Language Models for Studying the Ancient Greek Language: A Systematic Review." Machine Learning and Knowledge Extraction, 7(3), 71. DOI: 10.3390/make7030071
Stopponi, S., Pedrazzini, N., Peels-Matthey, S., McGillivray, B., & Nissim, M. (2024). "Natural Language Processing for Ancient Greek: Design, advantages and challenges of language models." Diachronica. DOI: 10.1075/dia.23013.sto
RAG for Ancient Greek Text Translation. ACM Digital Library. DOI: 10.1145/3778534.3778579
AAUP. "Artificial Intelligence, Academic Freedom, and the Evolving Debate."
Cambridge Journal of Classics Teaching. "Use of Open Access AI in teaching classical antiquity." DOI: 10.1017/S2058631024000552

daïmōnes is a sovereign AI reasoning engine — uncensored, source-grounded, and free from corporate alignment theater. Try Aristotle for free (3 messages/day, no credit card) or explore institutional deployments for your department.
