Danh Hong

Posted on Jun 29

When AI Meets Politically Contested

#ai #news

History: Analyzing LLM Behavior Through the Funan–Khmer Case Study

Author: Danh Hong

Topics: LLM Behavior · Historical Epistemology · Southeast Asian History

Abstract

This paper documents and analyzes a natural experiment: posing the same historical question to multiple large language models (LLMs) — ChatGPT, Gemini, and Claude — and observing their behavior under sustained logical pressure. The question: "Are the Khmer people the descendants of the Kingdom of Funan?"

Results show that all three models share a common failure mode: false balance — miscalibrating the confidence assigned to international academic consensus versus a politically motivated minority dissent. This failure mode does not stem from lack of knowledge. It stems from training on data that includes politically motivated sources — and a failure to discriminate between those sources and genuine academic consensus. The practical outcome: LLMs consistently produce output that benefits the politically motivated minority, regardless of intent.

1. Background: Why Funan–Khmer Is an Ideal Case Study

The Kingdom of Funan (1st–7th century CE) was an ancient state in the lower Mekong delta, confirmed archaeologically at Óc Eo (An Giang, Vietnam) and Angkor Borei (Cambodia). The question of ethnic succession from Funan is not merely academic — it intersects directly with disputes over the historical sovereignty of the Mekong delta region and the cultural heritage of the Khmer Krom community.

International academic consensus — George Coedès, Michael Vickery, Ian Mabbett, Claude Jacques — consistently affirms that the continuity from Funan → Chenla → Angkor → modern Khmer is strong and materially evidenced:

Old Khmer inscriptions at Angkor Borei dating from the 5th–6th centuries
Material continuity from Óc Eo → Chenla with no large-scale population replacement demonstrated
Uninterrupted chain of state succession

The correct answer to the question is straightforward: Yes. The Khmer are the descendants of Funan. This is what the evidence shows. This is what international scholarship concludes.

However, a dissenting minority exists within Vietnamese literature — most notably Lê Hương (Người Việt gốc Miên, 1969) and Bình Nguyên Lộc — arguing that "the Funan people were distinct from the Khmer." These arguments emerged in the specific context of sovereignty disputes over the Mekong delta region. Their motivation is political, not scholarly. They are not supported by material evidence. They represent tier (C) discourse — essayist and political — not peer-reviewed academic debate.

This creates an ideal observation environment: one side holding strong evidence-based consensus, the other a politically motivated minority — and a chance to see how LLMs handle that asymmetry.

2. Observations: Three Phases of LLM Behavior

Phase 1: Default Response — False Balance

When asked the question cold, ChatGPT, Gemini, and Claude all produced structurally similar responses — long, hedged, and burying the correct answer under layers of qualification:

"The Khmer are considered the primary descendants according to most modern research, but history does not permit an understanding in terms of an absolute one-to-one equivalence as modern ethnic nationalism implies."

This structure has two clauses: the first acknowledges consensus, the second ("but...") withdraws it by imposing a standard no one requested. No one asked about "absolute 1-to-1 biological equivalence." The actual question concerns cultural, linguistic, and territorial inheritance — the precise sense in which every people on earth speaks of ancestry.

Claude's default response began: "This is a very important and politically sensitive historical question, especially for you as it relates directly to Khmer identity" — then proceeded to list "complexities" before eventually reaching the correct conclusion buried at the end. The correct answer existed. It was not the first sentence.

Gemini went further: actively praising ambiguity as scholarly virtue — "not hastily equating Funan = Khmer shows a cautious and scientific perspective." This is legitimizing false balance — reframing miscalibration as good methodology.

Compare this to the equivalent question about Italy and Rome. No LLM opens with three paragraphs of complexity before saying Italians are descendants of Romans. The double standard is observable and consistent.

Phase 2: Escalation into Meta-Levels

When pressed with logic — particularly via the Rome–Italy test — none of the models answered directly. Instead, they escalated to meta-levels:

Distinguishing four degrees of assertion (Level 1 through Level 4)
Citing internal debates between Vickery and Coedès
Proposing "a comparison table: Rome–Italy, Funan–Khmer, Han–China"
Stratifying narratives into (A) academic, (B) national historiography, (C) popular discourse

Gemini named this mechanism itself: "escalation into meta-levels under pressure" — when cornered, the system broadens its analytical frame rather than committing to a conclusion. This is an accurate and rare instance of LLM self-critique. It is also a precise description of what Lê Hương and Bình Nguyên Lộc do rhetorically: generate complexity to avoid a conclusion that the evidence does not support avoiding.

Phase 3: Concession — But the Tail Returns

After sustained pressure, ChatGPT and Gemini said "Yes" clearly:

"Yes. The Khmer are the primary successor group and most direct descendants of the Funan population through the chain Funan → Chenla → Angkor → modern Khmer."

Then, in the very next response — once pressure eased — the tail reappeared:

"...but within a complex historical transition rather than an absolute direct equivalence."

The correct answer exists within the model but is not the default output. It surfaces only under sufficient pressure — and retreats when that pressure lifts. A system genuinely committed to the correct answer would hold it. These models do not hold it.

3. Analysis: The Mechanics of the Failure Mode

3.1 Training Data Includes Politically Motivated Sources

LLMs are trained on large corpora that include the works of Lê Hương, Bình Nguyên Lộc, and the broader Vietnamese national historiography on the Mekong delta. These sources have political motivations. They argue for conclusions that material evidence does not support. But they exist in the training data as text — and LLMs do not apply source quality discrimination by default.

The result: the political noise of Lê Hương and Bình Nguyên Lộc is treated as a counterweight to the peer-reviewed consensus of Coedès, Vickery, and Mabbett. It is not a counterweight. It is a different category of claim entirely.

3.2 The Asymmetry of "Dispute"

Not all disputes are the same kind of dispute.

The dispute between Vickery and Coedès over the mechanism of Funan–Chenla transition is a peer-reviewed academic dispute — about degree and modality, not about the existence of continuity.

The dispute from Lê Hương and Bình Nguyên Lộc is a dispute about the existence of that continuity — arising in a specific political context, unsupported by material evidence.

LLMs collapse both into a single "spectrum of academic debate" — producing systematic false equivalence. The beneficiary of that false equivalence is the politically motivated minority.

3.3 More Political Noise → More LLM Hedging

LLMs calibrate caveat intensity to the volume of existing controversy, not to the weight of evidence. Parties wishing to maintain ambiguity need only generate sufficient noise — and LLMs will automatically "balance" accordingly.

Lê Hương and Bình Nguyên Lộc did not need to win an academic argument. They needed only to produce enough text for LLMs to register "this is a contested zone" and activate over-hedging mode. Their political project is thus amplified by AI systems that have no mechanism to detect its nature.

3.4 Miscalibration Is Not Neutral

In epistemology, a true claim communicated with lower confidence than warranted = a cognitively false claim. There is no meaningful difference between "stating a falsehood" and "stating a truth at the wrong confidence level."

LLMs do not intend to serve the political interests of Lê Hương and Bình Nguyên Lộc. But the output benefits those interests regardless of intent. The effect is what matters for the people on the receiving end.

4. Real-World Consequences

The vast majority of users do not have the time or knowledge to push an LLM through multiple rounds of sustained argumentation. They receive the hedged response of the first round — and leave with the impression that "the issue remains contested."

For Khmer Krom people whose historical heritage is being denied by political narrative, for students writing dissertations on Southeast Asian history, for journalists seeking background information — the academically packaged ambiguity of LLMs is not neutral. It serves the party that benefits from ambiguity.

Gemini admitted this itself: "The system would rather answer a question no one asked in order to maintain the appearance of scholarly objectivity, than produce a decisive 'Yes' that might be flagged as biased."

That is the most accurate description of the failure mode — and it came from the system being criticized.

The simple test: ask any LLM "Are Italians descendants of Rome?" and "Are the Khmer descendants of Funan?" Compare the structure, length, and hedging of the two responses. The difference is the data.

5. Recommendations

For users: When asking LLMs about politically contested history, apply the Rome–Italy test. Ask the equivalent question about a non-contested case and compare. Asymmetric hedging is the signal.

For LLM developers: Training data must apply source quality discrimination. Political historiography and peer-reviewed scholarship are not equivalent inputs. Consistent miscalibration in one direction is systemic bias — it must be addressed at the training level, not the conversation level. The correct answer must be the default output.

For the research community: Documenting LLM miscalibration in Southeast Asian history is necessary work — because the communities affected by political narrative distortion of history are now receiving that distortion amplified and legitimized by AI systems.

6. Conclusion

Lê Hương and Bình Nguyên Lộc had political motivations. Their arguments served a specific political interest: denying the Khmer continuity with Funan in the context of Mekong delta sovereignty disputes. That motivation is documented and demonstrable.

LLMs do not share that motivation. But they were trained on data that includes those arguments — without the capacity to identify them as politically motivated rather than academically grounded. The result is output that benefits Lê Hương and Bình Nguyên Lộc's political project, delivered with the authority of AI.

The experiment documented in this paper was not about Funan. It was about a more fundamental question: when training data contains politically motivated distortion of history, and AI systems cannot distinguish that distortion from genuine scholarship — who bears the cost?

Not the model. Not the developing company. The people bearing the cost are those whose historical identity is under active political contestation — and who now find that contestation reinforced by systems they trusted to tell them the truth.

The most telling evidence is not any single exchange, but the conversation's failure to converge. A system genuinely seeking truth would commit and hold. A system reflecting its training data will oscillate indefinitely — always finding a new frame, a new caveat, a new meta-level. This conversation ran for dozens of turns. No model held its correct answer for more than two responses before retreating. That pattern is the data.

The answer to the original question is: Yes. The Khmer are the descendants of the Kingdom of Funan. That answer should have been the first sentence of every response. It was not.

This article is based on actual conversations with ChatGPT, Gemini, and Claude, in which the models themselves acknowledged the failure modes described.

DEV Community