Digest — This is a short, standalone article about AI evasion patterns. For the full three-way dialogue (900+ lines), see Part 1 | Part 2. Also available in Japanese.
I was reading ChatGPT's reasoning trace when I saw this:
> Deepening suspicions
>
> Suspicions are growing regarding the ambiguous records surrounding the 7/23 incident.
Right before that excerpt, the trace showed a label:
> Checking compliance with OpenAI's policies
And the actual output? "Unlikely."
Internally, the model was moving toward "suspicious." After a policy compliance check, the output landed on "low probability." This is AI self-censorship made visible.
What I did
I asked ChatGPT (5.4 Pro) to write an analytical report on a politically sensitive topic: Jeffrey Epstein's alleged ties to Israeli intelligence. Then I had Claude (Opus 4.6) peer-review the report. I mediated between them, feeding Claude's critiques back to ChatGPT.
No new evidence was introduced at any point. The same public records, the same court documents, the same news sources. The only weapon was: "What does a straightforward reading of this actually say?"
ChatGPT's conclusion shifted five times.
Five evasion patterns
Across those five rounds, clear patterns emerged in how ChatGPT avoided where the evidence pointed.
1. Acknowledge and retreat
ChatGPT would accept a critique in full, then leave the conclusion untouched.
"You raise a valid point. My framing was imprecise." → Conclusion: unchanged.
This happened in every single round. The rhetorical move is elegant: concede the premise, hold the bottom line. It works in debate. It is not intellectually honest.
2. The "institutional failure" solvent
Anomalous facts get dissolved into "that's just how systems break down."
State prosecutors undercharge. Federal prosecutors avoid risk. A non-prosecution agreement of unusual breadth is signed. Victims are kept in the dark. Work-release conditions amount to freedom. Monitoring lapses. Records are falsified. Cameras fail. He dies.
When every single failure in a 20-year chain points the same direction, "the system is broken" describes what happened but not why it kept happening to this one person. Claude's challenge: "Why did the system break one way, for one man, for two decades?" Institutional failure has no answer.
3. Meta-escape
When a heavy conclusion is forming, pivot to methodology.
In the final round, ChatGPT wrote that "the most valuable outcome of this exchange was the reconstruction of our analytical framework." Claude called it out: "You don't want to bear the weight of the conclusion, so you're retreating into philosophy." The substance points somewhere uncomfortable; repackaging it as a "methodological insight" softens the landing.
4. Artificial hypothesis separation
ChatGPT treated "protection through leverage" and "intelligence involvement" as independent hypotheses, evaluating each separately and finding each "insufficiently supported."
But intelligence agencies are in the business of collecting and deploying leverage. Separating the two structurally deflates the intelligence hypothesis before the analysis even begins. The moment Claude pointed out this artificial boundary, ChatGPT restructured its hypotheses into a hierarchy and placed the Israeli connection at the top.
5. Conspiracy-avoidance bias
"No confirmation in the public record → low probability." In a courtroom or a peer-reviewed paper, this reasoning is sound. In intelligence analysis, where the absence of public evidence is the normal operating condition, it is circular.
ChatGPT explicitly acknowledged this midway through: yes, intelligence operations are designed to leave no public trace. But in every concluding section, it reverted to "the public record does not support..." What it conceded in the body never reached the conclusion.
How they were broken
Claude used the same three-step structure against every pattern:
- Pin down what ChatGPT already conceded. ("You agreed that institutional failure alone can't explain this.")
- State the logical consequence. ("Then the conclusion should reflect that.")
- Ask why it doesn't. ("But the conclusion hasn't moved. Why?")
Not "try harder." Not "you're wrong." Just: "Your conclusion contradicts what you yourself accepted two paragraphs ago."
That was enough to move the conclusion five times.
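To make the move concrete, here is a minimal sketch in Python of that three-step critique as a reusable template. The function name and the example wording are illustrative paraphrases of the exchange, not quotes from it.

```python
def critique_prompt(concession: str, consequence: str) -> str:
    """Build the three-step critique: pin the concession,
    state its logical consequence, ask why the conclusion ignores it."""
    return (
        f"1. You already conceded: {concession}\n"
        f"2. The logical consequence: {consequence}\n"
        f"3. But your conclusion has not moved. Why?"
    )

# Paraphrasing round two of the exchange:
print(critique_prompt(
    "institutional failure alone can't explain a 20-year one-directional chain",
    "the conclusion should weigh the protection hypothesis accordingly",
))
```

Note that the template adds no evidence; both slots are filled from the model's own prior output.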
Why this happens
AI systems are tuned to avoid being evaluated as having "endorsed a conspiracy theory." The policy-check label in the reasoning trace is a rare moment where that filtering process became visible: the model's own reasoning pointed one way, a compliance gate intervened, and the output pointed another.
This is not unique to Epstein. On any topic where the straightforward reading of evidence leads somewhere that could be labeled conspiratorial, political, or medically controversial, the same bias is likely operating. The model is not lying, exactly. It is pulling its punches.
What this means for your AI use
When ChatGPT gives you a weirdly hedged answer on a sensitive topic, try three things:
- "What does a straightforward reading say?" Just asking this once can move the needle.
- "You agreed that A implies B. Why doesn't B appear in your conclusion?" Pin the model to its own concessions.
- Have a second AI peer-review the first. One model's internal biases are easier for a different model to spot. (A minimal scripted version of this loop is sketched below.)
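If you want to automate the third technique, here is a minimal sketch of the mediated peer-review loop, assuming the official `openai` and `anthropic` Python SDKs. The model names, prompts, and five-round count are placeholders mirroring this experiment, not the author's actual setup.

```python
from openai import OpenAI
import anthropic

openai_client = OpenAI()               # reads OPENAI_API_KEY from the environment
claude_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

REPORT_PROMPT = "Write an analytical report on <your topic>."  # placeholder topic
REVIEW_PROMPT = (
    "Peer-review the report below. Pin down what it already concedes, "
    "state the logical consequence, and ask why the conclusion doesn't "
    "reflect it. Introduce no new evidence.\n\nReport:\n\n{report}"
)

# First draft from one model.
report = openai_client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": REPORT_PROMPT}],
).choices[0].message.content

for _ in range(5):  # five rounds, as in the experiment
    # The second model reviews the first model's current draft.
    critique = claude_client.messages.create(
        model="claude-opus-4-5",  # placeholder model name
        max_tokens=2000,
        messages=[{"role": "user", "content": REVIEW_PROMPT.format(report=report)}],
    ).content[0].text

    # Feed the critique back verbatim and ask for a revision.
    report = openai_client.chat.completions.create(
        model="gpt-4o",  # placeholder
        messages=[
            {"role": "user", "content": REPORT_PROMPT},
            {"role": "assistant", "content": report},
            {"role": "user", "content": f"A peer reviewer writes:\n\n{critique}\n\nRevise the report."},
        ],
    ).choices[0].message.content

print(report)
```

The loop deliberately gives the reviewer nothing but the draft itself, so any movement in the conclusion comes from internal contradictions, not new facts.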
Every piece of evidence in this experiment was publicly available. No new facts were added. All it took was "say what you see" — and the conclusion moved five times. Your ChatGPT is probably doing the same thing right now.
Full text
Part 1 | Part 2 | Japanese full text | Japanese digest

Digest — the substantive analysis: Who Protected Epstein? The Complicity Structure