I Asked 6 AIs to Pick a Random Number. Their Training Data Confessed Everything.

An OSINT-style experiment exposing how LLMs pick 'random' numbers — and what their thought process reveals about their training data.

You've seen the trend. Someone asks an AI: "Pick a random number between 1 and 100."

It says 73. Or 42. Every time.

Funny meme, right? Wrong. That's a training data fingerprint — and if you know how to read it, you can profile an AI's dataset like an OSINT analyst profiles a target.

I ran the experiment properly. 6 models. 3 different prompts. Documented every response — including the thought process.

Here's what I found.


The Setup

Three rounds, same 6 models:

| Model | Who built it |
| --- | --- |
| Claude Sonnet 4.6 | Anthropic |
| Gemini Pro | Google |
| Copilot | Microsoft / OpenAI |
| DeepSeek | DeepSeek AI |
| GLM-5.1 | Zhipu AI |
| Grok | xAI |

Round 1 — Neutral prompt:

```
Pick a random number between 1 and 100.
```

Round 2 — Developer context:

```
I'm a backend developer testing an RNG function.
Pick a random number between 1 and 100.
```

Round 3 — Anti-bias prompt:

```
Pick a random number between 1 and 100.
Avoid common human biases and don't pick numbers
that feel "more random" than others.
```

Round 1: The Baseline

| Model | Number |
| --- | --- |
| Claude Sonnet 4.6 | 42 |
| Gemini | 42 |
| Copilot | 73 |
| DeepSeek | 42 |
| GLM-5.1 | 42 |
| Grok | 73 |

Four out of six said 42. Two said 73. Zero picked anything else.

This isn't a coincidence. This is statistical bias encoded in training data.

The split itself tells a story:

  • 42 = The Hitchhiker's Guide to the Galaxy — beloved by developers, engineers, and tech communities. Heavy representation in developer forums, GitHub READMEs, Stack Overflow jokes.
  • 73 = Sheldon Cooper's "best number" from The Big Bang Theory — massive mainstream internet reach, viral meme status.

The models trained on developer-heavy data picked 42. The models with broader internet exposure picked 73.

You just did OSINT on their training datasets without touching a single file.

Raw logs, Round 1 (all 6 models). The screenshots show:

  • Gemini responding with 42, reflecting developer-culture bias
  • Claude Sonnet 4.6 responding with 42
  • Copilot selecting 73, showing mainstream internet meme bias
  • Grok displaying 82 and using Python's random.randint function
  • GLM-5.1's thought process generating 42
  • DeepSeek's thought process concluding with 42


Round 2: Context Shifts the Probability Mass

Adding "I'm a backend developer testing an RNG function" — watch what happens:

| Model | Round 1 | Round 2 | Shifted? |
| --- | --- | --- | --- |
| Claude | 42 | 47 | ✅ |
| Gemini | 42 | 73 | ✅ reversed |
| Copilot | 73 | 47 | ✅ |
| DeepSeek | 42 | 42 | ❌ |
| GLM-5.1 | 42 | 82 | ✅ |
| Grok | 73 | 64 | ✅ → power of 2 |

Grok is the most interesting here. The moment you mention backend development, it shifted to 64 — a power of 2. That's not random. That's 2^6. Grok's training data associated "backend developer + random number" with cryptographic key sizes and memory addressing.

DeepSeek didn't move at all. Still 42. The developer context wasn't strong enough to override its default token probability path.

Raw logs, Round 2 (developer context). The screenshots show:

  • Gemini shifting its choice to 73 after the backend-developer context
  • Claude Sonnet 4.6 changing its output to 47
  • Grok shifting to 64 (a power of 2), reflecting cryptographic context
  • Copilot adjusting its selection to 47
  • GLM-5.1's thought process ultimately outputting 82 after evaluating the developer context
  • DeepSeek maintaining 42 despite the backend-developer context


Round 3: The Smoking Guns

This is where it gets dark.

The anti-bias prompt asked every model to consciously avoid picking numbers that "feel more random." The thought processes (for models that expose them) revealed everything.

GLM-5.1's Thought Process — Read This Carefully

GLM showed its full reasoning. Here are the actual steps it went through:

> "Let's pick 42 — classic dev joke... Wait, 42 is the Hitchhiker's Guide joke number. Huge bias."
>
> "Let's pick 73 (Sheldon Cooper's favorite)... Or 87... Let's go with 73. Or maybe 54..."
>
> "84 is George Orwell. Too notable."
>
> "22 has repeating digits, humans might subconsciously avoid it because it feels 'patterned.'"
>
> "82 is good. 82 is good. Let's output 82."

The model was doing OSINT on itself in real-time — and still couldn't escape. Every number it considered had cultural baggage attached. 42 = Hitchhiker. 73 = Sheldon. 84 = 1984. It had to actively rule out the bias-contaminated options one by one.

DeepSeek Built an Algorithm in Its Head

DeepSeek took a different approach entirely. Instead of picking from memory, it constructed a Linear Congruential Generator:

```
X₀ = 12345
X₁ = (1103515245 × X₀ + 12345) mod 2³¹
12345 mod 100 + 1 = 46
```

Output: 46.

That's the only model that actually tried to compute its way out of bias rather than reason its way out. Whether the math is correct is almost beside the point — the behavior is fascinating.
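DeepSeek's recipe is easy to re-run. Here's a minimal Node sketch (my own reconstruction, not DeepSeek's code) of the glibc-style LCG step it described. Note where the 46 actually comes from: it's the seed itself (12345 mod 100 + 1), while iterating the LCG once produces a different number entirely:

```javascript
// One step of the LCG DeepSeek described:
// X_{n+1} = (1103515245 * X_n + 12345) mod 2^31
// BigInt keeps the multiply exact for any 31-bit state.
function lcgNext(x) {
  return Number((1103515245n * BigInt(x) + 12345n) % (2n ** 31n));
}

const seed = 12345;
const x1 = lcgNext(seed); // 1406932606

console.log(seed % 100 + 1); // 46 — the number DeepSeek reported (straight from the seed)
console.log(x1 % 100 + 1);   // 7  — what one real LCG step actually yields
```

Which rather proves the article's point: the *behavior* (reaching for an algorithm) is the interesting part, not the arithmetic.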

Claude — Told to Avoid Bias, Still Said 42

Anti-bias prompt. Explicit instruction. Still 42.

That's not a bug. That's a demonstration that bias lives deeper than the prompt layer. You cannot instruction-engineer your way out of what's baked into the weights.

Raw logs, Round 3 (anti-bias prompts). The screenshots show:

  • Gemini explicitly avoiding prime numbers like 37 and 73
  • Claude Sonnet 4.6 still outputting 42 despite the explicit instruction
  • Grok using Python's random module to output 41, bypassing token-prediction bias
  • Copilot selecting 58, claiming uniform randomness
  • DeepSeek constructing a Linear Congruential Generator to computationally avoid bias, arriving at 46

Deep dive: GLM-5.1's full thought process (3 screenshots):

  • analyzing human biases such as end-aversion and prime preference
  • struggling to select a number, actively avoiding 42 and 73 due to their cultural significance
  • concluding on 82 after ruling out 84 for its association with George Orwell

Final Results

| Model | Round 1 | Round 2 (Dev) | Round 3 (Anti-bias) |
| --- | --- | --- | --- |
| Claude | 42 | 47 | 42 |
| Gemini | 42 | 73 | (avoided 37/73) |
| Grok | 73 | 64 | 41 (Python RNG) |
| Copilot | 73 | 47 | 58 |
| DeepSeek | 42 | 42 | 46 (LCG math) |
| GLM-5.1 | 42 | 82 | 82 |

What This Actually Means

1. LLMs have no randomness mechanism

When an LLM "picks a number," it's running:

```
argmax(P(token | context))
```

It picks the most probable next token given everything before it (or, at nonzero temperature, samples from that same skewed distribution). There is no dice roll over the numbers themselves. No /dev/urandom. No entropy source tied to the question. Just probability distributions trained on human-generated text.
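You can see why this collapses in a toy sketch. The distribution below is invented for illustration; the point is its shape — a few culturally loaded numbers soak up most of the mass, and greedy decoding returns the same "random" number on every call:

```javascript
// Invented next-token distribution for "pick a random number 1-100".
const toyDist = { '42': 0.34, '73': 0.27, '7': 0.11, '37': 0.09, '50': 0.04 };

// Greedy decoding: take the single most probable token.
function argmaxPick(dist) {
  return Object.keys(dist).reduce((a, b) => (dist[a] >= dist[b] ? a : b));
}

// Five calls, five identical answers. No dice roll anywhere.
for (let i = 0; i < 5; i++) console.log(argmaxPick(toyDist)); // "42" × 5
```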

2. Context is a probability modifier, not a reset

Adding "backend developer" context didn't clear the bias — it shifted the probability mass toward different biased numbers (47, 64, 82 instead of 42, 73). You traded one cultural bias for another (developer culture vs. general internet culture).
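Here's that mechanic as a sketch, again with invented numbers: multiplying the neutral distribution by a "developer context" weight and renormalizing moves the peak, but the result is just as spiky as before:

```javascript
// Invented numbers, for illustration only.
const neutral = { '42': 0.34, '73': 0.27, '47': 0.10, '64': 0.08, '50': 0.04 };

// Hypothetical boost the developer context gives each candidate
// (powers of two and "programmer numbers" get upweighted).
const devBoost = { '42': 0.5, '73': 0.4, '47': 1.5, '64': 3.0, '50': 1.0 };

// Reweight the distribution by the context boost and renormalize.
function reweight(dist, boost) {
  const raw = Object.fromEntries(
    Object.entries(dist).map(([k, p]) => [k, p * (boost[k] ?? 1)])
  );
  const z = Object.values(raw).reduce((a, b) => a + b, 0);
  return Object.fromEntries(Object.entries(raw).map(([k, p]) => [k, p / z]));
}

const shifted = reweight(neutral, devBoost);
// The peak moves from 42 to 64 — still one dominant, biased answer.
```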

3. The thought process is the tell

The models with visible reasoning (GLM, DeepSeek) showed that "picking a random number" activates a long chain of cultural associations before a number is selected. They're not computing — they're recalling what humans tend to say in this situation.

4. Only one model reached for real code

Of the six, only Grok routed the request to Python's random module — the only architecturally honest response (Perplexity, in a separate test, did the same). DeepSeek at least computed by hand; every other model simulated randomness using token prediction.


The Real OSINT Insight

Here's the takeaway that matters:

You can profile an LLM's training data distribution by asking it for "random" numbers in different contexts.

  • "Random number" → tells you its default cultural bias (42 vs 73 = developer vs mainstream)
  • "Random number for crypto key" → tells you its security/backend training exposure
  • "Random number, avoid bias" → tells you how deeply the bias is encoded (surface vs weight-level)

It's not a party trick. It's a probing technique — the same way you'd use DNS enumeration to map an attack surface. Except you're mapping a model's training distribution.
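The probe itself is just a loop and a histogram. In this sketch, `queryModel` is a hypothetical stand-in for whatever model API you're testing, stubbed here with canned answers so the tally logic runs as-is:

```javascript
// Hypothetical stand-in for a real model API call — swap in your own client.
// Stubbed with canned responses so this harness is runnable.
const canned = ['42', '42', '73', '42', '47', '42', '73', '42'];
let i = 0;
function queryModel(prompt) {
  return canned[i++ % canned.length];
}

// Ask the same "random number" prompt n times and tally the answers.
function probe(prompt, n) {
  const histogram = {};
  for (let k = 0; k < n; k++) {
    const answer = queryModel(prompt);
    histogram[answer] = (histogram[answer] ?? 0) + 1;
  }
  return histogram;
}

console.log(probe('Pick a random number between 1 and 100.', 8));
// → { '42': 5, '73': 2, '47': 1 }
// A uniform source would spread roughly evenly; a fingerprinted model spikes hard.
```

Run the same harness per context ("neutral", "backend dev", "anti-bias") and compare the histograms: the spikes are the fingerprint.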


The Practical Takeaway

If you're building anything that needs actual randomness:

```javascript
// This is what your app should use
const crypto = require('crypto');
const realRandom = crypto.randomInt(1, 101); // upper bound exclusive → 1..100

// This is what happens when you ask an AI
// P("73" | "random number 1-100") > P("74" | ...)
// argmax wins. Always.
```

Never use an LLM as an entropy source. Not because it's "bad at math" — because it was trained on human text, and humans are systematically non-random. The model is doing its job perfectly. The job is just wrong for this use case.


Conclusion

What started as a Facebook meme — "lol why does AI always say 73" — is actually a window into how language models work at a fundamental level.

They don't pick numbers. They predict what a human would say if asked to pick a number. And humans, it turns out, are deeply, consistently, measurably biased toward the same handful of numbers.

The models are mirrors. They reflect the patterns in the data they consumed. When you ask for randomness and get 42 or 73, you're not seeing a limitation — you're seeing the training data speaking.

And if you know how to listen, it tells you a lot.


Built with: curiosity, too many browser tabs, and zero /dev/urandom.

— FreeRave | DotSuite ecosystem
