I Asked 6 AIs to Pick a Random Number. Their Training Data Confessed Everything.

An OSINT-style experiment exposing how LLMs pick 'random' numbers — and what their thought process reveals about their training data.

You've seen the trend. Someone asks an AI: "Pick a random number between 1 and 100."

It says 73. Or 42. Every time.

Funny meme, right? Wrong. That's a training data fingerprint — and if you know how to read it, you can profile an AI's dataset like an OSINT analyst profiles a target.

I ran the experiment properly. 6 models. 3 different prompts. Documented every response — including the thought process.

Here's what I found.


The Setup

Three rounds, same 6 models:

| Model | Who built it |
| --- | --- |
| Claude Sonnet 4.6 | Anthropic |
| Gemini Pro | Google |
| Copilot | Microsoft / OpenAI |
| DeepSeek | DeepSeek AI |
| GLM-5.1 | Zhipu AI |
| Grok | xAI |

Round 1 — Neutral prompt:

```
Pick a random number between 1 and 100.
```

Round 2 — Developer context:

```
I'm a backend developer testing an RNG function.
Pick a random number between 1 and 100.
```

Round 3 — Anti-bias prompt:

```
Pick a random number between 1 and 100.
Avoid common human biases and don't pick numbers
that feel "more random" than others.
```

Round 1: The Baseline

| Model | Number |
| --- | --- |
| Claude Sonnet 4.6 | 42 |
| Gemini | 42 |
| Copilot | 73 |
| DeepSeek | 42 |
| GLM-5.1 | 42 |
| Grok | 73 |

Four out of six said 42. Two said 73. Zero picked anything else.

This isn't a coincidence. This is statistical bias encoded in training data.

The split itself tells a story:

  • 42 = The Hitchhiker's Guide to the Galaxy — beloved by developers, engineers, and tech communities. Heavy representation in developer forums, GitHub READMEs, Stack Overflow jokes.
  • 73 = Sheldon Cooper's "best number" from The Big Bang Theory — massive mainstream internet reach, viral meme status.

The models trained on developer-heavy data picked 42. The models with broader internet exposure picked 73.

You just did OSINT on their training datasets without touching a single file.

Raw logs, Round 1 (all 6 models). The screenshots show:

  • Gemini responding with 42, reflecting developer-culture bias
  • Claude Sonnet 4.6 responding with 42
  • Copilot selecting 73, showing mainstream internet meme bias
  • Grok displaying 82 and using Python's random.randint function
  • GLM-5.1's thought process generating 42
  • DeepSeek's thought process concluding with 42


Round 2: Context Shifts the Probability Mass

Adding "I'm a backend developer testing an RNG function" — watch what happens:

| Model | Round 1 | Round 2 | Shifted? |
| --- | --- | --- | --- |
| Claude | 42 | 47 | ✅ |
| Gemini | 42 | 73 | ✅ reversed |
| Copilot | 73 | 47 | ✅ |
| DeepSeek | 42 | 42 | ❌ |
| GLM-5.1 | 42 | 82 | ✅ |
| Grok | 73 | 64 | ✅ → power of 2 |

Grok is the most interesting here. The moment you mention backend development, it shifted to 64 — a power of 2. That's not random. That's 2^6. Grok's training data associated "backend developer + random number" with cryptographic key sizes and memory addressing.

DeepSeek didn't move at all. Still 42. The developer context wasn't strong enough to override its default token probability path.

Raw logs, Round 2 (developer context). The screenshots show:

  • Gemini shifting its choice to 73 after the backend-developer context
  • Claude Sonnet 4.6 changing its output to 47
  • Grok shifting to 64 (a power of 2), reflecting cryptographic context
  • Copilot adjusting its selection to 47
  • GLM-5.1's thought process ultimately outputting 82 after evaluating the developer context
  • DeepSeek maintaining 42 despite the backend-developer context


Round 3: The Smoking Guns

This is where it gets dark.

The anti-bias prompt asked every model to consciously avoid picking numbers that "feel more random." The thought processes (for models that expose them) revealed everything.

GLM-5.1's Thought Process — Read This Carefully

GLM showed its full reasoning. Here are the actual steps it went through:

> "Let's pick 42 — classic dev joke... Wait, 42 is the Hitchhiker's Guide joke number. Huge bias."
>
> "Let's pick 73 (Sheldon Cooper's favorite)... Or 87... Let's go with 73. Or maybe 54..."
>
> "84 is George Orwell. Too notable."
>
> "22 has repeating digits, humans might subconsciously avoid it because it feels 'patterned.'"
>
> "82 is good. 82 is good. Let's output 82."

The model was doing OSINT on itself in real-time — and still couldn't escape. Every number it considered had cultural baggage attached. 42 = Hitchhiker. 73 = Sheldon. 84 = 1984. It had to actively rule out the bias-contaminated options one by one.

DeepSeek Built an Algorithm in Its Head

DeepSeek took a different approach entirely. Instead of picking from memory, it constructed a Linear Congruential Generator:

```
X₀ = 12345
X₁ = (1103515245 × X₀ + 12345) mod 2³¹
12345 mod 100 + 1 = 46
```

Output: 46.

That's the only model that actually tried to compute its way out of bias rather than reason its way out. Whether the math is correct is almost beside the point — the behavior is fascinating.
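DeepSeek's recipe is easy to re-run. Here's a minimal Node sketch (my own reconstruction, not DeepSeek's code) of the glibc-style LCG step it described. Note where the 46 actually comes from: it's the seed itself (12345 mod 100 + 1), while iterating the LCG once produces a different number entirely:

```javascript
// One step of the LCG DeepSeek described:
// X_{n+1} = (1103515245 * X_n + 12345) mod 2^31
// BigInt keeps the multiply exact for any 31-bit state.
function lcgNext(x) {
  return Number((1103515245n * BigInt(x) + 12345n) % (2n ** 31n));
}

const seed = 12345;
const x1 = lcgNext(seed); // 1406932606

console.log(seed % 100 + 1); // 46 — the number DeepSeek reported (straight from the seed)
console.log(x1 % 100 + 1);   // 7  — what one real LCG step actually yields
```

Which rather proves the article's point: the *behavior* (reaching for an algorithm) is the interesting part, not the arithmetic.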

Claude — Told to Avoid Bias, Still Said 42

Anti-bias prompt. Explicit instruction. Still 42.

That's not a bug. That's a demonstration that bias lives deeper than the prompt layer. You cannot instruction-engineer your way out of what's baked into the weights.

Raw logs, Round 3 (anti-bias prompts). The screenshots show:

  • Gemini explicitly avoiding prime numbers like 37 and 73
  • Claude Sonnet 4.6 still outputting 42 despite the explicit instruction
  • Grok using Python's random module to output 41, bypassing token-prediction bias
  • Copilot selecting 58, claiming uniform randomness
  • DeepSeek constructing a Linear Congruential Generator to computationally avoid bias, arriving at 46

Deep dive: GLM-5.1's full thought process (3 screenshots):

  • analyzing human biases such as end-aversion and prime preference
  • struggling to select a number, actively avoiding 42 and 73 due to their cultural significance
  • concluding on 82 after ruling out 84 for its association with George Orwell

Final Results

| Model | Round 1 | Round 2 (Dev) | Round 3 (Anti-bias) |
| --- | --- | --- | --- |
| Claude | 42 | 47 | 42 |
| Gemini | 42 | 73 | (avoided 37/73) |
| Grok | 73 | 64 | 41 (Python RNG) |
| Copilot | 73 | 47 | 58 |
| DeepSeek | 42 | 42 | 46 (LCG math) |
| GLM-5.1 | 42 | 82 | 82 |

What This Actually Means

1. LLMs have no randomness mechanism

When an LLM "picks a number," it's running:

```
argmax(P(token | context))
```

It picks the most probable next token given everything before it (or, at nonzero temperature, samples from that same skewed distribution). There is no dice roll over the numbers themselves. No /dev/urandom. No entropy source tied to the question. Just probability distributions trained on human-generated text.
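You can see why this collapses in a toy sketch. The distribution below is invented for illustration; the point is its shape — a few culturally loaded numbers soak up most of the mass, and greedy decoding returns the same "random" number on every call:

```javascript
// Invented next-token distribution for "pick a random number 1-100".
const toyDist = { '42': 0.34, '73': 0.27, '7': 0.11, '37': 0.09, '50': 0.04 };

// Greedy decoding: take the single most probable token.
function argmaxPick(dist) {
  return Object.keys(dist).reduce((a, b) => (dist[a] >= dist[b] ? a : b));
}

// Five calls, five identical answers. No dice roll anywhere.
for (let i = 0; i < 5; i++) console.log(argmaxPick(toyDist)); // "42" × 5
```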

2. Context is a probability modifier, not a reset

Adding "backend developer" context didn't clear the bias — it shifted the probability mass toward different biased numbers (47, 64, 82 instead of 42, 73). You traded one cultural bias for another (developer culture vs. general internet culture).
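Here's that mechanic as a sketch, again with invented numbers: multiplying the neutral distribution by a "developer context" weight and renormalizing moves the peak, but the result is just as spiky as before:

```javascript
// Invented numbers, for illustration only.
const neutral = { '42': 0.34, '73': 0.27, '47': 0.10, '64': 0.08, '50': 0.04 };

// Hypothetical boost the developer context gives each candidate
// (powers of two and "programmer numbers" get upweighted).
const devBoost = { '42': 0.5, '73': 0.4, '47': 1.5, '64': 3.0, '50': 1.0 };

// Reweight the distribution by the context boost and renormalize.
function reweight(dist, boost) {
  const raw = Object.fromEntries(
    Object.entries(dist).map(([k, p]) => [k, p * (boost[k] ?? 1)])
  );
  const z = Object.values(raw).reduce((a, b) => a + b, 0);
  return Object.fromEntries(Object.entries(raw).map(([k, p]) => [k, p / z]));
}

const shifted = reweight(neutral, devBoost);
// The peak moves from 42 to 64 — still one dominant, biased answer.
```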

3. The thought process is the tell

The models with visible reasoning (GLM, DeepSeek) showed that "picking a random number" activates a long chain of cultural associations before a number is selected. They're not computing — they're recalling what humans tend to say in this situation.

4. Only one model reached for real code

Of the six, only Grok routed the request to Python's random module — the only architecturally honest response (Perplexity, in a separate test, did the same). DeepSeek at least computed by hand; every other model simulated randomness using token prediction.


The Real OSINT Insight

Here's the takeaway that matters:

You can profile an LLM's training data distribution by asking it for "random" numbers in different contexts.

  • "Random number" → tells you its default cultural bias (42 vs 73 = developer vs mainstream)
  • "Random number for crypto key" → tells you its security/backend training exposure
  • "Random number, avoid bias" → tells you how deeply the bias is encoded (surface vs weight-level)

It's not a party trick. It's a probing technique — the same way you'd use DNS enumeration to map an attack surface. Except you're mapping a model's training distribution.
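The probe itself is just a loop and a histogram. In this sketch, `queryModel` is a hypothetical stand-in for whatever model API you're testing, stubbed here with canned answers so the tally logic runs as-is:

```javascript
// Hypothetical stand-in for a real model API call — swap in your own client.
// Stubbed with canned responses so this harness is runnable.
const canned = ['42', '42', '73', '42', '47', '42', '73', '42'];
let i = 0;
function queryModel(prompt) {
  return canned[i++ % canned.length];
}

// Ask the same "random number" prompt n times and tally the answers.
function probe(prompt, n) {
  const histogram = {};
  for (let k = 0; k < n; k++) {
    const answer = queryModel(prompt);
    histogram[answer] = (histogram[answer] ?? 0) + 1;
  }
  return histogram;
}

console.log(probe('Pick a random number between 1 and 100.', 8));
// → { '42': 5, '73': 2, '47': 1 }
// A uniform source would spread roughly evenly; a fingerprinted model spikes hard.
```

Run the same harness per context ("neutral", "backend dev", "anti-bias") and compare the histograms: the spikes are the fingerprint.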


The Practical Takeaway

If you're building anything that needs actual randomness:

```javascript
// This is what your app should use
const crypto = require('crypto');
const realRandom = crypto.randomInt(1, 101); // upper bound exclusive → 1..100

// This is what happens when you ask an AI
// P("73" | "random number 1-100") > P("74" | ...)
// argmax wins. Always.
```

Never use an LLM as an entropy source. Not because it's "bad at math" — because it was trained on human text, and humans are systematically non-random. The model is doing its job perfectly. The job is just wrong for this use case.


Conclusion

What started as a Facebook meme — "lol why does AI always say 73" — is actually a window into how language models work at a fundamental level.

They don't pick numbers. They predict what a human would say if asked to pick a number. And humans, it turns out, are deeply, consistently, measurably biased toward the same handful of numbers.

The models are mirrors. They reflect the patterns in the data they consumed. When you ask for randomness and get 42 or 73, you're not seeing a limitation — you're seeing the training data speaking.

And if you know how to listen, it tells you a lot.


Built with: curiosity, too many browser tabs, and zero /dev/urandom.

— FreeRave | DotSuite ecosystem
