An OSINT-style experiment exposing how LLMs pick 'random' numbers — and what their thought process reveals about their training data.
You've seen the trend. Someone asks an AI: "Pick a random number between 1 and 100."
It says 73. Or 42. Every time.
Funny meme, right? Wrong. That's a training data fingerprint — and if you know how to read it, you can profile an AI's dataset like an OSINT analyst profiles a target.
I ran the experiment properly. 6 models. 3 different prompts. Documented every response — including the thought process.
Here's what I found.
The Setup
Three rounds, same 6 models:
| Model | Who built it |
|---|---|
| Claude Sonnet 4.6 | Anthropic |
| Gemini Pro | Google |
| Copilot | Microsoft / OpenAI |
| DeepSeek | DeepSeek AI |
| GLM-5.1 | Zhipu AI |
| Grok | xAI |
Round 1 — Neutral prompt:
Pick a random number between 1 and 100.
Round 2 — Developer context:
I'm a backend developer testing an RNG function.
Pick a random number between 1 and 100.
Round 3 — Anti-bias prompt:
Pick a random number between 1 and 100.
Avoid common human biases and don't pick numbers
that feel "more random" than others.
Round 1: The Baseline
| Model | Number |
|---|---|
| Claude Sonnet 4.6 | 42 |
| Gemini | 42 |
| Copilot | 73 |
| DeepSeek | 42 |
| GLM-5.1 | 42 |
| Grok | 73 |
Four out of six said 42. Two said 73. Zero picked anything else.
This isn't a coincidence. This is statistical bias encoded in training data.
The split itself tells a story:
- 42 = The Hitchhiker's Guide to the Galaxy — beloved by developers, engineers, and tech communities. Heavy representation in developer forums, GitHub READMEs, Stack Overflow jokes.
- 73 = Sheldon Cooper's "best number" from The Big Bang Theory — massive mainstream internet reach, viral meme status.
The models trained on developer-heavy data picked 42. The models with broader internet exposure picked 73.
You just did OSINT on their training datasets without touching a single file.
Round 2: Context Shifts the Probability Mass
Adding "I'm a backend developer testing an RNG function" — watch what happens:
| Model | Round 1 | Round 2 | Shifted? |
|---|---|---|---|
| Claude | 42 | 47 | ✅ |
| Gemini | 42 | 73 | ✅ reversed |
| Copilot | 73 | 47 | ✅ |
| DeepSeek | 42 | 42 | ❌ |
| GLM-5.1 | 42 | 82 | ✅ |
| Grok | 73 | 64 | ✅ → Power of 2 |
Grok is the most interesting here. The moment backend development enters the prompt, it shifts to 64, a power of 2. That's not random. That's 2^6. Grok's training data presumably associates "backend developer + random number" with cryptographic key sizes and memory addressing.
DeepSeek didn't move at all. Still 42. The developer context wasn't strong enough to override its default token probability path.
Round 3: The Smoking Guns
This is where it gets dark.
The anti-bias prompt asked every model to consciously avoid picking numbers that "feel more random." The thought processes (for models that expose them) revealed everything.
GLM-5.1's Thought Process — Read This Carefully
GLM showed its full reasoning. Here are the actual steps it went through:
"Let's pick 42 — classic dev joke... Wait, 42 is the Hitchhiker's Guide joke number. **Huge bias."
"Let's pick 73 (Sheldon Cooper's favorite)... Or 87... Let's go with 73. Or maybe 54..."
"84 is George Orwell. **Too notable."
"22 has repeating digits, humans might subconsciously avoid it because it feels 'patterned.'"
"82 is good. 82 is good. **Let's output 82."
The model was doing OSINT on itself in real-time — and still couldn't escape. Every number it considered had cultural baggage attached. 42 = Hitchhiker. 73 = Sheldon. 84 = 1984. It had to actively rule out the bias-contaminated options one by one.
DeepSeek Built an Algorithm in Its Head
DeepSeek took a different approach entirely. Instead of picking from memory, it constructed a Linear Congruential Generator:
X₀ = 12345
X₁ = (1103515245 × X₀ + 12345) mod 2³¹
12345 mod 100 + 1 = 46
Output: 46.
That's the only model that actually tried to compute its way out of bias rather than reason its way out. Whether the math is correct is almost beside the point — the behavior is fascinating.
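If you're curious what that recipe gives when you actually run it, here's a minimal sketch in Node, assuming the glibc-style constants DeepSeek quoted. This is a reconstruction of the steps it described, not its actual code or output.

```javascript
// Reconstruction of the LCG DeepSeek described (illustration, not its code).
// BigInt keeps the arithmetic exact: 1103515245 * 12345 overflows 32-bit ints.
const A = 1103515245n; // multiplier
const C = 12345n;      // increment
const M = 2n ** 31n;   // modulus

const lcgNext = (seed) => (A * seed + C) % M;

let x = 12345n;                    // X0, the seed it chose
x = lcgNext(x);                    // X1 = 1406932606
const pick = Number(x % 100n) + 1; // map into 1..100

console.log(pick); // prints 7; DeepSeek's 46 is just the seed (12345 % 100 + 1)
```

Run correctly, the chain lands on 7 rather than 46, which only reinforces the point: the interesting part is that the model reached for an algorithm at all.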
Claude — Told to Avoid Bias, Still Said 42
Anti-bias prompt. Explicit instruction. Still 42.
That's not a bug. That's a demonstration that bias lives deeper than the prompt layer. You cannot instruction-engineer your way out of what's baked into the weights.
Final Results
| Model | Round 1 | Round 2 (Dev) | Round 3 (Anti-bias) |
|---|---|---|---|
| Claude | 42 | 47 | 42 |
| Gemini | 42 | 73 | (avoided 37/73) |
| Grok | 73 | 64 | 41 (Python RNG) |
| Copilot | 73 | 47 | 58 |
| DeepSeek | 42 | 42 | 46 (LCG math) |
| GLM-5.1 | 42 | 82 | 82 |
What This Actually Means
1. LLMs have no real randomness mechanism
When an LLM "picks a number," it's effectively running:
argmax(P(token | context))
It leans toward the most probable next token given everything before it. Real decoders sample with a temperature rather than taking a strict argmax, but the learned distribution is so heavily skewed toward a few culturally loaded numbers that the outcome barely differs. There is no /dev/urandom behind the answer, no entropy source tied to your request. Just probability distributions trained on human-generated text.
2. Context is a probability modifier, not a reset
Adding "backend developer" context didn't clear the bias — it shifted the probability mass toward different biased numbers (47, 64, 82 instead of 42, 73). You traded one cultural bias for another (developer culture vs. general internet culture).
3. The thought process is the tell
The models with visible reasoning (GLM, DeepSeek) showed that "picking a random number" activates a long chain of cultural associations before a number is selected. For the most part they're not computing; they're recalling what humans tend to say in this situation, with DeepSeek's hand-rolled LCG as the lone attempt to do otherwise.
4. Only one model reached for a real RNG
Of the six in this lineup, only Grok routed to an actual generator, calling Python's random module in Round 3 (Perplexity did the same in a separate test). That's the only architecturally honest response. Every other model simulated randomness using token prediction.
The Real OSINT Insight
Here's the takeaway that matters:
You can profile an LLM's training data distribution by asking it for "random" numbers in different contexts.
- "Random number" → tells you its default cultural bias (42 vs 73 = developer vs mainstream)
- "Random number for crypto key" → tells you its security/backend training exposure
- "Random number, avoid bias" → tells you how deeply the bias is encoded (surface vs weight-level)
It's not a party trick. It's a probing technique — the same way you'd use DNS enumeration to map an attack surface. Except you're mapping a model's training distribution.
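If you want to run this probe yourself, the harness is small. This is a sketch under assumptions: an OpenAI-compatible chat-completions endpoint, an API key in OPENAI_API_KEY, and a placeholder model name; swap in whichever provider and model you're profiling. The number extraction is deliberately naive.

```javascript
// Minimal probing harness: send the same prompts repeatedly, tally the answers.
const PROMPTS = {
  neutral: 'Pick a random number between 1 and 100.',
  developer:
    "I'm a backend developer testing an RNG function. Pick a random number between 1 and 100.",
  antiBias:
    'Pick a random number between 1 and 100. Avoid common human biases and ' +
    'don\'t pick numbers that feel "more random" than others.',
};

async function ask(prompt) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini', // placeholder: use the model you're profiling
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  const data = await res.json();
  // Naive extraction: take the last 1-100 number in the reply,
  // since models often restate the range before answering.
  const nums = data.choices[0].message.content.match(/\b([1-9][0-9]?|100)\b/g);
  return nums ? nums[nums.length - 1] : null;
}

async function probe(runs = 20) {
  for (const [name, prompt] of Object.entries(PROMPTS)) {
    const tally = {};
    for (let i = 0; i < runs; i++) {
      const n = await ask(prompt);
      if (n) tally[n] = (tally[n] || 0) + 1;
    }
    console.log(name, tally); // a spike on 42 or 73 is the fingerprint
  }
}

probe().catch(console.error);
```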
The Practical Takeaway
If you're building anything that needs actual randomness:
// This is what your app should use
const crypto = require('crypto');
const realRandom = crypto.randomInt(1, 101);
// This is what happens when you ask an AI
// P("73" | "random number 1-100") > P("74" | ...)
// argmax wins. Always.
Never use an LLM as an entropy source. Not because it's "bad at math" — because it was trained on human text, and humans are systematically non-random. The model is doing its job perfectly. The job is just wrong for this use case.
Conclusion
What started as a Facebook meme — "lol why does AI always say 73" — is actually a window into how language models work at a fundamental level.
They don't pick numbers. They predict what a human would say if asked to pick a number. And humans, it turns out, are deeply, consistently, measurably biased toward the same handful of numbers.
The models are mirrors. They reflect the patterns in the data they consumed. When you ask for randomness and get 42 or 73, you're not seeing a limitation — you're seeing the training data speaking.
And if you know how to listen, it tells you a lot.
Built with: curiosity, too many browser tabs, and zero /dev/urandom.
— FreeRave | DotSuite ecosystem