Ever gotten frustrated at ChatGPT, Claude, or Gemini for forgetting something you said ten messages ago? Or laughed at a completely bizarre hallucination where it replaced a normal word with a random emoji?
It's easy to yell at the chat client. It's much harder to maintain Mechanical Sympathy for the massive, spinning plates of hardware constraints running under the hood.
So, we built an interactive game to teach you how LLMs actually work (and fail):
🧩 LLMs Are Demented: The Crossword
Play in Fullscreen Mode (if the embed window sizing is annoying)
⚙️ How the Game Works
This is a standard, technical 9-word crossword puzzle. To win, you must work out core machine learning terms (like WEIGHTS, TOKEN, ATTENTION, and EPOCH) from their definitions and type them in.
But as you play, you are running directly inside the actual architectural constraints of a Large Language Model:
1. 💾 The Context Window
The model only tracks your last N cell edits. If you type more letters than your context size, the oldest letters you entered fall out of context and start organically decaying. They will slowly flicker and mutate into visually similar characters (or pure noise) as the model loses track of them.
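The eviction rule described above can be sketched with a fixed-size queue of recent edits. This is a minimal illustration, not the game's actual code: the `ContextWindow` class, its method names, and the choice to let a re-edit bring a cell back into context are all assumptions made for the example.

```python
from collections import deque


class ContextWindow:
    """Hypothetical sketch: remembers only the last `size` cell edits."""

    def __init__(self, size):
        self.size = size
        self.recent = deque()  # cells still in context, oldest first
        self.evicted = []      # cells that fell out and may start decaying

    def record_edit(self, cell):
        # Assumption: editing a cell again puts it back in context.
        if cell in self.recent:
            self.recent.remove(cell)
        if cell in self.evicted:
            self.evicted.remove(cell)
        self.recent.append(cell)
        # When the window overflows, the oldest edit leaves context.
        while len(self.recent) > self.size:
            self.evicted.append(self.recent.popleft())


window = ContextWindow(size=3)
for cell in [(0, 0), (0, 1), (0, 2), (0, 3)]:
    window.record_edit(cell)
print(window.evicted)  # [(0, 0)]
```

With a window of 3, the fourth edit pushes the first cell out of context, which is the point where the game would let it start mutating.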
2. ⏰ KV-Cache Expirations
The board is split into 4 distinct quadrants (Q1-Q4). If you leave a quadrant untouched for too long, its cache expires, and that entire section of the board is instantly wiped blank! You must hop between quadrants to keep their caches active.
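One way to picture the expiry rule is a last-touched timestamp per quadrant, checked against a retention time. The sketch below is an assumption about how such a timer could work, not the game's implementation; `QuadrantCache` and its methods are hypothetical names, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class QuadrantCache:
    """Hypothetical sketch: each quadrant expires if untouched for too long."""

    def __init__(self, ttl_seconds, now=0.0):
        self.ttl = ttl_seconds
        # Every quadrant starts out freshly touched at time `now`.
        self.last_touched = {q: now for q in ("Q1", "Q2", "Q3", "Q4")}

    def touch(self, quadrant, now):
        # Typing in a quadrant resets its clock.
        self.last_touched[quadrant] = now

    def expired(self, now):
        # Quadrants idle for longer than the retention time get wiped.
        return [q for q, t in self.last_touched.items() if now - t > self.ttl]


cache = QuadrantCache(ttl_seconds=45)
cache.touch("Q1", now=30)
cache.touch("Q2", now=40)
print(cache.expired(now=50))  # ['Q3', 'Q4']
```

Here Q3 and Q4 have been idle for 50 seconds against a 45-second retention time, so they are the ones that would be blanked.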
3. 🔥 Temperature
Controls the chaos of mutations:
- Low Temp: Drifts predictably (e.g. `E` becomes `3`, `A` becomes `4`).
- High Temp: Explodes into pure symbolic entropy (emojis, percent signs, and system glyphs).
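The two regimes can be sketched as a single mutation function. Everything here beyond the `E` to `3` and `A` to `4` examples is an assumption for illustration: the `mutate` function, the cut-off of 1.0 between low and high temperature, and the contents of the `NOISE` pool are hypothetical, and the game may pick its mutations quite differently.

```python
import random

# The two predictable substitutions named in the post.
LOOKALIKES = {"E": "3", "A": "4"}
# Hypothetical pool of noise glyphs for the high-temperature case.
NOISE = "%#@&*~^"


def mutate(letter, temperature, threshold=1.0, rng=random):
    """Return the decayed form of `letter` at the given temperature."""
    if temperature < threshold:
        # Low temperature: drift to a fixed lookalike, if one is known.
        return LOOKALIKES.get(letter, letter)
    # High temperature: any glyph from the noise pool is possible.
    return rng.choice(NOISE)


print(mutate("E", temperature=0.7))  # 3
print(mutate("A", temperature=0.7))  # 4
print(mutate("E", temperature=1.5))  # some character from NOISE
```

In this sketch a letter with no known lookalike stays unchanged at low temperature, which is a simplification chosen to keep the example short.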
Beat the Machine & Share Your Score
Once you fill in the last box, the system triggers RUN INFERENCE automatically to lock your scorecard.
Can you beat the local CPU (15 TPS) or a Cloud API (150 TPS)? Click COPY SCORE at the end of your run and paste your stats in the comments below!
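For a sense of how the comparison works out, here is a small sketch of the arithmetic. It assumes the score treats one keystroke as one token and divides by elapsed time; the helper names are hypothetical, and only the 15 TPS and 150 TPS baselines come from the post. The 3.06 tokens/sec figure is taken from a scorecard shared in the comments.

```python
def tokens_per_second(keystrokes, elapsed_seconds):
    """Assumed scoring rule: one keystroke counts as one token."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return keystrokes / elapsed_seconds


def speedup(baseline_tps, player_tps):
    """How many times faster the baseline is than the player."""
    return baseline_tps / player_tps


player_tps = 3.06  # from a scorecard posted in the comments
print(f"Local CPU: {speedup(15.0, player_tps):.1f}x faster")   # 4.9x faster
print(f"Cloud API: {speedup(150.0, player_tps):.1f}x faster")  # 49.0x faster
```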
💬 Let's Discuss:
- What's the weirdest "mutation" you saw at High Temperature?
- What was your Time to First Token (TTFT) and highest TPS?
UnitBuilds-CC/LLMs-are-Demented
An educational crossword game to learn about LLMs
🧩 LLMs Are Demented: The Crossword 🧠
Mechanical Sympathy Edition v1.0.0
Welcome, neural engineer. You have been tasked with solving a standard, technical crossword puzzle.
There's just one catch: You are running this crossword directly inside the hardware constraints of a running Large Language Model (LLM). If you type too slowly, your KV-Cache decays and evaporates. If you type too much, your Context Window overflows and older letters drift into hallucinations. If your Temperature is too high, cells erupt into symbolic garbage.
An interactive, educational game designed to teach the general public why LLMs hallucinate, decay, and make mistakes, so they stop getting so frustrated at their chat clients and gain some Mechanical Sympathy for the machine.
⚙️ The Mechanics of Frustration (How to Play)
As you solve the 9 intersecting technical clues on the board, the model's runtime architecture will actively fight your progress:
- 💾 Context Window ($C_{\text{tokens}}$):…
Top comments (11)
🧩 LLMs ARE DEMENTED: THE CROSSWORD 🧠
"Mechanical Sympathy Edition"
⚙️ My Config:
├─ Context Window: 32 tokens (Large)
├─ Cache Retention: 45 seconds
└─ Temperature: 0.7 (Low Temp)
Performance Summary:
├─ Words Correctly Verified: 9/9
├─ Total Keystrokes Input: 50
├─ KV-Cache Evictions Suffered: 0
├─ Hallucinatory Mutations: 0
├─ Time to First Token (TTFT): 2.27s
└─ Generation Speed (TPS): 3.06 tokens/sec
⚡ Inference Throughput Comparison (TPS):
├─ Your Speed: 3.06 tokens/sec
├─ Local 7B (CPU): 15.00 tokens/sec (4.9x faster)
└─ Cloud API: 150.00 tokens/sec (49.0x faster)
"I will never yell at my chat client again."
Play the Simulation Here: [llms-are-demented-90043718455.us-c...]
@jess, @pascal_cescato_692b7a8a20, @dannwaneri, @kenielzep97, @francistrdev, @xulingfeng Curious how fast you lot can do it? I recommend using the link, cuz the embedding is a bit small.
Good luck!
That was really cool, man. It looks intimidating at first. I didn't change any of the settings on the left; I figured it made more sense to run it the way you set it up initially. But once you feel how fast you have to rotate between quadrants to keep the cache alive, it gives you a real understanding of what's happening under the hood. Not just reading about context windows, actually feeling the clock on them. Appreciate you pulling me into this, genuinely fun to run. Not sure how many attempts are normal, but it took me 3. The game was very fun overall and gave me a big understanding of how I am a very slow typer haha
🧩 LLMs ARE DEMENTED: THE CROSSWORD 🧠
"Mechanical Sympathy Edition"
⚙️ My Config:
├─ Context Window: 32 tokens (Large)
├─ Cache Retention: 45 seconds
└─ Temperature: 0.7 (Low Temp)
Performance Summary:
├─ Words Correctly Verified: 9/9
├─ Total Keystrokes Input: 50
├─ KV-Cache Evictions Suffered: 0
├─ Hallucinatory Mutations: 0
├─ Time to First Token (TTFT): 2.97s
└─ Generation Speed (TPS): 1.09 tokens/sec
⚡ Inference Throughput Comparison (TPS):
├─ Your Speed: 1.09 tokens/sec
├─ Local 7B (CPU): 15.00 tokens/sec (13.7x faster)
└─ Cloud API: 150.00 tokens/sec (137.1x faster)
"I will never yell at my chat client again."
Play the Simulation Here: dev.to/unitbuilds_cc/llms-are-deme...
Awesome! Glad you enjoyed it. Don't worry about time or retries. The goal of the game was for us all to take a little humility pill and appreciate that our models don't hallucinate, run out of cache, or clear over time as aggressively as this, yet they still perform 130x faster than we can. I also hope it explained the concepts clearly in an intuitive way. People tend to get pissed off at their monitors when an LLM bugs out or hallucinates, but this gave me a new perspective on just how difficult all of this is for LLMs to manage. And they do it every single prompt, for thousands of tokens... I was generous with the TPS: the game counts letters, while an LLM token is usually a whole word or a chunk of one, so whatever we get as TPS, in reality we're still about 5x slower than that. Really makes you wonder how we got anything done before AI...
Any thoughts on what concept I should cover for the next game?
Awesome! But… I hate this kind of game: like @jess's one, it's too addictive!
Haha yeah, I'm just waiting for someone to try and cheat at it.
Nice! If you want, you can use COPY SCORE so we can see your performance. I'm curious how fast everyone here types, and it's a healthy reminder to us all of just how slow we are vs LLMs.
What did you think of the game?
@ben, @jess Can we please have adjustable embeddings? Maybe add iframe support, so I can adjust the sizing to fit better for future ones? If you like, I'll create a PR for it. I think it'll add a lot more usability to the function.
What the hell, are you a genius or what? Turning LLM architecture limits into a crossword game is the most creative thing I've seen today. The KV-cache quadrant wipe mechanic is brutal. This needs way more attention. 🔥
Glad you had fun! Sorry for baiting you again with the red flag. My goal was for us all to learn a lesson in humility. Run it on enterprise, and you're still 50x slower than a cloud model. Run it at anything more restrictive and you learn fast that it's a miracle that LLMs don't spit out garbage all day long. Cache wipes and mutations due to context shift, trying to look at the weights to see what the answer is, coming back to find everything changed again... All at 150+ TPS... It's incredible and I hope the little game does it justice.
Don't forget to post your best score though. Even if it's not 100%, that's the whole point.
@er4or-404 I saw you liked the comment yesterday. If you're curious, it's up and running. Wanna try being an LLM for a minute? Give it a try.
Don't forget to comment your score card!