DEV Community

UnitBuilds for UnitBuilds CC


LLMs are Demented!

Ever gotten frustrated at ChatGPT, Claude, or Gemini for forgetting something you said ten messages ago? Or laughed at a completely bizarre hallucination where it replaced a normal word with a random emoji?

It’s easy to yell at the chat client. It's much harder to maintain Mechanical Sympathy for the massive, spinning plates of hardware constraints running under the hood.

So, we built an interactive game to teach you how LLMs actually work (and fail):

🧩 LLMs Are Demented: The Crossword

Play in Fullscreen Mode (if the embed window sizing is annoying)


⚙️ How the Game Works

This is a standard, technical 9-word crossword puzzle. To win, you must recall core machine learning terms (like WEIGHTS, TOKEN, ATTENTION, and EPOCH) from their definitions and type them in.

But as you play, you are running directly inside the actual architectural constraints of a Large Language Model:

1. 💾 The Context Window ($C_{\text{tokens}}$)

The model only tracks your last N cell edits. If you type more letters than your context size, the oldest letters you entered fall out of context and start organically decaying. They will slowly flicker and mutate into visually similar characters (or pure noise) as the model loses track of them.
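A minimal Python sketch of this sliding window, assuming each cell is identified by a hypothetical `(row, col)` pair and that re-typing a cell already in context simply refreshes it (the post does not say how repeat edits are counted). `ContextWindow` and `record_edit` are illustrative names, not the game's actual code.

```python
from collections import deque


class ContextWindow:
    """Hypothetical model: only the last `size` cell edits stay in context."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("context size must be positive")
        self.size = size
        # Cells (row, col) in edit order; the oldest sits at index 0.
        self.recent = deque(maxlen=size)
        # Cells that have fallen out of context and are now eligible to decay.
        self.decaying = set()

    def record_edit(self, cell: tuple) -> None:
        if cell in self.recent:
            # Assumption: re-editing a cell in context just moves it to the newest slot.
            self.recent.remove(cell)
        elif len(self.recent) == self.size:
            # The window is full, so the oldest edit is about to be pushed out.
            self.decaying.add(self.recent[0])
        self.recent.append(cell)  # deque(maxlen=...) drops the oldest entry automatically
        self.decaying.discard(cell)  # typing a decaying cell again brings it back into context


ctx = ContextWindow(size=3)
for cell in [(0, 0), (0, 1), (0, 2), (0, 3)]:
    ctx.record_edit(cell)
print(ctx.decaying)  # {(0, 0)}
```

A `deque` with `maxlen` keeps the window bounded without manual index bookkeeping; the separate `decaying` set is what a renderer would consult when deciding which letters to flicker.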

2. ⏰ KV-Cache Expirations ($\tau$)

The board is split into 4 distinct quadrants (Q1-Q4). If you leave a quadrant untouched for too long, its cache expires, and that entire section of the board is instantly wiped blank! You must hop between quadrants to keep their caches active.
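A sketch of the per-quadrant expiry, assuming the clock is passed in explicitly as `now` (in seconds) and that a quadrant is wiped once it has been idle for strictly longer than $\tau$. `QuadrantCache`, `write`, and `expire` are hypothetical names; the game's real implementation may differ.

```python
from dataclasses import dataclass, field


@dataclass
class QuadrantCache:
    """Hypothetical per-quadrant TTL cache: an untouched quadrant is wiped after `ttl` seconds."""

    ttl: float
    last_touch: dict = field(default_factory=dict)  # quadrant name -> time of last edit
    cells: dict = field(default_factory=dict)       # quadrant name -> {(row, col): letter}

    def write(self, quadrant: str, cell: tuple, letter: str, now: float) -> None:
        self.cells.setdefault(quadrant, {})[cell] = letter
        self.last_touch[quadrant] = now  # any edit keeps this quadrant's cache alive

    def expire(self, now: float) -> list:
        """Wipe every quadrant idle for longer than `ttl`; return the wiped names."""
        wiped = [q for q, t in self.last_touch.items() if now - t > self.ttl]
        for q in wiped:
            self.cells[q] = {}       # the whole section goes blank at once
            del self.last_touch[q]   # nothing left to expire until it is written again
        return wiped


cache = QuadrantCache(ttl=45.0)  # the Local Llama preset's 45-second cache
cache.write("Q1", (0, 0), "T", now=0.0)
cache.write("Q2", (0, 8), "E", now=30.0)
print(cache.expire(now=50.0))  # ['Q1']: idle for 50 s, longer than 45 s
```

Passing `now` in rather than reading a wall clock keeps the logic deterministic and easy to test; a game loop would call `expire` on every tick.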

3. 🔥 Temperature ($T$)

Controls the chaos of mutations:

  • Low Temp ($T \le 0.8$): Drifts predictably (e.g. E becomes 3, A becomes 4).
  • High Temp ($T \ge 1.3$): Explodes into pure symbolic entropy (emojis, percent signs, and system glyphs).
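The game's mutation code isn't shown, but the temperature knob it borrows works like this in LLM sampling: raw candidate scores (logits) are divided by $T$ before a softmax, so $T < 1$ sharpens the distribution toward the top candidate and $T > 1$ flattens it toward unlikely ones. A minimal sketch; the candidate scores for a decaying "E" are made up for illustration, and `sample_mutation` is a hypothetical name.

```python
import math
import random


def softmax_with_temperature(logits: dict, temperature: float) -> dict:
    """Turn raw candidate scores into probabilities, scaled by temperature T."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: z / temperature for tok, z in logits.items()}
    peak = max(scaled.values())  # subtract the max before exp() for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def sample_mutation(logits: dict, temperature: float, rng: random.Random) -> str:
    """Draw one replacement character from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


# Hypothetical scores for what a decaying "E" could turn into: the lookalike "3" scores highest.
E_CANDIDATES = {"3": 3.0, "F": 2.0, "%": 0.5, "🔥": 0.2}

for t in (0.5, 1.4):
    probs = softmax_with_temperature(E_CANDIDATES, t)
    print(f"T={t}: P('3')={probs['3']:.2f}")

print(sample_mutation(E_CANDIDATES, 1.4, random.Random(42)))
```

With these scores, the chance of the predictable "3" drops from about 87% at $T = 0.5$ to about 56% at $T = 1.4$, and the freed-up probability flows to the symbol candidates.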

🛠️ Choose Your Hardware Preset

Before you click INITIATE RUN, select your inference endpoint difficulty:

  • 🏒 Enterprise API (Easy): Large context window ($C=64$), 90-second cache, very low temperature. Very forgiving.
  • 💻 Local Llama (Medium): Quantized 7B model running on a laptop ($C=32$), 45-second cache, standard temperature ($0.7$). You'll need to move fast to avoid decay.
  • 🍞 Smart Toaster (Hard): Edge inference on a kitchen appliance ($C=16$), 15-second cache, high temperature ($1.4$). Complete hardware chaos.
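The three presets boil down to plain configuration. This sketch uses only the numbers stated above: Enterprise's temperature is left as `None` because the post only calls it "very low", and the hypothetical `temperature_band` helper reuses the thresholds from the Temperature section (the label for the gap between 0.8 and 1.3 is an assumption, since the post doesn't describe that range).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Preset:
    name: str
    context_tokens: int           # C: how many recent cell edits stay in context
    cache_ttl_s: float            # tau: seconds a quadrant may sit untouched
    temperature: Optional[float]  # T: None where the post gives no number


PRESETS = (
    Preset("Enterprise API", 64, 90.0, None),  # post only says "very low temperature"
    Preset("Local Llama", 32, 45.0, 0.7),
    Preset("Smart Toaster", 16, 15.0, 1.4),
)


def temperature_band(temperature: Optional[float]) -> str:
    """Classify T using the Low/High thresholds from the Temperature section."""
    if temperature is None:
        return "unspecified"
    if temperature <= 0.8:
        return "low"      # drifts into lookalikes (E -> 3)
    if temperature >= 1.3:
        return "high"     # explodes into symbols and emojis
    return "between"      # assumption: the post does not describe 0.8 < T < 1.3


for p in PRESETS:
    print(f"{p.name}: C={p.context_tokens}, tau={p.cache_ttl_s}s, T band={temperature_band(p.temperature)}")
```

Freezing the dataclass keeps a preset from being mutated mid-run; each harder tier roughly halves the context window and cuts the cache lifetime.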

Tip: If you need a cheat sheet, click the 🧠 VIEW WEIGHTS button to dump the answers database. But be warned: the database query locks keyboard input, forcing you to close the weights, switch contexts, and recall the answers from memory!


🏁 Beat the Machine & Share Your Score

Once you fill in the last box, the system triggers RUN INFERENCE automatically to lock your scorecard.

Can you beat the local CPU (15 TPS) or a Cloud API (150 TPS)? Click COPY SCORE at the end of your run and paste your stats in the comments below!
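How the scorecard's numbers relate, as a sketch under assumptions: TTFT is taken as the delay from INITIATE RUN to the first keystroke, and TPS as keystrokes per second from the start of the run to the last keystroke (a comment further down notes the game counts letters). `score_run` is a hypothetical name, and the game may measure elapsed time differently; LLM benchmarks, for example, often report generation speed only after the first token.

```python
# Reference speeds quoted in the post.
REFERENCE_TPS = {"Local 7B (CPU)": 15.0, "Cloud API": 150.0}


def score_run(run_start: float, keystroke_times: list) -> dict:
    """Hypothetical scoring: TTFT = delay to first keystroke, TPS = keystrokes per elapsed second."""
    if not keystroke_times:
        raise ValueError("no keystrokes recorded")
    ttft = keystroke_times[0] - run_start
    elapsed = keystroke_times[-1] - run_start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    tps = len(keystroke_times) / elapsed
    # How many times faster each reference is than the player.
    x_faster = {name: ref / tps for name, ref in REFERENCE_TPS.items()}
    return {"ttft_s": ttft, "tps": tps, "x_faster": x_faster}


result = score_run(run_start=0.0, keystroke_times=[2.0, 4.0, 6.0, 8.0, 10.0])
print(result["ttft_s"], result["tps"], result["x_faster"]["Cloud API"])  # 2.0 0.5 300.0
```

At 3.06 tokens/sec, for instance, 15 / 3.06 ≈ 4.9 and 150 / 3.06 ≈ 49.0, matching the "x faster" figures in the first scorecard in the comments.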


💬 Let's Discuss:

  • What's the weirdest "mutation" you saw at High Temperature?
  • What was your Time to First Token (TTFT) and highest TPS?

UnitBuilds-CC / LLMs-are-Demented

An educational crossword game to learn about LLMs

🧩 LLMs Are Demented: The Crossword 🧠

Mechanical Sympathy Edition v1.0.0

Model Accuracy: 100% | Temperature: Chaos | Deployment: Cloud Run

Welcome, neural engineer. You have been tasked with solving a standard, technical crossword puzzle.

There's just one catch: You are running this crossword directly inside the hardware constraints of a running Large Language Model (LLM). If you type too slowly, your KV-Cache decays and evaporates. If you type too much, your Context Window overflows and older letters drift into hallucinations. If your Temperature is too high, cells erupt into symbolic garbage.

An interactive, educational game designed to teach the general public why LLMs hallucinate, decay, and make mistakes, so they stop getting so frustrated at their chat clients and gain some Mechanical Sympathy for the machine.


⚙️ The Mechanics of Frustration (How to Play)

As you solve the 9 intersecting technical clues on the board, the model's runtime architecture will actively fight your progress:

  • 💾 Context Window ($C_{\text{tokens}}$):…

Top comments (11)

UnitBuilds CC

🧩 LLMs ARE DEMENTED: THE CROSSWORD 🧠
"Mechanical Sympathy Edition"

⚙️ My Config:
├─ Context Window: 32 tokens (Large)
├─ Cache Retention: 45 seconds
└─ Temperature: 0.7 (Low Temp)

📊 Performance Summary:
├─ Words Correctly Verified: 9/9
├─ Total Keystrokes Input: 50
├─ KV-Cache Evictions Suffered: 0
├─ Hallucinatory Mutations: 0
├─ Time to First Token (TTFT): 2.27s
└─ Generation Speed (TPS): 3.06 tokens/sec

⚡ Inference Throughput Comparison (TPS):
├─ Your Speed: 3.06 tokens/sec
├─ Local 7B (CPU): 15.00 tokens/sec (4.9x faster)
└─ Cloud API: 150.00 tokens/sec (49.0x faster)

"I will never yell at my chat client again."
🚀 Play the Simulation Here: [llms-are-demented-90043718455.us-c...]

@jess, @pascal_cescato_692b7a8a20, @dannwaneri, @kenielzep97, @francistrdev, @xulingfeng Curious how fast you lot can do it? I recommend using the link, cuz the embed is a bit small.

Good luck!

Self-Correcting Systems

That was really cool, man. It looks intimidating at first. I didn't change any of the settings on the left; I figured it made more sense to run it the way you set it up initially. But once you feel how fast you have to rotate between quadrants to keep the cache alive, it gives you a real understanding of what's happening under the hood. Not just reading about context windows, actually feeling the clock on them. Appreciate you pulling me into this, it was genuinely fun to run. Not sure how many attempts are normal, but it took me 3 😂 The game was very fun overall and helped me understand that I am a very slow typer haha

🧩 LLMs ARE DEMENTED: THE CROSSWORD 🧠
“Mechanical Sympathy Edition”
⚙️ My Config:
├─ Context Window: 32 tokens (Large)
├─ Cache Retention: 45 seconds
└─ Temperature: 0.7 (Low Temp)
📊 Performance Summary:
├─ Words Correctly Verified: 9/9
├─ Total Keystrokes Input: 50
├─ KV-Cache Evictions Suffered: 0
├─ Hallucinatory Mutations: 0
├─ Time to First Token (TTFT): 2.97s
└─ Generation Speed (TPS): 1.09 tokens/sec
⚡ Inference Throughput Comparison (TPS):
├─ Your Speed: 1.09 tokens/sec
├─ Local 7B (CPU): 15.00 tokens/sec (13.7x faster)
└─ Cloud API: 150.00 tokens/sec (137.1x faster)
“I will never yell at my chat client again.”
πŸš€ Play the Simulation Here: dev.to/unitbuilds_cc/llms-are-deme...

UnitBuilds CC

Awesome! Glad you enjoyed it. Don't worry about time or retries; the goal of the game was for us all to take a little humility pill and appreciate that our models don't hallucinate, run out of cache, or clear over time as aggressively as this. Yet they still perform 130x faster than we can. I also hope it explained the concepts clearly in an intuitive way. People tend to get pissed off at their monitors when an LLM bugs out or hallucinates, but building this gave me a new perspective on just how difficult all of this actually is for LLMs to manage. And they do this every single prompt, for thousands of tokens... I was generous with the TPS: it tracks letters, while for LLMs a token is roughly a word (for the most part), so whatever TPS we get, the reality is we're still about 5x slower than that 😅 Really makes you wonder how we got anything done before AI...

Any thoughts on what concept I should cover for the next game?

Pascal CESCATO

Awesome! But… I hate this kind of game: like @jess's one, it's too addictive!

UnitBuilds CC

Haha yeah, I'm just waiting for someone to try and cheat at it 😁

Daniel Nwaneri

Enterprise mode. Didn't trust myself on Toaster after the day we just had.

UnitBuilds CC

Nice! If you want, you can use COPY SCORE so we can see your performance 😂 I'm curious how fast everyone here types, and it's a healthy reminder to us all of just how slow we are compared to LLMs.

What did you think of the game?

UnitBuilds CC
@ben, @jess Can we please have adjustable embeds? Maybe add iframe support so I can adjust the sizing to fit better for future ones? If you like, I'll create a PR for it; I think it'll add a lot more usability to the feature.

xulingfeng

What the hell, are you a genius or what? Turning LLM architecture limits into a crossword game is the most creative thing I've seen today. The KV-cache quadrant wipe mechanic is brutal 😂 This needs way more attention. 🔥

UnitBuilds CC

Glad you had fun! Sorry for baiting you again with the red flag 😂 My goal was for us all to learn a lesson in humility. Run it on Enterprise and you're still 50x slower than a cloud model. Run it at anything more restrictive and you learn fast that it's a miracle LLMs don't spit out garbage all day long. Cache wipes, mutations due to context shift, peeking at the weights to find the answer, coming back to find everything changed again... All at 150+ TPS... It's incredible, and I hope the little game does it justice.

Don't forget to post your best score though 😂 Even if it's not 100%, that's the whole point 😁

UnitBuilds CC

@er4or-404 I saw you liked the comment yesterday. If you're curious, it's up and running. Wanna try being an LLM for a minute? Give it a try.

Don't forget to comment your score card!