DEV Community

LLMs are Demented!

UnitBuilds on July 01, 2026

Ever gotten frustrated at ChatGPT, Claude, or Gemini for forgetting something you said ten messages ago? Or laughed at a completely bizarre halluci...
 
UnitBuilds UnitBuilds CC

🧩 LLMs ARE DEMENTED: THE CROSSWORD 🧠
"Mechanical Sympathy Edition"

⚙️ My Config:
├─ Context Window: 32 tokens (Large)
├─ Cache Retention: 45 seconds
└─ Temperature: 0.7 (Low Temp)

📊 Performance Summary:
├─ Words Correctly Verified: 9/9
├─ Total Keystrokes Input: 50
├─ KV-Cache Evictions Suffered: 0
├─ Hallucinatory Mutations: 0
├─ Time to First Token (TTFT): 2.27s
└─ Generation Speed (TPS): 3.06 tokens/sec

⚡ Inference Throughput Comparison (TPS):
├─ Your Speed: 3.06 tokens/sec
├─ Local 7B (CPU): 15.00 tokens/sec (4.9x faster)
└─ Cloud API: 150.00 tokens/sec (49.0x faster)

"I will never yell at my chat client again."
🚀 Play the Simulation Here: [llms-are-demented-90043718455.us-c...]

@jess, @pascal_cescato_692b7a8a20, @dannwaneri, @kenielzep97, @francistrdev, @xulingfeng Curious how fast you lot can do it? I recommend using the link, cuz the embedding is a bit small.

Good luck!

Self-Correcting Systems

That was really cool, man. Looks intimidating at first. I didn’t change any of the settings on the left; figured it made more sense to run it the way you set it up initially. But once you feel how fast you have to rotate between quadrants to keep the cache alive, it gives you a real understanding of what’s happening under the hood. Not just reading about context windows, actually feeling the clock on them. Appreciate you pulling me into this, genuinely fun to run. Not sure how many attempts are normal, but it took me 3 😂 The game was very fun overall and really drove home how slow a typer I am, haha

🧩 LLMs ARE DEMENTED: THE CROSSWORD 🧠
“Mechanical Sympathy Edition”
⚙️ My Config:
├─ Context Window: 32 tokens (Large)
├─ Cache Retention: 45 seconds
└─ Temperature: 0.7 (Low Temp)
📊 Performance Summary:
├─ Words Correctly Verified: 9/9
├─ Total Keystrokes Input: 50
├─ KV-Cache Evictions Suffered: 0
├─ Hallucinatory Mutations: 0
├─ Time to First Token (TTFT): 2.97s
└─ Generation Speed (TPS): 1.09 tokens/sec
⚡ Inference Throughput Comparison (TPS):
├─ Your Speed: 1.09 tokens/sec
├─ Local 7B (CPU): 15.00 tokens/sec (13.7x faster)
└─ Cloud API: 150.00 tokens/sec (137.1x faster)
“I will never yell at my chat client again.”
🚀 Play the Simulation Here: dev.to/unitbuilds_cc/llms-are-deme...
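
For anyone who wants to see the "rotate between quadrants to keep the cache alive" mechanic in code, here is a minimal sketch. It models only the game rule as described in this comment (an entry disappears unless it is revisited within the retention window), not how a real inference engine manages its KV cache. The class name `RetentionCache`, its methods, and the quadrant keys are all hypothetical; only the 45-second retention value comes from the score card.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class RetentionCache:
    """Toy cache: an entry survives only if touched within `retention_s` seconds."""

    def __init__(self, retention_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.retention_s = retention_s
        self.clock = clock  # injectable so the demo does not have to sleep
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, last touched)
        self.evictions = 0

    def put(self, key: str, value: str) -> None:
        """Store a value and start its retention timer."""
        self._entries[key] = (value, self.clock())

    def sweep(self) -> None:
        """Drop every entry whose retention timer has run out."""
        now = self.clock()
        expired = [k for k, (_, t) in self._entries.items()
                   if now - t > self.retention_s]
        for key in expired:
            del self._entries[key]
            self.evictions += 1

    def touch(self, key: str) -> bool:
        """Revisit an entry to reset its timer; False if it was already evicted."""
        self.sweep()
        if key not in self._entries:
            return False
        value, _ = self._entries[key]
        self._entries[key] = (value, self.clock())
        return True

    def get(self, key: str) -> Optional[str]:
        """Return the stored value, or None if it has been evicted."""
        self.sweep()
        entry = self._entries.get(key)
        return entry[0] if entry else None


if __name__ == "__main__":
    now = [0.0]  # fake clock, in seconds
    cache = RetentionCache(retention_s=45, clock=lambda: now[0])
    cache.put("quadrant-1", "CACHE")
    cache.put("quadrant-2", "TOKEN")
    now[0] = 40.0
    cache.touch("quadrant-1")  # revisited in time, timer resets
    now[0] = 50.0              # quadrant-2 was never revisited
    print(cache.get("quadrant-1"), cache.get("quadrant-2"), cache.evictions)
```

The fake clock keeps the demo instant: the quadrant that was revisited at 40 seconds is still there at 50 seconds, while the neglected one has been evicted.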

UnitBuilds UnitBuilds CC

Awesome! Glad you enjoyed it. Don't worry about time or retries; the goal of the game was for us all to take a little humility pill and appreciate that our models don't hallucinate, run out of cache, or clear over time as aggressively as this. Yet they still perform 130x faster than we can. I also hope it explained the concepts clearly in an intuitive way. People tend to get pissed off at their monitors when an LLM bugs out or hallucinates, but this gave me a new perspective on just how difficult all of it actually is for LLMs to manage. And they do this every single prompt, for thousands of tokens... I was generous with the TPS: it tracks letters, while an LLM token is closer to a whole word (for the most part), so whatever we get as TPS, the reality is we're still 5x slower than that 😅 Really makes you wonder how we got anything done before AI...
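
To put rough numbers on that, here is a small sketch of the arithmetic. It assumes about 5 letters per token, which is simply the "5x" figure from the comment above and not a measured tokenizer statistic; the function names `token_tps` and `speedup` are hypothetical and are not taken from the game's code. The 15 and 150 tokens/sec baselines and the 3.06 letters/sec figure come from the first score card.

```python
# Assumption: one token is worth about 5 typed letters (the "5x" in the comment).
CHARS_PER_TOKEN = 5.0


def token_tps(letters_per_sec: float,
              chars_per_token: float = CHARS_PER_TOKEN) -> float:
    """Convert a letters-per-second typing rate into approximate tokens per second."""
    return letters_per_sec / chars_per_token


def speedup(model_tps: float, human_tps: float) -> float:
    """How many times faster the model generates than the human types."""
    return model_tps / human_tps


if __name__ == "__main__":
    human_letters = 3.06  # letter-based "TPS" from the first score card
    baselines = [("Local 7B (CPU)", 15.0), ("Cloud API", 150.0)]
    for name, model_tps in baselines:
        raw = speedup(model_tps, human_letters)
        adjusted = speedup(model_tps, token_tps(human_letters))
        print(f"{name}: {raw:.1f}x faster (letters), "
              f"{adjusted:.1f}x faster (token-adjusted)")
```

The letter-based ratios reproduce the 4.9x and 49.0x on the score card; the token-adjusted ratios are five times larger under the stated assumption.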

Any thoughts on what concept I should cover for the next game?

Self-Correcting Systems • Edited

That “we’re still 5x slower than the generous number” line is the best part of this reply, honestly. You built a game that measures something, then told us straight that even the measurement was flattering us. That’s rare; most people would’ve just let the TPS stand. For the next one: what about a mode where confidence and correctness get decoupled? Right now decay shows itself: you can see letters flicker and mutate. What if instead the board looked completely stable, totally confident, while it was quietly wrong underneath? No visual tell. You only find out at RUN INFERENCE that half your “locked in” answers drifted and you never caught it. That’s the harder failure mode to teach, because it’s not “the system visibly struggling,” it’s “the system looking fine while it’s already wrong.” Closer to how hallucination actually catches people off guard in real use. Appreciate you building this, man. Genuinely taught me something by making me feel it instead of read it.
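
A rough sketch of how that decoupling could work: keep what the player typed separate from what the board actually holds, and in blind mode only ever draw the typed copy. Everything here is hypothetical (the `BlindBoard` class, the slot names, the mutation rule); it illustrates the idea in this comment and is not the game's actual implementation.

```python
import random
from typing import Dict, List

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


class BlindBoard:
    """Toy board where the displayed word can differ from the stored word."""

    def __init__(self, blind: bool, mutation_rate: float, seed: int = 0) -> None:
        self.blind = blind
        self.mutation_rate = mutation_rate  # chance per word per tick
        self.rng = random.Random(seed)
        self.typed: Dict[str, str] = {}     # what the player entered
        self.actual: Dict[str, str] = {}    # what the board holds right now

    def enter(self, slot: str, word: str) -> None:
        """Record a word; at entry time both copies agree."""
        self.typed[slot] = word
        self.actual[slot] = word

    def tick(self) -> None:
        """One decay step: each stored word may get one letter swapped."""
        for slot, word in list(self.actual.items()):
            if word and self.rng.random() < self.mutation_rate:
                i = self.rng.randrange(len(word))
                # Pick a letter that differs from the current one, so a
                # mutation always changes the word.
                letter = self.rng.choice([c for c in ALPHABET if c != word[i]])
                self.actual[slot] = word[:i] + letter + word[i + 1:]

    def shown(self, slot: str) -> str:
        """Blind mode draws the typed copy, so drift gives no visual tell."""
        source = self.typed if self.blind else self.actual
        return source.get(slot, "")

    def run_inference(self) -> List[str]:
        """Reveal which slots silently drifted away from what was typed."""
        return sorted(s for s in self.typed if self.actual[s] != self.typed[s])


if __name__ == "__main__":
    board = BlindBoard(blind=True, mutation_rate=1.0)
    board.enter("1-across", "CACHE")
    board.tick()
    print("shown:", board.shown("1-across"))
    print("drifted:", board.run_inference())
```

With `blind=False` the same `tick()` would show the mutated letters on screen, which corresponds to the visible-decay behaviour described above.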

UnitBuilds UnitBuilds CC

Hm, sounds like an interesting toggle... Hide what's happening, so everything looks fine, then you run inference, see the terrible score, wonder how and it shows you what changed and how... Interesting, lemme patch it and I'll let you know once the updated version is live. Thanks for the suggestion!

Tomorrow's lesson will be MoE gating 🤫

UnitBuilds UnitBuilds CC

Update is live, and I added a landing page too. Blind Inference is the new feature 😁 I'll quickly update the post to shout out the great suggestion! Lemme know how it plays

Pascal CESCATO

Awesome! But… I hate this kind of game: like @jess's one, it's too addictive!

UnitBuilds UnitBuilds CC

Haha yeah, I'm just waiting for someone to try and cheat at it 😁

Daniel Nwaneri

 enterprise mode. didn't trust myself on toaster after the day we just had.

UnitBuilds UnitBuilds CC

Nice! If you want, you can use copy score, so we can see your performance 😂 I'm curious how fast everyone here types and it's a healthy reminder to us all just how slow we are vs LLMs.

What did you think of the game?

UnitBuilds UnitBuilds CC

@ben, @jess Can we please have adjustable embeddings? Maybe add iframe support, so I can adjust the sizing to fit better for future ones? If you like, I'll create a PR for it. I think it'd make the feature a lot more usable.

xulingfeng

What the hell, are you a genius or what? Turning LLM architecture limits into a crossword game is the most creative thing I've seen today. The KV-cache quadrant wipe mechanic is brutal 😂 This needs way more attention. 🔥

UnitBuilds UnitBuilds CC

Glad you had fun! Sorry for baiting you again with the red flag 😂 My goal was for us all to learn a lesson in humility. Run it on enterprise and you're still 50x slower than a cloud model. Run it on anything more restrictive and you learn fast that it's a miracle LLMs don't spit out garbage all day long. Cache wipes and mutations due to context shift, trying to have a look at the weights to see what the answer is, then coming back to find everything changed again... All at 150+ TPS... It's incredible, and I hope the little game does it justice.

Don't forget to post your best score though 😂 Even if not 100%, that's the whole point 😁

UnitBuilds UnitBuilds CC

@er4or-404 I saw you liked the comment yesterday. If you're curious, it's up and running. Wanna try being an LLM for a minute? Give it a try.

Don't forget to comment your score card!