Ever gotten frustrated at ChatGPT, Claude, or Gemini for forgetting something you said ten messages ago? Or laughed at a completely bizarre halluci...
🧩 LLMs ARE DEMENTED: THE CROSSWORD 🧠
"Mechanical Sympathy Edition"
⚙️ My Config:
├─ Context Window: 32 tokens (Large)
├─ Cache Retention: 45 seconds
└─ Temperature: 0.7 (Low Temp)
Performance Summary:
├─ Words Correctly Verified: 9/9
├─ Total Keystrokes Input: 50
├─ KV-Cache Evictions Suffered: 0
├─ Hallucinatory Mutations: 0
├─ Time to First Token (TTFT): 2.27s
└─ Generation Speed (TPS): 3.06 tokens/sec
⚡ Inference Throughput Comparison (TPS):
├─ Your Speed: 3.06 tokens/sec
├─ Local 7B (CPU): 15.00 tokens/sec (4.9x faster)
└─ Cloud API: 150.00 tokens/sec (49.0x faster)
"I will never yell at my chat client again."
Play the Simulation Here: [llms-are-demented-90043718455.us-c...]
@jess, @pascal_cescato_692b7a8a20, @dannwaneri, @kenielzep97, @francistrdev, @xulingfeng Curious how fast you lot can do it? I recommend using the link, cuz the embedding is a bit small.
Good luck!
That was really cool, man. Looks intimidating at first. I didn't change any of the settings on the left; figured it made more sense to run it the way you set it up initially. But once you feel how fast you have to rotate between quadrants to keep the cache alive, it gives you a real understanding of what's happening under the hood. Not just reading about context windows, actually feeling the clock on them. Appreciate you pulling me into this, genuinely fun to run. Not sure how many attempts are normal, but it took me 3. The game was very fun overall and gave me a clear understanding of how slow a typer I am, haha.
🧩 LLMs ARE DEMENTED: THE CROSSWORD 🧠
"Mechanical Sympathy Edition"
⚙️ My Config:
├─ Context Window: 32 tokens (Large)
├─ Cache Retention: 45 seconds
└─ Temperature: 0.7 (Low Temp)
Performance Summary:
├─ Words Correctly Verified: 9/9
├─ Total Keystrokes Input: 50
├─ KV-Cache Evictions Suffered: 0
├─ Hallucinatory Mutations: 0
├─ Time to First Token (TTFT): 2.97s
└─ Generation Speed (TPS): 1.09 tokens/sec
⚡ Inference Throughput Comparison (TPS):
├─ Your Speed: 1.09 tokens/sec
├─ Local 7B (CPU): 15.00 tokens/sec (13.7x faster)
└─ Cloud API: 150.00 tokens/sec (137.1x faster)
"I will never yell at my chat client again."
Play the Simulation Here: dev.to/unitbuilds_cc/llms-are-deme...
Awesome! Glad you enjoyed it. Don't worry about time or retries. The goal of the game was for us all to take a little humility pill and appreciate that our models don't hallucinate, run out of cache, or clear over time as aggressively as this, yet they still perform 130x faster than we can. I also hope it explained the concepts clearly and intuitively. People tend to get pissed off at their monitors when an LLM bugs out or hallucinates, but this gave me a new perspective on just how difficult it actually is for LLMs to manage. And they do this on every single prompt, for thousands of tokens... I was generous with the TPS: it tracks letters, while LLMs count a token as a word (for the most part), so whatever we get as TPS, the reality is we're still about 5x slower than that. Really makes you wonder how we got anything done before AI...
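For anyone who wants the arithmetic behind the score card, here is a minimal Python sketch. The two reference speeds come from the card itself; the function names are made up for illustration, and the five-letters-per-token figure is only an assumption that mirrors the rough "5x" estimate above, not something the game reports.

```python
# Reference speeds shown on the score card.
LOCAL_7B_CPU_TPS = 15.0
CLOUD_API_TPS = 150.0

# Assumption (not from the game): about 5 letters per word-sized token,
# matching the rough "5x slower" estimate in the comment.
LETTERS_PER_TOKEN = 5.0


def speedup(reference_tps: float, player_tps: float) -> float:
    """Return how many times faster the reference is than the player."""
    if player_tps <= 0:
        raise ValueError("player_tps must be positive")
    return reference_tps / player_tps


def letters_to_tokens_per_sec(
    letters_per_sec: float, letters_per_token: float = LETTERS_PER_TOKEN
) -> float:
    """Convert a letters-per-second rate into an approximate token rate."""
    return letters_per_sec / letters_per_token


if __name__ == "__main__":
    player = 3.06  # the card's "tokens/sec", which really counts letters
    print(f"Local 7B (CPU): {speedup(LOCAL_7B_CPU_TPS, player):.1f}x faster")
    print(f"Cloud API: {speedup(CLOUD_API_TPS, player):.1f}x faster")

    adjusted = letters_to_tokens_per_sec(player)
    print(f"Adjusted player speed: {adjusted:.2f} tokens/sec")
    print(f"Cloud API vs adjusted: {speedup(CLOUD_API_TPS, adjusted):.0f}x faster")
```

Dividing the player's rate by the letters-per-token assumption is what turns the flattering number into the less flattering one; change `LETTERS_PER_TOKEN` if you prefer a different rule of thumb.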
Any thoughts on what concept I should cover for the next game?
That "we're still 5x slower than the generous number" line is the best part of this reply, honestly. You built a game that measures something, then told us straight that even the measurement was flattering us. That's rare; most people would've just let the TPS stand. For the next one: what about a mode where confidence and correctness get decoupled? Right now decay shows itself: you can see letters flicker and mutate. What if instead the board looked completely stable, totally confident, while it was quietly wrong underneath? No visual tell. You only find out at RUN INFERENCE that half your "locked in" answers drifted and you never caught it. That's the harder failure mode to teach, because it's not "the system visibly struggling," it's "the system looking fine while it's already wrong." Closer to how hallucination actually catches people off guard in real use. Appreciate you building this, man. Genuinely taught me something by making me feel it instead of read it.
Hm, sounds like an interesting toggle... Hide what's happening so everything looks fine, then you run inference, see the terrible score, wonder how, and it shows you what changed and how... Interesting, lemme patch it and I'll let you know once the updated version is live. Thanks for the suggestion!
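The toggle described above can be sketched roughly like this in Python. This is not the game's actual code: `BlindBoard`, `mutation_rate`, and the method names are all hypothetical, and the per-tick random mutation is just one assumed way to model drift. The point is only that the displayed letters and the scored letters are kept separately, and the difference is revealed when inference runs.

```python
import random
from dataclasses import dataclass, field

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


@dataclass
class BlindBoard:
    """Cells keep showing what was typed while the stored value may drift."""

    mutation_rate: float = 0.3  # hypothetical chance per cell per tick
    rng: random.Random = field(default_factory=random.Random)
    shown: dict = field(default_factory=dict)   # what the player sees
    stored: dict = field(default_factory=dict)  # what actually gets scored

    def type_letter(self, cell: str, letter: str) -> None:
        """Typing sets both the visible and the stored letter."""
        self.shown[cell] = letter
        self.stored[cell] = letter

    def tick(self) -> None:
        """Silently mutate some stored letters; the visible board is untouched."""
        for cell, letter in list(self.stored.items()):
            if self.rng.random() < self.mutation_rate:
                choices = [c for c in ALPHABET if c != letter]
                self.stored[cell] = self.rng.choice(choices)

    def run_inference(self) -> dict:
        """Return {cell: (shown, stored)} for every cell that drifted."""
        return {
            cell: (self.shown[cell], self.stored[cell])
            for cell in self.shown
            if self.shown[cell] != self.stored[cell]
        }


if __name__ == "__main__":
    board = BlindBoard(mutation_rate=0.5, rng=random.Random(42))
    for i, letter in enumerate("CACHE"):
        board.type_letter(f"r0c{i}", letter)
    board.tick()
    print("Looks like:", "".join(board.shown.values()))
    print("Drifted cells:", board.run_inference())
```

Keeping two dictionaries makes the "no visual tell" property explicit: nothing in `tick` ever writes to `shown`, so the only way to learn about drift is to call `run_inference`.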
Tomorrow's lesson will be MoE gating 🤫
Update live, I added a landing page too. Blind Inference is the new feature. I'll quickly update the post to shout out the great suggestion! Lemme know how it plays.
Awesome! But… I hate this kind of game: it's like @jess's one, too addictive!
Haha yeah, I'm just waiting for someone to try and cheat at it.
Nice! If you want, you can use copy score, so we can see your performance. I'm curious how fast everyone here types, and it's a healthy reminder to us all of just how slow we are vs LLMs.
What did you think of the game?
@ben, @jess Can we please have adjustable embeddings? Maybe add iframe support, so I can adjust the sizing to fit better for future ones? If you like, I'll create a PR for it. I think it'd make the feature a lot more usable.
What the hell, are you a genius or what? Turning LLM architecture limits into a crossword game is the most creative thing I've seen today. The KV-cache quadrant wipe mechanic is brutal. This needs way more attention. 🔥
Glad you had fun! Sorry for baiting you again with the red flag. My goal was for us all to learn a lesson in humility. Run it on enterprise and you're still 50x slower than a cloud model. Run it on anything more restrictive and you learn fast that it's a miracle LLMs don't spit out garbage all day long. Cache wipes, mutations due to context shift, trying to have a look at the weights to see what the answer is, then coming back to find everything changed again... all at 150+ TPS. It's incredible, and I hope the little game does it justice.
Don't forget to post your best score though. Even if it's not 100%, that's the whole point.
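For readers curious about the cache-wipe mechanic discussed in this thread, here is a small Python sketch of a time-based cache in which a quadrant is evicted once it has gone untouched for longer than the retention window. The 45-second default mirrors the score cards above; the class and method names are invented for illustration, and neither the game's real implementation nor a production KV cache necessarily works this way.

```python
from typing import Dict, List, Optional


class QuadrantCache:
    """Letters in a quadrant survive only if it was touched recently enough."""

    def __init__(self, retention: float = 45.0) -> None:
        self.retention = retention                      # seconds
        self._letters: Dict[str, Dict[str, str]] = {}   # quadrant -> cell -> letter
        self._last_touch: Dict[str, float] = {}         # quadrant -> timestamp

    def write(self, quadrant: str, cell: str, letter: str, now: float) -> None:
        """Store a letter and refresh the quadrant's timer."""
        self._letters.setdefault(quadrant, {})[cell] = letter
        self._last_touch[quadrant] = now

    def touch(self, quadrant: str, now: float) -> None:
        """Refresh the timer without changing any letters."""
        if quadrant in self._letters:
            self._last_touch[quadrant] = now

    def evict_expired(self, now: float) -> List[str]:
        """Drop every quadrant idle for longer than the retention window."""
        expired = [
            q for q, t in self._last_touch.items() if now - t > self.retention
        ]
        for q in expired:
            del self._letters[q]
            del self._last_touch[q]
        return expired

    def read(self, quadrant: str, cell: str) -> Optional[str]:
        """Return the stored letter, or None if it was never written or was evicted."""
        return self._letters.get(quadrant, {}).get(cell)


if __name__ == "__main__":
    cache = QuadrantCache(retention=45.0)
    cache.write("NW", "r0c0", "K", now=0.0)
    cache.write("NE", "r0c5", "V", now=0.0)
    cache.touch("NE", now=30.0)             # the player rotated back in time
    print(cache.evict_expired(now=50.0))    # NW has been idle for 50 seconds
    print(cache.read("NE", "r0c5"))
```

Timestamps are passed in explicitly so the behaviour is easy to test; in a live loop you could supply `time.monotonic()` instead.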
@er4or-404 I saw you liked the comment yesterday. If you're curious, it's up and running. Wanna try being an LLM for a minute? Give it a try.
Don't forget to comment your score card!