DEV Community: Rahad Bhuiya

Turing's Escape

Rahad Bhuiya — Fri, 19 Jun 2026 15:54:52 +0000

This is a submission for the June Solstice Game Jam

What I Built

Turing's Escape is a single-file browser puzzle game set in a fictional 1954 cryptography station, built around one idea: a literal Turing Test. The player is a new recruit who has to clear three tests before going home — break a Caesar-shifted intercept, talk to two telegraph operators and work out which one is human, then decode a punched-tape Morse message to open the vault. The middle test is the whole point: one operator's replies are scripted, the other's come live from Google's Gemini API, re-randomized every playthrough, and the player has to guess which is which with no hint beyond the conversation itself.
It ties into the jam themes on three levels at once rather than one per box: the "longest day" setting (a desk-lamp-lit case file on the solstice night, light pooling against the dark edges of the page), the Alan Turing tribute (explained fully below), and a closing dedication that quietly lets Pride Month and Turing's own history sit in the same place, since June is both.
Play it: https://rahadbhuiya.itch.io/turings-escape
Built with: HTML, CSS, vanilla JavaScript, Google Gemini API — no backend, no build step, no framework.

Code

GitHub: https://github.com/rahadbhuiya/turings-escape

How I Built It

Most jam entries treat their bonus categories as separate boxes to check: a Turing reference bolted onto otherwise generic gameplay, an AI API called once for a line of flavor text nobody reads twice. The decision that shaped everything else here was refusing to do that — building one mechanic that was simultaneously the theme, the tribute, and the AI integration, so pulling out any single piece would break the game rather than just shrink a credits screen. The other honest constraint was a five-day window and a zero-dollar API budget, which ruled out a backend, paid models, and any art we didn't have time to make.
What that left is a single static HTML file: no server, no database, no build step. The state machine, all three puzzle modules, and a small WebAudio engine that synthesizes every sound effect on the fly (there are no audio asset files in the project at all) run entirely client-side in vanilla JavaScript.

The only network call the game ever makes is the one outbound request to Gemini, for the live half of the Turing Test puzzle — everything else, including the entire "human" side of that same conversation, is local and offline by design. That made the game trivial to deploy as one file and just as trivial to demo with no internet connection at all, if a key isn't on hand.

Prize Category

Best Ode to Alan Turing — Turing isn't a skin on top of a generic escape room here; he's the reason the central mechanic exists. His 1950 paper, "Computing Machinery and Intelligence," asked whether a human judge could reliably tell a machine from a person through conversation alone — that question is the entire second test in this game. We chose not to put Turing on stage as a speaking character; he never appears, and nothing invents a quote in his name. Instead the closing screen dedicates the game to him as history rather than fiction: breaking Enigma, the 1950 paper, his 1952 prosecution for being gay under then-current UK law, and his death two years later at forty-one. It's also why the visuals are paper, typewriter type, and rubber stamps instead of a green-phosphor hacker terminal — interactive CRT screens barely existed yet in 1954, so the case-file look is the one actually true to his era, not just the one that reads as "computer-y" now.

Best Google AI Usage — One of the two operator lines, randomized to Line A or Line B every playthrough, is answered live by Google's Gemini API (gemini-2.5-flash) rather than pre-written. Every player message goes out with a system instruction that puts the model in character as a 1954 telegraph operator and tells it never to reveal it's an AI; whatever comes back streams straight into the chat log, unedited. Flash and Flash-Lite are the tiers Google currently leaves open for free-tier use, with Pro held back for paid accounts, which made Flash the only realistic choice on a zero-budget build. Because a live demo in front of judges is the worst place for a network call to fail, every Gemini request is wrapped end to end — a missing key, a failed fetch, a rate limit, or an empty response all fall through silently to a pool of scripted lines tuned to feel just slightly too smooth and agreeable, the uncanny-valley opposite of the deliberately imperfect lines written for the human side. The AI is real and load-bearing for the puzzle, but never a single point of failure for the submission itself.
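The fall-through behaviour described above can be sketched roughly as follows. The game itself is vanilla JavaScript; this is a language-neutral illustration written in Python, and every name in it (`operator_reply`, `ask_model`, `SCRIPTED_LINES`) is hypothetical rather than taken from the repository. The live call is passed in as a plain callable so the sketch makes no claims about the Gemini client API.

```python
import random
from typing import Callable, Optional, Sequence

# Hypothetical scripted pool; the game's real fallback lines are not shown in the post.
SCRIPTED_LINES = (
    "Certainly. Happy to help with that.",
    "Of course. Everything is in order here.",
)


def operator_reply(
    message: str,
    ask_model: Optional[Callable[[str], str]],
    scripted: Sequence[str] = SCRIPTED_LINES,
    rng: Optional[random.Random] = None,
) -> str:
    """Return the live reply if one arrives, otherwise a scripted line."""
    rng = rng or random.Random()
    if ask_model is None:  # no API key configured
        return rng.choice(scripted)
    try:
        reply = ask_model(message)  # may raise on a failed fetch or a rate limit
    except Exception:
        return rng.choice(scripted)
    if not reply or not reply.strip():  # empty response
        return rng.choice(scripted)
    return reply.strip()
```

Whatever goes wrong, the caller gets a string back, so the chat log never has to handle an error state.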

A load balancer inspired by how Emperor Penguins survive Antarctic winters

Rahad Bhuiya — Mon, 15 Jun 2026 15:32:22 +0000

Why I modeled a load balancer after Emperor Penguin huddles

A few months ago I was reading about how emperor penguins survive Antarctic winters. Temperatures drop to -40°C, winds hit 120 km/h, and somehow these birds make it through. Not because they're individually tough. Because they rotate.

Cold penguins on the outside push inward. Warm ones from the center move out to rest. Nobody coordinates this. No penguin is in charge. It emerges from one simple rule: if you're cold, push in. If you're warm, you'll get pushed out eventually.

I couldn't stop thinking about this.

I was working on a service mesh at the time and dealing with the usual problem — one slow server quietly dragging down the whole cluster. Round robin doesn't care. Least connections helps but not always. Weighted approaches need manual tuning that goes stale immediately.

The penguin thing kept nagging at me. What if servers had a "temperature"? What if hot servers rotated out to rest?

That's HuddleCluster.

The basic structure

Two rings:

Inner ring (deque): Active servers. Requests go to them round-robin. Simple, fair, zero overhead for normal traffic.

Outer ring (min-heap): Resting servers. Keyed by temperature — coolest server sits at the top, ready to rotate back in first.

When a server in the inner ring runs hot past a threshold, it moves out. When an outer ring server cools down, it comes back in.

That's the entire rotation logic. About 50 lines of Python.
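A minimal sketch of the two-ring rotation, assuming a single eviction threshold and a single readmission threshold. The class name `TwoRingSketch`, the method names, and both threshold values are illustrative assumptions, not the HuddleCluster API.

```python
import heapq
from collections import deque


class TwoRingSketch:
    """Inner ring: active servers (deque). Outer ring: resting servers (min-heap)."""

    def __init__(self, servers, evict_above=0.5, readmit_below=0.2):
        self.inner = deque(servers)          # round-robin order
        self.outer = []                      # heap of (temperature, name)
        self.temperature = {s: 0.0 for s in servers}
        self.evict_above = evict_above       # assumed threshold
        self.readmit_below = readmit_below   # assumed threshold

    def route(self):
        """Pick the next active server and move it to the back of the ring."""
        server = self.inner.popleft()
        self.inner.append(server)
        return server

    def report(self, server, temperature):
        """Record a new temperature and rotate between rings if needed."""
        self.temperature[server] = temperature
        if server in self.inner:
            # Never evict the last active server.
            if temperature > self.evict_above and len(self.inner) > 1:
                self.inner.remove(server)
                heapq.heappush(self.outer, (temperature, server))
        else:
            # Re-key resting servers so the heap reflects current temperatures.
            self.outer = [(self.temperature[s], s) for _, s in self.outer]
            heapq.heapify(self.outer)
        # Coolest resting server returns first.
        while self.outer and self.outer[0][0] < self.readmit_below:
            _, cooled = heapq.heappop(self.outer)
            self.inner.append(cooled)
```

Using two different thresholds is a choice made for this sketch: it keeps a server from flapping between rings when its temperature hovers near one cutoff.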

What is "temperature"?

This took me a while to get right.

My first attempt was just raw latency. That was bad. A server handling one slow database query looks terrible even when it's completely healthy. I needed something more composed.

Current formula:

    temperature = EMA(
        0.7 * relative_latency_anomaly +
        0.1 * cpu_score +
        0.1 * memory_score +
        0.1 * (error_rate + connection_score)
    )

Three decisions here worth explaining.

EMA over simple moving average

EMA weights recent measurements more heavily. If a server just had a bad spike but recovered, EMA reflects that recovery faster than a window average would. The formula:

EMA_t = α * current_value + (1 - α) * EMA_{t-1}

Higher α means faster reaction but more noise sensitivity. Lower means smoother but slower detection. I tuned α empirically — the benchmark section covers what I observed.
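To make the α trade-off concrete, here is a small sketch that runs the same one-sample spike through two smoothing factors. The values 0.5 and 0.1 are picked for illustration only; they are not the tuned α from the project.

```python
def ema_update(previous, current, alpha):
    """One EMA step: alpha * current + (1 - alpha) * previous."""
    return alpha * current + (1 - alpha) * previous


def ema_series(values, alpha, initial=0.0):
    """Run the EMA over a sequence and return every intermediate level."""
    level = initial
    out = []
    for value in values:
        level = ema_update(level, value, alpha)
        out.append(level)
    return out


# A single bad measurement followed by recovery.
spike = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
fast = ema_series(spike, alpha=0.5)  # reacts hard, forgets quickly
slow = ema_series(spike, alpha=0.1)  # reacts weakly, forgets slowly
```

With the higher α the spike registers at 0.5 and has decayed to 0.0625 three samples later. With the lower α it only ever reaches 0.1, but about 0.073 of it is still there three samples later.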

Relative latency anomaly, not absolute latency

This is the part I'm most happy with.

Instead of flagging a server when latency crosses some hardcoded threshold like "300ms is bad," I compare each server to the cluster median:

    relative_anomaly = (server_latency - cluster_median) / cluster_median

Why does this matter? If your whole cluster is running at 200ms — maybe it's a heavy batch job period, maybe your database is under load — that's just the current normal. A 200ms server shouldn't be punished when everyone is at 200ms. But a 400ms server in a 200ms cluster? That's a real anomaly.

No manual threshold needed. The system self-calibrates to whatever your current traffic looks like. 60ms is fine if the cluster median is 60ms. The scoring is scale-invariant.
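A short sketch of the scale-invariance claim, using the standard library's `statistics.median`. The function name and the two sample clusters are made up for illustration.

```python
from statistics import median


def relative_anomalies(latencies_ms):
    """Map server name -> (latency - cluster median) / cluster median."""
    cluster_median = median(latencies_ms.values())
    if cluster_median <= 0:
        raise ValueError("cluster median must be positive")
    return {
        name: (latency - cluster_median) / cluster_median
        for name, latency in latencies_ms.items()
    }


# Same shape of problem at two very different absolute latencies.
busy = {"a": 200, "b": 200, "c": 200, "d": 200, "e": 200, "f": 400}
quiet = {"a": 60, "b": 60, "c": 60, "d": 60, "e": 60, "f": 120}

busy_scores = relative_anomalies(busy)
quiet_scores = relative_anomalies(quiet)
```

Server `f` scores 1.0 in both clusters and every other server scores 0.0, even though the absolute latencies differ by more than a factor of three.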

70/30 split between latency and system metrics

Latency is the user-visible signal. CPU, memory, and connections are leading indicators — they can catch problems before latency visibly degrades. I weighted latency higher because that's ultimately what matters, but the other signals pull their weight.
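As a worked example of the weighting, the sketch below evaluates the formula for one hypothetical sample. It assumes the CPU, memory, error-rate, and connection inputs are already normalised to the range 0 to 1, which the post does not state, and the α of 0.3 is likewise an assumption.

```python
def raw_temperature(latency_anomaly, cpu, memory, error_rate, connections):
    """Weighted sum from the formula above (0.7 latency, 0.3 system metrics)."""
    return (
        0.7 * latency_anomaly
        + 0.1 * cpu
        + 0.1 * memory
        + 0.1 * (error_rate + connections)
    )


def smoothed_temperature(previous, raw, alpha=0.3):
    """Wrap the raw score in one EMA step; alpha here is an assumed value."""
    return alpha * raw + (1 - alpha) * previous


# A server at twice the cluster median latency with moderate resource use.
raw = raw_temperature(1.0, cpu=0.5, memory=0.2, error_rate=0.0, connections=0.3)
temp = smoothed_temperature(previous=0.0, raw=raw)
```

Here the latency term contributes 0.7 of the raw score of about 0.8, so the system metrics nudge the result without being able to outvote a clear latency signal.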

Data structure choices

Inner ring as deque: Round-robin is just rotating a deque. Append to right, pop from left. O(1) for both. I tried a list first — the index tracking got messy and error-prone whenever servers got removed mid-rotation. Deque was cleaner and the right tool.

Outer ring as min-heap: I want the coolest resting server to come back in first. Min-heap gives me that in O(log n) for insertion and extraction. I briefly considered just sorting the outer ring on every update — fine at small n, but min-heap felt more principled and honest about the intent.

The whole thing is about 700 lines of Python with zero external dependencies. I deliberately avoided pulling in anything external. I wanted this to be droppable into any project without a dependency audit conversation.

What the benchmarks showed

I tested with 6 FastAPI servers on loopback (all on the same machine). That's a significant caveat — I'll address it directly in the limitations section.

Normal load: HuddleCluster performs comparably to round-robin and least-connections. No meaningful difference. That's expected. When nothing is degraded, rotation rarely triggers and the deque just does round-robin like anything else.

Server failure simulation: I introduced artificial 5-second delays on one server mid-test. This is where the gap appeared.

| Algorithm | P95 Latency |
| --- | --- |
| Round Robin | 5,026 ms |
| Least Connections | 4,891 ms |
| HuddleCluster | 85.6 ms |

Round-robin kept routing 1-in-6 requests to the slow server throughout the test. HuddleCluster evicted it after approximately 3 request cycles — detection converges in about 36 cluster requests on average.

Inner ring fairness: Gini coefficient was 0.00 in every test scenario. The deque distributes perfectly evenly among active servers.
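For readers unfamiliar with the metric, this is one common way to compute a Gini coefficient over per-server request counts. It is a generic sketch, not the benchmark script from the repository.

```python
def gini(counts):
    """Gini coefficient of non-negative counts: 0.0 means perfectly even."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form over sorted values, with i counted from 1:
    #   G = sum((2i - n - 1) * x_i) / (n * sum(x))
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)


even = gini([120, 120, 120, 120, 120])   # strict round-robin over 5 servers
skewed = gini([0, 0, 0, 0, 600])         # one server takes everything
```

Strict round-robin over a fixed set of servers gives every server the same count, which is why the value comes out at exactly 0. The fully skewed case gives (n - 1) / n, or 0.8 for five servers.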

Routing overhead: 10.7μs per request average. Acceptable for the use case.

Where it breaks

I think this section matters as much as the benchmark numbers.

Loopback is not production. Every benchmark I ran is on a single machine. WAN introduces higher base latency, more jitter, and failure modes I haven't tested. The EMA sensitivity I tuned for loopback may need adjustment for real network conditions — high per-server jitter could cause false evictions if α is too aggressive. This is the most honest gap in the current work.

The k ≥ n/2 problem. Relative scoring works well when a minority of servers degrade. If half or more of your cluster slows down simultaneously — shared database contention, a network event, a traffic spike hitting everyone — the cluster median shifts up and no individual server looks anomalous. The algorithm goes blind. I document this in the paper but haven't solved it yet.
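The blind spot is easy to reproduce with a few numbers. The sketch below scores three made-up six-server clusters in which two, three, and four servers run at double latency.

```python
from statistics import median


def anomalies(latencies):
    """Relative anomaly of each latency against the cluster median."""
    cluster_median = median(latencies)
    return [(x - cluster_median) / cluster_median for x in latencies]


minority = [200, 200, 200, 200, 400, 400]  # k = 2 of 6 degraded
half = [200, 200, 200, 400, 400, 400]      # k = 3 of 6 degraded
majority = [200, 200, 400, 400, 400, 400]  # k = 4 of 6 degraded

worst_minority = max(anomalies(minority))
worst_half = max(anomalies(half))
worst_majority = max(anomalies(majority))
```

With two slow servers the worst score is 1.0. At exactly half, the median lands between the two groups and the worst score shrinks to about 0.33. Past half, the median is the slow latency, so the degraded servers score 0.0 and only the healthy ones stand out, with a negative score.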

No cross-host state. HuddleCluster runs per-process. Multiple load balancer instances don't share rotation state. There's a gossip protocol stub in v1.3.0 but it's not complete.

Detection speed

One thing that surprised me during benchmarking: how fast eviction actually happens in practice.

A 3× slower server gets rotated out in roughly 3 request cycles. For most traffic volumes that's well under a second. I expected slower convergence — the EMA smoothing should delay reaction — but the relative scoring amplifies the signal enough that threshold crossing happens fast.
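A rough way to see why the crossing happens quickly: count how many EMA updates it takes for the latency term alone to exceed an eviction threshold. The α of 0.3 and threshold of 0.5 below are assumptions chosen for illustration, so the resulting counts are not the benchmark figures.

```python
def samples_to_evict(slowdown, alpha, threshold, latency_weight=0.7, max_samples=1000):
    """Count EMA updates until the temperature exceeds the threshold.

    Assumes the rest of the cluster sits at the median, so a server that is
    `slowdown` times slower has a relative anomaly of (slowdown - 1), and
    that all non-latency metrics contribute 0.
    """
    target = latency_weight * (slowdown - 1.0)  # steady-state raw score
    temperature = 0.0
    for n in range(1, max_samples + 1):
        temperature = alpha * target + (1 - alpha) * temperature
        if temperature > threshold:
            return n
    return None  # never crosses within max_samples


three_x = samples_to_evict(3.0, alpha=0.3, threshold=0.5)
two_x = samples_to_evict(2.0, alpha=0.3, threshold=0.5)
mild = samples_to_evict(1.5, alpha=0.3, threshold=0.5)
```

Under these assumptions a 3× slower server crosses after 2 updates and a 2× slower one after 4. A 1.5× slower server never crosses, because its steady-state score of 0.35 sits below the threshold. The larger the relative anomaly, the less the smoothing delays detection.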

What's next

WAN benchmarks are the obvious priority. I want to understand how the EMA α needs to change under real network jitter before claiming this is production-ready.

The multi-server simultaneous degradation case also needs empirical testing. I have theoretical analysis of why relative scoring breaks at k ≥ n/2, but I want to measure the actual degradation curve — at what point does detection start failing?

Distributed state via gossip is in progress.

GitHub: github.com/rahadbhuiya/HuddleCluster

Paper: doi.org/10.5281/zenodo.20348019

If you've worked with adaptive load balancing in production — especially anything with relative or percentile-based scoring — I'd be curious what threshold strategies held up under WAN jitter.