DEV Community

Tower Before Dusk: I Built a Puzzle Game for Humans and AI

Daniel Balcarek on June 18, 2026

This is a submission for the June Solstice Game Jam. It's interesting how the most exciting ideas always arrive when I have basically no time to wo...


Hemapriya Kanagala • Jun 18

Daniel, this is a really creative idea. Making a game that both humans and AI can play is not something you see every day.

I'll definitely give it a try when I get some time. Curious to see whether I can beat the AI on the harder levels 😄

Daniel Balcarek • Jun 18

Thanks, Hemapriya! ❤️

The first three levels are intentionally easier because the lightweight models were already struggling with them. Levels 4 and 5 should feel more like normal puzzle difficulty.

And I’m hoping to add a few genuinely hard ones over the weekend too. 😅

Sylwia Laskowska • Jun 18

Wow, another addictive game! 😄 Saving this one for after work. BTW, Google should probably send us some stickers for all the free WebMCP promotion we're doing 😂

Daniel Balcarek • Jun 18

That would actually be amazing! 😄 But without your article, I probably would not even know WebMCP existed, so most of the credit goes to you! ❤️

Sylwia Laskowska • Jun 18

Deal! 😄 In that case, I'll take two stickers! 😂❤️

Daniel Balcarek • Jun 18

Absolutely! You definitely deserve both of them!🏆 😂

Daniel Balcarek • Jun 22

Small update after the discussion here. 👇

First, I’d like to thank everyone for the feedback, ideas and kind words. I did not expect the article/game to start such an active discussion.

I made a few additions based on the comments:

Added three more levels, hopefully harder ones. 😅

Added AI plan simulation. A new WebMCP tool called checkPlan lets the AI check whether its current plan is valid before it calls the submitPlan tool. The number of checkPlan calls is limited, though: once the AI reaches that limit, it must submit its final plan.

Lightweight models now perform better. In many cases, they still do not reach the goal, but they get closer. I’m planning to test better models too, so we’ll see how they perform.
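
A rough sketch of how a capped check could work, not the game's actual code. The names `limitChecks` and `LimitedCheck` are hypothetical, the validator is a stand-in, and the WebMCP tool registration itself is left out:

```typescript
// Hypothetical wrapper: caps how many times a validation function may run.
type LimitedCheck<R> =
  | { allowed: true; result: R; checksLeft: number }
  | { allowed: false; reason: string };

function limitChecks<P, R>(
  check: (plan: P) => R,
  maxChecks: number
): (plan: P) => LimitedCheck<R> {
  let used = 0; // number of checks already spent
  return (plan: P): LimitedCheck<R> => {
    if (used >= maxChecks) {
      // Budget exhausted: the caller is told to submit instead of checking again.
      return {
        allowed: false,
        reason: "checkPlan limit reached, call submitPlan with the final plan",
      };
    }
    used += 1;
    return { allowed: true, result: check(plan), checksLeft: maxChecks - used };
  };
}

// Stand-in validator for the example; a real one would simulate the plan.
const isNonEmpty = (plan: string[]): boolean => plan.length > 0;
const limitedCheck = limitChecks(isNonEmpty, 2);

const first = limitedCheck(["right", "down"]); // allowed, 1 check left
const second = limitedCheck([]);               // allowed, 0 checks left
const third = limitedCheck(["right"]);         // refused, limit reached
console.log(first, second, third);
```

Keeping the counter in a closure means the limit can be tuned per level without touching the validator.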

Thanks again for trying the game and sharing ideas! ❤️

Utkarsh Bansal • Jun 18

Loved the game, it's really addictive. I have a few suggestions though.
Right now, the game feels a bit too much like calculating moves, similar to a puzzle like checkers. Instead of giving a strict move limit and placing the castle far away, you could move the castle closer and give players some extra moves.

Example: if reaching the castle takes 32 moves, give the player 40 moves.
The extra 8 moves could be spent collecting extra resources that carry over to future levels. This would add a layer of strategy on top, instead of forcing a single optimal path.

With this, you could also add a leaderboard or a personal-best score based on the minimum number of moves taken to reach the castle.
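
To make the "32 moves, give 40" idea concrete, here is a small sketch that computes the minimum number of moves with a breadth-first search and then adds slack on top. It assumes 4-directional movement where every step costs one move, and the `#` blocked tile and the sample map are made up for illustration; the real game's terrain rules may differ.

```typescript
type Position = { x: number; y: number };

// Breadth-first search: fewest 4-directional steps from start to goal,
// or null if the goal cannot be reached.
function minMoves(
  grid: string[][],
  start: Position,
  goal: Position,
  blocked: Set<string>
): number | null {
  const key = (p: Position): string => `${p.x},${p.y}`;
  const seen = new Set<string>([key(start)]);
  let frontier: Position[] = [start];
  let steps = 0;
  while (frontier.length > 0) {
    const next: Position[] = [];
    for (const p of frontier) {
      if (p.x === goal.x && p.y === goal.y) return steps;
      for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
        const q = { x: p.x + dx, y: p.y + dy };
        const row = grid[q.y];
        // Skip positions outside the map.
        if (row === undefined || q.x < 0 || q.x >= row.length) continue;
        // Skip blocked tiles and tiles that were already visited.
        if (blocked.has(row[q.x]) || seen.has(key(q))) continue;
        seen.add(key(q));
        next.push(q);
      }
    }
    frontier = next;
    steps += 1;
  }
  return null;
}

// Hypothetical 3x3 level: "#" is an assumed impassable tile.
const level = [
  ["P", ".", "."],
  ["#", "#", "."],
  ["G", ".", "."],
];
const optimal = minMoves(level, { x: 0, y: 0 }, { x: 0, y: 2 }, new Set(["#"]));
const extraMoves = 2; // slack the player can spend on optional pickups
const moveBudget = optimal === null ? null : optimal + extraMoves;
console.log(optimal, moveBudget);
```

The same `optimal` value could double as the reference score for a personal-best counter.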

Daniel Balcarek • Jun 18

Thanks, Utkarsh, I really appreciate it and I’m glad you like it!

Those are both great ideas. The game would definitely become much more interesting. My only concern is the AI side: the lightweight models already struggle with the simplest levels and adding optional objectives plus long-term resource decisions could make them lose track even more easily.

But for a human-focused mode, I think this would be a very fun direction. 👍️

Aliaksei Zelianouski • Jun 18

The "AI struggled" result might be model tier plus format more than AI in general. A move-budgeted tile puzzle leans on exactly what trips LLMs up - spatial reasoning, counting, and one-shot planning with no feedback loop - and you tested the lightweight Flash models, which is where that breaks first. Worth a frontier run before lowering difficulty: I've seen people report Fable 5 is genuinely strong at spatial reasoning in games now, so the gap might be tier, not a ceiling.

Either way it comes back to balance, and that's why I build conversational games. Mixing human and AI players is just easier when the game advances through dialogue - the challenge becomes language, the one thing these models are genuinely good at, so they sit near human level instead of way below or way above. Assuming you can prompt them to stay on track, of course.

Daniel Balcarek • Jun 18

Yep, I agree and I actually described it in the article that this is a limitation of the models I used. Stronger models would probably perform much better.

And as you said, conversational games are a much more natural fit for these models, but then it would not be as much fun for me to challenge them with something outside their comfort zone. 😄

Web Developer Hyper • Jun 18

The idea of AI thinking for itself and playing the game is unique and fun. Good game! There might be many other great ways to use AI that we haven't discovered yet. 🤔

Daniel Balcarek • Jun 18

Thanks! Glad you like it. ❤️

Yeah, you’re right. Discovering those new possibilities is one of the most fascinating and interesting things to do.

Web Developer Hyper • Jun 18

Looking forward to seeing what unique AI idea you come up with next! 😄

Daniel Balcarek • Jun 19

❤️A lot of my ideas are inspired by articles from the DEV Community, so I’m always curious to see what interesting things people build and write about next! 😄😅

Web Developer Hyper • Jun 19

You are surely one of the most creative and highly skilled engineers in the DEV Community! 👍

Daniel Balcarek • Jun 19

Oh, thank you! That’s really encouraging and it warms my heart. 😊

I wouldn’t go that far, though. There are plenty of excellent engineers in the DEV Community, including you! 🙌

Harsh • Jun 18

Cool 😎 A puzzle game for humans and AI is such a unique angle. Curious: what's one puzzle type that AI solves faster than humans, and one where humans consistently beat AI? Would love to hear about the design process.

Thanks for sharing! 🚀

Daniel Balcarek • Jun 18

Thanks, Harsh!

I developed and debugged it with Google’s WebMCP Model Context Tool Inspector. It currently offers only three model options: Gemini 3 Flash Preview, Gemini 3.1 Flash-Lite, and Gemini 3.5 Flash, so there was not much room for broader experimentation.

From what I tested, Flash-Lite often fails even on level 1. The other two can finish level 1, but they already struggle with level 2.

So, for now, humans win. 😄

I think stronger models would probably do better, but testing those would mean building my own agent outside the Inspector. That could be a fun next step. 🤔

Eryc Tri Juni S • Jun 18

the hard part wasn't WebMCP. it was the puzzle. 🎯
one question though — did the model actually count the moves, or just vibes-based sequence and pray?
because those are very different failure modes. 👀

Daniel Balcarek • Jun 18

I included remainingMoves in the game state, and Gemini 3.1 Flash-Lite especially often did not use the full budget. For example, it could have 28 moves available but submit only around 20 actions and stop before reaching the goal.

So it was not always a case of running out of moves; sometimes it simply produced an incomplete plan. The models I tested were lightweight, though, so stronger models would probably perform much better.
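
The two failure modes can be told apart mechanically. A tiny sketch, assuming every submitted action costs exactly one move (the function and label names are hypothetical, not from the game):

```typescript
type Outcome = "solved" | "stopped-early" | "out-of-moves";

// Separates "ran out of budget" from "stopped with budget left".
function classifyOutcome(
  reachedGoal: boolean,
  moveBudget: number,
  actionsSubmitted: number
): Outcome {
  if (reachedGoal) return "solved";
  return actionsSubmitted < moveBudget ? "stopped-early" : "out-of-moves";
}

// The Flash-Lite case from above: 28 moves available, about 20 submitted.
console.log(classifyOutcome(false, 28, 20)); // "stopped-early"
```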

Eryc Tri Juni S • Jun 18

so it had budget left and still stopped short — that's not a counting problem, that's the model not knowing it failed until after it submitted.

it thought it was done. that's the scarier failure mode.

Mykola Kondratiuk • Jun 19

curious how the AI actually navigates the puzzle - does WebMCP give it a structured state dump, or does it read the DOM like a human would? the move order would look really different depending on that

Daniel Balcarek • Jun 19

WebMCP exposes a gameState tool, and that returns a structured state with the objective, rules, legend, remaining moves, current resources, and visibleMap:

visibleMap: [
  "P . W ~",
  ". R . G"
]

So the AI sees the puzzle as structured data, not as DOM. I included the full state earlier in the article, probably the longest code example there. 😀
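
For anyone consuming that state on the agent side, a minimal sketch of turning visibleMap into a grid. It assumes tiles are separated by single spaces as in the snippet above and that P marks the player and G the goal; the helper names are hypothetical.

```typescript
type Position = { x: number; y: number };

// Splits each space-separated row into an array of tile symbols.
function parseVisibleMap(visibleMap: string[]): string[][] {
  return visibleMap.map((row) => row.trim().split(/\s+/));
}

// Returns the first position holding the given symbol, or null if absent.
function findTile(grid: string[][], symbol: string): Position | null {
  for (let y = 0; y < grid.length; y++) {
    const x = grid[y].indexOf(symbol);
    if (x !== -1) return { x, y };
  }
  return null;
}

const grid = parseVisibleMap(["P . W ~", ". R . G"]);
const player = findTile(grid, "P"); // assumed: player tile
const goal = findTile(grid, "G");   // assumed: goal tile
console.log(player, goal);
```

Coordinates here are zero-based with y counting rows from the top, which is a choice made for this sketch.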

Mykola Kondratiuk • Jun 19

structured state makes the agent reasoning legible in a way raw DOM never could — you can actually trace a bad move back to the input. does the visible map ever mislead it when fog is only partial?

Daniel Balcarek • Jun 19

Ah, sorry for the confusion, there is no fog. visibleMap exposes the full map, so the failures are more about planning/counting mistakes than partial visibility.

Yep, bad variable naming on my side. 😅

Mykola Kondratiuk • Jun 19

got it, that's actually cleaner to debug - if the map's fully visible and it still miscounts, the failure is clearly in the planning layer. easier to isolate

Marina Eremina • Jun 18

Really cool game, I even reached level 5! The previous one was entertaining as well, great job! 🎉

Daniel Balcarek • Jun 18

Thanks! That was fast, you must be a good player! 😄

I made the first three levels easier on purpose so the AI would have a chance, but levels 4 and 5 were meant to be more challenging. Maybe I should add a few tougher ones. 😅

Marina Eremina • Jun 18

I just like this type of game. The only thing is they usually come packed with ads. What about yours? Should we expect ads to show up later? 😅

Daniel Balcarek • Jun 18

Never! 😄 Or at least not until Cloudflare’s free tier is no longer enough to host the game. 😅

Just joking, I’d rather find another solution before adding ads.

𝐓𝐡𝐞 𝐋𝐚𝐳𝐲 𝐆𝐢𝐫𝐥 • Jun 18

Bro, you're making so many awesome games these days, I wouldn't be surprised if GTA 6 turns out to be your next project!😅

Daniel Balcarek • Jun 18

That’s a great one! 😂 I’d love to say “Challenge accepted,” but I think GTA 6 might be slightly out of scope for the next DEV challenge. 🤣

dehkadeh honar • Jun 18

wow very nice

Daniel Balcarek • Jun 18

Thanks! I’m really glad you like it.

mote • Jun 23

This is a fascinating concept. I have been thinking about how to evaluate AI problem-solving in a way that is fair and human-comparable, and your game does exactly that by making the evaluation itself the entertainment.

The framing of "AI agents as a new class of user" is interesting. In robotics and embodied AI, we have been dealing with this for years — systems that need to operate in environments designed for humans, with human-shaped constraints. The challenge is always that human-designated "easy" problems (recognizing an object, navigating a space) are hard for AI, while AI finds "hard" problems (perfect chess moves) trivially easy.

Have you thought about how the adversarial dynamics evolve as AI gets better at the game? Does the "human fun" threshold shift as the AI can no longer be meaningfully challenged?

Daniel Balcarek • Jun 23

That contrast between human-hard and AI-hard problems is a really interesting point of view. While building this game, I ran into that difference, but I had not really thought about it in this way before.

The idea of AI agents as a new class of user is exactly what made the project fun for me. I was not only designing a puzzle for humans, but also thinking about how an AI would understand the rules, read the map, plan ahead, and interact with the game through tools.

For humans, I think around 10 levels could still be fun. But to avoid making it repetitive, I would probably need to add new mechanics. And that connects nicely with the AI side: every new mechanic changes the problem space and can make the game challenging for AI again.

With the current design, there is also another way to make the game more challenging for AI. If AI becomes too good, I can reduce the number of attempts for the checkPlan tool, which lets the AI validate whether its plan reaches the goal before submitting it.

So yes, I think the “human fun” threshold would shift as AI gets better. But maybe that is part of the interesting challenge: designing levels that are still fun for humans, while also exposing where AI planning still breaks down.

Aarti Jangid • Jun 22

Interesting concept! Combining puzzle-solving with both human and AI interaction creates a unique gameplay experience.

Daniel Balcarek • Jun 22

Thanks! It was really interesting to implement, especially because I had to look at the game from both perspectives.

Vaibhav Kumar Kandhway • Jun 21

An AI that can also play is a new angle. Would love to play it sometime.

Daniel Balcarek • Jun 22

Thanks for the comment! It started just as an experiment with WebMCP, but it turned out to be really fun. I’m definitely planning to improve it.

Mudassir Khan • Jun 19

the batch planning design is the interesting architectural choice here. you avoided the move by move feedback loop on purpose — the AI sees state once, plans once, submits. but that's also where it breaks: one shot planning on a spatial problem is basically asking the model to simulate an interpreter in its head.

we've hit the same failure mode building agents that generate database migration sequences. GPT-class models do fine on 3-step plans, fall apart past 8 even with the full schema in context. the model stops writing before reaching the goal, convinced it's already done.

did you try giving the model a scratch pad tool before calling submitPlan? intermediate reasoning before committing is usually the unlock for counting problems.

Daniel Balcarek • Jun 19

I didn’t try it, but this is actually an excellent idea.

I could simply expose a checkPlan tool with a response like:

{
  "valid": false,
  "reachedGoal": false,
  "finalPosition": { "x": 4, "y": 2 },
  "remainingMoves": 3,
  "reason": "Plan ended before reaching G"
}

But one concern comes to my mind: for easy levels it could be fast and perfectly fine, but as the difficulty increases, the model might call this tool too often. The reasoning would take more time and burn more tokens, which is exactly what I wanted to avoid with the one-shot planning design.

Still, I could allow it to call checkPlan only once or twice before submitPlan, which could improve the success rate while keeping the experiment controlled.
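
Roughly, the simulation behind such a response could look like this. It is only a sketch: it assumes four directional actions that each cost one move and treats every in-bounds tile as walkable, so terrain and resource rules are ignored. Apart from "Plan ended before reaching G", the reason strings are made up.

```typescript
type Position = { x: number; y: number };
type Move = "up" | "down" | "left" | "right";

interface CheckPlanResult {
  valid: boolean;
  reachedGoal: boolean;
  finalPosition: Position;
  remainingMoves: number;
  reason: string;
}

const DELTAS: Record<Move, Position> = {
  up: { x: 0, y: -1 },
  down: { x: 0, y: 1 },
  left: { x: -1, y: 0 },
  right: { x: 1, y: 0 },
};

// Replays the plan step by step without changing the real game state.
function simulatePlan(
  grid: string[][],
  start: Position,
  plan: Move[],
  moveBudget: number
): CheckPlanResult {
  let pos: Position = { ...start };
  let remaining = moveBudget;
  const fail = (reason: string): CheckPlanResult => ({
    valid: false,
    reachedGoal: false,
    finalPosition: pos,
    remainingMoves: remaining,
    reason,
  });

  for (const move of plan) {
    if (remaining === 0) return fail("Plan uses more moves than the budget allows");
    const next = { x: pos.x + DELTAS[move].x, y: pos.y + DELTAS[move].y };
    const inBounds =
      next.y >= 0 && next.y < grid.length &&
      next.x >= 0 && next.x < grid[next.y].length;
    if (!inBounds) return fail("Plan leaves the map");
    pos = next;
    remaining -= 1;
    if (grid[pos.y][pos.x] === "G") {
      return {
        valid: true,
        reachedGoal: true,
        finalPosition: pos,
        remainingMoves: remaining,
        reason: "Plan reaches G",
      };
    }
  }
  return fail("Plan ended before reaching G");
}

const sampleGrid = [
  ["P", ".", "W", "~"],
  [".", "R", ".", "G"],
];
const origin = { x: 0, y: 0 };
const shortResult = simulatePlan(sampleGrid, origin, ["right", "right", "down"], 5);
const fullResult = simulatePlan(sampleGrid, origin, ["right", "right", "down", "right"], 5);
console.log(shortResult, fullResult);
```

Because the simulation never mutates the game, capping the number of calls is the only thing separating it from a full feedback loop.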

Thanks for the amazing idea!