This is a submission for the June Solstice Game Jam
It's interesting how the most exciting ideas always arrive when I have basically no time to wo...
Daniel, this is a really creative idea. Making a game that both humans and AI can play is not something you see every day.
I'll definitely give it a try when I get some time. Curious to see whether I can beat the AI on the harder levels 😄
Thanks, Hemapriya! ❤️
The first three levels are intentionally easier because the lightweight models were already struggling with them. Levels 4 and 5 should feel more like normal puzzle difficulty.
And I’m hoping to add a few genuinely hard ones over the weekend too. 😅
Wow, another addictive game! 😄 Saving this one for after work. BTW, Google should probably send us some stickers for all the free WebMCP promotion we're doing 😂
That would actually be amazing! 😄 But without your article, I probably would not even know WebMCP existed, so most of the credit goes to you! ❤️
Deal! 😄 In that case, I'll take two stickers! 😂❤️
Absolutely! You definitely deserve both of them!🏆 😂
The "AI struggled" result might be model tier plus format more than AI in general. A move-budgeted tile puzzle leans on exactly what trips LLMs up - spatial reasoning, counting, and one-shot planning with no feedback loop - and you tested the lightweight Flash models, which is where that breaks first. Worth a frontier run before lowering difficulty: I've seen people report Fable 5 is genuinely strong at spatial reasoning in games now, so the gap might be tier, not a ceiling.
Either way it comes back to balance, and that's why I build conversational games. Mixing human and AI players is just easier when the game advances through dialogue - the challenge becomes language, the one thing these models are genuinely good at, so they sit near human level instead of way below or way above. Assuming you can prompt them to stay on track, of course.
Yep, I agree, and I actually noted in the article that this is a limitation of the models I used. Stronger models would probably perform much better.
And as you said, conversational games are a much more natural fit for these models, but then it would not be as much fun for me to challenge them with something outside their comfort zone. 😄
Loved the game, it's really addictive. I have a few suggestions though.
Right now, the game feels a bit too much like calculating moves, similar to a puzzle like checkers. Instead of giving a strict move limit and placing the castle far away, you could move the castle closer and give players some extra moves.
Example: if reaching the castle takes 32 moves, give the player 40 moves.
The extra 8 moves could be spent collecting extra resources to use in future levels. This would add a layer of strategy on top instead of forcing a single optimal path.
With this you could also create a leaderboard or a personal best score for the fewest moves taken to reach the castle.
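To make the arithmetic in this suggestion concrete, here is a minimal Python sketch. The helper names spare_moves and update_personal_best are hypothetical and not part of the game; the numbers 32 and 40 come from the example above.

```python
from typing import Optional


def spare_moves(budget: int, shortest_path: int) -> int:
    """Moves left over for optional detours after the shortest route."""
    if budget < shortest_path:
        raise ValueError("budget is too small to reach the castle")
    return budget - shortest_path


def update_personal_best(best: Optional[int], moves_taken: int) -> int:
    """Keep the lowest move count seen so far (lower is better)."""
    return moves_taken if best is None else min(best, moves_taken)


# The example from the comment: a 32-move route with a 40-move budget.
print(spare_moves(40, 32))  # 8
```

A leaderboard would then just store the result of update_personal_best per player and level.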
Thanks, Utkarsh, I really appreciate it and I’m glad you like it!
Those are both great ideas. The game would definitely become much more interesting. My only concern is the AI side: the lightweight models already struggle with the simplest levels and adding optional objectives plus long-term resource decisions could make them lose track even more easily.
But for a human-focused mode, I think this would be a very fun direction. 👍️
the hard part wasn't WebMCP. it was the puzzle. 🎯
one question though — did the model actually count the moves, or just vibes-based sequence and pray?
because those are very different failure modes. 👀
I included remainingMoves in the game state, and Gemini 3.1 Flash-Lite in particular often did not use the full budget. For example, it could have 28 moves available but submit only around 20 actions and stop before reaching the goal.
So it was not always a case of running out of moves; sometimes it simply produced an incomplete plan. The models I tested were lightweight, though, so stronger models would probably perform much better.
so it had budget left and still stopped short — that's not a counting problem, that's the model not knowing it failed until after it submitted.
it thought it was done. that's the scarier failure mode.
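The failure modes discussed in this thread can be told apart mechanically by replaying a submitted plan. The sketch below is a simplified assumption, not the game's actual rules: the grid symbols (S for start, C for castle, # for wall), the action names, and the classify_plan function are all hypothetical.

```python
# Hypothetical action names mapped to (row, column) offsets.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def find(grid, symbol):
    """Return the (row, column) of the first occurrence of symbol."""
    for r, row in enumerate(grid):
        c = row.find(symbol)
        if c != -1:
            return (r, c)
    raise ValueError(f"{symbol!r} not found in grid")


def classify_plan(grid, actions, remaining_moves):
    """Replay a plan and name the way it succeeds or fails."""
    if len(actions) > remaining_moves:
        return "over_budget"  # a genuine counting problem
    r, c = find(grid, "S")
    goal = find(grid, "C")
    for action in actions:
        dr, dc = MOVES[action]
        nr, nc = r + dr, c + dc
        inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
        if not inside or grid[nr][nc] == "#":
            return "illegal_move"
        r, c = nr, nc
        if (r, c) == goal:
            return "solved"
    return "stopped_short"  # budget left, but the plan ended early


GRID = [
    "S..#",
    ".#..",
    "...C",
]

print(classify_plan(GRID, ["down", "down", "right"], 8))  # stopped_short
```

Logging this label for every submission would show how often a model runs out of budget versus how often it simply ends the plan early.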
Cool 😎 A puzzle game for humans and AI is such a unique angle. Curious: what's one puzzle type that AI solves faster than humans, and one where humans consistently beat AI? Would love to hear about the design process.
Thanks for sharing! 🚀
Thanks, Harsh!
I developed and debugged it with Google’s WebMCP Model Context Tool Inspector. It currently offers only three model options: Gemini 3 Flash Preview, Gemini 3.1 Flash-Lite, and Gemini 3.5 Flash, so there was not much room for broader experimentation.
From what I tested, Flash-Lite often fails even on level 1. The other two can finish level 1, but they already struggle with level 2.
So, for now, humans win. 😄
I think stronger models would probably do better, but testing those would mean building my own agent outside the Inspector. That could be a fun next step. 🤔
The idea of AI thinking for itself and playing the game is unique and fun. Good game! There might be many other great ways to use AI that we haven't discovered yet. 🤔
Thanks! Glad you like it. ❤️
Yeah, you’re right. Discovering those new possibilities is one of the most fascinating and interesting things to do.
Looking forward to seeing what unique AI idea you come up with next! 😄
❤️A lot of my ideas are inspired by articles from the DEV Community, so I’m always curious to see what interesting things people build and write about next! 😄😅
You are surely one of the most creative and highly skilled engineers in the DEV Community! 👍
Oh, thank you! That’s really encouraging and it warms my heart. 😊
I wouldn’t go that far, though, there are plenty of excellent engineers in the DEV Community, including you! 🙌
curious how the AI actually navigates the puzzle - does WebMCP give it a structured state dump, or does it read the DOM like a human would? the move order would look really different depending on that
WebMCP exposes a gameState tool, and that returns a structured state with the objective, rules, legend, remaining moves, current resources, and visibleMap.
So the AI sees the puzzle as structured data, not as DOM. I included the full state earlier in the article, probably the longest code example there. 😀
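For readers who have not seen the article, here is a rough Python sketch of what such a structured state could look like. Only remainingMoves and visibleMap are names taken from this thread; the other field names, the map symbols, and every value are assumptions for illustration, and the real payload in the article may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class GameState:
    objective: str            # assumed field name
    rules: list               # assumed field name
    legend: dict              # assumed field name
    remainingMoves: int       # name mentioned in the thread
    resources: dict           # assumed field name
    visibleMap: list          # name mentioned in the thread; holds the full map


def to_tool_response(state: GameState) -> str:
    """Serialize the state the way a tool might return it to a model."""
    return json.dumps(asdict(state), indent=2)


# Invented example values, not taken from the game.
EXAMPLE_STATE = GameState(
    objective="Reach the castle within the move budget.",
    rules=["One action moves one tile.", "Walls cannot be crossed."],
    legend={"S": "start", "C": "castle", "#": "wall", ".": "open tile"},
    remainingMoves=28,
    resources={"wood": 0},
    visibleMap=["S..#", ".#..", "...C"],
)

print(to_tool_response(EXAMPLE_STATE))
```

Because the whole state is plain data, a bad move can be traced back to the exact input the model received.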
structured state makes the agent reasoning legible in a way raw DOM never could — you can actually trace a bad move back to the input. does the visible map ever mislead it when fog is only partial?
Ah, sorry for the confusion, there is no fog. visibleMap exposes the full map, so the failures are more about planning/counting mistakes than partial visibility.
Yep, bad variable naming on my side. 😅
got it, that's actually cleaner to debug - if the map's fully visible and it still miscounts, the failure is clearly in the planning layer. easier to isolate
Really cool game, I even reached level 5! The previous one was entertaining as well, great job! 🎉
Thanks! That was fast, you must be a good player! 😄
I made the first three levels easier on purpose so the AI would have a chance, but levels 4 and 5 were meant to be more challenging. Maybe I should add a few tougher ones. 😅
I just like this type of game. The only thing is they usually come packed with ads. What about yours? Should we expect ads to show up later? 😅
Never! 😄 Or at least until Cloudflare’s free tier is no longer enough to host the game. 😅
Just joking, I’d rather find another solution before adding ads.
wow very nice
Thanks! I’m really glad you like it.
Bro, you're making so many awesome games these days, I wouldn't be surprised if GTA 6 turns out to be your next project!😅
That’s a great one! 😂 I’d love to say “Challenge accepted,” but I think GTA 6 might be slightly out of scope for the next DEV challenge. 🤣
the batch planning design is the interesting architectural choice here. you avoided the move-by-move feedback loop on purpose: the AI sees state once, plans once, submits. but that's also where it breaks: one-shot planning on a spatial problem is basically asking the model to simulate an interpreter in its head.
we've hit the same failure mode building agents that generate database migration sequences. GPT-class models do fine on 3-step plans, fall apart past 8 even with the full schema in context. the model stops writing before reaching the goal, convinced it's already done.
did you try giving the model a scratch pad tool before calling submitPlan? intermediate reasoning before committing is usually the unlock for counting problems.
I didn’t try it, but this is actually an excellent idea.
I could simply expose a checkPlan tool that returns a short response about a draft plan.
But one concern comes to mind: for easy levels it could be fast and perfectly fine, but as the difficulty increases, the model might call this tool too often. The reasoning would take more time and burn more tokens, which is exactly what I wanted to avoid with the one-shot planning design.
Still, I could allow it to call checkPlan only once or twice before submitPlan, which could improve the success rate while keeping the experiment controlled.
Thanks for the amazing idea!
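One way the capped checkPlan idea might look, as a hedged Python sketch rather than the game's real WebMCP tools: the PlanSession class, the response fields, and the injected reaches_goal evaluator are all hypothetical.

```python
from typing import Callable, Sequence


class PlanSession:
    """Allows a limited number of dry-run checks before the final submit."""

    def __init__(
        self,
        reaches_goal: Callable[[Sequence[str]], bool],  # hypothetical plan evaluator
        remaining_moves: int,
        max_checks: int = 2,
    ):
        self._reaches_goal = reaches_goal
        self.remaining_moves = remaining_moves
        self.checks_left = max_checks

    def check_plan(self, actions: Sequence[str]) -> dict:
        """Dry run: report on a draft plan without committing to it."""
        if self.checks_left == 0:
            return {"ok": False, "error": "check limit reached, call submitPlan"}
        self.checks_left -= 1
        within_budget = len(actions) <= self.remaining_moves
        return {
            "ok": True,
            "movesUsed": len(actions),
            "withinBudget": within_budget,
            "reachesGoal": within_budget and self._reaches_goal(actions),
            "checksLeft": self.checks_left,
        }

    def submit_plan(self, actions: Sequence[str]) -> bool:
        """Final answer: succeeds only if the plan fits the budget and reaches the goal."""
        return len(actions) <= self.remaining_moves and self._reaches_goal(actions)


# Toy evaluator: the goal is exactly three steps to the right.
session = PlanSession(lambda a: list(a) == ["right"] * 3, remaining_moves=5)
print(session.check_plan(["right", "right"]))
```

Capping the checks keeps token use bounded, so the run stays close to the one-shot design while still giving the model a chance to notice an incomplete plan.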