This is a submission for the GitHub Copilot CLI Challenge
What I Built
I built minecraft-mcp-server: a local MCP server that connects AI agents to Minecraft through Mineflayer, designed specifically to test and stress 3D spatial reasoning in modern LLMs.
Instead of generic “chat with a game” behavior, this project exposes explicit, typed tools that force grounded spatial decisions in a real 3D world:
- Creative mode tools for structure generation and world manipulation (`setblock`, `fill`, `clone_area`, `fly_to`, `teleport_to`, `set_time`, `set_weather`, etc.)
- Survival mode tools for embodied task execution (`go_to`, `dig_block`, `place_block`, `collect_block`, `craft_item`, `equip_item`)
- Config-driven local runtime (`MC_MODE=creative|survival`) so agents can run immediately without hand-configuring connection parameters
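To illustrate how mode-gated tooling can work, here is a minimal sketch of selecting a tool set from `MC_MODE`. The tool names come from the list above; the wiring (`toolsForMode`, the default mode) is illustrative and not the repo's actual code.

```typescript
// Hypothetical sketch: choose which MCP tools to expose based on MC_MODE.
// Tool names are from the post; everything else is an illustrative assumption.
const CREATIVE_TOOLS = [
  "setblock", "fill", "clone_area", "fly_to",
  "teleport_to", "set_time", "set_weather",
];
const SURVIVAL_TOOLS = [
  "go_to", "dig_block", "place_block",
  "collect_block", "craft_item", "equip_item",
];

function toolsForMode(mode: string): string[] {
  // MC_MODE=creative|survival selects the exposed tool set.
  if (mode === "creative") return CREATIVE_TOOLS;
  if (mode === "survival") return SURVIVAL_TOOLS;
  throw new Error(`MC_MODE must be "creative" or "survival", got "${mode}"`);
}

// Fall back to creative mode when the variable is unset (an assumed default).
const tools = toolsForMode(process.env.MC_MODE ?? "creative");
```

Failing fast on an unknown mode keeps misconfigured agent runs from silently exposing the wrong tool set.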
I also focused heavily on reliability for agent evaluation:
- `fly_to` now uses a robust fallback chain (direct flight → arc flight → teleport fallback)
- Command tools return explicit confirmation metadata (`executed`, `category`, `timedOut`) so runs can be analyzed as true success/failure, not just "best effort"
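The fallback-chain pattern can be sketched as follows. This is a simplified, hypothetical version assuming each strategy is an async attempt that either succeeds, fails, or throws; the real Mineflayer-based implementation differs, and only the `executed`/`category`/`timedOut` result fields are taken from the post.

```typescript
// Hypothetical sketch of a movement fallback chain with explicit result metadata.
// Strategy names and function shapes are illustrative assumptions.
type MoveResult = { executed: boolean; category: string; timedOut: boolean };

async function tryStrategy(
  name: string,
  attempt: () => Promise<boolean>,
): Promise<MoveResult | null> {
  try {
    const ok = await attempt();
    // Report which strategy actually produced the movement.
    return ok ? { executed: true, category: name, timedOut: false } : null;
  } catch {
    return null; // swallow the error and let the next strategy run
  }
}

async function flyTo(
  strategies: Array<[string, () => Promise<boolean>]>,
): Promise<MoveResult> {
  for (const [name, attempt] of strategies) {
    const result = await tryStrategy(name, attempt);
    if (result) return result;
  }
  // Nothing succeeded: return an explicit failure instead of "best effort".
  return { executed: false, category: "none", timedOut: true };
}
```

Returning structured metadata rather than a bare success flag is what makes runs analyzable afterwards: an evaluator can distinguish "flew directly" from "fell back to teleport" from "failed entirely".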
To me, this is a practical testbed for the next generation of LLMs: not just language fluency, but spatial planning, coordinate reasoning, and grounded action feedback loops.
Demo
Project Repo: https://github.com/risnake/minecraft-mcp-server
Images
Reference images


- OpenAI GPT 5.3 Codex
- Anthropic Claude Opus 4.6
Key takeaways
- Opus 4.6 showed noticeably stronger spatial understanding than the latest GPT 5.3 Codex model.
- Opus often reproduced tiny details it observed in the reference image, which GPT frequently missed.
My Experience with GitHub Copilot CLI
GitHub Copilot CLI felt like an orchestration layer for shipping an agentic system quickly: I used it to research APIs, generate implementation plans, dispatch parallel coding/research passes, and iterate on reliability bugs without losing momentum.
The biggest win was speed + structure: I could move from idea to working MCP server with mode-aware tooling, then harden it through targeted debugging (runtime interop issues, flight edge cases, command acknowledgment integrity) in a tight loop.
Most importantly, Copilot CLI helped turn a broad concept (“LLMs in Minecraft”) into a focused experiment platform for evaluating embodied 3D reasoning—which is exactly the frontier I care about.