This post is Part 1 of a series on building an LLM-powered BattleTech bot.
👉 Part 2: Building an MCP Server and Agents in C# (coming soon)
What to Expect
Here’s how the series is structured:
Part 1. Theory & Architecture
- Introduction
- MakaMek Architecture Overview
- How LLMs Use Tools
Part 2. Hands-On Implementation
- Building an MCP Server to Query Game State
- Creating Agents with the Microsoft Agent Framework
- Empowering Agents with Tools
- Conclusion
Introduction
Nowadays there seems to be a tendency to solve almost every problem with a solution that involves AI agents. "There is not enough AI in this report", or "this proposal is great, but where are your AI agents?" are things I hear frequently. But are we really supposed to throw AI at every problem? I find the technology extremely useful for many use cases, but there are still plenty of cases where traditional automation is a perfectly valid option.
To illustrate this, I want to describe how, after building a rule-based bot for my pet project MakaMek (a computer implementation of the turn-based tactical wargame BattleTech), I decided to go one step further and create an LLM-powered bot as well.
Someone actually suggested “just use ChatGPT for the bot” when I asked for advice on the bot's strategy. I was quite skeptical about the idea: as I explained in my previous (now almost one-year-old) blog post, I do not believe that predicting the next token has anything to do with “true intelligence”. At the same time, I already had all the functions and scripts to evaluate the actual tactical situation on the board, coming from my rules-based implementation. So I wondered: if I provided that information, along with the available options, in a clear text format, could the model pick choices that actually make sense?
Working on this feature was a lot of fun, and it gave me valuable hands-on experience in building agents and connecting them with various tools, including a custom Model Context Protocol (MCP) server.
In this post, I cover some fundamentals of agentic systems and the journey behind my LLM-powered bot. I’ll explain the architecture that allowed me to add it without changing the main application, and demonstrate how to build an MCP server and AI agents, and how to wire them up with LLM providers and tools using .NET and C#.
MakaMek Architecture
MakaMek is my own computer version of BattleTech that I use as a playground for experimenting with new technology. The game is FOSS and available on GitHub.
It is a server–client application written in .NET. The server holds the authoritative game state, which can be changed by applying commands received from clients. A single client can host one or more players, including human players and bots. The server and clients communicate via a custom pub-sub–based command protocol that supports multiple transports, including reactive commands, pure WebSockets, and SignalR. Both server and client are UI-agnostic and can be hosted in any process.
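To make the command flow more tangible, here is a minimal sketch of what such a pub-sub command contract could look like. All names below are illustrative assumptions rather than the actual MakaMek types.

```csharp
// Illustrative sketch of a pub-sub command contract (names are assumptions,
// not the real MakaMek types). Clients publish commands; the server applies
// them to the authoritative game state and broadcasts resulting commands back.
public interface IGameCommand
{
    Guid GameId { get; }
    Guid PlayerId { get; }
}

public record MoveUnitCommand(Guid GameId, Guid PlayerId, Guid UnitId, int TargetHex) : IGameCommand;

// The transport behind this interface can be an in-process reactive stream,
// raw WebSockets, or SignalR.
public interface ICommandTransport
{
    Task PublishAsync(IGameCommand command, CancellationToken ct = default);
    IDisposable Subscribe(Action<IGameCommand> onCommand);
}
```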
Given this architecture, the obvious solution for an LLM bot was to implement it as a standalone, headless application hosting the game client and the bot logic. I was able to reuse the Bot class from the rules-based implementation. The bot itself is generic: it contains no decision logic and only observes the client game state. For decision-making, the bot relies on an IDecisionEngine interface where the actual logic is implemented. The rules-based bot has a dedicated engine per game phase. I took the same approach for the LLM-powered bot and introduced four llm* decision engines for the phases that require user input. These engines delegate the actual decision-making to an AI agent.
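To make the seam concrete, here is a rough sketch of a phase-specific LLM engine. IDecisionEngine is the real extension point in MakaMek, but the method signature shown here, along with the ClientGameState, IAgentClient, and GamePhase types, are illustrative assumptions.

```csharp
// Rough sketch of the decision seam. IDecisionEngine is the real extension
// point; the member shapes and helper types are illustrative only.
public interface IDecisionEngine
{
    Task<IGameCommand> DecideAsync(ClientGameState state, CancellationToken ct = default);
}

// LLM-backed engine for the movement phase: instead of evaluating rules
// locally, it forwards a high-level description of the state to the agent
// host and returns the chosen command.
public sealed class LlmMovementDecisionEngine : IDecisionEngine
{
    private readonly IAgentClient _agentClient; // thin client to the agent host (HTTP sketch below)

    public LlmMovementDecisionEngine(IAgentClient agentClient) => _agentClient = agentClient;

    public Task<IGameCommand> DecideAsync(ClientGameState state, CancellationToken ct = default)
        => _agentClient.RequestDecisionAsync(GamePhase.Movement, state, ct);
}
```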
For the AI agents, I decided to introduce a separate host application/service to improve flexibility and scalability. In my setup, the bot and the agent host are two independent applications packaged as Docker containers that communicate over HTTP. The bot (via the LLM decision engine) sends a request to the agent application containing a high-level description of the game state, including all units.
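The contract between the two containers could look roughly like this. The DTO shapes and the /decisions route are assumptions made for illustration, not the actual MakaMek wire format.

```csharp
using System.Net.Http.Json;
using System.Text.Json;

// Illustrative contract between the bot and the agent host (property names
// and the /decisions route are assumptions, not the actual wire format).
public record UnitSummary(string Name, string Owner, int HexPosition, int ArmorPercent);
public record AgentDecisionRequest(string Phase, string PlayerId, IReadOnlyList<UnitSummary> Units);
public record AgentDecisionResponse(string CommandType, JsonElement Payload);

public sealed class AgentHttpClient
{
    private readonly HttpClient _http;

    public AgentHttpClient(HttpClient http) => _http = http;

    public async Task<AgentDecisionResponse?> RequestDecisionAsync(
        AgentDecisionRequest request, CancellationToken ct = default)
    {
        // One POST per decision; all LLM work happens inside the agent host.
        using var response = await _http.PostAsJsonAsync("/decisions", request, ct);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadFromJsonAsync<AgentDecisionResponse>(cancellationToken: ct);
    }
}
```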
The agent application contains four phase-specific agents. A router redirects each request to the appropriate agent based on the current game phase. The agent converts the structured request into an LLM-friendly text prompt and invokes the model. The model produces a decision, which is then returned to the bot as an HTTP response.
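On the agent-host side, the router itself can stay trivial. This sketch reuses the illustrative DTOs from above and assumes a hypothetical IPhaseAgent abstraction; wiring up the actual agents with the Microsoft Agent Framework is covered in Part 2.

```csharp
// Hypothetical phase router: one agent per game phase, dispatched by name.
public interface IPhaseAgent
{
    Task<AgentDecisionResponse> DecideAsync(AgentDecisionRequest request, CancellationToken ct);
}

public sealed class PhaseRouter
{
    private readonly IReadOnlyDictionary<string, IPhaseAgent> _agents;

    public PhaseRouter(IReadOnlyDictionary<string, IPhaseAgent> agents) => _agents = agents;

    public Task<AgentDecisionResponse> RouteAsync(AgentDecisionRequest request, CancellationToken ct = default)
        => _agents.TryGetValue(request.Phase, out var agent)
            ? agent.DecideAsync(request, ct)
            : throw new InvalidOperationException($"No agent registered for phase '{request.Phase}'.");
}
```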
While this design provides a solid foundation, there are still two problems it does not address:
- Information about the units alone is often not enough. An LLM cannot truly “understand” a tactical situation from raw state data. It needs additional context, such as where units are allowed to move, which enemies pose a threat, hit probabilities, and similar factors.
- The agent is expected to return a well-structured, schema-compliant response that can be safely executed by the game.
To address both issues, we can equip our agents with tools:
- The bot application can run an MCP server exposing tools that provide tactical data by querying the game client.
- The agent can include helpers to validate and format responses according to the expected command schema (a minimal sketch follows this list).
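For the second point, a minimal validation helper might look like the sketch below. It reuses the illustrative AgentDecisionResponse shape from earlier; the command names are placeholders, not the real MakaMek schema.

```csharp
using System.Text.Json;

// Minimal sketch of response validation: the agent's raw text output must
// deserialize into a known command shape before it is sent back to the game.
// Command names and types are illustrative, not the real MakaMek schema.
public static class DecisionValidator
{
    private static readonly HashSet<string> KnownCommands =
        new(StringComparer.OrdinalIgnoreCase) { "Move", "Attack", "EndTurn" };

    public static bool TryParse(string llmOutput, out AgentDecisionResponse? decision)
    {
        decision = null;
        try
        {
            var parsed = JsonSerializer.Deserialize<AgentDecisionResponse>(llmOutput);
            if (parsed is null ||
                string.IsNullOrEmpty(parsed.CommandType) ||
                !KnownCommands.Contains(parsed.CommandType))
                return false;

            decision = parsed;
            return true;
        }
        catch (JsonException)
        {
            // Not valid JSON at all; the agent can retry, feeding the error
            // back to the model as part of the next prompt.
            return false;
        }
    }
}
```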
This leads to questions that still confuse many people, including experienced engineers: what tools actually are, what types of tools exist, and how agents and models actually use them.
How LLMs Use Tools
Let’s approach these questions one by one. So, what is a “tool”? The simple answer is: any custom code or script written in any programming language. It can be a local function running in the same process as the agent (a local tool), a CLI program exposed via a local MCP server, or a function available on a remote server through a REST API or MCP. Based on this definition, a tool can effectively do “anything”: provide additional data, perform calculations, or execute actions.
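As a small example of the first kind, a local tool can be nothing more than a C# method. The sketch below assumes the Microsoft.Extensions.AI abstractions (which the Microsoft Agent Framework builds on); the hit-probability logic is a made-up stand-in.

```csharp
using System.ComponentModel;
using Microsoft.Extensions.AI;

// A local tool is just a method in the agent's own process. The [Description]
// attributes become the tool metadata the model sees in its prompt.
public static class TacticalTools
{
    [Description("Estimates the probability of hitting a target at the given range in hexes.")]
    public static double GetHitProbability(
        [Description("Distance to the target in hexes.")] int rangeInHexes)
        // Stand-in logic; the real bot would query the game client instead.
        => rangeInHexes switch
        {
            <= 3 => 0.85,
            <= 9 => 0.60,
            _ => 0.30
        };
}

public static class ToolRegistration
{
    // Wrapping the method so an agent can expose it to a model as a tool.
    public static readonly AIFunction HitProbabilityTool =
        AIFunctionFactory.Create(TacticalTools.GetHitProbability);
}
```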
But how do LLMs call those tools? The short answer is: they don’t — at least not directly. An LLM is text-in, text-out; it is not capable of taking actions on its own. Instead, the model receives a list of available tools and their descriptions as part of the prompt and can respond with the name of the tool to use along with the required arguments. The actual execution is delegated to orchestration code, which we usually call an agent.
Here is a sample generic flow showing a scenario in which a model is provided with a list of tools of different types and “executes” them one by one:
The key takeaway is that every time a model decides to use a tool, it returns that decision to the agent. The agent executes the tool and then resubmits the original prompt together with the tool result back to the model so it can continue reasoning.
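Expressed as code, that loop looks roughly like the sketch below. Every type in it is deliberately simplified and hypothetical; the point is the shape of the flow, not any particular SDK.

```csharp
// Schematic tool-calling loop with simplified, hypothetical types.
public record ToolDefinition(string Name, string Description);
public record ToolCall(string Id, string Name, string ArgumentsJson);
public record ModelReply(string Text, ToolCall? ToolCall);

public record Message(string Role, string Content)
{
    public static Message User(string text) => new("user", text);
    public static Message Assistant(string text) => new("assistant", text);
    public static Message ToolResult(string callId, string result) => new("tool", $"{callId}: {result}");
}

// Text-in / text-out model endpoint: tools are described to it, never run by it.
public interface IModelClient
{
    Task<ModelReply> CompleteAsync(IReadOnlyList<Message> conversation, IReadOnlyList<ToolDefinition> tools);
}

// Executes local functions, MCP tools, or remote APIs on the model's behalf.
public interface IToolRunner
{
    Task<string> ExecuteAsync(string toolName, string argumentsJson);
}

public sealed class ToolCallingAgent
{
    private readonly IModelClient _model;
    private readonly IToolRunner _tools;

    public ToolCallingAgent(IModelClient model, IToolRunner tools) => (_model, _tools) = (model, tools);

    public async Task<string> RunAsync(string userPrompt, IReadOnlyList<ToolDefinition> toolDefinitions)
    {
        var conversation = new List<Message> { Message.User(userPrompt) };

        while (true)
        {
            // 1. The model sees the conversation plus the tool descriptions.
            ModelReply reply = await _model.CompleteAsync(conversation, toolDefinitions);

            // 2. A direct answer ends the loop.
            if (reply.ToolCall is null)
                return reply.Text;

            // 3. The agent (not the model) executes the requested tool...
            string result = await _tools.ExecuteAsync(reply.ToolCall.Name, reply.ToolCall.ArgumentsJson);

            // 4. ...and resubmits everything, including the tool result,
            //    which is why each tool call costs another model round-trip.
            conversation.Add(Message.Assistant(reply.Text));
            conversation.Add(Message.ToolResult(reply.ToolCall.Id, result));
        }
    }
}
```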
Each tool execution therefore requires another round-trip to the model, which means extra latency and additional token usage: the whole conversation, including earlier tool results, is resent with every round. In practice, this results in roughly double the tokens being spent for every additional tool call. If data is available upfront, it often makes sense to include it directly in the initial prompt instead of relying on tool calls, which add extra cost and delay.
This concludes the theoretical part of the series.
👉 In Part 2 (coming tomorrow, I hope), we’ll get more practical and build a remote MCP server using the C# MCP SDK, define agents with the Microsoft Agent Framework, and connect the agents to both local and cloud LLMs.
We’ll also pit the two bot implementations against each other. Curious who will win? Read Part 2 to the end.
Or maybe you already know the answer?😄 Let me know in the comments.

