Ayush Kumar Anand


Breaking the "Pattern": How We Built (and tried to scale) Strategic AI for Sweep

In the world of online card games, players aren't just looking for a challenge; they’re looking for a soul.
At my company, our game Sweep had a long-standing "bot problem." Our computer opponents were built on a classic rule-based system. While they followed the mechanics perfectly, they were fundamentally predictable. Experienced players quickly identified their repetitive move patterns, effectively turning a game of high-stakes strategy into a solved puzzle.
We knew we had to evolve. We needed bots that could reason, bluff, and adapt like humans. This is the story of how we integrated locally hosted LLMs to build a "human-level" experience, and the hard infrastructure lessons we learned along the way.

The Proof of Concept: Beyond "If/Then" Logic

Our goal was to move beyond static rules to a system that kept deterministic move generation but leveraged the pattern-matching power of Large Language Models for the final strategic choice.

The Hybrid "Move-Index" Approach

Instead of letting the LLM generate raw text (which is slow and hard to parse), we used it as a Pattern-Matching Engine. Here was our workflow:

  1. Generation: Our existing rule-based system would generate every possible valid move for the current game state.
  2. Rating: These moves were rated based on basic metrics (points, defense, etc.).
  3. Selection: We fed the board state and the list of rated moves into the LLM.
  4. Indexing: We asked the LLM to return only the index number of its chosen move.

This "index-only" response drastically reduced processing complexity and token overhead, allowing us to focus the AI's power solely on strategic choice rather than language generation.

Infrastructure Realities: Parameters and Performance

When we moved from theory to locally hosted models (using Ollama), we hit our first major wall: the "Parameters vs. Latency" trade-off.

At one point, we naively tried to route every bot move through the LLM. The GPU queue exploded, latency spiked, and we had to roll back the feature within hours.

  • Small Models (Under 3B parameters): These models were blazing fast, but they were "playing dumb." They missed obvious partner synergies and failed to recognize long-term threats.
  • Large Models (Over 13B parameters): These were strategic masters, but they were too slow. By the time they finished "reading" the prompt and processing the board state, the player experience had already stalled.

To standardize our AI's "personality," we used Ollama Modelfiles to save specific system prompts. This ensured every bot instance had the same strategic baseline without us having to re-send huge prompt blocks for every turn.
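For reference, a Modelfile along these lines does the job; the base model, prompt text, and bot name here are placeholders rather than our exact production setup:

```
FROM llama3
SYSTEM """You are a strategist for the card game Sweep. You will receive a board state and a numbered list of rated candidate moves. Reply with only the index number of the move you choose."""
PARAMETER temperature 0.3
```

Building it once with `ollama create sweep-bot -f Modelfile` means every request to the `sweep-bot` model already carries the system prompt, so the per-turn payload stays small.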

Scaling with "Strategic Triage"

We technically succeeded in creating human-level bots, but we couldn't scale. Our infrastructure simply couldn't handle thousands of bots trying to hit a local GPU at the same time.

The solution was Strategic Triage.

Instead of giving every bot an LLM "brain" for every turn, we categorized moves (a rough sketch of the routing follows the list):

  1. The "Obvious" Tier: Simple captures were still handled by the fast rule-based system.
  2. The "Critical" Tier: In 4-player games where Partner Synergy was vital, we activated the LLM.
  3. Probabilistic Shedding: We only used the LLM brain for a percentage of the total bot pool. This "triage" helped us scale while still making the overall bot population feel more intelligent and unpredictable.
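A sketch of that routing logic, where the game-state flags, the eligibility check, and the 30% pool fraction are illustrative placeholders, and choose_move is the index-picking helper from the earlier sketch:

```python
import random

LLM_POOL_FRACTION = 0.3  # illustrative: share of the bot pool that gets an LLM brain

def select_move(bot, game_state, valid_moves):
    """Route one bot turn to the cheap rule engine or to the LLM."""
    # Tier 1: "obvious" turns (single legal move, simple capture) stay rule-based.
    if len(valid_moves) == 1 or game_state["simple_capture_available"]:
        return max(valid_moves, key=lambda m: m["score"])

    # Tier 2: critical turns, i.e. 4-player games where partner synergy matters.
    critical = game_state["player_count"] == 4 and game_state["partner_synergy_matters"]

    # Tier 3: probabilistic shedding, only a slice of bots ever hits the GPU.
    llm_enabled = bot["llm_eligible"] and random.random() < LLM_POOL_FRACTION

    if critical and llm_enabled:
        return choose_move(game_state["board"], valid_moves)  # LLM index pick

    # Everything else falls back to the rating already computed by the rule engine.
    return max(valid_moves, key=lambda m: m["score"])
```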

Humanizing the Bot: The Temperature Lever

To prevent the bots from becoming too perfect (and thus, boring), we used the Temperature parameter to simulate different human mindsets; a short snippet showing how these bands map to request options follows the list.

  • Low Temperature (0.1–0.3): Created "Conservative" bots that played strictly by the book—perfect for professional-level rooms.
  • High Temperature (0.7–0.9): Created "Risk-Takers." These bots made aggressive, sometimes "erroneous" moves that felt exactly like a human player trying a bold bluff.
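In Ollama terms this is just the temperature field in the request options (or a PARAMETER line in the Modelfile). The persona names and exact values below are illustrative:

```python
# Illustrative persona-to-temperature mapping; values mirror the bands above.
PERSONAS = {
    "conservative": 0.2,  # plays by the book, professional-level rooms
    "risk_taker": 0.8,    # bold, occasionally "erroneous" bluff-like moves
}

def ollama_options(persona):
    """Build the Ollama request options for a given bot persona."""
    return {"temperature": PERSONAS[persona]}

# Example: merged into the /api/generate payload from the earlier sketch.
payload_extra = {"options": ollama_options("risk_taker")}
```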

Conclusion: The Path Forward

The PoC proved that LLMs can indeed break the repetitive patterns of rule-based bots. However, the infrastructure cost of locally hosting high-parameter models is the current frontier. By combining Strategic Triage with index-only responses, we've found a middle ground: bots that play like people, without the server-melting overhead.
