DEV Community

Harish Kotra (he/him)
Building "Pixel Raiders": An LLM-Powered Multi-Agent Arena with Ollama

Local LLMs are getting smaller and faster every day. But how do they perform when tasked with real-time spatial decision-making? To find out, we built Pixel Raiders—a grid-based simulation where agents powered by models like Gemma 3, Llama 3.2, and Qwen hunt for treasure while dodging deadly traps.

In this post, we’ll dive into the technical challenges of orchestrating multiple local models in a single simulation loop.

The Architecture: A Streamlit-Ollama Marriage

We chose Streamlit for its rapid UI prototyping and Ollama for its rock-solid local LLM hosting. The simulation runs on a pure Python engine, using PIL (Pillow) to dynamically render the 2D pixel world.

1. Minimalist State Representation

Local models have limited context windows and reasoning capacity compared to giants like GPT-4. To make them effective raiders, we had to compress the entire world state into a single, punchy prompt:

```python
prompt = f"Goal: {treasure}. Traps: {traps}. Position: {pos}. Grid: 8x8. Next Move?"
```
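A minimal sketch of how that compression might be wrapped up as a helper (the field names `treasure`, `traps`, and `pos` are illustrative, not the project's exact API):

```python
# Flatten the full world state into a one-line prompt the model can digest.
# Field names here are assumptions for illustration, not the project's real API.
def build_prompt(treasure, traps, pos, grid_size=8):
    """Compress the entire world state into a single short prompt."""
    return (
        f"Goal: {treasure}. Traps: {traps}. "
        f"Position: {pos}. Grid: {grid_size}x{grid_size}. Next Move?"
    )

prompt = build_prompt(treasure=(5, 5), traps=[(2, 3), (6, 1)], pos=(0, 0))
```

Keeping the prompt to one line means even a 3B model spends its limited context on the decision, not on parsing a verbose scene description.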
2. The Bottleneck: VRAM Model Swapping

The biggest challenge was VRAM. When you assign Agent 1 to Qwen and Agent 2 to Llama, your GPU has to swap these models out of memory every few seconds. On consumer hardware, this takes 10-20 seconds per swap.

The Solution: We implemented a robust 60-second timeout and a "Thinking..." status in the UI. This informs the user that the system hasn't hung; it’s just the GPU doing the heavy lifting of reloading weights.
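One way to sketch that timeout guard is to run the model call on a worker thread and bail out after the deadline; `query_model` below is a stand-in for the real Ollama call, and the 60-second default mirrors the timeout described above:

```python
# Sketch of the timeout guard: run the (possibly slow) model call on a worker
# thread so the UI thread stays free to show a "Thinking..." status.
# query_model is a stand-in for the actual Ollama request.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

def ask_with_timeout(query_model, prompt, timeout_s=60):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(query_model, prompt)
        try:
            return future.result(timeout=timeout_s)
        except FuturesTimeout:
            return None  # Treat a hung swap as "no move this turn"

move = ask_with_timeout(lambda p: "UP", "Goal: ...")  # → "UP"
```

Returning `None` instead of raising lets the game loop simply skip the agent's turn while the GPU finishes reloading weights.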

3. Fighting LLM Chatter with Regex

Small models love to talk. Ask them for "UP", and they’ll give you "I will move UP because the goal is at [5,5]". This breaks deterministic movement logic.

We solved this with a three-pronged strategy:

  1. System Role Enforcement: Telling the AI it's a "Grid Move Generator" and forbidding any dialogue.
  2. Tight Token Limits: Setting max_tokens=15 to shut the door before they start rambling.
  3. Regex Extraction: Using \b(UP|DOWN|LEFT|RIGHT)\b to find the move wherever it hides.
```python
import re

match = re.search(r"\b(UP|DOWN|LEFT|RIGHT)\b", raw_text)
if match:
    move = match.group(1)
```
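A defensive version of that extraction might add case-insensitivity and a fallback move so a fully off-script reply never crashes the loop (the `"STAY"` default is an assumption for illustration, not necessarily the project's behavior):

```python
# Pull the first valid move out of a rambling LLM reply; fall back to a
# no-op move ("STAY" is an assumed default) rather than raising.
import re

def extract_move(raw_text, default="STAY"):
    """Extract the first UP/DOWN/LEFT/RIGHT token, case-insensitively."""
    match = re.search(r"\b(UP|DOWN|LEFT|RIGHT)\b", raw_text.upper())
    return match.group(1) if match else default

extract_move("I will move UP because the goal is at [5,5]")  # → "UP"
extract_move("Let me think about this...")                   # → "STAY"
```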

The "Knockout" Logic

The game became truly exciting once we added permanent consequences. Using an active state flag, we ensure that as soon as an LLM walks its agent into a trap, it's eliminated. The UI reflects this with a semi-transparent card decay.
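The active-state flag might look something like this minimal sketch (the `Agent` fields and `apply_move` helper are assumptions, not the project's actual classes):

```python
# Sketch of the knockout rule: a per-agent `active` flag that flips
# permanently the moment the agent steps on a trap. Names are illustrative.
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    pos: tuple
    active: bool = True

def apply_move(agent, new_pos, traps):
    """Move the agent; stepping on a trap eliminates it permanently."""
    if not agent.active:
        return  # Eliminated agents never move again
    agent.pos = new_pos
    if new_pos in traps:
        agent.active = False  # Knocked out; the UI renders a faded card

a = Agent("Qwen", (0, 0))
apply_move(a, (2, 3), traps={(2, 3)})  # a.active is now False
```

Because the flag is checked at the top of every turn, an eliminated agent stays frozen on the board for the rest of the match.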


Lessons Learned

  • Bigger isn't always better: Qwen 4B often responded faster and followed instructions better than larger models in this specific spatial task.
  • Latency is UX: Without clear terminal logs and UI "Thinking" indicators, users will think the app is broken during model swaps.
  • Local is Private: The beauty of this app is that everything, including the "brains", runs entirely on your local machine.

Pixel Raiders proves that even with 3B-parameter models, we can build complex, autonomous agents if we give them the right instructions and a resilient parsing engine.

Follow the project on GitHub: Pixel Raiders
