DEV Community

Harish Kotra (he/him)

Building a Multi-Agent Sudoku Arena in Node.js

This post walks through a real project: a multi-provider AI Sudoku system where each model acts as an independent agent and competes under the same constraints.

If you care about AI reliability, this project is a practical pattern: never trust model output directly, always validate, and design orchestration to survive bad responses.

Why Sudoku?

Sudoku is a great benchmark for agent behavior because:

  • rules are strict and deterministic
  • outputs are easy to validate
  • hallucinations are immediately observable
  • step-by-step progress can be visualized cleanly

That makes it ideal for comparing local and cloud LLM behavior under identical prompt and runtime conditions.

What We Built

  • A modular Node.js app with four providers:
    • OpenAI
    • Ollama
    • LM Studio
    • Featherless (OpenAI-compatible)
  • A shared solve(board, mode) contract for all agents.
  • A robust Sudoku validation core.
  • A live web UI with side-by-side providers.
  • Counters for invalid moves and timeouts.

System Design

Folder Layout

```
agents/   # provider implementations
core/     # sudoku logic + orchestration
utils/    # json, timing, formatting
web/      # frontend UI
server.js # HTTP + SSE backend
index.js  # CLI entry
```

Core Interface: Agent Contract

Every provider implements the same shape, making orchestration provider-agnostic.

```javascript
class SomeProviderAgent {
  constructor(options) {
    this.name = "ProviderName";
    this.options = options;
  }

  async solve(board, mode = "full") {
    // return strict JSON data
  }
}
```

Modes:

  • full -> { solution: [[...9x9]] }
  • step -> { row, col, value }
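As a concrete illustration, here is a minimal sketch of what one agent might look like against an OpenAI-compatible chat completions endpoint. Class, prompt wording, and option names are illustrative, not the project's actual code:

```javascript
// Illustrative agent targeting an OpenAI-compatible endpoint.
class OpenAICompatibleAgent {
  constructor(options = {}) {
    this.name = options.name || "OpenAICompatible";
    this.options = options;
  }

  // Build a prompt that demands strict JSON in the requested mode.
  buildPrompt(board, mode) {
    const task = mode === "full"
      ? 'Return ONLY {"solution": [[...9x9 grid...]]}'
      : 'Return ONLY {"row": r, "col": c, "value": v}';
    return `Solve this Sudoku.\n${JSON.stringify(board)}\n${task}`;
  }

  async solve(board, mode = "full") {
    const res = await fetch(`${this.options.baseUrl}/v1/chat/completions`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: this.options.model,
        messages: [{ role: "user", content: this.buildPrompt(board, mode) }],
      }),
    });
    const data = await res.json();
    return data.choices[0].message.content; // still untrusted; validated downstream
  }
}
```

Because every provider exposes the same `solve(board, mode)` surface, the orchestrator never needs to know which backend it is talking to.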

Defensive Output Handling

Model outputs are treated as untrusted data.

```javascript
if (!text.startsWith("{") || !text.endsWith("}")) {
  return { ok: false, error: "Response is not strict JSON object text." };
}
```

Even valid JSON is still validated semantically against Sudoku rules.
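A minimal sketch of the full defensive chain, syntactic check first, then `JSON.parse` behind a try/catch (the helper name is illustrative, not the project's actual utility):

```javascript
// Treat raw model output as untrusted: check shape, then parse defensively.
function parseStrictJson(raw) {
  const text = String(raw).trim();
  if (!text.startsWith("{") || !text.endsWith("}")) {
    return { ok: false, error: "Response is not strict JSON object text." };
  }
  try {
    return { ok: true, data: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: `Invalid JSON: ${err.message}` };
  }
}
```

Returning `{ ok, data | error }` instead of throwing keeps bad responses as ordinary values the orchestrator can count.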

Sudoku Validation Strategy

The validator enforces:

  • board shape (9x9, integer bounds)
  • no duplicate values in rows/columns/3x3 boxes
  • move legality
  • clue preservation
  • solved-state completeness

This guarantees a model cannot "win" by returning formatted but invalid answers.
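The duplicate checks at the heart of that list can be sketched like this, with zeros denoting empty cells (function names are illustrative):

```javascript
// True if any non-zero value repeats within a unit (row, column, or box).
function hasDuplicates(cells) {
  const seen = new Set();
  for (const v of cells) {
    if (v === 0) continue; // empty cells never conflict
    if (seen.has(v)) return true;
    seen.add(v);
  }
  return false;
}

// Check all 27 units of a 9x9 board.
function isBoardConsistent(board) {
  for (let i = 0; i < 9; i++) {
    const row = board[i];
    const col = board.map((r) => r[i]);
    if (hasDuplicates(row) || hasDuplicates(col)) return false;
  }
  for (let br = 0; br < 9; br += 3) {
    for (let bc = 0; bc < 9; bc += 3) {
      const box = [];
      for (let r = br; r < br + 3; r++)
        for (let c = bc; c < bc + 3; c++) box.push(board[r][c]);
      if (hasDuplicates(box)) return false;
    }
  }
  return true;
}
```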

Orchestrator Behavior: Resilience Over Fragility

An earlier version stopped a run on the first invalid move. We changed that for better observability and robustness.

Current behavior:

  • invalid move -> increment invalidMoveCount, continue
  • timeout -> increment timeoutCount, retry, continue until threshold
  • step with no valid move -> emit step_skipped, continue
  • solve success -> finish as solved

Pseudo-flow:

```
for each step:
  for each retry attempt:
    response = await agent.solve(board, "step")
    if invalid:
      invalidMoveCount++
      continue
    if timeout:
      timeoutCount++
      continue
    apply move
    emit move
    if solved: finish
  if no valid move in step:
    emit step_skipped
    continue
```
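The same loop as a runnable sketch, assuming `agent.solve` rejects on timeout and a legality check is injected (all names illustrative, not the project's actual orchestrator):

```javascript
// Run one step: retry up to maxRetries, counting failures instead of crashing.
async function runStep(agent, board, stats, isLegalMove, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    let move;
    try {
      move = await agent.solve(board, "step");
    } catch (err) {
      stats.timeoutCount++; // timeouts surface as rejections here
      continue;
    }
    if (!move || !isLegalMove(board, move)) {
      stats.invalidMoveCount++; // measurable event, not a hard crash
      continue;
    }
    board[move.row][move.col] = move.value;
    return { type: "move", move };
  }
  return { type: "step_skipped" };
}
```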

Why SSE for Real-Time Updates?

Server-Sent Events (SSE) give one-way streaming (server -> client), which is all this use case needs and is simpler than WebSockets.

```javascript
res.writeHead(200, {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
  Connection: "keep-alive",
});
```

Each event carries live stats, so the UI never needs hidden state from the backend.
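A sketch of how an event might be written to the stream with the stats attached, following the `data: <payload>\n\n` framing SSE requires (the helper and field names are illustrative):

```javascript
// Serialize one event onto an open text/event-stream response.
function sendEvent(res, type, payload, stats) {
  const event = { type, ...payload, stats };
  res.write(`data: ${JSON.stringify(event)}\n\n`); // blank line ends the event
}
```

On the client, a plain `EventSource` subscription plus `JSON.parse` on `event.data` is enough to keep the boards and counters current.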

UI Design Decisions

  • Split providers into two rows:
    • Local models (Ollama, LM Studio)
    • Third-party models (OpenAI, Featherless)
  • Two columns in each row for quick comparison.
  • Per-provider model configuration:
    • local: auto-detected model dropdown
    • cloud: manual model entry
  • Per-provider timeout input to address local model latency variability.

Local Model Discovery

We added provider-specific discovery endpoints:

  • Ollama: GET /api/tags
  • LM Studio: GET /v1/models

The frontend can refresh model lists without restarting the server.
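Since the two endpoints return differently shaped payloads, a small normalizer helps. The shapes assumed here match Ollama's `/api/tags` (`{ models: [{ name }] }`) and the OpenAI-style `/v1/models` (`{ data: [{ id }] }`) used by LM Studio; the function names and default ports are illustrative:

```javascript
// Flatten either discovery response into a plain list of model names.
function extractModelNames(providerId, body) {
  if (providerId === "ollama") {
    return (body.models || []).map((m) => m.name);
  }
  return (body.data || []).map((m) => m.id);
}

// Hypothetical fetch wrapper using the providers' default local ports.
async function listModels(providerId) {
  const url = providerId === "ollama"
    ? "http://localhost:11434/api/tags"
    : "http://localhost:1234/v1/models";
  const res = await fetch(url);
  return extractModelNames(providerId, await res.json());
}
```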

Timeout Lessons

Local models can be slow to produce a first token, especially while a heavy model is still loading. A single global timeout is usually wrong.

What worked better:

  • per-provider timeout control in UI
  • higher defaults for local providers (>= 180000ms)
  • retryable timeout policy + timeout counters
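One way to make a per-provider timeout retryable is to race the solve call against a timer, so a slow response becomes an ordinary rejection the orchestrator can count and retry. A minimal sketch (names illustrative):

```javascript
// Race a promise against a per-provider timeout; always clear the timer.
function withTimeout(promise, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Timed out after ${timeoutMs}ms`)),
      timeoutMs
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Wrapping `agent.solve(...)` in `withTimeout(..., provider.timeoutMs)` lets each provider keep its own budget while the retry policy stays shared.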

Example Run Start Payload

```json
{
  "providerId": "ollama",
  "model": "gemma4:latest",
  "timeoutMs": 180000
}
```

Contribution Opportunities

If you want to extend this project, here are high-impact additions:

  1. Add a baseline deterministic solver and compare LLM deviation.
  2. Add puzzle packs and ELO-style provider rating.
  3. Add persistent run history (SQLite + charting).
  4. Add tests for orchestrator edge cases.
  5. Add CI + linting + type checks.
  6. Add a WebSocket mode and richer live metrics.

Key Takeaways

  • Standard contracts unlock multi-provider experimentation.
  • Validation is non-negotiable when models are in the loop.
  • Reliability improves when invalid outputs become measurable events, not hard crashes.
  • Observability (attempts, invalids, timeouts) is as important as final correctness.


If you build a similar system for another constrained task (SQL generation, code transforms, schema mapping), this architecture transfers almost directly.

GitHub: https://github.com/harishkotra/agentoku
