Harish Kotra (he/him)
Agentoku V2: From Step-by-Step Sudoku Racing to One-Shot Full Solve

Yesterday’s v1 build proved the core concept: multiple LLM providers can compete on the same Sudoku board with strict validation and real-time observability.

Today’s v2 upgrade extends that system with a different benchmark mode: single-call one-shot solving.

This post focuses on what changed from v1, why it matters, and how to apply the same design pattern in other AI systems.

V1 recap (baseline)

V1 included:

  • multi-provider step-by-step solving
  • standardized provider interface (solve(board, mode))
  • strict JSON parsing and Sudoku validation
  • SSE-powered live UI with retries, invalid move tracking, and timeout tracking

This made model behavior visible, but also introduced repeated model calls and repeated prompt overhead for each move.
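For context, the standardized interface means every provider exposes the same `solve(board, mode)` entry point. A minimal sketch of what that contract might look like — class names and everything besides `solve(board, mode)` are illustrative, not the repo's actual code:

```javascript
// Illustrative base class for a standardized provider interface.
// Only the solve(board, mode) signature comes from the post; the rest is assumed.
class SudokuAgent {
  constructor(model) {
    this.model = model;
  }

  // mode: "step" returns one move; "full" returns a complete board
  async solve(board, mode) {
    throw new Error("solve() must be implemented by a provider subclass");
  }
}

// A stub provider used here only to demonstrate the contract.
class EchoAgent extends SudokuAgent {
  async solve(board, mode) {
    if (mode === "full") {
      // a real provider would call its LLM API here
      return { solution: board };
    }
    return { row: 0, col: 0, value: 1 };
  }
}
```

Because every provider honors the same signature, the race loop (and later the one-shot endpoint) can stay provider-agnostic.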

Why V2 was needed

For benchmarking inference efficiency and cost, we needed:

  1. one request per full puzzle (instead of one request per move)
  2. lower prompt token usage
  3. providers usable without a hard dependency on env keys set at startup

V2 key additions

1) One-Shot page (/one-shot)

A dedicated page where the user:

  • picks a provider
  • selects/enters model
  • sets timeout
  • clicks one button to solve the full board in a single call

This is intentionally simpler than the race UI: one board in, one board out.
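As a sketch, the page's solve button might assemble its request like this — the field names (`provider`, `model`, `timeoutMs`, `apiKey`, `board`) are my assumptions, not the repo's exact schema:

```javascript
// Hypothetical helper that builds fetch options for POST /api/solve-once.
// Field names are assumptions based on the post, not the actual API schema.
function buildSolveOnceRequest({ provider, model, timeoutMs, apiKey, board }) {
  const payload = { provider, model, timeoutMs, board };
  if (apiKey) payload.apiKey = apiKey; // optional runtime key
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// usage: await fetch("/api/solve-once", buildSolveOnceRequest({ ... }))
```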

2) New API endpoint: POST /api/solve-once

The backend now supports full-board one-shot requests.

High-level flow:

  1. resolve provider + model + timeout (+ optional runtime API key)
  2. call agent.solve(board, "full") exactly once
  3. validate returned board
  4. return status (solved, invalid, timeout, failed) + latency

3) Runtime API key input for OpenAI/Featherless

In v1/v1.5, cloud providers could appear disabled when env keys were missing.

V2 change:

  • OpenAI and Featherless are selectable
  • one-shot UI accepts runtime API key input
  • request can include apiKey
  • backend falls back to env key if runtime key not provided

This makes testing easier across environments without editing .env every time.
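The fallback can be sketched as a tiny resolver. The env variable names below are assumptions for illustration:

```javascript
// Hypothetical key resolution: a runtime key from the request wins,
// otherwise fall back to the environment. Env var names are assumptions.
function resolveApiKey(provider, runtimeKey) {
  if (runtimeKey && runtimeKey.trim()) return runtimeKey.trim();
  const envVar = provider === "openai" ? "OPENAI_API_KEY" : "FEATHERLESS_API_KEY";
  const envKey = process.env[envVar];
  if (!envKey) {
    throw new Error(`No API key for ${provider}: pass one in the request or set ${envVar}`);
  }
  return envKey;
}
```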

4) Prompt compaction for lower token usage

We replaced verbose full-solve instructions with a compact strict schema prompt.

V2 architecture


Core backend snippet (conceptual)

// One call, one timeout, strict validation.
const response = await withTimeout(() => agent.solve(puzzle, "full"), timeoutMs);
const validated = validateFullSolutionPayload(response, puzzle);

if (!validated.ok) {
  return { status: "invalid", reason: validated.reason };
}

return { status: "solved", solution: validated.solution };
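The snippet assumes a `withTimeout` helper. A minimal version — illustrative, the repo's implementation may differ — races the agent call against a timer:

```javascript
// Minimal withTimeout sketch: reject with a TimeoutError if the callback's
// promise does not settle within timeoutMs. Illustrative, not the repo's code.
function withTimeout(fn, timeoutMs) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(Object.assign(new Error(`timed out after ${timeoutMs}ms`), { name: "TimeoutError" }));
    }, timeoutMs);
    Promise.resolve()
      .then(fn)
      .then(
        (value) => { clearTimeout(timer); resolve(value); },
        (err) => { clearTimeout(timer); reject(err); }
      );
  });
}
```

Clearing the timer on settle keeps a fast solve from leaving a stray timeout pending.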

Cost-optimized prompt strategy (V2)

V1 prompt style was explicit but longer.
V2 uses a concise prompt preserving only required constraints + schema.

const fullSolvePrompt = [
  "Solve Sudoku. Strict JSON only.",
  "Rules: digits 1-9; each row/col/3x3 has 1-9 exactly once; never change non-zero clues.",
  'Return exactly: {"solution":[[9x9 integers]]}',
  "No markdown, no extra keys/text.",
  "Board:",
  safeStringify(board),
].join("\n");
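The prompt builder relies on `safeStringify`. A minimal version — the repo's may handle more cases — just compacts the board to one JSON line and never throws:

```javascript
// Minimal safeStringify sketch (illustrative): compact JSON, with a
// fallback so a serialization failure can't crash the prompt builder.
function safeStringify(board) {
  try {
    return JSON.stringify(board);
  } catch {
    return String(board);
  }
}
```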

Why this is cost-aware

  • Fewer instruction tokens per request
  • No repetitive step prompts
  • Better fit for one-shot evaluation experiments

Validation remains strict

Even with shorter prompting, we do not relax safety:

  • board shape must be valid 9x9
  • fixed clues must remain unchanged
  • board must satisfy Sudoku constraints
  • board must be fully solved

If any check fails, the result is marked invalid.
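The four checks above can be sketched as a single validator. This is an illustrative implementation, not the repo's `validateFullSolutionPayload`:

```javascript
// Illustrative full-solution validator mirroring the four checks above.
function validateSolution(solution, puzzle) {
  // 1) shape: a 9x9 grid of integers 1-9
  if (!Array.isArray(solution) || solution.length !== 9 ||
      solution.some((r) => !Array.isArray(r) || r.length !== 9 ||
        r.some((v) => !Number.isInteger(v) || v < 1 || v > 9))) {
    return { ok: false, reason: "bad shape or values" };
  }
  // 2) fixed clues (non-zero cells in the puzzle) must be unchanged
  for (let r = 0; r < 9; r++)
    for (let c = 0; c < 9; c++)
      if (puzzle[r][c] !== 0 && puzzle[r][c] !== solution[r][c])
        return { ok: false, reason: `clue changed at (${r},${c})` };
  // 3+4) every row, column, and 3x3 box holds 1-9 exactly once,
  // which also guarantees the board is fully solved
  for (let i = 0; i < 9; i++) {
    const row = new Set(), col = new Set(), box = new Set();
    for (let j = 0; j < 9; j++) {
      row.add(solution[i][j]);
      col.add(solution[j][i]);
      box.add(solution[3 * Math.floor(i / 3) + Math.floor(j / 3)][3 * (i % 3) + (j % 3)]);
    }
    if (row.size !== 9 || col.size !== 9 || box.size !== 9)
      return { ok: false, reason: "Sudoku constraint violated" };
  }
  return { ok: true };
}
```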

Observability in one-shot mode

One-shot UI exposes:

  • selected provider/model
  • timeout used
  • result status
  • latency
  • optional token/cost estimator panel

The estimator is intentionally approximate, but useful for quick tradeoff checks against step-based assumptions.
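A back-of-the-envelope estimator in that spirit — the ~4 characters per token heuristic is a common rough approximation, not the panel's actual method:

```javascript
// Very rough token estimate using the common ~4 chars/token heuristic.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Compare one one-shot prompt against repeating a step prompt per move.
function comparePromptCost(oneShotPrompt, stepPrompt, movesNeeded) {
  return {
    oneShot: estimateTokens(oneShotPrompt),
    stepTotal: estimateTokens(stepPrompt) * movesNeeded,
  };
}
```

Even this crude comparison makes the v1-vs-v2 tradeoff visible: a step prompt repeated for dozens of moves quickly dwarfs a single compact full-solve prompt.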

What this teaches (beyond Sudoku)

The v2 pattern is transferable to many AI workflows:

  • keep a stable provider abstraction
  • introduce alternate execution modes (step vs batch/one-shot)
  • optimize prompts per mode
  • keep strict validation unchanged
  • decouple cloud auth from startup env when practical

Suggested V3 expansions

  • persist one-shot vs step run comparisons
  • add provider/model auto-profiling over multiple puzzles
  • expose prompt presets (compact, strict, reasoning-heavy)
  • generate benchmark reports and trend charts

V1 gave us operational resilience.
V2 gives us cost-aware one-shot benchmarking while preserving correctness gates.

GitHub repo: https://github.com/harishkotra/agentoku
