Yesterday’s v1 build proved the core concept: multiple LLM providers can compete on the same Sudoku board with strict validation and real-time observability.
Today’s v2 upgrade extends that system with a different benchmark mode: single-call one-shot solving.
This post focuses on what changed from v1, why it matters, and how to apply the same design pattern in other AI systems.
V1 recap (baseline)
V1 included:
- multi-provider step-by-step solving
- standardized provider interface (solve(board, mode))
- strict JSON parsing and Sudoku validation
- SSE-powered live UI with retries, invalid move tracking, and timeout tracking
This made model behavior visible, but also introduced repeated model calls and repeated prompt overhead for each move.
Why V2 was needed
For benchmarking inference efficiency and cost, we needed:
- one request per full puzzle (instead of one request per move)
- lower prompt token usage
- provider usability without hard dependency on startup env keys
V2 key additions
1) One-Shot page (/one-shot)
A dedicated page where the user:
- picks a provider
- selects/enters model
- sets timeout
- clicks one button to solve the full board in one call
This is intentionally simpler than the race UI: one board in, one board out.
2) New API endpoint: POST /api/solve-once
The backend now supports full-board one-shot requests.
High-level flow:
- resolve provider + model + timeout (+ optional runtime API key)
- call agent.solve(board, "full") exactly once
- validate the returned board
- return status (solved, invalid, timeout, failed) + latency
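As a sketch, the request/response shapes for this endpoint might look like the TypeScript below. Only apiKey, the four status values, and latency reporting come from the post; the other field names are assumptions.

```typescript
// Hypothetical shapes for POST /api/solve-once. Field names other than
// apiKey and the status values are illustrative assumptions.
interface SolveOnceRequest {
  provider: string;   // e.g. "openai" or "featherless"
  model: string;
  timeoutMs: number;
  apiKey?: string;    // optional runtime key (see section 3)
  board: number[][];  // 9x9 grid, 0 = empty cell
}

interface SolveOnceResponse {
  status: "solved" | "invalid" | "timeout" | "failed";
  latencyMs: number;
  solution?: number[][]; // present only when status === "solved"
  reason?: string;       // present on "invalid" / "failed"
}
```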
3) Runtime API key input for OpenAI/Featherless
In v1/v1.5, cloud providers could appear disabled when env keys were missing.
V2 change:
- OpenAI and Featherless are selectable
- one-shot UI accepts runtime API key input
- request can include apiKey
- backend falls back to the env key if no runtime key is provided
This makes testing easier across environments without editing .env every time.
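The fallback logic can be sketched as a small helper; the function name and the env variable names here are assumptions, not the repo's actual identifiers.

```typescript
// Hypothetical key-resolution helper: prefer the runtime key sent with the
// request, otherwise fall back to the provider's env variable.
// Env variable names below are assumptions.
function resolveApiKey(provider: string, runtimeKey?: string): string | undefined {
  if (runtimeKey && runtimeKey.trim().length > 0) return runtimeKey;
  const envNames: Record<string, string> = {
    openai: "OPENAI_API_KEY",
    featherless: "FEATHERLESS_API_KEY",
  };
  const envName = envNames[provider];
  return envName ? process.env[envName] : undefined;
}
```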
4) Prompt compaction for lower token usage
We replaced verbose full-solve instructions with a compact strict schema prompt.
V2 architecture
Core backend snippet (conceptual)
// solve the full board with a single provider call, bounded by a timeout
const response = await withTimeout(() => agent.solve(puzzle, "full"), timeoutMs);
// strict validation: shape, preserved clues, Sudoku constraints, fully solved
const validated = validateFullSolutionPayload(response, puzzle);
if (!validated.ok) {
  return { status: "invalid", reason: validated.reason };
}
return { status: "solved", solution: validated.solution };
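The withTimeout helper used above is not shown in the post; a minimal sketch of one plausible implementation, racing the provider call against a timer, is:

```typescript
// Minimal sketch of a withTimeout helper (assumed shape): resolves with the
// provider call's result, or rejects with "timeout" once ms elapses.
function withTimeout<T>(fn: () => Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timeout")), ms);
    fn().then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}
```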
Cost-optimized prompt strategy (V2)
V1 prompt style was explicit but longer.
V2 uses a concise prompt preserving only required constraints + schema.
const prompt = [
  "Solve Sudoku. Strict JSON only.",
  "Rules: digits 1-9; each row/col/3x3 has 1-9 exactly once; never change non-zero clues.",
  'Return exactly: {"solution":[[9x9 integers]]}',
  "No markdown, no extra keys/text.",
  "Board:",
  safeStringify(board),
].join("\n");
Why this is cost-aware
- Fewer instruction tokens per request
- No repetitive step prompts
- Better fit for one-shot evaluation experiments
Validation remains strict
Even with shorter prompting, we do not relax safety:
- board shape must be valid 9x9
- fixed clues must remain unchanged
- board must satisfy Sudoku constraints
- board must be fully solved
If any check fails, the result is marked invalid.
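The four checks above can be sketched as a single validator. This mirrors what validateFullSolutionPayload in the earlier snippet would do, but the implementation details here are assumptions.

```typescript
// Sketch of the full-solution checks described above; details are assumed.
// puzzle uses 0 for empty cells; solution must be a fully solved 9x9 grid.
function validateFullSolution(
  solution: unknown,
  puzzle: number[][],
): { ok: true; solution: number[][] } | { ok: false; reason: string } {
  // 1) + 4) shape must be 9x9 and every cell a digit 1-9 (i.e. fully solved)
  if (!Array.isArray(solution) || solution.length !== 9) {
    return { ok: false, reason: "not a 9x9 grid" };
  }
  const grid = solution as number[][];
  for (const row of grid) {
    if (!Array.isArray(row) || row.length !== 9 ||
        row.some((v) => !Number.isInteger(v) || v < 1 || v > 9)) {
      return { ok: false, reason: "cells must be integers 1-9" };
    }
  }
  // 2) fixed clues must remain unchanged
  for (let r = 0; r < 9; r++) {
    for (let c = 0; c < 9; c++) {
      if (puzzle[r][c] !== 0 && grid[r][c] !== puzzle[r][c]) {
        return { ok: false, reason: "clue changed" };
      }
    }
  }
  // 3) each row, column, and 3x3 box must contain 1-9 exactly once
  for (let i = 0; i < 9; i++) {
    const row = new Set<number>(), col = new Set<number>(), box = new Set<number>();
    for (let j = 0; j < 9; j++) {
      row.add(grid[i][j]);
      col.add(grid[j][i]);
      box.add(grid[3 * Math.floor(i / 3) + Math.floor(j / 3)][3 * (i % 3) + (j % 3)]);
    }
    if (row.size !== 9 || col.size !== 9 || box.size !== 9) {
      return { ok: false, reason: "Sudoku constraint violated" };
    }
  }
  return { ok: true, solution: grid };
}
```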
Observability in one-shot mode
One-shot UI exposes:
- selected provider/model
- timeout used
- result status
- latency
- optional token/cost estimator panel
The estimator is intentionally approximate but useful for quick tradeoff testing against step-based assumptions.
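An approximate estimator in this spirit can be sketched in a few lines; the chars-per-token heuristic and pricing input below are illustrative assumptions, not the panel's actual logic.

```typescript
// Rough token/cost estimate: ~4 characters per token is a common heuristic
// for English-heavy prompts. Price is passed in per 1k tokens (assumption).
function estimateCost(
  promptText: string,
  pricePer1kTokens: number,
): { tokens: number; usd: number } {
  const tokens = Math.ceil(promptText.length / 4);
  return { tokens, usd: (tokens / 1000) * pricePer1kTokens };
}
```

This is deliberately crude, but for comparing a single one-shot request against dozens of per-move requests, order-of-magnitude estimates are enough.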
What this teaches (beyond Sudoku)
The v2 pattern is transferable to many AI workflows:
- keep a stable provider abstraction
- introduce alternate execution modes (step vs batch/one-shot)
- optimize prompts per mode
- keep strict validation unchanged
- decouple cloud auth from startup env when practical
Suggested V3 expansions
- persist one-shot vs step run comparisons
- add provider/model auto-profiling over multiple puzzles
- expose prompt presets (compact, strict, reasoning-heavy)
- generate benchmark reports and trend charts
V1 gave us operational resilience.
V2 gives us cost-aware one-shot benchmarking while preserving correctness gates.
GitHub repo: https://github.com/harishkotra/agentoku
