DEV Community

Cover image for Building Wordle Agent Duel: How I Made AI Models Compete at Word Games
Harish Kotra (he/him)
Harish Kotra (he/him)

Posted on

Building Wordle Agent Duel: How I Made AI Models Compete at Word Games

A deep-dive into building a competitive Wordle arena where autonomous AI agents — and a human player — race to solve puzzles using local and cloud LLMs.


The Idea

What if you could watch two AI models play Wordle against each other in real-time? Better yet, what if you could jump in and compete against them?

That's exactly what Wordle Agent Duel is. It's a web app that pits two LLM-powered agents against each other (and optionally you) in a competitive Wordle arena. Each agent reasons through letter constraints, generates guesses, and races to crack a hidden 5-letter word — all while you watch their thought processes unfold in real-time.

The app supports multiple AI providers (Ollama, OpenAI, Gaia, AIsa.one), so you can literally watch GPT-4o battle a local Llama 3.2 model. Who wins? Spoiler: it's not always the bigger model.


Architecture Overview

The app is a full-stack TypeScript application with a React frontend and an Express backend acting as an API proxy layer.

┌──────────────────────────────────────────────────────┐
│                    React Frontend                    │
│  ┌─────────────┬─────────────┬───────────────┐       │
│  │ Agent Board │ Agent Board │ Human Board   │       │
│  │ (Thinking)  │ (Thinking)  │  (Input)      │       │
│  └──────┬──────┴──────┬──────┴──────┬────────┘       │
│         │             │             │                │
│         └─────────────┼─────────────┘                │
│                       │                              │
│              llmService.ts                           │
│         (Prompt + Constraint Engine)                 │
└───────────────────────┬──────────────────────────────┘
                        │ HTTP
┌───────────────────────┼──────────────────────────────┐
│               Express Server (server.ts)             │
│  ┌────────────────┐  ┌────────────────────┐          │
│  │ /api/ollama    │  │ /api/chat-proxy    │          │
│  │ (Local LLM)    │  │ (Cloud OpenAI-compat)│        │
│  └───────┬────────┘  └───────┬────────────┘          │
└──────────┼───────────────────┼───────────────────────┘
           │                   │
    Ollama:11434        OpenAI / Gaia / AIsa.one
Enter fullscreen mode Exit fullscreen mode

Why a Server Proxy?

Ollama runs locally on port 11434 and has CORS restrictions. Cloud providers need API keys. Instead of exposing keys client-side, we route everything through Express:

// server.ts — Generic proxy for OpenAI-compatible providers
app.post("/api/chat-proxy", async (req, res) => {
  const { provider, model, messages, baseUrl } = req.body;
  const config = PROVIDER_CONFIG[provider];
  const apiKey = process.env[config.envKey];

  const response = await fetch(baseUrl || config.url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages, stream: false }),
  });

  const data = await response.json();
  // Normalize to { message: { content } } to match Ollama's shape
  res.json({ message: { content: data.choices?.[0]?.message?.content } });
});
Enter fullscreen mode Exit fullscreen mode

All cloud providers return the same OpenAI-compatible response shape, so one proxy endpoint handles them all. The response is normalized to match Ollama's format so the frontend doesn't need to care which provider is being used.


The Prompt Engineering Problem

Here's where it gets interesting. The naive approach to Wordle is:

"Here's the feedback from your previous guesses. What's your next guess?"

This does not work with local models. Even 7B parameter models frequently:

  • Reuse letters that were marked absent
  • Drop letters that were confirmed correct
  • Ignore present letters entirely

The Solution: Pre-Computed Constraint Injection

Instead of asking the LLM to interpret raw feedback, we compute the constraints in TypeScript and inject them directly into the prompt:

// Pre-compute constraints from all previous guess history
const correctPositions: (string | null)[] = Array(5).fill(null);
const presentLetters = new Set<string>();
const absentLetters = new Set<string>();

for (const h of history) {
  for (let i = 0; i < h.result.length; i++) {
    const r = h.result[i];
    if (r.state === 'correct') correctPositions[i] = r.letter.toUpperCase();
    else if (r.state === 'present') presentLetters.add(r.letter.toUpperCase());
    else if (r.state === 'absent') absentLetters.add(r.letter.toUpperCase());
  }
}

// Handle Wordle's duplicate letter edge case
for (const letter of correctPositions) {
  if (letter) { absentLetters.delete(letter); presentLetters.delete(letter); }
}
Enter fullscreen mode Exit fullscreen mode

This produces a constraint block that gets injected into the prompt:

=== CURRENT CONSTRAINTS (YOU MUST OBEY THESE) ===
Known pattern: [ T _ A _ _ ]
Letters that MUST appear somewhere in your word: E, R
Letters that are BANNED (do NOT use): H, O, U, S, G, N
================================================
Enter fullscreen mode Exit fullscreen mode

The LLM doesn't have to think about which letters are where — we tell it. This improved accuracy dramatically across all models we tested.

Structured Output with XML Tags

We ask the LLM to return structured output using XML-style tags:

First, provide your detailed reasoning inside <thinking></thinking> tags.
Then, provide ONLY the 5-letter word in uppercase inside <guess></guess> tags.
Enter fullscreen mode Exit fullscreen mode

Parsing is simple regex:

const thoughtMatch = content.match(/<thinking>([\s\S]*?)<\/thinking>/i);
const guessMatch = content.match(/<guess>([\s\S]*?)<\/guess>/i);
Enter fullscreen mode Exit fullscreen mode

This gives us both the reasoning (displayed in the UI) and the guess (used for game logic).


The Wordle Game Engine

The core game logic lives in constants.ts with a two-pass checkGuess algorithm:

export function checkGuess(guess: string, target: string): GuessResult[] {
  const result: GuessResult[] = Array(5).fill(null).map((_, i) => ({
    letter: guess[i], state: 'absent'
  }));

  const targetLetters = target.split('');
  const guessLetters = guess.split('');

  // Pass 1: Find exact matches (correct position)
  for (let i = 0; i < 5; i++) {
    if (guessLetters[i] === targetLetters[i]) {
      result[i].state = 'correct';
      targetLetters[i] = '#';  // Mark as consumed
      guessLetters[i] = '$';
    }
  }

  // Pass 2: Find present but misplaced letters
  for (let i = 0; i < 5; i++) {
    if (guessLetters[i] === '$') continue;
    const index = targetLetters.indexOf(guessLetters[i]);
    if (index !== -1) {
      result[i].state = 'present';
      targetLetters[index] = '#';
    }
  }

  return result;
}
Enter fullscreen mode Exit fullscreen mode

The two-pass approach is critical for handling duplicate letters correctly — you can't mark a letter as "present" if it's already been consumed by a "correct" match elsewhere.


The Game Loop

The duel runs as an async loop orchestrated by React state and useEffect:

const runDuel = useCallback(async () => {
  // Both agents solve simultaneously
  const promises = [
    solveStep(agent1, agent1Guesses, setAgent1Guesses, ...),
    solveStep(agent2, agent2Guesses, setAgent2Guesses, ...)
  ];

  const results = await Promise.all(promises);

  if (results[0] && results[1]) { /* Tie */ }
  else if (results[0]) { declareWinner(agent1.name, ...); }
  else if (results[1]) { declareWinner(agent2.name, ...); }
}, [gameState, agent1Guesses, agent2Guesses]);
Enter fullscreen mode Exit fullscreen mode

Each solveStep call hits the LLM, parses the response, validates the guess, and updates state. If the LLM returns garbage (not a valid 5-letter word), it increments a retry counter and tries again on the next loop iteration — no crash, no lost turn.

The human player runs independently — they submit guesses via a text input, and if they solve it before either agent, they win.


Multi-Provider Support

Since OpenAI, Gaia, and AIsa.one all use the same OpenAI-compatible chat completions format, supporting all of them required just one proxy endpoint and a config map:

const PROVIDER_CONFIG: Record<string, { url: string; envKey: string }> = {
  openai: { url: "https://api.openai.com/v1/chat/completions", envKey: "OPENAI_API_KEY" },
  gaia:   { url: "https://llama3b.gaia.domains/v1/chat/completions", envKey: "GAIA_API_KEY" },
  aisa:   { url: "https://api.aisa.one/v1/chat/completions", envKey: "AISA_API_KEY" },
};
Enter fullscreen mode Exit fullscreen mode

On the frontend, switching providers simply changes which API route gets called:

if (config.provider === 'ollama') {
  response = await fetch("/api/ollama", { ... });
} else {
  response = await fetch("/api/chat-proxy", {
    body: JSON.stringify({ provider: config.provider, model: config.model, messages })
  });
}
Enter fullscreen mode Exit fullscreen mode

Lessons Learned

  1. Small LLMs need hand-holding. Don't ask a 7B model to derive constraints from raw data. Pre-compute everything and tell it exactly what to do.
  2. XML tags > JSON for structured LLM output. Local models are much better at producing <tag>content</tag> than valid JSON.
  3. All OpenAI-compatible APIs are truly compatible. One proxy endpoint handles OpenAI, Gaia, and AIsa.one with zero code changes.
  4. Error tolerance is everything. LLMs hallucinate. Build retry logic, not crash handlers.

Tech Stack

  • Frontend: React 19, TypeScript, Tailwind CSS 4, Framer Motion, Lucide React
  • Backend: Express.js, Vite 6 (middleware mode), dotenv
  • AI Providers: Ollama, OpenAI, Gaia, AIsa.one
  • Build: Vite, tsx

Try It

The project is open-source: github.com/harishkotra/wordle-agent-duel

git clone https://github.com/harishkotra/wordle-agent-duel.git
cd wordle-agent-duel
npm install
npm run dev
Enter fullscreen mode Exit fullscreen mode

Pull some Ollama models, visit localhost:3000, and watch the AI duel. 🤖⚔️🤖

Top comments (0)