A deep-dive into building a competitive Wordle arena where autonomous AI agents — and a human player — race to solve puzzles using local and cloud LLMs.
The Idea
What if you could watch two AI models play Wordle against each other in real-time? Better yet, what if you could jump in and compete against them?
That's exactly what Wordle Agent Duel is. It's a web app that pits two LLM-powered agents against each other (and optionally you) in a competitive Wordle arena. Each agent reasons through letter constraints, generates guesses, and races to crack a hidden 5-letter word — all while you watch their thought processes unfold in real-time.
The app supports multiple AI providers (Ollama, OpenAI, Gaia, AIsa.one), so you can literally watch GPT-4o battle a local Llama 3.2 model. Who wins? Spoiler: it's not always the bigger model.
Architecture Overview
The app is a full-stack TypeScript application with a React frontend and an Express backend acting as an API proxy layer.
┌──────────────────────────────────────────────────────┐
│ React Frontend │
│ ┌─────────────┬─────────────┬───────────────┐ │
│ │ Agent Board │ Agent Board │ Human Board │ │
│ │ (Thinking) │ (Thinking) │ (Input) │ │
│ └──────┬──────┴──────┬──────┴──────┬────────┘ │
│ │ │ │ │
│ └─────────────┼─────────────┘ │
│ │ │
│ llmService.ts │
│ (Prompt + Constraint Engine) │
└───────────────────────┬──────────────────────────────┘
│ HTTP
┌───────────────────────┼──────────────────────────────┐
│ Express Server (server.ts) │
│ ┌────────────────┐ ┌────────────────────┐ │
│ │ /api/ollama │ │ /api/chat-proxy │ │
│ │ (Local LLM) │ │ (Cloud OpenAI-compat)│ │
│ └───────┬────────┘ └───────┬────────────┘ │
└──────────┼───────────────────┼───────────────────────┘
│ │
Ollama:11434 OpenAI / Gaia / AIsa.one
Why a Server Proxy?
Ollama runs locally on port 11434 and has CORS restrictions. Cloud providers need API keys. Instead of exposing keys client-side, we route everything through Express:
// server.ts — Generic proxy for OpenAI-compatible providers
app.post("/api/chat-proxy", async (req, res) => {
const { provider, model, messages, baseUrl } = req.body;
const config = PROVIDER_CONFIG[provider];
const apiKey = process.env[config.envKey];
const response = await fetch(baseUrl || config.url, {
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": `Bearer ${apiKey}`,
},
body: JSON.stringify({ model, messages, stream: false }),
});
const data = await response.json();
// Normalize to { message: { content } } to match Ollama's shape
res.json({ message: { content: data.choices?.[0]?.message?.content } });
});
All cloud providers return the same OpenAI-compatible response shape, so one proxy endpoint handles them all. The response is normalized to match Ollama's format so the frontend doesn't need to care which provider is being used.
The Prompt Engineering Problem
Here's where it gets interesting. The naive approach to Wordle is:
"Here's the feedback from your previous guesses. What's your next guess?"
This does not work with local models. Even 7B parameter models frequently:
- Reuse letters that were marked
absent - Drop letters that were confirmed
correct - Ignore
presentletters entirely
The Solution: Pre-Computed Constraint Injection
Instead of asking the LLM to interpret raw feedback, we compute the constraints in TypeScript and inject them directly into the prompt:
// Pre-compute constraints from all previous guess history
const correctPositions: (string | null)[] = Array(5).fill(null);
const presentLetters = new Set<string>();
const absentLetters = new Set<string>();
for (const h of history) {
for (let i = 0; i < h.result.length; i++) {
const r = h.result[i];
if (r.state === 'correct') correctPositions[i] = r.letter.toUpperCase();
else if (r.state === 'present') presentLetters.add(r.letter.toUpperCase());
else if (r.state === 'absent') absentLetters.add(r.letter.toUpperCase());
}
}
// Handle Wordle's duplicate letter edge case
for (const letter of correctPositions) {
if (letter) { absentLetters.delete(letter); presentLetters.delete(letter); }
}
This produces a constraint block that gets injected into the prompt:
=== CURRENT CONSTRAINTS (YOU MUST OBEY THESE) ===
Known pattern: [ T _ A _ _ ]
Letters that MUST appear somewhere in your word: E, R
Letters that are BANNED (do NOT use): H, O, U, S, G, N
================================================
The LLM doesn't have to think about which letters are where — we tell it. This improved accuracy dramatically across all models we tested.
Structured Output with XML Tags
We ask the LLM to return structured output using XML-style tags:
First, provide your detailed reasoning inside <thinking></thinking> tags.
Then, provide ONLY the 5-letter word in uppercase inside <guess></guess> tags.
Parsing is simple regex:
const thoughtMatch = content.match(/<thinking>([\s\S]*?)<\/thinking>/i);
const guessMatch = content.match(/<guess>([\s\S]*?)<\/guess>/i);
This gives us both the reasoning (displayed in the UI) and the guess (used for game logic).
The Wordle Game Engine
The core game logic lives in constants.ts with a two-pass checkGuess algorithm:
export function checkGuess(guess: string, target: string): GuessResult[] {
const result: GuessResult[] = Array(5).fill(null).map((_, i) => ({
letter: guess[i], state: 'absent'
}));
const targetLetters = target.split('');
const guessLetters = guess.split('');
// Pass 1: Find exact matches (correct position)
for (let i = 0; i < 5; i++) {
if (guessLetters[i] === targetLetters[i]) {
result[i].state = 'correct';
targetLetters[i] = '#'; // Mark as consumed
guessLetters[i] = '$';
}
}
// Pass 2: Find present but misplaced letters
for (let i = 0; i < 5; i++) {
if (guessLetters[i] === '$') continue;
const index = targetLetters.indexOf(guessLetters[i]);
if (index !== -1) {
result[i].state = 'present';
targetLetters[index] = '#';
}
}
return result;
}
The two-pass approach is critical for handling duplicate letters correctly — you can't mark a letter as "present" if it's already been consumed by a "correct" match elsewhere.
The Game Loop
The duel runs as an async loop orchestrated by React state and useEffect:
const runDuel = useCallback(async () => {
// Both agents solve simultaneously
const promises = [
solveStep(agent1, agent1Guesses, setAgent1Guesses, ...),
solveStep(agent2, agent2Guesses, setAgent2Guesses, ...)
];
const results = await Promise.all(promises);
if (results[0] && results[1]) { /* Tie */ }
else if (results[0]) { declareWinner(agent1.name, ...); }
else if (results[1]) { declareWinner(agent2.name, ...); }
}, [gameState, agent1Guesses, agent2Guesses]);
Each solveStep call hits the LLM, parses the response, validates the guess, and updates state. If the LLM returns garbage (not a valid 5-letter word), it increments a retry counter and tries again on the next loop iteration — no crash, no lost turn.
The human player runs independently — they submit guesses via a text input, and if they solve it before either agent, they win.
Multi-Provider Support
Since OpenAI, Gaia, and AIsa.one all use the same OpenAI-compatible chat completions format, supporting all of them required just one proxy endpoint and a config map:
const PROVIDER_CONFIG: Record<string, { url: string; envKey: string }> = {
openai: { url: "https://api.openai.com/v1/chat/completions", envKey: "OPENAI_API_KEY" },
gaia: { url: "https://llama3b.gaia.domains/v1/chat/completions", envKey: "GAIA_API_KEY" },
aisa: { url: "https://api.aisa.one/v1/chat/completions", envKey: "AISA_API_KEY" },
};
On the frontend, switching providers simply changes which API route gets called:
if (config.provider === 'ollama') {
response = await fetch("/api/ollama", { ... });
} else {
response = await fetch("/api/chat-proxy", {
body: JSON.stringify({ provider: config.provider, model: config.model, messages })
});
}
Lessons Learned
- Small LLMs need hand-holding. Don't ask a 7B model to derive constraints from raw data. Pre-compute everything and tell it exactly what to do.
-
XML tags > JSON for structured LLM output. Local models are much better at producing
<tag>content</tag>than valid JSON. - All OpenAI-compatible APIs are truly compatible. One proxy endpoint handles OpenAI, Gaia, and AIsa.one with zero code changes.
- Error tolerance is everything. LLMs hallucinate. Build retry logic, not crash handlers.
Tech Stack
- Frontend: React 19, TypeScript, Tailwind CSS 4, Framer Motion, Lucide React
- Backend: Express.js, Vite 6 (middleware mode), dotenv
- AI Providers: Ollama, OpenAI, Gaia, AIsa.one
- Build: Vite, tsx
Try It
The project is open-source: github.com/harishkotra/wordle-agent-duel
git clone https://github.com/harishkotra/wordle-agent-duel.git
cd wordle-agent-duel
npm install
npm run dev
Pull some Ollama models, visit localhost:3000, and watch the AI duel. 🤖⚔️🤖
Top comments (0)