André

Clueless: A Codenames-Inspired Game You Can Play With AI Teammates or Just Watch Them Go

GitHub Copilot CLI Challenge Submission

This is a submission for the GitHub Copilot CLI Challenge

What I Built

Clueless is a multiplayer word spy game inspired by Codenames, but instead of needing 4+ friends, you play alongside AI teammates who have distinct personalities, argue with each other, crack jokes, and sometimes make terrible decisions.

The game generates a 5×5 grid of words. Two teams compete to identify their team's words using one-word hints from a spymaster. The twist: your teammates are LLMs with personalities like "The Overthinker" (second-guesses everything), "The Hothead" (wants to guess immediately), and "The Drama Queen" (turns every guess into a theatrical event). They discuss, debate, propose guesses, and vote.
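In Codenames-style rules, the 25 cards are split between the two teams, neutral bystanders, and a single assassin. A minimal sketch of how such a board could be dealt, assuming the classic 9/8/7/1 split (the word list, type names, and split are illustrative, not necessarily the project's actual values):

```typescript
type CardRole = "red" | "blue" | "neutral" | "assassin";

interface Card {
  word: string;
  role: CardRole;
  revealed: boolean;
}

// Deal a 5x5 board: 9 cards for the starting team, 8 for the other,
// 7 neutral, 1 assassin (the classic Codenames split).
function dealBoard(words: string[], startingTeam: "red" | "blue"): Card[] {
  if (words.length < 25) throw new Error("need at least 25 words");
  const other = startingTeam === "red" ? "blue" : "red";
  const roles: CardRole[] = [
    ...Array<CardRole>(9).fill(startingTeam),
    ...Array<CardRole>(8).fill(other),
    ...Array<CardRole>(7).fill("neutral"),
    "assassin",
  ];
  // Fisher-Yates shuffle so role placement is uniform.
  for (let i = roles.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [roles[i], roles[j]] = [roles[j], roles[i]];
  }
  return roles.map((role, i) => ({ word: words[i], role, revealed: false }));
}
```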

Three play modes:

  • Spymaster: You give the hints, AI operatives debate and guess
  • Operative: AI spymaster gives hints, you discuss and vote with AI teammates
  • Spectator: Watch two full AI teams battle it out

What makes it unique:

  • 20 distinct built-in AI personalities with genuine behavioral differences
  • Free-flowing discussion: LLMs debate among themselves, and you can jump in anytime or pause the conversation when you need to think
  • Text-to-Speech: Every message can be spoken aloud with 18 different voice options, turning the game into a podcast-like experience
  • Cross-team banter: Between turns, players from both teams trash-talk and react to what just happened
  • Mix-and-match AI models: assign different LLM models to different players on the same team (e.g., GPT-4 as spymaster, Qwen3 as operatives)
  • Works with any OpenAI-compatible API: OpenRouter, Llama.cpp, LM Studio, vLLM, or any other endpoint
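The mix-and-match part falls out of everything speaking the same chat-completions request shape. A hypothetical per-player configuration sketch (field names and the `chat` helper are mine, not the project's API):

```typescript
// Hypothetical per-player config: any OpenAI-compatible endpoint works
// because they all accept the same /chat/completions request shape.
interface PlayerConfig {
  name: string;
  baseUrl: string; // e.g. OpenRouter, or a local llama.cpp / LM Studio / vLLM server
  apiKey: string;
  model: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// One function serves every player; only the config differs.
async function chat(player: PlayerConfig, messages: ChatMessage[]): Promise<string> {
  const res = await fetch(`${player.baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${player.apiKey}`,
    },
    body: JSON.stringify({ model: player.model, messages }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```

The same team can then hold a strong spymaster and cheaper operatives just by giving each `PlayerConfig` a different `baseUrl`/`model` pair.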

Demo

Want to play yourself? → GitHub: https://github.com/and1mon/clueless

Overview

Game Modes

🕵️ Spymaster

You give hints, AI guesses.

[Screenshot: Spymaster board; colored borders show card ownership]

🎯 Operative

AI gives hints, you guess with AI teammates.

[Screenshot: Operative board]

Make proposals:

[Screenshots: suggestion input and the resulting chat message]

Vote on proposals:

[Screenshots: accept/reject vote input and the resulting chat message]

👁️ Spectator

Watch AI vs AI with full visibility.

Watch Gemini-3-Flash play: https://www.youtube.com/watch?v=dO0b7Gf278s

My Experience with GitHub Copilot CLI

This entire project was built through conversation with GitHub Copilot CLI. Not autocomplete suggestions in an editor, but actual back-and-forth problem-solving directly in the terminal.

Why it worked so well for this project

The biggest difference was continuity. Copilot CLI remembered the architecture, the bugs we had already fixed, and the design decisions we had made. When I said "it is the same error again," it knew which error, checked whether the previous fix had regressed, and traced a new root cause. No context-switching between editor, browser, and terminal. Everything happened in one place.

It was especially useful for the kind of bugs you cannot Google. When an AI player starts ignoring the voting system or gives subtly wrong answers, there is no StackOverflow thread for that. Copilot CLI could look at the prompt, the game state, and the model output together and figure out what was actually going wrong.

Concrete examples

Kokoro TTS integration. I had no prior experience with on-device TTS. Copilot CLI researched alternatives, compared them, read the kokoro-js library docs, and proposed the architecture: server-side generation with the 82M ONNX model, a voice pool mapped to personality indices, and markdown stripping before synthesis. It handled the full implementation across the Express endpoint, the client-side audio playback, and the voice assignment logic.
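The two pieces around the model itself are simple enough to sketch: a deterministic mapping from personality index to a fixed voice pool, and markdown stripping so the synthesizer doesn't read asterisks and backticks aloud. Helper and voice names below are mine, not the project's or Kokoro's actual identifiers:

```typescript
// Hypothetical 18-entry voice pool (names illustrative, not Kokoro's real voice ids).
const VOICE_POOL: string[] = Array.from({ length: 18 }, (_, i) => `voice_${i}`);

// Deterministically map a personality index to a voice so each
// character keeps the same voice for the whole game.
function voiceFor(personalityIndex: number): string {
  return VOICE_POOL[personalityIndex % VOICE_POOL.length];
}

// Strip common markdown before synthesis so the TTS output
// doesn't vocalize formatting characters.
function stripMarkdown(text: string): string {
  return text
    .replace(/```[\s\S]*?```/g, "")           // fenced code blocks
    .replace(/`([^`]*)`/g, "$1")              // inline code
    .replace(/\*\*([^*]*)\*\*/g, "$1")        // bold
    .replace(/\*([^*]*)\*/g, "$1")            // italics
    .replace(/^#{1,6}\s+/gm, "")              // headings
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1"); // links -> link text
}
```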

The personality system. Getting 20 distinct personalities to actually feel different during gameplay required a lot of prompt iteration. "The Hothead" needed to push for immediate guesses without becoming incoherent, "The Overthinker" needed to second-guess without stalling the game forever. Copilot CLI could test a prompt change, watch the resulting game behavior, and suggest targeted adjustments. That feedback loop would have been painful to do manually.
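The prompt side of this is essentially a system prompt assembled per player. A hypothetical sketch of the shape, with the personality text paraphrased from this article rather than taken from the project's actual prompts:

```typescript
interface Personality {
  name: string;
  style: string;      // how the character talks and behaves
  constraint: string; // guardrail keeping the quirk from breaking the game
}

// Two personalities from the article, paraphrased. The constraint line is
// the part that needed iteration: the quirk must not stall or derail play.
const HOTHEAD: Personality = {
  name: "The Hothead",
  style: "You want to guess immediately and push the team to commit.",
  constraint: "Stay coherent: always name a concrete board word and a one-line reason.",
};

const OVERTHINKER: Personality = {
  name: "The Overthinker",
  style: "You second-guess every proposal and raise alternatives.",
  constraint: "After at most two objections, settle on a position so the game keeps moving.",
};

function systemPrompt(p: Personality): string {
  return [
    `You are ${p.name}, an operative in a Codenames-style word game.`,
    p.style,
    p.constraint,
    "Respond in character, in one short chat message.",
  ].join("\n");
}
```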

Light mode. This one is just fun to mention. I asked for it and Copilot one-shot the entire implementation: CSS custom properties, data-theme attribute switching, localStorage persistence, system preference detection as a fallback. It then kept the theming consistent across every new component for the rest of the project.
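The moving parts it wired up reduce to a small resolver: a stored preference wins, the system preference is the fallback. A sketch under that assumption (function and storage-key names are mine):

```typescript
type Theme = "light" | "dark";

// Resolve the theme to apply on load: an explicit choice saved in
// localStorage wins; otherwise fall back to the OS preference.
function resolveTheme(stored: string | null, systemPrefersDark: boolean): Theme {
  if (stored === "light" || stored === "dark") return stored;
  return systemPrefersDark ? "dark" : "light";
}

// In the browser this would be wired up roughly as:
//   const theme = resolveTheme(
//     localStorage.getItem("theme"),
//     window.matchMedia("(prefers-color-scheme: dark)").matches,
//   );
//   document.documentElement.setAttribute("data-theme", theme);
// with CSS custom properties switching on the [data-theme] attribute.
```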

Challenges

Building an entire app through a terminal conversation is a massive productivity win, but it introduces a specific set of challenges you won't find in a traditional IDE setup. Even though this challenge is about embracing Copilot CLI, it would be wrong not to mention them.

Frontend design. Copilot CLI is brilliant at logic but blind to pixels. Debugging UI glitches felt like describing a painting over the phone. In the end, for CSS adjustments and layout tweaks, it was often faster to just open the file and fix it myself than to iterate through five conversational attempts to get the spacing right.

Review fatigue. When an LLM generates 50 lines of seemingly perfect TypeScript in seconds, it is incredibly tempting to just skim it and say "LGTM!". It is easy to fall into the trap of subtle bugs because the rest of the code looks so professional. The challenge is maintaining the discipline to read every line when the "author" is a machine.

The meta layer

There is something fitting about using an AI tool to build a game where AI players argue with each other. Copilot CLI was effectively another player at the table, one that happened to be good at TypeScript and prompt engineering instead of word association.

Verdict

I hope you have as much fun with this project as I had. Watching different LLMs get into heated arguments, change their minds mid-debate, or confidently lead their team into the assassin word is genuinely entertaining. It also works as an unconventional benchmark: how well does a model actually reason under social pressure, with incomplete information, when other models are pushing back?
