CyberTabletop CLI — I turned GitHub Copilot into a tabletop exercise facilitator

#devchallenge #githubchallenge #cli #githubcopilot

GitHub Copilot CLI Challenge Submission

This is a submission for the GitHub Copilot CLI Challenge

What I Built

Most security teams never practice incident response. Tabletop exercises are the fix — you sit around a table, someone reads a scenario, and you talk through what you'd do. But running one takes prep, a facilitator, and scheduling time that nobody has.

So I built a CLI game that does it for you. You pick a ransomware scenario, make decisions each turn, and the situation evolves based on what you chose. The twist: GitHub Copilot CLI is the facilitator. It generates the scenarios, writes the narrative, decides the consequences, and keeps the pressure on.

CyberTabletop CLI is a turn-based cybersecurity tabletop exercise game that runs entirely in your terminal. You play as the incident response lead at a pharmaceutical company that just got hit by ransomware.

Every turn, you get a situation update and 3-4 choices. Pick one, and the game advances the clock, adjusts risk scores, changes asset statuses, and generates the next set of problems. The whole thing is driven by copilot -p calls under the hood — there's no hardcoded scenario tree, so every playthrough is different.

Here's what a session looks like:

1. You run python main.py start.
2. Copilot generates three ransomware variants with different entry vectors (phishing, supply chain compromise, insider threat, etc.).
3. You pick one.
4. The game bootstraps a full company profile: assets, security tools, risk levels, the works.
5. Then you're in the loop. Each turn: read the update, pick A/B/C/D, deal with the consequences.
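The A/B/C/D step is small, but it is where bad input lands. Here is a minimal sketch of mapping a typed letter onto the 3-4 choices Copilot returns for a turn. The `pick_choice` helper and the plain list of choice strings are hypothetical, not the project's actual code.

```python
import string


def pick_choice(choices: list, answer: str) -> str:
    """Map a letter answer (A-D) onto this turn's choices (hypothetical helper).

    Raises ValueError for anything outside the letters offered this turn,
    so the game loop can re-prompt instead of crashing.
    """
    if not 3 <= len(choices) <= 4:
        # The facilitator rules ask Copilot for 3-4 choices per turn.
        raise ValueError(f"expected 3-4 choices, got {len(choices)}")
    letters = string.ascii_uppercase[: len(choices)]  # "ABC" or "ABCD"
    key = answer.strip().upper()
    # Check the length first: "" would otherwise count as "in" any string.
    if len(key) != 1 or key not in letters:
        raise ValueError(f"pick one of {', '.join(letters)}")
    return choices[letters.index(key)]
```

Only letters that exist this turn are accepted, so answering "D" on a three-choice turn is rejected rather than silently ignored.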

The game tracks three risk dimensions (operational, data, legal) on a 0-100 scale, manages an asset list whose statuses change as the incident unfolds, and keeps an in-game clock that advances 5-20 minutes per turn depending on what's happening.
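The bookkeeping behind those numbers can be sketched in a few lines. This assumes the risk scores live in a plain dict keyed by dimension and that the engine (not Copilot) enforces the 0-100 range and the 5-20 minute clock step; the function names are hypothetical.

```python
from datetime import datetime, timedelta

RISK_DIMENSIONS = ("operational", "data", "legal")


def clamp(value: int, low: int, high: int) -> int:
    """Pin value into the inclusive range [low, high]."""
    return max(low, min(high, value))


def apply_risk_deltas(risk: dict, deltas: dict) -> dict:
    """Apply per-dimension deltas, keeping every score on the 0-100 scale."""
    return {
        dim: clamp(risk[dim] + deltas.get(dim, 0), 0, 100)
        for dim in RISK_DIMENSIONS
    }


def advance_clock(now: datetime, minutes: int) -> datetime:
    """Move the in-game clock forward, limiting each step to 5-20 minutes."""
    return now + timedelta(minutes=clamp(minutes, 5, 20))
```

Clamping on the engine side means an over-eager delta from the model (say, +40 on an already high score) can never push the game into an impossible state.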

When you're done, you can generate an AI debrief with /debrief or export a full markdown transcript with /export — useful for sharing with your team or writing up what happened.
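A markdown export needs nothing beyond string formatting. The sketch below assumes each finished turn is recorded as a dict with `time`, `situation`, and `decision` keys; that record shape and the `render_transcript` name are hypothetical.

```python
def render_transcript(title: str, turns: list) -> str:
    """Render finished turns as a markdown document (hypothetical record shape).

    Each turn is assumed to be a dict with 'time', 'situation', and 'decision'.
    """
    lines = [f"# {title}", ""]
    for number, turn in enumerate(turns, start=1):
        lines += [
            f"## Turn {number} ({turn['time']})",
            "",
            turn["situation"],
            "",
            f"**Decision:** {turn['decision']}",
            "",
        ]
    return "\n".join(lines)
```

Writing it to disk is then a single call such as `Path("exercise.md").write_text(render_transcript(...), encoding="utf-8")` with `pathlib.Path`.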

The interactive session supports slash commands so you never leave the game loop: /status to check your risk scores, /json to peek at the raw state, /help to see what's available, and /exit to save and quit.
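One way to keep slash commands from tangling with the A-D answers is a small lookup table checked before the choice parser. This is a sketch under assumptions: the `COMMANDS` table, `handle_command`, and the `state["risk"]` layout are hypothetical, and commands that need Copilot, files, or saving are left to the main loop.

```python
from typing import Callable, Optional

HELP_TEXT = "Type A-D to choose, or a slash command such as /status."


def _status(state: dict) -> str:
    """One-line summary of the three risk scores."""
    return ", ".join(f"{name}: {score}/100" for name, score in state["risk"].items())


# Read-only commands only; /debrief, /export and /exit call Copilot, write
# files, or save, so in this sketch the game loop handles them itself.
COMMANDS: dict[str, Callable[[dict], str]] = {
    "/status": _status,
    "/help": lambda state: HELP_TEXT,
}


def handle_command(line: str, state: dict) -> Optional[str]:
    """Return output for a known slash command, or None so the loop can
    treat the input as an A-D answer instead."""
    handler = COMMANDS.get(line.strip().lower())
    return handler(state) if handler is not None else None
```

Returning `None` for anything unrecognized keeps the loop simple: try the command table first, then fall back to parsing a letter.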

Tech stack: Python, Typer, Rich (for terminal formatting), and GitHub Copilot CLI as the AI backend. That's it.

Demo

My Experience with GitHub Copilot CLI

This project uses Copilot CLI in a way I haven't seen before — not just as a development tool, but as the runtime AI engine for the game itself.

Every time the game needs content, it shells out to copilot -p "<prompt>" and parses the JSON that comes back. That means Copilot is generating scenarios, writing incident narratives, calculating consequences, and producing debriefs. It's doing the work of a human facilitator, live, in your terminal.
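Shelling out can be sketched with `subprocess.run` and the `copilot -p` invocation described above, using the 90-second timeout mentioned later in the post. The `CopilotError` class and the `executable` parameter are hypothetical additions so callers get one exception type for "missing", "timed out", and "failed".

```python
import subprocess


class CopilotError(RuntimeError):
    """Raised when the copilot call fails, times out, or isn't installed."""


def run_copilot(prompt: str, timeout: float = 90, executable: str = "copilot") -> str:
    """Run `copilot -p <prompt>` non-interactively and return its raw stdout."""
    try:
        result = subprocess.run(
            [executable, "-p", prompt],  # argv list: no shell quoting of the prompt
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except FileNotFoundError as exc:
        raise CopilotError(f"{executable!r} not found on PATH") from exc
    except subprocess.TimeoutExpired as exc:
        raise CopilotError(f"no response within {timeout} seconds") from exc
    if result.returncode != 0:
        raise CopilotError(result.stderr.strip() or f"exit code {result.returncode}")
    return result.stdout
```

Passing the prompt as a list element rather than through a shell string matters here, because prompts full of JSON schemas contain quotes and braces that a shell would try to interpret.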

Getting this to work reliably was the interesting part.

Prompt engineering for structured output

The biggest challenge was getting consistent JSON back from a CLI tool that's designed for conversational output. I ended up embedding full JSON schemas directly in every prompt, plus a system prompt (FACILITATOR_RULES) that hammers home the constraints: JSON only, no markdown, no backticks, no commentary, ransomware only, 3-4 choices per turn.

Each prompt type (start menu, bootstrap, turn, debrief) has its own schema so Copilot knows exactly what shape to return. The turn prompt also sends the full game state — risk levels, asset statuses, signals, flags, recent player actions — so Copilot can write consequences that actually make sense in context.
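Put together, a turn prompt is just rules, schema, and state concatenated. The sketch below is an assumption about the shape, not the project's actual text: the `FACILITATOR_RULES` wording paraphrases the constraints listed above, and `TURN_SCHEMA` with its field names is hypothetical.

```python
import json

# Hypothetical wording that paraphrases the constraints listed in the post.
FACILITATOR_RULES = """You are the facilitator of a ransomware tabletop exercise
for a pharmaceutical company. Scenarios are ransomware only.
Respond with a single JSON object only: no markdown, no backticks, no commentary.
Offer 3-4 choices per turn."""

# Hypothetical turn schema; field names are illustrative.
TURN_SCHEMA = {
    "type": "object",
    "required": ["situation", "choices", "state_patch"],
    "properties": {
        "situation": {"type": "string"},
        "choices": {"type": "array", "minItems": 3, "maxItems": 4, "items": {"type": "string"}},
        "state_patch": {"type": "object"},
    },
}


def build_turn_prompt(state: dict, schema: dict = TURN_SCHEMA) -> str:
    """Combine the rules, the expected output schema, and the current game state."""
    return "\n\n".join(
        [
            FACILITATOR_RULES,
            "Return JSON matching this schema exactly:",
            json.dumps(schema, indent=2),
            "Current game state:",
            json.dumps(state, indent=2),
            "Write the next situation update and its choices.",
        ]
    )
```

The other prompt types (start menu, bootstrap, debrief) would follow the same pattern with their own schema swapped in.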

Defensive JSON parsing

Even with strict prompts, Copilot sometimes wraps output in markdown fences or adds commentary. So I built a defensive parser that first tries json.loads() on the raw output, then falls back to regex-extracting the first {...} block.
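A minimal version of that two-step parser might look like this. One interpretation is baked in: the greedy `\{.*\}` pattern grabs everything from the first `{` to the last `}`, which handles fences and commentary around a single object but not stray braces in trailing prose. `parse_copilot_json` is a hypothetical name.

```python
import json
import re
from typing import Any

# Greedy + DOTALL: spans from the first "{" to the last "}" across lines,
# skipping markdown fences or commentary wrapped around a single object.
_OBJECT_RE = re.compile(r"\{.*\}", re.DOTALL)


def parse_copilot_json(raw: str) -> Any:
    """Parse Copilot output, tolerating fences or commentary around the JSON."""
    try:
        return json.loads(raw)  # happy path: the output is pure JSON
    except json.JSONDecodeError:
        pass
    match = _OBJECT_RE.search(raw)
    if match is None:
        raise ValueError("no JSON object found in Copilot output")
    return json.loads(match.group(0))  # may still raise JSONDecodeError (a ValueError)
```

Because `json.JSONDecodeError` subclasses `ValueError`, callers only need to catch `ValueError` to handle both failure modes.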

On Windows, I hit another curveball — Copilot occasionally returned PowerShell object notation (@{}, @()) instead of JSON. So there's a converter that handles that too. It's not pretty, but it works.
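The post doesn't show the converter, so here is one possible approach: a tiny recursive-descent parser that turns `@{key=value; ...}` and `@(a, b)` into dicts and lists. It is a hypothetical sketch that assumes scalar values never contain `;`, `,`, `}`, or `)`, which holds for short labels and numbers but not for free-form narrative text.

```python
from typing import Any


class _PSParser:
    """Recursive-descent parser for PowerShell-style @{...} and @(...) text."""

    def __init__(self, text: str) -> None:
        self.text, self.pos = text, 0

    def _peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def _skip_ws(self) -> None:
        while self._peek().isspace():
            self.pos += 1

    def parse_value(self) -> Any:
        self._skip_ws()
        if self.text.startswith("@{", self.pos):
            return self._parse_hashtable()
        if self.text.startswith("@(", self.pos):
            return self._parse_array()
        return self._parse_scalar()

    def _parse_hashtable(self) -> dict:
        self.pos += 2  # consume "@{"
        result = {}
        while True:
            self._skip_ws()
            if self._peek() == "}":
                self.pos += 1
                return result
            eq = self.text.find("=", self.pos)
            if eq == -1:
                raise ValueError(f"expected key=value at position {self.pos}")
            key = self.text[self.pos:eq].strip()
            self.pos = eq + 1
            result[key] = self.parse_value()
            self._skip_ws()
            if self._peek() == ";":
                self.pos += 1
            elif self._peek() != "}":
                raise ValueError(f"expected ';' or '}}' at position {self.pos}")

    def _parse_array(self) -> list:
        self.pos += 2  # consume "@("
        items = []
        while True:
            self._skip_ws()
            if self._peek() == ")":
                self.pos += 1
                return items
            items.append(self.parse_value())
            self._skip_ws()
            if self._peek() == ",":
                self.pos += 1
            elif self._peek() != ")":
                raise ValueError(f"expected ',' or ')' at position {self.pos}")

    def _parse_scalar(self) -> Any:
        start = self.pos
        while self._peek() not in ("", ";", "}", ")", ","):
            self.pos += 1
        return _coerce(self.text[start:self.pos].strip())


def _coerce(raw: str) -> Any:
    """Turn a bare scalar into a bool, int, float, or string."""
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        if any(ch.isdigit() for ch in raw):  # keeps words like "nan" as strings
            return float(raw)
    except ValueError:
        pass
    return raw


def ps_to_python(text: str) -> Any:
    """Convert PowerShell object notation into plain dicts/lists (json.dumps-able)."""
    parser = _PSParser(text.strip())
    value = parser.parse_value()
    parser._skip_ws()
    if parser.pos != len(parser.text):
        raise ValueError(f"unexpected trailing text at position {parser.pos}")
    return value
```

The result can go straight through `json.dumps` if the rest of the pipeline expects a JSON string.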

State machine design

The game state is a single dict that gets passed to Copilot each turn. Copilot returns a "state patch" — risk deltas, asset updates, new signals, flag changes — and the engine applies it. This keeps the game consistent across turns without Copilot needing to remember anything.
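Applying a patch is a handful of dict operations. The patch keys below (`risk_deltas`, `asset_updates`, `new_signals`, `flags`) and the state layout are hypothetical names for the four kinds of change described above.

```python
import copy


def clamp(value: int, low: int = 0, high: int = 100) -> int:
    return max(low, min(high, value))


def apply_patch(state: dict, patch: dict) -> dict:
    """Return a new state with a Copilot "state patch" applied.

    Hypothetical patch shape:
      {"risk_deltas": {"data": 15}, "asset_updates": {"LIMS": "encrypted"},
       "new_signals": ["..."], "flags": {"legal_notified": True}}
    """
    new = copy.deepcopy(state)  # never mutate the caller's state
    for dim, delta in patch.get("risk_deltas", {}).items():
        new["risk"][dim] = clamp(new["risk"].get(dim, 0) + delta)
    for asset, status in patch.get("asset_updates", {}).items():
        new["assets"][asset] = status
    new["signals"].extend(patch.get("new_signals", []))
    new["flags"].update(patch.get("flags", {}))
    return new
```

Returning a fresh copy instead of editing in place makes it easy to keep the previous turn's state around for /export or for undoing a bad patch.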

I kept the state compact on purpose. Only the last 6 player actions get sent back, and the state only includes what's needed for the next decision. More context = more tokens = slower and less stable responses.
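Trimming the state before it goes into the prompt can be as simple as a whitelist plus a slice. Only the six-action window comes from the post; the `PROMPT_KEYS` list and the `actions` key are assumptions.

```python
MAX_RECENT_ACTIONS = 6  # only the last 6 player actions go back to Copilot

# Hypothetical list of keys the next decision actually needs.
PROMPT_KEYS = ("clock", "risk", "assets", "signals", "flags")


def compact_state(state: dict) -> dict:
    """Build the trimmed state that is serialized into the next turn prompt."""
    slim = {key: state[key] for key in PROMPT_KEYS if key in state}
    slim["recent_actions"] = state.get("actions", [])[-MAX_RECENT_ACTIONS:]
    return slim
```

Anything not on the whitelist (a growing transcript, for example) stays local and never costs tokens.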

What worked well

Copilot CLI turned out to be surprisingly good at this. The scenarios feel realistic, the consequences track logically, and it naturally escalates tension over time. Constraining it to pharma + ransomware helped a lot — narrow scope = more consistent output.

The 90-second subprocess timeout was generous enough that I never hit it in practice, even with the longer bootstrap prompts.

What I'd do differently

The prompt engineering is brittle. If Copilot changes how it handles system instructions, the JSON parsing could break. A future version would probably benefit from retry logic and schema validation on the response.
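Retry logic and validation could look something like the sketch below. The `ask` callable stands in for whatever wraps the `copilot -p` call, and the hand-rolled checks assume a hypothetical turn shape with `situation` and `choices`; a library such as `jsonschema` could replace `validate_turn` with its `validate(instance, schema)` function.

```python
import json
from typing import Callable, Optional


class InvalidResponse(ValueError):
    """Raised when Copilot output can't be turned into a valid turn."""


def validate_turn(data: object) -> dict:
    """Minimal hand-rolled checks for a hypothetical turn response shape."""
    if not isinstance(data, dict):
        raise InvalidResponse("response is not a JSON object")
    if not isinstance(data.get("situation"), str):
        raise InvalidResponse("missing 'situation' text")
    choices = data.get("choices")
    if not isinstance(choices, list) or not 3 <= len(choices) <= 4:
        raise InvalidResponse("expected 3-4 choices")
    return data


def get_turn_with_retry(ask: Callable[[], str], attempts: int = 3) -> dict:
    """Call `ask` until it returns a valid turn, or give up after `attempts`."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        raw = ask()
        try:
            return validate_turn(json.loads(raw))
        except (json.JSONDecodeError, InvalidResponse) as exc:
            last_error = exc  # remember why this attempt failed, then retry
    raise InvalidResponse(f"gave up after {attempts} attempts: {last_error}")
```

Validating before the patch is applied keeps a malformed response from ever reaching the game state.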

I'd also like to support more scenario types beyond ransomware and more industries beyond pharma, but for an MVP the narrow focus was the right call. It kept the prompts stable and the output quality high.

This was a fun build. There's something satisfying about turning a conversational AI tool into a structured game engine — and the result is something I'd actually use to run exercises with my team.