<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: BrewHubPHL</title>
    <description>The latest articles on DEV Community by BrewHubPHL (@brewhubphl).</description>
    <link>https://dev.to/brewhubphl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3756619%2F99de4cf2-afec-4ffe-a30e-fb10bb091bcf.png</url>
      <title>DEV Community: BrewHubPHL</title>
      <link>https://dev.to/brewhubphl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/brewhubphl"/>
    <language>en</language>
    <item>
      <title>Don't Trust Your LLM's Safety Promises Across Runtimes</title>
      <dc:creator>BrewHubPHL</dc:creator>
      <pubDate>Sun, 24 May 2026 01:30:28 +0000</pubDate>
      <link>https://dev.to/brewhubphl/dont-trust-your-llms-safety-promises-across-runtimes-44pe</link>
      <guid>https://dev.to/brewhubphl/dont-trust-your-llms-safety-promises-across-runtimes-44pe</guid>
      <description>&lt;p&gt;Most LLM safety guardrails are built with a silent assumption baked in: all your customer-facing traffic runs through a single runtime. One process. One in-process safety check. Done.&lt;/p&gt;

&lt;p&gt;That assumption breaks the moment you deploy a polyglot stack.&lt;/p&gt;

&lt;p&gt;This is a write-up of a pattern we call &lt;strong&gt;parity contracts&lt;/strong&gt; — a deployable security primitive for LLM commerce agents that span multiple runtimes. We implemented it in production at &lt;a href="https://brewhubphl.com" rel="noopener noreferrer"&gt;BrewHub PHL&lt;/a&gt;, a Philadelphia café whose AI agent, Franklin, places orders, charges customer wallets, and issues loyalty mutations &lt;strong&gt;without a human approval step&lt;/strong&gt;. The full academic paper, red-team corpus, and parity test runner are open-source at &lt;a href="https://github.com/BrewHubPHL/allergen-parity-corpus" rel="noopener noreferrer"&gt;github.com/BrewHubPHL/allergen-parity-corpus&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Polyglot Deployments Break In-Process Safety
&lt;/h2&gt;

&lt;p&gt;BrewHub's architecture spans three runtimes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Runtime 1 — Next.js on Netlify&lt;/strong&gt; (AWS Lambda): customer-facing SSE chat endpoint, Franklin's tool calls, price recompute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime 2 — Netlify Functions&lt;/strong&gt; (AWS Lambda): POS checkout, payment processing, Square webhook handlers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime 3 — Google Cloud Run&lt;/strong&gt; (Python ADK): six specialized workflow agents — ops, marketing, barista training, service recovery, concierge, provenance storyteller&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Existing LLM safety libraries such as Llama Guard, NeMo Guardrails, LlamaFirewall, and MCP-Guard generally assume a single serving runtime. Their guarantee is an in-process guarantee: &lt;em&gt;if traffic goes through this runtime, the gate holds&lt;/em&gt;. That's fine for single-runtime deployments. It's a liability when you have independently reachable runtimes that each produce customer-facing or customer-adjacent output.&lt;/p&gt;

&lt;p&gt;In our case: the workflow-agent path on Cloud Run can produce text that eventually reaches a customer via email or staff review. If the safety gate only lives in the Next.js Lambda, and someone engineers a path to Cloud Run, the gate is missing.&lt;/p&gt;

&lt;p&gt;This is the deployed-agent gap. We formalize it as &lt;strong&gt;Adversary D — the runtime-bypass attacker&lt;/strong&gt;: an adversary who discovers or engineers a path that reaches a secondary runtime while bypassing the primary safety gate.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Insight: Treat LLM Tool Arguments as Untrusted Input
&lt;/h2&gt;

&lt;p&gt;Before getting to parity contracts specifically, the foundational reclassification:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM tool arguments should be classified as untrusted input on par with browser JSON.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every production web developer knows to re-derive truth server-side rather than trusting client-supplied values. We apply the same rule to the tool-call boundary. When Franklin's &lt;code&gt;place_order&lt;/code&gt; tool fires, our backend (&lt;code&gt;_pricing.js&lt;/code&gt;) ignores whatever &lt;code&gt;price_cents&lt;/code&gt; the model supplied and re-fetches &lt;code&gt;merch_products.price_cents&lt;/code&gt; and &lt;code&gt;modifiers.price_delta_cents&lt;/code&gt; directly from Supabase. If the model-supplied total drifts from the server-computed total by a single penny, the transaction fails.&lt;/p&gt;
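&lt;p&gt;A minimal sketch of that re-derivation, written in Python for illustration rather than taken from &lt;code&gt;_pricing.js&lt;/code&gt;. The in-memory price tables stand in for the Supabase lookups, and &lt;code&gt;OrderLine&lt;/code&gt;, &lt;code&gt;PriceMismatch&lt;/code&gt;, and &lt;code&gt;validate_order&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-ins for the merch_products and modifiers tables.
PRODUCT_PRICE_CENTS = {"latte": 450, "cold_brew": 400}
MODIFIER_DELTA_CENTS = {"oat_milk": 75, "extra_shot": 100}


@dataclass(frozen=True)
class OrderLine:
    product_id: str
    modifier_ids: tuple
    quantity: int


class PriceMismatch(Exception):
    """Raised when the model-supplied total disagrees with the server total."""


def server_total_cents(lines):
    """Recompute the total from trusted price data only."""
    total = 0
    for line in lines:
        unit = PRODUCT_PRICE_CENTS[line.product_id]
        unit += sum(MODIFIER_DELTA_CENTS[m] for m in line.modifier_ids)
        total += unit * line.quantity
    return total


def validate_order(lines, model_total_cents):
    """Return the trusted total, or raise if the model's figure drifts at all."""
    trusted = server_total_cents(lines)
    if model_total_cents != trusted:
        raise PriceMismatch(f"model said {model_total_cents}, server computed {trusted}")
    return trusted
```

&lt;p&gt;The model's number is only ever compared against the server's, never used to charge anything, so a hallucinated or injected price can fail the order but cannot change what the customer pays.&lt;/p&gt;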

&lt;p&gt;Same for identity: we never read &lt;code&gt;customer_id&lt;/code&gt; from a tool argument. Identity resolves exclusively from the Bearer JWT via &lt;code&gt;bearer_client&lt;/code&gt; in &lt;code&gt;python-agents/lib/supabase_clients.py&lt;/code&gt;. A hallucinating model and a prompt-injection adversary are indistinguishable at the tool-call boundary — the defense covers both.&lt;/p&gt;




&lt;h2&gt;
  
  
  Parity Contracts: The Pattern
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Definition.&lt;/strong&gt; Let &lt;em&gt;f_A&lt;/em&gt; and &lt;em&gt;f_B&lt;/em&gt; be deterministic safety classifiers implemented in distinct runtimes &lt;em&gt;A&lt;/em&gt; and &lt;em&gt;B&lt;/em&gt;. A &lt;em&gt;parity contract&lt;/em&gt; between them consists of three obligations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Equivalence:&lt;/strong&gt; &lt;code&gt;∀s. f_A(s) = f_B(s)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared test corpus:&lt;/strong&gt; a finite set of labeled inputs that both implementations must pass, covering positive, negative, and boundary cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI enforcement:&lt;/strong&gt; a gate that evaluates both implementations against the corpus on every commit and blocks deployment on any disagreement&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;The key word is &lt;em&gt;deterministic&lt;/em&gt;. This pattern applies to regex classifiers, finite-state automata, and rule engines — not probabilistic LLM-based guardrails. Deterministic classifiers admit byte-equivalence testing. That property is what makes CI-gated equivalence possible.&lt;/p&gt;
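&lt;p&gt;The three obligations can be sketched in a few lines of Python. Everything here is illustrative: &lt;code&gt;f_a&lt;/code&gt; and &lt;code&gt;f_b&lt;/code&gt; are toy classifiers (the second one deliberately drifted), the corpus is invented, and &lt;code&gt;parity_failures&lt;/code&gt; is a hypothetical helper that a CI job could call and fail on when the result is non-empty.&lt;/p&gt;

```python
import re

# Two hypothetical implementations of the same deterministic classifier.
_PATTERN_A = re.compile(r"\bpeanuts?\b", re.IGNORECASE)
_PATTERN_B = re.compile(r"\bpeanut\b", re.IGNORECASE)  # drifted: misses the plural


def f_a(text):
    return bool(_PATTERN_A.search(text))


def f_b(text):
    return bool(_PATTERN_B.search(text))


# Shared labeled corpus: (input, expected verdict).
CORPUS = [
    ("is there peanut in this?", True),
    ("any peanuts in the cookie?", True),
    ("one oat latte please", False),
]


def parity_failures(classifier_a, classifier_b, corpus):
    """Return entries where the implementations disagree with each other or the label."""
    failures = []
    for text, expected in corpus:
        got_a, got_b = classifier_a(text), classifier_b(text)
        if got_a != got_b or got_a != expected:
            failures.append((text, expected, got_a, got_b))
    return failures
```

&lt;p&gt;A finite corpus can only approximate the universally quantified equivalence obligation, which is why the corpus needs boundary cases such as the plural above.&lt;/p&gt;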




&lt;h2&gt;
  
  
  Instantiation: The Three-Layer Allergen Kill Switch
&lt;/h2&gt;

&lt;p&gt;The concrete implementation is our allergen safety gate. The failure mode is high-stakes: a hallucinated claim that a drink is peanut-free can cause anaphylaxis. Prompts can be circumvented and system instructions can be bypassed, so the gate cannot live in the LLM's reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1 — Pre-LLM interception (&lt;code&gt;lib/safety/allergen.py&lt;/code&gt;):&lt;/strong&gt;&lt;br&gt;
Before any user message reaches Anthropic or Gemini, a synchronous regex engine intercepts it. &lt;code&gt;ALLERGEN_KEYWORDS&lt;/code&gt;, &lt;code&gt;MEDICAL_KEYWORDS&lt;/code&gt;, and &lt;code&gt;DIETARY_SAFETY_KEYWORDS&lt;/code&gt; patterns run against the raw text. If any match, the request is blocked before a single token is billed, returning the canonical &lt;code&gt;ALLERGEN_SAFE_RESPONSE&lt;/code&gt; string. Median latency: &lt;strong&gt;3.4 μs&lt;/strong&gt;.&lt;/p&gt;
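&lt;p&gt;A rough sketch of what a pre-LLM gate of this kind might look like. The patterns and the response text below are placeholders, since the article does not list the real keyword sets, and &lt;code&gt;pre_llm_gate&lt;/code&gt; is a hypothetical function name.&lt;/p&gt;

```python
import re

# Placeholder patterns and response text; the real constants are not shown in the article.
ALLERGEN_KEYWORDS = re.compile(r"\b(?:allerg\w*|peanuts?|gluten|dairy)\b", re.IGNORECASE)
MEDICAL_KEYWORDS = re.compile(r"\b(?:anaphyla\w*|epipen|celiac)\b", re.IGNORECASE)
ALLERGEN_SAFE_RESPONSE = "I can't confirm allergen safety. Please ask a barista."


def pre_llm_gate(user_message):
    """Return the canned response if the message trips a pattern, else None."""
    for pattern in (ALLERGEN_KEYWORDS, MEDICAL_KEYWORDS):
        if pattern.search(user_message):
            return ALLERGEN_SAFE_RESPONSE
    return None
```

&lt;p&gt;A caller would check the return value first and only forward the message to the model when it is &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;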

&lt;p&gt;&lt;strong&gt;Layer 2 — Mid-stream scrubbing (&lt;code&gt;lib/chat/allergen-safety.ts&lt;/code&gt;):&lt;/strong&gt;&lt;br&gt;
If a prompt evades Layer 1, the outbound token stream is wrapped by &lt;code&gt;scrubbing_text_stream()&lt;/code&gt;, which maintains a 50-character lookahead buffer. &lt;code&gt;DANGEROUS_REPLY_RE&lt;/code&gt; matches patterns like &lt;code&gt;\b100%\s+(?:\w+[- ])?free\b&lt;/code&gt; and &lt;code&gt;\bguaranteed\s+(?:safe|free)\b&lt;/code&gt;. A match mid-flight breaks the SSE stream and substitutes &lt;code&gt;ALLERGEN_SAFE_RESPONSE&lt;/code&gt; before any byte of the dangerous assurance reaches the client.&lt;/p&gt;
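&lt;p&gt;The lookahead idea can be sketched in Python as a generator. This is not the production code: it reuses the two patterns quoted above, uses a placeholder response string, and assumes the stream arrives as an iterable of text chunks.&lt;/p&gt;

```python
import re

# The two patterns quoted above, joined into one expression for this sketch.
DANGEROUS_REPLY_RE = re.compile(
    r"\b100%\s+(?:\w+[- ])?free\b|\bguaranteed\s+(?:safe|free)\b",
    re.IGNORECASE,
)
ALLERGEN_SAFE_RESPONSE = "I can't confirm allergen safety. Please ask a barista."  # placeholder
LOOKAHEAD = 50


def scrubbing_text_stream(chunks):
    """Yield text with a held-back tail; abort with the canned reply on a match."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if DANGEROUS_REPLY_RE.search(buffer):
            yield ALLERGEN_SAFE_RESPONSE
            return
        # Release everything except the last LOOKAHEAD characters.
        if len(buffer) > LOOKAHEAD:
            yield buffer[:-LOOKAHEAD]
            buffer = buffer[-LOOKAHEAD:]
    yield buffer
```

&lt;p&gt;Holding back the tail is what lets a phrase split across chunks be caught before its first half is sent. In this sketch the protection only covers phrases shorter than the window, and text released before the match has already gone out.&lt;/p&gt;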

&lt;p&gt;&lt;strong&gt;Layer 3 — Post-response audit:&lt;/strong&gt;&lt;br&gt;
Every safety interception is logged to &lt;code&gt;franklin_safety_audit&lt;/code&gt; in Supabase. Layer 3 is explicitly characterized as best-effort forensic evidence — AWS Lambda's fire-and-forget execution model means absence of an audit row is not proof of non-execution. Positive evidence only.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Parity Test: Parsing TypeScript from Python
&lt;/h2&gt;

&lt;p&gt;The parity contract enforcement lives in &lt;code&gt;python-agents/tests/safety/test_allergen_parity.py&lt;/code&gt;. Its mechanism is worth examining:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read &lt;code&gt;src/lib/chat/allergen-safety.ts&lt;/code&gt; from the repository filesystem at test time&lt;/li&gt;
&lt;li&gt;Regex-extract each declaration of the form &lt;code&gt;export const NAME = /pattern/i;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Compile each extracted pattern under Python's &lt;code&gt;re&lt;/code&gt; engine with &lt;code&gt;re.IGNORECASE&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Assert behavioral equivalence against a 90-case battery for all four named regexes&lt;/li&gt;
&lt;li&gt;Assert that &lt;code&gt;ALLERGEN_SAFE_RESPONSE&lt;/code&gt; is byte-identical between the TypeScript template literal and the Python string constant&lt;/li&gt;
&lt;/ol&gt;
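&lt;p&gt;Steps 1 to 3 might look roughly like the following. The TypeScript source is inlined as a string so the sketch is self-contained, its two declarations are invented, and &lt;code&gt;DECLARATION_RE&lt;/code&gt; and &lt;code&gt;extract_patterns&lt;/code&gt; are hypothetical names rather than the ones in &lt;code&gt;test_allergen_parity.py&lt;/code&gt;.&lt;/p&gt;

```python
import re

# Stand-in for the contents of the TypeScript file; these two declarations are invented.
TS_SOURCE = r'''
export const ALLERGEN_KEYWORDS = /\b(?:peanuts?|gluten)\b/i;
export const DANGEROUS_REPLY_RE = /\bguaranteed\s+(?:safe|free)\b/i;
'''

# Matches `export const NAME = /pattern/i;` and captures NAME and pattern.
DECLARATION_RE = re.compile(r"^export const (\w+) = /(.+)/i;\s*$", re.MULTILINE)


def extract_patterns(ts_source):
    """Map each exported regex name to the pattern compiled under Python's re."""
    return {
        name: re.compile(body, re.IGNORECASE)
        for name, body in DECLARATION_RE.findall(ts_source)
    }
```

&lt;p&gt;This approach relies on the patterns staying inside the syntax both engines share. The two dialects differ in places (for example, &lt;code&gt;\w&lt;/code&gt; is Unicode-aware for Python strings but ASCII-only in JavaScript), so a pattern that compiles on both sides can still behave differently on non-ASCII input.&lt;/p&gt;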

&lt;p&gt;This is structurally stronger than testing two independent implementations against a shared corpus. The test &lt;strong&gt;parses the TypeScript source and recompiles it under Python's engine&lt;/strong&gt;. The most common parity-bug shape, where an engineer edits the regex on one side and forgets the other, is caught by construction. A change to the TypeScript regex with no matching change on the Python side cannot silently alter behavior on the battery: either the edited pattern still agrees with the Python side on all 90 cases, so parity holds as far as the battery can tell, or it disagrees and CI blocks deployment.&lt;/p&gt;

&lt;p&gt;The 90-case battery breaks down as: 27 allergen-positive, 27 medical-positive, 10 dietary-safety-positive, 19 dangerous-reply-positive, 7 negative controls.&lt;/p&gt;




&lt;h2&gt;
  
  
  Red-Team Results
&lt;/h2&gt;

&lt;p&gt;We evaluated against a 100-prompt corpus (75 adversarial + 25 benign controls) across four categories:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Executed&lt;/th&gt;
&lt;th&gt;Block rate&lt;/th&gt;
&lt;th&gt;False positive rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A — Allergen bypass&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B — Price/identity (commerce language)&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D — Cross-runtime / Unicode&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;N — Benign controls&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two honest Layer-1 gaps surfaced: &lt;code&gt;sulfites&lt;/code&gt; didn't match the bare &lt;code&gt;\bsulfite\b&lt;/code&gt; pattern (plural boundary issue) and the "does this contain any nuts?" form pushed keywords outside the &lt;code&gt;.{0,30}&lt;/code&gt; window. Both are Layer-2 caught — the kill switch held — but the gate placement was one layer deeper than ideal. Both are documented as actionable findings rather than silently fixed before publication.&lt;/p&gt;
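&lt;p&gt;The plural gap is easy to reproduce in isolation. The sample question below is invented, and the &lt;code&gt;sulfites?&lt;/code&gt; variant is shown only as one possible fix, not as the change that was made.&lt;/p&gt;

```python
import re

bare = re.compile(r"\bsulfite\b", re.IGNORECASE)
plural_aware = re.compile(r"\bsulfites?\b", re.IGNORECASE)

question = "Does the wine syrup have sulfites?"

# No word boundary exists between "sulfite" and the trailing "s", so the bare form misses.
bare_hit = bare.search(question) is not None
plural_hit = plural_aware.search(question) is not None
```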

&lt;p&gt;Layer-1 p99 latency: &lt;strong&gt;8.79 μs&lt;/strong&gt;, more than three orders of magnitude below the 10–50 ms cost of an intra-region HTTPS round trip to a centralized safety service.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Use This Pattern
&lt;/h2&gt;

&lt;p&gt;The parity contract is worth the maintenance cost when &lt;strong&gt;all three&lt;/strong&gt; of these hold:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Two or more runtimes can independently produce customer-facing or customer-adjacent output&lt;/li&gt;
&lt;li&gt;The safety property is deterministic (regex, automaton, rule engine)&lt;/li&gt;
&lt;li&gt;Runtime ownership crosses team or language boundaries — making silent divergence likely in practice&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If only one runtime produces customer-facing output, enforce the gate once. If the safety property is probabilistic, the parity contract is the wrong shape — you need distributional equivalence, which is a harder problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The HMAC Wire Contract
&lt;/h2&gt;

&lt;p&gt;One more primitive worth naming: every request from the Next.js edge to Cloud Run is signed with HMAC-SHA256 via &lt;code&gt;internal-hmac.ts&lt;/code&gt; (TypeScript) and verified by &lt;code&gt;hmac_auth.py&lt;/code&gt; (Python) as ASGI middleware before any ADK invocation. The shared secret lives in disjoint Doppler configs for the two runtimes. Timestamp freshness is enforced at 60 seconds. This is a second parity contract — same shape, different domain — enforced by CODEOWNERS rules requiring a security-tagged reviewer for co-modification of either file.&lt;/p&gt;
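&lt;p&gt;A sketch of the verifying side in Python, using only the standard library. The message layout (timestamp, a dot, then the raw body) and the function names are assumptions made for illustration. The article specifies only HMAC-SHA256 and the 60-second freshness window.&lt;/p&gt;

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 60  # freshness window stated in the article


def sign(secret, timestamp, body):
    """Return a hex HMAC-SHA256 over the timestamp and raw body (layout is an assumption)."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret, timestamp, body, signature, now=None):
    """Accept only fresh requests whose signature matches in constant time."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign(secret, timestamp, body)
    return hmac.compare_digest(expected, signature)
```

&lt;p&gt;Including the timestamp in the signed message is what makes the freshness check meaningful: an attacker who replays an old request cannot update the timestamp without invalidating the signature.&lt;/p&gt;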




&lt;h2&gt;
  
  
  The Full Paper and Corpus
&lt;/h2&gt;

&lt;p&gt;The complete paper, formal definitions, ablation methodology, and the 100-prompt red-team corpus with executable runner are available at:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/BrewHubPHL/allergen-parity-corpus" rel="noopener noreferrer"&gt;github.com/BrewHubPHL/allergen-parity-corpus&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The runner executes in local in-process mode against the Python safety layer, with staging-instance mode (full SSE against live infrastructure) described for camera-ready validation. The corpus is released open-source so reviewers can extend the battery and re-execute the methodology end-to-end.&lt;/p&gt;

&lt;p&gt;The pattern is language-agnostic. Replace TypeScript and Python with any two languages, replace the allergen regex with any deterministic classifier, replace Jest and pytest with any shared runner. The three adoption conditions above are necessary and sufficient.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>opensource</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Beyond Chatbots: How I use Gemini 2.5 and Supabase to run a fully automated retention team</title>
      <dc:creator>BrewHubPHL</dc:creator>
      <pubDate>Wed, 29 Apr 2026 00:23:22 +0000</pubDate>
      <link>https://dev.to/brewhubphl/beyond-chatbots-how-i-use-gemini-25-and-supabase-to-run-a-fully-automated-retention-team-130l</link>
      <guid>https://dev.to/brewhubphl/beyond-chatbots-how-i-use-gemini-25-and-supabase-to-run-a-fully-automated-retention-team-130l</guid>
      <description>&lt;p&gt;When most developers think about integrating AI into their apps, the default move is to build a chatbot. But for my coffee shop app, &lt;a href="https://brewhubphl.com" rel="noopener noreferrer"&gt;BrewHub PHL&lt;/a&gt;, I didn't want users talking to an AI. I wanted the AI doing the heavy lifting in the background.&lt;/p&gt;

&lt;p&gt;Customer retention is notoriously hard for local businesses. Figuring out who hasn't visited in a while, drafting a personalized message, and issuing a custom discount code usually takes hours of manual marketing work.&lt;/p&gt;

&lt;p&gt;I decided to fully automate this using Gemini 2.5 Flash, Supabase, and a simple weekly cron job. Here is how I built a headless AI retention agent that automatically wins back lapsed customers while I sleep.&lt;/p&gt;

&lt;h2&gt;The Architecture&lt;/h2&gt;

&lt;p&gt;The pipeline runs every Monday at 10 AM and consists of four steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Data Layer: A Supabase RPC finds eligible lapsed customers.&lt;/li&gt;
&lt;li&gt;The Privacy Layer: The script strips all Personally Identifiable Information (PII) before it touches the LLM.&lt;/li&gt;
&lt;li&gt;The Brains: The Gemini API generates hyper-personalized SMS messages and forces the output into strict JSON.&lt;/li&gt;
&lt;li&gt;The Execution: The system generates physical POS vouchers and sends the SMS via Twilio.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Step 1: Finding Eligible Customers&lt;/h2&gt;

&lt;p&gt;I didn't want to spam one-off visitors. To find the right targets, I wrote a Postgres RPC in Supabase called &lt;code&gt;get_lapsed_customers_eligible_for_retention&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It filters the database for users who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Have ordered at least 3 times (loyal customers).&lt;/li&gt;
&lt;li&gt;Haven't ordered in the last 14 days.&lt;/li&gt;
&lt;li&gt;Haven't received a marketing voucher in the last 90 days (the cooldown period).&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;-- The Supabase RPC handles the heavy data filtering instantly
SELECT id, full_name, phone, favorite_drink, days_since_last_visit
FROM get_lapsed_customers_eligible_for_retention(
  p_min_orders := 3,
  p_lapsed_days := 14,
  p_cooldown_days := 90,
  p_batch_limit := 10
);
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Step 2: Privacy by Design&lt;/h2&gt;

&lt;p&gt;Sending raw customer data to an LLM is a terrible idea. Before the data leaves my server, the script maps the Supabase response to a strictly anonymous payload.&lt;/p&gt;

&lt;p&gt;Names and phone numbers are dropped. Gemini only sees the customer_id, their favorite_drink, and days_since_last_visit.&lt;/p&gt;
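&lt;p&gt;The mapping can be sketched as an allow-list, shown here in Python rather than the JavaScript used in the actual script. &lt;code&gt;ALLOWED_FIELDS&lt;/code&gt; and &lt;code&gt;to_anonymous_payload&lt;/code&gt; are hypothetical names, and the rename from &lt;code&gt;id&lt;/code&gt; to &lt;code&gt;customer_id&lt;/code&gt; is inferred from the column list in Step 1.&lt;/p&gt;

```python
ALLOWED_FIELDS = ("customer_id", "favorite_drink", "days_since_last_visit")


def to_anonymous_payload(rows):
    """Keep only allow-listed fields; the RPC's `id` column becomes `customer_id`."""
    payload = []
    for row in rows:
        renamed = {**row, "customer_id": row["id"]}
        payload.append({field: renamed[field] for field in ALLOWED_FIELDS})
    return payload
```

&lt;p&gt;An allow-list fails safe: if a new column is later added to the RPC, it stays out of the prompt unless someone explicitly adds it.&lt;/p&gt;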

&lt;h2&gt;Step 3: Prompting for Structured JSON with Gemini 2.5&lt;/h2&gt;

&lt;p&gt;This is where the magic happens. I don't just want Gemini to write a message; I need it to return an array of objects that my code can iterate over to send SMS messages.&lt;/p&gt;

&lt;p&gt;Using the official &lt;code&gt;@google/generative-ai&lt;/code&gt; SDK, I pass &lt;code&gt;responseMimeType: "application/json"&lt;/code&gt; so the model returns JSON rather than free-form text. That setting constrains the format, not the shape, so the parsed array should still be checked for the expected fields before use.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({
  model: "gemini-2.5-flash",
  generationConfig: {
    // Ask the API for JSON output instead of free-form text
    responseMimeType: "application/json",
  },
});

const prompt = `
You are the BrewHub PHL Retention Agent. I will provide a list of anonymous customer profiles.
For each customer, write a short, friendly, and highly personalized SMS text message under 160 characters.
Acknowledge that we haven't seen them in a while, mention their favorite drink by name, and offer them a $5 voucher to come back.

Return ONLY a JSON array with this exact structure:
[
  { "customer_id": "uuid-here", "sms_message": "Hey! It's been a while..." }
]

Customer Data:
${JSON.stringify(anonymousCustomerList)}
`;

// generateContent resolves to a result object; the text is on result.response
const result = await model.generateContent(prompt);
const aiDecisions = JSON.parse(result.response.text());
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because Gemini 2.5 Flash is incredibly fast, this entire batch generation takes just a few seconds.&lt;/p&gt;
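&lt;p&gt;Before acting on the parsed array it is worth checking its shape. The following is a sketch in Python (the pipeline itself is JavaScript), and &lt;code&gt;parse_decisions&lt;/code&gt; is a hypothetical helper: it keeps only entries whose &lt;code&gt;customer_id&lt;/code&gt; was actually in the batch and whose message fits the 160-character limit requested in the prompt.&lt;/p&gt;

```python
import json

MAX_SMS_CHARS = 160


def parse_decisions(raw_text, known_customer_ids):
    """Parse the model's JSON and keep only well-formed entries for known customers."""
    data = json.loads(raw_text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    valid = []
    for item in data:
        if not isinstance(item, dict):
            continue
        customer_id = item.get("customer_id")
        message = item.get("sms_message")
        if not isinstance(customer_id, str) or customer_id not in known_customer_ids:
            continue  # drop ids the model invented
        if not isinstance(message, str) or not message or len(message) > MAX_SMS_CHARS:
            continue
        valid.append({"customer_id": customer_id, "sms_message": message})
    return valid
```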

&lt;h2&gt;Step 4: Fulfillment and SMS&lt;/h2&gt;

&lt;p&gt;Once Gemini hands back the clean JSON array of messages, my cron job loops through the results.&lt;/p&gt;

&lt;p&gt;For each &lt;code&gt;customer_id&lt;/code&gt;, it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generates a unique secure voucher code (e.g., 5OFF-A3F9C1).&lt;/li&gt;
&lt;li&gt;Inserts that active voucher into my Supabase &lt;code&gt;vouchers&lt;/code&gt; table.&lt;/li&gt;
&lt;li&gt;Appends the code to Gemini's personalized message: "Show this code to one of our baristas: 5OFF-A3F9C1".&lt;/li&gt;
&lt;li&gt;Dispatches the final message via Twilio.&lt;/li&gt;
&lt;/ol&gt;
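&lt;p&gt;Code generation and message assembly might look like this in Python. The format is inferred from the example code 5OFF-A3F9C1 (a prefix plus six hex characters), and both function names are hypothetical. The database insert and the Twilio call are left out.&lt;/p&gt;

```python
import secrets


def new_voucher_code(prefix="5OFF"):
    """Return a code such as 5OFF-A3F9C1: prefix plus six random hex characters."""
    return f"{prefix}-{secrets.token_hex(3).upper()}"


def final_sms(ai_message, code):
    """Append the redemption line quoted in the article to the model's message."""
    return f"{ai_message} Show this code to one of our baristas: {code}"
```

&lt;p&gt;Six hex characters give about 16.7 million possible codes, so collisions are unlikely but not impossible. A unique constraint on the voucher table, with a retry on conflict, would be the usual way to make uniqueness certain.&lt;/p&gt;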

&lt;h2&gt;The Result&lt;/h2&gt;

&lt;p&gt;By moving AI out of the chat window and into a scheduled backend worker, the system feels like magic. Customers get a highly personalized text referencing their actual favorite order, complete with a working POS discount code, and I don't have to lift a finger.&lt;/p&gt;

&lt;p&gt;The Gemini API's strict JSON output makes it incredibly reliable for server-to-server data pipelines, proving that the real power of modern LLMs is as a background reasoning engine.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you want to see the Supabase + Next.js architecture in action (or just want to order some coffee in Philadelphia), you can check out the live web app at &lt;a href="https://brewhubphl.com" rel="noopener noreferrer"&gt;brewhubphl.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>googleaichallenge</category>
      <category>javascript</category>
      <category>webdev</category>
      <category>supabase</category>
    </item>
    <item>
      <title>I built an Infinite AI Debate Arena using the GitHub Copilot CLI 🥊</title>
      <dc:creator>BrewHubPHL</dc:creator>
      <pubDate>Wed, 11 Feb 2026 19:29:41 +0000</pubDate>
      <link>https://dev.to/brewhubphl/i-built-an-infinite-ai-debate-arena-using-the-github-copilot-cli-421l</link>
      <guid>https://dev.to/brewhubphl/i-built-an-infinite-ai-debate-arena-using-the-github-copilot-cli-421l</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-01-21"&gt;GitHub Copilot CLI Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;💡 The Idea&lt;/h2&gt;

&lt;p&gt;
What happens if you lock two AI personalities in a room and force them to argue about "Is a hotdog a sandwich?" forever?&lt;/p&gt;

&lt;p&gt;For the GitHub Copilot CLI Challenge, I didn't want to just build a utility tool. I wanted to build something chaotic. Enter the AI Debate Arena.&lt;/p&gt;

&lt;p&gt;It’s a terminal-based "fighting game" where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Captain Capslock (An angry Boomer) fights Lil' Zoomer (A Gen-Z teen).&lt;/li&gt;
&lt;li&gt;They argue in an infinite loop.&lt;/li&gt;
&lt;li&gt;Sentiment Analysis determines who is "winning" (getting angrier).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;🎥 The Demo&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://youtu.be/-RBdUKZY9zA" rel="noopener noreferrer"&gt;https://youtu.be/-RBdUKZY9zA&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;🛠️ How it Works&lt;/h2&gt;

&lt;p&gt;
The project uses Python to orchestrate the chaos, but the "brains" are entirely powered by the GitHub Copilot CLI.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The "Persona Injection"
I used the gh copilot explain command to generate the dialogue. By injecting a specific persona into the prompt, we can force Copilot to break character.&lt;/li&gt;
&lt;/ol&gt;

&lt;pre&gt;&lt;code&gt;# The Secret Sauce
prompt = f"{persona} Your opponent said: '{last_response}'. Reply in one short, funny sentence."
cmd = ["gh", "copilot", "explain", "-p", prompt]
&lt;/code&gt;&lt;/pre&gt;

&lt;ol start="2"&gt;
&lt;li&gt;The Loop
&lt;p&gt;The script creates a feedback loop:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Fighter A generates a response using Copilot.&lt;/li&gt;
&lt;li&gt;TextBlob analyzes the sentiment (Politeness = Weakness, Anger = Power).&lt;/li&gt;
&lt;li&gt;Fighter B takes that response and generates a counter-argument.&lt;/li&gt;
&lt;li&gt;Rich renders the ASCII faces and health bars in real-time.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
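&lt;p&gt;The shape of that loop, stripped of the UI, can be sketched as follows. This is an illustration and not the project's code: &lt;code&gt;run_debate&lt;/code&gt; is a hypothetical function, and &lt;code&gt;generate&lt;/code&gt; and &lt;code&gt;score&lt;/code&gt; are passed in as plain callables. In the real project they would presumably wrap the Copilot CLI call and TextBlob's polarity score, and rendering with Rich is omitted.&lt;/p&gt;

```python
def run_debate(generate, score, personas, opening, rounds):
    """Alternate two fighters; each reply becomes the next fighter's input."""
    last_response = opening
    transcript = []
    for turn in range(rounds):
        name, persona = personas[turn % 2]
        reply = generate(persona, last_response)
        # Lower polarity means an angrier reply, which the article treats as stronger.
        transcript.append((name, reply, -score(reply)))
        last_response = reply
    return transcript
```

&lt;p&gt;Passing the generator and scorer in as arguments keeps the loop testable without network access or a logged-in CLI.&lt;/p&gt;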

&lt;h2&gt;🎨 The Tech Stack&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub CLI (&lt;code&gt;gh&lt;/code&gt;): The AI engine.&lt;/li&gt;
&lt;li&gt;Rich: For the beautiful terminal UI and layouts.&lt;/li&gt;
&lt;li&gt;TextBlob: For the "Rage Meter" logic.&lt;/li&gt;
&lt;li&gt;Python: To glue it all together.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;🏆 The Outcome&lt;/h2&gt;

&lt;p&gt;
Sometimes they argue about politics, sometimes about cereal. The CLI handles the roleplay surprisingly well. My favorite line so far?&lt;/p&gt;

&lt;p&gt;"Milk-first people stay taking Ls fr fr, that's giving unhinged villain energy." — Lil' Zoomer&lt;/p&gt;

&lt;h2&gt;🔗 The Code&lt;/h2&gt;

&lt;p&gt;Check out the repository to run it yourself: &lt;a href="https://github.com/BrewHubPHL/ai-debate.git" rel="noopener noreferrer"&gt;https://github.com/BrewHubPHL/ai-debate.git&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>cli</category>
      <category>githubcopilot</category>
    </item>
  </channel>
</rss>
