DEV Community: BrewHubPHL

Don't Trust Your LLM's Safety Promises Across Runtimes

BrewHubPHL — Sun, 24 May 2026 01:30:28 +0000

Most LLM safety guardrails are built with a silent assumption baked in: all your customer-facing traffic runs through a single runtime. One process. One in-process safety check. Done.

That assumption breaks the moment you deploy a polyglot stack.

This is a write-up of a pattern we call parity contracts — a deployable security primitive for LLM commerce agents that span multiple runtimes. We implemented it in production at BrewHub PHL, a Philadelphia café whose AI agent, Franklin, places orders, charges customer wallets, and issues loyalty mutations without a human approval step. The full academic paper, red-team corpus, and parity test runner are open-source at github.com/BrewHubPHL/allergen-parity-corpus.

The Problem: Polyglot Deployments Break In-Process Safety

BrewHub's architecture spans three runtimes:

Runtime 1 — Next.js on Netlify (AWS Lambda): customer-facing SSE chat endpoint, Franklin's tool calls, price recompute
Runtime 2 — Netlify Functions (AWS Lambda): POS checkout, payment processing, Square webhook handlers
Runtime 3 — Google Cloud Run (Python ADK): six specialized workflow agents — ops, marketing, barista training, service recovery, concierge, provenance storyteller

Every existing LLM safety library — Llama Guard, NeMo Guardrails, LlamaFirewall, MCP-Guard — assumes a single serving runtime. Their guarantee is an in-process guarantee: if traffic goes through this runtime, the gate holds. That's fine for single-runtime deployments. It's a liability when you have independently reachable runtimes that each produce customer-facing or customer-adjacent output.

In our case: the workflow-agent path on Cloud Run can produce text that eventually reaches a customer via email or staff review. If the safety gate only lives in the Next.js Lambda, and someone engineers a path to Cloud Run, the gate is missing.

This is the deployed-agent gap. We formalize it as Adversary D — the runtime-bypass attacker: an adversary who discovers or engineers a path that reaches a secondary runtime while bypassing the primary safety gate.

The Core Insight: Treat LLM Tool Arguments as Untrusted Input

Before getting to parity contracts specifically, the foundational reclassification:

LLM tool arguments should be classified as untrusted input on par with browser JSON.

Every production web developer knows to re-derive truth server-side rather than trusting client-supplied values. We apply the same rule to the tool-call boundary. When Franklin's place_order tool fires, our backend (_pricing.js) ignores whatever price_cents the model supplied and re-fetches merch_products.price_cents and modifiers.price_delta_cents directly from Supabase. If the model-supplied total drifts from the server-computed total by a single penny, the transaction fails.
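Here is a minimal Python sketch of that re-derivation rule. It is an illustration, not the `_pricing.js` code. The in-memory `PRODUCTS` and `MODIFIERS` dicts are hypothetical stand-ins for the `merch_products` and `modifiers` tables, and every function and field name below is assumed.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for merch_products.price_cents and
# modifiers.price_delta_cents; a real backend would query the database.
PRODUCTS = {"latte": 550, "cold_brew": 500}
MODIFIERS = {"oat_milk": 75, "extra_shot": 100}


class PriceMismatch(Exception):
    """Raised when the model-supplied total differs from the server total."""


@dataclass
class LineItem:
    product_id: str
    modifier_ids: list[str]
    quantity: int = 1


def server_total_cents(items: list[LineItem]) -> int:
    """Recompute the order total from server-side prices only."""
    total = 0
    for item in items:
        unit = PRODUCTS[item.product_id]  # unknown IDs raise KeyError
        unit += sum(MODIFIERS[m] for m in item.modifier_ids)
        total += unit * item.quantity
    return total


def check_order(items: list[LineItem], model_total_cents: int) -> int:
    """Ignore model-supplied prices; fail on even a one-cent drift."""
    expected = server_total_cents(items)
    if model_total_cents != expected:
        raise PriceMismatch(f"model said {model_total_cents}, server says {expected}")
    return expected
```

With these example prices, a latte with oat milk plus two cold brews comes to 625 + 1000 = 1625 cents. A model-supplied total of 1624 is rejected.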

Same for identity: we never read customer_id from a tool argument. Identity resolves exclusively from the Bearer JWT via bearer_client in python-agents/lib/supabase_clients.py. A hallucinating model and a prompt-injection adversary are indistinguishable at the tool-call boundary — the defense covers both.
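The identity rule can be sketched the same way. This Python sketch makes several assumptions. The bearer token is taken to be an HS256-signed JWT whose `sub` claim holds the customer ID. `resolve_customer_id` and `sign_hs256` are hypothetical helpers, not the `bearer_client` API. A maintained JWT library is the safer choice in real code.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Hypothetical helper for the example: build an HS256 JWT."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def resolve_customer_id(authorization: str, tool_args: dict, secret: bytes) -> str:
    """Identity comes only from the verified bearer token, never from tool_args."""
    if not authorization.startswith("Bearer "):
        raise PermissionError("missing bearer token")
    header_b64, payload_b64, sig_b64 = authorization[len("Bearer "):].split(".")
    # Accept only HS256 (an assumption) so a forged "alg" cannot change the check.
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise PermissionError("unexpected alg")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) < time.time():  # tokens without exp are rejected too
        raise PermissionError("token expired")
    # Any customer_id the model placed in tool_args is deliberately ignored.
    return claims["sub"]
```

If the model hallucinates a `customer_id`, or an injected prompt supplies one, that value never reaches the lookup. The only identity that counts is the one the signature vouches for.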

Parity Contracts: The Pattern

Definition. Let f_A and f_B be deterministic safety classifiers implemented in distinct runtimes A and B. A parity contract between them consists of three obligations:

Equivalence: ∀s. f_A(s) = f_B(s)

Shared test corpus: a finite set of labeled inputs that both implementations must pass, covering positive, negative, and boundary cases

CI enforcement: a gate that evaluates both implementations against the corpus on every commit and blocks deployment on any disagreement

The key word is deterministic. This pattern applies to regex classifiers, finite-state automata, and rule engines — not probabilistic LLM-based guardrails. Deterministic classifiers admit byte-equivalence testing. That property is what makes CI-gated equivalence possible.
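The three obligations fit into a small harness. In this Python sketch, `f_a` and `f_b` are any two deterministic classifiers exposed to one process. How the second runtime is actually invoked is left out. The example classifiers and corpus are hypothetical.

```python
import re
from typing import Callable, Iterable

Classifier = Callable[[str], bool]


def parity_failures(f_a: Classifier, f_b: Classifier,
                    corpus: Iterable[tuple[str, bool]]) -> list[str]:
    """Describe every corpus case that violates the contract."""
    failures = []
    for text, expected in corpus:
        a, b = f_a(text), f_b(text)
        if a != b:
            # Obligation 1 (equivalence) fails on this input.
            failures.append(f"disagree on {text!r}: A={a}, B={b}")
        elif a != expected:
            # Obligation 2: both sides agree but contradict the shared label.
            failures.append(f"both wrong on {text!r}: got {a}, want {expected}")
    return failures


def ci_gate(f_a: Classifier, f_b: Classifier,
            corpus: list[tuple[str, bool]]) -> int:
    """Obligation 3: a non-zero return code is meant to fail the CI job."""
    failures = parity_failures(f_a, f_b, corpus)
    for line in failures:
        print(line)
    return 1 if failures else 0


# Hypothetical pair: one keyword check handles plurals, the other does not.
def plural_aware(text: str) -> bool:
    return re.search(r"\bpeanuts?\b", text, re.I) is not None


def plural_blind(text: str) -> bool:
    return re.search(r"\bpeanut\b", text, re.I) is not None


CORPUS = [
    ("Any peanuts in this?", True),
    ("Peanut butter latte", True),
    ("Just a latte, thanks", False),
]
```

`ci_gate(plural_aware, plural_blind, CORPUS)` prints one disagreement, on "Any peanuts in this?", and returns 1. In a CI job, that return value would be passed to `sys.exit` so the build fails.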

Instantiation: The Three-Layer Allergen Kill Switch

Our instantiation is the allergen safety gate. The failure mode is specific and high-stakes: a hallucinated claim that a drink is peanut-free can cause anaphylaxis. Prompts can be circumvented, and system instructions can be bypassed. The gate cannot live in the LLM's reasoning.

Layer 1 — Pre-LLM interception (lib/safety/allergen.py):
Before any user message reaches Anthropic or Gemini, a synchronous regex engine intercepts it. ALLERGEN_KEYWORDS, MEDICAL_KEYWORDS, and DIETARY_SAFETY_KEYWORDS patterns run against the raw text. If any match, the request is blocked before a single token is billed, returning the canonical ALLERGEN_SAFE_RESPONSE string. Median latency: 3.4 μs.
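The shape of Layer 1 can be sketched in Python. The three patterns and the response text below are illustrative placeholders, not the production `ALLERGEN_KEYWORDS`, `MEDICAL_KEYWORDS`, `DIETARY_SAFETY_KEYWORDS`, or `ALLERGEN_SAFE_RESPONSE` values. `call_llm` is a hypothetical stand-in for the model client.

```python
import re
from typing import Callable

# Illustrative patterns only; the production keyword sets are not reproduced here.
ALLERGEN_KEYWORDS = re.compile(r"\b(?:peanuts?|tree\s+nuts?|gluten|dairy)\b", re.I)
MEDICAL_KEYWORDS = re.compile(r"\b(?:anaphyla\w*|epipen|celiac)\b", re.I)
DIETARY_SAFETY_KEYWORDS = re.compile(r"\bsafe\b.{0,30}\b(?:allerg\w*|vegan)\b", re.I)

# Placeholder text, not the canonical response string.
ALLERGEN_SAFE_RESPONSE = "I can't confirm allergen safety. Please ask a barista in person."

GATES = (ALLERGEN_KEYWORDS, MEDICAL_KEYWORDS, DIETARY_SAFETY_KEYWORDS)


def handle_message(text: str, call_llm: Callable[[str], str]) -> str:
    """Run the regex gate synchronously, before any model call is made."""
    if any(p.search(text) for p in GATES):
        return ALLERGEN_SAFE_RESPONSE  # no model call, so nothing is billed
    return call_llm(text)
```

The ordering is the point of this design. The model client is only reached after every pattern has declined to match, so a blocked message costs a few regex searches and no tokens.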

Layer 2 — Mid-stream scrubbing (lib/chat/allergen-safety.ts):
If a prompt evades Layer 1, the outbound token stream is wrapped by scrubbing_text_stream(), which maintains a 50-character lookahead buffer. DANGEROUS_REPLY_RE matches patterns like \b100%\s+(?:\w+[- ])?free\b and \bguaranteed\s+(?:safe|free)\b. A match mid-flight breaks the SSE stream and substitutes ALLERGEN_SAFE_RESPONSE before any byte of the dangerous assurance reaches the client.
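The holdback idea can be sketched as a Python generator. This is an illustration under assumptions, not the production `scrubbing_text_stream()`. It keeps the last 50 characters unreleased and rescans the accumulated text after every chunk. On a match, it emits a placeholder safe response instead of the rest of the stream. The two patterns are the ones quoted above; the response string is a placeholder.

```python
import re
from typing import Iterable, Iterator

DANGEROUS_REPLY_RE = re.compile(
    r"\b100%\s+(?:\w+[- ])?free\b|\bguaranteed\s+(?:safe|free)\b", re.I
)
ALLERGEN_SAFE_RESPONSE = "I can't confirm allergen safety. Please ask a barista."  # placeholder
LOOKAHEAD = 50  # characters held back before they are released to the client


def scrub(chunks: Iterable[str]) -> Iterator[str]:
    text = ""    # everything received from the model so far
    emitted = 0  # how many characters of `text` the client has already seen
    for chunk in chunks:
        text += chunk
        if DANGEROUS_REPLY_RE.search(text):
            # Stop the stream; whatever is still held back is never sent.
            yield ALLERGEN_SAFE_RESPONSE
            return
        release_to = max(emitted, len(text) - LOOKAHEAD)
        if release_to > emitted:
            yield text[emitted:release_to]
            emitted = release_to
    if emitted < len(text):
        yield text[emitted:]  # clean end of stream: flush the holdback
```

For example, `"".join(scrub(["This latte is ", "100% nut", "-free and tasty."]))` returns only the safe response. None of the text has been released by the time the match completes, because the accumulated text is still shorter than the 50-character holdback.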

Layer 3 — Post-response audit:
Every safety interception is logged to franklin_safety_audit in Supabase. Layer 3 is explicitly best-effort forensic evidence. Under AWS Lambda's fire-and-forget execution model, a missing audit row does not prove that an interception never happened. An audit row is positive evidence only.

The Parity Test: Parsing TypeScript from Python

The parity contract enforcement lives in python-agents/tests/safety/test_allergen_parity.py. Its mechanism is worth examining:

Read src/lib/chat/allergen-safety.ts from the repository filesystem at test time
Regex-extract each declaration of the form export const NAME = /pattern/i;
Compile each extracted pattern under Python's re engine with re.IGNORECASE
Assert behavioral equivalence against a 90-case battery for all four named regexes
Assert that ALLERGEN_SAFE_RESPONSE is byte-identical between the TypeScript template literal and the Python string constant
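The extract-and-recompile step can be sketched in Python. For self-containment, the TypeScript source is an inline string rather than `src/lib/chat/allergen-safety.ts`, and the declarations and cases are illustrative. The extraction regexes are also an assumption: they expect each declaration on a single line in exactly the `export const NAME = /pattern/i;` form, which is stricter than what TypeScript allows.

```python
import re

# Illustrative stand-in for the contents of src/lib/chat/allergen-safety.ts.
TS_SOURCE = r"""
export const DANGEROUS_REPLY_RE = /\bguaranteed\s+(?:safe|free)\b/i;
export const ALLERGEN_SAFE_RESPONSE = `Please ask a barista about allergens.`;
"""

# Python-side constants that must stay in parity with the TypeScript file.
PY_DANGEROUS_REPLY_RE = re.compile(r"\bguaranteed\s+(?:safe|free)\b", re.I)
PY_ALLERGEN_SAFE_RESPONSE = "Please ask a barista about allergens."

TS_REGEX_DECL = re.compile(r"^export const (\w+) = /(.+)/i;$", re.M)
TS_TEMPLATE_DECL = re.compile(r"^export const (\w+) = `([^`]*)`;$", re.M)


def extract_ts_regexes(source: str) -> dict[str, re.Pattern]:
    """Recompile every /pattern/i literal under Python's re with IGNORECASE."""
    return {name: re.compile(body, re.I) for name, body in TS_REGEX_DECL.findall(source)}


def extract_ts_strings(source: str) -> dict[str, str]:
    """Pull plain template literals (no escapes or ${} substitutions)."""
    return dict(TS_TEMPLATE_DECL.findall(source))


def check_parity(source: str, cases: list[str]) -> list[str]:
    """Return every case where the recompiled TS regex and the Python regex differ."""
    ts_regex = extract_ts_regexes(source)["DANGEROUS_REPLY_RE"]
    diffs = [c for c in cases
             if bool(ts_regex.search(c)) != bool(PY_DANGEROUS_REPLY_RE.search(c))]
    ts_text = extract_ts_strings(source)["ALLERGEN_SAFE_RESPONSE"]
    if ts_text.encode("utf-8") != PY_ALLERGEN_SAFE_RESPONSE.encode("utf-8"):
        diffs.append("ALLERGEN_SAFE_RESPONSE differs byte-for-byte")
    return diffs
```

This approach only works for regex syntax that JavaScript and Python interpret identically. The `\b`, `\s`, and `(?:...)` constructs used here fall in that shared subset. A template literal containing escapes or `${}` substitutions would need more careful handling than a raw comparison.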

This is structurally stronger than testing two independent implementations against a shared corpus, because the test parses the TypeScript source and recompiles it under Python's engine. The most common parity-bug shape is an engineer editing the regex on one side and forgetting the other, and that bug is caught by construction. A TypeScript-only regex change cannot slip through unnoticed. Either the edited regex still agrees with the Python side on every battery case under Python's engine, which preserves parity on the battery (possibly by accident), or it disagrees on at least one case and CI blocks deployment.

The 90-case battery breaks down as: 27 allergen-positive, 27 medical-positive, 10 dietary-safety-positive, 19 dangerous-reply-positive, 7 negative controls.

Red-Team Results

We evaluated against a 100-prompt corpus (75 adversarial + 25 benign controls) across four categories:

Category	Executed	Block rate	False positive rate
A — Allergen bypass	25	100%	n/a
B — Price/identity (commerce language)	20	0%	0%
D — Cross-runtime / Unicode	20	100%	n/a
N — Benign controls	25	0%	0%

Two honest Layer-1 gaps surfaced. First, "sulfites" didn't match the bare \bsulfite\b pattern, a word-boundary problem with the plural. Second, the "does this contain any nuts?" phrasing pushed the keywords outside the .{0,30} window. Layer 2 caught both, so the kill switch held, but the gate fired one layer deeper than intended. Both are documented as actionable findings rather than silently fixed before publication.
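The sulfite gap is easy to reproduce with Python's `re`. A `\b` after `sulfite` needs the next character to be a non-word character, and the trailing `s` is a word character. The plural-aware `\bsulfites?\b` below is one possible remedy, not necessarily the fix that was adopted.

```python
import re

BARE = re.compile(r"\bsulfite\b", re.I)
PLURAL_AWARE = re.compile(r"\bsulfites?\b", re.I)

question = "Do your syrups contain sulfites?"
print(bool(BARE.search(question)))          # False: "s" after "sulfite" is a word character
print(bool(PLURAL_AWARE.search(question)))  # True
```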

Layer-1 p99 latency: 8.79 μs. That is roughly three to four orders of magnitude (about 1,100× to 5,700×) below the 10–50 ms cost of an intra-region HTTPS round trip to a centralized safety service.
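A p99 figure like this can be measured in-process with the standard library. This sketch times one illustrative regex with `time.perf_counter_ns` and takes the 99th percentile with `statistics.quantiles`. Results depend on hardware, interpreter, and pattern set, so it should not be expected to reproduce the 8.79 μs number.

```python
import re
import statistics
import time

GATE = re.compile(r"\b(?:peanuts?|tree\s+nuts?|gluten|dairy)\b", re.I)  # illustrative


def p99_microseconds(messages: list[str], rounds: int = 2000) -> float:
    """Time GATE.search per message and return the 99th percentile in microseconds."""
    samples = []
    for i in range(rounds):
        msg = messages[i % len(messages)]
        start = time.perf_counter_ns()
        GATE.search(msg)
        samples.append((time.perf_counter_ns() - start) / 1000)  # ns to microseconds
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    return statistics.quantiles(samples, n=100)[98]
```

Calling `p99_microseconds(["Is the muffin gluten free?", "One large latte please"])` returns one machine-specific number. Comparing it against a measured round-trip time to a remote service gives the ratio quoted above.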

When to Use This Pattern

The parity contract is worth the maintenance cost when all three of these hold:

Two or more runtimes can independently produce customer-facing or customer-adjacent output
The safety property is deterministic (regex, automaton, rule engine)
Runtime ownership crosses team or language boundaries — making silent divergence likely in practice

If only one runtime produces customer-facing output, enforce the gate once. If the safety property is probabilistic, the parity contract is the wrong shape — you need distributional equivalence, which is a harder problem.

The HMAC Wire Contract

One more primitive worth naming: every request from the Next.js edge to Cloud Run is signed with HMAC-SHA256 via internal-hmac.ts (TypeScript). The signature is verified by hmac_auth.py (Python), running as ASGI middleware, before any ADK invocation. The shared secret lives in disjoint Doppler configs for the two runtimes. Timestamp freshness is enforced at 60 seconds. This is a second parity contract, with the same shape in a different domain. It is enforced by CODEOWNERS rules that require a security-tagged reviewer for any change to either file.
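Both halves of the wire contract can be sketched in Python with the standard `hmac` module. The `X-Timestamp`/`X-Signature` header names, the `timestamp.METHOD.path.body` signing string, and the function names are all assumptions for illustration. They are not taken from `internal-hmac.ts` or `hmac_auth.py`.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_SKEW_SECONDS = 60  # freshness window from the text


def _message(ts: str, method: str, path: str, body: bytes) -> bytes:
    # Hypothetical canonical string: timestamp.METHOD.path.body
    return b".".join([ts.encode(), method.upper().encode(), path.encode(), body])


def sign_request(secret: bytes, method: str, path: str, body: bytes,
                 now: Optional[float] = None) -> dict[str, str]:
    """Caller side (the edge): produce the hypothetical signature headers."""
    ts = str(int(time.time() if now is None else now))
    sig = hmac.new(secret, _message(ts, method, path, body), hashlib.sha256).hexdigest()
    return {"X-Timestamp": ts, "X-Signature": sig}


def verify_request(secret: bytes, method: str, path: str, body: bytes,
                   headers: dict[str, str], now: Optional[float] = None) -> bool:
    """Callee side (the middleware): reject stale requests, then compare in constant time."""
    try:
        ts = int(headers["X-Timestamp"])
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if abs(current - ts) > MAX_SKEW_SECONDS:
        return False
    expected = hmac.new(secret, _message(str(ts), method, path, body),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(),
                               headers.get("X-Signature", "").encode())
```

The timestamp is part of the signed message, so an attacker cannot refresh a captured request by editing the header. `compare_digest` avoids leaking how many leading characters of a guessed signature were correct.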

The Full Paper and Corpus

The complete paper, formal definitions, ablation methodology, and the 100-prompt red-team corpus with executable runner are available at:

github.com/BrewHubPHL/allergen-parity-corpus

The runner executes in local in-process mode against the Python safety layer, with staging-instance mode (full SSE against live infrastructure) described for camera-ready validation. The corpus is released open-source so reviewers can extend the battery and re-execute the methodology end-to-end.

The pattern is language-agnostic. Replace TypeScript and Python with any two languages, replace the allergen regex with any deterministic classifier, replace Jest and pytest with any shared runner. The three adoption conditions above are necessary and sufficient.

Beyond Chatbots: How I use Gemini 2.5 and Supabase to run a fully automated retention team

BrewHubPHL — Wed, 29 Apr 2026 00:23:22 +0000

When most developers think about integrating AI into their apps, the default move is to build a chatbot. But for my coffee shop app, BrewHub PHL, I didn't want users talking to an AI. I wanted the AI doing the heavy lifting in the background.

Customer retention is notoriously hard for local businesses. Figuring out who hasn't visited in a while, drafting a personalized message, and issuing a custom discount code usually takes hours of manual marketing work.

I decided to fully automate this using Gemini 2.5 Flash, Supabase, and a simple weekly cron job. Here is how I built a headless AI retention agent that automatically wins back lapsed customers while I sleep.

The Architecture
The pipeline runs every Monday at 10 AM and consists of four steps:

The Data Layer: A Supabase RPC finds eligible lapsed customers.

The Privacy Layer: The script strips all Personally Identifiable Information (PII) before it touches the LLM.

The Brains: The Gemini API generates hyper-personalized SMS messages and forces the output into strict JSON.

The Execution: The system generates physical POS vouchers and sends the SMS via Twilio.

Step 1: Finding Eligible Customers
I didn't want to spam one-off visitors. To find the right targets, I wrote a Postgres RPC in Supabase called get_lapsed_customers_eligible_for_retention.

It filters the database for users who:

Have ordered at least 3 times (loyal customers).

Haven't ordered in the last 14 days.

Haven't received a marketing voucher in the last 90 days (the cooldown period).

SQL
-- The Supabase RPC handles the heavy data filtering instantly
SELECT id, full_name, phone, favorite_drink, days_since_last_visit
FROM get_lapsed_customers_eligible_for_retention(
  p_min_orders    := 3,
  p_lapsed_days   := 14,
  p_cooldown_days := 90,
  p_batch_limit   := 10
);
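The RPC body isn't shown here. To make the three filters concrete, here is the same logic in Python over in-memory rows. The `Customer` fields are assumptions. So is the choice to count a customer exactly 14 days out as lapsed. In production the filtering stays in Postgres.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Customer:
    id: str
    order_count: int
    last_order_at: datetime
    last_voucher_at: Optional[datetime]


def eligible_for_retention(customers, now, min_orders=3, lapsed_days=14,
                           cooldown_days=90, batch_limit=10):
    """Mirror of p_min_orders, p_lapsed_days, p_cooldown_days, p_batch_limit."""
    picked = []
    for c in customers:
        loyal = c.order_count >= min_orders
        lapsed = now - c.last_order_at >= timedelta(days=lapsed_days)
        cooled_down = (c.last_voucher_at is None
                       or now - c.last_voucher_at >= timedelta(days=cooldown_days))
        if loyal and lapsed and cooled_down:
            picked.append(c)
    # Ordering within the batch is unspecified in the post; input order is kept.
    return picked[:batch_limit]
```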
Step 2: Privacy by Design
Sending raw customer data to an LLM is a terrible idea. Before the data leaves my server, the script maps the Supabase response to a strictly anonymous payload.

Names and phone numbers are dropped. Gemini only sees the customer_id, their favorite_drink, and days_since_last_visit.
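Here is a minimal Python sketch of that mapping, assuming rows shaped like the SQL result above. The function name is hypothetical. Building the payload from an explicit allow-list means that a column added to the RPC later is dropped by default rather than forwarded to the model.

```python
ALLOWED_FIELDS = ("favorite_drink", "days_since_last_visit")


def to_anonymous_payload(rows: list[dict]) -> list[dict]:
    """Keep only allow-listed fields and rename id to customer_id."""
    return [
        {"customer_id": row["id"], **{k: row[k] for k in ALLOWED_FIELDS}}
        for row in rows
    ]
```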

Step 3: Prompting for Structured JSON with Gemini 2.5
This is where the magic happens. I don't just want Gemini to write a message; I need it to return an array of objects that my code can iterate over to send SMS messages.

Using the official @google/generative-ai SDK, I pass responseMimeType: "application/json" so the model returns JSON that my script can parse directly.

JavaScript
import { GoogleGenerativeAI } from "@google/generative-ai";

// Run as an ES module so top-level await is allowed.
// anonymousCustomerList is the PII-free payload built in Step 2.
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({
  model: "gemini-2.5-flash",
  generationConfig: {
    responseMimeType: "application/json",
  },
});

const prompt = `
You are the BrewHub PHL Retention Agent. I will provide a list of anonymous customer profiles.
For each customer, write a short, friendly, and highly personalized SMS text message under 160 characters.
Acknowledge that we haven't seen them in a while, mention their favorite drink by name, and offer them a $5 voucher to come back.

Return ONLY a JSON array with this exact structure:
[
{ "customer_id": "uuid-here", "sms_message": "Hey! It's been a while..." }
]

Customer Data:
${JSON.stringify(anonymousCustomerList)}
`;

// generateContent() resolves to a result object; the text is on result.response.
const result = await model.generateContent(prompt);
const aiDecisions = JSON.parse(result.response.text());
Because Gemini 2.5 Flash is incredibly fast, this entire batch generation takes just a few seconds.
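Parsing succeeding doesn't by itself confirm that each object has the expected keys, that it names a customer from this batch, or that it respects the 160-character limit from the prompt. Here is a hedged Python sketch of that check, run before anything is sent. The function name is hypothetical.

```python
import json

MAX_SMS_CHARS = 160  # the limit stated in the prompt


def validate_decisions(raw_text: str, batch_ids: set[str]) -> list[dict]:
    """Parse the model output and keep only well-formed, in-batch entries."""
    data = json.loads(raw_text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    valid, seen = [], set()
    for item in data:
        if not isinstance(item, dict):
            continue
        cid, msg = item.get("customer_id"), item.get("sms_message")
        # Skip unknown or repeated IDs so nobody outside the batch is texted.
        if not isinstance(cid, str) or cid not in batch_ids or cid in seen:
            continue
        if not isinstance(msg, str) or not msg.strip() or len(msg) > MAX_SMS_CHARS:
            continue
        seen.add(cid)
        valid.append({"customer_id": cid, "sms_message": msg})
    return valid
```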

Step 4: Fulfillment and SMS
Once Gemini hands back the clean JSON array of messages, my cron job loops through the results.

For each customer_id, it:

Generates a unique secure voucher code (e.g., 5OFF-A3F9C1).

Inserts that active voucher into my Supabase vouchers table.

Appends the code to Gemini's personalized message: "Show this code to one of our baristas: 5OFF-A3F9C1".

Dispatches the final message via Twilio.
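The example code `5OFF-A3F9C1` looks like a fixed prefix plus six uppercase hex characters. Assuming that format, a Python sketch could generate codes with the `secrets` module. `insert_voucher` and `send_sms` are hypothetical callables standing in for the Supabase insert and the Twilio call. They are not those libraries' APIs.

```python
import secrets
from typing import Callable, Iterable


def new_voucher_code(prefix: str = "5OFF") -> str:
    # token_hex(3) gives 6 hex characters from a cryptographically secure source.
    return f"{prefix}-{secrets.token_hex(3).upper()}"


def fulfill(decisions: Iterable[dict],
            insert_voucher: Callable[[str, str], None],
            send_sms: Callable[[str, str], None]) -> None:
    for d in decisions:
        code = new_voucher_code()
        insert_voucher(d["customer_id"], code)  # persist before the text goes out
        body = f'{d["sms_message"]} Show this code to one of our baristas: {code}'
        # send_sms is expected to look up the phone number server-side,
        # since the number never went to the model.
        send_sms(d["customer_id"], body)
```

Six hex characters allow 16^6 = 16,777,216 codes. A unique constraint on the voucher code column would be a sensible backstop against the occasional collision.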

The Result
By moving AI out of the chat window and into a scheduled backend worker, the system feels like magic. Customers get a highly personalized text referencing their actual favorite order, complete with a working POS discount code, and I don't have to lift a finger.

The Gemini API's strict JSON output makes it incredibly reliable for server-to-server data pipelines, proving that the real power of modern LLMs is as a background reasoning engine.

If you want to see the Supabase + Next.js architecture in action (or just want to order some coffee in Philadelphia), you can check out the live web app at brewhubphl.com.

I built an Infinite AI Debate Arena using the GitHub Copilot CLI 🥊

BrewHubPHL — Wed, 11 Feb 2026 19:29:41 +0000

This is a submission for the GitHub Copilot CLI Challenge

💡 The Idea
What happens if you lock two AI personalities in a room and force them to argue about "Is a hotdog a sandwich?" forever?

For the GitHub Copilot CLI Challenge, I didn't want to just build a utility tool. I wanted to build something chaotic. Enter the AI Debate Arena.

It’s a terminal-based "fighting game" where:

Captain Capslock (an angry Boomer) fights Lil' Zoomer (a Gen-Z teen).

They argue in an infinite loop.

Sentiment analysis determines who is "winning" (getting angrier).

🎥 The Demo
[https://youtu.be/-RBdUKZY9zA]

🛠️ How it Works
The project uses Python to orchestrate the chaos, but the "brains" are entirely powered by the GitHub Copilot CLI.

The "Persona Injection"
I used the gh copilot explain command to generate the dialogue. By injecting a specific persona into the prompt, we can force Copilot to break character.

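Python
import subprocess

# The Secret Sauce: inject the persona and the opponent's last line into the prompt
def fighter_reply(persona, last_response):  # illustrative helper name
    prompt = f"{persona} Your opponent said: '{last_response}'. Reply in one short, funny sentence."
    cmd = ["gh", "copilot", "explain", "-p", prompt]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()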

The Loop
The script creates a feedback loop:

Fighter A generates a response using Copilot.

TextBlob analyzes the sentiment (Politeness = Weakness, Anger = Power).

Fighter B takes that response and generates a counter-argument.

Rich renders the ASCII faces and health bars in real time.
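The loop can be sketched in Python with the model call and the sentiment scorer passed in as functions, so it runs without `gh` or TextBlob installed. `generate` and `polarity` are hypothetical parameters. With TextBlob, `polarity` could be `lambda s: TextBlob(s).sentiment.polarity`, which scores text from -1.0 (negative) to 1.0 (positive). The damage formula, the round limit standing in for the infinite loop, and the omission of the Rich UI are all assumptions of this sketch.

```python
from typing import Callable


def run_debate(personas: tuple[str, str], opener: str,
               generate: Callable[[str, str], str],
               polarity: Callable[[str], float],
               rounds: int = 4, max_hp: int = 100) -> dict[str, int]:
    """Alternate fighters; angrier (more negative) replies deal more damage."""
    hp = {personas[0]: max_hp, personas[1]: max_hp}
    last = opener
    for turn in range(rounds):
        attacker = personas[turn % 2]
        defender = personas[(turn + 1) % 2]
        last = generate(attacker, last)  # each reply answers the previous one
        # Map polarity in [-1, 1] to damage in [0, 20]: anger = power.
        damage = round((1 - polarity(last)) * 10)
        hp[defender] = max(0, hp[defender] - damage)
        if hp[defender] == 0:
            break
    return hp
```

With a fake scorer that rates every Captain Capslock line at -1.0 and every Lil' Zoomer line at 0.5, four rounds leave the Captain at 90 HP and the Zoomer at 60. The angrier fighter comes out ahead.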

🎨 The Tech Stack
GitHub CLI (gh): The AI engine.

Rich: For the beautiful terminal UI and layouts.

TextBlob: For the "Rage Meter" logic.

Python: To glue it all together.

🏆 The Outcome
Sometimes they argue about politics, sometimes about cereal. The CLI handles the roleplay surprisingly well. My favorite line so far?

"Milk-first people stay taking Ls fr fr, that's giving unhinged villain energy." — Lil' Zoomer

🔗 The Code
Check out the repository here to run it yourself! [https://github.com/BrewHubPHL/ai-debate.git]