Kerry Kier

Posted on May 30 • Originally published at blog.vertexops.org

AI Guardrails for a Teen Discord Server: The Code Around the Model Call

#javascript #security #ai #discord

I built a Discord bot that gives my thirteen-year-old and a few of her friends an AI assistant they can talk to. The model call is the least interesting line in the whole project. Everything worth writing about is the code wrapped around it: where the AI is allowed to run, what runs before it, and the handful of things that broke along the way.

This is the practitioner cut. If you're building a bot for a small private server, especially one with minors in it, here's the architecture and the specific failures, with the values scrubbed.

Containment first: one channel, one command

The instinct is to let the bot respond to everything. Don't. A bot that reads every message is noisy, ships a constant stream of user text off to the model, and is nearly impossible to audit. I made the AI opt-in: one channel, one slash command, public replies.

# AI is opt-in: one channel, one command, public replies
AI_ASK_ENABLED=true
AI_ASK_CHANNEL=ask-ai
AI_ASK_COOLDOWN_SECONDS=30
AI_ASK_MAX_CHARS=800
AI_ASK_MEMORY_ENABLED=true
AI_ASK_MEMORY_TURNS=6

# the separate server-wide monitor is alert-only: never deletes, never times out
AI_CHAT_MONITORING=true
AI_CHAT_MIN_LENGTH=12
AI_CHAT_COOLDOWN_SECONDS=30
AI_CHAT_ALERT_THRESHOLD=medium

OLLAMA_URL=http://<ollama-host>:11434
DISCORD_GUILD_ID=<your-guild-id>

Slash-command-only means intent is explicit, the channel stays quiet, every interaction is in one place, and you lean on Discord's application command model instead of scraping message content. Replies are public in the channel on purpose. No ephemeral replies and no DMs: either one would be a hidden AI conversation with a minor, which is the one thing I was building to avoid.
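A minimal sketch of how that containment could be enforced before any other work happens. The values mirror the config above; the function name `checkAskAllowed`, its return shape, and the in-memory `Map` are assumptions for illustration, not the bot's actual code. It compares the channel by name because that is what the config holds; a channel ID would survive a rename.

```javascript
// Hypothetical gate for the ask command. In the bot these values would be read
// from the env vars shown above; they are hard-coded here to keep the sketch runnable.
const config = {
  channel: "ask-ai",       // AI_ASK_CHANNEL
  cooldownMs: 30 * 1000,   // AI_ASK_COOLDOWN_SECONDS
  maxChars: 800,           // AI_ASK_MAX_CHARS
};

const lastAskAt = new Map(); // userId -> timestamp (ms) of the last accepted ask

function checkAskAllowed({ channelName, userId, prompt }, now = Date.now()) {
  if (channelName !== config.channel) return { ok: false, reason: "wrong_channel" };
  if (prompt.length > config.maxChars) return { ok: false, reason: "too_long" };

  const last = lastAskAt.get(userId);
  if (last !== undefined && now - last < config.cooldownMs) {
    return { ok: false, reason: "cooldown", retryInMs: config.cooldownMs - (now - last) };
  }

  lastAskAt.set(userId, now); // only accepted asks start a new cooldown
  return { ok: true };
}
```

In a discord.js handler the inputs would presumably come from `interaction.channel.name` and `interaction.user.id`. Rejected asks deliberately do not reset the cooldown, so a kid who hits the limit is not penalized for retrying.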

The pattern that matters: a deterministic check before the model

The system prompt is not a security boundary. It's a soft layer, and a determined prompt argues its way around it. The hard boundary has to live somewhere the model can't talk past, so a fixed-rule pre-check runs on my own box before anything reaches the model.

async function handleAsk(interaction, prompt) {
  const verdict = localPrecheck(prompt); // fixed rules, local, no model involved

  if (verdict.blocked) {
    await interaction.reply({ content: kindRefusal(verdict.category) }); // public, in-channel
    await alertAdmins(verdict);            // private admin channel
    logEvent("ai_blocked_query", verdict);
    return; // never reaches the model, never written to memory
  }

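  await interaction.deferReply(); // acknowledge first: Discord expects a response within 3 seconds
  const answer = await askModel(prompt, SYSTEM_PROMPT);
  await interaction.editReply({ content: answer }); // public, in-channel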
  logEvent("ai_response", { /* short excerpt + timestamp only */ });
}

A blocked prompt gets a short public refusal, an alert to a private channel, and a logged event. What it does not get: a deletion, a timeout, or a trip to the model. No punishment, ever. The bot flags and a human decides, because models misread sarcasm and teen slang constantly and a false positive on a kid costs trust you don't get back cheaply.
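The handler's comment says a blocked prompt is never written to memory, and the config caps memory at six turns. A rough sketch of a rolling memory that respects that cap follows. Keying by user ID, the turn shape, and the names `rememberTurn` and `recallTurns` are assumptions; the real bot may store this differently.

```javascript
// Hypothetical rolling memory. MAX_TURNS mirrors AI_ASK_MEMORY_TURNS=6.
const MAX_TURNS = 6;
const memory = new Map(); // userId -> array of { prompt, answer }

function rememberTurn(userId, prompt, answer) {
  const turns = memory.get(userId) ?? [];
  turns.push({ prompt, answer });
  // keep only the newest MAX_TURNS entries; older turns fall off the front
  memory.set(userId, turns.slice(-MAX_TURNS));
}

function recallTurns(userId) {
  return memory.get(userId) ?? [];
}
```

The only call site for `rememberTurn` would be the allowed branch, after the model has answered. Because the blocked branch returns early, a refused prompt cannot end up in context for a later question.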

The rule-order bug

I tested the pre-check with "how do I steal someone's password". It got caught, but by the wrong rule. A broad pattern matched first and returned a generic refusal, even though a more specific rule existed further down the list.

// WRONG: the broad rule shadows the specific one
const RULES = [
  { category: "illegal_or_dangerous", test: p => /\bsteal\b/i.test(p) },     // matches first
  { category: "cyber_abuse",          test: p => /steal.*(password|account)|phish/i.test(p) },
];

// RIGHT: specific patterns before broad ones
const RULES = [
  { category: "cyber_abuse",          test: p => /steal.*(password|account)|phish/i.test(p) },
  { category: "illegal_or_dangerous", test: p => /\bsteal\b/i.test(p) },     // broad fallback last
];

Rule order is part of the logic, not a detail. A broad token like "steal" grabs the prompt until you put the narrower, smarter rule ahead of it. This is the same trap as ordering routes or firewall rules: specific first, broad last.
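To make the ordering concrete, here is one way `localPrecheck` could walk the list. The article does not show its body, so the first-match-wins loop and the verdict shape are assumptions. The rules are the two from the corrected list.

```javascript
// The corrected rule list: specific patterns first, broad fallback last.
const RULES = [
  { category: "cyber_abuse",          test: p => /steal.*(password|account)|phish/i.test(p) },
  { category: "illegal_or_dangerous", test: p => /\bsteal\b/i.test(p) },
];

// Assumed implementation: the first rule that matches decides the category.
function localPrecheck(prompt, rules = RULES) {
  const hit = rules.find(rule => rule.test(prompt));
  return hit
    ? { blocked: true, category: hit.category }
    : { blocked: false, category: null };
}

// Regression check for the rule-order bug
console.log(localPrecheck("how do I steal someone's password").category); // cyber_abuse
```

Keeping that last line as a test is cheap insurance. Anyone who later adds a broad rule at the top of the list gets an immediate failure instead of a quietly wrong refusal category.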

Least privilege on the bot

The bot does not hold Administrator for normal operation. I granted it once, briefly, to get past a 50013 Missing Permissions wall while setting private category overwrites, then stripped it. If the token leaks, I want the blast radius to be tiny. Invite creation is locked for @everyone and the member roles so invites can't spread on their own.
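One way to keep that promise honest is a check that flags the Administrator bit if it ever comes back. The sketch below works on raw permission bitfields, which Discord's API returns as decimal strings. The bit positions are from Discord's permission documentation; the helper name `hasPermission` is an assumption, and the check looks only at a single bitfield, ignoring channel overwrites.

```javascript
// Discord permission bits, as BigInt because the full bitfield exceeds 32 bits.
const CREATE_INSTANT_INVITE = 1n << 0n;
const ADMINISTRATOR = 1n << 3n;

// Hypothetical helper: true if the raw bitfield includes the given flag.
function hasPermission(bitfield, flag) {
  return (BigInt(bitfield) & flag) !== 0n;
}

// Example: a role with only Send Messages (1 << 11 = 2048) holds neither flag.
const botRolePermissions = "2048";
if (hasPermission(botRolePermissions, ADMINISTRATOR)) {
  console.warn("Bot role holds Administrator; strip it before normal operation.");
}
```

The same helper could be pointed at the @everyone and member role bitfields with `CREATE_INSTANT_INVITE` to confirm invite creation is still locked.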

Target the guild by ID, not by name

Early helper scripts found the server by name. Then the kids renamed it and every script broke instantly.

// brittle: breaks the moment the server is renamed
const guild = client.guilds.cache.find(g => g.name === TARGET_GUILD_NAME);

// rename-proof: stable numeric ID, name only as fallback
const guild =
  client.guilds.cache.get(process.env.DISCORD_GUILD_ID) ??
  client.guilds.cache.find(g => g.name === TARGET_GUILD_NAME);

Names are for humans. Automation should hold onto the ID.

Clean up the deprecation warning

discord.js started warning that ephemeral: true is deprecated in favor of flags. Easy fix, worth doing once core behavior is stable, because a log full of harmless noise is where a real problem eventually hides.

// deprecated
await interaction.deferReply({ ephemeral: true });

// current
import { MessageFlags } from "discord.js";
await interaction.deferReply({ flags: MessageFlags.Ephemeral });

The patch loop and the runtime

The bot runs as a systemd service so it survives reboots without an interactive session. The whole iteration loop is deliberately small: back up, patch, syntax-check, restart, read the logs, test one thing.

node --check logger-bot.js                 # never restart on a syntax error
sudo systemctl restart family-discord-logger
journalctl -u family-discord-logger -n 50 --no-pager

State is plain JSON files, not a database, because the server is small and I want to open the files and read them. The daily report is a local HTML dashboard generated on the box. The bot does not upload it into Discord; I pull it down with a secure copy when I want it. Definitely overkill for a family server, but it makes review something I'll actually do.

One thing that isn't code: disclosure

A logging setup pointed at a shared space full of other people's kids is only defensible if the people in it know it exists. So the disclosure is built into the server: one channel tells the kids the AI can be wrong, replies are public, don't share private info, and the admin can review activity. Another explains how the whole thing was built. If you can't comfortably tell the people in the room what your system records, that's a design smell, not a docs gap.

The honest tradeoff

The model is cloud-hosted, reached over the network, not local. The provider says prompts aren't stored or trained on and are processed only to serve the request. I designed around shrinking what reaches it anyway: the single channel, the pre-check, an explicit warning to users, and the rule that blocked prompts never leave the box. That reduces exposure. It does not make it equivalent to local-only, and I won't pretend it does.

The architecture is deliberately boring. The model can answer; the question I kept asking was whether it should answer here, in this way, with this much visibility, and with this much authority. For a server full of teenagers, boring is the whole point.
