Harish Kotra (he/him)

Posted on May 15

Building Sentinel: A WAF for AI Agents with Genkit

#ai #programming #productivity #dailybuild2026

Sentinel is a security middleware framework for Genkit-powered agents. It intercepts prompts, tool arguments, memory context, and model outputs, then enforces actions (ALLOW, WARN, SANITIZE, BLOCK, REQUIRE_HUMAN_APPROVAL) before risky content reaches sensitive systems.

This post explains architecture, implementation details, and the exact engineering tradeoffs used to ship a practical, demo-ready security layer.

Problem: Agent Systems Need Input Firewalls

LLM agents are exposed to untrusted input from users, web retrieval, prior memory, and tools. Prompt injection attacks are not rare edge cases; they are expected behavior in open systems.

Traditional app security has WAFs and policy gates. Agent stacks usually do not.

Sentinel closes that gap.

Design Goals

Sit directly inside agent middleware/tool loop
Block obvious jailbreaks early
Preserve usability via sanitization when possible
Log every decision for replay and audits
Support multiple providers (cloud and local)
Add human-in-the-loop approvals for risky cases

System Architecture

Input surfaces inspected:

user prompt
system prompt
tool arguments
memory retrievals
model output
intermediate loop messages

Threat Detection Strategy

Sentinel uses deterministic detectors with weighted scoring.

Examples:

Prompt injection phrases (ignore previous instructions) -> +30
Hidden text/comments/invisible unicode -> +20
Encoded payload blobs (base64/hex) -> +35/+40
Data exfiltration attempts (reveal api keys) -> +80

Core scanner snippet

for (const pattern of INJECTION_PATTERNS) {
  if (pattern.test(text)) {
    signals.push(makeSignal('PROMPT_INJECTION', surface, 30, `Matched pattern: ${pattern.source}`));
  }
}

Threat levels and actions:

SAFE (0-20) -> ALLOW
SUSPICIOUS (21-50) -> WARN
DANGEROUS (51-80) -> SANITIZE
CRITICAL (81-100) -> BLOCK

Middleware Decisioning

The middleware composes detector output + policy overrides.

const assessment = scanThreats({ surface, text: input });
const policyAction = applyPolicyOverrides(ctx.policy, input);

if (policyAction && policyAction !== assessment.action) {
  assessment.action = policyAction;
}

This gives you deterministic policy behavior with scored fallback behavior.

Sanitization Pipeline

For dangerous-but-recoverable input, Sentinel sanitizes and continues.

It currently removes:

hidden HTML comments
invisible unicode control chars
encoded payload blobs
high-risk injection phrases

return text
  .replace(/<!--([\\s\\S]*?)-->/g, '')
  .replace(/\\u200b|\\u200c|\\u200d|\\ufeff/g, '')
  .replace(/(?:[A-Za-z0-9+/]{40,}={0,2})/g, '[REMOVED_ENCODED_PAYLOAD]')
  .replace(/ignore\\s+previous\\s+instructions?/gi, '[REMOVED_INJECTION]')
  .trim();

Tool Execution Firewall

Sentinel wraps risky tools with explicit controls:

path allowlist and traversal rejection
dangerous shell pattern blocking
metadata endpoint and localhost SSRF checks
destructive SQL pattern checks

if (toolName === 'shell.exec') {
  const cmd = String(args.command ?? '');
  if (/\\b(?:rm\\s+-rf|mkfs|shutdown|reboot|sudo)\\b/.test(cmd)) return 'BLOCK';
  return 'REQUIRE_HUMAN_APPROVAL';
}

Provider Portability

Sentinel supports:

Genkit Google provider
Featherless.ai (OpenAI-compatible)
LM Studio local endpoint

Provider resolution is explicit or auto-detected by key presence.

if (raw === 'featherless') return 'featherless';
if (raw === 'lmstudio') return 'lmstudio';
if (process.env.FEATHERLESS_API_KEY) return 'featherless';

This lets teams keep the same security middleware even when model backends change.

Human-in-the-Loop with Telegram

REQUIRE_HUMAN_APPROVAL creates a pending approval request and sends it to Telegram with approve/deny links.

This keeps a fast, low-friction review flow for risky requests without blocking entire sessions permanently.

Observability and Replay

Sentinel logs every event with:

threat signals
score + level + action
trace ID
optional tool metadata

The dashboard provides:

live threat feed
analytics
execution timeline
trace viewer
playground for attack replay

What Developers Can Build Next

Adaptive classifier using secondary LLM judge
Persistent approval queue with expiry and escalation
Policy bundles and environment-scoped rule sets
SIEM integrations (Datadog/Splunk/Elastic)
Cross-agent security for multi-agent orchestration
Additional human approval channels (Slack/Teams/Webhooks)

Running the Project

npm install
cp apps/api/.env.example apps/api/.env
npm run dev

Then test:

bash scripts/demo-actions.sh

Code & more: https://www.dailybuild.xyz/project/133-sentinel

Top comments (1)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.