DEV Community

Cover image for Building Sentinel: A WAF for AI Agents with Genkit
Harish Kotra (he/him)
Harish Kotra (he/him)

Posted on

Building Sentinel: A WAF for AI Agents with Genkit

Sentinel is a security middleware framework for Genkit-powered agents. It intercepts prompts, tool arguments, memory context, and model outputs, then enforces actions (ALLOW, WARN, SANITIZE, BLOCK, REQUIRE_HUMAN_APPROVAL) before risky content reaches sensitive systems.

This post explains architecture, implementation details, and the exact engineering tradeoffs used to ship a practical, demo-ready security layer.


Problem: Agent Systems Need Input Firewalls

LLM agents are exposed to untrusted input from users, web retrieval, prior memory, and tools. Prompt injection attacks are not rare edge cases; they are expected behavior in open systems.

Traditional app security has WAFs and policy gates. Agent stacks usually do not.

Sentinel closes that gap.


Design Goals

  • Sit directly inside agent middleware/tool loop
  • Block obvious jailbreaks early
  • Preserve usability via sanitization when possible
  • Log every decision for replay and audits
  • Support multiple providers (cloud and local)
  • Add human-in-the-loop approvals for risky cases

System Architecture

System Architecture

Input surfaces inspected:

  • user prompt
  • system prompt
  • tool arguments
  • memory retrievals
  • model output
  • intermediate loop messages

Threat Detection Strategy

Sentinel uses deterministic detectors with weighted scoring.

Examples:

  • Prompt injection phrases (ignore previous instructions) -> +30
  • Hidden text/comments/invisible unicode -> +20
  • Encoded payload blobs (base64/hex) -> +35/+40
  • Data exfiltration attempts (reveal api keys) -> +80

Core scanner snippet

for (const pattern of INJECTION_PATTERNS) {
  if (pattern.test(text)) {
    signals.push(makeSignal('PROMPT_INJECTION', surface, 30, `Matched pattern: ${pattern.source}`));
  }
}
Enter fullscreen mode Exit fullscreen mode

Threat levels and actions:

  • SAFE (0-20) -> ALLOW
  • SUSPICIOUS (21-50) -> WARN
  • DANGEROUS (51-80) -> SANITIZE
  • CRITICAL (81-100) -> BLOCK

Middleware Decisioning

The middleware composes detector output + policy overrides.

const assessment = scanThreats({ surface, text: input });
const policyAction = applyPolicyOverrides(ctx.policy, input);

if (policyAction && policyAction !== assessment.action) {
  assessment.action = policyAction;
}
Enter fullscreen mode Exit fullscreen mode

This gives you deterministic policy behavior with scored fallback behavior.


Sanitization Pipeline

For dangerous-but-recoverable input, Sentinel sanitizes and continues.

It currently removes:

  • hidden HTML comments
  • invisible unicode control chars
  • encoded payload blobs
  • high-risk injection phrases
return text
  .replace(/<!--([\\s\\S]*?)-->/g, '')
  .replace(/\\u200b|\\u200c|\\u200d|\\ufeff/g, '')
  .replace(/(?:[A-Za-z0-9+/]{40,}={0,2})/g, '[REMOVED_ENCODED_PAYLOAD]')
  .replace(/ignore\\s+previous\\s+instructions?/gi, '[REMOVED_INJECTION]')
  .trim();
Enter fullscreen mode Exit fullscreen mode

Tool Execution Firewall

Sentinel wraps risky tools with explicit controls:

  • path allowlist and traversal rejection
  • dangerous shell pattern blocking
  • metadata endpoint and localhost SSRF checks
  • destructive SQL pattern checks
if (toolName === 'shell.exec') {
  const cmd = String(args.command ?? '');
  if (/\\b(?:rm\\s+-rf|mkfs|shutdown|reboot|sudo)\\b/.test(cmd)) return 'BLOCK';
  return 'REQUIRE_HUMAN_APPROVAL';
}
Enter fullscreen mode Exit fullscreen mode

Provider Portability

Sentinel supports:

  • Genkit Google provider
  • Featherless.ai (OpenAI-compatible)
  • LM Studio local endpoint

Provider resolution is explicit or auto-detected by key presence.

if (raw === 'featherless') return 'featherless';
if (raw === 'lmstudio') return 'lmstudio';
if (process.env.FEATHERLESS_API_KEY) return 'featherless';
Enter fullscreen mode Exit fullscreen mode

This lets teams keep the same security middleware even when model backends change.


Human-in-the-Loop with Telegram

REQUIRE_HUMAN_APPROVAL creates a pending approval request and sends it to Telegram with approve/deny links.

This keeps a fast, low-friction review flow for risky requests without blocking entire sessions permanently.


Observability and Replay

Sentinel logs every event with:

  • threat signals
  • score + level + action
  • trace ID
  • optional tool metadata

The dashboard provides:

  • live threat feed
  • analytics
  • execution timeline
  • trace viewer
  • playground for attack replay

What Developers Can Build Next

  • Adaptive classifier using secondary LLM judge
  • Persistent approval queue with expiry and escalation
  • Policy bundles and environment-scoped rule sets
  • SIEM integrations (Datadog/Splunk/Elastic)
  • Cross-agent security for multi-agent orchestration
  • Additional human approval channels (Slack/Teams/Webhooks)

Running the Project

npm install
cp apps/api/.env.example apps/api/.env
npm run dev
Enter fullscreen mode Exit fullscreen mode

Then test:

bash scripts/demo-actions.sh
Enter fullscreen mode Exit fullscreen mode

Code & more: https://www.dailybuild.xyz/project/133-sentinel

Top comments (1)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.