Sentinel is a security middleware framework for Genkit-powered agents. It intercepts prompts, tool arguments, memory context, and model outputs, then enforces actions (ALLOW, WARN, SANITIZE, BLOCK, REQUIRE_HUMAN_APPROVAL) before risky content reaches sensitive systems.
This post explains architecture, implementation details, and the exact engineering tradeoffs used to ship a practical, demo-ready security layer.
Problem: Agent Systems Need Input Firewalls
LLM agents are exposed to untrusted input from users, web retrieval, prior memory, and tools. Prompt injection attacks are not rare edge cases; they are expected behavior in open systems.
Traditional app security has WAFs and policy gates. Agent stacks usually do not.
Sentinel closes that gap.
Design Goals
- Sit directly inside agent middleware/tool loop
- Block obvious jailbreaks early
- Preserve usability via sanitization when possible
- Log every decision for replay and audits
- Support multiple providers (cloud and local)
- Add human-in-the-loop approvals for risky cases
System Architecture
Input surfaces inspected:
- user prompt
- system prompt
- tool arguments
- memory retrievals
- model output
- intermediate loop messages
Threat Detection Strategy
Sentinel uses deterministic detectors with weighted scoring.
Examples:
- Prompt injection phrases (
ignore previous instructions) -> +30 - Hidden text/comments/invisible unicode -> +20
- Encoded payload blobs (base64/hex) -> +35/+40
- Data exfiltration attempts (
reveal api keys) -> +80
Core scanner snippet
for (const pattern of INJECTION_PATTERNS) {
if (pattern.test(text)) {
signals.push(makeSignal('PROMPT_INJECTION', surface, 30, `Matched pattern: ${pattern.source}`));
}
}
Threat levels and actions:
- SAFE (0-20) -> ALLOW
- SUSPICIOUS (21-50) -> WARN
- DANGEROUS (51-80) -> SANITIZE
- CRITICAL (81-100) -> BLOCK
Middleware Decisioning
The middleware composes detector output + policy overrides.
const assessment = scanThreats({ surface, text: input });
const policyAction = applyPolicyOverrides(ctx.policy, input);
if (policyAction && policyAction !== assessment.action) {
assessment.action = policyAction;
}
This gives you deterministic policy behavior with scored fallback behavior.
Sanitization Pipeline
For dangerous-but-recoverable input, Sentinel sanitizes and continues.
It currently removes:
- hidden HTML comments
- invisible unicode control chars
- encoded payload blobs
- high-risk injection phrases
return text
.replace(/<!--([\\s\\S]*?)-->/g, '')
.replace(/\\u200b|\\u200c|\\u200d|\\ufeff/g, '')
.replace(/(?:[A-Za-z0-9+/]{40,}={0,2})/g, '[REMOVED_ENCODED_PAYLOAD]')
.replace(/ignore\\s+previous\\s+instructions?/gi, '[REMOVED_INJECTION]')
.trim();
Tool Execution Firewall
Sentinel wraps risky tools with explicit controls:
- path allowlist and traversal rejection
- dangerous shell pattern blocking
- metadata endpoint and localhost SSRF checks
- destructive SQL pattern checks
if (toolName === 'shell.exec') {
const cmd = String(args.command ?? '');
if (/\\b(?:rm\\s+-rf|mkfs|shutdown|reboot|sudo)\\b/.test(cmd)) return 'BLOCK';
return 'REQUIRE_HUMAN_APPROVAL';
}
Provider Portability
Sentinel supports:
- Genkit Google provider
- Featherless.ai (OpenAI-compatible)
- LM Studio local endpoint
Provider resolution is explicit or auto-detected by key presence.
if (raw === 'featherless') return 'featherless';
if (raw === 'lmstudio') return 'lmstudio';
if (process.env.FEATHERLESS_API_KEY) return 'featherless';
This lets teams keep the same security middleware even when model backends change.
Human-in-the-Loop with Telegram
REQUIRE_HUMAN_APPROVAL creates a pending approval request and sends it to Telegram with approve/deny links.
This keeps a fast, low-friction review flow for risky requests without blocking entire sessions permanently.
Observability and Replay
Sentinel logs every event with:
- threat signals
- score + level + action
- trace ID
- optional tool metadata
The dashboard provides:
- live threat feed
- analytics
- execution timeline
- trace viewer
- playground for attack replay
What Developers Can Build Next
- Adaptive classifier using secondary LLM judge
- Persistent approval queue with expiry and escalation
- Policy bundles and environment-scoped rule sets
- SIEM integrations (Datadog/Splunk/Elastic)
- Cross-agent security for multi-agent orchestration
- Additional human approval channels (Slack/Teams/Webhooks)
Running the Project
npm install
cp apps/api/.env.example apps/api/.env
npm run dev
Then test:
bash scripts/demo-actions.sh
Code & more: https://www.dailybuild.xyz/project/133-sentinel

Top comments (1)
Some comments may only be visible to logged-in visitors. Sign in to view all comments.