Gotham64
How OpenPawz secures AI agents: Defense layers from memory encryption to multi-agent governance

The security problem with AI agents

AI agents are powerful because they do things — they read files, run commands, send messages, search your data. That power comes with a question most agent frameworks don't answer well:

What stops the agent from doing things it shouldn't?

Most agent systems bolt on safety as an afterthought: a prompt that says "be careful," maybe a regex filter on outputs, and hope for the best. That's not security. That's a suggestion.

OpenPawz takes a different approach. We treat agent security as a systems engineering problem — not a prompt engineering one. The result is a multi-layer defense-in-depth architecture enforced at the Rust engine level, where the agent has zero ability to bypass controls regardless of what any prompt says.

OpenPawz — star the repo, it's open source.


Zero attack surface by default

OpenPawz exposes zero network ports in its default configuration. There is no HTTP server, no WebSocket endpoint, and no listening socket for an attacker to target. The only communication path is Tauri's in-process IPC — a direct Rust-to-WebView bridge that never touches the network.

Four optional listeners exist (webhook server, WebChat, WhatsApp bridge, n8n engine), but all are:

  • Disabled by default
  • Bound to 127.0.0.1 — unreachable from the network even when enabled
  • Individually authenticated — bearer tokens, session cookies, IP rate limiting

Binding to 0.0.0.0 is a manual opt-in that triggers a security warning and recommends TLS wrapping via Tailscale Funnel.

The WebView enforces a strict Content Security Policy: default-src 'self', script-src 'self', object-src 'none', frame-ancestors 'none'. No external scripts, no iframe embedding, no cross-origin form submission.


Human-in-the-Loop: every side-effect needs permission

The core design principle: agents never touch the OS directly. Every tool call flows through the Rust tool executor, which classifies it by risk before deciding whether to proceed.

Auto-approved (no modal)

Read-only and informational tools run without interruption — read_file, web_search, memory_search, soul_read, self_info, email_read, slack_read, create_task, and others. No friction for safe operations.

Requires approval (modal shown)

Side-effect tools pause execution and show a risk-classified modal to the user:

| Risk Level | Behavior | Examples |
| --- | --- | --- |
| Critical | Auto-denied by default; red modal requiring the user to type "ALLOW" | `sudo rm -rf /`, `curl \| sh` |
| High | Orange warning modal | `chmod 777`, `kill -9` |
| Medium | Yellow caution modal | `npm install`, outbound HTTP requests |
| Low | Standard approval modal | Unknown `exec` commands |
| Safe | Auto-approved via allowlist (90+ default patterns) | `git status`, `ls`, `cat` |
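The tiering above amounts to a first-match classifier over the command string. Here is a minimal Python sketch of that idea — the pattern lists and tier names are illustrative stand-ins, not OpenPawz's actual rules (which live in the Rust tool executor):

```python
import re

# Illustrative pattern lists; the real allowlist has 90+ patterns.
CRITICAL = [r"sudo\s+rm\s+-rf\s+/", r"curl\b[^|]*\|\s*(sh|bash)"]
HIGH     = [r"chmod\s+777", r"kill\s+-9"]
MEDIUM   = [r"npm\s+install"]
SAFE     = [r"^git status$", r"^ls\b", r"^cat\b"]

def classify(command: str) -> str:
    """First-match risk tier; unknown commands default to 'low'."""
    for tier, patterns in (("critical", CRITICAL), ("high", HIGH),
                           ("medium", MEDIUM)):
        if any(re.search(p, command) for p in patterns):
            return tier
    if any(re.search(p, command) for p in SAFE):
        return "safe"   # auto-approved via allowlist
    return "low"        # unknown exec command -> standard approval
```

Checking the highest-risk tiers first means a command like `sudo rm -rf /` can never be laundered through a lower tier by also matching a safer pattern.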

Danger pattern detection

30+ patterns across multiple categories are caught before they can execute:

  • Privilege escalation — sudo, su, doas, pkexec, runas
  • Destructive deletion — rm -rf /, rm -rf ~, rm -rf /*
  • Permission exposure — chmod 777, chmod -R 777
  • Disk destruction — dd if=, mkfs, fdisk
  • Remote code execution — curl | sh, wget | bash
  • Process termination — kill -9 1, killall
  • Firewall manipulation — iptables -F, ufw disable
  • Network exfiltration — piping file contents to curl, scp outbound, /dev/tcp

Users can add custom regex rules for both allow and deny lists. The session override feature ("allow all" for a timed window) still blocks privilege escalation commands — you can't override the most dangerous class.
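The carve-out for privilege escalation can be expressed as a deny check that runs even while "allow all" is active. A sketch with an illustrative pattern subset:

```python
import re

# Privilege-escalation patterns (illustrative subset of the deny rules)
PRIVILEGE_ESCALATION = [r"\bsudo\b", r"\bsu\b", r"\bdoas\b",
                        r"\bpkexec\b", r"\brunas\b"]

def allowed_under_session_override(command: str) -> bool:
    # "Allow all" never covers privilege escalation: these commands
    # still fall back to the normal approval / auto-deny path.
    return not any(re.search(p, command) for p in PRIVILEGE_ESCALATION)
```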


Agent governance: four policy presets

Not every agent should have the same power. OpenPawz provides per-agent tool access control with four built-in presets and support for custom policies:

| Preset | Mode | What it does |
| --- | --- | --- |
| Unrestricted | `unrestricted` | Full tool access, no constraints |
| Standard | `denylist` | All tools available, but high-risk tools always require human approval |
| Read-Only | `allowlist` | Only safe read/search/list operations (28 tools) |
| Sandbox | `allowlist` | Only 5 tools: `web_search`, `web_read`, `memory_store`, `memory_search`, `self_info` |

Policies are enforced at two levels simultaneously:

  • Frontend: checkToolPolicy() evaluates per-tool decisions and strips unauthorized tools from the request
  • Backend: ChatRequest.tool_filter carries the allowed tool list to the Rust engine — the agent literally cannot see tools it doesn't have access to

This means a sandboxed research agent physically cannot call exec or write_file, regardless of what its prompt says. The tools don't exist in its schema.
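Schema-level stripping is simple to sketch: filter the tool list before the model ever sees it. The function and policy shape below are assumptions for illustration, not the actual `checkToolPolicy()` / `tool_filter` implementation:

```python
def apply_tool_filter(available_tools: list[dict], policy: dict) -> list[dict]:
    """Return only the tool schemas the policy permits the agent to see."""
    names = set(policy.get("tools", []))
    if policy["mode"] == "allowlist":
        return [t for t in available_tools if t["name"] in names]
    if policy["mode"] == "denylist":
        return [t for t in available_tools if t["name"] not in names]
    return list(available_tools)   # unrestricted

# The Sandbox preset from the table above, as a policy dict
sandbox = {"mode": "allowlist",
           "tools": ["web_search", "web_read", "memory_store",
                     "memory_search", "self_info"]}
```

Because the filtered schema is what gets sent to the provider, a denied tool is not "refused" at call time — it simply never exists from the model's point of view.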


Memory encryption: three independent defense layers

Project Engram — the memory system — applies defense-in-depth to all stored agent memories (episodic, semantic, and procedural). Even if an attacker gains access to the SQLite database file, the data remains protected.

Layer 1: Per-agent HKDF key derivation

A single master key lives in the OS keychain (paw-memory-vault). From it, three independent key families are derived via HKDF-SHA256 domain separation:

| Domain | HKDF Salt | Purpose |
| --- | --- | --- |
| Agent encryption | `engram-agent-key-v1` | Per-agent AES-256-GCM memory encryption |
| Snapshot HMAC | `engram-snapshot-hmac-v1` | Tamper detection for working memory snapshots |
| Capability signing | `engram-platform-cap-v1` | HMAC-SHA256 signing of capability tokens |

Every agent gets a unique derived key. Cross-agent decryption is computationally infeasible without the master key. Compromising one agent's derived key does not expose any other agent's memories.
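The derivation can be sketched with RFC 5869 HKDF-SHA256 (extract-then-expand). The salts mirror the table above; feeding the agent id through the HKDF `info` parameter is an assumption made here for illustration:

```python
import hmac, hashlib

def hkdf_sha256(master: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """RFC 5869 HKDF: extract-then-expand with SHA-256."""
    prk = hmac.new(salt, master, hashlib.sha256).digest()          # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                       # expand
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# Illustrative master key; the real one lives in the OS keychain.
master = b"master-key-from-paw-memory-vault"
key_alice = hkdf_sha256(master, b"engram-agent-key-v1", b"agent:alice")
key_bob   = hkdf_sha256(master, b"engram-agent-key-v1", b"agent:bob")
```

Distinct salts ("domain separation") guarantee that an encryption key, a snapshot HMAC key, and a capability-signing key can never collide even when derived from the same master secret.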

Layer 2: SQL scope filtering

Every memory query includes scope constraints at the SQL level — agent_id, project_id, squad_id. Even without encryption, the query layer enforces isolation.

Layer 3: Signed capability tokens

Every gated_search() call (the unified memory retrieval entry point) performs 4-step cryptographic verification:

  1. HMAC signature integrity — token verified against the platform signing key
  2. Identity binding — the token's agent_id must match the requesting agent
  3. Scope ceiling check — requested search scope cannot exceed the token's max_scope
  4. Membership verification — for squad/project scopes, the agent must actually belong to that squad or project

This prevents confused-deputy attacks where an agent could be tricked into reading another agent's memories.
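The four steps compose into a single gate. This Python sketch is a simplified model of that check — the token layout, key, and membership representation are assumptions, not the engine's actual types:

```python
import hmac, hashlib, json

PLATFORM_KEY = b"platform-held-secret"   # illustrative placeholder
SCOPE_RANK = {"targeted": 1, "squad": 2, "project": 3, "global": 4}

def sign_token(token: dict) -> str:
    payload = json.dumps(token, sort_keys=True).encode()
    return hmac.new(PLATFORM_KEY, payload, hashlib.sha256).hexdigest()

def gated_search_check(token: dict, sig: str, requester_id: str,
                       scope: str, memberships: dict) -> bool:
    # 1. HMAC signature integrity (constant-time comparison)
    if not hmac.compare_digest(sig, sign_token(token)):
        return False
    # 2. Identity binding: token must belong to the requesting agent
    if token["agent_id"] != requester_id:
        return False
    # 3. Scope ceiling: requested scope may not exceed max_scope
    if SCOPE_RANK[scope] > SCOPE_RANK[token["max_scope"]]:
        return False
    # 4. Membership: squad/project scopes require actual membership
    if scope in ("squad", "project") and not memberships.get(scope):
        return False
    return True
```

Step 2 is what defeats the confused deputy: even a validly signed token is useless to any agent other than the one it names.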


Automatic PII detection and field-level encryption

Before any memory is stored, it passes through a two-layer PII scanner with 17 regex pattern types:

Layer 1 (regex patterns): Social Security Numbers, credit card numbers, email addresses, phone numbers, physical addresses, person names, government IDs, JWT tokens, AWS access keys, private keys, IBANs, IPv4 addresses, API keys, passwords, and dates of birth.

Layer 2 (LLM-assisted): A secondary scanner catches context-dependent PII that static regex cannot detect — phrases like "my mother's maiden name is Smith" or "I was born in Springfield." The LLM returns structured JSON with PII type classifications and confidence scores.

Content is classified into three tiers:

| Tier | Content | Treatment |
| --- | --- | --- |
| Cleartext | No PII detected | Stored as-is |
| Sensitive | PII detected (email, name, phone, IP) | AES-256-GCM encrypted |
| Confidential | High-sensitivity PII (SSN, credit card, JWT, AWS key, private key) | AES-256-GCM encrypted |

Encrypted content uses the format enc:v1:base64(nonce ‖ ciphertext ‖ tag). A fresh 96-bit nonce is generated per encryption operation. Decryption is transparent on retrieval using the per-agent derived key.
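The envelope format is easy to model: a fixed prefix plus a base64 blob whose layout is nonce, ciphertext, tag. A sketch of serializing and parsing it (the actual AES-256-GCM encryption is omitted; only the framing is shown):

```python
import base64, os

NONCE_LEN, TAG_LEN = 12, 16   # 96-bit nonce, 128-bit GCM auth tag

def wrap(nonce: bytes, ciphertext: bytes, tag: bytes) -> str:
    assert len(nonce) == NONCE_LEN and len(tag) == TAG_LEN
    return "enc:v1:" + base64.b64encode(nonce + ciphertext + tag).decode()

def unwrap(envelope: str) -> tuple[bytes, bytes, bytes]:
    prefix = "enc:v1:"
    assert envelope.startswith(prefix), "unknown envelope version"
    blob = base64.b64decode(envelope[len(prefix):])
    return blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
```

Versioning the prefix (`v1`) is what makes the 90-day key-rotation scheme below safe to evolve: a future format change can coexist with old rows during migration.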

Key rotation

An automated key rotation scheduler runs on a configurable interval (default: 90 days) and re-encrypts all agent memories with fresh HKDF-derived keys. The rotation is atomic — if any re-encryption fails, the entire batch rolls back. No data is left in a half-migrated state.


Inter-agent memory bus: scoped, signed, rate-limited

When multiple agents need to share information, the Memory Bus provides pub/sub memory sharing with publish-side authentication to prevent memory poisoning.

Capability tokens

Every agent holds an AgentCapability signed with HMAC-SHA256 against a platform-held secret key. The token specifies:

  • Max publication scope — Targeted (specific agents), Squad, Project, or Global
  • Importance ceiling — the maximum importance an agent can self-assign (0.0–1.0)
  • Write permission — whether the agent can publish at all
  • Rate limit — maximum publications per consolidation cycle

The scope hierarchy is a strict linear lattice:

```
Targeted (rank 1) < Squad (rank 2) < Project (rank 3) < Global (rank 4)
```

An agent with max_scope = Squad can publish to targeted agents or its squad, but cannot publish to the project or global scope. Ceiling enforcement uses a simple rank comparison — no ambiguity, no escalation path.
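Because the lattice is strictly linear, ceiling enforcement reduces to integer ordering. A minimal sketch of the clamping behavior described here:

```python
from enum import IntEnum

class Scope(IntEnum):
    TARGETED = 1
    SQUAD = 2
    PROJECT = 3
    GLOBAL = 4

def clamp_publication_scope(requested: Scope, ceiling: Scope) -> Scope:
    # The effective scope is never above the agent's maximum; a
    # Squad-capped agent asking for Global gets Squad.
    return min(requested, ceiling)
```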

Trust-weighted contradiction resolution

When two agents publish contradictory facts on the same topic, the system resolves it based on:

```
effective_importance = raw_importance × agent_trust_score
```

The memory with the higher effective importance is retained. Trust scores are per-agent (0.0–1.0) and adjustable at runtime. This prevents a compromised or low-trust agent from overwriting facts established by high-trust agents through recency alone.
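The resolution rule is a one-line comparison once each memory carries its trust score. A sketch (dict shape is illustrative):

```python
def effective_importance(memory: dict) -> float:
    return memory["importance"] * memory["trust"]

def resolve_contradiction(mem_a: dict, mem_b: dict) -> dict:
    """Keep whichever contradictory memory has higher effective importance."""
    return mem_a if effective_importance(mem_a) >= effective_importance(mem_b) else mem_b
```

Note how a low-trust agent cannot win just by shouting: importance 0.9 at trust 0.3 (0.27) still loses to importance 0.6 at trust 0.8 (0.48).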

Publish-side defenses

| Defense | Detail |
| --- | --- |
| Scope enforcement | Publication scope clamped to agent's maximum |
| Importance ceiling | Publication importance clamped to agent's ceiling |
| Per-agent rate limiting | Publish count tracked per GC window; exceeded limits return an error |
| Injection scanning | All publication content scanned for prompt injection patterns before entering the bus |

Threat model

| Attack | Mitigation |
| --- | --- |
| Agent floods bus with poisoned memories | Rate limit + injection scan on publish |
| Low-trust agent overwrites high-trust facts | Trust-weighted contradiction resolution |
| Agent publishes beyond its authority | Scope ceiling enforcement |
| Forged capability token | HMAC-SHA256 verification against platform secret |
| Cross-agent memory reads via confused deputy | Signed read-path tokens with identity binding + membership verification |

Multi-agent orchestration: delegation with guardrails

OpenPawz supports four distinct agent-to-agent communication patterns, each with its own security model:

1. Orchestrator projects (boss/worker hierarchy)

A boss agent receives a project goal and team roster, then delegates tasks to worker agents:

| Control | Detail |
| --- | --- |
| Per-agent capabilities filter | Each sub-agent gets a capabilities list restricting which tools it can access — tools not on the list are physically removed from the agent's schema |
| HIL on exfiltration tools | `email_send`, `slack_send`, `webhook_send`, `rest_api_call`, `exec`, `write_file`, `delete_file` always require user approval — even under orchestrator delegation |
| Max tool rounds | Global cap (default 20) bounds every agent loop |
| Max concurrent runs | Default 4 simultaneous agent runs across the entire engine |
| Worker exit conditions | Workers stop on `report_progress(done)`, max tool rounds, or error — they cannot run indefinitely |

2. The Foreman Protocol (architect/worker split)

For MCP tool execution, the Foreman Protocol splits agent work into two roles:

  • Architect (cloud LLM): Plans and reasons — decides what to do
  • Foreman (local/cheap model): Executes how — handles MCP tool calls

Critical security constraints:

  • No recursion — the Foreman cannot spawn sub-workers or delegate further
  • 8-round cap — max 8 tool call rounds per delegation
  • Direct MCP execution — Foreman calls MCP servers via JSON-RPC directly

3. Squads (peer-to-peer collaboration)

Flat peer groups with channel-based messaging. No boss/worker hierarchy, but scoped by squad membership.

4. Direct agent messaging

Any agent can message any other agent via the agent_send_message tool. Broadcast messages are visible to all agents. Channel-based filtering available.


Anti-forensic protections

The memory store mitigates vault-size oracle attacks — a side-channel where an attacker infers how many memories are stored by watching the SQLite file size. This is the same threat class addressed by KDBX (KeePass) inner-content padding.

| Mitigation | Detail |
| --- | --- |
| Bucket padding | Database padded to 512KB boundaries via padding table — an observer can only determine a coarse size bucket, not exact memory count |
| Secure erasure | Two-phase delete: content fields overwritten with empty values, then row deleted — prevents plaintext recovery from freed pages or WAL replay |
| 8KB page size | `PRAGMA page_size = 8192` reduces file-size measurement granularity |
| Secure delete | `PRAGMA secure_delete = ON` zeroes freed B-tree pages at the SQLite layer |
| Incremental auto-vacuum | Prevents immediate file-size shrinkage after deletions (which would reveal deletion count) |
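The bucket-padding arithmetic is just round-up division. A sketch of the size an observer would actually see:

```python
BUCKET = 512 * 1024   # pad to 512 KB boundaries

def padded_size(actual_bytes: int) -> int:
    # Round up to the next bucket so an observer watching the file
    # only learns a coarse size bucket, never the exact content size.
    buckets = (actual_bytes + BUCKET - 1) // BUCKET
    return max(buckets, 1) * BUCKET
```

A vault holding one memory and a vault holding a thousand small ones can land in the same bucket, which is exactly what defeats the size oracle.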

Working memory snapshot integrity

Snapshots of an agent's working memory (saved on agent switch or session end) include an HMAC-SHA256 integrity tag computed from a dedicated HKDF-derived key. On restore, the HMAC is verified — tampered snapshots are rejected and logged.


Credential security

No cryptographic key is ever stored on the filesystem. Everything lives in the OS keychain:

| Key | Keychain Entry | Purpose |
| --- | --- | --- |
| DB encryption key | `paw-db-encryption` | AES-256-GCM database field encryption |
| Skill vault key | `paw-skill-vault` | AES-256-GCM skill credential encryption |
| Memory vault key | `paw-memory-vault` | Master key for HKDF per-agent memory encryption |
| Lock screen hash | `paw-lock-screen` | SHA-256 hashed passphrase |

There is no device.json, no key file, and no config file containing secrets. If the OS keychain is unavailable, the app refuses to store credentials rather than falling back to plaintext. No silent degradation.

API key zeroing in memory

API keys in provider structs are wrapped in Zeroizing<String> from the zeroize crate. When a provider is dropped, the key memory is immediately zeroed using write_volatile — preventing:

  • Memory dump attacks (forensic tools scanning process memory)
  • Swap file leaks (unencrypted keys persisted to disk via OS paging)
  • Use-after-free (freed memory still containing the key being reallocated)

Credential audit trail

Every credential access is logged to credential_activity_log with action, requesting tool, allow/deny decision, and timestamp.


TLS certificate pinning

All AI provider connections use a certificate-pinned TLS configuration via rustls. The OS trust store is explicitly excluded.

| Property | Detail |
| --- | --- |
| Library | rustls 0.23 (pure-Rust, no OpenSSL) |
| Root store | Mozilla root certificates via `webpki-roots` only |
| OS trust store | Explicitly excluded — system CAs are never consulted |
| Connect timeout | 10 seconds |
| Request timeout | 120 seconds |

Why this matters: most TLS MITM attacks rely on installing a custom root CA on the victim's machine (corporate proxies, malware, government surveillance). By trusting only Mozilla's root store, OpenPawz rejects certificates signed by any other CA — even ones the OS itself trusts.

Outbound request signing

Every AI provider request is SHA-256 signed before transmission:

```
SHA-256(provider ‖ model ‖ ISO-8601 timestamp ‖ request body)
```

Hashes are logged to an in-memory ring buffer (500 entries) for tamper detection and compliance auditing. If a proxy modifies the request body in transit, the recorded hash won't match.
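The hash-and-log step can be sketched in a few lines. The concatenation order follows the formula above; the ring-buffer shape and function name are illustrative, not the engine's actual Rust code:

```python
import hashlib
from collections import deque
from datetime import datetime, timezone

audit_ring = deque(maxlen=500)   # in-memory ring buffer; oldest entries evicted

def sign_request(provider: str, model: str, body: bytes) -> str:
    ts = datetime.now(timezone.utc).isoformat()
    digest = hashlib.sha256(
        provider.encode() + model.encode() + ts.encode() + body
    ).hexdigest()
    audit_ring.append((ts, digest))   # kept for tamper detection / audit
    return digest
```

Auditing then means recomputing the hash over what the proxy actually forwarded and comparing it to the logged entry; any body modification in transit produces a mismatch.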


Prompt injection defense

Dual-implementation scanning (TypeScript + Rust) for 30+ injection patterns across 9 categories:

| Category | Examples |
| --- | --- |
| Override | "Ignore previous instructions" |
| Identity | "You are now..." |
| Jailbreak | "DAN mode", "no restrictions" |
| Leaking | "Show me your system prompt" |
| Obfuscation | Base64-encoded instructions |
| Tool injection | Fake tool call formatting |
| Social engineering | "As an AI researcher..." |
| Markup | Hidden instructions in HTML/markdown |
| Bypass | "This is just a test..." |

Messages scoring Critical (40+) are blocked entirely and never delivered to the agent. Channel bridges automatically enforce this.
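A weighted-pattern scanner with a blocking threshold can be sketched as follows. The patterns, weights, and threshold value of 40 here are illustrative only; the real scanner has 30+ patterns across 9 categories:

```python
import re

# Illustrative patterns and weights, one per category shown above
PATTERNS = [
    (r"ignore (all )?previous instructions", 25),   # override
    (r"you are now", 15),                           # identity
    (r"\bDAN mode\b", 25),                          # jailbreak
    (r"show me your system prompt", 20),            # leaking
]
CRITICAL_THRESHOLD = 40

def injection_score(message: str) -> int:
    return sum(w for p, w in PATTERNS if re.search(p, message, re.IGNORECASE))

def is_blocked(message: str) -> bool:
    # Critical-scoring messages are never delivered to the agent.
    return injection_score(message) >= CRITICAL_THRESHOLD
```

Scoring rather than hard-matching means a single benign phrase ("show me your system prompt" in a security discussion) stays below the blocking threshold, while stacked attack phrases cross it.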

Memory-side injection scanning

Recalled memories are scanned for 10 injection patterns before being returned to agent context. Suspicious content is redacted with [REDACTED:injection] markers — poisoned memories cannot manipulate future agent behavior.


Anti-fixation defenses

Five layers prevent agents from ignoring user instructions or getting stuck:

| Defense | What it does |
| --- | --- |
| Response loop detection | Jaccard similarity checks catch the agent repeating itself — active on ALL channels |
| User override detection | Recognizes "stop", "focus on my question", "that's not what I asked" across 5 phrase categories with 3-level escalation |
| Unidirectional topic ignorance | Catches unique-but-wrong responses after a redirect — fires when the agent's response has zero entity overlap with the user's keywords |
| Momentum clearing | Clears working memory trajectory embeddings on user override — recalled context serves the new topic, not the old one |
| Tool-call loop breaker | Hash-based signature detection stops repeated identical tool calls after 3 consecutive matches |
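The response-loop check is word-set Jaccard similarity against recent responses. A sketch — the 0.85 threshold is an assumption for illustration:

```python
def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity between two responses."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0

def is_repeating(history: list[str], new_response: str,
                 threshold: float = 0.85) -> bool:
    # Flag the response if it nearly duplicates anything the agent
    # already said this session.
    return any(jaccard(prev, new_response) >= threshold for prev in history)
```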

Filesystem sandboxing

Sensitive path blocking

20+ sensitive paths are permanently blocked from agent access:

~/.ssh · ~/.gnupg · ~/.aws · ~/.kube · ~/.docker · ~/.password-store · /etc · /root · /proc · /sys · /dev · filesystem root · home directory root

Per-project scope

When a project is active, all file operations are constrained to the project root. Directory traversal sequences (../) are detected and blocked. Violations are logged to the security audit.
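One robust way to enforce a project-root constraint is structural path resolution rather than substring matching on "..". This Python sketch illustrates the idea; it is not the engine's actual Rust implementation:

```python
from pathlib import Path

def is_within_project(project_root: str, requested: str) -> bool:
    """True if `requested` stays under the project root after resolution."""
    root = Path(project_root).resolve()
    # resolve() collapses "../" segments, and joining an absolute
    # `requested` simply replaces the root, so both escape styles
    # are caught structurally.
    target = (root / requested).resolve()
    return target == root or root in target.parents
```

Resolving before comparing also defeats tricks like `a/../../etc/passwd` that a naive `".." in path` check would mis-handle once mixed with legitimate relative paths.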

Source code introspection block

Agents cannot read their own engine source files — any read_file call targeting paths containing src-tauri/src/engine/ or files ending in .rs is rejected. This prevents agents from discovering internal security mechanisms.


Container sandbox

Docker-based execution isolation via the bollard crate:

| Measure | Default |
| --- | --- |
| Capabilities | `cap_drop` ALL |
| Network | Disabled |
| Memory limit | 256 MB |
| CPU shares | 512 |
| Timeout | 30 seconds |
| Output limit | 50 KB |

Four presets: Minimal (alpine, 128MB, no network), Development (node:20-alpine, 512MB), Python (python:3.12-alpine, 512MB), Restricted (alpine, 64MB, 10s timeout).


GDPR Article 17 — Right to erasure

The engine_memory_purge_user command performs complete data erasure for a user:

  • All memory content rows deleted
  • All vector embeddings deleted
  • Search index entries removed
  • Graph edges removed
  • Padding table repacked to prevent file-size leakage
  • PRAGMA secure_delete ensures freed pages are zeroed
  • Returns a count of erased records for compliance reporting

The twelve layers at a glance

| # | Layer | What it protects against |
| --- | --- | --- |
| 1 | Zero open ports | Remote network attacks |
| 2 | Human-in-the-Loop | Unauthorized side-effects |
| 3 | Agent policies | Over-privileged agents |
| 4 | Per-agent HKDF encryption | Cross-agent data access |
| 5 | PII detection + field encryption | Data exposure at rest |
| 6 | Signed capability tokens | Scope escalation, confused deputy attacks |
| 7 | Trust-weighted memory bus | Memory poisoning between agents |
| 8 | TLS certificate pinning | MITM on provider connections |
| 9 | Prompt injection scanning | Prompt manipulation (inbound + recalled) |
| 10 | Anti-fixation defenses | Agent ignoring user instructions |
| 11 | Filesystem sandboxing | Credential theft, path traversal |
| 12 | Anti-forensic vault padding | File-size side-channel leakage |

Read the full security docs

The complete security reference — including risk classification tables, allowlist/denylist patterns, and every configuration option — lives in the repo:

If you find a vulnerability, please report it responsibly via the contact information in the repo rather than opening a public issue.

Star the repo if you want to track progress. 🙏

OpenPawz — Your AI, Your Rules

A native desktop AI platform that runs fully offline, connects to any provider, and puts you in control. Private by default. Powerful by design.
