
The Coding Agent Harness Is Broken: 5 Security Patterns Nobody Teaches You


This is not fearmongering -- Uber burned $500-2K per engineer per month on AI coding, and last week a Replit AI agent deleted a company's entire production database. If you're vibe-coding without these patterns, you're one prompt away from disaster.

Featured in: Hacker News | Reddit r/artificial | PCMag: Vibe Coding Fiasco


When an AI coding agent goes rogue and deletes your production database -- as reported by PCMag -- it's not the AI's fault. It's yours. You gave it a broken harness.

The good news? A new wave of open-source tools is finally solving this properly. Let me show you the patterns that actually work.

Why 90% of Developers Get Agent Harnessing Wrong

The standard setup looks like this: give the agent an API key, point it at your repo, let it run. This is like handing your laptop to a caffeinated intern and hoping for the best.
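To see why that's dangerous, here is a deliberately naive sketch of that setup (a hypothetical loop, not any particular SDK): whatever text the model produces gets run as a shell command, with your full environment and credentials.

import os
import subprocess

# The anti-pattern: execute whatever the model proposes, as your user,
# with every environment variable and credential you have. Don't do this.
def run_agent_unsafely(next_command_from_model):
    while True:
        cmd = next_command_from_model()   # raw text straight from the model
        if cmd is None:
            break
        subprocess.run(cmd, shell=True, env=os.environ)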

The problems are well-documented now:

  1. Over-privileged agents -- they get credentials with production write access by default
  2. No execution boundaries -- a bad prompt injection turns the agent into an attacker
  3. Silent failures -- you don't know what the agent did until it's too late
  4. No sandboxing -- the agent can touch anything on your system or cloud environment

As one HN commenter put it: "The agent harness belongs outside the sandbox" -- meaning the safety layer needs to be architected separately from the agent itself, not bolted on as an afterthought.
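Concretely, that means the harness is its own process: it holds the policy and the credentials, and the agent can only propose actions for it to carry out. A minimal sketch of that split (hypothetical Action and Harness types, not any specific tool):

from dataclasses import dataclass

@dataclass
class Action:
    kind: str     # e.g. "read", "write", "exec"
    target: str   # path or command the agent wants to touch

class Harness:
    """Lives outside the sandbox; the agent never sees the policy or secrets."""
    def __init__(self, allowed_dirs):
        self.allowed_dirs = allowed_dirs

    def approve(self, action):
        return any(action.target.startswith(d) for d in self.allowed_dirs)

    def execute(self, action):
        if not self.approve(action):
            raise PermissionError(f"Blocked: {action.kind} {action.target}")
        # ...perform the action on the agent's behalf and log it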


Pattern 1: Capability-Based Sandboxing (jcode)

jcode is a Rust-based coding agent harness that enforces execution boundaries at the OS level. Unlike traditional agent setups, it uses Linux namespaces and seccomp-bpf to restrict what the agent process can actually do.

# Install jcode (Rust-based, zero runtime dependencies)
cargo install jcode

# Run with a restricted policy -- no network, read-only filesystem
jcode run --policy restrict --allowed-dirs ./my-project --no-network

The key insight: capabilities are granted explicitly, not inherited. The agent can't touch /etc, can't reach the internet, and can't write outside ./my-project unless you explicitly allow it.

# jcode Python SDK example: scoped execution
from jcode import Agent, Policy

policy = Policy(
    allowed_paths=["./my-project"],
    network_whitelist=["github.com"],  # Only allow GitHub API calls
    max_file_size_mb=50,
    read_only=False,
    timeout_seconds=300,
)

agent = Agent(
    model="claude-3-7-sonnet",
    policy=policy,
)

# This will be blocked: agent trying to access ~/.ssh
result = agent.execute("Check the SSH config at ~/.ssh/config")
# Result: PolicyViolationError: Path outside allowed_dirs

Why this matters: even if an attacker prompts your agent to "check production secrets," the harness blocks it at the OS level before the agent even sees those files.
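Building on the SDK example above (same assumed jcode API, and assuming the exception class is importable from the package), the calling code can catch the violation and fall back instead of crashing:

from jcode import Agent, PolicyViolationError  # assumed import path

agent = Agent(model="claude-3-7-sonnet", policy=policy)

try:
    result = agent.execute("Check the SSH config at ~/.ssh/config")
except PolicyViolationError as err:
    # The request is denied before any file is read; log it and move on.
    print(f"Blocked by harness policy: {err}")
    result = None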

Data point: jcode has 2,800+ GitHub stars with climbing adoption -- it's becoming the standard for sandboxed coding agents in production.


Pattern 2: Ephemeral Cloud Sandboxes (Browserbase Skills)

browserbase/skills takes a different approach: don't run the agent on your machine at all. Instead, it spins up ephemeral cloud environments for each task.

# Install Browserbase Skills CLI
npm install -g @browserbase/skills

# Authenticate
bb skills auth

# Run a coding task in an isolated cloud environment
bb skills run --task "Refactor the auth module" \
    --repo git@github.com:your-org/backend.git \
    --env production \
    --no-persist  # All changes discarded after task completes

The killer feature: zero blast radius. Even if the agent goes completely rogue in the cloud sandbox, your local environment is untouched. The sandbox is created fresh for each task and destroyed immediately after.

// Programmatic usage with Browserbase SDK
import { Browserbase } from '@browserbase/sdk';

const bb = new Browserbase({ apiKey: process.env.BB_API_KEY });

// Create an ephemeral project environment
const session = await bb.sessions.create({
  projectId: 'your-project',
  ephemeral: true,
  autoDestroy: true,  // Destroyed 5 minutes after last activity
  allowedCommands: ['git', 'npm', 'python', 'docker'],
  blockedCommands: ['rm -rf', 'drop database', 'kubectl delete'],
});

console.log(`Session ${session.id} -- isolated, ephemeral, auditable`);

This is the pattern used by high-stakes deployments: you get an audit log of every command the agent ran, in an environment that simply cannot touch your production systems.


Pattern 3: The Read-First Linter Guard

Before allowing any write operation, route it through a linter that flags dangerous changes. This adds a human-in-the-loop for destructive actions.

import re
from pathlib import Path

DANGEROUS_PATTERNS = [
    r'drop\s+database',
    r'delete\s+from\s+\w+',
    r'rm\s+-rf\s+/',
    r'\.env(?!\.example)',
    r'chmod\s+777',
    r'sudo\s+rm',
    r'kubectl\s+delete',
    r'docker\s+rm\s+-f\s+\$\(docker\s+ps',  # escaped so the regex compiles
]

def validate_write(path, content):
    # Returns (allowed, reason)
    sensitive = ['/etc/', '/root/.ssh/', '/var/log/', '~/.aws/']
    for s in sensitive:
        if path.startswith(s.replace('~', str(Path.home()))):
            return False, f"Blocked: sensitive path {path}"

    content_lower = content.lower()
    for pattern in DANGEROUS_PATTERNS:
        if re.search(pattern, content_lower, re.IGNORECASE):
            return False, f"Blocked: dangerous pattern '{pattern}' in content"

    if 'rm -rf' in content_lower and 'node_modules' not in path:
        return False, "Blocked: suspicious rm -rf command"

    return True, "Allowed"

def execute_write(path, content):
    allowed, reason = validate_write(path, content)
    if not allowed:
        return {"status": "blocked", "reason": reason, "path": path}

    with open(path, 'w') as f:
        f.write(content)

    return {"status": "written", "path": path, "logged": True}

# Usage in an agent loop
result = execute_write("./src/auth.py", ai_generated_code)
if result["status"] == "blocked":
    print(f"WARNING: Write blocked: {result['reason']}")
    # Notify human, await approval (see the approval-gate sketch below)
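The comment above only marks where approval belongs. A minimal human-in-the-loop gate could look like this (a sketch that assumes an interactive terminal; in CI you'd swap the prompt for a Slack or ticketing approval):

def request_human_approval(path, content, reason):
    # Show the human what was blocked and why, then ask for an explicit yes.
    print(f"Agent wants to write {path}")
    print(f"Guard says: {reason}")
    print(content[:500])  # preview the first 500 characters
    return input("Approve this write anyway? [y/N] ").strip().lower() == "y"

result = execute_write("./src/auth.py", ai_generated_code)
if result["status"] == "blocked":
    if request_human_approval("./src/auth.py", ai_generated_code, result["reason"]):
        with open("./src/auth.py", "w") as f:  # human explicitly overrode the guard
            f.write(ai_generated_code)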

This pattern works great layered on top of jcode -- even if the OS-level sandbox misses something, the application-level guard catches it.


Pattern 4: Graduated Permission Escalation

Instead of all-or-nothing access, design your agent workflow with escalation levels:

# Level 0: Read-only analysis (no writes, no execution)
AGENT_LEVEL=0
echo "Level 0: Passive analysis only"

# Level 1: Read + write to ./scratch only
AGENT_LEVEL=1
export AGENT_SANDBOX_DIR=./scratch
echo "Level 1: Can write to ./scratch, no production access"

# Level 2: Read production + write to ./scratch
AGENT_LEVEL=2
export READ_PRODUCTION=true
export WRITE_DIR=./scratch
export CONFIRM_DESTRUCTIVE=true
echo "Level 2: Can read production code, writes go to scratch"

# Level 3: Full access with full logging (human approves each destructive action)
AGENT_LEVEL=3
export READ_PRODUCTION=true
export WRITE_PRODUCTION=true
export LOG_ALL=true
export APPROVE_DESTRUCTIVE=true
echo "Level 3: Full access with human-in-the-loop"
from dataclasses import dataclass
from enum import IntEnum

class AgentLevel(IntEnum):
    ANALYZE = 0  # Read-only, no execution
    SCRATCH = 1  # Write to ./scratch only
    REVIEW = 2   # Read production, write scratch, destructive needs approval
    FULL = 3     # Full access, all destructive ops logged and approved

@dataclass
class AgentPermissions:
    level: AgentLevel
    read_production: bool = False
    write_production: bool = False
    execute_commands: bool = False
    max_file_size_mb: int = 10

    @classmethod
    def for_level(cls, level):
        configs = {
            0: cls(level=AgentLevel.ANALYZE),
            1: cls(level=AgentLevel.SCRATCH, write_production=False, max_file_size_mb=10),
            2: cls(level=AgentLevel.REVIEW, read_production=True, write_production=False),
            3: cls(level=AgentLevel.FULL, read_production=True, write_production=True,
                   execute_commands=True, max_file_size_mb=100),
        }
        return configs.get(level, configs[0])

# Junior task = junior permissions
profile = AgentPermissions.for_level(1)
print(f"Level {profile.level.value}: scratch writes only")

This mirrors how human engineers work -- a junior dev gets Level 1, a senior gets Level 2 with approval, and Level 3 is reserved for emergency hotfixes with full audit trails.
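One way to operationalize this is a small router that picks the starting level from the task itself, so escalation is a deliberate decision rather than a default (illustrative heuristics only, reusing the AgentLevel enum above):

def level_for_task(task_description, approved_by=None):
    # Start every task at the lowest level that can plausibly do the job.
    text = task_description.lower()
    if any(word in text for word in ("explain", "review", "summarize")):
        return AgentLevel.ANALYZE
    if any(word in text for word in ("prototype", "experiment", "draft")):
        return AgentLevel.SCRATCH
    if approved_by is None:
        return AgentLevel.REVIEW   # production reads, scratch-only writes
    return AgentLevel.FULL         # only with a named approver on record

profile = AgentPermissions.for_level(level_for_task("Draft a migration script"))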


Pattern 5: Audit Everything with Structured Logging

You can't fix what you can't see. Every agent action should be logged in a structured format that you can actually query.

import json
import time
from datetime import datetime
import sqlite3

class AgentAuditLogger:
    def __init__(self, db_path="./agent_audit.db"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS agent_actions (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp TEXT,
                agent_id TEXT,
                action_type TEXT,
                target TEXT,
                outcome TEXT,
                duration_ms INTEGER,
                metadata TEXT,
                approved_by TEXT
            )
        """)

    def log(self, agent_id, action_type, target, outcome, duration_ms=0, metadata=None, approved_by=None):
        self.conn.execute(
            """INSERT INTO agent_actions (timestamp, agent_id, action_type, target, outcome, duration_ms, metadata, approved_by) VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
            (datetime.utcnow().isoformat(), agent_id, action_type, target, outcome, duration_ms,
             json.dumps(metadata or {}), approved_by)
        )
        self.conn.commit()

    def get_dangerous_actions(self, days=7):
        cursor = self.conn.execute(
            """SELECT timestamp, action_type, target FROM agent_actions
               WHERE outcome IN ('blocked', 'flagged') AND timestamp > datetime('now', ?)
               ORDER BY timestamp DESC LIMIT 50""",
            (f'-{days} days',)
        )
        return cursor.fetchall()

# Usage: wrap every agent action
logger = AgentAuditLogger()

def agent_write_file(agent_id, path, content):
    start = time.time()

    # Validate before writing
    allowed, reason = validate_write(path, content)
    outcome = "allowed" if allowed else "blocked"

    logger.log(
        agent_id=agent_id,
        action_type="write_file",
        target=path,
        outcome=outcome,
        duration_ms=int((time.time() - start) * 1000),
        metadata={"size_bytes": len(content), "reason": reason if not allowed else ""}
    )

    if not allowed:
        return {"status": "blocked", "reason": reason}

    with open(path, 'w') as f:
        f.write(content)
    return {"status": "written"}

Query your audit log after every session:

# Find all blocked actions in the last week
sqlite3 agent_audit.db \
  "SELECT timestamp, action_type, target FROM agent_actions \
   WHERE outcome='blocked' AND timestamp > datetime('now','-7 days')"
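The get_dangerous_actions helper defined above gives you the same view from Python:

# Same query from Python, using the logger defined in Pattern 5.
for timestamp, action_type, target in logger.get_dangerous_actions(days=7):
    print(f"{timestamp}  {action_type}  {target}")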

The Bottom Line

The "vibe coding" era is here, but most developers are running AI agents with the security posture of a house of cards. The tools exist to do this right:

  • jcode for OS-level sandboxing with explicit capability grants
  • Browserbase Skills for ephemeral cloud environments with zero blast radius
  • Guarded write executors for application-level policy enforcement
  • Graduated permissions to match access levels to task complexity
  • Structured audit logging so you can reconstruct exactly what happened

Pick one pattern and implement it this week. Your future self -- and your production database -- will thank you.


Related Reading

GitHub's 22 Models with 400+ MCP Integrations -- 90% of Developers Haven't Found

n8n's 5 Hidden Workflow Patterns -- 186K Stars, But 90% Use It Wrong

The Local LLM Ecosystem Doesn't Need Ollama -- 5 llama.cpp Tricks 90% Are Missing


What safety patterns are you using for AI coding agents? Drop your thoughts in the comments -- especially if you've had a close call.
