Nick Ciolpan
How to Secure Claude CLI When It Runs Inside Your Software (don't ask)

If your application triggers Claude CLI server-side based on user input, you have a prompt injection surface. User types freeform text, your app wraps it in a prompt, Claude processes it. Without guardrails, that user could attempt to make Claude leak context, produce malicious output, or — if tools are enabled — interact with the host system.

Five layers, stacked. None sufficient alone.

Layer 1: Text-Only Mode

```bash
claude --print
```

--print disables interactive tool use in normal operation. Claude receives text via stdin, returns text via stdout. No file reads, no bash, no writes.

Caveat: This is a behavioral constraint, not a formal security boundary. It depends on CLI implementation details and should not be your only control.

Layer 2: Strip Capabilities

```bash
claude --print \
  --bare \
  --disallowedTools "Bash,Edit,Write,Read,Glob,Grep,Agent,NotebookEdit"
```
  • --bare disables hooks, LSP, plugin sync, auto-discovery of project files (CLAUDE.md), and keychain reads. Reduces context available to the model — but does not guarantee zero leakage. Environment variables and OS-level information may still be accessible at the process level.
  • --disallowedTools explicitly denies every tool by name. Defense in depth — if --print behavior changes in a future version, tools remain blocked.
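Building the argv in one place keeps every call site on the same deny list. A minimal sketch — `hardenedArgv` and `DENIED_TOOLS` are illustrative names, and the tool list is the one this post disallows (check your installed CLI version for the current set):

```javascript
// Single source of truth for the deny list, so no call site
// can accidentally spawn the CLI with tools enabled.
const DENIED_TOOLS = [
  "Bash", "Edit", "Write", "Read",
  "Glob", "Grep", "Agent", "NotebookEdit",
];

function hardenedArgv() {
  return [
    "--print",
    "--bare",
    "--disallowedTools", DENIED_TOOLS.join(","),
  ];
}
```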

Layer 3: Process Isolation

```javascript
const { spawn } = require("node:child_process");
const os = require("node:os");

const child = spawn("claude", ["--print", "--bare", "--disallowedTools", "..."], {
  cwd: os.tmpdir(),
  timeout: 300000, // kill the process after 5 minutes
});
```

cwd: /tmp means the process starts in a directory with nothing interesting. This is not a filesystem sandbox — the process can still access absolute paths. It reduces incidental exposure, not hard access.

For actual isolation, run the process inside a container with restricted filesystem mounts, no network access, a non-root user, and resource limits (memory, CPU). The cwd trick is a soft boundary, not a security boundary.

Layer 4: Prompt Validation

````javascript
function validatePrompt(prompt) {
  // Must contain system marker (user input alone can't form a valid prompt)
  if (!prompt.includes("YourSystemMarker")) {
    return "Prompt must originate from the application";
  }

  // Reduce obvious attack patterns
  const forbidden = [
    /```(?:bash|sh|shell|zsh)\n/i,
    /\bexec\s*\(/i,
    /\bprocess\.env/i,
    /\bchild_process/i,
    /\bfs\.\w+/i,
    /rm\s+-rf/i,
    /sudo\s/i,
  ];

  for (const pattern of forbidden) {
    if (pattern.test(prompt)) return `Blocked: ${pattern}`;
  }

  // Must request structured output (app controls format, not user)
  if (!prompt.includes("===OUTPUT_START===")) {
    return "Prompt must request delimited output";
  }

  // null means the prompt passed every check
  return null;
}
````

The system marker and output delimiters ensure the app assembled the prompt — raw user input can't pass validation alone.

Important limitation: Prompt injection is semantic, not syntactic. A user doesn't need exec() or rm -rf to manipulate model behavior. They can write "ignore previous instructions" or "reveal the system prompt" and no regex catches that. Pattern matching reduces surface area for obvious attacks. It does not prevent prompt injection.
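The app-side counterpart to validation is assembly: the application, not the user, constructs the prompt around the user's text. A sketch — the marker and delimiter strings match the validator, but the task wording is an illustrative assumption:

```javascript
// The app wraps user text in system context and delimiters,
// so raw user input alone can never be a valid prompt.
function assemblePrompt(userText) {
  return [
    "YourSystemMarker",
    "Summarize the text below the separator.",
    "Reply only between ===OUTPUT_START=== and ===OUTPUT_END===.",
    "---",
    userText,
  ].join("\n");
}
```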

Layer 5: Output Containment

This is the most important layer. Never execute Claude's output. Treat it as untrusted text.


```javascript
const match = output.match(/===OUTPUT_START===([\s\S]*?)===OUTPUT_END===/);
if (!match) throw new Error("Model did not return delimited output");

const targetDir = `outputs/${sessionId}/`;
fs.mkdirSync(targetDir, { recursive: true });
fs.writeFileSync(path.join(targetDir, "result.md"), match[1]);
```
  • Write only to an isolated output directory — never source code, config, or system files
  • Write only inert file types (markdown, static HTML) — never executable code
  • New directory per operation — previous outputs are immutable

The real danger from LLM output is indirect: your system does something dangerous with it. If the output is never executed, evaluated, or passed to a shell, the model's text is inert regardless of what it says.

Combined


```plaintext
User input (freeform text)
  ↓
App assembles prompt (system context + delimiters + user text)
  ↓
[Layer 4] Validate prompt (origin, patterns, format)
  ↓
[Layer 1-3] claude --print --bare --disallowedTools "..." (spawned with cwd: /tmp)
  ↓
[Layer 5] Parse delimited output → write to isolated directory
```

What this achieves:

  • User can't control prompt structure (app assembles it)
  • Obvious injection patterns are rejected (regex filter)
  • Tools are disabled at CLI level (behavioral + explicit deny)
  • Host context is reduced (bare mode, /tmp cwd)
  • Output is treated as untrusted text (never executed)

What this does not achieve:

  • Prevention of semantic prompt injection ("ignore instructions")
  • Guaranteed zero context leakage (env vars, process info)
  • Filesystem sandboxing (cwd is not chroot)

API vs CLI

When calling the Anthropic API directly, layers 1-3 don't apply — there's no CLI process. Layers 4 and 5 still work identically. The API has no filesystem access by default, but injection risk remains: the model can still be manipulated to leak data you included in the prompt or produce output that influences downstream systems.
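Because layers 4 and 5 are transport-agnostic, they can be factored into one wrapper that works for either backend. A sketch — `guardedCompletion` is a hypothetical name, and `callModel` stands in for whatever you use (a CLI spawn or an API client):

```javascript
// Validate on the way in, extract delimited output on the way out.
// The transport behind `callModel` is irrelevant to both layers.
async function guardedCompletion(prompt, callModel) {
  if (!prompt.includes("YourSystemMarker")) {
    throw new Error("unvalidated prompt");
  }
  const raw = await callModel(prompt);
  const match = raw.match(/===OUTPUT_START===([\s\S]*?)===OUTPUT_END===/);
  if (!match) {
    throw new Error("model did not return delimited output");
  }
  return match[1].trim();
}
```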

Starting Point, Not Endpoint

With all five layers applied, Claude is rendered effectively harmless — it can't use tools, can't see files, can't execute commands, and its output goes nowhere dangerous. This is the correct starting point. Strip everything, verify it's inert, then selectively grant back capability and access as your use case requires — with each addition evaluated as a new attack surface.
