Claude Code in CI/CD: What Goes Wrong When You Run It Without a Sandbox
Claude Code CI/CD security refers to the set of practices, configurations, and runtime controls required to safely run Anthropic's Claude Code agent inside continuous integration and deployment pipelines: credential isolation, filesystem containment, subprocess privilege inheritance, and audit trail completeness. Without these controls, a CI job that invokes Claude Code can expose secrets, modify files outside the intended working tree, or make outbound network calls that bypass your pipeline's security model entirely.
Most teams don't think twice before adding Claude Code to a GitHub Actions workflow. It feels like adding any other CLI tool. You install it, pass an API key, and let it run. The problem is that Claude Code isn't a linter or a formatter. It's an autonomous agent that reads files, writes code, runs shell commands, and can call external APIs — all within the same process environment as your CI runner.
Why CI Environments Are Less Isolated Than They Look
Shared runners on GitHub Actions, GitLab CI, and CircleCI aren't containers in the security sense. They're virtual machines, or worse, shared runner pools, and every environment variable injected at the pipeline level is visible to every subprocess spawned during that job. GitHub Actions masks secrets, but masking only suppresses the value in log output. Once a secret is mapped into a job's or step's environment, it sits in the process environment as a plaintext string, and any child process, including every subprocess Claude Code forks, inherits it.
This is not a Claude Code-specific issue. It's standard POSIX fork/exec behavior: a child process gets a copy of its parent's environment unless the parent explicitly passes a different one. It becomes a Claude Code-specific problem because the agent chooses its own commands, with far more autonomy than a typical CLI tool. When Claude Code runs a bash command as part of its task, that subprocess receives your full environment, including ANTHROPIC_API_KEY, any cloud provider credentials, repository tokens, and signing keys. GitHub's own documentation on encrypted secrets explicitly states: "Secrets are not passed to workflows that are triggered by a forked repository." That rule governs which workflows receive secrets; within a job that does receive them, inheritance is total.
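A minimal Python sketch of that inheritance. The parent process sets a placeholder variable (DEMO_SECRET is illustrative, not a real credential) and spawns a child interpreter without passing env=. The child therefore receives a copy of the parent's environment and can read the value in plaintext, just as a bash command launched by an agent can.

```python
import os
import subprocess
import sys


def child_sees(var_name: str) -> str:
    """Spawn a child Python process and return the value it reads for var_name."""
    # No env= argument: the child inherits a copy of os.environ,
    # the same default an agent's shell commands get.
    result = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get({var_name!r}, ''))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Placeholder value standing in for a masked CI secret.
    os.environ["DEMO_SECRET"] = "not-a-real-key"
    print(child_sees("DEMO_SECRET"))  # the child prints the plaintext value
```

Masking would hide that printed value in a GitHub Actions log, but the child process still had it in memory the whole time.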
The blast radius expands when you consider what Claude Code can do with those inherited credentials. If the agent determines that running aws s3 cp or gh pr create is part of completing its task, it has the credentials to do so. Unless you record its tool calls, the job log may show only the result, not the command that produced it.
Credential Inheritance: The ANTHROPIC_API_KEY Problem
There's a second-order risk here that most CI security models miss. Your pipeline probably scopes cloud credentials using short-lived OIDC tokens or role-assumption — good practice. But ANTHROPIC_API_KEY is almost always a long-lived static key with no built-in scope restrictions. If Claude Code is running in the same environment as those OIDC tokens, and a malicious prompt injection causes Claude to exfiltrate environment variables (a demonstrated attack class — see CVE-2025-59536 for a concrete example of prompt injection in agentic Claude contexts), your API key leaves the building.
The mitigation isn't complicated, but it requires intent. Scope the key to a dedicated service account with usage monitoring. Inject it as a runtime secret only into the specific step that needs it, not as a pipeline-wide variable. Use GitHub Actions' env block at the step level, not the job level. And use a secrets manager (AWS Secrets Manager, HashiCorp Vault, or equivalent) to rotate the key on a schedule, so that a leaked key stops working within a short, bounded window.
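The same principle applies wherever you launch the agent yourself: start from an empty environment and copy in only what the step needs. Below is a minimal Python sketch. It assumes the agent is started with the claude CLI in print mode (claude -p). The allowlist contents and the function name build_step_env are illustrative, not a recommended set.

```python
import os
from typing import Iterable, Mapping

# Illustrative allowlist: what the agent step needs, and nothing else.
DEFAULT_ALLOW = ("PATH", "HOME", "LANG", "ANTHROPIC_API_KEY")


def build_step_env(parent: Mapping[str, str],
                   allow: Iterable[str] = DEFAULT_ALLOW) -> dict[str, str]:
    """Return a new environment containing only allowlisted variables.

    Anything not named in `allow` (cloud credentials, GITHUB_TOKEN,
    signing keys) is dropped instead of being inherited by default.
    """
    return {name: parent[name] for name in allow if name in parent}


if __name__ == "__main__":
    env = build_step_env(os.environ)
    print(sorted(env))  # names only; never print the values
```

Passing the result as env= to subprocess.run(["claude", "-p", prompt], env=env) means every command the agent spawns inherits this reduced environment instead of the runner's full one.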
For teams running self-hosted runners, the exposure surface is larger. Self-hosted runners persist state between jobs unless you explicitly configure ephemeral mode. Claude Code writing to ~/.claude/ or leaving cache artifacts on a persistent runner is a real data retention risk.
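On a self-hosted runner that can't run in ephemeral mode, a post-job cleanup step narrows that retention window. The following is a minimal sketch. It assumes agent state lives under ~/.claude/ as described above. The function name purge_agent_state is hypothetical, and the home parameter exists only so the behavior can be tested against a scratch directory.

```python
import shutil
from pathlib import Path
from typing import Optional


def purge_agent_state(home: Optional[Path] = None) -> bool:
    """Delete the agent's state directory under `home` (default: the user's home).

    Returns True if something was removed, False if there was nothing to delete.
    """
    base = home if home is not None else Path.home()
    state_dir = base / ".claude"  # assumption: per-user agent state lives here
    if state_dir.is_symlink():
        # Don't follow a symlink out of the home directory; remove the link itself.
        state_dir.unlink()
        return True
    if state_dir.is_dir():
        shutil.rmtree(state_dir)
        return True
    return False
```

If you keep user-level settings or hooks in that directory, re-provision them at the start of each job rather than relying on whatever persisted from the last one.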
Container and Worktree Patterns That Actually Contain the Blast Radius
The most effective containment pattern is also the most obvious: run Claude Code inside a dedicated container that doesn't inherit the full CI environment. Build a minimal Docker image that includes Claude Code, the specific CLI tools it needs for the task, and nothing else. Then inject only the secrets that task requires, scoped to the docker run command itself (an -e flag for a single variable, or a mounted secret file). Docker's --secret flag won't do this on its own: it's a BuildKit build-time feature that exposes a secret while the image is built, not to the running container.
At the filesystem level, Claude Code's --add-dir flag and worktree configuration give you explicit control over what the agent can see and write. Combine this with a read-only bind mount for directories the agent should only read, and a writable mount scoped to a temporary worktree for its output. This won't stop Claude from making network calls, but it does prevent filesystem writes outside intended boundaries — which eliminates a large class of accidental (and adversarial) damage.
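One way to wire this together is to build the docker run invocation in a small wrapper, so the mounts and the single forwarded secret are explicit and reviewable. This is a minimal sketch under stated assumptions. The image name claude-ci:minimal, the /work paths, and the helper name docker_argv are hypothetical. The Docker options used (--rm, --read-only, --tmpfs, --cap-drop, --security-opt no-new-privileges, --mount ... readonly, -e NAME) are standard docker run options, and --add-dir is the Claude Code flag mentioned above.

```python
from pathlib import Path


def docker_argv(src: Path, worktree: Path, prompt: str,
                image: str = "claude-ci:minimal") -> list[str]:
    """Build a docker run command that exposes only what the task needs."""
    return [
        "docker", "run", "--rm",
        "--read-only",                      # container root filesystem is read-only
        "--tmpfs", "/tmp",                  # writable scratch space, discarded on exit
        "--cap-drop", "ALL",                # no extra Linux capabilities
        "--security-opt", "no-new-privileges",
        # Source tree: visible to the agent but not writable.
        "--mount", f"type=bind,source={src.resolve()},target=/work/src,readonly",
        # Output worktree: the only writable project location.
        "--mount", f"type=bind,source={worktree.resolve()},target=/work/out",
        # "-e NAME" with no value forwards just this one variable from the caller.
        "-e", "ANTHROPIC_API_KEY",
        # Agent state lands on the tmpfs and disappears with the container.
        "-e", "HOME=/tmp",
        "--workdir", "/work/out",
        image,
        "claude", "-p", prompt, "--add-dir", "/work/src",
    ]
```

Run it with subprocess.run(docker_argv(...), check=True) from the CI step that holds ANTHROPIC_API_KEY. Only that one name is forwarded, so the rest of the runner's environment never reaches the container. Depending on the image, the CLI may need additional tmpfs mounts. The sketch deliberately leaves networking alone, since egress control is a separate layer.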
For teams using GitHub Actions specifically, the permissions key at the job level is worth auditing. It's common to see permissions: write-all on jobs that only need read access. If Claude Code is running in a job with write permissions to packages, deployments, or security-events, and something goes wrong, the scope of impact is unnecessarily large. Default to least privilege and add permissions explicitly.
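That audit can be automated. Here is a deliberately crude sketch using only the standard library. It flags lines that grant write-all and does not parse YAML, so treat it as a tripwire, not a full policy check; a thorough audit would parse the workflow files with a YAML library. The function name find_write_all is illustrative.

```python
import re
from pathlib import Path

# Matches "permissions: write-all", optionally quoted, with an optional trailing comment.
WRITE_ALL = re.compile(r"""^\s*permissions:\s*["']?write-all["']?\s*(#.*)?$""")


def find_write_all(text: str) -> list[int]:
    """Return 1-based line numbers that grant write-all permissions."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if WRITE_ALL.match(line)]


if __name__ == "__main__":
    # Scan .yml and .yaml workflow files in the current repository.
    for wf in sorted(Path(".github/workflows").glob("*.y*ml")):
        for lineno in find_write_all(wf.read_text()):
            print(f"{wf}:{lineno}: permissions: write-all")
```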
Our CLaude coe product overview covers the full architecture of these containment patterns, including how to wire up runtime policy enforcement alongside Claude Code's native configuration.
PreToolUse Hooks and Audit Trails as a Last Line of Defense
Claude Code's hook system — specifically PreToolUse hooks — lets you intercept every tool call before it executes. This is where you can enforce policy at runtime: block filesystem writes outside the working directory, reject bash commands that match a deny-list of dangerous patterns, or log every tool invocation with a timestamp and the triggering prompt context.
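A minimal sketch of such a policy hook in Python. It assumes the hook contract described in Claude Code's hooks documentation: the event arrives as JSON on stdin with tool_name and tool_input fields, a Bash call carries its command in tool_input["command"], and exit code 2 blocks the call and returns stderr to the model. It also assumes Write and Edit calls carry their target in tool_input["file_path"]. Verify these details against the version you run. The patterns are illustrative, and regex deny-lists are easy to evade. The --hook argument is a convention of this sketch that lets the module be imported for testing without reading stdin.

```python
import json
import os
import re
import sys

# Illustrative deny-list; regexes are easy to evade, so this is one layer, not the boundary.
DENY_PATTERNS = [
    re.compile(r"\bcurl\b[^|]*\|\s*(ba)?sh\b"),   # curl ... | bash
    re.compile(r"\bwget\b[^|]*\|\s*(ba)?sh\b"),   # wget -O- ... | sh
    re.compile(r">\s*/etc/"),                     # shell redirects into /etc
]


def inside(path: str, root: str) -> bool:
    """True if `path` resolves to a location under `root`."""
    real_path, real_root = os.path.realpath(path), os.path.realpath(root)
    return os.path.commonpath([real_path, real_root]) == real_root


def check(event: dict, root: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a single PreToolUse event."""
    tool = event.get("tool_name", "")
    tool_input = event.get("tool_input", {})
    if tool == "Bash":
        command = tool_input.get("command", "")
        for pattern in DENY_PATTERNS:
            if pattern.search(command):
                return False, f"command matches deny pattern {pattern.pattern!r}"
    elif tool in ("Write", "Edit"):
        # Assumption: these tools carry the target path in tool_input["file_path"].
        target = tool_input.get("file_path", "")
        if not target or not inside(os.path.join(root, target), root):
            return False, f"write outside {root} is not allowed: {target!r}"
    return True, ""


def main() -> int:
    event = json.load(sys.stdin)
    allowed, reason = check(event, root=os.getcwd())
    if allowed:
        return 0
    print(reason, file=sys.stderr)  # returned to the agent when the hook exits with 2
    return 2


if __name__ == "__main__" and "--hook" in sys.argv[1:]:
    sys.exit(main())
```

Registered as a PreToolUse command hook (for example python3 /opt/hooks/policy.py --hook, where the path is hypothetical), it runs before each matching tool call. Anything it doesn't block continues through Claude Code's normal permission rules.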
A minimal PreToolUse hook that writes to a structured log file gives you the audit trail that most CI security reviews require. Without it, you have no record of what Claude Code actually did during a pipeline run — only what it produced as output. When a security incident happens (and in any organization running Claude Code at scale in CI, eventually it will), "Claude ran for 90 seconds and then the PR appeared" is not a useful incident record.
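A minimal sketch of such a logging hook. It turns one hook event into a single JSON line with a UTC timestamp. It assumes the stdin JSON contract from Claude Code's hooks documentation (session_id, transcript_path, hook_event_name, tool_name, tool_input). The log path and the --hook argument are conventions of this sketch, not Claude Code settings. Hook events don't carry the prompt text itself, so the record keeps transcript_path, when present, as the pointer back to the conversation context.

```python
import json
import sys
from datetime import datetime, timezone
from typing import Optional

LOG_PATH = "/var/log/claude-hooks/audit.jsonl"  # hypothetical location


def audit_record(event: dict, now: Optional[datetime] = None) -> str:
    """Serialize one hook event as a single JSON line."""
    ts = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    record = {
        "ts": ts.isoformat(timespec="seconds").replace("+00:00", "Z"),
        "event": event.get("hook_event_name", "PreToolUse"),
        "session_id": event.get("session_id"),
        "transcript_path": event.get("transcript_path"),
        "tool_name": event.get("tool_name"),
        "tool_input": event.get("tool_input", {}),  # logged verbatim; redact before shipping
    }
    return json.dumps(record, sort_keys=True)


def main() -> int:
    line = audit_record(json.load(sys.stdin))
    with open(LOG_PATH, "a", encoding="utf-8") as log:
        log.write(line + "\n")
    return 0  # this hook only records; it never blocks a tool call


if __name__ == "__main__" and "--hook" in sys.argv[1:]:
    sys.exit(main())
```

One line per tool call keeps the file easy to grep during an incident and easy for a log shipper to forward. A local file is only the first hop, though.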
Hook output should go to a log aggregation system outside the CI runner, not just to the job log. If the runner is compromised or the job log is truncated, you want hook events persisted independently. Ship them to your SIEM, to CloudWatch Logs, or to a dedicated audit bucket with object lock enabled.
The CLaude coe documentation includes a working reference implementation for PreToolUse hooks with structured JSON output, including examples for CI-specific policy enforcement patterns.
Claude Code CI/CD Security Checklist
- Run Claude Code in an ephemeral container, not directly on a shared runner
- Inject ANTHROPIC_API_KEY at the step level, not at the job or pipeline level
- Rotate static API keys on a schedule of 30 days or less; use usage alerts
- Bind-mount source directories as read-only; provide a separate writable worktree
- Configure --add-dir explicitly; don't let Claude discover the full repo tree by default
- Implement PreToolUse hooks and ship events to external audit storage
- Audit job-level permissions in GitHub Actions workflows; remove write-all
- Test what happens when Claude receives a malformed or adversarial prompt, before production does
Prompt injection is the threat model most teams skip when setting up Claude Code CI/CD security. The assumption is that Claude will only see trusted content: the repository code and the task prompt you wrote. In practice, CI pipelines often pull in dependencies, fetch remote configurations, or process data from external sources. Any of that content can carry an embedded instruction that Claude interprets as a directive. Containment at the filesystem and credential layer limits what a successful injection can actually accomplish.
Frequently Asked Questions
Should Claude Code ever run with --dangerously-skip-permissions in a CI pipeline?
No. --dangerously-skip-permissions disables the agent's permission gating entirely, meaning Claude can execute any tool call without user confirmation. In an interactive session, this trades safety for speed — a deliberate tradeoff a developer can monitor in real time. In a CI pipeline, there's no human in the loop to catch unexpected behavior. Any prompt injection, model error, or edge-case task interpretation executes without a gate. The flag exists for development convenience; it has no place in automated pipelines.
Can masked CI secrets still be read by subprocesses in GitHub Actions?
Yes. GitHub Actions' secret masking applies only to log output — the runner scrubs matching strings before they're written to the job log. The underlying environment variable is still injected as plaintext into the runner's process environment. Any subprocess spawned by your job, including every bash command Claude Code executes, inherits that environment. This is documented behavior: GitHub's Actions documentation states that secrets "are available to the runner as environment variables." Masking is a log hygiene feature, not an access control mechanism.
Does Claude Code need network egress in CI?
It needs outbound access to api.anthropic.com on port 443 to function at all. Beyond that, it depends on what tasks you're running. If Claude is calling tools that invoke package managers, git operations, or external APIs, those require their own egress paths. The correct posture is to start with a network policy that only allows api.anthropic.com and add exceptions explicitly, rather than allowing unrestricted outbound and hoping Claude doesn't do something unexpected with it.
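Egress enforcement itself belongs in the network layer (firewall rules, an egress proxy, or container network settings), but the allowlist logic is simple enough to sketch. Here is a minimal Python version of the default-deny rule described above, as it might sit inside a hypothetical forward proxy's request handler. The function name egress_allowed and the extra host in the test are illustrative.

```python
from urllib.parse import urlsplit

# Start with the one destination the agent needs; add exceptions deliberately.
ALLOWED = {("api.anthropic.com", 443)}


def egress_allowed(url: str, allowed: set[tuple[str, int]] = ALLOWED) -> bool:
    """Default-deny check: permit only HTTPS requests to allowlisted host/port pairs."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    port = parts.port or 443
    # Compare the parsed hostname exactly, so look-alikes such as
    # api.anthropic.com.attacker.example or user@host tricks don't pass.
    return (parts.hostname.lower(), port) in allowed
```

Adding a package registry for a dependency-install task then becomes a one-line, reviewable change to ALLOWED instead of a blanket outbound allow.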
How do I scope ANTHROPIC_API_KEY to least privilege?
Anthropic's API keys are currently all-or-nothing — there's no native scoping to specific models, rate limits, or IP ranges per key. Your controls need to come from the outside: create a dedicated key per CI environment (not shared with developer machines), set usage-based alerts in the Anthropic console, rotate on a 30-day schedule, and revoke immediately if a pipeline job produces anomalous token consumption. For teams running multiple pipelines, use separate keys per pipeline so a compromised key's blast radius is bounded to one environment.
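"Anomalous token consumption" needs a concrete definition before anyone can act on it. One simple option, sketched here as an assumption and not as an Anthropic feature, is to compare a job's total tokens against the median of recent runs of the same pipeline and flag anything several times larger. Where the totals come from (for example, summing the usage fields returned with API responses) depends on your setup. The function name is_anomalous, the 3x multiplier, and the five-run minimum are illustrative.

```python
from statistics import median
from typing import Sequence


def is_anomalous(job_tokens: int, recent_jobs: Sequence[int],
                 multiplier: float = 3.0, min_history: int = 5) -> bool:
    """Flag a job whose token total exceeds `multiplier` x the recent median.

    With too little history nothing is flagged, so pair this with a fixed ceiling.
    """
    if len(recent_jobs) < min_history:
        return False
    baseline = median(recent_jobs)
    return job_tokens > multiplier * baseline
```

A median baseline resists being dragged upward by one earlier outlier, which a mean would not. A flagged job is then the trigger for the revoke-and-rotate step.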
What's the minimum hook configuration for a production CI deployment?
At minimum: a PreToolUse hook that logs tool name, tool input (sanitized of secrets), and UTC timestamp to a structured JSON log; and a PostToolUse hook that records the tool result status. This gives you an event-level audit trail without requiring you to parse Claude's conversational output. Add a bash command deny-list to the PreToolUse hook — patterns like curl | bash, wget -O- | sh, and direct writes to /etc — and you've covered the highest-risk command patterns for a modest implementation cost.
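The "sanitized of secrets" step is the part most easily skipped. Here is a minimal sketch, assuming the hook runs with the same environment as the agent. It replaces the literal values of named secret variables, plus strings shaped like a few well-known token formats, anywhere inside a tool input before the record is written. The variable names and regexes are illustrative and will miss secrets that don't match them. The function name redact is hypothetical.

```python
import os
import re
from typing import Any, Mapping, Sequence

SECRET_ENV_NAMES = ("ANTHROPIC_API_KEY", "GITHUB_TOKEN", "AWS_SECRET_ACCESS_KEY")

# Illustrative token shapes; not an exhaustive secret scanner.
TOKEN_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_\-]+"),          # Anthropic API key prefix
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{20,}"),    # GitHub token prefixes
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),            # AWS access key ID
]


def redact(value: Any, env: Mapping[str, str] = os.environ,
           names: Sequence[str] = SECRET_ENV_NAMES) -> Any:
    """Recursively replace secret values inside strings, lists, and dicts."""
    if isinstance(value, dict):
        return {k: redact(v, env, names) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v, env, names) for v in value]
    if not isinstance(value, str):
        return value
    for name in names:
        secret = env.get(name)
        # Skip unset or very short values so unrelated text isn't shredded.
        if secret and len(secret) >= 8:
            value = value.replace(secret, f"[REDACTED:{name}]")
    for pattern in TOKEN_PATTERNS:
        value = pattern.sub("[REDACTED]", value)
    return value
```

Applied to tool_input just before a record is serialized, this keeps the audit trail useful without turning the log store into a second copy of your secrets.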
Running Claude Code in CI without these controls isn't necessarily a disaster waiting to happen. But it is a bet that your task prompts are always clean, your model always interprets them as intended, and no external content your pipeline touches carries an adversarial payload. That's a lot of assumptions to stake a production pipeline on. The controls described here are not expensive to implement — a few hours of configuration work buys a substantially more defensible posture.
For a full breakdown of policy options and runtime enforcement capabilities, see the CLaude coe product overview and the companion articles on our CLaude coe blog covering permissions hardening and sandbox configuration for production deployments.
At CLaude coe, we work with security and platform engineering teams to close the gap between "Claude Code works in CI" and "Claude Code is safe to run in CI at scale." Those are different problems, and the second one requires deliberate design.