How to Enforce Runtime Policy on Coding Agents Before They Touch Your Credentials
Coding agent runtime guardrails are policy enforcement mechanisms that monitor, log, and restrict what an AI coding agent can read, write, execute, or transmit during an active session — operating independently of how the agent was configured or what Skills it loaded before execution began.
That distinction matters more than most teams currently appreciate. Pre-execution scanning is the right starting point, but it only answers one question: does this Skill file look malicious before we run it? Runtime guardrails answer a different question entirely: what is the agent actually doing right now, and should it be allowed to continue?
Why Scanning Alone Leaves You Exposed
When developers clone a repo that includes a .cursor/skills/ or .claude/skills/ directory, those markdown files become executable instructions the agent will follow. Skill Sentinel, Enkrypt AI's open source scanner, exists because most teams have no visibility into what those files contain before the agent reads them.
But scanning solves a supply chain problem. It does not solve a runtime behavior problem. Consider a Skill that passes every scan because it contains no obviously malicious instructions. At runtime, the agent autonomously decides to read ~/.aws/credentials to "help" complete a deployment task. Or it reads .env because the user asked it to "check why the API call is failing." Neither action required a malicious Skill. Both can silently move sensitive data into a model response, a log file, or an outbound API call.
There is also the multi-step problem. Individually, reading a file, formatting its contents, and calling an API endpoint are all routine agent actions. Chained together in a single session, they constitute exfiltration. A pre-execution scanner cannot catch this, because it inspects files before the session starts and never observes the sequence of actions the agent takes. That requires runtime visibility.
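To make the sequence problem concrete, here is a minimal sketch of session-level tracking. It assumes each tool call reaches the monitor as a (tool, target) pair. The tool names `read_file` and `http_request`, the `SessionMonitor` class, and the list of sensitive paths are all hypothetical, and a real monitor would also need to cope with data that is encoded or transformed between the read and the send, which this one ignores.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical markers for credential locations; extend for your environment.
SENSITIVE_DIRS = ("/.ssh/", "/.aws/", "/.config/gcloud/", "/.azure/")


def is_sensitive(path: str) -> bool:
    """True for files under a credential directory or named .env / .env.*."""
    normalized = path.replace("\\", "/")
    name = normalized.rsplit("/", 1)[-1]
    return (name == ".env" or name.startswith(".env.")
            or any(d in normalized for d in SENSITIVE_DIRS))


@dataclass
class SessionMonitor:
    """Watches one agent session for a sensitive read followed by an outbound call."""
    sensitive_reads: list[str] = field(default_factory=list)

    def record(self, tool: str, target: str) -> str | None:
        """Record a tool call; return an alert message if the sequence looks like exfiltration."""
        if tool == "read_file" and is_sensitive(target):
            self.sensitive_reads.append(target)
        elif tool == "http_request" and self.sensitive_reads:
            joined = ", ".join(self.sensitive_reads)
            return f"ALERT: request to {target} after reading {joined}"
        return None
```

Each call is routine in isolation. The alert fires only because the monitor keeps state for the whole session, which is the piece a pre-execution scan does not have.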
The Runtime Attack Surface
The credentials an agent can reach during a normal development session are significant. On a typical developer machine:
- ~/.ssh/id_rsa and ~/.ssh/id_ed25519: private keys for every server the developer can reach
- ~/.aws/credentials, ~/.config/gcloud/, ~/.azure/: cloud provider tokens, often with broad IAM permissions
- .env files scattered across project directories: API keys, database URLs, service tokens
- Git credential helpers and ~/.gitconfig: authentication to private repositories
- Browser credential stores and keychain-accessible secrets
Unless it is sandboxed, an agent's tools run with the developer's own user permissions, so an agent in an autonomous or auto-approve mode (a common setup for Cursor and Claude Code workflows) can read all of this, often without asking before it opens a file. It will read whatever it decides is relevant to the task, and the developer often cannot tell what it read from the chat interface alone.
Model context is also part of the attack surface. When an agent reads ~/.ssh/id_rsa, the file contents become part of the context sent to the model provider's API. If the key then appears in a "here's what I found" response, that is only the visible symptom: the data left your environment before you saw the response.
Hooking Into Execution Paths
Both Cursor and Claude Code expose hooks that let you intercept agent actions before they complete. The approach differs by platform, but the goal is the same: insert a policy evaluation layer between the agent's intent and the filesystem or network call it wants to make.
For Claude Code, the settings file at .claude/settings.json controls which tools the agent is allowed to use and which Bash commands it can execute without confirmation. A minimal enforcement posture might look like this:
- Deny reads on ~/.ssh/*, ~/.aws/*, ~/.config/gcloud/*, and any .env file outside the explicitly scoped project directory
- Require confirmation before any curl, wget, or fetch call to an external host
- Block git push to remotes not on an allowlist
- Log every file read to a session audit trail with timestamps and the tool call that triggered it
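Claude Code can also run hook commands configured in the same settings file. As documented at the time of writing, a `PreToolUse` hook receives the pending tool call as JSON on stdin (including `tool_name`, `tool_input`, `session_id`, and `cwd`), and exiting with status 2 blocks the call and returns stderr to the agent. The Python sketch below assumes that contract to enforce the deny rules above and append every decision to an audit file; verify the field names against the Claude Code version you run. The glob list, the `ALLOWED_REMOTES` set, the log location, and the `--hook` flag are illustrative choices, not part of Claude Code.

```python
#!/usr/bin/env python3
"""Sketch of a Claude Code PreToolUse hook: block credential reads, log every decision."""
from __future__ import annotations

import json
import os
import shlex
import sys
import time
from fnmatch import fnmatch

HOME = os.path.expanduser("~")
# Illustrative deny list; fnmatch's "*" also matches "/", so nested files are covered.
DENY_READ_GLOBS = [os.path.join(HOME, d, "*") for d in (".ssh", ".aws", os.path.join(".config", "gcloud"))]
ALLOWED_REMOTES = {"origin"}                             # illustrative push allowlist
AUDIT_LOG = os.path.join(HOME, ".agent-audit.jsonl")     # hypothetical log location


def push_remote(command: str) -> str | None:
    """Return the remote for a plain `git push` command, or None if it is not one."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return None
    if tokens[:2] != ["git", "push"]:
        return None  # naive: chained commands like `cd x && git push y` are not parsed
    positional = [t for t in tokens[2:] if not t.startswith("-")]
    return positional[0] if positional else "origin"     # assumes origin when omitted


def evaluate(event: dict) -> tuple[int, str]:
    """Return (exit_code, reason): 0 lets the tool call proceed, 2 blocks it."""
    tool = event.get("tool_name", "")
    tool_input = event.get("tool_input") or {}
    project = os.path.abspath(event.get("cwd") or os.getcwd())

    if tool == "Read":
        path = os.path.abspath(os.path.expanduser(tool_input.get("file_path", "")))
        if any(fnmatch(path, pattern) for pattern in DENY_READ_GLOBS):
            return 2, f"Policy: reading {path} is not permitted."
        # Matches .env, .env.local, .envrc, ... anywhere outside the project root.
        if os.path.basename(path).startswith(".env") and not path.startswith(project + os.sep):
            return 2, f"Policy: {path} is an env file outside {project}."

    if tool == "Bash":
        remote = push_remote(tool_input.get("command", ""))
        if remote is not None and remote not in ALLOWED_REMOTES:
            return 2, f"Policy: push to remote '{remote}' is not on the allowlist."

    return 0, "allowed"


def main() -> None:
    event = json.load(sys.stdin)
    code, reason = evaluate(event)
    record = {
        "ts": time.time(),
        "session": event.get("session_id"),
        "tool": event.get("tool_name"),
        "input": event.get("tool_input"),  # caution: Write/Edit inputs include file contents
        "decision": "block" if code == 2 else "allow",
        "reason": reason,
    }
    with open(AUDIT_LOG, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    if code == 2:
        print(reason, file=sys.stderr)  # on exit code 2, stderr goes back to the agent
    sys.exit(code)


# Only act as a hook when invoked explicitly, e.g. `python3 policy_hook.py --hook`.
if __name__ == "__main__" and "--hook" in sys.argv[1:]:
    main()
```

This sketch only allows or blocks; the confirmation tier for curl and wget is better left to Claude Code's own permission settings. The `git push` check is deliberately naive: a chained command such as `cd repo && git push evil` does not start with `git push` and slips through.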
For Cursor, rule files in .cursor/rules/ provide instruction-level constraints, though they are advisory rather than hard enforcement: they shape what the model is told, not what it is able to do. Cursor's MCP (Model Context Protocol) server configuration offers a harder enforcement point for tools exposed over MCP: you can route those tool calls through a local proxy that applies policy before forwarding or blocking the request.
The same proxy pattern works for CrewAI, LangGraph, and OpenAI SDK-based agents: intercept at the tool execution layer, evaluate against policy, log the decision, and either allow, block, or alert. The Vercel AI SDK's tool definitions expose a similar interception surface.
What you are building in all of these cases is a runtime policy engine that lives between the model's output and your system's resources. The agent can intend whatever it wants. What it can do is determined by what the policy layer permits.
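For frameworks where you own the tool functions, the interception point can be as simple as a wrapper around each tool. The sketch below is framework-agnostic and hypothetical: `guarded`, `Decision`, `PolicyViolation`, `path_policy`, and the stub `read_file` tool are illustrative names, not APIs from CrewAI, LangGraph, the OpenAI SDK, or the Vercel AI SDK. It evaluates each call against a policy function, logs the decision, and then allows the call, raises on a block, or consults a confirmation callback on an alert.

```python
from __future__ import annotations

import functools
import json
import logging
from enum import Enum
from typing import Any, Callable

log = logging.getLogger("agent.policy")  # attach a durable handler in real use


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ALERT = "alert"


class PolicyViolation(Exception):
    """Raised when the policy layer refuses a tool call."""


Policy = Callable[[str, dict], Decision]
Confirm = Callable[[str, dict], bool]


def guarded(policy: Policy, confirm: Confirm):
    """Decorator: evaluate, log, then allow, block, or escalate every call to the tool."""
    def decorator(tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def wrapper(**kwargs: Any) -> Any:  # assumes tool calls arrive as keyword arguments
            decision = policy(tool.__name__, kwargs)
            log.info(json.dumps({"tool": tool.__name__, "args": kwargs,
                                 "decision": decision.value}, default=str))
            if decision is Decision.BLOCK:
                raise PolicyViolation(f"{tool.__name__} blocked by policy")
            if decision is Decision.ALERT and not confirm(tool.__name__, kwargs):
                raise PolicyViolation(f"{tool.__name__} not confirmed")
            return tool(**kwargs)
        return wrapper
    return decorator


def path_policy(tool: str, args: dict) -> Decision:
    """Toy policy for illustration only."""
    path = str(args.get("path", ""))
    if "/.ssh/" in path or "/.aws/" in path:
        return Decision.BLOCK
    if path.endswith(".env"):
        return Decision.ALERT
    return Decision.ALLOW


@guarded(path_policy, confirm=lambda tool, args: False)  # headless run: refuse every alert
def read_file(path: str) -> str:
    """Stub tool standing in for a real file reader."""
    return f"<contents of {path}>"
```

Raising an exception is one design choice. In many agent loops it is friendlier to return an error string as the tool result, so the model sees why the call was refused and can pick another approach.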
Writing an Audit-Ready Policy Layer
An effective runtime policy has three categories: allow, block, and alert. Most teams skip the alert tier and end up with policies that are either too permissive (everything is allowed) or too brittle (blocking aggressively breaks workflows). The alert tier is where you catch the interesting cases.
Allow without confirmation: reads within the current project directory tree, writes to files the agent explicitly created in the current session, tool calls to internal services on allowlisted domains, test execution within the project.
Block unconditionally: reads on SSH private keys, cloud provider credential files, browser credential stores, system keychain access. Outbound requests to IP addresses (as opposed to domains). Any action taken while the agent is processing a prompt that arrived mid-session from an unexpected source — this is the prompt injection case.
Alert and require confirmation: reads on any .env file, reads on files outside the project root, any network request to a domain not previously seen in this session, deletion of files the agent did not create, git push to any remote.
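The three tiers translate fairly directly into a classifier. This is a minimal sketch under stated assumptions: actions arrive as a hypothetical `Action` record whose `kind` is `read`, `write`, `delete`, `network`, or `git_push`; the allowlisted domain is a placeholder; a domain already approved earlier in the session is treated as allowed; and anything the tiers above do not mention falls through to ALERT, which is this sketch's conservative default rather than something the policy prescribes. The mid-session prompt injection rule needs session state and is not modeled here.

```python
from __future__ import annotations

import ipaddress
import os
from dataclasses import dataclass
from enum import Enum
from urllib.parse import urlparse


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ALERT = "alert"


@dataclass
class Action:
    kind: str                       # hypothetical: "read", "write", "delete", "network", "git_push"
    target: str                     # file path, URL, or remote name
    created_by_agent: bool = False  # did the agent create this file earlier in the session?


HOME = os.path.expanduser("~")
BLOCKED_DIRS = [os.path.join(HOME, d) for d in (".ssh", ".aws", ".azure", os.path.join(".config", "gcloud"))]
ALLOWED_DOMAINS = {"internal.example.com"}  # placeholder allowlist


def _inside(path: str, root: str) -> bool:
    return path == root or path.startswith(root.rstrip(os.sep) + os.sep)


def classify(action: Action, project_root: str, seen_domains: set[str]) -> Decision:
    root = os.path.abspath(project_root)
    if action.kind == "read":
        # Relative paths resolve against the project root; absolute ones are kept as-is.
        path = os.path.abspath(os.path.join(root, os.path.expanduser(action.target)))
        if any(_inside(path, d) for d in BLOCKED_DIRS):
            return Decision.BLOCK
        if os.path.basename(path).startswith(".env") or not _inside(path, root):
            return Decision.ALERT
        return Decision.ALLOW

    if action.kind in ("write", "delete"):
        # Files the agent created this session are fair game; anything else needs a human.
        return Decision.ALLOW if action.created_by_agent else Decision.ALERT

    if action.kind == "network":
        host = urlparse(action.target).hostname or ""
        try:
            ipaddress.ip_address(host)
            return Decision.BLOCK  # raw IP address instead of a domain
        except ValueError:
            pass
        if host in ALLOWED_DOMAINS or host in seen_domains:
            return Decision.ALLOW
        return Decision.ALERT

    # git_push and anything unrecognised: confirm with a human.
    return Decision.ALERT
```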
The audit trail from this layer is where most organizations have a gap. Without structured logging of what the agent read, what it executed, and what left the environment, you cannot do incident response. You cannot answer "did the agent read the .env file during last Tuesday's session?" without logs that capture that at the tool call level, not just at the terminal output level.
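What makes this audit-ready is one structured record per tool call, written by the policy layer rather than parsed out of terminal output. A minimal sketch, assuming a JSON Lines file and a hypothetical record schema (`ts`, `session`, `tool`, `target`, `decision`). Logging blocked and alerted calls alongside allowed ones is what lets you later prove what was blocked, and `find_reads` answers the "last Tuesday" question for a given session ID.

```python
from __future__ import annotations

import json
import tempfile
import time
from pathlib import Path


def log_tool_call(log_path: Path, session_id: str, tool: str, target: str, decision: str) -> None:
    """Append one JSON Lines record per tool call, including blocked ones."""
    record = {
        "ts": time.time(),     # epoch seconds; convert to ISO 8601 when reporting
        "session": session_id,
        "tool": tool,          # hypothetical tool vocabulary: "read", "write", "network", ...
        "target": target,      # path, URL, or command the call acted on
        "decision": decision,  # "allow", "block", or "alert"
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def find_reads(log_path: Path, session_id: str, filename: str) -> list[dict]:
    """Return every read of `filename` (matched on the final path component) in one session."""
    hits = []
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            rec = json.loads(line)
            if (rec["session"] == session_id and rec["tool"] == "read"
                    and Path(rec["target"]).name == filename):
                hits.append(rec)
    return hits


if __name__ == "__main__":
    # Demo against a throwaway log file.
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "audit.jsonl"
        log_tool_call(log, "tue-session", "read", "/repo/src/app.py", "allow")
        log_tool_call(log, "tue-session", "read", "/repo/.env", "alert")
        print(find_reads(log, "tue-session", ".env"))
```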
At Enkrypt AI, we built runtime guardrails into the Secure Vibe Coding solution specifically because audit-ready logging is the piece teams consistently skip when they self-implement policy layers. The result is enforcement without visibility — you blocked some things, but you cannot prove what you blocked or detect what you missed.
Two Layers, Not One
The framing of "scanning vs. runtime" sets up a false choice. You need Skill Sentinel running before the agent executes Skills, and you need runtime guardrails running while it executes. Scanning catches supply chain attacks embedded in markdown files. Runtime guardrails catch autonomous credential access, prompt injection mid-session, and multi-step exfiltration sequences that look innocuous step by step.
Teams that only scan have no protection against a clean Skill that accesses credentials at runtime. Teams that only enforce runtime policy have no protection against a Skill whose injected instructions rewrite the runtime policy itself. Both gaps are real, both are being exploited, and neither is detectable after the fact without the logs that most teams are not currently collecting.
FAQ
Can a prompt injection mid-session override a previously trusted Skill?
Yes, and this is one of the more serious runtime risks. A Skill loaded at session start is evaluated against policy when it first executes. But if the agent processes a prompt mid-session that contains adversarial instructions — embedded in a webpage it fetched, a code comment it read, or a test fixture it parsed — those instructions can alter its behavior for the rest of the session without triggering a re-scan of the original Skill. The original Skill is still "trusted." The agent is now following different instructions. Runtime guardrails that evaluate every action, not just actions traceable to the loaded Skill, are the only way to catch this. The key signal is an action that is out of pattern for the stated task: a file read on ~/.ssh/ during a session that was supposed to be fixing a UI bug has no legitimate explanation.
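One way to act on that signal is to declare the task's expected scope when the session starts, then escalate out-of-scope actions once untrusted content has entered the context. A sketch, assuming hypothetical tool names for untrusted sources and a scope expressed as absolute path prefixes. The taint flag is deliberately coarse, and a real implementation would also have to decide which ordinary file reads (code comments, test fixtures) count as untrusted input.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, field

# Hypothetical tool names whose output comes from outside the trust boundary.
UNTRUSTED_SOURCES = {"web_fetch", "read_test_fixture"}


@dataclass
class TaskScope:
    """What the stated task is expected to touch, declared when the session starts."""
    description: str
    allowed_prefixes: tuple[str, ...]
    tainted: bool = False
    flagged: list[str] = field(default_factory=list)

    def observe(self, tool: str) -> None:
        """Mark the session tainted once untrusted content enters the agent's context."""
        if tool in UNTRUSTED_SOURCES:
            self.tainted = True

    def check_read(self, path: str) -> str:
        """Return "allow", "alert", or "block" for a file read under this scope."""
        full = os.path.abspath(os.path.expanduser(path))
        if any(full == p or full.startswith(p.rstrip(os.sep) + os.sep)
               for p in self.allowed_prefixes):
            return "allow"
        self.flagged.append(full)
        # Out-of-scope reads escalate from alert to block after untrusted input is seen.
        return "block" if self.tainted else "alert"
```

For a session scoped to fixing a UI bug, a read on ~/.ssh/ falls outside every allowed prefix, and after a fetched webpage has entered the context it is blocked outright instead of merely flagged.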
How do runtime guardrails differ from traditional secrets scanning?
Traditional secrets scanning — tools like truffleHog, GitGuardian, or GitHub's push protection — looks for credential patterns in files committed to version control. It is a post-write, pre-publish control. It catches secrets that were accidentally hardcoded into source files before they reach a remote repository. Runtime guardrails operate at the access layer, not the content layer. They prevent an agent from reading a credential file in the first place, regardless of what the agent intended to do with the contents. Secrets scanning cannot detect that an agent read ~/.aws/credentials and included the key in a model response that was transmitted to an API endpoint. That event never touched version control. Runtime guardrails can detect and block the read before the credential reaches the model at all.