curi0us_dev

Posted on May 21

Hardening Claude Code Security for Policy as Code: How a Cerbos Skill Changed My Setup

#claudecode #ai #security

I use Claude Code a lot.

I started cautious. I'm not cautious anymore. That's the part I want to talk about.

It’s become my default way to work on code: refactors, tests, quick spikes. But the more I leaned on it, the more I started thinking about Claude Code security and how it fits into my policy-as-code setup.

This isn’t “just a chatbot”; it’s an agent that can read my repo, edit files, and run shell commands.
In the wrong configuration, it can do a lot more than I’d ever let a junior engineer do unsupervised.

That pushed me to tighten how I use Claude Code permissions and granular controls. The most interesting piece of that journey has been integrating a Cerbos authorization policy skill that runs the real cerbos compile command in a loop until my policy bundle is clean.

This post is about that pattern:

How Claude Code’s own permission model works in practice
Why policy as code, and authorization policies in particular, are especially risky to hand over to an AI coding agent
How the validate‑in‑a‑loop approach with Cerbos gives me a safer way to work
How I think about iteration budgets and failure modes for this kind of domain‑specific skill

If you care about Claude Code security, granular permissions, authorization policies, or policy as code, this might give you a blueprint to copy or improve.

What worried me about Claude Code security

Claude Code ships with a decent security model out of the box:

It asks for explicit approval before running shell commands, applying large edits, or doing other sensitive actions.
You can run it in different permission modes – from very conservative (always ask) to more automated setups for isolated environments.
You can limit its effective scope with project boundaries, permission rules, and sandboxing so it doesn’t have broad freedom on your machine.

That’s already better than a raw “paste code into a chat window” experience. But in practice I noticed a pattern:

As I got more comfortable, I gradually slid from very safe settings towards more automated ones to save time.

That’s fine for some workflows, but it scared me for others – especially anything related to security, access control, and infrastructure. I didn’t want to be in a situation where:

Claude Code quietly edited authorization policies in ways that still compiled but weakened my access model.
A misconfigured shell permission let it run commands I didn’t intend.
I had no strong validation loop besides “the code didn’t crash in my quick test”.

That’s what sent me looking for a better story around fine‑grained authorization policies in a Claude Code‑driven workflow.

Why policy as code and authorization policies are dangerous to get wrong

Most of the systems I care about have non‑trivial authorization logic:

Different roles (admin, operator, support, end user)
Tenant isolation
Special cases for finance, security, audit, etc.

I like the policy‑as‑code approach for this – in particular with Cerbos, where you write YAML policies that live in git and get validated by a real compiler.

The upside:

Policies are reviewable.
You can gate changes with CI.
You can write tests for edge cases.

The downside:

The syntax is detailed and easy to mess up.
A single indentation mistake can change semantics.
A wrong default rule can accidentally open up access across tenants.

Once I started thinking about using an AI coding agent for this, it felt obvious that authorization policies needed stricter guardrails than normal code.

If Claude Code generates a slightly wrong React component, I get a bug. If it generates a slightly wrong Cerbos policy, I get a data exposure.

So the question became: can I still use Claude Code to help with authorization policies, while keeping Claude Code security and granular permissions front and center?

The missing piece: a Cerbos policy skill that validates in a loop

The thing that unlocked this for me was a small Claude Code skill built around Cerbos policies.

Roughly speaking, the skill does this:

Plain English in → Cerbos policy bundle out → run cerbos compile in Docker → fix errors → repeat until the bundle is clean.

From a user perspective:

I describe what I want, e.g.:
- "Only HR can see salary fields. Managers can see performance reviews for their own direct reports."
The skill generates or edits a Cerbos policy bundle in YAML.
It runs cerbos compile against that bundle inside Docker.
If there are errors, it reads the actual compiler output, fixes something, and tries again.
Only when the bundle compiles cleanly do I see the result.
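If it helps to see the shape of that loop, here is a minimal Python sketch. It is not the skill's actual code. It assumes the Docker pattern from the Cerbos docs (mount the policy directory at /policies and run compile /policies in the ghcr.io/cerbos/cerbos image), and compile_fn and fix_fn are hypothetical hooks. In the real skill, the "fix" step is Claude Code editing the YAML.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Assumption: image name and /policies mount follow the Docker example in the
# Cerbos docs. Pin the tag to the Cerbos version you run in production.
CERBOS_IMAGE = "ghcr.io/cerbos/cerbos:latest"


@dataclass
class CompileResult:
    ok: bool
    output: str


def cerbos_compile_argv(policy_dir: Path, image: str = CERBOS_IMAGE) -> list[str]:
    """Build the docker command that runs `cerbos compile` on a mounted policy dir."""
    return [
        "docker", "run", "--rm",
        "-v", f"{policy_dir.resolve()}:/policies",
        image, "compile", "/policies",
    ]


def run_cerbos_compile(policy_dir: Path) -> CompileResult:
    """Run the real compiler; a zero exit code is the only definition of success."""
    proc = subprocess.run(
        cerbos_compile_argv(policy_dir), capture_output=True, text=True
    )
    return CompileResult(ok=proc.returncode == 0, output=proc.stdout + proc.stderr)


def validate_in_a_loop(
    compile_fn: Callable[[], CompileResult],
    fix_fn: Callable[[str], None],
    max_cycles: int = 5,
) -> tuple[bool, int]:
    """Compile, hand errors to the fixer, repeat. Returns (clean, cycles_used)."""
    for cycle in range(1, max_cycles + 1):
        result = compile_fn()
        if result.ok:
            return True, cycle  # stop as soon as the compiler is happy
        if cycle < max_cycles:
            fix_fn(result.output)  # hypothetical hook: the agent edits the YAML
    return False, max_cycles  # budget spent: hand back to a human
```

Wired up for real, compile_fn would be something like lambda: run_cerbos_compile(Path("policies")). The design choice that matters is that success comes from the compiler's exit code, never from the model's opinion of its own output.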

The install is straightforward:

npx skills add cerbos/skills --skill cerbos-policy

After that, I can call the Cerbos policy skill from Claude Code in the project where my policies live.

What I like about this is that the source of truth is not the model – it’s the real Cerbos compiler, running in an environment that mirrors production. The skill is just a smart loop around it.

Inside the validate‑in‑a‑loop pattern

The validate-in-a-loop pattern is the part that makes the Cerbos skill feel practical rather than just impressive. The full workflow is covered in this Cerbos post, but the important idea is simple: the model is not the source of truth. The compiler is.

Before generating anything, the skill asks clarifying questions in plain language. A vague requirement like “managers can see reports” turns into a more precise discussion about which managers, which reports, ownership, confidentiality, and edge cases. That matters because ambiguity in the prompt can become ambiguity in the policy.
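The clarifying-questions step is easier to reason about as a gate: no YAML gets generated until every question has an answer. A hypothetical sketch; the field names and questions are mine, not the skill's:

```python
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical checklist. None means "not answered yet"; [] means "answered: none".
@dataclass
class PolicySpec:
    resource: Optional[str] = None             # e.g. "performance_review"
    actions: Optional[list[str]] = None        # e.g. ["view"]
    roles: Optional[list[str]] = None          # e.g. ["manager"]
    ownership_rule: Optional[str] = None       # e.g. "direct reports only"
    confidential_fields: Optional[list[str]] = None


QUESTIONS = {
    "resource": "Which resource kind does this rule cover?",
    "actions": "Which actions (view, edit, delete, ...) are allowed?",
    "roles": "Which roles does it apply to?",
    "ownership_rule": "Which instances? Any manager, or only the owner's manager?",
    "confidential_fields": "Are any fields confidential even to allowed roles?",
}


def open_questions(spec: PolicySpec) -> list[str]:
    """Questions that must be answered before any policy is generated."""
    return [QUESTIONS[f.name] for f in fields(spec) if getattr(spec, f.name) is None]
```

The None versus empty-list distinction is deliberate: "no confidential fields" is an answer, while "nobody asked" is not.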

Once the spec is clear, the skill writes the policy bundle and validates it with cerbos compile in Docker. If something fails, it fixes one issue at a time: syntax, schemas, compile errors, then tests. It also stops after repeated failures and, importantly, does not delete tests just to make validation pass.
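Both the "one issue at a time" ordering and the "don't delete tests" rule are simple to express as checks. This sketch assumes validator failures arrive grouped by stage and that Cerbos test suites follow the *_test.yaml naming convention; the function names and stage labels are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Order from the post: syntax first, then schemas, then compile errors, then tests.
STAGES = ("syntax", "schema", "compile", "test")


def next_issue(failures: dict[str, list[str]]) -> Optional[tuple[str, str]]:
    """Return the single (stage, message) to fix next, or None if everything passes."""
    for stage in STAGES:
        messages = failures.get(stage, [])
        if messages:
            return stage, messages[0]
    return None


def is_test_file(path: str) -> bool:
    # Assumption: test suites are named *_test.yaml or *_test.yml.
    p = PurePosixPath(path)
    return p.suffix in (".yaml", ".yml") and p.stem.endswith("_test")


def check_fix_keeps_tests(before: set[str], after: set[str]) -> None:
    """Reject a proposed fix that makes validation pass by deleting test suites."""
    removed = sorted(p for p in before - after if is_test_file(p))
    if removed:
        raise ValueError(f"fix deletes test suites: {removed}")
```

Fixing only the earliest failing stage keeps each iteration small: a schema error can cause a cascade of compile and test failures that disappear once the root cause is fixed.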

That’s the real value: a disciplined loop around real validation.

How I lock down Claude Code around this skill

Validation-in-a-loop is one layer. Claude Code permission hardening is the other. Neither is enough on its own.

The compiler can tell me a policy is syntactically valid. It cannot tell me whether Claude Code should have been allowed to touch the policy file in the first place. So I run both.

Concretely, my setup looks like this:

Narrowed filesystem scope

In this setup, I keep Claude Code focused on the repo that contains my Cerbos policies and related services, rather than giving it broad access to my machine.

Whitelisted commands

The only shell command this skill really needs is the Docker‑wrapped cerbos compile (and optionally cerbos test). I don’t approve arbitrary bash unless I understand exactly why it’s needed.

Conservative permission mode for policy work

For simple refactors I might let Claude Code run in a faster, more automated mode. For anything touching authorization policies, I intentionally use a safer mode where edits and commands are surfaced clearly for approval.

Human review for semantics

The Cerbos compiler can tell me whether the bundle is valid and tests pass. It cannot tell me if the business rule itself is sane. I still run policy changes through normal code review and CI before they go anywhere near production.

This way, Claude Code security and granular permissions form one layer of defence, and the Cerbos compiler plus tests form another.
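Claude Code's own permission rules are the real enforcement point, so this is not a replacement for them and not how Claude Code matches permissions internally. It's a hypothetical Python sketch of how strict I mean by "whitelisted" and "narrowed scope": the only accepted shape is docker run with an optional --rm and a bind mount from inside the repo, the Cerbos image, then compile /policies. The function names, the image prefix, and the exact accepted shape are my assumptions.

```python
import shlex
from pathlib import Path

# Hypothetical policy: the only allowed shell command is the Docker-wrapped
# Cerbos compiler, and any bind mount must come from inside the policy repo.
CERBOS_IMAGE_PREFIX = "ghcr.io/cerbos/cerbos:"
SHELL_METACHARACTERS = set(";&|`$<>\n")


def is_path_in_scope(path: str, repo_root: Path) -> bool:
    """True if path resolves inside repo_root; blocks ../ and absolute escapes."""
    root = repo_root.resolve()
    return (root / path).resolve().is_relative_to(root)


def is_command_allowed(command: str, repo_root: Path) -> bool:
    """Allow only: docker run [--rm] [-v SRC:/policies] <cerbos image> compile /policies."""
    if SHELL_METACHARACTERS & set(command):
        return False  # no chaining, substitution, or redirection
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if len(argv) < 5 or argv[:2] != ["docker", "run"]:
        return False
    if argv[-2:] != ["compile", "/policies"] or not argv[-3].startswith(CERBOS_IMAGE_PREFIX):
        return False
    # Everything between "docker run" and the image must be --rm or a -v mount.
    options = argv[2:-3]
    i = 0
    while i < len(options):
        if options[i] == "--rm":
            i += 1
        elif options[i] in ("-v", "--volume") and i + 1 < len(options):
            source, _, target = options[i + 1].partition(":")
            if target != "/policies" or not is_path_in_scope(source, repo_root):
                return False
            i += 2
        else:
            return False  # unknown flag, e.g. --privileged or --network host
    return True
```

Rejecting unknown flags outright is the point: a command that "looks like" the compiler but adds --privileged or mounts / is exactly what a loose prefix match would let through.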

Iteration budgets: how far should the loop go?

One open question I’m still exploring is how aggressive to be with iteration budgets for validate‑in‑a‑loop skills.

Right now, the defaults that feel reasonable for me are:

Up to 3 attempts to fix the same error type, then stop.
A modest global limit on compile cycles per instruction (enough for a few syntax + schema + test passes, but not dozens).
Early exit as soon as cerbos compile and tests pass, with no extra speculative “improvements” in that loop.
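Those budgets translate into a small amount of bookkeeping. A sketch, assuming the validator reports an error type alongside each message; the per-type limit of 3 comes from the defaults listed above, while the global limit of 8 is only a placeholder for "a modest number", not a recommendation. Class and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class IterationBudget:
    per_error_type: int = 3  # from the post: 3 attempts on the same error type
    global_cycles: int = 8   # placeholder for "a modest global limit"
    attempts: Counter = field(default_factory=Counter)
    cycles: int = 0

    def allow(self, error_type: str) -> bool:
        """Spend one fix attempt on error_type; False means stop and ask a human."""
        if self.cycles >= self.global_cycles:
            return False
        if self.attempts[error_type] >= self.per_error_type:
            return False
        self.cycles += 1
        self.attempts[error_type] += 1
        return True


def run_with_budget(
    check: Callable[[], Optional[tuple[str, str]]],
    fix: Callable[[str], None],
    budget: IterationBudget,
) -> str:
    """check() returns None when clean, else (error_type, message)."""
    while True:
        issue = check()
        if issue is None:
            return "clean"  # early exit: no speculative "improvements"
        error_type, message = issue
        if not budget.allow(error_type):
            return f"stopped: budget exhausted on {error_type!r}"
        fix(message)
```

Returning the error type that exhausted the budget makes the hand-off useful: it tells the human whether the prompt was ambiguous or the existing policies need structural work.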

In practice:

Most fixes converge in 1–3 iterations.
When the loop hits its limit, it almost always means the prompt was ambiguous or the existing policies have deeper structural issues a human needs to sort out.

I’m very curious what other people are doing here, both for Claude Code and for other agentic tools:

If you’re using a static analyzer, test suite, or custom validator in the loop, how many tries do you give the agent?
Do you treat security‑sensitive domains (auth, infra, data migrations) differently from purely functional code?
Have you found good patterns for showing these loops to humans in a way that builds trust instead of hiding all the complexity?

Trying this pattern yourself

If you’re experimenting with Claude Code security and want a more robust way to handle authorization policies, this is the pattern that’s been working for me:

Define a narrow, domain‑specific skill

In my case: a Cerbos policy skill that understands my policy layout and how to run cerbos compile in Docker.

Make the real tool the judge

Don’t let the model decide when things are “good enough”. Let your domain compiler or validator (Cerbos, a linter, a test runner, a schema checker) be the arbiter of success.

Wrap it in a validate‑in‑a‑loop controller

One change per iteration, with a small iteration budget, surfacing compiler output and diffs back to you.

Layer this on top of Claude Code’s own security model

Restrict filesystem scope, whitelist commands, and pick conservative permission modes when you’re touching anything related to security or infrastructure.
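For the "surfacing compiler output and diffs" part, Python's standard difflib is enough to produce a per-iteration report a human can actually read. The report layout and function name here are hypothetical:

```python
import difflib


def iteration_report(n: int, path: str, compiler_output: str, before: str, after: str) -> str:
    """One loop iteration for a human: what the compiler said, then the exact change."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    header = [f"== iteration {n} ==", "compiler output:", compiler_output.rstrip(), "change:"]
    return "\n".join(header) + "\n" + "".join(diff)
```

Pairing each compiler message with the exact diff it prompted is one way to make the loop visible instead of hiding its complexity.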

If you want to try the exact Cerbos setup I’m using, the install on the Claude Code side is:

npx skills add cerbos/skills --skill cerbos-policy

Then point it at a repo with Cerbos policies and start with something small and easy to reason about. Watch the validation loop, read the compiler errors it surfaces, and tune your iteration budget based on what feels safe.

Open question for the community

I’m sharing this mostly because I suspect a lot of people are converging on similar ideas:

Use Claude Code (or another agent) for generation and edits, but let domain‑specific tools enforce correctness, and make the agent iterate until those tools are happy.

If you’re running something similar – especially around security, authorization, granular permissions, or infra as code – I’d love to hear:

What your validate‑in‑a‑loop looks like
How strict your iteration budgets are
Any horror stories where the lack of a loop bit you

We’re still early in figuring out how to make powerful coding agents safe by default. For me, combining Claude Code security settings with Cerbos‑backed validation loops has been a big step in the right direction.