Bala Paranj

Posted on May 22

The First Agent-Centric Cloud Security Platform — And Why We Didn't Build It That Way On Purpose

#ai #agents #security #cloud

Every boundary in Stave's pipeline has a machine-verifiable contract. We built them for solo developer productivity. They turned out to be exactly what agents need. The CLI tool became a platform. Here's why that changes who can run a cloud security program.

I didn't set out to build a cloud security platform for agents. My goal was to build a CLI tool one person could maintain.

The decisions that followed (standard JSON Schema instead of a proprietary format, exit codes instead of prose output, deterministic evaluation instead of probabilistic scoring, small composable tools instead of a monolith) were made for human productivity. One person can't maintain a proprietary schema, debug non-deterministic output, or keep a monolith working.

Fourteen months later, I ran five independent trials. I gave agents a reasoning specification and Stave's data export. No implementation code. No documentation beyond the spec. No hints. The agents produced correct security verdicts for five different reasoning engines — Z3 (mathematical proof), Soufflé (blast radius enumeration), Clingo (violation detection), Prolog (proof trees), and PRISM (risk probability).

Scope: these results hold only for the facts Stave exports in its SIR (Stave Intermediate Representation); the Fact Export reference names which property domains the SIR currently covers.

Two of the trials were fully blind — fresh agents with zero prior context. Both passed.

I didn't target agent support from day one. The architecture produced it.

What agent-centric means

Every security vendor is adding AI. Copilots that summarize findings. Chatbots that answer questions about your security posture. LLMs that suggest remediation steps. These are useful features. They're also decorations on top of existing architectures.

Agent-centric means an agent can build the pipeline — not just answer questions about it. The distinction is the difference between a tool that has AI and a tool that agents can develop against.

The test is simple: can an agent that has never seen your source code produce correct results from your published contracts alone? If yes, the tool is agent-centric. If the agent needs implementation code, internal documentation, or human guidance, the tool has AI features bolted onto a human-dependent architecture.

Stave passed this test. Five times. With two blind runs.

The contracts that make it work

Every boundary in the pipeline has three properties:

1. Machine-readable specification. Not documentation — a JSON Schema or YAML file that an agent parses, understands, and generates conforming output against.

2. Binary assertion. The step either succeeded or it didn't. stave validate --strict exits 0 or non-zero. stave apply produces findings or doesn't. No subjective quality judgment. No "does this look right?" While the agent drafting the code is probabilistic, the contract it targets is deterministic. The platform provides a rigorous feedback loop: the agent iterates until it hits the binary "success" state defined by the contract. The platform is a deterministic sandbox for a probabilistic agent.

3. Actionable error on failure. When the assertion fails, the error names the specific field that's wrong and what was expected. The agent reads the error, fixes the field, and retries. No human interpretation needed.
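The loop these three properties enable can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Stave's implementation: `validate` stands in for a contract check such as stave validate --strict, and the `ValidationError` shape, the `fix` callback, and the `asset_type` field are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ValidationError:
    """Hypothetical shape of an actionable error: which field, what was expected."""
    field: str
    expected: str


def iterate_until_valid(
    draft: dict,
    validate: Callable[[dict], Optional[ValidationError]],
    fix: Callable[[dict, ValidationError], dict],
    max_attempts: int = 5,
) -> tuple[dict, int]:
    """Retry until the binary assertion passes; return (draft, attempts used)."""
    for attempt in range(1, max_attempts + 1):
        error = validate(draft)
        if error is None:  # binary success state: nothing left to interpret
            return draft, attempt
        draft = fix(draft, error)  # the error names the field, so the fix is targeted
    raise RuntimeError(f"no valid draft after {max_attempts} attempts")


# Stand-in contract: the observation must carry a string "asset_type" (assumed field name).
def validate(draft: dict) -> Optional[ValidationError]:
    if not isinstance(draft.get("asset_type"), str):
        return ValidationError(field="asset_type", expected="string")
    return None


# Stand-in for the agent's repair step.
def fix(draft: dict, error: ValidationError) -> dict:
    repaired = dict(draft)
    if error.expected == "string":
        repaired[error.field] = str(repaired.get(error.field, ""))
    return repaired


result, attempts = iterate_until_valid({"asset_type": 42}, validate, fix)
print(result, attempts)
```

The attempt cap matters: a probabilistic agent that never converges should fail loudly instead of looping forever.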

Here's what the pipeline looks like with these contracts:

Steampipe table schema          →  Published mapping YAML
(agent reads column names)         (agent reads field_map)
                                          ↓
                                   stave validate --strict
                                   (assertion: exit 0?)
                                          ↓
                                   stave apply
                                   (assertion: deterministic findings)
                                          ↓
                                   stave export-sir
                                   (SIR: Stave Intermediate Representation
                                    — JSONL triples / SMT-LIB assertions)
                                          ↓
                                   reasoning-spec YAML
                                   (agent maps logic → engine code)
                                          ↓
                                   golden answer comparison
                                   (assertion: matches?)

Every arrow is a contract. Every contract is machine-readable. Every assertion is binary. An agent traverses this pipeline the same way a developer does — except the agent never needs to ask "is this right?" because the contracts answer that question automatically.
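A gated run of such a pipeline might look like the sketch below. It is illustrative only: the article names stave validate --strict, stave apply, and stave export-sir but not their arguments or the exit-code behavior of every step, so the sketch takes arbitrary commands and the demo uses Python one-liners as stand-ins. `run_gated` and `fake_step` are hypothetical helper names.

```python
import subprocess
import sys


def run_gated(steps: list[tuple[str, list[str]]]) -> list[str]:
    """Run each step in order; stop at the first non-zero exit code.

    Returns the names of the steps that passed.
    """
    passed = []
    for name, command in steps:
        completed = subprocess.run(command, capture_output=True, text=True)
        if completed.returncode != 0:
            # The exit code is the assertion; stderr carries the actionable error.
            print(f"{name} failed (exit {completed.returncode}): {completed.stderr.strip()}")
            break
        passed.append(name)
    return passed


def fake_step(exit_code: int) -> list[str]:
    """Stand-in command that exits with the given code."""
    return [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]


demo = [
    ("validate", fake_step(0)),
    ("apply", fake_step(3)),
    ("export", fake_step(0)),
]
passed_steps = run_gated(demo)
print(passed_steps)
```

In a real run, each stand-in would be replaced by the corresponding Stave command with whatever arguments your setup requires.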

What the five trials proved

We wrote reasoning specifications — YAML files describing a security question, the input data, the step-by-step reasoning chain, and the expected output format. The reasoning spec defines logic constraints (e.g., "a bucket is public if the policy allows the AllUsers principal"), not implementation code. The agent's job was to translate those logic constraints into the specific syntax of the target engine — Soufflé Datalog, Z3 SMT-LIB, Clingo ASP atoms. We stripped the expected answer. We gave the spec and the input data to agents with no access to our codebase.
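To make "translate a logic constraint into engine code" concrete, here is the AllUsers constraint written as a rule over exported triples. The trials targeted Soufflé, Z3, Clingo, Prolog, and PRISM; plain Python is used here only to show the shape of the task. The triple field names (`subject`, `predicate`, `object`), the `policy_allows_principal` predicate, and the bucket names are assumptions, not the actual SIR vocabulary.

```python
import json

# Hypothetical JSONL export: one (subject, predicate, object) triple per line.
SIR_JSONL = """\
{"subject": "bucket:logs", "predicate": "policy_allows_principal", "object": "AllUsers"}
{"subject": "bucket:billing", "predicate": "policy_allows_principal", "object": "role:finance"}
{"subject": "bucket:assets", "predicate": "policy_allows_principal", "object": "AllUsers"}
"""


def load_triples(jsonl: str) -> set[tuple[str, str, str]]:
    """Parse JSONL into a set of triples, skipping blank lines."""
    triples = set()
    for line in jsonl.splitlines():
        if line.strip():
            record = json.loads(line)
            triples.add((record["subject"], record["predicate"], record["object"]))
    return triples


def public_buckets(triples: set[tuple[str, str, str]]) -> list[str]:
    """Constraint: a bucket is public if its policy allows the AllUsers principal."""
    return sorted(
        subject
        for subject, predicate, obj in triples
        if predicate == "policy_allows_principal" and obj == "AllUsers"
    )


facts = load_triples(SIR_JSONL)
print(public_buckets(facts))
```

The rule contains no knowledge of how the facts were collected, which is why an agent can write it from the spec and the export alone.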

Trial	Engine	Question	Blind?	Result
1	Z3	"Can anonymous users reach this S3 bucket?"	Same-session	PASS — correct verdict + SAT witness (attack path)
2	Soufflé	"How many resources can an anonymous identity reach?"	Same-session	PASS — count: 12 (byte-identical)
3	Clingo	"Which violation rules fire on this configuration?"	Blind	PASS — all 4 violations correct
4	Prolog	"What is the proof tree for this attack path?"	Blind	PASS — 12 proof trees correct
5	PRISM	"What is the probability of successful exploitation?"	Same-session	PASS — 0.412 (within ±0.005)
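The results column mixes two kinds of comparison: exact matches (the Soufflé count) and a tolerance band (the PRISM probability). A golden-answer check covering both could be sketched as follows. `matches_golden` is a hypothetical helper, and the probabilities in the demo are illustrative values around the table's 0.412, not trial data.

```python
import math


def matches_golden(actual, golden, tolerance: float = 0.0) -> bool:
    """Binary assertion against a golden answer.

    Exact equality by default; float answers may pass within an absolute tolerance.
    """
    if tolerance > 0 and isinstance(actual, float) and isinstance(golden, float):
        return math.isclose(actual, golden, rel_tol=0.0, abs_tol=tolerance)
    return actual == golden


exact = matches_golden(12, 12)
inside = matches_golden(0.410, 0.412, tolerance=0.005)
outside = matches_golden(0.420, 0.412, tolerance=0.005)
print(exact, inside, outside)
```

Either way the outcome is a boolean, so the assertion stays binary even when the answer itself is a probability.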

Two of the five trials caught real defects — one in the spec, one in our test suite. The framework automatically classified them: when the engine and agent agreed but the golden answer differed, we found a human transcription error in our test suite (we'd written 6 when the correct count was 12). When the agent's output failed to match the engine's actual vocabulary, we found a spec ambiguity (the spec said mfa_enforced but the export uses has_mfa_enforced). The contracts allowed the agents to debug our own test methodology.
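The classification logic described above can be approximated in a short function. This is my reading of the two cases, not the framework's code: `classify`, its parameters, and the label strings are hypothetical, and "engine" here means a reference run of the engine on the same input.

```python
def classify(agent, engine, golden, agent_terms=(), export_terms=()):
    """Classify a trial outcome from three answers plus the vocabularies used.

    agent  : answer produced by the agent's generated engine code
    engine : answer from a reference run of the engine
    golden : expected answer recorded in the test suite
    """
    unknown = sorted(set(agent_terms) - set(export_terms))
    if unknown:
        # The agent used names the export does not contain: suspect the spec.
        return f"spec_ambiguity: {', '.join(unknown)}"
    if agent == engine == golden:
        return "pass"
    if agent == engine:
        # Two independent derivations agree against the recorded answer.
        return "golden_answer_error"
    return "agent_error"


count_case = classify(agent=12, engine=12, golden=6)
vocab_case = classify(
    agent=None,
    engine=12,
    golden=12,
    agent_terms=["mfa_enforced"],
    export_terms=["has_mfa_enforced"],
)
print(count_case)
print(vocab_case)
```

The vocabulary check runs first because an answer built on a predicate the export never emits is not worth comparing.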

As far as I know, no other security platform has published evidence that agents can produce correct security reasoning from published contracts alone.

What this changes for enterprises

Before: a team problem

Deploying cloud security posture management traditionally requires:

A security engineer to configure the scanner
A cloud architect to interpret the findings
A compliance analyst to map findings to frameworks
A DevOps engineer to integrate into CI/CD
A manager to prioritize remediation

Five roles. Monthly ongoing cost. The tool is the smallest part of the expense — the team to operate it is the real cost. This is why startups skip CSPM: not because the tool costs $50K, but because the team to run it costs $500K.

After: an agent problem

With agent-centric architecture, the same pipeline runs with one engineer directing agents:

Engineer: "Connect Steampipe to our AWS account and produce
           Stave observations for S3 and IAM."

Agent 1:   Reads contracts/steampipe/aws_s3_bucket.yaml
           Reads contracts/steampipe/aws_iam_role.yaml
           Queries Steampipe, transforms output, validates
           → valid observations

Engineer: "Evaluate and show me compound risks."

Agent 2:   Runs stave apply → findings
           Runs stave gaps → what's missing
           → prioritized findings + gap report

Engineer: "Prove whether anonymous access to PHI is reachable."

Agent 3:   Reads reasoning-specs/z3-public-read-bucket/spec.yaml
           Runs stave export-sir → SMT-LIB facts
           Follows reasoning steps → SAT/UNSAT verdict
           → mathematical proof

Engineer: "Map findings to HIPAA Technical Safeguards."

Agent 4:   Reads compliance profile → requirement mapping
           Aggregates findings per requirement
           → compliance status report

One engineer. Four agents. The agents work because every step has a machine-verifiable contract. The engineer's job shifts from operating the tool to directing agents and reviewing results. The security expertise is still human: which questions to ask, which findings matter most, and what the business context is. The mechanical work (collection, transformation, evaluation, export, reasoning) is agent-executed.
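The last step in the exchange above, aggregating findings per requirement, is the most mechanical of the four. A minimal sketch, assuming a compliance profile is a mapping from control IDs to requirements: every control ID, requirement label, and asset name below is invented for illustration and does not come from Stave or from HIPAA.

```python
from collections import defaultdict

# Hypothetical compliance profile: control ID -> requirement it supports.
PROFILE = {
    "CTL-S3-PUBLIC-READ": "access-control",
    "CTL-IAM-NO-MFA": "authentication",
    "CTL-S3-NO-TLS": "transmission-security",
}

# Hypothetical findings, one per violated control on an asset.
FINDINGS = [
    {"control": "CTL-S3-PUBLIC-READ", "asset": "bucket:logs"},
    {"control": "CTL-S3-PUBLIC-READ", "asset": "bucket:assets"},
    {"control": "CTL-IAM-NO-MFA", "asset": "role:admin"},
]


def compliance_status(profile, findings):
    """Group findings by requirement; a requirement with no findings passes."""
    by_requirement = defaultdict(list)
    for finding in findings:
        requirement = profile.get(finding["control"])
        if requirement is not None:  # ignore controls outside this profile
            by_requirement[requirement].append(finding["asset"])
    return {
        requirement: {
            "status": "fail" if by_requirement[requirement] else "pass",
            "assets": sorted(by_requirement[requirement]),
        }
        for requirement in sorted(set(profile.values()))
    }


report = compliance_status(PROFILE, FINDINGS)
for requirement, entry in report.items():
    print(requirement, entry["status"], entry["assets"])
```

Deciding which requirements matter for a given business stays with the engineer; the grouping itself needs no judgment.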

The staffing math changes:

Traditional CSPM	Agent-centric CSPM
5 roles × $150K = $750K/year	1 Security Architect (Agent Orchestrator) × $200K = $200K/year
Tool: $50K-$100K	Tool: $0 (open source)
Time to value: 3-6 months	Time to value: days
Scales by hiring	Scales by adding agents
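Adding the tool line to the staffing line gives the yearly totals. The sketch below only redoes the table's arithmetic with its own round numbers; the table does not list agent running costs, so neither does the sketch.

```python
ROLES = 5
COST_PER_ROLE = 150_000
TOOL_COST_RANGE = (50_000, 100_000)  # low and high end of the tool line
ORCHESTRATOR_COST = 200_000
OPEN_SOURCE_TOOL_COST = 0

# Traditional: staffing plus tool, at both ends of the tool price range.
traditional = tuple(ROLES * COST_PER_ROLE + tool for tool in TOOL_COST_RANGE)
# Agent-centric: one architect plus an open-source tool.
agent_centric = ORCHESTRATOR_COST + OPEN_SOURCE_TOOL_COST
difference = tuple(total - agent_centric for total in traditional)

print(traditional)    # (800000, 850000)
print(agent_centric)  # 200000
print(difference)     # (600000, 650000)
```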

This isn't theoretical. The contracts exist. The trials passed. The agent templates ship in the repo. An engineer who runs the demo today can direct agents against their own infrastructure tomorrow.

Why monolithic tools can't match this capability

A monolithic security tool where collection, evaluation, and reporting are one binary with one proprietary format can add an AI chatbot. It can add an LLM-generated summary. It can add a copilot sidebar.

What it can't add is agent-developable composition. Because composition requires:

Separate steps with independent contracts (monolith has one step)
Standard formats between steps that any agent framework can read (monolith has proprietary internals)
Machine-verifiable assertions at each boundary (monolith validates internally, opaquely)
Published reasoning specs that agents execute independently (monolith's reasoning is embedded in code)

Retrofitting these properties means decomposing the monolith, which means abandoning the architecture. The Unix philosophy isn't a feature. It's a structural decision that produces emergent properties you can't add later.

Every enterprise customer has their own tools — their own CMDB, their own collector, their own SIEM, their own compliance framework. A monolithic scanner says: use our collector, our evaluator, our dashboard. An agent-centric pipeline says: bring your tools, target our contracts, agents compose the pipeline.

Customer A:  Steampipe       → Stave → Z3       → Splunk
Customer B:  AWS Config      → Stave → Soufflé  → Jira
Customer C:  Terraform state → Stave → Clingo   → PagerDuty
Customer D:  Custom CMDB     → Stave → Prolog   → Neo4j

Four customers. Four different collectors. Four different reasoning engines. Four different downstream consumers. The same evaluation contracts in the middle. Zero custom integration code. The variation is absorbed by the contracts at the boundaries, not by adapters inside the tool.
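One way to picture "absorbed by the contracts at the boundaries" is a field map that renames each collector's columns into the same observation shape. The sketch below is an assumption-heavy illustration: the column names, the field maps, and the required fields are invented, and Stave's real mapping files may be structured differently.

```python
def apply_field_map(row: dict, field_map: dict[str, str]) -> dict:
    """Rename collector columns to contract fields; unmapped columns are dropped."""
    return {target: row[source] for source, target in field_map.items() if source in row}


def missing_fields(observation: dict, required: list[str]) -> list[str]:
    """Return the contract fields the observation lacks (empty list means valid)."""
    return [name for name in required if name not in observation]


REQUIRED = ["id", "type", "public"]  # hypothetical contract fields

# Two collectors describing the same bucket with different column names (illustrative).
collector_a_row = {"name": "logs", "kind": "s3_bucket", "is_public": True, "region": "us-east-1"}
collector_b_row = {"ci_name": "logs", "ci_class": "s3_bucket", "exposed": True}

COLLECTOR_A_MAP = {"name": "id", "kind": "type", "is_public": "public"}
COLLECTOR_B_MAP = {"ci_name": "id", "ci_class": "type", "exposed": "public"}

observation_a = apply_field_map(collector_a_row, COLLECTOR_A_MAP)
observation_b = apply_field_map(collector_b_row, COLLECTOR_B_MAP)

print(observation_a == observation_b)
print(missing_fields(observation_a, REQUIRED))
```

Everything downstream of the map sees one shape, so swapping collectors changes a data file and no evaluation code.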

Cloud security for the agentic era

The shift is already happening. Google's defensive roadmap calls for an agentic SOC. AWS is adding agent capabilities to Security Hub. Every vendor is racing to add AI to their existing tools.

The question isn't whether agents will operate security platforms. It's whether the tools are built for agents to operate — or whether agents are added as a layer on top of tools built for humans.

Stave is built for agents to operate. Not because we planned it. Because the architectural decisions that make a tool maintainable by one person are the same decisions that make it operable by agents: standard contracts, binary assertions, deterministic evaluation, composable steps, published reasoning specs.

The landing page says: Cloud Security for the Agentic Era.

It means: one engineer with agents runs a cloud security program that used to require a team. The contracts are published. The trials passed. The architecture is proven.

The era where cloud security required a team to operate a tool is ending. The era where one engineer directs agents against a published contract platform is beginning. What started as a CLI tool built for one developer's constraints — small tools, standard formats, deterministic evaluation — became the platform the agentic era needs.

That wasn't the plan. It's better than any plan could have been.

Stave is an open-source cloud security platform. 2,650+ controls, 585 compound chains, 109 per-asset-type JSON Schemas, 17 Steampipe mappings, 5 validated reasoning specs, 9 independent reasoning engines. Every pipeline boundary has a machine-verifiable contract that agents develop against. Try it: bash examples/demo-ai-security/run.sh