Bala Paranj

Posted on Jun 18

Cloud Computing is Missing One Component. Everyone Builds the Wrong Five.

#devops #cloud #architecture #security

Your car has an engine, a transmission, wheels, a steering wheel, and cruise control. Remove the cruise control and the car still drives, but YOU become the cruise control. You watch the speedometer, compare it to the speed limit, and adjust the pedal. The car is functionally complete. You're the missing component.

Cloud computing works the same way. It has engines: Lambda, Kubernetes controllers, Terraform apply. It has transmission: CI/CD pipelines, GitOps, APIs that carry intent to infrastructure. It has tools: cloud APIs that touch resources. It has an interface: the CLI, the console, the IaC files.

What it's missing is the control unit — the component that senses the current state, compares it to the declared intent, and signals when they diverge. Without it, YOU are the control unit. You read the AWS console, compare it to what you intended, and file a Jira ticket when something's wrong.

The cloud is a car without cruise control. You're the speedometer, the comparator, and the pedal.

Except it's worse than that. A car travels at 70 mph. A human can react at 70 mph — the speed mismatch is manageable. Cloud infrastructure operates at the speed of packets traveling through fiber: a misconfigured IAM role is exploitable the instant it's deployed. A public S3 bucket is discoverable by automated scanners within minutes. A credential leaked to a public repository is harvested by bots in seconds.

The human control unit operates at the speed of: read an email alert (minutes), open a dashboard (minutes), triage the finding (minutes to hours), decide if it's real (hours), open a Jira ticket (minutes), assign it to a team (hours), wait for the fix (days), verify the fix (hours).

Infrastructure speed:    Misconfiguration exploitable in seconds
Scanner speed:           Finding appears in minutes to hours
Human speed:             Triage → decision → fix → verify in days

The gap between exploitation and correction: days to weeks
The gap between deployment and exploitation: seconds

The human isn't just the weakest link in the control loop. The human is incapable of being the control unit for a system that operates at this speed. It's a physics problem — biological response time cannot match electronic propagation speed. Asking humans to be the comparator for cloud infrastructure is like asking a pedestrian to be the cruise control for a jet.

This is a losing system by design, because it assigns the control function to the one component that can't operate at the system's speed.

And speed is only the first disqualification. Humans have at least four structural limitations that make them unsuitable as control units for cloud infrastructure:

They get tired. Alert fatigue is not a metaphor — it's a biological response. After the 50th alert in a shift, the human stops reading them. The 51st alert is the one that matters. The control unit stopped functioning three hours ago.

They're not available. The infrastructure runs at 3am on a Saturday. The human doesn't. The misconfiguration deployed at 3:07am is exploitable by 3:08am. The human reads the alert at 9am Monday. The system was uncontrolled for 54 hours.

They have skill gaps. Cloud IAM has 17,000 actions. No human knows all of them. The misconfiguration that composes a Cognito identity pool, an IAM trust policy, and an S3 bucket policy requires expertise across three services. The engineer who deployed the Cognito pool doesn't understand S3 bucket policies. The engineer who wrote the bucket policy doesn't understand IAM trust chains. The compound risk lives in the gap between two engineers' knowledge — and neither one sees it.

They don't compose. A human reviewing one finding can assess it. A human reviewing 200 findings cannot mentally compose them to detect that findings #47, #112, and #189 form a compound attack path. The composition requires evaluating every pair and triple of findings for shared trust relationships. For 200 findings, that's 1.3 million triples. The human reviews them sequentially. The control unit must compose them simultaneously.
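The numbers in the last two limitations are easy to check, and the composition problem can be sketched as a brute-force search. This is a minimal sketch under stated assumptions: the date is arbitrary (chosen only because it falls on a Saturday), and the finding model, where each finding links a source to a target and a chain is three findings that connect end to end, is hypothetical and far simpler than real trust relationships.

```python
from datetime import datetime
from itertools import combinations, permutations
from math import comb

# Availability gap: deployed Saturday 03:07, alert first read Monday 09:00.
# 2026-05-16 is an arbitrary Saturday used only for illustration.
deployed = datetime(2026, 5, 16, 3, 7)
first_read = datetime(2026, 5, 18, 9, 0)
gap_hours = (first_read - deployed).total_seconds() / 3600

# Composition cost: unordered pairs and triples among 200 findings.
pairs = comb(200, 2)
triples = comb(200, 3)

# Hypothetical finding model: (finding id, source, target).
findings = [
    (47, "cognito-identity-pool", "iam-role"),
    (112, "iam-role", "s3-bucket-policy"),
    (189, "s3-bucket-policy", "customer-data"),
    (5, "unused-security-group", "nothing"),
]


def find_chains(items):
    """Return id triples whose links connect end to end (brute force)."""
    chains = []
    for trio in combinations(items, 3):
        # Try every ordering so each target can feed the next source.
        for a, b, c in permutations(trio):
            if a[2] == b[1] and b[2] == c[1]:
                chains.append((a[0], b[0], c[0]))
    return chains


print(f"uncontrolled for about {gap_hours:.0f} hours")
print(f"{pairs:,} pairs, {triples:,} triples")
print(find_chains(findings))
```

The search is trivial for a machine and hopeless for a person reading findings one at a time, which is the point: composition is a job for the comparator, not for the reviewer.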

A thermostat doesn't get tired. It works at 3am. It knows exactly one thing (is the temperature different from the setpoint?) and it knows it perfectly. It composes with other thermostats without coordination. It is the simplest possible control unit — and it's more reliable than any human at the one job it does.

Cloud security needs a thermostat, not a more alert human.

The thermostat your cloud doesn't have

Your home has a furnace, ductwork, and vents. Without a thermostat, you're the control system — you feel cold, walk to the furnace, turn it on, feel warm, turn it off. The furnace works. The ducts work. The vents work. You're the missing component.

Now imagine someone sells you a smart thermometer — it measures the temperature and beeps when it's too cold. You still walk to the furnace. You still turn it on. You still decide when to turn it off. The thermometer added a sensor but didn't close the loop. You're still the control system. You just have a louder alarm.

That's what every cloud security product is today. A thermometer with an alarm — relabeled as a thermostat. Scanners sense the current state. Dashboards display findings. Alerts fire in Slack. The vendor calls this "continuous monitoring" and "automated response." But the human still triages the alert, decides if it's real, figures out what to do, opens a Jira ticket, assigns it to a team, waits for the fix, and re-scans to verify. An alert is not control. An alert is notification that control is absent. The product added a sensor and a speaker. The loop is still open. The human is still walking to the furnace.

A thermostat is different. It has three things a thermometer doesn't: a setpoint (72°F — what you DECLARED), a comparator (is the current temperature different from the setpoint?), and a signal (tell the furnace to act). You declare your intent once. The system closes the loop. You're out of it.

Thermometer with alarm (today's scanners):
    Sensor: "it's 58°F"  →  Alert: "it's cold!"  →  Human decides  →  Human acts

Thermostat (what's missing):
    Setpoint: 72°F  →  Sensor: 58°F  →  Comparator: divergence  →  Signal: furnace ON
    No human in the loop.
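The difference between the two diagrams fits in a few lines. This is a minimal sketch, not a model of a real thermostat: the function names and the alarm threshold are assumptions made for illustration, and hysteresis is left out on purpose.

```python
from typing import Optional


def thermometer_alarm(observed_f: float, alarm_below_f: float) -> Optional[str]:
    """Sensor plus speaker: reports a reading and leaves the decision to a human."""
    if observed_f < alarm_below_f:
        return f"ALERT: it's {observed_f:.0f}°F"
    return None


def thermostat_signal(setpoint_f: float, observed_f: float) -> str:
    """Setpoint plus comparator plus signal: emits a command, not a message."""
    return "FURNACE_ON" if observed_f < setpoint_f else "FURNACE_OFF"


print(thermometer_alarm(58, alarm_below_f=60))  # a message for a person to read
print(thermostat_signal(72, 58))                # a command for the furnace
```

The thermometer's return value is text that someone has to interpret. The thermostat's return value is something the furnace can act on directly, because the declared setpoint is an input to the function.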

Cloud infrastructure has the furnace (Lambda, Kubernetes — they execute), the ductwork (CI/CD, APIs — they carry changes), and the vents (cloud resources — they touch the environment). Every security product adds a better thermometer — more sensors, louder alarms, prettier dashboards. But the human is still walking to the furnace.

What's missing is the thermostat: the component that knows what you want (declared invariants), measures what you have (observed state), compares them deterministically (evaluation), and signals the correction path (exit codes that CI/CD acts on). Declare your intent once. The pipeline enforces it on every push, every PR, every scheduled run. The human declared the setpoint. The system closes the loop.

Stave as thermostat:
    Setpoint: 2,650 invariants (declared once)
    Sensor: observation snapshot (captured by existing collectors)
    Comparator: stave apply → exit 0 (matches) or exit 3 (diverges)
    Signal: CI/CD pipeline blocks the merge, triggers rollback, fires alert
    Human: declared intent once → out of the loop
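The Signal row is the part a pipeline has to implement. Here is a hedged sketch of a gate step: the exit codes 0 and 3 come from this article, but the `decide` and `run_gate` helpers are hypothetical, and treating every other exit code as an error that fails closed is an assumption, not documented Stave behavior.

```python
import subprocess
import sys
from typing import Sequence

EXIT_MATCH = 0     # from the article: intent matches reality
EXIT_DIVERGED = 3  # from the article: divergence found


def decide(exit_code: int) -> str:
    """Map the comparator's exit code to a pipeline verdict."""
    if exit_code == EXIT_MATCH:
        return "allow"
    if exit_code == EXIT_DIVERGED:
        return "block"
    # Assumption: any other code means the comparison itself did not complete,
    # so the gate fails closed instead of treating it as a pass.
    return "error"


def run_gate(command: Sequence[str]) -> str:
    """Run the comparator command and return the verdict for its exit code."""
    completed = subprocess.run(command, check=False)
    return decide(completed.returncode)


# Stand-in for the real comparator so the sketch runs anywhere:
# a child process that simply exits with code 3.
print(run_gate([sys.executable, "-c", "import sys; sys.exit(3)"]))

# In a real pipeline the command would be the invocation shown later in this
# article, for example:
#   run_gate(["stave", "apply", "--observations", "./observations",
#             "--now", "2026-05-16T00:00:00Z"])
```

Most CI systems already fail a step on any non-zero exit code, so a wrapper like this is only needed when the pipeline should treat "diverged" and "could not evaluate" differently.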

That's the missing component. A thermostat that closes the loop.

The five components every system needs

There's a law in systems engineering — TRIZ's Law of System Completeness. This law establishes the number and functionality of the principal parts of any autonomous technological system.

Component	What it does	Car example
Engine	Source of energy, authority, or purpose	Internal combustion engine
Transmission	Converts engine output into a form the tool can use	Gearbox + driveshaft
Tool	The part that contacts and changes the environment	Wheels on road
Interface	Surface through which a human supplies intent	Steering wheel + pedals
Control unit	Senses, compares, and corrects — closes the loop	Cruise control + ABS + traction control

A system missing any component can't operate autonomously. A human (or another system) must supply the missing part. As the system evolves, each missing part gets internalized, until the system is complete and operates independently.

Where cloud computing is stuck

Cloud evolved four of five components over the last fifteen years:

Phase	What emerged	Cloud example
Phase 0	Human is the whole system	Hand-written scripts, manual deploys, manual safety checks
Phase 1	Transmission emerged	APIs, CI/CD, GitOps, IaC — carry intent but don't judge it
Phase 2	Engine emerged	Lambda, Kubernetes controllers, Terraform apply, IAM policy engine — execute instructions but don't judge correctness
Phase 3	Control unit — still missing	Humans remain the primary verifier

TRIZ identifies this sequence as a universal pattern in the evolution of technological systems: human components are dislodged in a predictable sequence — transmission first, engine second, control last. Transmission is dislodged first because it requires the least autonomy — carrying things from A to B is a mechanical replacement. Engine is dislodged second because execution requires more autonomy but still follows instructions. Control is dislodged last because it requires the most autonomy — it must sense, compare against declared intent, and decide whether the output is correct. That last step requires something no mechanical replacement can infer: what the human meant. Until the human declares intent in a form the machine can evaluate, the human remains the control unit by default. Cloud computing is stuck at the point this law predicts: transmission and engine are fully mechanized, and the human is still the comparator, because control requires declared intent, and the industry hasn't built the declaration layer.

Visualised as the five-component system:

[Figure: System Completeness]

The mismatch is predictable from the law: when the Engine and Transmission exist but the Control unit doesn't, propagation is fast but detection is slow. Changes deploy globally in seconds. The human who should have caught the misconfiguration finds out hours or days later — if they find out at all.

This explains every recurring pattern in cloud security: breaches that persist for months before detection. Misconfigurations that propagate across regions before anyone notices. Drift that accumulates because nobody's comparing the current state to the declared intent continuously.

The cloud has a powerful engine and a fast transmission connected to nothing that checks whether the output is correct.

The obvious fix and why it's an illusion

The obvious fix: build a product that internalizes all five components. A single platform that collects, evaluates, remediates, and monitors. Every cloud security vendor claims this:

Vendor product (claims all five):
    Engine:       Built-in rule engine         ← exists
    Transmission: Built-in collectors          ← exists
    Tool:         Built-in remediation         ← claimed
    Interface:    Built-in dashboard           ← exists
    Control:      Continuous monitoring loop   ← claimed

Look closer at Control and Tool. What vendors call "continuous monitoring" is alerting — sense and notify. An alert is not control. An alert is notification that control is absent. The thermometer beeps. The human walks to the furnace. The vendor labeled the beep "control."

What vendors call "remediation" is auto-fix for a handful of simple cases — disable a public bucket, rotate a key. For anything compound, ambiguous, or risky, the remediation is: open a Jira ticket. A Jira ticket is not remediation. It's a request for a human to remediate. The vendor labeled the ticket "tool."

Strip the labels and look at what happens when a finding fires:

Vendor "control loop" in practice:
    1. Scanner detects misconfiguration          (sense — real)
    2. Dashboard shows finding                   (alert — not control)
    3. Alert fires in Slack                      (louder alert — still not control)
    4. Human triages alert                       (human is the comparator)
    5. Human decides if it's real                (human is the decision-maker)
    6. Human opens Jira ticket                   (human initiates correction)
    7. Another human fixes it days later         (human is the actuator)
    8. Scanner re-scans to verify                (sense again — loop took days)

That's not a control loop. That's a notification pipeline with humans at every decision point. The "Control" component in the vendor's five-component claim is a sensor relabeled as a controller. The "Tool" component is a ticket system relabeled as an actuator.

The cloud isn't stuck between Engine and Control because vendors haven't built Control. It's stuck because what vendors CALL Control is alerting — and alerting is the symptom of missing control, not the solution.

The resolution: supply only the missing piece

The cloud already has four of the five components:

Engine:        ✅  Lambda, Kubernetes, Terraform, IAM policy engine
Transmission:  ✅  CI/CD, GitOps, APIs, IaC
Tool:          ✅  Cloud APIs that touch resources
Interface:     ✅  CLI, console, IaC files
Control:       ❌  Missing — humans are the comparator

Four components exist and are mature. They don't need to be rebuilt inside a security product. They need to be CONNECTED to the missing fifth component.

The missing component isn't a product. It's a function: sense the current state, compare it to declared intent, signal when they diverge. Sense → Compare → Correct.

Split that function across the boundary:

Control sub-step	Who supplies it
Sense (capture current state)	The operator's existing collectors — Steampipe, AWS Config, Terraform state, custom exporters
Compare (evaluate against declared intent)	The missing piece — a deterministic comparator that reads observations and evaluates invariants
Correct (act on the divergence)	The operator's existing CI/CD — GitHub Actions, Jenkins, ArgoCD, PR review, ChatOps

The only component that doesn't already exist in the operator's environment is the comparator. Everything else — the collectors that sense, the CI/CD that corrects, the cloud APIs that act — is already running.

Supply the comparator. Connect it to what exists. The five-component system is complete — assembled from the operator's existing stack plus one new binary, not rebuilt from scratch inside a monolith.

What the comparator looks like

# Sense: the operator's existing collector captured a snapshot
ls observations/
# s3-2026-05-16.obs.json  iam-2026-05-16.obs.json

# Compare: the missing piece — deterministic evaluation
stave apply --observations ./observations --now 2026-05-16T00:00:00Z
# Exit 0 = intent matches reality
# Exit 3 = divergence found (12 findings, 3 compound chains)

# Correct: the operator's existing CI/CD acts on the result
# GitHub Action blocks the PR. ArgoCD triggers rollback. Slack alert fires.

The comparator is a pure function: files in, findings out. It doesn't collect (the operator's Steampipe does that). It doesn't remediate (the operator's CI/CD does that). It doesn't monitor continuously (the operator's cron job or GitHub Action schedule does that). It COMPARES — the one function the cloud was missing.
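To make "files in, findings out" concrete, here is a toy comparator. It is a sketch under loud assumptions and not Stave's implementation: the observation schema (a `resources` list with `id`, `type`, and `public` fields), the single `s3-not-public` invariant, and every function name are hypothetical. Only the `*.obs.json` suffix and the exit codes 0 and 3 are taken from the example above.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, List, NamedTuple


class Finding(NamedTuple):
    invariant: str
    resource: str


# Hypothetical invariant shape: a predicate that is True when a resource complies.
Invariant = Callable[[dict], bool]

INVARIANTS: Dict[str, Invariant] = {
    "s3-not-public": lambda r: r.get("type") != "s3_bucket" or not r.get("public", False),
}


def load_observations(directory: Path) -> List[dict]:
    """Read every *.obs.json file, sorted so results never depend on listing order."""
    resources: List[dict] = []
    for path in sorted(directory.glob("*.obs.json")):
        resources.extend(json.loads(path.read_text())["resources"])
    return resources


def compare(resources: Iterable[dict], invariants: Dict[str, Invariant]) -> List[Finding]:
    """Pure: no network, no clock, no writes. Same inputs give the same findings."""
    findings = [
        Finding(name, resource["id"])
        for resource in resources
        for name, holds in invariants.items()
        if not holds(resource)
    ]
    return sorted(findings)


def exit_code(findings: List[Finding]) -> int:
    """0 when intent matches reality, 3 when it diverges."""
    return 3 if findings else 0


snapshot = [
    {"id": "logs", "type": "s3_bucket", "public": True},
    {"id": "backups", "type": "s3_bucket", "public": False},
]
result = compare(snapshot, INVARIANTS)
print(result, exit_code(result))
```

Nothing in `compare` needs credentials or a network connection, because everything it knows arrives as an argument. That is what keeps the blast radius at zero.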

The system completeness map

TRIZ component	Inside the comparator	In the operator's existing stack
Engine	2,650 controls + 585 chains (declares what must be true)	—
Transmission (consumer)	Adapters that read observation JSON	—
Transmission (producer)	—	Steampipe, AWS Config, custom collectors
Tool	—	Cloud APIs, Terraform, Kubernetes, CI/CD
Interface	CLI with exit codes + JSON output	—
Control: Sense	—	Collector cron jobs, on-commit triggers
Control: Compare	`stave apply`, `stave diff`, `stave gaps`	—
Control: Correct	—	GitHub Actions, Jenkins, ArgoCD, PR review

The complete system exists. It's just not a single product. The comparator supplies the missing pieces: the Engine (what must be true) and the Compare step (does reality match). Everything else was already running.

Why this is better than vertical completeness

Property	Vertically complete product	Comparator + existing stack
Credentials	Required — the product collects and remediates	Not required — the comparator reads files
Blast radius	Unbounded — the product modifies infrastructure	Zero — the comparator can't change anything
Failure modes	Many — daemon, agent, dashboard, API connections	One — the binary crashes (and it's a pure function, so restart it)
Lock-in	Total — replace the product, replace everything	Minimal — replace the comparator, keep your collectors and CI/CD
Adoption cost	Deploy the product, integrate every layer	Add one step to existing CI/CD
Component reuse	None — the product rebuilds what you already have	Full — uses your existing Steampipe, your existing CI/CD, your existing cloud APIs

The vertically complete product rebuilds four components that already exist so it can supply the fifth. The comparator supplies the fifth and connects to the four that exist. Same completeness. Fraction of the mechanism.

The reconciliation loop you don't have to build

The original design for this architecture was vertically complete. A Kubernetes-style reconciliation loop that would sense, compare and correct. Declared invariant → observed infrastructure → API calls to fix violations → loop.

It was cut, but not because the pattern fails: Kubernetes proves it works. It was cut because the remediation half requires understanding what "fix" means for every resource type, needs credentials to act, and has unbounded blast radius when the "fix" is wrong.

Removing the remediation loop accidentally produced a better architecture. The comparator is a pure function: deterministic, reproducible, composable, provable. A reconciliation loop with API calls is none of those things — it has side effects, race conditions, and blast radius.
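One detail that makes "deterministic" and "reproducible" achievable is treating time as an input. The `--now` flag in the earlier example suggests the evaluation time is passed in rather than read from the clock; that reading is an assumption, and the staleness check below is a hypothetical illustration of the idea, not a Stave rule.

```python
from datetime import datetime, timedelta, timezone


def is_stale(captured_at: datetime, now: datetime, max_age: timedelta) -> bool:
    """Pure: the caller supplies `now`, so the answer is fixed by the arguments."""
    return now - captured_at > max_age


def is_stale_impure(captured_at: datetime, max_age: timedelta) -> bool:
    """Impure: reads the wall clock, so the same call can flip from False to True."""
    return datetime.now(timezone.utc) - captured_at > max_age


captured = datetime(2026, 5, 15, 0, 0, tzinfo=timezone.utc)
pinned_now = datetime(2026, 5, 16, 0, 0, tzinfo=timezone.utc)

# Re-running with the same pinned time always gives the same verdict.
print(is_stale(captured, pinned_now, timedelta(hours=12)))
print(is_stale(captured, pinned_now, timedelta(hours=48)))
```

A run from last month can be replayed today with the same inputs and produce the same findings, which is hard to guarantee once the evaluation reads the clock or calls an API.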

The operator's existing CI/CD is the remediation loop. It's battle-tested. It has rollback. It has approval gates. It has audit trails. Rebuilding it inside a security product would be worse than using it.

Supply the missing component. Don't rebuild the components that already work.

Tips for Founders

If you're building a developer tool and find yourself rebuilding components your users already have — their CI/CD, their monitoring, their cloud access, their change management — stop and ask: which component is missing?

The TRIZ Law of System Completeness says: a system needs five components to operate autonomously. It doesn't say: one product must contain all five. The five components can span multiple systems. The product's job is to supply the MISSING one and connect to the rest.

A well-designed product does one thing the ecosystem can't do and trusts the ecosystem for everything else. The messiest product is the one that rebuilds the ecosystem inside itself so it can control every layer.

Cloud computing is missing a control unit. Supply the control unit. Don't rebuild the engine, the transmission, the tool, and the interface inside it.

The comparator described here — 2,650 controls, 585 compound chains, deterministic evaluation, exit codes for CI/CD, no credentials, no network, no persistent state — is Stave, an open-source Risk Reasoner. It supplies the missing component. Your existing stack supplies the rest. Try it: bash examples/demo-ai-security/run.sh

Top comments (3)

VoltageGPU • Jun 20

As someone working on secure compute infrastructure, I find the idea of a dedicated control unit particularly relevant when it comes to managing GPU resources in the cloud. In projects like VoltageGPU, we’ve seen how tightly coupled components can lead to inefficiencies in resource allocation and security enforcement. Separating the control plane could help enforce isolation and dynamic correction in heterogeneous compute environments.

Bala Paranj • Jun 20

Hardware engineers are 30 years ahead of AI development. They use formal verification, and they don't say Computer = CPU + Harness. They know a computer consists of subsystems.

Bala Paranj • Jun 18

For diagrams, please see: gist.github.com/sufield/45d82dbed0...