Z3 is one of the most powerful reasoning engines ever built. Microsoft Research created it to power software verification and program analysis. It can take your cloud configuration, model it as a set of logical assertions, and mathematically prove whether an attack path exists.
When it finds one, Z3 says:
sat
Three letters. No context. No explanation. No link to the configuration property that caused it. No fix. Just "sat" — which means "satisfiable," which means "the unsafe state you asked about is reachable," which means your cloud is vulnerable. Probably. If you encoded the question correctly. Which you can't verify from the output.
This is the gap between a powerful engine and a useful tool. The engine answers the question. The tool explains the answer. This article explains why the explanation layer matters more than the engine, and what it looks like in practice.
What Z3 outputs
Let's say you want to check whether an anonymous internet user can read PHI data from an S3 bucket through a Cognito identity pool. You model the configuration as SMT-LIB assertions, write a query, and run it through Z3.
The input Z3 sees:
(set-logic ALL)
(declare-fun allows_unauthenticated (String String) Bool)
(declare-fun maps_unauth_to (String String) Bool)
(declare-fun has_action (String String) Bool)
(declare-fun has_tag (String String) Bool)
(assert (allows_unauthenticated "pool-abc" "true"))
(assert (maps_unauth_to "pool-abc" "role/AppUnauthRole"))
(assert (has_action "role/AppUnauthRole" "s3:GetObject"))
(assert (has_tag "bucket-prod-phi" "data_classification:phi"))
(declare-const principal String)
(declare-const role String)
(declare-const action String)
(declare-const resource String)
(assert (allows_unauthenticated principal "true"))
(assert (maps_unauth_to principal role))
(assert (has_action role action))
(assert (has_tag resource "data_classification:phi"))
(check-sat)
(get-model)
The output Z3 produces:
sat
(
(define-fun principal () String "pool-abc")
(define-fun role () String "role/AppUnauthRole")
(define-fun action () String "s3:GetObject")
(define-fun resource () String "bucket-prod-phi")
)
In SMT-LIB, sat means the forbidden state is reachable. The model names the specific pool, role, action, and bucket. The proof is mathematically sound.
If you're a security engineer who just wants to know whether your Cognito configuration is safe, this output is useless. You have three questions:
- Is this result correct? Did the input assertions match your configuration, or did the translation introduce a bug?
- What does it mean? Which specific settings in which specific files create the vulnerability?
- What do I do about it? What's the fix, what does it cost, and how do I verify it worked?
Z3 answers none of these. It answered the math question. The security question is still open.
The two translation boundaries
Between your cloud configuration and Z3's answer, there are two translation steps. Each can introduce bugs. Each is invisible if you only look at the solver's output.
YOUR CLOUD CONFIG         THE MATH                  YOUR ANSWER

S3 bucket policy     →    SMT-LIB assertions    →   "sat"
IAM role policy           (was the translation      (was the translation
Cognito settings          correct?)                 back to cloud
                                                    terms correct?)

ENCODING BOUNDARY         Z3 SOLVER                 DECODING BOUNDARY
(can have bugs)           (trusted)                 (can have bugs)
Encoding bugs: Your S3 bucket has PublicAccessBlock.BlockPublicPolicy = true, but the translation emits has_public_read "bucket" "true". The assertion is wrong — the bucket is private, but Z3 thinks it's public. Z3 faithfully returns sat based on the wrong input. The "vulnerability" doesn't exist. You can't tell from sat that the encoding was wrong.
Decoding bugs: Z3 returns sat with a model. You read the model and conclude "the bucket is directly accessible from the internet." But the actual path is through the Cognito identity pool, not direct access. The model told you which variables were assigned, but the interpretation of those variables — what they mean in cloud terms — is your responsibility. Misread the model, misunderstand the vulnerability.
The solver is the most reliable component in the pipeline. The translation layers are where the bugs hide. And they're the layers that get the least attention because everyone focuses on the solver.
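Both boundaries can be made explicit in code. The sketch below is purely illustrative — every function and field name is a hypothetical stand-in, not any real tool's API. It shows an encoder that turns a config dict into assertion tuples and a decoder that maps a solver model back into cloud vocabulary; a bug in either function is exactly the kind of silent translation error described above.

```python
# Illustrative sketch of the two translation boundaries.
# All names here are hypothetical, not a real tool's API.

def encode(config):
    """ENCODING BOUNDARY: cloud config -> logical assertions.
    A bug here (e.g. reading the wrong property) makes Z3
    faithfully prove facts about a configuration you don't have."""
    assertions = []
    if config["identity"]["access"]["allow_unauthenticated"]:
        assertions.append(("allows_unauthenticated", config["pool_id"], "true"))
    return assertions

def decode(model):
    """DECODING BOUNDARY: solver model -> cloud terms.
    A bug here (misreading what a variable means) turns a
    correct proof into a wrong conclusion."""
    return (f"Principal {model['principal']} can perform "
            f"{model['action']} on {model['resource']}")

config = {"pool_id": "pool-abc",
          "identity": {"access": {"allow_unauthenticated": True}}}
print(encode(config))

# A model shaped like the one Z3 returned above:
model = {"principal": "pool-abc", "action": "s3:GetObject",
         "resource": "bucket-prod-phi"}
print(decode(model))
```

The point of writing both sides down is that each becomes testable on its own, independently of the solver between them.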
What an orchestration layer provides
An orchestration layer wraps the solver with five capabilities that make the answer trustworthy, traceable, and actionable:
1. Encoding explanation — "Did the tool understand my configuration?"
Before the solver runs, you see what the tool extracted from your configuration, in your language:
Configuration Summary: 7 assets, 43 facts extracted
Asset: arn:aws:s3:::prod-phi (S3 Bucket)
├── Public read access: ENABLED
│ Source: prod-phi-bucket.obs.json → access.public_read
│ Fact: a3f8c2e91b04
│
├── Encryption: AES256 (SSE-S3, not KMS)
│ Source: prod-phi-bucket.obs.json → encryption.algorithm
│ Fact: b7d1e4f03c89
│
└── Data classification: PHI
Source: prod-phi-bucket.obs.json → tags.data_classification
Fact: c9e2a1f04d77
Asset: arn:aws:cognito-identity:...:identitypool/abc (Identity Pool)
├── Unauthenticated access: ALLOWED
│ Source: cognito-pool.obs.json → identity.access.allow_unauthenticated
│ Fact: d4e5f6a7b8c9
│
└── Maps unauthenticated users to: arn:aws:iam::111122223333:role/AppUnauthRole
Source: cognito-pool.obs.json → identity.cognito.unauth_role_arn
Fact: e5f6a7b8c9d0
No SMT-LIB. No predicate names. "Public read access: ENABLED" — the security engineer compares this to their mental model of the bucket. If the bucket is private, the encoding is wrong and the engineer knows before the solver runs.
Every fact has a unique identifier and a traceable source showing which file and which property path produced it. This is the audit trail. When the solver's answer doesn't match expectations, the engineer traces the identifier back to the source and checks whether the encoding was correct.
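The article doesn't specify how these identifiers are computed, but one plausible scheme — an assumption for illustration, not the actual implementation — is a truncated hash over the fact's source file, property path, and value, which makes the identifier deterministic across runs:

```python
import hashlib

def fact_id(source_file: str, property_path: str, value: str) -> str:
    """Hypothetical deterministic fact identifier: the same source
    file, property path, and value always yield the same 12-hex-char
    id, so identifiers are stable across analysis runs."""
    payload = f"{source_file}|{property_path}|{value}".encode()
    return hashlib.sha256(payload).hexdigest()[:12]

fid = fact_id("cognito-pool.obs.json",
              "identity.access.allow_unauthenticated", "true")
print(fid)  # stable across runs for the same inputs
```

Determinism is what makes the audit trail usable: re-running the analysis on the same snapshot produces the same identifiers, so findings can be diffed between runs.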
2. Human verdict — "What does the result mean?"
After the solver runs, you see the answer in security language:
VERDICT: UNSAFE
An anonymous internet user can read PHI data from the prod-phi
bucket through the Cognito identity pool.
The forbidden state is reachable because:
1. Identity pool allows unauthenticated access
(cognito-pool.obs.json → identity.access.allow_unauthenticated = true)
2. Unauthenticated users receive credentials for AppUnauthRole
(cognito-pool.obs.json → identity.cognito.unauth_role_arn)
3. AppUnauthRole has s3:GetObject permission
(iam-role.obs.json → policies.attached_policies[0].Action)
4. Target bucket contains PHI
(prod-phi-bucket.obs.json → tags.data_classification = phi)
Not sat. Not a model with define-fun expressions. A four-step chain in plain English, each step linked to a specific file and property path. The security engineer reads it and knows exactly which settings create the vulnerability and where they live.
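Rendering such a verdict is plain string assembly over the fact chain. A minimal sketch — the chain structure here is assumed, mirroring the steps above, not a documented schema:

```python
def render_verdict(summary, chain):
    """Render an UNSAFE verdict: a plain-English summary plus a
    numbered causal chain, each step citing its file and property path."""
    lines = ["VERDICT: UNSAFE", "", summary, "",
             "The forbidden state is reachable because:"]
    for i, step in enumerate(chain, 1):
        lines.append(f"  {i}. {step['claim']}")
        lines.append(f"     ({step['source']} -> {step['property']})")
    return "\n".join(lines)

chain = [
    {"claim": "Identity pool allows unauthenticated access",
     "source": "cognito-pool.obs.json",
     "property": "identity.access.allow_unauthenticated = true"},
    {"claim": "AppUnauthRole has s3:GetObject permission",
     "source": "iam-role.obs.json",
     "property": "policies.attached_policies[0].Action"},
]
print(render_verdict(
    "An anonymous internet user can read PHI data from the prod-phi bucket.",
    chain))
```

The hard part is not the formatting — it's making sure each chain step is backed by a traceable fact, which is what the next two capabilities provide.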
3. Fix guidance — "What do I do about it?"
FIX: Disable unauthenticated access on the identity pool.
aws cognito-identity update-identity-pool \
  --identity-pool-id us-east-1:abc123 \
  --identity-pool-name <pool-name> \
  --no-allow-unauthenticated-identities
Cost: $0. Time: 30 seconds.
Effect: Breaks the chain at step 1.
VERIFICATION: After applying the fix, re-run the analysis.
Expected result: SAFE
(Z3 returns UNSAT — no assignment of principals, actions,
and resources can satisfy the attack path conditions.)
The fix is a shell command. The cost is quantified. The effect names the specific chain step that breaks. The verification tells the engineer what to expect and explains the solver's output in cloud terms.
4. Traceability — "Which property caused this?"
Every step in the verdict traces back to a specific property in a specific file through a unique identifier:
# The verdict says step 1 caused the chain.
# Trace identifier d4e5f6a7b8c9:
grep "d4e5f6a7b8c9" facts.jsonl
# Returns:
# fact_id: d4e5f6a7b8c9
# subject: pool-abc
# predicate: allows_unauthenticated = true
# source: cognito-pool.obs.json
# property: identity.access.allow_unauthenticated
# captured: 2026-05-01T00:00:00Z
One identifier. One grep. Full trace from the verdict to the configuration property, including when the snapshot was taken. No manual correlation across output files.
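The same lookup is trivial to script beyond grep. A minimal sketch, stdlib only — the record fields follow the example above, not a documented schema:

```python
import json

def trace(fact_id: str, facts_jsonl: str):
    """Return the first record in a JSON-lines facts file whose
    fact_id matches, or None if the identifier is unknown."""
    for line in facts_jsonl.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("fact_id") == fact_id:
            return record
    return None

facts = json.dumps({
    "fact_id": "d4e5f6a7b8c9",
    "subject": "pool-abc",
    "predicate": "allows_unauthenticated = true",
    "source": "cognito-pool.obs.json",
    "property": "identity.access.allow_unauthenticated",
})
hit = trace("d4e5f6a7b8c9", facts)
print(hit["source"], "->", hit["property"])
```

Because the fact file is line-oriented JSON, the same trace works from grep, jq, or three lines of any scripting language — no bespoke query tool required.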
5. Encoding verification — "Can I trust the translation?"
The orchestration layer verifies its own encoding by comparing each extracted fact against the raw configuration:
Encoding verification: 43/43 facts verified ✓
Every extracted fact matches the corresponding property
in the observation file. The solver's input is consistent
with the configuration snapshot.
Or, when there's a bug:
Encoding verification: 41/43 facts verified
MISMATCH:
Fact d4e5f6a7b8c9:
Extracted: allows_unauthenticated = "true"
Observation: identity.access.allow_unauthenticated = "false"
File: cognito-pool.obs.json
→ ENCODING BUG: fact says unauthenticated is allowed,
but the configuration says it's disabled.
The UNSAFE verdict may be incorrect.
Few verification tools check their own translation layer. The security engineer doesn't need to read SMT-LIB to trust the result — the tool demonstrates that its encoding is consistent with the observations, or reports exactly where it is wrong.
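Encoding verification is conceptually simple: resolve each fact's property path against the raw observation and compare values. A minimal sketch, with hypothetical fact and observation shapes based on the examples above:

```python
def resolve(obj, dotted_path):
    """Walk a dotted property path (e.g. 'identity.access.allow_unauthenticated')
    through nested dicts."""
    for key in dotted_path.split("."):
        obj = obj[key]
    return obj

def verify(facts, observations):
    """Compare each extracted fact against its source observation file.
    Returns (verified_count, mismatches)."""
    mismatches = []
    for fact in facts:
        observed = resolve(observations[fact["source"]], fact["property"])
        if str(observed).lower() != fact["value"].lower():
            mismatches.append((fact["fact_id"], fact["value"], observed))
    return len(facts) - len(mismatches), mismatches

# Reproduce the mismatch case from the report above:
facts = [{"fact_id": "d4e5f6a7b8c9",
          "source": "cognito-pool.obs.json",
          "property": "identity.access.allow_unauthenticated",
          "value": "true"}]
obs = {"cognito-pool.obs.json":
       {"identity": {"access": {"allow_unauthenticated": False}}}}
ok, bad = verify(facts, obs)
print(f"{ok}/{len(facts)} facts verified; mismatches: {bad}")
```

Running this before the solver is what turns "trust the translation" into "check the translation": any fact that doesn't round-trip back to its observation is flagged before it can poison the verdict.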
What you get without orchestration vs. with it
| Capability | Z3 alone | With orchestration |
|---|---|---|
| Prove an attack path exists | sat | UNSAFE: anonymous user can read PHI data |
| Explain which settings cause it | Read the SMT-LIB model | Four-step chain with file names and property paths |
| Verify the encoding is correct | Manually review assertions | Automated: 43/43 facts match observations |
| Trace the finding to a file | Grep through comments | One identifier, one grep, full trace |
| Get the fix | Derive it from the model | Shell command, cost, time, expected result |
| Run multiple solvers | Reformat per solver | Same input, three solvers, consensus |
| Explain to an auditor | Show them SMT-LIB | Show them the chain in English with evidence |
The right column is the product. The left column is a math library. Security engineers don't need a math library — they need trustworthy answers in their language.
Why this matters more for compound detection
The orchestration layer is most valuable when the analysis spans multiple services. A single-service check ("is this bucket public?") is simple enough to verify manually. A cross-service check ("can an anonymous user reach PHI data through a chain of Cognito → IAM → S3?") involves three services, three configuration files, and dozens of properties.
The encoding has more places to be wrong. The verdict has more steps to explain. The traceability has more links to follow.
This is where raw solver output becomes dangerous. If Z3 returns sat on a three-service chain and the encoding has a bug in the IAM layer, the finding is a false positive. The security team triages it as critical, burns two sprints investigating, and discovers it was caused by a property path error in the translation code. The encoding verification catches that error before the solver runs.
If Z3 returns unsat and the encoding has a bug — a property that should be true was encoded as false — the finding is a false negative. The team thinks they're safe. They're not. The encoding verification catches that too.
The more complex the analysis, the more valuable the translation layer. Single-service checks can tolerate raw solver output. Cross-service compound detection cannot.
The bottom line
Don't evaluate formal verification tools by which solver they use. Z3, cvc5, and Yices all produce correct answers to the questions they're asked. The question is whether the question was asked correctly and whether the answer is translated back to your language.
Don't accept sat as an answer. Accept "UNSAFE: an anonymous internet user can read PHI data from the prod-phi bucket through the Cognito identity pool, caused by these four settings in these three files, fixable with this one command for $0 in 30 seconds, verified by three independent solvers, encoding confirmed correct against 43 observation properties."
That's an answer. sat is a data point.
The orchestration layer described in this article is implemented in Stave, an open-source static analysis tool that evaluates cloud configurations via CEL predicates and exports standardized facts for consumption by nine independent reasoning engines. The encoding explanation, verdict translation, traceability, and encoding verification are built on Stave's fact_id provenance chain — every fact carries a deterministic identifier linking the solver's input to the specific observation file and property path that produced it. All analysis runs on air-gapped snapshots with no cloud credentials required.