Bala Paranj

Posted on May 9

Your Private API is Currently Safe. One Developer Change Away From Unsafe.

#security #aws #cloud #appsec

A 2023 Medium tutorial walks through restricting a private API Gateway to a single EC2 host in a single VPC. The author's intent — read literally — is correct: make the API reachable only from inside the VPC, only through the VPC endpoint they create, only by the EC2 they specify. The configuration they publish does this almost. It also has two active gaps and one latent gap. The active gaps are visible to a careful reader of AWS documentation. The latent gap is invisible until a future change activates it.

Z3 from Microsoft Research runs four queries against the published configuration and proves all three. The two active gaps return SAT with concrete witnesses. The latent gap returns UNSAT on the published configuration but SAT on a one-line variant — the kind of variant a developer would introduce while adding a new method or stage. That asymmetry is the article's central argument: a configuration's current safety status and its structural fragility are different questions. Heuristic scanners only check the first. Formal verification can answer both.

The configuration

The author's resource policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "execute-api:/prod/GET/*",
      "Condition": {
        "StringNotEquals": {
          "aws:sourceVpc": "vpc-0b52ca08e7db8531f"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "execute-api:/prod/GET/*"
    }
  ]
}

The API uses --authorization-type NONE. The resource policy is the only access control on the API itself.

Three claims, three Z3 queries

Claim 1: Allow/Deny resource pattern alignment

Both statements scope to execute-api:/prod/GET/*. Today, that means every request the Allow admits is also covered by the Deny condition. The Deny applies whenever the request comes from a VPC other than vpc-0b52ca08e7db8531f; the Allow is unconditional but scoped to the same resource pattern. The two statements together restrict /prod/GET/* to traffic from the VPC.

Z3 query: is there a (stage, method, resource) witness that the Allow admits but the Deny does not?

Verdict on the published config: UNSAT. Every witness the Allow admits is also blocked by the Deny. The current design is aligned.

But the design is fragile: there's no structural guarantee that the Allow and Deny patterns stay aligned under future edits. To demonstrate, consider what a developer would do when asked to "add a dev stage so QA can test against it." The natural change is to broaden the Allow:

 {
   "Effect": "Allow",
   "Principal": "*",
   "Action": "execute-api:Invoke",
-  "Resource": "execute-api:/prod/GET/*"
+  "Resource": "execute-api:/*"
 }

The Deny stays at /prod/GET/* because nobody noticed it needed to change too. Re-run Z3 against this variant:

Verdict on the broadened-allow variant: SAT.

verdict: SAT — witness: stage=prod method=POST resource=execute-api:/prod/POST/users
         (Allow widened to execute-api:/* — Deny still scoped to
          /prod/GET/*, so this resource is allowed without any VPC
          restriction)

The witness is concrete. execute-api:/prod/POST/users is admitted by the broadened Allow (matches execute-api:/*) and not blocked by the Deny (does not match execute-api:/prod/GET/*). The Deny condition's VPC restriction does not apply to this resource at all. The new POST endpoint is publicly invocable — without any VPC restriction — even though the API is still configured as a private REST API.

The published configuration is currently safe because the Allow and Deny patterns happen to be identical. That equality is structural. It depends on a human remembering, every time either pattern changes, to update the other. Z3 can detect when that invariant breaks.

Claim 2: aws:sourceVpc admits every endpoint in the VPC

The Deny condition uses aws:sourceVpc:

"Condition": {
  "StringNotEquals": {
    "aws:sourceVpc": "vpc-0b52ca08e7db8531f"
  }
}

This restricts to traffic from any source in vpc-0b52ca08e7db8531f. The author's narrative says "restrict to the VPC endpoint we created" — but the condition restricts to the entire VPC, not the specific endpoint.

In a shared enterprise VPC, multiple teams create multiple VPC endpoints. Each endpoint is a separate network path; each has its own security group and its own service. The author's intended endpoint (vpce-0abc123def456789) is one of them. Any other interface endpoint in the same VPC, with permissive security-group rules, is a path to this private API that the author did not intend.

Z3 query: is there a VPC endpoint in vpc-0b52ca08e7db8531f that is not vpce-0abc123def456789, but reaches the API anyway?

Verdict on the published config: SAT.

verdict: SAT — witness: vpce-0999888777666555
         (in vpc-0b52ca08e7db8531f, NOT vpce-0abc123def456789)
         (Deny condition uses aws:sourceVpc which matches every
          endpoint in the VPC, not just the intended one)

The witness is the exact failure mode: a different VPC endpoint in the same VPC, created by another team for another service, can route traffic to this API. The security group on the intended endpoint restricts to one EC2 IP, but the resource policy doesn't care which endpoint a request came through — it only checks the source VPC.

The fix is one word in the condition key:

 "Condition": {
   "StringNotEquals": {
-    "aws:sourceVpc": "vpc-0b52ca08e7db8531f"
+    "aws:sourceVpce": "vpce-0abc123def456789"
   }
 }

aws:sourceVpce matches the specific endpoint ID, not the entire VPC. After this change, the Deny applies to every endpoint except the named one; the witness from above is denied.

Claim 3: No authorization compounds the gap

The API method uses --authorization-type NONE. There is no IAM authorization, no Cognito user pool, no Lambda authorizer. The resource policy is the only gate. If the resource policy has a gap (claim 2: it does), no second layer catches it.

Z3 query: is there a request that reaches the API with auth_type=NONE AND a non-intended VPC endpoint?

Verdict on the published config: SAT — the same witness as claim 2, plus auth_type: NONE. Any workload in the VPC, on any endpoint, with any identity (or no identity), can invoke the API.

This is not a discrete vulnerability so much as a defense-in-depth failure. AWS provides four authorization options at the method level (NONE, AWS_IAM, COGNITO_USER_POOLS, CUSTOM). Three of the four require the request be authenticated by some mechanism; the resource policy is then a second layer that further restricts based on source. With NONE, the resource policy is the only layer.

 aws apigateway put-method \
   --rest-api-id timegateway-id \
   --resource-id root-id \
   --http-method GET \
-  --authorization-type NONE
+  --authorization-type AWS_IAM

After this change, requests must be SigV4-signed by an authorized IAM principal. The resource policy still applies; the IAM check is a second layer that catches resource-policy gaps that creep in over time.

Why this is a logic question

CEL evaluates a state predicate. Stave's existing CTL.APIGATEWAY.NETWORK.PRIVATE.POLICY.001 checks one boolean: resource_policy_restricts_vpc. The published configuration sets this to true (the policy does restrict by VPC, just by the wrong condition key). The control reports clean.

Z3 doesn't pattern-match. It runs SAT searches over witness sets:

exists witness:
  resource_policy_admits(witness)
  AND witness != intended

For Claim 1, the witness set is a few stage/method tuples. The query asks whether any tuple is admitted by the Allow but not blocked by the Deny. On the published config: no — the patterns happen to match. On the broadened-allow variant: yes — prod/POST/users is admitted but not blocked.

For Claim 2, the witness set is several VPC endpoint ARNs in the same VPC. The query asks whether the Deny fails to apply for any non-intended endpoint. On the published config: yes for every non-intended endpoint — the condition's StringNotEquals on aws:sourceVpc
is false (the source IS in the VPC), so the Deny doesn't fire, so the Allow takes over.

For Claim 3, the query compounds Claim 2 with the auth_type=NONE field. On the published config: yes — same witness as Claim 2 plus no auth check.

CEL's predicate vocabulary doesn't include "find a witness." Z3's does.

The reframe

Earlier version of this example series refuted a suspicion about KMS Resource: "*" — the pattern looked unsafe to a heuristic scanner; Z3 proved it was correctly scoped because of KMS-specific semantics. The lesson was: formal verification is right when you expect it to be right and right when you expect it to be wrong.

Later it demonstrates the converse case. Finding 1a returns UNSAT — the published configuration is currently aligned. A heuristic might stop there and declare the configuration safe. Z3 doesn't stop there. The same query, against a one-line variant that represents a common developer change, returns SAT.

The article's claim is not "your config is broken right now." It's "your config is one developer change away from being broken, and the change you'd most naturally make is the one that breaks it." That's a stronger statement than a heuristic can make because it requires reasoning about the configuration under perturbation — which is exactly what an SMT solver gives you.

The remediation

Three changes. None requires architectural rework:

 {
   "Version": "2012-10-17",
   "Statement": [
     {
       "Effect": "Deny",
       "Principal": "*",
       "Action": "execute-api:Invoke",
-      "Resource": "execute-api:/prod/GET/*",
+      "Resource": "execute-api:/*",
       "Condition": {
         "StringNotEquals": {
-          "aws:sourceVpc": "vpc-0b52ca08e7db8531f"
+          "aws:sourceVpce": "vpce-0abc123def456789"
         }
       }
     },
     {
       "Effect": "Allow",
       "Principal": "*",
       "Action": "execute-api:Invoke",
-      "Resource": "execute-api:/prod/GET/*"
+      "Resource": "execute-api:/*"
     }
   ]
 }

Plus the method-level change to authorization-type AWS_IAM.

After remediation:

Finding 1: UNSAT (patterns aligned at execute-api:/*).
Finding 2: UNSAT (aws:sourceVpce scopes to the specific endpoint).
Finding 3: UNSAT (IAM auth gates the request before the resource policy is consulted).

The pattern alignment is now structural — both statements use the same execute-api:/* — not an accident of the author choosing the same literal twice.

The prevention lesson

Three layers, in priority order:

Resource-policy template review. Whoever publishes infrastructure tutorials should include the alignment invariant ("Allow.Resource ⊆ Deny.Resource union with the inverse condition") as a checked property, not an implicit one. Z3 can check this automatically. Humans cannot reliably check it under deadline.

aws:sourceVpce is the scoping condition for private API restrictions, not aws:sourceVpc. Treat any AWS tutorial that uses the latter for "endpoint-specific" restriction as outdated, and update the policy when copy-pasting. The Vpc form admits every endpoint in the VPC — usually not what the author intended.

Authorization-type NONE is a deliberate choice, not a default. If the API serves authenticated traffic, use AWS_IAM. The resource policy then becomes a second layer, and the resource-policy alignment fragility above stops being a single point of failure.

Checklist

Allow and Deny statements in API Gateway resource policies use the same Resource pattern (or the Deny is broader)
VPC-restriction conditions use aws:sourceVpce (specific endpoint), not aws:sourceVpc (entire VPC)
Private REST APIs use --authorization-type AWS_IAM (or another non-NONE type) so the resource policy is a second layer behind authentication
Resource-policy templates run through formal verification at publish time; tutorials include the verification step in the walkthrough
Pre-merge CI runs the equivalent of this iteration's Z3 prover against the post-deploy observation snapshot

The article author's published configuration is currently safe. It works. The EC2 reaches the API. The intended VPC endpoint is the only path. As infrastructure documentation, the tutorial does its job. As a security artefact, it's one well-meant developer change away from an open API. Z3 can tell you that. Pattern matchers can't.

The example shipped with this article can be found at stave/examples/apigw-private-api-scoped-deny/ — two binaries side by side: a CEL foil that runs Stave's CTL.APIGATEWAY.NETWORK.PRIVATE.POLICY.001 control against the writeup, broadened-allow, and remediated configurations and reports COMPLIANT on all three (the existing control fires only when the private API has no VPC restriction at all), and a Z3 SAT prover that runs the four queries described in this article. The Z3 binary lives in a sibling Go module so its libz3 link stays out of Stave's main vendored tree. Stave detects this pattern and 31 other H1-grounded scenarios from local AWS configuration snapshots, with no cloud credentials.

DEV Community