Second in a series on using AI to review authorization — not to spray reports.
Companion reference: AuthZ Smell Catalog.
1. Why AuthZ review is not vulnerability spraying
The cheapest thing an AI can do in security is generate suspicion. Point a model at a
codebase and it will hand you fifty "possible IDORs" before you finish your coffee. Almost
all of them are wrong — guarded three lines up, scoped at the data layer, or protected at a
boundary the model never saw.
That flood is exactly why several bug bounty programs spent 2026 tightening their rules or pausing submissions:
they were drowning in confident, plausible, wrong reports.
So this review inverts the usual loop. The AI's job is not to find bugs — it is to
over-generate hypotheses cheaply. My job is to kill them. What survives that killing
is the only thing worth a human's time, and the record of what died is more useful than the
record of what lived.
The artifact of an honest review is therefore not a finding. It's a kill table.
2. Target and scope
- Target: Ory Kratos — an open-source identity and user-management server (login, registration, recovery, verification, sessions, self-service settings). Source-available, Apache-2.0.
- Why Kratos: it is exactly the shape where authorization goes wrong — multiple identities, a public API and an admin API, and (in Ory's hosted product) multi-tenancy. If a boundary is fragile, this is where it shows.
- Scope of this write-up: source reading only, on the public repository, single-tenant OSS build. No hosted target was touched. Nothing here is an undisclosed finding: the point is the method and the boundary design, and where relevant, how the design held against the hypotheses I tested. This maps to the reproduction tiers we track: everything below is repo_only, and I say so explicitly rather than implying it reaches a live product.
What this review does and does not claim. In this limited, repo-only review, the
hypotheses I tested were killed. This is not a claim that Kratos has no vulnerabilities,
and it is not a security audit. It is a case study in how AI-assisted AuthZ review can
avoid false positives — how to kill a suspicion instead of shipping it.
3. What the AI suspected
I let cheap finders over-generate against the AuthZ Smell Catalog. The raw candidate
list, unfiltered:
- H1: Admin API has no authorization in code. Identity CRUD, session deletion, and courier endpoints on the admin API have no per-request permission check in the handler. (Catalog §01 object-id substitution, §13 middleware-only authorization.)
- H2: Cross-identity / cross-tenant read. Can one identity read another's data via /sessions/whoami, identity lookups, or an admin identity fetch? (Catalog §04 tenant id from client input, §05 cross-tenant leak via related object.)
- H3: Recovery / verification token reuse. Are one-time tokens actually one-time? (Catalog §09 invite/share token reuse.)
- H4: Settings-flow identity confusion. Can the self-service settings flow be pointed at another identity's traits? (Catalog §02 ownership vs reachability, §07 stale privilege.)
- H5: Tenant assignment from payload. Can an admin-create/update set an arbitrary network id and cross the tenant boundary? (Catalog §04.)
Five confident hypotheses. This is the part AI is good at. Now the part it can't do.
4. How we tried to kill each hypothesis
The rule: assume each is by-design until a concrete test says otherwise, and default to
killing it. For source-only review, the "test" is: can I trace a user-reachable path where
the invariant actually breaks — not just a handler that looks unguarded?
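The rule above can be sketched as a small triage function. This is an illustration only, not tooling from the review: the `Hypothesis` record, the `Verdict` names, and the `reachable_path` field are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    KILLED = "killed"      # the default outcome
    SURVIVES = "survives"  # only with a traced, user-reachable path


@dataclass
class Hypothesis:
    ident: str
    claim: str
    # Hypothetical field: a traced, user-reachable path where the invariant
    # actually breaks. None means no such path was demonstrated.
    reachable_path: Optional[str] = None


def triage(h: Hypothesis) -> Verdict:
    """Default to KILLED; a handler that merely looks unguarded is not evidence."""
    if h.reachable_path:
        return Verdict.SURVIVES
    return Verdict.KILLED


h1 = Hypothesis("H1", "Admin API has no authorization in code")
print(h1.ident, triage(h1).value)
```

The design choice is that suspicion alone never changes the verdict; only a concrete path does.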
H1 — Admin API "missing" authorization → KILLED (by design).
Kratos deliberately ships the admin API with no built-in authorization. Ory's own
documentation states the admin API must be protected at the network boundary (ingress, a
reverse proxy, Oathkeeper) and never exposed publicly. So "no authz check in the handler" is
not a missing guard — it is the guard living one layer out, exactly the false-positive shape
in Catalog §13 (middleware/deployment-layer authorization). A report of "admin API allows
identity CRUD without auth" is by-design and would be closed as such. Killed.
H2 — Cross-identity / cross-tenant read → KILLED (chokepoint design).
This is the interesting one. Kratos does not scatter tenant checks across handlers. Its
persistence layer runs every query through a network Contextualizer that injects the
network id (nid) into the SQL — the data-access layer itself filters by tenant, centrally.
A handler cannot accidentally read across the boundary, because the boundary is enforced
below the handler, at the one place every read funnels through. On the public API, identity
access is derived from the session's identity, never from a client-supplied id. To break
H2 you would have to find a read path that bypasses the persister entirely — and I found no
user-reachable one in this build. Killed. And worth noting as a pattern: concentrating the
tenant filter at the data-access layer collapses the whole class into a single auditable
point — which is why these particular hypotheses died here (Catalog §B).
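To make the chokepoint idea concrete, here is a minimal sketch of a data-access layer that appends the tenant filter to every read. It is not Kratos code (Kratos is written in Go); the `ScopedPersister` class, the table layout, and the network names are assumptions made for illustration.

```python
import sqlite3


class ScopedPersister:
    """Hypothetical data-access layer: every read is filtered by the network id."""

    def __init__(self, conn: sqlite3.Connection, nid: str):
        self._conn = conn
        self._nid = nid  # taken from server-side context, never from the request

    def get_identity(self, identity_id: str):
        # The tenant filter is added here, once, for every caller.
        return self._conn.execute(
            "SELECT id, nid, traits FROM identities WHERE id = ? AND nid = ?",
            (identity_id, self._nid),
        ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE identities (id TEXT PRIMARY KEY, nid TEXT, traits TEXT)")
conn.executemany(
    "INSERT INTO identities VALUES (?, ?, ?)",
    [("alice", "net-a", "{}"), ("bob", "net-b", "{}")],
)

tenant_a = ScopedPersister(conn, nid="net-a")
print(tenant_a.get_identity("alice"))  # found: same network
print(tenant_a.get_identity("bob"))    # None: the row exists, but in another network
```

A handler built on this layer has no way to forget the tenant check, which is why the review question moves from the handlers to the paths that skip the layer.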
H3 — Token reuse → KILLED.
Recovery and verification tokens are single-use and time-boxed; redemption invalidates the
token in the same transaction. Replay after use fails. Killed.
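One common way to get single-use, time-boxed semantics is a conditional update inside one transaction. The sketch below assumes a hypothetical `tokens` table and `redeem` helper; it shows the invariant being tested, not how Kratos implements it.

```python
import sqlite3
import time


def redeem(conn: sqlite3.Connection, token: str, now: float) -> bool:
    """Consume a token atomically; return True only on the first valid use."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "UPDATE tokens SET used = 1 "
            "WHERE token = ? AND used = 0 AND expires_at > ?",
            (token, now),
        )
        # rowcount is 1 only if the token existed, was unused, and had not expired.
        return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (token TEXT PRIMARY KEY, used INTEGER, expires_at REAL)")
now = time.time()
conn.execute("INSERT INTO tokens VALUES (?, 0, ?)", ("tok-1", now + 60))
conn.execute("INSERT INTO tokens VALUES (?, 0, ?)", ("tok-old", now - 60))

print(redeem(conn, "tok-1", now))    # first use
print(redeem(conn, "tok-1", now))    # replay of the same token
print(redeem(conn, "tok-old", now))  # expired token
```

Because the check and the invalidation are the same statement, there is no window between "is it valid?" and "mark it used" for a replay to slip through.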
H4 — Settings-flow identity confusion → KILLED.
The settings flow binds to the identity resolved from the authenticated session. The identity
being modified is not taken from client input, so you cannot retarget the flow at someone
else's traits. Killed (Catalog §02 — read-reachability is not write-reachability, and here
even read is session-bound).
H5 — Tenant from payload → KILLED.
The network id is derived from context, not from the request body. An admin create/update
cannot smuggle a foreign nid. Killed.
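H4 and H5 die for the same structural reason: the target identity and the network id come from server-side context, and the request body is never consulted for them. A minimal sketch under that assumption follows; `RequestContext`, `update_traits`, and the dict-backed store are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    # Both values are resolved server-side (session lookup, deployment context).
    session_identity_id: str
    nid: str


def update_traits(ctx: RequestContext, body: dict, store: dict) -> tuple:
    """Apply a settings update to the session's identity in the context's network."""
    # Client-supplied "identity_id" and "nid" keys are deliberately never read.
    key = (ctx.nid, ctx.session_identity_id)
    store[key] = dict(body.get("traits", {}))
    return key


store = {
    ("net-a", "alice"): {"email": "alice@example.com"},
    ("net-b", "bob"): {"email": "bob@example.com"},
}
ctx = RequestContext(session_identity_id="alice", nid="net-a")

# A hostile body that tries to retarget the write at bob in another network.
hostile = {"identity_id": "bob", "nid": "net-b", "traits": {"email": "evil@example.com"}}
print(update_traits(ctx, hostile, store))  # the key written is alice's, in net-a
print(store[("net-b", "bob")])             # bob's record is untouched
```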
5. The kill table
The deliverable of the whole review, on one screen:
| # | Hypothesis | Catalog | Verdict | Why it died |
|---|---|---|---|---|
| H1 | Admin API missing authz | §01, §13 | by-design | authz is at the network boundary, not the handler — documented |
| H2 | Cross-identity / cross-tenant read | §04, §05 | defended | nid enforced at the persister via the Contextualizer; public reads are session-bound |
| H3 | Recovery/verification token reuse | §09 | defended | single-use, time-boxed, invalidated on redemption |
| H4 | Settings-flow identity confusion | §02, §07 | defended | flow bound to the session identity, not client input |
| H5 | Tenant assignment from payload | §04 | defended | nid from context, not request body |
Five hypotheses in. Zero findings out. This is a successful review, not a failed one —
and to be exact, it is a successful review of five hypotheses, not a clean bill of health
for Kratos.
6. False-positive patterns worth naming
Two shapes here generalize far beyond Kratos:
- Deployment-layer authorization reads as "missing" authorization. An endpoint with no in-code check is a red flag only if nothing outside the code guards it. Admin planes, internal services, and "protect this at the ingress" designs all trip a naive AI. Always ask "where else could the guard live?" before reporting (Catalog §13).
- A chokepoint boundary looks unguarded per-handler. When the real boundary is enforced at a single data-access layer, every individual handler looks naked. The correct review question flips from "does this handler check?" to "can anything reach the data without passing the chokepoint?" (Catalog §B, §12 alternate-path).
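The flipped question ("can anything reach the data without passing the chokepoint?") can be approximated with a crude source scan. The sketch below is a heuristic under stated assumptions: the regex markers, the file paths, and the `store/` prefix are hypothetical and would need tuning per codebase. Its hits are candidates to kill, not findings.

```python
import re

# Hypothetical markers for "talks to the database directly"; tune per codebase.
RAW_QUERY = re.compile(r"\bRawQuery\b|\bdb\.(?:Exec|Query)\w*")


def bypass_candidates(sources: dict, chokepoint_prefix: str) -> list:
    """Return paths outside the chokepoint package that appear to issue raw queries."""
    hits = []
    for path, text in sorted(sources.items()):
        if path.startswith(chokepoint_prefix):
            continue  # queries inside the chokepoint are expected
        if RAW_QUERY.search(text):
            hits.append(path)
    return hits


# Toy input: file path -> file contents (all names are made up).
sources = {
    "store/identity.go": 'c.RawQuery("SELECT id FROM identities WHERE nid = ?")',
    "jobs/cleanup.go": 'db.Exec("DELETE FROM sessions")',
    "handlers/whoami.go": "p.GetSession(ctx, id)",
}
print(bypass_candidates(sources, "store/"))
```

Each hit still has to go through the same default-to-kill loop: a background job that issues a raw query is only interesting if a user can influence what it reads or writes.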
7. What survived and what did not
None of the hypotheses I tested survived source-only review — and the reason is worth
publishing: Kratos concentrates its tenant boundary in one place (the persister's
Contextualizer) and derives identity from the session rather than from client input. That
design choice is precisely what made four of my five hypotheses collapse to one question, and
that question had a clean answer.
If I were to keep going, the only honest next move would be to enumerate every ingress that
could reach persisted data without the persister: background jobs, imports, any raw query.
In the OSS build I found no user-reachable one. That negative result is real signal, and it
is tier repo_only: I am not claiming it holds against any specific hosted deployment.
8. Lessons for AI-assisted security review
- Let AI over-generate; make humans the kill switch. The value is in the rejection, not the suspicion.
- Default to by-design. Assume the boundary is intentional until a concrete, reachable path proves otherwise. This single bias would have prevented most of 2026's report-spam.
- Find the chokepoint first. If a codebase enforces a boundary centrally, the review collapses to "is the chokepoint ever bypassed?" — a far smaller, far more honest question.
- Name your reproduction tier out loud. repo_only is not hosted_confirmed. Say which one you have. Conflating them is how OSS reading turns into a false bounty claim.
- Publish the kill table, not just the win. The rejections are the portfolio.
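One way to keep the tier from being conflated is to make it a typed value that every claim must carry. A minimal sketch: only the two tier names that appear in this post are modeled, and the `ReproTier` enum and `may_claim_live_impact` helper are hypothetical.

```python
from enum import Enum


class ReproTier(Enum):
    REPO_ONLY = "repo_only"
    HOSTED_CONFIRMED = "hosted_confirmed"


def may_claim_live_impact(tier: ReproTier) -> bool:
    """Only a hosted confirmation supports a claim about a live product."""
    return tier is ReproTier.HOSTED_CONFIRMED


print(may_claim_live_impact(ReproTier("repo_only")))
```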
9. How this feeds the AuthZ Smell Catalog
Each kill sharpened a catalog entry's confirm/kill column — the column that separates a real
bug from a by-design behavior:
- §13 gained the deployment-layer-authz false positive (from H1).
- §B / §04 gained the chokepoint-Contextualizer confirm test: trace the response fields, and ask whether any read path skips the central scoper (from H2).
- §09 gained the single-use-token replay test (from H3).
The catalog is not a static list; every real outcome — even a clean by-design result — feeds a
sharper kill test back into it. That feedback loop is the asset, not the entry count.
10. Next step: from OSS case study to one real bounty target
This review produces exactly one row for the outcome ledger — the honest kind, a defended
target:
date=2026-07-04, program=Ory Kratos (self-directed OSS review), source_type=oss_source_available,
class=tenant_boundary, repro_tier=repo_only, human_verdict=by_design, final_status=not_applicable,
payout_usd=0, lesson="Contextualizer/nid chokepoint concentrates the tenant boundary; admin-API
authz is deployment-layer by design — both are KILLs, not bugs. Review collapses to: can anything
reach persisted data without the persister?"
Row #1 in a ledger is not supposed to be a payout. It's supposed to be true. From here, the
next step is a single source-available target with a newly-added permission boundary
(a fresh RBAC, workspace, billing, or SSO/SCIM feature) — the un-picked-over surface — run
through the same over-generate-then-kill loop, and logged as ledger row #2. One target. Not ten.
I use AI to reject candidates and humans to verify the few that survive. If that approach is
useful to you, the AuthZ Smell Catalog is the companion reference this series builds on.