Scanners find what's syntactically wrong. The interesting issues live in assumptions, and assumptions don't have signatures.
I spend a lot of time reading other people's code - open source projects, security audits, things I'm about to depend on. Not scanning, not fuzzing. Just reading it the way you'd read it if you were about to own it in production.
What I look for first
Entry points. Where does external input come in? That's where the trust boundary is, and it's where most assumptions start.
Data flow. Pick an input and follow it. Where does it get validated? Where does it get used without validation? How many function calls sit between "user gave me this" and "I'm using this in a query, file path, or shell command"? The longer that chain, the more likely something got dropped along the way.
Authorization boundaries. Not "does auth exist" but "is auth checked consistently?" A path protected in one subsystem but wide open in another. I've seen this more than any other class of bug - the developer who wrote the admin API added auth middleware, the developer who wrote the internal API assumed it was handled upstream. Both were reasonable. The combination was a vulnerability.
What scanners miss
Missing headers, outdated dependencies, known CVE patterns - scanners handle that fine. That's the baseline. The interesting issues live a layer deeper.
The parse-time operation nobody thought to bound
```go
func parseConfig(data []byte) (*Config, error) {
	var c Config
	if err := yaml.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	return &c, nil
}
```
Fine when the config is on disk and you wrote it. Becomes a problem the day someone accepts config over an API endpoint. YAML parsers have famously done interesting things with deeply nested structures, anchors, and aliases. The parse itself becomes the attack surface.
The code didn't change. The deployment did. The assumption "this input is trusted" was baked into the function and never re-examined when the caller changed.
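One cheap way to re-examine the assumption is to bound the input before the parser ever sees it. A sketch using only the standard library (JSON stands in for YAML so it runs without dependencies; the 1 MiB cap is an assumed value, and a size cap is only the first layer — alias and nesting limits are parser-specific):

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Assumed bound: large enough for any legitimate config, small enough
// that a hostile caller can't feed the parser a pathological document.
const maxConfigBytes = 1 << 20 // 1 MiB

type Config struct {
	Name string `json:"name"`
}

// parseConfigBounded rejects oversized input up front, so the check
// survives even if a future caller starts accepting config over an API.
func parseConfigBounded(data []byte) (*Config, error) {
	if len(data) > maxConfigBytes {
		return nil, errors.New("config too large")
	}
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	c, err := parseConfigBounded([]byte(`{"name":"prod"}`))
	fmt.Println(c.Name, err) // prod <nil>
}
```

The point isn't the specific limit — it's that the bound lives inside the function, so the "trusted input" assumption no longer depends on who calls it.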
Code correct at write-time, system grew around it
```go
// Added in 2019
func isInternalIP(addr string) bool {
	return strings.HasPrefix(addr, "10.") ||
		strings.HasPrefix(addr, "192.168.")
}
```
Correct for the 2019 network. IPv6 wasn't in scope, link-local addresses weren't relevant, nothing ran in 172.16/12. By 2023 the function is load-bearing in an SSRF defense, and it's trivially bypassable. Nobody wrote bad code. The code just stopped matching the system.
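A version that checks the address rather than its string prefix — a sketch using `net/netip`, not a complete SSRF defense (which also has to run after DNS resolution, not on the original hostname):

```go
package main

import (
	"fmt"
	"net/netip"
)

// isInternalIP parses the address and asks the question directly.
// IsPrivate covers 10/8, 172.16/12, 192.168/16, and IPv6 unique-local.
func isInternalIP(addr string) bool {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return false // unparseable input is not "internal"
	}
	return ip.IsPrivate() || ip.IsLoopback() || ip.IsLinkLocalUnicast()
}

func main() {
	fmt.Println(isInternalIP("172.16.0.1")) // true — missed by the prefix check
	fmt.Println(isInternalIP("::1"))        // true — IPv6 loopback, also missed
	fmt.Println(isInternalIP("8.8.8.8"))    // false
}
```

Parsing first also kills the string-matching bugs (a prefix check on `"10."` will happily match a hostname that merely starts with those characters).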
Partial validation - the last field nobody got to
Filed a report recently where a validation layer checked three of four fields. The fourth wasn't intentionally skipped - it was just the one nobody got to. But because everything around it was validated, the team assumed it was handled.
```go
switch err.Code {
case 400: return retry(req)
case 429: return backoff(req)
case 500: return retry(req)
case 503: return backoff(req)
// 502?
}
```
The fifth error code looks handled because everything around it is. Same with a test suite that covers two edge cases - the third feels tested because its neighbors are. Partial coverage gives you the wrong mental model, and wrong mental models are harder to fix than missing ones, because nobody's looking.
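Closing that gap usually means making the unknown case explicit. A sketch — whether 502 retries or backs off is a policy choice; the point is the `default`:

```go
package main

import "fmt"

type action int

const (
	actionRetry action = iota
	actionBackoff
	actionFail
)

// classify handles the codes we know and makes the unknown case loud,
// instead of letting it silently fall through the switch.
func classify(code int) action {
	switch code {
	case 400, 500:
		return actionRetry
	case 429, 502, 503:
		return actionBackoff
	default:
		return actionFail // unknown status: fail visibly, don't guess
	}
}

func main() {
	fmt.Println(classify(502) == actionBackoff) // true — the gap is closed
	fmt.Println(classify(418) == actionFail)    // true — unknowns surface
}
```

With a default in place, the next unlisted code becomes a visible failure instead of a silent one — the mental model and the code can't drift apart unnoticed.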
These aren't in any scanner's database because they're not patterns yet. They're the gap between what the developer intended and what the code actually does. You only see that gap if you understand what the code was trying to do in the first place.
The method is just debugging in reverse
It's the same skill as debugging production systems. Trace the data, find where reality diverges from the mental model, understand why.
When you debug, something broke and you're working backward to find the divergence. When you read code for security, you're working forward, looking for the divergence before it breaks. Same skill, different direction.
The developers I've seen do this well aren't security specialists. They're the ones who debug well - who read stack traces carefully, who ask "what state was the system in when this happened," who don't stop at the first explanation that sounds right.
What to read first in a new codebase
If I'm picking up a project I've never seen before:
- Routing / entry points - main.go, app.py, index.ts, wherever requests come in. This tells you the shape of the attack surface.
- Auth middleware - how it's applied, whether it's opt-in or opt-out. Opt-in means every new endpoint is unprotected by default. That's a design choice that compounds.
- Input parsing - especially anything that handles user-supplied structure (JSON, XML, YAML, archives). Unbounded parsing is the most common class of bug I find by reading.
- Error handlers - what gets logged, what gets returned to the user. Stack traces in error responses, internal paths in error messages, database errors passed through unfiltered.
- The oldest code - sort by last modified date, read what hasn't been touched in years. That's where the assumptions are most stale and the test coverage is thinnest.
None of this requires a security background. It requires reading code carefully and asking "what happens if the input isn't what the developer expected?"
That's debugging. You already know how to do it.