The Query Is a Flashlight. The Eyes Are the Work.

#cpp #cybersecurity #security #tooling

The Query Is a Flashlight. The Eyes Are the Work.

Running a CodeQL query against a 3.5-year-stale Firefox database, and what reading the source taught me on top of the query output.

Twenty-one hits.

I wrote a CodeQL query targeting a single shape: a parent-process IPDL Recv* handler that assigns a content-controlled parameter to a member field with no preceding guard. Compiled it. Ran it against Mozilla's last public CodeQL database for Firefox, version 105, dated September 2022. Six minutes of evaluation. Twenty-one hits across dom/ipc/, gfx/, accessible/, ipc/glue/, netwerk/dns/. The query landed on real Firefox code on the first run.

I expected the interesting story to be the hits. It wasn't. The interesting story is what the query found, what it missed, and what reading the source taught me on top of it.

The query

The .ql file is short. It defines what a parent-side Recv* handler looks like (member function whose name starts with Recv, declared on a class whose name ends in Parent), defines what an unvalidated parameter-to-field assignment looks like (mField = aParam or mField = aParam.mInner, where mField is a member of the enclosing class), and defines what counts as a guard:

predicate isValidationCall(FunctionCall fc) {
  exists(string n | n = fc.getTarget().getName() |
    n = "Equals" or
    n = "EqualsLiteral" or
    n = "IsValid" or
    n.matches("IsValid%") or
    n.matches("Validate%") or
    n.matches("Check%") or
    n = "IPC_FAIL" or
    n = "IPC_FAIL_NO_REASON"
  )
}

predicate isGatingComparison(Expr e) {
  e instanceof ComparisonOperation or
  e instanceof BinaryBitwiseOperation
}

A handler is flagged when an assignment matching the shape exists and no validation call or gating comparison referencing the same parameter appears earlier in the function body. That's the whole detection logic. Roughly fifty lines including the predicates.

The shape I'm hunting is a known one in Mozilla's bounty history. A compromised content process can send arbitrary IPDL messages to the parent. If a parent-side Recv* handler stuffs a content-controlled value directly into a member field that gates a downstream privileged decision -- a sandbox bit, a mixed-content flag, a layer-tree id, an actor token -- then the trust boundary has just lifted whatever the child wanted into a region that's supposed to be authoritative. Mozilla has paid sec-moderate to sec-high for this exact shape multiple times. A small set of targets had already been enumerated by an earlier hand-reconnaissance pass against this codebase; the query's job was to widen that net mechanically across the rest of the parent-side IPC surface.

The Firefox 105 database is 3.5 years stale. Mozilla stopped publishing public CodeQL databases in 2022. But the IPC plumbing in dom/ipc/, ipc/glue/, and gfx/ has the same structural shape today as it did then -- handler classes still end in Parent, IPDL still emits Recv* methods, member fields still get assigned in handler bodies. The rows cross-port. I checked. For each FF 105 hit I went out to the current mozilla-firefox/firefox main branch via the GitHub API and pulled the present-day handler body. Some rows had been refactored away. Most were still there in essentially the same shape. A few looked, from the diff, like they had been patched between then and now.

That last category is what this post is about.

The case I expected to be a clean win

One of the twenty-one hits sat at gfx/ipc/CanvasManagerParent.cpp:119 in Firefox 105. The alert text:

Recv* handler 'RecvInitialize' assigns content-controlled parameter 'aId' to member field without validation. Review for IPC trust-boundary bug.

I triaged the row, marked it interesting, and moved on. Later, cross-referencing against the current mozilla-firefox/firefox main branch, I noted that the same handler at SHA 4272397b835a480b1be6cee142d0fa39e166dbc6 looked like this:

mozilla::ipc::IPCResult CanvasManagerParent::RecvInitialize(
    const uint32_t& aId) {
  if (!aId) {
    return IPC_FAIL(this, "invalid id");
  }
  if (mId) {
    return IPC_FAIL(this, "already initialized");
  }
  mId = aId;
  return IPC_OK();
}

Two IPC_FAIL guards in front of the assignment. My initial reading: Mozilla had patched the issue between FF 105 and main. The query found a real bug, fixed in the intervening years. Portfolio gold even with zero bounty.

Then I went back to the FF 105 source itself, extracted from the CodeQL database, to write up the contrast for this post. The handler at gfx/ipc/CanvasManagerParent.cpp lines 111-121, Firefox 105:

mozilla::ipc::IPCResult CanvasManagerParent::RecvInitialize(
    const uint32_t& aId) {
  if (!aId) {
    return IPC_FAIL(this, "invalid id");
  }
  if (mId) {
    return IPC_FAIL(this, "already initialized");
  }
  mId = aId;
  return IPC_OK();
}

Identical. Both guards already present. There was no patch. The validation was there in 2022 and is still there now.

The query had flagged a non-bug. My triage notes had said it was patched. Neither was true.

What reading the source actually taught me

The handler matched the query's "no validation" predicate even though the validation was right there on the line above the assignment. To see why, look at the predicate again:

predicate isValidationCall(FunctionCall fc) {
  exists(string n | n = fc.getTarget().getName() | ...)
}

predicate isGatingComparison(Expr e) {
  e instanceof ComparisonOperation or
  e instanceof BinaryBitwiseOperation
}

if (!aId) is a UnaryLogicalNotOperation. It is not a comparison. It is not a bitwise op. It is not a function call. The predicate has no clause for it.

IPC_FAIL(this, "invalid id") is a function-call shape, and the name IPC_FAIL does match isValidationCall. But the final clause of the hasValidationBefore predicate requires the guard to lexically mention the parameter being checked:

guard.getAChild*().(VariableAccess).getTarget() = param

IPC_FAIL(this, "invalid id") does not reference aId. It references this and a string literal. So the call alone, without the surrounding if (!aId) shape, does not connect back to the parameter. The predicate fails, and the assignment gets flagged.

That's the false positive in one sentence: the validation existed, but the guard's shape (a unary-not check, with the parameter reference in the condition rather than inside the failure call) wasn't in the query's vocabulary.

I caught this only because I went back to the source to write up a case study. If I had stopped at "main has guards, FF 105 is the SARIF row, therefore patched" -- which is what my own triage notes said -- I would have published a confident lie.

Flashlight, eyes, doorway

The query is a flashlight. It illuminates a rough neighborhood: handlers where a parameter lands in a field without a validation shape the query knows how to recognize. It does not understand semantics. It does not know which parameter matters. It does not know which member field is load-bearing. It does not know whether a guard the query failed to model is sitting one line above the flagged statement.

The eyes -- a person reading source, walking from the handler to the field's other use sites, checking whether the guard the query missed is doing the work the query was hunting for -- are where the actual investigation happens. The query is a starting point. It is not a finished investigation.

The CanvasManagerParent case is the cleanest possible illustration. The query said: this is suspicious. The eyes said: the validation is two lines above the assignment, written in a shape the query's predicate didn't model, the field is gated correctly, this is a false positive. Both steps were necessary. The query alone produced a wrong answer. Reading random source files without the query would have taken weeks to find this specific function on its own merits.

The toolsmith's value isn't the .ql file in isolation. It's the .ql file plus the disciplined source-walking it forces. The combination is reproducible. Either alone isn't.

There is a stronger version of this lesson hiding underneath. CodeQL's predicates are vocabulary. The vocabulary I picked for this query -- Comparison, BinaryBitwise, name-matched function calls -- is narrow on purpose. A wider vocabulary (logical-not, ternary expressions, early-return patterns, lookup-then-null-check shapes) would catch more guards and reduce false positives, at the cost of more query lines and slower evaluation. That tradeoff is the actual craft. Every query is a wager about which guard shapes are common enough to model and which are rare enough to absorb as noise. The CanvasManagerParent miss tells me my wager was wrong about unary-not. The next iteration of the query has one more predicate clause.

The same lesson shows up at the meta level. My triage's PATCHED classification was, in effect, my own brain's predicate firing on a shape match -- "current-main has guards, FF 105 SARIF row exists, therefore patched between." That heuristic is exactly as shape-narrow as the CodeQL predicate that produced the row in the first place, and it can be wrong in exactly the same way. The eyes only do their job if they actually open the older file and look. I almost didn't. I had a tidy story already -- query catches bug, Mozilla fixes bug, post writes itself -- and tidy stories are seductive. Going back to the FF 105 source was unrewarding right up until it was the entire post.

I want to be careful not to overclaim from one false positive. One miss isn't a methodology paper. But the shape of the miss generalizes: any time a static analysis predicate is narrower than the language's actual guard repertoire, a corresponding category of false positives exists, and only source reading distinguishes that category from real findings. CodeQL ships with mature standard libraries that handle a lot of this -- DataFlow, TaintTracking, GuardCondition -- and the right move for a maturing query is to lean on those instead of rolling shape-recognition predicates by hand. My version is the toy version. The next iteration moves toward the standard libraries. That's a query iteration I'll do after the rest of the rows are walked.

What's next

About fourteen of the twenty-one rows are still novel candidates against current main. Some of them will turn out to be false positives in the same way CanvasManagerParent was -- a guard the query couldn't see, validation that exists in a shape the predicate didn't recognize. Some of them will turn out to be already known to Mozilla and tracked elsewhere. Some will need more source-walking to confirm whether the parameter is load-bearing for any privileged decision downstream, or whether an attacker controlling it just gets to corrupt their own state.

A few might be real. I'm not promising. I'm running a triage discipline, one row at a time: read the FF 105 source, fetch the current-main version, walk the call graph, decide. If a row survives all four steps with the parameter still reaching a privileged sink unguarded, it becomes a PoC candidate. If it doesn't, it goes in the false-positives bucket and informs the next query iteration.

The point of putting the methodology in writing isn't that I have a result to sell. It's that the discipline -- query, then read, then verify, then either keep walking or close the row honestly -- is the actual artifact. Anyone can copy this for their own target codebase.

Close

Killing things teaches you more than running things. But running things wakes you up.

The query woke me up about CanvasManagerParent. Reading the source kept me from publishing a false claim about it. Both were necessary. Neither would have been enough alone.