DEV Community

Nidhi

Dead Code Finder: GitLab Orbit-based static analysis that turned out to be harder than expected

I built a GitLab Duo Agent Platform flow for a hackathon. The goal was simple: find code that nothing actually calls.

Not "what breaks if I delete this." That question already has a dozen tools. I wanted the narrower one: does anything call this at all?


What I built

A flow called Dead Code Finder. It queries GitLab Orbit's knowledge graph for incoming CALLS and IMPORTS edges on every Definition node in the project, then sorts findings into three buckets:

  • Confident: zero incoming edges, not behind a decorator, not an external entry point
  • Uncertain: ambiguous cases I can't fully resolve statically (inheritance, MRO dispatch)
  • Skipped: decorator-based dispatch, test framework reflection, hardware entry points. Explicitly flagged, not silently dropped.
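A minimal sketch of the bucketing rules, assuming each Definition node has already been reduced to edge counts plus a few flags. The Definition fields, the flag names, and the order of checks are hypothetical; the post doesn't show how the flow represents them, and the real procedure lives in SKILL.md.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Bucket(Enum):
    CONFIDENT = "Confident"
    UNCERTAIN = "Uncertain"
    SKIPPED = "Skipped"


@dataclass
class Definition:
    """Hypothetical summary of one Definition node and its incoming edges."""
    name: str
    incoming_calls: int = 0
    incoming_imports: int = 0
    decorators: list[str] = field(default_factory=list)
    is_test_method: bool = False         # found by test-runner reflection
    is_entry_point: bool = False         # e.g. a script's __main__ target
    inherited_by_subclass: bool = False  # may be reached through a subclass


def classify(d: Definition) -> tuple[Bucket, str] | None:
    """Return (bucket, reason), or None if the graph shows any reference."""
    if d.incoming_calls or d.incoming_imports:
        return None  # referenced somewhere: not a finding at all
    if d.decorators:
        return Bucket.SKIPPED, "decorator dispatch via " + ", ".join(d.decorators)
    if d.is_test_method:
        return Bucket.SKIPPED, "discovered by test framework reflection"
    if d.is_entry_point or d.inherited_by_subclass:
        return Bucket.UNCERTAIN, "may be reached outside the static call graph"
    return Bucket.CONFIDENT, "no reference found in the static call graph"
```

Checking for references first keeps every bucket limited to zero-reference candidates; the order of the later checks decides which reason wins when a definition matches more than one rule.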

One hard rule: never say "safe to delete." The report only says "no reference found in the static call graph, here is exactly what was checked."

It posts one report and stops. Never modifies files, never opens an MR.

The core logic lives in a SKILL.md that defines the Orbit traversal procedure step by step. The agent flow reads it, runs it, writes the report.


What I expected

Query the graph, check edge counts, flag anything with zero incoming CALLS or IMPORTS. Pretty mechanical.

I expected the hard part to be writing clean classification logic. It wasn't.


What actually happened

1. The platform didn't cooperate

Two things broke at runtime.

query_graph and get_graph_schema were declared in the flow's YAML config but weren't in the actual toolset when the flow ran. It wasn't a permissions issue: the graph itself was fully queryable. I confirmed this later by running the real SKILL.md procedure manually through the glab orbit remote query CLI and getting real CALLS/IMPORTS results back. The gap is specifically that certain foundational agents have a documented path to those tools, and custom flows in this environment don't.

Separately, skill injection was unreliable. Sometimes the flow received only the manifest entry for SKILL.md — name and description — instead of the actual procedure body.

So I did two things: inlined the full procedure directly in the system prompt as a guaranteed fallback, and built a file-based fallback mode for when the graph tools are missing. In fallback mode, the flow reads actual repo files using list_repository_tree, get_repository_file, find_files, and blob_search, applies the same heuristics, labels every finding [INFERRED], and opens the report with an explicit banner naming which tools were missing.

No pretending to have graph evidence when I didn't.

[Image: agent session log showing the fallback-mode banner]
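The fallback contract can be sketched roughly like this: compare the declared graph tools against what the runtime actually exposes, emit a banner naming the gap, and tag every file-based finding. The two tool names come from the post; the function names and banner wording are hypothetical.

```python
from __future__ import annotations

# Graph tools the flow's YAML config declares (names taken from the post).
GRAPH_TOOLS = ("query_graph", "get_graph_schema")


def missing_graph_tools(available_tools: set[str]) -> list[str]:
    """List declared graph tools that the runtime did not actually provide."""
    return [tool for tool in GRAPH_TOOLS if tool not in available_tools]


def report_banner(available_tools: set[str]) -> str | None:
    """Return a fallback-mode banner, or None when all graph tools are present."""
    missing = missing_graph_tools(available_tools)
    if not missing:
        return None
    return (
        "FALLBACK MODE: missing tools: " + ", ".join(missing)
        + ". Findings are inferred from repository files, not graph edges."
    )


def format_finding(name: str, path: str, line: int, inferred: bool) -> str:
    """Tag file-based findings so they are never mistaken for graph evidence."""
    prefix = "[INFERRED] " if inferred else ""
    return f"{prefix}{name} ({path}:{line})"
```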

2. Static analysis is messier than it looks

A few cases a naive edge-count check gets wrong:

__init__, __enter__, __exit__, __del__

These almost always look dead in a naive check. When you instantiate a class, Orbit registers a CALLS edge to the class, not to __init__, so __init__ shows zero incoming edges even when the class is instantiated dozens of times. __enter__, __exit__, and __del__ have the same problem: the runtime invokes them implicitly (through a with statement or object finalization), so no source line calls them by name. I confirmed this against a real ~500-line SDK class (EphemeralAgentExecutor) with a full unittest suite. Changing the check to look at the enclosing class's incoming edges instead fixed the false positives.
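One way to express the constructor correction, assuming incoming edge counts are available keyed by a dotted qualified name such as Executor.__init__ (a hypothetical key format, not necessarily Orbit's): lifecycle dunders borrow the incoming edges of their enclosing class.

```python
from __future__ import annotations

# Methods the runtime invokes implicitly rather than by a literal call.
LIFECYCLE_DUNDERS = {"__init__", "__enter__", "__exit__", "__del__"}


def effective_incoming(qualname: str, incoming: dict[str, int]) -> int:
    """Count incoming edges, crediting lifecycle dunders with their class's edges.

    `incoming` maps a hypothetical dotted qualified name to its edge count.
    """
    owner, _, member = qualname.rpartition(".")
    direct = incoming.get(qualname, 0)
    if owner and member in LIFECYCLE_DUNDERS:
        # Instantiating the class is what actually runs these methods.
        return direct + incoming.get(owner, 0)
    return direct
```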

Decorator-based dispatch

A function registered into a dict via a decorator and called later by string-key lookup is structurally indistinguishable from dead code in a static call graph. There is no literal call in source. These go in Skipped with the actual reason stated.
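A small illustration of the pattern. The registry, decorator, and handler names are hypothetical; the point is that shout() is only ever reached through HANDLERS[command], so a static graph has no CALLS edge into it.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical registry: handlers are stored by string key, not called by name.
HANDLERS: dict[str, Callable[[str], str]] = {}


def register(key: str) -> Callable[[Callable[[str], str]], Callable[[str], str]]:
    """Decorator factory that records a function under `key`."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        HANDLERS[key] = fn
        return fn
    return decorator


@register("shout")
def shout(text: str) -> str:
    # Nothing in the source ever writes shout(...), so the static graph
    # has no CALLS edge pointing here.
    return text.upper()


def dispatch(command: str, text: str) -> str:
    # The only path to shout() is this runtime lookup on a string key.
    return HANDLERS[command](text)


print(dispatch("shout", "hello"))  # prints "HELLO"
```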

Test framework reflection

unittest.TestCase methods are discovered through reflection over class attributes, not through any literal call anywhere. The test runner finds them at runtime in a way the static graph can't see. Same bucket, same reasoning.
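The standard library makes the reflection visible: unittest.TestLoader.getTestCaseNames returns the method names the loader will run, found by scanning the class's attribute names for the test prefix rather than by following any call. The test class here is a hypothetical example.

```python
import unittest


class TestGreeting(unittest.TestCase):
    def test_upper(self):
        self.assertEqual("hi".upper(), "HI")

    def helper(self):
        # No "test" prefix, so the loader ignores this method.
        return "not a test"


# The loader matches attribute names against TestLoader.testMethodPrefix
# ("test" by default); no source line ever calls test_upper().
names = unittest.TestLoader().getTestCaseNames(TestGreeting)
print(names)  # ['test_upper']
```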

Inheritance and MRO

A method only reachable through a subclass that doesn't override it might show no direct incoming edges on the base class method itself. These go in Uncertain rather than flagged dead.
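A sketch of the inheritance case, using the same Base.greet / Child.run shape as the validation fixtures (the method bodies are invented). A checker that attributes self.greet() in Child.run to Child may find no edge on Base.greet; walking the class's MRO is one way to resolve which class actually defines the method. The helper defining_class is hypothetical.

```python
from typing import Optional


class Base:
    def greet(self) -> str:
        return "hello"


class Child(Base):
    # Child does not override greet(), so self.greet() resolves to Base.greet.
    def run(self) -> str:
        return self.greet()


def defining_class(cls: type, attr: str) -> Optional[type]:
    """Walk the MRO to find the class whose own namespace defines `attr`."""
    for klass in cls.__mro__:
        if attr in vars(klass):
            return klass
    return None


print(defining_class(Child, "greet").__name__)  # Base
```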


Validation

I built test fixtures specifically to hit the tricky cases: plain unused functions, cross-file import resolution, inheritance/MRO, decorator dispatch, constructor/dunder handling, and unittest reflection.

Two live runs in fallback mode identified every planted dead-code fixture with cited file and line evidence, excluded the decorator and test-discovery cases with a stated reason for each, and downgraded the ambiguous inheritance and script-entry-point cases to Uncertain.

Then I ran the real Orbit traversal procedure manually against the live graph via CLI to check whether fallback mode's file-based guesses matched actual graph data.

Results:

  • totally_unused_helper — zero incoming CALLS/IMPORTS edges. Actually dead.
  • cross_file_dead_helper — zero incoming edges. Actually dead.
  • undecorated_dead_function — zero incoming edges. Actually dead.
  • Base.greet — real incoming edge from Child.run. Reachable via inheritance. Fallback mode correctly put this in Uncertain rather than flagging it dead.
  • summarize_text — alive via script entry point. Correctly Uncertain.
  • __init__, __exit__, __del__ — zero direct incoming edges, with real usage only on the enclosing class. Constructor correction validated.

The real graph also surfaced something fallback mode couldn't confirm: add_temp_file and issue_credential show zero incoming CALLS edges. Potential genuine dead code that file-reading alone couldn't settle.

Every fallback finding held up against the real graph.


What I'd do differently

The procedure logic ended up being the reliable part. The platform integration was the unreliable part, and I didn't fully account for that up front.

I'd start by confirming which tools are actually available in the runtime before writing any logic that depends on them. Building the fallback mode wasn't hard, but it would have been cleaner to design for both paths from the start.


What's still open

  • Wire the real Orbit tools into the flow (no documented path for custom flows yet)
  • Extend validation past Python
  • Transitive dead code detection (code only referenced by other dead code)
  • References from non-code files: YAML, CI templates, cron jobs
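Transitive detection is essentially a reachability problem: start from definitions assumed live and walk outgoing CALLS/IMPORTS edges; anything never reached is dead, even if it has incoming edges from other dead code. A minimal sketch, assuming the graph has been exported as a caller-to-callee mapping; the function and definition names are hypothetical, and picking the roots is where the Skipped and Uncertain cases would come back in.

```python
from __future__ import annotations

from collections import deque


def transitively_dead(edges: dict[str, set[str]], roots: set[str]) -> set[str]:
    """Return definitions unreachable from any root.

    `edges` maps each definition to the definitions it CALLS or IMPORTS.
    `roots` are definitions assumed live (entry points, tests, registered
    handlers); choosing them is the hard, assumption-laden part.
    """
    nodes = set(edges).union(*edges.values()) | set(roots)
    live: set[str] = set()
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if node in live:
            continue
        live.add(node)
        queue.extend(edges.get(node, ()))
    return nodes - live


graph = {
    "main": {"load"},
    "load": {"parse"},
    "old_export": {"old_format"},  # old_format's only caller is itself dead
}
print(sorted(transitively_dead(graph, {"main"})))  # ['old_export', 'old_format']
```

A plain zero-incoming-edges check would flag only old_export here; old_format has an incoming edge, just not from anything live.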

The thing I actually learned

A report that says "I don't know" in the cases where it genuinely doesn't know is more useful than one that sounds confident everywhere and is occasionally wrong.

That ended up applying to the platform too. Once I confirmed the Orbit tools weren't available, the right call was to say so loudly in the report, not mask it with a fallback that looked like the real thing.

The three-bucket output came from the same instinct. Most dead-code tools give you a flat list. The flat list trains you to ignore it once it's wrong a few times. Labeling the uncertainty explicitly is what makes the confident findings actually worth acting on.
