Static analysis is understanding your code before running it.
def add(a, b):
    return a + b

def main():
    return add(1, 3)
Above is a trivial program.
At a glance, you can tell that calling main() will return 4.
Here are some questions to ponder:
- What happens if you call add with non-numeric types? i.e., what does add("foo", "bar") return?
- How do you know the answer to the above?
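For reference, Python only answers these questions at runtime: the + operator dispatches on whatever types the operands happen to be, and a mismatch fails only when the call actually executes. A quick sketch:

```python
def add(a, b):
    return a + b

# + dispatches on the operand types at runtime:
print(add(1, 3))          # 4
print(add("foo", "bar"))  # 'foobar' -- string concatenation
print(add([1], [2]))      # [1, 2]   -- list concatenation

# Mixing unrelated types only fails when the call runs:
try:
    add(1, "bar")
except TypeError as e:
    print(e)  # unsupported operand type(s) for +: 'int' and 'str'
```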
Static analysis is about answering these questions without having to run the code.
As human programmers, we develop familiarity with the language and runtime.
This lets us answer such questions easily.
But how does a machine figure out the answer?
Programs have many representations (the human-friendly syntax being one of many).
Here are some other ways to represent the same Python program:
AST
Module(
body=[
FunctionDef(
name='add',
args=arguments(
args=[
arg(arg='a'),
arg(arg='b')]),
body=[
Return(
value=BinOp(
left=Name(id='a'),
op=Add(),
right=Name(id='b')))]),
FunctionDef(
name='main',
args=arguments(),
body=[
Return(
value=Call(
func=Name(id='add'),
args=[
Constant(value=1),
Constant(value=3)]))])])
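A dump like the one above comes from Python's built-in ast module (the exact fields shown vary by Python version; recent versions omit empty fields by default):

```python
import ast

source = """
def add(a, b):
    return a + b

def main():
    return add(1, 3)
"""

# Parse the source into an abstract syntax tree and pretty-print it.
tree = ast.parse(source)
print(ast.dump(tree, indent=4))
```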
Bytecode
Disassembly of <add>:
1 RESUME 0
2 LOAD_FAST 0 (a)
LOAD_FAST 1 (b)
BINARY_OP 0 (+)
RETURN_VALUE
Disassembly of <main>:
4 RESUME 0
5 LOAD_GLOBAL 1 (add)
LOAD_SMALL_INT 1
LOAD_SMALL_INT 3
CALL 2
RETURN_VALUE
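Likewise, the bytecode listing comes from the dis module. The exact opcodes vary by CPython version (LOAD_SMALL_INT, for instance, is a recent addition), but the shape is the same:

```python
import dis

def add(a, b):
    return a + b

def main():
    return add(1, 3)

# Disassemble each function's compiled code object.
dis.dis(add)
dis.dis(main)
```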
These representations are "closer to the machine".
They're the same program, but with details that are relevant to a compiler or CPU trying to execute the program.
Most human programmers never bother with these details.
They're frankly too low-level.
Not much work warrants looking at the AST or bytecode or compiled assembly of a program.
But!
There are some other representations of programs that perhaps all programmers should be aware of.
For example, here's a call-graph of our sample program:
example.py
    [D] add
    [D] main
main
    [U] add
[D] means defined in, and [U] means uses.
You read the above as add and main being defined in example.py, and add being used in main.
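A minimal version of this call-graph extraction can be built on the AST alone. This is only a sketch: it records direct calls by name and ignores methods, attributes, and anything resolved dynamically:

```python
import ast

source = """
def add(a, b):
    return a + b

def main():
    return add(1, 3)
"""

tree = ast.parse(source)

defined = []  # [D] entries: functions defined in this module
uses = {}     # [U] entries: function name -> names it calls

for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        defined.append(node.name)
        # Collect every direct call-by-name inside this function's body.
        uses[node.name] = [
            n.func.id
            for n in ast.walk(node)
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
        ]

print(defined)  # ['add', 'main']
print(uses)     # {'add': [], 'main': ['add']}
```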
Call graphs are very useful for understanding the chain of dependencies and the dataflow of your program.
In systems that grow to hundreds of thousands of lines, with data passed across several thousand functions and several hundred files, call graphs end up being quite useful!
Without actually running or profiling the code, you can answer: what are all the functions and files this data passes through?
If I had to refactor or rethink the approach of how this data flows, what are all the files and functions I'd need to touch?
Static analysis is a powerful tool for people working on very complicated systems who don't have the privilege of "just" rewriting them from scratch.
Many systems grow by standing on past foundations -- even if the foundations are quite shaky.
In 2026, though, we have LLMs.
Can't we just use LLMs for our refactors and rewrites now?
There is a sense that you can't really do "serious" engineering with LLMs.
Their context windows are too small, their tendency to produce slop too high.
I disagree with these sentiments entirely.
I truly believe in this idea of a "capability overhang" in models.
That is, naive prompting and context management nerfs a model's ability to perform quite dramatically.
It's like taking a competent engineer and saying: you only get to look at the code for one second, and then you must come up with solutions on demand, with no thinking and no external tools.
Clearly insane and naive!
Harnesses like Claude Code/Codex/Opencode make the experience better, giving models hands and environments to test their code and iterate, but they're still rather restricted by tool permissions and the patterns we encode in our prompts.
In a sense, a model+harness's results are limited by your own competency as an engineer.
How will you know to drive the model to better and better engineering practices if you don't know about them yourself?
Great engineers are not wizards.
Rather, they're good engineers who use great tools.
There's a variety of analyses you can do on a program without running or profiling it at all.
Call-graph analysis and control-flow analysis are two of them.
There's "low hanging fruit" in giving models access to these tools that senior+ engineers use to make changes in larger and more complex systems.
In my experience, giving models the ability to analyze codebases via static analysis enables them to more competently plan and execute larger scale refactors.
They're able to reason about the codebase the way a senior+ engineer would.
The pattern I'm experimenting with right now is building better static analysis tools for Python.
Due to the dynamic nature of the language (and the fact that most users of Python are not programmers themselves), the state of these tools is a bit immature compared to that of any C/LLVM-based language.
But it's great that LLMs are turbocharging the development here.
Already, I've found that LLMs have been able to refactor and come up with reasonable code improvements to a 10k LOC Python project I maintain, just by nature of being able to look at call graphs and control-flow graphs.
It seems to me that frontier models are probably as competent as a senior or staff level engineer, if given the right prompt and tools and ability to reason for long enough.
Getting this right will "just" be a matter of building tools and formats and representations that are very amenable to LLM reasoning.