k38f

Posted on Apr 17 • Edited on Apr 22

How I Built a Static Analyzer for Python Env Variables in ~700 Lines

#webdev #python #tutorial #devops

I keep forgetting to add new env vars to .env.production before deploying. A few times it was harmless, a few times it was not. At some point it was easier to write a tool than to fix myself.

The tool is envsleuth. It reads your Python source, finds every os.getenv() / os.environ[] / os.environ.get(), and compares that list with your .env file. Pretty boring on the outside, but building it was a nice excuse to dig into the ast module. That's what this post is about.

What grep gets you

The first version wasn't a tool, it was just:

grep -rn "os.getenv" src/

This gets you through maybe a week. Then you notice it misses os.environ["X"]. And os.environ.get("X"). And it matches things that aren't env reads at all: comments, string literals, methods on unrelated objects whose call happens to end in os.getenv. And it has no way to tell os.getenv("X") apart from os.getenv("X", "fallback"), which matters because the second one has a default and is therefore optional.

At that point you either write a smarter regex (don't) or start looking at ast.

What ast actually is

ast.parse(source) takes Python code and gives you back a tree. Every statement and expression in your program becomes a nested structure of nodes: Module, FunctionDef, Call, Attribute, Name, Constant, and so on.

For os.getenv("DATABASE_URL") you get roughly:

Call(
  func=Attribute(
    value=Name(id='os'),
    attr='getenv'),
  args=[Constant(value='DATABASE_URL')])

Once you see it in this shape, the matching logic writes itself. "Is this a Call? Is the function an Attribute called getenv? Is it an attribute of a Name called os? Yes? Take args[0]."
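If you want to see that shape for yourself, here is a minimal sketch that parses the one-liner and dumps the Call node. It assumes Python 3.9 or newer, because the indent argument of ast.dump was added in that release. The real dump is a bit noisier than the outline above (it includes ctx and keywords fields).

```python
import ast

# Parse a single expression statement and pull out the Call node.
tree = ast.parse('os.getenv("DATABASE_URL")')
call = tree.body[0].value  # Module -> Expr statement -> Call expression

# Pretty-print the subtree (indent= needs Python 3.9+).
print(ast.dump(call, indent=2))

# The same questions the matcher asks, spelled out one by one.
print(type(call).__name__)    # node type of the expression
print(call.func.attr)         # attribute being called
print(call.func.value.id)     # name the attribute hangs off
print(call.args[0].value)     # first positional argument
```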

import ast

class EnvVisitor(ast.NodeVisitor):
    def __init__(self):
        self.names = []

    def visit_Call(self, node):
        if (isinstance(node.func, ast.Attribute)
                and node.func.attr == "getenv"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "os"):
            if node.args and isinstance(node.args[0], ast.Constant):
                self.names.append(node.args[0].value)
        self.generic_visit(node)

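That last line, generic_visit, is the one I forgot on the first try. A NodeVisitor stops descending at any node type you give a visit_* method, unless that method calls generic_visit itself. Skip it in visit_Call and a read wrapped in another call, like int(os.getenv("PORT")), is never visited; skip it in a handler for a container node such as FunctionDef and everything inside is lost. A whole project returned zero results before I figured that out.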
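Here is a small sketch of what goes missing when visit_Call skips generic_visit. The class names Shallow and Deep, the helper is_os_getenv and the sample source are made up for this illustration. With only visit_Call overridden, the part that gets lost is whatever sits inside a Call node, so the read wrapped in int(...) disappears while the bare one is still found.

```python
import ast

SOURCE = '''
import os

def load():
    debug = os.getenv("DEBUG")
    port = int(os.getenv("PORT"))
    return debug, port
'''

def is_os_getenv(node):
    """True for a Call of the literal form os.getenv(...)."""
    return (isinstance(node.func, ast.Attribute)
            and node.func.attr == "getenv"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "os")

class Shallow(ast.NodeVisitor):
    """visit_Call without generic_visit: children of a Call are never visited."""
    def __init__(self):
        self.names = []

    def visit_Call(self, node):
        if is_os_getenv(node) and node.args and isinstance(node.args[0], ast.Constant):
            self.names.append(node.args[0].value)

class Deep(Shallow):
    """Same matcher, but keeps walking into the Call's children."""
    def visit_Call(self, node):
        super().visit_Call(node)
        self.generic_visit(node)

def scan(visitor_cls):
    visitor = visitor_cls()
    visitor.visit(ast.parse(SOURCE))
    return visitor.names

print(scan(Shallow))
print(scan(Deep))
```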

Three patterns

Python has three idiomatic ways to read env vars:

os.getenv("X")        # Call on Attribute
os.environ["X"]       # Subscript
os.environ.get("X")   # Call on nested Attribute

Each is a different AST shape, so each needs its own matcher. The nested one (os.environ.get) is the most fun — it's a Call whose func is an Attribute whose value is another Attribute. Nesting levels matter.

def _is_environ_get(self, node):
    func = node.func
    if not isinstance(func, ast.Attribute) or func.attr != "get":
        return False
    inner = func.value
    return (isinstance(inner, ast.Attribute)
            and inner.attr == "environ"
            and isinstance(inner.value, ast.Name)
            and inner.value.id == "os")
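A matcher like this can also answer the required-versus-optional question from the grep section. The sketch below is an assumption about how that could be done, not a copy of what envsleuth does: it uses a standalone version of the helper, a hypothetical classify function, and treats any second positional argument or keyword argument as "has a default".

```python
import ast

def is_environ_get(node):
    """True for a Call of the literal form os.environ.get(...)."""
    func = node.func
    if not isinstance(func, ast.Attribute) or func.attr != "get":
        return False
    inner = func.value
    return (isinstance(inner, ast.Attribute)
            and inner.attr == "environ"
            and isinstance(inner.value, ast.Name)
            and inner.value.id == "os")

def classify(source):
    """Map each literal name read via os.environ.get to 'required' or 'optional'."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and is_environ_get(node) and node.args:
            first = node.args[0]
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                # A second argument (positional or keyword) means a fallback exists.
                has_default = len(node.args) > 1 or bool(node.keywords)
                result[first.value] = "optional" if has_default else "required"
    return result

SAMPLE = '''
import os
a = os.environ.get("A")
b = os.environ.get("B", "fallback")
c = cfg.get("C")
'''
print(classify(SAMPLE))
```

The cfg.get("C") line is there to show that an unrelated .get call falls through the matcher.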

For the Subscript case (os.environ["X"]) you override visit_Subscript instead of visit_Call. Same idea, different node type.
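A minimal sketch of that Subscript matcher, under two assumptions that are not from the tool itself. First, it targets Python 3.9+, where node.slice is the index expression directly (3.8 wraps it in ast.Index). Second, it only counts reads, so an assignment like os.environ["TMP"] = "x" is skipped by checking the node's ctx. Whether writes should count is a judgment call. The class name SubscriptVisitor is hypothetical.

```python
import ast

class SubscriptVisitor(ast.NodeVisitor):
    def __init__(self):
        self.names = []

    def visit_Subscript(self, node):
        target = node.value
        if (isinstance(target, ast.Attribute)
                and target.attr == "environ"
                and isinstance(target.value, ast.Name)
                and target.value.id == "os"
                and isinstance(node.ctx, ast.Load)):  # reads only, not assignments
            key = node.slice  # Python 3.9+: the index expression itself
            if isinstance(key, ast.Constant) and isinstance(key.value, str):
                self.names.append(key.value)
        self.generic_visit(node)

SAMPLE = '''
import os
url = os.environ["DATABASE_URL"]
os.environ["TMP"] = "x"
row = data["id"]
'''

visitor = SubscriptVisitor()
visitor.visit(ast.parse(SAMPLE))
print(visitor.names)
```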

Aliases

This is where the naive version breaks in practice. All four of these mean the same thing:

import os
os.getenv("X")

import os as sys_env
sys_env.getenv("X")

from os import getenv
getenv("X")

from os import getenv as ge
ge("X")

If you hardcode "os" in your matcher, you only catch the first one. Real codebases have the other three all the time.

The fix is a first pass that collects import bindings, then the matcher checks membership:

def visit_Import(self, node):
    for alias in node.names:
        if alias.name == "os":
            self._os_aliases.add(alias.asname or "os")
    self.generic_visit(node)

def visit_ImportFrom(self, node):
    if node.module == "os":
        for alias in node.names:
            bound = alias.asname or alias.name
            if alias.name == "getenv":
                self._getenv_aliases.add(bound)
            elif alias.name == "environ":
                self._environ_aliases.add(bound)

Now _is_getenv_call asks "is the thing we're calling in the _os_aliases set", not "is it literally the string os".

There's a nice side effect: this also cuts out false positives. If somebody has a class with a getenv method, the name "getenv" exists in the code but it's never bound via import os — so it's correctly ignored. A grep-based tool would flag it.
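Put together, an alias-aware visitor could look roughly like the sketch below. It is a simplified stand-in, not the tool's actual code: the class name AliasAwareVisitor is made up, only getenv is handled (no environ), and it collects imports in the same pass as the calls, so it assumes imports appear before their first use in the file. A separate import pass would remove that assumption.

```python
import ast

class AliasAwareVisitor(ast.NodeVisitor):
    def __init__(self):
        self._os_aliases = set()      # names bound to the os module
        self._getenv_aliases = set()  # names bound to os.getenv itself
        self.names = []

    def visit_Import(self, node):
        for alias in node.names:
            if alias.name == "os":
                self._os_aliases.add(alias.asname or "os")

    def visit_ImportFrom(self, node):
        if node.module == "os":
            for alias in node.names:
                if alias.name == "getenv":
                    self._getenv_aliases.add(alias.asname or alias.name)

    def _is_getenv_call(self, node):
        func = node.func
        if isinstance(func, ast.Name):        # getenv("X") or ge("X")
            return func.id in self._getenv_aliases
        return (isinstance(func, ast.Attribute)  # os.getenv("X") or alias.getenv("X")
                and func.attr == "getenv"
                and isinstance(func.value, ast.Name)
                and func.value.id in self._os_aliases)

    def visit_Call(self, node):
        if self._is_getenv_call(node) and node.args:
            first = node.args[0]
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                self.names.append(first.value)
        self.generic_visit(node)

SOURCE = '''
import os
import os as sys_env
from os import getenv
from os import getenv as ge

class Config:
    def getenv(self, key):
        return key

a = os.getenv("A")
b = sys_env.getenv("B")
c = getenv("C")
d = ge("D")
e = Config().getenv("NOT_AN_ENV_VAR")
'''

visitor = AliasAwareVisitor()
visitor.visit(ast.parse(SOURCE))
print(visitor.names)
```

The Config().getenv(...) call is the false-positive case: the attribute is named getenv, but it doesn't hang off a name bound to the os module, so the matcher ignores it.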

The thing ast can't solve

os.getenv(f"FEATURE_{name.upper()}")
os.getenv(some_variable)

These variable names are computed at runtime. In the general case, static analysis can't know what they are without running the program. You have three choices:

1. Skip them silently, so the user never knows they exist
2. Refuse to handle the file, which is annoying
3. Report them separately as "dynamic usages, check manually"

I went with 3. The tool prints them under a separate warning section with the file and line number, so at least you know something's there. Pretending to solve a problem you can't solve is worse than admitting the limit.
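One way to collect those dynamic usages is to keep every matched call whose first argument is not a string literal, along with where it was found. The sketch below only illustrates the idea: the function name find_dynamic_reads and the tuple layout are hypothetical, it only matches the plain os.getenv form, and it needs Python 3.9+ for ast.unparse.

```python
import ast

def find_dynamic_reads(source, filename="<string>"):
    """Return (filename, line, expression) for os.getenv calls with a non-literal name."""
    dynamic = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "getenv"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "os"
                and node.args):
            first = node.args[0]
            is_literal = isinstance(first, ast.Constant) and isinstance(first.value, str)
            if not is_literal:
                # ast.unparse turns the argument back into source text (3.9+).
                dynamic.append((filename, node.lineno, ast.unparse(first)))
    return sorted(dynamic)  # ast.walk is breadth-first, so sort into line order

SAMPLE = '''import os
a = os.getenv("STATIC")
b = os.getenv(f"FEATURE_{name.upper()}")
c = os.getenv(some_variable)
'''

for path, line, expr in find_dynamic_reads(SAMPLE, "app.py"):
    print(f"{path}:{line}: dynamic env read: {expr}")
```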

What else is in the box

The scanner is ~200 lines. The rest of the project is:

A .env reader (via python-dotenv, no point rewriting that)
A comparison module that knows about .envignore for things you want to exclude
ANSI terminal output (I did not pull in rich for colouring five lines)
A click-based CLI
A generate command that turns a scan result into a .env.example with comments pointing at where each variable is used

Total ~700 lines, 49 tests, MIT licensed.

Try it

pip install envsleuth
envsleuth scan --path ./src
envsleuth generate              # writes .env.example from your code
envsleuth scan --strict         # exit 1 if anything's missing, for CI

Source: github.com/k38f/envsleuth

If you find a case it doesn't handle (Django settings modules, pydantic-settings integration, anything else), open an issue or a PR. Curious to see what breaks on real codebases.
