I keep forgetting to add new env vars to .env.production before deploying. A few times it was harmless, a few times it was not. At some point it was easier to write a tool than to fix myself.
The tool is envsleuth. It reads your Python source, finds every os.getenv() / os.environ[] / os.environ.get(), and compares that list with your .env file. Pretty boring on the outside, but building it was a nice excuse to dig into the ast module. That's what this post is about.
What grep gets you
The first version wasn't a tool, it was just:
grep -rn "os.getenv" src/
This gets you through maybe a week. Then you notice it misses os.environ["X"]. And os.environ.get("X"). And it matches methods on unrelated classes that happen to be called getenv. And it has no way to tell os.getenv("X") apart from os.getenv("X", "fallback"), which matters because the second one is optional.
At that point you either write a smarter regex (don't) or start looking at ast.
What ast actually is
ast.parse(source) takes Python code and gives you back a tree. Every line of your program becomes a nested structure of nodes: Module, FunctionDef, Call, Attribute, Name, Constant, and so on.
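If you want to poke at the tree yourself, a minimal sketch like the following should do. The variable names are only for illustration, and the `indent` argument to `ast.dump` assumes Python 3.9 or newer.

```python
import ast

source = 'import os\nurl = os.getenv("DATABASE_URL")\n'
tree = ast.parse(source)

# Pretty-print the whole tree (the indent argument needs Python 3.9+).
print(ast.dump(tree, indent=2))

# Or just collect the node type names that appear anywhere in the tree.
kinds = [type(node).__name__ for node in ast.walk(tree)]
print(sorted(set(kinds)))
```

The full dump is noisier than the trimmed version shown next, because it also includes context objects such as Load and Store.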
For os.getenv("DATABASE_URL") you get roughly:
Call(
    func=Attribute(
        value=Name(id='os'),
        attr='getenv'),
    args=[Constant(value='DATABASE_URL')])
Once you see it in this shape, the matching logic writes itself. "Is this a Call? Is the function an Attribute called getenv? Is it an attribute of a Name called os? Yes? Take args[0]."
import ast

class EnvVisitor(ast.NodeVisitor):
    def __init__(self):
        self.names = []

    def visit_Call(self, node):
        if (isinstance(node.func, ast.Attribute)
                and node.func.attr == "getenv"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "os"):
            if node.args and isinstance(node.args[0], ast.Constant):
                self.names.append(node.args[0].value)
        self.generic_visit(node)
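That last line, generic_visit, is the one I forgot on the first try. A custom visit_X method replaces the default traversal for that node type, so without it the visitor stops descending at every node it handles and never sees what is nested underneath, such as the os.getenv inside int(os.getenv("PORT")). My first version returned zero results on a whole project before I figured that out.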
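To see the visitor in action, here is a small self-contained sketch. It condenses the matcher from above into one condition; the SAMPLE string and the connect function are made up for the demo.

```python
import ast


class EnvVisitor(ast.NodeVisitor):
    """Condensed version of the matcher above: collects literal os.getenv() names."""

    def __init__(self):
        self.names = []

    def visit_Call(self, node):
        func = node.func
        if (isinstance(func, ast.Attribute) and func.attr == "getenv"
                and isinstance(func.value, ast.Name) and func.value.id == "os"
                and node.args and isinstance(node.args[0], ast.Constant)):
            self.names.append(node.args[0].value)
        # Keep descending, otherwise calls nested inside this call are skipped.
        self.generic_visit(node)


# Hypothetical input, only for illustration.
SAMPLE = '''
import os

def connect():
    url = os.getenv("DATABASE_URL")
    port = int(os.getenv("PORT", "5432"))
    return url, port
'''

visitor = EnvVisitor()
visitor.visit(ast.parse(SAMPLE))
print(visitor.names)  # ['DATABASE_URL', 'PORT']
```

The PORT lookup sits inside the int(...) call, so it is only reached because visit_Call hands control back to generic_visit.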
Three patterns
Python has three idiomatic ways to read env vars:
os.getenv("X") # Call on Attribute
os.environ["X"] # Subscript
os.environ.get("X") # Call on nested Attribute
Each is a different AST shape, so each needs its own matcher. The nested one (os.environ.get) is the most fun — it's a Call whose func is an Attribute whose value is another Attribute. Nesting levels matter.
def _is_environ_get(self, node):
    func = node.func
    if not isinstance(func, ast.Attribute) or func.attr != "get":
        return False
    inner = func.value
    return (isinstance(inner, ast.Attribute)
            and inner.attr == "environ"
            and isinstance(inner.value, ast.Name)
            and inner.value.id == "os")
For the Subscript case (os.environ["X"]) you override visit_Subscript instead of visit_Call. Same idea, different node type.
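A rough sketch of the Subscript matcher next to the os.environ.get one might look like this. It is not envsleuth's actual code: is_os_environ and EnvironVisitor are names invented here, it only handles the literal name os (no aliases), and it assumes Python 3.9+, where node.slice is the index expression itself rather than an ast.Index wrapper.

```python
import ast


def is_os_environ(node):
    """True for the expression `os.environ` (literal `os` only, no aliases)."""
    return (isinstance(node, ast.Attribute)
            and node.attr == "environ"
            and isinstance(node.value, ast.Name)
            and node.value.id == "os")


class EnvironVisitor(ast.NodeVisitor):
    def __init__(self):
        self.names = []

    def visit_Subscript(self, node):
        key = node.slice  # Python 3.9+: the index expression itself
        if (is_os_environ(node.value)
                and isinstance(node.ctx, ast.Load)  # reads only, not os.environ["X"] = ...
                and isinstance(key, ast.Constant)
                and isinstance(key.value, str)):
            self.names.append(key.value)
        self.generic_visit(node)

    def visit_Call(self, node):
        func = node.func
        if (isinstance(func, ast.Attribute) and func.attr == "get"
                and is_os_environ(func.value)
                and node.args and isinstance(node.args[0], ast.Constant)):
            self.names.append(node.args[0].value)
        self.generic_visit(node)


visitor = EnvironVisitor()
visitor.visit(ast.parse('import os\na = os.environ["X"]\nb = os.environ.get("Y")\n'))
print(visitor.names)  # ['X', 'Y']
```

The ctx check is a design choice: an assignment such as os.environ["X"] = "1" writes the variable rather than requiring it, so this sketch leaves it out of the list.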
Aliases
This is where the naive version breaks in practice. All four of these mean the same thing:
import os
os.getenv("X")

import os as sys_env
sys_env.getenv("X")

from os import getenv
getenv("X")

from os import getenv as ge
ge("X")
If you hardcode "os" in your matcher, you only catch the first one. Real codebases have the other three all the time.
The fix is a first pass that collects import bindings, then the matcher checks membership:
def visit_Import(self, node):
    for alias in node.names:
        if alias.name == "os":
            self._os_aliases.add(alias.asname or "os")
    self.generic_visit(node)

def visit_ImportFrom(self, node):
    if node.module == "os":
        for alias in node.names:
            bound = alias.asname or alias.name
            if alias.name == "getenv":
                self._getenv_aliases.add(bound)
            elif alias.name == "environ":
                self._environ_aliases.add(bound)
Now _is_getenv_call asks "is the thing we're calling in the _os_aliases set", not "is it literally the string os".
There's a nice side effect: this also cuts out false positives. If somebody has a class with a getenv method, the name "getenv" exists in the code but it's never bound via import os — so it's correctly ignored. A grep-based tool would flag it.
The thing ast can't solve
os.getenv(f"FEATURE_{name.upper()}")
os.getenv(some_variable)
These variable names are computed at runtime. Static analysis fundamentally can't know what they are without running the program. You have three choices:
- Skip them silently — user never knows they exist
- Refuse to handle the file — annoying
- Report them separately as "dynamic usages, check manually"
I went with the third option. The tool prints them under a separate warning section with the file and line number, so at least you know something's there. Pretending to solve a problem you can't solve is worse than admitting the limit.
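One way to separate the two kinds of usage is to look at the type of the first argument: a string Constant is a static name, anything else is dynamic. The sketch below assumes that approach; scan_getenv and its return shape are invented for illustration and only cover the plain os.getenv form.

```python
import ast


def scan_getenv(source, filename="<string>"):
    """Split os.getenv() calls into static names and dynamic usages."""
    static, dynamic = [], []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "getenv"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "os"
                and node.args):
            continue
        first = node.args[0]
        if isinstance(first, ast.Constant) and isinstance(first.value, str):
            static.append(first.value)
        else:
            # f-strings (ast.JoinedStr), variables (ast.Name), concatenation, ...
            snippet = ast.get_source_segment(source, first)
            dynamic.append((filename, node.lineno, snippet))
    # ast.walk does not promise source order, so sort the report by line number.
    dynamic.sort(key=lambda item: item[1])
    return static, dynamic


code = (
    "import os\n"
    'os.getenv("A")\n'
    "os.getenv(some_variable)\n"
    'os.getenv(f"FEATURE_{name.upper()}")\n'
)
static, dynamic = scan_getenv(code, filename="settings.py")
for path, line, snippet in dynamic:
    print(f"{path}:{line}: dynamic usage, check manually: {snippet}")
```

Each dynamic entry carries the file name and line number, which is enough to print a "check manually" warning without guessing at the variable's name.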
What else is in the box
The scanner is ~200 lines. The rest of the project is:
- A .env reader (via python-dotenv, no point rewriting that)
- A comparison module that knows about .envignore for things you want to exclude
- ANSI terminal output (I did not pull in rich for colouring five lines)
- A click-based CLI
- A generate command that turns a scan result into a .env.example with comments pointing at where each variable is used
Total ~700 lines, 49 tests, MIT licensed.
Try it
pip install envsleuth
envsleuth scan --path ./src
envsleuth generate # writes .env.example from your code
envsleuth scan --strict # exit 1 if anything's missing, for CI
Source: github.com/k38f/envsleuth
If you find a case it doesn't handle (Django settings modules, pydantic-settings integration, anything else), open an issue or a PR. Curious to see what breaks on real codebases.
