
DEV Community

Brad Kinnard

Posted on Apr 28

The Jupyter notebook bug that only crashes for other people

#python #jupyter #datascience #opensource

Cell 0 uses df. Cell 1 defines df.

The notebook works for you because your kernel ran the cells in some other order and `df` is still in memory. You commit. Someone clones the repo, hits Restart and Run All, and it dies on cell 0.
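To reproduce the failure without a kernel, you can replay the cells top to bottom in one fresh namespace, which is roughly what Restart and Run All does. Here is a minimal sketch. `run_all_fresh` is a hypothetical helper for illustration, not part of nborder, and it only treats `NameError` as the failure of interest:

```python
from __future__ import annotations


def run_all_fresh(cells: list[str]) -> int | None:
    """Hypothetical helper: execute cell sources in source order in one empty
    namespace, like a freshly restarted kernel. Returns the index of the first
    cell that raises NameError, or None if every cell runs."""
    namespace: dict = {}
    for index, source in enumerate(cells):
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except NameError:
            return index
    return None


cells = [
    "result = df['a']",        # cell 0 uses df ...
    "df = {'a': [1, 2, 3]}",   # ... which only cell 1 defines
]
print(run_all_fresh(cells))  # 0
```

Reverse the two cells and it returns `None`.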

Standard Python linters can't catch this. ruff, flake8, mypy operate on one source file at a time. A notebook is N cells whose execution order in your kernel may have nothing to do with their order on disk. The bug isn't inside any single cell. It's in the relationship between cells.

nborder is a static linter for that relationship.

Rules

| Code | Flags |
|------|-------|
| NB101 | `execution_count` decreases in source order |
| NB201 | Name used in cell N, only defined in cell M where M > N |
| NB102 | Name used somewhere, never defined anywhere |
| NB103 | Stochastic call (numpy, torch, tensorflow, stdlib random) before any seed |
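NB101 only needs the raw notebook JSON. Here is a minimal sketch, assuming the notebook is already loaded as an nbformat v4 dict. `find_nb101` is hypothetical, and nborder's actual check may treat unexecuted or non-code cells differently:

```python
from __future__ import annotations


def find_nb101(notebook: dict) -> list[int]:
    """Hypothetical NB101 check: return indices of code cells whose
    execution_count is lower than the previous executed code cell's."""
    flagged = []
    previous = None
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        count = cell.get("execution_count")
        if count is None:  # never executed, nothing to compare
            continue
        if previous is not None and count < previous:
            flagged.append(index)
        previous = count
    return flagged


# Cell 0 ran as In [5], cell 1 ran as In [3]: counts decrease in source order.
nb = {"cells": [
    {"cell_type": "code", "execution_count": 5, "source": "df = load()"},
    {"cell_type": "code", "execution_count": 3, "source": "df.head()"},
]}
print(find_nb101(nb))  # [1]
```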

How the cross-cell analysis works

Each cell gets parsed with libCST. A visitor extracts symbol definitions (assignments, function defs, class defs, imports) and symbol uses (name references, attribute roots) per cell. Connect them across cells in source order and you get a dataflow graph at notebook scope.
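Here is a rough sketch of the per-cell extraction. nborder uses libCST; to stay dependency-free this version swaps in the stdlib `ast` module, and `CellSymbols`/`extract_symbols` are hypothetical names. It skips function and class bodies and ignores ordering inside a cell, both deliberate simplifications:

```python
from __future__ import annotations

import ast


class CellSymbols(ast.NodeVisitor):
    """Hypothetical stdlib-ast stand-in for a libCST visitor: collects the
    names a cell defines and the names it reads."""

    def __init__(self) -> None:
        self.defs: set[str] = set()
        self.uses: set[str] = set()

    def visit_Name(self, node: ast.Name) -> None:
        # Store = assignment target; Load = a read. The root of an attribute
        # chain such as df.head is itself a Name in Load context.
        if isinstance(node.ctx, ast.Store):
            self.defs.add(node.id)
        elif isinstance(node.ctx, ast.Load):
            self.uses.add(node.id)

    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            # "import numpy as np" binds np; "import os.path" binds os.
            self.defs.add(alias.asname or alias.name.split(".")[0])

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        for alias in node.names:
            if alias.name != "*":  # star imports are not modeled
                self.defs.add(alias.asname or alias.name)

    def _visit_def(self, node) -> None:
        # Record the function/class name but skip its body: names inside are
        # locals, not notebook-scope symbols.
        self.defs.add(node.name)
        for decorator in node.decorator_list:
            self.visit(decorator)

    visit_FunctionDef = visit_AsyncFunctionDef = visit_ClassDef = _visit_def


def extract_symbols(source: str) -> tuple[set[str], set[str]]:
    visitor = CellSymbols()
    visitor.visit(ast.parse(source))
    return visitor.defs, visitor.uses


print(extract_symbols("result = df.head()"))  # ({'result'}, {'df'})
```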

NB201 findings are uses whose nearest matching definition lives in a later cell. NB102 findings are uses with no matching definition anywhere.
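Given one `(defs, uses)` pair per cell, both rules reduce to a scan over source order. Here is a sketch with a hypothetical `classify_uses`. The builtins filter is an assumption, and a real checker would also need to allow IPython-provided names such as `display` and `get_ipython`:

```python
from __future__ import annotations

import builtins

BUILTINS = set(dir(builtins))


def classify_uses(cells: list[tuple[set[str], set[str]]]) -> list[tuple[str, int, str]]:
    """Hypothetical classifier. cells[i] = (defs, uses) for cell i in source
    order. Returns (rule, cell_index, name) findings."""
    findings = []
    for index, (_, uses) in enumerate(cells):
        for name in sorted(uses - BUILTINS):
            # Defined in this cell or any earlier one: fine on Run All.
            if any(name in defs for defs, _ in cells[: index + 1]):
                continue
            # Otherwise it is NB201 if a later cell defines it, else NB102.
            later = any(name in defs for defs, _ in cells[index + 1 :])
            findings.append(("NB201" if later else "NB102", index, name))
    return findings


cells = [({"result"}, {"df"}), ({"pd", "df"}, {"pd"})]
print(classify_uses(cells))  # [('NB201', 0, 'df')]
```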

The graph also makes the auto-fix safe. When NB201 fires, the fixer runs a topological sort over the cell dependency edges. If the sort succeeds, the cells get reordered to respect dataflow and their execution counts get cleared. If it detects a cycle, the fixer bails with an explicit message naming the cycle.
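One way to implement the reorder step: the sketch below uses the stdlib `graphlib` module for cycle detection and Kahn's algorithm for the order itself. `reorder_cells` and the `deps` layout (each cell mapped to the cells whose definitions it reads) are assumptions, not nborder's API:

```python
from __future__ import annotations

import heapq
from graphlib import CycleError, TopologicalSorter


def reorder_cells(n_cells: int, deps: dict[int, set[int]]) -> list[int]:
    """Hypothetical fixer step. deps[i] holds the cells whose definitions
    cell i reads. Returns a new cell order, or raises ValueError on a cycle."""
    graph = {i: deps.get(i, set()) for i in range(n_cells)}
    try:
        # prepare() raises CycleError; err.args[1] lists the cycle's nodes.
        TopologicalSorter(graph).prepare()
    except CycleError as err:
        raise ValueError(f"cannot reorder, dependency cycle: {err.args[1]}") from None

    # Kahn's algorithm with a min-heap: among cells whose dependencies are
    # already placed, always take the one with the lowest original index.
    indegree = {i: len(graph[i]) for i in graph}
    dependents: dict[int, list[int]] = {i: [] for i in graph}
    for cell, reqs in graph.items():
        for req in reqs:
            dependents[req].append(cell)
    ready = [i for i, d in indegree.items() if d == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        cell = heapq.heappop(ready)
        order.append(cell)
        for nxt in dependents[cell]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)
    return order


print(reorder_cells(2, {0: {1}}))  # [1, 0]: cell 0 reads df from cell 1
```

The min-heap is a design choice: picking the lowest original index among ready cells keeps cells without a dependency problem roughly where the author put them.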

NB201 fix example

Input:

```python
# cell 0
result = df.head()

# cell 1
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})
```

Run `nborder check --fix notebook.ipynb`:

```text
notebook.ipynb:cell_0:1:10: NB201 Variable `df` used in cell 0 is only defined in cell 1. The notebook will fail on Restart-and-Run-All. [*]
Fix outcomes:
  reorder: applied (reordered 2 cells and cleared execution counts)
```

Output:

```python
# cell 0
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})

# cell 1
result = df.head()
```

Cell IDs preserved. Execution counts cleared. A second `nborder check` exits 0.
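Once an order exists, rewriting the notebook is a permutation of the `cells` list. Here is a minimal sketch on the raw JSON dict, assuming nbformat v4.5 cells that carry an `id` field. `apply_order` is hypothetical, and it only clears the cell-level `execution_count`, not counts stored inside outputs:

```python
from __future__ import annotations

import copy


def apply_order(notebook: dict, order: list[int]) -> dict:
    """Hypothetical helper: return a copy of the notebook with cells permuted
    into `order` and code-cell execution counts cleared. Cell ids, metadata,
    and top-level key order are left as they were."""
    if sorted(order) != list(range(len(notebook["cells"]))):
        raise ValueError("order must be a permutation of the cell indices")
    result = copy.deepcopy(notebook)
    # Reassigning an existing key keeps its position in the dict.
    result["cells"] = [result["cells"][i] for i in order]
    for cell in result["cells"]:
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None  # old counts describe the old order
    return result


nb = {
    "cells": [
        {"cell_type": "code", "id": "a1", "execution_count": 2,
         "metadata": {}, "outputs": [], "source": "result = df.head()"},
        {"cell_type": "code", "id": "b2", "execution_count": 1,
         "metadata": {}, "outputs": [], "source": "df = make_df()"},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
}
fixed = apply_order(nb, [1, 0])
print([cell["id"] for cell in fixed["cells"]])  # ['b2', 'a1']
```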

NB103 and seed injection

NB103 walks the same graph for stochastic calls (`np.random.rand`, `torch.rand`, `tf.random.normal`, `random.random`) that fire before any matching seed. The fix injects a single seed cell at the right position. Multi-library notebooks get one cell:

```python
import numpy as np
np.random.seed(42)
rng = np.random.default_rng(42)
import torch
torch.manual_seed(42)
```
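Here is a simplified take on the NB103 check. Instead of walking the dataflow graph, this sketch scans calls in source order and flags a stochastic call when its library has not been seeded yet. The lookup tables are small hypothetical samples, names are matched as written (so it assumes the conventional `np`, `torch`, `tf`, `random` names), and seeded generators like `np.random.default_rng(42)` are not modeled:

```python
from __future__ import annotations

import ast

# Hypothetical sample tables keyed by library; matched against call names as written.
STOCHASTIC = {
    "numpy": {"np.random.rand", "np.random.randn", "np.random.randint"},
    "torch": {"torch.rand", "torch.randn"},
    "tensorflow": {"tf.random.normal", "tf.random.uniform"},
    "random": {"random.random", "random.randint", "random.choice"},
}
SEEDS = {
    "numpy": {"np.random.seed"},
    "torch": {"torch.manual_seed"},
    "tensorflow": {"tf.random.set_seed"},
    "random": {"random.seed"},
}


def dotted_name(node: ast.expr) -> str | None:
    """Turn an attribute chain like np.random.rand into 'np.random.rand'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_nb103(cells: list[str]) -> list[tuple[int, str]]:
    """Hypothetical check: (cell_index, call) for stochastic calls with no earlier seed."""
    seeded: set[str] = set()
    findings = []
    for index, source in enumerate(cells):
        calls = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Call)]
        # ast.walk is breadth-first, so sort calls back into source order.
        calls.sort(key=lambda n: (n.lineno, n.col_offset))
        for call in calls:
            name = dotted_name(call.func)
            for lib in STOCHASTIC:
                if name in SEEDS[lib]:
                    seeded.add(lib)
                elif name in STOCHASTIC[lib] and lib not in seeded:
                    findings.append((index, name))
    return findings


print(find_nb103(["import numpy as np", "x = np.random.rand(3)", "np.random.seed(42)"]))
# [(1, 'np.random.rand')]
```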

Alias-aware: `import numpy as numpy_lib` produces a seed line using `numpy_lib`, not a redundant fresh import. After fixing a NumPy notebook, computed cell outputs are byte-identical across consecutive `jupyter nbconvert --execute` runs.
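Alias awareness comes down to reading the notebook's own imports before writing the seed line. Here is a sketch for NumPy only, with a hypothetical `numpy_seed_lines`. It handles `import numpy [as name]` but not `from numpy import random`, and it assumes the import runs before the injected seed cell:

```python
from __future__ import annotations

import ast


def numpy_seed_lines(cells: list[str], seed: int = 42) -> list[str]:
    """Hypothetical helper: build NumPy seed lines, reusing the name the
    notebook already bound numpy to, else falling back to a fresh import."""
    for source in cells:
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    if alias.name == "numpy":
                        name = alias.asname or "numpy"
                        return [f"{name}.random.seed({seed})"]
    return ["import numpy as np", f"np.random.seed({seed})"]


print(numpy_seed_lines(["import numpy as numpy_lib", "x = numpy_lib.random.rand(3)"]))
# ['numpy_lib.random.seed(42)']
```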

JAX and scikit-learn get diagnostic-only handling. JAX needs a `PRNGKey` threaded through call signatures. scikit-learn's `random_state=None` needs a value chosen to fit your testing strategy. Neither is a single line you can inject.

Byte-stable writer

Parse a notebook, modify nothing, write it back, bytes match exactly. Verified against nbformat v4.0, v4.4, v4.5 fixtures plus a real-world notebook corpus. When the writer does mutate during a fix, only the cells that actually changed get rewritten. Cell IDs, metadata, and unrelated cells stay verbatim.
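The no-op round trip is easy to express as a property check. Here is a sketch of the check, assuming a naive `json` re-serializer with one-space indentation. `roundtrip_is_stable` is hypothetical and is not nborder's writer:

```python
import json


def roundtrip_is_stable(raw: bytes, indent: int = 1) -> bool:
    """Hypothetical property check: parse notebook bytes, re-serialize with no
    changes, and compare byte-for-byte with the input."""
    data = json.loads(raw.decode("utf-8"))  # dicts keep the file's key order
    text = json.dumps(data, indent=indent, ensure_ascii=False) + "\n"
    return text.encode("utf-8") == raw


nb = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
one_space = (json.dumps(nb, indent=1) + "\n").encode("utf-8")
two_space = (json.dumps(nb, indent=2) + "\n").encode("utf-8")
print(roundtrip_is_stable(one_space))  # True
print(roundtrip_is_stable(two_space))  # False
```

The naive serializer only passes when the file already matches its formatting choices, so the two-space file fails. Preserving whatever formatting a file arrived with is the job a dedicated byte-stable writer has to do.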

Outputs

Four reporters:

- `text`: ruff-style `path:cell:line:col: NB### message`
- `json`: machine-readable
- `github`: `::error file=...,line=...,title=NB201::` annotations for inline PR comments
- `sarif`: SARIF 2.1.0, schema-validated

A pre-commit hook and a composite GitHub Action are included:

```yaml
- uses: moonrunnerkc/nborder@v0.1.4
  with:
    path: notebooks/
    select: NB201,NB103
```

What it doesn't do

- Doesn't execute notebooks. Pair with nbval or papermill for kernel-level validation.
- Doesn't lint cell-internal style. That's nbqa's job.
- Dynamic name resolution (`exec`, `getattr`, `**kwargs`, monkey-patching) is invisible. Same limitation as any static analyzer.
- Cell magics are stripped before analysis. Names introduced by `%%capture` get tracked; anything magic-internal does not.
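The post doesn't show how magics are stripped. One plausible approach, sketched with a hypothetical `strip_magics`: replace magic and shell-escape lines with `pass` so the cell still parses and line numbers stay put, and pull the output name out of `%%capture [options] [output]`, which is IPython's documented signature for that magic. Assignments like `x = !ls` are not handled:

```python
from __future__ import annotations


def strip_magics(source: str) -> tuple[str, set[str]]:
    """Hypothetical pre-pass: return parseable Python plus names bound by
    %%capture. Other cell magics blank the whole cell."""
    lines = source.splitlines()
    bound: set[str] = set()
    if lines and lines[0].startswith("%%"):
        tokens = lines[0].split()
        if tokens[0] != "%%capture":
            # %%bash, %%sql and friends may not contain Python at all.
            return "", bound
        # %%capture [--no-stderr] [--no-stdout] [--no-display] [output]
        names = [t for t in tokens[1:] if not t.startswith("-")]
        if names:
            bound.add(names[0])
        lines[0] = "pass"
    cleaned = []
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith(("%", "!")):
            # Keep the indentation so blocks still parse and positions line up.
            cleaned.append(line[: len(line) - len(stripped)] + "pass")
        else:
            cleaned.append(line)
    return "\n".join(cleaned), bound


src = "%%capture out\nx = 1\n%matplotlib inline\nfor i in range(3):\n    %time y = i"
code, names = strip_magics(src)
print(names)  # {'out'}
```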

Install

```sh
pip install nborder
nborder check path/to/notebooks/
```

Python 3.10+.

moonrunnerkc / nborder

A fast, opinionated linter and auto-fixer for Jupyter notebook hidden-state and execution-order bugs.


What this catches

| Code | Name | One-line example |
|------|------|------------------|
| NB101 | Non-monotonic execution counts | Cell 1 ran with `In [3]:` after cell 0 ran with `In [5]:`. |
| NB102 | Won't survive Restart-and-Run-All | `print(df)` references a name no cell in the notebook defines. |
| NB201 | Use-before-assign across cells | Cell 0 uses `df`; `df = ...` only appears in cell 1. |
| NB103 | Stochastic library used without seed | `np.random.rand(3)` runs with no seed call before it. |

Each rule has a docs page under docs/rules/ explaining the bug class, a bad and good example, and the auto-fix behaviour. The four sections below walk through each rule with the diagnostic nborder actually emits.

NB101: out-of-order execution

The execution_count field on each cell records the order Jupyter actually ran cells in, not the order they appear in the file. When those orders disagree, the recorded…

