I built a tool that makes any PyTorch model inspectable with one line of code. No retraining, no architecture changes, no extra memory. Here's how it works.
The Problem
You train a model. It works. But why does it work? Which layers matter? Are any neurons dead? What are the attention heads actually doing?
Most interpretability tools try to answer these questions after the fact -- approximations bolted onto a black box. I wanted something different: exact traces of what actually happened inside the model during inference.
The Solution: 3 Lines
pip install hdna-workbench[pytorch]
import workbench
model = workbench.inspect(model) # swap layers for inspectable versions
output = model(input) # same math, same output
traces = workbench.trace(model) # see what every layer did
That's it. workbench.inspect() walks your model and replaces each layer with a subclass that records what happens during forward passes. nn.Linear becomes InspectableLinear, nn.MultiheadAttention becomes InspectableMultiheadAttention, etc.
Because they're subclasses:
- isinstance(layer, nn.Linear) is still True
- model.state_dict() works unchanged
- torch.save(model) works unchanged
- Output is numerically identical
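The subclass-swap trick can be sketched in plain Python. This is not the library's actual code: `Linear`, `InspectableLinear`, and `swap` here are hypothetical stand-ins that only illustrate why `isinstance` checks and outputs survive the swap.

```python
class Linear:
    """Stand-in for nn.Linear: a toy layer with a weight vector."""
    def __init__(self, weight):
        self.weight = weight

    def forward(self, x):
        return [w * x for w in self.weight]

class InspectableLinear(Linear):
    """Same math as the parent class, but records every forward call."""
    def __init__(self, weight):
        super().__init__(weight)
        self.calls = 0

    def forward(self, x):
        self.calls += 1                  # record, then defer to parent math
        return super().forward(x)

def swap(layer):
    """Rebuild a layer as its inspectable subclass (illustrative only)."""
    return InspectableLinear(layer.weight)

layer = swap(Linear([1.0, 2.0]))
out = layer.forward(3.0)                 # -> [3.0, 6.0], same as before the swap
assert isinstance(layer, Linear)         # isinstance still passes
assert layer.calls == 1                  # ...but calls are now recorded
```

Because the wrapper is a subclass rather than a proxy object, anything that type-checks or serializes the original class keeps working.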
What You Get
Here's real output from a small transformer:
embedding calls=1 shape=(2, 32, 128) time=0.08ms
layers.0.self_attn calls=1 shape=(2, 32, 128) time=2.29ms
Head entropy: ['2.922', '2.987', '2.984', '2.970']
Head sharpness: ['0.173', '0.159', '0.152', '0.156']
Head redundancy: 0.5618
layers.0.linear1 calls=1 shape=(2, 32, 256) time=0.07ms
Weights: mean=0.0001 std=0.0511 sparsity=0.0%
layers.1.self_attn calls=1 shape=(2, 32, 128) time=0.66ms
Head redundancy: 0.8213
norm calls=1 shape=(2, 32, 128) time=0.03ms
head calls=1 shape=(2, 32, 1000) time=0.12ms
Per-layer timing. Attention head entropy (how spread out the attention is). Head redundancy (how similar heads are to each other). Weight statistics. All automatic.
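The metrics themselves are standard quantities. As a sketch (not the library's internals), head entropy is the Shannon entropy of one head's attention distribution, and redundancy can be measured as similarity between heads, here via cosine similarity:

```python
import math

def entropy(p):
    """Shannon entropy (nats) of one attention distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def cosine(a, b):
    """Cosine similarity between two heads' attention rows."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Two heads attending over 4 positions
head_a = [0.25, 0.25, 0.25, 0.25]   # maximally spread: entropy = ln(4) ~= 1.386
head_b = [0.97, 0.01, 0.01, 0.01]   # sharp: entropy close to 0

print(entropy(head_a))              # high entropy: attention is diffuse
print(entropy(head_b))              # low entropy: attention is focused
print(cosine(head_a, head_b))       # redundancy: how alike the two heads are
```

A uniform head over 32 positions would score ln(32) ~= 3.47, which is why the entropies in the trace above hover just under 3.0.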
Going Deeper
Find Dead Neurons and Anomalies
anomalies = workbench.anomalies(model)
for a in anomalies:
    print(f"WARNING: {a['layer']} -- {a['issue']}")
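Under the hood, a dead-neuron check reduces to scanning recorded activations for units that never fire. A minimal stdlib-only sketch of that idea (hypothetical helper, not the library's `anomalies` implementation):

```python
def dead_neurons(activations, eps=1e-8):
    """Return indices of units that stayed at zero across all recorded calls.

    activations: list of per-call output vectors for one layer.
    """
    n = len(activations[0])
    alive = [False] * n
    for out in activations:
        for i, v in enumerate(out):
            if abs(v) > eps:
                alive[i] = True
    return [i for i, a in enumerate(alive) if not a]

# Unit 2 outputs exactly 0.0 on every recorded call -> flagged as dead
recorded = [[0.3, 0.0, 0.0], [1.2, 0.5, 0.0]]
print(dead_neurons(recorded))  # [2]
```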
Inspect Attention Patterns
for name, module in model.named_modules():
    if hasattr(module, 'attention_weights'):
        heads = module.head_summary()
        for h in heads:
            print(f"Head {h['head']}: entropy={h['entropy']:.3f}")
Track Embedding Usage
for name, module in model.named_modules():
    if hasattr(module, 'most_accessed'):
        print(f"Top tokens: {module.most_accessed(10)}")
        print(f"Never accessed: {len(module.never_accessed())} tokens")
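Token-usage tracking is essentially a counter over embedding lookups. A sketch of the idea with a hypothetical `EmbeddingTracker` (the real `InspectableEmbedding` may differ):

```python
from collections import Counter

class EmbeddingTracker:
    """Counts which token ids are looked up during inference."""
    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.counts = Counter()

    def lookup(self, token_ids):
        self.counts.update(token_ids)    # record every lookup

    def most_accessed(self, k):
        return self.counts.most_common(k)

    def never_accessed(self):
        return [t for t in range(self.vocab_size) if t not in self.counts]

tracker = EmbeddingTracker(vocab_size=10)
tracker.lookup([1, 1, 4, 7])
tracker.lookup([1, 4])
print(tracker.most_accessed(2))       # [(1, 3), (4, 2)]
print(len(tracker.never_accessed()))  # 7
```

Tokens that are never looked up are a cheap signal of wasted embedding capacity or vocabulary mismatch.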
Set Breakpoints
# Halt when output magnitude exceeds threshold
layer.add_breakpoint(lambda l, inp, out: out.abs().max() > 100)
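Conceptually, a breakpoint is just a predicate checked after each forward pass. A toy sketch of the mechanism (hypothetical `Layer` and `BreakpointHit`, stdlib only):

```python
class BreakpointHit(Exception):
    """Raised when a registered predicate returns True."""

class Layer:
    """Toy layer with add_breakpoint semantics like the snippet above."""
    def __init__(self):
        self.breakpoints = []

    def add_breakpoint(self, predicate):
        self.breakpoints.append(predicate)

    def forward(self, x):
        out = x * 2.0                     # stand-in for the real computation
        for pred in self.breakpoints:
            if pred(self, x, out):        # (layer, input, output), as above
                raise BreakpointHit(f"breakpoint fired: out={out}")
        return out

layer = Layer()
layer.add_breakpoint(lambda l, inp, out: abs(out) > 100)
layer.forward(3.0)       # fine: out = 6.0
try:
    layer.forward(75.0)  # out = 150.0 -> predicate trips
except BreakpointHit as e:
    print(e)
```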
Control Trace Depth
from workbench import TraceDepth
workbench.set_depth(model, TraceDepth.FULL) # activations + gradients + history
workbench.set_depth(model, TraceDepth.STATS) # running statistics only
workbench.set_depth(model, TraceDepth.OFF) # disable for benchmarking
Revert When Done
model = workbench.revert(model) # back to standard PyTorch
torch.save(model, "clean.pt") # no workbench dependency in saved model
It's More Than a Wrapper
The inspection wrapper is one part of a larger platform called HDNA Workbench. HDNA stands for Highly Dynamic Neural Architecture -- it includes:
- An open-box AI engine where every neuron has persistent memory, mutable routing tables, and semantic tags. Not a black box with explanations bolted on -- transparent by design. Core runs on numpy alone.
- Universal adapters that connect any model (PyTorch, HuggingFace, ONNX, or API) to the same research tools
- 6 research tools: Inspector, Decision Replay, Daemon Studio, Experiment Forge, Model Comparison, and Exporter
- 3 built-in curricula: Math (14 phases), Language (sentiment/topic/emotion/intent), Spatial (grid pattern recognition)
- Compliance mapping to EU AI Act, NIST AI RMF, and ISO/IEC 42001
If you just want the PyTorch inspection, pip install hdna-workbench[pytorch] and use the 3 lines above. If you want to study how AI learns from the ground up, the HDNA core is there too.
14 Supported Layer Types
| Category | Layers |
|---|---|
| Core | Linear, Embedding, Sequential |
| Transformer | MultiheadAttention, TransformerEncoderLayer, TransformerDecoderLayer |
| Normalization | LayerNorm, BatchNorm1d, BatchNorm2d |
| Convolution | Conv1d, Conv2d |
| Activation | ReLU, GELU, Softmax |
Custom layers: workbench.register(MyLayer, InspectableMyLayer)
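The registration call suggests a type-to-wrapper registry that the model walker consults, resolving each module to the most specific registered subclass. A plausible sketch (hypothetical internals, not the actual `workbench.register`):

```python
REGISTRY = {}

def register(layer_type, inspectable_type):
    """Map a layer class to its inspectable subclass."""
    REGISTRY[layer_type] = inspectable_type

def resolve(layer):
    """Walk the MRO so subclasses of registered layers resolve too."""
    for cls in type(layer).__mro__:
        if cls in REGISTRY:
            return REGISTRY[cls]
    return None  # unknown layer type: leave it un-instrumented

class MyLayer: ...
class InspectableMyLayer(MyLayer): ...

register(MyLayer, InspectableMyLayer)
print(resolve(MyLayer()))  # resolves to InspectableMyLayer
```

Walking the MRO means a user-defined subclass of a supported layer still gets instrumented without a separate registration.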
Links
- GitHub: github.com/staffman76/HDNA-Workbench
- PyPI: pip install hdna-workbench
- Docs: Wiki
- License: BSL 1.1 (free for research, education, individuals, and orgs under $1M revenue)
Feedback welcome -- especially from anyone working on model interpretability or AI compliance.