*Cross-posted from oourmind.io — part of an ongoing series on the 3D Interpretability Lab*
The Problem With Black Boxes in Space
We've gotten quite good at asking what neural networks know. Mechanistic interpretability — the field dedicated to reverse-engineering how AI models work internally — has made remarkable progress on language models. We can find circuits that detect curves, attention heads that implement induction, and linear subspaces that encode factual associations.
But spatial models — models that understand, generate, or reason about 3D environments — remain largely opaque. Not because we lack curiosity, but because we lack a handle: the internal representations of most vision and world models aren't structured in a way that makes them easy to probe, intervene on, or interpret.
That's what makes World Labs' recent essay on "3D as Code" so interesting — and so relevant to 3D interpretability research.
A Quick Glossary
Before diving in, here are the key concepts you'll need:
Mechanistic Interpretability — A subfield of AI safety/alignment research that tries to reverse-engineer neural networks: not just what they output, but how they compute it internally. Think of it as neuroscience for AI.
Activation Patching — An intervention technique where you replace a model's internal activations at a specific layer with those from a different input, then observe how outputs change. Lets you trace which internal computations cause which behaviors.
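A minimal sketch of the idea, using a toy two-layer numpy network rather than any real model: cache an intermediate activation from a "clean" run, splice it into a run on a different input, and check whether the clean behavior comes back.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network standing in for any model with an
# intermediate representation we can intervene on.
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))

def forward(x, patched_hidden=None):
    """Run the model, optionally replacing the hidden activation."""
    hidden = np.tanh(x @ W1)
    if patched_hidden is not None:
        hidden = patched_hidden  # the intervention
    return hidden @ W2, hidden

x_clean = rng.normal(size=4)
x_corrupt = rng.normal(size=4)

# Cache the hidden activation from the clean run...
out_clean, h_clean = forward(x_clean)
out_corrupt, _ = forward(x_corrupt)
# ...and patch it into the corrupt run.
out_patched, _ = forward(x_corrupt, patched_hidden=h_clean)

# If patching restores the clean output, the hidden layer
# causally carries the behavior we care about.
print(np.allclose(out_patched, out_clean))  # True
```

In a real model the patch targets one layer, head, or position at a time, which is what localizes the computation.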
Probing — Training a small classifier on a model's internal representations to test whether a specific concept (e.g., "depth", "surface normal", "object identity") is linearly encoded in the activations.
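In the linear case a probe is nothing more than a least-squares fit from activations to the concept. A sketch with synthetic data, where a scalar "depth" value is planted along a random direction in a fake 16-d hidden state:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake "activations": 200 samples of a 16-d hidden state in which
# a scalar depth value is linearly embedded along one direction.
depth = rng.uniform(0, 10, size=200)
direction = rng.normal(size=16)
acts = np.outer(depth, direction) + 0.1 * rng.normal(size=(200, 16))

# A linear probe: least-squares map from activations to the concept.
w, *_ = np.linalg.lstsq(acts, depth, rcond=None)
pred = acts @ w

r2 = 1 - np.sum((depth - pred) ** 2) / np.sum((depth - depth.mean()) ** 2)
print(round(r2, 3))  # near 1.0: depth is linearly decodable here
```

High probe accuracy alone shows the concept is *decodable*, not that the model *uses* it, which is why probing is usually paired with interventions like patching.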
NeRF (Neural Radiance Field) — A method for representing 3D scenes implicitly inside a neural network's weights. You query the network with a 3D position + viewing direction, and it returns color + density. Famously opaque: the "scene" lives nowhere you can easily inspect.
Gaussian Splatting (3DGS) — A newer, faster alternative to NeRF. Represents a 3D scene as a cloud of 3D Gaussians (think: fuzzy ellipsoids), each with explicit parameters: position, orientation, opacity, color. Crucially, these are inspectable and editable artifacts.
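The "inspectable and editable" point is concrete: a splat is just a record of numbers. A sketch of one such record (field names here are illustrative, not any particular library's schema):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Splat:
    """One Gaussian in a 3DGS scene: every field is plain data."""
    position: np.ndarray  # (3,) center in world space
    scale: np.ndarray     # (3,) ellipsoid axis lengths
    rotation: np.ndarray  # (4,) unit quaternion for orientation
    opacity: float
    color: np.ndarray     # (3,) RGB

# Unlike a NeRF, "editing the scene" is an ordinary data operation:
splat = Splat(position=np.zeros(3), scale=np.full(3, 0.1),
              rotation=np.array([1.0, 0.0, 0.0, 0.0]), opacity=0.8,
              color=np.array([0.9, 0.2, 0.2]))
splat.position += np.array([0.0, 1.0, 0.0])  # move it one unit up
print(splat.position)
```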
Residual Stream — In transformer architectures, the vector that flows through the model and gets additively updated by each layer. Interpretability research often focuses on what information is encoded in this stream at each layer.
World Model — A model that builds an internal representation of an environment and can simulate how it changes over time. Relevant for robotics, game AI, and spatial reasoning.
The World Labs Argument
World Labs' essay makes a bold claim: 3D representations are to spatial AI what code is to software.
The analogy goes like this:
- Code is an explicit, inspectable, editable artifact that separates reasoning (writing the algorithm) from execution (running it). It can be versioned, debugged, shared, and composed.
- 3D representations — meshes, Gaussian splats, scene graphs — can play the same role for spatial systems. They externalize structure in a form that humans and machines can both inspect and manipulate.
The alternative — collapsing everything into a single end-to-end model that maps inputs directly to pixels — is like asking a language model to be the program instead of writing it. It might work, but you lose all the affordances that make code powerful: inspectability, composability, reusability.
Their model, Marble, is built around this philosophy. It generates structured 3D outputs (Gaussian splats, meshes) rather than raw pixels. Their experimental interface Chisel lets you give coarse 3D layouts as input — walls, volumes, planes — which Marble then renders into rich, detailed scenes.
Why This Matters for 3D Interpretability
1. Gaussian Splats as Ground Truth Geometry
Most vision models give you outputs (pixels, bounding boxes, feature vectors) without any explicit geometric structure to compare against. Marble externalizes Gaussian splat parameters — position, covariance, opacity, color — as concrete artifacts.
This means you can do something rare in interpretability: correlate internal activations with explicit geometric ground truth. Does the model's residual stream encode splat positions linearly? Do specific attention heads track surface orientation? With exported splats, you have a reference to probe against.
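Concretely, the experiment is a multi-output linear probe from hidden states onto splat coordinates. A sketch with synthetic stand-ins for both sides (a real run would use Marble's residual stream and its exported splat positions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins: 500 tokens of a 32-d residual stream, and the 3-d
# positions of the splats those tokens hypothetically produce.
positions = rng.uniform(-1, 1, size=(500, 3))
mixing = rng.normal(size=(3, 32))
resid = positions @ mixing + 0.05 * rng.normal(size=(500, 32))

# Multi-output linear probe: residual stream -> splat positions.
W, *_ = np.linalg.lstsq(resid, positions, rcond=None)
pred = resid @ W
r2 = 1 - ((positions - pred) ** 2).sum(0) / \
         ((positions - positions.mean(0)) ** 2).sum(0)
print(r2)  # one R^2 per coordinate; near 1.0 means linearly encoded
```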
2. The Factorized Stack as a Dissection Surface
World Labs argues for a factorized architecture: separate components for perception, generation, and rendering, connected through 3D interfaces. Each handoff between modules is a natural interpretability seam.
At every boundary you can ask: what does this module "know" about 3D structure, and how is that knowledge encoded? This is mechanistic interpretability's core question, applied to a spatial pipeline where the module boundaries are explicit by design.
3. Chisel as a Causal Intervention Tool
Chisel — the coarse layout → rich scene interface — is almost a ready-made intervention setup.
In standard activation patching, you modify an internal representation and observe how outputs change. With Chisel, you can modify explicit input geometry (move a wall, resize a volume, add an object) and trace how that propagates through internal representations. It's behavioral interpretability without needing weight access — a spatial version of causal tracing.
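The experimental loop is simple to state: perturb the layout, regenerate, measure the change. A sketch of that loop, where `render_scene` is a hypothetical stand-in for the Chisel-to-Marble call (here a deterministic synthetic function, so the logic runs without API access):

```python
import numpy as np

def render_scene(layout):
    """Hypothetical stand-in for a layout -> scene generation call.
    A real experiment would hit the generation API; here the output
    is a deterministic function of the layout so the loop is runnable."""
    seed = abs(hash(tuple(layout.flatten().round(3)))) % 2**32
    rng = np.random.default_rng(seed)
    return rng.normal(size=(8, 8))  # pretend "rendered" feature map

base = np.array([[0.0, 0.0], [4.0, 0.0]])           # two wall endpoints
moved = base + np.array([[0.5, 0.0], [0.5, 0.0]])   # shift the wall 0.5 units

# Causal-tracing-style question: how much does the output move
# when we move a known piece of input geometry?
delta = np.linalg.norm(render_scene(moved) - render_scene(base))
print(delta > 0)
```

With real outputs you would compare in a perceptual or geometric feature space rather than raw arrays, but the control flow is the same.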
4. The Scene Graph Hypothesis
The most theoretically interesting question the World Labs framing raises: does Marble internally maintain something like a scene graph?
A scene graph separates geometric structure (where things are, how they relate spatially) from appearance (materials, lighting, texture). If the model has learned this factorization internally — even without being explicitly trained to — you'd expect to find:
- An interpretable subspace encoding layout, orthogonal to one encoding appearance
- View-invariant geometry features that persist across different camera angles
- Causal separation: editing geometry activations changes structure but not style, and vice versa
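The causal-separation prediction can be stated as a few lines of linear algebra. A sketch that constructs activations satisfying the hypothesis (orthogonal geometry and appearance subspaces, built here by QR decomposition) and then verifies the prediction directly:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothesized factorization: activation = geometry part + appearance
# part, living in orthogonal subspaces. Build a toy instance and test
# the "editing geometry leaves style unchanged" prediction.
d = 12
basis, _ = np.linalg.qr(rng.normal(size=(d, d)))
U_geom, U_app = basis[:, :4], basis[:, 4:8]  # orthogonal by construction

act = U_geom @ rng.normal(size=4) + U_app @ rng.normal(size=4)

def read_appearance(a):
    return U_app.T @ a  # linear "style" readout

# Edit only the geometry component...
edited = act + U_geom @ np.array([1.0, -2.0, 0.5, 0.0])

# ...and the appearance readout is unchanged, as the hypothesis predicts.
print(np.allclose(read_appearance(edited), read_appearance(act)))  # True
```

In the real experiment the subspaces come from probes trained on the model, and the test is whether edits along the fitted geometry directions leave style-sensitive outputs fixed.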
Testing this would be a clean, novel contribution at the intersection of the World Labs framing and mechanistic interpretability methodology.
The Research Agenda
For a 3D interpretability lab with access to Marble's weights or API, here's what this opens up:
With weights (mechanistic):
- Activation patching across the generation pipeline to locate geometry-encoding layers
- Linear probing for depth ordering, surface normals, occlusion relationships
- Viewpoint-invariance analysis: which features survive camera transformations?
- Searching for a "scene graph circuit" — a set of components that collectively implement layout/appearance factorization
With API only (behavioral):
- Chisel perturbation experiments as proxy interventions
- Contrastive prompting to isolate geometric vs. semantic knowledge
- Sensitivity mapping: how much does output change per unit of input geometry change?
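Sensitivity mapping reduces to finite differences over the input geometry. A sketch, where `scene_embedding` is a hypothetical stand-in for "generate a scene and embed the output" (a real run would call the API plus a feature extractor; a smooth synthetic function keeps it runnable):

```python
import numpy as np

def scene_embedding(wall_x):
    """Hypothetical stand-in: map a wall position to an output embedding."""
    t = np.linspace(0.0, 1.0, 16)
    return np.sin(wall_x * t) + 0.5 * np.cos(3.0 * wall_x * t)

# Sensitivity mapping: output change per unit of geometry change,
# estimated by central finite differences at several wall positions.
eps = 1e-4
for x in [1.0, 2.0, 3.0]:
    grad = (scene_embedding(x + eps) - scene_embedding(x - eps)) / (2 * eps)
    print(x, round(float(np.linalg.norm(grad)), 3))
```

Regions where the sensitivity spikes or collapses are the interesting ones: they suggest thresholds or invariances in how the model treats geometry.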
Why Now
Three things are converging:
- Mechanistic interpretability methodology is mature enough to apply to new domains — transformers, circuits, probing, causal tracing all have established tooling
- World models with explicit 3D structure (like Marble) are newly available, giving interpretability researchers the handles they've lacked
- The stakes are rising — as world models get used in robotics, digital twins, and simulation, understanding what they internally represent becomes a safety-relevant question, not just an academic one
The World Labs essay frames this as an engineering choice. For interpretability researchers, it's an invitation.
Further Reading
- 3D as Code — World Labs Blog
- Marble World Model — World Labs
- World Labs API
- Circuits — Distill.pub
- Transformer Circuits Thread
- Gaussian Splatting Paper (Kerbl et al., 2023)
- NeRF: Representing Scenes as Neural Radiance Fields
This article is part of ongoing research at the 3D Interpretability Lab, developed under oourmind.io. If you're working on spatial interpretability and want to collaborate, reach out.
Tags: interpretability 3d machinelearning worldmodels aisafety neuralnetworks gaussiansplatting computervision