I wanted to describe a network, not assemble it: FlowBuilder in flodl

Last post was why. This one is what it looks like.

The thing I said at the end of the last post was: with flodl I don't rewrite when I pivot. I add or remove a graph member. This post is about the primitive that makes that sentence true. Meet FlowBuilder. It's a declarative graph DSL for neural networks, and it's the API I'd find hardest to give up.

The gap

By my third Python pivot, the wiring code outweighed the model. Freezing submodules, loading partial checkpoints, rerouting a tensor through a newly-inserted path, unfreezing for a finetune: each of these was three to ten lines of procedural glue that had nothing to do with the architecture. The model was in there somewhere, but finding it meant reading past everything else first.

What I wanted was simple. I wanted to describe the network. What's its structure? What's tagged? What's frozen? What loads from where? And then I wanted the framework to handle the wiring.

Procedurally assembling a network from module instances and class hierarchies is fine when the shape stays stable. Mine wasn't. A shape that pivots every two days and nests frozen subgraphs inside other frozen subgraphs doesn't want to be a script. It wants to be a graph.

What FlowBuilder looks like

Here's a small model with a tagged hidden activation and a residual connection:

let model = FlowBuilder::from(Linear::new(784, 128)?)
    .through(GELU)
    .tag("hidden")
    .through(LayerNorm::new(128)?)
    .also(Linear::new(128, 128)?)
    .tag("residual")
    .through(Linear::new(128, 10)?)
    .build()?;

Top to bottom, the architecture is visible in the code. No construction state to hold in your head; the structure is the text.

The method names carry the intent:

  • from(...) starts the flow with an entry module.
  • through(...) chains a module in series. Stream in, stream out.
  • tag("name") marks the current stream position for later reference: observation, freezing, checkpoint loading.
  • also(Linear::new(...)) adds a residual: output = stream + module(stream).
  • build() finalizes and validates. Unmerged streams and cycles surface as errors at build time, not at forward time.

There's more in the vocabulary. fork for side-branches that don't disturb the main stream. split with merge(MergeOp::Add) or merge(MergeOp::Mean) for parallel branches that recombine. switch and gate for routing. loop_body for iteration. map for applying a body across slices or tagged collections. The thing I care about is that the builder stays flat. A complex graph is more lines, not more indentation.
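
To make that concrete, here's a hedged sketch of a split that recombines by mean. Only split, merge(MergeOp::Mean), and the chaining calls come from the vocabulary above; the branch(...) closures are my shorthand for filling each parallel arm, and the dimensions are placeholders.

let model = FlowBuilder::from(Linear::new(128, 128)?)
    .split()                                        // open parallel branches from the current stream
    .branch(|b| b.through(Linear::new(128, 64)?)    // hypothetical helper for one arm
                 .through(GELU))
    .branch(|b| b.through(Linear::new(128, 64)?))   // second arm
    .merge(MergeOp::Mean)                           // recombine the arms elementwise
    .through(Linear::new(64, 10)?)
    .build()?;

Two arms, one merge, and the chain keeps reading top to bottom.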

When I pivot a shape, I add or remove lines. The rest of the build doesn't move.

The graph renders itself

A Graph carries enough structural information to draw itself. One method call:

graph.svg(Some("model.svg"))?;

That writes an SVG with modules as nodes and stream connections as edges. Tags appear as annotations on the nodes they mark, and parallel-execution levels are grouped as clusters. For training loops, svg_with_profile(...) colors nodes by measured forward-pass time, so the hot path is visible instead of guessed at.

(Image: FlowBuilder SVG output showing a tagged residual block)

Layout runs through graphviz. It works well up to a few dozen nodes. Past that the visualization gets dense and I start squinting. That's one of the edges I'm still working on. The DOT output is available raw for people who want to pipe it through their own tooling.

Graph trees

Here's the part I think matters most, because no other DL framework I know has it in quite this shape.

A Graph implements Module. That means a Graph can be fed into another FlowBuilder anywhere a module is expected:

let encoder = FlowBuilder::from(Linear::new(32, 64)?)
    .through(GELU)
    .tag("hidden")
    .label("encoder")
    .build()?;

let model = FlowBuilder::from(encoder)
    .through(Linear::new(64, 10)?)
    .build()?;

.label("encoder") registers the inner graph as a named child of the outer. Once composed, the inner graph's structure is addressable from the outer scope via label paths:

model.freeze("encoder")?;                              // freeze every parameter in the subgraph
model.load_subgraph_checkpoint("encoder", path)?;      // load weights into just that subgraph
model.tagged_at("encoder.hidden")?;                    // read the tagged activation across the boundary
model.subgraph("encoder")?;                            // recover the child Graph

Nesting composes. An encoder inside a classifier inside a multi-head pipeline gives you paths like head.classifier.encoder.hidden, and everything addressable by label keeps working at any depth. Freeze, thaw, load, observe, swap.
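
Here's that nesting spelled out as a sketch. The modules and dimensions are placeholders; only the label and tag addressing pattern comes from the encoder example above.

let encoder = FlowBuilder::from(Linear::new(32, 64)?)
    .through(GELU)
    .tag("hidden")
    .label("encoder")
    .build()?;

let classifier = FlowBuilder::from(encoder)
    .through(Linear::new(64, 16)?)
    .label("classifier")
    .build()?;

let head = FlowBuilder::from(classifier)
    .through(Linear::new(16, 10)?)
    .label("head")
    .build()?;

let model = FlowBuilder::from(head).build()?;

// The innermost tag stays addressable from the outermost scope.
model.freeze("head.classifier.encoder")?;
model.tagged_at("head.classifier.encoder.hidden")?;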

This is the primitive FBRL needed. A trained letter reader is a frozen Graph inside the word reader. A trained word reader is a frozen Graph inside the line reader. Each level addressable by name, each level's checkpoint loadable independently, gradients cleanly blocked at the frozen boundary.

Transfer learning. Multi-phase pretraining. Anywhere you're stitching trained components into larger architectures and you want the composition to stay legible in a year when you come back to it.

What it isn't yet

One real edge: when the wiring is wrong, the error messages are functional but not great. If you merge two branches with mismatched shapes, you get a shape-mismatch error. You do not get told which branch of which split produced the offender. For a short graph you eyeball it. For a deep graph you add prints. I have a list of places where the errors need to carry more structural context back out to the user. That's next-round work.

I flag this one because it's the rough edge I touch most often. The shape of the API is right; the ergonomics of the error path are what need sharpening.

What hooked me (again)

I started flodl because FBRL needed a composition primitive Python didn't give me cleanly. By the time FlowBuilder was working, I'd noticed I was solving a framework problem I cared about for its own sake. Ergonomics pulled me in first.

Then performance. Then distributed training. Then convergence under heterogeneous compute.

This is the part of the journey I didn't expect. I'll walk through the rest of it post by post.

Previous post: Why I built a Rust deep learning framework (and what I got wrong twice first).

Next post: how flodl actually benchmarks against PyTorch on real architectures, and what the libtorch FFI bet from last post actually buys you.

flodl: flodl.dev · github.com/flodl-labs/flodl · @flodl_dev
