DEV Community

Cover image for Dev Log: It generates! (Mostly. The strings are IOUs.)
Ernesto Herrera Salinas
Ernesto Herrera Salinas

Posted on

Dev Log: It generates! (Mostly. The strings are IOUs.)

I’m reviving Munchausen, a C# NuGet package I started 9 years ago. This is part 7 of an 8-part series documenting both the development process and the engineering decisions behind bringing the project back to life.

This is the Dev Log: the practical work, cleanup, implementation steps, and day-to-day progress behind this part of the project.

M6 is the milestone where Munchausen finally does the thing. Generate() runs.
There's a traversal that constructs an object, fills its members in order, runs
derivations, handles cancellation, wraps errors, and refuses to recurse forever.
After five milestones of plumbing, watching a definition actually produce an object
is a genuine hit of dopamine.

But first I had to answer a question I'd been dodging.

The chicken-and-egg

The runtime's whole job is to execute the plan, which means running generators. But
the real generators, the things that turn a FirstName into "Anthony", are
datasets, and datasets are the next milestone. The string type-default literally
is "two lorem words," and lorem lives in M7. So if I run the runtime against the
canonical Car, the moment it hits Make (a string), it throws "bound in M7."

This exposed a genuine fork in my milestone plan. I could build the runtime now
and defer dataset-backed generation to M7, or drag datasets forward and blur the
boundary. I chose the clean split: bind only the pure-PRNG type defaults in M6,
ints, bools, guids, doubles, anything made from the random stream alone, and let
strings, dates, and semantic generators arrive with the datasets they need.

The realization that made this comfortable was that almost every runtime
behavior, lifecycle order, cancellation, exception wrapping, depth limits,
cycle detection, and reproducibility, can be tested with primitive-only fixtures
and user delegates. M6 could be rigorous without pretending to be complete.

The surprise while capturing a golden

Here's my favorite moment of the milestone. I needed a golden for the runtime, so
I built a tiny model with neutral field names, Alpha, Beta, Gamma, Delta,
Epsilon, all primitive types, specifically to avoid the unbound semantic
generators. Ran it. Boom, "bound in M7."

Wait, what? They're all primitives. Which one tripped a semantic generator?

Epsilon. Because epsilon ends with lon, and lon is a catalog alias for
longitude, which is a double, and Epsilon is a double. The suffix matcher
did exactly what it was built to do and matched Epsi-LON to longitude. A
perfectly correct false positive, and a perfect little demonstration of why the
catalog uses confidence levels and modes in the first place. I renamed it Score
and moved on, grinning. The inference engine caught my own test off guard.

Getting construction right (and fast)

One detail I cared about: no reflection during generation. The accessors were already compiled back in M2. Construction was the last reflective holdout, and I'll fully fix that in M8, where the allocation test forces the issue, but M6 lays the groundwork: collection materializers are compiled to delegates at build time so the per-element loop never reflects.

And the exception story took real care. When your With delegate throws, you should see your exception as the InnerException, not some TargetInvocationException wrapper from reflection's DynamicInvoke. So user delegates are adapted with compiled Expression.Invoke, which throws cleanly. Wrap exactly once, tag it with the right phase (Construction / MemberPopulation /
Derivation), let cancellation pass straight through. Details, but they're the difference between an exception you can debug and one you curse at.

Where it leaves things

You can now generate primitive-and-nested graphs, in batches, reproducibly, with proper cancellation and depth/cycle safety. The lifecycle is exactly ordered, the goldens reproduce across runs, and a self-referential Node terminates cleanly at depth 1 instead of blowing the stack. The canonical Car still can't fully generate, its strings are IOUs, but every mechanism is real and tested.

What's next

M7: the datasets. The biggest milestone by far. Eight public dataset classes, the English data tables, VIN check digits, RFC-safe email domains, and the wiring that finally connects every semantic name to a real generator. It will test whether the abstractions from M4 through M6 really accept their missing leaves as cleanly as I hoped.

Top comments (0)