Ernesto Herrera Salinas
Engineering Post: Cashing the IOUs: eight datasets and a check-digit

M7 is the biggest milestone. It binds every leaf the earlier milestones left
as placeholders: eight public dataset classes, the English data tables, the
internal-only generators, Dataset<T>() resolution, and the wiring that connects
every semantic catalog name to a real implementation. This is where I find out
whether the abstractions from M4 through M6 can accept their missing leaves
without being redesigned.

The datasets

The eight are RandomData (the primitive façade over DeterministicRandom), NameData,
InternetData, AddressData, DateData, LoremData, VehicleData, and
CommerceData. I designed their public surfaces before implementation, then used
the PublicApiAnalyzer to check that the code introduced nothing extra.
Constructors are internal; callers reach datasets through the context, never
through new.

The most important dataset choice is that realism must remain fictionally safe:
email domains come only from the RFC 2606 reserved set
(example.com/org/net), IPs from TEST-NET documentation blocks, phone numbers
from the 555-01xx range, and VINs are structurally valid but not
manufacturer-registered. That last one is the fun bit:

// 17 chars, no I/O/Q, with a real check digit at position 9
characters[8] = ComputeCheckDigit(characters);  // weighted transliteration mod 11

The contract test recomputes the check digit with an independent implementation and
asserts it matches. Because the two use different code paths, a switch and a
lookup string, this is a genuine cross-check rather than a tautology.
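A minimal sketch of that cross-check, assuming the standard North American VIN scheme (position weights 8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2, and a remainder of 10 written as X). ValueBySwitch and ValueByLookup are hypothetical names, and the extra valueOf parameter on ComputeCheckDigit is an assumption made here so both transliteration paths can share one routine; the library's real signatures may differ.

```csharp
using System;

// Path 1: transliterate a VIN character with a switch expression.
static int ValueBySwitch(char c) => c switch
{
    >= '0' and <= '9' => c - '0',
    'A' or 'J' => 1,
    'B' or 'K' or 'S' => 2,
    'C' or 'L' or 'T' => 3,
    'D' or 'M' or 'U' => 4,
    'E' or 'N' or 'V' => 5,
    'F' or 'W' => 6,
    'G' or 'P' or 'X' => 7,
    'H' or 'Y' => 8,
    'R' or 'Z' => 9,
    _ => throw new ArgumentException($"'{c}' is not a valid VIN character."),
};

// Path 2: transliterate with a lookup string; the index modulo 10 is the value.
// The dots pad the positions of I, O, Q and keep the letter groups aligned.
static int ValueByLookup(char c)
{
    const string map = "0123456789.ABCDEFGH..JKLMN.P.R..STUVWXYZ";
    int index = c == '.' ? -1 : map.IndexOf(c);
    if (index < 0) throw new ArgumentException($"'{c}' is not a valid VIN character.");
    return index % 10;
}

// Weighted sum mod 11; the check-digit slot (index 8) is skipped because its weight is 0.
static char ComputeCheckDigit(ReadOnlySpan<char> vin, Func<char, int> valueOf)
{
    if (vin.Length != 17) throw new ArgumentException("A VIN has exactly 17 characters.");
    int[] weights = { 8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2 };
    int sum = 0;
    for (int i = 0; i < vin.Length; i++)
    {
        if (i == 8) continue;
        sum += valueOf(vin[i]) * weights[i];
    }
    int remainder = sum % 11;
    return remainder == 10 ? 'X' : (char)('0' + remainder);
}

// Fill position 9 of a commonly cited sample VIN, then cross-check with the other path.
char[] characters = "1M8GDM9A_KP042788".ToCharArray();
characters[8] = ComputeCheckDigit(characters, ValueBySwitch);
Console.WriteLine(new string(characters)); // 1M8GDM9AXKP042788
Console.WriteLine(characters[8] == ComputeCheckDigit(characters, ValueByLookup)); // True
```

The two transliteration paths share only the weights and the modulus, so a typo in either table shows up as a mismatch rather than passing silently.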

Binding the catalog

In M4 the semantic catalog stored generator names ("Name.First",
"Internet.Email"). M7 maps each name to a real delegate:

["Name.First"]     = c => c.Name.First(),
["Internet.Email"] = c => c.Internet.Email(),
["Vehicle.Vin"]    = c => c.Dataset<VehicleData>().Vin(),
["internal phone"] = InternalGenerators.Phone,
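For readers who have not followed the earlier milestones, here is a self-contained sketch of the same name-to-delegate idea. Everything in it is a stand-in: System.Random plays the role of the generation context, the sample data and the Resolve helper are hypothetical, and throwing on an unbound name is an assumption about how a missing binding might be reported, not a description of the library's behaviour.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data; the real library reads from its English data tables.
string[] firstNames = { "Ada", "Grace", "Linus" };

// Catalog names map to delegates that draw from the supplied random source.
var bindings = new Dictionary<string, Func<Random, object>>(StringComparer.Ordinal)
{
    ["Name.First"] = rng => firstNames[rng.Next(firstNames.Length)],
    ["Internet.Email"] = rng => $"user{rng.Next(1000)}@example.com",
};

// Look a name up once; an unbound name fails here rather than in the middle of generation.
Func<Random, object> Resolve(string name) =>
    bindings.TryGetValue(name, out var generator)
        ? generator
        : throw new KeyNotFoundException($"No generator is bound to catalog name '{name}'.");

var emailGenerator = Resolve("Internet.Email");
Console.WriteLine(emailGenerator(new Random(7)));
```

Resolving the delegate once and reusing it keeps the string lookup out of the per-value path.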

Dates need a twist: the catalog's date generators yield DateTimeOffset, but the
member might be a DateTime or DateOnly, so the compiler adapts the generator to
the member's declared type. The type-default generators were extended too: string
becomes Lorem.Words(2), Uri becomes an Internet.Url, and the date/time family
binds to DateData.
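The date adaptation can be sketched as a small wrapper around the source generator. AdaptDate is a hypothetical name, and converting through UtcDateTime is an assumption made for this example; the library may well pick a different conversion (local time, for instance).

```csharp
using System;

// Wrap a DateTimeOffset generator so it yields the member's declared type.
static Func<object> AdaptDate(Func<DateTimeOffset> source, Type memberType)
{
    if (memberType == typeof(DateTimeOffset)) return () => source();
    if (memberType == typeof(DateTime)) return () => source().UtcDateTime;
    if (memberType == typeof(DateOnly)) return () => DateOnly.FromDateTime(source().UtcDateTime);
    throw new NotSupportedException($"No date adaptation is defined for {memberType.Name}.");
}

// A fixed instant keeps the example deterministic: 22:30 at UTC-5 is 03:30 UTC the next day.
var instant = new DateTimeOffset(2024, 3, 15, 22, 30, 0, TimeSpan.FromHours(-5));
object asDateOnly = AdaptDate(() => instant, typeof(DateOnly))();
Console.WriteLine(asDateOnly.GetType().Name); // DateOnly
```

Because the wrapper calls the source exactly once per value, adapting the type does not change how many draws a member consumes.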

A determinism-preserving sleight of hand

GenerationContext.Random became the public RandomData façade, but it wraps the
same DeterministicRandom the type generators were already using. Since
RandomData.Int just forwards to DeterministicRandom.Int, the random stream is
byte-for-byte identical to M6, and the M6 golden still passes untouched. Datasets are
instantiated once per operation and cached, all drawing from that single stream, so
"every dataset method draws only from the operation stream" is structurally true, and a determinism test confirms that two same-seed operations produce identical output.
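The forwarding argument can be illustrated with a toy comparison. In this sketch System.Random with a fixed seed stands in for DeterministicRandom, and DrawDirect and DrawThroughFacade are hypothetical helpers; the point is only that a wrapper which forwards each call unchanged cannot shift the sequence.

```csharp
using System;
using System.Linq;

// Reference run: draw straight from the core generator.
static int[] DrawDirect(int seed, int count)
{
    var core = new Random(seed);
    return Enumerable.Range(0, count).Select(_ => core.Next(0, 100)).ToArray();
}

// Facade run: each call is forwarded unchanged and consumes no extra values.
static int[] DrawThroughFacade(int seed, int count)
{
    var core = new Random(seed);
    Func<int, int, int> facadeInt = (min, max) => core.Next(min, max);
    return Enumerable.Range(0, count).Select(_ => facadeInt(0, 100)).ToArray();
}

// Same seed, same number of draws: the two sequences match element for element.
Console.WriteLine(DrawDirect(42, 5).SequenceEqual(DrawThroughFacade(42, 5))); // True
```

A façade that drew even one extra value, say to validate its arguments, would shift every later output and break the golden files, which is why pure forwarding matters here.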

The tooling mystery, solved

Remember M5's "duplicate source file" error that broke dotnet format? Here's the
diagnosis: the analyzer package already auto-includes PublicAPI.*.txt, and my
M0 csproj also added them explicitly via <AdditionalFiles>. Double-registered →
dotnet format choked. Deleting that one ItemGroup fixed it, and dotnet format
happily generated all 75 new dataset surface entries, correctly, down to
int.MinValue written out as -2147483648. A whole class of future pain, gone, because I
finally read the warning properly.

What's next: M8, the finale

The last milestone tests the complete design: Explain() makes inference
inspectable, the cached Lie<T> path removes configuration, reflection-free
construction faces an allocation test, and final goldens cover the complete
pipeline. After M8, I should know whether v1.0 works as one coherent library.
