PRML v0.1 is a small specification I drafted three weeks ago. It binds an ML evaluation claim — (metric, comparator, threshold, dataset hash, random seed, producer) — to a SHA-256 digest computed over canonical YAML bytes, before the experiment runs. The spec is at spec.falsify.dev/v0.1. The Python reference implementation is on GitHub. v0.2 freezes 2026-05-22.
A specification with one implementation is indistinguishable from that implementation's bugs. So this past weekend I sat down and built a second reference implementation, in Node.js, from scratch. The goal: take the prose spec, ignore the Python source, and produce byte-identical canonical bytes for all twelve v0.1 conformance vectors.
It worked. 12/12 vectors pass byte-for-byte. The implementation is 404 lines of JavaScript with zero runtime dependencies beyond the Node.js standard library. You can run it from impl/js/falsify.js.
What's interesting is what didn't work the first time. The exercise surfaced three quiet portability gotchas — places where the spec's prose and the spec's twelve vectors silently disagreed about what the bytes should be. Each of them is a real defect in the v0.1 specification, and each is now an action item for v0.2.
This post walks through the three findings.
Finding 1 — Sixty-four-bit integer precision
The first failing vector was TV-006: seed: 18446744073709551615. That's $2^{64} - 1$, the largest unsigned 64-bit integer the v0.1 spec allows for the seed field.
A naive Node.js implementation parses this with JSON.parse, which produces a Number. JavaScript's Number is IEEE-754 binary64. The largest integer you can safely represent in binary64 is $2^{53} - 1$, which is about $9 \times 10^{15}$. Above that, integers round to the nearest representable float.
So when Node.js read the test vector input file, the seed 18446744073709551615 quietly became the nearest binary64 value, which is exactly $2^{64}$. JavaScript prints that double as 18446744073709552000, the shortest decimal string that round-trips to it. That printed number is 385 more than the seed the test vector specified. The canonicalizer then dumped the wrong number, and the hash didn't match.
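You can check the arithmetic in Node.js itself. The snippet below is a minimal illustration and is not code from falsify.js. It shows that the parsed seed is exactly $2^{64}$, which is one more than the literal, and how that value prints.

```javascript
// The seed from TV-006, written as a bare decimal literal inside JSON text.
const parsed = JSON.parse('{"seed": 18446744073709551615}').seed;

// Largest integer binary64 can represent without gaps: 2^53 - 1.
console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991

// The parsed value is the nearest double, which is exactly 2^64.
console.log(parsed === 2 ** 64); // true
console.log(BigInt(parsed) - 18446744073709551615n); // 1n

// Number-to-string conversion emits the shortest decimal that round-trips.
console.log(String(parsed)); // 18446744073709552000
```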
The same problem hits any language whose native integer type stops short of $2^{64} - 1$. Go's int64 and Java's long both top out at $2^{63} - 1$.
| Language | Native integer ceiling | TV-006 round-trips? |
|---|---|---|
| Python 3 | unbounded | yes |
| JavaScript Number | $2^{53} - 1$ | no |
| Go int64 | $2^{63} - 1$ | no |
| Java long | $2^{63} - 1$ | no |
| Rust u64 | $2^{64} - 1$ | yes |
The PyYAML-based Python reference implementation works only because Python's int is arbitrary-precision. The v0.1 spec does not mention this anywhere.
The fix in the Node.js implementation: parse the JSON text with a regex that wraps any 16-or-more-digit integer in a sentinel string before JSON.parse sees it, then unwrap to BigInt after parse. Twenty lines of JavaScript that no spec reader could have predicted from the prose.
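Here is a minimal sketch of that sentinel trick. It assumes the input is JSON whose long integers appear only in value position and never inside string literals. The SENTINEL prefix and the parseWithBigInts name are hypothetical and are not taken from falsify.js.

```javascript
// Hypothetical sentinel prefix; assumes no real string value begins with it.
const SENTINEL = "__bigint__:";

// Parse JSON text, turning every bare integer of 16+ digits into a BigInt.
// Assumption: such digit runs never occur inside string literals.
function parseWithBigInts(text) {
  // Quote long integers that follow ':', '[' or ',' and end before ',', ']' or '}',
  // so JSON.parse never converts them to a lossy Number.
  const wrapped = text.replace(
    /([:\[,]\s*)(-?\d{16,})(?=\s*[,\]}])/g,
    (_match, prefix, digits) => `${prefix}"${SENTINEL}${digits}"`
  );
  // The reviver turns sentinel-tagged strings back into exact BigInts.
  return JSON.parse(wrapped, (_key, value) =>
    typeof value === "string" && value.startsWith(SENTINEL)
      ? BigInt(value.slice(SENTINEL.length))
      : value
  );
}

const vector = parseWithBigInts('{"seed": 18446744073709551615, "threshold": 0.85}');
console.log(vector.seed); // 18446744073709551615n
console.log(vector.threshold); // 0.85
```

The 16-digit cutoff is deliberately conservative. $2^{53} - 1$ itself has 16 digits, so some safe values also become BigInt, but no unsafe value can slip through as a Number.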
The fix for v0.2: make seed a quoted decimal string in the canonical form: seed: '18446744073709551615'. Languages with weak integer types now get a string and can opt into BigInt themselves. The format is unambiguous from the bytes alone.
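Under that proposal, emitting and reading the seed becomes plain string handling. This is a minimal sketch with several assumptions. The canonical line has the form seed: '<decimal digits>'. Seeds are unsigned and bounded by $2^{64} - 1$, as in v0.1. Leading zeros are disallowed, which is an extra assumption. emitSeedLine and readSeed are hypothetical helper names.

```javascript
// Upper bound from v0.1; the unsigned lower bound of 0 is an assumption.
const SEED_MAX = 2n ** 64n - 1n;

// Hypothetical emitter: accepts a BigInt or a decimal string.
function emitSeedLine(seed) {
  const n = BigInt(seed);
  if (n < 0n || n > SEED_MAX) throw new RangeError(`seed out of range: ${n}`);
  return `seed: '${n.toString(10)}'`;
}

// Hypothetical reader: only an unsigned decimal without leading zeros is accepted.
function readSeed(line) {
  const m = /^seed: '(0|[1-9][0-9]*)'$/.exec(line);
  if (!m) throw new SyntaxError(`not a canonical seed line: ${line}`);
  return BigInt(m[1]); // the caller opts into BigInt; the bytes carry no precision limit
}

const line = emitSeedLine(18446744073709551615n);
console.log(line); // seed: '18446744073709551615'
console.log(readSeed(line)); // 18446744073709551615n
```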
Finding 2 — Integer-valued floats lose their type
The next failing vector was TV-008: a manifest with threshold: 1.0.
The expected canonical bytes contain threshold: 1.0. The actual produced bytes contain threshold: 1. The hash differed. This bothered me for ten minutes.
It turns out that a JSON parser which maps every number onto a single numeric type loses the float-ness of 1.0. JavaScript's JSON.parse does exactly that. It returns the Number 1, which is indistinguishable at runtime from the integer 1. When a YAML emitter then serialises that number, it has no signal that the producer wrote 1.0 rather than 1. So it emits 1, and the hash drifts.
PyYAML doesn't have this problem because its load-and-dump cycle uses Python's native float type, which keeps 1.0 a float and writes it back as 1.0. JavaScript's Number cannot carry that distinction.
This is a property of the JSON format itself. JSON's data model has a single number type and does not distinguish integer-valued floats from integers. Unless a parser deliberately preserves the source text, the information is destroyed at parse time, before any canonicalizer runs.
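The loss is easy to reproduce in Node.js. The JSON text says 1.0, but the parsed value keeps no trace of it.

```javascript
// Two JSON texts that differ only in the ".0".
const a = JSON.parse('{"threshold": 1.0}').threshold;
const b = JSON.parse('{"threshold": 1}').threshold;

console.log(a === b); // true: both are the Number 1
console.log(Number.isInteger(a)); // true
console.log(String(a)); // prints 1; the ".0" is gone before any emitter runs
console.log(JSON.stringify({ threshold: a })); // {"threshold":1}
```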
The fix in the Node.js implementation: a small "this field should always render as a float" set, currently containing one element: {'threshold'}. The canonicalizer checks the field name and forces .0 when the value is integer-valued. A field-specific hack.
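Here is a minimal sketch of that field-aware override. It assumes the manifest has already been parsed into key/value pairs. renderScalar and the n_samples field are hypothetical and are not names from falsify.js.

```javascript
// Fields whose numeric values must always render as floats.
const FLOAT_FIELDS = new Set(["threshold"]);

function renderScalar(key, value) {
  if (FLOAT_FIELDS.has(key) && typeof value === "number" && Number.isInteger(value)) {
    // Restore the ".0" that JSON.parse discarded.
    return value.toFixed(1);
  }
  // Everything else uses JavaScript's default conversion; this sketch assumes
  // it agrees with PyYAML's output for the values that actually occur.
  return String(value);
}

console.log(`threshold: ${renderScalar("threshold", 1)}`); // threshold: 1.0
console.log(`threshold: ${renderScalar("threshold", 0.85)}`); // threshold: 0.85
console.log(`n_samples: ${renderScalar("n_samples", 3)}`); // n_samples: 3 (hypothetical integer field)
```

The hack only works because the emitter knows the field name. That is why the next fix moves the rule into the bytes instead.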
The fix for v0.2: specify that threshold always renders with at least one decimal place in the canonical form. Two lines in the spec close it. No field-aware emitter logic required.
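Because the rule is stated in terms of bytes, a conformance checker can verify it without knowing anything about the emitter. This is a minimal sketch with these assumptions:

- The v0.2 wording ends up as "at least one digit after the decimal point."
- Exponent notation stays out of scope.
- Leading zeros are disallowed, which is an extra assumption.

THRESHOLD_LINE and isCanonicalThresholdLine are hypothetical names.

```javascript
// Optional minus sign, integer digits without leading zeros, a '.', and at
// least one fractional digit. Exponents are deliberately not handled here.
const THRESHOLD_LINE = /^threshold: -?(0|[1-9][0-9]*)\.[0-9]+$/;

function isCanonicalThresholdLine(line) {
  return THRESHOLD_LINE.test(line);
}

console.log(isCanonicalThresholdLine("threshold: 1.0")); // true
console.log(isCanonicalThresholdLine("threshold: 0.85")); // true
console.log(isCanonicalThresholdLine("threshold: 1")); // false: no decimal place
console.log(isCanonicalThresholdLine("threshold: 1.")); // false: no fractional digit
```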
Finding 3 — "Plain scalar" disagreements
The third failing case was the same vector, TV-008: comparator: ==.
The expected canonical bytes have comparator: ==. JavaScript's js-yaml library produced comparator: '==', single-quoted. SHA-256 is unforgiving: two extra quote characters produce a completely different hash.
YAML 1.1 and 1.2 both have a notion of "plain scalars": strings that don't need quotes because they contain no characters or patterns that would confuse the parser. A long list of rules governs whether a particular string can be plain: must not start with an indicator character (-, ?, :, ,, [, ], {, }, #, &, *, !, |, >, ', ", %, @, `), must not contain colon-space, must not look like a number/boolean/null/timestamp, must not have leading/trailing whitespace, etc.
PyYAML and js-yaml implement this predicate with subtly different conservatism. PyYAML accepts == as a plain scalar because none of the rules fire — there is no indicator character, no number resolution, no timestamp pattern. js-yaml is more defensive: it sees a string that could be confusing and quotes it.
For >= and >, both libraries quote, because the leading > is in the indicator set. The < character is not an indicator, but for <= and < the two libraries still reach the same answer, so those cases work too. Only == differs.
The fix in the Node.js implementation: I rewrote the plain-scalar predicate from scratch, in about fifty lines, matching PyYAML's behaviour. It checks for indicator prefixes, leading/trailing whitespace, colon-space and hash-space, the number-resolution regex, the boolean/null set, the timestamp regex, and control characters that force escaping. With this hand-rolled predicate, TV-008 reproduces.
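A condensed sketch of a predicate along those lines follows. Its number, timestamp and reserved-word patterns are simplified approximations of YAML 1.1 resolution, not a faithful port of PyYAML's emitter. isPlainSafe is a hypothetical name.

```javascript
// Characters that may not begin a plain scalar (the indicator set listed above).
const INDICATORS = new Set([..."-?:,[]{}#&*!|>'\"%@`"]);

// Simplified YAML 1.1 implicit-resolution patterns: a string matching any of
// these would be read back as a non-string, so it must be quoted.
const LOOKS_LIKE_NUMBER =
  /^[-+]?(\.[0-9]+|[0-9][0-9_]*(\.[0-9_]*)?)([eE][-+]?[0-9]+)?$|^[-+]?\.(inf|Inf|INF)$|^\.(nan|NaN|NAN)$/;
const LOOKS_LIKE_TIMESTAMP = /^[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}/;
// Booleans, nulls, and the YAML 1.1 merge ("<<") and value ("=") keys.
const RESERVED = new Set([
  "yes", "Yes", "YES", "no", "No", "NO",
  "true", "True", "TRUE", "false", "False", "FALSE",
  "on", "On", "ON", "off", "Off", "OFF",
  "null", "Null", "NULL", "~", "<<", "=",
]);

function isPlainSafe(s) {
  if (s.length === 0 || INDICATORS.has(s[0])) return false; // empty or indicator prefix
  if (s !== s.trim()) return false; // leading/trailing whitespace
  // ": " looks like a mapping, " #" starts a comment, a trailing ":" looks like a key.
  if (s.includes(": ") || s.includes(" #") || s.endsWith(":")) return false;
  if (LOOKS_LIKE_NUMBER.test(s) || LOOKS_LIKE_TIMESTAMP.test(s)) return false;
  if (RESERVED.has(s)) return false;
  if (/[\x00-\x1f\x7f]/.test(s)) return false; // control characters need escaping
  return true;
}

for (const c of ["==", ">=", "<=", "accuracy", "1.0", "yes"]) {
  console.log(JSON.stringify(c), isPlainSafe(c));
}
```

Run against the comparators, this sketch reports == as plain-safe and >= as needing quotes, which matches the PyYAML behaviour described above. A production predicate has to reproduce every rule PyYAML applies, and that reverse-engineering burden is what the always-quote option eliminates.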
The fix for v0.2: publish a formal canonicalization grammar. Or, simpler and more aggressive: drop the plain-scalar concept entirely and always single-quote every string scalar in the canonical form. The output is ~10% larger; the ambiguity surface is zero. No predicate needed; no second implementation reverse-engineering an emitter.
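For comparison, the always-quote option needs almost no logic. This minimal sketch assumes YAML's single-quoted style, where the only escape is doubling the quote character. It rejects control characters outright, because single-quoted scalars cannot express escapes and line breaks inside them are folded on read. quoteScalar and the sample values are hypothetical.

```javascript
// Hypothetical always-quote emitter for string scalars.
function quoteScalar(s) {
  // A real v0.2 grammar would have to decide how control characters are
  // handled; this sketch simply refuses them.
  if (/[\x00-\x1f\x7f]/.test(s)) {
    throw new RangeError("control characters are out of scope for this sketch");
  }
  // An embedded ' is written as '' inside a single-quoted scalar.
  return `'${s.replace(/'/g, "''")}'`;
}

const samples = [
  ["comparator", "=="],
  ["comparator", ">="],
  ["metric", "top-1 accuracy"],
  ["producer", "O'Brien lab"],
];
for (const [key, value] of samples) {
  console.log(`${key}: ${quoteScalar(value)}`);
}
// comparator: '=='
// comparator: '>='
// metric: 'top-1 accuracy'
// producer: 'O''Brien lab'
```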
What this exercise really proves
It does not prove that PRML is bulletproof. It proves that PRML is implementable in a second language — which, at the v0.1 stage, was not yet established. A specification existing in only one implementation is indistinguishable from that implementation's bugs. PRML is now demonstrably more than that.
It also does not prove that all PyYAML edge cases are covered. The Node.js implementation matches the twelve current vectors, which exercise specific cases. Adding new vectors (Unicode normalisation, control characters, very long strings, unusual line-folding) might reveal further divergences.
The general lesson: a content-addressed format has to be specified in terms of the bytes it produces, not in terms of the emitter that produces them. PyYAML's safe_dump is a stable, careful, twenty-year-old emitter. It is not a specification. The next time someone wants to write a content-addressed YAML format — for SBOMs, for build provenance, for AI evaluation claims, anything — write the canonicalization grammar first, and then implement it. Don't describe an emitter; describe bytes.
v0.2 action items, summarised
The findings translate to three concrete v0.2 specification changes:
- seed is a quoted decimal string. Closes 64-bit integer precision portability.
- threshold always renders with at least one decimal place. Closes integer-valued float type loss.
- Always-quoted string scalars. Eliminates the plain-scalar predicate ambiguity entirely.
Plus a fourth, broader change:
- Publish a formal canonicalization grammar in ABNF. With the always-quoted rule, the grammar is short — about forty production rules. It becomes the source of truth for conformance, replacing the implicit "PyYAML's behaviour" reference.
The full v0.2 roadmap is at spec/v0.2/ROADMAP.md. It covers six other changes: algorithm agility, tolerance, multi-claim manifests, mandatory signatures for high-risk Annex III, twelve new conformance vectors, and a sidecar format extension. The freeze is targeted for 2026-05-22, three weeks from this writing. The five open RFC questions in the roadmap are where outside opinion would carry the most weight.
How to read along
If you want to see the artefacts directly:
- The Node.js implementation: impl/js/falsify.js (404 LOC, MIT).
- The portability findings document: spec/analysis/canonicalization-portability-v0.1.md.
- The conformance suite: spec/test-vectors/v0.1/ (JSON, twelve entries with locked digests).
- The v0.1 spec: spec.falsify.dev/v0.1.
- The arXiv preprint (working draft): spec/paper/ (14-page LaTeX, CC BY 4.0).
- Public review thread: GitHub Discussion #6.
If you want to add a third implementation in a third language — Rust, Go, Java, Swift, OCaml — the test vectors are the contract. If your canonicalizer reproduces all twelve byte-for-byte, your implementation is conformant. Open a PR; I'll add it.
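The contract can be checked mechanically. Canonicalize each input, hash the UTF-8 bytes with SHA-256, and compare the result against the locked digest. This is a minimal harness sketch. It assumes each vector exposes an input and an expected hex digest. The field names id, input and sha256 and the runConformance helper are hypothetical, not the suite's actual schema.

```javascript
const { createHash } = require("node:crypto");

// Hex SHA-256 of the canonical text, hashed as UTF-8 bytes.
function sha256Hex(canonicalText) {
  return createHash("sha256").update(canonicalText, "utf8").digest("hex");
}

// Hypothetical harness: each vector is { id, input, sha256 } and `canonicalize`
// maps an input to canonical YAML text. Both shapes are assumptions.
function runConformance(vectors, canonicalize) {
  return vectors.map(({ id, input, sha256 }) => ({
    id,
    pass: sha256Hex(canonicalize(input)) === sha256,
  }));
}

// Toy vector: the identity "canonicalizer" applied to "abc", whose SHA-256
// is the well-known FIPS 180-2 example digest.
const toyVectors = [
  { id: "toy-1", input: "abc", sha256: "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad" },
];
const results = runConformance(toyVectors, (s) => s);
console.log(results.every((r) => r.pass) ? "conformant" : "not conformant");
```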
— Studio-11 (independent), hello@studio-11.co