JSON is not deterministic. Object key order, number formatting, and whitespace vary across serializers, languages, and runtime versions. For most applications this doesn't matter. For systems that sign, hash, or compare JSON by its raw bytes, that variance is a correctness failure.
RFC 8785 — the JSON Canonicalization Scheme (JCS) — defines a canonical form that eliminates this nondeterminism. It specifies lexicographic key sorting, ECMAScript-compatible number serialization, and I-JSON constraints to produce byte-identical output for logically equivalent data.
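A subtlety worth calling out: JCS defines "lexicographic" key ordering over UTF-16 code units, not over raw UTF-8 bytes or Unicode code points. A minimal Go sketch of that comparison (illustrative only, not json-canon's internal code):

```go
package main

import (
	"fmt"
	"sort"
	"unicode/utf16"
)

// lessUTF16 compares two property names as sequences of UTF-16 code
// units, the ordering RFC 8785 prescribes for object members.
func lessUTF16(a, b string) bool {
	ua, ub := utf16.Encode([]rune(a)), utf16.Encode([]rune(b))
	for i := 0; i < len(ua) && i < len(ub); i++ {
		if ua[i] != ub[i] {
			return ua[i] < ub[i]
		}
	}
	return len(ua) < len(ub)
}

func main() {
	// U+10000 encodes as the surrogate pair D800 DC00, so it sorts
	// *before* U+E000 under UTF-16 order even though its code point is
	// larger. A byte-wise UTF-8 comparison would order these two the
	// other way around.
	keys := []string{"\uE000", "\U00010000"}
	sort.Slice(keys, func(i, j int) bool { return lessUTF16(keys[i], keys[j]) })
	fmt.Printf("%q sorts first\n", keys[0])
}
```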
json-canon is an infrastructure-grade RFC 8785 implementation in pure Go. v0.2.0 was released on February 27, 2026. This post explains why it was built, the engineering decisions behind it, and who it is for.
The problem in practice
Consider two services that independently serialize the same data structure. One produces {"amount":1e2} and the other produces {"amount":100}. Both are valid JSON representing the same value. If either system signs or hashes the serialized bytes, the signatures will not match.
This class of problem appears wherever JSON participates in cryptographic operations: digital signatures over API payloads, content-addressed storage, deduplication pipelines, audit trail verification, and determinism proofs in replay systems. The root cause is always the same — JSON serializers do not guarantee a single byte representation for a given logical value.
RFC 8785 solves this by defining a canonical form. But an RFC is a specification, not an implementation. The implementation details matter.
Why another implementation
Existing JCS implementations generally fall into one of two categories: thin wrappers around a language’s built-in JSON serializer, or lenient parsers that accept malformed input and normalize it into canonical form.
Both approaches create problems for infrastructure use. A wrapper inherits the serialization behavior of whatever JSON library version is installed — behavior that can change between releases. A lenient parser that silently accepts malformed JSON means two systems may canonicalize the same malformed input differently, producing different outputs with no error reported.
json-canon takes a different approach. It owns every byte of its output through a hand-written strict parser and an ECMA-262-compatible number formatter implemented from scratch. It has zero external dependencies in go.mod. This eliminates the risk of formatting drift from upstream library changes.
The parser rejects invalid input immediately and never produces partial output. If the input is not well-formed JSON, the operation fails with a classified error. This is a deliberate design choice: a canonicalizer that accepts invalid input is a normalizer, and normalizers do not provide the determinism guarantees that signing and hashing pipelines require.
Failure taxonomy
Every rejection is classified into one of eleven stable failure classes: INVALID_UTF8, INVALID_GRAMMAR, DUPLICATE_KEY, LONE_SURROGATE, NONCHARACTER, NUMBER_OVERFLOW, NUMBER_NEGZERO, NUMBER_UNDERFLOW, BOUND_EXCEEDED, NOT_CANONICAL, and CLI_USAGE. Each failure class maps to a documented exit code. Errors include the failure class, byte offset, and a diagnostic message.
This matters for pipeline integration. When a canonicalization step fails in a CI/CD pipeline or an audit system, operators need to know what failed and where — not just that something went wrong. Stable failure classes also allow downstream systems to branch on error type without parsing human-readable error messages.
Resource bounds
Seven configurable limits are enforced at parse time: nesting depth (default 1,000), input size (64 MiB), total values (1M), object members (250K), array elements (250K), string bytes (8 MiB), and number token length (4,096). All bound violations classify as BOUND_EXCEEDED.
These defaults are tuned for typical infrastructure workloads. They can be adjusted via the Go API. The purpose is DoS resilience — untrusted input should not be able to cause unbounded resource consumption in a canonicalization step.
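The library's option API is not shown here, but the enforcement idea is straightforward: check bounds during the scan itself, so oversized input is rejected before any large allocation or deep recursion. A simplified stand-alone sketch of a nesting-depth bound (not json-canon's parser):

```go
package main

import "fmt"

// checkDepth scans raw JSON and fails as soon as nesting exceeds the
// limit, mirroring parse-time bound enforcement. String contents are
// skipped so brackets inside strings are not counted.
func checkDepth(in []byte, limit int) error {
	depth := 0
	for i := 0; i < len(in); i++ {
		switch in[i] {
		case '"':
			// Skip to the closing quote, honoring backslash escapes.
			for i++; i < len(in) && in[i] != '"'; i++ {
				if in[i] == '\\' {
					i++ // skip the escaped character
				}
			}
		case '{', '[':
			depth++
			if depth > limit {
				return fmt.Errorf("BOUND_EXCEEDED: depth %d exceeds limit %d at offset %d", depth, limit, i)
			}
		case '}', ']':
			depth--
		}
	}
	return nil
}

func main() {
	doc := []byte(`{"a":[[{"]":1}]]}`) // nesting depth 4; "]" inside the string is ignored
	fmt.Println(checkDepth(doc, 1000)) // nil: within bounds
	fmt.Println(checkDepth(doc, 3))    // BOUND_EXCEEDED error
}
```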
Conformance evidence
Unit tests prove local behavior. json-canon goes further.
76 traced requirements map from RFC clauses to requirement IDs to implementation symbols to tests. The conformance harness validates bidirectional coverage: no requirement exists without a test, and no test exists without a requirement. This traceability is the mechanism for substantiating the claim that the implementation conforms to RFC 8785 — not as an assertion, but as auditable evidence.
Additionally, over 286,000 oracle vectors validate the ECMA-262 number formatting logic against an independent reference implementation. Number formatting is the most error-prone part of any JCS implementation because IEEE 754 double-precision semantics and the ECMAScript Number::toString algorithm have subtle edge cases around exponent boundaries, negative zero, and precision limits.
An offline cold-replay framework validates behavioral stability across Linux distributions, kernel versions, and CPU architectures (x86_64 and arm64). Evidence bundles bind source identity, binary checksums, and matrix/profile digests. Release gating requires validated evidence from both architectures before publication.
Stable CLI ABI
json-canon ships a CLI tool (jcs-canon) with a versioned ABI under SemVer. Commands, flags, exit codes, failure class semantics, and stream contracts are defined in a machine-readable manifest. Breaking changes require a major version bump.
This is relevant for teams that embed jcs-canon in shell scripts, CI pipelines, or Makefiles. A canonicalization tool that changes its exit code semantics or output format between minor releases breaks every downstream consumer.
```shell
# Canonicalize
echo '{"b":2,"a":1}' | jcs-canon canonicalize -
# {"a":1,"b":2}

# Verify canonical form
jcs-canon verify document.json
# exit 0 if canonical, exit 2 if not
```
Go library usage
```go
package main

import (
	"fmt"
	"log"

	"github.com/lattice-substrate/json-canon/jcs"
	"github.com/lattice-substrate/json-canon/jcstoken"
)

func main() {
	v, err := jcstoken.Parse([]byte(`{"b":2,"a":1}`))
	if err != nil {
		log.Fatal(err) // classified error with a stable failure class
	}
	out, err := jcs.Serialize(v)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out)) // {"a":1,"b":2}
}
```

Install: go get github.com/lattice-substrate/json-canon
Supply-chain integrity
Release binaries are static Linux builds (CGO_ENABLED=0). Artifacts include SHA-256 checksums and SLSA build provenance attestation via GitHub Actions. All CI and release workflow actions are pinned by commit SHA.
Who should use this
json-canon is designed for a specific set of use cases:
- Signing or hashing JSON documents where signatures must verify across languages, services, and time.
- Content-addressable storage where identical logical content must produce identical hashes.
- Audit and replay pipelines where byte drift between runs is a correctness failure.
- Pipeline validation gates where untrusted JSON input requires strict rejection with stable, machine-readable exit codes.
Who should not use this
- If you need pretty-printing or human-readable formatting, use jq.
- If you need lenient parsing of malformed JSON, this tool will reject it by design.
- If you need macOS or Windows support, the supported runtime is Linux only.
- If you need a general-purpose JSON transformation toolkit, this is a canonicalization primitive, not a query engine.
Links
- Repository: github.com/lattice-substrate/json-canon
- Release: v0.2.0
- Package docs: pkg.go.dev
- Quickstart: docs/QUICKSTART.md
- Conformance evidence: CONFORMANCE.md
- RFC 8785: rfc-editor.org/rfc/rfc8785
- License: Apache-2.0