DEV Community

benjamin

Most JSON-to-Schema tools over-fit one example. mkschema merges many samples.

You need a JSON Schema for an API response, a config file, or a stream of log records, whether for validation, docs, or contract tests. Hand-writing one is tedious and easy to get subtly wrong. So you reach for a "JSON to JSON Schema" generator… and it hands you a schema built from one example: every field marked required, every type pinned to whatever that single record happened to contain. The first real payload that omits an optional field fails validation against the schema you just generated.

The problem isn't generating a schema. It's that one example isn't your data. So I built mkschema to merge many samples. Zero dependencies, no network.

$ printf '{"id":1,"name":"Ada","age":30}\n{"id":2,"age":30.5}\n' | npx mkschema --ndjson -

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "age":  { "type": "number" },     // 30 (int) and 30.5 (float) unioned
    "id":   { "type": "integer" },
    "name": { "type": "string" }
  },
  "required": ["age", "id"]           // name was missing from sample 2 → optional
}

Feed it one sample and everything is required, same as the single-example tools. Feed it your actual data (an NDJSON log file with --ndjson, a folder of fixtures, a paged API dump) and it figures out what's really there:

  • A key in every sample is required; a key in only some is optional.
  • An integer here and a float there union to number; genuinely different types become a type array.
  • String formats are inferred (date-time, date, email, uuid, ipv4, uri), but a format is kept only when every sample of that field agrees.
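To make those three rules concrete, here is a minimal Python sketch of merging samples. It is not mkschema's source: the names json_type, detect_format and infer are hypothetical, samples are assumed to be already-parsed JSON values, and the two format patterns are simplified stand-ins for the six formats listed above.

```python
import re
from typing import Any, Optional

# Hypothetical, simplified patterns; real format detection needs more care.
FORMAT_PATTERNS = {
    "date": re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"),
    "uuid": re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}"),
}


def json_type(value: Any) -> str:
    """Map one parsed JSON value to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must precede int: True is an int in Python
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "integer" if value.is_integer() else "number"  # 5.0 -> integer
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def detect_format(s: str) -> Optional[str]:
    """Return the first format whose pattern matches all of `s`, else None."""
    for name, pattern in FORMAT_PATTERNS.items():
        if pattern.fullmatch(s):
            return name
    return None


def infer(samples: list) -> dict:
    """Infer a schema fragment that every value in `samples` satisfies."""
    if not samples:
        raise ValueError("need at least one sample")
    types = {json_type(v) for v in samples}
    if {"integer", "number"} <= types:
        types.discard("integer")  # integers and floats union to "number"
    schema: dict = {"type": types.pop() if len(types) == 1 else sorted(types)}

    objects = [v for v in samples if isinstance(v, dict)]
    if objects:
        keys = sorted(set().union(*objects))
        schema["properties"] = {
            k: infer([o[k] for o in objects if k in o]) for k in keys
        }
        # Present in every object sample -> required; otherwise optional.
        required = [k for k in keys if all(k in o for o in objects)]
        if required:
            schema["required"] = required

    items = [item for v in samples if isinstance(v, list) for item in v]
    if items:
        schema["items"] = infer(items)  # all array elements merged together

    strings = [v for v in samples if isinstance(v, str)]
    formats = {detect_format(s) for s in strings}
    if len(formats) == 1 and None not in formats:
        schema["format"] = formats.pop()  # kept only if every string agrees
    return schema
```

Run on the two records from the printf example, infer returns the same properties and the same required list ["age", "id"] (minus the $schema line). The bool check sits before the int check because in Python True is an instance of int.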

Usage

mkschema response.json                 # one file
mkschema a.json b.json c.json          # merge several files
mkschema --ndjson events.ndjson        # one sample per line
curl -s https://api/users | mkschema -  # straight from an API
mkschema users.json --title User > user.schema.json
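The comments above say what each invocation does but not exactly how inputs become samples, so here is a hedged sketch of the assumed mapping: each file argument is one sample, --ndjson makes each non-blank line a sample, and "-" reads standard input. The function names are hypothetical, and whether mkschema expands a top-level JSON array into several samples isn't covered here; this sketch keeps such a file as one sample.

```python
import json
import sys
from typing import Any, Iterable, Iterator


def parse_samples(text: str, ndjson: bool) -> list:
    """One sample per document, or one per non-blank line in NDJSON mode."""
    if not ndjson:
        return [json.loads(text)]
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def read_samples(paths: Iterable[str], ndjson: bool = False) -> Iterator[Any]:
    """Yield samples from every path in order; "-" means standard input."""
    for path in paths:
        if path == "-":
            text = sys.stdin.read()
        else:
            with open(path, encoding="utf-8") as f:
                text = f.read()
        yield from parse_samples(text, ndjson)
```

Under this assumption, several files and one NDJSON file look the same to the merge step: a flat list of samples.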

It writes the schema (draft 2020-12) to stdout, with property names and the required list sorted, so it diffs cleanly in version control.
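Sorting matters because keys collected across many samples have no natural order. In Python, for instance, dict order is insertion order, so without sorting the output would depend on which sample happened to mention a key first. A minimal sketch of a stable rendering, assuming the schema is a plain dict; canonicalize and render are hypothetical names, not mkschema's API.

```python
import json
from typing import Any

DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"


def canonicalize(schema: Any) -> Any:
    """Recursively sort "properties" keys and "required" lists."""
    if isinstance(schema, list):
        return [canonicalize(s) for s in schema]
    if not isinstance(schema, dict):
        return schema
    out = {}
    for key, value in schema.items():
        if key == "properties":
            # Property names are data, not keywords: sort them, recurse into values.
            out[key] = {name: canonicalize(value[name]) for name in sorted(value)}
        elif key == "required":
            out[key] = sorted(value)
        else:
            out[key] = canonicalize(value)
    return out


def render(schema: dict) -> str:
    """Serialize with $schema first and a trailing newline for clean diffs."""
    doc = {"$schema": DRAFT_2020_12, **canonicalize(schema)}
    return json.dumps(doc, indent=2) + "\n"
```

Two runs that saw the same keys in a different order now produce byte-identical output.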

A few honest notes

  • Zero dependencies in both builds: a Node build and a Python build that produce identical output. Run it with npx mkschema, or install it with pip install mkschema.
  • Numbers are classified by value, so 5.0 is an integer, and the two builds agree on it. That subtlety took an adversarial pass to get right, as did rejecting NaN/Infinity identically and not mistaking a user@host URL for an email.
  • It infers structure, not constraints. You get the scaffold from real data; add your own enum, minLength, or pattern on top.
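The number and NaN notes come down to a mismatch between the two runtimes: JavaScript's JSON.parse yields the same number for "5" and "5.0" and rejects NaN and Infinity, while Python's json module keeps 5.0 as a float and accepts NaN, Infinity and -Infinity by default. Here is a hedged sketch of how a Python build might line up with Node. The function names and the simplified email regex are assumptions, not mkschema's actual rules.

```python
import json
import re
from typing import Any


def reject_constant(name: str) -> Any:
    # json.loads calls this for "NaN", "Infinity" and "-Infinity", which
    # Python accepts by default but JavaScript's JSON.parse rejects.
    raise ValueError(f"not valid JSON: {name}")


def parse_strict(text: str) -> Any:
    """Parse JSON, refusing the non-standard NaN/Infinity constants."""
    return json.loads(text, parse_constant=reject_constant)


def number_type(value: float) -> str:
    """Classify a number by value, so 5 and 5.0 get the same answer."""
    if isinstance(value, int):
        return "integer"
    return "integer" if value.is_integer() else "number"


# Hypothetical, deliberately simple email check: no ":" or "/" allowed, so
# a URL with credentials such as "https://user@host.example/" is not an email.
EMAIL = re.compile(r"[^@\s:/]+@[^@\s:/]+\.[^@\s:/]+")


def looks_like_email(s: str) -> bool:
    return EMAIL.fullmatch(s) is not None
```

Classifying by value also matches JSON Schema 2020-12 itself, where the "integer" type matches any number with a zero fractional part, so 5.0 validates against it.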



How do you produce JSON Schemas today: by hand, from a single example, or from a framework's types? And would "schema from N real samples" actually fit your workflow?
