You need a JSON Schema for an API response, a config file, a stream of log records — for validation, docs, or contract tests. Hand-writing it is tedious and you'll get it subtly wrong. So you reach for a "JSON to JSON Schema" generator… and it hands you a schema built from one example: every field marked required, every type pinned to whatever that single record happened to contain. The first real payload that omits an optional field fails validation against a schema you just generated.
The problem isn't generating a schema. It's that one example isn't your data. So I built mkschema to merge many samples. Zero dependencies, no network.
$ printf '{"id":1,"name":"Ada","age":30}\n{"id":2,"age":30.5}\n' | npx mkschema --ndjson -
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "age": { "type": "number" },    // 30 (int) and 30.5 (float) unioned
    "id": { "type": "integer" },
    "name": { "type": "string" }
  },
  "required": ["age", "id"]         // name was missing from sample 2 → optional
}
Feed it one sample and everything is required, just as with the other generators. Feed it your actual data (a --ndjson log file, a folder of fixtures, a paged API dump) and it figures out what's really there:
- A key in every sample is required; a key in only some is optional.
- An integer here and a float there union to number; genuinely different types become a type array.
- String formats are inferred (date-time, date, email, uuid, ipv4, uri), but only kept when every sample of that field agrees.
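The first two rules above can be sketched in a few lines of Python. This is a minimal illustration of the merging idea, not mkschema's actual code: the function names (json_type, merge_types, infer_object_schema) are hypothetical, it only handles flat objects, and it skips format inference entirely.

```python
from typing import Any


def json_type(value: Any) -> str:
    """Map a decoded JSON value to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool must come first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "integer" if value.is_integer() else "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def merge_types(types: set) -> Any:
    """Union observed types; 'integer' is absorbed by 'number'."""
    merged = set(types)
    if "number" in merged:
        merged.discard("integer")
    ordered = sorted(merged)
    # One type stays a plain string; several become a type array.
    return ordered[0] if len(ordered) == 1 else ordered


def infer_object_schema(samples: list) -> dict:
    """Infer a flat object schema from many samples (assumed to be dicts)."""
    seen: dict = {}    # key -> set of observed type names
    counts: dict = {}  # key -> number of samples containing the key
    for sample in samples:
        for key, value in sample.items():
            seen.setdefault(key, set()).add(json_type(value))
            counts[key] = counts.get(key, 0) + 1
    return {
        "type": "object",
        "properties": {k: {"type": merge_types(seen[k])} for k in sorted(seen)},
        # A key is required only if every sample contained it.
        "required": sorted(k for k, n in counts.items() if n == len(samples)),
    }


samples = [{"id": 1, "name": "Ada", "age": 30}, {"id": 2, "age": 30.5}]
schema = infer_object_schema(samples)
```

With the two samples from the example at the top, this sketch marks only age and id as required and widens age to number, which matches the output shown there.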
Usage
mkschema response.json # one file
mkschema a.json b.json c.json # merge several files
mkschema --ndjson events.ndjson # one sample per line
curl -s https://api/users | mkschema - # straight from an API
mkschema users.json --title User > user.schema.json
It writes the schema to stdout (draft 2020-12), with properties and required
sorted, so it diffs cleanly in version control.
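To see why sorted output matters for diffs, here is a small Python sketch of the idea, assuming a schema held as a plain dict. The helper names (normalize, schema_text) are made up for illustration and say nothing about how mkschema does it internally.

```python
import json


def normalize(node):
    """Return a copy with every 'properties' map and 'required' list sorted."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "properties" and isinstance(value, dict):
                out[key] = {name: normalize(value[name]) for name in sorted(value)}
            elif key == "required" and isinstance(value, list):
                out[key] = sorted(value)
            else:
                out[key] = normalize(value)
        return out
    if isinstance(node, list):
        return [normalize(item) for item in node]
    return node


def schema_text(schema: dict) -> str:
    """Serialize a schema so that key discovery order does not affect the text."""
    return json.dumps(normalize(schema), indent=2) + "\n"


# Same schema, discovered in a different key order.
first = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "id": {"type": "integer"}},
    "required": ["name", "id"],
}
second = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    "required": ["id", "name"],
}
same = schema_text(first) == schema_text(second)
```

Because both versions serialize to the same text, regenerating the schema from reshuffled samples produces no spurious diff.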
A few honest notes
- Zero dependencies, both builds: a Node build and a Python build that produce identical output. npx mkschema or pip install mkschema.
- Numbers are classified by value, so 5.0 is an integer, and the two builds agree (a subtlety that took an adversarial pass to get right, along with rejecting NaN/Infinity identically and not mistaking a user@host URL for an email).
- It infers structure, not constraints. You get the scaffold from real data; add your own enum, minLength, pattern on top.
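The note about classifying numbers by value can be made concrete with a short Python sketch. It is an assumption about how such a check might look, not the project's implementation; load_strict and number_type are hypothetical names. It relies on the standard library's json.loads, whose parse_constant hook is called for the NaN, Infinity, and -Infinity literals that Python otherwise accepts.

```python
import json
import math


def reject_constant(name: str):
    """parse_constant hook: refuse the non-standard NaN/Infinity literals."""
    raise ValueError(f"non-finite number is not valid JSON: {name}")


def load_strict(text: str):
    """Parse JSON, rejecting NaN, Infinity and -Infinity literals."""
    return json.loads(text, parse_constant=reject_constant)


def number_type(value) -> str:
    """Classify a parsed JSON number by value, not by how it was written."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("not a JSON number")
    if isinstance(value, int):
        return "integer"
    # Literals such as 1e400 overflow to inf without going through parse_constant.
    if not math.isfinite(value):
        raise ValueError("non-finite number")
    return "integer" if value.is_integer() else "number"


print(number_type(load_strict("5.0")))   # integer
print(number_type(load_strict("30.5")))  # number
```

Classifying by value is what lets two runtimes agree: JavaScript has a single number type, so 5.0 and 5 are indistinguishable after parsing there, while Python would otherwise report a float for one and an int for the other.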
Links
- npm: https://www.npmjs.com/package/mkschema
- PyPI: https://pypi.org/project/mkschema/
- Source: https://github.com/jjdoor/mkschema
How do you produce JSON Schemas today — by hand, from a single example, or from a
framework's types? And would "schema from N real samples" actually fit your
workflow?