Two years ago I shipped a webhook handler without input validation. A partner started sending us a slightly malformed payload (an extra field, one missing required field) and our worker silently processed garbage into the database for three days before anyone noticed. By the time I traced it, we had 12,000 corrupt rows and a very awkward customer call.
I learned JSON Schema the next week. This post is the cheat sheet I wish someone had handed me on day one — the keywords I actually use, the gotchas that bit me again later, and the honest comparison with OpenAPI and TypeScript types.
The seven types you'll use
JSON Schema's type keyword recognizes seven types: string, number, integer, boolean, object, array, and null. The integer type is a JSON Schema convenience (raw JSON only has number): it accepts any number with a zero fractional part, so current drafts treat 1.0 as a valid integer while 1.5 fails. A minimal schema:
{
  "type": "string"
}
That validates any string and rejects everything else. You can also accept a union:
{
  "type": ["string", "null"]
}
Useful for optional fields you want to keep present in the payload rather than omitting. Before I write more than a one-line schema I usually paste a sample payload into a JSON formatter to see the actual shape pretty-printed. Type errors almost always come from misreading the structure.
Objects, required, and the additionalProperties trap
Most real validation work happens on objects. The three keywords you use every day are properties, required, and additionalProperties:
{
  "type": "object",
  "properties": {
    "email": { "type": "string" },
    "age": { "type": "integer" },
    "verified": { "type": "boolean" }
  },
  "required": ["email"],
  "additionalProperties": false
}
Three gotchas to internalize. First, properties describes each field but does NOT make any of them required. Without the required array, every property is optional. Second, required is a separate list of property names that must be present (presence only; you still need type to validate the value). Third, additionalProperties: false rejects any property not listed. Without this line, the schema accepts arbitrary extra fields silently. This is the class of bug that hit me: the partner was sending email_address instead of email, and a schema that neither requires email nor sets additionalProperties: false accepts that payload as "no email + an unknown field."
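To make the interaction between the three keywords concrete, here is a minimal hand-rolled sketch in Python. It is not a JSON Schema validator; `check_object` and its parameters are hypothetical names invented for this illustration, and it only mimics the presence, type, and extra-field checks described above.

```python
def check_object(data, properties, required=(), allow_extra=True):
    """Return error strings mimicking properties / required / additionalProperties."""
    errors = []
    # required: checks presence only, never the value
    for name in required:
        if name not in data:
            errors.append(f"missing required property: {name}")
    # properties: a declared field is type-checked only when it is present
    for name, expected_type in properties.items():
        if name in data and not isinstance(data[name], expected_type):
            errors.append(f"wrong type for property: {name}")
    # additionalProperties: false -> reject anything not declared
    if not allow_extra:
        for name in data:
            if name not in properties:
                errors.append(f"unexpected property: {name}")
    return errors


props = {"email": str, "verified": bool}
payload = {"email_address": "a@example.com"}  # misspelled key, as in the story

# Nothing required, extras allowed: the bad payload passes silently.
print(check_object(payload, props))  # []
# Adding required catches the missing field but not the stray one.
print(check_object(payload, props, required=["email"]))
# Adding the extra-field check names the offending key as well.
print(check_object(payload, props, required=["email"], allow_extra=False))
```

The last call reports both the missing email and the unexpected email_address, which is the error message that points straight at the typo.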
Set additionalProperties: false by default. Remove it only when you genuinely want a free-form object. For maps with arbitrary keys but a known value type, use it as a schema instead of a boolean:
{
  "type": "object",
  "additionalProperties": { "type": "number" }
}
That validates any object where every value is a number. Perfect for price lookup tables, feature-flag percentages, or anything keyed dynamically.
String validation: minLength, pattern, format, enum
Real string validation goes beyond "is it a string." The keywords that earn their keep:
- minLength / maxLength: integer bounds on length, counted in Unicode characters (code points), not bytes, UTF-16 code units, or graphemes
- pattern: an ECMA-262 regex the string must match somewhere (use ^...$ anchors for a full match)
- format: named formats like email, uri, date, date-time, uuid, ipv4, ipv6
- enum: a fixed list of allowed values (works for any type)
- const: a single allowed value (equivalent to a one-item enum)
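Which unit a length bound counts in starts to matter as soon as non-ASCII text shows up. A small Python sketch comparing three ways of measuring the same string (the sample string is made up for illustration):

```python
# "héllo" followed by a thumbs-up emoji, written with escapes to avoid
# any ambiguity about how the source file itself is encoded.
sample = "h\u00e9llo\U0001F44D"

code_points = len(sample)                           # Python str length counts code points
utf16_units = len(sample.encode("utf-16-le")) // 2  # 2 bytes per UTF-16 code unit
utf8_bytes = len(sample.encode("utf-8"))

print(code_points)  # 6
print(utf16_units)  # 7 (the emoji needs a surrogate pair)
print(utf8_bytes)   # 10
```

A maxLength of 6 would accept or reject this string depending on which of those numbers the validator uses, so it is worth checking your validator's behavior before relying on tight bounds.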
A practical username field:
{
  "type": "string",
  "minLength": 3,
  "maxLength": 20,
  "pattern": "^[a-zA-Z0-9_]+$"
}
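The anchors in that pattern are doing real work, because pattern succeeds if the regex matches anywhere in the string. A rough sketch of the difference, using Python's re module as a stand-in for an ECMA-262 engine (the two agree on this simple character class, but not in general; `matches_like_pattern` is a hypothetical helper):

```python
import re

UNANCHORED = re.compile(r"[a-zA-Z0-9_]+")
ANCHORED = re.compile(r"^[a-zA-Z0-9_]+$")


def matches_like_pattern(regex, value):
    """Mimic the pattern keyword: succeed if the regex matches anywhere."""
    return regex.search(value) is not None


candidate = "bad name!"  # contains a space and punctuation

print(matches_like_pattern(UNANCHORED, candidate))  # True: "bad" matches
print(matches_like_pattern(ANCHORED, candidate))    # False: whole string must match
print(matches_like_pattern(ANCHORED, "good_name"))  # True
```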
One gotcha that cost me a day: validators do not assert format by default. Draft 2019-09 and 2020-12 treat it as an annotation unless assertion is switched on, and earlier drafts left enforcement optional for implementations. You must enable format assertion in your validator: Ajv requires ajv-formats, and Python jsonschema needs a format_checker. Without it, "format": "email" documents intent but does not actually reject invalid emails. See the JSON Schema spec for format for the full list and the assertion behavior per draft.
Number validation: minimum, maximum, multipleOf
For numbers and integers, the validation keywords are arithmetic:
- minimum / maximum: inclusive bounds
- exclusiveMinimum / exclusiveMaximum: exclusive bounds (from Draft 6 onward these take a number; in Draft 4 they were booleans that modified minimum / maximum)
- multipleOf: the value must be a multiple of this number
Validating a percentage that must be 0 to 100 in 0.01 increments:
{
  "type": "number",
  "minimum": 0,
  "maximum": 100,
  "multipleOf": 0.01
}
multipleOf has a floating-point trap I keep getting wrong. 0.1 is not exactly representable in IEEE 754, so { "multipleOf": 0.1 } will sometimes reject values you expect to pass. For money, I now store and validate as integer cents ({ "type": "integer", "minimum": 0 }). It is the same precision argument behind storing prices in the smallest currency unit everywhere else in the stack.
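The trap is easy to reproduce without any validator. The sketch below assumes a naive float-division check (real validators differ in how they implement multipleOf, and some offer a precision option); `is_multiple_float` and `is_multiple_decimal` are hypothetical helpers for illustration.

```python
from decimal import Decimal


def is_multiple_float(value, step):
    """Naive check: divide as floats and see if the quotient is whole."""
    quotient = value / step
    return quotient == int(quotient)


def is_multiple_decimal(value, step):
    """Same check in exact decimal arithmetic, built from the literal text."""
    return Decimal(str(value)) % Decimal(str(step)) == 0


print(0.3 / 0.1)                      # 2.9999999999999996
print(is_multiple_float(0.3, 0.1))    # False, although 0.3 "is" 3 x 0.1
print(is_multiple_float(0.2, 0.1))    # True, which is why the bug looks intermittent
print(is_multiple_decimal(0.3, 0.1))  # True

# Integer cents sidestep the problem entirely.
print(30 % 10 == 0)                   # True
```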
Array validation: items, minItems, uniqueItems
For arrays the workhorses are items (schema applied to every element), minItems / maxItems (length bounds), and uniqueItems (rejects duplicates by deep equality). A list of unique tags:
{
  "type": "array",
  "items": { "type": "string", "minLength": 1 },
  "minItems": 1,
  "maxItems": 10,
  "uniqueItems": true
}
For positional tuples where each index has a different schema, use prefixItems in Draft 2020-12 or items as an array in older drafts. A coordinate pair where index 0 is longitude and index 1 is latitude:
{
  "type": "array",
  "prefixItems": [
    { "type": "number", "minimum": -180, "maximum": 180 },
    { "type": "number", "minimum": -90, "maximum": 90 }
  ],
  "items": false
}
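The trailing "items": false rejects any extra elements beyond the two declared positions; it is the array equivalent of additionalProperties: false. In drafts before 2020-12, where items holds the array of positional schemas, additionalItems: false plays this role.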
Schema composition: $ref, allOf, oneOf, anyOf
Once your schemas grow past a single page, you will want to break them up and combine them. JSON Schema has four composition keywords:
- $ref: reuse another schema by URI reference (e.g., a JSON Pointer fragment like "#/$defs/address", or an external URL)
- allOf: data must validate against every subschema (intersection / mixin)
- anyOf: data must validate against at least one (union, OK if multiple match)
- oneOf: data must validate against exactly one (XOR, rejects if zero or multiple match)
A reusable address schema referenced from two properties of the same parent:
{
  "$defs": {
    "address": {
      "type": "object",
      "properties": {
        "street": { "type": "string" },
        "city": { "type": "string" },
        "country": { "type": "string", "minLength": 2, "maxLength": 2 }
      },
      "required": ["street", "city", "country"]
    }
  },
  "type": "object",
  "properties": {
    "shipping": { "$ref": "#/$defs/address" },
    "billing": { "$ref": "#/$defs/address" }
  }
}
For discriminated unions (event types, message kinds), oneOf with a const discriminator is the standard pattern:
{
  "oneOf": [
    { "type": "object",
      "properties": { "kind": { "const": "email" },
                      "to": { "type": "string", "format": "email" } },
      "required": ["kind", "to"] },
    { "type": "object",
      "properties": { "kind": { "const": "sms" },
                      "phone": { "type": "string", "pattern": "^\\+[1-9]\\d{1,14}$" } },
      "required": ["kind", "phone"] }
  ]
}
A real signup schema
Putting most of these keywords together, here is roughly the schema I now use for a signup endpoint:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "SignupRequest",
  "type": "object",
  "properties": {
    "email": { "type": "string", "format": "email", "maxLength": 254 },
    "password": { "type": "string", "minLength": 12, "maxLength": 128 },
    "username": { "type": "string", "pattern": "^[a-zA-Z0-9_]{3,20}$" },
    "age": { "type": "integer", "minimum": 13, "maximum": 120 },
    "country": { "type": "string", "enum": ["US", "UK", "CA", "AU"] },
    "newsletter": { "type": "boolean", "default": false },
    "referrals": { "type": "array", "items": { "type": "string", "format": "email" },
                   "maxItems": 5, "uniqueItems": true }
  },
  "required": ["email", "password", "username", "age", "country"],
  "additionalProperties": false
}
It enforces the email format with the 254-char maximum from RFC 5321, a 12-character minimum password (stricter than the 8-character minimum in NIST SP 800-63B Revision 3), a regex-validated username, an integer age within plausible bounds, a closed enum of supported countries, an optional boolean with a documented default, and an optional referral list capped at 5 unique emails. The trailing additionalProperties: false is the line that would have saved me three days and 12,000 rows two years ago.
Tooling: Ajv (Node) and jsonschema (Python)
Declare which draft you target with the $schema keyword at the root. The two production-grade validators I reach for:
- Ajv for Node.js and the browser: one of the fastest JS validators, with Draft 2020-12 support (import the Ajv2020 class from ajv/dist/2020; the default export targets Draft-07). Install ajv and ajv-formats together if you use format. Compile schemas once at startup with const validate = ajv.compile(schema), then call validate(data) on every request. This is 10 to 100 times faster than recompiling per call.
- jsonschema for Python: the reference Python validator. Use Draft202012Validator(schema).validate(data), or iterate .iter_errors(data) to surface all errors at once instead of failing on the first.
For quick iteration without writing code, I usually paste the schema and a sample payload into a JSON formatter to confirm both parse, then run them through a browser-based validator. When debugging an unexpected failure, a diff checker helps me compare a failing payload against a known-good payload to spot the offending field.
JSON Schema vs OpenAPI vs TypeScript
These three describe data shapes but solve different problems:
- TypeScript types are compile-time only. They vanish at runtime, so a malformed API payload will silently corrupt your program if you trust the type without validating. Great for developer ergonomics, useless for runtime safety.
- JSON Schema is runtime validation that works in any language. Use it at API boundaries, for config files, for database documents, and for any cross-language data contract. A single schema can drive validation in your Node frontend, Python backend, and Go worker without rewriting.
- OpenAPI (formerly Swagger) wraps JSON Schema inside an API description: version 3.1 uses the full Draft 2020-12 dialect, while 3.0 uses a modified subset. It adds endpoints, methods, status codes, authentication, examples, and tooling for client SDK generation. Use it when you are describing an HTTP API and want documentation, client codegen, and validation in one document.
The stack I default to now: write the JSON Schema as source of truth, generate TypeScript types from it with json-schema-to-typescript, and embed the same schema inside an OpenAPI spec for HTTP routes. One source, three outputs, no drift.
The mistakes I kept making
1. Forgetting additionalProperties: false
The original bug. Without it, any extra field passes validation. When email is optional, a client typo like { "emial": "x@y.com" } validates as "no email present plus an unknown field" instead of producing the clean error you want. Add it by default.
2. Confusing required with type
Listing a property under properties does NOT make it required. You must also add it to the required array. Conversely, required only checks presence. A wrong-type field still fails, but on the type check, not the required check.
3. Using format without enabling assertion
In Ajv you must require('ajv-formats')(ajv). In Python jsonschema pass format_checker=FormatChecker(). Without this, format: email is metadata only and accepts any string. I burned half a day on this one.
4. oneOf where anyOf is correct
oneOf rejects data that matches more than one subschema. If your subschemas overlap (a value that is both a positive integer and a multiple of 5), oneOf rejects. Use it only for genuinely disjoint cases like discriminated unions.
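The overlap case can be sketched without a validator by counting how many subschemas a value satisfies. The predicates below are plain Python stand-ins for the two subschemas in the example (positive integer, multiple of 5); `any_of` and `one_of` are hypothetical helpers that only mimic the counting rule.

```python
def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_positive_integer(value):
    return is_number(value) and value == int(value) and value > 0


def is_multiple_of_5(value):
    return is_number(value) and value % 5 == 0


SUBSCHEMAS = [is_positive_integer, is_multiple_of_5]


def count_matches(value):
    return sum(1 for check in SUBSCHEMAS if check(value))


def any_of(value):
    return count_matches(value) >= 1  # at least one subschema matches


def one_of(value):
    return count_matches(value) == 1  # exactly one subschema matches


for value in (3, -5, 10, -3):
    print(value, any_of(value), one_of(value))
# 3 True True     (positive integer only)
# -5 True True    (multiple of 5 only)
# 10 True False   (matches both, so oneOf rejects)
# -3 False False  (matches neither)
```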
5. multipleOf with floats
IEEE 754 cannot exactly represent 0.1. { "multipleOf": 0.1 } will reject values you expect to pass. Use integer units (cents, basis points) instead.
6. Recompiling schemas on every request
Ajv's compile() is expensive. The compiled validator is fast. Compile once at module load, store the function, reuse it.
Closing thought
JSON Schema looks verbose at first. Often the schema is longer than the data. That is the point. Every constraint you encode is one bug you cannot ship. Start with your top three API endpoints, then your config files, then your cross-service messages. Within a sprint you will catch at least one bug that would have made it to production.
If you want a sandbox, try the JSON Schema Reference Tutorial and an online validator like jsonschemavalidator.net. And if you ever debug a pattern validation that is misbehaving, a regex tester is faster than guessing.
Originally published at calculators.im.