SEN LLC

Posted on May 31

Try the Tech Radar #3 — JSON Schema LLM Prompt, Visualised

#llm #jsonschema #webdev #javascript

Thoughtworks Technology Radar Vol 34 (April 2026) puts Structured output from LLMs in the Adopt ring. That's the "industry should be doing this" tier, above Trial and Assess. This post walks through a 500-line vanilla JS tool that makes the technique concrete: paste a JSON Schema, see the natural-language prompt fragment an LLM consumes, see the synthesised example output, and validate whatever the model returns against the original schema. No LLM API calls, no build step.

🌐 Demo: https://sen.ltd/portfolio/schema-prompt/
📦 GitHub: https://github.com/sen-ltd/schema-prompt

Structured output is two halves

"Give the LLM a schema" is one sentence, but in code it's two responsibilities:

Schema → prompt. LLMs don't read JSON Schema syntax directly. type, properties, required get re-expressed as natural-language bullets the model can follow. OpenAI's response_format: { type: "json_schema" } API lets you pass a schema (the older { type: "json_object" } mode only guarantees syntactically valid JSON), but then the work of conveying the schema to the model happens server-side instead of in your code.
Output → validation. Even with structured output, models hallucinate enum values, miss required fields, blow past maximum. Validating at the boundary turns runtime drift into a typed error you can handle.

Libraries like Instructor / Pydantic AI / Outlines hide both. If you're calling raw APIs (Anthropic, OpenRouter, local llama.cpp), you write both yourself. This tool makes both visible.

Schema → prompt translation

Sample input — a sentiment + confidence schema:

{
  "type": "object",
  "properties": {
    "sentiment": {
      "type": "string",
      "enum": ["positive", "negative", "neutral"],
      "description": "Overall sentiment of the input text"
    },
    "confidence": {
      "type": "number",
      "minimum": 0,
      "maximum": 1,
      "description": "Confidence score, 0–1"
    },
    "keywords": {
      "type": "array",
      "items": { "type": "string", "minLength": 1 },
      "maxItems": 5
    }
  },
  "required": ["sentiment", "confidence"]
}

Generated prompt:

Return a JSON object that conforms to the following structure.
Output JSON only — no prose, no code fences.

Fields:
- sentiment: string [one of: ["positive","negative","neutral"]] (required) — Overall sentiment of the input text
- confidence: number [≥ 0, ≤ 1] (required) — Confidence score, 0–1
- keywords: array [max 5 items] (optional)
  - (each item): string [length ≥ 1] (optional)

Use null for fields you cannot determine. Do not invent data.

Choices that matter:

enum becomes [one of: ...] — LLMs don't parse JSON Schema natively, but they read enumerated alternatives directly
≥ / ≤ — the unicode comparison operators travel cleanly through tokenisers
explicit (required) / (optional) on every field, because required is the most-missed signal in practice and is worth spelling out per field rather than once per object
nested (each item) with indentation — the array item schema needs its own description, indented under the array bullet
trailing "Do not invent data." — the hallucination-suppression line. Without it, models confidently make up values for fields they can't actually determine from the input
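
The post does not show formatConstraints itself. As a rough sketch of how the bracketed fragments in the generated prompt could be produced, the following is inferred from the sample output only; the name formatConstraintsSketch, the wording for const and pattern, and the ordering of the parts are assumptions, not the repository's code:

```javascript
// Hypothetical sketch: turn a schema node's constraints into the bracketed
// fragment seen in the generated prompt, e.g. " [≥ 0, ≤ 1]".
function formatConstraintsSketch(schema) {
  const parts = [];
  if (schema.const !== undefined) parts.push(`exactly: ${JSON.stringify(schema.const)}`);
  if (Array.isArray(schema.enum)) parts.push(`one of: ${JSON.stringify(schema.enum)}`);
  if (typeof schema.minimum === "number") parts.push(`≥ ${schema.minimum}`);
  if (typeof schema.maximum === "number") parts.push(`≤ ${schema.maximum}`);
  if (typeof schema.minLength === "number") parts.push(`length ≥ ${schema.minLength}`);
  if (typeof schema.maxLength === "number") parts.push(`length ≤ ${schema.maxLength}`);
  if (typeof schema.minItems === "number") parts.push(`min ${schema.minItems} items`);
  if (typeof schema.maxItems === "number") parts.push(`max ${schema.maxItems} items`);
  if (typeof schema.pattern === "string") parts.push(`matches /${schema.pattern}/`);
  // Leading space so the result can be appended directly after the type name.
  return parts.length ? ` [${parts.join(", ")}]` : "";
}

// Prints the fragment with its leading space: " [≥ 0, ≤ 1]"
console.log(formatConstraintsSketch({ type: "number", minimum: 0, maximum: 1 }));
```

Returning an empty string when nothing applies keeps the caller free of conditionals: the fragment can always be concatenated after the type name.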

The recursive `describeProp`

Each property renders as one line; describeProp calls itself recursively for nested objects and array items:

function describeProp(propName, schema, indent = 0) {
  const pad = "  ".repeat(indent);
  const type = jsonType(schema);
  const required = schema._required === true;
  const tag = required ? " (required)" : " (optional)";
  const desc = schema.description ? ` — ${schema.description}` : "";

  if (type === "object" && schema.properties) {
    const inner = describeObject(schema, indent + 1);
    return `${pad}- ${propName}: object${tag}${desc}\n${inner}`;
  }
  if (type === "array" && schema.items) {
    const arrayConstraint = formatConstraints(schema);
    const itemDesc = describeProp("(each item)", schema.items, indent + 1);
    return `${pad}- ${propName}: array${arrayConstraint}${tag}${desc}\n${itemDesc}`;
  }
  const constraint = formatConstraints(schema);
  return `${pad}- ${propName}: ${type}${constraint}${tag}${desc}`;
}

required is an object-level field in JSON Schema; we propagate it to children as _required so the recursion can render (required) / (optional) per leaf.
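
The propagation step is not shown in the post. One way it could work, as a sketch under assumptions (withRequiredFlags is a hypothetical helper, not the repository's describeObject):

```javascript
// Hypothetical helper: copy each child schema and stamp `_required` on it,
// based on the parent object's `required` array. The input is not mutated.
function withRequiredFlags(objectSchema) {
  const requiredNames = new Set(objectSchema.required ?? []);
  const out = {};
  for (const [name, child] of Object.entries(objectSchema.properties ?? {})) {
    out[name] = { ...child, _required: requiredNames.has(name) };
  }
  return out;
}

const flagged = withRequiredFlags({
  type: "object",
  properties: { sentiment: { type: "string" }, keywords: { type: "array" } },
  required: ["sentiment"],
});
// prints: true false
console.log(flagged.sentiment._required, flagged.keywords._required);
```

Array item schemas never appear in a required list, so under this scheme they carry no _required flag. That is consistent with the (each item) line in the sample prompt being tagged (optional).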

The bug I caught with tests: the array branch initially called describeProp for the item but forgot to include formatConstraints on the array itself. So minItems: 1, maxItems: 5 silently disappeared from the prompt. The test caught it cleanly:

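import { test } from "node:test";
import assert from "node:assert/strict";
// Import path assumed from the Architecture section; adjust to your layout.
import { buildPrompt } from "./prompt.js";

test("array bounds", () => {
  const out = buildPrompt({
    type: "object",
    properties: {
      tags: { type: "array", items: { type: "string" }, minItems: 1, maxItems: 5 },
    },
  });
  assert.match(out, /min 1 items/);
  assert.match(out, /max 5 items/);
});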

Calling the same responsibility (formatConstraints) in every recursion branch is the kind of invariant unit tests should pin down — it's easy to forget on one path.

Example output synthesis

The tool also generates a JSON shape that matches the schema — useful as a few-shot anchor in the prompt:

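function synthesize(schema) {
  // Fixed values win: a const or the first enum member is already a valid example.
  if (schema.const !== undefined) return schema.const;
  if (schema.enum) return schema.enum[0];
  const type = jsonType(schema);
  // Objects: synthesise every declared property, required or not.
  if (type === "object" && schema.properties) {
    const out = {};
    for (const [name, sub] of Object.entries(schema.properties)) {
      out[name] = synthesize(sub);
    }
    return out;
  }
  // Arrays: a single synthesised item is enough to show the element shape.
  if (type === "array" && schema.items) {
    return [synthesize(schema.items)];
  }
  // Scalars: placeholders that name the format or type rather than fake data.
  if (type === "string") return schema.format ? `<${schema.format}>` : "<string>";
  if (type === "number" || type === "integer") return 0;
  if (type === "boolean") return false;
  return null; // Unknown or missing type.
}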

format: "email" → "<email>". enum: ["a", "b"] → "a". Otherwise type-appropriate placeholders. The example communicates shape, not data, so placeholders are correct.
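
To make the "few-shot anchor" idea concrete, here is a small sketch of attaching a synthesised example to a prompt fragment. appendExample and its wording are hypothetical, and the example object is what synthesize should return for the sentiment schema above, assuming jsonType simply reads schema.type:

```javascript
// Hypothetical helper: append a synthesised example to a prompt fragment,
// labelled so the model treats the values as placeholders.
function appendExample(promptText, example) {
  const json = JSON.stringify(example, null, 2);
  return `${promptText}\n\nExample of the expected shape (placeholder values, not real data):\n${json}`;
}

// Expected synthesize() result for the sentiment schema: first enum member,
// 0 for the number, and a one-element array holding the string placeholder.
const example = { sentiment: "positive", confidence: 0, keywords: ["<string>"] };

console.log(appendExample("Return a JSON object that conforms to the following structure.", example));
```

Labelling the values as placeholders matters here because the example always uses the first enum member; without the label a model may read "positive" as the preferred answer.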

Output validation at the boundary

LLMs violate schemas in characteristic ways:

Returning a new enum value not in the list ("happy" when the schema asks for ["positive", "negative", "neutral"])
Forgetting required fields
Negative confidence (schema says minimum: 0, model says -0.3)
Type confusion (confidence: "high" instead of a number)

Catch every one at the boundary, with paths:

const errs = validate(schema, { sentiment: "happy", confidence: 1.5 });
// → [
//   { path: "$.sentiment", message: 'must be one of ["positive","negative","neutral"]' },
//   { path: "$.confidence", message: "1.5 > maximum 1" },
// ]

JSONPath-style paths ($.user.profile.email) work for arbitrary nesting. Hand the error list back to the LLM as a retry instruction and you've got a self-correcting loop.
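
A sketch of that loop, under assumptions: callModel is a hypothetical async function you supply (prompt string in, raw model text out), validate is any function with the (schema, value) → error list shape shown above, and the retry wording is illustrative rather than taken from the repository:

```javascript
// Hypothetical sketch of a validate-and-retry loop. `callModel` and `validate`
// are injected so the loop stays independent of any particular API or validator.
async function generateWithRetry({ callModel, validate, schema, prompt, maxAttempts = 3 }) {
  let message = prompt;
  let lastErrors = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(message);
    let parsed;
    try {
      parsed = JSON.parse(raw);
    } catch (e) {
      // Unparseable output is reported in the same {path, message} shape.
      lastErrors = [{ path: "$", message: `invalid JSON: ${e.message}` }];
      message = buildRetryMessage(prompt, lastErrors);
      continue;
    }
    lastErrors = validate(schema, parsed);
    if (lastErrors.length === 0) return { ok: true, value: parsed, attempts: attempt };
    message = buildRetryMessage(prompt, lastErrors);
  }
  return { ok: false, errors: lastErrors, attempts: maxAttempts };
}

// Restate the original prompt and list each violation with its path.
function buildRetryMessage(prompt, errors) {
  const lines = errors.map((e) => `- ${e.path}: ${e.message}`).join("\n");
  return `${prompt}\n\nYour previous answer was rejected:\n${lines}\nReturn corrected JSON only.`;
}
```

The attempt cap is the important design choice: a model that keeps failing the same constraint should surface as an error to the caller, not as an unbounded loop of API calls.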

The JSON Schema subset that's enough

Implementing all of Draft 7 / 2020-12 is thousands of lines. The subset that actually shows up in LLM workflows is much smaller:

Implemented:

type (string, number, integer, boolean, array, object, null, plus union arrays)
properties + required
items + minItems / maxItems
enum / const
minimum / maximum
minLength / maxLength
pattern (regex)
format (rendered in the prompt only — no client-side validation; the LLM should treat it as an instruction)

Deliberately omitted:

oneOf / anyOf / allOf — rare in LLM workflows
$ref — schema-internal references; expansion adds complexity without LLM-use payoff
additionalProperties strict mode (in practice LLMs rarely volunteer extra properties, so the check seldom fires)

Validator size: ~110 lines. Enough for LLM-output use.
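
To give a feel for how little code path-carrying validation needs, here is a sketch covering only type, enum, minimum / maximum, required, properties and items. It is not the repository's validate.js; validateSketch, typeOf and the message wording are assumptions, chosen so that the example from the validation section produces the same two errors:

```javascript
// Hypothetical sketch, not the repository's validate.js.
// Maps a JavaScript value to a JSON Schema type name.
function typeOf(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (Number.isInteger(value)) return "integer";
  return typeof value; // "string", "number", "boolean", "object", ...
}

// Returns a list of { path, message } errors; an empty list means valid.
function validateSketch(schema, value, path = "$") {
  const errors = [];
  const actual = typeOf(value);
  if (schema.type !== undefined) {
    const allowed = Array.isArray(schema.type) ? schema.type : [schema.type];
    // An integer value also satisfies "number".
    const ok = allowed.includes(actual) || (actual === "integer" && allowed.includes("number"));
    if (!ok) {
      errors.push({ path, message: `expected ${allowed.join(" | ")}, got ${actual}` });
      return errors; // Further checks are meaningless on the wrong type.
    }
  }
  // Strict equality only, so this handles primitive enum members.
  if (schema.enum && !schema.enum.some((e) => e === value)) {
    errors.push({ path, message: `must be one of ${JSON.stringify(schema.enum)}` });
  }
  if (typeof value === "number") {
    if (schema.minimum !== undefined && value < schema.minimum) {
      errors.push({ path, message: `${value} < minimum ${schema.minimum}` });
    }
    if (schema.maximum !== undefined && value > schema.maximum) {
      errors.push({ path, message: `${value} > maximum ${schema.maximum}` });
    }
  }
  if (actual === "object") {
    for (const name of schema.required ?? []) {
      if (!(name in value)) errors.push({ path: `${path}.${name}`, message: "required field missing" });
    }
    for (const [name, sub] of Object.entries(schema.properties ?? {})) {
      if (name in value) errors.push(...validateSketch(sub, value[name], `${path}.${name}`));
    }
  }
  if (actual === "array" && schema.items) {
    value.forEach((item, i) => errors.push(...validateSketch(schema.items, item, `${path}[${i}]`)));
  }
  return errors;
}
```

Threading path through the recursion is what makes the errors usable in a retry message: each violation names the exact field, including array indices such as $.keywords[2].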

Architecture

prompt.js    ← Schema → LLM prompt + example synthesizer (15 tests)
validate.js  ← JSON Schema validator subset (18 tests)
presets.js   ← 5 real-world schemas + matching sample outputs
app.js       ← UI glue

prompt.js and validate.js are DOM-free. 33 unit tests under node --test cover scalars, nested objects, arrays-of-objects, enum / const / pattern / range constraints, realistic compound schemas, and the "array bounds get included" invariant that bit me earlier.

The 5 presets cover the schemas you actually meet:

Sentiment + confidence — the classic NLP shape
Address parser — structured extraction from a free-form string
Meeting summary — title + attendees + action items array + decisions
Entity extraction — text + type + start/end position
Product extraction — product description → name / category / price / stock / tags

Try it

Demo: https://sen.ltd/portfolio/schema-prompt/
GitHub: https://github.com/sen-ltd/schema-prompt

Pick a preset, copy the generated prompt fragment into any LLM (ChatGPT / Claude / your local model), and paste the response back into the validator. The whole cycle takes ~30 seconds and shows you exactly what your structured-output pipeline needs to handle.

Takeaways

Structured output from LLMs is two responsibilities: schema → natural-language prompt, and output → validation.
Translate JSON Schema constructs to English explicitly — enum → [one of: ...], minimum/maximum → [≥ x, ≤ y], required flagged on every field.
End the prompt with hallucination suppression ("Use null. Do not invent data.") — small line, large effect.
Synthesize an example output from the schema as a shape hint, not real data. Placeholders are correct.
Validate at the boundary with paths in errors — feeds directly into LLM retry loops.
Don't implement all of JSON Schema. The subset that matters for LLM workflows is ~110 lines of validator.

This is OSS portfolio #249 from SEN LLC (Tokyo), the third entry in the "Try the Tech Radar" series. Previous: #248 Markdown → Typst, #247 TOON converter. Next up: Server-driven UI. We ship continuously: https://sen.ltd/portfolio/
