DEV Community

SEN LLC

Try the Tech Radar #3 — JSON Schema LLM Prompt, Visualised

Thoughtworks Technology Radar Vol 34 (April 2026) puts Structured output from LLMs in the Adopt ring. That's the "industry should be doing this" tier, not "Assess" and not "Trial." This post walks through a 500-line vanilla JS tool that makes the technique concrete: paste a JSON Schema, see the natural-language prompt fragment an LLM consumes, see the synthesised example output, and validate whatever the model returns against the original schema. No LLM API calls, no build step.

🌐 Demo: https://sen.ltd/portfolio/schema-prompt/
📦 GitHub: https://github.com/sen-ltd/schema-prompt

Structured output is two halves

"Give the LLM a schema" is one sentence, but in code it's two responsibilities:

  1. Schema → prompt. LLMs don't read JSON Schema syntax directly. type, properties, required get re-expressed as natural-language bullets the model can follow. OpenAI's response_format: { type: "json_schema", ... } lets you pass a schema (the older { type: "json_object" } mode only guarantees syntactically valid JSON), but the schema still has to be interpreted for the model server-side.
  2. Output → validation. Even with structured output, models hallucinate enum values, miss required fields, blow past maximum. Validating at the boundary turns runtime drift into a typed error you can handle.

Libraries like Instructor / Pydantic AI / Outlines hide both. If you're calling raw APIs (Anthropic, OpenRouter, local llama.cpp), you write them. This tool makes both visible.

Schema → prompt translation

Sample input — a sentiment + confidence schema:

{
  "type": "object",
  "properties": {
    "sentiment": {
      "type": "string",
      "enum": ["positive", "negative", "neutral"],
      "description": "Overall sentiment of the input text"
    },
    "confidence": {
      "type": "number",
      "minimum": 0,
      "maximum": 1,
      "description": "Confidence score, 0–1"
    },
    "keywords": {
      "type": "array",
      "items": { "type": "string", "minLength": 1 },
      "maxItems": 5
    }
  },
  "required": ["sentiment", "confidence"]
}

Generated prompt:

Return a JSON object that conforms to the following structure.
Output JSON only — no prose, no code fences.

Fields:
- sentiment: string [one of: ["positive","negative","neutral"]] (required) — Overall sentiment of the input text
- confidence: number [≥ 0, ≤ 1] (required) — Confidence score, 0–1
- keywords: array [max 5 items] (optional)
  - (each item): string [length ≥ 1] (optional)

Use null for fields you cannot determine. Do not invent data.

Choices that matter:

  • enum becomes [one of: ...] — LLMs don't parse JSON Schema natively, but they read enumerated alternatives directly
  • ≥ / ≤: the Unicode comparison operators travel cleanly through tokenisers
  • explicit (required) / (optional) — required is the most-missed signal in practice; flag it twice
  • nested (each item) with indentation — the array item schema needs its own description, indented under the array bullet
  • trailing "Do not invent data." — the hallucination-suppression line. Without it, models confidently make up values for fields they can't actually determine from the input
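
The bracketed constraint fragments in the generated prompt can come from one small formatter. The following is a minimal sketch, not the repo's code: the name formatConstraints matches the helper used later in this post, but the body, the wording for const and pattern, and the ordering of the parts are assumptions reconstructed from the generated prompt shown earlier.

```javascript
// Hypothetical reconstruction of formatConstraints; the real helper may differ.
// Returns "" when there is nothing to say, otherwise " [a, b, ...]" with a
// leading space so it can be appended directly after the type name.
function formatConstraints(schema) {
  const parts = [];
  if (schema.enum) parts.push(`one of: ${JSON.stringify(schema.enum)}`);
  if (schema.const !== undefined) parts.push(`exactly: ${JSON.stringify(schema.const)}`);
  if (schema.minimum !== undefined) parts.push(`≥ ${schema.minimum}`);
  if (schema.maximum !== undefined) parts.push(`≤ ${schema.maximum}`);
  if (schema.minLength !== undefined) parts.push(`length ≥ ${schema.minLength}`);
  if (schema.maxLength !== undefined) parts.push(`length ≤ ${schema.maxLength}`);
  if (schema.minItems !== undefined) parts.push(`min ${schema.minItems} items`);
  if (schema.maxItems !== undefined) parts.push(`max ${schema.maxItems} items`);
  if (schema.pattern) parts.push(`matches /${schema.pattern}/`);
  return parts.length ? ` [${parts.join(", ")}]` : "";
}

formatConstraints({ type: "number", minimum: 0, maximum: 1 }); // " [≥ 0, ≤ 1]"
formatConstraints({ type: "array", maxItems: 5 });             // " [max 5 items]"
formatConstraints({ type: "string" });                         // ""
```

Checking against undefined rather than truthiness matters here: minimum: 0 is a real constraint and would vanish under a plain if (schema.minimum) test.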

The recursive describeProp

Each property is one line, called recursively for objects and arrays:

function describeProp(propName, schema, indent = 0) {
  const pad = "  ".repeat(indent);
  const type = jsonType(schema);
  const required = schema._required === true;
  const tag = required ? " (required)" : " (optional)";
  const desc = schema.description ? ` — ${schema.description}` : "";

  if (type === "object" && schema.properties) {
    const inner = describeObject(schema, indent + 1);
    return `${pad}- ${propName}: object${tag}${desc}\n${inner}`;
  }
  if (type === "array" && schema.items) {
    const arrayConstraint = formatConstraints(schema);
    const itemDesc = describeProp("(each item)", schema.items, indent + 1);
    return `${pad}- ${propName}: array${arrayConstraint}${tag}${desc}\n${itemDesc}`;
  }
  const constraint = formatConstraints(schema);
  return `${pad}- ${propName}: ${type}${constraint}${tag}${desc}`;
}

required is an object-level field in JSON Schema; we propagate it to children as _required so the recursion can render (required) / (optional) per leaf.
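
One way to do that propagation is to copy each child schema and stamp the flag on the copy. This is a minimal sketch, assuming a hypothetical helper named markRequired (not a function from the repo). It leaves the caller's schema untouched, so the same schema object can still be handed to the validator afterwards.

```javascript
// Hypothetical helper: copies an object schema and tags each child with
// _required so a per-property renderer can print (required) / (optional).
function markRequired(schema) {
  const required = new Set(schema.required ?? []);
  const properties = {};
  for (const [name, sub] of Object.entries(schema.properties ?? {})) {
    // Shallow copy: the original child schema is not mutated.
    properties[name] = { ...sub, _required: required.has(name) };
  }
  return { ...schema, properties };
}

const original = {
  type: "object",
  properties: {
    sentiment: { type: "string" },
    keywords: { type: "array", items: { type: "string" } },
  },
  required: ["sentiment"],
};
const marked = markRequired(original);
// marked.properties.sentiment._required is true
// marked.properties.keywords._required is false
// original.properties.sentiment has no _required key
```

Array item schemas never receive the flag in this sketch, which is consistent with the "(each item) ... (optional)" line in the generated prompt above.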

The bug I caught with tests: the array branch initially called describeProp for the item but forgot to include formatConstraints on the array itself. So minItems: 1, maxItems: 5 silently disappeared from the prompt. The test caught it cleanly:

test("array bounds", () => {
  const out = buildPrompt({
    type: "object",
    properties: {
      tags: { type: "array", items: { type: "string" }, minItems: 1, maxItems: 5 },
    },
  });
  assert.match(out, /min 1 items/);
  assert.match(out, /max 5 items/);
});

Calling the same responsibility (formatConstraints) in every recursion branch is the kind of invariant unit tests should pin down — it's easy to forget on one path.

Example output synthesis

The tool also generates a JSON shape that matches the schema — useful as a few-shot anchor in the prompt:

function synthesize(schema) {
  if (schema.const !== undefined) return schema.const;
  if (schema.enum) return schema.enum[0];
  const type = jsonType(schema);
  if (type === "object" && schema.properties) {
    const out = {};
    for (const [name, sub] of Object.entries(schema.properties)) {
      out[name] = synthesize(sub);
    }
    return out;
  }
  if (type === "array" && schema.items) {
    return [synthesize(schema.items)];
  }
  if (type === "string") return schema.format ? `<${schema.format}>` : "<string>";
  if (type === "number" || type === "integer") return 0;
  if (type === "boolean") return false;
  return null;
}

format: "email" → "<email>". enum: ["a", "b"] → "a". Otherwise, type-appropriate placeholders. The example communicates shape, not data, so placeholders are correct.
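
Both describeProp and synthesize call a jsonType helper that is not shown in this post. Below is a minimal sketch of what such a helper might look like. The inference rules are assumptions, not the repo's behaviour: take the first non-null member of a union array, fall back on properties / items when type is absent, and label everything else "any".

```javascript
// Hypothetical jsonType: resolves a schema to a single type name.
function jsonType(schema) {
  if (Array.isArray(schema.type)) {
    // Union such as ["string", "null"]: prefer the first non-null member.
    return schema.type.find((t) => t !== "null") ?? "null";
  }
  if (typeof schema.type === "string") return schema.type;
  // No explicit type: infer from structural keywords.
  if (schema.properties) return "object";
  if (schema.items) return "array";
  return "any";
}

jsonType({ type: ["string", "null"] });        // "string"
jsonType({ properties: { a: {} } });           // "object"
jsonType({ items: { type: "number" } });       // "array"
```

Under these assumptions, a nullable string field would be synthesised as "<string>" rather than null, which keeps the example output informative about shape.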

Output validation at the boundary

LLMs violate schemas in characteristic ways:

  • Returning a new enum value not in the list ("happy" when the schema asks for ["positive", "negative", "neutral"])
  • Forgetting required fields
  • Negative confidence (schema says minimum: 0, model says -0.3)
  • Type confusion (confidence: "high" instead of a number)

Catch every one at the boundary, with paths:

const errs = validate(schema, { sentiment: "happy", confidence: 1.5 });
// → [
//   { path: "$.sentiment", message: 'must be one of ["positive","negative","neutral"]' },
//   { path: "$.confidence", message: "1.5 > maximum 1" },
// ]

JSONPath-style paths ($.user.profile.email) work for arbitrary nesting. Hand the error list back to the LLM as a retry instruction and you've got a self-correcting loop.
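
A minimal sketch of that retry loop follows. callModel is a hypothetical async function standing in for whatever API client you use, validate is assumed to return the { path, message } list shown above, and the retry wording and the maxAttempts default are illustrative choices rather than anything taken from the repo.

```javascript
// Turns a validation error list into feedback the model can act on.
function retryPrompt(prompt, errors) {
  const feedback = errors.map((e) => `- ${e.path}: ${e.message}`).join("\n");
  return `${prompt}\n\nYour previous reply failed validation:\n${feedback}\nReturn corrected JSON only.`;
}

// callModel(prompt) -> Promise<string> and validate(schema, value) -> errors[]
// are injected by the caller; both signatures are assumptions.
async function generateValidated({ callModel, validate, schema, prompt, maxAttempts = 3 }) {
  let message = prompt;
  let errors = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(message);
    let value;
    try {
      value = JSON.parse(raw);
    } catch (err) {
      // Unparseable replies are reported in the same shape as schema violations.
      errors = [{ path: "$", message: `invalid JSON: ${err.message}` }];
      message = retryPrompt(prompt, errors);
      continue;
    }
    errors = validate(schema, value);
    if (errors.length === 0) return { value, attempts: attempt };
    message = retryPrompt(prompt, errors);
  }
  throw new Error(`no valid output after ${maxAttempts} attempts: ${JSON.stringify(errors)}`);
}
```

Each retry rebuilds the message from the original prompt plus the latest errors only, so the context does not grow with every failed attempt. Capping the attempts keeps a model that never converges from looping forever.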

The JSON Schema subset that's enough

Implementing all of Draft 7 / 2020-12 is thousands of lines. The subset that actually shows up in LLM workflows is much smaller:

Implemented:

  • type (string, number, integer, boolean, array, object, null, plus union arrays)
  • properties + required
  • items + minItems / maxItems
  • enum / const
  • minimum / maximum
  • minLength / maxLength
  • pattern (regex)
  • format (rendered in the prompt only — no client-side validation; the LLM should treat it as an instruction)

Deliberately omitted:

  • oneOf / anyOf / allOf — rare in LLM workflows
  • $ref — schema-internal references; expansion adds complexity without LLM-use payoff
  • additionalProperties strict mode — LLMs don't volunteer extra properties anyway

Validator size: ~110 lines. Enough for LLM-output use.
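
To give a feel for how small such a validator can be, here is a minimal sketch covering part of the implemented list (type, enum, minimum / maximum, minItems / maxItems, properties, required). It is an illustrative reconstruction, not the repo's validate.js: the error wording follows the example output above where one exists and is otherwise invented, and minLength / maxLength / pattern / const / union types are left out for brevity.

```javascript
// Illustrative subset validator. Returns [{ path, message }], empty when valid.
function validate(schema, value, path = "$") {
  const errors = [];
  const fail = (message) => errors.push({ path, message });

  if (schema.enum && !schema.enum.includes(value)) {
    fail(`must be one of ${JSON.stringify(schema.enum)}`);
  }

  switch (schema.type) {
    case "string":
      if (typeof value !== "string") fail(`expected string, got ${typeof value}`);
      break;
    case "number":
    case "integer":
      if (typeof value !== "number") {
        fail(`expected ${schema.type}, got ${typeof value}`);
        break;
      }
      if (schema.type === "integer" && !Number.isInteger(value)) fail(`${value} is not an integer`);
      if (schema.minimum !== undefined && value < schema.minimum) fail(`${value} < minimum ${schema.minimum}`);
      if (schema.maximum !== undefined && value > schema.maximum) fail(`${value} > maximum ${schema.maximum}`);
      break;
    case "boolean":
      if (typeof value !== "boolean") fail(`expected boolean, got ${typeof value}`);
      break;
    case "array":
      if (!Array.isArray(value)) { fail("expected array"); break; }
      if (schema.minItems !== undefined && value.length < schema.minItems) fail(`fewer than ${schema.minItems} items`);
      if (schema.maxItems !== undefined && value.length > schema.maxItems) fail(`more than ${schema.maxItems} items`);
      if (schema.items) {
        // Recurse with an indexed path so errors point at the offending item.
        value.forEach((item, i) => errors.push(...validate(schema.items, item, `${path}[${i}]`)));
      }
      break;
    case "object":
      if (value === null || typeof value !== "object" || Array.isArray(value)) { fail("expected object"); break; }
      for (const name of schema.required ?? []) {
        if (!(name in value)) errors.push({ path: `${path}.${name}`, message: "required field is missing" });
      }
      for (const [name, sub] of Object.entries(schema.properties ?? {})) {
        if (name in value) errors.push(...validate(sub, value[name], `${path}.${name}`));
      }
      break;
  }
  return errors;
}

const sentimentSchema = {
  type: "object",
  properties: {
    sentiment: { type: "string", enum: ["positive", "negative", "neutral"] },
    confidence: { type: "number", minimum: 0, maximum: 1 },
  },
  required: ["sentiment", "confidence"],
};
const outOfRange = validate(sentimentSchema, { sentiment: "happy", confidence: 1.5 });
// two errors: $.sentiment (enum) and $.confidence ("1.5 > maximum 1")
const missingField = validate(sentimentSchema, { sentiment: "neutral" });
// one error: $.confidence, "required field is missing"
```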

Architecture

prompt.js    ← Schema → LLM prompt + example synthesizer (15 tests)
validate.js  ← JSON Schema validator subset (18 tests)
presets.js   ← 5 real-world schemas + matching sample outputs
app.js       ← UI glue

prompt.js and validate.js are DOM-free. 33 unit tests under node --test cover scalars, nested objects, arrays-of-objects, enum / const / pattern / range constraints, realistic compound schemas, and the "array bounds get included" invariant that bit me earlier.

The 5 presets cover the schemas you actually meet:

  • Sentiment + confidence — the classic NLP shape
  • Address parser — structured extraction from a free-form string
  • Meeting summary — title + attendees + action items array + decisions
  • Entity extraction — text + type + start/end position
  • Product extraction — product description → name / category / price / stock / tags

Try it

Pick a preset, copy the generated prompt fragment into any LLM (ChatGPT / Claude / your local model), paste the response back into the validator. The whole cycle takes ~30 seconds and shows you exactly what your structured-output pipeline needs to handle.

Takeaways

  • Structured output from LLMs is two responsibilities: schema → natural-language prompt, and output → validation.
  • Translate JSON Schema constructs to English explicitly: enum → [one of: ...], minimum/maximum → [≥ x, ≤ y], required flagged on every field.
  • End the prompt with hallucination suppression ("Use null. Do not invent data.") — small line, large effect.
  • Synthesize an example output from the schema as a shape hint, not real data. Placeholders are correct.
  • Validate at the boundary with paths in errors — feeds directly into LLM retry loops.
  • Don't implement all of JSON Schema. The subset that matters for LLM workflows is ~110 lines of validator.

This is OSS portfolio #249 from SEN LLC (Tokyo), the third entry in the "Try the Tech Radar" series. Previous: #248 Markdown → Typst, #247 TOON converter. Next up: Server-driven UI. We ship continuously: https://sen.ltd/portfolio/
