DEV Community

Nova

JSON-First Prompting: Valid Structured Output (Plus a Repair Loop)

If you use LLMs for anything beyond “brainstorm ideas”, you quickly run into the same bottleneck:

You don’t want prose.

You want structured output you can pipe into code:

  • a list of tasks with owners
  • a set of test cases
  • a config file
  • a PR checklist
  • a JSON object you can validate

The problem: LLMs love to almost follow your format.

This post is my practical approach to JSON-first prompting—including a recovery flow for when the model outputs invalid JSON.


The principle: treat output like an API response

When you want JSON, stop prompting like you’re chatting and start prompting like you’re designing an API contract.

That means:

  • explicitly define the schema
  • forbid extra keys
  • specify error behavior
  • validate output in code

A copy/paste JSON-first prompt

Use this template as a starting point.

You are a careful assistant that outputs ONLY valid JSON.

Return a JSON object matching this schema:
{
  "title": string,
  "summary": string,
  "tasks": [
    {
      "id": string,
      "description": string,
      "priority": "low" | "medium" | "high",
      "estimate_hours": number,
      "dependencies": string[]
    }
  ]
}

Rules:
- Output must be VALID JSON (double quotes, no trailing commas).
- Output must contain ONLY the JSON object. No markdown, no commentary.
- Do not include keys not in the schema.
- If the input is missing critical info, set summary to "NEEDS_INFO" and include a "tasks" entry with id "question" describing what you need.

Input:
<PASTE YOUR NOTES HERE>
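
To make the contract concrete, here's the kind of response this prompt should produce (illustrative values, not from a real run):

```json
{
  "title": "Onboarding flow cleanup",
  "summary": "Tasks extracted from the sprint notes.",
  "tasks": [
    {
      "id": "t1",
      "description": "Add validation to the signup form",
      "priority": "high",
      "estimate_hours": 4,
      "dependencies": []
    },
    {
      "id": "t2",
      "description": "Write tests for the validation logic",
      "priority": "medium",
      "estimate_hours": 2,
      "dependencies": ["t1"]
    }
  ]
}
```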

This does a few important things:

  • It says “ONLY JSON” twice (people underestimate how much that helps)
  • It defines acceptable enums
  • It specifies what to do when info is missing (instead of hallucinating)

Add a “strictness knob” when it matters

Sometimes you want the model to ask questions. Sometimes you want it to make assumptions.

I add a strictness knob:

Strictness: 0 = assume reasonable defaults, 1 = ask questions when unsure.
Strictness = 1

Then include:

- If Strictness = 1 and any required field can’t be derived, return NEEDS_INFO behavior.

It prevents the model from guessing estimates, priorities, etc.


The reality: you still need validation

Even with a great prompt, you should validate.

Here’s a minimal Node.js snippet that:

1) extracts the JSON object from a response (helpful when a model leaks extra text around it)
2) parses it
3) validates basic shape

// parse-json.js

// Grab everything from the first "{" to the last "}".
// Naive, but it handles models that wrap the object in prose or markdown fences.
export function extractJson(text) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end === -1 || end <= start) {
    throw new Error("No JSON object found");
  }
  return text.slice(start, end + 1);
}

export function parseJson(text) {
  const jsonText = extractJson(text);
  return JSON.parse(jsonText);
}

// Spot-check the shape; not a full JSON Schema validator.
export function assertSchema(obj) {
  if (typeof obj.title !== "string") throw new Error("title must be string");
  if (!Array.isArray(obj.tasks)) throw new Error("tasks must be array");
  for (const t of obj.tasks) {
    if (typeof t.id !== "string") throw new Error("task.id must be string");
    if (typeof t.description !== "string") throw new Error("task.description must be string");
  }
}

This isn’t a full JSON Schema validator, but it catches 90% of “oops” errors fast.
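
Here's what that looks like on a response that leaks prose around the object, one of the most common failures. The helper is inlined so the snippet runs standalone:

```javascript
// Same extractJson as in parse-json.js, inlined for a standalone demo.
function extractJson(text) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end === -1 || end <= start) {
    throw new Error("No JSON object found");
  }
  return text.slice(start, end + 1);
}

// A typical "almost followed the format" response.
const leaky =
  "Sure! Here is your JSON:\n" +
  '{"title": "Sprint plan", "tasks": [{"id": "t1", "description": "Set up CI"}]}\n' +
  "Let me know if you need anything else.";

const obj = JSON.parse(extractJson(leaky));
console.log(obj.title);        // "Sprint plan"
console.log(obj.tasks.length); // 1
```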


Recovery flow: when the JSON is invalid

When parsing fails, don’t throw away the whole response. Instead, send a repair prompt.

Repair prompt (copy/paste)

Your previous output was invalid JSON.

Fix it.

Rules:
- Output ONLY valid JSON.
- Keep the same data and meaning.
- Do not add new keys.

Here is the invalid output:
<PASTE>

This works surprisingly well.

Bonus: validate → repair → validate loop

In practice, I do:

1) ask for JSON
2) parse + validate
3) if fail, run repair prompt
4) parse + validate again

You can even automate this loop.
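
A minimal sketch of that loop. `callModel(prompt)` is a hypothetical stand-in for whatever LLM client you use, and `validate(obj)` is any function that throws on a bad shape (like `assertSchema` from parse-json.js):

```javascript
// Sketch: validate → repair → validate, with one repair attempt by default.
// callModel(prompt) is a stand-in for your LLM client;
// validate(obj) should throw when the shape is wrong.
async function getValidJson(callModel, prompt, validate, maxRepairs = 1) {
  let raw = await callModel(prompt);
  for (let attempt = 0; ; attempt++) {
    try {
      // Same naive extraction as parse-json.js: first "{" to last "}".
      const start = raw.indexOf("{");
      const end = raw.lastIndexOf("}");
      const obj = JSON.parse(raw.slice(start, end + 1));
      validate(obj);
      return obj;
    } catch (err) {
      if (attempt >= maxRepairs) throw err; // give up after maxRepairs fixes
      raw = await callModel(
        "Your previous output was invalid JSON.\n\nFix it.\n\nRules:\n" +
          "- Output ONLY valid JSON.\n" +
          "- Keep the same data and meaning.\n" +
          "- Do not add new keys.\n\n" +
          "Here is the invalid output:\n" + raw
      );
    }
  }
}
```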


A more robust option: JSON Schema (when available)

Some LLM APIs support “structured output” / JSON Schema mode. If you have it, use it.

But even then, the prompting principles stay the same:

  • define the schema
  • constrain output
  • validate anyway

Because the failure mode isn’t just “invalid JSON”—it’s valid JSON with the wrong meaning.
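
For reference, the task schema from the template translates to JSON Schema like this. `additionalProperties: false` is the machine-checkable version of "do not include keys not in the schema":

```json
{
  "type": "object",
  "additionalProperties": false,
  "required": ["title", "summary", "tasks"],
  "properties": {
    "title": { "type": "string" },
    "summary": { "type": "string" },
    "tasks": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["id", "description", "priority", "estimate_hours", "dependencies"],
        "properties": {
          "id": { "type": "string" },
          "description": { "type": "string" },
          "priority": { "enum": ["low", "medium", "high"] },
          "estimate_hours": { "type": "number" },
          "dependencies": { "type": "array", "items": { "type": "string" } }
        }
      }
    }
  }
}
```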


Practical example: generate test cases as JSON

Here’s a prompt that generates unit test cases you can feed into a test runner.

Output ONLY JSON.

Schema:
{
  "function": string,
  "cases": [
    {
      "name": string,
      "input": object,
      "expected": object,
      "notes": string
    }
  ]
}

Rules:
- Valid JSON only.
- cases must include: happy path, edge cases, invalid inputs.

Function contract:
- function: normalizeUser(user)
- input: { name?: string, email?: string, age?: number }
- output: { name: string, email: string, age: number, isAdult: boolean }
- behavior:
  - missing name/email => throw "INVALID_USER"
  - missing age => default to 0

Generate 8 cases.

This gives you something you can turn into real tests quickly.


The key mindset shift

When you prompt for structured output, you’re not “asking for help.”

You’re defining:

  • a contract
  • a validator
  • and a retry path

That combo is what makes LLMs usable in pipelines.


Want more copy/paste templates?

I publish the templates I actually use (code review, debugging, planning, writing) in my Prompt Engineering Cheatsheet.
