DEV Community

Tahmid

Posted on

How to structure JSON for LLMs (and stop wasting tokens)

Most developers treat JSON as an afterthought when building LLM-powered apps. They dump raw API responses into prompts and wonder why the model hallucinates, misreads fields, or burns through tokens.

JSON structure is a first-class concern in AI engineering. Here's how to get it right.


The problem: LLMs don't read JSON like humans do

When you paste this into a prompt:

{"user":{"id":1,"name":"Alice","preferences":{"theme":"dark","notifications":true,"language":"en"}}}

The model doesn't see your neat structure — it sees a stream of tokens, and every {, ", :, and , typically costs a token of its own. Deeply nested structures force the model to maintain more working context just to understand the shape of the data, before it even processes the values.

The result: more tokens consumed, more room for misinterpretation, higher API costs.


Rule 1: Flatten before you prompt

Nested JSON is great for APIs. It's bad for prompts.

Before:

{
  "user": {
    "profile": {
      "name": "Alice",
      "age": 28
    }
  }
}

After (flattened):

{
  "user_name": "Alice",
  "user_age": 28
}

One level deep is almost always enough for LLM context. If the model needs to reason about relationships, describe them in natural language alongside the data — don't encode them in nesting.
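One way to sketch this flattening is a small recursive helper. This is a hypothetical utility, not from any library — note that it joins the full key path, so you may want to trim prefixes (e.g. user_profile_name → user_name) before prompting:

```python
def flatten(obj, prefix=""):
    """Collapse nested dicts into a single level, joining keys with '_'."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested objects
        else:
            flat[name] = value
    return flat

nested = {"user": {"profile": {"name": "Alice", "age": 28}}}
print(flatten(nested))
# {'user_profile_name': 'Alice', 'user_profile_age': 28}
```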


Rule 2: Strip fields the model doesn't need

Every field in your JSON costs tokens. If the model doesn't need created_at, updated_at, internal_id, or _metadata — remove them before building the prompt.

const { created_at, updated_at, _metadata, ...relevant } = apiResponse;
const prompt = `Here is the user data: ${JSON.stringify(relevant)}`;

This alone can cut token usage by 20–40% on typical API responses.


Rule 3: Use TOON for large payloads

If you're passing payloads larger than ~500 tokens, consider TOON (Token-Oriented Object Notation). It's a compact alternative to JSON that strips redundant syntax while preserving structure.

JSON:

[
  {"name": "Alice", "role": "admin"},
  {"name": "Bob", "role": "editor"}
]

TOON:

name|role
Alice|admin
Bob|editor

Token reduction: 30–60% on typical datasets. The model reads it correctly because the structure is still unambiguous — just more compact.

Try it on your own payloads with the JSON to TOON converter. There's also a TOON to JSON converter for decoding the model's response back.
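For uniform arrays like the one above, the tabular encoding is simple enough to sketch yourself. Here's a minimal, assumed implementation (it only handles flat objects that all share the same keys — a real TOON encoder covers more cases):

```python
def to_toon(rows):
    """Encode a uniform list of flat dicts as a pipe-delimited table:
    one header line, then one line per row."""
    headers = list(rows[0].keys())
    lines = ["|".join(headers)]
    for row in rows:
        lines.append("|".join(str(row[h]) for h in headers))
    return "\n".join(lines)

data = [
    {"name": "Alice", "role": "admin"},
    {"name": "Bob", "role": "editor"},
]
print(to_toon(data))
# name|role
# Alice|admin
# Bob|editor
```

The savings come from stating the keys once in the header instead of repeating them in every row.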


Rule 4: Use JSON Schema to enforce structured outputs

LLMs can return JSON — but without constraints, they hallucinate keys, change types, and add fields you didn't ask for.

The fix: define a schema and include it in your system prompt.

{
  "type": "object",
  "properties": {
    "sentiment": { "type": "string", "enum": ["positive", "negative", "neutral"] },
    "confidence": { "type": "number", "minimum": 0, "maximum": 1 },
    "summary": { "type": "string", "maxLength": 200 }
  },
  "required": ["sentiment", "confidence", "summary"]
}

Tell the model: "Respond only with a JSON object matching this schema. No explanation, no markdown."

Then validate the output with a JSON Schema validator before trusting it. This is especially critical in agentic workflows where one bad output poisons downstream steps. You can generate a schema automatically from any sample payload using the JSON Schema generator — useful as a starting point you can then tighten up.
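The checks that schema encodes can be sketched in plain Python. This hand-rolled validator is illustrative only — in production you'd reach for a real JSON Schema validator (e.g. the jsonschema package) or the Pydantic/Zod approach in the next rule:

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}

def check_sentiment_payload(raw: str) -> dict:
    """Parse LLM output and enforce the schema's constraints by hand.
    Raises on malformed JSON or out-of-range values."""
    data = json.loads(raw)  # raises json.JSONDecodeError if not valid JSON
    assert data["sentiment"] in ALLOWED_SENTIMENTS
    assert 0 <= data["confidence"] <= 1
    assert len(data["summary"]) <= 200
    return data

result = check_sentiment_payload(
    '{"sentiment": "positive", "confidence": 0.92, "summary": "Great UX."}'
)
```

The point is the boundary: nothing downstream should ever touch LLM output that hasn't passed through a check like this.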


Rule 5: Use Pydantic or Zod to validate at the boundary

Never trust raw LLM JSON output in production. Parse and validate it immediately.

Python (FastAPI / AI agents):

from pydantic import BaseModel, Field
from typing import Literal

class SentimentResult(BaseModel):
    # Constraints mirror the JSON Schema from Rule 4
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0, le=1)
    summary: str = Field(max_length=200)

result = SentimentResult.model_validate_json(llm_output)

TypeScript (Next.js / tRPC):

import { z } from 'zod';

const SentimentResult = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  confidence: z.number().min(0).max(1),
  summary: z.string().max(200),
});

const result = SentimentResult.parse(JSON.parse(llmOutput));

Writing these by hand from a large JSON payload is tedious. The JSON to Pydantic and JSON to Zod generators handle it instantly — paste your payload, get the model.


Rule 6: Use TypeScript interfaces when working with typed LLM outputs

If you're building in TypeScript, generating interfaces from your JSON response shapes saves time and prevents drift between what the LLM returns and what your code expects.

// Generated from your actual LLM response shape
interface SentimentResponse {
  sentiment: 'positive' | 'negative' | 'neutral';
  confidence: number;
  summary: string;
}

The JSON to TypeScript converter generates these from any payload — useful when you're iterating quickly on prompt outputs and want the type system to catch regressions.


The full checklist

Before passing JSON to an LLM:

  • Flatten nested structures to one level where possible
  • Strip fields irrelevant to the task
  • Use TOON for payloads > 500 tokens
  • Define a JSON Schema for expected output
  • Validate output with Pydantic or Zod before use
  • Use TypeScript interfaces to catch output shape regressions

These aren't micro-optimisations. On high-volume AI apps, they compound into significant cost and reliability improvements.


What's your current approach to JSON in LLM workflows? Drop it in the comments — I'm curious how others handle this.


