DEV Community


Why Asking an LLM for JSON Isn’t Enough

Vaishali on March 11, 2026

When I first learned prompting, I assumed something simple: if I needed structured data from an LLM, I could just tell the model to resp...
Ben Halpern

Great post

Vaishali

Thanks, glad you found it useful!

Vasiliy Shilov

This is a great breakdown of the evolution toward reliability. Since you mentioned the challenges of prompt maintenance and the overhead of managing schemas, you might find Token-Oriented Object Notation (TOON) interesting.
I came across it on GitHub (toon-format/toon). It's specifically designed to act as a bridge between JSON and LLMs. It looks a bit like YAML but is optimized to save 30-60% on tokens by stripping away the syntactic noise of JSON while remaining machine-readable.

Aakash

TOON is an interesting idea, especially for reducing token overhead when you're passing small, repeated structures to a model. Cutting JSON’s syntactic noise can definitely help in prompt-heavy workflows.

One limitation I’ve run into with formats like that is when the schema becomes deeply nested or complex. At that point the readability and structure advantages of JSON (and the surrounding tooling/validators) tend to win back the ground that was saved in tokens.

So in practice I’ve found a rough rule of thumb:

  • Small / repeated schemas → formats like TOON can work nicely
  • Deep or complex schemas → JSON + schema validation tends to be more maintainable

Another approach that sometimes helps is simplifying the schema and splitting the task into multiple LLM calls instead of trying to force one large structured response.

Curious if anyone here has tried TOON in larger pipelines and how it behaved with more complex structures.

Vaishali

That’s a really interesting perspective. The trade-off between token efficiency and maintainability makes a lot of sense, especially once schemas start getting deeply nested.

I haven’t experimented with TOON yet, but the point about JSON + schema validation winning for complex structures feels very realistic given the tooling around it.

Vaishali

Thanks! That’s interesting — I hadn’t come across TOON before. The idea of reducing JSON’s syntactic overhead for LLM interactions is pretty clever, especially if it can meaningfully reduce token usage.

In practice I’ve mostly focused on making structured outputs reliable (schemas, validation, etc.), but exploring formats that are more LLM-friendly is definitely an interesting direction.

Kai Alder

One thing I've been dealing with on the JS/TS side — if you're not using Python and Pydantic, Zod + the OpenAI SDK's zodResponseFormat helper is a game changer for this exact problem. You define your schema once in Zod, pass it as the response format, and get typed, validated output back. No more writing JSON schemas by hand.

The part about prompt injection affecting structured output is something I don't see enough people talk about. Even with json_schema mode, the content of the values can still be manipulated by injection — the schema just ensures the shape is right. So you still need to sanitize/validate the actual values in your business logic.

Curious — have you run into issues with the strict schema mode and optional fields? I found that handling nullable vs missing fields gets tricky when additionalProperties: false is set.

Vaishali

Thanks for sharing this — the Zod + OpenAI SDK approach sounds really useful for the JS/TS side.

I’m currently exploring the JS/TS ecosystem around this and planning a small project to experiment with structured outputs more deeply. I haven’t hit the strict schema + optional field edge cases yet, but I’ll definitely watch for that as I build it.

Really appreciate you pointing that out — I’ll share what I learn once I’ve had a chance to experiment with it more.

William Wang

Really solid walkthrough of the progression from "just ask for JSON" to proper structured outputs. This mirrors my experience exactly.

One thing I'd add — even with structured output schemas, you still need defensive parsing in production. Models can timeout, connections can drop mid-stream, and you'll get partial JSON. So the pattern I've landed on is: structured output schema as the first line of defense, then a fallback parser that attempts to extract partial data rather than failing completely.

The prompt injection point is especially important and often overlooked. I've seen production systems where the entire data pipeline depended on LLM-generated JSON with zero validation. Structured outputs don't just improve reliability — they fundamentally change the trust boundary between your LLM layer and the rest of your system.
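The fallback pattern described above, strict parse first, then salvage, might look something like this stdlib-only sketch. The truncation repair is naive (it ignores brackets inside string values), so treat it as an illustration rather than a production parser:

```python
import json

def parse_with_fallback(raw: str):
    """Strict parse first; on failure, salvage what we can from truncated JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Naive repair: walk back through truncation points, close any open
    # brackets, and retry. (Ignores brackets inside strings -- sketch only.)
    for end in range(len(raw), 0, -1):
        candidate = raw[:end].rstrip().rstrip(",")
        open_braces = candidate.count("{") - candidate.count("}")
        open_brackets = candidate.count("[") - candidate.count("]")
        if open_braces < 0 or open_brackets < 0:
            continue
        try:
            return json.loads(candidate + "]" * open_brackets + "}" * open_braces)
        except json.JSONDecodeError:
            continue
    return None  # nothing salvageable; the caller decides how to fail
```

So a stream that dies mid-array, like `'{"a": 1, "items": [1, 2'`, still yields `{"a": 1, "items": [1, 2]}` instead of an exception.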

Vaishali

Thanks for sharing this — that’s a really useful addition. The idea of treating structured outputs as the first line of defense and still keeping fallback parsing for partial responses makes a lot of sense for production systems.

I also like how you framed it as a trust boundary. That’s a great way to think about integrating LLM outputs into real systems.

William Wang

Appreciate the kind words! The trust boundary framing really resonated with me too — in production, you can't just assume the LLM will always return valid JSON. Having structured outputs as the primary path with graceful fallback parsing is exactly how we handle it in our systems. It's similar to how you'd validate any external API response, except LLM outputs are inherently less deterministic.

Vaishali

Appreciate you sharing your experience. The comparison with external API validation makes a lot of sense, and your point about fallback parsing was especially insightful for thinking about how these systems behave in production.

William Wang

Thanks Vaishali! Yeah, the production behavior aspect is where most teams get surprised. In my experience, the gap between "works in testing" and "handles real-world LLM output gracefully" is where fallback parsing really earns its keep. The models are getting more reliable at structured output, but having that safety net means you can upgrade models or change prompts without worrying about breaking downstream consumers.

Aakash

Nice breakdown of the evolution — prompting → JSON mode → function calling → json_schema. That progression really shows how LLMs are slowly moving from “text generators” toward software components.

One thing I’d add from the systems side: the real shift happens when an LLM response stops being UI text and becomes machine input.

The moment you do something like:

const data = JSON.parse(response)

the LLM is effectively part of your production system boundary. And boundaries fail.

So even with strict json_schema, most production pipelines still wrap the model with:

• schema validation (Pydantic / Zod / Ajv)
• correction loops (“repair the JSON to match this schema”)
• retry logic
• logging/observability for schema drift

A useful mental model is that LLMs behave less like deterministic libraries and more like unreliable upstream services.

Structured outputs reduce entropy, but the reliability really comes from the surrounding system design.

Prompting teaches you how to talk to models.
Engineering with them means assuming they will occasionally be wrong and designing around that.
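The wrapper layers listed above (validation, correction loop, retries) could be sketched roughly like this, with hypothetical `call_llm` and `validate` callables standing in for your actual client and schema check (e.g. Pydantic / Zod / Ajv on a real stack):

```python
import json

def call_with_repair(call_llm, prompt, validate, max_retries=2):
    """Validation plus a correction loop around a model call.

    call_llm: prompt -> str (hypothetical model client)
    validate: dict -> None, raising ValueError on bad data (hypothetical schema check)
    """
    raw = call_llm(prompt)
    for attempt in range(max_retries + 1):
        try:
            data = json.loads(raw)   # shape: is it JSON at all?
            validate(data)           # semantics: do the fields check out?
            return data
        except (json.JSONDecodeError, ValueError) as err:
            # Correction loop: feed the failure back and ask for a repair.
            raw = call_llm(f"Repair this JSON to match the schema. Error: {err}\n\n{raw}")
    raise RuntimeError("model never produced valid output")
```

Logging each `err` here is also where the schema-drift observability comes from: a rising repair rate is an early warning that a prompt or model change broke your contract.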

Vaishali

Thanks for sharing this — I really like the framing of LLMs as unreliable upstream services. Once the response becomes machine input instead of UI text, the reliability requirements change quite a bit.

I did mention schema validation with tools like Pydantic in the article. The retry logic and observability side of things are areas I’m planning to explore more as I go deeper into building with these systems.

klement Gunndu

Worth adding that Pydantic model_validate_json() pairs well with structured outputs — you get runtime type coercion plus field-level validation in one step, which catches the year-as-string problem you showed.
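For reference, a minimal sketch of that pairing in Pydantic v2 (the `Book` model here is a hypothetical stand-in for the article's example; by default Pydantic's lax mode coerces the string year into an int):

```python
from pydantic import BaseModel, ValidationError

class Book(BaseModel):  # hypothetical model mirroring the article's example
    title: str
    year: int

# model_validate_json parses and validates in one step;
# the year-as-string value "1965" is coerced to the int 1965.
book = Book.model_validate_json('{"title": "Dune", "year": "1965"}')

# Genuinely wrong data still fails loudly instead of slipping through.
try:
    Book.model_validate_json('{"title": "Dune", "year": "next year"}')
except ValidationError:
    pass
```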

Vaishali

Thanks! That’s a great addition. Pairing structured outputs with model_validate_json() gives you both type coercion and validation, which makes handling issues like the year-as-string case much safer in real applications.

Fard Johnmar

I learned this lesson as well. But I've learned that one trick beats them all for guaranteeing reliable output from LLMs: Pydantic.

Vaishali

Totally agree — Pydantic makes validation much easier and adds a really helpful safety layer when working with LLM outputs.

Fard Johnmar

Here's some additional information about this pattern: python.useinstructor.com/blog/2024... -- it's an older article, but my LLM outputs were completely transformed when I started implementing these patterns. instructor is a great package that enforces reliability at runtime with structured validation: it not only enforces format and quality requirements but also pipes feedback back to the LLM so it can self-correct. Using this framework I can implement highly complex workflows with multi-agent handoffs and review that work reliably. It's essentially helped me make LLMs deterministic rather than probabilistic.

Another lesson I've learned is to keep agents focused on delivering specific outputs rather than relying on them to deliver multiple outputs at once. My agentic systems are usually bundles of agents, each assigned to a specific task, which increases quality and observability. I also give the agents as much support as I can by delivering highly structured, specific information into their context.

The issue I see with a lot of people using AI in workflows is that they give agents too much to do and don't provide enough support for the agent to deliver consistent results. So being judicious about where you deploy agents in workflows is really important too.

But that's another topic altogether.

Vaishali

Thanks for sharing this — the Instructor pattern looks really interesting.

I also like the point about keeping agents focused on specific outputs rather than asking them to do too many things at once. That seems like a really practical approach for improving reliability and observability in agent workflows.

Velx Dev

Good progression laid out here. One wrinkle worth adding: streaming complicates this whole picture. With json_schema or function calling you get reliability at the end of a complete response, but once you enable streaming you're back to dealing with partial JSON mid-flight. Libraries like partial-json or the streaming parsers in some SDKs help, but it's an easy trap to walk into — especially when you want low-latency UIs that show output as it arrives. The moment you try to JSON.parse() a streaming chunk you're back to square one.
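One stdlib-only way to sidestep the streaming trap above is to buffer chunks and only emit a value once the accumulated text parses as complete JSON. This is a naive sketch (a bare number could parse "complete" too early, so it suits object/array payloads; libraries like partial-json go further and yield partial values mid-flight):

```python
import json

def objects_from_stream(chunks):
    """Buffer streamed chunks; yield a value only once the buffer is complete JSON."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        try:
            yield json.loads(buf)  # buffer is now a complete value
            buf = ""
        except json.JSONDecodeError:
            continue  # still mid-flight; wait for more chunks
```

The trade-off is exactly the one mentioned: you give up showing partial output in the UI in exchange for never parsing a half-finished chunk.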

Vaishali

That’s a great point. Streaming definitely complicates things because you’re dealing with partial JSON until the full response completes.
The trade-off between low-latency streaming UIs and reliable structured parsing is a really interesting challenge for production systems.

Ellis

So this only works with ChatGPT? Why not just have a validation step at the end? This is already a solved problem: think of any web form you hit submit on; the backend doesn't just blindly accept it.

Vaishali

Good point — validation is definitely still important, just like backend validation for form submissions. The difference is that with LLMs the model may not produce valid JSON at all unless it’s guided toward a schema.

Structured outputs help constrain the model during generation so the response already follows the expected structure, and validation can then act as the safety check afterward.

And it’s not limited to ChatGPT — I used OpenAI in the examples because it’s widely used, and many providers expose OpenAI-compatible APIs, so similar structured output patterns can often be used across different models.

Design Estimation LLC

LLMs are a game changer in the SEO field.

Vaishali

Definitely — LLMs are already changing how content is created, analyzed, and optimized in SEO workflows.

Harsh

So true. Telling an LLM "respond in JSON" is like telling a junior dev "write clean code": the intent is there, but the execution needs guardrails.

Vaishali

That’s a great analogy. The intent is there, but without clear constraints and validation, the results can still be unpredictable — which is exactly why structured outputs and schemas become important.