When I first learned prompting, I assumed something simple.
If I needed structured data from an LLM, I assumed I could just tell the model to respond in JSON.
Great post
Thanks, glad you found it useful!
This is a great breakdown of the evolution toward reliability. Since you mentioned the challenges of prompt maintenance and the overhead of managing schemas, you might find Token-Oriented Object Notation (TOON) interesting.
I came across it on GitHub toon-format/toon. It’s specifically designed to act as a bridge between JSON and LLMs. It looks a bit like YAML but is optimized to save 30-60% on tokens by stripping away the syntactic noise of JSON while remaining machine-readable.
TOON is an interesting idea, especially for reducing token overhead when you're passing small, repeated structures to a model. Cutting JSON’s syntactic noise can definitely help in prompt-heavy workflows.
One limitation I’ve run into with formats like that is when the schema becomes deeply nested or complex. At that point the readability and structure advantages of JSON (and the surrounding tooling/validators) tend to win back the ground that was saved in tokens.
So in practice I’ve found a rough rule of thumb: token-lean formats for small, flat, repeated structures; plain JSON plus schema validation once the structure gets deeply nested or complex.
Another approach that sometimes helps is simplifying the schema and splitting the task into multiple LLM calls instead of trying to force one large structured response.
Curious if anyone here has tried TOON in larger pipelines and how it behaved with more complex structures.
That’s a really interesting perspective. The trade-off between token efficiency and maintainability makes a lot of sense, especially once schemas start getting deeply nested.
I haven’t experimented with TOON yet, but the point about JSON + schema validation winning for complex structures feels very realistic given the tooling around it.
Thanks! That’s interesting — I hadn’t come across TOON before. The idea of reducing JSON’s syntactic overhead for LLM interactions is pretty clever, especially if it can meaningfully reduce token usage.
In practice I’ve mostly focused on making structured outputs reliable (schemas, validation, etc.), but exploring formats that are more LLM-friendly is definitely an interesting direction.
One thing I've been dealing with on the JS/TS side — if you're not using Python and Pydantic, Zod + the OpenAI SDK's zodResponseFormat helper is a game changer for this exact problem. You define your schema once in Zod, pass it as the response format, and get typed, validated output back. No more writing JSON schemas by hand.

The part about prompt injection affecting structured output is something I don't see enough people talk about. Even with json_schema mode, the content of the values can still be manipulated by injection — the schema just ensures the shape is right. So you still need to sanitize/validate the actual values in your business logic.

Curious — have you run into issues with strict schema mode and optional fields? I found that handling nullable vs. missing fields gets tricky when additionalProperties: false is set.

Thanks for sharing this — the Zod + OpenAI SDK approach sounds really useful for the JS/TS side.
I’m currently exploring the JS/TS ecosystem around this and planning a small project to experiment with structured outputs more deeply. I haven’t hit the strict schema + optional field edge cases yet, but I’ll definitely watch for that as I build it.
Really appreciate you pointing that out — I’ll share what I learn once I’ve had a chance to experiment with it more.
Really solid walkthrough of the progression from "just ask for JSON" to proper structured outputs. This mirrors my experience exactly.
One thing I'd add — even with structured output schemas, you still need defensive parsing in production. Models can timeout, connections can drop mid-stream, and you'll get partial JSON. So the pattern I've landed on is: structured output schema as the first line of defense, then a fallback parser that attempts to extract partial data rather than failing completely.
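In Python, that "structured output first, partial-data fallback second" pattern might be sketched like this. The brace-balancing repair here is a simplified illustration (it ignores brackets inside strings), not a production parser:

```python
import json

def parse_llm_json(text: str):
    """Try a strict parse first; fall back to best-effort recovery of partial JSON.

    Returns (data, complete): complete is False when we had to repair the text,
    and data is None when nothing was recoverable.
    """
    try:
        return json.loads(text), True  # clean parse: fully trusted
    except json.JSONDecodeError:
        pass
    # Fallback: close any unterminated strings, arrays, and objects, then retry.
    repaired = text.strip()
    if repaired.count('"') % 2 == 1:
        repaired += '"'  # close a string that was cut off mid-stream
    repaired += "]" * (repaired.count("[") - repaired.count("]"))
    repaired += "}" * (repaired.count("{") - repaired.count("}"))
    try:
        return json.loads(repaired), False  # partial data, flagged as degraded
    except json.JSONDecodeError:
        return None, False  # nothing recoverable

# A response truncated mid-stream still yields usable partial data:
data, complete = parse_llm_json('{"title": "Dune", "year": 1965, "tags": ["sci')
```

The important design choice is the `complete` flag: downstream code can decide whether degraded data is acceptable instead of the pipeline failing outright.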
The prompt injection point is especially important and often overlooked. I've seen production systems where the entire data pipeline depended on LLM-generated JSON with zero validation. Structured outputs don't just improve reliability — they fundamentally change the trust boundary between your LLM layer and the rest of your system.
Thanks for sharing this — that’s a really useful addition. The idea of treating structured outputs as the first line of defense and still keeping fallback parsing for partial responses makes a lot of sense for production systems.
I also like how you framed it as a trust boundary. That’s a great way to think about integrating LLM outputs into real systems.
Appreciate the kind words! The trust boundary framing really resonated with me too — in production, you can't just assume the LLM will always return valid JSON. Having structured outputs as the primary path with graceful fallback parsing is exactly how we handle it in our systems. It's similar to how you'd validate any external API response, except LLM outputs are inherently less deterministic.
Appreciate you sharing your experience. The comparison with external API validation makes a lot of sense, and your point about fallback parsing was especially insightful for thinking about how these systems behave in production.
Thanks Vaishali! Yeah, the production behavior aspect is where most teams get surprised. In my experience, the gap between "works in testing" and "handles real-world LLM output gracefully" is where fallback parsing really earns its keep. The models are getting more reliable at structured output, but having that safety net means you can upgrade models or change prompts without worrying about breaking downstream consumers.
Nice breakdown of the evolution — prompting → JSON mode → function calling → json_schema. That progression really shows how LLMs are slowly moving from “text generators” toward software components.
One thing I’d add from the systems side: the real shift happens when an LLM response stops being UI text and becomes machine input.
The moment you do something like:
const data = JSON.parse(response)

the LLM is effectively part of your production system boundary. And boundaries fail.
So even with strict json_schema, most production pipelines still wrap the model with:
• schema validation (Pydantic / Zod / Ajv)
• correction loops (“repair the JSON to match this schema”)
• retry logic
• logging/observability for schema drift
A useful mental model is that LLMs behave less like deterministic libraries and more like unreliable upstream services.
Structured outputs reduce entropy, but the reliability really comes from the surrounding system design.
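A minimal sketch of that wrapping in Python, treating the model exactly like an unreliable upstream service. The `call_model` stub, the repair prompts, and the `validate` check are hypothetical stand-ins, not from the article:

```python
import json

def validate(data):
    """Stand-in for a real Pydantic/Zod/Ajv check: require an integer 'year'."""
    return isinstance(data, dict) and isinstance(data.get("year"), int)

def call_with_guardrails(call_model, max_retries=2):
    """Call the model, validate the JSON, and feed errors back for self-correction."""
    prompt = "Extract the book as JSON with an integer 'year'."
    for _ in range(max_retries + 1):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as e:
            # Correction loop: tell the model what broke and retry.
            prompt = f"Your last reply was not valid JSON ({e}). Return only valid JSON."
            continue
        if validate(data):
            return data  # passed the schema check: safe to hand downstream
        prompt = "The JSON shape was wrong: 'year' must be an integer. Fix and resend."
    raise ValueError("model never produced schema-valid JSON")

# Demo with a scripted fake model that fails validation once, then self-corrects:
replies = iter(['{"year": "1965"}', '{"year": 1965}'])
result = call_with_guardrails(lambda prompt: next(replies))
```

In a real system the final `raise` is where logging and schema-drift observability would hook in.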
Prompting teaches you how to talk to models.
Engineering with them means assuming they will occasionally be wrong and designing around that.
Thanks for sharing this — I really like the framing of LLMs as unreliable upstream services. Once the response becomes machine input instead of UI text, the reliability requirements change quite a bit.
I did mention schema validation with tools like Pydantic in the article. The retry logic and observability side of things are areas I’m planning to explore more as I go deeper into building with these systems.
Worth adding that Pydantic model_validate_json() pairs well with structured outputs — you get runtime type coercion plus field-level validation in one step, which catches the year-as-string problem you showed.
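For concreteness, a minimal sketch of that pairing, assuming Pydantic v2 is installed (the Book model is illustrative):

```python
from pydantic import BaseModel, ValidationError

class Book(BaseModel):
    title: str
    year: int  # Pydantic's lax mode coerces a numeric string like "1965" to int

# Even if the model emits year as a string, validation coerces it in one step:
book = Book.model_validate_json('{"title": "Dune", "year": "1965"}')

# Genuinely bad values still fail loudly instead of leaking downstream:
try:
    Book.model_validate_json('{"title": "Dune", "year": "next year"}')
except ValidationError:
    pass  # reject and handle, rather than storing garbage
```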
Thanks! That’s a great addition. Pairing structured outputs with model_validate_json() gives you both type coercion and validation, which makes handling issues like the year-as-string case much safer in real applications.

I learned this lesson as well, and I've found that one trick beats them all for guaranteeing reliable output from LLMs: Pydantic.
Totally agree — Pydantic makes validation much easier and adds a really helpful safety layer when working with LLM outputs.
Here's some additional information about this pattern: python.useinstructor.com/blog/2024... -- older article, but my LLM outputs completely transformed when I started to implement these patterns. instructor is a great package that you can install that will enforce reliability at runtime with structured validation that not only enforces format and quality requirements but also pipes feedback back to the LLM to self-correct. Using this framework I can implement highly complex workflows with multiple-agent handoffs and review reliably 100% of the time. It's essentially helped me make LLMs deterministic rather than probabilistic.
Another lesson I've learned is to keep agents focused on delivering specific outputs rather than relying on them to deliver multiple outputs at once. My agentic systems are usually bundles of agents, each assigned to a specific task, which increases quality and observability, along with as much support as I can give the agents by delivering highly structured, specific information into their context.

The issue I see with a lot of people using AI in workflows is that they give agents too much to do and don't provide enough support for the agent to deliver consistent results. So being judicious about where you deploy agents in workflows is really important too.

But that's another topic altogether.
Thanks for sharing this — the Instructor pattern looks really interesting.
I also like the point about keeping agents focused on specific outputs rather than asking them to do too many things at once. That seems like a really practical approach for improving reliability and observability in agent workflows.
Good progression laid out here. One wrinkle worth adding: streaming complicates this whole picture. With json_schema or function calling you get reliability at the end of a complete response, but once you enable streaming you're back to dealing with partial JSON mid-flight. Libraries like partial-json or the streaming parsers in some SDKs help, but it's an easy trap to walk into — especially when you want low-latency UIs that show output as it arrives. The moment you try to JSON.parse() a streaming chunk you're back to square one.
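One common workaround for that trap, sketched here with a simulated chunk stream: accumulate chunks and attempt a full parse opportunistically, treating parse failures as "not complete yet" rather than as errors.

```python
import json

def updates_from_stream(chunks):
    """Yield a parsed object each time the accumulated buffer becomes valid JSON."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        try:
            yield json.loads(buffer)  # the document is (newly) complete
        except json.JSONDecodeError:
            continue  # still mid-flight: never JSON.parse a partial chunk directly

# Simulated token-by-token stream of one JSON object:
chunks = ['{"title": ', '"Dune", ', '"year": 1965', '}']
updates = list(updates_from_stream(chunks))  # only fires once the object closes
```

For lower-latency UIs you would swap `json.loads` for a lenient parser such as partial-json, which can surface fields before the document closes.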
That’s a great point. Streaming definitely complicates things because you’re dealing with partial JSON until the full response completes.
The trade-off between low-latency streaming UIs and reliable structured parsing is a really interesting challenge for production systems.
So this only works with ChatGPT? Why not just have a validation step at the end? I.e., this is already a solved problem: think of any web form, when the user hits submit, the backend doesn't just blindly accept it.
Good point — validation is definitely still important, just like backend validation for form submissions. The difference is that with LLMs the model may not produce valid JSON at all unless it’s guided toward a schema.
Structured outputs help constrain the model during generation so the response already follows the expected structure, and validation can then act as the safety check afterward.
And it’s not limited to ChatGPT — I used OpenAI in the examples because it’s widely used, and many providers expose OpenAI-compatible APIs, so similar structured output patterns can often be used across different models.
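As a sketch of what "constraining during generation" looks like on the wire: the response_format shape below follows OpenAI's structured outputs API, while the model name and schema fields are illustrative:

```python
# Request payload asking the model to conform to a schema *while generating*;
# backend validation afterward remains the safety net, as with any form submit.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Extract the book as JSON."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "book",
            "strict": True,  # constrained decoding: output must match the schema
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "year": {"type": "integer"},
                },
                "required": ["title", "year"],
                "additionalProperties": False,
            },
        },
    },
}
```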
LLMs are a game changer in the SEO field.
Definitely — LLMs are already changing how content is created, analyzed, and optimized in SEO workflows.
So true. Telling an LLM "respond in JSON" is like telling a junior dev "write clean code": the intent is there, but the execution needs guardrails.
That’s a great analogy. The intent is there, but without clear constraints and validation, the results can still be unpredictable — which is exactly why structured outputs and schemas become important.