I shipped a feature that sends a photo to a vision model and gets back structured feedback — three labeled fields, every time, no exceptions. The hard part wasn't the prompt. It was getting the model to return JSON I could actually trust without writing a defensive parser around every call.
If you've ever written this, you know the pain:
```typescript
const text = response.choices[0].message.content;
const json = JSON.parse(text); // 🙏 please be valid
```
That line works in the demo and fails in production. The model wraps the JSON in markdown code fences. It adds a "Sure! Here's your analysis:" preamble. It returns colorCoordination one day and color_coordination the next. Every one of those is a thrown exception or a silent undefined three layers down.
Here's how I got rid of that entire class of bug using OpenAI's Responses API and Zod — with the actual code from a side project I built, an AI outfit-feedback app.
The contract: define the shape once, in Zod
The whole trick is that you describe the output shape as a schema, hand it to the API, and the API guarantees the response conforms to it. No fences, no preamble, no key drift. The model is constrained at decode time, not politely asked in the prompt.
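For intuition, here is roughly what handing the schema to the API looks like if you wrote it by hand. This is a sketch of a Responses API text.format object with type json_schema and strict: true. The constant names are hypothetical, and the JSON Schema that zodTextFormat actually generates may differ in detail (for example, in how nullable fields are spelled).

```typescript
// Hand-written sketch of a strict structured-output format for the Responses API.
// Assumption: zodTextFormat(outfitZod, 'outfit_response') produces something
// equivalent, though not necessarily byte-identical.
const nullableString = { type: ['string', 'null'] } as const;

const outfitResponseFormat = {
  type: 'json_schema',
  name: 'outfit_response',
  strict: true, // output is constrained at decode time to match the schema
  schema: {
    type: 'object',
    properties: {
      fit: nullableString,
      colorCoordination: nullableString,
      occasionSuitability: nullableString,
      error: nullableString,
    },
    // Strict mode requires every declared property to be listed as required...
    required: ['fit', 'colorCoordination', 'occasionSuitability', 'error'],
    // ...and forbids keys that are not declared.
    additionalProperties: false,
  },
} as const;
```

Those two strict-mode rules, every property required and no extra properties, are what turn "no key drift" into a guarantee rather than a hope.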
Start with the schema. This is the exact shape my app expects back for one outfit:
```typescript
import { z } from 'zod';

const outfitZod = z.object({
  fit: z.string().nullable().optional(),
  colorCoordination: z.string().nullable().optional(),
  occasionSuitability: z.string().nullable().optional(),
  error: z.string().nullable().optional(),
});

export type OutfitFeedback = z.infer<typeof outfitZod>;
```
Notice two things:

- One schema, two jobs. It defines the wire format and gives me a TypeScript type for free via z.infer. The shape can't drift between what the API returns and what my code thinks it returns; they're the same source of truth.
- An error field lives in the schema. This is how I let the model say "I can't analyze this image" inside the structured contract instead of breaking out of it with free text. More on that below; it's the part most tutorials skip.
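To make the z.infer point concrete, this is a hand-written type equivalent to what z.infer<typeof outfitZod> produces for the schema above. OutfitFeedbackByHand and the sample values are illustrative only. The point of z.infer is that you never have to write this yourself, so it can't fall out of sync with the schema.

```typescript
// Hand-written equivalent of z.infer<typeof outfitZod>, for illustration only.
// z.string().nullable().optional() infers to an optional key whose value
// may be a string, null, or undefined.
type OutfitFeedbackByHand = {
  fit?: string | null | undefined;
  colorCoordination?: string | null | undefined;
  occasionSuitability?: string | null | undefined;
  error?: string | null | undefined;
};

// A complete success value (placeholder text, not real model output).
const success: OutfitFeedbackByHand = {
  fit: 'example fit note',
  colorCoordination: 'example color note',
  occasionSuitability: 'example occasion note',
  error: null,
};

// The in-schema failure case.
const failure: OutfitFeedbackByHand = { error: 'Can not analyze the image.' };

// Also a valid value of the type: this is why the completeness check
// later in the post exists.
const empty: OutfitFeedbackByHand = {};

// A drifted key is a compile-time error (excess property check):
// const bad: OutfitFeedbackByHand = { color_coordination: 'x' };
```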
Wiring the schema into the call
OpenAI ships a helper, zodTextFormat, that converts your Zod schema into the JSON Schema the API wants. You pass it in the text.format field of a Responses API call:
```typescript
import OpenAI from 'openai';
import { zodTextFormat } from 'openai/helpers/zod';

// outfitZod and OutfitFeedback come from the schema block above.
// systemPrompt is the instruction string; an excerpt appears later in this post.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function sendPromptToGPT(imageUrl: string): Promise<OutfitFeedback> {
  const response = await openai.responses.parse({
    model: 'gpt-5-nano', // any vision-capable model that supports structured output works
    input: [
      { role: 'system', content: systemPrompt },
      {
        role: 'user',
        content: [{ type: 'input_image', image_url: imageUrl, detail: 'auto' }],
      },
    ],
    text: {
      // Converts the Zod schema into the JSON Schema the API enforces
      format: zodTextFormat(outfitZod, 'outfit_response'),
    },
  });

  // Typed as OutfitFeedback | null
  const parsed = response.output_parsed;
  if (!parsed) {
    throw new Error('Invalid response format from GPT');
  }
  return parsed;
}
```
The two lines that matter:

- openai.responses.parse(...): .parse is the structured-output variant of .create. It runs the response through your schema and hands back the validated object.
- response.output_parsed: already typed as OutfitFeedback. No JSON.parse. No casting. No as any. If the model's output didn't fit the schema, this is where you'd find out, not 200 lines downstream when a .toUpperCase() blows up on undefined.
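Conceptually, the parse step does two things: it JSON-decodes the model's text and validates it against the schema, failing loudly if it doesn't fit. Here is a dependency-free sketch of that idea. parseOutfitFeedback is a hypothetical name, and this is not the SDK's implementation (in the SDK, the Zod schema does the validation).

```typescript
type OutfitFeedback = {
  fit?: string | null;
  colorCoordination?: string | null;
  occasionSuitability?: string | null;
  error?: string | null;
};

const FIELDS = ['fit', 'colorCoordination', 'occasionSuitability', 'error'] as const;

// Hypothetical helper: decode the model's text, then check that every declared
// field is a string, null, or absent. Throws instead of returning a half-valid object.
function parseOutfitFeedback(raw: string): OutfitFeedback {
  const data: unknown = JSON.parse(raw); // throws SyntaxError on non-JSON (e.g. fenced text)
  if (typeof data !== 'object' || data === null || Array.isArray(data)) {
    throw new Error('Expected a JSON object');
  }
  const record = data as Record<string, unknown>;
  for (const key of FIELDS) {
    const value = record[key];
    if (value !== undefined && value !== null && typeof value !== 'string') {
      throw new Error(`Field "${key}" must be a string or null`);
    }
  }
  // Copy only declared keys; unknown keys are dropped, as z.object does by default.
  const result: OutfitFeedback = {};
  for (const key of FIELDS) {
    if (key in record) result[key] = record[key] as string | null;
  }
  return result;
}
```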
The image goes in as a content part with type: 'input_image'. The Responses API accepts both public URLs and base64-encoded data URLs here, so an uploaded photo can be sent directly without hosting it somewhere first.
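If the photo arrives as an upload rather than a URL, a small helper can turn its bytes into a data URL for the input_image part. This is a sketch assuming a Node.js runtime (for Buffer); toImageDataUrl and imagePart are hypothetical names.

```typescript
// Hypothetical helper: encode raw image bytes as a base64 data URL.
// Assumes Node.js, where Buffer is available globally.
function toImageDataUrl(bytes: Uint8Array, mimeType: string): string {
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// Hypothetical helper: build the same content part shape used in sendPromptToGPT.
function imagePart(imageUrl: string) {
  return { type: 'input_image' as const, image_url: imageUrl, detail: 'auto' as const };
}
```

In a server handler you might pass toImageDataUrl(fileBytes, 'image/jpeg') as the imageUrl argument; the rest of the call stays the same.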
The part tutorials skip: modeling failure inside the schema
Real inputs are messy. Someone uploads a blurry photo, a screenshot of a spreadsheet, a picture with no person in it. The naive move is to let the model "handle it" in prose — which immediately breaks your structured contract, because now you're back to parsing free text to figure out whether the call succeeded.
Instead, I made failure a first-class citizen of the schema. Look back at it: error sits right next to fit and colorCoordination. The system prompt then tells the model exactly when to use it:
Return one of two shapes only:

- Success: fit, colorCoordination, and occasionSuitability.
- Error: error only.

If the image cannot be analyzed at all (no person present, extremely poor quality), set error to "Can not analyze the image." and omit all other fields.
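One way to keep the prompt and the handler in sync is to define the error sentinel once and interpolate it into the prompt. The names below (ANALYSIS_ERROR, systemPromptExcerpt, isModelAnalysisError) are hypothetical, and the string reproduces only the excerpt quoted above, not the full prompt.

```typescript
// Hypothetical shared constant: the exact string the prompt asks the model to use.
const ANALYSIS_ERROR = 'Can not analyze the image.';

// Only the excerpt quoted in this post; the real prompt has more instructions.
const systemPromptExcerpt = [
  'Return one of two shapes only:',
  '- Success: fit, colorCoordination, and occasionSuitability.',
  '- Error: error only.',
  `If the image cannot be analyzed at all (no person present, extremely poor quality), set error to "${ANALYSIS_ERROR}" and omit all other fields.`,
].join('\n');

// Lets the handler tell the model's own "can't analyze" answer apart from other errors.
function isModelAnalysisError(error: string | null | undefined): boolean {
  return error === ANALYSIS_ERROR;
}
```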
Now "I can't do this" is a normal, typed, schema-valid response. My handler reads cleanly:
```typescript
// Inside sendPromptToGPT, replacing the bare `return parsed` after the parse call
const parsed = response.output_parsed;
if (!parsed) {
  throw new Error('Invalid response format from GPT');
}

// The model told us, in-schema, that it couldn't analyze the image
if (parsed.error) {
  return { error: parsed.error };
}

// Belt and suspenders: a "success" response must actually be complete
if (!parsed.fit || !parsed.colorCoordination || !parsed.occasionSuitability) {
  return { error: 'Incomplete feedback received. Please try again.' };
}

return parsed;
```
That last check is deliberate. Structured output guarantees the response matches the schema, but my fields are .nullable().optional(), so an object with every field set to null (or even an empty {}) passes validation. The schema enforces shape, not business completeness. Keeping those two concerns separate is the whole game: let the API guarantee structure, and write the handful of lines that guarantee meaning. Don't try to make Zod express "exactly these three fields together OR just this one". You'll fight the type system and the model, and strict mode requires the root of the schema to be an object anyway, so a top-level union isn't an option. A flat schema plus two if statements is clearer and never lies to you.
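The handler's two checks can also be written as a small pure function that narrows the loose schema type into a result the UI can trust without re-checking for null. This is a sketch. FeedbackResult and toFeedbackResult are hypothetical names, and the error message mirrors the one in the handler above.

```typescript
type OutfitFeedback = {
  fit?: string | null;
  colorCoordination?: string | null;
  occasionSuitability?: string | null;
  error?: string | null;
};

// Hypothetical result type: the success branch carries non-null strings.
type FeedbackResult =
  | { ok: true; fit: string; colorCoordination: string; occasionSuitability: string }
  | { ok: false; error: string };

function toFeedbackResult(parsed: OutfitFeedback): FeedbackResult {
  // The model reported, in-schema, that it could not analyze the image.
  if (parsed.error) return { ok: false, error: parsed.error };

  const { fit, colorCoordination, occasionSuitability } = parsed;
  // Schema-valid is not the same as complete: an all-null object also passes.
  if (!fit || !colorCoordination || !occasionSuitability) {
    return { ok: false, error: 'Incomplete feedback received. Please try again.' };
  }
  // TypeScript has narrowed all three fields to string here.
  return { ok: true, fit, colorCoordination, occasionSuitability };
}
```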
Why .nullable().optional() and not .string()
This bites people, so it's worth a beat. With strict structured output, every key you declare must appear in the response; the way to make a field effectively optional is to let it be null. So a model that has nothing to say for a field emits null, and a bare z.string() rejects null, turning a perfectly reasonable response into a validation error.

z.string().nullable().optional() accepts the string you want and the null the model sends for fields it leaves empty, which in strict mode is how the "omitted" fields of the error case actually arrive. The .optional() also tolerates a missing key if one ever shows up. You trade a little strictness at the schema layer for robustness, then recover the strictness in code with the completeness check above. That's the right division of labor.
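A dependency-free sketch of the difference, mirroring (not calling) Zod: the first predicate accepts what z.string() accepts, the second what z.string().nullable().optional() accepts. The payload is an illustrative example of the strict-mode error case, not captured model output, and the predicate names are hypothetical.

```typescript
// Mirrors z.string(): only strings pass.
const acceptsBareString = (v: unknown): boolean => typeof v === 'string';

// Mirrors z.string().nullable().optional(): strings, null, or a missing value pass.
const acceptsNullableOptionalString = (v: unknown): boolean =>
  v === undefined || v === null || typeof v === 'string';

// Illustrative strict-mode error payload: every key present, unused ones null.
const errorCase: Record<string, unknown> = JSON.parse(
  '{"fit":null,"colorCoordination":null,"occasionSuitability":null,"error":"Can not analyze the image."}',
);
```

Under the bare-string rule, errorCase.fit would already fail validation, even though the response is exactly what the prompt asked for.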
What this buys you
Before, every call site needed a try/catch around JSON.parse, a regex to strip code fences, and a key-normalization step. All of it was load-bearing and none of it was tested, because how do you reliably reproduce "the model added a markdown fence today"?
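For contrast, here is a sketch of the kind of defensive parser described above: strip a fence, skip a preamble, normalize snake_case keys. parseModelJsonDefensively is a hypothetical name, and it is shown to illustrate what structured output removes, not as something to ship.

```typescript
// Hypothetical "before" parser. Each step guesses at one way the model misbehaves.
function parseModelJsonDefensively(text: string): Record<string, unknown> {
  // 1. If there is a markdown code fence (three backticks, optional "json" tag), use its body.
  const fenced = text.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/);
  let candidate = fenced ? fenced[1] : text;

  // 2. Drop any preamble before the first "{" and trailing text after the last "}".
  const start = candidate.indexOf('{');
  const end = candidate.lastIndexOf('}');
  if (start === -1 || end < start) throw new Error('No JSON object found');
  candidate = candidate.slice(start, end + 1);

  // 3. Parse, then normalize snake_case keys to camelCase.
  const raw = JSON.parse(candidate) as Record<string, unknown>;
  const normalized: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(raw)) {
    const camel = key.replace(/_([a-z])/g, (_match, c: string) => c.toUpperCase());
    normalized[camel] = value;
  }
  return normalized;
}
```

Every line of this is load-bearing, and a key the model renames in some way you didn't anticipate still slips through as undefined.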
After:
- Zero hand-parsing. output_parsed is the object, typed.
- One source of truth. The Zod schema is the API contract and the TS type.
- Failure is data. Unanalyzable input is a typed error field, not an exception.
- Refactors are safe. Rename colorCoordination in the schema and TypeScript walks you to every consumer.
The reliability jump is the real payoff. The feature went from "works until a user uploads something weird" to "handles weird input as a normal code path."
Try it
This pattern runs in production in https://stylebias.app: upload a photo, get back the three structured fields you saw above, rendered straight from output_parsed. If you're building anything that turns an image or a document into structured data, the Responses API plus Zod combination is the cleanest approach I've found.
If you've solved this a different way — function calling, a grammar, an instructor-style wrapper — drop it in the comments.