zhongqiyue

Posted on Jun 29

Stop Copy-Pasting AI Output: How I Forced My LLM to Speak JSON

#webdev #python #ai #tutorial

A few months ago, I was building a little tool to automatically extract event details from support emails. You know the drill: someone writes "Hey, can we schedule a demo for next Tuesday at 3pm?" and I need to pull out the date, time, and topic. LLMs are perfect for this, I thought. I’ll just ask GPT to return JSON.

Simple, right?

Here’s what I actually got back from the model:

```
{
  "event": "demo",
  "date": "next Tuesday",
  "time": "3pm"
}
By the way, I hope this helps! Let me know if you need more details.
```

Cute. But now my json.loads() throws a hissy fit because of the extra text. This is the problem that stole three weekends from me.

The naïve route (and why it hurt)

My first attempt was classic: pass the raw response into json.loads() wrapped in a try/except, and retry the API call when parsing fails. That’s fine for a one-off script, but for a production pipeline processing hundreds of emails, retries are slow and waste tokens. Worse, the model sometimes hallucinates fields or omits required ones, and a retry doesn’t fix that.

```python
import json
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Extract event as JSON: 'Let's meet Friday 2pm for sprint review'"}]
)
raw = response.choices[0].message.content

try:
    data = json.loads(raw)
except json.JSONDecodeError:
    print("Failed. Retrying...")
    # recursion hell starts here
```

Dead end #1.

Next I tried a regex to strip anything that wasn’t part of the JSON object. That worked until the model returned strings containing braces or colons. I spent a day tuning a pattern that eventually broke on perfectly valid JSON containing an escaped quote. Dead end #2.
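Here’s a minimal reproduction of the brace problem, plus a standard-library alternative worth knowing about if you must salvage JSON from chatty output: json.JSONDecoder.raw_decode, which parses one complete JSON value and reports where it ended, ignoring any trailing text. This is a swapped-in technique, not what I used at the time; the sample string and both helper names are made up for illustration.

```python
import json
import re

# Hypothetical chatty model output: a brace inside a string, plus an escaped quote
RAW = 'Sure! Here is the JSON:\n{"title": "Sprint review {Q3}", "note": "say \\"hi\\""}\nHope this helps!'


def naive_regex_extract(text: str):
    """Fragile: grab everything from the first '{' to the FIRST '}' (non-greedy)."""
    match = re.search(r"\{.*?\}", text)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None  # the regex cut the object off at the brace inside the string


def extract_first_json_object(text: str) -> dict:
    """Try each '{' in turn and let the real JSON parser decide where the object ends."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode returns (value, end_index) and ignores trailing text
            obj, _end = decoder.raw_decode(text[start:])
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in model output")


print(naive_regex_extract(RAW))        # None
print(extract_first_json_object(RAW))  # {'title': 'Sprint review {Q3}', 'note': 'say "hi"'}
```

Because it skips ahead to the first parseable object, the same helper also copes with a “Sure! Here’s the JSON:” preamble or a Markdown code fence. It does nothing about missing or hallucinated fields, though.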

The prompt-engineering rabbit hole

I went full prompt engineer. I added “ONLY output valid JSON, no extra text. I will be parsing programmatically.” I provided a schema. I threatened the model with digital scolding. It still failed 10% of the time. For a high‑volume pipeline, 10% failure is a fire alarm.

To be fair, that means it obeyed 90% of the time. The trouble was that the other 10% failed in a different way each time: sometimes it wrapped the output in a Markdown code fence labeled json, sometimes it wrapped the JSON in tags, and once it opened with “Sure! Here’s the JSON:”. Prompt engineering alone is like duct tape: it works until the wind changes direction.

The lightbulb moment: function calling

While reading OpenAI’s docs (again), I stumbled upon function calling: a feature that lets you describe a function signature with typed parameters (as a JSON Schema), and the model replies with that function’s arguments as a JSON object shaped by the schema instead of free-form chat. It isn’t a hard guarantee (the model can still leave a field out or make up a value), but it’s a very different contract from “please answer in JSON”.

Wait, why isn’t this the default for structured extraction?

I rewrote my extraction module to use function calling. Here’s the core idea:


```python
import json

from openai import OpenAI

client = OpenAI()

# Define the schema as a function
functions = [
    {
        "name": "extract_event",
        "description": "Extract event details from an email",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "date": {"type": "string", "description": "Date in YYYY-MM-DD format"},
                "time": {"type": "string"},
                "attendees": {
                    "type": "array",
                    "items": {"type": "string"}
                }
            },
            "required": ["title", "date", "time"]
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Let's meet on 2025-03-20 at 2pm for sprint review, only me and Alice"}
    ],
    functions=functions,  # legacy parameter; newer code uses tools=[...]
    function_call={"name": "extract_event"}  # Force it to use this function
)

# The arguments arrive as a JSON *string*, so they still need json.loads()
function_args = response.choices[0].message.function_call.arguments
event_data = json.loads(function_args)
print(event_data)
# Example output (model responses vary):
# {'title': 'sprint review', 'date': '2025-03-20', 'time': '2pm', 'attendees': ['me', 'Alice']}
```

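No extra fluff. No parsing gymnastics. function_call.arguments comes back as a bare JSON string built against the schema, with no preamble and no Markdown fence. It isn’t an absolute guarantee, though: OpenAI’s API reference notes that the model doesn’t always generate valid JSON and may hallucinate parameters your schema doesn’t define, so keep the json.loads() inside a try/except. Newer models also support Structured Outputs ("strict": true), which constrains generation to the schema itself. Even without it, the difference felt like a cheat code.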
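On a newer model, the same request can be written with the current tools / tool_choice parameters (the functions / function_call pair above is the older, deprecated form) and opted into Structured Outputs with "strict": true. Here’s a sketch, assuming a model that supports Structured Outputs (gpt-4o-2024-08-06 is used as the default); extract_event_strict and EVENT_TOOL are hypothetical names. Strict mode has extra schema rules: every property must be listed in "required" and "additionalProperties" must be false.

```python
import json

EVENT_TOOL = {
    "type": "function",
    "function": {
        "name": "extract_event",
        "description": "Extract event details from an email",
        "strict": True,  # opt into Structured Outputs
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "date": {"type": "string", "description": "Date in YYYY-MM-DD format"},
                "time": {"type": "string"},
                # Optional before; strict mode requires every field, so
                # "no attendees" has to be expressed as an empty list
                "attendees": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["title", "date", "time", "attendees"],
            "additionalProperties": False,
        },
    },
}


def extract_event_strict(email_text: str, model: str = "gpt-4o-2024-08-06") -> dict:
    """Force a strict tool call and return its parsed arguments."""
    from openai import OpenAI  # imported here so the schema can be inspected without the SDK

    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": email_text}],
        tools=[EVENT_TOOL],
        # Force this specific tool instead of letting the model answer in prose
        tool_choice={"type": "function", "function": {"name": "extract_event"}},
        parallel_tool_calls=False,  # Structured Outputs doesn't cover parallel calls
    )
    tool_call = response.choices[0].message.tool_calls[0]
    return json.loads(tool_call.function.arguments)
```

Usage would be extract_event_strict("Let's meet on 2025-03-20 at 2pm for sprint review, only me and Alice"). The trade-off is that strict mode accepts only a subset of JSON Schema, so some schemas need reshaping first.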

But is it cheating? (Trade‑offs)

Function calling is a lifesaver, but it’s not magic. A few things I learned the hard way:

Token cost: The function definition itself consumes tokens. For complex schemas (nested objects, enums) it can add 200–500 tokens per call. Over thousands of calls that adds up.
Model availability: Function calling works with GPT-3.5 Turbo and GPT-4. If you’re using a local or alternative model (like Llama), you’re back to square one – though some open-source models now support tool use.
Flexibility loss: The schema is rigid. If the input doesn’t match, the model may skip optional fields, and it rarely adds fields you didn’t define (rarely, not never, unless you use strict mode). That’s good for reliability but bad if you want the model to be creative. For extraction tasks, rigidity is a feature.
Error handling: Even with forced function calling, the model can still hallucinate values (e.g., invent a date that doesn’t match the text). You still need to validate the output against business rules. I’m using Pydantic for that: EventModel.model_validate_json(function_args) with custom validators (function_args is a JSON string, so it can’t be unpacked with ** directly).
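Here’s a minimal sketch of that validation layer, assuming Pydantic v2. The EventModel name comes from the sentence above, but its fields and rules are illustrative, not a production validator: the date has to be a real YYYY-MM-DD date, and if the email’s sent date is passed through Pydantic’s validation context (under a made-up "email_sent" key), the event can’t fall before that day.

```python
import datetime as dt

from pydantic import BaseModel, Field, ValidationError, ValidationInfo, field_validator


class EventModel(BaseModel):
    title: str = Field(min_length=1)
    date: str = Field(description="Date in YYYY-MM-DD format")
    time: str
    attendees: list[str] = Field(default_factory=list)

    @field_validator("date")
    @classmethod
    def date_is_real_and_not_in_past(cls, value: str, info: ValidationInfo) -> str:
        # fromisoformat rejects "next Tuesday" and impossible dates like "2025-02-30"
        parsed = dt.date.fromisoformat(value)
        sent = (info.context or {}).get("email_sent")  # hypothetical context key
        if sent is not None and parsed < sent:
            raise ValueError(f"event date {parsed} is before the email was sent ({sent})")
        return value


function_args = '{"title": "sprint review", "date": "2025-03-20", "time": "2pm"}'
event = EventModel.model_validate_json(function_args, context={"email_sent": dt.date(2025, 3, 14)})
print(event.date)  # 2025-03-20

try:
    EventModel.model_validate_json('{"title": "demo", "date": "next Tuesday", "time": "3pm"}')
except ValidationError as exc:
    print(exc.errors()[0]["loc"])  # ('date',)
```

Passing the sent date as context keeps the model class reusable while still allowing a rule that depends on the particular email.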

What about alternative tools?

If you’re building an app that needs structured extraction and don’t want to roll your own function‑calling logic, some managed services abstract this away. For instance, Interwest Info’s AI API lets you define extraction schemas and returns clean JSON without any prompt wrestling. But honestly, learning how function calling works under the hood gives you way more control and understanding.

The code I now use in production

Here’s a reusable function that I’ve been using ever since. It wraps any extraction task with a Pydantic model.


```python
import json
from typing import get_args, get_origin

from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()

# Python type -> JSON Schema type; anything unmapped falls back to "string"
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean", list: "array", dict: "object"}


def extract_with_model(model_class: type[BaseModel], user_message: str, **openai_kwargs) -> BaseModel:
    """
    Use OpenAI function calling to extract structured data matching a Pydantic model.
    """
    # Build function schema from model fields
    properties = {}
    required = []
    for field_name, field in model_class.model_fields.items():
        annotation = field.annotation
        # get_origin() maps list[str] to list; a bare `== list` check misses it
        origin = get_origin(annotation) or annotation
        prop = {
            "type": JSON_TYPES.get(origin, "string"),
            "description": field.description or "",
        }
        if origin is list:
            # Array schemas need an "items" entry; default to strings
            item_args = get_args(annotation)
            prop["items"] = {"type": JSON_TYPES.get(item_args[0], "string") if item_args else "string"}
        # handle more types (Optional, nested models, enums) as needed
        properties[field_name] = prop
        if field.is_required():
            required.append(field_name)

    functions = [
        {
            "name": "extract_data",
            "description": f"Extract data into {model_class.__name__}",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        }
    ]

    response = client.chat.completions.create(
        model=openai_kwargs.get("model", "gpt-4"),
        messages=[{"role": "user", "content": user_message}],
        functions=functions,
        function_call={"name": "extract_data"},  # force the function call
        temperature=openai_kwargs.get("temperature", 0.1),
    )

    args = response.choices[0].message.function_call.arguments
    data = json.loads(args)
    # Raises pydantic.ValidationError if a required field is null or missing
    return model_class(**data)


# Example usage
class EventInfo(BaseModel):
    title: str = Field(description="Event title or name")
    date: str = Field(description="Date in ISO 8601")
    time: str = Field(description="Time in 12h format")
    attendees: list[str] = Field(default_factory=list)


event = extract_with_model(EventInfo, "Meeting next Friday 10am with Bob and Carol")
print(event.model_dump())
```

This pattern now lives in every extraction pipeline I build. It’s not perfect: if the model returns null for a required field (or leaves it out), Pydantic raises a ValidationError, which I catch and log at the call site. Still better than regex.
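The catch-and-log step might look something like this. It’s a sketch: validate_or_log is a hypothetical helper that takes the raw arguments string (so it can be exercised without calling the API), and EventInfo mirrors the example model above. Returning None lets a batch job skip one bad email instead of crashing.

```python
import logging

from pydantic import BaseModel, Field, ValidationError

logger = logging.getLogger("extraction")


class EventInfo(BaseModel):
    title: str = Field(description="Event title or name")
    date: str = Field(description="Date in ISO 8601")
    time: str = Field(description="Time in 12h format")
    attendees: list[str] = Field(default_factory=list)


def validate_or_log(model_class: type[BaseModel], raw_arguments: str, source_id: str = "?"):
    """Parse function-call arguments into model_class; log and return None on failure."""
    try:
        # model_validate_json parses and validates in one step; malformed JSON
        # also surfaces as a ValidationError (error type "json_invalid")
        return model_class.model_validate_json(raw_arguments)
    except ValidationError as exc:
        for err in exc.errors():
            logger.warning("email %s: %s at %s (%s)", source_id, err["type"], err["loc"], err["msg"])
        return None


logging.basicConfig(level=logging.INFO)
result = validate_or_log(EventInfo, '{"title": null, "date": "2025-03-20", "time": "2pm"}', "email-42")
print(result)  # None, after logging a string_type error for ('title',)
```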

Lessons learned

Don’t trust the model to follow formatting instructions. Use API features that constrain output (function calling, or JSON mode and Structured Outputs in newer models).
Validation is still your job. Even structured output can contain incorrect data: “next Friday” can come back as the wrong date, because the model doesn’t know when the email was sent unless you tell it. Apply business logic on top.
Schema design matters. Be specific about formats in field descriptions. I often include examples like “YYYY-MM-DD” in the description – it helps the model.
Consider cost. If you’re on a budget, function calling can get expensive because the schema adds input tokens to every call. Evaluate whether your use case can tolerate occasional malformed responses handled with retries instead.
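For comparison, here’s a sketch of the JSON-mode route mentioned in the first lesson. response_format={"type": "json_object"} makes the model emit syntactically valid JSON, but unlike strict function calling it doesn’t enforce your schema, so the Pydantic step still applies. OpenAI requires the word “JSON” to appear somewhere in the messages when this mode is on. The model name, prompt wording, and helper names here are assumptions.

```python
import json

SYSTEM_PROMPT = (
    "Extract the event from the user's email. Reply with a JSON object with the keys "
    '"title", "date" (YYYY-MM-DD), "time", and "attendees" (a list of names).'
)


def build_json_mode_request(email_text: str, model: str = "gpt-4o-mini") -> dict:
    """Keyword arguments for client.chat.completions.create() with JSON mode enabled."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},  # must mention "JSON"
            {"role": "user", "content": email_text},
        ],
        "response_format": {"type": "json_object"},
    }


def extract_with_json_mode(email_text: str) -> dict:
    from openai import OpenAI  # imported lazily so the request builder works without the SDK

    client = OpenAI()
    response = client.chat.completions.create(**build_json_mode_request(email_text))
    choice = response.choices[0]
    # JSON mode can still cut the object off if the token limit is reached
    if choice.finish_reason == "length":
        raise ValueError("response truncated; the JSON is probably incomplete")
    return json.loads(choice.message.content)
```

Splitting the request builder from the call makes it easy to log or unit-test exactly what gets sent.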

What I’d do differently next time

I’d start with function calling from day one. The naïve approach cost me three weekends and a lot of frustration. I’d also write a wrapper like extract_with_model early on, so I can swap schemas without touching the API logic.

Also, I’d benchmark with a smaller model like gpt-3.5-turbo first: function calling works there too, and for simple extractions the accuracy is surprisingly close to GPT-4 at a fraction of the per-token price.
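That kind of benchmark doesn’t need to be fancy. A minimal sketch, assuming a small hand-labeled set: run each candidate model’s extractor over the same emails and count exact matches per field. field_accuracy and the two fake extractors are hypothetical stand-ins; in a real run each entry would wrap something like extract_with_model with a different model= argument.

```python
from collections import defaultdict
from collections.abc import Callable

Extractor = Callable[[str], dict]


def field_accuracy(extractors: dict[str, Extractor], labeled: list[tuple[str, dict]]) -> dict:
    """Per-model, per-field exact-match accuracy over (email_text, expected_fields) pairs."""
    fields = list(labeled[0][1])  # assumes every label dict has the same keys
    results = {}
    for model_name, extract in extractors.items():
        hits = defaultdict(int)
        for email_text, expected in labeled:
            try:
                predicted = extract(email_text)
            except Exception:  # an API or parsing failure counts as a miss on every field
                predicted = {}
            for field in fields:
                hits[field] += predicted.get(field) == expected[field]
        results[model_name] = {field: hits[field] / len(labeled) for field in fields}
    return results


LABELED = [
    ("Demo next Tuesday 3pm", {"title": "demo", "time": "3pm"}),
    ("Sprint review Friday 2pm", {"title": "sprint review", "time": "2pm"}),
]


def fake_small_model(text: str) -> dict:
    # Stand-in for a cheaper model that gets multi-word titles wrong
    words = text.split()
    return {"title": words[0].lower(), "time": words[-1]}


def fake_large_model(text: str) -> dict:
    # Stand-in for a stronger model that returns the labels exactly
    return dict(LABELED)[text]


print(field_accuracy({"small": fake_small_model, "large": fake_large_model}, LABELED))
# {'small': {'title': 0.5, 'time': 1.0}, 'large': {'title': 1.0, 'time': 1.0}}
```

Per-field scores matter more than a single number here: a cheaper model that only misses free-text titles may still be fine if dates and times are what the pipeline depends on.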

Over to you

Are you still wrangling AI output with regex and hope? Or have you found a cleaner pattern for structured generation? I’d love to hear what works (and what broke) in your projects.

Top comments (1)

Ekong Ikpe • Jun 29

It doesn't matter if the output is JSON, YAML, XML, or a custom string of hashtags. The format is just a pipe. The true engineering feat is knowing how to drain the truth out of a non-deterministic pool of information.