When I first experimented with ChatGPT-3.5 after its release, I was thrilled about its potential for various applications. However, my excitement quickly faded when I encountered a major roadblock: although the valuable information it returned was exceptionally readable, it was not in a form that could be reliably ingested by an application. Ironically, LLMs excel in extracting information from unstructured text but can only return it in unstructured form. Trying to programmatically extract results from LLMs felt like being at an incredible restaurant that serves the most delicious food, but without any utensils — you can see it and smell it, but you just can’t get to it.
I tried every trick in the book to coax it into giving me some semblance of structured data. “Please, just separate each item with a bar or a new line and skip the commentary,” I’d plead. Sometimes it worked, sometimes it didn’t. Sometimes it would “helpfully” number or reorder the items, like a well-meaning but slightly confused assistant. Other times it would sneak in commentary anyway, like a chatty co-worker. I even demanded in no uncertain terms that it return JSON and nothing else, but it would sometimes leave out a comma, almost as if it were taking a passive-aggressive jab. Eventually, I gave up and reluctantly returned to the less exciting but more predictable confines of traditional algorithms.
Fortunately, a few months later, OpenAI introduced JSON mode, a feature that forces the LLM to return valid JSON. I decided to try this feature and found it significantly more effective for processing results in my applications. Here’s an example of the output with JSON mode enabled:
PROMPT:
Parse the following sentence into words and then return the results
as a list of the original word and the translation in English and
return the results in JSON.
-- sentence --
早安
RESULTS:
{
  "results": [
    {
      "original": "早安",
      "translation": "Good morning"
    }
  ]
}
This output is certainly an improvement. However, while the output is valid JSON, its structure can vary depending on the contents of the prompt. A more predictable approach is to specify the desired return format explicitly. One way to achieve this is to provide a sample JSON structure for the LLM to follow. That means writing an example by hand and writing code to parse it, and if the structure changes, both must be updated in lockstep.
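For example, a prompt using the sample-structure approach might look like this (the exact wording is illustrative):

```
Parse the following sentence into words and return the results
in JSON using exactly this structure, with one entry per word:
{ "entries": [ { "originalWord": "...", "wordInEnglish": "..." } ] }
-- sentence --
早安
```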
An alternative approach is to define a Data Transfer Object (DTO) to hold the results and use it both to instruct the LLM and to parse the results, avoiding synchronization issues. First, define the DTO, for example:
record Entries(List<Entry> entries) {
    record Entry(String originalWord, String wordInEnglish, String pronunciation) {}
}
Now the DTO can be used in the prompt instructions as well as by the parsing code:
// Construct the prompt with the output schema.
var prompt = MessageFormat.format("""
    Parse the following sentence into English and return the results
    in JSON according to the following JSON schema.
    人工智慧將引領未來,以智慧之光照亮人類無限可能的前程。
    --- output json schema ---
    {0}
    """, jsonSchemaOf(Entries.class));
var result = sendPrompt(prompt, Entries.class);
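One MessageFormat detail worth knowing: braces and single quotes are special in the *pattern*, but substituted arguments pass through untouched, so a schema full of braces is safe to inject via `{0}`. A minimal stdlib-only sketch (the schema string below is just a stand-in for `jsonSchemaOf(Entries.class)`):

```java
import java.text.MessageFormat;

public class PromptFormatDemo {
    // Build the prompt the same way as above; only {0} in the pattern is a
    // placeholder, so braces inside the substituted schema are left alone.
    public static String buildPrompt(String schema) {
        return MessageFormat.format("""
            Parse the following sentence into English and return the results
            in JSON according to the following JSON schema.
            --- output json schema ---
            {0}
            """, schema);
    }

    public static void main(String[] args) {
        var schema = "{\"type\": \"object\", \"properties\": {\"entries\": {\"type\": \"array\"}}}";
        System.out.println(buildPrompt(schema));
    }
}
```

If the pattern itself ever needs a literal brace or apostrophe (say, an inline JSON example), it must be quoted with single quotes per the MessageFormat rules.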
Here’s the code that uses the Jackson JSON Schema generator:
import com.fasterxml.jackson.databind.*;
import com.fasterxml.jackson.module.jsonSchema.*;

public String jsonSchemaOf(Class<?> cls) throws Exception {
    var schemaGen = new JsonSchemaGenerator(mapper); // mapper is a shared ObjectMapper field
    var schema = schemaGen.generateSchema(cls);
    return new ObjectMapper().writeValueAsString(schema);
}
Note: By default, the generated schema will include ID fields used for references, which can waste tokens. See the repository OpenAI JSON Mode Sample for code that removes these unused IDs.
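For reference, the schema Jackson generates for Entries looks roughly like this (abbreviated; the exact `id` values depend on the package name). The `urn:jsonschema` id fields are the ones the note above suggests stripping:

```json
{
  "type": "object",
  "id": "urn:jsonschema:Entries",
  "properties": {
    "entries": {
      "type": "array",
      "items": {
        "type": "object",
        "id": "urn:jsonschema:Entries:Entry",
        "properties": {
          "originalWord": { "type": "string" },
          "wordInEnglish": { "type": "string" },
          "pronunciation": { "type": "string" }
        }
      }
    }
  }
}
```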
And finally, here’s the code that sends the prompt to OpenAI using the Azure OpenAI Java SDK:
import com.azure.ai.openai.*;
import com.azure.ai.openai.models.*;
import com.azure.core.credential.*;
import com.fasterxml.jackson.databind.*;
import java.util.*;
import java.util.logging.*;
public <T> T sendPrompt(String prompt, Class<T> cls) throws Exception {
    // Create the OpenAI client (logger and mapper are class fields)
    var apikey = System.getenv("OPENAI_APIKEY");
    var client = new OpenAIClientBuilder().credential(new KeyCredential(apikey)).buildClient();

    // Assemble the message and enable JSON mode
    var chatMessages = new ArrayList<ChatRequestMessage>();
    chatMessages.add(new ChatRequestSystemMessage(prompt));
    var options = new ChatCompletionsOptions(chatMessages);
    options.setResponseFormat(new ChatCompletionsJsonResponseFormat());
    options.setModel("gpt-4o-mini");

    // Call the OpenAI service
    logger.fine(prompt);
    var chatCompletions = client.getChatCompletions(options.getModel(), options);
    var usage = chatCompletions.getUsage();
    logger.fine(String.format(
        "tokens: %d (prompt=%d / completion=%d)%n",
        usage.getTotalTokens(), usage.getPromptTokens(), usage.getCompletionTokens()));
    var response = chatCompletions.getChoices().get(0).getMessage().getContent();
    logger.fine(response);

    // Parse the results
    return mapper.readValue(response, cls);
}
The solution works well most of the time. The LLM follows the JSON schema effectively, but a word of caution: I’ve seen cases where it gets things wrong. For example, if a field is a String but its name is plural (e.g. “exampleValues”), the LLM sometimes insists on returning an array of Strings instead.
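When the model mis-shapes the JSON like that, Jackson throws during parsing, which at least makes the failure detectable. One simple mitigation (my own suggestion, not part of the code above) is to retry the call a bounded number of times; the `sendOnce` function here is a hypothetical stand-in for a `sendPrompt`-style call:

```java
import java.util.function.Function;

public class RetrySend {
    // Sketch of a retry guard: if the response fails to parse into the DTO,
    // ask again up to maxAttempts times, then rethrow the last failure.
    public static <T> T sendWithRetry(Function<String, T> sendOnce, String prompt, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return sendOnce.apply(prompt);
            } catch (RuntimeException e) {
                last = e; // malformed or mis-shaped JSON; try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        // Simulate a call that fails once and then returns a parseable result.
        int[] calls = {0};
        var result = sendWithRetry(p -> {
            calls[0]++;
            if (calls[0] < 2) throw new RuntimeException("array where a string was expected");
            return "parsed";
        }, "some prompt", 3);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Since JSON mode already guarantees syntactic validity, the retry mostly covers the rarer schema-shape mismatches, so one or two attempts is usually plenty.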
LLMs can generate remarkable outputs, sometimes exceeding the capabilities of the average person. However, it’s intriguing that, at least for now, they struggle with the more mundane task of reliably formatting their generated output.