I connected MuleSoft to GPT-4o last quarter for a support ticket classifier. The prompt builder worked (I covered that in a previous post). The LLM call worked. Then the response came back and my parser crashed.
The LLM was supposed to return clean JSON. Instead it returned: "Here is my analysis:\n```json\n{\"ranking\": [\"TK-101\"]}\n```\nLet me know if you need anything else."
Markdown fences. Preamble text. Trailing conversation. My read() call choked on "Here is my analysis" — not valid JSON.
TL;DR
- LLMs wrap JSON responses in markdown fences, so your parser must extract the JSON first
- try() from dw::Runtime catches parse failures gracefully. Without it, one malformed response crashes your flow
- Always validate required keys after parsing: LLMs hallucinate extra fields and omit required ones
- The regex for fence extraction: raw match /(?s)```(?:json)?\s*(\{.*?\})\s*```/
- Test with 5 response variants: clean JSON, fenced JSON, fenced without language tag, no fences, broken JSON
The Problem: LLMs Don't Return Clean JSON
You ask GPT-4o to return JSON. Sometimes it does:
{"ranking": ["TK-101", "TK-098"], "summary": "Timeout is critical."}
But often it adds commentary:
    Here is my analysis:
    ```json
    {"ranking": ["TK-101"], "summary": "Timeout is critical."}
    ```
    Let me know if you need anything else.
That's not valid JSON. read(raw, "application/json") throws. Your Mule flow crashes. 50,000 LLM responses per day = 50,000 potential crashes.
The Solution: 3-Layer Defense in DataWeave
%dw 2.0
import try from dw::Runtime
output application/json
var raw = payload.rawResponse
// Layer 1: pull the JSON object out of a markdown fence, if one is present
var fenceMatch = raw match /(?s)```(?:json)?\s*(\{.*?\})\s*```/
var jsonStr = if (fenceMatch[1]?) fenceMatch[1] else raw
// Layer 2: parse inside try() so invalid JSON yields success: false instead of throwing
var parsed = try(() -> read(jsonStr, "application/json"))
// Layer 3: compare the parsed keys against the caller's required keys
var keys = if (parsed.success) (parsed.result pluck $$) else []
var missing = payload.requiredKeys filter (k) -> !(keys contains k)
---
{
    parsed: if (parsed.success) parsed.result else null,
    valid: parsed.success and isEmpty(missing),
    missingKeys: missing
}
100 production-ready DataWeave patterns with tests: mulesoft-cookbook on GitHub
Layer 1: Regex Fence Extraction
var fenceMatch = raw match /(?s)```(?:json)?\s*(\{.*?\})\s*```/
var jsonStr = if (fenceMatch[1]?) fenceMatch[1] else raw
The regex matches content between markdown fences. (?s) enables dotall mode so . matches newlines. (?:json)? handles both ```json and bare ```. The capture group (\{.*?\}) extracts just the JSON object.
If no fences are found (fenceMatch[1]? is false), fall back to parsing the raw string directly. This handles the case where the LLM returns clean JSON without fences.
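One case worth verifying in your own runtime: if match only returns groups when the pattern spans the entire input string, any preamble or trailing text outside the fences would leave fenceMatch empty, and the fallback would hand the raw text to read(). The sketch below is a hedged alternative, not the code from the flow above. It uses scan, which collects occurrences of the pattern anywhere in the string. The sample raw value is made up for illustration.

```dataweave
%dw 2.0
output application/json

// Hypothetical sample standing in for payload.rawResponse
var raw = "Here is my analysis:\n```json\n{\"ranking\": [\"TK-101\"]}\n```\nLet me know if you need anything else."

// scan returns one entry per occurrence, each shaped [fullMatch, group1, ...]
var hits = raw scan /(?s)```(?:json)?\s*(\{.*?\})\s*```/

// Use the first fenced object if there is one, otherwise fall back to the raw text
var jsonStr = if (isEmpty(hits)) raw else hits[0][1]
---
{
    fenced: !isEmpty(hits),
    jsonStr: jsonStr
}
```

Because scan does not care what surrounds the fence, the same expression covers responses with and without preamble.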
Layer 2: try() Wraps the Parse
var parsed = try(() -> read(jsonStr, "application/json"))
try() from dw::Runtime is the critical piece. Without it, read() on invalid JSON throws an exception and crashes the flow. With try(), you get:
- Success: {success: true, result: {...parsed object...}}
- Failure: {success: false, error: {...error details...}}
Your flow continues either way. Log the error, route to a dead-letter queue, retry with a different prompt — your choice. But the flow never crashes.
The trap: If you don't check parsed.success before accessing parsed.result, you get null on failure. Downstream code that expects an object gets null instead. Always check the success flag.
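A minimal sketch of checking the flag explicitly, assuming a truncated response as input. The jsonStr value is a made-up sample, and orElse is assumed to be available in your runtime's dw::Runtime module. If it is not, the if/else form on the other fields does the same job.

```dataweave
%dw 2.0
import try, orElse from dw::Runtime
output application/json

// Hypothetical truncated response standing in for the extracted JSON string
var jsonStr = "{\"ranking\": [\"TK-101\""

var attempt = try(() -> read(jsonStr, "application/json"))
---
{
    // Branch on the flag rather than reading result blindly
    status: if (attempt.success) "parsed" else "failed",
    // error is only populated when success is false
    reason: if (attempt.success) null else attempt.error.message,
    // orElse supplies an explicit fallback instead of an implicit null
    safe: attempt orElse {}
}
```

Downstream code then sees either a parsed object or an empty object, never a surprise null.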
Layer 3: Required Key Validation
var keys = if (parsed.success) (parsed.result pluck $$) else []
var missing = payload.requiredKeys filter (k) -> !(keys contains k)
LLMs hallucinate. You asked for ranking and summary. The LLM returned ranking, analysis, and confidence — but not summary. The JSON is valid but missing a required field.
pluck $$ extracts all keys from the parsed object. filter checks each required key against the actual keys. missing tells you exactly which keys the LLM omitted.
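To make the check concrete, here is a small standalone sketch with made-up data matching the scenario above. The result and requiredKeys values are hypothetical. Coercing each key to String is a defensive choice in this sketch, so the comparison is always string-to-string. The unexpectedKeys field is an addition for illustration, not part of the original flow.

```dataweave
%dw 2.0
output application/json

// Hypothetical parsed result: valid JSON, but "summary" is absent
var result = {ranking: ["TK-101"], analysis: "pool exhausted", confidence: 0.9}
var requiredKeys = ["ranking", "summary"]

// Coerce each key to String so the comparison is string-to-string
var keys = result pluck ($$ as String)

// Required keys the response left out
var missing = requiredKeys filter (k) -> !(keys contains k)
// Keys the response added that nobody asked for
var extra = keys filter (k) -> !(requiredKeys contains k)
---
{
    missingKeys: missing,
    unexpectedKeys: extra
}
```

With these inputs, missingKeys should contain only summary, and unexpectedKeys should list analysis and confidence.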
The 5 Response Variants to Test
I test every LLM parser against these variants:
| Variant | Example | What Breaks |
|---|---|---|
| Clean JSON | {"ranking": [...], "summary": "..."} | Nothing (baseline) |
| Fenced with language | ```json\n{...}\n``` | read() without fence extraction |
| Fenced without language | ```\n{...}\n``` | Regex that requires json after fence |
| No fences, with preamble | Here is my analysis: {"ranking": [...]} | Naive read() on full string |
| Broken JSON | {"ranking": ["TK-101" (truncated) | Any parser without try() |
I hit all 5 in production within the first week. The same model, same prompt, same temperature — 5 different formats across 50,000 responses.
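One way to exercise the five variants in a single script is sketched below. The parseLlm helper and the sample strings are hypothetical, written for this illustration. The helper uses scan for the fence lookup, which is an assumption of this sketch rather than the flow's own code.

```dataweave
%dw 2.0
import try from dw::Runtime
output application/json

// Hypothetical helper: fence extraction plus guarded parse, as one function
fun parseLlm(raw: String) = do {
    var hits = raw scan /(?s)```(?:json)?\s*(\{.*?\})\s*```/
    var jsonStr = if (isEmpty(hits)) raw else hits[0][1]
    ---
    try(() -> read(jsonStr, "application/json"))
}

// Made-up samples, one per variant in the table
var variants = {
    cleanJson: "{\"ranking\": [\"TK-101\"], \"summary\": \"ok\"}",
    fencedWithTag: "```json\n{\"ranking\": [\"TK-101\"], \"summary\": \"ok\"}\n```",
    fencedNoTag: "```\n{\"ranking\": [\"TK-101\"], \"summary\": \"ok\"}\n```",
    preambleNoFence: "Here is my analysis: {\"ranking\": [\"TK-101\"], \"summary\": \"ok\"}",
    brokenJson: "{\"ranking\": [\"TK-101\""
}
---
// Report, per variant, whether the guarded parse succeeded
variants mapObject ((raw, name) -> {
    (name): { parsedOk: parseLlm(raw).success }
})
```

The script itself should finish for every variant. Any variant that reports parsedOk as false is a candidate for the dead-letter path rather than a crash.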
Trap: The Regex Doesn't Handle Nested Objects Well
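The pattern \{.*?\} uses non-greedy matching, so on its own it stops at the first } it reaches. That works for flat JSON objects. But if your LLM returns nested objects:
{"analyses": [{"ticketId": "TK-101", "action": "increase pool"}], "summary": "..."}
an unanchored .*? stops at the first }, inside the nested array, and the truncated string fails to parse. In the fenced regex, the trailing \s* and closing fence force the match to extend to a } that sits directly before the fence, so the problem mainly bites when the same pattern is applied to unfenced text.
For production, I switched to finding the matching closing brace by counting open/close braces, not regex. But the regex works for 90% of LLM responses where the top-level structure is flat.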
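The brace-counting idea could look roughly like the sketch below. This is an illustration under stated assumptions, not the production code: step and firstBalancedObject are hypothetical helper names, the sample string is made up, and braces that appear inside JSON string values are counted like any other brace.

```dataweave
%dw 2.0
output application/json

// Hypothetical helper: advance the scan state by one character.
// Simplification: braces inside JSON string values are counted too.
fun step(acc, ch: String) =
    if (acc.closeAt >= 0) acc  // balanced span already found, stop updating
    else if (ch == "{")
        {depth: acc.depth + 1, openAt: if (acc.depth == 0) acc.pos else acc.openAt, closeAt: -1, pos: acc.pos + 1}
    else if (ch == "}" and acc.depth > 0)
        {depth: acc.depth - 1, openAt: acc.openAt, closeAt: if (acc.depth == 1) acc.pos else -1, pos: acc.pos + 1}
    else
        {depth: acc.depth, openAt: acc.openAt, closeAt: -1, pos: acc.pos + 1}

// Returns the first balanced {...} span in the text, or null if none closes
fun firstBalancedObject(text: String) = do {
    var state = (text splitBy "") reduce ((ch, acc = {depth: 0, openAt: -1, closeAt: -1, pos: 0}) -> step(acc, ch))
    ---
    if ((state.closeAt default -1) >= 0) text[state.openAt to state.closeAt] else null
}

// Made-up unfenced response with a nested object
var sample = "Here is my analysis: {\"analyses\": [{\"ticketId\": \"TK-101\"}], \"summary\": \"...\"} Thanks!"
---
{ extracted: firstBalancedObject(sample) }
```

The extracted string can then go through the same try() and key validation layers. A fuller version would also track whether the scanner is inside a quoted string, so a } in a summary value does not close the object early.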
Performance
The parser handles 50,000 LLM responses per day. Average parse time: 2ms per response. The regex is the slowest part — try() and read() are fast because they're native DataWeave functions.
Zero flow crashes since deploying the 3-layer approach. Before it, we had 3-5 crashes per day from malformed LLM responses.
What I Do Now
- Every LLM integration gets this parser as the first step after the API call
- Failed parses go to a dead-letter queue with the raw response for debugging
- Missing key reports go to a monitoring dashboard — tracks LLM reliability over time
- I test with all 5 variants before any production deployment
100 patterns with MUnit tests: github.com/shakarbisetty/mulesoft-cookbook
60-second video walkthroughs: youtube.com/@SanThaParv