ThaSha
Parsing LLM Responses in DataWeave: 3 Layers of Defense Against Markdown Fences

I connected MuleSoft to GPT-4o last quarter for a support ticket classifier. The prompt builder worked (I covered that in a previous post). The LLM call worked. Then the response came back and my parser crashed.

The LLM was supposed to return clean JSON. Instead it returned:

````
Here is my analysis:

```json
{"ranking": ["TK-101"]}
```

Let me know if you need anything else.
````

Markdown fences. Preamble text. Trailing conversation. My read() call choked on "Here is my analysis" — not valid JSON.

TL;DR

  • LLMs wrap JSON responses in markdown fences — your parser must extract the JSON first
  • try() from dw::Runtime catches parse failures gracefully — without it, one malformed response crashes your flow
  • Always validate required keys after parsing — LLMs hallucinate extra fields and omit required ones
  • The regex for fence extraction: ````/(?s)```(?:json)?\s*(\{.*?\})\s*```/````
  • Test with 5 response variants: clean JSON, fenced JSON, fenced without language tag, no fences, broken JSON

The Problem: LLMs Don't Return Clean JSON

You ask GPT-4o to return JSON. Sometimes it does:

```json
{"ranking": ["TK-101", "TK-098"], "summary": "Timeout is critical."}
```

But often it adds commentary:

````
Here is my analysis:

```json
{"ranking": ["TK-101"], "summary": "Timeout is critical."}
```

Let me know if you need anything else.
````
That's not valid JSON. `read(raw, "application/json")` throws. Your Mule flow crashes. 50,000 LLM responses per day = 50,000 potential crashes.

The Solution: 3-Layer Defense in DataWeave

````dataweave
%dw 2.0
import try from dw::Runtime
output application/json

var raw = payload.rawResponse

// Layer 1: pull the JSON object out of markdown fences (fall back to raw)
var fenceMatch = raw match /(?s)```(?:json)?\s*(\{.*?\})\s*```/
var jsonStr = if (fenceMatch[1]?) fenceMatch[1] else raw

// Layer 2: parse without ever throwing
var parsed = try(() -> read(jsonStr, "application/json"))

// Layer 3: report any required keys the LLM omitted
var keys = if (parsed.success) (parsed.result pluck $$) else []
var missing = payload.requiredKeys filter (k) -> !(keys contains k)
---
{
  parsed: if (parsed.success) parsed.result else null,
  valid: parsed.success and isEmpty(missing),
  missingKeys: missing
}
````

100 production-ready DataWeave patterns with tests: mulesoft-cookbook on GitHub


Layer 1: Regex Fence Extraction

````dataweave
var fenceMatch = raw match /(?s)```(?:json)?\s*(\{.*?\})\s*```/
var jsonStr = if (fenceMatch[1]?) fenceMatch[1] else raw
````

The regex matches content between markdown fences. `(?s)` enables dotall mode so `.` matches newlines. `(?:json)?` handles both a ```` ```json ```` opening fence and a bare ```` ``` ````. The captured group `(\{.*?\})` extracts just the JSON object.

If no fences are found (`fenceMatch[1]?` is false), fall back to parsing the raw string directly. This handles the case where the LLM returns clean JSON without fences.
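The same extraction logic is easy to check outside of Mule. Here's a minimal Python sketch (names like `extract_json_str` are mine, for illustration) using the article's regex verbatim:

```python
import re

# Same pattern as the DataWeave version: optional "json" language tag,
# then a lazily-matched object between the fences.
FENCE_RE = re.compile(r"(?s)```(?:json)?\s*(\{.*?\})\s*```")

def extract_json_str(raw: str) -> str:
    """Return the fenced JSON object if present, else the raw string."""
    m = FENCE_RE.search(raw)
    return m.group(1) if m else raw

fenced = 'Here is my analysis:\n```json\n{"ranking": ["TK-101"]}\n```\nBye.'
print(extract_json_str(fenced))      # {"ranking": ["TK-101"]}
print(extract_json_str('{"a": 1}'))  # {"a": 1}  (no fences: fall back to raw)
```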

Layer 2: try() Wraps the Parse


```dataweave
var parsed = try(() -> read(jsonStr, "application/json"))
```

try() from dw::Runtime is the critical piece. Without it, read() on invalid JSON throws an exception and crashes the flow. With try(), you get:

  • Success: {success: true, result: {...parsed object...}}
  • Failure: {success: false, error: {...error details...}}

Your flow continues either way. Log the error, route to a dead-letter queue, retry with a different prompt — your choice. But the flow never crashes.

The trap: If you don't check parsed.success before accessing parsed.result, you get null on failure. Downstream code that expects an object gets null instead. Always check the success flag.
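The success/result envelope that `try()` returns is worth mirroring wherever you parse LLM output. A hedged Python sketch of the same shape (`try_parse` is my name, not a DataWeave function):

```python
import json

def try_parse(json_str: str) -> dict:
    """Mimic DataWeave's try(): never raise, return a success envelope."""
    try:
        return {"success": True, "result": json.loads(json_str)}
    except json.JSONDecodeError as e:
        return {"success": False, "error": str(e)}

ok = try_parse('{"ranking": ["TK-101"]}')
bad = try_parse('{"ranking": ["TK-101"')  # truncated -- still no exception

print(ok["success"], bad["success"])  # True False
```

Downstream code checks the `success` flag first, exactly as the trap above warns.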

Layer 3: Required Key Validation


```dataweave
var keys = if (parsed.success) (parsed.result pluck $$) else []
var missing = payload.requiredKeys filter (k) -> !(keys contains k)
```

LLMs hallucinate. You asked for ranking and summary. The LLM returned ranking, analysis, and confidence — but not summary. The JSON is valid but missing a required field.

`pluck $$` extracts all keys from the parsed object. `filter` checks each required key against the actual keys. `missing` tells you exactly which keys the LLM omitted.
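The validation itself is a one-liner in any language; here's the equivalent check in Python (illustrative names, same logic as the DataWeave `filter`):

```python
def missing_keys(parsed: dict, required: list) -> list:
    """Return every required key the LLM omitted."""
    return [k for k in required if k not in parsed]

# The LLM returned valid JSON, but swapped "summary" for hallucinated fields.
resp = {"ranking": ["TK-101"], "analysis": "...", "confidence": 0.9}
print(missing_keys(resp, ["ranking", "summary"]))  # ['summary']
```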

The 5 Response Variants to Test

I test every LLM parser against these variants:

| Variant | Example | What breaks |
| --- | --- | --- |
| Clean JSON | `{"ranking": [...], "summary": "..."}` | Nothing; baseline |
| Fenced with language tag | ```` ```json\n{...}\n``` ```` | `read()` without fence extraction |
| Fenced without language tag | ```` ```\n{...}\n``` ```` | Regex that requires `json` after the fence |
| No fences, with preamble | `Here is my analysis: {"ranking": [...]}` | Naive `read()` on the full string |
| Broken JSON | `{"ranking": ["TK-101"` (truncated) | Any parser without `try()` |

I hit all 5 in production within the first week. The same model, same prompt, same temperature — 5 different formats across 50,000 responses.
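A quick way to exercise all five variants is a small harness like the Python sketch below (my own translation of the parser, not the author's MUnit suite). Note that the last two variants still fail to parse, but they fail safely: `success` comes back false instead of the flow crashing.

````python
import json
import re

FENCE_RE = re.compile(r"(?s)```(?:json)?\s*(\{.*?\})\s*```")

def parse_llm(raw: str) -> dict:
    """Fence extraction + safe parse, mirroring layers 1 and 2."""
    m = FENCE_RE.search(raw)
    text = m.group(1) if m else raw
    try:
        return {"success": True, "result": json.loads(text)}
    except json.JSONDecodeError:
        return {"success": False, "result": None}

variants = [
    '{"ranking": ["TK-101"], "summary": "ok"}',                       # clean
    '```json\n{"ranking": ["TK-101"], "summary": "ok"}\n```',         # fenced + tag
    '```\n{"ranking": ["TK-101"], "summary": "ok"}\n```',             # fenced, no tag
    'Here is my analysis: {"ranking": ["TK-101"], "summary": "ok"}',  # preamble, no fence
    '{"ranking": ["TK-101"',                                          # truncated
]
results = [parse_llm(v) for v in variants]
print([r["success"] for r in results])  # [True, True, True, False, False]
````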

Trap: The Regex Doesn't Handle Nested Objects Well

The regex `\{.*?\}` uses non-greedy matching. It works for flat JSON objects. But if your LLM returns nested objects:


```json
{"analyses": [{"ticketId": "TK-101", "action": "increase pool"}], "summary": "..."}
```

The non-greedy `.*?` stops at the earliest `}` that still lets the rest of the pattern match. For a single, well-formed fenced object that is usually the right brace, but when the response contains multiple fenced blocks or a stray closing fence, the capture can end at an inner `}` and you get a partial JSON parse.

For production, I switched to finding the matching closing brace by counting open/close braces, not regex. But the regex works for 90% of LLM responses where the top-level structure is flat.
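Brace counting is straightforward to sketch. Here's a minimal Python version of the idea (illustrative only; a production version should also skip braces that appear inside string values):

```python
def extract_balanced(raw: str):
    """Return the first top-level {...} span by counting braces, else None."""
    start = raw.find("{")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(raw)):
        if raw[i] == "{":
            depth += 1
        elif raw[i] == "}":
            depth -= 1
            if depth == 0:
                return raw[start:i + 1]
    return None  # unbalanced, e.g. a truncated response

nested = 'Here you go {"analyses": [{"id": "TK-101"}], "summary": "ok"} bye'
print(extract_balanced(nested))
# {"analyses": [{"id": "TK-101"}], "summary": "ok"}
```

Unlike the non-greedy regex, the counter only returns once every opened brace has closed, so nested arrays and objects come back whole.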

Performance

The parser handles 50,000 LLM responses per day. Average parse time: 2ms per response. The regex is the slowest part — try() and read() are fast because they're native DataWeave functions.

Zero flow crashes since deploying the 3-layer approach. Before it, we had 3-5 crashes per day from malformed LLM responses.

What I Do Now

  1. Every LLM integration gets this parser as the first step after the API call
  2. Failed parses go to a dead-letter queue with the raw response for debugging
  3. Missing key reports go to a monitoring dashboard — tracks LLM reliability over time
  4. I test with all 5 variants before any production deployment

100 patterns with MUnit tests: github.com/shakarbisetty/mulesoft-cookbook

60-second video walkthroughs: youtube.com/@SanThaParv
