ThaSha
Parsing LLM Responses in DataWeave: 3 Layers of Defense Against Markdown Fences

I connected MuleSoft to GPT-4o last quarter for a support ticket classifier. The prompt builder worked (I covered that in a previous post). The LLM call worked. Then the response came back and my parser crashed.

The LLM was supposed to return clean JSON. Instead it returned:

````
Here is my analysis:

```json
{"ranking": ["TK-101"]}
```

Let me know if you need anything else.
````

Markdown fences. Preamble text. Trailing conversation. My read() call choked on "Here is my analysis" — not valid JSON.

TL;DR

  • LLMs wrap JSON responses in markdown fences — your parser must extract the JSON first
  • try() from dw::Runtime catches parse failures gracefully — without it, one malformed response crashes your flow
  • Always validate required keys after parsing — LLMs hallucinate extra fields and omit required ones
  • The regex for fence extraction: ````/(?s)```(?:json)?\s*(\{.*?\})\s*```/````
  • Test with 5 response variants: clean JSON, fenced JSON, fenced without language tag, no fences, broken JSON

The Problem: LLMs Don't Return Clean JSON

You ask GPT-4o to return JSON. Sometimes it does:

```json
{"ranking": ["TK-101", "TK-098"], "summary": "Timeout is critical."}
```

But often it adds commentary:

````
Here is my analysis:

```json
{"ranking": ["TK-101"], "summary": "Timeout is critical."}
```

Let me know if you need anything else.
````
That's not valid JSON. `read(raw, "application/json")` throws. Your Mule flow crashes. 50,000 LLM responses per day = 50,000 potential crashes.

The Solution: 3-Layer Defense in DataWeave

````dataweave
%dw 2.0
import try from dw::Runtime
output application/json

var raw = payload.rawResponse

// Layer 1: pull the JSON object out of markdown fences (fall back to raw)
var fenceMatch = raw match /(?s)```(?:json)?\s*(\{.*?\})\s*```/
var jsonStr = if (fenceMatch[1]?) fenceMatch[1] else raw

// Layer 2: parse without ever throwing
var parsed = try(() -> read(jsonStr, "application/json"))

// Layer 3: report any required keys the LLM omitted
var keys = if (parsed.success) (parsed.result pluck $$) else []
var missing = payload.requiredKeys filter (k) -> !(keys contains k)
---
{
  parsed: if (parsed.success) parsed.result else null,
  valid: parsed.success and isEmpty(missing),
  missingKeys: missing
}
````

100 production-ready DataWeave patterns with tests: mulesoft-cookbook on GitHub


Layer 1: Regex Fence Extraction

````dataweave
var fenceMatch = raw match /(?s)```(?:json)?\s*(\{.*?\})\s*```/
var jsonStr = if (fenceMatch[1]?) fenceMatch[1] else raw
````

The regex matches content between markdown fences. `(?s)` enables dotall mode so `.` matches newlines. `(?:json)?` handles both a ```` ```json ```` opening fence and a bare ```` ``` ````. The captured group `(\{.*?\})` extracts just the JSON object.

If no fences are found (`fenceMatch[1]?` is false), fall back to parsing the raw string directly. This handles the case where the LLM returns clean JSON without fences.
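The same extraction logic is easy to check outside of Mule. Here's a minimal Python sketch (names like `extract_json_str` are mine, for illustration) using the article's regex verbatim:

```python
import re

# Same pattern as the DataWeave version: optional "json" language tag,
# then a lazily-matched object between the fences.
FENCE_RE = re.compile(r"(?s)```(?:json)?\s*(\{.*?\})\s*```")

def extract_json_str(raw: str) -> str:
    """Return the fenced JSON object if present, else the raw string."""
    m = FENCE_RE.search(raw)
    return m.group(1) if m else raw

fenced = 'Here is my analysis:\n```json\n{"ranking": ["TK-101"]}\n```\nBye.'
print(extract_json_str(fenced))      # {"ranking": ["TK-101"]}
print(extract_json_str('{"a": 1}'))  # {"a": 1}  (no fences: fall back to raw)
```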

Layer 2: try() Wraps the Parse


```dataweave
var parsed = try(() -> read(jsonStr, "application/json"))
```

try() from dw::Runtime is the critical piece. Without it, read() on invalid JSON throws an exception and crashes the flow. With try(), you get:

  • Success: {success: true, result: {...parsed object...}}
  • Failure: {success: false, error: {...error details...}}

Your flow continues either way. Log the error, route to a dead-letter queue, retry with a different prompt — your choice. But the flow never crashes.

The trap: If you don't check parsed.success before accessing parsed.result, you get null on failure. Downstream code that expects an object gets null instead. Always check the success flag.
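The success/result envelope that `try()` returns is worth mirroring wherever you parse LLM output. A hedged Python sketch of the same shape (`try_parse` is my name, not a DataWeave function):

```python
import json

def try_parse(json_str: str) -> dict:
    """Mimic DataWeave's try(): never raise, return a success envelope."""
    try:
        return {"success": True, "result": json.loads(json_str)}
    except json.JSONDecodeError as e:
        return {"success": False, "error": str(e)}

ok = try_parse('{"ranking": ["TK-101"]}')
bad = try_parse('{"ranking": ["TK-101"')  # truncated -- still no exception

print(ok["success"], bad["success"])  # True False
```

Downstream code checks the `success` flag first, exactly as the trap above warns.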

Layer 3: Required Key Validation


```dataweave
var keys = if (parsed.success) (parsed.result pluck $$) else []
var missing = payload.requiredKeys filter (k) -> !(keys contains k)
```

LLMs hallucinate. You asked for ranking and summary. The LLM returned ranking, analysis, and confidence — but not summary. The JSON is valid but missing a required field.

`pluck $$` extracts all keys from the parsed object. `filter` checks each required key against the actual keys. `missing` tells you exactly which keys the LLM omitted.
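The validation itself is a one-liner in any language; here's the equivalent check in Python (illustrative names, same logic as the DataWeave `filter`):

```python
def missing_keys(parsed: dict, required: list) -> list:
    """Return every required key the LLM omitted."""
    return [k for k in required if k not in parsed]

# The LLM returned valid JSON, but swapped "summary" for hallucinated fields.
resp = {"ranking": ["TK-101"], "analysis": "...", "confidence": 0.9}
print(missing_keys(resp, ["ranking", "summary"]))  # ['summary']
```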

The 5 Response Variants to Test

I test every LLM parser against these variants:

| Variant | Example | What breaks |
| --- | --- | --- |
| Clean JSON | `{"ranking": [...], "summary": "..."}` | Nothing; baseline |
| Fenced with language tag | ```` ```json\n{...}\n``` ```` | `read()` without fence extraction |
| Fenced without language tag | ```` ```\n{...}\n``` ```` | Regex that requires `json` after the fence |
| No fences, with preamble | `Here is my analysis: {"ranking": [...]}` | Naive `read()` on the full string |
| Broken JSON | `{"ranking": ["TK-101"` (truncated) | Any parser without `try()` |

I hit all 5 in production within the first week. The same model, same prompt, same temperature — 5 different formats across 50,000 responses.
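A quick way to exercise all five variants is a small harness like the Python sketch below (my own translation of the parser, not the author's MUnit suite). Note that the last two variants still fail to parse, but they fail safely: `success` comes back false instead of the flow crashing.

````python
import json
import re

FENCE_RE = re.compile(r"(?s)```(?:json)?\s*(\{.*?\})\s*```")

def parse_llm(raw: str) -> dict:
    """Fence extraction + safe parse, mirroring layers 1 and 2."""
    m = FENCE_RE.search(raw)
    text = m.group(1) if m else raw
    try:
        return {"success": True, "result": json.loads(text)}
    except json.JSONDecodeError:
        return {"success": False, "result": None}

variants = [
    '{"ranking": ["TK-101"], "summary": "ok"}',                       # clean
    '```json\n{"ranking": ["TK-101"], "summary": "ok"}\n```',         # fenced + tag
    '```\n{"ranking": ["TK-101"], "summary": "ok"}\n```',             # fenced, no tag
    'Here is my analysis: {"ranking": ["TK-101"], "summary": "ok"}',  # preamble, no fence
    '{"ranking": ["TK-101"',                                          # truncated
]
results = [parse_llm(v) for v in variants]
print([r["success"] for r in results])  # [True, True, True, False, False]
````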

Trap: The Regex Doesn't Handle Nested Objects Well

The regex `\{.*?\}` uses non-greedy matching. It works for flat JSON objects. But if your LLM returns nested objects:


```json
{"analyses": [{"ticketId": "TK-101", "action": "increase pool"}], "summary": "..."}
```

The non-greedy `.*?` stops at the earliest `}` that still lets the rest of the pattern match. For a single, well-formed fenced object that is usually the right brace, but when the response contains multiple fenced blocks or a stray closing fence, the capture can end at an inner `}` and you get a partial JSON parse.

For production, I switched to finding the matching closing brace by counting open/close braces, not regex. But the regex works for 90% of LLM responses where the top-level structure is flat.
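Brace counting is straightforward to sketch. Here's a minimal Python version of the idea (illustrative only; a production version should also skip braces that appear inside string values):

```python
def extract_balanced(raw: str):
    """Return the first top-level {...} span by counting braces, else None."""
    start = raw.find("{")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(raw)):
        if raw[i] == "{":
            depth += 1
        elif raw[i] == "}":
            depth -= 1
            if depth == 0:
                return raw[start:i + 1]
    return None  # unbalanced, e.g. a truncated response

nested = 'Here you go {"analyses": [{"id": "TK-101"}], "summary": "ok"} bye'
print(extract_balanced(nested))
# {"analyses": [{"id": "TK-101"}], "summary": "ok"}
```

Unlike the non-greedy regex, the counter only returns once every opened brace has closed, so nested arrays and objects come back whole.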

Performance

The parser handles 50,000 LLM responses per day. Average parse time: 2ms per response. The regex is the slowest part — try() and read() are fast because they're native DataWeave functions.

Zero flow crashes since deploying the 3-layer approach. Before it, we had 3-5 crashes per day from malformed LLM responses.

What I Do Now

  1. Every LLM integration gets this parser as the first step after the API call
  2. Failed parses go to a dead-letter queue with the raw response for debugging
  3. Missing key reports go to a monitoring dashboard — tracks LLM reliability over time
  4. I test with all 5 variants before any production deployment

100 patterns with MUnit tests: github.com/shakarbisetty/mulesoft-cookbook

60-second video walkthroughs: youtube.com/@SanThaParv
