ThaSha
Building LLM Prompts From Enterprise Data in DataWeave: 2 Traps That Garbled My AI Output

I connected a MuleSoft API to an LLM last quarter for a support ticket classifier. The API call was easy — the MuleSoft AI Connector handles that. Building the prompt payload from enterprise data? That's where I spent 2 hours debugging escape sequences.

TL;DR

  • DataWeave transforms ticket data into structured LLM prompt payloads (system + user roles)
  • joinBy "\n" can reach the model as literal backslash-n characters rather than actual newlines, so the LLM sees one continuous line.
  • No token estimation → prompt consumes most of the context window → truncated response
  • The pattern builds system role, user role, model config, and structured response format in about a dozen lines

The Pattern: Enterprise Data to LLM Prompt

%dw 2.0
output application/json
var systemPrompt = "You are an enterprise support analyst."
var lines = payload.ticketHistory map (t) -> "- [$(upper(t.priority))] $(t.id): $(t.subject)"
var userPrompt = "Analyze tickets for $(payload.customer.name):\n" ++ (lines joinBy "\n")
---
{
  model: payload.model,
  max_tokens: payload.maxTokens,
  messages: [
    {role: "system", content: systemPrompt},
    {role: "user", content: userPrompt}
  ]
}

Input: customer object + ticket array + model config.
Output: ready-to-send LLM payload with system instructions and contextual user prompt.
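For reference, the input shape the script expects (field names taken from the script above; the concrete values here are invented):

```json
{
  "customer": { "name": "Acme Corp" },
  "model": "gpt-4o-mini",
  "maxTokens": 500,
  "ticketHistory": [
    { "id": "TK-101", "priority": "high", "subject": "API timeout" },
    { "id": "TK-098", "priority": "medium", "subject": "OAuth refresh failing" }
  ]
}
```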


100 production-ready DataWeave patterns with tests: mulesoft-cookbook on GitHub


Trap 1: joinBy "\n" Is Literal, Not a Newline

The prompt looks correct in the DataWeave Playground:

- [HIGH] TK-101: API timeout
- [MEDIUM] TK-098: OAuth refresh failing
- [LOW] TK-095: Batch stuck at 80 pct

But the actual JSON sent to the LLM contains:

"content": "- [HIGH] TK-101: API timeout\n- [MEDIUM] TK-098: OAuth refresh failing\n- [LOW] TK-095: Batch stuck at 80 pct"

Literal \n characters reaching the model, not newlines. (A single \n escape in a raw JSON body is normal and gets decoded by the parser; the problem is when the string is escaped twice, so the decoded content still contains backslash-n.) The LLM sees one continuous string and its analysis is garbled: it can't distinguish between tickets.

I spent 2 hours wondering why the classification was wrong before I checked the raw HTTP request body.

The fix: keep a real newline in the string (in DataWeave, "\n" inside a double-quoted string is an actual newline character) and make sure the payload is serialized only once. If the raw request body shows \\n with two backslashes, something upstream is re-escaping an already-serialized string.
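To see the difference side by side, a minimal Playground sketch (the ticket strings are made up): "\n" joins with a real newline, while "\\n" inserts the two literal characters backslash and n, which is what a double-escaped payload looks like to the model.

```dataweave
%dw 2.0
output application/json
var lines = ["- [HIGH] TK-101: API timeout", "- [MEDIUM] TK-098: OAuth refresh failing"]
---
{
  // real newline: the JSON writer escapes it exactly once on output
  joined: lines joinBy "\n",
  // literal backslash-n: how the garbled, double-escaped content reads
  doubleEscaped: lines joinBy "\\n"
}
```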

Trap 2: No Token Estimation

I injected 200 ticket summaries into one prompt. Each summary is ~20 tokens. That's 4,000 tokens just for the ticket list. max_tokens was set to 500 for the response.

The model's context window was 4,096 tokens. Prompt + response budget = 4,500 tokens. Didn't fit. The response was truncated mid-sentence.

The fix: Estimate prompt tokens before setting max_tokens:

// rough heuristic: roughly 4 characters per token for English text
var estimatedPromptTokens = ceil(sizeOf(userPrompt) / 4)
// keep a 100-token safety margin under the 4,096-token context window
var safeMaxTokens = max([0, 4096 - estimatedPromptTokens - 100])
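Wired into the full payload, it looks like this (a sketch: the 4,096-token window and the 4-characters-per-token ratio are assumptions that vary by model and tokenizer):

```dataweave
%dw 2.0
output application/json
var contextWindow = 4096
var lines = payload.ticketHistory map (t) -> "- $(t.id): $(t.subject)"
var userPrompt = "Analyze these tickets:\n" ++ (lines joinBy "\n")
// ~4 characters per token is a rough heuristic for English text
var estimatedPromptTokens = ceil(sizeOf(userPrompt) / 4)
---
{
  model: payload.model,
  // cap the response so prompt + response fit inside the window
  max_tokens: max([0, contextWindow - estimatedPromptTokens - 100]),
  messages: [{role: "user", content: userPrompt}]
}
```

For long histories, a tighter alternative is to drop or summarize the oldest tickets until the estimate fits, rather than only shrinking the response budget.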

When to Use This Pattern

| Use it when | Alternatives |
| --- | --- |
| MuleSoft AI Connector integration | Direct API call with HTTP requester |
| Structured enterprise data → LLM prompt | Hardcoded prompts (won't scale) |
| Dynamic context injection (tickets, customer data) | Static system prompts |
| Multiple LLM providers (swap model field) | Provider-specific SDK |

100 patterns with MUnit tests: github.com/shakarbisetty/mulesoft-cookbook

60-second video walkthroughs: youtube.com/@SanThaParv
