Why I Built My Own Data Transformation Language

#data #programming #showdev #tooling

Integration mapping hasn't meaningfully advanced in a decade. I got tired of waiting.

The Problem Nobody Talks About

Every integration project I've worked on hits the same wall. You need to transform data between systems. Simple enough — until you look at your options.

XSLT is the old guard. Powerful, battle-tested, 25 years of history. But it's XML-only. The moment your source or target is JSON, CSV, or YAML, you're on your own. In 2025, almost nothing is XML-only.

TIBCO BusinessWorks (BW) used to be the king of composable integration, but it handles XML transformations only and depends heavily on XSLT.

MuleSoft introduced DataWeave, and DataWeave is genuinely good: format-agnostic, functional, modern syntax. But it's owned by MuleSoft, which means it's owned by Salesforce, which means you're locked in. You can't use it outside the MuleSoft ecosystem without paying for it. There's no community fork (for a short while it looked otherwise, but unfortunately it was never opened up). There's no escape hatch.

Custom code is what most teams fall back on. Write a Python script, a Java class, a Node.js function, a Groovy script, some ECMAScript. It works, until you need to maintain four versions of the same transformation logic in four different codebases for four different formats.

So you end up with XSLT for XML, jq for JSON, custom scripts for CSV, something else for YAML — and a growing pile of tools that all do the same thing differently.
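To make the duplication concrete, here is a minimal Python sketch of one business rule ("keep active orders, highest total first") written twice, once per format. The order data, field names, and function names are hypothetical and exist only for illustration. The rule is identical in both functions, but the parsing and field access differ, so every change has to be made in two places.

```python
import csv
import io
import json

# Hypothetical sample data: the same three orders in two formats.
CSV_DOC = "id,status,total\nA1,active,40.5\nB2,closed,99.0\nC3,active,120\n"
JSON_DOC = json.dumps([
    {"id": "A1", "status": "active", "total": 40.5},
    {"id": "B2", "status": "closed", "total": 99.0},
    {"id": "C3", "status": "active", "total": 120},
])


def active_orders_from_csv(text):
    """CSV version: every value arrives as a string keyed by header name."""
    rows = csv.DictReader(io.StringIO(text))
    result = [
        {"id": r["id"], "total": float(r["total"])}
        for r in rows
        if r["status"] == "active"
    ]
    return sorted(result, key=lambda r: r["total"], reverse=True)


def active_orders_from_json(text):
    """JSON version: same rule, but values are already typed."""
    orders = json.loads(text)
    result = [
        {"id": o["id"], "total": float(o["total"])}
        for o in orders
        if o["status"] == "active"
    ]
    return sorted(result, key=lambda r: r["total"], reverse=True)


if __name__ == "__main__":
    print(active_orders_from_csv(CSV_DOC))
    print(active_orders_from_json(JSON_DOC))
```

Add XML and YAML and you have four copies of that filter, each with its own parsing quirks.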

This is the state of the art in 2025. A problem that was solved for XML in 1999 still hasn't been solved generically.

Why Nothing Filled the Gap

I kept looking for an open source, format-agnostic transformation language. Something like XSLT but not XML-specific. Something like DataWeave but not vendor-locked.

It didn't exist.

There are converters — tools that mechanically flip JSON to XML or CSV to JSON. But that's not transformation. Transformation is filtering, mapping, joining, aggregating, restructuring. Business logic. The stuff that actually matters in integration work.

The closest thing was jq — elegant, powerful, beloved. But jq is JSON-only. Try pointing it at an XML file.

So I Built It

UTL-X is the tool I wanted to exist. A functional, declarative transformation language that works the same way regardless of whether your data is XML, JSON, CSV, YAML, or OData.

You write a transformation once:

%utlx 1.0
input xml
output json
---
{
  customers: $input.Orders.Order
    |> filter(o => o.@status == "active")
    |> map(o => { id: o.@id, total: parseNumber(o.Total) })
    |> sortBy(o => -o.total)
}

And if tomorrow your source switches from XML to JSON, you change one word: input xml becomes input json. The transformation logic is untouched.
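For readers who think in general-purpose code, the idea of format abstraction can be sketched in Python: parse each format into the same tree of dicts and lists, then run a single transformation over that tree. This is not how UTL-X is implemented, only an illustration of the concept. The helper names (`xml_to_tree`, `parse`, `transform`), the sample orders, and the convention that XML attributes become `@`-prefixed keys (and that the JSON input uses the same keys) are all assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_tree(elem):
    """Convert an Element into plain dicts, lists, and strings.
    Assumed convention: attributes become '@name' keys."""
    node = {"@" + name: value for name, value in elem.attrib.items()}
    for child in elem:
        value = xml_to_tree(child)
        if child.tag not in node:
            node[child.tag] = value
        elif isinstance(node[child.tag], list):
            node[child.tag].append(value)
        else:
            # Second occurrence of a tag: promote the entry to a list.
            node[child.tag] = [node[child.tag], value]
    text = (elem.text or "").strip()
    if not node:
        return text  # leaf element: just its text
    if text:
        node["#text"] = text
    return node


def parse(fmt, text):
    """The only format-specific step: text in, common tree out."""
    if fmt == "xml":
        root = ET.fromstring(text)
        return {root.tag: xml_to_tree(root)}
    if fmt == "json":
        return json.loads(text)
    raise ValueError(f"unsupported format: {fmt}")


def transform(doc):
    """Format-independent logic: filter, map, sort by total descending."""
    orders = doc["Orders"]["Order"]
    if not isinstance(orders, list):
        orders = [orders]  # a single <Order> parses to a dict, not a list
    rows = [
        {"id": o["@id"], "total": float(o["Total"])}
        for o in orders
        if o["@status"] == "active"
    ]
    return {"customers": sorted(rows, key=lambda r: r["total"], reverse=True)}


# Hypothetical sample inputs carrying the same orders.
XML_DOC = """<Orders>
  <Order id="A1" status="active"><Total>40.5</Total></Order>
  <Order id="B2" status="closed"><Total>99.0</Total></Order>
  <Order id="C3" status="active"><Total>120</Total></Order>
</Orders>"""

JSON_DOC = json.dumps({"Orders": {"Order": [
    {"@id": "A1", "@status": "active", "Total": "40.5"},
    {"@id": "B2", "@status": "closed", "Total": "99.0"},
    {"@id": "C3", "@status": "active", "Total": "120"},
]}})

if __name__ == "__main__":
    print(transform(parse("xml", XML_DOC)))
    print(transform(parse("json", JSON_DOC)))
```

Switching the source format changes only the first argument to `parse`; `transform` never learns which format it came from.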

For quick work, it behaves like jq:

cat data.xml | utlx                    # instant XML → JSON
echo '{"name":"Alice"}' | utlx -e '.name' -r   # Alice
cat data.xml | utlx -e '.Orders.Order |> map(o => {id: o.@id})'

For production, it scales up to a full pipeline engine with Kafka integration, thread pools, health probes, and hot reload.

What v1.0.1 Looks Like

Format support: XML, JSON, CSV, YAML, OData
Schema transformation: XSD ↔ JSON Schema ↔ Avro ↔ Protobuf ↔ OData/EDMX
652 stdlib functions
Native binaries for macOS, Linux, Windows — no JVM required
Strong type system with compile-time checking
Three executables: utlx (CLI), utlxd (IDE daemon/LSP), utlxe (production engine)
AGPL-3.0 — genuinely free, no vendor lock-in

This Should Have Existed Already

I don't think what I built is particularly clever. The ideas behind UTL-X — format abstraction, functional pipelines, declarative syntax — have been well understood for years. XSLT proved the concept in 1999. DataWeave proved it could be done well in 2013.

What's new is that it's open source, format-agnostic, and available to anyone without a MuleSoft contract.

Integration mapping hasn't advanced in a decade because the tools that worked well were locked behind vendor walls. I'm hoping UTL-X changes that — even a little.

If you work with data transformation, ETL pipelines, API integration, or just find yourself reaching for jq and wishing it worked on XML too — give it a try.

GitHub: https://github.com/grauwen/utl-x

I'd love to hear what formats or use cases you'd want to see supported. Issues and discussions are open.