If you've ever worked on a SAP migration, you already know the feeling.
Week one, someone opens a fresh Excel file. Columns: source table, target table, transformation logic, owner, status. Clean. Organized. Promising.
Fast forward three months: that file has 47 tabs, six versions with names like final_v3_REAL_USE_THIS.xlsx, three people who "own" different parts of it, and at least one tab that nobody touches because the person who built it left the project.
This is how mapping logic dies — not in a dramatic failure, but in slow fragmentation.
The SAP ECC 2027 problem
A lot of teams are about to feel this pain very acutely.
SAP ends mainstream maintenance for ECC in 2027. That means a massive wave of migrations is either already underway or being planned right now — to S/4HANA, Snowflake, Databricks, or a mix of all three. These are not small lifts. SAP schemas are complex, deeply customized, and full of implicit business logic that's never been written down anywhere.
The mapping work alone — figuring out what maps to what, how it transforms, and why — can take months. And the dirty secret is that most of that work still happens in spreadsheets and Confluence pages, or lives inside the heads of a few senior consultants.
When the project ends (or the consultant rolls off), that knowledge is just... gone.
The tool gap nobody talks about
There's no shortage of ETL tools. Databricks, Informatica, Talend, SSIS — take your pick. These tools are excellent at moving data. That's exactly what they're built for.
But they don't really answer the question: why did this field end up here, and what happened to it along the way?
That's a different problem. It's a documentation problem, a governance problem, a traceability problem. And it's one that most teams are solving with the least appropriate tool possible: a spreadsheet.
What we built, and why
We were running into this repeatedly on migration projects, so we built a tool called ARCXA to deal with it.
ARCXA doesn't move data. It explains data movement.
It sits on top of your existing ETL stack and gives you:
- Schema mapping — source-to-target field mappings, stored in a queryable knowledge graph
- Data lineage — field-level tracking of where data came from and how it was transformed
- Transformation traceability — a record of what happened to each field and why, one that compounds across every project you run
The knowledge graph is built on RDF and queried with SPARQL, using Oxigraph as the store. The backend is Rust, the frontend is React. Connectors ship out of the box for SAP HANA, Oracle, DB2, PostgreSQL, Snowflake, Databricks, S3 Parquet, and MySQL.
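To make the "queryable knowledge graph" idea concrete, here's a minimal sketch of what storing mappings as RDF-style triples buys you. This is not ARCXA's actual schema — the predicate and field names are invented for illustration — but the pattern-matching query is the same shape as a SPARQL basic graph pattern:

```python
# A tiny in-memory triple store: (subject, predicate, object).
# Predicate and field names are illustrative, not ARCXA's real vocabulary.
triples = {
    ("ecc.MARA.MATNR", "mapsTo",        "s4.MATDOC.MATNR"),
    ("ecc.MARA.MATNR", "transformedBy", "lpad_to_40_chars"),
    ("ecc.KNA1.KUNNR", "mapsTo",        "s4.BP.PARTNER"),
    ("ecc.KNA1.KUNNR", "transformedBy", "cvi_customer_to_bp"),
}

def match(s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard,
    much like a variable in a SPARQL query."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

# "What maps into s4.BP.PARTNER, and how was it transformed?"
sources = [s for s, _, _ in match(p="mapsTo", o="s4.BP.PARTNER")]
print(sources)  # ['ecc.KNA1.KUNNR']
print(match(s=sources[0], p="transformedBy"))
```

The point of the triple representation is that "who maps where" and "what transformation applied" live in one structure you can query from any direction, instead of being locked into the row layout of a spreadsheet tab.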
Deploy via Docker Compose or Helm. No SaaS, no cloud dependency, works air-gapped.
What this actually changes
The biggest shift is organizational memory.
When your mapping logic lives in ARCXA instead of a spreadsheet, it doesn't disappear when the project ends. New team members can query it. Auditors can trace it. You can reuse it on the next migration instead of starting from scratch.
Instead of hunting through tabs to understand why a transformation was written a certain way, you have a graph you can query. "Show me everything that feeds into this target field" becomes a real question with a real answer.
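As a sketch of what that kind of question means under the hood (the graph here is hand-built and the field names invented; in ARCXA the graph is derived from your stored mappings), "everything that feeds into this target field" is a transitive walk over field-level lineage edges:

```python
# Field-level lineage as an adjacency map: target -> direct upstream fields.
# All field names are invented for illustration.
upstream = {
    "s4.MATDOC.MATNR": ["ecc.MARA.MATNR"],
    "ecc.MARA.MATNR":  ["legacy.ITEM_MASTER.ITEM_NO"],
    "s4.BP.PARTNER":   ["ecc.KNA1.KUNNR", "ecc.LFA1.LIFNR"],
}

def feeds_into(target, graph):
    """Return every field that directly or transitively feeds the target."""
    seen, stack = set(), list(graph.get(target, []))
    while stack:
        field = stack.pop()
        if field not in seen:
            seen.add(field)
            stack.extend(graph.get(field, []))
    return seen

print(sorted(feeds_into("s4.MATDOC.MATNR", upstream)))
# ['ecc.MARA.MATNR', 'legacy.ITEM_MASTER.ITEM_NO']
```

A spreadsheet can answer the one-hop version of this with a filter; the multi-hop version — through a staging layer, through a harmonization step — is exactly where tabs stop being an answer and a graph starts being one.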
And because it's source-available under the BSL 1.1, you can run it in dev and test for free, inspect the code, and adapt it to your stack.
Is this for you?
If you're working on a SAP ECC migration — or any large-scale legacy migration involving Oracle, DB2, or similar — and you're currently managing mapping logic in spreadsheets or docs, this is probably worth looking at.
It's not a replacement for your ETL tooling. It's the layer that explains what your ETL is doing.
Try it
GitHub repo: https://github.com/equitusai/arcxa
More details and use cases are on the ARCXA landing page.
Would love to hear how others are handling this — especially if you're deep in a SAP migration right now. Are you using any tooling for lineage and traceability, or is it still spreadsheets all the way down?