Marco Gazerro for Krenalis

Posted on Jun 5

If the warehouse already has the data, why are we copying it elsewhere?

#architecture #data #dataengineering #systemdesign

When we started working on Krenalis, we spent a lot of time reviewing how customer data typically flows through a modern data stack.

One pattern showed up often enough that we started questioning it.

In many modern stacks, customer data already lands in a warehouse. Yet we often copy that same data into a CDP before we can start building customer profiles.

During one of those discussions, someone asked a question that sounded almost naive:

Why are we moving all this data in the first place?

Nobody had a strong answer ready. The best we could offer was:

Because that's how CDPs work.

We expected the question to have an obvious answer. It didn't.

The warehouse is no longer just for analytics

Over the last few years, the role of the data warehouse has changed significantly. Warehouses are no longer just analytical systems. They're increasingly becoming the place where organizations centralize the context used by applications, AI agents, copilots, and business processes.

Customer data from systems like Shopify, Stripe, CRMs, support platforms, and internal applications often ends up there long before anyone starts thinking about segmentation or activation. In many organizations, the warehouse is already the place where teams answer questions about customers, revenue, retention, and product usage.

That made us wonder:

If the warehouse is already becoming the operational center of the data stack, why does customer identity usually live somewhere else?

Consider a customer who buys through Shopify, pays through Stripe, opens support tickets in Zendesk, and uses the product under a different email address. In many organizations, all of those records already end up in the warehouse. Yet building a unified profile often requires exporting that same data into another platform before identity can be resolved.
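To make that scenario concrete, here is a minimal sketch of what resolving identity across those records could look like. It assumes the records share at least one identifier (here a phone number links the product account to the others), and it uses a simple union-find grouping. The record layout, field names, and values are all invented for illustration, and this is not a description of how Krenalis resolves identity.

```python
from collections import defaultdict

# Hypothetical records, as they might already sit in warehouse tables.
# All field names and values are invented for illustration.
RECORDS = [
    {"source": "shopify", "id": "sh_1", "email": "ana@example.com", "phone": "+15550100"},
    {"source": "stripe", "id": "cus_9", "email": "ana@example.com", "phone": None},
    {"source": "zendesk", "id": "zd_4", "email": "ana@example.com", "phone": "+15550100"},
    # The product account uses a different email but shares the phone number.
    {"source": "product", "id": "u_77", "email": "ana.work@example.com", "phone": "+15550100"},
    {"source": "shopify", "id": "sh_2", "email": "bo@example.com", "phone": None},
]

# Assumption: these two fields are the only identifiers worth matching on.
IDENTIFIER_FIELDS = ("email", "phone")


def resolve_identities(records):
    """Group records that share any identifier value, directly or transitively."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller index as root so the grouping is deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # (field, normalized value) -> index of first record with it
    for index, record in enumerate(records):
        for field in IDENTIFIER_FIELDS:
            value = record.get(field)
            if value is None:
                continue
            key = (field, value.strip().lower())
            if key in first_seen:
                union(index, first_seen[key])
            else:
                first_seen[key] = index

    profiles = defaultdict(list)
    for index, record in enumerate(records):
        profiles[find(index)].append(f'{record["source"]}:{record["id"]}')
    return sorted(profiles.values())


PROFILES = resolve_identities(RECORDS)
for members in PROFILES:
    print(members)
```

The point of the sketch is that nothing in it requires the records to leave the place where they already live. Real identity resolution also has to deal with fuzzy matches, conflicting identifiers, and merge rules, none of which are covered here.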

The cost of another copy

To be clear, data duplication is not inherently bad. Most software systems rely on some form of replication, caching, or denormalization.

The question is whether an additional copy is actually necessary.

When customer data exists across multiple platforms, a few familiar challenges tend to appear. Sooner or later, two systems report different numbers and someone has to determine which one is correct. As customer profiles become the result of multiple pipelines and transformations, understanding exactly how a profile was built becomes more difficult. Additional systems also introduce additional integrations, monitoring requirements, and opportunities for data drift.

None of these problems is unique to CDPs. They're simply common side effects of moving data between systems.

A different perspective

At some point we stopped asking:

How should we move data into the CDP?

and started asking:

What if we didn't?

Once we started looking at the problem from that angle, a warehouse-native architecture felt like the obvious thing to explore.

Instead of bringing customer data into the CDP, we started exploring what would happen if identity resolution, profile generation, and audience segmentation happened directly on top of the data that was already in the warehouse.
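As a rough illustration of what "on top of the data" can mean, the sketch below builds profiles as a view over source tables and defines a segment as a query on that view. An in-memory SQLite database stands in for the warehouse, and the table names, column names, and the `high_value_segment` helper are hypothetical. It only shows the shape of the idea, not an actual warehouse-native CDP.

```python
import sqlite3

# sqlite3 stands in for a warehouse here; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (email TEXT, amount REAL);
CREATE TABLE tickets (email TEXT, status TEXT);

INSERT INTO orders VALUES
    ('ana@example.com', 120.0),
    ('ana@example.com', 80.0),
    ('bo@example.com', 15.0);
INSERT INTO tickets VALUES
    ('ana@example.com', 'open'),
    ('bo@example.com', 'closed');

-- Profiles are a view over the source tables: nothing is copied out.
CREATE VIEW customer_profiles AS
SELECT o.email,
       SUM(o.amount) AS lifetime_value,
       (SELECT COUNT(*) FROM tickets t
         WHERE t.email = o.email AND t.status = 'open') AS open_tickets
FROM orders o
GROUP BY o.email;
""")


def high_value_segment(connection, min_value):
    """Return the emails of customers whose lifetime value is at least min_value."""
    rows = connection.execute(
        "SELECT email FROM customer_profiles "
        "WHERE lifetime_value >= ? ORDER BY email",
        (min_value,),
    )
    return [email for (email,) in rows]


SEGMENT = high_value_segment(conn, 100.0)
print(SEGMENT)
```

Because the profile is a view rather than an exported copy, a new row in `orders` changes the segment the next time it is queried, with no sync step in between.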

The underlying idea can be summarized in a simple sentence:

The CDP doesn't need to own the data. It only needs to understand it.

Whether that idea ultimately proves right or wrong is a different question. What mattered to us was that it seemed increasingly aligned with the direction the rest of the modern data stack was already taking.

Why this question feels relevant today

The idea behind Krenalis didn't emerge in isolation.

By the time we started working on it, many parts of the modern data stack had already moved closer to the warehouse. Analytics increasingly treats it as the source of truth, transformations often run there, and business logic is frequently built around it. More recently, AI applications and agent workflows have started using it as a source of context as well.

The trend itself wasn't particularly surprising. What caught our attention was the question that followed from it.

If so much of the modern data stack is converging around the warehouse, what does that mean for customer identity?

If customer events, transactions, support interactions, business metrics, and AI context are already stored in the warehouse, resolving customer identity there starts to feel less like a technical compromise and more like a natural extension of the same idea.

One aspect of this approach that we find particularly compelling is transparency. Customer identity becomes something that can be inspected, queried, and reasoned about using the same tools teams already use for the rest of their data.
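One way to picture that transparency: if the outcome of identity resolution is stored as an ordinary table, then "why is this record part of this profile?" becomes a plain query. The sketch below assumes a hypothetical `identity_links` table with a `matched_on` column recording which identifier caused each link. The schema, the sample rows, and the `explain_profile` helper are invented for illustration, with SQLite again standing in for the warehouse.

```python
import sqlite3

# sqlite3 stands in for a warehouse; the identity_links schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE identity_links (
    profile_id TEXT,   -- resolved profile the record belongs to
    source     TEXT,   -- system the record came from
    source_id  TEXT,   -- record id in that system
    matched_on TEXT    -- identifier that linked the record to the profile
);
INSERT INTO identity_links VALUES
    ('p1', 'shopify', 'sh_1',  'email'),
    ('p1', 'stripe',  'cus_9', 'email'),
    ('p1', 'product', 'u_77',  'phone'),
    ('p2', 'shopify', 'sh_2',  'email');
""")


def explain_profile(connection, profile_id):
    """List every source record in a profile and the identifier that linked it."""
    rows = connection.execute(
        "SELECT source, source_id, matched_on FROM identity_links "
        "WHERE profile_id = ? ORDER BY source, source_id",
        (profile_id,),
    )
    return rows.fetchall()


EXPLANATION = explain_profile(conn, "p1")
for source, source_id, matched_on in EXPLANATION:
    print(f"{source}:{source_id} linked via {matched_on}")
```

The same table can be joined, audited, or tested with whatever SQL tooling a team already uses for the rest of its data.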

Instead of treating identity as something produced by a separate platform, it becomes part of the data model itself.

There may still be good reasons to keep customer identity inside a dedicated system. Dedicated platforms can provide a better user experience, faster onboarding, and a simpler path for teams that don't want to operate directly on warehouse data.

But the more central the warehouse becomes, the more natural it feels to ask whether identity resolution should happen there too.

The trade-offs are real

Warehouse-native isn't a universal answer.

If a team doesn't already rely heavily on a data warehouse, introducing warehouse-native tooling may create more complexity than it removes. Likewise, if the primary goal is to launch campaigns quickly with minimal involvement from data teams, a traditional CDP may be a better fit.

A warehouse-native architecture also doesn't solve data quality problems. If customer data is incomplete, inconsistent, or fragmented, those issues remain. In fact, they often become more visible.

Depending on your perspective, that's either a benefit or an inconvenience.

A bet on where things might be heading

We're not arguing that every CDP should become warehouse-native.

Traditional CDPs solve real problems and continue to provide value for many organizations.

What interests us is a broader question. As warehouses become the foundation for analytics, business operations, AI applications, and agent workflows, will customer identity eventually follow the same path?

We think there is a strong case for that direction, especially in composable architectures where the warehouse is already the place where customer context is modeled, governed, and used.

Maybe this becomes the default model for some teams.

Maybe it remains one architectural option among many.

We kept coming back to that question, and eventually decided it was worth building around.

And it's one of the reasons Krenalis exists.

What do you think?

If you've worked on CDPs, customer identity, or warehouse-native architectures, we'd genuinely love to hear about your experience.

Leave a comment below or drop us a note at hello@krenalis.com.

For anyone interested in how we're approaching these problems, we've made the project repository available here:

https://github.com/krenalis/krenalis

DEV Community