Cooper D

Why Your Enterprise Data Platform Is No Longer Just for Analytics

Key Takeaways

The relationship between data and applications is undergoing a fundamental shift. For decades, we've moved data to applications. Now, we're moving applications to data. This isn't just an architectural preference—it's becoming a necessity as businesses demand richer context, faster insights, and real-time operations. Here's what's driving this change:

  • Context is king: Connected data provides multidimensional insights that isolated data simply cannot match
  • The old pattern is breaking: Extracting data to specialized tools creates silos, brittleness, and duplication
  • The line has blurred: Enterprise Data Platforms are no longer just analytical systems—they're becoming operational platforms
  • Three critical shifts: Data latency must drop to seconds, query latency to sub-seconds, and availability must reach production-grade standards
  • The solution space: Event-driven architectures, operational databases like serverless Postgres, and treating your EDP as a P1 system
  • Best of both worlds: Leverage specialized analytics capabilities by bringing them to your data, not moving data to them—every data movement step adds failure points, costs, and complexity

The Power of Connected Data: A Tale of Two Dashboards

Consider how Uber thinks about a driver who just completed a ride.

Without connected data: "Driver #47291 completed an 18-minute ride. Rating: 5 stars."

With connected data: "Driver #47291 completed an 18-minute ride during rush hour in San Francisco. Has a 4.92 rating over 2,847 trips, typically works evenings, now in a surge zone. The passenger is gold-status but gave 3 stars today (usually gives 5). Heavy rain—this driver's cancellation rate jumps 8% in rain."

Same event, different universe. The first tells you what happened. The second tells you why, predicts what might happen next, and suggests what action to take. When you view information through multiple dimensions—user behavior, location patterns, time series, weather, operational metrics—you move from reporting to insight.
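
To make the difference concrete, here's a minimal sketch of what that enrichment step looks like in code. The stores, helpers, and field names are hypothetical stand-ins, not a real ride-sharing schema; the point is that a single event gets joined against several sources of context before anyone acts on it.

```python
# Hypothetical sketch: enriching one ride-completion event with context
# from several connected sources. All helpers and field names are
# illustrative placeholders, not a real schema.
def enrich_ride_event(event, driver_store, rider_store, weather_api, surge_map):
    driver = driver_store.get(event["driver_id"])     # history: rating, trip count, habits
    rider = rider_store.get(event["rider_id"])        # loyalty tier, typical ratings
    weather = weather_api.current(event["city"])      # external signal
    return {
        **event,
        "driver_rating": driver["avg_rating"],
        "driver_trip_count": driver["trip_count"],
        "in_surge_zone": surge_map.contains(event["dropoff_location"]),
        "rider_tier": rider["loyalty_tier"],
        "rating_vs_usual": event["rating"] - rider["avg_rating_given"],
        "raining": weather["precipitation_mm"] > 0,
    }
```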

Why Enterprise Data Platforms Became the Center of Gravity

In large organizations, data from across the business converges in a central Enterprise Data Platform: CRM records, transaction data, product telemetry, marketing attribution, customer service interactions.

This wasn't arbitrary. Connecting data is hard, and doing it repeatedly across different tools is wasteful. The EDP became the natural convergence point where data gets cleaned once, relationships between different sources get mapped, historical context accumulates, and governance gets enforced.

When you need to understand customer lifetime value, you need purchase history, support interactions, usage patterns, and marketing touchpoints. These don't naturally live together—they get connected in the EDP. This made it perfect for contextual analytics: not just because data lives there, but because the relationships and ability to view information from multiple angles exist in one place.

The Old Playbook: Extract, Load, Specialize

For years, the workflow was straightforward. When teams wanted to improve customer experience, run marketing campaigns, or optimize products, they'd: identify needed data from the EDP, procure a specialized tool (Qualtrics for customer experience, Segment for customer data, Hightouch for reverse ETL), build pipelines to extract and load data, then let the specialized tool work its magic.

Marketing got Braze. Customer success got Gainsight. Product got Amplitude. Each loaded with curated enterprise data.

This made sense—these platforms had years of domain expertise and optimized databases for specific use cases. But cracks started showing.

The Data Paradox: More is Better, Until It Isn't

Every specialized tool works better with richer data. Your NPS scores shouldn't just tell you that satisfaction dropped; you want to know it dropped specifically among enterprise customers who have multiple open support tickets and are coming up for renewal.

In theory, the answer is simple: send more data. In practice, this creates three problems:

First, you're duplicating your entire dataset across multiple tools. Your customer data lives in the EDP, in marketing, in customer success, in product analytics. Each copy needs syncing. Each represents another data quality surface.

Second, you're creating brittle pipelines. Different data models, different APIs, different limitations. Each pipeline is a failure point needing maintenance as schemas evolve.

Third, you're siloing insights. Marketing sees one version of the customer, product sees another, support a third. The connected data you built in the EDP gets disconnected as it flows into specialized tools.

This becomes an anti-pattern—working against what made the EDP valuable: keeping data connected.

The Inversion: Bring Applications to the Data

If moving data to applications creates these problems, what if we inverted the pattern? Instead of extracting data from the EDP to specialized tools, build those capabilities on top of the EDP itself.

When the most contextual, connected data already lives in your Enterprise Data Platform, why ship it elsewhere? Why not build your customer experience dashboards, your marketing segmentation engines, your operational applications directly on the EDP?

This is where most people raise an objection.

Wait, Isn't That an Anti-Pattern?

For decades, we've been taught that analytical systems and operational systems are fundamentally different. Analytics platforms—data warehouses, lakes, EDPs—handle complex queries over large datasets, optimized for throughput. Operational systems—transactional databases—handle fast queries on specific records, optimized for latency.

You wouldn't run e-commerce checkout on a data warehouse. You wouldn't build real-time fraud detection on overnight batch jobs.

But here's what changed: the line between analytical and operational has blurred dramatically over the past five years.

Why the Line Has Blurred

Applications have become analytics-hungry. A decade ago, an operational application might look up a customer record. Today, that same application needs to compute lifetime value in real-time, analyze 90 days of behavior, compare against historical patterns, and aggregate data across multiple dimensions.

Meanwhile, data freshness requirements have compressed. Marketing campaigns that used to refresh daily now need hourly or minute-level updates. Customer health scores calculated overnight now need to reflect recent interactions within minutes.

And context requirements have exploded. It's no longer enough to know what a customer bought—you need what they viewed but didn't buy, what promotions they've seen, what support issues they've had, and what predictive models say about their churn likelihood.

This creates a new reality: operational applications need the rich, connected context of the EDP, but with operational characteristics—low latency, high availability, and fresh data.

EDPs can no longer be P2 or P3 systems that indirectly support the business. They're becoming P1 systems that power the business directly, in real time, at the point of customer interaction.

The Three Critical Shifts

For EDPs to power operational applications, three characteristics must change:

1. Data Latency: From Hours to Seconds

Traditional data pipelines moved data in batches—often daily, sometimes hourly, occasionally every 15 minutes if you were pushing it. This worked fine when insights were consumed the next morning in dashboard reviews.

It doesn't work when you're trying to trigger a marketing campaign based on a customer's action taken 30 seconds ago. It doesn't work when flagging potentially fraudulent transactions while they're still pending. It doesn't work when customer service needs to see what happened during the call that just ended.

The solution: Event-driven architecture, end to end. This isn't just about having a message queue somewhere. It means rethinking how data flows through your entire enterprise. When a customer completes a purchase, that event should propagate through your systems in seconds, not hours.

This is the architecture that makes an enterprise truly data-driven—not from yesterday's data, but from what's happening right now. Technologies like Kafka, Debezium for change data capture, and streaming platforms become foundational, not optional.
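
As a rough illustration, here's what the consuming end of such a pipeline can look like, assuming Debezium is already publishing change events from a source database into Kafka. The topic name, row fields, and the use of kafka-python are assumptions for the sketch, not a prescribed setup.

```python
import json
from kafka import KafkaConsumer

# Sketch: react to Debezium change events within seconds of the source write.
# Topic name and row fields are assumptions for illustration.
consumer = KafkaConsumer(
    "orders.public.purchases",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="latest",
)

for message in consumer:
    change = message.value["payload"]   # Debezium wraps each change in a payload envelope
    if change["op"] in ("c", "u"):      # row created or updated
        row = change["after"]           # the new state of the row
        print(f"purchase {row['id']} for customer {row['customer_id']} arrived in seconds, not hours")
```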

2. Query Latency: From Seconds to Sub-Seconds

Users won't wait three seconds for a dashboard to load. They definitely won't wait 30 seconds for a page to render. Applications need to respond in hundreds of milliseconds, not seconds.

But here's the fundamental issue: modern data warehouses and lakes are built on storage-compute separation. This isn't a bug—it's an intentional design choice that provides enormous benefits for analytical workloads. You can scale storage and compute independently. You can spin up compute when needed and shut it down when you don't.

However, this separation introduces a first-principles problem: when you run a query, data needs to move from remote storage to compute nodes. Even with optimized formats like Parquet, even with clever caching—data still needs to travel. For analytical queries over large datasets, a few seconds is acceptable. For operational APIs, it's not.

Why this matters for operational workloads: Operational applications don't make single queries. They chain hundreds of API calls. A single page load might trigger dozens of queries. Real-time business decisions—approve this transaction, show this offer, flag this behavior—can't wait for data to move from storage to compute. They need millisecond responses.

The solution: Relational databases, where compute and storage live together. This is where solutions like Neon and serverless Postgres come into play.

The pattern: Keep your rich, historical, connected data in the EDP where it belongs—that's still the system of record. But sync the operational subset—the data that needs to power real-time applications—into a relational database optimized for low-latency queries.

This operational database becomes your fast access layer, holding the most frequently accessed data: current customer states, recent transactions, active orders. Everything else—full history, rarely accessed dimensions, large analytical datasets—stays in the EDP and is linked when needed.

Why relational databases? When compute and storage are together, query latency drops dramatically. No network hop to fetch data. Indexes live next to the data. Query planners optimize on actual data locality.

Why serverless Postgres? It solves the operational challenges that traditionally made databases hard to scale—automatic scaling, no provisioning for peak load—while maintaining the low-latency benefits of the relational model.
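
Here's a minimal sketch of that sync-and-serve pattern, assuming a customer_state table and psycopg2. The table, columns, and connection string are placeholders, and the same code works against any Postgres endpoint, including a serverless provider such as Neon.

```python
import psycopg2

# Placeholder connection string; point it at your operational Postgres.
conn = psycopg2.connect("postgresql://user:password@host/dbname")

def upsert_customer_state(state):
    """Idempotent upsert keeping the operational copy current as events arrive."""
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO customer_state (customer_id, lifetime_value, open_tickets, last_seen)
            VALUES (%(customer_id)s, %(lifetime_value)s, %(open_tickets)s, %(last_seen)s)
            ON CONFLICT (customer_id) DO UPDATE SET
                lifetime_value = EXCLUDED.lifetime_value,
                open_tickets   = EXCLUDED.open_tickets,
                last_seen      = EXCLUDED.last_seen
            """,
            state,
        )

def get_customer_state(customer_id):
    """Indexed point lookup: the sub-second path operational applications hit."""
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT lifetime_value, open_tickets, last_seen FROM customer_state WHERE customer_id = %s",
            (customer_id,),
        )
        return cur.fetchone()
```

The write path is driven by the event stream; the read path is what applications and dashboards call.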

3. High Availability: From "It's Down" to "It's Always On"

When your data platform is used for monthly reports and strategic planning, a few hours of downtime is annoying but not catastrophic. When your data platform powers customer-facing applications, every minute of downtime directly impacts revenue.

This means treating your EDP—or at least the operational layer sitting on top of it—with the same availability standards you'd apply to any production application.

The solution: Active-active configurations, multi-region deployments, automatic failover. At minimum, the operational database layer needs production-grade infrastructure.

This shift is cultural as much as technical. It means your data team needs to adopt DevOps practices. It means SLAs matter. It means on-call rotations become part of data platform management.

Why This Matters Now

None of these ideas are entirely new. People have been talking about operational analytics for years. So why is this pattern becoming critical now?

Several trends have converged:

The cost of computation has dropped dramatically. What was prohibitively expensive five years ago—maintaining real-time data pipelines, running operational databases on large datasets—is now economically feasible. Serverless architectures have made it even more accessible.

Competitive pressure has increased. Customers expect personalization, immediate responses, and consistency across channels. Companies that can deliver these experiences with richer context have a meaningful advantage.

The technology has matured. Event streaming platforms are production-ready. Change data capture tools reliably sync databases. Serverless databases handle operational workloads without traditional overhead. The pieces needed to build this architecture actually work now.

Data teams have the skills. A generation of data engineers who grew up building real-time pipelines and thinking of data as something that flows rather than sits has moved into leadership positions. The organizational knowledge exists to execute this pattern.

The New Architecture

Here's what this looks like in practice:

Your Enterprise Data Platform remains the system of record—the place where data is cleaned, connected, and stored historically. Data flows into it through event-driven pipelines that capture changes as they happen, not in overnight batches.

On top of the EDP, an operational layer provides fast, consistent access to the subset of data needed for real-time applications. This might be a serverless Postgres instance that's automatically synced with your data platform, maintaining operational data with sub-second query latency.

Applications—whether internal tools, customer-facing features, or analytical dashboards—query the operational layer directly. They get the rich context of the EDP with the performance characteristics of an operational database.

The operational layer is treated as a P1 system: multi-region if needed, highly available, monitored like any production service.

Data flows through this architecture in near real-time. An event happens in a source system, gets captured and streamed to the EDP, triggers processing and transformation, and updates the operational layer—all within seconds or minutes.
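
A sketch of the application side of this picture, assuming the customer_state table from the earlier example lives in the operational Postgres layer. Flask, the route, and the columns are illustrative choices; the point is that a customer-facing request becomes one indexed lookup against the operational layer rather than a warehouse scan.

```python
from flask import Flask, jsonify
import psycopg2

app = Flask(__name__)
conn = psycopg2.connect("postgresql://user:password@host/dbname")  # operational layer

@app.get("/customers/<customer_id>/context")
def customer_context(customer_id):
    # One indexed point lookup against the operational layer.
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT lifetime_value, open_tickets, last_seen FROM customer_state WHERE customer_id = %s",
            (customer_id,),
        )
        row = cur.fetchone()
    if row is None:
        return jsonify({"error": "unknown customer"}), 404
    return jsonify({
        "lifetime_value": float(row[0]),
        "open_tickets": row[1],
        "last_seen": str(row[2]),
    })
```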

What This Enables

When you build applications on top of your connected data rather than extracting subsets to specialized tools, several things become possible:

Richer insights. You're not limited to the subset of data you could feasibly extract and load. Your application has access to the full context of the EDP.

Faster iteration. Adding a new dimension to your analysis doesn't require building a new pipeline and waiting for data to load. It's already there.

Reduced duplication. Data lives in fewer places. Updates happen in one location. Data quality issues are fixed once.

Better cross-functional work. When everyone is building on the same data foundation, insights are easier to share. Marketing and product aren't looking at different versions of customer behavior.

Lower operational overhead. Fewer pipelines to maintain, fewer data synchronization issues to debug, fewer copies of data to govern and secure.

The Trade-offs

You're trading specialized tools' optimizations for platform flexibility. You need teams capable of building applications and organizational buy-in to treat data platforms as production infrastructure. But for many organizations, the benefits—flexibility, reduced duplication, faster iteration, contextual insights—justify the investment.

But What About Specialized Capabilities?

Here's a legitimate question: what about all those cutting-edge features that specialized platforms offer? Qualtrics has StatsIQ and TextIQ—sophisticated analytics capabilities built over years. Segment has identity resolution algorithms refined across thousands of companies.

If we're building on our EDP instead of using these tools, are we throwing away innovation? Are we asking data teams to rebuild complex models from scratch?

Not necessarily. The key insight: you don't need to move data to leverage specialized capabilities. Bring those capabilities to where data lives, or let them operate on your data in place.

Two Emerging Patterns

First, bring capabilities to the EDP. This is already happening. Many specialized analytics capabilities are becoming available as standalone services or libraries that operate directly on data platforms. Modern EDPs support user-defined functions, external ML service calls, and integration with specialized processing engines. You can invoke sentiment analysis APIs on text stored in your EDP. You can run statistical models using libraries that operate directly on your warehouse tables.

Second, let specialized tools operate in place. Instead of extracting data into Qualtrics, imagine Qualtrics connecting directly to your EDP and running its StatsIQ algorithms on your data where it sits. This "compute on data in place" trend is accelerating—it's the core idea behind data clean rooms, query federation, and interoperability standards.
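
As a sketch of the first pattern: scoring support-ticket text that already lives in the EDP with an external sentiment service, instead of exporting the tickets to yet another tool. The warehouse connection, table, SQL dialect, and sentiment endpoint (and its response shape) are all assumptions here.

```python
import requests

def score_recent_tickets(warehouse_conn, sentiment_url):
    """Bring the capability to the data: annotate rows in place, never export them."""
    with warehouse_conn.cursor() as cur:
        cur.execute(
            "SELECT ticket_id, body FROM support_tickets WHERE created_at > current_date - 1"
        )
        for ticket_id, body in cur.fetchall():
            resp = requests.post(sentiment_url, json={"text": body}, timeout=5)
            score = resp.json()["score"]  # hypothetical response shape
            cur.execute(
                "UPDATE support_tickets SET sentiment = %s WHERE ticket_id = %s",
                (score, ticket_id),
            )
    warehouse_conn.commit()
```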

Why Operating In Place Wins

Every time you add a step to move data, you introduce:

  • Failure points: Another pipeline that can break, another synchronization that can fall out of date
  • Costs: Data egress charges, storage duplication, compute for transformation and loading
  • Latency: Time to extract, transform, and load before insights are available
  • Complexity: Another system to monitor, another set of credentials to manage
  • Risk: More copies of sensitive data, more surfaces for security issues

The most successful solutions operate on existing data in place. Think about dbt's success—it transforms data where it sits. Or how BI tools evolved from requiring extracts to connecting directly to warehouses. The winning pattern is always "work with data in place."

What This Requires

From vendors: APIs that operate on external data, federated query engines, embedded analytics libraries. Some will resist—their business models depend on data lock-in. But those that embrace this will win in a world where enterprises are consolidating their data.

From data platforms: Rich API layers, fine-grained access control, performance for external queries, support for specialized compute through user-defined functions and external procedures. Features like Snowflake's external functions, Databricks' ML capabilities, and BigQuery's remote functions are steps in this direction.
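
To make the "remote function" idea concrete, here's a simplified sketch of the service side, following the batched request/reply JSON contract that BigQuery remote functions use: the platform sends a list of calls, and the service returns one reply per call. The route, the single-argument unpacking, and the placeholder scoring logic are assumptions, and deployment details are omitted.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.post("/score")
def score():
    # BigQuery sends {"calls": [[arg1, arg2, ...], ...]}; respond with {"replies": [...]}.
    calls = request.get_json()["calls"]
    # Assume a single text argument per call; real scoring logic would go here.
    replies = [min(len(str(text)), 100) / 100.0 for (text,) in calls]
    return jsonify({"replies": replies})
```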

The Emerging Architecture

Your Enterprise Data Platform holds your connected, contextual data. Your operational layer provides fast access for real-time applications. And specialized analytics capabilities—whether built in-house or licensed from vendors—operate on this data without requiring it to be moved.

You get the rich context and operational efficiency of centralized, connected data. And you get the specialized capabilities of best-in-class tools. Without the brittleness, cost, and complexity of moving data between systems.

Looking Forward

The shift from "move data to applications" to "move applications to data" reflects how central data has become. The line between operational and analytical systems has blurred.

Organizations that adapt (event-driven architectures, operational databases alongside data platforms, EDPs treated as P1 systems) will act on richer context, respond faster, and deliver better experiences. Those that cling to the old extraction patterns will keep fighting brittle pipelines and synchronization issues.

The technology exists. The question is organizational readiness.


The future of enterprise data isn't choosing between analytical power and operational performance. It's architectures delivering both.
