David Ohnstad

Posted on May 22 • Originally published at davidohnstad.com

Data Product Team Structure: Centralized vs. Distributed


This article was originally published on davidohnstad.com. I cross-post here to reach the Dev.to community.

Myth #1: You Need a Centralized Data Team to Build Successful Data Products

Three companies. Same quarter. Same goal: ship an internal analytics platform that surfaces customer churn signals in real time. Company A spins up a 12-person centralized data team reporting to a Chief Data Officer. Company B distributes ownership across product squads, each with embedded analytics engineers. Company C forms a federated council with representatives from engineering, product, and business intelligence who meet weekly but report to different executives.

Eighteen months later, Company A's platform is technically elegant and operationally unused. Company B's version is fragmented—five different definitions of "active customer" across squads. Company C's is messy, patched together with duct tape and compromise, and has become the single source of truth for renewal forecasting across the entire organization. According to IBM's 2025 research on modern data team structures, federated models with distributed ownership now outperform centralized teams on adoption metrics in 68% of enterprise deployments.

The myth persists because centralized teams look cleaner on an org chart. They promise unified governance, consistent data definitions, and clear accountability. Leadership loves the simplicity: one budget line, one leader to hold responsible, one roadmap to review. The problem is that centralized teams become bottlenecks the moment demand outpaces capacity—and in every growing company, it does. A single data team fielding requests from product, marketing, sales, finance, and operations becomes a ticket queue. Backlogs stretch to quarters. Stakeholders build shadow systems. The data product you carefully architected gets bypassed because someone in sales built a faster, wronger version in Excel.

What's actually true: successful data products come from distributed ownership with centralized standards. Microsoft's 2026 case study on its unified data strategy reveals that its Data Council model, not a hierarchical team, powers cross-functional analytics at scale. Product squads own their domain's data products. The council sets standards for schema design, access controls, and metric definitions. No single team "owns data." David Ohnstad's work at Veeam demonstrates this in practice: embedding AI-powered query validation across business units so non-technical stakeholders can pull their own reports without waiting for a centralized team to write SQL. The product isn't a dashboard. It's the capability itself, distributed to the people closest to the decisions.

Myth #2: Data Products Are Dashboards

Walk into any enterprise and ask to see their "data products." You'll get a tour of Tableau workspaces, Power BI reports, and Looker dashboards. Beautiful visualizations. Drill-down capabilities. Real-time refreshes. And according to Gartner's 2024 analytics adoption study, 73% of them haven't been opened in the last 30 days by anyone outside the team that built them.

This myth survives because dashboards are visible. They feel like products. You can demo them. Executives can click through them in steering committee meetings. They have users, refresh schedules, and version numbers. But a dashboard is an interface to a data product—it's not the product itself. The actual product is the data model underneath: the cleaned, joined, transformed, and validated dataset that makes the dashboard possible. If the model is wrong, the dashboard is just a colorful lie. If the model is right but nobody knows how to interpret it, the dashboard is decoration.

David Ohnstad encountered this at a mid-market SaaS company three years ago. The leadership team requested an executive KPI dashboard showing customer health scores, renewal risk, and revenue forecasts. The BI team delivered it in six weeks—on time, under budget, visually polished. The VP of Sales opened it twice. The CFO never logged in. Why? Because the product wasn't the dashboard. It was the integrated data pipeline that unified Salesforce, Zendesk, and product usage logs into a single customer health score. The dashboard was just one possible consumer of that pipeline. When the team realized this, they shifted focus: they built an API so the sales team could pull health scores directly into their CRM workflow, where they actually lived. Dashboard usage dropped to zero. API calls grew 40% month-over-month. The data product succeeded when it stopped being a destination.

What's actually true: a data product is any system that delivers structured, decision-ready information to a defined consumer. That consumer might be a human looking at a report. Or it might be another system—a machine learning model, an automated alert, a real-time API feeding a customer-facing feature. The product is the data itself: its schema, its refresh cadence, its validation rules, its lineage documentation. Dashboards are optional. Adoption is not. If nobody is making different decisions because of your data product, you haven't shipped a product—you've published a dataset.

The Distributed Validation Stack: A Framework for Non-Centralized Data Products

If you're building data products without a centralized team, you need a way to ensure quality without creating a bottleneck. David Ohnstad developed this framework at Veeam while scaling data product ownership across product, engineering, and business intelligence teams. It's a four-layer model that separates standards from execution.

Layer 1: Schema Registry. Not a team—a system. Every data product publishes its schema to a central registry: table names, column definitions, data types, update frequency, and the business question it answers. This isn't governance theater. It's a searchable catalog so product squads know what already exists before they build something redundant. At Veeam, this cut duplicate data model creation by 60% in the first year. The registry doesn't approve or reject schemas. It just makes them visible. Duplication happens in the dark. Transparency is the forcing function.
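To make the registry idea concrete, here is a minimal in-memory sketch. It is not the system used at Veeam. The class names (`SchemaRegistry`, `SchemaEntry`, `Column`), the `owner` field, and the sample `customer_health` table are all hypothetical, chosen only to mirror the fields listed above. The registry publishes and searches; it never approves or rejects.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    name: str
    dtype: str
    description: str


@dataclass
class SchemaEntry:
    # Hypothetical fields mirroring the list in the text: table name, columns,
    # update frequency, and the business question the product answers.
    table: str
    owner: str
    update_frequency: str
    business_question: str
    columns: list[Column] = field(default_factory=list)


class SchemaRegistry:
    """Publishes and searches schemas; it has no approve/reject step."""

    def __init__(self) -> None:
        self._entries: dict[str, SchemaEntry] = {}

    def publish(self, entry: SchemaEntry) -> None:
        # Re-publishing the same table name overwrites the earlier entry.
        self._entries[entry.table] = entry

    def search(self, term: str) -> list[SchemaEntry]:
        # Case-insensitive substring match over table name, business
        # question, and column names/descriptions.
        needle = term.lower()
        hits = []
        for entry in self._entries.values():
            parts = [entry.table, entry.business_question]
            parts += [f"{c.name} {c.description}" for c in entry.columns]
            if needle in " ".join(parts).lower():
                hits.append(entry)
        return hits


registry = SchemaRegistry()
registry.publish(
    SchemaEntry(
        table="customer_health",  # illustrative table, not from the article
        owner="cs-squad",
        update_frequency="hourly",
        business_question="Which customers are at risk of not renewing?",
        columns=[Column("health_score", "float", "Composite score from 0 to 100")],
    )
)
matches = [e.table for e in registry.search("renew")]
```

A squad about to model renewal risk would call `registry.search("renew")` first and find that `customer_health` already exists. A real registry would need persistence and versioning, which this sketch leaves out.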

Layer 2: Automated Validation. Every data product runs a validation suite on every refresh: row count checks, null rate thresholds, referential integrity tests, and deviation alerts when distributions shift unexpectedly. David Ohnstad uses Claude to generate these validation scripts from schema definitions—90% of the validation logic can be auto-generated if your schema documentation is precise. The key insight: validation can't be a manual gate controlled by a centralized QA team. It has to be automated, embedded in the pipeline itself, and owned by whoever owns the data product. If your deployment process doesn't include validation as a prerequisite, your quality control is a hope, not a system.
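The four checks named above can be sketched in plain Python. This is an illustration under stated assumptions, not the generated scripts the article refers to: the function names, the thresholds, the sample rows, and the baseline statistics are all hypothetical, and the distribution check is a deliberately crude comparison of the current mean against a stored baseline.

```python
from statistics import mean


def check_row_count(rows, min_rows):
    # Fails when a refresh silently loads too few rows.
    return len(rows) >= min_rows


def check_null_rate(rows, column, max_null_rate):
    if not rows:
        return False
    nulls = sum(1 for r in rows if r.get(column) is None)
    return nulls / len(rows) <= max_null_rate


def check_referential_integrity(rows, column, valid_keys):
    # Every foreign key must exist in the referenced key set.
    return all(r.get(column) in valid_keys for r in rows)


def check_mean_shift(rows, column, baseline_mean, baseline_std, max_z=3.0):
    # Crude drift check: how many baseline standard deviations has the mean moved?
    values = [r[column] for r in rows if r.get(column) is not None]
    if not values or baseline_std == 0:
        return False
    return abs(mean(values) - baseline_mean) / baseline_std <= max_z


def validate(rows, customer_ids):
    # Thresholds below are placeholders; each product owner would set their own.
    return {
        "row_count": check_row_count(rows, min_rows=3),
        "null_rate": check_null_rate(rows, "health_score", max_null_rate=0.25),
        "referential_integrity": check_referential_integrity(rows, "customer_id", customer_ids),
        "mean_shift": check_mean_shift(rows, "health_score", baseline_mean=70.0, baseline_std=10.0),
    }


sample_rows = [
    {"customer_id": "c1", "health_score": 72.0},
    {"customer_id": "c2", "health_score": 65.0},
    {"customer_id": "c3", "health_score": None},
    {"customer_id": "c9", "health_score": 80.0},  # c9 is not a known customer
]
results = validate(sample_rows, customer_ids={"c1", "c2", "c3"})
failed = [name for name, ok in results.items() if not ok]
```

In a pipeline, a non-empty `failed` list would block the refresh from being published, which is what makes validation a prerequisite instead of a hope.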

Layer 3: Metric Definitions Council. This is the only governance layer that involves humans, and it's intentionally lightweight. A rotating group of 5-7 people from across the organization meets every two weeks to review proposed metric definitions before they go live. Not to approve them—to ensure they don't conflict with existing definitions. "Active user" can't mean one thing in product analytics and another thing in marketing attribution. The council doesn't own metrics. It referees naming conflicts and documents the decisions. Microsoft's 2026 data council case study shows that this model works at scale: their council has 14 rotating members across business units, meets weekly, and has resolved 200+ metric conflicts in 18 months without slowing down a single product launch.
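Part of the council's refereeing can be prepared mechanically before the meeting. The sketch below flags proposed metrics whose name matches an existing metric but whose definition differs. The function names and the sample definitions are hypothetical, and the comparison is a simple text match, so it only surfaces candidates for humans to discuss.

```python
def normalize(name: str) -> str:
    # Treat "active_user", "Active User", and "active-user" as the same name.
    return " ".join(name.lower().replace("_", " ").replace("-", " ").split())


def find_conflicts(existing: dict[str, str], proposed: dict[str, str]) -> list[str]:
    """Return proposed metric names that reuse an existing name with a different definition."""
    by_name = {normalize(n): d.strip().lower() for n, d in existing.items()}
    conflicts = []
    for name, definition in proposed.items():
        current = by_name.get(normalize(name))
        if current is not None and current != definition.strip().lower():
            conflicts.append(name)
    return conflicts


# Illustrative definitions only.
existing_metrics = {
    "active_user": "Logged in at least once in the last 30 days",
}
proposed_metrics = {
    "Active User": "Opened a marketing email in the last 30 days",
    "churned_customer": "No renewal within 30 days of contract end",
}
conflicts = find_conflicts(existing_metrics, proposed_metrics)
```

Here `conflicts` contains only "Active User", which is exactly the kind of agenda item the council exists to resolve.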

Layer 4: Feedback Loops. Every data product must instrument how it's being used—API call counts, query frequencies, dashboard login patterns, error rates. This telemetry feeds back to the team that owns the product. If nobody's using it, that team knows within a week, not a quarter. If a query is timing out, they see it in real time. Feedback loops are not a "nice to have" polish phase. They're part of the product. If your data product doesn't know how it's being used, it's not a product—it's a batch job with a pretty name. David Ohnstad has written extensively on AI-driven product instrumentation and how embedding telemetry at the data layer prevents the "launch and forget" syndrome that kills 60% of enterprise analytics projects within their first year.
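As a rough illustration of what "instrumenting usage" can mean, the sketch below records each access to a data product and produces a seven-day summary. The `UsageLog` class, its method names, and the consumer labels are hypothetical. A production setup would more likely write these events to an existing logging or metrics system.

```python
from datetime import datetime, timedelta


class UsageLog:
    def __init__(self):
        # Each event: (timestamp, product, consumer, succeeded)
        self._events = []

    def record(self, product, consumer, ok=True, at=None):
        self._events.append((at or datetime.now(), product, consumer, ok))

    def weekly_summary(self, product, now):
        cutoff = now - timedelta(days=7)
        recent = [e for e in self._events if e[1] == product and e[0] >= cutoff]
        calls = len(recent)
        errors = sum(1 for e in recent if not e[3])
        return {
            "calls": calls,
            "distinct_consumers": len({e[2] for e in recent}),
            "error_rate": errors / calls if calls else 0.0,
            "unused": calls == 0,  # the "nobody is using it" signal
        }


now = datetime(2025, 1, 15)  # fixed date so the example is reproducible
log = UsageLog()
log.record("customer_health", "sales-crm", ok=True, at=now - timedelta(days=1))
log.record("customer_health", "sales-crm", ok=False, at=now - timedelta(days=2))
log.record("customer_health", "finance", ok=True, at=now - timedelta(days=10))
summary = log.weekly_summary("customer_health", now)
```

The `unused` flag is the part that matters for the "within a week, not a quarter" goal: the owning team can alert on it directly.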

Myth #3: AI Makes Data Product Management Easier

Snowflake's 2025 "Make Your Data AI Ready" campaign and the flood of "AI-powered analytics" product launches have created a dangerous narrative: AI will automate away the hard parts of data product management. Just point a language model at your data warehouse, let it write the SQL, and suddenly everyone in the company can self-serve insights without needing a data team. According to Forrester's 2024 enterprise AI adoption survey, 54% of companies increased their AI tooling budget specifically for data and analytics use cases. And according to that same study, 61% of those implementations failed to reduce time-to-insight for business stakeholders.

The myth persists because the demos are spectacular. A business analyst types "show me Q4 revenue by region" into a natural language interface, and a perfectly formatted chart appears. What the demo doesn't show: the three hours the data engineering team spent beforehand cleaning the revenue table, tagging regional hierarchies, and tuning the AI's semantic layer so it knows "Q4" means fiscal Q4, not calendar Q4. AI doesn't make data product management easier. It makes bad data product management more expensive.
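The fiscal-versus-calendar ambiguity is easy to show in code. The sketch below is a hypothetical helper, not part of any vendor's semantic layer. It assumes the fiscal year is labeled by the calendar year in which it starts, which is only one of several conventions in use.

```python
from datetime import date


def quarter_start(year: int, quarter: int, fiscal_start_month: int = 1) -> date:
    """First day of the given quarter.

    Assumption: the fiscal year is labeled by the calendar year in which it
    starts. fiscal_start_month=1 yields ordinary calendar quarters.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1-4")
    if not 1 <= fiscal_start_month <= 12:
        raise ValueError("fiscal_start_month must be 1-12")
    months_from_january = (fiscal_start_month - 1) + 3 * (quarter - 1)
    return date(year + months_from_january // 12, months_from_january % 12 + 1, 1)


calendar_q4 = quarter_start(2024, 4)                      # calendar quarters
fiscal_q4 = quarter_start(2024, 4, fiscal_start_month=7)  # fiscal year starting in July
```

With a July fiscal start, "Q4 2024" begins in April 2025 instead of October 2024. A natural language interface that has not been told which convention applies will return a confident answer for the wrong six-month-distant period.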

David Ohnstad uses Claude daily—for query validation, schema documentation generation, and cross-functional education on database structures. But he's also watched teams burn $40,000 in three months on AI tooling that produced more confusion than clarity. Why? Because they skipped the foundational work. They pointed AI at a messy warehouse with inconsistent naming conventions, undocumented transformations, and no clear business logic layer. The AI generated queries that were syntactically correct and semantically meaningless. The business stakeholders couldn't tell the difference. They made decisions on wrong numbers delivered fast.

What's actually true: AI amplifies the quality of your data infrastructure. If your schemas are well-documented, your transformations are explicit, and your metric definitions are centralized, AI becomes a force multiplier—it lets non-technical users query data they previously couldn't access. But if your infrastructure is a mess, AI just surfaces that mess faster and with more confidence. The companies succeeding with AI-driven data products are the ones who spent the last two years building the boring stuff: data dictionaries, lineage tracking, automated validation, and clear ownership models. AI didn't replace that work. It made the ROI of doing it correctly 10x higher.

When Distributed Ownership Breaks Down: A Real Scenario

Three years ago, David Ohnstad was embedded with a product squad building a customer usage analytics pipeline. Distributed ownership model. No centralized data team. The squad owned the schema, the transformations, the API, and the downstream reports. It worked brilliantly for nine months—until the VP of Customer Success requested a new field: "days since last meaningful interaction." Simple request. The squad added it to the schema, deployed the change, and moved on.

Eleven days later, the customer success team reported that renewal forecasts were off by 20%. The culprit: "meaningful interaction" wasn't defined. The engineering lead interpreted it as any API call. The customer success team assumed it meant human-initiated sessions only. The data model was technically correct and operationally useless. Worse, because there was no centralized team reviewing changes, the error propagated silently. The automated validation suite checked for nulls and referential integrity—not semantic correctness. The feedback loop caught that the field was being used, but not that it was being misinterpreted.

The fix wasn't to centralize the team. It was to add a lightweight review step to the Distributed Validation Stack: any new field added to a cross-functional data product required a one-paragraph definition written in plain language and reviewed by at least one stakeholder from the consuming team before deployment. Not a formal approval process. Not a governance committee. Just a Slack message and a confirmation. That single change, five minutes of async communication, prevented four similar incidents over the next 18 months. The lesson: distributed ownership works when you build friction into the right places. The mistake isn't decentralization. It's decentralization without guardrails.
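The review step can also be enforced as a small pre-deployment check. The sketch below is one possible shape, not the process the squad actually used: the `FieldProposal` class, the team names, and the ten-word minimum are hypothetical, and the word count is only a stand-in for "a definition exists and is more than a label".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldProposal:
    name: str
    definition: str                         # plain-language paragraph
    owning_team: str
    reviewed_by_team: Optional[str] = None  # consuming team that confirmed it


def ready_to_deploy(p: FieldProposal, min_words: int = 10) -> tuple[bool, str]:
    # min_words is an arbitrary placeholder threshold.
    if len(p.definition.split()) < min_words:
        return False, "definition too short to be unambiguous"
    if p.reviewed_by_team is None:
        return False, "no confirmation from a consuming team"
    if p.reviewed_by_team == p.owning_team:
        return False, "reviewer must come from a consuming team, not the owner"
    return True, "ok"


proposal = FieldProposal(
    name="days_since_last_meaningful_interaction",
    definition=(
        "Days since the customer last started a human-initiated session; "
        "automated API calls are excluded."
    ),
    owning_team="usage-analytics-squad",
)
before_review = ready_to_deploy(proposal)

proposal.reviewed_by_team = "customer-success"
after_review = ready_to_deploy(proposal)
```

The check cannot judge whether the definition is right. It only guarantees that someone from the consuming team has read it, which is the gap the incident exposed.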

Stop Measuring Data Product Success by Adoption Rates

Most teams measure data product success by counting users, logins, or query volumes. It's the wrong proxy. High adoption doesn't mean impact—it often means the product is required by policy, embedded in a workflow people can't bypass, or politically visible enough that leaders feel obligated to check it even if they ignore the insights. According to Reforge's 2024 product strategy research, fewer than 30% of high-adoption enterprise data products could demonstrate a measurable change in decision-making behavior among their users.

The better metric: decision latency reduction. How much faster can someone make a specific decision because your data product exists? If it used to take three days and four emails to figure out which customer segment had the highest churn risk, and now it takes 90 seconds, that's impact. If your product didn't change the time-to-decision, it's not a product—it's a reporting obligation dressed up as innovation. This is uncomfortable to measure because it requires naming the decision upfront. Most data products launch without ever articulating what decision they're supposed to support. That's not an analytics problem. That's a product management failure.

David Ohnstad now includes "target decision + current latency + target latency" in every data product brief before a single table gets modeled. If the team can't name the decision and measure the current state, the project doesn't start. This has killed three product ideas in the last year that would have consumed quarters of engineering time and delivered beautiful dashboards nobody needed. It's also clarified scope on four products that launched, hit adoption targets in the first month, and are still in use 18 months later. The difference wasn't the data quality. It was knowing what decision the data was supposed to change. For more on how this principle extends to other domains, see David Ohnstad's reflections on iterative making and building with a clear end use in mind.
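The brief described above maps naturally onto a small data structure. The sketch below is illustrative: the `DecisionBrief` class and its method names are hypothetical, and the numbers reuse the earlier example of a decision that took three days and now takes 90 seconds.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DecisionBrief:
    target_decision: str
    current_latency: timedelta
    target_latency: timedelta

    def is_startable(self) -> bool:
        # The project does not start unless the decision is named and the
        # current state has been measured.
        return bool(self.target_decision.strip()) and self.current_latency > timedelta(0)

    def planned_reduction(self) -> float:
        """Fraction of the current latency the product aims to remove."""
        return 1 - self.target_latency / self.current_latency


brief = DecisionBrief(
    target_decision="Which customer segment has the highest churn risk?",
    current_latency=timedelta(days=3),
    target_latency=timedelta(seconds=90),
)
reduction = brief.planned_reduction()

unnamed = DecisionBrief("", timedelta(days=3), timedelta(hours=1))
```

Going from three days (259,200 seconds) to 90 seconds removes about 99.97% of the wait. The `unnamed` brief fails `is_startable()` because no decision has been stated, which is the gate that stops a project before any table gets modeled.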

How do you build data products without a centralized data team?

Distribute ownership to the teams closest to the data's business context, but centralize standards through a lightweight council model and automated validation. Use a schema registry for visibility, enforce metric definition consistency, and instrument every product with feedback loops so quality issues surface immediately rather than silently corrupting decisions downstream.

What is the difference between a data product and a dashboard?

A data product is the underlying data model—cleaned, validated, and structured to support a specific decision. A dashboard is one possible interface to that product. The product succeeds when it changes decision-making behavior, whether that's through a visual interface, an API, an automated alert, or integration into another system's workflow.

Why do AI-powered data tools fail for most enterprises?

AI amplifies the quality of your existing data infrastructure. If schemas are poorly documented, transformations are opaque, and metric definitions are inconsistent, AI will generate queries that are syntactically correct but semantically meaningless. Success with AI-driven analytics requires investing in the foundational work—data dictionaries, lineage tracking, validation automation—that makes AI a force multiplier rather than an expensive confusion generator.

Two Takeaways for Practitioners and Leaders

For practitioners: stop waiting for a centralized data team to grant you permission. If you're building a data product, you are the data team. Embed validation. Document your schema. Instrument usage. Join or start a metrics council. The distributed model works when individuals take ownership of quality without requiring a formal governance structure to enforce it. The guardrails you build into your own pipelines today prevent the catastrophic silent errors that kill trust in data products six months from now.

For leaders: stop reorganizing your data teams and start auditing your validation infrastructure. The question isn't whether data ownership should be centralized or distributed—it's whether your current model has automated quality checks, documented standards, and feedback loops that surface problems before they corrupt decisions. A federated council with clear standards outperforms a centralized team with a six-month backlog. Invest in the boring infrastructure that makes distributed ownership viable, and your data products will scale with your organization instead of bottlenecking on a single overloaded team.

When was the last time you asked whether your data product changed a specific decision, or just created another report that confirmed what someone already believed?

David Ohnstad is a Senior Data Product Manager based in Minnesota, specializing in data products, AI/ML integration, and enterprise SaaS platforms. Follow his work at github.com/davidohnstad40-netizen.
