Apache SeaTunnel
Why Your ADS Layer Always Goes Wild and How a Strong DWS Layer Fixes It

In a data warehouse system, the DWS and ADS layers mark the critical boundary between “data modeling” and “data delivery.” The former carries shared aggregation and metric reuse capabilities, determining the stability and efficiency of the data system; the latter is oriented toward specific consumption scenarios, directly impacting business delivery efficiency and user experience.

If the DWS layer is poorly designed, metrics will be repeatedly produced in the ADS layer, ultimately leading to inconsistent definitions and siloed data; if the ADS layer runs out of control, it can even backfire on the shared layer, forming unmanageable data assets. Therefore, a healthy data system must establish a clear boundary and evolution mechanism between “shared foundation” and “flexible delivery.”

As the fourth article in the Data Lakehouse design and practice series, this piece systematically summarizes the core design principles of the DWS/ADS delivery layer, including methods for shared aggregation and subject-wide table modeling, metric definition frameworks, delivery layer strategies, and lifecycle governance practices. It also addresses common issues, helping teams build a highly reusable, governable, and sustainable data delivery system.

Why DWS Must Be “Thick Enough”

In many team data systems, the DWS layer is often underestimated or even weakened, resulting in all requirements being pushed to the ADS layer. In the short term, this seems flexible, but over time it quickly spirals out of control.

The core positioning of DWS is as a shared aggregation and reuse layer. It is not designed to serve a single report, but to provide a unified data foundation for multiple applications to share. If this layer is underdeveloped, every new requirement will trigger recalculation and redefinition of metrics, resulting in a bunch of incompatible results.

In practice, a healthy benchmark is that about 70% of analytical needs can be fulfilled directly by combining existing DWS tables. This means most scenarios do not require creating new tables, but rather combining existing shared capabilities. This “ready-to-use” capability is the core of reuse value.

Conversely, if each department has its own ADS tables and each report has its own metric definitions, typical silo problems emerge: metrics with the same name do not match, computations are duplicated, and data cannot be aligned. Teams spend most of their time reconciling definitions instead of analyzing business.

The value of DWS lies precisely in solving these common issues. By precomputing aggregated results of high-frequency dimension combinations, building subject-wide tables, and unifying metric outputs, DWS moves dispersed computations to the offline layer. As a result, online queries no longer rely on temporary large-scale joins or full table scans, making performance and cost more controllable.

More importantly, it changes team collaboration. Metrics no longer depend on verbal agreements—they exist as data assets: with owners, definitions, lineage, and quality rules. So-called “metric disputes” essentially become “asset governance issues.”

But there is a prerequisite: DWS must be governable. If fields lack explanations, metrics lack definitions, update frequency is unclear, or quality rules are missing, this layer will become a “wide-table collection nobody dares to use,” reducing reuse rates.

Shared Aggregation and Subject-Wide Tables: Balancing Reuse and Performance

DWS design revolves around two types of tables: shared aggregation tables and subject-wide tables.

Shared aggregation tables hinge on clarity. They must clearly define aggregation granularity (e.g., daily, weekly, monthly, or cumulative), dimension combinations (e.g., time, organization, channel, category), and metric calculation scope (e.g., amount, count, or frequency). Without clear boundaries, downstream reuse becomes unreliable.
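To make the idea of a declared grain concrete, here is a minimal Python sketch of building a shared aggregate at an explicit (day, channel) grain. All table, field, and channel names are illustrative, not from the article:

```python
from collections import defaultdict
from datetime import date

# The grain is declared explicitly: one output row per (day, channel).
GRAIN = ("day", "channel")

def build_daily_channel_agg(orders):
    """Aggregate raw order facts to the (day, channel) grain.

    Each order is a dict like {"day": date(...), "channel": "app", "amount": 12.5}.
    Returns one row per grain key, with amount_sum and order_cnt metrics.
    """
    acc = defaultdict(lambda: {"amount_sum": 0.0, "order_cnt": 0})
    for o in orders:
        key = (o["day"], o["channel"])
        acc[key]["amount_sum"] += o["amount"]
        acc[key]["order_cnt"] += 1
    return {key: dict(vals) for key, vals in acc.items()}

orders = [
    {"day": date(2024, 1, 1), "channel": "app", "amount": 10.0},
    {"day": date(2024, 1, 1), "channel": "app", "amount": 5.0},
    {"day": date(2024, 1, 1), "channel": "web", "amount": 8.0},
]
agg = build_daily_channel_agg(orders)
```

Because the grain and metric scope are stated up front, a downstream consumer knows exactly what one row means without reverse-engineering the pipeline.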

Subject-wide tables emphasize usability. They usually focus on a business domain, e.g., users, transactions, or products, flattening frequently joined dimensions in advance to reduce query complexity. Importantly, wide tables are a result-oriented form for analytics—they are not a replacement for fact tables and must be traceable back to underlying models.

A common practical problem is wide tables continually growing. To mitigate this, fields can be governed based on usage frequency: retain high-frequency fields in the main wide table, split or join low-frequency fields on demand, and regularly slim tables according to usage.
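A usage-frequency split can be as simple as a threshold over query-log counts. The following hypothetical sketch (field names and the threshold are assumptions for illustration) partitions wide-table fields into a kept set and a split-out set:

```python
def split_fields_by_usage(usage_counts, threshold=100):
    """Partition wide-table fields by query frequency.

    usage_counts maps field name -> number of queries touching it
    (e.g., derived from query-log analysis). Fields at or above the
    threshold stay in the main wide table; the rest are candidates
    to split out and join on demand.
    """
    keep = {f for f, n in usage_counts.items() if n >= threshold}
    split_out = set(usage_counts) - keep
    return keep, split_out

usage = {"user_id": 5000, "gmv_1d": 900, "niche_tag": 12, "legacy_score": 3}
keep, split_out = split_fields_by_usage(usage)
```

Running this sweep periodically is what “regularly slimming tables according to usage” looks like in practice.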

Another common pitfall is mixing different aggregation levels in the same table, e.g., daily and monthly data together. This greatly increases misuse risk and complicates maintenance. A better approach is to split tables by level or at least enforce strict naming conventions.

All these designs assume consistent dimensions exist. Core dimensions such as user, organization, channel, and time must have unified codes and definitions, otherwise cross-table reuse fails.

From a performance perspective, DWS’s core strategy is always pre-aggregation first. Reduce data scan scale via offline computation before applying indexing, partitioning, or materialized views. Otherwise, all optimizations become remedial measures.

Metric Framework: Layered Design from Atomic to Composite

If DWS solves data reuse, then the metric framework ensures definition consistency.

A governable metric system typically has three levels: atomic metrics, derived metrics, and composite metrics.

Atomic metrics are the fundamental units. They must clearly define the target, scope, filters, and time granularity. For example, “successful payment amount” must clearly count only successful payments and use the payment completion time.

Derived metrics are calculated from atomic metrics. For example, average order value = “successful payment amount / number of successful orders.” The key point is that derived metrics must inherit the definitions of their atomic metrics; otherwise, discrepancies creep in.
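The inheritance requirement can be made structural: express derived metrics only in terms of registered atomic metrics, never as fresh filters. A minimal sketch, with illustrative metric and field names:

```python
# Atomic metrics are registered once, with their scope (only successful
# payments) baked into the definition.
ATOMIC = {
    "paid_amount": lambda rows: sum(r["amount"] for r in rows if r["status"] == "success"),
    "paid_orders": lambda rows: sum(1 for r in rows if r["status"] == "success"),
}

def derived_avg_order_value(rows):
    """Average order value = paid_amount / paid_orders.

    Reuses the atomic definitions, so any change to the "successful
    payment" scope propagates automatically.
    """
    orders = ATOMIC["paid_orders"](rows)
    return ATOMIC["paid_amount"](rows) / orders if orders else 0.0

rows = [
    {"amount": 30.0, "status": "success"},
    {"amount": 10.0, "status": "success"},
    {"amount": 99.0, "status": "failed"},
]
aov = derived_avg_order_value(rows)
```

If the derived metric re-filtered the rows itself, its scope could silently diverge from the atomic one, which is exactly the bias the article warns about.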

Composite metrics span multiple processes or business domains, e.g., conversion rate, retention, or repeat purchase. These rely heavily on a consistent dimension system and event definitions, making them the most prone to ambiguity.

To avoid confusion, every metric must have four elements: business definition, calculation formula, scope, and time granularity. This is not just documentation—it is the basis for traceability and auditability.

Metrics must also support version control. Changes to definitions cannot overwrite historical results directly; versions or effective dates should be used to prevent “historical data being rewritten.”
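Effective-dated definitions are one straightforward way to implement this. The entries below are illustrative, not real metric history:

```python
from datetime import date

# Each definition carries an effective date; new versions are appended,
# never overwritten, so historical results stay tied to the definition
# that was in force when they were produced.
VERSIONS = [
    (date(2023, 1, 1), "paid_amount / paid_orders"),
    (date(2024, 6, 1), "paid_amount / paid_orders, refunds excluded"),
]

def definition_as_of(day):
    """Return the metric definition in force on a given day, or None."""
    current = None
    for effective_from, formula in sorted(VERSIONS):
        if effective_from <= day:
            current = formula
    return current
```

With this shape, recomputing January 2024 always resolves to the old formula, so changing the definition in June cannot silently rewrite history.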

In terms of layering, atomic metrics should reside in DWS (or be traceable back to DWD), while ADS handles only lightweight combination and presentation. If ADS takes on definition duties, it quickly becomes a new “metric generation layer.”

ADS and Data Marts: Delivery for Consumption

If DWS is about accumulation, ADS is about delivery.

ADS (or DM, data marts) aims to provide data products for specific consumption scenarios, e.g., BI reports, API services, or analytical datasets. Structures here emphasize usability, not generality.

Delivery tables should follow a “one table, one scenario” principle. Field names can be closer to business semantics, and additional display, sort, or status fields can be added to improve user experience.

But one bottom line must be enforced: the delivery layer should not invent metrics. All core metrics must come from DWS or the metric system; ADS only handles combination, formatting, and lightweight calculation. Violating this quickly leads back to “one metric per report.”

Update frequency must respect business SLA. Daily, hourly, or minute-level updates directly affect compute chains and resource costs. The higher the frequency, the more careful you must be with field scale and calculation complexity.

Governance of data marts is also crucial. They can be department- or scenario-specific, but must be built on a unified dimension and metric framework. Views or semantic layers may meet variation needs, but duplicating underlying logic is not allowed.

From “Fast Delivery” to “Sustainable Evolution”

Early on, many teams go through a phase of stacking tables in ADS for fast delivery. This feels responsive at first, but over time problems emerge: the delivery layer balloons, the shared layer hollows out, and maintenance costs soar.

A healthier model: gradually thicken the shared layer (DWS), keep the delivery layer light, and continuously recover general capabilities back to DWS.

This also implies delivery tables must support lifecycle management. Track usage frequency, retire low-value tables, or recycle general fields and metrics back to the shared layer to avoid duplication.
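A lifecycle sweep can start as a simple idle-time check against query logs. The table names, dates, and the 90-day threshold below are hypothetical:

```python
from datetime import date, timedelta

def retirement_candidates(last_access, today, max_idle_days=90):
    """Flag delivery tables with no queries in the last max_idle_days.

    last_access maps table name -> date of the most recent query
    (e.g., extracted from query-engine audit logs).
    """
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(t for t, d in last_access.items() if d < cutoff)

last_access = {
    "ads_sales_daily": date(2024, 5, 30),
    "ads_legacy_report": date(2023, 11, 2),
}
stale = retirement_candidates(last_access, today=date(2024, 6, 1))
```

Candidates flagged this way are reviewed, not auto-dropped: some get retired, while tables whose fields turn out to be broadly useful are recycled back into DWS.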

Ultimately, a mature data system is not “built fast,” but “used long.” Layered DWS and ADS design underpins this long-term evolution.
