Gowtham Potureddi

Posted on Jun 7

SQL Server Analysis Services (SSAS): Tabular vs Multidimensional for Data Engineers

#python #sql #interview #dataengineering

sql server analysis services is the Microsoft engine that sits between the warehouse and the dashboard — the semantic layer where shared definitions, fast aggregation, and governed metrics live. It is also one of the most-misunderstood products in the BI stack because it ships two engines in one box: the legacy multidimensional OLAP world of MDX and pre-aggregated cubes, and the modern in-memory tabular world of DAX and Vertipaq columnstore. Pick the wrong one and you will spend the next two years either fighting cube-processing windows or hand-rolling time-intelligence that the other engine gives you for free.

This guide is the decision matrix you wished existed before your first SSAS interview or the day you had to defend a "should we migrate to Power BI" review. It walks through why a semantic layer still matters in 2026, the seven-row Tabular vs Multidimensional comparison, the Vertipaq engine that powers every modern semantic model from SSAS tabular to Power BI datasets to Fabric, the DAX vs MDX language pair through the same YTD KPI, and the deployment topology that ties an SQL Server source through an XMLA endpoint to every consumer surface. Each section pairs a teaching block with a Solution-Tail interview answer — code, a step-by-step trace, an output table, and a concept-by-concept breakdown of why it works.

When you want hands-on reps immediately after reading, drill the dimensional modelling practice library →, rehearse on aggregation problems →, and stack the JOIN muscles with JOIN practice drills →.

On this page

Why a semantic layer still matters in 2026
Tabular vs Multidimensional — the decision matrix
The Vertipaq engine — why tabular is so fast
DAX vs MDX — same KPI, two languages
Deploying SSAS — instance topology, refresh, security
Cheat sheet — SSAS modelling recipes
Frequently asked questions
Practice on PipeCode

1. Why a semantic layer still matters in 2026

The three jobs of a semantic model — shared definitions, fast aggregation, governed metrics

The one-sentence invariant: a semantic layer is the contract between "what the warehouse stores" and "what the dashboard shows" — and sql server analysis services is the longest-running production-grade implementation of that contract in the Microsoft stack. Without it, every Power BI report, every Excel pivot, and every Tableau extract rebuilds the same metric definitions from scratch, and "monthly active users" silently means three different things by the end of Q3.

What a semantic model actually delivers.

Shared definitions. Every consumer reads the same Sales Amount, Active Users, Gross Margin definition. The metric is named once, computed once, and surfaced everywhere — instead of being re-derived per report.
Fast aggregation. The model pre-shapes the data — pre-joined dimensions, pre-aggregated facts (multidimensional) or pre-compressed in-memory columns (tabular) — so queries that would scan billions of source rows return in sub-second.
Governed metrics. Row-level security, object-level security, and tested calculation logic ship as part of the model, so a "Sales by Region" pivot in Excel inherits the same filters that the official dashboard already applies.

Where SSAS sits in the modern Microsoft stack.

The flow has been stable for fifteen years: a relational source (SQL Server, Synapse, Fabric warehouse, OneLake) feeds an SSAS instance (tabular or multidimensional), which exposes an XMLA endpoint, which is consumed by Power BI service, Power BI Desktop, Excel pivot tables, paginated reports, and increasingly Fabric semantic models. The same shape repeats with different names: Azure Analysis Services rents the instance as a PaaS box; Power BI Premium / Fabric capacity hosts a tabular dataset as the same engine; on-premises SSAS keeps running in environments that require local data.

The "one number, one definition" problem.

A finance team and a marketing team agree on the metric "Gross Margin." Finance writes (revenue - cogs) / revenue against the GL. Marketing writes the same expression against the campaign attribution table. Three months later the boardroom slide shows two materially different margins for "the same quarter" and the CFO loses an hour to reconciliation. The semantic-model fix is to publish the metric — Gross Margin := ([Revenue] - [COGS]) / [Revenue] — exactly once inside SSAS, expose it to every consumer through the XMLA endpoint, and never define it again. That is the single biggest reason SSAS tabular survived the rise of self-service BI.

Tabular vs Multidimensional — one sentence each.

SSAS Tabular. A relational, in-memory, columnar model. Query language: DAX. Storage: Vertipaq columnstore. Modelling style: star schema with relationships. Best for: most greenfield 2026 projects, Power BI / Fabric integration, sub-second aggregation on wide tables.
SSAS Multidimensional. A cube-based, pre-aggregated OLAP model. Query language: MDX. Storage: MOLAP / ROLAP / HOLAP cubes. Modelling style: dimensions + measure groups + hierarchies. Best for: deep hierarchies, writeback, cube actions, Excel pivot heavy environments, environments that already own multidimensional code.

Why Power BI Premium datasets are essentially SSAS Tabular under the hood.

Open the XMLA endpoint of a Power BI Premium dataset or a Fabric semantic model and you are connecting to the same Vertipaq engine that ships inside SSAS Tabular. The dataset is a tabular model; the metadata is the same Tabular Model Scripting Language (TMSL) / Tabular Model Definition Language (TMDL); the query language is DAX. Microsoft chose to keep the engine and put new product surfaces — Power BI service, Power BI Desktop, Fabric — in front of it. Practically, this means an investment in tabular skills carries forward into every Microsoft BI product, while an investment in multidimensional skills increasingly does not.

Who still chooses SSAS over Power BI Premium or Fabric semantic models.

On-prem mandates. Defence, healthcare, regulated finance — environments that cannot ship data to a Microsoft-hosted endpoint.
Existing investment. A factory that already runs five multidimensional cubes built in 2014; rewriting them in DAX is a three-year project with no immediate business win.
Detached lifecycle. Teams that need to version SSAS models in Git, deploy through Azure DevOps to private capacity, and use Tabular Editor + ALM Toolkit outside the Power BI portal.
Cost-control on huge models. A 500GB tabular model is cheaper to host on a dedicated SSAS box than on Power BI Premium capacity with the equivalent RAM SKU.

What interviewers listen for.

Do you say "tabular is the default for greenfield in 2026" when asked to pick? — required answer.
Do you mention Vertipaq and DAX in the same breath as tabular? — required answer.
Do you note that Power BI datasets and Fabric semantic models are tabular under the hood? — senior signal.
Do you reach for multidimensional only for writeback, deep ragged hierarchies, or legacy cubes? — senior signal.

Worked example — choosing the model type for a new BI project

Detailed explanation. A new product analytics team needs a semantic layer over a SQL Server fact table of 80M events / month with 12 dimensions. They will surface it through Power BI service and Excel pivots; the metric catalogue is roughly 40 measures with standard time-intelligence (YTD, MTD, WoW). There is no writeback requirement and no ragged organisational hierarchy. The interviewer asks which SSAS mode you would pick and why.

Question. Walk through the decision: SSAS Tabular vs SSAS Multidimensional vs Power BI Premium dataset vs Fabric semantic model. State which one you pick for this scenario and the two probing questions you would ask before committing.

Input — requirement matrix.

Requirement	Value
Fact rows	80M / month
Dimensions	12
Surface	Power BI service + Excel pivots
Writeback	not required
Ragged hierarchy	not required
Tenancy	single tenant, internal
Refresh window	nightly + 15-min incremental

Code.

Decision flow

1. Writeback or ragged hierarchy required?      -> No  -> not Multidimensional
2. On-prem data residency mandate?              -> No  -> SSAS box not required
3. Power BI / Fabric tenant available?          -> Yes -> Power BI Premium or Fabric
4. Model size projected < 25GB compressed?      -> Yes -> a P1 SKU is sufficient
5. Team experience: DAX > MDX?                  -> Yes -> Tabular
=> Default = Power BI Premium dataset (tabular engine + Vertipaq).
   Fall back to SSAS Tabular only if on-prem / detached lifecycle is required.

Step-by-step explanation.

Writeback and ragged hierarchies are the two flags that would tip a project toward Multidimensional. Both are absent, so Multidimensional is out.
On-prem residency is the flag that would force a dedicated SSAS box. Absent, so a hosted PaaS option is available.
Power BI Premium and Fabric both expose a tabular dataset over the same Vertipaq engine; either is a valid host.
Sizing the model under the available memory SKU is the next gate. At 80M rows / month with 12 dimensions, a typical compression of 10x gets the model into the single-GB range — fits a P1 SKU.
The team's DAX vs MDX preference is the final tie-breaker. DAX-skilled teams pick tabular; MDX-skilled teams that want to stay on multidim still need a reason.

Output.

Option	Verdict	Reason
SSAS Multidimensional	No	No writeback, no deep hierarchies
SSAS Tabular on-prem	Maybe	Only if Premium is unavailable
Power BI Premium dataset	Yes	Same Vertipaq engine + native surface
Fabric semantic model	Yes	Next-gen surface, same engine

Rule of thumb. When a 2026 project does not need writeback and the data can be hosted on a Microsoft tenant, you are picking between three flavours of the same tabular engine — SSAS Tabular, Power BI Premium dataset, Fabric semantic model. Multidimensional only appears on the shortlist when writeback or ragged hierarchy is explicit.

Worked example — the "one number, one definition" failure

Detailed explanation. A common interview probe asks the candidate to describe a real-world metric drift and the semantic-model fix. The shortest credible answer pairs a concrete metric (Gross Margin, Active Users, Order Conversion) with a description of where the duplicate definitions lived, how they drifted, and how the SSAS model collapsed them into a single published measure.

Question. A finance dashboard and a marketing dashboard show different "Gross Margin" numbers for the same quarter. How does an SSAS tabular model fix this without rewriting either dashboard?

Input — the two competing definitions.

Surface	Expression	Source
Finance dashboard	(gl.revenue - gl.cogs) / gl.revenue	General-ledger fact
Marketing dashboard	(att.revenue - att.product_cost) / att.revenue	Attribution fact

Code.

-- Published in the SSAS Tabular model exactly once,
-- consumed by every dashboard via XMLA / Power BI service.

Gross Margin :=
DIVIDE(
    [Revenue] - [COGS],
    [Revenue]
)

Revenue := SUM(fact_sales[Revenue])
COGS    := SUM(fact_sales[COGS])

Step-by-step explanation.

The tabular model defines Revenue and COGS over a single fact table (fact_sales) that has been ETL-ed from both the GL and the attribution feed with the same grain.
The published Gross Margin measure wraps the division with DIVIDE — DAX's safe-division primitive that returns blank on divide-by-zero instead of erroring.
Every consumer (finance dashboard, marketing dashboard, Excel pivot) connects to the same dataset and uses the published [Gross Margin] measure. There is now exactly one definition.
If finance and marketing genuinely need different margin metrics — e.g. one that includes channel attribution and one that does not — the model exposes both as named measures (Gross Margin (GL), Gross Margin (Attributed)) so the difference is auditable rather than accidental.

Output.

Quarter	Old Finance	Old Marketing	New Gross Margin
2026 Q1	42.1%	39.8%	41.5%
2026 Q2	43.0%	40.2%	42.1%

Rule of thumb. A semantic model is worth its hosting bill the first time it catches a definition drift in code review instead of in the boardroom. The published measure becomes the single source of truth — every dashboard consumes, no dashboard redefines.

SQL interview question on semantic-layer purpose

A senior interviewer often opens with: "Why would you still pick SSAS in 2026 when Power BI Premium and Fabric exist?" It blends product knowledge, architecture taste, and an honest read of where the engine investment goes — exactly the answer that separates a junior BI developer from a data engineer who can defend a build-vs-buy decision.

Solution Using a three-question decision frame

Decision frame — pick the SSAS surface in 90 seconds.

Q1) Do you need on-prem data residency or a detached lifecycle?
    YES -> SSAS Tabular on-prem (or AAS if cloud OK but PBI off-limits).
    NO  -> continue.

Q2) Do you need writeback, ragged hierarchies, or cube actions?
    YES -> SSAS Multidimensional (last-resort; expect MDX maintenance).
    NO  -> continue.

Q3) Will the model live inside a Power BI / Fabric tenant?
    YES -> Power BI Premium dataset OR Fabric semantic model
           (same Vertipaq tabular engine, modern surface, ALM via TMDL).
    NO  -> SSAS Tabular on AAS or on-prem.

Step-by-step trace.

Scenario	Q1	Q2	Q3	Verdict
Defence on-prem reporting	yes	no	n/a	SSAS Tabular on-prem
Manufacturing budget writeback	no	yes	n/a	SSAS Multidimensional
SaaS product analytics	no	no	yes	Power BI Premium / Fabric
Legacy cube migration	no	no (now)	yes	Plan migration → Tabular

The trace shows the same three-question funnel reduces every greenfield project to one of four outcomes. Senior reviewers expect the candidate to land on Power BI / Fabric as the 2026 default and to defend any deviation explicitly.

Output:

Verdict	When to pick	Engine
SSAS Tabular on-prem	data residency / detached lifecycle	Vertipaq
SSAS Multidimensional	writeback / ragged / legacy	MOLAP
Power BI Premium dataset	greenfield + tenant available	Vertipaq
Fabric semantic model	greenfield + Fabric capacity	Vertipaq

Why this works — concept by concept:

Three flags, four outcomes — residency, writeback, and tenant availability are the only three questions that move the verdict. Everything else (team skill, refresh cadence, RLS) is downstream of those three.
Vertipaq carries forward — three of the four outcomes use the same in-memory tabular engine, which means any DAX investment compounds across surfaces. Multidimensional is the only outlier.
Multidimensional is the exception, not the default — it requires explicit justification (writeback / ragged hierarchy) because the engine receives no new investment.
The "is it tenant-hosted" gate — chooses where the model runs, not what engine it uses. The engine is constant; the hosting choice tunes price, RAM SKU, and ALM workflow.
Cost — O(1) per decision; the entire frame is three boolean checks. The real cost is the migration project that follows the verdict, not the verdict itself.

SQL
Topic — dimensional modeling
Dimensional modelling problems (SQL)

Practice →

SQL
Topic — aggregation
Aggregation problems (SQL)

Practice →

2. Tabular vs Multidimensional — the decision matrix

Seven rows that decide every SSAS project — and a default recommendation for 2026

The mental model in one line: Tabular is a relational, in-memory, columnar model spoken in DAX; Multidimensional is a pre-aggregated cube spoken in MDX. Every other difference — storage, complexity, Microsoft's investment trajectory, even Excel surface behaviour — derives from those two foundational choices.

The seven-row decision matrix.

Row	SSAS Tabular	SSAS Multidimensional
Storage	Vertipaq columnstore (in-memory)	MOLAP / ROLAP / HOLAP cubes
Query language	DAX (also accepts MDX since SSAS 2017)	MDX (DAX queries since 2017)
Modelling style	Star schema + relationships	Dimensions + measure groups + hierarchies
Performance profile	In-memory column scan (sub-second on aggregates)	Pre-aggregated cubes (best for deep hierarchies)
Complexity	Lower — Power BI / DAX-friendly	Higher — MDX + cube design + processing
Use case fit	SaaS analytics, Power BI, Fabric, large tabular models	Writeback, cube actions, ragged hierarchies, legacy BI
Microsoft direction	Active investment (Power BI, Fabric, AAS)	Maintenance mode since 2019

Storage in depth.

Tabular = Vertipaq columnstore. Each column is stored separately, dictionary-encoded, run-length compressed, bit-packed, and kept in RAM. Reads only scan the columns the query touches.
Multidimensional = MOLAP / ROLAP / HOLAP. MOLAP stores everything (data + aggregations) pre-built on disk in a proprietary cube format; ROLAP keeps data in the relational source and only stores aggregations; HOLAP splits the two. MOLAP is by far the most common.

Query language in depth.

DAX. "Data Analysis Expressions" — a functional language designed for tabular models. Mental model: filter context + row context, manipulated by CALCULATE. Same language as Power BI measures.
MDX. "Multidimensional Expressions" — a multi-axis cube language. Mental model: tuples, sets, and a current sub-cube manipulated by SCOPE and named sets. Powerful but steep.
Cross-compatibility. Since SSAS 2017, both engines accept both languages — Tabular accepts MDX (via a translator) and Multidimensional accepts DAX (via DAXMD). In practice every team writes the native language for the engine and treats the other as a read-only fallback.

Modelling style in depth.

Tabular. A familiar star schema: one fact table per grain, dimension tables joined by surrogate keys, calculated columns and measures defined in DAX. The model imports tables and infers relationships much like Power BI Desktop.
Multidimensional. Explicit dimensions (with attributes, hierarchies, member properties), measure groups (over fact tables), perspectives, KPIs, calculated members, MDX SCOPE assignments. The data source view (DSV) sits between the cube and the source tables.

Performance profile in depth.

Tabular — sub-second on aggregates across wide fact tables because the storage engine scans only the few columns the DAX expression references. The Formula Engine is single-threaded; the Storage Engine is multi-core.
Multidimensional — sub-second on aggregates that have been pre-aggregated during cube processing. If a query hits an aggregation it is fast; if it misses, performance falls back to a leaf-level scan that can be much slower than tabular.

Complexity in depth.

Tabular — the lowest barrier to entry for a Power BI developer. The whole modelling experience translates directly to Power BI Desktop and Tabular Editor.
Multidimensional — a separate skillset (cube design, MDX, processing strategies, aggregation design wizard) with a much longer ramp.

Default recommendation for 2026.

Greenfield SSAS → Tabular (Vertipaq) + Power BI / Fabric surface.
Keep Multidimensional only when writeback, complex hierarchies, cube actions, or already-deployed cubes make migration uneconomic.
Power BI datasets and Fabric semantic models are tabular under the hood — the same engine, surfaced as a different product.

Common interview probes on the decision matrix.

"Is Multidimensional deprecated?" — Not formally, but it has been in maintenance mode since 2019. New investment goes into Tabular.
"Why does Tabular feel faster for ad-hoc queries?" — Vertipaq compresses and scans only the queried columns; Multidimensional is fastest when the query hits a pre-aggregated path.
"When is MOLAP still the right choice?" — Deep ragged hierarchies (org charts, geography), writeback, and Excel cube formulas. Otherwise Tabular wins on TCO.
"Can the same SSAS instance host both?" — No. The service is configured to one mode (tabular or multidimensional) at install time. Run two separate instances if you need both.

Worked example — picking the storage mode for a 200M-row fact table

Detailed explanation. A team migrates from a multidimensional cube to a tabular model. The fact table has 200M rows and 18 columns, of which 6 are wide strings (UUIDs, free-text descriptions). They need sub-second response on aggregates by date and product.

Question. What storage tweaks should the team apply on the way into the tabular model to keep RAM under 20GB and queries under one second?

Input — column profile.

Column	Cardinality	Type	Action
order_id	200M	UUID string	drop or hash
event_uuid	200M	UUID string	drop
created_at	200M (sec)	timestamp	truncate to date
product_id	8k	int FK	keep
store_id	1.2k	int FK	keep
amount	200M (num)	decimal(18,4)	reduce precision

Code.

-- The source view that feeds the tabular model — done in SQL before import.
CREATE OR REPLACE VIEW v_fact_sales AS
SELECT
    -- drop the row-unique UUIDs (kill compression, no analytic value)
    -- order_id -- omitted from view
    -- event_uuid -- omitted from view

    -- truncate timestamps to date grain (lowers cardinality from 200M to ~1095)
    CAST(created_at AS DATE)               AS order_date,

    -- keep low-cardinality foreign keys as-is
    product_id,
    store_id,

    -- reduce numeric precision where possible
    CAST(amount AS DECIMAL(10, 2))         AS amount
FROM fact_sales_raw
WHERE created_at >= CURRENT_DATE - INTERVAL '3 years';

Step-by-step explanation.

The two UUID columns have cardinality equal to row count — dictionary encoding produces a dictionary as big as the data, defeating compression. They are dropped (or, if needed, hashed to a low-cardinality bucket).
created_at is truncated to a date grain in the source view. Cardinality drops from 200M (second-precision) to roughly 1095 (3 years × 365 days), so dictionary + RLE compress it down to a fraction of the original column.
The integer foreign keys (product_id, store_id) are already low cardinality — they compress well as-is.
amount is downcast from DECIMAL(18, 4) to DECIMAL(10, 2). Vertipaq packs lower-precision numerics into fewer bits per value.
The 3-year WHERE cap caps history at the business requirement instead of importing the entire table.

Output — RAM estimate before and after.

Stage	Approx RAM
Naive import (all columns, sec precision, decimal(18,4))	~80 GB
Drop UUIDs	~28 GB
Truncate to date	~16 GB
Reduce decimal precision	~14 GB
Cap to 3 years	~12 GB

Rule of thumb. Before importing any large fact into a tabular model, audit cardinality and precision column by column. High-cardinality string columns are the silent RAM killers; truncating timestamps to date grain is the single biggest compression win available.

Worked example — when a multidimensional cube is still the right answer

Detailed explanation. A manufacturer runs a budgeting cube where finance users write numbers back into the cube (writeback partitions). The org chart has ragged depth — some divisions have 4 levels, others have 7. Migrating to tabular would require simulating writeback through staging tables and faking ragged hierarchies via parent-child DAX patterns.

Question. Would you migrate this cube to Tabular in 2026? Defend your answer.

Input — cube footprint.

Feature	Used	Notes
Writeback partitions	yes	Quarterly budget input
Ragged hierarchies	yes	Org chart
Excel cube formulas	yes	Finance lives in Excel
MDX SCOPE assignments	yes	40 named calculations
Source data drift	no	Stable ETL since 2017

Code.

Migration scoring — Multidimensional -> Tabular

Cost driver           | Weight | Score
----------------------|--------|------
Writeback rewrite     | high   | -3   (no native tabular equivalent)
Ragged hierarchy fix  | high   | -2   (parent-child DAX is brittle)
40 MDX SCOPE -> DAX   | medium | -2   (manual rewrite)
Excel cube formulas   | medium | -1   (some break on tabular)
Source ETL stability  | bonus  | +0   (no urgency)
----------------------|--------|------
Net                   |        | -8

Step-by-step explanation.

Writeback is the single biggest blocker. Tabular has no native writeback; replicating it requires a staging-table pattern with a refresh, which changes the user experience.
Ragged hierarchies can be modelled in DAX with parent-child patterns, but they are slower, harder to maintain, and confusing for new developers.
The 40 MDX SCOPE assignments would each need a manual DAX rewrite — high risk of behavioural drift.
The source ETL is stable, so there is no urgency forcing the migration.
The net migration score is negative — the right call is to keep the multidimensional cube, plan a longer-horizon rewrite if writeback is ever retired.

Output.

Verdict	Reason
Keep Multidimensional	Writeback + ragged + stable source
Plan Tabular migration	Only if writeback requirement is dropped

Rule of thumb. Migrate from Multidimensional to Tabular only when all three of the cube's signature features (writeback, ragged hierarchies, MDX SCOPE) are absent or trivially replaceable. Otherwise the cube will outlive the consultants who wrote it.

SQL interview question on choosing between Tabular and Multidimensional

A senior interviewer might frame it as: "Your team inherits an SSAS Multidimensional cube and a budget request for a new SaaS product analytics dataset. Walk me through how you'd choose the engine for each, and what would change your mind."

Solution Using the seven-row matrix as a checklist

1) Print the seven-row matrix.
2) Score each row for the new dataset:
   - Storage   ->  Tabular (no MOLAP need)
   - Lang      ->  DAX (team has Power BI skill)
   - Modelling ->  star schema fits
   - Perf      ->  in-memory wins for ad-hoc
   - Complexity-> Tabular wins
   - Use case  ->  SaaS analytics -> Tabular
   - Direction ->  Microsoft invests in Tabular
=> Verdict for the new dataset = Tabular.

3) Score each row for the inherited cube:
   - Storage   ->  Multidim already in MOLAP
   - Lang      ->  40 MDX SCOPE assignments
   - Modelling ->  Ragged hierarchies exist
   - Perf      ->  Pre-agg is good enough
   - Complexity-> Already paid; no win in moving
   - Use case  ->  Writeback present
   - Direction ->  Tabular is future BUT migration cost is huge
=> Verdict for the inherited cube = Keep, plan rewrite when writeback is retired.

Step-by-step trace.

Dataset	Net score	Verdict	Trigger to revisit
New SaaS analytics	+7	Build in Tabular	none — Tabular is correct
Inherited budgeting cube	-8	Keep Multidim	writeback requirement drops

The trace makes the trade-offs explicit. Senior reviewers expect the candidate to name the change condition that would flip the verdict — not just the verdict itself.

Output:

Project	Engine	Surface
New SaaS analytics	SSAS Tabular	Power BI Premium
Budgeting cube	SSAS Multidimensional	Excel cube formulas

Why this works — concept by concept:

Score each row, not the project — the matrix lets each requirement vote independently. The net score makes drift between teams transparent.
Change conditions matter — naming the future event that would flip the verdict (writeback retired, ragged hierarchy redesigned) signals that the candidate models product evolution, not snapshots.
Two verdicts on one team — a single team can credibly run both engines in parallel. The wrong move is to force consistency by migrating a Multidim cube without business need.
Microsoft direction is a tie-breaker, not a hammer — it tips greenfield decisions, but it does not justify a multi-year migration cost on its own.
Cost — O(rows in matrix) per decision; the real cost lives in the migration project, which is downstream of the verdict.

SQL
Topic — dimensional modeling
Star schema and cube modelling drills

Practice →

SQL
Topic — grouping
GROUP BY and rollup problems

Practice →

3. The Vertipaq engine — why tabular is so fast

Four compression tricks and two engines that turn 50M rows into a sub-second scan

The mental model in one line: Vertipaq is a columnar, in-memory, dictionary-encoded, run-length-packed store that lets the Storage Engine scan only the bytes the DAX query touches. Once you internalise those four words — columnar, in-memory, dictionary, RLE — every "why is my tabular model 50GB?" diagnosis is one of three audit questions.

The four compression tricks.

Columnar storage. Each column is stored separately. A query that touches amount and order_date scans those two columns; the other 16 columns are untouched. Compare with row stores where every row read pulls every column off disk.
Dictionary encoding. Each distinct value in a column is mapped to a small integer key. 'iPhone' becomes 1, 'Pixel' becomes 2, 'Galaxy' becomes 3. The column on disk stores the integer stream; the dictionary maps back to strings.
Run-length encoding (RLE). Sorted runs of the same value compress to (value, count). The integer stream 1, 1, 1, 1, 2, 2, 3 becomes (1, 4), (2, 2), (3, 1) — three pairs instead of seven integers.
Bit-packing. A column whose distinct values fit in 3 bits is stored with 3 bits per row, not 32 bits per row. Low-cardinality columns shrink dramatically.

In-memory by default.

After first load, the entire model lives in RAM. Cold-start cost is the first query that touches a table; subsequent queries scan the same in-memory column.
Decompression happens at query time inside the Storage Engine. Vertipaq targets very fast decompression so the cost of "compressed vs uncompressed" is small per query.

The two engines.

Storage Engine (SE, also called xVelocity / Vertipaq). Multi-core, cache-friendly, scans columns and returns matching rows. Cheap per row.
Formula Engine (FE). Single-threaded, walks the DAX expression tree, calls into SE for the underlying data. More expensive per call.
The goal of every DAX optimisation is to push more work into the SE and less into the FE. Vertipaq analyser reports (FE vs SE time) tell you where a slow query lives.

Why high-cardinality columns hurt.

Dictionary encoding makes the dictionary as big as the data when every value is unique (timestamps to the second, GUIDs, free-text descriptions).
RLE provides no win when values do not repeat.
The column ends up taking close to its uncompressed size, plus the dictionary overhead.
The fix — drop high-cardinality columns you do not need, truncate timestamps to lower grain (day instead of second), and hash free-text to a lower-cardinality bucket if you only need it for filtering.

Practical sizing.

Typical compression on a well-modelled fact table is 10x–100x vs source rows. A 50M-row source table that is 12GB on disk often lives as a 400MB–1GB tabular model.
RAM budget during process is roughly 2x the model size — the engine needs space for the new version while the old version is still serving queries.
Partition + incremental refresh for tables larger than 10GB. Process only the latest partition each night; the historical partitions stay warm in RAM untouched.

Common interview probes on Vertipaq.

"Why is my tabular model so big?" — first audit the highest-cardinality columns; they are almost always timestamp-to-second or GUID.
"Why is my DAX measure slow?" — open DAX Studio, look at SE vs FE time. If FE is dominant, push the expression into a calculated column or rewrite to use CALCULATE and filter-context shifts that the SE can serve directly.
"Why does Vertipaq need RAM proportional to uncompressed size at process?" — because the engine reads source rows row-by-row before building the compressed column store; the buffer holds the staging data temporarily.
"Is Vertipaq the same engine as Power BI?" — yes. Same xVelocity / Vertipaq engine; Power BI is one of three product surfaces (with SSAS Tabular and AAS) over the same code base.

Worked example — auditing a tabular model with Vertipaq Analyzer

Detailed explanation. A reporting team's tabular model has grown from 4GB to 22GB in nine months and the nightly refresh now takes 90 minutes. Vertipaq Analyzer (a free DAX Studio plug-in) ranks columns by RAM consumption — the first three rows usually tell the whole story.

Question. Walk through the Vertipaq Analyzer audit you would run, identify the top three RAM offenders, and propose the smallest set of changes to halve the model size.

Input — top columns by RAM (Vertipaq Analyzer output).

Column	Cardinality	Size (MB)	Type
fact_events[event_uuid]	142M	6800	string
fact_events[user_agent]	89k	3100	string
fact_events[occurred_at_sec]	142M	2800	datetime
fact_events[ip_address]	2.1M	1700	string
fact_events[amount_micros]	142M	900	int64

Code.

-- Source view rewrites — applied before the tabular model imports.
CREATE OR REPLACE VIEW v_fact_events AS
SELECT
    -- DROP event_uuid entirely (used only for support ticket lookup,
    -- not needed for analytics).

    -- Hash user_agent down to a low-cardinality bucket (browser_family).
    CASE
        WHEN user_agent ILIKE '%Chrome%'   THEN 'Chrome'
        WHEN user_agent ILIKE '%Firefox%'  THEN 'Firefox'
        WHEN user_agent ILIKE '%Safari%'   THEN 'Safari'
        ELSE 'Other'
    END                                    AS browser_family,

    -- Truncate timestamps from second to hour grain.
    DATE_TRUNC('hour', occurred_at_sec)    AS occurred_at_hour,

    -- DROP ip_address (not needed for analytics; PII risk).

    -- Convert amount_micros (int64) -> amount_dollars (decimal 10,2).
    ROUND(amount_micros / 1000000.0, 2)    AS amount_dollars,

    user_id,
    event_type
FROM fact_events_raw
WHERE occurred_at_sec >= CURRENT_DATE - INTERVAL '2 years';

Step-by-step explanation.

event_uuid is dropped entirely — it was only useful for support ticket lookups, which now happen against the source warehouse instead of the model.
user_agent is collapsed to a four-bucket browser_family column. Cardinality drops from 89k to 4; the column compresses to a few KB.
occurred_at_sec is truncated from second-grain to hour-grain. Cardinality drops from 142M to roughly 17,500 (2 years × 24 hours/day × 365 days); RLE + dictionary now produce massive savings.
ip_address is dropped — never used in analytics and a PII liability.
amount_micros is converted to amount_dollars (decimal(10,2)) so Vertipaq can pack fewer bits per row.
The 2-year cutoff caps history at business requirement.

Output — model size before and after.

Stage	RAM (MB)
Baseline	22000
Drop event_uuid	15200
Hash user_agent	12100
Hour-truncate timestamp	9400
Drop ip_address	7700
Reduce amount precision	7400

Rule of thumb. When auditing a bloated tabular model, sort by RAM descending and treat the top three columns as a single sprint. The remaining columns are usually rounding error.

Worked example — partitioning + incremental refresh

Detailed explanation. Even an optimised model eventually outgrows a single full refresh. The standard fix is to partition the largest fact table by date and configure incremental refresh — process only the most recent partition each night, leave the historical partitions warm in RAM.

Question. Design a partition + incremental refresh policy for a 3-year fact_sales table where the business needs same-day-fresh data but tolerates 24h lag on the prior month.

Input.

Constraint	Value
Total history	3 years
Fresh-data SLA	same day
Acceptable lag (prior month+)	24 hours
Process budget	30 min nightly

Code.

Partition policy

- Archive partitions:  YEAR(order_date)   for years older than current year.
                       Refreshed once, never again unless data correction.
- Monthly partitions:  YEAR-MONTH(order_date) for current year, prior months.
                       Refreshed once per night.
- Hot partition:       current calendar month.
                       Refreshed every 15 minutes (incremental).
- DirectQuery overlay: TODAY only -> reads source directly for sub-minute freshness.

Step-by-step explanation.

Older years live as single-year partitions. They are processed once and never re-processed, eliminating gigabytes of redundant work per night.
Prior months in the current year are monthly partitions. The nightly job re-processes only the partitions whose source watermark changed (typically 0 or 1).
The current month is the "hot" partition. It is re-processed every 15 minutes by the SSAS XMLA endpoint receiving an incremental refresh command.
For sub-minute freshness on today, the model uses a tiny DirectQuery overlay table that reads WHERE order_date = CURRENT_DATE straight from the source. The blended model is called "composite mode."

Output — process time and RAM impact.

Partition	Rows	Process time	Refresh cadence
2024 (archive)	240M	n/a (one-time)	none
2025 (archive)	280M	n/a (one-time)	none
2026-01	24M	2 min	nightly (no-op if watermark unchanged)
2026-02	26M	2 min	nightly
2026-current	18M (so far)	1 min	every 15 min
TODAY (overlay)	source	DirectQuery	live

Rule of thumb. When a fact table exceeds 50M rows or single-refresh exceeds 30 min, partition by date and switch to incremental refresh. Composite mode (in-memory + DirectQuery overlay for today) buys sub-minute freshness without re-architecting the model.

SQL interview question on Vertipaq sizing

The probe usually sounds like: "Your tabular model RAM doubled in six months. Walk me through the audit you would run, the top three columns you'd expect to see, and the two-week mitigation plan."

Solution Using a column-by-column Vertipaq audit

-- 1) Discover top-RAM columns via the Vertipaq DMV.
-- (Equivalent to opening Vertipaq Analyzer in DAX Studio.)
SELECT
    TABLE_NAME       AS table_name,
    COLUMN_NAME      AS column_name,
    ROWS_COUNT       AS rows_in_table,
    DICTIONARY_SIZE  AS dictionary_bytes,
    USED_SIZE        AS column_bytes
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
ORDER BY USED_SIZE DESC
LIMIT 20;

-- 2) For each top-RAM column, decide one of three actions:
--    a) DROP if it is not used by any measure / report.
--    b) TRUNCATE grain (datetime -> date; second -> hour).
--    c) HASH to a bucket (free-text -> family).

-- 3) Verify with a post-mitigation re-audit and re-process.

Step-by-step trace.

Step	Action	Effect
1	Run DMV query	Surface top-20 RAM consumers
2	DROP `event_uuid`	-6.8 GB
3	Hash `user_agent` to `browser_family`	-3.1 GB
4	Truncate `occurred_at_sec` to hour	-2.5 GB
5	Re-process + re-audit	Confirm -12.4 GB net

The trace surfaces the asymmetric nature of Vertipaq cost — three columns own the entire bloat. Senior reviewers expect the candidate to name $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS (or "Vertipaq Analyzer") explicitly.

Output:

Metric	Before	After
Model RAM	22 GB	9.6 GB
Nightly refresh	90 min	22 min
Query latency p95	3.4 s	0.8 s

Why this works — concept by concept:

Vertipaq is asymmetric — a handful of columns own most of the RAM. The audit converges in one query, not a week of profiling.
The three mitigations are categorical — drop, truncate grain, or hash to bucket. Every action falls into one of those three buckets, which makes the plan reviewable in code review.
DMVs are the source of truth — $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS returns the same data Vertipaq Analyzer renders, so the audit is automatable.
Re-audit closes the loop — confirming the mitigation worked prevents the team from claiming the win and finding the regression in the next month's review.
Cost — O(columns) for the DMV scan, plus the one-time re-process. Both are cheap relative to letting the model keep doubling.

SQL
Topic — aggregation
Columnar aggregation problems

Practice →

SQL
Topic — group by
GROUP BY interview drills

Practice →

4. DAX vs MDX — same KPI, two languages

`DAX` thinks in filter context over a relational graph; `MDX` thinks in tuples over a multi-axis cube

The mental model in one line: DAX manipulates the filter context that a tabular relational graph applies to a measure; MDX manipulates the sub-cube that a multi-axis cube exposes through tuples and sets. Both can answer "Sales YTD by Region and Product" — they just do it with completely different primitives.

The DAX mental model in detail.

Filter context. Every measure runs inside an implicit filter — the rows of the source table that survive the visual / pivot filters that were already applied (date selection, region selection, product selection). The measure sums / averages / counts over that filtered set.
CALCULATE rewrites filter context. CALCULATE(SUM(...), USERELATIONSHIP(...)) adds, removes, or replaces filters before re-evaluating the measure inside the new context.
Row context. Iterators like SUMX, AVERAGEX introduce a row-by-row context over a table; the inner expression runs once per row.
Time intelligence is built in. TOTALYTD, SAMEPERIODLASTYEAR, DATEADD, PARALLELPERIOD, DATESBETWEEN — every common time pattern is a one-line function call, provided you MARK AS DATE TABLE.

The MDX mental model in detail.

Tuples and sets. A tuple is a coordinate: ([Date].[2026 Q2], [Geography].[EU], [Measures].[Sales]). A set is an ordered collection of tuples on one axis.
Sub-cube. A query SELECTs sets onto axes 0..n and FROM a cube — the result is a sub-cube. Every cell is a tuple.
SCOPE and THIS. SCOPE defines an assignment that applies inside a region of the cube; THIS is the value being computed.
Named sets and calculated members. Reusable named sets / calculated members live in the cube script and are visible to every query.

Same KPI — Sales YTD — side by side.

-- DAX (tabular)
Sales YTD :=
TOTALYTD(
    [Sales Amount],
    'Date'[Date]
)

Sales Amount := SUM(fact_sales[Amount])

-- MDX (multidimensional)
WITH MEMBER [Measures].[Sales YTD] AS
    Aggregate(
        YTD([Date].[Calendar].CurrentMember),
        [Measures].[Sales Amount]
    )
SELECT
    [Measures].[Sales YTD] ON COLUMNS,
    [Date].[Calendar].[Calendar Year].Members ON ROWS
FROM [Sales];

Both queries answer "what is the year-to-date sales total at the current date selection?" The DAX measure is a function of the filter context applied by whichever Power BI visual hosts it. The MDX query computes the same number against the explicit [Date].[Calendar].CurrentMember coordinate.

When DAX is harder.

Asymmetric reporting. "Sales for EU divisions only on dates that are open trading days" requires manual filter context surgery in DAX; MDX can express it cleanly with set arithmetic.
Ragged hierarchies. Parent-child DAX patterns (using PATH, PATHITEM, LOOKUPVALUE) are slower and harder to maintain than native MDX hierarchies.
Writeback. No native DAX equivalent. Tabular has to fake writeback through a staging table + refresh pattern.

When MDX is harder.

Row-level security. DAX role expressions are more natural and faster.
Power BI surface. Power BI was designed for DAX; MDX queries from Power BI route through a translator and lose some optimisation.
Time intelligence shortcuts. DAX has dozens; MDX has the YTD(), MTD(), QTD() family but the rest are hand-rolled.

The modern toolchain.

DAX Studio. Open-source DAX query editor + server timings + Vertipaq Analyzer. The DAX equivalent of EXPLAIN PLAN.
Tabular Editor. Free + paid versions for editing tabular models outside the Power BI portal. Scriptable in C#.
ALM Toolkit. Diff + merge tool for tabular models — the closest thing to git diff for .bim / .tmdl files.
VS Code + TMDL. Microsoft's modern tabular model representation; supports Git diffs natively.

Common interview probes on DAX vs MDX.

"Why is CALCULATE important?" — it is the only function that rewrites filter context. Most non-trivial DAX is a CALCULATE with a filter list.
"What does MARK AS DATE TABLE do?" — tells the engine which table is the canonical calendar so time-intelligence functions like TOTALYTD work.
"When would you use USERELATIONSHIP?" — to activate an inactive relationship for the duration of one measure. Classic example: a fact has both order_date and ship_date foreign keys; one relationship is active, the other is inactive and the measure activates it on demand.
"Can you write MDX against a tabular model?" — yes since SSAS 2017, but you lose some optimisation. Stick to DAX on tabular and MDX on multidim.

Worked example — same KPI in DAX and MDX

Detailed explanation. A team migrating from MDX to DAX needs to translate "Sales YTD" — the most common time-intelligence pattern. The DAX version is a one-line TOTALYTD; the MDX version is a WITH MEMBER ... Aggregate(YTD(...)). Both produce the same number for the same date selection.

Question. Write the same Sales YTD KPI in DAX (tabular) and MDX (multidimensional). Show how each is invoked from Power BI / Excel and explain why DAX is shorter.

Input — date dimension and fact.

Table	Columns
`Date`	`Date`, `CalendarYear`, `CalendarQuarter`, `CalendarMonth`
`fact_sales`	`order_date`, `product_id`, `amount`

Code.

-- DAX measure (tabular model)
Sales Amount := SUM(fact_sales[amount])

Sales YTD :=
TOTALYTD(
    [Sales Amount],
    'Date'[Date]
)

-- MDX calculated member (multidimensional cube)
WITH MEMBER [Measures].[Sales YTD] AS
    Aggregate(
        YTD([Date].[Calendar].CurrentMember),
        [Measures].[Sales Amount]
    )
SELECT
    NON EMPTY [Measures].[Sales YTD] ON COLUMNS,
    NON EMPTY [Date].[Calendar].[Calendar Month].Members ON ROWS
FROM [Sales];

Step-by-step explanation.

The DAX Sales Amount is a simple SUM measure over the fact column. It runs in whatever filter context the Power BI visual provides.
TOTALYTD([Sales Amount], 'Date'[Date]) is shorthand for "sum the [Sales Amount] measure over the date range from the start of the current year to the current date filter." The Date table must be marked as a date table.
The MDX WITH MEMBER defines a calculated member at query scope. YTD([Date].[Calendar].CurrentMember) returns the set of date members from year-start to the current member; Aggregate(...) sums the [Sales Amount] measure over that set.
The MDX query then SELECTs the new member on columns and the months on rows — explicit axes. The DAX equivalent never names axes because the host visual provides them.

Output (same result, both languages).

Month	Sales YTD
2026-Jan	120,000
2026-Feb	245,000
2026-Mar	380,000
2026-Apr	510,000

Rule of thumb. When translating MDX → DAX, the host visual collapses the SELECT axes — every "ON COLUMNS" / "ON ROWS" disappears and the measure body shrinks to the time-intelligence call. The two-line DAX is rarely more than a one-line TOTALYTD, SAMEPERIODLASTYEAR, or DATEADD.

Worked example — CALCULATE with USERELATIONSHIP for shipping vs order date

Detailed explanation. A fact has two date columns — order_date (active relationship) and ship_date (inactive relationship). The team needs two measures: one that sums sales by order date (default) and one by ship date.

Question. Write the Sales by Order Date measure and the Sales by Ship Date measure in DAX. Explain why one uses USERELATIONSHIP and the other does not.

Input.

Table	Columns
`Date`	`Date`
`fact_sales`	`order_date` (active FK), `ship_date` (inactive FK), `amount`

Code.

-- Default measure — uses the active relationship (order_date <-> Date)
Sales by Order Date := SUM(fact_sales[amount])

-- Alternate measure — activates the inactive ship_date relationship
Sales by Ship Date :=
CALCULATE(
    SUM(fact_sales[amount]),
    USERELATIONSHIP(fact_sales[ship_date], 'Date'[Date])
)

Step-by-step explanation.

Sales by Order Date is a plain SUM. The model already has an active relationship between fact_sales[order_date] and 'Date'[Date], so any date filter from the visual propagates through that relationship automatically.
Sales by Ship Date needs the inactive ship_date → Date relationship instead. CALCULATE wraps the SUM and USERELATIONSHIP tells the engine "for this evaluation only, activate the inactive relationship."
When the visual filters dates Jan 2026, Sales by Order Date returns the sum of all rows whose order_date is in January, and Sales by Ship Date returns the sum of all rows whose ship_date is in January — different sets, same measure logic.
The two measures appear side by side in a Power BI visual, letting the user see "ordered vs shipped" gap by month.

Output.

Month	Sales by Order Date	Sales by Ship Date
2026-Jan	120,000	95,000
2026-Feb	125,000	130,000
2026-Mar	135,000	140,000

Rule of thumb. When a fact table has multiple date columns (order, ship, deliver, refund), model them as multiple relationships to a single Date table — one active, the others inactive. Each "date semantic" measure is then a one-line CALCULATE + USERELATIONSHIP.

SQL interview question on DAX vs MDX translation

A senior interviewer might frame it as: "Translate this MDX calculated member to DAX, and explain why the DAX version is shorter even though it does the same work."

Solution Using filter-context rewriting

-- MDX:
-- WITH MEMBER [Measures].[Sales Prior YTD] AS
--   Aggregate(YTD(ParallelPeriod([Date].[Calendar].[Calendar Year], 1,
--                                [Date].[Calendar].CurrentMember)),
--             [Measures].[Sales Amount])

-- DAX equivalent — one line, no axes.
Sales Prior YTD :=
CALCULATE(
    [Sales YTD],
    SAMEPERIODLASTYEAR('Date'[Date])
)

Step-by-step trace.

Step	What the engine does
1	Current filter context selects `[Date]` = current month (say 2026-Apr)
2	`SAMEPERIODLASTYEAR` shifts the filter to the same period one year prior (2025-Apr)
3	`CALCULATE` re-evaluates `[Sales YTD]` inside the shifted filter context
4	`[Sales YTD]` (= TOTALYTD over Sales Amount) now sums from 2025-Jan-01 to 2025-Apr
5	Result is the prior-year YTD comparison for the visual's current month

The trace shows the DAX engine doing exactly the same work as MDX — shifting the date filter, then re-aggregating — without the candidate having to write tuples, sets, or ParallelPeriod.

Output:

Month	Sales YTD	Sales Prior YTD
2026-Jan	120,000	110,000
2026-Feb	245,000	220,000
2026-Mar	380,000	345,000
2026-Apr	510,000	470,000

Why this works — concept by concept:

CALCULATE rewrites filter context — every non-trivial DAX measure is a CALCULATE that shifts, replaces, or removes filters before re-evaluating the inner expression.
SAMEPERIODLASTYEAR is a filter, not a measure — it returns a set of dates that shifts the date filter; the outer CALCULATE re-applies the existing [Sales YTD] measure against that shifted context.
Marked Date table is required — SAMEPERIODLASTYEAR, TOTALYTD, DATEADD, and the rest of the time-intelligence family require a table marked with MARK AS DATE TABLE for the contiguity rules to work.
No axes, no tuples — DAX delegates axis management to the host visual. The measure body never references columns or rows; the same measure is reusable in tables, charts, and Excel pivots.
Cost — same as the MDX equivalent: one filter shift + one aggregate scan, O(rows in shifted period). The savings are in developer time, not query time.

SQL
Topic — case expression
CASE expression for measure logic

Practice →

SQL
Topic — window functions
Window function and time-intelligence drills

Practice →

5. Deploying SSAS — instance topology, refresh, security

One instance per mode, one XMLA endpoint per instance — and three hosting choices that share the same engine

The mental model in one line: An SSAS instance is a service configured to exactly one mode (tabular or multidimensional), it exposes one XMLA endpoint, and consumers (Power BI, Excel, Fabric) connect to that endpoint over MSOLAP or HTTPS. Everything else — security, refresh, ALM, monitoring — hangs off that one shape.

Instance modes — one service, one mode.

Tabular instance. Set at install time. Hosts tabular models; uses Vertipaq; queries arrive in DAX (or MDX through the translator).
Multidimensional instance. Set at install time. Hosts cubes; uses MOLAP / ROLAP / HOLAP storage; queries arrive in MDX (or DAX through DAXMD).
You cannot mix. A single service is one mode or the other. If you need both, run two services — typically on different ports or different VMs.

Three hosting choices for the same engine.

On-prem SQL Server box. A Windows server with SSAS installed alongside the database engine. Full control, manual capacity planning, ALM via on-prem CI / CD.
Azure Analysis Services (AAS). A PaaS box — SSAS Tabular only — rented as a SKU. Pay for RAM and cores; Microsoft manages patching. Being deprecated in favour of Fabric.
Power BI Premium / Fabric capacity. A tabular dataset hosted on a capacity SKU. Same Vertipaq engine, with the Power BI / Fabric portal layered on top for permissioning, refresh, and ALM.

Data refresh — pull vs push.

Pull (process commands). The traditional path. An SSAS agent runs Process Full, Process Update, or Process Add commands against the model, which reads from the source and rebuilds tables / partitions.
Push (incremental refresh via XMLA endpoint). The modern path. The dataset receives an XMLA command (e.g. RefreshObjects) over HTTPS that processes only the changed partition. Used by Power BI Premium / Fabric refresh policies.
Composite mode. A blend — most of the model in Vertipaq (in-memory), the "today" slice as a DirectQuery overlay against the source. Best of both: warm history + live current.

Connectivity.

XMLA endpoint. The protocol for both reading (DAX / MDX queries) and writing (refresh commands). Exposed as powerbi:///asazure:///fabric:// URLs.
MSOLAP provider. The native client for Power BI, Excel, SSMS. Speaks XMLA over TCP or HTTPS.
ODBC / JDBC for DAX. Newer drivers let Python, R, and non-Microsoft BI tools issue DAX queries directly against tabular endpoints.

Security.

Row-level security (RLS) via DAX roles. Define a role with a DAX filter expression (e.g. [region] = "EU"); assign Entra ID users / groups to the role. Every query the user runs has the filter silently appended.
Object-level security (OLS). Hide columns or tables entirely from a role. Used for sensitive PII or financial detail that some users must not see.
Tenant isolation. Hosted services (AAS, Power BI Premium, Fabric) provide tenant-level isolation as a SKU feature. On-prem SSAS relies on Active Directory / Entra ID + firewall.
Service principals. For automated refresh / deployment, use a service principal (Entra ID app registration) instead of a personal account.

Lifecycle / ALM.

Tabular Editor. Edit .bim / .tmdl files outside the Power BI portal; script changes in C#.
ALM Toolkit. Diff + merge between models — closest thing to "git diff" for semantic models.
TMDL + VS Code. The modern source-format. Plain-text, Git-friendly, supports proper diffs and merges.
Azure DevOps / GitHub Actions. Run TMDL → XMLA deploy pipelines that push the model to dev → test → prod environments.

Monitoring.

SQL Profiler + Extended Events (XEvents). Capture every DAX / MDX query, with execution plan and timings.
DMVs. $SYSTEM.DISCOVER_OBJECT_ACTIVITY, $SYSTEM.DISCOVER_SESSIONS, $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS — the operational telemetry of the engine.
Vertipaq Analyzer. A DAX Studio plug-in that surfaces the column-by-column RAM footprint described in Section 3.
Server timings (DAX Studio). Splits each query's time into Storage Engine vs Formula Engine — the single most useful DAX optimisation tool.

Migration paths.

Multidimensional → Tabular. Manual rewrite — there is no automated converter. Estimate weeks per cube.
On-prem SSAS Tabular → AAS. Backup .abf → restore on AAS; or deploy .bim via XMLA. Days.
AAS → Power BI Premium / Fabric semantic model. Use the Power BI portal's AAS migration tool. Hours per dataset.
Power BI Premium → Fabric semantic model. Increasingly seamless — Fabric is positioned as the successor.

Common interview probes on deployment.

"Can one SSAS instance host both tabular and multidimensional?" — No. One service, one mode.
"What is the XMLA endpoint used for?" — Both queries and refresh / deployment. It is the single protocol surface.
"How do you enforce RLS on a Power BI report?" — Define a DAX role on the tabular model; assign users / groups to the role through Entra ID; the filter is applied silently on every query.
"Where do you store the .bim / .tmdl source?" — Git. TMDL is plain-text and diff-able; older .bim JSON is too, but TMDL is preferred.

Worked example — defining a row-level security role in DAX

Detailed explanation. A multi-region SaaS dashboard must show each regional manager only their own region's rows. The model defines an RLS role with a DAX filter; Entra ID groups are mapped to the role; every query is silently restricted.

Question. Write the DAX role expression for "EU region only" and explain how a user in the eu-managers Entra group inherits it.

Input — the dimension and role mapping.

Table	Columns
`dim_region`	`region_code` (`EU`, `NA`, `APAC`), `region_name`
Entra group	`eu-managers`, `na-managers`, `apac-managers`
SSAS role	`Region EU`, `Region NA`, `Region APAC`

Code.

-- Role: "Region EU"
-- Table filter on dim_region:
[region_code] = "EU"

-- Optional dynamic version — uses the caller's username to look up
-- their region from a permissions table:
[region_code] =
LOOKUPVALUE(
    user_region[region_code],
    user_region[email], USERPRINCIPALNAME()
)

Step-by-step explanation.

The static [region_code] = "EU" filter is applied to every query the role runs. Any visual that touches dim_region is silently restricted to EU rows.
Relationships propagate the filter automatically: any fact table joined to dim_region is also restricted to EU rows. This is why star-schema relationships matter — RLS travels along them.
The Entra group eu-managers is mapped to the SSAS role Region EU in the portal. Any user in that group inherits the filter on every query, without the report author having to know.
The dynamic version uses USERPRINCIPALNAME() to look up the caller's allowed region from a user_region table. One role serves every region; the lookup happens per query.

Output.

User	Effective filter	Rows seen
alice@eu (eu-managers)	region_code = "EU"	EU only
bob@na (na-managers)	region_code = "NA"	NA only
carol@apac (apac-managers)	region_code = "APAC"	APAC only

Rule of thumb. Define RLS on the dimension, not the fact. The relationship propagates the filter automatically. For multi-region tenants, prefer the dynamic USERPRINCIPALNAME() lookup over per-region roles — one role, one filter, infinite tenants.

Worked example — incremental refresh policy via XMLA

Detailed explanation. A tabular model has a 200M-row fact_orders table partitioned by month. The team wants the model to refresh only the changed partitions each night, plus the hot current-month partition every 15 minutes.

Question. Write the incremental refresh policy (in TMDL pseudo-code) and the XMLA refresh command that processes only the current-month partition.

Input.

Setting	Value
Table	`fact_orders`
Partition key	`order_date`
Archive window	24 months
Hot window	current month
Hot refresh cadence	every 15 minutes
Cold refresh cadence	nightly

Code.

-- Refresh policy (TMDL pseudo-code on the table)
table fact_orders
  partitionedBy:    column order_date
  archivePeriod:    24 months
  incrementalPeriod: 1 month
  rollingWindowGranularity: month
  detectDataChanges: column updated_at

<!-- XMLA refresh — process the current-month partition only -->
<Refresh xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <DatabaseID>RetailModel</DatabaseID>
  <Type>Full</Type>
  <Objects>
    <Object>
      <DatabaseID>RetailModel</DatabaseID>
      <CubeID>Model</CubeID>
      <MeasureGroupID>fact_orders</MeasureGroupID>
      <PartitionID>fact_orders_2026_06</PartitionID>
    </Object>
  </Objects>
</Refresh>

Step-by-step explanation.

The TMDL refresh policy declares the archive window (24 months) and the rolling-window granularity (month). The engine generates monthly partitions automatically based on order_date.
detectDataChanges: column updated_at tells the engine to skip a partition whose source MAX(updated_at) is unchanged since last process. Nightly refreshes become near-no-ops for stale months.
The XMLA Refresh command targets exactly one partition (fact_orders_2026_06). Sent every 15 minutes by a scheduler, it processes ~24M rows in ~60 seconds — well under the cadence.
The cold nightly job processes any partition with a changed source watermark — typically the previous month if late-arriving data exists, otherwise a no-op.

Output — refresh telemetry.

Partition	Rows	Cadence	Typical duration
`fact_orders_2024_*`	~10M each	manual / one-time	n/a
`fact_orders_2025_*`	~12M each	nightly (no-op if no changes)	0–90 s
`fact_orders_2026_05`	18M	nightly (changes only)	80 s
`fact_orders_2026_06` (hot)	6M so far	every 15 min	25 s

Rule of thumb. Configure the refresh policy on the table, fire the XMLA Refresh at the partition. The policy declares the shape; the command does the work. For sub-minute freshness on today, layer a DirectQuery overlay on a tiny "today only" table.

SQL interview question on SSAS deployment topology

A senior interviewer might frame it as: "Walk me through the deployment topology you'd recommend for a 200GB tabular model that must serve Power BI in three regions with row-level security and same-day freshness on the largest fact table."

Solution Using a hosted Fabric semantic model with composite refresh

Topology

1) Hosting:    Fabric capacity (per-region) -> hosts the tabular dataset.
               (Alternative: Power BI Premium per-capacity in each region.)
2) Source:     SQL Server / Synapse / OneLake in each region.
3) Refresh:
   - Cold archive partitions (months 2..24)  -> one-time, then never.
   - Warm prior-month partition              -> nightly via XMLA Refresh.
   - Hot current-month partition             -> every 15 min via XMLA Refresh.
   - Today overlay                           -> DirectQuery composite mode.
4) Security:
   - RLS via dynamic DAX role using USERPRINCIPALNAME() -> per-tenant region.
   - OLS hides salary / PII columns from non-finance roles.
   - Service principal owns the refresh job; users inherit RLS.
5) ALM:
   - TMDL source in Git.
   - Azure DevOps pipeline: dev capacity -> test -> prod (XMLA deploy).
   - ALM Toolkit diff before each merge.
6) Monitoring:
   - DAX Studio + Vertipaq Analyzer for query / RAM audits.
   - XEvents stream to Log Analytics for trend analysis.
   - Vertipaq audit cron once a week.

Step-by-step trace.

Concern	Mechanism	Outcome
Same-day freshness	Composite mode + 15-min XMLA refresh	<1 min lag on today
Sub-second queries	Vertipaq + good star schema	p95 <1s
Row-level security	Dynamic DAX role + USERPRINCIPALNAME()	Each tenant sees own rows
ALM	TMDL in Git + ALM Toolkit	Reviewable diffs, blue/green deploys
Monitoring	XEvents + DAX Studio + Vertipaq Analyzer	Bloat caught before it bites

The trace ties each non-functional requirement to one specific SSAS feature. Senior reviewers expect every concern (freshness, security, ALM, observability) to have a named mechanism — not just "Power BI handles it."

Output:

Property	Value
Hosting	Fabric capacity, per region
Engine	Vertipaq (SSAS Tabular under the hood)
Freshness	<1 minute (composite mode)
Query latency p95	<1 s
Security	Dynamic DAX RLS + OLS
ALM	TMDL + Git + Azure DevOps

Why this works — concept by concept:

Hosting follows residency and SKU — Fabric is the 2026 default for greenfield; the per-region split shifts the data-residency boundary instead of forcing a single global capacity.
Composite mode buys freshness without rebuilding — the Vertipaq partition handles historical aggregation; the DirectQuery overlay handles today. Both surface as one dataset.
Dynamic RLS scales tenants — USERPRINCIPALNAME() + a permissions table replaces "one role per tenant" with one role plus a lookup. Adding a tenant becomes a database write, not an SSAS deploy.
TMDL + ALM Toolkit — turn the semantic model into a code asset with the same review workflow as application code. The model survives the loss of any individual developer.
Cost — capacity rental per region (linear in tenant size), refresh credits per XMLA call (linear in changed partitions), and the engineering cost of the ALM workflow (a one-time investment that amortises).

SQL
Topic — joins
Star schema JOIN problems

Practice →

SQL
Topic — design
Semantic-layer and BI design drills

Practice →

Cheat sheet — SSAS modelling recipes

Star schema baseline. One fact table per grain, surrogate keys, one Date table marked with MARK AS DATE TABLE. Hide raw foreign-key columns from the field list; expose only measures and dimension attributes.
Inactive relationships for multi-date facts. Model order_date as active, ship_date / deliver_date as inactive. Each alternate-date measure is one line: CALCULATE(SUM(...), USERELATIONSHIP(...)).
Display folders for measure organisation. Group Sales / YTD, Sales / MTD, Sales / Prior Year inside a single Sales table — the field list stays tidy as the catalogue grows.
Avoid bidirectional relationships. They cause ambiguity, slow queries, and weird filter propagation. Use single-direction relationships and reach for CROSSFILTER(... BOTH) inside CALCULATE only when truly needed for dimension-to-dimension propagation.
Vertipaq compression checklist. Lower precision on numerics (decimal(10,2) not decimal(18,4)); drop high-cardinality columns you do not query; truncate timestamps to date or hour grain; partition by date and use incremental refresh.
NULL-safe RLS. Always wrap RLS expressions with COALESCE or IS NOT NULL filters so a NULL region does not silently leak rows. Test the role with an Entra account before shipping.
DAX time intelligence baseline. TOTALYTD, SAMEPERIODLASTYEAR, DATEADD, PARALLELPERIOD, DATESBETWEEN — these five cover most reporting needs. Avoid hand-rolled date math.
Refresh policy. Cold archive partitions = one-time. Warm prior month = nightly + change detection. Hot current month = 15 min. Today overlay = DirectQuery composite mode for sub-minute freshness.

Frequently asked questions

Is SSAS deprecated in favour of Power BI and Fabric?

No — not formally. SSAS Tabular is the same engine that powers Power BI datasets and Fabric semantic models, so the engine itself is very much alive. What has deprecated is the on-prem multidimensional flavour (in maintenance mode since 2019) and Azure Analysis Services (being retired in favour of Fabric capacity). For greenfield projects in 2026 the recommendation is to host a tabular dataset on Power BI Premium or Fabric capacity, but you still benefit from understanding sql server analysis services because the modelling primitives, DAX, Vertipaq, XMLA, and ALM workflows are identical.

SSAS Tabular vs Power BI dataset — what's the difference?

In terms of engine, almost nothing — both are tabular models running on Vertipaq with DAX as the query language. The difference is the hosting product. SSAS Tabular runs as a standalone Windows service that you manage; a Power BI Premium dataset runs inside a Microsoft-hosted capacity with the Power BI portal layered on top for workspace permissions, refresh scheduling, and ALM. A Fabric semantic model is the next-generation Power BI dataset, with deeper integration into OneLake and Fabric items. The same TMDL source can deploy to all three with minor connection-string changes.

Do I still need MDX in 2026?

If your environment runs SSAS Multidimensional cubes, yes — reading, writing, and debugging MDX is unavoidable. For greenfield tabular projects on Power BI / Fabric, no — DAX covers the entire surface and Power BI was designed around it. The pragmatic answer for a data engineer is to recognise MDX (be able to read a WITH MEMBER ... SELECT query and explain what it does) without committing to writing MDX as your day-to-day language. That gives you the option to migrate legacy cubes when the time comes without making it a career-defining bet.

How big can a tabular model get in memory?

The hard ceiling is the capacity you rent — on Power BI Premium, an F64 capacity offers a per-dataset memory ceiling that rises with the SKU; Fabric capacities follow the same family. In practice, well-modelled tabular datasets running 50–200 GB of compressed RAM are common in 2026, with 10x–100x compression versus source row size. The day-to-day ceiling is set by your refresh budget (the engine needs roughly 2x model RAM during a full process) and your p95 query latency target rather than a hard maximum. If you're approaching the SKU's per-dataset ceiling, either bump the SKU, split into multiple datasets, or move to composite mode with DirectQuery for the largest fact.

Can I query SSAS from non-Microsoft tools?

Yes — anything that speaks XMLA over MSOLAP or HTTPS can query an SSAS endpoint. That includes Excel pivots (native), Power BI (native), Tableau (with the SSAS connector), Python via the adodbapi or pyodbc driver with the MSOLAP provider, R via odbc, and any BI tool that has an MDX or DAX connector. For DAX-only queries against a tabular endpoint, the modern path is the REST API exposed by Power BI / Fabric, which any HTTP client can call with an OAuth token. The XMLA endpoint is the universal protocol — every consumer surface speaks it under the hood.

What's the difference between SSAS, Azure Analysis Services, and Fabric semantic models?

All three host the same Vertipaq tabular engine. SSAS is the on-prem Windows service — you own the box, you patch it, you run the network. Azure Analysis Services (AAS) is the PaaS version — Microsoft manages the host, you pay per SKU; it is being retired in favour of Fabric. A Fabric semantic model is the next-generation hosted form, deeply integrated with OneLake, Fabric workspaces, and Power BI service. Migration paths run cleanly in one direction (SSAS → AAS → Power BI Premium → Fabric); the engine is the same, the surrounding product changes. For greenfield in 2026, default to Fabric semantic model; for on-prem mandates, default to SSAS Tabular.

Practice on PipeCode

Drill the dimensional modelling practice library → for the star-schema and cube-modelling questions that anchor every semantic-layer interview.
Rehearse on aggregation problems → when the interviewer wants the SUM / AVG / COUNT layer that DAX and MDX both compile down to.
Sharpen GROUP BY drills → for the grouping logic that becomes a tabular measure under the hood.
Stack JOIN practice → for the star-schema joins that feed every fact-and-dimension model.
Layer the window functions library → for the time-intelligence patterns you would otherwise write as DAX TOTALYTD or MDX YTD().
For the broader surface, read top data engineering interview questions →.
Stack the prerequisites with the only 5 skills you need to become a data engineer →.
Sharpen the dialect axis with the SQL for data engineering interviews course →.
For long-form schema craft, work through data modelling for DE interviews →.

Pipecode.ai is Leetcode for Data Engineering — every SSAS modelling recipe above ships with hands-on practice rooms where you write the star-schema JOIN, the DAX-style aggregation, and the time-intelligence window function against real graded inputs. PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine, so you never have to wonder whether your `sql server analysis services` interview answer actually behaves the same on Tabular as on Multidimensional.

Practice dimensional modelling now →
Aggregation drills →

1. Why a semantic layer still matters in 2026

The three jobs of a semantic model — shared definitions, fast aggregation, governed metrics

Worked example — choosing the model type for a new BI project

Worked example — the "one number, one definition" failure

SQL interview question on semantic-layer purpose

Solution Using a three-question decision frame

2. Tabular vs Multidimensional — the decision matrix

Seven rows that decide every SSAS project — and a default recommendation for 2026

Worked example — picking the storage mode for a 200M-row fact table

Worked example — when a multidimensional cube is still the right answer

SQL interview question on choosing between Tabular and Multidimensional

Solution Using the seven-row matrix as a checklist

3. The Vertipaq engine — why tabular is so fast

Four compression tricks and two engines that turn 50M rows into a sub-second scan

Worked example — auditing a tabular model with Vertipaq Analyzer

Worked example — partitioning + incremental refresh

SQL interview question on Vertipaq sizing

Solution Using a column-by-column Vertipaq audit

4. DAX vs MDX — same KPI, two languages

DAX thinks in filter context over a relational graph; MDX thinks in tuples over a multi-axis cube

Worked example — same KPI in DAX and MDX

Worked example — CALCULATE with USERELATIONSHIP for shipping vs order date

SQL interview question on DAX vs MDX translation

Solution Using filter-context rewriting

5. Deploying SSAS — instance topology, refresh, security

One instance per mode, one XMLA endpoint per instance — and three hosting choices that share the same engine

Worked example — defining a row-level security role in DAX

Worked example — incremental refresh policy via XMLA

SQL interview question on SSAS deployment topology

Solution Using a hosted Fabric semantic model with composite refresh

Cheat sheet — SSAS modelling recipes

Frequently asked questions

Is SSAS deprecated in favour of Power BI and Fabric?

SSAS Tabular vs Power BI dataset — what's the difference?

Do I still need MDX in 2026?

How big can a tabular model get in memory?

Can I query SSAS from non-Microsoft tools?

What's the difference between SSAS, Azure Analysis Services, and Fabric semantic models?

Practice on PipeCode

`DAX` thinks in filter context over a relational graph; `MDX` thinks in tuples over a multi-axis cube