DP-600 Fabric Analytics Engineer – Structured Study Notes
(Governance, Admin, Lifecycle, Data Prep, Semantic Models)
0. Big Picture – What DP-600 Expects From You
The DP-600 exam is about designing, building, governing, and optimizing analytics solutions in Microsoft Fabric.
At a high level you must be able to:
- Maintain a data analytics solution – governance, security, admin, lifecycle, impact analysis
- Prepare data – ingest, transform, and store data in a Lakehouse/Warehouse/KQL DB; apply patterns like incremental load and CDC
- Implement and manage semantic models – modeling, storage modes, DAX basics, RLS/OLS, performance
This note covers all of these areas and ties them together with exam-style thinking.
1. Maintain a Data Analytics Solution
(Governance, Admin, Lifecycle)
This area covers:
- Security & governance (access, RLS/CLS/OLS, sensitivity labels, endorsement)
- Admin: workspaces, capacities, roles
- Development lifecycle: Git, PBIP, deployment pipelines, XMLA
- Monitoring & impact analysis (lineage)
1.1 Security & Access Control in Fabric
Security is layered. Think about:
- Workspace-level roles
- Item-level permissions
- Data-level security (RLS/CLS/OLS)
- File-level security
- Sensitivity labels
- Endorsement
1.1.a Workspace-Level Access
Each workspace has roles:
- Admin
  - Full control over the workspace
  - Can change settings, add/remove users, delete items
  - Used by platform owners / team leads
- Member / Contributor
  - Can create and edit items (Lakehouse, Dataflow, reports, Pipelines, Notebooks)
  - Cannot change workspace-level admin settings (such as capacity assignment)
  - Member can additionally share items and add users with lower roles; Contributor cannot
- Viewer
  - Read-only access: can view reports, dashboards, semantic models, etc.
  - Cannot create, edit, or publish new items
Common exam scenario:
“A user can open a report but cannot edit it or create a new dataflow in the same workspace. What is the likely cause?”
→ The user is a Viewer and needs at least Contributor/Member.
1.1.b Item-Level Permissions & Build
Even inside a workspace, every item has its own permissions:
- Lakehouse
- Warehouse
- Semantic model (dataset)
- Report
- Notebook
- Dataflow
A user might be able to see a report (for example as a workspace Viewer, or via a report shared with them) and still:
- Not be able to use the underlying semantic model to build new reports
- Not be able to use Analyze in Excel
The key concept is the Build permission on the semantic model.
Build permission is required to:
- Create new reports from a semantic model
- Use Analyze in Excel
- Use the semantic model from another workspace (when cross-workspace is allowed)
- Allow composite models based on that dataset
Exam pattern:
“User can view a report but cannot use ‘Analyze in Excel’ or create a new report from the dataset.”
→ They are missing Build permission on the semantic model.
1.1.c RLS, CLS, OLS & File-Level Security
Security doesn’t stop at item level; it also applies to data inside the model.
Row-Level Security (RLS)
- Restricts which rows a user can see.
- Example: a user in the “Bangladesh” role only sees rows where Country = 'BD'.
- Implemented in the semantic model (Power BI dataset / Fabric semantic model).
Column-Level Security (CLS)
- Controls which columns a user can see.
- Example: hide Salary, SSN, and PhoneNumber from certain roles.
Object-Level Security (OLS)
- Hides entire tables or columns from users or roles.
- Example: a Finance table is completely hidden from Sales users.
File-Level Security (OneLake / Files)
In Fabric:
- A Lakehouse has:
  - Tables → usually accessed via the SQL endpoint & semantic models
  - Files → raw storage in the OneLake filesystem
You can secure Files separately:
- A user might:
  - Query tables via a semantic model
  - But not be allowed to browse the Lakehouse Files section
- That requires filesystem / OneLake permissions, which are managed separately.
Exam pattern:
“User can query a Lakehouse table via the SQL endpoint but cannot open the Files area in the Lakehouse.”
→ They are missing file-system permission on OneLake.
1.1.d Sensitivity Labels (Purview Integration)
Sensitivity labels classify and protect data. Common examples:
- Public
- General
- Confidential
- Highly Confidential – No Export
Once applied to semantic models, reports, etc., labels can:
- Block Export to Excel / CSV
- Block Publish to web
- Restrict external sharing (especially with encrypted labels)
Important admin settings include:
- Allow users to apply sensitivity labels – without this, labels aren’t available in Fabric.
- Automatically apply sensitivity labels to downstream content – a label applied to a semantic model can propagate to its reports, dashboards, etc.
- Allow Microsoft Purview to secure AI interactions – controls whether Copilot/AI can interact with labeled data under Purview governance.
Exam pattern:
“After applying a label to a report, users can view it but can no longer export the data. Why?”
→ The label (e.g., Highly Confidential – No Export) is configured to block export.
1.1.e Endorsement: Promoted & Certified
Endorsement helps users know which data they can trust.
- Promoted
  - Indicates team-level / local confidence
  - Signals: “this is useful, but not centrally governed”
- Certified
  - Indicates organization-level endorsement
  - Represents a single source of truth
  - Certification rights are generally restricted to a set of data stewards / admins
Endorsed items typically show in:
- Data hub
- OneLake data explorer
- Search and discovery areas
Exam pattern:
“The central BI team wants certain semantic models to be recognized as single sources of truth across the organization. What should they use?”
→ Use Certification with appropriate governance.
1.2 Admin & Governance: Tenant, Capacity, Workspace
A lot of DP-600 questions are about “which setting to change?”.
The mental model:
- Tenant = organization-wide policies
- Capacity = compute configuration (F64, F128, Premium)
- Workspace = team/project scope
Tenant
Controls global features like:
- Whether Fabric is enabled at all
- Whether users can create Fabric items
- Export policies (Excel/CSV, PBIX download)
- Guest users and external sharing
- Sensitivity labels and Purview/AI integration
- Trial activation (“Users can try Fabric paid features”)
Think: “This affects everyone / every workspace” → Tenant.
Capacity
Controls the compute resources and workloads:
- Type of capacity: Fabric F-SKU, Power BI Premium, Embedded, etc.
- Region and SKU size (F64, F128, etc.)
- Workload settings:
  - Spark / Data Engineering
  - Dataflows
  - Pipelines
  - DirectLake / cache behavior
- Ability to pause and resume the capacity
- Concurrency limits and memory settings
Common issues:
- Dataflows stuck in queue → capacity under pressure / workload disabled
- Notebooks fail to start → Spark capacity exhausted
- DirectLake becomes slow → capacity overloaded, cache eviction
Workspace
The workspace ties everything together:
- Assigned to a capacity (shared vs Fabric/Premium)
- Used by teams to store:
  - Lakehouses
  - Warehouses
  - Dataflows
  - Pipelines
  - Semantic models
  - Reports
Key points:
- Shared capacity workspaces:
  - Can host Power BI content (reports, semantic models, dashboards)
  - Cannot host Fabric items such as Lakehouses, Warehouses, Dataflow Gen2, Notebooks, or Pipelines, and cannot use DirectLake
- Fabric/Premium capacity workspaces:
  - Can use the full set of Fabric experiences.
Exam pattern:
“A workspace does not show the option to create a Lakehouse or Dataflow Gen2, but users can still create reports.”
→ The workspace is on shared capacity, or the tenant setting “Users can create Fabric items” is disabled.
1.3 Analytics Development Lifecycle (CI/CD, Versioning)
This part is about acting like a real analytics engineer, not just building in the UI.
Key areas:
- Version control with Git
- PBIX vs PBIP/PBIR
- Deployment pipelines (Dev → Test → Prod)
- XMLA endpoint
- Impact analysis / lineage
1.3.a Version Control with Git
Fabric supports Git integration for workspaces, so you can:
- Connect a Fabric workspace to a Git repository
- Store:
  - Reports as PBIP / PBIR
  - Notebooks as .ipynb files or scripts
  - SQL scripts
  - Pipeline definitions as code
Benefits:
- Team collaboration
- Pull request (PR)-based code review
- History & rollback
- Integration into DevOps/CI/CD pipelines
Exam-view: DP-600 expects you to know why Git integration is useful, not every click.
1.3.b PBIX vs PBIP / PBIR
- PBIX
  - Binary file
  - Traditional Power BI format
  - Harder to source control (large monolithic file)
- PBIP / PBIR
  - Text-based project structure
  - Separates metadata, model definition, and report layout into files/folders
  - Much easier for Git and automated deployments
For DP-600, you must understand:
PBIP/PBIR = Git-friendly representation of Power BI content.
1.3.c Deployment Pipelines
Deployment pipelines support stage-based promotion of content:
- Typical stages: Development → Test → Production
You can deploy:
- Semantic models (datasets)
- Reports
- Dashboards
- Some Fabric items (or via scripts/automation)
Key ideas:
- Rules / parameters per stage:
  - Different data source connections per environment (Dev DB vs Prod DB)
- Ensuring changes are tested in Test before going to Production
- Automatic or manual deployment flows
Exam scenario:
“You must promote changes from Dev to Prod while pointing to different databases. What feature should you use?”
→ Deployment pipelines with rules for data source bindings.
1.3.d XMLA Endpoint (for Semantic Models)
XMLA endpoint allows enterprise management of semantic models using external tools like:
- Tabular Editor
- SQL Server Management Studio (SSMS)
- Scripts for deployment and partition management
Typical actions:
- Partitioning large tables
- Incremental refresh control
- Advanced calculation groups
- Scripting deployments (e.g., ALM Toolkit, DevOps pipelines)
For DP-600, you need conceptual understanding:
XMLA endpoint = enterprise-grade control and automation for Fabric/Power BI semantic models.
1.3.e Impact Analysis & Lineage
Impact analysis means understanding:
- If I change this object, what breaks?
Fabric offers:
- Lineage view:
  - Shows the flow: data source → Dataflow → Lakehouse/Warehouse → semantic model → reports
Use cases:
- Before changing a table schema
- Before deleting a Dataflow or Lakehouse
- Before renaming columns or tables
Exam pattern:
“You plan to change a Lakehouse table schema and want to ensure you understand which semantic models and reports will be affected. What should you use?”
→ Use lineage view or impact analysis tools in Fabric.
2. Prepare Data (Largest Weight in Exam)
This is the “heart” of DP-600:
- Ingest data
- Transform data
- Store/serve data in Lakehouse / Warehouse / KQL DB
- Design data processing patterns (incremental, CDC, real-time)
- Enforce data quality and business rules
2.1 Ingest Data into Fabric
Main tools:
- Data Factory (Fabric Data Factory experience)
- Dataflow Gen2
- Spark Notebooks
- Shortcuts (zero-copy)
a) Data Factory
Fabric Data Factory is the orchestration & ingestion engine:
- Copy activity:
  - Moves data from source to destination (e.g., SQL → Lakehouse, Blob → Warehouse)
- Pipeline templates:
  - Prebuilt patterns (e.g., copy new files daily, ingest from SaaS sources, etc.)
- Hundreds of connectors:
  - SQL Server, Azure SQL, ADLS, Blob Storage, Salesforce, etc.
Use Data Factory when:
- You need pure data movement (ELT)
- You orchestrate multiple steps and sources
- You want scheduled ingestion pipelines
b) Dataflow Gen2
Dataflow Gen2 is Power Query Online in Fabric:
- Perform transformations (cleaning, shaping) using Power Query
- Define ETL logic in a friendly GUI
- Output to:
  - Lakehouse tables
  - Warehouse tables
  - KQL databases
Typical use cases:
- Business-friendly transformation definitions
- Creating conformed dimension tables (e.g., DimCustomer)
- Reusable data preparation logic
c) Spark Notebooks
Spark notebooks are for big data engineering:
- Language options:
  - PySpark, Scala, Spark SQL
- Use cases:
  - Large-scale joins and transforms
  - Processing semi-structured or unstructured data
  - Machine learning and advanced analytics
- Write results into:
  - Delta tables in the Lakehouse
  - Files (Parquet, CSV, etc.)
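A minimal PySpark sketch of this pattern in a Fabric notebook, assuming a CSV extract has already landed in the Lakehouse Files area (the folder and table names are illustrative):

```python
# Read a raw CSV that has landed in the Lakehouse "Files" area.
raw_orders = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/raw/orders/")
)

# Save it as a managed Delta table under "Tables", so it becomes
# queryable through the SQL endpoint and usable by DirectLake models.
(
    raw_orders.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("orders_bronze")
)
```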
d) Shortcuts (Zero-Copy)
Shortcuts let you:
- Reference data in external storage (ADLS Gen2, S3, other OneLake accounts)
- Without copying data into your Lakehouse
Think of a shortcut as:
A pointer to external data, not a copy.
Use for:
- Central governance (single copy of data)
- Multi-domain or multi-tenant architectures
- Simplifying multi-region data access
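Once a shortcut exists, notebooks read it like any other Lakehouse path, even though the data still lives in the external store. A minimal sketch, assuming a Files shortcut named external_sales that points at an ADLS Gen2 folder of Parquet files (the name and format are illustrative):

```python
# The shortcut behaves like a local folder under the Lakehouse,
# but no data has been copied into OneLake (zero-copy reference).
external_sales = spark.read.parquet("Files/external_sales/")

external_sales.printSchema()
print(external_sales.count())
```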
2.2 Transform Data
Three main transformation engines:
- Dataflow Gen2 (Power Query)
- Spark Notebooks
- T-SQL in Warehouse / SQL Endpoint
1) Dataflow Gen2
Good for:
- Business-friendly logic
- Medium-size data
- Tabular transformations
- Slowly changing dimensions (when combined with logic)
Examples:
- Clean column names, change data types
- Merge (join) multiple sources
- Filter out errors
- Create custom columns (e.g., MonthNumber = Date.Month([OrderDate]))
2) Spark Notebooks
Good for:
- Very large datasets
- Complex transformations where M/Power Query is limiting
- Streaming + batch
- AI/ML workflows
You can:
- Use PySpark to reshape and enrich large tables
- Optimize Delta tables with features like OPTIMIZE and VACUUM (see the sketch below)
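A minimal sketch of routine Delta maintenance from a Fabric notebook, assuming the Lakehouse table orders_bronze from the earlier example (Spark SQL issued via PySpark):

```python
# Compact many small files into fewer, larger ones for faster scans.
spark.sql("OPTIMIZE orders_bronze")

# Remove data files no longer referenced by the Delta log and older
# than the retention window (168 hours = the default 7 days).
spark.sql("VACUUM orders_bronze RETAIN 168 HOURS")
```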
3) T-SQL (Warehouse / SQL Endpoint)
Good for:
- Teams with strong SQL skills
- ELT patterns (Load raw → Clean → Transform using SQL)
- Dimensional modeling inside Warehouse (Dim/Fact tables)
- Creating views and stored procedures
You can:
- Write SQL scripts
- Use them in pipelines as SQL script activities
- Build business logic in the Warehouse itself
Exam angle:
You must choose the right engine for each scenario:
- Business user ETL? → Dataflow Gen2
- Heavy big data & ML? → Spark notebooks
- Standard enterprise SQL team? → T-SQL in Warehouse
2.3 Lakehouse vs Warehouse (Plus KQL DB)
Lakehouse
- Combines:
  - Files (raw storage)
  - Tables (Delta format)
- Ideal for:
  - Medallion architecture: Bronze (raw) → Silver (cleaned) → Gold (business-ready); see the sketch after this list
  - Big data, semi-structured files, logs
- Works best with:
  - Spark notebooks
  - Dataflow Gen2
  - DirectLake-based semantic models
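A minimal PySpark sketch of one medallion hop (Bronze → Silver), assuming the orders_bronze table from the ingestion example; column names are illustrative:

```python
from pyspark.sql import functions as F

bronze = spark.read.table("orders_bronze")

# Silver layer: de-duplicated, correctly typed, lightly cleaned data.
silver = (
    bronze
    .dropDuplicates(["OrderID"])
    .withColumn("OrderDate", F.to_date("OrderDate"))
    .withColumn("Amount", F.col("Amount").cast("decimal(18,2)"))
    .filter(F.col("Amount").isNotNull())
)

silver.write.format("delta").mode("overwrite").saveAsTable("orders_silver")
```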
Warehouse
- SQL-first experience:
  - Tables, views, stored procedures
- Ideal for:
  - Structured, relational modeling
  - Enterprise reporting, financial analytics
  - Teams strong in SQL
- Serves:
  - Semantic models
  - Reports, directly via the SQL engine
Typical exam scenario:
- “Customer 360 view, finance, heavily structured reporting, strong SQL skills → choose Warehouse.”
- “High-volume logs, clickstream, semi-structured events → choose Lakehouse + Spark.”
KQL Database (Kusto Query Language)
- Optimized for log and time-series analytics
- Used for:
  - Real-time telemetry
  - Observability data
  - Operational analytics
2.4 Patterns: Incremental, CDC, Real-Time
Incremental Load
- Process only new or changed data.
- Benefits:
  - Performance
  - Cost
  - Reduced load windows
- Examples:
  - Dataflow Gen2 with an incremental refresh policy (based on a date column)
  - SQL-based approaches using a LastModifiedDate watermark (see the sketch below)
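The same watermark pattern, sketched in PySpark for a Lakehouse target (table and column names are illustrative; a SQL implementation follows the same steps):

```python
from pyspark.sql import functions as F

target = "sales_clean"

# 1. Find the current high-water mark in the target table.
last_loaded = (
    spark.read.table(target)
    .agg(F.max("LastModifiedDate").alias("wm"))
    .collect()[0]["wm"]
)

# 2. Keep only rows that changed since the last load.
source = spark.read.table("sales_staging")
if last_loaded is None:
    changed = source  # first load: take everything
else:
    changed = source.filter(F.col("LastModifiedDate") > F.lit(last_loaded))

# 3. Append the new slice into the target (or merge, if updates can occur).
changed.write.format("delta").mode("append").saveAsTable(target)
```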
Change Data Capture (CDC) / Change Feed
- Tracks row-level changes from a source (insert, update, delete)
- Good for:
  - Keeping a Warehouse or Lakehouse in sync with OLTP systems
  - Near-real-time updates
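Applying captured changes to a Lakehouse table usually ends in a Delta merge. A minimal PySpark sketch, assuming a pipeline has staged CDC rows into sales_changes with an operation column of 'insert', 'update', or 'delete' (all names are illustrative):

```python
from delta.tables import DeltaTable

changes = spark.read.table("sales_changes")        # staged CDC rows
target = DeltaTable.forName(spark, "sales_clean")  # Lakehouse Delta table

(
    target.alias("t")
    .merge(changes.alias("c"), "t.SaleID = c.SaleID")
    .whenMatchedDelete(condition="c.operation = 'delete'")
    .whenMatchedUpdateAll(condition="c.operation = 'update'")
    .whenNotMatchedInsertAll(condition="c.operation != 'delete'")
    .execute()
)
```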
Real-Time / Streaming
- Fabric also offers Real-Time Intelligence components such as:
  - Eventstreams
  - KQL databases
  - Real-time dashboards
- Use cases:
  - IoT streams
  - Website event tracking
  - Monitoring metrics with low latency
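On the Lakehouse side, Spark Structured Streaming can also land events with low latency. A minimal sketch, assuming JSON events arrive in a Files landing folder (paths, schema, and table name are illustrative):

```python
from pyspark.sql.types import StructType, StructField, StringType, TimestampType, DoubleType

event_schema = StructType([
    StructField("deviceId", StringType()),
    StructField("eventTime", TimestampType()),
    StructField("reading", DoubleType()),
])

events = (
    spark.readStream
    .schema(event_schema)
    .json("Files/landing/events/")
)

# Continuously append events to a Delta table; the checkpoint folder
# tracks progress so the stream can restart safely after interruptions.
query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "Files/checkpoints/events/")
    .toTable("events_raw")
)
```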
3. Implement and Manage Semantic Models
This is the Power BI + Fabric modeling side:
- Storage modes
- Modeling fundamentals
- DAX basics
- Security in the model (RLS/OLS)
- Performance & optimization
3.1 Storage Modes: Import, DirectQuery, DirectLake, Composite
- Import
- Data is stored inside the model in memory.
- Very fast query performance.
- Requires scheduled or manual refresh.
- Good for small–medium datasets and users who can tolerate some latency in freshness.
- DirectQuery
- No data stored in the model; all queries go to the source (SQL, etc.)
- Good for real-time or near-real-time when data must stay at source.
- Slower and more limited in modeling features; depends heavily on source performance.
- DirectLake
- Fabric-only mode.
- Queries Delta tables in OneLake directly, with caching.
- Combines advantages of Import (fast) and DirectQuery (near-real-time) for Fabric-based data.
- Requires Fabric/Premium capacity and proper Lakehouse/Warehouse setup.
- Composite Models
- Mixes Import + DirectQuery + DirectLake in one model.
- Very flexible but more complex to manage.
Exam scenario example:
“You must query a large Fabric Lakehouse with low latency and avoid scheduled refresh, and you have Fabric capacity. Which storage mode is recommended?”
→ DirectLake.
3.2 Modeling Fundamentals
Star schema is the gold standard:
- Fact tables
  - Large, numeric, transactional data
  - E.g., FactSales, FactOrders
- Dimension tables
  - Smaller, descriptive business entities
  - E.g., DimCustomer, DimProduct, DimDate
Benefits:
- Simpler queries
- Better performance
- More intuitive modeling
Relationships:
- Mostly one-to-many (Dimension → Fact)
- Use single direction filtering (Dimension filters Fact) unless there’s a strong reason otherwise.
- Many-to-many relationships: use carefully; they can create ambiguous filter paths.
Best practices:
- Use surrogate keys (integers) for relationships
- Keep dimension tables wide but not huge
- Avoid snowflake if performance is critical (flatten dimensions where reasonable)
3.3 DAX & Calculations
You don’t need to be a DAX guru for DP-600, but you should understand:
- Calculated columns vs measures:
  - Calculated columns:
    - Computed at data refresh time
    - Stored in the model
    - Evaluated in row context
  - Measures:
    - Computed at query time
    - Evaluated in filter context
- Row context vs filter context:
  - Row context → operations that work “row by row”
  - Filter context → the filters that apply to a measure (via slicers, visuals, filters, etc.)
- Basic time intelligence:
  - TOTALYTD, SAMEPERIODLASTYEAR, DATEADD, etc.
3.4 Security in Semantic Models
Again, security at the model level:
- RLS (Row-Level Security):
  - Table filter expressions per role
  - Example: [Country] = "Bangladesh" for certain users
- OLS (Object-Level Security):
  - Hides specific tables/columns from roles
Important:
- RLS/OLS apply regardless of storage mode (Import/DirectQuery/DirectLake), with some advanced nuances.
- Even if a user has Build, they still only see rows allowed by RLS.
3.5 Performance & Optimization
Large or complex models can slow down. Common optimization strategies:
- Use star schema instead of snowflake
- Avoid many-to-many relationships and bi-directional filters unless necessary
- Use aggregations
- Avoid overly complex DAX measures that iterate row-by-row over large tables
- Use proper types & summarization to reduce memory usage
- Use the XMLA endpoint and Tabular Editor for:
  - Partitioning fact tables
  - Incremental refresh tuning
  - Calculation groups
Exam-style question:
“A large semantic model is slow. What are effective techniques to improve performance?”
→ Star schema, aggregations, optimizing relationships, partitioning, and simplifying DAX.
4. Practical Study Plan (How to Use This Note)
If you already have some hands-on experience with Fabric, work through these steps:
Step 1 – Fabric Objects & Admin
- Understand:
  - Lakehouse, Warehouse, Dataflow Gen2, Pipelines, Notebooks, semantic models, KQL DB
- Learn:
  - Tenant vs Capacity vs Workspace
  - Critical settings: “Users can create Fabric items”, guest access, export settings, labels, DirectLake requirements
Step 2 – Prepare Data (Hands-on + Theory)
- Build a small scenario:
  - Ingest CSV → Dataflow Gen2 → Lakehouse
  - Transform with the Dataflow (custom column, type changes)
  - Optionally, use a Pipeline to orchestrate the Dataflow + a Notebook
- Explore:
  - Lakehouse Tables vs Files, Delta basics
  - Shortcuts to external storage (if possible)
Step 3 – Semantic Model & Reporting
- Create a semantic model from the Lakehouse:
  - Try both Import and DirectLake
- Build:
  - A star-schema style model
  - Some simple DAX measures (Total Sales, YTD, prior-year comparison)
- Configure:
  - RLS roles (e.g., region-based)
Step 4 – Governance & Lifecycle
- Review:
  - Workspace roles
  - Item permissions & Build permission
  - Sensitivity labels (set up a simple label and observe its effects)
- Understand:
  - Git integration & the PBIP concept
  - Deployment pipelines: Dev → Test → Prod
  - Lineage view and impact analysis