DP-600 Fabric Analytics Engineer – Structured Study Notes

(Governance, Admin, Lifecycle, Data Prep, Semantic Models)

0. Big Picture – What DP-600 Expects From You

The DP-600 exam is about designing, building, governing, and optimizing analytics solutions in Microsoft Fabric.

At a high level you must be able to:

  1. Maintain a data analytics solution
    – Governance, security, admin, lifecycle, impact analysis

  2. Prepare data
    – Ingest, transform, store in Lakehouse/Warehouse/KQL DB, apply patterns like incremental and CDC

  3. Implement and manage semantic models
    – Modeling, storage modes, DAX basics, RLS/OLS, performance

This note covers all of these areas and ties them together with exam-style thinking.


1. Maintain a Data Analytics Solution

(Governance, Admin, Lifecycle)

This area covers:

  1. Security & governance (access, RLS/CLS/OLS, sensitivity labels, endorsement)
  2. Admin: workspaces, capacities, roles
  3. Development lifecycle: Git, PBIP, deployment pipelines, XMLA
  4. Monitoring & impact analysis (lineage)

1.1 Security & Access Control in Fabric

Security is layered. Think about:

  • Workspace-level roles
  • Item-level permissions
  • Data-level security (RLS/CLS/OLS)
  • File-level security
  • Sensitivity labels
  • Endorsement

1.1.a Workspace-Level Access

Each workspace has roles:

  • Admin

    • Full control over the workspace
    • Can change settings, add/remove users, delete items
    • Used by platform owners / team leads
  • Member / Contributor

    • Can create and edit items (Lakehouse, Dataflow, Reports, Pipelines, Notebooks)
    • Member can additionally share items and add users with lower roles; Contributor cannot
    • Cannot change workspace-level admin settings (like capacity)
  • Viewer

    • Read-only access: can view reports, dashboards, semantic models, etc.
    • Cannot create, edit, or publish new items

Common exam scenario:
“A user can open a report but cannot edit it or create a new dataflow in the same workspace. What is the likely cause?”
→ The user is a Viewer and needs at least Contributor/Member.


1.1.b Item-Level Permissions & Build

Even inside a workspace, every item has its own permissions:

  • Lakehouse
  • Warehouse
  • Semantic model (dataset)
  • Report
  • Notebook
  • Dataflow

A user might be able to view items in the workspace (for example, as a Viewer or via a shared report) but still:

  • Cannot use a semantic model to build new reports
  • Cannot Analyze in Excel

The key concept is Build permission on the semantic model.

Build permission is required to:

  • Create new reports from a semantic model
  • Use Analyze in Excel
  • Use the semantic model from another workspace (when cross-workspace is allowed)
  • Allow composite models based on that dataset

Exam pattern:
“User can view a report but cannot use ‘Analyze in Excel’ or create a new report from the dataset.”
→ They are missing Build permission on the semantic model.


1.1.c RLS, CLS, OLS & File-Level Security

Security doesn’t stop at item level; it also applies to data inside the model.

Row-Level Security (RLS)

  • Restricts which rows a user can see.
  • Example:

    • User in “Bangladesh” role only sees rows where Country = 'BD'.
  • Implemented in the semantic model (Power BI dataset / Fabric semantic model).

Column-Level Security (CLS)

  • Controls which columns a user can see.
  • Example:

    • Hide Salary, SSN, PhoneNumber from certain roles.

Object-Level Security (OLS)

  • Hides entire tables or columns from users or roles.
  • Example:

    • A Finance table is completely hidden from Sales users.

File-Level Security (OneLake / Files)

In Fabric:

  • Lakehouse has:

    • Tables → usually accessed via SQL endpoint & semantic model
    • Files → raw storage in OneLake filesystem

You can secure Files separately:

  • A user might:

    • Query tables via semantic model
    • But not be allowed to browse the Lakehouse Files section
  • That requires filesystem / OneLake permissions, which are separate.

Exam pattern:
“User can query a Lakehouse table via the SQL endpoint but cannot open the Files area in the Lakehouse.”
→ They are missing file-system permission on OneLake.


1.1.d Sensitivity Labels (Purview Integration)

Sensitivity labels classify and protect data. Common examples:

  • Public
  • General
  • Confidential
  • Highly Confidential – No Export

Once applied to semantic models, reports, etc., labels can:

  • Block Export to Excel / CSV
  • Block Publish to web
  • Restrict external sharing (especially with encrypted labels)

Important admin settings include:

  • Allow users to apply sensitivity labels
    – Without this, labels aren’t available in Fabric.

  • Automatically apply sensitivity labels to downstream content
    – A label applied to a semantic model can propagate to reports, dashboards, etc.

  • Allow Microsoft Purview to secure AI interactions
    – Controls whether Copilot/AI can interact with labeled data under Purview governance.

Exam pattern:
“After applying a label to a report, users can view it but can no longer export the data. Why?”
→ The label (e.g., Highly Confidential – No Export) is configured to block export.


1.1.e Endorsement: Promoted & Certified

Endorsement helps users know which data they can trust.

  • Promoted

    • Indicates team-level / local confidence
    • Signals: “This is useful, but not centrally governed.”
  • Certified

    • Indicates organization-level endorsement
    • Represents single source of truth
    • Generally restricted to a set of data stewards / admins

Endorsed items typically show in:

  • Data hub
  • OneLake data explorer
  • Search and discovery areas

Exam pattern:
“The central BI team wants certain semantic models to be recognized as single sources of truth across the organization. What should they use?”
→ Use Certification with appropriate governance.


1.2 Admin & Governance: Tenant, Capacity, Workspace

A lot of DP-600 questions are about “which setting to change?”.

The mental model:

  • Tenant = organization-wide policies
  • Capacity = compute configuration (F64, F128, Premium)
  • Workspace = team/project scope

Tenant

Controls global features like:

  • Whether Fabric is enabled at all
  • Whether users can create Fabric items
  • Export policies (Excel/CSV, PBIX download)
  • Guest users and external sharing
  • Sensitivity labels and Purview/AI integration
  • Trial activation (“Users can try Fabric paid features”)

Think: “This affects everyone / every workspace” → Tenant.


Capacity

Controls the compute resources and workloads:

  • Type of capacity: Fabric F-SKU, Power BI Premium, Embedded, etc.
  • Region and SKU size (F64, F128, etc.)
  • Workload settings:

    • Spark / Data Engineering
    • Dataflows
    • Pipelines
    • DirectLake / cache behavior
  • Ability to pause and resume capacity

  • Concurrency limits and memory settings

Common issues:

  • Dataflows stuck in queue → capacity under pressure / workload disabled
  • Notebooks fail to start → Spark capacity exhausted
  • DirectLake becomes slow → capacity overloaded, cache eviction

Workspace

The workspace ties everything together:

  • Assigned to a capacity (shared vs Fabric/Premium)
  • Used by teams to store:

    • Lakehouses
    • Warehouses
    • Dataflows
    • Pipelines
    • Semantic models
    • Reports

Key points:

  • Shared capacity workspaces:

    • Can host Power BI content (reports, datasets, dashboards)
    • Cannot host Fabric items such as Lakehouse, Warehouse, Dataflow Gen2, Notebooks, Pipelines, or DirectLake semantic models
  • Fabric/Premium capacity workspaces:

    • Can use the full Fabric experiences.

Exam pattern:
“A workspace does not show the option to create a Lakehouse or Dataflow Gen2, but users can still create reports.”
→ The workspace is on Shared capacity, or the tenant setting “Users can create Fabric items” is disabled.


1.3 Analytics Development Lifecycle (CI/CD, Versioning)

This part is about acting like a real analytics engineer, not just building in the UI.

Key areas:

  1. Version control with Git
  2. PBIX vs PBIP/PBIR
  3. Deployment pipelines (Dev → Test → Prod)
  4. XMLA endpoint
  5. Impact analysis / lineage

1.3.a Version Control with Git

Fabric supports Git integration for workspaces, so you can:

  • Connect a Fabric workspace to a Git repository
  • Store:

    • Reports as PBIP / PBIR
    • Notebooks as .ipynb or scripts
    • SQL scripts
    • Pipeline definitions as code

Benefits:

  • Team collaboration
  • Pull request (PR)-based code review
  • History & rollback
  • Integration into DevOps/CI/CD pipelines

Exam-view: DP-600 expects you to know why Git integration is useful, not every click.


1.3.b PBIX vs PBIP / PBIR

  • PBIX

    • Binary file
    • Traditional Power BI format
    • Harder to source control (large monolithic file)
  • PBIP / PBIR

    • Text-based project structure
    • Separates metadata, model, and report layout into files/folders
    • Much easier for Git and automated deployments

For DP-600, you must understand:
PBIP/PBIR = Git-friendly representation of Power BI content.


1.3.c Deployment Pipelines

Deployment pipelines support stage-based promotion of content:

  • Typical stages:

    • Development → Test → Production

You can deploy:

  • Semantic models (datasets)
  • Reports
  • Dashboards
  • Some Fabric items (or via scripts/automation)

Key ideas:

  • Rules / parameters per stage:

    • Different data source connections per environment (Dev DB vs Prod DB)
  • Ensuring changes are tested in Test before going to Production

  • Automatic or manual deployment flows

Exam scenario:
“You must promote changes from Dev to Prod while pointing to different databases. What feature should you use?”
→ Deployment pipelines with rules for data source bindings.


1.3.d XMLA Endpoint (for Semantic Models)

XMLA endpoint allows enterprise management of semantic models using external tools like:

  • Tabular Editor
  • SQL Server Management Studio (SSMS)
  • Scripts for deployment and partition management

Typical actions:

  • Partitioning large tables
  • Incremental refresh control
  • Advanced calculation groups
  • Scripting deployments (e.g., ALM Toolkit, DevOps pipelines)

For DP-600, you need conceptual understanding:
XMLA endpoint = enterprise-grade control and automation for Fabric/Power BI semantic models.


1.3.e Impact Analysis & Lineage

Impact analysis means understanding:

  • If I change this object, what breaks?

Fabric offers:

  • Lineage view:

    • Shows flow: Data source → Dataflow → Lakehouse/Warehouse → Semantic model → Reports

Use cases:

  • Before changing a table schema
  • Before deleting a Dataflow or Lakehouse
  • Before renaming columns or tables

Exam pattern:
“You plan to change a Lakehouse table schema and want to ensure you understand which semantic models and reports will be affected. What should you use?”
→ Use lineage view or impact analysis tools in Fabric.


2. Prepare Data (Largest Weight in Exam)

This is the “heart” of DP-600:

  • Ingest data
  • Transform data
  • Store/serve data in Lakehouse / Warehouse / KQL DB
  • Design data processing patterns (incremental, CDC, real-time)
  • Enforce data quality and business rules

2.1 Ingest Data into Fabric

Main tools:

  1. Data Factory (Fabric Data Factory experience)
  2. Dataflow Gen2
  3. Spark Notebooks
  4. Shortcuts (zero-copy)

a) Data Factory

Fabric Data Factory is the orchestration & ingestion engine:

  • Copy Activity:

    • Move data from source to destination (e.g., SQL → Lakehouse, Blob → Warehouse)
  • Pipeline templates:

    • Prebuilt patterns (e.g., copy new files daily, ingest from SaaS, etc.)
  • Supports hundreds of connectors:

    • SQL Server, Azure SQL, ADLS, Blob, Salesforce, etc.

Use Data Factory when:

  • You need pure data movement (ELT)
  • You orchestrate multiple steps and sources
  • You want scheduled ingestion pipelines

b) Dataflow Gen2

Dataflow Gen2 is Power Query Online in Fabric:

  • Perform transformations (cleaning, shaping) using Power Query
  • Define ETL logic in a friendly GUI
  • Output to:

    • Lakehouse tables
    • Warehouse tables
    • KQL Databases

Typical use cases:

  • Business-friendly transformation definitions
  • Creating conformed dimension tables (e.g., DimCustomer)
  • Reusable data preparation logic

c) Spark Notebooks

Spark notebooks are for big data engineering:

  • Language options:

    • PySpark, Scala, Spark SQL
  • Use cases:

    • Large-scale joins and transforms
    • Processing semi-structured or unstructured data
    • Machine learning and advanced analytics
  • Write results into (see the sketch after this list):

    • Delta tables in the Lakehouse
    • Files (Parquet, CSV, etc.)
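
A minimal PySpark sketch of this pattern — read raw CSV files from the Lakehouse Files area and land them as a Delta table. Paths and table names are hypothetical, and `spark` is the session a Fabric notebook provides automatically:

```python
from pyspark.sql import SparkSession

# Already provided in a Fabric notebook; getOrCreate() simply returns it.
spark = SparkSession.builder.getOrCreate()

raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/raw/orders/")            # hypothetical raw CSVs in the Files area
)

clean = raw.dropDuplicates().na.drop(subset=["OrderID"])

(
    clean.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("orders_bronze")        # lands as a Delta table in the Lakehouse
)
```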

d) Shortcuts (Zero-Copy)

Shortcuts let you:

  • Reference data in external storage (ADLS Gen2, Amazon S3) or in other OneLake workspaces
  • Without copying data into your Lakehouse

Think of a shortcut as:

A pointer to external data, not a copy.
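
Once created, a shortcut is read like any other Lakehouse path — a minimal sketch, assuming a hypothetical Files shortcut named external_sales pointing at ADLS Gen2, Parquet-format data, and a notebook with this Lakehouse set as its default:

```python
# Hedged sketch: the data stays in the external store; only metadata lives in OneLake.
# "external_sales" is a hypothetical shortcut under the Lakehouse Files area.
# `spark` is the session provided by the Fabric notebook.

df = spark.read.parquet("Files/external_sales/2024/")   # read through the shortcut
df.show(5)
```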

Use for:

  • Central governance (single copy of data)
  • Multi-domain or multi-tenant architectures
  • Simplifying multi-region data access

2.2 Transform Data

Three main transformation engines:

  1. Dataflow Gen2 (Power Query)
  2. Spark Notebooks
  3. T-SQL in Warehouse / SQL Endpoint

1) Dataflow Gen2

Good for:

  • Business-friendly logic
  • Medium-size data
  • Tabular transformations
  • Slowly changing dimensions (when combined with logic)

Examples:

  • Clean column names, change data types
  • Merge (join) multiple sources
  • Filter out errors
  • Create calculated columns (e.g., MonthNumber = Date.Month([OrderDate]))

2) Spark Notebooks

Good for:

  • Very large datasets
  • Complex transformations where M/Power Query is limiting
  • Streaming + batch
  • AI/ML workflows

You can:

  • Use PySpark to transform and reshape large tables
  • Optimize with Delta features (OPTIMIZE, VACUUM) — see the sketch below
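
A minimal sketch of a notebook-based transform plus Delta maintenance (bronze → silver in medallion terms); table names are hypothetical and `spark` comes from the Fabric notebook:

```python
from pyspark.sql import functions as F

orders = spark.read.table("orders_bronze")              # hypothetical bronze table

silver = (
    orders
    .withColumn("OrderDate", F.to_date("OrderDate"))    # fix data types
    .withColumn("MonthNumber", F.month("OrderDate"))    # derived column
    .filter(F.col("Quantity") > 0)                      # simple data quality rule
)

silver.write.format("delta").mode("overwrite").saveAsTable("orders_silver")

# Delta maintenance: compact small files and clean up old snapshots
spark.sql("OPTIMIZE orders_silver")
spark.sql("VACUUM orders_silver RETAIN 168 HOURS")
```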

3) T-SQL (Warehouse / SQL Endpoint)

Good for:

  • Teams with strong SQL skills
  • ELT patterns (Load raw → Clean → Transform using SQL)
  • Dimensional modeling inside Warehouse (Dim/Fact tables)
  • Creating views and stored procedures

You can:

  • Write SQL scripts
  • Use them in pipelines as SQL script activities
  • Build business logic in the Warehouse itself

Exam angle:
You must choose the right engine for each scenario:

  • Business user ETL? → Dataflow Gen2
  • Heavy big data & ML? → Spark notebooks
  • Standard enterprise SQL team? → T-SQL in Warehouse

2.3 Lakehouse vs Warehouse (Plus KQL DB)

Lakehouse

  • Combines:

    • Files (raw storage)
    • Tables (Delta format)
  • Ideal for:

    • Medallion architecture:
      • Bronze (raw)
      • Silver (cleaned)
      • Gold (business)
    • Big data, semi-structured files, logs
  • Works best with:

    • Spark notebooks
    • Dataflow Gen2
    • DirectLake-based semantic models

Warehouse

  • SQL-first experience:

    • Tables, views, stored procedures
  • Ideal for:

    • Structured, relational modeling
    • Enterprise reporting, financial analytics
    • Teams strong in SQL
  • Serves:

    • Semantic models
    • Reports directly via SQL engine

Typical exam scenario:

  • “Customer 360 view, finance, heavily structured reporting, strong SQL skills → choose Warehouse.”
  • “High-volume logs, clickstream, semi-structured events → choose Lakehouse + Spark.”

KQL Database (Kusto Query Language)

  • Optimized for log and time-series analytics
  • Used for:

    • Real-time telemetry
    • Observability data
    • Operational analytics

2.4 Patterns: Incremental, CDC, Real-Time

Incremental Load

  • Process only new or changed data.
  • Benefits:

    • Performance
    • Cost
    • Reduced load windows

Examples:

  • Dataflow Gen2 with incremental refresh policy (by date column)
  • SQL-based approaches using a LastModifiedDate watermark (see the sketch below)
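
A minimal watermark sketch in PySpark, using hypothetical control and staging tables (the same pattern works in T-SQL against a Warehouse); `spark` is the Fabric notebook session:

```python
from pyspark.sql import functions as F

# 1. Read the last high-water mark from a small control table
#    (first-run handling, when no watermark exists yet, is omitted here)
last_mark = (
    spark.read.table("etl_watermarks")
    .filter(F.col("table_name") == "orders")
    .agg(F.max("last_modified"))
    .collect()[0][0]
)

# 2. Keep only rows changed since the previous run
incoming = (
    spark.read.table("orders_staging")
    .filter(F.col("LastModifiedDate") > F.lit(last_mark))
)

# 3. Append the delta and advance the watermark
incoming.write.format("delta").mode("append").saveAsTable("orders_silver")

new_mark = incoming.agg(F.max("LastModifiedDate")).collect()[0][0]
if new_mark is not None:
    spark.sql(
        f"UPDATE etl_watermarks SET last_modified = CAST('{new_mark}' AS TIMESTAMP) "
        "WHERE table_name = 'orders'"
    )
```

In practice you would run this from a pipeline or scheduled notebook and add error handling for empty increments.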

Change Data Capture (CDC) / Change Feed

  • Tracks row-level changes from a source (insert, update, delete)
  • Good for:

    • Keeping Warehouse or Lakehouse in sync with OLTP systems
    • Near-real-time updates (a Delta change feed read is sketched below)
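
Within Fabric itself, Delta's change data feed offers a similar capability for Lakehouse tables — a hedged sketch, assuming the source table was created with delta.enableChangeDataFeed = true and using hypothetical names:

```python
# Hedged sketch: requires the table property delta.enableChangeDataFeed = true.
# `spark` is the session provided by the Fabric notebook.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 10)      # or startingTimestamp, e.g. "2024-01-01"
    .table("orders_bronze")
)

# _change_type is one of: insert, update_preimage, update_postimage, delete
changes.filter("_change_type != 'update_preimage'").show()
```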

Real-Time / Streaming

  • Fabric also offers Real-Time Intelligence components like:

    • Eventstreams
    • KQL DB
    • Real-time dashboards

Use case:

  • IoT streams
  • Website event tracking
  • Monitoring metrics with low latency

3. Implement and Manage Semantic Models

This is the Power BI + Fabric modeling side:

  1. Storage modes
  2. Modeling fundamentals
  3. DAX basics
  4. Security in the model (RLS/OLS)
  5. Performance & optimization

3.1 Storage Modes: Import, DirectQuery, DirectLake, Composite

  1. Import
    • Data is stored inside the model in memory.
    • Very fast query performance.
    • Requires scheduled or manual refresh.
    • Good for small–medium datasets and users who can tolerate some latency in data freshness.
  2. DirectQuery
    • No data is stored in the model; all queries go to the source (SQL, etc.).
    • Good for real-time or near-real-time scenarios where data must stay at the source.
    • Slower and more limited in modeling features; depends heavily on source performance.
  3. DirectLake
    • Fabric-only mode.
    • Queries Delta tables in OneLake directly, with caching.
    • Combines advantages of Import (fast) and DirectQuery (near-real-time) for Fabric-based data.
    • Requires Fabric/Premium capacity and a proper Lakehouse/Warehouse setup.
  4. Composite Models
    • Mixes Import + DirectQuery + DirectLake in one model.
    • Very flexible but more complex to manage.

Exam scenario example:
“You must query a large Fabric Lakehouse with low latency and avoid scheduled refresh, and you have Fabric capacity. Which storage mode is recommended?”
→ DirectLake.


3.2 Modeling Fundamentals

Star schema is the gold standard:

  • Fact tables

    • Large, numeric, transactional data
    • E.g., FactSales, FactOrders
  • Dimension tables

    • Smaller, descriptive, business entities
    • E.g., DimCustomer, DimProduct, DimDate

Benefits:

  • Simpler queries
  • Better performance
  • More intuitive modeling

Relationships:

  • Mostly one-to-many (Dimension → Fact)
  • Use single direction filtering (Dimension filters Fact) unless there’s a strong reason otherwise.
  • Many-to-many relationships: use carefully; they can create ambiguous filter paths.

Best practices:

  • Use surrogate keys (integers) for relationships
  • Keep dimension tables wide but not huge
  • Avoid snowflake if performance is critical (flatten dimensions where reasonable)

3.3 DAX & Calculations

You don’t need to be a DAX guru for DP-600, but you should understand:

  • Calculated columns vs Measures:

    • Calculated columns:
      • Computed at data refresh time
      • Stored in the model
      • Row context
    • Measures:
      • Computed at query time
      • Based on filter context
  • Row context vs Filter context:

    • Row context → operations that work “row by row”
    • Filter context → the filters that apply to the measure (via slicers, filters, etc.)
  • Basic time intelligence:

    • TOTALYTD, SAMEPERIODLASTYEAR, DATEADD, etc.

3.4 Security in Semantic Models

Again, security at the model level:

  • RLS (Row-Level Security):

    • Table filter expressions per role
    • Example: [Country] = "Bangladesh" for certain users
  • OLS (Object-Level Security):

    • Hide specific tables/columns from roles

Important:

  • RLS/OLS apply regardless of storage mode (Import/DirectQuery/DirectLake), with some advanced nuances.
  • Even if a user has Build, they still only see rows allowed by RLS.

3.5 Performance & Optimization

Large or complex models can slow down. Common optimization strategies:

  • Use star schema instead of snowflake
  • Avoid many-to-many relationships and bi-directional filters unless necessary
  • Use aggregations
  • Avoid overly complex DAX measures that iterate row-by-row over large tables
  • Use proper types & summarization to reduce memory usage
  • Use XMLA endpoint and Tabular Editor for:

    • Partitioning fact tables
    • Incremental refresh tuning
    • Calculation groups

Exam-style question:
“A large semantic model is slow. What are effective techniques to improve performance?”
→ Star schema, aggregations, optimizing relationships, partitioning, and simplifying DAX.


4. Practical Study Plan (How to Use This Note)

Since you already have some hands-on experience with Fabric:

Step 1 – Fabric Objects & Admin

  • Understand:

    • Lakehouse, Warehouse, Dataflow Gen2, Pipelines, Notebooks, Semantic models, KQL DB
  • Learn:

    • Tenant vs Capacity vs Workspace
    • Critical settings: “Users can create Fabric items”, guest access, export settings, labels, DirectLake requirements

Step 2 – Prepare Data (Hands-on + Theory)

  • Build a small scenario:

    • Ingest CSV → Dataflow Gen2 → Lakehouse
    • Transform with Dataflow (custom column, type changes)
    • Optionally, use a Pipeline to orchestrate Dataflow + Notebook
  • Explore:

    • Lakehouse Tables vs Files, Delta basics
    • Shortcuts to some external storage (if possible)

Step 3 – Semantic Model & Reporting

  • Create a semantic model from the Lakehouse:

    • Try Import and DirectLake
  • Build:

    • A star-schema style model
    • Some simple DAX measures (Total Sales, YTD, Last Year comparison)
  • Configure:

    • RLS roles (e.g., region-based)

Step 4 – Governance & Lifecycle

  • Review:

    • Workspace roles
    • Item permissions & Build permission
    • Sensitivity labels (set up a simple label and see its effects)
  • Understand:

    • Git integration & PBIP concept
    • Deployment pipelines Dev → Test → Prod
    • Lineage view and impact analysis
