Sajjad Rahman

DP-600 Fabric Analytics Engineer – Structured Study Notes

(Governance, Admin, Lifecycle, Data Prep, Semantic Models)

0. Big Picture – What DP-600 Expects From You

The DP-600 exam is about designing, building, governing, and optimizing analytics solutions in Microsoft Fabric.

At a high level you must be able to:

  1. Maintain a data analytics solution
    – Governance, security, admin, lifecycle, impact analysis

  2. Prepare data
    – Ingest, transform, store in Lakehouse/Warehouse/KQL DB, apply patterns like incremental and CDC

  3. Implement and manage semantic models
    – Modeling, storage modes, DAX basics, RLS/OLS, performance

This note covers all of these areas and ties them together with exam-style thinking.


1. Maintain a Data Analytics Solution

(Governance, Admin, Lifecycle)

This area covers:

  1. Security & governance (access, RLS/CLS/OLS, sensitivity labels, endorsement)
  2. Admin: workspaces, capacities, roles
  3. Development lifecycle: Git, PBIP, deployment pipelines, XMLA
  4. Monitoring & impact analysis (lineage)

1.1 Security & Access Control in Fabric

Security is layered. Think about:

  • Workspace-level roles
  • Item-level permissions
  • Data-level security (RLS/CLS/OLS)
  • File-level security
  • Sensitivity labels
  • Endorsement

1.1.a Workspace-Level Access

Each workspace has roles:

  • Admin

    • Full control over the workspace
    • Can change settings, add/remove users, delete items
    • Used by platform owners / team leads
  • Member / Contributor

    • Can create and edit items (Lakehouse, Dataflow, Reports, Pipelines, Notebooks)
    • Cannot change workspace-level admin settings (like capacity)
  • Viewer

    • Read-only access: can view reports, dashboards, semantic models, etc.
    • Cannot create, edit, or publish new items

Common exam scenario:
“A user can open a report but cannot edit it or create a new dataflow in the same workspace. What is the likely cause?”
→ The user is a Viewer and needs at least Contributor/Member.


1.1.b Item-Level Permissions & Build

Even inside a workspace, every item has its own permissions:

  • Lakehouse
  • Warehouse
  • Semantic model (dataset)
  • Report
  • Notebook
  • Dataflow

A user might be Contributor on the workspace but still:

  • Cannot use a semantic model to build new reports
  • Cannot Analyze in Excel

The key concept is Build permission on the semantic model.

Build permission is required to:

  • Create new reports from a semantic model
  • Use Analyze in Excel
  • Use the semantic model from another workspace (when cross-workspace is allowed)
  • Allow composite models based on that dataset

Exam pattern:
“User can view a report but cannot use ‘Analyze in Excel’ or create a new report from the dataset.”
→ They are missing Build permission on the semantic model.


1.1.c RLS, CLS, OLS & File-Level Security

Security doesn’t stop at item level; it also applies to data inside the model.

Row-Level Security (RLS)

  • Restricts which rows a user can see.
  • Example:

    • User in “Bangladesh” role only sees rows where Country = 'BD'.
  • Implemented in the semantic model (Power BI dataset / Fabric semantic model).

Column-Level Security (CLS)

  • Controls which columns a user can see.
  • Example:

    • Hide Salary, SSN, PhoneNumber from certain roles.

Object-Level Security (OLS)

  • Hides entire tables or columns from users or roles.
  • Example:

    • A Finance table is completely hidden from Sales users.

File-Level Security (OneLake / Files)

In Fabric:

  • Lakehouse has:

    • Tables → usually accessed via SQL endpoint & semantic model
    • Files → raw storage in OneLake filesystem

You can secure Files separately:

  • A user might:

    • Query tables via semantic model
    • But not be allowed to browse the Lakehouse Files section
  • That requires filesystem / OneLake permissions, which are separate.

Exam pattern:
“User can query a Lakehouse table via the SQL endpoint but cannot open the Files area in the Lakehouse.”
→ They are missing file-system permission on OneLake.


1.1.d Sensitivity Labels (Purview Integration)

Sensitivity labels classify and protect data. Common examples:

  • Public
  • General
  • Confidential
  • Highly Confidential – No Export

Once applied to semantic models, reports, etc., labels can:

  • Block Export to Excel / CSV
  • Block Publish to web
  • Restrict external sharing (especially with encrypted labels)

Important admin settings include:

  • Allow users to apply sensitivity labels
    – Without this, labels aren’t available in Fabric.

  • Automatically apply sensitivity labels to downstream content
    – A label applied to a semantic model can propagate to reports, dashboards, etc.

  • Allow Microsoft Purview to secure AI interactions
    – Controls whether Copilot/AI can interact with labeled data under Purview governance.

Exam pattern:
“After applying a label to a report, users can view it but can no longer export the data. Why?”
→ The label (e.g., Highly Confidential – No Export) is configured to block export.


1.1.e Endorsement: Promoted & Certified

Endorsement helps users know which data they can trust.

  • Promoted

    • Indicates team-level / local confidence
    • Suggestion: “This is useful, but not centrally governed.”
  • Certified

    • Indicates organization-level endorsement
    • Represents single source of truth
    • Generally restricted to a set of data stewards / admins

Endorsed items typically show in:

  • Data hub
  • OneLake data explorer
  • Search and discovery areas

Exam pattern:
“The central BI team wants certain semantic models to be recognized as single sources of truth across the organization. What should they use?”
→ Use Certification with appropriate governance.


1.2 Admin & Governance: Tenant, Capacity, Workspace

A lot of DP-600 questions are about “which setting to change?”.

The mental model:

  • Tenant = organization-wide policies
  • Capacity = compute configuration (F64, F128, Premium)
  • Workspace = team/project scope

Tenant

Controls global features like:

  • Whether Fabric is enabled at all
  • Whether users can create Fabric items
  • Export policies (Excel/CSV, PBIX download)
  • Guest users and external sharing
  • Sensitivity labels and Purview/AI integration
  • Trial activation (“Users can try Fabric paid features”)

Think: “This affects everyone / every workspace” → Tenant.


Capacity

Controls the compute resources and workloads:

  • Type of capacity: Fabric F-SKU, Power BI Premium, Embedded, etc.
  • Region and SKU size (F64, F128, etc.)
  • Workload settings:

    • Spark / Data Engineering
    • Dataflows
    • Pipelines
    • DirectLake / cache behavior
  • Ability to pause and resume capacity

  • Concurrency limits and memory settings

Common issues:

  • Dataflows stuck in queue → capacity under pressure / workload disabled
  • Notebooks fail to start → Spark capacity exhausted
  • DirectLake becomes slow → capacity overloaded, cache eviction

Workspace

The workspace ties everything together:

  • Assigned to a capacity (shared vs Fabric/Premium)
  • Used by teams to store:

    • Lakehouses
    • Warehouses
    • Dataflows
    • Pipelines
    • Semantic models
    • Reports

Key points:

  • Shared capacity workspaces:

    • Can host Power BI content (reports, datasets, dashboards)
    • Cannot host Fabric items such as Lakehouses, Warehouses, Dataflow Gen2, Notebooks, or Pipelines, and cannot use DirectLake
  • Fabric/Premium capacity workspaces:

    • Can use the full Fabric experiences.

Exam pattern:
“A workspace does not show the option to create a Lakehouse or Dataflow Gen2, but users can still create reports.”
→ The workspace is on Shared capacity, or the tenant has disabled Users can create Fabric items.


1.3 Analytics Development Lifecycle (CI/CD, Versioning)

This part is about acting like a real analytics engineer, not just building in the UI.

Key areas:

  1. Version control with Git
  2. PBIX vs PBIP/PBIR
  3. Deployment pipelines (Dev → Test → Prod)
  4. XMLA endpoint
  5. Impact analysis / lineage

1.3.a Version Control with Git

Fabric supports Git integration for workspaces, so you can:

  • Connect a Fabric workspace to a Git repository
  • Store:

    • Reports as PBIP / PBIR
    • Notebooks as .ipynb or scripts
    • SQL scripts
    • Pipeline definitions as code

Benefits:

  • Team collaboration
  • Pull request (PR)-based code review
  • History & rollback
  • Integration into DevOps/CI/CD pipelines

Exam-view: DP-600 expects you to know why Git integration is useful, not every click.


1.3.b PBIX vs PBIP / PBIR

  • PBIX

    • Binary file
    • Traditional Power BI format
    • Harder to source control (large monolithic file)
  • PBIP / PBIR

    • Text-based project structure
    • Separates metadata, model, and report layout into files/folders
    • Much easier for Git and automated deployments

For DP-600, you must understand:
PBIP/PBIR = Git-friendly representation of Power BI content.


1.3.c Deployment Pipelines

Deployment pipelines support stage-based promotion of content:

  • Typical stages:

    • Development → Test → Production

You can deploy:

  • Semantic models (datasets)
  • Reports
  • Dashboards
  • Some Fabric items (or via scripts/automation)

Key ideas:

  • Rules / parameters per stage:

    • Different data source connections per environment (Dev DB vs Prod DB)
  • Ensuring changes are tested in Test before going to Production

  • Automatic or manual deployment flows

Exam scenario:
“You must promote changes from Dev to Prod while pointing to different databases. What feature should you use?”
→ Deployment pipelines with rules for data source bindings.


1.3.d XMLA Endpoint (for Semantic Models)

XMLA endpoint allows enterprise management of semantic models using external tools like:

  • Tabular Editor
  • SQL Server Management Studio (SSMS)
  • Scripts for deployment and partition management

Typical actions:

  • Partitioning large tables
  • Incremental refresh control
  • Advanced calculation groups
  • Scripting deployments (e.g., ALM Toolkit, DevOps pipelines)

For DP-600, you need conceptual understanding:
XMLA endpoint = enterprise-grade control and automation for Fabric/Power BI semantic models.


1.3.e Impact Analysis & Lineage

Impact analysis means understanding:

  • If I change this object, what breaks?

Fabric offers:

  • Lineage view:

    • Shows flow: Data source → Dataflow → Lakehouse/Warehouse → Semantic model → Reports

Use cases:

  • Before changing a table schema
  • Before deleting a Dataflow or Lakehouse
  • Before renaming columns or tables

Exam pattern:
“You plan to change a Lakehouse table schema and want to ensure you understand which semantic models and reports will be affected. What should you use?”
→ Use lineage view or impact analysis tools in Fabric.


2. Prepare Data (Largest Weight in Exam)

This is the “heart” of DP-600:

  • Ingest data
  • Transform data
  • Store/serve data in Lakehouse / Warehouse / KQL DB
  • Design data processing patterns (incremental, CDC, real-time)
  • Enforce data quality and business rules

2.1 Ingest Data into Fabric

Main tools:

  1. Data Factory (Fabric Data Factory experience)
  2. Dataflow Gen2
  3. Spark Notebooks
  4. Shortcuts (zero-copy)

a) Data Factory

Fabric Data Factory is the orchestration & ingestion engine:

  • Copy Activity:

    • Move data from source to destination (e.g., SQL → Lakehouse, Blob → Warehouse)
  • Pipeline templates:

    • Prebuilt patterns (e.g., copy new files daily, ingest from SaaS, etc.)
  • Supports hundreds of connectors:

    • SQL Server, Azure SQL, ADLS, Blob, Salesforce, etc.

Use Data Factory when:

  • You need pure data movement (ELT)
  • You orchestrate multiple steps and sources
  • You want scheduled ingestion pipelines

b) Dataflow Gen2

Dataflow Gen2 is Power Query Online in Fabric:

  • Perform transformations (cleaning, shaping) using Power Query
  • Define ETL logic in a friendly GUI
  • Output to:

    • Lakehouse tables
    • Warehouse tables
    • KQL Databases

Typical use cases:

  • Business-friendly transformation definitions
  • Creating conformed dimension tables (e.g., DimCustomer)
  • Reusable data preparation logic

c) Spark Notebooks

Spark notebooks are for big data engineering:

  • Language options:

    • PySpark, Scala, Spark SQL
  • Use cases:

    • Large-scale joins and transforms
    • Processing semi-structured or unstructured data
    • Machine learning and advanced analytics
  • Write results into:

    • Delta tables in the Lakehouse
    • Files (Parquet, CSV, etc.)
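
For a concrete feel, here is a minimal notebook sketch that ingests a raw CSV from the Lakehouse Files area and saves it as a Delta table. The file path and table name are hypothetical, and `spark` is the session a Fabric notebook already provides.

```python
# Minimal ingestion sketch for a Fabric Spark notebook (names/paths are examples).
raw_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/raw/sales_2024.csv")  # hypothetical file in the Lakehouse Files area
)

# Persist as a managed Delta table so it appears under the Lakehouse Tables area
raw_df.write.format("delta").mode("overwrite").saveAsTable("bronze_sales")
```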

d) Shortcuts (Zero-Copy)

Shortcuts let you:

  • Reference data in external storage (ADLS Gen2, S3, other OneLake accounts)
  • Without copying data into your Lakehouse

Think of a shortcut as:

A pointer to external data, not a copy.

Use for:

  • Central governance (single copy of data)
  • Multi-domain or multi-tenant architectures
  • Simplifying multi-region data access
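
Once a shortcut exists, it behaves like local data. A hedged sketch, assuming a shortcut named `external_orders` has already been created in the Lakehouse (all names are illustrative):

```python
# Reading data exposed through a OneLake shortcut — no copy is made.

# If the shortcut surfaces as a table:
orders_df = spark.read.table("external_orders")

# If the shortcut lives under the Files area (e.g., Parquet files in ADLS Gen2):
files_df = spark.read.parquet("Files/external_orders/")

orders_df.show(5)
```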

2.2 Transform Data

Three main transformation engines:

  1. Dataflow Gen2 (Power Query)
  2. Spark Notebooks
  3. T-SQL in Warehouse / SQL Endpoint

1) Dataflow Gen2

Good for:

  • Business-friendly logic
  • Medium-size data
  • Tabular transformations
  • Slowly changing dimensions (when combined with logic)

Examples:

  • Clean column names, change data types
  • Merge (join) multiple sources
  • Filter out errors
  • Create calculated columns (e.g., MonthNumber = Date.Month([OrderDate]))

2) Spark Notebooks

Good for:

  • Very large datasets
  • Complex transformations where M/Power Query is limiting
  • Streaming + batch
  • AI/ML workflows

You can:

  • Use PySpark to massage and reshape large tables (see the sketch after this list)
  • Optimize with Delta features (OPTIMIZE, VACUUM)
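
A minimal sketch of that pattern, assuming the `bronze_sales` table from the earlier ingestion example (all names are illustrative):

```python
from pyspark.sql import functions as F

# Typical Spark clean-up/transform step
sales = spark.read.table("bronze_sales")

cleaned = (
    sales
    .dropDuplicates(["OrderID"])
    .withColumn("OrderDate", F.to_date("OrderDate"))
    .withColumn("Revenue", F.col("Quantity") * F.col("UnitPrice"))
    .filter(F.col("Revenue") > 0)
)

cleaned.write.format("delta").mode("overwrite").saveAsTable("silver_sales")

# Delta maintenance: compact small files and remove old file versions
spark.sql("OPTIMIZE silver_sales")
spark.sql("VACUUM silver_sales RETAIN 168 HOURS")  # keep 7 days of history
```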

3) T-SQL (Warehouse / SQL Endpoint)

Good for:

  • Teams with strong SQL skills
  • ELT patterns (Load raw → Clean → Transform using SQL)
  • Dimensional modeling inside Warehouse (Dim/Fact tables)
  • Creating views and stored procedures

You can:

  • Write SQL scripts
  • Use them in pipelines as SQL script activities
  • Build business logic in the Warehouse itself

Exam angle:
You must choose the right engine for each scenario:

  • Business user ETL? → Dataflow Gen2
  • Heavy big data & ML? → Spark notebooks
  • Standard enterprise SQL team? → T-SQL in Warehouse

2.3 Lakehouse vs Warehouse (Plus KQL DB)

Lakehouse

  • Combines:

    • Files (raw storage)
    • Tables (Delta format)
  • Ideal for:

    • Medallion architecture:
      • Bronze (raw)
      • Silver (cleaned)
      • Gold (business)
    • Big data, semi-structured files, logs
  • Works best with:

    • Spark notebooks
    • Dataflow Gen2
    • DirectLake-based semantic models
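
To make the Gold layer concrete, here is a small aggregation sketch that turns a cleaned Silver table into a business-ready table for a semantic model (table and column names are illustrative):

```python
from pyspark.sql import functions as F

# Gold layer: aggregate cleaned data into a reporting-friendly shape
gold = (
    spark.read.table("silver_sales")
    .groupBy("Country", F.year("OrderDate").alias("OrderYear"))
    .agg(
        F.sum("Revenue").alias("TotalRevenue"),
        F.countDistinct("OrderID").alias("OrderCount"),
    )
)

gold.write.format("delta").mode("overwrite").saveAsTable("gold_sales_by_country")
```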

Warehouse

  • SQL-first experience:

    • Tables, views, stored procedures
  • Ideal for:

    • Structured, relational modeling
    • Enterprise reporting, financial analytics
    • Teams strong in SQL
  • Serves:

    • Semantic models
    • Reports directly via SQL engine

Typical exam scenario:

  • “Customer 360 view, finance, heavily structured reporting, strong SQL skills → choose Warehouse.”
  • “High-volume logs, clickstream, semi-structured events → choose Lakehouse + Spark.”

KQL Database (Kusto Query Language)

  • Optimized for log and time-series analytics
  • Used for:

    • Real-time telemetry
    • Observability data
    • Operational analytics

2.4 Patterns: Incremental, CDC, Real-Time

Incremental Load

  • Process only new or changed data.
  • Benefits:

    • Performance
    • Cost
    • Reduced load windows

Examples:

  • Dataflow Gen2 with incremental refresh policy (by date column)
  • SQL-based approaches using LastModifiedDate watermark
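
A sketch of the watermark approach in a notebook (the table names and the LastModifiedDate column are illustrative assumptions):

```python
from pyspark.sql import functions as F

target = "silver_orders"

# 1. Find the highest watermark already loaded (None on the very first run)
if spark.catalog.tableExists(target):
    last_loaded = spark.read.table(target).agg(F.max("LastModifiedDate")).first()[0]
else:
    last_loaded = None

# 2. Read only rows changed since the last load
source = spark.read.table("bronze_orders")
if last_loaded is not None:
    source = source.filter(F.col("LastModifiedDate") > F.lit(last_loaded))

# 3. Append just the new/changed rows
source.write.format("delta").mode("append").saveAsTable(target)
```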

Change Data Capture (CDC) / Change Feed

  • Tracks row-level changes from a source (insert, update, delete)
  • Good for:

    • Keeping Warehouse or Lakehouse in sync with OLTP systems
    • Near-real-time updates
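
A sketch of applying a CDC batch with a Delta MERGE; the change-feed table, the key, and the `__operation` flag are illustrative assumptions:

```python
from delta.tables import DeltaTable

# Hypothetical change feed with an __operation column ('insert'/'update'/'delete')
changes = spark.read.table("cdc_customer_changes")
target = DeltaTable.forName(spark, "dim_customer")

(
    target.alias("t")
    .merge(changes.alias("c"), "t.CustomerID = c.CustomerID")
    .whenMatchedDelete(condition="c.__operation = 'delete'")
    .whenMatchedUpdateAll(condition="c.__operation <> 'delete'")
    .whenNotMatchedInsertAll(condition="c.__operation <> 'delete'")
    .execute()
)
```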

Real-Time / Streaming

  • Fabric also offers Real-Time Intelligence components like:

    • Eventstreams
    • KQL DB
    • Real-time dashboards

Use case:

  • IoT streams
  • Website event tracking
  • Monitoring metrics with low latency
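
Eventstreams and KQL DB are configured through the Fabric UI, but a code-first alternative inside a notebook is Spark Structured Streaming landing events in a Lakehouse Delta table. A minimal sketch, with the schema, paths, and table name as assumptions:

```python
# Stream JSON events from a landing folder into a Lakehouse Delta table
events = (
    spark.readStream
    .format("json")
    .schema("deviceId STRING, temperature DOUBLE, eventTime TIMESTAMP")
    .load("Files/landing/iot_events/")            # hypothetical drop folder
)

(
    events.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "Files/checkpoints/iot_events/")
    .toTable("bronze_iot_events")                 # starts the streaming query
)
```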

3. Implement and Manage Semantic Models

This is the Power BI + Fabric modeling side:

  1. Storage modes
  2. Modeling fundamentals
  3. DAX basics
  4. Security in the model (RLS/OLS)
  5. Performance & optimization

3.1 Storage Modes: Import, DirectQuery, DirectLake, Composite

  1. Import
  • Data is stored inside the model in memory.
  • Very fast query performance.
  • Requires scheduled or manual refresh.
  • Good for small–medium datasets and users who can tolerate some latency in freshness.
  2. DirectQuery
  • No data stored in the model; all queries go to the source (SQL, etc.)
  • Good for real-time or near-real-time when data must stay at source.
  • Slower and more limited in modeling features; depends heavily on source performance.
  3. DirectLake
  • Fabric-only mode.
  • Queries Delta tables in OneLake directly, with caching.
  • Combines advantages of Import (fast) and DirectQuery (near-real-time) for Fabric-based data.
  • Requires Fabric/Premium capacity and proper Lakehouse/Warehouse setup.
  4. Composite Models
  • Mixes Import + DirectQuery + DirectLake in one model.
  • Very flexible but more complex to manage.

Exam scenario example:
“You must query a large Fabric Lakehouse with low latency and avoid scheduled refresh, and you have Fabric capacity. Which storage mode is recommended?”
→ DirectLake.


3.2 Modeling Fundamentals

Star schema is the gold standard:

  • Fact tables

    • Large, numeric, transactional data
    • E.g., FactSales, FactOrders
  • Dimension tables

    • Smaller, descriptive, business entities
    • E.g., DimCustomer, DimProduct, DimDate

Benefits:

  • Simpler queries
  • Better performance
  • More intuitive modeling

Relationships:

  • Mostly one-to-many (Dimension → Fact)
  • Use single direction filtering (Dimension filters Fact) unless there’s a strong reason otherwise.
  • Many-to-many relationships: use carefully; they can create ambiguous filter paths.

Best practices:

  • Use surrogate keys (integers) for relationships
  • Keep dimension tables wide but not huge
  • Avoid snowflake if performance is critical (flatten dimensions where reasonable)

3.3 DAX & Calculations

You don’t need to be a DAX guru for DP-600, but you should understand:

  • Calculated columns vs Measures:

    • Calculated columns:
      • Computed at data refresh time
      • Stored in the model
      • Row context
    • Measures:
      • Computed at query time
      • Based on filter context
  • Row context vs Filter context:

    • Row context → operations that work “row by row”
    • Filter context → the filters that apply to the measure (via slicers, filters, etc.)
  • Basic time intelligence:

    • TOTALYTD, SAMEPERIODLASTYEAR, DATEADD, etc.

3.4 Security in Semantic Models

Again, security at the model level:

  • RLS (Row-Level Security):

    • Table filter expressions per role
    • Example: [Country] = "Bangladesh" for certain users
  • OLS (Object-Level Security):

    • Hide specific tables/columns from roles

Important:

  • RLS/OLS apply regardless of storage mode (Import/DirectQuery/DirectLake), with some advanced nuances.
  • Even if a user has Build, they still only see rows allowed by RLS.

3.5 Performance & Optimization

Large or complex models can slow down. Common optimization strategies:

  • Use star schema instead of snowflake
  • Avoid many-to-many relationships and bi-directional filters unless necessary
  • Use aggregations
  • Avoid overly complex DAX measures that iterate row-by-row over large tables
  • Use proper types & summarization to reduce memory usage
  • Use XMLA endpoint and Tabular Editor for:

    • Partitioning fact tables
    • Incremental refresh tuning
    • Calculation groups

Exam-style question:
“A large semantic model is slow. What are effective techniques to improve performance?”
→ Star schema, aggregations, optimizing relationships, partitioning, and simplifying DAX.


4. Practical Study Plan (How to Use This Note)

Since you already have some hands-on experience with Fabric:

Step 1 – Fabric Objects & Admin

  • Understand:

    • Lakehouse, Warehouse, Dataflow Gen2, Pipelines, Notebooks, Semantic models, KQL DB
  • Learn:

    • Tenant vs Capacity vs Workspace
    • Critical settings: “Users can create Fabric items”, guest access, export settings, labels, DirectLake requirements

Step 2 – Prepare Data (Hands-on + Theory)

  • Build a small scenario:

    • Ingest CSV → Dataflow Gen2 → Lakehouse
    • Transform with Dataflow (custom column, type changes)
    • Optionally, use a Pipeline to orchestrate Dataflow + Notebook
  • Explore:

    • Lakehouse Tables vs Files, Delta basics
    • Shortcuts to some external storage (if possible)

Step 3 – Semantic Model & Reporting

  • Create a semantic model from the Lakehouse:

    • Try Import and DirectLake
  • Build:

    • A star-schema style model
    • Some simple DAX measures (Total Sales, YTD, Last Year comparison)
  • Configure:

    • RLS roles (e.g., region-based)

Step 4 – Governance & Lifecycle

  • Review:

    • Workspace roles
    • Item permissions & Build permission
    • Sensitivity labels (set up a simple label and see its effects)
  • Understand:

    • Git integration & PBIP concept
    • Deployment pipelines Dev → Test → Prod
    • Lineage view and impact analysis
