DP-600 Fabric Analytics Engineer – Structured Study Notes
(Governance, Admin, Lifecycle, Data Prep, Semantic Models)
0. Big Picture – What DP-600 Expects From You
The DP-600 exam is about designing, building, governing, and optimizing analytics solutions in Microsoft Fabric.
At a high level you must be able to:
- Maintain a data analytics solution – governance, security, admin, lifecycle, impact analysis
- Prepare data – ingest, transform, and store data in a Lakehouse/Warehouse/KQL DB; apply patterns like incremental load and CDC
- Implement and manage semantic models – modeling, storage modes, DAX basics, RLS/OLS, performance
This note covers all of these areas and ties them together with exam-style thinking.
1. Maintain a Data Analytics Solution
(Governance, Admin, Lifecycle)
This area covers:
- Security & governance (access, RLS/CLS/OLS, sensitivity labels, endorsement)
- Admin: workspaces, capacities, roles
- Development lifecycle: Git, PBIP, deployment pipelines, XMLA
- Monitoring & impact analysis (lineage)
1.1 Security & Access Control in Fabric
Security is layered. Think about:
- Workspace-level roles
- Item-level permissions
- Data-level security (RLS/CLS/OLS)
- File-level security
- Sensitivity labels
- Endorsement
1.1.a Workspace-Level Access
Each workspace has roles:
- Admin
  - Full control over the workspace
  - Can change settings, add/remove users, delete items
  - Used by platform owners / team leads
- Member / Contributor
  - Can create and edit items (Lakehouse, Dataflow, reports, Pipelines, Notebooks)
  - Cannot change workspace-level admin settings (such as capacity assignment)
  - Member can additionally share items and add users with lower roles; Contributor cannot
- Viewer
  - Read-only access: can view reports, dashboards, semantic models, etc.
  - Cannot create, edit, or publish new items
Common exam scenario:
“A user can open a report but cannot edit it or create a new dataflow in the same workspace. What is the likely cause?”
→ The user is a Viewer and needs at least Contributor/Member.
1.1.b Item-Level Permissions & Build
Even inside a workspace, every item has its own permissions:
- Lakehouse
- Warehouse
- Semantic model (dataset)
- Report
- Notebook
- Dataflow
A user might be able to see a report (for example as a workspace Viewer, or via a report shared with them) and still:
- Not be able to use the underlying semantic model to build new reports
- Not be able to use Analyze in Excel
The key concept is the Build permission on the semantic model.
Build permission is required to:
- Create new reports from a semantic model
- Use Analyze in Excel
- Use the semantic model from another workspace (when cross-workspace is allowed)
- Allow composite models based on that dataset
Exam pattern:
“User can view a report but cannot use ‘Analyze in Excel’ or create a new report from the dataset.”
→ They are missing Build permission on the semantic model.
1.1.c RLS, CLS, OLS & File-Level Security
Security doesn’t stop at item level; it also applies to data inside the model.
Row-Level Security (RLS)
- Restricts which rows a user can see.
- Example: a user in the “Bangladesh” role only sees rows where Country = 'BD'.
- Implemented in the semantic model (Power BI dataset / Fabric semantic model).
Column-Level Security (CLS)
- Controls which columns a user can see.
- Example: hide Salary, SSN, and PhoneNumber from certain roles.
Object-Level Security (OLS)
- Hides entire tables or columns from users or roles.
- Example: a Finance table is completely hidden from Sales users.
File-Level Security (OneLake / Files)
In Fabric:
- A Lakehouse has:
  - Tables → usually accessed via the SQL endpoint & semantic models
  - Files → raw storage in the OneLake filesystem
You can secure Files separately:
- A user might:
  - Query tables via a semantic model
  - But not be allowed to browse the Lakehouse Files section
- That requires filesystem / OneLake permissions, which are managed separately.
Exam pattern:
“User can query a Lakehouse table via the SQL endpoint but cannot open the Files area in the Lakehouse.”
→ They are missing file-system permission on OneLake.
1.1.d Sensitivity Labels (Purview Integration)
Sensitivity labels classify and protect data. Common examples:
- Public
- General
- Confidential
- Highly Confidential – No Export
Once applied to semantic models, reports, etc., labels can:
- Block Export to Excel / CSV
- Block Publish to web
- Restrict external sharing (especially with encrypted labels)
Important admin settings include:
- Allow users to apply sensitivity labels – without this, labels aren’t available in Fabric.
- Automatically apply sensitivity labels to downstream content – a label applied to a semantic model can propagate to its reports, dashboards, etc.
- Allow Microsoft Purview to secure AI interactions – controls whether Copilot/AI can interact with labeled data under Purview governance.
Exam pattern:
“After applying a label to a report, users can view it but can no longer export the data. Why?”
→ The label (e.g., Highly Confidential – No Export) is configured to block export.
1.1.e Endorsement: Promoted & Certified
Endorsement helps users know which data they can trust.
- Promoted
  - Indicates team-level / local confidence
  - Signals: “this is useful, but not centrally governed”
- Certified
  - Indicates organization-level endorsement
  - Represents a single source of truth
  - Certification rights are generally restricted to a set of data stewards / admins
Endorsed items typically show in:
- Data hub
- OneLake data explorer
- Search and discovery areas
Exam pattern:
“The central BI team wants certain semantic models to be recognized as single sources of truth across the organization. What should they use?”
→ Use Certification with appropriate governance.
1.2 Admin & Governance: Tenant, Capacity, Workspace
A lot of DP-600 questions are about “which setting to change?”.
The mental model:
- Tenant = organization-wide policies
- Capacity = compute configuration (F64, F128, Premium)
- Workspace = team/project scope
Tenant
Controls global features like:
- Whether Fabric is enabled at all
- Whether users can create Fabric items
- Export policies (Excel/CSV, PBIX download)
- Guest users and external sharing
- Sensitivity labels and Purview/AI integration
- Trial activation (“Users can try Fabric paid features”)
Think: “This affects everyone / every workspace” → Tenant.
Capacity
Controls the compute resources and workloads:
- Type of capacity: Fabric F-SKU, Power BI Premium, Embedded, etc.
- Region and SKU size (F64, F128, etc.)
- Workload settings:
  - Spark / Data Engineering
  - Dataflows
  - Pipelines
  - DirectLake / cache behavior
- Ability to pause and resume the capacity
- Concurrency limits and memory settings
Common issues:
- Dataflows stuck in queue → capacity under pressure / workload disabled
- Notebooks fail to start → Spark capacity exhausted
- DirectLake becomes slow → capacity overloaded, cache eviction
Workspace
The workspace ties everything together:
- Assigned to a capacity (shared vs Fabric/Premium)
- Used by teams to store:
  - Lakehouses
  - Warehouses
  - Dataflows
  - Pipelines
  - Semantic models
  - Reports
Key points:
- Shared capacity workspaces:
  - Can host Power BI content (reports, semantic models, dashboards)
  - Cannot host Fabric items such as Lakehouses, Warehouses, Dataflow Gen2, Notebooks, or Pipelines, and cannot use DirectLake
- Fabric/Premium capacity workspaces:
  - Can use the full set of Fabric experiences.
Exam pattern:
“A workspace does not show the option to create a Lakehouse or Dataflow Gen2, but users can still create reports.”
→ The workspace is on shared capacity, or the tenant setting “Users can create Fabric items” is disabled.
1.3 Analytics Development Lifecycle (CI/CD, Versioning)
This part is about acting like a real analytics engineer, not just building in the UI.
Key areas:
- Version control with Git
- PBIX vs PBIP/PBIR
- Deployment pipelines (Dev → Test → Prod)
- XMLA endpoint
- Impact analysis / lineage
1.3.a Version Control with Git
Fabric supports Git integration for workspaces, so you can:
- Connect a Fabric workspace to a Git repository
- Store:
  - Reports as PBIP / PBIR
  - Notebooks as .ipynb files or scripts
  - SQL scripts
  - Pipeline definitions as code
Benefits:
- Team collaboration
- Pull request (PR)-based code review
- History & rollback
- Integration into DevOps/CI/CD pipelines
Exam-view: DP-600 expects you to know why Git integration is useful, not every click.
1.3.b PBIX vs PBIP / PBIR
- PBIX
  - Binary file
  - Traditional Power BI format
  - Harder to source control (large monolithic file)
- PBIP / PBIR
  - Text-based project structure
  - Separates metadata, model definition, and report layout into files/folders
  - Much easier for Git and automated deployments
For DP-600, you must understand:
PBIP/PBIR = Git-friendly representation of Power BI content.
1.3.c Deployment Pipelines
Deployment pipelines support stage-based promotion of content:
- Typical stages: Development → Test → Production
You can deploy:
- Semantic models (datasets)
- Reports
- Dashboards
- Some Fabric items (or via scripts/automation)
Key ideas:
- Rules / parameters per stage:
  - Different data source connections per environment (Dev DB vs Prod DB)
- Ensuring changes are tested in Test before going to Production
- Automatic or manual deployment flows
Exam scenario:
“You must promote changes from Dev to Prod while pointing to different databases. What feature should you use?”
→ Deployment pipelines with rules for data source bindings.
1.3.d XMLA Endpoint (for Semantic Models)
XMLA endpoint allows enterprise management of semantic models using external tools like:
- Tabular Editor
- SQL Server Management Studio (SSMS)
- Scripts for deployment and partition management
Typical actions:
- Partitioning large tables
- Incremental refresh control
- Advanced calculation groups
- Scripting deployments (e.g., ALM Toolkit, DevOps pipelines)
For DP-600, you need conceptual understanding:
XMLA endpoint = enterprise-grade control and automation for Fabric/Power BI semantic models.
1.3.e Impact Analysis & Lineage
Impact analysis means understanding:
- If I change this object, what breaks?
Fabric offers:
- Lineage view:
  - Shows the flow: data source → Dataflow → Lakehouse/Warehouse → semantic model → reports
Use cases:
- Before changing a table schema
- Before deleting a Dataflow or Lakehouse
- Before renaming columns or tables
Exam pattern:
“You plan to change a Lakehouse table schema and want to ensure you understand which semantic models and reports will be affected. What should you use?”
→ Use lineage view or impact analysis tools in Fabric.
2. Prepare Data (Largest Weight in Exam)
This is the “heart” of DP-600:
- Ingest data
- Transform data
- Store/serve data in Lakehouse / Warehouse / KQL DB
- Design data processing patterns (incremental, CDC, real-time)
- Enforce data quality and business rules
2.1 Ingest Data into Fabric
Main tools:
- Data Factory (Fabric Data Factory experience)
- Dataflow Gen2
- Spark Notebooks
- Shortcuts (zero-copy)
a) Data Factory
Fabric Data Factory is the orchestration & ingestion engine:
- Copy activity:
  - Moves data from source to destination (e.g., SQL → Lakehouse, Blob → Warehouse)
- Pipeline templates:
  - Prebuilt patterns (e.g., copy new files daily, ingest from SaaS sources, etc.)
- Hundreds of connectors:
  - SQL Server, Azure SQL, ADLS, Blob Storage, Salesforce, etc.
Use Data Factory when:
- You need pure data movement (ELT)
- You orchestrate multiple steps and sources
- You want scheduled ingestion pipelines
b) Dataflow Gen2
Dataflow Gen2 is Power Query Online in Fabric:
- Perform transformations (cleaning, shaping) using Power Query
- Define ETL logic in a friendly GUI
- Output to:
  - Lakehouse tables
  - Warehouse tables
  - KQL databases
Typical use cases:
- Business-friendly transformation definitions
- Creating conformed dimension tables (e.g., DimCustomer)
- Reusable data preparation logic
c) Spark Notebooks
Spark notebooks are for big data engineering:
- Language options:
  - PySpark, Scala, Spark SQL
- Use cases:
  - Large-scale joins and transforms
  - Processing semi-structured or unstructured data
  - Machine learning and advanced analytics
- Write results into:
  - Delta tables in the Lakehouse
  - Files (Parquet, CSV, etc.)
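A minimal PySpark sketch of this pattern in a Fabric notebook, assuming a CSV extract has already landed in the Lakehouse Files area (the folder and table names are illustrative):

```python
# Read a raw CSV that has landed in the Lakehouse "Files" area.
raw_orders = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/raw/orders/")
)

# Save it as a managed Delta table under "Tables", so it becomes
# queryable through the SQL endpoint and usable by DirectLake models.
(
    raw_orders.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("orders_bronze")
)
```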
d) Shortcuts (Zero-Copy)
Shortcuts let you:
- Reference data in external storage (ADLS Gen2, S3, other OneLake accounts)
- Without copying data into your Lakehouse
Think of a shortcut as:
A pointer to external data, not a copy.
Use for:
- Central governance (single copy of data)
- Multi-domain or multi-tenant architectures
- Simplifying multi-region data access
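Once a shortcut exists, notebooks read it like any other Lakehouse path, even though the data still lives in the external store. A minimal sketch, assuming a Files shortcut named external_sales that points at an ADLS Gen2 folder of Parquet files (the name and format are illustrative):

```python
# The shortcut behaves like a local folder under the Lakehouse,
# but no data has been copied into OneLake (zero-copy reference).
external_sales = spark.read.parquet("Files/external_sales/")

external_sales.printSchema()
print(external_sales.count())
```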
2.2 Transform Data
Three main transformation engines:
- Dataflow Gen2 (Power Query)
- Spark Notebooks
- T-SQL in Warehouse / SQL Endpoint
1) Dataflow Gen2
Good for:
- Business-friendly logic
- Medium-size data
- Tabular transformations
- Slowly changing dimensions (when combined with logic)
Examples:
- Clean column names, change data types
- Merge (join) multiple sources
- Filter out errors
- Create custom columns (e.g., MonthNumber = Date.Month([OrderDate]))
2) Spark Notebooks
Good for:
- Very large datasets
- Complex transformations where M/Power Query is limiting
- Streaming + batch
- AI/ML workflows
You can:
- Use PySpark to reshape and enrich large tables
- Optimize Delta tables with features like OPTIMIZE and VACUUM (see the sketch below)
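A minimal sketch of routine Delta maintenance from a Fabric notebook, assuming the Lakehouse table orders_bronze from the earlier example (Spark SQL issued via PySpark):

```python
# Compact many small files into fewer, larger ones for faster scans.
spark.sql("OPTIMIZE orders_bronze")

# Remove data files no longer referenced by the Delta log and older
# than the retention window (168 hours = the default 7 days).
spark.sql("VACUUM orders_bronze RETAIN 168 HOURS")
```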
3) T-SQL (Warehouse / SQL Endpoint)
Good for:
- Teams with strong SQL skills
- ELT patterns (Load raw → Clean → Transform using SQL)
- Dimensional modeling inside Warehouse (Dim/Fact tables)
- Creating views and stored procedures
You can:
- Write SQL scripts
- Use them in pipelines as SQL script activities
- Build business logic in the Warehouse itself
Exam angle:
You must choose the right engine for each scenario:
- Business user ETL? → Dataflow Gen2
- Heavy big data & ML? → Spark notebooks
- Standard enterprise SQL team? → T-SQL in Warehouse
2.3 Lakehouse vs Warehouse (Plus KQL DB)
Lakehouse
- Combines:
  - Files (raw storage)
  - Tables (Delta format)
- Ideal for:
  - Medallion architecture: Bronze (raw) → Silver (cleaned) → Gold (business-ready); see the sketch after this list
  - Big data, semi-structured files, logs
- Works best with:
  - Spark notebooks
  - Dataflow Gen2
  - DirectLake-based semantic models
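A minimal PySpark sketch of one medallion hop (Bronze → Silver), assuming the orders_bronze table from the ingestion example; column names are illustrative:

```python
from pyspark.sql import functions as F

bronze = spark.read.table("orders_bronze")

# Silver layer: de-duplicated, correctly typed, lightly cleaned data.
silver = (
    bronze
    .dropDuplicates(["OrderID"])
    .withColumn("OrderDate", F.to_date("OrderDate"))
    .withColumn("Amount", F.col("Amount").cast("decimal(18,2)"))
    .filter(F.col("Amount").isNotNull())
)

silver.write.format("delta").mode("overwrite").saveAsTable("orders_silver")
```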
Warehouse
- SQL-first experience:
  - Tables, views, stored procedures
- Ideal for:
  - Structured, relational modeling
  - Enterprise reporting, financial analytics
  - Teams strong in SQL
- Serves:
  - Semantic models
  - Reports, directly via the SQL engine
Typical exam scenario:
- “Customer 360 view, finance, heavily structured reporting, strong SQL skills → choose Warehouse.”
- “High-volume logs, clickstream, semi-structured events → choose Lakehouse + Spark.”
KQL Database (Kusto Query Language)
- Optimized for log and time-series analytics
- Used for:
  - Real-time telemetry
  - Observability data
  - Operational analytics
2.4 Patterns: Incremental, CDC, Real-Time
Incremental Load
- Process only new or changed data.
- Benefits:
  - Performance
  - Cost
  - Reduced load windows
- Examples:
  - Dataflow Gen2 with an incremental refresh policy (based on a date column)
  - SQL-based approaches using a LastModifiedDate watermark (see the sketch below)
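The same watermark pattern, sketched in PySpark for a Lakehouse target (table and column names are illustrative; a SQL implementation follows the same steps):

```python
from pyspark.sql import functions as F

target = "sales_clean"

# 1. Find the current high-water mark in the target table.
last_loaded = (
    spark.read.table(target)
    .agg(F.max("LastModifiedDate").alias("wm"))
    .collect()[0]["wm"]
)

# 2. Keep only rows that changed since the last load.
source = spark.read.table("sales_staging")
if last_loaded is None:
    changed = source  # first load: take everything
else:
    changed = source.filter(F.col("LastModifiedDate") > F.lit(last_loaded))

# 3. Append the new slice into the target (or merge, if updates can occur).
changed.write.format("delta").mode("append").saveAsTable(target)
```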
Change Data Capture (CDC) / Change Feed
- Tracks row-level changes from a source (insert, update, delete)
- Good for:
  - Keeping a Warehouse or Lakehouse in sync with OLTP systems
  - Near-real-time updates
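Applying captured changes to a Lakehouse table usually ends in a Delta merge. A minimal PySpark sketch, assuming a pipeline has staged CDC rows into sales_changes with an operation column of 'insert', 'update', or 'delete' (all names are illustrative):

```python
from delta.tables import DeltaTable

changes = spark.read.table("sales_changes")        # staged CDC rows
target = DeltaTable.forName(spark, "sales_clean")  # Lakehouse Delta table

(
    target.alias("t")
    .merge(changes.alias("c"), "t.SaleID = c.SaleID")
    .whenMatchedDelete(condition="c.operation = 'delete'")
    .whenMatchedUpdateAll(condition="c.operation = 'update'")
    .whenNotMatchedInsertAll(condition="c.operation != 'delete'")
    .execute()
)
```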
Real-Time / Streaming
- Fabric also offers Real-Time Intelligence components such as:
  - Eventstreams
  - KQL databases
  - Real-time dashboards
- Use cases:
  - IoT streams
  - Website event tracking
  - Monitoring metrics with low latency
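On the Lakehouse side, Spark Structured Streaming can also land events with low latency. A minimal sketch, assuming JSON events arrive in a Files landing folder (paths, schema, and table name are illustrative):

```python
from pyspark.sql.types import StructType, StructField, StringType, TimestampType, DoubleType

event_schema = StructType([
    StructField("deviceId", StringType()),
    StructField("eventTime", TimestampType()),
    StructField("reading", DoubleType()),
])

events = (
    spark.readStream
    .schema(event_schema)
    .json("Files/landing/events/")
)

# Continuously append events to a Delta table; the checkpoint folder
# tracks progress so the stream can restart safely after interruptions.
query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "Files/checkpoints/events/")
    .toTable("events_raw")
)
```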
3. Implement and Manage Semantic Models
This is the Power BI + Fabric modeling side:
- Storage modes
- Modeling fundamentals
- DAX basics
- Security in the model (RLS/OLS)
- Performance & optimization
3.1 Storage Modes: Import, DirectQuery, DirectLake, Composite
- Import
- Data is stored inside the model in memory.
- Very fast query performance.
- Requires scheduled or manual refresh.
- Good for small–medium datasets and users who can tolerate some latency in freshness.
- DirectQuery
- No data stored in the model; all queries go to the source (SQL, etc.)
- Good for real-time or near-real-time when data must stay at source.
- Slower and more limited in modeling features; depends heavily on source performance.
- DirectLake
- Fabric-only mode.
- Queries Delta tables in OneLake directly, with caching.
- Combines advantages of Import (fast) and DirectQuery (near-real-time) for Fabric-based data.
- Requires Fabric/Premium capacity and proper Lakehouse/Warehouse setup.
- Composite Models
- Mixes Import + DirectQuery + DirectLake in one model.
- Very flexible but more complex to manage.
Exam scenario example:
“You must query a large Fabric Lakehouse with low latency and avoid scheduled refresh, and you have Fabric capacity. Which storage mode is recommended?”
→ DirectLake.
3.2 Modeling Fundamentals
Star schema is the gold standard:
- Fact tables
  - Large, numeric, transactional data
  - E.g., FactSales, FactOrders
- Dimension tables
  - Smaller, descriptive business entities
  - E.g., DimCustomer, DimProduct, DimDate
Benefits:
- Simpler queries
- Better performance
- More intuitive modeling
Relationships:
- Mostly one-to-many (Dimension → Fact)
- Use single direction filtering (Dimension filters Fact) unless there’s a strong reason otherwise.
- Many-to-many relationships: use carefully; they can create ambiguous filter paths.
Best practices:
- Use surrogate keys (integers) for relationships
- Keep dimension tables wide but not huge
- Avoid snowflake if performance is critical (flatten dimensions where reasonable)
3.3 DAX & Calculations
You don’t need to be a DAX guru for DP-600, but you should understand:
- Calculated columns vs measures:
  - Calculated columns:
    - Computed at data refresh time
    - Stored in the model
    - Evaluated in row context
  - Measures:
    - Computed at query time
    - Evaluated in filter context
- Row context vs filter context:
  - Row context → operations that work “row by row”
  - Filter context → the filters that apply to a measure (via slicers, visuals, filters, etc.)
- Basic time intelligence:
  - TOTALYTD, SAMEPERIODLASTYEAR, DATEADD, etc.
3.4 Security in Semantic Models
Again, security at the model level:
- RLS (Row-Level Security):
  - Table filter expressions per role
  - Example: [Country] = "Bangladesh" for certain users
- OLS (Object-Level Security):
  - Hides specific tables/columns from roles
Important:
- RLS/OLS apply regardless of storage mode (Import/DirectQuery/DirectLake), with some advanced nuances.
- Even if a user has Build, they still only see rows allowed by RLS.
3.5 Performance & Optimization
Large or complex models can slow down. Common optimization strategies:
- Use star schema instead of snowflake
- Avoid many-to-many relationships and bi-directional filters unless necessary
- Use aggregations
- Avoid overly complex DAX measures that iterate row-by-row over large tables
- Use proper types & summarization to reduce memory usage
- Use the XMLA endpoint and Tabular Editor for:
  - Partitioning fact tables
  - Incremental refresh tuning
  - Calculation groups
Exam-style question:
“A large semantic model is slow. What are effective techniques to improve performance?”
→ Star schema, aggregations, optimizing relationships, partitioning, and simplifying DAX.
4. Practical Study Plan (How to Use This Note)
If you already have some hands-on experience with Fabric, work through these steps:
Step 1 – Fabric Objects & Admin
- Understand:
  - Lakehouse, Warehouse, Dataflow Gen2, Pipelines, Notebooks, semantic models, KQL DB
- Learn:
  - Tenant vs Capacity vs Workspace
  - Critical settings: “Users can create Fabric items”, guest access, export settings, labels, DirectLake requirements
Step 2 – Prepare Data (Hands-on + Theory)
- Build a small scenario:
  - Ingest CSV → Dataflow Gen2 → Lakehouse
  - Transform with the Dataflow (custom column, type changes)
  - Optionally, use a Pipeline to orchestrate the Dataflow + a Notebook
- Explore:
  - Lakehouse Tables vs Files, Delta basics
  - Shortcuts to external storage (if possible)
Step 3 – Semantic Model & Reporting
- Create a semantic model from the Lakehouse:
  - Try both Import and DirectLake
- Build:
  - A star-schema style model
  - Some simple DAX measures (Total Sales, YTD, prior-year comparison)
- Configure:
  - RLS roles (e.g., region-based)
Step 4 – Governance & Lifecycle
- Review:
  - Workspace roles
  - Item permissions & Build permission
  - Sensitivity labels (set up a simple label and observe its effects)
- Understand:
  - Git integration & the PBIP concept
  - Deployment pipelines: Dev → Test → Prod
  - Lineage view and impact analysis