1️⃣ Dataflows Gen2
Definition
Dataflows Gen2 is a low-code/no-code data integration and transformation tool in Microsoft Fabric that uses Power Query for ETL operations.
When and Why Used
- When: You need to ingest data from multiple sources (CSV, databases, APIs) and transform it before loading into Lakehouse/Warehouse
- Why: Provides visual, user-friendly interface for data engineers without deep coding skills; supports incremental refresh; integrates directly with OneLake
How It Works in Fabric
- Uses Power Query M language behind the scenes
- Connects to 100+ data sources
- Applies transformations (filter, merge, aggregate)
- Lands data into Lakehouse tables or files
- Supports staging (intermediate storage) for complex transformations
Real-World Example
A retail company receives daily sales CSV files from 50 stores. Dataflows Gen2 ingests files, cleans missing values, standardizes date formats, and loads into Lakehouse Bronze layer.
Exam Appearance
- "Which tool should you use for automated daily CSV ingestion?" β Dataflows Gen2
- "You need low-code transformation before loading Lakehouse" β Dataflows Gen2
Differences from Similar Concepts
- Dataflows Gen2 vs Gen1: Gen2 writes directly to OneLake, supports staging, better performance
- Dataflows vs Pipelines: Dataflows = transformations; Pipelines = orchestration
- Dataflows vs Notebooks: Dataflows = no-code; Notebooks = code-based (Spark)
2️⃣ Lakehouse
Definition
Lakehouse is a unified data architecture combining data lake flexibility (raw data, Spark) with data warehouse structure (SQL queries).
When and Why Used
- When: You need to store raw, semi-structured, and curated data with Spark processing
- Why: Provides single storage layer for all data types; supports Delta tables; enables Medallion architecture
How It Works in Fabric
- Built on OneLake (Delta Lake format)
- Stores files (Parquet, CSV, JSON) and tables (Delta)
- Supports Spark notebooks, SQL endpoint, shortcuts
- Automatically creates SQL analytics endpoint for querying
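As a minimal sketch of that flow (Spark SQL in a Fabric notebook attached to the Lakehouse; the table name and file path are hypothetical, not from the original text):

```sql
-- Register raw JSON files in the Lakehouse Files area as a managed Delta table
-- (path and table name are invented for illustration)
CREATE TABLE IF NOT EXISTS sensor_readings
USING DELTA
AS SELECT * FROM json.`Files/raw/sensors/`;

-- The auto-created SQL analytics endpoint can then serve queries like:
SELECT COUNT(*) AS readings FROM sensor_readings;
```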
Real-World Example
An IoT company stores raw sensor JSON files in Lakehouse, transforms them with Spark notebooks into Delta tables, and allows analysts to query with SQL.
Exam Appearance
- "Store raw + curated data with Spark flexibility" β Lakehouse
- "Need both file storage and table queries" β Lakehouse
Differences from Similar Concepts
- Lakehouse vs Warehouse: Lakehouse = flexible (files + tables); Warehouse = structured SQL only
- Lakehouse vs Data Lake: Lakehouse adds Delta tables, ACID transactions, time travel
3️⃣ Warehouse
Definition
Warehouse is a fully managed, enterprise-grade data warehouse optimized for SQL workloads and relational analytics.
When and Why Used
- When: Business users need structured, relational data with pure SQL queries
- Why: Optimized for aggregations, joins, star schema; familiar to SQL developers; better performance for complex SQL
How It Works in Fabric
- Stores data in columnar format (similar to Synapse)
- Supports T-SQL queries, stored procedures, views
- Integrates with Power BI for DirectQuery
- Provides separation of compute and storage
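To illustrate the T-SQL surface, here is a hedged sketch of a typical star-schema aggregation; the table and column names are invented and assume a fact/dimension pair like the example below:

```sql
-- Hypothetical star-schema query in a Fabric Warehouse;
-- table and column names are illustrative only
SELECT d.Region,
       SUM(f.Amount) AS TotalSales
FROM dbo.FactSales AS f
JOIN dbo.DimCustomer AS d
  ON d.CustomerID = f.CustomerID
GROUP BY d.Region;
```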
Real-World Example
Finance team builds dimensional model (fact + dimension tables) in Warehouse for budgeting reports, queried by Power BI dashboards.
Exam Appearance
- "Business users need structured SQL-only access" β Warehouse
- "Build star schema for reporting" β Warehouse
Differences from Similar Concepts
- Warehouse vs Lakehouse: Warehouse = SQL-focused, structured; Lakehouse = Spark-focused, flexible
- Warehouse vs Traditional DW: Fabric Warehouse = cloud-native, integrated with OneLake
4️⃣ Delta Tables
Definition
Delta tables use an open-source storage format (Delta Lake) that adds ACID transactions, time travel, and update/delete capabilities to Parquet files.
When and Why Used
- When: You need reliable updates/deletes, historical tracking, or data quality checks
- Why: Parquet is immutable; Delta adds transaction logs enabling versioning and atomic operations
How It Works in Fabric
- Stores data as Parquet files + transaction log (JSON)
- Log tracks all changes (inserts, updates, deletes)
- Supports VACUUM (cleanup old versions), OPTIMIZE (compaction)
- Enables time travel:

```sql
SELECT * FROM table VERSION AS OF 5
```
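A slightly fuller sketch of the update-then-time-travel cycle (Spark SQL with Delta; the table name, values, and version number are hypothetical):

```sql
-- Update rows in place; Delta records the change in its transaction log
UPDATE customer_profiles
SET Address = '123 Main St'
WHERE CustomerID = 42;

-- List the table versions recorded in the log (Delta Lake SQL)
DESCRIBE HISTORY customer_profiles;

-- Time travel: read the table as it existed at an earlier version
SELECT * FROM customer_profiles VERSION AS OF 5;
```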
Real-World Example
Customer profile table updates addresses daily. Delta table maintains history with time travel for compliance audits (GDPR right to erasure tracking).
Exam Appearance
- "Maintain historical versions with updates" β Delta tables with time travel
- "Need ACID transactions in Lakehouse" β Delta tables
Differences from Similar Concepts
- Delta vs Parquet: Delta = updatable, versioned; Parquet = immutable
- Delta vs SCD-2: Delta = storage layer; SCD-2 = modeling pattern (Delta can implement SCD-2)
5️⃣ KQL (Kusto Query Language)
Definition
KQL is a query language optimized for fast, interactive analytics on large volumes of streaming and time-series data.
When and Why Used
- When: Real-time analytics, log analysis, anomaly detection, telemetry monitoring
- Why: Sub-second query performance on streaming data; built for time-series operations
How It Works in Fabric
- Used with KQL Database (Eventhouse)
- Reads from streaming sources (Event Hub, IoT Hub)
- Supports aggregations, time-windowing, pattern matching
- Syntax:

```kql
TableName
| where Timestamp > ago(1h)
| summarize count() by Category
```
Real-World Example
Bank monitors credit card transactions in real-time using KQL to detect fraud patterns (multiple high-value transactions within 5 minutes).
Exam Appearance
- "Real-time fraud detection with streaming data" β KQL Database
- "Sub-second query latency on telemetry" β KQL
Differences from Similar Concepts
- KQL vs T-SQL: KQL = streaming, time-series; T-SQL = relational, batch
- KQL vs Spark SQL: KQL = real-time; Spark SQL = batch processing
6️⃣ Eventstream
Definition
Eventstream is a real-time data ingestion service for capturing, transforming, and routing streaming data in Fabric.
When and Why Used
- When: Ingesting continuous data flows (IoT sensors, clickstreams, logs)
- Why: No-code streaming pipeline; routes to multiple destinations; applies transformations on-the-fly
How It Works in Fabric
- Connects to Event Hub, IoT Hub, sample data
- Applies transformations (filter, aggregate)
- Routes to KQL Database, Lakehouse, Reflex (alerts)
- Visual designer interface
Real-World Example
Manufacturing plant streams machine sensor data via Eventstream, filters critical alerts, and writes to KQL Database for monitoring dashboards.
Exam Appearance
- "Ingest real-time IoT data visually" β Eventstream
- "Route streaming data to multiple destinations" β Eventstream
Differences from Similar Concepts
- Eventstream vs Dataflows: Eventstream = streaming; Dataflows = batch
- Eventstream vs Event Hub: Event Hub = Azure service; Eventstream = Fabric wrapper with transformations
7️⃣ Row-Level Security (RLS)
Definition
RLS is a security feature that filters data rows dynamically based on user roles or attributes.
When and Why Used
- When: Different users should see different subsets of data (managers see their department only)
- Why: Maintains single dataset while enforcing data access policies
How It Works in Fabric
- Defined in Power BI semantic model using DAX filters
- Example:

```dax
[Region] = USERPRINCIPALNAME()
```

- Filters are applied automatically when users query the data
- Works with Direct Lake, Import, DirectQuery modes
Real-World Example
HR dataset shows all employees. RLS rule [ManagerID] = USERPRINCIPALNAME() (assuming ManagerID stores the manager's sign-in email) ensures managers see only their direct reports.
Exam Appearance
- "Managers see only their department data" β Row-Level Security (RLS)
- "Dynamic filtering based on user identity" β RLS
Differences from Similar Concepts
- RLS vs Column-Level Security: RLS = row filtering; CLS = column hiding
- RLS vs Workspace Permissions: RLS = data-level; Workspace = object-level (entire dataset access)
8️⃣ Direct Lake Mode
Definition
Direct Lake is a Power BI connection mode that queries Delta tables in Lakehouse/Warehouse directly without importing data.
When and Why Used
- When: You want near-real-time dashboards without data duplication
- Why: Combines DirectQuery speed with Import performance; no data movement
How It Works in Fabric
- Power BI reads Delta Parquet files directly from OneLake
- Uses Fabric compute engine for queries
- No data copied to Power BI semantic model
- Automatic fallback to DirectQuery if needed
Real-World Example
Sales dashboard queries 10TB Lakehouse table using Direct Lake, refreshing insights every hour without importing data.
Exam Appearance
- "Query Lakehouse without duplicating storage" β Direct Lake Mode
- "Near-real-time dashboards on Delta tables" β Direct Lake
Differences from Similar Concepts
- Direct Lake vs Import: Direct Lake = no copy, live data; Import = copied, scheduled refresh
- Direct Lake vs DirectQuery: Direct Lake = faster, Fabric-optimized; DirectQuery = source database queried
9️⃣ Materialized Views
Definition
Materialized views are pre-computed, stored query results that improve performance for repeated queries.
When and Why Used
- When: Complex aggregations run repeatedly; query performance is slow
- Why: Stores results physically; refreshes periodically; drastically reduces compute
How It Works in Fabric
- Created in Warehouse using T-SQL:

```sql
CREATE MATERIALIZED VIEW AS SELECT...
```

- Query optimizer automatically uses the view when applicable
- Refreshed manually or on schedule
- Stored as physical tables
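Following the syntax form above, a hypothetical example matching the scenario below (the view, table, and column names are invented for illustration):

```sql
-- Pre-compute the sales-by-region aggregate once,
-- so repeated dashboard queries read stored results
CREATE MATERIALIZED VIEW dbo.mvSalesByRegion
AS
SELECT Region, SUM(Amount) AS TotalAmount
FROM dbo.FactSales
GROUP BY Region;
```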
Real-World Example
Daily sales aggregation by region takes 10 minutes. Materialized view pre-computes results nightly, queries complete in seconds.
Exam Appearance
- "Repeated aggregations are slow" β Materialized Views
- "Pre-compute results for performance" β Materialized Views
Differences from Similar Concepts
- Materialized Views vs Regular Views: Materialized = stored results; Regular = virtual query
- Materialized Views vs Tables: Materialized = auto-updated from source; Tables = static until manually updated
🔟 Fact Tables
Definition
Fact tables store quantitative business metrics (sales, revenue) in a dimensional model, typically in star schema.
When and Why Used
- When: Building data warehouse dimensional models for analytics
- Why: Central table for measurements; joins to dimension tables for context
How It Works in Fabric
- Contains measures (Amount, Quantity) and foreign keys to dimensions
- Optimized for aggregations (SUM, AVG, COUNT)
- Typically large, millions/billions of rows
- Forms center of star schema
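A hypothetical DDL sketch of the fact table described in the example below (column names follow that example; the types are assumptions):

```sql
CREATE TABLE dbo.FactSales (
    TransactionID BIGINT NOT NULL,        -- business key for the transaction
    SaleDate      DATE   NOT NULL,        -- the Date column; often a key to DimDate
    ProductID     INT    NOT NULL,        -- foreign key to DimProduct
    CustomerID    INT    NOT NULL,        -- foreign key to DimCustomer
    Amount        DECIMAL(18,2) NOT NULL  -- additive measure for aggregation
);
```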
Real-World Example
FactSales table stores TransactionID, Date, ProductID, CustomerID, Amount. Joins to DimProduct, DimCustomer for reporting.
Exam Appearance
- "Central table storing business metrics" β Fact Table
- "Star schema center with measures" β Fact Table
Differences from Similar Concepts
- Fact vs Dimension: Fact = measures (what); Dimension = context (who, where, when)
- Fact vs Transactional Table: Fact = aggregated for analytics; Transactional = raw operational data
1️⃣1️⃣ Relational Cardinality
Definition
Cardinality defines the relationship type between tables: one-to-one (1:1), one-to-many (1:*), many-to-many (*:*).
When and Why Used
- When: Building semantic models in Power BI; defining table relationships
- Why: Determines how filters propagate; impacts query performance and correctness
How It Works in Fabric
- 1:* (One-to-Many): Most common; dimension filters fact table
- 1:1: Rare; two tables with matching primary keys
- *:*: Avoided in star schema; requires bridge tables
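For the many-to-many case, a bridge table breaks one *:* relationship into two ordinary 1:* relationships. A hypothetical sketch (table and column names are invented):

```sql
-- Each row pairs one customer with one account, so DimCustomer -> Bridge
-- and DimAccount -> Bridge are both plain 1:* relationships
CREATE TABLE dbo.BridgeCustomerAccount (
    CustomerID INT NOT NULL,  -- joins to DimCustomer (1:*)
    AccountID  INT NOT NULL   -- joins to DimAccount (1:*)
);
```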
Real-World Example
DimProduct (ProductID unique) → FactSales (ProductID repeated) = 1:* relationship
Exam Appearance
- "Define relationship between DimCustomer and FactSales" β 1:* (One-to-Many)
- "Avoid performance issues with relationships" β Avoid *:*
Differences from Similar Concepts
- 1:* vs *:*: 1:* = efficient filtering; *:* = performance penalty, complex logic
1️⃣2️⃣ Capacity Planning
Definition
Capacity planning determines the Fabric capacity size (F2, F64, F128, etc.) based on workload requirements.
When and Why Used
- When: Provisioning Fabric environment; experiencing performance issues
- Why: Ensures adequate compute resources; optimizes cost vs. performance
How It Works in Fabric
- Capacity = compute units (CU) allocated for workloads
- Sizes: F2 (2 CU) → F2048 (2048 CU)
- Charged per hour; can pause/resume
- Monitoring via Capacity Metrics app
Real-World Example
Data engineering team runs heavy Spark jobs. F64 capacity is too slow. Upgrade to F128 reduces job time from 2 hours to 30 minutes.
Exam Appearance
- "What determines capacity size?" β Workload intensity + budget
- "Multiple capacities needed for..." β Compliance, billing, workload segregation
Differences from Similar Concepts
- Capacity vs Workspace: Capacity = compute resource; Workspace = logical container
1️⃣3️⃣ Data Ingestion
Definition
Data ingestion is the process of moving data from source systems into Fabric (Lakehouse, Warehouse, KQL Database).
When and Why Used
- When: Initial data load; ongoing incremental updates; real-time streaming
- Why: Centralizes data for analytics; enables transformations
How It Works in Fabric
Methods vary by source:
- Cloud storage (ADLS, S3): Shortcuts (zero-copy)
- Databases (Azure SQL, Snowflake): Mirroring
- On-prem databases: Pipelines + On-prem gateway
- Streaming (Event Hub): Eventstream
- Files (CSV, Excel): Dataflows Gen2
- Custom logic: Notebooks (Spark)
Real-World Example
Company ingests:
- ADLS data via Shortcuts
- On-prem SQL via Pipeline + Gateway
- IoT streams via Eventstream
Exam Appearance
- "Ingest on-prem SQL data" β Pipeline + On-prem gateway
- "Zero-copy from ADLS" β Shortcuts
- "Real-time events" β Eventstream
Differences from Similar Concepts
- Shortcuts vs Mirroring: Shortcuts = reference, no copy; Mirroring = replicated, near-real-time
- Dataflows vs Notebooks: Dataflows = no-code; Notebooks = code-based
1️⃣4️⃣ Data Gateways
Definition
Gateways are secure connectors enabling Fabric to access on-premises or private network data sources.
When and Why Used
- When: Data sources behind firewall (on-prem SQL Server, file shares)
- Why: Enables secure data ingestion without exposing sources to internet
How It Works in Fabric
Two types:
- On-Premises Data Gateway: Installed on local machine; creates outbound connection to Fabric
- VNet Data Gateway: Uses Azure VNet private endpoints; no on-prem installation needed
Real-World Example
Retail company has SQL Server on-prem. Installs On-Prem Gateway, configures Dataflow Gen2 to ingest sales data daily.
Exam Appearance
- "Ingest from on-prem SQL Server" β On-prem Data Gateway
- "Secure ingestion via private endpoint" β VNet Gateway
Differences from Similar Concepts
- On-Prem vs VNet Gateway: On-Prem = local install; VNet = Azure-managed
1️⃣5️⃣ Medallion Architecture
Definition
Medallion (Bronze-Silver-Gold) is a data organization pattern separating raw, curated, and business-ready data layers.
When and Why Used
- When: Building scalable data platforms with clear data quality stages
- Why: Separates concerns; enables incremental refinement; supports diverse use cases
How It Works in Fabric
- Bronze (Raw): Ingested data as-is; minimal transformations
- Silver (Curated): Cleaned, validated, deduplicated; Delta tables
- Gold (Business-Ready): Aggregated, modeled; star schema for consumption
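A condensed Spark SQL sketch of the three layers (all table and path names are hypothetical, invented for illustration):

```sql
-- Bronze: land raw files as-is, minimal transformation
CREATE TABLE IF NOT EXISTS bronze_clicks
USING DELTA AS
SELECT * FROM json.`Files/raw/clickstream/`;

-- Silver: cleaned, typed, deduplicated Delta table
CREATE TABLE IF NOT EXISTS silver_sessions
USING DELTA AS
SELECT DISTINCT session_id,
       user_id,
       CAST(event_time AS TIMESTAMP) AS event_time
FROM bronze_clicks
WHERE session_id IS NOT NULL;

-- Gold: business-ready aggregate for consumption
CREATE TABLE IF NOT EXISTS gold_daily_active_users
USING DELTA AS
SELECT CAST(event_time AS DATE) AS event_date,
       COUNT(DISTINCT user_id) AS active_users
FROM silver_sessions
GROUP BY CAST(event_time AS DATE);
```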
Real-World Example
E-commerce platform:
- Bronze: Raw JSON clickstream files
- Silver: Parsed, deduplicated sessions (Delta)
- Gold: Aggregated user behavior metrics (Warehouse)
Exam Appearance
- "Separate raw and curated data layers" β Medallion Architecture
- "Bronze, Silver, Gold pattern" β Medallion Architecture
Differences from Similar Concepts
- Medallion vs ETL: Medallion = layered storage pattern; ETL = process pattern
🎯 Quick Reference: When to Use What
| Requirement | Solution |
|---|---|
| Low-code ingestion | Dataflows Gen2 |
| Raw + curated data, Spark | Lakehouse |
| Structured SQL only | Warehouse |
| Historical tracking, updates | Delta Tables |
| Real-time streaming analytics | KQL + Eventhouse |
| Row-based security | RLS in Power BI |
| No data duplication | Direct Lake Mode |
| Repeated query optimization | Materialized Views |
| On-prem data ingestion | On-Prem Gateway |
| Layered data organization | Medallion Architecture |