DEV Community

Abel

Posted on • Originally published at Medium on

OmniSync: Near Real-Time Lakehouse, Spark Streaming and Power BI in Microsoft Fabric (Part 2)

Building near real-time reports and analytics with Fabric’s Lakehouse, Spark, and Power BI.

Introduction

This second part of the OmniSync series focuses on how Microsoft Fabric was used to build the analytics and reporting layer, fed by both legacy data and near real-time sync flows from Salesforce and Dynamics 365.

The goal wasn’t just to centralize data, but to apply real architectural principles:

  • Medallion layer structuring (Bronze, Silver, Gold)
  • Real-time streaming with Spark
  • Unified models powering live Power BI reports

Fabric provided a strong foundation for this — combining Dataflows Gen2 , Lakehouse , Spark Notebooks , and Power BI into a single analytics platform.

While this phase required learning new tools, it gave complete control over ingestion, transformation, and reporting, all within a single platform.

Microsoft Fabric Workspaces

To keep things organized, OmniSync uses a dedicated Fabric workspace to manage all components involved in ingestion, transformation, and reporting.

This workspace contains:

  • Lakehouses for storing raw and curated data
  • Environments for custom libraries like log4Java
  • API for GraphQL so external clients (like the SPA) can query data
  • Eventstream to ingest external data and store it in the Lakehouse
  • Semantic Models to model entities, relationships, and measures
  • Dataflows Gen2 for batch ingestion (like the initial load)
  • Spark Notebooks for transformations, CDC logic, and business rules
  • Spark Jobs for streaming notebooks
  • Power BI reports for final dashboards and metrics
  • Monitoring Eventhouse for workspace-level monitoring
  • Deployment Pipelines for CI/CD across different environments

Each workspace is linked to a Microsoft OneLake storage container behind the scenes, where all the data (Delta tables, logs, artifacts) is physically stored.

Why Lakehouse Instead of Data Warehouse?

OmniSync uses a Lakehouse as its main data store instead of a traditional Data Warehouse or even the newer SQL Database option in Microsoft Fabric.

This decision was primarily driven by the goal of exploring and learning about the Lakehouse architecture and the use of Spark within Fabric. It gave a chance to evaluate its benefits and limitations compared to:

  • A more traditional operational database like SQL
  • A classic OLAP-style Data Warehouse

While this may not have been the most production-optimized option for a real-time scenario (an Eventhouse architecture might have been more suitable), it fit the PoC’s purpose perfectly, allowing us to experiment with modern patterns, event-driven ingestion, and Spark-based transformations in a controlled and flexible environment.

Microsoft provides a detailed guide to help decide between these storage types in Fabric:

👉Choosing a data store in Microsoft Fabric

In this case, the Lakehouse provided the right mix of flexibility and hands-on Spark learning, while also supporting the real-time ingestion patterns needed from CDC.

SQL Analytics Endpoint

The SQL Analytics Endpoint in Microsoft Fabric allows you to query Lakehouse Delta tables using standard T-SQL, so there is no need to write Spark code for everything.

For OmniSync, it was primarily used to:

  • Run ad hoc SQL queries to inspect raw and cleaned data
  • Query externally from Logic Apps and validate integration data

This endpoint gives you a familiar SQL interface to the Lakehouse, which is especially useful when building reports or debugging synchronization issues without needing to open a notebook.

It’s worth noting that while SQL Endpoint is great for reporting and light analytics, complex transformation logic is still handled more efficiently in Spark Notebooks.

Medallion Architecture

Before diving into the Medallion setup, keep in mind that OmniSync’s design draws on ideas from Kappa and Lambda architectures, adapted into a simplified event-driven model.

As explained in Part 1, OmniSync first runs an initial ETL process using Dataflows Gen2 to load legacy SQL Server data into Fabric. After that, all operations are handled on a streaming basis.

This Medallion structure builds on that foundation: starting with batch ingestion, but fully shifting into real-time sync after the initial load.

A Medallion Architecture is a common pattern in modern data engineering that organizes data into structured layers — typically Bronze , Silver , and Gold  — to improve clarity, scalability, and data quality as it flows from raw ingestion to final reporting.

  • Bronze : Raw, unprocessed data landed exactly as received
  • Silver : Cleaned, filtered, and enriched data, ready for modeling
  • Gold : Business-ready data, often aggregated or transformed for analytics and BI tools

This layered approach allows teams to decouple ingestion from transformation and reporting, while also enabling better debugging, governance, and performance optimization at each stage.

OmniSync uses a Medallion architecture with one difference: the Bronze and Silver layers are used primarily for the initial load, while the Gold layer is continuously updated with real-time streaming data from Salesforce and Dynamics 365.
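As a rough illustration of this layering, the plain-Python sketch below simulates the flow: Bronze and Silver run once at the initial load, and Gold is then kept fresh by CDC events. All names and record shapes here are hypothetical, not the actual notebook code.

```python
def to_bronze(raw_rows):
    """Bronze: land data exactly as received -- no transformation."""
    return list(raw_rows)

def to_silver(bronze_rows):
    """Silver: clean and normalize -- trim strings, drop incomplete rows."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in bronze_rows
        if row.get("StoreName")  # discard rows missing a natural key
    ]

def to_gold(silver_rows, cdc_events):
    """Gold: copy Silver once, then apply streaming CDC changes on top."""
    gold = {row["StoreName"]: row for row in silver_rows}
    for event in cdc_events:
        if event["Operation"] == "delete":
            if event["Values"]["StoreName"] in gold:
                gold[event["Values"]["StoreName"]]["IsDeleted"] = True
        else:  # insert / update
            gold[event["Values"]["StoreName"]] = event["Values"]
    return gold

bronze = to_bronze([{"StoreName": " Contoso Seattle "}, {"StoreName": ""}])
silver = to_silver(bronze)
gold = to_gold(silver, [{"Operation": "update",
                         "Values": {"StoreName": "Contoso Seattle",
                                    "City": "Seattle"}}])
print(gold)
```

The real implementation works on Delta tables with Spark, of course; the point is only the division of responsibilities between layers.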

Bronze Layer

The Bronze layer ingests legacy data from a SQL Server database which in this case is Microsoft’s sample ContosoDataWarehouse, adapted for our own entities and structure.

Ingestion is handled via Dataflows Gen2 , which provides a fast, no-code way to move data into the Bronze Lakehouse without applying any transformations. It preserves the structure and contents exactly as received.

Silver Layer

The Silver layer is where we start to prepare the data for actual use.

A Spark Notebook reads from the Bronze layer and performs several operations:

  • Cleansing and normalizing data
  • Seeding initial structures
  • Applying business transformations
  • Creating new tables and adding derived columns
  • Mapping values using reference tables (like MasterDataMapping)

For example, when processing Stores :

  • A new table is created with cleaned schema
  • Geolocation values are extracted into latitude and longitude
  • Store-to-customer relationships are inferred and added

Another example is Account seeding, where incoming Salesforce and Dynamics 365 IDs are mapped using the MasterDataMapping table, aligning cross-platform references for future synchronization logic.
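Conceptually, that lookup behaves like the sketch below. The mapping rows and the helper function are illustrative (the real logic lives in a Spark notebook joining against the MasterDataMapping Delta table), but the column names match the table structure described later in this post.

```python
# Illustrative MasterDataMapping rows -- contents are made up.
MASTER_DATA_MAPPING = [
    {"Entity": "Customer", "FabricId": 1001,
     "SalesForceId": "001XX0000AAA", "D365Id": "d365-guid-01"},
]

def resolve_fabric_id(entity, external_id, source):
    """Map a Salesforce or Dynamics 365 ID to the internal Fabric ID."""
    key = {"salesforce": "SalesForceId", "d365": "D365Id"}[source]
    for row in MASTER_DATA_MAPPING:
        if row["Entity"] == entity and row[key] == external_id:
            return row["FabricId"]
    return None  # unknown record -> treated as new during seeding

print(resolve_fabric_id("Customer", "001XX0000AAA", "salesforce"))  # 1001
```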

After this layer, data is structured and aligned with OmniSync’s data model, but not yet real-time.

Gold Layer

The Gold layer is where real-time streaming meets analytics.

Initially, this layer is populated via a notebook that copies cleaned tables from the Silver layer. But more importantly, Gold becomes the landing zone for live CDC events, handled by streaming Spark notebooks that process changes from Salesforce and Dynamics 365 in near real-time.

This layer includes the final Sales fact table , which joins and flattens multiple sources and becomes the main source for Power BI reporting.

In OmniSync, the Gold layer is the main data layer. It combines past data with new updates, making the system simple, scalable, and ready for analysis.

Star Schema Model

The final Gold layer in OmniSync follows a Star Schema, a widely used and performant model for analytics tools like Power BI.

This wasn’t designed from scratch. The starting point was a legacy SQL Server Data Warehouse, which already had well-defined fact and dimension tables. For the PoC, we simplified that model and focused on what mattered most: clean reporting, easier maintenance, and compatibility with real-time updates.

At the center of the model is the SalesOrder fact table , adapted from the original FactOnlineSales fact table in Contoso’s warehouse. It is materialized into a second fact table, Sales , which becomes the main table consumed by Power BI and the foundation for all reporting.

Surrounding it are simplified dimension tables , created and enriched through Spark Notebooks:

  • Products
  • Product (Sub)Categories
  • Stores
  • Customers (Accounts)
  • Geography
  • Date
  • Currency

Each of these dimensions is either loaded during the initial data load or kept fresh through real-time CDC synchronization from Salesforce and Dynamics 365.

In this model:

  • Fact tables hold the transactional data
  • Dimensions describe the context (e.g., which product, store, customer, or region)
  • Relationships are built using foreign keys , enabling fast filtering and slicing in Power BI
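To make the foreign-key mechanics concrete, here is a toy illustration of star-schema slicing: a fact row references its dimensions through keys, so filtering by a dimension attribute is a key lookup plus a filter. The table contents are invented for the example.

```python
# Tiny in-memory star schema: one fact table, two chained dimensions.
stores = {1: {"StoreName": "Contoso Seattle", "CountryKey": 10}}
geography = {10: {"Country": "United States"}}
sales = [
    {"StoreKey": 1, "DateKey": 20240101, "Sales": 250.0},
    {"StoreKey": 1, "DateKey": 20240102, "Sales": 100.0},
]

def total_sales_by_country(country):
    """Slice the fact table through Store -> Geography foreign keys."""
    return sum(
        row["Sales"] for row in sales
        if geography[stores[row["StoreKey"]]["CountryKey"]]["Country"] == country
    )

print(total_sales_by_country("United States"))  # 350.0
```

Power BI performs exactly this kind of traversal (far more efficiently) when a report slicer filters a dimension and the fact table follows the relationship.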

The semantic layer was built using:

  • Spark Notebooks to clean and prepare dimensions
  • Manual modeling in Power BI to define relationships, hierarchies, and joins
  • DAX measures to define key metrics and calculated columns

This setup offers:

  • Fast cross-filtering across dimensions
  • Reusable metrics that stay consistent across reports
  • A clean separation between transformation logic and report logic

By applying a classic Star Schema, we keep the Power BI model both performant and intuitive, even with real-time data flowing in from multiple platforms.

Power BI

Power BI serves as the primary reporting tool for OmniSync, running directly on top of the Gold Lakehouse tables.

Thanks to Microsoft Fabric’s native integration with Power BI, reporting is seamless, with no complex external connections needed.

  • Reports connect to the Gold Lakehouse tables using Direct Lake for high performance with cached data
  • Data model is shaped using the Star Schema built in Fabric
  • DAX is used to define measures and KPIs
  • Reports were modeled and created in Power BI Desktop.

Implemented Reports

Several Power BI reports were created to visualize data synced from Salesforce and Dynamics 365 into the new Lakehouse, built directly on top of the Gold layer in Fabric.

Key Reports Included

📈Sales Report

Displays total sales volume by region, store, product category, and date, plus COGS, Net Profit, and % Change in Sales, among others.

📈Sales By Country

Displays total sales volume and LY (last year) comparison by selectable continent, country, brand, class, and product category, among others.

📈 Sales By Store

Displays total sales by store, LY by month, product categories and geography of the stores.

📈Margin Analysis

Shows profit analysis, net profit comparisons and those margins by month, city and product category.

📈 Sales By Product

Displays total sales by product with last-year comparison by country, brand, class, manufacturer, color, and category, among others.

Each report is built with:

  • Slicers for key dimensions (time, region)
  • DAX measures for dynamic KPIs (e.g., % Net Profit, COGS)
  • Visual filtering via field parameters

Semantic Model & DAX Measures

On top of the Gold Lakehouse tables, OmniSync builds a Fabric semantic model that defines relationships and key performance metrics using DAX, giving Power BI reports a clean, efficient layer to query and business stakeholders a consistent model to consume.

Data Analysis Expressions (DAX) is the formula language used in Power BI to define calculations, aggregations, and business logic across tables and models.

Semantic model includes:

  • Relationships between fact and dimension tables are defined, following the Star Schema built inside Fabric.
  • Calculated columns are used for derived attributes, such as Store Type or Category Group.
  • Hierarchies are added for drilldowns, like navigating from Country → Region → City in Geography analysis.
  • Business calculations and KPIs are created using DAX

Several measures were used across the reports. Some are relatively complex, like the one below, which calculates Total Sales for the selected Store and Date range:

Store Sales (Selected) =
CALCULATE(
    [Total Sales],
    FILTER(
        ALLSELECTED('Date'[DateKey]),
        'Date'[DateKey] <= MAX('Date'[DateKey])
    ),
    GROUPBY(Store, Store[StoreName])
)

Others are simple, like Total Sales, which is simply the sum of the Sales column:

Total Sales = SUM(Sales[Sales])

Real-Time Hub: Streaming Data into Fabric

After the initial batch load, OmniSync shifts entirely into near real-time data ingestion, powered by Azure services and Spark Streaming inside Microsoft Fabric.

At the core of this live data flow is Fabric’s Real-Time Hub, a specialized engine that enables continuous event ingestion directly into Lakehouse tables without requiring heavy batch processing.

How the Streaming Happens

  1. Salesforce and Dynamics 365 emit change events (via CDC streams or webhooks).
  2. Events are routed through a middleware layer involving EventGrid , Logic Apps , and Azure Event Hub.
  3. A Fabric Spark Streaming Notebook listens continuously to Event Hub using the Real-Time Hub.
  4. Incoming events are processed, enriched, and immediately written into the Gold layer of the Lakehouse.

In OmniSync, the Real-Time Hub makes it possible to continuously push changes from Salesforce and Dynamics 365 directly into Fabric with minimal infrastructure setup.

Benefits of this Setup

  • Near real-time updates for key business entities (Accounts, Orders, Products)
  • Automatic enrichment (resolving relationships, IDs, lookups) during ingestion
  • Fresh Power BI dashboards without relying on heavy batch refreshes
  • Low operational overhead : No manual Spark cluster management needed

While actual latency can vary slightly (due to retries, cold starts, or checkpoint delays), most updates flow through and appear in the Gold Lakehouse layer within 5–20 seconds after a change is made in the source system.

Thanks to Fabric’s Real-Time Hub, OmniSync achieves true near-real-time syncing, not just for analytics, but for live operational data across systems.

CDC Table Format

To make real-time ingestion manageable, each event captured from Salesforce or Dynamics 365 is normalized into a consistent CDC (Change Data Capture) format before being processed in Fabric.

Each incoming record includes fields like:

  • Entity: The type of entity (e.g., Account, Order)
  • Operation: Type of change (insert, update, delete)
  • Values: JSON object representing the record state
  • CreatedDate: Timestamp of when the row was created
  • UpdateDate: Timestamp of when the row was last updated

This consistent schema makes it easier for Spark notebooks to:

  • Apply insert/update/delete logic
  • Route each entity type to the right table
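As a hedged sketch of that normalization step, the function below reshapes a raw source event into the CDC shape listed above (Entity / Operation / Values / CreatedDate / UpdateDate). The field names of the incoming raw payload (`objectType`, `changeType`, `payload`, `createdOn`) are hypothetical.

```python
import json
import time

def normalize_cdc(raw):
    """Reshape a raw source event into OmniSync's consistent CDC record."""
    now_ms = int(time.time() * 1000)
    return {
        "Entity": raw["objectType"],              # e.g. "Account", "Order"
        "Operation": raw["changeType"].lower(),   # insert | update | delete
        "Values": json.dumps(raw["payload"]),     # record state as JSON text
        "CreatedDate": raw.get("createdOn", now_ms),
        "UpdateDate": now_ms,
    }

event = normalize_cdc({"objectType": "Account",
                       "changeType": "UPDATE",
                       "payload": {"Name": "Contoso"},
                       "createdOn": 1700000000000})
print(event["Operation"])  # update
```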

Streaming Notebook for CDC Processing

As explained earlier, we use a streaming technique to capture and process CDC data from external systems into Fabric. All raw CDC events are first stored in a centralized ExternalCDC Delta table , which acts as the landing zone for real-time changes coming from Azure Event Hub.

To process these changes continuously, we use Spark Structured Streaming , and run it as a long-running Spark Job Definition in Fabric. This keeps the notebook running in the background, monitoring for new events and processing them as they arrive in near real time.

Here’s the basic Spark streaming code:

# foreachBatch supplies the sink, so no .format() call is needed;
# the checkpoint lets the stream resume where it left off after restarts
query = (
    df.writeStream
      .outputMode("append")
      .trigger(processingTime="10 seconds")   # micro-batch every 10 seconds
      .option("checkpointLocation", "Files/cdc_checkpoint")
      .foreachBatch(sendToSinkTable)          # custom routine applied per batch
      .start()
)
query.awaitTermination()                      # block so the job keeps running

This stream is scheduled and executed as a Fabric Spark Job Definition , allowing it to run continuously and restart on failure without manual triggering. The job is rescheduled every night at 1:00 AM, outside working hours, and its retry policy retries indefinitely on failure, every 10 seconds.

The function sendToSinkTable is a custom PySpark routine that decodes each batch of CDC events and applies them to the correct Lakehouse tables.

It handles all Create , Update , and Delete operations by:

  • Parsing the raw JSON payloads into DataFrames
  • Determining the type of operation
  • Routing the data to the appropriate destination tables

It also performs key logic like:

  • Creating auxiliary data (Dates, Geography)
  • Generating identity primary keys where needed
  • Mapping relationships (e.g., Store → Account)
  • Decoding epoch timestamps
  • Aligning cross-system IDs (Salesforce, Dynamics, Fabric)
  • Materializing sales data for Power BI reporting
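A stripped-down, illustrative skeleton of what a sendToSinkTable-style routine does per event is shown below: parse the JSON payload, decode the epoch timestamp, and dispatch by operation type. The in-memory `tables` dict stands in for the Gold Delta tables; the real notebook performs Delta merges instead.

```python
import json
from datetime import datetime, timezone

tables = {"Account": {}}  # stand-in for the Gold-layer Delta tables

def apply_event(event):
    """Route one normalized CDC event to its destination table."""
    values = json.loads(event["Values"])
    entity, op = event["Entity"], event["Operation"]
    # Decode epoch milliseconds into a readable UTC timestamp
    values["UpdateDate"] = datetime.fromtimestamp(
        event["UpdateDate"] / 1000, tz=timezone.utc).isoformat()
    key = values["Id"]
    if op == "delete":
        if key in tables[entity]:
            tables[entity][key]["IsDeleted"] = True  # soft delete only
    else:  # insert / update
        tables[entity][key] = values

apply_event({"Entity": "Account", "Operation": "insert",
             "Values": json.dumps({"Id": "A1", "Name": "Contoso"}),
             "UpdateDate": 1700000000000})
print(tables["Account"]["A1"]["Name"])  # Contoso
```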

Below we can see an image of the Sales materialization PySpark notebook code:

Geolocation Mapping via GeoApify

To enrich location-based entities like Stores and Accounts, OmniSync integrates with the GeoApify API to resolve geographic coordinates and standardize location data within the Geography dimension.

This enrichment is handled by a dedicated Geography CDC Spark notebook, which is triggered as part of the real-time synchronization flow. The notebook analyzes each CDC event and determines the best way to enrich location fields based on the input data.

Depending on the payload:

  • If latitude and longitude are present, it performs reverse geocoding using GeoApify to extract: street, city, postal code, region and country
  • If coordinates are not available , the notebook uses available address components (street, city, country, etc.) to perform forward geocoding , resolving coordinates instead.
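That forward-vs-reverse decision can be sketched as below. The `/reverse` and `/search` endpoint paths match GeoApify's public geocoding API, but the record fields and helper are illustrative, and no HTTP call is made here (the real notebook calls the API with the key pulled from Key Vault).

```python
from urllib.parse import urlencode

GEOAPIFY_BASE = "https://api.geoapify.com/v1/geocode"

def build_geocode_request(record, api_key):
    """Pick reverse or forward geocoding based on the available fields."""
    if record.get("lat") is not None and record.get("lon") is not None:
        # Coordinates present -> reverse geocoding to fill address fields
        params = {"lat": record["lat"], "lon": record["lon"], "apiKey": api_key}
        return f"{GEOAPIFY_BASE}/reverse?{urlencode(params)}"
    # Otherwise -> forward geocoding from whatever address parts exist
    text = ", ".join(filter(None, [record.get("street"),
                                   record.get("city"),
                                   record.get("country")]))
    params = {"text": text, "apiKey": api_key}
    return f"{GEOAPIFY_BASE}/search?{urlencode(params)}"

url = build_geocode_request({"lat": 47.6, "lon": -122.3}, "KEY")
print(url.split("?")[0])
```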

The notebook then merges the enriched fields (like resolved_country, lat, lon) back into the corresponding entity record before writing to the Delta table.

GeoApify Provides:

  • Address normalization and cleaning
  • Reverse geocoding (latitude/longitude → full address info)
  • Forward geocoding (partial address → coordinates)
  • A consistent geographic hierarchy : Country > Region > City > Postal Code > Street

The API is called directly from within Spark using an HTTP client, and the GeoApify API key is stored securely in Azure Key Vault, injected into the notebook environment at runtime.

This enrichment is especially useful for region-based reporting in Power BI and for filtering and aggregating KPIs by geography, ensuring accurate and consistent location data across the entire reporting layer.

Below we can see an image of the PySpark notebook code of this:

Master Data Mapping

As mentioned earlier, OmniSync uses a dedicated table called MasterDataMapping to maintain consistent references between entities across different systems.

This table plays a critical role in aligning data between Fabric , Salesforce , Dynamics 365 , and potentially SAP (in future phases). It supports mapping for key business entities like Customers, Products, Stores, SalesOrders, and Currencies , ensuring that records from different platforms can be reliably matched.

🔧 Table Structure:

  • MasterDataMapping: Identity-based primary key (auto-incremented)
  • FabricId: Internal identifier used within Fabric
  • SalesForceId: External key referencing the Salesforce ID
  • D365Id: External key referencing the Dynamics 365 ID
  • SAPId: Reserved for future SAP integration
  • Entity: The type of entity (e.g., Currency, Customer, Product, SalesOrder, Store)
  • Name: Natural key or label to help identify the external entity
  • IsDeleted: Soft-delete flag
  • CreatedDate: When the record was created
  • UpdateDate: When the record was last modified

Previous Synced Systems

This table is pre-seeded with mapping values during the Silver layer processing, as explained earlier. These seed values allow the system to function properly from the start by:

  • Establishing relationships across systems for historical data
  • Preventing sync conflicts
  • Enabling lookups during real-time CDC operations

This table is essential for ensuring that all references remain aligned, especially when syncing multiple systems that don’t share a common key structure.

CRUD

Supported operations in OmniSync are Create , Update , and Delete (obviating Read). While UnDelete was also explored, mainly because Salesforce supports it, it was ultimately left out due to its complexity and the following reasons:

  • Dynamics 365 does have an UnDelete feature, but it’s still in Preview.
  • SAP, which we may support later, doesn’t offer this capability at all.
  • Dynamics 365 also requires a non-trial environment to fully use it.
  • Recovering synced deleted entities in SAP and Dynamics 365 was not feasible in an easy way.

So, to avoid overcomplicating the logic, ensure broader compatibility, and keep the system easy to test, the decision was made not to implement UnDelete in OmniSync.

Salesforce Deletes

Even without UnDelete, Salesforce handles deletions as soft deletes. To align with that behavior, OmniSync implements a soft delete strategy across all entities.

Every table includes an IsDeleted column, and delete operations only mark the record as deleted, rather than removing it.

This decision leaves the door open to reintroduce the UnDelete logic in the future, if needed.

Idempotency

One critical part of the design is idempotency. In a distributed, event-driven system, “ exactly once ” delivery can’t be guaranteed, due to retries, delays, or duplicated events, among other causes.

To handle this, OmniSync enforces idempotent logic during CDC processing:

  • If a record already exists, it won’t be created again
  • Deletes won’t be applied multiple times
  • Updates are allowed if the incoming change is newer

This is all handled in PySpark notebooks, since working with Lakehouse Delta tables means we don’t have traditional database guarantees like primary keys, unique constraints, or foreign keys. Those must be managed at the application logic level.

While not bulletproof, this approach is solid enough for near real-time sync scenarios and avoids common pitfalls in distributed data processing.
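The three rules above can be condensed into a small sketch. This is illustrative application-level logic, not the notebook code: skip duplicate creates, apply deletes at most once, and accept updates only when the incoming change is newer than what is stored.

```python
table = {}  # key -> {"UpdateDate": epoch_ms, "IsDeleted": bool, ...}

def apply_idempotent(key, op, values, update_date):
    """Apply a CDC change so that replayed events are harmless."""
    existing = table.get(key)
    if op == "insert" and existing is not None:
        return "skipped"            # already created
    if op == "delete":
        if existing is None or existing["IsDeleted"]:
            return "skipped"        # nothing to delete / already deleted
        existing["IsDeleted"] = True
        return "deleted"
    if existing is not None and update_date <= existing["UpdateDate"]:
        return "skipped"            # stale or duplicate event
    table[key] = {**values, "UpdateDate": update_date, "IsDeleted": False}
    return "applied"

assert apply_idempotent("A1", "insert", {"Name": "Contoso"}, 100) == "applied"
assert apply_idempotent("A1", "insert", {"Name": "Contoso"}, 100) == "skipped"
assert apply_idempotent("A1", "update", {"Name": "Contoso Ltd"}, 90) == "skipped"
```

Replaying the same event stream against this logic leaves the table in the same state, which is exactly the property needed when Event Hub redelivers messages.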

Monitoring in Microsoft Fabric

Microsoft Fabric offers a comprehensive suite of monitoring tools to help track performance, resource usage, and activity across your entire data ecosystem, from ingestion flows to Power BI and Spark jobs.

These tools are especially important in scenarios like OmniSync, where streaming jobs, resource capacity, and real-time sync health must be monitored closely.

The following are the monitoring tools used in Fabric for OmniSync.

Monitor Hub

Purpose : Centralized activity tracking across all workspaces

  • Provides a real-time view of active tasks, such as pipelines , dataflows , and notebooks
  • Fabric users can see activities for items they have access to, which is useful for quick status checks
  • Great for monitoring job completion, failure states, or troubleshooting long-running tasks

Workspace Monitoring

Purpose : In-depth logging and metric collection within a specific workspace

  • Collects logs and performance data for all workspace-level items
  • Helps with troubleshooting , performance tuning , and resource optimization
  • Useful for capacity planning and identifying usage patterns over time

Microsoft Fabric Capacity Metrics App

Purpose : Enables monitoring of compute and storage usage for Microsoft Fabric and Power BI Premium capacities.

  • Tracks Capacity Units (CU) consumption offering insights into compute resource usage.
  • Provides storage usage metrics, including OneLake and tenant-level storage.
  • Identifies peak demand, usage patterns, and potential throttling issues to optimize performance.
  • Helps admins decide on resource scaling or enabling autoscale for efficient capacity management.

Lakehouse & Spark Monitoring (with Log Analytics)

Purpose : Track activity, performance, and health of Spark notebooks and streaming jobs running inside Fabric Lakehouses.

Because OmniSync relies heavily on Spark Structured Streaming to ingest CDC events, it’s essential to monitor Spark jobs at a granular level, beyond just workspace or capacity views.

Fabric provides two main ways to monitor Spark activity:

🔎 Native Spark Monitoring

  • Track Spark session start/end times
  • Monitor executor memory and job durations
  • Inspect checkpointing behavior for streaming pipelines
  • Detect job failures, retries, and streaming lags

This basic monitoring is available through Fabric UI — but for deeper observability, integration with Azure Log Analytics is recommended.

🛠️ Log Analytics Integration for Spark Logs

To have proper, centralized log monitoring, Azure Log Analytics is connected to Fabric through Environment configuration.

This setup allows you to:

  • Capture custom logs and metrics directly from Spark notebooks
  • Retain detailed execution history
  • Run advanced queries to analyze job behavior, memory usage, and exceptions

To enable this, you need to create a Fabric Environment linked to the OmniSync workspace and configure it with the Log Analytics Workspace ID and key. After that, Spark jobs and notebooks can be attached to run inside that Environment.

More detailed instructions are provided here:

🔗 Configure Fabric Diagnostic Emitters for Log Analytics

Once configured, you can query Spark execution logs in Log Analytics using Kusto Query Language (KQL).

Kusto Query Language (KQL) is a powerful query language used to analyze large volumes of structured, semi-structured, and unstructured data in Microsoft Azure services like Azure Data Explorer, Azure Monitor, and Microsoft Fabric.

SparkListenerEvent_CL
| where fabricWorkspaceId_g == "{FabricWorkspaceId}" 
  and artifactId_g == "{ArtifactId}" 
  and fabricLivyId_g == "{LivyId}"
| order by TimeGenerated desc
| limit 100

This query lets you monitor:

  • When jobs start and finish
  • Memory and execution stats per notebook run
  • Errors, retries, and warnings inside Spark pipelines

This level of deep monitoring is essential for projects like OmniSync, where continuous ingestion reliability and performance are critical.

GraphQL Monitoring API

Purpose : Monitors GraphQL API activity, providing metrics on request rates, latency, and errors for performance optimization.

  • Dashboard & Logs: Features a visual dashboard and stores detailed logs in Kusto tables, retained for 30 days.
  • Troubleshooting: Identifies slow queries and issues; it requires workspace monitoring to be enabled and incurs storage costs.

Security

Microsoft Fabric uses a three-level security model , evaluated sequentially when users access data:

  1. Microsoft Entra ID authentication : Verifies the user can authenticate against Microsoft Entra ID (formerly Azure Active Directory).
  2. Fabric access : Checks whether the user has access rights to Fabric itself.
  3. Data security : Controls what actions the user can perform on specific tables, files, or reports.

At the data security level , Fabric offers several access controls:

  • Workspace roles (Admin, Contributor, Viewer)
  • Item-level permissions (datasets, notebooks, reports)
  • Compute and granular permissions (control over Spark, SQL resources)
  • OneLake data access controls (still in preview)

How Security Was Handled in OmniSync

Since OmniSync was a PoC developed by a single user , security was intentionally kept very simple :

  • Only the Admin role was used across the Fabric workspace.
  • Item-level permissions and granular compute permissions were not configured.
  • OneLake access controls were not enabled.

This allowed faster development without worrying about permission conflicts.

In a production environment, OmniSync would require full role separation, proper item-level security, compute-level permissions, and centralized identity management via Entra ID.

Client App Permissions

For external integrations (like the SPA dashboard or Power BI REST calls), a Microsoft Entra App Registration was created with only:

  • “Power BI Service” permission
  • “Run Queries and Mutations” (for GraphQL-based access)

This limits the app to read-only query access and prevents modification of workspace artifacts.

Key Vault Integration

Key Vault was used to:

  • Securely store sensitive tokens
  • Inject secrets dynamically into Spark notebooks or environment variables at runtime

In OmniSync, GeoApify external API keys were stored securely in Key Vault and pulled when needed by streaming notebooks.

SPA Client App: GraphQL and Embedded Power BI

OmniSync includes a lightweight Single Page Application (SPA) designed to provide a flexible and user-friendly way to interact with the system’s data.

The SPA serves two main purposes:

🔎 Querying Fabric Lakehouse via GraphQL

The app connects to Microsoft Fabric’s GraphQL endpoint to directly query real-time data from the Lakehouse.

  • SPA can run GraphQL queries against views and tables.
  • This allows lightweight, low-latency access to live data without always going through Power BI dashboards, which is handy when more complex ad hoc queries are needed.

Example GraphQL query:

query {
  lakehouseTable(tableName: "salesorders") {
    rows(limit: 10) {
      OrderId
      TotalAmount
      StoreName
    }
  }
}
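From a client's point of view, issuing such a query is an authenticated POST. The sketch below only builds the request; the endpoint URL and token are placeholders (in the real SPA the token comes from an MSAL acquire-token call), and no request is actually sent.

```python
import json

def build_graphql_request(endpoint, token, query):
    """Assemble the POST request for a Fabric GraphQL query."""
    return {
        "url": endpoint,
        "headers": {"Authorization": f"Bearer {token}",
                    "Content-Type": "application/json"},
        "body": json.dumps({"query": query}),
    }

req = build_graphql_request(
    "https://<graphql-endpoint>/graphql",   # placeholder endpoint
    "<access-token>",                       # placeholder MSAL token
    'query { lakehouseTable(tableName: "salesorders") '
    '{ rows(limit: 10) { OrderId TotalAmount StoreName } } }',
)
print(req["headers"]["Content-Type"])  # application/json
```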

Publishing Embedded Power BI Reports

Besides querying raw data, the SPA also embeds full Power BI reports.

  • This lets users interact with pre-built dashboards without needing to navigate the Fabric portal directly.

Authentication is done through:

  • Microsoft Entra App Registration using OAuth 2.0.
  • MSAL client library

This approach combines flexibility (GraphQL) and rich visualization (Power BI) in one unified front-end.

By combining direct data access and embedded visual analytics, the SPA provides a complete real-time monitoring and reporting solution without needing to depend entirely on the Fabric UI.

CI/CD and Deployment

Even though OmniSync was built as a simple PoC with a single main environment, I still took the opportunity to experiment with Fabric’s Git integration and deployment pipelines to simulate how Dev-to-Prod promotion and version control would work in a real-world scenario.

Git Integration in Fabric

Fabric’s Git integration allows:

  • Versioning of artifacts (reports, datasets, notebooks, etc.)
  • Branch separation between development and production workspaces

Fabric Workspace Pipelines

  • A deployment pipeline was created inside Fabric, based on the OmniSync workspace.
  • Two stages were defined:

✔️ Development (default workspace)

✔️ Production (new workspace connected to a different Git branch)

  • The pipeline was linked to Git integration , assigning each workspace to its own branch:

✔️The Development workspace mapped to a dev Git branch.

✔️The Production workspace mapped to the main Git branch.

However, Fabric’s promotion between stages is manual :

  • Developers push changes into the Development branch.
  • Inside Fabric, you must manually press the “Deploy” button to move artifacts from Development to Production stages.

A Pull Request strategy could still be implemented, but as mentioned, this simpler approach was chosen to keep things straightforward.

Lessons Learned

Working with Microsoft Fabric for real-time ingestion and its pipelines helped me understand more about an analytics platform like Fabric. Here are the main takeaways from this phase of the project:

  • Lakehouse + Spark Work : Getting started with Spark notebooks and Delta tables inside Fabric was not easy at first, especially coming from more traditional SQL environments. But once the basics were in place, the flexibility of Spark for transformation and control logic was a huge win.
  • Medallion Architecture: Organizing data into Bronze, Silver, and Gold layers helped keep everything clearly separated, even if it felt a bit heavy for a PoC where only the initial load used the full structure and only the Gold layer stayed active. Still, the separation brought real benefits in keeping the architecture clean and organized.
  • Streaming in Fabric: Streaming data into Fabric Lakehouse using Spark notebooks works well, but it’s not as plug-and-play as typical streaming services. Features like CDC merge logic, idempotency all need to be implemented manually.
  • Power BI: An amazing tool for building and working with dashboards, and it becomes very easy when the data is structured properly with a Star schema, geography, and dates, served through Direct Lake. Still, DAX measures are not as comfortable or easy to learn if you are more used to SQL syntax.

Coming Next

📌 In the next post, I’ll dig into the Salesforce integration.

👀 Follow me here on Medium to catch Part 3.

💻 Source code: https://github.com/zodraz/omnisync-fabric
