Anshul Jangale
How I Finally Implemented CI/CD for Microsoft Fabric — And What Nobody Tells You About It

A data engineer's honest account of automating deployments when the platform is brand new and the documentation is still catching up.


There's a particular kind of confidence that comes from clicking a button and watching your code flow — automatically, safely, predictably — from your laptop all the way to production. It's the kind of confidence that lets you sleep well on a Sunday night before a Monday stand-up.

For a long time, Microsoft Fabric didn't give data engineers that feeling. You built your pipelines, your notebooks, your lakehouse schemas — and then you deployed them the old-fashioned way. Manually. Carefully. Nervously. Hoping you didn't overwrite something important.

That changes now. Here's how we set up a proper CI/CD system for our Fabric data engineering project — and what the journey actually looked like.


The Problem We Were Trying to Solve

Our team was working across three environments — Dev, Test, and Production. Each was its own Microsoft Fabric workspace. On paper, this is clean architecture. In practice, it meant that every time we wanted to push a change to production, someone had to manually coordinate it. Someone had to remember what changed. Someone had to click the right buttons in the right order.

And if something broke in production at 11pm? We had no rollback plan. We had prayers.

The goal was simple: make deployments boring. When something is automated and repeatable, it stops being an event. It just becomes Tuesday.


Choosing the Tools

We already lived in the Microsoft ecosystem — Azure DevOps for version control and pipelines, Microsoft Fabric for the data platform. The natural choice was to wire them together directly.

The architecture we landed on has two layers working in harmony:

Layer 1 — Git Integration. Each Fabric workspace is connected to a branch in Azure DevOps. When a developer commits changes to the Dev branch, the Dev workspace reflects those changes. Same for Test. This means the source of truth lives in Git, not inside some workspace that only one person has access to.

Layer 2 — Fabric Deployment Pipelines. When code is ready to promote, we use Fabric's native deployment pipeline to push content from Dev → Test → Prod. This isn't just copying files — Fabric understands its own artifacts and promotes them intelligently.

The Azure DevOps pipeline is the conductor. It decides when to trigger a deployment, authenticates securely, calls the Fabric API, and waits to confirm everything succeeded before declaring victory.


The Service Principal Problem

Here's where most tutorials skip the hard part.

To automate anything in Fabric, you need a Service Principal (SPN) — essentially a machine identity that your pipeline runs as, rather than impersonating a real human's account. This is non-negotiable for proper automation. A pipeline that runs under someone's personal credentials is one password reset away from breaking.

Getting the SPN set up touched every layer of the organisation:

  • The data platform team needed to register the application in Microsoft Entra ID
  • The security team needed to grant it the right permissions in Fabric — specifically, a tenant-level setting that allows service principals to call Fabric APIs at all (it's off by default)
  • The infrastructure team needed to store the SPN credentials in Azure Key Vault and grant the right access to the right people and pipelines
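Once those three pieces were in place, the pipeline could exchange the SPN credentials for a Fabric API token via the standard client-credentials flow. A sketch of that step as an Azure DevOps YAML task — variable names like `fabricClientId` and `fabricSpnSecret` are placeholders from our setup, not a fixed convention:

```yaml
# Acquire an access token for the Fabric API using the SPN's
# client-credentials grant. The client secret is assumed to already be
# in the pipeline variable $(fabricSpnSecret) via the Key Vault step.
- bash: |
    set -euo pipefail
    TOKEN=$(curl -s -X POST \
      "https://login.microsoftonline.com/$(tenantId)/oauth2/v2.0/token" \
      -d "grant_type=client_credentials" \
      -d "client_id=$(fabricClientId)" \
      -d "client_secret=$(fabricSpnSecret)" \
      -d "scope=https://api.fabric.microsoft.com/.default" \
      | jq -r '.access_token')
    # Register the token as a secret variable so Azure DevOps masks it in logs.
    echo "##vso[task.setvariable variable=fabricToken;issecret=true]$TOKEN"
  displayName: Acquire Fabric API token
```

Note the scope: it targets the Fabric API's `.default` scope, which is what makes the tenant-level "service principals can use Fabric APIs" setting matter — without it, the token is issued but every call fails.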

And this last point was where we learned something important about how enterprise teams operate. When the infrastructure team emailed to say access had been provisioned, it turned out they'd given three developers human read-access to the Key Vault to verify the secret was there, while separately ensuring the pipeline's machine identity could retrieve it at runtime.

At first that felt confusing. Why do humans need access to a secret that pipelines are supposed to use? The answer: someone needs to verify the secret exists before trusting the pipeline to run. Human verification and pipeline consumption are separate concerns, and treating them that way is actually good practice.


Secrets Done Right

The infra team made one very specific request: retrieve the secret once at pipeline startup and cache it in memory for the duration of the run. Do not re-query Key Vault from individual steps.

This pattern deserves more attention than it usually gets. Every time a pipeline task calls Key Vault, that's a network round trip, a potential throttling risk, and another thing that can fail. Fetching once, storing in a pipeline variable, and passing it through the rest of the run is cleaner, faster, and harder to break.

In the final YAML, the AzureKeyVault@2 task runs as a pre-job step — before anything else executes. By the time the first "real" task starts, the Fabric SPN secret is already loaded and masked in pipeline logs. No other task ever reaches back out to Key Vault. The token gets acquired once, used for the deployment, and discarded when the job ends.
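For reference, this is roughly the shape of that pre-job step; the service connection, vault, and secret names are placeholders from our setup:

```yaml
# Fetch the SPN secret exactly once, before any other step in the job runs.
# RunAsPreJob pushes this task ahead of every other task, and the fetched
# value is automatically registered as a secret and masked in pipeline logs.
- task: AzureKeyVault@2
  inputs:
    azureSubscription: 'kv-service-connection'  # placeholder connection name
    KeyVaultName: 'our-keyvault'                # placeholder vault name
    SecretsFilter: 'fabric-spn-secret'          # pull only what we need
    RunAsPreJob: true
# Downstream steps read $(fabric-spn-secret) as an ordinary pipeline
# variable; nothing else in the run ever calls Key Vault directly.
```

Filtering to a single secret (rather than the default `*`) keeps the blast radius small: the pipeline can only ever see the one credential it actually needs.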

This is a small design decision that makes the whole system more resilient.


What the Flow Actually Looks Like

On a normal day, this is what happens:

A developer finishes work in the Dev workspace, commits through Fabric's Git integration UI, and raises a Pull Request from Dev to Test. A colleague reviews and approves it — they cannot approve their own PR, a branch policy we enforced to prevent the classic problem of being both the author and the rubber stamp. When the PR merges, the pipeline fires automatically, authenticates as the SPN, and pushes Dev → Test through the Fabric Deployment Pipeline API. The pipeline polls until the operation completes and fails loudly if anything goes wrong.

For production, the same flow repeats — Test to main — but with an additional manual approval gate in Azure DevOps. Someone with prod access gets a notification, reviews what's being deployed, and clicks Approve. Only then does the automation proceed.
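The two-stage shape of the pipeline looks roughly like this. The environment name is illustrative, and the manual approval itself is configured on that environment in the Azure DevOps UI, not in the YAML:

```yaml
trigger:
  branches:
    include: [Test, main]   # PR merges into either branch fire the pipeline

stages:
- stage: DeployToTest
  # Runs when the merge landed on the Test branch: promote Dev -> Test.
  condition: eq(variables['Build.SourceBranchName'], 'Test')
  jobs:
  - job: deploy
    steps:
    - script: echo "call Fabric deployment API: Dev -> Test"

- stage: DeployToProd
  # Runs when the merge landed on main: promote Test -> Prod.
  condition: eq(variables['Build.SourceBranchName'], 'main')
  jobs:
  - deployment: deploy
    # Approvals and checks are attached to this environment in the
    # Azure DevOps UI; the stage pauses here until someone approves.
    environment: fabric-production
    strategy:
      runOnce:
        deploy:
          steps:
          - script: echo "call Fabric deployment API: Test -> Prod"
```

Using a `deployment` job (rather than a plain `job`) is what lets the environment's approval gate intercept the stage before anything executes.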

The whole thing takes a few minutes. The developer doesn't touch anything after raising the PR.


The Safety Net Nobody Thinks About Until They Need It

We also built a rollback stage. It does exactly one thing: re-deploy the Test workspace state back to Production, overwriting whatever bad change just landed there.

It never runs automatically. It sits in the pipeline doing nothing until the day something breaks in Prod at an inconvenient hour. Then someone manually triggers just that stage, the same production approval gate fires — because even emergency rollbacks should be authorised — and Prod is restored in minutes instead of hours.
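One way to model a stage that exists but never runs on its own is a runtime parameter that defaults to off; the parameter and stage names here are illustrative:

```yaml
parameters:
- name: runRollback
  displayName: Re-deploy Test state over Production (rollback)
  type: boolean
  default: false

stages:
- stage: Rollback
  # Skipped on every normal run; only executes when someone manually
  # queues the pipeline with runRollback ticked.
  condition: ${{ eq(parameters.runRollback, true) }}
  jobs:
  - deployment: rollback_prod
    environment: fabric-production  # same approval gate as a normal deploy
    strategy:
      runOnce:
        deploy:
          steps:
          - script: echo "Fabric deployment API: Test -> Prod (rollback)"
```

Because the rollback deployment targets the same `fabric-production` environment, the emergency path inherits the exact approval gate the normal path uses — no separate governance to maintain.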

Building the rollback before you need it is the entire point. The worst time to design a safety net is when you're already falling.


What Surprised Us

A few things caught us off guard along the way.

Fabric's API is newer than the documentation. The deployment API returns operation status asynchronously — you trigger a deploy, get back an operation ID, and then have to poll a separate endpoint to know when it finished. Several tutorials don't mention this at all. We had to add a polling loop and handle the operation ID coming back from multiple possible places in the response.
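The polling step we ended up with looks roughly like this, assuming the token is already in `$(fabricToken)` and that the stage IDs are pipeline variables. The endpoint paths and payload shape reflect what worked for us and should be checked against the current Fabric REST docs; in particular, we found the operation ID sometimes only in the response headers:

```yaml
- bash: |
    set -euo pipefail
    # Trigger the deployment. Fabric answers 202 Accepted and puts the
    # operation id in a response header -- we saw it in x-ms-operation-id,
    # and occasionally only in the Location header, so check both.
    HEADERS=$(curl -s -D - -o /dev/null -X POST \
      "https://api.fabric.microsoft.com/v1/deploymentPipelines/$(fabricPipelineId)/deploy" \
      -H "Authorization: Bearer $(fabricToken)" \
      -H "Content-Type: application/json" \
      -d "{\"sourceStageId\": \"$(sourceStageId)\", \"targetStageId\": \"$(targetStageId)\"}")
    OP_ID=$(echo "$HEADERS" | grep -i '^x-ms-operation-id:' | tr -d '\r' | awk '{print $2}')

    # Poll the long-running-operation endpoint until it settles.
    while true; do
      STATUS=$(curl -s \
        "https://api.fabric.microsoft.com/v1/operations/$OP_ID" \
        -H "Authorization: Bearer $(fabricToken)" | jq -r '.status')
      echo "Deployment status: $STATUS"
      case "$STATUS" in
        Succeeded) exit 0 ;;
        Failed)    exit 1 ;;   # fail loudly so the stage goes red
        *)         sleep 15 ;;
      esac
    done
  displayName: Deploy and poll until complete
```

The key point is that the `deploy` call returning 202 means "accepted", not "done" — treating it as success is exactly the trap the tutorials set.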

The tenant-level setting is easy to miss. If you don't enable "Service principals can use Fabric APIs" in the Fabric Admin Portal, every API call from your SPN returns a 401 and the error message doesn't tell you why. This burned an entire afternoon.

Branch policies are not optional. The moment we enforced "requestors cannot approve their own changes", the whole process felt legitimate. Before that, the CI/CD pipeline was technically automated but the governance around it was still built on trust. After, it was built on policy.


Where We Are Now

We have a deployment process that:

  • Cannot accidentally bypass code review
  • Never hardcodes a credential anywhere
  • Leaves a full audit trail of every deployment — who triggered it, what build number, what the deployment note said
  • Can be rolled back in under five minutes if something goes wrong
  • Runs the same way every single time, regardless of who initiates it

None of this is glamorous work. There's no clever algorithm here, no interesting data transformation, nothing that would impress someone at a conference. It's plumbing. It's the infrastructure that lets everything else be done with confidence.

But that Sunday-night feeling before a Monday stand-up? We have it now. That was the whole point.


Building something similar on Fabric? The biggest advice I'd give: get the service principal and Key Vault access sorted first, before writing a single line of YAML. Everything else flows from that foundation.
