DEV Community: Neeraj Agarwal

Hybrid Row-Level Security: AWS + Power BI

Neeraj Agarwal — Wed, 29 Apr 2026 17:02:16 +0000

Most “hybrid cloud” case studies are really just migration stories. This one isn’t. A US-based physical security and fire systems integrator had 14+ subsidiary companies’ worth of data in AWS, thousands of field technicians in Azure Active Directory, and a board that wanted three distinct views of the same data - enforced by who the user was, not by which dashboard they clicked.

Here is the architecture we stood up, the access pattern underneath it, and the two gotchas that will cost you a week if you replicate the stack without knowing about them. The full engagement is written up as a case study on the data-warehouse modernization we delivered for this integrator; this post is the access-layer deep-dive for the practitioners.

The starting point: one data estate, thousands of field workers, three audiences
The customer is a security and fire systems integrator grown through acquisition - 14+ subsidiary operating companies spanning integration, security, and fire divisions. Each subsidiary arrived with its own ERP instance (Microsoft Business Central, mostly, with a long tail), its own Dynamics 365 CRM tenant or local variant, and its own definition of “customer” and “project.” The pre-consolidation data estate looked like fourteen independent companies pretending to be one.

At the user layer there were three distinct reporting personas:

Executives - consolidated rollup across all 14+ subsidiaries. Division-level (integration / security / fire) comparisons, quarter-over-quarter, revenue and operational KPIs at the group level. No drill to individual techs or jobs.
Regional / subsidiary managers (RMs) - their subsidiary and territory only. Branch-level detail, job pipeline, inspection compliance, same dashboard structure across every subsidiary so execs could compare like-for-like.
Field technicians and inspectors - thousands of them, on-the-job, doing fire system inspections and security installs. Each sees only their own work: assigned jobs, their completion rates, their compliance scores. Emphatically not their peers’ numbers by name.
On the technology side:

Data lived in AWS. S3 as the landing zone, Glue + PySpark for the 20+ ETL jobs, Step Functions orchestrating the daily pipeline, AWS DMS streaming CDC from the operational ERPs/CRMs, and Lake Formation governing the catalog.
Identity lived in Azure AD - the HRMS (UKG) pushed new hires, role changes, and terminations into AD as the system of truth; everything else read from it.
Reporting was Power BI. 20+ KPI dashboards across Finance, Operations, HR, and Sales. Licensing was already paid under the M365 agreement; the user population was already in AD; switching to QuickSight was never on the table.
Asking the customer to move data to Azure was off the table - multi-year AWS contract. Asking them to replicate identity into AWS IAM was off the table - two sources of truth for “who works here” is a governance disaster, especially with thousands of field workers rotating through. So we had to make the two clouds cooperate.

The architecture, top to bottom
Azure AD (identity)
│
│ SAML 2.0 / OIDC federation
▼
AWS IAM (role-based trust)
│
│ STS AssumeRoleWithSAML
▼
Lake Formation (authorization)
│
├── Row-level filters by user attribute
├── Column-level masking for PII
└── Tag-based access control
│
▼
Athena (query layer)
│
▼
Power BI (presentation)
│
├── Executive workspace
├── Subsidiary-managers workspace (RLS on subsidiary)
└── Field workspace (RLS on technician)
Each arrow carries user identity forward - so that by the time a query hits a table, Lake Formation knows exactly who is asking and what they are allowed to see.

Identity: Azure AD as the root of trust
Nothing in this architecture works if the AD groups don’t reflect reality. UKG pushed Subsidiary, Territory, and TechnicianId as AD user attributes on every hire, transfer, and termination. Dynamic AD groups then materialized off those attributes:

rg-executives - static group, holdco leadership
rg-subsidiary-managers - dynamic rule matching ops-leader job titles across all 14+ subsidiaries
rg-field-technicians - dynamic rule matching field-job titles (inspector, installer, service tech)
The Subsidiary, Territory, and TechnicianId attributes flowed through to the federation layer as attributes in the SAML assertion. That is the single most important implementation detail in the whole stack. Without accurate AD attributes you have no row filters; without dynamic groups you spend every Monday reassigning permissions when field techs transfer between subsidiaries - and with thousands of them rotating through, that is a full-time job you do not want to create.

The federation bridge: Azure AD to AWS IAM
AWS has had SAML federation for years; the modern path is to register Azure AD as a SAML 2.0 identity provider in IAM and then create IAM roles that trust it. A condensed version of the role trust policy:

{
  "Effect": "Allow",
  "Principal": { "Federated": "arn:aws:iam::123456789012:saml-provider/AzureAD" },
  "Action": ["sts:AssumeRoleWithSAML", "sts:TagSession"],
  "Condition": {
    "StringEquals": { "SAML:aud": "https://signin.aws.amazon.com/saml" }
  }
}

Three IAM roles were created, each mapped to one AD group via the Enterprise Applications claim rules in Azure:

LakeFormation-Executive (full catalog read across all 14+ subsidiaries)
LakeFormation-SubsidiaryManager (Lake Formation enforces subsidiary + territory filter)
LakeFormation-FieldTechnician (Lake Formation enforces per-technician filter)
When a user signs in to AWS via the Azure AD app tile, Azure issues a SAML assertion, AWS STS validates it, and the resulting session carries Subsidiary, Territory, and TechnicianId as session tags (visible to policies as aws:PrincipalTag/Subsidiary and so on) that Lake Formation reads later. Those tags are the whole key to the row-level filter.
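For readers wiring up the claim rules, here is a minimal sketch of the naming convention involved. AWS treats SAML attributes whose names start with https://aws.amazon.com/SAML/Attributes/PrincipalTag: as session tags. In practice this mapping is configured in the Azure enterprise application rather than in code; the function name and the sample attribute values below are hypothetical and only illustrate the shape of the claims.

```python
# Prefix AWS expects on SAML attribute names that should become session tags.
PRINCIPAL_TAG_PREFIX = "https://aws.amazon.com/SAML/Attributes/PrincipalTag:"

# Attribute names taken from the post; the sample values are hypothetical.
TAG_KEYS = ("Subsidiary", "Territory", "TechnicianId")


def build_principal_tag_claims(ad_attributes: dict) -> dict:
    """Map AD user attributes to SAML attribute names for session tags.

    Missing or empty attributes are skipped so that no empty tag is emitted.
    """
    claims = {}
    for key in TAG_KEYS:
        value = ad_attributes.get(key)
        if value:
            claims[PRINCIPAL_TAG_PREFIX + key] = str(value)
    return claims


claims = build_principal_tag_claims(
    {"Subsidiary": "SUB-07", "Territory": "Northeast", "TechnicianId": ""}
)
for name, value in claims.items():
    print(name, "=", value)
```

A manager without a TechnicianId simply gets two tags instead of three, which is worth checking for explicitly when you test the claim rules.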

Lake Formation: where permissions actually live
This is the hop that most “AWS + Power BI” tutorials skip. Giving a user an IAM role with athena:* is not the same as giving them data - Lake Formation sits between. LF permissions are set on the Glue Catalog (databases, tables, columns), and for the tables it governs, coarse IAM permissions alone are not enough: a query returns data only if Lake Formation has also granted the principal access.

The data warehouse we built had three layers in Glue Catalog:

raw_* - bronze landing tables, no user-facing grants
conformed_* - silver joins, accessible to analysts only
reporting_* - gold, the only layer Power BI queries
Lake Formation permissions on reporting_* were set via tag-based access control (LF-TBAC) rather than explicit table grants. Every table in reporting_* was tagged with domain=sales, sensitivity=standard|pii, etc. The IAM roles got grants against those tags instead of individual tables - so when a new table shipped under the sales domain, the existing permissions flowed to it automatically. One less source of deployment friction.
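To make the "permissions flow to new tables" point concrete, here is a toy model of tag-based matching. It is not the Lake Formation API, just a sketch of the evaluation rule (every tag key in the grant must match, any listed value is accepted). The table names other than reporting_jobs_daily, the operations domain, and the grant itself are hypothetical.

```python
def role_can_read(table_tags: dict, grant_expression: dict) -> bool:
    """Return True if a table's tags satisfy a role's tag expression.

    The expression maps each tag key to the set of allowed values; every
    key in the expression must match (AND across keys, OR within a key).
    """
    return all(
        table_tags.get(key) in allowed
        for key, allowed in grant_expression.items()
    )


def readable_tables(catalog: dict, grant_expression: dict) -> list:
    """List the tables in the catalog that a grant expression covers."""
    return sorted(
        name for name, tags in catalog.items()
        if role_can_read(tags, grant_expression)
    )


# Hypothetical catalog: table name -> tags.
catalog = {
    "reporting_jobs_daily": {"domain": "operations", "sensitivity": "standard"},
    "reporting_sales_pipeline": {"domain": "sales", "sensitivity": "standard"},
    "reporting_customer_contacts": {"domain": "sales", "sensitivity": "pii"},
}

# Hypothetical grant: sales tables at standard sensitivity only.
sales_standard = {"domain": {"sales"}, "sensitivity": {"standard"}}

print(readable_tables(catalog, sales_standard))

# A new table tagged into the sales domain is covered without a new grant.
catalog["reporting_sales_quota"] = {"domain": "sales", "sensitivity": "standard"}
print(readable_tables(catalog, sales_standard))
```

The second print includes the new table even though the grant was never touched, which is the property the tag-based approach buys you.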

The row-level filter
The piece that does the actual filtering work is a Lake Formation data filter. A data filter is a named expression stored on a table that LF applies to every query, transparently:

-- Filter name: subsidiary_scope
-- Target table: reporting_jobs_daily
-- Filter expression:
subsidiary_id = current_user_attribute('Subsidiary')

The current_user_attribute('Subsidiary') function reads the session tag AWS propagates from the SAML assertion. The result: a subsidiary manager running SELECT * FROM reporting_jobs_daily sees only their subsidiary’s rows. No WHERE clause in the query, no report-level filter to forget, no report author needs to remember anything. The filter runs in Lake Formation before Athena returns results.

Field technicians got a second data filter keyed on technician_id - each tech sees their own assigned jobs and their own compliance scores, but not their peers’. Executives had no filter. Data analysts got a third filter that masked PII columns (customer_email redacted, customer_phone redacted) while leaving other columns alone.
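The effect of those filters is easier to reason about with a small simulation. The sketch below is plain Python, not Lake Formation: it models a per-role policy that scopes rows by one session tag and withholds some columns. The sample rows, the "Analyst" role name, and modeling the PII handling as dropped columns are all assumptions made for illustration.

```python
ROWS = [
    {"job_id": 1, "subsidiary_id": "SUB-01", "technician_id": "T-100",
     "customer_email": "a@example.com"},
    {"job_id": 2, "subsidiary_id": "SUB-01", "technician_id": "T-200",
     "customer_email": "b@example.com"},
    {"job_id": 3, "subsidiary_id": "SUB-02", "technician_id": "T-300",
     "customer_email": "c@example.com"},
]

# Per-role policy: which (column, session tag) pair scopes the rows, and
# which columns are withheld. "Analyst" is a hypothetical role name.
POLICIES = {
    "LakeFormation-Executive": {"row_key": None, "hidden": set()},
    "LakeFormation-SubsidiaryManager": {
        "row_key": ("subsidiary_id", "Subsidiary"), "hidden": set()},
    "LakeFormation-FieldTechnician": {
        "row_key": ("technician_id", "TechnicianId"), "hidden": set()},
    "Analyst": {"row_key": None, "hidden": {"customer_email"}},
}


def query(role: str, session_tags: dict, rows=ROWS) -> list:
    """Return the rows and columns a session would see under its policy."""
    policy = POLICIES[role]
    visible = rows
    if policy["row_key"] is not None:
        column, tag = policy["row_key"]
        wanted = session_tags.get(tag)
        # A missing tag yields no rows rather than all rows (fail closed).
        visible = [r for r in rows if wanted is not None and r[column] == wanted]
    return [
        {k: v for k, v in r.items() if k not in policy["hidden"]}
        for r in visible
    ]


manager_rows = query("LakeFormation-SubsidiaryManager", {"Subsidiary": "SUB-01"})
tech_rows = query("LakeFormation-FieldTechnician", {"TechnicianId": "T-300"})
print([r["job_id"] for r in manager_rows])
print([r["job_id"] for r in tech_rows])
```

The fail-closed branch is the design choice worth copying: a session that arrives without the expected tag should see nothing, not everything.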

Connecting Power BI: the two-hop problem
Here is the first non-obvious thing. Power BI does not natively speak SAML. Its Athena connector authenticates with AWS IAM credentials (access key + secret), not with a user’s federated SAML session. So the natural instinct - “Power BI signs in as the user” - doesn’t work out of the box.

The pattern we landed on:

Create a dedicated IAM role PowerBI-AthenaService that can query reporting_* tables
Power BI workspaces connect to Athena as that service role (credentials stored in the Power BI data gateway)
Row-level security moves into the Power BI semantic model, not Lake Formation
Wait - what happened to the whole Lake Formation row filter setup? It still runs, but now it guards direct Athena access (analysts running ad-hoc SQL, scripts assuming the user’s SAML role). The reporting pipeline uses the service role; RLS for reports is enforced at the Power BI layer using the user’s UPN.

This sounds like a downgrade. It isn’t - it’s just a different layer for a different access path. The Power BI semantic model has a [Subsidiary] dimension and defines an RLS rule:

[Subsidiary] = LOOKUPVALUE(Users[Subsidiary], Users[UPN], USERPRINCIPALNAME())

USERPRINCIPALNAME() returns the Azure AD UPN of the signed-in Power BI user. The Users dimension is refreshed nightly from AD (same UKG-sourced attributes). When a subsidiary manager opens their dashboard, Power BI filters every visual to their subsidiary automatically. A field technician gets a narrower rule keyed on TechnicianId, so the field-workforce dataset returns only their own jobs.
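For readers who think more easily in code than in DAX, the rule above behaves roughly like the Python sketch below. It is an emulation, not Power BI itself: the sample users and jobs are hypothetical, and the case-insensitive match is an assumption about how the comparison should behave for UPNs.

```python
def lookup_subsidiary(users: list, upn: str):
    """Emulate LOOKUPVALUE(Users[Subsidiary], Users[UPN], upn).

    Returns None (standing in for DAX BLANK) when no user row matches,
    and raises if one UPN maps to more than one subsidiary.
    """
    matches = {u["Subsidiary"] for u in users if u["UPN"].lower() == upn.lower()}
    if len(matches) > 1:
        raise ValueError("multiple Subsidiary values for one UPN")
    return next(iter(matches), None)


def apply_rls(fact_rows: list, users: list, upn: str) -> list:
    """Keep only the fact rows whose Subsidiary matches the signed-in user."""
    subsidiary = lookup_subsidiary(users, upn)
    if subsidiary is None:
        return []  # unknown user: empty result, not an error
    return [r for r in fact_rows if r["Subsidiary"] == subsidiary]


# Hypothetical Users dimension and fact rows.
users = [
    {"UPN": "pat@contoso.com", "Subsidiary": "SUB-01"},
    {"UPN": "lee@contoso.com", "Subsidiary": "SUB-02"},
]
facts = [
    {"JobId": 1, "Subsidiary": "SUB-01"},
    {"JobId": 2, "Subsidiary": "SUB-02"},
    {"JobId": 3, "Subsidiary": "SUB-01"},
]

print([r["JobId"] for r in apply_rls(facts, users, "Pat@contoso.com")])
print(apply_rls(facts, users, "pat@other-tenant.com"))
```

Note the second call: a UPN that is not in the Users dimension yields an empty result rather than a failure, which is why a stale or mismatched Users table shows up as blank dashboards.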

Why both layers
A single RLS layer is a risk surface. Lake Formation alone means anyone with the Power BI service role can bypass RLS. Power BI semantic model alone means anyone with Athena access can bypass RLS. Running both means every access path has an enforcement point in front of it - direct SQL through Athena is filtered by Lake Formation, report traffic is filtered by the semantic model - and both paths land on the same answer for the same user.

For regulated data specifically (healthcare, finance), regulators often require defense in depth on access controls - not “the RLS is set up in Power BI, promise.” Two enforcement layers, both independently auditable, is the posture they want to see. Pair this with our data governance consulting practice if auditor-ready controls are part of the engagement.

Three Power BI workspaces, not three reports
A common mistake: building one workspace with three different reports for three audiences. That design puts all the RLS logic in the report filters, duplicates dataset refreshes, and creates cross-visibility risk - any user who can open the workspace can (usually) see all artifacts inside it.

We built three workspaces, each with its own Power BI app:

Executive workspace - one dataset, rollup-only dashboards spanning all 14+ subsidiaries. Executives are added as workspace viewers. No RLS, because the rolled-up data has no row-level privacy.
Subsidiary-managers workspace - a separate dataset with RLS on [Subsidiary]. All subsidiary managers are viewers. Each sees only their subsidiary when they open the app.
Field-workforce workspace - a third dataset with RLS on [TechnicianId]. Every inspector and installer is a viewer. Each sees their own assigned jobs and compliance scorecard.
Why three workspaces instead of three apps off one dataset? Because RLS in Power BI is per-dataset. If a subsidiary manager is accidentally added as a workspace admin (not just a viewer) on the executive workspace, RLS does nothing for them - RLS applies only to viewers, so admins, members, and contributors bypass it. Separate workspaces isolate admin access and prevent cross-audience leakage through over-granted roles. When the customer’s internal audit showed up six months later, this separation was the architectural choice they asked about first.
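Because over-granted roles are the failure mode here, a periodic check on role assignments is cheap insurance. The sketch below assumes you have already exported workspace role assignments into a list of records; the record shape, workspace names, and user names are hypothetical, and how you export them is left out.

```python
# In the Power BI service, RLS restricts workspace Viewers only; Admin,
# Member, and Contributor roles see the data unfiltered.
ROLES_BYPASSING_RLS = {"Admin", "Member", "Contributor"}


def find_rls_bypass(assignments: list, allowed: dict) -> list:
    """List (workspace, user, role) entries that bypass RLS unexpectedly.

    `assignments` holds hypothetical role-assignment records; `allowed`
    maps each workspace to the users expected to hold elevated roles.
    """
    findings = []
    for a in assignments:
        elevated = a["role"] in ROLES_BYPASSING_RLS
        expected = a["user"] in allowed.get(a["workspace"], set())
        if elevated and not expected:
            findings.append((a["workspace"], a["user"], a["role"]))
    return findings


assignments = [
    {"workspace": "Executive", "user": "cfo@contoso.com", "role": "Viewer"},
    {"workspace": "Executive", "user": "sm.ohio@contoso.com", "role": "Admin"},
    {"workspace": "Executive", "user": "bi.owner@contoso.com", "role": "Admin"},
]
allowed = {"Executive": {"bi.owner@contoso.com"}}

findings = find_rls_bypass(assignments, allowed)
print(findings)
```

Anything this returns is a user who can see unfiltered data in that workspace without being on the approved list.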

What we shipped
Concrete outcomes from the engagement (the full write-up is on the case study page):

14+ subsidiary ERPs unified into a single governed data lake on AWS S3, with AWS DMS streaming CDC from every operational source
20+ real-time Power BI KPI dashboards across Finance, Operations, HR, and Sales - split across the three workspaces described above
70% reduction in reporting preparation time - finance stopped rebuilding the weekly rollup in Excel and started trusting the pipeline
100% automated daily pipeline - Step Functions orchestrating Glue + PySpark; zero manual reruns after the initial backfill
Custom MDM ledger in RDS PostgreSQL - a 75+ brand dictionary resolving customer identities across the 14+ subsidiaries, without which the consolidated revenue rollup would have double-counted everything
Thousands of field technicians authenticated via existing Azure AD credentials - no new passwords, no new portal
3 IAM roles + 1 Power BI service role - the entire access matrix fits on one whiteboard
Audit outcome: the customer’s annual security audit the following quarter cleared the data platform without findings. The audit team specifically called out the combination of two independent RLS enforcement layers, attribute flow traceable from the HR system (UKG) to the dashboard, and no shadow service accounts.

Two gotchas worth knowing about
The enterprise engineering team was sharp and still hit both of these.

Gotcha 1: Power BI’s “Azure AD” is not always the same tenant as your AWS federation’s “Azure AD.” Large enterprises often run multiple AD tenants - one for M365 workloads, one for developer tools, one left over from an acquisition. Verify that the tenant Power BI is licensed against is the same tenant issuing SAML assertions to AWS. If not, UPNs will not match across layers and your RLS lookups silently return nothing (not an error - just empty datasets). Pre-flight check: pick three users across personas and verify their UPN is identical in Power BI and in the AD attributes that feed AWS SAML claims.
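The pre-flight check is simple enough to script. A minimal sketch, assuming you can export the UPN each system sees for a given person keyed by some stable identifier (an employee ID here, which is an assumption); the sample values are hypothetical. The comparison ignores case, on the assumption that UPNs differing only in case should be treated as the same identity.

```python
def upn_mismatches(power_bi_upns: dict, saml_upns: dict) -> dict:
    """Compare UPNs per user across two identity sources.

    Both inputs map a stable key (a hypothetical employee ID) to the UPN
    seen in that system. Returns only the users whose values differ or
    who are missing from one side.
    """
    problems = {}
    for employee_id in sorted(set(power_bi_upns) | set(saml_upns)):
        left = power_bi_upns.get(employee_id)
        right = saml_upns.get(employee_id)
        if left is None or right is None or left.lower() != right.lower():
            problems[employee_id] = (left, right)
    return problems


power_bi = {
    "E1": "pat@contoso.com",
    "E2": "lee@contoso.com",
    "E3": "sam@contoso.com",
}
saml = {
    "E1": "Pat@contoso.com",
    "E2": "lee@contoso-legacy.com",
}

problems = upn_mismatches(power_bi, saml)
for employee_id, (pbi, aws) in problems.items():
    print(employee_id, "Power BI:", pbi, "| SAML:", aws)
```

An empty result for one executive, one subsidiary manager, and one technician is the signal that both layers are looking at the same tenant.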

Gotcha 2: Lake Formation data filters have per-table quotas (check the current AWS limit - at time of writing it is 1,000 data filters per table). If your filter design requires a different filter per regional manager, you hit that ceiling fast. The fix is what we did: one parameterized data filter that reads a session attribute, not N hard-coded filters. The temptation to create filter_region_northeast, filter_region_southeast, etc., is strong and leads to an unmaintainable mess.

Where the pattern has been validated at larger scale
This is not a novel architecture. BMW Group runs a structurally identical pattern across their data mesh - federated identity feeding Lake Formation’s fine-grained access control, attribute-based filters gating access to data product tables, and downstream analytics tools reading through that enforcement layer. Their public AWS case study on Lake Formation fine-grained access control walks through the same federated-identity-to-LF-filter design applied at considerably larger scale (thousands of data products across hundreds of teams, versus 14+ subsidiaries here).

The takeaway for enterprises evaluating the pattern: the plumbing is proven. The work is in the AD attributes flowing cleanly from your HR system and the discipline to keep enforcement in Lake Formation and the semantic model (not scattered across report-level filters). BMW’s writeup and this engagement arrive at the same answer from opposite ends of the scale spectrum.

When this pattern fits
The hybrid AWS + Azure AD + Power BI pattern is right when:

Data already lives in AWS and moving it is not an option
Identity lives in Azure AD (standard for M365 shops)
Row-level security matters at scale - typically regulated industries, multi-brand retail, or distributed field operations with thousands of workers
Power BI is already the reporting tool (avoid a BI migration unless you have another reason)
If any of those constraints don’t apply, simpler patterns exist. If all of them do, this is the path of least regret - and the one that cleared the customer’s audit the first time.

For a deeper architectural read on multi-cloud identity flow, the hybrid cloud engagement playbook walks through the governance layer in detail. If the Power BI side is where you’re stuck, Microsoft Power BI consulting covers the semantic model and RLS patterns specifically. And if you arrived here because consolidating multiple acquired companies’ data is what’s actually driving the access-control complexity, the 180-day M&A data consolidation playbook is the companion piece to this one.

Fabric OneLake shortcuts vs ADLS Gen2 mounts: what actually works in production

Neeraj Agarwal — Sat, 18 Apr 2026 15:15:55 +0000

If you've just moved a team onto Microsoft Fabric, the "how do I read our existing ADLS Gen2 data?" question comes up in the first week. Two patterns are on the table:

OneLake shortcuts - Fabric's newer, metadata-only references that make external data appear as a Lakehouse table
ADLS Gen2 mounts - the mssparkutils/notebookutils pattern that's been around since Synapse

The docs describe them as roughly interchangeable options. They aren't. We've now shipped both across ~15 Fabric projects and the failure modes are completely different.

Here's the field report.

Quick refresher - the two patterns

Shortcut (the new way)

Create via UI, REST API, or Fabric CLI. Zero-copy reference to an external location, exposed as a table or folder inside a Lakehouse:

curl -X POST \
  "https://api.fabric.microsoft.com/v1/workspaces/{wsid}/items/{lakehouseid}/shortcuts" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "path": "Tables",
    "name": "orders_raw",
    "target": {
      "adlsGen2": {
        "location": "https://prodstorage.dfs.core.windows.net",
        "subpath": "/raw/orders",
        "connectionId": "1234abcd-..."
      }
    }
  }'
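If you script shortcut creation from Python rather than curl, the same request can be built with the standard library. This is a sketch that constructs the request without sending it; the function name and the placeholder IDs and token are hypothetical, and the payload fields simply mirror the curl call above.

```python
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_shortcut_request(workspace_id, lakehouse_id, token, name,
                           location, subpath, connection_id):
    """Build (but do not send) the POST that creates an ADLS Gen2 shortcut."""
    body = {
        "path": "Tables",
        "name": name,
        "target": {
            "adlsGen2": {
                "location": location,
                "subpath": subpath,
                "connectionId": connection_id,
            }
        },
    }
    return urllib.request.Request(
        url=f"{FABRIC_API}/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_shortcut_request(
    workspace_id="WORKSPACE_ID",    # placeholder
    lakehouse_id="LAKEHOUSE_ID",    # placeholder
    token="TOKEN",                  # placeholder bearer token
    name="orders_raw",
    location="https://prodstorage.dfs.core.windows.net",
    subpath="/raw/orders",
    connection_id="1234abcd-...",   # elided in the original example
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Keeping the builder separate from the send makes it easy to unit-test the payload and to loop over a list of tables in a deployment script.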

Once created, Spark and SQL treat it as a normal Lakehouse table:

df = spark.read.table("Lakehouse.orders_raw")

Mount (the familiar way)

In a Fabric notebook, create a logical mount path backed by an ADLS Gen2 container using a LinkedService:

notebookutils.fs.mount(
    "abfss://raw@prodstorage.dfs.core.windows.net/orders",
    "/orders_raw",
    {"linkedService": "prod_adls_linked"}
)

df = spark.read.parquet("/synfs/nb_resource/orders_raw")

Both read the same bytes. The differences start everywhere else.

Where shortcuts are magic

1. Direct Lake mode works with shortcuts, not mounts

This is the big one. If you want your Power BI semantic models to query Lakehouse data with zero import latency via Direct Lake, the table must live in the Lakehouse - which means shortcut or physical copy. Mounts don't count.

Direct Lake reads  →  Lakehouse table  →  shortcut OR Delta files in /Tables
Direct Lake reads  ✗  Mount path

If Direct Lake is in your architecture diagram (and on Fabric it usually should be), this alone settles the argument.

2. ACLs inherit from the source

Shortcuts honor the ACL model of the underlying storage. A user who can read /raw/orders in ADLS via POSIX ACL or RBAC can read it through the shortcut - Fabric doesn't re-auth.

Mounts run under the LinkedService's identity, which is usually a service principal with broad access. You've just given every notebook in that workspace the same blast radius.

3. Zero operational overhead at creation time

Shortcut creation is metadata only. A 40TB dataset becomes queryable in under a second. Mounts require the runtime to resolve the path at session start - fine for small containers, annoying when you're restarting sessions dozens of times a day across a team.

4. SQL endpoint sees them

A Lakehouse's SQL analytics endpoint exposes shortcut tables for free. Warehouses can cross-query them with three-part names. Mounts are invisible to the SQL endpoint - you can only read them from a notebook.

-- Works on a shortcut, fails on a mount
SELECT COUNT(*)
FROM [MyLakehouse].[dbo].[orders_raw]
WHERE order_ts >= '2026-01-01';

Where shortcuts silently break

1. Cross-region reads are expensive and slow

Shortcuts don't copy data. If your ADLS account is in westus2 and your Fabric capacity is in eastus, every query pulls bytes across the region boundary - at egress pricing, at cross-region latency.

We had one client where a dashboard that ran in 4 seconds against a co-located Lakehouse took 38 seconds through a shortcut to a foreign-region ADLS. Nobody's documentation flagged this.

Rule: shortcut target and Fabric capacity home region should match. If they can't, plan for a daily materialized copy or move the capacity.

2. Connection credentials expire in ways you won't see

When you create a shortcut via UI, Fabric prompts for credentials once and stores them as a Connection object. If that connection uses a SAS token, it expires. When it does, the shortcut starts returning:

Error: [FabricOneLakeShortcutAuthFailure] Connection refused

Not at creation time. Not logged anywhere obvious. Users just see empty tables.

Rule: use workspace identity or a managed-identity-backed connection, not SAS. If you must use SAS, alert on expiry explicitly.
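If SAS is unavoidable, the expiry is at least easy to read: it sits in the token's se (signed expiry) field as a UTC timestamp. A minimal sketch of an expiry check, assuming you keep the token string somewhere your monitoring can reach; the function names, the sample token, and the 14-day warning window are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def sas_expiry(sas_token: str) -> datetime:
    """Read the signed-expiry (`se`) field from a SAS token query string."""
    params = parse_qs(sas_token.lstrip("?"))
    raw = params["se"][0]
    # SAS expiry times are ISO 8601 in UTC, typically ending in "Z".
    expiry = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry


def days_until_expiry(sas_token: str, now: datetime) -> float:
    """Days remaining before the token expires (negative if already expired)."""
    return (sas_expiry(sas_token) - now).total_seconds() / 86400


def should_alert(sas_token: str, now: datetime, warn_days: int = 14) -> bool:
    """True when the token expires within the warning window."""
    return days_until_expiry(sas_token, now) <= warn_days


# Hypothetical token; the signature is a dummy value.
token = "sv=2022-11-02&sr=c&sp=rl&se=2026-06-30T00%3A00%3A00Z&sig=REDACTED"
now = datetime(2026, 6, 20, tzinfo=timezone.utc)

print(days_until_expiry(token, now))
print(should_alert(token, now))
```

Run on a schedule, this turns a silent failure into a ticket two weeks ahead of time.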

3. ADLS Gen1 is not supported

We know, you thought it was fully deprecated. We had a client with a Gen1 account still humming along for a batch job. Fabric won't shortcut it - no workaround. You're migrating to Gen2 first.

4. Shortcuts to Hive-partitioned data don't auto-discover partitions

If your external ADLS Gen2 layout is /sales/year=2025/month=04/, a shortcut exposes it as a folder shortcut - not a partitioned table. You either:

Shortcut the root and read in Spark (spark.read.option("basePath", …)) - loses the Lakehouse table abstraction
Convert to Delta first on the ADLS side, then shortcut (the "right" answer, but it's work)

Mounts have the same issue, but at least there's no expectation of table-like behavior.
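To see what "discovering partitions" means in practice, here is a small sketch of the path parsing involved: the key=value directory segments become column values. Spark does the equivalent for you when you read from the root with basePath set; the function below is only an illustration in plain Python, and its name and the sample file name are hypothetical.

```python
def parse_hive_partitions(path: str) -> dict:
    """Extract key=value partition segments from a Hive-style path."""
    partitions = {}
    for segment in path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value
    return partitions


parts = parse_hive_partitions("/sales/year=2025/month=04/part-0000.parquet")
print(parts)
```

A folder shortcut hands you the directories but not this mapping, which is why the Delta conversion route gives a cleaner result.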

Performance we measured

We ran the same Spark query - a 200M-row aggregation over Parquet sitting in ADLS Gen2 - through both access patterns, from a Fabric F64 capacity. Five cold runs, five warm runs, averaged:

Access path	Cold (first run)	Warm (cached)	Notes
Delta files directly in Lakehouse `/Tables/`	11.2 s	2.1 s	Baseline. No shortcut, no mount.
Shortcut to same-region ADLS (Delta)	12.8 s	2.4 s	~15% overhead cold, negligible warm
Mount to same-region ADLS (Parquet)	14.1 s	3.0 s	Lacks Delta stats; CBO worse
Shortcut to cross-region ADLS (Delta)	38.4 s	6.7 s	Egress + latency dominates

Two takeaways:

Same-region shortcuts are essentially free
Cross-region shortcuts are a trap dressed up as convenience
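For anyone who wants the relative numbers, the overhead against the in-Lakehouse baseline can be worked out directly from the table. The sketch below uses only the measurements listed above; the function and dictionary names are ours.

```python
# Seconds, copied from the measurements table.
BASELINE = {"cold": 11.2, "warm": 2.1}
MEASURED = {
    "Shortcut, same region (Delta)": {"cold": 12.8, "warm": 2.4},
    "Mount, same region (Parquet)": {"cold": 14.1, "warm": 3.0},
    "Shortcut, cross region (Delta)": {"cold": 38.4, "warm": 6.7},
}


def overhead_pct(value: float, baseline: float) -> float:
    """Percentage slowdown relative to the baseline, rounded to one decimal."""
    return round((value / baseline - 1) * 100, 1)


overheads = {
    path: {run: overhead_pct(t, BASELINE[run]) for run, t in timings.items()}
    for path, timings in MEASURED.items()
}

for path, result in overheads.items():
    print(f"{path}: cold +{result['cold']}%, warm +{result['warm']}%")
```

The same-region shortcut costs about 14% in both cases, which is 1.6 s cold and 0.3 s warm in absolute terms; the cross-region shortcut is more than three times slower than the baseline either way.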

When to use which - decision matrix

Requirement	Shortcut	Mount
Direct Lake mode from Power BI	✅ supported	❌ not supported
SQL analytics endpoint access	✅	❌
User-level ACL enforcement	✅ honored from source	❌ runs as SPN
External Iceberg tables (preview)	✅	❌
Read ad-hoc files (CSV dumps, logs)	⚠️ overkill	✅ just mount and read
Cross-region source storage	❌ egress pain	❌ egress pain - at least you know it
ADLS Gen1	❌	❌
Write back to external storage from Fabric	❌ shortcuts are read-focused	✅ mount supports write
Quick ETL staging in a notebook	⚠️	✅ simpler

The pattern we land on

On every new Fabric engagement now we default to this:

All production table access → shortcuts. No exceptions. Direct Lake, SQL endpoint access, and source-ACL inheritance are non-negotiable in an enterprise deployment.
Shortcuts target same-region ADLS. If source is elsewhere, we budget a one-way replication with a scheduled pipeline into same-region staging, then shortcut that.
Connections use workspace identity or MI-backed Connection objects. SAS is banned unless there's a specific third-party constraint, and where it is allowed, we alert on token expiry.
Mounts only for notebook-level ETL utility work - reading a one-off CSV, writing a temp file back to a scratch container. Never for anything a dashboard depends on.
ADLS Gen1 migrates to Gen2 before Fabric migration. Non-negotiable.

The TL;DR for teams coming from Synapse: you're going to reach for mounts because they're familiar. Resist. Shortcuts are the primitive Fabric is built around - learn them well, and most of Fabric's headline features (Direct Lake, SQL endpoint cross-query, cross-item data reuse) fall into your lap for free.

If you're mid-migration and want a second set of eyes on shortcut architecture, the team at Algoscale runs these assessments regularly. The full write-up of the migration playbook including the connection-identity pattern lives on our blog.