Ravi Kiran Pagidi

Posted on Jun 5

Using Microsoft Fabric Shortcuts to Avoid Duplicate Data Copies

#microsoftfabric #dataengineering #lakehouse #azure

Enterprise data platforms are really good at one thing: creating copies of the same data everywhere. Different teams copy the same curated folders into their own lakehouses, then copy again into another workspace "for reporting," then again for a data science sandbox. Storage grows, pipelines multiply, and nobody is sure which copy is the source of truth anymore.

Microsoft Fabric Shortcuts give us a way out of that pattern by letting a Fabric Lakehouse reference data where it already lives instead of copying it again. You still get a first-class experience in the Lakehouse, SQL endpoint, and Power BI, but the bytes stay in one place.

What Are Microsoft Fabric Shortcuts?

In plain terms, a shortcut in Microsoft Fabric is a logical link that points from your Lakehouse (or other Fabric item) to some other storage location. In the Lakehouse explorer it looks like a regular folder or table, but the data is actually being read from the target location.

Supported shortcut targets today include:

Other OneLake locations (files or tables in different workspaces or lakehouses)
Azure Data Lake Storage Gen2 (ADLS Gen2) accounts and containers
Amazon S3 buckets
Google Cloud Storage, S3-compatible storage, and Dataverse

Think of a shortcut as a "symbolic link" in OneLake: it has a shortcut path where it appears in your Lakehouse and a target path that points to where the data really lives. No data is physically moved or duplicated when you create one.
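
To make the symbolic-link analogy concrete, here is a minimal sketch in plain Python of how a shortcut path maps onto a target path. It is only a mental model, not how OneLake is implemented: the `Shortcut` class, the `resolve` function, and the storage account and folder names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shortcut:
    """Toy model of a shortcut: where it appears and where the data really lives."""
    shortcut_path: str  # location inside the Lakehouse (hypothetical)
    target_path: str    # location of the real data (hypothetical)


def resolve(path: str, shortcuts: list[Shortcut]) -> str:
    """Rewrite a Lakehouse path to its target path if it falls under a shortcut."""
    for sc in shortcuts:
        if path == sc.shortcut_path or path.startswith(sc.shortcut_path + "/"):
            return sc.target_path + path[len(sc.shortcut_path):]
    return path  # not under any shortcut: the data lives in this Lakehouse


shortcuts = [
    Shortcut(
        shortcut_path="Files/shortcuts/customer_transactions",
        target_path="abfss://curated@examplelake.dfs.core.windows.net/sales/customer_transactions",
    )
]

resolved = resolve("Files/shortcuts/customer_transactions/part-0001.parquet", shortcuts)
print(resolved)
```

The consumer only ever sees the shortcut path; the rewrite to the target happens behind the scenes, which is why no bytes need to be copied.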

Why Duplicate Data Copies Become a Problem

Most enterprises end up with multiple copies of the same core datasets scattered across environments and workspaces. Typical failure modes:

Every team builds its own "copy pipeline" from the same source into its own lakehouse or workspace
Dev, UAT, and prod end up being fed by slightly different pipelines or schedules, so schema and data drift over time
Pipelines exist solely to move data from one lake to another (ADLS to Fabric, or workspace-to-workspace) with no real transformation
Storage costs grow linearly with the number of teams and environments, and nobody feels responsible because "storage is cheap"
Freshness SLAs become hard to manage, because each copy has its own schedule and failure modes
Governance teams now have to manage access and data protection policies across several physical copies of the same sensitive data
Debugging becomes painful when teams disagree about which version of a dataset is correct

A concrete example: A customer transactions table is copied from ADLS into a Fabric Lakehouse. Then it is copied again into another lakehouse for reporting. Then copied again for data science experiments. Each copy adds storage cost, a new pipeline to monitor, another access policy to manage, and another potential source of stale or inconsistent data. By the time something breaks, you are not sure which copy is authoritative.

How Fabric Shortcuts Solve This

Shortcuts let teams:

Reference data without physically moving it
Build logical lakehouse views over existing data sources
Reduce redundant ETL/ELT pipelines
Keep the original data as the single source of truth
Enable multiple teams to consume the same dataset consistently
Simplify medallion or domain-based architectures

Here is how the data flow looks when shortcuts are used correctly:

Source Data in ADLS / OneLake / S3
          |
    Fabric Shortcut
          |
    Fabric Lakehouse
          |
  SQL Endpoint / Semantic Model / Power BI / Data Science

And a broader architecture view:

[Enterprise Data Lake (ADLS Gen2 / OneLake)]
          |
          |  Shortcut (no physical copy)
          v
[Fabric Lakehouse: Curated Zone]
          |
          +---> [Power BI Semantic Model]
          |
          +---> [Data Science Notebook]
          |
          +---> [SQL Analytics Endpoint]
          |
          +---> [Downstream Data Product]

The source remains authoritative. Consumers get clean, governed access without owning the underlying data.

Real-World Use Case

Scenario: A large enterprise already has curated Delta tables in Azure Data Lake Storage Gen2. Multiple teams want to use those datasets in Microsoft Fabric for reporting, analytics, and AI use cases. Instead of building new copy pipelines into Fabric, the data engineering team creates shortcuts from a Fabric Lakehouse to the existing curated folders in ADLS.

Implementation steps:

Step 1: Identify trusted curated data in ADLS
Before creating any shortcuts, confirm the source folders contain governed, validated data. Raw or unvalidated folders are not good shortcut candidates.

Step 2: Create a Fabric Lakehouse for the analytics domain
Set up a dedicated lakehouse in the appropriate Fabric workspace. Apply workspace roles and permissions aligned with the consuming team.

Step 3: Add shortcuts to the curated folders
Navigate to the Lakehouse, select New Shortcut, choose ADLS Gen2, provide the connection and folder path, and create the shortcut. It appears immediately as a folder or table reference in the Lakehouse.

Step 4: Validate table structure and permissions
Confirm the shortcut resolves correctly, the data schema is as expected, and that end users have the right access through Fabric's permission model and OneLake security.

Step 5: Build SQL views or semantic models on top
Use the SQL Analytics Endpoint to create views or expose tables. Build a Power BI semantic model on top for reporting teams. Keep the raw shortcut path abstracted from end users.

Step 6: Let reporting and analytics teams consume without extra copies
Reporting, data science, and analytics teams now access the same data through Fabric. No additional pipelines. No additional storage. One source of truth.
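
Part of Step 4 can be automated. The sketch below compares an expected column contract against the schema Spark reports for the shortcut. The `schema_mismatches` helper, the `EXPECTED` contract, and the column names are hypothetical; the actual schema is hard-coded here so the example runs outside Fabric.

```python
def schema_mismatches(expected: dict[str, str], actual: list[tuple[str, str]]) -> list[str]:
    """Compare expected column types with (name, type) pairs and describe differences."""
    actual_map = dict(actual)
    problems = []
    for name, dtype in expected.items():
        if name not in actual_map:
            problems.append(f"missing column: {name}")
        elif actual_map[name] != dtype:
            problems.append(f"type mismatch for {name}: expected {dtype}, got {actual_map[name]}")
    for name in actual_map:
        if name not in expected:
            problems.append(f"unexpected column: {name}")
    return problems


# Hypothetical contract for the customer_transactions dataset.
EXPECTED = {"transaction_id": "string", "customer_id": "string", "amount": "double"}

# In a Fabric notebook the actual schema would come from Spark, for example:
#   df = spark.read.format("delta").load("Files/shortcuts/customer_transactions")
#   actual = df.dtypes   # list of (column_name, type_string) tuples
actual = [("transaction_id", "string"), ("customer_id", "string"), ("amount", "decimal(18,2)")]

issues = schema_mismatches(EXPECTED, actual)
print(issues)
```

A check like this catches schema drift at the source before it surfaces as a broken report. It does not replace testing permissions with a real consumer account.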

Business impact:

Less storage duplication
Fewer pipelines to build and maintain
Faster onboarding of new data products
Reduced data freshness issues
Better alignment with data governance policies
Simpler architecture to explain and audit

Architecture Pattern: Shortcut-Based Lakehouse Consumption

Pattern name: Shortcut-Based Lakehouse Consumption Pattern

Layers:

Layer	What It Contains
Source Layer	ADLS Gen2, OneLake, S3, Dataverse, existing lakehouses
Shortcut Layer	Logical references inside Fabric Lakehouse
Consumption Layer	Lakehouse tables, SQL endpoint, notebooks, semantic models
Governance Layer	Microsoft Purview, Fabric permissions, workspace roles, OneLake security
Monitoring Layer	Pipeline monitoring, usage tracking, access auditing

Architecture diagram:

[Source Systems]
       |
       v
[Raw / Curated Data in ADLS or OneLake]
       |
       |  Fabric Shortcut
       v
[Fabric Lakehouse]
       |
       +---> [SQL Endpoint]
       +---> [Power BI Semantic Model]
       +---> [Data Science / ML]
       +---> [Business Data Product]
       |
       v
[Governance, Security, Monitoring]

Example Implementation

Creating a Shortcut from a Fabric Lakehouse to ADLS Gen2

Steps in the Fabric UI:

Open your Fabric workspace
Create or open an existing Lakehouse
In the Lakehouse explorer, go to Files or Tables
Click New Shortcut
Choose Azure Data Lake Storage Gen2 as the source
Provide the connection details (storage account, container, credential)
Select the target folder
Name the shortcut and create it
Validate the data in Lakehouse Explorer
Use the data from notebooks, SQL endpoint, or Power BI
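
The same shortcut can also be created programmatically, which helps when you need to repeat it across environments. The sketch below only builds the request for the Fabric REST API's Create Shortcut operation and does not send it. The endpoint path and body field names reflect the OneLake Shortcuts API as best understood here, so verify them against the current documentation; the `shortcut_request` helper and every ID, account, and folder name are placeholders.

```python
import json

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def shortcut_request(workspace_id, lakehouse_id, name, parent_path,
                     location, subpath, connection_id):
    """Build the URL and JSON body for a Create Shortcut call (ADLS Gen2 target)."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
    body = {
        "path": parent_path,   # where the shortcut appears, e.g. "Files/shortcuts"
        "name": name,          # shortcut name shown in the Lakehouse explorer
        "target": {
            "adlsGen2": {
                "location": location,           # storage account DFS endpoint
                "subpath": subpath,             # container and folder
                "connectionId": connection_id,  # existing Fabric connection
            }
        },
    }
    return url, body


# All values below are placeholders, not real identifiers.
url, body = shortcut_request(
    workspace_id="<workspace-id>",
    lakehouse_id="<lakehouse-id>",
    name="customer_transactions",
    parent_path="Files/shortcuts",
    location="https://examplelake.dfs.core.windows.net",
    subpath="/curated/sales/customer_transactions",
    connection_id="<connection-id>",
)
print(url)
print(json.dumps(body, indent=2))

# Sending the request needs a bearer token with suitable permissions, for example:
#   requests.post(url, json=body, headers={"Authorization": f"Bearer {token}"})
```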

Reading shortcut data from a PySpark notebook

# The relative path resolves against the notebook's default lakehouse
df = spark.read.format("delta").load("Files/shortcuts/customer_transactions")
display(df.limit(10))

If the shortcut points to a Delta-formatted folder, Spark reads it directly. If the data is in Parquet or CSV, adjust the format accordingly.
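
One way to handle the different formats is a small helper that picks the reader from a format name. This is a sketch, not a Fabric API: `read_shortcut` and `FORMAT_OPTIONS` are hypothetical names, and the CSV options assume the files have a header row.

```python
# Reader options per format; the CSV options are assumptions about the files.
FORMAT_OPTIONS = {
    "delta": {},
    "parquet": {},
    "csv": {"header": "true", "inferSchema": "true"},
}


def read_shortcut(spark, path, fmt="delta"):
    """Load a shortcut folder with the reader that matches its file format."""
    if fmt not in FORMAT_OPTIONS:
        raise ValueError(f"unsupported format: {fmt}")
    return spark.read.format(fmt).options(**FORMAT_OPTIONS[fmt]).load(path)


# In a Fabric notebook, where `spark` is already defined:
#   df = read_shortcut(spark, "Files/shortcuts/customer_transactions_csv", fmt="csv")
#   display(df.limit(10))
```

Failing fast on an unknown format is deliberate: a wrong reader on a shortcut folder tends to produce confusing errors or garbage rows rather than a clear message.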

Querying via the SQL Analytics Endpoint

-- dbo is the default schema in the SQL analytics endpoint
SELECT TOP 100 *
FROM dbo.customer_transactions;

Note: whether a shortcut appears under Files or Tables in the Lakehouse explorer depends on where it was created and whether the target folder is a recognized Delta table. The SQL analytics endpoint only exposes Delta tables in the Tables section, so a shortcut under Files can be read from notebooks but not queried there. To make it queryable through SQL, create the shortcut under Tables and point it at the Delta table folder.

When to Use Shortcuts

Good scenarios:

You already have trusted, governed data in ADLS or OneLake
Multiple teams need access to the same dataset without owning it
You want to avoid building copy pipelines just to move data between workspaces
You are building domain-oriented or product-oriented lakehouses
You want faster analytics access without waiting for a pipeline to run
You are connecting Fabric to an existing cloud storage investment

When Not to Use Shortcuts

Shortcuts are not always the right answer. Avoid or be careful when:

The source data is not governed, validated, or trusted
Permissions are unclear or inconsistently applied at the source
Performance requirements need optimized physical layout inside Fabric (compaction, partitioning, Z-ordering)
The source folder structure is messy or changes frequently
The consuming team expects full ownership and control of the data
Cross-cloud latency or egress cost is a concern (for example, S3 shortcuts)
The source is controlled by an external team with no SLA alignment

Best Practices

Use shortcuts primarily for trusted curated datasets, not raw ingestion zones
Keep naming conventions clean and consistent with your lakehouse standards
Document shortcut ownership: who created it, what it points to, and who the source owner is
Avoid creating shortcuts to random raw folders just because it is convenient
Validate access controls before promoting shortcuts to production
Use semantic models or SQL views to abstract the shortcut path from end consumers
Monitor usage and performance, especially for cross-cloud shortcuts
Align shortcuts with data product boundaries, not just individual tables
Do not treat shortcuts as a substitute for proper data governance
Maintain a clear source-of-truth policy so teams know which source dataset is authoritative and which shortcuts point to it
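
The ownership practice above is easy to enforce with a small inventory. The sketch below is one possible shape, assuming you keep the inventory yourself (Fabric does not require it): the `ShortcutRecord` fields, team names, and targets are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShortcutRecord:
    """One row of a hand-maintained shortcut inventory."""
    name: str
    target: str
    created_by: str = ""
    source_owner: str = ""


def undocumented(records):
    """Return names of shortcuts missing a creator or a source owner."""
    return [r.name for r in records if not (r.created_by and r.source_owner)]


# Hypothetical inventory entries.
inventory = [
    ShortcutRecord(
        name="customer_transactions",
        target="adls://curated/sales/customer_transactions",
        created_by="data-eng",
        source_owner="sales-platform",
    ),
    ShortcutRecord(
        name="product_catalog",
        target="adls://curated/products/catalog",
        created_by="data-eng",
    ),
]

missing = undocumented(inventory)
print(missing)
```

Running a check like this before promoting a lakehouse to production gives reviewers a concrete list to chase instead of a vague documentation requirement.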

Common Pitfalls

Users assume the data is physically stored in Fabric. It is not. If the source is unavailable or deleted, the shortcut breaks.
Teams delete or reorganize source folders without knowing shortcuts depend on them. This silently breaks downstream consumption.
Permissions work for engineers but fail for business users. Always test access with a non-admin account before go-live.
Too many shortcuts create a confusing lakehouse structure. Organize them with clear folder hierarchies and naming.
No documentation for where shortcut data comes from. Future team members have no idea what the shortcut points to or why.
Shortcuts are used to bypass proper data modeling. A shortcut to a raw table is not a curated data product.
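
Several of these pitfalls come down to shortcuts breaking without anyone noticing. A scheduled health check is one way to catch that early. The sketch below takes the existence check as a parameter so it runs anywhere; `find_broken` and the paths are hypothetical. In a Fabric notebook, a file-system existence check from the notebook utilities could presumably be passed in, but treat that as an assumption to verify.

```python
def find_broken(shortcut_paths, exists):
    """Return the shortcut paths whose target cannot currently be reached."""
    broken = []
    for path in shortcut_paths:
        try:
            if not exists(path):
                broken.append(path)
        except Exception:
            # Treat any access error (deleted source, revoked permission) as broken.
            broken.append(path)
    return broken


# Stand-in for a real existence check, so the sketch runs anywhere.
reachable = {"Files/shortcuts/customer_transactions"}
paths = ["Files/shortcuts/customer_transactions", "Files/shortcuts/product_catalog"]

broken = find_broken(paths, exists=lambda p: p in reachable)
print(broken)
```

Run the check under the same identity your consumers use, since a shortcut that resolves for an admin can still fail for a business user.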

Conclusion

Microsoft Fabric Shortcuts are not just a convenience feature. They are an important architectural pattern for reducing duplicate data copies, simplifying enterprise lakehouse design, and accelerating analytics adoption. Used correctly, they help teams build cleaner, cheaper, and more governable data platforms.

But like any architecture pattern, they need ownership, naming standards, security design, and monitoring. A shortcut without governance is just technical debt with a different shape.

"The best data architecture is not always the one that moves data faster. Sometimes, it is the one that avoids moving data unnecessarily."
