
Ravi Kiran Pagidi


Using Microsoft Fabric Shortcuts to Avoid Duplicate Data Copies

Enterprise data platforms are really good at one thing: creating copies of the same data everywhere. Different teams copy the same curated folders into their own lakehouses, then copy again into another workspace "for reporting," then again for a data science sandbox. Storage grows, pipelines multiply, and nobody is sure which copy is the source of truth anymore.

Microsoft Fabric Shortcuts give us a way out of that pattern by letting a Fabric Lakehouse reference data where it already lives instead of copying it again. You still get a first-class experience in the Lakehouse, SQL endpoint, and Power BI, but the bytes stay in one place.


What Are Microsoft Fabric Shortcuts?

In plain terms, a shortcut in Microsoft Fabric is a logical link that points from your Lakehouse (or other Fabric item) to some other storage location. In the Lakehouse explorer it looks like a regular folder or table, but the data is actually being read from the target location.

Supported shortcut targets today include:

  • Other OneLake locations (files or tables in different workspaces or lakehouses)
  • Azure Data Lake Storage Gen2 (ADLS Gen2) accounts and containers
  • Amazon S3 buckets
  • Dataverse and other external sources via Fabric connectors

Think of a shortcut as a "symbolic link" in OneLake: it has a shortcut path where it appears in your Lakehouse and a target path that points to where the data really lives. No data is physically moved or duplicated when you create one.
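To make the shortcut path / target path distinction concrete, here is a toy model in Python. It is only an illustration of the symbolic-link idea, not how OneLake implements shortcuts; the `Shortcut` class, the `resolve` helper, and the storage account name are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Shortcut:
    """Toy model of a shortcut: where it appears vs. where the data lives."""
    shortcut_path: str  # location inside the Lakehouse
    target_path: str    # location of the real data (hypothetical URI below)


def resolve(shortcut: Shortcut, requested_path: str) -> str:
    """Translate a Lakehouse path under the shortcut into the target path."""
    requested = PurePosixPath(requested_path)
    base = PurePosixPath(shortcut.shortcut_path)
    # relative_to raises ValueError if the request is outside the shortcut.
    relative = requested.relative_to(base)
    if str(relative) == ".":
        return shortcut.target_path
    return f"{shortcut.target_path.rstrip('/')}/{relative}"


sales = Shortcut(
    shortcut_path="Files/sales",
    target_path="abfss://curated@examplelake.dfs.core.windows.net/sales",
)
print(resolve(sales, "Files/sales/2024/part-0001.parquet"))
```

The point of the sketch is that a read against the Lakehouse path is redirected to the target; nothing is stored under `Files/sales` itself.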


Why Duplicate Data Copies Become a Problem

Most enterprises end up with multiple copies of the same core datasets scattered across environments and workspaces. Typical failure modes:

  • Every team builds its own "copy pipeline" from the same source into its own lakehouse or workspace
  • Dev, UAT, and prod end up being fed by slightly different pipelines or schedules, so schema and data drift over time
  • Pipelines exist solely to move data from one lake to another (ADLS to Fabric, or workspace-to-workspace) with no real transformation
  • Storage costs grow linearly with the number of teams and environments, and nobody feels responsible because "storage is cheap"
  • Freshness SLAs become hard to manage, because each copy has its own schedule and failure modes
  • Governance teams now have to manage access and data protection policies across several physical copies of the same sensitive data
  • Debugging becomes painful when teams disagree about which version of a dataset is correct

A concrete example: A customer transactions table is copied from ADLS into a Fabric Lakehouse. Then it is copied again into another lakehouse for reporting. Then copied again for data science experiments. Each copy adds storage cost, a new pipeline to monitor, another access policy to manage, and another potential source of stale or inconsistent data. By the time something breaks, you are not sure which copy is authoritative.
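The arithmetic behind that example is simple but worth spelling out. The sketch below assumes a made-up table size of 2 TB; the only thing taken from the scenario is the number of physical copies.

```python
def total_storage_tb(dataset_tb: float, physical_copies: int) -> float:
    """Storage consumed when the same dataset is materialized several times."""
    return dataset_tb * physical_copies


DATASET_TB = 2.0  # hypothetical size of the customer transactions table

# ADLS original + Lakehouse copy + reporting copy + data science copy.
with_copies = total_storage_tb(DATASET_TB, physical_copies=4)
# With shortcuts, only the ADLS original is physically stored.
with_shortcuts = total_storage_tb(DATASET_TB, physical_copies=1)

print(f"copies: {with_copies} TB, shortcuts: {with_shortcuts} TB")
```

The same multiplication applies to pipelines and access policies: three extra copies mean three extra things to monitor and secure.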


How Fabric Shortcuts Solve This

Shortcuts let teams:

  • Reference data without physically moving it
  • Build logical lakehouse views over existing data sources
  • Reduce redundant ETL/ELT pipelines
  • Keep the original data as the single source of truth
  • Enable multiple teams to consume the same dataset consistently
  • Simplify medallion or domain-based architectures

Here is how the data flow looks when shortcuts are used correctly:

Source Data in ADLS / OneLake / S3
          |
    Fabric Shortcut
          |
    Fabric Lakehouse
          |
  SQL Endpoint / Semantic Model / Power BI / Data Science

And a broader architecture view:

[Enterprise Data Lake (ADLS Gen2 / OneLake)]
          |
          |  Shortcut (no physical copy)
          v
[Fabric Lakehouse: Curated Zone]
          |
          +---> [Power BI Semantic Model]
          |
          +---> [Data Science Notebook]
          |
          +---> [SQL Analytics Endpoint]
          |
          +---> [Downstream Data Product]

The source remains authoritative. Consumers get clean, governed access without owning the underlying data.


Real-World Use Case

Scenario: A large enterprise already has curated Delta tables in Azure Data Lake Storage Gen2. Multiple teams want to use those datasets in Microsoft Fabric for reporting, analytics, and AI use cases. Instead of building new copy pipelines into Fabric, the data engineering team creates shortcuts from a Fabric Lakehouse to the existing curated folders in ADLS.

Implementation steps:

Step 1: Identify trusted curated data in ADLS
Before creating any shortcuts, confirm the source folders contain governed, validated data. Raw or unvalidated folders are not good shortcut candidates.

Step 2: Create a Fabric Lakehouse for the analytics domain
Set up a dedicated lakehouse in the appropriate Fabric workspace. Apply workspace roles and permissions aligned with the consuming team.

Step 3: Add shortcuts to the curated folders
Navigate to the Lakehouse, select New Shortcut, choose ADLS Gen2, provide the connection and folder path, and create the shortcut. It appears immediately as a folder or table reference in the Lakehouse.

Step 4: Validate table structure and permissions
Confirm the shortcut resolves correctly, the data schema is as expected, and that end users have the right access through Fabric's permission model and OneLake security.

Step 5: Build SQL views or semantic models on top
Use the SQL Analytics Endpoint to create views or expose tables. Build a Power BI semantic model on top for reporting teams. Keep the raw shortcut path abstracted from end users.

Step 6: Let reporting and analytics teams consume without extra copies
Reporting, data science, and analytics teams now access the same data through Fabric. No additional pipelines. No additional storage. One source of truth.
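The schema check in Step 4 can be automated with a small comparison like the one below. This is a minimal sketch: the `schema_drift` helper, the column names, and the types are hypothetical, and in a notebook the `actual` mapping could be built from `dict(df.dtypes)` on the DataFrame read through the shortcut.

```python
def schema_drift(expected: dict, actual: dict) -> dict:
    """Compare column-name -> type-string mappings and report differences."""
    return {
        "missing": sorted(c for c in expected if c not in actual),
        "unexpected": sorted(c for c in actual if c not in expected),
        "type_mismatch": sorted(
            c for c in expected if c in actual and expected[c] != actual[c]
        ),
    }


# Hypothetical contract for the curated table.
expected = {"customer_id": "bigint", "amount": "decimal(18,2)", "txn_ts": "timestamp"}
# Hard-coded here for illustration; in a notebook this could be dict(df.dtypes).
actual = {"customer_id": "bigint", "amount": "double", "region": "string"}

report = schema_drift(expected, actual)
print(report)
```

An empty report (all three lists empty) is a reasonable gate before exposing the shortcut to reporting teams.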

Business impact:

  • Less storage duplication
  • Fewer pipelines to build and maintain
  • Faster onboarding of new data products
  • Reduced data freshness issues
  • Better alignment with data governance policies
  • Simpler architecture to explain and audit

Architecture Pattern: Shortcut-Based Lakehouse Consumption

Pattern name: Shortcut-Based Lakehouse Consumption Pattern

Layers:

Layer               What It Contains
-----------------   ------------------------------------------------------------------------
Source Layer        ADLS Gen2, OneLake, S3, Dataverse, existing lakehouses
Shortcut Layer      Logical references inside Fabric Lakehouse
Consumption Layer   Lakehouse tables, SQL endpoint, notebooks, semantic models
Governance Layer    Microsoft Purview, Fabric permissions, workspace roles, OneLake security
Monitoring Layer    Pipeline monitoring, usage tracking, access auditing

Architecture diagram:

[Source Systems]
       |
       v
[Raw / Curated Data in ADLS or OneLake]
       |
       |  Fabric Shortcut
       v
[Fabric Lakehouse]
       |
       +---> [SQL Endpoint]
       +---> [Power BI Semantic Model]
       +---> [Data Science / ML]
       +---> [Business Data Product]
       |
       v
[Governance, Security, Monitoring]

Example Implementation

Creating a Shortcut from a Fabric Lakehouse to ADLS Gen2

Steps in the Fabric UI:

  1. Open your Fabric workspace
  2. Create or open an existing Lakehouse
  3. In the Lakehouse explorer, go to Files or Tables
  4. Click New Shortcut
  5. Choose Azure Data Lake Storage Gen2 as the source
  6. Provide the connection details (storage account, container, credential)
  7. Select the target folder
  8. Name the shortcut and create it
  9. Validate the data in Lakehouse Explorer
  10. Use the data from notebooks, SQL endpoint, or Power BI
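The same shortcut can in principle be created programmatically rather than through the UI. The sketch below only builds the request; it does not send it. The URL and body shape reflect my understanding of the OneLake shortcuts REST API and should be checked against the current Microsoft documentation before use. The helper name and every ID, account, and path are placeholders.

```python
import json

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_adls_shortcut_request(workspace_id, lakehouse_id, name,
                                parent_path, storage_url, subpath,
                                connection_id):
    """Return (url, body) for a create-shortcut call targeting ADLS Gen2."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
    body = {
        "path": parent_path,  # where the shortcut appears, e.g. "Files" or "Tables"
        "name": name,         # shortcut name shown in the Lakehouse explorer
        "target": {
            "adlsGen2": {
                "location": storage_url,        # storage account endpoint
                "subpath": subpath,             # container and folder
                "connectionId": connection_id,  # existing Fabric connection
            }
        },
    }
    return url, body


# All values below are placeholders, not real identifiers.
url, body = build_adls_shortcut_request(
    workspace_id="<workspace-id>",
    lakehouse_id="<lakehouse-id>",
    name="customer_transactions",
    parent_path="Tables",
    storage_url="https://examplelake.dfs.core.windows.net",
    subpath="/curated/customer_transactions",
    connection_id="<connection-id>",
)
print(url)
print(json.dumps(body, indent=2))
```

Sending the request would additionally need a POST with a bearer token for the Fabric API, which is left out here to keep the sketch free of credentials.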

Reading shortcut data from a PySpark notebook

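# Read the Delta folder exposed by the shortcut (path is relative to the notebook's default Lakehouse)
df = spark.read.format("delta").load("Files/shortcuts/customer_transactions")
# display() is the Fabric notebook helper for rendering a DataFrame
display(df.limit(10))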

If the shortcut points to a Delta-formatted folder, Spark reads it directly. If the data is in Parquet or CSV, adjust the format accordingly.
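One way to handle the format switch is a small helper that maps the format name to reader settings. This is a sketch under assumptions: `reader_settings` and `read_shortcut` are hypothetical names, and the CSV options assume files with a header row, which may not match your data.

```python
def reader_settings(fmt: str) -> dict:
    """Map a storage format name to Spark reader settings."""
    fmt = fmt.lower()
    if fmt in ("delta", "parquet"):
        return {"format": fmt, "options": {}}
    if fmt == "csv":
        # Assumption: CSV files carry a header row; adjust if yours do not.
        return {"format": "csv", "options": {"header": "true", "inferSchema": "true"}}
    raise ValueError(f"Unsupported format: {fmt}")


def read_shortcut(spark, path: str, fmt: str = "delta"):
    """Read shortcut data with the reader matching its storage format."""
    settings = reader_settings(fmt)
    return (
        spark.read.format(settings["format"])
        .options(**settings["options"])
        .load(path)
    )


print(reader_settings("csv"))
```

In a notebook this would be called as `read_shortcut(spark, "Files/shortcuts/customer_transactions", "parquet")`, with `spark` being the session the notebook already provides.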

Querying via the SQL Analytics Endpoint

-- Lakehouse tables are exposed under the dbo schema by default
SELECT TOP 100 *
FROM dbo.customer_transactions;

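Note: whether a shortcut appears under Files or Tables in the Lakehouse explorer depends on where it was created and whether the target folder is a recognized Delta table. The SQL Analytics Endpoint only exposes Delta tables in the Tables section, so the query above assumes customer_transactions is available there. If the shortcut appears under Files only, you can register it as a table using CREATE TABLE in a notebook or via the Lakehouse UI; if the SQL endpoint needs to see it, recreating the shortcut under Tables is usually the safer route, since tables registered from Spark against an external location may not surface in the endpoint.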
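A quick way to tell whether a folder will be treated as a Delta table is to look for its `_delta_log` directory, which every Delta table has. The helper below is a minimal sketch with a hypothetical name. The demo uses a temporary directory; in a Fabric notebook the folder would typically sit under the default Lakehouse mount (assumed here to be `/lakehouse/default/Files/...`).

```python
import os
import tempfile


def looks_like_delta_table(folder: str) -> bool:
    """A Delta table keeps its transaction log in a _delta_log subfolder."""
    return os.path.isdir(os.path.join(folder, "_delta_log"))


# Demo on a temporary directory standing in for a shortcut folder.
demo_root = tempfile.mkdtemp()
plain_folder = os.path.join(demo_root, "csv_exports")
delta_folder = os.path.join(demo_root, "customer_transactions")
os.makedirs(plain_folder)
os.makedirs(os.path.join(delta_folder, "_delta_log"))

print(looks_like_delta_table(plain_folder))
print(looks_like_delta_table(delta_folder))
```

A folder that fails this check is a candidate for the Files section rather than Tables.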


When to Use Shortcuts

Good scenarios:

  • You already have trusted, governed data in ADLS or OneLake
  • Multiple teams need access to the same dataset without owning it
  • You want to avoid building copy pipelines just to move data between workspaces
  • You are building domain-oriented or product-oriented lakehouses
  • You want faster analytics access without waiting for a pipeline to run
  • You are connecting Fabric to an existing cloud storage investment

When Not to Use Shortcuts

Shortcuts are not always the right answer. Avoid or be careful when:

  • The source data is not governed, validated, or trusted
  • Permissions are unclear or inconsistently applied at the source
  • Performance requirements need optimized physical layout inside Fabric (compaction, partitioning, Z-ordering)
  • The source folder structure is messy or changes frequently
  • The consuming team expects full ownership and control of the data
  • Cross-cloud latency or egress cost is a concern (for example, S3 shortcuts)
  • The source is controlled by an external team with no SLA alignment

Best Practices

  • Use shortcuts primarily for trusted curated datasets, not raw ingestion zones
  • Keep naming conventions clean and consistent with your lakehouse standards
  • Document shortcut ownership: who created it, what it points to, and who the source owner is
  • Avoid creating shortcuts to random raw folders just because it is convenient
  • Validate access controls before promoting shortcuts to production
  • Use semantic models or SQL views to abstract the shortcut path from end consumers
  • Monitor usage and performance, especially for cross-cloud shortcuts
  • Align shortcuts with data product boundaries, not just individual tables
  • Do not treat shortcuts as a substitute for proper data governance
  • Maintain a clear source-of-truth policy so teams know which source dataset is authoritative and which shortcut to use to reach it
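The ownership documentation recommended above can be kept as structured records rather than free text, which makes gaps easy to detect. This is a minimal sketch; the `ShortcutRecord` fields and the example values are hypothetical and would need to match your own catalog conventions.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ShortcutRecord:
    """One catalog entry per shortcut: what it is and who answers for it."""
    name: str
    lakehouse_path: str
    target: str
    created_by: str
    source_owner: str


def missing_documentation(record: ShortcutRecord) -> list:
    """Return the names of fields that were left blank."""
    return [field for field, value in asdict(record).items() if not value.strip()]


record = ShortcutRecord(
    name="customer_transactions",
    lakehouse_path="Tables/customer_transactions",
    target="abfss://curated@examplelake.dfs.core.windows.net/customer_transactions",
    created_by="data-engineering",
    source_owner="",  # deliberately blank to show the check
)
print(missing_documentation(record))
```

Running this check before a shortcut is promoted to production keeps undocumented shortcuts from accumulating.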

Common Pitfalls

  • Users assume the data is physically stored in their lakehouse. It is not; it lives at the shortcut target. If the source is unavailable or deleted, the shortcut breaks.
  • Teams delete or reorganize source folders without knowing shortcuts depend on them. This silently breaks downstream consumption.
  • Permissions work for engineers but fail for business users. Always test access with a non-admin account before go-live.
  • Too many shortcuts create a confusing lakehouse structure. Organize them with clear folder hierarchies and naming.
  • No documentation for where shortcut data comes from. Future team members have no idea what the shortcut points to or why.
  • Shortcuts are used to bypass proper data modeling. A shortcut to a raw table is not a curated data product.
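Broken shortcuts are easier to catch with a scheduled check than through a failed report. The sketch below is deliberately generic: `find_broken_shortcuts` is a hypothetical helper, and the `can_read` probe is supplied by the caller (in a notebook it might attempt a small read through the shortcut path). The demo uses a fake probe backed by a set.

```python
from typing import Callable, Iterable, List


def find_broken_shortcuts(paths: Iterable[str],
                          can_read: Callable[[str], bool]) -> List[str]:
    """Return the shortcut paths whose target could not be read."""
    broken = []
    for path in paths:
        try:
            ok = can_read(path)
        except Exception:
            # Treat any probe failure (missing target, denied access) as broken.
            ok = False
        if not ok:
            broken.append(path)
    return broken


# Fake probe for illustration: only one target is still reachable.
reachable = {"Files/shortcuts/customer_transactions"}
shortcuts = ["Files/shortcuts/customer_transactions", "Files/shortcuts/orders"]

broken = find_broken_shortcuts(shortcuts, can_read=lambda p: p in reachable)
print(broken)
```

Running the probe under a non-admin identity also covers the permissions pitfall, since a target that is readable for engineers but not for business users will show up as broken.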

Conclusion

Microsoft Fabric Shortcuts are not just a convenience feature. They are an important architectural pattern for reducing duplicate data copies, simplifying enterprise lakehouse design, and accelerating analytics adoption. Used correctly, they help teams build cleaner, cheaper, and more governable data platforms.

But like any architecture pattern, they need ownership, naming standards, security design, and monitoring. A shortcut without governance is just technical debt with a different shape.

"The best data architecture is not always the one that moves data faster. Sometimes, it is the one that avoids moving data unnecessarily."
