Fabric & Databricks Interoperability (1): Purpose of Hub Storage for Table Sharing

Introduction

Although they each have their own characteristics, Microsoft Fabric and Databricks are broadly similar in what they can do.

Through Azure Databricks Unity Catalog mirroring, we are now able to reference Databricks-managed data in Fabric, but editing the data is still not possible.

This brings up the following concerns:

  • Can our department use Fabric to work with tables that other departments manage in Databricks?
  • I want the tables I create to be open for both reading and modification, regardless of the tool used!
  • We currently use Databricks, but we might migrate to Fabric in the future... we want to stay vendor-neutral.
  • Business-side employees use Fabric while engineers use Databricks, and sometimes both need to reference the same table.

In this article, I will introduce use cases for seamlessly utilizing Fabric and Databricks:

  • Using tables created in Databricks in Fabric
  • Using tables created in Fabric in Databricks

The goal of this article:

[Figure: Fabric and Databricks sharing the same tables]

:::note info
This article consists of four parts:

  1. Overview and purpose of interoperability (this article)
  2. Detailed setup of hub storage
  3. Using tables created in Fabric in Databricks
  4. Using tables created in Databricks in Fabric
:::

Prerequisite: Fabric and Databricks have similar functions... which one should we actually use?

Fabric and Databricks are both attracting attention as Lakehouse platforms that handle data end-to-end.

As someone new to the industry, my first impression after using both was, "They can probably do about the same things."
Both support ETL pipelines and AI model development.

Fabric is appealing for its beginner-friendly GUI, designed for intuitive operation.
Databricks, on the other hand, is more code-centric, so it demands a somewhat higher skill level.
Databricks also offers more options for customizing compute resources, and if you shut down clusters when they are idle, it can be more cost-effective than Fabric.

I believe the choice between these platforms depends on whether you prioritize ease of use or flexibility.

▽Reference

Azure Databricks と Microsoft Fabric の関係を考える🧐 #PowerBI - Qiita (Considering the relationship between Azure Databricks and Microsoft Fabric)

qiita.com

Considering interoperability methods

Here, I will explore methods for achieving interoperability between Fabric and Databricks.


① Unity Catalog Mirroring (Currently DML from Fabric to Databricks is not supported)

Using Azure Databricks Unity Catalog mirroring (preview), you can reference Databricks tables from Fabric with SELECT statements, but editing them with DML statements is not currently supported.
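
To illustrate the asymmetry, here is a minimal sketch from the Fabric side. The catalog and table names (`databricks_mirror.sales.orders`) are hypothetical placeholders, and the exact access path depends on how the mirrored catalog is surfaced in your workspace.

```python
# Minimal sketch (Fabric notebook); names are hypothetical placeholders.
# Reading a table exposed through Unity Catalog mirroring works:
df = spark.sql("SELECT * FROM databricks_mirror.sales.orders")
df.show()

# ...but DML against the mirrored table is rejected, because the
# mirrored item is read-only from the Fabric side:
# spark.sql("UPDATE databricks_mirror.sales.orders SET status = 'done'")
```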

Thus, this method alone is not sufficient for full two-way interoperability.

Please let me know if my understanding is incorrect 🙇

② Specifying OneLake as an external location from Databricks (Currently not supported)

I tried registering OneLake as an external location in Azure Databricks via a cloud storage connection, but this is not currently supported.
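
For reference, this is roughly the kind of definition that gets rejected. A minimal sketch: the OneLake abfss URL pattern is shown for illustration only, and the location and credential names are placeholders.

```python
# Minimal sketch (Databricks notebook); names are placeholders.
# OneLake paths follow this abfss pattern, but registering one as a
# Unity Catalog external location currently fails:
spark.sql("""
    CREATE EXTERNAL LOCATION onelake_location
    URL 'abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Files/'
    WITH (STORAGE CREDENTIAL some_credential)
""")
```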

I was hoping this method would work, but unfortunately, it doesn't...
So this method is also not suitable for interoperability.

▽Reference

Azure Databricks から OneLake 上のデータにアクセスする方法 2024/12 版 #Microsoft - Qiita (How to access data on OneLake from Azure Databricks, December 2024 edition)

qiita.com

③ Hub storage as a solution


Since mirroring and external locations didn't work, I decided to store the actual table data in Azure Data Lake Storage Gen2 (ADLS Gen2).
By creating a schema shortcut from Fabric to ADLS Gen2, and specifying ADLS Gen2 as the storage location for a Databricks catalog, both Fabric and Databricks can run SELECT and DML operations against the same tables.

This means that interoperability between Fabric and Databricks is now possible!

From this point forward, I will refer to this ADLS Gen2 storage as hub storage.
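
On the Databricks side, the catalog is pointed at hub storage with Unity Catalog SQL along these lines. This is a minimal sketch: the storage account, container, and credential names are placeholders, and the Fabric-side schema shortcut is created through the Fabric UI rather than code.

```python
# Minimal sketch (Databricks notebook). All names and the abfss URL
# are placeholders; see the next article for the actual setup steps.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS hub_location
    URL 'abfss://hub@<storage-account>.dfs.core.windows.net/'
    WITH (STORAGE CREDENTIAL hub_credential)
""")

# A catalog whose managed tables physically live in hub storage,
# so Fabric can reach the same Delta files through a shortcut:
spark.sql("""
    CREATE CATALOG IF NOT EXISTS hub_catalog
    MANAGED LOCATION 'abfss://hub@<storage-account>.dfs.core.windows.net/catalog'
""")
```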

▽For the specific setup method, refer to the following article:

FabricとDatabricksの相互運用性②:hubストレージ設定方法 -Databricks で作成したテーブルをFabric で利用する、Fabric で作成したテーブルをDatabricksで利用する- #Azure - Qiita (Fabric and Databricks Interoperability (2): How to Set Up Hub Storage)

qiita.com

Hub storage works thanks to Delta Lake


As explained above, hub storage allows interoperability between Fabric and Databricks.

But why does this interoperability work?

The key lies in Delta Lake, the mechanism behind it.

The mechanism of Delta Lake

Delta Lake is an open-source storage layer that adds transactions and schema management on top of a data lake. It stores data as Parquet files and records changes as JSON files. Parquet is a columnar format that enables compression and fast queries, while the JSON files in the _delta_log directory form a transaction log that records the table's change history and versions.
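
To make that concrete, here is a minimal sketch of what a Delta table looks like on storage. The path is illustrative; `spark` and `dbutils` are the session and utility objects available in a Databricks notebook.

```python
# Minimal sketch: write a small Delta table, then look at its layout.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.format("delta").mode("overwrite").save("/tmp/hub_demo")

# The directory now holds Parquet data files plus a _delta_log folder
# whose numbered JSON files record each commit (the change history):
#   part-00000-....snappy.parquet
#   _delta_log/00000000000000000000.json
display(dbutils.fs.ls("/tmp/hub_demo"))
```

Any engine that understands this layout, including both Fabric's and Databricks' Spark runtimes, can read and write the same files.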

By leveraging the Delta Lake mechanism, hub storage enables advanced data sharing and operations. When using Fabric or Databricks, it’s crucial to understand the underlying infrastructure to fully take advantage of the features provided by Delta Lake.

▽For more details on Delta Lake, refer to the official documentation:

Delta Lake とは - Azure Databricks | Microsoft Learn (What is Delta Lake?)

learn.microsoft.com

Conclusion

In this article, I summarized the purpose and methods of interoperability between Fabric and Databricks.

The next article will provide a detailed guide for setting up hub storage.

▽Next article:

FabricとDatabricksの相互運用性②:hubストレージ設定方法 -Databricks で作成したテーブルをFabric で利用する、Fabric で作成したテーブルをDatabricksで利用する- #Azure - Qiita (Fabric and Databricks Interoperability (2): How to Set Up Hub Storage)

qiita.com
