Delta Tables in Microsoft Fabric: What They Are and How They're Structured

#database #dataengineering #microsoft #opensource

If you've worked with the Microsoft Fabric Lakehouse, you've probably noticed that all your managed tables are stored as Delta tables. But what exactly is a Delta table? What does it look like on disk? And why does Fabric use it as the default format?

This post answers each of those questions.

What Is a Delta Table?

A Delta table is a table stored in the Delta Lake open-source format. It's built on top of regular Parquet files (a popular columnar file format) — but with a key addition: a transaction log that tracks every change made to the table.

That transaction log is what makes Delta tables different from plain files. It gives you:

ACID transactions — Reads and writes are reliable. No partial writes, no corrupt data.
Time travel — Query the table as it looked yesterday, last week, or any past version.
Schema enforcement — Delta rejects data that doesn't match the table's schema.
Efficient updates and deletes — You can actually UPDATE or DELETE rows, which you can't do with plain Parquet files.

Where Do Delta Tables Live in the Fabric Lakehouse?

In Microsoft Fabric, your Lakehouse is connected to OneLake — Fabric's unified storage layer. OneLake uses an ADLS Gen2-compatible folder structure under the hood.

Every Lakehouse has two sections:

| Section | What it is |
| --- | --- |
| Tables | Managed Delta tables. Schema tracked by Fabric. Appear in the SQL endpoint automatically. |
| Files | Raw files (CSV, JSON, Parquet, etc.) you manage yourself. Not automatically queryable as tables. |

When you create a Delta table in the Lakehouse (either via Spark, a pipeline, or a Dataflow), it gets stored inside the Tables folder.
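To make the location concrete, here is a minimal sketch that builds the OneLake path of a managed table. The helper names and the example workspace and lakehouse names are hypothetical; the URI shape follows the OneLake ABFS pattern used later in this post, and the sketch assumes names without spaces or special characters.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def table_path(workspace: str, lakehouse: str, table: str) -> str:
    """Build the ABFS URI of a managed table's folder (hypothetical helper)."""
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Tables/{table}"


def delta_log_path(workspace: str, lakehouse: str, table: str) -> str:
    """The transaction log always sits in _delta_log under the table folder."""
    return table_path(workspace, lakehouse, table) + "/_delta_log"


# Example names are placeholders, not real resources.
print(table_path("my_workspace", "my_lakehouse", "sales"))
print(delta_log_path("my_workspace", "my_lakehouse", "sales"))
```

Anything you write under Files lives next to Tables in the same lakehouse folder, but it is not registered as a table.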

The Folder Structure of a Delta Table

This is the most important part. Let's say you create a table called sales. Here's what the folder structure looks like in OneLake:

Lakehouse/
└── Tables/
    └── sales/
        ├── _delta_log/
        │   ├── 00000000000000000000.json
        │   ├── 00000000000000000001.json
        │   ├── 00000000000000000002.json
        │   └── ... (one file per transaction)
        ├── part-00000-<uuid>.snappy.parquet
        ├── part-00001-<uuid>.snappy.parquet
        └── part-00002-<uuid>.snappy.parquet

Let's walk through each part.

The Parquet Files — Your Actual Data

The files named part-00000-....snappy.parquet are where your data lives. Each file is a Parquet file — a compressed, columnar binary format optimized for analytical queries.

A few things to know:

There can be many Parquet files per table, depending on how many Spark partitions were used when writing.
Each file is self-contained. You can read it independently.
They are compressed (usually Snappy or ZSTD), so they're much smaller than equivalent CSV files.
They are columnar — meaning if you query only the revenue column, only that column's data is read from disk. This makes analytical queries very fast.

When you have a large table, there could be hundreds of these Parquet files. Spark reads them all in parallel.

The `_delta_log` Folder — The Transaction Log

This is the heart of Delta Lake. The _delta_log folder contains a series of JSON files, one per transaction (or commit).

Every time something changes in the table — an INSERT, a DELETE, an UPDATE, a schema change — Delta writes a new JSON file to _delta_log with a description of what happened.

Here's what a simplified log entry looks like (real entries carry more fields, such as partitionValues and modificationTime):

{
  "add": {
    "path": "part-00000-abc123.snappy.parquet",
    "size": 1048576,
    "stats": "{\"numRecords\": 50000, \"minValues\": {\"date\": \"2024-01-01\"}, \"maxValues\": {\"date\": \"2024-03-31\"}}"
  }
}

And when a file is removed (after an UPDATE or DELETE):

{
  "remove": {
    "path": "part-00000-abc123.snappy.parquet",
    "deletionTimestamp": 1710000000000
  }
}
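The stats value in an add entry is itself a JSON string holding per-file row counts and min/max values. Query engines can use it to skip files that cannot contain matching rows. The sketch below illustrates the idea in plain Python; the function name is hypothetical and this is not how Spark implements it internally.

```python
import json

# A simplified "add" entry, serialized the way it would sit on one line of a commit file.
commit_line = json.dumps({
    "add": {
        "path": "part-00000-abc123.snappy.parquet",
        "size": 1048576,
        "stats": json.dumps({
            "numRecords": 50000,
            "minValues": {"date": "2024-01-01"},
            "maxValues": {"date": "2024-03-31"},
        }),
    }
})


def may_contain(add_action: dict, column: str, value: str) -> bool:
    """Return False only when the stats prove the file cannot hold `value`."""
    stats = json.loads(add_action.get("stats") or "{}")
    low = stats.get("minValues", {}).get(column)
    high = stats.get("maxValues", {}).get(column)
    if low is None or high is None:
        return True  # no stats for this column, so the file has to be read
    return low <= value <= high  # ISO dates compare correctly as strings


add = json.loads(commit_line)["add"]
print(may_contain(add, "date", "2024-02-15"))  # inside the min/max range
print(may_contain(add, "date", "2024-06-01"))  # outside the range, file can be skipped
```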

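The log is append-only: a commit file is never edited after it is written. This is how Delta supports time travel. You can replay the log up to any version to reconstruct the table at that point in time, as long as the data files that version points to have not been removed by VACUUM. Very old commit files are eventually cleaned up once they fall outside the log retention period (30 days by default).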
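Replaying the log can be sketched in a few lines of plain Python. The commits dictionary and file names below are made up for illustration, and the sketch handles only add and remove actions (no checkpoints, metadata, or protocol actions).

```python
import json

# Hypothetical commits keyed by version; each commit file holds one JSON action per line.
commits = {
    0: ['{"add": {"path": "part-00000-a.snappy.parquet"}}'],
    1: ['{"add": {"path": "part-00001-b.snappy.parquet"}}'],
    2: [
        '{"remove": {"path": "part-00000-a.snappy.parquet"}}',
        '{"add": {"path": "part-00002-c.snappy.parquet"}}',
    ],
}


def files_at_version(commits: dict, version: int) -> set:
    """Replay commits 0..version and return the data files that are still active."""
    active = set()
    for v in sorted(commits):
        if v > version:
            break
        for line in commits[v]:
            action = json.loads(line)
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return active


print(sorted(files_at_version(commits, 1)))  # files a and b
print(sorted(files_at_version(commits, 2)))  # files b and c
```

Version 2 rewrote file a into file c, yet version 1 can still be reconstructed because the remove action only marks the file as no longer current.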

Checkpoints: Keeping the Log Fast

As the log grows (thousands of transactions), reading all those JSON files to figure out the current state of the table gets slow. Delta solves this with checkpoints.

Every 10 commits (by default), Delta writes a checkpoint file in Parquet format that summarizes the full state of the table at that point. A reader then only needs the latest checkpoint plus the JSON commits written after it.

_delta_log/
├── 00000000000000000000.json
├── ...
├── 00000000000000000010.json
├── 00000000000000000010.checkpoint.parquet   ← checkpoint
├── 00000000000000000011.json
├── 00000000000000000012.json
└── ...

You'll see these checkpoint files appear naturally in your Lakehouse as tables get updated over time.
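The read path described above can be sketched as follows. The function names are hypothetical, and the sketch only handles single-file checkpoints named like the one in the listing; it is a simplification of what a real Delta reader does.

```python
def commit_name(version: int) -> str:
    """Commit files are the version number zero-padded to 20 digits."""
    return f"{version:020d}.json"


def checkpoint_name(version: int) -> str:
    return f"{version:020d}.checkpoint.parquet"


def files_to_read(log_files: list, target_version: int) -> list:
    """Pick the latest checkpoint at or before the target, plus the commits after it."""
    checkpoints, commits = [], []
    for name in log_files:
        stem = name.split(".")[0]
        if not stem.isdigit():
            continue  # skip anything that is not a versioned log file
        version = int(stem)
        if version > target_version:
            continue
        if name.endswith(".checkpoint.parquet"):
            checkpoints.append(version)
        elif name.endswith(".json"):
            commits.append(version)
    start = max(checkpoints, default=None)
    plan = [] if start is None else [checkpoint_name(start)]
    first_commit = 0 if start is None else start + 1
    plan += [commit_name(v) for v in sorted(commits) if v >= first_commit]
    return plan


# A log with commits 0..12 and one checkpoint at version 10.
log_files = [commit_name(v) for v in range(13)] + [checkpoint_name(10)]
print(files_to_read(log_files, 12))       # checkpoint 10, then commits 11 and 12
print(len(files_to_read(log_files, 5)))   # no usable checkpoint yet: commits 0..5
```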

Partitioned Delta Tables

For large tables, you'll typically partition your data — split the files into subfolders based on a column value. For example, partitioning a sales table by year and month looks like this:

Tables/
└── sales/
    ├── _delta_log/
    ├── year=2023/
    │   ├── month=01/
    │   │   └── part-00000-<uuid>.snappy.parquet
    │   └── month=02/
    │       └── part-00000-<uuid>.snappy.parquet
    └── year=2024/
        ├── month=01/
        │   └── part-00000-<uuid>.snappy.parquet
        └── month=02/
            └── part-00000-<uuid>.snappy.parquet

Partitioning is a performance optimization. If you query WHERE year = 2024 AND month = 01, Spark only reads the files in that one subfolder — skipping everything else. For tables with years of data, this makes an enormous difference.
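Here is a minimal sketch of partition pruning over the folder layout above. The file names are placeholders and the helper names are hypothetical. Note that a real Delta reader takes partition values from the add entries in the transaction log rather than parsing folder names; the path parsing here is only for illustration.

```python
def partition_values(path: str) -> dict:
    """Extract key=value folder segments from a relative data-file path."""
    values = {}
    for segment in path.split("/")[:-1]:  # the last segment is the file name
        key, sep, value = segment.partition("=")
        if sep:
            values[key] = value
    return values


def prune(paths: list, **wanted: str) -> list:
    """Keep only the files whose partition values match every requested value."""
    return [
        p for p in paths
        if all(partition_values(p).get(k) == v for k, v in wanted.items())
    ]


paths = [
    "year=2023/month=01/part-00000-aaa.snappy.parquet",
    "year=2023/month=02/part-00000-bbb.snappy.parquet",
    "year=2024/month=01/part-00000-ccc.snappy.parquet",
    "year=2024/month=02/part-00000-ddd.snappy.parquet",
]

# Equivalent in spirit to WHERE year = 2024 AND month = 01: one file out of four is read.
print(prune(paths, year="2024", month="01"))
```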

How Fabric Uses This Structure

In Microsoft Fabric:

The Lakehouse UI reads the _delta_log to show you table metadata, column names, row counts, and table history. This all comes from the transaction log.
The SQL Analytics Endpoint is automatically built on top of your Delta tables. Fabric reads the Delta log to register the tables and their schemas, so you can query them with T-SQL without any extra setup.
Power BI Direct Lake mode loads column data straight from the Parquet files into memory, helped by V-Order (a write-time optimization Fabric applies to its Parquet files), so there is no scheduled import or duplicate copy of the data. This is why Direct Lake can get close to Import-mode query speed while still reflecting the latest data in the lake.
Time travel works out of the box. You can run SELECT * FROM sales VERSION AS OF 5 in Spark SQL to see the table as it was at version 5.

A Quick Example: Inspecting Your Delta Table

In a Fabric notebook, you can inspect the table history and files easily:

# View the history of all changes made to the table
display(spark.sql("DESCRIBE HISTORY sales"))

# View table-level details in a single row: location, format, number of files, size in bytes, partition columns
display(spark.sql("DESCRIBE DETAIL sales"))

# Time travel: query the table as it was at version 2
display(spark.sql("SELECT * FROM sales VERSION AS OF 2"))

You can also read the _delta_log files directly if you're curious:

log = spark.read.json("abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Tables/sales/_delta_log/*.json")
display(log)

Summary: What a Delta Table Really Is

| Component | What it does |
| --- | --- |
| Parquet files | Store the actual data, compressed and columnar |
| `_delta_log/` JSON files | Record every transaction: adds, removes, schema changes |
| Checkpoint files | Summarize table state every 10 commits (by default) for fast reads |
| Partition folders | Optional subfolders by column value for query performance |

A Delta table is not magic — it's Parquet files you can already read, plus a log folder that makes those files transactional, versioned, and reliable.

Microsoft Fabric builds everything on top of this structure: the Lakehouse SQL endpoint, Direct Lake Power BI, time travel, and ACID-safe pipelines. Understanding the folder structure helps you reason about how your data is stored, why queries perform the way they do, and how to troubleshoot when something looks off.