Subhasis Das
DAY 11 – Time Travel & Data Recovery

Day 11 focused on Delta Lake’s time travel functionality and how historical data versions can be accessed in production data systems.

Visual Concept

Two test records were appended to the ecom_orders Delta table to simulate a new ingestion event. Using DESCRIBE HISTORY, the table version history was examined to identify the newly created version. The dataset was then queried using VERSION AS OF to retrieve the table state before the append operation.
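The workflow above maps directly onto two Delta Lake SQL commands. A minimal sketch (the table name ecom_orders is from the post; the version number 6 is the pre-append version reported later and may differ in another table):

```sql
-- Inspect the table's version history: one row per commit, with
-- version number, timestamp, and operation (WRITE, MERGE, VACUUM, ...)
DESCRIBE HISTORY ecom_orders;

-- Query the table as it existed at an earlier version,
-- i.e. the state before the two-record append
SELECT COUNT(*)
FROM ecom_orders VERSION AS OF 6;
```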


Row counts were compared between Version 6 and Version 7 to validate the append operation. The dataset size increased from 312,456,680 rows to 312,456,682 rows, confirming that two new records were successfully added.
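The validation step can be expressed as a single query comparing the two snapshots. A sketch, assuming versions 6 and 7 as described in the post:

```sql
-- Difference in row count between the post-append and pre-append versions;
-- per the counts above, this should return 2
SELECT
  (SELECT COUNT(*) FROM ecom_orders VERSION AS OF 7)
  - (SELECT COUNT(*) FROM ecom_orders VERSION AS OF 6) AS rows_added;
```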


Additional filtering queries isolated the newly inserted rows by their deliberately high user IDs. Timestamp-based time travel was also demonstrated to retrieve the table snapshot from immediately before the append occurred.
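Both checks can be sketched in SQL. The column name user_id, the ID threshold, and the timestamp below are placeholders for illustration; substitute values from your own table history:

```sql
-- Isolate the appended test rows via their artificially high user IDs
-- (column name and threshold are assumptions, not from the original table schema)
SELECT *
FROM ecom_orders
WHERE user_id > 999000000;

-- Time travel by timestamp instead of version number: any timestamp
-- between two commits resolves to the earlier version
SELECT COUNT(*)
FROM ecom_orders TIMESTAMP AS OF '2024-01-15T09:30:00';
```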


An earlier attempt to query the initial table version failed due to Delta retention policies and a prior VACUUM operation, highlighting an important production consideration when relying on historical table versions.
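The failure mode is worth spelling out: VACUUM physically deletes data files that are no longer referenced by any version inside the retention window, after which older snapshots can no longer be reconstructed. A sketch of the operation that caused it (168 hours is Delta's default 7-day retention, not a value from the post):

```sql
-- Remove data files unreferenced for longer than the retention window.
-- After this runs, versions that depended on those files are unqueryable.
VACUUM ecom_orders RETAIN 168 HOURS;

-- A subsequent read of a vacuumed version, e.g.
--   SELECT * FROM ecom_orders VERSION AS OF 0;
-- fails because the underlying Parquet files have been deleted.
```

In production, this means time-travel-based recovery only works within the configured retention period, so retention settings should be chosen with recovery requirements in mind.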


During implementation, ChatGPT helped diagnose schema mismatches in the append operations and guided the correct use of Delta time travel queries within Databricks.

