Subhasis Das

DAY 4 – Structured Streaming (Basic Simulation)

As part of Day 4 of Phase 1: Better Data Engineering in the Databricks 14 Days AI Challenge – 2 (Advanced), I explored the basics of Structured Streaming through a folder-based simulation approach.

Day-4 in Short

The objective was to simulate incremental data ingestion by monitoring a folder for incoming files and writing processed results into Delta format. Streaming input and checkpoint directories were prepared within Volume storage, and a predefined schema was used to configure streaming reads from curated data.
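To make the folder-monitoring idea concrete, here is a minimal sketch in plain Python rather than Spark. `ingest_new_files` is a hypothetical helper (not part of the notebook) that records processed file names in a small checkpoint file, so files seen in an earlier run are skipped on later runs, which is the behavior a streaming checkpoint provides.

```python
import json
import os

def ingest_new_files(input_dir, checkpoint_file, process):
    """Process only files not yet listed in the checkpoint,
    mimicking how a streaming file source tracks inputs it has seen."""
    seen = set()
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file) as f:
            seen = set(json.load(f))

    # Anything in the folder that the checkpoint has not recorded is "new".
    new_files = sorted(n for n in os.listdir(input_dir) if n not in seen)
    for name in new_files:
        process(os.path.join(input_dir, name))  # e.g. parse and append downstream

    # Persist the updated set of seen files for the next run.
    with open(checkpoint_file, "w") as f:
        json.dump(sorted(seen | set(new_files)), f)
    return new_files
```

Running it twice over the same folder processes each file exactly once; dropping a new file in and re-running picks up only that file.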


During implementation, several practical challenges were encountered. Volume path validation, folder preparation, and workspace limitations prevented the use of continuous streaming triggers. The workflow therefore required adapting to an alternative trigger suitable for controlled execution. Checkpoint behavior also highlighted how previously detected files are ignored during subsequent runs, demonstrating how incremental ingestion is maintained.
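On Databricks, the alternative trigger described above is typically `availableNow`, which processes every file currently in the folder as a single batch and then stops. The following is a hedged notebook sketch under assumptions of my own: the Volume paths and the two-column schema are hypothetical (the original notebook's values are not shown here), and `spark` is the session provided by the notebook environment.

```python
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Hypothetical Volume paths -- substitute your own.
input_path = "/Volumes/main/default/demo/stream_input"
checkpoint_path = "/Volumes/main/default/demo/checkpoint"
output_path = "/Volumes/main/default/demo/stream_output"

# Streaming file reads require a predefined schema (illustrative fields only).
schema = StructType([
    StructField("id", IntegerType()),
    StructField("value", StringType()),
])

stream = (
    spark.readStream
    .format("csv")
    .schema(schema)
    .option("header", "true")
    .load(input_path)
)

# availableNow=True processes all files present, then stops -- a controlled
# alternative where a continuously running trigger is not practical.
# The checkpoint location is what makes reruns skip already-ingested files.
(
    stream.writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)
    .start(output_path)
)
```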


Although the streaming output could not be consistently demonstrated within the environment constraints, the exercise provided valuable insight into how storage configuration, checkpoints, and execution environments affect streaming pipelines.

Code

Activity Log
