As part of Day 2 of Phase 1: Better Data Engineering in the Databricks 14 Days AI Challenge – 2 (Advanced), I focused on building a user-level Feature Table using a Silver Layer approach.
The workflow started by loading the previously created merged Delta table rather than reprocessing the raw datasets. The objective was to transform event-level records into structured, user-level features that can be reused for analytics or downstream machine learning tasks.
Using PySpark aggregations, I generated features such as total events, number of purchases, total spending, and average price across interactions. Non-purchase events were intentionally included to capture overall engagement patterns rather than restricting analysis only to completed transactions.
To ensure reliability, duplicate user records were removed explicitly and feature-quality checks were run. Null validation confirmed there were no missing user identifiers, while descriptive statistics helped review behavior across more than 5.3 million users.
During implementation, ChatGPT supported reasoning around aggregation logic and validation checks aligned with Silver layer practices.
The final dataset was persisted as a Delta table in Databricks, reinforcing structured and reusable data engineering practices.