DEV Community

Subhasis Das

DAY 8 - Batch Inference Pipeline

Day 8 of Phase 2: AI System Building focused on implementing a batch inference pipeline.

Concept Visual

Using the engineered Silver feature table, feature vectors were assembled and applied to the trained Random Forest model to score over 5.3 million users. The model generated prediction probabilities and class outputs, which were then persisted into a managed Gold Delta table to simulate a production-style scoring layer.
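The scoring step described above can be sketched roughly as follows. This is a minimal illustration, not the exact notebook code: the table names, model path, and `user_id` column are assumptions, and it presumes the saved model is a `PipelineModel` whose stages include the feature assembler.

```python
# Hypothetical identifiers -- adjust to your workspace.
SILVER_TABLE = "silver.user_features"
GOLD_TABLE = "gold.user_scores"
MODEL_PATH = "/models/rf_user_model"

def run_batch_scoring(spark):
    """Score every row of the Silver feature table and persist to Gold.

    Assumes a Databricks/Spark session and a saved PipelineModel that
    already contains the VectorAssembler + RandomForest stages.
    """
    from pyspark.ml import PipelineModel

    features = spark.table(SILVER_TABLE)
    model = PipelineModel.load(MODEL_PATH)
    scored = model.transform(features)  # adds prediction + probability columns

    (scored
     .select("user_id", "prediction", "probability")
     .write.format("delta")
     .mode("overwrite")
     .saveAsTable(GOLD_TABLE))
```

Writing with `saveAsTable` in Delta format is what makes the Gold layer a managed table rather than a loose set of files, which keeps the scoring output queryable like any other warehouse table.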

Notebook


During implementation, Spark ML probability outputs were stored as VectorUDT types, requiring explicit conversion before extracting class probabilities. Additionally, notebook schema rendering messages initially appeared as errors but were confirmed to be display-related rather than pipeline failures. These debugging steps reinforced the importance of understanding Spark's internal data types during inference workflows.
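One common way to unwrap the VectorUDT `probability` column is a small extraction function wrapped as a UDF (on Spark 3+, `pyspark.ml.functions.vector_to_array` is an alternative). The function and column names below are illustrative:

```python
def positive_class_probability(probability_vector):
    """Return P(class = 1) from a Spark ML probability vector.

    Spark ML orders probabilities by class label, so for a binary
    model index 1 is the positive class. Works on anything indexable
    (DenseVector, list, numpy array).
    """
    return float(probability_vector[1])

# On a DataFrame, wrap it as a UDF (requires pyspark):
# from pyspark.sql.functions import udf
# from pyspark.sql.types import DoubleType
# prob_udf = udf(positive_class_probability, DoubleType())
# scored = scored.withColumn("p_positive", prob_udf("probability"))
```

Attempting to select the vector element directly with SQL indexing fails because VectorUDT is an opaque ML type, not an array column, which is why the explicit conversion step is needed.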

Notebook


The highest-ranked users displayed probabilities close to 1.0, consistent with earlier model evaluation outcomes.
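Ranking users by predicted probability is a simple top-k selection. In Spark this would be something like `scored.orderBy(F.desc("probability")).limit(k)`; the same logic can be sketched locally (identifiers here are hypothetical):

```python
import heapq

def top_k_users(scores, k=10):
    """Return the k (user_id, probability) pairs with the highest probability.

    `scores` is any iterable of (user_id, probability) tuples -- the
    local analogue of ordering a scored DataFrame descending and taking
    the top k rows.
    """
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])
```

Using a heap keeps the local version O(n log k) instead of sorting all 5.3 million rows, though at Spark scale the distributed `orderBy` + `limit` handles this for you.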

Notebook

Throughout the process, ChatGPT assisted in resolving vector extraction issues and validating inference pipeline logic within Databricks.

Code

This exercise completed the transition from experimentation to operational batch scoring in the AI system workflow.

Activity Log
