Day 8 of Phase 2: AI System Building focused on implementing a batch inference pipeline.
Feature vectors were assembled from the engineered Silver feature table and passed to the trained Random Forest model, scoring more than 5.3 million users. The model produced prediction probabilities and class outputs, which were then persisted to a managed Gold Delta table to simulate a production-style scoring layer.
During implementation, Spark ML probability outputs were stored as VectorUDT values, which required explicit conversion before individual class probabilities could be extracted. Additionally, notebook schema-rendering messages initially looked like errors but turned out to be display artifacts rather than pipeline failures. These debugging steps reinforced the importance of understanding Spark's internal data types in inference workflows.
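The conversion issue comes down to the fact that the `probability` column is a VectorUDT, not a plain array, so it cannot be indexed directly in SQL expressions. The core extraction logic, written here as a plain Python function that a UDF would wrap, is just indexing into the vector (column and function names below are illustrative):

```python
# Sketch of class-probability extraction, assuming a binary classifier whose
# VectorUDT probability column holds [P(class 0), P(class 1)].
def positive_class_probability(prob_vector) -> float:
    # Spark ML vectors support indexing, so the core logic is plain Python;
    # float() ensures a native double rather than a numpy scalar.
    return float(prob_vector[1])

# Inside a Spark job this would be wrapped as a UDF:
# from pyspark.sql.functions import udf
# from pyspark.sql.types import DoubleType
# extract_p1 = udf(positive_class_probability, DoubleType())
# scored = scored.withColumn("p_positive", extract_p1("probability"))
#
# On Spark 3.0+, pyspark.ml.functions.vector_to_array avoids the UDF entirely:
# from pyspark.ml.functions import vector_to_array
# scored = scored.withColumn("p_positive", vector_to_array("probability")[1])
```

The `vector_to_array` route is generally preferable in batch scoring since it stays in native Spark expressions rather than a Python UDF.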
The highest-ranked users displayed probabilities close to 1.0, consistent with earlier model evaluation outcomes.
Throughout the process, ChatGPT assisted in resolving vector extraction issues and validating inference pipeline logic within Databricks.
This exercise completed the transition from experimentation to operational batch scoring in the AI system workflow.