DEV Community

Subhasis Das

DAY 8 - Batch Inference Pipeline

Day 8 of Phase 2: AI System Building focused on implementing a batch inference pipeline.

Concept Visual

Using the engineered Silver feature table, feature vectors were assembled and applied to the trained Random Forest model to score over 5.3 million users. The model generated prediction probabilities and class outputs, which were then persisted into a managed Gold Delta table to simulate a production-style scoring layer.
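The scoring step described above can be sketched roughly as follows. This is a minimal illustration, not the exact notebook code: the table names, model path, and `user_id` column are assumptions, and it presumes the saved model is a `PipelineModel` whose stages include the feature assembler.

```python
# Hypothetical identifiers -- adjust to your workspace.
SILVER_TABLE = "silver.user_features"
GOLD_TABLE = "gold.user_scores"
MODEL_PATH = "/models/rf_user_model"

def run_batch_scoring(spark):
    """Score every row of the Silver feature table and persist to Gold.

    Assumes a Databricks/Spark session and a saved PipelineModel that
    already contains the VectorAssembler + RandomForest stages.
    """
    from pyspark.ml import PipelineModel

    features = spark.table(SILVER_TABLE)
    model = PipelineModel.load(MODEL_PATH)
    scored = model.transform(features)  # adds prediction + probability columns

    (scored
     .select("user_id", "prediction", "probability")
     .write.format("delta")
     .mode("overwrite")
     .saveAsTable(GOLD_TABLE))
```

Writing with `saveAsTable` in Delta format is what makes the Gold layer a managed table rather than a loose set of files, which keeps the scoring output queryable like any other warehouse table.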

Notebook


During implementation, Spark ML probability outputs were stored as VectorUDT types, requiring explicit conversion before extracting class probabilities. Additionally, notebook schema rendering messages initially appeared as errors but were confirmed to be display-related rather than pipeline failures. These debugging steps reinforced the importance of understanding Spark's internal data types during inference workflows.
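One common way to unwrap the VectorUDT `probability` column is a small extraction function wrapped as a UDF (on Spark 3+, `pyspark.ml.functions.vector_to_array` is an alternative). The function and column names below are illustrative:

```python
def positive_class_probability(probability_vector):
    """Return P(class = 1) from a Spark ML probability vector.

    Spark ML orders probabilities by class label, so for a binary
    model index 1 is the positive class. Works on anything indexable
    (DenseVector, list, numpy array).
    """
    return float(probability_vector[1])

# On a DataFrame, wrap it as a UDF (requires pyspark):
# from pyspark.sql.functions import udf
# from pyspark.sql.types import DoubleType
# prob_udf = udf(positive_class_probability, DoubleType())
# scored = scored.withColumn("p_positive", prob_udf("probability"))
```

Attempting to select the vector element directly with SQL indexing fails because VectorUDT is an opaque ML type, not an array column, which is why the explicit conversion step is needed.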

Notebook


The highest-ranked users displayed probabilities close to 1.0, consistent with earlier model evaluation outcomes.
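Ranking users by predicted probability is a simple top-k selection. In Spark this would be something like `scored.orderBy(F.desc("probability")).limit(k)`; the same logic can be sketched locally (identifiers here are hypothetical):

```python
import heapq

def top_k_users(scores, k=10):
    """Return the k (user_id, probability) pairs with the highest probability.

    `scores` is any iterable of (user_id, probability) tuples -- the
    local analogue of ordering a scored DataFrame descending and taking
    the top k rows.
    """
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])
```

Using a heap keeps the local version O(n log k) instead of sorting all 5.3 million rows, though at Spark scale the distributed `orderBy` + `limit` handles this for you.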

Notebook

Throughout the process, ChatGPT assisted in resolving vector extraction issues and validating inference pipeline logic within Databricks.

Code

This exercise completed the transition from experimentation to operational batch scoring in the AI system workflow.

Activity Log
