Subhasis Das

DAY 6 - Model Training & Tuning

As part of Day 6 of Phase 2: AI System Building in the Databricks 14 Days AI Challenge – 2 (Advanced), I focused on model training, tuning, and evaluation using the supervised dataset prepared earlier.

Visual Concept

Feature vectors were assembled from engineered user-level metrics, and both Logistic Regression and Random Forest classifiers were trained using an 80/20 train-test split. Model performance was evaluated using ROC-AUC to ensure threshold-independent comparison.

In the shared serverless workspace, CrossValidator-based tuning was not supported because of temporary-storage configuration restrictions. As a result, the Random Forest hyperparameters were tuned manually by iterating over different tree counts and depths.

The observed AUC values were suspiciously high (≈0.999999 for Logistic Regression and 1.0 for Random Forest). Near-perfect scores like these usually mean one or more features encode the label almost directly, so the result prompted a closer look at feature-label relationships and at potential information leakage in the supervised dataset.

During implementation, ChatGPT helped validate the model configuration and evaluation logic, troubleshoot the environment, and interpret the performance metrics in the context of scalable AI system design on Databricks.

Codes

Activity Log
