Step 1: Data Preparation
The first step was cleaning the dataset and handling missing values.
- Missing values in categorical features were replaced with 'NA'
- Missing values in numerical features were replaced with 0.0
This ensured the dataset was consistent and ready for analysis.
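A minimal sketch of what this imputation could look like in pandas (the file name is a placeholder, and detecting columns by dtype is my assumption, since the post doesn't show the actual code):

```python
import pandas as pd

# Hypothetical file name for the leads dataset
df = pd.read_csv('leads.csv')

# Replace missing values in categorical (object) columns with the string 'NA'
categorical = df.select_dtypes(include='object').columns
df[categorical] = df[categorical].fillna('NA')

# Replace missing values in numerical columns with 0.0
numerical = df.select_dtypes(include='number').columns
df[numerical] = df[numerical].fillna(0.0)
```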
Step 2: Exploring the Data
I examined the dataset’s key patterns and relationships.
- I found the most frequent industry among leads by taking the mode of the industry column.
- I generated a correlation matrix to identify the strongest relationships among numerical features, a crucial step before modeling.
 
This helped highlight features that might have overlapping or dependent influences on the target variable.
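Both checks are short one-liners in pandas, continuing from the df loaded above:

```python
# Most frequent industry = mode of the industry column
print(df['industry'].mode()[0])

# Correlation matrix over the numerical features only
corr_matrix = df.select_dtypes(include='number').corr()
print(corr_matrix)
```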
Step 3: Feature Engineering & Splitting
To evaluate model performance fairly, I split the dataset into train (60%), validation (20%), and test (20%) sets, ensuring reproducibility with a fixed random seed.
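A sketch of that split with scikit-learn, using two chained splits (the second test_size is 0.25 because 25% of the remaining 80% equals 20% of the full dataset):

```python
from sklearn.model_selection import train_test_split

# First carve off the 20% test set, then split the remaining 80%
# into 60% train / 20% validation; random_state fixes the seed.
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=42)
df_train, df_val = train_test_split(df_full_train, test_size=0.25, random_state=42)
```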
Step 4: Understanding Feature Relationships
Using mutual information, I explored which categorical features had the strongest relationship with the target (converted). This revealed how factors like industry, employment status, and lead source contribute to conversion likelihood.
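A sketch of that check with scikit-learn's mutual_info_score; the feature list mirrors the ones named above:

```python
from sklearn.metrics import mutual_info_score

# Mutual information between each categorical feature and the target
for col in ['industry', 'employment_status', 'lead_source']:
    mi = mutual_info_score(df_train[col], df_train['converted'])
    print(f'{col}: {mi:.3f}')
```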
Step 5: Logistic Regression Model
After encoding all categorical features with one-hot encoding, I trained a logistic regression model with these parameters:
```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(solver='liblinear', C=1.0, max_iter=1000, random_state=42)
```
The model achieved a validation accuracy of about 0.68, with 0.64 being the closest option on the grading scale.
While it might not seem like a perfect score, it provided valuable insights into which features were most predictive and where improvements could be made.
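For context, here is a sketch of the surrounding encode/train/evaluate loop, continuing from the snippets above. I'm assuming DictVectorizer for the one-hot encoding, which is one common approach; the post doesn't show the exact encoder used:

```python
from sklearn.feature_extraction import DictVectorizer

# One-hot encode the train and validation sets
dv = DictVectorizer(sparse=False)
train_dicts = df_train.drop(columns='converted').to_dict(orient='records')
val_dicts = df_val.drop(columns='converted').to_dict(orient='records')
X_train = dv.fit_transform(train_dicts)
X_val = dv.transform(val_dicts)

# Train the model above and score it on the validation set
model.fit(X_train, df_train['converted'])
val_accuracy = (model.predict(X_val) == df_val['converted']).mean()
print(round(val_accuracy, 2))
```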
Step 6: Feature Importance & Regularization
I then ran feature elimination experiments — dropping one feature at a time (like industry, lead_score, and employment_status) to see which had the smallest impact on accuracy.
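A sketch of that elimination loop, reusing the pieces above (val_accuracy is the baseline accuracy from the full feature set):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

for feature in ['industry', 'lead_score', 'employment_status']:
    cols = [c for c in df_train.columns if c not in (feature, 'converted')]
    dv = DictVectorizer(sparse=False)
    X_tr = dv.fit_transform(df_train[cols].to_dict(orient='records'))
    X_va = dv.transform(df_val[cols].to_dict(orient='records'))
    m = LogisticRegression(solver='liblinear', C=1.0, max_iter=1000, random_state=42)
    m.fit(X_tr, df_train['converted'])
    acc = (m.predict(X_va) == df_val['converted']).mean()
    # Smallest difference from the baseline = least important feature
    print(f'{feature}: accuracy diff {val_accuracy - acc:.4f}')
```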
Finally, I tuned the model’s regularization strength (C) to find the best-performing setup.
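A sketch of that tuning loop, reusing X_train and X_val from above; the grid of C values is my assumption, not necessarily the one used:

```python
from sklearn.linear_model import LogisticRegression

for C in [0.01, 0.1, 1, 10, 100]:
    m = LogisticRegression(solver='liblinear', C=C, max_iter=1000, random_state=42)
    m.fit(X_train, df_train['converted'])
    acc = (m.predict(X_val) == df_val['converted']).mean()
    print(f'C={C}: validation accuracy {acc:.3f}')
```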
Key Takeaways
This project deepened my understanding of:
- Data preprocessing and imputation
- Feature correlation and mutual information
- Model validation and tuning
- The balance between model complexity and generalization

Final Thoughts
This assignment reinforced a key lesson — predictive modeling isn’t just about achieving high accuracy; it’s about building interpretable, actionable insights. Every model is a story told in data, and each iteration gets you closer to understanding your audience, customers, or users better.
    