Kenechukwu Anoliefo

From Raw Leads to Predictive Insights: My Logistic Regression Assignment Journey

Step 1: Data Preparation

The first step was cleaning the dataset and handling missing values.

  • Missing values in categorical features were replaced with the string 'NA'
  • Missing values in numerical features were replaced with 0.0

This ensured the dataset was consistent and ready for analysis.
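
A minimal sketch of that imputation step with pandas (the file name is hypothetical; the actual column lists depend on the dataset):

import pandas as pd

df = pd.read_csv('leads.csv')  # hypothetical file name

# Infer which columns are categorical vs numerical from their dtypes
categorical = df.select_dtypes(include='object').columns
numerical = df.select_dtypes(include='number').columns

# Fill missing values per the rules above
df[categorical] = df[categorical].fillna('NA')
df[numerical] = df[numerical].fillna(0.0)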

Step 2: Exploring the Data

I examined the dataset’s key patterns and relationships.

  • The most frequent industry among leads was identified by taking the mode of the industry column.
  • I generated a correlation matrix to identify the strongest relationships among numerical features, a crucial step before modeling.

This helped highlight features that might have overlapping or dependent influences on the target variable.
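
Both checks are short in pandas; a sketch, assuming the cleaned df from Step 1 and a column named industry:

# Most frequent industry: the mode of the column
print(df['industry'].mode().iloc[0])

# Correlation matrix over numerical features only
corr = df.select_dtypes(include='number').corr()
print(corr)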

Step 3: Feature Engineering & Splitting

To evaluate model performance fairly, I split the dataset into train (60%), validation (20%), and test (20%) sets, ensuring reproducibility with a fixed random seed.
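
One common way to get a 60/20/20 split is two chained calls to scikit-learn's train_test_split; a minimal sketch (the seed value 42 is an assumption):

from sklearn.model_selection import train_test_split

# First carve off 20% for test, then split validation out of the remainder.
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=42)
df_train, df_val = train_test_split(df_full_train, test_size=0.25, random_state=42)
# 0.25 of the remaining 80% equals 20% of the original data

Fixing random_state in both calls is what makes the split reproducible from run to run.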

Step 4: Understanding Feature Relationships

Using mutual information, I explored which categorical features had the strongest relationship with the target (converted). This revealed how factors like industry, employment status, and lead source contribute to conversion likelihood.
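
A sketch of that check using scikit-learn's mutual_info_score, assuming the target column is named converted and the feature names match the dataset:

from sklearn.metrics import mutual_info_score

# Mutual information between each categorical feature and the target
for col in ['industry', 'employment_status', 'lead_source']:
    mi = mutual_info_score(df_train[col], df_train['converted'])
    print(f'{col}: {mi:.3f}')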

Step 5: Logistic Regression Model

After encoding all categorical features with one-hot encoding, I trained a logistic regression model with these parameters:

from sklearn.linear_model import LogisticRegression
model = LogisticRegression(solver='liblinear', C=1.0, max_iter=1000, random_state=42)
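
For context, a sketch of how the encoding and evaluation fit around that model. I'm assuming one-hot encoding via scikit-learn's DictVectorizer (pandas get_dummies would work equally well), with df_train and df_val from Step 3:

from sklearn.feature_extraction import DictVectorizer

# One-hot encode categoricals; DictVectorizer passes numeric fields through
dv = DictVectorizer(sparse=False)
train_dicts = df_train.drop(columns='converted').to_dict(orient='records')
val_dicts = df_val.drop(columns='converted').to_dict(orient='records')
X_train = dv.fit_transform(train_dicts)
X_val = dv.transform(val_dicts)

model.fit(X_train, df_train['converted'])
print(model.score(X_val, df_val['converted']))  # validation accuracy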

The model achieved a validation accuracy of 0.68, which corresponds to the 0.64 option on the assignment's grading scale.

While it might not seem like a perfect score, it provided valuable insights into which features were most predictive and where improvements could be made.

Step 6: Feature Importance & Regularization

I then ran feature elimination experiments: dropping one feature at a time (such as industry, lead_score, and employment_status) to see which had the smallest impact on accuracy.
Finally, I tuned the model's regularization strength (C) to find the best-performing setup.
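
A sketch of both experiments, reusing the names from the Step 5 snippet; the feature list and the C grid are assumptions:

# Feature elimination: drop one feature at a time and retrain
y_train = df_train['converted']
y_val = df_val['converted']

for feature in ['industry', 'lead_score', 'employment_status']:
    cols = [c for c in df_train.columns if c not in (feature, 'converted')]
    dv_f = DictVectorizer(sparse=False)
    X_tr = dv_f.fit_transform(df_train[cols].to_dict(orient='records'))
    X_va = dv_f.transform(df_val[cols].to_dict(orient='records'))
    m = LogisticRegression(solver='liblinear', C=1.0, max_iter=1000, random_state=42)
    m.fit(X_tr, y_train)
    print(f'without {feature}: accuracy {m.score(X_va, y_val):.3f}')

# Regularization tuning: sweep C on the full feature set
for C in [0.01, 0.1, 1, 10, 100]:
    m = LogisticRegression(solver='liblinear', C=C, max_iter=1000, random_state=42)
    m.fit(X_train, y_train)
    print(f'C={C}: accuracy {m.score(X_val, y_val):.3f}')

The feature whose removal changes accuracy the least is the one contributing the least signal; smaller C means stronger regularization.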

Key Takeaways

This project deepened my understanding of:

  • Data preprocessing and imputation
  • Feature correlation and mutual information
  • Model validation and tuning
  • The balance between model complexity and generalization

Final Thoughts

This assignment reinforced a key lesson: predictive modeling isn't just about achieving high accuracy; it's about producing interpretable, actionable insights. Every model is a story told in data, and each iteration gets you closer to understanding your audience, customers, or users.
