<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Abhijeet Pratap Singh</title>
    <description>The latest articles on DEV Community by Abhijeet Pratap Singh (@abhijeet_pratapsingh_868).</description>
    <link>https://dev.to/abhijeet_pratapsingh_868</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4010075%2Fdec69484-cdf0-4fdd-b429-1ea9c59c392a.jpg</url>
      <title>DEV Community: Abhijeet Pratap Singh</title>
      <link>https://dev.to/abhijeet_pratapsingh_868</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abhijeet_pratapsingh_868"/>
    <language>en</language>
    <item>
      <title>Decision Trees (Supervised Learning)</title>
      <dc:creator>Abhijeet Pratap Singh</dc:creator>
      <pubDate>Wed, 01 Jul 2026 21:48:10 +0000</pubDate>
      <link>https://dev.to/abhijeet_pratapsingh_868/decision-trees-supervised-learning-2h3b</link>
      <guid>https://dev.to/abhijeet_pratapsingh_868/decision-trees-supervised-learning-2h3b</guid>
      <description>&lt;h1&gt;
  
  
  1. The Problem It Solves
&lt;/h1&gt;

&lt;p&gt;Many real-world problems don't follow a straight-line relationship.&lt;/p&gt;

&lt;p&gt;Many decisions aren't made by gradually weighing a quantity up or down. Instead, they depend on &lt;strong&gt;conditions&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Will this customer upgrade?&lt;/li&gt;
&lt;li&gt;Is this transaction fraudulent?&lt;/li&gt;
&lt;li&gt;Should this loan be approved?&lt;/li&gt;
&lt;li&gt;Will this machine fail?&lt;/li&gt;
&lt;li&gt;Is this email spam?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The answer usually depends on a series of &lt;strong&gt;if-else rules&lt;/strong&gt;, not a mathematical equation.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If monthly spending is greater than $500 &lt;strong&gt;and&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Login frequency is less than twice a week &lt;strong&gt;and&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Support tickets are increasing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then the customer is likely to churn.&lt;/p&gt;

&lt;p&gt;Decision Trees are designed to discover these kinds of rules automatically.&lt;/p&gt;

&lt;p&gt;Instead of fitting a line like Linear or Logistic Regression, they keep asking questions that split the data into smaller and more similar groups.&lt;/p&gt;




&lt;h1&gt;
  
  
  2. Core Intuition
&lt;/h1&gt;

&lt;p&gt;Imagine you're playing &lt;strong&gt;20 Questions&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You're trying to guess whether a customer will upgrade their subscription.&lt;/p&gt;

&lt;p&gt;Instead of making one big guess, you ask simple Yes/No questions.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Does the customer have more than 20 seats?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If yes...&lt;/p&gt;

&lt;p&gt;Ask another question.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Are API calls greater than 500 per day?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If yes...&lt;/p&gt;

&lt;p&gt;Ask another question.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Has the account been active in the last week?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Eventually, you reach a point where almost every customer in that group behaves the same way.&lt;/p&gt;

&lt;p&gt;That final group becomes a &lt;strong&gt;Leaf Node&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Whenever a new customer arrives, you simply walk them through the same set of questions until they reach a leaf.&lt;/p&gt;

&lt;p&gt;The prediction is the majority class among the training examples that ended up in that leaf.&lt;/p&gt;




&lt;h1&gt;
  
  
  3. How the Algorithm Works
&lt;/h1&gt;

&lt;p&gt;Decision Trees are built one split at a time.&lt;/p&gt;

&lt;p&gt;At every node, the algorithm asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"Which question separates the data the best?"&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It tries every feature.&lt;/p&gt;

&lt;p&gt;Then every possible split point.&lt;/p&gt;

&lt;p&gt;The split that creates the cleanest separation is chosen.&lt;/p&gt;

&lt;p&gt;This process repeats until the stopping criteria are met.&lt;/p&gt;




&lt;h1&gt;
  
  
  4. Measuring Node Purity
&lt;/h1&gt;

&lt;p&gt;To decide whether a split is good, the algorithm measures how "mixed" the classes are inside each node.&lt;/p&gt;

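&lt;p&gt;One of the most common metrics is &lt;strong&gt;Gini Impurity&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gini = 1 − Σ pᵢ²&lt;/strong&gt;, summed over the classes &lt;em&gt;i&lt;/em&gt; = 1, …, &lt;em&gt;C&lt;/em&gt;&lt;/p&gt;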

&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;pᵢ&lt;/strong&gt; = probability of class &lt;em&gt;i&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;C&lt;/strong&gt; = total number of classes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interpretation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gini = 0&lt;/strong&gt; → Every sample belongs to one class (perfectly pure)&lt;/li&gt;
&lt;li&gt;Higher values → Classes are mixed together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is to make every leaf node as pure as possible.&lt;/p&gt;
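
&lt;p&gt;As a rough illustration of the metric, the sketch below computes Gini impurity (1 minus the sum of squared class proportions) for a list of labels. The function name &lt;code&gt;gini_impurity&lt;/code&gt; and the sample labels are made up for this example and are not part of any library.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i ** 2) over the classes present in labels."""
    total = len(labels)
    if total == 0:
        return 0.0  # convention chosen here: an empty node counts as pure
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical nodes with six customers each
pure_node = ["upgrade"] * 6
mixed_node = ["upgrade"] * 3 + ["stay"] * 3

print(gini_impurity(pure_node))   # 0.0, every sample is in one class
print(gini_impurity(mixed_node))  # 0.5, the maximum for two classes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With two classes the value can never exceed 0.5, which is reached when the node is split evenly.&lt;/p&gt;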




&lt;h1&gt;
  
  
  5. Information Gain
&lt;/h1&gt;

&lt;p&gt;Every possible split is evaluated.&lt;/p&gt;

&lt;p&gt;The algorithm calculates how much impurity decreases after making that split.&lt;/p&gt;

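&lt;p&gt;This decrease is commonly called &lt;strong&gt;Information Gain&lt;/strong&gt; (strictly speaking, that name belongs to the entropy-based version; with Gini it is simply the impurity decrease).&lt;/p&gt;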

&lt;p&gt;The split with the &lt;strong&gt;highest Information Gain&lt;/strong&gt; becomes the next branch in the tree.&lt;/p&gt;

&lt;p&gt;Then the entire process repeats recursively for each child node.&lt;/p&gt;
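
&lt;p&gt;The split search described above can be sketched for a single numeric feature. This is a simplified illustration, not how any particular library implements it: the helper names (&lt;code&gt;gini&lt;/code&gt;, &lt;code&gt;impurity_decrease&lt;/code&gt;, &lt;code&gt;best_split&lt;/code&gt;) and the toy data are assumptions, and the gain is measured as the parent's Gini impurity minus the size-weighted impurity of the two children.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


def best_split(values, labels):
    """Try a threshold between each pair of neighbouring values; keep the best.

    Assumes the feature has at least two distinct values.
    """
    rows = sorted(zip(values, labels))
    ys = [label for _, label in rows]
    candidates = []
    for i in range(1, len(rows)):
        lower, upper = rows[i - 1][0], rows[i][0]
        if lower == upper:
            continue  # identical values cannot be separated by a threshold
        threshold = (lower + upper) / 2
        candidates.append((impurity_decrease(ys, ys[:i], ys[i:]), threshold))
    return max(candidates)  # (gain, threshold) with the largest gain


# Toy data: API calls per day and whether the account upgraded (1) or not (0)
api_calls = [120, 300, 450, 520, 610, 800]
upgraded = [0, 0, 0, 1, 1, 1]

gain, threshold = best_split(api_calls, upgraded)
print(gain, threshold)  # 0.5 485.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A full tree would run this search over every feature, keep the overall winner, and then repeat it on each child node.&lt;/p&gt;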




&lt;h1&gt;
  
  
  6. When Does the Tree Stop Growing?
&lt;/h1&gt;

&lt;p&gt;If left alone, a Decision Tree keeps splitting until every leaf is pure, which in the extreme case means one leaf per training example.&lt;/p&gt;

&lt;p&gt;That almost always leads to overfitting.&lt;/p&gt;

&lt;p&gt;To prevent this, we usually limit tree growth using parameters like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;max_depth&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;min_samples_split&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;min_samples_leaf&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;max_leaf_nodes&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These regularization settings help the tree generalize to unseen data instead of memorizing the training set.&lt;/p&gt;




&lt;h1&gt;
  
  
  7. When Should You Use Decision Trees?
&lt;/h1&gt;

&lt;p&gt;Decision Trees work well when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Relationships are non-linear.&lt;/li&gt;
&lt;li&gt;Data contains many conditional rules.&lt;/li&gt;
&lt;li&gt;Features are a mix of numerical and categorical values.&lt;/li&gt;
&lt;li&gt;Interpretability is important.&lt;/li&gt;
&lt;li&gt;You don't want extensive preprocessing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Typical applications include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer churn prediction&lt;/li&gt;
&lt;li&gt;Credit approval&lt;/li&gt;
&lt;li&gt;Fraud detection&lt;/li&gt;
&lt;li&gt;Medical diagnosis&lt;/li&gt;
&lt;li&gt;Product recommendation&lt;/li&gt;
&lt;li&gt;Customer segmentation&lt;/li&gt;
&lt;li&gt;Risk assessment&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  8. Advantages
&lt;/h1&gt;

&lt;p&gt;Decision Trees have several practical benefits.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No feature scaling required.&lt;/li&gt;
&lt;li&gt;Handles numerical and categorical data (some libraries, including scikit-learn, need categorical features encoded as numbers first).&lt;/li&gt;
&lt;li&gt;Learns non-linear relationships automatically.&lt;/li&gt;
&lt;li&gt;Easy to visualize and explain.&lt;/li&gt;
&lt;li&gt;Captures feature interactions naturally.&lt;/li&gt;
&lt;li&gt;Works well even with missing values (depending on implementation).&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  9. When It Starts Breaking Down
&lt;/h1&gt;

&lt;p&gt;Decision Trees are powerful, but they have some important weaknesses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overfitting
&lt;/h2&gt;

&lt;p&gt;The biggest problem.&lt;/p&gt;

&lt;p&gt;If the tree grows without limits, it starts memorizing the training data instead of learning real patterns.&lt;/p&gt;

&lt;p&gt;This usually results in poor performance on new data.&lt;/p&gt;




&lt;h2&gt;
  
  
  High Variance
&lt;/h2&gt;

&lt;p&gt;Decision Trees are unstable.&lt;/p&gt;

&lt;p&gt;A small change in the training data can completely change the structure of the tree.&lt;/p&gt;

&lt;p&gt;Two trees trained on almost identical datasets may look very different.&lt;/p&gt;




&lt;h2&gt;
  
  
  Greedy Decisions
&lt;/h2&gt;

&lt;p&gt;The algorithm always chooses the best split &lt;strong&gt;right now&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It never looks ahead.&lt;/p&gt;

&lt;p&gt;That means an early decision can prevent the tree from finding a better overall structure later.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bias Toward Features with Many Split Points
&lt;/h2&gt;

&lt;p&gt;Continuous numerical features often have many possible split locations.&lt;/p&gt;

&lt;p&gt;Without proper controls, the algorithm may favor these features even when they aren't the most meaningful.&lt;/p&gt;




&lt;h1&gt;
  
  
  10. Python Implementation
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.tree&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DecisionTreeClassifier&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.tree&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;export_text&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;accuracy_score&lt;/span&gt;

&lt;span class="c1"&gt;# Generate sample data
&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;seat_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;api_calls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Business rule
&lt;/span&gt;&lt;span class="n"&gt;upgraded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seat_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_calls&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Seat_Count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;seat_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;API_Calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;api_calls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Upgraded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;upgraded&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Seat_Count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;API_Calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Upgraded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Train Decision Tree
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DecisionTreeClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Predictions on the training data (optimistic; evaluate on held-out data in practice)
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accuracy:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Decision Rules&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nf"&gt;export_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;feature_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Seat_Count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;API_Calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  11. How to Evaluate the Model
&lt;/h1&gt;

&lt;h3&gt;
  
  
  Accuracy
&lt;/h3&gt;

&lt;p&gt;Measures the percentage of correct predictions.&lt;/p&gt;

&lt;p&gt;Useful when classes are balanced.&lt;/p&gt;




&lt;h3&gt;
  
  
  Precision
&lt;/h3&gt;

&lt;p&gt;How many predicted positives were actually positive.&lt;/p&gt;




&lt;h3&gt;
  
  
  Recall
&lt;/h3&gt;

&lt;p&gt;How many actual positive cases were correctly identified.&lt;/p&gt;




&lt;h3&gt;
  
  
  F1 Score
&lt;/h3&gt;

&lt;p&gt;Balances Precision and Recall.&lt;/p&gt;

&lt;p&gt;Useful for imbalanced datasets.&lt;/p&gt;
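
&lt;p&gt;To make the three definitions above concrete, here is a small sketch that computes them from raw labels. The function name &lt;code&gt;precision_recall_f1&lt;/code&gt; and the label lists are hypothetical, and the zero-division fallbacks are a choice made for this example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def precision_recall_f1(actual, predicted):
    """Compute precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = upgraded, 0 = did not upgrade
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(actual, predicted)
print(precision, recall, f1)  # roughly 0.667, 0.5 and 0.571
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Accuracy on the same labels is 0.7, which looks healthier than the recall of 0.5 suggests. That gap is why these metrics matter on imbalanced data.&lt;/p&gt;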




&lt;h3&gt;
  
  
  Tree Depth
&lt;/h3&gt;

&lt;p&gt;A deeper tree isn't always better.&lt;/p&gt;

&lt;p&gt;Very deep trees are often a sign of overfitting.&lt;/p&gt;




&lt;h3&gt;
  
  
  Feature Importance
&lt;/h3&gt;

&lt;p&gt;Decision Trees automatically estimate how useful each feature was during training.&lt;/p&gt;

&lt;p&gt;This helps explain which variables influenced predictions the most.&lt;/p&gt;




&lt;h1&gt;
  
  
  12. Real-World Engineering Notes
&lt;/h1&gt;

&lt;p&gt;Here are a few things you'll notice in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Decision Trees are one of the easiest ML models to explain to non-technical teams.&lt;/li&gt;
&lt;li&gt;They require very little preprocessing.&lt;/li&gt;
&lt;li&gt;Always limit tree growth using &lt;code&gt;max_depth&lt;/code&gt; or &lt;code&gt;min_samples_leaf&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;A single Decision Tree rarely gives the best performance.&lt;/li&gt;
&lt;li&gt;Most production systems use ensembles like Random Forest or Gradient Boosting because they reduce overfitting and improve accuracy.&lt;/li&gt;
&lt;li&gt;Think of a Decision Tree as the building block for many of today's strongest machine learning algorithms.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  13. Key Takeaways
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Decision Trees solve classification and regression problems using a series of if-else rules.&lt;/li&gt;
&lt;li&gt;They automatically discover non-linear relationships in data.&lt;/li&gt;
&lt;li&gt;The algorithm chooses splits that maximize Information Gain and reduce impurity.&lt;/li&gt;
&lt;li&gt;Easy to understand, visualize, and explain.&lt;/li&gt;
&lt;li&gt;Requires little preprocessing and no feature scaling.&lt;/li&gt;
&lt;li&gt;Can overfit easily if not regularized.&lt;/li&gt;
&lt;li&gt;Forms the foundation of Random Forests, Extra Trees, XGBoost, LightGBM, and many other ensemble methods.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>algorithms</category>
      <category>beginners</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Logistic Regression (Supervised Family)</title>
      <dc:creator>Abhijeet Pratap Singh</dc:creator>
      <pubDate>Wed, 01 Jul 2026 21:31:00 +0000</pubDate>
      <link>https://dev.to/abhijeet_pratapsingh_868/logistic-regression-supervised-family-1om4</link>
      <guid>https://dev.to/abhijeet_pratapsingh_868/logistic-regression-supervised-family-1om4</guid>
      <description>&lt;h1&gt;
  
  
  1. The Problem It Solves
&lt;/h1&gt;

&lt;p&gt;Logistic Regression is used when the outcome is &lt;strong&gt;a category rather than a number&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Most commonly, it's used for &lt;strong&gt;binary classification&lt;/strong&gt;, where the answer is either &lt;strong&gt;Yes or No&lt;/strong&gt;, &lt;strong&gt;True or False&lt;/strong&gt;, or &lt;strong&gt;1 or 0&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Typical business problems include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Will a customer churn?&lt;/li&gt;
&lt;li&gt;Is this transaction fraudulent?&lt;/li&gt;
&lt;li&gt;Will a customer click an ad?&lt;/li&gt;
&lt;li&gt;Will a loan default?&lt;/li&gt;
&lt;li&gt;Is an email spam?&lt;/li&gt;
&lt;li&gt;Will a machine fail in the next 24 hours?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unlike Linear Regression, we're not trying to predict a continuous value.&lt;/p&gt;

&lt;p&gt;Instead, we're predicting the &lt;strong&gt;probability&lt;/strong&gt; that an event belongs to a particular class.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;A customer may have an &lt;strong&gt;82% probability of churning&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The business can then decide whether that probability is high enough to trigger an intervention.&lt;/p&gt;




&lt;h1&gt;
  
  
  2. Core Intuition
&lt;/h1&gt;

&lt;p&gt;Imagine you're trying to predict whether a customer will cancel their subscription.&lt;/p&gt;

&lt;p&gt;Suppose the only feature you have is how many times they opened your app this month.&lt;/p&gt;

&lt;p&gt;If you use a straight line like Linear Regression, the predictions quickly become unrealistic.&lt;/p&gt;

&lt;p&gt;A very active customer might end up with a &lt;strong&gt;-20% chance of churn&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A completely inactive customer could end up with &lt;strong&gt;140%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Probabilities obviously can't work like that.&lt;/p&gt;

&lt;p&gt;To fix this, Logistic Regression takes the linear equation and passes it through a mathematical function called the &lt;strong&gt;Sigmoid Function&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of producing a straight line, it creates an &lt;strong&gt;S-shaped curve&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;No matter how large or small the input becomes, the output always stays between &lt;strong&gt;0 and 1&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That makes it perfect for probability estimation.&lt;/p&gt;




&lt;h1&gt;
  
  
  3. The Mathematical Model
&lt;/h1&gt;

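&lt;p&gt;The model first calculates a linear score from the features and their coefficients:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;z = β₀ + β₁x₁ + … + βₙxₙ&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of using that score directly, it passes it through the Sigmoid function:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;p̂ = 1 / (1 + e&lt;sup&gt;−z&lt;/sup&gt;)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;z&lt;/strong&gt; = linear score&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;x₁ … xₙ&lt;/strong&gt; = feature values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;β₀ … βₙ&lt;/strong&gt; = intercept and learned coefficients&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;p̂&lt;/strong&gt; = predicted probability&lt;/li&gt;
&lt;/ul&gt;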

&lt;p&gt;The final output is always between &lt;strong&gt;0 and 1&lt;/strong&gt;.&lt;/p&gt;
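
&lt;p&gt;A minimal sketch of this two-step calculation is shown below. The coefficients are invented for illustration rather than learned from data, and the helper names are hypothetical. The sigmoid is written through &lt;code&gt;math.tanh&lt;/code&gt;, which gives the same value as 1 / (1 + e^(-z)) but avoids overflow for very negative scores.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math


def sigmoid(z):
    """Map any real-valued score z to a value between 0 and 1.

    Algebraically equal to 1 / (1 + e^(-z)), written via tanh for stability.
    """
    return 0.5 * (1.0 + math.tanh(z / 2.0))


def predict_probability(features, weights, intercept):
    """Linear score followed by the sigmoid."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients for one feature (app opens this month)
weights = [-0.4]
intercept = 5.0

for opens in (2, 12, 30):
    # Few opens give a high churn probability, many opens a low one
    print(opens, round(predict_probability([opens], weights, intercept), 3))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The negative weight encodes the assumption that more app opens mean a lower chance of churn.&lt;/p&gt;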

&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0.08  → Very unlikely
0.32  → Low risk
0.65  → Moderate risk
0.94  → Very high probability
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Businesses can then choose a decision threshold.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Probability ≥ 0.50 → Predict Churn&lt;/li&gt;
&lt;li&gt;Probability &amp;lt; 0.50 → Predict Renewal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That threshold doesn't have to be 0.5.&lt;/p&gt;

&lt;p&gt;Fraud detection systems often use much lower thresholds to catch more suspicious transactions.&lt;/p&gt;




&lt;h1&gt;
  
  
  4. What Is the Model Optimizing?
&lt;/h1&gt;

&lt;p&gt;Linear Regression minimizes squared error.&lt;/p&gt;

&lt;p&gt;That doesn't work well for classification.&lt;/p&gt;

&lt;p&gt;Instead, Logistic Regression minimizes &lt;strong&gt;Log Loss&lt;/strong&gt; (also called Binary Cross Entropy).&lt;/p&gt;

&lt;p&gt;Log Loss heavily penalizes predictions that are both &lt;strong&gt;wrong and confident&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;Actual class = Fraud&lt;/p&gt;

&lt;p&gt;Prediction = 0.99 probability of Legitimate&lt;/p&gt;

&lt;p&gt;This receives a much larger penalty than predicting Legitimate with a probability of 0.55.&lt;/p&gt;

&lt;p&gt;That's exactly what we want.&lt;/p&gt;

&lt;p&gt;A model should never be extremely confident when it's wrong.&lt;/p&gt;
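
&lt;p&gt;The size of that penalty is easy to check by hand. The sketch below evaluates Log Loss for a single example; &lt;code&gt;log_loss_single&lt;/code&gt; is a hypothetical helper, and the probabilities are expressed as the predicted probability of the fraud class (so "0.99 Legitimate" becomes 0.01).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math


def log_loss_single(actual, predicted_prob):
    """Binary cross-entropy for one example.

    actual is 1 or 0; predicted_prob is the predicted probability of class 1
    and is assumed to lie strictly between 0 and 1.
    """
    return -(actual * math.log(predicted_prob)
             + (1 - actual) * math.log(1 - predicted_prob))


# The true class is fraud (1) in both cases
confident_wrong = log_loss_single(1, 0.01)  # model says 99% legitimate
hesitant_wrong = log_loss_single(1, 0.45)   # model says 55% legitimate

print(round(confident_wrong, 3))  # 4.605
print(round(hesitant_wrong, 3))   # 0.799
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Both predictions are wrong, but the confident one costs almost six times as much.&lt;/p&gt;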




&lt;h1&gt;
  
  
  5. How the Model Learns
&lt;/h1&gt;

&lt;p&gt;Unlike ordinary Linear Regression, there is no closed-form formula that directly gives the best coefficients.&lt;/p&gt;

&lt;p&gt;Instead, Logistic Regression learns gradually.&lt;/p&gt;

&lt;p&gt;It starts with initial weights (often zeros or small random values).&lt;/p&gt;

&lt;p&gt;It makes predictions.&lt;/p&gt;

&lt;p&gt;It measures the error.&lt;/p&gt;

&lt;p&gt;Then it adjusts the coefficients a little.&lt;/p&gt;

&lt;p&gt;This repeats, often for hundreds or thousands of iterations, until the Log Loss stops improving.&lt;/p&gt;

&lt;p&gt;Gradient Descent is one of the most common optimization methods used during this process.&lt;/p&gt;
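
&lt;p&gt;The loop described above can be sketched in plain Python for a single feature. This is only an illustration under stated assumptions: the toy data, the learning rate, the step count and the zero starting weights are all chosen for this example, and real libraries typically use more sophisticated solvers.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math


def sigmoid(z):
    return 0.5 * (1.0 + math.tanh(z / 2.0))  # same value as 1 / (1 + e^(-z))


def mean_log_loss(xs, ys, weight, bias):
    eps = 1e-12  # keeps log() away from exactly 0
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(weight * x + bias), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def fit(xs, ys, learning_rate=0.1, steps=2000):
    """Plain batch gradient descent on the mean Log Loss, one feature."""
    weight, bias = 0.0, 0.0  # starting point chosen for this sketch
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every example: predicted probability minus label
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Step a little in the direction that lowers the loss
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


# Toy data: app opens (in tens) and churn labels, deliberately not separable
app_opens = [0.2, 0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 2.5, 3.0, 3.5]
churned = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

start_loss = mean_log_loss(app_opens, churned, 0.0, 0.0)
weight, bias = fit(app_opens, churned)
end_loss = mean_log_loss(app_opens, churned, weight, bias)

print(round(start_loss, 3), round(end_loss, 3))  # the loss drops after training
print(weight, bias)  # the weight ends up negative: more opens, less churn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;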




&lt;h1&gt;
  
  
  6. Decision Boundary
&lt;/h1&gt;

&lt;p&gt;Eventually, the model needs to convert probabilities into class labels.&lt;/p&gt;

&lt;p&gt;This is done using a &lt;strong&gt;decision threshold&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Predicted Probability = 0.81

Threshold = 0.50

Prediction = Churn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Changing the threshold changes how conservative the model becomes.&lt;/p&gt;

&lt;p&gt;Lower thresholds increase recall, usually at the cost of precision.&lt;/p&gt;

&lt;p&gt;Higher thresholds usually increase precision, at the cost of recall.&lt;/p&gt;

&lt;p&gt;Choosing the right threshold depends on the business problem.&lt;/p&gt;




&lt;h1&gt;
  
  
  7. When Should You Use Logistic Regression?
&lt;/h1&gt;

&lt;p&gt;Logistic Regression works well when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The target is binary.&lt;/li&gt;
&lt;li&gt;The classes are reasonably separable.&lt;/li&gt;
&lt;li&gt;You need probability estimates.&lt;/li&gt;
&lt;li&gt;You want a fast, interpretable model.&lt;/li&gt;
&lt;li&gt;The relationship between features and the log-odds is roughly linear.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common applications include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer churn prediction&lt;/li&gt;
&lt;li&gt;Fraud detection&lt;/li&gt;
&lt;li&gt;Medical diagnosis&lt;/li&gt;
&lt;li&gt;Email spam detection&lt;/li&gt;
&lt;li&gt;Credit approval&lt;/li&gt;
&lt;li&gt;Employee attrition prediction&lt;/li&gt;
&lt;li&gt;Marketing campaign response prediction&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  8. Core Assumptions
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Independent Observations
&lt;/h2&gt;

&lt;p&gt;Each training example should be independent.&lt;/p&gt;




&lt;h2&gt;
  
  
  Linear Relationship in Log-Odds
&lt;/h2&gt;

&lt;p&gt;The features should have a roughly linear relationship with the &lt;strong&gt;log-odds&lt;/strong&gt;, not necessarily with the probability itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  No High Multicollinearity
&lt;/h2&gt;

&lt;p&gt;Features shouldn't contain nearly identical information.&lt;/p&gt;

&lt;p&gt;Highly correlated variables make the coefficients unstable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Limited Influence of Extreme Outliers
&lt;/h2&gt;

&lt;p&gt;Extreme feature values can heavily influence the learned coefficients.&lt;/p&gt;




&lt;h1&gt;
  
  
  9. When It Starts Breaking Down
&lt;/h1&gt;

&lt;p&gt;Logistic Regression isn't designed for every classification problem.&lt;/p&gt;

&lt;p&gt;It struggles when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Class boundaries are highly non-linear.&lt;/li&gt;
&lt;li&gt;Features interact in complex ways.&lt;/li&gt;
&lt;li&gt;Classes overlap heavily.&lt;/li&gt;
&lt;li&gt;There are many irrelevant features.&lt;/li&gt;
&lt;li&gt;One feature, or a combination of features, perfectly separates the classes (Perfect Separation).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;Suppose every customer with more than five support tickets churns, and no customer with five or fewer does.&lt;/p&gt;

&lt;p&gt;Without regularization, the coefficient for that feature can grow toward infinity, making the model unstable.&lt;/p&gt;




&lt;h1&gt;
  
  
  10. Python Implementation
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;classification_report&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;roc_auc_score&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Generate sample customer activity
&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app_opens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;churned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;App_Opens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;app_opens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Churned&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;churned&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;App_Opens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Churned&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Train model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lbfgs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Predictions
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Probability&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)[:,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Prediction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Intercept : &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intercept_&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Coefficient : &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;coef_&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;classification_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Prediction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ROC AUC:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nf"&gt;roc_auc_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Probability&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
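&lt;p&gt;The script above scores the model on the same rows it was trained on, which tends to flatter the results. A minimal sketch of a held-out evaluation follows. The 75/25 split, the seeds, and the use of &lt;code&gt;np.random.default_rng&lt;/code&gt; are illustrative assumptions, not part of the original example.&lt;/p&gt;

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Rebuild comparable synthetic churn data (seed is an assumption).
rng = np.random.default_rng(42)
app_opens = np.concatenate([rng.normal(25, 5, 50), rng.normal(4, 2, 50)])
churned = np.concatenate([np.zeros(50), np.ones(50)])
df = pd.DataFrame({"App_Opens": app_opens, "Churned": churned})

X = df[["App_Opens"]]
y = df["Churned"]

# Hold out 25% of the rows; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = LogisticRegression(solver="lbfgs", random_state=42)
model.fit(X_train, y_train)

# Score only on rows the model has never seen.
test_probability = model.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, test_probability)
print(f"Held-out ROC AUC: {test_auc:.4f}")
```

&lt;p&gt;On data this cleanly separated the held-out score will likely stay high, but on real data the gap between training and test metrics is often the first sign of overfitting.&lt;/p&gt;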






&lt;h1&gt;
  
  
  11. How to Evaluate the Model
&lt;/h1&gt;

&lt;h3&gt;
  
  
  Accuracy
&lt;/h3&gt;

&lt;p&gt;The percentage of correct predictions.&lt;/p&gt;

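&lt;p&gt;It is a reliable guide only when the classes are roughly balanced. On a 95/5 split, a model that always predicts the majority class already scores 95%.&lt;/p&gt;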




&lt;h3&gt;
  
  
  Precision
&lt;/h3&gt;

&lt;p&gt;Out of everything predicted as positive, how many were actually positive?&lt;/p&gt;

&lt;p&gt;Useful when false positives are expensive.&lt;/p&gt;




&lt;h3&gt;
  
  
  Recall
&lt;/h3&gt;

&lt;p&gt;Out of all actual positive cases, how many did the model find?&lt;/p&gt;

&lt;p&gt;Useful when missing a positive case is costly.&lt;/p&gt;




&lt;h3&gt;
  
  
  F1 Score
&lt;/h3&gt;

&lt;p&gt;The harmonic mean of Precision and Recall, so it is high only when both are high.&lt;/p&gt;

&lt;p&gt;A good overall metric for imbalanced datasets.&lt;/p&gt;
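&lt;p&gt;The four metrics above can all be derived from the counts of true and false positives and negatives. The sketch below uses a small made-up set of labels (an assumption for illustration, not data from this article) and computes each metric by hand next to the scikit-learn helper.&lt;/p&gt;

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean

print(f"Accuracy  : {accuracy:.2f}  (sklearn: {accuracy_score(y_true, y_pred):.2f})")
print(f"Precision : {precision:.2f}  (sklearn: {precision_score(y_true, y_pred):.2f})")
print(f"Recall    : {recall:.2f}  (sklearn: {recall_score(y_true, y_pred):.2f})")
print(f"F1 Score  : {f1:.2f}  (sklearn: {f1_score(y_true, y_pred):.2f})")
```

&lt;p&gt;With these labels there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, giving an accuracy of 0.80 and a precision, recall, and F1 of 0.75 each.&lt;/p&gt;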




&lt;h3&gt;
  
  
  ROC-AUC
&lt;/h3&gt;

&lt;p&gt;Measures how well the model separates the two classes across every possible threshold.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1.0&lt;/strong&gt; → Perfect classifier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0.5&lt;/strong&gt; → Random guessing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Higher is better.&lt;/p&gt;
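&lt;p&gt;One way to read ROC-AUC is as the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below checks that reading against &lt;code&gt;roc_auc_score&lt;/code&gt; on six made-up cases; the labels and scores are assumptions chosen for illustration.&lt;/p&gt;

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels and predicted probabilities for six cases.
y_true = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.10, 0.40, 0.70, 0.35, 0.80, 0.90])

pos = scores[y_true == 1]
neg = scores[y_true == 0]

# Compare every positive score with every negative score.
# A correctly ranked pair counts as a win; a tie counts as half a win.
wins = np.greater(pos[:, None], neg[None, :]).sum()
ties = np.equal(pos[:, None], neg[None, :]).sum()
pairwise_auc = (wins + 0.5 * ties) / (len(pos) * len(neg))

print(f"Pairwise AUC: {pairwise_auc:.4f}")
print(f"sklearn AUC : {roc_auc_score(y_true, scores):.4f}")
```

&lt;p&gt;Here 7 of the 9 positive-negative pairs are ranked correctly, so both lines report roughly 0.7778.&lt;/p&gt;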




&lt;h1&gt;
  
  
  12. Real-World Engineering Notes
&lt;/h1&gt;

&lt;p&gt;Some practical lessons you'll run into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always look at predicted probabilities, not just class labels.&lt;/li&gt;
&lt;li&gt;Adjust the decision threshold based on business needs instead of blindly using 0.5.&lt;/li&gt;
&lt;li&gt;Scale numerical features when using gradient-based optimization.&lt;/li&gt;
&lt;li&gt;Logistic Regression is a strong baseline classifier to establish before trying tree-based models.&lt;/li&gt;
&lt;li&gt;Highly imbalanced datasets usually need class weighting or resampling techniques.&lt;/li&gt;
&lt;li&gt;Don't rely on accuracy alone—Precision, Recall, F1 Score, and ROC-AUC usually tell a much better story.&lt;/li&gt;
&lt;/ul&gt;
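&lt;p&gt;Two of the notes above, moving the decision threshold and weighting classes, can be sketched in a few lines. The imbalanced dataset, the thresholds, and the seed below are assumptions made up for illustration.&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Hypothetical imbalanced data: 180 negatives, 20 positives, one feature.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 1.0, 180), rng.normal(1.5, 1.0, 20)]).reshape(-1, 1)
y = np.concatenate([np.zeros(180, dtype=int), np.ones(20, dtype=int)])

model = LogisticRegression().fit(X, y)
probability = model.predict_proba(X)[:, 1]

# predict() corresponds to a 0.5 cutoff; lowering it trades precision for recall.
recalls = []
for threshold in (0.5, 0.3, 0.1):
    labels = np.greater_equal(probability, threshold).astype(int)
    recalls.append(recall_score(y, labels))
    print(f"threshold={threshold:.1f}  recall={recalls[-1]:.2f}")

# class_weight="balanced" reweights the loss so the rare class counts more.
weighted = LogisticRegression(class_weight="balanced").fit(X, y)
print(f"balanced  recall={recall_score(y, weighted.predict(X)):.2f}")
```

&lt;p&gt;Recall can only stay the same or rise as the threshold drops, because every case flagged at a higher cutoff is still flagged at a lower one. Whether the extra false positives are worth it is a business decision, not a modeling one.&lt;/p&gt;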




&lt;h1&gt;
  
  
  13. Key Takeaways
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Logistic Regression predicts probabilities for binary classification problems.&lt;/li&gt;
&lt;li&gt;It converts a linear model into probabilities using the Sigmoid function.&lt;/li&gt;
&lt;li&gt;It learns by minimizing Log Loss instead of squared error.&lt;/li&gt;
&lt;li&gt;Fast to train, easy to interpret, and widely used in production.&lt;/li&gt;
&lt;li&gt;Produces probability scores rather than just Yes/No predictions.&lt;/li&gt;
&lt;li&gt;Works best when class boundaries are reasonably linear.&lt;/li&gt;
&lt;li&gt;A great baseline classifier before moving to Decision Trees, Random Forests, XGBoost, or Neural Networks.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Linear Regression (Supervised Learning)</title>
      <dc:creator>Abhijeet Pratap Singh</dc:creator>
      <pubDate>Tue, 30 Jun 2026 20:37:03 +0000</pubDate>
      <link>https://dev.to/abhijeet_pratapsingh_868/linear-regression-supervisedlearning-4e93</link>
      <guid>https://dev.to/abhijeet_pratapsingh_868/linear-regression-supervisedlearning-4e93</guid>
      <description>&lt;h1&gt;
  
  
  1. The Problem It Solves
&lt;/h1&gt;

&lt;p&gt;Linear Regression is one of the simplest and most widely used machine learning algorithms for predicting &lt;strong&gt;continuous numeric values&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Whenever your target is a number rather than a category, Linear Regression is usually the first model worth trying.&lt;/p&gt;

&lt;p&gt;Some common examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Predicting monthly cloud infrastructure costs&lt;/li&gt;
&lt;li&gt;Estimating customer lifetime value (CLV)&lt;/li&gt;
&lt;li&gt;Forecasting next month's sales&lt;/li&gt;
&lt;li&gt;Predicting electricity consumption&lt;/li&gt;
&lt;li&gt;Estimating delivery times&lt;/li&gt;
&lt;li&gt;Predicting marketing leads based on ad spend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The idea is simple.&lt;/p&gt;

&lt;p&gt;Given a set of input features, the model learns the relationship between them and predicts a numeric output.&lt;/p&gt;

&lt;p&gt;For example, suppose a SaaS company wants to estimate a customer's next monthly usage bill.&lt;/p&gt;

&lt;p&gt;The inputs could be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Active seats&lt;/li&gt;
&lt;li&gt;API requests&lt;/li&gt;
&lt;li&gt;Storage usage&lt;/li&gt;
&lt;li&gt;Historical consumption&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The output would be a single number:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Predicted Monthly Bill&lt;/strong&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  2. Core Intuition
&lt;/h1&gt;

&lt;p&gt;Imagine plotting every house in a city.&lt;/p&gt;

&lt;p&gt;The horizontal axis represents the size of the house.&lt;/p&gt;

&lt;p&gt;The vertical axis represents its selling price.&lt;/p&gt;

&lt;p&gt;Every house becomes a point on the graph.&lt;/p&gt;

&lt;p&gt;The points won't line up perfectly. They'll be scattered everywhere.&lt;/p&gt;

&lt;p&gt;Now imagine placing a long ruler across those points.&lt;/p&gt;

&lt;p&gt;You slowly rotate it and move it up or down until it passes through the center of the data as closely as possible.&lt;/p&gt;

&lt;p&gt;That's exactly what Linear Regression is trying to do.&lt;/p&gt;

&lt;p&gt;The model adjusts only two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Intercept&lt;/strong&gt;: where the line crosses the Y-axis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slope&lt;/strong&gt;: how steep the line is.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Its goal is to find the line that produces the smallest overall prediction error.&lt;/p&gt;




&lt;h1&gt;
  
  
  3. The Mathematical Model
&lt;/h1&gt;

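&lt;p&gt;Linear Regression assumes that the relationship between the input variables (&lt;strong&gt;X&lt;/strong&gt;) and the target (&lt;strong&gt;y&lt;/strong&gt;) can be represented using a straight line (a flat plane when there are several features).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ŷ = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Where:&lt;/p&gt;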

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ŷ&lt;/strong&gt; = predicted value&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;β₀&lt;/strong&gt; = intercept&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;β₁ ... βₙ&lt;/strong&gt; = feature coefficients&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;x₁ ... xₙ&lt;/strong&gt; = input features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every coefficient tells us how much the prediction changes when that feature increases by one unit.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;Suppose the learned equation becomes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Predicted Leads = 50 + 0.08 × Marketing Spend&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That means every extra &lt;strong&gt;$1&lt;/strong&gt; spent on marketing increases the expected leads by &lt;strong&gt;0.08&lt;/strong&gt;, assuming everything else stays the same.&lt;/p&gt;

&lt;p&gt;This interpretability is one of the biggest reasons Linear Regression is still widely used in business.&lt;/p&gt;
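&lt;p&gt;To make the coefficient reading concrete, the example equation can be evaluated directly. The function name &lt;code&gt;predict_leads&lt;/code&gt; and the spend values are hypothetical and chosen only for illustration.&lt;/p&gt;

```python
# Coefficients taken from the example equation above.
intercept = 50.0   # expected leads with zero spend
slope = 0.08       # extra leads per additional dollar

def predict_leads(marketing_spend):
    """Apply the fitted line: leads = intercept + slope * spend."""
    return intercept + slope * marketing_spend

for spend in (0, 1000, 1001, 5000):
    print(f"spend=${spend}  predicted leads={predict_leads(spend):.2f}")

# Moving from $1000 to $1001 changes the prediction by exactly the slope.
print(f"difference: {predict_leads(1001) - predict_leads(1000):.2f}")
```

&lt;p&gt;The prediction at zero spend equals the intercept, and each extra dollar adds the slope, which is why the coefficients can be explained without any reference to the training procedure.&lt;/p&gt;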




&lt;h1&gt;
  
  
  4. What Is the Model Optimizing?
&lt;/h1&gt;

&lt;p&gt;Not every line fits the data equally well.&lt;/p&gt;

&lt;p&gt;Some lines pass too high.&lt;/p&gt;

&lt;p&gt;Others pass too low.&lt;/p&gt;

&lt;p&gt;Linear Regression measures the difference between the actual value and the predicted value.&lt;/p&gt;

&lt;p&gt;These differences are called &lt;strong&gt;Residual Errors&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of simply adding those errors together (which would cancel positive and negative values), the model squares every error before adding them.&lt;/p&gt;

&lt;p&gt;This gives us the &lt;strong&gt;Sum of Squared Residuals (SSR)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The smaller this value becomes, the better the fitted line.&lt;/p&gt;

&lt;p&gt;The entire training process is simply trying to minimize this error.&lt;/p&gt;
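&lt;p&gt;The sketch below computes the SSR for two candidate lines and shows why raw residuals are not summed directly. The four data points and the helper name &lt;code&gt;sum_squared_residuals&lt;/code&gt; are assumptions made up for illustration.&lt;/p&gt;

```python
import numpy as np

# Hypothetical data: marketing spend (x) and leads generated (y).
x = np.array([1000.0, 2000.0, 3000.0, 4000.0])
y = np.array([135.0, 205.0, 295.0, 365.0])

def sum_squared_residuals(intercept, slope):
    """Square each residual (actual minus predicted) and add them up."""
    residuals = y - (intercept + slope * x)
    return float(np.sum(residuals ** 2))

ssr_close = sum_squared_residuals(50.0, 0.08)    # line close to the data
ssr_flat = sum_squared_residuals(100.0, 0.05)    # line that is too flat
print(f"SSR, close line: {ssr_close:.1f}")
print(f"SSR, flat line : {ssr_flat:.1f}")

# A horizontal line at the mean fits poorly, yet its raw residuals cancel out.
mean_line_residuals = y - y.mean()
print(f"Sum of raw residuals: {mean_line_residuals.sum():.1f}")
print(f"Sum of squared      : {np.sum(mean_line_residuals ** 2):.1f}")
```

&lt;p&gt;The closer line has an SSR of 100 against 6500 for the flatter one, while the horizontal line's raw residuals add up to zero despite an SSR of 30500.&lt;/p&gt;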




&lt;h1&gt;
  
  
  5. How the Model Learns
&lt;/h1&gt;

&lt;p&gt;There are two common ways to calculate the coefficients.&lt;/p&gt;

&lt;h2&gt;
  
  
  Method 1 — Normal Equation
&lt;/h2&gt;

&lt;p&gt;For smaller datasets, Linear Regression has a direct mathematical solution.&lt;/p&gt;

&lt;p&gt;Instead of learning gradually, it computes the best coefficients in one step:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;β = (XᵀX)⁻¹Xᵀy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exact solution&lt;/li&gt;
&lt;li&gt;No learning rate&lt;/li&gt;
&lt;li&gt;No iterations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Computationally expensive for very large datasets&lt;/li&gt;
&lt;li&gt;Requires matrix inversion&lt;/li&gt;
&lt;/ul&gt;
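&lt;p&gt;A minimal NumPy sketch of the Normal Equation follows. The synthetic spend and leads data, the seed, and the variable names are assumptions for illustration. It solves the linear system rather than forming the inverse explicitly, which is the usual numerical practice.&lt;/p&gt;

```python
import numpy as np

# Hypothetical data: leads depend linearly on spend, plus noise.
rng = np.random.default_rng(42)
spend = rng.uniform(500, 10000, 100)
leads = 50 + 0.08 * spend + rng.normal(0, 50, 100)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(spend), spend])

# Normal Equation: beta = (X^T X)^(-1) X^T y.
# np.linalg.solve handles the system without computing the inverse itself.
beta = np.linalg.solve(X.T @ X, X.T @ leads)

# Least squares via lstsq is the safer route when columns are nearly collinear.
beta_lstsq, *_ = np.linalg.lstsq(X, leads, rcond=None)

print(f"Intercept : {beta[0]:.4f}")
print(f"Slope     : {beta[1]:.4f}")
```

&lt;p&gt;Both routes return the same coefficients here. The cost of the direct solution grows quickly with the number of features, which is where iterative methods become attractive.&lt;/p&gt;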




&lt;h2&gt;
  
  
  Method 2 — Gradient Descent
&lt;/h2&gt;

&lt;p&gt;For larger datasets, calculating the exact solution becomes expensive.&lt;/p&gt;

&lt;p&gt;Instead, the model starts with arbitrary coefficients, often zeros or small random values.&lt;/p&gt;

&lt;p&gt;It then repeatedly measures the prediction error and slightly adjusts the coefficients in the direction that reduces the loss.&lt;/p&gt;

&lt;p&gt;Each update moves the model closer to the minimum error:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;β := β − α × ∂J/∂β&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;α&lt;/strong&gt; = learning rate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;∂J/∂β&lt;/strong&gt; = gradient of the loss function&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The process repeats until the error stops improving.&lt;/p&gt;
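&lt;p&gt;The loop below is a minimal sketch of Gradient Descent for a single feature. Several choices are assumptions made for illustration: the synthetic data, the learning rate of 0.1, the fixed 500 iterations, the standardized feature, and the use of mean squared error as the loss (it has the same minimizer as the sum of squared residuals).&lt;/p&gt;

```python
import numpy as np

# Hypothetical data: leads depend linearly on spend, plus noise.
rng = np.random.default_rng(42)
spend = rng.uniform(500, 10000, 100)
leads = 50 + 0.08 * spend + rng.normal(0, 50, 100)

# Standardize the feature so one learning rate suits both coefficients.
x = (spend - spend.mean()) / spend.std()

b0, b1 = 0.0, 0.0   # starting coefficients
alpha = 0.1         # learning rate (assumed value)

for _ in range(500):
    error = (b0 + b1 * x) - leads
    # Gradients of J = mean(error^2) with respect to b0 and b1.
    grad_b0 = 2 * error.mean()
    grad_b1 = 2 * (error * x).mean()
    # Step against the gradient.
    b0 -= alpha * grad_b0
    b1 -= alpha * grad_b1

# Convert the coefficients back to the original spend scale.
slope = b1 / spend.std()
intercept = b0 - slope * spend.mean()
print(f"Intercept : {intercept:.4f}")
print(f"Slope     : {slope:.4f}")
```

&lt;p&gt;Without the standardization step, a learning rate small enough to keep the slope stable would make the intercept converge very slowly, which is one reason scaling matters for gradient-based training.&lt;/p&gt;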




&lt;h1&gt;
  
  
  6. When Should You Use Linear Regression?
&lt;/h1&gt;

&lt;p&gt;Linear Regression works well when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The target is continuous.&lt;/li&gt;
&lt;li&gt;The relationship is approximately linear.&lt;/li&gt;
&lt;li&gt;You need an interpretable model.&lt;/li&gt;
&lt;li&gt;Training speed matters.&lt;/li&gt;
&lt;li&gt;You need a strong baseline before trying more advanced algorithms.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Typical applications include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Revenue prediction&lt;/li&gt;
&lt;li&gt;Cost estimation&lt;/li&gt;
&lt;li&gt;Demand forecasting&lt;/li&gt;
&lt;li&gt;Capacity planning&lt;/li&gt;
&lt;li&gt;Financial modeling&lt;/li&gt;
&lt;li&gt;Energy consumption forecasting&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  7. Core Assumptions
&lt;/h1&gt;

&lt;p&gt;Linear Regression relies on several assumptions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Linearity
&lt;/h2&gt;

&lt;p&gt;The relationship between inputs and output should roughly follow a straight line.&lt;/p&gt;




&lt;h2&gt;
  
  
  Independence
&lt;/h2&gt;

&lt;p&gt;Observations should not influence one another.&lt;/p&gt;




&lt;h2&gt;
  
  
  Homoscedasticity
&lt;/h2&gt;

&lt;p&gt;Residual errors should have roughly constant variance across all prediction levels.&lt;/p&gt;




&lt;h2&gt;
  
  
  No Multicollinearity
&lt;/h2&gt;

&lt;p&gt;Input variables should not be highly correlated with each other.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;Using both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Age in Years&lt;/li&gt;
&lt;li&gt;Birth Year&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;creates redundant information and makes coefficients unstable.&lt;/p&gt;
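&lt;p&gt;The Age and Birth Year example can be checked numerically. In the sketch below, the reference year of 2024 and the random ages are assumptions for illustration.&lt;/p&gt;

```python
import numpy as np

# Hypothetical ages; birth year is fully determined by age.
rng = np.random.default_rng(0)
age = rng.integers(20, 60, 50).astype(float)
birth_year = 2024.0 - age   # assumed reference year

# A correlation of about -1 means the columns carry the same information.
correlation = np.corrcoef(age, birth_year)[0, 1]
print(f"Correlation: {correlation:.4f}")

# With an intercept column, one column is a combination of the other two,
# so the design matrix has rank 2 instead of 3.
X = np.column_stack([np.ones(50), age, birth_year])
rank = np.linalg.matrix_rank(X)
print(f"Columns: {X.shape[1]}, rank: {rank}")
```

&lt;p&gt;When the rank is lower than the number of columns, infinitely many coefficient combinations fit the data equally well, which is what makes individual coefficients unstable and hard to interpret.&lt;/p&gt;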




&lt;h2&gt;
  
  
  Normally Distributed Residuals (mainly for statistical inference)
&lt;/h2&gt;

&lt;p&gt;Residual errors should be approximately normally distributed if confidence intervals or hypothesis testing are important.&lt;/p&gt;




&lt;h1&gt;
  
  
  8. When It Starts Breaking Down
&lt;/h1&gt;

&lt;p&gt;Linear Regression is powerful, but only under the right conditions.&lt;/p&gt;

&lt;p&gt;It struggles when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The relationship is curved rather than linear.&lt;/li&gt;
&lt;li&gt;A few extreme outliers dominate the data.&lt;/li&gt;
&lt;li&gt;Important variables are missing.&lt;/li&gt;
&lt;li&gt;Input features are highly correlated.&lt;/li&gt;
&lt;li&gt;The variance changes dramatically across different prediction ranges.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A common example is stock prices.&lt;/p&gt;

&lt;p&gt;Markets rarely move in a straight line, so Linear Regression usually performs poorly without additional feature engineering.&lt;/p&gt;
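&lt;p&gt;The curved-relationship failure, and the way feature engineering can rescue it, can be sketched on synthetic data. The U-shaped data and the seed below are assumptions made up for illustration and have nothing to do with real market prices.&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical U-shaped relationship between x and y.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 200)
y = x ** 2 + rng.normal(0, 0.3, 200)

# A straight line cannot follow the curve.
X_linear = x.reshape(-1, 1)
r2_linear = r2_score(y, LinearRegression().fit(X_linear, y).predict(X_linear))

# Adding x squared as a feature keeps the model linear in its coefficients.
X_squared = np.column_stack([x, x ** 2])
r2_squared = r2_score(y, LinearRegression().fit(X_squared, y).predict(X_squared))

print(f"R² with x only     : {r2_linear:.4f}")
print(f"R² with x and x^2  : {r2_squared:.4f}")
```

&lt;p&gt;The algorithm is unchanged between the two fits. Only the inputs differ, which is the sense in which Linear Regression can model curves once the right features are supplied.&lt;/p&gt;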




&lt;h1&gt;
  
  
  9. Python Implementation
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r2_score&lt;/span&gt;

&lt;span class="c1"&gt;# Generate sample data
&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;marketing_spend&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;leads_generated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="mf"&gt;0.08&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;marketing_spend&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Marketing_Spend&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;marketing_spend&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Leads_Generated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;leads_generated&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Marketing_Spend&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Leads_Generated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Train model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Predictions
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Predicted_Leads&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Evaluation
&lt;/span&gt;&lt;span class="n"&gt;rmse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nf"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Predicted_Leads&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;r2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;r2_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Predicted_Leads&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Intercept : &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intercept_&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Coefficient : &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;coef_&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RMSE : &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;rmse&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;R² Score : &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  10. How to Evaluate the Model
&lt;/h1&gt;

&lt;h3&gt;
  
  
  RMSE (Root Mean Squared Error)
&lt;/h3&gt;

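&lt;p&gt;The square root of the average squared prediction error, expressed in the same units as the target. Squaring penalizes large misses more heavily than small ones.&lt;/p&gt;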

&lt;p&gt;Lower is better.&lt;/p&gt;




&lt;h3&gt;
  
  
  R² Score
&lt;/h3&gt;

&lt;p&gt;Measures how much variance the model explains.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1.0&lt;/strong&gt; → Perfect predictions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0.8&lt;/strong&gt; → Explains 80% of the variance&lt;/li&gt;
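&lt;li&gt;
&lt;strong&gt;0.0&lt;/strong&gt; → No better than predicting the average&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Below 0.0&lt;/strong&gt; → Worse than predicting the average, which can happen on data the model was not trained on&lt;/li&gt;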
&lt;/ul&gt;
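&lt;p&gt;Both metrics are short formulas over the residuals. The sketch below computes them by hand on four made-up values (an assumption for illustration) and compares the results with scikit-learn.&lt;/p&gt;

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted values.
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 210.0, 240.0])

residuals = y_true - y_pred

# RMSE: square, average, then take the root to return to the target's units.
rmse = np.sqrt(np.mean(residuals ** 2))

# R²: one minus the share of total variation left unexplained.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"RMSE : {rmse:.4f}  (sklearn: {np.sqrt(mean_squared_error(y_true, y_pred)):.4f})")
print(f"R²   : {r2:.4f}  (sklearn: {r2_score(y_true, y_pred):.4f})")

# Always predicting the mean gives an R² of zero.
baseline = np.full_like(y_true, y_true.mean())
print(f"R² of the mean baseline: {r2_score(y_true, baseline):.4f}")
```

&lt;p&gt;Every prediction here misses by 10, so the RMSE is 10 and the R² is 0.968.&lt;/p&gt;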




&lt;h1&gt;
  
  
  11. Real-World Engineering Notes
&lt;/h1&gt;

&lt;p&gt;Some lessons you'll quickly learn in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linear Regression should almost always be your first baseline model.&lt;/li&gt;
&lt;li&gt;Feature engineering usually improves accuracy more than changing algorithms.&lt;/li&gt;
&lt;li&gt;Always inspect residual plots before trusting the predictions.&lt;/li&gt;
&lt;li&gt;Remove or investigate extreme outliers before training.&lt;/li&gt;
&lt;li&gt;Feature scaling isn't required for ordinary least-squares Linear Regression, but it becomes important when using Gradient Descent or regularized variants like Ridge and Lasso.&lt;/li&gt;
&lt;li&gt;Just because the R² score is high doesn't mean the assumptions are satisfied.&lt;/li&gt;
&lt;/ul&gt;
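&lt;p&gt;The scaling note is easiest to follow with a pipeline, so the scaler is always fitted together with the model. The feature names, data, and &lt;code&gt;alpha=1.0&lt;/code&gt; below are assumptions chosen for illustration.&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales.
rng = np.random.default_rng(7)
seats = rng.uniform(1, 50, 200)             # tens
api_requests = rng.uniform(1e4, 1e6, 200)   # up to a million
bill = 20 * seats + 0.0005 * api_requests + rng.normal(0, 10, 200)

X = np.column_stack([seats, api_requests])

# Scaling inside the pipeline means the Ridge penalty treats both features alike.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, bill)

# Coefficients are expressed per standard deviation of each feature.
print(model.named_steps["ridge"].coef_)
print(f"R² on the training data: {model.score(X, bill):.4f}")
```

&lt;p&gt;Without the scaler, the penalty would act on raw coefficients whose sizes depend on each feature's units, so the feature measured in large numbers would be penalized far less than the one measured in small numbers.&lt;/p&gt;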




&lt;h1&gt;
  
  
  12. Key Takeaways
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;One of the simplest and most interpretable machine learning algorithms.&lt;/li&gt;
&lt;li&gt;Predicts continuous numeric values using a linear relationship.&lt;/li&gt;
&lt;li&gt;Finds the best-fitting line by minimizing squared prediction errors.&lt;/li&gt;
&lt;li&gt;Extremely fast to train and easy to explain to business stakeholders.&lt;/li&gt;
&lt;li&gt;Works best when relationships are approximately linear.&lt;/li&gt;
&lt;li&gt;Struggles with non-linear patterns, outliers, and multicollinearity.&lt;/li&gt;
&lt;li&gt;A great baseline model before moving to more advanced algorithms like Decision Trees, Random Forests, or Gradient Boosting.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
