Logistic Regression Algorithm

What Is Logistic Regression?

Logistic Regression is a supervised machine learning algorithm used for classification, not regression.
It predicts categories (like yes/no, spam/not spam, pass/fail), not numbers.
Example: Will a house sell above a certain price? Is an email spam?

How Is It Different from Linear Regression?
Linear Regression: Predicts a continuous value (e.g., house price).
Logistic Regression: Predicts a probability that something belongs to a class (e.g., probability of passing an exam).

How Does Logistic Regression Work?
It uses a mathematical function called the sigmoid to turn predictions into probabilities between 0 and 1.
If the probability is above a threshold (usually 0.5), it predicts one class; otherwise, it predicts the other. The threshold is usually 0.5 because the predicted probability always falls between 0 and 1, and 0.5 is the midpoint of that range.
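
Here is a minimal sketch of that idea in Python; the raw model scores below are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into a value between 0 and 1
    return 1 / (1 + np.exp(-z))

# Made-up raw model scores for three examples
scores = np.array([-2.0, 0.3, 4.1])
probabilities = sigmoid(scores)                   # roughly [0.12, 0.57, 0.98]
predictions = (probabilities >= 0.5).astype(int)  # [0, 1, 1]

print(probabilities, predictions)
```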

Real-World Example of Logistic Regression
Scenario:
Suppose you work for a bank and want to predict whether a customer will default on a loan (yes/no) based on features like income, age, and loan amount.

Features: Income, Age, Loan Amount
Target: Default (1 = Yes, 0 = No)

Logistic regression helps you predict the probability that a customer will default. If the probability is above 0.5, you predict “Yes”; otherwise, “No”. The 0.5 cutoff is a natural midpoint because the target is always either 0 or 1.
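
A rough sketch of how this might look with scikit-learn; the customer records below are invented for illustration, not real bank data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented training data: [income (thousands), age, loan amount (thousands)]
X = np.array([
    [30, 25, 20],
    [80, 45, 10],
    [25, 30, 40],
    [90, 50, 15],
    [40, 35, 35],
    [70, 40, 12],
])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = defaulted, 0 = repaid

model = LogisticRegression()
model.fit(X, y)

# Probability of default for a new customer, then the yes/no decision
new_customer = np.array([[50, 28, 30]])
prob_default = model.predict_proba(new_customer)[0, 1]
print(prob_default, "Yes" if prob_default > 0.5 else "No")
```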

Other common examples:

  • Predicting if an email is spam or not spam.
  • Predicting if a patient has a disease (yes/no) based on medical test results.
  • Predicting if a student will pass or fail an exam.

Why Can’t We Use Linear Regression for Problems Meant for Logistic Regression?
1. Type of Prediction
Linear Regression: Predicts a continuous number (e.g., house price, temperature).
Logistic Regression: Predicts a category/class (e.g., yes/no, spam/not spam, pass/fail).

2. Output Range
Linear Regression: Can output any number, from negative infinity to positive infinity.
Logistic Regression: Outputs a probability between 0 and 1 (using the sigmoid function), which is then used to classify into categories.

3. Example Problem
Suppose you want to predict if a student will pass or fail an exam (yes/no):
Linear Regression might give you predictions like 1.2, -0.3, 0.7, which don’t make sense for categories.
Logistic Regression gives you probabilities (e.g., 0.8 means likely to pass, 0.2 means likely to fail), and you can set a threshold (like 0.5) to decide the class, as the sketch after this list shows.

4. Interpretation
Linear Regression: Not designed for classification; its predictions can be outside the valid range for categories.
Logistic Regression: Designed for classification; its predictions are always valid probabilities.

5. Mathematical Reason
Linear Regression: Fits a straight line.
Logistic Regression: Fits an S-shaped curve (sigmoid) that maps any input to a value between 0 and 1.
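
Here is a small sketch of point 3 using scikit-learn, with an invented hours-studied vs. pass/fail dataset, to show how the two models' outputs differ:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Invented data: hours studied -> pass (1) / fail (0)
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

lin = LinearRegression().fit(hours, passed)
log = LogisticRegression().fit(hours, passed)

new_hours = np.array([[0], [4.5], [12]])
print(lin.predict(new_hours))              # can fall below 0 or above 1
print(log.predict_proba(new_hours)[:, 1])  # always between 0 and 1
```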

Key Points to Remember

  1. Use logistic regression when your target is a category/class.
  2. Evaluate with accuracy, precision, recall, F1-score—not RMSE or R² (see the sketch after this list).
  3. Visualize results to see how well your model is classifying.
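
For point 2, a minimal sketch of those metrics with scikit-learn, using made-up true and predicted labels:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Made-up true labels and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
```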

🟢 Decision Boundary in Logistic Regression (Explained Simply)

When you first hear the term decision boundary, it might sound complicated. But don’t worry — it’s actually a very simple idea once you picture it.


🚪 What is a Decision Boundary?

Imagine you’re standing in front of two doors:

  • One door leads to Class A
  • The other door leads to Class B

The decision boundary is like the invisible line on the floor that tells you which door to choose. If you’re on one side of the line, you go to Class A. If you’re on the other side, you go to Class B.


📊 Logistic Regression in Action

Logistic regression is a machine learning algorithm used for classification — deciding between categories like spam vs. not spam, yes vs. no, or cat vs. dog.

  • Logistic regression looks at your data points (like emails, images, or numbers).
  • It then draws a boundary line (or curve) that separates one class from the other.
  • This boundary is based on probabilities. If the probability is greater than 0.5, it predicts one class; if less than 0.5, it predicts the other.
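
With two features you can actually compute that boundary line. A rough sketch, assuming a scikit-learn model fitted on made-up 2-D points:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up 2-D points for two classes
X = np.array([[1, 2], [2, 1], [2, 3], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# The decision boundary is where w1*x1 + w2*x2 + b = 0,
# i.e. exactly where the predicted probability is 0.5
w1, w2 = model.coef_[0]
b = model.intercept_[0]
print(f"Boundary: {w1:.2f}*x1 + {w2:.2f}*x2 + {b:.2f} = 0")
```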

🎨 A Simple Example

Let’s say you want to classify fruits:

  • 🍎 Apples are red
  • 🍌 Bananas are yellow

If you plot each fruit by a color feature (say, a redness score) on a graph, logistic regression will draw a line (the decision boundary) that says:

  • Left side of the line → Apple
  • Right side of the line → Banana

That line is the decision boundary.


✨ Why It Matters

The decision boundary is important because:

  • It shows how the model makes decisions.
  • It helps us visualize classification problems.
  • It tells us where the model is uncertain (right near the boundary).

📝 Final Takeaway

Think of the decision boundary as the dividing line that logistic regression uses to separate categories. It’s like a referee drawing a line on the ground: one team plays on the left, the other on the right.

🌀 Sigmoid Function vs 🟢 Decision Boundary

🌀 Sigmoid Function

  • The sigmoid function is a mathematical curve shaped like an “S.”
  • It takes any input number (from negative infinity to positive infinity) and squashes it into a value between 0 and 1.
  • In logistic regression, this output is interpreted as a probability.
    • Example: If the sigmoid outputs 0.8, that means an 80% chance of belonging to Class A.

📏 Decision Boundary

  • The decision boundary is the line (or curve) that separates different classes in your data.
  • It’s the “cut‑off point” where the model decides:
    • If probability ≥ 0.5 → Class A
    • If probability < 0.5 → Class B
  • On a graph, this boundary is drawn where the model is equally uncertain (probability = 0.5).
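
A quick way to see why the boundary sits at probability 0.5: the sigmoid of a raw score of exactly 0 is 0.5, so the boundary is where the model's score crosses zero. A tiny sketch:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(0.0))   # 0.5   -> exactly on the boundary
print(sigmoid(1.5))   # ~0.82 -> confidently Class A
print(sigmoid(-1.5))  # ~0.18 -> confidently Class B
```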

🔑 Key Difference

  • The sigmoid function is the tool that converts inputs into probabilities.
  • The decision boundary is the rule or line that uses those probabilities to split data into classes.

Think of it like this:

  • Sigmoid = thermometer (gives you a reading between 0 and 1).
  • Decision boundary = threshold line (decides: if temperature ≥ 0.5, call it “hot”; otherwise “cold”).

📝 Takeaway

  • Sigmoid function: mathematical curve → outputs probabilities.
  • Decision boundary: threshold or dividing line → decides class based on those probabilities.

🧩 If this piece fit perfectly into your brain puzzle, the next one might just complete the picture. Slide on over! 🧠🧵 https://dev.to/codeneuron/how-to-check-if-logistic-regression-works-for-your-dataset-1853
