Imagine you have a dataset and want to see if logistic regression is a good fit. Here’s what you do, step by step:
Typical Steps for Logistic Regression:

1. Find a dataset
Example: the Titanic dataset (predict survival), a disease dataset (predict diagnosis), or any dataset with a categorical target (0/1).
2. Load the dataset
Use pandas: df = pd.read_csv('your_dataset.csv')
3. Choose features and target
Features: the columns you'll use to predict.
Target: the column with categories (e.g., 0/1).
4. Split into train/test sets
Use train_test_split.
5. Train logistic regression
model = LogisticRegression()
model.fit(X_train, y_train)
6. Make predictions and evaluate
y_pred = model.predict(X_test)
Use confusion_matrix, classification_report, and accuracy_score.
7. Visualize results
Plot the confusion matrix using seaborn or matplotlib.

Here is the full workflow in code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
# 1. Load your dataset
df = pd.read_csv('your_dataset.csv') # Replace with your file name
# 2. Choose features and target
X = df[['feature1', 'feature2', 'feature3']] # Replace with your feature columns
y = df['target'] # Target should be categorical (e.g., 0 or 1)
# 3. Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 4. Train the model
model = LogisticRegression()
model.fit(X_train, y_train)
# 5. Make predictions
y_pred = model.predict(X_test)
# 6. Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
# 7. Visualize Confusion Matrix
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
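One practical note: with unscaled features, LogisticRegression may hit its default iteration limit and raise a convergence warning. A common remedy is to standardize the features before fitting. Here is a minimal sketch using a scikit-learn Pipeline (assuming the same X_train, X_test, y_train from the split above; the max_iter value is just a generous choice, not a requirement):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Standardize features to zero mean / unit variance, then fit the classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)  # the pipeline applies the same scaling automatically

A nice property of the pipeline is that the scaler is fit only on the training data, so no information from the test set leaks into training.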
How to Interpret the Confusion Matrix
A confusion matrix is a table that shows how well your classification model is performing by breaking its predictions down into four counts:
TN (True Negative): Model predicted No, and it was actually No.
TP (True Positive): Model predicted Yes, and it was actually Yes.
FP (False Positive): Model predicted Yes, but it was actually No.
FN (False Negative): Model predicted No, but it was actually Yes.
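For binary labels 0/1, scikit-learn's confusion_matrix puts actual classes on the rows and predicted classes on the columns, so the output reads [[TN, FP], [FN, TP]]. A minimal sketch with made-up labels, just to show the layout:

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1, 1]  # actual labels (illustrative only)
y_pred = [0, 1, 0, 1, 1, 0]  # predicted labels (illustrative only)

cm = confusion_matrix(y_true, y_pred)
print(cm)                    # [[2 1]
                             #  [1 2]]
tn, fp, fn, tp = cm.ravel()  # flattens row by row: TN, FP, FN, TP
print(tn, fp, fn, tp)        # 2 1 1 2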
How to use it:
Accuracy: (TP + TN) / Total predictions
Precision: TP / (TP + FP) — How many predicted Yes were actually Yes?
Recall: TP / (TP + FN) — How many actual Yes did the model find?
F1 Score: the harmonic mean of precision and recall, i.e., 2 × (Precision × Recall) / (Precision + Recall); the more general F-beta score weights recall beta times as much as precision. A worked example follows below.
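These formulas are easy to verify by hand. A minimal sketch using the illustrative counts from the layout example above (TN=2, FP=1, FN=1, TP=2):

tn, fp, fn, tp = 2, 1, 1, 2  # illustrative counts, not real results

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 4/6 ≈ 0.667
precision = tp / (tp + fp)                          # 2/3 ≈ 0.667
recall = tp / (tp + fn)                             # 2/3 ≈ 0.667
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.667

print(f"Accuracy: {accuracy:.3f}, Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}, F1: {f1:.3f}")

These values match what classification_report prints for the positive class on the same predictions.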
🌈 You’ve danced through this post like a pixelated unicorn — now gallop into the next one and keep the magic alive! 🦄✨
https://dev.to/codeneuron/decision-trees-algorithm-19ec