DEV Community

Shamanta Sristy

🛳️ Titanic Survival Prediction: A Gentle Introduction to Data Science for Beginners 📊

Hey there, curious minds! 👋
If you're new to data science and machine learning like me, this post is just for you. I'll walk you through one of the most classic beginner projects, predicting survival on the Titanic 🚢, in a way that's super beginner-friendly. We'll go through the steps I took, the code I wrote, and the lessons I learned, all while keeping things simple and clear.


📚 What's the Titanic Project?

The Titanic dataset is one of the most popular datasets used to learn data science. The goal is to predict whether a passenger survived or not based on information like their age, gender, ticket class, etc.

This project is perfect for learning how to:

  • Explore data 📊
  • Clean and prepare it 🧹
  • Visualize patterns 🎨
  • Apply machine learning 🤖

📚 Project Overview

This project uses the Titanic dataset to predict whether a passenger survived, based on features such as age, sex, and ticket class. It's perfect for learning:

  • Data analysis and visualization
  • Handling missing data
  • Feature engineering
  • Training a basic machine learning model (Logistic Regression)

🧰 Tools & Technologies Used

  • Language: Python
  • Libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn
  • IDE: Jupyter Notebook

🔍 Step-by-Step Walkthrough

1. Importing Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

2. Loading the Data

df = pd.read_csv("titanic.csv")
df.head()
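
If you don't have the file handy, here is a small sketch of the same loading step on made-up data. The rows below are invented for illustration (they are not real passengers), `sample` is a hypothetical name, and the column names assume the usual Kaggle layout.

```python
import io

import pandas as pd

# Made-up rows in the assumed Kaggle column layout; not real passenger records.
csv_text = """PassengerId,Survived,Pclass,Sex,Age,Fare,Cabin,Embarked
1,0,3,male,22,7.25,,S
2,1,1,female,38,71.28,C85,C
3,1,3,female,,7.92,,S
4,0,2,male,35,13.0,,
"""

# read_csv accepts any file-like object, so StringIO stands in for titanic.csv
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (number of rows, number of columns)
print(sample.dtypes)  # which columns were parsed as numbers vs. text
sample.info()         # non-null count per column, a first hint at missing data
```

Checking `shape`, `dtypes`, and `info()` right after loading is a quick way to confirm the file was read as expected before any cleaning starts.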

3. Checking for Missing Values

# Count the missing values in each column
df.isnull().sum()

Three columns had missing values:

  • Age: 177 missing → filled with the median
  • Cabin: 687 missing → dropped
  • Embarked: 2 missing → filled with the mode
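
To see why these columns might deserve different treatment, it helps to turn the counts into shares. This is a rough sketch: the total of 891 rows is an assumption (it is the size of the standard Kaggle training file, which these counts match), and `missing` and `shares` are just illustrative names.

```python
# Missing counts reported above. total_rows is an ASSUMPTION: the standard
# Kaggle train.csv has 891 rows, but the post does not state the row count.
total_rows = 891
missing = {"Age": 177, "Cabin": 687, "Embarked": 2}

# Share of rows with a missing value, per column
shares = {column: count / total_rows for column, count in missing.items()}

for column, share in shares.items():
    print(f"{column}: {share:.1%} missing")
```

Under that assumption, roughly 77% of Cabin is empty, which is a plausible reason to drop it, while Age (about 20%) and Embarked (under 1%) are sparse enough to fill in. On a real DataFrame, `df.isnull().mean()` gives the same shares directly.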


4. Handling Missing Data

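# Assign the results back: fillna(..., inplace=True) on a selected column is
# deprecated in recent pandas and stops updating df under copy-on-write.
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])
df = df.drop(columns='Cabin')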
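
Here is a tiny sketch of what the median and mode fills do, using a made-up `toy` table (hypothetical values, chosen so the effect is easy to see by eye).

```python
import numpy as np
import pandas as pd

# Made-up values: one missing age, one missing port, and one unusually high age
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 30.0, 80.0],
    "Embarked": ["S", "S", None, "C"],
})

# The median of the known ages (22, 30, 80) is 30; the mean would be 44,
# pulled upward by the single high value.
toy["Age"] = toy["Age"].fillna(toy["Age"].median())

# mode() returns the most frequent value(s); [0] picks the first one
toy["Embarked"] = toy["Embarked"].fillna(toy["Embarked"].mode()[0])

print(toy)
print(toy.isnull().sum().sum())  # total missing values left in the table
```

The median is often preferred over the mean for a column like Age because a few extreme values shift it much less.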

5. Visualizing the Data

Survival by Gender

# Compare survivor and non-survivor counts for each sex
sns.countplot(x='Survived', hue='Sex', data=df)
plt.title("Survival Count by Gender")
plt.show()

Survival by Passenger Class

# Compare survivor and non-survivor counts for each ticket class
sns.countplot(x='Survived', hue='Pclass', data=df)
plt.title("Survival Count by Class")
plt.show()
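
Count plots show raw counts, so it can help to put numbers next to them. The sketch below computes survival rates per group with `groupby`; the `toy` rows are made up for illustration, so the rates say nothing about the real dataset.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "Sex": ["male", "male", "male", "female", "female", "female"],
    "Pclass": [3, 3, 1, 1, 2, 3],
    "Survived": [0, 0, 1, 1, 1, 0],
})

# Survived is 0/1, so its mean within a group is that group's survival rate
rate_by_sex = toy.groupby("Sex")["Survived"].mean()
rate_by_class = toy.groupby("Pclass")["Survived"].mean()

print(rate_by_sex)
print(rate_by_class)
```

Running the same two `groupby` lines on `df` gives the rates behind the plots above.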

6. Correlation Heatmap

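plt.figure(figsize=(10, 6))
# numeric_only=True skips text columns such as Name and Ticket,
# which make df.corr() raise an error in pandas 2.0 and later
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()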
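
A heatmap is easy to scan, but sometimes a sorted list is easier to read. As a minimal sketch on made-up data (the `toy` table and the names `corr` and `ranked` are hypothetical), this keeps only the numeric columns and ranks them by their correlation with `Survived`.

```python
import pandas as pd

# Made-up rows; Name is a text column that correlation cannot use
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 1, 3],
    "Fare": [7.25, 71.28, 53.1, 8.05],
})

# Keep numeric columns only, then compute pairwise Pearson correlations
corr = toy.select_dtypes(include="number").corr()

# Correlation of each feature with the target, most negative first
ranked = corr["Survived"].drop("Survived").sort_values()
print(ranked)
```

Correlation only captures linear relationships between numeric columns, so treat it as a first look and not as proof of what drives survival.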

7. Feature Encoding

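# Encode sex as 0/1 so the model can use it as a number
df['Sex_encoded'] = df['Sex'].map({'male': 0, 'female': 1})
# One-hot encode the port; drop_first=True drops Embarked_C, leaving Embarked_Q and Embarked_S
df = pd.get_dummies(df, columns=['Embarked'], drop_first=True)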
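
To make the encoding step concrete, here is a small sketch on a made-up `toy` table (hypothetical rows) that shows which columns come out of it.

```python
import pandas as pd

# Made-up rows covering all three ports: C, Q, and S
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# Map each label to a number; labels missing from the dict would become NaN
toy["Sex_encoded"] = toy["Sex"].map({"male": 0, "female": 1})

# One column per port, minus the first category in sorted order (C)
toy = pd.get_dummies(toy, columns=["Embarked"], drop_first=True)

print(toy.columns.tolist())
print(toy)
```

The row that embarked at C has both `Embarked_Q` and `Embarked_S` set to false (or 0, depending on the pandas version), so no information is lost by dropping the third column.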

8. Model Preparation

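# Features (X) and target (y)
X = df[['Pclass', 'Sex_encoded', 'Age', 'Fare', 'Embarked_Q', 'Embarked_S']]
y = df['Survived']

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)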
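
Here is a small sketch of what the split produces, on made-up arrays. It also passes `stratify`, an optional `train_test_split` argument that the code above does not use; it keeps the share of each class the same in both parts. The names `X_toy`, `y_toy`, and the shortened output names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_toy = np.arange(40).reshape(20, 2)   # 20 made-up rows, 2 features
y_toy = np.array([0] * 10 + [1] * 10)  # balanced 0/1 labels

# test_size=0.2 holds out 4 of the 20 rows; stratify keeps the 50/50 class mix
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(X_tr.shape, X_te.shape)  # (16, 2) (4, 2)
print(y_te.mean())             # 0.5, the same class share as the full data
```

Stratifying matters most when one class is much rarer than the other; whether it helps here is something to try, not a given.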

9. Logistic Regression Model

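# max_iter is raised from the default of 100 so the solver has room
# to converge on these unscaled features
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))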
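
Accuracy on its own can be hard to judge, so here is a minimal sketch of two extra checks: a confusion matrix and a "always guess the most common class" baseline. The labels below are made up for illustration and are not results from this model; `y_true` and `y_hat` are hypothetical names.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true labels and predictions (NOT output from the model above)
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_hat = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# For 0/1 labels, ravel() returns the counts in this order
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()
acc = accuracy_score(y_true, y_hat)

# Baseline: predict the most frequent class for every row
majority = max(set(y_true), key=y_true.count)
baseline_acc = accuracy_score(y_true, [majority] * len(y_true))

print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Accuracy:", acc)
print("Baseline accuracy:", baseline_acc)
```

If the model's accuracy is not clearly above the baseline, the features are adding little. The same calls work with `y_test` and `y_pred` from the step above.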

🧠 Key Learnings

  • ✅ How to explore and clean a real-world dataset
  • ✅ Understanding visual patterns in data
  • ✅ Feature encoding and selection
  • ✅ Building a logistic regression model
  • ✅ Evaluating model accuracy and performance


💭 Reflections

This project was more than just code: it helped me gain confidence in using ML tools and understand the real process behind building predictive models. I'm now more excited than ever to keep exploring!

Thanks for reading! If you're also starting out in machine learning or have suggestions for improvement, I'd love to connect and hear your thoughts 💬
