Sofiane Chehboune

🚢 Titanic App Streamlit "Machine Learning Scikit Learn-Random Forest"

🚢 I built a web app to explore the Titanic dataset and predict your chances of survival using Streamlit and Scikit-Learn.

Hello DEV Community!

I’m excited to share my latest project: an interactive web application built with Streamlit that explores the famous Titanic dataset. Not only does it let you visualize key passenger statistics, but it also includes a Machine Learning model (RandomForest) to predict survival chances based on features you choose.

✨ Demo & Source Code

Try the app live: Titanic Survival Predictor (Streamlit link)

Explore the source on GitHub

⭐ Feel free to leave a star on the repo if you like the project!
💡 Pro tip: Create a short GIF of your app in action and host it on a site like Imgur to embed it here. It’s super effective!

🛠️ Tech Stack

Here’s what I used:

Streamlit: for quickly building the web interface (pure Python, no HTML/CSS/JS needed)

Pandas: for data manipulation and cleaning

Scikit-Learn: for training the RandomForestClassifier, which performs well out of the box

Plotly & Seaborn: for interactive and static visualizations

🔬 How It Works

The project is broken down into three main steps:

1. Data Exploration & Visualization (EDA)
The first part of the app is dedicated to exploratory analysis. I used Plotly and Seaborn to build visualizations that answer questions such as:

  • Did gender impact survival?

  • Did ticket class play a role?

  • How was age distributed among survivors versus non-survivors?

Here’s a small code snippet showing how I displayed a pie chart of survival by gender:

import plotly.express as px
import streamlit as st

# df (the Titanic DataFrame) and selected_gender are assumed to be defined earlier in the app
# Build the table of survival percentages by gender
gender_survival = df.groupby('Sex')['Survived'].value_counts(normalize=True).unstack().fillna(0)
gender_survival = (gender_survival * 100).round(2)

# Display with Plotly
fig = px.pie(
    names=['Did not survive', 'Survived'],
    values=gender_survival.loc[selected_gender],
    title=f"Survival percentage for gender: {selected_gender}"
)
st.plotly_chart(fig)
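To make the intermediate table concrete, here's a minimal sketch of the same groupby on a tiny hand-made DataFrame. The six rows are invented for illustration and are not the real Titanic data; only the column names Sex and Survived come from the snippet above.

```python
import pandas as pd

# Toy data invented for illustration (not the real Titanic dataset)
toy_df = pd.DataFrame({
    "Sex": ["male", "male", "male", "male", "female", "female"],
    "Survived": [0, 0, 0, 1, 1, 1],
})

# Same transformation as in the snippet: share of each outcome per gender, in percent
gender_survival = (
    toy_df.groupby("Sex")["Survived"]
    .value_counts(normalize=True)
    .unstack()
    .fillna(0)
)
gender_survival = (gender_survival * 100).round(2)

# Columns are the Survived values (0 and 1); each row sums to 100
print(gender_survival)
```

The fillna(0) matters when a group has no rows for one outcome, as with the women in this toy example: without it, the missing cell would be NaN instead of 0.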

2. Training the Machine Learning Model

I chose a RandomForestClassifier for its robustness and strong performance out of the box. The workflow is fairly standard:

Data Cleaning: Replace missing values (for example, fill missing ages with the median age).

Feature Engineering: Convert categorical variables (such as gender or port of embarkation) into numerical values.

Training: Split the data into training and test sets, then train the model on the training data.
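
The first two steps can be sketched as follows. This is a minimal illustration on a few invented rows, assuming the usual Titanic column names (Age, Sex, Embarked); the exact cleaning code in the app may differ, and filling the missing port with the most frequent value is just one common choice.

```python
import pandas as pd

# Invented rows using the usual Titanic column names (an assumption)
raw = pd.DataFrame({
    "Age": [22.0, None, 38.0, 30.0],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", None, "S"],
})

clean = raw.copy()

# Data cleaning: fill missing ages with the median age
clean["Age"] = clean["Age"].fillna(clean["Age"].median())

# Fill the missing port with the most frequent one (one option among several)
clean["Embarked"] = clean["Embarked"].fillna(clean["Embarked"].mode()[0])

# Feature engineering: turn categorical columns into numeric indicator columns
encoded = pd.get_dummies(clean, columns=["Sex", "Embarked"], drop_first=True)

print(encoded)
```

With drop_first=True, each categorical column loses its first category, so Sex becomes a single Sex_male indicator instead of two redundant columns.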

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# df is assumed to be the cleaned Titanic DataFrame (missing values already filled)
# Define the features (X) and the target (y)
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']
X = df[features]
y = df['Survived']

# Encode the categorical variables
X = pd.get_dummies(X, columns=['Sex'], drop_first=True)

# Split the data and train the model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
```
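
Once the data is split, the held-out test set can be used to estimate how well the model generalizes. The sketch below uses synthetic data from make_classification as a stand-in for the cleaned Titanic features, so the printed accuracy says nothing about the real app. accuracy_score is only one of several possible metrics.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned features and target (not Titanic data)
X, y = make_classification(n_samples=500, n_features=6, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Compare predictions on unseen rows with the true labels
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {test_accuracy:.2f}")
```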





3. Building the Prediction Interface

This is where the Streamlit magic happens. I used the sidebar (`st.sidebar`) to create a form where users can enter their own data (class, sex, age, etc.).





```python
# Sidebar for the user's inputs
st.sidebar.header("Make your own prediction:")
pclass = st.sidebar.selectbox('Ticket class', [1, 2, 3])
sex = st.sidebar.selectbox('Sex', ['male', 'female'])
age = st.sidebar.slider('Age', 0, 100, 25)
# ... other inputs

# Run the prediction when the button is clicked
if st.sidebar.button("Predict my survival"):
    # Build a DataFrame from the user's inputs
    # (its columns must match the ones used for training)
    user_input = pd.DataFrame(...)

    # Make the prediction
    prediction = model.predict(user_input)

    # Show the result
    if prediction[0] == 1:
        st.sidebar.success("Congratulations! You probably would have survived!")
    else:
        st.sidebar.error("Sorry... You probably would not have survived.")
```
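
The pd.DataFrame(...) call above is left elided. One possible way to fill it in, shown here as a hypothetical sketch, is to assemble a one-row DataFrame whose columns match those seen during training. The helper name build_user_input, the training_columns list, and the tiny training set are all assumptions for illustration, not code from the app. The sketch also uses predict_proba to report a probability instead of a yes/no answer.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Column order produced by get_dummies(..., drop_first=True) in the training snippet
training_columns = ["Pclass", "Age", "SibSp", "Parch", "Fare", "Sex_male"]


def build_user_input(pclass, sex, age, sibsp, parch, fare):
    """Hypothetical helper: one-row DataFrame matching the training columns."""
    row = {
        "Pclass": pclass,
        "Age": age,
        "SibSp": sibsp,
        "Parch": parch,
        "Fare": fare,
        "Sex_male": sex == "male",
    }
    return pd.DataFrame([row])[training_columns]


# Tiny invented training set, only so the example runs end to end
train = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 3],
    "Age": [29, 40, 18, 25, 50, 33],
    "SibSp": [0, 1, 0, 0, 1, 0],
    "Parch": [0, 0, 1, 0, 0, 2],
    "Fare": [80.0, 8.0, 20.0, 7.5, 60.0, 9.0],
    "Sex_male": [False, True, False, True, False, True],
})
labels = [1, 0, 1, 0, 1, 0]
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(train, labels)

user_input = build_user_input(pclass=3, sex="male", age=25, sibsp=0, parch=0, fare=7.5)

# Column 1 of predict_proba is the estimated probability of class 1 (survived)
survival_probability = model.predict_proba(user_input)[0, 1]
print(f"Estimated survival probability: {survival_probability:.0%}")
```

In the app, the arguments would come from the sidebar widgets, and the probability could be shown with st.sidebar.success or st.sidebar.error as in the snippet above.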

🏁 Conclusion

This project was a great opportunity to bring together data analysis, machine learning, and web development in a very accessible way. Streamlit keeps impressing me with how easy it makes building interactive data apps.

I’d love to hear your feedback! What do you think? Are there any visualizations or features you’d like to see added? Thanks for reading!

❤️
