<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sandip Subedi </title>
    <description>The latest articles on DEV Community by Sandip Subedi  (@sandipsubedi0).</description>
    <link>https://dev.to/sandipsubedi0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3979214%2F7d7ea629-5619-45ea-8244-efaae4c6431f.png</url>
      <title>DEV Community: Sandip Subedi </title>
      <link>https://dev.to/sandipsubedi0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sandipsubedi0"/>
    <language>en</language>
    <item>
      <title>🤖 I Built a Semantic FAQ Bot That Understands Meaning Instead of Keywords | Project #4</title>
      <dc:creator>Sandip Subedi </dc:creator>
      <pubDate>Fri, 03 Jul 2026 13:53:09 +0000</pubDate>
      <link>https://dev.to/sandipsubedi0/i-built-a-semantic-faq-bot-that-understands-meaning-instead-of-keywords-project-4-4nbo</link>
      <guid>https://dev.to/sandipsubedi0/i-built-a-semantic-faq-bot-that-understands-meaning-instead-of-keywords-project-4-4nbo</guid>
      <description>&lt;h1&gt;
  
  
  🤖 I Built a Semantic FAQ Bot That Understands Meaning Instead of Keywords | Project #4
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Project #4 of my AI &amp;amp; Machine Learning journey&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most beginner FAQ chatbots work only when the user's question exactly matches the stored question.&lt;/p&gt;

&lt;p&gt;Ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"What is Machine Learning?"&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and it works.&lt;/p&gt;

&lt;p&gt;But ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"ML"&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;or&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"Can you explain machine learning?"&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and many traditional FAQ bots completely fail.&lt;/p&gt;

&lt;p&gt;I wanted to solve this problem by building a chatbot that understands the &lt;strong&gt;meaning&lt;/strong&gt; behind a question instead of simply matching keywords.&lt;/p&gt;

&lt;p&gt;That's exactly why I built my &lt;strong&gt;Semantic FAQ Bot&lt;/strong&gt;.&lt;/p&gt;




&lt;h1&gt;
  
  
  🚀 What is a Semantic FAQ Bot?
&lt;/h1&gt;

&lt;p&gt;A Semantic FAQ Bot uses &lt;strong&gt;sentence embeddings&lt;/strong&gt; instead of keyword matching.&lt;/p&gt;

&lt;p&gt;Rather than checking whether two sentences contain the same words, it converts both the user's query and every FAQ question into numerical vectors (embeddings).&lt;/p&gt;

&lt;p&gt;It then finds the FAQ whose meaning is most similar to the user's question using &lt;strong&gt;Cosine Similarity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This allows the chatbot to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;abbreviations&lt;/li&gt;
&lt;li&gt;paraphrased questions&lt;/li&gt;
&lt;li&gt;casual language&lt;/li&gt;
&lt;li&gt;differently worded queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;without needing exact text matches.&lt;/p&gt;
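&lt;p&gt;To make the similarity measure concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are made-up stand-ins for real embeddings, and the zero-vector handling is an assumption of this sketch, not something taken from the project.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real sentence embeddings.
query = [1.0, 2.0, 0.0]
same_direction = [2.0, 4.0, 0.0]   # same direction as query, different length
orthogonal = [0.0, 0.0, 5.0]       # shares no direction with query

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

&lt;p&gt;Because the score depends only on direction, two vectors that point the same way score close to 1 regardless of their length, while unrelated directions score close to 0.&lt;/p&gt;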




&lt;h1&gt;
  
  
  🎯 Problem with Traditional FAQ Bots
&lt;/h1&gt;

&lt;p&gt;Imagine your FAQ contains:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What is Machine Learning?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A traditional bot may fail for questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ML&lt;/li&gt;
&lt;li&gt;Explain Machine Learning&lt;/li&gt;
&lt;li&gt;What does ML mean?&lt;/li&gt;
&lt;li&gt;Tell me about Machine Learning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;because none of them are exact matches.&lt;/p&gt;

&lt;p&gt;Semantic search handles all of these, because it compares meaning rather than exact text.&lt;/p&gt;




&lt;h1&gt;
  
  
  🧠 How My Bot Works
&lt;/h1&gt;

&lt;p&gt;The workflow is surprisingly simple.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1
&lt;/h3&gt;

&lt;p&gt;Every FAQ question is converted into a &lt;strong&gt;384-dimensional embedding&lt;/strong&gt; using the all-MiniLM-L6-v2 Sentence Transformers model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2
&lt;/h3&gt;

&lt;p&gt;When a user asks a question, that question is also converted into an embedding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3
&lt;/h3&gt;

&lt;p&gt;The bot calculates the similarity between the user's embedding and every FAQ embedding using &lt;strong&gt;Cosine Similarity&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4
&lt;/h3&gt;

&lt;p&gt;The highest-scoring question is selected.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5
&lt;/h3&gt;

&lt;p&gt;If the similarity score is above a confidence threshold, the corresponding answer is returned.&lt;/p&gt;

&lt;p&gt;Otherwise, the bot politely says it doesn't know the answer instead of giving incorrect information.&lt;/p&gt;
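&lt;p&gt;The five steps above can be sketched roughly as follows. This is not the project's actual code: the embeddings are tiny hand-written vectors rather than model output, the NLP answer is a placeholder, and the threshold of 0.6 and the names FAQS, THRESHOLD and answer are assumptions made for illustration. In the real bot the vectors would come from the embedding model.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


# Hypothetical knowledge base: (question, answer, precomputed toy embedding).
# Embeddings are computed once up front and reused for every query.
FAQS = [
    ("What is Machine Learning?",
     "Machine Learning is a field of AI where computers learn patterns from data.",
     [0.9, 0.1, 0.0]),
    ("What is NLP?",
     "(placeholder answer for the NLP question)",
     [0.1, 0.9, 0.0]),
]

THRESHOLD = 0.6  # assumed value; the article does not state the real threshold


def answer(query_embedding, faqs=FAQS, threshold=THRESHOLD):
    """Return (answer, score) for the closest FAQ, or a fallback below the threshold."""
    scored = [(cosine_similarity(query_embedding, emb), ans) for _, ans, emb in faqs]
    best_score, best_answer = max(scored, key=lambda pair: pair[0])
    if best_score >= threshold:
        return best_answer, best_score
    return "Sorry, I don't know the answer to that yet.", best_score


print(answer([0.8, 0.2, 0.0]))   # close to the Machine Learning entry
print(answer([0.0, 0.0, 1.0]))   # unrelated to every entry, so the fallback is used
```

&lt;p&gt;Returning the score alongside the answer makes it easy to show a confidence value and to tune the threshold later.&lt;/p&gt;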




&lt;h1&gt;
  
  
  ⚙️ Tech Stack
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Python&lt;/li&gt;
&lt;li&gt;Sentence Transformers&lt;/li&gt;
&lt;li&gt;all-MiniLM-L6-v2&lt;/li&gt;
&lt;li&gt;NumPy&lt;/li&gt;
&lt;li&gt;Scikit-learn&lt;/li&gt;
&lt;li&gt;Cosine Similarity&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  ✨ Features
&lt;/h1&gt;

&lt;p&gt;✅ Semantic Search instead of keyword matching&lt;/p&gt;

&lt;p&gt;✅ Confidence Score for every prediction&lt;/p&gt;

&lt;p&gt;✅ Around &lt;strong&gt;90 built-in AI, Python, Data Science and Machine Learning FAQs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ Fast response using pre-computed embeddings&lt;/p&gt;

&lt;p&gt;✅ Easily expandable knowledge base&lt;/p&gt;

&lt;p&gt;✅ Clean and beginner-friendly implementation&lt;/p&gt;




&lt;h1&gt;
  
  
  📌 Example
&lt;/h1&gt;

&lt;h3&gt;
  
  
  User asks
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ML
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bot understands&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What is Machine Learning?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Response&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Machine Learning is a field of AI where computers learn patterns from data.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;Another example:&lt;/p&gt;

&lt;p&gt;User asks&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NLP stands for?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bot correctly matches&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What is NLP?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The match succeeds even though the two questions are worded differently.&lt;/p&gt;




&lt;h1&gt;
  
  
  💡 What I Learned
&lt;/h1&gt;

&lt;p&gt;While building this project, I learned about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sentence Embeddings&lt;/li&gt;
&lt;li&gt;Vector Representations&lt;/li&gt;
&lt;li&gt;Semantic Search&lt;/li&gt;
&lt;li&gt;Cosine Similarity&lt;/li&gt;
&lt;li&gt;Text Similarity&lt;/li&gt;
&lt;li&gt;Efficient Embedding Reuse&lt;/li&gt;
&lt;li&gt;Confidence Thresholding&lt;/li&gt;
&lt;li&gt;Building Intelligent FAQ Systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project gave me a much deeper understanding of how modern AI systems retrieve relevant information.&lt;/p&gt;




&lt;h1&gt;
  
  
  🔥 Future Improvements
&lt;/h1&gt;

&lt;p&gt;This project is only the beginning.&lt;/p&gt;

&lt;p&gt;Some upgrades I plan to implement include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loading FAQs from CSV or JSON files&lt;/li&gt;
&lt;li&gt;Integrating FAISS for large-scale vector search&lt;/li&gt;
&lt;li&gt;Building a FastAPI backend&lt;/li&gt;
&lt;li&gt;Creating a Streamlit web interface&lt;/li&gt;
&lt;li&gt;Converting it into a Retrieval-Augmented Generation (RAG) chatbot using Large Language Models&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  📚 Why This Project Matters
&lt;/h1&gt;

&lt;p&gt;Semantic Search is one of the core building blocks behind many modern AI applications.&lt;/p&gt;

&lt;p&gt;Understanding embeddings and similarity search opens the door to building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI Chatbots&lt;/li&gt;
&lt;li&gt;Document Search Systems&lt;/li&gt;
&lt;li&gt;Recommendation Engines&lt;/li&gt;
&lt;li&gt;RAG Applications&lt;/li&gt;
&lt;li&gt;AI Knowledge Bases&lt;/li&gt;
&lt;li&gt;Enterprise Search Systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Building this project helped me move beyond basic Machine Learning and into practical NLP applications.&lt;/p&gt;




&lt;h1&gt;
  
  
  🎯 Final Thoughts
&lt;/h1&gt;

&lt;p&gt;This is &lt;strong&gt;Project #4&lt;/strong&gt; in my AI &amp;amp; Machine Learning learning journey.&lt;/p&gt;

&lt;p&gt;Every project I build teaches me something new, and this one introduced me to the power of semantic understanding.&lt;/p&gt;

&lt;p&gt;Instead of matching words, the chatbot understands &lt;strong&gt;meaning&lt;/strong&gt;—a small but important step toward building more intelligent AI systems.&lt;/p&gt;

&lt;p&gt;There is still a long road ahead, but every project gets me closer to becoming a skilled AI Engineer.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;




&lt;h1&gt;
  
  
  👨‍💻 About Me
&lt;/h1&gt;

&lt;p&gt;Hi! I'm &lt;strong&gt;Sandip Subedi&lt;/strong&gt;, an aspiring AI &amp;amp; Machine Learning Engineer from Nepal. I'm documenting my journey by building practical projects in Python, Machine Learning, NLP, and Retrieval-Augmented Generation (RAG), sharing everything I learn along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  📬 Let's Connect
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/sandipsubedi0/semantic-faq-bote" rel="noopener noreferrer"&gt;https://github.com/sandipsubedi0/semantic-faq-bote&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn:&lt;/strong&gt; &lt;a href="http://www.linkedin.com/in/sandip-subedi-5694b136a" rel="noopener noreferrer"&gt;www.linkedin.com/in/sandip-subedi-5694b136a&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hashnode:&lt;/strong&gt; &lt;a href="https://hashnode.com/edit/cmr4zr5u100000akm6uavg0oc" rel="noopener noreferrer"&gt;https://hashnode.com/edit/cmr4zr5u100000akm6uavg0oc&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you enjoy following real-world AI projects, feel free to connect with me. I'm always excited to learn, collaborate, and grow with the developer community. 🚀&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>nlp</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Titanic Survival Analysis — What the Data Reveals About Who Lived and Who Died</title>
      <dc:creator>Sandip Subedi </dc:creator>
      <pubDate>Thu, 11 Jun 2026 09:41:42 +0000</pubDate>
      <link>https://dev.to/sandipsubedi0/titanic-survival-analysis-what-the-data-reveals-about-who-lived-and-who-died-1o7l</link>
      <guid>https://dev.to/sandipsubedi0/titanic-survival-analysis-what-the-data-reveals-about-who-lived-and-who-died-1o7l</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhqzp0tdw22sq8zabxogb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhqzp0tdw22sq8zabxogb.png" alt=" " width="800" height="1132"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Titanic disaster of 1912 is one of the most studied events in history. Over 1,500 people lost their lives when the ship sank in the North Atlantic. But when you look at the passenger data, a clear pattern emerges: survival was not random. Your chances of surviving depended heavily on who you were and which class you traveled in.&lt;/p&gt;

&lt;p&gt;In this project, I analyzed the Titanic passenger dataset to answer one central question: What factors determined whether a passenger survived?&lt;/p&gt;

&lt;p&gt;This is my second data analysis project. My first project was an HR Employee Attrition Analysis — if you haven't read that one yet, check it out. For this project, I followed the same structured 5-phase approach and pushed myself to go deeper with the visualizations.&lt;/p&gt;

&lt;p&gt;Full notebook on GitHub: github.com/sandipsubedi0/titanic-survival-analysis&lt;/p&gt;

&lt;p&gt;Dataset Overview&lt;br&gt;
Source: Kaggle — Titanic: Machine Learning from Disaster&lt;/p&gt;

&lt;p&gt;Rows: 891 passengers&lt;/p&gt;

&lt;p&gt;Columns: 12 features including Age, Sex, Pclass, Fare, Cabin, Embarked, and Survived&lt;/p&gt;

&lt;p&gt;Tools used: Python, Pandas, NumPy, Matplotlib, Seaborn&lt;/p&gt;

&lt;p&gt;Phase 1 — Setup and Data Loading&lt;br&gt;
I started by importing the necessary libraries and loading the dataset using a relative file path. A simple but important habit — never use a hardcoded local path like C:\Users... in a shared notebook, because it will break on every other computer.&lt;/p&gt;

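&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter magic: render plots inside the notebook (remove when running as a script)
%matplotlib inline
sns.set_style("whitegrid")

# Relative path, so the notebook also runs on other machines
data = pd.read_csv("Titanic-Dataset.csv")
data.head()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;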

&lt;p&gt;First look at the data: 891 rows, 12 columns. The dataset includes passenger demographics, ticket details, cabin information, and whether they survived.&lt;/p&gt;

&lt;p&gt;Phase 2 — Data Exploration (Before Cleaning)&lt;br&gt;
Before touching anything, I explored the raw data to understand what I was working with.&lt;/p&gt;

&lt;p&gt;Missing values:&lt;/p&gt;

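&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Column&lt;/th&gt;
&lt;th&gt;Missing Count&lt;/th&gt;
&lt;th&gt;Missing %&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cabin&lt;/td&gt;
&lt;td&gt;687&lt;/td&gt;
&lt;td&gt;77.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Age&lt;/td&gt;
&lt;td&gt;177&lt;/td&gt;
&lt;td&gt;19.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embarked&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The missing values heatmap made this visually clear: Cabin had a massive gap running through the entire column.&lt;/p&gt;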
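&lt;p&gt;A missing-value summary like the one above can be produced in a few lines of pandas. The sketch below uses a tiny made-up frame instead of the real dataset, and the names sample and missing are only illustrative.&lt;/p&gt;

```python
import pandas as pd

# Tiny made-up frame standing in for the Titanic data (not real rows).
sample = pd.DataFrame({
    "Age": [22.0, None, 35.0, None],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", "S", "Q"],
})

# isnull() gives a True/False frame; sum() counts the gaps per column and
# mean() gives the fraction of rows that are missing.
missing = pd.DataFrame({
    "missing_count": sample.isnull().sum(),
    "missing_pct": (sample.isnull().mean() * 100).round(1),
})

# Keep only the columns that have gaps, largest gap first.
missing = missing[missing["missing_count"] > 0].sort_values(
    "missing_count", ascending=False
)
print(missing)
```

&lt;p&gt;Running the same expressions on the full data frame should yield the counts and percentages for Cabin, Age and Embarked.&lt;/p&gt;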

&lt;p&gt;I also ran data.describe() to check the numerical columns. A few things stood out immediately:&lt;/p&gt;

&lt;p&gt;Age ranged from 0.42 to 80 years — the youngest passenger was less than 1 year old&lt;/p&gt;

&lt;p&gt;Fare had a huge range — minimum 0, maximum 512 — signaling strong economic inequality among passengers&lt;/p&gt;

&lt;p&gt;Only 38% of passengers survived (Survived mean = 0.38)&lt;/p&gt;

&lt;p&gt;For value_counts(), I only checked meaningful categorical columns: Survived, Pclass, Sex, and Embarked. Columns like PassengerId, Name, and Ticket are identifiers or near-unique labels, so counting their values produces no useful insight.&lt;/p&gt;

&lt;p&gt;Phase 3 — Data Cleaning&lt;br&gt;
I made a working copy of the original data before applying any changes — always keep the raw data intact as a reference.&lt;/p&gt;

&lt;p&gt;df = data.copy()&lt;/p&gt;

&lt;p&gt;Three cleaning decisions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Age: filled with the median. Age had 177 missing values (19.9%). I filled them with the median rather than the mean, because Age has outliers (very young children, elderly passengers) that pull the mean away from the typical passenger. The median is more robust.&lt;/li&gt;
&lt;/ol&gt;

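&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Assign back: fillna(inplace=True) on a selected column warns in pandas 2.x
df["Age"] = df["Age"].fillna(df["Age"].median())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;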
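&lt;p&gt;To see why the median is the more robust fill value, here is a small sketch using Python's statistics module. The ages are invented for illustration and are not taken from the dataset.&lt;/p&gt;

```python
import statistics

# Made-up ages: six passengers in their twenties and thirties.
typical = [22, 24, 26, 28, 30, 32]
# The same group plus a single 80-year-old outlier.
with_outlier = typical + [80]

print("without outlier:", statistics.mean(typical), statistics.median(typical))
print("with outlier:   ", statistics.mean(with_outlier), statistics.median(with_outlier))
```

&lt;p&gt;One extreme value moves the mean by several years, while the median shifts by only one, which is why the median is a safer stand-in for a typical passenger.&lt;/p&gt;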

&lt;ol start="2"&gt;
&lt;li&gt;Cabin: dropped entirely. 77.1% of its values were missing. That is too many to fill reliably; any filling method would be guesswork at that scale, so I dropped the column.&lt;/li&gt;
&lt;/ol&gt;

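&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;df.drop(columns=["Cabin"], inplace=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;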

&lt;ol start="3"&gt;
&lt;li&gt;Embarked: filled with the mode. Only 2 values were missing. With such a small gap, filling with the most common value (mode) has a negligible effect on the analysis.&lt;/li&gt;
&lt;/ol&gt;

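&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# mode() returns a Series, so [0] picks the most frequent value
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;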

&lt;p&gt;Verification: After cleaning, df.isnull().sum() showed zero missing values across all columns. The after-cleaning heatmap confirmed this — completely blank, exactly what we want to see.&lt;/p&gt;

&lt;p&gt;Phase 4 — Exploratory Data Analysis and Visualizations&lt;br&gt;
This is where the real story begins. I built 7 charts, each designed to answer a specific question.&lt;/p&gt;

&lt;p&gt;Chart 1 — Overall Survival Count&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqive1hfx50w7ypwcze2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqive1hfx50w7ypwcze2.png" alt=" " width="670" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first question: how many people actually survived?&lt;/p&gt;

&lt;p&gt;Out of 891 passengers, 342 survived (38.4%) and 549 did not (61.6%). More than 6 in 10 people on the Titanic did not make it. That sets the baseline for everything that follows.&lt;/p&gt;

&lt;p&gt;Chart 2 — Survival Rate by Gender&lt;br&gt;
This is where the data gets striking.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7t4at0ph0o7tfqijuze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7t4at0ph0o7tfqijuze.png" alt=" " width="670" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Female survival rate: ~74%&lt;/p&gt;

&lt;p&gt;Male survival rate: ~19%&lt;/p&gt;

&lt;p&gt;Women were nearly 4x as likely to survive as men. This is the clearest pattern in the entire dataset. The "women and children first" evacuation protocol was not just a phrase: the survival rates are consistent with it actually being followed.&lt;/p&gt;

&lt;p&gt;I used a bar chart here (not a pie chart). Survival rates are separate values for two groups — they don't add to 100% of anything, so a pie chart would be misleading.&lt;/p&gt;
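&lt;p&gt;The rates behind a chart like this can be computed with a group-by. The sketch below is one way to do it, shown on a handful of made-up rows rather than the real 891 passengers; sample and rate_by_sex are illustrative names.&lt;/p&gt;

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male", "male", "male"],
    "Survived": [1, 0, 1, 0, 0, 1, 0],
})

# Survived is 0 or 1, so its mean within a group is that group's survival rate.
rate_by_sex = sample.groupby("Sex")["Survived"].mean()
print(rate_by_sex)
```

&lt;p&gt;Grouping by "Pclass" instead of "Sex" gives the per-class rates in the same way, and the resulting Series can be passed straight to a bar plot.&lt;/p&gt;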

&lt;p&gt;Chart 3 — Survival Rate by Passenger Class&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmfjpyvn6rn61xh3csg2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmfjpyvn6rn61xh3csg2.png" alt=" " width="668" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Passenger class is a rough proxy for where on the ship a passenger was berthed, and therefore for how close they were to the lifeboats.&lt;/p&gt;

&lt;p&gt;1st Class: ~63% survival rate&lt;/p&gt;

&lt;p&gt;2nd Class: ~47% survival rate&lt;/p&gt;

&lt;p&gt;3rd Class: ~24% survival rate&lt;/p&gt;

&lt;p&gt;The survival gap between 1st and 3rd class is enormous. Third-class passengers were housed in the lower decks — further from the lifeboats and with less time to reach the top deck. Economic status directly influenced survival chances.&lt;/p&gt;

&lt;p&gt;Chart 4 — Age Distribution of All Passengers&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmjfgqw3bkotzqq1fovao.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmjfgqw3bkotzqq1fovao.png" alt=" " width="676" height="486"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A histogram of all passenger ages shows a right-skewed distribution. Most passengers were between 20 and 40 years old. There were relatively few children and elderly passengers compared to working-age adults.&lt;/p&gt;

&lt;p&gt;The youngest passenger recorded was under 1 year old. The oldest was 80.&lt;/p&gt;

&lt;p&gt;Chart 5 — Age by Survival (Overlapping Histogram)&lt;br&gt;
This is one of the most informative charts in the project. By plotting two histograms on the same axes — one for survivors, one for non-survivors — with alpha=0.6 on both so they're visible through each other, the overlap pattern becomes clear.&lt;/p&gt;

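&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Overlay the two age distributions; alpha keeps both visible where they overlap
plt.hist(df[df["Survived"] == 1]["Age"], alpha=0.6, label="Survived", bins=20)
plt.hist(df[df["Survived"] == 0]["Age"], alpha=0.6, label="Did not survive", bins=20)
plt.xlabel("Age")
plt.legend()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;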

&lt;p&gt;Young children (under ~10) show a higher proportion of survivors relative to non-survivors — consistent with "children first." For adults aged 20–40, non-survivors heavily outnumber survivors, reflecting the large number of 3rd-class male passengers in that age group.&lt;/p&gt;

&lt;p&gt;Chart 6 — Fare Distribution&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddcx0dsmw1m8jpj5dj77.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddcx0dsmw1m8jpj5dj77.png" alt=" " width="669" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The fare histogram reveals extreme economic inequality on board. The distribution is heavily right-skewed: the vast majority of passengers paid low fares (under £50), while a small number paid extremely high amounts (up to about £512).&lt;/p&gt;

&lt;p&gt;This roughly maps to passenger class: 3rd-class passengers paid low fares, 1st-class passengers paid high fares. And as we saw in Chart 3, class directly correlated with survival.&lt;/p&gt;

&lt;p&gt;Chart 7 — Gender × Class Survival Heatmap&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pwlawpm6u71c4osf148.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pwlawpm6u71c4osf148.png" alt=" " width="672" height="571"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the most powerful chart in the project. Instead of looking at gender and class separately, I combined them into a single heatmap using a pivot table.&lt;/p&gt;

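&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Mean of the 0/1 Survived column per (Sex, Pclass) cell is that group's survival rate
pivot = df.pivot_table(values="Survived", index="Sex", columns="Pclass", aggfunc="mean")
sns.heatmap(pivot, annot=True, fmt=".2f", cmap="Blues")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;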

&lt;p&gt;The results:&lt;/p&gt;

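&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Sex&lt;/th&gt;
&lt;th&gt;1st Class&lt;/th&gt;
&lt;th&gt;2nd Class&lt;/th&gt;
&lt;th&gt;3rd Class&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Female&lt;/td&gt;
&lt;td&gt;~0.97&lt;/td&gt;
&lt;td&gt;~0.92&lt;/td&gt;
&lt;td&gt;~0.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Male&lt;/td&gt;
&lt;td&gt;~0.37&lt;/td&gt;
&lt;td&gt;~0.16&lt;/td&gt;
&lt;td&gt;~0.14&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;1st-class females had a ~97% survival rate. They were almost certain to survive. 3rd-class males had a ~14% survival rate: only about one in seven survived.&lt;/p&gt;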

&lt;p&gt;The difference between those two groups is 83 percentage points — from the same disaster, on the same ship, at the same time.&lt;/p&gt;

&lt;p&gt;Phase 5 — Key Findings and Conclusion&lt;br&gt;
Key Findings&lt;br&gt;
Only 38% of passengers survived — the majority of people on board did not make it.&lt;/p&gt;

&lt;p&gt;Gender was the strongest single factor: female passengers survived at ~74% vs ~19% for males, consistent with the "women and children first" evacuation protocol being followed.&lt;/p&gt;

&lt;p&gt;Passenger class was strongly associated with survival: 1st class survived at ~63%, 3rd class at only ~24%. The class you traveled in, and with it your likely distance from the lifeboats, went hand in hand with your chances.&lt;/p&gt;

&lt;p&gt;The combined effect was extreme — 1st-class females had a ~97% survival rate while 3rd-class males had only ~14%. The gap between best and worst case is 83 percentage points.&lt;/p&gt;

&lt;p&gt;Children showed higher survival rates — the overlapping age histogram showed young children were more likely to survive relative to adults.&lt;/p&gt;

&lt;p&gt;Fare inequality mirrored class inequality: most passengers paid very little, a few paid enormous amounts, and higher fares went with the higher classes that had better survival rates.&lt;/p&gt;

&lt;p&gt;Conclusion&lt;br&gt;
The Titanic data tells a clear story: survival was not random. Gender and passenger class were the two dominant factors, and when combined, they produced an extreme range of outcomes. A 1st-class female passenger had near-certain survival. A 3rd-class male passenger had roughly a one-in-seven chance.&lt;/p&gt;

&lt;p&gt;The "women and children first" protocol shows up clearly in the data. But access to the upper decks, proximity to lifeboats, and crew assistance were all filtered through socioeconomic status. Wealthier passengers had structural advantages that translated directly into survival.&lt;/p&gt;

&lt;p&gt;This project taught me how to move from raw data to real insight — not just running code, but understanding what the numbers actually mean about human lives.&lt;/p&gt;

&lt;p&gt;What's Next&lt;br&gt;
This is Project 2 of my data analyst portfolio. I'm continuing to build projects that cover real-world datasets and develop my skills in Python, Pandas, and visualization.&lt;/p&gt;

&lt;p&gt;Connect with me:&lt;/p&gt;

&lt;p&gt;🔗 GitHub: github.com/sandipsubedi0&lt;/p&gt;

&lt;p&gt;💼 LinkedIn: linkedin.com/in/sandip-subedi-5694b136a&lt;/p&gt;

&lt;p&gt;📸 Instagram: &lt;a class="mentioned-user" href="https://dev.to/sandipsubedi0"&gt;@sandipsubedi0&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading. If you found this useful, share it with someone learning data analysis.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>data</category>
      <category>analyst</category>
    </item>
  </channel>
</rss>
