<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Loi2008</title>
    <description>The latest articles on DEV Community by Loi2008 (@loi2008).</description>
    <link>https://dev.to/loi2008</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3407241%2F1cee6cba-91c1-4dd3-a88d-5e681e2f0b95.jpeg</url>
      <title>DEV Community: Loi2008</title>
      <link>https://dev.to/loi2008</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/loi2008"/>
    <language>en</language>
    <item>
      <title>Predicting Survival on the Titanic: A Machine Learning Approach</title>
      <dc:creator>Loi2008</dc:creator>
      <pubDate>Sun, 05 Oct 2025 09:44:09 +0000</pubDate>
      <link>https://dev.to/loi2008/predicting-survival-on-the-titanic-a-machine-learning-approach-24a5</link>
      <guid>https://dev.to/loi2008/predicting-survival-on-the-titanic-a-machine-learning-approach-24a5</guid>
      <description>&lt;h2&gt;
  
  
  1. Introduction
&lt;/h2&gt;

&lt;p&gt;One of the most tragic maritime catastrophes in history, the 1912 sinking of the RMS Titanic claimed a great number of lives. Beyond the disaster's immense scope, scholars have long been captivated by the intricate interplay of factors that influenced survival rates.&lt;/p&gt;

&lt;p&gt;This article examines the Titanic dataset, a comprehensive collection of passenger data, in order to build a predictive model. By examining characteristics such as age, gender, and passenger class, we aim to identify the primary factors influencing survival and to develop a robust model that forecasts a person's chances of surviving the disaster.&lt;/p&gt;

&lt;p&gt;In addition to illuminating historical trends, this analysis demonstrates how effectively machine learning can derive meaningful predictions from complex real-world data.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Understanding the Dataset
&lt;/h2&gt;

&lt;p&gt;The dataset used (&lt;a href="https://github.com/Loi2008/Data_Science_Assignments/blob/main/tested.csv" rel="noopener noreferrer"&gt;https://github.com/Loi2008/Data_Science_Assignments/blob/main/tested.csv&lt;/a&gt;) contains information about Titanic passengers. The goal is to build a model that predicts the likelihood of survival (the Survived column: 0 = No, 1 = Yes) from the other features. The dataset contains the following columns: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PassengerId:&lt;/strong&gt; Unique identifier for each passenger.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Survived:&lt;/strong&gt; Survival (0 = No, 1 = Yes) - &lt;em&gt;&lt;strong&gt;Target variable&lt;/strong&gt;&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pclass:&lt;/strong&gt; Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Name:&lt;/strong&gt; Passenger's name.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sex:&lt;/strong&gt; Sex (male/female).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Age:&lt;/strong&gt; Age in years.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SibSp:&lt;/strong&gt; Number of siblings/spouses aboard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parch:&lt;/strong&gt; Number of parents/children aboard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ticket:&lt;/strong&gt; Ticket number.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fare:&lt;/strong&gt; Passenger fare.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cabin:&lt;/strong&gt; Cabin number.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embarked:&lt;/strong&gt; Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. Building the Model
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Goal:&lt;/em&gt;&lt;/strong&gt; Predict each passenger's survival likelihood (the 'Survived' column) from the other passenger features. Below are the steps involved, together with the Python code for each step:&lt;/p&gt;

&lt;h3&gt;
  
  
  a. Import the Required Libraries
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler, OneHotEncoder 
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  b. Suppress Warnings for Cleaner Output
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;warnings.filterwarnings('ignore')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  c. Load the Data
&lt;/h3&gt;

&lt;p&gt;Read the CSV file into a pandas DataFrame.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df = pd.read_csv (r"tested.csv")
df.head()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Output&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;PassengerId&lt;/th&gt;
&lt;th&gt;Survived&lt;/th&gt;
&lt;th&gt;Pclass&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Sex&lt;/th&gt;
&lt;th&gt;Age&lt;/th&gt;
&lt;th&gt;SibSp&lt;/th&gt;
&lt;th&gt;Parch&lt;/th&gt;
&lt;th&gt;Ticket&lt;/th&gt;
&lt;th&gt;Fare&lt;/th&gt;
&lt;th&gt;Cabin&lt;/th&gt;
&lt;th&gt;Embarked&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;892&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Kelly, Mr. James&lt;/td&gt;
&lt;td&gt;male&lt;/td&gt;
&lt;td&gt;34.5&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;330911&lt;/td&gt;
&lt;td&gt;7.8292&lt;/td&gt;
&lt;td&gt;NaN&lt;/td&gt;
&lt;td&gt;Q&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;893&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Wilkes, Mrs. James (Ellen Needs)&lt;/td&gt;
&lt;td&gt;female&lt;/td&gt;
&lt;td&gt;47.0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;363272&lt;/td&gt;
&lt;td&gt;7.0000&lt;/td&gt;
&lt;td&gt;NaN&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;894&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Myles, Mr. Thomas Francis&lt;/td&gt;
&lt;td&gt;male&lt;/td&gt;
&lt;td&gt;62.0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;240276&lt;/td&gt;
&lt;td&gt;9.6875&lt;/td&gt;
&lt;td&gt;NaN&lt;/td&gt;
&lt;td&gt;Q&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;895&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Wirz, Mr. Albert&lt;/td&gt;
&lt;td&gt;male&lt;/td&gt;
&lt;td&gt;27.0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;315154&lt;/td&gt;
&lt;td&gt;8.6625&lt;/td&gt;
&lt;td&gt;NaN&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;896&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Hirvonen, Mrs. Alexander (Helga E Lindqvist)&lt;/td&gt;
&lt;td&gt;female&lt;/td&gt;
&lt;td&gt;22.0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;3101298&lt;/td&gt;
&lt;td&gt;12.2875&lt;/td&gt;
&lt;td&gt;NaN&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  d. Exploratory Data Analysis (EDA)
&lt;/h3&gt;

&lt;p&gt;Understand the data (data types, missing values and distributions).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df.info()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Output&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;br&gt;
RangeIndex: 418 entries, 0 to 417&lt;br&gt;
Data columns (total 12 columns):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Column&lt;/th&gt;
&lt;th&gt;Non-Null Count&lt;/th&gt;
&lt;th&gt;Dtype&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;PassengerId&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;int64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Survived&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;int64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Pclass&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;int64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Name&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Sex&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Age&lt;/td&gt;
&lt;td&gt;332 non-null&lt;/td&gt;
&lt;td&gt;float64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;SibSp&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;int64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Parch&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;int64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Ticket&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Fare&lt;/td&gt;
&lt;td&gt;417 non-null&lt;/td&gt;
&lt;td&gt;float64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Cabin&lt;/td&gt;
&lt;td&gt;91 non-null&lt;/td&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Embarked&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;dtypes: float64(2), int64(5), object(5)&lt;br&gt;
memory usage: 39.3+ KB&lt;/p&gt;

&lt;p&gt;Handling missing values: first, count the missing values in each column.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;missing = df.isnull().sum()
non_null = df.notnull().sum()
total = len(df)

# Build summary DataFrame
summary = pd.DataFrame({
    "Non-Null Count": non_null,
    "Missing Values": missing,
    "Missing %": (missing / total * 100).round(1),
    "Dtype": df.dtypes
})

# Rename index to 'Column'
summary.index.name = "Column"

# Display
summary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Output&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Column&lt;/th&gt;
&lt;th&gt;Non-Null Count&lt;/th&gt;
&lt;th&gt;Missing Values&lt;/th&gt;
&lt;th&gt;Missing %&lt;/th&gt;
&lt;th&gt;Dtype&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PassengerId&lt;/td&gt;
&lt;td&gt;418&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.0%&lt;/td&gt;
&lt;td&gt;int64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Survived&lt;/td&gt;
&lt;td&gt;418&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.0%&lt;/td&gt;
&lt;td&gt;int64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pclass&lt;/td&gt;
&lt;td&gt;418&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.0%&lt;/td&gt;
&lt;td&gt;int64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Name&lt;/td&gt;
&lt;td&gt;418&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.0%&lt;/td&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sex&lt;/td&gt;
&lt;td&gt;418&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.0%&lt;/td&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Age&lt;/td&gt;
&lt;td&gt;332&lt;/td&gt;
&lt;td&gt;86&lt;/td&gt;
&lt;td&gt;20.6%&lt;/td&gt;
&lt;td&gt;float64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SibSp&lt;/td&gt;
&lt;td&gt;418&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.0%&lt;/td&gt;
&lt;td&gt;int64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parch&lt;/td&gt;
&lt;td&gt;418&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.0%&lt;/td&gt;
&lt;td&gt;int64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ticket&lt;/td&gt;
&lt;td&gt;418&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.0%&lt;/td&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fare&lt;/td&gt;
&lt;td&gt;417&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.2%&lt;/td&gt;
&lt;td&gt;float64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cabin&lt;/td&gt;
&lt;td&gt;91&lt;/td&gt;
&lt;td&gt;327&lt;/td&gt;
&lt;td&gt;78.2%&lt;/td&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embarked&lt;/td&gt;
&lt;td&gt;418&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.0%&lt;/td&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Create a copy of the dataset so that later modifications do not affect the original DataFrame (df).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df_processed = df.copy()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dataset's statistical distribution:&lt;/p&gt;
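&lt;p&gt;The snippet that produced this summary is not shown in the article; presumably it was a call to describe(). A minimal sketch on toy values (not the actual dataset):&lt;/p&gt;

```python
import pandas as pd

# Toy stand-in for df_processed (values are illustrative, not the real data)
df_processed = pd.DataFrame({
    "Age": [34.5, 47.0, 62.0, 27.0, 22.0],
    "Fare": [7.8292, 7.0000, 9.6875, 8.6625, 12.2875],
})

# describe() reports count, mean, std, min, quartiles and max per numeric column
summary = df_processed.describe().round(4)
print(summary)
```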

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Output&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Survived&lt;/th&gt;
&lt;th&gt;Pclass&lt;/th&gt;
&lt;th&gt;Sex&lt;/th&gt;
&lt;th&gt;Age&lt;/th&gt;
&lt;th&gt;SibSp&lt;/th&gt;
&lt;th&gt;Parch&lt;/th&gt;
&lt;th&gt;Fare&lt;/th&gt;
&lt;th&gt;Embarked_C&lt;/th&gt;
&lt;th&gt;Embarked_Q&lt;/th&gt;
&lt;th&gt;Embarked_S&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;count&lt;/td&gt;
&lt;td&gt;418.0000&lt;/td&gt;
&lt;td&gt;418.0000&lt;/td&gt;
&lt;td&gt;418.0000&lt;/td&gt;
&lt;td&gt;418.0000&lt;/td&gt;
&lt;td&gt;418.0000&lt;/td&gt;
&lt;td&gt;418.0000&lt;/td&gt;
&lt;td&gt;418.0000&lt;/td&gt;
&lt;td&gt;418.0000&lt;/td&gt;
&lt;td&gt;418.0000&lt;/td&gt;
&lt;td&gt;418.0000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mean&lt;/td&gt;
&lt;td&gt;0.3636&lt;/td&gt;
&lt;td&gt;2.2656&lt;/td&gt;
&lt;td&gt;0.6364&lt;/td&gt;
&lt;td&gt;29.5993&lt;/td&gt;
&lt;td&gt;0.4474&lt;/td&gt;
&lt;td&gt;0.3923&lt;/td&gt;
&lt;td&gt;35.5765&lt;/td&gt;
&lt;td&gt;0.2440&lt;/td&gt;
&lt;td&gt;0.1100&lt;/td&gt;
&lt;td&gt;0.6459&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;std&lt;/td&gt;
&lt;td&gt;0.4816&lt;/td&gt;
&lt;td&gt;0.8418&lt;/td&gt;
&lt;td&gt;0.4816&lt;/td&gt;
&lt;td&gt;12.7038&lt;/td&gt;
&lt;td&gt;0.8968&lt;/td&gt;
&lt;td&gt;0.9814&lt;/td&gt;
&lt;td&gt;55.8501&lt;/td&gt;
&lt;td&gt;0.4300&lt;/td&gt;
&lt;td&gt;0.3133&lt;/td&gt;
&lt;td&gt;0.4788&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;min&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;1.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;0.1700&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;25%&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;1.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;23.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;7.8958&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;3.0000&lt;/td&gt;
&lt;td&gt;1.0000&lt;/td&gt;
&lt;td&gt;27.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;14.4542&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;1.0000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;75%&lt;/td&gt;
&lt;td&gt;1.0000&lt;/td&gt;
&lt;td&gt;3.0000&lt;/td&gt;
&lt;td&gt;1.0000&lt;/td&gt;
&lt;td&gt;35.7500&lt;/td&gt;
&lt;td&gt;1.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;31.4719&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;0.0000&lt;/td&gt;
&lt;td&gt;1.0000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;max&lt;/td&gt;
&lt;td&gt;1.0000&lt;/td&gt;
&lt;td&gt;3.0000&lt;/td&gt;
&lt;td&gt;1.0000&lt;/td&gt;
&lt;td&gt;76.0000&lt;/td&gt;
&lt;td&gt;8.0000&lt;/td&gt;
&lt;td&gt;9.0000&lt;/td&gt;
&lt;td&gt;512.3292&lt;/td&gt;
&lt;td&gt;1.0000&lt;/td&gt;
&lt;td&gt;1.0000&lt;/td&gt;
&lt;td&gt;1.0000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Impute the missing values with the median. SimpleImputer is used instead of fillna() because it is more robust and can be reused on new data. The median is preferred over the mean because the data is skewed: in the statistical distribution above, the means of "Age" and "Fare" differ from their medians.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;imputer_age = SimpleImputer(strategy='median')
df_processed['Age'] = imputer_age.fit_transform(df_processed[['Age']])
imputer_fare = SimpleImputer(strategy='median')
df_processed['Fare'] = imputer_fare.fit_transform(df_processed[['Fare']])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
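&lt;p&gt;A quick way to confirm what SimpleImputer(strategy='median') does, on toy numbers rather than the article's data:&lt;/p&gt;

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy column with one missing value; the median of [10, 20, 40] is 20
s = pd.DataFrame({"Age": [10.0, 20.0, np.nan, 40.0]})
imp = SimpleImputer(strategy="median")
filled = imp.fit_transform(s)
print(filled.ravel())  # NaN is replaced by 20.0
```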



&lt;p&gt;Drop the Cabin column: it has a high percentage of missing values (78.2%), and using it directly would require complex feature engineering.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df_processed = df_processed.drop('Cabin', axis=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop irrelevant columns.&lt;br&gt;
PassengerId, Name, and Ticket are unique identifiers or free-text fields that do not directly contribute to survival prediction in basic models.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df_processed = df_processed.drop(['PassengerId', 'Name', 'Ticket'], axis=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Encode categorical columns - 'Sex' and 'Embarked'.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Sex:&lt;/em&gt;&lt;/strong&gt; A binary categorical feature (male/female). LabelEncoder converts male to 1 and female to 0 so that numerical models can use it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Embarked:&lt;/em&gt;&lt;/strong&gt; Has three categories (S, C, Q). OneHotEncoder converts this into separate binary columns - Embarked_C, Embarked_Q, Embarked_S. This prevents the model from assuming an ordinal relationship among the categories.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# 'Sex': Use LabelEncoder (binary feature)
le = LabelEncoder()
df_processed['Sex'] = le.fit_transform(df_processed['Sex']) # male=1, female=0. Or male=0, female=1 depending on internal sorting.

# 'Embarked': Use OneHotEncoder
print("Unique Embarked values before OneHotEncoding:", df_processed['Embarked'].unique())
ohe = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
embarked_encoded = ohe.fit_transform(df_processed[['Embarked']])
embarked_df = pd.DataFrame(embarked_encoded, columns=ohe.get_feature_names_out(['Embarked']), index=df_processed.index)
df_processed = pd.concat([df_processed.drop('Embarked', axis=1), embarked_df], axis=1)

print("DataFrame after preprocessing. First 5 rows:")
print(df_processed.head())
print("\n")
print("DataFrame Info after preprocessing:")
df_processed.info()
print("\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Output&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Survived&lt;/th&gt;
&lt;th&gt;Pclass&lt;/th&gt;
&lt;th&gt;Sex&lt;/th&gt;
&lt;th&gt;Age&lt;/th&gt;
&lt;th&gt;SibSp&lt;/th&gt;
&lt;th&gt;Parch&lt;/th&gt;
&lt;th&gt;Fare&lt;/th&gt;
&lt;th&gt;Embarked_C&lt;/th&gt;
&lt;th&gt;Embarked_Q&lt;/th&gt;
&lt;th&gt;Embarked_S&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;34.5&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;7.8292&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;47.0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;7.0000&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;62.0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;9.6875&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;27.0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;8.6625&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;22.0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;12.2875&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;The final columns&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Column&lt;/th&gt;
&lt;th&gt;Non-Null Count&lt;/th&gt;
&lt;th&gt;Dtype&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Survived&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;int64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Pclass&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;int64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Sex&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;int64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Age&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;float64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;SibSp&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;int64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Parch&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;int64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Fare&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;float64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Embarked_C&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;float64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Embarked_Q&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;float64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Embarked_S&lt;/td&gt;
&lt;td&gt;418 non-null&lt;/td&gt;
&lt;td&gt;float64&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  e. Splitting Data
&lt;/h3&gt;

&lt;p&gt;The dataset is divided into training and testing sets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print("Splitting Data into Training and Test Sets")
X = df_processed.drop('Survived', axis=1) 
y = df_processed['Survived']             

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"y_test shape: {y_test.shape}")
print("\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Output&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
The data is split into training and test sets as illustrated:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;X_train shape: (334, 9)&lt;/li&gt;
&lt;li&gt;X_test shape: (84, 9)&lt;/li&gt;
&lt;li&gt;y_train shape: (334,)&lt;/li&gt;
&lt;li&gt;y_test shape: (84,)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Elaboration on the splitting code&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;X = df_processed.drop('Survived', axis=1): Creates the feature matrix X by dropping the target variable.&lt;/li&gt;
&lt;li&gt;y = df_processed['Survived']: Creates the target vector y.&lt;/li&gt;
&lt;li&gt;train_test_split(X, y, test_size=0.2, random_state=42, stratify=y): Splits the data into 80% for training and 20% for testing.&lt;/li&gt;
&lt;li&gt;random_state: Ensures reproducibility of the split.&lt;/li&gt;
&lt;li&gt;stratify=y: Important for classification problems to ensure that the proportion of 'Survived' (0s and 1s) is roughly the same in both training and test sets.&lt;/li&gt;
&lt;/ul&gt;
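&lt;p&gt;The effect of stratify=y can be verified on synthetic labels (not the Titanic data): both splits keep the same class ratio.&lt;/p&gt;

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with an 80/20 class imbalance
y = np.array([0] * 80 + [1] * 20)
X = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
# Both splits preserve the 20% positive rate
print(y_tr.mean(), y_te.mean())
```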
&lt;h3&gt;
  
  
  f. Scaling Numerical Features (Columns)
&lt;/h3&gt;

&lt;p&gt;StandardScaler standardizes features by removing the mean and scaling to unit variance. This is important for algorithms that are sensitive to the scale of input features, such as Logistic Regression, because it prevents features with larger values from dominating the learning process and ensures each feature contributes comparably.&lt;br&gt;
&lt;/p&gt;
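&lt;p&gt;As a sanity check of what StandardScaler computes - z = (x - mean) / std, using the population standard deviation - here is a toy example unrelated to the Titanic data:&lt;/p&gt;

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[1.0], [2.0], [3.0], [4.0]])
z = StandardScaler().fit_transform(x)

# Manual z-score with population std (ddof=0), which is what sklearn uses
manual = (x - x.mean()) / x.std()
print(np.allclose(z, manual))  # True
```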

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("X_train_scaled (first 5 rows):")
print(X_train_scaled[:5])
print("\n")
print("X_test_scaled (first 5 rows):")
print(X_test_scaled[:5])
print("\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Output&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
X_train_scaled (first 5 rows)&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Row&lt;/th&gt;
&lt;th&gt;Pclass&lt;/th&gt;
&lt;th&gt;Sex&lt;/th&gt;
&lt;th&gt;Age&lt;/th&gt;
&lt;th&gt;SibSp&lt;/th&gt;
&lt;th&gt;Parch&lt;/th&gt;
&lt;th&gt;Fare&lt;/th&gt;
&lt;th&gt;Embarked_C&lt;/th&gt;
&lt;th&gt;Embarked_Q&lt;/th&gt;
&lt;th&gt;Embarked_S&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.85435834&lt;/td&gt;
&lt;td&gt;0.75370758&lt;/td&gt;
&lt;td&gt;-0.21477642&lt;/td&gt;
&lt;td&gt;-0.48043064&lt;/td&gt;
&lt;td&gt;-0.41184087&lt;/td&gt;
&lt;td&gt;-0.50912957&lt;/td&gt;
&lt;td&gt;-0.54736724&lt;/td&gt;
&lt;td&gt;-0.33665016&lt;/td&gt;
&lt;td&gt;0.70551956&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0.85435834&lt;/td&gt;
&lt;td&gt;0.75370758&lt;/td&gt;
&lt;td&gt;-0.68236136&lt;/td&gt;
&lt;td&gt;-0.48043064&lt;/td&gt;
&lt;td&gt;-0.41184087&lt;/td&gt;
&lt;td&gt;-0.49535877&lt;/td&gt;
&lt;td&gt;-0.54736724&lt;/td&gt;
&lt;td&gt;-0.33665016&lt;/td&gt;
&lt;td&gt;0.70551956&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0.85435834&lt;/td&gt;
&lt;td&gt;0.75370758&lt;/td&gt;
&lt;td&gt;-0.21477642&lt;/td&gt;
&lt;td&gt;-0.48043064&lt;/td&gt;
&lt;td&gt;-0.41184087&lt;/td&gt;
&lt;td&gt;-0.49615131&lt;/td&gt;
&lt;td&gt;-0.54736724&lt;/td&gt;
&lt;td&gt;2.97044263&lt;/td&gt;
&lt;td&gt;-1.41739515&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0.85435834&lt;/td&gt;
&lt;td&gt;0.75370758&lt;/td&gt;
&lt;td&gt;-1.61753124&lt;/td&gt;
&lt;td&gt;-0.48043064&lt;/td&gt;
&lt;td&gt;0.67126820&lt;/td&gt;
&lt;td&gt;-0.57539142&lt;/td&gt;
&lt;td&gt;-0.54736724&lt;/td&gt;
&lt;td&gt;-0.33665016&lt;/td&gt;
&lt;td&gt;0.70551956&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0.85435834&lt;/td&gt;
&lt;td&gt;-1.32677450&lt;/td&gt;
&lt;td&gt;-0.21477642&lt;/td&gt;
&lt;td&gt;-0.48043064&lt;/td&gt;
&lt;td&gt;-0.41184087&lt;/td&gt;
&lt;td&gt;-0.49564602&lt;/td&gt;
&lt;td&gt;-0.54736724&lt;/td&gt;
&lt;td&gt;2.97044263&lt;/td&gt;
&lt;td&gt;-1.41739515&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;X_test_scaled (first 5 rows)&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Row&lt;/th&gt;
&lt;th&gt;Pclass&lt;/th&gt;
&lt;th&gt;Sex&lt;/th&gt;
&lt;th&gt;Age&lt;/th&gt;
&lt;th&gt;SibSp&lt;/th&gt;
&lt;th&gt;Parch&lt;/th&gt;
&lt;th&gt;Fare&lt;/th&gt;
&lt;th&gt;Embarked_C&lt;/th&gt;
&lt;th&gt;Embarked_Q&lt;/th&gt;
&lt;th&gt;Embarked_S&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.85435834&lt;/td&gt;
&lt;td&gt;-1.32677450&lt;/td&gt;
&lt;td&gt;-0.21477642&lt;/td&gt;
&lt;td&gt;-0.48043064&lt;/td&gt;
&lt;td&gt;-0.41184087&lt;/td&gt;
&lt;td&gt;-0.49615131&lt;/td&gt;
&lt;td&gt;-0.54736724&lt;/td&gt;
&lt;td&gt;2.97044263&lt;/td&gt;
&lt;td&gt;-1.41739515&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;-1.49424813&lt;/td&gt;
&lt;td&gt;-1.32677450&lt;/td&gt;
&lt;td&gt;-0.37063806&lt;/td&gt;
&lt;td&gt;0.61116007&lt;/td&gt;
&lt;td&gt;-0.41184087&lt;/td&gt;
&lt;td&gt;0.32912293&lt;/td&gt;
&lt;td&gt;1.82692702&lt;/td&gt;
&lt;td&gt;-0.33665016&lt;/td&gt;
&lt;td&gt;-1.41739515&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0.85435834&lt;/td&gt;
&lt;td&gt;0.75370758&lt;/td&gt;
&lt;td&gt;-0.21477642&lt;/td&gt;
&lt;td&gt;-0.48043064&lt;/td&gt;
&lt;td&gt;-0.41184087&lt;/td&gt;
&lt;td&gt;-0.49362833&lt;/td&gt;
&lt;td&gt;-0.54736724&lt;/td&gt;
&lt;td&gt;-0.33665016&lt;/td&gt;
&lt;td&gt;0.70551956&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;-0.31994489&lt;/td&gt;
&lt;td&gt;-1.32677450&lt;/td&gt;
&lt;td&gt;-0.76029218&lt;/td&gt;
&lt;td&gt;-0.48043064&lt;/td&gt;
&lt;td&gt;-0.41184087&lt;/td&gt;
&lt;td&gt;0.00567507&lt;/td&gt;
&lt;td&gt;-0.54736724&lt;/td&gt;
&lt;td&gt;-0.33665016&lt;/td&gt;
&lt;td&gt;0.70551956&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;-1.49424813&lt;/td&gt;
&lt;td&gt;0.75370758&lt;/td&gt;
&lt;td&gt;-0.37063806&lt;/td&gt;
&lt;td&gt;-0.48043064&lt;/td&gt;
&lt;td&gt;-0.41184087&lt;/td&gt;
&lt;td&gt;-0.18034678&lt;/td&gt;
&lt;td&gt;1.82692702&lt;/td&gt;
&lt;td&gt;-0.33665016&lt;/td&gt;
&lt;td&gt;-1.41739515&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  4. Choosing and Training the Model
&lt;/h2&gt;

&lt;p&gt;Two models were chosen:&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Logistic Regression:&lt;/em&gt;&lt;/strong&gt; Chosen for its clear interpretability, providing insights into how each factor linearly influences survival probability, and serving as a strong, efficient baseline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print("Training Logistic Regression Model...")
log_reg_model = LogisticRegression(random_state=42, solver='liblinear')
log_reg_model.fit(X_train_scaled, y_train)
print("Logistic Regression Model Trained.\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;log_reg_model.fit(X_train_scaled, y_train): Trains the model using the scaled training data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Random Forest Classifier:&lt;/em&gt;&lt;/strong&gt; An ensemble method that often achieves higher accuracy by capturing complex, non-linear relationships and interactions without extensive data preprocessing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print("Training Random Forest Classifier Model...")
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
# Random Forest is less sensitive to feature scaling, so we can use unscaled X
rf_model.fit(X_train, y_train)
print("Random Forest Classifier Model Trained.\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;rf_model.fit(X_train, y_train): Trains the model. Random Forests are less sensitive to feature scaling, so we can use the unscaled X_train here.&lt;/p&gt;

&lt;p&gt;Together, they allow for a comprehensive analysis, balancing transparency with predictive power to understand Titanic survival.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Evaluating the Models
&lt;/h2&gt;

&lt;p&gt;The predictions on the scaled test set for Logistic Regression are generated (y_pred_log_reg = log_reg_model.predict(X_test_scaled)).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;y_pred_log_reg = log_reg_model.predict(X_test_scaled)
accuracy_log_reg = accuracy_score(y_test, y_pred_log_reg)
print(f"Logistic Regression Accuracy: {accuracy_log_reg:.4f}")
print("Logistic Regression Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_log_reg))
print("Logistic Regression Classification Report:")
print(classification_report(y_test, y_pred_log_reg))
print("\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Output&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frnwzpcjij1nkqq98s3ix.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frnwzpcjij1nkqq98s3ix.png" alt="Trained Logistic Regression Model Output" width="800" height="508"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The predictions on the unscaled test set for Random Forest are generated (y_pred_rf = rf_model.predict(X_test)).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;y_pred_rf = rf_model.predict(X_test)
accuracy_rf = accuracy_score(y_test, y_pred_rf)
print(f"Random Forest Accuracy: {accuracy_rf:.4f}")
print("Random Forest Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_rf))
print("Random Forest Classification Report:")
print(classification_report(y_test, y_pred_rf))
print("\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Output&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ydkutgaq7l1jltqt63p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ydkutgaq7l1jltqt63p.png" alt="Trained Random Forest Model Output" width="800" height="483"&gt;&lt;/a&gt;&lt;br&gt;
From the output of the two models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;accuracy_score: Calculates the proportion of correctly classified instances.&lt;/li&gt;
&lt;li&gt;confusion_matrix: Shows the number of true positives, true negatives, false positives, and false negatives.&lt;/li&gt;
&lt;li&gt;classification_report: Provides a detailed summary of precision, recall, f1-score, and support for each class.&lt;/li&gt;
&lt;/ul&gt;
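&lt;p&gt;The three metrics above can be illustrated on a tiny hand-made example (toy labels, not model output):&lt;/p&gt;

```python
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Toy labels: 8 passengers, 6 classified correctly
y_true = [0, 1, 1, 0, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]

print(accuracy_score(y_true, y_pred))    # 0.75 (6 of 8 correct)
print(confusion_matrix(y_true, y_pred))  # rows = actual class, columns = predicted class
print(classification_report(y_true, y_pred))
```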
&lt;h2&gt;
  
  
  6. Making Predictions Using Sample Data
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Using a hypothetical new single passenger, then multiple passengers&lt;/li&gt;
&lt;li&gt;The data is processed through the same steps as the training set:

&lt;ul&gt;
&lt;li&gt;Removing irrelevant columns&lt;/li&gt;
&lt;li&gt;Imputing missing values, if any&lt;/li&gt;
&lt;li&gt;Encoding categorical features using the &lt;em&gt;fitted&lt;/em&gt; encoders&lt;/li&gt;
&lt;li&gt;Ensuring the column order matches X_train; this is crucial (create a DataFrame with the same columns as X_train, then fill it)&lt;/li&gt;
&lt;li&gt;Predicting with Logistic Regression and Random Forest&lt;/li&gt;
&lt;li&gt;Visualizing key relationships&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  a. Capturing the data
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;**_Using Single Passenger_**
new_passenger_features_raw = pd.DataFrame({
    'Pclass': [1],
    'Name': ['New, Mrs. Example (Test User)'], # Name added for consistency, but will be dropped
    'Sex': ['female'],
    'Age': [30],
    'SibSp': [0],
    'Parch': [0],
    'Ticket': ['TEST12345'], # Ticket added for consistency, but will be dropped
    'Fare': [50],
    'Cabin': [None], # Cabin added for consistency, but will be dropped
    'Embarked': ['S']
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  b. Dropping irrelevant columns
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_passenger_processed = new_passenger_features_raw.drop(['PassengerId', 'Name', 'Ticket', 'Cabin'], axis=1, errors='ignore')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  c. Imputing missing values
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_passenger_processed['Age'] = imputer_age.transform(new_passenger_processed[['Age']])
new_passenger_processed['Fare'] = imputer_fare.transform(new_passenger_processed[['Fare']])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  d. Encoding categorical features
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_passenger_processed['Sex'] = le.transform(new_passenger_processed['Sex'])
embarked_new = ohe.transform(new_passenger_processed[['Embarked']])
embarked_new_df = pd.DataFrame(embarked_new, columns=ohe.get_feature_names_out(['Embarked']))
new_passenger_processed = pd.concat([new_passenger_processed.drop('Embarked', axis=1), embarked_new_df], axis=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  e. Ensure column order matches training data for X_train
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_passenger_final = pd.DataFrame(columns=X.columns)
new_passenger_final = pd.concat([new_passenger_final, new_passenger_processed], ignore_index=True)
new_passenger_final = new_passenger_final.fillna(0) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
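&lt;p&gt;The create-then-concat-then-fillna pattern above can also be written in one call with DataFrame.reindex, which adds any missing training columns and zero-fills them. A sketch with toy column names (not the article's actual X.columns):&lt;/p&gt;

```python
import pandas as pd

# Toy stand-ins: the training columns, and a processed row missing two dummy columns
train_columns = ['Pclass', 'Sex', 'Embarked_C', 'Embarked_Q', 'Embarked_S']
row = pd.DataFrame({'Pclass': [1], 'Sex': [0], 'Embarked_S': [1.0]})

# reindex adds the absent training columns and fills them with 0, in column order
aligned = row.reindex(columns=train_columns, fill_value=0)
print(aligned)
```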

&lt;h3&gt;
  
  
  f. Prediction with the models
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Logistic Regression using scaled data&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_passenger_scaled = scaler.transform(new_passenger_final)
log_reg_prediction = log_reg_model.predict(new_passenger_scaled)
print(f"Logistic Regression predicts survival for new passenger: {'Survived' if log_reg_prediction[0] == 1 else 'Not Survived'}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt; Logistic Regression predicts survival for new passenger: &lt;strong&gt;&lt;em&gt;Survived&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Random Forest using unscaled data&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rf_prediction = rf_model.predict(new_passenger_final)
print(f"Random Forest predicts survival for new passenger: {'Survived' if rf_prediction[0] == 1 else 'Not Survived'}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt; Random Forest predicts survival for new passenger: &lt;strong&gt;&lt;em&gt;Survived&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Using Multiple Passengers&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  a. Capture multiple passengers' data
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_passengers_features_raw = pd.DataFrame({
    'Pclass': [1, 2, 3],
    'Name': [
        'New, Mrs. Example (Test User 1)',
        'New, Mr. Sample (Test User 2)',
        'New, Miss Demo (Test User 3)'
    ],
    'Sex': ['female', 'male', 'female'],
    'Age': [30, None, 22],        # Example: missing Age for passenger 2
    'SibSp': [0, 1, 0],
    'Parch': [0, 0, 1],
    'Ticket': ['TEST12345', 'TEST12346', 'TEST12347'],
    'Fare': [50, None, 15],       # Example: missing Fare for passenger 2
    'Cabin': [None, None, None],
    'Embarked': ['S', 'C', 'Q']
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  b. Dropping irrelevant columns
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_passengers_processed = new_passengers_features_raw.drop(
    ['PassengerId', 'Name', 'Ticket', 'Cabin'],
    axis=1,
    errors='ignore'
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  c. Imputing missing values using fitted imputers
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_passengers_processed['Age'] = pd.Series(
    imputer_age.transform(new_passengers_processed[['Age']]).flatten(),
    index=new_passengers_processed.index
)
new_passengers_processed['Fare'] = pd.Series(
    imputer_fare.transform(new_passengers_processed[['Fare']]).flatten(),
    index=new_passengers_processed.index
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  d. Encoding categorical features
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Label encode 'Sex'
new_passengers_processed['Sex'] = le.transform(new_passengers_processed['Sex'])

#One-hot encode 'Embarked'
embarked_new = ohe.transform(new_passengers_processed[['Embarked']])
embarked_new_df = pd.DataFrame(
    embarked_new, 
    columns=ohe.get_feature_names_out(['Embarked']),
    index=new_passengers_processed.index
)
new_passengers_processed = pd.concat(
    [new_passengers_processed.drop('Embarked', axis=1), embarked_new_df],
    axis=1
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  e. Align columns with training data
&lt;/h3&gt;

&lt;p&gt;Handles cases where new_passengers_processed might be missing a dummy column (e.g., if a new passenger only has 'Embarked_S' and not 'Embarked_C').&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_passengers_final = pd.DataFrame(columns=X.columns)
new_passengers_final = pd.concat([new_passengers_final, new_passenger_processed], ignore_index=True)
new_passengers_final = new_passenger_final.fillna(0)

print(new_passengers_final)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Output&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pclass&lt;/th&gt;
&lt;th&gt;Sex&lt;/th&gt;
&lt;th&gt;Age&lt;/th&gt;
&lt;th&gt;SibSp&lt;/th&gt;
&lt;th&gt;Parch&lt;/th&gt;
&lt;th&gt;Fare&lt;/th&gt;
&lt;th&gt;Embarked_C&lt;/th&gt;
&lt;th&gt;Embarked_Q&lt;/th&gt;
&lt;th&gt;Embarked_S&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;30.0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;50.0000&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;27.0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;14.4542&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;22.0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;15.0000&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  f. Scale features and predict survival for multiple passengers
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Logistic Regression&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print("--- Logistic Regression Predictions ---")
new_passengers_scaled = scaler.transform(new_passengers_final)
log_reg_predictions = log_reg_model.predict(new_passengers_scaled)
log_reg_probabilities = log_reg_model.predict_proba(new_passengers_scaled)[:, 1] # Probability of survival

# Create results_lr using the *index* of new_passengers_final
results_lr = pd.DataFrame({
    'Predicted_Survival_LR': ['Survived' if p == 1 else 'Not Survived' for p in log_reg_predictions],
    'Survival_Probability_LR': log_reg_probabilities.round(2)
}, index=new_passengers_final.index) # Use the index of the processed features for results_lr

print(results_lr)
print("\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Output&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Predicted_Survival_LR&lt;/th&gt;
&lt;th&gt;Survival_Probability_LR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Survived&lt;/td&gt;
&lt;td&gt;0.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Not Survived&lt;/td&gt;
&lt;td&gt;0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Survived&lt;/td&gt;
&lt;td&gt;0.99&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Random Forest&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print("--- Random Forest Predictions ---")
# Random Forest uses unscaled data, so use new_passengers_final directly
rf_predictions = rf_model.predict(new_passengers_final)
rf_probabilities = rf_model.predict_proba(new_passengers_final)[:, 1] # Probability of survival

results_rf = pd.DataFrame({
    'Predicted_Survival_RF': ['Survived' if p == 1 else 'Not Survived' for p in rf_predictions],
    'Survival_Probability_RF': rf_probabilities.round(4)
})
print(results_rf)
print("\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Output&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Predicted_Survival_RF&lt;/th&gt;
&lt;th&gt;Survival_Probability_RF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Survived&lt;/td&gt;
&lt;td&gt;0.96&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Not Survived&lt;/td&gt;
&lt;td&gt;0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Survived&lt;/td&gt;
&lt;td&gt;0.96&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Combined Prediction&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;combined_results = pd.merge(results_lr, results_rf, left_index=True, right_index=True)
print("Combined Predictions (merged on index)")
print(combined_results)
print("\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Output&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Predicted_Survival_LR&lt;/th&gt;
&lt;th&gt;Survival_Probability_LR&lt;/th&gt;
&lt;th&gt;Predicted_Survival_RF&lt;/th&gt;
&lt;th&gt;Survival_Probability_RF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Survived&lt;/td&gt;
&lt;td&gt;0.99&lt;/td&gt;
&lt;td&gt;Survived&lt;/td&gt;
&lt;td&gt;0.96&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Not Survived&lt;/td&gt;
&lt;td&gt;0.01&lt;/td&gt;
&lt;td&gt;Not Survived&lt;/td&gt;
&lt;td&gt;0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Survived&lt;/td&gt;
&lt;td&gt;0.99&lt;/td&gt;
&lt;td&gt;Survived&lt;/td&gt;
&lt;td&gt;0.96&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  7. Visualizing Key Relationships
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Survival rate by Sex&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print("\n Visualizing Key Relationships - By Sex")
plt.figure(figsize=(6, 4))
sns.barplot(x='Sex', y='Survived', data=df_processed.replace({'Sex': {0: 'Female', 1: 'Male'}}))
plt.title('Survival Rate by Sex')
plt.ylabel('Survival Rate')
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The plot clearly shows the higher survival rate for females.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foanqvd6qwjyf521vnog4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foanqvd6qwjyf521vnog4.png" alt="Predicted Survival Rate by Sex" width="536" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Survival rate by Pclass&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
This plot illustrates how survival rates vary across the passenger classes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print("\n Visualizing Key Relationships - By PClass")
plt.figure(figsize=(6, 4))
sns.barplot(x='Pclass', y='Survived', data=df_processed)
plt.title('Survival Rate by Pclass')
plt.ylabel('Survival Rate')
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmc7a89r9y3huyap5mt4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmc7a89r9y3huyap5mt4.png" alt="Survival Rate by Passenger Class" width="536" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Distribution of Age and its relation to Survival&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The histogram compares the age distribution of survivors vs. non-survivors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.figure(figsize=(10, 6))
sns.histplot(df_processed[df_processed['Survived'] == 0]['Age'], color='red', label='Not Survived', kde=True)
sns.histplot(df_processed[df_processed['Survived'] == 1]['Age'], color='green', label='Survived', kde=True)
plt.title('Age Distribution by Survival')
plt.xlabel('Age')
plt.ylabel('Number of Passengers')
plt.legend()
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkpryni55tyoxa4onut0k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkpryni55tyoxa4onut0k.png" alt="Relationship between Age and Survival Rate" width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Feature Importance for Random Forest&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Since we chose Random Forest partly for its ability to report feature importance, visualizing or printing rf_model.feature_importances_ adds interpretability to the model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;importances = rf_model.feature_importances_
feature_names = X.columns
forest_importances = pd.Series(importances, index=feature_names).sort_values(ascending=False)

plt.figure(figsize=(10, 6))
sns.barplot(x=forest_importances.values, y=forest_importances.index)
plt.title('Random Forest Feature Importances')
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp22v81lhb5le44wxz2rt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp22v81lhb5le44wxz2rt.png" alt="Feature Importance" width="800" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Conclusion and Summary
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
This project effectively leveraged machine learning to forecast the survival of Titanic passengers. The tragic historical event emphasizes the crucial role that socioeconomic and demographic factors play in crisis outcomes. The predictions confirmed that sex and passenger class were by far the most important factors influencing survival. This was evident in both the Logistic Regression and Random Forest models, with women and passengers in higher classes having a significantly higher probability of being saved. Age was also a factor, favoring very young children in particular.&lt;/p&gt;

&lt;p&gt;Complementary insights were obtained by applying the two separate models. As a strong baseline, Logistic Regression provided a clear, intelligible explanation of the linear influence of each characteristic on survival probability. The Random Forest Classifier demonstrated the effectiveness of ensemble approaches for subtle pattern identification, typically achieving greater predictive accuracy thanks to its capacity to capture complicated non-linear relationships and interactions. The results were further supported by the visual analyses, which offered clear graphical evidence of the differences in survival rates.&lt;br&gt;
In conclusion, this project demonstrated machine learning's ability to glean significant, actionable insights from historical data, shedding light on historical trends and providing a potent tool for comprehending intricate real-world phenomena, in addition to producing strong predictive models for the Titanic disaster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;br&gt;
The process of creating and assessing Titanic passenger survival prediction models was covered at length in this article. It began with an overview of the problem and proceeded through the fundamental stages of a machine learning workflow, with emphasis on training, evaluating, and testing the models using sample data.&lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Atieno, L., Robina, F., &amp;amp; Otieno, M. (2025). &lt;em&gt;Survival Likelihood Model.&lt;/em&gt; (&lt;a href="https://github.com/Loi2008/Data_Science_Assignments/blob/main/Prediction_Model_Titanic_Dataset.ipynb" rel="noopener noreferrer"&gt;https://github.com/Loi2008/Data_Science_Assignments/blob/main/Prediction_Model_Titanic_Dataset.ipynb&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Dawson, E. (1997). &lt;em&gt;The Titanic Disaster: Historical and Social Perspectives.&lt;/em&gt; Journal of Maritime History.
&lt;/li&gt;
&lt;li&gt;Kaggle. (n.d.). &lt;em&gt;Titanic: Machine Learning from Disaster.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Breiman, L. (2001). &lt;em&gt;Random Forests.&lt;/em&gt; Machine Learning, 45(1), 5–32.
&lt;/li&gt;
&lt;li&gt;Hosmer, D. W., &amp;amp; Lemeshow, S. (2000). &lt;em&gt;Applied Logistic Regression.&lt;/em&gt; Wiley Series in Probability and Statistics.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Further Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Géron, A. (2019). &lt;em&gt;Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow.&lt;/em&gt; O’Reilly Media.
&lt;/li&gt;
&lt;li&gt;Murphy, K. P. (2012). &lt;em&gt;Machine Learning: A Probabilistic Perspective.&lt;/em&gt; MIT Press.
&lt;/li&gt;
&lt;li&gt;Bishop, C. M. (2006). &lt;em&gt;Pattern Recognition and Machine Learning.&lt;/em&gt; Springer.
&lt;/li&gt;
&lt;li&gt;Raschka, S., &amp;amp; Mirjalili, V. (2019). &lt;em&gt;Python Machine Learning.&lt;/em&gt; Packt Publishing.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blog:&lt;/strong&gt; &lt;a href="https://towardsdatascience.com/top-10-machine-learning-algorithms-you-should-know-89169fb5e7d" rel="noopener noreferrer"&gt;Top 10 Machine Learning Algorithms You Should Know – Towards Data Science&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>beginners</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>📊Unlocking the power of SQL: Subqueries, CTEs, and Stored Procedures Demystified</title>
      <dc:creator>Loi2008</dc:creator>
      <pubDate>Fri, 12 Sep 2025 14:20:31 +0000</pubDate>
      <link>https://dev.to/loi2008/unlocking-the-power-of-sql-subqueries-ctes-and-stored-procedures-demystified-4aci</link>
      <guid>https://dev.to/loi2008/unlocking-the-power-of-sql-subqueries-ctes-and-stored-procedures-demystified-4aci</guid>
      <description>&lt;h2&gt;
  
  
  📝Introduction
&lt;/h2&gt;

&lt;p&gt;In SQL, developers are often faced with situations where they are required to break down complex queries, reuse logic, or encapsulate business rules for repeated use. There are three powerful features that help manage complexity and improve efficiency, and each serves a different purpose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Subqueries - allow quick, inline calculations inside a query.&lt;/li&gt;
&lt;li&gt;Common Table Expressions (CTEs) - improve readability and support recursion within queries.&lt;/li&gt;
&lt;li&gt;Stored procedures - encapsulate reusable, parameterized business logic stored at the database level.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding their similarities, differences, and best use cases is essential for writing efficient, maintainable SQL code.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Subquery
&lt;/h4&gt;

&lt;p&gt;A subquery is a query nested inside another query. It can be used in the &lt;strong&gt;SELECT&lt;/strong&gt;, &lt;strong&gt;FROM&lt;/strong&gt;, or &lt;strong&gt;WHERE&lt;/strong&gt; clause to provide intermediate results.&lt;br&gt;
Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;employee_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🟢&lt;strong&gt;Best for:&lt;/strong&gt; Quick, one-off filtering or calculations.&lt;br&gt;
🔴&lt;strong&gt;Limitation:&lt;/strong&gt; Cannot be reused across queries and may affect performance if overused.&lt;/p&gt;
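&lt;p&gt;The example above can be run end to end with Python's built-in sqlite3 module (the employees table and its rows are illustrative):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_name TEXT, salary REAL);
INSERT INTO employees VALUES ('Ann', 90000), ('Ben', 50000), ('Cy', 70000);
""")

# The inner SELECT runs first, producing the average (70000);
# the outer query then filters on that intermediate result
rows = conn.execute("""
    SELECT employee_name, salary
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees);
""").fetchall()
print(rows)  # [('Ann', 90000.0)]
```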
&lt;h4&gt;
  
  
  2. Common Table Expression (CTE)
&lt;/h4&gt;

&lt;p&gt;A CTE is a temporary named result set defined with the &lt;strong&gt;WITH&lt;/strong&gt; keyword. It improves readability and supports recursion.&lt;br&gt;
Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;AvgSalary&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_sal&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;employee_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AvgSalary&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;avg_sal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🟢&lt;strong&gt;Best for:&lt;/strong&gt; Structuring complex queries, improving readability, and handling recursive scenarios like hierarchies.&lt;br&gt;
🔴&lt;strong&gt;Limitation:&lt;/strong&gt; Exists only within the query scope and cannot be parameterized.&lt;/p&gt;
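&lt;p&gt;Since recursion is the CTE's signature capability, here is a runnable sketch using Python's built-in sqlite3. It walks a reporting hierarchy via a hypothetical manager_id column (the table and data are illustrative):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, employee_name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
  (1, 'Alice', NULL),  -- head of the hierarchy
  (2, 'Bob',   1),
  (3, 'Carol', 2),
  (4, 'Dan',   2);
""")

# The anchor member selects the root; the recursive member joins in direct reports
rows = conn.execute("""
    WITH RECURSIVE chain(id, employee_name, depth) AS (
        SELECT id, employee_name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.employee_name, c.depth + 1
        FROM employees e JOIN chain c ON e.manager_id = c.id
    )
    SELECT employee_name, depth FROM chain ORDER BY depth, employee_name;
""").fetchall()
print(rows)  # [('Alice', 0), ('Bob', 1), ('Carol', 2), ('Dan', 2)]
```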
&lt;h4&gt;
  
  
  3. Stored Procedure
&lt;/h4&gt;

&lt;p&gt;A stored procedure is a precompiled set of SQL statements stored in the database. It can accept parameters, perform multiple operations, and encapsulate business logic.&lt;br&gt;
Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;PROCEDURE&lt;/span&gt; &lt;span class="n"&gt;GetHighEarners&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="n"&gt;minSalary&lt;/span&gt; &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;employee_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;minSalary&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;GetHighEarners&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🟢&lt;strong&gt;Best for:&lt;/strong&gt; Reusable routines, parameterized operations, and business logic encapsulation.&lt;br&gt;
🔴&lt;strong&gt;Limitation:&lt;/strong&gt; Requires database-level creation and maintenance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Features
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Subquery&lt;/th&gt;
&lt;th&gt;CTE&lt;/th&gt;
&lt;th&gt;Stored Procedure&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scope&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Within a query&lt;/td&gt;
&lt;td&gt;Within a query&lt;/td&gt;
&lt;td&gt;Stored in the database&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reusability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Only within same query&lt;/td&gt;
&lt;td&gt;Yes (global)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parameters&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Supports Recursion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Can Modify Data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Rarely (SELECT only)&lt;/td&gt;
&lt;td&gt;Rarely (SELECT only)&lt;/td&gt;
&lt;td&gt;Yes (INSERT/UPDATE/DELETE)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best Use&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple inline logic&lt;/td&gt;
&lt;td&gt;Complex query readability&lt;/td&gt;
&lt;td&gt;Reusable business logic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Where to use each
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Subquery
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;In a simple, one-off query.&lt;/li&gt;
&lt;li&gt;When filtering or aggregating values inside a query.&lt;/li&gt;
&lt;li&gt;⚠️ Use correlated subqueries sparingly for performance reasons.&lt;/li&gt;
&lt;/ul&gt;
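&lt;p&gt;The filtering use case can be sketched end-to-end from Python with SQLite. The table, column names, and salary figures below are illustrative, not drawn from any particular schema:&lt;/p&gt;

```python
import sqlite3

# In-memory database with a minimal, illustrative employees table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "IT", 90000), ("Ben", "IT", 60000), ("Cara", "HR", 50000)],
)

# Scalar subquery: filter against an aggregate computed inside the same query
rows = conn.execute(
    "SELECT name FROM employees "
    "WHERE salary > (SELECT AVG(salary) FROM employees)"
).fetchall()
print(rows)  # [('Ann',)]
```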

&lt;h4&gt;
  
  
  CTE
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;When query logic is complex or needs recursion.&lt;/li&gt;
&lt;li&gt;Makes queries readable and maintainable.&lt;/li&gt;
&lt;li&gt;Ideal when the same subquery is referenced multiple times.&lt;/li&gt;
&lt;/ul&gt;
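&lt;p&gt;Since recursion is the capability that most clearly sets CTEs apart, here is a minimal recursive-CTE sketch, runnable against SQLite from Python. The three-row org chart is invented for illustration:&lt;/p&gt;

```python
import sqlite3

# In-memory org chart to demonstrate a recursive CTE (illustrative data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "CEO", None), (2, "VP", 1), (3, "Engineer", 2)],
)

# WITH RECURSIVE walks the reporting hierarchy from the root downward
query = """
WITH RECURSIVE chain(id, name, depth) AS (
    SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, c.depth + 1
    FROM employees e JOIN chain c ON e.manager_id = c.id
)
SELECT name, depth FROM chain ORDER BY depth
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('CEO', 0), ('VP', 1), ('Engineer', 2)]
```

&lt;p&gt;The same &lt;code&gt;WITH RECURSIVE&lt;/code&gt; shape applies to bill-of-materials and graph-traversal queries.&lt;/p&gt;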

&lt;h4&gt;
  
  
  Stored Procedure
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;When logic needs reusability across multiple queries.&lt;/li&gt;
&lt;li&gt;For data modification, business rules, or repetitive operations.&lt;/li&gt;
&lt;li&gt;When performance benefits from precompiled execution.&lt;/li&gt;
&lt;li&gt;When parameters or multiple operations are required in one call.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In SQL, subqueries, CTEs, and stored procedures are complementary tools. Subqueries are ideal for fast, inline operations; CTEs enable recursion and make difficult queries readable; and stored procedures encapsulate reusable, parameterized business logic for routine tasks. Stored procedures also bridge the gap between database architecture and programming principles by embodying abstraction, reusability, and modularity, much as Python functions do. By choosing the appropriate method for each situation, developers can write scalable, maintainable, and efficient SQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  📖 Further Reading &amp;amp; References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Gravell, M. &lt;em&gt;Difference between CTE and subquery on Stack Overflow&lt;/em&gt; – highlights recursive capabilities of CTEs. &lt;a href="https://stackoverflow.com/questions/706972/difference-between-cte-and-subquery" rel="noopener noreferrer"&gt;https://stackoverflow.com/questions/706972/difference-between-cte-and-subquery&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LearnSQL.com&lt;/strong&gt; – comprehensive overview of subqueries and CTEs with examples. &lt;a href="https://learnsql.com/blog/cte-vs-subquery" rel="noopener noreferrer"&gt;https://learnsql.com/blog/cte-vs-subquery&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;KDnuggets (April 2025):&lt;/strong&gt; &lt;em&gt;SQL CTE vs Subquery: This Debate Ain’t Over Yet&lt;/em&gt; – detailed comparison.  &lt;a href="https://www.kdnuggets.com/sql-cte-vs-subquery-this-debate-aint-over-yet" rel="noopener noreferrer"&gt;https://www.kdnuggets.com/sql-cte-vs-subquery-this-debate-aint-over-yet&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Wikipedia – Stored Procedure&lt;/strong&gt; – in-depth explanation of stored procedures, use cases, and comparison with functions.  &lt;a href="https://en.wikipedia.org/wiki/Stored_procedure" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Stored_procedure&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PostgreSQL stored procedures guide&lt;/strong&gt; – syntax and use for transaction-aware routines. &lt;a href="https://pysql.tecladocode.com/section08/lectures/04_stored_procedures" rel="noopener noreferrer"&gt;https://pysql.tecladocode.com/section08/lectures/04_stored_procedures&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Wikipedia – Correlated Subquery&lt;/strong&gt; – explains execution patterns and performance considerations in correlated subqueries.  &lt;a href="https://en.wikipedia.org/wiki/Correlated_subquery" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Correlated_subquery&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>subqueries</category>
      <category>commontableexpression</category>
      <category>storedprocedures</category>
      <category>python</category>
    </item>
    <item>
      <title>From SQL to Python: Uniting Stored Power with Functional Flexibility</title>
      <dc:creator>Loi2008</dc:creator>
      <pubDate>Fri, 12 Sep 2025 14:10:08 +0000</pubDate>
      <link>https://dev.to/loi2008/from-sql-to-python-uniting-stored-power-with-functional-flexibility-icf</link>
      <guid>https://dev.to/loi2008/from-sql-to-python-uniting-stored-power-with-functional-flexibility-icf</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;Databases and programming languages are frequently used in modern software systems to provide effective, scalable, and maintainable solutions. Python functions and SQL stored procedures are essential components of these ecosystems. Python functions encapsulate reusable application logic for computation, integration, and sophisticated processing, whereas stored procedures encapsulate database logic to carry out actions directly within the database engine.&lt;br&gt;
This article explores their similarities, differences, and suitable applications, emphasizing their potential in both individual and combined use. &lt;/p&gt;
&lt;h3&gt;
  
  
  Stored Procedure (SQL)
&lt;/h3&gt;

&lt;p&gt;A stored procedure is a precompiled set of SQL statements (and optional control-of-flow logic) stored in a relational database. It can accept input parameters, perform operations (such as queries, inserts, updates, deletes, or complex business logic), and return results. &lt;/p&gt;
&lt;h4&gt;
  
  
  Key Features
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Encapsulation of database logic.&lt;/li&gt;
&lt;li&gt;Parameterized execution for dynamic queries.&lt;/li&gt;
&lt;li&gt;Control-of-flow logic (IF, WHILE, CASE).&lt;/li&gt;
&lt;li&gt;Enhanced security via procedure-level permissions.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Application
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Generating financial or operational reports&lt;/li&gt;
&lt;li&gt;Running batch updates and ETL jobs&lt;/li&gt;
&lt;li&gt;Enforcing business rules within the database&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  SQL Script: Applying Stored Procedures
&lt;/h4&gt;

&lt;p&gt;The example below illustrates the application of a stored procedure in a business scenario, using a sales database. The script retrieves customer orders above a given amount and logs each time the procedure is executed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Customers table
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    City VARCHAR(50)
);

-- Orders table
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    OrderAmount DECIMAL(10,2),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

-- ProcedureLog table
CREATE TABLE ProcedureLog (
    Log_ID SERIAL PRIMARY KEY,  
    ProcedureName VARCHAR(100),
    ExecutionTime TIMESTAMP
);

-- Insert sample data
INSERT INTO Customers (CustomerID, CustomerName, City)
VALUES (1, 'Alice Johnson', 'New York'),
       (2, 'Michael Smith', 'Chicago'),
       (3, 'Sarah Lee', 'San Francisco');

INSERT INTO Orders (OrderID, CustomerID, OrderDate, OrderAmount)
VALUES (101, 1, '2025-01-10', 250.00),
       (102, 2, '2025-01-15', 120.00),
       (103, 1, '2025-02-01', 500.00),
       (104, 3, '2025-02-05', 90.00);

-- Stored routine (written as a PL/pgSQL function, since it returns a result set)

CREATE OR REPLACE FUNCTION GetHighValueOrders(min_amount DECIMAL)
RETURNS TABLE (
    OrderID INT,
    CustomerName VARCHAR(100),
    OrderDate DATE,
    OrderAmount DECIMAL(10,2)
)
AS $$
BEGIN
    -- Log the execution
    INSERT INTO ProcedureLog (ProcedureName, ExecutionTime)
    VALUES ('GetHighValueOrders', NOW());

    -- Return query
    RETURN QUERY
    SELECT O.OrderID, C.CustomerName, O.OrderDate, O.OrderAmount
    FROM Orders O
    INNER JOIN Customers C ON O.CustomerID = C.CustomerID
    WHERE O.OrderAmount &amp;gt;= min_amount
    ORDER BY O.OrderAmount DESC;
END;
$$ LANGUAGE plpgsql;
-- Executing the function
SELECT * FROM GetHighValueOrders(200);

-- Check logs
SELECT * FROM ProcedureLog;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The SQL code:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates the tables.&lt;/li&gt;
&lt;li&gt;Inserts customers and orders.&lt;/li&gt;
&lt;li&gt;Creates a stored routine that:

&lt;ul&gt;
&lt;li&gt;Logs every execution into ProcedureLog.&lt;/li&gt;
&lt;li&gt;Returns orders where OrderAmount &amp;gt;= min_amount.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Function (Python)
&lt;/h3&gt;

&lt;p&gt;A Python function is a block of reusable code that performs a specific task, takes input arguments (optional), and can return values. Functions in Python support modularity, abstraction, and reusability within applications.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key Features
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Can return any Python object (e.g., int, list, dict)&lt;/li&gt;
&lt;li&gt;Support recursion, loops, and error handling with try...except&lt;/li&gt;
&lt;li&gt;Integrate seamlessly with external APIs and libraries&lt;/li&gt;
&lt;li&gt;Enable abstraction and modularity in software design&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Application
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Data preprocessing and cleaning&lt;/li&gt;
&lt;li&gt;Implementing application business rules&lt;/li&gt;
&lt;li&gt;Applying machine learning and analytics&lt;/li&gt;
&lt;li&gt;Integrating with external APIs and services&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Python Script: Applying Python Function
&lt;/h4&gt;

&lt;p&gt;The script mirrors the stored-procedure workflow in pure Python, using in-memory data for reusability and clarity. It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Represents the tables as in-memory lists of dictionaries.&lt;/li&gt;
&lt;li&gt;Defines a function that plays the role of the stored procedure.&lt;/li&gt;
&lt;li&gt;Applies functions to filter, transform, and display the data.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from datetime import datetime

# "Tables" in memory
customers = [
    {"CustomerID": 1, "CustomerName": "Alice Johnson", "City": "New York"},
    {"CustomerID": 2, "CustomerName": "Michael Smith", "City": "Chicago"},
    {"CustomerID": 3, "CustomerName": "Sarah Lee", "City": "San Francisco"}
]

orders = [
    {"OrderID": 101, "CustomerID": 1, "OrderDate": "2025-01-10", "OrderAmount": 250.00},
    {"OrderID": 102, "CustomerID": 2, "OrderDate": "2025-01-15", "OrderAmount": 120.00},
    {"OrderID": 103, "CustomerID": 1, "OrderDate": "2025-02-01", "OrderAmount": 500.00},
    {"OrderID": 104, "CustomerID": 3, "OrderDate": "2025-02-05", "OrderAmount": 90.00}
]

procedure_log = []  # "ProcedureLog table"

# Function to log execution
def log_procedure(name):
    procedure_log.append({
        "ProcedureName": name,
        "ExecutionTime": datetime.now()
    })

# Function to get high-value orders
def get_high_value_orders(min_amount):
    # Log execution
    log_procedure("get_high_value_orders")

    # Filter and join with customers
    result = []
    for order in orders:
        if order["OrderAmount"] &amp;gt;= min_amount:
            customer = next(c for c in customers if c["CustomerID"] == order["CustomerID"])
            result.append({
                "OrderID": order["OrderID"],
                "CustomerName": customer["CustomerName"],
                "OrderDate": order["OrderDate"],
                "OrderAmount": order["OrderAmount"]
            })
    # Sort by amount, descending (equivalent to ORDER BY OrderAmount DESC)
    result.sort(key=lambda x: x["OrderAmount"], reverse=True)
    return result
# ---------------------------
# Application
print("High value orders &amp;gt;= 200:")
for row in get_high_value_orders(200):
    print(row)
print("\nProcedure logs:")
for log in procedure_log:
    print(log)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Similarities
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Encapsulation of Logic
&lt;/h4&gt;

&lt;p&gt;Both stored procedures and Python functions encapsulate logic into reusable units. For example, instead of writing the same SQL query or Python code multiple times, you place it in a procedure/function and call it when needed.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Parameterization
&lt;/h4&gt;

&lt;p&gt;Both accept input parameters, process them, and return results. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SQL: EXEC GetCustomerOrders @CustomerID = 5

Python: get_customer_orders(customer_id=5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  3. Modularity &amp;amp; Reusability
&lt;/h4&gt;

&lt;p&gt;Both allow modular program design, making systems easier to maintain. Code changes in one procedure/function apply everywhere it is called.&lt;/p&gt;

&lt;h4&gt;
  
  
  4. Control Flow Support
&lt;/h4&gt;

&lt;p&gt;Both can include conditional logic (IF, CASE in SQL vs. if/else in Python) and looping constructs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Differences
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Aspect&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Stored Procedures (SQL)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Functions (Python)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Execution Context&lt;/td&gt;
&lt;td&gt;Runs inside database engine&lt;/td&gt;
&lt;td&gt;Runs in Python interpreter/application layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Primary Purpose&lt;/td&gt;
&lt;td&gt;Optimizing database operations (queries, transactions)&lt;/td&gt;
&lt;td&gt;Implementing general-purpose logic and algorithms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Return Types&lt;/td&gt;
&lt;td&gt;Result sets, output parameters, status codes&lt;/td&gt;
&lt;td&gt;Any Python object (int, list, dict, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Language Used&lt;/td&gt;
&lt;td&gt;SQL with procedural extensions (T-SQL, PL/SQL, etc.)&lt;/td&gt;
&lt;td&gt;Python syntax&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;Precompiled, reduces network traffic by processing in DB&lt;/td&gt;
&lt;td&gt;Requires fetching data from DB before processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error Handling&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;TRY...CATCH&lt;/code&gt; blocks&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;try...except&lt;/code&gt; blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Statefulness&lt;/td&gt;
&lt;td&gt;Tied to database state (tables, views, transactions)&lt;/td&gt;
&lt;td&gt;Independent, works with in-memory or external data&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Suitable Applications
&lt;/h3&gt;

&lt;h5&gt;
  
  
  Stored Procedures (SQL)
&lt;/h5&gt;

&lt;p&gt;Best used when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Heavy database operations are needed (aggregation, filtering, batch updates).&lt;/li&gt;
&lt;li&gt;Reduced network overhead is needed (logic executes close to the data).&lt;/li&gt;
&lt;li&gt;Security is critical - permissions can be granted at procedure-level rather than table-level.&lt;/li&gt;
&lt;li&gt;You need performance optimization: pre-compiled execution plans and indexing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h6&gt;
  
  
  Application
&lt;/h6&gt;

&lt;ul&gt;
&lt;li&gt;Generating financial reports directly from the database.&lt;/li&gt;
&lt;li&gt;Performing scheduled batch updates or ETL processes.&lt;/li&gt;
&lt;li&gt;Enforcing business rules within the database.&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  Python Functions
&lt;/h5&gt;

&lt;p&gt;Best used when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Application-level processing is required (business rules, algorithms, data transformations).&lt;/li&gt;
&lt;li&gt;Data needs to be manipulated in memory beyond SQL capabilities (e.g., machine learning, natural language processing).&lt;/li&gt;
&lt;li&gt;You need integration with external APIs, services, or user interfaces.&lt;/li&gt;
&lt;li&gt;Logic requires flexibility beyond relational operations (graph algorithms, recursive calculations, etc.).&lt;/li&gt;
&lt;/ul&gt;

&lt;h6&gt;
  
  
  Application
&lt;/h6&gt;

&lt;ul&gt;
&lt;li&gt;Cleaning and preparing datasets for machine learning.&lt;/li&gt;
&lt;li&gt;Implementing application logic in a web service.&lt;/li&gt;
&lt;li&gt;Calling a database stored procedure and further processing results in Python.&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  Application of both
&lt;/h5&gt;

&lt;p&gt;In real-world systems, stored procedures and Python functions often complement each other: a stored procedure handles data retrieval and aggregation, while a Python function calls the stored procedure and applies additional business logic.&lt;/p&gt;
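&lt;p&gt;A minimal, self-contained sketch of this division of labor, reusing the in-memory style of the Python script above; the 300.00 "priority" threshold is an invented business rule:&lt;/p&gt;

```python
# Sketch of the combined pattern: a retrieval routine standing in for the
# stored procedure, plus application-level business logic layered in Python.
# The data and the 300.00 "priority" threshold are illustrative.

orders = [
    {"OrderID": 101, "CustomerName": "Alice Johnson", "OrderAmount": 250.00},
    {"OrderID": 103, "CustomerName": "Alice Johnson", "OrderAmount": 500.00},
]

def get_high_value_orders(min_amount):
    # Plays the role of the stored procedure: retrieval and filtering
    return [o for o in orders if o["OrderAmount"] >= min_amount]

def flag_priority(rows, threshold=300.00):
    # Application-level rule applied on top of the retrieved data
    for row in rows:
        row["Priority"] = row["OrderAmount"] >= threshold
    return rows

result = flag_priority(get_high_value_orders(200))
print(result)
```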

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stored Procedures:&lt;/strong&gt; Optimize and secure database operations, reduce network load, enforce business rules within the DB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python Functions:&lt;/strong&gt; Provide flexibility, abstraction, and broader application logic capabilities outside the database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Both:&lt;/strong&gt; Form a powerful combination — databases handle what they do best (data storage and retrieval), while Python manages application logic and advanced processing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  References and Further Reading
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Coronel, C., &amp;amp; Morris, S. (2015). &lt;em&gt;Database Systems: Design, Implementation, &amp;amp; Management&lt;/em&gt;. Cengage Learning.
&lt;/li&gt;
&lt;li&gt;Ramakrishnan, R., &amp;amp; Gehrke, J. (2003). &lt;em&gt;Database Management Systems&lt;/em&gt;. McGraw-Hill.
&lt;/li&gt;
&lt;li&gt;Fowler, M. (2018). &lt;em&gt;Refactoring: Improving the Design of Existing Code&lt;/em&gt;. Addison-Wesley.
&lt;/li&gt;
&lt;li&gt;Van Rossum, G., &amp;amp; Drake, F. L. (2009). &lt;em&gt;The Python Language Reference Manual&lt;/em&gt;. Network Theory Ltd.
&lt;/li&gt;
&lt;li&gt;Microsoft Docs. (2023). &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/stored-procedures/stored-procedures-database-engine" rel="noopener noreferrer"&gt;Stored Procedures (Database Engine)&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;PostgreSQL Documentation. (2023). &lt;a href="https://www.postgresql.org/docs/current/xfunc.html" rel="noopener noreferrer"&gt;Functions and Stored Procedures&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;Python Software Foundation. (2023). &lt;a href="https://docs.python.org/3/tutorial/controlflow.html#defining-functions" rel="noopener noreferrer"&gt;Python Functions&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;IBM Developer. (2021). &lt;a href="https://developer.ibm.com/articles/ba-perfstoredproc/" rel="noopener noreferrer"&gt;Choosing Between Stored Procedures and Application Logic&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;Real Python. (2023). &lt;a href="https://realpython.com/defining-your-own-python-function/" rel="noopener noreferrer"&gt;Defining Your Own Python Function&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;Stack Overflow Discussions. (Ongoing). &lt;em&gt;Best practices for stored procedures vs. application-level logic&lt;/em&gt;.
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>database</category>
      <category>backend</category>
      <category>sql</category>
      <category>python</category>
    </item>
    <item>
      <title>Beyond the Fields: A Data-Driven Look at Kenya’s Agricultural Productivity</title>
      <dc:creator>Loi2008</dc:creator>
      <pubDate>Fri, 12 Sep 2025 13:26:53 +0000</pubDate>
      <link>https://dev.to/loi2008/beyond-the-fields-a-data-driven-look-at-kenyas-agricultural-productivity-4n8n</link>
      <guid>https://dev.to/loi2008/beyond-the-fields-a-data-driven-look-at-kenyas-agricultural-productivity-4n8n</guid>
      <description>&lt;h2&gt;
  
  
  1. Overview
&lt;/h2&gt;

&lt;p&gt;Kenya's economy is based primarily on agriculture, which makes a substantial contribution to employment, food security, and GDP. Farming is the main source of income for more than 70% of rural households. However, productivity and profitability fluctuate greatly due to variations in crop selection, agricultural methods, input usage, and susceptibility to weather.&lt;/p&gt;

&lt;p&gt;The Kenya Crops Dataset contains a wealth of data on farmers' activities across counties. The data provide insight into crop types, planted areas, yields, expenses, income, and management techniques (pest control, irrigation, and fertilizer use).&lt;/p&gt;

&lt;p&gt;This article analyses the data set to identify trends, obstacles, and possibilities that can guide practice and policy in Kenya's agriculture industry.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Data Overview
&lt;/h2&gt;

&lt;p&gt;The dataset consists of 500 records drawn from various counties in Kenya, each providing detailed information on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Crops:&lt;/strong&gt; Crop types grown, such as potatoes, maize, tomatoes, sorghum, coffee, and beans.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Land use:&lt;/strong&gt; Planted area in acres.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Productivity:&lt;/strong&gt; Yield in kilograms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Economics:&lt;/strong&gt; Market prices, revenues, production costs, and profits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Farming practices&lt;/strong&gt;: Fertilizer type, irrigation use, pest control methods.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environmental impact:&lt;/strong&gt; Weather conditions and soil type.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The data's comprehensiveness makes it feasible to evaluate farming's technical and financial facets, as well as environmental influences.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Key Findings
&lt;/h2&gt;

&lt;h4&gt;
  
  
  3.1 Crop Distribution
&lt;/h4&gt;

&lt;p&gt;The dataset reveals a wide variety of crops grown across counties, with the most popular being coffee, potatoes, and tea (Fig. 1).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9s6ndwr4d6je8ofbcny.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9s6ndwr4d6je8ofbcny.png" alt="Crop Type Popularity by County" width="800" height="586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  3.2 Profitability and Yields
&lt;/h4&gt;

&lt;p&gt;Profitability varies widely among crop types. Some farmers record substantial profits, especially those producing high-value crops like rice and sorghum (Fig. 2).&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9i07nus47lb1xvcjekf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9i07nus47lb1xvcjekf.png" alt="Revenue and profit generated per crop" width="800" height="516"&gt;&lt;/a&gt;&lt;br&gt;
Yields were highest for rice, while sorghum registered only moderate yields even though its revenue and profit were high (Fig. 2 and Fig. 3).&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh4oh44lojofjp2ulni4g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh4oh44lojofjp2ulni4g.png" alt=" " width="763" height="624"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  3.3 Farming Practices
&lt;/h4&gt;

&lt;p&gt;According to the dataset, the majority of farmers did not use any fertilizer. Among those who did, the preferred options were DAP, manure, and CAN (Fig. 4).&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatx8uxevcpjc64jftgfq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatx8uxevcpjc64jftgfq.png" alt="Fertilizer Usage by Farmers" width="771" height="543"&gt;&lt;/a&gt;&lt;br&gt;
Irrigation adoption is uneven, with many farmers relying solely on rainfall, which exposes yields to droughts and irregular rains. Pest control is inconsistently applied, with some farmers reporting no control methods at all, leaving crops exposed to pest infestation (Fig. 5).&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq11wqwind2v9tr2v383x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq11wqwind2v9tr2v383x.png" alt="Farmers' usage of irrigation method and pest control" width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Missing Data in the Dataset and Effect on Analysis
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Missing Data
&lt;/h4&gt;

&lt;p&gt;One of the issues with the Kenya Crops Dataset is partial or missing data, frequently indicated by "Not Provided" entries. Key variables with missing values include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Yield (Kg):&lt;/strong&gt; In certain instances, farmers failed to document or supply real yields.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Profit (KES):&lt;/strong&gt; Economic analysis is less accurate when profit numbers are missing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fertilizer Used:&lt;/strong&gt; "Not Provided" restricts information about input usage trends.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pest Control Method:&lt;/strong&gt; Without this information, crop protection measures cannot accurately be evaluated.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Effect on Analysis
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Decreased Average and Total Accuracy:&lt;/strong&gt; The accuracy of computed averages, totals, and comparisons between crops and counties is reduced when yield and profit figures are missing. For instance, if high-profit records are absent, the average profit per crop can be understated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Biased Results:&lt;/strong&gt; If data are not missing at random (smallholder farmers are less likely to record inputs, for example), results may disproportionately represent farmers with better-organized records, tilting the analysis in favor of larger or better-resourced farms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Challenges in Trend Analysis:&lt;/strong&gt; Trends in modern versus traditional farming methods are difficult to determine from incomplete information on pest management or fertilizer practices, making it harder to connect input use to productivity outcomes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Problems with Policy Suggestions:&lt;/strong&gt; Policymakers depend on comprehensive datasets to make focused decisions. Missing values could conceal crops that require assistance or underperforming areas.&lt;/li&gt;
&lt;/ul&gt;
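&lt;p&gt;To make the first effect concrete, a short sketch with made-up profit figures shows how the handling of "Not Provided" entries shifts a computed average:&lt;/p&gt;

```python
from statistics import mean

# Illustrative profit records; "Not Provided" marks missing values
profits = [12000.0, "Not Provided", 8500.0, "Not Provided", 20000.0]

# Dropping missing values keeps only the three observed records
observed = [p for p in profits if isinstance(p, float)]
avg_observed = mean(observed)  # 40500 / 3 = 13500.0

# Naively coding missing values as zero deflates the average
avg_with_zeros = mean(p if isinstance(p, float) else 0.0 for p in profits)

print(avg_observed, avg_with_zeros)  # 13500.0 8100.0
```

&lt;p&gt;Neither figure is "correct"; the gap between them is the uncertainty that missing records introduce into averages, totals, and cross-county comparisons.&lt;/p&gt;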

&lt;h2&gt;
  
  
  Dashboards in Visualizing Findings
&lt;/h2&gt;

&lt;p&gt;Dashboards convert complicated statistics into understandable, actionable insights that complement in-depth reporting. An interactive analysis of this dataset is available in the &lt;a href="https://github.com/Loi2008/Power-BI-/blob/main/Kenya%20Crops%20Production%20Analysis_1.pbix" rel="noopener noreferrer"&gt;Kenya Crops Analysis Dashboard&lt;/a&gt; (Fig. 6).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4zqnmxvpjjkjuirnb3q8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4zqnmxvpjjkjuirnb3q8.png" alt="Kenya Crops Analysis Dashboard" width="800" height="470"&gt;&lt;/a&gt;&lt;br&gt;
While reports such as this provide detailed analysis, dashboards make it easier to visualize findings interactively and monitor trends in real time. A dashboard built on this dataset included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Crop Distribution:&lt;/strong&gt; A detailed bar chart containing the distribution of planting area, yield, revenue and profit across counties.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impact of weather on revenue and profitability:&lt;/strong&gt; column chart comparing revenue and profits per weather impact.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual map:&lt;/strong&gt; giving the position of the counties and their performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key Metrics (KPIs):&lt;/strong&gt; Cards displaying total planted area, total yield, total production cost, total revenue and total profit.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The dashboard would enable farmers, researchers, and policymakers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rapidly determine which crops and areas are performing well and poorly.&lt;/li&gt;
&lt;li&gt;Evaluate how well pest management, fertilization, and irrigation are working.&lt;/li&gt;
&lt;li&gt;Monitor how the weather affects crops and earnings.&lt;/li&gt;
&lt;li&gt;Make decisions based on evidence to promote the expansion of agriculture.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Recommendations
&lt;/h2&gt;

&lt;p&gt;Based on the analysis, the following measures could help enhance agricultural outcomes in Kenya:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Increase irrigation coverage to lessen susceptibility to fluctuations in rainfall and enhance smallholder irrigation projects.&lt;/li&gt;
&lt;li&gt;Encourage sustainable farming methods that preserve soil health, support organic substitutes, soil conservation, and balanced fertilizer use.&lt;/li&gt;
&lt;li&gt;Enhance pest control to reduce agricultural losses, and teach farmers integrated pest management techniques.&lt;/li&gt;
&lt;li&gt;Digitize agricultural records to minimize missing data and enhance decision-making. This will also enhance data quality and monitoring using digital platforms.&lt;/li&gt;
&lt;li&gt;Targeted assistance for low-yield counties by offering training, extension services, and resources to counties that continuously perform poorly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  References and Additional Readings
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Food and Agriculture Organization of the United Nations (FAO). (2021). &lt;em&gt;The State of Food and Agriculture&lt;/em&gt;. Rome: FAO.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kenya Ministry of Agriculture. (2020). &lt;em&gt;Agricultural Sector Transformation and Growth Strategy (ASTGS) 2019–2029&lt;/em&gt;. Nairobi: Government of Kenya.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;World Bank. (2019). &lt;em&gt;Agricultural Productivity in Kenya: Trends and Determinants&lt;/em&gt;. Washington, DC: World Bank.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;International Fund for Agricultural Development (IFAD). (2022). &lt;em&gt;Climate-Resilient Agriculture in East Africa&lt;/em&gt;. Rome: IFAD.  &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>analytics</category>
      <category>data</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Ms Excel and Predictive Data analysis</title>
      <dc:creator>Loi2008</dc:creator>
      <pubDate>Sun, 31 Aug 2025 17:07:31 +0000</pubDate>
      <link>https://dev.to/loi2008/ms-excel-and-predictive-data-analysis-4g3b</link>
      <guid>https://dev.to/loi2008/ms-excel-and-predictive-data-analysis-4g3b</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;Microsoft Excel is a cornerstone of business analytics, valued for its familiarity, flexibility, and widespread use across industries. As businesses demand ever more sophisticated insights, however, Excel’s limitations become increasingly pronounced. Below is an exploration of its strengths and limitations in predictive analysis, and of its critical role in enabling data-driven decisions. Despite these limitations, businesses can still leverage Excel to gain insights and make informed decisions by following best practices, understanding its constraints, and maximizing its benefits for predictive analytics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Strengths and Limitations of Excel in Predictive Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Ms Excel Strengths in Predictive Analysis
&lt;/h3&gt;

&lt;h4&gt;
  
  
  - Accessibility and Ubiquity
&lt;/h4&gt;

&lt;p&gt;Excel is an accessible tool for predictive analysis since it is readily available and well-known to many people. Because of its established reputation in the business sector, many professionals choose it by default. Its widespread use and user-friendly interface allow for rapid analysis with little training.&lt;/p&gt;

&lt;h4&gt;
  
  
  - Built-In Analytical Tools
&lt;/h4&gt;

&lt;p&gt;For trend-based predictions, Excel offers a number of forecasting tools, including single and double exponential smoothing and linear forecasting. Built-in functions such as forecasting and regression analysis offer a strong basis for predictive analytics. Furthermore, its ability to create bespoke models makes competent financial modeling possible, including scenario simulations, projections, and discounted cash flow assessments.&lt;/p&gt;
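&lt;p&gt;As a minimal sketch of these built-in functions (assuming, purely for illustration, historical values in cells B2:B13, their time periods in A2:A13, and the next period in A14), a forecast can be produced directly in a worksheet cell:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=FORECAST.LINEAR(A14, B2:B13, A2:A13)
=FORECAST.ETS(A14, B2:B13, A2:A13)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The first formula fits a linear trend; the second applies exponential smoothing and can also model seasonality.&lt;/p&gt;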

&lt;h4&gt;
  
  
  - Visualization and Layout Capabilities
&lt;/h4&gt;

&lt;p&gt;Excel gives users the ability to visualize data patterns and derive actionable insights through its built-in charts, pivot tables, conditional formatting, and formatting flexibility.&lt;/p&gt;

&lt;h4&gt;
  
  
  - Data Integration &amp;amp; Flexibility
&lt;/h4&gt;

&lt;p&gt;Excel's versatility makes it useful for combining data and enabling rapid, exploratory research. It can import data from various sources, including CSV files, databases, and APIs.&lt;/p&gt;

&lt;h4&gt;
  
  
  - Data Manipulation
&lt;/h4&gt;

&lt;p&gt;Users can prepare data for predictive models thanks to Excel's flexibility in data manipulation and presentation.&lt;/p&gt;

&lt;h4&gt;
  
  
  - Add-ins
&lt;/h4&gt;

&lt;p&gt;Excel's predictive capabilities are extended by add-ins such as the Analysis ToolPak, PI DataLink, and Solver.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limitations of Ms Excel in Predictive Analysis
&lt;/h3&gt;

&lt;h4&gt;
  
  
  - Scalability &amp;amp; Performance Constraints
&lt;/h4&gt;

&lt;p&gt;As dataset sizes increase, Excel struggles with large-scale predictive modeling: it may become sluggish, crash, or simply fail to handle the volume.&lt;/p&gt;

&lt;h4&gt;
  
  
  - Limited Forecasting Sophistication
&lt;/h4&gt;

&lt;p&gt;Excel's Forecast function assumes linear or exponential trends. It frequently fails in situations involving non-linear dynamics, complex seasonal patterns, or environments with outliers and volatility.&lt;/p&gt;

&lt;h4&gt;
  
  
  - Error-Prone Manual Handling
&lt;/h4&gt;

&lt;p&gt;Manual entry, formula construction, and inadequate data management frequently introduce errors that can jeopardize analytical integrity.&lt;/p&gt;

&lt;h4&gt;
  
  
  - Version Control and Siloed Collaboration Issues
&lt;/h4&gt;

&lt;p&gt;Spreadsheets frequently proliferate across departments, resulting in multiple, inconsistent versions. This makes it difficult to maintain a single source of truth and hinders collaboration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Role of Excel in Data-Driven Business Decisions: Analyzing Sample Jumia Dataset
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;The Jumia dataset contains sample data on products, the pricing of each product, the discounts offered, and product ratings and reviews. The data was cleaned and analyzed using Ms Excel. A dashboard revealing various perspectives on the data enables effective decision-making on product performance (Fig1).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0lcqai0fbwq07k1f404.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0lcqai0fbwq07k1f404.png" alt="Jumia Products Analysis Dashboard" width="800" height="716"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Fig1: Jumia Products Analysis Dashboard&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Findings
&lt;/h3&gt;

&lt;p&gt;A total of 112 products sold on Jumia were analysed. All the products were discounted. Fig2 illustrates the top 10 discounted products.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmo6ixv8qidhxjbooa3i1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmo6ixv8qidhxjbooa3i1.png" alt="Top 10 discounted products" width="596" height="353"&gt;&lt;/a&gt; &lt;br&gt;
&lt;strong&gt;Fig2: Top 10 Discounted Products&lt;/strong&gt;&lt;br&gt;
The products had good reviews, with the top 10 products having between 20 and 70 reviews, as illustrated in Fig3.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb23eggz7ks8tuhgu2gkw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb23eggz7ks8tuhgu2gkw.png" alt="Top 10 products with the highest reviews" width="800" height="492"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Fig3: Top 10 Reviewed Products&lt;/strong&gt;&lt;br&gt;
The findings also revealed a positive relationship between product ratings and the number of reviews (Fig4). &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq4b3quytw9tocjah8iby.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq4b3quytw9tocjah8iby.png" alt="Relationship between the products review and the rating" width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Fig4: Relationship between Product Reviews and Ratings&lt;/strong&gt;&lt;br&gt;
More than 50% of the products had high discounts, and 43% had excellent ratings (Fig5 &amp;amp; Fig6).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzoabwre0fxbcmiva0b2q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzoabwre0fxbcmiva0b2q.png" alt="Product Discount category" width="800" height="596"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Fig5: Product Discount Category&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcm32cla8egrrik8w9l1x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcm32cla8egrrik8w9l1x.png" alt="Product Rating Category" width="800" height="594"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Fig6: Product Rating Category&lt;/strong&gt;&lt;/p&gt;
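&lt;p&gt;The rating–review relationship shown in Fig4 can also be quantified in Excel with a correlation coefficient (a sketch assuming, hypothetically, ratings in D2:D113 and review counts in E2:E113 for the 112 products):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=CORREL(D2:D113, E2:E113)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A value close to +1 would confirm the positive relationship observed.&lt;/p&gt;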

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Overall, the analysis provides a comprehensive view of Jumia’s sales ecosystem and guides strategic decisions to optimize revenue, customer engagement, and operational efficiency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reference
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Jumia Sample DataSet&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/advice/3/what-pros-cons-using-excel-data-analysis-skills-data-analy%0Asis" rel="noopener noreferrer"&gt;What are the pros and cons of using Excel for data analysis?&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>Global Debt Uncovered - Insights from PostgreSQL Analysis</title>
      <dc:creator>Loi2008</dc:creator>
      <pubDate>Sun, 31 Aug 2025 08:01:58 +0000</pubDate>
      <link>https://dev.to/loi2008/international-debt-analysis-exploring-postgresql-2k5o</link>
      <guid>https://dev.to/loi2008/international-debt-analysis-exploring-postgresql-2k5o</guid>
      <description>&lt;h2&gt;
  
  
  1.    Introduction
&lt;/h2&gt;

&lt;p&gt;This analysis explores a sample international debt dataset using PostgreSQL. The goal is to understand the structure of the data, assess data quality, and generate insights about global debt distribution. The dataset contains information on countries, debt indicators, and debt value. It also includes missing values that must be handled carefully during analysis, if accurate meaning is to be drawn from the dataset.&lt;/p&gt;

&lt;h2&gt;
  
  
  2.   Loading the Dataset
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Assuming there is an active PostgreSQL connection.&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Steps:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Open PostgreSQL in Dbeaver&lt;/li&gt;
&lt;li&gt;Create a schema
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;schema&lt;/span&gt; &lt;span class="n"&gt;international_debt_analysis&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set the search path&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;search_path&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;international_debt_analysis&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Right-click on Tables under your schema&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Import data &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select the source file&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Map the table to the schema&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Confirm &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Proceed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Open a new script&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Confirm your table is in the right schema&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;international_debt_with_missing_values&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3.    SQL Queries and Findings
&lt;/h2&gt;

&lt;p&gt;The data is analysed using SQL queries. Charts and tables are used for visualization.&lt;/p&gt;
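&lt;p&gt;Since the introduction notes that the dataset contains missing values, a quick audit query helps gauge data quality before analysis (a sketch using the table loaded above and the &lt;code&gt;debt&lt;/code&gt; and &lt;code&gt;country_name&lt;/code&gt; columns it is assumed to contain):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- count rows overall and rows with missing key fields
select count(*) as total_rows,
       count(*) filter (where debt is null) as missing_debt,
       count(*) filter (where country_name is null or country_name = '') as missing_country
from international_debt_with_missing_values;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;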

&lt;h3&gt;
  
  
  3.1  The Total Amount of Debt Owed
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;debt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;total_debt&lt;/span&gt; 
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;international_debt_analysis&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The total amount of debt owed is &lt;strong&gt;2,823,893,300,273&lt;/strong&gt; (current US$).&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2  Number of Distinct Countries
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;country_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;distinct_country&lt;/span&gt; 
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;international_debt_analysis&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Distinct Countries = &lt;strong&gt;125&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3.3  Distinct Types of Debt Indicators
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;indicator_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indicator_name&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;international_debt_analysis&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;indicator_name&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;
&lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;indicator_name&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;
&lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;indicator_code&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;
&lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;indicator_code&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  Table1: Distinct Debt Indicators
&lt;/h5&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Serial&lt;/th&gt;
&lt;th&gt;Code&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;DT.INT.PRVT.CD&lt;/td&gt;
&lt;td&gt;PPG, private creditors (INT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;DT.AMT.OFFT.CD&lt;/td&gt;
&lt;td&gt;PPG, official creditors (AMT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;DT.INT.DLXF.CD&lt;/td&gt;
&lt;td&gt;Interest payments on external debt, long-term (INT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;DT.INT.DPNG.CD&lt;/td&gt;
&lt;td&gt;Interest payments on external debt, private nonguaranteed (PNG) (INT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;DT.DIS.PCBK.CD&lt;/td&gt;
&lt;td&gt;PPG, commercial banks (DIS, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;DT.AMT.PBND.CD&lt;/td&gt;
&lt;td&gt;PPG, bonds (AMT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;DT.DIS.MLAT.CD&lt;/td&gt;
&lt;td&gt;PPG, multilateral (DIS, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;DT.DIS.PRVT.CD&lt;/td&gt;
&lt;td&gt;PPG, private creditors (DIS, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;DT.INT.MLAT.CD&lt;/td&gt;
&lt;td&gt;PPG, multilateral (INT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;DT.INT.PBND.CD&lt;/td&gt;
&lt;td&gt;PPG, bonds (INT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;DT.INT.PROP.CD&lt;/td&gt;
&lt;td&gt;PPG, other private creditors (INT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;DT.DIS.OFFT.CD&lt;/td&gt;
&lt;td&gt;PPG, official creditors (DIS, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;DT.AMT.MLAT.CD&lt;/td&gt;
&lt;td&gt;PPG, multilateral (AMT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;DT.INT.OFFT.CD&lt;/td&gt;
&lt;td&gt;PPG, official creditors (INT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;DT.DIS.PROP.CD&lt;/td&gt;
&lt;td&gt;PPG, other private creditors (DIS, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;DT.AMT.PCBK.CD&lt;/td&gt;
&lt;td&gt;PPG, commercial banks (AMT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;DT.DIS.BLAT.CD&lt;/td&gt;
&lt;td&gt;PPG, bilateral (DIS, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;DT.AMT.DLXF.CD&lt;/td&gt;
&lt;td&gt;Principal repayments on external debt, long-term (AMT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;DT.AMT.PROP.CD&lt;/td&gt;
&lt;td&gt;PPG, other private creditors (AMT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;DT.AMT.PRVT.CD&lt;/td&gt;
&lt;td&gt;PPG, private creditors (AMT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;DT.AMT.BLAT.CD&lt;/td&gt;
&lt;td&gt;PPG, bilateral (AMT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;DT.INT.PCBK.CD&lt;/td&gt;
&lt;td&gt;PPG, commercial banks (INT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;DT.INT.BLAT.CD&lt;/td&gt;
&lt;td&gt;PPG, bilateral (INT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;DT.DIS.DLXF.CD&lt;/td&gt;
&lt;td&gt;Disbursements on external debt, long-term (DIS, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;DT.AMT.DPNG.CD&lt;/td&gt;
&lt;td&gt;Principal repayments on external debt, private nonguaranteed (PNG) (AMT, current US$)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  3.4  Country with Highest Total Debt, and the Amount
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;country_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;debt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;total_debt&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;international_debt_analysis&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;country_name&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;
&lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;country_name&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;country_name&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;total_debt&lt;/span&gt; &lt;span class="k"&gt;desc&lt;/span&gt;
&lt;span class="k"&gt;limit&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The country is China, with a total debt of &lt;strong&gt;283,748,948,518&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.5  The Average Debt Across Different Debt Indicators
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;indicator_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;debt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_debt&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;international_debt_with_missing_values&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;indicator_name&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;avg_debt&lt;/span&gt; &lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8aeqakdz7l7e3lwxyx4f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8aeqakdz7l7e3lwxyx4f.png" alt=" " width="800" height="653"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Fig1: Average Debt per Indicator Category
&lt;/h5&gt;

&lt;h3&gt;
  
  
  3.6  Country with Highest Principal Repayment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select country_name, SUM(debt) AS total_principal_repayment
from international_debt_with_missing_values
where indicator_name like '%Principal repayment%' and debt &amp;gt; 0
group by country_name
order by total_principal_repayment desc
limit 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The country is China, with a principal repayment amount of &lt;strong&gt;168,611,607,050&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.7  Most Common Debt Indicator
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select indicator_name, COUNT(*) AS frequency
from international_debt_with_missing_values
where indicator_name is not null
and indicator_name &amp;lt;&amp;gt; ''
group by indicator_name
order by frequency desc
limit 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The most common debt indicator is &lt;strong&gt;PPG, official creditors (AMT, current US$)&lt;/strong&gt;, appearing &lt;strong&gt;124&lt;/strong&gt; times.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.8  Other Key Debt Trends
&lt;/h3&gt;

&lt;h4&gt;
  
  
  3.8.1   Top 5 countries with the most debt
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select country_name, sum(debt) as total_debt
from international_debt_with_missing_values
group by country_name
order by total_debt desc
limit 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx32n0wllsru0wr2rk4at.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx32n0wllsru0wr2rk4at.png" alt="Average debt per indicator category" width="800" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Fig2: Top 5 Countries with the Most Debt
&lt;/h5&gt;

&lt;h4&gt;
  
  
  3.8.2   Five Countries with the Least Debt
&lt;/h4&gt;

&lt;p&gt;&lt;em&gt;This excludes countries registering 0 debt&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select country_name, sum(debt) as total_debt
from international_debt_with_missing_values
where debt &amp;gt; 0
group by country_name
order by total_debt asc
limit 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrrc6jqj2oksjpgiakpz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrrc6jqj2oksjpgiakpz.png" alt="Five Countries with the least debt" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Fig3: Five Countries with the Lowest Debt
&lt;/h5&gt;

&lt;h4&gt;
  
  
  3.8.3   Number of countries with missing debt values or 0 debt
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select count(distinct country_name) as countries_with_zero_or_missing_debt
from international_debt_with_missing_values
where debt is null or debt = 0;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The total number of countries with zero or missing debt values is &lt;strong&gt;110&lt;/strong&gt;.&lt;/p&gt;
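&lt;p&gt;Because a zero value and a missing value carry different meanings, the two cases can also be counted separately (same assumed table and column as above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- distinguish records that are genuinely zero from records that are absent
select count(*) filter (where debt is null) as missing_debt_rows,
       count(*) filter (where debt = 0) as zero_debt_rows
from international_debt_with_missing_values;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;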

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Overall, the data suggests a significant dependence on external financing, with repayment pressures concentrated in a few major economies and vulnerable groups. This underlines the importance of careful debt management policies, diversification of financing sources, and sustainable borrowing strategies to reduce long-term risks.&lt;/p&gt;

&lt;h4&gt;
  
  
  References and Further Reading
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;World Bank. (2023). &lt;em&gt;International Debt Statistics&lt;/em&gt;. Washington, DC: World Bank. Available at: &lt;a href="https://databank.worldbank.org/source/international-debt-statistics" rel="noopener noreferrer"&gt;https://databank.worldbank.org/source/international-debt-statistics&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;International Monetary Fund (IMF). (2022). &lt;em&gt;Global Debt Database&lt;/em&gt;. Washington, DC: IMF. Available at: &lt;a href="https://www.imf.org/en/Data" rel="noopener noreferrer"&gt;https://www.imf.org/en/Data&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reinhart, C. M., &amp;amp; Rogoff, K. S. (2010). &lt;em&gt;Growth in a Time of Debt&lt;/em&gt;. American Economic Review, 100(2), 573–578.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;FAO &amp;amp; UNCTAD. (2021). &lt;em&gt;Financing Sustainable Development in Developing Countries&lt;/em&gt;. Geneva: United Nations.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PostgreSQL Global Development Group. (2023). &lt;em&gt;PostgreSQL Documentation&lt;/em&gt;. Available at: &lt;a href="https://www.postgresql.org/docs/" rel="noopener noreferrer"&gt;https://www.postgresql.org/docs/&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;IBM Developer. (2021). &lt;em&gt;Best Practices in Data Analysis Using SQL&lt;/em&gt;. IBM Developer Portal.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;O’Neil, P., &amp;amp; O’Neil, E. (2014). &lt;em&gt;Database Principles, Programming, and Performance&lt;/em&gt;. Morgan Kaufmann.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DataCamp. (2022). &lt;em&gt;SQL for Data Analysis: Concepts and Practice&lt;/em&gt;. Available at: &lt;a href="https://www.datacamp.com" rel="noopener noreferrer"&gt;https://www.datacamp.com&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>internationaldebt</category>
      <category>exploringpostgressql</category>
      <category>dataanalysis</category>
      <category>exploringdata</category>
    </item>
    <item>
      <title>A Guide for Creating a Linux Server VM on Azure &amp; Installing PostgreSQL</title>
      <dc:creator>Loi2008</dc:creator>
      <pubDate>Sat, 02 Aug 2025 07:42:59 +0000</pubDate>
      <link>https://dev.to/loi2008/a-guide-for-creating-a-linux-server-vm-on-azure-installing-postgresql-1dnh</link>
      <guid>https://dev.to/loi2008/a-guide-for-creating-a-linux-server-vm-on-azure-installing-postgresql-1dnh</guid>
      <description>&lt;h2&gt;
  
  
  This guide covers:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Creating a Linux VM on Azure&lt;/li&gt;
&lt;li&gt;Installing PostgreSQL&lt;/li&gt;
&lt;li&gt;Configuring PostgreSQL for local and remote access&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  ✅ Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Active &lt;strong&gt;Azure subscription&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Access to &lt;a href="https://portal.azure.com" rel="noopener noreferrer"&gt;Azure Portal&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;SSH client (e.g., OpenSSH on Linux/macOS, or WSL/PuTTY on Windows)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧰 Part 1: Create a Linux VM on Azure
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Go to Azure Portal
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to: &lt;a href="https://portal.azure.com" rel="noopener noreferrer"&gt;https://portal.azure.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Go to &lt;strong&gt;"Virtual Machines"&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;"+ Create" &amp;gt; "Azure virtual machine"&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Configure VM Basics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subscription&lt;/strong&gt;: Choose your active subscription (create one first if you do not have one)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource Group&lt;/strong&gt;: Create/select one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VM Name&lt;/strong&gt;: e.g., &lt;code&gt;linux-postgres-vm&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Region&lt;/strong&gt;: Closest to your location&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image&lt;/strong&gt;: &lt;code&gt;Ubuntu 22.04 LTS&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Size&lt;/strong&gt;: e.g., &lt;code&gt;Standard B1s&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authentication&lt;/strong&gt;: &lt;code&gt;SSH public key&lt;/code&gt; (recommended). Password authentication is also supported, but if you use it, pick a strong password rather than something guessable like &lt;code&gt;1234&lt;/code&gt;.
&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Username&lt;/strong&gt;: e.g., &lt;code&gt;azureuser&lt;/code&gt; (this is the name you will use when connecting over SSH)
&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;SSH public key&lt;/strong&gt;: Paste your &lt;strong&gt;public key&lt;/strong&gt; if using SSH&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can generate one using:&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh-keygen &lt;span class="nt"&gt;-t&lt;/span&gt; rsa &lt;span class="nt"&gt;-b&lt;/span&gt; 2048
&lt;/code&gt;&lt;/pre&gt;




&lt;/li&gt;

&lt;/ul&gt;
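&lt;p&gt;After generating the key pair, you need the &lt;em&gt;public&lt;/em&gt; half to paste into the portal. Assuming the default output path from the command above, you can print it with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Print the public key; this is the value to paste into the Azure portal
cat ~/.ssh/id_rsa.pub
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The private key (&lt;code&gt;~/.ssh/id_rsa&lt;/code&gt;) stays on your machine and is never uploaded.&lt;/p&gt;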

&lt;h3&gt;
  
  
  3. Networking
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Public IP&lt;/strong&gt;: Enabled&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NSG (firewall)&lt;/strong&gt;: Allow SSH (port 22) &lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Create the VM
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Click &lt;strong&gt;"Review + create"&lt;/strong&gt; then &lt;strong&gt;"Create"&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🔌 Part 2: SSH into the VM
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh azureuser@&amp;lt;your-vm-public-ip- e.g, 192.168.20.139&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🐘 Part 3: Install PostgreSQL
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Update Packages
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;apt upgrade &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Install PostgreSQL
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;postgresql postgresql-contrib &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Enable and Start the Service
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;postgresql
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl start postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🔐 Part 4: Configure PostgreSQL
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Switch to postgres User
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Access PostgreSQL Terminal
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;psql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Create User and Database
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;myPostgres&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;LOGIN&lt;/span&gt; &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'1234'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;myPostgres&lt;/span&gt; &lt;span class="k"&gt;CREATEDB&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;student&lt;/span&gt; &lt;span class="k"&gt;OWNER&lt;/span&gt; &lt;span class="n"&gt;myPostgres&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;
&lt;span class="n"&gt;exit&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
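&lt;p&gt;One PostgreSQL detail worth knowing here: unquoted identifiers are folded to lowercase, so &lt;code&gt;CREATE ROLE myPostgres&lt;/code&gt; actually stores the role as &lt;code&gt;mypostgres&lt;/code&gt;, and that lowercased name is what you must use when connecting. A quick sanity check from inside &lt;code&gt;psql&lt;/code&gt; (a sketch using the names above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Returns one row: the role was stored under its lowercased name
SELECT rolname FROM pg_roles WHERE rolname = 'mypostgres';
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;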






&lt;h2&gt;
  
  
  🌐 Part 5: Allow Remote Access (Optional)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Edit postgresql.conf
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/postgresql/15/main/postgresql.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Find:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;listen_addresses = 'localhost'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Change to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;listen_addresses = '*'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Edit pg_hba.conf
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/postgresql/15/main/pg_hba.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add this line (&lt;code&gt;0.0.0.0/0&lt;/code&gt; accepts connections from any address; restrict the CIDR range in production):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;host    all             all             0.0.0.0/0               md5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
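&lt;p&gt;If you prefer not to edit the files interactively, the two changes above can be scripted. This is a sketch: the &lt;code&gt;15&lt;/code&gt; in the path comes from the commands above and may differ on your system:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;CONF=/etc/postgresql/15/main
# Listen on all interfaces instead of only localhost
sudo sed -i "s/^#\?listen_addresses.*/listen_addresses = '*'/" "$CONF/postgresql.conf"
# Accept password-authenticated connections from any address
echo "host    all             all             0.0.0.0/0               md5" | sudo tee -a "$CONF/pg_hba.conf"
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;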



&lt;h3&gt;
  
  
  3. Allow Port 5432 in Azure NSG
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Go to Azure portal &amp;gt; VM &amp;gt; &lt;strong&gt;Networking&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;"Add inbound port rule"&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Port: &lt;code&gt;5432&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Protocol: TCP&lt;/li&gt;
&lt;li&gt;Action: Allow&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Restart PostgreSQL
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧪 Part 6: Connect Remotely
&lt;/h2&gt;

&lt;p&gt;From your local machine (with the &lt;code&gt;psql&lt;/code&gt; client installed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;psql &lt;span class="nt"&gt;-U&lt;/span&gt; myPostgres &lt;span class="nt"&gt;-d&lt;/span&gt; student &lt;span class="nt"&gt;-h&lt;/span&gt; &amp;lt;192.168.20.139&amp;gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 5432
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
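&lt;p&gt;&lt;code&gt;psql&lt;/code&gt; and most GUI tools also accept a single connection URI. Using the role, password, and database created earlier (note the lowercased role name, since PostgreSQL folds unquoted identifiers to lowercase):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;postgresql://mypostgres:1234@&amp;lt;your-vm-public-ip&amp;gt;:5432/student
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;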



&lt;p&gt;Or use GUI tools such as &lt;strong&gt;DBeaver&lt;/strong&gt;. &lt;/p&gt;




&lt;p&gt;Your Linux VM is now running PostgreSQL and ready to accept connections.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
