<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: kito2718</title>
    <description>The latest articles on DEV Community by kito2718 (@kito2718).</description>
    <link>https://dev.to/kito2718</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4005968%2Fd44f6292-d85b-4f22-897d-578dc7296268.png</url>
      <title>DEV Community: kito2718</title>
      <link>https://dev.to/kito2718</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kito2718"/>
    <language>en</language>
    <item>
      <title>Kaggle Titanic: Improving Survival Prediction with Random Forest Age Imputation</title>
      <dc:creator>kito2718</dc:creator>
      <pubDate>Thu, 02 Jul 2026 14:00:04 +0000</pubDate>
      <link>https://dev.to/kito2718/kaggle-titanic-improving-survival-prediction-with-random-forest-age-imputation-5b3l</link>
      <guid>https://dev.to/kito2718/kaggle-titanic-improving-survival-prediction-with-random-forest-age-imputation-5b3l</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/kito2718/setting-up-kaggle-titanic-environment-on-a-local-pc-336k"&gt;Kaggle Practice 1: Setting Up a Local Environment for the Kaggle Titanic Competition&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/kito2718/kaggle-titanic-my-first-submission-eda-feature-engineering-and-model-evaluation-896"&gt;Kaggle Practice 2: First Submission&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/kito2718/kaggle-titanic-cabin-feature-engineering-is-it-really-effective-44nc"&gt;Kaggle Practice 3: Feature Engineering for Cabin&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/kito2718/kaggle-titanic-improving-survival-prediction-with-random-forest-age-imputation-5b3l"&gt;Kaggle Practice 4: Feature Engineering (Imputing Age with Random Forest)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.kaggle.com/c/titanic" rel="noopener noreferrer"&gt;https://www.kaggle.com/c/titanic&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kito2718/KaggleTitanic" rel="noopener noreferrer"&gt;Available on GitHub&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Abstract
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Changed age imputation from median values by title to predictive imputation using &lt;code&gt;RandomForestRegressor&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The best 5-Fold CV (Cross-Validation) score improved from 0.8507 to 0.8519 (Logistic Regression).&lt;/li&gt;
&lt;li&gt;Kaggle Public Score increased from 0.78708 to 0.78947.&lt;/li&gt;
&lt;li&gt;Validation code is committed to GitHub: &lt;a href="https://github.com/kito2718/KaggleTitanic/blob/main/notebooks/titanic_eda_20260702_2031_age_imputation_.ipynb" rel="noopener noreferrer"&gt;titanic_eda_20260702_2031_age_imputation.ipynb&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Overview
&lt;/h1&gt;

&lt;p&gt;In the Kaggle Titanic: Machine Learning from Disaster competition, passenger Age is a critical factor for predicting survival.&lt;br&gt;
Previously, we filled missing values with the median age for each passenger title (Mr, Miss, Mrs, Master, Rare). This time, we tried a more advanced approach: predicting the missing ages using a machine learning model (RandomForestRegressor) based on other features (Pclass, Sex, SibSp, Parch, Fare, Embarked, Title, Deck, FamilySize, IsAlone).&lt;/p&gt;
&lt;h1&gt;
  
  
  Implementation
&lt;/h1&gt;

&lt;p&gt;Here is the preprocessing code for the imputation. We trained &lt;code&gt;RandomForestRegressor&lt;/code&gt; on passengers with known ages and predicted the missing values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
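&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomForestRegressor&lt;/span&gt;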

&lt;span class="c1"&gt;# Columns used by the Age model: predictors plus the Age target itself
&lt;/span&gt;&lt;span class="n"&gt;age_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Pclass&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Sex&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SibSp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Parch&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Fare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Deck&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;FamilySize&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;IsAlone&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;df_age_prep&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_all&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;age_features&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# One-Hot Encoding for categorical features
&lt;/span&gt;&lt;span class="n"&gt;cat_cols_for_age&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Sex&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Deck&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;df_age_encoded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_dummies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_age_prep&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cat_cols_for_age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;drop_first&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Split into known and unknown age datasets
&lt;/span&gt;&lt;span class="n"&gt;train_age&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_age_encoded&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df_age_encoded&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;notnull&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;span class="n"&gt;test_age&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_age_encoded&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df_age_encoded&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;isnull&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;

&lt;span class="n"&gt;X_train_age&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_age&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;y_train_age&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_age&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;X_test_age&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test_age&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Train regressor and predict missing age
&lt;/span&gt;&lt;span class="n"&gt;age_regressor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RandomForestRegressor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;age_regressor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train_age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train_age&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;predicted_ages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;age_regressor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test_age&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Impute missing values in the original dataframe
&lt;/span&gt;&lt;span class="n"&gt;df_all&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df_all&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;isnull&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;predicted_ages&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Validation Results
&lt;/h1&gt;

&lt;p&gt;Comparison of 5-Fold CV (Cross-Validation) accuracy across different models:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Before (Median by Title)&lt;/th&gt;
&lt;th&gt;After (Random Forest Imputation)&lt;/th&gt;
&lt;th&gt;Difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Logistic Regression&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.8507 +/- 0.0104&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.8519 +/- 0.0115&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+0.0012&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Random Forest&lt;/td&gt;
&lt;td&gt;0.8204 +/- 0.0193&lt;/td&gt;
&lt;td&gt;0.8249 +/- 0.0348&lt;/td&gt;
&lt;td&gt;+0.0045&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XGBoost&lt;/td&gt;
&lt;td&gt;0.8215 +/- 0.0241&lt;/td&gt;
&lt;td&gt;0.8226 +/- 0.0244&lt;/td&gt;
&lt;td&gt;+0.0011&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LightGBM&lt;/td&gt;
&lt;td&gt;0.8496 +/- 0.0211&lt;/td&gt;
&lt;td&gt;0.8485 +/- 0.0147&lt;/td&gt;
&lt;td&gt;-0.0011&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Logistic Regression achieved our personal best 5-Fold CV accuracy of &lt;strong&gt;0.8519&lt;/strong&gt;.&lt;br&gt;
Random Forest and XGBoost also improved slightly, while LightGBM dropped by 0.0011. All of these differences are smaller than the +/- spread across folds, so they should be read as tentative.&lt;/p&gt;
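
&lt;p&gt;To put these differences in context, the following sketch divides each model's score change by the smaller of the two +/- values from the table above. It assumes the +/- figures are standard deviations across the five folds; the variable and function names are illustrative and not taken from the notebook.&lt;/p&gt;

```python
# (model, before_mean, before_spread, after_mean, after_spread), copied from the table.
# Assumption: the +/- values are standard deviations across the five folds.
cv_results = [
    ("Logistic Regression", 0.8507, 0.0104, 0.8519, 0.0115),
    ("Random Forest", 0.8204, 0.0193, 0.8249, 0.0348),
    ("XGBoost", 0.8215, 0.0241, 0.8226, 0.0244),
    ("LightGBM", 0.8496, 0.0211, 0.8485, 0.0147),
]


def change_relative_to_spread(before_mean, before_spread, after_mean, after_spread):
    """Return the score change and its size as a fraction of the smaller spread."""
    diff = round(after_mean - before_mean, 4)
    ratio = abs(diff) / min(before_spread, after_spread)
    return diff, ratio


summary = {}
for model, b_mean, b_spread, a_mean, a_spread in cv_results:
    diff, ratio = change_relative_to_spread(b_mean, b_spread, a_mean, a_spread)
    summary[model] = (diff, ratio)
    print(f"{model}: diff={diff:+.4f}, ratio to spread={ratio:.2f}")
```

&lt;p&gt;A ratio well below 1 means the change is small compared with the fold-to-fold variation.&lt;/p&gt;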

&lt;h1&gt;
  
  
  Kaggle Submission Score
&lt;/h1&gt;

&lt;p&gt;We predicted the test dataset using the updated Logistic Regression model and submitted it to Kaggle.&lt;br&gt;
Our Public Score improved from &lt;strong&gt;0.78708 to 0.78947&lt;/strong&gt;!&lt;br&gt;
It is encouraging that the local CV score and the Kaggle Public Score moved in the same direction.&lt;/p&gt;
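
&lt;p&gt;As a rough sanity check on the size of this change, the sketch below converts each Public Score into a count of correctly predicted passengers. It assumes the score is plain accuracy over all 418 rows of the Titanic test set; if the leaderboard scores only part of the test set, the counts would differ. The helper name is hypothetical.&lt;/p&gt;

```python
TEST_ROWS = 418  # size of the Titanic test set (assumed to be fully scored)


def correct_predictions(public_score, n_rows=TEST_ROWS):
    """Convert an accuracy score into the nearest whole number of correct rows."""
    return round(public_score * n_rows)


before = correct_predictions(0.78708)
after = correct_predictions(0.78947)
print(before, after, after - before)
```

&lt;p&gt;Under that assumption, the two scores correspond to 329 and 330 correct predictions, a difference of one passenger.&lt;/p&gt;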

&lt;h1&gt;
  
  
  Summary &amp;amp; Next Steps
&lt;/h1&gt;

&lt;p&gt;By estimating each passenger's age from other relevant features instead of using a single median per title, the imputed ages reflect more of each passenger's profile, and both the CV score and the Public Score improved slightly.&lt;br&gt;
Next, we will try hyperparameter tuning (using Optuna) and model ensembling to improve further.&lt;/p&gt;

&lt;p&gt;Hope this helps!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>kaggle</category>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Kaggle Titanic: Cabin Feature Engineering (Is It Really Effective?)</title>
      <dc:creator>kito2718</dc:creator>
      <pubDate>Thu, 02 Jul 2026 13:56:46 +0000</pubDate>
      <link>https://dev.to/kito2718/kaggle-titanic-cabin-feature-engineering-is-it-really-effective-44nc</link>
      <guid>https://dev.to/kito2718/kaggle-titanic-cabin-feature-engineering-is-it-really-effective-44nc</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/kito2718/setting-up-kaggle-titanic-environment-on-a-local-pc-336k"&gt;Kaggle Practice 1: Setting Up a Local Environment for the Kaggle Titanic Competition&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/kito2718/kaggle-titanic-my-first-submission-eda-feature-engineering-and-model-evaluation-896"&gt;Kaggle Practice 2: First Submission&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/kito2718/kaggle-titanic-cabin-feature-engineering-is-it-really-effective-44nc"&gt;Kaggle Practice 3: Feature Engineering for Cabin&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/kito2718/kaggle-titanic-improving-survival-prediction-with-random-forest-age-imputation-5b3l"&gt;Kaggle Practice 4: Feature Engineering (Imputing Age with Random Forest)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.kaggle.com/c/titanic" rel="noopener noreferrer"&gt;https://www.kaggle.com/c/titanic&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kito2718/KaggleTitanic" rel="noopener noreferrer"&gt;Available on GitHub&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Abstract
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Extracted the deck (floor) information from the first letter of the &lt;code&gt;Cabin&lt;/code&gt; feature and created a new feature.&lt;/li&gt;
&lt;li&gt;Evaluated the new feature with 5-Fold CV (Cross-Validation) on models such as LightGBM.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;This post continues my work on the Kaggle Titanic competition.&lt;br&gt;
In the Titanic dataset, the &lt;code&gt;Cabin&lt;/code&gt; (cabin number) column has more than 70% missing values, so I had previously excluded it.&lt;br&gt;
However, the physical distance to the lifeboat stations and the rate of flooding during the sinking may have differed by deck, so the cabin deck could still be a meaningful feature for predicting survival.&lt;/p&gt;
&lt;h1&gt;
  
  
  Implementation
&lt;/h1&gt;

&lt;p&gt;I implemented a preprocessing function to extract the first letter of &lt;code&gt;Cabin&lt;/code&gt; as the &lt;code&gt;Deck&lt;/code&gt; feature, filling missing values with &lt;code&gt;'U'&lt;/code&gt; (Unknown).&lt;br&gt;
The code is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
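&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LabelEncoder&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;feature_engineering&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;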
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# Existing preprocessing (e.g. extracting Title, filling Age)
&lt;/span&gt;    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Fare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Fare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Fare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Extract the first letter of Cabin as Deck
&lt;/span&gt;    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Deck&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Cabin&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;U&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;le&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LabelEncoder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Sex&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Deck&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;le&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After preprocessing, I encoded the categorical variables using &lt;code&gt;LabelEncoder&lt;/code&gt; and added the &lt;code&gt;Deck&lt;/code&gt; feature to the model inputs.&lt;/p&gt;
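
&lt;p&gt;To illustrate what the &lt;code&gt;Deck&lt;/code&gt; extraction does, here is a small standalone sketch in plain Python. The helper name &lt;code&gt;extract_deck&lt;/code&gt; and the sample cabin strings are illustrative and not part of the notebook.&lt;/p&gt;

```python
def extract_deck(cabin):
    """Return the first letter of a cabin string, or 'U' when it is missing."""
    if cabin is None or cabin == '':
        return 'U'
    return cabin[0]


# Sample inputs: single cabins, a missing value, and a multi-cabin entry.
cabins = ['C85', None, 'E46', 'B57 B59 B63 B66']
decks = [extract_deck(cabin) for cabin in cabins]
print(decks)  # ['C', 'U', 'E', 'B']
```

&lt;p&gt;Note that an entry listing several cabins keeps only the deck of the first one.&lt;/p&gt;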

&lt;h1&gt;
  
  
  Results
&lt;/h1&gt;

&lt;p&gt;Evaluation results using 5-Fold CV:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Before (Baseline)&lt;/th&gt;
&lt;th&gt;After (Cabin Added)&lt;/th&gt;
&lt;th&gt;Difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Logistic Regression&lt;/td&gt;
&lt;td&gt;0.8014 +/- 0.0133&lt;/td&gt;
&lt;td&gt;0.7991 +/- 0.0199&lt;/td&gt;
&lt;td&gt;-0.0023&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Random Forest&lt;/td&gt;
&lt;td&gt;0.8227 +/- 0.0077&lt;/td&gt;
&lt;td&gt;0.8148 +/- 0.0149&lt;/td&gt;
&lt;td&gt;-0.0079&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XGBoost&lt;/td&gt;
&lt;td&gt;0.8181 +/- 0.0220&lt;/td&gt;
&lt;td&gt;0.8227 +/- 0.0159&lt;/td&gt;
&lt;td&gt;+0.0046&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LightGBM&lt;/td&gt;
&lt;td&gt;0.8350 +/- 0.0178&lt;/td&gt;
&lt;td&gt;0.8361 +/- 0.0278&lt;/td&gt;
&lt;td&gt;+0.0011&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

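&lt;p&gt;The validation score of LightGBM improved slightly from 0.8350 to 0.8361, and XGBoost, another tree-based model, also improved.&lt;br&gt;
Logistic Regression and Random Forest got slightly worse, however, and none of the differences clearly exceeds the +/- spread across folds, so the effect of the feature is unclear from CV alone.&lt;/p&gt;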

&lt;h1&gt;
  
  
  Submitting to Kaggle
&lt;/h1&gt;

&lt;p&gt;When I submitted the predictions to the Titanic competition, the public score dropped slightly from 0.77272 to 0.77033. Simply adding this feature may have introduced noise or led to overfitting.&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;Given the score drop after submission, I will next try creating group features by clustering the cabin information.&lt;/p&gt;

&lt;p&gt;Hope this helps!&lt;/p&gt;

&lt;p&gt;Japanese version:&lt;br&gt;
&lt;a href="https://zenn.dev/rg687076/articles/zenn_260627_0000_00_create_local_titanic_env" rel="noopener noreferrer"&gt;Kaggle Practice 1: Setting up Kaggle Titanic Environment on a Local PC&lt;/a&gt;&lt;br&gt;
&lt;a href="https://zenn.dev/rg687076/articles/zenn_260627_0000_01_first_submission" rel="noopener noreferrer"&gt;Kaggle Practice 2: First Submission&lt;/a&gt;&lt;br&gt;
&lt;a href="https://zenn.dev/rg687076/articles/zenn_260627_1940_01_cabin_feature" rel="noopener noreferrer"&gt;Kaggle Practice 3: Feature Engineering for Cabin&lt;/a&gt;&lt;br&gt;
&lt;a href="https://zenn.dev/rg687076/articles/zenn_20260702_2031_age_imputation" rel="noopener noreferrer"&gt;Kaggle Practice 4: Feature Engineering (Imputing Age with Random Forest)&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kaggle</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>lightgbm</category>
    </item>
    <item>
      <title>Kaggle Titanic: My First Submission (EDA, Feature Engineering, and Model Evaluation)</title>
      <dc:creator>kito2718</dc:creator>
      <pubDate>Thu, 02 Jul 2026 13:49:36 +0000</pubDate>
      <link>https://dev.to/kito2718/kaggle-titanic-my-first-submission-eda-feature-engineering-and-model-evaluation-896</link>
      <guid>https://dev.to/kito2718/kaggle-titanic-my-first-submission-eda-feature-engineering-and-model-evaluation-896</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/kito2718/setting-up-kaggle-titanic-environment-on-a-local-pc-336k"&gt;Kaggle Practice 1: Setting Up a Local Environment for the Kaggle Titanic Competition&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/kito2718/kaggle-titanic-my-first-submission-eda-feature-engineering-and-model-evaluation-896"&gt;Kaggle Practice 2: First Submission&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/kito2718/kaggle-titanic-cabin-feature-engineering-is-it-really-effective-44nc"&gt;Kaggle Practice 3: Feature Engineering for Cabin&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/kito2718/kaggle-titanic-improving-survival-prediction-with-random-forest-age-imputation-5b3l"&gt;Kaggle Practice 4: Feature Engineering (Imputing Age with Random Forest)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.kaggle.com/c/titanic" rel="noopener noreferrer"&gt;https://www.kaggle.com/c/titanic&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kito2718/KaggleTitanic" rel="noopener noreferrer"&gt;Available on GitHub&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Abstract
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Participated in the Kaggle Titanic competition and performed Exploratory Data Analysis (EDA).&lt;/li&gt;
&lt;li&gt;Performed basic feature engineering such as extracting title prefix (Title) and family size (FamilySize).&lt;/li&gt;
&lt;li&gt;Compared four modeling approaches (Logistic Regression, Random Forest, XGBoost, and LightGBM) using 5-fold cross-validation.&lt;/li&gt;
&lt;li&gt;LightGBM yielded the best CV score, resulting in a public leaderboard score of 0.77272.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;This is a continuation of my Kaggle Titanic journey.&lt;br&gt;
In this post, we'll start by visualizing and analyzing variables likely related to survival (such as gender and passenger class) and handling missing values as part of the exploratory data analysis (EDA) process.&lt;br&gt;
After that, we'll extract titles from passenger names, build new features like family size, and compare model performance using 5-Fold Cross-Validation (CV).&lt;/p&gt;
&lt;h1&gt;
  
  
  Exploratory Data Analysis (EDA)
&lt;/h1&gt;

&lt;p&gt;The Titanic dataset contains 891 training rows and 418 test rows. Missing values are one of the first challenges. The share of missing values per column in the training data is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cabin&lt;/strong&gt;: 77.1% (mostly missing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Age&lt;/strong&gt;: 19.9% (about 20% missing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embarked&lt;/strong&gt;: 0.2% (only 2 missing values)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Visualization
&lt;/h2&gt;

&lt;p&gt;Let's plot basic variables against survival rate.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Survival Rate by Gender and Passenger Class
&lt;/h3&gt;

&lt;p&gt;Looking at survival rates by gender and passenger class (Pclass) reveals clear patterns. Female passengers had a significantly higher survival rate, and passengers in 1st class (Pclass = 1) had the highest chance of survival.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F7bn71vwktw4xjye53r0x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F7bn71vwktw4xjye53r0x.png" alt="Survival Rate by Gender and Passenger Class" width="800" height="207"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Age Distribution
&lt;/h3&gt;

&lt;p&gt;Here is the histogram of the age distribution, split by survival status. Younger passengers, especially infants and small children, tended to have higher survival rates, and we can observe specific age groups with higher mortality.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F8o36d92x3xxg53xzxf6h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F8o36d92x3xxg53xzxf6h.png" alt="Age Distribution Histogram" width="800" height="222"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Correlation Map
&lt;/h3&gt;

&lt;p&gt;Let's examine the correlation heatmap between numerical features. As expected, Pclass and Fare exhibit a strong negative correlation because 1st-class tickets (Pclass = 1) were generally more expensive.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fhhmfapaww1x0v0sqpbbm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fhhmfapaww1x0v0sqpbbm.png" alt="Correlation Heatmap" width="747" height="584"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the formula for the Pearson correlation coefficient ($r$):&lt;br&gt;
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$&lt;/p&gt;

&lt;p&gt;Simplifying this using covariance and standard deviation:&lt;br&gt;
$$r = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$$&lt;br&gt;
Where $\text{Cov}(X, Y)$ is the covariance between $X$ and $Y$ (representing how they move together), and $\sigma_X, \sigma_Y$ are their respective standard deviations (representing the spread of the data).&lt;/p&gt;
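
&lt;p&gt;As a quick check that the two formulas agree, here is a minimal sketch in plain Python. The function names and the six sample (Pclass, Fare) pairs are made up for illustration and are not taken from the dataset.&lt;/p&gt;

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from the sums-of-deviations form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return numerator / (spread_x * spread_y)


def pearson_r_from_cov(xs, ys):
    """Pearson correlation as Cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sigma_x * sigma_y)


# Hypothetical sample: lower Pclass numbers paired with higher fares.
pclass = [1, 1, 2, 2, 3, 3]
fare = [80.0, 60.0, 25.0, 20.0, 8.0, 7.5]

r_direct = pearson_r(pclass, fare)
r_cov = pearson_r_from_cov(pclass, fare)
print(round(r_direct, 4), round(r_cov, 4))
```

&lt;p&gt;Both functions return the same negative value for this sample, since the factors of 1/n cancel between the covariance and the two standard deviations.&lt;/p&gt;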
&lt;h1&gt;
  
  
  Preprocessing and Feature Engineering
&lt;/h1&gt;

&lt;p&gt;Based on our EDA findings, I wrote a &lt;code&gt;feature_engineering&lt;/code&gt; function to prepare the data:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Extract Title&lt;/strong&gt;: Extract honorifics (Mr, Miss, Mrs, Master, etc.) from &lt;code&gt;Name&lt;/code&gt; using regular expressions, mapping rare titles to &lt;code&gt;Rare&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impute Age&lt;/strong&gt;: Impute missing ages with the median age of the passenger's title group rather than a single global median. This produces more realistic estimates (e.g., child-level ages for 'Master' and adult-level ages for 'Mr' and 'Mrs').&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FamilySize and IsAlone&lt;/strong&gt;: Sum sibling/spouse count (&lt;code&gt;SibSp&lt;/code&gt;) and parent/child count (&lt;code&gt;Parch&lt;/code&gt;) plus one (for the passenger themselves) to create &lt;code&gt;FamilySize&lt;/code&gt;. If &lt;code&gt;FamilySize&lt;/code&gt; is 1, set the &lt;code&gt;IsAlone&lt;/code&gt; flag to 1.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Additional imputations&lt;/strong&gt;: Fill missing &lt;code&gt;Fare&lt;/code&gt; with the median and &lt;code&gt;Embarked&lt;/code&gt; with the mode.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Categorical Encoding&lt;/strong&gt;: Encode &lt;code&gt;Sex&lt;/code&gt;, &lt;code&gt;Embarked&lt;/code&gt;, and &lt;code&gt;Title&lt;/code&gt; using &lt;code&gt;LabelEncoder&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here is the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;feature_engineering&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract Title from Name
&lt;/span&gt;    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; ([A-Za-z]+)\.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expand&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;title_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mr&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mr&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Miss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Miss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mrs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mrs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Master&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Master&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Dr&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rev&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Col&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Major&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mlle&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Miss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Countess&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Ms&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Miss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Lady&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Jonkheer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Dona&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mme&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mrs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Capt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Sir&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title_map&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Impute missing Age with median of each Title group
&lt;/span&gt;    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;

    &lt;span class="c1"&gt;# Family-related features
&lt;/span&gt;    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;FamilySize&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SibSp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Parch&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;IsAlone&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;FamilySize&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Impute Fare and Embarked
&lt;/span&gt;    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Fare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Fare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Fare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Label Encoding for categorical variables
&lt;/span&gt;    &lt;span class="n"&gt;le&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LabelEncoder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Sex&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;le&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
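&lt;p&gt;To make the first two steps easier to follow, here is a small plain-Python sketch of the same idea without pandas. The regular expression is the one used in &lt;code&gt;feature_engineering&lt;/code&gt;; the helper name &lt;code&gt;extract_title&lt;/code&gt; and the toy rows are assumptions for illustration only.&lt;/p&gt;

```python
import re
from statistics import median

# Same pattern as in feature_engineering: a space, letters, then a literal dot
TITLE_PATTERN = re.compile(r' ([A-Za-z]+)\.')


def extract_title(name):
    """Return the first honorific found in a passenger name, or None."""
    match = TITLE_PATTERN.search(name)
    return match.group(1) if match else None


# Hypothetical toy rows (name, age); None marks a missing age
rows = [
    ('Braund, Mr. Owen Harris', 22.0),
    ('Smith, Mr. John', 40.0),
    ('Doe, Mr. James', None),
    ('Palsson, Master. Gosta Leonard', 2.0),
    ('Rice, Master. Arthur', 4.0),
    ('Roe, Master. Tom', None),
]

# Collect the known ages per title
ages_by_title = {}
for name, age in rows:
    if age is not None:
        ages_by_title.setdefault(extract_title(name), []).append(age)

# Median age per title
title_median = {title: median(ages) for title, ages in ages_by_title.items()}

# Fill each missing age with the median of its title group
imputed = [
    (name, age if age is not None else title_median[extract_title(name)])
    for name, age in rows
]
for name, age in imputed:
    print(extract_title(name), age)
```

&lt;p&gt;In this toy data the missing 'Mr' age becomes 31.0 and the missing 'Master' age becomes 3.0, whereas a single global median would give both passengers the same age.&lt;/p&gt;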



&lt;p&gt;After preprocessing, we selected 10 features:&lt;br&gt;
&lt;code&gt;['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked', 'Title', 'FamilySize', 'IsAlone']&lt;/code&gt;.&lt;/p&gt;
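&lt;p&gt;One thing to watch: if &lt;code&gt;feature_engineering&lt;/code&gt; is called separately on the training and test data, &lt;code&gt;LabelEncoder&lt;/code&gt; is also fitted separately on each. It assigns integer codes in sorted order of the labels it sees, so a category missing from one file would shift the codes. The sketch below mimics that behavior in plain Python; &lt;code&gt;fit_codes&lt;/code&gt; is a hypothetical stand-in, not a scikit-learn function, and the toy columns are invented.&lt;/p&gt;

```python
def fit_codes(values):
    """Mimic LabelEncoder.fit: integer codes follow the sorted unique labels."""
    return {label: code for code, label in enumerate(sorted(set(values)))}


# Hypothetical toy columns: the test split happens to contain no 'Master'
train_titles = ['Mr', 'Miss', 'Master', 'Mrs', 'Rare']
test_titles = ['Mr', 'Miss', 'Mrs', 'Rare']

# Fitting separately gives the same label a different code in each split
train_codes = fit_codes(train_titles)
test_codes_separate = fit_codes(test_titles)
print(train_codes['Mr'], test_codes_separate['Mr'])

# Fitting once on the training labels and reusing the mapping keeps them aligned
encoded_train = [train_codes[t] for t in train_titles]
encoded_test = [train_codes[t] for t in test_titles]
print(encoded_train, encoded_test)
```

&lt;p&gt;Whether this matters depends on the data: if both files contain every category, the separately fitted codes coincide. Fitting once and reusing the mapping removes the dependency on that coincidence.&lt;/p&gt;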
&lt;h1&gt;
  
  
  Model Evaluation and Results
&lt;/h1&gt;

&lt;p&gt;We compared four models (Logistic Regression, Random Forest, XGBoost, and LightGBM) using stratified 5-fold cross-validation (CV), scored by accuracy.&lt;/p&gt;

&lt;p&gt;Here is the evaluation code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;cv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StratifiedKFold&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_splits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Logistic Regression&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Random Forest&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="nc"&gt;RandomForestClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;XGBoost&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;             &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;XGBClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eval_metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;logloss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbosity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;LightGBM&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;           &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LGBMClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbosity&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_leaves&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cross_val_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; +/- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;std&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
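&lt;p&gt;To illustrate what "stratified" means here, the following plain-Python sketch deals the indices of each class out to the folds so that every fold keeps roughly the overall survival rate. This is a simplified illustration under my own assumptions (the function name &lt;code&gt;stratified_folds&lt;/code&gt; is hypothetical), not scikit-learn's actual implementation.&lt;/p&gt;

```python
import random


def stratified_folds(labels, n_splits=5, seed=42):
    """Assign each sample index to a fold so every fold keeps the class ratio.

    Simplified illustration: shuffle the indices of each class, then deal
    them out to the folds round-robin.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_splits].append(i)
    return folds


# Toy labels with the same class counts as the Titanic training set
# (549 did not survive, 342 survived)
labels = [0] * 549 + [1] * 342
folds = stratified_folds(labels)

overall = sum(labels) / len(labels)
fold_rates = [sum(labels[i] for i in fold) / len(fold) for fold in folds]
print(round(overall, 3), [round(r, 3) for r in fold_rates])
```

&lt;p&gt;Every fold ends up with a survival rate close to the overall rate, so no validation fold is unusually easy or hard just because of its class balance.&lt;/p&gt;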



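&lt;p&gt;The average cross-validation scores are shown below:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;5-Fold CV Accuracy (mean +/- std)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Logistic Regression&lt;/td&gt;
&lt;td&gt;0.8014 +/- 0.0133&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Random Forest&lt;/td&gt;
&lt;td&gt;0.8227 +/- 0.0077&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XGBoost&lt;/td&gt;
&lt;td&gt;0.8181 +/- 0.0220&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LightGBM&lt;/td&gt;
&lt;td&gt;0.8350 +/- 0.0178&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;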
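&lt;p&gt;Each entry above is the mean of the five fold accuracies, plus or minus their standard deviation. As a rough sketch of how one such entry is formed, here is the same calculation on invented fold scores. These are not the actual per-fold values from the run.&lt;/p&gt;

```python
from statistics import mean, pstdev

# Hypothetical per-fold accuracies (not the article's actual fold scores)
fold_scores = [0.81, 0.84, 0.82, 0.85, 0.83]

avg = mean(fold_scores)
# pstdev is the population standard deviation (divides by n), which matches
# the default of numpy's ndarray.std() used in the evaluation loop
spread = pstdev(fold_scores)
summary = f'{avg:.4f} +/- {spread:.4f}'
print(summary)
```

&lt;p&gt;With only five folds the standard deviation is itself a noisy estimate, which is worth keeping in mind when two models are close.&lt;/p&gt;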

&lt;p&gt;We plotted the CV score distributions using boxplots:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fcxv0sfwjdw6w3nyyflfn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fcxv0sfwjdw6w3nyyflfn.png" alt="Model Comparison" width="784" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

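&lt;p&gt;LightGBM achieved the best mean CV score of &lt;strong&gt;0.8350&lt;/strong&gt;.&lt;br&gt;
Random Forest followed at 0.8227 and had the smallest spread across folds (0.0077). The gap between the two is smaller than LightGBM's own fold-to-fold standard deviation (0.0178), so the ranking is not clear-cut. Note also that LightGBM was the only model given non-default tree settings (&lt;code&gt;max_depth=3&lt;/code&gt;, &lt;code&gt;num_leaves=7&lt;/code&gt;, &lt;code&gt;learning_rate=0.05&lt;/code&gt;).&lt;/p&gt;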
&lt;h2&gt;
  
  
  Feature Importance
&lt;/h2&gt;

&lt;p&gt;Let's visualize the feature importances of our best LightGBM model:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fcggjct5c0nnjj1vqh0yr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fcggjct5c0nnjj1vqh0yr.png" alt="Feature Importance" width="784" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The model relied heavily on Sex, Title, Fare, and Age, which aligns well with the patterns observed during EDA.&lt;/p&gt;
&lt;h1&gt;
  
  
  First Kaggle Submission
&lt;/h1&gt;

&lt;p&gt;Using our best LightGBM model, we generated predictions for the test dataset and created &lt;code&gt;submission.csv&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;preds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;best_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;submission&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PassengerId&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PassengerId&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Survived&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;preds&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;submission&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;../submissions/submission.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
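&lt;p&gt;Before uploading, it can help to sanity-check the file format. The sketch below is one possible check using only the standard library; &lt;code&gt;check_submission&lt;/code&gt; is a hypothetical helper, and the three-row demo file is written to a temporary folder purely for illustration.&lt;/p&gt;

```python
import csv
import os
import tempfile


def check_submission(path, expected_rows):
    """Return a list of problems found in a Titanic-style submission file."""
    problems = []
    with open(path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    if header != ['PassengerId', 'Survived']:
        problems.append(f'unexpected header: {header}')
    if len(rows) != expected_rows:
        problems.append(f'expected {expected_rows} rows, found {len(rows)}')
    if any(row[1] not in ('0', '1') for row in rows):
        problems.append('Survived must be 0 or 1')
    return problems


# Hypothetical three-row file written to a temporary folder for the demo
demo_path = os.path.join(tempfile.mkdtemp(), 'submission.csv')
with open(demo_path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['PassengerId', 'Survived'])
    writer.writerows([[892, 0], [893, 1], [894, 0]])

problems = check_submission(demo_path, expected_rows=3)
print(problems)
```

&lt;p&gt;For the real file, the Titanic test set has 418 passengers, so the call would use &lt;code&gt;expected_rows=418&lt;/code&gt;.&lt;/p&gt;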



&lt;p&gt;Leaderboard Score: &lt;strong&gt;0.77272&lt;/strong&gt;.&lt;br&gt;
This became my personal best score at the time.&lt;br&gt;
That is about 6 points below the local CV score of 0.8350. Public leaderboard scores often come out lower than local CV scores because the hidden test set can differ from the training folds.&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;We walked through the EDA, feature engineering, and model evaluation workflow. Comparing several models under the same cross-validation split made it straightforward to choose one for the submission.&lt;br&gt;
In the next article, we'll tackle the heavily missing Cabin feature and see whether it can help us push the score even higher.&lt;/p&gt;

&lt;p&gt;Japanese version:&lt;br&gt;
&lt;a href="https://zenn.dev/rg687076/articles/zenn_260627_0000_00_create_local_titanic_env" rel="noopener noreferrer"&gt;Kaggle Practice 1: Setting up Kaggle Titanic Environment on a Local PC&lt;/a&gt;&lt;br&gt;
&lt;a href="https://zenn.dev/rg687076/articles/zenn_260627_0000_01_first_submission" rel="noopener noreferrer"&gt;Kaggle Practice 2: First Submission&lt;/a&gt;&lt;br&gt;
&lt;a href="https://zenn.dev/rg687076/articles/zenn_260627_1940_01_cabin_feature" rel="noopener noreferrer"&gt;Kaggle Practice 3: Feature Engineering for Cabin&lt;/a&gt;&lt;br&gt;
&lt;a href="https://zenn.dev/rg687076/articles/zenn_20260702_2031_age_imputation" rel="noopener noreferrer"&gt;Kaggle Practice 4: Feature Engineering (Imputing Age with Random Forest)&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Setting Up a Local Environment for the Kaggle Titanic Competition</title>
      <dc:creator>kito2718</dc:creator>
      <pubDate>Thu, 02 Jul 2026 13:20:58 +0000</pubDate>
      <link>https://dev.to/kito2718/setting-up-kaggle-titanic-environment-on-a-local-pc-336k</link>
      <guid>https://dev.to/kito2718/setting-up-kaggle-titanic-environment-on-a-local-pc-336k</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/kito2718/setting-up-kaggle-titanic-environment-on-a-local-pc-336k"&gt;Kaggle Practice 1: Setting Up a Local Environment for the Kaggle Titanic Competition&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/kito2718/kaggle-titanic-my-first-submission-eda-feature-engineering-and-model-evaluation-896"&gt;Kaggle Practice 2: First Submission&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/kito2718/kaggle-titanic-cabin-feature-engineering-is-it-really-effective-44nc"&gt;Kaggle Practice 3: Feature Engineering for Cabin&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/kito2718/kaggle-titanic-improving-survival-prediction-with-random-forest-age-imputation-5b3l"&gt;Kaggle Practice 4: Feature Engineering (Imputing Age with Random Forest)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.kaggle.com/c/titanic" rel="noopener noreferrer"&gt;https://www.kaggle.com/c/titanic&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kito2718/KaggleTitanic" rel="noopener noreferrer"&gt;Available on GitHub&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Abstract
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Set up a local development environment for the Kaggle Titanic competition.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Kaggle Notebooks are convenient, but local development offers faster iteration, better IDE support, and easier version control with Git.&lt;br&gt;
I've been participating in the Kaggle Titanic competition and trying to improve my score. However, logging into Kaggle every time I wanted to submit an experiment quickly became tedious.&lt;/p&gt;
&lt;h1&gt;
  
  
  Environment Summary
&lt;/h1&gt;

&lt;p&gt;Surprisingly, all I really needed was Python.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;3.14.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Virtual Env&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;.venv&lt;/code&gt; (venv)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project Path&lt;/td&gt;
&lt;td&gt;&lt;code&gt;51_googleantigravity\1st_&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pip&lt;/td&gt;
&lt;td&gt;26.1.x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h1&gt;
  
  
  Pitfalls/Things to Keep in Mind
&lt;/h1&gt;

&lt;p&gt;I ran into a small issue during setup, so I'll document it here.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Crucial Point:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Avoid using non-ASCII characters (such as Japanese characters) in the virtual environment path.&lt;/strong&gt;
In my setup, the compiled extension modules (DLLs) of libraries such as &lt;code&gt;scipy&lt;/code&gt; and &lt;code&gt;scikit-learn&lt;/code&gt; failed to load when the path contained non-ASCII characters.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
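&lt;p&gt;A quick way to spot the problem is to check the environment path for non-ASCII components. This is a minimal sketch; the helper name &lt;code&gt;non_ascii_parts&lt;/code&gt; and the example path are made up for illustration.&lt;/p&gt;

```python
import sys


def non_ascii_parts(path_text):
    """Return the path components that contain non-ASCII characters."""
    parts = path_text.replace('\\', '/').split('/')
    return [part for part in parts if not part.isascii()]


# Inside an activated virtual environment, sys.prefix is the environment folder
print(non_ascii_parts(sys.prefix))

# Hypothetical path with a Japanese user folder
print(non_ascii_parts('C:/Users/太郎/projects/.venv'))
```

&lt;p&gt;An empty list for &lt;code&gt;sys.prefix&lt;/code&gt; means the environment path is plain ASCII.&lt;/p&gt;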
&lt;h1&gt;
  
  
  1. Step-by-Step Guide
&lt;/h1&gt;
&lt;h2&gt;
  
  
  1.1. Installing Python
&lt;/h2&gt;

&lt;p&gt;Download and install from the &lt;a href="https://www.python.org/downloads/release/python-3146/" rel="noopener noreferrer"&gt;Official Python Release Page&lt;/a&gt;.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstatic.zenn.studio%2Fuser-upload%2F1d1950b8d6ca-20260627.png%2520%3D500x" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstatic.zenn.studio%2Fuser-upload%2F1d1950b8d6ca-20260627.png%2520%3D500x" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Scroll down to the bottom of the page to find the installer.&lt;/em&gt;&lt;/p&gt;

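&lt;p&gt;Also, add Python to your PATH if the installer did not do it for you. &lt;code&gt;setx&lt;/code&gt; truncates values longer than 1024 characters and only affects newly opened terminals, so check the length of your current PATH before running it:&lt;br&gt;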
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight batchfile"&gt;&lt;code&gt;&lt;span class="nb"&gt;setx&lt;/span&gt; &lt;span class="kd"&gt;PATH&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;%PATH%&lt;/span&gt;&lt;span class="s2"&gt;;&amp;lt;Python_Installation_Folder&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  1.2. Creating the Project Folder
&lt;/h2&gt;

&lt;p&gt;Create it anywhere you like on your PC, as long as the full path contains only ASCII characters (see the pitfall above).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight batchfile"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="m"&gt;51&lt;/span&gt;_googleantigravity\1st_
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  1.3. Creating the Virtual Environment
&lt;/h2&gt;

&lt;p&gt;Create a Python virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight batchfile"&gt;&lt;code&gt;REM &lt;span class="kd"&gt;Move&lt;/span&gt; &lt;span class="kd"&gt;to&lt;/span&gt; &lt;span class="kd"&gt;the&lt;/span&gt; &lt;span class="kd"&gt;project&lt;/span&gt; &lt;span class="kd"&gt;folder&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; &lt;span class="m"&gt;51&lt;/span&gt;_googleantigravity\1st_
&lt;span class="kd"&gt;python&lt;/span&gt;&lt;span class="err"&gt;.exe&lt;/span&gt; &lt;span class="na"&gt;-m &lt;/span&gt;&lt;span class="kd"&gt;venv&lt;/span&gt; .venv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Structure after creation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1st_/
└── .venv/
    ├── Scripts/
    │   ├── activate
    │   ├── python.exe
    │   └── pip.exe
    └── Lib/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
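&lt;p&gt;To confirm that a given &lt;code&gt;python.exe&lt;/code&gt; really is the one inside &lt;code&gt;.venv&lt;/code&gt;, you can compare &lt;code&gt;sys.prefix&lt;/code&gt; with &lt;code&gt;sys.base_prefix&lt;/code&gt;. The sketch below is one way to do that; the function name &lt;code&gt;in_virtualenv&lt;/code&gt; is my own and not part of any library.&lt;/p&gt;

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtualenv())
```

&lt;p&gt;Run it after activating the environment; if it reports that no virtual environment is active, packages installed with that interpreter will go into the base Python instead of &lt;code&gt;.venv&lt;/code&gt;.&lt;/p&gt;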



&lt;h2&gt;
  
  
  1.4. Installing Packages
&lt;/h2&gt;

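&lt;p&gt;Activate the virtual environment first (&lt;code&gt;.venv\Scripts\activate&lt;/code&gt;) so that the packages go into &lt;code&gt;.venv&lt;/code&gt; rather than the system Python, then install the packages the notebooks need:&lt;br&gt;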
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight batchfile"&gt;&lt;code&gt;&lt;span class="kd"&gt;python&lt;/span&gt;&lt;span class="err"&gt;.exe&lt;/span&gt; &lt;span class="na"&gt;-m &lt;/span&gt;&lt;span class="kd"&gt;pip&lt;/span&gt; &lt;span class="kd"&gt;install&lt;/span&gt; &lt;span class="na"&gt;--upgrade &lt;/span&gt;&lt;span class="kd"&gt;pip&lt;/span&gt; &lt;span class="kd"&gt;pandas&lt;/span&gt; &lt;span class="kd"&gt;numpy&lt;/span&gt; &lt;span class="kd"&gt;scikit&lt;/span&gt;&lt;span class="na"&gt;-learn &lt;/span&gt;&lt;span class="kd"&gt;matplotlib&lt;/span&gt; &lt;span class="kd"&gt;seaborn&lt;/span&gt; &lt;span class="kd"&gt;jupyter&lt;/span&gt; &lt;span class="kd"&gt;kaggle&lt;/span&gt; &lt;span class="kd"&gt;xgboost&lt;/span&gt; &lt;span class="kd"&gt;lightgbm&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Major packages installed:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;pandas&lt;/td&gt;
&lt;td&gt;3.0.3&lt;/td&gt;
&lt;td&gt;Data manipulation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;numpy&lt;/td&gt;
&lt;td&gt;2.5.0&lt;/td&gt;
&lt;td&gt;Numerical calculations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;scikit-learn&lt;/td&gt;
&lt;td&gt;1.9.0&lt;/td&gt;
&lt;td&gt;Machine learning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;matplotlib&lt;/td&gt;
&lt;td&gt;3.11.0&lt;/td&gt;
&lt;td&gt;Data visualization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;seaborn&lt;/td&gt;
&lt;td&gt;0.13.2&lt;/td&gt;
&lt;td&gt;Statistical data visualization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;xgboost&lt;/td&gt;
&lt;td&gt;3.3.0&lt;/td&gt;
&lt;td&gt;Gradient boosting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lightgbm&lt;/td&gt;
&lt;td&gt;4.6.0&lt;/td&gt;
&lt;td&gt;Gradient boosting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;jupyter / notebook&lt;/td&gt;
&lt;td&gt;7.6.0&lt;/td&gt;
&lt;td&gt;Jupyter Notebook environment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;kaggle&lt;/td&gt;
&lt;td&gt;2.2.3&lt;/td&gt;
&lt;td&gt;Kaggle CLI&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
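&lt;p&gt;The versions on your machine will likely differ from the table. A small sketch using the standard-library &lt;code&gt;importlib.metadata&lt;/code&gt; module can list what was actually installed. The package list mirrors the table above, and I am assuming &lt;code&gt;notebook&lt;/code&gt; is the distribution name behind the "jupyter / notebook" row.&lt;/p&gt;

```python
from importlib import metadata

# Distribution names taken from the table above ("notebook" is an assumption
# for the "jupyter / notebook" row).
PACKAGES = [
    "pandas", "numpy", "scikit-learn", "matplotlib", "seaborn",
    "xgboost", "lightgbm", "notebook", "kaggle",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

&lt;p&gt;Any package reported as missing usually means the install ran against a different interpreter than the one in &lt;code&gt;.venv&lt;/code&gt;.&lt;/p&gt;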

&lt;h2&gt;
  
  
  1.5. Directory Structure
&lt;/h2&gt;

&lt;p&gt;Create directories for datasets, notebooks, and outputs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight batchfile"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="kd"&gt;data&lt;/span&gt;\raw
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="kd"&gt;data&lt;/span&gt;\processed
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="kd"&gt;notebooks&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="kd"&gt;submissions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Resulting folder structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1st_/
├── .venv/
├── data/
│   ├── raw/          ← Kaggle raw data
│   └── processed/    ← Visualizations and preprocessed data
├── notebooks/        ← Notebook files
└── submissions/      ← Submission CSV files
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
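&lt;p&gt;If you would rather create the folders from Python (for example at the top of a notebook, so it also works on a fresh clone), the same layout can be sketched with &lt;code&gt;pathlib&lt;/code&gt;. The function name &lt;code&gt;create_layout&lt;/code&gt; is hypothetical; the folder names come from the commands above.&lt;/p&gt;

```python
from pathlib import Path

# Folder names taken from the mkdir commands above.
SUBDIRS = ["data/raw", "data/processed", "notebooks", "submissions"]


def create_layout(root):
    """Create the project sub-folders under root and return their paths."""
    created = []
    for sub in SUBDIRS:
        target = Path(root) / sub
        # parents=True creates "data" on the way to "data/raw";
        # exist_ok=True makes the call safe to repeat.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

&lt;p&gt;Calling &lt;code&gt;create_layout(".")&lt;/code&gt; from the project folder is safe to repeat, because existing folders are left untouched.&lt;/p&gt;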



&lt;h2&gt;
  
  
  1.6. Creating requirements.txt
&lt;/h2&gt;

&lt;p&gt;Record the exact package versions so the environment can be recreated later with &lt;code&gt;python.exe -m pip install -r requirements.txt&lt;/code&gt;:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight batchfile"&gt;&lt;code&gt;&lt;span class="kd"&gt;python&lt;/span&gt;&lt;span class="err"&gt;.exe&lt;/span&gt; &lt;span class="na"&gt;-m &lt;/span&gt;&lt;span class="kd"&gt;pip&lt;/span&gt; &lt;span class="kd"&gt;freeze&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="kd"&gt;requirements&lt;/span&gt;.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  1.7. Creating the Notebook
&lt;/h2&gt;

&lt;p&gt;Create &lt;code&gt;notebooks/titanic_eda.ipynb&lt;/code&gt; with the following structure:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cell&lt;/th&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Import libraries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Load data (train.csv / test.csv)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Exploratory Data Analysis (missing values, survival rate, age distribution, correlation heatmap)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Feature Engineering (extracting titles, imputing age, family size, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Model training&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Generate submission file (output to submissions/submission.csv)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
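&lt;p&gt;As an illustration of the last cell, here is a standard-library sketch that writes predictions in the two-column format Kaggle expects for Titanic (&lt;code&gt;PassengerId&lt;/code&gt;, &lt;code&gt;Survived&lt;/code&gt;). The actual notebook may well use pandas for this; &lt;code&gt;write_submission&lt;/code&gt; is a hypothetical helper, not code from the repository.&lt;/p&gt;

```python
import csv
from pathlib import Path


def write_submission(passenger_ids, predictions, path):
    """Write PassengerId/Survived rows to a CSV file in the Kaggle Titanic format."""
    if len(passenger_ids) != len(predictions):
        raise ValueError("passenger_ids and predictions must have the same length")
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings
    # (otherwise blank rows can appear on Windows).
    with out.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["PassengerId", "Survived"])
        for pid, pred in zip(passenger_ids, predictions):
            writer.writerow([int(pid), int(pred)])
    return out
```

&lt;p&gt;For example, &lt;code&gt;write_submission(ids, preds, "submissions/submission.csv")&lt;/code&gt; would produce the file used in the submission step, assuming &lt;code&gt;ids&lt;/code&gt; and &lt;code&gt;preds&lt;/code&gt; are equal-length sequences.&lt;/p&gt;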

&lt;h2&gt;
  
  
  1.8. Launching Jupyter Notebook
&lt;/h2&gt;

&lt;p&gt;Start the notebook server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight batchfile"&gt;&lt;code&gt;REM &lt;span class="kd"&gt;Move&lt;/span&gt; &lt;span class="kd"&gt;to&lt;/span&gt; &lt;span class="kd"&gt;project&lt;/span&gt; &lt;span class="kd"&gt;directory&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; &lt;span class="m"&gt;51&lt;/span&gt;_googleantigravity\1st_
REM &lt;span class="kd"&gt;Activate&lt;/span&gt; &lt;span class="kd"&gt;virtual&lt;/span&gt; &lt;span class="kd"&gt;environment&lt;/span&gt;
.venv\Scripts\activate
REM &lt;span class="kd"&gt;Start&lt;/span&gt; &lt;span class="kd"&gt;notebook&lt;/span&gt;
&lt;span class="kd"&gt;jupyter&lt;/span&gt; &lt;span class="kd"&gt;notebook&lt;/span&gt; &lt;span class="kd"&gt;notebooks&lt;/span&gt;\titanic_eda.ipynb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your browser should open automatically. If not, copy the URL displayed in the terminal and paste it into your browser.&lt;/p&gt;

&lt;h2&gt;
  
  
  1.9. Preparing Kaggle CLI
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://www.kaggle.com/settings" rel="noopener noreferrer"&gt;Kaggle Settings&lt;/a&gt; -&amp;gt; "API Tokens" tab.&lt;/li&gt;
&lt;li&gt;Click "Create Legacy API Key" under the Legacy API Credentials section.&lt;/li&gt;
&lt;li&gt;Place the downloaded &lt;code&gt;kaggle.json&lt;/code&gt; file at &lt;code&gt;C:\Users\&amp;lt;your_username&amp;gt;\.kaggle\kaggle.json&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Run the following:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight batchfile"&gt;&lt;code&gt;REM &lt;span class="kd"&gt;Activate&lt;/span&gt; &lt;span class="kd"&gt;virtual&lt;/span&gt; &lt;span class="kd"&gt;environment&lt;/span&gt;
.venv\Scripts\activate
REM &lt;span class="kd"&gt;Download&lt;/span&gt; &lt;span class="kd"&gt;Titanic&lt;/span&gt; &lt;span class="kd"&gt;data&lt;/span&gt; &lt;span class="kd"&gt;using&lt;/span&gt; &lt;span class="kd"&gt;Kaggle&lt;/span&gt; &lt;span class="kd"&gt;API&lt;/span&gt;
&lt;span class="kd"&gt;kaggle&lt;/span&gt; &lt;span class="kd"&gt;competitions&lt;/span&gt; &lt;span class="kd"&gt;download&lt;/span&gt; &lt;span class="na"&gt;-c &lt;/span&gt;&lt;span class="kd"&gt;titanic&lt;/span&gt; &lt;span class="na"&gt;-p &lt;/span&gt;&lt;span class="kd"&gt;data&lt;/span&gt;\raw
REM &lt;span class="kd"&gt;Extract&lt;/span&gt; &lt;span class="kd"&gt;ZIP&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;using&lt;/span&gt; &lt;span class="kd"&gt;standard&lt;/span&gt; &lt;span class="kd"&gt;Windows&lt;/span&gt; &lt;span class="kd"&gt;tar&lt;/span&gt; &lt;span class="kd"&gt;command&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;tar&lt;/span&gt; &lt;span class="na"&gt;-xf &lt;/span&gt;&lt;span class="kd"&gt;data&lt;/span&gt;\raw\titanic.zip &lt;span class="na"&gt;-C &lt;/span&gt;&lt;span class="kd"&gt;data&lt;/span&gt;\raw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
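&lt;p&gt;If &lt;code&gt;tar&lt;/code&gt; is not available on your Windows version, the archive can also be unpacked with Python's built-in &lt;code&gt;zipfile&lt;/code&gt; module. This is a sketch of an alternative, not what I ran; &lt;code&gt;extract_zip&lt;/code&gt; is a name I made up.&lt;/p&gt;

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

&lt;p&gt;For the download above, the call would be &lt;code&gt;extract_zip("data/raw/titanic.zip", "data/raw")&lt;/code&gt;, run from the project folder.&lt;/p&gt;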



&lt;p&gt;Alternatively, you can download files manually:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Download the following 3 files from &lt;a href="https://www.kaggle.com/c/titanic/data" rel="noopener noreferrer"&gt;https://www.kaggle.com/c/titanic/data&lt;/a&gt; and place them in &lt;code&gt;data\raw\&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;train.csv&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;test.csv&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gender_submission.csv&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  1.10. Submitting to Kaggle
&lt;/h2&gt;

&lt;p&gt;With the virtual environment active, submit the generated CSV from the project folder. The &lt;code&gt;-m&lt;/code&gt; option sets the description shown next to the submission on Kaggle:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight batchfile"&gt;&lt;code&gt;&lt;span class="kd"&gt;kaggle&lt;/span&gt; &lt;span class="kd"&gt;competitions&lt;/span&gt; &lt;span class="kd"&gt;submit&lt;/span&gt; &lt;span class="na"&gt;-c &lt;/span&gt;&lt;span class="kd"&gt;titanic&lt;/span&gt; &lt;span class="na"&gt;-f &lt;/span&gt;&lt;span class="kd"&gt;submissions&lt;/span&gt;\submission.csv &lt;span class="na"&gt;-m &lt;/span&gt;&lt;span class="s2"&gt;"Your comment"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
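&lt;p&gt;Because daily submissions are limited, it can be worth sanity-checking the file before sending it. The sketch below checks the header, the row count (the Titanic &lt;code&gt;test.csv&lt;/code&gt; has 418 passengers), and that every prediction is 0 or 1. The helper &lt;code&gt;validate_submission&lt;/code&gt; is hypothetical and only covers these basic checks.&lt;/p&gt;

```python
import csv

EXPECTED_HEADER = ["PassengerId", "Survived"]
EXPECTED_ROWS = 418  # number of passengers in the Titanic test.csv


def validate_submission(path, expected_rows=EXPECTED_ROWS):
    """Return a list of problems found in the submission file (empty if none)."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    if not rows or rows[0] != EXPECTED_HEADER:
        problems.append("header must be PassengerId,Survived")
    body = rows[1:]
    if len(body) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(body)}")
    if any(len(row) != 2 or row[1] not in ("0", "1") for row in body):
        problems.append("Survived must be 0 or 1 in every row")
    return problems
```

&lt;p&gt;An empty list from &lt;code&gt;validate_submission("submissions/submission.csv")&lt;/code&gt; suggests the file is at least structurally acceptable; it says nothing about the score.&lt;/p&gt;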



&lt;p&gt;That's it! You now have a fully functional local environment for experimenting with Kaggle Titanic and submitting results directly from your machine.&lt;/p&gt;

&lt;p&gt;Japanese version:&lt;br&gt;
&lt;a href="https://zenn.dev/rg687076/articles/zenn_260627_0000_00_create_local_titanic_env" rel="noopener noreferrer"&gt;Kaggle Practice 1: Setting up Kaggle Titanic Environment on a Local PC&lt;/a&gt;&lt;br&gt;
&lt;a href="https://zenn.dev/rg687076/articles/zenn_260627_0000_01_first_submission" rel="noopener noreferrer"&gt;Kaggle Practice 2: First Submission&lt;/a&gt;&lt;br&gt;
&lt;a href="https://zenn.dev/rg687076/articles/zenn_260627_1940_01_cabin_feature" rel="noopener noreferrer"&gt;Kaggle Practice 3: Feature Engineering for Cabin&lt;/a&gt;&lt;br&gt;
&lt;a href="https://zenn.dev/rg687076/articles/zenn_20260702_2031_age_imputation" rel="noopener noreferrer"&gt;Kaggle Practice 4: Feature Engineering (Imputing Age with Random Forest)&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
