#beginners #machinelearning #pandas #kaggle

#datascience #devjournal #machinelearning #beginners

Apr 14, 2026 · Day 3 of 30

Day 3: I explored the Titanic and Iris datasets. The code was easy, understanding the data was not

Today I worked with real datasets for the first time: the Titanic and Iris datasets from Kaggle. I loaded them, explored them, filtered columns, and even made a few basic visualizations. The tools worked. Pandas did what it was supposed to do.

But somewhere in the middle I hit something I didn't expect: I had no idea what I was actually looking at.

Two very different datasets

Titanic

Passenger data: age, class, fare, survived or not. Real people, real event. Messy, missing values everywhere.

Iris

Flower measurements: petal length, sepal width, species. Clean, tidy, almost too perfect. A classic beginner dataset for a reason.

What I did

I started with the basics — loading the CSV, running .head(), .info(), and .describe() to get a feel for the shape of the data. Then I filtered some columns, looked at value counts, and made a couple of simple plots.

import pandas as pd

df = pd.read_csv("titanic.csv")

print(df.head())
print(df.describe())
print(df["Survived"].value_counts())
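The column filtering and missing-value check can be sketched roughly like this. To keep it runnable without the CSV, it uses a tiny hand-made frame: the rows are invented for illustration, and only the column names are meant to match the Kaggle file.

```python
import pandas as pd

# Made-up sample rows; only the column names follow the Kaggle Titanic CSV.
sample = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0, 0],
        "Pclass": [3, 1, 3, 1, 3],
        "Age": [22.0, 38.0, None, 54.0, None],
        "Fare": [7.25, 71.28, 7.92, 51.86, 8.05],
    }
)

# Keep only the columns of interest.
subset = sample[["Survived", "Pclass", "Age"]]

# Count missing values per column.
missing = subset.isna().sum()
print(missing)

# Filter rows: third-class passengers only, then count outcomes.
third_class = subset[subset["Pclass"] == 3]
third_class_counts = third_class["Survived"].value_counts()
print(third_class_counts)
```

On the real file, the same calls would run on df instead of sample.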

Technically, all of this worked fine. The output printed, the charts showed up. But then I stopped and asked myself: okay, what does this actually tell me?

The part nobody warns you about

I could see that the Pclass column had values 1, 2, and 3. I had to look up that those are ticket classes: first, second, third. I could see SibSp and had no idea what it meant without reading the documentation (siblings and spouses on board, apparently).
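One way to keep those lookups from getting lost is to write them down next to the data. This is only a sketch: the column_notes and class_labels dictionaries and the PclassLabel column are hypothetical names made up here, and the sample rows are invented.

```python
import pandas as pd

# Hypothetical notes on what each column means, written in my own words.
column_notes = {
    "Pclass": "ticket class (1 = first, 2 = second, 3 = third)",
    "SibSp": "siblings and spouses on board",
}

# Hypothetical mapping from the numeric code to a readable label.
class_labels = {1: "first", 2: "second", 3: "third"}

# Made-up rows; only the column names follow the Kaggle Titanic CSV.
sample = pd.DataFrame({"Pclass": [3, 1, 2, 3], "SibSp": [1, 1, 0, 0]})

# Add a readable label column alongside the numeric code.
sample["PclassLabel"] = sample["Pclass"].map(class_labels)
print(sample)

for name, note in column_notes.items():
    print(f"{name}: {note}")
```

Keeping the numeric Pclass column and adding the label beside it means the original values stay available for later steps.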

The code was easy. Understanding what the numbers represent, and why that matters for prediction, was the actual challenge of today.

With Iris it was simpler: flowers, measurements, species. That clicked faster. But Titanic taught me something important: before you model anything, you have to understand the story the data is trying to tell.

What I'm taking into Day 4

I want to slow down on Titanic a bit more, and specifically look at which columns might actually predict survival, and why. Not training a model yet. Just thinking about the data like a human before handing it to a machine.
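A possible first step for that, with no model involved, is comparing the survival rate across the values of one column. The sketch below assumes a groupby on Pclass is a reasonable place to start. The rows are made up, so the numbers it prints say nothing about the real passengers.

```python
import pandas as pd

# Made-up rows for illustration; only the column names follow the Kaggle CSV.
sample = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0, 1, 1, 0, 0],
        "Pclass": [3, 1, 2, 3, 1, 3, 2, 1],
    }
)

# Survived is 0 or 1, so its mean within a group is that group's survival rate.
rate_by_class = sample.groupby("Pclass")["Survived"].mean()
print(rate_by_class)

# How many rows sit behind each rate; small groups make a rate unreliable.
rows_per_class = sample["Pclass"].value_counts().sort_index()
print(rows_per_class)
```

If the rates differ a lot between groups on the real data, that column is worth a closer look. A rate on its own does not explain why the difference exists.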

I think that mindset shift, from "run the code" to "understand the data", might be one of the most important early lessons in ML.

QUESTION FOR YOU

When you first worked with the Titanic dataset, what was the column or insight that made you go "oh, this is what ML is actually about"?