I was discussing with my co-worker (who is also an ML engineer) how a beginner like me should approach machine learning. She said that now that I’ve deliberately worked through NumPy → Pandas → Data Preprocessing at the conceptual level, the next step should NOT be “more tools”.
It should be ML thinking itself!
Her suggestion did not sit well with me at first, partly because there are endless tools if you think about it! I had narrowed things down to NumPy, Pandas, Data Preprocessing and Scikit-learn (I haven’t covered that last one yet, for reasons I’ll explain as we dive deeper into this post) based on my own understanding of the subject. However, what she said next made more sense to me, because this is where my perspective as a software engineer comes into play: it’s important to understand the mental model behind the algorithms.
If you are an iterative learner like me, you're right to pause here and ask: why shouldn't we jump into scikit-learn before understanding how learning itself works?
Short answer (the important one): learn just enough scikit-learn, but only after you understand how learning works.
Let me elaborate on this:
🎯 The Correct Order (Beginner-Optimal)
You should NOT fully learn scikit-learn before understanding:
- what a model is
- what loss is
- what training means
- what overfitting is
Otherwise, scikit-learn becomes a black box.
🧠 Think of scikit-learn like this
- Concepts → why something works
- scikit-learn → how to apply it quickly
If you reverse this order:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X, y)
You can run code, but you don’t actually know what happened:
- why it works
- when it fails
- what assumptions it makes
Instead, you (a beginner) should first learn learning types + core ML ideas.
✅ What You SHOULD do instead (Best approach)
Step 1️⃣ — Learn learning concepts (NO scikit-learn yet)
(This is what we are already doing)
Learn conceptually:
- Supervised learning
- Regression vs classification
- Model = function
- Loss function
- Overfitting vs underfitting
- Train vs test behavior
👉 This can be done with math intuition + NumPy.
Step 2️⃣ — Implement Linear Regression from scratch
Using:
- NumPy
- A few lines of math
- No ML libraries
This answers:
“How does the model actually learn?”
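To make Step 2 concrete, here’s a minimal sketch of what “linear regression from scratch” could look like, using gradient descent on synthetic data. The slope 3, intercept 2, learning rate and iteration count are all made-up choices for this toy example, not canonical values:

```python
import numpy as np

# Synthetic data: y = 3x + 2 plus a little noise (made-up example values)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
y = 3 * X + 2 + rng.normal(0, 1, size=50)

# Start with arbitrary parameters and improve them step by step
w, b = 0.0, 0.0
lr = 0.01  # learning rate (how big each adjustment is)

for _ in range(2000):
    y_pred = w * X + b
    error = y_pred - y
    # Gradients of mean squared error with respect to w and b
    grad_w = 2 * np.mean(error * X)
    grad_b = 2 * np.mean(error)
    # Move parameters in the direction that reduces the loss
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should land close to the true 3 and 2
```

The whole “learning” step is just: predict, measure the error, nudge `w` and `b` to shrink it, repeat.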
Step 3️⃣ — THEN introduce scikit-learn (lightly)
Once the concept clicks, scikit-learn becomes:
- Clean
- Logical
- Easy
You’ll instantly understand:
.fit(), .predict(), .score()
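For comparison, here is a sketch of the same kind of toy problem in scikit-learn, once the from-scratch version has clicked. It assumes scikit-learn is installed, and the data is made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y ≈ 3x + 2 with noise (made-up values)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))  # scikit-learn expects 2-D features
y = 3 * X.ravel() + 2 + rng.normal(0, 1, size=50)

model = LinearRegression()
model.fit(X, y)           # training: find w and b internally
preds = model.predict(X)  # prediction: apply the learned w and b
r2 = model.score(X, y)    # evaluation: R² score

print(model.coef_, model.intercept_, r2)
```

Because you already implemented this by hand, `coef_` and `intercept_` are no longer mysterious: they are exactly the `w` and `b` you were adjusting yourself.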
❌ What NOT to do (common beginner mistake)
❌ Deep dive into scikit-learn API
❌ Memorize classifiers and parameters
❌ Jump to advanced models too early
This creates fragile understanding.
🧭 Minimal scikit-learn you may peek at (optional)
It’s okay to recognize these, not master them yet:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
(You already used these in previous posts.)
But don’t learn models yet.
🎯 The Next Beginner ML Concept: Supervised Learning Fundamentals
🔑 Concept 1: Types of Machine Learning
1️⃣ Supervised Learning (START HERE)
You have:
- Input features (X)
- Correct answers (y)
Examples:
- Predict salary → regression
- Predict spam/not spam → classification
This is 90% of beginner ML.
2️⃣ Unsupervised Learning (later)
- No labels.
- Model finds structure itself.
Examples:
- Customer segmentation → “group similar customers”
- Clustering → the method used to form those groups
3️⃣ Reinforcement Learning (much later)
- Agent learns via rewards.
📌 For now: Focus ONLY on Supervised Learning.
🔑 Concept 2: Regression vs Classification
🟦 Regression
Predict a number.
House price → $250,000
Temperature → 28.5°C
🟥 Classification
Predict a category.
Spam / Not Spam
Yes / No
🧠 Tiny mental exercise
Which is which?
| Problem | Type |
| ------------------ | -------------- |
| Predict exam score | Regression |
| Predict pass/fail | Classification |
🔑 Concept 3: Model, Parameters & Learning
🧠 What is a model?
A mathematical function that maps:
X → y
Example:
y = w*x + b
- w → weight (importance)
- b → bias (offset)
Learning = finding the best w and b.
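In code, a model really is just a function with adjustable parameters; the numbers below are arbitrary examples:

```python
# A "model" is just a function with adjustable parameters w and b
def model(x, w, b):
    return w * x + b

# Different parameter values = different models for the same input
print(model(4, w=2.0, b=1.0))  # 2*4 + 1 = 9.0
print(model(4, w=5.0, b=0.0))  # 5*4 + 0 = 20.0
```

Learning is just the process of searching for the `w` and `b` that make this function’s outputs match the data best.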
🔑 Concept 4: Loss Function (VERY IMPORTANT)
🧠 What is loss?
“How wrong is the model?”
Example:
- True value = 100
- Prediction = 90
- Error = 10
Loss function quantifies this error.
Common:
- Mean Squared Error (MSE)
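Here is MSE computed by hand on a few made-up numbers, so the formula isn’t abstract:

```python
import numpy as np

# Made-up true values and predictions for illustration
y_true = np.array([100.0, 150.0, 200.0])
y_pred = np.array([90.0, 160.0, 195.0])

errors = y_pred - y_true       # [-10, 10, -5]
mse = np.mean(errors ** 2)     # (100 + 100 + 25) / 3 = 75.0
print(mse)
```

Squaring makes every error positive (so misses in both directions count) and punishes large errors more than small ones.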
🔑 Concept 5: Training vs Prediction
Training phase:
- Model sees data
- Adjusts parameters
- Minimizes loss
Prediction phase:
- Model is frozen
- Makes predictions on new data
🔑 Concept 6: Overfitting vs Underfitting
Underfitting:
- Model too simple
- Misses patterns
Overfitting:
- Model memorizes data
- Fails on new data
📌 This is the heart of ML.
🔑 Concept 7: Evaluation Metrics (Conceptual)
You don’t evaluate a model on training data.
Examples:
- Regression → MSE, RMSE, R²
- Classification → Accuracy, Precision, Recall
(You’ll learn these slowly — concept first.)
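As a concept-only preview, here is how the classification metrics break down on a tiny made-up set of predictions:

```python
import numpy as np

# Toy classification results (made-up labels for illustration)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives: 3
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives: 1
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives: 1

accuracy = np.mean(y_pred == y_true)  # 6/8 = 0.75
precision = tp / (tp + fp)            # of predicted positives, how many were right
recall = tp / (tp + fn)               # of actual positives, how many were found
print(accuracy, precision, recall)
```

No scikit-learn needed yet: each metric is just counting, which is exactly why concept-first works.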
I know I’ve introduced a few advanced terms at a beginner level to give an idea of what the roadmap to understanding machine learning looks like. Don’t worry if they feel unfamiliar right now — I’ll be exploring each of these topics in depth as we go.
You can refer to these posts for understanding NumPy, Pandas and Data Preprocessing:
Understanding NumPy in the context of Python for Machine Learning
The next basic concept of Machine Learning after NumPy: Pandas
Understanding Data Preprocessing
Beginner-friendly exercises on NumPy, Pandas and Data Preprocessing