I was discussing with my co-worker (who is also an ML engineer) how a beginner like me should approach machine learning. She said that now that I’ve deliberately worked through NumPy → Pandas → Data Preprocessing at the conceptual level, the next step should NOT be “more tools”.
It should be ML thinking itself!
Her suggestion did not sit well with me at first, partly because there are endless tools if you think about it! I had narrowed things down to NumPy, Pandas, Data Preprocessing and Scikit-learn (I haven’t covered that last one yet, for reasons I’ll explain as we dive deeper into this post) based on my own understanding of the subject. However, what she said next made more sense to me, because this is where my perspective as a software engineer comes into play: it’s important to understand the mental model behind the algorithms.
If you are an iterative learner like me, you're right to pause here and ask: why shouldn't we jump into scikit-learn before understanding how learning itself works?
Short answer (the important one): learn just enough scikit-learn, but only after you understand how learning works.
Let me elaborate on this:
🎯 The Correct Order (Beginner-Optimal)
You should NOT fully learn scikit-learn before understanding:
- what a model is
- what loss is
- what training means
- what overfitting is
Otherwise, scikit-learn becomes a black box.
🧠 Think of scikit-learn like this
- Concepts → why something works
- scikit-learn → how to apply it quickly
If you reverse this order:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X, y)
You can run code, but you don’t actually know what happened:
- why it works
- when it fails
- what assumptions it makes
Instead, you (a beginner) should first learn learning types + core ML ideas.
✅ What You SHOULD do instead (Best approach)
Step 1️⃣ — Learn learning concepts (NO scikit-learn yet)
(This is what we are already doing)
Learn conceptually:
- Supervised learning
- Regression vs classification
- Model = function
- Loss function
- Overfitting vs underfitting
- Train vs test behavior
👉 This can be done with math intuition + NumPy.
Step 2️⃣ — Implement Linear Regression from scratch
Using:
- NumPy
- A few lines of math
- No ML libraries
This answers:
“How does the model actually learn?”
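To make Step 2 concrete, here’s a minimal sketch of what “linear regression from scratch” could look like, using gradient descent on synthetic data. The slope 3, intercept 2, learning rate and iteration count are all made-up choices for this toy example, not canonical values:

```python
import numpy as np

# Synthetic data: y = 3x + 2 plus a little noise (made-up example values)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
y = 3 * X + 2 + rng.normal(0, 1, size=50)

# Start with arbitrary parameters and improve them step by step
w, b = 0.0, 0.0
lr = 0.01  # learning rate (how big each adjustment is)

for _ in range(2000):
    y_pred = w * X + b
    error = y_pred - y
    # Gradients of mean squared error with respect to w and b
    grad_w = 2 * np.mean(error * X)
    grad_b = 2 * np.mean(error)
    # Move parameters in the direction that reduces the loss
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should land close to the true 3 and 2
```

The whole “learning” step is just: predict, measure the error, nudge `w` and `b` to shrink it, repeat.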
Step 3️⃣ — THEN introduce scikit-learn (lightly)
Once the concept clicks, scikit-learn becomes:
- Clean
- Logical
- Easy
You’ll instantly understand:
.fit(), .predict(), .score()
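For comparison, here is a sketch of the same kind of toy problem in scikit-learn, once the from-scratch version has clicked. It assumes scikit-learn is installed, and the data is made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y ≈ 3x + 2 with noise (made-up values)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))  # scikit-learn expects 2-D features
y = 3 * X.ravel() + 2 + rng.normal(0, 1, size=50)

model = LinearRegression()
model.fit(X, y)           # training: find w and b internally
preds = model.predict(X)  # prediction: apply the learned w and b
r2 = model.score(X, y)    # evaluation: R² score

print(model.coef_, model.intercept_, r2)
```

Because you already implemented this by hand, `coef_` and `intercept_` are no longer mysterious: they are exactly the `w` and `b` you were adjusting yourself.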
❌ What NOT to do (common beginner mistake)
❌ Deep dive into scikit-learn API
❌ Memorize classifiers and parameters
❌ Jump to advanced models too early
This creates fragile understanding.
🧭 Minimal scikit-learn you may peek at (optional)
It’s okay to recognize these, not master them yet:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
(You already used these in previous posts.)
But don’t learn models yet.
🎯 The Next Beginner ML Concept: Supervised Learning Fundamentals
🔑 Concept 1: Types of Machine Learning
1️⃣ Supervised Learning (START HERE)
You have:
- Input features (X)
- Correct answers (y)
Examples:
- Predict salary → regression
- Predict spam/not spam → classification
This is 90% of beginner ML.
2️⃣ Unsupervised Learning (later)
- No labels.
- Model finds structure itself.
Examples:
- Customer segmentation → “group similar customers”
- Clustering → the method used to form those groups
3️⃣ Reinforcement Learning (much later)
- Agent learns via rewards.
📌 For now: Focus ONLY on Supervised Learning.
🔑 Concept 2: Regression vs Classification
🟦 Regression
Predict a number.
House price → $250,000
Temperature → 28.5°C
🟥 Classification
Predict a category.
Spam / Not Spam
Yes / No
🧠 Tiny mental exercise
Which is which?
| Problem | Type |
| ------------------ | -------------- |
| Predict exam score | Regression |
| Predict pass/fail | Classification |
🔑 Concept 3: Model, Parameters & Learning
🧠 What is a model?
A mathematical function that maps:
X → y
Example:
y = w*x + b
- w → weight (importance)
- b → bias (offset)
Learning = finding the best w and b.
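In code, a model really is just a function with adjustable parameters; the numbers below are arbitrary examples:

```python
# A "model" is just a function with adjustable parameters w and b
def model(x, w, b):
    return w * x + b

# Different parameter values = different models for the same input
print(model(4, w=2.0, b=1.0))  # 2*4 + 1 = 9.0
print(model(4, w=5.0, b=0.0))  # 5*4 + 0 = 20.0
```

Learning is just the process of searching for the `w` and `b` that make this function’s outputs match the data best.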
🔑 Concept 4: Loss Function (VERY IMPORTANT)
🧠 What is loss?
“How wrong is the model?”
Example:
- True value = 100
- Prediction = 90
- Error = 10
Loss function quantifies this error.
Common:
- Mean Squared Error (MSE)
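Here is MSE computed by hand on a few made-up numbers, so the formula isn’t abstract:

```python
import numpy as np

# Made-up true values and predictions for illustration
y_true = np.array([100.0, 150.0, 200.0])
y_pred = np.array([90.0, 160.0, 195.0])

errors = y_pred - y_true       # [-10, 10, -5]
mse = np.mean(errors ** 2)     # (100 + 100 + 25) / 3 = 75.0
print(mse)
```

Squaring makes every error positive (so misses in both directions count) and punishes large errors more than small ones.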
🔑 Concept 5: Training vs Prediction
Training phase:
- Model sees data
- Adjusts parameters
- Minimizes loss
Prediction phase:
- Model is frozen
- Makes predictions on new data
🔑 Concept 6: Overfitting vs Underfitting
Underfitting:
- Model too simple
- Misses patterns
Overfitting:
- Model memorizes data
- Fails on new data
📌 This is the heart of ML.
🔑 Concept 7: Evaluation Metrics (Conceptual)
You don’t evaluate a model on training data.
Examples:
- Regression → MSE, RMSE, R²
- Classification → Accuracy, Precision, Recall
(You’ll learn these slowly — concept first.)
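As a concept-only preview, here is how the classification metrics break down on a tiny made-up set of predictions:

```python
import numpy as np

# Toy classification results (made-up labels for illustration)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives: 3
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives: 1
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives: 1

accuracy = np.mean(y_pred == y_true)  # 6/8 = 0.75
precision = tp / (tp + fp)            # of predicted positives, how many were right
recall = tp / (tp + fn)               # of actual positives, how many were found
print(accuracy, precision, recall)
```

No scikit-learn needed yet: each metric is just counting, which is exactly why concept-first works.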
I know I’ve introduced a few advanced terms at a beginner level to give an idea of what the roadmap to understanding machine learning looks like. Don’t worry if they feel unfamiliar right now — I’ll be exploring each of these topics in depth as we go.
You can refer to these posts for understanding NumPy, Pandas and Data Preprocessing:
Understanding NumPy in the context of Python for Machine Learning
The next basic concept of Machine Learning after NumPy: Pandas
Understanding Data Preprocessing
Beginner-friendly exercises on NumPy, Pandas and Data Preprocessing