I was discussing with my co-worker (who is also an ML engineer) how a beginner like me should approach machine learning. She said that now that I've intentionally mastered NumPy → Pandas → Data Preprocessing conceptually, the next concept should NOT be “more tools”.
It should be ML thinking itself!
Her suggestion, somehow, did not sit well with me, partly because there are endless tools if you think about it! Based on my own understanding of the subject, I had narrowed the list down to NumPy, Pandas, Data Preprocessing and Scikit-learn (I haven't covered Scikit-learn yet, for reasons I will explain as we dive deeper into this post). But what she said next made more sense to me, because this is where my own understanding as a software engineer comes into perspective: it's important to understand the mental model behind algorithms.
If you are an iterative learner like me, you are right to pause here and ask: why should we not go over scikit-learn before we understand how learning works?
Short answer (the important one): learn just enough scikit-learn, but only after you understand how learning works.
Let me elaborate on this:
🎯 The Correct Order (Beginner-Optimal)
You should NOT fully learn scikit-learn before understanding:
- what a model is
- what loss is
- what training means
- what overfitting is
Otherwise, scikit-learn becomes a black box.
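To make the first two ideas in that list concrete, here is a minimal sketch in plain NumPy. The data, the function names `model` and `mse_loss`, and the two parameter guesses are all made up for illustration; mean squared error is just one common choice of loss.

```python
import numpy as np

# Toy data, made up for illustration: it follows y = 2x + 1 exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

def model(x, w, b):
    """A model is just a function: inputs plus parameters give predictions."""
    return w * x + b

def mse_loss(y_true, y_pred):
    """Loss is a single number that says how wrong the predictions are."""
    return np.mean((y_true - y_pred) ** 2)

# Two hypothetical parameter guesses for the same model.
bad_guess = mse_loss(y, model(x, w=0.0, b=0.0))
good_guess = mse_loss(y, model(x, w=2.0, b=1.0))

print(bad_guess)   # large, because this guess ignores the data
print(good_guess)  # 0.0, because these parameters reproduce the data exactly
```

Seen this way, "training" is simply the search for the `w` and `b` that make the loss small.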
🧠 Think of scikit-learn like this
- Concepts → why something works
- scikit-learn → how to apply it quickly
If you reverse this order:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X, y)  # X: feature matrix, y: target values
you can run code — but you don’t actually know what happened!
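As a rough peek inside the box: scikit-learn documents `LinearRegression` as ordinary least squares, so a call to `fit` boils down to finding the weights that minimize the squared error. The sketch below does that with NumPy alone. The toy data and variable names are my own assumptions, and this is not scikit-learn's actual code.

```python
import numpy as np

# Toy data, made up for illustration: y = 3x + 2 with no noise.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([2.0, 5.0, 8.0, 11.0])

# Add a column of ones so the intercept is learned as one more weight.
X_with_bias = np.hstack([X, np.ones((X.shape[0], 1))])

# Least squares: find the weights that minimize the squared error.
weights, *_ = np.linalg.lstsq(X_with_bias, y, rcond=None)
slope, intercept = weights

print(slope, intercept)  # approximately 3.0 and 2.0
```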
✅ What You SHOULD do instead (Best approach)
Step 1️⃣ — Learn learning concepts (NO scikit-learn yet)
(This is what we are already doing)
Learn conceptually:
- Supervised learning
- Regression vs classification
- Model = function
- Loss function
- Overfitting vs underfitting
- Train vs test behavior
👉 This can be done with math intuition + NumPy.
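Overfitting versus underfitting, and train versus test behavior, can be sketched with nothing more than `np.polyfit`. Everything here is an assumption for illustration: the sine-shaped "true" pattern, the noise level, the sample sizes and the three polynomial degrees.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical "true" pattern plus noise, made up for illustration.
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3 * x) + rng.normal(0.0, 0.1, size=n)
    return x, y

x_train, y_train = make_data(12)   # small training set
x_test, y_test = make_data(200)    # unseen data from the same pattern

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

results = {}
for degree in (1, 3, 9):
    # Fit a polynomial of this degree to the training data only.
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = mse(y_train, np.polyval(coeffs, x_train))
    test_err = mse(y_test, np.polyval(coeffs, x_test))
    results[degree] = (train_err, test_err)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```

Training error can only go down as the model gets more flexible, so on its own it tells you little. The test error is the number to watch: a model that is too simple tends to be bad on both sets (underfitting), while a model that is too flexible tends to look great on training data and worse on unseen data (overfitting).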
Step 2️⃣ — Implement Linear Regression from scratch
Using:
- NumPy
- A few lines of math
- No ML libraries
This answers:
“How does the model actually learn?”
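Here is a minimal sketch of what "from scratch" could look like, using gradient descent on mean squared error. The toy data, the learning rate and the number of steps are my own assumptions, picked by hand for this small problem. Gradient descent is also only one way to fit a line.

```python
import numpy as np

# Toy data, made up for illustration: y = 2x + 1 plus a little noise.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, size=100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.05, size=100)

w, b = 0.0, 0.0        # start from a deliberately bad guess
learning_rate = 0.5    # step size, chosen by hand for this toy problem
losses = []

for step in range(2000):
    y_pred = w * x + b                    # 1. predict with current parameters
    error = y_pred - y
    losses.append(np.mean(error ** 2))    # 2. measure how wrong we are (MSE)

    # 3. gradients of the MSE with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)

    # 4. nudge the parameters in the direction that lowers the loss
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w = {w:.2f}, b = {b:.2f}")  # should land close to 2 and 1
print(f"first loss = {losses[0]:.4f}, last loss = {losses[-1]:.4f}")
```

The loop is the whole story: predict, measure the loss, compute which way to move, take a small step, repeat. That repetition is what "learning" means here.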
Step 3️⃣ — THEN introduce scikit-learn (lightly)
Once the concept clicks, scikit-learn becomes:
- Clean
- Logical
- Easy
You’ll instantly understand:
.fit(), .predict(), .score()
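To see why those three names feel natural once the concepts click, here is a hypothetical `TinyLinearRegression` class written with NumPy. It is not part of scikit-learn. It only borrows the method names and the `coef_` / `intercept_` attribute names, and it returns R² from `score` the way scikit-learn's regressors do.

```python
import numpy as np

class TinyLinearRegression:
    """Hypothetical teaching class; it only mirrors scikit-learn's method names."""

    def fit(self, X, y):
        # Learn weights by least squares; the extra column of ones is the intercept.
        X_bias = np.hstack([X, np.ones((X.shape[0], 1))])
        solution, *_ = np.linalg.lstsq(X_bias, y, rcond=None)
        self.coef_ = solution[:-1]
        self.intercept_ = solution[-1]
        return self

    def predict(self, X):
        # Apply the learned function to new inputs.
        return X @ self.coef_ + self.intercept_

    def score(self, X, y):
        # R^2: 1.0 is a perfect fit, 0.0 is no better than predicting the mean.
        residual = np.sum((y - self.predict(X)) ** 2)
        total = np.sum((y - np.mean(y)) ** 2)
        return 1.0 - residual / total

# Made-up data following y = 2x + 1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = TinyLinearRegression().fit(X, y)
prediction = model.predict(np.array([[5.0]]))
r2 = model.score(X, y)
print(prediction, r2)  # approximately [11.] and 1.0
```

So `fit` means "find the parameters", `predict` means "apply the function", and `score` means "measure how good the fit is".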
❌ What NOT to do (common beginner mistake)
❌ Deep dive into scikit-learn API
❌ Memorize classifiers and parameters
❌ Jump to advanced models too early
This creates fragile understanding.
🧭 Minimal scikit-learn you may peek at (optional)
It’s okay to recognize these, not master them yet:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
(You already used these in previous posts.)
But don’t learn models yet.
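For recognition purposes only, here is roughly the idea behind those two imports, sketched in NumPy. This is a simplification under my own assumptions (made-up data, a hand-picked test size of 3 rows), not how scikit-learn implements them.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(10, 2))  # made-up feature table
y = np.arange(10)                                   # made-up targets

# Rough idea behind train_test_split: shuffle the row order, then cut.
indices = rng.permutation(len(X))
test_size = 3
test_idx, train_idx = indices[:test_size], indices[test_size:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Rough idea behind StandardScaler: subtract the mean, divide by the std.
# The statistics come from the training rows only, then get reused on test.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled.mean(axis=0))  # approximately 0 for each column
print(X_train_scaled.std(axis=0))   # approximately 1 for each column
```

Computing the mean and standard deviation from the training rows only keeps information about the test set from leaking into preprocessing.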
You can refer to these posts for understanding NumPy, Pandas and Data Preprocessing:
- Understanding NumPy in the context of Python for Machine Learning
- The next basic concept of Machine Learning after NumPy: Pandas
- Understanding Data Preprocessing
- Beginner-friendly exercises on NumPy, Pandas and Data Preprocessing