Before diving deep into Machine Learning, I would like to share some tiny, beginner-friendly coding exercises on NumPy, Pandas and Data Preprocessing: small, focused and ML-oriented.
NumPy Mini Exercises (Level: Very Easy)
Make sure you import NumPy first:
import numpy as np
Exercise 1: Create a NumPy array
Create a NumPy array containing these numbers:
[2, 4, 6, 8]
Solution:
a = np.array([2, 4, 6, 8])
Exercise 2: Create a 2D array
Create this 2×2 matrix:
1 2
3 4
Solution:
m = np.array([[1, 2],
              [3, 4]])
Exercise 3: Array shape
Find the shape of this array:
a = np.array([[10, 20, 30], [40, 50, 60]])
Solution:
a = np.array([[10, 20, 30], [40, 50, 60]])
a.shape
Output:
(2, 3)
Exercise 4: Element-wise operations
Given:
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
Compute:
- a + b
- a * b
Solution:
I. Addition
a + b
Output:
array([11, 22, 33])
II. Multiplication
a * b
Output:
array([10, 40, 90])
Exercise 5: Slicing
Given:
a = np.array([5, 10, 15, 20, 25])
Extract the middle three values:
[10, 15, 20]
Solution:
a = np.array([5, 10, 15, 20, 25])
middle = a[1:4]
middle
Output:
array([10, 15, 20])
Exercise 6: Zeros and ones arrays
Create:
- A 3×3 matrix of zeros
- A 2×4 matrix of ones
Solution:
I. 3×3 matrix of zeros
np.zeros((3, 3))
II. 2×4 matrix of ones
np.ones((2, 4))
Exercise 7: Random numbers
Generate a NumPy array of five random numbers between 0 and 1.
Solution:
r = np.random.rand(5)
r
Output (example only; the values change on every run):
array([0.23, 0.91, 0.49, 0.11, 0.76])
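Because the values are random, your output will differ from the one above. If you want repeatable numbers (handy when comparing results with someone else), one option is NumPy's Generator API with a fixed seed. The seed value 42 below is an arbitrary choice, not something the exercise requires.

```python
import numpy as np

# Create a Generator with a fixed (arbitrary) seed so results repeat across runs
rng = np.random.default_rng(42)
r = rng.random(5)  # five floats drawn uniformly from [0.0, 1.0)

# Re-creating the generator with the same seed reproduces the same values
r_again = np.random.default_rng(42).random(5)

print(r)
print(np.array_equal(r, r_again))  # True
```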
Exercise 8: Matrix multiplication
Given:
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])
Compute:
A @ B
(or np.dot(A, B))
Solution:
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])
A @ B # or np.dot(A, B)
Output:
array([[19, 22],
       [43, 50]])
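It is easy to mix up @ with the * from Exercise 4. A quick comparison on the same matrices: * multiplies matching elements, while @ takes row-by-column dot products (for example, the top-left entry is 1×5 + 2×7 = 19).

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

elementwise = A * B  # multiplies matching positions: [[5, 12], [21, 32]]
matmul = A @ B       # row-by-column dot products: [[19, 22], [43, 50]]

# The top-left entry of A @ B is row 0 of A dotted with column 0 of B
top_left = A[0, 0] * B[0, 0] + A[0, 1] * B[1, 0]  # 1*5 + 2*7 = 19

print(elementwise)
print(matmul)
print(top_left)
```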
Exercise 9: Mean of an array
Compute the mean of:
x = np.array([4, 8, 12, 16])
Solution:
x = np.array([4, 8, 12, 16])
np.mean(x)
Output:
10.0
Exercise 10: Reshape
Given:
x = np.array([1, 2, 3, 4, 5, 6])
Reshape it into a 2×3 matrix.
Solution:
x = np.array([1, 2, 3, 4, 5, 6])
x.reshape(2, 3)
Output:
array([[1, 2, 3],
       [4, 5, 6]])
Pandas Mini Exercises (Level: Very Easy)
Start with:
import pandas as pd
Exercise 1: Create a DataFrame
Create a DataFrame from this dictionary:
data = {
"Age": [25, 30, 35],
"Salary": [50000, 60000, 70000]
}
Solution:
data = {
"Age": [25, 30, 35],
"Salary": [50000, 60000, 70000]
}
df = pd.DataFrame(data)
df
Explanation
- Dictionary keys → column names
- Lists → column values
Output:
Age Salary
0 25 50000
1 30 60000
2 35 70000
Exercise 2: View data
Using the DataFrame from Exercise 1:
- Display the first 2 rows
- Display the column names
- Display the shape of the DataFrame
Solution:
df.head(2)
df.columns
df.shape
Explanation
- head(2) → first 2 rows
- columns → column names
- shape → (rows, columns)
Output:
#first 2 rows
Age Salary
0 25 50000
1 30 60000
#column names
Index(['Age', 'Salary'], dtype='object')
#3 rows, 2 columns
(3, 2)
Exercise 3: Select a column
Select only the Salary column.
Solution:
df["Salary"]
Explanation
- Single brackets → returns a Series
Output:
#This is a Series, not a DataFrame
0 50000
1 60000
2 70000
Name: Salary, dtype: int64
Exercise 4: Select multiple columns
Select Age and Salary together.
Solution:
df[["Age", "Salary"]]
Explanation
- Double brackets → returns a DataFrame
Output:
# Double brackets → DataFrame
Age Salary
0 25 50000
1 30 60000
2 35 70000
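You can confirm the Series vs. DataFrame difference from Exercises 3 and 4 with type(). This sketch rebuilds the Exercise 1 DataFrame so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 30, 35], "Salary": [50000, 60000, 70000]})

single = df["Salary"]              # single brackets -> one column as a Series
multiple = df[["Age", "Salary"]]   # list of column names -> DataFrame
one_col_frame = df[["Salary"]]     # a one-item list still gives a DataFrame

print(type(single).__name__)         # Series
print(type(multiple).__name__)       # DataFrame
print(type(one_col_frame).__name__)  # DataFrame
```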
Exercise 5: Filter rows
From the DataFrame, select rows where:
Age > 28
Solution:
df[df["Age"] > 28]
Explanation
- Boolean condition filters rows; it is a core Pandas skill
- Very common in data cleaning
Output:
Age Salary
1 30 60000
2 35 70000
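Under the hood, df["Age"] > 28 builds a Boolean Series (a mask), and df[mask] keeps only the rows where it is True. Conditions are combined with & (and) or | (or), with parentheses around each one. The Salary condition below is just an illustrative extra filter, not part of the exercise.

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 30, 35], "Salary": [50000, 60000, 70000]})

mask = df["Age"] > 28
print(mask.tolist())  # [False, True, True]

older = df[mask]  # same result as df[df["Age"] > 28]

# Each condition needs its own parentheses; use & / |, not Python's `and` / `or`
older_lower_paid = df[(df["Age"] > 28) & (df["Salary"] < 70000)]
print(older_lower_paid)
```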
Exercise 6: Add a new column
Add a column called Tax which is 10% of Salary.
Solution:
df["Tax"] = 0.10 * df["Salary"]
df
Explanation
- Pandas supports vectorized operations
- The calculation is applied to the entire column at once
Output:
# The calculation is applied to every row automatically
Age Salary Tax
0 25 50000 5000.0
1 30 60000 6000.0
2 35 70000 7000.0
Exercise 7: Basic statistics
Compute:
- Mean Age
- Maximum Salary
Solution:
df["Age"].mean()
df["Salary"].max()
Explanation
- Pandas has built-in descriptive statistics
- Used heavily during exploratory data analysis (EDA)
Output:
# Mean Age
30.0
#Maximum Salary
70000
Exercise 8: Handle missing values
Given:
data = {
"Age": [25, None, 35],
"Salary": [50000, 60000, None]
}
df = pd.DataFrame(data)
- Detect missing values
- Fill missing values with the column mean
Solution:
df.isnull()
df_filled = df.fillna(df.mean())
df
Explanation
- isnull() → detects missing values
- fillna(df.mean()) → fills numeric NaNs with the column mean
Output (df itself still contains NaN, because fillna() returns a new DataFrame, stored here as df_filled):
Age Salary
0 25.0 50000.0
1 NaN 60000.0
2 35.0 NaN
Breaking this down step by step:
Step 1: Detect missing values
df.isnull()
Output:
Age Salary
0 False False
1 True False
2 False True
Step 2: Fill missing values with the column mean
df_filled = df.fillna(df.mean())
df_filled
Output:
Age Salary
0 25.0 50000.0
1 30.0 60000.0
2 35.0 55000.0
Means used:
- Age mean = 30
- Salary mean = 55,000
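These means come only from the values that are present: mean() skips NaN by default (skipna=True), so Age mean = (25 + 35) / 2 = 30 and Salary mean = (50000 + 60000) / 2 = 55000. A quick check:

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, None, 35], "Salary": [50000, 60000, None]})

means = df.mean()  # NaN values are skipped by default (skipna=True)
print(means)

df_filled = df.fillna(means)
print(df_filled.isnull().sum().sum())  # 0 -> no missing values remain
```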
Exercise 9: Sort values
Sort the filled DataFrame (df_filled from Exercise 8) by Salary in descending order.
Solution:
df_filled.sort_values(by="Salary", ascending=False)
Explanation
- Sorting helps identify top/bottom values
- Common during analysis
Output:
Age Salary
1 30.0 60000.0
2 35.0 55000.0
0 25.0 50000.0
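Exercise 10: Convert to NumPy (ML step)
Convert the filled DataFrame from Exercise 8 into NumPy arrays:
- Features → Age, Salary
- Target → Tax
df_filled has no Tax column yet, so recreate it first (10% of Salary, as in Exercise 6).
Solution:
df_filled["Tax"] = 0.10 * df_filled["Salary"]
X = df_filled[["Age", "Salary"]].values
y = df_filled["Tax"].values
X
Explanation
- .values converts Pandas → NumPy (df.to_numpy() does the same and is the form the pandas documentation now recommends)
- scikit-learn models take NumPy arrays as input (they also accept DataFrames)
Output:
array([[2.5e+01, 5.0e+04],
       [3.0e+01, 6.0e+04],
       [3.5e+01, 5.5e+04]])
Target
y
Output:
array([5000., 6000., 5500.])
X is a 2D array (one row per sample, one column per feature) and y is a 1D array (one target per sample): exactly the format scikit-learn models expect.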
Data Preprocessing: Code-Based Mini Exercises
Start with:
import pandas as pd
import numpy as np
Exercise 1: Train/Test Split (Ratio practice)
Given:
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([10, 20, 30, 40, 50])
Split the data into 80% training and 20% testing.
Use:
from sklearn.model_selection import train_test_split
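One possible sketch for Exercise 1, if you want to check your answer. test_size=0.2 gives the 80/20 split; random_state=42 is an arbitrary seed that only makes the shuffle repeatable. With 5 samples, that leaves 4 rows for training and 1 for testing.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([10, 20, 30, 40, 50])

# test_size=0.2 -> 20% of the rows go to the test set; the rest are for training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (4, 1) (1, 1)
```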
Exercise 2: Detect missing values
Given:
data = {
"Age": [25, 30, None, 40],
"Salary": [50000, None, 70000, 80000]
}
df = pd.DataFrame(data)
Write code to:
- Detect missing values
- Count missing values per column
Exercise 3: Fill missing values (Mean)
Using the same DataFrame above:
Fill missing values using the column mean.
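A possible sketch for Exercises 2 and 3 together: isnull() flags each missing cell, .sum() counts the True values per column, and fillna(df.mean()) replaces each NaN with its column's mean (Age mean = (25 + 30 + 40) / 3 ≈ 31.67, Salary mean = (50000 + 70000 + 80000) / 3 ≈ 66666.67).

```python
import pandas as pd

data = {
    "Age": [25, 30, None, 40],
    "Salary": [50000, None, 70000, 80000]
}
df = pd.DataFrame(data)

# Exercise 2: detect missing values and count them per column
missing_mask = df.isnull()          # True where a value is missing
missing_counts = df.isnull().sum()  # True counts as 1 -> Age: 1, Salary: 1
print(missing_counts)

# Exercise 3: fill each column's NaNs with that column's mean
df_filled = df.fillna(df.mean())
print(df_filled)
```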
Exercise 4: One-Hot Encoding (Categorical Data)
Given:
df = pd.DataFrame({
"City": ["Delhi", "Mumbai", "Delhi", "Chennai"]
})
Convert City into numerical columns using one-hot encoding.
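One possible sketch uses pd.get_dummies, which replaces City with one 0/1 column per distinct city (named City_<value>). dtype=int is passed so the columns show 0/1 rather than True/False, since recent pandas versions produce Booleans by default. scikit-learn's OneHotEncoder is the usual alternative inside ML pipelines.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Delhi", "Mumbai", "Delhi", "Chennai"]
})

# One new 0/1 column per distinct city; the original City column is replaced
encoded = pd.get_dummies(df, columns=["City"], dtype=int)
print(encoded)
```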
Exercise 5: Feature Scaling (Standardization)
Given:
X = np.array([
[20, 30000],
[30, 50000],
[40, 70000]
])
Apply standard scaling to X.
Use:
from sklearn.preprocessing import StandardScaler
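A possible sketch: StandardScaler rescales each column to mean 0 and standard deviation 1 using z = (x - mean) / std. Here Age has mean 30 and Salary has mean 50000, so the middle row becomes [0, 0], even though the two columns had very different ranges before scaling.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([
    [20, 30000],
    [30, 50000],
    [40, 70000]
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # learns each column's mean/std, then applies z = (x - mean) / std

print(scaler.mean_)           # per-column means (30 and 50000)
print(X_scaled.round(4))
print(X_scaled.mean(axis=0))  # approximately [0, 0]
```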
Exercise 6: Feature Selection
Given:
df = pd.DataFrame({
"Age": [25, 30, 35],
"Salary": [50000, 60000, 70000],
"EmployeeID": [101, 102, 103]
})
Remove the EmployeeID column.
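One possible sketch: drop(columns=...) returns a new DataFrame without the listed columns. An identifier like EmployeeID is only a label, so it usually carries no useful signal for a model and is removed before training.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
    "EmployeeID": [101, 102, 103]
})

# drop() returns a new DataFrame; df itself is left unchanged
features = df.drop(columns=["EmployeeID"])
print(features.columns.tolist())  # ['Age', 'Salary']
```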
Exercise 7: Outlier Detection (Simple logic)
Given:
ages = np.array([22, 23, 24, 25, 120])
Write code to remove values greater than 100.
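A possible sketch using a Boolean mask, the same idea as row filtering in Pandas: keep only the values that are 100 or less. The cutoff of 100 is the one the exercise specifies.

```python
import numpy as np

ages = np.array([22, 23, 24, 25, 120])

# Boolean mask: True for plausible ages, False for the outlier
mask = ages <= 100
clean_ages = ages[mask]

print(clean_ages)  # [22 23 24 25]
```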
Exercise 8: Data Leakage Check (Thinking + Code)
Given:
from sklearn.preprocessing import StandardScaler
Write the code in the correct order to:
- Split data
- Fit scaler on training data
- Transform both training and test data
(No need to run it; just write the correct sequence.)
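One possible sketch of the correct order. The scaler's mean and standard deviation must be learned from the training data only; fitting it on all the data would let test-set statistics leak into training. The small X and y arrays are hypothetical toy data, made up only so the sketch runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data, only so the sketch is runnable
X = np.array([[20, 30000], [30, 50000], [40, 70000], [50, 90000], [60, 110000]])
y = np.array([0, 0, 1, 1, 1])

# 1) Split first
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2) Fit the scaler on the training data only
scaler = StandardScaler()
scaler.fit(X_train)

# 3) Transform both sets with the statistics learned from the training data
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

The key point is that the test set only ever goes through transform(), never fit().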
If you can do these exercises comfortably, you're ML-ready at a foundational level.
I will be adding solutions for the Data Preprocessing exercises in a subsequent post. If you are new to Python, you can install Python 3.x and try these exercises in your IDE. I use Jupyter Notebook.
You can refer to these posts to understand NumPy, Pandas and Data Preprocessing:
Understanding NumPy in the context of Python for Machine Learning
The next basic concept of Machine Learning after NumPy: Pandas
Understanding Data Preprocessing