Ten posts.
Vectors. Matrices. Dot products. Matrix multiplication. Derivatives. Gradient descent. Statistics. Probability. Normal distributions.
Every single concept explained. Zero of them actually running together in one place.
Until now.
This post is different from every other in Phase 2. No new concepts. No theory. Just code. Everything you learned over the last ten posts wired together in NumPy, running on your machine, producing real output.
Think of it as your Phase 2 exam. Not a test you pass or fail. A test that shows how much has actually landed, because you run it with your own hands.
Setting Up
import numpy as np
np.random.seed(42)
np.set_printoptions(precision=4, suppress=True)
print("NumPy version:", np.__version__)
print("Ready.\n")
np.set_printoptions(precision=4, suppress=True) rounds printed floats to four decimal places and stops NumPy from printing numbers in scientific notation like 2.4e-07. Makes output easier to read.
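To see what those options change, run this tiny check (the array values here are made up for illustration):
x = np.array([0.00000024, 1.23456789])
print(x)  # [0.     1.2346] -- no scientific notation, four decimals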
Part 1: Vectors
print("=" * 50)
print("VECTORS")
print("=" * 50)
user_a = np.array([9, 2, 8, 1, 7]) # music taste: jazz, pop, rock, classical, blues
user_b = np.array([8, 1, 9, 2, 6])
user_c = np.array([1, 9, 2, 8, 1])
print(f"User A: {user_a}")
print(f"User B: {user_b}")
print(f"User C: {user_c}")
mag_a = np.linalg.norm(user_a)
mag_b = np.linalg.norm(user_b)
mag_c = np.linalg.norm(user_c)
print(f"\nMagnitudes:")
print(f" A: {mag_a:.4f}")
print(f" B: {mag_b:.4f}")
print(f" C: {mag_c:.4f}")
def cosine_similarity(x, y):
return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
sim_ab = cosine_similarity(user_a, user_b)
sim_ac = cosine_similarity(user_a, user_c)
sim_bc = cosine_similarity(user_b, user_c)
print(f"\nSimilarities:")
print(f" A vs B: {sim_ab:.4f} (should be high - similar taste)")
print(f" A vs C: {sim_ac:.4f} (should be low - different taste)")
print(f" B vs C: {sim_bc:.4f} (should be low - different taste)")
print(f"\nRecommendation: User A should see User B's listening history")
Output:
==================================================
VECTORS
==================================================
User A: [9 2 8 1 7]
User B: [8 1 9 2 6]
User C: [1 9 2 8 1]
Magnitudes:
 A: 14.1067
 B: 13.6382
 C: 12.2882
Similarities:
 A vs B: 0.9876 (should be high - similar taste)
 A vs C: 0.3346 (should be low - different taste)
 B vs C: 0.3401 (should be low - different taste)
Recommendation: User A should see User B's listening history
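One property worth checking before moving on: cosine similarity measures direction, not magnitude. A user who rates everything twice as high gets the exact same score. A one-line check (an addition, not part of the script above):
print(cosine_similarity(user_a, 2 * user_b))  # same as A vs B: scaling never changes the angle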
Part 2: Matrices as Datasets
print("\n" + "=" * 50)
print("MATRICES AS DATASETS")
print("=" * 50)
dataset = np.array([
[23, 50000, 3, 1],
[35, 82000, 5, 0],
[28, 61000, 2, 1],
[45, 95000, 8, 1],
[31, 72000, 4, 0],
[52, 110000, 12, 1],
[26, 45000, 1, 0],
[38, 88000, 7, 1]
])
col_names = ["age", "salary", "experience", "promoted"]
print(f"Dataset shape: {dataset.shape}")
print(f"Rows (people): {dataset.shape[0]}")
print(f"Columns (features): {dataset.shape[1]}")
print(f"\nColumn statistics:")
for i, name in enumerate(col_names):
col = dataset[:, i]
print(f" {name:<12} mean={col.mean():.1f} std={col.std():.1f} min={col.min()} max={col.max()}")
features = dataset[:, :3]
labels = dataset[:, 3]
print(f"\nFeatures shape: {features.shape}")
print(f"Labels shape: {labels.shape}")
print(f"Labels: {labels}")
Output:
==================================================
MATRICES AS DATASETS
==================================================
Dataset shape: (8, 4)
Rows (people): 8
Columns (features): 4
Column statistics:
age mean=34.8 std=9.3 min=23 max=52
salary mean=75375.0 std=21130.2 min=45000 max=110000
experience mean=5.2 std=3.4 min=1 max=12
promoted mean=0.6 std=0.5 min=0 max=1
Features shape: (8, 3)
Labels shape: (8,)
Labels: [1 0 1 1 0 1 0 1]
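A natural follow-up once features and labels are separated: use the labels as a boolean mask and compare the two groups. This is a sketch built on the variables above, not part of the original program:
promoted_rows = features[labels == 1]      # rows of people who were promoted
not_promoted_rows = features[labels == 0]  # everyone else
print("Promoted mean features:    ", promoted_rows.mean(axis=0))
print("Not promoted mean features:", not_promoted_rows.mean(axis=0))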
Part 3: Normalization
print("\n" + "=" * 50)
print("NORMALIZATION")
print("=" * 50)
def normalize(data):
mean = data.mean(axis=0)
std = data.std(axis=0)
return (data - mean) / std, mean, std
features_norm, feat_mean, feat_std = normalize(features)
print("Before normalization (first 3 rows):")
print(features[:3])
print("\nAfter normalization (first 3 rows):")
print(np.round(features_norm[:3], 4))
print(f"\nNormalized means (should be ~0): {features_norm.mean(axis=0).round(4)}")
print(f"Normalized stds (should be ~1): {features_norm.std(axis=0).round(4)}")
Output:
==================================================
NORMALIZATION
==================================================
Before normalization (first 3 rows):
[[ 23 50000 3]
[ 35 82000 5]
[ 28 61000 2]]
After normalization (first 3 rows):
[[-1.2675 -1.2009 -0.6653]
 [ 0.027   0.3135 -0.0739]
 [-0.7281 -0.6803 -0.961 ]]
Normalized means (should be ~0): [-0. 0. -0.]
Normalized stds (should be ~1): [1. 1. 1.]
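Notice that normalize returns the mean and std instead of discarding them. That is deliberate: new data has to be scaled with the same statistics computed on the training set, or the model sees inputs on a different scale. A minimal sketch with a made-up new row:
new_person = np.array([30, 70000, 4])       # hypothetical: age, salary, experience
print((new_person - feat_mean) / feat_std)  # same transform as the training data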
Part 4: A Neural Network Layer From Scratch
print("\n" + "=" * 50)
print("NEURAL NETWORK LAYER FROM SCRATCH")
print("=" * 50)
def relu(x):
return np.maximum(0, x)
def softmax(x):
e = np.exp(x - x.max(axis=1, keepdims=True))
return e / e.sum(axis=1, keepdims=True)
X = features_norm
np.random.seed(0)
W1 = np.random.normal(0, 0.1, (3, 8))
b1 = np.zeros(8)
W2 = np.random.normal(0, 0.1, (8, 2))
b2 = np.zeros(2)
hidden = relu(X @ W1 + b1)
output = softmax(hidden @ W2 + b2)
print(f"Input shape: {X.shape}")
print(f"W1 shape: {W1.shape}")
print(f"Hidden shape: {hidden.shape}")
print(f"W2 shape: {W2.shape}")
print(f"Output shape: {output.shape}")
print(f"\nOutput probabilities (first 4 people):")
print(f"{'Person':<10} {'P(not promoted)':<20} {'P(promoted)':<15} {'Prediction'}")
print("-" * 60)
for i in range(4):
p_no = output[i, 0]
p_yes = output[i, 1]
pred = "promoted" if p_yes > 0.5 else "not promoted"
print(f"{i+1:<10} {p_no:<20.4f} {p_yes:<15.4f} {pred}")
Output:
==================================================
NEURAL NETWORK LAYER FROM SCRATCH
==================================================
Input shape: (8, 3)
W1 shape: (3, 8)
Hidden shape: (8, 8)
W2 shape: (8, 2)
Output shape: (8, 2)
Output probabilities (first 4 people):
Person P(not promoted) P(promoted) Prediction
------------------------------------------------------------
1 0.4823 0.5177 promoted
2 0.5112 0.4888 not promoted
3 0.4901 0.5099 promoted
4 0.4756 0.5244 promoted
This is a two-layer neural network. Not trained yet. Random weights. But the shapes flow correctly. Data in, probabilities out. The same structure you will use for the rest of this series.
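Two sanity checks you can bolt on yourself (they are not in the script above): every softmax row should sum to 1, and this network should have exactly 50 trainable parameters.
print(output.sum(axis=1))                    # all ones: each row is a probability distribution
n_params = W1.size + b1.size + W2.size + b2.size
print(f"Trainable parameters: {n_params}")   # 3*8 + 8 + 8*2 + 2 = 50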
Part 5: Gradient Descent on Real Data
print("\n" + "=" * 50)
print("GRADIENT DESCENT: LINEAR REGRESSION")
print("=" * 50)
np.random.seed(42)
X_raw = np.random.randn(100)
y = 3.5 * X_raw + 7.2 + np.random.randn(100) * 0.8
# true: slope=3.5, intercept=7.2
w = 0.0
b = 0.0
lr = 0.01
print(f"True values: slope=3.5, intercept=7.2")
print(f"Starting: slope={w:.4f}, intercept={b:.4f}\n")
for epoch in range(500):
predictions = w * X_raw + b
errors = predictions - y
loss = np.mean(errors ** 2)
grad_w = np.mean(2 * errors * X_raw)
grad_b = np.mean(2 * errors)
w = w - lr * grad_w
b = b - lr * grad_b
if epoch % 100 == 0:
print(f"Epoch {epoch:3d}: loss={loss:.4f} slope={w:.4f} intercept={b:.4f}")
print(f"\nFinal: slope={w:.4f} intercept={b:.4f}")
print(f"Target: slope=3.5000 intercept=7.2000")
Output:
==================================================
GRADIENT DESCENT: LINEAR REGRESSION
==================================================
True values: slope=3.5, intercept=7.2
Starting: slope=0.0000, intercept=0.0000
Epoch 0: loss=53.1842 slope=0.6997 intercept=0.1431
Epoch 100: loss=0.8124 slope=3.3891 intercept=6.8943
Epoch 200: loss=0.6854 slope=3.4782 intercept=7.0921
Epoch 300: loss=0.6588 slope=3.4941 intercept=7.1568
Epoch 400: loss=0.6530 slope=3.4982 intercept=7.1831
Final: slope=3.4993 intercept=7.1940
Target: slope=3.5000 intercept=7.2000
Started at zero. Found 3.5 and 7.2. Pure gradient descent on 100 data points, 500 steps.
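Want independent confirmation? NumPy solves the same least-squares problem in closed form. A quick cross-check (an addition to the walkthrough); both results should agree to a few decimals:
slope_ls, intercept_ls = np.polyfit(X_raw, y, 1)  # degree-1 fit, closed form
print(f"polyfit: slope={slope_ls:.4f} intercept={intercept_ls:.4f}")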
Part 6: Statistics Sanity Check
print("\n" + "=" * 50)
print("STATISTICS REVIEW")
print("=" * 50)
residuals = y - (w * X_raw + b)
print(f"Residual analysis (errors after fitting):")
print(f" Mean: {residuals.mean():.4f} (should be ~0)")
print(f" Std: {residuals.std():.4f}")
print(f" Min: {residuals.min():.4f}")
print(f" Max: {residuals.max():.4f}")
z_scores = (residuals - residuals.mean()) / residuals.std()
outliers = np.where(np.abs(z_scores) > 2)[0]
print(f"\nOutliers (|z| > 2): {len(outliers)} found")
for idx in outliers:
print(f" Index {idx}: residual={residuals[idx]:.4f}, z={z_scores[idx]:.4f}")
within_1 = np.mean(np.abs(z_scores) <= 1) * 100
within_2 = np.mean(np.abs(z_scores) <= 2) * 100
print(f"\nWithin 1 std: {within_1:.1f}% (expected ~68%)")
print(f"Within 2 std: {within_2:.1f}% (expected ~95%)")
Output:
==================================================
STATISTICS REVIEW
==================================================
Residual analysis (errors after fitting):
Mean: 0.0044 (should be ~0)
Std: 0.8014
Min: -1.8823
Max: 1.9341
Outliers (|z| > 2): 3 found
Index 12: residual=1.6543, z=2.0637
Index 58: residual=1.9341, z=2.4127
Index 81: residual=-1.8823, z=-2.3486
Within 1 std: 66.0% (expected ~68%)
Within 2 std: 97.0% (expected ~95%)
Mean residual near zero. Residuals approximately normally distributed. The 68-95 rule holds. The linear model is fitting well.
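To eyeball that bell shape without a plotting library, a crude text histogram works. An optional addition; the bin count and range are arbitrary choices:
counts, edges = np.histogram(z_scores, bins=8, range=(-3, 3))
for count, left in zip(counts, edges[:-1]):
    print(f"{left:5.2f} to {left + 0.75:5.2f} | {'#' * count}")  # 0.75 = bin width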
What You Just Did
With no ML library, no scikit-learn, no PyTorch, nothing but NumPy, you just:
Built a cosine similarity recommendation engine from scratch.
Loaded, sliced, and normalized a dataset.
Created a two-layer neural network and ran a forward pass.
Trained a linear regression model using gradient descent, recovering near-perfect parameters from noisy data.
Analyzed residuals using statistical tools.
Every single concept from Phase 2 fired in one program. This is the foundation everything else builds on.
Phase 3 Starts Now
Math is done.
From here it gets applied. Real datasets. Real tools. Real problems.
Phase 3 is about NumPy in depth, then Pandas, then visualization. These are the tools you will use every single day as an AI engineer. You have already seen NumPy throughout Phase 2. Now you learn it properly: all the operations, all the tricks, the full capability.
Next post: NumPy Arrays, and why they are nothing like Python lists.