I keep seeing people add higher-degree polynomials “just in case”.
Today I got tired of it and built the simplest possible counter-example.
Result: the quadratic model literally got a lower test R² than plain linear regression.
Let’s watch Occam’s Razor do its thing.
More complex model ≠ better model
(That’s Occam’s Razor in action)
A simple, reproducible example showing why a more complex model is not always better — even on perfectly linear data.
Today I created a perfectly linear dataset:
y = 2x + 3 + some random noise
Then I trained two models:
- Simple Linear Regression → Test R² ≈ 0.8626
- Polynomial Degree 2 (more complex) → Test R² ≈ 0.8623
Guess what?
The complex model fit the training data no better (the scores match to four decimal places)…
yet it performed slightly worse on the unseen test data.
This notebook walks through the entire experiment — step by step — with clean plots and real numbers.
It’s a classic illustration of overfitting and why Occam’s Razor matters in machine learning.
Let’s dive in.
1. Import Libraries & Generate Synthetic Linear Data with Noise
We generate a perfectly linear dataset based on the true relationship y = 2x + 3.
Then we add Gaussian noise to scatter the points slightly off the line — just like real-world data.
Why add noise?
Because in practice, data is never perfectly clean.
This noise tempts the polynomial model to fit random fluctuations instead of the underlying pattern — classic overfitting.
Meanwhile, the simple linear model ignores the noise and captures only the true signal.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score
# Generate perfectly linear data with noise
x = np.arange(-5.0, 5.0, 0.1)
y_true = 2 * x + 3
y_noise = 2 * np.random.normal(size=x.size)
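# Note: no random seed is fixed here, so the exact numbers shown below vary slightly between runs (np.random.seed would pin them down)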
y = y_true + y_noise
# Plot: Data + True Relationship
plt.figure(figsize=(10, 6))
plt.scatter(x, y, marker='o', color='blue', alpha=0.7, s=50, label='Observed Data', edgecolor='navy', linewidth=0.5)
plt.plot(x, y_true, color='red', linewidth=3, label='True Relationship (y = 2x + 3)')
plt.title('Synthetic Dataset with Ground Truth', fontsize=14, pad=15)
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='lower center', bbox_to_anchor=(0.5, -0.15), ncol=2, fancybox=True, shadow=True, fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
2. Convert to DataFrame & Train/Test Split
We convert the data into a pandas DataFrame for easier handling and cleaner syntax (e.g., df['x'] instead of indexing arrays).
Then we randomly split it into training (80%) and testing (20%) sets.
This split is crucial — it allows us to evaluate how well each model generalizes to unseen data, which is exactly where overfitting reveals itself.
data = np.column_stack((x, y))
df = pd.DataFrame(data, columns=['x', 'y'])
# Boolean mask: roughly 80% of the rows go to training, the rest to testing
msk = np.random.rand(len(df)) < 0.8
train = df[msk]
test = df[~msk]
train_x = train[['x']]
train_y = train[['y']]
test_x = test[['x']]
test_y = test[['y']]
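Side note: the boolean-mask split above gives a roughly 80/20 split that changes on every run. If you want an exact, repeatable split, scikit-learn's train_test_split with a fixed random_state does the job; here is a minimal sketch (the random_state value and the *_alt variable names are just illustrative):
from sklearn.model_selection import train_test_split

# Exact 80/20 split, identical on every run thanks to the fixed random_state
train_alt, test_alt = train_test_split(df, test_size=0.2, random_state=42)
train_x_alt, train_y_alt = train_alt[['x']], train_alt[['y']]
test_x_alt, test_y_alt = test_alt[['x']], test_alt[['y']]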
3. Simple Linear Regression
We fit a basic linear model and evaluate it on the test set.
reg_linear = LinearRegression()
reg_linear.fit(train_x, train_y)
coef_linear = reg_linear.coef_[0]  # coef_ is 2-D because train_y is a one-column DataFrame, hence the [0]
intercept_linear = reg_linear.intercept_
pred_linear = reg_linear.predict(test_x)
r2_linear = r2_score(test_y, pred_linear)
print(f"r^2 SimpleRegressionLinear : {r2_linear:.4f}")
print(f"coef_ : {coef_linear}")
r^2 SimpleRegressionLinear : 0.8626
coef_ : [2.03889206]
plt.figure(figsize=(10, 6))
plt.scatter(df.x, df.y, color='blue', marker='o', alpha=0.7, s=50, label='Observed Data', edgecolor='darkblue', linewidth=0.5)
plt.plot(df.x, intercept_linear + coef_linear * df.x, color='red', linewidth=2.5, label='Linear Regression')
plt.title('Simple Linear Regression', fontsize=14, pad=15)
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='lower center', bbox_to_anchor=(0.5, -0.15), ncol=2, fancybox=True, shadow=True, fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
4. Polynomial Regression (Degree 2)
Now we increase model complexity by adding x² as a feature using PolynomialFeatures.
poly = PolynomialFeatures(degree=2)
poly_train_x = poly.fit_transform(train_x)  # fit the feature expansion on the training data only
poly_test_x = poly.transform(test_x)        # reuse the fitted transformer on the test data
reg_poly = LinearRegression()
reg_poly.fit(poly_train_x, train_y)
pred_poly = reg_poly.predict(poly_test_x)
r2_poly = r2_score(test_y, pred_poly)
coef_poly = reg_poly.coef_[0]  # [bias-column coef (0, since the model fits its own intercept), x coef, x² coef]
intercept_poly = reg_poly.intercept_
print(f"r^2 PolynomialRegressionLinear : {r2_poly:.4f}")
print(f"coef_ : {coef_poly}")
r^2 PolynomialRegressionLinear : 0.8623
coef_ : [0. 2.03959043 0.00316868]
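To see exactly what the transform added, you can inspect the generated feature names and a few transformed rows. A quick check, assuming scikit-learn 1.0+ (which provides get_feature_names_out):
print(poly.get_feature_names_out())  # -> ['1' 'x' 'x^2']: the bias column, x itself, and the new x² feature
print(poly_train_x[:3])              # first training rows, each laid out as [1, x, x²]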
5. Final Comparison & Overfitting Detection
We compare both models side by side. The training scores are essentially identical, while the polynomial model scores slightly lower on the test set: the extra x² term adds nothing except the freedom to chase noise.
print(f"performance simple linear regression on train data : {reg_linear.score(train_x, train_y):.4f} and test data: {reg_linear.score(test_x, test_y):.4f}")
print(f"performance poly linear regression on train data : {reg_poly.score(poly_train_x, train_y):.4f} and test data: {reg_poly.score(poly_test_x, test_y):.4f}")
performance simple linear regression on train data : 0.8926 and test data: 0.8626
performance poly linear regression on train data : 0.8926 and test data: 0.8623
plt.figure(figsize=(10, 6))
plt.scatter(df.x, df.y, color='blue', marker='o', alpha=0.7, label='Observed Data')
plt.plot(df.x, intercept_linear + coef_linear * df.x, color='red', linewidth=2.5, label='Linear Regression')
plt.plot(df.x, intercept_poly + coef_poly[1] * df.x + coef_poly[2] * np.power(df.x, 2), color='green', linewidth=2.5, label='Polynomial Degree 2')
plt.title('Linear vs Polynomial Regression')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='lower center', bbox_to_anchor=(0.5, -0.15), ncol=3, fancybox=True, shadow=True)
plt.grid(True, alpha=0.3)
plt.xlim(-5, 5)
plt.tight_layout()
plt.show()
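A single random split is a small sample, so as a sanity check, here is a minimal sketch that repeats the experiment over many random 80/20 splits and averages the test R² of each model (it reuses df from above; the helper name avg_test_r2 is just illustrative):
from sklearn.model_selection import train_test_split

def avg_test_r2(degree, n_runs=200):
    # Average test R^2 over many random 80/20 splits for a polynomial of the given degree
    scores = []
    for seed in range(n_runs):
        tr, te = train_test_split(df, test_size=0.2, random_state=seed)
        features = PolynomialFeatures(degree=degree)
        x_tr = features.fit_transform(tr[['x']])
        x_te = features.transform(te[['x']])
        model = LinearRegression().fit(x_tr, tr['y'])
        scores.append(model.score(x_te, te['y']))
    return np.mean(scores)

print(f"average test R^2 (degree 1, i.e. linear) : {avg_test_r2(degree=1):.4f}")
print(f"average test R^2 (degree 2)              : {avg_test_r2(degree=2):.4f}")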
Conclusion
- The true relationship is perfectly linear
- The polynomial model (more complex) slightly overfits the training noise
- The simpler linear model generalizes just as well, and in this run slightly better, on the test data
- This demonstrates Occam's Razor: Among models with similar explanatory power, prefer the simpler one
Less is often more in machine learning.


