MohammadReza Mahdian

Occam’s Razor destroyed my polynomial model in 5 minutes

I keep seeing people add higher-degree polynomials “just in case”.
Today I got tired of it and built the simplest possible counter-example.
Result: the quadratic model literally got a lower test R² than plain linear regression.
Let’s watch Occam’s Razor do its thing.

More complex model ≠ better model

(That’s literally Occam’s Razor in action)

A simple, reproducible example showing why a more complex model is not always better — even on perfectly linear data.

Today I created a perfectly linear dataset:

y = 2x + 3 + some random noise

Then I trained two models:

  • Simple Linear Regression → Test R² ≈ 0.8626
  • Polynomial Degree 2 (more complex) → Test R² ≈ 0.8623

Guess what?

The complex model fit the training data at least as well (adding an x² term can never lower the training R²)…

but it performed slightly worse on the unseen test data.

This notebook walks through the entire experiment — step by step — with clean plots and real numbers.

It’s a classic illustration of overfitting and why Occam’s Razor matters in machine learning.

Let’s dive in.

1. Import Libraries & Generate Synthetic Linear Data with Noise

We generate a perfectly linear dataset based on the true relationship y = 2x + 3.

Then we add Gaussian noise to scatter the points slightly off the line — just like real-world data.

Why add noise?

Because in practice, data is never perfectly clean.

This noise tempts the polynomial model to fit random fluctuations instead of the underlying pattern — classic overfitting.

Meanwhile, the simple linear model ignores the noise and captures only the true signal.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score

Generate perfectly linear data with noise

# true relationship y = 2x + 3, plus Gaussian noise
# (no random seed is fixed, so exact values will vary slightly between runs)
x = np.arange(-5.0, 5.0, 0.1)
y_true = 2 * x + 3
y_noise = 2 * np.random.normal(size=x.size)
y = y_true + y_noise

Plot: Data + True Relationship

plt.figure(figsize=(10, 6))
plt.scatter(x, y, marker='o', color='blue', alpha=0.7, s=50, label='Observed Data', edgecolor='navy', linewidth=0.5)
plt.plot(x, y_true, color='red', linewidth=3, label='True Relationship (y = 2x + 3)')

plt.title('Synthetic Dataset with Ground Truth', fontsize=14, pad=15)
plt.xlabel('x')
plt.ylabel('y')

plt.legend(loc='lower center', bbox_to_anchor=(0.5, -0.15), ncol=2, fancybox=True, shadow=True, fontsize=11)

plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

2. Convert to DataFrame & Train/Test Split

We convert the data into a pandas DataFrame for easier handling and cleaner syntax (e.g., df['x'] instead of indexing arrays).

Then we randomly split it into training (80%) and testing (20%) sets.

This split is crucial — it allows us to evaluate how well each model generalizes to unseen data, which is exactly where overfitting reveals itself.

data = np.column_stack((x, y))
df = pd.DataFrame(data, columns=['x', 'y'])

# random ~80/20 split: each row lands in the training set with probability 0.8
msk = np.random.rand(len(df)) < 0.8
train = df[msk]
test = df[~msk]

train_x = train[['x']]
train_y = train[['y']]
test_x = test[['x']]
test_y = test[['y']]
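As an aside, the same 80/20 split is often written with scikit-learn's train_test_split. This is just an equivalent sketch (the random_state value is arbitrary, chosen only so the split is repeatable); the random-mask approach above works just as well.

from sklearn.model_selection import train_test_split

# equivalent ~80/20 split; random_state is arbitrary, used only for repeatability
train_x_alt, test_x_alt, train_y_alt, test_y_alt = train_test_split(
    df[['x']], df[['y']], test_size=0.2, random_state=42
)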

3. Simple Linear Regression

We fit a basic linear model and evaluate it on the test set.

reg_linear = LinearRegression()
reg_linear.fit(train_x, train_y)
coef_linear = reg_linear.coef_[0]
intercept_linear = reg_linear.intercept_
pred_linear = reg_linear.predict(test_x)
r2_linear = r2_score(test_y, pred_linear)
print(f"r^2 SimpleRegressionLinear : {r2_linear:.4f}")
print(f"coef_ : {coef_linear}")
Enter fullscreen mode Exit fullscreen mode
r^2 SimpleRegressionLinear : 0.8626
coef_ : [2.03889206]
Enter fullscreen mode Exit fullscreen mode
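Since we know the ground truth is y = 2x + 3, it's worth a quick sanity check of the fitted intercept as well. A minimal sketch (your exact numbers will differ slightly from run to run, since no random seed is fixed):

# compare the fitted parameters against the true relationship y = 2x + 3
print(f"fitted slope     : {coef_linear[0]:.4f}   (true: 2)")
print(f"fitted intercept : {intercept_linear[0]:.4f}   (true: 3)")

Now let's plot the fit.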
plt.figure(figsize=(10, 6))
plt.scatter(df.x, df.y, color='blue', marker='o', alpha=0.7, s=50, label='Observed Data', edgecolor='darkblue', linewidth=0.5)

plt.plot(df.x, intercept_linear + coef_linear * df.x, color='red', linewidth=2.5, label='Linear Regression')

plt.title('Simple Linear Regression', fontsize=14, pad=15)
plt.xlabel('x')
plt.ylabel('y')

plt.legend(loc='lower center', bbox_to_anchor=(0.5, -0.15), ncol=2, fancybox=True, shadow=True, fontsize=11)

plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

4. Polynomial Regression (Degree 2)

Now we increase model complexity by adding x² as a feature using PolynomialFeatures.
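Before fitting, here's a tiny illustration of what PolynomialFeatures(degree=2) actually produces: each input value x is expanded into the row [1, x, x²]. (The sample values below are arbitrary, chosen only for the demonstration.)

demo = PolynomialFeatures(degree=2)
print(demo.fit_transform(np.array([[1.0], [2.0], [3.0]])))
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]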

poly = PolynomialFeatures(degree=2)

# fit the transformer on the training data, then reuse it on the test data
# (calling fit_transform again on the test set would refit the transformer)
poly_train_x = poly.fit_transform(train_x)
poly_test_x = poly.transform(test_x)

reg_poly = LinearRegression()
reg_poly.fit(poly_train_x, train_y)
pred_poly = reg_poly.predict(poly_test_x)
r2_poly = r2_score(test_y, pred_poly)
coef_poly = reg_poly.coef_[0]
intercept_poly = reg_poly.intercept_
print(f"r^2 PolynomialRegressionLinear : {r2_poly:.4f}")
print(f"coef_ : {coef_poly}")
Enter fullscreen mode Exit fullscreen mode

‍‍

r^2 PolynomialRegressionLinear : 0.8623
coef_ : [0.         2.03959043 0.00316868]
Enter fullscreen mode Exit fullscreen mode

5. Final Comparison & Overfitting Detection

We compare both models side by side. The polynomial model fits the training data at least as well (even though its fitted x² coefficient, ≈ 0.003, is essentially zero), yet it does slightly worse on the test data: extra flexibility with nothing real left to explain.

print(f"performance simple linear regression on train data : {reg_linear.score(train_x, train_y):.4f}  and test data: {reg_linear.score(test_x, test_y):.4f}")
print(f"performance poly linear regression on train data : {reg_poly.score(poly_train_x, train_y):.4f}  and test data: {reg_poly.score(poly_test_x, test_y):.4f}")
performance simple linear regression on train data : 0.8926  and test data: 0.8626
performance poly linear regression on train data : 0.8926  and test data: 0.8623
plt.figure(figsize=(10, 6))
plt.scatter(df.x, df.y, color='blue', marker='o', alpha=0.7, label='Observed Data')
plt.plot(df.x, intercept_linear + coef_linear * df.x, color='red', linewidth=2.5, label='Linear Regression')
plt.plot(df.x, intercept_poly + coef_poly[1] * df.x + coef_poly[2] * np.power(df.x, 2), color='green', linewidth=2.5, label='Polynomial Degree 2')
plt.title('Linear vs Polynomial Regression')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='lower center', bbox_to_anchor=(0.5, -0.15), ncol=3, fancybox=True, shadow=True)
plt.grid(True, alpha=0.3)
plt.xlim(-5, 5)
plt.tight_layout()
plt.show()
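If you want to push the point further, here's a minimal sketch (reusing the train/test split from above) that sweeps the polynomial degree and prints train vs. test R². Because each higher-degree feature set contains the lower-degree one, the training score can only go up as the degree grows, while the test score tends to stall or slip. Exact numbers will vary from run to run since no seed is fixed.

for degree in range(1, 7):
    pf = PolynomialFeatures(degree=degree)
    Xtr = pf.fit_transform(train_x)
    Xte = pf.transform(test_x)

    model = LinearRegression().fit(Xtr, train_y)
    print(f"degree {degree}: train R^2 = {model.score(Xtr, train_y):.4f}, "
          f"test R^2 = {model.score(Xte, test_y):.4f}")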

Conclusion

  • The true relationship is perfectly linear
  • The polynomial model (more complex) slightly overfits the training noise
  • The simpler linear model achieves better generalization on test data
  • This demonstrates Occam's Razor: Among models with similar explanatory power, prefer the simpler one

Less is often more in machine learning.
