🍵 Linear Regression for Absolute Beginners
With Code Examples and Tea-Stall Stories
Machine learning terms like cost function, gradient descent, regularization, and overfitting can feel abstract. We'll make them concrete with a tea stall story and then bring them to life with working Python code you can run.
What's ahead
- Linear Regression: predict tea sales from temperature
- Cost Function (MSE): measure how wrong predictions are
- Gradient Descent: improve step-by-step
- Overfitting: when a model memorizes noise
- Regularization (Ridge/Lasso): keep models simple and robust
- Visualizing how penalties shrink coefficients
- Handy forecast function
🧪 Setup (Run These First)
Explanation: We import the tools we'll use. numpy and pandas handle numbers/data; matplotlib draws charts; sklearn gives us ready-made ML models and utilities.
# Install if needed: pip install numpy pandas scikit-learn matplotlib
# Core libraries
import numpy as np # numerical computations
import pandas as pd # data tables (light use here)
import matplotlib.pyplot as plt # plotting
# Models
from sklearn.linear_model import LinearRegression, Ridge, Lasso
# Utilities
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Make randomness repeatable (so results are the same each run)
np.random.seed(42)
✅ Scenario 1: Linear Regression (Tea Sales vs. Temperature)
Story: As temperature rises, fewer people want hot tea. We'll fit a straight line to predict tea cups sold from °C.
Key idea: Linear regression finds the line y = m·x + c that best matches the data (best = smallest average error).
# Synthetic dataset: temperature (°C) → tea cups sold
temps = np.array([10, 12, 15, 18, 20, 22, 24, 26, 28]).reshape(-1, 1)
# reshape(-1,1) turns a 1D list into a column vector for sklearn
tea_sales = np.array([100, 95, 85, 70, 60, 55, 50, 45, 40])
# Create and train the model
lin = LinearRegression()
lin.fit(temps, tea_sales)
print("Slope (m):", lin.coef_[0]) # change in cups for +1°C
print("Intercept (c):", lin.intercept_) # cups at 0°C
# Predict tea sales for 21°C
tomorrow_temp = np.array([[21]])
pred_sales = lin.predict(tomorrow_temp)
print("Predicted tea cups at 21°C:", int(pred_sales[0]))
# Plot data and the fitted line
plt.scatter(temps, tea_sales, color="teal", label="Actual")
plt.plot(temps, lin.predict(temps), color="orange", label="Fitted line")
plt.xlabel("Temperature (°C)")
plt.ylabel("Tea cups sold")
plt.title("Linear Regression: Tea Sales vs. Temperature")
plt.legend()
plt.show()
✅ Scenario 2: Cost Function (How Wrong Are We?)
# MSE: Mean Squared Error = average of squared differences
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred)**2)
# Evaluate our fitted line on the training data
y_pred = lin.predict(temps)
print("Mean Squared Error (MSE):", mse(tea_sales, y_pred))
# Lower MSE = better fit
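For a sense of scale, it helps to compare against a naive baseline that always predicts the average number of cups. This small check reuses mse, tea_sales, and y_pred from above; the exact numbers depend on the data.

# Naive baseline: always predict the mean of the observed sales
baseline_pred = np.full_like(tea_sales, tea_sales.mean(), dtype=float)
print("Baseline MSE (always predict the mean):", mse(tea_sales, baseline_pred))
print("Fitted-line MSE:", mse(tea_sales, y_pred))
# The fitted line should score far below the baseline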
✅ Scenario 3: Gradient Descent (Learning Step-by-Step)
Explanation: Gradient descent adjusts parameters (m and c) little by little to reduce MSE. Imagine tasting a tea recipe and tweaking sugar and milk in the direction that improves taste; repeat until it's good enough.
# Manual gradient descent for y = m*x + c
X = temps.flatten()
y = tea_sales.astype(float)
m, c = 0.0, 0.0 # start with guesses
lr = 0.0005 # learning rate: step size
epochs = 5000 # number of update steps
def predictions(m, c, X):
    return m*X + c

def gradients(m, c, X, y):
    y_hat = predictions(m, c, X)
    # Partial derivatives of MSE w.r.t. m and c
    dm = (-2/len(X)) * np.sum(X * (y - y_hat))
    dc = (-2/len(X)) * np.sum(y - y_hat)
    return dm, dc

history = []  # track the cost after each update
for _ in range(epochs):
    dm, dc = gradients(m, c, X, y)
    m -= lr * dm  # move m opposite the gradient
    c -= lr * dc  # move c opposite the gradient
    history.append(mse(y, predictions(m, c, X)))
print(f"GD learned slope m={m:.3f}, intercept c={c:.3f}, final MSE={history[-1]:.2f}")
# Visualize learning: MSE should go down over epochs
plt.plot(history)
plt.xlabel("Epoch")
plt.ylabel("MSE (Cost)")
plt.title("Gradient Descent: Cost vs. Epochs")
plt.show()
Tip: If lr is too big, the loss will bounce or explode. If it's too small, learning is very slow.
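If you want to see this for yourself, the short experiment below reruns a handful of gradient descent steps with different learning rates (it reuses X, y, gradients, predictions, and mse from above; the exact values you see will vary).

# Try a too-big, a reasonable, and a too-small learning rate for 10 steps each
for test_lr in [0.01, 0.0005, 0.000001]:
    m_t, c_t = 0.0, 0.0
    for _ in range(10):
        dm_t, dc_t = gradients(m_t, c_t, X, y)
        m_t -= test_lr * dm_t
        c_t -= test_lr * dc_t
    print(f"lr={test_lr}: MSE after 10 steps = {mse(y, predictions(m_t, c_t, X)):.2f}")
# Expect roughly: 0.01 blows up, 0.0005 improves steadily, 0.000001 barely moves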
✅ Scenario 4: Overfitting (When a Model Memorizes Noise)
Explanation: Overfitting happens when the model learns not just the real pattern but also the random noise, so it looks great on training data but fails on new data. We'll build a dataset with both useful features (temperature, rain, festival) and noisy ones (traffic, dog_barks) to see this.
n = 300
# Features: some useful, some noisy
temp = np.random.uniform(5, 35, size=n) # useful
rain = np.random.binomial(1, 0.3, size=n) # somewhat useful
festival = np.random.binomial(1, 0.1, size=n) # occasionally useful
traffic = np.random.normal(0, 1, size=n) # mostly noise
dog_barks = np.random.normal(0, 1, size=n) # pure noise
# True relationship (what the world actually does)
true_sales = (120 - 2.5*temp + 10*rain + 15*festival
+ np.random.normal(0, 3, size=n)) # irreducible noise
# Feature matrix
X = np.column_stack([temp, rain, festival, traffic, dog_barks])
feature_names = ["temp", "rain", "festival", "traffic", "dog_barks"]
# Split to detect overfitting (train vs test)
X_train, X_test, y_train, y_test = train_test_split(
X, true_sales, test_size=0.25, random_state=42
)
# Plain Linear Regression (may overfit noisy features)
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
print("Linear Regression Coefficients:")
for name, coef in zip(feature_names, lr_model.coef_):
    print(f" {name:<10} -> {coef: .3f}")
print("Train MSE:", mean_squared_error(y_train, lr_model.predict(X_train)))
print("Test MSE:", mean_squared_error(y_test, lr_model.predict(X_test)))
What to look for:
- Big coefficients on obviously noisy features (e.g., dog_barks)
- Train MSE ≪ Test MSE → the model memorized training quirks
✅ Scenario 5: Fixing Overfitting (Regularization Is the Hero)
Explanation: To fight overfitting, you can:
- Remove useless features (domain knowledge)
- Gather more data (less variance)
- Add regularization (systematic, works even when noise isnât obvious)
Regularization (the concept): Add a penalty to the loss for large coefficients. This discourages complex models that chase noise.
✅ Scenario 6: Regularization (Penalty for Complexity)
Tea analogy: Tell your tea-maker: "Use too many ingredients and you lose points." The model then prefers simpler recipes that generalize better.
- Ridge (L2): Penalizes the square of the weights → smoothly shrinks them toward zero
- Lasso (L1): Penalizes the absolute value → can push some weights exactly to zero, performing feature selection (both penalties are sketched in code right after this list)
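To make the penalty idea concrete, here is a minimal sketch of the two regularized costs written with plain numpy, reusing the mse helper from Scenario 2. It is illustrative only; sklearn's actual objectives differ in small details (for example, the intercept is not penalized and the scaling conventions vary).

# w = array of model weights (coefficients), alpha = penalty strength
def ridge_cost(y_true, y_pred, w, alpha):
    return mse(y_true, y_pred) + alpha * np.sum(w**2)       # L2: squared weights

def lasso_cost(y_true, y_pred, w, alpha):
    return mse(y_true, y_pred) + alpha * np.sum(np.abs(w))  # L1: absolute weights
# Bigger weights -> bigger penalty, so the optimizer prefers smaller, simpler models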
✅ Scenario 7: Regularized Linear Regression (Ridge & Lasso)
Explanation: We'll fit both Ridge and Lasso to see how penalties change coefficients and test performance. (For real projects, consider scaling features and using cross-validation to choose alpha.)
# Ridge (L2): Shrinks coefficients smoothly
ridge = Ridge(alpha=10.0) # alpha = strength of penalty (λ)
ridge.fit(X_train, y_train)
print("\nRidge Coefficients (alpha=10):")
for name, coef in zip(feature_names, ridge.coef_):
    print(f" {name:<10} -> {coef: .3f}")
print("Ridge Train MSE:", mean_squared_error(y_train, ridge.predict(X_train)))
print("Ridge Test MSE:", mean_squared_error(y_test, ridge.predict(X_test)))
# Lasso (L1): Can set some coefficients exactly to zero
lasso = Lasso(alpha=1.0) # try 0.1, 0.5, 2.0 and compare
lasso.fit(X_train, y_train)
print("\nLasso Coefficients (alpha=1.0):")
for name, coef in zip(feature_names, lasso.coef_):
    print(f" {name:<10} -> {coef: .3f}")
print("Lasso Train MSE:", mean_squared_error(y_train, lasso.predict(X_train)))
print("Lasso Test MSE:", mean_squared_error(y_test, lasso.predict(X_test)))
Interpretation:
- Ridge should shrink noisy coefficients closer to zero
- Lasso may set truly useless features to zero (a quick check follows after this list)
- Test MSE often improves vs. plain linear regression
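A quick way to see which features Lasso dropped, reusing lasso and feature_names from above (the exact list depends on the alpha you chose):

# Features whose Lasso coefficient is exactly zero were effectively removed
dropped = [name for name, coef in zip(feature_names, lasso.coef_) if coef == 0]
print("Features Lasso zeroed out:", dropped)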
✅ Scenario 8: How Regularization Fixes Overfitting (Deep Dive)
Explanation: We'll vary alpha and visualize how coefficients shrink and how train vs. test MSE behave. Look for the alpha that minimizes test MSE; that's your sweet spot.
alphas = [0.0, 0.1, 1.0, 10.0, 50.0] # 0.0 = no regularization baseline
coef_paths_ridge = []
train_mse_ridge, test_mse_ridge = [], []
for a in alphas:
    model = LinearRegression() if a == 0.0 else Ridge(alpha=a)
    model.fit(X_train, y_train)
    coef_paths_ridge.append(model.coef_)
    train_mse_ridge.append(mean_squared_error(y_train, model.predict(X_train)))
    test_mse_ridge.append(mean_squared_error(y_test, model.predict(X_test)))
coef_paths_ridge = np.array(coef_paths_ridge)
# Coefficient shrinkage plot
plt.figure(figsize=(8, 5))
for i, name in enumerate(feature_names):
    plt.plot(alphas, coef_paths_ridge[:, i], marker="o", label=name)
plt.xscale("log")
plt.xlabel("alpha (log scale)")
plt.ylabel("Coefficient value")
plt.title("Ridge: Coefficient Shrinkage with Increasing Penalty")
plt.legend()
plt.show()
# Train vs Test MSE plot (watch for over/underfitting)
plt.figure(figsize=(8, 5))
plt.plot(alphas, train_mse_ridge, marker="o", label="Train MSE")
plt.plot(alphas, test_mse_ridge, marker="o", label="Test MSE")
plt.xscale("log")
plt.xlabel("alpha (log scale)")
plt.ylabel("MSE")
plt.title("Ridge: Train vs Test MSE Across Penalties")
plt.legend()
plt.show()
Reading the charts:
- Low alpha → big coefficients, risk of overfitting (low train MSE, higher test MSE)
- Moderate alpha → coefficients shrink, generalization improves
- Too-high alpha → model too simple (underfitting), both MSEs rise
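In code, picking that sweet spot from the sweep above could look like this (reusing alphas and test_mse_ridge):

# The alpha with the lowest test MSE is the best candidate from this sweep
best_alpha = alphas[int(np.argmin(test_mse_ridge))]
print("Alpha with the lowest test MSE:", best_alpha)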
🧠 Bonus: Simple Tea Forecast Function
Explanation: Once you've trained a good model (e.g., ridge), you can wrap it in a small function to quickly forecast cups for a given day.
def forecast_tea_cups(temp_c, rain=0, festival=0, model=ridge):
    """
    Predict tea cups for given conditions.
    We ignore traffic/dog_barks at prediction time since they were noise.
    """
    x = np.array([[temp_c, rain, festival, 0.0, 0.0]])
    return float(model.predict(x)[0])
print("Forecast for 18°C, raining, festival day:",
round(forecast_tea_cups(18, rain=1, festival=1)))
print("Forecast for 30°C, no rain, normal day:",
round(forecast_tea_cups(30, rain=0, festival=0)))
✅ Final Takeaways
- Linear Regression draws the best straight line between features and target.
- Cost Function (MSE) penalizes prediction errors, especially big ones.
- Gradient Descent iteratively improves parameters to minimize loss.
- Overfitting = learning noise; great on training, poor on new data.
- Regularization (Ridge/Lasso) shrinks weights, tames noisy features, and improves generalization.
- Choose α (lambda) carefully: too small → overfit; too large → underfit.
🎯 Practical Tips for Beginners
- Scale features (e.g., with StandardScaler) before Lasso/Ridge so alpha behaves consistently.
- Use a train/test split (and cross-validation) to choose the alpha that minimizes test error (see the sketch after this list).
- Start with Ridge for stability; try Lasso when you suspect some features are useless.
- Plot residuals (actual − predicted); random scatter = good, patterns = model mis-specification.
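As a rough sketch of the first two tips combined (scaling plus cross-validated alpha selection) using sklearn's Pipeline and RidgeCV. The alpha grid here is only an example, and it reuses the train/test split from Scenario 4.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import RidgeCV

# Scale features, then let RidgeCV pick alpha via cross-validation
scaled_ridge = make_pipeline(
    StandardScaler(),
    RidgeCV(alphas=[0.1, 1.0, 10.0, 50.0])
)
scaled_ridge.fit(X_train, y_train)
print("Chosen alpha:", scaled_ridge.named_steps["ridgecv"].alpha_)
print("Test MSE:", mean_squared_error(y_test, scaled_ridge.predict(X_test)))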