Okparaji Wisdom

Posted on May 25

# How I Built a Retail Demand Forecasting App with Python and Streamlit

#datascience #machinelearning #python #showdev

By Okparaji Wisdom | Data Scientist | Nigeria

Retailers in Nigeria lose millions of naira every year to two problems: stockouts (shelves go empty, customers leave) and overstock (too much inventory, capital tied up, goods expire). Both are avoidable with data.

So I built DemandForecast AI — a machine learning–powered app that predicts weekly product demand up to 26 weeks ahead, across 20 products in 4 retail categories.

In this article I'll walk you through exactly how I built it, the technical decisions I made, and what I learned.

What the App Does

Forecasts weekly demand for 20 retail products (Electronics, Fashion, Food & Grocery, Home & Living)
Supports forecast horizons from 4 to 26 weeks
Models Nigerian festivity demand spikes (December, Easter, New Year)
Analyses the impact of promotions on demand lift
Displays confidence bands on every forecast
Shows model performance metrics (MAPE, MAE, RMSE) for all 20 models

Live app: https://demandforecast-ai-78egnrsv5ijehv4sayrduu.streamlit.app/

GitHub: github.com/Santandave961/demandforecast-ai

The Dataset

I generated a synthetic retail dataset of 3,140 weekly records spanning January 2022 to December 2024, covering 20 products across 4 categories.

Each record contains:

{
    "date": "2022-01-02",
    "category": "Food & Grocery",
    "product": "Rice (5kg)",
    "units_sold": 412,
    "price_naira": 18500.00,
    "promotion": 0,
    "month": 1,
    "week_of_year": 1,
    "year": 2022,
    "quarter": 1
}

The demand values were generated with realistic business logic baked in — trend, seasonality, and Nigerian festivity boosts:

prob = (
    base_demand * (1 + trend * i + seasonal + festivity_boost)
    + np.random.normal(0, base_demand * 0.08)
)

Nigerian festivity boosts applied:

December → +35% (Christmas & New Year)
January → +20% (New Year spending)
April → +15% (Easter)
November → +10% (pre-Christmas buildup)

Promotions randomly fire 15% of the time and boost demand by 25% while cutting price by 15% — simulating real promotional mechanics.
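A minimal sketch of how one product's series could be generated under the rules above. The function name `simulate_product`, the base demand, base price, trend rate and seasonal amplitude are illustrative assumptions rather than the values used in the app, and applying the promo lift after the noise term is also an assumption.

```python
import numpy as np
import pandas as pd

# Festivity boosts by calendar month, as listed above.
FESTIVITY_BOOST = {12: 0.35, 1: 0.20, 4: 0.15, 11: 0.10}

def simulate_product(base_demand=400, base_price=18500.0, trend=0.002,
                     weeks=157, seed=0):
    """Generate weekly demand for one product (all defaults are assumed)."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2022-01-02", periods=weeks, freq="W-SUN")
    rows = []
    for i, date in enumerate(dates):
        # Assumed seasonal term: a 10% yearly sine wave on the ISO week number.
        seasonal = 0.10 * np.sin(2 * np.pi * date.isocalendar()[1] / 52)
        festivity_boost = FESTIVITY_BOOST.get(date.month, 0.0)
        demand = (
            base_demand * (1 + trend * i + seasonal + festivity_boost)
            + rng.normal(0, base_demand * 0.08)
        )
        promotion = int(rng.random() < 0.15)  # promos fire 15% of the time
        price = base_price
        if promotion:
            demand *= 1.25  # 25% demand lift
            price *= 0.85   # 15% price cut
        rows.append({
            "date": date,
            "units_sold": max(int(round(demand)), 0),
            "price_naira": round(price, 2),
            "promotion": promotion,
        })
    return pd.DataFrame(rows)

sample = simulate_product()
```

With 157 weekly rows per product, 20 products give the 3,140 records mentioned earlier.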

Feature Engineering

Raw dates aren't useful to ML models. I converted them into numerical features, using sine and cosine (Fourier) terms to capture seasonality:

df["time_index"] = (df["date"] - df["date"].min()).dt.days
df["sin_week"]   = np.sin(2 * np.pi * df["week_of_year"] / 52)
df["cos_week"]   = np.cos(2 * np.pi * df["week_of_year"] / 52)
df["sin_month"]  = np.sin(2 * np.pi * df["month"] / 12)
df["cos_month"]  = np.cos(2 * np.pi * df["month"] / 12)
df["is_q4"]      = (df["quarter"] == 4).astype(int)

Why Fourier features?

A raw month column tells the model January = 1 and December = 12, but doesn't tell it they're actually close together in seasonal behaviour. Sine and cosine transforms encode the circular nature of time — so the model understands that week 52 and week 1 are neighbours, not opposites.
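To make the "neighbours, not opposites" point concrete, here is a small sketch that measures how far apart two weeks sit once they are encoded as (sin, cos) pairs. The helper names `encode_week` and `circular_distance` are hypothetical and only used for this illustration.

```python
import numpy as np

def encode_week(week, period=52):
    """Map a week number onto the unit circle as (sin, cos)."""
    angle = 2 * np.pi * week / period
    return np.array([np.sin(angle), np.cos(angle)])

def circular_distance(week_a, week_b):
    """Euclidean distance between two weeks in (sin, cos) space."""
    return float(np.linalg.norm(encode_week(week_a) - encode_week(week_b)))

dist_adjacent = circular_distance(52, 1)   # across the year boundary
dist_opposite = circular_distance(52, 26)  # half a year apart
```

As raw numbers, weeks 52 and 1 are 51 apart. In the encoded space they are about 0.12 apart, the same as weeks 1 and 2, while weeks 52 and 26 sit at the maximum distance of 2.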

The full feature set:

feature_cols = [
    "time_index",    # days since the first record; captures long-term trend
    "sin_week",      # yearly cycle at weekly resolution
    "cos_week",
    "sin_month",     # yearly cycle at monthly resolution
    "cos_month",
    "is_q4",         # Q4 festivity flag
    "promotion",     # promo indicator
    "price_naira"    # price, so the model can learn price sensitivity
]

The Model

I trained a separate Linear Regression model for each of the 20 products. Each model learns the trend, seasonality pattern, and price/promo sensitivity specific to that product.

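import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

# product_df (name assumed here): one product's rows, sorted by date.
# The last 12 weeks are held out for evaluation.
train, test = product_df.iloc[:-12], product_df.iloc[-12:]
X_train, y_train = train[feature_cols], train["units_sold"]
X_test, y_test = test[feature_cols], test["units_sold"]

model = LinearRegression()
model.fit(X_train, y_train)
preds = np.clip(model.predict(X_test), 0, None)  # demand can't be negative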

Why not XGBoost or Prophet?

I specifically chose Linear Regression + Fourier features for the Streamlit Cloud deployment because:

Minimal dependencies: scikit-learn is the only modelling library the app needs
Fast training — all 20 models train in under a second on app startup
Fourier features do the heavy lifting for seasonality, so a linear model performs well
In my experience, XGBoost can fail without a clear error on some Streamlit Cloud Python versions

In a production system I would use Prophet or XGBoost with lag features for higher accuracy.

Model Performance

Evaluation on the last 12 weeks (held-out test set) per product:

| Metric | Value |
|---|---|
| Avg MAPE | ~9.5% |
| Avg MAE | ~28 units |
| Avg RMSE | ~35 units |

MAPE (Mean Absolute Percentage Error) below 10% is generally considered good for retail demand forecasting.

mae  = mean_absolute_error(y_test, preds)
rmse = np.sqrt(mean_squared_error(y_test, preds))
mape = np.mean(np.abs((y_test.values - preds) / (y_test.values + 1))) * 100

Note: I add 1 to the denominator to avoid division by zero on weeks with zero demand. This slightly understates the percentage error, most noticeably for low-volume products.
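The three metrics can also be written in plain NumPy, which makes the effect of the `+ 1` in the denominator easy to see. This is a sketch with made-up numbers; the function name `evaluate` and both sample series are assumptions for illustration, not results from the app.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return MAE, RMSE and the +1-adjusted MAPE used in this article."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    return {
        "mae": float(np.mean(np.abs(errors))),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        # Adding 1 keeps the ratio finite when actual demand is zero.
        "mape_plus_one": float(np.mean(np.abs(errors) / (y_true + 1)) * 100),
    }

high_volume = evaluate([400, 420, 380], [380, 441, 399])
low_volume = evaluate([4, 0, 2], [3, 1, 3])
```

For demand in the hundreds, the extra 1 barely moves the percentage. For single-digit demand it changes the denominator substantially, so the adjusted MAPE is best read alongside MAE for slow-moving products.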

Forecasting Future Demand

For future periods, I generate the feature rows synthetically — extending the time index forward and computing future Fourier values from the future dates:

def make_future_features(last_date, last_time_idx, periods, avg_price, promo_rate):
    rows = []
    for i in range(1, periods + 1):
        future_date = last_date + pd.Timedelta(weeks=i)
        week  = future_date.isocalendar()[1]
        month = future_date.month
        rows.append({
            "date":       future_date,
            "time_index": last_time_idx + i * 7,
            "sin_week":   np.sin(2 * np.pi * week / 52),
            "cos_week":   np.cos(2 * np.pi * week / 52),
            "sin_month":  np.sin(2 * np.pi * month / 12),
            "cos_month":  np.cos(2 * np.pi * month / 12),
            "is_q4":      int(((month - 1) // 3 + 1) == 4),
            "promotion":  1 if np.random.rand() < promo_rate else 0,
            "price_naira": avg_price * np.random.uniform(0.95, 1.05),
        })
    return pd.DataFrame(rows)

The bands shown on each forecast are approximated as ±12% around the point forecast. This is a fixed heuristic rather than a statistical confidence interval, but it gives a simple, visually useful sense of uncertainty.
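A minimal sketch of the ±12% band calculation, assuming the point forecasts are already available as a list or array. The function name `add_bands`, the `width` parameter and the sample values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def add_bands(point_forecast, width=0.12):
    """Attach fixed-percentage lower and upper bands to point forecasts."""
    forecast = np.clip(np.asarray(point_forecast, dtype=float), 0, None)
    return pd.DataFrame({
        "forecast": forecast,
        "lower": forecast * (1 - width),
        "upper": forecast * (1 + width),
    })

bands = add_bands([400.0, 450.0, 380.0])
```

Because the width is a fixed percentage, the band does not widen for longer horizons or noisier products. A band derived from the spread of the held-out residuals would be one way to reflect that.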

The Streamlit App

The app has 5 pages:

Forecast — select product, horizon, promo rate → get forecast chart + table
Model Performance — MAPE and RMSE charts for all 20 models
Trend Explorer — historical demand lines + monthly seasonality heatmap
Insights — promo impact analysis + Nigerian festivity calendar
About — project details and links

One important Streamlit trick I used — @st.cache_resource to train all 20 models once at startup and reuse them across sessions:

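@st.cache_resource
def train_all_models(df):
    models, metrics = {}, {}
    for product in df["product"].unique():
        # Train one model per product, holding out the last 12 weeks
        product_df = df[df["product"] == product].sort_values("date")
        train, test = product_df.iloc[:-12], product_df.iloc[-12:]
        model = LinearRegression().fit(train[feature_cols], train["units_sold"])
        preds = np.clip(model.predict(test[feature_cols]), 0, None)
        models[product] = model
        metrics[product] = {"mae": mean_absolute_error(test["units_sold"], preds)}
    return models, metrics, feature_cols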

Without this, the app would retrain 20 models on every user interaction — very slow.

Deployment

Deployed on Streamlit Community Cloud in 3 steps:

Push to GitHub
Connect repo at share.streamlit.io
Add runtime.txt containing 3.11 to pin Python version

The runtime.txt file is critical — without it Streamlit Cloud may use Python 3.14+ which breaks some dependencies silently.

What I'd Improve in v2

Replace Linear Regression with Prophet for better seasonality decomposition
Add lag features (demand from last week, last month) for autocorrelation
Connect to a real retail database (SQLite or PostgreSQL)
Add inventory optimisation — recommend reorder points based on forecasts
Deploy as a FastAPI backend with a Streamlit frontend
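As a starting point for the lag-feature idea in this list, here is a sketch using pandas `groupby` and `shift`. The function name `add_lag_features`, the choice of lags (1 and 4 weeks) and the demo values are assumptions, not part of the current app.

```python
import pandas as pd

def add_lag_features(df, lags=(1, 4)):
    """Add per-product lagged demand columns (lag_1 = last week, lag_4 = about a month ago)."""
    out = df.sort_values(["product", "date"]).copy()
    grouped = out.groupby("product")["units_sold"]
    for lag in lags:
        # shift() runs within each product, so lags never leak across products.
        out[f"lag_{lag}"] = grouped.shift(lag)
    # The earliest rows of each product have no history, so drop them.
    lag_cols = [f"lag_{lag}" for lag in lags]
    return out.dropna(subset=lag_cols).reset_index(drop=True)

demo = pd.DataFrame({
    "product": ["Rice (5kg)"] * 6,
    "date": pd.date_range("2022-01-02", periods=6, freq="W-SUN"),
    "units_sold": [412, 398, 405, 430, 441, 420],
})
lagged = add_lag_features(demo)
```

One design consequence: once lags are features, forecasting several weeks ahead needs either recursive prediction (feeding each forecast back in as the next lag) or lags at least as long as the horizon.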

Key Takeaways

Fourier features are a powerful, lightweight way to encode seasonality without needing Prophet
Training one model per SKU can work better than one global model when products have very different demand patterns
@st.cache_resource is essential for any Streamlit app that trains models at startup
Nigerian retail has strong festivity-driven seasonality that generic models miss — localisation matters

Connect

If you found this useful or want to collaborate on data science projects in the Nigerian tech space, connect with me.

Tags: #python #machinelearning #datascience #streamlit #nigeria #retailtech #beginners #tutorial
