# How I Built a Retail Demand Forecasting App with Python and Streamlit

By Okparaji Wisdom | Data Scientist | Nigeria


Retailers in Nigeria lose millions of naira every year to two problems: stockouts (shelves go empty, customers leave) and overstock (too much inventory, capital tied up, goods expire). Both are avoidable with data.

So I built DemandForecast AI — a machine learning–powered app that predicts weekly product demand up to 26 weeks ahead, across 20 products in 4 retail categories.

In this article I'll walk you through exactly how I built it, the technical decisions I made, and what I learned.


What the App Does

  • Forecasts weekly demand for 20 retail products (Electronics, Fashion, Food & Grocery, Home & Living)
  • Supports forecast horizons from 4 to 26 weeks
  • Models Nigerian festivity demand spikes (December, Easter, New Year)
  • Analyses the impact of promotions on demand lift
  • Displays confidence bands on every forecast
  • Shows model performance metrics (MAPE, MAE, RMSE) for all 20 models

Live app: https://demandforecast-ai-78egnrsv5ijehv4sayrduu.streamlit.app/

GitHub: github.com/Santandave961/demandforecast-ai


The Dataset

I generated a synthetic retail dataset of 3,140 weekly records spanning January 2022 to December 2024, covering 20 products across 4 categories.

Each record contains:

{
    "date": "2022-01-02",
    "category": "Food & Grocery",
    "product": "Rice (5kg)",
    "units_sold": 412,
    "price_naira": 18500.00,
    "promotion": 0,
    "month": 1,
    "week_of_year": 1,
    "year": 2022,
    "quarter": 1
}

The demand values were generated with realistic business logic baked in — trend, seasonality, and Nigerian festivity boosts:

prob = (
    base_demand * (1 + trend * i + seasonal + festivity_boost)
    + np.random.normal(0, base_demand * 0.08)
)

Nigerian festivity boosts applied:

  • December → +35% (Christmas & New Year)
  • January → +20% (New Year spending)
  • April → +15% (Easter)
  • November → +10% (pre-Christmas buildup)

Promotions randomly fire 15% of the time and boost demand by 25% while cutting price by 15% — simulating real promotional mechanics.
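To make the generation logic concrete, here is a minimal sketch of how one product's series could be simulated. The function name `simulate_product`, the trend value, the sine-wave seasonal term, and the base demand and price are illustrative assumptions; only the festivity boosts, the 8% noise, and the promotion mechanics come from the description above.

```python
import numpy as np
import pandas as pd

# Festivity boosts by calendar month, taken from the list above.
FESTIVITY_BOOST = {12: 0.35, 1: 0.20, 4: 0.15, 11: 0.10}

def simulate_product(base_demand, base_price, trend=0.001, seed=0):
    """Simulate weekly demand for one product (illustrative values only)."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2022-01-02", "2024-12-29", freq="W-SUN")
    rows = []
    for i, date in enumerate(dates):
        # Assumed seasonal term: a small annual sine wave.
        seasonal = 0.05 * np.sin(2 * np.pi * date.dayofyear / 365.25)
        festivity_boost = FESTIVITY_BOOST.get(date.month, 0.0)
        demand = (
            base_demand * (1 + trend * i + seasonal + festivity_boost)
            + rng.normal(0, base_demand * 0.08)
        )
        promotion = int(rng.random() < 0.15)  # promo fires in about 15% of weeks
        price = base_price
        if promotion:
            demand *= 1.25  # 25% demand lift
            price *= 0.85   # 15% price cut
        rows.append({
            "date": date,
            "units_sold": max(int(round(demand)), 0),  # demand can't be negative
            "price_naira": round(price, 2),
            "promotion": promotion,
        })
    return pd.DataFrame(rows)

sample = simulate_product(base_demand=400, base_price=18500.0)
```

Running this for 20 products and concatenating the frames gives 157 weekly rows per product, which matches the 3,140 records mentioned earlier.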


Feature Engineering

Raw dates aren't useful to ML models. I converted them into numerical features, using sine and cosine (Fourier-style) terms to capture seasonality:

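import numpy as np

# days since the first record: lets the model fit a long-term trend
df["time_index"] = (df["date"] - df["date"].min()).dt.days
# cyclical encodings: map week and month onto a circle
df["sin_week"]   = np.sin(2 * np.pi * df["week_of_year"] / 52)
df["cos_week"]   = np.cos(2 * np.pi * df["week_of_year"] / 52)
df["sin_month"]  = np.sin(2 * np.pi * df["month"] / 12)
df["cos_month"]  = np.cos(2 * np.pi * df["month"] / 12)
# flag for the festive fourth quarter
df["is_q4"]      = (df["quarter"] == 4).astype(int)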

Why Fourier features?

A raw month column tells the model January = 1 and December = 12, but doesn't tell it they're actually close together in seasonal behaviour. Sine and cosine transforms encode the circular nature of time — so the model understands that week 52 and week 1 are neighbours, not opposites.
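A quick numerical check makes the point. This is a small sketch with hypothetical helper names (`encode_week`, `cyclical_distance`) that compares how far apart two weeks sit once they are encoded as sine and cosine pairs:

```python
import numpy as np

def encode_week(week, period=52):
    """Map a week number onto the unit circle as (sin, cos)."""
    angle = 2 * np.pi * week / period
    return np.array([np.sin(angle), np.cos(angle)])

def cyclical_distance(week_a, week_b):
    """Euclidean distance between two weeks in sin/cos space."""
    return float(np.linalg.norm(encode_week(week_a) - encode_week(week_b)))

# Raw week numbers put week 52 and week 1 far apart (|52 - 1| = 51),
# while the encoded points sit next to each other on the circle.
near = cyclical_distance(52, 1)  # adjacent weeks across the year boundary
far = cyclical_distance(26, 52)  # half a year apart
```

In the encoded space, week 52 to week 1 is exactly as far as week 1 to week 2, while weeks half a year apart sit on opposite sides of the circle.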

The full feature set:

feature_cols = [
    "time_index",    # days since first record: long-term trend
    "sin_week",      # annual cycle at weekly resolution
    "cos_week",
    "sin_month",     # annual cycle at monthly resolution
    "cos_month",
    "is_q4",         # Q4 festivity flag
    "promotion",     # promo indicator
    "price_naira"    # price level, lets the model learn price sensitivity
]

The Model

I trained a separate Linear Regression model for each of the 20 products. Each model learns the trend, seasonality pattern, and price/promo sensitivity specific to that product.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

# X_train / X_test hold the feature_cols columns; y_train / y_test hold units_sold
model = LinearRegression()
model.fit(X_train, y_train)
preds = np.clip(model.predict(X_test), 0, None)  # demand can't be negative
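The snippet above takes X_train, y_train, and X_test as given. One way to produce them per product is a chronological split that holds out the final weeks, sketched below. The helper name `fit_per_product` and the toy data are assumptions for illustration, not the app's actual code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def fit_per_product(df, feature_cols, target="units_sold", test_weeks=12):
    """Fit one LinearRegression per product with a chronological hold-out."""
    results = {}
    for product, group in df.groupby("product"):
        group = group.sort_values("date")
        # Never shuffle time series: the last weeks form the test set.
        train, test = group.iloc[:-test_weeks], group.iloc[-test_weeks:]
        model = LinearRegression().fit(train[feature_cols], train[target])
        preds = np.clip(model.predict(test[feature_cols]), 0, None)
        results[product] = {"model": model, "y_test": test[target], "preds": preds}
    return results

# Toy data: two products with perfectly linear demand over 30 weeks.
toy = pd.DataFrame({
    "date": list(pd.date_range("2022-01-02", periods=30, freq="W-SUN")) * 2,
    "product": ["A"] * 30 + ["B"] * 30,
    "time_index": list(range(0, 210, 7)) * 2,
    "units_sold": [100 + 2 * i for i in range(30)] + [50 + i for i in range(30)],
})
results = fit_per_product(toy, feature_cols=["time_index"])
```

Splitting by position after sorting by date keeps future weeks out of the training data, which a random split would not guarantee.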

Why not XGBoost or Prophet?

I specifically chose Linear Regression + Fourier features for the Streamlit Cloud deployment because:

  1. Few dependencies: scikit-learn is lightweight and installs cleanly from requirements.txt
  2. Fast training — all 20 models train in under a second on app startup
  3. Fourier features do the heavy lifting for seasonality, so a linear model performs well
  4. XGBoost fails silently on some Streamlit Cloud Python versions

In a production system I would use Prophet or XGBoost with lag features for higher accuracy.


Model Performance

Evaluation on the last 12 weeks (held-out test set) per product:

| Metric   | Value     |
|----------|-----------|
| Avg MAPE | ~9.5%     |
| Avg MAE  | ~28 units |
| Avg RMSE | ~35 units |

MAPE (Mean Absolute Percentage Error) below 10% is generally considered good for retail demand forecasting.

mae  = mean_absolute_error(y_test, preds)
rmse = np.sqrt(mean_squared_error(y_test, preds))
mape = np.mean(np.abs((y_test.values - preds) / (y_test.values + 1))) * 100

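Note: I add 1 to the denominator to avoid division by zero on weeks with zero demand. This makes the figure a slightly modified MAPE: it reads marginally lower than the standard definition, though the difference is negligible when weekly demand is in the hundreds.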
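The three metrics can be wrapped in one helper. This is a minimal sketch; the function name `forecast_metrics` and the sample numbers are illustrative assumptions rather than values from the app.

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """Return MAE, RMSE, and the +1-denominator MAPE used in this article."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    return {
        "mae": float(np.mean(np.abs(errors))),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        # Adding 1 keeps the ratio finite when actual demand is zero.
        "mape_plus_one": float(np.mean(np.abs(errors) / (y_true + 1)) * 100),
    }

metrics = forecast_metrics([99, 199], [109, 179])
```

For these two weeks the modified MAPE comes out at exactly 10%, whereas the standard formula (dividing by 99 and 199) would give a touch more than 10%.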


Forecasting Future Demand

For future periods, I generate the feature rows synthetically — extending the time index forward and computing future Fourier values from the future dates:

import numpy as np
import pandas as pd

def make_future_features(last_date, last_time_idx, periods, avg_price, promo_rate):
    """Build one feature row per future week, mirroring the training features."""
    rows = []
    for i in range(1, periods + 1):
        future_date = last_date + pd.Timedelta(weeks=i)
        week  = future_date.isocalendar()[1]  # ISO week number (1-53)
        month = future_date.month
        rows.append({
            "date":       future_date,
            "time_index": last_time_idx + i * 7,  # time_index is measured in days
            "sin_week":   np.sin(2 * np.pi * week / 52),
            "cos_week":   np.cos(2 * np.pi * week / 52),
            "sin_month":  np.sin(2 * np.pi * month / 12),
            "cos_month":  np.cos(2 * np.pi * month / 12),
            "is_q4":      int(((month - 1) // 3 + 1) == 4),
            # promotions and prices are simulated, so forecasts vary between
            # runs unless NumPy's random seed is fixed
            "promotion":  1 if np.random.rand() < promo_rate else 0,
            "price_naira": avg_price * np.random.uniform(0.95, 1.05),
        })
    return pd.DataFrame(rows)

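Confidence bands are approximated as ±12% around the point forecast. This is a fixed heuristic rather than a statistical interval: the width does not come from the model's residuals, but it is a simple and visually useful representation of uncertainty.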
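A fixed-width band like this takes only a few lines. The sketch below assumes a hypothetical helper `add_fixed_bands` and made-up forecast values:

```python
import numpy as np
import pandas as pd

def add_fixed_bands(forecast, width=0.12):
    """Attach a fixed +/- width band around each point forecast."""
    forecast = np.clip(np.asarray(forecast, dtype=float), 0, None)
    return pd.DataFrame({
        "forecast": forecast,
        "lower": forecast * (1 - width),
        "upper": forecast * (1 + width),
    })

bands = add_fixed_bands([400, 450, 520])
```

Because the width is a constant fraction of the forecast, the band does not widen for horizons further out, which a residual-based interval typically would.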


The Streamlit App

The app has 5 pages:

  1. Forecast — select product, horizon, promo rate → get forecast chart + table
  2. Model Performance — MAPE and RMSE charts for all 20 models
  3. Trend Explorer — historical demand lines + monthly seasonality heatmap
  4. Insights — promo impact analysis + Nigerian festivity calendar
  5. About — project details and links

One important Streamlit trick I used is @st.cache_resource, which trains all 20 models once at startup and reuses them across reruns and sessions:

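import streamlit as st
from sklearn.linear_model import LinearRegression

@st.cache_resource  # cached once and shared across reruns and sessions
def train_all_models(df):
    models, metrics = {}, {}
    for product, group in df.groupby("product"):
        # train and store each model (simplified: full history, metrics omitted)
        model = LinearRegression().fit(group[feature_cols], group["units_sold"])
        models[product] = model
    return models, metrics, feature_cols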

Without this, the app would retrain all 20 models on every user interaction, because Streamlit reruns the whole script each time a widget changes.


Deployment

Deployed on Streamlit Community Cloud in 3 steps:

  1. Push to GitHub
  2. Connect repo at share.streamlit.io
  3. Add runtime.txt containing 3.11 to pin Python version

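The runtime.txt file is critical: without it, Streamlit Cloud may use a newer Python version (3.14+) that some dependencies do not yet support, and they can break silently. Streamlit Community Cloud also lets you choose the Python version under Advanced settings when you deploy.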


What I'd Improve in v2

  • Replace Linear Regression with Prophet for better seasonality decomposition
  • Add lag features (demand from last week, last month) for autocorrelation
  • Connect to a real retail database (SQLite or PostgreSQL)
  • Add inventory optimisation — recommend reorder points based on forecasts
  • Deploy as a FastAPI backend with a Streamlit frontend
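As an illustration of the lag-feature idea from the list above, here is a minimal sketch using pandas. The helper name `add_lag_features`, the choice of 1-week and 4-week lags, and the toy data are assumptions for the example.

```python
import pandas as pd

def add_lag_features(df, lags=(1, 4), target="units_sold"):
    """Add per-product lagged demand columns and drop rows without history."""
    df = df.sort_values(["product", "date"]).copy()
    for lag in lags:
        # shift within each product so lags never leak across products
        df[f"lag_{lag}"] = df.groupby("product")[target].shift(lag)
    return df.dropna(subset=[f"lag_{lag}" for lag in lags])

toy = pd.DataFrame({
    "date": list(pd.date_range("2022-01-02", periods=6, freq="W-SUN")) * 2,
    "product": ["A"] * 6 + ["B"] * 6,
    "units_sold": [10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 25],
})
lagged = add_lag_features(toy)
```

One design consequence: with lag features, forecasting more than one week ahead means feeding each prediction back in as the next week's lag, since the true values are not yet known.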

Key Takeaways

  • Fourier features are a powerful, lightweight way to encode seasonality without needing Prophet
  • Training one model per SKU can beat training one global model when products have very different demand patterns
  • @st.cache_resource is essential for any Streamlit app that trains models at startup
  • Nigerian retail has strong festivity-driven seasonality that generic models miss — localisation matters

Connect

If you found this useful or want to collaborate on data science projects in the Nigerian tech space, connect with me.


Tags: #python #machinelearning #datascience #streamlit #nigeria #retailtech #beginners #tutorial
