Inventory Forecasting System From Scratch


By Kelvin Kipyegon


I am currently applying for a role that involves demand forecasting and inventory planning, so I decided to build a forecasting project for my portfolio.

This post documents exactly what I did, what went wrong, how I fixed it, and what the final result looks like.


What I Set Out to Build

A system that could:

  • Take raw sales data and clean it properly
  • Forecast weekly demand for multiple products
  • Calculate safety stock and reorder points
  • Flag which products needed attention
  • Present everything in an Excel report and a live dashboard

The Data

I used the UCI Online Retail dataset — 541,909 real transactions from a wholesale operation, freely available online. It felt close enough to what a real workshop or warehouse operation would have.

First thing I did was look at what I was working with:

import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx"
df = pd.read_excel(url)

print(f"Rows: {df.shape[0]}, Columns: {df.shape[1]}")
print(df.isnull().sum())
Rows: 541909, Columns: 8

CustomerID     135080
Description      1454

About 135,000 rows had no customer ID. That is 25% of the dataset. The data was messier than I expected.


Cleaning the Data

Four things needed fixing before I could send anything downstream: missing customer IDs, cancelled invoices, non-positive quantities, and zero prices:

# Drop rows with no customer ID or product description
df_clean = df.dropna(subset=['CustomerID', 'Description'])

# Invoice numbers starting with 'C' are cancellations
df_clean = df_clean[~df_clean['InvoiceNo'].astype(str).str.startswith('C')]

# Remove returns (negative quantities) and zero-price test entries
df_clean = df_clean[df_clean['Quantity'] > 0]
df_clean = df_clean[df_clean['UnitPrice'] > 0]

df_clean['Revenue'] = df_clean['Quantity'] * df_clean['UnitPrice']

After cleaning I had 397,884 rows, down from 541,909. I lost about 26% of the data. Those rows were returns, cancellations, and test entries. Keeping them would have made every analysis after this point wrong.


Picking Products to Forecast

I filtered to the top 10 products by total quantity sold, then checked how many weeks of sales history each one had. One product — PAPER CRAFT, LITTLE BIRDIE — had 80,995 units sold in a single week and nothing else. That is not demand history, that is a one-off bulk order. I dropped it.

I ended up with 8 products, each with between 22 and 53 weeks of consistent weekly sales data.
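The selection step can be sketched roughly like this. The tiny DataFrame here is a synthetic stand-in for the cleaned dataset, and the exact grouping logic is my reconstruction, not the original code:

```python
import pandas as pd

# Tiny synthetic stand-in for df_clean (the real data is the cleaned UCI set)
df_clean = pd.DataFrame({
    'Description': ['MUG', 'MUG', 'MUG', 'LAMP', 'LAMP', 'ONE-OFF'],
    'Quantity': [10, 12, 8, 5, 6, 500],
    'InvoiceDate': pd.to_datetime(['2011-01-03', '2011-01-10', '2011-01-17',
                                   '2011-01-03', '2011-01-10', '2011-01-05']),
})

# Top products by total units sold
top = df_clean.groupby('Description')['Quantity'].sum().nlargest(10).index

# Weekly demand per product
weekly = (df_clean[df_clean['Description'].isin(top)]
          .set_index('InvoiceDate')
          .groupby('Description')['Quantity']
          .resample('W')
          .sum())

# Weeks with any sales: one huge week and nothing else is a bulk
# order, not demand history, so that product gets dropped
weeks_with_sales = (weekly > 0).groupby(level='Description').sum()
print(weeks_with_sales)
```

On the real data, this is where PAPER CRAFT, LITTLE BIRDIE falls out: all of its volume lands in a single week.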


The Forecasting Model

I started with Facebook Prophet, a time-series forecasting library. It did not work well. The reason, which I only understood after some debugging, is that Prophet needs at least two years of data to learn yearly seasonality properly. I had one year. The predictions were off by hundreds of percent on some products.

So I switched to a 4-week rolling average. Simpler, more honest about what the data could support, and easier to explain to someone who does not work in Python.

import numpy as np

# train/test: chronological 80/20 split of weekly demand, with demand in column 'y'
window = 4
history = list(train['y'].iloc[-window:].values)
predictions = []

# Forecast the test weeks plus 4 future weeks, feeding each prediction
# back into the window so it keeps rolling forward
for _ in range(len(test) + 4):
    pred = np.mean(history[-window:])
    predictions.append(pred)
    history.append(pred)

I trained on 80% of the data and tested on the remaining 20%. Accuracy was measured using RMSE — I avoided MAPE because it breaks when actual demand hits zero, which happened on a few test weeks.
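To make the metric choice concrete, here is a small sketch with hypothetical numbers. The conversion of RMSE into a percentage accuracy is my assumption (1 minus RMSE over mean demand, floored at zero); the post does not spell out the exact formula:

```python
import numpy as np

# Hypothetical actuals vs. rolling-average predictions for one product's test weeks
actual = np.array([120.0, 95.0, 0.0, 140.0])   # the zero week is why MAPE breaks
pred = np.array([110.0, 100.0, 105.0, 115.0])

rmse = np.sqrt(np.mean((actual - pred) ** 2))

# One way to express RMSE as a percentage accuracy (assumption, floored at zero)
accuracy = max(0.0, 1 - rmse / actual.mean()) * 100

# MAPE would divide by the zero-demand week and blow up:
# np.mean(np.abs((actual - pred) / actual))  -> division by zero
print(f"RMSE: {rmse:.1f}, accuracy: {accuracy:.1f}%")
```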

Results:

| Product | Avg Weekly Demand | Accuracy |
| --- | --- | --- |
| Rabbit Night Light | 2,763 units | 66.8% |
| Assorted Bird Ornament | 899 units | 52.5% |
| Jumbo Bag Red Retrospot | 1,168 units | 40.5% |
| White Hanging Heart | 699 units | 36.6% |
| Mini Paint Set Vintage | 625 units | 29.1% |
| Popcorn Holder | 1,914 units | 12.7% |
| Cake Cases | 448 units | 0.8% |
| WW2 Gliders | 1,358 units | 0% |

Average accuracy: 29.9%

Some of these are weak. The honest interpretation is that products like Cake Cases and WW2 Gliders have highly variable demand — a rolling average struggles with spikes. The right response to that is not to force a better-looking number, it is to give those products a larger safety stock buffer, which is exactly what the next step does.


Safety Stock and Reorder Points

These are standard inventory management formulas. I used a 95% service level and assumed a 1-week supplier lead time:

Safety Stock = 1.65 × Standard Deviation of Demand × √(Lead Time)

Reorder Point = Average Weekly Demand × Lead Time + Safety Stock

import numpy as np

# Example numbers for one product; the real values come from the weekly demand series
avg_demand = 900        # average weekly demand in units
std_demand = 250        # standard deviation of weekly demand
lead_time_weeks = 1     # supplier lead time
current_stock = 1100

# z = 1.65 is the service level factor for 95%
safety_stock = round(1.65 * std_demand * np.sqrt(lead_time_weeks))
rop = round(avg_demand * lead_time_weeks + safety_stock)

if current_stock <= safety_stock:
    status = 'CRITICAL'
elif current_stock <= rop:
    status = 'REORDER NOW'
else:
    status = 'OK'

Output:

| Product | Safety Stock | Reorder Point | Status |
| --- | --- | --- | --- |
| Rabbit Night Light | 2,523 | 3,760 | 🔴 CRITICAL |
| WW2 Gliders | 1,611 | 2,657 | 🟡 REORDER NOW |
| Popcorn Holder | 1,546 | 2,474 | 🟡 REORDER NOW |
| Bird Ornament | 895 | 1,562 | 🟡 REORDER NOW |
| White Hanging Heart | 903 | 1,597 | 🟡 REORDER NOW |
| Mini Paint Set | 658 | 1,150 | 🟡 REORDER NOW |
| Cake Cases | 642 | 1,278 | 🟡 REORDER NOW |
| Jumbo Bag | 791 | 1,662 | 🟢 OK |

Excel Report

I exported the full table to a color-coded Excel file using openpyxl. The idea was that a procurement team should be able to open this file and know exactly what needs ordering without any explanation.
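A minimal sketch of the color-coding step, assuming status-based fills. The colors, column positions, and sample rows here are my illustrative choices, not the original export code:

```python
import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import PatternFill

# Hypothetical slice of the final table
report = pd.DataFrame({
    'Product': ['Rabbit Night Light', 'Jumbo Bag'],
    'Status': ['CRITICAL', 'OK'],
})

# One fill per status level (hex colors are arbitrary choices)
fills = {
    'CRITICAL':    PatternFill('solid', start_color='FFC7CE'),  # red
    'REORDER NOW': PatternFill('solid', start_color='FFEB9C'),  # amber
    'OK':          PatternFill('solid', start_color='C6EFCE'),  # green
}

report.to_excel('reorder_report.xlsx', index=False)

# Re-open the file and color each Status cell by its value
wb = load_workbook('reorder_report.xlsx')
ws = wb.active
for row in range(2, ws.max_row + 1):      # row 1 is the header
    cell = ws.cell(row=row, column=2)     # Status is column B
    cell.fill = fills[cell.value]
wb.save('reorder_report.xlsx')
```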


The Dashboard

Everything feeds into a live Streamlit dashboard with three pages:

  • Stock Overview — headline numbers and charts comparing current stock against reorder points
  • Demand Forecast — select a product, see its history and 4-week forecast
  • Reorder Alerts — the full status table with recommended order quantities
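The post does not spell out how the recommended order quantities on the alerts page are computed, so here is one common convention as a sketch: order enough to cover the 4-week forecast plus safety stock, minus what is already on hand. The function name and numbers are hypothetical:

```python
def recommended_order_qty(current_stock, safety_stock, forecast_4wk_total):
    """Order up to (safety stock + 4-week forecast); never a negative quantity."""
    target = safety_stock + forecast_4wk_total
    return max(0, target - current_stock)

# Hypothetical numbers for one product
print(recommended_order_qty(current_stock=1200, safety_stock=903,
                            forecast_4wk_total=2800))
```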

Live dashboard: https://forecasting-vv9qmj2kmug42tpct3jqv8.streamlit.app/


What I Took Away From This

A few honest reflections:

The data cleaning took longer than the modelling. That was not what I expected going in, but it makes sense. Garbage in, garbage out — and in a real operation, the stock records are often the messiest part.

Choosing the simpler model was the right call. There is a temptation to use the most sophisticated tool available. Sometimes that is wrong.

The low-accuracy products were the most interesting finding. If I had just reported average accuracy and moved on, I would have missed the point. The variability is the insight — those products need different handling, not a better model.

Next, I plan to automate the whole flow into a proper ETL pipeline.


If you are working through something similar or have feedback on the approach, I would be glad to hear it.

— Kelvin Kipyegon
