Inventory Forecasting System From Scratch


By Kelvin Kipyegon


I am currently applying for a role that involves demand forecasting and inventory planning, so I decided to build a forecasting project for my portfolio.

This post documents exactly what I did, what went wrong, how I fixed it, and what the final result looks like.


What I Set Out to Build

A system that could:

  • Take raw sales data and clean it properly
  • Forecast weekly demand for multiple products
  • Calculate safety stock and reorder points
  • Flag which products needed attention
  • Present everything in an Excel report and a live dashboard

The Data

I used the UCI Online Retail dataset — 541,909 real transactions from a wholesale operation, freely available online. It felt close enough to what a real workshop or warehouse operation would have.

First thing I did was look at what I was working with:

import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx"
df = pd.read_excel(url)

print(f"Rows: {df.shape[0]}, Columns: {df.shape[1]}")
print(df.isnull().sum())
Rows: 541909, Columns: 8

CustomerID     135080
Description      1454

About 135,000 rows had no customer ID. That is 25% of the dataset. The data was messier than I expected.


Cleaning the Data

Four things needed fixing before I could send anything downstream: missing customer IDs, cancelled invoices, non-positive quantities, and zero prices:

# Drop rows with no customer ID or product description
df_clean = df.dropna(subset=['CustomerID', 'Description'])

# Invoice numbers starting with 'C' are cancellations
df_clean = df_clean[~df_clean['InvoiceNo'].astype(str).str.startswith('C')]

# Remove returns (negative quantities) and zero-price test entries
df_clean = df_clean[df_clean['Quantity'] > 0]
df_clean = df_clean[df_clean['UnitPrice'] > 0]

df_clean['Revenue'] = df_clean['Quantity'] * df_clean['UnitPrice']

After cleaning I had 397,884 rows, down from 541,909. I lost about 26% of the data. Those rows were returns, cancellations, and test entries. Keeping them would have made every analysis after this point wrong.


Picking Products to Forecast

I filtered to the top 10 products by total quantity sold, then checked how many weeks of sales history each one had. One product — PAPER CRAFT, LITTLE BIRDIE — had 80,995 units sold in a single week and nothing else. That is not demand history, that is a one-off bulk order. I dropped it.

I ended up with 8 products, each with between 22 and 53 weeks of consistent weekly sales data.
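The selection step can be sketched roughly like this. The tiny DataFrame here is a synthetic stand-in for the cleaned dataset, and the exact grouping logic is my reconstruction, not the original code:

```python
import pandas as pd

# Tiny synthetic stand-in for df_clean (the real data is the cleaned UCI set)
df_clean = pd.DataFrame({
    'Description': ['MUG', 'MUG', 'MUG', 'LAMP', 'LAMP', 'ONE-OFF'],
    'Quantity': [10, 12, 8, 5, 6, 500],
    'InvoiceDate': pd.to_datetime(['2011-01-03', '2011-01-10', '2011-01-17',
                                   '2011-01-03', '2011-01-10', '2011-01-05']),
})

# Top products by total units sold
top = df_clean.groupby('Description')['Quantity'].sum().nlargest(10).index

# Weekly demand per product
weekly = (df_clean[df_clean['Description'].isin(top)]
          .set_index('InvoiceDate')
          .groupby('Description')['Quantity']
          .resample('W')
          .sum())

# Weeks with any sales: one huge week and nothing else is a bulk
# order, not demand history, so that product gets dropped
weeks_with_sales = (weekly > 0).groupby(level='Description').sum()
print(weeks_with_sales)
```

On the real data, this is where PAPER CRAFT, LITTLE BIRDIE falls out: all of its volume lands in a single week.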


The Forecasting Model

I started with Facebook Prophet, a time-series forecasting library. It did not work well. The reason, which I only understood after some debugging, is that Prophet needs at least two years of data to learn yearly seasonality properly. I had one year. The predictions were off by hundreds of percent on some products.

So I switched to a 4-week rolling average. Simpler, more honest about what the data could support, and easier to explain to someone who does not work in Python.

import numpy as np

# train/test: chronological 80/20 split of weekly demand, with demand in column 'y'
window = 4
history = list(train['y'].iloc[-window:].values)
predictions = []

# Forecast the test weeks plus 4 future weeks, feeding each prediction
# back into the window so it keeps rolling forward
for _ in range(len(test) + 4):
    pred = np.mean(history[-window:])
    predictions.append(pred)
    history.append(pred)

I trained on 80% of the data and tested on the remaining 20%. Accuracy was measured using RMSE — I avoided MAPE because it breaks when actual demand hits zero, which happened on a few test weeks.
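To make the metric choice concrete, here is a small sketch with hypothetical numbers. The conversion of RMSE into a percentage accuracy is my assumption (1 minus RMSE over mean demand, floored at zero); the post does not spell out the exact formula:

```python
import numpy as np

# Hypothetical actuals vs. rolling-average predictions for one product's test weeks
actual = np.array([120.0, 95.0, 0.0, 140.0])   # the zero week is why MAPE breaks
pred = np.array([110.0, 100.0, 105.0, 115.0])

rmse = np.sqrt(np.mean((actual - pred) ** 2))

# One way to express RMSE as a percentage accuracy (assumption, floored at zero)
accuracy = max(0.0, 1 - rmse / actual.mean()) * 100

# MAPE would divide by the zero-demand week and blow up:
# np.mean(np.abs((actual - pred) / actual))  -> division by zero
print(f"RMSE: {rmse:.1f}, accuracy: {accuracy:.1f}%")
```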

Results:

| Product | Avg Weekly Demand | Accuracy |
| --- | --- | --- |
| Rabbit Night Light | 2,763 units | 66.8% |
| Assorted Bird Ornament | 899 units | 52.5% |
| Jumbo Bag Red Retrospot | 1,168 units | 40.5% |
| White Hanging Heart | 699 units | 36.6% |
| Mini Paint Set Vintage | 625 units | 29.1% |
| Popcorn Holder | 1,914 units | 12.7% |
| Cake Cases | 448 units | 0.8% |
| WW2 Gliders | 1,358 units | 0% |

Average accuracy: 29.9%

Some of these are weak. The honest interpretation is that products like Cake Cases and WW2 Gliders have highly variable demand — a rolling average struggles with spikes. The right response to that is not to force a better-looking number, it is to give those products a larger safety stock buffer, which is exactly what the next step does.


Safety Stock and Reorder Points

These are standard inventory management formulas. I used a 95% service level and assumed a 1-week supplier lead time:

Safety Stock = 1.65 × Standard Deviation of Demand × √(Lead Time)

Reorder Point = Average Weekly Demand × Lead Time + Safety Stock

import numpy as np

# Example numbers for one product; the real values come from the weekly demand series
avg_demand = 900        # average weekly demand in units
std_demand = 250        # standard deviation of weekly demand
lead_time_weeks = 1     # supplier lead time
current_stock = 1100

# z = 1.65 is the service level factor for 95%
safety_stock = round(1.65 * std_demand * np.sqrt(lead_time_weeks))
rop = round(avg_demand * lead_time_weeks + safety_stock)

if current_stock <= safety_stock:
    status = 'CRITICAL'
elif current_stock <= rop:
    status = 'REORDER NOW'
else:
    status = 'OK'

Output:

| Product | Safety Stock | Reorder Point | Status |
| --- | --- | --- | --- |
| Rabbit Night Light | 2,523 | 3,760 | 🔴 CRITICAL |
| WW2 Gliders | 1,611 | 2,657 | 🟡 REORDER NOW |
| Popcorn Holder | 1,546 | 2,474 | 🟡 REORDER NOW |
| Bird Ornament | 895 | 1,562 | 🟡 REORDER NOW |
| White Hanging Heart | 903 | 1,597 | 🟡 REORDER NOW |
| Mini Paint Set | 658 | 1,150 | 🟡 REORDER NOW |
| Cake Cases | 642 | 1,278 | 🟡 REORDER NOW |
| Jumbo Bag | 791 | 1,662 | 🟢 OK |

Excel Report

I exported the full table to a color-coded Excel file using openpyxl. The idea was that a procurement team should be able to open this file and know exactly what needs ordering without any explanation.
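A minimal sketch of the color-coding step, assuming status-based fills. The colors, column positions, and sample rows here are my illustrative choices, not the original export code:

```python
import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import PatternFill

# Hypothetical slice of the final table
report = pd.DataFrame({
    'Product': ['Rabbit Night Light', 'Jumbo Bag'],
    'Status': ['CRITICAL', 'OK'],
})

# One fill per status level (hex colors are arbitrary choices)
fills = {
    'CRITICAL':    PatternFill('solid', start_color='FFC7CE'),  # red
    'REORDER NOW': PatternFill('solid', start_color='FFEB9C'),  # amber
    'OK':          PatternFill('solid', start_color='C6EFCE'),  # green
}

report.to_excel('reorder_report.xlsx', index=False)

# Re-open the file and color each Status cell by its value
wb = load_workbook('reorder_report.xlsx')
ws = wb.active
for row in range(2, ws.max_row + 1):      # row 1 is the header
    cell = ws.cell(row=row, column=2)     # Status is column B
    cell.fill = fills[cell.value]
wb.save('reorder_report.xlsx')
```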


The Dashboard

Everything feeds into a live Streamlit dashboard with three pages:

  • Stock Overview — headline numbers and charts comparing current stock against reorder points
  • Demand Forecast — select a product, see its history and 4-week forecast
  • Reorder Alerts — the full status table with recommended order quantities
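The post does not spell out how the recommended order quantities on the alerts page are computed, so here is one common convention as a sketch: order enough to cover the 4-week forecast plus safety stock, minus what is already on hand. The function name and numbers are hypothetical:

```python
def recommended_order_qty(current_stock, safety_stock, forecast_4wk_total):
    """Order up to (safety stock + 4-week forecast); never a negative quantity."""
    target = safety_stock + forecast_4wk_total
    return max(0, target - current_stock)

# Hypothetical numbers for one product
print(recommended_order_qty(current_stock=1200, safety_stock=903,
                            forecast_4wk_total=2800))
```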

Live dashboard: https://forecasting-vv9qmj2kmug42tpct3jqv8.streamlit.app/


What I Took Away From This

A few honest reflections:

The data cleaning took longer than the modelling. That was not what I expected going in, but it makes sense. Garbage in, garbage out — and in a real operation, the stock records are often the messiest part.

Choosing the simpler model was the right call. There is a temptation to use the most sophisticated tool available. Sometimes that is wrong.

The low-accuracy products were the most interesting finding. If I had just reported average accuracy and moved on, I would have missed the point. The variability is the insight — those products need different handling, not a better model.

Next, I plan to automate the whole flow into a proper ETL pipeline.


If you are working through something similar or have feedback on the approach, I would be glad to hear it.

— Kelvin Kipyegon
