Inventory Forecasting System From Scratch
By Kelvin Kipyegon
I am currently applying for a role that involves demand forecasting and inventory planning, so I built a forecasting project for my portfolio.
This post documents exactly what I did, what went wrong, how I fixed it, and what the final result looks like.
What I Set Out to Build
A system that could:
- Take raw sales data and clean it properly
- Forecast weekly demand for multiple products
- Calculate safety stock and reorder points
- Flag which products needed attention
- Present everything in an Excel report and a live dashboard
The Data
I used the UCI Online Retail dataset — 541,909 real transactions from a wholesale operation, freely available online. It felt close enough to what a real workshop or warehouse operation would have.
First thing I did was look at what I was working with:
```python
import pandas as pd

# Load the UCI Online Retail dataset directly from the archive
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx"
df = pd.read_excel(url)

print(f"Rows: {df.shape[0]}, Columns: {df.shape[1]}")
print(df.isnull().sum())
```

```text
Rows: 541909, Columns: 8
CustomerID     135080
Description      1454
```
135,080 rows had no customer ID. That is a quarter of the dataset. Messy, but workable.
Cleaning the Data
Four things needed fixing before I could send anything downstream:
```python
# Drop rows with no customer ID or product description
df_clean = df.dropna(subset=['CustomerID', 'Description'])

# Invoice numbers starting with 'C' are cancellations
df_clean = df_clean[~df_clean['InvoiceNo'].astype(str).str.startswith('C')]

# Negative quantities are returns; zero or negative prices are test entries
df_clean = df_clean[df_clean['Quantity'] > 0]
df_clean = df_clean[df_clean['UnitPrice'] > 0]

df_clean['Revenue'] = df_clean['Quantity'] * df_clean['UnitPrice']
```
After cleaning I had 397,884 rows, down from 541,909. I lost about 26% of the data. Those rows were returns, cancellations, and test entries. Keeping them would have made every analysis after this point wrong.
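With the transactions cleaned, the next step before any forecasting is to roll them up into one demand figure per product per week. The post does not show this step, so here is a minimal sketch of how I would do it, assuming the dataset's standard column names (`InvoiceDate`, `Description`, `Quantity`) and Prophet-style output column names (`ds`, `y`):

```python
import pandas as pd

def weekly_demand(df_clean: pd.DataFrame) -> pd.DataFrame:
    """Aggregate transaction rows into weekly demand per product."""
    df = df_clean.copy()
    df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
    # Resample each product's quantities into weekly buckets (weeks end Sunday)
    weekly = (
        df.set_index('InvoiceDate')
          .groupby('Description')['Quantity']
          .resample('W')
          .sum()
          .reset_index()
          .rename(columns={'InvoiceDate': 'ds', 'Quantity': 'y'})
    )
    return weekly
```

Resampling (rather than grouping by raw week numbers) keeps empty weeks visible as zeros, which matters later when measuring forecast error.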
Picking Products to Forecast
I filtered to the top 10 products by total quantity sold, then checked how many weeks of sales history each one had. One product — PAPER CRAFT, LITTLE BIRDIE — had 80,995 units sold in a single week and nothing else. That is not demand history, that is a one-off bulk order. I dropped it.
I ended up with 8 products, each with between 22 and 53 weeks of consistent weekly sales data.
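The selection logic above can be sketched in a few lines. This is my reconstruction rather than the exact code from the project, and it assumes a weekly demand table with columns `Description` and `y` (one row per product per week):

```python
import pandas as pd

def pick_products(weekly: pd.DataFrame, top_n: int = 10, min_weeks: int = 20) -> list:
    """Keep top-selling products that have enough weeks of real sales history."""
    totals = weekly.groupby('Description')['y'].sum()
    top = totals.nlargest(top_n).index
    keep = []
    for name in top:
        # Count only weeks with actual sales, not zero-filled gaps
        hist = weekly[(weekly['Description'] == name) & (weekly['y'] > 0)]
        # A huge total from a handful of weeks is a bulk order, not demand
        if len(hist) >= min_weeks:
            keep.append(name)
    return keep
```

The `min_weeks` threshold is what would have caught PAPER CRAFT, LITTLE BIRDIE automatically: 80,995 units in one week is one row of history, not a demand pattern.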
The Forecasting Model
I started with Facebook Prophet, a time-series forecasting library. It did not work well. The reason, which I only understood after some debugging, is that Prophet needs at least two years of data to learn yearly seasonality properly. I had one year. The predictions were off by hundreds of percent on some products.
So I switched to a 4-week rolling average. Simpler, more honest about what the data could support, and easier to explain to someone who does not work in Python.
```python
import numpy as np

window = 4
history = list(train['y'].iloc[-window:].values)
predictions = []

# Forecast the test period plus 4 future weeks
for _ in range(len(test) + 4):
    pred = np.mean(history[-window:])
    predictions.append(pred)
    history.append(pred)  # feed each prediction back into the window
```
I trained on 80% of the data and tested on the remaining 20%. Accuracy was measured using RMSE — I avoided MAPE because it breaks when actual demand hits zero, which happened on a few test weeks.
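Putting the split and the error measurement together, the evaluation loop looks roughly like this (a sketch over a single product's weekly series, with the 80/20 split and window size from above):

```python
import numpy as np

def rolling_forecast_rmse(y: np.ndarray, window: int = 4, train_frac: float = 0.8) -> float:
    """Evaluate a rolling-average forecast on the held-out tail of a series."""
    split = int(len(y) * train_frac)
    train, test = y[:split], y[split:]

    history = list(train[-window:])
    preds = []
    for _ in range(len(test)):
        pred = np.mean(history[-window:])
        preds.append(pred)
        history.append(pred)  # recursive forecast: no peeking at test values

    errors = np.array(preds) - test
    return float(np.sqrt(np.mean(errors ** 2)))
```

Note the forecast is fully recursive: the model never sees the test weeks, so the RMSE reflects what a real multi-week-ahead forecast would have achieved.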
Results:
| Product | Avg Weekly Demand | Accuracy |
|---|---|---|
| Rabbit Night Light | 2,763 units | 66.8% |
| Assorted Bird Ornament | 899 units | 52.5% |
| Jumbo Bag Red Retrospot | 1,168 units | 40.5% |
| White Hanging Heart | 699 units | 36.6% |
| Mini Paint Set Vintage | 625 units | 29.1% |
| Popcorn Holder | 1,914 units | 12.7% |
| Cake Cases | 448 units | 0.8% |
| WW2 Gliders | 1,358 units | 0% |
Average accuracy: 29.9%
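A note on how an RMSE in units becomes an accuracy percentage. The post does not spell out the exact conversion, so the formula below is an assumption on my part: one common convention is to normalise RMSE by mean demand and floor the result at zero, which also explains how a product can score exactly 0%.

```python
import numpy as np

def rmse_to_accuracy(rmse: float, actuals: np.ndarray) -> float:
    """Express RMSE as a percentage accuracy relative to mean demand.

    NOTE: this normalisation is one common convention, assumed here;
    the original write-up does not state its exact formula.
    """
    mean_demand = np.mean(actuals)
    if mean_demand == 0:
        return 0.0
    return max(0.0, 1.0 - rmse / mean_demand) * 100
```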
Some of these are weak. The honest interpretation is that products like Cake Cases and WW2 Gliders have highly variable demand — a rolling average struggles with spikes. The right response to that is not to force a better-looking number, it is to give those products a larger safety stock buffer, which is exactly what the next step does.
Safety Stock and Reorder Points
These are standard inventory management formulas. I used a 95% service level and assumed a 1-week supplier lead time:
Safety Stock = 1.65 × Standard Deviation of Demand × √(Lead Time)
Reorder Point = Average Weekly Demand × Lead Time + Safety Stock
```python
# 95% service level -> z = 1.65; lead time assumed to be 1 week
safety_stock = round(1.65 * std_demand * np.sqrt(lead_time_weeks))
rop = round(avg_demand * lead_time_weeks + safety_stock)

if current_stock <= safety_stock:
    status = 'CRITICAL'
elif current_stock <= rop:
    status = 'REORDER NOW'
else:
    status = 'OK'
```
Output:
| Product | Safety Stock | Reorder Point | Status |
|---|---|---|---|
| Rabbit Night Light | 2,523 | 3,760 | 🔴 CRITICAL |
| WW2 Gliders | 1,611 | 2,657 | 🟡 REORDER NOW |
| Popcorn Holder | 1,546 | 2,474 | 🟡 REORDER NOW |
| Bird Ornament | 895 | 1,562 | 🟡 REORDER NOW |
| White Hanging Heart | 903 | 1,597 | 🟡 REORDER NOW |
| Mini Paint Set | 658 | 1,150 | 🟡 REORDER NOW |
| Cake Cases | 642 | 1,278 | 🟡 REORDER NOW |
| Jumbo Bag | 791 | 1,662 | 🟢 OK |
Excel Report
I exported the full table to a color-coded Excel file using openpyxl. The idea was that a procurement team should be able to open this file and know exactly what needs ordering without any explanation.
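The export step can be sketched with openpyxl as below. The exact colours and sheet layout here are my own assumptions, not the project's actual styling:

```python
import pandas as pd
from openpyxl.styles import PatternFill

# Fill colour per status (hex RGB); purely illustrative choices
STATUS_COLOURS = {
    'CRITICAL': 'FFC7CE',     # light red
    'REORDER NOW': 'FFEB9C',  # amber
    'OK': 'C6EFCE',           # light green
}

def export_report(report: pd.DataFrame, path: str = 'inventory_report.xlsx') -> None:
    """Write the status table to Excel, colouring each row by its Status."""
    with pd.ExcelWriter(path, engine='openpyxl') as writer:
        report.to_excel(writer, index=False, sheet_name='Reorder Report')
        ws = writer.sheets['Reorder Report']
        # Data rows start at 2 (row 1 holds the column headers)
        for row_idx, status in enumerate(report['Status'], start=2):
            colour = STATUS_COLOURS.get(status, 'FFFFFF')
            fill = PatternFill(start_color=colour, end_color=colour,
                               fill_type='solid')
            for cell in ws[row_idx]:
                cell.fill = fill
```

Colouring whole rows rather than just the status cell is deliberate: a procurement person scanning the sheet should see red and amber bands, not hunt for a column.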
The Dashboard
Everything feeds into a live Streamlit dashboard with three pages:
- Stock Overview — headline numbers and charts comparing current stock against reorder points
- Demand Forecast — select a product, see its history and 4-week forecast
- Reorder Alerts — the full status table with recommended order quantities
Live dashboard: https://forecasting-vv9qmj2kmug42tpct3jqv8.streamlit.app/
What I Took Away From This
A few honest reflections:
The data cleaning took longer than the modelling. That was not what I expected going in, but it makes sense. Garbage in, garbage out — and in a real operation, the stock records are often the messiest part.
Choosing the simpler model was the right call. There is always a temptation to reach for the most sophisticated tool available; sometimes the data cannot support it.
The low-accuracy products were the most interesting finding. If I had just reported average accuracy and moved on, I would have missed the point. The variability is the insight — those products need different handling, not a better model.
Next, I plan to automate the whole flow into a proper ETL pipeline.
If you are working through something similar or have feedback on the approach, I would be glad to hear it.
— Kelvin Kipyegon