Part 1 Test Stage Complete - Developer Breakdown
Dataset: huggingface.co/datasets/PeacebinfLow/ice-age-investment-narrative
Space: huggingface.co/spaces/PeacebinfLow/ice-age-ai-dashboard
Owner: PeacebinfLow
Status: Live and Public
WHAT THIS IS
This started as two photos of a handwritten phone ledger from a real ice
block business in Maun, Botswana. The owner was tracking daily sales,
stock levels, and restock costs by hand with no system to analyse the data.
The goal was to take that raw data and build a working AI business
intelligence system on top of it. No fake data. No demo numbers. Every
figure in this system came from a real ledger.
This post is the technical breakdown of how the system works right now,
what decisions were made during the build, and where it goes from here.
THE RAW INPUT
Two phone screenshots. Columns were:
Date / Day Number / Stock In / Expected Sell / Actual Sold /
Selling Price / Income / New Stock Added / Cost Per Pack /
Stock Cost / Net Profit / Running Balance
The data covered January 24 to February 11 2026.
16 trading days. Opening balance 542 Pula. Selling price 5 Pula per pack.
Restock cost ranged from 16 to 30 Pula per pack.
That spread is the entire story. The business was structurally losing money
because the restock cost was 3 to 6 times the selling price per unit.
The worst day was January 28: sold 3 packs, spent 200 Pula restocking,
net loss 185 Pula. Balance at end of real data: -52 Pula.
THE DATA TRANSFORMATION PIPELINE
Step 1 - Raw images to structured tables
Each row from the ledger photos was manually verified and entered.
Missing values were preserved as null or flagged with risk_flag = data_missing.
Nothing was estimated or filled in silently.
Step 2 - Tables to JSONL records
Each trading day became one event record with these 20 fields:
id, date, month, event_type, day_number,
packs_in_stock, expected_sell, packs_sold,
selling_price_per_pack, income, new_stock_added,
cost_per_pack, stock_cost, net_profit, balance_after,
trend, risk_flag, confidence_score, xml_path, narrative
The narrative field is a plain English sentence describing what happened
that day. This makes the records human-readable and searchable by text.
The risk_flag field can be:
none / cashflow_risk / profit_risk / stock_risk /
data_missing / historical_loss
The confidence_score field is:
0.95 for fully verified real records
0.7 for real records with missing cost data
0.2 for records with critical missing fields
0.6-0.65 for simulated future months
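For illustration, here is what one record looks like in that shape. The values below are loosely based on the January 28 example from earlier, but they are placeholders, not a copy of the published file (in particular balance_after and xml_path are guesses at the format):

```python
import json

# One hypothetical event record in the 20-field shape described above.
# Figures are illustrative, not lifted from the real dataset.
sample_line = json.dumps({
    "id": "sale_2026_01_28",
    "date": "2026-01-28",
    "month": "January",
    "event_type": "daily_sale",
    "day_number": 5,
    "packs_in_stock": 12,
    "expected_sell": 10,
    "packs_sold": 3,
    "selling_price_per_pack": 5,
    "income": 15,                # packs_sold * selling_price_per_pack
    "new_stock_added": 10,
    "cost_per_pack": 20,
    "stock_cost": 200,
    "net_profit": -185,          # income - stock_cost
    "balance_after": -52,        # placeholder value
    "trend": "down",
    "risk_flag": "cashflow_risk",
    "confidence_score": 0.95,
    "xml_path": "/IceAgeInvestment/DailyTransactions/Month[@name='January']/Day[5]",
    "narrative": "Sold 3 packs for 15 Pula, restocked 10 packs for 200 Pula, net loss 185 Pula.",
})

record = json.loads(sample_line)
```

One line per trading day in this shape is what makes the downstream query logic simple: every question the dashboard answers is a filter or sort over flat records like this one.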
Step 3 - Simulated future months added
March through June 2026 were simulated to complete a half-year arc.
Every simulated record is clearly labeled in the narrative field.
The simulation models the recovery scenario where restock cost drops
to 5 Pula per pack, matching the selling price so packs stop selling at a loss.
Step 4 - Parallel files generated
The same data exists in multiple formats so it can be used in different ways:
- JSONL for the dataset and Space
- XML for voice navigation and structured queries
- XLSX for daily phone-based tracking in WPS Office
- Separate JSONL files for clients, restocks, and monthly summaries
HOW THE SPACE WORKS RIGHT NOW
The Space is a Gradio 6 app. No external AI API calls. No LLM at runtime.
The intelligence at this stage is pure dataset query logic.
Architecture:
load_dataset("PeacebinfLow/ice-age-investment-narrative", split="train")
-> pandas DataFrame
-> process_command(query) function
-> filter or sort the DataFrame
-> return summary text + data table + ASCII chart
The process_command function maps plain text input to DataFrame operations:
"show losses" -> df[df["net_profit"] < 0]
"open january" -> df[df["month"].str.contains("January")]
"best days" -> df.nlargest(5, "net_profit")
"show risk days" -> df[df["risk_flag"] != "none"]
"show balance" -> df.tail(1)
"any other text" -> df[df["narrative"].str.contains(query)]
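A minimal sketch of that routing, assuming the column names listed earlier. The real Space's function may differ in detail; this shows the shape of the logic:

```python
import pandas as pd

def process_command(query: str, df: pd.DataFrame) -> pd.DataFrame:
    """Map a plain-text command onto a DataFrame filter or sort.
    A sketch of the keyword routing described above."""
    q = query.strip().lower()
    if q == "show losses":
        return df[df["net_profit"] < 0]
    if q == "open january":
        return df[df["month"].str.contains("January", case=False)]
    if q == "best days":
        return df.nlargest(5, "net_profit")
    if q == "show risk days":
        return df[df["risk_flag"] != "none"]
    if q == "show balance":
        return df.tail(1)
    # Any other text: free-text search over the narrative field
    return df[df["narrative"].str.contains(query, case=False, na=False)]
```

Everything is a pure function of the DataFrame, which is why the Space needs no model, no database, and no state.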
The output has three parts:
- Summary text - command used, record count, totals, top 5 narratives
- Data table - clean DataFrame rendered as gr.Dataframe
- ASCII chart - text-based bar chart of profit and balance per day
The ASCII chart was a deliberate choice over Plotly. Plotly caused a
ModuleNotFoundError on first build. Rather than fight dependency issues,
the chart was rebuilt as plain text. It renders in any Textbox,
needs zero dependencies, and works on mobile without rendering lag.
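A dependency-free chart function in that spirit might look like this. It is a sketch, not the Space's actual code:

```python
def ascii_bar_chart(labels, values, width=30):
    """Render a plain-text bar chart, one row per day.
    Negative values get a minus marker before the bar."""
    peak = max(abs(v) for v in values) or 1  # avoid division by zero
    lines = []
    for label, value in zip(labels, values):
        bar = "#" * max(1, round(abs(value) / peak * width))
        sign = "-" if value < 0 else " "
        lines.append(f"{label:>10} {sign}{bar} {value}")
    return "\n".join(lines)

print(ascii_bar_chart(["Jan 28", "Jan 29"], [-185, 40]))
```

The output renders identically in a Gradio Textbox, a terminal, and a phone browser, which is exactly the point.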
Quick command buttons on the UI wire directly to process_command()
using lambda functions so users can tap once instead of typing.
Current limitations:
- No write-back to dataset at runtime
- No AI language model in the query layer
- No authentication or per-user state
- Simulated data months have lower confidence than real data
- Some real records have missing cost fields that affect balance accuracy
BUILD ERRORS HIT AND HOW THEY WERE FIXED
Error 1: gradio version conflict
requirements.txt had gradio==4.44.0
HF Spaces auto-installs gradio==6.8.0
Two versions of the same package. Build dies.
Fix: Remove gradio from requirements.txt entirely.
HF handles the install. You never pin gradio in a Space.
Error 2: gr.Code() language="xml" not supported
Gradio 6 dropped xml as a supported language for the Code component.
Fix: Change language="xml" to language="python" on all gr.Code() calls.
The XML still displays. The syntax highlighting just uses Python rules.
For this use case it makes no visible difference.
Error 3: css parameter deprecated in gr.Blocks()
Gradio 6 moved css from the Blocks constructor to the launch method.
Fix: Remove css from gr.Blocks(css=css) and add it to demo.launch(css=css)
Error 4: SyntaxError on Unicode em dash
Explanation text with em dashes got accidentally included inside app.py.
Python rejected U+2014 as an invalid character in source code.
Fix: Replace all em dashes and Unicode box characters with plain hyphens
and equals signs. Check the entire file before committing.
Error 5: ModuleNotFoundError plotly
plotly was used for charts but was not in requirements.txt.
Options were: add plotly to requirements.txt, or remove plotly entirely.
Fix chosen: Remove plotly. Rebuild charts as ASCII text.
This removed a dependency, reduced build time, and improved mobile performance.
The pattern across all five errors is the same:
Simplify. Remove the dependency. Use what is already there.
The Space works better now than it would have with all the original libraries.
WHAT IS IN THE DATASET RIGHT NOW
38 total records across these files:
ice_age_investment.jsonl - main event file
16 real sale events (Jan 24 to Feb 11)
4 real monthly summaries (Jan, Feb as real data)
16 simulated sale events (Mar through Jun)
4 simulated monthly summaries
1 half-year summary
1 blank template record
clients.jsonl - 14 client order records
restock_log.jsonl - 13 restock entries
monthly_summaries.jsonl - 6 monthly summaries Jan through Jun
templates/ - 3 blank templates for new entries
ice_age_full.xml - full XML mirror of all data
The XML file has a voice navigation map. Each record has an xml_path field
that points to its location in the XML tree. The plan is to use these paths
for voice command routing in a future version:
"Open January" -> /IceAgeInvestment/DailyTransactions/Month[@name='January']
"Show losses" -> //Day[net_profit < 0]
"Show balance" -> /IceAgeInvestment/FinancialSummary/CurrentRunningBalance
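The attribute-predicate form of these paths already resolves with Python's standard library. The sketch below uses a made-up XML fragment shaped like the tree the xml_path fields imply, not the real file. Note that value comparisons like //Day[net_profit < 0] are beyond ElementTree's XPath subset; those need lxml or a manual filter:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mirroring the tree shape the xml_path fields imply.
doc = ET.fromstring("""
<IceAgeInvestment>
  <DailyTransactions>
    <Month name="January">
      <Day number="5"><net_profit>-185</net_profit></Day>
    </Month>
  </DailyTransactions>
</IceAgeInvestment>
""")

# "Open January" -> attribute predicates work in stdlib ElementTree
january = doc.find("./DailyTransactions/Month[@name='January']")

# "Show losses" -> //Day[net_profit < 0] needs a fuller XPath engine,
# so here it is done as a plain Python filter instead
losses = [d for d in doc.iter("Day")
          if float(d.findtext("net_profit", "0")) < 0]
```

Routing a transcribed voice command to one of these paths is then a lookup table, not an AI problem.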
KNOWN DATA GAPS - PART OF THE REAL RECORD
These 7 gaps exist in the original ledger. They are preserved as-is.
They are not bugs. They are honest records of what was not written down.
30/01/26 - Actual packs sold not recorded
30/01/26 - Restock cost for 4 packs recorded as 00
31/01/26 - Restock cost for 5 packs recorded as 00
02/02/26 - New stock cost not visible in source image
07/02/26 - 4 packs added at cost 00
09/02/26 - 5 packs added at cost 00
10/02/26 - Half day labeled 8h, new stock situation unclear
The four zero-cost restock entries are interesting. They could be:
- Stock received free from a contact
- Costs paid by someone else and not recorded
- Entries the owner intended to fill in later
If they represent real costs, the balance calculations change significantly.
This is flagged in the dataset but not resolved because it is not our data
to change without confirmation from the business owner.
THE SYSTEM OVER TIME - PLANNED VERSIONS
The current system is intentionally minimal. It is a clean foundation.
Here is the honest roadmap from where it is now to where it could go:
VERSION 1 - Current (Test Stage Complete)
What it is:
Dataset on Hugging Face with real Jan-Feb data plus simulated Mar-Jun.
Gradio Space with text command interface, data table, ASCII charts.
Excel tracker with 6 sheets and 186 working formulas.
XML archive with full business data and voice navigation map.
What it cannot do:
No LLM in the query layer. Commands are string matching only.
No write-back. New records cannot be added from the Space at runtime.
No user accounts. Everyone sees the same dataset.
No forecasting. Simulated months are hand-modeled, not computed.
VERSION 2 - AI Query Layer
What changes:
Connect the Space to a real language model via the Anthropic API.
The claude-haiku-4-5 model is the right choice here - fast and cheap.
Pass dataset records as context in each API call.
User types natural language. Model interprets and queries the DataFrame.
Return structured answer plus the relevant records.
Technical approach:
Keep the existing process_command() function as a fallback.
Add a second path: if the query does not match any keyword,
send it to the API with the last 10 records as context.
Stream the response back into the text output panel.
Cost stays low because context is small (business records not essays).
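That two-path routing could be sketched like this. route_query and call_model are hypothetical names, and the model call is passed in as a function so the shape of the logic is visible without an API key:

```python
KEYWORDS = ("show losses", "open january", "best days",
            "show risk days", "show balance")

def route_query(query, records, call_model):
    """Try keyword routing first; fall back to the model for free-form
    questions. call_model stands in for the API call (stubbed here)."""
    if query.strip().lower() in KEYWORDS:
        return ("keyword", query.strip().lower())
    # Unmatched query: send it to the model with the last 10 records as context
    context = records[-10:]
    return ("model", call_model(query, context))
```

The keyword path stays free and instant; only genuinely open-ended questions cost an API call.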
What this unlocks:
"Why did the balance go negative in February?" gets a real answer.
"What would happen if I raised the selling price to 7 Pula?" gets modeled.
"Which month had the best return on restock investment?" gets calculated.
VERSION 3 - Live Data Entry
What changes:
The business owner can add new daily records directly from their phone.
A form in the Space collects the day's numbers.
On submit, a new JSONL record is formatted and pushed to the dataset repo
using the Hugging Face Hub API with a write token stored as a Space secret.
The dashboard reloads and the new day is immediately queryable.
Technical approach:
Use huggingface_hub.HfApi().upload_file() to append to the JSONL.
Or maintain a separate live_data.jsonl and merge at query time.
The write token goes in Space secrets, never in the code.
Add a simple PIN or passphrase check before allowing writes.
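A sketch of that write path, under stated assumptions: format_daily_record uses a simplified field set, the secret name HF_TOKEN is an assumption, and upload_file replaces a file rather than appending to it, so a true append means download, extend, re-upload (or the separate live_data.jsonl approach):

```python
import json
import os

def format_daily_record(date, packs_sold, selling_price, stock_cost,
                        balance_before):
    """Build one JSONL line from a day's form inputs (simplified fields)."""
    income = packs_sold * selling_price
    net = income - stock_cost
    record = {
        "date": date,
        "packs_sold": packs_sold,
        "income": income,
        "stock_cost": stock_cost,
        "net_profit": net,
        "balance_after": balance_before + net,
    }
    return json.dumps(record)

def push_record(line, repo_id="PeacebinfLow/ice-age-investment-narrative"):
    """Write one record to a live file in the dataset repo. Sketch only:
    upload_file overwrites path_in_repo, so appending needs a merge step."""
    from huggingface_hub import HfApi  # lazy import; formatting needs no token
    api = HfApi(token=os.environ["HF_TOKEN"])
    api.upload_file(
        path_or_fileobj=(line + "\n").encode(),
        path_in_repo="live_data.jsonl",  # merged with the main file at query time
        repo_id=repo_id,
        repo_type="dataset",
    )
```

Keeping the formatting pure and the upload separate also makes the write path easy to test without touching the Hub.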
What this unlocks:
The business owner stops using the phone ledger photos.
The dataset becomes a live business record, not a historical snapshot.
Every day added increases the value of the forecasting layer.
VERSION 4 - Forecasting
What changes:
Use the accumulated real data to build a simple forecasting model.
Predict next week sales, restock needs, and expected profit
based on day of week patterns, seasonal trends, and recent trajectory.
Technical approach:
At 30+ real trading days the data is enough for basic time series.
Use pandas rolling averages and simple linear regression first.
No need for a full ML model at this scale.
Display forecast as a separate tab in the Space.
Show confidence interval based on variance in the real data.
What this unlocks:
"How many packs should I buy tomorrow?" becomes a data-driven answer.
"Is this week trending better than last week?" is answered automatically.
The business owner stops guessing and starts reading projections.
VERSION 5 - Voice and WhatsApp
What changes:
The owner can query the system by voice or WhatsApp message.
Say "show balance" into the phone. Get the answer read back.
Text "how much did I make this week" to a WhatsApp number.
Get a structured reply with the numbers.
Technical approach for voice:
Add gr.Audio(sources=["microphone"]) to the Space.
Use a speech-to-text model (Whisper via the Transformers library).
Feed the transcribed text into process_command().
Read the result back using a text-to-speech component.
Technical approach for WhatsApp:
Twilio WhatsApp Business API receives the incoming message.
A FastAPI backend on a small server parses it.
Passes the text to the same process_command() logic.
Returns the formatted summary as a WhatsApp reply.
The dataset never needs to be opened. The answer comes to the phone.
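Both channels converge on the same final step: turning matched records into a short text answer. A sketch of that shared formatter, stdlib only (the Whisper and Twilio wiring described above is not shown):

```python
def format_reply(records, question):
    """Turn matched records into a short text reply, suitable for
    text-to-speech playback or a WhatsApp message body."""
    if not records:
        return f"No records matched '{question}'."
    total = sum(r.get("net_profit", 0) for r in records)
    lines = [f"{len(records)} records matched '{question}'. "
             f"Net profit: {total} Pula."]
    lines += [r.get("narrative", "") for r in records[:3]]  # top 3 narratives
    return "\n".join(lines)
```

Because voice and WhatsApp both reduce to text in, text out, they reuse process_command() unchanged; only the transport differs.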
VERSION 6 - Multi-Business
What changes:
The same architecture serves multiple small businesses.
Each business has its own dataset partition identified by business_id.
One Space handles all of them with login routing.
Technical approach:
Add business_id field to every record.
Use HF dataset configurations (config_name per business).
Space login routes each user to their own data slice.
Business owner sees only their records. Admin sees all.
What this unlocks:
The system becomes a platform, not a single-business tool.
Any informal business that tracks sales and stock on a phone
can plug into the same infrastructure.
Ice sellers, market vendors, small shops, airtime resellers.
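The data-slicing half of that routing is small. A sketch, assuming a business_id column has been added as described (login and admin checks would sit in front of it):

```python
import pandas as pd

def business_slice(df, business_id, is_admin=False):
    """Return only the caller's records unless they are an admin.
    Assumes every record carries a business_id field."""
    if is_admin:
        return df
    return df[df["business_id"] == business_id]
```

Everything downstream, queries, charts, forecasts, then runs unmodified on the returned slice.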
HOW TO BUILD ON THIS RIGHT NOW
The dataset and Space are public. You can fork both without asking.
To query the live dashboard:
Go to huggingface.co/spaces/PeacebinfLow/ice-age-ai-dashboard
Click any quick command button or type in the search box.
The data table and chart update immediately.
To download the dataset:
Go to huggingface.co/datasets/PeacebinfLow/ice-age-investment-narrative
Click Files. Download ice_age_investment.jsonl.
Load it locally: pd.read_json("ice_age_investment.jsonl", lines=True)
To fork and extend:
Fork the Space repo.
The entire query engine is in one function: process_command().
Add new keywords, connect your own data, or swap in an API call.
The structure is intentionally flat so it is easy to modify.
To add a real AI layer yourself:
Get an Anthropic API key.
Add it as a Space secret called ANTHROPIC_API_KEY.
Add this to app.py:
import anthropic
import os

client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))

def ai_query(user_question, context_records):
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"Business data: {context_records}\n\nQuestion: {user_question}"
        }]
    )
    return message.content[0].text
Then call ai_query() inside run_query() when process_command()
returns fewer than 3 results or when the query contains a question mark.
To report a data issue:
Open a Discussion on the dataset repo.
If you find a calculation error, a missing record, or a gap that
should be flagged differently, note it there.
FINAL NOTE
This is Part 1. The test stage is complete.
The data is real. The losses are real. The recovery story is real.
The system is live and public.
Everything described in the future versions section is buildable
on top of what is already there. The foundation is solid.
The architecture is simple on purpose.
Simple systems get extended. Complex systems get abandoned.
Dataset: huggingface.co/datasets/PeacebinfLow/ice-age-investment-narrative
Space: huggingface.co/spaces/PeacebinfLow/ice-age-ai-dashboard
PeacebinfLow - Maun, Botswana - 2026