How I Built an AI That Detects When Your Appliances Are About to Break — Using Only a Smart Meter

#python #machinelearning #iot

When your microwave starts consuming 20% more power than usual, it may be on its way to failing. Your fridge running longer cycles than normal? That can be a sign of compressor degradation. Most people find out only when the appliance dies completely: expensive, inconvenient, and often preventable.

I built a system that aims to catch this early, using only the single power meter at your home's entrance.

The Problem With Existing NILM Systems

Non-Intrusive Load Monitoring (NILM) lets you figure out which appliances are running and how much power they're using — without installing sensors on every device. Smart meter data only.

Existing systems do this reasonably well. But they stop there.

They tell you how much energy your washing machine used. They don't tell you whether your washing machine is healthy.

That gap bothered me. Appliances degrade slowly — motor wear, clogged filters, heating element deterioration. By the time you notice something's wrong, the damage is done.

What I Built

I designed a two-stage pipeline called HNILM (Health-aware NILM):

Stage 1 — DBAN-ED (Energy Disaggregation)
A dual-branch 1D-CNN with multi-head attention that separates individual appliance power traces from the aggregate smart meter signal. Two parallel branches capture different temporal patterns — fast transients (microwave switching on) and slower cycles (dishwasher wash cycles). A 4-head attention layer then focuses on the most informative time steps.
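To make the data flow concrete, here is a minimal, untrained numpy sketch of a dual-branch 1D-CNN feeding a 4-head self-attention layer. The kernel sizes (3 and 4) and the head count come from the description above. The 16 channels per branch, one conv layer per branch, random weights, and the per-time-step linear output head are assumptions for illustration, not the actual DBAN-ED configuration.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1d_relu(x, w, b):
    """'Same'-padded 1D convolution followed by ReLU.

    x: (T, C_in) sequence, w: (K, C_in, C_out) kernel, b: (C_out,) bias.
    """
    k = w.shape[0]
    pad_left = (k - 1) // 2
    xp = np.pad(x, ((pad_left, k - 1 - pad_left), (0, 0)))
    # Each output step mixes K neighbouring input samples.
    out = np.stack([np.einsum("kc,kco->o", xp[t:t + k], w) for t in range(x.shape[0])])
    return np.maximum(out + b, 0.0)


def multi_head_self_attention(h, n_heads, wq, wk, wv, wo):
    """Scaled dot-product self-attention over time steps, split into n_heads."""
    t, d_model = h.shape
    d_head = d_model // n_heads

    def split(m):  # (T, D) -> (heads, T, d_head)
        return m.reshape(t, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(h @ wq), split(h @ wk), split(h @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, T, T)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = (weights @ v).transpose(1, 0, 2).reshape(t, d_model)  # re-join heads
    return out @ wo


def init_params(channels=16):
    # ASSUMPTION: channel count, one conv layer per branch and the linear head are illustrative.
    d_model = 2 * channels

    def w(*shape):
        return rng.normal(0.0, 0.1, size=shape)

    return {
        "branch_k3": (w(3, 1, channels), np.zeros(channels)),
        "branch_k4": (w(4, 1, channels), np.zeros(channels)),
        "attn": tuple(w(d_model, d_model) for _ in range(4)),  # Wq, Wk, Wv, Wo
        "head_w": w(d_model, 1),
        "head_b": np.zeros(1),
    }


def dual_branch_forward(aggregate, params):
    """Aggregate power window (T,) -> estimated appliance power (T,)."""
    x = aggregate[:, None]  # (T, 1): a single input channel
    fast = conv1d_relu(x, *params["branch_k3"])  # kernel size 3: sharp transients
    slow = conv1d_relu(x, *params["branch_k4"])  # kernel size 4: slightly wider window
    h = np.concatenate([fast, slow], axis=1)  # (T, 2 * channels)
    h = multi_head_self_attention(h, 4, *params["attn"])  # 4 attention heads
    return (h @ params["head_w"] + params["head_b"])[:, 0]


# 64 samples x 8 s is roughly 8.5 minutes of aggregate power in watts.
window = rng.uniform(0.0, 3000.0, size=64)
estimate = dual_branch_forward(window, init_params())
print(estimate.shape)  # (64,)
```

In a real model these weights would be learned from submetered training data; the sketch only shows how the two branches are concatenated before attention.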

Stage 2 — VXGB-AD (Anomaly Detection)
An XGBoost classifier that takes each appliance activation cycle and grades its health into four levels: Normal, Low, Medium, High. Instead of using raw power values, it computes 12 reference-anchored features that express each cycle relative to a healthy baseline. The aim is to make it robust to household-specific usage patterns.
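Stage 2 scores activation cycles rather than raw samples, so the disaggregated trace first has to be cut into cycles. How that segmentation is done isn't specified here, so the following is a hypothetical threshold-based segmenter; the 20 W ON threshold, the two-sample minimum run, and the `Activation` fields are illustrative assumptions.

```python
from dataclasses import dataclass

SAMPLE_PERIOD_S = 8  # REFIT-style 8-second sampling


@dataclass
class Activation:
    start: int  # index of the first ON sample
    end: int  # index one past the last ON sample
    energy_wh: float  # energy drawn during the activation
    peak_w: float  # highest power reading in the activation

    @property
    def duration_s(self):
        return (self.end - self.start) * SAMPLE_PERIOD_S


def extract_activations(power_w, on_threshold_w=20.0, min_samples=2):
    """Split a disaggregated appliance trace (watts) into ON runs.

    ASSUMPTION: the 20 W threshold and 2-sample minimum are illustrative.
    """
    activations, start = [], None
    # A trailing 0 W sentinel closes a run that reaches the end of the trace
    # (this relies on a positive threshold).
    for i, p in enumerate(list(power_w) + [0.0]):
        if p >= on_threshold_w and start is None:
            start = i
        elif p < on_threshold_w and start is not None:
            if i - start >= min_samples:  # ignore single-sample blips
                run = power_w[start:i]
                energy_wh = sum(run) * SAMPLE_PERIOD_S / 3600.0
                activations.append(Activation(start, i, energy_wh, max(run)))
            start = None
    return activations


trace = [0, 0, 1200, 1250, 1240, 0, 0, 5, 800, 0]
for a in extract_activations(trace):
    print(a.start, a.end, a.duration_s, round(a.energy_wh, 2), a.peak_w)
# 2 5 24 8.2 1250
```

The lone 800 W sample is dropped as a blip, and each surviving activation carries the raw measurements that reference-anchored features can later be computed from.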

The whole thing runs on 8-second data, the resolution REFIT was recorded at and comparable to what consumer smart meter feeds provide. No lab-grade, high-frequency equipment needed.

Results on Real Data

I evaluated on House 2 of the public REFIT dataset across five appliances: Dishwasher, Microwave, Kettle, Washing Machine, and Fridge.

Energy Disaggregation

Appliance	Accuracy	F1 Score
Microwave	0.958	0.923
Kettle	0.951	0.899
Dishwasher	0.888	0.823
Washing Machine	0.881	0.768
Fridge	0.737	0.724

Anomaly Detection

Appliance	Accuracy	F1 Score
Microwave	0.977	0.977
Fridge	0.910	0.914
Kettle	0.851	0.837
Dishwasher	0.810	0.809
Washing Machine	0.750	0.756

Why F1 matters more here: in anomaly detection, class imbalance is real, since normal activations vastly outnumber faulty ones. A model that always predicts "Normal" would get high accuracy but be useless. F1 combines precision and recall, so missed faults drag it down. Accuracy and F1 are nearly identical for every appliance in the anomaly table, which suggests the model isn't simply riding the majority class. That agreement isn't proof on its own, though: how F1 is averaged across the four grades matters, and macro-averaged or per-class F1 on the fault grades is the stricter check.
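A small pure-Python check shows why the averaging matters. The test set below is hypothetical (90 normal cycles, 10 faulty ones spread over the three fault grades), not REFIT data, and the "model" is the useless one that always predicts Normal.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, labels):
    """One-vs-rest F1 for each label; 0.0 when precision and recall are both 0."""
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return scores


def macro_and_weighted_f1(y_true, y_pred, labels):
    f1 = per_class_f1(y_true, y_pred, labels)
    support = Counter(y_true)
    macro = sum(f1.values()) / len(labels)  # every grade counts equally
    weighted = sum(f1[c] * support[c] for c in labels) / len(y_true)  # weighted by class size
    return macro, weighted


LABELS = ["Normal", "Low", "Medium", "High"]
# Hypothetical imbalanced test set: 90 normal cycles, 10 faulty ones.
y_true = ["Normal"] * 90 + ["Low"] * 4 + ["Medium"] * 3 + ["High"] * 3
y_pred = ["Normal"] * 100  # always predicts "Normal"

macro, weighted = macro_and_weighted_f1(y_true, y_pred, LABELS)
print(f"accuracy={accuracy(y_true, y_pred):.3f} macro_f1={macro:.3f} weighted_f1={weighted:.3f}")
# accuracy=0.900 macro_f1=0.237 weighted_f1=0.853
```

A support-weighted F1 stays within a few points of accuracy even for this useless model, while macro F1 collapses. If the F1 reported above is support-weighted, some of its agreement with accuracy is built in.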

In my experiments, HNILM outperformed CNN, LSTM, GRU, DTW, and Random Forest baselines on both tasks.

The Key Design Decisions

Why dual-branch CNN?
A single kernel size tends to favour either fast transients or slow cycles. Kernel size 3 catches sharp switching events; kernel size 4 covers a slightly wider window, which helps with slower patterns. Concatenating both gives the attention layer richer features to work with.
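Some back-of-the-envelope arithmetic shows how much time each branch sees per output step at 8-second sampling. The layer counts below (1 and 3) are assumptions, since the network depth isn't given; the formula is the standard receptive-field rule for stacked stride-1 convolutions.

```python
SAMPLE_PERIOD_S = 8  # REFIT sampling interval in seconds


def receptive_field_samples(kernel_size, n_layers, dilation=1):
    """Receptive field of n stacked stride-1 convolutions sharing one kernel size."""
    return 1 + n_layers * (kernel_size - 1) * dilation


# ASSUMPTION: 1 and 3 layers are example depths, not the actual DBAN-ED depth.
for k in (3, 4):
    for layers in (1, 3):
        rf = receptive_field_samples(k, layers)
        print(f"kernel {k}, {layers} layer(s): {rf} samples = {rf * SAMPLE_PERIOD_S} s")
# kernel 3, 1 layer(s): 3 samples = 24 s
# kernel 3, 3 layer(s): 7 samples = 56 s
# kernel 4, 1 layer(s): 4 samples = 32 s
# kernel 4, 3 layer(s): 10 samples = 80 s
```

Even three stacked layers reach only about a minute, far shorter than a dishwasher programme, so longer-range context presumably comes from extra depth, dilation or pooling, or from the attention layer, which can relate any two time steps in its input window.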

Why XGBoost for anomaly detection instead of another neural network?
Interpretability and efficiency. XGBoost on 12 hand-crafted reference-anchored features trains in seconds, needs no GPU at inference, and gives you feature importances you can actually explain. A neural network here would be overkill.

Why reference-anchored features?
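Raw power values vary between households: your microwave and my microwave have different baselines. By expressing every cycle relative to a global healthy mean instead of in absolute watts, the classifier sees how far a cycle has drifted from healthy behaviour rather than how big the appliance is. That should make it far less tied to any one household, though this still needs confirming on other houses. This is the single design choice that made anomaly detection actually work.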
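The 12 features aren't listed here, so this is a hypothetical sketch of the reference-anchored idea only: each raw per-cycle measurement (duration, energy, and peak power, chosen for illustration) is divided by its mean over cycles known to be healthy.

```python
from statistics import mean


def healthy_reference(healthy_cycles):
    """Average each raw measurement over cycles known to be healthy."""
    keys = healthy_cycles[0].keys()
    return {k: mean(c[k] for c in healthy_cycles) for k in keys}


def reference_anchored_features(cycle, reference):
    """Express one activation cycle relative to the healthy reference.

    Ratios near 1.0 mean "looks like a healthy cycle"; drift away from 1.0
    is what a downstream classifier would learn to grade.
    ASSUMPTION: measurement names are illustrative, not the actual 12 features.
    """
    return {f"{k}_ratio": cycle[k] / reference[k] for k in reference}


healthy = [
    {"duration_s": 120, "energy_wh": 40.0, "peak_w": 1250.0},
    {"duration_s": 128, "energy_wh": 42.0, "peak_w": 1240.0},
]
ref = healthy_reference(healthy)
suspect = {"duration_s": 124, "energy_wh": 49.2, "peak_w": 1245.0}
print({k: round(v, 3) for k, v in reference_anchored_features(suspect, ref).items()})
# {'duration_s_ratio': 1.0, 'energy_wh_ratio': 1.2, 'peak_w_ratio': 1.0}
```

Because every feature is a ratio, 1.2 means "20% above the healthy reference" whether the appliance normally draws 800 W or 1,500 W.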

The Hardest Appliance: Fridge

The Fridge was the hardest to disaggregate: its low-amplitude compressor cycling is easily mistaken for noise in the aggregate signal at 8-second sampling. It had the lowest disaggregation F1 (0.724), and its MAE was higher than any other appliance's.

But it's the second-easiest to detect anomalies in (0.910 accuracy, 0.914 F1). Why? Because compressor faults manifest as distinct duty-cycle changes — longer ON periods, shorter OFF periods — which the duration-ratio features capture cleanly.
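Here is a sketch of the kind of duty-cycle measurement described above, run on synthetic compressor traces. The 50 W ON threshold, the 90 W compressor draw, the cycle lengths, and the feature names are assumptions; the actual duration-ratio features aren't spelled out in this post.

```python
SAMPLE_PERIOD_S = 8  # 8-second sampling


def run_lengths(states):
    """Collapse a boolean ON/OFF sequence into [state, length_in_samples] runs."""
    runs = []
    for s in states:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return runs


def duty_cycle_features(power_w, on_threshold_w=50.0):
    """Mean ON/OFF period (seconds) and duty cycle of a compressor trace."""
    runs = run_lengths([p >= on_threshold_w for p in power_w])
    inner = runs[1:-1]  # drop the first and last runs: the window edges cut them off
    on = [n * SAMPLE_PERIOD_S for s, n in inner if s]
    off = [n * SAMPLE_PERIOD_S for s, n in inner if not s]
    if not on or not off:
        raise ValueError("need at least one complete ON and one complete OFF period")
    mean_on, mean_off = sum(on) / len(on), sum(off) / len(off)
    return {"mean_on_s": mean_on, "mean_off_s": mean_off,
            "duty_cycle": mean_on / (mean_on + mean_off)}


def duration_ratios(current, healthy):
    """Each duty-cycle feature relative to its healthy reference value."""
    return {k: current[k] / healthy[k] for k in current}


def synthetic_trace(on_samples, off_samples, cycles=4, on_w=90.0, off_w=2.0):
    # ASSUMPTION: idealised square-wave compressor, OFF first and last.
    trace = []
    for _ in range(cycles):
        trace += [off_w] * off_samples + [on_w] * on_samples
    return trace + [off_w] * off_samples


healthy = duty_cycle_features(synthetic_trace(75, 150))  # 10 min ON, 20 min OFF
degraded = duty_cycle_features(synthetic_trace(90, 120))  # 12 min ON, 16 min OFF
print({k: round(v, 2) for k, v in duration_ratios(degraded, healthy).items()})
# {'mean_on_s': 1.2, 'mean_off_s': 0.8, 'duty_cycle': 1.29}
```

In this made-up degraded trace the ON periods are 20% longer and the OFF periods 20% shorter than the healthy reference, and the ratios surface that directly even though each individual sample looks like an ordinary fridge reading.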

This is a useful insight: an appliance that is hard to disaggregate isn't necessarily hard to grade for health. Whether a failure mode is easy to detect depends on whether the features capture it, not only on how clean the power trace is.
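With only five appliances, any claim about correlation is hard to pin down either way. A quick Spearman rank correlation on the two F1 columns from the results tables makes that concrete; the scores are copied from the tables, and the tie-free Spearman formula applies because no two appliances share a score.

```python
# F1 scores copied from the two results tables.
disaggregation_f1 = {"Microwave": 0.923, "Kettle": 0.899, "Dishwasher": 0.823,
                     "Washing Machine": 0.768, "Fridge": 0.724}
anomaly_f1 = {"Microwave": 0.977, "Fridge": 0.914, "Kettle": 0.837,
              "Dishwasher": 0.809, "Washing Machine": 0.756}


def ranks(scores):
    """Rank 1 = best score (assumes no ties)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def spearman_rho(a, b):
    """Tie-free Spearman correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d_squared = sum((ra[k] - rb[k]) ** 2 for k in a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def drop(scores, name):
    return {k: v for k, v in scores.items() if k != name}


print(round(spearman_rho(disaggregation_f1, anomaly_f1), 2))  # 0.4
print(round(spearman_rho(drop(disaggregation_f1, "Fridge"), drop(anomaly_f1, "Fridge")), 2))  # 1.0
```

Across all five appliances the rank correlation is a weak 0.4; without the Fridge, the other four appliances rank in exactly the same order on both tasks. The Fridge is a striking exception rather than evidence that the two difficulties are independent in general.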

What's Next

The biggest open problem is the Washing Machine — its multi-stage power profile (pre-wash, wash, rinse, spin) makes both disaggregation and health grading harder. Phase-aware features or transformer-based temporal modelling are the natural next step.

Cross-building generalisation is the other open question — does a model trained on House 2 work on House 5? Real-world deployment depends on answering this.

Edge deployment through model quantisation is also on the roadmap — the goal is running this entirely on a Raspberry Pi attached to your smart meter.