<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ANKIT AMBASTA</title>
    <description>The latest articles on DEV Community by ANKIT AMBASTA (@asquare8).</description>
    <link>https://dev.to/asquare8</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3927544%2F647e9276-d20d-401f-9a26-d17c1071cd8f.png</url>
      <title>DEV Community: ANKIT AMBASTA</title>
      <link>https://dev.to/asquare8</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/asquare8"/>
    <language>en</language>
    <item>
      <title>The Fallback Pattern: How I Handle 15+ RPM (30,000 Tokens/Min) on Free AI Models</title>
      <dc:creator>ANKIT AMBASTA</dc:creator>
      <pubDate>Tue, 12 May 2026 16:57:48 +0000</pubDate>
      <link>https://dev.to/asquare8/the-fallback-pattern-how-i-handle-15-rpm-30000-tokensmin-on-free-ai-models-the-solution-4dig</link>
      <guid>https://dev.to/asquare8/the-fallback-pattern-how-i-handle-15-rpm-30000-tokensmin-on-free-ai-models-the-solution-4dig</guid>
      <description>&lt;p&gt;When I built &lt;strong&gt;VerdictAI X&lt;/strong&gt; — a high-end decision support system where five specialized AI agents debate your life choices — I ran into a massive architectural problem.&lt;/p&gt;

&lt;p&gt;Multi-agent systems do not just eat tokens; they completely destroy your rate limits.&lt;/p&gt;

&lt;p&gt;Most tutorials show you how to build a simple chatbot that makes one API call per user message. But what happens when you have a multi-agent orchestration pipeline that triggers &lt;strong&gt;21 LLM calls&lt;/strong&gt; in rapid succession for a single button click?&lt;/p&gt;

&lt;p&gt;If you are using the free tier of Google AI Studio, you can hit &lt;code&gt;429 RESOURCE_EXHAUSTED&lt;/code&gt; errors almost immediately.&lt;/p&gt;

&lt;p&gt;The bottleneck is not the tokens. It is the &lt;strong&gt;RPM (Requests Per Minute)&lt;/strong&gt;. &lt;/p&gt;




&lt;h1&gt;The Math: Why RPM Kills Multi-Agent Systems&lt;/h1&gt;

&lt;p&gt;VerdictAI X is not a standard chatbot; it is a multi-layered reasoning pipeline.&lt;/p&gt;

&lt;p&gt;When a user submits a dilemma, the system spins up five specialized agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Strategist&lt;/li&gt;
&lt;li&gt;The Guardian&lt;/li&gt;
&lt;li&gt;The Visionary&lt;/li&gt;
&lt;li&gt;The Humanist&lt;/li&gt;
&lt;li&gt;The Contrarian&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A single user query requires the following behind the scenes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Initial Analysis: 5 requests
Debate Round 1 (Challenge): 5 requests
Debate Round 2 (Defend &amp;amp; Challenge): 5 requests
Debate Round 3 (Defend): 5 requests
Final Verdict Synthesis: 1 request

Total = 21 LLM requests per user click
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That creates a real problem for free-tier usage, because the primary model may allow only around 15 RPM. One user query can already exceed that ceiling, even when token usage is still well under the TPM limit. &lt;/p&gt;




&lt;h1&gt;The Solution: Dynamic Fallback Queue&lt;/h1&gt;

&lt;p&gt;Instead of hardcoding a single model, I built a &lt;strong&gt;fallback queue&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The idea was simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try the primary model first&lt;/li&gt;
&lt;li&gt;If it hits a rate limit, move to the next model&lt;/li&gt;
&lt;li&gt;Keep retrying until one succeeds&lt;/li&gt;
&lt;li&gt;Show a small system notice in the UI when switching models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This way, the app can keep streaming responses instead of crashing on a 429 error. &lt;/p&gt;




&lt;h1&gt;Core Failover Logic&lt;/h1&gt;

&lt;p&gt;Here is the architecture powering the automatic model switching inside &lt;code&gt;gemini_client.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;FALLBACK_MODELS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.1-flash-lite-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma-4-31b-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma-4-26b-a4b-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_model_queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_pro&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Returns a list of models to try in order.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;primary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-pro&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;use_pro&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;primary&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;FALLBACK_MODELS&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;use_pro&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Streams a response with automatic failover to fallback models.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;models_to_try&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_get_model_queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_pro&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;models_to_try&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_build_config_and_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;br&amp;gt;&amp;lt;span style=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;color:#fbbf24; font-size:10px;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;[System: Primary RPM limit reached. Switching to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...]&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;final_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;error_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;429&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;error_msg&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RESOURCE_EXHAUSTED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;models_to_try&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;continue&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;span style=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;color:#f43f5e; font-weight:600;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;System overloaded. All backup models are currently busy. Please try again in a few minutes.&amp;lt;/span&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;500&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;error_msg&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;internal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
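
&lt;p&gt;One thing the snippet above does not show is &lt;code&gt;_build_config_and_prompt&lt;/code&gt;. Here is a minimal sketch of what such a helper could look like, assuming the &lt;code&gt;google-genai&lt;/code&gt; SDK's &lt;code&gt;GenerateContentConfig&lt;/code&gt;; the Gemma special-casing is my guess at why the function returns both a config and a rewritten prompt, not the actual VerdictAI X code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def _build_config_and_prompt(model: str, prompt: str, system_prompt: str):
    """Sketch only: the article does not include this helper.

    Gemma models do not take a separate system instruction, so one
    plausible design folds the system prompt into the user prompt there.
    """
    if model.startswith("gemma") and system_prompt:
        return types.GenerateContentConfig(), f"{system_prompt}\n\n{prompt}"
    return types.GenerateContentConfig(system_instruction=system_prompt or None), prompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;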






&lt;h1&gt;What This Actually Bought Me&lt;/h1&gt;

&lt;p&gt;When the primary model hits its RPM limit, &lt;code&gt;generate_stream()&lt;/code&gt; catches the &lt;code&gt;429&lt;/code&gt; error, skips to the next model, and retries the same prompt.&lt;/p&gt;

&lt;p&gt;Because the fallback happens inside the streaming loop, the UI can show a tiny notice like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[System: Primary RPM limit reached. Switching to gemma-4-31b-it...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user does not get an ugly error screen. They just keep seeing the response stream normally. &lt;/p&gt;
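
&lt;p&gt;For context, here is roughly how a caller consumes the generator. This is an illustrative sketch rather than the actual orchestration code; the persona and dilemma strings are made up:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical caller: the strings below are illustrative, not from the app.
agent_persona = "You are The Contrarian. Attack the consensus view."
dilemma = "Should I quit my job to go all-in on my startup?"

parts = []
for token in generate_stream(dilemma, system_prompt=agent_persona):
    parts.append(token)  # or push each token straight to the UI stream

full_response = "".join(parts)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;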




&lt;h1&gt;Why I Am Writing About This&lt;/h1&gt;

&lt;p&gt;Most tutorials end at the point where one LLM call works.&lt;/p&gt;

&lt;p&gt;But if you want to build complex, multi-agent AI applications, &lt;strong&gt;Requests Per Minute&lt;/strong&gt; limits are one of the first real architectural hurdles you will face.&lt;/p&gt;

&lt;p&gt;You do not always need to upgrade to a paid tier immediately. Sometimes the better solution is to design your system to fail gracefully and take advantage of the available model ecosystem. &lt;/p&gt;




&lt;h1&gt;Project Links&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: VerdictAI X repository [&lt;a href="https://github.com/A-Square8/VerdictAI-X" rel="noopener noreferrer"&gt;https://github.com/A-Square8/VerdictAI-X&lt;/a&gt;]&lt;/li&gt;
&lt;li&gt;LinkedIn: Ankit Ambasta [&lt;a href="https://www.linkedin.com/in/ankit-ambasta-4a58002b9/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/ankit-ambasta-4a58002b9/&lt;/a&gt;]&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>python</category>
    </item>
    <item>
      <title>Why I Used SHA-256 to Solve a Problem Most RAG Tutorials Pretend Doesn't Exist</title>
      <dc:creator>ANKIT AMBASTA</dc:creator>
      <pubDate>Tue, 12 May 2026 16:18:27 +0000</pubDate>
      <link>https://dev.to/asquare8/why-i-used-sha-256-to-solve-a-problem-most-rag-tutorials-pretend-doesnt-exist-2gbc</link>
      <guid>https://dev.to/asquare8/why-i-used-sha-256-to-solve-a-problem-most-rag-tutorials-pretend-doesnt-exist-2gbc</guid>
      <description>&lt;p&gt;When I built GridMind — a fully offline RAG assistant designed to run on CPU-only hardware with under 4 GB of RAM — I ran into a problem that no LangChain tutorial ever warned me about.&lt;/p&gt;

&lt;p&gt;GridMind is a knowledge base assistant designed to work when there's no internet, no GPU, no cloud. Think disaster scenarios, remote areas, or a zombie apocalypse where the government is not coming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens when your knowledge base changes?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most RAG demos show you the happy path: chunk documents, embed them, store vectors, query. Done. But they quietly skip the part where your source documents get updated, corrected, or extended. Because if you follow the naive approach, the answer is painful: re-embed everything from scratch, every single time.&lt;/p&gt;

&lt;p&gt;For GridMind, that wasn't an option.&lt;/p&gt;




&lt;h2&gt;The Constraints That Forced Me to Think&lt;/h2&gt;

&lt;p&gt;GridMind's premise is that it works &lt;em&gt;when the grid fails&lt;/em&gt; — no internet, no GPU, no cloud. It runs on a Raspberry Pi-class machine using &lt;code&gt;nomic-embed-text&lt;/code&gt; for embeddings and &lt;code&gt;qwen2.5:3b&lt;/code&gt; via Ollama for inference.&lt;/p&gt;

&lt;p&gt;Embedding is the expensive step. On CPU, embedding a full knowledge base across 8 survival domains (water, shelter, medical, navigation, etc.) takes minutes. Re-running that every time I updated a markdown file was a non-starter.&lt;/p&gt;

&lt;p&gt;I needed a way to know, cheaply and reliably, exactly which documents had changed since the last index run — and only re-embed those.&lt;/p&gt;




&lt;h2&gt;The Solution: SHA-256 as a Change Fingerprint&lt;/h2&gt;

&lt;p&gt;The core idea is simple, but I didn't see it written about clearly anywhere, so I'll spell it out.&lt;/p&gt;

&lt;p&gt;Before embedding any document, compute its SHA-256 hash and store it alongside its vector in FAISS metadata. On the next indexing run, before calling the embedding model at all, hash the current file and compare it against the stored hash.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hash matches&lt;/strong&gt; → skip. The document hasn't changed. No embedding call made.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hash differs&lt;/strong&gt; → re-embed and update the stored hash.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New file (no hash stored)&lt;/strong&gt; → embed fresh and store the hash.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File deleted&lt;/strong&gt; → remove its vectors from the index.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hash_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;sha256&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;iter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reading in 8 KB chunks matters — it keeps memory flat even for large documents.&lt;/p&gt;




&lt;h2&gt;Why SHA-256 Specifically?&lt;/h2&gt;

&lt;p&gt;A few alternatives I considered:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;File modification timestamps (&lt;code&gt;mtime&lt;/code&gt;)&lt;/strong&gt; — Fast, but unreliable. Copying a file, running a deployment script, or touching a file changes &lt;code&gt;mtime&lt;/code&gt; without changing content. You'd re-embed files that didn't need it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;File size&lt;/strong&gt; — Even faster, even less reliable. A one-character edit to a 10 KB file changes content but not size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MD5&lt;/strong&gt; — Would work fine here. SHA-256 is marginally slower, but the difference at this scale is microseconds. I used it because it's the standard I'm used to reaching for, and collision resistance, while overkill for this use case, costs nothing.&lt;/p&gt;
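
&lt;p&gt;If you want to sanity-check that cost claim on your own machine, a quick micro-benchmark (illustrative only; numbers vary by hardware) looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import timeit

# 1 MB of dummy bytes: far larger than a typical markdown document.
data = b"x" * (1024 * 1024)

for algo in ("md5", "sha256"):
    t = timeit.timeit(lambda: hashlib.new(algo, data).hexdigest(), number=100)
    print(f"{algo}: {t / 100 * 1000:.2f} ms per MB")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;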




&lt;h2&gt;The Index Store Structure&lt;/h2&gt;

&lt;p&gt;I kept a simple JSON manifest alongside the FAISS index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"documents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"data/water/purification.md"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"hash"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"a3f5c2d1..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"vector_ids"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"indexed_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-11-14T10:22:00"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"data/medical/wound-care.md"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"hash"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"9b8e1f44..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"vector_ids"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"indexed_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-11-14T10:22:01"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tracking &lt;code&gt;vector_ids&lt;/code&gt; per document is what makes deletion and update clean — when a file changes, you know exactly which FAISS vectors to remove before inserting the new ones.&lt;/p&gt;
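
&lt;p&gt;Putting the manifest and &lt;code&gt;hash_file&lt;/code&gt; together, the decision loop looks roughly like this. This is a simplified sketch of the idea, not GridMind's exact indexer; &lt;code&gt;embed_and_store&lt;/code&gt; and &lt;code&gt;remove_vectors&lt;/code&gt; are placeholders for the FAISS plumbing:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import glob
import json

def sync_index(manifest_path: str = "manifest.json"):
    with open(manifest_path) as f:
        manifest = json.load(f)
    docs = manifest["documents"]

    current_files = set(glob.glob("data/**/*.md", recursive=True))

    # Deleted files: drop their vectors from the index.
    for path in set(docs) - current_files:
        remove_vectors(docs[path]["vector_ids"])  # placeholder
        del docs[path]

    # New or changed files: re-embed only those.
    for path in sorted(current_files):
        digest = hash_file(path)
        entry = docs.get(path)
        if entry and entry["hash"] == digest:
            continue  # unchanged: no embedding call at all
        if entry:
            remove_vectors(entry["vector_ids"])  # stale vectors out first
        docs[path] = {"hash": digest, "vector_ids": embed_and_store(path)}

    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;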




&lt;h2&gt;What This Actually Bought Me&lt;/h2&gt;

&lt;p&gt;On a knowledge base update where I corrected two markdown files and added one new one, the indexer processed 3 files instead of 47. Embedding time dropped from ~6 minutes to ~40 seconds on the test machine.&lt;/p&gt;

&lt;p&gt;More importantly, it made iteration &lt;em&gt;feel&lt;/em&gt; fast. When you're building a local-first tool and testing knowledge base changes, waiting 6 minutes per cycle kills momentum. Forty seconds doesn't.&lt;/p&gt;




&lt;h2&gt;The Honest Limitations&lt;/h2&gt;

&lt;p&gt;This approach has real tradeoffs I want to be upfront about:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FAISS's basic flat indexes don't support in-place deletion.&lt;/strong&gt; To "remove" old vectors, I rebuild the index from the non-deleted vectors. For 47 documents this is fast. At 10,000 documents it would become the bottleneck. A production system would reach for something like Qdrant or Weaviate that supports vector-level deletes natively.&lt;/p&gt;
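
&lt;p&gt;For the curious, that rebuild amounts to something like the following (a sketch assuming a flat index; note that positions shift after a rebuild, so the manifest's &lt;code&gt;vector_ids&lt;/code&gt; have to be rewritten as well):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import faiss
import numpy as np

def rebuild_index(old_index, keep_ids):
    """Copy only the surviving vectors into a fresh flat index."""
    kept = np.vstack([old_index.reconstruct(int(i)) for i in keep_ids])
    new_index = faiss.IndexFlatL2(old_index.d)
    new_index.add(kept)
    return new_index  # positions 0..n-1 now correspond to keep_ids in order
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;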

&lt;p&gt;&lt;strong&gt;The manifest is a single JSON file with no locking.&lt;/strong&gt; If two indexing processes ran simultaneously (they don't in GridMind, but still), you'd get corruption. A proper solution uses SQLite or file-level locking.&lt;/p&gt;
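
&lt;p&gt;If you ever did need to guard it, even a crude advisory lock would do. A POSIX-only sketch, not something GridMind ships:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import fcntl
import json

def save_manifest(path: str, manifest: dict):
    # Serialize writers through a sidecar lock file so that opening the
    # manifest with "w" never truncates it while another process writes.
    with open(path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        with open(path, "w") as f:
            json.dump(manifest, f, indent=2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;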

&lt;p&gt;&lt;strong&gt;SHA-256 hashes content, not semantics.&lt;/strong&gt; If I rename a section header in a document, the hash changes and it re-embeds — even though the semantic content barely changed. That's probably the right behavior, but it's worth knowing.&lt;/p&gt;




&lt;h2&gt;Why I'm Writing About This&lt;/h2&gt;

&lt;p&gt;Because the RAG tutorials that got me started all ended at step 3. They showed me how to build something that works once, in a clean demo environment, with a static knowledge base.&lt;/p&gt;

&lt;p&gt;Real systems have messy, evolving data. If you're building anything beyond a proof-of-concept, you'll hit this problem. I spent a day thinking through the right approach before I wrote a line of code, and I think that day was worth it.&lt;/p&gt;

&lt;p&gt;GridMind is open source. If you're building something offline-first or resource-constrained, the indexer code is in the repo — feel free to use or adapt it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;GitHub → [&lt;a href="https://github.com/A-Square8/GRIDMIND-Intelligence-When-the-Grid-Fails" rel="noopener noreferrer"&gt;https://github.com/A-Square8/GRIDMIND-Intelligence-When-the-Grid-Fails&lt;/a&gt;] | LinkedIn → [&lt;a href="https://www.linkedin.com/in/ankit-ambasta-4a58002b9" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/ankit-ambasta-4a58002b9&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
