<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ANKIT AMBASTA</title>
    <description>The latest articles on DEV Community by ANKIT AMBASTA (@asquare8).</description>
    <link>https://dev.to/asquare8</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3927544%2F647e9276-d20d-401f-9a26-d17c1071cd8f.png</url>
      <title>DEV Community: ANKIT AMBASTA</title>
      <link>https://dev.to/asquare8</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/asquare8"/>
    <language>en</language>
    <item>
      <title>How I Built an AI That Detects When Your Appliances Are About to Break — Using Only a Smart Meter</title>
      <dc:creator>ANKIT AMBASTA</dc:creator>
      <pubDate>Wed, 13 May 2026 16:59:21 +0000</pubDate>
      <link>https://dev.to/asquare8/how-i-built-an-ai-that-detects-when-your-appliances-are-about-to-break-using-only-a-smart-meter-ohp</link>
      <guid>https://dev.to/asquare8/how-i-built-an-ai-that-detects-when-your-appliances-are-about-to-break-using-only-a-smart-meter-ohp</guid>
      <description>&lt;p&gt;When your microwave starts drawing 20% more power than usual, it may be on its way to failing. Your fridge running longer cycles than normal? Often a sign of compressor degradation. Most people find out only when the appliance dies completely, which is expensive, inconvenient, and often preventable.&lt;/p&gt;

&lt;p&gt;I built a system that catches this early. Using only the single power meter at your home's entrance.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With Existing NILM Systems
&lt;/h2&gt;

&lt;p&gt;Non-Intrusive Load Monitoring (NILM) lets you figure out which appliances are running and how much power they're using — without installing sensors on every device. Smart meter data only.&lt;/p&gt;

&lt;p&gt;Existing systems do this reasonably well. But they stop there.&lt;/p&gt;

&lt;p&gt;They tell you &lt;strong&gt;how much energy&lt;/strong&gt; your washing machine used. They don't tell you whether your washing machine is &lt;strong&gt;healthy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That gap bothered me. Appliances degrade slowly — motor wear, clogged filters, heating element deterioration. By the time you notice something's wrong, the damage is done.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I designed a two-stage pipeline called &lt;strong&gt;HNILM&lt;/strong&gt; (Health-aware NILM):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 1 — DBAN-ED (Energy Disaggregation)&lt;/strong&gt;&lt;br&gt;
A dual-branch 1D-CNN with multi-head attention that separates individual appliance power traces from the aggregate smart meter signal. Two parallel branches capture different temporal patterns — fast transients (microwave switching on) and slower cycles (dishwasher wash cycles). A 4-head attention layer then focuses on the most informative time steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 2 — VXGB-AD (Anomaly Detection)&lt;/strong&gt;&lt;br&gt;
An XGBoost classifier that takes each appliance activation cycle and grades its health into four levels: &lt;strong&gt;Normal, Low, Medium, High&lt;/strong&gt;. Instead of using raw power values, it computes 12 reference-anchored features — expressing each cycle relative to a healthy baseline. This makes it robust to household-specific usage patterns.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F92b72it19pegubeqp5o7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F92b72it19pegubeqp5o7.png" alt=" " width="800" height="254"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The whole thing runs on &lt;strong&gt;8-second smart meter data&lt;/strong&gt;, the interval at which the REFIT dataset was recorded. No lab-grade equipment needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Results on Real Data
&lt;/h2&gt;

&lt;p&gt;I evaluated on the public &lt;strong&gt;REFIT House 2 dataset&lt;/strong&gt; across five appliances: Dishwasher, Microwave, Kettle, Washing Machine, and Fridge.&lt;/p&gt;

&lt;h3&gt;
  
  
  Energy Disaggregation
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Appliance&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;F1 Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Microwave&lt;/td&gt;
&lt;td&gt;0.958&lt;/td&gt;
&lt;td&gt;0.923&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kettle&lt;/td&gt;
&lt;td&gt;0.951&lt;/td&gt;
&lt;td&gt;0.899&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dishwasher&lt;/td&gt;
&lt;td&gt;0.888&lt;/td&gt;
&lt;td&gt;0.823&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Washing Machine&lt;/td&gt;
&lt;td&gt;0.881&lt;/td&gt;
&lt;td&gt;0.768&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fridge&lt;/td&gt;
&lt;td&gt;0.737&lt;/td&gt;
&lt;td&gt;0.724&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Anomaly Detection
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Appliance&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;F1 Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Microwave&lt;/td&gt;
&lt;td&gt;0.977&lt;/td&gt;
&lt;td&gt;0.977&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fridge&lt;/td&gt;
&lt;td&gt;0.910&lt;/td&gt;
&lt;td&gt;0.914&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kettle&lt;/td&gt;
&lt;td&gt;0.851&lt;/td&gt;
&lt;td&gt;0.837&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dishwasher&lt;/td&gt;
&lt;td&gt;0.810&lt;/td&gt;
&lt;td&gt;0.809&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Washing Machine&lt;/td&gt;
&lt;td&gt;0.750&lt;/td&gt;
&lt;td&gt;0.756&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why F1 matters more here:&lt;/strong&gt; In anomaly detection, class imbalance is real: normal activations vastly outnumber faulty ones. A model that always predicts "Normal" would get high accuracy but zero usefulness. F1 balances precision and recall, so it penalizes both missed faults and false alarms. Accuracy and F1 being nearly identical for every appliance suggests the model is not simply exploiting class imbalance.&lt;/p&gt;
&lt;/blockquote&gt;
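
&lt;p&gt;To make the accuracy-versus-F1 point concrete, here is a small sketch on synthetic labels (not REFIT data). The 95/5 split and the helper names &lt;code&gt;accuracy&lt;/code&gt; and &lt;code&gt;f1_for_class&lt;/code&gt; are assumptions chosen for illustration.&lt;/p&gt;

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the class of interest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives means precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Synthetic, imbalanced labels: 95 healthy cycles and 5 faulty ones.
y_true = ["Normal"] * 95 + ["High"] * 5
# A lazy model that never flags anything.
y_pred = ["Normal"] * 100

print(accuracy(y_true, y_pred))              # 0.95
print(f1_for_class(y_true, y_pred, "High"))  # 0.0
```

&lt;p&gt;The lazy model scores 95% accuracy while catching none of the faults, which is exactly the failure that F1 on the fault class exposes.&lt;/p&gt;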

&lt;p&gt;The model outperforms CNN, LSTM, GRU, DTW, and Random Forest baselines on both tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Key Design Decisions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why dual-branch CNN?&lt;/strong&gt;&lt;br&gt;
Single kernel sizes miss either fast transients or slow cycles. Kernel size 3 catches sharp switching events. Kernel size 4 catches slower patterns. Concatenating both gives the attention layer richer features to work with.&lt;/p&gt;
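
&lt;p&gt;A toy illustration of the dual-branch idea in plain Python. This is not the DBAN-ED model: in the real network the kernels are learned, while the fixed &lt;code&gt;edge_kernel&lt;/code&gt; and &lt;code&gt;smooth_kernel&lt;/code&gt; below, the helper names, and the cropping step are assumptions made only to show two kernel widths producing one combined feature per time step.&lt;/p&gt;

```python
def conv1d_valid(signal, kernel):
    """Plain 1D cross-correlation with no padding."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def dual_branch_features(signal, kernel_a, kernel_b):
    """Run two branches with different kernel widths and pair their outputs per time step."""
    branch_a = conv1d_valid(signal, kernel_a)
    branch_b = conv1d_valid(signal, kernel_b)
    steps = min(len(branch_a), len(branch_b))  # crop to the shorter branch
    return [(branch_a[t], branch_b[t]) for t in range(steps)]


# Synthetic aggregate signal: an appliance switches on for three samples.
aggregate = [0.0, 0.0, 1000.0, 1000.0, 1000.0, 0.0, 0.0]
edge_kernel = [-1.0, 0.0, 1.0]            # width 3: responds to switching edges
smooth_kernel = [0.25, 0.25, 0.25, 0.25]  # width 4: averages over a longer window

features = dual_branch_features(aggregate, edge_kernel, smooth_kernel)
print(features[0])  # (1000.0, 500.0)
```

&lt;p&gt;Each tuple plays the role of the concatenated channels that a downstream attention layer would see.&lt;/p&gt;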

&lt;p&gt;&lt;strong&gt;Why XGBoost for anomaly detection instead of another neural network?&lt;/strong&gt;&lt;br&gt;
Interpretability and efficiency. XGBoost on 12 hand-crafted reference-anchored features trains in seconds, needs no GPU at inference, and gives you feature importances you can actually explain. A neural network here would be overkill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why reference-anchored features?&lt;/strong&gt;&lt;br&gt;
Raw power values vary between households — your microwave and my microwave have different baselines. By expressing every cycle relative to a global healthy mean, the classifier becomes household-agnostic. This is the single design choice that made anomaly detection actually work.&lt;/p&gt;
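
&lt;p&gt;A minimal sketch of what reference anchoring could look like. The article does not list the 12 features, so the four ratios, the &lt;code&gt;baseline&lt;/code&gt; keys, and the function name here are hypothetical stand-ins rather than the paper's feature set.&lt;/p&gt;

```python
from statistics import mean


def reference_anchored_features(cycle_watts, sample_seconds, baseline):
    """Express one activation cycle relative to a healthy baseline.

    `baseline` holds healthy reference values; its keys are hypothetical.
    """
    duration = len(cycle_watts) * sample_seconds
    energy_wh = sum(cycle_watts) * sample_seconds / 3600.0
    return {
        "mean_power_ratio": mean(cycle_watts) / baseline["mean_power"],
        "peak_power_ratio": max(cycle_watts) / baseline["peak_power"],
        "duration_ratio": duration / baseline["duration"],
        "energy_ratio": energy_wh / baseline["energy_wh"],
    }


# Made-up healthy reference: 1000 W average for 80 seconds.
healthy = {
    "mean_power": 1000.0,
    "peak_power": 1100.0,
    "duration": 80.0,
    "energy_wh": 1000.0 * 80.0 / 3600.0,
}
# Ten 8-second samples at 1200 W, i.e. 20% above the healthy mean.
cycle = [1200.0] * 10

features = reference_anchored_features(cycle, 8, healthy)
```

&lt;p&gt;Because every value is a ratio to the healthy reference, the classifier sees "20% above baseline" rather than "1200 W", which is what makes the features comparable across appliances with different absolute power levels.&lt;/p&gt;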




&lt;h2&gt;
  
  
  The Hardest Appliance: Fridge
&lt;/h2&gt;

&lt;p&gt;The Fridge was the hardest to disaggregate: rapid, low-amplitude compressor cycling looks like noise at 8-second sampling. It has the lowest disaggregation accuracy (0.737) and F1 (0.724) of the five appliances, and its MAE is higher than every other appliance's.&lt;/p&gt;

&lt;p&gt;But it's the second-easiest to detect anomalies in (0.910 accuracy, 0.914 F1). Why? Because compressor faults manifest as distinct duty-cycle changes — longer ON periods, shorter OFF periods — which the duration-ratio features capture cleanly.&lt;/p&gt;
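
&lt;p&gt;A rough sketch of a duty-cycle feature of the kind described above. The 50 W ON threshold, the synthetic traces, and the function name are assumptions for illustration, not values from the paper.&lt;/p&gt;

```python
def duty_cycle(power_watts, on_threshold=50.0):
    """Fraction of samples in which the appliance draws more than on_threshold watts."""
    on_samples = sum(1 for p in power_watts if p > on_threshold)
    return on_samples / len(power_watts)


# Synthetic compressor traces: ON 3 of 10 samples when healthy, 6 of 10 when degraded.
healthy_trace = [100.0] * 3 + [0.0] * 7
degraded_trace = [100.0] * 6 + [0.0] * 4

duty_ratio = duty_cycle(degraded_trace) / duty_cycle(healthy_trace)
print(round(duty_ratio, 2))  # 2.0
```

&lt;p&gt;A compressor that has to run twice as long to hold temperature shows up as a ratio of 2.0, even though the power level during each ON period is unchanged.&lt;/p&gt;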

&lt;p&gt;This is a useful insight: &lt;strong&gt;disaggregation difficulty and anomaly detection difficulty do not have to go together.&lt;/strong&gt; Some failure modes are easier to detect than others, regardless of how clean the disaggregated power trace is.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The biggest open problem is the Washing Machine — its multi-stage power profile (pre-wash, wash, rinse, spin) makes both disaggregation and health grading harder. Phase-aware features or transformer-based temporal modelling are the natural next step.&lt;/p&gt;

&lt;p&gt;Cross-building generalisation is the other open question — does a model trained on House 2 work on House 5? Real-world deployment depends on answering this.&lt;/p&gt;

&lt;p&gt;Edge deployment through model quantisation is also on the roadmap — the goal is running this entirely on a Raspberry Pi attached to your smart meter.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;p&gt;📄 &lt;strong&gt;Read the full paper:&lt;/strong&gt; &lt;a href="https://doi.org/10.5281/zenodo.20068858" rel="noopener noreferrer"&gt;https://doi.org/10.5281/zenodo.20068858&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💻 &lt;strong&gt;Code &amp;amp; implementation:&lt;/strong&gt; &lt;a href="https://github.com/A-Square8/A-HEALTH-AWARE-NILM-FRAMEWORK-FOR-PREDICTIVE-APPLIANCE-MAINTENANCE-IN-SMART-HOMES" rel="noopener noreferrer"&gt;https://github.com/A-Square8/A-HEALTH-AWARE-NILM-FRAMEWORK-FOR-PREDICTIVE-APPLIANCE-MAINTENANCE-IN-SMART-HOMES&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Working on smart home systems, energy management, or time-series ML? I'd love to hear your thoughts in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>iot</category>
    </item>
    <item>
      <title>The Fallback Pattern: How I Handle 15+ RPM (30,000 Tokens/Min) on Free AI Models</title>
      <dc:creator>ANKIT AMBASTA</dc:creator>
      <pubDate>Tue, 12 May 2026 16:57:48 +0000</pubDate>
      <link>https://dev.to/asquare8/the-fallback-pattern-how-i-handle-15-rpm-30000-tokensmin-on-free-ai-models-the-solution-4dig</link>
      <guid>https://dev.to/asquare8/the-fallback-pattern-how-i-handle-15-rpm-30000-tokensmin-on-free-ai-models-the-solution-4dig</guid>
      <description>&lt;p&gt;When I built &lt;strong&gt;VerdictAI X&lt;/strong&gt; — a high-end decision support system where five specialized AI agents debate your life choices — I ran into a massive architectural problem.&lt;/p&gt;

&lt;p&gt;Multi-agent systems do not just eat tokens; they completely destroy your rate limits.&lt;/p&gt;

&lt;p&gt;Most tutorials show you how to build a simple chatbot that makes one API call per user message. But what happens when you have a multi-agent orchestration pipeline that triggers &lt;strong&gt;21 simultaneous LLM calls&lt;/strong&gt; for a single button click?&lt;/p&gt;

&lt;p&gt;If you are using the free tier of Google AI Studio, you can hit &lt;code&gt;429 RESOURCE_EXHAUSTED&lt;/code&gt; errors almost immediately.&lt;/p&gt;

&lt;p&gt;The bottleneck is not the tokens. It is the &lt;strong&gt;RPM (Requests Per Minute)&lt;/strong&gt;. &lt;/p&gt;




&lt;h1&gt;
  
  
  The Math: Why RPM Kills Multi-Agent Systems
&lt;/h1&gt;

&lt;p&gt;VerdictAI X is not a standard chatbot; it is a multi-layered reasoning pipeline.&lt;/p&gt;

&lt;p&gt;When a user submits a dilemma, the system spins up five specialized agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Strategist&lt;/li&gt;
&lt;li&gt;The Guardian&lt;/li&gt;
&lt;li&gt;The Visionary&lt;/li&gt;
&lt;li&gt;The Humanist&lt;/li&gt;
&lt;li&gt;The Contrarian&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A single user query requires the following behind the scenes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Initial Analysis: 5 requests
Debate Round 1 (Challenge): 5 requests
Debate Round 2 (Defend &amp;amp; Challenge): 5 requests
Debate Round 3 (Defend): 5 requests
Final Verdict Synthesis: 1 request

Total = 21 LLM requests per user click
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That creates a real problem for free-tier usage, because the primary model may allow only around 15 RPM. One user query can already exceed that ceiling, even when token usage is still well under the TPM limit. &lt;/p&gt;
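
&lt;p&gt;The arithmetic, spelled out as a quick sketch. The 15 RPM figure is the approximate ceiling quoted above, not an official quota, and the variable names are illustrative.&lt;/p&gt;

```python
import math

# Four batches of five agent calls plus one synthesis call.
REQUESTS_PER_CLICK = 5 + 5 + 5 + 5 + 1
# Approximate free-tier ceiling quoted in the text (an assumption, not an official number).
RPM_LIMIT = 15

# Requests that cannot be served by one model inside the same minute.
overflow = max(0, REQUESTS_PER_CLICK - RPM_LIMIT)
# Minutes a single click would take if the calls were throttled to the limit instead.
minutes_if_throttled = math.ceil(REQUESTS_PER_CLICK / RPM_LIMIT)

print(REQUESTS_PER_CLICK, overflow, minutes_if_throttled)  # 21 6 2
```

&lt;p&gt;So one click leaves six requests with nowhere to go on a single model, or it stretches across two minutes if you simply wait.&lt;/p&gt;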




&lt;h1&gt;
  
  
  The Solution: Dynamic Fallback Queue
&lt;/h1&gt;

&lt;p&gt;Instead of hardcoding a single model, I built a &lt;strong&gt;fallback queue&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The idea was simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try the primary model first&lt;/li&gt;
&lt;li&gt;If it hits a rate limit, move to the next model&lt;/li&gt;
&lt;li&gt;Keep moving down the queue until one succeeds or every model is exhausted&lt;/li&gt;
&lt;li&gt;Show a small system notice in the UI when switching models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This way, the app can keep streaming responses instead of crashing on a 429 error. &lt;/p&gt;
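
&lt;p&gt;Stripped of any provider SDK, the pattern can be sketched like this. &lt;code&gt;RateLimitError&lt;/code&gt;, &lt;code&gt;call_with_fallback&lt;/code&gt;, &lt;code&gt;fake_call&lt;/code&gt;, and the model names are hypothetical stand-ins used only to show the control flow, with a stub in place of a real API call.&lt;/p&gt;

```python
class RateLimitError(Exception):
    """Stand-in for a provider's 429 / RESOURCE_EXHAUSTED error (hypothetical)."""


def call_with_fallback(models, call_model, prompt):
    """Try each model in order; return (model, reply) from the first that is not rate limited."""
    for model in models:
        try:
            return model, call_model(model, prompt)
        except RateLimitError:
            continue  # this model is out of quota, move down the queue
    raise RuntimeError("every model in the queue is rate limited")


def fake_call(model, prompt):
    """Stub that pretends the primary model is out of quota."""
    if model == "primary":
        raise RateLimitError(model)
    return f"{model} answered: {prompt}"


used, reply = call_with_fallback(["primary", "backup-a", "backup-b"], fake_call, "hello")
print(used)  # backup-a
```

&lt;p&gt;Only the rate-limit error triggers a switch here. Any other exception propagates to the caller, since a different model is unlikely to fix a malformed request or a bad API key.&lt;/p&gt;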




&lt;h1&gt;
  
  
  Core Failover Logic
&lt;/h1&gt;

&lt;p&gt;Here is the architecture powering the automatic model switching inside &lt;code&gt;gemini_client.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;FALLBACK_MODELS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.1-flash-lite-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma-4-31b-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma-4-26b-a4b-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_model_queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_pro&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Returns a list of models to try in order.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;primary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-pro&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;use_pro&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
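    &lt;span class="c1"&gt;# dict.fromkeys drops duplicates but keeps order, so the primary is never retried as a fallback&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromkeys&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;primary&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;FALLBACK_MODELS&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;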

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;use_pro&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Streams a response with automatic failover to fallback models.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;models_to_try&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_get_model_queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_pro&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;models_to_try&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_build_config_and_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;br&amp;gt;&amp;lt;span style=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;color:#fbbf24; font-size:10px;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;[System: Primary RPM limit reached. Switching to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...]&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;final_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;error_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;429&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;error_msg&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RESOURCE_EXHAUSTED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;models_to_try&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;continue&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;span style=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;color:#f43f5e; font-weight:600;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;System overloaded. All backup models are currently busy. Please try again in a few minutes.&amp;lt;/span&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# Non-rate-limit failures (500s, auth, bad request) are not fixed by switching models, so report once and stop&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;span style=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;color:#f43f5e; font-weight:600;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;The request failed. Please try again.&amp;lt;/span&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  What This Actually Bought Me
&lt;/h1&gt;

&lt;p&gt;When the primary model hits its RPM limit, &lt;code&gt;generate_stream()&lt;/code&gt; catches the &lt;code&gt;429&lt;/code&gt; error, skips to the next model, and retries the same prompt.&lt;/p&gt;

&lt;p&gt;Because the fallback happens inside the streaming loop, the UI can show a tiny notice like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[System: Primary RPM limit reached. Switching to gemma-4-31b-it...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user does not get an ugly error screen. They just keep seeing the response stream normally. &lt;/p&gt;




&lt;h1&gt;
  
  
  Why I Am Writing About This
&lt;/h1&gt;

&lt;p&gt;Most tutorials end at the point where one LLM call works.&lt;/p&gt;

&lt;p&gt;But if you want to build complex, multi-agent AI applications, &lt;strong&gt;Requests Per Minute&lt;/strong&gt; limits are one of the first real architectural hurdles you will face.&lt;/p&gt;

&lt;p&gt;You do not always need to upgrade to a paid tier immediately. Sometimes the better solution is to design your system to fail gracefully and take advantage of the available model ecosystem. &lt;/p&gt;




&lt;h1&gt;
  
  
  Project Links
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;GitHub (VerdictAI X repository): &lt;a href="https://github.com/A-Square8/VerdictAI-X" rel="noopener noreferrer"&gt;https://github.com/A-Square8/VerdictAI-X&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;LinkedIn (Ankit Ambasta): &lt;a href="https://www.linkedin.com/in/ankit-ambasta-4a58002b9/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/ankit-ambasta-4a58002b9/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>python</category>
    </item>
    <item>
      <title>Why I Used SHA-256 to Solve a Problem Most RAG Tutorials Pretend Doesn't Exist</title>
      <dc:creator>ANKIT AMBASTA</dc:creator>
      <pubDate>Tue, 12 May 2026 16:18:27 +0000</pubDate>
      <link>https://dev.to/asquare8/why-i-used-sha-256-to-solve-a-problem-most-rag-tutorials-pretend-doesnt-exist-2gbc</link>
      <guid>https://dev.to/asquare8/why-i-used-sha-256-to-solve-a-problem-most-rag-tutorials-pretend-doesnt-exist-2gbc</guid>
      <description>&lt;p&gt;When I built GridMind — a fully offline RAG assistant designed to run on CPU-only hardware with under 4 GB of RAM — I ran into a problem that no LangChain tutorial ever warned me about.&lt;/p&gt;

&lt;p&gt;GridMind is a knowledge base assistant designed to work when there's no internet, no GPU, and no cloud. Think disaster scenarios, remote areas, or a zombie apocalypse where the government is not coming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens when your knowledge base changes?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most RAG demos show you the happy path: chunk documents, embed them, store vectors, query. Done. But they quietly skip the part where your source documents get updated, corrected, or extended. Because if you follow the naive approach, the answer is painful: re-embed everything from scratch, every single time.&lt;/p&gt;

&lt;p&gt;For GridMind, that wasn't an option.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Constraints That Forced Me to Think
&lt;/h2&gt;

&lt;p&gt;GridMind's premise is that it works &lt;em&gt;when the grid fails&lt;/em&gt; — no internet, no GPU, no cloud. It runs on a Raspberry Pi class machine using &lt;code&gt;nomic-embed-text&lt;/code&gt; for embeddings and &lt;code&gt;qwen2.5:3b&lt;/code&gt; via Ollama for inference.&lt;/p&gt;

&lt;p&gt;Embedding is the expensive step. On CPU, embedding a full knowledge base across 8 survival domains (water, shelter, medical, navigation, etc.) takes minutes. Re-running that every time I updated a markdown file was a non-starter.&lt;/p&gt;

&lt;p&gt;I needed a way to know, cheaply and reliably, exactly which documents had changed since the last index run — and only re-embed those.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: SHA-256 as a Change Fingerprint
&lt;/h2&gt;

&lt;p&gt;The core idea is simple but I didn't see it written about clearly anywhere, so I'll spell it out.&lt;/p&gt;

&lt;p&gt;Before embedding any document, compute its SHA-256 hash and store it in a manifest kept next to the FAISS index (FAISS itself stores only vectors and their IDs). On the next indexing run, before calling the embedding model at all, hash the current file and compare it against the stored hash.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hash matches&lt;/strong&gt; → skip. The document hasn't changed. No embedding call made.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hash differs&lt;/strong&gt; → remove the document's old vectors, re-embed, and update the stored hash.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New file (no hash stored)&lt;/strong&gt; → embed fresh and store the hash.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File deleted&lt;/strong&gt; → remove its vectors from the index.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hash_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;sha256&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;iter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reading in 8 KB chunks matters — it keeps memory flat even for large documents.&lt;/p&gt;
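
&lt;p&gt;The four cases in the list above can be sketched as a planning step that runs before any embedding call. This is a simplified illustration, not GridMind's actual code: &lt;code&gt;plan_reindex&lt;/code&gt; and &lt;code&gt;sha256_of&lt;/code&gt; are hypothetical names, and files are passed as in-memory bytes so the example runs without touching disk.&lt;/p&gt;

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def plan_reindex(current_files, manifest):
    """Sort documents into the four cases: unchanged, changed, new, deleted.

    current_files: path -> file bytes as they are now
    manifest: path -> hash stored at the last index run
    """
    plan = {"unchanged": [], "changed": [], "new": [], "deleted": []}
    for path, data in current_files.items():
        if path not in manifest:
            plan["new"].append(path)
        elif manifest[path] == sha256_of(data):
            plan["unchanged"].append(path)
        else:
            plan["changed"].append(path)
    # Anything the manifest remembers that no longer exists must be dropped from the index.
    plan["deleted"] = [p for p in manifest if p not in current_files]
    return plan


manifest = {
    "water.md": sha256_of(b"boil for one minute"),
    "shelter.md": sha256_of(b"stay dry"),
    "old.md": sha256_of(b"obsolete"),
}
current = {
    "water.md": b"boil for one minute",
    "shelter.md": b"stay dry and warm",
    "fire.md": b"keep tinder dry",
}

plan = plan_reindex(current, manifest)
print(plan["changed"])  # ['shelter.md']
```

&lt;p&gt;Only the paths under &lt;code&gt;changed&lt;/code&gt; and &lt;code&gt;new&lt;/code&gt; ever reach the embedding model, which is where the time savings come from.&lt;/p&gt;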




&lt;h2&gt;
  
  
  Why SHA-256 Specifically?
&lt;/h2&gt;

&lt;p&gt;A few alternatives I considered:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;File modification timestamps (&lt;code&gt;mtime&lt;/code&gt;)&lt;/strong&gt; are fast but unreliable. Copying a file, running a deployment script, or touching a file can change &lt;code&gt;mtime&lt;/code&gt; without changing content, so you'd re-embed files that didn't need it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;File size&lt;/strong&gt; is even faster and even less reliable. Replacing one character with another in a 10 KB file changes the content but not the size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MD5&lt;/strong&gt; would work fine here. SHA-256 is marginally slower, but the difference at this scale is microseconds. I chose SHA-256 because it is the hash I normally reach for, and its stronger collision resistance, while overkill for this use case, costs nothing.&lt;/p&gt;
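
&lt;p&gt;A quick check of the file-size weakness, using two made-up byte strings in place of real files. The strings are illustrative only.&lt;/p&gt;

```python
import hashlib

original = b"Boil water for 1 minute."
edited = b"Boil water for 3 minute."  # one character replaced, same length

same_size = len(original) == len(edited)
same_hash = hashlib.sha256(original).digest() == hashlib.sha256(edited).digest()

print(same_size, same_hash)  # True False
```

&lt;p&gt;A size comparison would call these two identical. The hash comparison does not.&lt;/p&gt;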




&lt;h2&gt;
  
  
  The Index Store Structure
&lt;/h2&gt;

&lt;p&gt;I kept a simple JSON manifest alongside the FAISS index:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"documents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"data/water/purification.md"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"hash"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"a3f5c2d1..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"vector_ids"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"indexed_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-11-14T10:22:00"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"data/medical/wound-care.md"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"hash"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"9b8e1f44..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"vector_ids"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"indexed_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-11-14T10:22:01"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tracking &lt;code&gt;vector_ids&lt;/code&gt; per document is what makes deletion and update clean — when a file changes, you know exactly which FAISS vectors to remove before inserting the new ones.&lt;/p&gt;
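
&lt;p&gt;One way to use a manifest like this for change detection, as a sketch rather than GridMind's actual code: compare the stored hash for each path with a freshly computed one, then collect the files to re-embed and the vector IDs to drop. The function name &lt;code&gt;plan_update&lt;/code&gt;, the placeholder hash strings, and the third file path are assumptions for illustration.&lt;/p&gt;

```python
def plan_update(manifest, current_hashes):
    """Decide what to re-embed and which old vectors to remove.

    manifest: dict shaped like the JSON manifest above.
    current_hashes: dict mapping file path to its freshly computed hash.
    Returns (paths_to_embed, stale_vector_ids).
    """
    documents = manifest["documents"]
    paths_to_embed = []
    stale_vector_ids = []

    for path, digest in current_hashes.items():
        entry = documents.get(path)
        if entry is None:
            # New file: nothing to remove, just embed it.
            paths_to_embed.append(path)
        elif entry["hash"] != digest:
            # Changed file: drop its old vectors, then re-embed.
            stale_vector_ids.extend(entry["vector_ids"])
            paths_to_embed.append(path)

    for path, entry in documents.items():
        if path not in current_hashes:
            # Deleted file: its vectors are stale too.
            stale_vector_ids.extend(entry["vector_ids"])

    return paths_to_embed, stale_vector_ids


# Placeholder hashes, not real SHA-256 digests.
manifest = {
    "documents": {
        "data/water/purification.md": {"hash": "hash-a", "vector_ids": [0, 1, 2, 3]},
        "data/medical/wound-care.md": {"hash": "hash-b", "vector_ids": [4, 5, 6]},
    }
}
current_hashes = {
    "data/water/purification.md": "hash-a",  # unchanged
    "data/medical/wound-care.md": "hash-c",  # edited
    "data/shelter/tarps.md": "hash-d",       # hypothetical new file
}

paths_to_embed, stale_vector_ids = plan_update(manifest, current_hashes)
print(paths_to_embed)
print(stale_vector_ids)
```

&lt;p&gt;Unchanged files never reach the embedding step, which is where the time savings come from.&lt;/p&gt;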




&lt;h2&gt;
  
  
  What This Actually Bought Me
&lt;/h2&gt;

&lt;p&gt;On a knowledge base update where I corrected two Markdown files and added a new one, the indexer processed 3 files instead of 47. Embedding time dropped from ~6 minutes to ~40 seconds on the test machine.&lt;/p&gt;

&lt;p&gt;More importantly, it made iteration &lt;em&gt;feel&lt;/em&gt; fast. When you're building a local-first tool and testing knowledge base changes, waiting 6 minutes per cycle kills momentum. 40 seconds doesn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Honest Limitations
&lt;/h2&gt;

&lt;p&gt;This approach has real tradeoffs I want to be upfront about:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FAISS has only limited support for deletion.&lt;/strong&gt; Some index types implement &lt;code&gt;remove_ids&lt;/code&gt;, but others do not. To "remove" old vectors, I rebuild the index from the non-deleted vectors. For 47 documents this is fast. At 10,000 documents it would become the bottleneck. A production system would reach for something like Qdrant or Weaviate, which support vector-level deletes natively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The manifest is a single JSON file with no locking.&lt;/strong&gt; If two indexing processes ran simultaneously (they don't in GridMind, but still), one could overwrite the other's changes or leave a half-written file behind. A proper solution uses SQLite or file-level locking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SHA-256 hashes content, not semantics.&lt;/strong&gt; If I rename a section header in a document, the hash changes and it re-embeds — even though the semantic content barely changed. That's probably the right behavior, but it's worth knowing.&lt;/p&gt;
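
&lt;p&gt;For the manifest-locking limitation, here is a rough sketch of what an SQLite-backed manifest could look like, using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. The table layout, function names, and sample values are assumptions, not GridMind's implementation. SQLite takes care of file locking, and each write runs inside a transaction.&lt;/p&gt;

```python
import json
import sqlite3


def open_manifest(db_path):
    """Open (or create) a manifest database with one row per document."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " path TEXT PRIMARY KEY,"
        " hash TEXT NOT NULL,"
        " vector_ids TEXT NOT NULL,"  # JSON-encoded list of integers
        " indexed_at TEXT NOT NULL)"
    )
    return conn


def record_document(conn, path, digest, vector_ids, indexed_at):
    """Insert or overwrite the manifest row for one document."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO documents VALUES (?, ?, ?, ?)",
            (path, digest, json.dumps(vector_ids), indexed_at),
        )


def stored_hash(conn, path):
    """Return the stored hash for a path, or None if it is not indexed."""
    row = conn.execute(
        "SELECT hash FROM documents WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else None


# In-memory database for the example; a real manifest would use a file path.
conn = open_manifest(":memory:")
record_document(conn, "data/water/purification.md", "hash-v1", [0, 1, 2, 3],
                "2024-11-14T10:22:00")
# Re-indexing the same path replaces the earlier row.
record_document(conn, "data/water/purification.md", "hash-v2", [7, 8, 9],
                "2024-11-15T09:00:00")
print(stored_hash(conn, "data/water/purification.md"))
```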




&lt;h2&gt;
  
  
  Why I'm Writing About This
&lt;/h2&gt;

&lt;p&gt;Because the RAG tutorials that got me started all ended at step 3. They showed me how to build something that works once, in a clean demo environment, with a static knowledge base.&lt;/p&gt;

&lt;p&gt;Real systems have messy, evolving data. If you're building anything beyond a proof-of-concept, you'll hit this problem. I spent a day thinking through the right approach before I wrote a line of code, and I think that day was worth it.&lt;/p&gt;

&lt;p&gt;GridMind is open source. If you're building something offline-first or resource-constrained, the indexer code is in the repo — feel free to use or adapt it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;GitHub → &lt;a href="https://github.com/A-Square8/GRIDMIND-Intelligence-When-the-Grid-Fails" rel="noopener noreferrer"&gt;https://github.com/A-Square8/GRIDMIND-Intelligence-When-the-Grid-Fails&lt;/a&gt; | LinkedIn → &lt;a href="https://www.linkedin.com/in/ankit-ambasta-4a58002b9" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/ankit-ambasta-4a58002b9&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
