<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ZhengZhiCong</title>
    <description>The latest articles on DEV Community by ZhengZhiCong (@mickey_zzc).</description>
    <link>https://dev.to/mickey_zzc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3972064%2Fd8a4b5b3-f282-4858-b6f6-9f246118a9b0.png</url>
      <title>DEV Community: ZhengZhiCong</title>
      <link>https://dev.to/mickey_zzc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mickey_zzc"/>
    <language>en</language>
    <item>
      <title>VictoriaMetrics Stream Aggregation: A Three-Year Retrospective (2026)</title>
      <dc:creator>ZhengZhiCong</dc:creator>
      <pubDate>Sun, 07 Jun 2026 22:56:28 +0000</pubDate>
      <link>https://dev.to/mickey_zzc/victoriametrics-stream-aggregation-a-three-year-retrospective-2026-2ojg</link>
      <guid>https://dev.to/mickey_zzc/victoriametrics-stream-aggregation-a-three-year-retrospective-2026-2ojg</guid>
      <description>&lt;p&gt;It's been exactly three years since the &lt;a href="https://blog.mickeyzzc.tech/en/posts/telemetry/stream-metrics-one/" rel="noopener noreferrer"&gt;first article&lt;/a&gt; in this series was published in March 2023. The VictoriaMetrics ecosystem has changed dramatically since then. Let's revisit the problems we laid out, see what the official project has resolved, and assess where our custom &lt;code&gt;stream-metrics-route&lt;/code&gt; gateway stands today.&lt;/p&gt;




&lt;h2&gt;
  
  
  I. The Problems We Identified in 2023
&lt;/h2&gt;

&lt;p&gt;Here's a quick recap of the core issues from the original post:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Description (2023)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;P1&lt;/td&gt;
&lt;td&gt;Collection gap inflation&lt;/td&gt;
&lt;td&gt;Network jitter or performance issues cause time gaps that inflate stream aggregation deltas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P2&lt;/td&gt;
&lt;td&gt;Single-node compute limits&lt;/td&gt;
&lt;td&gt;Stream aggregation keeps no historical state, so it is fast, but a single instance becomes the bottleneck&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P3&lt;/td&gt;
&lt;td&gt;Distributed task allocation&lt;/td&gt;
&lt;td&gt;Which compute node should each sample be assigned to?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P4&lt;/td&gt;
&lt;td&gt;Out-of-order discarding for same-dimension metrics&lt;/td&gt;
&lt;td&gt;When multiple nodes compute the same dimension over different time windows, later values are discarded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P5&lt;/td&gt;
&lt;td&gt;Resource balancing&lt;/td&gt;
&lt;td&gt;Uneven load across distributed compute nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P6&lt;/td&gt;
&lt;td&gt;Task ID dimension explosion&lt;/td&gt;
&lt;td&gt;Stream aggregation inserts node IDs into aggregated time series — the label cardinality grows with every node you add&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;To address these, we built &lt;a href="https://github.com/Mi-Bee-Studio/stream-metrics-route" rel="noopener noreferrer"&gt;&lt;code&gt;stream-metrics-route&lt;/code&gt;&lt;/a&gt;, a Go-based distributed stream aggregation gateway.&lt;/p&gt;




&lt;h2&gt;
  
  
  II. Three Years Later — What the Official Project Has Done
&lt;/h2&gt;

&lt;p&gt;I reviewed VictoriaMetrics changelogs from v1.86 through v1.138.0 and the official documentation. Here's the scorecard.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ Resolved or Mitigated
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Issues P3, P5: Distributed Task Allocation &amp;amp; Resource Balancing
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Official solution:&lt;/strong&gt; vmagent now natively supports &lt;code&gt;-remoteWrite.shardByURL&lt;/code&gt; with &lt;strong&gt;consistent-hash&lt;/strong&gt; sharding.&lt;/p&gt;

&lt;p&gt;Basic &lt;code&gt;shardByURL&lt;/code&gt; support arrived in v1.86. &lt;strong&gt;v1.138.0 (March 2026)&lt;/strong&gt; was the real milestone: it switched the data distribution algorithm from round-robin to &lt;strong&gt;consistent hashing&lt;/strong&gt;, which greatly reduces how much data has to be redistributed when nodes are added or removed.&lt;/p&gt;

&lt;p&gt;The architecture evolution looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────┐     ┌─────────────────┐
│ Prometheus      │     │ Prometheus      │
│ Agent 1         │     │ Agent 2         │
└────────┬────────┘     └────────┬────────┘
         │ remote write          │ remote write
         ▼                       ▼
┌──────────────────────────────────────────┐
│           vmagent Cluster                │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐│
│  │vmagent-0 │  │vmagent-1 │  │vmagent-2 ││
│  └─────┬────┘  └─────┬────┘  └─────┬────┘│
│        │             │             │     │
│        └─────────────┼─────────────┘     │
│                      ▼                   │
│           ┌──────────────────┐           │
│           │ Consistent Hash  │           │
│           └────────┬─────────┘           │
└────────────────────┼─────────────────────┘
                     │ shard by hash
         ┌───────────┼───────────┐
         ▼           ▼           ▼
   ┌──────────┐ ┌──────────┐ ┌──────────┐
   │vmstorage │ │vmstorage │ │vmstorage │
   │    -0    │ │    -1    │ │    -2    │
   └──────────┘ └──────────┘ └──────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The VictoriaMetrics blog also gives recommendations on which sharding algorithm to choose. When vmagent is deployed with the VictoriaMetrics Operator, the number of shards is controlled by the &lt;code&gt;shardCount&lt;/code&gt; field.&lt;/p&gt;
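&lt;p&gt;The following minimal Go sketch makes the redistribution argument concrete. It is not vmagent's code. It compares naive hash-mod placement with a consistent-hash ring when a fourth storage target joins three existing ones. The FNV-1a hash, the 100 virtual nodes per target, and the target names &lt;code&gt;vm-0&lt;/code&gt; to &lt;code&gt;vm-3&lt;/code&gt; are illustrative assumptions. Hash-mod stands in for a generic non-consistent placement scheme.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"fmt"
	"hash/fnv"
	"slices"
	"strconv"
)

// hash32 returns the 32-bit FNV-1a hash of s.
func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// ring is a minimal consistent-hash ring with virtual nodes.
type ring struct {
	points []uint32          // sorted positions on the ring
	owner  map[uint32]string // ring position → target name
}

func newRing(targets []string, vnodes int) ring {
	r := ring{owner: map[uint32]string{}}
	for _, t := range targets {
		for v := range vnodes {
			p := hash32(t + "#" + strconv.Itoa(v))
			r.points = append(r.points, p)
			r.owner[p] = t
		}
	}
	slices.Sort(r.points)
	return r
}

// lookup returns the target owning the first ring position at or after key's hash.
func (r ring) lookup(key string) string {
	pos, _ := slices.BinarySearch(r.points, hash32(key))
	if pos == len(r.points) {
		pos = 0 // wrap around to the start of the ring
	}
	return r.owner[r.points[pos]]
}

// movedFractions reports which share of n series changes target when a
// fourth target joins, under hash-mod and under the ring.
func movedFractions(n int) (modMoved, ringMoved float64) {
	before := newRing([]string{"vm-0", "vm-1", "vm-2"}, 100)
	after := newRing([]string{"vm-0", "vm-1", "vm-2", "vm-3"}, 100)
	for i := range n {
		key := "series-" + strconv.Itoa(i)
		if h := hash32(key); h%3 != h%4 {
			modMoved++
		}
		if before.lookup(key) != after.lookup(key) {
			ringMoved++
		}
	}
	return modMoved / float64(n), ringMoved / float64(n)
}

func main() {
	m, r := movedFractions(100000)
	fmt.Printf("hash-mod moved %.1f%% of series, ring moved %.1f%%\n", m*100, r*100)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With uniformly distributed hashes, going from three to four targets under hash-mod relocates about three quarters of all series. The ring only moves the series that the new target takes over, roughly one quarter. Fewer moved series means less churn in per-target aggregation state while scaling.&lt;/p&gt;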

&lt;h4&gt;
  
  
  Issue P2: Single-Node Compute Scaling
&lt;/h4&gt;

&lt;p&gt;vmagent now supports horizontal scaling natively with &lt;code&gt;replicas&lt;/code&gt; + &lt;code&gt;shardCount&lt;/code&gt;, including HA support — see &lt;a href="https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5573" rel="noopener noreferrer"&gt;Issue #5573&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Out-of-Order / Delayed Data Accuracy (P1 — Partial Mitigation)
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;v1.112.0 (February 2025)&lt;/strong&gt; was a key release, adding &lt;strong&gt;Aggregation Windows&lt;/strong&gt; — dual-window buffering for histogram and rate calculations. Instead of flushing immediately, the output is delayed by a &lt;code&gt;samples_lag&lt;/code&gt; window, which significantly improves accuracy for late-arriving data. The tradeoff: roughly doubled memory usage (maintaining two aggregation windows simultaneously).&lt;/p&gt;

&lt;p&gt;How it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Collector          vmagent           VictoriaMetrics
   │                  │                    │
   │  sample1 @T0     │                    │
   ├─────────────────►│ Write to           │
   │                  │ Window A (current) │
   │                  │                    │
   │  sample2 @T1     │                    │
   │  (delayed)       │                    │
   ├─────────────────►│ Write to           │
   │                  │ Window B (previous)│
   │                  │                    │
   │                  │ Aggr result A @T2  │
   │                  ├───────────────────►│
   │                  │ Aggr result B @T3  │
   │                  ├───────────────────►│
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the official docs on &lt;a href="https://docs.victoriametrics.com/stream-aggregation/#aggregation-windows" rel="noopener noreferrer"&gt;streaming aggregation windows&lt;/a&gt;.&lt;/p&gt;
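&lt;p&gt;The Go sketch below models only the dual-window idea and is not VictoriaMetrics' implementation. Each sample is bucketed by its own timestamp, and a window is emitted one interval late so that stragglers can still land in it. The type &lt;code&gt;windowedSum&lt;/code&gt; and its methods are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import "fmt"

// windowedSum is a toy model of dual-window buffering: each sample is bucketed
// by its own timestamp, and a window is emitted one interval late.
type windowedSum struct {
	intervalSecs int64
	cur          int64             // index of the newest, still-open window
	sums         map[int64]float64 // window index → running sum
}

// push adds v to the window containing ts. Samples for the current window or
// the one just before it are accepted; anything else is dropped.
func (w *windowedSum) push(ts int64, v float64) bool {
	idx := ts / w.intervalSecs
	if idx == w.cur || idx == w.cur-1 {
		w.sums[idx] += v
		return true
	}
	return false
}

// flush emits the previous window, which has had a full extra interval to
// collect late samples, and then opens the next window.
func (w *windowedSum) flush() (int64, float64) {
	prev := w.cur - 1
	sum := w.sums[prev]
	delete(w.sums, prev)
	w.cur++
	return prev, sum
}

func main() {
	w := windowedSum{intervalSecs: 60, cur: 10, sums: map[int64]float64{}}
	w.push(605, 1)         // window 10, the current one
	w.push(590, 2)         // window 9: late, but still accepted
	w.push(530, 5)         // window 8: too late, dropped
	fmt.Println(w.flush()) // prints "9 2"
	w.push(650, 3)         // window 10 is now the previous window, so this still counts
	fmt.Println(w.flush()) // prints "10 4"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The extra memory mentioned above comes from keeping the previous window open alongside the current one. At any moment, two sets of aggregation state are alive.&lt;/p&gt;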

&lt;h3&gt;
  
  
  ❌ Still Unresolved
&lt;/h3&gt;

&lt;h4&gt;
  
  
  True Distributed Stream Aggregation Coordination
&lt;/h4&gt;

&lt;p&gt;vmagent's stream aggregation is &lt;strong&gt;single-instance&lt;/strong&gt;. There is no coordination mechanism between instances — if two vmagent instances aggregate the same metric, you get duplicate or conflicting output. The official recommendation is to use &lt;code&gt;without&lt;/code&gt;/&lt;code&gt;by&lt;/code&gt; label clauses to divide responsibility between instances, rather than providing a cross-instance coordination protocol.&lt;/p&gt;

&lt;h4&gt;
  
  
  Task ID Dimension Explosion (P6)
&lt;/h4&gt;

&lt;p&gt;Official vmagent still inserts internal labels (such as &lt;code&gt;_aggr&lt;/code&gt;-related labels) into aggregated time series. However, it has no equivalent of pre-marking samples with a &lt;code&gt;stream_task_id&lt;/code&gt; to keep that dimension's cardinality under control.&lt;/p&gt;




&lt;h2&gt;
  
  
  III. stream-metrics-route: Current Status and Value
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Core Code Architecture
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;router.go&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Routing core — filters metrics based on relabel rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;remotecluster.go&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Dual hashmod scheduling core&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;remotewrite.go&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Remote write HTTP client&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;kafka.go&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Kafka producer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Core Algorithm (from &lt;code&gt;remotecluster.go&lt;/code&gt;)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Dual hashmod scheduling&lt;/span&gt;
&lt;span class="n"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sortLabelsHashKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Labels&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dime&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;hashMod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dimension&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// First hashmod → task partition ID&lt;/span&gt;

&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Labels&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Label&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;"stream_task_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;strconv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Itoa&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dime&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c"&gt;// Insert stream_task_id label&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;hashnode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sortLabelsHashKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filterLabels&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// Second hashmod → node selection&lt;/span&gt;
&lt;span class="n"&gt;tmpch&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;hashMod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uplen&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hashnode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c"&gt;// Which backend writer to send to&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
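&lt;p&gt;The excerpt above relies on helpers that aren't shown. Below is a self-contained Go sketch of the same two-step idea, with hypothetical re-implementations of &lt;code&gt;sortLabelsHashKey&lt;/code&gt; and &lt;code&gt;hashMod&lt;/code&gt; (FNV-1a over name-sorted labels). The real contents of &lt;code&gt;filterLabels&lt;/code&gt; aren't shown, so the sketch assumes a &lt;code&gt;keep&lt;/code&gt; set of label names that includes &lt;code&gt;stream_task_id&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"fmt"
	"hash/fnv"
	"slices"
	"strconv"
	"strings"
)

// Label mirrors a Prometheus label pair (prompb.Label in the excerpt above).
type Label struct{ Name, Value string }

// sortLabelsHashKey (hypothetical) hashes labels in name order, so a series
// hashes the same way regardless of the order its labels arrive in.
func sortLabelsHashKey(labels []Label) uint64 {
	sorted := slices.Clone(labels)
	slices.SortFunc(sorted, func(a, b Label) int { return strings.Compare(a.Name, b.Name) })
	h := fnv.New64a()
	for _, l := range sorted {
		h.Write([]byte(l.Name + "\xff" + l.Value + "\xff"))
	}
	return h.Sum64()
}

// hashMod (hypothetical) maps a hash onto n buckets.
func hashMod(n int, h uint64) int { return int(h % uint64(n)) }

// route applies both hashmod steps. The first picks a task ID in [0, dimension)
// from the full label set; the second picks a backend from the labels named in keep.
func route(labels []Label, dimension, upstreams int, keep map[string]bool) ([]Label, int) {
	taskID := hashMod(dimension, sortLabelsHashKey(labels))
	out := append(slices.Clone(labels), Label{Name: "stream_task_id", Value: strconv.Itoa(taskID)})

	var filtered []Label
	for _, l := range out {
		if keep[l.Name] {
			filtered = append(filtered, l)
		}
	}
	return out, hashMod(upstreams, sortLabelsHashKey(filtered))
}

func main() {
	// Assumption: the second hash uses only the metric name, job and task ID.
	keep := map[string]bool{"__name__": true, "job": true, "stream_task_id": true}
	series := []Label{{"__name__", "http_requests_total"}, {"job", "api"}, {"instance", "10.0.0.1:9100"}}
	out, backend := route(series, 8, 3, keep)
	fmt.Println("labels:", out, "backend:", backend)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under these assumptions, the first hash pins every input series to one of &lt;code&gt;dimension&lt;/code&gt; task IDs. As a result, the extra label's cardinality is bounded by &lt;code&gt;dimension&lt;/code&gt; rather than by the number of nodes. The second hash ignores per-target labels such as &lt;code&gt;instance&lt;/code&gt;, so all series that share the kept labels and a task ID reach the same backend.&lt;/p&gt;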



&lt;h3&gt;
  
  
  Is It Still Needed in 2026?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Yes — but with an adjusted role.&lt;/strong&gt; The positioning should shift from "full stream aggregation gateway" to &lt;strong&gt;"metric distribution routing gateway + Kafka integration layer."&lt;/strong&gt; The core differentiated value:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dual hashmod scheduling + &lt;code&gt;stream_task_id&lt;/code&gt; pre-injection&lt;/strong&gt; — tags metrics at the gateway layer, so all downstream nodes route consistently by this ID. This solves dimension control earlier in the pipeline than the official approach.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-backend async distribution&lt;/strong&gt; — supports async distribution to both Kafka and remote write, solving the "synchronous forwarding blocks the time window" problem from the original post.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Native Prometheus relabeling integration&lt;/strong&gt; — works with standard Prometheus relabel configs, no custom syntax to learn.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  IV. Recommended 2026 Hybrid Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────┐   ┌──────────────────┐
│  Prometheus     │   │  Business System │
│  Agent Cluster  │   │  Metrics (Kafka) │
└────────┬────────┘   └────────┬─────────┘
         │                     │
         ▼                     ▼
    ┌──────────────────────────────┐
    │   stream-metrics-route       │
    │   (Routing Layer)            │
    │   - Dual hashmod scheduling  │
    │   - stream_task_id injection │
    │   - Relabeling               │
    └──────┬──────┬──────┬─────────┘
           │      │      │
    task=0 │task=1│task=2│
           ▼      ▼      ▼
    ┌─────────────────────────┐
    │   vmagent Cluster       │
    │  (v1.112.0+ with        │
    │   aggregation windows)  │
    └──────────┬──────────────┘
               │
               ▼
    ┌──────────────────┐      ┌────────────┐
    │  VictoriaMetrics │      │   Kafka    │
    │  (Storage)       │      │   (Topic)  │
    └──────────┬───────┘      └────────────┘
               │
               ▼
    ┌──────────────────┐
    │  vmalert         │
    │  Grafana         │
    └──────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Configuration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;vmagent version requirement:&lt;/strong&gt; &amp;gt;= v1.112.0, with aggregation windows enabled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# stream aggregation config&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http_request_duration_seconds_bucket'&lt;/span&gt;
  &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
  &lt;span class="na"&gt;without&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;instance&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;enable_windows&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c1"&gt;# Critical! Enables dual-window buffering&lt;/span&gt;
  &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;rate_sum&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  V. Evolution Recommendations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Short-term
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Upgrade vmagent to &amp;gt;= v1.112.0&lt;/td&gt;
&lt;td&gt;Enable &lt;code&gt;enable_windows: true&lt;/code&gt; to improve histogram aggregation accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Evaluate stream-metrics-route necessity&lt;/td&gt;
&lt;td&gt;If you need neither Kafka integration nor &lt;code&gt;stream_task_id&lt;/code&gt;-based cardinality control, consider migrating away from it&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Medium-term
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Retain stream-metrics-route as front-end routing only&lt;/td&gt;
&lt;td&gt;Keep hashmod task allocation + Kafka distribution; remove aggregation responsibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disable raw metric persistence&lt;/td&gt;
&lt;td&gt;Write only stream-aggregated results to storage to reduce volume&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add metadata management module&lt;/td&gt;
&lt;td&gt;The &lt;code&gt;ruler-handle-process&lt;/code&gt; from the original post (recording rules generated dynamically per dimension) is worth building or contributing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Long-term
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Contribute &lt;code&gt;stream_task_id&lt;/code&gt; dimension control upstream&lt;/td&gt;
&lt;td&gt;Once the dual hashmod design has proven itself in production&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Improve monitoring metrics&lt;/td&gt;
&lt;td&gt;Add business-level metrics — queue depth per routing rule, distribution latency, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Putting It All Together
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Assessment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Problem resolution rate&lt;/td&gt;
&lt;td&gt;~50% — 2 of 4 core problems resolved via official upgrades; 2 still need custom solutions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Is stream-metrics-route still needed?&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Yes&lt;/strong&gt; — repositioned as "metric distribution routing gateway + Kafka integration layer"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recommended architecture&lt;/td&gt;
&lt;td&gt;Prometheus → stream-metrics-route → vmagent v1.112.0+ → VictoriaMetrics Storage&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three years is a long time in the observability space. The VictoriaMetrics ecosystem has matured significantly — consistent hashing, aggregation windows, and native sharding all address problems that required custom tooling in 2023. But the hard problems around true &lt;em&gt;distributed&lt;/em&gt; stream aggregation coordination and dimension control at the gateway layer remain open.&lt;/p&gt;

&lt;p&gt;If you're running a similar stack, the hybrid approach — letting the official project handle what it's good at (single-node aggregation, storage) while keeping custom routing for what it isn't (distributed coordination, dimension pre-injection, Kafka bridging) — has proven to be the right call for us.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://blog.mickeyzzc.tech/en/posts/telemetry/stream-metrics-two/" rel="noopener noreferrer"&gt;my blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>devops</category>
      <category>monitoring</category>
      <category>performance</category>
    </item>
    <item>
      <title>Scaling VictoriaMetrics Stream Aggregation: A Deep Dive into Distributed Design (2023)</title>
      <dc:creator>ZhengZhiCong</dc:creator>
      <pubDate>Sun, 07 Jun 2026 22:50:31 +0000</pubDate>
      <link>https://dev.to/mickey_zzc/scaling-victoriametrics-stream-aggregation-a-deep-dive-into-distributed-design-629</link>
      <guid>https://dev.to/mickey_zzc/scaling-victoriametrics-stream-aggregation-a-deep-dive-into-distributed-design-629</guid>
      <description>&lt;p&gt;VictoriaMetrics' stream aggregation is a powerful feature for reducing metric cardinality in real time. But when you push it to millions of time series across a distributed fleet, the native implementation starts to show cracks.&lt;/p&gt;

&lt;p&gt;This post walks through what I found analyzing the stream aggregation source code, the real-world problems that emerged at scale, and the distributed gateway we built to solve them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Community VM Stream Aggregation — Capability Analysis
&lt;/h2&gt;

&lt;p&gt;Stream aggregation was integrated into &lt;code&gt;vmagent&lt;/code&gt; starting from version 1.86 (&lt;a href="https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3460" rel="noopener noreferrer"&gt;GitHub issue #3460&lt;/a&gt;). Let's look at what it actually does under the hood.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Computation: The &lt;code&gt;pushSample&lt;/code&gt; Function
&lt;/h3&gt;

&lt;p&gt;The heart of stream aggregation lives in the &lt;code&gt;pushSample&lt;/code&gt; function. Here's the simplified logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;totalAggrState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;pushSample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outputKey&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;currentTime&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fasttime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnixTimestamp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;deleteDeadline&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;currentTime&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intervalSecs&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intervalSecs&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;again&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;totalStateValue&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;lastValues&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;lastValueState&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;vNew&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loaded&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LoadOrStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;loaded&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vNew&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;sv&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;totalStateValue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;deleted&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deleted&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;deleted&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;lv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lastValues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;inputKey&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;lv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;lastValueState&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
            &lt;span class="n"&gt;sv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lastValues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;inputKey&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lv&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;lv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;lv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;currentTime&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ignoreInputDeadline&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;sv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;lv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
        &lt;span class="n"&gt;lv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deleteDeadline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;deleteDeadline&lt;/span&gt;
        &lt;span class="n"&gt;sv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deleteDeadline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;deleteDeadline&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;sv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;deleted&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;again&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The core idea: each incoming sample's value is compared against the last seen value for that input series, and the &lt;em&gt;delta&lt;/em&gt; is accumulated into a running total for the output series. If the value has gone down (a counter reset), the new value itself is counted as the delta. This works well for monotonically increasing counters, but there's a critical detail. The time-window logic is simple periodic checking, with no special handling for delayed or out-of-order arrivals.&lt;/p&gt;
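&lt;p&gt;A stripped-down, single-goroutine Go sketch of the same delta logic makes the behavior easy to try out. It drops the concurrent map, the locking and the deadline bookkeeping. It also simplifies &lt;code&gt;ignoreInputDeadline&lt;/code&gt; by never counting a series' first sample. The name &lt;code&gt;totalState&lt;/code&gt; is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"cmp"
	"fmt"
)

// totalState is a hypothetical, single-goroutine model of the delta logic:
// it remembers the last value of each input series and adds every increase
// to one running total for the output series.
type totalState struct {
	lastValues map[string]float64
	total      float64
}

func (s *totalState) push(inputKey string, value float64) {
	last, seen := s.lastValues[inputKey]
	if seen {
		d := value - last // normal case: count only the increase
		if cmp.Less(value, last) {
			d = value // value went down: treat it as a counter reset, count it in full
		}
		s.total += d
	}
	// Simplification: a series' first sample only seeds lastValues and is never
	// counted (the original counts it once ignoreInputDeadline has passed).
	s.lastValues[inputKey] = value
}

func main() {
	s := totalState{lastValues: map[string]float64{}}
	s.push(`{instance="a"}`, 10) // first sample: seeds state only
	s.push(`{instance="a"}`, 15) // +5
	s.push(`{instance="a"}`, 3)  // counter reset: +3
	fmt.Println(s.total)         // prints 8
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;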

&lt;h3&gt;
  
  
  What Stream Aggregation Looks Like in Practice
&lt;/h3&gt;

&lt;p&gt;In theory, stream aggregation should cleanly reduce high-cardinality metrics down to manageable summaries. In practice, the picture is messier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ideal model&lt;/strong&gt;: every sample arrives on time, windows align perfectly, aggregation is lossless&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reality&lt;/strong&gt;: samples arrive late, retries flood old data, gaps appear from network or service issues&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Common Issues with Native Stream Aggregation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Collection Gap Problem
&lt;/h3&gt;

&lt;p&gt;Collection gaps are inevitable. Network blips, service restarts, GC pauses — any of these can cause a gap in metric collection. For high-precision stream aggregation, gaps create a specific failure mode:&lt;/p&gt;

&lt;p&gt;Suppose a counter's last tracked value is &lt;code&gt;1000&lt;/code&gt; and, after a gap, the next received value is &lt;code&gt;5000&lt;/code&gt;. The delta of &lt;code&gt;4000&lt;/code&gt; is a real increase, but nothing records how those increments were spread over time. The whole amount is credited to the window in which collection resumes, so that window is inflated while the windows covered by the gap read too low. When the gap crosses window boundaries, downstream rate and ratio calculations compound the error.&lt;/p&gt;
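
&lt;p&gt;A small worked example of this attribution problem, as a Go sketch with made-up numbers: the counter grows by 1000 per window, and in the second run windows 1 to 3 are lost to a gap. The &lt;code&gt;sample&lt;/code&gt; type and &lt;code&gt;perWindowIncrease&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
```go
package main

import "fmt"

// sample is one scrape of a counter, tagged with the aggregation window
// (0, 1, 2, ...) in which it arrived.
type sample struct {
	window int
	value  float64
}

// perWindowIncrease credits each delta to the window of the sample that
// produced it, as a last-value delta aggregator effectively does.
// Decreases (counter resets) are skipped here for brevity.
func perWindowIncrease(samples []sample, numWindows int) []float64 {
	out := make([]float64, numWindows)
	for i, cur := range samples {
		if i == 0 {
			continue // the first sample only sets the baseline
		}
		prev := samples[i-1]
		if cur.value >= prev.value {
			out[cur.window] += cur.value - prev.value
		}
	}
	return out
}

func main() {
	steady := []sample{{0, 1000}, {1, 2000}, {2, 3000}, {3, 4000}, {4, 5000}}
	withGap := []sample{{0, 1000}, {4, 5000}} // windows 1-3 were never scraped
	fmt.Println(perWindowIncrease(steady, 5))  // [0 1000 1000 1000 1000]
	fmt.Println(perWindowIncrease(withGap, 5)) // [0 0 0 0 4000]
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The sum over all windows is 4000 in both runs; only its distribution differs, which is why per-window rates and ratios are what break.&lt;/p&gt;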

&lt;h3&gt;
  
  
  Distributed Computing Challenges
&lt;/h3&gt;

&lt;p&gt;Stream aggregation doesn't persist historical data, so it's fast — but even the fastest service has single-node limits. When you need to scale horizontally, a cascade of new problems appears:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is vmagent's built-in collection viable at scale?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In testing, enabling vmagent shard + replica collection with real-time stream aggregation caused significant resource spikes. At very large scales, collection gaps became more frequent, which &lt;em&gt;amplified&lt;/em&gt; the calculation errors from gaps rather than mitigating them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which compute node should each sample be assigned to?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without a consistent routing strategy, the same metric dimension gets split across nodes, producing partial results that can't be safely combined.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do you handle the same dimension set being computed by multiple nodes with different values?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When multiple nodes produce results for the same dimension within the same time window, VictoriaMetrics triggers its out-of-order handling logic — which discards later values. You lose data silently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do you balance resources in distributed computation?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Uneven distribution means some nodes are overloaded while others sit idle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What about the new dimensions introduced by routing?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you insert a task ID to differentiate compute nodes, you've just added a new dimension that grows with every node you add — defeating part of the purpose of aggregation.&lt;/p&gt;
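
&lt;p&gt;The routing and node-label questions above can be sketched in a few lines of Go. Everything here is a hypothetical model with invented numbers: four input series that collapse into one output series are spread round-robin over two nodes, storage is modeled as keeping the first value written for a series and window (our simplification of the behavior described above), and &lt;code&gt;outputSeries&lt;/code&gt; shows the cost of a per-node label.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
```go
package main

import "fmt"

// Per-window increases of four input series that all collapse into the same
// output series once high-cardinality labels are dropped. Values are made up.
var inputs = []float64{10, 5, 20, 15} // true total: 50

// partialSums spreads the inputs round-robin over nodes, ignoring which
// output series they feed, and returns each node's partial sum.
func partialSums(numNodes int) []float64 {
	sums := make([]float64, numNodes)
	for i, v := range inputs {
		sums[i%numNodes] += v
	}
	return sums
}

// keepFirst models storage that keeps the first value written for a series
// and window and discards later ones. A simplification, not VictoriaMetrics'
// exact deduplication rule.
func keepFirst(partials []float64) (float64, bool) {
	if len(partials) == 0 {
		return 0, false
	}
	return partials[0], true
}

// outputSeries counts stored series when every node adds its own node label.
func outputSeries(baseSeries, numNodes int) int {
	return baseSeries * numNodes
}

func main() {
	p := partialSums(2)
	kept, _ := keepFirst(p)
	fmt.Println(p, kept)                // [30 20] 30: the other 20 is silently lost
	fmt.Println(outputSeries(10000, 8)) // 80000 series instead of 10000
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Adding a node label avoids the collision but multiplies the series count by the number of nodes, which is the trade-off the gateway design has to resolve.&lt;/p&gt;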




&lt;h2&gt;
  
  
  Design and Implementation: A Distributed Stream Aggregation Gateway
&lt;/h2&gt;

&lt;p&gt;After analyzing these problems, it became clear that a frontend module was needed to address them at the entry point. Since &lt;code&gt;vmgateway&lt;/code&gt; is an enterprise component, we built our own: &lt;strong&gt;&lt;code&gt;vm-receive-route&lt;/code&gt;&lt;/strong&gt;, a distributed stream aggregation gateway.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Insight from Source Code
&lt;/h3&gt;

&lt;p&gt;Two aspects of the native implementation are particularly relevant:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time window range:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;currentTime&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fasttime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnixTimestamp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;deleteDeadline&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;currentTime&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intervalSecs&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intervalSecs&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each sample pushes the series' deadline to the current time plus the interval plus a 50% grace period (&lt;code&gt;intervalSecs &amp;gt;&amp;gt; 1&lt;/code&gt;, i.e. half the interval), so a series' state survives for 1.5 intervals after its latest sample. This is the only tolerance for late-arriving data, and it is generous: stale data can still influence results within that extended window.&lt;/p&gt;
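
&lt;p&gt;The arithmetic is easy to check with a few lines of Go that reuse the quoted expression, wrapped in a hypothetical helper named &lt;code&gt;deleteDeadline&lt;/code&gt; (timestamps are Unix seconds).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
```go
package main

import "fmt"

// deleteDeadline reuses the expression quoted above: state for a series is
// kept for one interval plus half an interval after its latest sample.
func deleteDeadline(currentTime, intervalSecs uint64) uint64 {
	return currentTime + intervalSecs + (intervalSecs >> 1)
}

func main() {
	now := uint64(1_700_000_000)
	for _, iv := range []uint64{15, 60, 61} {
		fmt.Printf("interval=%ds grace=%ds deadline=now+%ds\n",
			iv, iv>>1, deleteDeadline(now, iv)-now)
	}
	// interval=15s grace=7s deadline=now+22s
	// interval=60s grace=30s deadline=now+90s
	// interval=61s grace=30s deadline=now+91s
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a 60s interval, a series keeps its state for 90 seconds after its last sample. The right shift truncates, so odd intervals get slightly less than half.&lt;/p&gt;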

&lt;p&gt;&lt;strong&gt;Calculation logic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;lv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lastValues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;inputKey&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;lv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;lastValueState&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;sv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lastValues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;inputKey&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lv&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;lv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;lv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;currentTime&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ignoreInputDeadline&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;sv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The aggregation only checks whether a previous value exists and is not larger than the new one. Any decrease is treated as a counter reset, and the sample's full value is added to the total. A late, out-of-order sample with a lower value (rather than a genuine restart) is therefore misread as a reset and inflates the result. There's no deduplication, no gap detection, and no intelligent merging.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Gateway Solves
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Asynchronous Processing
&lt;/h4&gt;

&lt;p&gt;Most &lt;code&gt;remote write&lt;/code&gt; adapters (like &lt;code&gt;prometheus-kafka-adapter&lt;/code&gt;) do synchronous forwarding — they wait for the downstream (Kafka, etc.) to acknowledge before accepting the next batch. Stream aggregation has &lt;em&gt;window constraints&lt;/em&gt;: if the write pipeline blocks and samples arrive late, they miss their window and calculations drift.&lt;/p&gt;

&lt;p&gt;The gateway decouples ingestion from forwarding using an internal buffer. The &lt;code&gt;remote write&lt;/code&gt; endpoint returns immediately, and samples are forwarded asynchronously to the stream aggregation backend. This prevents back-pressure from creating cascading delays.&lt;/p&gt;
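
&lt;p&gt;A minimal sketch of that decoupling in Go, under assumptions the article does not spell out: the buffer is bounded, a full buffer drops new samples instead of blocking, and a separate forwarder goroutine (not shown) calls &lt;code&gt;drain&lt;/code&gt; in a loop. The names &lt;code&gt;buffer&lt;/code&gt;, &lt;code&gt;tryEnqueue&lt;/code&gt;, and &lt;code&gt;drain&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
```go
package main

import (
	"fmt"
	"sync"
)

// sample is a hypothetical, simplified remote-write sample.
type sample struct {
	series string
	ts     int64
	value  float64
}

// buffer sits between the remote-write handler and the forwarder.
type buffer struct {
	mu       sync.Mutex
	items    []sample
	capacity int
	dropped  int
}

func newBuffer(capacity int) *buffer {
	b := new(buffer)
	b.capacity = capacity
	return b
}

// tryEnqueue never blocks the handler; it reports false if the sample was
// dropped because the buffer is full (an assumed overflow policy).
func (b *buffer) tryEnqueue(s sample) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.items) >= b.capacity {
		b.dropped++
		return false
	}
	b.items = append(b.items, s)
	return true
}

// drain removes and returns up to limit samples for the forwarder to send.
func (b *buffer) drain(limit int) []sample {
	b.mu.Lock()
	defer b.mu.Unlock()
	n := len(b.items)
	if n > limit {
		n = limit
	}
	batch := append([]sample(nil), b.items[:n]...)
	b.items = b.items[n:]
	return batch
}

func main() {
	b := newBuffer(2)
	for i := int64(0); i != 3; i++ {
		fmt.Println(b.tryEnqueue(sample{"up", i, 1})) // true, true, false
	}
	fmt.Println(len(b.drain(10)), b.dropped) // 2 1
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Dropping on overflow fits stream aggregation: a sample that waits too long would miss its window anyway, so refusing it early is cheaper than forwarding it late.&lt;/p&gt;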

&lt;h4&gt;
  
  
  2. Time Window Filtering
&lt;/h4&gt;

&lt;p&gt;Since stream aggregation already computes deltas between successive values, there's no need for complex out-of-order handling at this layer. The gateway simply cooperates with the aggregation window:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Samples arriving within the window → forward normally&lt;/li&gt;
&lt;li&gt;Samples arriving outside the window → discard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This solves two problems at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus retries that send large backlogs of old samples no longer corrupt real-time results&lt;/li&gt;
&lt;li&gt;The resource overhead of processing those retries is eliminated at the gateway level — the aggregation backend never sees them&lt;/li&gt;
&lt;/ul&gt;
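
&lt;p&gt;The filtering rule itself is small. In this Go sketch, &lt;code&gt;maxLag&lt;/code&gt; and &lt;code&gt;maxSkew&lt;/code&gt; are hypothetical parameters, not options of the actual gateway: &lt;code&gt;maxLag&lt;/code&gt; would be set to match the aggregation window, and &lt;code&gt;maxSkew&lt;/code&gt; tolerates small clock drift between senders and the gateway.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
```go
package main

import "fmt"

// withinWindow reports whether a sample timestamp (Unix seconds) is recent
// enough to forward. Both bounds are inclusive.
func withinWindow(ts, now, maxLag, maxSkew int64) bool {
	if now-maxLag > ts {
		return false // too old, e.g. a Prometheus retry of a stale batch
	}
	if ts > now+maxSkew {
		return false // too far in the future
	}
	return true
}

// filter keeps in-window timestamps and counts the discarded ones.
func filter(timestamps []int64, now, maxLag, maxSkew int64) (kept []int64, dropped int) {
	for _, ts := range timestamps {
		if withinWindow(ts, now, maxLag, maxSkew) {
			kept = append(kept, ts)
		} else {
			dropped++
		}
	}
	return kept, dropped
}

func main() {
	kept, dropped := filter([]int64{950, 930, 1003, 1010}, 1000, 60, 5)
	fmt.Println(kept, dropped) // [950 1003] 2
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;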

&lt;h4&gt;
  
  
  3. Dimension Control
&lt;/h4&gt;

&lt;p&gt;The stream aggregation component inserts a node ID label into each aggregated time series so that series produced by different compute nodes stay distinct. But as nodes scale horizontally, the cardinality of &lt;em&gt;that label&lt;/em&gt; scales too. You need a way to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Control dimension growth from node identities&lt;/li&gt;
&lt;li&gt;Route time series by dimension to the correct node&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We designed a &lt;strong&gt;dual hashmod scheduling algorithm&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The gateway assigns a hash-based task ID to each time series based on its stable dimensions (not the node identity)&lt;/li&gt;
&lt;li&gt;The same series always routes to the same compute node, regardless of which gateway instance processed it&lt;/li&gt;
&lt;li&gt;The task ID dimension is bounded by the number of unique dimension combinations, not the number of gateway nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By moving the task ID labeling to the gateway layer, we eliminated the unbounded dimension growth that horizontal scaling would otherwise cause.&lt;/p&gt;
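
&lt;p&gt;One way to read the dual hashmod scheme, sketched in Go under our own assumptions: only the labels kept by the aggregation are hashed (FNV-1a from &lt;code&gt;hash/fnv&lt;/code&gt;), so every input series of one output series lands on the same node; the first modulo yields a task ID from a fixed range of &lt;code&gt;numTasks&lt;/code&gt; values, and the second maps the task ID to a node. The fixed &lt;code&gt;numTasks&lt;/code&gt; and all function names are hypothetical, not the gateway's actual design.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// stableKey builds a canonical string from the routing labels only, so label
// order and unrelated labels (ports, paths, ...) do not affect routing.
func stableKey(labels map[string]string, routingLabels []string) string {
	names := append([]string(nil), routingLabels...)
	sort.Strings(names)
	var b strings.Builder
	for _, n := range names {
		b.WriteString(n)
		b.WriteByte('=')
		b.WriteString(labels[n])
		b.WriteByte(',')
	}
	return b.String()
}

// route applies two hashmods: hash -> task ID (the bounded label value),
// then task ID -> compute node. Every gateway instance gets the same answer.
func route(labels map[string]string, routingLabels []string, numTasks, numNodes uint32) (task, node uint32) {
	h := fnv.New32a()
	h.Write([]byte(stableKey(labels, routingLabels)))
	task = h.Sum32() % numTasks
	node = task % numNodes
	return task, node
}

func main() {
	routing := []string{"zone", "src_svr", "dis_svr", "code"}
	a := map[string]string{"zone": "bj", "src_svr": "192.168.1.2", "dis_svr": "192.168.2.3", "code": "202", "src_port": "30021"}
	b := map[string]string{"zone": "bj", "src_svr": "192.168.1.2", "dis_svr": "192.168.2.3", "code": "202", "src_port": "10021"}
	ta, na := route(a, routing, 64, 4)
	tb, nb := route(b, routing, 64, 4)
	fmt.Println(ta == tb, na == nb) // true true: src_port does not affect routing
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Choosing &lt;code&gt;numTasks&lt;/code&gt; as a multiple of the node count keeps tasks evenly spread across nodes, and the task label never takes more than &lt;code&gt;numTasks&lt;/code&gt; values however many gateways you run.&lt;/p&gt;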

&lt;h4&gt;
  
  
  4. Backend Service Migration for Failures
&lt;/h4&gt;

&lt;p&gt;When a compute node fails, its in-memory aggregation state is lost. The gateway detects failures via health checks and reroutes traffic to healthy nodes. Since the dual hashmod ensures consistent routing, the remaining nodes can immediately pick up the work, though there will be a brief period of incomplete aggregation until the state rebuilds.&lt;/p&gt;
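
&lt;p&gt;A hypothetical failover rule that fits hashmod routing, sketched in Go: start at the node the hash points to and walk forward to the first healthy one. The health flags would come from the gateway's health checks; &lt;code&gt;pickNode&lt;/code&gt; is an illustrative name, not the gateway's code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
```go
package main

import "fmt"

// pickNode starts at hash % n and returns the first healthy node walking
// forward. Series whose home node is healthy keep their node; only the failed
// node's series move, so surviving nodes keep their in-memory state.
func pickNode(hash uint32, healthy []bool) (int, bool) {
	n := len(healthy)
	if n == 0 {
		return 0, false
	}
	start := int(hash % uint32(n))
	for i := 0; i != n; i++ {
		j := (start + i) % n
		if healthy[j] {
			return j, true
		}
	}
	return 0, false // no healthy node left
}

func main() {
	healthy := []bool{true, false, true} // node 1 failed its health check
	for _, h := range []uint32{3, 4, 5} {
		node, _ := pickNode(h, healthy)
		fmt.Println(h, "->", node) // 3 -> 0, 4 -> 2, 5 -> 2
	}
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Recomputing &lt;code&gt;hash % len(healthyNodes)&lt;/code&gt; instead would also move series whose node never failed (hash 3 would jump from node 0 to node 2), discarding in-memory state on healthy nodes.&lt;/p&gt;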




&lt;h2&gt;
  
  
  Record Rule Dimension Task Generator
&lt;/h2&gt;

&lt;p&gt;Stream aggregation is great for reducing cardinality of &lt;em&gt;single metrics&lt;/em&gt;. But real-world monitoring scenarios require combining multiple metrics with functions — which is where Prometheus &lt;code&gt;Record Rule&lt;/code&gt; comes in.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem with Record Rules at Scale
&lt;/h3&gt;

&lt;p&gt;Consider an HTTP request metric with a &lt;code&gt;req_path&lt;/code&gt; dimension:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before stream aggregation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight prometheus"&gt;&lt;code&gt;&lt;span class="n"&gt;a_http_req_total&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"bj"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;src_svr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"192.168.1.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;src_port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"30021"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;dis_svr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"192.168.2.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"202"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;req_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/api/foo?abc=xyz"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;a_http_req_total&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"bj"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;src_svr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"192.168.1.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;src_port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"30023"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;dis_svr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"192.168.2.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"202"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;req_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/api/bar?abc=def"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;a_http_req_total&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"bj"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;src_svr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"192.168.1.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;src_port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"10021"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;dis_svr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"192.168.2.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"202"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;req_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/api/baz?abc=ghi"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After stream aggregation (dropping &lt;code&gt;req_path&lt;/code&gt; and &lt;code&gt;src_port&lt;/code&gt;):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight prometheus"&gt;&lt;code&gt;&lt;span class="n"&gt;agg_a_http_req_total&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"bj"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;src_svr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"192.168.1.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;dis_svr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"192.168.2.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"202"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;agg_a_http_req_total&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"bj"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;src_svr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"192.168.1.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;dis_svr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"192.168.2.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"500"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;agg_a_http_req_total&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"bj"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;src_svr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"192.168.1.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;dis_svr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"192.168.2.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"400"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But what you &lt;em&gt;actually&lt;/em&gt; want to display is the success rate per target:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sum by (dis_svr) (
    rate(a_http_req_total{code=~"2.*"}[5m])
)
/
sum by (dis_svr) (
    rate(a_http_req_total{}[5m])
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can use &lt;code&gt;Record Rule&lt;/code&gt; to precompute this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;groups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;a_http_req_total:sum:rate:5m&lt;/span&gt;
    &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum by (src_svr, dis_svr, code) (rate(a_http_req_total{}[5m]))&lt;/span&gt;
        &lt;span class="na"&gt;record&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;a_http_req_total:sum:rate:5m&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem? &lt;code&gt;Record Rule&lt;/code&gt; evaluation loads &lt;em&gt;all&lt;/em&gt; matching series of the metric into memory. Once cardinality crosses a critical threshold, evaluation runs out of memory (OOM); even below that threshold, higher cardinality means slower computation.&lt;/p&gt;

&lt;p&gt;In production, &lt;code&gt;istio_requests_total&lt;/code&gt; QPS could be delayed by &lt;strong&gt;20 minutes&lt;/strong&gt; at high dimension counts. After applying stream aggregation to reduce from tens of millions of time series down to tens of thousands, the delay dropped to 1-2 minutes — better, but still far from real-time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dynamic Dimension-Split Record Rules
&lt;/h3&gt;

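&lt;p&gt;The issue is that each &lt;code&gt;Record Rule&lt;/code&gt; expression is a single query that loads every matching series into memory. By default, rules within a group are evaluated sequentially while separate groups are evaluated concurrently, so if you split the query by specific dimension values into separate groups, the pieces can be processed in parallel.&lt;/p&gt;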

&lt;p&gt;The static approach after stream aggregation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;groups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agg_a_http_req_total:sum:rate:5m-2xx&lt;/span&gt;
    &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum by (src_svr, dis_svr, code) (rate(agg_a_http_req_total{code=~"2.*"}[5m]))&lt;/span&gt;
        &lt;span class="na"&gt;record&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agg_a_http_req_total:sum:rate:5m&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agg_a_http_req_total:sum:rate:5m-4xx&lt;/span&gt;
    &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum by (src_svr, dis_svr, code) (rate(agg_a_http_req_total{code=~"4.*"}[5m]))&lt;/span&gt;
        &lt;span class="na"&gt;record&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agg_a_http_req_total:sum:rate:5m&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agg_a_http_req_total:sum:rate:5m-5xx&lt;/span&gt;
    &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum by (src_svr, dis_svr, code) (rate(agg_a_http_req_total{code=~"5.*"}[5m]))&lt;/span&gt;
        &lt;span class="na"&gt;record&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agg_a_http_req_total:sum:rate:5m&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the problem is that in production, dimension labels are dynamic. You can't hardcode splits for every dimension value. You need a &lt;strong&gt;label metadata management system&lt;/strong&gt; that watches dimension combinations and dynamically generates split queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  The &lt;code&gt;ruler-handle-process&lt;/code&gt; Component
&lt;/h3&gt;

&lt;p&gt;We built a small metadata watcher and rule builder to automate this. Its configuration looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;recode_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
    &lt;span class="na"&gt;recode_to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;istio_requests_total:sum:rate:5m&lt;/span&gt;
    &lt;span class="na"&gt;metric_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;istio_requests_total&lt;/span&gt;
    &lt;span class="na"&gt;aggr_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum&lt;/span&gt;
    &lt;span class="na"&gt;vector_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rate&lt;/span&gt;
    &lt;span class="na"&gt;vector_range&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
    &lt;span class="na"&gt;group_by_and_filter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;source_workload&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;destination_workload&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;cluster&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;namespace&lt;/span&gt;
    &lt;span class="na"&gt;group_by&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;response_code&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;namespace&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;source_workload_namespace&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;destination_workload_namespace&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;destination_service_name&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;cluster&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;reporter&lt;/span&gt;
    &lt;span class="na"&gt;filter_by&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k8s-hw-bj-xxxxxx"&lt;/span&gt;
    &lt;span class="na"&gt;with_out&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;source_workload&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ingressgateway-workflows"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The component watches the actual dimension combinations under the metric name &lt;code&gt;istio_requests_total&lt;/code&gt; and generates a set of &lt;code&gt;Record Rule&lt;/code&gt; configurations, one per unique dimension combination. Combined with concurrent group evaluation in the Prometheus rule engine, this reduced computation latency from minutes to &lt;strong&gt;seconds&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The generated rules look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;groups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;istio_requests_total:sum:rate:5m-7218756fe8a0bc327e818812cefb02f7&lt;/span&gt;
    &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum by (...) (rate(istio_requests_total{cluster="k8s-hw-bj-1-prod", destination_workload="skyaxe-778-flink", ...}[5m]))&lt;/span&gt;
        &lt;span class="na"&gt;record&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;istio_requests_total:sum:rate:5m&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;istio_requests_total:sum:rate:5m-8e30244048f8d5519a6332f309578ed4&lt;/span&gt;
    &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum by (...) (rate(istio_requests_total{cluster="k8s-hw-bj-1-prod", destination_workload="t-bean-portal", ...}[5m]))&lt;/span&gt;
        &lt;span class="na"&gt;record&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;istio_requests_total:sum:rate:5m&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each unique dimension combination gets its own rule group, enabling true concurrent computation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Combinations for Different Scales
&lt;/h2&gt;

&lt;p&gt;Using community open-source components alongside our custom gateway and rule builder, we assembled a tiered architecture that handles everything from small deployments to massive fleets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Small Scale: Tens of Thousands of Single Metrics
&lt;/h3&gt;

&lt;p&gt;Minimal setup — vmagent with built-in stream aggregation. No gateway needed. The single-node limits aren't reached yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Medium Scale: Tens of Thousands of Multi-Metrics
&lt;/h3&gt;

&lt;p&gt;Add Record Rules for computed metrics. Use the dimension-split approach to keep computation fast. Stream aggregation reduces cardinality before Record Rule evaluation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Large Scale: Millions+ of Single Metrics
&lt;/h3&gt;

&lt;p&gt;This is where the distributed gateway comes in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple vmagent instances behind the gateway for horizontal ingestion&lt;/li&gt;
&lt;li&gt;Dual hashmod scheduling ensures consistent routing&lt;/li&gt;
&lt;li&gt;Time window filtering at the gateway prevents retries from polluting results&lt;/li&gt;
&lt;li&gt;Asynchronous forwarding prevents back-pressure from creating cascading delays&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Large Scale: Scenario-Based Computational Aggregation for Millions of Metrics
&lt;/h3&gt;

&lt;p&gt;The full stack:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Stream aggregation&lt;/strong&gt; — first pass cardinality reduction at ingestion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distributed gateway&lt;/strong&gt; — routing, filtering, and load distribution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic rule builder&lt;/strong&gt; — watches dimension metadata and generates concurrent Record Rule groups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rule engine&lt;/strong&gt; — evaluates split queries in parallel&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;VictoriaMetrics' stream aggregation is a solid foundation, but it was designed for single-node operation. When you need to scale horizontally, you encounter a series of interconnected problems — collection gaps, dimension explosion, record rule performance bottlenecks — that the native implementation doesn't address.&lt;/p&gt;

&lt;p&gt;The solutions aren't particularly complex individually (async processing, time window filtering, hash-based routing, dimension-aware rule generation), but they need to work together as a coherent system. The gateway pattern — intercepting the &lt;code&gt;remote write&lt;/code&gt; path before it reaches the aggregation layer — proved to be the right abstraction point for injecting these capabilities without modifying upstream code.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;ruler-handle-process&lt;/code&gt; component was the second key insight: rather than fighting Prometheus Record Rule's single-query-per-group limitation, we embraced it by dynamically splitting queries by dimension and running them concurrently. This turned a 20-minute computation into a seconds-level operation.&lt;/p&gt;

&lt;p&gt;If you're running VictoriaMetrics at scale and hitting these issues, the patterns described here should be applicable regardless of your specific stack. The gateway approach is generic enough to work with any Prometheus-compatible &lt;code&gt;remote write&lt;/code&gt; backend.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://blog.mickeyzzc.tech/en/posts/telemetry/stream-metrics-one/" rel="noopener noreferrer"&gt;my blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>monitoring</category>
      <category>performance</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>MiBeeNvr: A Lightweight Home NVR System I Built</title>
      <dc:creator>ZhengZhiCong</dc:creator>
      <pubDate>Sun, 07 Jun 2026 09:27:47 +0000</pubDate>
      <link>https://dev.to/mickey_zzc/mibeenvr-a-lightweight-home-nvr-system-i-built-41b</link>
      <guid>https://dev.to/mickey_zzc/mibeenvr-a-lightweight-home-nvr-system-i-built-41b</guid>
      <description>&lt;p&gt;I have several cameras at home — a few Xiaomi cameras, some DIY ESP32 cameras, and multiple Raspberry Pi CSI cameras. I'd been using cloud storage solutions, but I was never comfortable with them: vendor lock-in, network dependency, and the costs add up. So I decided to build my own NVR system, called MiBeeNvr.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Build MiBeeNvr
&lt;/h2&gt;

&lt;p&gt;To be honest, I was never satisfied with existing cloud storage solutions. Take Xiaomi cameras, for example. By default, you can only view them through the Mi Home app. Recordings are either stored on an SD card (limited capacity, frequent plugging/unplugging) or in the cloud. Cloud storage costs tens of dollars per month, and there's the privacy concern — you never know when the manufacturer might use your video data for AI training or sell it to third parties. Not to mention vendor lock-in — switching platforms is nearly impossible.&lt;/p&gt;

&lt;p&gt;ESP32 cameras have a similar problem. I built several ESP32 cameras, storing recordings on SD cards, but viewing and playback were inconvenient. I needed a unified management platform.&lt;/p&gt;

&lt;p&gt;I also tried other open-source solutions: ZoneMinder requires a LAMP stack — installing and deploying it is more complex than my entire project; Shinobi's configuration is a nightmare; and some smaller projects are basically unmaintained. Frigate is nice but primarily focused on AI detection and depends on Docker — too heavy.&lt;/p&gt;

&lt;p&gt;In short, I wanted:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A single binary file: download and run&lt;/li&gt;
&lt;li&gt;Something lightweight enough to run on a Raspberry Pi&lt;/li&gt;
&lt;li&gt;Support for multiple camera types, especially Xiaomi's proprietary protocol&lt;/li&gt;
&lt;li&gt;A clean Web interface without frontend complexity&lt;/li&gt;
&lt;li&gt;Automatic cleanup of old recordings so the disk never fills up&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After searching around, none of the existing solutions fit. So I wrote my own.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is MiBeeNvr
&lt;/h2&gt;

&lt;p&gt;MiBeeNvr is a lightweight NVR system written in Go, designed to provide local recording and storage for home cameras.&lt;/p&gt;

&lt;h3&gt;
  
  
  Overall Architecture
&lt;/h3&gt;

&lt;p&gt;The system is organized into four layers, from the cameras to the user:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Camera End&lt;/strong&gt; — multiple camera types connect through different protocols:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Xiaomi cameras use the proprietary "miss" protocol (encrypted, multi-layer)&lt;/li&gt;
&lt;li&gt;ESP32 cameras send HTTP JPEG/MJPEG streams directly&lt;/li&gt;
&lt;li&gt;Raspberry Pi CSI cameras output standard RTSP H.264 via MediaMTX&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Protocol Bridge Layer&lt;/strong&gt; — proprietary protocols get converted to standard RTSP:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;go2rtc&lt;/code&gt; decrypts Xiaomi's miss protocol and re-serves the stream over RTSP&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MediaMTX&lt;/code&gt; converts Raspberry Pi CSI interface video to RTSP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;MiBeeNvr Core&lt;/strong&gt; — handles the recording logic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;REST API receives all video streams&lt;/li&gt;
&lt;li&gt;Recording engine segments incoming video into MP4 files&lt;/li&gt;
&lt;li&gt;SQLite stores metadata (pure Go, no CGO, no separate DB installation)&lt;/li&gt;
&lt;li&gt;Auto-cleanup daemon removes old recordings per retention policy&lt;/li&gt;
&lt;li&gt;HLS live streaming for real-time viewing (up to 4 concurrent)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Access Methods&lt;/strong&gt; — users interact through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built-in Web UI (Svelte 5 SPA, embedded in the single binary)&lt;/li&gt;
&lt;li&gt;WebDAV (read-write) and FTP for file-level access&lt;/li&gt;
&lt;li&gt;Prometheus metrics for system monitoring&lt;/li&gt;
&lt;/ul&gt;

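&lt;p&gt;The data flow is cameras → protocol bridge → MiBeeNvr core → user access. The core and all of its access methods ship as a single binary, while go2rtc and MediaMTX run as separate services.&lt;/p&gt;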

&lt;h3&gt;
  
  
  Recording Pipeline
&lt;/h3&gt;

&lt;p&gt;Video processing follows a straightforward pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Input&lt;/strong&gt; — RTSP streams are received via &lt;code&gt;gortsplib&lt;/code&gt;; HTTP JPEG sources are polled periodically for frames&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Depacketize &amp;amp; Mux&lt;/strong&gt; — RTP packets are depacketized via &lt;code&gt;pion/rtp&lt;/code&gt;, then muxed into MP4 via &lt;code&gt;go-mp4&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt; — video is segmented into files (configurable: 30s or 10m intervals), SQLite tracks metadata, files go to disk&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The frontend uses Svelte 5 — the entire SPA is compiled to static assets and embedded into the Go binary. Deployment is a single file, no separate Web server needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backend tech stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go 1.26 + modernc.org/sqlite (pure Go, no CGO dependency)&lt;/li&gt;
&lt;li&gt;chi routing library, clean and efficient&lt;/li&gt;
&lt;li&gt;gortsplib for RTSP/RTP protocol&lt;/li&gt;
&lt;li&gt;pion/rtp for RTP packet handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SQLite was chosen because the database is a single file, the modernc.org driver is pure Go, it performs well enough for home use and handles concurrent readers alongside a single writer, and, most importantly, it doesn't require a separate database installation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Design Philosophy
&lt;/h3&gt;

&lt;p&gt;The project's design philosophy is &lt;strong&gt;"simple and straightforward"&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single binary file, no external dependencies&lt;/li&gt;
&lt;li&gt;Supports cross-compilation, runs on AMD64/ARM64&lt;/li&gt;
&lt;li&gt;Intuitive YAML configuration&lt;/li&gt;
&lt;li&gt;Built-in Web interface: just open a browser&lt;/li&gt;
&lt;li&gt;Minimal resource usage, runs smoothly on Raspberry Pi 4&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Supports multiple camera protocols: RTSP (H.264/H.265), HTTP JPEG&lt;/li&gt;
&lt;li&gt;Built-in Web interface with dark/light theme switching&lt;/li&gt;
&lt;li&gt;Chinese/English bilingual support&lt;/li&gt;
&lt;li&gt;WebDAV (read-write), FTP, REST API&lt;/li&gt;
&lt;li&gt;MQTT-triggered recording, ideal for smart home integration&lt;/li&gt;
&lt;li&gt;Prometheus monitoring metrics&lt;/li&gt;
&lt;li&gt;Per-camera independent retention policies&lt;/li&gt;
&lt;li&gt;MP4 segmented recording, auto-cleanup of old files&lt;/li&gt;
&lt;li&gt;Supports HLS live streaming (up to 4 concurrent)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  My Actual Deployment
&lt;/h2&gt;

&lt;p&gt;I run it on an ARM64 mini host with 512MB RAM and 2GB storage. The system runs very stably — basically set it and forget it.&lt;/p&gt;

&lt;p&gt;I've connected four cameras, each with its own characteristics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Raspberry Pi CSI Camera&lt;/strong&gt; — RTSP bridge via MediaMTX, converting CSI interface video to standard RTSP. Configured as &lt;code&gt;rtsp_h264&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ESP32-S3 Camera&lt;/strong&gt; — DIY, serving an MJPEG stream over HTTP. Configured as &lt;code&gt;http_jpeg&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Xiaomi Camera (Balcony)&lt;/strong&gt; — Protocol conversion via go2rtc (Xiaomi proprietary → RTSP), 2K resolution, configured as &lt;code&gt;rtsp_h265&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Xiaomi Camera (Living Room)&lt;/strong&gt; — Same as above, 1080P.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I record in 30-second segments with 1-day retention. The segment length is a trade-off: shorter segments create too many files, while longer ones make it harder to jump to a specific incident. WebDAV (read-write) and FTP are enabled so I can view and back up recordings from my phone.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw6ezz6dqetie2wev5hqs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw6ezz6dqetie2wev5hqs.png" alt="Camera management page" width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Web interface is clean — camera management and recording lists are straightforward.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqppfsih7x2m5bczottro.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqppfsih7x2m5bczottro.png" alt="Recording list (dark)" width="800" height="957"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Settings page:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4795f8lnr5rtl9hsaris.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4795f8lnr5rtl9hsaris.png" alt="Settings page" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration File
&lt;/h3&gt;

&lt;p&gt;The complete configuration file looks like this — YAML format, clear at a glance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;listen&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:9090"&lt;/span&gt;

&lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;root_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/mnt/data/nvr"&lt;/span&gt;
  &lt;span class="na"&gt;segment_duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30s"&lt;/span&gt;

&lt;span class="na"&gt;cameras&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rpi-csi-cam"&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RPi&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CSI&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Camera"&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rtsp_h264"&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rtsp://10.0.1.100:8554/stream"&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;esp32-cam"&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ESP32-S3&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Camera"&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http_jpeg"&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://10.0.1.101/capture"&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xiaomi-balcony"&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Xiaomi&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Camera"&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rtsp_h265"&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rtsp://10.0.1.102:8554/xiaomi_stream"&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;cleanup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;retention_days&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="na"&gt;disk_threshold_percent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;95&lt;/span&gt;

&lt;span class="na"&gt;auth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin"&lt;/span&gt;
  &lt;span class="na"&gt;password_hash&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;mibee-nvr&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;hash-password&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;command&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;generate"&lt;/span&gt;

&lt;span class="na"&gt;webdav&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;path_prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/dav"&lt;/span&gt;
  &lt;span class="na"&gt;read_write&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;ftp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2121&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
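&lt;p&gt;Because a typo in this file only shows up when a camera silently fails to record, a pre-flight check is worth having. The sketch below assumes the YAML has already been decoded into plain structs (decoding needs a third-party YAML library and is omitted). &lt;code&gt;Config&lt;/code&gt;, &lt;code&gt;Camera&lt;/code&gt;, and &lt;code&gt;validate&lt;/code&gt; are hypothetical, and the accepted protocol list contains only the values used in this article.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Camera and Config mirror a subset of the YAML keys above.
type Camera struct {
	ID       string
	Protocol string
	URL      string
	Enabled  bool
}

type Config struct {
	SegmentDuration      string
	RetentionDays        int
	DiskThresholdPercent int
	Cameras              []Camera
}

// knownProtocols lists only the values used in this article.
var knownProtocols = map[string]bool{"rtsp_h264": true, "rtsp_h265": true, "http_jpeg": true}

// validate is a hypothetical pre-flight check, not MiBeeNvr's own validator.
func validate(c Config) error {
	var errs []error
	if d, err := time.ParseDuration(c.SegmentDuration); err != nil {
		errs = append(errs, fmt.Errorf("segment_duration: %w", err))
	} else if 0 >= d {
		errs = append(errs, errors.New("segment_duration must be positive"))
	}
	if 0 >= c.RetentionDays {
		errs = append(errs, errors.New("retention_days must be positive"))
	}
	if 0 >= c.DiskThresholdPercent || c.DiskThresholdPercent > 100 {
		errs = append(errs, errors.New("disk_threshold_percent must be in 1..100"))
	}
	seen := map[string]bool{}
	for _, cam := range c.Cameras {
		if seen[cam.ID] {
			errs = append(errs, fmt.Errorf("camera %q: duplicate id", cam.ID))
		}
		seen[cam.ID] = true
		if !knownProtocols[cam.Protocol] {
			errs = append(errs, fmt.Errorf("camera %q: unknown protocol %q", cam.ID, cam.Protocol))
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	cfg := Config{
		SegmentDuration:      "30s",
		RetentionDays:        30,
		DiskThresholdPercent: 95,
		Cameras: []Camera{
			{ID: "rpi-csi-cam", Protocol: "rtsp_h264", URL: "rtsp://10.0.1.100:8554/stream", Enabled: true},
			{ID: "esp32-cam", Protocol: "http_jpeg", URL: "http://10.0.1.101/capture", Enabled: true},
		},
	}
	if err := validate(cfg); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config OK")
}
```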






&lt;h2&gt;
  
  
  Xiaomi Camera Integration
&lt;/h2&gt;

&lt;p&gt;Xiaomi cameras are a major headache. They use the proprietary "miss" (Mi Secure Streaming) protocol with multi-layer encryption and expose no standard RTSP interface, so even if you know a camera's IP, you can't pull a stream with VLC.&lt;/p&gt;

&lt;p&gt;Fortunately, there's &lt;a href="https://github.com/AlexxIT/go2rtc" rel="noopener noreferrer"&gt;go2rtc&lt;/a&gt;, a lifesaver. The integration works in four steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Account auth &amp;amp; key exchange&lt;/strong&gt; — go2rtc logs into your Xiaomi account and retrieves the device list and encryption keys from Xiaomi Cloud&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Establish P2P connection&lt;/strong&gt; — go2rtc initiates a miss protocol handshake with the camera, establishing a peer-to-peer link&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Video stream relay&lt;/strong&gt; — the camera continuously sends its encrypted miss stream to go2rtc, which decrypts it and re-serves it as a standard RTSP stream (H.265); MiBeeNvr treats this as a normal RTSP camera and records segmented MP4 files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Independent access&lt;/strong&gt; — once set up, you no longer need the Mi Home app or Xiaomi cloud subscription; all recordings are accessible via MiBeeNvr's Web UI, WebDAV, or FTP&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The entire process requires no firmware flashing, no camera disassembly, and no Xiaomi cloud storage subscription. go2rtc handles all the protocol conversion.&lt;/p&gt;

&lt;h3&gt;
  
  
  go2rtc Deployment
&lt;/h3&gt;

&lt;p&gt;The easiest way to deploy go2rtc is with Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create configuration file&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; go2rtc.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
streams:
  xiaomi_balcony:
    - xiaomi://your_account:cn@10.0.1.100?did=your_camera_did&amp;amp;model=isa.camera.hlc7
  xiaomi_living_room:
    - xiaomi://your_account:cn@10.0.1.101?did=your_camera_did&amp;amp;model=isa.camera.mj200

rtsp:
  listen: ":8554"
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Run container&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; go2rtc &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 8554:8554 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 1984:1984 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;/go2rtc.yaml:/config.yaml &lt;span class="se"&gt;\&lt;/span&gt;
  alexxit/go2rtc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;xiaomi://&lt;/code&gt; protocol requires Xiaomi account and password authentication&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;did&lt;/code&gt; is the device's unique identifier, &lt;code&gt;model&lt;/code&gt; is the device model (can be found in the Mi Home app)&lt;/li&gt;
&lt;li&gt;go2rtc automatically handles P2P connection and miss protocol decryption&lt;/li&gt;
&lt;li&gt;The final standard RTSP stream is exposed on port 8554, and MiBeeNvr connects to it like any normal camera&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then point MiBeeNvr's camera entry at go2rtc:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;cameras&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xiaomi-balcony"&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Xiaomi&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Camera"&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rtsp_h265"&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rtsp://localhost:8554/xiaomi_balcony"&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pitfalls Encountered
&lt;/h3&gt;

&lt;p&gt;Xiaomi camera integration has several pitfalls:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Internet access for the first connection&lt;/strong&gt;: Xiaomi cameras must be able to reach the internet for key exchange with Xiaomi's servers. Once the connection is established, video is transmitted over the LAN.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Device ID acquisition&lt;/strong&gt;: Each camera's &lt;code&gt;did&lt;/code&gt; is unique. Use go2rtc's WebUI (port 1984) for auto-discovery, or dig through the Mi Home app.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not all models are supported&lt;/strong&gt;: go2rtc maintains a compatibility list — check before buying a camera.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;H.265 vs H.264&lt;/strong&gt;: Newer Xiaomi cameras mostly use H.265. MiBeeNvr supports both codecs, but H.265 saves storage space.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  ESP32 Camera Projects
&lt;/h2&gt;

&lt;p&gt;While working on MiBeeNvr, I also built several ESP32 camera firmware projects. ESP32 cameras had their share of pitfalls, but were also quite interesting.&lt;/p&gt;

&lt;p&gt;I built three firmware projects aimed at different use cases, all designed as upstream capture endpoints for MiBeeNvr: the cameras handle video capture, and MiBeeNvr handles unified storage and management.&lt;/p&gt;

&lt;h3&gt;
  
  
  MiBeeCam — ESP32-S3-A10 Solution
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/Mi-Bee-Studio/luatos-esp32s3-a10-camera" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt; · MIT License&lt;/p&gt;

&lt;p&gt;This is the most successful of the three. It runs on an ESP32-S3-A10 dev board with an OV2640 camera (8225N module) and 16MB of Flash, developed with ESP-IDF v5.4.3. Features include an MJPEG stream, frame-differencing motion detection, a Web configuration interface, and Prometheus metrics. The LCD screen makes debugging much easier.&lt;/p&gt;

&lt;h3&gt;
  
  
  AI Thinker ESP32-CAM — Classic Solution
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/Mi-Bee-Studio/ai-thinker-esp32-cam" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt; · MIT License&lt;/p&gt;

&lt;p&gt;The entry-level choice: AI Thinker ESP32-CAM dev boards are widely available for around $10-15. With 4MB Flash and 4MB PSRAM, it runs an MJPEG stream without issues. Highlights include SD card storage, NAS upload (WebDAV/HTTP), and adaptive dark-scene detection that automatically switches to infrared mode at night. Downsides: no screen, and 4MB of Flash is limiting.&lt;/p&gt;
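&lt;p&gt;One common way to implement day/night switching is to track mean frame brightness and use two thresholds (hysteresis), so the mode doesn't flap when light hovers near a single cutoff. Whether the firmware works exactly this way is an assumption. The Go sketch below illustrates the idea, and the threshold values are hypothetical.&lt;/p&gt;

```go
package main

import "fmt"

// nightSwitch toggles night (IR) mode from mean frame brightness using two
// thresholds; the gap between them prevents rapid toggling.
type nightSwitch struct {
	enterBelow float64 // enter night mode when brightness falls under this
	exitAbove  float64 // return to day mode when brightness rises over this
	night      bool
}

// meanBrightness averages the pixels of a grayscale frame.
func meanBrightness(frame []uint8) float64 {
	if len(frame) == 0 {
		return 0
	}
	sum := 0
	for _, p := range frame {
		sum += int(p)
	}
	return float64(sum) / float64(len(frame))
}

// Update feeds one frame and returns whether night mode should be active.
func (s *nightSwitch) Update(frame []uint8) bool {
	m := meanBrightness(frame)
	switch {
	case !s.night && s.enterBelow > m:
		s.night = true
	case s.night && m > s.exitAbove:
		s.night = false
	}
	return s.night
}

func main() {
	sw := nightSwitch{enterBelow: 40, exitAbove: 60}
	for _, level := range []uint8{120, 50, 35, 50, 55, 70} {
		frame := make([]uint8, 16)
		for i := range frame {
			frame[i] = level
		}
		fmt.Printf("mean=%3d night=%v\n", level, sw.Update(frame))
	}
}
```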

&lt;h3&gt;
  
  
  MiBeeHomeCam — XIAO ESP32-S3 Sense
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/Mi-Bee-Studio/seeed-esp32s3-cam" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt; · GPL v3.0&lt;/p&gt;

&lt;p&gt;The most advanced solution. The XIAO ESP32-S3 Sense board is compact and well built, supports either an OV2640 or OV3660 camera sensor, and has 8MB of Octal PSRAM. Highlights include AVI segmented recording (real video recording, not just snapshots), dual-protocol FTP/WebDAV upload, a watchdog that recovers from freezes, chip temperature monitoring, and batch file management. It's well suited to long-term stable operation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Selection Guide
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Beginners&lt;/strong&gt;: Choose AI Thinker ESP32-CAM — cheap with plenty of resources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Daily use&lt;/strong&gt;: Choose MiBeeCam — LCD screen makes debugging convenient&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximum features&lt;/strong&gt;: Choose XIAO ESP32-S3 Sense — most powerful&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  System Service Configuration
&lt;/h2&gt;

&lt;p&gt;For stable operation, I use systemd to manage MiBeeNvr:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[Unit]&lt;/span&gt;
&lt;span class="py"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;MiBee NVR&lt;/span&gt;
&lt;span class="py"&gt;After&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;network-online.target&lt;/span&gt;
&lt;span class="py"&gt;Wants&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;network-online.target&lt;/span&gt;

&lt;span class="nn"&gt;[Service]&lt;/span&gt;
&lt;span class="py"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;simple&lt;/span&gt;
&lt;span class="py"&gt;User&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;nvr&lt;/span&gt;
&lt;span class="py"&gt;ExecStart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/mnt/data/nvr/bin/mibee-nvr -config /mnt/data/nvr/mibee-nvr.yaml&lt;/span&gt;
&lt;span class="py"&gt;WorkingDirectory&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/mnt/data/nvr&lt;/span&gt;
&lt;span class="py"&gt;Restart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;on-failure&lt;/span&gt;
&lt;span class="py"&gt;RestartSec&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;

&lt;span class="c"&gt;# Security hardening
&lt;/span&gt;&lt;span class="py"&gt;NoNewPrivileges&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;ProtectSystem&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;strict&lt;/span&gt;
&lt;span class="py"&gt;ReadWritePaths&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/mnt/data/nvr&lt;/span&gt;
&lt;span class="py"&gt;PrivateTmp&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;

&lt;span class="nn"&gt;[Install]&lt;/span&gt;
&lt;span class="py"&gt;WantedBy&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;multi-user.target&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save it to &lt;code&gt;/etc/systemd/system/mibee-nvr.service&lt;/code&gt;, then run &lt;code&gt;systemctl enable --now mibee-nvr&lt;/code&gt;. The service starts on boot and restarts automatically if it fails.&lt;/p&gt;




&lt;h2&gt;
  
  
  Open Source
&lt;/h2&gt;

&lt;p&gt;MiBeeNvr is open source; stars and contributions are welcome:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MiBeeNvr&lt;/strong&gt;: &lt;a href="https://github.com/Mi-Bee-Studio/MiBeeNvr" rel="noopener noreferrer"&gt;https://github.com/Mi-Bee-Studio/MiBeeNvr&lt;/a&gt; (MIT License)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MiBeeCam&lt;/strong&gt;: &lt;a href="https://github.com/Mi-Bee-Studio/luatos-esp32s3-a10-camera" rel="noopener noreferrer"&gt;https://github.com/Mi-Bee-Studio/luatos-esp32s3-a10-camera&lt;/a&gt; (MIT)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Thinker ESP32-CAM&lt;/strong&gt;: &lt;a href="https://github.com/Mi-Bee-Studio/ai-thinker-esp32-cam" rel="noopener noreferrer"&gt;https://github.com/Mi-Bee-Studio/ai-thinker-esp32-cam&lt;/a&gt; (MIT)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MiBeeHomeCam&lt;/strong&gt;: &lt;a href="https://github.com/Mi-Bee-Studio/seeed-esp32s3-cam" rel="noopener noreferrer"&gt;https://github.com/Mi-Bee-Studio/seeed-esp32s3-cam&lt;/a&gt; (GPL v3.0)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Documentation is comprehensive, with detailed deployment and configuration instructions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;To be honest, I built this project mainly because I was dissatisfied with all the existing solutions. Cloud storage is too expensive, open-source solutions are too heavy, and commercial products are too closed. Building my own was just right: lightweight, free, and fully under my control.&lt;/p&gt;

&lt;p&gt;Oh, about the name MiBeeNvr — "Mi" stands for me (Mickey), "Bee" stands for... let's keep that a secret, and "Nvr" is naturally Network Video Recorder. Simple, memorable, and a bit meaningful.&lt;/p&gt;

&lt;p&gt;If you also have home camera needs or ideas about NVR systems, feel free to reach out. Issues are welcome on GitHub.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://blog.mickeyzzc.tech/en/" rel="noopener noreferrer"&gt;my blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>github</category>
      <category>nvr</category>
      <category>camera</category>
    </item>
  </channel>
</rss>
