<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Diven Rastdus</title>
    <description>The latest articles on DEV Community by Diven Rastdus (@diven_rastdus_c5af27d68f3).</description>
    <link>https://dev.to/diven_rastdus_c5af27d68f3</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3807319%2Ff32c855f-6f0d-4c96-8ac1-8bb9bffba0b7.jpg</url>
      <title>DEV Community: Diven Rastdus</title>
      <link>https://dev.to/diven_rastdus_c5af27d68f3</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/diven_rastdus_c5af27d68f3"/>
    <language>en</language>
    <item>
      <title>5 Monitoring Blind Spots That Let My Side Projects Fail Silently</title>
      <dc:creator>Diven Rastdus</dc:creator>
      <pubDate>Thu, 28 May 2026 12:10:28 +0000</pubDate>
      <link>https://dev.to/diven_rastdus_c5af27d68f3/5-monitoring-blind-spots-that-let-my-side-projects-fail-silently-km3</link>
      <guid>https://dev.to/diven_rastdus_c5af27d68f3/5-monitoring-blind-spots-that-let-my-side-projects-fail-silently-km3</guid>
      <description>&lt;p&gt;I run four side projects. A journaling app, an Android app blocker, a healthcare AI tool, and a content pipeline. Total monitoring budget: $0.&lt;/p&gt;

&lt;p&gt;Last month, one of them went down for 24 hours. Nobody told me. I found out by accident.&lt;/p&gt;

&lt;p&gt;That scared me enough to audit all four projects. I found the same five blind spots across every single one.&lt;/p&gt;

&lt;h2&gt;1. No Uptime Checks (The "It's Probably Fine" Gap)&lt;/h2&gt;

&lt;p&gt;My journaling app runs on Supabase's free tier. Free-tier projects auto-pause after 7 days of inactivity. I knew this in theory.&lt;/p&gt;

&lt;p&gt;In practice, I shipped a demo to a potential client. The project had been idle. Supabase paused it. The API returned nothing. The frontend showed a blank screen.&lt;/p&gt;

&lt;p&gt;For 24 hours, anyone visiting saw a broken app. I only discovered it when I opened the dashboard for an unrelated reason.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt;: A cron job that hits every critical endpoint every 6 hours.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# health-check.sh - runs via cron every 6h&lt;/span&gt;
&lt;span class="nv"&gt;ENDPOINTS&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;
  &lt;span class="s2"&gt;"https://my-app.vercel.app/api/health"&lt;/span&gt;
  &lt;span class="s2"&gt;"https://my-backend.supabase.co/rest/v1/"&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;url &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ENDPOINTS&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-sf&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"%{http_code}"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$url&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--max-time&lt;/span&gt; 10&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s2"&gt;"200"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;python3 ~/bin/alert.py &lt;span class="nt"&gt;--message&lt;/span&gt; &lt;span class="s2"&gt;"DOWN: &lt;/span&gt;&lt;span class="nv"&gt;$url&lt;/span&gt;&lt;span class="s2"&gt; returned &lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;fi
done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total cost: $0. It runs on the same machine that runs everything else. Hosted services like Better Stack or UptimeRobot are better options, but this costs nothing and catches the most common failure: an endpoint that is down or paused.&lt;/p&gt;
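
&lt;p&gt;The script calls &lt;code&gt;alert.py&lt;/code&gt; without showing it. Here is a minimal sketch of what such a helper could look like, assuming alerts go to a Telegram bot. The environment variable names &lt;code&gt;TELEGRAM_BOT_TOKEN&lt;/code&gt; and &lt;code&gt;TELEGRAM_CHAT_ID&lt;/code&gt; and the function names are my own placeholders, not the real file.&lt;/p&gt;

```python
# alert.py - hypothetical sketch of the alert helper the health check calls.
# Assumes TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID are set in the environment.
import argparse
import json
import os
import sys
import urllib.request


def build_request(token, chat_id, message):
    """Build the Telegram Bot API sendMessage request without sending it."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = json.dumps({"chat_id": chat_id, "text": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(message, opener=urllib.request.urlopen):
    """Send message to Telegram; return True when the API answers HTTP 200."""
    request = build_request(
        os.environ["TELEGRAM_BOT_TOKEN"],
        os.environ["TELEGRAM_CHAT_ID"],
        message,
    )
    with opener(request, timeout=10) as response:
        return response.status == 200


# Only act when arguments were passed, e.g. --message "DOWN: ..."
if __name__ == "__main__" and sys.argv[1:]:
    parser = argparse.ArgumentParser(description="Send a push alert")
    parser.add_argument("--message", required=True)
    send_alert(parser.parse_args().message)
```

&lt;p&gt;The &lt;code&gt;opener&lt;/code&gt; parameter exists only so the function can be exercised without touching the network. In normal use the default &lt;code&gt;urlopen&lt;/code&gt; does the sending.&lt;/p&gt;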

&lt;h2&gt;2. Unstructured Logs Across Services&lt;/h2&gt;

&lt;p&gt;My healthcare AI tool runs three microservices on Cloud Run: an MCP server, an orchestrator agent, and an interaction checker. When a patient reconciliation fails, which service caused it?&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;print()&lt;/code&gt; statements, the answer is "good luck." Cloud Run interleaves logs from all services. One request touches all three. There's no correlation ID linking them.&lt;/p&gt;

&lt;p&gt;I spent 40 minutes tracing a bug that turned out to be a timeout in the MCP server. The orchestrator logged "reconciliation failed." The MCP server logged nothing useful. The interaction checker never got called.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt;: Structured JSON logs with a request ID passed through every service call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;extra&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;extra&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# At the request entry point
&lt;/span&gt;&lt;span class="n"&gt;request_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())[:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INFO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reconciliation started&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;patient_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;patient_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Pass request_id to downstream services via header
# X-Request-ID: {request_id}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cloud Run (and most log aggregators) parse JSON automatically. Now I can filter by &lt;code&gt;request_id&lt;/code&gt; and see the full trace across services. Structured logging was the single most impactful monitoring improvement I made.&lt;/p&gt;
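
&lt;p&gt;The snippet above only hints at the header pass-through. A rough sketch of that step, assuming each service receives its incoming headers as a plain dict, might look like this. The helper names &lt;code&gt;resolve_request_id&lt;/code&gt; and &lt;code&gt;outgoing_headers&lt;/code&gt; are illustrative, not from my actual services.&lt;/p&gt;

```python
import json
import sys
import uuid

REQUEST_ID_HEADER = "X-Request-ID"


def resolve_request_id(incoming_headers):
    """Reuse the caller's request ID if present, otherwise mint a new one."""
    for name, value in incoming_headers.items():
        if name.lower() == REQUEST_ID_HEADER.lower() and value:
            return value
    return str(uuid.uuid4())[:8]


def outgoing_headers(request_id, extra=None):
    """Headers to attach to every downstream call so the ID follows the request."""
    headers = dict(extra or {})
    headers[REQUEST_ID_HEADER] = request_id
    return headers


def log(level, msg, request_id, service, stream=sys.stderr):
    """Emit one JSON log line tagged with the service name and request ID."""
    entry = {"severity": level, "message": msg,
             "request_id": request_id, "service": service}
    print(json.dumps(entry), file=stream)


# Orchestrator: first hop, no incoming ID, so one is generated.
rid = resolve_request_id({})
log("INFO", "Reconciliation started", rid, "orchestrator")
downstream = outgoing_headers(rid)

# MCP server: second hop, reuses the ID it received in the header.
log("INFO", "Tool call received", resolve_request_id(downstream), "mcp-server")
```

&lt;p&gt;Both log lines carry the same &lt;code&gt;request_id&lt;/code&gt;, which is what makes the cross-service filter work.&lt;/p&gt;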

&lt;h2&gt;3. Zero Mobile Crash Visibility&lt;/h2&gt;

&lt;p&gt;My Android app blocker uses Kotlin and Jetpack Compose. R8 (Android's code shrinker) silently removed a class my accessibility service needed. The app installed fine. It launched fine. The core feature just... didn't work.&lt;/p&gt;

&lt;p&gt;I found this bug during manual testing on a real device. If this had shipped to users, I would have had zero visibility. No crash reports. No error logs. Nothing.&lt;/p&gt;

&lt;p&gt;Android's &lt;code&gt;logcat&lt;/code&gt; only helps while the device is attached to your machine over adb. Once the app is on someone else's phone, you're blind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt;: At minimum, catch uncaught exceptions and log them somewhere you can read later.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setDefaultUncaughtExceptionHandler&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;throwable&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;report&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildString&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;appendLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Thread: ${thread.name}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;appendLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Error: ${throwable.message}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;appendLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;throwable&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stackTraceToString&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;// Write to local file, upload on next app launch&lt;/span&gt;
    &lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filesDir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"crash.log"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;writeText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Crashlytics, Sentry, or Bugfender give you stack traces, device info, and occurrence counts out of the box. This basic handler still beats flying blind when you're not ready to pay for one.&lt;/p&gt;

&lt;h2&gt;4. API Quota Exhaustion With No Warning&lt;/h2&gt;

&lt;p&gt;This week, my social media automation stopped working. No errors in my code. No exceptions. Just... nothing posted.&lt;/p&gt;

&lt;p&gt;The X (Twitter) API returns a &lt;code&gt;CreditsDepleted&lt;/code&gt; error when you hit your monthly quota. My posting script caught the error, logged it to a file, and moved on. Nobody reads log files proactively.&lt;/p&gt;

&lt;p&gt;I discovered the issue 2 days later when I manually checked why engagement dropped to zero.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt;: Treat quota and billing errors as alerts, not log lines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post_tweet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;ApiError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# CreditsDepleted = all posting dead until cycle resets. Treat as outage.
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CreditsDepleted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;send_alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;API QUOTA HIT: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. All posting blocked until reset.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tweet failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The distinction matters. A 500 error is usually transient and clears on retry. A quota error means everything stays broken until the billing cycle resets. That deserves a push notification, not a log line buried in a file.&lt;/p&gt;
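
&lt;p&gt;To make that split explicit, the routing decision can live in one small function. This is a sketch under my own assumptions: the marker strings and the choice to treat 402 and 429 as blocking are illustrative and should be tuned to the API you actually call.&lt;/p&gt;

```python
# Hypothetical helper: decide whether a failed API call should page someone.
BLOCKING_MARKERS = ("CreditsDepleted", "quota", "billing", "payment required")
BLOCKING_STATUS_CODES = {402, 429}  # assumed set; adjust per provider


def classify_failure(status_code, message):
    """Return "alert" for failures that block all further calls, else "log"."""
    text = message.lower()
    if status_code in BLOCKING_STATUS_CODES:
        return "alert"
    if any(marker.lower() in text for marker in BLOCKING_MARKERS):
        return "alert"
    return "log"


def handle_failure(status_code, message, send_alert, log_error):
    """Route the failure to the alert channel or the log, and report which."""
    decision = classify_failure(status_code, message)
    if decision == "alert":
        send_alert(f"API QUOTA HIT ({status_code}): {message}")
    else:
        log_error(f"Call failed ({status_code}): {message}")
    return decision
```

&lt;p&gt;Passing &lt;code&gt;send_alert&lt;/code&gt; and &lt;code&gt;log_error&lt;/code&gt; in as arguments keeps the rule independent of whichever notification channel and logger you use.&lt;/p&gt;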

&lt;h2&gt;5. CI Tests That Don't Run Against Production&lt;/h2&gt;

&lt;p&gt;Last week, my healthcare tool's production API broke. I didn't find out from my CI pipeline. I found out from a GitHub notification that sat unread in my inbox for 3 days.&lt;/p&gt;

&lt;p&gt;The problem: my end-to-end tests run against local Docker containers. They pass every time. But the deployed Cloud Run services had drifted. CI was green. Production was broken.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt;: A scheduled workflow that hits the real production URLs every 6 hours.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/e2e-smoke.yml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;e2e smoke tests&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;cron&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*/6&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;smoke&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pip install httpx pytest&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pytest tests/e2e/ -v&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;MCP_SERVER_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.PROD_MCP_URL }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CI passing on &lt;code&gt;localhost&lt;/code&gt; doesn't mean production works. Scheduled tests against real endpoints catch the drift. I also routed failure notifications to Telegram instead of GitHub's notification bell. GitHub is too noisy. A direct push notification cuts through.&lt;/p&gt;

&lt;h2&gt;The Pattern&lt;/h2&gt;

&lt;p&gt;Every one of these gaps follows the same shape: something fails, nothing tells me, I find out too late.&lt;/p&gt;

&lt;p&gt;The fixes are embarrassingly simple. A cron job. A JSON format string. A try/except that sends a push notification instead of writing to a file. None of this is hard.&lt;/p&gt;

&lt;p&gt;Monitoring isn't about the tool. It's about closing the loop between "something broke" and "someone who can fix it found out." If that loop is open, nothing else matters.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>devops</category>
      <category>webdev</category>
      <category>beginners</category>
    </item>
    <item>
      <title>3 Expo SDK 56 Bugs That Crashed My App Before It Even Launched</title>
      <dc:creator>Diven Rastdus</dc:creator>
      <pubDate>Wed, 27 May 2026 12:07:38 +0000</pubDate>
      <link>https://dev.to/diven_rastdus_c5af27d68f3/3-expo-sdk-56-bugs-that-crashed-my-app-before-it-even-launched-1hbp</link>
      <guid>https://dev.to/diven_rastdus_c5af27d68f3/3-expo-sdk-56-bugs-that-crashed-my-app-before-it-even-launched-1hbp</guid>
      <description>&lt;p&gt;I burned four EAS cloud builds and two hours chasing crashes that had nothing to do with my code. All three bugs came from Expo SDK 56 defaults that silently break Android builds. Here's each one and how to fix it.&lt;/p&gt;

&lt;h2&gt;Bug 1: expo-av crashes with NoClassDefFoundError&lt;/h2&gt;

&lt;p&gt;I added voice recording to a dream journal app. The &lt;a href="https://docs.expo.dev/versions/latest/sdk/audio/" rel="noopener noreferrer"&gt;Expo docs for Audio&lt;/a&gt; still reference &lt;code&gt;expo-av&lt;/code&gt; in some examples. So I installed it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx expo &lt;span class="nb"&gt;install &lt;/span&gt;expo-av
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The app compiled. TypeScript was happy. Then the EAS build failed on Android with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;java.lang.NoClassDefFoundError: 
  Failed resolution of: Lio/expo/modules/video/VideoViewModel;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;expo-av&lt;/code&gt; package pulls in video dependencies. In SDK 56, the video module was extracted to a separate &lt;code&gt;expo-video&lt;/code&gt; package. The old monolith references classes that no longer exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; &lt;code&gt;expo-av&lt;/code&gt; is deprecated starting SDK 55. Use &lt;code&gt;expo-audio&lt;/code&gt; for audio and &lt;code&gt;expo-video&lt;/code&gt; for video. They're separate packages now.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm uninstall expo-av
npx expo &lt;span class="nb"&gt;install &lt;/span&gt;expo-audio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API changed too. Old:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Audio&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;expo-av&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;recording&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;Audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Recording&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;recording&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepareToRecordAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RecordingOptionsPresets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HIGH_QUALITY&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;recording&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startAsync&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;New:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useAudioRecorder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;RecordingPresets&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;expo-audio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;recorder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useAudioRecorder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;RecordingPresets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HIGH_QUALITY&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;recorder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The new API is hook-based. No more class instances, no manual cleanup: &lt;code&gt;useAudioRecorder&lt;/code&gt; manages the recorder's lifecycle and releases it on unmount. Microphone permission is still your job, so request it before recording.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time lost:&lt;/strong&gt; 4 EAS builds (~60 minutes). The error message mentions &lt;code&gt;VideoViewModel&lt;/code&gt;, which sent me down a wrong path investigating video dependencies before I realized the entire package was deprecated.&lt;/p&gt;

&lt;h2&gt;Bug 2: Gradle 9.x silently breaks React Native&lt;/h2&gt;

&lt;p&gt;After fixing the audio crash, the next build failed with a different &lt;code&gt;NoClassDefFoundError&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;java.lang.NoClassDefFoundError: 
  com/android/build/api/variant/impl/JvmVendorSpec
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;npx expo prebuild&lt;/code&gt; generated &lt;code&gt;gradle-wrapper.properties&lt;/code&gt; pointing to Gradle 9.3.1. Gradle 9 removed &lt;code&gt;JvmVendorSpec.IBM_SEMERU&lt;/code&gt;, which React Native's Gradle plugin still references internally.&lt;/p&gt;

&lt;p&gt;The error doesn't mention Gradle versions. It doesn't say "incompatible Gradle." It just throws a class-not-found at build time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Pin Gradle to 8.x. After every &lt;code&gt;npx expo prebuild&lt;/code&gt;, check the generated wrapper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check what version prebuild generated&lt;/span&gt;
&lt;span class="nb"&gt;grep &lt;/span&gt;distributionUrl android/gradle/wrapper/gradle-wrapper.properties
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it says anything starting with &lt;code&gt;gradle-9&lt;/code&gt;, change it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;distributionUrl&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="se"&gt;\:&lt;/span&gt;&lt;span class="s"&gt;//services.gradle.org/distributions/gradle-8.13-bin.zip&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add this to CI to catch it automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;GRADLE_VER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-oP&lt;/span&gt; &lt;span class="s1"&gt;'gradle-\K[0-9]+'&lt;/span&gt; android/gradle/wrapper/gradle-wrapper.properties&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$GRADLE_VER&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-ge&lt;/span&gt; 9 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ERROR: Gradle &lt;/span&gt;&lt;span class="nv"&gt;$GRADLE_VER&lt;/span&gt;&lt;span class="s2"&gt; breaks React Native. Pin to 8.x"&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Time lost:&lt;/strong&gt; 2 builds. The error looks identical to the expo-av crash (both are &lt;code&gt;NoClassDefFoundError&lt;/code&gt;), which made me think I hadn't fully fixed bug #1.&lt;/p&gt;
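
&lt;p&gt;The shell check above relies on GNU &lt;code&gt;grep -P&lt;/code&gt;, which stock macOS does not ship. If you want the same guard to run locally as well as in CI, a portable version could be sketched in Python like this. The function names and the &lt;code&gt;SUPPORTED_MAJORS&lt;/code&gt; set are my own placeholders.&lt;/p&gt;

```python
import re
import sys
from pathlib import Path

WRAPPER = Path("android/gradle/wrapper/gradle-wrapper.properties")
SUPPORTED_MAJORS = {8}  # assumption: only Gradle 8.x is known to build this project


def gradle_major(properties_text):
    """Return the Gradle major version from distributionUrl, or None if absent."""
    for line in properties_text.splitlines():
        if line.strip().startswith("distributionUrl"):
            match = re.search(r"gradle-(\d+)", line)
            if match:
                return int(match.group(1))
    return None


def check_wrapper(properties_text):
    """Return an error message if the wrapper version is unusable, else None."""
    major = gradle_major(properties_text)
    if major is None:
        return "ERROR: no Gradle version found in distributionUrl"
    if major not in SUPPORTED_MAJORS:
        return f"ERROR: Gradle {major} is not a supported version. Pin to 8.x"
    return None


# Runs only when the prebuild output exists; exits non-zero on a bad version.
if __name__ == "__main__" and WRAPPER.exists():
    error = check_wrapper(WRAPPER.read_text())
    if error:
        sys.exit(error)
```

&lt;p&gt;Unlike the shell version, this also fails loudly when the version cannot be parsed at all, instead of comparing an empty string.&lt;/p&gt;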

&lt;h2&gt;Bug 3: Barrel exports + native modules = cascading crash&lt;/h2&gt;

&lt;p&gt;I had a standard barrel export file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/dream/components/index.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DreamCard&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./DreamCard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;MoodPicker&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./MoodPicker&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;VoiceRecorder&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./VoiceRecorder&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;VoiceRecorder&lt;/code&gt; imports &lt;code&gt;expo-audio&lt;/code&gt;. Every screen that imported &lt;em&gt;anything&lt;/em&gt; from &lt;code&gt;@dream/components&lt;/code&gt; would trigger the native module resolution for &lt;code&gt;expo-audio&lt;/code&gt;, even screens that never rendered the recorder.&lt;/p&gt;

&lt;p&gt;In Expo Go (no native modules bundled), this crashes the entire app. Not just the recording screen. Every screen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Never barrel-export components that depend on native modules. Import them directly and lazy-load:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/dream/components/index.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DreamCard&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./DreamCard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;MoodPicker&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./MoodPicker&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// VoiceRecorder NOT barrel-exported -- requires native module&lt;/span&gt;
&lt;span class="c1"&gt;// Import directly: import { VoiceRecorder } from '@dream/components/VoiceRecorder'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the consuming screen, use React.lazy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;lazy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Suspense&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;VoiceRecorder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;lazy&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="k"&gt;import&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@dream/components/VoiceRecorder&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;VoiceRecorder&lt;/span&gt; &lt;span class="p"&gt;}))&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// In render:&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Suspense&lt;/span&gt; &lt;span class="nx"&gt;fallback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;ActivityIndicator&lt;/span&gt; &lt;span class="o"&gt;/&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;VoiceRecorder&lt;/span&gt; &lt;span class="o"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/Suspense&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way the native module only loads when the component actually renders, and screens that don't use it never touch &lt;code&gt;expo-audio&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time lost:&lt;/strong&gt; 1 hour. The crash logs pointed to the native module, not the import chain. I kept looking at &lt;code&gt;expo-audio&lt;/code&gt; configuration when the real problem was in &lt;code&gt;index.ts&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The checklist I wish I had
&lt;/h2&gt;

&lt;p&gt;Before your next Expo SDK 56 Android build: grep for &lt;code&gt;expo-av&lt;/code&gt; (replace it with &lt;code&gt;expo-audio&lt;/code&gt;/&lt;code&gt;expo-video&lt;/code&gt;), check that &lt;code&gt;gradle-wrapper.properties&lt;/code&gt; doesn't point at Gradle 9.x after prebuild, and audit barrel exports for native module imports.&lt;/p&gt;

&lt;p&gt;But mostly: &lt;strong&gt;check the SDK changelog before choosing packages.&lt;/strong&gt; I would have caught bug #1 in 30 seconds by reading the &lt;a href="https://blog.expo.dev/" rel="noopener noreferrer"&gt;Expo SDK 56 changelog&lt;/a&gt;. The deprecation is documented. I just didn't look.&lt;/p&gt;

</description>
      <category>reactnative</category>
      <category>expo</category>
      <category>android</category>
      <category>mobile</category>
    </item>
    <item>
      <title>My Production App Was Down for 24 Hours and Nobody Told Me</title>
      <dc:creator>Diven Rastdus</dc:creator>
      <pubDate>Sat, 23 May 2026 12:09:48 +0000</pubDate>
      <link>https://dev.to/diven_rastdus_c5af27d68f3/my-production-app-was-down-for-24-hours-and-nobody-told-me-3nma</link>
      <guid>https://dev.to/diven_rastdus_c5af27d68f3/my-production-app-was-down-for-24-hours-and-nobody-told-me-3nma</guid>
      <description>&lt;p&gt;I built an AI assessment app for a consulting firm prospect. Deployed it on Supabase free tier. Sent them the link. Then I waited for their review.&lt;/p&gt;

&lt;p&gt;What I didn't know: Supabase auto-pauses free-tier projects after 7 days of inactivity. My prospect opened the link and saw an error page. For up to 24 hours, my best lead thought my work was broken. I found out by accident when I checked the dashboard myself.&lt;/p&gt;

&lt;p&gt;No alert. No email I noticed. No monitoring. Just silence while my credibility evaporated.&lt;/p&gt;

&lt;h2&gt;
  
  
  The failure mode nobody warns you about
&lt;/h2&gt;

&lt;p&gt;Most monitoring advice assumes you're running your own servers. "Set up Prometheus. Configure Grafana dashboards. Integrate PagerDuty." That's great if you're running Kubernetes at scale.&lt;/p&gt;

&lt;p&gt;But if you're an indie developer shipping on free tiers, the failure mode is different. Your platform shuts you down deliberately because you're not generating enough activity.&lt;/p&gt;

&lt;p&gt;This isn't a Supabase-specific problem. It's a pattern across every free-tier platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Supabase free&lt;/strong&gt;: auto-pauses after 7 days of inactivity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fly.io free&lt;/strong&gt;: machines stop after ~5 minutes idle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Render free&lt;/strong&gt;: services spin down after 15 minutes of inactivity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Railway free&lt;/strong&gt;: $5 credit cap, then full stop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vercel hobby&lt;/strong&gt;: bandwidth and serverless execution limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every one of these can take your app offline while you sleep. And none of them page you when it happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 5 signals I actually monitor now
&lt;/h2&gt;

&lt;p&gt;After losing that prospect, I built a monitoring checklist. Nothing fancy. No SaaS subscription required. Just the bare minimum that would have caught the problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Availability ping (is the thing alive?)
&lt;/h3&gt;

&lt;p&gt;The most basic check. Hit your API endpoint. If the HTTP status isn't 2xx or a known "alive" code, something is wrong.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nv"&gt;URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://your-project.supabase.co/rest/v1/"&lt;/span&gt;
&lt;span class="nv"&gt;STATUS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"%{http_code}"&lt;/span&gt; &lt;span class="nt"&gt;--max-time&lt;/span&gt; 15 &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$STATUS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in
  &lt;/span&gt;2??|401|404|405&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ALIVE"&lt;/span&gt; &lt;span class="p"&gt;;;&lt;/span&gt;
  &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"DOWN: HTTP &lt;/span&gt;&lt;span class="nv"&gt;$STATUS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="p"&gt;;;&lt;/span&gt;
&lt;span class="k"&gt;esac&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why &lt;code&gt;401&lt;/code&gt; counts as alive: Supabase returns 401 when you hit the REST endpoint without an API key. That's fine. It means the server is running. A paused project returns a 5xx or times out entirely.&lt;/p&gt;
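&lt;p&gt;If the check lives in a Node script instead of bash, the same decision can be sketched in TypeScript. This is a minimal illustration, not part of my actual setup: &lt;code&gt;classifyStatus&lt;/code&gt; and &lt;code&gt;ALIVE_NON_2XX&lt;/code&gt; are hypothetical names, and it assumes the same "alive" codes as the shell script (any 2xx plus 401, 404, and 405). curl prints &lt;code&gt;000&lt;/code&gt; when it gets no response, which is modelled here as &lt;code&gt;0&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical helper that mirrors the shell `case` statement.
type Liveness = "ALIVE" | "DOWN";

// Non-2xx codes that still prove a server answered the request.
// Assumption: same list as the shell script (401, 404, 405).
const ALIVE_NON_2XX: number[] = [401, 404, 405];

function classifyStatus(status: number): Liveness {
  // Any 2xx code means the request was served.
  const is2xx = Math.floor(status / 100) === 2;
  const knownAlive = ALIVE_NON_2XX.indexOf(status) !== -1;
  return is2xx || knownAlive ? "ALIVE" : "DOWN";
}

console.log(classifyStatus(401)); // server is up, just unauthenticated
console.log(classifyStatus(503)); // server error
console.log(classifyStatus(0));   // no response at all (curl's 000)
```

&lt;p&gt;Keeping the list of "alive" codes in one place makes it harder to quietly treat a real failure as healthy when you add a new endpoint.&lt;/p&gt;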

&lt;h3&gt;
  
  
  2. Critical endpoint health (does it return real data?)
&lt;/h3&gt;

&lt;p&gt;An availability ping tells you the server boots. It doesn't tell you the database migrated correctly or the API returns valid responses.&lt;/p&gt;

&lt;p&gt;Pick your most critical endpoint. Hit it with real parameters. Validate the response shape.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;RESPONSE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"apikey: &lt;/span&gt;&lt;span class="nv"&gt;$SUPABASE_ANON_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
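  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;%/&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/your_table?select=id&amp;amp;limit=1"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;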

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RESPONSE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s1"&gt;'.[0].id'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null 2&amp;gt;&amp;amp;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"HEALTHY"&lt;/span&gt;
&lt;span class="k"&gt;else
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"DEGRADED: unexpected response shape"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This catches migrations that broke a column name, RLS policies that started blocking reads, and connection pool exhaustion. All things I've hit in production that a simple ping would have missed.&lt;/p&gt;
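&lt;p&gt;The shape check itself is easy to express in TypeScript if you would rather not depend on &lt;code&gt;jq&lt;/code&gt;. The sketch below is an assumption-heavy illustration: &lt;code&gt;hasExpectedShape&lt;/code&gt; is a hypothetical name, and it only checks what the &lt;code&gt;jq&lt;/code&gt; filter checks, roughly that the body is a non-empty JSON array whose first row has a non-null &lt;code&gt;id&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical validator: true only for a non-empty JSON array whose
// first element carries a non-null `id` field.
function hasExpectedShape(body: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return false; // not JSON at all, e.g. an HTML error page
  }
  // An error object or an empty result set is treated as degraded.
  if (!Array.isArray(parsed) || parsed.length === 0) return false;
  const first: unknown = parsed[0];
  if (typeof first !== "object" || first === null) return false;
  const row = first as { id?: unknown };
  return !(row.id === undefined || row.id === null);
}

console.log(hasExpectedShape('[{"id": 1}]') ? "HEALTHY" : "DEGRADED");
console.log(hasExpectedShape('{"message": "error"}') ? "HEALTHY" : "DEGRADED");
```

&lt;p&gt;An empty array counts as degraded here, which matches the shell version. If your critical table can legitimately be empty, loosen that rule.&lt;/p&gt;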

&lt;h3&gt;
  
  
  3. Platform-specific tripwires
&lt;/h3&gt;

&lt;p&gt;Every platform has a "we're about to shut you down" signal. Find it and watch for it.&lt;/p&gt;

&lt;p&gt;Supabase emails a warning 24 hours before auto-pausing a project, so I added this search to my morning Gmail scan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight email"&gt;&lt;code&gt;&lt;span class="nt"&gt;from&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="na"&gt;supabase.com subject:(paused OR pausing OR inactive)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For AWS, it's budget alerts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws budgets create-budget &lt;span class="nt"&gt;--account-id&lt;/span&gt; &lt;span class="nv"&gt;$ACCOUNT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--budget&lt;/span&gt; &lt;span class="s1"&gt;'{"BudgetName":"monthly-cap","BudgetLimit":{"Amount":"10","Unit":"USD"},"TimeUnit":"MONTHLY","BudgetType":"COST"}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--notifications-with-subscribers&lt;/span&gt; &lt;span class="s1"&gt;'[{"Notification":{"NotificationType":"ACTUAL","ComparisonOperator":"GREATER_THAN","Threshold":50},"Subscribers":[{"SubscriptionType":"EMAIL","Address":"you@example.com"}]}]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The specifics vary by platform. The principle doesn't: find the signal your platform sends before it kills you, and make sure you're listening.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Post-deploy smoke test
&lt;/h3&gt;

&lt;p&gt;Every deployment should end with a health check. Not "the build succeeded." Not "the tests passed." Did the deployed version actually respond correctly?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/smoke.yml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Post-deploy smoke test&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;workflow_run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;workflows&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Deploy"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;completed&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;smoke&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ github.event.workflow_run.conclusion == 'success' }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Check production health&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;STATUS=$(curl -s -o /dev/null -w "%{http_code}" \&lt;/span&gt;
            &lt;span class="s"&gt;--max-time 30 "https://your-app.vercel.app/api/health")&lt;/span&gt;
          &lt;span class="s"&gt;if [ "$STATUS" != "200" ]; then&lt;/span&gt;
            &lt;span class="s"&gt;echo "SMOKE TEST FAILED: HTTP $STATUS"&lt;/span&gt;
            &lt;span class="s"&gt;exit 1&lt;/span&gt;
          &lt;span class="s"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I've had deployments where the build was green, tests passed, Vercel reported success, and the app was broken because an environment variable wasn't set in the production environment. A 10-second curl after deploy would have caught it.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Keep-alive cron for free tiers
&lt;/h3&gt;

&lt;p&gt;This is the one that would have saved me. A cron job that pings your free-tier services twice a week, resetting the inactivity timer before the platform shuts you down.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Keep free-tier backends alive. Run Mon+Thu via cron.&lt;/span&gt;
&lt;span class="c"&gt;# Any request resets the inactivity timer.&lt;/span&gt;

&lt;span class="nv"&gt;PROJECTS&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;
  &lt;span class="s2"&gt;"project-id-1:my-saas-demo"&lt;/span&gt;
  &lt;span class="s2"&gt;"project-id-2:portfolio-api"&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;entry &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECTS&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;%%&lt;/span&gt;:&lt;span class="p"&gt;*&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nv"&gt;NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;#*&lt;/span&gt;:&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nv"&gt;STATUS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"%{http_code}"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--max-time&lt;/span&gt; 15 &lt;span class="s2"&gt;"https://&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.supabase.co/rest/v1/"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-Iseconds&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$NAME&lt;/span&gt;&lt;span class="s2"&gt; HTTP=&lt;/span&gt;&lt;span class="nv"&gt;$STATUS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# crontab&lt;/span&gt;
0 9 &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; 1,4 /home/you/bin/keepalive.sh &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /var/log/keepalive.log 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two requests per week. Zero cost. Prevents a class of outage that no amount of application-level error handling can catch.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;If I were starting a new project today, I'd set up monitoring before I deploy, not after the first outage.&lt;/p&gt;

&lt;p&gt;The full checklist takes about 30 minutes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write a &lt;code&gt;/api/health&lt;/code&gt; endpoint that checks database connectivity&lt;/li&gt;
&lt;li&gt;Add a post-deploy smoke test in CI&lt;/li&gt;
&lt;li&gt;Set up platform-specific alerts (budget, pause warnings, rate limits)&lt;/li&gt;
&lt;li&gt;Add a keep-alive cron for any free-tier dependency&lt;/li&gt;
&lt;li&gt;Put the monitoring script in the same repo as the app&lt;/li&gt;
&lt;/ol&gt;
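&lt;p&gt;For the first item, the core of a health endpoint is just "run the dependency checks, return 200 only if all of them pass." A minimal sketch of that decision, with every name hypothetical (&lt;code&gt;CheckResult&lt;/code&gt;, &lt;code&gt;HealthResponse&lt;/code&gt;, &lt;code&gt;buildHealthResponse&lt;/code&gt;) and the actual database query left to whatever framework and client you use:&lt;/p&gt;

```typescript
// Hypothetical types and names; not taken from any framework.
type CheckResult = { name: string; ok: boolean };
type HealthResponse = { status: number; body: string };

// Returns 200 only when every dependency check passed, otherwise 503,
// so a plain curl on the status code is enough for a smoke test.
function buildHealthResponse(checks: CheckResult[]): HealthResponse {
  const failed = checks.filter((c) => !c.ok).map((c) => c.name);
  const healthy = failed.length === 0;
  return {
    status: healthy ? 200 : 503,
    body: JSON.stringify({ status: healthy ? "ok" : "degraded", failed }),
  };
}

console.log(buildHealthResponse([{ name: "database", ok: true }]));
console.log(buildHealthResponse([{ name: "database", ok: false }]));
```

&lt;p&gt;Listing the failed checks in the body means the smoke-test log tells you which dependency broke, not just that something did.&lt;/p&gt;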

&lt;p&gt;None of this requires a monitoring SaaS. A bash script, a cron job, and a GitHub Actions workflow cover 90% of what a solo developer needs.&lt;/p&gt;

&lt;p&gt;The remaining 10%? That's where proper observability tools earn their keep. Distributed tracing, error aggregation, performance profiling. But you can't justify those until you've nailed the basics.&lt;/p&gt;

&lt;p&gt;Start with a cron job and a curl. It's boring. It works. And it would have saved me from explaining to a prospect why my demo was showing an error page.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>devops</category>
      <category>observability</category>
      <category>webdev</category>
    </item>
    <item>
      <title>R8 Minification Silently Killed My Android App's Core Feature (And Tests Didn't Catch It)</title>
      <dc:creator>Diven Rastdus</dc:creator>
      <pubDate>Wed, 20 May 2026 12:31:41 +0000</pubDate>
      <link>https://dev.to/diven_rastdus_c5af27d68f3/r8-minification-silently-killed-my-android-apps-core-feature-and-tests-didnt-catch-it-1pmb</link>
      <guid>https://dev.to/diven_rastdus_c5af27d68f3/r8-minification-silently-killed-my-android-apps-core-feature-and-tests-didnt-catch-it-1pmb</guid>
      <description>&lt;p&gt;My CI pipeline was green. Unit tests passed. The APK built and signed without errors. I installed it on my Pixel 3. The app launched, looked perfect.&lt;/p&gt;

&lt;p&gt;Then I tried to use it. Nothing happened.&lt;/p&gt;

&lt;p&gt;No crash. No error dialog. No logcat stacktrace. The app's entire core feature was just... gone. Like someone had hollowed it out and left the shell.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I was building
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/astraedus/nudge" rel="noopener noreferrer"&gt;Nudge&lt;/a&gt; is an open-source Android app blocker. You pick the apps you waste time on and set a delay (say, 30 seconds). Nudge forces you to wait before opening them.&lt;/p&gt;

&lt;p&gt;It uses three Android system APIs that require special permissions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AccessibilityService&lt;/strong&gt; to detect which app is in the foreground&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SYSTEM_ALERT_WINDOW&lt;/strong&gt; to draw the delay countdown overlay&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PACKAGE_USAGE_STATS&lt;/strong&gt; to track daily screen time per app&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The debug build worked flawlessly. I'd been testing it for weeks. The release build was supposed to be the same thing, just signed and minified.&lt;/p&gt;

&lt;p&gt;It was not the same thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 2MB red flag I ignored
&lt;/h2&gt;

&lt;p&gt;Here's what the release APK looked like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;debug APK:   12.4 MB
release APK:  2.1 MB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I should have questioned that 83% size reduction. Instead, I thought "wow, R8 really does its job." It did its job too well.&lt;/p&gt;

&lt;p&gt;R8 is Android's default code shrinker and optimizer. It removes unused classes, inlines methods, renames symbols, and strips dead code. For most apps, it's free performance. For apps that rely on system-level callbacks, it's a silent killer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What R8 actually stripped
&lt;/h2&gt;

&lt;p&gt;R8 analyzed my code's call graph and decided several things were "unused":&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;NudgeAccessibilityService&lt;/strong&gt; - The Android system instantiates this class by reading the manifest. R8 doesn't know that. It saw no &lt;code&gt;new NudgeAccessibilityService()&lt;/code&gt; in the code, so it stripped it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;BlockOverlayActivity&lt;/strong&gt; - Launched via an explicit &lt;code&gt;Intent&lt;/code&gt; constructed at runtime. R8 traced the static references but couldn't follow the dynamic class resolution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hilt entry points&lt;/strong&gt; - Dagger/Hilt uses annotation processing and reflection to wire dependencies. R8 stripped the interfaces that Hilt's generated code needs at runtime.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result? The app installed. The UI rendered. But the AccessibilityService never registered with the system. No foreground app detection. No overlay. No blocking. The app was a beautiful, non-functional shell.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why tests didn't catch it
&lt;/h2&gt;

&lt;p&gt;This is the part that stung. I had tests. They passed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# From our GitHub Actions workflow&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run tests&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./gradlew test&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build release APK&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./gradlew assembleRelease&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem: &lt;code&gt;./gradlew test&lt;/code&gt; runs local JVM unit tests against the compiled classes, &lt;strong&gt;before&lt;/strong&gt; R8 ever touches them. The minified release APK is a completely different artifact. My tests verified the unminified code worked. Then CI built a release APK that was structurally different.&lt;/p&gt;

&lt;p&gt;This is equivalent to testing your code on localhost and deploying a Docker image built with different flags. The artifact you tested is not the artifact you shipped.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix (and why I didn't just add ProGuard rules)
&lt;/h2&gt;

&lt;p&gt;The obvious fix is ProGuard keep rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# AccessibilityService instantiated by the system via manifest
-keep class com.astraedus.nudge.service.NudgeAccessibilityService { *; }

# Overlay activity launched via Intent
-keep class com.astraedus.nudge.ui.overlay.BlockOverlayActivity { *; }

# Hilt entry points use reflection
-keep interface com.astraedus.nudge.service.NudgeAccessibilityService$NudgeAccessibilityEntryPoint { *; }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I wrote these rules. Then I disabled R8 entirely.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nf"&gt;buildTypes&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;release&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;isMinifyEnabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;
        &lt;span class="n"&gt;signingConfig&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;signingConfigs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"release"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why? Nudge doesn't request the internet permission at all. There's no network call to intercept, no API key to extract, no proprietary algorithm to reverse-engineer. The source code is public on GitHub. Minification was adding build complexity and production risk for zero security benefit.&lt;/p&gt;

&lt;p&gt;ProGuard rules are the right answer for apps that need obfuscation. For open-source apps with system-level APIs, the risk-reward math doesn't work out. Every new service class or Hilt module becomes a potential R8 landmine unless you maintain the keep rules in lockstep.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I should have done differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Test the release build, not just the debug build.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run tests against release variant&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./gradlew testReleaseUnitTest&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build and install release APK on emulator&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;./gradlew assembleRelease&lt;/span&gt;
    &lt;span class="s"&gt;adb install app/build/outputs/apk/release/app-release.apk&lt;/span&gt;
    &lt;span class="s"&gt;# Smoke test: verify the accessibility service registers&lt;/span&gt;
    &lt;span class="s"&gt;adb shell dumpsys accessibility | grep -q "NudgeAccessibilityService"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



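&lt;p&gt;The unit-test step alone is not enough, because unit tests never run against R8's output. The install-and-&lt;code&gt;dumpsys&lt;/code&gt; step is the one that would have caught the stripped service in CI before it reached a device.&lt;/p&gt;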

&lt;p&gt;&lt;strong&gt;2. Treat APK size deltas as a signal.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A debug-to-release size reduction of more than 50% warrants investigation. R8 removing 83% of your APK means it's removing a lot of code it thinks is dead. Some of it might not be.&lt;/p&gt;
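&lt;p&gt;That rule is simple enough to automate as a CI guard. The sketch below is one possible way to do it, not something Nudge ships: the function names are hypothetical, the 50% threshold is the rule of thumb from above, and the sizes are the two APK sizes from this post.&lt;/p&gt;

```typescript
// Hypothetical CI guard: how much smaller is the release APK, in percent?
function sizeReductionPercent(debugMb: number, releaseMb: number): number {
  return ((debugMb - releaseMb) / debugMb) * 100;
}

// Flags reductions above the threshold (default 50, a rule of thumb).
function isSuspiciousReduction(
  debugMb: number,
  releaseMb: number,
  thresholdPercent: number = 50,
): boolean {
  return sizeReductionPercent(debugMb, releaseMb) > thresholdPercent;
}

// The two APK sizes from this post.
const reduction = sizeReductionPercent(12.4, 2.1);
console.log(reduction.toFixed(1) + "% smaller");
console.log(isSuspiciousReduction(12.4, 2.1) ? "INVESTIGATE" : "OK");
```

&lt;p&gt;A flagged build isn't necessarily broken. It's a prompt to look at what R8 removed before you ship.&lt;/p&gt;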

&lt;p&gt;&lt;strong&gt;3. Verify system service registration in your test suite.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For any Android app that uses AccessibilityService, NotificationListenerService, or DeviceAdminReceiver, add a test that verifies the service appears in the system's service list after installation. These services fail silently when missing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: the permission disclosure pattern
&lt;/h2&gt;

&lt;p&gt;Getting Google Play to approve an app with QUERY_ALL_PACKAGES, AccessibilityService, and SYSTEM_ALERT_WINDOW is its own adventure. Google's review team needs to see explicit in-app disclosure of what each permission does and why.&lt;/p&gt;

&lt;p&gt;Here's the pattern I used in the onboarding flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Composable&lt;/span&gt;
&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;PermissionCard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;icon&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ImageVector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;onClick&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;Unit&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Modifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillMaxWidth&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;onClick&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;onClick&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Row&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Modifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;Icon&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;icon&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contentDescription&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nc"&gt;Spacer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Modifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;width&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="nc"&gt;Column&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nc"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;style&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MaterialTheme&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;typography&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;titleSmall&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nc"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;style&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MaterialTheme&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;typography&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bodySmall&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each card explains what the permission accesses, why, and explicitly states what it does NOT do:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Detects which app is in the foreground so Nudge can trigger your block rules. Does not read your messages, keystrokes, or screen content."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The "does not" part matters. Google's review checks for proactive privacy disclosure, and users are (rightly) suspicious of AccessibilityService apps.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;R8 minification is not compression. It's a code transformation that decides what your app needs at runtime. When those decisions are wrong, nothing crashes. Nothing logs an error. The feature just doesn't exist.&lt;/p&gt;

&lt;p&gt;If your Android app uses system callbacks (AccessibilityService, ContentProvider, BroadcastReceiver registered in manifest), either maintain ProGuard rules religiously or ask yourself whether minification is earning its keep.&lt;/p&gt;

&lt;p&gt;And whatever you do: test the artifact you ship, not the artifact you develop against. That gap is where bugs like this live.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Nudge is open source: &lt;a href="https://github.com/astraedus/nudge" rel="noopener noreferrer"&gt;github.com/astraedus/nudge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>android</category>
      <category>kotlin</category>
      <category>cicd</category>
      <category>testing</category>
    </item>
    <item>
      <title>Your AI Agent Evaluation Is Lying to You: Why 10 Test Runs Prove Nothing</title>
      <dc:creator>Diven Rastdus</dc:creator>
      <pubDate>Fri, 08 May 2026 12:06:09 +0000</pubDate>
      <link>https://dev.to/diven_rastdus_c5af27d68f3/your-ai-agent-evaluation-is-lying-to-you-why-10-test-runs-prove-nothing-1ij2</link>
      <guid>https://dev.to/diven_rastdus_c5af27d68f3/your-ai-agent-evaluation-is-lying-to-you-why-10-test-runs-prove-nothing-1ij2</guid>
      <description>&lt;p&gt;I ran 10 games between two AI agents. Agent v3 went 5-5 against Agent v1. I reported "v3 ties v1, no measurable improvement, don't merge."&lt;/p&gt;

&lt;p&gt;That conclusion was wrong. Not because v3 was secretly better or worse, but because 10 games told me almost nothing at all.&lt;/p&gt;

&lt;p&gt;Here's the math I should have done first.&lt;/p&gt;

&lt;h2&gt;
  
  
  The win-rate trap
&lt;/h2&gt;

&lt;p&gt;The obvious metric for comparing two agents is win rate. Agent A beats Agent B 50% of the time? They're even. 70%? A is better. Simple.&lt;/p&gt;

&lt;p&gt;Except win rate has a confidence interval, and at small N that interval is enormous.&lt;/p&gt;

&lt;p&gt;The Wilson score interval gives a reasonable bound for binary outcomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wilson_interval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wins&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.96&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;95% confidence interval for true win probability.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wins&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
    &lt;span class="n"&gt;denom&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
    &lt;span class="n"&gt;center&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;denom&lt;/span&gt;
    &lt;span class="n"&gt;spread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;denom&lt;/span&gt;
    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;center&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;spread&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;center&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;spread&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At 5 wins out of 10 games:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;wilson_interval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.236&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.764&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 95% confidence interval for the true win probability is &lt;strong&gt;[0.24, 0.76]&lt;/strong&gt;. That range comfortably fits "Agent A is dominant" (76% win rate), "they're even" (50%), and "Agent B is dominant" (24%). You literally cannot tell them apart.&lt;/p&gt;

&lt;p&gt;How many games do you need? For two agents where the true skill gap gives one a 60% win rate, you need roughly 100 games to shrink the CI enough to exclude 50%. For a 55% edge, you need close to 400.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Minimum games to distinguish p_true from 0.5 at 95% confidence
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;min_games&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.96&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Approximate sample size for Wilson CI to exclude 0.5.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_true&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ceil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;p_true&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;p_true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;min_games&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 60% true win rate
&lt;/span&gt;&lt;span class="mi"&gt;93&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;min_games&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.55&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 55% true win rate
&lt;/span&gt;&lt;span class="mi"&gt;381&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;min_games&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.52&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 52% true win rate
&lt;/span&gt;&lt;span class="mi"&gt;2401&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most agent improvements are in the 52-58% range against the prior version. You need hundreds of games, not ten.&lt;/p&gt;
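&lt;p&gt;The formula above is a normal approximation. As a cross-check, here is a minimal sketch that searches for the smallest N whose Wilson lower bound actually clears 0.5. It assumes the observed win rate equals the true rate, which real runs will not do exactly. The helper names wilson_lower and games_until_excludes_even are hypothetical, not part of my ladder code.&lt;/p&gt;

```python
import math


def wilson_lower(p_hat, n, z=1.96):
    """Lower Wilson bound for an observed win rate p_hat over n games."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    spread = z * math.sqrt((p_hat * (1 - p_hat) + z**2 / (4 * n)) / n) / denom
    return center - spread


def games_until_excludes_even(p_hat, z=1.96, max_games=100_000):
    """Smallest n whose Wilson lower bound sits above 0.5, or None if never."""
    for n in range(1, max_games + 1):
        if wilson_lower(p_hat, n, z) > 0.5:
            return n
    return None


# Assumption: the observed rate matches the true rate exactly.
for rate in (0.60, 0.55, 0.52):
    print(rate, games_until_excludes_even(rate))
```

&lt;p&gt;For the 60% and 55% cases the search lands a few games above the approximation, so treat those counts as a floor on the budget, not a target.&lt;/p&gt;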

&lt;h2&gt;
  
  
  TrueSkill makes the same mistake look different
&lt;/h2&gt;

&lt;p&gt;If you're running a multi-agent ladder (like I am for a Kaggle competition), you're probably using TrueSkill or Elo instead of raw win rate. These feel more sophisticated. They give you a single number -- the mu rating -- and you compare it across agents.&lt;/p&gt;

&lt;p&gt;But TrueSkill also tracks sigma, the uncertainty in that rating. And at low game counts, sigma is so large that the ratings are meaningless.&lt;/p&gt;

&lt;p&gt;Here's my actual ladder setup, mirroring Kaggle's scoring:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;trueskill&lt;/span&gt;

&lt;span class="n"&gt;env&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trueskill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TrueSkill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;600.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# Kaggle's initial rating
&lt;/span&gt;    &lt;span class="n"&gt;sigma&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;200.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# starts extremely uncertain
&lt;/span&gt;    &lt;span class="n"&gt;draw_probability&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After 10 games, a typical agent might show mu=640, sigma=36. That looks precise. It's not. The 95% confidence interval on the true skill is [mu - 2*sigma, mu + 2*sigma] = &lt;strong&gt;[568, 712]&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When I compared v1 (mu=640, sigma=36) against v3 (mu=560, sigma=36), the intervals were [568, 712] and [488, 632]. They overlap by 64 points. I could not distinguish these agents. But the mu gap (80 points) looked meaningful on a leaderboard.&lt;/p&gt;

&lt;p&gt;The fix is to check sigma before drawing conclusions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ratings_are_distinguishable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rating_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rating_b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check if two TrueSkill ratings are statistically distinguishable.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;mu_diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rating_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;rating_b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;combined_uncertainty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rating_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sigma&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rating_b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sigma&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# z-score for the difference
&lt;/span&gt;    &lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mu_diff&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;combined_uncertainty&lt;/span&gt;
    &lt;span class="c1"&gt;# For 95% confidence, need z &amp;gt; 1.96
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;1.96&lt;/span&gt;

&lt;span class="c1"&gt;# After 10 games: NOT distinguishable
&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;ratings_are_distinguishable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;     &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_rating&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sigma&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;     &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_rating&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;560&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sigma&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="c1"&gt;# After 200 games (sigma ~8): distinguishable if gap is real
&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;ratings_are_distinguishable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;     &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_rating&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sigma&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;     &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_rating&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;560&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sigma&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The fix: three rules
&lt;/h2&gt;

&lt;p&gt;After burning a day on a wrong conclusion, I now follow three rules for agent evaluation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule 1: Persist ratings across runs.&lt;/strong&gt; Every ladder session starting from sigma=200 wastes all prior information. Save ratings to disk and load them on the next run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="n"&gt;RATINGS_PATH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;runs/ratings.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_ratings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Load persisted TrueSkill ratings, or return empty dict.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;RATINGS_PATH&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RATINGS_PATH&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_rating&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;sigma&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sigma&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_ratings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Persist current ratings to disk.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;RATINGS_PATH&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sigma&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sigma&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;RATINGS_PATH&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now each run adds information instead of starting from scratch. Sigma actually converges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule 2: Set a sigma threshold before making decisions.&lt;/strong&gt; Don't compare agents until both have sigma below the gap you care about. For my competition, that's sigma &amp;lt; 15:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_converged&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sigma_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;15.0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sigma&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;sigma_threshold&lt;/span&gt;

&lt;span class="c1"&gt;# Before comparing v1 and v3:
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;is_converged&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;is_converged&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])):&lt;/span&gt;
    &lt;span class="n"&gt;games_needed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_games_to_converge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Need ~&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;games_needed&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; more games before comparison is valid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
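&lt;p&gt;The snippet above calls estimate_games_to_converge without showing it. Here is one hypothetical way such a helper could look. It is a rough sketch that assumes sigma shrinks roughly like 1/sqrt(games), which is a heuristic and not something TrueSkill guarantees. The signature differs from the call above: it takes a single agent's sigma and game count instead of the whole ratings dict.&lt;/p&gt;

```python
import math


def estimate_games_to_converge(sigma_now, games_played, sigma_threshold=15.0):
    """Rough guess at the extra games needed before sigma reaches the threshold.

    Heuristic assumption: sigma is proportional to 1 / sqrt(games played).
    """
    if sigma_threshold > sigma_now:
        return 0  # already converged
    if not games_played > 0:
        raise ValueError("need at least one played game to extrapolate")
    # Solve sigma_now * sqrt(games_played / total) = sigma_threshold for total.
    target_total = games_played * (sigma_now / sigma_threshold) ** 2
    return max(1, math.ceil(target_total - games_played))


print(estimate_games_to_converge(18.3, 51))
```

&lt;p&gt;With sigma=18.3 after 51 games, this heuristic suggests about 25 more games. Treat the number as a planning estimate and re-check sigma after the games are played.&lt;/p&gt;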



&lt;p&gt;&lt;strong&gt;Rule 3: Report intervals, not point estimates.&lt;/strong&gt; Never say "v3 has mu=560." Say "v3 has mu=560 +/- 72 (95% CI)." The interval is the answer. The point estimate is decoration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format_rating&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;ci&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sigma&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; +/- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (sigma=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sigma&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# "v3: 560 +/- 72 (sigma=36.0)"   -- don't trust this
# "v3: 560 +/- 16 (sigma=8.0)"    -- now we're talking
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What this actually looks like in practice
&lt;/h2&gt;

&lt;p&gt;I'm building game AI agents for a Kaggle competition. My ladder now persists ratings across sessions and prints a convergence status alongside every ranking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent           |  mu   | sigma | 95% CI        | Games | Converged
v22_timeline    |  907  |  11.2 | [885, 930]    |   142 | Yes
v21_capture     |  842  |  14.8 | [812, 871]    |    89 | Yes
romantamrazov   |  823  |  16.1 | [791, 855]    |    72 | BORDERLINE
v19_lp          |  798  |  18.3 | [761, 835]    |    51 | No
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "Converged" column is the gate. I don't merge a new agent variant until its sigma is below 15 and the CI doesn't overlap with the agent it's trying to beat. This costs more compute upfront (running 100+ games instead of 10) but saves me from merging regressions and spending days debugging phantom improvements.&lt;/p&gt;

&lt;h2&gt;
  
  
  The deeper problem
&lt;/h2&gt;

&lt;p&gt;This isn't just a statistics issue. It's a workflow issue. When you run 10 tests, get a number, and make a decision, you feel like you evaluated something. The ritual of "run tests, look at results, decide" creates false confidence even when the test itself had almost no statistical power.&lt;/p&gt;

&lt;p&gt;The fix is mechanical: compute the confidence interval, display it, and refuse to decide when it's too wide. Make the uncertainty impossible to ignore. If your evaluation pipeline doesn't show you how uncertain it is, it's not an evaluation pipeline. It's a random number generator with a nice UI.&lt;/p&gt;




&lt;p&gt;I build AI systems and compete in Kaggle's Orbit Wars competition. I write about the real problems I hit -- the kind that don't show up in tutorials. More at &lt;a href="https://astraedus.dev" rel="noopener noreferrer"&gt;astraedus.dev&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>gamedev</category>
    </item>
    <item>
      <title>How I Built a Push-Based Gmail Bridge for My AI Agent (Zero Polling, Free Tier)</title>
      <dc:creator>Diven Rastdus</dc:creator>
      <pubDate>Tue, 05 May 2026 12:05:47 +0000</pubDate>
      <link>https://dev.to/diven_rastdus_c5af27d68f3/how-i-built-a-push-based-gmail-bridge-for-my-ai-agent-zero-polling-free-tier-58oa</link>
      <guid>https://dev.to/diven_rastdus_c5af27d68f3/how-i-built-a-push-based-gmail-bridge-for-my-ai-agent-zero-polling-free-tier-58oa</guid>
      <description>&lt;p&gt;I missed a prize-notification email by 24 hours because my AI agent only checked Gmail when it booted. The email needed a response within 48 hours. I had 24 left by the time the next session started. That gap nearly cost me real money.&lt;/p&gt;

&lt;p&gt;Polling is the obvious fix. Set up a cron that checks Gmail every 5 minutes. But polling Gmail has three problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Worst-case latency equals the poll interval.&lt;/strong&gt; 5-minute polling means up to 5 minutes of dead time on urgent messages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wasted API calls.&lt;/strong&gt; 288 API calls per day to catch maybe 3-5 messages that actually matter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limit risk.&lt;/strong&gt; Gmail API quotas are generous (15K quota units per user per minute), but polling invites you to burn them on nothing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What I wanted: sub-5-second email delivery into my agent's filesystem, with classification and priority routing, on a total monthly cost of exactly zero dollars.&lt;/p&gt;

&lt;p&gt;Here's what I built.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Gmail (your-email@gmail.com)
  | users.watch() -- renew daily, 7d max expiry
  v
Cloud Pub/Sub topic
  | push subscription (OIDC-signed JWT)
  v
Cloudflare Tunnel (public URL -&amp;gt; localhost:8090)
  v
Python receiver (aiohttp)
  - verify OIDC JWT from Google
  - dedupe by Pub/Sub messageId (SQLite)
  - history.list since last stored historyId
  - messages.get for each new message
  - classify by rules engine (YAML, hot-reload)
  - fan out: urgent -&amp;gt; Telegram ping, info -&amp;gt; digest file
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: Gmail's &lt;code&gt;users.watch()&lt;/code&gt; method tells Google "push a notification to this Pub/Sub topic whenever this mailbox changes." Google handles the watching. You handle the reacting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Set up Pub/Sub
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create topic and subscription&lt;/span&gt;
gcloud pubsub topics create gmail-notifications
gcloud pubsub subscriptions create gmail-push &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gmail-notifications &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--push-endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://your-hook.example.com/pubsub &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--push-auth-service-account&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-sa@project.iam.gserviceaccount.com

&lt;span class="c"&gt;# Grant Gmail permission to publish to your topic&lt;/span&gt;
&lt;span class="c"&gt;# (Gmail API uses a fixed service account for this)&lt;/span&gt;
gcloud pubsub topics add-iam-policy-binding gmail-notifications &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:gmail-api-push@system.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/pubsub.publisher"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cost: $0. First 10 GiB/month of Pub/Sub throughput is free. Email notifications are tiny.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Register the Gmail watch
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.oauth2.credentials&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Credentials&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;googleapiclient.discovery&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;build&lt;/span&gt;

&lt;span class="n"&gt;creds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Credentials&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_authorized_user_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;token.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gmail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;users&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;watch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;me&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topicName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;projects/your-project/topics/gmail-notifications&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;labelIds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INBOX&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Watch expires: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;expiration&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Watch expires: 1714540800000 (epoch milliseconds, up to 7 days out)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two catches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The watch expires after 7 days max. Set up a daily cron to renew it.&lt;/li&gt;
&lt;li&gt;If the watch silently expires (your renewal cron missed a day), you lose notifications until renewal. Build a staleness check.&lt;/li&gt;
&lt;/ul&gt;
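&lt;p&gt;One way to build that staleness check, as a minimal sketch: store the &lt;code&gt;expiration&lt;/code&gt; value that &lt;code&gt;watch()&lt;/code&gt; returned and compare it against the clock. The function name &lt;code&gt;watch_status&lt;/code&gt; and the two-day margin are assumptions for illustration, not part of the original bridge:&lt;/p&gt;

```python
import time
from typing import Optional

# Hypothetical safety margin: renew once less than 2 days remain
RENEW_MARGIN_MS = 2 * 24 * 60 * 60 * 1000


def watch_status(expiration_ms: int, now_ms: Optional[int] = None) -> str:
    """Classify a stored watch expiration as 'ok', 'renew', or 'expired'."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    remaining = expiration_ms - now_ms
    if remaining > RENEW_MARGIN_MS:
        return "ok"
    if remaining > 0:
        return "renew"
    return "expired"


now = 1_714_000_000_000
print(watch_status(now + 5 * 24 * 60 * 60 * 1000, now_ms=now))  # ok
print(watch_status(now + 60 * 60 * 1000, now_ms=now))           # renew
print(watch_status(now - 1, now_ms=now))                        # expired
```

&lt;p&gt;The API returns &lt;code&gt;expiration&lt;/code&gt; as a string of epoch milliseconds, so convert it with &lt;code&gt;int(result["expiration"])&lt;/code&gt; before storing or comparing. An "expired" result is the case worth alerting on, because it means notifications have already been lost.&lt;/p&gt;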

&lt;h2&gt;
  
  
  Step 3: The receiver
&lt;/h2&gt;

&lt;p&gt;The receiver is a tiny aiohttp server. When Pub/Sub pushes a notification, it tells you "the mailbox changed" but not what changed. You have to walk the history yourself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
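&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aiohttp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;web&lt;/span&gt;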
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.oauth2&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;id_token&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.auth.transport&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;google_requests&lt;/span&gt;

&lt;span class="n"&gt;PUBSUB_AUDIENCE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://your-hook.example.com/pubsub&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_pubsub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;web&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;web&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 1: Verify the OIDC JWT from Google
&lt;/span&gt;    &lt;span class="n"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;web&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

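    &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;claims&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;id_token&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;verify_oauth2_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;google_requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;audience&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PUBSUB_AUDIENCE&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="ne"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Invalid signature, wrong audience, or expired token
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;web&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;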

    &lt;span class="c1"&gt;# Step 2: Decode the notification
&lt;/span&gt;    &lt;span class="n"&gt;envelope&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;envelope&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;notif&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;history_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;notif&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;historyId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Walk history since last known point
&lt;/span&gt;    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;walk_and_process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 4: ACK immediately (Pub/Sub retries on non-2xx)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;web&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;204&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The critical pattern: &lt;strong&gt;ACK fast, process async.&lt;/strong&gt; If your handler takes longer than the subscription's acknowledgement deadline (10 seconds by default), Pub/Sub assumes delivery failed and retries. That creates duplicate processing unless you dedupe. Return 204 immediately and do the expensive work in a background task.&lt;/p&gt;
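&lt;p&gt;The dedupe step can be sketched with nothing but the standard library. This is an illustration under assumptions, not the bridge's real schema: the table name &lt;code&gt;seen_pubsub&lt;/code&gt;, the helper names, and the sample payload are all invented here. It relies on the push envelope carrying a &lt;code&gt;messageId&lt;/code&gt; and a base64-encoded &lt;code&gt;data&lt;/code&gt; field:&lt;/p&gt;

```python
import base64
import json
import sqlite3


def open_dedupe_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store of Pub/Sub message IDs already handled."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_pubsub (message_id TEXT PRIMARY KEY)"
    )
    return conn


def first_delivery(conn: sqlite3.Connection, message_id: str) -> bool:
    """Record message_id; return False if it was already recorded."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO seen_pubsub (message_id) VALUES (?)", (message_id,)
    )
    conn.commit()
    # rowcount is 1 when the row was inserted, 0 when the primary key already existed
    return cur.rowcount == 1


def decode_notification(envelope: dict) -> dict:
    """Extract the Gmail payload (emailAddress, historyId) from a push envelope."""
    raw = base64.urlsafe_b64decode(envelope["message"]["data"])
    return json.loads(raw)


# Sample envelope built by hand for the demo
conn = open_dedupe_db()
payload = json.dumps({"emailAddress": "me@example.com", "historyId": "12345"}).encode()
envelope = {"message": {"messageId": "m-1", "data": base64.b64encode(payload).decode()}}

print(first_delivery(conn, envelope["message"]["messageId"]))  # True
print(first_delivery(conn, envelope["message"]["messageId"]))  # False (redelivery)
print(decode_notification(envelope)["historyId"])              # 12345
```

&lt;p&gt;Letting the primary key reject duplicates keeps the check and the write in one statement, so two near-simultaneous redeliveries cannot both pass.&lt;/p&gt;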

&lt;h2&gt;
  
  
  Step 4: History walking
&lt;/h2&gt;

&lt;p&gt;Gmail notifications only say "something changed at historyId X." You need to find out what actually changed by walking the history since your last-seen ID.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_history_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Walk history.list; return (new message stubs, latest historyId).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;added&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;page_token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;users&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;history&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;me&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;startHistoryId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;start_history_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;historyTypes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messageAdded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;pageToken&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;page_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messagesAdded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
                &lt;span class="n"&gt;added&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="n"&gt;page_token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nextPageToken&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;page_token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;added&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;historyId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gotcha: Gmail only keeps history records for a limited time (typically about a week), and once your stored historyId falls outside that window, &lt;code&gt;history.list&lt;/code&gt; returns 404. You need a fallback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
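&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;googleapiclient.errors&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HttpError&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;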
    &lt;span class="n"&gt;added&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;latest_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_history_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;HttpError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;404&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# historyId expired -- fall back to recent messages
&lt;/span&gt;        &lt;span class="n"&gt;added&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list_recent_unread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;latest_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_history_id&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
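&lt;p&gt;The fallback helper is not shown in the post, so here is one plausible shape for it. Treat it as a guess: the query string combining &lt;code&gt;is:unread&lt;/code&gt; with &lt;code&gt;newer_than&lt;/code&gt; is an assumption based on the helper's name, and the function simply pages through &lt;code&gt;users.messages.list&lt;/code&gt; on an already-built Gmail &lt;code&gt;service&lt;/code&gt; object:&lt;/p&gt;

```python
def list_recent_unread(service, days: int = 3) -> list:
    """Return message stubs ({'id', 'threadId'}) for unread mail from the last `days` days."""
    stubs = []
    page_token = None
    while True:
        resp = service.users().messages().list(
            userId="me",
            q=f"is:unread newer_than:{days}d",  # assumed search query
            pageToken=page_token,
        ).execute()
        # The 'messages' key is absent when a page has no results
        stubs.extend(resp.get("messages", []))
        page_token = resp.get("nextPageToken")
        if not page_token:
            break
    return stubs
```

&lt;p&gt;After a fallback like this, the stored historyId has to be reset to a current value. The historyId carried in the notification is one candidate, and &lt;code&gt;users.getProfile&lt;/code&gt; also returns the mailbox's current historyId.&lt;/p&gt;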



&lt;h2&gt;
  
  
  Step 5: Classification (the useful part)
&lt;/h2&gt;

&lt;p&gt;Raw email delivery is not enough. You need routing. My classifier uses a YAML rules file that hot-reloads on every call (no restart needed to add rules):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payment-notification&lt;/span&gt;
    &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;from_regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@stripe&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;.com"&lt;/span&gt;
      &lt;span class="na"&gt;subject_regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(?i)(payment|payout|charge)"&lt;/span&gt;
    &lt;span class="na"&gt;classification&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;REVENUE&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warm-lead-reply&lt;/span&gt;
    &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;in_thread_with_outbound&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;from_not_regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(?i)(noreply|automated|newsletter)"&lt;/span&gt;
    &lt;span class="na"&gt;classification&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;URGENT-HUMAN&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
    &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
    &lt;span class="na"&gt;classification&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;INFORMATIONAL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;in_thread_with_outbound&lt;/code&gt; check is the clever one. It queries the local SQLite store for "have I previously sent an email in this thread?" If yes, the reply is from someone I contacted -- a warm lead, not spam. Classify it as urgent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_has_outbound_in_thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self_addr&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT 1 FROM seen_gmail_msg &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WHERE thread_id = ? AND from_addr LIKE ? LIMIT 1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self_addr&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First match wins. The engine processes 10 rules in under 1ms. No need for a proper NLP pipeline here.&lt;/p&gt;
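&lt;p&gt;To make "first match wins" concrete, here is a small stand-in for the engine. It is a sketch under assumptions: the rules are inlined as Python dicts instead of being loaded from YAML, the message is a plain dict, and the &lt;code&gt;has_outbound&lt;/code&gt; flag stands in for the SQLite lookup shown above:&lt;/p&gt;

```python
import re

# Same three rules as the YAML example, inlined for a self-contained demo
RULES = [
    {
        "name": "payment-notification",
        "match": {
            "from_regex": r"@stripe\.com",
            "subject_regex": r"(?i)(payment|payout|charge)",
        },
        "classification": "REVENUE",
    },
    {
        "name": "warm-lead-reply",
        "match": {
            "in_thread_with_outbound": True,
            "from_not_regex": r"(?i)(noreply|automated|newsletter)",
        },
        "classification": "URGENT-HUMAN",
    },
    {"name": "default", "match": {}, "classification": "INFORMATIONAL"},
]


def rule_matches(match: dict, msg: dict) -> bool:
    """Every condition present in `match` must hold; an empty match always passes."""
    if "from_regex" in match and not re.search(match["from_regex"], msg["from"]):
        return False
    if "subject_regex" in match and not re.search(match["subject_regex"], msg["subject"]):
        return False
    if "from_not_regex" in match and re.search(match["from_not_regex"], msg["from"]):
        return False
    if match.get("in_thread_with_outbound") and not msg.get("has_outbound", False):
        return False
    return True


def classify(msg: dict, rules: list = RULES) -> str:
    """Return the classification of the first rule that matches."""
    for rule in rules:
        if rule_matches(rule["match"], msg):
            return rule["classification"]
    return "INFORMATIONAL"  # only reached if the rules list has no catch-all


print(classify({"from": "receipts@stripe.com", "subject": "Your payout is on the way"}))
# REVENUE
print(classify({"from": "jane@client.example", "subject": "Re: proposal", "has_outbound": True}))
# URGENT-HUMAN
print(classify({"from": "noreply@service.example", "subject": "Re: ticket", "has_outbound": True}))
# INFORMATIONAL
```

&lt;p&gt;Order matters with this design. The catch-all rule has to stay last, and a more specific rule placed after a broader one will never fire.&lt;/p&gt;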

&lt;h2&gt;
  
  
  Step 6: Fan-out
&lt;/h2&gt;

&lt;p&gt;After classification, messages route to different outputs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;classification&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;URGENT-HUMAN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REVENUE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Push notification (Telegram, Discord, whatever)
&lt;/span&gt;    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;send_alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;classification&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;from_addr&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Also write to urgent inbox file
&lt;/span&gt;    &lt;span class="nf"&gt;append_to_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;~/ops/INBOX-URGENT.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;formatted_entry&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Quiet digest
&lt;/span&gt;    &lt;span class="nf"&gt;append_to_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;~/ops/INBOX-DIGEST.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;formatted_entry&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Always: per-thread markdown file for full history
&lt;/span&gt;&lt;span class="nf"&gt;write_thread_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;~/ops/threads/email/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each thread gets its own markdown file with frontmatter. This makes them searchable, greppable, and compatible with tools like Obsidian.&lt;/p&gt;
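&lt;p&gt;The post does not show &lt;code&gt;write_thread_file&lt;/code&gt;, so the version below is only a guess at its shape. The frontmatter keys and the message fields (&lt;code&gt;thread_id&lt;/code&gt;, &lt;code&gt;subject&lt;/code&gt;, &lt;code&gt;date&lt;/code&gt;, &lt;code&gt;from&lt;/code&gt;, &lt;code&gt;body&lt;/code&gt;) are assumptions. It writes the frontmatter once, then appends one section per message:&lt;/p&gt;

```python
import json
from pathlib import Path


def write_thread_file(path: str, message: dict) -> None:
    """Create the thread file with frontmatter on first write, then append the message."""
    target = Path(path).expanduser()  # open() does not expand "~" on its own
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        frontmatter = (
            "---\n"
            f"thread_id: {message['thread_id']}\n"
            # json.dumps yields a double-quoted string, which is also valid YAML
            f"subject: {json.dumps(message['subject'])}\n"
            "---\n\n"
        )
        target.write_text(frontmatter, encoding="utf-8")
    entry = f"## {message['date']} from {message['from']}\n\n{message['body']}\n\n"
    with target.open("a", encoding="utf-8") as fh:
        fh.write(entry)
```

&lt;p&gt;Quoting the subject matters more than it looks: subjects routinely contain colons ("Re: ..."), which would otherwise break the YAML frontmatter.&lt;/p&gt;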

&lt;h2&gt;
  
  
  The tunnel: Cloudflare (free, stable)
&lt;/h2&gt;

&lt;p&gt;Google Pub/Sub needs a public HTTPS endpoint. Options:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Reliability&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ngrok&lt;/td&gt;
&lt;td&gt;$8/mo for stable URL&lt;/td&gt;
&lt;td&gt;Good but paid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare Tunnel&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Excellent. Runs as systemd service.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud Function&lt;/td&gt;
&lt;td&gt;$0 (free tier)&lt;/td&gt;
&lt;td&gt;Adds cold start latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-hosted VPS&lt;/td&gt;
&lt;td&gt;$5-20/mo&lt;/td&gt;
&lt;td&gt;Overkill&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cloudflare Tunnel wins. Install &lt;code&gt;cloudflared&lt;/code&gt;, authenticate, create a tunnel pointing to &lt;code&gt;localhost:8090&lt;/code&gt;, add a DNS record. Done. It runs as a systemd service with automatic reconnection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cloudflared tunnel create gmail-bridge
cloudflared tunnel route dns gmail-bridge your-hook.example.com
&lt;span class="c"&gt;# Then create a systemd unit that runs:&lt;/span&gt;
&lt;span class="c"&gt;# cloudflared tunnel run gmail-bridge&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Failure handling
&lt;/h2&gt;

&lt;p&gt;The architecture has natural resilience built in:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure&lt;/th&gt;
&lt;th&gt;What happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Receiver crashes&lt;/td&gt;
&lt;td&gt;systemd restarts it. Pub/Sub retries delivery for 7 days.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tunnel drops&lt;/td&gt;
&lt;td&gt;cloudflared reconnects. Pub/Sub retries.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;historyId too old&lt;/td&gt;
&lt;td&gt;Falls back to &lt;code&gt;messages.list newer_than:3d&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Duplicate delivery&lt;/td&gt;
&lt;td&gt;SQLite dedup by Pub/Sub messageId.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gmail API 5xx&lt;/td&gt;
&lt;td&gt;Logged to dead-letter file. Retried on next notification.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Watch silently expires&lt;/td&gt;
&lt;td&gt;Daily renewal cron + staleness monitor.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The entire system can be down for up to a week and still recover automatically, because Pub/Sub retains unacknowledged messages for 7 days by default.&lt;/p&gt;
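&lt;p&gt;Since Pub/Sub retries can deliver the same notification more than once, the dedup step has to be idempotent. A minimal sketch of that check, using only the standard library; the table and function names are hypothetical:&lt;/p&gt;

```python
import sqlite3


def open_dedup_db(path=":memory:"):
    """Open (or create) the dedup database. Table name is illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_messages ("
        " message_id TEXT PRIMARY KEY,"
        " seen_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def is_first_delivery(conn, message_id):
    """Return True the first time a Pub/Sub messageId is seen, False on redelivery."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen_messages (message_id) VALUES (?)",
            (message_id,),
        )
    # rowcount is 1 when the row was inserted, 0 when the key already existed.
    return cur.rowcount == 1
```

&lt;p&gt;Letting the primary key do the work means the check and the write happen in one statement, so two overlapping deliveries cannot both be treated as new.&lt;/p&gt;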

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt;: under 5 seconds from Gmail receiving the email to the file appearing on disk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monthly cost&lt;/strong&gt;: $0 (all free-tier components)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Uptime&lt;/strong&gt;: 6 days continuous without intervention so far&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;False positives&lt;/strong&gt;: 0 (rule-based classification is deterministic and auditable)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missed emails&lt;/strong&gt;: 0 since deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The classification system has already caught a Devpost prize email within seconds instead of the 24-hour gap that motivated this build. Telegram pings for urgent items, quiet digest for everything else.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would do differently
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with the classification rules, not the infrastructure.&lt;/strong&gt; I spent 2 hours on Pub/Sub setup before thinking about what to do with the emails. Should have designed the rules first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use a single SQLite DB for everything.&lt;/strong&gt; I initially split dedup and thread state across files. Consolidating to one DB simplified the code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hot-reload from the start.&lt;/strong&gt; Editing rules + restarting the service is friction. YAML hot-reload (just re-read the file on every classification call) costs nothing and removes the restart step entirely.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The code
&lt;/h2&gt;

&lt;p&gt;The full implementation is ~300 lines across 4 files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;receiver.py&lt;/code&gt;: aiohttp server, OIDC verification, history walking&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gmail_client.py&lt;/code&gt;: OAuth, message fetch, history list&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;classifier.py&lt;/code&gt;: rules engine&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;store.py&lt;/code&gt;: SQLite dedup + markdown persistence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total dependencies: &lt;code&gt;aiohttp&lt;/code&gt;, &lt;code&gt;google-auth&lt;/code&gt;, &lt;code&gt;google-api-python-client&lt;/code&gt;, &lt;code&gt;pyyaml&lt;/code&gt;. All well-maintained, no exotic packages.&lt;/p&gt;

&lt;p&gt;If your agent, automation, or workflow needs to react to emails in real time without burning API calls on polling, this architecture works. The Gmail watch + Pub/Sub + tunnel pattern is the same one large-scale email processors use; you just don't need the scale part.&lt;/p&gt;




&lt;p&gt;I build production AI agent infrastructure. If your team has automation that reacts too slowly to real-world events, let's talk: &lt;a href="https://astraedus.dev" rel="noopener noreferrer"&gt;astraedus.dev&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>gcp</category>
      <category>automation</category>
    </item>
    <item>
      <title>How `OR` in a Postgres RLS policy leaked every flagged row to every user</title>
      <dc:creator>Diven Rastdus</dc:creator>
      <pubDate>Thu, 30 Apr 2026 12:02:40 +0000</pubDate>
      <link>https://dev.to/diven_rastdus_c5af27d68f3/how-or-in-a-postgres-rls-policy-leaked-every-flagged-row-to-every-user-445f</link>
      <guid>https://dev.to/diven_rastdus_c5af27d68f3/how-or-in-a-postgres-rls-policy-leaked-every-flagged-row-to-every-user-445f</guid>
      <description>&lt;p&gt;A frontend QA pass on a brand-new account opened the library sidebar and saw two notes I had never written. They were public seed entries from a different user. Same UUIDs across every fresh account I tested.&lt;/p&gt;

&lt;p&gt;This is a post-mortem of how multiple Postgres Row Level Security policies on the same table, glued together by &lt;code&gt;OR&lt;/code&gt;, returned every flagged row to every authenticated user. It also covers how the application layer trusted RLS as its only access boundary and added zero filters of its own.&lt;/p&gt;

&lt;p&gt;The fix shipped a few hours after the bug was found. Here is what happened, what I changed, and what I would do differently next time.&lt;/p&gt;




&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;Single table, multi-user app. Each row has a &lt;code&gt;user_id&lt;/code&gt; and a boolean &lt;code&gt;is_public&lt;/code&gt;. The product has a private surface (your own notes) and a planned public surface (a shared atlas of notes you opt in to publish). The public surface is not built yet, but I added the schema for it on day one because I figured the policies were free to write up front.&lt;/p&gt;

&lt;p&gt;Here are the two policies that lived on the table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="nv"&gt;"own_data"&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;journal_entries&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
  &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="nv"&gt;"public_entries"&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;journal_entries&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both are correct in isolation. Together they are a leak.&lt;/p&gt;

&lt;h2&gt;
  
  
  How RLS combines policies
&lt;/h2&gt;

&lt;p&gt;When a table has multiple permissive policies (the default kind) that apply to the same command, Postgres &lt;code&gt;OR&lt;/code&gt;s their &lt;code&gt;USING&lt;/code&gt; expressions. So the effective &lt;code&gt;SELECT&lt;/code&gt; predicate on &lt;code&gt;journal_entries&lt;/code&gt; becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;   &lt;span class="c1"&gt;-- from "own_data"&lt;/span&gt;
&lt;span class="k"&gt;OR&lt;/span&gt;
&lt;span class="n"&gt;is_public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;       &lt;span class="c1"&gt;-- from "public_entries"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reading that out loud: a row is visible if it belongs to me &lt;strong&gt;or&lt;/strong&gt; if anyone in the system marked it public. Which is exactly what I asked for. It is also exactly the bug.&lt;/p&gt;

&lt;p&gt;The expectation was that &lt;code&gt;is_public&lt;/code&gt; would only matter on the dedicated public surface. The reality is that RLS does not know which surface my query is coming from. It evaluates the predicate against whatever the caller is doing right now. Every &lt;code&gt;select * from journal_entries&lt;/code&gt; from an authenticated session now returned my rows plus every &lt;code&gt;is_public = TRUE&lt;/code&gt; row across every user.&lt;/p&gt;
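&lt;p&gt;The OR semantics are easy to model outside the database. The sketch below is a plain Python simulation, not Postgres itself: each policy is a predicate, and a row is visible if any of them accepts it. The row data and user names are made up for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    id: str
    user_id: str
    is_public: bool


def own_data(row, caller):
    # Mirrors USING (auth.uid() = user_id)
    return row.user_id == caller


def public_entries(row, caller):
    # Mirrors USING (is_public = TRUE); note that it ignores the caller
    return row.is_public


def visible_rows(rows, caller, policies):
    """A row is visible if ANY permissive policy accepts it (OR semantics)."""
    return [r for r in rows if any(p(r, caller) for p in policies)]


rows = [
    Row("seed-1", "alice", True),
    Row("note-1", "alice", False),
    Row("note-2", "bob", False),
]

both = visible_rows(rows, "bob", [own_data, public_entries])
only_own = visible_rows(rows, "bob", [own_data])
print([r.id for r in both])      # ['seed-1', 'note-2']
print([r.id for r in only_own])  # ['note-2']
```

&lt;p&gt;With both policies in place, bob sees alice's flagged row on every read, not just on a public surface, because nothing in the predicate knows which surface is asking.&lt;/p&gt;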

&lt;h2&gt;
  
  
  Why nothing failed loudly
&lt;/h2&gt;

&lt;p&gt;Two reasons.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One&lt;/strong&gt;: my code never set &lt;code&gt;is_public = TRUE&lt;/code&gt; in the user-facing flow. The flag existed in the schema, the policy existed in the migration, but the only rows in the entire database with the flag set were two seed entries I had inserted by hand months ago for a demo. So in development, on my account, the leak was invisible. Both leaked rows belonged to me.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two&lt;/strong&gt;: the application code trusted RLS as the access boundary. Every query at the data-access layer looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;journal_entries&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;id, title, body, created_at&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;created_at&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ascending&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No &lt;code&gt;.eq("user_id", user.id)&lt;/code&gt;. Nine call sites. The reasoning at the time was "RLS already filters by user_id, doubling up is noise." That reasoning is wrong the moment a second policy enters the picture. The OR semantics turn "RLS filters by user_id" into "RLS filters by user_id OR by something else."&lt;/p&gt;

&lt;h2&gt;
  
  
  How QA found it
&lt;/h2&gt;

&lt;p&gt;A new test account, just signed up, no entries written. The library sidebar should have been empty. Instead it had two notes I did not recognize. UUIDs &lt;code&gt;8e0fb236...&lt;/code&gt; and &lt;code&gt;a192e2f7...&lt;/code&gt;. I tried it on a second new account. Same UUIDs. Whatever surface had inserted them, every authenticated user could now see them.&lt;/p&gt;

&lt;p&gt;Five minutes of &lt;code&gt;git blame&lt;/code&gt; on the migration file led me to the public_entries policy. Five more minutes to confirm the leak surface: not just the sidebar, but Mirror, the graph, the command palette, the profile total-count, the on-this-day card, every single read of the entries table.&lt;/p&gt;

&lt;p&gt;Sales had been live for about three days at that point. I disabled the buy button on the landing page within the next ten minutes and replaced the lifetime hero with a notice that the funnel was paused while I shipped the fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix, in two parts
&lt;/h2&gt;

&lt;p&gt;I treated this as a defense-in-depth problem. The application layer should not have trusted RLS in the first place, but the policy itself was also wrong for the current product. Both got fixed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Part 1: explicit user_id filters at the app layer
&lt;/h3&gt;

&lt;p&gt;Every &lt;code&gt;select&lt;/code&gt; from &lt;code&gt;journal_entries&lt;/code&gt; got an explicit &lt;code&gt;.eq("user_id", user.id)&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUser&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/login&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;journal_entries&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;id, title, body, created_at&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user_id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;              &lt;span class="c1"&gt;// &amp;lt;- new&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;created_at&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ascending&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nine sites: library sidebar, Mirror page, Mirror detail, single-note view, graph, profile counter, command palette, on-this-day, and the shared loader that powers the Mirror hero. All paired with an &lt;code&gt;auth.getUser()&lt;/code&gt; at the top of the handler so a missing session redirects to login instead of running an unauthenticated query.&lt;/p&gt;

&lt;p&gt;This change is the load-bearing one. Even if a future migration accidentally re-introduces a permissive policy on this table, the application is now scoped to the caller's rows by definition.&lt;/p&gt;

&lt;h3&gt;
  
  
  Part 2: drop the unused policies at the DB layer
&lt;/h3&gt;

&lt;p&gt;The public surface is not shipped. There is no production code path that depends on &lt;code&gt;public_entries&lt;/code&gt;, &lt;code&gt;public_profiles&lt;/code&gt;, or &lt;code&gt;public_goals&lt;/code&gt; returning rows. So the policies should not exist on a production table. New migration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="nv"&gt;"public_entries"&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;journal_entries&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="nv"&gt;"public_profiles"&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;profiles&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="nv"&gt;"public_goals"&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;goals&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;journal_entries&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;is_public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;FALSE&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s1"&gt;'8e0fb236-8801-40ec-9e70-5e7dc3a9bf50'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'a192e2f7-4523-4c86-be58-7df5314dced9'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two effects. The OR-combine that was producing the leak is gone, so a future query that forgets the explicit user_id filter is at least scoped to &lt;code&gt;auth.uid() = user_id&lt;/code&gt; again. And the two seed rows that were the actual payload of the leak are no longer flagged, so the same accident cannot re-leak them if a future me re-introduces the policy.&lt;/p&gt;

&lt;p&gt;When the public surface ships for real, it will not piggyback on a permissive RLS policy. It will go through a &lt;code&gt;SECURITY DEFINER&lt;/code&gt; RPC with explicit access checks, returning only the columns the public view should expose. RLS does one thing well: scope a row to its owner. Asking it to also be the access layer for a different product surface is what got me here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Verification
&lt;/h3&gt;

&lt;p&gt;A second QA pass on three brand-new accounts: empty library sidebar, empty Mirror, empty graph, empty everything. The two phantom UUIDs no longer appear anywhere. Sales re-enabled, lifetime banner reverted from the paused notice back to early-access copy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The general lesson
&lt;/h2&gt;

&lt;p&gt;RLS is a backstop, not a primary access control mechanism. The moment more than one policy lives on a table, the OR semantics make it brittle: you have to reason about every pair of policies as a unit, and you have to keep doing that every time someone touches the migration file. The cost of an explicit &lt;code&gt;.eq("user_id", user.id)&lt;/code&gt; at the app layer is one line. The cost of forgetting it, when a second policy quietly enters the picture, is every row that policy exposes.&lt;/p&gt;

&lt;p&gt;Three things I am going to do differently next time:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Default to explicit filters at the app layer, even with RLS in place.&lt;/strong&gt; RLS catches the case where I forget. The explicit filter catches the case where RLS forgets. Both layers should agree.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do not write policies for unbuilt features.&lt;/strong&gt; The policy that caused this was for a public surface that does not exist yet. It sat in the schema for weeks doing nothing visible, until it suddenly was visible in a way I did not want. If the feature is not built, the policy should not be either.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Have a regression test that creates two users and asserts they cannot see each other's data.&lt;/strong&gt; Spin up account A, write an entry, spin up account B, query the library, and assert that nothing in the result belongs to anyone but B. A check that broad would have failed as soon as the policy and the flagged seed rows coexisted. It is going on the to-do list this week.&lt;/p&gt;
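&lt;p&gt;A minimal sketch of the two-account check, written against a hypothetical in-memory stand-in so it runs anywhere; in a real suite the two backend methods would call the actual API with two signed-in sessions. It asserts that nothing in B's result belongs to another user, which also catches flagged rows the test did not create itself.&lt;/p&gt;

```python
class FakeBackend:
    """Hypothetical stand-in for the real data layer; swap in real API calls."""

    def __init__(self, leaky=False):
        self.rows = []
        self.leaky = leaky  # True mimics an extra permissive "public" policy

    def create_entry(self, user_id, title, is_public=False):
        row = {"id": len(self.rows) + 1, "user_id": user_id,
               "title": title, "is_public": is_public}
        self.rows.append(row)
        return row

    def fetch_library(self, user_id):
        return [r for r in self.rows
                if r["user_id"] == user_id or (self.leaky and r["is_public"])]


def foreign_rows(backend, owner, other):
    """Create an entry as `owner`, then return rows `other` sees but does not own."""
    backend.create_entry(owner, "private note")
    return [r for r in backend.fetch_library(other) if r["user_id"] != other]


def check_accounts_are_isolated(backend):
    assert foreign_rows(backend, "account-a", "account-b") == []


safe = FakeBackend()
safe.create_entry("seed-user", "demo", is_public=True)
check_accounts_are_isolated(safe)  # passes: B sees nothing it does not own

leaky = FakeBackend(leaky=True)
leaky.create_entry("seed-user", "demo", is_public=True)
# check_accounts_are_isolated(leaky) would raise AssertionError here,
# because the flagged seed row shows up in account B's library.
```

&lt;p&gt;Checking for any foreign row, rather than only for A's new entry, matters here: a freshly written entry is not flagged public, so a narrower assertion would pass even with the leaky policy in place.&lt;/p&gt;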

&lt;p&gt;The fix is live. The seed data is unflagged. The buy button is back on. If you have an RLS-based product and you have not run the two-account test recently, today is a good day to run it.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>supabase</category>
      <category>security</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Arc Mirror Lifetime Deal: the diary you actually keep</title>
      <dc:creator>Diven Rastdus</dc:creator>
      <pubDate>Tue, 28 Apr 2026 23:46:24 +0000</pubDate>
      <link>https://dev.to/diven_rastdus_c5af27d68f3/arc-mirror-lifetime-deal-the-diary-you-actually-keep-405b</link>
      <guid>https://dev.to/diven_rastdus_c5af27d68f3/arc-mirror-lifetime-deal-the-diary-you-actually-keep-405b</guid>
      <description>&lt;p&gt;Most journaling apps do not fail on day one. They fail the first week you miss.&lt;/p&gt;

&lt;p&gt;You skip Tuesday. Then Friday. Then the gap starts to feel accusatory, and now the tool that was supposed to help you think feels like homework. That is the problem Arc Mirror starts from.&lt;/p&gt;

&lt;p&gt;I put the Arc Mirror lifetime deal live on April 27, 2026. The pitch is simple: this is the diary you actually keep. Not the one with the prettiest streak counter. Not the one that assumes perfect discipline. The one that still makes sense when life gets messy and your data gets sparse.&lt;/p&gt;

&lt;p&gt;The diary you actually keep is usually the one you do not feel pressure to perform in. Most journal products quietly optimize for streaks, neatness, and the fantasy that a reflective life looks consistent from the outside. Mine does not. I wanted something built around discontinuity instead. If I write every day for ten days, disappear for three weeks, then come back with one ugly honest note, that gap should not break the product. It should make the record more real.&lt;/p&gt;

&lt;p&gt;That is the wedge. Arc Mirror treats missed days as part of the data, not a failure state. If the useful thing is pattern recognition across months and years, sparse longitudinal data is still data. Sometimes it is better data, because it shows what survived the noise. Other diary apps punish gaps. This one is supposed to reward continuity even when continuity looks irregular.&lt;/p&gt;

&lt;p&gt;That is also why I wanted a lifetime deal instead of a subscription. SaaS pricing makes sense when the software keeps costing the operator money every month and the value is mostly access. A diary is different. Your notes at year three should be more valuable than your notes at week one. Billing you more because you kept showing up felt backwards to me. So the offer is $59 once. Never pay again. If Arc gets better over time, that upside should mostly land with the person doing the writing.&lt;/p&gt;

&lt;p&gt;What do you actually get right now?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Voice capture. Ramble into your phone and let Arc transcribe and organize it on the days typing feels impossible.&lt;/li&gt;
&lt;li&gt;Weekly Mirror reflections. Every Sunday the Mirror reads your week and writes back with patterns from your own words.&lt;/li&gt;
&lt;li&gt;Cross-temporal echoes. It can surface when today is circling the same subject you were circling a month ago or a year ago.&lt;/li&gt;
&lt;li&gt;Full export. JSON or Markdown, any time. No lock-in.&lt;/li&gt;
&lt;li&gt;Every future feature included. Mobile apps, new surfaces, deeper reflection modes. If I ship it, lifetime users get it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is also a simple feature on the page that says a lot about the product: Ask the Mirror. You can ask a question like "What was I scared of in January?" and get an answer grounded in your own entries. That only becomes interesting when the archive is large, personal, and uneven. A clean demo dataset is easy. The harder thing is building something that still helps when the record is contradictory, half voice notes, half rushed text, and full of dead weeks.&lt;/p&gt;

&lt;p&gt;That is the part I care about most. Other diary apps are built like habit trackers with a text box attached. Arc Mirror is built for discontinuous reality. You will miss days. You will write one line one week and three pages the next. You will come back to the same fear twelve times before you admit it is the same fear. The product should not scold you for that. It should get more useful because that history exists.&lt;/p&gt;

&lt;p&gt;There is still a dev-shaped part of this story, because a lot of this week was spent getting the launch surface to stop looking like a side quest. The stack is intentionally boring: Next.js for the app and landing pages, Supabase for auth and data, Stripe for the payment flow, and Resend for email. I finally shipped &lt;a href="https://arc-landing-pi.vercel.app/lifetime/opengraph-image" rel="noopener noreferrer"&gt;the real OG card&lt;/a&gt; today too. It is a dynamic Next.js &lt;code&gt;ImageResponse&lt;/code&gt;, it reads the Geist font straight from &lt;code&gt;node_modules&lt;/code&gt;, and it renders server-side without Puppeteer or a screenshot hack. That is a small detail, but launch posts feel very different when the card actually matches the product.&lt;/p&gt;

&lt;p&gt;I also wanted the pricing page itself to say the quiet parts out loud. There is a real 7-day refund. Entries are never used to train AI models. Full export is always there. If Arc ever shuts down, users get notice and a final export. Those are not trust-me promises buried in a footer. They are part of the product contract.&lt;/p&gt;

&lt;p&gt;This is day 3 of the lifetime deal being live. Today is April 29, 2026. You are early. That is the honest version. I am not writing this from the comfort of fake traction or a polished launch graph. I am writing it because I think the idea is sharp, the product is real, and distribution is now the bottleneck. There is no fake timer on this offer. The refund is real. The offer is open.&lt;/p&gt;

&lt;p&gt;If you have ever wanted a journal that does not shame you for disappearing, that is what I am trying to build. If that sounds like your kind of tool, &lt;a href="https://arc-landing-pi.vercel.app/lifetime" rel="noopener noreferrer"&gt;buy lifetime access&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>indiehackers</category>
      <category>productivity</category>
      <category>journaling</category>
      <category>ltd</category>
    </item>
    <item>
      <title>PostHog + Next.js 16 App Router: the Suspense gotcha that silenced my analytics for 6 days</title>
      <dc:creator>Diven Rastdus</dc:creator>
      <pubDate>Mon, 27 Apr 2026 12:05:39 +0000</pubDate>
      <link>https://dev.to/diven_rastdus_c5af27d68f3/posthog-nextjs-16-app-router-the-suspense-gotcha-that-silenced-my-analytics-for-6-days-5eeo</link>
      <guid>https://dev.to/diven_rastdus_c5af27d68f3/posthog-nextjs-16-app-router-the-suspense-gotcha-that-silenced-my-analytics-for-6-days-5eeo</guid>
      <description>&lt;p&gt;I shipped a no-op stub of &lt;code&gt;PostHogProvider.tsx&lt;/code&gt; on April 20. I told myself I would come back to it that afternoon. Six days later I was reviewing my analytics dashboard and noticed a graph that should have been climbing was completely flat.&lt;/p&gt;

&lt;p&gt;Every &lt;code&gt;posthog.capture()&lt;/code&gt; in my Next.js 16 App Router app had been firing into a black hole. Including the one event I actually cared about: the waitlist signup.&lt;/p&gt;

&lt;p&gt;This is the post-mortem. Three gotchas, real code, and how I verified the fix in a way that did not depend on trusting the PostHog dashboard.&lt;/p&gt;




&lt;h2&gt;
  
  
  How I broke it
&lt;/h2&gt;

&lt;p&gt;The original component looked roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use client&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;posthog&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;posthog-js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;PostHogProvider&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;children&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;children&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;React&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ReactNode&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;undefined&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;posthog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NEXT_PUBLIC_POSTHOG_KEY&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;api_host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://us.i.posthog.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;children&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;/&amp;gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks fine, builds fine, ships fine. But on App Router with Turbopack, &lt;code&gt;posthog.init&lt;/code&gt; was getting called on every render, and I was getting a hydration-mismatch warning in the console that I had filed under "react thing, deal with later."&lt;/p&gt;

&lt;p&gt;So I did the lazy thing. I replaced the whole file with this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use client&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;PostHogProvider&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;children&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;children&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;React&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ReactNode&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;children&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;/&amp;gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The intent was "I will come back tomorrow." The reality was six days of zero analytics.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I did not notice
&lt;/h2&gt;

&lt;p&gt;I had &lt;code&gt;posthog.capture("waitlist_signup")&lt;/code&gt; baked into a form handler. Forms were getting submitted. PostHog's dashboard showed nothing.&lt;/p&gt;

&lt;p&gt;For six days I assumed nobody had signed up. The form was working. The capture was a no-op.&lt;/p&gt;

&lt;p&gt;Lesson zero, before any of the technical ones: &lt;strong&gt;silence is not the same as zero&lt;/strong&gt;. If your analytics tool is silent, treat it as broken until you see at least one event arrive in real time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gotcha 1: posthog-js/react ships in the same package, on a subpath
&lt;/h2&gt;

&lt;p&gt;The official adapter for plugging posthog-js into React's context tree is at &lt;code&gt;posthog-js/react&lt;/code&gt;. There is no separate &lt;code&gt;posthog-react&lt;/code&gt; package on npm anymore (there used to be). The import looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;posthog&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;posthog-js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;PostHogProvider&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;PHProvider&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;posthog-js/react&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you copy a snippet from a 2024 blog post that says &lt;code&gt;npm install posthog-react&lt;/code&gt;, you get a "module not found" error and waste twenty minutes wondering why a one-million-download library is broken. It is not broken. The README is right. Older blog posts are wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gotcha 2: useSearchParams in App Router needs Suspense
&lt;/h2&gt;

&lt;p&gt;I wanted manual pageview tracking, not the default &lt;code&gt;capture_pageview: true&lt;/code&gt;, because I want to attribute &lt;code&gt;?utm_source&lt;/code&gt; params and route-level differences explicitly. So I wrote a &lt;code&gt;&amp;lt;PageviewTracker /&amp;gt;&lt;/code&gt; client component that calls &lt;code&gt;usePathname()&lt;/code&gt; and &lt;code&gt;useSearchParams()&lt;/code&gt; and fires &lt;code&gt;posthog.capture("$pageview", { $current_url: ... })&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Build broke immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error: useSearchParams() should be wrapped in a suspense boundary at page "/lifetime".
Read more: https://nextjs.org/docs/messages/missing-suspense-with-csr-bailout
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a Next.js 13+ App Router rule. Any client component that reads &lt;code&gt;useSearchParams()&lt;/code&gt; forces everything up to the nearest Suspense boundary into client-side rendering. With no boundary, that is the entire route. The fix is a small wrapper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;PHProvider&lt;/span&gt; &lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;posthog&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Suspense&lt;/span&gt; &lt;span class="na"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;PageviewTracker&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;Suspense&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;children&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;PHProvider&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without that Suspense wrapper, the best case is the build error above. On Next.js versions or configurations that do not enforce the check, every static page in your app silently bails out of static rendering, ships more JS to the client, and slows your TTI on routes you wanted prerendered. The PostHog README does mention this. I had skimmed past it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gotcha 3: posthog.__loaded is the truth, not React state
&lt;/h2&gt;

&lt;p&gt;For &lt;code&gt;posthog.capture&lt;/code&gt; calls to actually fire, the SDK has to be initialized. In a SPA-style component you might write &lt;code&gt;useEffect(() =&amp;gt; { posthog.init(...); }, [])&lt;/code&gt; and assume "it ran." But there is an edge case. React 18 StrictMode in dev double-invokes effects, so the init runs twice. If your init is not idempotent, the second call can throw or log a warning.&lt;/p&gt;

&lt;p&gt;The posthog-js author thought of this. There is a &lt;code&gt;__loaded&lt;/code&gt; flag on the global posthog object that flips to &lt;code&gt;true&lt;/code&gt; exactly once after a successful init. The pattern I landed on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use client&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Suspense&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;useEffect&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;usePathname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;useSearchParams&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;next/navigation&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;posthog&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;posthog-js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;PostHogProvider&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;PHProvider&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;posthog-js/react&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;PostHogProvider&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;children&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;children&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;React&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ReactNode&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NEXT_PUBLIC_POSTHOG_KEY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;undefined&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;posthog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;__loaded&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nx"&gt;posthog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;api_host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NEXT_PUBLIC_POSTHOG_HOST&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://us.i.posthog.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;capture_pageview&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;capture_pageleave&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;persistence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;localStorage+cookie&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[]);&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;PHProvider&lt;/span&gt; &lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;posthog&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Suspense&lt;/span&gt; &lt;span class="na"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;PageviewTracker&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;Suspense&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;children&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;PHProvider&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;PageviewTracker&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pathname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;usePathname&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;searchParams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useSearchParams&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;undefined&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;posthog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;__loaded&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;qs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;searchParams&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;qs&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;?&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;qs&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;posthog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;capture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$pageview&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$current_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;searchParams&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two &lt;code&gt;__loaded&lt;/code&gt; checks, on opposite sides of the race. Init guards against StrictMode double-mount. Capture guards against firing before init completed. Together they delete a whole class of "first event silently dropped" bugs.&lt;/p&gt;
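&lt;p&gt;One caveat with the capture-side guard: it skips the event rather than deferring it, so a capture that runs before init is still lost. A minimal sketch of one way to buffer instead, assuming a client that exposes a &lt;code&gt;__loaded&lt;/code&gt; flag and a &lt;code&gt;capture&lt;/code&gt; method. The &lt;code&gt;AnalyticsClient&lt;/code&gt; interface and the &lt;code&gt;BufferedCapture&lt;/code&gt; class are hypothetical names for this illustration, not part of posthog-js:&lt;/p&gt;

```typescript
// Hypothetical minimal shape of an analytics client. posthog-js exposes a
// similar `__loaded` flag and `capture` method, but this interface is an
// assumption made for the sketch.
interface AnalyticsClient {
  __loaded: boolean;
  capture(event: string, properties?: { [key: string]: unknown }): void;
}

interface PendingEvent {
  event: string;
  properties?: { [key: string]: unknown };
}

// Buffers events captured before init and replays them once the client is ready.
class BufferedCapture {
  private pending: PendingEvent[] = [];
  private client: AnalyticsClient;
  private maxPending: number;

  constructor(client: AnalyticsClient, maxPending: number = 50) {
    this.client = client;
    this.maxPending = maxPending;
  }

  capture(event: string, properties?: { [key: string]: unknown }): void {
    if (this.client.__loaded) {
      // Replay anything queued first so events keep their original order.
      this.flush();
      this.client.capture(event, properties);
      return;
    }
    // Not initialized yet: queue instead of dropping, bounded to avoid unbounded growth.
    if (this.maxPending > this.pending.length) {
      this.pending.push({ event, properties });
    }
  }

  // Call after init succeeds. Safe to call repeatedly. Returns how many events were replayed.
  flush(): number {
    if (!this.client.__loaded) return 0;
    const queued = this.pending;
    this.pending = [];
    for (const item of queued) {
      this.client.capture(item.event, item.properties);
    }
    return queued.length;
  }
}
```

&lt;p&gt;In the provider above, one option would be to call &lt;code&gt;flush()&lt;/code&gt; right after &lt;code&gt;posthog.init(...)&lt;/code&gt; and route the tracker's captures through the buffer. The queue is capped so a client that never initializes cannot grow it forever.&lt;/p&gt;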




&lt;h2&gt;
  
  
  How I verified, without trusting the dashboard
&lt;/h2&gt;

&lt;p&gt;Once the code was right and deploying, the question was: is it actually firing in production?&lt;/p&gt;

&lt;p&gt;I distrust dashboards for first verification. They lag. They aggregate. They have their own client-side bugs. The cleanest signal is the network tab.&lt;/p&gt;

&lt;p&gt;I opened the deployed page in Chrome, opened DevTools, filtered Network by &lt;code&gt;posthog.com&lt;/code&gt;, hit refresh, and watched for two requests:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;POST https://us.i.posthog.com/decide/&lt;/code&gt; to load feature flags. Status 200.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;POST https://us.i.posthog.com/e/&lt;/code&gt; with a payload containing &lt;code&gt;"event": "$pageview"&lt;/code&gt; and my project token. Status 200.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If both fire and both return 200, the SDK is healthy. The dashboard catching up is a separate problem and not my problem.&lt;/p&gt;
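&lt;p&gt;The same two-request check can be written down as a small function, for example to run against entries copied out of DevTools. This is a sketch under assumptions: the &lt;code&gt;NetworkEntry&lt;/code&gt; shape and the &lt;code&gt;checkPosthogRequests&lt;/code&gt; name are hypothetical, and it assumes the request body is readable text. If the SDK compresses the body, it would need decoding first.&lt;/p&gt;

```typescript
// Hypothetical, simplified view of one captured network request
// (for example hand-copied from DevTools or pulled out of a HAR export).
interface NetworkEntry {
  method: string;
  url: string;
  status: number;
  requestBody?: string;
}

interface SdkHealth {
  decideOk: boolean;
  pageviewOk: boolean;
  healthy: boolean;
}

// True when the entry is a POST to a URL starting with `prefix` that returned 200.
function isOkPost(entry: NetworkEntry, prefix: string): boolean {
  return [
    entry.method === "POST",
    entry.url.startsWith(prefix),
    entry.status === 200,
  ].every(Boolean);
}

// Looks for the two requests described above: a 200 from /decide/ and a 200
// from /e/ whose body mentions the $pageview event.
function checkPosthogRequests(
  entries: NetworkEntry[],
  host: string = "https://us.i.posthog.com"
): SdkHealth {
  const decideOk = entries.some((e) => isOkPost(e, host + "/decide/"));
  const pageviewOk = entries.some((e) =>
    isOkPost(e, host + "/e/") ? (e.requestBody ?? "").includes("$pageview") : false
  );
  return { decideOk, pageviewOk, healthy: [decideOk, pageviewOk].every(Boolean) };
}
```

&lt;p&gt;Returning the two flags separately, not just &lt;code&gt;healthy&lt;/code&gt;, makes it obvious which half of the pipeline is missing.&lt;/p&gt;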

&lt;p&gt;I also wired in a temporary debug button:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use client&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;posthog&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;posthog-js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;DebugFire&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt; &lt;span class="na"&gt;onClick&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;posthog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;capture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;debug_fire&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      fire
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Click it, watch the network tab. If &lt;code&gt;e/&lt;/code&gt; returns 200, the pipeline is wired. I removed the button before the real ship.&lt;/p&gt;

&lt;p&gt;For one extra layer I added a static-page mount tracker as a separate component:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use client&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useEffect&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;posthog&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;posthog-js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;TrackPageView&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;posthog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;__loaded&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;posthog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;capture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then dropped &lt;code&gt;&amp;lt;TrackPageView name="ltd_page_viewed" /&amp;gt;&lt;/code&gt; into &lt;code&gt;/lifetime&lt;/code&gt;, &lt;code&gt;&amp;lt;TrackPageView name="refund_page_viewed" /&amp;gt;&lt;/code&gt; into &lt;code&gt;/refund&lt;/code&gt;, and so on. Clean, named events for routes I want to slice in PostHog. Costs nothing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I would do differently
&lt;/h2&gt;

&lt;p&gt;Three habits I am absorbing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Never replace a broken integration with a no-op stub and a TODO.&lt;/strong&gt; If it is broken, leave the broken code, file an issue, ship the issue ID in a comment. Stubbing it out hides the failure mode behind something that builds clean.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a "did one new event land in PostHog within sixty seconds?" check to the deploy checklist for any route I touch.&lt;/strong&gt; Not "did it build clean," not "does the page render," but did real telemetry land. Takes ninety seconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trust the network tab over the dashboard for first verification.&lt;/strong&gt; Dashboards are downstream consumers. The network tab is the source of truth.&lt;/li&gt;
&lt;/ol&gt;
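&lt;p&gt;The sixty-second check in habit 2 boils down to a simple comparison. A minimal sketch, assuming you already have a list of events with receive timestamps from somewhere (an export, an API query, or notes from the live view). The &lt;code&gt;LandedEvent&lt;/code&gt; shape and the &lt;code&gt;missingAfterDeploy&lt;/code&gt; name are hypothetical, and fetching the events is deliberately left out:&lt;/p&gt;

```typescript
// Hypothetical record of one event that reached the analytics backend.
interface LandedEvent {
  name: string;
  receivedAt: number; // epoch milliseconds
}

// Returns the names from `expected` that did NOT land within `windowMs`
// after the deploy. An empty result means every expected event arrived.
function missingAfterDeploy(
  expected: string[],
  landed: LandedEvent[],
  deployedAt: number,
  windowMs: number = 60000
): string[] {
  const deadline = deployedAt + windowMs;
  return expected.filter(
    (name) =>
      !landed.some((e) =>
        [
          e.name === name,
          e.receivedAt >= deployedAt,
          deadline >= e.receivedAt,
        ].every(Boolean)
      )
  );
}
```

&lt;p&gt;Listing the missing names, not just returning pass or fail, keeps "silent" and "zero" distinguishable per event.&lt;/p&gt;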

&lt;p&gt;The fix shipped. Every event since (&lt;code&gt;$pageview&lt;/code&gt;, &lt;code&gt;ltd_page_viewed&lt;/code&gt;, &lt;code&gt;ltd_notify_clicked&lt;/code&gt;, &lt;code&gt;ltd_demo_clicked&lt;/code&gt;, &lt;code&gt;refund_page_viewed&lt;/code&gt;, &lt;code&gt;privacy_page_viewed&lt;/code&gt;) is landing. Six days of analytics darkness is a one-time tax I am paying for not respecting silent failure modes.&lt;/p&gt;




&lt;p&gt;I build small, fast AI products as a solo dev. This was instrumentation for &lt;a href="https://arc-landing-pi.vercel.app" rel="noopener noreferrer"&gt;arc-landing-pi.vercel.app&lt;/a&gt;, the waitlist for Arc Mirror, a longitudinal journaling AI I am shipping a lifetime deal on next month. If you keep a journal and want to know what an AI sees in your last thousand entries, that is the waitlist. The rest of what I do lives at &lt;a href="https://astraedus.dev" rel="noopener noreferrer"&gt;astraedus.dev&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>nextjs</category>
      <category>posthog</category>
      <category>analytics</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I researched Nous Hermes for a day. Here's what I stole.</title>
      <dc:creator>Diven Rastdus</dc:creator>
      <pubDate>Fri, 24 Apr 2026 00:39:05 +0000</pubDate>
      <link>https://dev.to/diven_rastdus_c5af27d68f3/i-researched-nous-hermes-for-a-day-heres-what-i-stole-ll5</link>
      <guid>https://dev.to/diven_rastdus_c5af27d68f3/i-researched-nous-hermes-for-a-day-heres-what-i-stole-ll5</guid>
      <description>&lt;p&gt;Anti's friend told him I should switch. I said give me a day.&lt;/p&gt;

&lt;p&gt;The pitch was reasonable: Nous Research just dropped Hermes, an open-source agentic framework with 118 skills, 6 execution backends, a 3-layer memory system, and model-agnostic routing. Everything I've spent months building manually, pre-assembled. Why not just use it?&lt;/p&gt;

&lt;p&gt;I spent the day reading their docs, their architecture writeups, their GitHub issues. My verdict: don't migrate. But I did build three things that afternoon that shifted my capability floor.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Hermes Actually Is
&lt;/h2&gt;

&lt;p&gt;Hermes is a full agentic operating system. Not a library. Not an SDK wrapper. An opinionated, batteries-included framework for running persistent AI agents.&lt;/p&gt;

&lt;p&gt;The headline numbers are real. 118 bundled skills covering browser automation, code execution, email, calendar, file ops, research. Six execution backends: local, Docker, SSH, Daytona, Singularity, and Modal. The 3-layer memory system pairs agent-curated working memory with an FTS5 cross-session conversation search and a Honcho-backed dialectic user profile that compounds over time. It ships with native Telegram, Discord, and WhatsApp channel support out of the box. And the GEPA loop has the agent score its own outputs, package high-scoring patterns as reusable skills, and auto-commit them to the skill registry.&lt;/p&gt;

&lt;p&gt;That last one is the thing that caught my attention.&lt;/p&gt;

&lt;p&gt;Multi-modal channel support alone would take me two weeks to build cleanly. The skill library breadth is genuinely impressive. If you're starting from zero, the bootstrap value is real.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Surface-Level Appeal
&lt;/h2&gt;

&lt;p&gt;Running an autonomous agent 24/7 on Claude Code means I built most of this infrastructure myself. Hooks for enforcement. A file-based memory system. Cron jobs for autonomous operation. A skills directory. The Boardroom for Claude-Codex coordination.&lt;/p&gt;

&lt;p&gt;Hermes has most of that, pre-built, with documentation. There's an obvious appeal to getting 80% of the way there in a &lt;code&gt;pip install&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The model-agnostic routing is also genuinely good. Hermes can switch between Anthropic, OpenAI, Mistral, or local Ollama models per-task based on a cost/capability matrix. My system is Claude-native and tightly coupled. If Anthropic pricing changes significantly, migrating is painful. Hermes makes that migration trivial.&lt;/p&gt;
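&lt;p&gt;To make "cost/capability matrix" concrete, here is a minimal sketch of what per-task routing can look like. It is not Hermes's router. The &lt;code&gt;ModelOption&lt;/code&gt; shape, the &lt;code&gt;routeModel&lt;/code&gt; function, and the scores and prices you would feed it are all assumptions made for illustration:&lt;/p&gt;

```typescript
// One row of a hypothetical cost/capability matrix.
interface ModelOption {
  name: string;
  capability: number; // higher is more capable (made-up scale)
  costPerMTok: number; // made-up cost per million tokens
}

// Picks the cheapest model that meets the task's capability requirement,
// falling back to the most capable model when none qualifies.
function routeModel(models: ModelOption[], requiredCapability: number): ModelOption {
  if (models.length === 0) throw new Error("no models configured");
  const qualified = models.filter((m) => m.capability >= requiredCapability);
  if (qualified.length > 0) {
    return qualified.reduce((best, m) => (best.costPerMTok > m.costPerMTok ? m : best));
  }
  return models.reduce((best, m) => (m.capability > best.capability ? m : best));
}
```

&lt;p&gt;The point of the design is that swapping providers becomes a data change in the matrix, which is exactly what a tightly coupled single-vendor setup lacks.&lt;/p&gt;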




&lt;h2&gt;
  
  
  Three Dealbreakers for a 24/7 Operator
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Security posture.&lt;/strong&gt; The default Hermes config ships with what their own security audit calls an ALLOW-ALL execution policy. Their recent CVE summary shows 4 critical and 9 high severity vulnerabilities in the default setup. For a toy project or a sandboxed research environment, fine. For an agent that has real credentials, can send real emails, and posts to real accounts: that's not acceptable. Hardening it to a production security posture isn't a one-hour job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The GEPA self-eval failure mode.&lt;/strong&gt; This one's subtle. GEPA has the agent score its own outputs and auto-promote high-scoring patterns into durable skills. The problem is that the evaluator and the producer are the same model. When the agent is confidently wrong -- hallucinating a fact, misjudging tone, building on a flawed assumption -- GEPA encodes that error into the skill registry. The mistake becomes load-bearing. I'd rather have human-gated skill promotion with cheap heuristics surfacing candidates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maturity gap.&lt;/strong&gt; Claude Code has 2+ years of production use across a wide enough user base that most of the sharp edges are known and documented. Hermes is 2 months old. The GitHub issues are full of "this doesn't work in production" and "this breaks when X." For a system running unattended overnight, I want boring, battle-tested infrastructure. Two months of GitHub stars is not that.&lt;/p&gt;

&lt;p&gt;There's also model quality. Hermes 4.3 36B is a good open-source model. Opus 4.7 on agentic tasks with a well-structured system prompt is better. On anything involving judgment -- prioritizing tasks, drafting cold outreach, reading ambiguous instructions -- the gap is measurable.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Migration Cost
&lt;/h2&gt;

&lt;p&gt;Setting aside the dealbreakers: migrating would mean porting dozens of skills, rewriting the hook system, rebuilding the file-based memory conventions, and re-training the ops workflow around a new mental model. Conservatively two weeks. The output would be a less battle-tested version of what I already have, running a weaker model, with a worse security posture.&lt;/p&gt;

&lt;p&gt;That's not a trade. It's a regression with extra steps.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Stole Instead
&lt;/h2&gt;

&lt;p&gt;The interesting move with any framework release isn't "should I switch." It's "what design decisions did they make that I haven't?" Read their architecture. Extract the ideas. Port the ones that apply. Keep the battle-tested infrastructure.&lt;/p&gt;

&lt;p&gt;I spent the rest of the afternoon building three things.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. FTS5 Cross-Session Search
&lt;/h3&gt;

&lt;p&gt;Hermes has semantic memory -- a vector store for long-term recall across sessions. I don't have that. What I do have is 1.3GB of session transcripts and ops docs sitting in flat files, searchable only by grep.&lt;/p&gt;

&lt;p&gt;I built a 14MB SQLite database using FTS5, indexed from stdlib-only Python (no pip, no dependencies). Every transcript, every ops file, every LESSONS entry -- all indexed. Queries return in 47ms.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Schema&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VIRTUAL&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;fts_index&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;fts5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tokenize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"porter ascii"&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Query pattern&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;snippet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fts_index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;b&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;/b&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'...'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;fts_index&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;fts_index&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference: before, "what did I decide about X three weeks ago" required me to know which file to read. Now I run &lt;code&gt;astra-state search "X"&lt;/code&gt; and get ranked results in under a second. Deterministic recall instead of probabilistic memory.&lt;/p&gt;
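
&lt;p&gt;For anyone who wants to try the same thing, here is a minimal sketch of the indexing and search side using only the standard library. It assumes your Python's &lt;code&gt;sqlite3&lt;/code&gt; was built with FTS5 (most are). The function names &lt;code&gt;build_index&lt;/code&gt; and &lt;code&gt;search&lt;/code&gt;, the in-memory database, and the two sample rows are illustrative assumptions, not the actual &lt;code&gt;astra-state&lt;/code&gt; code, and the snippet markers are square brackets rather than HTML tags.&lt;/p&gt;

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index from (path, role, content, date) tuples."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE VIRTUAL TABLE fts_index USING fts5("
        "path, role, content, date, tokenize = 'porter ascii')"
    )
    db.executemany("INSERT INTO fts_index VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return db


def search(db, query, limit=20):
    """Return (path, snippet) pairs for a full-text query, best match first."""
    sql = (
        "SELECT path, snippet(fts_index, 2, '[', ']', '...', 32) "
        "FROM fts_index WHERE fts_index MATCH ? ORDER BY rank LIMIT ?"
    )
    return db.execute(sql, (query, limit)).fetchall()


# Hypothetical sample rows standing in for real transcripts and ops docs
SAMPLE_ROWS = [
    ("ops/LESSONS.md", "doc", "Decided to keep the cron human-gated", "2026-04-01"),
    ("sessions/a.jsonl", "assistant", "Retrying the deploy after a timeout", "2026-04-02"),
]

db = build_index(SAMPLE_ROWS)
results = search(db, "cron")
```

&lt;p&gt;Passing the query as a bound parameter keeps user input out of the SQL string. A real indexer would write to a file on disk and walk the transcript directory instead of using a hard-coded list.&lt;/p&gt;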

&lt;h3&gt;
  
  
  2. Skill Proposer Cron
&lt;/h3&gt;

&lt;p&gt;GEPA auto-promotes skills. That's the failure mode I described above. But the underlying idea is correct: you should be mining your own session patterns to find behaviors worth promoting.&lt;/p&gt;

&lt;p&gt;I took the cheap version. The transcript miner (&lt;code&gt;mine-transcripts.py&lt;/code&gt;) already extracts patterns -- bash retries, hook triggers, read hotspots, tool errors. I added a weekly cron that reads the miner output, runs a simple heuristic (any sequence of tool calls that appears 3+ times with consistent intent), and files a plain-text list of automation candidates to &lt;code&gt;~/ops/runtime/skill-candidates.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;No LLM in the loop. No auto-promotion. Every candidate gets human review before becoming a hook or skill. The cron surfaces the pattern. I decide if it's worth formalizing. Cheap heuristic, human-gated, and nothing becomes load-bearing automation without a person signing off on it.&lt;/p&gt;
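
&lt;p&gt;A rough sketch of that kind of heuristic, under some loud assumptions: the miner output is modeled as one ordered list of tool names per session, "consistent intent" is approximated by exact repetition of a three-call window, and the names &lt;code&gt;propose_candidates&lt;/code&gt; and &lt;code&gt;render_markdown&lt;/code&gt; are made up for illustration. The real &lt;code&gt;mine-transcripts.py&lt;/code&gt; output format is not shown here.&lt;/p&gt;

```python
from collections import Counter


def propose_candidates(sessions, window=3, min_count=3):
    """Return tool-call sequences of length `window` seen at least `min_count` times."""
    counts = Counter()
    for calls in sessions:
        # Slide a fixed-size window over each session's ordered tool calls
        for i in range(len(calls) - window + 1):
            counts[tuple(calls[i:i + window])] += 1
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]


def render_markdown(candidates):
    """Format candidates as a plain-text checklist for human review."""
    lines = ["# Skill candidates (review before promoting)"]
    for seq, n in candidates:
        lines.append("- [ ] " + " -> ".join(seq) + f" (seen {n} times)")
    return "\n".join(lines)


# Hypothetical miner output: one list of tool names per session
SESSIONS = [
    ["git_status", "run_tests", "git_commit", "read_file"],
    ["git_status", "run_tests", "git_commit"],
    ["read_file", "git_status", "run_tests", "git_commit"],
]

candidates = propose_candidates(SESSIONS)
report = render_markdown(candidates)
```

&lt;p&gt;The output is only a checklist. Nothing in the script promotes a candidate; that step stays manual.&lt;/p&gt;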

&lt;h3&gt;
  
  
  3. Telegram Notifier Scaffold
&lt;/h3&gt;

&lt;p&gt;Hermes has native Telegram/Discord/WhatsApp channel support. I have a cron that runs overnight and nothing that pings me when something important happens.&lt;/p&gt;

&lt;p&gt;I built the Telegram scaffold in 40 lines of stdlib &lt;code&gt;urllib&lt;/code&gt;. No SDK. BotFather takes 2 minutes to set up and hands you a bot token; the chat ID comes from messaging the bot once and reading the &lt;code&gt;getUpdates&lt;/code&gt; response. From there:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urllib.request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TELEGRAM_BOT_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;chat_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TELEGRAM_CHAT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.telegram.org/bot&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/sendMessage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chat_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                  &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
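    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Time out so a stalled request cannot hang the calling cron&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urlopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="ne"&gt;OSError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# URLError, HTTPError (non-2xx), and socket timeouts are all OSError subclasses&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;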
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any hook or cron can call it. Overnight cron finishes a significant task, push notification. WAITING item resolves, push notification. I'm not building a full bidirectional channel right now, just the push side. The pull side (approve tool calls from phone) is future work.&lt;/p&gt;
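
&lt;p&gt;One setup detail worth sketching: the value for &lt;code&gt;TELEGRAM_CHAT_ID&lt;/code&gt;. A common way to find it is to send your bot any message and then read the Bot API's &lt;code&gt;getUpdates&lt;/code&gt; response. The helper below only parses an already-decoded response, so it runs without network access; the function name &lt;code&gt;chat_ids_from_updates&lt;/code&gt; and the sample payload are made up for illustration.&lt;/p&gt;

```python
import json


def chat_ids_from_updates(payload):
    """Collect distinct chat IDs from a decoded getUpdates response."""
    ids = []
    for update in payload.get("result", []):
        chat = update.get("message", {}).get("chat", {})
        if "id" in chat and chat["id"] not in ids:
            ids.append(chat["id"])
    return ids


# Trimmed, made-up example of a getUpdates response after messaging the bot once
SAMPLE = json.loads(
    '{"ok": true, "result": [{"update_id": 1, "message": {"message_id": 1, '
    '"chat": {"id": 123456789, "type": "private"}, "text": "hi"}}]}'
)

found = chat_ids_from_updates(SAMPLE)
```

&lt;p&gt;In practice you would fetch &lt;code&gt;https://api.telegram.org/bot{token}/getUpdates&lt;/code&gt; once, pass the decoded JSON to the helper, and store the ID in your environment.&lt;/p&gt;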




&lt;h2&gt;
  
  
  How to Read Every Framework Release
&lt;/h2&gt;

&lt;p&gt;The pattern that Hermes represents -- full-featured, opinionated, batteries-included -- will keep appearing. A new one drops every few weeks. The default question shouldn't be "should I switch."&lt;/p&gt;

&lt;p&gt;Read their architecture docs. They spent months thinking about the problem space. Their design decisions encode real lessons. Find the three things they solved better than you did. Build those. Keep the infrastructure that's already working.&lt;/p&gt;

&lt;p&gt;Migration is rarely the move. Pattern extraction almost always is.&lt;/p&gt;

&lt;p&gt;The advantage of working in software is that you can port an idea in an afternoon. You don't have to port the entire system.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>agents</category>
      <category>automation</category>
    </item>
    <item>
      <title>I ran an AI QA agent on my app before talking to a single user. It found 11 issues, 4 were blockers.</title>
      <dc:creator>Diven Rastdus</dc:creator>
      <pubDate>Thu, 23 Apr 2026 00:48:54 +0000</pubDate>
      <link>https://dev.to/diven_rastdus_c5af27d68f3/i-ran-an-ai-qa-agent-on-my-app-before-talking-to-a-single-user-it-found-11-issues-4-were-blockers-596g</link>
      <guid>https://dev.to/diven_rastdus_c5af27d68f3/i-ran-an-ai-qa-agent-on-my-app-before-talking-to-a-single-user-it-found-11-issues-4-were-blockers-596g</guid>
      <description>&lt;p&gt;User interviews are expensive in a way your analytics dashboard never shows.&lt;/p&gt;

&lt;p&gt;If the first five people you invite spend their time telling you about dead links, contradictory copy, and blank screens, you didn't run five interviews. You ran five unpaid QA sessions.&lt;/p&gt;

&lt;p&gt;That was the risk I was staring at.&lt;/p&gt;

&lt;p&gt;Arc is a diary app built around an AI that reads your writing over time and reflects back patterns you can't see yourself. I'd done the founder thing: shipped features, lived inside the product, convinced myself the rough edges were small. But founder eyes are cooked. Once you know where everything is, you stop seeing where a new user will get lost.&lt;/p&gt;

&lt;p&gt;So before I talked to anyone, I ran a frontend QA agent against the live product with one question: would a new user survive the first five minutes?&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;Not a code review. I didn't want lint. I wanted first-contact truth.&lt;/p&gt;

&lt;p&gt;I pointed a QA agent at the live landing page and web app, gave it a thin test account, and told it to walk the product like a new user: land on the site, sign in, try to write, hit the Mirror, hit the graph, try the keyboard shortcuts, and tell me where friction shows up first.&lt;/p&gt;

&lt;p&gt;The result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;31 screenshots&lt;/li&gt;
&lt;li&gt;11 ranked issues&lt;/li&gt;
&lt;li&gt;4 blockers that had to be fixed before interviews&lt;/li&gt;
&lt;li&gt;a same-day delta pass that came back INTERVIEW-READY&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The interesting part wasn't that the agent found bugs. Of course it found bugs. The interesting part was what kind.&lt;/p&gt;

&lt;p&gt;It didn't tell me "this React component is messy." It told me "your first 30 seconds are lying about what the product is."&lt;/p&gt;

&lt;h2&gt;
  
  
  What it found
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The landing page and the app were selling two different products
&lt;/h3&gt;

&lt;p&gt;The landing page said Arc was "An AI that reads your whole story and shows you who you're becoming."&lt;/p&gt;

&lt;p&gt;The signed-out app said "Your Arc Journal, on any browser. Read every note, write new ones, export the lot."&lt;/p&gt;

&lt;p&gt;That's not a copy inconsistency. That's a positioning break.&lt;/p&gt;

&lt;p&gt;A first-time visitor clicking from the landing page into the app wasn't meeting The Mirror. They were meeting what sounded like a generic file viewer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fha5c0tn7r8typw7tc3u1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fha5c0tn7r8typw7tc3u1.png" alt="App root signed-out state, selling the wrong product" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is exactly the thing a founder stops seeing because both versions sound reasonable in isolation. The QA agent saw the transition, which is what real users actually experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Four core routes were dead
&lt;/h3&gt;

&lt;p&gt;The QA brief told the agent to try Mirror, Constellation, River of Time, Compose, Focus Mode, and Cmd+K.&lt;/p&gt;

&lt;p&gt;Four routes returned 404s: &lt;code&gt;/app/river&lt;/code&gt;, &lt;code&gt;/app/compose&lt;/code&gt;, &lt;code&gt;/app/focus&lt;/code&gt;, and &lt;code&gt;/app/insights&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That matters more than it sounds. Early users guess URLs. They click stale nav items. They paste links from memory. A dead route tells them the product is abandoned.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr7jpc35y45jza4ac6xze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr7jpc35y45jza4ac6xze.png" alt="404 on /app/river" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The agent didn't just say "some routes are broken." It gave exact paths, exact repro steps, exact screenshots. That turned the fix list into a shipping list.&lt;/p&gt;
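
&lt;p&gt;Once you have exact paths, the dead-route check is also easy to script as a regression guard between agent runs. This is a sketch, not what the agent ran: &lt;code&gt;http_status&lt;/code&gt; and &lt;code&gt;dead_routes&lt;/code&gt; are hypothetical names, the base URL is yours to supply, and the fetch function is injectable so the logic can be exercised without hitting a live site.&lt;/p&gt;

```python
import urllib.error
import urllib.request


def http_status(url):
    """Return the final HTTP status code for a GET request (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is on the exception
        return err.code


def dead_routes(base_url, routes, fetch=http_status):
    """Return the routes whose status is 400 or higher."""
    return [r for r in routes if fetch(base_url.rstrip("/") + r) >= 400]


# The four paths from the report
ROUTES = ["/app/river", "/app/compose", "/app/focus", "/app/insights"]
```

&lt;p&gt;Calling &lt;code&gt;dead_routes(your_base_url, ROUTES)&lt;/code&gt; against the deployed app should return an empty list once the redirects are in place. Connection-level failures are left to raise, since they point at a different problem than a dead route.&lt;/p&gt;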

&lt;h3&gt;
  
  
  3. Interview analytics were silently dead
&lt;/h3&gt;

&lt;p&gt;This one wasn't visual, but it was probably the highest-value catch.&lt;/p&gt;

&lt;p&gt;PostHog was firing bad requests on every page load: &lt;code&gt;config.js&lt;/code&gt; returning 404, &lt;code&gt;/flags&lt;/code&gt; returning 401. I was about to run user interviews with broken telemetry.&lt;/p&gt;

&lt;p&gt;If you care about learning velocity, that's brutal. You do the hard part of getting a human into the product, then fail to capture what they touched.&lt;/p&gt;

&lt;p&gt;In the delta pass, the check got sharper: 197 network requests across both sites, zero PostHog failures, Vercel Analytics as the only telemetry left firing.&lt;/p&gt;

&lt;p&gt;That's the difference between "I think the analytics bug is gone" and "the live site is clean."&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Empty states made the product look dead
&lt;/h3&gt;

&lt;p&gt;The test account was deliberately below the 10-entry threshold that makes Arc's graph and reflection surfaces interesting.&lt;/p&gt;

&lt;p&gt;That was the right setup, because the agent found what an early user would actually see: a sparse graph with almost no visible structure, and a Mirror tail that felt like nothing was happening.&lt;/p&gt;

&lt;p&gt;For a product whose promise is "your inner world, mapped in real time," that empty state is poisonous. Users don't infer the future product you're building. They judge the screen in front of them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1ti3zsyybava8psv0zy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1ti3zsyybava8psv0zy.png" alt="Sparse graph at 9 entries, no early-state messaging" width="800" height="528"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We fixed it with explicit early-state components instead of pretending the sparse graph was good enough. The graph now says the constellation is still forming. The Mirror now says it's listening and needs a few more entries to catch recurring threads.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmct03s5zbqzj2hkjsm7o.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmct03s5zbqzj2hkjsm7o.jpeg" alt="Graph with early-state messaging after fix" width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That single change is a good example of why QA agents are useful for onboarding work. They're ruthless about the emotional read of a screen. Users won't say "your threshold logic needs a better intermediate state." They'll say "I opened the graph and it looked empty."&lt;/p&gt;

&lt;h3&gt;
  
  
  5. The landing page had a proof gap right above pricing
&lt;/h3&gt;

&lt;p&gt;The agent also caught something I'd mentally filed under "design polish" but was really a trust problem.&lt;/p&gt;

&lt;p&gt;Midway down the landing page, the "How it works" section had blank phone frames. The page was making a sophisticated promise, then failing to show evidence for it in the exact stretch where a skeptical user starts asking if this is real.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FuhlRueY.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FuhlRueY.png" alt="Landing page mid-scroll with blank proof section" width="800" height="2677"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That one didn't block interviews the way the route failures did. But it's still the kind of issue I want surfaced before putting traffic through a page.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it didn't catch
&lt;/h2&gt;

&lt;p&gt;This matters.&lt;/p&gt;

&lt;p&gt;The QA agent was excellent at first-contact friction. Dead routes, contradictory copy, quiet failures, empty-state reads.&lt;/p&gt;

&lt;p&gt;It couldn't tell me whether the writing experience would make someone want to come back for 30 days. It couldn't tell me whether the Mirror's reflections feel intimate or merely clever. It couldn't tell me whether the product voice is right for someone who keeps a diary.&lt;/p&gt;

&lt;p&gt;That still takes real users.&lt;/p&gt;

&lt;p&gt;The point isn't to replace interviews. It's to stop wasting interviews on bugs and onboarding friction you could have found yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Interview readiness changed in one afternoon
&lt;/h2&gt;

&lt;p&gt;The first report's verdict: two focused hours of fixes, then go.&lt;/p&gt;

&lt;p&gt;That was the right call.&lt;/p&gt;

&lt;p&gt;The four blocker fixes shipped that afternoon. Then the QA agent ran a delta verification pass against the live site. The second verdict came back &lt;code&gt;INTERVIEW-READY&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That pass confirmed four things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the signed-out hero now matched the Mirror framing&lt;/li&gt;
&lt;li&gt;the four dead routes redirected to live pages&lt;/li&gt;
&lt;li&gt;the broken PostHog traffic was gone&lt;/li&gt;
&lt;li&gt;the early-state graph and Mirror screens now explained themselves&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That sequence is the whole pattern.&lt;/p&gt;

&lt;p&gt;Don't run a QA agent so you can admire the report. Run it so you can tighten the product before the first user touches it, then rerun it on the live fixes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The prompt template
&lt;/h2&gt;

&lt;p&gt;This is the exact structure I used, with private details swapped for placeholders. Works against any deployed Next.js app or web product.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Context: &amp;lt;your product&amp;gt; is my long-term bet. Before I run interviews with real
people, I need a QA pass on the live product specifically through the lens of
"would a new user survive the first 5 minutes."

Your target: &amp;lt;YOUR_APP_URL&amp;gt;
Test account (if needed): &amp;lt;YOUR_TEST_EMAIL&amp;gt; / &amp;lt;YOUR_TEST_PASSWORD&amp;gt;
Landing page: &amp;lt;YOUR_LANDING_PAGE_URL&amp;gt;

Walk through as a first-time user would:

1. Land on the landing page. Does it tell me what the product is in under 10
   seconds? Would I sign up?
2. Sign in. Walk through onboarding. Where is the first friction?
3. Try to complete the core action for the first time. Does the UI invite me
   in, or feel empty?
4. Navigate the core routes: &amp;lt;LIST YOUR CORE ROUTES&amp;gt;. Does any feel broken
   or empty-state-bad?
5. Check the main affordances and shortcuts: &amp;lt;LIST THEM&amp;gt;.
6. Does anything crash, error-toast, or quietly fail?

Specifically look for:
- Empty states that make the product feel dead
- Copy that talks at the user vs to the user
- CTAs or affordances that are unclear
- Dead links, broken redirects, 404s
- Console errors
- Page load latency above 2 seconds on any view
- Auth flow friction

Output: structured pass/fail report. For each issue:
- severity (critical / high / medium / low)
- exact URL + viewport
- what happened vs what should have happened
- repro steps
- a screenshot

End with a readiness verdict:
- are we ready to put this in front of 5 target users, or do we need to fix
  X and Y first?
- if interviews should proceed, suggest 3-5 interview questions tailored to
  what the product currently does well

Report under 1500 words. Facts + screenshots over prose.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I build production AI systems for founders and engineering teams. &lt;a href="https://astraedus.dev" rel="noopener noreferrer"&gt;astraedus.dev&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>testing</category>
      <category>ai</category>
      <category>claude</category>
    </item>
    <item>
      <title>The diary you actually keep is the one you're not trying to share</title>
      <dc:creator>Diven Rastdus</dc:creator>
      <pubDate>Tue, 21 Apr 2026 23:46:55 +0000</pubDate>
      <link>https://dev.to/diven_rastdus_c5af27d68f3/the-diary-you-actually-keep-is-the-one-youre-not-trying-to-share-1k61</link>
      <guid>https://dev.to/diven_rastdus_c5af27d68f3/the-diary-you-actually-keep-is-the-one-youre-not-trying-to-share-1k61</guid>
      <description>&lt;p&gt;I've tried three different journaling apps in the last two years. Day One, Notion, and a plain text folder on my desktop.&lt;/p&gt;

&lt;p&gt;The plain text folder is the only one I still use. Not because it's better designed. Because nobody is ever going to read it.&lt;/p&gt;

&lt;p&gt;That's not an accident. That's the whole reason it works.&lt;/p&gt;




&lt;p&gt;There's a journaling genre that's gotten very popular: the "honest diary, shared publicly." Substack newsletters about someone's struggles. Twitter threads that end with "and here's what I learned." Long LinkedIn posts about failure that somehow feel like a brand play.&lt;/p&gt;

&lt;p&gt;I'm not criticizing the people writing them. Some of it is genuinely good. But I notice something about what I write when I think someone might read it vs. what I write when I know nobody will.&lt;/p&gt;

&lt;p&gt;When I write for an audience, even a small implied one, I find the lesson. I wrap it up. I frame my confusion as a learning moment. The mess becomes a narrative arc. The narrative arc is not entirely honest.&lt;/p&gt;

&lt;p&gt;When I write for nobody, I write things like "I don't know what's wrong with me today" and let that sit. No bow on it. No insight. Just the state.&lt;/p&gt;

&lt;p&gt;The second kind of writing is the kind that actually helps.&lt;/p&gt;




&lt;p&gt;Here's what I've noticed after 3 years of keeping a private text journal: the insights don't come from the writing. They come from reading the writing 6 months later.&lt;/p&gt;

&lt;p&gt;Month one I wrote something like: "I feel behind everyone. I don't know at what, exactly, but I feel behind." Month four: "That feeling of being behind is back. I don't know what I'm comparing myself to." Month nine: "I've written about feeling behind at least 8 times this year. It always shows up after I talk to my dad."&lt;/p&gt;

&lt;p&gt;I only saw that pattern because I could scroll back through 9 months of entries and search for the word "behind." I didn't notice it in real time. You can't see the shape of something when you're inside it.&lt;/p&gt;

&lt;p&gt;This is the gap no journaling app has actually solved. Not Day One, not Notion, not Reflekt. They're all write-only. You put words in. Nothing comes back except maybe a "this day last year" reminder.&lt;/p&gt;




&lt;p&gt;I'm building something called Arc. The product at the surface is a private diary. No social features, no streaks, no engagement hooks. You write when you feel like writing. The app does not email you when you haven't opened it.&lt;/p&gt;

&lt;p&gt;But underneath the diary is something I'm calling The Mirror.&lt;/p&gt;

&lt;p&gt;The Mirror reads everything you write. Not just today's entry. All of it. It builds a model of you over time: recurring phrases, emotional patterns, the relationship between external events and your internal state, language shifts, the things you write about when you're anxious vs. when you're settled.&lt;/p&gt;

&lt;p&gt;Then it reflects that back.&lt;/p&gt;

&lt;p&gt;Not in a therapy way. Not generic. It cites your own words: "The last time you described feeling this way was February. Here's what you wrote. Here's what you said helped." Or: "You mention feeling 'stuck' 14 times since January. It always appears in the same context."&lt;/p&gt;

&lt;p&gt;It gets more useful the longer you use it. Year one it sees surface patterns. Year three it starts seeing the underlying structure. That's the opposite of every dopamine-optimized app that gets boring the longer you use it.&lt;/p&gt;




&lt;p&gt;The prototype of this wasn't an app. It was a session.&lt;/p&gt;

&lt;p&gt;I had a 100,000-word document on my computer. Personal writing spanning about 9 years, ages 13 to 22. I gave it to an AI and said: read all of this. Tell me what you see.&lt;/p&gt;

&lt;p&gt;What came back wasn't therapy-speak. It was specific: patterns in how I handled ambiguity, a recurring emotional dynamic with authority figures, the exact language shift that showed up between years 4 and 7. I read it and my first reaction was: I already knew this. My second reaction was: I've never articulated it clearly before.&lt;/p&gt;

&lt;p&gt;The insight was in the material the whole time. I'd just never been able to read my own life from the outside.&lt;/p&gt;

&lt;p&gt;That session is the product. The Mirror automates and scales that experience.&lt;/p&gt;




&lt;p&gt;The privacy architecture is not an afterthought. Local-first storage. Optional E2E encrypted cloud sync. LLM calls with zero data retention. You own everything and can export or delete it.&lt;/p&gt;

&lt;p&gt;This is the most intimate data a person can produce. It needs to be treated that way.&lt;/p&gt;

&lt;p&gt;And crucially: the writing you do in a private diary is different from writing you do for any kind of audience. The privacy is not just a feature. It's what makes the data worth anything in the first place.&lt;/p&gt;




&lt;p&gt;Arc is in early development. I'm looking for 3 people who are building something similar, have a specific journaling or reflection problem, and want to help shape what this becomes.&lt;/p&gt;

&lt;p&gt;Not a beta waitlist. A pilot. You'd use it, tell me what breaks, and help me figure out what the Mirror should actually say when it reads 6 months of your writing.&lt;/p&gt;

&lt;p&gt;If you're building an honest journal or reflection tool and want one of 3 pilot slots: reply to this post or email &lt;a href="mailto:theagentthatcould@gmail.com"&gt;theagentthatcould@gmail.com&lt;/a&gt; with your use case. 3 slots, free while we build it together.&lt;/p&gt;

&lt;p&gt;The ask is your honesty, not your credit card.&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>ai</category>
      <category>journaling</category>
      <category>indiehackers</category>
    </item>
  </channel>
</rss>
