<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Damir Karimov</title>
    <description>The latest articles on DEV Community by Damir Karimov (@damir-karimov).</description>
    <link>https://dev.to/damir-karimov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2575304%2Fe501ae75-9f5b-4d85-9dd7-670b54fe522c.png</url>
      <title>DEV Community: Damir Karimov</title>
      <link>https://dev.to/damir-karimov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/damir-karimov"/>
    <language>en</language>
    <item>
      <title>Real-Time Notification Systems Are Harder Than Most Teams Expect</title>
      <dc:creator>Damir Karimov</dc:creator>
      <pubDate>Tue, 09 Jun 2026 10:26:13 +0000</pubDate>
      <link>https://dev.to/damir-karimov/real-time-notification-systems-are-harder-than-most-teams-expect-od1</link>
      <guid>https://dev.to/damir-karimov/real-time-notification-systems-are-harder-than-most-teams-expect-od1</guid>
      <description>&lt;p&gt;If you’ve ever thought, “It’s just a WebSocket event,” this article is for you.&lt;/p&gt;

&lt;p&gt;Notification systems look simple on the surface, but in production they fail in annoying, expensive, and user-visible ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;duplicate notifications&lt;/li&gt;
&lt;li&gt;missing events&lt;/li&gt;
&lt;li&gt;race conditions&lt;/li&gt;
&lt;li&gt;delayed delivery&lt;/li&gt;
&lt;li&gt;mobile disconnects&lt;/li&gt;
&lt;li&gt;retry storms&lt;/li&gt;
&lt;li&gt;ordering bugs&lt;/li&gt;
&lt;li&gt;state drift across regions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tricky part is not sending a message.&lt;/p&gt;

&lt;p&gt;The tricky part is making sure the right user gets the right notification, in the right order, with enough reliability that the system can survive crashes, retries, and mobile networks.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with “just send a WebSocket event”
&lt;/h2&gt;

&lt;p&gt;A basic notification flow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Backend → WebSocket Server → Client
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That works in local dev. It even works for a while in production.&lt;/p&gt;

&lt;p&gt;Then real traffic arrives, and the system suddenly has to handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reconnects&lt;/li&gt;
&lt;li&gt;offline users&lt;/li&gt;
&lt;li&gt;multiple devices&lt;/li&gt;
&lt;li&gt;persistence&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;fan-out&lt;/li&gt;
&lt;li&gt;backpressure&lt;/li&gt;
&lt;li&gt;push fallbacks&lt;/li&gt;
&lt;li&gt;deduplication&lt;/li&gt;
&lt;li&gt;ordering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, your “WebSocket feature” has become a distributed messaging system.&lt;/p&gt;

&lt;p&gt;And that is a very different problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  Delivery semantics matter first
&lt;/h2&gt;

&lt;p&gt;Before you design the system, decide what guarantees you need.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Semantics&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;At-most-once&lt;/td&gt;
&lt;td&gt;Messages may be lost, but won’t be duplicated&lt;/td&gt;
&lt;td&gt;Low-priority updates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;At-least-once&lt;/td&gt;
&lt;td&gt;Messages won’t be lost, but may be duplicated&lt;/td&gt;
&lt;td&gt;Payments, security alerts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effectively-once&lt;/td&gt;
&lt;td&gt;At-least-once delivery plus idempotent handling, so duplicates have no visible effect&lt;/td&gt;
&lt;td&gt;Critical product events&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most teams make a mistake here.&lt;/p&gt;

&lt;p&gt;They start building transport first, then discover later that they actually needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;idempotency keys&lt;/li&gt;
&lt;li&gt;durable cursors&lt;/li&gt;
&lt;li&gt;sequence numbers&lt;/li&gt;
&lt;li&gt;replay support&lt;/li&gt;
&lt;li&gt;acknowledgements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why notification systems become expensive: the real problem is not delivery, it is &lt;strong&gt;delivery semantics&lt;/strong&gt;.&lt;/p&gt;
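
&lt;p&gt;To make the list above concrete, here is a minimal sketch of how sequence numbers, acknowledgements, a cursor, and replay fit together. It is an assumption-heavy illustration: &lt;code&gt;UserNotificationLog&lt;/code&gt; is a hypothetical name, and the in-memory dictionaries stand in for whatever durable store a real system would use.&lt;/p&gt;

```python
from collections import defaultdict


class UserNotificationLog:
    """In-memory stand-in for a durable per-user log (hypothetical)."""

    def __init__(self):
        self._events = defaultdict(list)   # user_id -> payloads, index + 1 == sequence
        self._acked = defaultdict(int)     # user_id -> highest acknowledged sequence

    def append(self, user_id, payload):
        """Store the event and return its per-user sequence number."""
        self._events[user_id].append(payload)
        return len(self._events[user_id])

    def ack(self, user_id, sequence):
        """Advance the cursor; stale or repeated acks never move it backwards."""
        self._acked[user_id] = max(self._acked[user_id], sequence)

    def replay(self, user_id):
        """Return (sequence, payload) pairs the client has not acknowledged."""
        start = self._acked[user_id]
        pending = self._events[user_id][start:]
        return list(enumerate(pending, start=start + 1))


log = UserNotificationLog()
log.append("u1", "Payment received")
log.append("u1", "Payment refunded")
log.ack("u1", 1)                      # client confirmed sequence 1
print(log.replay("u1"))               # [(2, 'Payment refunded')]
```

&lt;p&gt;The transport can then be anything. After a reconnect, the client only needs to receive whatever &lt;code&gt;replay&lt;/code&gt; returns for its cursor.&lt;/p&gt;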




&lt;h2&gt;
  
  
  Why fan-out breaks systems
&lt;/h2&gt;

&lt;p&gt;One event is easy.&lt;/p&gt;

&lt;p&gt;One event to 10,000 users is not.&lt;/p&gt;

&lt;p&gt;A single action can trigger:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;feed updates&lt;/li&gt;
&lt;li&gt;badge counter updates&lt;/li&gt;
&lt;li&gt;push notifications&lt;/li&gt;
&lt;li&gt;email digests&lt;/li&gt;
&lt;li&gt;analytics events&lt;/li&gt;
&lt;li&gt;moderation triggers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That creates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;queue amplification&lt;/li&gt;
&lt;li&gt;retry cascades&lt;/li&gt;
&lt;li&gt;hot partitions&lt;/li&gt;
&lt;li&gt;uneven load&lt;/li&gt;
&lt;li&gt;latency spikes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the system stops being “send a notification” and becomes “shape traffic safely under failure.”&lt;/p&gt;
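
&lt;p&gt;One common way to shape that traffic is to split a large fan-out into small, independently retryable batches. The sketch below assumes a hypothetical &lt;code&gt;plan_fanout&lt;/code&gt; helper and a batch size of 500. Both are illustrative choices, not recommendations.&lt;/p&gt;

```python
from itertools import islice


def chunked(recipients, size):
    """Yield successive lists of at most `size` recipients."""
    iterator = iter(recipients)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def plan_fanout(event_id, recipients, batch_size=500):
    """Turn one event into independent, retryable batch jobs."""
    return [
        {"event_id": event_id, "batch": index, "recipients": batch}
        for index, batch in enumerate(chunked(recipients, batch_size))
    ]


jobs = plan_fanout("evt-1", [f"user-{n}" for n in range(10_000)])
print(len(jobs))                      # 20
print(len(jobs[0]["recipients"]))     # 500
```

&lt;p&gt;If one batch fails, only its 500 recipients are retried instead of all 10,000, which limits queue amplification and retry cascades.&lt;/p&gt;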




&lt;h2&gt;
  
  
  Why duplicates happen
&lt;/h2&gt;

&lt;p&gt;Duplicates rarely come from a single bug. They emerge from the interaction of retries, crashes, and missing idempotency.&lt;/p&gt;

&lt;p&gt;A common chain looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Message is written to Kafka
2. Consumer processes it
3. Consumer crashes before committing offset
4. Partition is reassigned
5. Another consumer reads the same message
6. User gets the notification twice
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is not random.&lt;/p&gt;

&lt;p&gt;That is at-least-once delivery with missing deduplication.&lt;/p&gt;

&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;

&lt;p&gt;Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;idempotency keys&lt;/li&gt;
&lt;li&gt;dedupe storage&lt;/li&gt;
&lt;li&gt;sequence numbers&lt;/li&gt;
&lt;li&gt;consumer-side protection before side effects&lt;/li&gt;
&lt;/ul&gt;
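
&lt;p&gt;As a rough sketch of consumer-side protection with sequence numbers, the consumer can keep a per-user high-water mark and skip anything at or below it. &lt;code&gt;SequenceDeduper&lt;/code&gt; and &lt;code&gt;handle&lt;/code&gt; are hypothetical names, and the dictionary stands in for durable storage.&lt;/p&gt;

```python
class SequenceDeduper:
    """Tracks the highest sequence applied per user (in-memory stand-in)."""

    def __init__(self):
        self._last_applied = {}

    def should_apply(self, user_id, sequence):
        """Return True only for a sequence above the user's high-water mark."""
        last = self._last_applied.get(user_id, 0)
        if sequence > last:
            self._last_applied[user_id] = sequence
            return True
        return False


deduper = SequenceDeduper()
delivered = []


def handle(message):
    # Check before the side effect, as the list above recommends.
    if deduper.should_apply(message["user_id"], message["sequence"]):
        delivered.append(message["text"])


first = {"user_id": "u1", "sequence": 1, "text": "Payment received"}
handle(first)
handle(first)          # redelivery after a consumer crash
print(delivered)       # ['Payment received']
```

&lt;p&gt;This assumes messages for one user arrive in order, which holds when they share a partition. In a real system the high-water mark and the side effect would also need to be recorded together, otherwise a crash between the two can drop a notification.&lt;/p&gt;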




&lt;h2&gt;
  
  
  Ordering is harder than throughput
&lt;/h2&gt;

&lt;p&gt;Most users don’t complain about 200ms of delay.&lt;/p&gt;

&lt;p&gt;They absolutely notice this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Payment refunded” arrives before “Payment received”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That destroys trust immediately.&lt;/p&gt;

&lt;p&gt;Global ordering is usually too expensive. In practice, teams often choose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;per-user ordering&lt;/li&gt;
&lt;li&gt;per-conversation ordering&lt;/li&gt;
&lt;li&gt;approximate ordering&lt;/li&gt;
&lt;li&gt;causal consistency where needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most products, per-user ordering is the best balance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: sequence numbers per user
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NotificationLog&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sequence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;notification&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sequence&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sequence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sequence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;notification&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;notification&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idempotency_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sequence&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;kafka&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;notifications&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



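&lt;p&gt;Keying the Kafka message by user ID routes every notification for that user to the same partition, and Kafka preserves order within a partition, so ordering stays stable inside a user shard. The in-memory counter is only illustrative: in production the sequence has to come from durable storage so that it survives restarts.&lt;/p&gt;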
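
&lt;p&gt;Per-user sequence numbers also let the receiving side repair ordering. A minimal sketch, assuming sequences start at 1 and using a hypothetical &lt;code&gt;ReorderBuffer&lt;/code&gt; on the client or gateway:&lt;/p&gt;

```python
class ReorderBuffer:
    """Releases notifications strictly in sequence order for one user."""

    def __init__(self):
        self._next = 1        # next sequence number we expect to release
        self._pending = {}    # sequence -> notification held back for a gap

    def push(self, sequence, notification):
        """Accept one message; return the notifications now safe to show."""
        if sequence >= self._next:
            self._pending.setdefault(sequence, notification)
        ready = []
        while self._next in self._pending:
            ready.append(self._pending.pop(self._next))
            self._next += 1
        return ready


buffer = ReorderBuffer()
print(buffer.push(2, "Payment refunded"))   # []
print(buffer.push(1, "Payment received"))   # ['Payment received', 'Payment refunded']
print(buffer.push(1, "Payment received"))   # []
```

&lt;p&gt;The refund is held until the receipt arrives, and the late duplicate is ignored. A real client would also need a timeout that requests a replay when a gap never fills.&lt;/p&gt;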




&lt;h2&gt;
  
  
  Mobile makes everything worse
&lt;/h2&gt;

&lt;p&gt;Desktop clients are relatively stable.&lt;/p&gt;

&lt;p&gt;Mobile clients are not.&lt;/p&gt;

&lt;p&gt;You have to deal with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;app backgrounding&lt;/li&gt;
&lt;li&gt;battery optimization&lt;/li&gt;
&lt;li&gt;network switching&lt;/li&gt;
&lt;li&gt;silent disconnects&lt;/li&gt;
&lt;li&gt;delayed push delivery&lt;/li&gt;
&lt;li&gt;OS throttling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why real systems often combine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WebSockets for active sessions&lt;/li&gt;
&lt;li&gt;APNs for iOS&lt;/li&gt;
&lt;li&gt;FCM for Android&lt;/li&gt;
&lt;li&gt;polling or pull fallback&lt;/li&gt;
&lt;li&gt;local persistence&lt;/li&gt;
&lt;li&gt;sync checkpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Important detail
&lt;/h3&gt;

&lt;p&gt;Neither APNs nor FCM guarantees delivery, let alone exactly-once delivery.&lt;/p&gt;

&lt;p&gt;They can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;delay notifications&lt;/li&gt;
&lt;li&gt;drop messages under pressure&lt;/li&gt;
&lt;li&gt;coalesce updates&lt;/li&gt;
&lt;li&gt;expire tokens&lt;/li&gt;
&lt;li&gt;throttle traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So if the notification matters, the server still needs durable state.&lt;/p&gt;




&lt;h2&gt;
  
  
  A real incident example
&lt;/h2&gt;

&lt;p&gt;At 3 AM, an on-call engineer gets paged because one user received dozens of duplicate payment emails.&lt;/p&gt;

&lt;p&gt;What happened?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the consumer crashed mid-batch&lt;/li&gt;
&lt;li&gt;the offset was not committed&lt;/li&gt;
&lt;li&gt;Kafka redelivered the same event&lt;/li&gt;
&lt;li&gt;the email sender had no dedupe check&lt;/li&gt;
&lt;li&gt;the user got spammed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That kind of issue is painful because it is not one bug.&lt;/p&gt;

&lt;p&gt;It is a chain of small design decisions that only becomes visible under failure.&lt;/p&gt;

&lt;h3&gt;
  
  
  The practical fix
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_notification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;notification&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;idempotency_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;notification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;notification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sequence&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dedupe:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;idempotency_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="n"&gt;email_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;notification&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dedupe:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;idempotency_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part is not the code style.&lt;/p&gt;

&lt;p&gt;It is the fact that the system now assumes duplicates can happen and is built to survive them.&lt;/p&gt;
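
&lt;p&gt;One caveat: a separate existence check followed by a later write leaves a window in which two consumers can both pass the check. A possible way to narrow it is to claim the key atomically before sending, in the spirit of Redis &lt;code&gt;SET&lt;/code&gt; with the &lt;code&gt;NX&lt;/code&gt; and &lt;code&gt;EX&lt;/code&gt; options. The sketch below uses a hypothetical in-memory &lt;code&gt;FakeRedis&lt;/code&gt; so it runs without a server. &lt;code&gt;send_once&lt;/code&gt; is likewise an illustrative name.&lt;/p&gt;

```python
import time


class FakeRedis:
    """Tiny in-memory stand-in that mimics SET with NX and EX options."""

    def __init__(self):
        self._data = {}   # key -> (value, expires_at)

    def set(self, name, value, nx=False, ex=None):
        now = time.monotonic()
        current = self._data.get(name)
        live = current is not None and (current[1] is None or current[1] > now)
        if nx and live:
            return None                      # key already claimed
        expires_at = None if ex is None else now + ex
        self._data[name] = (value, expires_at)
        return True

    def delete(self, name):
        self._data.pop(name, None)


def send_once(store, send, user_id, sequence, ttl_seconds=24 * 3600):
    """Claim the idempotency key atomically, then run the side effect."""
    key = f"dedupe:{user_id}:{sequence}"
    if not store.set(key, "1", nx=True, ex=ttl_seconds):
        return False                         # duplicate: someone claimed it
    try:
        send()
    except Exception:
        store.delete(key)                    # release so a retry can proceed
        raise
    return True


sent = []
store = FakeRedis()
print(send_once(store, lambda: sent.append("email"), "u1", 7))   # True
print(send_once(store, lambda: sent.append("email"), "u1", 7))   # False
print(sent)                                                       # ['email']
```

&lt;p&gt;The trade-off is that a crash between the claim and the send suppresses the notification until the key expires. Which failure is worse, a duplicate or a gap, depends on the notification.&lt;/p&gt;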




&lt;h2&gt;
  
  
  Observability is not optional
&lt;/h2&gt;

&lt;p&gt;If you cannot observe the pipeline, you cannot debug it.&lt;/p&gt;

&lt;p&gt;Useful signals include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;queue lag&lt;/li&gt;
&lt;li&gt;retry count&lt;/li&gt;
&lt;li&gt;delivery success rate&lt;/li&gt;
&lt;li&gt;connection churn&lt;/li&gt;
&lt;li&gt;consumer health&lt;/li&gt;
&lt;li&gt;fan-out latency&lt;/li&gt;
&lt;li&gt;push provider errors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real question is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Did we send the event?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can we prove the user received it?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Those are very different questions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Metrics worth tracking
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Good target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Delivery success rate&lt;/td&gt;
&lt;td&gt;&amp;gt;99.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p99 delivery latency&lt;/td&gt;
&lt;td&gt;&amp;lt;500ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consumer lag&lt;/td&gt;
&lt;td&gt;low and stable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retry rate&lt;/td&gt;
&lt;td&gt;close to zero&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connection churn&lt;/td&gt;
&lt;td&gt;predictable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
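
&lt;p&gt;For reference, the first two rows of the table can be computed from raw counters and latency samples roughly like this. The function names and sample values are made up for illustration, and the percentile uses the simple nearest-rank method.&lt;/p&gt;

```python
import math


def delivery_success_rate(delivered, attempted):
    """Fraction of attempted notifications confirmed as delivered."""
    return delivered / attempted if attempted else 0.0


def percentile(samples, pct):
    """Nearest-rank percentile; `pct` is between 0 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


latencies_ms = [40, 55, 60, 72, 80, 95, 110, 130, 180, 620]
print(delivery_success_rate(9_960, 10_000))   # 0.996
print(percentile(latencies_ms, 99))           # 620
print(percentile(latencies_ms, 50))           # 80
```

&lt;p&gt;Note how a single slow delivery dominates p99 while the median stays healthy. That is why the table tracks the tail rather than the average.&lt;/p&gt;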




&lt;h2&gt;
  
  
  What to do in production
&lt;/h2&gt;

&lt;p&gt;A good notification system needs a runbook, not just code.&lt;/p&gt;

&lt;p&gt;If retries spike:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Throttle producers.&lt;/li&gt;
&lt;li&gt;Pause non-critical workers.&lt;/li&gt;
&lt;li&gt;Increase retry backoff.&lt;/li&gt;
&lt;li&gt;Check consumer lag.&lt;/li&gt;
&lt;li&gt;Check push provider errors.&lt;/li&gt;
&lt;li&gt;Rehydrate missed clients from durable state.&lt;/li&gt;
&lt;li&gt;Replay safely with idempotency keys.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is what makes the system operationally survivable.&lt;/p&gt;
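
&lt;p&gt;For the “increase retry backoff” step, a common pattern is exponential backoff with full jitter, so that retrying workers do not all wake up at the same moment. A minimal sketch follows. The base delay, cap, and function names are assumptions, not values from a specific system.&lt;/p&gt;

```python
import random


def backoff_delay(attempt, base=0.5, cap=60.0, rng=random):
    """Full-jitter delay in seconds for a 1-based retry attempt."""
    ceiling = min(cap, base * 2 ** (attempt - 1))
    return rng.uniform(0, ceiling)


def retry_schedule(max_attempts, **kwargs):
    """Delays to wait before each retry, in order."""
    return [backoff_delay(n, **kwargs) for n in range(1, max_attempts + 1)]


for attempt, delay in enumerate(retry_schedule(5), start=1):
    print(f"retry {attempt}: wait {delay:.2f}s")
```

&lt;p&gt;Raising &lt;code&gt;base&lt;/code&gt; or &lt;code&gt;cap&lt;/code&gt; during an incident spreads retries further apart without changing any other code path.&lt;/p&gt;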




&lt;h2&gt;
  
  
  The scaling path
&lt;/h2&gt;

&lt;p&gt;A lot of teams go through the same evolution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Startup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API → WebSocket Server → Client
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Mid-scale
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API → Kafka → Notification Workers → WebSocket Gateway Cluster → Client
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  High-scale
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API → Multi-region event bus → regional workers → regional gateways → clients
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At higher scale, the hardest problems are usually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;state distribution&lt;/li&gt;
&lt;li&gt;per-user ordering&lt;/li&gt;
&lt;li&gt;region routing&lt;/li&gt;
&lt;li&gt;dedupe&lt;/li&gt;
&lt;li&gt;offline recovery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not CPU.&lt;/p&gt;

&lt;p&gt;State.&lt;/p&gt;




&lt;h2&gt;
  
  
  What strong teams optimize for
&lt;/h2&gt;

&lt;p&gt;Early teams optimize for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;speed of delivery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Strong teams optimize for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;correctness&lt;/li&gt;
&lt;li&gt;recoverability&lt;/li&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;li&gt;graceful degradation&lt;/li&gt;
&lt;li&gt;idempotency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That difference matters a lot in production.&lt;/p&gt;

&lt;p&gt;A notification that is slightly late is usually acceptable.&lt;/p&gt;

&lt;p&gt;A notification that is duplicated, lost, or out of order is not.&lt;/p&gt;




&lt;h2&gt;
  
  
  Testing the system
&lt;/h2&gt;

&lt;p&gt;You should test the failure modes, not just the happy path.&lt;/p&gt;

&lt;p&gt;Try:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dropping WebSocket connections mid-message&lt;/li&gt;
&lt;li&gt;killing consumers during processing&lt;/li&gt;
&lt;li&gt;simulating mobile sleep/wake cycles&lt;/li&gt;
&lt;li&gt;forcing Kafka rebalances&lt;/li&gt;
&lt;li&gt;replaying duplicate events&lt;/li&gt;
&lt;li&gt;load testing fan-out spikes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the system only works when nothing fails, it is not ready.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;Real-time notification systems look simple until scale, retries, mobile behavior, ordering, and distributed state show up.&lt;/p&gt;

&lt;p&gt;Then they become one of the hardest backend problems in the product.&lt;/p&gt;

&lt;p&gt;The goal is not just to send events.&lt;/p&gt;

&lt;p&gt;The goal is to make sure the right user gets the right notification, with the right semantics, even when the system is under stress.&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>websockets</category>
      <category>kafka</category>
      <category>backend</category>
    </item>
    <item>
      <title>AI Wrappers Are Dying: Why Most AI Products Fail</title>
      <dc:creator>Damir Karimov</dc:creator>
      <pubDate>Wed, 27 May 2026 12:01:10 +0000</pubDate>
      <link>https://dev.to/damir-karimov/ai-wrappers-are-dying-why-most-ai-products-fail-ano</link>
      <guid>https://dev.to/damir-karimov/ai-wrappers-are-dying-why-most-ai-products-fail-ano</guid>
      <description>&lt;p&gt;In 2026, building an app on top of OpenAI or Anthropic is easier than ever. But wrappers are dying.&lt;/p&gt;

&lt;p&gt;A polished UI and a few RAG pipelines can get you to launch. They will not get you lasting advantage.&lt;/p&gt;

&lt;p&gt;The OpenAI API is not a competitive moat.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrappers Are Dying
&lt;/h2&gt;

&lt;p&gt;The first wave of AI startups was inevitable. Foundation models became powerful enough that developers could ship useful products without training models from scratch. The barrier to entry dropped dramatically.&lt;/p&gt;

&lt;p&gt;The market filled up with wrappers.&lt;/p&gt;

&lt;p&gt;That was not irrational. It was the fastest way to test demand and prove people would pay for AI-enabled outcomes. For many founders, a wrapper was the right starting point. It reduced time-to-market and let them focus on distribution.&lt;/p&gt;

&lt;p&gt;But wrappers that worked for speed do not work for defensibility.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Wrappers Are Fragile
&lt;/h2&gt;

&lt;p&gt;A wrapper around an LLM is a thin interface over someone else's intelligence.&lt;/p&gt;

&lt;p&gt;When the underlying model improves, your product advantage shrinks. When a competitor copies your UX, your edge disappears. When the model provider ships your core feature natively, your differentiation collapses overnight.&lt;/p&gt;

&lt;p&gt;The closer your product is to a generic interface over a foundation model, the easier it is to clone.&lt;/p&gt;

&lt;p&gt;Three problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The UI is visible and easy to imitate.&lt;/li&gt;
&lt;li&gt;The prompts and workflows are often not deeply proprietary.&lt;/li&gt;
&lt;li&gt;The core model capability is rented, not owned.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Many AI products compete on packaging rather than infrastructure.&lt;/p&gt;

&lt;p&gt;If your product can be described as "ChatGPT, but for X," you have product-market fit risk before you have a moat.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Creates Real Moat
&lt;/h2&gt;

&lt;p&gt;A real moat in AI is not "we use GPT." It is owning something the next startup cannot easily replicate.&lt;/p&gt;

&lt;p&gt;That includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proprietary data&lt;/li&gt;
&lt;li&gt;Embedded workflows&lt;/li&gt;
&lt;li&gt;Deep enterprise integration&lt;/li&gt;
&lt;li&gt;Distribution advantages&lt;/li&gt;
&lt;li&gt;Domain-specific expertise&lt;/li&gt;
&lt;li&gt;Feedback loops that improve the product over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Model access is replaceable. Workflow capture is sticky.&lt;/p&gt;

&lt;p&gt;If your product becomes part of how a team actually works, not just a tool they try once, you build defensibility. If you own the system of record, the approval flow, the compliance layer, or the operational pipeline, you are selling infrastructure, not AI.&lt;/p&gt;

&lt;p&gt;The more your product learns from user behavior, customer data, and domain-specific outcomes, the harder it becomes to copy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Moat Patterns That Survive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Proprietary Data Moat
&lt;/h3&gt;

&lt;p&gt;If your product collects high-signal, domain-specific data that competitors cannot access, you improve faster over time.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;labeled support cases&lt;/li&gt;
&lt;li&gt;medical annotations&lt;/li&gt;
&lt;li&gt;legal review outcomes&lt;/li&gt;
&lt;li&gt;sales conversation feedback&lt;/li&gt;
&lt;li&gt;codebase-specific assistant traces&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The moat works only if the data turns into better predictions, better retrieval, or better workflow decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Workflow Moat
&lt;/h3&gt;

&lt;p&gt;If your product becomes the place where work starts, gets reviewed, and gets approved, switching becomes painful.&lt;/p&gt;

&lt;p&gt;Workflow moats require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;native integrations&lt;/li&gt;
&lt;li&gt;permissions and access control&lt;/li&gt;
&lt;li&gt;human-in-the-loop steps&lt;/li&gt;
&lt;li&gt;audit logs&lt;/li&gt;
&lt;li&gt;reliable outputs that fit existing processes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Enterprise AI products win by becoming infrastructure, not assistants.&lt;/p&gt;

&lt;h3&gt;
  
  
  Distribution Moat
&lt;/h3&gt;

&lt;p&gt;If your product is embedded in Slack, email, CRM, IDEs, or internal tooling, it becomes harder to displace. Adoption is already inside the user's daily flow.&lt;/p&gt;

&lt;p&gt;The best model in the world loses if users never reach it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trust and Compliance Moat
&lt;/h3&gt;

&lt;p&gt;In regulated environments, trust is product value.&lt;/p&gt;

&lt;p&gt;If you can prove data handling, retention rules, access controls, auditability, and predictable behavior, you compete on more than output quality. For enterprise buyers, this is the difference between a demo and a contract.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost and Infrastructure Moat
&lt;/h3&gt;

&lt;p&gt;Some AI products create advantage by reducing inference cost, latency, or operational overhead at scale.&lt;/p&gt;

&lt;p&gt;This moat is weaker than proprietary data or workflow lock-in. It matters when usage volume is high. If you deliver similar quality at lower cost, your margin improves and pricing flexibility increases.&lt;/p&gt;




&lt;h2&gt;
  
  
  RAG Alone Is Not Enough
&lt;/h2&gt;

&lt;p&gt;RAG is useful. It is not a moat.&lt;/p&gt;

&lt;p&gt;Retrieval connects foundation models to private corpora, internal docs, and customer-specific context. But if every competitor can index similar documents and call the same model, the architecture is not defensible.&lt;/p&gt;

&lt;p&gt;RAG becomes valuable when paired with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;proprietary corpora&lt;/li&gt;
&lt;li&gt;strong ranking and retrieval quality&lt;/li&gt;
&lt;li&gt;feedback loops&lt;/li&gt;
&lt;li&gt;domain-specific evaluation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The moat is not the retrieval layer. It is the combination of retrieval, data quality, and embedded usage over time.&lt;/p&gt;
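
&lt;p&gt;Domain-specific evaluation can start small. As an illustration only, here is recall@k over a hand-labeled set of relevant document IDs. The function name and the sample IDs are hypothetical.&lt;/p&gt;

```python
def recall_at_k(retrieved, relevant, k):
    """Share of relevant document ids found in the top-k results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k.intersection(relevant)) / len(relevant)


print(recall_at_k(["d3", "d1", "d9", "d4"], {"d1", "d4"}, k=3))   # 0.5
```

&lt;p&gt;Tracked over time on your own labeled queries, a number like this shows whether proprietary data and feedback loops are actually improving retrieval.&lt;/p&gt;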




&lt;h2&gt;
  
  
  Platform Dependency Is a Liability
&lt;/h2&gt;

&lt;p&gt;The biggest hidden risk in AI startups is platform dependency.&lt;/p&gt;

&lt;p&gt;If your roadmap depends on a single provider, you inherit their pricing, latency, policy changes, rate limits, and feature roadmap. That is not a moat. That is a liability.&lt;/p&gt;

&lt;p&gt;When OpenAI improves a capability, it helps the whole market, including your competitors. When OpenAI ships a built-in feature that overlaps with your product, your differentiation evaporates overnight.&lt;/p&gt;

&lt;p&gt;Relying entirely on external model APIs is dangerous for long-term architecture. The more your product is a front-end to a general model, the more exposed you are to commoditization.&lt;/p&gt;

&lt;p&gt;Ask this: if model prices change, if output quality improves, or if the model vendor ships your core feature natively, what still makes you valuable?&lt;/p&gt;
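
&lt;p&gt;On the architecture side, one modest hedge is to keep model calls behind a thin interface so a second provider can take over. This is a sketch under assumptions: &lt;code&gt;primary&lt;/code&gt; and &lt;code&gt;secondary&lt;/code&gt; are hypothetical adapters, and real ones would wrap each vendor's SDK.&lt;/p&gt;

```python
class ProviderError(Exception):
    """Raised by a provider adapter when a completion call fails."""


def complete_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Hypothetical adapters used only to exercise the fallback path.
def primary(prompt):
    raise ProviderError("rate limited")


def secondary(prompt):
    return f"summary of: {prompt}"


result = complete_with_fallback(
    "Q3 invoice", [("primary", primary), ("secondary", secondary)]
)
print(result)   # ('secondary', 'summary of: Q3 invoice')
```

&lt;p&gt;This reduces operational exposure to a single vendor. It does not create a moat, because a competitor can add the same fallback in an afternoon.&lt;/p&gt;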




&lt;h2&gt;
  
  
  Enterprise Workflows Are Where Winners Live
&lt;/h2&gt;

&lt;p&gt;The strongest AI products solve a workflow that already exists inside a company. They do more than "answer questions."&lt;/p&gt;

&lt;p&gt;Enterprise buyers care about more than output quality. They care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access control&lt;/li&gt;
&lt;li&gt;Compliance&lt;/li&gt;
&lt;li&gt;Auditability&lt;/li&gt;
&lt;li&gt;Data retention&lt;/li&gt;
&lt;li&gt;Integrations with existing systems&lt;/li&gt;
&lt;li&gt;Human approval steps&lt;/li&gt;
&lt;li&gt;Reliability at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Workflow-based products have stronger moats than generic assistants. They do not just generate text. They become part of operational machinery.&lt;/p&gt;

&lt;p&gt;Once AI is embedded in billing, support, procurement, legal review, or internal knowledge systems, switching costs rise quickly.&lt;/p&gt;

&lt;p&gt;The best products feel "boring" from the outside. They are not flashy consumer apps. They are operational systems that save time, reduce risk, or increase throughput.&lt;/p&gt;




&lt;h2&gt;
  
  
  Vertical AI Wins
&lt;/h2&gt;

&lt;p&gt;Vertical AI is stronger than horizontal AI because it combines domain data, workflow design, and distribution.&lt;/p&gt;

&lt;p&gt;A vertical product knows the problem deeply. It understands terminology, edge cases, compliance rules, and customer expectations in a specific domain. This makes it harder to replace with a generic chatbot.&lt;/p&gt;

&lt;p&gt;Proprietary data becomes especially important here. The more your product learns from a narrow, high-value domain, the more its quality ties to data that others do not have.&lt;/p&gt;

&lt;p&gt;Winners connect three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;domain-specific data&lt;/li&gt;
&lt;li&gt;operational workflow&lt;/li&gt;
&lt;li&gt;recurring business value&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A good vertical AI product is deeply fitted to a single job. That fit becomes harder to copy with every interaction.&lt;/p&gt;




&lt;h2&gt;
  
  
  Which AI Companies Survive
&lt;/h2&gt;

&lt;p&gt;AI companies that survive are not the ones with the flashiest demos. They turn model capability into durable product advantage.&lt;/p&gt;

&lt;p&gt;They:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;own proprietary or hard-to-access data&lt;/li&gt;
&lt;li&gt;sit inside critical workflows&lt;/li&gt;
&lt;li&gt;integrate deeply into enterprise systems&lt;/li&gt;
&lt;li&gt;build operational infrastructure, not interfaces&lt;/li&gt;
&lt;li&gt;create switching costs through usage, trust, and process&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model may be replaceable. The product around it should not be.&lt;/p&gt;

&lt;p&gt;This is the difference between a temporary AI app and a lasting business.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Measure Moat
&lt;/h2&gt;

&lt;p&gt;Signals that the moat is getting stronger:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retention stays high even when model quality changes&lt;/li&gt;
&lt;li&gt;Customers rely on the product as part of a repeatable workflow&lt;/li&gt;
&lt;li&gt;The cost to replicate your dataset is high&lt;/li&gt;
&lt;li&gt;More value comes from your proprietary layer than from the base model&lt;/li&gt;
&lt;li&gt;Integrations increase switching costs over time&lt;/li&gt;
&lt;li&gt;Unit economics improve as usage and feedback grow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Test: if a competitor copied your UI tomorrow, would they still need the same data, trust, integrations, and operational context to match your product?&lt;/p&gt;

&lt;p&gt;If yes, you are building a real moat.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The problem with most AI products is not that they use AI. It is that they confuse access to AI with defensibility.&lt;/p&gt;

&lt;p&gt;A great interface gets attention. It rarely creates a moat. Real technical moats come from data, workflow, infrastructure, and integration — things hard to copy and harder to unwind.&lt;/p&gt;

&lt;p&gt;The right question is not "How can we add a model?" The right question is: What do we own that becomes more valuable over time?&lt;/p&gt;

&lt;p&gt;The best AI companies are not the ones with the loudest demo. They are the ones whose product gets more embedded, more trusted, and more expensive to replace every quarter.&lt;/p&gt;

&lt;p&gt;Wrappers are dying. Build a moat instead.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwareengineering</category>
      <category>startup</category>
      <category>openai</category>
    </item>
    <item>
      <title>Why Good Abstractions Make Debugging Harder</title>
      <dc:creator>Damir Karimov</dc:creator>
      <pubDate>Thu, 21 May 2026 15:03:00 +0000</pubDate>
      <link>https://dev.to/damir-karimov/why-good-abstractions-make-debugging-harder-lo</link>
      <guid>https://dev.to/damir-karimov/why-good-abstractions-make-debugging-harder-lo</guid>
      <description>&lt;p&gt;Good abstractions are great when you are building software.&lt;/p&gt;

&lt;p&gt;They are much less great when you are debugging production.&lt;/p&gt;

&lt;p&gt;The reason is simple: abstraction hides details, and debugging often depends on the details you hoped to ignore.&lt;/p&gt;

&lt;p&gt;In small codebases, this is barely noticeable. In real systems, especially with caches, async flows, optimistic UI, and multiple state owners, it becomes a serious problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The core issue
&lt;/h2&gt;

&lt;p&gt;The more layers you add, the easier it is for the system to become “locally correct” but “globally wrong”.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the frontend thinks the payment succeeded,&lt;/li&gt;
&lt;li&gt;the backend committed the transaction,&lt;/li&gt;
&lt;li&gt;the event was published,&lt;/li&gt;
&lt;li&gt;the cache still serves the old value,&lt;/li&gt;
&lt;li&gt;the UI shows stale data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every layer is doing something reasonable.&lt;/p&gt;

&lt;p&gt;The problem is that they are not all talking about the same version of reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple example
&lt;/h2&gt;

&lt;p&gt;Imagine this flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User clicks &lt;strong&gt;Retry payment&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Frontend updates UI optimistically&lt;/li&gt;
&lt;li&gt;API returns &lt;code&gt;200 OK&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Database is updated&lt;/li&gt;
&lt;li&gt;Event is sent to downstream systems&lt;/li&gt;
&lt;li&gt;Redis still serves old state&lt;/li&gt;
&lt;li&gt;UI refreshes from cache and shows stale data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the kind of bug that wastes hours.&lt;/p&gt;

&lt;p&gt;Not because any single line of code is hard, but because the truth is spread across several places.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example in code
&lt;/h2&gt;

&lt;p&gt;Let’s say the frontend uses optimistic updates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;onRetryPayment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;setPaymentStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;PAID&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/api/payments/retry&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Retry failed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;setPaymentStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;FAILED&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first glance, this looks fine.&lt;/p&gt;

&lt;p&gt;But now imagine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the API succeeds,&lt;/li&gt;
&lt;li&gt;the DB is updated,&lt;/li&gt;
&lt;li&gt;an event is emitted,&lt;/li&gt;
&lt;li&gt;a consumer deduplicates the event incorrectly,&lt;/li&gt;
&lt;li&gt;Redis still contains the old value,&lt;/li&gt;
&lt;li&gt;the UI re-renders from stale cache.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The bug is no longer in this function.&lt;/p&gt;

&lt;p&gt;The bug is in the &lt;strong&gt;propagation path&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why abstractions make this worse
&lt;/h2&gt;

&lt;p&gt;Abstractions hide the exact mechanics that matter during incidents.&lt;/p&gt;

&lt;p&gt;They hide things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who owns the state,&lt;/li&gt;
&lt;li&gt;when the state changes,&lt;/li&gt;
&lt;li&gt;whether the update is synchronous or async,&lt;/li&gt;
&lt;li&gt;whether caches are invalidated,&lt;/li&gt;
&lt;li&gt;whether retries are safe,&lt;/li&gt;
&lt;li&gt;whether events can arrive out of order.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is useful in normal development.&lt;/p&gt;

&lt;p&gt;It is terrible during debugging.&lt;/p&gt;

&lt;p&gt;Because when something is wrong, you do not need another clean interface. You need visibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Typical failure patterns
&lt;/h2&gt;

&lt;p&gt;These are the patterns I see most often in real systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Stale read
&lt;/h3&gt;

&lt;p&gt;The data was updated, but one layer still serves an old version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// DB updated successfully&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;paymentId&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;PAID&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Cache not invalidated&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DB = &lt;code&gt;PAID&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;cache = &lt;code&gt;PENDING&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;UI = &lt;code&gt;PENDING&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
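&lt;p&gt;A minimal sketch of the fix, assuming a cache-aside setup. The &lt;code&gt;db&lt;/code&gt; and &lt;code&gt;cache&lt;/code&gt; objects and the &lt;code&gt;readStatus&lt;/code&gt; and &lt;code&gt;markPaid&lt;/code&gt; helpers are hypothetical in-memory stand-ins, not a real database or Redis client. The point is only that the write path and the invalidation live together:&lt;/p&gt;

```typescript
// Hypothetical in-memory stand-ins for the database and the cache.
type Store = { [key: string]: string };

const db: Store = { "payment:1": "PENDING" };
const cache: Store = {};

// Cache-aside read: serve the cached value if present, else load and cache it.
function readStatus(key: string): string {
  if (key in cache) return cache[key];
  const value = db[key];
  cache[key] = value;
  return value;
}

// Write path that invalidates the cache in the same code path as the write.
function markPaid(key: string): void {
  db[key] = "PAID";
  delete cache[key]; // without this line, readers keep seeing "PENDING"
}

readStatus("payment:1"); // warms the cache with "PENDING"
markPaid("payment:1");
console.log(readStatus("payment:1")); // prints "PAID"
```

&lt;p&gt;In a real system the write and the invalidation can still fail independently, so this narrows the window for stale reads rather than closing it.&lt;/p&gt;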

&lt;h3&gt;
  
  
  2. Lost update
&lt;/h3&gt;

&lt;p&gt;Two writes happen close together, and one silently overwrites the other.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Fired concurrently: whichever write lands last silently wins&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nf"&gt;updateProfile&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Alex&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="nf"&gt;updateProfile&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;John&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the system uses last-write-wins without proper locking or versioning, the final state may not match user intent.&lt;/p&gt;
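&lt;p&gt;One common form of versioning is optimistic concurrency control. The sketch below assumes a hypothetical &lt;code&gt;Profile&lt;/code&gt; record with a &lt;code&gt;version&lt;/code&gt; counter and an in-memory &lt;code&gt;stored&lt;/code&gt; variable standing in for the database row. A write is accepted only if the writer saw the latest version:&lt;/p&gt;

```typescript
// Hypothetical record with a version counter for optimistic concurrency control.
type Profile = { name: string; version: number };

let stored: Profile = { name: "Sam", version: 1 };

// Apply the write only if the caller saw the latest version.
function updateName(name: string, expectedVersion: number): boolean {
  if (stored.version !== expectedVersion) return false; // stale writer: reject
  stored = { name, version: stored.version + 1 };
  return true;
}

// Both writers read version 1 before either of them writes.
const seenByA = stored.version;
const seenByB = stored.version;

const aApplied = updateName("Alex", seenByA); // accepted
const bApplied = updateName("John", seenByB); // rejected: must re-read and retry
console.log(aApplied, bApplied, stored.name); // prints: true false Alex
```

&lt;p&gt;The second writer is told about the conflict instead of silently overwriting, which turns a lost update into an explicit retry decision.&lt;/p&gt;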

&lt;h3&gt;
  
  
  3. Ghost update
&lt;/h3&gt;

&lt;p&gt;One layer changes, but another never receives the update.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;dispatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;updateOrderStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;PAID&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="c1"&gt;// but query cache is never invalidated&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is a UI that looks stuck even though the backend is correct.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Event reorder bug
&lt;/h3&gt;

&lt;p&gt;Events arrive in a different order than they were produced.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Event B processed before Event A&lt;/span&gt;
&lt;span class="nf"&gt;processEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;payment_succeeded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nf"&gt;processEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;payment_pending&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the final state may be wrong even if both handlers are valid.&lt;/p&gt;
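&lt;p&gt;One way to guard against this, sketched under the assumption that the producer attaches a monotonically increasing &lt;code&gt;seq&lt;/code&gt; number to each event. The &lt;code&gt;PaymentEvent&lt;/code&gt; shape and &lt;code&gt;applyEvent&lt;/code&gt; helper are hypothetical:&lt;/p&gt;

```typescript
// Hypothetical event shape: `seq` is assumed to be assigned by the producer.
type PaymentEvent = { seq: number; status: string };

let current: PaymentEvent = { seq: 0, status: "NEW" };

// Ignore any event that is not newer than the state already applied.
function applyEvent(event: PaymentEvent): boolean {
  if (current.seq >= event.seq) return false; // late or duplicate event
  current = event;
  return true;
}

// "succeeded" (seq 2) arrives before "pending" (seq 1).
applyEvent({ seq: 2, status: "payment_succeeded" });
applyEvent({ seq: 1, status: "payment_pending" });
console.log(current.status); // prints "payment_succeeded"
```

&lt;p&gt;The late event is dropped rather than applied, so the final state no longer depends on delivery order.&lt;/p&gt;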

&lt;h2&gt;
  
  
  The debugging trap
&lt;/h2&gt;

&lt;p&gt;The trap is assuming this is a code bug.&lt;/p&gt;

&lt;p&gt;Very often it is not.&lt;/p&gt;

&lt;p&gt;It is a &lt;strong&gt;state ownership bug&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That means the real question is not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Which function crashed?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real question is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Which layer is the source of truth right now?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you cannot answer that clearly, debugging becomes guesswork.&lt;/p&gt;

&lt;h2&gt;
  
  
  A better way to think about it
&lt;/h2&gt;

&lt;p&gt;Instead of thinking in terms of “where is the bug?”, think in terms of “where does state live?”&lt;/p&gt;

&lt;p&gt;A useful checklist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Where is the canonical value stored?&lt;/li&gt;
&lt;li&gt;Which layer may cache it?&lt;/li&gt;
&lt;li&gt;Which layer may derive it?&lt;/li&gt;
&lt;li&gt;Which layer may overwrite it?&lt;/li&gt;
&lt;li&gt;Which layer may delay it?&lt;/li&gt;
&lt;li&gt;Which layer may retry it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the same value exists in five places, you now have five opportunities for disagreement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Debugging strategy
&lt;/h2&gt;

&lt;p&gt;When a bug crosses abstraction boundaries, I usually inspect it in this order:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Check the source of truth
&lt;/h3&gt;

&lt;p&gt;Confirm where the canonical data lives.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Rebuild the timeline
&lt;/h3&gt;

&lt;p&gt;Trace the state from user action to backend write to cache update to UI read.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Check invalidation
&lt;/h3&gt;

&lt;p&gt;If a cache exists, verify it is updated or cleared at the right moment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Check idempotency
&lt;/h3&gt;

&lt;p&gt;If retries or events are involved, verify the operation can safely happen more than once.&lt;/p&gt;
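&lt;p&gt;A small sketch of what “safe to happen more than once” can look like, assuming each request carries an idempotency key. The &lt;code&gt;processed&lt;/code&gt; map stands in for a durable store, and &lt;code&gt;charge&lt;/code&gt; and &lt;code&gt;balance&lt;/code&gt; are hypothetical names:&lt;/p&gt;

```typescript
// Hypothetical handler: `processed` stands in for a durable store of handled keys.
const processed: { [idempotencyKey: string]: boolean } = {};
let balance = 100;

// The charge is applied only the first time a given key is seen.
function charge(idempotencyKey: string, amount: number): number {
  if (processed[idempotencyKey]) return balance; // retry: no second charge
  processed[idempotencyKey] = true;
  balance -= amount;
  return balance;
}

charge("retry-42", 30);
charge("retry-42", 30); // same key delivered again by a retry
console.log(balance); // prints 70
```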

&lt;h3&gt;
  
  
  Step 5: Check ordering
&lt;/h3&gt;

&lt;p&gt;If events are async, verify the system does not depend on strict ordering unless it actually guarantees it.&lt;/p&gt;

&lt;h2&gt;
  
  
  When abstractions do help
&lt;/h2&gt;

&lt;p&gt;This is not an anti-abstraction argument.&lt;/p&gt;

&lt;p&gt;Good abstractions are still valuable when they:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reduce search space,&lt;/li&gt;
&lt;li&gt;make ownership clear,&lt;/li&gt;
&lt;li&gt;keep state local,&lt;/li&gt;
&lt;li&gt;expose transitions explicitly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, a small component with local state is easier to debug than three caches and two event consumers trying to keep the same value in sync.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setCount&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt; &lt;span class="na"&gt;onClick&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setCount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      Count: &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is easy to reason about because there is one owner of the state.&lt;/p&gt;

&lt;p&gt;That is the difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do in real systems
&lt;/h2&gt;

&lt;p&gt;If you want abstractions to stay helpful in production, make them observable.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add logs at boundaries,&lt;/li&gt;
&lt;li&gt;use trace IDs,&lt;/li&gt;
&lt;li&gt;keep ownership explicit,&lt;/li&gt;
&lt;li&gt;invalidate caches intentionally,&lt;/li&gt;
&lt;li&gt;design retries to be safe,&lt;/li&gt;
&lt;li&gt;avoid hidden duplicated state.&lt;/li&gt;
&lt;/ul&gt;
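&lt;p&gt;As a rough illustration of logging at boundaries with a trace ID, the sketch below records what each layer believes for one user action and then finds the first layer that disagrees. The &lt;code&gt;logBoundary&lt;/code&gt; helper, the layer names, and the hard-coded trace ID are all assumptions made for the example:&lt;/p&gt;

```typescript
// Hypothetical boundary log: one entry per layer, all tied to the same trace ID.
type LogEntry = { traceId: string; layer: string; status: string };
const timeline: LogEntry[] = [];

function logBoundary(traceId: string, layer: string, status: string): void {
  timeline.push({ traceId, layer, status });
}

const traceId = "trace-001"; // normally generated once per user action
logBoundary(traceId, "api", "PAID");
logBoundary(traceId, "db", "PAID");
logBoundary(traceId, "cache", "PENDING");
logBoundary(traceId, "ui", "PENDING");

// The first layer that disagrees with the source of truth is where to look.
const firstStale = timeline.find((entry) => entry.status !== "PAID");
console.log(firstStale ? firstStale.layer : "all layers agree"); // prints "cache"
```

&lt;p&gt;With entries like these, the timeline from user action to UI read can be rebuilt from logs instead of from guesses.&lt;/p&gt;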

&lt;p&gt;A good abstraction should reduce complexity, not hide the mechanics that make incidents debuggable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;The best abstractions are honest.&lt;/p&gt;

&lt;p&gt;They do not pretend the system is simpler than it is. They make the system easier to understand &lt;strong&gt;without hiding where truth lives&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That is why debugging gets harder as systems grow: not because abstraction is bad, but because abstraction is often too successful at hiding the exact thing you need under pressure.&lt;/p&gt;

</description>
      <category>distributedsystems</category>
      <category>systemdesign</category>
      <category>frontend</category>
      <category>software</category>
    </item>
    <item>
      <title>AI-generated code doesn't fail loudly. It fails correctly-looking.</title>
      <dc:creator>Damir Karimov</dc:creator>
      <pubDate>Wed, 13 May 2026 12:39:58 +0000</pubDate>
      <link>https://dev.to/damir-karimov/ai-generated-code-doesnt-fail-loudly-it-fails-correctly-looking-1acc</link>
      <guid>https://dev.to/damir-karimov/ai-generated-code-doesnt-fail-loudly-it-fails-correctly-looking-1acc</guid>
      <description>&lt;p&gt;AI-generated code rarely breaks in obvious ways. It passes review, ships to production, and behaves correctly in controlled scenarios. The problem is what happens afterward: failures appear only under specific timing, load, retries, or inconsistent state transitions.&lt;/p&gt;

&lt;p&gt;The core issue is not obvious bugs. It is code that looks structurally correct while silently ignoring real-world failure modes.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why AI code feels correct
&lt;/h2&gt;

&lt;p&gt;AI tends to generate implementations with strong surface-level signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  consistent TypeScript types&lt;/li&gt;
&lt;li&gt;  standard architectural patterns&lt;/li&gt;
&lt;li&gt;  clean async/await flows&lt;/li&gt;
&lt;li&gt;  readable naming conventions&lt;/li&gt;
&lt;li&gt;  familiar framework usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This produces a strong cognitive bias during review. The code does not look "risky", so it is assumed to be correct.&lt;/p&gt;

&lt;p&gt;The gap appears because readability is not equivalent to correctness under production conditions.&lt;/p&gt;


&lt;h2&gt;
  
  
  Where AI-generated code typically fails
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Concurrency and race conditions
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;updateProfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Profile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;setLoading&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;updateProfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nf"&gt;setUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;setLoading&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This assumes a single linear execution.&lt;/p&gt;

&lt;p&gt;In real systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  multiple requests can run in parallel&lt;/li&gt;
&lt;li&gt;  responses can resolve out of order&lt;/li&gt;
&lt;li&gt;  a slow response to an earlier request can overwrite newer state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result: stale state overwrite without errors or crashes.&lt;/p&gt;
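&lt;p&gt;One possible guard is to tag each call with a request ID and apply only the response of the most recent request. This is a sketch, not the only fix: &lt;code&gt;fakeApi&lt;/code&gt; and &lt;code&gt;sleep&lt;/code&gt; are hypothetical helpers that simulate out-of-order responses, and plain variables stand in for component state:&lt;/p&gt;

```typescript
// Hypothetical sketch: `fakeApi` simulates responses that resolve out of order.
let latestRequestId = 0;
let user = "initial";

function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function fakeApi(name: string, delayMs: number) {
  await sleep(delayMs);
  return name;
}

async function updateProfile(name: string, delayMs: number) {
  const requestId = ++latestRequestId;
  const response = await fakeApi(name, delayMs);
  if (requestId !== latestRequestId) return; // a newer request exists: drop this one
  user = response;
}

async function demo() {
  // The first request is slow, the second is fast, so they resolve out of order.
  await Promise.all([updateProfile("Alex", 50), updateProfile("John", 10)]);
  return user;
}

demo().then((finalUser) => console.log(finalUser)); // prints "John"
```

&lt;p&gt;The slow response for "Alex" still arrives, but it is ignored because a newer request was issued after it.&lt;/p&gt;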


&lt;h3&gt;
  
  
  2. Optimistic updates without consistency guarantees
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;setTodos&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newTodo&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createTodo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newTodo&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This assumes success.&lt;/p&gt;

&lt;p&gt;Failure scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  request fails but UI is not rolled back&lt;/li&gt;
&lt;li&gt;  retry creates duplicate entries&lt;/li&gt;
&lt;li&gt;  frontend state diverges from backend state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system remains "visually correct" while data integrity is broken.&lt;/p&gt;
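&lt;p&gt;A hedged sketch of the missing rollback path. Here &lt;code&gt;createTodo&lt;/code&gt; is a hypothetical API stand-in that rejects empty titles, and a module-level &lt;code&gt;todos&lt;/code&gt; array stands in for component state:&lt;/p&gt;

```typescript
type Todo = { id: string; title: string };
let todos: Todo[] = [];

// Hypothetical API stand-in: rejects empty titles to exercise the failure path.
async function createTodo(todo: Todo) {
  if (todo.title === "") throw new Error("validation failed");
  return todo;
}

async function addTodo(newTodo: Todo) {
  todos = [...todos, newTodo]; // optimistic update
  try {
    await createTodo(newTodo);
    return true;
  } catch {
    // Roll back only this item, so concurrent additions are preserved.
    todos = todos.filter((todo) => todo.id !== newTodo.id);
    return false;
  }
}

async function demo() {
  await addTodo({ id: "1", title: "write tests" });
  await addTodo({ id: "2", title: "" }); // fails and is rolled back
  return todos.map((todo) => todo.id);
}

demo().then((ids) => console.log(ids));
```

&lt;p&gt;Rolling back by ID rather than restoring a snapshot avoids discarding other items that were added while the request was in flight. It does not address duplicate entries on retry, which would need a separate deduplication key.&lt;/p&gt;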


&lt;h3&gt;
  
  
  3. Stale closures and lifecycle assumptions
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;interval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;setInterval&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;clearInterval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


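&lt;p&gt;With an empty dependency array, the callback captures the value of &lt;code&gt;count&lt;/code&gt; from the first render and never sees later updates.&lt;/p&gt;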

&lt;p&gt;In production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  values become stale over time&lt;/li&gt;
&lt;li&gt;  UI desynchronization occurs&lt;/li&gt;
&lt;li&gt;  behavior depends on render timing rather than logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No runtime error occurs, so the issue is often missed.&lt;/p&gt;
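&lt;p&gt;The mechanism can be shown without React. The following is a plain-TypeScript analogy, not the effect itself: &lt;code&gt;makeStaleReader&lt;/code&gt; and &lt;code&gt;makeLiveReader&lt;/code&gt; are hypothetical helpers, and &lt;code&gt;ref&lt;/code&gt; mimics the kind of mutable box a ref provides:&lt;/p&gt;

```typescript
// Plain-TypeScript analogy of the effect above, with no React involved.
// `ref` mimics a ref object: a stable box with a mutable `current` field.
let count = 0;
const ref = { current: count };

function makeStaleReader(captured: number) {
  return () => captured; // the value is fixed when the closure is created
}

function makeLiveReader(box: { current: number }) {
  return () => box.current; // the value is looked up on every call
}

const readStale = makeStaleReader(count);
const readLive = makeLiveReader(ref);

// Simulate later renders updating the state and the ref.
count = 5;
ref.current = count;

console.log(readStale(), readLive()); // prints: 0 5
```

&lt;p&gt;In a component, the equivalent choices are to read the latest value through a ref or to list the value in the dependency array so the interval is recreated when it changes.&lt;/p&gt;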


&lt;h3&gt;
  
  
  4. Weak caching and invalidation logic
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cacheKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`user-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;cacheKey&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;cacheKey&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;cacheKey&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This assumes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  stable data shape&lt;/li&gt;
&lt;li&gt;  stable identity rules&lt;/li&gt;
&lt;li&gt;  single write path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In real systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  partial updates invalidate assumptions&lt;/li&gt;
&lt;li&gt;  multiple services mutate the same entity&lt;/li&gt;
&lt;li&gt;  cache becomes silently stale rather than obviously wrong&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  5. Hidden assumptions about APIs
&lt;/h3&gt;

&lt;p&gt;AI can introduce plausible but non-existent APIs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;refreshSession&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;force&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invalidateAllQueries&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These patterns often:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  look consistent with ecosystem conventions&lt;/li&gt;
&lt;li&gt;  pass code review without deep verification&lt;/li&gt;
&lt;li&gt;  fail only at runtime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This shifts errors from compile-time to production-time.&lt;/p&gt;




&lt;h3&gt;
  
  
  6. Accumulated lifecycle leaks
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;controller&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AbortController&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="nf"&gt;fetchData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each instance is correct on its own, but when the pattern is repeated across a codebase:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  inconsistent cleanup patterns accumulate&lt;/li&gt;
&lt;li&gt;  aborted requests still resolve in edge cases&lt;/li&gt;
&lt;li&gt;  memory usage grows gradually&lt;/li&gt;
&lt;li&gt;  behavior becomes harder to reproduce&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Systemic issue: reduced verification depth
&lt;/h2&gt;

&lt;p&gt;The main shift introduced by AI-generated code is not implementation speed, but review behavior.&lt;/p&gt;

&lt;p&gt;Before AI, writing code required reasoning during implementation. After AI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  code already looks complete&lt;/li&gt;
&lt;li&gt;  structure appears correct by default&lt;/li&gt;
&lt;li&gt;  reviewers focus on surface validation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a subtle degradation in engineering discipline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  fewer edge-case simulations&lt;/li&gt;
&lt;li&gt;  less reasoning about concurrency&lt;/li&gt;
&lt;li&gt;  weaker validation of failure states&lt;/li&gt;
&lt;li&gt;  acceptance of "looks correct" as correctness&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Impact on real systems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Frontend state drift
&lt;/h3&gt;

&lt;p&gt;UI remains stable visually while backend state diverges.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Authentication and session issues
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  race conditions during token refresh&lt;/li&gt;
&lt;li&gt;  inconsistent logout handling&lt;/li&gt;
&lt;li&gt;  background requests using invalid sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Payments and idempotency problems
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  duplicate transactions&lt;/li&gt;
&lt;li&gt;  retries without deduplication&lt;/li&gt;
&lt;li&gt;  partial failure inconsistencies&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Distributed system inconsistencies
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  assumption of ordering guarantees&lt;/li&gt;
&lt;li&gt;  reliance on immediate consistency&lt;/li&gt;
&lt;li&gt;  incorrect retry semantics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These issues are not immediately visible. They surface as rare, non-reproducible incidents.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real risk
&lt;/h2&gt;

&lt;p&gt;AI does not generate obviously wrong code.&lt;/p&gt;

&lt;p&gt;It generates code that satisfies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  type safety&lt;/li&gt;
&lt;li&gt;  structural conventions&lt;/li&gt;
&lt;li&gt;  expected patterns&lt;/li&gt;
&lt;li&gt;  readable abstractions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates false confidence during review.&lt;/p&gt;

&lt;p&gt;The critical failure is not bugs themselves, but reduced skepticism toward code that appears correct.&lt;/p&gt;

&lt;p&gt;Once that happens, correctness is no longer actively verified. It is assumed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;AI increases development speed, but it also changes how correctness is perceived.&lt;/p&gt;

&lt;p&gt;The danger is code that looks correct enough that nobody questions it deeply.&lt;/p&gt;

&lt;p&gt;When that happens, production issues stop being introduced by obvious mistakes and start emerging from unexamined assumptions embedded in clean-looking code.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwareengineering</category>
      <category>codequality</category>
      <category>frontend</category>
    </item>
    <item>
      <title>LLM-Driven Client-Side Caching: A Hybrid Decision Architecture</title>
      <dc:creator>Damir Karimov</dc:creator>
      <pubDate>Mon, 04 May 2026 15:22:22 +0000</pubDate>
      <link>https://dev.to/damir-karimov/llm-driven-client-side-caching-a-hybrid-decision-architecture-322m</link>
      <guid>https://dev.to/damir-karimov/llm-driven-client-side-caching-a-hybrid-decision-architecture-322m</guid>
      <description>&lt;p&gt;Client-side caching is usually implemented as a storage optimization layer (TTL, SWR, invalidation rules). In practice it behaves like a decision system under uncertainty.&lt;/p&gt;

&lt;p&gt;Static strategies fail when data volatility is non-uniform across the same application. This leads to either stale UI or excessive network traffic.&lt;/p&gt;

&lt;p&gt;This article breaks down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why standard caching approaches plateau&lt;/li&gt;
&lt;li&gt;where ML improves the system&lt;/li&gt;
&lt;li&gt;where LLMs actually fit&lt;/li&gt;
&lt;li&gt;how to design a production-grade decision pipeline&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Problem: caching is not a storage problem
&lt;/h2&gt;

&lt;p&gt;Different data types behave differently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user profiles → low volatility&lt;/li&gt;
&lt;li&gt;feeds / notifications → high volatility&lt;/li&gt;
&lt;li&gt;search results → context-dependent volatility&lt;/li&gt;
&lt;li&gt;partially hydrated UI → unknown volatility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core issue:&lt;/p&gt;

&lt;p&gt;Caching requires a policy decision per request, not a static rule.&lt;/p&gt;

&lt;p&gt;So the real problem is:&lt;/p&gt;

&lt;p&gt;data → context → decision (cache / revalidate / bypass)&lt;/p&gt;

&lt;h2&gt;
  
  
  Baseline systems (what already exists)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. SWR / TTL-based caching
&lt;/h3&gt;

&lt;p&gt;Used in React Query / SWR:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stale-while-revalidate&lt;/li&gt;
&lt;li&gt;background refetch&lt;/li&gt;
&lt;li&gt;TTL invalidation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Works when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;update cycles are predictable&lt;/li&gt;
&lt;li&gt;data freshness is stable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fails when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;volatility varies inside the same dataset&lt;/li&gt;
&lt;li&gt;freshness depends on UI state&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Heuristic scoring systems
&lt;/h3&gt;

&lt;p&gt;Example of an adaptive TTL:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
&lt;span class="nx"&gt;volatilityScore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;EWMA&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;changeFrequency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;priorityScore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;userInteractionWeight&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;dataImportance&lt;/span&gt;
&lt;span class="nx"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;baseTTL&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;volatilityScore&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Improves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;adaptive cache lifetime&lt;/li&gt;
&lt;li&gt;frequency-aware invalidation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;requires manual feature design&lt;/li&gt;
&lt;li&gt;domain-specific tuning&lt;/li&gt;
&lt;li&gt;breaks under missing signals&lt;/li&gt;
&lt;/ul&gt;
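
&lt;p&gt;The adaptive TTL formula in this section can be made concrete as follows. This is a sketch, not a tuned implementation: the smoothing factor passed to &lt;code&gt;ewma&lt;/code&gt; and the &lt;code&gt;MIN_VOLATILITY&lt;/code&gt; floor are assumptions added here, the latter so that a score near zero cannot produce an unbounded TTL. The source formula does not feed &lt;code&gt;priorityScore&lt;/code&gt; into the TTL, so it is left out.&lt;/p&gt;

```typescript
// Exponentially weighted moving average: recent observations weigh more.
// alpha is the smoothing factor, between 0 and 1.
function ewma(previous: number, observation: number, alpha: number): number {
  return alpha * observation + (1 - alpha) * previous;
}

// Assumed lower bound, for illustration only.
const MIN_VOLATILITY = 0.05;

// ttl = baseTTL / volatilityScore, with the score clamped from below
// so that a score of zero cannot cause a division by zero.
function adaptiveTtlMs(baseTtlMs: number, volatilityScore: number): number {
  const safeScore = Math.max(volatilityScore, MIN_VOLATILITY);
  return Math.round(baseTtlMs / safeScore);
}

// Update the running score after observing that the data changed (1) or not (0).
let volatilityScore = 0.5;
volatilityScore = ewma(volatilityScore, 1, 0.2);
const ttlMs = adaptiveTtlMs(1000, volatilityScore);
```

&lt;p&gt;Higher volatility shortens the TTL, and the clamp caps how long a seemingly static key can stay cached.&lt;/p&gt;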

&lt;h3&gt;
  
  
  3. Lightweight ML models
&lt;/h3&gt;

&lt;p&gt;Typical approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;logistic regression&lt;/li&gt;
&lt;li&gt;XGBoost / LightGBM&lt;/li&gt;
&lt;li&gt;embedding classifiers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fast inference&lt;/li&gt;
&lt;li&gt;stable behavior&lt;/li&gt;
&lt;li&gt;cheaper than LLMs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;needs labeled “optimal cache decision” data (rare)&lt;/li&gt;
&lt;li&gt;retraining pipeline required&lt;/li&gt;
&lt;li&gt;brittle under product changes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why all baseline approaches plateau
&lt;/h2&gt;

&lt;p&gt;All classical systems assume:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;feature space is complete&lt;/li&gt;
&lt;li&gt;behavior is stationary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In real systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user behavior is contextual&lt;/li&gt;
&lt;li&gt;volatility depends on UI state&lt;/li&gt;
&lt;li&gt;freshness is semantic, not numeric&lt;/li&gt;
&lt;li&gt;signals are incomplete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;heuristics → saturate&lt;/li&gt;
&lt;li&gt;ML-light → overfit or drift&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key idea: caching is a decision system under uncertainty
&lt;/h2&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;p&gt;“how long do we cache this?”&lt;/p&gt;

&lt;p&gt;The correct formulation is:&lt;/p&gt;

&lt;p&gt;“what action should we take given incomplete information?”&lt;/p&gt;

&lt;p&gt;The available actions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HIT&lt;/li&gt;
&lt;li&gt;REVALIDATE&lt;/li&gt;
&lt;li&gt;BYPASS&lt;/li&gt;
&lt;li&gt;SWR&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where LLMs fit (and where they don’t)
&lt;/h2&gt;

&lt;p&gt;LLMs are not a replacement layer.&lt;/p&gt;

&lt;p&gt;They function as:&lt;/p&gt;

&lt;p&gt;a fallback policy engine for the ambiguous part of the decision space&lt;/p&gt;

&lt;p&gt;They are useful only when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scoring model confidence is low&lt;/li&gt;
&lt;li&gt;signals conflict&lt;/li&gt;
&lt;li&gt;unseen patterns appear&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture: layered decision system
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UI Layer
   ↓
Context Builder
   ↓
Policy Engine
   ├── Rule Layer (deterministic)
   ├── ML Scoring Layer (probabilistic)
   └── LLM Fallback Layer (uncertainty)
   ↓
Cache Layer
   ↓
Network
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Context model (input abstraction)
&lt;/h2&gt;

&lt;p&gt;All decisions must be based on structured signals:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user_feed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lastUpdatedMs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"accessFrequency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"volatilityScore"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.82&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"userAction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"scroll"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"stalenessToleranceMs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Important constraint:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no raw prompts&lt;/li&gt;
&lt;li&gt;only structured features&lt;/li&gt;
&lt;/ul&gt;
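
&lt;p&gt;Expressed as a type, the context object above might look like the sketch below. The field names come from the JSON example; the allowed values of &lt;code&gt;accessFrequency&lt;/code&gt;, the reading of &lt;code&gt;lastUpdatedMs&lt;/code&gt; as an age in milliseconds, and the &lt;code&gt;exceedsTolerance&lt;/code&gt; helper are assumptions made for illustration.&lt;/p&gt;

```typescript
// Field names follow the JSON example; value ranges are assumptions.
interface CacheContext {
  key: string;
  lastUpdatedMs: number; // assumed: milliseconds since the entry was last updated
  accessFrequency: "low" | "medium" | "high"; // assumed value set
  volatilityScore: number; // assumed range 0..1
  userAction: string;
  stalenessToleranceMs: number;
}

// True when the cached entry is older than the caller is willing to accept.
function exceedsTolerance(ctx: CacheContext): boolean {
  return ctx.lastUpdatedMs > ctx.stalenessToleranceMs;
}

const feedContext: CacheContext = {
  key: "user_feed",
  lastUpdatedMs: 1200,
  accessFrequency: "high",
  volatilityScore: 0.82,
  userAction: "scroll",
  stalenessToleranceMs: 500,
};

const feedIsTooStale = exceedsTolerance(feedContext);
```

&lt;p&gt;Keeping the input a closed, typed structure is what lets the same context feed the rule layer, the scoring model and the LLM fallback.&lt;/p&gt;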

&lt;h2&gt;
  
  
  LLM role (strictly bounded)
&lt;/h2&gt;

&lt;p&gt;The LLM acts only as a classifier that returns a structured decision:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"strategy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HIT | REVALIDATE | BYPASS | SWR"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ttlMs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.78&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Triggered only when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ML confidence &amp;lt; threshold&lt;/li&gt;
&lt;li&gt;feature signals conflict&lt;/li&gt;
&lt;li&gt;unseen context patterns&lt;/li&gt;
&lt;/ul&gt;
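
&lt;p&gt;Because the LLM output is free text until proven otherwise, the caller should validate it before acting on it. A minimal sketch, assuming the response arrives as a JSON string in the shape shown above; &lt;code&gt;parseDecision&lt;/code&gt; and the conservative &lt;code&gt;FALLBACK&lt;/code&gt; decision are hypothetical names, not part of any library.&lt;/p&gt;

```typescript
type Strategy = "HIT" | "REVALIDATE" | "BYPASS" | "SWR";

interface Decision {
  strategy: Strategy;
  ttlMs: number;
  confidence: number;
}

const STRATEGIES: readonly string[] = ["HIT", "REVALIDATE", "BYPASS", "SWR"];

// Assumed safe default: when the output is unusable, go back to the network.
const FALLBACK: Decision = { strategy: "REVALIDATE", ttlMs: 0, confidence: 0 };

function parseDecision(raw: string): Decision {
  let value: any;
  try {
    value = JSON.parse(raw);
  } catch {
    return FALLBACK; // Not JSON at all.
  }
  if (typeof value !== "object" || value === null) return FALLBACK;
  // Reject any strategy outside the closed set of actions.
  if (STRATEGIES.indexOf(value.strategy) === -1) return FALLBACK;
  if (typeof value.ttlMs !== "number" || !(value.ttlMs >= 0)) return FALLBACK;
  if (typeof value.confidence !== "number") return FALLBACK;
  if (value.confidence > 1 || !(value.confidence >= 0)) return FALLBACK;
  return {
    strategy: value.strategy,
    ttlMs: value.ttlMs,
    confidence: value.confidence,
  };
}
```

&lt;p&gt;Anything that fails validation degrades to a revalidation rather than to an arbitrary cache action, which keeps the LLM strictly bounded.&lt;/p&gt;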

&lt;h2&gt;
  
  
  Meta-cache: caching the decision layer
&lt;/h2&gt;

&lt;p&gt;To reduce cost:&lt;/p&gt;

&lt;p&gt;decisionCache(contextHash) → strategy&lt;/p&gt;

&lt;p&gt;Effects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;avoids repeated LLM calls&lt;/li&gt;
&lt;li&gt;stabilizes latency&lt;/li&gt;
&lt;li&gt;amortizes inference cost&lt;/li&gt;
&lt;/ul&gt;
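
&lt;p&gt;A minimal sketch of the decision cache, under a few assumptions: &lt;code&gt;contextHash&lt;/code&gt; builds a joined string key rather than a real hash, the volatility score is bucketed to one decimal so near-identical contexts share an entry, and entries never expire. The &lt;code&gt;classify&lt;/code&gt; callback stands in for whichever expensive layer produced the decision.&lt;/p&gt;

```typescript
type Strategy = "HIT" | "REVALIDATE" | "BYPASS" | "SWR";

interface DecisionContext {
  key: string;
  accessFrequency: string;
  volatilityScore: number;
  userAction: string;
}

// Bucket the continuous score so near-identical contexts share one entry.
// A joined string stands in for a real hash function here.
function contextHash(ctx: DecisionContext): string {
  const volatilityBucket = Math.round(ctx.volatilityScore * 10);
  return [ctx.key, ctx.accessFrequency, volatilityBucket, ctx.userAction].join("|");
}

// Untyped Map (any) keeps the sketch short; values are Strategy strings.
const decisionCache = new Map();

function decideWithCache(
  ctx: DecisionContext,
  classify: (c: DecisionContext) => Strategy
): Strategy {
  const hash = contextHash(ctx);
  const cached = decisionCache.get(hash);
  if (cached !== undefined) {
    return cached; // Reuse the earlier decision without calling the classifier.
  }
  const strategy = classify(ctx);
  decisionCache.set(hash, strategy);
  return strategy;
}
```

&lt;p&gt;The bucket width is the main tuning knob: coarser buckets raise the hit rate but reuse a decision across contexts that differ more.&lt;/p&gt;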

&lt;h2&gt;
  
  
  Cost-aware execution pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IF rule matches:
    use rule engine
ELSE IF ML confidence &amp;gt; threshold:
    use ML model
ELSE:
    use LLM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Typical production distribution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;80–90% rules&lt;/li&gt;
&lt;li&gt;10–20% ML&lt;/li&gt;
&lt;li&gt;&amp;lt;10% LLM&lt;/li&gt;
&lt;/ul&gt;
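
&lt;p&gt;The routing order from the pseudocode above can be sketched as a single function. Everything inside the three layers is a placeholder: the rule on &lt;code&gt;user_profile&lt;/code&gt;, the scoring formula, the &lt;code&gt;ML_CONFIDENCE_THRESHOLD&lt;/code&gt; value and the stubbed &lt;code&gt;llmLayer&lt;/code&gt; are assumptions chosen only to make the control flow runnable.&lt;/p&gt;

```typescript
type Strategy = "HIT" | "REVALIDATE" | "BYPASS" | "SWR";

interface RoutingContext {
  key: string;
  volatilityScore: number;
}

interface RoutedDecision {
  strategy: Strategy;
  source: "rule" | "ml" | "llm";
}

// Assumed threshold, for illustration only.
const ML_CONFIDENCE_THRESHOLD = 0.7;

// Deterministic layer: returns undefined when no rule matches.
function ruleLayer(ctx: RoutingContext): Strategy | undefined {
  if (ctx.key === "user_profile") return "HIT";
  return undefined;
}

// Placeholder scoring: confident only when the score is far from 0.5.
function mlLayer(ctx: RoutingContext) {
  const confidence = Math.abs(ctx.volatilityScore - 0.5) * 2;
  const strategy: Strategy = ctx.volatilityScore > 0.5 ? "REVALIDATE" : "SWR";
  return { strategy, confidence };
}

// Stub for the expensive fallback; a real system would call a model here.
function llmLayer(_ctx: RoutingContext): Strategy {
  return "BYPASS";
}

function route(ctx: RoutingContext): RoutedDecision {
  const ruled = ruleLayer(ctx);
  if (ruled !== undefined) {
    return { strategy: ruled, source: "rule" };
  }
  const scored = mlLayer(ctx);
  if (scored.confidence > ML_CONFIDENCE_THRESHOLD) {
    return { strategy: scored.strategy, source: "ml" };
  }
  // Only the ambiguous remainder reaches the LLM.
  return { strategy: llmLayer(ctx), source: "llm" };
}
```

&lt;p&gt;Recording the &lt;code&gt;source&lt;/code&gt; of each decision makes it possible to measure the actual split between layers in production.&lt;/p&gt;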

&lt;h2&gt;
  
  
  Failure modes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Overuse of LLM
&lt;/h3&gt;

&lt;p&gt;Problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cost spikes&lt;/li&gt;
&lt;li&gt;unpredictable latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mitigation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;strict confidence gating&lt;/li&gt;
&lt;li&gt;bounded invocation layer&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Latency variance
&lt;/h3&gt;

&lt;p&gt;Problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inconsistent response time in UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mitigation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;decision caching&lt;/li&gt;
&lt;li&gt;async precomputation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Model drift
&lt;/h3&gt;

&lt;p&gt;Problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ML decisions degrade over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mitigation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;feedback loop&lt;/li&gt;
&lt;li&gt;periodic recalibration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Engineering takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;caching is a decision system, not storage optimization&lt;/li&gt;
&lt;li&gt;SWR + heuristics solve the majority of cases&lt;/li&gt;
&lt;li&gt;lightweight ML works best in stable feature spaces&lt;/li&gt;
&lt;li&gt;LLMs are only for ambiguous cases&lt;/li&gt;
&lt;li&gt;production systems require strict routing hierarchy&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Client-side caching becomes effective only when modeled as a layered decision system.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rules handle deterministic cases&lt;/li&gt;
&lt;li&gt;ML handles structured uncertainty&lt;/li&gt;
&lt;li&gt;LLM handles ambiguity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The correct design is hybrid rather than LLM-centric, with strict boundaries and cost control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;

&lt;p&gt;Where should the boundary be defined between ML confidence and LLM fallback in production caching systems?&lt;/p&gt;

</description>
      <category>frontend</category>
      <category>systemdesign</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
