<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Muhammad Masad Ashraf</title>
    <description>The latest articles on DEV Community by Muhammad Masad Ashraf (@masadashraf).</description>
    <link>https://dev.to/masadashraf</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1849252%2F2176456a-d7d2-451a-a400-020a2702872f.jpg</url>
      <title>DEV Community: Muhammad Masad Ashraf</title>
      <link>https://dev.to/masadashraf</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/masadashraf"/>
    <language>en</language>
    <item>
      <title>How to Build a Shopify Webhook Replay System That Never Loses Events</title>
      <dc:creator>Muhammad Masad Ashraf</dc:creator>
      <pubDate>Fri, 22 May 2026 15:41:32 +0000</pubDate>
      <link>https://dev.to/masadashraf/how-to-build-a-shopify-webhook-replay-system-that-never-loses-events-4jn6</link>
      <guid>https://dev.to/masadashraf/how-to-build-a-shopify-webhook-replay-system-that-never-loses-events-4jn6</guid>
      <description>&lt;p&gt;Your Shopify integration fires perfectly in staging. Then production hits. Your server goes down for two minutes during a deploy. Shopify sends 40 webhook events. Your handler catches none of them.&lt;/p&gt;

&lt;p&gt;That is real data loss. Orders not fulfilled. Inventory not updated. Customers not synced.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;webhook replay system&lt;/strong&gt; is the fix. This guide walks you through building one that stores every incoming event, retries failures intelligently, and lets you replay any batch of events on demand.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Webhooks Fail in the First Place
&lt;/h2&gt;

&lt;p&gt;Shopify retries failed webhooks up to 19 times over 48 hours. After that, you are on your own. Here is what causes failures:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure Cause&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Server downtime&lt;/td&gt;
&lt;td&gt;Endpoint offline when Shopify fires&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Timeout&lt;/td&gt;
&lt;td&gt;Handler takes longer than 5 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code errors&lt;/td&gt;
&lt;td&gt;Bug in your handler throws an exception&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rate limiting&lt;/td&gt;
&lt;td&gt;System overloaded, rejects the request&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DB unavailability&lt;/td&gt;
&lt;td&gt;Database is down when handler tries to write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment gaps&lt;/td&gt;
&lt;td&gt;Deploy restarts server at the wrong moment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After 48 hours and 19 retries, Shopify flags your endpoint as failing. Eventually it stops sending events entirely.&lt;/p&gt;

&lt;p&gt;A replay system gets you out of that hole.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Architecture
&lt;/h2&gt;

&lt;p&gt;A production webhook replay system has four stages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Shopify
   |
   v
[Ingestion Endpoint]   &amp;lt;-- ACK immediately, store raw payload
   |
   v
[Event Store (DB)]     &amp;lt;-- durable record of every event
   |
   v
[Processing Queue]     &amp;lt;-- background worker picks up events
   |
   v
[Handler]              &amp;lt;-- idempotent processing logic
   |
   +-- Success --&amp;gt; mark 'succeeded'
   +-- Failure --&amp;gt; retry with backoff --&amp;gt; Dead Letter Queue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The key rule:&lt;/strong&gt; never process inside the ingestion endpoint. Shopify waits only 5 seconds. Acknowledge fast, process separately.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 1: The Ingestion Endpoint
&lt;/h2&gt;

&lt;p&gt;Your ingestion endpoint does exactly two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Verify the HMAC signature&lt;/li&gt;
&lt;li&gt;Write the raw payload to the event store and return &lt;code&gt;200 OK&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhooks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hmac&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-shopify-hmac-sha256&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;webhookId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-shopify-webhook-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-shopify-topic&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

  &lt;span class="c1"&gt;// Verify HMAC first&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;verifyHmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unauthorized&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Store immediately, respond fast&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    INSERT INTO webhook_events (shopify_id, topic, payload, hmac, status)
    VALUES ($1, $2, $3, $4, 'pending')
    ON CONFLICT (shopify_id) DO NOTHING
  `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nx"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;ON CONFLICT DO NOTHING&lt;/code&gt; handles Shopify's own retries. Same event, same ID, no duplicate row.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 2: The Event Store Schema
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;webhook_events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;              &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;shopify_id&lt;/span&gt;      &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;topic&lt;/span&gt;           &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;payload&lt;/span&gt;         &lt;span class="n"&gt;JSONB&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;hmac&lt;/span&gt;            &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt;          &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;retry_count&lt;/span&gt;     &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;last_error&lt;/span&gt;      &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt;      &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;processed_at&lt;/span&gt;    &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_webhook_status&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;webhook_events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_webhook_topic&lt;/span&gt;  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;webhook_events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_webhook_created&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;webhook_events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Index on &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;topic&lt;/code&gt;, and &lt;code&gt;created_at&lt;/code&gt;. These are the three axes you will query for replay.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Storage options by scale:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PostgreSQL&lt;/td&gt;
&lt;td&gt;Most Shopify apps. Full SQL, JSONB indexing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Redis Streams&lt;/td&gt;
&lt;td&gt;High-throughput ingestion, consumer groups.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS SQS + DynamoDB&lt;/td&gt;
&lt;td&gt;Managed infra, no ops overhead.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kafka&lt;/td&gt;
&lt;td&gt;Enterprise-scale event sourcing.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Stage 3: The Processing Worker
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processWebhookEvents&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;events&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    SELECT * FROM webhook_events
    WHERE status = 'pending'
    ORDER BY created_at ASC
    LIMIT 50
    FOR UPDATE SKIP LOCKED
  `&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handleEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
        UPDATE webhook_events
        SET status = 'succeeded', processed_at = NOW()
        WHERE id = $1
      `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newRetryCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;retry_count&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newStatus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;newRetryCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dead&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
        UPDATE webhook_events
        SET status = $1, retry_count = $2, last_error = $3
        WHERE id = $4
      `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;newStatus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newRetryCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;FOR UPDATE SKIP LOCKED&lt;/code&gt; prevents multiple workers from grabbing the same event. Safe for concurrent processing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 4: Exponential Backoff
&lt;/h2&gt;

&lt;p&gt;Do not retry immediately. Back off with jitter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getRetryDelay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retryCount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;baseDelays&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;21600&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="c1"&gt;// seconds&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;baseDelays&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;retryCount&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;21600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;base&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Retry&lt;/th&gt;
&lt;th&gt;Delay&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;30 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;10 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1 hour&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5+&lt;/td&gt;
&lt;td&gt;6 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Jitter prevents retry storms when many events fail simultaneously.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 5: The Replay Trigger
&lt;/h2&gt;

&lt;p&gt;This is the part that earns the "replay" in "replay system." When you fix a bug and need to reprocess failed events:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;replayEvents&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;batchSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;events&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    SELECT * FROM webhook_events
    WHERE status = $1
      AND ($2::text IS NULL OR topic = $2)
      AND ($3::timestamptz IS NULL OR created_at &amp;gt;= $3)
      AND ($4::timestamptz IS NULL OR created_at &amp;lt;= $4)
    ORDER BY created_at ASC
    LIMIT $5
  `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;batchSize&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Reset to pending so the worker picks it up&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
      UPDATE webhook_events
      SET status = 'pending', retry_count = 0, last_error = NULL
      WHERE id = $1
    `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rowCount&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Usage: replay all failed orders/paid from the last 24 hours&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;replayEvents&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;orders/paid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;86400000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Idempotency: The Non-Negotiable Requirement
&lt;/h2&gt;

&lt;p&gt;When you replay, your handler runs twice for the same payload. Without idempotency, you get duplicate orders, double charges, corrupted inventory.&lt;/p&gt;

&lt;p&gt;Use the &lt;code&gt;shopify_id&lt;/code&gt; (the &lt;code&gt;X-Shopify-Webhook-Id&lt;/code&gt;) as your deduplication key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleOrderCreate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;shopifyOrderId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Check if already processed&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SELECT id FROM orders WHERE shopify_order_id = $1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;shopifyOrderId&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Order &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;shopifyOrderId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; already processed. Skipping.`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Safe to process&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;INSERT INTO orders (shopify_order_id, ...) VALUES ($1, ...)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;shopifyOrderId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...]&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No idempotency = no safe replay. Build this first.&lt;/p&gt;




&lt;h2&gt;
  
  
  Dead Letter Queue
&lt;/h2&gt;

&lt;p&gt;Events that hit max retries go to DLQ status. Monitor these aggressively:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Query DLQ events&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dlqEvents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
  SELECT topic, COUNT(*) as count, MIN(created_at) as oldest
  FROM webhook_events
  WHERE status = 'dead'
  GROUP BY topic
  ORDER BY count DESC
`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;DLQ growth is not a metric to check weekly. Alert on it in real time. Every row is an event that never reached its destination.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Replay What
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Server down for 2 hours&lt;/td&gt;
&lt;td&gt;Automatic retry handles it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bug deployed and fixed&lt;/td&gt;
&lt;td&gt;Manual replay by time range&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New integration added&lt;/td&gt;
&lt;td&gt;Manual replay to backfill history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Downstream API was down&lt;/td&gt;
&lt;td&gt;Automatic retry with backoff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data migration&lt;/td&gt;
&lt;td&gt;Manual replay by topic and date&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Mistakes That Kill Replay Systems
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Not storing the raw payload.&lt;/strong&gt; Store the exact bytes Shopify sent. If you transform first, you cannot replay the original.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Processing in the ingestion endpoint.&lt;/strong&gt; 5-second window. Any business logic risks timeout and a missed event.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No idempotency.&lt;/strong&gt; Replay becomes dangerous instead of safe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unlimited retries.&lt;/strong&gt; Cap at 5-7. Move the rest to DLQ.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ignoring DLQ depth.&lt;/strong&gt; Silent DLQ growth means silent data loss.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tools to Accelerate This
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;BullMQ (Node.js)&lt;/td&gt;
&lt;td&gt;Queue + retry + backoff out of the box&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sidekiq (Ruby)&lt;/td&gt;
&lt;td&gt;Background jobs with retry logic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS SQS + Lambda&lt;/td&gt;
&lt;td&gt;Fully managed, serverless&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Temporal&lt;/td&gt;
&lt;td&gt;Durable workflows with built-in replay&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pg-boss&lt;/td&gt;
&lt;td&gt;PostgreSQL-backed job queue&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;The pattern is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ingest fast, store everything&lt;/li&gt;
&lt;li&gt;Process asynchronously&lt;/li&gt;
&lt;li&gt;Retry with backoff&lt;/li&gt;
&lt;li&gt;Route terminal failures to DLQ&lt;/li&gt;
&lt;li&gt;Replay on demand after fixes
Every layer compounds reliability. Start with the event store. Add the worker. Add retry logic. Then add the replay trigger.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your Shopify store generates events constantly. A replay system makes sure none of them disappear.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://kolachitech.com/shopify-webhook-replay-system/" rel="noopener noreferrer"&gt;kolachitech.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>shopify</category>
      <category>webhooks</category>
      <category>backend</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Stop Losing Shopify Webhooks: A Retry Strategy That Survives Real Outages</title>
      <dc:creator>Muhammad Masad Ashraf</dc:creator>
      <pubDate>Fri, 22 May 2026 00:40:32 +0000</pubDate>
      <link>https://dev.to/masadashraf/stop-losing-shopify-webhooks-a-retry-strategy-that-survives-real-outages-4jl4</link>
      <guid>https://dev.to/masadashraf/stop-losing-shopify-webhooks-a-retry-strategy-that-survives-real-outages-4jl4</guid>
      <description>&lt;h2&gt;
  
  
  Stop Losing Shopify Webhooks: A Retry Strategy That Survives Real Outages
&lt;/h2&gt;

&lt;p&gt;Here is a scenario every Shopify developer eventually lives through.&lt;/p&gt;

&lt;p&gt;It is the middle of a deploy. Your endpoint is down for maybe 90 minutes. Nothing dramatic. But while you were shipping, three orders came in, an inventory sync fired, and a customer updated their address.&lt;/p&gt;

&lt;p&gt;Shopify tried to deliver those webhooks. It retried. Then it gave up.&lt;/p&gt;

&lt;p&gt;Those events are gone. Not delayed. Gone.&lt;/p&gt;

&lt;p&gt;I have watched this exact thing happen, and the worst part is how quiet it is. No error in your logs. No alert. Just a slow drift between what Shopify knows and what your system knows. You find out days later when a customer asks where their order went.&lt;/p&gt;

&lt;p&gt;So let us talk about how to actually fix this, properly, with a retry strategy that survives more than a quick hiccup.&lt;/p&gt;

&lt;h2&gt;
  
  
  First, Know What Shopify Actually Does
&lt;/h2&gt;

&lt;p&gt;You cannot build a good retry layer until you know what the platform gives you for free.&lt;/p&gt;

&lt;p&gt;When Shopify sends a webhook, it waits &lt;strong&gt;5 seconds&lt;/strong&gt; for your endpoint to respond. Any 2xx status code is a success. Anything else, including a timeout, is a failure.&lt;/p&gt;

&lt;p&gt;After a failure, Shopify retries. And here is the part people get wrong, because the internet is full of outdated info:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;As of the September 2024 policy update, Shopify retries a failed webhook &lt;strong&gt;up to 8 times over a 4-hour window&lt;/strong&gt; with exponential backoff.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The old "19 retries over 48 hours" number you will find in older blog posts is dead. If your reliability code was written before late 2024, your timing assumptions are probably wrong.&lt;/p&gt;

&lt;p&gt;A quick reference:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;th&gt;Detail&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Timeout&lt;/td&gt;
&lt;td&gt;5 seconds per attempt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Success&lt;/td&gt;
&lt;td&gt;Any 2xx status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retries&lt;/td&gt;
&lt;td&gt;Up to 8 attempts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Window&lt;/td&gt;
&lt;td&gt;4 hours total&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backoff&lt;/td&gt;
&lt;td&gt;Exponential&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Payload&lt;/td&gt;
&lt;td&gt;Original payload from trigger time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The scary part&lt;/td&gt;
&lt;td&gt;Persistent failures &lt;strong&gt;delete the subscription&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That last row is the one that ruins weekends. If your endpoint fails enough, Shopify does not just drop events. It removes the webhook subscription. New events stop firing entirely until you re-register. Silent and total.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With 4 Hours
&lt;/h2&gt;

&lt;p&gt;Exponential backoff over 8 attempts in 4 hours front-loads everything. Most of your retries happen in the first 30 minutes. By hour 2, roughly 5 of your 8 attempts are spent.&lt;/p&gt;

&lt;p&gt;So Shopify's retry system is genuinely good at one thing: surviving short, transient blips. A momentary network drop. A one-off timeout.&lt;/p&gt;

&lt;p&gt;It is bad at the thing that actually hurts you: real outages.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A deploy that runs long&lt;/li&gt;
&lt;li&gt;A downstream API that rate-limits you for hours&lt;/li&gt;
&lt;li&gt;A traffic spike during a sale that pushes responses past 5 seconds&lt;/li&gt;
&lt;li&gt;A botched HMAC secret rotation that 401s every webhook&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In all of those, the 4-hour window runs out before you recover. Shopify did its job. Your data is still lost.&lt;/p&gt;

&lt;p&gt;The conclusion is simple: &lt;strong&gt;you need your own retry layer.&lt;/strong&gt; Shopify's is the floor, not the ceiling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rule Zero: Acknowledge Fast, Process Later
&lt;/h2&gt;

&lt;p&gt;Before any retry logic, fix the most common mistake in Shopify webhook code: running business logic inside the webhook endpoint.&lt;/p&gt;

&lt;p&gt;If your handler hits a database, calls an external API, or does anything heavy, you are gambling against that 5-second timeout. Cross it once and Shopify marks a perfectly good delivery as failed.&lt;/p&gt;

&lt;p&gt;Your endpoint should do almost nothing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The entire job of your webhook endpoint&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/webhooks/orders-create&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// 1. Verify the HMAC signature&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;verifyHmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 2. Push the raw payload onto a queue&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;orders-create&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rawBody&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x-shopify-event-id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;triggeredAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x-shopify-triggered-at&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// 3. Respond immediately&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is it. A background worker pulls from the queue and does the real work. This one change eliminates the majority of timeout failures, and it turns retries into a calm internal concern instead of a race against a clock.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the Retry Layer
&lt;/h2&gt;

&lt;p&gt;Your retry logic lives in the worker, not the endpoint. When processing fails, the worker decides: retry, or give up?&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Classify the error
&lt;/h3&gt;

&lt;p&gt;Not every failure deserves a retry. Retrying a permanent error just burns resources.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Error type&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Transient&lt;/td&gt;
&lt;td&gt;Timeout, 503, deadlock, rate limit&lt;/td&gt;
&lt;td&gt;Retry with backoff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Permanent&lt;/td&gt;
&lt;td&gt;Invalid payload, missing field, validation error&lt;/td&gt;
&lt;td&gt;Straight to the dead letter queue&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Retrying a malformed payload 8 times will not magically make it valid. Categorize first, act second.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Exponential backoff with jitter
&lt;/h3&gt;

&lt;p&gt;For transient errors, retry with increasing delays. Each retry waits longer than the last.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Attempt&lt;/th&gt;
&lt;th&gt;Delay&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;30 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;8 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;30 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;6 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;24 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice the window stretches across &lt;strong&gt;days&lt;/strong&gt;, not Shopify's 4 hours. That is the whole point of building your own layer.&lt;/p&gt;

&lt;p&gt;But pure exponential backoff has a trap. If 500 webhooks fail at the same moment, they all retry at the same moment, and your recovering service gets hammered flat again. This is the thundering herd.&lt;/p&gt;

&lt;p&gt;Fix it with &lt;strong&gt;jitter&lt;/strong&gt;: add a small random offset to each delay. Instead of retrying at exactly 8 minutes, retry somewhere between 7 and 9. It spreads the load and smooths recovery.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;nextDelay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;86400&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// cap at 24h&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;base&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// up to 30% jitter&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;base&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Cap retries, then use a dead letter queue
&lt;/h3&gt;

&lt;p&gt;Retries cannot run forever. Cap them, usually somewhere between 5 and 10 attempts. When a webhook exhausts its retries, it does not vanish. It moves to a &lt;strong&gt;dead letter queue&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The DLQ stores the failed webhook with full context: original payload, error message, attempt count, timestamp. Now you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inspect&lt;/strong&gt; it and find the root cause&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reprocess&lt;/strong&gt; it once the bug is fixed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alert&lt;/strong&gt; when the DLQ grows past a normal threshold&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That alert is your early warning system. A spike in DLQ volume tells you something is wrong before any customer notices.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Duplicate Problem Nobody Mentions
&lt;/h2&gt;

&lt;p&gt;Retries create a brand new problem: the same webhook arriving twice.&lt;/p&gt;

&lt;p&gt;Shopify delivers a webhook, your slow response times out, Shopify retries, but your worker already processed the first one. Now you have a duplicate. Without protection, that means double inventory adjustments, duplicate emails, double-charged orders.&lt;/p&gt;

&lt;p&gt;The fix is idempotency. Every Shopify webhook carries an &lt;code&gt;X-Shopify-Event-Id&lt;/code&gt; header, and that value stays identical across every retry of the same event.&lt;/p&gt;

&lt;p&gt;Use it as a dedup key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processWebhook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;eventId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;alreadyProcessed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// safe no-op&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;doTheRealWork&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;markProcessed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A retry strategy without deduplication is incomplete. Build this in from day one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch Out for Stale Payloads
&lt;/h2&gt;

&lt;p&gt;One more subtlety. When Shopify retries, it sends the &lt;strong&gt;original payload from when the event was triggered&lt;/strong&gt;, not the current state.&lt;/p&gt;

&lt;p&gt;If an order changed three times during the retry window, a late retry still carries the first version. Apply it blindly and you overwrite newer data with older data.&lt;/p&gt;

&lt;p&gt;Always check the &lt;code&gt;X-Shopify-Triggered-At&lt;/code&gt; header against your own records. If your data is already newer, skip the stale update or fetch fresh data from the Admin API instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Layer Even Retries Cannot Save You From
&lt;/h2&gt;

&lt;p&gt;Here is the uncomfortable truth: even a perfect retry layer cannot recover an event Shopify never sent, or one it dropped after its own retries failed.&lt;/p&gt;

&lt;p&gt;For that, you need &lt;strong&gt;reconciliation&lt;/strong&gt;. Periodically poll the Shopify Admin API and compare its data against yours. Hourly for high-value topics like &lt;code&gt;orders/*&lt;/code&gt;. Daily for lower-stakes data.&lt;/p&gt;

&lt;p&gt;Reconciliation is the final safety net. It catches events lost to extended outages, dropped events, and gaps from a removed subscription.&lt;/p&gt;

&lt;p&gt;Combine the three layers and you lose almost nothing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Catches&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Shopify retries&lt;/td&gt;
&lt;td&gt;Short transient blips&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Your retry layer + DLQ&lt;/td&gt;
&lt;td&gt;Longer outages, downstream failures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reconciliation&lt;/td&gt;
&lt;td&gt;Everything else, including dropped events&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Do Not Forget Monitoring
&lt;/h2&gt;

&lt;p&gt;A retry strategy fails silently without eyes on it. Track these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Webhook failure rate&lt;/li&gt;
&lt;li&gt;DLQ size and growth&lt;/li&gt;
&lt;li&gt;Retry volume&lt;/li&gt;
&lt;li&gt;Subscription status (run a daily check, alert on any missing topic)&lt;/li&gt;
&lt;li&gt;Processing latency creeping toward 5 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Shopify's Dev Dashboard has a delivery metrics report with response codes and retry counts per topic. Use it alongside your own monitoring, not instead of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Shopify gives you 8 retries over 4 hours. Real outages last longer than that. So:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Acknowledge fast.&lt;/strong&gt; Return 200 in under 5 seconds, queue the payload.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classify errors.&lt;/strong&gt; Transient gets retried, permanent goes to the DLQ.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exponential backoff with jitter.&lt;/strong&gt; Retry over days, not hours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency.&lt;/strong&gt; Use &lt;code&gt;X-Shopify-Event-Id&lt;/code&gt; to kill duplicates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dead letter queue.&lt;/strong&gt; Nothing gets lost, everything is replayable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reconcile.&lt;/strong&gt; Poll the Admin API to catch what slipped through.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor.&lt;/strong&gt; Surface problems before customers do.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Build these layers once and your integration stops losing data, even on its worst day.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you want the full guide, &lt;a href="https://kolachitech.com/shopify-webhook-retry-strategies/" rel="noopener noreferrer"&gt;you can read it here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;How does your team handle webhook reliability? I would genuinely like to hear what has worked for you in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>shopify</category>
      <category>webhooks</category>
      <category>webdev</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Building a Dead Letter Queue for Shopify Webhooks (Production-Ready Guide)</title>
      <dc:creator>Muhammad Masad Ashraf</dc:creator>
      <pubDate>Mon, 18 May 2026 16:02:08 +0000</pubDate>
      <link>https://dev.to/masadashraf/building-a-dead-letter-queue-for-shopify-webhooks-production-ready-guide-55ad</link>
      <guid>https://dev.to/masadashraf/building-a-dead-letter-queue-for-shopify-webhooks-production-ready-guide-55ad</guid>
      <description>&lt;h1&gt;
  
  
  Building a Dead Letter Queue for Shopify Webhooks
&lt;/h1&gt;

&lt;p&gt;Your Shopify webhook just failed. Network timeout. Database deadlock. Third-party API down. &lt;/p&gt;

&lt;p&gt;The webhook payload? Gone forever.&lt;/p&gt;

&lt;p&gt;Unless you have a dead letter queue.&lt;/p&gt;

&lt;p&gt;This guide walks through building a production-ready DLQ system for Shopify webhooks. I'll show you the architecture, code examples, and mistakes to avoid based on real production systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Shopify's Built-in Retry Isn't Enough
&lt;/h2&gt;

&lt;p&gt;Shopify retries failed webhooks up to 19 times over 48 hours. After that, they stop delivering to your endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Attempt 1: Immediate
Attempt 2: 1 second later
Attempt 3: 2 seconds later
...
Attempt 19: 48 hours later
Then: 💀 Webhook deleted forever
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production systems handling payments, orders, and inventory, this isn't acceptable. You need a safety net.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;Here's the high-level flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Shopify → Your Webhook Endpoint → Primary Queue → Worker → Success ✓
                    ↓                                    ↓
              200 OK (always)                        Failure ✗
                                                          ↓
                                                   Dead Letter Queue
                                                          ↓
                                                   Retry Logic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Critical principle:&lt;/strong&gt; Always return 200 OK to Shopify immediately. Process webhooks asynchronously.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;p&gt;I'll use this stack for examples (adapt to your needs):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Queue:&lt;/strong&gt; Redis with Bull (Node.js) or AWS SQS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database:&lt;/strong&gt; PostgreSQL for DLQ storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime:&lt;/strong&gt; Node.js with Express&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring:&lt;/strong&gt; Custom dashboard + alerts&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Implementation: Step by Step
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Webhook Reception (Never Block Here)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// webhook-receiver.js&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;crypto&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;primaryQueue&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./queues&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhooks/shopify/:topic&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hmac&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;X-Shopify-Hmac-Sha256&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Verify webhook signature&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;verifyWebhook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Invalid signature&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Respond to Shopify IMMEDIATELY&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Queue for async processing (never blocks)&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;primaryQueue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;shopDomain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;X-Shopify-Shop-Domain&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;X-Shopify-Webhook-Id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
      &lt;span class="na"&gt;receivedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;backoff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;exponential&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt; &lt;span class="c1"&gt;// Start with 10 seconds&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// If queuing fails, write directly to DLQ&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;saveToDLQ&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
      &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Failed to queue webhook&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;failureReason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;verifyWebhook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createHmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sha256&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SHOPIFY_WEBHOOK_SECRET&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Worker with DLQ Fallback
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// webhook-worker.js&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;primaryQueue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;dlqQueue&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./queues&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;processWebhook&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./processors&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;saveToDLQ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;incrementRetryCount&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./dlq-storage&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;primaryQueue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;shopDomain&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Your business logic here&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processWebhook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;shopDomain&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// If we got here, processing succeeded&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;webhookId&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Categorize the error&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;isTransientError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Let Bull retry with exponential backoff&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Permanent error - move to DLQ immediately&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;saveToDLQ&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;shopDomain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;stackTrace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;retryCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attemptsMade&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;firstFailedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;lastAttemptedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;

      &lt;span class="c1"&gt;// Don't throw - we handled it&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;movedToDLQ&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// When all retries exhausted, Bull calls this&lt;/span&gt;
&lt;span class="nx"&gt;primaryQueue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attemptsMade&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;saveToDLQ&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;shopDomain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;shopDomain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Max retries exceeded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;stackTrace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;retryCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attemptsMade&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;firstFailedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;lastAttemptedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;isTransientError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Network issues, rate limits, timeouts&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ECONNREFUSED&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
         &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ETIMEDOUT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
         &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;statusCode&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
         &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;statusCode&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
         &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;deadlock&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. DLQ Storage Schema
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- dlq_webhooks table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;dlq_webhooks&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;webhook_id&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;shop_domain&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="n"&gt;JSONB&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;error_message&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;stack_trace&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;retry_count&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;first_failed_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;last_attempted_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- pending, processing, resolved, abandoned&lt;/span&gt;
  &lt;span class="n"&gt;resolved_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;notes&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Indexes for querying&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_dlq_status&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;dlq_webhooks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_dlq_topic&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;dlq_webhooks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_dlq_shop&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;dlq_webhooks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shop_domain&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_dlq_first_failed&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;dlq_webhooks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first_failed_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. DLQ Storage Functions
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// dlq-storage.js&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Pool&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pg&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;saveToDLQ&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;webhookData&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;shopDomain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;stackTrace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;retryCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;firstFailedAt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;lastAttemptedAt&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;webhookData&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
    INSERT INTO dlq_webhooks (
      webhook_id, topic, shop_domain, payload,
      error_message, stack_trace, retry_count,
      first_failed_at, last_attempted_at
    ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
    ON CONFLICT (webhook_id) 
    DO UPDATE SET
      retry_count = dlq_webhooks.retry_count + 1,
      last_attempted_at = $9,
      error_message = $5,
      stack_trace = $6
    RETURNING id
  `&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;shopDomain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;stackTrace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;retryCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;firstFailedAt&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="nx"&gt;lastAttemptedAt&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="c1"&gt;// Alert if this is a new DLQ entry&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rowCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;alertDLQEntry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;shopDomain&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Failed to save to DLQ:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// This is serious - DLQ itself failed&lt;/span&gt;
    &lt;span class="c1"&gt;// Log to external service (Sentry, etc.)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getDLQWebhooks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
    SELECT * FROM dlq_webhooks
    WHERE status = $1
    ORDER BY first_failed_at ASC
    LIMIT $2
  `&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;markAsResolved&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;notes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
    UPDATE dlq_webhooks
    SET status = 'resolved',
        resolved_at = NOW(),
        notes = $2
    WHERE webhook_id = $1
  `&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;notes&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;saveToDLQ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;getDLQWebhooks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;markAsResolved&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Automated DLQ Retry Worker
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// dlq-retry-worker.js&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;getDLQWebhooks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;markAsResolved&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;saveToDLQ&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./dlq-storage&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;processWebhook&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./processors&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processDLQWebhooks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;webhooks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getDLQWebhooks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;webhook&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;webhooks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Update status to prevent concurrent processing&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;updateDLQStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webhook_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;processing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="c1"&gt;// Try to process again&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processWebhook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nx"&gt;webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;shop_domain&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="c1"&gt;// Success! Mark as resolved&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;markAsResolved&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webhook_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Auto-resolved by retry worker&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Still failing&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newRetryCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;retry_count&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newRetryCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Give up after 10 DLQ retries&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;updateDLQStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webhook_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;abandoned&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;alertPermanentFailure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;webhook&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Save back to DLQ with updated count&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;saveToDLQ&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
          &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;webhook&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;retryCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;newRetryCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;stackTrace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;lastAttemptedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Run every 5 minutes&lt;/span&gt;
&lt;span class="nf"&gt;setInterval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;processDLQWebhooks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Monitoring Dashboard
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// dlq-dashboard.js - Express route&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/dlq/stats&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    SELECT
      status,
      COUNT(*) as count,
      MIN(first_failed_at) as oldest,
      MAX(first_failed_at) as newest
    FROM dlq_webhooks
    GROUP BY status
  `&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;byTopic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    SELECT
      topic,
      COUNT(*) as count
    FROM dlq_webhooks
    WHERE status = 'pending'
    GROUP BY topic
    ORDER BY count DESC
  `&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;byTopic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;byTopic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Alerting Setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// alerting.js&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;sendSlackAlert&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./slack&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;alertDLQEntry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;shopDomain&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Check DLQ depth&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getDLQDepth&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendSlackAlert&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#alerts-critical&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`🚨 DLQ depth exceeds 100: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; webhooks`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;critical&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;alertPermanentFailure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;webhook&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendSlackAlert&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#alerts-high&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`⚠️ Webhook permanently failed after 10 retries\nTopic: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\nShop: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;shop_domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\nWebhook ID: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webhook_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Testing Your DLQ
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// test-dlq.js&lt;/span&gt;
&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Dead Letter Queue&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;should save webhook to DLQ on processing failure&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;testWebhook&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;test-123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;orders/create&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;shopDomain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;test-shop.myshopify.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;50.00&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Database connection failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;stackTrace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Error: ...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;retryCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;saveToDLQ&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;testWebhook&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dlqWebhooks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getDLQWebhooks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dlqWebhooks&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveLength&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dlqWebhooks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;webhook_id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;test-123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;should retry webhook from DLQ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Test retry logic&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;should mark webhook as abandoned after max retries&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Test abandonment logic&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Production Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Idempotency
&lt;/h3&gt;

&lt;p&gt;Store processed webhook IDs to prevent duplicate processing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processWebhook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;shopDomain&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;webhookId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;admin_graphql_api_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Check if already processed&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;alreadyProcessed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`processed:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;alreadyProcessed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Webhook &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; already processed, skipping`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Process webhook&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;yourBusinessLogic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Mark as processed (expire after 7 days)&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`processed:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Exponential Backoff Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;RETRY_CONFIG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;backoff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;exponential&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// 10 seconds&lt;/span&gt;
    &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;maxDelay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1800000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// 30 minutes max&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Results in retry schedule:&lt;/span&gt;
&lt;span class="c1"&gt;// 1. 10 seconds&lt;/span&gt;
&lt;span class="c1"&gt;// 2. 1 minute&lt;/span&gt;
&lt;span class="c1"&gt;// 3. 5 minutes&lt;/span&gt;
&lt;span class="c1"&gt;// 4. 15 minutes&lt;/span&gt;
&lt;span class="c1"&gt;// 5. 30 minutes&lt;/span&gt;
&lt;span class="c1"&gt;// 6. 30 minutes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Memory Management
&lt;/h3&gt;

&lt;p&gt;For high-volume stores:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Process DLQ in batches&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processDLQBatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;batchSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;webhooks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getDLQWebhooks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;batchSize&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Process in parallel with concurrency limit&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;webhooks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;webhook&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; 
      &lt;span class="nx"&gt;limiter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;retryWebhook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;webhook&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Use p-limit for concurrency control&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pLimit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;p-limit&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;limiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Max 10 concurrent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Common Mistakes to Avoid
&lt;/h2&gt;

&lt;p&gt;❌ &lt;strong&gt;Processing webhooks synchronously in the HTTP handler&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// DON'T DO THIS&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhook&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processWebhook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Blocks!&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Too late, Shopify timed out&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;❌ &lt;strong&gt;No maximum retry limit&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// DON'T DO THIS&lt;/span&gt;
&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;success&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;retry&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Infinite loop&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;❌ &lt;strong&gt;Not categorizing errors&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// DON'T DO THIS&lt;/span&gt;
&lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Always retry, even for permanent failures&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Monitoring Checklist
&lt;/h2&gt;

&lt;p&gt;Monitor these metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ DLQ depth (alert &amp;gt; 100)&lt;/li&gt;
&lt;li&gt;✅ Age of oldest message (alert &amp;gt; 4 hours)&lt;/li&gt;
&lt;li&gt;✅ Retry success rate&lt;/li&gt;
&lt;li&gt;✅ Error patterns (group by error type)&lt;/li&gt;
&lt;li&gt;✅ Processing latency (p50, p95, p99)&lt;/li&gt;
&lt;li&gt;✅ Webhook topics in DLQ (which events fail most)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://shopify.dev/docs/apps/webhooks" rel="noopener noreferrer"&gt;Shopify Webhook Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/OptimalBits/bull" rel="noopener noreferrer"&gt;Bull Queue Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html" rel="noopener noreferrer"&gt;AWS SQS DLQ Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Full Implementation
&lt;/h2&gt;

&lt;p&gt;The complete working implementation with all the code above is available on our blog:&lt;br&gt;
👉 &lt;a href="https://kolachitech.com/dead-letter-queue-shopify-webhooks/" rel="noopener noreferrer"&gt;Dead Letter Queue Shopify Webhooks: Complete Guide&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;What's your approach to webhook reliability? Drop your setup in the comments below.&lt;/p&gt;

&lt;p&gt;Building Shopify integrations? We help e-commerce brands build fault-tolerant systems. Check out &lt;a href="https://kolachitech.com" rel="noopener noreferrer"&gt;Kolachi Tech&lt;/a&gt; for more technical deep-dives.&lt;/p&gt;

</description>
      <category>shopify</category>
      <category>webhooks</category>
      <category>architecture</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Scalable Shopify Integration Patterns Every Developer Should Know</title>
      <dc:creator>Muhammad Masad Ashraf</dc:creator>
      <pubDate>Thu, 14 May 2026 22:38:17 +0000</pubDate>
      <link>https://dev.to/masadashraf/scalable-shopify-integration-patterns-every-developer-should-know-2l94</link>
      <guid>https://dev.to/masadashraf/scalable-shopify-integration-patterns-every-developer-should-know-2l94</guid>
      <description>&lt;p&gt;Your Shopify integration works great. Until it does not.&lt;/p&gt;

&lt;p&gt;You hit a traffic spike. Webhooks start failing. Your API quota drains in seconds. Orders process out of order. Data between your store and your warehouse drifts further apart by the minute.&lt;/p&gt;

&lt;p&gt;This is not a bug. It is an architecture problem. And it has well-understood solutions.&lt;/p&gt;

&lt;p&gt;This post covers the core scalable Shopify integration patterns I see used in high-volume production systems, with practical notes on when and how to apply each one.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Classic Failure Stack
&lt;/h2&gt;

&lt;p&gt;Before the patterns, here is what going wrong looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Shopify fires webhook
  -&amp;gt; Your handler receives it
  -&amp;gt; Handler calls ERP API (2s)
  -&amp;gt; Handler updates database (1s)
  -&amp;gt; Handler calls shipping provider (3s)
  -&amp;gt; Total: 6s
  -&amp;gt; Shopify timeout: 5s
  -&amp;gt; Shopify marks delivery as FAILED
  -&amp;gt; Shopify retries
  -&amp;gt; You process the same order twice
  -&amp;gt; Inventory is now wrong
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every step in that chain is fixable. Here is how.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 1: Queue-Based Webhook Processing
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt; Never do real work inside a webhook handler.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST /webhooks/orders/create
  -&amp;gt; Parse and validate HMAC     (fast)
  -&amp;gt; Push payload to queue       (fast)
  -&amp;gt; Return 200                  (done)

Queue Worker:
  -&amp;gt; Pull job from queue
  -&amp;gt; Process order
  -&amp;gt; Call ERP, update DB, notify warehouse
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tools that work well here: BullMQ (Node.js), Sidekiq (Ruby), Celery (Python), SQS (any language).&lt;/p&gt;

&lt;p&gt;The handler should complete in under 500ms. Everything else belongs in a worker.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 2: Idempotency Keys
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt; Processing the same event twice should produce the same result as processing it once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;webhookPayload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;idempotencyKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`order-created-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;webhookPayload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;alreadyProcessed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;idempotencyKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;alreadyProcessed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// skip, already handled&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;doTheActualWork&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;webhookPayload&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;idempotencyKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;EX&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;86400&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// expire after 24h&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, any retry from Shopify or your own queue creates duplicates. With it, retries are safe by default.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 3: Caching Shopify API Responses
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt; Do not call Shopify on every request for data that rarely changes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getProduct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;productId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cacheKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`shopify:product:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;productId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cacheKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;product&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;shopify&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;productId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cacheKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;EX&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 5 min TTL&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Invalidate when Shopify tells you the data changed&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhooks/products/update&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;del&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`shopify:product:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sync-product&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;productId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cache product data, metafields, and store config. Do not cache inventory or order status.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 4: Event-Driven Architecture
&lt;/h2&gt;

&lt;p&gt;Instead of services calling each other directly, they emit and consume events.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Shopify
  -&amp;gt; fires "order/created" webhook
    -&amp;gt; Order Service processes order
      -&amp;gt; emits "order.confirmed" event
        -&amp;gt; Inventory Service decrements stock
        -&amp;gt; Notification Service sends confirmation email
        -&amp;gt; Analytics Service logs the conversion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each service is independent. Each can fail and recover without affecting the others. New integrations subscribe to existing events without touching existing code.&lt;/p&gt;

&lt;p&gt;Message brokers that work well: RabbitMQ, Kafka, AWS EventBridge, Google Pub/Sub.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 5: Async Processing for Slow Operations
&lt;/h2&gt;

&lt;p&gt;For anything that takes more than a second or two, go async.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;POST&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;bulk&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;import&lt;/span&gt;
  &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;Validate&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;
  &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;Create&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt; &lt;span class="kd"&gt;with&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;Push&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt;
  &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;Return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;abc123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;GET&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;jobs&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;abc123&lt;/span&gt;
  &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;Return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;abc123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;processing&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;progress&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// or use a webhook to notify when done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern applies to: bulk product imports, inventory reconciliation, report generation, large sync jobs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 6: Circuit Breakers for External Dependencies
&lt;/h2&gt;

&lt;p&gt;When a downstream service is failing, stop hammering it. Use a circuit breaker.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;breaker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;callShippingProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;errorThresholdPercentage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;resetTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30000&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;breaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;queued&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Will retry shortly&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;breaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fire&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;orderData&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Libraries: &lt;code&gt;opossum&lt;/code&gt; (Node.js), &lt;code&gt;resilience4j&lt;/code&gt; (Java), &lt;code&gt;pybreaker&lt;/code&gt; (Python).&lt;/p&gt;

&lt;p&gt;Circuit breakers prevent one failing dependency from cascading into a full system outage.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 7: Multi-Service by Domain
&lt;/h2&gt;

&lt;p&gt;Split your integration by business domain, not by technical layer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Instead of:
  integration-monolith/
    orders.js
    inventory.js
    shipping.js
    notifications.js

Do this:
  order-service/         &amp;lt;- owns order lifecycle
  inventory-service/     &amp;lt;- owns stock levels
  shipping-service/      &amp;lt;- owns carrier integrations
  notification-service/  &amp;lt;- owns all outbound messaging
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Has its own database&lt;/li&gt;
&lt;li&gt;Scales independently&lt;/li&gt;
&lt;li&gt;Deploys independently&lt;/li&gt;
&lt;li&gt;Fails in isolation&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  When to Apply Each Pattern
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Start applying at...&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Queue-based webhooks&lt;/td&gt;
&lt;td&gt;Day 1, always&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Idempotency keys&lt;/td&gt;
&lt;td&gt;Day 1, always&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Caching layer&lt;/td&gt;
&lt;td&gt;When API errors appear&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Async architecture&lt;/td&gt;
&lt;td&gt;When handlers exceed 2s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event-driven design&lt;/td&gt;
&lt;td&gt;When services start coupling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Circuit breakers&lt;/td&gt;
&lt;td&gt;When you have 3+ external dependencies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-service split&lt;/td&gt;
&lt;td&gt;When one part needs to scale differently&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Quick Wins Checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Webhook handler returns 200 in under 500ms&lt;/li&gt;
&lt;li&gt;[ ] Every job has an idempotency key check&lt;/li&gt;
&lt;li&gt;[ ] Frequently read data is cached in Redis&lt;/li&gt;
&lt;li&gt;[ ] Slow operations run in background workers&lt;/li&gt;
&lt;li&gt;[ ] Retry logic exists for all external API calls&lt;/li&gt;
&lt;li&gt;[ ] Dead letter queue captures persistently failed jobs&lt;/li&gt;
&lt;li&gt;[ ] At least one circuit breaker on a critical dependency&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Scalable Shopify integration patterns are not about adding complexity for its own sake. They are about removing fragility before it costs you customers, revenue, or a 3am incident.&lt;/p&gt;

&lt;p&gt;Start with the queue and idempotency. Everything else can follow as the system demands it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Full guide with infrastructure breakdowns: &lt;a href="https://kolachitech.com/scalable-shopify-integration-patterns/" rel="noopener noreferrer"&gt;kolachitech.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Drop questions or your own pattern war stories in the comments.&lt;/p&gt;

</description>
      <category>shopify</category>
      <category>architecture</category>
      <category>webdev</category>
      <category>backend</category>
    </item>
    <item>
      <title>Multi-Service Architecture for Shopify Ecosystems</title>
      <dc:creator>Muhammad Masad Ashraf</dc:creator>
      <pubDate>Wed, 13 May 2026 18:45:27 +0000</pubDate>
      <link>https://dev.to/masadashraf/multi-service-architecture-for-shopify-ecosystems-1o92</link>
      <guid>https://dev.to/masadashraf/multi-service-architecture-for-shopify-ecosystems-1o92</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Most Shopify stores start simple. One theme, a few apps, the Admin API, and a payment provider.&lt;/p&gt;

&lt;p&gt;Then growth happens.&lt;/p&gt;

&lt;p&gt;You add a custom fulfillment service. Then an ERP sync. Then a loyalty engine. Then a multi-warehouse inventory system. Before long, you are managing a dozen interconnected systems, all talking to Shopify in different ways, at different times, with different failure modes.&lt;/p&gt;

&lt;p&gt;That is not a simple store anymore. That is a distributed ecosystem. And without a deliberate architecture, it becomes a maintenance nightmare.&lt;/p&gt;

&lt;p&gt;Multi-service Shopify architecture is the discipline of designing that ecosystem intentionally. It gives each service a clear responsibility, a defined communication contract, and a path to scale independently without breaking everything else.&lt;/p&gt;

&lt;p&gt;This guide walks you through the core concepts, patterns, and practical decisions you need to build a Shopify ecosystem that grows with your business.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Multi-Service Architecture in the Shopify Context?
&lt;/h2&gt;

&lt;p&gt;Multi-service architecture means splitting your Shopify ecosystem into distinct, independently deployable services. Each service owns a specific domain and communicates with other services through well-defined interfaces.&lt;/p&gt;

&lt;p&gt;In a Shopify ecosystem, services typically map to business domains:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Order Service&lt;/td&gt;
&lt;td&gt;Receives and processes Shopify order events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inventory Service&lt;/td&gt;
&lt;td&gt;Syncs stock levels across warehouses and Shopify&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fulfillment Service&lt;/td&gt;
&lt;td&gt;Manages pick, pack, and ship workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notification Service&lt;/td&gt;
&lt;td&gt;Sends email, SMS, and push notifications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Analytics Service&lt;/td&gt;
&lt;td&gt;Aggregates and stores business metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customer Service&lt;/td&gt;
&lt;td&gt;Manages customer profiles and loyalty data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing Service&lt;/td&gt;
&lt;td&gt;Handles discount rules, pricing tiers, and promotions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each service runs independently. A failure in the Notification Service should not take down order processing. A spike in analytics events should not slow down fulfillment.&lt;/p&gt;

&lt;p&gt;That isolation is the primary value of multi-service Shopify architecture.&lt;/p&gt;




&lt;h2&gt;
  
  
  Monolith vs. Multi-Service: When to Make the Switch
&lt;/h2&gt;

&lt;p&gt;Not every Shopify store needs a multi-service architecture. Starting with a monolith is often the right call.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Monolith&lt;/th&gt;
&lt;th&gt;Multi-Service&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Team size&lt;/td&gt;
&lt;td&gt;1 to 3 developers&lt;/td&gt;
&lt;td&gt;4 or more developers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Throughput&lt;/td&gt;
&lt;td&gt;Low to moderate&lt;/td&gt;
&lt;td&gt;High or unpredictable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domain complexity&lt;/td&gt;
&lt;td&gt;Simple&lt;/td&gt;
&lt;td&gt;Multiple distinct domains&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment frequency&lt;/td&gt;
&lt;td&gt;Infrequent&lt;/td&gt;
&lt;td&gt;Independent per service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failure tolerance&lt;/td&gt;
&lt;td&gt;One service can fail and be acceptable&lt;/td&gt;
&lt;td&gt;Individual failures must not cascade&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scaling needs&lt;/td&gt;
&lt;td&gt;Scale the whole app&lt;/td&gt;
&lt;td&gt;Scale individual services&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The signal to move toward multi-service architecture is usually one of three things: your monolith is getting hard to deploy without breaking things, a single domain needs to scale faster than the rest, or different teams need to work independently without stepping on each other.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Patterns of Shopify SOA
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Domain-Driven Service Boundaries
&lt;/h3&gt;

&lt;p&gt;Each service should own a single business domain. Do not split services by technical layer (database service, API service). Split them by what they do for the business.&lt;/p&gt;

&lt;p&gt;Good boundaries follow natural seams in your data. The Order Service owns order records. The Inventory Service owns stock counts. Neither reaches into the other's database directly.&lt;/p&gt;

&lt;p&gt;When services need shared data, they communicate through APIs or events. They never share a database table. This boundary enforcement is what makes services independently deployable.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The API Gateway Pattern
&lt;/h3&gt;

&lt;p&gt;In a multi-service ecosystem, you do not expose every internal service directly to Shopify or external consumers. You use an API gateway as the single entry point.&lt;/p&gt;

&lt;p&gt;The gateway handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Request routing to the correct upstream service&lt;/li&gt;
&lt;li&gt;Authentication and rate limiting&lt;/li&gt;
&lt;li&gt;Response aggregation from multiple services&lt;/li&gt;
&lt;li&gt;Protocol translation (REST to gRPC, for example)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Shopify webhooks hit the gateway first. The gateway validates the HMAC signature, identifies the event type, and routes it to the correct internal service. No individual service needs to handle authentication logic.&lt;/p&gt;

&lt;p&gt;Pairing the API gateway with &lt;a href="https://kolachitech.com/shopify-webhooks/" rel="noopener noreferrer"&gt;Shopify webhooks&lt;/a&gt; creates a clean, centralized entry point for all Shopify-originated traffic.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Event-Driven Communication
&lt;/h3&gt;

&lt;p&gt;Services in a Shopify ecosystem should communicate through events wherever possible. Events decouple services from each other. The Order Service does not need to know the Notification Service exists. It just fires an &lt;code&gt;order.paid&lt;/code&gt; event. Any service that cares about paid orders subscribes to it.&lt;/p&gt;

&lt;p&gt;This is the foundation of &lt;a href="https://kolachitech.com/event-driven-architecture-shopify/" rel="noopener noreferrer"&gt;event-driven architecture for Shopify apps&lt;/a&gt;. It keeps services loosely coupled, independently deployable, and easy to extend.&lt;/p&gt;

&lt;p&gt;Add a new service? Subscribe it to the relevant events. Remove a service? Other services continue without any changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Synchronous vs. Asynchronous Communication
&lt;/h3&gt;

&lt;p&gt;Not all inter-service communication should use events. Some operations need an immediate response.&lt;/p&gt;

&lt;p&gt;Use these as guidelines:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;When to Use&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Synchronous (REST/GraphQL)&lt;/td&gt;
&lt;td&gt;Need an immediate response&lt;/td&gt;
&lt;td&gt;Pricing service returning a discount value at checkout&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Asynchronous (events/queues)&lt;/td&gt;
&lt;td&gt;Fire-and-forget, no immediate response needed&lt;/td&gt;
&lt;td&gt;Triggering a fulfillment job after order payment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Request-reply via queue&lt;/td&gt;
&lt;td&gt;Async but need confirmation&lt;/td&gt;
&lt;td&gt;Confirming inventory reservation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For async workflows, build on &lt;a href="https://kolachitech.com/queue-based-shopify-webhook-processing/" rel="noopener noreferrer"&gt;queue-based processing&lt;/a&gt; so events are durable, retryable, and independent of service uptime.&lt;/p&gt;




&lt;h2&gt;
  
  
  Service Orchestration vs. Choreography in Shopify
&lt;/h2&gt;

&lt;p&gt;These are two distinct approaches to coordinating work across services. Both apply in Shopify ecosystems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Orchestration
&lt;/h3&gt;

&lt;p&gt;One central service (the orchestrator) coordinates the workflow. It calls services in sequence and handles the results.&lt;/p&gt;

&lt;p&gt;Example: The Order Service orchestrates the post-purchase workflow. It calls the Inventory Service to reserve stock, calls the Fulfillment Service to create a shipment, then calls the Notification Service to send a confirmation.&lt;/p&gt;

&lt;p&gt;Pros: Easy to trace and debug. The workflow logic lives in one place.&lt;/p&gt;

&lt;p&gt;Cons: The orchestrator becomes a bottleneck and a single point of failure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choreography
&lt;/h3&gt;

&lt;p&gt;Services react to events autonomously. No central coordinator. Each service does its job when it sees the right event.&lt;/p&gt;

&lt;p&gt;Example: The Order Service publishes &lt;code&gt;order.paid&lt;/code&gt;. The Inventory Service hears it and deducts stock. The Fulfillment Service hears it and creates a shipment. The Notification Service hears it and sends a confirmation. All in parallel, none aware of the others.&lt;/p&gt;

&lt;p&gt;Pros: Highly decoupled and scalable.&lt;/p&gt;

&lt;p&gt;Cons: Harder to trace. Business logic is spread across services.&lt;/p&gt;

&lt;p&gt;Most mature Shopify ecosystems use both. Orchestration for complex workflows that need guaranteed sequencing. Choreography for high-volume, loosely coupled event processing.&lt;/p&gt;

&lt;p&gt;For the async side of this, see our deep dive on &lt;a href="https://kolachitech.com/async-shopify-architecture/" rel="noopener noreferrer"&gt;async Shopify architectures&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Shopify as the Source of Truth
&lt;/h2&gt;

&lt;p&gt;In a multi-service ecosystem, every service eventually needs to agree on what happened in Shopify.&lt;/p&gt;

&lt;p&gt;Shopify is the source of truth for orders, products, customers, and inventory. Your internal services hold derived or cached copies of that data for their own processing needs.&lt;/p&gt;

&lt;p&gt;This has two important implications:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt;, when services conflict, Shopify wins. Your Analytics Service may have a different order count than Shopify for a few minutes during a sync. Shopify's count is correct.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second&lt;/strong&gt;, your services must be able to rebuild their local state from Shopify if something goes wrong. Design every service with a re-sync capability that pulls current state from the Shopify Admin API or &lt;a href="https://kolachitech.com/shopify-graphql-api/" rel="noopener noreferrer"&gt;Shopify GraphQL API&lt;/a&gt; on demand.&lt;/p&gt;

&lt;h2&gt;
  
  
  This also means your services need to handle &lt;a href="https://kolachitech.com/distributed-shopify-inventory-sync/" rel="noopener noreferrer"&gt;distributed Shopify inventory sync&lt;/a&gt; gracefully, with eventual consistency baked in rather than bolted on after the fact.
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Building Resilient Services in a Shopify Ecosystem
&lt;/h2&gt;

&lt;p&gt;Each individual service must be built to fail gracefully. In a multi-service architecture, external dependencies will be unavailable sometimes. A service that crashes when a downstream call fails will create cascading failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Circuit Breakers
&lt;/h3&gt;

&lt;p&gt;A circuit breaker monitors calls to a downstream service. If failures cross a threshold, it opens the circuit and stops sending requests for a set period. This prevents your service from waiting on a dead dependency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Retries with Idempotency
&lt;/h3&gt;

&lt;p&gt;Retries are essential. But retries without idempotency create duplicate operations. Every write operation in every service must be idempotent. Passing the same request twice must produce the same result.&lt;/p&gt;

&lt;p&gt;For Shopify-specific patterns, our guide on &lt;a href="https://kolachitech.com/idempotency-strategies-in-shopify-systems/" rel="noopener noreferrer"&gt;idempotency strategies in Shopify systems&lt;/a&gt; covers the implementation in detail.&lt;/p&gt;

&lt;h3&gt;
  
  
  Timeouts
&lt;/h3&gt;

&lt;p&gt;Every synchronous call between services must have a timeout. Never wait indefinitely for a response. Set aggressive timeouts and handle the timeout case explicitly.&lt;/p&gt;

&lt;p&gt;Without timeouts, one slow service can exhaust the thread pool of every service that calls it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bulkhead Isolation
&lt;/h3&gt;

&lt;p&gt;Partition your service's resources so one failing integration does not consume all available capacity. Run your Shopify webhook handlers on a separate thread pool from your ERP sync jobs. Isolate your database connection pools per domain.&lt;/p&gt;

&lt;p&gt;For a complete treatment of these patterns in the Shopify context, read our guide on &lt;a href="https://kolachitech.com/fault-tolerant-shopify-integration/" rel="noopener noreferrer"&gt;fault-tolerant Shopify integration&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Data Management Across Services
&lt;/h2&gt;

&lt;p&gt;Data management is the hardest part of multi-service Shopify architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  One Database Per Service
&lt;/h3&gt;

&lt;p&gt;Each service owns its own data store. No shared databases. This is not optional if you want true service independence.&lt;/p&gt;

&lt;p&gt;The immediate objection is always: how do you join data across services? You do not join across service boundaries. You use APIs or events to compose data.&lt;/p&gt;

&lt;p&gt;If the Order Service needs the customer's loyalty tier from the Customer Service, it makes an API call at runtime or subscribes to &lt;code&gt;customer.tier_updated&lt;/code&gt; events to keep a local cache.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caching for Cross-Service Data
&lt;/h3&gt;

&lt;p&gt;Frequently read, rarely updated data from other services can be cached locally. The Pricing Service's discount rules do not change every second. The Fulfillment Service can cache them with a short TTL.&lt;/p&gt;

&lt;p&gt;This reduces inter-service API calls and improves resilience. If the Pricing Service is temporarily unavailable, the Fulfillment Service can continue with cached rules.&lt;/p&gt;

&lt;p&gt;Build your caching strategy on solid foundations. Our guide on &lt;a href="https://kolachitech.com/shopify-caching-layers/" rel="noopener noreferrer"&gt;Shopify caching layers&lt;/a&gt; explains the patterns that work at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sagas for Distributed Transactions
&lt;/h3&gt;

&lt;p&gt;In a relational database, you use transactions to keep operations atomic. In a multi-service ecosystem, you cannot do a transaction across service boundaries.&lt;/p&gt;

&lt;p&gt;You use sagas instead.&lt;/p&gt;

&lt;p&gt;A saga is a sequence of local transactions, each publishing an event that triggers the next step. If any step fails, compensating transactions undo the previous steps.&lt;/p&gt;

&lt;p&gt;Example: Processing a Shopify order across services.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Compensating Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Reserve inventory in Inventory Service&lt;/td&gt;
&lt;td&gt;Release reservation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Create shipment in Fulfillment Service&lt;/td&gt;
&lt;td&gt;Cancel shipment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Charge payment via Payment Service&lt;/td&gt;
&lt;td&gt;Refund payment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Send confirmation via Notification Service&lt;/td&gt;
&lt;td&gt;Send cancellation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If Step 3 fails, the saga executes the compensating actions for Steps 1 and 2 automatically. No manual cleanup required.&lt;/p&gt;




&lt;h2&gt;
  
  
  Scaling Individual Services
&lt;/h2&gt;

&lt;p&gt;One of the biggest advantages of multi-service architecture is independent scaling.&lt;/p&gt;

&lt;p&gt;Your Inventory Service may need ten instances during a flash sale but only one during off-peak hours. Your Analytics Service may need to scale horizontally to handle event ingestion volume. Your Notification Service needs to handle bursts without blocking order processing.&lt;/p&gt;

&lt;p&gt;Scale each service based on its own load profile. Do not scale everything together.&lt;/p&gt;

&lt;p&gt;For Shopify-specific scaling challenges, our post on &lt;a href="https://kolachitech.com/scaling-shopify-apps-millions-of-requests/" rel="noopener noreferrer"&gt;scaling Shopify apps to millions of requests&lt;/a&gt; covers the infrastructure decisions in detail.&lt;/p&gt;

&lt;p&gt;Pair independent scaling with &lt;a href="https://kolachitech.com/shopify-load-balancing/" rel="noopener noreferrer"&gt;load balancing strategies&lt;/a&gt; to distribute traffic evenly across service instances and avoid hot spots.&lt;/p&gt;




&lt;h2&gt;
  
  
  Service Communication Contracts
&lt;/h2&gt;

&lt;p&gt;Services talk to each other through APIs and events. Those interfaces are contracts. Breaking a contract breaks every service that depends on it.&lt;/p&gt;

&lt;p&gt;Follow these rules for managing contracts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version your APIs.&lt;/strong&gt; Never change an existing endpoint in a breaking way. Add a new version instead. Run both in parallel until all consumers migrate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema your events.&lt;/strong&gt; Define a schema for every event your service publishes. Consumers rely on that schema. Changing it without notice breaks them silently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Publish your contracts.&lt;/strong&gt; Use OpenAPI specs for REST APIs and JSON Schema or Avro for event payloads. Keep them in a shared repository all teams can access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test contracts explicitly.&lt;/strong&gt; Use consumer-driven contract tests to verify that a service's API or event schema still satisfies every consumer's expectations before deploying a change.&lt;/p&gt;

&lt;p&gt;This discipline prevents the integration failures that make multi-service architectures hard to maintain.&lt;/p&gt;




&lt;h2&gt;
  
  
  Observability in a Multi-Service Shopify Ecosystem
&lt;/h2&gt;

&lt;p&gt;You cannot debug a distributed system the way you debug a monolith. A request that touches five services needs end-to-end tracing to diagnose failures.&lt;/p&gt;

&lt;p&gt;Build these three pillars of observability into every service from day one:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Distributed tracing.&lt;/strong&gt; Assign a trace ID to every Shopify event when it enters your system. Pass it through every service call so you can reconstruct the full path of any request across all services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured logging.&lt;/strong&gt; Every service logs structured JSON with the trace ID, service name, event type, and outcome. This makes logs searchable and correlatable across services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics.&lt;/strong&gt; Each service exposes metrics on latency, error rate, throughput, and queue depth. Aggregate them in a central dashboard so you can see the health of the entire ecosystem at a glance.&lt;/p&gt;

&lt;p&gt;The middleware layer is often where observability is most valuable. Our guide on &lt;a href="https://kolachitech.com/resilient-shopify-middleware/" rel="noopener noreferrer"&gt;resilient Shopify middleware&lt;/a&gt; covers how to instrument the integration layer effectively.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Reference Architecture for Shopify Ecosystems
&lt;/h2&gt;

&lt;p&gt;Here is a reference architecture that brings all these patterns together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Shopify Platform
      |
      | (Webhooks + API calls)
      v
  API Gateway
  (Auth, routing, rate limiting)
      |
      +---&amp;gt; Order Service -------&amp;gt; Message Bus &amp;lt;------- Inventory Service
      |           |                     |                      |
      |           v                     |                      v
      |     Local DB (orders)           |              Local DB (inventory)
      |                                 |
      +---&amp;gt; Analytics Service &amp;lt;---------+
      |                                 |
      +---&amp;gt; Notification Service &amp;lt;------+
      |                                 |
      +---&amp;gt; Fulfillment Service &amp;lt;-------+
                  |
                  v
          Local DB (shipments)
                  |
                  v
         3PL / Warehouse API
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every service has its own database. All async communication flows through the message bus. Shopify enters the ecosystem through the API gateway only. Each service can be scaled, deployed, and restarted independently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Multi-service Shopify architecture is not about adding complexity for its own sake.&lt;/p&gt;

&lt;p&gt;It is about giving each part of your ecosystem the freedom to evolve, scale, and fail independently. When done well, it makes large Shopify ecosystems easier to maintain, easier to extend, and far more resilient under load.&lt;/p&gt;

&lt;p&gt;Start with clear domain boundaries. Define communication contracts before writing code. Build resilience patterns into every service. Instrument everything from day one.&lt;/p&gt;

&lt;p&gt;The investment upfront pays back every time you need to scale a single service, onboard a new team, or swap out an integration without touching the rest of the system.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: What is multi-service architecture for Shopify?&lt;/strong&gt;&lt;br&gt;
It is a design approach where your Shopify ecosystem is split into independently deployable services, each owning a specific business domain like orders, inventory, or fulfillment, and communicating through APIs or events.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: When should I switch from a monolith to multi-service Shopify architecture?&lt;/strong&gt;&lt;br&gt;
When your codebase is hard to deploy without breaking things, when different domains need to scale independently, or when separate teams need to work without interfering with each other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What is the difference between service orchestration and choreography in Shopify?&lt;/strong&gt;&lt;br&gt;
Orchestration uses a central service to coordinate workflows in sequence. Choreography lets services react to events autonomously without a central coordinator. Most Shopify ecosystems use both depending on the workflow complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How do services share data without a shared database?&lt;/strong&gt;&lt;br&gt;
Through API calls at runtime, event-driven data propagation, or local caches of frequently read data from other services. Direct cross-service database queries are not permitted in multi-service architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How do I handle distributed transactions across Shopify services?&lt;/strong&gt;&lt;br&gt;
Use the saga pattern. Each step in the transaction publishes an event triggering the next step. Failed steps trigger compensating actions to undo previous steps automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does Shopify support multi-service architecture natively?&lt;/strong&gt;&lt;br&gt;
Shopify provides webhooks, the Admin API, and the GraphQL API as integration points. Multi-service architecture is an application-level pattern you implement in your own infrastructure using those Shopify tools as the integration layer.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Published by&lt;a href="https://kolachitech.com/multi-service-shopify-architecture" rel="noopener noreferrer"&gt; KolachiTech &lt;/a&gt;— Shopify development specialists.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>shopify</category>
      <category>microservices</category>
      <category>ecommerce</category>
    </item>
    <item>
      <title>Handling Eventual Consistency in Shopify Integrations</title>
      <dc:creator>Muhammad Masad Ashraf</dc:creator>
      <pubDate>Mon, 11 May 2026 19:28:11 +0000</pubDate>
      <link>https://dev.to/masadashraf/handling-eventual-consistency-in-shopify-integrations-5dc5</link>
      <guid>https://dev.to/masadashraf/handling-eventual-consistency-in-shopify-integrations-5dc5</guid>
      <description>&lt;p&gt;If you have built a Shopify integration before, you have hit this wall.&lt;/p&gt;

&lt;p&gt;Your warehouse system shows 50 units. Shopify shows 45. An order comes in for 10 units. Now you have a problem you did not plan for.&lt;/p&gt;

&lt;p&gt;This is eventual consistency at work. And it is one of the most common sources of production bugs in Shopify-connected systems.&lt;/p&gt;

&lt;p&gt;This post walks through why it happens and exactly how to handle it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Eventual Consistency Actually Means
&lt;/h2&gt;

&lt;p&gt;Eventual consistency means all systems will reflect the same data state eventually, but not at the same instant.&lt;/p&gt;

&lt;p&gt;It sits at the opposite end of the spectrum from strong consistency, where every read returns the latest write. Strong consistency requires locking and synchronous coordination. That works fine on a single database. It does not scale to distributed commerce infrastructure serving millions of merchants.&lt;/p&gt;

&lt;p&gt;Shopify prioritizes availability and performance over strict consistency. That is the right call for a platform at its scale. But it shifts the responsibility of handling consistency onto you, the integration developer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Happens in Shopify Integrations
&lt;/h2&gt;

&lt;p&gt;There are five common causes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Asynchronous webhooks&lt;/strong&gt;&lt;br&gt;
Shopify webhooks are fire-and-forget. There is no guaranteed delivery order. Retries create duplicates. If your endpoint is down, you miss events entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API rate limits&lt;/strong&gt;&lt;br&gt;
When your sync job hits rate limits, some records update later than others. You end up with a window where data is partially synced.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multiple integration points&lt;/strong&gt;&lt;br&gt;
Your store likely connects to an ERP, CRM, warehouse system, and marketing platform. Each one processes at a different speed and fails in different ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network partitions&lt;/strong&gt;&lt;br&gt;
Any service in your stack can fail mid-sync. When it recovers, it has to reconcile against a Shopify state that moved on without it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concurrent writes&lt;/strong&gt;&lt;br&gt;
Two systems write to the same resource at the same time. Without conflict detection, the last write wins, which may not be the correct one.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Strategies That Actually Work
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Make Every Handler Idempotent
&lt;/h3&gt;

&lt;p&gt;This is the most important one. If you do nothing else, do this.&lt;/p&gt;

&lt;p&gt;When Shopify retries a webhook, your handler will receive the same event twice. Without idempotency, you process the order twice, deduct inventory twice, or send the confirmation email twice.&lt;/p&gt;

&lt;p&gt;The fix is simple. Store a deduplication key before processing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dedupKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;alreadyProcessed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findOne&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;dedupKey&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;alreadyProcessed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;dedupKey&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nf"&gt;processEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Use the Shopify resource ID combined with the event topic as your key. Check it before doing any work. Write it as part of the same transaction if your database supports it.&lt;/p&gt;


&lt;h3&gt;
  
  
  2. Never Process Webhooks Synchronously
&lt;/h3&gt;

&lt;p&gt;Your webhook endpoint should do one thing: accept the payload and return 200.&lt;/p&gt;

&lt;p&gt;Processing the event inside the HTTP handler means any slowness or failure in your downstream systems causes Shopify to mark your endpoint as failed and retry, creating more duplicates.&lt;/p&gt;

&lt;p&gt;Instead, push to a queue immediately:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhooks/orders-create&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-shopify-topic&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Your worker picks it up, processes it at its own pace, and retries independently from your HTTP layer.&lt;/p&gt;


&lt;h3&gt;
  
  
  3. Use Exponential Backoff with Jitter on Retries
&lt;/h3&gt;

&lt;p&gt;When a downstream system is unavailable, retrying immediately makes things worse. All your queued workers collide on the same recovering service.&lt;/p&gt;

&lt;p&gt;Use exponential backoff:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;BASE_DELAY&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;attemptNumber&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;MAX_DELAY&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The jitter (random component) spreads retries across a time window so workers do not all hit the service at the same second.&lt;/p&gt;

&lt;p&gt;Always set a dead-letter queue for events that exhaust all retries. You need to know what failed and why.&lt;/p&gt;


&lt;h3&gt;
  
  
  4. Define Source of Truth Per Data Domain
&lt;/h3&gt;

&lt;p&gt;Most multi-system conflicts come from two services both believing they own the same data.&lt;/p&gt;

&lt;p&gt;Be explicit about ownership:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Data Domain&lt;/th&gt;
&lt;th&gt;Source of Truth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Product catalog&lt;/td&gt;
&lt;td&gt;Shopify or PIM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inventory levels&lt;/td&gt;
&lt;td&gt;Warehouse Management System&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Order status&lt;/td&gt;
&lt;td&gt;Shopify&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customer profiles&lt;/td&gt;
&lt;td&gt;CRM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fulfillment status&lt;/td&gt;
&lt;td&gt;3PL or OMS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Once ownership is defined, all other systems become consumers. They read from the source of truth. They do not write back to it. This eliminates an entire category of conflicts.&lt;/p&gt;


&lt;h3&gt;
  
  
  5. Add a Polling Fallback Alongside Webhooks
&lt;/h3&gt;

&lt;p&gt;Webhooks miss events. It happens. Endpoints go down, Shopify retries expire, network conditions cause silent failures.&lt;/p&gt;

&lt;p&gt;Pair every webhook subscription with a scheduled polling job. The job fetches records where &lt;code&gt;updated_at&lt;/code&gt; is greater than the last successful sync timestamp:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight graphql"&gt;&lt;code&gt;&lt;span class="k"&gt;query&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"updated_at:&amp;gt;2026-05-12T10:00:00Z"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="n"&gt;updatedAt&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="n"&gt;fulfillmentStatus&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Shopify's GraphQL API makes this efficient with cursor-based pagination and filtered queries. This job is your safety net for everything the webhook layer missed.&lt;/p&gt;


&lt;h3&gt;
  
  
  6. Use Optimistic Concurrency Control for Writes
&lt;/h3&gt;

&lt;p&gt;When two systems write to the same resource, you need conflict detection.&lt;/p&gt;

&lt;p&gt;Optimistic concurrency control works like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read the resource and note the current version (use &lt;code&gt;updated_at&lt;/code&gt; from Shopify)&lt;/li&gt;
&lt;li&gt;Make your changes locally&lt;/li&gt;
&lt;li&gt;Before writing, assert the version has not changed&lt;/li&gt;
&lt;li&gt;If it has changed, re-fetch and retry&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This prevents silent overwrites where a stale write replaces a more recent update from another system.&lt;/p&gt;


&lt;h3&gt;
  
  
  7. Set Freshness SLAs Per Data Type
&lt;/h3&gt;

&lt;p&gt;Not all data needs to be equally fresh. Setting explicit staleness windows lets you make smarter sync decisions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Data Type&lt;/th&gt;
&lt;th&gt;Acceptable Staleness&lt;/th&gt;
&lt;th&gt;Sync Method&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Inventory counts&lt;/td&gt;
&lt;td&gt;30-60 seconds&lt;/td&gt;
&lt;td&gt;Webhook + polling fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Order status&lt;/td&gt;
&lt;td&gt;Near real-time&lt;/td&gt;
&lt;td&gt;Webhook with queue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Product descriptions&lt;/td&gt;
&lt;td&gt;5-15 minutes&lt;/td&gt;
&lt;td&gt;Scheduled batch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customer loyalty points&lt;/td&gt;
&lt;td&gt;1-2 minutes&lt;/td&gt;
&lt;td&gt;Event-driven&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing rules&lt;/td&gt;
&lt;td&gt;Near real-time&lt;/td&gt;
&lt;td&gt;Webhook + cache invalidation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Once you have these defined, you can size your infrastructure appropriately instead of treating everything as equally urgent.&lt;/p&gt;


&lt;h2&gt;
  
  
  Monitoring Consistency Issues
&lt;/h2&gt;

&lt;p&gt;Visibility is as important as the fix. Build these in from day one:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lag metrics&lt;/strong&gt; — Measure the time between a Shopify event firing and your system completing the action. Alert when it exceeds your SLA.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DLQ depth alerts&lt;/strong&gt; — A dead-letter queue with growing depth means something is consistently failing. Treat it like an on-call alert.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reconciliation jobs&lt;/strong&gt; — Run periodic jobs that compare your local state against Shopify's current state. Log every discrepancy. Patterns reveal systemic problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Correlation IDs&lt;/strong&gt; — Attach a unique ID to every event on ingestion and propagate it across all services. This lets you trace a single order through your entire stack.&lt;/p&gt;


&lt;h2&gt;
  
  
  Multi-Store Complexity
&lt;/h2&gt;

&lt;p&gt;If you run multiple Shopify stores, consistency challenges multiply.&lt;/p&gt;

&lt;p&gt;The standard approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a centralized sync coordinator that receives events from all stores and fans them out to downstream systems&lt;/li&gt;
&lt;li&gt;Apply a global inventory buffer to absorb sync lag during peak periods&lt;/li&gt;
&lt;li&gt;Use two-phase confirmation for high-value actions: soft-hold the resource first, confirm after all systems acknowledge&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Problem It Solves&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Idempotent handlers&lt;/td&gt;
&lt;td&gt;Duplicate webhook delivery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queue-based ingestion&lt;/td&gt;
&lt;td&gt;Synchronous processing failures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exponential backoff + jitter&lt;/td&gt;
&lt;td&gt;Retry storms after outages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Source-of-truth ownership&lt;/td&gt;
&lt;td&gt;Conflicting cross-system writes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Polling fallback&lt;/td&gt;
&lt;td&gt;Silently missed webhook events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optimistic concurrency&lt;/td&gt;
&lt;td&gt;Stale write overwrites recent update&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DLQ monitoring&lt;/td&gt;
&lt;td&gt;Silent failures going undetected&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Eventual consistency is not something you fix. It is something you design for.&lt;/p&gt;

&lt;p&gt;Start with idempotency. Move event processing off the HTTP layer. Define who owns what data. Build in polling as a fallback. Measure lag continuously.&lt;/p&gt;

&lt;p&gt;These patterns will not eliminate inconsistency windows. But they will stop those windows from becoming customer-facing bugs or business-critical failures.&lt;/p&gt;

&lt;p&gt;Full guide with architecture patterns for multi-store setups and fault tolerance:&lt;br&gt;
&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://kolachitech.com/eventual-consistency-shopify/" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fkolachitech.com%2Fwp-content%2Fuploads%2F2026%2F05%2FHandling-Eventual-Consistency-in-Shopify-Integrations.png" height="450" class="m-0" width="800"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://kolachitech.com/eventual-consistency-shopify/" rel="noopener noreferrer" class="c-link"&gt;
            Handling Eventual Consistency in Shopify Integrations
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            Explore proven strategies to manage Shopify sync delays, distributed consistency, and data conflicts across third-party systems.
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fkolachitech.com%2Fwp-content%2Fuploads%2F2026%2F03%2Fcropped-Favicon-1-1.png" width="512" height="512"&gt;
          kolachitech.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;






&lt;p&gt;&lt;em&gt;What patterns have you found most effective for handling consistency in your Shopify integrations? Drop them in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>shopify</category>
      <category>distributedsystems</category>
      <category>webdev</category>
      <category>backend</category>
    </item>
    <item>
      <title>Designing Resilient Shopify Middleware</title>
      <dc:creator>Muhammad Masad Ashraf</dc:creator>
      <pubDate>Fri, 08 May 2026 21:42:34 +0000</pubDate>
      <link>https://dev.to/masadashraf/designing-resilient-shopify-middleware-4d6i</link>
      <guid>https://dev.to/masadashraf/designing-resilient-shopify-middleware-4d6i</guid>
      <description>&lt;p&gt;Most Shopify middleware works fine in staging. It breaks in production, during a sale, at 11pm on a Friday.&lt;/p&gt;

&lt;p&gt;After auditing dozens of integrations, the failures are almost always the same. Not exotic bugs. Simple architecture decisions made when the system was small and never revisited.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern That Breaks Everything
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Webhook received
  → Direct API call to ERP
    → ERP times out (3s)
      → Shopify retries
        → Duplicate event processed
          → Inventory corrupted
            → Weekend on-call incident
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix is not a try-catch block. It is replacing the pattern entirely.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rule 1: Acknowledge the webhook in under 200ms. Process it asynchronously.&lt;/li&gt;
&lt;li&gt;Rule 2: Write state and outbox entries in one transaction. Let a worker handle external calls.&lt;/li&gt;
&lt;li&gt;Rule 3: Tag every event with an idempotency key before it touches anything.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Reference Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Shopify]   [ERP]   [WMS]   [Marketplaces]
    |        |       |           |
    v        v       v           v
   ----- API Gateway / Webhook Receiver -----
                     |
              [Message Broker]
                     |
   +-----------------+-----------------+
   |                 |                 |
[Transformer]   [Router]          [Audit Log]
   |                 |
   v                 v
[Domain Services: Orders, Inventory, Customers]
                     |
                     v
              [Outbound Workers]
                     |
                     v
   [Shopify Admin API]   [3rd Party APIs]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every layer has one job.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Webhook receivers do not transform.&lt;/li&gt;
&lt;li&gt;Transformers do not call APIs.&lt;/li&gt;
&lt;li&gt;Workers do not own state.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Core Principles Before You Write Any Code
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Principle&lt;/th&gt;
&lt;th&gt;What It Means&lt;/th&gt;
&lt;th&gt;Why It Matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Loose coupling&lt;/td&gt;
&lt;td&gt;Services talk via events, not direct calls&lt;/td&gt;
&lt;td&gt;One slow service does not block others&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Idempotency&lt;/td&gt;
&lt;td&gt;Same event applied twice = same result&lt;/td&gt;
&lt;td&gt;Safe to retry without double charges&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backpressure&lt;/td&gt;
&lt;td&gt;Upstream slows when consumers lag&lt;/td&gt;
&lt;td&gt;No queue explosions or OOM crashes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Every event is traceable end-to-end&lt;/td&gt;
&lt;td&gt;You can actually debug incidents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graceful degradation&lt;/td&gt;
&lt;td&gt;Non-critical features fail soft&lt;/td&gt;
&lt;td&gt;Core checkout never goes down&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Building Blocks and Their Resilience Role
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Resilience Pattern&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;API Gateway&lt;/td&gt;
&lt;td&gt;Receive and validate inbound traffic&lt;/td&gt;
&lt;td&gt;Rate limiting, schema validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Webhook Receiver&lt;/td&gt;
&lt;td&gt;Accept Shopify events&lt;/td&gt;
&lt;td&gt;Fast 200 ACK, async processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Message Broker&lt;/td&gt;
&lt;td&gt;Decouple producers from consumers&lt;/td&gt;
&lt;td&gt;Durable queues, partitioning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transformer&lt;/td&gt;
&lt;td&gt;Map between data formats&lt;/td&gt;
&lt;td&gt;Pure functions, no side effects&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domain Services&lt;/td&gt;
&lt;td&gt;Hold canonical state&lt;/td&gt;
&lt;td&gt;Postgres with row-level locking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Outbound Workers&lt;/td&gt;
&lt;td&gt;Push updates to external systems&lt;/td&gt;
&lt;td&gt;Retries, circuit breakers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audit Log&lt;/td&gt;
&lt;td&gt;Record every event&lt;/td&gt;
&lt;td&gt;Append-only, 30-90 day retention&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Failure Modes You Must Design For
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure Mode&lt;/th&gt;
&lt;th&gt;Trigger&lt;/th&gt;
&lt;th&gt;Mitigation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Shopify rate limits&lt;/td&gt;
&lt;td&gt;Burst traffic, bulk updates&lt;/td&gt;
&lt;td&gt;Token bucket, respect &lt;code&gt;X-Shopify-Shop-Api-Call-Limit&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Webhook retry storms&lt;/td&gt;
&lt;td&gt;Shopify re-delivers for 48h&lt;/td&gt;
&lt;td&gt;Idempotency keys + dedup table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network timeouts&lt;/td&gt;
&lt;td&gt;Flaky third-party API&lt;/td&gt;
&lt;td&gt;Circuit breaker + exponential backoff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bad payloads&lt;/td&gt;
&lt;td&gt;Schema changes upstream&lt;/td&gt;
&lt;td&gt;Strict validation + dead letter queue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Slow consumers&lt;/td&gt;
&lt;td&gt;Heavy DB writes&lt;/td&gt;
&lt;td&gt;Backpressure + queue partitioning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Region outage&lt;/td&gt;
&lt;td&gt;Vendor failure&lt;/td&gt;
&lt;td&gt;Multi-region failover + replay logs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The 5 Resilience Patterns That Actually Work
&lt;/h2&gt;

&lt;h2&gt;
  
  
  1. Circuit Breaker
&lt;/h2&gt;

&lt;p&gt;Watches error rates on outbound calls. Stops calling a failing service when errors spike above threshold. Gives the dependency time to recover.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CLOSED → (error rate &amp;gt; 50% over 30s) → OPEN
OPEN   → (wait 60s)                  → HALF-OPEN
HALF-OPEN → (3 successful probes)    → CLOSED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  2. Exponential Backoff with Jitter
&lt;/h2&gt;

&lt;p&gt;Naive retries create thundering herds. Add jitter to spread the load.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getBackoffDelay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;60000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;base&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;base&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// attempt 0 → ~200ms&lt;/span&gt;
&lt;span class="c1"&gt;// attempt 1 → ~400ms&lt;/span&gt;
&lt;span class="c1"&gt;// attempt 2 → ~800ms&lt;/span&gt;
&lt;span class="c1"&gt;// attempt 5 → ~6400ms (capped at 60s)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  3. Bulkheads
&lt;/h2&gt;

&lt;p&gt;Inventory updates run on a separate worker pool from order pushes. One clogged queue cannot block the other.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Timeouts on Everything
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Every outbound call needs a timeout. No exceptions.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AbortSignal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// 5s for sync calls&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  5. Dead Letter Queue
&lt;/h2&gt;

&lt;p&gt;After &lt;code&gt;N&lt;/code&gt; failed retries, move the event to a DLQ.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alert on growth&lt;/li&gt;
&lt;li&gt;Replay after fixing root cause&lt;/li&gt;
&lt;li&gt;Never let a bad event block the main pipeline&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Idempotency: The Most Important Property
&lt;/h2&gt;

&lt;p&gt;There is no exactly-once delivery.&lt;/p&gt;

&lt;p&gt;There is only:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;at-least-once delivery&lt;/li&gt;
&lt;li&gt;plus idempotent consumers
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processInventoryEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;alreadyProcessed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`idem:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;alreadyProcessed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// skip duplicate&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;trx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;trx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;inventory&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;absolute&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;location_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location_id&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;trx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;outbox&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`idem:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="mi"&gt;604800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 7-day TTL&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Outbox Pattern: One Rule That Eliminates Dual-Write Bugs
&lt;/h2&gt;

&lt;p&gt;When you change internal state and need to notify an external system, never do both in the same step.&lt;/p&gt;

&lt;h2&gt;
  
  
  WITHOUT OUTBOX (dangerous)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. UPDATE inventory SET quantity = 42
2. POST /shopify/inventory_levels
   (fails? state wrong. success? maybe duplicate)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  WITH OUTBOX (safe)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. BEGIN TRANSACTION
   UPDATE inventory SET quantity = 42
   INSERT INTO outbox (payload, status) VALUES (...)
2. COMMIT
3. Separate worker polls outbox
   → calls Shopify
   → marks row done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If Shopify is slow, the row waits.&lt;/p&gt;

&lt;p&gt;If the worker crashes, it restarts and retries.&lt;/p&gt;

&lt;p&gt;Internal state is always consistent.&lt;/p&gt;




&lt;h2&gt;
  
  
  Distributed Shopify Inventory Sync as a Real Example
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;What Middleware Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Receives inventory webhook from Shopify&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Returns 200 immediately, publishes to broker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Transformer converts to canonical event schema&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Inventory service applies delta with optimistic concurrency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Outbox records pending downstream push&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Worker pushes to ERP, WMS, and marketplace channels&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Reconciler validates full state every 5 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Observability Baseline
&lt;/h2&gt;

&lt;p&gt;You cannot operate what you cannot see.&lt;/p&gt;

&lt;p&gt;Four pillars, from day one:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pillar&lt;/th&gt;
&lt;th&gt;Tool Options&lt;/th&gt;
&lt;th&gt;Key Question Answered&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Logs&lt;/td&gt;
&lt;td&gt;Datadog, ELK, Loki&lt;/td&gt;
&lt;td&gt;What happened to event X?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metrics&lt;/td&gt;
&lt;td&gt;Prometheus, CloudWatch&lt;/td&gt;
&lt;td&gt;Is queue lag growing?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Traces&lt;/td&gt;
&lt;td&gt;OpenTelemetry, Jaeger&lt;/td&gt;
&lt;td&gt;Where did request Y spend time?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alerts&lt;/td&gt;
&lt;td&gt;PagerDuty, Opsgenie&lt;/td&gt;
&lt;td&gt;What needs attention right now?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Metrics that catch the most incidents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Queue consumer lag (alert at &amp;gt; 30s)&lt;/li&gt;
&lt;li&gt;Dead letter queue depth (alert at &amp;gt; 100 unprocessed)&lt;/li&gt;
&lt;li&gt;Outbound API error rate (alert at &amp;gt; 2%)&lt;/li&gt;
&lt;li&gt;Webhook 5xx rate (alert at &amp;gt; 0.5%)&lt;/li&gt;
&lt;li&gt;DB connection pool saturation (alert at &amp;gt; 80%)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Race Condition Handling
&lt;/h2&gt;

&lt;p&gt;Two webhooks for the same SKU arrive at the same millisecond.&lt;/p&gt;

&lt;p&gt;Without protection, you lose one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 1: Partition by key (preferred)
&lt;/h2&gt;

&lt;p&gt;Hash SKU to a stream partition.&lt;/p&gt;

&lt;p&gt;All events for one SKU process in order on one consumer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Option 2: Optimistic concurrency
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;version&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;sku&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'TSHIRT-RED-M'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;location_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'wh_LA'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Option 3: Distributed locks (last resort)
&lt;/h2&gt;

&lt;p&gt;Serialize via Redis or ZooKeeper.&lt;/p&gt;

&lt;p&gt;Works, but adds latency and contention.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Mistakes That Cost Merchants Real Money
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Calling Shopify or ERP APIs directly from webhook handlers&lt;/li&gt;
&lt;li&gt;Retries with no backoff (thundering herd on every incident)&lt;/li&gt;
&lt;li&gt;Infinite retries (burns rate limit quota silently)&lt;/li&gt;
&lt;li&gt;Logs without trace IDs (cannot debug incidents after the fact)&lt;/li&gt;
&lt;li&gt;No dead letter queue (one bad event blocks the whole pipeline)&lt;/li&gt;
&lt;li&gt;Ignoring Shopify schema changes in API responses&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Build vs Buy
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple ERP sync, low volume&lt;/td&gt;
&lt;td&gt;Use a connector (Celigo, Pipe17)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom flows, mid volume&lt;/td&gt;
&lt;td&gt;Hybrid: connector + custom services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex multichannel, high volume&lt;/td&gt;
&lt;td&gt;Build a custom middleware platform&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regulated industry&lt;/td&gt;
&lt;td&gt;Build with audit trail baked in&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Full Architecture Guide
&lt;/h2&gt;

&lt;p&gt;This post covers the core patterns.&lt;/p&gt;

&lt;p&gt;The full guide on our blog goes deeper with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complete component-by-component breakdown&lt;/li&gt;
&lt;li&gt;Database schema design notes&lt;/li&gt;
&lt;li&gt;Serverless vs container trade-offs&lt;/li&gt;
&lt;li&gt;REST vs GraphQL decision framework&lt;/li&gt;
&lt;li&gt;Caching layer design&lt;/li&gt;
&lt;li&gt;Phased implementation roadmap&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 Read it here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kolachitech.com/resilient-shopify-middleware/" rel="noopener noreferrer"&gt;Designing Resilient Shopify Middleware&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Resilience is not a feature you bolt on later.&lt;/p&gt;

&lt;p&gt;It is an architectural property.&lt;/p&gt;

&lt;p&gt;The systems that survive Black Friday are usually the systems that were designed to fail safely from day one.&lt;/p&gt;




</description>
      <category>shopify</category>
      <category>webdev</category>
      <category>architecture</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Distributed Shopify Inventory Sync: Architecture Guide for Scale</title>
      <dc:creator>Muhammad Masad Ashraf</dc:creator>
      <pubDate>Thu, 07 May 2026 10:20:22 +0000</pubDate>
      <link>https://dev.to/masadashraf/distributed-shopify-inventory-sync-architecture-guide-for-scale-1b60</link>
      <guid>https://dev.to/masadashraf/distributed-shopify-inventory-sync-architecture-guide-for-scale-1b60</guid>
      <description>&lt;p&gt;Keeping inventory accurate across Shopify, warehouses, and marketplaces sounds simple. At scale, it is one of the hardest engineering problems in ecommerce.&lt;/p&gt;

&lt;p&gt;A single API call after each sale works fine at 200 orders a day. At 20,000 concurrent transactions, it collapses.&lt;/p&gt;

&lt;p&gt;Here is what breaks first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Overselling when two orders hit the same SKU simultaneously&lt;/li&gt;
&lt;li&gt;Stale counts when a warehouse update takes minutes to reflect&lt;/li&gt;
&lt;li&gt;Silent failures when a sync call times out with no retry&lt;/li&gt;
&lt;li&gt;Duplicate decrements when a webhook fires twice&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are predictable failure modes of monolithic sync. A distributed architecture fixes all of them.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Four Layers You Need
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Event Producer Layer&lt;/strong&gt;&lt;br&gt;
Captures inventory change events from Shopify webhooks, WMS, POS, and marketplaces.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Message Queue Layer&lt;/strong&gt;&lt;br&gt;
Events land in a durable queue (Kafka, RabbitMQ, SQS). Nothing gets lost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Microservices Processing Layer&lt;/strong&gt;&lt;br&gt;
Dedicated services consume events, apply business logic, push updates downstream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. State Store Layer&lt;/strong&gt;&lt;br&gt;
Redis holds the current inventory truth. Shopify is updated from here asynchronously.&lt;/p&gt;

&lt;p&gt;Each layer scales independently. Each can fail without taking down the others.&lt;/p&gt;




&lt;h2&gt;
  
  
  Event-Driven Is the Only Foundation That Works
&lt;/h2&gt;

&lt;p&gt;Stop polling. React to events.&lt;/p&gt;

&lt;p&gt;Shopify fires these webhooks you need to capture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;inventory_levels/update&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;orders/create&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;orders/cancelled&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;refunds/create&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each webhook hits your receiver, gets acknowledged immediately, then lands on a queue for async processing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Never process a webhook synchronously inside the HTTP response window.&lt;/strong&gt; Timeouts will cause missed events and your inventory will drift.&lt;/p&gt;




&lt;h2&gt;
  
  
  Microservices Breakdown
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Job&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Webhook Receiver&lt;/td&gt;
&lt;td&gt;Validates HMAC, publishes to queue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Order Event Consumer&lt;/td&gt;
&lt;td&gt;Reads order events, calculates deltas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inventory Adjuster&lt;/td&gt;
&lt;td&gt;Applies changes with optimistic locking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shopify Sync Service&lt;/td&gt;
&lt;td&gt;Pushes updates via GraphQL API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WMS Connector&lt;/td&gt;
&lt;td&gt;Bidirectional warehouse sync&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notification Service&lt;/td&gt;
&lt;td&gt;Low-stock alerts and reorder triggers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;One service, one job. Deploy and scale them independently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Solving Concurrency: Three Patterns
&lt;/h2&gt;

&lt;p&gt;Two orders. One unit left. Both read stock as 1. Both decrement. Stock hits -1.&lt;/p&gt;

&lt;p&gt;Here is how you stop it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optimistic Locking&lt;/strong&gt;&lt;br&gt;
Version numbers on every record. Assert version has not changed before writing. Retry on conflict. Best for low-contention SKUs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pessimistic Locking&lt;/strong&gt;&lt;br&gt;
Lock before reading. One writer at a time. Slower but safe. Use during flash sales.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Atomic Counters (Recommended)&lt;/strong&gt;&lt;br&gt;
Redis &lt;code&gt;DECRBY&lt;/code&gt; is atomic. Use Redis as your inventory counter, sync to Shopify asynchronously. Fastest and most reliable for high volume.&lt;/p&gt;




&lt;h2&gt;
  
  
  Fault Tolerance Checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Dead Letter Queue on every message queue&lt;/li&gt;
&lt;li&gt;Exponential backoff: 1s, 2s, 4s, 8s on API retries&lt;/li&gt;
&lt;li&gt;Idempotency keys on every sync operation&lt;/li&gt;
&lt;li&gt;Circuit breakers to stop hammering degraded services&lt;/li&gt;
&lt;li&gt;Correlation IDs on every event for end-to-end tracing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you skip any of these, you will debug silent inventory drift at 2am.&lt;/p&gt;




&lt;h2&gt;
  
  
  Caching Strategy
&lt;/h2&gt;

&lt;p&gt;Do not hit the Shopify API on every inventory read.&lt;/p&gt;

&lt;p&gt;Use &lt;strong&gt;write-through caching&lt;/strong&gt; with Redis:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Every update writes to Redis first&lt;/li&gt;
&lt;li&gt;Shopify sync happens asynchronously&lt;/li&gt;
&lt;li&gt;Reads always hit Redis (fast)&lt;/li&gt;
&lt;li&gt;Webhook fires? Invalidate the cache key immediately&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;TTL of 30 to 60 seconds works for most inventory read patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  Which Queue Should You Pick?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Queue&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AWS SQS&lt;/td&gt;
&lt;td&gt;Simplest to operate, great for most stores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apache Kafka&lt;/td&gt;
&lt;td&gt;High volume, ordered event streams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RabbitMQ&lt;/td&gt;
&lt;td&gt;Complex routing between services&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;SQS is the right default. Move to Kafka only when you need strict event ordering at millions of events per day.&lt;/p&gt;




&lt;h2&gt;
  
  
  Metrics to Watch
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Queue lag (messages behind)&lt;/li&gt;
&lt;li&gt;Sync latency (event to Shopify update)&lt;/li&gt;
&lt;li&gt;DLQ message count&lt;/li&gt;
&lt;li&gt;Inventory mismatch rate&lt;/li&gt;
&lt;li&gt;API rate limit hits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Set alerts on DLQ growth and queue lag. These are your earliest warning signals before inventory starts drifting.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;A distributed Shopify inventory sync system rests on four things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Event-driven ingestion&lt;/li&gt;
&lt;li&gt;Queue-based async processing&lt;/li&gt;
&lt;li&gt;Atomic counters for concurrency&lt;/li&gt;
&lt;li&gt;Idempotent operations throughout&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Get these right and oversells, stale counts, and silent sync failures become engineering history rather than daily incidents.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://kolachitech.com/distributed-shopify-inventory-sync/" rel="noopener noreferrer"&gt;KolachiTech&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>shopify</category>
      <category>distributedsystems</category>
      <category>microservices</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How Race Conditions Silently Break Shopify Orders (And How to Fix Them)</title>
      <dc:creator>Muhammad Masad Ashraf</dc:creator>
      <pubDate>Wed, 06 May 2026 15:14:22 +0000</pubDate>
      <link>https://dev.to/masadashraf/how-race-conditions-silently-break-shopify-orders-and-how-to-fix-them-2326</link>
      <guid>https://dev.to/masadashraf/how-race-conditions-silently-break-shopify-orders-and-how-to-fix-them-2326</guid>
      <description>&lt;p&gt;Flash sales. Product drops. Viral moments.&lt;/p&gt;

&lt;p&gt;They're what every Shopify merchant dreams of — and exactly when race conditions turn that dream into a support ticket nightmare.&lt;/p&gt;

&lt;p&gt;Let me break down what's actually happening, why Shopify's built-in protections aren't enough, and the specific patterns that fix it.&lt;/p&gt;

&lt;p&gt;What Is a Race Condition in Shopify?&lt;br&gt;
A race condition happens when two or more processes read and write the same resource simultaneously, and the result depends on which one finishes first.&lt;/p&gt;

&lt;p&gt;In order processing, the classic scenario:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Customer A checks stock → 1 unit available → proceeds to checkout&lt;/li&gt;
&lt;li&gt;Customer B checks stock → 1 unit available → proceeds to checkout&lt;/li&gt;
&lt;li&gt;Customer A completes purchase → inventory = 0&lt;/li&gt;
&lt;li&gt;Customer B completes purchase → inventory = -1&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both reads happened before either write completed. That gap between read and write is where race conditions live.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Shopify's Defaults Aren't Enough&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Shopify has built-in inventory tracking and "deny checkout when out of stock" settings. For basic stores, that's fine.&lt;/p&gt;

&lt;p&gt;It breaks down the moment you add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Third-party fulfillment apps that write inventory via API&lt;/li&gt;
&lt;li&gt;Custom order processing logic in private apps&lt;/li&gt;
&lt;li&gt;Webhooks firing to external systems that write back to Shopify&lt;/li&gt;
&lt;li&gt;Multi-location inventory with allocation logic&lt;/li&gt;
&lt;li&gt;Headless storefronts (Hydrogen or custom) with custom cart logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once any of these exist, you are responsible for managing concurrency in your own code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix #1: Use Atomic GraphQL Mutations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Shopify GraphQL API's inventoryAdjustQuantities mutation applies inventory changes atomically. Instead of the dangerous read-calculate-write pattern:&lt;/p&gt;

&lt;p&gt;❌ Read stock: 5&lt;br&gt;
   Calculate: 5 - 1 = 4&lt;br&gt;
   Write: 4&lt;br&gt;
   (Another process can read 5 between your read and write)&lt;/p&gt;

&lt;p&gt;You send a delta:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight graphql"&gt;&lt;code&gt;&lt;span class="k"&gt;mutation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="n"&gt;inventoryAdjustQuantities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"correction"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"available"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="n"&gt;inventoryItemId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gid://shopify/InventoryItem/12345"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="n"&gt;locationId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gid://shopify/Location/67890"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;inventoryAdjustmentGroup&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="n"&gt;createdAt&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="n"&gt;changes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Shopify applies the delta at the database level. No read-before-write gap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix #2: Idempotent Webhook Handlers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Shopify retries webhook delivery up to 19 times if your endpoint doesn't return a 200 OK within 5 seconds.&lt;/p&gt;

&lt;p&gt;Without idempotency, one real order can trigger:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1. Multiple fulfillment requests&lt;/li&gt;
&lt;li&gt;2. Duplicate customer notifications&lt;/li&gt;
&lt;li&gt;3. Repeated inventory deductions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every Shopify webhook payload includes an X-Shopify-Webhook-Id header. Here's the pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleOrderWebhook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;webhookId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-shopify-webhook-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

  &lt;span class="c1"&gt;// 1. Check if we've already processed this webhook&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webhookLog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findOne&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;webhookId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Already processed — return 200 and stop here&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Already processed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// 2. Record this webhook BEFORE processing&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webhookLog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;processedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// 3. Now process your order logic safely&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures each logical event is processed exactly once, regardless of how many times Shopify retries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix #3: Distributed Locking with Redis&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For flash sales or any scenario with tight inventory, pessimistic locking prevents two processes from modifying the same resource simultaneously.&lt;/p&gt;

&lt;p&gt;Redis SET NX EX (set if not exists, with expiry) is the standard pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ioredis&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;acquireLock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resourceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ttlSeconds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lockKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`lock:product:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;resourceId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lockValue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`process-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// NX = only set if not exists, EX = expire after ttl seconds&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lockKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lockValue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;NX&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;EX&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ttlSeconds&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;lockValue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Lock acquired&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Lock already held by another process&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;releaseLock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resourceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lockValue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lockKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`lock:product:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;resourceId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lockKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Only release if we own the lock&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;lockValue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;del&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lockKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Usage in your order processing logic&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processOrderWithLock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;productId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;acquireLock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;productId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Could not acquire lock — try again&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Safe to read and write inventory here&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;updateInventory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;productId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createFulfillment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;releaseLock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;productId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix #4: Queue-Based Order Processing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most robust solution: eliminate the race condition entirely by serializing writes through a queue.&lt;/p&gt;

&lt;p&gt;Webhook Arrives&lt;br&gt;
      ↓&lt;br&gt;
Push to Queue (Redis / SQS / RabbitMQ)&lt;br&gt;
      ↓&lt;br&gt;
Worker picks next job&lt;br&gt;
      ↓&lt;br&gt;
Acquire lock on resource&lt;br&gt;
      ↓&lt;br&gt;
Process order logic&lt;br&gt;
      ↓&lt;br&gt;
Release lock → Acknowledge job&lt;/p&gt;

&lt;p&gt;With a queue, you also get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rate limiting writes to the Shopify API (staying within API limits)&lt;/li&gt;
&lt;li&gt;Safe retries on failures without duplicating successful operations&lt;/li&gt;
&lt;li&gt;Independent scaling of workers vs API layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Testing for Race Conditions (Before Production Finds Them)&lt;/strong&gt;&lt;br&gt;
Concurrent request simulation:&lt;br&gt;
&lt;code&gt;# k6 load test — 100 virtual users hitting checkout simultaneously&lt;br&gt;
k6 run --vus 100 --duration 10s checkout-test.js&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Chaos test: Deliberately delay your webhook endpoint to force retries. Count how many times your processing logic runs per order.&lt;/p&gt;

&lt;p&gt;Database audit: Log every inventory write with the associated order ID. Duplicate entries for the same order ID = confirmed race condition.&lt;/p&gt;

&lt;p&gt;Negative inventory monitoring:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Run this on a cron job&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;checkNegativeInventory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;products&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;shopify&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;250&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;oversold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;variants&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;inventory_quantity&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;oversold&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;alertSlack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`⚠️ Negative inventory detected: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;oversold&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;, &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Where to Start&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Don't try to implement everything at once. Here's the priority order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Idempotent webhook handlers — highest impact, lowest complexity&lt;/li&gt;
&lt;li&gt;Atomic GraphQL mutations for inventory writes — swap out read-write patterns&lt;/li&gt;
&lt;li&gt;Distributed locking — for any custom inventory logic&lt;/li&gt;
&lt;li&gt;Queue-based processing — as order volume grows&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Race conditions aren't a Shopify problem. They're a distributed systems problem that shows up in Shopify stores. The good news is the fixes are well-understood — they just have to be deliberately implemented.&lt;/p&gt;

&lt;p&gt;Full guide with architecture patterns and a checklist: &lt;a href="//kolachitech.com/race-conditions-shopify-orders"&gt;Race conditions Shopify Orders&lt;/a&gt;&lt;/p&gt;

</description>
      <category>backend</category>
      <category>systemdesign</category>
      <category>tutorial</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Shopify Idempotency Strategies: A Developer's Survival Guide</title>
      <dc:creator>Muhammad Masad Ashraf</dc:creator>
      <pubDate>Tue, 05 May 2026 20:11:23 +0000</pubDate>
      <link>https://dev.to/masadashraf/shopify-idempotency-strategies-a-developers-survival-guide-57k1</link>
      <guid>https://dev.to/masadashraf/shopify-idempotency-strategies-a-developers-survival-guide-57k1</guid>
      <description>&lt;p&gt;You built a Shopify integration. It works great. Until it doesn't.&lt;/p&gt;

&lt;p&gt;A customer's network drops right after the checkout POST fires. They refresh. The browser retries. Shopify receives two identical order creation requests. You now have a double order, a double charge, and a very confused customer.&lt;/p&gt;

&lt;p&gt;This is an idempotency problem. And it is more common than most Shopify developers expect.&lt;/p&gt;

&lt;p&gt;What Is Idempotency in a Shopify Context?&lt;br&gt;
An operation is idempotent if running it once or a hundred times produces the same result. In Shopify integrations, this covers:&lt;/p&gt;

&lt;p&gt;• API mutations (order creation, fulfillment, refunds)&lt;br&gt;
• Webhook processing (order.paid, inventory.update)&lt;br&gt;
• Background job retries&lt;br&gt;
• Third-party sync operations&lt;/p&gt;

&lt;p&gt;Strategy 1: Use Idempotency Keys on REST API Calls&lt;br&gt;
Shopify's REST Admin API accepts an Idempotency-Key header on POST requests. Generate a UUID v4 per logical operation and send it with the request.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;v4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;uuidv4&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;uuid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Generate ONCE per business event, store it const idempotencyKey = uuidv4();  await fetch('https://your-store.myshopify.com/admin/api/2024-01/orders.json', {   method: 'POST',   headers: {     'Content-Type': 'application/json',     'X-Shopify-Access-Token': process.env.SHOPIFY_TOKEN,     'Idempotency-Key': idempotencyKey, // same key on every retry   },   body: JSON.stringify(orderPayload) });&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key rule: Generate the key once. Store it before the request. Reuse it on every retry.&lt;/p&gt;

&lt;p&gt;Strategy 2: Deduplicate Webhooks with X-Shopify-Webhook-Id&lt;br&gt;
Shopify does not guarantee exactly-once webhook delivery. Every delivery includes an X-Shopify-Webhook-Id header. Use it as your deduplication key.&lt;/p&gt;

&lt;p&gt;Express.js webhook handler&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhooks/orders/paid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;webhookId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-shopify-webhook-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check Redis for this ID (48hr TTL)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;seen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`webhook:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;seen&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;     &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;already_processed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;   &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mark as seen BEFORE processing&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`webhook:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;webhookId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;172800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;    &lt;span class="c1"&gt;// Now safely process   await processOrderPaid(req.body);   res.status(200).json({ status: 'ok' }); });&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why store BEFORE processing? If processing fails and you stored AFTER, the next retry will be treated as a duplicate and skipped. Store the key first, then process, and let your retry logic handle actual failures separately.&lt;/p&gt;

&lt;p&gt;Strategy 3: Safe Retries with Exponential Backoff&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;retryWithBackoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;   &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;     &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// always pass same key     } catch (err) {       if (attempt === maxRetries - 1) throw err;       const baseDelay = Math.pow(2, attempt) * 1000;       const jitter = Math.random() * 400 - 200;       await sleep(baseDelay + jitter);     }   } }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The jitter prevents thundering herd problems when many clients retry simultaneously after an outage.&lt;/p&gt;

&lt;p&gt;Strategy 4: Database-Level Unique Constraints&lt;br&gt;
App-level checks can fail under race conditions. Two requests can both pass your "does this exist?" check before either writes. Add a database constraint as the hard backstop.&lt;/p&gt;

&lt;p&gt;PostgreSQL: prevent duplicate order processing&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_idempotency&lt;/span&gt;   &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;processed_events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;idempotency_key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;-- Catch the constraint violation in your app try {   await db.query(     'INSERT INTO processed_events (idempotency_key, result) VALUES ($1, $2)',     [key, JSON.stringify(result)]   ); } catch (err) {   if (err.code === '23505') { // unique violation     const existing = await db.query(       'SELECT result FROM processed_events WHERE idempotency_key = $1',       [key]     );     return JSON.parse(existing.rows[0].result); // return cached result   }   throw err; }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GraphQL: No Native Header, So Use Upserts&lt;br&gt;
Shopify's GraphQL API does not support the Idempotency-Key header. Design mutations to be naturally idempotent instead.&lt;/p&gt;

&lt;p&gt;Use productUpdate (upsert) instead of productCreate where possible&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight graphql"&gt;&lt;code&gt;&lt;span class="k"&gt;mutation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;UpdateProductInventory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;!,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$qty&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;!)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;inventoryAdjustQuantity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="n"&gt;inventoryItemId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$id&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="n"&gt;availableDelta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$qty&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="n"&gt;inventoryLevel&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;available&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="n"&gt;userErrors&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For operations with no upsert, check for existing records first and return the existing result if found.&lt;/p&gt;

&lt;p&gt;Quick Reference: What to Use Where&lt;/p&gt;

&lt;p&gt;REST mutations: Idempotency-Key header + store key before request&lt;br&gt;
GraphQL mutations: Upserts + client-side dedup table&lt;br&gt;
Webhooks: X-Shopify-Webhook-Id + Redis with 48hr TTL&lt;br&gt;
Background jobs: Job ID as idempotency key + status tracking&lt;br&gt;
Database: Unique constraint on idempotency_key as final guard&lt;/p&gt;

&lt;p&gt;Further Reading&lt;br&gt;
&lt;a href="https://kolachitech.com/shopify-webhooks/" rel="noopener noreferrer"&gt;Shopify Webhooks Deep Dive&lt;/a&gt; | &lt;a href="https://kolachitech.com/fault-tolerant-shopify-integration/" rel="noopener noreferrer"&gt;Fault-Tolerant Shopify Integration&lt;/a&gt; | &lt;a href="https://kolachitech.com/queue-based-shopify-webhook-processing/" rel="noopener noreferrer"&gt;Queue-Based Webhook Processing&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What idempotency patterns do you use in your Shopify stack? &lt;br&gt;
&lt;a href="https://kolachitech.com/idempotency-strategies-in-shopify-systems/" rel="noopener noreferrer"&gt;Read more&lt;/a&gt; &lt;/p&gt;

</description>
      <category>shopify</category>
      <category>development</category>
    </item>
    <item>
      <title>Queue-Based Shopify Webhook Processing: Why It Matters and How to Build It</title>
      <dc:creator>Muhammad Masad Ashraf</dc:creator>
      <pubDate>Fri, 01 May 2026 23:39:00 +0000</pubDate>
      <link>https://dev.to/masadashraf/queue-based-shopify-webhook-processing-why-it-matters-and-how-to-build-it-5gff</link>
      <guid>https://dev.to/masadashraf/queue-based-shopify-webhook-processing-why-it-matters-and-how-to-build-it-5gff</guid>
      <description>&lt;p&gt;Shopify webhooks are HTTP POST requests fired on store events: orders, inventory updates, checkouts, customers. By default, your endpoint handles them synchronously. That works until traffic spikes.&lt;/p&gt;

&lt;p&gt;The async pattern:&lt;br&gt;
&lt;code&gt;Shopify fires webhook&lt;br&gt;
  → Receiver validates HMAC, pushes to queue, returns 200&lt;br&gt;
  → Worker pulls job, runs business logic, marks complete&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Your endpoint responds in milliseconds. Your worker takes as long as it needs.&lt;/p&gt;

&lt;p&gt;The two things that actually trip people up in production:&lt;/p&gt;

&lt;p&gt;Idempotency. Shopify retries on failure. If your worker processes a duplicate &lt;strong&gt;&lt;em&gt;orders/paid&lt;/em&gt;&lt;/strong&gt;, you might charge a customer twice. Fix: check the webhook ID in Redis before processing. Skip if seen. Simple.&lt;/p&gt;

&lt;p&gt;Dead letter queues. Jobs that fail all retries need somewhere to go. Log them, alert on them, reprocess them manually. Without a DLQ, failed jobs vanish and you find out from a merchant complaint.&lt;/p&gt;

&lt;p&gt;Stack recommendations: BullMQ + Redis for Node.js, Laravel Queues for PHP, Celery for Python, SQS + Lambda for AWS-native.&lt;/p&gt;

&lt;p&gt;Full post with code, priority queue tiers by webhook topic, and a go-live checklist: &lt;a href="https://kolachitech.com/queue-based-shopify-webhook-processing/" rel="noopener noreferrer"&gt;https://kolachitech.com/queue-based-shopify-webhook-processing/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>shopify</category>
      <category>webhooks</category>
      <category>architecture</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
