DEV Community

Cover image for Async Event Processing Architectures for Shopify — A Developer's Guide
Muhammad Masad Ashraf
Muhammad Masad Ashraf

Posted on • Originally published at kolachitech.com

Async Event Processing Architectures for Shopify — A Developer's Guide

Synchronous processing kills Shopify scalability. Every blocked operation is a timeout waiting to happen.

This guide covers the async event processing patterns that production Shopify systems actually use. We'll go from webhook receiver basics all the way to saga orchestration and event sourcing.


Why Async Matters for Shopify

Shopify fires webhooks for orders, fulfillments, inventory changes, customer updates, and more. If you process these synchronously inside your endpoint, you will hit timeouts, lose data, and bring down your system during traffic spikes.

The fix is simple in concept: accept fast, process later.

Your webhook endpoint acknowledges receipt in milliseconds. A background worker handles the actual business logic. This keeps your store responsive and your integrations stable.


Pattern 1: The Webhook Receiver

Your endpoint has one job: validate the request, push to a queue, return HTTP 200 — all within 5 seconds.

import express from 'express';
import crypto from 'crypto';
import { OrderQueue } from './queues';

const app = express();
app.use(express.raw({ type: 'application/json' }));

app.post('/webhooks/orders/create', async (req, res) => {
  // 1. Validate HMAC signature
  const hmac = req.headers['x-shopify-hmac-sha256'];
  const hash = crypto
    .createHmac('sha256', process.env.SHOPIFY_SECRET)
    .update(req.body)
    .digest('base64');

  if (hash !== hmac) {
    return res.status(401).send('Unauthorized');
  }

  // 2. Parse and enqueue
  const order = JSON.parse(req.body);
  await OrderQueue.add('process', order, {
    jobId: `order_${order.id}`, // Deduplication key
    attempts: 5,
    backoff: { type: 'exponential', delay: 2000 }
  });

  // 3. Respond immediately
  res.status(200).send('OK');
});
Enter fullscreen mode Exit fullscreen mode

Never run ERP calls, inventory syncs, or email sends inside this handler. Push to queue and return.


Pattern 2: Idempotent Queue Workers

Shopify retries webhooks for 48 hours. You will receive duplicates. Every worker must handle this gracefully.

import { Worker } from 'bullmq';
import { redis } from './redis';
import { db } from './db';

const worker = new Worker('orders', async (job) => {
  const order = job.data;

  // Guard against duplicate processing
  const alreadyProcessed = await db.processedEvents.findUnique({
    where: { shopifyOrderId: String(order.id) }
  });

  if (alreadyProcessed) {
    console.log(`Order ${order.id} already processed. Skipping.`);
    return { skipped: true };
  }

  // Run actual business logic
  await Promise.allSettled([
    syncInventory(order),
    notifyFulfillmentTeam(order),
    updateAnalytics(order),
    syncToERP(order)
  ]);

  // Mark as processed (idempotency record)
  await db.processedEvents.create({
    data: { shopifyOrderId: String(order.id), processedAt: new Date() }
  });

  return { success: true };
}, { connection: redis });
Enter fullscreen mode Exit fullscreen mode

The processedEvents table is your source of truth for deduplication. Always check it first.


Pattern 3: Dead Letter Queue Handling

Not all failures are transient. Some events fail because of bad data, schema mismatches, or downstream service bugs. These need to be isolated instead of retrying forever.

import { Queue, Worker, QueueEvents } from 'bullmq';

const mainQueue = new Queue('orders', { connection: redis });
const dlq = new Queue('orders-dlq', { connection: redis });

const worker = new Worker('orders', processOrder, {
  connection: redis,
});

worker.on('failed', async (job, err) => {
  if (job.attemptsMade >= job.opts.attempts) {
    console.error(`Moving job ${job.id} to DLQ`, err.message);

    await dlq.add('failed', {
      originalJob: job.data,
      error: err.message,
      failedAt: new Date().toISOString(),
      attempts: job.attemptsMade
    });
  }
});
Enter fullscreen mode Exit fullscreen mode

Set up alerts on DLQ size. Any event in there needs human review.


Pattern 4: Out-of-Order Event Handling

Shopify makes no delivery order guarantees. A fulfillments/create webhook can arrive before orders/create. Your system must handle this.

async function handleFulfillmentCreated(event) {
  const order = await db.orders.findUnique({
    where: { shopifyId: String(event.order_id) }
  });

  // Order not yet created locally — reschedule
  if (!order) {
    console.warn(`Order ${event.order_id} not found, rescheduling fulfillment event`);
    await fulfillmentQueue.add('retry', event, {
      delay: 10000, // Wait 10 seconds before retry
      attempts: 10
    });
    return;
  }

  // Safe to proceed
  await db.fulfillments.create({
    data: {
      orderId: order.id,
      shopifyFulfillmentId: String(event.id),
      status: event.status,
      trackingNumber: event.tracking_number
    }
  });
}
Enter fullscreen mode Exit fullscreen mode

Pattern 5: Saga Pattern for Multi-Service Coordination

When an order needs to update inventory, charge payment, and trigger fulfillment, you need transactional guarantees across services. Sagas handle this with compensation logic.

async function createOrderSaga(order) {
  const completedSteps = [];

  try {
    // Step 1
    await reserveInventory(order);
    completedSteps.push('inventory');

    // Step 2
    await chargePayment(order);
    completedSteps.push('payment');

    // Step 3
    await createFulfillment(order);
    completedSteps.push('fulfillment');

    // Step 4
    await sendOrderConfirmation(order);

    return { success: true };

  } catch (error) {
    console.error(`Saga failed at step after: ${completedSteps.join(', ')}`);

    // Compensate completed steps in reverse
    if (completedSteps.includes('payment')) {
      await refundPayment(order);
    }
    if (completedSteps.includes('inventory')) {
      await releaseInventory(order);
    }

    throw error;
  }
}
Enter fullscreen mode Exit fullscreen mode

Sagas trade atomicity for availability. Each step must also be idempotent.


Pattern 6: Circuit Breaker for External APIs

When your ERP, WMS, or third-party API goes down, you don't want to hammer it with retries. Circuit breakers stop the calls automatically and resume when the service recovers.

import CircuitBreaker from 'opossum';

const options = {
  timeout: 5000,       // Fail after 5 seconds
  errorThresholdPercentage: 50, // Open after 50% failures
  resetTimeout: 30000  // Try again after 30 seconds
};

const erpBreaker = new CircuitBreaker(syncOrderToERP, options);

erpBreaker.fallback(async (order) => {
  // Queue for later instead of failing hard
  await retryQueue.add('erp-sync', order, { delay: 60000 });
  return { queued: true };
});

erpBreaker.on('open', () => {
  console.error('ERP circuit breaker OPEN - service is down');
});

erpBreaker.on('close', () => {
  console.log('ERP circuit breaker CLOSED - service recovered');
});

// Use in your worker
async function processOrder(order) {
  await erpBreaker.fire(order);
}
Enter fullscreen mode Exit fullscreen mode

Queue Infrastructure Reference

Tool Best For Persistence Delay Jobs Priority
BullMQ (Redis) Node.js apps Yes Yes Yes
RabbitMQ Multi-language Yes Plugin Yes
AWS SQS Serverless Yes Yes Via FIFO
Google Pub/Sub GCP workloads Yes No No
Kafka High-throughput streams Yes No Partition-based

For most Shopify app use cases, BullMQ with Redis gives you the best balance of features and simplicity.


Monitoring Checklist

Track these metrics or you're flying blind:

  • Queue depth — growing consistently = you need more workers
  • Job processing time (p95) — slowdowns show up here first
  • Failure rate — spikes indicate downstream issues
  • DLQ size — any growth needs immediate investigation
  • Retry count — high values suggest a systemic problem

Log every state transition with structured fields:

logger.info('order.processing.started', {
  orderId: order.id,
  jobId: job.id,
  workerId: worker.id,
  timestamp: Date.now()
});
Enter fullscreen mode Exit fullscreen mode

Key Takeaways

  • Respond to webhooks in under 5 seconds — always push to queue first
  • Make every handler idempotent — duplicates are guaranteed
  • Use exponential backoff — don't hammer failing services
  • Send to DLQ after max retries — don't retry poisoned messages forever
  • Handle out-of-order delivery — reschedule, don't crash
  • Use sagas for multi-step transactions — compensate on failure
  • Monitor queue depth, latency, and DLQ size continuously

Originally published on Kolachi Tech. Kolachi Tech specializes in Shopify development, integrations, and custom app architecture.

Top comments (0)