<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yevhen Salitrynskyi</title>
    <description>The latest articles on DEV Community by Yevhen Salitrynskyi (@ysalitrynskyi).</description>
    <link>https://dev.to/ysalitrynskyi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1262238%2F12bcdbca-260a-485e-8e4d-a67754d8095c.jpeg</url>
      <title>DEV Community: Yevhen Salitrynskyi</title>
      <link>https://dev.to/ysalitrynskyi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ysalitrynskyi"/>
    <language>en</language>
    <item>
      <title>How I built a reliable webhook queue in Rust (retries, idempotency, DLQ, schedules, workflows, real-time)</title>
      <dc:creator>Yevhen Salitrynskyi</dc:creator>
      <pubDate>Mon, 22 Dec 2025 04:23:08 +0000</pubDate>
      <link>https://dev.to/ysalitrynskyi/how-i-built-a-reliable-webhook-queue-in-rust-retries-idempotency-dlq-schedules-workflows-2o7n</link>
      <guid>https://dev.to/ysalitrynskyi/how-i-built-a-reliable-webhook-queue-in-rust-retries-idempotency-dlq-schedules-workflows-2o7n</guid>
      <description>&lt;p&gt;Webhooks are deceptively hard to run in production. If you’ve shipped them at scale, you’ve probably hit at least one of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A customer says “we never got the webhook,” but you can’t prove what happened.&lt;/li&gt;
&lt;li&gt;Retries amplify outages (your retries + their retries = thundering herd).&lt;/li&gt;
&lt;li&gt;You implement idempotency inconsistently and pay for it later.&lt;/li&gt;
&lt;li&gt;Failures overwrite context, and the payload that caused the issue is gone.&lt;/li&gt;
&lt;li&gt;You end up building “just enough queue + retry logic” in every service.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After repeating that loop too many times, I built &lt;strong&gt;Spooled&lt;/strong&gt;: an open-source webhook queue and background job infrastructure &lt;strong&gt;built in Rust&lt;/strong&gt;, designed around reliability and operational visibility.&lt;/p&gt;

&lt;h2&gt;What I wanted (non‑negotiables)&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reliable delivery&lt;/strong&gt;: retries with backoff and clear terminal states&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency&lt;/strong&gt;: safe replays without duplicate side effects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dead-letter queue (DLQ)&lt;/strong&gt;: keep failed jobs + error context; retry/purge when ready&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bulk operations&lt;/strong&gt;: enqueue jobs in batches and manage failures at scale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cron schedules&lt;/strong&gt;: recurring jobs with timezone support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflows&lt;/strong&gt;: job dependencies (DAG-style execution)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time visibility&lt;/strong&gt;: live job/queue updates (SSE + WebSocket)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dual API&lt;/strong&gt;: REST (&lt;code&gt;:8080&lt;/code&gt;) + gRPC (&lt;code&gt;:50051&lt;/code&gt;) for high-throughput workers&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The core design&lt;/h2&gt;

&lt;p&gt;At a high level:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The API accepts jobs (webhooks are just another job type).&lt;/li&gt;
&lt;li&gt;Jobs are stored durably in &lt;strong&gt;PostgreSQL&lt;/strong&gt; with explicit state transitions.&lt;/li&gt;
&lt;li&gt;Workers claim jobs using DB-backed concurrency patterns (e.g., &lt;code&gt;FOR UPDATE SKIP LOCKED&lt;/code&gt;) so multiple workers can scale safely.&lt;/li&gt;
&lt;li&gt;Every important transition can be observed in real time via &lt;strong&gt;SSE/WebSocket&lt;/strong&gt;, so the dashboard doesn’t lie.&lt;/li&gt;
&lt;li&gt;When retries are exhausted, jobs land in a &lt;strong&gt;DLQ&lt;/strong&gt; with enough context to debug and recover.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This gives you the two properties that matter most for webhooks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Durability&lt;/strong&gt;: jobs survive process restarts and deploys&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traceability&lt;/strong&gt;: it’s easy to answer “what happened to job X?”&lt;/li&gt;
&lt;/ul&gt;
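
&lt;p&gt;To make the design above concrete, here is a minimal sketch of explicit state transitions plus the kind of claim query a worker might run. It is not Spooled’s actual code: the state names, the &lt;code&gt;jobs&lt;/code&gt; table, and its columns are assumptions chosen for illustration. Only the &lt;code&gt;FOR UPDATE SKIP LOCKED&lt;/code&gt; clause comes from the description above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
# Hypothetical job states and allowed moves; the real state names may differ.
ALLOWED_TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"completed", "pending", "dead"},  # pending = scheduled for a retry
    "completed": set(),
    "dead": {"pending"},  # an operator retries a job from the DLQ
}

# Claim query a worker might run on PostgreSQL. Table and column names are assumptions.
# SKIP LOCKED makes concurrent workers skip rows that another transaction already locked,
# so two workers never claim the same job.
CLAIM_SQL = """
UPDATE jobs
SET status = 'processing'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def transition(current, new):
    """Return the new state, or raise if the move is not explicitly allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} to {new}")
    return new


if __name__ == "__main__":
    state = "pending"
    # One failed attempt (back to pending), then a successful one.
    for step in ("processing", "pending", "processing", "completed"):
        state = transition(state, step)
        print(state)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the transition table in one place is what makes “what happened to job X?” answerable: every change of state goes through a single checked path that can also be logged or streamed.&lt;/p&gt;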

&lt;h2&gt;Retries that don’t cause incidents&lt;/h2&gt;

&lt;p&gt;Retries are necessary, but “retry immediately forever” is how you take systems down.&lt;/p&gt;

&lt;p&gt;Spooled uses a retry model with backoff and terminal outcomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;transient failures get retried with increasing delays&lt;/li&gt;
&lt;li&gt;persistent failures end in DLQ instead of looping&lt;/li&gt;
&lt;li&gt;operators can re-run jobs safely (especially with idempotency keys)&lt;/li&gt;
&lt;/ul&gt;
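
&lt;p&gt;The retry model above can be sketched in a few lines. This is an illustration under stated assumptions, not Spooled’s implementation: the base delay, the cap, the limit of five attempts, and the use of random jitter are all values and choices I picked for the example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import random


def backoff_ceiling(attempt, base_seconds=1.0, cap_seconds=300.0):
    """Upper bound for the delay before retry number `attempt` (1-based)."""
    return min(cap_seconds, base_seconds * 2 ** (attempt - 1))


def backoff_delay(attempt, rng=random.random, **kwargs):
    """Full jitter: pick a random delay between 0 and the ceiling."""
    return rng() * backoff_ceiling(attempt, **kwargs)


def after_failure(attempts_made, max_attempts=5):
    """Decide what happens to a job that has just failed."""
    remaining = max(0, max_attempts - attempts_made)
    return "retry" if remaining else "dead_letter"


if __name__ == "__main__":
    for attempt in range(1, 6):
        print(attempt, backoff_ceiling(attempt), after_failure(attempt))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The ceiling doubles on every attempt and stops growing at the cap. The jitter spreads retries from many jobs over time, so they do not all hit a recovering endpoint at the same instant.&lt;/p&gt;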

&lt;h2&gt;Idempotency: making retries safe&lt;/h2&gt;

&lt;p&gt;A retry system is only “reliable” if it’s safe to replay work.&lt;/p&gt;

&lt;p&gt;Spooled supports an &lt;code&gt;idempotency_key&lt;/code&gt; so you can prevent duplicates when external systems retry the same event (Stripe, GitHub, other payment providers, etc.). Processing is still at-least-once, but because a repeated key can be detected and dropped, you can aim for &lt;strong&gt;exactly-once effects&lt;/strong&gt; on top of it.&lt;/p&gt;
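
&lt;p&gt;A minimal in-memory sketch of that idea follows. The &lt;code&gt;JobStore&lt;/code&gt; class and its &lt;code&gt;enqueue&lt;/code&gt; method are hypothetical names for illustration. In a PostgreSQL-backed system the same guarantee would more likely come from a unique constraint on the key, which is also an assumption on my part.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import itertools


class JobStore:
    """Toy job store that deduplicates enqueues by idempotency key."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._by_key = {}  # idempotency_key mapped to job id
        self.jobs = {}     # job id mapped to payload

    def enqueue(self, payload, idempotency_key=None):
        """Return (job_id, created). A repeated key returns the original job."""
        if idempotency_key is not None and idempotency_key in self._by_key:
            return self._by_key[idempotency_key], False
        job_id = next(self._ids)
        self.jobs[job_id] = payload
        if idempotency_key is not None:
            self._by_key[idempotency_key] = job_id
        return job_id, True


if __name__ == "__main__":
    store = JobStore()
    event = {"type": "invoice.paid"}
    print(store.enqueue(event, idempotency_key="evt_123"))  # first delivery
    print(store.enqueue(event, idempotency_key="evt_123"))  # provider retried the event
    print(len(store.jobs))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The second call returns the id of the first job and creates nothing new, so a provider that redelivers an event does not trigger the side effect twice.&lt;/p&gt;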

&lt;h2&gt;DLQ: failures you can actually debug&lt;/h2&gt;

&lt;p&gt;A DLQ shouldn’t be a graveyard; it should be a debugging tool.&lt;/p&gt;

&lt;p&gt;Spooled’s DLQ keeps failed jobs so you can inspect payload + error context, then retry (or purge) once the underlying issue is fixed.&lt;/p&gt;
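
&lt;p&gt;As a rough sketch of what “inspect, then retry or purge” means in practice, here is a toy DLQ. The class names, fields, and method names are hypothetical and are not taken from Spooled’s API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
from dataclasses import dataclass


@dataclass
class DeadLetter:
    """A failed job kept together with the context needed to debug it."""
    job_id: int
    payload: dict
    last_error: str
    attempts: int


class DeadLetterQueue:
    def __init__(self):
        self._entries = {}

    def add(self, entry):
        self._entries[entry.job_id] = entry

    def inspect(self, job_id):
        """Return the stored entry without removing it."""
        return self._entries[job_id]

    def retry(self, job_id):
        """Remove the entry and return a fresh job record with the original payload."""
        entry = self._entries.pop(job_id)
        return {"job_id": entry.job_id, "payload": entry.payload,
                "status": "pending", "attempts": 0}

    def purge(self):
        """Drop every entry and report how many were removed."""
        removed = len(self._entries)
        self._entries.clear()
        return removed


if __name__ == "__main__":
    dlq = DeadLetterQueue()
    dlq.add(DeadLetter(7, {"url": "https://example.com/hook"}, "HTTP 503", 5))
    print(dlq.inspect(7).last_error)
    print(dlq.retry(7)["status"])
    print(dlq.purge())
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The point of the sketch is that the payload and the last error travel together, so a retry after the fix replays exactly what failed.&lt;/p&gt;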

&lt;h2&gt;Workflows: dependencies without a heavyweight orchestrator&lt;/h2&gt;

&lt;p&gt;Many real systems need “do A, then B, then C,” or “run B only after A succeeds.”&lt;/p&gt;

&lt;p&gt;Spooled supports job dependencies and workflow/DAG execution so jobs run in the correct order without bolting on a separate orchestration platform for simple cases.&lt;/p&gt;
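
&lt;p&gt;Dependency-ordered execution can be sketched with Python’s standard &lt;code&gt;graphlib&lt;/code&gt; module (Python 3.9 or later). This shows the general technique, not Spooled’s scheduler. The &lt;code&gt;run_workflow&lt;/code&gt; function and the job names are made up for the example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
from graphlib import TopologicalSorter


def run_workflow(dependencies, run_job):
    """Run jobs in dependency order; a job runs only after all its dependencies succeed.

    `dependencies` maps each job to the set of jobs it depends on.
    `run_job` takes a job name and returns True on success.
    A cycle in the graph raises graphlib.CycleError from prepare().
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    completed, failed = [], []
    while sorter.is_active():
        ready = sorter.get_ready()
        if not ready:
            break  # everything still pending is blocked by a failed job
        for job in sorted(ready):
            if run_job(job):
                completed.append(job)
                sorter.done(job)  # unblocks the jobs that depend on this one
            else:
                failed.append(job)  # never marked done, so dependents stay blocked
    return completed, failed


if __name__ == "__main__":
    deps = {"b": {"a"}, "c": {"b"}}
    print(run_workflow(deps, lambda job: True))
    print(run_workflow(deps, lambda job: job != "b"))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the second run, job &lt;code&gt;b&lt;/code&gt; fails, so &lt;code&gt;c&lt;/code&gt; is never started. That is the “run B only after A succeeds” rule applied one step further down the chain.&lt;/p&gt;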

&lt;h2&gt;Real-time streaming: dashboards that don’t lie&lt;/h2&gt;

&lt;p&gt;Polling-based dashboards often go stale at the exact moment you need them.&lt;/p&gt;

&lt;p&gt;Spooled exposes &lt;strong&gt;SSE streams&lt;/strong&gt; (system-wide, per-queue, and per-job) and WebSocket updates, so you can watch job and queue state change live.&lt;/p&gt;
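
&lt;p&gt;For readers who have not worked with SSE, each update on the wire is a small block of text fields terminated by a blank line. The sketch below formats one such message. The event name &lt;code&gt;job.updated&lt;/code&gt; and the payload fields are assumptions for illustration, not Spooled’s documented event schema.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import json


def format_sse(event, data, event_id=None):
    """Serialize one Server-Sent Events message: optional id, event name, JSON data."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload always fits on one data line.
    lines.append(f"data: {json.dumps(data, separators=(',', ':'))}")
    # A blank line marks the end of the message.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    message = format_sse("job.updated", {"job_id": 42, "status": "completed"}, event_id=7)
    print(message, end="")
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A browser’s &lt;code&gt;EventSource&lt;/code&gt; reassembles these fields into an event, and the &lt;code&gt;id&lt;/code&gt; field lets a client that reconnects tell the server which event it saw last.&lt;/p&gt;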

&lt;h2&gt;Why Rust?&lt;/h2&gt;

&lt;p&gt;Rust is a great fit for infrastructure that must run continuously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory and thread safety checked at compile time, without a garbage collector&lt;/li&gt;
&lt;li&gt;high performance under concurrency&lt;/li&gt;
&lt;li&gt;simple ops via a single binary release artifact&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Quick start (self-hosted)&lt;/h2&gt;

&lt;p&gt;Spooled is self-hosted. The recommended way to run it is Docker Compose:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pull the multi-arch image (amd64 + arm64)&lt;/span&gt;
docker pull ghcr.io/spooled-cloud/spooled-backend:latest

&lt;span class="c"&gt;# Download the production compose file&lt;/span&gt;
curl &lt;span class="nt"&gt;-O&lt;/span&gt; https://raw.githubusercontent.com/Spooled-Cloud/spooled-backend/main/docker-compose.prod.yml

&lt;span class="c"&gt;# Create a minimal .env with secure secrets&lt;/span&gt;
&lt;span class="nv"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;openssl rand &lt;span class="nt"&gt;-base64&lt;/span&gt; 16&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;JWT_SECRET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;openssl rand &lt;span class="nt"&gt;-base64&lt;/span&gt; 32&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .env &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
POSTGRES_PASSWORD=&lt;/span&gt;&lt;span class="nv"&gt;$POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="sh"&gt;
JWT_SECRET=&lt;/span&gt;&lt;span class="nv"&gt;$JWT_SECRET&lt;/span&gt;&lt;span class="sh"&gt;
RUST_ENV=production
JSON_LOGS=true
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Start services&lt;/span&gt;
docker compose &lt;span class="nt"&gt;-f&lt;/span&gt; docker-compose.prod.yml up &lt;span class="nt"&gt;-d&lt;/span&gt;

&lt;span class="c"&gt;# Verify&lt;/span&gt;
curl http://localhost:8080/health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PostgreSQL is required. Redis is optional (used for pub/sub and caching when enabled).&lt;/p&gt;

&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/Spooled-Cloud/spooled-backend" rel="noopener noreferrer"&gt;https://github.com/Spooled-Cloud/spooled-backend&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://spooled.cloud/docs" rel="noopener noreferrer"&gt;https://spooled.cloud/docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Live demo (SpriteForge): &lt;a href="https://example.spooled.cloud" rel="noopener noreferrer"&gt;https://example.spooled.cloud&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What I’d love feedback on&lt;/h2&gt;

&lt;p&gt;If you’ve built webhook systems or background job infrastructure, I’d love to hear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What failure modes hurt you most in production?&lt;/li&gt;
&lt;li&gt;What’s missing from existing queues that you wish existed?&lt;/li&gt;
&lt;li&gt;What would make you switch to a self-hosted job/webhook system?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks for reading. Feedback and issues are welcome.&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>rust</category>
      <category>showdev</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
