<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Steve Rice</title>
    <description>The latest articles on DEV Community by Steve Rice (@grandfoosier).</description>
    <link>https://dev.to/grandfoosier</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3864165%2Fd2e84895-c303-4871-8e5d-d13adef7172c.jpeg</url>
      <title>DEV Community: Steve Rice</title>
      <link>https://dev.to/grandfoosier</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/grandfoosier"/>
    <language>en</language>
    <item>
      <title>Building a ‘simple’ async service in Rust (and why it wasn’t simple)</title>
      <dc:creator>Steve Rice</dc:creator>
      <pubDate>Mon, 06 Apr 2026 16:29:07 +0000</pubDate>
      <link>https://dev.to/grandfoosier/building-a-simple-async-service-in-rust-and-why-it-wasnt-simple-4i71</link>
      <guid>https://dev.to/grandfoosier/building-a-simple-async-service-in-rust-and-why-it-wasnt-simple-4i71</guid>
      <description>&lt;h1&gt;
  
  
  I thought this async Rust service would be simple
&lt;/h1&gt;

&lt;p&gt;I wanted to build a small async service in Rust.&lt;/p&gt;

&lt;p&gt;Accept events, process them, retry on failure. Nothing fancy.&lt;/p&gt;

&lt;p&gt;It looked like a weekend project.&lt;/p&gt;

&lt;p&gt;It turned into a lesson in how quickly “simple” systems stop being simple once you care about correctness.&lt;/p&gt;

&lt;p&gt;The full project is available here: &lt;a href="https://github.com/yourname/eventful" rel="noopener noreferrer"&gt;https://github.com/yourname/eventful&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The naive version
&lt;/h2&gt;

&lt;p&gt;The initial design looked something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP → queue → worker pool
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Handler receives an event
&lt;/li&gt;
&lt;li&gt;Push it into a channel
&lt;/li&gt;
&lt;li&gt;Workers pull from the channel and process
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That works fine — until you actually try to make it correct.&lt;/p&gt;

&lt;p&gt;As soon as you introduce retries, idempotency, and failure handling, things start to break in ways that aren’t obvious at first.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem 1: Idempotency isn’t just “don’t insert twice”
&lt;/h2&gt;

&lt;p&gt;I wanted ingestion to be idempotent by &lt;code&gt;event_id&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;At first, that just meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the ID exists, return the existing record
&lt;/li&gt;
&lt;li&gt;Otherwise insert it
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But that leaves a hole.&lt;/p&gt;

&lt;p&gt;What if the same ID comes in with a different payload?&lt;/p&gt;

&lt;p&gt;That’s not a duplicate — that’s a conflict.&lt;/p&gt;

&lt;p&gt;The fix was to store a hash of the payload and reject mismatches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Same ID + same payload → OK (deduped)
&lt;/li&gt;
&lt;li&gt;Same ID + different payload → 409 conflict
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Small change, but it forced me to treat idempotency as a real constraint instead of a convenience.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem 2: You can lose work even if you “queued” it
&lt;/h2&gt;

&lt;p&gt;Originally, I assumed:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If I push an event into the queue, it will eventually be processed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s not actually true.&lt;/p&gt;

&lt;p&gt;Two things break this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The queue is full (&lt;code&gt;try_send&lt;/code&gt; fails)
&lt;/li&gt;
&lt;li&gt;The queue is broken (receiver dropped)
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In both cases, the event exists in the system, but it never reaches a worker.&lt;/p&gt;

&lt;p&gt;The fix was separating &lt;strong&gt;“exists”&lt;/strong&gt; from &lt;strong&gt;“scheduled.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each record tracks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;status&lt;/code&gt; (Received, Processing, etc.)
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;queued&lt;/code&gt; (whether we think it’s scheduled)
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If enqueue fails, the record still exists, but it isn’t reliably scheduled anymore.&lt;/p&gt;

&lt;p&gt;Which leads to the next problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem 3: You need a sweeper (even if it feels wrong)
&lt;/h2&gt;

&lt;p&gt;I didn’t initially want a background task scanning state. It felt like a workaround.&lt;/p&gt;

&lt;p&gt;But without it, there are too many ways for events to get stuck:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;enqueue fails
&lt;/li&gt;
&lt;li&gt;worker crashes mid-processing
&lt;/li&gt;
&lt;li&gt;retry timing gets missed
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I added a sweeper.&lt;/p&gt;

&lt;p&gt;It runs periodically and looks for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;events ready to retry
&lt;/li&gt;
&lt;li&gt;events marked queued but not processed for too long
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then it re-enqueues them.&lt;/p&gt;

&lt;p&gt;It’s not elegant, but it’s robust. It gives you eventual correctness without requiring every code path to be perfect.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem 4: “Queue depth” is not one number
&lt;/h2&gt;

&lt;p&gt;At first I tracked queue depth as a single value.&lt;/p&gt;

&lt;p&gt;That turned out to be misleading.&lt;/p&gt;

&lt;p&gt;There are at least three different things happening:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Channel depth&lt;/strong&gt; — how many items are currently in the queue
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backlog&lt;/strong&gt; — how many events are marked &lt;code&gt;queued == true&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inflight&lt;/strong&gt; — how many workers are actively processing
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These are not the same.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Channel depth can be 0 while backlog is high
&lt;/li&gt;
&lt;li&gt;Inflight can be maxed out while the queue stays empty
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I split them into separate metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;queue_channel_depth&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;backlog_queued&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;processing_inflight&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once I did that, the system became much easier to reason about.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem 5: Concurrency needs to be bounded explicitly
&lt;/h2&gt;

&lt;p&gt;The simplest approach is to spawn a task per event.&lt;/p&gt;

&lt;p&gt;That works until it doesn’t.&lt;/p&gt;

&lt;p&gt;I ended up using a &lt;code&gt;Semaphore&lt;/code&gt; to limit concurrency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each task acquires a permit
&lt;/li&gt;
&lt;li&gt;The permit is held for the duration of processing
&lt;/li&gt;
&lt;li&gt;Max concurrency is fixed
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compared to a fixed worker pool, this approach lets me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keep the code simple
&lt;/li&gt;
&lt;li&gt;avoid idle workers
&lt;/li&gt;
&lt;li&gt;still enforce limits
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also makes shutdown behavior much easier to control.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem 6: Graceful shutdown is where things get messy
&lt;/h2&gt;

&lt;p&gt;Stopping a system like this is harder than starting it.&lt;/p&gt;

&lt;p&gt;You need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stop accepting new work
&lt;/li&gt;
&lt;li&gt;Stop dispatching new tasks
&lt;/li&gt;
&lt;li&gt;Let in-flight work finish (within reason)
&lt;/li&gt;
&lt;li&gt;Not hang forever
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I ended up with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a &lt;code&gt;watch&lt;/code&gt; channel for shutdown signaling
&lt;/li&gt;
&lt;li&gt;a dispatch loop that exits on signal
&lt;/li&gt;
&lt;li&gt;a &lt;code&gt;JoinSet&lt;/code&gt; tracking worker tasks
&lt;/li&gt;
&lt;li&gt;a timeout for draining
&lt;/li&gt;
&lt;li&gt;forced abort after timeout
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So shutdown looks like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Signal shutdown
&lt;/li&gt;
&lt;li&gt;Stop pulling from the queue
&lt;/li&gt;
&lt;li&gt;Wait up to N milliseconds for workers
&lt;/li&gt;
&lt;li&gt;Abort anything still running
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It’s not perfect, but it’s predictable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem 7: Metrics will lie to you if you’re not careful
&lt;/h2&gt;

&lt;p&gt;I added metrics early, but they were wrong at first.&lt;/p&gt;

&lt;p&gt;The issue was trying to track counts by incrementing and decrementing in multiple places.&lt;/p&gt;

&lt;p&gt;That’s easy to get wrong in a concurrent system.&lt;/p&gt;

&lt;p&gt;What ended up working:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Counters&lt;/strong&gt; → only ever increment
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State counts&lt;/strong&gt; → only update on real state transitions
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, &lt;code&gt;queued_count&lt;/code&gt; only changes when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;queued&lt;/code&gt; flips false → true
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;queued&lt;/code&gt; flips true → false
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anything else introduces drift.&lt;/p&gt;




&lt;h2&gt;
  
  
  The resulting model
&lt;/h2&gt;

&lt;p&gt;The final system looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP → Ingest → Store → Channel → Dispatcher → Workers
                             ↑
                          Sweeper
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With a state machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Received → Processing → Completed
                    ↘ FailedRetry → Failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
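&lt;p&gt;A sketch of that state machine as an enum with an explicit transition table (the retry edge back to &lt;code&gt;Processing&lt;/code&gt; is my reading of what happens when the sweeper re-enqueues; the rest matches the diagram):&lt;/p&gt;

```rust
// Illustrative version of the event state machine.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Status {
    Received,
    Processing,
    Completed,   // terminal
    FailedRetry, // eligible for re-enqueue by the sweeper
    Failed,      // terminal
}

// Only these transitions are legal; anything else is a bug.
fn can_transition(from: Status, to: Status) -> bool {
    use Status::*;
    matches!(
        (from, to),
        (Received, Processing)
            | (Processing, Completed)
            | (Processing, FailedRetry)
            | (FailedRetry, Processing)
            | (FailedRetry, Failed)
    )
}

fn main() {
    use Status::*;
    assert!(can_transition(Received, Processing));
    assert!(can_transition(Processing, FailedRetry));
    assert!(!can_transition(Completed, Processing)); // terminal states stay terminal
}
```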



&lt;p&gt;And metrics that reflect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ingress
&lt;/li&gt;
&lt;li&gt;deduplication
&lt;/li&gt;
&lt;li&gt;processing success/failure
&lt;/li&gt;
&lt;li&gt;backlog
&lt;/li&gt;
&lt;li&gt;queue state
&lt;/li&gt;
&lt;li&gt;concurrency
&lt;/li&gt;
&lt;li&gt;latency
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What I took away from this
&lt;/h2&gt;

&lt;p&gt;A few things that stood out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Simple async system” is usually not simple once you care about correctness
&lt;/li&gt;
&lt;li&gt;State machines make concurrency problems easier to reason about
&lt;/li&gt;
&lt;li&gt;Backpressure is multi-dimensional, not a single number
&lt;/li&gt;
&lt;li&gt;A sweeper is often the simplest way to guarantee eventual progress
&lt;/li&gt;
&lt;li&gt;Shutdown needs to be designed, not added later
&lt;/li&gt;
&lt;li&gt;Observability changes how you design the system
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What I didn’t do (on purpose)
&lt;/h2&gt;

&lt;p&gt;This is an in-memory system.&lt;/p&gt;

&lt;p&gt;I didn’t add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;persistence
&lt;/li&gt;
&lt;li&gt;distributed processing
&lt;/li&gt;
&lt;li&gt;external queues
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those would be the next steps, but the goal here was to get the core behavior right first.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;This ended up being more about edge cases than features.&lt;/p&gt;

&lt;p&gt;Most of the code is just making sure the system behaves correctly when things don’t go as planned — which is most of the time in real systems.&lt;/p&gt;

&lt;p&gt;That was the interesting part.&lt;/p&gt;

&lt;p&gt;And honestly, the part I didn’t expect going in.&lt;/p&gt;




&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;If you want to see the full implementation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/yourname/eventful" rel="noopener noreferrer"&gt;https://github.com/yourname/eventful&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>backend</category>
      <category>concurrency</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
