<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Taras H</title>
    <description>The latest articles on DEV Community by Taras H (@taras_h_7a24f2b356a6e).</description>
    <link>https://dev.to/taras_h_7a24f2b356a6e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3784318%2F039a00ba-82ed-4e06-ab84-4121d4681b1f.png</url>
      <title>DEV Community: Taras H</title>
      <link>https://dev.to/taras_h_7a24f2b356a6e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/taras_h_7a24f2b356a6e"/>
    <language>en</language>
    <item>
      <title>Background Jobs in Production: The Problems Queues Don’t Solve</title>
      <dc:creator>Taras H</dc:creator>
      <pubDate>Sun, 08 Mar 2026 11:17:45 +0000</pubDate>
      <link>https://dev.to/taras_h_7a24f2b356a6e/background-jobs-in-production-the-problems-queues-dont-solve-209a</link>
      <guid>https://dev.to/taras_h_7a24f2b356a6e/background-jobs-in-production-the-problems-queues-dont-solve-209a</guid>
      <description>&lt;p&gt;Moving work out of the request path is one of the most common ways to&lt;br&gt;
speed up backend systems.&lt;/p&gt;

&lt;p&gt;Emails are sent asynchronously.&lt;br&gt;
Invoices are generated by workers.&lt;br&gt;
Webhooks are delivered through queues.&lt;br&gt;
Image processing and indexing run in background jobs.&lt;/p&gt;

&lt;p&gt;Latency improves immediately.&lt;/p&gt;

&lt;p&gt;But many teams eventually notice something strange in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  duplicate emails appear&lt;/li&gt;
&lt;li&gt;  retries increase system load&lt;/li&gt;
&lt;li&gt;  dead-letter queues slowly grow&lt;/li&gt;
&lt;li&gt;  workflows technically "succeed"... but the outcome is wrong&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The queue is healthy.&lt;br&gt;
Workers are running.&lt;/p&gt;

&lt;p&gt;Yet the system behaves incorrectly.&lt;/p&gt;

&lt;p&gt;Moving work to the background &lt;strong&gt;changes where failures happen&lt;/strong&gt;.&lt;br&gt;
It does not remove them.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This post is a shorter version of a deeper engineering write-up&lt;br&gt;
originally published on CodeNotes.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;The Assumption Behind Background Jobs&lt;/h2&gt;

&lt;p&gt;Background job systems are usually introduced with a simple expectation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If a job fails, the queue will retry it until it succeeds.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Queues also provide useful features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  buffering traffic spikes&lt;/li&gt;
&lt;li&gt;  independent worker scaling&lt;/li&gt;
&lt;li&gt;  retry handling&lt;/li&gt;
&lt;li&gt;  isolation from request latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of this, async processing often &lt;em&gt;feels&lt;/em&gt; safer than synchronous&lt;br&gt;
execution.&lt;/p&gt;

&lt;p&gt;But that assumption depends on something rarely guaranteed in&lt;br&gt;
production:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;that running a job multiple times produces the same result as running&lt;br&gt;
it once.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;What "At-Least-Once Delivery" Actually Means&lt;/h2&gt;

&lt;p&gt;Most queue systems guarantee &lt;strong&gt;at-least-once delivery&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That means the system keeps delivering a message until it is&lt;br&gt;
acknowledged - even if that results in duplicate execution.&lt;/p&gt;

&lt;p&gt;It does &lt;strong&gt;not&lt;/strong&gt; mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  the job runs exactly once&lt;/li&gt;
&lt;li&gt;  side effects happen exactly once&lt;/li&gt;
&lt;li&gt;  messages are processed in order&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, the queue protects against &lt;strong&gt;message loss&lt;/strong&gt;, not&lt;br&gt;
&lt;strong&gt;duplicate work&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Once duplicate execution becomes possible, correctness has to come from&lt;br&gt;
somewhere else.&lt;/p&gt;

&lt;p&gt;Usually that means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  idempotent handlers&lt;/li&gt;
&lt;li&gt;  deduplication keys&lt;/li&gt;
&lt;li&gt;  explicit state transitions&lt;/li&gt;
&lt;li&gt;  retry boundaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without those protections, the infrastructure is reliable while the&lt;br&gt;
workflow is not.&lt;/p&gt;


&lt;h2&gt;A Classic Failure Scenario&lt;/h2&gt;

&lt;p&gt;Consider a worker that sends a payment receipt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;emailClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;receiptSentAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the worker crashes &lt;strong&gt;after sending the email&lt;/strong&gt; but &lt;strong&gt;before updating&lt;br&gt;
the database&lt;/strong&gt;, the job will be retried.&lt;/p&gt;

&lt;p&gt;Now the customer receives two receipts.&lt;/p&gt;

&lt;p&gt;The queue behaved exactly as designed.&lt;/p&gt;

&lt;p&gt;But the business outcome is incorrect.&lt;/p&gt;
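&lt;p&gt;One common fix, sketched with hypothetical names (an in-memory map instead of a real &lt;code&gt;db&lt;/code&gt;, a counter instead of &lt;code&gt;emailClient&lt;/code&gt;): guard the side effect with recorded state. This narrows the duplicate window to a crash between the send and the update; closing it fully needs an idempotency key on the email provider side or an outbox pattern:&lt;/p&gt;

```typescript
// Sketch: check recorded state before performing the side effect, so a
// retried job whose previous attempt completed becomes a no-op.
type Payment = { id: string; receiptSentAt: Date | null };

const payments = new Map<string, Payment>([
  ["p1", { id: "p1", receiptSentAt: null }],
]);
let emailsSent = 0;

function sendReceipt(paymentId: string): void {
  const payment = payments.get(paymentId);
  if (!payment) throw new Error("payment not found");

  // Guard: a completed earlier attempt makes the retry a no-op.
  if (payment.receiptSentAt !== null) return;

  emailsSent += 1;                    // side effect: send the email
  payment.receiptSentAt = new Date(); // record completion
}

sendReceipt("p1");
sendReceipt("p1"); // retry: the guard prevents a second receipt
console.log(emailsSent); // 1
```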




&lt;h2&gt;Why Production Systems Break Here&lt;/h2&gt;

&lt;p&gt;Background job systems introduce two things that make correctness&lt;br&gt;
harder.&lt;/p&gt;

&lt;h3&gt;1. Duplicate execution&lt;/h3&gt;

&lt;p&gt;Workers can crash after performing side effects but before acknowledging&lt;br&gt;
the message.&lt;/p&gt;

&lt;h3&gt;2. Time separation&lt;/h3&gt;

&lt;p&gt;Jobs may execute minutes or hours after they were created, when system&lt;br&gt;
state has already changed.&lt;/p&gt;

&lt;p&gt;Because of this, retries often interact with &lt;strong&gt;partial state&lt;/strong&gt; or&lt;br&gt;
&lt;strong&gt;outdated context&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;The Design Rule Most Teams Learn Later&lt;/h2&gt;

&lt;p&gt;A background job should never be treated as a &lt;strong&gt;one-time action&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It should be treated as a &lt;strong&gt;replayable command&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Every handler should be safe if it runs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  twice&lt;/li&gt;
&lt;li&gt;  later than expected&lt;/li&gt;
&lt;li&gt;  after partial completion&lt;/li&gt;
&lt;li&gt;  out of order&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If those conditions break the workflow, retries will eventually corrupt&lt;br&gt;
system behavior.&lt;/p&gt;




&lt;h2&gt;The Monitoring Trap&lt;/h2&gt;

&lt;p&gt;Teams often monitor queue infrastructure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  queue depth&lt;/li&gt;
&lt;li&gt;  worker throughput&lt;/li&gt;
&lt;li&gt;  retry counts&lt;/li&gt;
&lt;li&gt;  dead-letter volume&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those metrics matter - but they don't answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Did users receive duplicate emails?&lt;/li&gt;
&lt;li&gt;  Did a payment create multiple ledger entries?&lt;/li&gt;
&lt;li&gt;  Did downstream systems receive conflicting updates?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A queue dashboard can look completely healthy while the workflow is&lt;br&gt;
incorrect.&lt;/p&gt;




&lt;h2&gt;Read the Full Production Breakdown&lt;/h2&gt;

&lt;p&gt;This post only covers the core failure patterns.&lt;/p&gt;

&lt;p&gt;The full article explains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  why retries can &lt;strong&gt;make outages worse&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  how &lt;strong&gt;idempotent background jobs&lt;/strong&gt; are designed&lt;/li&gt;
&lt;li&gt;  why &lt;strong&gt;dead-letter queues silently grow&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  what production teams monitor &lt;strong&gt;beyond queue depth&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  a &lt;strong&gt;practical rollout checklist&lt;/strong&gt; for new background jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 &lt;strong&gt;Full article:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://codenotes.tech/blog/background-jobs-in-production" rel="noopener noreferrer"&gt;https://codenotes.tech/blog/background-jobs-in-production&lt;/a&gt;&lt;/p&gt;

</description>
      <category>sre</category>
      <category>backend</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Why AI Code Review Comments Look Right but Miss Real Risks</title>
      <dc:creator>Taras H</dc:creator>
      <pubDate>Fri, 27 Feb 2026 16:59:54 +0000</pubDate>
      <link>https://dev.to/taras_h_7a24f2b356a6e/why-ai-code-review-comments-look-right-but-miss-real-risks-1j74</link>
      <guid>https://dev.to/taras_h_7a24f2b356a6e/why-ai-code-review-comments-look-right-but-miss-real-risks-1j74</guid>
      <description>&lt;p&gt;Many teams have added AI code review to their pull request workflow.&lt;/p&gt;

&lt;p&gt;The promise is obvious: faster feedback, broader coverage, fewer review bottlenecks. AI scans every diff, flags suspicious code, suggests test cases, and highlights style issues in seconds.&lt;/p&gt;

&lt;p&gt;Pull requests move faster. Review queues shrink. Everything looks healthier.&lt;/p&gt;

&lt;p&gt;But production incidents don’t disappear.&lt;/p&gt;

&lt;p&gt;So the practical question emerges:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If AI reviews every PR, why are high-risk issues still reaching production?&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;The Reasonable Assumption&lt;/h2&gt;

&lt;p&gt;It’s natural to assume:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;More review coverage + faster feedback = better quality.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AI increases comment volume. It catches missing null checks. It suggests cleaner error handling. It improves surface-level consistency.&lt;/p&gt;

&lt;p&gt;At a process level, things look better.&lt;/p&gt;

&lt;p&gt;But review activity is not the same thing as risk reduction.&lt;/p&gt;




&lt;h2&gt;Where the Gap Appears&lt;/h2&gt;

&lt;p&gt;Most AI code review tools are excellent at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pattern matching&lt;/li&gt;
&lt;li&gt;Local correctness&lt;/li&gt;
&lt;li&gt;Code explanation&lt;/li&gt;
&lt;li&gt;Generic best practices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They are much weaker at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business logic validation&lt;/li&gt;
&lt;li&gt;Authorization boundaries&lt;/li&gt;
&lt;li&gt;Implicit architectural constraints&lt;/li&gt;
&lt;li&gt;Production failure modes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;updateUserRole&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findUnique&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;User not found&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An AI reviewer might suggest stronger validation or clearer error handling.&lt;/p&gt;

&lt;p&gt;But the real production risk may be completely different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who is allowed to change roles?&lt;/li&gt;
&lt;li&gt;Is there audit logging?&lt;/li&gt;
&lt;li&gt;Does this break cross-service assumptions?&lt;/li&gt;
&lt;li&gt;What happens under concurrent updates?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These risks don’t live in the diff. They live in the system.&lt;/p&gt;
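&lt;p&gt;For illustration, a version of the same function with those system-level risks handled in-process. The authorization rule, audit log, and version check are hypothetical additions, not the article's code; an in-memory &lt;code&gt;Map&lt;/code&gt; stands in for &lt;code&gt;db&lt;/code&gt;:&lt;/p&gt;

```typescript
// Sketch: authorization boundary + audit trail + optimistic concurrency,
// the concerns a diff-only review tends not to see.
type User = { id: string; role: string; version: number };

const users = new Map<string, User>([
  ["u1", { id: "u1", role: "member", version: 1 }],
]);
const auditLog: string[] = [];

function updateUserRole(
  actor: User,
  userId: string,
  role: string,
  expectedVersion: number,
): void {
  // Authorization boundary: only admins may change roles.
  if (actor.role !== "admin") throw new Error("forbidden");

  const user = users.get(userId);
  if (!user) throw new Error("User not found");

  // Optimistic concurrency: reject updates computed from stale state.
  if (user.version !== expectedVersion) throw new Error("conflict");

  user.role = role;
  user.version += 1;
  auditLog.push(`${actor.id} set ${userId} role=${role}`); // audit trail
}

const admin: User = { id: "a1", role: "admin", version: 1 };
updateUserRole(admin, "u1", "editor", 1);
console.log(users.get("u1")?.role); // "editor"
```

&lt;p&gt;None of these checks would be flagged as missing by a reviewer - human or AI - that only sees the diff.&lt;/p&gt;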




&lt;h2&gt;Why AI Feels More Effective Than It Is&lt;/h2&gt;

&lt;p&gt;Three patterns show up repeatedly:&lt;/p&gt;

&lt;h3&gt;1. Plausible Comments Create Confidence&lt;/h3&gt;

&lt;p&gt;LLMs generate comments that &lt;em&gt;sound correct&lt;/em&gt;. That increases perceived rigor — even when the risk profile hasn’t changed.&lt;/p&gt;

&lt;h3&gt;2. Diffs Hide System Context&lt;/h3&gt;

&lt;p&gt;Pull requests rarely include architectural history, compliance constraints, or production incident lessons. Humans often carry this context implicitly. AI usually doesn’t.&lt;/p&gt;

&lt;h3&gt;3. Automation Changes Human Behavior&lt;/h3&gt;

&lt;p&gt;When AI has already “reviewed” the code, humans subtly shift from critical analysis to verification mode.&lt;/p&gt;

&lt;p&gt;The question changes from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What could fail in production?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Did we resolve the AI comments?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That shift matters.&lt;/p&gt;




&lt;h2&gt;The Key Insight&lt;/h2&gt;

&lt;p&gt;AI expands coverage.&lt;/p&gt;

&lt;p&gt;Humans must still own judgment.&lt;/p&gt;

&lt;p&gt;AI is strong at local correctness. Production failures usually emerge from system interactions: retries under load, cache drift, authorization boundaries, cross-service contracts.&lt;/p&gt;

&lt;p&gt;If the review process optimizes for comment resolution instead of failure thinking, speed improves — but risk stays constant.&lt;/p&gt;




&lt;h2&gt;If You’re Using AI Review&lt;/h2&gt;

&lt;p&gt;A useful mental model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Let AI handle first-pass mechanical checks.&lt;/li&gt;
&lt;li&gt;Explicitly reserve human review for system-level risk.&lt;/li&gt;
&lt;li&gt;Measure escaped defects — not comment counts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real question isn’t whether AI comments are helpful.&lt;/p&gt;

&lt;p&gt;It’s whether your review process still forces engineers to think about how systems fail in production.&lt;/p&gt;




&lt;p&gt;If this topic resonates, the full breakdown goes deeper into why this happens and how teams misinterpret review signal vs. real risk:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Full article:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://codenotes.tech/blog/why-ai-code-review-comments-look-right-but-miss-real-risks" rel="noopener noreferrer"&gt;https://codenotes.tech/blog/why-ai-code-review-comments-look-right-but-miss-real-risks&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>codereview</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
