<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Taras H</title>
    <description>The latest articles on DEV Community by Taras H (@taras_h_7a24f2b356a6e).</description>
    <link>https://dev.to/taras_h_7a24f2b356a6e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3784318%2F039a00ba-82ed-4e06-ab84-4121d4681b1f.png</url>
      <title>DEV Community: Taras H</title>
      <link>https://dev.to/taras_h_7a24f2b356a6e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/taras_h_7a24f2b356a6e"/>
    <language>en</language>
    <item>
      <title>Why Integration Tests Flake in CI but Pass Locally</title>
      <dc:creator>Taras H</dc:creator>
      <pubDate>Tue, 09 Jun 2026 15:58:26 +0000</pubDate>
      <link>https://dev.to/taras_h_7a24f2b356a6e/why-integration-tests-flake-in-ci-but-pass-locally-3moc</link>
      <guid>https://dev.to/taras_h_7a24f2b356a6e/why-integration-tests-flake-in-ci-but-pass-locally-3moc</guid>
      <description>&lt;p&gt;An integration test that passes locally and fails in CI is usually not random.&lt;/p&gt;

&lt;p&gt;It is usually depending on something the test does not control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shared database state&lt;/li&gt;
&lt;li&gt;test order&lt;/li&gt;
&lt;li&gt;worker parallelism&lt;/li&gt;
&lt;li&gt;real time&lt;/li&gt;
&lt;li&gt;background jobs&lt;/li&gt;
&lt;li&gt;service startup timing&lt;/li&gt;
&lt;li&gt;reused fixture data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why the same test can fail in CI, pass on rerun, and then fail again tomorrow.&lt;br&gt;
The rerun did not fix anything.&lt;br&gt;
It only gave the hidden assumption a better environment.&lt;/p&gt;

&lt;p&gt;I have found that flaky integration tests become much easier to fix when you stop asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How do I make this test pass?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and start asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What did this test borrow from the environment?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here are the failure patterns I would check first.&lt;/p&gt;
&lt;h2&gt;
  
  
  1. The Test Shares Data With Another Test
&lt;/h2&gt;

&lt;p&gt;This is the classic one.&lt;/p&gt;

&lt;p&gt;The test creates a user like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createUser&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;integration@example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It passes locally because you run one file against one clean database.&lt;/p&gt;

&lt;p&gt;Then CI runs several files in parallel.&lt;br&gt;
Another test creates the same email.&lt;br&gt;
A previous run leaves a row behind.&lt;br&gt;
A worker truncates a table while another worker is asserting behavior.&lt;/p&gt;

&lt;p&gt;Now the test fails with a unique constraint error, a wrong row count, or a status that makes no sense.&lt;/p&gt;

&lt;p&gt;The fix is not to retry the test.&lt;br&gt;
The fix is to give every run ownership of its own data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;testId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GITHUB_RUN_ID&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;local&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;TEST_WORKER_ID&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;w0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;_&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;runId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;testId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;order-create&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createUser&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;runId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;@example.test`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;testRunId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;runId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the failure log can tell you which run created the data, and cleanup can target only that run.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Cleanup Works Locally But Breaks Under Parallelism
&lt;/h2&gt;

&lt;p&gt;Database cleanup is part of the test design.&lt;br&gt;
It is not just housekeeping.&lt;/p&gt;

&lt;p&gt;For serial tests, truncating tables between cases can be fine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;TRUNCATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;
  &lt;span class="n"&gt;outbox_events&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;payment_attempts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;order_items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;users&lt;/span&gt;
&lt;span class="k"&gt;RESTART&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt; &lt;span class="k"&gt;CASCADE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But if multiple CI workers share the same database, truncation can become destructive.&lt;br&gt;
One worker may delete rows while another worker is still using them.&lt;/p&gt;

&lt;p&gt;Better options are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;database per worker&lt;/li&gt;
&lt;li&gt;schema per worker&lt;/li&gt;
&lt;li&gt;run-scoped data with a &lt;code&gt;testRunId&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;transaction rollback when the app and test can share the same transaction boundary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The exact choice depends on the stack.&lt;br&gt;
The rule is simpler:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;One worker should not be able to delete or mutate another worker's data.&lt;/p&gt;
&lt;/blockquote&gt;
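
&lt;p&gt;As a rough illustration of the schema-per-worker option, here is a minimal sketch. It assumes a Jest-style runner that exposes a worker id in &lt;code&gt;JEST_WORKER_ID&lt;/code&gt; (other runners use different variable names) and a PostgreSQL-style schema. The helper name &lt;code&gt;workerSchemaName&lt;/code&gt; and the &lt;code&gt;test_w&lt;/code&gt; prefix are made up for this example.&lt;/p&gt;

```typescript
// Hypothetical helper: derive an isolated schema name for the current worker.
// Assumes the runner exposes a worker id in an environment variable
// (Jest sets JEST_WORKER_ID; adjust the variable name for other runners).
type Env = { [key: string]: string | undefined }

function workerSchemaName(env: Env, prefix = 'test_w'): string {
  const rawId = env['JEST_WORKER_ID'] ?? '0'
  // Keep only characters that are safe in an unquoted SQL identifier.
  const safeId = rawId.toLowerCase().replace(/[^a-z0-9_]/g, '_')
  return `${prefix}${safeId}`
}

// Each worker would then create and use its own schema, for example:
//   CREATE SCHEMA IF NOT EXISTS test_w3;
//   SET search_path TO test_w3;
const schema = workerSchemaName({ JEST_WORKER_ID: '3' })
console.log(schema) // test_w3
```

&lt;p&gt;In a real suite you would probably pass &lt;code&gt;process.env&lt;/code&gt; and run the schema setup once per worker, before migrations.&lt;/p&gt;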
&lt;h2&gt;
  
  
  3. The Test Sleeps Instead Of Waiting For Behavior
&lt;/h2&gt;

&lt;p&gt;This is another common source of CI-only flakes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/orders&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findFirst&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;externalReference&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;externalReference&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;confirmed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This test is not waiting for the system.&lt;br&gt;
It is waiting for the clock.&lt;/p&gt;

&lt;p&gt;If CI is slow, 500 ms is not enough.&lt;br&gt;
If CI is fast, the sleep wastes time.&lt;br&gt;
If the worker crashed, the test waits anyway and then fails with weak evidence.&lt;/p&gt;

&lt;p&gt;Prefer bounded polling against the state that matters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;eventually&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;read&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;timeoutMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;intervalMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;deadline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;timeoutMs&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;lastError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;

  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;deadline&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nf"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;lastError&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;intervalMs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;lastError&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The timeout still exists, but now it protects a meaningful condition.&lt;br&gt;
The test waits for a durable effect, not an arbitrary delay.&lt;/p&gt;
&lt;h2&gt;
  
  
  4. The Test Accidentally Depends On Real Time
&lt;/h2&gt;

&lt;p&gt;Some tests fail only near midnight, month end, daylight saving changes, or slow CI runs.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;token expiration&lt;/li&gt;
&lt;li&gt;trial period calculation&lt;/li&gt;
&lt;li&gt;scheduled jobs&lt;/li&gt;
&lt;li&gt;invoice dates&lt;/li&gt;
&lt;li&gt;order expiry&lt;/li&gt;
&lt;li&gt;time-zone conversion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If time is not the behavior under test, freeze it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;beforeEach&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;freeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2026-05-30T10:00:00.000Z&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="nf"&gt;afterEach&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;restore&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also make the CI time zone explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;TZ&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;UTC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One caveat: make sure the application and database agree about time.&lt;br&gt;
If the app uses a fake JavaScript clock but the database uses &lt;code&gt;now()&lt;/code&gt;, the test can still be inconsistent.&lt;/p&gt;
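
&lt;p&gt;If you do not want to depend on a fake-timer library, one option is to pass the clock in as a parameter. The following is only a sketch of that idea. &lt;code&gt;Clock&lt;/code&gt;, &lt;code&gt;isExpired&lt;/code&gt;, and &lt;code&gt;fixedClock&lt;/code&gt; are hypothetical names, not part of any library.&lt;/p&gt;

```typescript
// Hypothetical sketch: production code receives a clock instead of calling
// Date.now() directly, so a test can pin "now" to a known instant.
type Clock = () => Date

function isExpired(expiresAt: Date, now: Clock): boolean {
  return now().getTime() >= expiresAt.getTime()
}

// Test helper: a clock that always returns the same instant.
function fixedClock(iso: string): Clock {
  const instant = new Date(iso)
  return () => new Date(instant.getTime())
}

const now = fixedClock('2026-05-30T10:00:00.000Z')

console.log(isExpired(new Date('2026-05-30T09:59:59.000Z'), now)) // true
console.log(isExpired(new Date('2026-05-30T10:00:01.000Z'), now)) // false
```

&lt;p&gt;Production code would pass &lt;code&gt;() =&gt; new Date()&lt;/code&gt;. Sending that same timestamp to the database as a query parameter, instead of calling &lt;code&gt;now()&lt;/code&gt; in SQL, is one way to keep both sides on the same clock.&lt;/p&gt;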
&lt;h2&gt;
  
  
  5. The Dependency Is Running But Not Ready
&lt;/h2&gt;

&lt;p&gt;A container can be "up" before it is useful.&lt;/p&gt;

&lt;p&gt;PostgreSQL may accept connections before migrations finish.&lt;br&gt;
An HTTP stub may open a port before fixtures are loaded.&lt;br&gt;
A worker may start after the first request already created a job.&lt;/p&gt;

&lt;p&gt;The first test fails.&lt;br&gt;
The rerun passes.&lt;/p&gt;

&lt;p&gt;That is a readiness problem, not a flaky assertion.&lt;/p&gt;

&lt;p&gt;Make startup explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;beforeAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;waitForDatabase&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runMigrations&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;resetDatabase&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;waitForWorker&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"Port is open" is weaker than "the service can do the work this test needs."&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Log When A CI Integration Test Fails
&lt;/h2&gt;

&lt;p&gt;Before changing the test, I want the failure to leave evidence.&lt;/p&gt;

&lt;p&gt;At minimum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;test name and file&lt;/li&gt;
&lt;li&gt;CI run id&lt;/li&gt;
&lt;li&gt;attempt number&lt;/li&gt;
&lt;li&gt;worker id&lt;/li&gt;
&lt;li&gt;database or schema name&lt;/li&gt;
&lt;li&gt;random seed, if the runner has one&lt;/li&gt;
&lt;li&gt;ids of created users/orders/tenants&lt;/li&gt;
&lt;li&gt;current time and time zone&lt;/li&gt;
&lt;li&gt;recent relevant database rows&lt;/li&gt;
&lt;li&gt;pending background jobs&lt;/li&gt;
&lt;li&gt;dependency stub calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that, a rerun can make the test green without teaching you anything.&lt;/p&gt;
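
&lt;p&gt;A small helper can gather most of that list in one place. This is a sketch under assumptions: &lt;code&gt;GITHUB_RUN_ID&lt;/code&gt; and &lt;code&gt;GITHUB_RUN_ATTEMPT&lt;/code&gt; are the GitHub Actions variables, &lt;code&gt;JEST_WORKER_ID&lt;/code&gt; is the Jest one, and &lt;code&gt;buildFailureContext&lt;/code&gt; is a hypothetical name. Other CI systems and runners will need different variable names.&lt;/p&gt;

```typescript
// Hypothetical sketch: collect run context once, attach it to every failure.
type Env = { [key: string]: string | undefined }

interface FailureContext {
  test: string
  runId: string
  attempt: string
  workerId: string
  schema: string
  timestamp: string
  timeZone: string
  createdIds: string[]
}

function buildFailureContext(
  test: string,
  env: Env,
  schema: string,
  createdIds: string[],
  now: Date = new Date()
): FailureContext {
  return {
    test,
    runId: env['GITHUB_RUN_ID'] ?? 'local',
    attempt: env['GITHUB_RUN_ATTEMPT'] ?? '1',
    workerId: env['JEST_WORKER_ID'] ?? '0',
    schema,
    timestamp: now.toISOString(),
    // Time zone the process actually resolved, not the one you expected.
    timeZone: Intl.DateTimeFormat().resolvedOptions().timeZone,
    createdIds,
  }
}

const context = buildFailureContext(
  'orders/create.test.ts',
  { GITHUB_RUN_ID: '9001', GITHUB_RUN_ATTEMPT: '2', JEST_WORKER_ID: '3' },
  'test_w3',
  ['user_1', 'order_7'],
  new Date('2026-05-30T10:00:00.000Z')
)
console.log(JSON.stringify(context))
```

&lt;p&gt;Database rows, pending jobs, and stub calls depend on your stack, so they are left out here. They could be added as extra fields collected in a failure hook.&lt;/p&gt;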

&lt;h2&gt;
  
  
  A Useful Debugging Order
&lt;/h2&gt;

&lt;p&gt;When an integration test flakes in CI, I check this sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Does it reuse global fixture values?&lt;/li&gt;
&lt;li&gt;Can two workers touch the same rows?&lt;/li&gt;
&lt;li&gt;Does cleanup delete another worker's data?&lt;/li&gt;
&lt;li&gt;Does it sleep instead of waiting for observable state?&lt;/li&gt;
&lt;li&gt;Does it depend on the real clock or local time zone?&lt;/li&gt;
&lt;li&gt;Are containers actually ready before tests start?&lt;/li&gt;
&lt;li&gt;Does the failure reproduce only with parallelism?&lt;/li&gt;
&lt;li&gt;Is the test exposing a real race condition in the product?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last point matters.&lt;/p&gt;

&lt;p&gt;Sometimes a flaky integration test is not "just a bad test."&lt;br&gt;
Sometimes it is the only thing showing you that a boundary is unsafe: a transaction commits too early, an idempotency key is missing, a worker can process the same job twice, or a status transition is not atomic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;A flaky integration test is a test with an uncontrolled dependency.&lt;/p&gt;

&lt;p&gt;The dependency might be data, time, order, parallelism, cleanup, startup, or a real product race.&lt;/p&gt;

&lt;p&gt;Do not start by hiding the failure.&lt;br&gt;
Find what the test borrowed from the environment.&lt;br&gt;
Then make that dependency explicit, isolated, observable, or removed.&lt;/p&gt;

&lt;p&gt;I wrote a longer version with database cleanup strategies, bounded polling examples, quarantine rules, and a checklist here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://codenotes.tech/blog/flaky-integration-tests-in-ci" rel="noopener noreferrer"&gt;https://codenotes.tech/blog/flaky-integration-tests-in-ci&lt;/a&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>ci</category>
      <category>backend</category>
      <category>debugging</category>
    </item>
    <item>
      <title>API Contract Testing: Why Safe Changes Still Break Clients</title>
      <dc:creator>Taras H</dc:creator>
      <pubDate>Sat, 09 May 2026 12:00:00 +0000</pubDate>
      <link>https://dev.to/taras_h_7a24f2b356a6e/api-contract-testing-why-safe-changes-still-break-clients-4ack</link>
      <guid>https://dev.to/taras_h_7a24f2b356a6e/api-contract-testing-why-safe-changes-still-break-clients-4ack</guid>
      <description>&lt;p&gt;API changes often break clients in ways the service’s own tests never see.&lt;/p&gt;

&lt;p&gt;A change looks safe inside the service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;remove an unused field
&lt;/li&gt;
&lt;li&gt;tighten validation
&lt;/li&gt;
&lt;li&gt;adjust an error response
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything still works locally. Tests pass. Deployment succeeds.&lt;/p&gt;

&lt;p&gt;Then a mobile app crashes.&lt;br&gt;
A partner integration fails.&lt;br&gt;
An older frontend silently breaks.&lt;/p&gt;

&lt;p&gt;Nothing is “wrong” in the service, but the &lt;strong&gt;contract changed&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Problem: Internal Changes, External Breakage
&lt;/h2&gt;

&lt;p&gt;Consider a response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ord_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"paid"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"customer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cus_456"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ada@example.com"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Removing &lt;code&gt;customer.email&lt;/code&gt; looks like cleanup.&lt;/p&gt;

&lt;p&gt;But for a client, that field might:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;power a receipt screen
&lt;/li&gt;
&lt;li&gt;feed an export pipeline
&lt;/li&gt;
&lt;li&gt;be required in a generated SDK
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From the server’s perspective: harmless cleanup.&lt;br&gt;
From the client’s perspective: a breaking change.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Keeps Happening
&lt;/h2&gt;

&lt;p&gt;Most tests focus on &lt;strong&gt;correctness inside the service&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;business logic
&lt;/li&gt;
&lt;li&gt;database state
&lt;/li&gt;
&lt;li&gt;request handling
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They don’t protect &lt;strong&gt;what clients depend on&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That gap is where breakage happens.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Contract Testing Actually Protects
&lt;/h2&gt;

&lt;p&gt;Contract testing focuses on the &lt;strong&gt;boundary between services and clients&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It answers:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Did we change what clients rely on?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Typical breaking changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;removing or renaming fields
&lt;/li&gt;
&lt;li&gt;changing types
&lt;/li&gt;
&lt;li&gt;adding required inputs
&lt;/li&gt;
&lt;li&gt;changing error formats
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Non-breaking (usually):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;adding optional fields, as long as clients ignore fields they do not recognize
&lt;/li&gt;
&lt;/ul&gt;
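
&lt;p&gt;To make the removed-field case concrete, here is a minimal consumer-side check. It is a sketch, not a replacement for a contract testing tool. &lt;code&gt;Expectation&lt;/code&gt;, &lt;code&gt;contractViolations&lt;/code&gt;, and &lt;code&gt;receiptScreenNeeds&lt;/code&gt; are hypothetical names, and the response body reuses the order example from earlier.&lt;/p&gt;

```typescript
// Hypothetical sketch of a consumer-side expectation: the field paths and
// primitive types one client relies on, checked against a provider response.
type Json = { [key: string]: unknown }
type Expectation = { path: string; type: 'string' | 'number' | 'boolean' | 'object' }

// Walk a dotted path such as "customer.email" through nested objects.
function readPath(body: Json, path: string): unknown {
  let current: unknown = body
  for (const key of path.split('.')) {
    if (typeof current !== 'object' || current === null) return undefined
    current = (current as Json)[key]
  }
  return current
}

function contractViolations(body: Json, expected: Expectation[]): string[] {
  const violations: string[] = []
  for (const { path, type } of expected) {
    const value = readPath(body, path)
    if (value === undefined) {
      violations.push(`missing field: ${path}`)
    } else if (typeof value !== type) {
      violations.push(`wrong type for ${path}: expected ${type}, got ${typeof value}`)
    }
  }
  return violations
}

const receiptScreenNeeds: Expectation[] = [
  { path: 'id', type: 'string' },
  { path: 'status', type: 'string' },
  { path: 'customer.email', type: 'string' },
]

// The response after the "cleanup" that removed customer.email.
const afterCleanup = { id: 'ord_123', status: 'paid', customer: { id: 'cus_456' } }
console.log(contractViolations(afterCleanup, receiptScreenNeeds))
// [ 'missing field: customer.email' ]
```

&lt;p&gt;The check only fails for fields a client has declared it needs, so adding a new optional field passes untouched.&lt;/p&gt;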




&lt;h2&gt;
  
  
  Why Schema Alone Isn’t Enough
&lt;/h2&gt;

&lt;p&gt;Schema diffs (for example, comparing two versions of an OpenAPI document) catch structural changes.&lt;/p&gt;

&lt;p&gt;But real systems depend on behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;error codes
&lt;/li&gt;
&lt;li&gt;pagination shape
&lt;/li&gt;
&lt;li&gt;idempotency responses
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those require &lt;strong&gt;examples&lt;/strong&gt;, not just schemas.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Key Insight
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Most API breakages aren’t failures of code; they’re failures of assumptions at the boundary.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Contract testing makes those assumptions visible &lt;strong&gt;before release&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Small internal changes can quietly break external systems.&lt;/p&gt;

&lt;p&gt;Contract testing turns that from a surprise into a decision.&lt;/p&gt;

&lt;p&gt;👉 Full deep dive: &lt;a href="https://codenotes.tech/blog/api-contract-testing-prevent-breaking-clients-before-release" rel="noopener noreferrer"&gt;https://codenotes.tech/blog/api-contract-testing-prevent-breaking-clients-before-release&lt;/a&gt;&lt;/p&gt;

</description>
      <category>api</category>
      <category>testing</category>
      <category>softwareengineering</category>
      <category>backend</category>
    </item>
    <item>
      <title>How to Write API Integration Tests (That Actually Catch Bugs)</title>
      <dc:creator>Taras H</dc:creator>
      <pubDate>Mon, 04 May 2026 15:05:14 +0000</pubDate>
      <link>https://dev.to/taras_h_7a24f2b356a6e/how-to-write-api-integration-tests-that-actually-catch-bugs-1fhe</link>
      <guid>https://dev.to/taras_h_7a24f2b356a6e/how-to-write-api-integration-tests-that-actually-catch-bugs-1fhe</guid>
      <description>&lt;p&gt;API integration tests aren’t about checking &lt;code&gt;200 OK&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;They exist to answer a harder question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When a real request crosses authentication, validation, persistence, and transactions — does the system behave correctly?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most production bugs don’t live inside a single function.&lt;br&gt;&lt;br&gt;
They happen &lt;strong&gt;between boundaries&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Integration Tests Should Actually Prove
&lt;/h2&gt;

&lt;p&gt;A useful API integration test verifies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;routing + request parsing
&lt;/li&gt;
&lt;li&gt;authentication &amp;amp; authorization
&lt;/li&gt;
&lt;li&gt;validation and error shape
&lt;/li&gt;
&lt;li&gt;database writes and reads
&lt;/li&gt;
&lt;li&gt;transaction boundaries
&lt;/li&gt;
&lt;li&gt;response contract
&lt;/li&gt;
&lt;li&gt;retry / duplicate handling
&lt;/li&gt;
&lt;li&gt;concurrency behavior
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This sits between unit tests and end-to-end tests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unit&lt;/strong&gt; → logic correctness
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration&lt;/strong&gt; → system behavior at boundaries
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;E2E&lt;/strong&gt; → full user journey
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Start With Risk, Not Coverage
&lt;/h2&gt;

&lt;p&gt;Don’t write the same number of tests per endpoint.&lt;/p&gt;

&lt;p&gt;Focus on endpoints that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mutate important data
&lt;/li&gt;
&lt;li&gt;cross auth or tenant boundaries
&lt;/li&gt;
&lt;li&gt;involve multiple writes or transactions
&lt;/li&gt;
&lt;li&gt;depend on external systems
&lt;/li&gt;
&lt;li&gt;must handle retries or duplicates
&lt;/li&gt;
&lt;li&gt;have broken in production before
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal isn’t coverage.&lt;br&gt;&lt;br&gt;
It’s &lt;strong&gt;risk-weighted confidence&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a Good Test Looks Like
&lt;/h2&gt;

&lt;p&gt;A strong integration test:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;sends a real HTTP request
&lt;/li&gt;
&lt;li&gt;goes through real auth + validation
&lt;/li&gt;
&lt;li&gt;writes to a real database
&lt;/li&gt;
&lt;li&gt;asserts &lt;strong&gt;persisted state&lt;/strong&gt;, not just response
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/orders&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findFirst&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBeTruthy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you only check the response, you’re missing half the system.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Critical Cases People Skip
&lt;/h2&gt;

&lt;p&gt;These are where integration tests provide real value:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Authorization&lt;/strong&gt; → tenant / ownership rules
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation&lt;/strong&gt; → bad input blocked before writes
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rollback&lt;/strong&gt; → no partial state on failure
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency&lt;/strong&gt; → retries don’t duplicate effects
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency&lt;/strong&gt; → overlapping requests don’t break invariants
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most bugs hide here—not in the happy path.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Not to Mock
&lt;/h2&gt;

&lt;p&gt;Avoid mocking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;database
&lt;/li&gt;
&lt;li&gt;transactions
&lt;/li&gt;
&lt;li&gt;auth middleware
&lt;/li&gt;
&lt;li&gt;validation
&lt;/li&gt;
&lt;li&gt;your own application code
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mock only external systems like payments or email.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Mock dependencies outside your system, not inside it.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Short Version
&lt;/h2&gt;

&lt;p&gt;Good API integration tests prove:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;real requests cross real boundaries
&lt;/li&gt;
&lt;li&gt;correct state is persisted
&lt;/li&gt;
&lt;li&gt;failures don’t leave partial data
&lt;/li&gt;
&lt;li&gt;retries and concurrency are safe
&lt;/li&gt;
&lt;li&gt;response contracts don’t break clients
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They matter because production bugs usually live &lt;strong&gt;between components&lt;/strong&gt;, not inside them.&lt;/p&gt;




&lt;p&gt;👉 Full deep dive (idempotency, concurrency, rollback examples):&lt;br&gt;&lt;br&gt;
&lt;a href="https://codenotes.tech/blog/how-to-write-api-integration-tests" rel="noopener noreferrer"&gt;https://codenotes.tech/blog/how-to-write-api-integration-tests&lt;/a&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>api</category>
      <category>sre</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>API Idempotency Keys: Prevent Duplicate Requests</title>
      <dc:creator>Taras H</dc:creator>
      <pubDate>Sun, 26 Apr 2026 17:00:00 +0000</pubDate>
      <link>https://dev.to/taras_h_7a24f2b356a6e/api-idempotency-keys-prevent-duplicate-requests-3gca</link>
      <guid>https://dev.to/taras_h_7a24f2b356a6e/api-idempotency-keys-prevent-duplicate-requests-3gca</guid>
      <description>&lt;p&gt;Duplicate requests aren’t edge cases - they’re normal behavior in distributed systems.&lt;/p&gt;

&lt;p&gt;A client times out, retries, and suddenly your API creates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;two payments&lt;/li&gt;
&lt;li&gt;two orders&lt;/li&gt;
&lt;li&gt;two subscriptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Idempotency keys exist to prevent that. But many implementations still fail under real conditions.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Consider &lt;code&gt;POST /payments&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Server processes the payment&lt;/li&gt;
&lt;li&gt;Response is lost (timeout, network issue)&lt;/li&gt;
&lt;li&gt;Client retries&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without idempotency, the retry looks like a new request → duplicate charge.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Assumption That Breaks
&lt;/h2&gt;

&lt;p&gt;A common approach is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Store the idempotency key and reject duplicates.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This sounds correct, but checking for the key and then storing it are two separate steps, so the check is not atomic.&lt;/p&gt;

&lt;p&gt;Two concurrent requests can both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;check for the key&lt;/li&gt;
&lt;li&gt;see nothing&lt;/li&gt;
&lt;li&gt;execute the side effect&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result: duplicates still happen.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Actually Works
&lt;/h2&gt;

&lt;p&gt;Idempotency is not just storing keys - it’s about &lt;strong&gt;ownership of execution&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The critical rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Exactly one request may perform the operation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This requires an &lt;strong&gt;atomic reservation&lt;/strong&gt;, typically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQL: unique constraint + &lt;code&gt;INSERT ... ON CONFLICT&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Redis: &lt;code&gt;SET NX&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything else builds on top of that.&lt;/p&gt;
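
&lt;p&gt;A minimal sketch of the reservation idea, using an in-memory object as a stand-in for a table with a unique constraint on &lt;code&gt;(scope, key)&lt;/code&gt;. The names &lt;code&gt;tryReserve&lt;/code&gt; and &lt;code&gt;reservations&lt;/code&gt; are hypothetical, and a real service would need the database or Redis to enforce this across processes:&lt;/p&gt;

```typescript
// In-memory stand-in for a table with a unique constraint on (scope, key).
// All names here are hypothetical and only illustrate the idea.
type Reservation = { scope: string; key: string; reservedAt: number };

const reservations: { [id: string]: Reservation | undefined } = {};

// Returns true only for the first caller; every later caller gets false.
// The check and the write happen in one synchronous step, so no other
// request can run between them. A unique constraint or SET NX gives the
// same guarantee when several processes share one store.
function tryReserve(scope: string, key: string): boolean {
  const id = JSON.stringify([scope, key]);
  if (reservations[id] !== undefined) {
    return false; // another request already owns execution
  }
  reservations[id] = { scope, key, reservedAt: Date.now() };
  return true;
}

const first = tryReserve("tenant-1", "pay-123");
const second = tryReserve("tenant-1", "pay-123");
console.log(first, second); // prints: true false
```

&lt;p&gt;Only the caller that gets &lt;code&gt;true&lt;/code&gt; should run the side effect. The caller that gets &lt;code&gt;false&lt;/code&gt; has to wait, replay, or report a conflict.&lt;/p&gt;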




&lt;h2&gt;
  
  
  The Minimum Safe Design
&lt;/h2&gt;

&lt;p&gt;A correct implementation must:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Atomically reserve &lt;code&gt;(scope, key)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Store a &lt;strong&gt;request fingerprint&lt;/strong&gt; (to detect misuse)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Track state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;in_progress&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;completed&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ambiguous&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Replay the &lt;strong&gt;original response&lt;/strong&gt; on retries&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Reject same key with different payload&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Use a &lt;strong&gt;TTL that matches real retry behavior&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;
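
&lt;p&gt;The checklist above can be sketched as a small decision function. This is an illustration under stated assumptions, not a complete implementation: the store is an in-memory object, &lt;code&gt;begin&lt;/code&gt; and &lt;code&gt;complete&lt;/code&gt; are hypothetical names, the fingerprint is assumed to be a hash of the request body computed elsewhere, and TTL expiry and reconciliation of &lt;code&gt;ambiguous&lt;/code&gt; records are left out:&lt;/p&gt;

```typescript
// Hypothetical record shape: a completed record always carries its response.
type IdemRecord =
  | { fingerprint: string; state: "in_progress" | "ambiguous" }
  | { fingerprint: string; state: "completed"; response: string };

type Outcome =
  | { kind: "execute" }
  | { kind: "replay"; response: string }
  | { kind: "conflict"; reason: string };

const records: { [id: string]: IdemRecord | undefined } = {};

// Decide what to do with an incoming request for (scope, key).
function begin(scope: string, key: string, fingerprint: string): Outcome {
  const id = JSON.stringify([scope, key]);
  const existing = records[id];
  if (existing === undefined) {
    // First request: reserve the key and take ownership of execution.
    records[id] = { fingerprint, state: "in_progress" };
    return { kind: "execute" };
  }
  if (existing.fingerprint !== fingerprint) {
    // Same key reused with a different payload: reject as misuse.
    return { kind: "conflict", reason: "same key, different payload" };
  }
  if (existing.state === "completed") {
    // Retry of a finished request: replay the original response.
    return { kind: "replay", response: existing.response };
  }
  // Still running or unresolved: do not execute a second time.
  return { kind: "conflict", reason: `request is ${existing.state}` };
}

// Store the response so later retries can replay it.
function complete(scope: string, key: string, response: string): void {
  const id = JSON.stringify([scope, key]);
  const existing = records[id];
  if (existing === undefined) {
    throw new Error("cannot complete a key that was never reserved");
  }
  records[id] = { fingerprint: existing.fingerprint, state: "completed", response };
}
```

&lt;p&gt;A handler would call &lt;code&gt;begin&lt;/code&gt; first, run the operation only on &lt;code&gt;execute&lt;/code&gt;, and call &lt;code&gt;complete&lt;/code&gt; with the response it is about to return.&lt;/p&gt;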




&lt;h2&gt;
  
  
  The Hard Part: Ambiguous Failures
&lt;/h2&gt;

&lt;p&gt;The real failure mode isn’t duplicates - it’s uncertainty.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Payment provider accepts the charge&lt;/li&gt;
&lt;li&gt;Your service times out before it sees the provider’s response&lt;/li&gt;
&lt;li&gt;Client retries&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You don’t know if the charge succeeded.&lt;/p&gt;

&lt;p&gt;Retrying blindly can double-charge.&lt;/p&gt;

&lt;p&gt;Safe systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mark the request as &lt;strong&gt;ambiguous&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;reconcile with downstream systems&lt;/li&gt;
&lt;li&gt;only finalize once certainty is restored&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Practical Signals
&lt;/h2&gt;

&lt;p&gt;If idempotency is working, you should see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;replayed responses (normal)&lt;/li&gt;
&lt;li&gt;occasional in-progress conflicts&lt;/li&gt;
&lt;li&gt;rare payload mismatches&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If not, expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;duplicate writes&lt;/li&gt;
&lt;li&gt;inconsistent downstream state&lt;/li&gt;
&lt;li&gt;hard-to-debug production issues&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Core Insight
&lt;/h2&gt;

&lt;p&gt;Idempotency keys are not a cache.&lt;/p&gt;

&lt;p&gt;They are a &lt;strong&gt;correctness boundary&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;they define ownership&lt;/li&gt;
&lt;li&gt;they prevent duplicate side effects&lt;/li&gt;
&lt;li&gt;they preserve system integrity under retries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without atomic reservation and state modeling, they don’t actually solve the problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  Full Article
&lt;/h2&gt;

&lt;p&gt;For the complete breakdown (schema design, handler flow, TTL strategy, and failure cases):&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://codenotes.tech/blog/api-idempotency-keys-prevent-duplicate-requests" rel="noopener noreferrer"&gt;https://codenotes.tech/blog/api-idempotency-keys-prevent-duplicate-requests&lt;/a&gt;&lt;/p&gt;

</description>
      <category>api</category>
      <category>reliability</category>
      <category>distributedsystems</category>
      <category>backenddevelopment</category>
    </item>
    <item>
      <title>Background Jobs in Production: The Problems Queues Don’t Solve</title>
      <dc:creator>Taras H</dc:creator>
      <pubDate>Sun, 08 Mar 2026 11:17:45 +0000</pubDate>
      <link>https://dev.to/taras_h_7a24f2b356a6e/background-jobs-in-production-the-problems-queues-dont-solve-209a</link>
      <guid>https://dev.to/taras_h_7a24f2b356a6e/background-jobs-in-production-the-problems-queues-dont-solve-209a</guid>
      <description>&lt;p&gt;Moving work out of the request path is one of the most common ways to&lt;br&gt;
speed up backend systems.&lt;/p&gt;

&lt;p&gt;Emails are sent asynchronously.&lt;br&gt;
Invoices are generated by workers.&lt;br&gt;
Webhooks are delivered through queues.&lt;br&gt;
Image processing and indexing run in background jobs.&lt;/p&gt;

&lt;p&gt;Latency improves immediately.&lt;/p&gt;

&lt;p&gt;But many teams eventually notice something strange in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  duplicate emails appear&lt;/li&gt;
&lt;li&gt;  retries increase system load&lt;/li&gt;
&lt;li&gt;  dead-letter queues slowly grow&lt;/li&gt;
&lt;li&gt;  workflows technically "succeed"... but the outcome is wrong&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The queue is healthy.&lt;br&gt;
Workers are running.&lt;/p&gt;

&lt;p&gt;Yet the system behaves incorrectly.&lt;/p&gt;

&lt;p&gt;Moving work to the background &lt;strong&gt;changes where failures happen&lt;/strong&gt;.&lt;br&gt;
It does not remove them.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This post is a shorter version of a deeper engineering write-up&lt;br&gt;
originally published on CodeNotes.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  The Assumption Behind Background Jobs
&lt;/h2&gt;

&lt;p&gt;Background job systems are usually introduced with a simple expectation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If a job fails, the queue will retry it until it succeeds.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Queues also provide useful features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  buffering traffic spikes&lt;/li&gt;
&lt;li&gt;  independent worker scaling&lt;/li&gt;
&lt;li&gt;  retry handling&lt;/li&gt;
&lt;li&gt;  isolation from request latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of this, async processing often &lt;em&gt;feels&lt;/em&gt; safer than synchronous&lt;br&gt;
execution.&lt;/p&gt;

&lt;p&gt;But that assumption depends on something rarely guaranteed in&lt;br&gt;
production:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;that running a job multiple times produces the same result as running&lt;br&gt;
it once.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  What "At-Least-Once Delivery" Actually Means
&lt;/h2&gt;

&lt;p&gt;Most queue systems guarantee &lt;strong&gt;at-least-once delivery&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That means a message is redelivered until a worker acknowledges it, even if&lt;br&gt;
that results in duplicate execution.&lt;/p&gt;

&lt;p&gt;It does &lt;strong&gt;not&lt;/strong&gt; mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  the job runs exactly once&lt;/li&gt;
&lt;li&gt;  side effects happen exactly once&lt;/li&gt;
&lt;li&gt;  messages are processed in order&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, the queue protects against &lt;strong&gt;message loss&lt;/strong&gt;, not&lt;br&gt;
&lt;strong&gt;duplicate work&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Once duplicate execution becomes possible, correctness has to come from&lt;br&gt;
somewhere else.&lt;/p&gt;

&lt;p&gt;Usually that means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  idempotent handlers&lt;/li&gt;
&lt;li&gt;  deduplication keys&lt;/li&gt;
&lt;li&gt;  explicit state transitions&lt;/li&gt;
&lt;li&gt;  retry boundaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without those protections, the infrastructure is reliable while the&lt;br&gt;
workflow is not.&lt;/p&gt;


&lt;h2&gt;
  
  
  A Classic Failure Scenario
&lt;/h2&gt;

&lt;p&gt;Consider a worker that sends a payment receipt:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;emailClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;receiptSentAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the worker crashes &lt;strong&gt;after sending the email&lt;/strong&gt; but &lt;strong&gt;before updating&lt;br&gt;
the database&lt;/strong&gt;, the job will be retried.&lt;/p&gt;

&lt;p&gt;Now the customer receives two receipts.&lt;/p&gt;

&lt;p&gt;The queue behaved exactly as designed.&lt;/p&gt;

&lt;p&gt;But the business outcome is incorrect.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Production Systems Break Here
&lt;/h2&gt;

&lt;p&gt;Background job systems introduce two things that make correctness&lt;br&gt;
harder.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Duplicate execution
&lt;/h3&gt;

&lt;p&gt;Workers can crash after performing side effects but before acknowledging&lt;br&gt;
the message.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Time separation
&lt;/h3&gt;

&lt;p&gt;Jobs may execute minutes or hours after they were created, when system&lt;br&gt;
state has already changed.&lt;/p&gt;

&lt;p&gt;Because of this, retries often interact with &lt;strong&gt;partial state&lt;/strong&gt; or&lt;br&gt;
&lt;strong&gt;outdated context&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Design Rule Most Teams Learn Later
&lt;/h2&gt;

&lt;p&gt;A background job should never be treated as a &lt;strong&gt;one-time action&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It should be treated as a &lt;strong&gt;replayable command&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Every handler should be safe if it runs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  twice&lt;/li&gt;
&lt;li&gt;  later than expected&lt;/li&gt;
&lt;li&gt;  after partial completion&lt;/li&gt;
&lt;li&gt;  out of order&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If those conditions break the workflow, retries will eventually corrupt&lt;br&gt;
system behavior.&lt;/p&gt;
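
&lt;p&gt;One way to sketch a replayable receipt handler, assuming the email provider drops repeated sends that carry the same dedup key. That provider behavior, the &lt;code&gt;makeFakeProvider&lt;/code&gt; helper, and the &lt;code&gt;sendReceipt&lt;/code&gt; name are all assumptions for illustration. The sketch is synchronous to stay short; real I/O would be async:&lt;/p&gt;

```typescript
type Payment = { id: string; receiptSentAt: Date | null };
type SendFn = (dedupKey: string, paymentId: string) => void;

// Hypothetical email provider that ignores repeats of the same dedup key.
function makeFakeProvider() {
  const seen: { [dedupKey: string]: true | undefined } = {};
  const delivered: string[] = [];
  const send: SendFn = (dedupKey, paymentId) => {
    if (seen[dedupKey]) {
      return; // duplicate delivery attempt, dropped by the provider
    }
    seen[dedupKey] = true;
    delivered.push(paymentId);
  };
  return { send, delivered };
}

// Safe to run twice, late, or after a partial run: it checks recorded state
// first, and it always sends with the same key for the same payment.
function sendReceipt(payment: Payment, send: SendFn): "sent" | "skipped" {
  if (payment.receiptSentAt !== null) {
    return "skipped"; // an earlier run already finished
  }
  const dedupKey = `receipt:${payment.id}`;
  send(dedupKey, payment.id);
  payment.receiptSentAt = new Date();
  return "sent";
}
```

&lt;p&gt;If a worker crashes after &lt;code&gt;send&lt;/code&gt; but before the timestamp is written, the retry calls &lt;code&gt;send&lt;/code&gt; again with the same key and the provider drops it, so the customer still gets one receipt.&lt;/p&gt;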




&lt;h2&gt;
  
  
  The Monitoring Trap
&lt;/h2&gt;

&lt;p&gt;Teams often monitor queue infrastructure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  queue depth&lt;/li&gt;
&lt;li&gt;  worker throughput&lt;/li&gt;
&lt;li&gt;  retry counts&lt;/li&gt;
&lt;li&gt;  dead-letter volume&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those metrics matter - but they don't answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Did users receive duplicate emails?&lt;/li&gt;
&lt;li&gt;  Did a payment create multiple ledger entries?&lt;/li&gt;
&lt;li&gt;  Did downstream systems receive conflicting updates?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A queue dashboard can look completely healthy while the workflow is&lt;br&gt;
incorrect.&lt;/p&gt;




&lt;h2&gt;
  
  
  Read the Full Production Breakdown
&lt;/h2&gt;

&lt;p&gt;This post only covers the core failure patterns.&lt;/p&gt;

&lt;p&gt;The full article explains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  why retries can &lt;strong&gt;make outages worse&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  how &lt;strong&gt;idempotent background jobs&lt;/strong&gt; are designed&lt;/li&gt;
&lt;li&gt;  why &lt;strong&gt;dead-letter queues silently grow&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  what production teams monitor &lt;strong&gt;beyond queue depth&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  a &lt;strong&gt;practical rollout checklist&lt;/strong&gt; for new background jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 &lt;strong&gt;Full article:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://codenotes.tech/blog/background-jobs-in-production" rel="noopener noreferrer"&gt;https://codenotes.tech/blog/background-jobs-in-production&lt;/a&gt;&lt;/p&gt;

</description>
      <category>sre</category>
      <category>backend</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Why AI Code Review Comments Look Right but Miss Real Risks</title>
      <dc:creator>Taras H</dc:creator>
      <pubDate>Fri, 27 Feb 2026 16:59:54 +0000</pubDate>
      <link>https://dev.to/taras_h_7a24f2b356a6e/why-ai-code-review-comments-look-right-but-miss-real-risks-1j74</link>
      <guid>https://dev.to/taras_h_7a24f2b356a6e/why-ai-code-review-comments-look-right-but-miss-real-risks-1j74</guid>
      <description>&lt;p&gt;Many teams have added AI code review to their pull request workflow.&lt;/p&gt;

&lt;p&gt;The promise is obvious: faster feedback, broader coverage, fewer review bottlenecks. AI scans every diff, flags suspicious code, suggests test cases, and highlights style issues in seconds.&lt;/p&gt;

&lt;p&gt;Pull requests move faster. Review queues shrink. Everything looks healthier.&lt;/p&gt;

&lt;p&gt;But production incidents don’t disappear.&lt;/p&gt;

&lt;p&gt;So the practical question emerges:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If AI reviews every PR, why are high-risk issues still reaching production?&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Reasonable Assumption
&lt;/h2&gt;

&lt;p&gt;It’s natural to assume:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;More review coverage + faster feedback = better quality.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AI increases comment volume. It catches missing null checks. It suggests cleaner error handling. It improves surface-level consistency.&lt;/p&gt;

&lt;p&gt;At a process level, things look better.&lt;/p&gt;

&lt;p&gt;But review activity is not the same thing as risk reduction.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where the Gap Appears
&lt;/h2&gt;

&lt;p&gt;Most AI code review tools are excellent at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pattern matching&lt;/li&gt;
&lt;li&gt;Local correctness&lt;/li&gt;
&lt;li&gt;Code explanation&lt;/li&gt;
&lt;li&gt;Generic best practices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They are much weaker at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business logic validation&lt;/li&gt;
&lt;li&gt;Authorization boundaries&lt;/li&gt;
&lt;li&gt;Implicit architectural constraints&lt;/li&gt;
&lt;li&gt;Production failure modes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;updateUserRole&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findUnique&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;User not found&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An AI reviewer might suggest stronger validation or clearer error handling.&lt;/p&gt;

&lt;p&gt;But the real production risk may be completely different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who is allowed to change roles?&lt;/li&gt;
&lt;li&gt;Is there audit logging?&lt;/li&gt;
&lt;li&gt;Does this break cross-service assumptions?&lt;/li&gt;
&lt;li&gt;What happens under concurrent updates?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These risks don’t live in the diff. They live in the system.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why AI Feels More Effective Than It Is
&lt;/h2&gt;

&lt;p&gt;Three patterns show up repeatedly:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Plausible Comments Create Confidence
&lt;/h3&gt;

&lt;p&gt;LLMs generate comments that &lt;em&gt;sound correct&lt;/em&gt;. That increases perceived rigor — even when the risk profile hasn’t changed.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Diffs Hide System Context
&lt;/h3&gt;

&lt;p&gt;Pull requests rarely include architectural history, compliance constraints, or production incident lessons. Humans often carry this context implicitly. AI usually doesn’t.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Automation Changes Human Behavior
&lt;/h3&gt;

&lt;p&gt;When AI has already “reviewed” the code, humans subtly shift from critical analysis to verification mode.&lt;/p&gt;

&lt;p&gt;The question changes from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What could fail in production?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Did we resolve the AI comments?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That shift matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Key Insight
&lt;/h2&gt;

&lt;p&gt;AI expands coverage.&lt;/p&gt;

&lt;p&gt;Humans must still own judgment.&lt;/p&gt;

&lt;p&gt;AI is strong at local correctness. Production failures usually emerge from system interactions: retries under load, cache drift, authorization boundaries, cross-service contracts.&lt;/p&gt;

&lt;p&gt;If the review process optimizes for comment resolution instead of failure thinking, speed improves — but risk stays constant.&lt;/p&gt;




&lt;h2&gt;
  
  
  If You’re Using AI Review
&lt;/h2&gt;

&lt;p&gt;A useful mental model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Let AI handle first-pass mechanical checks.&lt;/li&gt;
&lt;li&gt;Explicitly reserve human review for system-level risk.&lt;/li&gt;
&lt;li&gt;Measure escaped defects — not comment counts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real question isn’t whether AI comments are helpful.&lt;/p&gt;

&lt;p&gt;It’s whether your review process still forces engineers to think about how systems fail in production.&lt;/p&gt;




&lt;p&gt;If this topic resonates, the full breakdown goes deeper into why this happens and how teams misinterpret review signal vs. real risk:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Full article:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://codenotes.tech/blog/why-ai-code-review-comments-look-right-but-miss-real-risks" rel="noopener noreferrer"&gt;https://codenotes.tech/blog/why-ai-code-review-comments-look-right-but-miss-real-risks&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>codereview</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
