<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Petr Petrenko</title>
    <description>The latest articles on DEV Community by Petr Petrenko (@n0rm4l).</description>
    <link>https://dev.to/n0rm4l</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3975126%2F391ef96a-7f53-4e27-b60c-b5456fbfb34f.jpg</url>
      <title>DEV Community: Petr Petrenko</title>
      <link>https://dev.to/n0rm4l</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/n0rm4l"/>
    <language>en</language>
    <item>
      <title>Your Database Just Died. Why Is Everything Still Running?</title>
      <dc:creator>Petr Petrenko</dc:creator>
      <pubDate>Mon, 15 Jun 2026 05:50:46 +0000</pubDate>
      <link>https://dev.to/n0rm4l/your-database-just-died-why-is-everything-still-running-227p</link>
      <guid>https://dev.to/n0rm4l/your-database-just-died-why-is-everything-still-running-227p</guid>
      <description>&lt;p&gt;It's 3 AM. Your PostgreSQL pod crashes. On-call fires. The engineer wakes up, checks the dashboard, and spends 20 minutes figuring out &lt;em&gt;why&lt;/em&gt; the payments service is throwing 500s at a 100% error rate before realizing that the database is down while payments is still running: hammering connection pools, logging thousands of errors per second, and triggering cascading alerts.&lt;/p&gt;

&lt;p&gt;They scale payments to zero. Problem stops. Recovery begins.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That 20 minutes was avoidable.&lt;/strong&gt; klink would have scaled payments to zero automatically — 30 seconds after the database went down.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Kubernetes is excellent at keeping individual services running. Liveness probes, restart policies, resource limits — it's all there. But Kubernetes has no concept of &lt;em&gt;relationships between workloads&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;When your database fails, Kubernetes does exactly what you told it to: it keeps the dependent services running. Those services now do nothing useful. They just generate noise, consume resources, and make your incident harder to debug.&lt;/p&gt;

&lt;p&gt;This is the gap klink fills.&lt;/p&gt;




&lt;h2&gt;
  
  
  What klink Does
&lt;/h2&gt;

&lt;p&gt;klink introduces a new primitive: &lt;strong&gt;WorkloadDependency&lt;/strong&gt;. You declare that service B depends on service A. klink watches. When A goes unhealthy, klink scales B to zero. When A recovers, klink restores B automatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deps.klink.dev/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;WorkloadDependency&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payments-needs-database&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;dependent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payments-service&lt;/span&gt;

  &lt;span class="na"&gt;dependsOn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgresql&lt;/span&gt;
      &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;minReadyPercent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;    &lt;span class="c1"&gt;# healthy if ≥80% pods ready&lt;/span&gt;
        &lt;span class="na"&gt;window&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;            &lt;span class="c1"&gt;# ignore transient restarts&lt;/span&gt;
        &lt;span class="na"&gt;recoveryWindow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;    &lt;span class="c1"&gt;# wait for stability before restoring&lt;/span&gt;

  &lt;span class="na"&gt;onDegraded&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ScaleToZero&lt;/span&gt;

  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;strict&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No code changes. No sidecars. No complex configuration.&lt;/p&gt;




&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq81jcs4xi638pzybibud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq81jcs4xi638pzybibud.png" alt="When PostgreSQL crashes, klink waits 30 seconds (ignoring transient restarts), then scales payments to zero and saves the replica count. When the database recovers, klink waits 60 seconds for stability, then automatically restores payments to its original replica count." width="800" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

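&lt;p&gt;The &lt;strong&gt;hysteresis window&lt;/strong&gt; is critical. Without it, a single pod restart would trigger a cascading shutdown. With &lt;code&gt;window: 30s&lt;/code&gt;, klink ignores transient failures and acts only on sustained outages. &lt;code&gt;recoveryWindow: 60s&lt;/code&gt; applies the same idea in the other direction: the dependency must stay healthy for a full minute before its dependents are restored.&lt;/p&gt;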
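
&lt;p&gt;As a rough illustration of the idea (not klink's actual implementation), the sketch below changes state only after a full window of agreeing health samples. The &lt;code&gt;HysteresisGate&lt;/code&gt; class and the 10-second polling interval are assumptions made for this example: with that interval, &lt;code&gt;window: 30s&lt;/code&gt; becomes 3 samples and &lt;code&gt;recoveryWindow: 60s&lt;/code&gt; becomes 6.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import deque


class HysteresisGate:
    """Hypothetical helper: change state only after a full window of agreeing samples."""

    def __init__(self, degrade_samples, recover_samples):
        self.healthy = True
        # Each deque keeps only the most recent N observations.
        self._degrade = deque(maxlen=degrade_samples)
        self._recover = deque(maxlen=recover_samples)

    def observe(self, sample_healthy):
        """Record one health sample and return the current (debounced) state."""
        self._degrade.append(sample_healthy)
        self._recover.append(sample_healthy)
        if self.healthy:
            # Degrade only when the whole window is full of unhealthy samples.
            window_full = len(self._degrade) == self._degrade.maxlen
            if window_full and not any(self._degrade):
                self.healthy = False
        else:
            # Recover only when the whole recovery window is healthy.
            window_full = len(self._recover) == self._recover.maxlen
            if window_full and all(self._recover):
                self.healthy = True
        return self.healthy


# Assumed 10s polling: window 30s = 3 samples, recoveryWindow 60s = 6 samples.
gate = HysteresisGate(degrade_samples=3, recover_samples=6)
blip = [gate.observe(s) for s in (False, True)]    # one transient failure
outage = [gate.observe(False) for _ in range(3)]   # sustained outage
recovery = [gate.observe(True) for _ in range(6)]  # stable recovery
print(blip, outage, recovery)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The single failed sample never flips the state. Only the third consecutive failure does, and the gate reports healthy again only after six healthy samples in a row.&lt;/p&gt;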




&lt;h2&gt;
  
  
  Enforcement Modes
&lt;/h2&gt;

&lt;p&gt;Different situations call for different behavior. klink has four modes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;When to use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;strict&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Scales to 0 on failure. Reverts manual scale-ups within 15s.&lt;/td&gt;
&lt;td&gt;Production services where cascade is non-negotiable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;soft&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Scales to 0 once. Respects manual overrides.&lt;/td&gt;
&lt;td&gt;Services where operators need flexibility during incidents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Doesn't scale down. Blocks scale-up via admission webhook.&lt;/td&gt;
&lt;td&gt;Preventing HPA from scaling up while dependency is down&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;observe&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Logs what it &lt;em&gt;would&lt;/em&gt; do. Takes no action.&lt;/td&gt;
&lt;td&gt;Safe onboarding — see what klink would do before enabling it&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
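
&lt;p&gt;One way to read the table is as a set of flags per mode. The sketch below is an interpretation of the table, not klink's code: &lt;code&gt;ModeBehavior&lt;/code&gt;, &lt;code&gt;MODES&lt;/code&gt;, and &lt;code&gt;plan&lt;/code&gt; are hypothetical names, and the action strings are labels invented for the example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass(frozen=True)
class ModeBehavior:
    scales_to_zero: bool           # scale the dependent down on failure
    reverts_manual_scale_up: bool  # undo operator scale-ups while degraded
    blocks_scale_up: bool          # reject scale-ups at admission time


# Flags transcribed from the table above (an interpretation, not klink source).
MODES = {
    "strict": ModeBehavior(True, True, False),
    "soft": ModeBehavior(True, False, False),
    "gate": ModeBehavior(False, False, True),
    "observe": ModeBehavior(False, False, False),
}


def plan(mode, dependency_healthy):
    """Return the actions a controller might take for a mode and health state."""
    if dependency_healthy:
        return []
    behavior = MODES[mode]
    actions = []
    if behavior.scales_to_zero:
        actions.append("scale-to-zero")
    if behavior.reverts_manual_scale_up:
        actions.append("revert-manual-scale-up")
    if behavior.blocks_scale_up:
        actions.append("block-scale-up")
    # observe mode takes no action; it only reports what would have happened.
    return actions or ["log-only"]


for mode in MODES:
    print(mode, plan(mode, dependency_healthy=False))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;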

&lt;p&gt;&lt;strong&gt;Start with &lt;code&gt;observe&lt;/code&gt; mode.&lt;/strong&gt; Apply klink to your existing services, watch the logs for a week, and only switch to &lt;code&gt;strict&lt;/code&gt; or &lt;code&gt;soft&lt;/code&gt; once you're confident.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;observe&lt;/span&gt;  &lt;span class="c1"&gt;# "would scale payments to 0 — dependency unhealthy"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Mutual Dependency Problem
&lt;/h2&gt;

&lt;p&gt;What happens when A depends on B &lt;em&gt;and&lt;/em&gt; B depends on A?&lt;/p&gt;

&lt;p&gt;Naive implementations deadlock. Both services go to zero, each waiting for the other to recover. You need a manual fix every time.&lt;/p&gt;

&lt;p&gt;klink solves this with &lt;strong&gt;CoSuspended&lt;/strong&gt; detection:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsng97lpovfrhxw5pecgo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsng97lpovfrhxw5pecgo.png" alt="Payments depends on database AND database depends on payments. When database fails, klink scales payments to zero and marks it as CoSuspended — intentionally paused, not broken. database sees payments at zero but doesn't cascade. Restore database manually → payments comes back automatically. No deadlock." width="800" height="668"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When klink scales payments to zero because database failed, it marks payments as &lt;em&gt;CoSuspended&lt;/em&gt; — intentionally scaled down by klink, not actually broken.&lt;/p&gt;

&lt;p&gt;When database checks its dependencies, it sees payments at zero — but recognizes it as CoSuspended and doesn't cascade. When you manually restore database, klink automatically restores payments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No deadlock, and no manual step to bring payments back.&lt;/strong&gt;&lt;/p&gt;
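
&lt;p&gt;The rule described above can be sketched as a single predicate. This is a simplified model under stated assumptions: &lt;code&gt;Workload&lt;/code&gt;, its fields, and &lt;code&gt;should_suspend_dependents&lt;/code&gt; are hypothetical, and readiness is reduced to a boolean.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    ready: bool
    co_suspended: bool = False  # True when the controller itself paused it


def should_suspend_dependents(dependency):
    """An unready dependency cascades unless it was paused on purpose."""
    return not dependency.ready and not dependency.co_suspended


# database really failed, so payments gets suspended and marked CoSuspended.
database = Workload("database", ready=False)
payments = Workload("payments", ready=True)
if should_suspend_dependents(database):
    payments.ready = False
    payments.co_suspended = True

# database also depends on payments, but a CoSuspended workload is not a failure.
print(should_suspend_dependents(payments))  # False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;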




&lt;h2&gt;
  
  
  Argo Rollout Support — Canary Awareness
&lt;/h2&gt;

&lt;p&gt;klink understands Argo Rollouts. If your payments service is in the middle of a canary deployment when its database goes down, klink &lt;strong&gt;defers the scale-to-zero until the rollout completes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwqdxhk2g253ge6t4yqi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwqdxhk2g253ge6t4yqi.png" alt="If a canary rollout is in progress when the dependency fails, klink defers the scale-to-zero until the rollout completes. Interrupting an active deployment would break the canary analysis and take down the stable version. klink waits, then acts." width="800" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You never want to interrupt an active deployment. klink handles this automatically.&lt;/p&gt;




&lt;h2&gt;
  
  
  CronJob Support — Suspend Instead of Scale
&lt;/h2&gt;

&lt;p&gt;For batch jobs, scaling to zero makes no sense. klink sets &lt;code&gt;spec.suspend: true&lt;/code&gt; instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;dependent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CronJob&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nightly-billing-export&lt;/span&gt;
  &lt;span class="na"&gt;dependsOn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;billing-service&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When billing-service goes down, the CronJob is suspended. No failed jobs accumulating in history. When billing-service recovers, the CronJob resumes automatically.&lt;/p&gt;




&lt;h2&gt;
  
  
  Notifications
&lt;/h2&gt;

&lt;p&gt;Get notified when workloads are suspended or restored — before your monitoring fires:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;notify&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;webhookSecretRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;slack-webhook&lt;/span&gt;
      &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;url&lt;/span&gt;
    &lt;span class="na"&gt;onPhases&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Suspended&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Healthy&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The notification arrives the moment klink acts, with full context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"workloadDependency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"payments-needs-database"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"namespace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"production"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"phase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Suspended"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"previousPhase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Degraded"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dependent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"payments-service"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dependentKind"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Deployment"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dependency postgresql not healthy: 0/3 ready"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-06-15T03:00:00Z"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notifications include retry with exponential backoff (1s → 2s → 4s) so transient webhook outages don't silently drop alerts.&lt;/p&gt;
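
&lt;p&gt;A minimal sketch of that retry schedule, assuming the three delays are waits before each retry (so up to four attempts in total). &lt;code&gt;send_with_retry&lt;/code&gt; and the injected &lt;code&gt;send&lt;/code&gt; and &lt;code&gt;sleep&lt;/code&gt; callables are hypothetical; they are passed in so the example runs without a real webhook.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time


def send_with_retry(send, payload, delays=(1.0, 2.0, 4.0), sleep=time.sleep):
    """Call send(payload); after each failure wait the next delay and retry."""
    try:
        return send(payload)
    except Exception as first_error:  # broad on purpose for this sketch
        last_error = first_error
    for delay in delays:
        sleep(delay)
        try:
            return send(payload)
        except Exception as retry_error:
            last_error = retry_error
    # Every attempt failed: surface the last error instead of dropping it.
    raise last_error


# Demo with a fake webhook that fails twice, then succeeds.
calls = []
waits = []


def flaky_send(payload):
    calls.append(payload)
    if len(calls) != 3:
        raise ConnectionError("webhook unavailable")
    return "delivered"


result = send_with_retry(flaky_send, {"phase": "Suspended"}, sleep=waits.append)
print(result, waits)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the demo the third attempt succeeds, so only the 1s and 2s waits are used. If every attempt fails, the last error is raised rather than swallowed.&lt;/p&gt;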




&lt;h2&gt;
  
  
  Safety Net — maxSuspendDuration
&lt;/h2&gt;

&lt;p&gt;Long outages happen. Your database might be down for hours. You don't want your payments service suspended indefinitely.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;onDegraded&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ScaleToZero&lt;/span&gt;
    &lt;span class="na"&gt;maxSuspendDuration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;4h&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After 4 hours, klink restores the workload regardless of dependency state and enters the &lt;code&gt;Released&lt;/code&gt; phase. In that phase it won't suspend the workload again until the dependency has genuinely recovered. This keeps a single bad dependency from causing an indefinite outage.&lt;/p&gt;




&lt;h2&gt;
  
  
  Observability
&lt;/h2&gt;

&lt;p&gt;klink exports Prometheus metrics so you can see exactly what's happening:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight prometheus"&gt;&lt;code&gt;&lt;span class="n"&gt;klink_dependency_phase&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"production"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"payments-needs-database"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;phase&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Suspended"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;klink_scale_to_zero_total&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"production"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Deployment"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"payments-service"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="n"&gt;klink_replicas_restored_total&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"production"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Deployment"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"payments-service"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GKE users get a &lt;code&gt;PodMonitoring&lt;/code&gt; resource automatically when metrics are enabled.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm upgrade &lt;span class="nt"&gt;--install&lt;/span&gt; klink oci://ghcr.io/n0rm4l-me/charts/klink &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--version&lt;/span&gt; 0.3.0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; klink-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply your first WorkloadDependency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deps.klink.dev/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;WorkloadDependency&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payments-needs-database&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;dependent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payments&lt;/span&gt;
  &lt;span class="na"&gt;dependsOn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database&lt;/span&gt;
      &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;minReadyPercent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
        &lt;span class="na"&gt;window&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
        &lt;span class="na"&gt;recoveryWindow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;
  &lt;span class="na"&gt;onDegraded&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ScaleToZero&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;observe&lt;/span&gt;  &lt;span class="c1"&gt;# start here — see what klink would do&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get workloaddependencies &lt;span class="nt"&gt;-A&lt;/span&gt;

NAMESPACE    NAME                      PHASE     REPLICAS   MESSAGE
default      payments-needs-database   Healthy              all dependencies healthy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you're comfortable with what you see in &lt;code&gt;observe&lt;/code&gt; mode, switch to &lt;code&gt;strict&lt;/code&gt; or &lt;code&gt;soft&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What klink Supports
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;As dependent&lt;/th&gt;
&lt;th&gt;As dependency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;✅ scale to 0&lt;/td&gt;
&lt;td&gt;✅ readyReplicas check&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;StatefulSet&lt;/td&gt;
&lt;td&gt;✅ scale to 0&lt;/td&gt;
&lt;td&gt;✅ readyReplicas check&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CronJob&lt;/td&gt;
&lt;td&gt;✅ suspend/resume&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Argo Rollout&lt;/td&gt;
&lt;td&gt;✅ scale to 0 (canary-aware)&lt;/td&gt;
&lt;td&gt;✅ phase check&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Incident That Started This
&lt;/h2&gt;

&lt;p&gt;We run microservices on Kubernetes. One evening our message queue had a rolling restart — routine maintenance, 45 seconds of unavailability. But 12 services that depended on it kept running and kept trying to connect. By the time the queue was back, we had retries queued up, connection pools exhausted, and a 10-minute degraded period that should have been 45 seconds.&lt;/p&gt;

&lt;p&gt;The fix was conceptually simple: "if the queue is down, pause the consumers." But there was no Kubernetes-native way to express that relationship.&lt;/p&gt;

&lt;p&gt;So we built klink.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prometheus-based health conditions&lt;/strong&gt; — &lt;code&gt;promQuery: 'pg_up == 1'&lt;/code&gt; instead of readyReplicas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;kubectl klink&lt;/code&gt; plugin&lt;/strong&gt; — &lt;code&gt;klink graph&lt;/code&gt;, &lt;code&gt;klink why payments-service&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DaemonSet support&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project is open source under Apache 2.0. Issues, PRs, and feedback welcome.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/n0rm4l-me/klink" rel="noopener noreferrer"&gt;github.com/n0rm4l-me/klink&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>cloudnative</category>
      <category>opensource</category>
    </item>
    <item>
      <title>We replaced etcd with Google Cloud Spanner. Here's what happened.</title>
      <dc:creator>Petr Petrenko</dc:creator>
      <pubDate>Tue, 09 Jun 2026 04:08:55 +0000</pubDate>
      <link>https://dev.to/n0rm4l/we-replaced-etcd-with-google-cloud-spanner-heres-what-happened-2e9a</link>
      <guid>https://dev.to/n0rm4l/we-replaced-etcd-with-google-cloud-spanner-heres-what-happened-2e9a</guid>
      <description>&lt;p&gt;&lt;code&gt;spanner-etcd&lt;/code&gt; is an open source (Apache 2.0), drop-in etcd v3 replacement backed by Google Cloud Spanner. Same API, no client changes — just point &lt;code&gt;--etcd-servers&lt;/code&gt; at it.&lt;/p&gt;

&lt;p&gt;We built it because etcd has a fundamental scaling constraint: every write serializes through a single global revision counter. One row, one lock, every transaction waits in line. At 32 concurrent writers, that counter becomes the bottleneck.&lt;/p&gt;

&lt;p&gt;The GKE team solved this years ago internally to scale Kubernetes to 65,000 nodes. Their implementation is closed. So we built an open one.&lt;/p&gt;

&lt;p&gt;This is the story of how it works, what we got wrong, and the honest benchmark numbers.&lt;/p&gt;




&lt;h2&gt;
  
  
  The core idea: timestamps as revisions
&lt;/h2&gt;

&lt;p&gt;etcd's revision is a monotonically increasing integer. Every write increments it. That increment is the serialization point.&lt;/p&gt;

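&lt;p&gt;Spanner has &lt;code&gt;PENDING_COMMIT_TIMESTAMP()&lt;/code&gt;: a TrueTime-based timestamp assigned at commit time and monotonically increasing in commit order. Spanner guarantees distinct timestamps only for transactions that write overlapping data, so writes to unrelated keys can in principle share one. No counter. No lock. Each transaction commits independently.&lt;/p&gt;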

&lt;p&gt;So instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;kv_rev&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;rev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rev&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- everyone waits here&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;kv&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'/foo'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'bar'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;kv&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PENDING_COMMIT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s1"&gt;'/foo'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'bar'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The revision is the commit timestamp, cast to &lt;code&gt;int64&lt;/code&gt; UnixNano. Valid etcd &lt;code&gt;ModRevision&lt;/code&gt;. Zero contention.&lt;/p&gt;
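
&lt;p&gt;To make the cast concrete, here is a small sketch of the arithmetic, assuming the commit timestamp arrives as an RFC 3339 UTC string with up to nine fractional digits. The function name is hypothetical, and this only illustrates the conversion, not how &lt;code&gt;spanner-etcd&lt;/code&gt; implements it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import calendar
import time

NANOS_PER_SECOND = 1_000_000_000


def commit_timestamp_to_revision(timestamp):
    """Convert an RFC 3339 UTC timestamp string to Unix nanoseconds."""
    whole, _, fraction = timestamp.rstrip("Z").partition(".")
    # timegm interprets the parsed struct_time as UTC and returns whole seconds.
    seconds = calendar.timegm(time.strptime(whole, "%Y-%m-%dT%H:%M:%S"))
    # Pad the fractional part to nine digits so it reads as nanoseconds.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return seconds * NANOS_PER_SECOND + nanos


earlier = commit_timestamp_to_revision("2026-06-09T04:08:55.000001Z")
later = commit_timestamp_to_revision("2026-06-09T04:08:55.000002Z")
print(earlier, later, later - earlier)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A later commit maps to a larger integer, which is the ordering property a revision needs. Unix nanoseconds also fit in a signed 64-bit integer for any date before the year 2262.&lt;/p&gt;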

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; at ×32 concurrency, write throughput reached &lt;strong&gt;673 ops/sec&lt;/strong&gt;, 15× faster than the serialized integer counter baseline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo59gb893c03bha14m9mq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo59gb893c03bha14m9mq.png" alt="PCT vs integer counter: serialized writes waiting in line vs parallel commits with PENDING_COMMIT_TIMESTAMP" width="800" height="802"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Watch events via Change Streams
&lt;/h2&gt;

&lt;p&gt;etcd Watch is a streaming API: clients subscribe to a prefix and receive events as writes happen. A naive replacement would poll the database for new rows. We tried that first. It worked, but ~1s latency felt wrong.&lt;/p&gt;

&lt;p&gt;Spanner has Change Streams — a push-based CDC mechanism that delivers row changes within tens of milliseconds. We built a partition reader that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Starts streaming all partitions of &lt;code&gt;kv_changes&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Persists partition cursors to Spanner every 5s (so replicas resume correctly after restart)&lt;/li&gt;
&lt;li&gt;Falls back to 1-second polling on the emulator (Change Streams aren't supported there)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result: &lt;strong&gt;~30ms Watch latency&lt;/strong&gt; end-to-end on production Spanner in the same region. Not etcd's 1ms — Spanner is not a local in-memory store. But for Kubernetes workloads it's completely fine.&lt;/p&gt;




&lt;h2&gt;
  
  
  The back-join problem we almost missed
&lt;/h2&gt;

&lt;p&gt;Early benchmarks showed Get at &lt;strong&gt;71 ops/sec&lt;/strong&gt;. Seemed reasonable. Then we looked at the query plan.&lt;/p&gt;

&lt;p&gt;Our schema used &lt;code&gt;PRIMARY KEY (id)&lt;/code&gt; with a &lt;code&gt;bit_reversed_positive&lt;/code&gt; sequence — standard Spanner advice to avoid write hotspots. The secondary index &lt;code&gt;kv_key_rev ON kv(key, rev DESC)&lt;/code&gt; existed for reads. But Spanner was doing this for every Get:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Index scan &lt;code&gt;kv_key_rev&lt;/code&gt; → find the row's &lt;code&gt;id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Table lookup on &lt;code&gt;kv&lt;/code&gt; by &lt;code&gt;id&lt;/code&gt; → fetch the actual data&lt;/li&gt;
&lt;/ol&gt;

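&lt;p&gt;Two lookups inside one query: an index scan, then a back-join to the base table. The fix was a single DDL change, recreating the index with a &lt;code&gt;STORING&lt;/code&gt; clause:&lt;br&gt;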
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;kv_key_rev&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;kv&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rev&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;STORING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;old_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lease_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deleted&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;create_revision&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prev_revision&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;STORING&lt;/code&gt; copies all needed columns into the index. Spanner can now serve reads entirely from the index — no back-join.&lt;/p&gt;
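
&lt;p&gt;The same effect can be reproduced locally with SQLite, which is used here only as a stand-in for Spanner. SQLite has no &lt;code&gt;STORING&lt;/code&gt; clause, so this sketch imitates it by appending &lt;code&gt;value&lt;/code&gt; as a trailing index column. The trimmed-down &lt;code&gt;kv&lt;/code&gt; schema and the &lt;code&gt;plan&lt;/code&gt; helper are assumptions made for illustration.&lt;/p&gt;

```python
import sqlite3

QUERY = "SELECT value FROM kv WHERE key = ? ORDER BY rev DESC LIMIT 1"


def plan(conn):
    """Return SQLite's query plan for QUERY as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("a",)).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kv (id INTEGER PRIMARY KEY, key TEXT, rev INTEGER, value BLOB)"
)

# Non-covering index: the row is found via the index, then the table is visited for value.
conn.execute("CREATE INDEX kv_key_rev ON kv (key, rev DESC)")
before = plan(conn)

# Covering variant: value lives in the index, so the table lookup disappears.
conn.execute("DROP INDEX kv_key_rev")
conn.execute("CREATE INDEX kv_key_rev ON kv (key, rev DESC, value)")
after = plan(conn)

print("before:", before)
print("after: ", after)
```

&lt;p&gt;The second plan should mention a covering index, which is SQLite's way of saying the read never touches the base table.&lt;/p&gt;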

&lt;p&gt;&lt;strong&gt;Get improved +40%. Mixed workload improved +167%.&lt;/strong&gt; Measured before and after on the same hardware.&lt;/p&gt;

&lt;p&gt;We also added &lt;code&gt;kv_rev_desc ON kv(rev DESC)&lt;/code&gt; so &lt;code&gt;CurrentRevision()&lt;/code&gt; does an O(1) LIMIT 1 seek instead of a full &lt;code&gt;MAX(rev)&lt;/code&gt; scan.&lt;/p&gt;

&lt;p&gt;One caveat: &lt;code&gt;STORING value&lt;/code&gt; where value is &lt;code&gt;BYTES(MAX)&lt;/code&gt; doubles write amplification for large values. For Kubernetes workloads (mostly small JSON/protobuf objects) this is fine. For blob storage it would be a problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stateless replicas
&lt;/h2&gt;

&lt;p&gt;This is the part that feels almost too simple. Because all state lives in Spanner, every replica is completely stateless. No consensus. No leader election between replicas. No split-brain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl9prtnek76jpcodmro5w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl9prtnek76jpcodmro5w.png" alt="Stateless architecture: LoadBalancer routes to multiple spanner-etcd replicas, all reading and writing to a single Google Cloud Spanner instance" width="800" height="896"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We tested this explicitly: Watch on replica 2, writes through replica 1, then killed replica 1. Replica 2 received all events — before and after the kill — with zero gaps. 45 Watch streams migrated in ~10s. Kubernetes didn't notice.&lt;/p&gt;

&lt;p&gt;The only statefulness is the Change Stream cursor, persisted to Spanner itself and recovered on restart. No leader election, no quorum, no split-brain scenario possible.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real numbers
&lt;/h2&gt;

&lt;p&gt;Everything below is production Spanner (&lt;code&gt;regional-us-central1&lt;/code&gt;, 1000 PU), same-region &lt;code&gt;e2-standard-4&lt;/code&gt; VM, not the emulator.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Throughput:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;ops/sec&lt;/th&gt;
&lt;th&gt;Avg. time per op&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Create ×1&lt;/td&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;11.1ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create ×4 parallel&lt;/td&gt;
&lt;td&gt;270&lt;/td&gt;
&lt;td&gt;3.7ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Get ×1&lt;/td&gt;
&lt;td&gt;108&lt;/td&gt;
&lt;td&gt;9.3ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Get ×4 parallel&lt;/td&gt;
&lt;td&gt;481&lt;/td&gt;
&lt;td&gt;2.1ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mixed ×4 (70% read)&lt;/td&gt;
&lt;td&gt;403&lt;/td&gt;
&lt;td&gt;2.5ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Watch latency&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;~30ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
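
&lt;p&gt;The millisecond figures in the Create, Get, and Mixed rows match 1000 ÷ ops/sec, so they read as average time per completed operation rather than per-request latency. The sketch below reproduces them and, assuming that "×4 parallel" means four workers each keeping one request in flight, estimates per-request latency with Little's law. Both function names are made up for illustration.&lt;/p&gt;

```python
def ms_per_op(ops_per_sec):
    """Average wall-clock time per completed operation, in milliseconds."""
    return 1000.0 / ops_per_sec


def per_request_latency_ms(ops_per_sec, in_flight):
    """Little's law: latency = requests in flight / throughput."""
    return in_flight * 1000.0 / ops_per_sec


# (operation, ops/sec from the table, assumed requests in flight)
rows = [
    ("Create x1", 90, 1),
    ("Create x4 parallel", 270, 4),
    ("Get x1", 108, 1),
    ("Get x4 parallel", 481, 4),
    ("Mixed x4 (70% read)", 403, 4),
]

for name, ops, in_flight in rows:
    per_op = ms_per_op(ops)
    per_request = per_request_latency_ms(ops, in_flight)
    print(f"{name:22s} {per_op:5.1f} ms/op  {per_request:5.1f} ms/request")
```

&lt;p&gt;Under that assumption, a Create at ×4 takes roughly 14.8ms per request, somewhat more than the 11.1ms measured at ×1.&lt;/p&gt;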

&lt;p&gt;&lt;strong&gt;How many Spanner PUs do you actually need?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We benchmarked at 100, 1000, and 2000 PU on &lt;code&gt;us-central1&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;100 PU&lt;/th&gt;
&lt;th&gt;1000 PU&lt;/th&gt;
&lt;th&gt;2000 PU&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Create ×4 parallel&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;87&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;270&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;255&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Get ×4 parallel&lt;/td&gt;
&lt;td&gt;472&lt;/td&gt;
&lt;td&gt;481&lt;/td&gt;
&lt;td&gt;469&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mixed ×4&lt;/td&gt;
&lt;td&gt;294&lt;/td&gt;
&lt;td&gt;403&lt;/td&gt;
&lt;td&gt;404&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Watch latency&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;29ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~30ms&lt;/td&gt;
&lt;td&gt;30ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Interesting findings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single-key ops are nearly &lt;strong&gt;identical&lt;/strong&gt; across 1000 and 2000 PU — you're paying for network round-trip, not Spanner compute (CPU was ~1% during benchmarks)&lt;/li&gt;
&lt;li&gt;Parallel writes fall off sharply at 100 PU — Create ×4 drops from 270 to 87 ops/sec&lt;/li&gt;
&lt;li&gt;Watch latency is consistent at ~30ms across all tiers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100 PU is enough for small clusters&lt;/strong&gt; (&amp;lt; 100 nodes with moderate write rates)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Multi-region: what does global durability actually cost?
&lt;/h2&gt;

&lt;p&gt;We ran one more experiment: we switched to &lt;code&gt;nam6&lt;/code&gt; (Iowa, South Carolina, Oregon, and Los Angeles) and benchmarked from two of its regions, Iowa and South Carolina.&lt;/p&gt;

&lt;p&gt;The Spanner leader lives in Iowa. So writes from Iowa replicate synchronously to South Carolina before committing. Writes from South Carolina travel to Iowa, get committed, then come back.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ynawpu658qy8tl268u9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ynawpu658qy8tl268u9.png" alt="Multi-region write cost: writes from Iowa leader replicate synchronously to South Carolina (~40ms penalty), writes from South Carolina travel round-trip to Iowa leader (~80ms)" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Regional Iowa&lt;/th&gt;
&lt;th&gt;nam6 Iowa&lt;/th&gt;
&lt;th&gt;nam6 S.Carolina&lt;/th&gt;
&lt;th&gt;nam6 S.Carolina + DR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Create ×1&lt;/td&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create ×4 parallel&lt;/td&gt;
&lt;td&gt;270&lt;/td&gt;
&lt;td&gt;203&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Get ×1&lt;/td&gt;
&lt;td&gt;108&lt;/td&gt;
&lt;td&gt;116&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;16&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Get ×4 parallel&lt;/td&gt;
&lt;td&gt;481&lt;/td&gt;
&lt;td&gt;577&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;64&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mixed ×4&lt;/td&gt;
&lt;td&gt;403&lt;/td&gt;
&lt;td&gt;327&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;53&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Watch latency&lt;/td&gt;
&lt;td&gt;~30ms&lt;/td&gt;
&lt;td&gt;42ms&lt;/td&gt;
&lt;td&gt;131ms&lt;/td&gt;
&lt;td&gt;196ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;DR = &lt;code&gt;--spanner-read-location=us-east1&lt;/code&gt; — directed reads to the local South Carolina replica.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this tells you:&lt;/strong&gt;&lt;/p&gt;

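&lt;p&gt;Writes from Iowa get ~40% slower (Create ×1 drops from 90 to 53 ops/sec). That's the cost of synchronous replication to South Carolina. From South Carolina, writes are about 8× slower than the regional baseline (11 vs 90 ops/sec), because each write makes a full round trip to the Iowa leader and back.&lt;/p&gt;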

&lt;p&gt;Directed reads improve read throughput from South Carolina by 7-14% (Get ×4 goes from 60 to 64 ops/sec, Get ×1 from 14 to 16), because reads go to the local replica instead of Iowa. The improvement is modest because writes still dominate the mixed workload, and Watch latency actually gets worse, rising from 131ms to 196ms (Change Stream cursors still follow the leader path).&lt;/p&gt;
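
&lt;p&gt;These percentages can be re-derived from the table. The helper names below are illustrative only; the inputs are the ops/sec figures from the Create ×1 and Get rows.&lt;/p&gt;

```python
def throughput_drop_pct(baseline, measured):
    """How much lower the measured throughput is, as a percentage of baseline."""
    return (1 - measured / baseline) * 100


def slowdown_factor(baseline, measured):
    """How many times slower the measured configuration is."""
    return baseline / measured


def gain_pct(before, after):
    """Throughput gain of `after` over `before`, in percent."""
    return (after / before - 1) * 100


def added_ms_per_op(baseline, measured):
    """Extra average time per operation, in milliseconds."""
    return 1000 / measured - 1000 / baseline


# Create x1: regional Iowa = 90, nam6 Iowa = 53, nam6 South Carolina = 11 ops/sec
print(f"Iowa write drop:       {throughput_drop_pct(90, 53):.1f}%")
print(f"S.Carolina slowdown:   {slowdown_factor(90, 11):.1f}x")
print(f"S.Carolina added time: {added_ms_per_op(90, 11):.1f} ms per write")

# Directed reads from South Carolina: Get x1 14 to 16, Get x4 60 to 64 ops/sec
print(f"Get x1 gain: {gain_pct(14, 16):.1f}%")
print(f"Get x4 gain: {gain_pct(60, 64):.1f}%")
```

&lt;p&gt;The added time per write from South Carolina comes out at about 80ms, which lines up with the ~80ms round trip shown in the diagram.&lt;/p&gt;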

&lt;p&gt;&lt;strong&gt;The practical conclusion:&lt;/strong&gt; put your spanner-etcd replicas in the same region as the Spanner leader. If you need RPO=0 and must run replicas far from the leader, use &lt;code&gt;--spanner-read-location&lt;/code&gt; to at least get reads locally. But writes will always pay the cross-region round-trip.&lt;/p&gt;




&lt;h2&gt;
  
  
  Kubernetes validation
&lt;/h2&gt;

&lt;p&gt;We ran Kubernetes v1.33.12 (kubeadm, external etcd = spanner-etcd) for 24 hours straight:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rolling deployments scaled 1–10 replicas every 2 minutes&lt;/li&gt;
&lt;li&gt;ConfigMap churn every 3 minutes&lt;/li&gt;
&lt;li&gt;cert-manager running concurrently&lt;/li&gt;
&lt;li&gt;57 active Watch streams throughout&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Results: &lt;strong&gt;zero crashes, zero data loss, zero unimplemented errors.&lt;/strong&gt; The Kubernetes node stayed Ready the entire time.&lt;/p&gt;

&lt;p&gt;We also tested with 22 production Java/Kotlin microservices (Vert.x + jetcd) on GKE. Auth token expiry, pod kill, Watch stream migration — all clean.&lt;/p&gt;




&lt;h2&gt;
  
  
  What we didn't build (and why)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Auth RBAC&lt;/strong&gt; (UserAdd, RoleAdd, GrantPermission) — Kubernetes doesn't use it. The API server manages its own RBAC. We implement &lt;code&gt;Authenticate&lt;/code&gt; (username/password → token) because kubeadm requires it, but the full RBAC surface isn't needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Defrag / Snapshot&lt;/strong&gt; — Spanner manages storage automatically. These operations don't have a meaningful equivalent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sub-10ms Watch latency&lt;/strong&gt; — if you need this, spanner-etcd is the wrong tool. Change Streams have inherent latency. For most Kubernetes operations this doesn't matter — the API server isn't latency-sensitive to etcd Watch at the millisecond level.&lt;/p&gt;




&lt;h2&gt;
  
  
  What surprised us
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The covering index made a bigger difference than the PCT revision change.&lt;/strong&gt; We expected the write bottleneck removal to be the headline. It was. But the read path optimization raised Get throughput by 40% and nearly tripled mixed workload numbers (+167%). Sometimes the boring infrastructure work matters more than the clever architectural idea.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;100 PU is genuinely enough for small clusters.&lt;/strong&gt; We expected a linear relationship between PU and performance. Instead we found that network latency dominates and Spanner CPU is barely touched. The PU floor matters for parallel writes (Create ×4 drops from 270 to 87 ops/sec at 100 PU), but a small cluster doesn't need 1000 PU.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-leader regions are expensive.&lt;/strong&gt; Iowa→South Carolina adds ~80ms round-trip. In a multi-region setup, where you place your replicas relative to the Spanner leader matters a lot more than how many PUs you provision.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;SPANNER_DATABASE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;projects/P/instances/I/databases/D &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 2379:2379 &lt;span class="se"&gt;\&lt;/span&gt;
  ghcr.io/n0rm4l-me/spanner-etcd:v0.1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or with kubeadm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;etcd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;endpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;http://spanner-etcd:2379&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/n0rm4l-me/spanner-etcd" rel="noopener noreferrer"&gt;github.com/n0rm4l-me/spanner-etcd&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>googlecloud</category>
      <category>database</category>
      <category>go</category>
    </item>
  </channel>
</rss>
