mojoatomic

Your "Atomic" Deploys Probably Aren't Atomic

If you're using symlink swaps for zero-downtime deployments, you've probably written something like this:

ln -sfn releases/20260112 current

Looks atomic. It's not.

The Bug

Run that command through strace:

symlink("releases/20260112", "current") = -1 EEXIST
unlink("current") = 0                    
symlink("releases/20260112", "current") = 0

See the problem? Between unlink and symlink, the current symlink doesn't exist. Any request that resolves the path inside that window fails with ENOENT, and under load some of them will.

Your "zero-downtime" deploy just caused downtime.

The Linux Fix

This is well-documented:

ln -s releases/20260112 .tmp/current.$$
mv -T .tmp/current.$$ current

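Create a temp symlink, then use mv -T to replace current in one step. The -T flag (--no-target-directory) tells mv to treat current as the destination itself rather than as a directory to move into, so the move becomes a single rename(2) call. rename(2) is atomic on POSIX filesystems: a reader sees either the old link or the new one, never a gap. The temp link has to be on the same filesystem as current, because rename(2) cannot cross filesystems.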
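Here is a minimal end-to-end sketch of that swap, assuming GNU mv and a scratch directory from mktemp. The release names are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU coreutils (mv -T). Release names are illustrative.
cd "$(mktemp -d)"                      # scratch directory for the demo
mkdir -p releases/20260111 releases/20260112 .tmp
ln -s releases/20260111 current        # the release that is live right now

# Step 1: build the new link under a temporary name
# (same filesystem as current).
ln -s releases/20260112 ".tmp/current.$$"

# Step 2: rename it over the old link. -T stops mv from descending into
# the directory that current points to.
mv -T ".tmp/current.$$" current

readlink current
```

If the swap worked, readlink prints the new release path and .tmp is empty again.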

Problem solved. Unless you're on macOS.

The macOS Problem

BSD mv doesn't have -T. Without it, mv looks at what the destination resolves to: if current is a symlink to a directory, mv .tmp/current.$$ current moves the temp symlink into that directory instead of replacing current.
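The pitfall is easy to reproduce in a scratch directory. This is a small sketch; the link name current.new is just an illustrative stand-in for the temp link.

```shell
#!/usr/bin/env bash
# Sketch only: shows what a plain mv does when the destination is a
# symlink to a directory. Names are illustrative.
cd "$(mktemp -d)"
mkdir -p releases/v1 releases/v2
ln -s releases/v1 current
ln -s releases/v2 current.new

# No -T here: mv sees that current resolves to a directory and moves
# the new link inside it.
mv current.new current

readlink current          # current was not replaced
ls releases/v1            # the new link landed in the old release
```

The old release stays live, and the new link is stranded inside it under its temporary name.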

The Capistrano community has known about this for over a decade. Their workaround is clever - create the symlink in a subdirectory and move it via relative path - but it requires their Ruby runtime.

Most deployment tools just accept the race condition on Mac, or tell you to develop on Linux.

A Different Approach

I needed something that works on both platforms. I manage infrastructure across Linux servers and Mac dev machines, and "just use Linux" wasn't an option.

Python's os.replace() calls rename(2) directly on all POSIX systems, so it can stand in for mv -T wherever GNU coreutils isn't available:

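# Report 'linux' when mv is GNU coreutils (supports -T), 'bsd' otherwise.
detect_platform() {
    if mv --version 2>/dev/null | grep -q 'GNU'; then
        printf 'linux'
    else
        printf 'bsd'
    fi
}

# Point current at releases/<release_id> with a single rename(2).
# The release ID is taken as the first argument.
activate_release() {
    local release_id="$1"
    local tmp_link=".tmp/current.$$"

    mkdir -p .tmp
    ln -s "releases/$release_id" "$tmp_link"

    if [[ "$(detect_platform)" == "linux" ]]; then
        mv -T "$tmp_link" "current"
    else
        # Pass the paths as arguments so they are never parsed as Python source.
        python3 -c 'import os, sys; os.replace(sys.argv[1], sys.argv[2])' \
            "$tmp_link" "current"
    fi
}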

Platform detection checks for GNU coreutils by running mv --version rather than relying on uname. This handles edge cases such as a Mac with Homebrew's GNU coreutils first in PATH.
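To see the python3 branch do its job in isolation, a scratch-directory check along these lines should be enough. The link name current.new is illustrative, and the only assumption is that python3 is on PATH.

```shell
#!/usr/bin/env bash
# Sketch only: exercises the os.replace() path on its own.
cd "$(mktemp -d)"
mkdir -p releases/v1 releases/v2 .tmp
ln -s releases/v1 current
ln -s releases/v2 .tmp/current.new

# os.replace() renames the link itself; it does not follow current
# into the directory it points to.
python3 -c 'import os, sys; os.replace(sys.argv[1], sys.argv[2])' \
    .tmp/current.new current

readlink current
```

Unlike a plain mv, nothing ends up inside releases/v1, and current now points at the new release.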

Proving the Race Condition

Want to see the bug yourself? Here's a test:

#!/bin/bash
mkdir -p releases/v1 releases/v2
echo "v1" > releases/v1/version
echo "v2" > releases/v2/version
ln -s releases/v1 current

# Reader loop in background
(
  for i in {1..10000}; do
    cat current/version 2>/dev/null || echo "ENOENT"
  done
) > reads.log &

reader_pid=$!

# Writer loop - rapidly swap symlink
for i in {1..1000}; do
  ln -sfn releases/v1 current
  ln -sfn releases/v2 current
done

wait $reader_pid

errors=$(grep -c ENOENT reads.log || true)
echo "Errors: $errors / 10000 reads"

On a typical system, you'll see 10-50 errors per run. With the atomic approach, zero.
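For comparison, here is a sketch of the same test with a rename-based writer. swap_to is a hypothetical helper written for this demo, and the iteration counts are reduced so the python3 fallback finishes in reasonable time.

```shell
#!/usr/bin/env bash
# Sketch only: the reader/writer test again, with an atomic swap.
cd "$(mktemp -d)"
mkdir -p releases/v1 releases/v2 .tmp
echo "v1" > releases/v1/version
echo "v2" > releases/v2/version
ln -s releases/v1 current

# swap_to (hypothetical helper): build a temp link, then rename it over current.
if mv --version 2>/dev/null | grep -q 'GNU'; then
    swap_to() { ln -s "$1" .tmp/current.new && mv -T .tmp/current.new current; }
else
    swap_to() {
        ln -s "$1" .tmp/current.new &&
        python3 -c 'import os, sys; os.replace(sys.argv[1], sys.argv[2])' \
            .tmp/current.new current
    }
fi

# Reader loop in background
(
  for i in {1..2000}; do
    cat current/version 2>/dev/null || echo "ENOENT"
  done
) > reads.log &
reader_pid=$!

# Writer loop: swap the link back and forth
for i in {1..300}; do
  swap_to releases/v1
  swap_to releases/v2
done

wait "$reader_pid"
errors=$(grep -c ENOENT reads.log || true)
echo "Errors: $errors / 2000 reads"
```

Because every swap is a single rename, the reader should never catch current missing, so the error count should stay at zero.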

The Full Script

I wrapped this into a deployment script with:

  • Atomic symlink swap on both Linux and macOS
  • Directory-based locking with stale PID detection
  • Automatic rollback on SIGINT/SIGTERM
  • State machine cleanup that knows whether to rollback or just clean up temp files

It's a single file, no dependencies beyond bash and python3.

GitHub: github.com/mojoatomic/atomic-deploy
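The directory-based locking in the feature list can be sketched roughly like this. It is not taken from the atomic-deploy script: LOCK_DIR, acquire_lock and release_lock are hypothetical names, and the stale-lock handling is deliberately simple.

```shell
#!/usr/bin/env bash
# Sketch only: mkdir-based lock with a basic stale-PID check.
# All names here are hypothetical, not the script's real interface.
LOCK_DIR="${LOCK_DIR:-.deploy.lock}"

acquire_lock() {
    # mkdir either creates the directory or fails; only one caller wins.
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        printf '%s\n' "$$" > "$LOCK_DIR/pid"
        return 0
    fi

    # The lock exists: check whether the recorded owner is still running.
    local owner
    owner="$(cat "$LOCK_DIR/pid" 2>/dev/null || true)"
    if [[ -n "$owner" ]] && kill -0 "$owner" 2>/dev/null; then
        return 1                       # live owner, give up
    fi

    # Stale (or half-written) lock: reclaim it and try once more.
    # Caveat: two waiters can race here, and kill -0 also fails for
    # processes owned by another user.
    rm -rf "$LOCK_DIR"
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        printf '%s\n' "$$" > "$LOCK_DIR/pid"
        return 0
    fi
    return 1
}

release_lock() {
    rm -rf "$LOCK_DIR"
}
```

A caller would typically run acquire_lock || exit 1 near the top of the deploy and register release_lock with trap ... EXIT so the lock is dropped on any exit path.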

What It Doesn't Do

Intentionally out of scope:

  • Shared directories - No Capistrano-style shared folder symlinking
  • Remote deployment - Wrap it in ssh/rsync
  • Release pruning - Add a cron job
  • Service restarts - Use your init system

One thing, done right.
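Release pruning is one of the pieces left to you. A cron job could call something like the sketch below. prune_releases is a hypothetical helper, and it assumes release directories have timestamp-style names that sort oldest first.

```shell
#!/usr/bin/env bash
# Sketch only: keep the newest N releases and never delete the live one.
# prune_releases is a hypothetical helper, not part of the script.
prune_releases() {
    local keep="${1:-5}"
    local active
    active="$(readlink current 2>/dev/null || true)"   # e.g. releases/20260112

    # Glob results come back sorted, so timestamp names are oldest first.
    local -a all=()
    local dir
    for dir in releases/*/; do
        [[ -d "$dir" ]] || continue
        all+=("${dir%/}")
    done

    local total=${#all[@]}
    local i
    for (( i = 0; i < total - keep; i++ )); do
        # Skip whatever current points at, even if it is old.
        [[ "${all[i]}" == "$active" ]] && continue
        rm -rf -- "${all[i]}"
    done
}
```

Run it from the deploy root, for example prune_releases 5. If the live release is older than the newest N, it is kept in addition to them.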

FAQ

Why not just use Capistrano/Deployer?

They're great if you're already in Ruby/PHP. This is a single script you can drop into any CI pipeline.

Why not containers?

Not everyone is on Kubernetes. VMs, bare metal, and edge devices still exist.

Python is a dependency.

Yes, but python3 is preinstalled on most mainstream Linux distros, and on macOS it comes with the Xcode Command Line Tools. It's only called on systems without GNU mv.

What about renameat2() with RENAME_EXCHANGE?

That's Linux 3.15+ with glibc 2.28+. It does a true atomic swap, but it's not portable.

Does this work on NFS?

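Not reliably. rename(2) atomicity guarantees don't carry over cleanly to network filesystems: NFS clients cache lookups and attributes, so other hosts can keep resolving the old link for a while or hit stale file handles.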

