Seangles

The 30-Second Death: A Memoir

A true story about a database that died on a schedule, a developer who tried everything, and a CPU security feature nobody told anyone about.

My database kept dying. Every. Thirty. Seconds.

Not crashing with an error. Not leaving a note. Just... gone. Exit code 139. No explanation. No apology.

I checked the logs. MongoDB had started successfully. Recovered from the previous unclean shutdown. Accepted connections. Everything looked fine.

Then it died.


Act I: Blame the Obvious Thing

Me: Why is the DB crashing?

MongoDB: dies

Me: Okay, unclean shutdown, probably just needs to recover—

MongoDB: recovers successfully, then dies again


I did what any reasonable engineer does. I blamed the obvious thing.

"It's mongo:latest," I said confidently. "Never trust latest." I pinned the version. mongo:8.0.14-noble. Stable. Specific. Professional.

image: mongo:8.0.14-noble

MongoDB: dies at exactly 30 seconds


Act II: Disable Everything

Fine. The diagnostics collector, then: FTDC (Full-Time Diagnostic Data Capture). It reads /proc files in the background. Kernel 6.19 is brand new. Maybe something changed.

command: mongod --setParameter diagnosticDataCollectionEnabled=false

MongoDB: dies at exactly 30 seconds


Seccomp, then. Docker's syscall filter. A classic gotcha. I ran the container with --security-opt seccomp=unconfined, which removes the filter entirely and essentially hands the container a skeleton key to every syscall the kernel offers.

MongoDB: dies at exactly 30 seconds

The audacity.
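
A minimal sketch of how both Act II attempts would look together in a Compose file. The service name mongo is a placeholder rather than something taken from the original setup, and security_opt is the Compose counterpart of the --security-opt flag used above.

```yaml
services:
  mongo:                      # placeholder service name (assumption)
    image: mongo:8.0.14-noble
    # Attempt 1: turn off FTDC, the background diagnostics collector
    command: mongod --setParameter diagnosticDataCollectionEnabled=false
    # Attempt 2: drop Docker's default seccomp syscall filter
    security_opt:
      - seccomp:unconfined
```

Neither setting changed the outcome, so both are worth reverting once ruled out. The default seccomp profile in particular is a safety net you probably want back.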


Act III: The Nuclear Option

At this point I had one move left: mongo:7. Downgrade. Accept defeat. Move on with my life.

image: mongo:7

MongoDB 7: reads the data files MongoDB 8 wrote

MongoDB 7: these are not my files

MongoDB 7: exits with code 62

Two databases. Zero working. Outstanding.


Act IV: The Internet Knows

I went searching. I found a GitHub issue titled:

"[arm64] MongoDB 8.x crashes ~30s after startup on 6.19.0-sky1-latest.r5"

ARM64. I am on x86_64. I kept reading anyway.

Then, buried in a bug report, the answer:

"SIGSEGV exactly 30s after start on AMD Zen 5 due to hardware Shadow Stacks (user_shstk) clashing with coroutines."


The Actual Explanation (I Promise It Makes Sense)

Let me walk you through what actually happened here, because I need you to appreciate the full chain of events.

Intel invented a security feature called Control-flow Enforcement Technology (CET). The idea: add a second, hardware-enforced call stack that runs in parallel with the normal one. Every time a function returns, the CPU checks that the return address matches what the shadow stack recorded. Exploits that hijack return addresses — like ROP chains — get caught at the hardware level before they can do anything. It's genuinely clever engineering.

AMD implemented the shadow stack half of CET too, under the name Shadow Stacks. Linux reports the feature as the user_shstk CPU flag.

Linux kernel 6.19 decided to enable it by default for user processes.

Now. MongoDB 8.0 uses coroutines. Coroutines do context switching by manually swapping stack pointers — they save the current stack state and jump to a different one. This is a completely normal thing coroutines do.

The shadow stack looked at this manual stack swap and said:

"That return address does not match what I recorded. This is a security violation."

Then it fired a SIGSEGV. Signal 11. Exit code 139, which is 128 + 11: the usual way a container reports death by signal.

After exactly 30 seconds — the interval at which MongoDB's coroutine scheduler runs a particular background task.

Every time. Like clockwork.


The Fix

One line:

environment:
  GLIBC_TUNABLES: glibc.cpu.hwcaps=-SHSTK

This tells glibc to mask the SHSTK hardware capability at startup, so it never turns Shadow Stacks on for the process. MongoDB's coroutines do their stack swaps unmolested. The database lives.

Status: running, ExitCode: 0, Restarts: 0
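
In context, the fix might sit in a Compose file roughly like this. Only the image tag and the GLIBC_TUNABLES line come from the story above. The service name, port mapping, and volume name are placeholders for illustration.

```yaml
services:
  mongo:                      # placeholder service name (assumption)
    image: mongo:8.0.14-noble
    environment:
      # Ask glibc not to enable hardware Shadow Stacks for this process
      GLIBC_TUNABLES: glibc.cpu.hwcaps=-SHSTK
    ports:
      - "27017:27017"         # assumed mapping; 27017 is MongoDB's default port
    volumes:
      - mongo-data:/data/db   # assumed named volume; /data/db is the image's data directory

volumes:
  mongo-data:
```

Because the tunable is an ordinary environment variable, it only affects processes inside this container. The host and every other container keep their Shadow Stack protection.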

Post-Mortem

Time spent: too long
Lines of code changed: 1
Red herrings investigated: mongo:latest tag, FTDC, Docker seccomp, data file incompatibility
Actual cause: AMD Zen 5 + Linux 6.19 + CET Shadow Stacks + MongoDB coroutines
What I should have searched first: literally anything other than what I searched

Takeaway

If you're running MongoDB 8.0 in Docker on a Linux 6.19+ kernel with a newer AMD CPU and your container dies silently every ~30 seconds with exit code 139 — this is your fix:

environment:
  GLIBC_TUNABLES: glibc.cpu.hwcaps=-SHSTK

You're welcome. I suffered so you don't have to.


Dedicated to everyone who has ever watched a container die on a 30-second interval and felt their personality change. Yes, this article was written with the help of an LLM. I had already spent too much time on this; it's genuinely one of the weirdest issues I faced this month :p
