A true story about a database that died on a schedule, a developer who tried everything, and a CPU security feature nobody told anyone about.
My database kept dying. Every. Thirty. Seconds.
Not crashing with an error. Not leaving a note. Just... gone. Exit code 139. No explanation. No apology.
I checked the logs. MongoDB had started successfully. Recovered from the previous unclean shutdown. Accepted connections. Everything looked fine.
Then it died.
Act I: Blame the Obvious Thing
Me: Why is the DB crashing?
MongoDB: dies
Me: Okay, unclean shutdown, probably just needs to recover—
MongoDB: recovers successfully, then dies again
I did what any reasonable engineer does. I blamed the obvious thing.
"It's mongo:latest," I said confidently. "Never trust latest." I pinned the version. mongo:8.0.14-noble. Stable. Specific. Professional.
image: mongo:8.0.14-noble
MongoDB: dies at exactly 30 seconds
Act II: Disable Everything
Fine. The diagnostics collector — FTDC. It reads /proc files in the background. Kernel 6.19 is brand new. Maybe something changed.
command: mongod --setParameter diagnosticDataCollectionEnabled=false
MongoDB: dies at exactly 30 seconds
Seccomp, then. Docker's syscall filter. A classic gotcha. I ran the container with --security-opt seccomp=unconfined, essentially handing it a skeleton key to the entire kernel.
MongoDB: dies at exactly 30 seconds
The audacity.
Act III: The Nuclear Option
At this point I had one move left: mongo:7. Downgrade. Accept defeat. Move on with my life.
image: mongo:7
MongoDB 7: reads the data files MongoDB 8 wrote
MongoDB 7: these are not my files
MongoDB 7: exits with code 62
Two databases. Zero working. Outstanding.
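For the record, those two exit codes are different kinds of death. By shell and Docker convention, a code above 128 means the process was killed by signal number (code - 128), while a small number like 62 is something the program chose to return itself. A minimal Python sketch of that decoding, assuming the 128 + signal convention holds for your runtime; describe_exit_code is a name I made up for illustration:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container or shell exit code into a readable description."""
    if code > 128:
        # Convention: 128 + N means "terminated by signal N".
        number = code - 128
        try:
            return f"killed by {signal.Signals(number).name} (signal {number})"
        except ValueError:
            # Not a signal number this platform knows about.
            return f"killed by unknown signal {number}"
    return f"exited on its own with status {code}"
```

On Linux, describe_exit_code(139) reports SIGSEGV (signal 11), and describe_exit_code(62) reports an ordinary exit status chosen by the program.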
Act IV: The Internet Knows
I went searching. I found a GitHub issue titled:
"[arm64] MongoDB 8.x crashes ~30s after startup on 6.19.0-sky1-latest.r5"
ARM64. I am on x86_64. I kept reading anyway.
Then, buried in a bug report, the answer:
"SIGSEGV exactly 30s after start on AMD Zen 5 due to hardware Shadow Stacks (user_shstk) clashing with coroutines."
The Actual Explanation (I Promise It Makes Sense)
Let me walk you through what actually happened here, because I need you to appreciate the full chain of events.
Intel invented a security feature called Control-flow Enforcement Technology (CET). The idea: add a second, hardware-enforced call stack that runs in parallel with the normal one. Every time a function returns, the CPU checks that the return address matches what the shadow stack recorded. Exploits that hijack return addresses — like ROP chains — get caught at the hardware level before they can do anything. It's genuinely clever engineering.
AMD implemented the shadow stack half of CET too, under the name Shadow Stacks.
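As far as I can tell, Linux kernel 6.19 is where it ended up enabled by default for user processes on my machine (kernel support for userspace shadow stacks has existed since 6.6).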
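Whether a machine is in scope at all can be checked from /proc/cpuinfo, where a kernel with userspace shadow stack support lists a user_shstk flag (the same name that shows up in the bug report quoted earlier). A small Python sketch, assuming that flag name; has_user_shstk is a hypothetical helper, and a present flag only says the CPU and kernel support the feature, not that any particular process has it switched on:

```python
def has_user_shstk(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo text lists user_shstk."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Flags are space-separated on the right-hand side of 'flags : ...'.
        if key.strip() == "flags" and "user_shstk" in value.split():
            return True
    return False


# Typical use on a Linux host:
#   with open("/proc/cpuinfo") as f:
#       print(has_user_shstk(f.read()))
```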
Now. MongoDB 8.0 uses coroutines. Coroutines do context switching by manually swapping stack pointers — they save the current stack state and jump to a different one. This is a completely normal thing coroutines do.
The shadow stack looked at this manual stack swap and said:
"That return address does not match what I recorded. This is a security violation."
Then the CPU raised a control-protection fault, which the kernel delivered as a SIGSEGV. Signal 11. Exit code 139.
After exactly 30 seconds — the interval at which MongoDB's coroutine scheduler runs a particular background task.
Every time. Like clockwork.
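Here is the whole collision as a toy model in Python. It is an illustration under heavy assumptions, not how the hardware or MongoDB is actually implemented: ToyCPU, switch_stack, and the addresses are all invented for the example, and both stacks hold nothing but return addresses.

```python
class ControlProtectionFault(Exception):
    """Stands in for the fault the kernel reports to the process as SIGSEGV."""


class ToyCPU:
    """Toy model: a normal stack the program can swap, a shadow stack it cannot."""

    def __init__(self):
        self.stack = []         # ordinary call stack (return addresses only)
        self.shadow_stack = []  # hardware-protected copy

    def call(self, return_address):
        # A call pushes the return address onto both stacks.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        # A return pops both stacks and faults if they disagree.
        address = self.stack.pop()
        expected = self.shadow_stack.pop()
        if address != expected:
            raise ControlProtectionFault(
                f"return to {address!r}, shadow stack expected {expected!r}"
            )
        return address

    def switch_stack(self, other_stack):
        # Coroutine-style context switch: swaps the ordinary stack only.
        saved, self.stack = self.stack, other_stack
        return saved


def demo():
    cpu = ToyCPU()
    cpu.call("main+0x10")            # main calls into the scheduler
    coroutine_stack = ["task+0x42"]  # a suspended coroutine's saved stack
    cpu.switch_stack(coroutine_stack)
    try:
        cpu.ret()                    # "return" into the coroutine
    except ControlProtectionFault as fault:
        return str(fault)
    return "no fault"
```

Calling demo() returns the fault message: the return targets task+0x42, but the shadow stack still expects main+0x10. Context-switch code that is shadow-stack aware has to switch the shadow stack along with the ordinary one.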
The Fix
One line:
environment:
  GLIBC_TUNABLES: glibc.cpu.hwcaps=-SHSTK
This tells glibc to disable Shadow Stacks for the process before it starts. MongoDB's coroutines do their stack swaps unmolested. The database lives.
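If you launch mongod from a script rather than Compose, the same variable can be set on the child process. A hedged Python sketch: env_without_shadow_stack is a hypothetical helper, the tunable value is copied from the fix above, and it relies on GLIBC_TUNABLES being a colon-separated list of name=value pairs. It does not try to merge with a glibc.cpu.hwcaps entry that is already set to something else.

```python
import os

# Value taken verbatim from the fix above.
SHSTK_TUNABLE = "glibc.cpu.hwcaps=-SHSTK"


def env_without_shadow_stack(base_env=None):
    """Return a copy of the environment with the shadow-stack tunable added."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("GLIBC_TUNABLES", "")
    # Keep any tunables that were already set, dropping empty fragments.
    tunables = [t for t in existing.split(":") if t]
    if SHSTK_TUNABLE not in tunables:
        tunables.append(SHSTK_TUNABLE)
    env["GLIBC_TUNABLES"] = ":".join(tunables)
    return env
```

Something like subprocess.run(["mongod", "--dbpath", "/data/db"], env=env_without_shadow_stack()) would then start the server with the tunable in place; the data path is only a placeholder.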
Status: running, ExitCode: 0, Restarts: 0
Post-Mortem
| Time spent | too long |
| Lines of code changed | 1 |
| Red herrings investigated | mongo:latest tag, FTDC, Docker seccomp, data file incompatibility |
| Actual cause | AMD Zen 5 + Linux 6.19 + CET Shadow Stacks + MongoDB coroutines |
| What I should have searched first | literally anything other than what I searched |
Takeaway
If you're running MongoDB 8.0 in Docker on a Linux 6.19+ kernel with a newer AMD CPU and your container dies silently every ~30 seconds with exit code 139 — this is your fix:
environment:
  GLIBC_TUNABLES: glibc.cpu.hwcaps=-SHSTK
You're welcome. I suffered so you don't have to.
Dedicated to everyone who has ever watched a container die on a 30-second interval and felt their personality change.
Yes, this article was written with the help of an LLM. I had already spent too much time on this; it's genuinely one of the weirdest issues I faced this month :p