Every. Single. Time. 🤦‍♂️
My Setup (aka "The Perfect Storm")
Before we dive in, here's what I was working with:
- 🖥️ Linux 6.14 (yes, bleeding edge)
- 🧠 Intel Core Ultra 9 185H (hybrid P/E cores)
- 🎮 AMD Radeon Pro W7900 with ROCm 7.0.2
- 📦 Node.js 22.21.0
- 🔧 glibc 2.39
Sounds like a developer's dream setup, right? Well...
The Debugging Journey
Phase 1: All the Standard Stuff (That Didn't Work)
I tried everything you'd find on Stack Overflow:
- ❌ Clearing the npm cache
- ❌ Rebuilding Node.js from source
- ❌ Adjusting ulimit settings
- ❌ Playing with UV_THREADPOOL_SIZE
- ❌ Different Node.js versions
Nothing. The error kept coming back like a boomerang.
Phase 2: Getting Serious
At this point, I started questioning everything. Is it the hybrid CPU? The kernel version? Thread pool size?
# Testing CPU affinity
$ taskset -c 0-11 node -v
node[10234]: pthread_create: Invalid argument # Nope
# Testing thread pool
$ UV_THREADPOOL_SIZE=1 node -v
node[10256]: pthread_create: Invalid argument # Nope
# Testing glibc rseq
$ GLIBC_TUNABLES=glibc.pthread.rseq=0 node -v
node[10312]: pthread_create: Invalid argument # Still nope
Phase 3: The Breakthrough 💡
Finally, I pulled out the big guns: LD_DEBUG
$ LD_DEBUG=libs node -e "console.log('test')" 2>&1 | grep -i pthread
And there it was:
/opt/rocm-7.0.2/lib/libamdhip64.so.7: error: symbol lookup error:
undefined symbol: pthread_setaffinity_np (fatal)
EUREKA! 🎉
The culprit wasn't Node.js at all. It was ROCm's LD_PRELOAD polluting the environment!
$ env | grep LD_PRELOAD
LD_PRELOAD=/opt/rocm-7.0.2/lib/libMIOpen.so
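Before building a permanent fix, it's worth confirming the diagnosis in one shot. A quick check, assuming GNU coreutils' env and binutils' nm are installed: env -u runs a command with one variable removed, and nm -D dumps a library's dynamic symbol table.

# Run node once with LD_PRELOAD stripped; if the warning vanishes, case closed:
$ env -u LD_PRELOAD node -v

# Inspect the library from the error message; entries flagged "U" are undefined symbols:
$ nm -D /opt/rocm-7.0.2/lib/libamdhip64.so.7 | grep pthread_setaffinity_np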
The Solution: Wrapper Scripts
Here's the clever part: I needed Node.js to work WITHOUT breaking ROCm for my GPU workloads.
Solution: Environment isolation through wrapper scripts.
Step 1: Create the wrappers
File: ~/.local/bin/node
#!/bin/bash
# Isolate Node.js from ROCm's LD_PRELOAD
unset LD_PRELOAD
exec /usr/bin/node "$@"
File: ~/.local/bin/npm
#!/bin/bash
# Isolate npm from ROCm's LD_PRELOAD
unset LD_PRELOAD
exec /usr/bin/npm "$@"
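Since the two wrappers are identical except for the binary name, you can also generate them in a loop. A sketch (npx and corepack are my additions here; wrap whichever /usr/bin binaries you actually use, and note the loop marks them executable right away):

mkdir -p ~/.local/bin
for tool in node npm npx corepack; do
  cat > ~/.local/bin/"$tool" <<EOF
#!/bin/bash
# Isolate $tool from ROCm's LD_PRELOAD
unset LD_PRELOAD
exec /usr/bin/$tool "\$@"
EOF
  chmod +x ~/.local/bin/"$tool"
done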
Step 2: Make them executable
chmod +x ~/.local/bin/node ~/.local/bin/npm
Step 3: Fix your PATH (CRITICAL!)
Edit ~/.bashrc and make sure ~/.local/bin comes FIRST:
# WRONG (wrapper won't be used):
export PATH="$PATH:$HOME/.local/bin"
# RIGHT (wrapper will be used):
export PATH="$HOME/.local/bin:$PATH"
Apply changes:
source ~/.bashrc
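One gotcha: bash caches the paths of commands it has already run in the current session, so it can keep launching /usr/bin/node even after the PATH change. Clear the cache (or just open a new terminal):

hash -r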
Step 4: Verify
$ which node
/home/user/.local/bin/node  # ✅ Our wrapper!
$ node -v
v22.21.0  # ✅ NO ERROR! 🎉
$ npm -v
10.9.4  # ✅ Clean!
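One more sanity check I'd recommend: the isolation should work in both directions. Your interactive shell keeps ROCm's preload, while the wrapped Node.js process sees a clean environment:

$ echo $LD_PRELOAD
/opt/rocm-7.0.2/lib/libMIOpen.so  # still set for GPU workloads
$ node -p "process.env.LD_PRELOAD"
undefined  # stripped by the wrapper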
Why This Works
The wrapper creates a clean environment for Node.js while keeping ROCm functional for other applications:
- ✅ Node.js runs without LD_PRELOAD pollution
- ✅ ROCm still works for GPU applications
- ✅ Transparent to all programs (terminal, IDE, scripts)
- ✅ Easy to maintain and roll back
The Technical Deep-Dive
Want to know why this happens? It's a "perfect storm":
- ROCm's LD_PRELOAD forces its libraries to load first
- These libraries have undefined symbols (pthread_setaffinity_np)
- Node.js 22 tries to create threads with these broken symbols in scope
- Result: pthread_create() returns EINVAL (errno 22)
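To make the mechanism concrete, here's a toy repro if you want to see this class of failure in isolation. This is NOT ROCm's actual code, just a deliberately broken interposer: any function exported by an LD_PRELOAD library shadows the real one, so this makes every pthread_create() in the process fail with EINVAL.

# evil.c: illustration only, NOT ROCm's code
cat > evil.c <<'EOF'
#include <errno.h>
#include <pthread.h>

/* LD_PRELOAD resolves symbols here before glibc, shadowing the real pthread_create */
int pthread_create(pthread_t *t, const pthread_attr_t *a,
                   void *(*fn)(void *), void *arg) {
    (void)t; (void)a; (void)fn; (void)arg;
    return EINVAL;  /* every new thread now fails with errno 22 */
}
EOF
gcc -shared -fPIC -o libevil.so evil.c

# Depending on the Node.js build, this logs pthread_create errors or aborts outright:
LD_PRELOAD=./libevil.so node -v

ROCm's case is subtler (an unresolved symbol rather than a sabotaged implementation), but it's the same class of failure: preloaded code sitting in the thread-creation path.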
The kicker? The program still works because:
- The main thread is already created by the time the failure hits
- The error happens in additional worker threads
- Node.js libuv handles the error gracefully
But it's still annoying as hell to see on every run. 😅
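Since the failure only surfaces on those extra worker threads, a cheap way to verify the fix actually reaches them: async file I/O in Node.js runs on libuv's threadpool, so a single readFile forces a pool thread to spin up (assuming any readable file; /etc/hostname here):

$ node -e "require('fs').readFile('/etc/hostname', () => console.log('threadpool OK'))"
threadpool OK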
Lessons Learned
- Bleeding edge = cutting yourself: latest kernel + glibc + Node.js = unexpected interactions
- LD_PRELOAD is dangerous: it affects every dynamically linked program
- Deep tracing saves the day: LD_DEBUG found the issue in one shot
- Constraints breed creativity: "Don't touch ROCm" → wrapper pattern
- PATH order matters: first match wins!
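On that last lesson: type -a is the fastest way to audit PATH order, since it lists every match and the first one is what actually runs. With the setup above you should see something like:

$ type -a node
node is /home/user/.local/bin/node  # this one wins
node is /usr/bin/node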
Full Documentation
I've documented the entire diagnostic process, alternative solutions considered, and technical details in a comprehensive PSP (Problem-Solution Pattern):
📄 Complete PSP on GitHub Gist
Conclusion
If you're seeing pthread_create: Invalid argument with Node.js and you have an AMD GPU with ROCm installed, check for LD_PRELOAD pollution. The wrapper script solution is clean, maintainable, and doesn't break your GPU workflows.
Have you encountered similar issues with environment variable pollution? Let me know in the comments! 👇
Stats:
- ⏱️ Time to debug: 2.5 hours
- 🧪 Hypotheses tested: 8+
- 🎯 Tools used: LD_DEBUG, strace, lscpu, ulimit
- 💪 Complexity: 5/5
- 😊 Satisfaction: ✅