
Ghost file descriptors take over my machine.

Safia Abdalla ・3 min read

In an earlier blog post, I started explaining an issue that I ran into while running a build on a Lerna repo. The issue was related to having too many open file descriptors on my system. It disappeared when I ran the command again, so I shrugged my shoulders and thought nothing of it.

OH HOW WRONG I WAS!!!

I didn't run into the issue for a while, but I was recently running a Webpack build for an Electron app when I hit it again. This time I was suspicious. I decided to follow a recommendation I found online and increase the number of allowed file descriptors on my machine using the following command.

$ ulimit -n 8192

That should be more than enough. Right? Wrong! I kept getting those errors. So I decided to run another command to figure out how many open file descriptors were on my machine.

$ lsof | wc -l
   24940

Um. What. This seems like an absurd number of file descriptors to have open. Which processes are opening up all these file descriptors? I decided to capture the output of lsof into a file I could look through.

$ lsof > fes.txt

I opened the file and started to look through it for anything suspicious. The first thing I noticed was that there were tons of file descriptors opened by Python processes, around 12,662. What was even more suspicious was that the file descriptors originated from resources under the Anaconda install of Python 3.6.
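Scrolling through a dump that size by hand is slow, so a per-command tally helps. Here is a minimal sketch that groups lsof-style output by its first column. count_by_command is a made-up helper name, the sample rows are invented for illustration, and the sketch assumes the default lsof layout where the first column is the command and the first line is a header.

```shell
# count_by_command is a hypothetical helper, not part of lsof.
# It reads lsof-style text on stdin, skips the header line, and
# prints "<line count> <command>" pairs, busiest command first.
count_by_command() {
  awk 'NR > 1 { count[$1]++ } END { for (c in count) print count[c], c }' |
    sort -rn
}

# Tiny made-up sample so the sketch runs without a saved lsof dump.
sample='COMMAND PID USER FD TYPE
python3.6 4392 captainsafia txt REG
python3.6 4392 captainsafia 3r REG
node 5001 captainsafia cwd DIR'

printf '%s\n' "$sample" | count_by_command
```

Against the saved file, that would be count_by_command < fes.txt | head. Keep in mind that lsof prints rows such as txt and cwd as well as numbered descriptors, so these are line counts rather than strict descriptor counts.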

python3.6  4392 captainsafia  txt       REG                1,4     34284 8596243939 /Users/captainsafia/anaconda3/lib/python3.6/lib-dynload/zlib.cpython-36m-darwin.so.c~

Why was this weird? Well, because I had uninstalled Anaconda earlier that morning. I was running into some unrelated issues with it and decided to just nuke it. I followed the uninstall instructions listed on the website. It looks like the file descriptors weren't cleaned up for a process I was running previously. In fact, the process with process ID 4392 was no longer running on my machine.
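One quick way to double-check that a PID is really gone is kill -0, which sends no signal and only reports whether the process could be signalled. A small sketch follows. pid_is_alive is a hypothetical helper name, and a failure can also mean the process exists but belongs to another user.

```shell
# pid_is_alive is a hypothetical helper name.
# kill -0 delivers no signal; it succeeds only if the PID exists
# and the current user is allowed to signal it.
pid_is_alive() {
  kill -0 "$1" 2>/dev/null
}

if pid_is_alive "$$"; then
  echo "this shell ($$) is running"
fi

# Start a short-lived child, wait for it to exit, then check again.
sleep 0 &
child=$!
wait "$child"
if ! pid_is_alive "$child"; then
  echo "child $child has exited"
fi
```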

So, I have thousands of these ghost file descriptors hanging around from non-existent processes. THIS IS STRANGE. This shouldn't even be possible. If a process is killed, its open file descriptors should be closed, right?

Now, there is one thing to note here: I've recently been working on some code that spawns and shuts down processes related to Jupyter kernels. Maybe the bugs in my shutdown code had caused these issues?

It seemed really weird that these file descriptors were still around. I had restarted my machine a few hours earlier. Shouldn't that have cleared them away? In any case, I decided to give it another restart.

$ lsof | wc -l
    3232

Lo and behold, a more reasonable number of open file descriptors on my machine. I went ahead and initialized the build for the Electron application that I was working on. Things ran smoothly this time and no excessive file descriptors were open on my machine. I decided to do one more thing.

Remember that buggy code I was telling you about? I decided to run the tests for that code. If the bugs related to correctly shutting down running processes had not been resolved, I should see similar behavior.

The tests ran smoothly and nothing fishy happened.

I'm still quite perplexed as to why there were so many open file descriptors on my system with no processes associated with them. The code that I was writing was written in Node. It killed the process using process.kill and used the destroy method to close out the standard I/O streams associated with the process.
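The Node code isn't shown here, so as a rough analogue, the same signal-then-clean-up sequence looks like this in shell. This is only a sketch of the general pattern, not the code described above. shutdown_child is a hypothetical helper, and wait stands in for the clean-up step by reaping the child once it has been signalled.

```shell
# shutdown_child is a hypothetical helper name.
# It asks a child of this shell to terminate, reaps it, and
# returns success only if the PID is gone afterwards.
shutdown_child() {
  pid=$1
  kill -TERM "$pid" 2>/dev/null       # request termination
  wait "$pid" 2>/dev/null || true     # reap the child; its exit status is non-zero after a signal
  ! kill -0 "$pid" 2>/dev/null        # succeed only if the PID no longer exists
}

# Demo: a long sleep stands in for a kernel process.
sleep 60 &
child=$!
if shutdown_child "$child"; then
  echo "child $child is gone"
fi
```

Note that wait only works for children of the current shell, which is the case this sketch assumes.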

I'm still working on cleaning up the shutdown code. There are a lot of things that need to happen for a shutdown to occur correctly in this context. I'll continue working through this issue and write another blog post when I've got more things figured out.

Have you run into a similar issue with file descriptors not being cleaned on process exit? Let me know in the comments!

Safia Abdalla

@captainsafia

I make open source at @nteractio, make software at @Microsoft, and write books and blogs. Dream big and follow through even bigger.

Discussion

I never had a problem this tricky, but over a decade ago, when I was a Unix admin, one of the systems I was in charge of had really high /var usage. The funny thing was I could not get du and df to agree on disk usage. So there was some "super hidden" file taking up disk space that I could not delete.

Another admin found it, with fsck. Turns out I had deleted a log file a few weeks back (also because of disk space issues), but the process still had the inode open and the log was not rotating or anything.
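The effect in that story is easy to reproduce: removing a file only removes its name, and the data stays on disk until the last open descriptor is closed. A minimal sketch, using a throwaway temp file and descriptor 3 purely for illustration:

```shell
# Create a scratch file and put one line in it.
tmp=$(mktemp)
echo "still here" > "$tmp"

exec 3< "$tmp"     # open the file for reading on descriptor 3
rm "$tmp"          # remove the name; the inode stays while fd 3 is open
read -r line <&3   # the data is still readable through the descriptor
exec 3<&-          # closing the descriptor lets the filesystem free the space

echo "$line"
```

That is why du, which walks file names, and df, which asks the filesystem, can disagree. Where lsof is available, lsof +L1 lists open files that have been unlinked, which is one way to hunt for a file like this.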