DEV Community πŸ‘©β€πŸ’»πŸ‘¨β€πŸ’»

Discussion on: A deep dive into file locking with NodeJS

Sebastian Rath Author • Edited on

Thanks for your input! As far as I know there are no proper solutions yet. I've found countless articles, blog posts, and threads on Stack Overflow and Reddit with the same problem.

About your suggestion, I'm wondering: doesn't the "private" copy have the same lock problem? While this private file is being created/copied, another process could still modify the content of the original file.

And as for file systems, they are mostly atomic only at the block level. Any write operation larger than that has undefined behavior, afaik.

Stephen Brown

I'm no expert on file systems. But in principle, if all read operations on a file blocked write operations, the following would work:

Do a checksum on the original, which would block writes during this period. Copy it, then checksum the private copy. If the checksums match, commit the private file; otherwise repeat.

But apparently, at the OS/kernel level you alluded to, a single save can turn into multiple write() operations on the same file, at which point read() operations can interleave between the write ops. OK, so we don't have full file atomicity.

So what about: checksum the original, copy it to a private file, checksum the private copy, then checksum the original again (after a second or so), and compare all three. Try again if all three don't match.

Would that give us a reasonable heuristic for avoiding the need for locks at all? It's like a high-level compare-and-swap mechanism. Waiting a second or so on every file would be costly when committing many files, so you'd run this comparison on groups of files rather than one at a time.

Sebastian Rath Author

Great idea, but taking the initial checksum already runs into the file locking problem, since the file cannot be locked. Just for the sake of outlining a potential edge case: imagine a process's write operation on file "foo" being faster than the read operation that takes the checksum. While the file is being overwritten, the reader would compute a checksum over a mix of the old file and part of the new content.

This could be solved, of course, but only if you're the maintainer of all involved processes.

Stephen Brown

Let's say the read op begins first, as you describe. It gets halfway through the file when the write op comes in; the writer races through to the finish line first, causing, as you say, the reader to checksum the earlier part of the file with the older content and the latter part with the newer content.

As the reader finishes last, we then have a checksum based on a corrupted snapshot of the file. We copy the file and checksum the private copy. The private file will by then be a fully written file, so this second checksum covers the entire updated content and thus mismatches the first checksum. That would cause your commit process to retry (or fail, if a command line flag said "fail on detecting an update during commit"). The third checksum on the original, in my heuristic, is just a sanity check.

Stephen Brown • Edited on

As an example of why I suggested the third check: say a write operation comes in first, gets some way through, and then stalls. The read op comes in and takes the original checksum, then copies the file (assuming the stalled write process allows it); the copy will match the original, despite the file being only partially written when copied. The third checksum on the original is intentionally delayed a second or two, to allow a stall to resume and the mismatch to be detected at this last hurdle. That's why I said it is not 100% foolproof: the stall could last multiple seconds or more. But that would be a very broken OS...

Sebastian Rath Author

That makes absolute sense. I've approached this more with speed and performance in mind, since project files can become quite large. During the beta test it happened too often that users continued working and saving while the app was still running in the background.

Sebastian Rath Author

But your approach is pretty good, I like it: using a "pre-staging area" as a safety net.

Stephen Brown • Edited on

I meant to add an additional comment: writing to a file can sometimes take a long time. Imagine a browser downloading a file; the write stall may depend on network conditions and could last as long as you like.

I think your approach of checking for open file handles, combined with the checksum approach I suggested, will solve this issue.

So:

i) read and checksum the original file
ii) check for any open write handles on the original file, as you were doing (fail/abort if any)
iii) copy the original file
iv) compare the checksum of the copied file with the original (fail on mismatch)

These four steps would seem to suffice to me, avoiding race conditions as well as partially written files from a write operation stalling just before the commit operation has begun.

Sebastian Rath Author

Hey, sorry for the late reply. That's a really good idea you have there, I'll have to let that sink in. In case you're interested, you are more than welcome to contribute your ideas to the project either on GitHub directly or on Discord as well. Door is always open!

Stephen Brown

I will clone and check the project out, it does sound interesting.