
re: Just in! A New Persistent NoSQL Database (18 KiB only!)

re: This isn't very good: module.exports = (path, data, fs) => { fs.writeFileSync(path, '') for (const key in data) { const line = JSON.st...
 

npmjs.com/package/msgpack

node-msgpack is currently slower than the built-in JSON.stringify() and JSON.parse() methods. In recent versions of node.js, the JSON functions have been heavily optimized. node-msgpack is still more compact, and we are currently working on performance improvements. Testing shows that, over 500k iterations, msgpack.pack() is about 5x slower than JSON.stringify(), and msgpack.unpack() is about 3.5x slower than JSON.parse().
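For a rough sense of the kind of test that quote describes, a micro-benchmark along these lines would do. It assumes the msgpack package linked above is installed; the exact numbers depend heavily on the Node version and the payload shape, so treat it as a sketch rather than a definitive result.

```js
// Rough micro-benchmark: JSON vs msgpack over the same payload.
// Assumes `npm install msgpack`; numbers vary a lot by Node version and data shape.
const msgpack = require('msgpack');

const payload = { id: 12345, name: 'example', tags: ['a', 'b', 'c'], active: true };
const ITERATIONS = 500000;

function time(label, fn) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < ITERATIONS; i++) fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${ms.toFixed(0)} ms`);
}

const json = JSON.stringify(payload);
const packed = msgpack.pack(payload);

time('JSON.stringify', () => JSON.stringify(payload));
time('msgpack.pack  ', () => msgpack.pack(payload));
time('JSON.parse    ', () => JSON.parse(json));
time('msgpack.unpack', () => msgpack.unpack(packed));

console.log('size:', Buffer.byteLength(json), 'bytes (JSON) vs', packed.length, 'bytes (msgpack)');
```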

 

I'm aware of this, though it's a bit more complex. Non-native msgpack implementations will find it hard to compete against native implementations.

I did a lot of research on binary serialization. Specifically, I wrote several serializers of my own and compared them to igbinary and msgpack on speed, uncompressed size, and compressed size.

Two things were difficult:

  1. Making any difference in size, except through string deduplication (igbinary does quite well with this, even post-compression). Note that most tweaks to the binary format to make it smaller really had much the same effect as pre-compressing, so they rarely made any difference after compression. (A rough sketch of this kind of size measurement follows the list.)
  2. Getting anything much faster in JS, whether in JS itself or in C++, was very difficult; you don't get the same kind of performance gain you see from something like a PHP C extension. If I remember correctly, a lot of the overhead in JS is baked right into the objects used under the hood in C++, not just into the interpretation.
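For context, here's roughly the kind of size measurement I mean: serialize the same data two ways, then compare raw and gzip-compressed sizes. zlib ships with Node; the payload and the msgpack package are just stand-ins here.

```js
// Sketch: compare raw vs gzipped sizes of two serializations of the same data.
// zlib is built into Node; the payload and the msgpack package are just stand-ins.
const zlib = require('zlib');
const msgpack = require('msgpack');

const data = [];
for (let i = 0; i < 1000; i++) {
  data.push({ id: i, status: 'active', region: 'eu-west', score: Math.random() });
}

function report(label, buf) {
  console.log(`${label}: ${buf.length} bytes raw, ${zlib.gzipSync(buf).length} bytes gzipped`);
}

report('JSON   ', Buffer.from(JSON.stringify(data)));
report('msgpack', msgpack.pack(data));
```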

msgpack being slower is a problem, though I'm not sure why that should be the case, at least on the backend; I only ever considered it a problem on the frontend. It should have all the same limitations and advantages as the native JSON code does. Unless, and I can't be sure about this as it was a while ago, there were also some barriers between the engine and extensions.

In theory a backend implementation of msgpack, at least a minimal one, would be little more than a copy and paste of the one for JSON (though if that uses something like YACC it would be overkill). It should basically be the same but with less to do, and therefore faster.

It doesn't help that JS isn't entirely binary-friendly (another area where it sorely hurts compared to PHP, which I mention a lot because a lot of infrastructure is a mix of the two and they need to talk to each other).

JSON does in some circumstances have size benefits but they're not often brought out.

It's a shame that msgpack isn't as fast, at least on the frontend, due to having no native implementation (have you tried the WebAssembly stuff or cutting-edge JS features though, typed arrays, etc.)?

Regardless, speed isn't always the main thing. The output is still potentially much smaller, which might be important if bandwidth or the amount written / read is a concern.

I haven't checked recently, but msgpack may still not support dictionaries as an extra option. You can, however, do this yourself: in the serialized stream, look for repeated strings, store each of them only once in an array, and have the string type point into that array instead.

That has an interesting side benefit in that it can reduce memory use as well, by cutting down the number of copies of the same string, though I don't know whether JS engines are able to do that themselves in any situation.
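A rough sketch of that idea as a pre-processing step (nothing msgpack-specific; the `$s` marker is just an arbitrary convention I'm assuming here, so it breaks if real objects ever use that key):

```js
// Sketch: deduplicate repeated strings before serialization by building a
// string table and replacing each occurrence with an index into it.
// This is a pre-pack step, independent of whether JSON or msgpack is used after.
function packStrings(value) {
  const table = [];
  const seen = new Map(); // string -> index in table

  function walk(v) {
    if (typeof v === 'string') {
      if (!seen.has(v)) {
        seen.set(v, table.length);
        table.push(v);
      }
      return { $s: seen.get(v) }; // reference into the string table
    }
    if (Array.isArray(v)) return v.map(walk);
    if (v && typeof v === 'object') {
      const out = {};
      for (const k of Object.keys(v)) out[k] = walk(v[k]);
      return out;
    }
    return v;
  }

  return { table, data: walk(value) };
}

function unpackStrings({ table, data }) {
  function walk(v) {
    if (Array.isArray(v)) return v.map(walk);
    if (v && typeof v === 'object') {
      if ('$s' in v) return table[v.$s];
      const out = {};
      for (const k of Object.keys(v)) out[k] = walk(v[k]);
      return out;
    }
    return v;
  }
  return walk(data);
}

// e.g. JSON.stringify(packStrings(obj)) comes out smaller when the same strings repeat a lot
```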

A similar thing might be done with arrays of objects. I.e., if they all have the same shape, instead of [{a: 123, b: false}, {a: 321, b: true}] store ['a', 'b', 123, false, 321, true], though that kind of thing you might be better off doing yourself on top of the serialization. You can do the same with a string dictionary like igbinary has, which is surprisingly effective even post-compression but might have complications or limitations with streaming and memory usage patterns.
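Something along these lines as a pre-pack step for arrays of identically shaped objects (keys stored once up front, then values row by row; it assumes every object really does have the same keys):

```js
// Sketch: flatten an array of same-shaped objects into one flat array so the
// keys are only stored once. Assumes every object has exactly the same keys.
function flattenRows(rows) {
  const keys = Object.keys(rows[0]);
  const flat = [...keys];
  for (const row of rows) {
    for (const k of keys) flat.push(row[k]);
  }
  return { width: keys.length, flat };
}

function expandRows({ width, flat }) {
  const names = flat.slice(0, width);
  const rows = [];
  for (let i = width; i < flat.length; i += width) {
    const row = {};
    names.forEach((name, j) => { row[name] = flat[i + j]; });
    rows.push(row);
  }
  return rows;
}

// flattenRows([{a: 123, b: false}, {a: 321, b: true}])
//   -> { width: 2, flat: ['a', 'b', 123, false, 321, true] }
```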

This is a similar concept to gson and interned strings. I've been using those successfully, with proven gains, for some time now.

There's probably a missing pre/post-pack library out there for these cases. They can all take more or less CPU, but it's basically the same trade-off as with compression: spend more time, get better results.

I found it very annoying with very large payloads that you can't get the best of both worlds with msgpack, i.e. you notice both the CPU time and the network time.

It wouldn't surprise me if one of the APIs sneaking into modern JS somewhere has some binary serialisation that might be native.

I'm seeing C++ for msgpack, so it should at least in theory be possible to get it up to JSON speed on the backend. That sometimes matters more on the backend, because that's where the contention is.

Thanks for the detailed explanation ❤️

My goal is a pure JS implementation with no binary dependencies.

I think streams are optimized to minimize CPU and memory overheads.

There's always a lot to learn; I still learn every day.

Having no binaries is good for supporting both frontend and backend, as well as being hardware-agnostic.

You should take a look at fs.open and see if there's a flock, etc.

You might decide you don't need to implement flock, but if you make that decision it's crucial to understand the concurrency issues.
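Node core doesn't expose flock() directly, but as a sketch of the idea, an exclusive-create lock file via the 'wx' flag gives you crude mutual exclusion. This is only an illustration: it doesn't handle stale lock files left behind by crashed processes, which a real setup (or a lock-file library) would need to deal with.

```js
// Crude advisory locking via an exclusive-create lock file.
// 'wx' makes openSync throw EEXIST if the file already exists, i.e. if
// another process currently holds the lock. Stale locks from crashed
// processes are not handled here.
const fs = require('fs');

function withLock(lockPath, fn) {
  const fd = fs.openSync(lockPath, 'wx'); // throws if someone else holds the lock
  try {
    fn();
  } finally {
    fs.closeSync(fd);
    fs.unlinkSync(lockPath);
  }
}

withLock('db.lock', () => {
  fs.writeFileSync('db.json', JSON.stringify({ hello: 'world' }));
});
```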

Testing this might be easier than you think. See if there's a sleepSync, then sleep after writing each line.

Then write a basic program that fills the in-memory store and saves it, and run it twice at the same time.

You'll see lines from both. OK, you might think it doesn't matter because they're the same and it's just the same op twice, but initialise each database so that they have the same keys but different values.

If you really push it and run the program thousands of times at the same time with no delays, you can get a file where a line suddenly stops mid-way and the line from another program continues from there, meaning the lines won't even be parseable as JSON.
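A rough sketch of that experiment: a writer that saves line by line with a delay, run as two copies at once. There's no sleepSync in Node core, so a busy wait stands in for it; the file name and key layout are just made up for the demo.

```js
// writer.js - naive line-by-line save, mirroring the truncate-then-append pattern.
// Run two copies at once, e.g. `node writer.js A & node writer.js B`, then look
// at out.db: you'll see lines from both runs mixed together.
// There's no sleepSync in Node core, so a busy wait stands in for one.
const fs = require('fs');

const tag = process.argv[2] || 'X';
const data = { user: tag, count: 1, updated: Date.now() }; // same keys, different values per run

function sleepSync(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* busy wait */ }
}

fs.writeFileSync('out.db', ''); // truncate first, as the library does
for (const key in data) {
  fs.appendFileSync('out.db', JSON.stringify([key, data[key]]) + '\n');
  sleepSync(100); // widen the window so the interleaving is easy to reproduce
}
```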

When people are starting out the first thought is "how can I make this work". The next step, the revolution, is thinking "how can I break this" :D.
