Blueis-h 4: Intrusive thoughts (and tables)

#hashmap #hashtable #redis #systemdesign

Well this entry is late (again 😭) but at least I have a reason this time and the delay wasn't THAT long. So firstly, I got rank 1 in my college's competitive coding competition after our team solved all problems with 10 minutes to spare so YAY!

Now coming to the task at hand this section of the project was truly brutal, to the point where I wanted to give up and simply keep on using the STL's unordered map.

The concept of an intrusive data structure is simple to internalize, a way to give structure to your data without embedding your data within this structure.

The implementation on the other hand is... something else, the functions definitions were the simplest part but then came the structs apparently there's a reason incremental resizing isn't widely used unless necessary, I had to keep track of two maps simultaneously and juggle between them. While this sounds obvious (because duh some entries will be left in the older table even though a newer, and bigger one exists until rehashing is complete), but in practice this meant implementing two levels of functions one for the nodes themselves and one for the table.

Choosing the hash function was the simplest part but reading about it was fun. Apparently the choice is not much of a deal breaker as long as we use a non-cryptographic hash since we want better distribution and faster speeds and do not care much for the hash's security.

I went with MurmurHash most project's I've seen use FNV and while that would've been fine for this project too I just wanted to use Murmur to get that better distribution (I also broke my project trying to write a test script for getting chain lengths to compare between FNV and Murmur so there's that.

Conclusion(?)

This felt more like a rant than a technical blog tbh but I was genuinely able to learn a LOT from this addition alone than I have from all the past ones combined. Learning intrusive data structures, the fact that they can add structure and yet never touch any data itself (thanks to the nifty container_of() definition which tbh took me a while to even make heads and tails of), handling equality checks in two places by comparing hcodes in nodes and data in server all of it was a bit too much to take in at once and I believe I'll keep coming back to it every once in a while to review but it was a lot of fun

Next I think I'll start working on sorted sets since I heard they are pretty useful for getting "hot searches" and similar features, sooo let's see how long that takes.

Byee!!

DEV Community

Blueis-h 4: Intrusive thoughts (and tables)

Conclusion(?)

Top comments (0)