Hello reader! As you know, I am building CodeArena, a decentralized system. Here is what I will be covering in today's blog:
CodeArena Progress - The machines can now elect a leader and the system is capable of repairing itself.
System Design Question - Peer discussion on today's system design problem helped us come up with an amazing solution, which was just a thoughtful combination of our individual solutions.
Next Plans - What I will be doing next in CodeArena and in System Design, along with some parallel "mini" tasks as sprinkles on the ice cream 🍨.
CodeArena Progress
So last time I had architected the system, and this time I have implemented a part of it. The whole system is huge and complex (yeah, I realize it more and more as I implement it 😅), so implementing every feature even for a single node is nothing less than a nightmare 😫.
But nevertheless, I wrote the C code for an empty node (an incomplete empty node 😅). This node has two threads: one is a sender and the second is a receiver. When the node starts, it creates two sockets, one for listening for heartbeats and one for sending them. The two threads share a variable that holds a struct sockaddr_in value.
The sender first checks whether the shared variable holds an address (yeah, I used a bool to check for that, which is another shared variable), and if it does not have any value, the sender broadcasts its own heartbeats to the network.
Meanwhile, the receiver listens for any existing monitor's signals. Now suppose the system is just starting, so this thread won't hear any monitor. The thread tries 5 times to hear a monitor heartbeat (the socket has the SO_RCVTIMEO option enabled, so it times out every 500 milliseconds when no data arrives). When it misses the monitor for 5 consecutive heartbeats, it starts an election with all the other empty nodes.
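Here is a minimal sketch of that timeout-and-count loop, just to show the shape of it; the port number, buffer handling, and function names are simplified placeholders, not my actual node code:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>

#define HEARTBEAT_PORT 9000   /* placeholder port for the sketch */
#define MAX_MISSES     5

int listen_for_monitor(void) {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    /* Time out every 500 ms if no datagram arrives. */
    struct timeval tv = { .tv_sec = 0, .tv_usec = 500 * 1000 };
    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    struct sockaddr_in addr = {0};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(HEARTBEAT_PORT);
    bind(sock, (struct sockaddr *)&addr, sizeof(addr));

    char buf[64];
    int misses = 0;
    while (misses < MAX_MISSES) {
        ssize_t n = recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL);
        if (n < 0)
            misses++;   /* recvfrom timed out: one missed heartbeat
                           (treating any error as a miss for simplicity) */
        else
            return 0;   /* heard a monitor, no election needed */
    }
    return -1;          /* 5 consecutive misses: caller starts an election */
}
```

The nice part of SO_RCVTIMEO is that the same blocking recvfrom() doubles as the failure detector, so no separate timer thread is needed.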
The receiver thread then calls another function which performs the consensus. The consensus is basically a UDP voting mechanism, and here is how I built it. This new function creates two new threads; let's call them s2 and r2, since I cannot reuse the names sender and receiver 😄. Now s2 sends 3 consensusStart packets, while r2 listens for the 3 consensusStart packets its own node sent (every node has a unique UID, which I get from the kernel's entropy pool, so a node can recognize its own packets). Once s2 has sent its 3 packets, it switches roles and starts collecting votes from the other nodes. And once r2 has received those 3 packets, it sends a vote packet into the network three times. The vote packet contains a random number.
Every node puts its own random number in the start packet. The r2 threads collect these and vote for the highest random number they received (which may be the one their own s2 sent). s2 then listens for these voted numbers and looks for the number that got 50% + 1 of the votes. If that number is the one this node generated, the node becomes the new monitor; otherwise everything continues as if nothing happened. r2 and s2 then exit normally.
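To make the tallying concrete, here is the idea in miniature; the packet layout and function names are illustrative assumptions of mine, not the real wire format:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed packet shape: every node broadcasts its UID plus a random number. */
struct vote_packet {
    uint64_t uid;   /* unique id drawn from the kernel's entropy pool */
    uint64_t rnd;   /* the random number this node is proposing */
};

/* r2's job, conceptually: vote for the highest random number seen so far. */
uint64_t pick_vote(const struct vote_packet *pkts, int n) {
    uint64_t best = 0;
    for (int i = 0; i < n; i++)
        if (pkts[i].rnd > best)
            best = pkts[i].rnd;
    return best;
}

/* s2's job, conceptually: did my number get a strict majority of the votes? */
bool i_won(uint64_t my_rnd, const uint64_t *votes, int n_votes, int n_nodes) {
    int count = 0;
    for (int i = 0; i < n_votes; i++)
        if (votes[i] == my_rnd)
            count++;
    return count >= n_nodes / 2 + 1;   /* 50% + 1 */
}
```

A strict majority on the highest random number yields a single winner with overwhelming probability, since two nodes drawing the same 64-bit number is practically impossible.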
Now suppose a node starts up and a monitor is already present. In this case the node first sends a broadcast using its sender thread as usual, but this time the monitor replies, also via broadcast, saying "Yeah, I am the elected monitor and you can send your heartbeats to me". This reply is not heard by the sender; this is a very decoupled system and hence difficult to explain (I had to refer to my code twice to write this down in simple words 😅, but 0% AI for sure!).
This message from the monitor is heard by the receiver thread, which updates the shared struct sockaddr_in variable with the monitor's address. Meanwhile the sender thread keeps doing its job, sending "Hello, anyone there?" signals, but as soon as the receiver updates the variable, the sender switches to sending unicast heartbeats to the monitor from the next beat onward.
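For the curious, the shared variable can be guarded with a mutex so the sender never reads a half-written address. This is a simplified sketch of the idea, not my actual code:

```c
#include <netinet/in.h>
#include <pthread.h>
#include <stdbool.h>

static struct sockaddr_in monitor_addr;
static bool have_monitor = false;
static pthread_mutex_t monitor_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called by the receiver thread when the monitor announces itself. */
void set_monitor(const struct sockaddr_in *addr) {
    pthread_mutex_lock(&monitor_lock);
    monitor_addr = *addr;
    have_monitor = true;
    pthread_mutex_unlock(&monitor_lock);
}

/* Called by the sender thread before every heartbeat: broadcast until a
   monitor is known, then switch to unicast. */
bool get_monitor(struct sockaddr_in *out) {
    pthread_mutex_lock(&monitor_lock);
    bool known = have_monitor;
    if (known)
        *out = monitor_addr;
    pthread_mutex_unlock(&monitor_lock);
    return known;
}
```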
The monitor also sends out heartbeats every 500 milliseconds so that the nodes know there is no need to conduct an election right now.
This means that if I let the nodes settle (I tested this with 4 PCs and it works) and then terminate the program on the monitor, the other nodes quickly pick a new monitor, stabilizing the system in under 5 seconds (5 missed heartbeats at 500 ms is 2.5 seconds of detection time, and the election itself takes the rest).
So this was everything I have built till now; what I am working on next comes up in a later section.
The System Design Question
So today's question (more precisely, yesterday's question 😅) was this: design an Online/Offline Indicator.
Core Requirements:
should update a user's online status within 10 seconds of the user coming online
can be lenient in marking a user offline
should scale to 5 million active users at any given moment
a user should see the accurate status of any other user, eventually
Think of Discord, it shows you whether a user is online or offline using that small green/red dot. This is nearly the same question.
So this was my approach:
This design looks very blurry since Dev does not allow SVGs; a better image can be found on Hashnode. Check out the Hashnode blog for a better view, link at the end.
Okay, big design, not so readable, typical me 😅
So this design is built on my previous design, as you can see in the left corner. In this design too, every continent has a single server, and every continent server has multiple region servers. Now, instead of using Redis as a cache, these servers use bloom filters. The good thing about bloom filters is that reads and updates are O(1), and they consume a lot less memory. Well, not exactly O(1): it is O(k), where k is the number of hash functions the filter uses, so in this case around O(2) or O(3), which I will call O(1) 😄
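To make the bloom filter point concrete, here is a toy version; the array size and hash choice are arbitrary for the demo, and a real server would size the bit array from the expected user count:

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOOM_BITS (1u << 20)   /* 2^20 bits = 128 KiB, sized small for this demo */
#define BLOOM_K    3            /* number of hash functions */

static uint8_t bloom[BLOOM_BITS / 8];

/* Seeded FNV-1a, a cheap stand-in for k independent hash functions. */
static uint32_t hash(const char *key, uint32_t seed) {
    uint32_t h = 2166136261u ^ seed;
    for (; *key; key++) {
        h ^= (uint8_t)*key;
        h *= 16777619u;
    }
    return h % BLOOM_BITS;
}

void bloom_add(const char *user_id) {
    for (uint32_t i = 0; i < BLOOM_K; i++) {
        uint32_t bit = hash(user_id, i);
        bloom[bit / 8] |= 1u << (bit % 8);     /* O(k) update */
    }
}

bool bloom_maybe_online(const char *user_id) {
    for (uint32_t i = 0; i < BLOOM_K; i++) {
        uint32_t bit = hash(user_id, i);
        if (!(bloom[bit / 8] & (1u << (bit % 8))))
            return false;                      /* definitely not in the set */
    }
    return true;                               /* possibly online (false positives!) */
}
```

The trade-off is false positives: bloom_maybe_online() can answer "maybe" for an offline user, which is exactly why the read falls through to the databases for confirmation.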
So every read goes to the local bloom filter, then to the region database, then to the continent bloom filter, then to the actual continent database, and finally to the inter-continental database. The read path is trivial; there might be very few cases where a request travels that far, and even if it happens, it happens exactly once, since the system syncs information according to how frequently two users connect. If an American user has an Australian friend, the servers in both places get updated.
And for the writes: the user's machine periodically (about once a minute) pings the region server to say it is still online. The region server updates the bloom filter, and the continent server too if the user was previously offline. If the ping does not come through, the region server marks that user as offline, removes them from the bloom filter, and sends the update to the continent database.
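One caveat I should flag here: a plain bloom filter cannot delete entries, so "removing" a user would actually need a counting bloom filter or a filter that is periodically rebuilt. The write path itself is simple to model, though; here is a sketch with made-up names and timeouts:

```c
#include <stdbool.h>
#include <time.h>

#define PING_INTERVAL_S 60                       /* client pings roughly every minute */
#define OFFLINE_AFTER_S (2 * PING_INTERVAL_S)    /* grace period before marking offline */

struct presence {
    time_t last_seen;    /* 0 means "never seen / offline" */
};

/* Called whenever a ping arrives from the user's machine. */
void on_ping(struct presence *p) {
    bool was_offline = (p->last_seen == 0);
    p->last_seen = time(NULL);
    if (was_offline) {
        /* bloom_add(user_id);            mark online in the region's filter   */
        /* notify_continent_server(...);  propagate the state change upward    */
    }
}

/* Checked periodically by the region server. */
bool is_online(const struct presence *p) {
    return p->last_seen != 0 &&
           time(NULL) - p->last_seen < OFFLINE_AFTER_S;
}
```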
Now, there was a flaw in this, which I discovered when I saw my friend's design. The flaw was that I was using a persistent database for something that does not need persistence. It is okay if your friend shows as offline for 30 seconds or a minute while you are chatting with them, as long as their correct status is eventually revealed.
My friend (who genuinely knows more system design than me, not kidding) used an amazing technique and scaled the system using only Redis, and the concept was just mind-blowing!
He used distributed caching, a built-in feature of Redis. The architecture inside Redis is a master-slave architecture, wherein a master has n slaves and those slaves are read replicas, and a central server handles the client requests. The data is divided into buckets kept across the Redis instances, and each bucket has a master connected to n slaves. Now, a master can fail at any time, so to ensure the system does not fail, Redis allows a slave to be taken over by another master if its own master dies.
And I genuinely find this design genius. Suppose the central server were the continent server in my design, the regional servers were the masters, and the local servers were the slaves. Think of the speed of the system if the Redis instances were replaced with bloom filters! The system I just described needs more polishing, and I agree it has a lot of problems you could potentially use to destroy it, but it is overall an amazing design.
My friend got his inspiration from this blog. And he is a really smart guy, you should really check out his LinkedIn.
Next Plans
So next I am going to take a deep dive into Linux containers and Linux rootless containers (yeah, the two are different, totally different 😄)
And as far as System Design is concerned, we are going to approach this question next. If you're someone who is better at system design, we'd genuinely like to learn from you, and even if you're just starting out, we're with you, since we are also basically in the "just starting out" phase 😄
The parallel processing
So currently I have an amazing freelance project from a to-be start-up. The sad part is that it is nothing around systems; the work involves developing a cross-platform mobile and desktop app along with a website, all with offline sync. Yeah, no internet is allowed: the app should not use the internet even if it is available. I cannot share all the details, but this is as far as I can go 😅
The other smaller "sprinkles" are that I am going to resume my freelance work at another company soon (yeah, I had to pause due to my exams and, later, some problems on the organization's side).
Along with that, I am planning to explore the Linux kernel repository, looking forward to getting my name on that mailing list (I hope so! 🤞), and also refreshing my compiler design concepts so I can later explore LLVM (around August or September, as soon as I get comfortable navigating Linux's repository 😄).
Thanks and goodbye!
Thanks a lot for reading this far! I will keep the blogs coming at a better pace, but till then, Sayonara!!
GIF Credits: https://tenor.com/pptzyr1WqHS.gif
If you've read till here and have just a bit more time, I write on other platforms as well, and I would appreciate it if you could give my blog a view there 😄
Hashnode: https://keep-alives.hashnode.dev/day-the-machines-talked-and-bloom-filters-bloomed

