
Manas Trivedi

Blueis-h 3: Me v/s the need to over-engineer

Who could've guessed this update would be even more delayed 😭. I don't have a good excuse this time; I just took time to actually read up on my next set of topics instead of jumping straight into the implementation.
But let's backtrack: my last post mentioned that the next thing on our agenda was implementing our own hash tables, and the reason for that is pretty simple, actually.
Our architecture uses single-threaded, event-loop-based concurrency, so if we use std::unordered_map a few issues arise, the most notable one being tail latency.
Our table will be resized when its load factor is exceeded. When triggered, this resize is typically optimized by the STL for throughput and performed in one go: the average latency is minimized, but the request that triggered it suffers a latency spike. We do not want this.
If our system encounters a latency spike, the entire pipeline is blocked for that duration, which is a serious concern. The solution is to spread the resize over multiple incoming requests; while this increases the time taken by those requests, it avoids the spike by spreading the cost out.

The Over-engineering hits

The above is not at all a trivial implementation, but had I been working on it since my last post, it would've long been finished.
Sadly, however, that was unacceptable to me.
A friend pointed out a flaw in the buffers I'm maintaining for each socket: I was using STL std::vector<uint8_t>, and while that seems fine in principle, it was bad design.
All consumption from our vector happens at the front, and erasing there shifts the rest of the data forward, costing us O(N) time. This had to be mitigated.
The way I decided to go about this is using pointer arithmetic.
We could simply have a struct like so:

struct Buffer {
    uint8_t *buffer_begin;
    uint8_t *buffer_end;
    uint8_t *data_begin;
    uint8_t *data_end;
};

For consumption we simply move the data_begin pointer forward, which costs only O(1); for appending we similarly advance the data_end pointer and perform a memcpy().

Issue in this

This was not so much an issue with the approach as with my understanding of pointers and their arithmetic, but it still took me a full day to actually understand and implement the resizing functionality for this approach.

We have 2 scenarios:

A. enough unused space in buffer to compact it and append

In this scenario we can simply perform a memmove() to shift the data from data_begin back to buffer_begin.
We can then adjust the pointers accordingly and proceed with copying in the new data.

B. Not enough space in buffer

This is the scenario where we must allocate more space, first copy over the current buffer, and then append the new data. I decided to be traditional and double the current buffer's size until it was large enough to accommodate the new data.
Here we use malloc() followed by a memcpy(), since calloc() is unnecessary: we don't need zero initialization and can simply overwrite the garbage values (this differs from what we'll do in the hash table, but that will come later (sooner rather than later, I hope 😭)).


I guess this concludes this post. I'll challenge myself to finish up hash tables asap and write the next blog soon.

Bye! 👋
