Modern C++ makes it easy to build many things elegantly, a web server for example. As hardware modernises (flash storage replacing magnetic disks), our concepts of databases and web services need to be modernised as well.
So, what's the problem with traditional database systems?
The relational representation in the database and the (usually object-oriented) representation on the client side are completely different. To overcome this mismatch, NoSQL databases such as MongoDB and CouchDB were introduced.
However, even these were developed with magnetic disks in mind and were not tuned for newer storage such as flash disks. One store that is optimised for new-era hardware is RocksDB, which had the benefit of being developed later. Similarly, we need to come up with new concepts that make client-side development easy as far as storing and querying data is concerned.
In comes the concept of Quarks (a name originally suggested by one of my esteemed colleagues, Dr. Russel Ahmed Apu). Quarks plans to provide a uniform structure for addressing these architectural problems, a proposed step in the right direction for modern software development. At the heart of Quarks is simplicity. Programming in the modern era should be simple as far as the client program is concerned, be it a Node.js server-side application invoking some services, a web front end, or a native app. The client shouldn't have to worry about traffic management, threads, scaling, or distributing the system when the need arises. The majority of modern apps nowadays have to deal with data, and Quarks provides a mechanism to cache, store, retrieve, and operate on this data quickly (taking advantage of modern hardware) with clever querying techniques.
It is probably better to explain the usage of Quarks with a real life example.
Before moving on to the example, here is the essence in two lines:
- User -> [Quarks.save] -> ThreadManagement -> [Cache/RAM] -> Queue -> PersistentStorage
- User -> [Quarks.Query] -> ThreadManagement -> fetch to [Cache/RAM] -> return
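The two flows above can be sketched roughly as follows. This is only an illustration in Python, not the Quarks implementation: the class name `QuarksSketch`, its methods, and the use of plain dictionaries for the cache and for persistent storage are all assumptions made for the example.

```python
import queue
import threading


class QuarksSketch:
    """Toy stand-in for the save/query flow; not the real Quarks API."""

    def __init__(self):
        self.cache = {}                 # [Cache/RAM]
        self.persistent = {}            # stand-in for PersistentStorage
        self.pending = queue.Queue()    # Queue between RAM and storage
        self.lock = threading.Lock()
        # Thread management lives inside the store, not in the caller.
        self.writer = threading.Thread(target=self._drain, daemon=True)
        self.writer.start()

    def save(self, key, value):
        with self.lock:
            self.cache[key] = value     # visible to queries immediately
        self.pending.put((key, value))  # persisted later by the writer

    def query(self, key):
        with self.lock:
            if key in self.cache:
                return self.cache[key]
        # Cache miss: fetch from storage into the cache, then return.
        value = self.persistent.get(key)
        if value is not None:
            with self.lock:
                self.cache[key] = value
        return value

    def _drain(self):
        # Background writer: moves queued saves into persistent storage.
        while True:
            key, value = self.pending.get()
            self.persistent[key] = value
            self.pending.task_done()

    def wait_until_persisted(self):
        self.pending.join()             # block until the queue is empty


store = QuarksSketch()
store.save("user_1", {"name": "alice"})
print(store.query("user_1"))            # answered from RAM
store.wait_until_persisted()
print("user_1" in store.persistent)
```

The point of the sketch is that the caller only ever sees `save` and `query`; the queue and the writer thread stay hidden behind them.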
I will discuss how scaling works for huge amounts of data after we go through a use case.
User Story - A public chatting system
Step 1 : The user types a name and ASL (age, sex, location), chooses from a list of chat channels, and clicks join.
The call goes to a Node.js/PHP server through an API or a socket.
The server generates a user_id and assigns the user to a channel.
Step 2 : The user sends some messages to a specific channel.
Step 3 : On re-entry, the user can see all messages for that channel.
We address these steps as follows:
(Using diagrams to illustrate the solution)
As for the clever querying I mentioned earlier, have a look at how a wildcard search makes it possible to retrieve the desired information from loads of data.
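To make the wildcard idea concrete, here is a small Python sketch of Step 3 of the user story. The key scheme (`msg_<channel>_<sequence>_<user_id>`), the function names, and the sample data are assumptions invented for this example; the real Quarks query syntax is not defined yet.

```python
from fnmatch import fnmatchcase

# Hypothetical key scheme: msg_<channel>_<sequence>_<user_id>
store = {
    "user_u1": {"name": "alice", "channel": "sports"},
    "user_u2": {"name": "bob", "channel": "music"},
    "msg_sports_0001_u1": {"text": "hello"},
    "msg_music_0002_u2": {"text": "anyone here?"},
    "msg_sports_0003_u2": {"text": "great match"},
}


def find_keys(store, pattern):
    """Return (key, value) pairs whose key matches a shell-style wildcard."""
    matches = [(k, v) for k, v in store.items() if fnmatchcase(k, pattern)]
    return sorted(matches, key=lambda kv: kv[0])


def fetch_messages(store, channel):
    """All message texts for one channel, ordered by the sequence in the key."""
    return [v["text"] for _, v in find_keys(store, f"msg_{channel}_*")]


print(fetch_messages(store, "sports"))
print([k for k, _ in find_keys(store, "msg_*_u2")])
```

Because the channel, sequence, and user are all encoded in the key, one pattern can answer "all messages in a channel" and another "all messages by a user" without any extra index.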
Needless to say, all the saving, server hits, request queuing, traffic handling, and thread management are now part of the Quarks system, which takes the headache away from the client.
Now on to scaling and having a distributed system.
As the traffic grows, we can create new Quarks instances (let's call them cores), and these can be controlled by a multicore/multi-instance manager. Illustration below:
So, what does fetchMessages look like in a multi-core system? Not much different from what you saw in Step 3. fetchMessages invokes a find query on Quarks.
Again, an illustration of how Quarks processes the query internally:
First, the find request is passed on to the multicore manager, which publishes the request for all of the listening cores to process:
A side note on load balancing (since it's mentioned in the image):
We can simply use a standard load balancer to choose a core to handle a request from the client. Putting load balancers at the front is easy; what's difficult is distributed processing, which Quarks makes super simple.
Finally, as results become available, the manager aggregates them and returns them to the requesting core, the one that was invoked by the client.
Point to note - in client side coding we will hardly notice the difference.
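The publish-and-aggregate steps described above can be sketched like this. It is an in-process Python stand-in only: the `Core` and `MultiCoreManager` classes are hypothetical, a thread pool takes the place of real publish/subscribe messaging between instances, and the keys are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from fnmatch import fnmatchcase


class Core:
    """One Quarks instance holding a slice of the data (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def save(self, key, value):
        self.data[key] = value

    def find(self, pattern):
        return [(k, v) for k, v in self.data.items() if fnmatchcase(k, pattern)]


class MultiCoreManager:
    """Publishes a find request to every core and aggregates the replies."""

    def __init__(self, cores):
        self.cores = cores

    def find(self, pattern):
        # Every core searches its own slice in parallel.
        with ThreadPoolExecutor(max_workers=len(self.cores)) as pool:
            partials = pool.map(lambda core: core.find(pattern), self.cores)
            merged = [item for part in partials for item in part]
        # Aggregate: one list, ordered by key, as if it came from one store.
        return sorted(merged, key=lambda kv: kv[0])


cores = [Core("core-a"), Core("core-b")]
manager = MultiCoreManager(cores)

# A load balancer would choose the core; here we pick one by hand.
cores[0].save("msg_sports_0001_u1", "hello")
cores[1].save("msg_sports_0002_u2", "hi there")
cores[0].save("msg_music_0003_u1", "any jazz fans?")

print(manager.find("msg_sports_*"))
```

The caller issues one `find` and gets one merged list back, which is why the client code barely changes when more cores are added.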
Now, many may object that Redis does almost the same thing. However, there are a few things that Redis doesn't do that well:
1) Persistence of data - you can take snapshots of the data, but once Redis goes down, recovering the data and getting it running again is a laborious task.
Quarks will provide a mechanism where, after a certain amount of memory/cache is filled, it sends the data in a batch for serialisation/dumping to a database for later retrieval. That reduces the hits to persistent storage. It will also provide a way to dump to persistent storage immediately if required.
2) Querying on data - Redis doesn't query on the value. It queries on the key and fetches the value associated with that key. Quarks will be able to query on the value (as well as the key) through ORM-style queries.
3) Sorting - Redis doesn't return sorted results in certain situations, and in those cases the sorting has to be done on the client side.
4) Expiry - Redis's data expiry could be improved; Quarks aims to provide a better expiry mechanism.
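Points 1 to 3 can be illustrated together with a small Python sketch. Everything here is an assumption made for illustration: the `BatchingStore` class, the `immediate` flag, the `where`/`order_by` parameters, and the use of an item count as the flush threshold (the idea above speaks of memory filled, without giving a number).

```python
class BatchingStore:
    """Toy store: batched persistence plus queries on values (hypothetical API)."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size   # assumed threshold, counted in items
        self.cache = {}
        self.dirty = []                # keys not yet written to storage
        self.persistent = {}           # stand-in for the database
        self.storage_hits = 0          # how many times storage was touched

    def save(self, key, value, immediate=False):
        self.cache[key] = value
        self.dirty.append(key)
        if immediate or len(self.dirty) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.dirty:
            return
        for key in self.dirty:
            self.persistent[key] = self.cache[key]
        self.dirty.clear()
        self.storage_hits += 1         # one hit per batch, not per save

    def find(self, where, order_by=None):
        """Match on fields of the stored value, optionally sorted by the store."""
        rows = [v for v in self.cache.values()
                if all(v.get(field) == want for field, want in where.items())]
        if order_by is not None:
            rows.sort(key=lambda v: v[order_by])
        return rows


store = BatchingStore(batch_size=3)
store.save("u1", {"name": "carol", "age": 31, "channel": "sports"})
store.save("u2", {"name": "alice", "age": 24, "channel": "sports"})
store.save("u3", {"name": "bob", "age": 27, "channel": "music"})   # fills the batch
store.save("u4", {"name": "dave", "age": 40, "channel": "sports"}, immediate=True)

names = [u["name"] for u in store.find({"channel": "sports"}, order_by="age")]
print(names, store.storage_hits)
```

Four saves cost two trips to storage here, and the query filters and sorts on fields inside the value, so the client receives the rows already in order.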
Lastly, Quarks can reside in the same memory space as the application, without the need for a separate server as with Redis.
Now the million-dollar question: is Quarks already developed? No! It's a concept we are working on, and hopefully we will have something solid to show in the next 6 months. We plan to use the following technologies:
a) C++ Crow Webserver for serving client requests
b) Good JSON Parsing C++ Library (not decided yet, probably will go with the one provided in Crow)
c) ZeroMQ for PUBSUB.
If we make some progress, rest assured you will be seeing more articles about Quarks on this site.
Just to reiterate, Quarks is a system as well as a set of guidelines to make modern-day programming easy. It ensures that clients don't have to worry about scaling, threads, data management, and the like. We will have more discussions on this some other day.
Signing off for now,