AD

Let's try building a Scalable system

In this article, we'll get to know the preliminary steps you can take as a Software Engineer for building a scalable system.

Let's see how we can decrease the load-test time from 187s to 31s.

Note: I'll be using Node.js, but don't skip reading. Try to absorb the concept, especially if you're a beginner.

Here is the task:

Build a server with a single GET endpoint that returns the highest prime number between 0 and N.

My Setup

  • I've used pure Node.js (not Express.js) to create my server and routes. You are free to use Express.js.
  • You can use this idea with any language, so keep reading even if you skip the code and the code repo.

Let's Start!

I used this as one of my assignments for hiring (experienced) devs. The session was a pair-programming setup where the candidate was free to use the Internet and the tools of their choice. Given the kind of work I do day to day, such assignments are really helpful.

When you write a brute-force approach

Let's assume you created your server with the basic algorithm to find a prime number. Here is an example of a brute-force approach:

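// just trying the first thought in mind: trial division up to sqrt(n)
function isPrime(n) {
  // 0 and 1 are not prime
  if (n < 2) return false;
  for (let i = 2; i <= Math.sqrt(n); i += 1) {
    if (n % i === 0) {
      return false;
    }
  }
  return true;
}

// test every number from 2 to num and return the last prime found
function calculateGreatestPrimeInRange(num) {
  const primes = [];
  for (let i = 2; i <= num; i += 1) {
    // call the helper directly; `this` is not bound in a plain function
    if (isPrime(i)) primes.push(i);
  }
  return primes.length ? primes.pop() : -1;
}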

You'll use it in your GET route, say like this: http://localhost:9090/prime?num=20. It will work fine and you'll feel good. Try it with some numbers like ?num=10, 55, 101, 1099 and you will get an instant response, and life feels good :)
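To make the route concrete, here is a minimal sketch of a request handler in pure Node.js. The names `createPrimeHandler`, `sendJson`, and `findGreatestPrime` are hypothetical and are not taken from the repo. The handler receives the prime-finding function as a parameter, so it works with any of the approaches discussed in this article.

```javascript
const { URL } = require('url');

// Small helper (hypothetical) to write a JSON response.
function sendJson(res, status, payload) {
  res.writeHead(status, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(payload));
}

// Build a handler for GET /prime?num=N around any function that
// returns the greatest prime <= N (or -1 when there is none).
function createPrimeHandler(findGreatestPrime) {
  return function handleRequests(req, res) {
    // req.url holds only the path and query, so a base URL is needed to parse it.
    const url = new URL(req.url, 'http://localhost');

    if (req.method !== 'GET' || url.pathname !== '/prime') {
      sendJson(res, 404, { error: 'Not found' });
      return;
    }

    // Reject a missing, empty, negative, or non-integer num.
    const raw = url.searchParams.get('num');
    const num = raw === null || raw.trim() === '' ? NaN : Number(raw);
    if (!Number.isInteger(num) || num < 0) {
      sendJson(res, 400, { error: 'num must be a non-negative integer' });
      return;
    }

    sendJson(res, 200, { num, greatestPrime: findGreatestPrime(num) });
  };
}

// Wiring it up (left commented out so the sketch does not start a server):
// const http = require('http');
// http.createServer(createPrimeHandler(calculateGreatestPrimeInRange)).listen(9090);
```

Passing the calculation in as a function keeps the routing code unchanged when the algorithm is swapped later.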

Hold On!

As soon as you try a large number, say num=10101091, you will feel the lag (I've tried it in the browser; you can use Postman).

Since we are not using PM2 right now (which does a ton of things many beginners are not aware of), you'll notice that when you open a new tab and try a smaller number, that tab waits for the result of the previous tab. Node.js runs your JavaScript on a single thread, so one long calculation blocks every other request to the same process.

What can you do now?

Let's bring in concurrency!

  • Cluster mode to the rescue!

Here is the block of code showing cluster mode in action. If you are not aware of the Cluster module, please read about it.

const http = require('http');
const cluster = require('cluster');
const os = require('os');
const routes = require('./routes');

const cpuCount = os.cpus().length;

// check if the process is the primary (master) process
// note: cluster.isMaster is deprecated since Node.js 16 in favor of cluster.isPrimary
if (cluster.isMaster) {
  // print the number of CPUs
  console.log(`Total CPUs are: ${cpuCount}`);

  for (let i = 0; i < cpuCount; i += 1) cluster.fork();

  // when a new worker is started
  cluster.on('online', worker => console.log(`Worker started with Worker Id: ${worker.id} having Process Id: ${worker.process.pid}`));

  // when the worker exits
  cluster.on('exit', worker => {
    // log
    console.log(`Worker with Worker Id: ${worker.id} having Process Id: ${worker.process.pid} went offline`);
    // let's fork another worker
    cluster.fork();
  });
} else {
  // when the process is a worker, start the HTTP server
  http.createServer(routes.handleRequests).listen(9090, () => console.log('App running at http://localhost:9090'));
}

Voila!

After implementing the Cluster module, you'll see a drastic change!

With several worker processes running, the browser tab with the smaller number gets its response quickly while the other tab is still busy with its calculation (you can try it out in Postman as well).

For those who are not using Node.js: cluster mode means running several copies of your app as separate worker processes, typically one per CPU core, all sharing the same port.

That gives us some breathing room, but single requests with large numbers are still slow. What else can we do to make it more performant?

Algorithms to your rescue!

I know this is a haunting word, but it is an essential tool you cannot ignore. After implementing a better algorithm, you'll realize the worth of algorithms.

For prime numbers, we have the Sieve of Eratosthenes (SOE).
We have to tweak it a bit to fit our use case. You can find the complete code in the repo inside the class Prime.
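For readers who have not met the sieve before, here is a minimal sketch of how it can be adapted to return the greatest prime up to N. This is one possible tweak under my own assumptions, not the code from the repo, and `greatestPrimeUpTo` is a hypothetical name.

```javascript
// Returns the largest prime <= n, or -1 if there is none.
function greatestPrimeUpTo(n) {
  if (!Number.isInteger(n) || n < 2) return -1;

  // isComposite[i] === 1 means i has been crossed out as a multiple of a prime.
  // A typed array keeps memory at one byte per number.
  const isComposite = new Uint8Array(n + 1);

  for (let i = 2; i * i <= n; i += 1) {
    if (isComposite[i] === 0) {
      // Start at i * i: smaller multiples were already crossed out by smaller primes.
      for (let j = i * i; j <= n; j += i) {
        isComposite[j] = 1;
      }
    }
  }

  // Walk down from n to the first number that was never crossed out.
  for (let i = n; i >= 2; i -= 1) {
    if (isComposite[i] === 0) return i;
  }
  return -1;
}

console.log(greatestPrimeUpTo(20)); // 19
```

The sieve crosses out multiples instead of running a divisibility test on every number, which is where the time saving over the brute-force version comes from.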

Let's have a look at the Loadtesting Results

  • Brute force approach for num=20234456

Command passed to the loadtest module:

loadtest -n 10 -c 10 --rps 200 "http://localhost:9090/prime?num=20234456"

Result:

INFO Total time:          187.492294273 s
INFO Requests per second: 0
INFO Mean latency:        97231.6 ms
INFO 
INFO Percentage of the requests served within a certain time
INFO   50%      108942 ms
INFO   90%      187258 ms
INFO   95%      187258 ms
INFO   99%      187258 ms
INFO  100%      187258 ms (longest request)

  • Using SOE with modifications for num=20234456

Command passed to the loadtest module:

loadtest -n 10 -c 10 --rps 200 "http://localhost:9090/prime?num=20234456"

Result:

INFO Total time:          32.284605092999996 s
INFO Requests per second: 0
INFO Mean latency:        19377.3 ms
INFO 
INFO Percentage of the requests served within a certain time
INFO   50%      22603 ms
INFO   90%      32035 ms
INFO   95%      32035 ms
INFO   99%      32035 ms
INFO  100%      32035 ms (longest request)

Comparing the two results above, SOE is the clear winner.

Can we improve it further?

Yes, we can. We can add a cache: a plain Object in JavaScript, which can be used as a HashMap.

The cache stores the result for a given number N. If we get a request for N again, we can simply return it from the store instead of redoing the calculation.

Redis would do a much better job here: a plain object lives inside a single worker process, so each cluster worker keeps its own cache, while Redis can be shared by all of them.
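Here is a minimal sketch of the in-process caching idea, assuming a hypothetical wrapper named `withCache`. It uses a `Map` instead of the plain Object mentioned above (either works as a key-value store here), and it is not the repo's code. The `slowSquare` function only stands in for the expensive prime search.

```javascript
// Wrap any single-argument function with an in-memory cache.
function withCache(fn) {
  const cache = new Map();
  return function cached(n) {
    // Serve a stored result when this input was seen before.
    if (cache.has(n)) return cache.get(n);
    const result = fn(n);
    cache.set(n, result);
    return result;
  };
}

// Usage with a stand-in for the expensive calculation.
let calls = 0;
const slowSquare = (n) => {
  calls += 1;
  return n * n;
};
const cachedSquare = withCache(slowSquare);

cachedSquare(12); // computed
cachedSquare(12); // served from the cache
console.log(calls); // 1
```

Keep in mind that this cache lives in one process only. When ten concurrent requests arrive before the first result is stored, or land on different cluster workers, each of them still does the full calculation.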

Let's see the results

  • Brute force approach with cache for num=20234456

INFO Target URL:          http://localhost:9090/prime?num=20234456
INFO Max requests:        10
INFO Concurrency level:   10
INFO Agent:               none
INFO Requests per second: 200
INFO 
INFO Completed requests:  10
INFO Total errors:        0
INFO Total time:          47.291413455000004 s
INFO Requests per second: 0
INFO Mean latency:        28059.6 ms
INFO 
INFO Percentage of the requests served within a certain time
INFO   50%      46656 ms
INFO   90%      46943 ms
INFO   95%      46943 ms
INFO   99%      46943 ms
INFO  100%      46943 ms (longest request)

  • Using SOE with modifications & cache for num=20234456

INFO Target URL:          http://localhost:9090/prime-enhanced?num=20234456
INFO Max requests:        10
INFO Concurrency level:   10
INFO Agent:               none
INFO Requests per second: 200
INFO 
INFO Completed requests:  10
INFO Total errors:        0
INFO Total time:          31.047955697999996 s
INFO Requests per second: 0
INFO Mean latency:        19081.8 ms
INFO 
INFO Percentage of the requests served within a certain time
INFO   50%      23192 ms
INFO   90%      32657 ms
INFO   95%      32657 ms
INFO   99%      32657 ms
INFO  100%      32657 ms (longest request)

Time Analysis

Conditions                  Total time
With basic algo             187.492294273 s
With basic algo & cache     47.291413455000004 s
With SOE                    32.284605092999996 s
With SOE & cache            31.047955697999996 s
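To put the totals in perspective, here is a small sketch that turns them into speedup factors relative to the basic algorithm. The numbers are copied from the load-test results above; the object keys and the `speedup` helper are my own naming.

```javascript
// Total times in seconds, copied from the load-test results above.
const totals = {
  basicAlgo: 187.492294273,
  basicAlgoWithCache: 47.291413455000004,
  soe: 32.284605092999996,
  soeWithCache: 31.047955697999996,
};

// Speedup factor of a run relative to the basic (brute-force) run.
function speedup(seconds) {
  return totals.basicAlgo / seconds;
}

for (const [name, seconds] of Object.entries(totals)) {
  console.log(`${name}: ${speedup(seconds).toFixed(2)}x`);
}
```

On these numbers the sieve alone is roughly 5.8 times faster than the basic algorithm, and the sieve with a cache is about 6 times faster.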

Finally

I hope you understood the benefits of the following:

  • Concurrency (multiple worker processes via the Cluster module)
  • Algorithms
  • Caching a.k.a Memoization

I hope you liked this short note. Your suggestions are welcome. Here is the code repo: find-highest-prime

You can find me on GitHub, LinkedIn, and Twitter.

Discussion (8)

Sriram R

If you're on Node, you can do the processing inside a promise. It won't make it faster, but at least the thread would be non-blocking.

Comment marked as low quality/non-constructive by the community. View Code of Conduct
videtwo • Edited on

You only need a lookup table up to the biggest number you want to handle and then return the result from there. (Then think about how to return from there in a scalable way if you want scalability)
So you can skip any calculation and throw away 90% of the Node.js code that is above...
I would just use Node.js for the only thing it is capable of in this kind of situation, which is to return the result that comes from somewhere where a proper, competent tech is used by a proper, competent developer (which is not Node.js and not a Node.js developer).
/Actually, even for returning the value I would use something else to be on the safe side/

Also where is the design here to easily scale up this system to handle 100 or 1000 times the number of requests that it can handle at max at the moment???
Nowhere.
So what you explain here has nothing to do with scalability.
To tell the truth I am not surprised about this after seeing that you are trying to use javascript on the server side for a cluster computation and call this as a design of a scalable system.
Anyway it is really good to see how to use this "very nice clustering" capability in Node.js.
Next time I am interviewing with some idiots who use this to design a scalable cluster computation system like this one (LOL) I will be able to shine...

I am also very interested how much do you know about this multi-threading in Node.js which is actually very confusingly called clustering... (making some people think that Node.js is highly scalable as it has clustering capabilities, as the clustering mostly used when massive parallel computing needed which I don't think Node.js has any support for)

So why exactly do you start the same number of workers as the number of cores?
Is this the max number you can start??
Or what happens if you start more, for example what about 1000 workers?
Do I need a 1000-core processor then???
I really want to know...

On the same note if you can only start up as many worker as the number of cores then basically this clustering stuff in Node.js is a very useless feature when it comes to scaling anything.
Also it tells much about Node.js and its developer community for me that there is nothing written in the official documentation about this.
(IMHO this is the best sign that certain things only there in this language - and also they named in a way they named - to create the big hype around it, but they are actually useless)

On the other hand what I already know that there are more proper tech to handle massive parallel computation and multi-threading which makes possible to scale up and down systems that need massive, possibly parallel computation and that is definitely not Node.js

So in the next job interview my advice is to skip this kind of exercise, instead ask people something that you actually an expert in as a Node.js UI/UX developer and leave out the area of scalability and complicated math computations from the mix.

AD Author • Edited on

First of all, take a deep breath; it seems like you're a bit frustrated by my way of learning things. And I don't know who you really are; I can't see any validation of 30+ years of experience.

Although I can learn a few things you mentioned in your long essay, I would have appreciated you if you could have suggested a roadway for learning more about Scalability & Mathematical computations.

It is not advisable to demotivate and restrict others from doing things if you cannot guide one.

Have a great day ahead! :)

Comment marked as low quality/non-constructive by the community. View Code of Conduct
videtwo • Edited on

Actually what is written there in my comment is a validation that it seems like I have much more experience in the area. For example knowing what does it mean scalable system etc..
I can understand if you still do not see the validation of my experience here, but yeah it still makes me frustrated somehow.
I am interviewing with so many people like you as there are so many in the industry nowadays.
You need to decide that you are still learning this stuff as you are saying or you are an expert.
Please only interview people in a way you explained if you feel like you are the second.
In case you are still learning then how can you decide who to hire after an interview you explained when the answer you accept as valid is actually can't be further from the valid answer.
I think you and so many similar people in this industry have the same problem, which is that they are so incompetent and still just learning the first thing about this stuff, yet they think they are able to validate others' ability and experience.
Yes it is really frustrating...
So what is your validation of experience? That you started 2 startup? LOL

I really do not want to demotivate you, and please keep up with the writing but the only thing is that it is very frustrating when people with 30+ years experience waste their time to interview and pair program on stupid task and then their answer gets validated by people who are still learning... If I were you I would not call this as a fair learning process for myself, instead I would call it as a waste of others time...

You said that you used this as a pair programming exercise to hire an experienced developer...
If I had been that developer I would have been very frustrated...

AD Author • Edited on

I can understand the situation you are talking about in the industry. But I have seen experienced people who fail to answer very basic questions and by experienced I mean 5-6 yrs exp guys who fail to explain about an API and proper use of HTTP verbs. I think you took it personally. When I interview people I make sure they are newer around my experience level.

And have you noticed the word 'beginners'?
Did you notice that in bold I mentioned "reducing loadtest time from 187s to 31s"?
Did I mention anywhere that I am an expert?
The task for pair programming is clearly stating the knowledge of basic algorithms and knowledge of caching. These two parameters along with the knowledge of multithreading were what I was looking for in that particular interview.

Is that so hard for you to understand the above points?

Finally, Scalability is a relative word and everyone learns it step by step. I can't see anything wrong about the process I followed, I also stated that I give free hand to the person I interview.

Your words seem to be hot lava flowing through the streets.

Comment marked as low quality/non-constructive by the community. View Code of Conduct
videtwo • Edited on

Designing a scalable system means that you design a system that can be easily scaled up and down when it is needed.
What you did is explain how to improve a brute-force solution to make it a bit better. Yeah, you also added a very limited scalability to it. As limited as Node.js can support.
I still think this is not a really useful design for a scalable system that can be used in a real-life massive system. So the title is a bit misleading.
Also, it is not about you personally, but it is really frustrating that people try to make others believe that Node.js is capable of handling massive web traffic and is highly scalable.
This is one thing I do not see any real validation of.
Your article is not a validation of this, contrary to what is in the title.
The only thing I can see and validate is that when it comes to JavaScript and Node.js there is a lot of hype, with features like "clustering" built in. But when you actually want to do anything, there is no proper documentation for the features, it turns out they are useless, and I have never seen anything written in JavaScript and Node.js that was even stable. Everything feels like it is in alpha or beta stage. Yeah, I know big companies like eBay and the like claim they use it at big scale. Now this is what is never validated. Show me the source code that is used by them?
I have seen so much JS code in really big-scale systems; it was all buggy and unstable...
And what frustrates me even more is when I go somewhere and, after 30+ years of experience, I have to show something to validate my knowledge of computing to these people who create unstable alpha-stage systems and hype around useless tech. And then with 30+ years of experience I have to use JavaScript to create "scalable" systems, and if it is not working, that, according to some people, invalidates my knowledge. They think they are so smart, so they never listen when I say do not use JavaScript for anything.
My experience (30+ years) is that people mostly only use JS because they do not know what they are doing, and the best language for doing that is JS.
I know you are not going to take my advice nor nobody else...
But anyway at least I did my service to the society to write this down in a few places...
So now I can sleep much better... LOL

I also know that sometimes what I write is confusing and not on point, this is why I keep editing it :-D
So basically when I say not to use js then I mean not to use on the server side for anything computationally intensive or anything on the server side that is important. And when I say important I mean for things that for example can cost money and people's life if something goes wrong. The reason I am saying this is because IT IS NOT STABLE. It does not mean it can't be used for things that does not really matter if goes wrong.. But only use it for that. If people would take these advices then this industry would be a better place...

Of course use it in the browser on the client side, That is where it belongs too...
It is still rubbish, but there is not really anything else, and also who cares about what goes wrong in the browser; they are rubbish pieces of computing anyway...
(*It is very trendy to move everything to the client side running in the browser, which if overdone can be very bad... Sooner or later browser will be able to kill people using js... LOL so if this is your plan also do not do it and do not use js...)

The worst thing is that way back, when I started developing, when something was said to be stable then it mostly was. Nowadays people are even proud of how unstable their stuff is, and it still goes into production...
After 30+ years of experience it is really frustrating that like it or not this is what you have to use...
And it is hard to see that people even help to create this hype... People who even just learning stuff...
But they already know what to use... Because of all the hype...
But please do not take it personally, it is not just you..

AD Author

I will take all the points into consideration and now I think that I have to change the title which seems misleading as per the content.

As you can see, I have already stated that Node.js is my language of daily use, people are free to use the language they want to use.

I understood about the hype you're talking about. Also, I do not preach about Node.js/Javascript, I find Golang to be a lot better when it comes to multicore systems. I also do not like the fact that the Javascript community is trying to make everything using Javascript.

This calm explanation of yours has taught me a few things, thanks for the detailed explanation. I would like to stay in contact with you for more lessons, you can drop me an email (if you are willing to) here: ad@ashokdey.in

Thanks!

Comment marked as low quality/non-constructive by the community. View Code of Conduct
videtwo • Edited on

Golang is much better when it comes to computations. You definitely need a high performance statically typed and compiled language when it comes to calculate big prime numbers as a first step. What would be even better for this is for example C.
When it comes to scalability you need to think more about a platform than just a language. You need a platform that supports high performance and scalability. It is not only about the language but the underlying hardware and platform on top of that to support the ability to do parallel computation on a cluster.
The reason you only start a number of processes equal to the number of processor cores when using Node.js is that this is the maximum scalability and performance it can offer, which is nothing when it comes to complex computational needs. Node.js can serve a lot of requests using its single-threaded non-blocking architecture as long as no computation holds up that single thread. You can scale this up using one thread on every core, provided nothing else is running. Otherwise I think Node.js "clustering" can even slow things down if these threads can't each be placed on a separate core by the operating system. The worst is that you don't even have control over where your threads are placed unless you hack into the operating system, which is not something I would recommend you do from Node.js :-D
So this is how you can see where the role of JavaScript and Node.js is in this process of designing a scalable system. I can write about this a lot for you, emphasizing that Node.js and JavaScript are not essential anywhere in a scalable system; the reason they are still used is very different. For example, an amateur can learn it very quickly, and then it costs a company much less to hire a JavaScript developer without proper fundamental knowledge than to hire a competent expert... A well-educated professional in the area would not even recommend using Node.js anywhere here (unless they want to rip you off, as some big cloud providers do when they restrict you to Node.js). /Believe me, that is only good for them and not for you/
Although it is true that Node.js can be used... but who wants to use it? No sane software engineer...

Javascript and Node.js popularity is all about skill shortage of properly educated developers and the huge need for saving on budgets when it comes to paying developers.
the growing priority nowadays is to replace proper but more expensive tech and experts with low cost solutions no matter how unstable and inappropriate they are...
Anyway I sent you an email so we can keep in touch.

PS: I am not saying all of this to discourage anybody from using Node.js. I am using it myself too. You just need to be aware of these facts when you do it. Solutions written in Node.js should never be used for any computations, and in an ideal world for anything... But we do not live in an ideal world, unfortunately. So we are all using it. But once again, the main thing is to be aware of these facts....

Also it is very important to understand that a software engineering is not about one language. If you can only use one language then you are not a software engineer just an amateur. You should be able to use any program language needed and even be able to create a programming language if there is not one already that solves the problem. So people who thinks that it is an advantage to only learn one language, e.g. javascript and use that everywhere for everything are not software engineers, only amateurs... If you are a software engineer you need to know how an interpreter or a compiler works, how a language is created to call yourself a professional, and when you know this then you use any program language and platform that is needed to solve a problem, instead of trying to solve any problem with the only one thing you know...
You know the saying: if you only have a hammer than everything looks like a nail.
This goes to all the people who "solves" everything with javascript.

Think about it, it is a dream for every company that want to save huge amount of money to hire a "professional" who only need one tool for everything instead of the one that would use all the tools needed to do things properly. If your only goal to save money you do this even if you know very well that using that only one tool is not appropriate, you would need all the other tools but well you just don't have the money for it.
This is why lot of company likes people who use javascript for everything....
The only benefit is a huge savings for the company but nothing else...
There is no other benefit of using js for everything, especially not for a proper developer.
So every "developer" who says the opposite, well, they don't even know what they are talking about when they talk about the benefit of this; well, I would not call them developers.
Actually there is one benefit for them: getting a job in this industry and getting paid, despite being totally incompetent.
Unfortunately these are the kind of incompetent "developers" that you can see everywhere nowadays...
And yes it makes me very frustrated to even think about this...

Oh and one more thing, just to be so naive and think that some more people will ever read this.
Where they say Node.js "excels" is two-way communication with the browser using sockets, which was actually possible a long time ago using, for example, C. I worked in a company where 15 years ago we used similar solutions with plain C sockets. The only bottleneck was not C but how rubbish browsers were at that time. One big difference when you use sockets from C, in contrast to Node.js, is that in C you have to know what you are doing when you are doing it, but in exchange you also have control over every aspect of it and you have insight into what is possible and how. So, for example, handling two-way socket connections in a secure and scalable way, using even a large cluster of servers and asynchronous request handling.
This is all really far from new tech as some javascript "developer" think...
There is nothing new in this respect...
The only new thing is that nowadays you can do this with Node.js without really even knowing what you are doing and without have any real control over what is going on in the background.
This is the only new thing, and trend, and the trade-off when some using javascript instead of a proper program language...

If with Node.js you want to know what is going on and control it then you need to hack into the V8 engine which is again - what a surprise - written in C.

Please note that I am not saying this because i am a C developer.
I am a software engineer so I can use any program language that is needed.
This is what makes somebody a software engineer and differentiates them from an amateur who tries to use the one language and one framework that he knows to solve every problem.
No matter how well you know that one single programming language and framework, if you only know one and can only use one then you are an amateur.
Software developers don't know one program languages and one framework, they create them as the need arises.