Atishay Jain

Posted on • Edited on • Originally published at Medium

Database and Scaling : Backend Development [Part 2/2]

In this article we will continue where we left off. If you haven’t read ‘Part 1’ yet, I highly recommend reading it first, so you get a basic understanding of how things work at the backend :) Now let’s get started.

1. Databases

Data persistence is integral to the vast majority of web applications. Understanding how to structure, build, and query your own database is an important skill for any full stack developer.

Databases are broadly split into “relational” and “non-relational” types, and each handles data and scaling differently.

  • SQL (Structured Query Language) databases are relational databases that store and manage data using tables. They ensure data integrity, support powerful querying with SQL, and offer scalability, ACID transactions, and data security. Popular implementations include MySQL, PostgreSQL, and Oracle Database.
  • NoSQL (Not only SQL) databases offer flexible data models, scalability, and high performance. They don’t enforce strict schemas the way SQL databases do, and they excel at handling large volumes of unstructured data. Popular implementations include MongoDB, Cassandra, and Redis. They are well suited to real-time data, high-speed transactions, and large-scale analytics.

Ultimately, which DB you choose depends on the type of functionality you want in your application. This article goes into depth on the differences between SQL and NoSQL DBs.
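To make the difference in data modelling a bit more concrete, here is a minimal sketch in plain JavaScript (no real database involved). The `users` and `orders` arrays stand in for relational tables linked by a key, while `userDocument` stands in for a single record as a document store might hold it. All names and values are made up for illustration.

```javascript
// Relational style: data is split across tables and linked by keys.
const users = [{ id: 1, name: 'Asha' }];
const orders = [
  { id: 101, userId: 1, item: 'Keyboard' },
  { id: 102, userId: 1, item: 'Mouse' },
];

// A hand-written "join": collect the items ordered by a given user.
function ordersForUser(userId) {
  return orders
    .filter((order) => order.userId === userId)
    .map((order) => order.item);
}

// Document style: related data is nested inside one flexible record.
const userDocument = {
  _id: 1,
  name: 'Asha',
  orders: [{ item: 'Keyboard' }, { item: 'Mouse' }],
};

// Reading the same information needs no join in the document shape.
function itemsInDocument(doc) {
  return doc.orders.map((order) => order.item);
}
```

Both `ordersForUser(1)` and `itemsInDocument(userDocument)` return the same list of items; the difference lies in where the relationship is stored.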

2. Authentication and Authorisation

Are they the same thing? Definitely not.

Authentication is the process of determining whether a user is who they claim to be, whereas authorization is the process of verifying which specific applications, files, and data a user has access to. Authorisation always comes after authentication.
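The distinction can be sketched in a few lines. This is only an illustration: the `sessions` and `permissions` data, the token value, and the function names are all hypothetical, and a real application would look these up in a session store or database.

```javascript
// Hypothetical in-memory data, for illustration only.
const sessions = new Map([['token-abc', { userId: 1, role: 'editor' }]]);
const permissions = {
  editor: ['read', 'write'],
  viewer: ['read'],
};

// Authentication: who is making the request?
function authenticate(token) {
  return sessions.get(token) ?? null; // null means "not logged in"
}

// Authorization: is this (already identified) user allowed to do this?
function authorize(user, action) {
  return (permissions[user.role] ?? []).includes(action);
}

function handleRequest(token, action) {
  const user = authenticate(token);
  if (!user) return 401; // not authenticated
  if (!authorize(user, action)) return 403; // authenticated, but not allowed
  return 200;
}
```

Note how `authorize` is only reached once `authenticate` has succeeded, which mirrors the order described above.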

There are various ways of providing authentication, some of them being -

  • Classic email and password verification
  • The OAuth 2.0 protocol (for example, the “Sign in with Google” feature)
  • JSON Web Tokens (JWT). Learn More.
  • Using libraries such as passport and session. Learn More.
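To give a feel for what a JWT actually is, here is a rough sketch of signing and verifying an HS256 token using only Node's built-in `crypto` module. The function names `signToken` and `verifyToken` are my own, and the sketch deliberately skips things a real implementation needs (expiry checks, validating the header's `alg`, key management), so in practice you would reach for a maintained library instead.

```javascript
const crypto = require('crypto');

// Encode a JSON-serializable value as base64url, the encoding JWTs use.
const encode = (value) =>
  Buffer.from(JSON.stringify(value)).toString('base64url');

// Build "header.payload.signature", signing the first two parts with HMAC-SHA256.
function signToken(payload, secret) {
  const header = encode({ alg: 'HS256', typ: 'JWT' });
  const body = encode(payload);
  const signature = crypto
    .createHmac('sha256', secret)
    .update(`${header}.${body}`)
    .digest('base64url');
  return `${header}.${body}.${signature}`;
}

// Recompute the signature and return the payload only if it matches.
function verifyToken(token, secret) {
  const [header, body, signature] = token.split('.');
  if (!header || !body || !signature) return null;
  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${header}.${body}`)
    .digest();
  const given = Buffer.from(signature, 'base64url');
  // timingSafeEqual requires equal lengths, so check that first.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return null;
  }
  return JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
}
```

The key idea is that the payload is readable by anyone, but only someone holding the secret can produce a signature that `verifyToken` accepts.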

Note that hashing users’ passwords before storing them is a very important step; storing passwords in plain text in your database is a very dangerous flaw. A library called bcrypt helps with hashing and comparing passwords.
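As a rough illustration of salted password hashing, the sketch below uses scrypt from Node's built-in `crypto` module rather than bcrypt, purely so that it runs without installing anything. The idea is the same: store a salt and a slow hash, never the password. The helper names and the `salt:hash` storage format are assumptions made for this example.

```javascript
const crypto = require('crypto');

// Hash a password with a fresh random salt; store the returned string.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 64);
  return `${salt.toString('hex')}:${hash.toString('hex')}`;
}

// Re-derive the hash using the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(':');
  const expected = Buffer.from(hashHex, 'hex');
  const actual = crypto.scryptSync(
    password,
    Buffer.from(saltHex, 'hex'),
    expected.length
  );
  return crypto.timingSafeEqual(actual, expected);
}
```

At sign-up you would save the result of `hashPassword(password)`; at login you would call `verifyPassword(attempt, storedValue)` and only proceed if it returns `true`.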

If you want to learn more about authentication, I recommend reading about zero-knowledge proofs, a cryptographic concept that underlies some advanced authentication schemes.

3. Performance Optimisation

In today’s fast-paced digital world, a slow-loading website or application can frustrate users and cause them to abandon it altogether. Therefore, it’s crucial to optimize the performance of MERN stack applications to ensure they can handle a high volume of traffic without compromising on speed and performance.

Performance optimisation can be categorised into two parts -

  • Code Optimisation : This includes techniques like minification of code, lazy loading, server-side rendering, code splitting and load balancing.
  • Database Optimisation : This includes techniques like indexing, denormalisation, caching, partitioning and database sharding.

Note that these optimisation techniques become important when developing enterprise-level applications. We will discuss caching and database sharding in detail in the upcoming sections.

If you want to know more about the techniques mentioned above, check out this article.
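Of the database techniques listed above, indexing is perhaps the easiest to picture. The sketch below is only an analogy in plain JavaScript: the `users` array stands in for a table, and a `Map` stands in for an index (real databases usually use structures such as B-trees). The data and function names are invented for the example.

```javascript
// Hypothetical data set: 100,000 users with unique emails.
const users = Array.from({ length: 100000 }, (_, i) => ({
  id: i,
  email: `user${i}@example.com`,
}));

// Without an index: scan rows one by one until a match is found.
function findByEmailScan(email) {
  return users.find((user) => user.email === email);
}

// With an index: build a lookup structure once, then jump straight to the row.
const emailIndex = new Map(users.map((user) => [user.email, user]));

function findByEmailIndexed(email) {
  return emailIndex.get(email);
}
```

Both functions return the same row, but the indexed lookup does not get slower as the table grows. The trade-off is that the index takes extra memory and has to be kept up to date on every write.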

4. Caching

As discussed above, caching is a database optimisation technique. A cache layer or server acts as a secondary storage layer that is faster and highly efficient, temporarily storing a subset of data which doesn’t change often and is frequently accessed by the client. To keep the cache small and fresh, the data items also have an expiration time.
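The expiration time mentioned above can be sketched with a small in-memory cache. This is a simplified illustration, not how any particular cache server is implemented; the `TtlCache` class and its injectable `now` clock are assumptions made for the example.

```javascript
// A tiny in-memory cache whose entries expire after ttlMs milliseconds.
class TtlCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for testing
    this.entries = new Map();
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }
}
```

For example, `new TtlCache(60000)` would keep each item for one minute, after which `get` behaves as if the item had never been cached.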

Two caching paradigms -

  • Cache Aside Pattern : Also known as lazy loading, this is the most common caching pattern. Here, the cache is updated after the data is requested. To read data, the cache is first checked to determine whether the data is available. If it is (a cache hit), the cached data is returned. Otherwise (a cache miss), the database is queried for the data and the cache is then populated with it.
  • Write-Through Pattern : Here, whenever the application or backend process updates the primary database, the data is also updated in the cache. In case of a cache miss, the lazy loading pattern is used. The chance of a cache miss is reduced, but the cache grows larger and also stores items which are infrequently accessed.
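The two patterns can be sketched side by side. In this minimal example, two `Map` objects stand in for the database and the cache, and the `dbReads` counter exists only to show when the database is actually hit. All names here are hypothetical.

```javascript
// Stand-ins for a real database and cache (both in-memory, for illustration).
const database = new Map([['user:1', { name: 'Asha' }]]);
const cache = new Map();
let dbReads = 0;

function readFromDatabase(key) {
  dbReads += 1;
  return database.get(key);
}

// Cache-aside: check the cache first, fall back to the database on a miss,
// then populate the cache so the next read is a hit.
function getCacheAside(key) {
  if (cache.has(key)) return cache.get(key); // cache hit
  const value = readFromDatabase(key); // cache miss
  if (value !== undefined) cache.set(key, value);
  return value;
}

// Write-through: every write updates the database and the cache together.
function setWriteThrough(key, value) {
  database.set(key, value);
  cache.set(key, value);
}
```

Reading `user:1` twice with `getCacheAside` touches the database only once, while anything written with `setWriteThrough` can be read back without touching the database at all.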

‘node-cache’ and ‘memcache’ are two popular Nodejs packages. The most popular option for caching is ‘Redis’. Being an in-memory database, its data access operations are typically much faster than those of a disk-based database, which makes Redis a strong choice for caching.

Caching with redis

Check out this article for the difference between on-disk and in-memory databases.

5. Scaling

While developing an application on our local machine, we have one server, to which we make a single request at a time. Also, our application acts as a single service. This type of system is called a monolithic system ((0,0,0) in the cube shown below).

However, an enterprise-level application needs to handle multiple requests at the same time from possibly millions of users. Clearly, we need to scale our application so that it can perform in these types of scenarios.

The diagram below shows three dimensions of scaling. Let’s discuss each of them in brief -

Scale Cube

  • Cloning Services (X-axis): To prevent a single server from being swamped with requests and taking a lifetime to respond, we create multiple instances of our server that all listen for the same set of requests. On a local machine with multiple CPU cores, we can do this with cluster, a built-in Nodejs module, by running one instance on each core.
const express = require('express');
const cluster = require('cluster');
const os = require('os');

// Number of logical CPU cores on this machine
const totalCpus = os.cpus().length;

if (cluster.isPrimary) { // use cluster.isMaster on Node.js versions before 16
  for (let i = 0; i < totalCpus; i++) {
    // Create one worker process per core
    cluster.fork();
  }
} else {
  // Each worker process executes this branch
  const PORT = 3000;
  const app = express();
  app.listen(PORT, () => {
    console.log(`Server running on ${PORT}`);
  });
}

// This results in multiple server processes sharing the same PORT, each
// connected to the database separately.
// Output
/* Server running on 3000
   Server running on 3000
   Server running on 3000
   ....8 times, if running on an octa-core machine.
*/

Check out this video to see the difference in response time of multiple requests while using one versus multiple servers.

  • Database Sharding (Z-axis): We saw above that we can create multiple server instances, but all of these servers access the same database, so querying the DB becomes a bottleneck in our application’s performance. Database sharding addresses this: with this technique, each server is responsible for only a subset of the data.

For example, let’s take Amazon. Let’s assume that Amazon is running on 3 servers (let’s call them server A, B and C). These servers are designed in such a way that each caters to the requests of only a subset of the users. For instance, server A handles users whose names start with ‘A’ to ‘I’, server B handles ‘J’ to ‘R’, and server C handles ‘S’ to ‘Z’. Now suppose server B requests data from the database. The database knows that it only needs to look in the partition that stores information for users whose initial letter is between ‘J’ and ‘R’.

Thus, data partitioning, as shown in the above example, can improve the application’s capacity and availability.
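The routing rule from the example above can be sketched as a small lookup function. The shard names and the `shardFor` helper are placeholders for illustration; a real system would map each shard to an actual server or database partition.

```javascript
// Shard ranges from the example above; the shard names are placeholders.
const shards = [
  { name: 'A', from: 'A', to: 'I' },
  { name: 'B', from: 'J', to: 'R' },
  { name: 'C', from: 'S', to: 'Z' },
];

// Pick the shard whose letter range contains the user's initial.
function shardFor(userName) {
  const initial = userName.trim().charAt(0).toUpperCase();
  const shard = shards.find((s) => initial >= s.from && initial <= s.to);
  if (!shard) throw new Error(`No shard for user name: ${userName}`);
  return shard.name;
}
```

Splitting by first letter is easy to reason about, but names are not evenly distributed across the alphabet, so some shards may end up busier than others. Hashing the key is a common alternative when an even spread matters more than readable ranges.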

  • Microservices (Y-axis): Microservices are an architectural and organizational approach to software development where software is composed of small independent services that communicate over well-defined APIs. Whereas Z-axis scaling splits things that are similar, Y-axis scaling splits things that are different. At the application tier, Y-axis scaling splits a monolithic application into a set of services. Each service implements a set of related functionality such as order management, customer management etc.

Microservices AWS

Now our multiple server instances can be assigned to different microservices. This way, independent functions in our app need not wait for each other and can run at the same time. Note that how we divide our application into microservices is important and requires careful planning.

Two ways to run an application that has been divided into microservices are containers and serverless. One platform that provides a container service is ‘Amazon Elastic Container Service’. We will discuss serverless architecture in detail next.

If you are interested in learning more about backend scalability, you can check out ‘The Art of Scalability’, or this YouTube playlist.
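As a very rough picture of this split, the sketch below models two services as plain objects, each owning its own data, with a small gateway that routes requests by path. Everything here is hypothetical and runs in a single process; in a real deployment each service would be a separate process reached over the network.

```javascript
// Each "service" owns its own data and exposes a small, well-defined API.
const orderService = {
  orders: [{ id: 1, customerId: 7, item: 'Keyboard' }],
  handle(path) {
    const match = path.match(/^\/orders\/(\d+)$/);
    const order = match && this.orders.find((o) => o.id === Number(match[1]));
    return order ? { status: 200, body: order } : { status: 404 };
  },
};

const customerService = {
  customers: [{ id: 7, name: 'Asha' }],
  handle(path) {
    const match = path.match(/^\/customers\/(\d+)$/);
    const customer =
      match && this.customers.find((c) => c.id === Number(match[1]));
    return customer ? { status: 200, body: customer } : { status: 404 };
  },
};

// A tiny gateway: route each request to the service that owns that path.
const routes = new Map([
  ['orders', orderService],
  ['customers', customerService],
]);

function gateway(path) {
  const service = routes.get(path.split('/')[1]);
  return service ? service.handle(path) : { status: 404 };
}
```

Because the order logic and the customer logic never touch each other's data directly, either one could be moved to its own server, scaled, or redeployed without changing the other.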

6. Serverless Architecture

As you may have noticed by now, maintaining a server and keeping it scalable and secure, among other things, is not an easy job. By adopting serverless architecture, developers can offload these responsibilities to a third-party provider, enabling them to focus on writing application code.

Serverless architecture

One of the most popular serverless architectures is Function as a Service (FaaS), where developers write their application code as a set of discrete functions. Each function will perform a specific task when triggered by an event, such as an incoming email or an HTTP request. Developers then deploy their functions, along with their triggers, to a cloud provider account.

When a function is invoked, the cloud provider either executes the function on a running server, or, if there is no server currently running, it spins up a new server to execute the function. This execution process is abstracted away from the view of developers, who focus on writing and deploying the application code.
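The event-driven model described above can be imitated locally in a few lines. This is a hypothetical stand-in for a FaaS platform, not any provider's real API: `deploy`, `invoke`, and the event shapes are invented for the example, and each real provider defines its own handler signature.

```javascript
// Hypothetical local stand-in for a FaaS platform: functions are registered
// against a trigger type and run only when a matching event arrives.
const functions = new Map();

function deploy(trigger, fn) {
  functions.set(trigger, fn);
}

function invoke(event) {
  const fn = functions.get(event.type);
  if (!fn) throw new Error(`No function deployed for trigger: ${event.type}`);
  return fn(event);
}

// Two small, single-purpose functions, each tied to its own trigger.
deploy('http', (event) => ({
  statusCode: 200,
  body: `Hello, ${event.query.name}!`,
}));

deploy('email', (event) => ({
  archived: true,
  subject: event.subject,
}));
```

Calling `invoke({ type: 'http', query: { name: 'Asha' } })` runs only the HTTP function. On a real platform the provider plays the role of `invoke`, and also decides where and on how many servers the function runs.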

A few of the major FaaS providers are AWS Lambda, Google Cloud Functions and Azure Functions.

