DevByJESUS

System Design : Reliability

Hello.
Today we are going to talk about a topic that comes to mind whenever we design a data-intensive system. But first, let's look at what a data-intensive system is and which principles guide the design of this type of system.
Let's go ;)

First, this amazing quote:

The Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free?

  • Alan Kay

Data Intensive System

Here is how Martin Kleppmann defines it:

Data-intensive applications are pushing the boundaries of what is possible by making use of these technological developments. We call an application data-intensive if data is its primary challenge: the quantity of data, the complexity of data, or the speed at which it is changing (as opposed to compute-intensive, where CPU cycles are the bottleneck).

I can add nothing to this definition ;)

Some Questions

I can assure you that if you work in an I.T. team, some of the questions below come up frequently:

How do you ensure that the data remains correct and complete, even when things go wrong internally? How do you provide consistently good performance to clients, even when parts of your system are degraded? How do you scale to handle an increase in load? What does a good API for the service look like?

We can answer these questions with the principles of system design.

So, what are we waiting for?

Principles Of System Design

There are three core principles, and I think we have all heard of some of them:

  1. Reliability
  2. Scalability
  3. Maintainability

Today we are going to talk about Reliability only ;)

Reliability

What is reliability? We say that a system is reliable when it is fault-tolerant, in other words when it can keep faults from causing failures. We all agree that no system is 100% fault-tolerant. But ;) there are some faults we can guard against in our system.

Possible Threats to Reliability

  1. Hardware Faults: These are all the errors that can happen in hardware: hard disks crash, RAM becomes faulty, the power grid has a blackout, someone unplugs the wrong network cable.

How to fight it? Storage and the number of machines keep growing, so for this type of fault we can add redundancy to the individual hardware components in order to reduce the failure rate of the system. Disks may be set up in a RAID configuration, servers may have dual power supplies and hot-swappable CPUs, and data centers may have batteries and diesel generators for backup power.
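
To make the idea of redundancy concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Replica` class, the `read_with_failover` function, and the names `disk-a` and `disk-b` are hypothetical and do not come from the book or from any real storage system. It shows a read that succeeds as long as at least one copy of the data is still healthy.

```python
class ReplicaUnavailable(Exception):
    """Raised when a single replica cannot serve a request."""


class Replica:
    """A hypothetical storage node holding a full copy of the data."""

    def __init__(self, name, data, healthy=True):
        self.name = name
        self.data = dict(data)
        self.healthy = healthy

    def read(self, key):
        if not self.healthy:
            raise ReplicaUnavailable(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in turn; fail only if every replica is down."""
    for replica in replicas:
        try:
            return replica.read(key)
        except ReplicaUnavailable:
            continue  # this copy is down, try the next one
    raise RuntimeError("all replicas are unavailable")


data = {"user:1": "Ada"}
replicas = [
    Replica("disk-a", data, healthy=False),  # simulated hardware fault
    Replica("disk-b", data),
]
print(read_with_failover(replicas, "user:1"))  # served by disk-b
```

The read fails only when every replica is down at the same time, which is much less likely than losing a single component.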

  2. Software Faults: Martin Kleppmann says: The bugs that cause these kinds of software fault often lie dormant for a long time until they are triggered by an unusual set of circumstances. In those circumstances, it is revealed that the software is making some kind of assumption about its environment.

How to deal with it? Think carefully about assumptions and interactions in the system, test thoroughly, isolate processes, allow processes to crash and restart, and measure, monitor, and analyze system behavior in production.
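
"Allowing processes to crash and restart" can be sketched in a few lines of Python. This is a simplified, in-process illustration, not how a real process supervisor is implemented: `supervise` and `make_flaky_task` are hypothetical names, and a real system would restart an operating system process rather than call a function again. The crash count is returned so that it can be measured and monitored.

```python
def supervise(task, max_restarts=3):
    """Run task(); if it crashes, restart it up to max_restarts times.

    Returns (result, crashes) so the crash count can be monitored.
    """
    crashes = 0
    while True:
        try:
            return task(), crashes
        except Exception as exc:
            crashes += 1
            print(f"task crashed ({exc!r}), crash {crashes}")
            if crashes > max_restarts:
                raise  # give up and surface the fault


def make_flaky_task(failures):
    """Hypothetical task that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RuntimeError("unexpected input")
        return "done"

    return task


result, crashes = supervise(make_flaky_task(failures=2))
print(result, crashes)
```

The restart limit matters: a task that keeps crashing is a real bug, and restarting it forever would only hide the fault instead of reporting it.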

And finally, it would have surprised me if we humans were not on this list 😄

  3. Human Faults: One study of large internet services found that configuration errors by operators were the leading cause of outages, whereas hardware faults (servers or network) played a role in only 10–25% of outages.

Possible solutions:

  1. Decouple the places where people make the most mistakes from the places where mistakes can cause failures. I think we do this every day in our own decisions: for example, when buying electricity for our home we want to make no mistake, but when buying sugar a mistake has much smaller consequences.

  2. Test at all levels, from unit to integration: automated testing gives us confidence that our system works correctly day to day for its users.
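
Both solutions can be combined in a small Python sketch. It assumes a hypothetical service configuration with two made-up settings, `replicas` and `timeout_seconds`; the functions `validate_config` and `apply_config` are illustrative names, not part of any real tool. An operator's configuration is checked before it can touch the live settings, and two small unit-level checks verify that behavior automatically.

```python
def validate_config(config):
    """Return a list of problems found in a hypothetical service config."""
    problems = []
    replicas = config.get("replicas")
    if not isinstance(replicas, int) or replicas < 2:
        problems.append("replicas must be an integer >= 2")
    timeout = config.get("timeout_seconds")
    if not isinstance(timeout, (int, float)) or timeout <= 0:
        problems.append("timeout_seconds must be a positive number")
    return problems


def apply_config(config, live_settings):
    """Refuse to touch live settings unless the config is valid."""
    problems = validate_config(config)
    if problems:
        raise ValueError("; ".join(problems))
    live_settings.update(config)


def test_valid_config_is_applied():
    live = {}
    apply_config({"replicas": 3, "timeout_seconds": 1.5}, live)
    assert live["replicas"] == 3


def test_bad_config_never_reaches_live_settings():
    live = {"replicas": 3, "timeout_seconds": 1.5}
    try:
        apply_config({"replicas": 0, "timeout_seconds": 1.5}, live)
    except ValueError:
        pass
    else:
        raise AssertionError("a bad config should be rejected")
    assert live["replicas"] == 3  # the mistake did not cause a failure


test_valid_config_is_applied()
test_bad_config_never_reaches_live_settings()
print("all checks passed")
```

The operator can still make the mistake, but it is caught at validation time, far away from the place where it could take the service down.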

😊 Thanks for reading. By the grace of JESUS 😊 In the next article we will talk about scalability.
Reference: Martin Kleppmann, Designing Data-Intensive Applications.
