<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Prakash Bhattarai</title>
    <description>The latest articles on DEV Community by Prakash Bhattarai (@mrbprakash06).</description>
    <link>https://dev.to/mrbprakash06</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2776360%2F8b0163ef-5069-43e0-95c6-5576ac2cc134.png</url>
      <title>DEV Community: Prakash Bhattarai</title>
      <link>https://dev.to/mrbprakash06</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mrbprakash06"/>
    <language>en</language>
    <item>
      <title>Software Reliability</title>
      <dc:creator>Prakash Bhattarai</dc:creator>
      <pubDate>Fri, 19 Jun 2026 11:00:06 +0000</pubDate>
      <link>https://dev.to/mrbprakash06/software-reliability-122l</link>
      <guid>https://dev.to/mrbprakash06/software-reliability-122l</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Software reliability refers to the probability that software will perform its intended function without failure under specified conditions for a designated period. It is a critical dimension of overall software quality.&lt;/p&gt;

&lt;p&gt;Reliable software improves users’ trust in a system. From businesses to government institutions, the quality of the software they provide reflects the care and responsibility they have toward their customers and citizens. Reliability also minimizes losses caused by downtime. Such losses can range from economic damage to risks involving human life.&lt;/p&gt;

&lt;p&gt;Reliable software helps organizations achieve their broader goals. Only when a software system functions correctly and consistently can an organization provide the services it intends to deliver.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pillars of Reliability
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Availability
&lt;/h3&gt;

&lt;p&gt;Software availability measures how often a system is ready to provide its intended function when users need it. Availability is commonly measured in “nines.” For example, three nines, or 99.9% availability, means the system may be unavailable for a maximum of about 8.76 hours in a year. The higher the number of nines, the better the availability.&lt;/p&gt;
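
&lt;p&gt;The downtime figure comes from simple arithmetic: the allowed downtime is the unavailable fraction of the period. The sketch below assumes a 365-day year, and the function name &lt;code&gt;downtimeHoursPerYear&lt;/code&gt; is illustrative rather than part of any library.&lt;/p&gt;

```javascript
// Hours in a non-leap year (assumption: 365 days).
const HOURS_PER_YEAR = 365 * 24;

// Maximum yearly downtime, in hours, for a given number of nines.
// nines = 3 means 99.9% availability, so 1/1000 of the year may be down.
function downtimeHoursPerYear(nines) {
  return HOURS_PER_YEAR / Math.pow(10, nines);
}

for (const nines of [2, 3, 4, 5]) {
  const hours = downtimeHoursPerYear(nines);
  console.log(
    `${nines} nines: about ${hours.toFixed(2)} hours (${(hours * 60).toFixed(1)} minutes) per year`
  );
}
```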

&lt;p&gt;Improving the availability of a software system requires several techniques. The software should be designed with fault tolerance in mind. Data replication and service replication are commonly used to improve a system’s ability to continue operating even when some components fail. Logging and monitoring also play an important role in identifying issues proactively so that service interruptions remain minimal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Correctness
&lt;/h3&gt;

&lt;p&gt;Correctness is a simple but essential pillar of reliable software. Even if a system is available and responsive, it has little value if it produces incorrect results. Therefore, it is important to ensure that the outputs provided by the system are accurate and consistent with the expected behavior.&lt;/p&gt;

&lt;p&gt;Rigorous testing, code reviews, logging, and monitoring are common techniques used to improve software correctness. These practices help detect defects early, reduce unexpected behavior, and ensure that the system continues to function as intended.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security
&lt;/h3&gt;

&lt;p&gt;Security is the next important pillar of reliable software. It refers to the practices, processes, and safeguards that protect software applications and digital systems from threats and attacks. Users cannot truly rely on software that fails to protect their valuable data, privacy, and resources.&lt;/p&gt;

&lt;p&gt;Software security is a vast topic in itself. At its foundation, it focuses on preserving the confidentiality, integrity, and availability of user data and system resources. This can be achieved through standard secure coding practices, rigorous security testing, secure infrastructure, and proper access control. A secure system not only prevents unauthorized access but also strengthens users’ trust in the reliability of the software.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance
&lt;/h3&gt;

&lt;p&gt;Once software is available and correct, performance becomes another important aspect of reliability. Response time is important for user satisfaction, while throughput is important for the business or organization operating the system.&lt;/p&gt;

&lt;p&gt;Improving software performance requires attention to several areas. Developers can optimize algorithms, refactor inefficient code, and use appropriate subsystems such as databases, message brokers, caches, and proxies. Since most modern software runs over the internet, server resources must also be optimized. This may include upgrading RAM and CPUs, increasing network bandwidth, using better routers, and improving internal connectivity between subsystems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Reliability is not just another property of software; it is one of the most important traits of modern software systems. People are unlikely to use software if they are not confident that it will work when they need it. Therefore, organizations must focus on reliability alongside features and functionality.&lt;/p&gt;

&lt;p&gt;As discussed, reliable software is built on four major pillars: availability, correctness, security, and performance. Everyone in an organization has a role to play in keeping these pillars strong.&lt;/p&gt;

&lt;p&gt;Management must ensure that the software development process follows standard practices such as testing, code reviews, and proper planning. &lt;/p&gt;

&lt;p&gt;Developers must write efficient code and design systems that function correctly under the expected load with acceptable performance. Standard design patterns and architectural practices should be used where necessary to ensure that software remains scalable, performant, and reliable.&lt;/p&gt;

&lt;p&gt;Similarly, the operations team must estimate hardware and infrastructure requirements so that the system can handle user demand, even during peak usage. Techniques such as auto-scaling can help maintain a balance between cost and service quality. &lt;/p&gt;

&lt;p&gt;Ultimately, reliable software is the result of careful planning, disciplined development, continuous monitoring, and shared responsibility across the entire organization.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.waru.edu/acquipedia-article/understanding-and-achieving-software-reliability" rel="noopener noreferrer"&gt;https://www.waru.edu/acquipedia-article/understanding-and-achieving-software-reliability&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.geeksforgeeks.org/system-design/why-is-reliability-important-in-a-system/" rel="noopener noreferrer"&gt;https://www.geeksforgeeks.org/system-design/why-is-reliability-important-in-a-system/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.crowdstrike.com/en-us/cybersecurity-101/cybersecurity/software-security/" rel="noopener noreferrer"&gt;https://www.crowdstrike.com/en-us/cybersecurity-101/cybersecurity/software-security/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>systemdesign</category>
      <category>architecture</category>
      <category>software</category>
      <category>reliability</category>
    </item>
    <item>
      <title>Pub/Sub</title>
      <dc:creator>Prakash Bhattarai</dc:creator>
      <pubDate>Fri, 12 Jun 2026 07:43:33 +0000</pubDate>
      <link>https://dev.to/mrbprakash06/pubsub-31ke</link>
      <guid>https://dev.to/mrbprakash06/pubsub-31ke</guid>
      <description>&lt;p&gt;Pub/Sub, also known as Publisher-Subscriber, is a communication model where services exchange messages through an intermediary service called a broker. The publisher publishes a message to a topic managed by the broker. Subscribers receive and consume messages from that topic without knowing anything about the producer of the message.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmckch9bmg4669r508api.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmckch9bmg4669r508api.png" alt="Pub/Sub" width="521" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The core components of the Pub/Sub communication model are as follows:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Producers:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Producers create and publish messages to a topic. For example, in an e-commerce website, the producer may be the order service, which publishes order-related messages to the appropriate topic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Subscribers:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Subscribers are services that receive and consume messages from a topic managed by the broker. They process messages without knowing anything about the producers of those messages. For example, in an e-commerce website, the payment service might be the consumer or subscriber that receives order messages and processes payments accordingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Broker:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The broker manages topics and handles the receiving and delivery of messages within those topics. The message itself contains the data sent by the producer, which is then used by the subscriber.&lt;/p&gt;
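
&lt;p&gt;To make the three roles concrete, here is a toy in-memory broker. It is only a sketch: &lt;code&gt;InMemoryBroker&lt;/code&gt; and the &lt;em&gt;orders&lt;/em&gt; topic are hypothetical names, and a real broker such as Redis runs as a separate process and delivers messages over the network.&lt;/p&gt;

```javascript
// Toy in-memory broker (hypothetical, for illustration only).
class InMemoryBroker {
  constructor() {
    this.topics = new Map(); // topic name -> Set of subscriber callbacks
  }

  // Register a handler for a topic and return an unsubscribe function.
  subscribe(topic, handler) {
    if (!this.topics.has(topic)) this.topics.set(topic, new Set());
    this.topics.get(topic).add(handler);
    return () => this.topics.get(topic).delete(handler);
  }

  // Deliver a message to every subscriber of the topic.
  // Returns how many subscribers received it.
  publish(topic, message) {
    const handlers = this.topics.get(topic) || new Set();
    for (const handler of handlers) handler(message);
    return handlers.size;
  }
}

const broker = new InMemoryBroker();
const received = [];

// Two independent subscribers; neither knows who produced the message.
broker.subscribe("orders", (msg) => received.push(`payment saw order ${msg.orderId}`));
broker.subscribe("orders", (msg) => received.push(`email saw order ${msg.orderId}`));

// The producer only knows the topic name, not the subscribers.
const delivered = broker.publish("orders", { orderId: 42 });
console.log(delivered, received);
```

&lt;p&gt;Adding the second subscriber required no change to the publishing code, which is the decoupling the model is meant to provide.&lt;/p&gt;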

&lt;p&gt;The advantages of the Pub/Sub mechanism are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reduced coupling:&lt;/strong&gt; It reduces coupling between the producer and subscriber. The publisher does not need to have direct knowledge of the subscriber, and vice versa. This makes it easier to add new subscribers without changing the publisher itself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved scalability:&lt;/strong&gt; We can increase the number of subscribers when more messages are produced. If needed, we can also independently scale the producer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low-latency communication:&lt;/strong&gt; When the broker is configured well, payloads are small, and network and processing delays are low, the system can support very low-latency message delivery.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The disadvantages are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Increased complexity:&lt;/strong&gt; With the introduction of a broker, there is a whole new system to manage alongside the producers and consumers. This adds another level of complexity to the system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More difficult debugging:&lt;/strong&gt; Since messages are produced and consumed independently, it becomes more difficult to trace the flow of messages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Message ordering challenge:&lt;/strong&gt; Depending on the broker and its configuration, messages may not always be processed in a strictly sequential order.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The common use cases of Pub/Sub are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time messaging and chat applications&lt;/li&gt;
&lt;li&gt;Distributed systems and microservices&lt;/li&gt;
&lt;li&gt;Event-driven architectures&lt;/li&gt;
&lt;li&gt;Stream processing and pipelines&lt;/li&gt;
&lt;li&gt;Notification and background processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The common Pub/Sub brokers are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apache Kafka&lt;/li&gt;
&lt;li&gt;Redis Pub/Sub&lt;/li&gt;
&lt;li&gt;Google Cloud Pub/Sub&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The code snippets below show a publisher that publishes a count message to the topic &lt;em&gt;messages&lt;/em&gt; managed by &lt;strong&gt;Redis&lt;/strong&gt; every 200 ms, and a subscriber that consumes each message and prints the count to the console.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// publisher.js&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ioredis&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// subscriber.js&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ioredis&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;message&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://docs.cloud.google.com/pubsub/docs/overview" rel="noopener noreferrer"&gt;https://docs.cloud.google.com/pubsub/docs/overview&lt;/a&gt;&lt;br&gt;
&lt;a href="https://redis.io/glossary/pub-sub/" rel="noopener noreferrer"&gt;https://redis.io/glossary/pub-sub/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=FMhbR_kQeHw" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=FMhbR_kQeHw&lt;/a&gt;&lt;/p&gt;

</description>
      <category>pubsub</category>
      <category>redis</category>
      <category>systemdesign</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Scaling WebSocket</title>
      <dc:creator>Prakash Bhattarai</dc:creator>
      <pubDate>Fri, 15 May 2026 05:08:41 +0000</pubDate>
      <link>https://dev.to/mrbprakash06/scaling-websocket-1dfa</link>
      <guid>https://dev.to/mrbprakash06/scaling-websocket-1dfa</guid>
      <description>&lt;p&gt;WebSocket provides low-latency, bidirectional communication between clients and servers. It is a widely used solution for building real-time systems. Once a WebSocket connection is established, the underlying TCP connection remains open until either the client or the server closes it, allowing both sides to exchange messages continuously throughout the lifetime of the connection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fceljw9x0c10oriwvqk84.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fceljw9x0c10oriwvqk84.png" alt="WebSocket Connection" width="799" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As systems grow, the challenge of scaling WebSocket infrastructure becomes important. Broadly, there are two approaches: vertical scaling and horizontal scaling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rw36o223mxbsdretzef.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rw36o223mxbsdretzef.png" alt="Vertical vs Horizontal Scaling" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In vertical scaling, the resources of a single server — CPU, RAM, and network bandwidth — are increased so it can handle more WebSocket connections. This approach works well for small to medium-scale systems, but only to a certain extent. A single machine cannot be scaled indefinitely, and relying on one server also introduces a single point of failure in systems where high availability is critical.&lt;/p&gt;

&lt;p&gt;Horizontal scaling, on the other hand, increases the number of WebSocket servers. This approach improves redundancy and allows the system to scale using inexpensive commodity servers. However, horizontally scaling WebSocket servers introduces additional challenges.&lt;/p&gt;

&lt;p&gt;To understand the problem, consider a multiplayer game with two WebSocket servers: A and B. Initially, players P1, P2, and P3 are connected to server A, and game events flow normally between them. Now suppose P1 disconnects and reconnects, but this time the load balancer routes P1 to server B instead of server A. How will P1 continue receiving events generated by P2 and P3?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fieqemiengaksp49xe6hr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fieqemiengaksp49xe6hr.png" alt="Scaling Issue" width="799" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This problem can be solved using a shared message broker such as Redis Pub/Sub. When a WebSocket server receives an event, it publishes the event to Redis. Every WebSocket server subscribed to the same channel receives the event and forwards it to its connected clients. In this example, when P2 or P3 sends a game event to server A, server A publishes the event through Redis Pub/Sub. Server B then receives the event and forwards it to P1.&lt;/p&gt;
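
&lt;p&gt;The forwarding flow can be sketched without a running Redis instance. In the sketch below, a Node.js &lt;code&gt;EventEmitter&lt;/code&gt; stands in for the shared Redis channel, and plain arrays stand in for client sockets. &lt;code&gt;GameServer&lt;/code&gt;, &lt;code&gt;game-events&lt;/code&gt;, and the event shape are hypothetical names chosen for illustration; a real deployment would use a Redis client and a WebSocket library instead.&lt;/p&gt;

```javascript
const { EventEmitter } = require("node:events");

// Stand-in for the shared Redis channel (assumption: a real system would
// publish and subscribe through a Redis client instead of this emitter).
const sharedChannel = new EventEmitter();

// Hypothetical WebSocket server that tracks its own clients and relays events.
class GameServer {
  constructor(name) {
    this.name = name;
    this.clients = new Map(); // player id -> inbox array standing in for a socket
    // Every server subscribes to the same channel.
    sharedChannel.on("game-events", (event) => this.deliverLocally(event));
  }

  connect(playerId) {
    const inbox = [];
    this.clients.set(playerId, inbox);
    return inbox;
  }

  // Called when a locally connected player sends an event:
  // publish it to the channel instead of delivering it directly.
  handleClientEvent(event) {
    sharedChannel.emit("game-events", event);
  }

  // Forward an event to every local client except its sender.
  deliverLocally(event) {
    for (const [playerId, inbox] of this.clients) {
      if (playerId !== event.from) inbox.push(event);
    }
  }
}

const serverA = new GameServer("A");
const serverB = new GameServer("B");
const p2Inbox = serverA.connect("P2");
const p3Inbox = serverA.connect("P3");
const p1Inbox = serverB.connect("P1"); // P1 reconnected and landed on server B

serverA.handleClientEvent({ from: "P2", action: "move" });
console.log(p1Inbox.length, p3Inbox.length, p2Inbox.length);
```

&lt;p&gt;Because each server delivers only to its own clients, no server needs to know where any other player is connected.&lt;/p&gt;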

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96m7dqxj6g2cb7hawy3o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96m7dqxj6g2cb7hawy3o.png" alt="Redis Pub/Sub Forwarding Events" width="800" height="928"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One important limitation of Redis Pub/Sub is that messages are not persisted. If Redis or a subscriber becomes temporarily unavailable, some events may be lost. Because of this, WebSocket systems are often designed so that occasional dropped events do not break the application. In systems where message loss is unacceptable, additional application-level mechanisms are used, such as acknowledgments, durable queues, event replay, and message persistence.&lt;/p&gt;
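
&lt;p&gt;One possible application-level mechanism is to number events so that a receiver can notice gaps and ask for a replay. The sketch below shows only the gap detection. It assumes the sender numbers events 1, 2, 3, and so on; &lt;code&gt;SequenceTracker&lt;/code&gt; is a hypothetical helper, and how missing events are fetched again is left to the application.&lt;/p&gt;

```javascript
// Hypothetical receiver-side helper that detects dropped events
// by watching for gaps in sequence numbers.
class SequenceTracker {
  constructor() {
    this.lastSeq = 0; // assumption: the sender numbers events starting at 1
  }

  // Record an incoming sequence number and return the numbers
  // that were skipped since the last event seen.
  accept(seq) {
    const gap = Math.max(0, seq - this.lastSeq - 1);
    const missing = Array.from({ length: gap }, (_, i) => this.lastSeq + 1 + i);
    this.lastSeq = Math.max(this.lastSeq, seq);
    return missing;
  }
}

const tracker = new SequenceTracker();
const report = [1, 2, 5, 6].map((seq) => ({ seq, missing: tracker.accept(seq) }));
console.log(JSON.stringify(report));
```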

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://socket.io/docs/v4/tutorial/step-9" rel="noopener noreferrer"&gt;https://socket.io/docs/v4/tutorial/step-9&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://redis.io/docs/latest/develop/pubsub/" rel="noopener noreferrer"&gt;https://redis.io/docs/latest/develop/pubsub/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=gzIcGhJC8hA&amp;amp;t=175s" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=gzIcGhJC8hA&amp;amp;t=175s&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>websocket</category>
      <category>systemdesign</category>
      <category>redis</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
