DEV Community

Ganesh Parella

How to Design a Notification System?

Imagine you’re building a social platform.

  • A user signs up.
  • Someone likes a post.
  • Someone comments.

Each of these actions should trigger a notification.

Sounds simple, right?

But what happens when thousands of users trigger events at the same time?
Let’s design it properly.

Functional Requirements

  • When an event is triggered → send a notification.
  • If sending fails → retry.
  • Support multiple channels (Email, Push, In-App).

Non-functional Requirements

  • High availability
  • Notifications should not be lost (persistence)
  • Scalable under traffic spikes
  • Pluggable architecture (easy to add new channels)

High-Level Architecture

The basic architecture looks straightforward. But many people ask:

Why use a message queue instead of directly sending the request to the notification service?

Let’s say we want to send a welcome email when a new user signs up.

Most email service providers impose rate limits. Assume the limit is 30 requests per second.

Now imagine 100 users click the sign-up button within one second.

If we send requests directly to the email service:

  • 30 succeed
  • 70 fail

That’s 70 new users who never receive their welcome email. Not acceptable.

Instead, we push all events into a message queue and process them at a controlled rate. The queue acts as a buffer during traffic spikes. Workers dequeue messages when the service is available and send notifications gradually.

This way, we don’t lose requests, and we stay within the provider’s rate limit.
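The buffering idea above can be sketched in a few lines. This is a minimal, in-memory illustration, not a production worker: `drain_at_rate` and the event names are hypothetical, a real system would use a durable queue, and the fixed one-second window is a simplification of proper rate limiting.

```python
import time
from collections import deque
from typing import Callable, Deque, List


def drain_at_rate(
    queue: Deque[str],
    send: Callable[[str], None],
    max_per_second: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send every queued event, at most `max_per_second` per one-second window.

    Returns the number of windows that were needed.
    """
    windows = 0
    while queue:
        windows += 1
        # Send up to one window's worth of events.
        for _ in range(min(max_per_second, len(queue))):
            send(queue.popleft())
        # Wait for the next window only if work remains.
        if queue:
            sleep(1.0)
    return windows


# 100 sign-ups arrive in the same second; the provider allows 30 per second.
events = deque(f"welcome-email:user-{i}" for i in range(100))
sent: List[str] = []
# The sleep is stubbed out so the example finishes instantly.
windows = drain_at_rate(events, sent.append, max_per_second=30, sleep=lambda s: None)
print(len(sent), windows)  # 100 4
```

All 100 emails go out over four windows (30 + 30 + 30 + 10) instead of 70 being rejected in the first second.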

Bottlenecks and Improvements

1. What if the notification provider is down?
Suppose the email service goes down and every request starts failing.

If we keep retrying indefinitely:

  • We waste CPU resources
  • The queue keeps growing
  • The system becomes unstable

To solve this, we use exponential backoff retries.

Instead of retrying immediately, we wait longer between each attempt:
1s → 2s → 4s → 8s → 16s …

After a certain number of retries, we move the message to a Dead Letter Queue (DLQ) for later inspection.
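Here is a rough sketch of that retry policy. The function names, the retry limit of 5, and the plain list standing in for the DLQ are all assumptions made for illustration; a real worker would normally reschedule the message on the queue rather than sleep in place.

```python
import time
from typing import Callable, List


def backoff_delays(max_retries: int, base: float = 1.0) -> List[float]:
    """Delays before each retry: base, 2*base, 4*base, ..."""
    return [base * (2 ** attempt) for attempt in range(max_retries)]


def send_with_retries(
    message: str,
    send: Callable[[str], None],
    dead_letters: List[str],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try once, then retry with exponential backoff.

    If every attempt fails, the message is parked in `dead_letters`.
    """
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            send(message)
            return True
        except Exception:  # broad on purpose: any provider error triggers a retry
            if attempt == max_retries:
                break
            sleep(delays[attempt])
    dead_letters.append(message)
    return False


def provider_down(message: str) -> None:
    """Simulates an email provider that rejects every request."""
    raise ConnectionError("email provider unavailable")


dlq: List[str] = []
waits: List[float] = []
# Record the waits instead of actually sleeping.
ok = send_with_retries("welcome-email:user-1", provider_down, dlq, sleep=waits.append)
print(ok, waits, dlq)
# False [1.0, 2.0, 4.0, 8.0, 16.0] ['welcome-email:user-1']
```

With the provider down, the message is attempted six times in total, waits 1s, 2s, 4s, 8s, and 16s between attempts, and then lands in the DLQ.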

2. Avoiding Notification Spam
Initially, we might send an email for every event.

But that’s not ideal.

If a user is actively using the app, sending an email for every like or comment would feel like spam.

To handle this, we introduce a Notification Engine.

All requests go through this engine, which decides:

  • Which channel to use (Email, Push, In-App)
  • Whether the user has disabled certain notifications
  • Whether the user is currently active in the app

We store user preferences and last login time in a cache so the engine can look them up quickly for every event.

For example:

  • If the user is active → send only an In-App notification
  • If the user is offline → send Push
  • If Push fails → fall back to Email

This makes the system smarter and more user-friendly.
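The decision rules above could look something like this. `UserState`, its fields, and the channel names are hypothetical stand-ins for whatever the cache actually holds; the sketch only shows the shape of the decision, not a full engine.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class UserState:
    """Hypothetical cached record; field names are illustrative."""
    is_active: bool
    disabled_types: Set[str] = field(default_factory=set)


def choose_channels(user: UserState, notification_type: str) -> List[str]:
    """Return the channels to try, in order. An empty list means send nothing."""
    # The user has turned this kind of notification off.
    if notification_type in user.disabled_types:
        return []
    # Active users only get an in-app notification.
    if user.is_active:
        return ["in_app"]
    # Offline users get Push first, with Email as the fallback.
    return ["push", "email"]


print(choose_channels(UserState(is_active=True), "like"))   # ['in_app']
print(choose_channels(UserState(is_active=False), "like"))  # ['push', 'email']
print(choose_channels(UserState(is_active=False, disabled_types={"like"}), "like"))  # []
```

Returning an ordered list keeps the fallback rule explicit: the sender tries each channel in turn and stops at the first success.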

3. Making It Pluggable
We don’t want to tightly couple our system to just Email or Push.

Instead, we design it so that each notification channel implements a common interface.

That way, if we want to add:

  • SMS
  • WhatsApp
  • Slack

we can plug it in without rewriting the core logic.

This keeps the system flexible and future-proof.
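One way to sketch this in code is an abstract base class plus a small registry. The class names (`Channel`, `Dispatcher`, `SmsChannel`) are made up for this example, and the `send` methods are stubs where real provider calls would go.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class Channel(ABC):
    """Common interface every notification channel implements."""

    @abstractmethod
    def send(self, user_id: str, text: str) -> bool:
        """Return True if the notification was delivered."""


class PushChannel(Channel):
    def send(self, user_id: str, text: str) -> bool:
        # Stand-in for a real push provider call; fails here to show the fallback.
        return False


class EmailChannel(Channel):
    def send(self, user_id: str, text: str) -> bool:
        # Stand-in for a real email provider call.
        return True


class Dispatcher:
    """Core logic: knows only the Channel interface, not concrete channels."""

    def __init__(self) -> None:
        self._channels: Dict[str, Channel] = {}

    def register(self, name: str, channel: Channel) -> None:
        self._channels[name] = channel

    def dispatch(self, names: List[str], user_id: str, text: str) -> Optional[str]:
        """Try channels in order; return the name of the first that succeeds."""
        for name in names:
            channel = self._channels.get(name)
            if channel is not None and channel.send(user_id, text):
                return name
        return None


dispatcher = Dispatcher()
dispatcher.register("push", PushChannel())
dispatcher.register("email", EmailChannel())


# Adding SMS later is one new class plus one register() call.
class SmsChannel(Channel):
    def send(self, user_id: str, text: str) -> bool:
        return True


dispatcher.register("sms", SmsChannel())

print(dispatcher.dispatch(["push", "email"], "user-1", "Someone liked your post"))  # email
print(dispatcher.dispatch(["sms"], "user-1", "Someone commented"))  # sms
```

The `Dispatcher` never changes when a channel is added, which is the point of the common interface.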

Final Thoughts
What started as “just send a notification” quickly becomes a distributed system problem.

By introducing:

  • A message queue for decoupling
  • Worker-based async processing
  • Exponential backoff retries
  • Dead Letter Queues
  • A centralized Notification Engine
  • User preference caching

we build a system that is scalable, resilient, and production-ready.

Simple feature. Complex engineering.

And that’s the fun part.
