DEV Community: Akshat Jain

Building a Subscription Management System That Actually Scales

Akshat Jain — Wed, 13 May 2026 16:38:21 +0000

Here’s what I learned building a subscription management system from scratch.

If you’ve ever built an app, you’ve probably started the same way I did: a frontend, a backend, and a database.

It works. It feels clean. You ship features quickly.

But then the app grows.

You add things like subscription tracking, renewal reminders, analytics, maybe even notifications. Suddenly, what used to feel simple starts getting messy. Logic spreads across different parts of the codebase. Small changes start breaking unrelated features. Debugging takes longer than building.

This is exactly where most subscription management systems start to struggle.

The problem isn’t that the system is “wrong.” It’s that it was designed for a smaller version of reality.

In the beginning, a subscription system looks simple:

Store subscription details
Track billing cycles
Send reminders

But in a real-world scenario, things expand quickly:

Users can have multiple subscriptions
Each subscription has different billing cycles
Notifications need to be timely and reliable
Data needs to be consistent across the system

What started as a few database tables becomes a system that needs to handle coordination, timing, and scale.

And here’s the key insight I learned the hard way:

A system that works for 10 users is not the same system that works for 10,000 users.

Most failures don’t happen because of bad code. They happen because the system wasn’t designed to evolve.

That’s when I stopped thinking in terms of “features”… and started thinking in terms of system design.

Starting with Back-of-the-Envelope System Design

Before writing more code, I stepped back and did something I should’ve done earlier: a quick back-of-the-envelope design.

Nothing fancy. No diagrams. Just rough thinking.

If this system actually grows, what does it need to handle?

Let’s say:

10,000 users
Each user has ~5 subscriptions
That’s 50,000 active subscriptions
Each subscription may trigger 1–2 notifications per cycle
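
The arithmetic above can be written down as a small script. This is only a rough sketch: the 30-day cycle is an assumption (the text does not say every subscription bills monthly), and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope estimate using the figures from the text.
USERS = 10_000
SUBS_PER_USER = 5            # "~5 subscriptions" per user
NOTIFS_PER_CYCLE = (1, 2)    # 1-2 notifications per subscription per cycle
DAYS_PER_CYCLE = 30          # assumption: every subscription bills monthly


def estimate_load(users, subs_per_user, notifs_per_cycle, days_per_cycle):
    """Return rough subscription and notification volumes."""
    subscriptions = users * subs_per_user
    low, high = (subscriptions * n for n in notifs_per_cycle)
    return {
        "subscriptions": subscriptions,
        "notifications_per_cycle": (low, high),
        "notifications_per_day": (low // days_per_cycle, high // days_per_cycle),
    }


estimate = estimate_load(USERS, SUBS_PER_USER, NOTIFS_PER_CYCLE, DAYS_PER_CYCLE)
print(estimate)
```

Under these assumptions that is 50,000 to 100,000 notifications per cycle, or a few thousand per day if renewals are spread evenly.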

Now you’re looking at:

Thousands of reads per day (dashboard, listings)
Frequent writes (new subscriptions, updates)
Time-based events (renewals, reminders)

Suddenly, this is no longer a “simple app.”

It’s a system with different types of workloads:

User authentication
Subscription management
Scheduled processing (renewals, alerts)
Notification delivery

And here’s where things get interesting.

Not all parts of the system behave the same way.

Authentication needs to be fast and secure
Subscription data needs consistency
Notifications need reliability, not just speed
Analytics is read-heavy

Trying to handle all of this in a single backend quickly becomes messy.

You either:

Overload one system with too many responsibilities
Or start adding hacks to make things “just work”

Neither scales well.

This quick exercise changed how I approached the system.

Instead of thinking:

“What feature do I build next?”

I started asking:

“What kind of system does this feature belong to?”

That shift is what naturally led me toward a more scalable architecture.

Breaking the Monolith and Moving to a Scalable Architecture

At this point, it became clear that adding more features to a single backend was only going to make things worse.

Everything was tightly connected.

Authentication logic mixed with business logic
Subscription updates triggering notification logic directly
Small changes forcing full redeploys

It worked, but it didn’t scale.

So instead of patching the system again, I made a bigger shift:

I started breaking the system into separate services.

Not because microservices are trendy, but because the system needed clear boundaries.

I split the application based on responsibilities:

Authentication became its own service
Subscription logic became isolated
Notifications were handled independently

This immediately brought clarity.

Each part of the system now had a single responsibility.

Changes in one service didn’t risk breaking everything else.

But let’s be honest: this didn’t magically make things easier.

In fact, it introduced a new set of problems:

How do services talk to each other?
How do you manage authentication across services?
Where does request routing happen?

This is where the architecture evolved further.

Instead of clients directly calling multiple services, I introduced an API Gateway as the single entry point. Every request flows through it, and it decides where to route it.

This solved a lot of chaos:

Centralized routing
Consistent authentication checks
Cleaner client-side logic
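
To make the gateway idea concrete, here is a minimal sketch of prefix-based routing with a single auth check. The route table, the service names, and the rule that only `/auth` is public are all assumptions for illustration; a real gateway would also verify the token rather than just check that one is present.

```python
# Hypothetical route table: path prefix -> internal service name.
ROUTES = {
    "/auth": "auth-service",
    "/users": "user-service",
    "/subscriptions": "subscription-service",
    "/notifications": "notification-service",
}
PUBLIC_PREFIXES = {"/auth"}  # assumption: login and registration need no token


def route_request(path, token=None):
    """Return (status, target service) the way a gateway might decide it."""
    for prefix, service in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            if prefix not in PUBLIC_PREFIXES and token is None:
                return 401, None   # one consistent auth check for every service
            return 200, service    # centralized routing
    return 404, None


print(route_request("/subscriptions/42", token="jwt-placeholder"))
print(route_request("/subscriptions/42"))
print(route_request("/auth/login"))
```

The client only ever knows the gateway's paths; which service sits behind each prefix can change without touching the app.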

Now the system looked less like a single app and more like a coordinated set of services.

And that’s the key difference.

You’re no longer building “an app.”

You’re building a system where multiple components work together.

It’s more complex, yes.

But it’s also the first step toward something that can actually scale.

Inside the System: How Each Service Works

Once the system was split into services, the next challenge was making sure each one had a clear role. Without that, you just end up with distributed chaos instead of a clean architecture.

Here’s how the system is structured today.

At the front, there’s the mobile application built with React Native. This is what users interact with — adding subscriptions, viewing dashboards, and receiving alerts.

Every request from the app goes through the API Gateway. Think of it as the system’s front door. It handles routing, basic filtering, and ensures requests reach the correct service without the client needing to know internal details.

Behind that, the system is divided into focused services.

The Auth Service is responsible for authentication and security. It handles login, registration, JWT token generation, and role-based access control. This keeps security isolated and consistent across the system.

The User Service manages user-related data: profiles, preferences, and account settings. It doesn’t deal with subscriptions directly, which keeps responsibilities clean.

The core of the system is the Subscription Service. This is where all the main business logic lives: creating subscriptions, managing billing cycles, calculating renewals, and powering analytics.

The Help/Support Service provides a guided resolution system using a decision-tree approach, helping users troubleshoot common issues step-by-step and reducing the need for manual support intervention.

Then comes the Notification Service, which handles sending alerts. Whether it’s email, in-app notifications, or push alerts, this service ensures users are notified at the right time without blocking the main application flow.

And then there’s the most interesting part — the LLM Service.

Instead of forcing users to manually fill forms, this service allows them to send a simple message like:

“Netflix ₹499 monthly”

The service processes that input, extracts structured data, and creates a subscription event automatically in the database. It sits alongside the system, enhancing it without tightly coupling with core logic.

Each service focuses on one thing.

They communicate through APIs, not shared logic.

This separation is what makes the system maintainable as it grows.

Because at scale, clarity matters more than cleverness.

Making the System Feel Intelligent (LLM + Automation)

Up to this point, the system was scalable. It was structured. It could handle growth.

But it still felt… like a system.

Users had to:

Open the app
Fill forms
Enter subscription details manually

It worked. But it wasn’t great.

That’s when I started thinking about a different problem:

What if users didn’t have to “use” the system at all?

What if they could just say what they did, and the system handled the rest?

That idea led to integrating an LLM-based service into the architecture.

Now instead of filling a form, a user can type something like:

“Bought Spotify for ₹199 per month”

That message goes to the LLM service, which processes it and extracts:

Subscription name
Cost
Billing cycle
Relevant timing details

Once parsed, it creates a structured event and pushes it into the system just like any manually created subscription.

From the user’s perspective, it feels almost invisible.

No forms. No friction. Just intent → action.

And importantly, this didn’t break the architecture.

The LLM service is not tightly coupled with the core system. It acts as an intelligent layer on top:

It receives input
Transforms it
Sends structured data to the Subscription Service

That’s it.

This design keeps the system clean while still enabling powerful behavior.

I’ve written in detail about how this LLM pipeline works and how it processes natural language into database-ready events in a separate article: [LINK]. But at a high level, the goal here wasn’t just automation.

It was about reducing user effort to near zero.

Because scalability isn’t just about handling more users.

It’s also about making the system easier to use as it grows.

What I Learned Building for Scale

Looking back, the biggest lesson wasn’t about microservices, APIs, or even system design.

It was about how systems evolve.

I didn’t start with a perfect architecture. I started with something simple that worked. And that’s important because trying to over-engineer from day one usually slows you down more than it helps.

But at some point, the system outgrows its original design.

That’s where most developers struggle.

You can either:

Keep patching the existing system
Or step back and rethink the architecture

I learned that scalability isn’t about adding more code.

It’s about creating the right boundaries.

Breaking the system into services wasn’t just about scaling traffic. It was about:

Isolating failures
Making changes safer
Keeping logic understandable

At the same time, microservices are not a silver bullet.

They introduce:

Network complexity
Distributed debugging
More infrastructure to manage

So the goal isn’t “use microservices.” The goal is:

Use the right level of complexity for the problem you have.

Another key shift was thinking beyond features.

Early on, I was focused on:

“Add subscriptions”
“Add notifications”

Later, the thinking changed to:

“How do these parts interact?”
“What happens when this grows 10x?”

And finally, the most interesting realization came from adding the LLM layer.

A system can be scalable and still feel heavy to use.

But when you reduce friction, when users don’t have to think in terms of forms and fields, you unlock a completely different experience.

That’s when the system stops feeling like software… and starts feeling intuitive.

Conclusion

Building for scale is about evolving the system as the problem grows.

I began with a simple setup — one backend, one database, straightforward APIs. And for a while, that was enough. But as soon as real-world complexity entered the picture — multiple subscriptions, notifications, automation — the cracks started to show.

That’s when the shift happened.

From thinking in terms of features… to thinking in terms of systems.

Breaking the application into services brought structure. Introducing an API Gateway brought control. Separating responsibilities made the system easier to reason about.

And then, adding an intelligent layer with the LLM service changed how users interact with the system entirely.

Because in the end, scalability isn’t just about handling more traffic.

It’s about:

keeping systems maintainable
reducing friction for users
and designing something that can grow without collapsing under its own complexity

If there’s one thing I’d take away from this journey, it’s this:

Don’t try to build a perfect system from day one.

Build something that works, and be ready to redesign it when it stops working.

That’s how real systems are built.

Scaling Myths That Mislead Developers

Akshat Jain — Sat, 09 May 2026 15:28:37 +0000

Why common assumptions about scaling lead to fragile systems

Scaling is often seen as a technical problem.

More users arrive, and the system needs to handle increased load.

However, many scaling failures are not caused by lack of resources.

They are caused by incorrect assumptions.

These assumptions shape how systems are designed.

When they are wrong, scaling becomes difficult, expensive, and unreliable.

Understanding these myths is important for building systems that perform well under real conditions.

The “just add more servers” myth

A common belief is that scaling can be solved by adding more machines.

Horizontal scaling does increase capacity, but it does not fix underlying issues.

If the system has:

inefficient queries
tight coupling
shared bottlenecks

then adding more servers only distributes the problem.

In some cases, it can make things worse by increasing coordination overhead and system complexity.

Scaling works only when the architecture supports it.

Premature optimization

Optimization is often applied before understanding real bottlenecks.

Developers may try to:

reduce latency early
optimize code paths unnecessarily
introduce complexity without clear need

This leads to systems that are harder to maintain and reason about.

Without real usage data, optimization decisions are based on assumptions.

Effective scaling requires understanding where the system actually struggles, not where it might struggle.

Tech over fundamentals

There is a tendency to rely on tools and technologies as solutions.

new frameworks
distributed systems
advanced infrastructure

While these tools are useful, they do not solve fundamental design problems.

Poor data modeling, inefficient workflows, and lack of clear boundaries cannot be fixed by adding new technology.

Scaling is primarily a design problem, not a tooling problem.

Confusing performance with scalability

Performance and scalability are related but different.

Performance refers to how fast a system responds under a given load.

Scalability refers to how well a system maintains performance as load increases.

A system can be fast at low traffic but fail under higher load.

Similarly, a scalable system may not be extremely fast, but it maintains stability as usage grows.

Focusing only on performance can hide scalability issues.

Ignoring system limits

Every system has limits.

database capacity
network throughput
processing power

Scaling requires understanding these limits and how they are reached.

Ignoring them leads to unexpected failures when the system is pushed beyond its capacity.

Design decisions should consider where limits exist and how they can be managed.

Assuming linear growth

Another common assumption is that systems scale linearly.

If a system handles 100 requests per second, it is expected to handle 200 with double the resources.

In practice, this is rarely true.

Contention, coordination overhead, and shared dependencies introduce nonlinear behavior.

Performance often degrades faster than expected as load increases.
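
One common way to model this nonlinear behavior is the Universal Scalability Law, which adds a contention term and a coordination term to ideal linear scaling. The sketch below uses it purely as an illustration; the two coefficients are assumed values, not measurements from any real system.

```python
def relative_capacity(n, contention, coordination):
    """Universal Scalability Law: capacity of n nodes relative to one node."""
    return n / (1 + contention * (n - 1) + coordination * n * (n - 1))


# Assumed coefficients, chosen only to show the shape of the curve.
CONTENTION = 0.05     # share of work that must be serialized
COORDINATION = 0.01   # cost of nodes keeping each other in sync

for nodes in (1, 2, 4, 8, 16):
    capacity = relative_capacity(nodes, CONTENTION, COORDINATION)
    print(f"{nodes:>2} nodes -> {capacity:.2f}x capacity")
```

With these coefficients, doubling from one node to two yields less than 2x capacity, and 16 nodes deliver less than 8 do, because coordination cost grows faster than the added capacity.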

Conclusion

Scaling is not just about handling more traffic.

It is about how systems behave as conditions change.

Misconceptions about scaling lead to incorrect design decisions, which become visible under pressure.

By focusing on fundamentals, understanding limits, and avoiding common myths, systems can be designed to scale more reliably.

This concludes the series.

Thanks for reading.

Why Your APIs Feel Slow (Even When They Aren’t)

Akshat Jain — Thu, 07 May 2026 16:21:12 +0000

Understanding the gap between actual performance and perceived latency

In previous parts, we explored how backend systems behave under load and how design decisions impact performance.

However, not all performance issues come from slow systems.

In many cases, the backend is fast, but the API still feels slow.

This difference comes from how latency is experienced, not just how it is measured.

API performance is not only about execution time. It is also about network behavior, data transfer, and how requests are structured.

Network latency matters

Every API call travels over a network.

Even if the backend processes a request quickly, the total time includes:

travel time from client to server
routing through multiple network hops
return time for the response

This delay exists even when the backend is efficient.

For users located far from the server, or on unstable networks, this latency becomes noticeable.

As a result, a fast backend can still feel slow due to network distance and conditions.

Payload size issues

The size of the response directly affects how long it takes to deliver.

Larger payloads require more time to:

transfer over the network
process on the client side

Even small increases in payload size can add noticeable delay, especially on slower connections.

Returning more data than necessary increases latency without improving functionality.

Efficient APIs focus on sending only what is required.
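
As a rough illustration of sending only what is required, the sketch below trims a record down to the fields a list view needs and compares the serialized sizes. The record shape and the `select_fields` helper are hypothetical.

```python
import json

# Hypothetical subscription record with more fields than a list view needs.
full_record = {
    "id": 42,
    "name": "Netflix",
    "cost": 499,
    "billing_cycle": "monthly",
    "notes": "x" * 500,  # stand-in for bulky free text
    "history": [{"month": m, "paid": True} for m in range(24)],
}


def select_fields(record, fields):
    """Keep only the fields the client asked for."""
    return {key: record[key] for key in fields if key in record}


list_view = select_fields(full_record, ["id", "name", "cost", "billing_cycle"])
full_size = len(json.dumps(full_record).encode("utf-8"))
trimmed_size = len(json.dumps(list_view).encode("utf-8"))
print(f"full: {full_size} bytes, trimmed: {trimmed_size} bytes")
```

The same idea applies whether the selection happens through query parameters, separate list and detail endpoints, or a dedicated response model.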

Too many API calls

Frontend applications often depend on multiple API calls.

Instead of one request, the system may perform several smaller requests to gather data.

For example:

one call for user data
another for related items
another for additional details

Even if each call is fast, the total time adds up.

Sequential calls increase delay further, as each request waits for the previous one.

This creates the perception of a slow system, even when individual endpoints are efficient.

Serialization and deserialization cost

Data needs to be converted before it is sent and after it is received.

On the server:

objects are serialized into formats like JSON

On the client:

responses are parsed back into usable data

This process takes time.

While the cost is small per request, it becomes noticeable with large payloads or frequent calls.

It adds hidden overhead that is often ignored during performance evaluation.

Frontend rendering delays

API performance is often judged by how quickly users see results.

Even after the response arrives:

data must be processed
UI must be updated
components must render

These steps add delay beyond the API response time.

From the user’s perspective, the system feels slow, even if the backend responded quickly.

Lack of parallelism in requests

When API calls are made sequentially, total latency increases.

Each request waits for the previous one to complete.

If multiple independent requests are needed, this approach wastes time.

Parallel execution can reduce total wait time, but it is not always implemented.

This leads to unnecessary delays in response delivery.
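
The difference between sequential and parallel requests can be seen with a small simulation. This sketch uses `asyncio.sleep` as a stand-in for network calls, so `fake_call` and the 0.1 second delay are assumptions rather than real endpoints.

```python
import asyncio
import time

CALLS = ("user", "items", "details")


async def fake_call(name, delay=0.1):
    """Stand-in for an HTTP request; sleeps instead of hitting a network."""
    await asyncio.sleep(delay)
    return name


async def sequential():
    # Each request waits for the previous one to finish.
    return [await fake_call(name) for name in CALLS]


async def parallel():
    # Independent requests are started together and awaited as a group.
    return await asyncio.gather(*(fake_call(name) for name in CALLS))


def timed(coro_fn):
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


seq_result, seq_time = timed(sequential)
par_result, par_time = timed(parallel)
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

The sequential version takes roughly the sum of the delays, while the parallel version takes roughly the longest single delay. This only helps when the requests do not depend on each other's results.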

Conclusion

API performance is not only about backend speed.

It is influenced by network latency, payload size, request patterns, and client-side processing.

A system can be technically fast but still feel slow to users.

Understanding this difference helps in designing APIs that are efficient not only in execution, but also in experience.

In the next part, we will explore load testing and why many systems fail to identify performance limits early.

Thanks for reading.

How I Built an LLM Service That Converts Natural Language into Database Events

Akshat Jain — Tue, 05 May 2026 16:08:45 +0000

You open the app, fill fields, select options, and submit.

It works, but it’s friction.

I wanted something simpler.

What if a user could just say:

“Netflix ₹499 monthly” …and the system handles everything?

The Core Idea

Instead of forcing users to adapt to the system…

Make the system adapt to the user.

The pipeline looks like this: voice or text input → normalized text → regex filter → LLM parsing → structured event → database.

Each step reduces ambiguity and moves toward structured data.

Step 1 — Handling Voice & Text Input

The system doesn’t just rely on one type of input.

Users can either:

Speak (“Netflix ₹499 monthly”)
Type a quick message (just like a notification or note)

So the first step is to normalize everything into plain text.

If the input is voice, we convert it using a speech-to-text service.

If it’s already text, we process it directly.

The goal is simple: everything becomes text before any processing begins.

Example Input

# Case 1: User typed a message (like a quick note)
user_input = "Netflix 499 monthly"

# Case 2: Voice input (after speech-to-text conversion)
voice_transcribed = "Spotify 199 per month"

Basic Handling Layer

def normalize_input(input_data, input_type="text"):
    if input_type == "voice":
        # Simulated speech-to-text (replace with real API)
        text = input_data  # already transcribed
    else:
        text = input_data
    return text.lower().strip()

# Example usage
text_input = normalize_input(user_input, "text")
voice_input = normalize_input(voice_transcribed, "voice")
print(text_input)
print(voice_input)

Why This Step Matters

This step might look simple, but it’s critical.

Because:

It creates a single entry point for all inputs
It keeps downstream logic clean
It allows you to support multiple input methods easily

And more importantly:

It makes the system feel natural — users can just “say” or “type” what they did.

Step 2 — Lightweight Regex Filtering

Before sending everything to the LLM, I added a simple filter.

Why?

Because not all inputs are subscription-related.

This saves cost and improves accuracy.

import re

def is_subscription(text):
    patterns = [
        r'\b(monthly|yearly|weekly)\b',
        r'₹\d+',
        r'\b(netflix|spotify|amazon|prime)\b'
    ]

    return any(re.search(p, text.lower()) for p in patterns)

# Example
print(is_subscription(user_input))  # True

If it’s not a subscription, we can route it elsewhere.

Step 3 — LLM Parsing

Now comes the important part — extracting structured data.

We send the filtered input to an LLM with a strict prompt.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def parse_subscription(text):
    prompt = f"""
    Extract subscription details from the input.
    Return JSON with fields:
    name, cost, billing_cycle
    Input: "{text}"
    """
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        # Ask for a bare JSON object so json.loads() in Step 4 does not
        # trip over markdown fences or extra prose around the answer.
        response_format={"type": "json_object"},
    )
    return response.choices[0].message.content

# Example
result = parse_subscription(user_input)
print(result)

Expected output:

{
  "name": "Netflix",
  "cost": 499,
  "billing_cycle": "monthly"
}

Step 4 — Structuring the Event

Now we convert this into a system event.

import json
from datetime import datetime, timezone

def create_event(parsed_json):
    data = json.loads(parsed_json)

    event = {
        "type": "SUBSCRIPTION_CREATED",
        # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "payload": {
            "name": data["name"],
            "cost": data["cost"],
            "billing_cycle": data["billing_cycle"]
        }
    }

    return event

event = create_event(result)
print(event)

Step 5 — Saving to Database

Finally, store it.

def save_to_db(event):
    # Replace with actual DB logic
    print("Saving to DB:", event)

save_to_db(event)

Why This Works

This system feels simple, but a few design decisions make it powerful:

1. Regex Before LLM

Filters irrelevant input
Reduces cost
Improves signal

2. LLM for Structure, Not Logic

LLM extracts meaning
System enforces rules

3. Event-Based Design

Everything becomes an event
Easy to extend (notifications, analytics, etc.)
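
The "system enforces rules" point can be sketched as a validation step that sits between LLM parsing and event creation. This is a minimal illustration, not part of the pipeline shown earlier: `validate_subscription` and the allowed cycle set are assumptions, with the cycles borrowed from the regex filter in Step 2.

```python
ALLOWED_CYCLES = {"weekly", "monthly", "yearly"}  # assumed, mirrors the Step 2 regex


def validate_subscription(data):
    """Return a cleaned dict or raise ValueError; LLM output is not trusted as-is."""
    name = str(data.get("name", "")).strip()
    if not name:
        raise ValueError("name is required")

    try:
        cost = float(data.get("cost"))
    except (TypeError, ValueError):
        raise ValueError("cost must be a number") from None
    if cost <= 0:
        raise ValueError("cost must be positive")

    cycle = str(data.get("billing_cycle", "")).lower()
    if cycle not in ALLOWED_CYCLES:
        raise ValueError(f"unsupported billing_cycle: {cycle!r}")

    return {"name": name, "cost": cost, "billing_cycle": cycle}


cleaned = validate_subscription(
    {"name": "Netflix", "cost": 499, "billing_cycle": "Monthly"}
)
print(cleaned)
```

Keeping these checks in plain code means a malformed or surprising model response fails loudly before anything reaches the database.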

Where This Gets Interesting

Once this pipeline is in place, you can extend it easily:

Add reminders automatically
Trigger notifications
Detect duplicates
Categorize spending
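
One way such extensions could hang off the event is a small handler registry: each new behavior subscribes to the event type without touching the code that emits it. The registry, the decorator, and both handlers below are hypothetical, and the duplicate check uses an in-memory set where a real system would query the database.

```python
from collections import defaultdict

handlers = defaultdict(list)  # event type -> list of handler functions
seen = set()                  # (name, billing_cycle) pairs already stored


def on(event_type):
    """Decorator that registers a handler for one event type."""
    def register(fn):
        handlers[event_type].append(fn)
        return fn
    return register


def dispatch(event):
    """Run every handler registered for the event and collect the results."""
    return [fn(event["payload"]) for fn in handlers[event["type"]]]


@on("SUBSCRIPTION_CREATED")
def detect_duplicate(payload):
    key = (payload["name"].lower(), payload["billing_cycle"])
    is_duplicate = key in seen
    seen.add(key)
    return "duplicate" if is_duplicate else "new"


@on("SUBSCRIPTION_CREATED")
def schedule_reminder(payload):
    return f"reminder scheduled for {payload['name']}"


sample_event = {
    "type": "SUBSCRIPTION_CREATED",
    "payload": {"name": "Netflix", "cost": 499, "billing_cycle": "monthly"},
}
first = dispatch(sample_event)
second = dispatch(sample_event)
print(first)
print(second)
```

Sending the same event twice shows the duplicate handler flagging the second one, while the reminder handler runs independently of it.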

And most importantly:

The user doesn’t feel like they’re using a system.

They just type or speak naturally. With the user’s permission, the app could even extract subscription messages directly from their phone.

Final Thought

This isn’t about AI.

It’s about reducing friction.

Forms make users adapt to systems.

Natural language lets systems adapt to users.

And that small shift makes everything feel… effortless.

Observability: You Can’t Fix What You Can’t See

Akshat Jain — Sun, 03 May 2026 15:09:41 +0000

Understanding system behavior beyond logs and dashboards

In previous parts, we explored how systems fail under load and how design decisions influence performance.

But identifying failures is a different challenge.

A system may be slow, unstable, or partially broken, yet the cause is not always visible.

This is where observability becomes important.

Observability is not just about collecting data.

It is about understanding how a system behaves internally by looking at its outputs.

Logs, metrics, and traces

Observability is built on three main signals.

Logs provide discrete records of events.

They show what happened at a specific point in time.

Metrics provide aggregated numerical data.

They show trends such as latency, error rates, and throughput.

Traces provide request-level visibility.

They show how a single request moves through different components.

Each of these serves a different purpose.

Logs help in understanding specific events.

Metrics help in identifying patterns.

Traces help in connecting events across the system.

None of them is sufficient on its own.

Lack of visibility delays fixes

When systems lack observability, problems remain hidden.

Failures may exist in small forms:

slight latency increases
occasional errors
resource usage spikes

These signals are often missed without proper visibility.

Over time, these small issues grow.

By the time they become noticeable, the system is already under stress or failing.

Lack of visibility does not prevent problems.

It delays their discovery.

Correlation is key

Modern systems are distributed.

A single request may pass through multiple services, databases, and external APIs.

Observing each component separately is not enough.

The key is to connect events across components.

Correlation allows understanding of:

how one service affects another
where latency is introduced
how failures propagate

Without correlation, data remains fragmented.

With correlation, it becomes possible to identify root causes instead of symptoms.
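
A simple form of correlation is tagging every record with a shared request ID and grouping by it. The sketch below assumes each service already emits records with `request_id`, `service`, and `duration_ms` fields; the sample records are invented.

```python
from collections import defaultdict

# Hypothetical records emitted by different components for two requests.
records = [
    {"request_id": "r1", "service": "gateway", "duration_ms": 5},
    {"request_id": "r1", "service": "subscription", "duration_ms": 40},
    {"request_id": "r1", "service": "database", "duration_ms": 310},
    {"request_id": "r2", "service": "gateway", "duration_ms": 4},
    {"request_id": "r2", "service": "subscription", "duration_ms": 35},
]


def slowest_hop_per_request(records):
    """Group records by request ID and report where most time was spent."""
    by_request = defaultdict(list)
    for record in records:
        by_request[record["request_id"]].append(record)
    return {
        request_id: max(hops, key=lambda hop: hop["duration_ms"])["service"]
        for request_id, hops in by_request.items()
    }


slowest = slowest_hop_per_request(records)
print(slowest)
```

Viewed per service, none of these records stands out; grouped per request, it becomes clear that request `r1` spent most of its time in the database.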

The problem of too many metrics

Collecting more data does not always improve observability.

Large systems often generate thousands of metrics.

This creates noise.

When everything is measured, it becomes harder to identify what actually matters.

Important signals get lost among less relevant data.

Effective observability focuses on meaningful metrics:

latency
error rates
system saturation

The goal is not to measure everything, but to measure what reflects system behavior.
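
The three signals listed above can each be reduced to a single number. This sketch computes them from invented samples; the nearest-rank percentile is one of several common definitions, and the worker counts are a stand-in for whatever resource actually saturates in a given system.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile; pct is between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical samples from one minute of traffic.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 18, 900]
status_codes = [200, 200, 200, 500, 200, 200, 200, 200, 503, 200]
busy_workers, total_workers = 7, 8

summary = {
    "p50_ms": percentile(latencies_ms, 50),
    "p95_ms": percentile(latencies_ms, 95),
    "error_rate": sum(code >= 500 for code in status_codes) / len(status_codes),
    "saturation": busy_workers / total_workers,
}
print(summary)
```

The median here looks healthy while the 95th percentile does not, which is why percentiles tend to say more about user experience than an average would.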

Observability as a system property

Observability is not something added later.

It must be part of system design.

Systems should be built in a way that their internal state can be inferred from external outputs.

This includes:

structured logging
consistent metrics
traceable request flows

Without this, understanding system behavior becomes difficult, especially under load.

Conclusion

Observability defines how well a system can be understood from the outside.

Without it, diagnosing issues becomes slow and uncertain.

With it, systems become easier to analyze, debug, and improve.

Performance issues, failures, and bottlenecks are not always obvious.

They must be observed, connected, and interpreted.

In the next part, we will look at common scaling myths that often mislead developers when designing systems.

Thanks for reading.

Load Testing: Why Most Developers Do It Wrong

Akshat Jain — Fri, 01 May 2026 15:31:42 +0000

Why testing for stability often hides the real limits of your system

In previous parts, we explored how systems behave under pressure.

Load testing is meant to reveal those behaviors before they appear in production.

However, many systems still fail unexpectedly, even after being tested.

The issue is not the absence of testing.

It is how testing is approached.

Load testing is often treated as a validation step, rather than a method to understand system limits.

Testing average instead of peak

Most load tests simulate normal conditions.

expected number of users
typical request patterns
stable traffic levels

Under these conditions, systems usually perform well.

However, real failures occur under peak conditions, not average ones.

Traffic spikes, sudden bursts, and extreme concurrency reveal issues that normal testing cannot.

Testing only average load gives a false sense of confidence.

It confirms that the system works, but not how it behaves under stress.

Unrealistic test scenarios

Load tests often use simplified or artificial traffic patterns.

uniform request distribution
predictable intervals
identical requests

Real user behavior is different.

traffic comes in bursts
request patterns vary
some endpoints are used more than others

Because of this mismatch, tests fail to capture real-world complexity.

The system passes the test but fails in production, where conditions are less predictable.

Ignoring system limits

A key purpose of load testing is to identify limits.

maximum throughput
latency thresholds
resource saturation points

However, many tests stop once the system appears stable.

They measure success instead of exploring failure.

Without pushing the system to its limits, it is not possible to understand:

when performance starts degrading
how quickly failures spread
which component fails first

Understanding limits is more valuable than confirming stability.
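
Exploring limits usually means raising load in steps until something gives. The sketch below shows that loop against a toy latency model; `simulated_latency_ms` is a made-up stand-in for measuring a real system, and the capacity, base latency, and threshold are all assumed numbers.

```python
LIMIT_MS = 150  # assumed latency threshold for "degraded"


def simulated_latency_ms(rps, capacity_rps=500, base_ms=20):
    """Toy model: latency climbs sharply as load nears capacity."""
    if rps >= capacity_rps:
        return float("inf")  # requests queue without bound
    return base_ms / (1 - rps / capacity_rps)


def find_degradation_point(measure, start=100, step=50, limit_ms=LIMIT_MS):
    """Raise load step by step until measured latency crosses the threshold."""
    rps = start
    while measure(rps) <= limit_ms:
        rps += step
    return rps


breaking_point = find_degradation_point(simulated_latency_ms)
print(f"latency exceeds {LIMIT_MS} ms at about {breaking_point} requests/second")
```

In a real test, `measure` would drive traffic at the given rate and return an observed percentile, and the interesting output is the load level where the curve bends, not a pass or fail.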

No continuous testing

Load testing is often treated as a one-time activity.

It is performed before release and then ignored.

However, systems evolve over time.

new features are added
traffic patterns change
dependencies are updated

These changes affect performance.

A system that was stable earlier may degrade gradually.

Without continuous testing, these changes go unnoticed until failure occurs in production.

Lack of failure analysis

Many load tests focus on metrics like response time and throughput.

But they do not analyze how the system fails.

Important questions are often ignored:

does the system degrade gradually or suddenly
which component fails first
how failures propagate

Understanding failure behavior is essential for improving system design.

Without it, testing provides limited insight.

No correlation with real metrics

Load testing results are often viewed in isolation.

They are not always compared with real system metrics such as:

CPU usage
memory consumption
database performance

Without this correlation, it is difficult to identify the root cause of performance issues.

Testing shows that a problem exists, but not why it exists.

Conclusion

Load testing is not just about checking if a system works.

It is about understanding how the system behaves under pressure.

Testing average conditions, using unrealistic scenarios, and avoiding system limits leads to incomplete results.

To be effective, load testing must explore extremes, reflect real-world usage, and evolve with the system.

In the next part, we will look at observability and why understanding system behavior is essential for fixing performance issues.

Thanks for reading.

How I Built a Decision-Tree Based Help and Support System

Akshat Jain — Wed, 29 Apr 2026 16:10:54 +0000

80% of user problems are repeated patterns.

So why are we solving them manually every time?

If you’ve ever built a help and support system, you’ve probably done this:

Add a few FAQs, maybe a help page, and a “Contact Us” button.

It feels like enough.

But then users start reaching out… and you notice something strange.

They’re asking the same questions. Over and over again.

“How do I add a subscription?”

“Why is my billing date wrong?”

“Where can I see my payments?”

At first, it feels like users aren’t reading.

But that’s not the real problem.

The real problem is this:

Most help systems are designed for information.

Users need guidance.

A static FAQ assumes the user already knows:

what their problem is
what to search for
which answer applies to them

In reality, most users are confused at the first step.

They don’t think in terms of categories like:

“billing issues”
“subscription errors”

They think in situations:

“something is not working”
“I don’t understand this screen”

And here’s where things get interesting.

When I started looking at support requests closely, I realized something:

A large percentage of problems were repeated patterns.

Not unique cases.

Just the same issues showing up again and again:

multiple users struggling to add a subscription
users misunderstanding renewal dates
confusion around monthly vs yearly billing

This changed how I looked at the problem.

Instead of building a better FAQ…

I started thinking:

What if the system could guide users step-by-step to the solution instead of expecting them to find it?

That idea is what led to building a decision-tree based help system.

Thinking Like a System, Not a Page

After noticing that most user issues were repetitive, the problem became clearer.

The issue wasn’t lack of content.

It was lack of direction.

Traditional help systems are built like documentation:

Lists of FAQs
Search bars
Static categories

But users don’t navigate problems like that.

They don’t think:

“Let me go to the billing section and read all options.”

They think:

“Something is wrong. What do I do next?”

That shift is important.

Instead of designing a help page, I started designing a guided system.

A system that:

asks the right questions
narrows down the problem
leads the user to a solution

Almost like how a support agent would think.

And that’s where the idea of a decision tree fits naturally.

Instead of overwhelming users with options, you guide them step by step:

What’s the issue?
What exactly went wrong?
When did it happen?

Each answer moves them closer to the solution.

This approach does two things really well:

Reduces user confusion
Reduces repeated support requests

Because now, instead of 20 users asking:

“How do I add a subscription?”

The system guides them through the exact steps automatically.

At this point, the help system stops being passive.

It becomes interactive and problem-solving.

Designing the Decision Tree Structure

Once the idea of a guided system was clear, the next step was structuring it properly.

At its core, the help system is just a decision tree.

Simple concept:

Each node = a question
Each branch = a user choice
Each leaf = a solution or a handoff to a human agent

Instead of showing everything at once, the system reveals only what’s needed at each step.

Here’s a simple example:

[Figure: Tree Structure]

Now compare this to a typical FAQ page.

Instead of scanning 10–15 questions, the user just answers 2–3 guided steps and reaches the solution.

Why This Structure Works

This works well because of one key observation:

Most user problems fall into a limited number of patterns.

For example:

Many users struggle with adding subscriptions
Many get confused about billing cycles
Many face similar payment issues

So instead of handling each request individually, we categorize and guide.

This reduces:

repeated support queries
manual intervention
user frustration

Designing It Properly

While building this, a few principles mattered:

Keep questions simple
Avoid deep nesting (3–5 levels max)
Always provide an exit (contact support)
Log where users drop off

Because if users abandon the flow, that’s where your system needs improvement.
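Drop-off logging can start very small. The sketch below assumes each session is stored as the list of question keys the user saw plus a flag for whether they reached a leaf. That storage format and the key names are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Each session: (question keys the user saw, in order; whether a leaf was reached)
Session = Tuple[List[str], bool]


def drop_off_counts(sessions: Iterable[Session]) -> Counter:
    """Count the last question seen in sessions that never reached a leaf."""
    counts: Counter = Counter()
    for path, reached_leaf in sessions:
        if not reached_leaf and path:
            counts[path[-1]] += 1
    return counts


# Hypothetical session log.
sessions = [
    (["issue_type", "payment_problem"], True),
    (["issue_type", "subscription_problem", "add_issue_type"], False),
    (["issue_type", "subscription_problem", "add_issue_type"], False),
    (["issue_type"], False),
]
print(drop_off_counts(sessions).most_common(1))  # [('add_issue_type', 2)]
```

The question with the highest count is the first candidate for rewording or splitting.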

At this point, the structure is clear.

The next step is making it work in code.

Implementing the Decision Tree (Python Code)

Once the structure was clear, implementing it was surprisingly simple.

You don’t need complex frameworks.

A decision tree can be represented using basic objects.

At its core, each node needs:

a question (or condition)
possible next steps
or a final action (solution)

Basic Implementation

Here’s a clean and minimal version:

```python
class Node:
    """A question node with options, or a leaf node with an action."""

    def __init__(self, question=None, options=None, action=None):
        self.question = question      # key to look up in the context
        self.options = options or {}  # answer -> child Node
        self.action = action          # callable, set only on leaf nodes

    def evaluate(self, context):
        # Leaf node: run the action and stop.
        if self.action is not None:
            return self.action(context)

        answer = context.get(self.question)

        if answer in self.options:
            return self.options[answer].evaluate(context)

        # Missing or unrecognized answer: always leave an exit to a human.
        return escalate_to_agent(context)


# Actions
def resolved(ctx):
    return "Issue Resolved"


def retry_payment(ctx):
    if ctx.get("retry_success"):
        return "Payment Successful"
    return escalate_to_agent(ctx)


def escalate_to_agent(ctx):
    return "Escalating to Customer Support Agent"


# Tree Construction
tree = Node(
    question="issue_type",
    options={
        "subscription": Node(
            question="subscription_problem",
            options={
                "add": Node(
                    question="add_issue_type",
                    options={
                        "ui": Node(action=resolved),
                        "error": Node(action=escalate_to_agent),
                    },
                ),
                "manage": Node(
                    question="manage_issue",
                    options={
                        "edit": Node(action=resolved),
                        "delete": Node(action=escalate_to_agent),
                    },
                ),
            },
        ),
        "payment": Node(
            question="payment_problem",
            options={
                "failed": Node(action=retry_payment),
                "incorrect_charge": Node(action=escalate_to_agent),
            },
        ),
    },
)

# Example Context
context = {
    "issue_type": "payment",
    "payment_problem": "failed",
    "retry_success": False,
}

print(tree.evaluate(context))  # Escalating to Customer Support Agent
```

What This Gives You

A flexible structure
Easy to extend (just add nodes)
Clear separation of logic
No hardcoded if-else chains

And most importantly:

You can model real user journeys instead of writing scattered logic.

In a real system, the answers wouldn’t come from a hardcoded context dictionary.

Instead:

UI handles selections
Backend returns next node
State is maintained per session

Now the final step is connecting this to your actual system.

Integrating It into the Real Project

The best part about this help and support system is that it doesn’t need to live separately from the main application.

I integrated it as its own Help/Support Service inside the project architecture.

The flow is simple:

When a user taps Help, the mobile app sends the selected category or current screen context to the service.

For example, if the user is on the Add Subscription page and opens support, the system can already start from a relevant branch of the tree instead of asking generic questions.

This makes the experience feel much smarter.
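Here is a rough sketch of what that service could look like: the client sends a session id and, optionally, the current screen, and the backend returns one node at a time. The tree contents, the `SCREEN_ENTRY` mapping, and the in-memory `sessions` dictionary are all hypothetical. A real service would keep session state in a proper store.

```python
from typing import Dict, Optional

# Hypothetical tree: question nodes have "question" and "options", leaves have "result".
TREE: Dict[str, dict] = {
    "root": {"question": "What do you need help with?",
             "options": {"subscription": "sub", "payment": "pay"}},
    "sub": {"question": "What are you trying to do?",
            "options": {"add": "sub_add", "manage": "sub_manage"}},
    "sub_add": {"result": "Show the add-subscription walkthrough"},
    "sub_manage": {"result": "Escalate to a human agent"},
    "pay": {"result": "Escalate to a human agent"},
}

# Hypothetical mapping from the screen where Help was opened to a starting node.
SCREEN_ENTRY = {"add_subscription": "sub", "payments": "pay"}

sessions: Dict[str, str] = {}  # session id -> current node id (in memory, for the sketch only)


def start(session_id: str, screen: Optional[str] = None) -> dict:
    """Begin a help session, skipping ahead when the screen context is known."""
    node_id = SCREEN_ENTRY.get(screen, "root")
    sessions[session_id] = node_id
    return TREE[node_id]


def answer(session_id: str, choice: str) -> dict:
    """Advance one step and return the next node for the UI to render."""
    node = TREE[sessions[session_id]]
    next_id = node.get("options", {}).get(choice)
    if next_id is None:
        # Unknown choice or already at a leaf: always leave an exit to a human.
        return {"result": "Escalate to a human agent"}
    sessions[session_id] = next_id
    return TREE[next_id]


print(start("s1", screen="add_subscription")["question"])  # What are you trying to do?
print(answer("s1", "add")["result"])  # Show the add-subscription walkthrough
```

A user who opens Help from the Add Subscription screen skips the first generic question entirely.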

Reducing Human Effort

The biggest win was reducing repeated manual support effort.

Earlier, if 20 users had trouble adding a new subscription, all 20 would either:

read the same FAQ
message support
wait for a response

Now, the tree handles the majority of these repeated issues automatically.

Some common examples:

unable to add a new subscription
confusion between monthly and yearly plans
payment failure after renewal
missing notification alerts
dashboard analytics not updating

These are pattern-based problems, which makes them perfect for tree traversal.

This means human agents only need to handle edge cases.

Escalation Path

Every branch ends with one of two outcomes:

Resolved automatically
Escalated to a human agent

That fallback is important.

Because no matter how good the tree is, some cases will always need human judgment.

The system should help users first, not trap them.

That balance is what makes it practical inside a larger product.

What I Learned Building This

Building a help and support system like this taught me something simple but important:

Most problems are not unique; they’re repeated patterns.

Once you accept that, the solution becomes clearer.

You don’t need:

more FAQs
more documentation
more support agents

You need a system that can recognize patterns and guide users.

The decision-tree approach worked well because:

it simplifies user choices
it reduces cognitive load
it scales without increasing support effort

But it’s not perfect.

Some trade-offs:

Deep trees can become hard to manage
Poorly designed questions can confuse users
Edge cases still require human support

So the goal isn’t to replace support.

It’s to handle the predictable 70–80% of issues automatically.

Conclusion

A help system shouldn’t just exist; it should actively solve problems.

Most applications treat support as a secondary feature:

static pages
long FAQs
contact forms

But users don’t want information.

They want resolution.

By turning the help system into a decision-tree based flow, you shift from:

passive content → guided experience
repeated queries → automated solutions
manual effort → scalable support

And the result is something that feels natural.

Users don’t feel like they’re navigating a system.

They feel like the system understands them.

That’s when support stops being a feature…

and starts becoming part of the product experience.

Async Processing: The Secret to Surviving Spikes

Akshat Jain — Mon, 27 Apr 2026 16:05:51 +0000

How decoupling work from requests helps systems stay stable under load

In the previous part, we saw the limitations of synchronous systems.

When every request waits for all operations to complete, performance suffers under load. Resources remain blocked, and slow dependencies affect the entire flow.

Asynchronous processing takes a different approach.

Instead of doing all work during the request, it separates immediate responses from background work.

This shift changes how systems handle load, especially during traffic spikes.

Decoupling work from requests

In an asynchronous system, not all work is done in real time.

The request handles only what is necessary for an immediate response.

The remaining work is moved to background processing.

This reduces:

request duration
resource usage during the request
dependency on slow operations

By decoupling work, the system avoids holding resources for long periods and improves overall throughput.
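A minimal sketch of this split, using only the standard library: the handler enqueues the slow work and returns, and a worker thread picks it up. The signup scenario and all names are assumptions for illustration. A production system would more likely use a durable queue than an in-process one.

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()
sent_emails = []


def worker() -> None:
    """Process background tasks until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            tasks.task_done()
            break
        # Stand-in for slow work such as sending an email.
        sent_emails.append(f"welcome email for {task['user']}")
        tasks.task_done()


def handle_signup(user: str) -> dict:
    """Do only what the response needs, then hand the rest to the worker."""
    tasks.put({"user": user})
    return {"status": "accepted", "user": user}


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = handle_signup("alice")
print(response)  # {'status': 'accepted', 'user': 'alice'}

tasks.join()     # only so the example can show the result; a real handler would not wait
tasks.put(None)  # stop the worker
thread.join()
print(sent_emails)  # ['welcome email for alice']
```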

Queues absorb traffic spikes

Queues are a core part of asynchronous systems.

Instead of processing all requests immediately, incoming tasks are stored in a queue and processed at a controlled rate.

This creates a buffer between incoming traffic and system capacity.

During traffic spikes:

requests are queued instead of rejected
processing happens gradually
system load remains stable

Queues do not eliminate load, but they prevent sudden overload.
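A tiny simulation shows the buffering effect. It assumes workers drain a fixed number of tasks per tick. The traffic numbers are invented.

```python
from typing import List


def simulate_backlog(arrivals_per_tick: List[int], capacity_per_tick: int) -> List[int]:
    """Return the queue depth after each tick when workers process at a fixed rate."""
    depth = 0
    history = []
    for arrivals in arrivals_per_tick:
        depth += arrivals
        depth -= min(depth, capacity_per_tick)
        history.append(depth)
    return history


# Hypothetical traffic: steady 50 per tick with a three-tick spike to 200 per tick.
traffic = [50, 50, 200, 200, 200, 50, 50, 50, 50, 50]
print(simulate_backlog(traffic, capacity_per_tick=100))
# [0, 0, 100, 200, 300, 250, 200, 150, 100, 50]
```

The workers never see more than 100 tasks per tick, but the backlog takes several ticks to drain after the spike ends. The load is still there; it has been traded for waiting time.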

Improved user experience

Asynchronous systems improve perceived performance.

Users receive faster responses because the system does not wait for all operations to complete.

For example:

a request can be accepted immediately
heavy processing happens in the background
results are delivered later

This reduces user wait time and makes the system feel more responsive.

Event-driven architecture basics

Asynchronous systems are often built around events.

Instead of calling services directly and waiting for responses, components emit events when something happens.

Other components react to these events independently.

This model:

reduces direct dependencies between services
allows work to happen in parallel
improves system flexibility

Event-driven systems shift the focus from request flow to state changes.
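The core idea fits in a few lines. This sketch is an in-process publish/subscribe bus, with a hypothetical `subscription_renewed` event and made-up handlers. Handlers run synchronously here to keep it short. In a real system, events would usually travel through a message broker so that consumers run independently.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-process publish/subscribe bus."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # The publisher does not know who is listening or what they do.
        for handler in self._handlers[event_type]:
            handler(payload)


log = []
bus = EventBus()
bus.subscribe("subscription_renewed", lambda event: log.append(f"email:{event['user']}"))
bus.subscribe("subscription_renewed", lambda event: log.append(f"analytics:{event['user']}"))

bus.publish("subscription_renewed", {"user": "alice"})
print(log)  # ['email:alice', 'analytics:alice']
```

Adding a third reaction means adding a subscriber. The code that publishes the event does not change.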

Better resource utilization

Asynchronous processing allows better use of system resources.

Since requests are shorter and less blocking:

threads are freed faster
connections are reused efficiently
overall throughput increases

Background workers can process tasks independently, making better use of available capacity.

Isolation of failures

In synchronous systems, failure in one step affects the entire request.

In asynchronous systems, failures can be isolated.

a background job can fail without blocking user requests
retries can be handled separately
issues remain contained within specific components

This reduces the impact of failures on the overall system.

Trade-offs of asynchronous systems

Asynchronous systems are not without challenges.

They introduce:

increased system complexity
delayed consistency
need for monitoring background jobs

Debugging becomes harder because work is distributed across multiple components.

Despite these trade-offs, the benefits are significant for systems under variable load.

Conclusion

Asynchronous processing changes how systems handle work.

By separating immediate responses from background tasks, systems can reduce load, improve responsiveness, and handle traffic spikes more effectively.

This approach is especially useful in environments where demand is unpredictable.

In the next part, we will explore why APIs feel slow even when backend systems are fast.

Thanks for reading.

The Hidden Cost of Synchronous Systems

Akshat Jain — Sat, 25 Apr 2026 15:05:22 +0000

Why waiting for every step to finish can quietly slow down your entire backend

In previous parts, we explored how system design choices affect performance.

One such choice is how work is executed.

Many backend systems follow a synchronous model, where each step waits for the previous one to complete.

This approach is simple and easy to reason about.

However, under load, it introduces hidden costs that affect performance, scalability, and user experience.

Blocking requests

In a synchronous system, a request waits until all operations are complete.

During this time, system resources remain occupied.

threads stay blocked
connections remain open
memory is held

This reduces the number of requests the system can handle at the same time.

As traffic increases, blocked resources begin to accumulate, leading to slower responses and reduced throughput.
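The ceiling is easy to estimate with Little’s Law: requests in flight equal throughput times time per request. If each request holds one worker for its whole duration, throughput cannot exceed workers divided by request duration. The pool size and durations below are assumptions for illustration.

```python
def max_throughput(workers: int, avg_request_seconds: float) -> float:
    """Upper bound on requests per second when each request holds one worker until it finishes."""
    return workers / avg_request_seconds


print(max_throughput(200, 0.25))  # 800.0
print(max_throughput(200, 0.5))   # 400.0
```

With the same 200 workers, doubling the time each request stays blocked cuts the ceiling in half.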

Slow dependencies lead to slow systems

A synchronous flow depends on the speed of each component.

If one dependency is slow, the entire request becomes slow.

For example:

database queries
external APIs
internal services

Each step adds to the total response time.

The system’s performance becomes limited by its slowest dependency.

This creates a chain where delays propagate across the entire request lifecycle.

User perceived latency

In synchronous systems, users wait for the full operation to complete.

Even if some parts of the work are not immediately required, the response is delayed until everything finishes.

This increases perceived latency.

From the user’s perspective, the system feels slow, even if individual operations are fast.

Reducing perceived latency is not only about speed, but also about how responses are structured.

No parallelism advantage

Synchronous execution processes tasks in sequence.

This limits the ability to use available resources efficiently.

Many operations can be performed independently, but in a synchronous flow they are executed one after another.

This results in:

underutilized resources
longer total processing time
lower system efficiency

Parallel execution can reduce total latency, but synchronous systems do not take full advantage of it.
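The difference is easy to see with `asyncio`. In this sketch, three independent calls are simulated with `asyncio.sleep`. The call names and the 50 ms delay are made up. This only helps when the operations really do not depend on each other.

```python
import asyncio
import time

CALLS = ("profile", "billing", "notifications")


async def call_dependency(name: str, seconds: float) -> str:
    await asyncio.sleep(seconds)  # stand-in for a database or API call
    return name


async def sequential() -> float:
    """Run the calls one after another and return the elapsed time."""
    start = time.perf_counter()
    for name in CALLS:
        await call_dependency(name, 0.05)
    return time.perf_counter() - start


async def concurrent() -> float:
    """Run the calls together and return the elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(*(call_dependency(name, 0.05) for name in CALLS))
    return time.perf_counter() - start


sequential_time = asyncio.run(sequential())
concurrent_time = asyncio.run(concurrent())
print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

Sequential time is roughly the sum of the delays, while concurrent time is roughly the longest single delay.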

Limited scalability under load

As traffic increases, synchronous systems struggle to scale.

Each request holds resources for its entire duration.

More requests require more threads, more connections, and more memory.

At some point, the system reaches its limits.

This makes scaling more expensive and less efficient compared to systems that release resources early.

Coupling between operations

Synchronous flows create tight coupling between steps.

Each operation depends on the previous one to complete successfully.

If one step fails or slows down, the entire request is affected.

This reduces flexibility and makes systems more sensitive to failures.

Conclusion

Synchronous systems are simple and predictable, but they come with trade-offs.

They block resources, amplify the impact of slow dependencies, and limit how efficiently a system can scale.

These costs are not always visible at small scale, but they become significant under load.

Understanding these limitations is important when designing systems that need to handle real-world traffic.

In the next part, we will explore asynchronous processing and how it helps systems handle load more efficiently.

Thanks for reading.

Why Microservices Make Performance Worse (If Done Wrong)

Akshat Jain — Thu, 23 Apr 2026 16:16:27 +0000

How breaking your system into services can increase complexity and slow everything down

In the previous part, we discussed how to design systems that survive under pressure.

Microservices are often seen as a solution to scaling and reliability.

But in practice, many systems become slower and harder to manage after moving to microservices.

The problem is not microservices themselves.

The problem is how they are used.

Too many network calls

In a monolithic system, components communicate in memory.

In microservices, communication happens over the network.

Every request between services adds:

network latency
serialization and deserialization cost
additional failure points

A single user request may trigger multiple internal calls.

This increases total response time.

What was once a fast internal function call becomes a slower network operation.

Chatty services problem

Microservices often become too dependent on each other.

Instead of one efficient call, services make many small calls.

For example:

service A calls service B
service B calls service C
service C returns partial data

This creates a chain of requests.

Each call adds latency.

Together, they create significant overhead.

This pattern is known as chatty services.

It is one of the most common causes of slow systems.

Distributed failures

In a distributed system, failures spread easily.

If one service becomes slow or unavailable:

dependent services are affected
requests start timing out
retries increase traffic

This can lead to cascading failures across the system.
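Retries multiply across layers. If every service in a call chain retries a failing call the same number of times, the worst case grows exponentially with depth. A quick calculation, assuming the same retry count at each layer:

```python
def worst_case_calls(retries_per_layer: int, layers: int) -> int:
    """Calls reaching the deepest service for one user request if every layer exhausts its retries."""
    return (1 + retries_per_layer) ** layers


print(worst_case_calls(3, 1))  # 4
print(worst_case_calls(3, 3))  # 64
```

One user request can turn into 64 calls against a service that is already struggling.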

Unlike a monolith, where a failure stays inside one process and is easier to trace, microservices add more places where things can fail.

Harder debugging

Debugging performance issues becomes more complex.

In a single system, it is easier to trace a request.

In microservices:

requests pass through multiple services
logs are spread across systems
latency is distributed

Finding the root cause requires tracing across multiple components.

Without proper observability, diagnosing issues becomes difficult.

Data consistency challenges

Microservices often manage separate databases.

This improves independence but creates consistency challenges.

data may not be updated at the same time
systems may temporarily disagree
additional logic is required to handle this

Managing consistency adds complexity and can impact performance.

Overengineering too early

Microservices are often adopted too early.

For small systems, they introduce:

more services to manage
more deployment complexity
more communication overhead

Before scaling becomes a real problem, this added complexity slows development and performance.

A simple system becomes unnecessarily complicated.

Conclusion

Microservices are powerful, but they are not a default solution.

They introduce network overhead, increase system complexity, and make failures harder to manage.

When used correctly, they help systems scale.

When used too early or without proper design, they make performance worse.

Choosing the right architecture depends on the problem, not the trend.

In the next part, we will look at synchronous systems and how waiting on responses can slow down your backend.

Thanks for reading.

Designing Systems That Don’t Collapse Under Pressure

Akshat Jain — Tue, 21 Apr 2026 15:49:46 +0000

How to build backend systems that continue to work even when things go wrong

In earlier parts, we saw how systems fail under load.

Traffic increases, dependencies slow down, and small issues turn into full outages.

The goal of system design is not to avoid failure completely.

It is to handle failure in a controlled way.

A well-designed system does not collapse under pressure.

It adapts, limits damage, and continues to function.

Design for failure, not perfection

No system runs perfectly all the time.

Dependencies fail. Networks slow down. Traffic becomes unpredictable.

Designing for perfect conditions creates fragile systems.

Instead, systems should assume that failures will happen.

This changes how components are built:

what happens if a service is unavailable
how the system responds to delays
how errors are handled

Planning for failure makes systems more stable under real conditions.

Add timeouts everywhere

Every external call should have a timeout.

Without timeouts, a request can wait indefinitely for a response.

This blocks threads, connections, and memory.

Under load, these blocked resources accumulate and create pressure on the system.

Timeouts ensure that requests fail fast instead of waiting too long.

This helps in freeing resources and preventing cascading slowdowns.
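A minimal sketch with `asyncio.wait_for`. The slow dependency is simulated, and the timeout values are arbitrary. In practice most HTTP and database clients expose their own timeout settings, which are the first thing to configure.

```python
import asyncio


async def slow_dependency() -> str:
    await asyncio.sleep(0.2)  # stand-in for a call that responds slowly
    return "data"


async def fetch_with_timeout(timeout_seconds: float) -> str:
    """Wait for the dependency, but give up after timeout_seconds."""
    try:
        return await asyncio.wait_for(slow_dependency(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # Fail fast so the caller can release its resources.
        return "timed out"


timed_out = asyncio.run(fetch_with_timeout(0.05))
completed = asyncio.run(fetch_with_timeout(1.0))
print(timed_out)  # timed out
print(completed)  # data
```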

Use retries carefully

Retries are useful, but they can also be harmful.

When a request fails, retrying may succeed if the failure is temporary.

However, under high load, retries increase traffic.

one request becomes multiple requests
load increases on already stressed services

Uncontrolled retries can worsen the situation.

Retries should be limited, delayed, and used only when necessary.
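One common way to do this is a capped number of attempts with exponential backoff and random jitter. The sketch below is one possible shape. The `retry` helper, its defaults, and the choice to retry only on `ConnectionError` are assumptions for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], max_attempts: int = 3, base_delay: float = 0.1,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation, retrying on ConnectionError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            # Wait a random time up to base_delay * 2^attempt before trying again.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")


attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []
result = retry(flaky_call, max_attempts=3, sleep=delays.append)  # record delays instead of sleeping
print(result, attempts["count"], len(delays))  # ok 3 2
```

The jitter matters under load: without it, many clients that failed at the same moment would all retry at the same moment too.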

Introduce circuit breakers

A circuit breaker stops requests to a failing service.

When a dependency is slow or unavailable, continuing to call it wastes resources.

Circuit breakers detect failures and temporarily block calls.

This prevents:

unnecessary load on failing services
delays in dependent systems
spread of failures across the system

Once the service recovers, requests can resume.
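A stripped-down circuit breaker might look like this. The class, its thresholds, and the injectable clock are assumptions for illustration. The sketch is not thread-safe, and production libraries handle more states and edge cases.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, operation: Callable[[], object]) -> object:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after:
            raise CircuitOpenError("dependency is being skipped")
        # Closed, or cool-down elapsed: let the call through as a trial.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None
        return result


now = [0.0]  # fake clock so the example does not need to wait
breaker = CircuitBreaker(failure_threshold=2, reset_after=10.0, clock=lambda: now[0])


def failing_call() -> str:
    raise ConnectionError("dependency down")


outcomes = []
for _ in range(3):
    try:
        breaker.call(failing_call)
    except CircuitOpenError:
        outcomes.append("skipped")
    except ConnectionError:
        outcomes.append("failed")

now[0] = 11.0  # cool-down has passed
outcomes.append(breaker.call(lambda: "ok"))
print(outcomes)  # ['failed', 'failed', 'skipped', 'ok']
```

The third call never reaches the failing dependency, which is the whole point.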

Decouple components

Tightly coupled systems fail together.

If one component depends directly on another, failure spreads quickly.

Decoupling reduces this risk.

This can be done using:

asynchronous communication
message queues
clear service boundaries

Loose coupling ensures that one failure does not bring down the entire system.

Use queues to absorb spikes

Traffic is not always steady.

Sudden spikes can overload services.

Queues act as buffers.

Instead of processing everything immediately, requests are stored and handled gradually.

This helps in:

smoothing traffic
protecting downstream services
maintaining stability during bursts

Queues do not remove load, but they control how it is handled.

Monitor meaningful metrics

System health cannot be understood without visibility.

Important metrics include:

latency
error rate
throughput

These metrics show how the system behaves under load.

Monitoring helps in detecting problems early and understanding where pressure is building.

Collecting too many metrics is not useful. Focus should be on signals that reflect real system behavior.

Keep buffer capacity

Systems should not run at full capacity.

If CPU, memory, or connections are always near their limits, even a small increase in load can cause failure.

Keeping buffer capacity provides room to handle:

sudden traffic spikes
temporary slowdowns
unexpected events

This headroom is important for stability.

Graceful degradation

When a system is under stress, it should not fail completely.

Instead, it should reduce functionality in a controlled way.

Examples include:

returning partial data
disabling non-critical features
serving cached responses

This allows the system to remain usable even during issues.

Graceful degradation improves user experience and prevents total outages.
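As a small sketch, a handler can fall back to the last known good response, and then to a partial one, when the live source fails. The dashboard scenario, the in-memory `cache`, and the function names are hypothetical.

```python
from typing import Callable, Dict

cache: Dict[str, dict] = {}  # last known good response per user (in memory, for the sketch only)


def get_dashboard(user_id: str, load_live: Callable[[str], dict]) -> dict:
    """Return live data when possible, otherwise a stale or partial response."""
    try:
        data = load_live(user_id)
        cache[user_id] = data
        return {"data": data, "degraded": False}
    except Exception:
        if user_id in cache:
            return {"data": cache[user_id], "degraded": True}      # stale but usable
        return {"data": {"subscriptions": []}, "degraded": True}   # partial response


def healthy_source(user_id: str) -> dict:
    return {"subscriptions": ["music"]}


def failing_source(user_id: str) -> dict:
    raise TimeoutError("backend too slow")


first = get_dashboard("u1", healthy_source)
second = get_dashboard("u1", failing_source)
third = get_dashboard("u2", failing_source)
print(first["degraded"], second["degraded"], third["data"])
# False True {'subscriptions': []}
```

The `degraded` flag lets the UI tell the user that the data may be out of date instead of showing an error page.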

Conclusion

System design is not just about performance.

It is about how the system behaves under stress.

Failures are unavoidable, but uncontrolled failures are not.

By designing for failure, limiting impact, and maintaining control over load, systems can remain stable even under pressure.

In the next part, we will look at microservices and how they can introduce new performance challenges if not designed carefully.

Thanks for reading.

Why Your Database Becomes the Bottleneck

Akshat Jain — Sun, 19 Apr 2026 15:59:40 +0000

Why most backend performance issues eventually lead back to the database

In Part 1, we saw how systems collapse under pressure.

In Part 2, we saw how caching can help or hurt.

Now we look at the most common bottleneck in backend systems:

The database.

Almost every request touches it.

So when it slows down, everything slows down.

Every request depends on the database

Most backend operations rely on the database.

fetching data
storing updates
validating state

This makes it a central dependency.

If the database is slow, your entire system feels slow.

There is no easy fallback.

Connection pool exhaustion

Databases support a limited number of connections.

Under high traffic:

all connections get used
new requests wait in queue
latency increases

This happens even before the query runs.

If the wait time grows, requests start failing.

Slow queries under load

Queries that look fast at low traffic become slow at scale.

Because now:

many queries run together
resources are shared
contention increases

Even a small delay per query becomes a big problem when multiplied across thousands of requests.

Lack of proper indexing

Without indexes, the database has to scan large amounts of data to find results.

At small scale, it may work.

At large scale, it becomes expensive.

This increases:

response time
CPU usage
overall system load

Indexes are one of the simplest and most ignored optimizations.
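You can watch the query plan change with SQLite, which ships with Python. The table and index names below are made up, and the exact plan wording differs between SQLite versions, so treat the printed text as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO subscriptions (user_id, name) VALUES (?, ?)",
    [(i % 1000, f"plan-{i}") for i in range(10_000)],
)


def plan(sql: str) -> str:
    """Return the human-readable query plan for a query with one parameter."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return " | ".join(row[3] for row in rows)  # the fourth column holds the plan detail


query = "SELECT * FROM subscriptions WHERE user_id = ?"

before = plan(query)  # no index yet: a full table scan
conn.execute("CREATE INDEX idx_subscriptions_user ON subscriptions (user_id)")
after = plan(query)   # now a search using the index

print(before)
print(after)
```

Before the index, the plan reports a scan of the whole table. After it, the plan names `idx_subscriptions_user`.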

N+1 query problem

Instead of one efficient query, the system makes many small queries.

Example:

fetch list
then fetch details one by one

This increases:

number of DB calls
total latency
load on database

At scale, this becomes a major bottleneck.
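Here is the pattern side by side, using an in-memory SQLite database with made-up tables. Both functions return the same result. Only the number of queries differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT);
INSERT INTO users (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO subscriptions (user_id, name) VALUES (1, 'music'), (1, 'video'), (2, 'news');
""")

counter = {"queries": 0}


def run(sql: str, params: tuple = ()) -> list:
    """Execute a query and count it."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


def n_plus_one() -> dict:
    """One query for the list, then one more per user."""
    result = {}
    for user_id, name in run("SELECT id, name FROM users ORDER BY id"):
        rows = run("SELECT name FROM subscriptions WHERE user_id = ? ORDER BY id", (user_id,))
        result[name] = [row[0] for row in rows]
    return result


def single_query() -> dict:
    """One JOIN that fetches users and their subscriptions together."""
    result: dict = {}
    rows = run("""
        SELECT u.name, s.name
        FROM users u
        LEFT JOIN subscriptions s ON s.user_id = u.id
        ORDER BY u.id, s.id
    """)
    for user_name, subscription_name in rows:
        result.setdefault(user_name, [])
        if subscription_name is not None:
            result[user_name].append(subscription_name)
    return result


counter["queries"] = 0
slow_result = n_plus_one()
print(counter["queries"])  # 4

counter["queries"] = 0
fast_result = single_query()
print(counter["queries"])  # 1
```

With 3 users the difference is 4 queries versus 1. With 10,000 users it is 10,001 versus 1.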

Write-heavy operations

Writes are usually more expensive than reads.

Frequent writes can:

lock rows
block reads
increase contention

When reads and writes happen together, they slow each other down.

No read-write separation

Using a single database for everything creates pressure.

Reads and writes compete for the same resources.

A better approach:

primary database for writes
replicas for reads

Without this, scaling becomes harder.

Inefficient data modeling

Poor schema design creates long-term problems.

too many joins
deeply nested relations
unnecessary complexity

This makes queries slower and harder to optimize.

Good design reduces work before optimization is even needed.

Unbounded queries

Queries without limits can become dangerous.

fetching too much data
no pagination
large scans

These queries consume more memory and take longer to execute.

Under load, they affect other queries as well.
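The usual fix is to always bound the result set. This sketch uses keyset pagination on an in-memory SQLite table. The table, the page size, and the function name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO payments (amount) VALUES (?)", [(i,) for i in range(1, 26)])


def fetch_page(after_id: int = 0, page_size: int = 10) -> list:
    """Keyset pagination: return at most page_size rows with id greater than after_id."""
    return conn.execute(
        "SELECT id, amount FROM payments WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()


sizes = []
last_id = 0
page = fetch_page()
while page:
    sizes.append(len(page))
    last_id = page[-1][0]  # remember where this page ended
    page = fetch_page(after_id=last_id)

print(sizes)  # [10, 10, 5]
```

No single query ever returns more than 10 rows, no matter how large the table grows.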

Locking and contention

When multiple operations try to access the same data, locks are created.

Too many locks lead to:

waiting queries
slower execution
reduced throughput

This is common in write-heavy systems.

Database scaling limits

Databases have limits.

Vertical scaling can only go so far:

CPU limits
memory limits
cost increases

Beyond a point, adding more power does not help.

You need better design, not just bigger machines.

Conclusion

The database has limited resources and handles critical operations.

As load increases, small inefficiencies become visible.

Most performance issues are not sudden.

They build slowly and show up when the system is under pressure.

Understanding these patterns helps in avoiding common mistakes.

In the next part, we will look at rate limiting and how controlling traffic can prevent overload.

Thanks for reading.