DEV Community: Neel-Vekariya

How to Prevent Hanging Network Calls — What I Found After Investigating a Real Performance Issue

Neel-Vekariya — Mon, 13 Jul 2026 03:26:53 +0000

In the past two months I explored a lot of things: Express, Fastify, MongoDB, PostgreSQL, Node.js. I built three or four projects in that time.

All of them focused on backend because I haven't learned React yet. But the next one will be full stack.

While building these projects, I noticed one issue that kept coming back.

Every time I called a database or an external service, anything that depends on a network call, things would get weird.

Sometimes the response never came. Sometimes it took way longer than it should. And this was creating a performance issue in my application that I couldn't ignore.

So I investigated it properly. And I found that this is actually a common problem that a lot of developers face, and there's a standard solution for it.

The Solution is a Timeout Wrapper

The idea is simple. When you make a network call, instead of just calling it directly, you pass it through a timeout wrapper.

If the call takes longer than the limit you set, the wrapper either stops waiting for it or cancels it, and your code moves on.

This keeps your application from getting stuck waiting on something that's never going to respond in time.

Two Functions: Two Different Approaches to Timeouts

While building this, I found there are two ways to implement a timeout wrapper. Both solve the same problem but work very differently.

1. withTimeout()

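export function withTimeout(promise, ms, errorMessage = "Operation timed out") {
    let timer
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(errorMessage)), ms)
    })
    // Whichever promise settles first wins; clear the timer so it does not keep the process alive
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}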

This function takes the promise (which is your network call) and the time limit in milliseconds, plus an optional error message.

Inside, it creates a second promise called timeout. This promise does one thing: if the given time runs out, it rejects with an error message.

Then it uses Promise.race(). This function takes both promises and settles with the outcome of whichever one settles first.

If your network call responds in time, Promise.race gives you that response. If the timeout fires first, you get the error.

Simple. Clean. Works.

But there's a catch I found, and this is the important part.

withTimeout() doesn't actually terminate the original network call. If the timeout fires and you move on, the network call is still running in the background.

The socket is still open. Memory is still being used. CPU is still consumed. The request finishes eventually; you just stopped waiting for it. Until then, the resources stay tied up.
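To see this catch in action, here is a small sketch. The slowCall function is a made-up stand-in for a network call that takes 100 ms, and the wrapper is copied in (with its timer cleared once the race settles) so the snippet runs on its own.

```javascript
// Copy of the wrapper so this snippet runs on its own.
function withTimeout(promise, ms, errorMessage = "Operation timed out") {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(errorMessage)), ms);
    });
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical slow operation: takes 100 ms and records when it completes.
let slowCallFinished = false;
function slowCall() {
    return new Promise((resolve) => {
        setTimeout(() => {
            slowCallFinished = true;
            resolve("late result");
        }, 100);
    });
}

async function demo() {
    let timedOut = false;
    try {
        await withTimeout(slowCall(), 20); // give up after 20 ms
    } catch (error) {
        timedOut = error.message === "Operation timed out";
    }
    // Wait past the 100 ms mark: the abandoned call still runs to completion.
    await new Promise((resolve) => setTimeout(resolve, 150));
    return { timedOut, slowCallFinished };
}
```

Calling demo() resolves with both flags set to true: the caller got its timeout error, and the slow call still did all of its work afterwards.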

This is where the second function solves a different problem.

2. fetchWithTimeout()

export function fetchWithTimeout(url, options = {}, timeoutMs = 10_000) {
    const controller = new AbortController()
    const timer = setTimeout(() => controller.abort(), timeoutMs)

    return fetch(url, { ...options, signal: controller.signal })
        .finally(() => clearTimeout(timer))
}

This function takes the URL, options for the network call, and the timeout.

The key difference here is AbortController. This is a standard API, available in browsers and Node.js, built for cancelling asynchronous operations such as fetch.

You create a controller, attach its signal to the fetch call, and start a timer. If the timer runs out before the call responds, controller.abort() is called.

The fetch is terminated. Not just skipped, actually terminated. The returned promise rejects with an AbortError, and the resources the request was holding can be released.

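And the finally() at the end makes sure the timer gets cleared if the call does respond in time. No timer left hanging. One thing to know: fetch resolves as soon as the response headers arrive, so once the timer is cleared, reading the body (for example with response.json()) is no longer covered by this timeout.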

The Real Difference Between Both

This is the thing I want to make clear because it took me a bit to fully understand it.

withTimeout() skips the wait. If the call takes too long, you move on. But the call is still running. Resources are still being consumed. You just stopped caring about the response.

fetchWithTimeout() terminates the call. If it takes too long, the request is actually cancelled. Network socket freed. Memory freed. CPU freed.

For most external API calls where you're using fetch, fetchWithTimeout is the right choice because it actually cleans up after itself.

For other types of async operations (database queries, file reads, anything that returns a promise), withTimeout is what you'd use. Just know that the operation continues in the background even after the timeout fires.
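The cancel-on-timeout idea is not limited to fetch. Some Node.js APIs, such as fs.readFile and the promise-based timers, accept a signal option, and your own functions can honor one too. Below is a minimal sketch of that, not a definitive implementation. Both abortableDelay and runWithAbortTimeout are hypothetical names made up for this example.

```javascript
// Hypothetical cancellable operation: resolves after `ms`, or rejects early
// when the signal aborts.
function abortableDelay(ms, signal) {
    return new Promise((resolve, reject) => {
        if (signal.aborted) {
            reject(signal.reason);
            return;
        }
        const timer = setTimeout(() => {
            signal.removeEventListener("abort", onAbort);
            resolve("done");
        }, ms);
        function onAbort() {
            clearTimeout(timer); // the pending work is released, not just ignored
            reject(signal.reason);
        }
        signal.addEventListener("abort", onAbort, { once: true });
    });
}

// Runs `operation(signal)` and aborts it if it takes longer than `ms`.
async function runWithAbortTimeout(operation, ms) {
    const controller = new AbortController();
    const timer = setTimeout(
        () => controller.abort(new Error("Operation timed out")),
        ms
    );
    try {
        return await operation(controller.signal);
    } finally {
        clearTimeout(timer);
    }
}
```

A call would look like runWithAbortTimeout((signal) => abortableDelay(500, signal), 20). This only cancels real work when the operation actually listens to the signal. If it ignores the signal, you are back to the withTimeout situation.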

Why This Actually Matters

When I was building my projects and network calls were just hanging, my application was stuck. The user was waiting. The server was waiting. Nothing was happening.

After adding timeout wrappers, if a call hangs, the application moves on. The user gets a proper error response instead of waiting forever. And with fetchWithTimeout, the resources actually get freed instead of piling up in the background.

This is the kind of thing that feels small but makes a real difference in how your application behaves under real conditions.

One investigation. One problem. Two functions that solved it.

If I got something wrong or anything can be improved, please drop it in the comments. I'm still learning and I want to get this right.

The Next Token, Not the Next Script

Neel-Vekariya — Wed, 08 Jul 2026 05:45:51 +0000

Everyone is using AI to write code now. But I think the real future of programming isn't AI generating full scripts or full products. It's AI predicting the next token, and us deciding whether that token is right or not.

Here's why. When you write code manually, you're already doing something similar in your head: predicting the next line based on patterns you've seen before, then checking if it's correct as you write it. Every line passes through your judgment.

When AI generates a full script or product, that judgment gets skipped. You review the code, sure, but not the way you'd review your own. You skim "this line does X, this line does Y" and move on. You're checking outcomes, not making decisions. That's a completely different mental process, and it's shallower.

But when AI suggests the next token, or the next line, instead of the next thousand lines, you're forced back into the same loop you'd use writing manually: is this right, or not? You stay the one making the call.

That's why I think the best use of AI in programming isn't "generate my whole app." It's "predict what comes next, and let me decide." Smaller suggestions keep engineers in the driver's seat. Full generations quietly take the wheel.

Request Scoped Context – How I Stopped Passing Arguments Everywhere

Neel-Vekariya — Mon, 29 Jun 2026 07:40:02 +0000

In my past two backend projects, I used to pass user info manually. I thought this was the right method. Every time a function needed some info, I passed it as a parameter. Simple. Direct. Clean enough.

Then I started building a clothing rental booking app.

At the start everything was fine. I had orderId coming in with the request. I passed it to my booking service. The service passed it to the availability checker. I thought this was a clean approach. I felt good about the code.

Then the app started getting more complex.

Now my orderId needed to travel from the controller to the booking service to the availability checker to the pricing calculator to the logger. Every function in this chain needed that orderId as a parameter.

Even functions that did not use orderId directly were just holding it for a moment and passing it to the next function. The function does nothing with this parameter except pass it forward.

Like a relay race where some runners are not actually running. They are just standing there, holding the baton, waiting to give it to someone else.

This is what some people call parameter pollution, or parameter drilling. I didn't know the name then. I just knew something was wrong.

The second problem hit me when I needed to add new info deep in the execution. That meant going back and adding that parameter to every function in the chain.

Not because all the functions use it, just because they are sitting in the middle. One change, five files touched. This is tight coupling. Changing something simple creates maintenance overhead everywhere.

The third problem is logging. To attach orderId to every log entry I needed to carry it all the way down the entire execution path. More parameters. More of the same work again and again. Copy-pasting the same argument into every function signature.

I was writing the same thing again and again. This kind of repetition is not productive. It is just noise.

So I started searching for a better approach: how to pass context without manually threading it through every call. And I found something I had never heard of before.

AsyncLocalStorage.

My first reaction was wow. This is the exact solution for my problem.

AsyncLocalStorage is a Node.js feature from the async_hooks module. The idea is simple. You create a storage that is scoped to a specific async execution, meaning each request gets its own private storage.

You store your orderId, userId, or any context once at the entry point of the request. After that, any function in the entire execution chain can read it directly. No passing. No threading. No pollution.

Node.js tracks async context internally. When you start a new context using AsyncLocalStorage, every function and async operation started inside it inherits access to that same storage. No matter how deep the call stack goes. The info is just there, available, without anyone carrying it.

To build this in your project you need only two functions.

import { AsyncLocalStorage } from "async_hooks";

const requestContext = new AsyncLocalStorage();

export function runWithContext(context, fn) {
  return requestContext.run(context, fn);
}

export function getContext() {
  return requestContext.getStore() || {};
}

runWithContext is called at the entrance of your request, in your middleware or route handler. You pass the context object with whatever info you need, and the function to run. Everything inside that function now has access to the same storage automatically.

getContext is called anywhere inside the execution: deep inside a service, inside a logger, inside a helper. It reads from the current async context. No parameters needed. Just import getContext, call it, and get what you need.

Now my logger does not need orderId as a parameter. It just calls getContext() and reads it. My availability checker is not carrying a parameter it doesn't use. My pricing calculator is clean. Every function only takes what it actually needs to do its job.
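Here is a small sketch of how that can look end to end. The logger and the two services are hypothetical stand-ins for the ones in my booking app, and the two helper functions are recreated inside demo so the snippet runs on its own. It starts two overlapping "requests" to show that each one only ever sees its own orderId.

```javascript
// Dynamic import() is used so the sketch runs in both CommonJS and ES module files.
async function demo() {
  const { AsyncLocalStorage } = await import("node:async_hooks");
  const requestContext = new AsyncLocalStorage();

  const runWithContext = (context, fn) => requestContext.run(context, fn);
  const getContext = () => requestContext.getStore() || {};

  const logLines = [];
  // Hypothetical logger: reads orderId from the context, not from a parameter.
  function log(message) {
    logLines.push(`[order ${getContext().orderId}] ${message}`);
  }

  // Hypothetical services: neither one accepts or forwards orderId.
  async function checkAvailability() {
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulated I/O
    log("availability checked");
  }
  async function createBooking() {
    log("booking started");
    await checkAvailability();
  }

  // Two overlapping "requests", each with its own private store.
  await Promise.all([
    runWithContext({ orderId: "A1" }, createBooking),
    runWithContext({ orderId: "B2" }, createBooking),
  ]);
  return logLines;
}
```

Every log line comes out tagged with the right order, even though the two requests interleave and no function in the chain takes orderId as an argument.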

The code became honest. Functions stopped pretending to care about things they don't care about.

When I first saw this working in my booking app the feeling was like, finally. Not just because the code was cleaner, but because I understood why the old approach was a problem in the first place.

A small codebase hides these problems. You don't feel the pain until complexity grows. By then the bad pattern is already everywhere and refactoring is expensive.

This is one thing I wish I had known earlier. Not because it is hard to understand; it is actually simple. But because the problem it solves only becomes visible at a certain scale. And when you finally see it, you can't unsee it.

If you are building something today and passing the same parameter through three or more functions just to reach one inner function, this is your solution.

Store it once at the entry point. Read it anywhere. Let your functions stay focused on what they actually do.

If I made any mistakes in this — please mention in the comments. I'll correct it.

Why Problems Keep Me Going in Tech

Neel-Vekariya — Thu, 25 Jun 2026 07:40:29 +0000

As a learner in tech, every day I somehow face a new problem I didn't even know existed yesterday.

And honestly I love this.

When something crashes, when a feature breaks, when I have no idea why the code is doing what it's doing, that struggle of finding the bug, digging into it, understanding it gives me a kind of motivation I can't explain. It's like a dopamine hit. The feeling after solving that problem is incredible.

I realized something because of this. Without problems, writing code becomes a repetitive, boring task. Zero motivation. Zero excitement. Just the same thing again and again.

But when something breaks, when I actually have to think, investigate, sit with the problem, that's when my boundaries expand. That's when I actually learn something.

The Comfort Zone

But I also noticed something else about myself.

The moment I get comfortable with something, I lose interest in it.

It starts in the learning phase. Everything is new, everything is a problem, everything is exciting.

I'm struggling, I'm figuring things out, I'm building. And then slowly I get comfortable. That struggle disappears.

The thing that was exciting becomes a repetitive task. Boring. Routine.

And then I move on to the next thing.

This happened with MongoDB. I started learning it and faced problems: connection issues, queries not giving correct responses, things not working the way I expected.

I struggled. I used it in my backend project. And then I got comfortable. The interest just went away. MongoDB became something I already knew.

So I moved to PostgreSQL.

Same thing happened with Express. I got comfortable, lost interest.

That's actually one of the reasons I moved to Fastify: I needed something new to struggle with again.

The Interesting Part

At first I thought this was a problem. Like maybe I'm not focusing enough. Maybe I should stay with one thing longer.

But then I noticed something.

Express took me two and a half months to learn. Fastify took one week.

PostgreSQL took even less time than that.

The time it takes me to learn something new keeps getting smaller. Each thing I learn becomes the foundation for the next thing. The struggles I had before make the new struggles easier to work through.

Someone once said, "Today's knowledge becomes the foundation of tomorrow."

I think about this a lot, and it's true. Every uncomfortable thing I pushed through, every bug that frustrated me, every late night trying to figure out why something wasn't working, all of that is why the next hard thing feels a little less hard.

What I Actually Think This Is

I don't think I get bored of things. I think I outgrow them.
The problems that once challenged me stop being problems. And without problems, there's no growth. So naturally I move toward the next thing that will challenge me again.

That feeling, the discomfort, the struggle, the not knowing, is not a sign that something is wrong. It's actually the sign that learning is happening.

I want to stay in that feeling as long as I can.

If I made any mistakes in this — please mention in the comments. I'll correct it.

Circuit Breaker Explained Through Real Failure Experience

Neel-Vekariya — Mon, 22 Jun 2026 06:33:06 +0000

In the past few days I faced something I didn't expect.

Features breaking one after another. Endpoints not working. Failures coming in a chain: one failing, that causing another to fail, and that causing another. And I had no control over any of it. No way to stop it. No way to understand why it was happening. The failures just kept coming and my system kept getting more overloaded.

I won't lie, I wanted to quit. Multiple times. I hit a point where I thought maybe this is not for me.

But there's something that keeps pulling me back. Curiosity. That one feeling of "I want to understand this. I want to fix this." That's the fuel that kept me going even when I was completely frustrated.

And I gave it another chance. I sat with the problem again. And this time I found something that explained exactly what was happening and gave me a way to fix it.

That thing is called a circuit breaker.

What Was Actually Happening

Before I found this solution, my system was doing something really inefficient and I didn't even realize it.

Every time a service was failing, I was still sending requests to it. The service is down, failing, completely broken, and I'm still passing requests through the router to it. Every single one of them failing. My server kept trying, resources kept getting consumed, and nothing was recovering. The failures were just piling up.

What I needed was a way to say: okay, this service has failed too many times. Stop sending requests to it. Give it time to recover. Then try again carefully.

That's exactly what a circuit breaker does.

The Electrical Analogy

Think about an electrical circuit breaker in your home. When too much current flows, because something is overloaded or something is wrong, the breaker trips. It opens the circuit. Power stops flowing. This protects everything from burning out.

Then after a while, you reset it carefully. If everything is fine, the circuit closes again and power flows normally.

Circuit breakers in code work on the same principle.

Three States: This is the Core Idea

CLOSED

Normal operation. All requests are allowed through. But failures are being tracked. If the number of failures crosses the threshold you set, the circuit opens.

OPEN

The circuit has tripped. All requests are blocked immediately. The wrapped function doesn't even run. The system waits for the cooldown period to pass. This is the healing time: your server recovers, and the failing service gets time to come back up.

HALF_OPEN

After the cooldown, the circuit doesn't immediately go back to CLOSED. It moves to HALF_OPEN first. A limited number of requests are allowed through, just enough to test whether the service has actually recovered. If those requests succeed, the circuit closes again and everything goes back to normal. If they fail, the circuit opens again and the cooldown starts over.
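The three states and the moves between them can be sketched as a small lookup table. This is only an illustration of the state machine described above. The event names are labels I made up for the conditions in the text.

```javascript
// Event names are hypothetical labels for the conditions described above.
const TRANSITIONS = {
    CLOSED: { FAILURE_THRESHOLD_REACHED: "OPEN" },
    OPEN: { COOLDOWN_ELAPSED: "HALF_OPEN" },
    HALF_OPEN: { TRIAL_SUCCEEDED: "CLOSED", TRIAL_FAILED: "OPEN" },
};

// Returns the next state, or the same state when the event does not apply.
function nextState(state, event) {
    const next = TRANSITIONS[state] && TRANSITIONS[state][event];
    return next || state;
}

// One full trip through the breaker, starting from CLOSED.
const trip = ["FAILURE_THRESHOLD_REACHED", "COOLDOWN_ELAPSED", "TRIAL_SUCCEEDED"]
    .reduce((states, event) => [...states, nextState(states[states.length - 1], event)], ["CLOSED"]);
// trip is ["CLOSED", "OPEN", "HALF_OPEN", "CLOSED"]
```

Notice there is no direct path from OPEN back to CLOSED. The only way to close the circuit again is a successful trial in HALF_OPEN.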

The Code


export class CircuitBreaker {
    constructor(failureThreshold, cooldownMs) {
        this.failureThreshold = failureThreshold
        this.cooldownMs = cooldownMs
        this.state = "CLOSED"
        this.failureCount = 0
        this.lastFailureTime = null
    }

    openCircuit() {
        this.state = "OPEN"
        this.lastFailureTime = Date.now()
    }

    closeCircuit() {
        this.state = "CLOSED"
        this.failureCount = 0
        this.lastFailureTime = null
    }

    halfOpenCircuit() {
        this.state = "HALF_OPEN"
    }

    async execute(fn) {
        if (this.state === "OPEN") {
            const cooldownExpired = Date.now() - this.lastFailureTime >= this.cooldownMs
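            if (!cooldownExpired) {
                // ApiError is assumed to be the project's own error class (status code, message); it is not defined in this snippet
                throw new ApiError(503, "Circuit is open.")
            }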
            this.halfOpenCircuit();
        }

        try {
            const result = await fn()
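            if (this.state === "HALF_OPEN") {
                this.closeCircuit()
            } else {
                // A success while CLOSED resets the count, so only consecutive failures trip the breaker
                this.failureCount = 0
            }
            return result;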
        } catch (error) {
            if (this.state === "HALF_OPEN") {
                this.openCircuit()
                throw error;
            }
            this.failureCount++
            if (this.failureCount >= this.failureThreshold) {
                this.openCircuit()
            }
            throw error
        }
    }
}

Where This Actually Gets Used

This isn't just for one specific case. Any time your system depends on an external service that can fail, a circuit breaker makes sense.

Payment gateways. External APIs. Microservices talking to each other. Database connections. Third party integrations. All of these can fail temporarily. And without a circuit breaker, your system will keep hammering them even when they're down, wasting resources, slowing everything else down, and making recovery harder.

What This Actually Fixed for Me

Before the circuit breaker, failures came in a chain. One service went down, requests kept hitting it, my system kept consuming resources on requests that were definitely going to fail, and everything got worse.

After the circuit breaker, the moment failures cross the threshold, the circuit opens. Requests stop hitting the failing service. The system gets breathing room. The service gets time to recover. And when it comes back, HALF_OPEN tests it carefully before trusting it again.

That frustration I had, the failures I couldn't control, the system I couldn't stabilize, all of it was because I had no mechanism to stop the bleeding when something went wrong.

The circuit breaker is that mechanism.

If I got something wrong or anything can be improved — please drop it in the comments. I'm still learning and I want to get this right.

Semaphore — How to Control How Many Operations Run at the Same Time

Neel-Vekariya — Wed, 17 Jun 2026 06:45:03 +0000

What if 100 requests hit your server at the same time and all of them try to connect to your database simultaneously?

I started thinking about this while building my project. I had worker threads running, multiple operations happening at once, and at some point I realized there's no control here. Everything is just running at the same time. No limit. No waiting. Just all of it, at once.
And that's a problem.

What a Semaphore Actually Means

Think about a restaurant with only 10 tables.
100 people want to eat. But only 10 can sit at a time. The rest wait outside. When one table frees up, the next person in line comes in. Nobody rushes in randomly. There's order. There's control.

That's exactly what a semaphore does for your code.

A semaphore controls how many operations can run at the same time. If the limit is reached, the rest wait in a queue. When one operation finishes and releases its slot, the next one in the queue gets to run.

Why This Actually Matters

Without this kind of control, you can easily overwhelm a resource.

Your database has a connection limit. Your external API has a rate limit.

Your file system slows down if too many things try to write at the same time. If you just let everything run at once without any control, you're not managing your resources, you're just hoping nothing breaks.

I kept seeing this pattern while building production-level infrastructure. Every piece of it has some form of concurrency control. Rate limiting for incoming requests. Connection pooling for databases. And semaphores for controlling how many operations run at a time inside your own system.

How the Semaphore Works

The implementation is simpler than it sounds.

export class Semaphore {
    constructor(maxConcurrent) {
        this.maxConcurrent = maxConcurrent
        this.current = 0
        this.queue = []
    }

    acquire() {
        return new Promise((resolve) => {
            if (this.current < this.maxConcurrent) {
                this.current++;
                resolve();
            } else {
                this.queue.push(resolve)
            }
        })
    }

    release() {
        this.current--;
        if (this.queue.length > 0) {
            this.current++;
            const next = this.queue.shift()
            next();
        }
    }

    async run(fn) {
        await this.acquire();
        try {
            return await fn()
        } finally {
            this.release()
        }
    }
}

Three parts. That's it.

acquire(): before an operation starts, it asks the semaphore for a slot. If a slot is available, it gets one and runs immediately. If not, it waits in the queue. That waiting is handled by a Promise that only resolves when a slot opens up.

release(): when an operation finishes, it gives the slot back. And if there's anything waiting in the queue, the next one gets the slot immediately.

run(): this is the wrapper you actually use. It acquires the slot, runs your function, and releases in a finally block. The finally is important: it makes sure the slot is always released, even if the operation throws an error. No slot gets stuck.

What This Looks Like in Practice

Say you have 50 API calls to make but you only want 5 running at the same time.

const semaphore = new Semaphore(5);

const results = await Promise.all(
    requests.map(req => semaphore.run(() => callAPI(req)))
);

All 50 are queued up. But only 5 run at a time. As each one finishes, the next one in line starts. Controlled. Predictable. Your API doesn't get overwhelmed.
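One way to convince yourself of this is to count how many calls overlap. The sketch below includes a condensed copy of the Semaphore class so it runs on its own, and fakeCall is a hypothetical stand-in for callAPI that just waits 10 ms and tracks the peak number of calls in flight.

```javascript
// Condensed copy of the Semaphore class so this snippet runs on its own.
class Semaphore {
    constructor(maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
        this.current = 0;
        this.queue = [];
    }
    acquire() {
        return new Promise((resolve) => {
            if (this.current < this.maxConcurrent) {
                this.current++;
                resolve();
            } else {
                this.queue.push(resolve);
            }
        });
    }
    release() {
        const next = this.queue.shift();
        if (next) next();      // hand the slot straight to the next waiter
        else this.current--;   // nobody waiting: free the slot
    }
    async run(fn) {
        await this.acquire();
        try { return await fn(); } finally { this.release(); }
    }
}

// Hypothetical stand-in for callAPI: waits 10 ms and tracks overlapping calls.
let active = 0;
let peak = 0;
async function fakeCall(id) {
    active++;
    peak = Math.max(peak, active);
    await new Promise((resolve) => setTimeout(resolve, 10));
    active--;
    return id * 2;
}

async function demo() {
    const semaphore = new Semaphore(5);
    const requests = Array.from({ length: 50 }, (_, i) => i);
    const results = await Promise.all(
        requests.map((req) => semaphore.run(() => fakeCall(req)))
    );
    return { peak, results };
}
```

With the semaphore in place, peak ends up at 5. If you call fakeCall directly inside Promise.all instead, all 50 start at once.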

What I Realized

When I first wrote concurrent code, I just used Promise.all() and let everything run at the same time. It worked. Until it didn't.

The moment the load increased (too many database connections, too many simultaneous operations), things started breaking. And I didn't understand why at first because the code looked fine.

That's the thing about concurrency. It's not just about making things run at the same time. It's about controlling how many things run at the same time. That control is what keeps your system stable under real load.
Semaphore is one of the simplest ways to add that control. And once you understand it, you start seeing places where you need it everywhere.

If I got something wrong or anything can be improved — please drop it in the comments. I'm still learning and I want to get this right.

Retry in Distributed Systems — How Production Systems Recover From Temporary Failures

Neel-Vekariya — Tue, 16 Jun 2026 06:11:58 +0000

Not every failure is permanent.

This is something I didn't think about before. When something failed in my app, my first thought was "something broke, fix it." But when I started learning how distributed systems actually work, I realized that some failures are not really failures. They're just temporary.

Network glitch. API timeout. A service that just restarted. Rate limiting kicking in. These are all failures but they last for a very short time window. If your system tries the same operation again after a few seconds, it will probably succeed.

So the question is: does your system know how to try again? Or does it just give up the first time something goes wrong?
That's what retry is about.

What Retry Actually Does

Without a retry system, if a temporary failure happens, that's it. The entire operation fails. The user sees an error. The request is gone.
With retry, your system automatically attempts the operation again after a failure. The goal is simple: recover from temporary failures without the user even knowing something went wrong.

This felt obvious to me once I understood it. But building it properly is where it gets interesting.

The Configuration: What Each Part Controls

When I looked into how retry systems are actually configured, there were more options than I expected. And each one exists for a specific reason.

maxAttempts — this defines the maximum number of times the operation can be attempted. You don't want infinite retries. At some point if it keeps failing, it's probably not a temporary problem.

exponential backoff — instead of retrying immediately every time, the delay between retries doubles after each failure. First retry after 1 second, second after 2 seconds, third after 4 seconds. This gives the failing service time to recover instead of bombarding it with requests.

baseDelay — this is the starting delay used in the exponential backoff. The first wait time before retrying.

maxDelay — this caps the maximum delay. Without this, the exponential backoff keeps doubling forever and the delay becomes too long to be useful.

shouldRetry — this determines whether another retry should actually happen. Not every error deserves a retry. A 404 is not a temporary failure. A network timeout is. This config lets you define that logic.

onRetry — a callback that runs before each retry attempt. This is mainly used for logging, metrics, and monitoring. So you have a record of how many retries happened and why.
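To make these options concrete, here is a minimal sketch of a retry helper that uses all of them. It is my own illustration under the option names listed above, not the API of any particular library, and it leaves out jitter.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `fn` until it succeeds, the error is not retryable, or attempts run out.
async function retry(fn, {
    maxAttempts = 3,
    baseDelay = 1000,
    maxDelay = 30_000,
    shouldRetry = () => true,
    onRetry = () => {},
} = {}) {
    for (let attempt = 1; ; attempt++) {
        try {
            return await fn(attempt);
        } catch (error) {
            const isLastAttempt = attempt >= maxAttempts;
            if (isLastAttempt || !shouldRetry(error)) throw error;
            // Exponential backoff: baseDelay, then 2x, then 4x, capped at maxDelay
            const delay = Math.min(baseDelay * 2 ** (attempt - 1), maxDelay);
            onRetry({ attempt, delay, error });
            await sleep(delay);
        }
    }
}
```

A call might look like retry(() => fetchOrder(id), { maxAttempts: 4, shouldRetry: (error) => error.status !== 404 }), where fetchOrder is a hypothetical function of yours. The last error is rethrown once the attempts run out, so the caller still sees a real failure.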

But There's a Catch — The Thundering Herd Problem

This is the part that I found really interesting.
Imagine 200 workers all connected to the same service. The service goes down for a moment. All 200 workers detect the failure and, because they all use the same exponential backoff calculation, they all wait the same amount of time and then retry at the exact same moment.

What happens? 200 requests hit the service at the same time the moment it comes back up. The service crashes again. Then all 200 retry again at the same time. It becomes a loop.

This is called the thundering herd problem. And exponential backoff alone doesn't prevent it because all workers are using the same delay calculation.

Jitter: The Solution for Thundering Herd

Jitter means adding randomness to the retry timing. Instead of every worker waiting exactly 2 seconds, one waits 1.7 seconds, another waits 2.3 seconds, another waits 1.4 seconds. The requests spread out across a time window instead of hitting all at once.

This one small addition of randomness in the delay goes a long way toward solving the thundering herd problem. The service gets requests gradually, has room to recover, and the retry system works the way it's supposed to.
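Here is a minimal sketch of a jittered delay calculation. The function name, the jitterRatio option, and the injectable random function are assumptions made for this example. It spreads each delay by plus or minus 30% around the exponential value, which matches the 1.4 to 2.3 second range in the example above. Other jitter strategies exist too.

```javascript
// Exponential backoff with +/- jitterRatio of randomness around each delay.
function jitteredDelay(attempt, {
    baseDelay = 1000,
    maxDelay = 30_000,
    jitterRatio = 0.3,
    random = Math.random, // injectable so the result can be tested
} = {}) {
    const exponential = Math.min(baseDelay * 2 ** (attempt - 1), maxDelay);
    const spread = exponential * jitterRatio;
    // Uniform value between (exponential - spread) and (exponential + spread)
    const jittered = exponential - spread + random() * 2 * spread;
    return Math.min(Math.round(jittered), maxDelay);
}

// Five workers on their second attempt no longer share one delay:
// each value falls somewhere between 1400 and 2600 ms.
const workerDelays = Array.from({ length: 5 }, () => jitteredDelay(2));
```

The wider the spread, the more the retries are smeared out over time, at the cost of some workers waiting a bit longer than strictly needed.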

Why This Matters in Production

What I realized going through this: retry is not just "try again". It's a carefully designed system. Without exponential backoff, you overload the failing service. Without jitter, you get thundering herd. Without maxAttempts, you retry forever. Without shouldRetry, you retry on errors that will never recover.

Every config option exists because someone ran into a real problem in production.

That's the thing about distributed systems: the failures are real, the edge cases are real, and every piece of this infrastructure exists because someone hit a wall and had to find a way through it.

If I got something wrong or anything can be improved — please drop it in the comments. I'm still learning and I want to get this right.

Rate Limiting — What Happens When Too Many Requests Hit Your Server

Neel-Vekariya — Mon, 15 Jun 2026 09:23:12 +0000

What if the number of requests coming to your server is more than what your server can actually handle?

Think about this for a second. Your server has a limit — how many requests it can process at a time. If that limit gets crossed, what happens? Your server starts struggling. Response times go up. And in the worst case, your server goes down. Crashes completely.

I started thinking about this — how do production systems actually prevent this from happening? Because this is not a rare situation. One bad actor, or even a sudden spike in traffic, and your server is in trouble.

That's where rate limiting comes in.

What Rate Limiting Actually Does

The idea is simple. You set a limit: how many requests are allowed in a certain time window. If the number of requests goes beyond that limit, the rate limiter steps in and stops the extra requests. It protects your server before things get out of control.

While reading about this, I found there isn't just one way to do this. There are multiple algorithms: fixed window counter, leaky bucket, token bucket, sliding window log, and sliding window counter. Each one handles the counting and limiting logic a bit differently. Some are simple but have edge cases at window boundaries. Some are smoother but a bit more complex to implement. The core idea behind all of them is the same: count requests, compare with the limit, decide whether to allow or block.
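As an illustration, here is a minimal sketch of the simplest of these, the fixed window counter. The class name and the injectable clock are assumptions made for this example. It also shows the window-boundary edge case mentioned above.

```javascript
// Fixed window counter: at most `limit` requests per key in each window.
class FixedWindowLimiter {
    constructor(limit, windowMs, now = () => Date.now()) {
        this.limit = limit;
        this.windowMs = windowMs;
        this.now = now;           // injectable clock, handy for testing
        this.entries = new Map(); // key -> { window, count }
    }

    allow(key) {
        const currentWindow = Math.floor(this.now() / this.windowMs);
        let entry = this.entries.get(key);
        if (!entry || entry.window !== currentWindow) {
            // First request in a new window: start counting from zero
            entry = { window: currentWindow, count: 0 };
            this.entries.set(key, entry);
        }
        if (entry.count >= this.limit) return false;
        entry.count += 1;
        return true;
    }
}

// Boundary edge case: 2 requests at the end of one window and 2 at the start
// of the next are all allowed, although they arrive within 1 ms of each other.
let fakeNow = 999;
const limiter = new FixedWindowLimiter(2, 1000, () => fakeNow);
const burst = [limiter.allow("user-1"), limiter.allow("user-1")];
fakeNow = 1000;
burst.push(limiter.allow("user-1"), limiter.allow("user-1"));
// burst is [true, true, true, true]
```

This sketch never removes old keys from the Map, so a real version would also need some cleanup. The sliding window variants exist mainly to smooth out that boundary burst.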

The Good Part: You Don't Have to Build This From Scratch

Here's something that made this easier than I expected. The Node.js ecosystem already has a ready-made solution for this: express-rate-limit, a rate-limiting middleware for Express. You don't need to implement any of these algorithms yourself. Just install the package and configure it.

This is the kind of infrastructure you build specifically for public APIs, authentication endpoints, and expensive endpoints, the ones that are most likely to get hit hard, either by real traffic spikes or by someone trying to abuse your system.

Where Redis Comes In

Now here's a question I had: what if your application is not running on just one server? What if it's running across multiple pods, like in a Kubernetes setup?

If each pod keeps its own request counter, the rate limiting stops being accurate. One user could hit pod 1 and get counted there, then hit pod 2, where their count starts from zero, effectively bypassing the limit. With several pods, the same user can get through several times the intended limit.

This is where Redis comes in. Instead of each pod keeping its own count, you set up a central counter using Redis. Every pod checks and updates the same counter. That's what rate-limit-redis is for: connecting express-rate-limit to a shared Redis store so the count stays accurate no matter how many pods your app is running on.
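The difference is easy to see in a tiny simulation. This is only a sketch: a plain Map stands in for Redis, there is no time window, and createPod and countAllowed are hypothetical helpers made up for the example.

```javascript
// Minimal counter-based limiter. `store` is any Map-like object; a shared Map
// stands in for Redis here (assumption: no real Redis client is involved).
function createPod(store, limit) {
    return {
        handle(userId) {
            const count = (store.get(userId) || 0) + 1;
            store.set(userId, count);
            return count <= limit; // true = allowed, false = blocked
        },
    };
}

// Sends `total` requests from one user, alternating between the pods.
function countAllowed(pods, total) {
    let allowed = 0;
    for (let i = 0; i < total; i++) {
        if (pods[i % pods.length].handle("user-1")) allowed++;
    }
    return allowed;
}

const LIMIT = 3;

// Each pod has its own counter.
const isolated = countAllowed(
    [createPod(new Map(), LIMIT), createPod(new Map(), LIMIT)],
    10
);

// Both pods share one counter.
const sharedStore = new Map();
const shared = countAllowed(
    [createPod(sharedStore, LIMIT), createPod(sharedStore, LIMIT)],
    10
);
// isolated is 6 (twice the limit), shared is 3
```

With a real shared store the read-and-increment step also has to be atomic, which is one reason Redis fits well here: its INCR command is atomic.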

The Configuration: What Each Option Actually Does

When I looked into express-rate-limit, there were a bunch of config options. At first they looked like just settings, but each one actually controls a specific part of the behavior.

windowMs — this defines the duration of the rate-limiting window. Basically, the time frame in which requests are counted.

max — the maximum number of requests allowed during that window. Cross this, and the limiter kicks in. Newer versions of the library call this option limit and keep max as an older alias.

standardHeaders — adds the modern rate-limit headers to the response, so clients know their current limit status.

legacyHeaders — adds the old-style rate limit headers, for backward compatibility.

store — this determines where the request counter is actually stored. By default it's in memory. But for distributed systems, this is where you'd plug in Redis instead.

keyGenerator — this defines how a user is identified. Usually by IP, but it can be customized — by user ID, API key, whatever makes sense for your system.

handler — a custom response for when the limit is exceeded. Instead of a generic error, you can send exactly the response you want.
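Put together, the options could look something like this. It's a sketch of an options object only, so it runs without the library installed. The numbers are arbitrary, and identifying callers by an x-api-key header is a hypothetical policy I picked for the example.

```javascript
// Options object in the shape express-rate-limit expects.
// With the library installed, this would be passed to its rateLimit() function
// and the resulting middleware mounted with app.use().
const limiterOptions = {
  windowMs: 15 * 60 * 1000, // count requests over a 15-minute window
  max: 100,                 // allow 100 requests per key per window
  standardHeaders: true,    // send the standardized RateLimit-* headers
  legacyHeaders: false,     // skip the older X-RateLimit-* headers
  // store is omitted here, so the library would use its in-memory default.

  // Hypothetical policy: identify callers by API key, falling back to IP.
  keyGenerator: (req) => req.headers['x-api-key'] || req.ip,

  // Custom response once the limit is exceeded.
  handler: (req, res) => {
    res.status(429).json({ error: 'Too many requests, please try again later.' });
  },
};

// Quick check of the key logic with plain objects standing in for requests.
console.log(limiterOptions.keyGenerator({ headers: { 'x-api-key': 'abc' }, ip: '10.0.0.1' })); // abc
console.log(limiterOptions.keyGenerator({ headers: {}, ip: '10.0.0.1' }));                     // 10.0.0.1
```

For a multi-pod deployment, the store option is where a Redis-backed store would be added.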

Why This Matters

What I realized going through this: rate limiting is not just about blocking abuse. It's about protecting your server's stability. Without it, one spike in traffic, intentional or not, can take your entire system down.

And because this comes as a configurable, ready-made solution, it's not something you need to over-engineer. You just need to understand what each piece does, and configure it according to how your system is set up.

That's the thing I keep noticing as I learn more about production systems: most of the hard problems already have solutions. The real skill is knowing the problem exists in the first place, and knowing where to look.

If I got something wrong or anything can be improved, please drop it in the comments. I'm still learning and I want to get this right.

How GitHub Webhook Signature Verification Works — HMAC Explained Simply

Neel-Vekariya — Fri, 12 Jun 2026 06:09:35 +0000

Do you ever think about how you push code on GitHub and it automatically updates on your deployed server?

Like, you just pushed. Nobody manually pulled the code. Nobody SSHed into the server. It just updated. How does that even work?

The mechanism behind this is called a webhook, and what makes it safe is the webhook signature. To understand it, I have an example. A webhook signature is like human trust. If two people have a deep level of trust and one of them sends a signal, the other one follows it. No questions asked. Because the trust is already established between them.
Webhooks work the same way.

The Full Flow: What Actually Happens

When you push code to GitHub, here's exactly what happens behind the scenes.

GitHub creates a signature using two things: the payload (the data about your push) and a secret key. That secret key is added by you in the GitHub webhook settings. GitHub generates the signature and sends it along with the payload to your server's webhook endpoint. The signature travels in the X-Hub-Signature-256 request header.

Your server receives them. But your server doesn't just trust what GitHub sent. It creates its own signature using the same secret key and the same payload. Then it compares both signatures.

If they match, the request is authentic. The server runs the pull and the code gets updated.

If they don't match, the request is rejected.

That's HMAC: Hash-based Message Authentication Code. The entire trust system between GitHub and your server is built on a shared secret that only the two of them know.
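Here is a small sketch of that idea using Node's built-in crypto module. The secret and payload are placeholders, and the sign helper is my own name. The plain === is used only to keep the illustration short; a real check should use a constant-time comparison.

```javascript
const crypto = require('node:crypto');

// Computes an HMAC-SHA256 signature in the "sha256=<hex>" format.
function sign(payload, secret) {
  return 'sha256=' + crypto.createHmac('sha256', secret).update(payload).digest('hex');
}

const secret = 'example-secret'; // placeholder, never hard-code a real secret
const payload = JSON.stringify({ ref: 'refs/heads/main' });

const sent = sign(payload, secret);           // what the sender attaches
const recomputed = sign(payload, secret);     // what the receiver calculates
const tampered = sign(payload + ' ', secret); // body changed along the way
const wrongSecret = sign(payload, 'guess');   // someone who doesn't know the secret

console.log(sent === recomputed);  // true
console.log(sent === tampered);    // false
console.log(sent === wrongSecret); // false
```

Change a single byte of the payload or the secret and the signature no longer matches, which is exactly why a forged request fails the check.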

But Why Do We Even Need This

This is the part most people skip. They understand what happens but not why it exists.

Your server's webhook endpoint is publicly available. Anyone on the internet can send a POST request to it. Without verification, your server has no way to know who actually sent that request. Was it GitHub? Was it someone else?

Without signature verification, anyone could send a fake request to your webhook endpoint and trigger a code update. Or worse, modify your data. Your server would have no way to tell the difference.

The second reason is resources, and this one is about webhooks themselves. Without webhooks, the alternative is polling: constantly asking GitHub whether anything changed. That's slow to react and wastes resources. Webhooks solve this: GitHub tells you when something happens, and you verify that it's actually GitHub telling you.

But There's a Catch: Replay Attacks

If someone intercepts the request and copies the exact same payload and signature, they can resend it. The signature would still match because the payload is the same.

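To prevent this, the request needs a uniqueness factor. Some webhook providers (Stripe, for example) add a timestamp and include it in the signature calculation, so the receiver can reject anything older than a few minutes. GitHub's signature covers only the request body, so it doesn't give you a signed timestamp. What GitHub does send is a unique ID for every delivery in the X-GitHub-Delivery header.

So with GitHub, the usual defenses are HTTPS, which makes intercepting the request hard in the first place, and keeping track of the delivery IDs you have already processed. If the same delivery shows up again, request rejected.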
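One way to reject replays is to remember which deliveries were already processed. A minimal sketch, assuming every delivery carries a unique ID (GitHub sends one in the X-GitHub-Delivery header). The DeliveryTracker class and the 10-minute retention are my own choices for the example.

```javascript
// Remembers delivery IDs for a limited time and flags any ID seen twice.
class DeliveryTracker {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.seen = new Map(); // deliveryId -> time it was first seen
  }

  // Returns true the first time an ID is seen, false for a replay.
  accept(deliveryId, now = Date.now()) {
    // Drop entries older than the TTL so the map doesn't grow forever.
    for (const [id, seenAt] of this.seen) {
      if (now - seenAt > this.ttlMs) this.seen.delete(id);
    }
    if (this.seen.has(deliveryId)) return false;
    this.seen.set(deliveryId, now);
    return true;
  }
}

const tracker = new DeliveryTracker(10 * 60 * 1000);
console.log(tracker.accept('delivery-1', 0));    // true  (first time)
console.log(tracker.accept('delivery-1', 5000)); // false (replay)
console.log(tracker.accept('delivery-2', 5000)); // true  (different delivery)
```

Two limits worth knowing: once an entry expires, the same ID would be accepted again, so the retention time matters, and with more than one server instance the IDs would need to live in a shared store rather than in memory.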

Second Problem: Timing Attacks

There's another problem that happens during signature comparison.
When your server compares two signatures using a normal comparison, it stops the moment it finds the first difference. That means if the first character doesn't match, it returns faster than if the first 10 characters match. This timing difference leaks information.

An attacker can send thousands of requests and measure the response time difference. Gradually, by analyzing the timing, they can guess the correct signature character by character.

To prevent this, you should use crypto.timingSafeEqual(). This function compares the two buffers in a way that takes the same amount of time regardless of how many bytes match or don't match. No timing difference. No information leaked. One thing to know: it throws an error if the two buffers have different lengths, so check the lengths first.

Here's the code:

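const crypto = require('crypto');

function verifyWebhookSignature(payload, receivedSignature, secret) {
    // Recreate the signature from the raw payload and the shared secret
    const expectedSignature = 'sha256=' + crypto
        .createHmac('sha256', secret)
        .update(payload)
        .digest('hex');

    const trusted = Buffer.from(expectedSignature, 'utf8');
    const untrusted = Buffer.from(receivedSignature || '', 'utf8');

    // timingSafeEqual throws if the buffers differ in length, so check first
    if (trusted.length !== untrusted.length) return false;

    return crypto.timingSafeEqual(trusted, untrusted);
}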

Same-length buffers. Constant-time comparison. Timing attack prevented.
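One detail that is easy to miss: the signature has to be checked against the exact bytes that arrived, not against an object that was parsed and turned back into JSON. Here's a sketch of how a handler could put that together. handleWebhook and its return shape are hypothetical, and how you get the raw body depends on your framework (in Express, express.raw() or the verify option of express.json() can provide it).

```javascript
const crypto = require('node:crypto');

// rawBody must be the exact bytes received, not a re-serialized object.
function handleWebhook(rawBody, signatureHeader, secret) {
  if (typeof signatureHeader !== 'string') return { status: 401 };

  const expected = Buffer.from(
    'sha256=' + crypto.createHmac('sha256', secret).update(rawBody).digest('hex'),
    'utf8'
  );
  const received = Buffer.from(signatureHeader, 'utf8');

  // Length check first, because timingSafeEqual throws on unequal lengths.
  const valid = expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
  if (!valid) return { status: 401 };

  // Only parse the body after the signature has been verified.
  const event = JSON.parse(rawBody.toString('utf8'));
  return { status: 200, event };
}

// Demo with a locally signed body standing in for a real delivery.
const demoSecret = 'example-secret';
const demoBody = Buffer.from(JSON.stringify({ ref: 'refs/heads/main' }));
const demoSignature =
  'sha256=' + crypto.createHmac('sha256', demoSecret).update(demoBody).digest('hex');

console.log(handleWebhook(demoBody, demoSignature, demoSecret).status); // 200
console.log(handleWebhook(demoBody, 'sha256=forged', demoSecret).status); // 401
```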

What This Actually Is

This is not just a GitHub feature. This is how trust is established between any two systems on the internet that don't share a session or a login.

The shared secret is the trust. The HMAC is the proof of that trust. And timingSafeEqual makes sure nobody can reverse-engineer that trust by watching the clock.

One push. Automatic deployment. Verified. Secure. That's the mechanics behind it.

If I got something wrong or anything can be improved — please drop it in the comments. I'm still learning and I want to get this right.

Middleware is More Than Auth - What I Learned Building Production Systems

Neel-Vekariya — Thu, 11 Jun 2026 05:28:28 +0000

Most of us think middleware is for one thing: verify the user, check the token, move on.

I thought the same. For a long time, when someone said middleware, I thought authentication. That's it. One step before the controller runs, check if the user is allowed, done.

But when I started learning more about how production systems actually work, my perspective completely changed.

Middleware is not just a gatekeeper. It's the entire processing layer that runs before, during, and after your controller. And three specific middlewares changed how I think about this.

Correlation Middleware: Every Request Gets an Identity

In a production system, hundreds of requests are happening at the same time. If something goes wrong, how do you find exactly which request caused the problem?

That's what correlation middleware solves.

Before the request even reaches the controller, this middleware creates a unique ID for that request using randomUUID() from Node's crypto module. Every single request that comes into your system gets its own identity: a personal ID that travels with it through the entire execution flow.

This ID is not just for the request. It's for your logs. When something breaks in production, you don't search through thousands of log lines hoping to find the right one. You search by correlation ID and you find every single log entry connected to that specific request. Where it started, what happened, where it failed.

Without this, debugging in production is guessing. With this, it's tracing.
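A minimal sketch of what such a middleware could look like in the Express (req, res, next) style. The x-correlation-id header name and the req.correlationId property are conventions I'm assuming here, not something Express defines.

```javascript
const { randomUUID } = require('node:crypto');

// Express-style middleware that gives every request an ID.
function correlationMiddleware(req, res, next) {
  // Reuse an ID sent by an upstream service if there is one, otherwise create one.
  const incoming = req.headers['x-correlation-id'];
  req.correlationId = typeof incoming === 'string' && incoming ? incoming : randomUUID();

  // Echo the ID back so the client can quote it in a bug report.
  res.setHeader('X-Correlation-Id', req.correlationId);
  next();
}

// Usage in an Express app (assumed): app.use(correlationMiddleware);
```

Registering it before every other middleware means everything that runs later can read req.correlationId.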

Logger Middleware: What's Happening at Every Step

Once the request has its correlation ID, it moves to the logger middleware.

This middleware logs each event during the entire execution flow. Not just at the start, not just at the end, but at every step. The correlation ID from the first middleware travels with every log entry, so everything stays connected and traceable.

In development you can just console.log things and move on. But in production, you need a proper record of what happened, in what order, with what data. Logger middleware does that automatically for every request without you writing logging code inside every controller.
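As a rough sketch, a logger middleware could record the start of each request and then its outcome once the response is finished. The JSON log shape and the injectable log function are assumptions for this example, and it expects an earlier middleware to have set req.correlationId. A production setup would more likely hand this to a logging library.

```javascript
// Returns Express-style middleware that logs the start and end of each request.
function loggerMiddleware(log = console.log) {
  return function (req, res, next) {
    const startedAt = Date.now();
    log(JSON.stringify({
      id: req.correlationId,
      event: 'request:start',
      method: req.method,
      url: req.url,
    }));

    // 'finish' is emitted once the response has been sent.
    res.on('finish', () => {
      log(JSON.stringify({
        id: req.correlationId,
        event: 'request:end',
        status: res.statusCode,
        durationMs: Date.now() - startedAt,
      }));
    });

    next();
  };
}

// Usage in an Express app (assumed): app.use(loggerMiddleware());
```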

Controller: The Actual Work

After passing through correlation and logger middleware, the request finally reaches the controller. This is where the actual business logic runs. By this point, the request already has an identity and everything happening is being logged.

Error Middleware: When Things Go Wrong

The last middleware in the chain is error middleware.
Every production system will have errors. The question is: are those errors handled properly, or do they just silently fail? Error middleware catches what goes wrong during the entire execution flow and creates a proper track record of it. In Express, it's registered after all the routes and takes four arguments (err, req, res, next). And because the correlation ID is already attached to the request, every error log is also traceable back to the original request.

This is the last line of defense before something breaks without you knowing.
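Here's a sketch of an Express-style error handler along those lines. The response body shape and the err.status convention are assumptions for the example, and it relies on an earlier middleware having set req.correlationId.

```javascript
// Returns an Express error handler. Express recognizes error handlers
// by their four-argument signature: (err, req, res, next).
function errorMiddleware(log = console.error) {
  return function (err, req, res, next) {
    log(JSON.stringify({
      id: req.correlationId,
      event: 'request:error',
      message: err.message,
      stack: err.stack,
    }));

    // If the response already started, hand over to Express's default handler.
    if (res.headersSent) return next(err);

    const status = err.status || 500;
    res.status(status).json({
      // Don't leak internal details on unexpected errors.
      error: status === 500 ? 'Internal Server Error' : err.message,
      correlationId: req.correlationId,
    });
  };
}

// Usage in an Express app (assumed), registered after all routes:
// app.use(errorMiddleware());
```

Returning the correlation ID in the error response means a user can send it to you, and you can go straight from their report to the matching log lines.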

How the Flow Actually Looks

Request comes in:
- Correlation Middleware (assigns unique ID)
- Logger Middleware (starts logging with that ID)
- Controller (does the actual work)
- Error Middleware (catches anything that breaks)

Every step is connected. Every step is traceable. Nothing gets lost.

What Changed for Me

I used to think middleware was just about who is allowed in. Now I think of it as the system that makes sure every request is identified, tracked, and handled from the moment it arrives to the moment it exits.

This is the difference between an app that runs and an app you can actually debug when something goes wrong in production.
That's the real role of middleware.

Graceful Shutdown in Node.js — What Happens to Your App When It Crashes

Neel-Vekariya — Wed, 10 Jun 2026 06:15:30 +0000

In the last few days I built a project from scratch.

Not a simple one. A project that has everything a real production system has: worker threads, cluster, streams, database connections, scheduling mechanisms. All of it. I wanted to understand how production-level systems actually work, not just read about them.

And in those days I learned a lot. But I also crashed my app. A lot.

Every time the app crashed, one doubt kept coming back to me: what happens to my database connections when the app crashes? What happens to the requests that were still being processed? What happens to the network connections that were open?

This doubt pushed me to find an answer. And I found an approach called graceful shutdown.

I read about it. And I found some important things I didn't know yesterday. So let me share.

What is Graceful Shutdown

In general, graceful shutdown means closing your application without losing data, dropping active requests, or leaving connections hanging.

Instead of killing the application immediately, we follow a process:

1. Stop the app from accepting new requests
2. Allow current requests to finish
3. Close all database and cache connections
4. Exit the process cleanly

That's it. That's the whole idea.

Before Building This: Two Signals You Need to Know

To implement graceful shutdown, first you need to understand two signals.

SIGINT — Signal Interrupt
This is the signal triggered when you press CTRL + C. Manual shutdown. You're telling the process yourself: stop now.

SIGTERM — Signal Terminate
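This is the signal sent by Kubernetes, Docker, or the Linux operating system. It's basically a polite way of asking a process to stop. The system is not killing it forcefully. It's saying, hey, wrap up and exit cleanly. If the process still hasn't exited after a grace period, Kubernetes and Docker follow up with SIGKILL, and that one can't be handled.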

These signals are important to listen to. If you don't handle them, your app just dies the moment they're triggered. Connections get cut off mid-flight. Requests get dropped. Data can get lost.

How to Build This

First, you need a separate server file. In this file you create a server using app.listen() and you hold the reference of that server in a variable. That reference is important: you need it to control the shutdown.

Then you set a process.on() listener for both signals. Whenever either signal is received, you trigger a shutdown function.

Here's the code:

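const express = require("express");

const app = express();
const server = app.listen(3000)

app.get("/", (req, res) => {
    // Slow route, so there is an in-flight request to wait for during shutdown
    setTimeout(() => { res.send("Response after 5 seconds Complete") }, 5000);
})

function shutdownServer(signal) {
    console.log(`Received ${signal}. Shutting down gracefully...`);

    // Stop accepting new connections; the callback runs once existing ones end
    server.close(() => {
        console.log("Server closed. Exiting process.");
        process.exit(0);
    });

    // Safety net: force the exit if connections don't close in time.
    // unref() keeps this timer from holding the process open by itself.
    setTimeout(() => {
        console.error("Could not close connections in time. Forcefully shutting down.");
        process.exit(1);
    }, 10000).unref();
}

process.on("SIGINT", () => shutdownServer("SIGINT"));
process.on("SIGTERM", () => shutdownServer("SIGTERM"));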

One Thing Most People Get Wrong About server.close()
A lot of people think server.close() stops the server immediately. That's not true.

What this method actually does: it stops accepting new connections first.

Then it waits for the requests already in progress to finish, and only then runs the callback you passed in. That callback is where you close your database and cache connections and exit the process. server.close() doesn't do that part for you.
That's the whole point. You're not killing everything at once. You're letting the app finish what it started.

But there is a catch. Sometimes connections don't close properly. Maybe a client is holding a keep-alive connection open, maybe a request is stuck. If you only rely on server.close(), your app could wait forever.

That's why there's a backup: a setTimeout. If the connections are not closing within a certain time, it forces the process to exit. It's a safety net. If everything closes properly, process.exit(0) runs first and the timeout never fires. If something is stuck, the timeout catches it and forces the exit with process.exit(1), a non-zero code that marks the shutdown as not clean.
Both cases handled.
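The sample above exits as soon as the HTTP server has closed. Here's a sketch of how database and cache cleanup could fit into the same sequence. createShutdown and the closers list are my own names, and the server and exit function are passed in so the sequence can be tried out without a real server or database.

```javascript
// Builds a shutdown function from its dependencies.
function createShutdown({ server, closers = [], exit = process.exit, timeoutMs = 10000 }) {
  let shuttingDown = false;

  return function shutdown(signal) {
    if (shuttingDown) return; // ignore a second CTRL + C or repeated signals
    shuttingDown = true;
    console.log(`Received ${signal}. Shutting down gracefully...`);

    // Safety net: force the exit if cleanup hangs.
    const timer = setTimeout(() => exit(1), timeoutMs);
    timer.unref();

    // Step 1 and 2: stop accepting connections, wait for in-flight requests.
    server.close(async (err) => {
      let code = err ? 1 : 0;
      try {
        // Step 3: close databases, caches, queues, one after another.
        for (const close of closers) await close();
      } catch (closeErr) {
        console.error('Cleanup failed:', closeErr);
        code = 1;
      }
      // Step 4: exit.
      clearTimeout(timer);
      exit(code);
    });
  };
}

// Usage (assumed names): closers would wrap whatever your clients expose,
// for example a function that ends a database pool.
// const shutdown = createShutdown({ server, closers: [closeDatabase, closeCache] });
// process.on('SIGINT', () => shutdown('SIGINT'));
// process.on('SIGTERM', () => shutdown('SIGTERM'));
```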

If I got something wrong or anything can be improved — please drop it in the comments. I'm still learning and I want to get this right.

worker-threads vs cluster

Neel-Vekariya — Tue, 09 Jun 2026 04:51:28 +0000

worker-threads vs cluster: When to Use Which, With Reasoning

Node.js is single-threaded. That's the first thing everyone learns.
But at some point you realize: okay, single-threaded is fine for most things. But what happens when your app needs to actually use more than one CPU core? Or what happens when one heavy task starts blocking everything else?

That's where worker-threads and cluster both show up. And for a while I treated them like they solve the same problem. They don't.

The way I started thinking about it
The difference clicked for me when I stopped thinking about code and started thinking about the actual problem I'm trying to solve.
Ask yourself one question: where is the bottleneck?

Is it that my server can't handle more incoming requests because I'm only using one CPU core? Or is it that one specific task inside my app is too heavy and it's blocking everything else?

These are two different problems. And they need two different solutions.

cluster is about scaling your server
cluster creates multiple Node.js processes. Not threads, full processes. Each process gets its own memory, its own event loop, its own everything. They don't share anything by default.

The idea is simple. Your machine has 8 CPU cores. But your Node.js server is only using one. cluster lets you spin up 8 worker processes so all 8 cores are actually doing work. Incoming requests get distributed across all of them.

This is why cluster makes sense for HTTP servers. You're not changing what your app does. You're just running multiple copies of it so more requests can be handled at the same time.

And there's another thing. If one process crashes, the others keep running. Your whole server doesn't go down because one worker died. The primary process can listen for the exit event and fork a replacement, so the server recovers on its own.

But here's where people make the mistake: if you have a CPU-heavy task and you put it inside a cluster worker, it still blocks that entire worker process. cluster didn't solve your actual problem. It just spread the load across processes.
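A minimal sketch of a clustered HTTP server, to show the shape of it. The startCluster function, the port, and the restart-on-exit policy are my own choices for the example. The call at the bottom is left commented out so the snippet doesn't fork processes just by being loaded.

```javascript
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

// Call this from the entry file. The primary forks the workers, and each
// worker runs its own copy of the HTTP server on the same port.
function startCluster(port = 3000, workerCount = os.cpus().length) {
  if (cluster.isPrimary) {
    for (let i = 0; i < workerCount; i++) cluster.fork();

    // A dead worker is not replaced automatically, so fork a new one here.
    cluster.on('exit', (worker, code, signal) => {
      console.log(`Worker ${worker.process.pid} exited (${signal || code}). Restarting...`);
      cluster.fork();
    });
  } else {
    http.createServer((req, res) => {
      res.end(`Handled by worker ${process.pid}\n`);
    }).listen(port);
  }
}

// startCluster(); // uncomment in a real entry file
```

A real setup would also want to avoid restarting workers during an intentional shutdown, which this sketch doesn't handle.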

worker_threads is about CPU-heavy tasks

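worker_threads is different. It creates threads inside the same process. Each worker gets its own V8 instance and event loop, so nothing is shared by accident, but threads can share memory explicitly through SharedArrayBuffer or pass data to each other with messages. And they're specifically designed for CPU-intensive work.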

Think about image resizing. Or parsing a huge JSON file. Or running some complex calculation. These tasks don't need more network capacity, they need more CPU time. And because your JavaScript runs on a single thread, that task blocks your entire event loop while it runs.

worker_threads lets you move that heavy work off the main thread. The main thread keeps doing what it does (handling requests, running the event loop) while the worker thread grinds through the heavy task in the background.

That's the use case. Not "I need to handle more requests." It's "I have one heavy operation and I don't want it to freeze everything else."
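Here's a small sketch of moving a heavy loop onto a worker thread. The loop is just a stand-in for real CPU work, and sumInWorker is a name I made up. The worker code is passed as a string with eval: true only to keep the example in one file. A real project would normally point Worker at a separate file.

```javascript
const { Worker } = require('node:worker_threads');

// Code that runs inside the worker thread.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let sum = 0;
  for (let i = 0; i < workerData.n; i++) sum += i; // stand-in for heavy CPU work
  parentPort.postMessage(sum);
`;

// Runs the loop on a worker thread and resolves with its result.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}

sumInWorker(1e7).then((sum) => console.log('sum from worker:', sum));
console.log('main thread is free while the worker runs');
```

The second console.log prints first, because the main thread doesn't wait for the loop. Starting a thread has a cost, so for many small jobs a pool of long-lived workers is the usual next step.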

The way I think about choosing between them

If the problem is "my server is slow under high traffic and I'm not using all my CPU cores", that's cluster.

If the problem is "I have a CPU-intensive task that blocks my event loop", that's worker_threads.

One more thing that helped me think about this. cluster is process-level parallelism. worker_threads is thread-level parallelism. Processes are heavier, more isolated, more fault-tolerant. Threads are lighter, can share memory, and are better for tightly coupled parallel computation.

They're not competing tools. There are cases where you'd actually use both: a clustered HTTP server where each worker process also uses worker_threads internally for heavy computation. That's when you're really using your machine's resources properly.

The real question
Before picking either one, I always ask: what is actually slow, and why?
Most of the time when I dig into it, the answer tells me exactly which one I need. The wrong choice doesn't break your app. But the right choice makes your app behave the way it should at scale.
That's the reasoning I come back to every time.