Semaphore — How to Control How Many Operations Run at the Same Time

#architecture #learning #performance #writing

What if 100 requests hit your server at the same time and all of them try to connect to your database simultaneously?

I started thinking about this while building my project. I had worker threads running, multiple operations happening at once, and at some point I realized there's no control here. Everything is just running at the same time. No limit. No waiting. Just all of it, at once.
And that's a problem.

What the actual means

Think about a restaurant with only 10 tables.
100 people want to eat. But only 10 can sit at a time. The rest wait outside. When one table frees up, the next person in line comes in. Nobody rushes in randomly. There's order. There's control.

That's exactly what a semaphore does for your code.

A semaphore controls how many operations can run at the same time. If the limit is reached, the rest wait in a queue. When one operation finishes and releases its slot, the next one in the queue gets to run.

Why This Actually Matters

Without this kind of control, you can easily overwhelm a resource.

Your database has a connection limit. Your external API has a rate limit.

Your file system slows down if too many things try to write at the same time. If you just let everything run at once without any control, you're not managing your resources you're just hoping nothing breaks.

I kept seeing this pattern while building production-level infrastructure. Every piece of it has some form of concurrency control. Rate limiting for incoming requests. Connection pooling for databases. And semaphore for controlling how many operations run at a time inside your own system.

How the Semaphore Works

The implementation is simpler than it sounds.

export class Semaphore {
    constructor(maxConcurrent) {
        this.maxConcurrent = maxConcurrent
        this.current = 0
        this.queue = []
    }

    acquire() {
        return new Promise((resolve) => {
            if (this.current < this.maxConcurrent) {
                this.current++;
                resolve();
            } else {
                this.queue.push(resolve)
            }
        })
    }

    release() {
        this.current--;
        if (this.queue.length > 0) {
            this.current++;
            const next = this.queue.shift()
            next();
        }
    }

    async run(fn) {
        await this.acquire();
        try {
            return await fn()
        } finally {
            this.release()
        }
    }
}

Three parts. That's it.

acquire() — before an operation starts, it asks the semaphore for a slot. If a slot is available, it gets one and runs immediately. If not, it waits in the queue. That waiting is handled by a Promise that only resolves when a slot opens up.

release() — when an operation finishes, it gives the slot back. And if there's anything waiting in the queue, the next one gets the slot immediately.

run() — this is the wrapper you actually use. It acquires the slot, runs your function, and releases in a finally block. The finally is important it makes sure the slot is always released, even if the operation throws an error. No slot gets stuck.

What This Looks Like in Practice

Say you have 50 API calls to make but you only want 5 running at the same time.

const semaphore = new Semaphore(5);

const results = await Promise.all(
    requests.map(req => semaphore.run(() => callAPI(req)))
);

All 50 are queued up. But only 5 run at a time. As each one finishes, the next one in line starts. Controlled. Predictable. Your API doesn't get overwhelmed.

What I Realized

When I first wrote concurrent code, I just used Promise.all()and let everything run at the same time. It worked. Until it didn't.

The moment the load increased too many database connections, too many simultaneous operations,things started breaking. And I didn't understand why at first because the code looked fine.

That's the thing about concurrency. It's not just about making things run at the same time. It's about controlling how many things run at the same time. That control is what keeps your system stable under real load.
Semaphore is one of the simplest ways to add that control. And once you understand it, you start seeing places where you need it everywhere.

If I got something wrong or anything can be improved — please drop it in the comments. I'm still learning and I want to get this right.