If you’ve ever written this:
```js
await Promise.all(tasks.map(t => t()));
```
You’ve probably told yourself:
“That’s concurrency.”
I used to believe that too.
Then I tried to build something that behaved like a real system — not a tutorial.
What I wanted was simple on paper:
A Node.js task runner that could:
- Run tasks concurrently
- Retry failures
- Never overload the system
- Always reach a terminal state
What I got instead was a lesson in engineering humility.
The Problem That Started It
I wasn’t trying to optimize performance.
I was trying to build trust into my system.
In production, this is the reality you have to design for:
- APIs fail
- Networks lie
- Logs deceive
- Retries spiral out of control
So I asked myself:
If this were a job queue or microservice worker — would I trust it at 3 AM?
That question shaped everything.
What a “Task” Really Is
In my system, a task is not a promise.
It’s a function that returns a promise.
Why that matters:
A promise that has already settled can never run again, so retries require re-execution, not re-awaiting.
```ts
type Task<T> = () => Promise<T>;
```
This single design choice eliminated:
- Stale promises
- Broken retries
- Phantom concurrency bugs
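Here's a minimal sketch of what re-execution looks like. The `withRetry` helper and its `maxAttempts` parameter are just for illustration, not the runner's actual API:

```ts
type Task<T> = () => Promise<T>;

// Illustrative helper: every attempt calls the task factory again,
// producing a fresh promise each time.
const withRetry = async <T>(task: Task<T>, maxAttempts: number): Promise<T> => {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(); // fresh execution, not a re-await of a settled promise
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
};
```

If a task were a plain promise instead of a factory, every "retry" would just re-await the same rejection.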
The Fake API That Told the Truth
Instead of mocking success, I mocked reality:
```js
// Simulates an API call: fixed delay, forced outcome, honest logs.
const fakeApi = (id, delay, shouldSucceed) => {
  const start = Date.now();
  console.info(`STARTING Task ID: ${id}`);
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      const end = Date.now();
      if (shouldSucceed) {
        console.log(`SUCCEEDED Task ID: ${id} | Duration ${end - start}ms`);
        resolve(id);
      } else {
        console.error(`FAILED Task ID: ${id} | Duration ${end - start}ms`);
        reject(new Error(`${id} failed`));
      }
    }, delay);
  });
};
```
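Each call gets wrapped in a closure, so nothing starts until a worker invokes it. A quick sketch (IDs, delays, and outcomes here are arbitrary):

```ts
// Each task is a function; nothing runs until the runner calls it.
const tasks = [
  () => fakeApi(1, 300, true),
  () => fakeApi(2, 500, false), // deliberately fails to exercise the retry path
  () => fakeApi(3, 200, true),
];
```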
This gave me:
- Real durations
- Real failures
- Real concurrency pressure
No illusions. Only evidence.
The Real Engineering Problem
The hard part wasn’t “running tasks.”
It was enforcing three laws:
1. Concurrency is a Hard Ceiling
Never more than MAX_CONCURRENT_TASKS running — even during retries.
2. Retries Must Be Deterministic
A failed task can retry — but it must never exist twice in-flight.
3. Everything Must End
Every task must reach:
- Success
- Or retry exhaustion
No zombies. No silent failures.
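A minimal sketch of one way to enforce all three laws at once is a worker-pool loop. This is a simplified illustration, not the repo's exact code; `runAll`, `maxConcurrent`, and `maxRetries` are names chosen for the sketch:

```ts
type Task<T> = () => Promise<T>;

type TaskResult<T> = {
  id: number;
  status: "succeeded" | "exhausted";
  value?: T;
  error?: unknown;
  attempts: number;
};

const runAll = async <T>(
  tasks: Task<T>[],
  maxConcurrent: number,
  maxRetries: number
): Promise<TaskResult<T>[]> => {
  // One shared queue. A failed task goes back into the queue, so it is
  // never in-flight twice at the same time (law 2).
  const queue = tasks.map((task, id) => ({ id, task, attempts: 0 }));
  const results: TaskResult<T>[] = [];

  // A worker processes one item at a time, so in-flight tasks can never
  // exceed the number of workers (law 1: hard ceiling).
  const worker = async (): Promise<void> => {
    while (queue.length > 0) {
      const item = queue.shift();
      if (!item) break;
      item.attempts += 1;
      try {
        const value = await item.task(); // fresh execution on every attempt
        results.push({ id: item.id, status: "succeeded", value, attempts: item.attempts });
      } catch (error) {
        if (item.attempts <= maxRetries) {
          queue.push(item); // deterministic retry: re-queued, not re-awaited
        } else {
          // Retry exhaustion is a terminal state, not a silent drop (law 3).
          results.push({ id: item.id, status: "exhausted", error, attempts: item.attempts });
        }
      }
    }
  };

  // Exactly maxConcurrent workers; resolve only when every task is terminal.
  await Promise.all(Array.from({ length: maxConcurrent }, () => worker()));
  return results;
};
```

With `maxRetries = 2`, a task runs at most three times before it is marked exhausted, and results come back in completion order rather than submission order.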
The Breakthrough Moment
The moment it clicked wasn’t when my code worked.
It was when my logs told a story:
```
STARTING Task ID: 2
FAILED Task ID: 2
STARTING Task ID: 2
FAILED Task ID: 2
```
That wasn’t noise.
That was controlled failure.
That’s what real systems do.
What This Taught Me About Backend Engineering
Concurrency isn’t about parallelism.
It’s about governance.
Retries aren’t about recovery.
They’re about discipline.
Logs aren’t for debugging.
They’re for truth.
Why This Pattern Matters in Production
This architecture maps directly to:
- Job queues
- Worker pools
- Microservice orchestrators
- API batch processors
- Event-driven systems
If you can control this — you can control load, cost, and reliability.
Final Thought
Most developers learn async.
Engineers learn systems behavior.
This project pushed me from one side to the other.
Repo link - Concurrency Task Runner