naveenreddy devireddy

Posted on May 21

The Black Box Nobody Would Touch: Scaling Undocumented Legacy Code as a 3-Year Dev

#legacy #architecture #node #career

Three years into my career, I raised my hand for a job nobody else wanted to touch.

I'd inherited a system with zero documentation — the people who built it were long gone, and nobody fully understood how it worked. The kind of codebase everyone quietly hopes lands on someone else's plate.

Then the brief came down: scale it to 5,000 concurrent users, turn it into multi-tenant SaaS, and support per-client authentication — some tenants on Active Directory, some on LDAP, some with their own mechanism.

The existing system couldn't do any of that — and changing it felt risky enough that the work kept getting put off.

I took the challenge. Here's what the black box actually looked like, and the method that got us through it in about two months, working with one senior DevOps engineer.

What we inherited

One server doing everything. Node.js served the frontend pages and ran the backend. Active Directory was tightly woven into the auth flow.
Two databases, two philosophies. MySQL for transactions, MongoDB for reporting.
Data got into Mongo three different ways. Sometimes a Python script pushed it. Sometimes the API wrote to it directly. It had been done different ways over time, with no shared convention.
No docs, no original authors, no map.

If you've ever opened a repo and felt your stomach drop, you know the feeling. The instinct in that moment is to rewrite it. That instinct is almost always wrong — a rewrite of a system you don't understand just recreates the same unknowns, slower, while the business waits.

So we didn't rewrite. We tamed it. Here's the method.

1. Understand the system from the outside, not the code

You can read a million lines of undocumented code and still not know what actually happens in production. So I stopped trying to read my way to understanding and started observing instead:

Logs and traffic — what endpoints actually get hit, and how often.
The databases, live — what queries are really running.
The process and the network — what's listening, what talks to what.

Map the behavior, not the source. Within days you have a real picture of the system the code comments never gave you.
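To make "what endpoints actually get hit, and how often" concrete, here is a minimal sketch of the kind of tally I mean. It assumes access logs in the common/combined format, where the request shows up in quotes as "GET /path HTTP/1.1". The function name and the sample lines are made up for illustration, not taken from the real system.

```javascript
// Count hits per "METHOD /path" from access-log lines.
// Assumption: common/combined log format. Adjust the regex for other formats.
const REQUEST_PATTERN = /"([A-Z]+) (\S+) HTTP\/[\d.]+"/;

function tallyEndpoints(lines) {
  const counts = new Map();
  for (const line of lines) {
    const match = REQUEST_PATTERN.exec(line);
    if (!match) continue; // skip lines that are not request entries
    const [, method, rawUrl] = match;
    const path = rawUrl.split('?')[0]; // drop the query string so variants group together
    const key = `${method} ${path}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Most-hit endpoints first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Hypothetical sample lines, for illustration only.
const sampleLines = [
  '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /api/reports?id=1 HTTP/1.1" 200 512',
  '10.0.0.2 - - [01/Jan/2024:10:00:01 +0000] "POST /api/login HTTP/1.1" 200 64',
  '10.0.0.1 - - [01/Jan/2024:10:00:02 +0000] "GET /api/reports?id=2 HTTP/1.1" 200 498',
  'a line that is not a request entry',
];

for (const [endpoint, hits] of tallyEndpoints(sampleLines)) {
  console.log(`${hits}\t${endpoint}`);
}
```

On a real box you would feed this from the log file line by line rather than from an array. The point is that a ranked list of endpoints tells you where to look first, before you open a single source file.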

2. Find the seams

Once you can see how requests flow, you look for seams — the natural boundaries where you can change something without unraveling the whole sweater. For us the important ones were the authentication boundary, the heavy reporting read paths, and the frontend vs. backend responsibilities that happened to live in the same process.

You don't need to understand every line — you need to understand the edges.

3. Scale the box from the outside before touching its insides

The key insight, and the part I'd tell every junior engineer: you can scale a system you don't fully understand by changing what's around it, not what's inside it.

We went from one server to four. Concretely:

Split frontend and backend onto their own servers — 2 for frontend, 2 for backend. Each frontend ran nginx as a reverse proxy that load-balanced across both backend servers, so traffic spread evenly and either backend could drop without taking the app down. (They also had different scaling profiles; co-locating them on one box was the original bottleneck.)
Ran the Node backend under PM2 in cluster mode — 4 workers per server, so a single Node process stopped being a single point of CPU contention.
Added read replicas for both MySQL and MongoDB. Reporting reads went to replicas; transactional writes stayed on the primaries. This alone took enormous pressure off the system without changing a line of business logic.
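The post doesn't show how reads were pointed at replicas, so treat the following as one possible shape rather than what we shipped. It is a small router that hands out the primary for writes and rotates through replicas for reads. The names (createDbRouter, forRead, forWrite, requireFresh) are hypothetical, and the "connections" here are plain placeholders for whatever MySQL or MongoDB client handles you already have.

```javascript
// Route writes to the primary and spread reads across replicas (round-robin).
// Assumption: `primary` and each entry in `replicas` are already-open client handles.
function createDbRouter({ primary, replicas = [] }) {
  let next = 0;
  return {
    // Transactional writes always go to the primary.
    forWrite() {
      return primary;
    },
    // Reads go to a replica unless the caller needs to see its own recent write,
    // since replicas can lag behind the primary.
    forRead({ requireFresh = false } = {}) {
      if (requireFresh || replicas.length === 0) return primary;
      const replica = replicas[next % replicas.length];
      next += 1;
      return replica;
    },
  };
}

// Illustration with placeholder handles.
const demoRouter = createDbRouter({
  primary: { name: 'primary' },
  replicas: [{ name: 'replica-1' }, { name: 'replica-2' }],
});
console.log(demoRouter.forRead().name);                       // replica-1
console.log(demoRouter.forRead().name);                       // replica-2
console.log(demoRouter.forRead({ requireFresh: true }).name); // primary
```

The requireFresh escape hatch matters: replication is asynchronous by default, so a report that must reflect a write made a moment ago should still read from the primary.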

None of this required understanding the messy internals. It's infrastructure and topology — and it bought us the headroom for 5,000 concurrent users.

4. Isolate the one risky internal change behind an abstraction

The part we did have to change was auth — going multi-tenant meant Active Directory could no longer be hardcoded. Each tenant needed its own mechanism: AD, LDAP, or custom.

Instead of touching AD logic scattered everywhere, we introduced a single seam: a pluggable auth provider, chosen per tenant.

// One interface, one adapter per mechanism.
const authProviders = {
  ad:     require('./auth/activeDirectory'),
  ldap:   require('./auth/ldap'),
  custom: require('./auth/custom'),
};

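function getAuthProvider(tenant) {
  // Own-property check, so an authType like "constructor" or "toString"
  // can't resolve to something inherited from Object.prototype.
  const known = Object.prototype.hasOwnProperty.call(authProviders, tenant.authType);
  if (!known) throw new Error(`No auth provider for ${tenant.authType}`);
  return authProviders[tenant.authType];
}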

async function login(req, res) {
  const tenant   = await resolveTenant(req);   // by subdomain / header
  const provider = getAuthProvider(tenant);
  const user     = await provider.authenticate(req.body, tenant.config);
  // ...issue the session / token
}

Every adapter implements the same authenticate(credentials, config) contract. The old AD logic became one adapter behind the interface instead of an assumption baked through the codebase. Adding a new tenant's auth no longer means surgery on the core — just one small, isolated adapter.

That's the legacy-taming move in miniature: don't refactor the whole thing — wrap the risky part in a boundary and change behavior there.
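To make the authenticate(credentials, config) contract concrete, here is a sketch of what a small "custom" adapter could look like, plus a startup check that every registered adapter honors the contract. Everything specific in it is an assumption for illustration: the config shape (users with a salt and a hex hash), the choice of scrypt, and returning null on failure. The real adapters may differ on all three.

```javascript
const nodeCrypto = require('node:crypto');

// Hash a password with scrypt; returns a 32-byte Buffer.
function hashPassword(password, salt) {
  return nodeCrypto.scryptSync(password, salt, 32);
}

// Hypothetical "custom" adapter.
// Assumed config shape: { users: { <username>: { salt, hash } } }, hash in hex.
// Assumed result: a user object on success, null on failure.
const customProvider = {
  async authenticate(credentials, config) {
    const { username, password } = credentials || {};
    if (typeof username !== 'string' || typeof password !== 'string') return null;

    const users = (config && config.users) || {};
    if (!Object.prototype.hasOwnProperty.call(users, username)) return null;

    const expected = Buffer.from(users[username].hash, 'hex');
    const candidate = hashPassword(password, users[username].salt);
    // timingSafeEqual throws on a length mismatch, so compare lengths first.
    const ok = expected.length === candidate.length &&
      nodeCrypto.timingSafeEqual(expected, candidate);
    return ok ? { username } : null;
  },
};

// Fail fast at startup if any registered adapter breaks the contract.
function assertAuthContract(providers) {
  for (const [name, provider] of Object.entries(providers)) {
    if (!provider || typeof provider.authenticate !== 'function') {
      throw new TypeError(`Auth provider "${name}" must expose authenticate(credentials, config)`);
    }
  }
}

assertAuthContract({ custom: customProvider });
```

Running a check like assertAuthContract over the provider map at boot means a half-finished adapter fails when the process starts, not when the first user of that tenant tries to log in.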

5. Change incrementally, verify behavior every step

Because nobody understood the system fully, every change went out small and observable. Capture how it behaves now, change one thing, confirm nothing drifted, repeat. Slow is smooth, smooth is fast — especially when the safety net is observability, not people who remember how it works.
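The "capture, change one thing, confirm nothing drifted" loop can be as simple as diffing two snapshots of observed behavior. Here is a minimal sketch, assuming you have recorded responses for a fixed set of requests before and after a change. The function name, the snapshot shape, and the sample data are all hypothetical.

```javascript
// Compare two behavior snapshots keyed by a request label.
// Values are compared via JSON.stringify, so property order matters:
// capture both snapshots with the same code path.
function findDrift(before, after) {
  const drift = [];
  const keys = new Set([...Object.keys(before), ...Object.keys(after)]);
  for (const key of keys) {
    if (JSON.stringify(before[key]) !== JSON.stringify(after[key])) {
      drift.push({ key, before: before[key], after: after[key] });
    }
  }
  return drift;
}

// Hypothetical snapshots, for illustration only.
const baseline = {
  'GET /api/reports': { status: 200, body: { rows: 3 } },
  'POST /api/login': { status: 200 },
};
const afterChange = {
  'GET /api/reports': { status: 200, body: { rows: 3 } },
  'POST /api/login': { status: 500 },
};

console.log(findDrift(baseline, afterChange));
```

An empty result means the change was invisible from the outside, which is exactly what you want from an infrastructure step. Anything else is a prompt to stop and look before the next change goes out.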

Where it landed

One server → four (2 frontend, 2 backend), backend under PM2 cluster (4 workers each).
Read replicas on both MySQL and MongoDB.
Pluggable per-tenant auth — AD, LDAP, or custom — instead of hardcoded Active Directory.
Capacity for 5,000 concurrent users, multi-tenant ready.
Delivered in ~2 months, with one senior DevOps engineer.

And the part I didn't expect: doing this as a 3-year engineer — taking the thing no one else would touch and bringing it through — changed my standing in the company. It taught me that the scary, undocumented, "don't touch it" systems are exactly where you earn trust the fastest.

What I'd tell my three-years-in self

Don't rewrite what you don't understand. Tame it first.
Observe the running system before you read the code. Behavior is the real documentation.
You can scale a black box from the outside — topology, clustering, replicas — long before you refactor its insides.
Wrap risky change in a seam. Adapters let you change behavior without understanding everything.
The systems nobody wants to touch are the biggest career opportunities. Raise your hand.

Have you inherited a black box like this? What did you do first — read the code, or watch it run?

Written as a general lesson on taming legacy systems — no company, client, or proprietary specifics, just the approach.

DEV Community