DEV Community

Karan Mali

The const enum that took down our payments

Three minutes

That's how long I'd sit there every time I changed a file on our server. Three to four minutes for the dev server to rebuild and come back up. Long enough to check Slack, scroll Twitter, and then forget what I was about to do when it finally came back. I was the only one on Linux. The rest of the team was on Mac, and on their machines the dev server came back in twenty seconds. Nobody else felt the pain, so nobody else was looking for a fix. On my machine it was the bottleneck of my whole day. I started batching changes across multiple files so I'd only pay the rebuild cost once, which sounds clever but mostly meant I'd lose track of what I was actually testing. You don't really notice an hour disappearing into that. You just notice you're tired by 4pm and you can't figure out why.

For a while, I lived with it.


A small experiment

I'd used esbuild in some side projects before. It built things in milliseconds. After enough days of staring at a slow rebuild, I started wondering whether it would work for our server too. I tried it locally first. Wrote a small config, pointed it at our entry files, ran it. The build went from minutes to under a second. I ran the dev server. It came up. Things worked.

I pinged a senior developer. Told him I had esbuild working locally and asked if we should give it a real shot. He said yes, we had fifteen to twenty days before the next production release, plenty of room to catch problems. So I opened a PR. The PR was small: a config file, a couple of package.json scripts, and two enum changes that I'll come back to in a minute. That was it.

We talked about it again the next day. He said let's not merge it into dev right now because dev was being stabilised for the upcoming production release. "Use it locally for now, we'll merge it after the release." I said fine. Went back to my work and kept using esbuild on my machine. The PR sat there.

A few days later the production release happened. Everything looked fine. I didn't think about the PR. I was busy with my own tickets, and honestly the esbuild thing had already paid off for me personally. I was the only one using it, my local was fast, life was good.

That weekend, my phone buzzed.


One landlord, then five

The first message was from on-call. A landlord couldn't access their payment link. The link was returning something weird, undefined or null, the kind of thing that usually smells like a bad record more than a bad code path. We looked into it. Their data on our end seemed off. We wrote a small script, patched the row, told the landlord we were sorry, and moved on. One landlord with broken data is the kind of thing you can rationalise away. Migrations are messy, maybe a job didn't run, whatever. We've all seen worse.

The next day, another landlord reported the same thing. Then another. Then another. By the time we had five or six of these, the rationalisation stopped working. Data corruption does not politely affect five different accounts in the same exact way, in the same flow, on the same weekend. Something else was going on. And it was Saturday, and landlords were trying to collect rent, and the payment link, the one piece of the product that absolutely cannot be broken, was the thing that was broken.

I pulled the flow up on my own machine. It worked. Pulled it up on dev. Reproduced it on the second try. Logs from every affected account hit the same callback handler before things fell over. So this wasn't bad data. This was code. Code that had shipped, code that had passed review, code that none of our tests had caught. That's when I started going through git history.


The PR I forgot about

Our tickets that release had nothing to do with payments. Nobody had touched the payment files. Nobody except me, and only in one place: those two enum changes in the PR I thought we hadn't merged yet. I pulled up the git log. There it was. Merged into dev, shipped to prod, sitting there in the release as if it had always been planned. Apparently somewhere between "let's wait for the release" and the release itself, the PR had quietly gone in. I genuinely don't remember how. I don't remember being told. I had stopped tracking it because in my head it wasn't going out yet.

I opened the commit. Two small lines stood out:

- export declare const enum PaymentStatus { ... }
+ export enum PaymentStatus { ... }

I had a feeling, but I wasn't sure. I dropped the diff into Claude and asked what esbuild does with declare const enum. It told me everything I needed to know in about two paragraphs. I read it twice, then went looking through the actual codebase to confirm, because at this point I didn't really trust my own memory of what I had touched.


Why the build tool was the bug

tsc and esbuild look like they do the same job. You give them TypeScript, they give you JavaScript back. Except they really don't. tsc understands types across files. It reads the .d.ts files for every package you import from. When it sees a declare const enum, it doesn't generate a runtime object at all; it inlines the value directly at each use site. So payment.status === PStatus.CAPTURED compiles down to payment.status === "Captured". The enum doesn't exist at runtime. It doesn't need to. esbuild doesn't read .d.ts files. It strips the types out of your own source and otherwise only sees the JavaScript a package actually ships. That's the whole story, really, but it took me a while to fully appreciate what it meant.

We had an internal npm package that the server depended on heavily. Its types looked like this:

// internal-db-package/.../Payments.types.d.ts
export declare const enum PStatus {
  PENDING = 'Pending',
  CAPTURED = 'Captured',
}

And its compiled .js file looked like this:

// internal-db-package/.../Payments.types.js
"use strict";
// (that's the whole file; declare const enum compiles to nothing)

tsc would read the .d.ts, find the enum, and inline the value into the consumer code:

status === "Captured"

esbuild read the .js, found nothing, and emitted this instead:

status === (void 0).CAPTURED
// TypeError: Cannot read properties of undefined (reading 'CAPTURED')

The build printed a warning. Not an error, a warning. The server still booted. The dev environment behaved normally because that specific code path didn't run in any of our health checks. The only time it actually ran was when a real payment confirmation came back from the provider and the code tried to look up the status against the enum. That's when it crashed. The two enum flips in my PR (PaymentStatus and InvoiceTrackingEvent) were just there to get past esbuild's local build errors. I thought I had fixed the problem. What I had actually done was patch the two enums I happened to notice, while leaving every other enum import from that internal package silently broken in production.
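
To make the "it booted fine, then crashed on the first real confirmation" part concrete, here is a minimal sketch of the failure mode. It doesn't use esbuild at all. `emptyTypesModule`, `PStatusShape`, and `handlePaymentCallback` are hypothetical names standing in for the package's empty compiled file and our callback handler.

```typescript
// Hypothetical stand-in for the package's compiled Payments.types.js:
// the file exports nothing, so any named import from it is undefined.
const emptyTypesModule: Record<string, unknown> = {};

interface PStatusShape {
  PENDING: string;
  CAPTURED: string;
}

// What the bundled code effectively holds after importing PStatus
// from a module with no exports. The cast hides the problem from the
// type checker, just as the .d.ts file did for us.
const PStatus = emptyTypesModule["PStatus"] as PStatusShape;

// Hypothetical callback handler. Defining it is harmless; the property
// access on undefined only happens when the function is actually called.
function handlePaymentCallback(status: string): boolean {
  return status === PStatus.CAPTURED;
}

// Everything above runs without error, which is why the server booted.
// The failure only appears once a confirmation reaches the handler.
let callbackError: unknown = null;
try {
  handlePaymentCallback("Captured");
} catch (err) {
  callbackError = err; // a TypeError: PStatus is undefined here
}
```

A health check that never calls the handler sees nothing wrong, which matches what happened on dev.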

The hotfix itself was tiny. Revert the build script back to tsc, restore the two declare const enum keywords, push. Five lines. I had it in within the hour. Total production impact came out to roughly five to six hours, spread across a weekend, affecting somewhere between ten and fifteen landlords depending on how you count the ones who retried successfully on their own. We patched the affected payment records by hand the next morning. None of the actual money was lost, just the link state. Stripe held the charges fine. That was the only mercy in the whole thing.


The part I didn't expect

I waited for someone to be angry. Nobody was. A senior developer pinged me after the hotfix went out. No "why did you do this", no "why didn't we test it harder". Just: "You know the fix, just get the PR up for production, we're good." That was the whole conversation. The story made the rounds internally as a joke, not an indictment. Nobody points at me with it.

Two months later, with the dust settled, I came back to esbuild. By then I had switched to a Mac and the slow-build pain wasn't personal anymore, but the numbers were still real and the curiosity hadn't gone away. This time I did it differently. I started by actually mapping where the internal package was imported. Dozens of places, some I had no idea existed. The codebase had been growing for years and a few of the original authors were no longer around. That alone was a useful exercise, separate from anything to do with build tools. I now had a list of every file the change could touch. The rule I set for myself was simple. tsc stays in the production Docker build, untouched. esbuild is only allowed near dev. Production keeps the boring, slow, correct thing. Local gets the fast, fragile thing, with guardrails.
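
The import mapping can be done in many ways. As a rough sketch, assuming a plain regex scan is good enough (it is not a real parser, so it can miss unusual formatting or match inside comments), something like this lists the specifiers in one file's source that point into a package. `findPackageImports` and the sample paths are hypothetical and only there for illustration.

```typescript
// Hypothetical helper: return the module specifiers in `source` that
// refer to `packageName` itself or to a path inside it.
function findPackageImports(source: string, packageName: string): string[] {
  // Matches: import ... from "x", export ... from "x", import "x",
  // require("x"), and import("x").
  const specifierPattern =
    /(?:from\s*|import\s+|require\(\s*|import\(\s*)["']([^"']+)["']/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = specifierPattern.exec(source)) !== null) {
    const specifier = match[1];
    const isPackage = specifier === packageName;
    const isInsidePackage = specifier.indexOf(packageName + "/") === 0;
    if (isPackage || isInsidePackage) {
      found.push(specifier);
    }
  }
  return found;
}

// Usage on an in-memory sample; the sub-path below is made up.
const sample = [
  'import { PStatus } from "internal-db-package/hypothetical/path/Payments.types";',
  'import express from "express";',
  "const db = require('internal-db-package');",
  'import { x } from "internal-db-package-extras";',
].join("\n");

const hits = findPackageImports(sample, "internal-db-package");
```

Run over every source file, the combined result is the list of places a build change could reach.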

Then I wrote two small esbuild plugins. The first one rewrites declare const enum into plain enum on the way through esbuild. It walks the source, strips the declare keyword, and lets the enum compile to a real runtime object. Our local enums now exist as real JavaScript objects instead of depending on the inlining tsc did for free.

The second plugin was the one that actually mattered. It virtualises the internal package's .types modules. The internal package's compiled .js files are empty (because, again, declare const enum compiles to nothing). The plugin intercepts every import resolving into those .types paths and substitutes a hand-written module with the same shape and the same values, but as a plain object literal that exists at runtime. The empty .js files never get loaded again. Whatever lives only in the .d.ts files, which esbuild never reads, the plugin supplies directly.

The dev server rebuild went from 6.6 seconds with tsc down to 119 milliseconds with esbuild. Roughly fifty-five times faster.

tsc      ████████████████████████████████  6.6s
esbuild  ▏                                 119ms

To be clear, the 6.6 seconds is the Mac number, not the Linux one. On Linux it had been minutes; on the Mac tsc was bearable but esbuild was still in a different league. That setup is still running today, three months later, and nobody has noticed it exists, which is the highest compliment a dev tool can get. I also wrote a short doc explaining what the plugins do and why they exist, and pinned it in our engineering channel. Partly so the next person who touches the build doesn't wander into the same trap. Partly so future-me, in six months, doesn't either.
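
For anyone who wants something more concrete than the description of the two plugins, here is a rough sketch of both ideas. It is an illustration under assumptions, not the plugins from our repo. The small interfaces at the top mirror only the parts of esbuild's plugin API the sketch touches (`onResolve` and `onLoad`) so the block stands alone; in a real build you would import the `Plugin` type from esbuild. The namespace name, the filter regex, and the single hard-coded module are all hypothetical, and the rewrite removes both `declare` and `const` to end up with a plain enum.

```typescript
// Local stand-ins for the subset of esbuild's plugin API used below.
interface ResolveResult { path: string; namespace: string }
interface LoadResult { contents: string; loader: "js" | "ts" }
interface PluginBuild {
  onResolve(
    options: { filter: RegExp },
    callback: (args: { path: string }) => ResolveResult | undefined
  ): void;
  onLoad(
    options: { filter: RegExp; namespace?: string },
    callback: (args: { path: string }) => LoadResult | undefined
  ): void;
}
interface Plugin { name: string; setup(build: PluginBuild): void }

// Plugin 1, core idea: turn `declare const enum` into a plain `enum`
// so it compiles to a runtime object. A regex rewrite, assumed to be
// good enough for conventionally formatted source.
function rewriteDeclareConstEnum(source: string): string {
  return source.replace(/\bdeclare\s+const\s+enum\b/g, "enum");
}

// Plugin 2: route imports of the package's .types modules to a
// hand-written module instead of the empty compiled .js file.
const VIRTUAL_NAMESPACE = "virtual-types"; // hypothetical name
const TYPES_IMPORT_FILTER = /^internal-db-package\/.*\.types$/; // hypothetical

const virtualTypesPlugin: Plugin = {
  name: "virtual-types",
  setup(build) {
    // Claim matching import paths and move them into our namespace.
    build.onResolve({ filter: TYPES_IMPORT_FILTER }, (args) => ({
      path: args.path,
      namespace: VIRTUAL_NAMESPACE,
    }));
    // Serve the replacement source for anything in that namespace.
    // A real version would pick contents per path; this returns one.
    build.onLoad({ filter: /.*/, namespace: VIRTUAL_NAMESPACE }, () => ({
      contents:
        'export const PStatus = { PENDING: "Pending", CAPTURED: "Captured" };',
      loader: "js",
    }));
  },
};
```

The point of the second plugin is that the values live in one hand-written place, so they have to be kept in sync with the package's .d.ts by hand. That is the fragile part, and the reason this stays a dev-only tool.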


In closing

Build tools are replaceable. Good teams aren't.
