Hideki Mori

Posted on Jun 8

Abstractions are fine. Starting on them isn't.

#api #architecture #serverless #softwareengineering

I've been writing software alone for 24 years. The shape of how I do that hasn't changed much — but I've watched the environment around me change a lot, and one shift has been quietly bothering me.

It's the shift in what application engineers are given on day one.

Pain used to teach you when something was wrong

In the old days, the lesson was simple. Your hardware stopped responding. That was the lesson.

A box couldn't keep up: you saw it on the dashboard, you felt it in the user complaints. You learned, often the hard way, that you needed a load balancer and two machines instead of one. The pain told you the system had outgrown its current shape.

EC2 didn't change this much. An instance could still get pinned at 100% CPU. The application could still hang. The pain was still there, just on rented hardware instead of bought hardware. The lesson — squeeze whatever you can out of the application layer first — was still delivered in the same form.

The pain was the cheapest monitoring tool any of us ever had.

Serverless took three pains away

Then serverless arrived. Auto-scaling. Pay-per-use. Managed everything. And without anyone really announcing it, three things that used to teach engineers stopped teaching them.

Pain 1: cost spikes that show up too late

In a per-use billing model, your application can be running badly for weeks before the financial signal arrives. You only notice when the invoice shows up at the end of the month — by which point, the failure is already paid for.

The "fix" most people reach for is a budget alert. But a budget alert is a smoke detector after the fire is already in the wall. You needed to know in week one, not in week four.

Modern anomaly-detection tools can shorten the gap from weeks to hours. They help. But notice the framing: each new tool exists because something underneath was hidden in the first place. The tool is a patch on a layer of invisibility, not a substitute for not having that layer.
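To make "week one, not week four" concrete, here is a minimal sketch that compares each day's spend against a trailing baseline. It assumes you can export daily cost figures from your provider; the function name, the 7-day window, and the 1.5x ratio are illustrative choices, not taken from any particular billing tool.

```python
from statistics import mean

def flag_cost_anomalies(daily_costs, window=7, ratio=1.5):
    """Return the indexes of days whose cost exceeds `ratio` times
    the mean of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_costs)):
        baseline = mean(daily_costs[i - window:i])
        if baseline > 0 and daily_costs[i] > ratio * baseline:
            flagged.append(i)
    return flagged

# Hypothetical daily spend: flat for eight days, then a jump.
costs = [10, 11, 10, 12, 11, 10, 11, 12, 30, 31]
print(flag_cost_anomalies(costs))  # [8, 9]
```

The jump is flagged on the day it happens. A monthly budget threshold would only trip once the accumulated total crossed the line.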

Pain 2: performance degradation that auto-scaling hides

You're looking at a database load graph that's been climbing for three months. Why?

Is it because users are growing? Healthy. Or is it because your aggregation table is missing partitions, so query cost grows superlinearly with data volume? Not healthy.

When the database has fixed limits, this question gets forced on you. The database starts complaining. You have to look.

When the database auto-scales, the second condition is much easier to overlook. The infrastructure absorbs the inefficiency, and nothing forces you to look. The graph keeps climbing. The bill keeps climbing. Nobody traces it back to the actual code that's doing the wrong thing.
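One rough way to tell the two cases apart is to estimate how query cost scales with data volume. The sketch below fits a slope on a log-log scale. It assumes you have sampled (row count, query seconds) pairs over time; the function name and the sample figures are hypothetical.

```python
import math

def growth_exponent(samples):
    """Least-squares slope of log(cost) against log(rows).
    About 1.0 means cost tracks data volume; well above 1.0 means
    cost is growing superlinearly."""
    xs = [math.log(rows) for rows, _ in samples]
    ys = [math.log(cost) for _, cost in samples]
    n = len(samples)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical measurements: (rows in table, seconds per query).
healthy = [(1000, 0.1), (2000, 0.2), (4000, 0.4)]
unhealthy = [(1000, 0.1), (2000, 0.4), (4000, 1.6)]
print(round(growth_exponent(healthy), 2))    # 1.0
print(round(growth_exponent(unhealthy), 2))  # 2.0
```

An exponent near 1 is consistent with "users are growing". An exponent near 2 points back at the query or the schema, whatever the infrastructure is absorbing.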

Pain 3: where the work actually happens

This is the one I think about most.

Someone I work with recently described a problem to me like this: "The managed GraphQL layer is firing way too many SQL queries, and I need to do something about it."

I kept turning that statement over in my head. "The managed GraphQL layer is firing too many SQL queries." What does that actually mean?

It means: queries are being issued, against a relational database, on behalf of an application this engineer is responsible for. And the engineer doesn't fully know how many, when, or with what transaction boundaries.

That last part is the one that matters. I asked them: if you're aggregating into yearly, monthly, and daily summary tables, and then setting a "processed" flag on the source records — and the very last UPDATE fails — can you roll all of that back? Is the whole sequence under your control?

I'm not sure they fully understood the question. To be fair, in the world they were handed on day one, the question doesn't naturally come up. The way the queries were resolved broke the unit of work into independent pieces, and each piece looked atomic on its own.

That isn't an SQL problem. It's a control problem.

A web design analogy

If I had to put this another way:

You can build a beautiful website using a no-code tool today. The output looks great. It deploys fine. It probably even works in production for a while.

But when something needs adjusting at the level of the generated code, you're stuck. The tool has handed you a polished surface, and not the means to repair it. For prototyping, this is excellent. For long-term operation, it's a problem you'll only discover the day you need to fix something the tool didn't anticipate.

Serverless plus heavy abstraction layers, given to an engineer on day one, is a similar situation — except the things they can't repair include cost, performance, and transactional integrity, all at once.

The application engineer is the right judge — they just need to be able to see

Here's the part that often gets reversed in these discussions.

The right person to judge whether an application is well-optimized is the application engineer who built it. They know the business requirements. They know which endpoint is hit by 1M requests a day, and which one is hit by 10. They know which tables grow linearly with users and which grow with the square of users. Infrastructure people, by their position, often don't have this context — not because they lack ability, but because the role doesn't naturally include the business reasoning behind every endpoint. They can't make this judgment from the outside.

The application engineer is the right judge.

But to judge, they need to be able to see. They need to know how many SQL queries their endpoint issues. They need to know what transactions they own. They need to know what happens when the third write succeeds and the fourth fails. None of this is obscure or fancy — it's the basic literacy of running a relational database under load.
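Counting the queries behind one operation does not need special tooling. The sketch below uses the trace hook in Python's standard `sqlite3` module to count SELECT statements. The schema and the two fetch functions are hypothetical, and the per-row lookup pattern is an assumed example of how a resolver layer might over-issue queries, not a description of any specific product.

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def count_selects(conn):
    """Count SELECT statements executed on `conn` inside the block."""
    counter = {"n": 0}

    def on_statement(sql):
        if sql.lstrip().upper().startswith("SELECT"):
            counter["n"] += 1

    conn.set_trace_callback(on_statement)
    try:
        yield counter
    finally:
        conn.set_trace_callback(None)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors (name) VALUES ('a'), ('b'), ('c');
    INSERT INTO posts (author_id, title) VALUES (1, 'x'), (2, 'y'), (3, 'z');
""")

def posts_per_author_lookup(conn):
    # One query for the list, then one more query per row.
    authors = conn.execute("SELECT id, name FROM authors").fetchall()
    return {
        name: conn.execute(
            "SELECT title FROM posts WHERE author_id = ?", (author_id,)
        ).fetchall()
        for author_id, name in authors
    }

def posts_joined(conn):
    # The same data in a single query.
    return conn.execute(
        "SELECT a.name, p.title FROM authors a JOIN posts p ON p.author_id = a.id"
    ).fetchall()

with count_selects(conn) as c1:
    posts_per_author_lookup(conn)
with count_selects(conn) as c2:
    posts_joined(conn)
print(c1["n"], c2["n"])
```

With three authors the first version issues one query for the list plus one per author, and the second issues one in total. The difference only shows up when someone counts.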

The problem isn't that abstractions exist. Abstractions are fine. The problem is that abstractions are now the first thing an engineer encounters, and the things underneath them are made deliberately invisible.

My setup, for what it's worth

I run something called LDX hub. It's a public API platform on top of an internal hub I've been growing for years. Five services, document processing, the usual.

For the public-facing edge, I use a managed gateway on a flat-rate monthly subscription. It's "serverless" in shape, but the billing is fixed. There is no cost-spike risk to monitor for. If the bill could ever scale with traffic — even gracefully — I would seriously reconsider. The flat rate is the entire reason this part of the stack is comfortable to me.

For the compute layer, I run Java on EC2. Java's cold-start cost makes a long-running process the natural fit, but that's only half the reason. The other half is that EC2 has fixed limits. When something is wrong with the application, the fixed ceiling forces me to see it. I keep some old-fashioned things — server-side metrics, per-segment timing logs that record how long each part of a request took, alerts when something runs slower than it should — and these tell me, fairly directly, whether the application is doing its job efficiently or not.
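A per-segment timing log can be very small. The following is a sketch in Python for illustration only (the stack described here is Java); the `RequestTimer` class, the segment names, and the threshold are hypothetical.

```python
import time
from contextlib import contextmanager

class RequestTimer:
    """Accumulates elapsed time per named segment of one request."""

    def __init__(self, slow_threshold_s=0.5):
        self.segments = {}
        self.slow_threshold_s = slow_threshold_s

    @contextmanager
    def segment(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.segments[name] = self.segments.get(name, 0.0) + elapsed

    def slow_segments(self):
        """Names of segments that took longer than the threshold."""
        return [n for n, s in self.segments.items() if s > self.slow_threshold_s]

timer = RequestTimer(slow_threshold_s=0.01)
with timer.segment("parse"):
    pass                  # stand-in for cheap work
with timer.segment("db"):
    time.sleep(0.02)      # stand-in for a slow query
print(timer.slow_segments())  # ['db']
```

Logging `timer.segments` at the end of each request gives the per-part breakdown; `slow_segments()` is the hook an alert would hang from.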

For the database, I run my own RDB on EC2 too. Managed databases would handle backups and patches for me, but they would also schedule downtime I can't always control. I prefer the option of doing maintenance myself, on my schedule, with the techniques I've used for two decades — including running two databases in parallel through an application-layer two-phase commit, then cutting over, all without taking the system down. None of that works if the database is something I can't touch.
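The text does not spell out how the parallel-run step is implemented, so the following is only a rough sketch of the general idea of an application-layer two-phase write: run the statement on both databases inside open transactions, and commit only if both accepted it. It uses two in-memory SQLite databases as stand-ins; `dual_write` and the `items` table are hypothetical names.

```python
import sqlite3

def dual_write(primary, secondary, sql, params=()):
    """Apply one write to both databases, committing only if both accept it."""
    try:
        # Phase 1: execute on both connections; nothing is committed yet.
        primary.execute(sql, params)
        secondary.execute(sql, params)
    except sqlite3.Error:
        primary.rollback()
        secondary.rollback()
        raise
    # Phase 2: both accepted the write, so commit both.
    # A crash between these two commits can still leave the databases out of
    # step; a real setup needs a recovery path for that window.
    primary.commit()
    secondary.commit()

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    db.commit()
    return db

old_db, new_db = make_db(), make_db()
dual_write(old_db, new_db, "INSERT INTO items VALUES (?, ?)", (1, "a"))

# Make the new database reject the next write (duplicate primary key).
new_db.execute("INSERT INTO items VALUES (2, 'conflict')")
new_db.commit()
try:
    dual_write(old_db, new_db, "INSERT INTO items VALUES (?, ?)", (2, "b"))
except sqlite3.IntegrityError:
    pass
print(old_db.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 1
```

The second write fails on the new database, so the old one rolls back too and still holds only the first row. Keeping both sides in agreement like this is what makes a later cutover safe.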

The whole stack is in AWS. None of it is "old school" in the sense of being on physical hardware. But each layer was chosen so that the things I should be able to see remain visible, and the things I should be able to control remain in my control.

This is not what you should do. This is what 24 years has taught one specific person to do.

Closing

I'm not against abstractions. I use them. The managed gateway in front of LDX hub is one. Cloud is one. Even my IDE is one.

What I'm against is starting there.

If you learn to write applications in an environment where you can't see what you're consuming, you don't learn the relationship between what you wrote and what your system did. You also don't learn how to cover the gap when something goes wrong. Twenty-four years from now, you'll still be guessing at it.

The engineers I see escape this trap usually do it because someone — often by accident — exposed them to a system where the lights were on.

I don't know if that's a fixable situation at the level of an industry. I work around it in my own setup by keeping the lights on — not because visibility is a virtue, but because the business logic is the one thing in my stack that no one else will get right for me. Everything else exists to give me time to get that part right.

Experience widens the set of choices that can buy you that time. I'm not arguing against any of those choices. I'm arguing against the day one where there isn't a choice yet.

Built with Claude (Opus).

