This Dot Media for This Dot

Posted on Jun 30, 2019 • Edited on Jun 30, 2024 • Originally published at thisdot.co

The Cost of Premature Abstraction

Confused about or frustrated by premature abstraction? Want to know how to avoid it? In this article, we’ll walk through what premature abstraction is, how to recognize it, and how to evade it before it causes problems in your code.

For those of you not familiar with premature abstraction, this term refers to adding abstractions to your code before it’s actually necessary, and before you fully understand the problem space and all the possible variations.

It’s easy to provide the definition of “premature abstraction.” It’s much harder to truly know when an abstraction is truly premature or when it’s just smart planning.

When is Premature Abstraction Most Common?

The cycle usually starts off innocuous and righteous: You’re solving a large problem (e.g., a new feature) and find that many of its smaller sub-problems have things in common — duplicate code. You add an abstraction.

As you continue to work the larger problem and reach for your new abstraction, you modify it to accommodate the slight variation. Repeat. More time passes. Repeat.

Eventually, you start to find that all the variations the abstraction needs to support starts become unwieldy. The abstraction itself is now complex, hard to follow, and brittle. It tries to do too much for too many disparate scenarios.

This can happen relatively quickly, or it could take months or even years to start causing grief. Either way, abstractions can have an ironic effect of making things harder to maintain.

Some of the most common cases of premature optimization happen with “helper/utility” functions and with Views/Components. If you find your abstraction has to check numerous possible input cases, this is a good sign that the abstraction was premature, or at least that it has outgrown its original purpose and should be disbanded.

Data structures can suffer from this too, but you also don’t want them to end up as monolithic God objects that know too much.

Another case is a little sneakier: bringing in complex libraries or patterns. In the JavaScript world, libraries like Redux, RxJS, NgRx, etc., are very useful but come at a cost. Particularly for newcomers, it’s easy to use these libraries in ways that do more harm than good. We’re actually big fans of all of them, but have found that the temptation by many to bring them in too early or without proper knowledge/training has led to unfortunate outcomes.

Recognizing an abstraction as premature is more art than science, and admittedly, it’s very subjective.

The True Cost of the Wrong Abstractions

It’s somewhat of a paradox, but abstractions can simultaneously increase and decrease the complexity of code — even if you choose the right abstraction at the right time.

When the right abstractions are added at the right time, it can make it easier to fix bugs in code that is commonly reused, as well as to make it easier for someone to quickly understand what the process of a particular code path looks like without needing to understand the nitty-gritty details. For example, when I’m scanning some code and come across fetchUser() I don’t always need to know how the User is in fact fetched.

At the same time, even with the right abstraction, this indirection means the developer who does have to change some of that underlying abstracted behavior has to dig deeper, search further, and untangle compounded abstractions on top of abstractions. Admittedly, this is actually normal to some extent.

However, when the abstraction was premature or otherwise poorly executed, these problems are much worse.

If your app doesn’t have solid testing in place, abstractions are even riskier. Any change to that abstraction can have unknown consequences, so you’d be wise to avoid using lots of them in code that isn’t tested. You don’t have to shun them altogether, just avoid them.

Instead of Abstractions

When in doubt, duplicate code. The cost of a bad abstraction is much, much higher than the cost of dealing with duplicate code. There’s even seemingly conflicting advice, like “don’t repeat yourself” (aka D.R.Y.), where you’re encouraged to add an abstraction, such as a function, instead of writing duplicate code.

The ideal lies somewhere in the middle between the two. It’s TOTALLY OKAY to repeat yourself! In fact, we take the stance that you should repeat yourself by default. Abstract it away once it’s very clear that the code you wrote is in fact needs to be used multiple times and that any variations you need to support aren’t significant.

Allocate time for code reviews, and only later refactor to tease out clear cases where nearly identical code paths or data structures can be reused.

If you abstract too early or too aggressively, it’s easy to create overly restrictive abstractions (where you’ll have to work around them later) or overly general abstractions (where you’ll have difficulty maintaining them).

Of course, some things are by definition abstractions, and are unavoidable even from the start; e.g., you (usually) don’t want your JavaScript app querying your database directly. But even in cases such as these, you can keep your initial abstractions to a minimum.

Conclusion

Above all, remember why we create abstractions to begin with: to make it easier to reuse code, fix bugs in a single place, and to hide implementation details.

Abstractions are necessary, and they’re all around us. They’re not evil!

The goal is working towards that seemingly impossible balance of just enough abstractions, introduced at just the right time.

At This Dot Labs, we help software architects and developers think through problems like the costs of premature abstraction and craft expert, seamless solutions. Need help? Email us directly at hi@thisdot.co.

This article was written by Jay Phelps, a Google Developer Expert and mentor at This Dot Labs.

Need JavaScript consulting, mentoring, or training help? Check out our list of services at This Dot Labs.

Top comments (2)

Riccardo Bernardini • Jul 1 '19 • Edited

Actually, my decision on abstracting is made on the basis if the abstraction makes semantically sense, not just if there is some code in common. Because of this, my code usually has class hierarchies that are one, maximum two, level deeps and in most cases the first level is just an interface, declaring the operations I expect from the descendant of that root class. I use this solution especially for those parts of my code that are plugin-based, for example, a program that read some external data could define a "reader interface" and I would be able to add a new input format by just adding a new descendant (plug-in) of the reader interface.

For example, I am currently writing a software that helps in writing European research proposal. In a research proposal you have several "entities:" partners, workpackages, tasks, deliverables and milestones.

Every entity has a name, a short name (used when there is not much space) and a label (used for cross-reference). Deliverables and milestones also have a date when they are expected, while WPs and tasks have beginning and ending dates.

Therefore, my hierarchy is something like this

entity (abstract, it has name, short name and label)
   |
   +-- partner
   |
   +-- event-like entity (abstract, something that happens at a given time)
   |         |
   |         +-- deliverable
   |         |
   |         +-- milestone
   |
   +-- activity-like entity (abstract, begin and end time)
             |
             +-- work package
             |
             +-- task

Note that I never talked about code, the reason for doing this abstraction are basically of "semantic" type. Note also that the abstraction was born from the "concrete" classes and recognizing some commonality in their semantic. It was not born by starting from a "grand root abstract class" with nothing in it, whose only goal is to have everything descends from it.

Alan • Jul 1 '19

Totally agree with your view on what makes abstraction worthwhile.

A one-to-two level deep inheritance or composition is most ideal since it doesn't suffer from Chinese Whispers effect. I often refer to the change as an identity corruption of code, as keeping the identity of a function/class true to its origin would keep it readable and maintainable.