stereobooster

Posted on Jan 12, 2019 • Edited on Aug 23, 2019

Spot a leaky abstraction

#beginners #programming #discuss

Let's play a game. Who can find the most leaky abstractions? I wrote about abstractions recently. To recap: a leaky abstraction is one that exposes its implementation details.

I will start: indexes in relational databases. The abstraction consists of relational algebra (in the form of SQL), entity-relationship diagrams (as a way to model), and normal forms, right? You model your database, get it to 3rd normal form, write some queries, and everything works OK. Then it turns out that some queries are slow, and you need to use EXPLAIN. It may happen that the implementation of the query analyzer is not ideal and you need to tweak your query a bit to make it happy, or you need to add indexes (and you can do that without understanding what they are and how they work: some magic that makes your SELECTs faster and your INSERTs slower).

OK. Then you need to implement text search. You have indexes, you use LIKE 'abc%' and it is fast, but LIKE '%abc%' is slow? Your reaction: WTF. You start reading, and it turns out that indexes are (typically) B-tree data structures that keep their keys in sorted order, so the database can quickly descend the tree from top to bottom to the first key with a given prefix. That speeds up "starts with" searches, but not "contains". At this point the implementation is exposed: I need to understand how it is implemented to understand why it works the way it works. Luckily, I don't need to understand how it handles concurrency or works with the file system, so it's not that bad.
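To make the B-tree point concrete, here is a minimal sketch that uses a sorted Python list as a stand-in for the ordered keys of a B-tree index. This is a simplifying assumption: a real B-tree is a balanced tree of pages, not a flat list, but it keeps keys in the same sorted order, which is the property that matters here. The names `build_index`, `starts_with` and `contains` are hypothetical, chosen for illustration only.

```python
import bisect


def build_index(values):
    """Sorted copy of a column: a stand-in for the ordered keys of a B-tree index."""
    return sorted(values)


def starts_with(index, prefix):
    """LIKE 'prefix%': binary search to the first candidate, then scan only the matches.

    All keys sharing a prefix are contiguous in sorted order, so the cost is
    O(log n) to find the start plus the number of matching keys.
    """
    start = bisect.bisect_left(index, prefix)
    end = start
    while end < len(index) and index[end].startswith(prefix):
        end += 1
    return index[start:end]


def contains(index, fragment):
    """LIKE '%fragment%': sort order says nothing about substrings, so check every key (O(n))."""
    return [key for key in index if fragment in key]


index = build_index(["zzz", "abc", "xabc", "abacus", "cabcab", "abcde"])
print(starts_with(index, "abc"))  # ['abc', 'abcde']
print(contains(index, "abc"))     # ['abc', 'abcde', 'cabcab', 'xabc']
```

The prefix search touches only the neighborhood of the matches, while the "contains" search has to walk the whole index, which is exactly the difference you see between LIKE 'abc%' and LIKE '%abc%'.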

Don't confuse leaky abstractions with the wrong abstraction. To recap: an abstraction is the wrong one when you can't precisely state your problem in its terms. It doesn't fit: it was written with different use cases in mind, it is outdated, the requirements changed, etc. For example, SQL is not well suited to storing graphs (doable) or traversing them (a really bad idea); SQL is simply the wrong abstraction for this task. Use a graph database, like Neo4j, and Gremlin or Graphit instead of SQL.

It's your turn. One example per comment. Use open source projects or publicly available things as examples.

Photo by Nine Köpfer on Unsplash

Top comments (7)

Adrian B.G. • Jan 12 '19

I don't think you need to understand how indexes are implemented; I'm pretty sure you just have to read the manual.

Unpopular opinion: why is this portrayed as a bad thing? Knowing how the tech you use every day and depend on works isn't bad; for a professional, it's a requirement.

stereobooster • Jan 13 '19

To be clear:

The fact that there is one tiny leaky abstraction doesn't disqualify the whole technology. SQL is just fine. This thing with indexes is fine too, but I guess we could do better (and that's hard, otherwise people would have come up with a better solution already).
I never said that learning new things is bad. By all means, learn as much as possible. But also take into account that there may be people who don't want to, and they will be forced to or get frustrated. What if it's somebody from BI who has no clue what a B-tree is?

Now, about the leaky abstraction. I wrote a whole paragraph explaining why, but I guess my explanation was unclear. Help me understand what is unclear in it.

The abstraction itself consists of relations (expressed as foreign keys or joins on some field), entities (expressed as tuples in tables), and normal forms. None of those mention indexes, right?

Indexes are required only because we need queries to be fast enough to be usable in practice. What if, instead of exposing indexes, I gave the database the list of queries I want to run and it used some heuristic to guess the indexes? You see, indexes are incidental complexity here. We could avoid them if there were a system smart enough to solve this. But there are a lot of factors, and it is hard to solve this problem in general (or maybe nobody has tried, because we got used to the idea).
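The "guess the indexes from the queries" idea could be sketched roughly like this. It is a toy heuristic under stated assumptions: queries arrive as already-parsed predicates (no SQL parsing), and the hypothetical `suggest_indexes` function recommends an index for any column filtered by equality or prefix at least `min_uses` times, skipping "contains" filters that a B-tree can't serve. The `Predicate` type and its `kind` values are also invented for illustration; a real system would have to weigh many more factors (selectivity, write cost, storage).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """One filter in a query's WHERE clause (hypothetical, already-parsed form)."""
    table: str
    column: str
    kind: str  # "eq" (col = x), "prefix" (LIKE 'abc%'), "contains" (LIKE '%abc%')


# Predicate kinds that an ordered (B-tree) index can speed up.
BTREE_FRIENDLY = {"eq", "prefix"}


def suggest_indexes(workload, min_uses=2):
    """Toy heuristic: return sorted (table, column) pairs worth indexing.

    `workload` is a list of queries; each query is a list of Predicate objects.
    """
    uses = Counter(
        (p.table, p.column)
        for query in workload
        for p in query
        if p.kind in BTREE_FRIENDLY  # "contains" can't use a B-tree, so skip it
    )
    return sorted(col for col, count in uses.items() if count >= min_uses)


workload = [
    [Predicate("users", "email", "eq")],
    [Predicate("users", "email", "eq"), Predicate("users", "name", "contains")],
    [Predicate("users", "name", "prefix")],
    [Predicate("users", "name", "contains")],
]
print(suggest_indexes(workload))  # [('users', 'email')]
```

Even this toy version shows why the problem is hard in general: the threshold, the predicate kinds and the cost of slower writes are all judgment calls that the abstraction currently pushes onto the user.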

Adrian B.G. • Jan 14 '19

I see, so you are saying that SQL is an abstraction over data structures (databases), and the fact that we need to deal with indexes is a leak.

I see indexes as part of SQL, as a data attribute, and also as part of the technology you have to use. Indeed, not of the SQL language directly, but we don't use SQL, we use a database.

'Create table' and 'create index' are the same kind of thing: you specify how you want your data to be used, stored, and retrieved.

Arvind Padmanabhan • Aug 23 '19

There's a mistake in the post. LIKE "%abc%" is slow but the leading % is missing.

stereobooster • Aug 23 '19

Fixed, thanks.

Jonathan Boudreau • Jan 13 '19

I think an abstraction is leaky when you need to understand the underlying implementation details to understand it at all, not merely to optimize it.

stereobooster • Jan 13 '19

You said "optimize" as if it were optional. And indeed, if we take "make it work, make it right, make it fast" as the rule, it can be treated that way. But sometimes optimization is a first-class requirement: FPS in games, response time in trading applications, real-time systems, a script behind a load balancer with a short timeout (remember the unicorn page on GitHub). In those cases the "you only need it to optimize" argument doesn't hold. But this is a gray area; we are arguing about the semantics of loosely defined terminology.

Give your example of a leaky abstraction, and then maybe I will be able to see the picture from your PoV.