Eric Normand

Posted on Jul 10, 2017 • Originally published at lispcast.com

When in doubt, refactor at the bottom

#abstraction #refactoring

I was recently in a discussion on JavaScript Jabber. Aimee Knight had been reading my articles about abstraction and asked a really poignant question, which I will paraphrase:

When do you think it's okay to extract out an abstraction?

I wasn't really prepared to answer this question. I was expecting to talk about ClojureScript. I gave the question a shot, but of course, after the recording I had a better idea of what to say.

One way to refactor code is to look for repetition. You look for blocks of code that are similar, then extract the common parts into a function. People justify this type of refactoring as supporting the principle of Don't Repeat Yourself. They call it DRYing up your code.

DRYing up your code has become so common that now there is advice against doing it too much. Sandi Metz's advice is very popular:

Duplication is far cheaper than the wrong abstraction.

Her article explains the problems very well. Go read it.

The situation goes like this: you see ten lines of code that are almost the same as ten lines of code somewhere else. They're doing something similar and you think you could extract out the common parts into a function and make the uncommon parts into parameters to the function. You do so, your code gets shorter, and you're happy.

Metz goes on to talk about the follow-on effects, which are that someone wants to reuse that abstraction but needs those ten lines to be different in just one more way, so they add another parameter. Parameters aggregate over time until the code is a mess.

People have different ways to mitigate this. Some people will say "wait until the code is repeated three times before you extract it". It is much more likely to be used a fourth time, and you're more likely to capture all of the parameters the first time.

This method does count as an abstraction since you are naming a function. The name, of course, is of vital importance. Many people say that if you can't think of a good name for your abstraction, it's probably not a thing. I'm a fan of that, but naming is hard. Sometimes I can't think of a good name when it really is a thing.

But there's a deeper problem. I actually think ten lines of code is too big for an abstraction (Metz agrees). Ten lines of code that happen to be repeated are so unlikely to have any interesting properties, including being bug-free, reusable, and composable, that you should be very skeptical of being able to factor the whole thing out as common. However, don't despair! You can more often than not find lots of one-, two-, or three-line bits of code that are 1.) easy to name and 2.) reusable. If you break enough of those out, you may start to see some underlying structure to those repeated ten lines as they become replaced by your new, small abstractions.

When people ask me "how do I refactor this?", that's my first response. Start naming the small bits, even if they're not repeated. Names give meaning to your code. It organizes your code and your thinking. And even if the named bit is never reused, it's still valuable to have a name attached. If you get some reuse out of it, that's gravy.

This process is a purely mechanical, almost statistical way of finding reusability. You're betting that smaller abstractions are more likely to be reusable than big ones. It's a good bet to make,. However, refactoring cannot fix code that's broken. It can only improve the quality of the code, not its underlying meaning.

Sometimes that's all you want to do. You've got some existing code. It works. You just want to clean it up. That's fine. It's like making your desk neater. You can stack up the papers more neatly, arrange the stapler and pencil holder. Throw away the old coke cans. You can do all of that without really knowing what the work is that is done at the desk.

But sometimes you need to look at the meaning of the code, and really do some work at that level. If you want to file those papers away, you have to know where to file them. You have to know what the paper is about and why you have it. Do we need this stapler? Maybe it's not used in this line of work? Maybe it needs to be a longer stapler. We need to know much more about the meaning of that stapler in the work being done.

At this point, simple metrics about how many times something is repeated or how many lines of code are extracted won't cut it. In fact, simply extracting existing code won't cut it at all. We need to go deeper into what we're trying to do and find the structure there--not in our code itself. When you're at that point, there isn't much advice. Here is what I've found/created.

Domain-Driven Design is a methodology for developing code alongside domain experts to really get into the "what and why" of your abstractions.
Denotational Design is a method for thinking deeply about abstractions using Category Theory.
Building Composable Abstractions is my attempt to elaborate a process for designing abstractions that doesn't require complex math. It relies heavily on your intuition of physical systems and delays writing code as long as possible.

I appreciate these methods because they force us to revisit the domain in deep ways.

Conclusions

Repetition in our code can be a sign that there is an abstraction lurking in our code, ready to be extracted. Although the sign may be clear, we can't always be sure how best to extract it. If you're unsure, don't extract it. The longer the abstraction is, the more likely it won't be reusable. But every ten-line bit of repeated code has nine two-line bits and eight three-line bits. There's probably something there to extract. Start there, with smaller abstractions. Start refactoring at the bottom! Finally, there are limits to refactoring. Sometimes you need to revisit the meaning of your abstractions, not just the current code.

If you're interested in the big ideas in Computer Science and the philosophical questions of programming, you should check out the PurelyFunctional.tv Newsletter. It's a weekly romp through the theory and practice of Clojure and functional programming.

Top comments (3)

Andy Zhao (he/him) • Jul 11 '17

I really like the cleaning up your desk analogy. Makes a lot of sense to me.

Also, I think the "name first, use later" approach is really helpful. @ben also gave me the same advice, and it helps me think in ways I wouldn't have before.

Solid article for people like me who are just getting started in their career! Thanks for the good read as well.

Alexey Voinov • Jul 11 '17

I've tried to write a comment here three times, and then deleted it. This one is the fourth. :) I'm not sure what's bothering me here. The idea of refactoring at the bottom is good. Smaller abstractions are perfect. But the idea of just leaving duplicated code makes me feel uneasy. And even that is not completely true. The idea of leaving duplicated code and feeling that it is ok, that's what I think is not good.

What I think would be slightly better, is to refactor anyway, even if it is duplicated "only" in two places, and there's a good chance that this code will be used more. I see two rules here: 1. abstractions should be meaningful, extracting code that makes sense only structurally is rarely a good thing. 2. when you see a need to use this abstraction once more and add a parameter, refactor once more. With more information you can achieve better structures, better abstractions. And this doesn't mean that you should not extract when you think you don't have enough information. That should be an iterative process, possibly infinite. :)

I hope I was able to express what I had in mind. Sometimes it is quite hard to do, especially for the things that seems obvious.

Gage • Oct 18 '17 • Edited

The main danger is when you bring two things together you are in a saying they are the same. It's like organising so the papers on your desk by color, at first this makes a ton of sense and brings a sense of Zen. However over time your boss informs you that he needs all the bills that were on your desk to be paid. This goes against your storage system, they are all mixed up! They seemed so similar however the more you learn they grow apart. This is no problem if you have the time to refactor your system to be more bill driven rather than color driven but often that abstraction becomes so ingrained it's very expensive to change. Waiting for more information is important.