When in doubt, refactor at the bottom
Eric Normand Jul 10, 2017
When do you think it's okay to extract out an abstraction?
I wasn't really prepared to answer this question. I was expecting to talk about ClojureScript. I gave the question a shot, but of course, after the recording I had a better idea of what to say.
One way to refactor code is to look for repetition. You look for blocks of code that are similar, then extract the common parts into a function. People justify this type of refactoring as supporting the principle of Don't Repeat Yourself. They call it DRYing up your code.
DRYing up your code has become so common that now there is advice against doing it too much. Sandi Metz's advice is very popular:
Duplication is far cheaper than the wrong abstraction.
Her article explains the problems very well. Go read it.
The situation goes like this: you see ten lines of code that are almost the same as ten lines of code somewhere else. They're doing something similar and you think you could extract out the common parts into a function and make the uncommon parts into parameters to the function. You do so, your code gets shorter, and you're happy.
Metz goes on to talk about the follow-on effects, which are that someone wants to reuse that abstraction but needs those ten lines to be different in just one more way, so they add another parameter. Parameters aggregate over time until the code is a mess.
People have different ways to mitigate this. Some people will say "wait until the code is repeated three times before you extract it". It is much more likely to be used a fourth time, and you're more likely to capture all of the parameters the first time.
This method does count as an abstraction since you are naming a function. The name, of course, is of vital importance. Many people say that if you can't think of a good name for your abstraction, it's probably not a thing. I'm a fan of that, but naming is hard. Sometimes I can't think of a good name when it really is a thing.
But there's a deeper problem. I actually think ten lines of code is too big for an abstraction (Metz agrees). Ten lines of code that happen to be repeated are so unlikely to have any interesting properties, including being bug-free, reusable, and composable, that you should be very skeptical of being able to factor the whole thing out as common. However, don't despair! You can more often than not find lots of one-, two-, or three-line bits of code that are 1.) easy to name and 2.) reusable. If you break enough of those out, you may start to see some underlying structure to those repeated ten lines as they become replaced by your new, small abstractions.
When people ask me "how do I refactor this?", that's my first response. Start naming the small bits, even if they're not repeated. Names give meaning to your code. It organizes your code and your thinking. And even if the named bit is never reused, it's still valuable to have a name attached. If you get some reuse out of it, that's gravy.
This process is a purely mechanical, almost statistical way of finding reusability. You're betting that smaller abstractions are more likely to be reusable than big ones. It's a good bet to make,. However, refactoring cannot fix code that's broken. It can only improve the quality of the code, not its underlying meaning.
Sometimes that's all you want to do. You've got some existing code. It works. You just want to clean it up. That's fine. It's like making your desk neater. You can stack up the papers more neatly, arrange the stapler and pencil holder. Throw away the old coke cans. You can do all of that without really knowing what the work is that is done at the desk.
But sometimes you need to look at the meaning of the code, and really do some work at that level. If you want to file those papers away, you have to know where to file them. You have to know what the paper is about and why you have it. Do we need this stapler? Maybe it's not used in this line of work? Maybe it needs to be a longer stapler. We need to know much more about the meaning of that stapler in the work being done.
At this point, simple metrics about how many times something is repeated or how many lines of code are extracted won't cut it. In fact, simply extracting existing code won't cut it at all. We need to go deeper into what we're trying to do and find the structure there--not in our code itself. When you're at that point, there isn't much advice. Here is what I've found/created.
- Domain-Driven Design is a methodology for developing code alongside domain experts to really get into the "what and why" of your abstractions.
- Denotational Design is a method for thinking deeply about abstractions using Category Theory.
- Building Composable Abstractions is my attempt to elaborate a process for designing abstractions that doesn't require complex math. It relies heavily on your intuition of physical systems and delays writing code as long as possible.
I appreciate these methods because they force us to revisit the domain in deep ways.
Repetition in our code can be a sign that there is an abstraction lurking in our code, ready to be extracted. Although the sign may be clear, we can't always be sure how best to extract it. If you're unsure, don't extract it. The longer the abstraction is, the more likely it won't be reusable. But every ten-line bit of repeated code has nine two-line bits and eight three-line bits. There's probably something there to extract. Start there, with smaller abstractions. Start refactoring at the bottom! Finally, there are limits to refactoring. Sometimes you need to revisit the meaning of your abstractions, not just the current code.
If you're interested in the big ideas in Computer Science and the philosophical questions of programming, you should check out the PurelyFunctional.tv Newsletter. It's a weekly romp through the theory and practice of Clojure and functional programming.