Refactoring the Worst Code I’ve Ever Written

Jacque Schrag on April 18, 2019

During the recent #DevDiscuss chat on "Developer Confessions", I confessed that I didn't really know what I was doing when I started my first dev j... [Read Full]
markdown guide
 

I remember when you posted this code on Twitter unabashedly, which I SUPER respected. It showed people especially new up and comers that EVERYONE starts somewhere.

This is doubly awesome because it can show people YOU will IMPROVE 🙂🙂🙂

 

Thank you so much! I was honestly really nervous about posting it on Twitter, so I'm glad it was so well received. I wish more developers would share their bad or in-progress code for exactly the reason you stated - everybody starts somewhere and everyone improves. :D

 

But nested loops can have significant negative performance impacts. This method focuses on each layer of categorization separately, eliminating extraneous looping and improving performance.

It's awesome that you've got an eye out for performance - that's always great to see. You are right that nested loops can be problematic, but keep in mind that the cost of the work done in the loop is the main factor.

In this case it's actually 6x faster to build your structure using nested loops, since the cost of creating empty arrays is tiny compared to the overhead of all the other calls in your write-up. I'd argue that the nested loop version is easier to understand too:

const types = ['in', 'out'];
const years = [2000, 2005, 2010, 2015, 2016, 2017];
const regions = ['Africa', 'Americas', 'Asia', 'Europe', 'Oceania'];

var dataset = {};

for (let typ of types) {
  dataset[typ] = {};

  for (let year of years) {
    dataset[typ][year] = {};

    for (let region of regions) {
      dataset[typ][year][region] = [];
    }
  }
}

As the number of values in types, years and regions increases, the nested loop implementation remains 6x faster as the main cost is running the rest of the code, not creating the array.

It's cool that you know how to reduce iterations like this, but nested loops aren't automatically a bad thing 🙂

 

Thanks for the comment! I'm aware that nested loops aren't always a bad thing, that's why I included the word "can" in the part of my article you quoted. :)

In this case, you're right, the nested loop is faster, so I stand corrected on the claim that the solution I proposed is better for performance. Most "best practice" articles/guides I've read have all suggested to stay away from nested loops when possible, so I opted to highlight a different method. Of course it's all context specific, so the "best" way is going to be variable depending on the problem being solved.

In any case, I've updated my article with a link to your comment so others can see it as well. Thanks!

 

Would you consider it highjacking your post if we used the comments to share "hold my beer" examples of our own terrible old code? I'm currently thinking of some ridiculously bloated and complicated Java code I wrote in years past when inheritance and generics were my solution to all problems.

 
 

"Hold my <E extends Beer>"

public abstract class EntityManager <E extends Entity,F extends EntityMetaData,G extends EntityMetaDataCollection<F>,H extends EntityWebService> {

  protected abstract EntityDialog<E,F,G,H,EntityManager<E,F,G,H>> createEntityDialog(F entityMetaData);

}

I once thought the above was good code and was proud of how I had ensured type-safety 😭

 

At a glance, I would say your original code, while not dry is actually better in terms of long term maintenance and easier for people to jump into.

The reason I say this is that we learn to make things dry and obsess to reduce our code but when you have to maintain a project for 5+ years and developers come and go you realize this is where the mantra "Clever vs Clear" steps into effect and you look at your code in a different way.

You're already a Pro 👍

 

Thanks for the comment! I think I’m going to disagree. Although I concede that at a glance, reading the hard coded version is easier to jump into, from a long term maintenance perspective it isn’t better. I’ve had to update the data for this interactive twice now, and both times something went wrong at first because I forgot to fully implement the new year. I’ve built a lot more interactives using the data driven approach that don’t have that same problem and can be updated just by uploading a new file. Yes, it takes longer for a new dev to read the code to understand how it works, but it’s straight forward enough that I think that extra mental effort pays for itself.

 

There are so many unknowns still unanswered so the best reply I could give was 'at a glance'.

 

First, I really enjoyed this, so please take that. It's lovely to see someone learn so much. This is the sort of code I try help my teammates write. Neat, organised and with clear optimisations.

I want to note a couple further optimisations though.

First, you could create the full dataset in a single reduce using the .add() method on Set. I don't know how Array.from on a Set is or a new Set on an array are implemented but if it isn't some recast of memory then you actually have a few implicit nested loops in this section.

Then, the .includes() test on currentRegions could be faster (and this might matter for large data sets) if the values in currentRegions were either a Set or stored as object keys. This would allow testing via .has() or .hasOwnProperty(), which are O(1) rather than O(n) lookups.

And this is less a matter of optimisation (indeed less optimal on memory but also not coupled to line location) than my taste for non-mutative code, but you can return directly from a reduce using something like:

return {...acc, [curr.type]: { [curr.year]: [...acc[curr.type][curr.year], ...curr] } }

But still bravo on a lot of this and thanks for writing your journey.

(Sorry to write without code highlighting.)

 

Hey, thanks for the comments! I'm always happy to hear constructive criticism on how to make my code better. :)

  1. If I'm understanding you correctly, you're suggesting creating the arrays of types, years, and regions within the .reduce function? That's not a bad idea, and you're right, would provide some optimization. I didn't explain this in the post, but part of my rationale for keeping it separate is to maintain separation of concerns. Those 3 lines are specifically about isolating the values of those variables within the dataset, while the .reduce accomplishes a separate task. But you make a good point and it's something to consider for future implementations!

  2. I hadn't considered storing those values as a Set instead of an array, that's a great idea

  3. Yep! However, I intentionally wrote the longer version for clarity. :) Certainly is worth considering how to make it more concise in production, though!

 

Re 1, it's not so much where it is but rather that I suspect Array.from loops and new Set loops and the map loops, which means 3 loops for each construction of those items.

That said, you could do it all in one reduce and keep the concerns separated with object keys, and then destructure them into variables in the outer scope.

I think re 3 you're talking about my non-mutation comment?

If you destructure the variables it can be terser, but my case is more for non-destructive operations. I don't like modifying things because it makes the original unavailable for later use without larger refactor. But again I'm an ideologue about this and it isn't common opinion.

 

Great post Jacque - it definitely looks cleaner in the end (and less brittle to boot).

Just wondering though - you've performed this refactor, but you've not mentioned how you knew you weren't changing the behaviour? Were there automated tests surrounding the code, or were you executing the code with some sample data and comparing the results manually, or were you doing something else?

 

Thanks so much!

Thanks for pointing out this omission. This particular project doesn't use tests, so my comparing the before/after code was done by manually checking the final dataset object and the returned UI on the data viz. It was definitely more tedious than it could have been if I had also added tests. :)

 

Nice Jacque! I remember that feeling of gnarly, non-DRY code from my first coding challenge at my bootcamp where I wrote out 500 something lines for converting a number to its roman numeral equivalent. This is a great solution and well explained as well!

 

Oh my gosh, that sounds awful 😂And absolutely something that I would have done. Here's to learning to spend our time on harder problems!

 

I'll never forget how I felt that day. It's burned in to my brain as the lowest I've ever felt as an engineer.

 

Thanks for sharing Jacque! It's nice to see that you could take something positive from a complicated experience haha! Can I suggest an article related to the code refactoring subject? This is it, a dev friend of mine wrote it: uruit.com/blog/code-refactoring/

 

Regarding your advice of reading our of code. There is a website that shows you code examples indexed from github for your code search.

I always forget the name, but it always appear when searching by code in Google.

Do you know something similar for Javascript?

It is very useful to see working examples from real life open source projects.

Unfortunately the search in github code via the web site is not very good.

I always get cloning the projects and using grep.

 

I'm afraid I'm not familiar with any tools like that for JavaScript, but that sounds awesome! If you think of the name, please update us!

 

I think is this one: programcreek

Now I realize that it covers also Python, Java and Scala.

This is an example of a search for pytz.timezone

 

Awesome - thanks for posting Jacque!!

I think this point you make is key:

Refactoring is hard because there are several ways to solve problems...

One thing I think helps, tell me if you agree, is if you know a lot more about the mission of the project you're on. In some cases, the mission might be: "I just need to get some web ui out there RIGHT NOW to show some investors what our app might look like," in which case your first solution may be perfectly acceptable.

One the other hand, if you know you need a robust, flexible solution that's going to be around for a long time and be able to adapt to any solution, then what you refactored might be perfect.

So I think it's very important to work with your tech lead to make sure the vision for the task is super clear. If you're the tech lead, make sure you are clear about what the mission is. Then that context helps inform your low level design choices...

What do you think?

Pretty sure, by the way, that my early code was far worse than yours btw!! Fortunately, it's all long gone now!

 

Thanks so much! Glad you enjoyed it. :) And glad to hear I'm not the only one who has made some poor choices with their code before!

I absolutely agree that knowing more about the mission helps in determining in what works as a solution. There is a very real danger in over-engineering something that can be adequately done with less polish. However, I also think we need to be careful to not let poor code choices (like my original snippet) that are done for demo purposes make it into the final code base. I know I've written something quickly for sake of getting it done, didn't take the time to clean it up or think about a "proper" solution, and now it's sitting in production adding to that project's tech debt.

So yes, 99% agree with a 1% caveat to be mindful that poor code doesn't get moved to production because it's already "done". :)

 

hi,Jacque
Can I translate your article into English and share it with Chinese developers?

 

Hi there! Sure, please go ahead. Just include proper credit & a link back to the original here on Dev.to. :) And please share the link with me once it's published!

 
 

Wow! Great to see such a progress! When I saw your initial code, I couldn't come up with an easy way to "DRY" it. :)

What you did is a great job. That also goes for the other "Options" you added later.

One thing that caught my eye:

Every time the user makes a selection, this method will run. Depending on the size of that dataset (remember, it grows every year), there could be a negative impact on performance. Modern browsers are efficient and the performance hit might be minuscule, but if we already know that the user can only select 1 type and 1 year at a time, we can be proactive about improving performance by grouping the data from the beginning.

...is performance. Yes, some other users already mentioned it. :)
I think we (me too) generally think too much about performance. One great book that I'm reading right now is Martin Fowler's "Refactoring" (pun? :D). In it, he writes, and I agree, to not be so strict about performance. Usually performance bottlenecks are in the parts of the apps we don't think they are.
I'd just add that without a benchmark, we are both just playing poker with no cards in the hand. :)
I'd just advise against even mentioning performance. If you're not writing the next react, just ignore it for this use-case.

 

As a Java person just starting out with JS, this post gives me serious hope that eventually I will write JavaScript code I'm not embarrassed of. Thanks for sharing!

 

I’m so glad you found it useful and encouraging! That’s exactly why I wanted to share it. I promise you will absolutely get to that point when you’re ready to share. 💚

 

Very nice article. Thanks.

Actually, you might have already learned that nobody makes the perfect choice at first try.

I like to start with simple versions and evolve it over time.

 

Pseudo code and a little bit of Swift language might help. You are doing a great job though. Wish I was younger to keep up.

 
Sloan, the sloth mascot Comment marked as low quality/non-constructive by the community View code of conduct

Refactoring the Worst Code I’ve Ever Written, Part 2

// Group data according to type and year
const dataset = data.reduce((acc, curr) => {
    acc[curr.type] = acc[curr.type] || {}
    acc[curr.type][curr.year] = acc[curr.type][curr.year] || []
    acc[curr.type][curr.year].push(curr)
    return acc
}, {})
 

Maybe I'm missing the point of your comment, but if you have recommendations for how to refactor that snippet, I'd honestly love to hear them! I did consider creating a dataset object first with the type & years predefined (like I created in Option 1) and setting that to the initial accumulator value, but this seemed cleaner to me. Would love to know if you had other thoughts!

 

Hello, didn't read the whole legacy code.
But this block of code took my attention, in concrete: useless write and duplicated reads.

I see better construction of it is:

const dataSet = data.reduce((acc, cur) => {
  const { type, year } = cur;

  if (acc[type] == null) acc[type] = {};
  if (acc[type][year] == null) acc[type][year] = [];

  acc[type][year].push(cur);
  return acc;
}, {});

Thanks for sharing! Just goes to show that there's a lot of different ways to solve the same problem.

@Jacque, it's way of not doing useless writing each time, but only in case when it needs.

Yep, I read it and can see what it does, thanks.

Yeah, Dmytro it's terser. It's really not among the more interesting comments you could make though given the optimisations actually available above.

Maybe consider when commenting on the internet about how crap someone's code is that yours is still very ...naive.

sauln, if take your comment seriously, not easy without smile, but anyway,
about "naive" – what do you mean from technical point of view?
Propose your variant, and highlight weak(or naive) sides of my example,
if you will be right with your proposition – I will agree with better solution with real advantages.

code of conduct - report abuse