Most developers know that they should write tests, yet few do. This article outlines a very specific scenario I was in recently where my tests came to my rescue, which demonstrates why tests are so important.
I was put in charge of rewriting some legacy code to improve performance, reliability, visibility, etc. It was a scheduling system that triggered events at specific times defined by the users of the system, and those events needed to be repeated at set intervals. The intervals were very flexible: daily, weekly, monthly, on a set day of the month, or on certain days of the week. The kicker was that each user's time zone needed to be considered. This meant that if a user expected an event at 1pm their time, the time zone offset needed to be factored into the recalculation of when that event should fire next.
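To make the time zone problem concrete, here is a minimal sketch of a "next daily fire time" calculation. It is not the code from the system described here; the function name `next_daily_fire` and its parameters are hypothetical, and it only handles the daily case. It assumes Python 3.9+ with the standard `zoneinfo` module and time zone data available on the machine.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def next_daily_fire(after_utc: datetime, local_hour: int, tz_name: str) -> datetime:
    """Return the next UTC instant for a daily event defined in local time.

    Hypothetical helper for illustration only. The event is defined by the
    user's wall clock (local_hour), so the work is done in the user's zone
    and the result is converted back to UTC at the end.
    """
    tz = ZoneInfo(tz_name)
    local_now = after_utc.astimezone(tz)

    # Today's slot on the user's wall clock.
    candidate = local_now.replace(hour=local_hour, minute=0, second=0, microsecond=0)
    if candidate <= local_now:
        # Today's slot has already passed: same wall-clock time tomorrow.
        candidate = candidate + timedelta(days=1)

    return candidate.astimezone(timezone.utc)


# 1pm in New York is 18:00 UTC in winter, but 17:00 UTC once DST starts.
before_dst = next_daily_fire(datetime(2024, 3, 8, 12, 0, tzinfo=timezone.utc), 13, "America/New_York")
across_dst = next_daily_fire(datetime(2024, 3, 9, 20, 0, tzinfo=timezone.utc), 13, "America/New_York")
print(before_dst.isoformat())  # 2024-03-08T18:00:00+00:00
print(across_dst.isoformat())  # 2024-03-10T17:00:00+00:00
```

The two calls show why a fixed "add 24 hours" rule is not enough: the same 1pm local event lands on a different UTC hour once daylight saving time begins.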
The rewrite went mostly well. Some bugs to iron out, but what new code doesn’t have those? Nothing major, until a very specific event in the system needed to be fired in a specific time zone. This is where our story really begins.
I got a call on my day off saying that a specific event was firing over and over again and would not stop. To top it all off, the user was notified every time these events fired, so I can only imagine how annoying this must have been. We disabled those events to prevent the user from getting spammed, but there was work to be done.
I traced the bug to a very specific function that is used by EVERY EVENT that gets processed by the system. It was the function that calculates when each event in the system needs to fire next. It was good that the source was discovered very quickly, but now it needed to be fixed, and any incorrect calculation could cause A LOT of records to get recalculated incorrectly, potentially resulting in an even bigger issue.
When I started this rewrite, the first thing I did was take a TDD approach, which means I wrote unit tests before I wrote much (if any) production code. Along the way, I would fix any tests that broke, and at the end of the rewrite I had a pretty comprehensive test suite that covered most of the system.
Because of this, I could actually write a test case that reproduced the bug, then watch it fail and debug it while I was ironing out the issue. I could also see if any of the existing tests broke along the way. This provided not only a way for me to test the function without affecting production data, but also a means to fix the bug with the confidence that the events that DID work would continue working.
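A regression test of this kind might look roughly like the sketch below. Everything in it is an assumption made for illustration: `next_fire_utc` is a simplified stand-in for the real function (which is not shown in this article), and the zones and dates are arbitrary. The idea is to pin down the property that went wrong, namely that the next fire time must always be later than the previous one, and to check it across several time zones.

```python
import unittest
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def next_fire_utc(last_fire_utc: datetime, tz_name: str) -> datetime:
    """Hypothetical stand-in: same local wall-clock time on the next day."""
    local = last_fire_utc.astimezone(ZoneInfo(tz_name))
    return (local + timedelta(days=1)).astimezone(timezone.utc)


class NextFireRegressionTest(unittest.TestCase):
    # Arbitrary sample of zones with different offsets and DST rules.
    ZONES = ["UTC", "America/New_York", "Asia/Kolkata", "Pacific/Auckland"]

    def test_next_fire_is_strictly_in_the_future(self):
        # An event that keeps re-firing is one whose "next" time never advances.
        last = datetime(2024, 3, 9, 18, 0, tzinfo=timezone.utc)
        for tz_name in self.ZONES:
            with self.subTest(tz=tz_name):
                self.assertGreater(next_fire_utc(last, tz_name), last)

    def test_local_wall_clock_time_is_preserved(self):
        # 18:00 UTC on this date is 1pm in New York, the day before DST starts.
        last = datetime(2024, 3, 9, 18, 0, tzinfo=timezone.utc)
        nxt = next_fire_utc(last, "America/New_York")
        self.assertEqual(nxt.astimezone(ZoneInfo("America/New_York")).hour, 13)
```

Saved in a test module, this can be run with `python -m unittest`. Because it only calls a pure function with fixed inputs, it never touches production data, and it stays in the suite afterwards to catch the same mistake if it ever comes back.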
The end result, of course, is that the bug was fixed (along with several others that were uncovered during my testing), the user was happy, the other users were not affected, and this was all done confidently and quickly.