TL;DR
Our team at Novu hit a date-calculation bug that failed our CI/CD pipelines and blocked all deployments.
The issue arose from the date-fns library's addMonths and subMonths functions.
We fixed it by using the addDays and subDays functions instead.
Novu: Open-source notification infrastructure 🚀
Just a quick background about us. Novu is an open-source notification infrastructure. We help manage all of a product's notifications: In-App (the bell icon, like the one in the Dev Community, delivered over WebSockets), Emails, SMS, and so on.
The Mindset
When working in software development, we're always prepared for bugs to crop up.
Sometimes they're small, easy to identify, and quick to fix.
Other times, they're like this year's candidate for our 'Bug Of The Year'.
This was a bug so elusive and mysterious that it had us rummaging through our pipelines, questioning our codebase, and coming face-to-face with the intricacies of date manipulation.
Problems, Different Problems, and More Problems
Our CI/CD pipelines were failing. Specifically, two tests were failing and blocking ALL new deployments. It was time to put on our detective hats 🕵️.
We dove into our commit history using git bisect, but it offered us no insight. Git bisect took us back to commits that were over 6 months old, long before any of the recent changes that could have caused this. Was this bug created at the very beginning of Novu?
However, we did have a clue. Our failing unit tests showed us that we had incorrect date calculations.
Gathering the Clues 💡
Strangely, the difference was just one day.
import { addMonths, subMonths } from "date-fns";

const startDate = new Date("2023-08-31");
const oneMonthAhead = addMonths(startDate, 1);
const result = subMonths(oneMonthAhead, 1);
console.log(result); // Expected: 31st of August, Reality: 30th of August
We also found that this does not happen on 31st July.
const startDate = new Date("2023-07-31");
const oneMonthAhead = addMonths(startDate, 1);
const result = subMonths(oneMonthAhead, 1);
console.log(result); // Expected: 31st of July, Reality: 31st of July
But the bug shows up again on January 31st.
const startDate = new Date("2023-01-31");
const oneMonthAhead = addMonths(startDate, 1);
const result = subMonths(oneMonthAhead, 1);
console.log(result); // Expected: 31st of January, Reality: 28th of January
So this bug only happens when we add 1 month to a date whose day does not exist in the following month (for example, the 31st before a 30-day month) and then subtract 1 month to go back.
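To make the pattern easy to reproduce without installing anything, here is a minimal sketch. It does not use date-fns: addMonthsClamped is a hypothetical, simplified stand-in that only mimics the clamping behaviour seen above. It works in UTC and ignores the time of day, so treat it as an illustration rather than the library's actual implementation.

```javascript
// Hypothetical stand-in for a clamping month-add (NOT the date-fns source).
// Works in UTC and drops the time of day to keep the example deterministic.
function addMonthsClamped(date, amount) {
  const day = date.getUTCDate();
  // Day 0 of the month after the target month is the last day of the target month.
  const endOfTarget = new Date(
    Date.UTC(date.getUTCFullYear(), date.getUTCMonth() + amount + 1, 0)
  );
  // Keep the original day if it exists in the target month, otherwise clamp.
  const targetDay = Math.min(day, endOfTarget.getUTCDate());
  return new Date(
    Date.UTC(endOfTarget.getUTCFullYear(), endOfTarget.getUTCMonth(), targetDay)
  );
}

// Add one month, subtract one month, and report where we end up.
function roundTrip(isoDate) {
  const start = new Date(isoDate + "T00:00:00Z");
  const back = addMonthsClamped(addMonthsClamped(start, 1), -1);
  return back.toISOString().slice(0, 10);
}

console.log(roundTrip("2023-08-31")); // 2023-08-30
console.log(roundTrip("2023-07-31")); // 2023-07-31
console.log(roundTrip("2023-01-31")); // 2023-01-28
```

The day is lost on the way forward (31st August becomes 30th September), and the way back has nothing left to restore it from.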
This is a sneaky one
So here is what we know so far:
- It would only show up on systems that run this specific sequence of logic.
- The code would have to be run on one of the few dates that are affected.
- This effect is not documented anywhere in any of the libraries we use.
The worst part is that this bug can show up elsewhere too: HR tools, finance tools, salary tools, and public government tools all rely on this package. Unfortunately, relying on it is still better than writing the functions ourselves.
It has been said many times that date-times are among the trickiest aspects of programming, and our current predicament served as a harsh reminder.
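To get a feel for how few dates are affected, the sketch below lists every date in a year whose day of the month does not exist in the following month. It assumes that this is the only condition under which the add-one-month, subtract-one-month round trip loses a day, which matches the three examples above. The helper names daysInMonth and affectedDates are made up for this illustration.

```javascript
// Number of days in a month. monthIndex is 0-based; values of 12 or more
// roll over into the next year, which Date.UTC handles for us.
function daysInMonth(year, monthIndex) {
  return new Date(Date.UTC(year, monthIndex + 1, 0)).getUTCDate();
}

// Every date in `year` whose day-of-month is missing from the next month.
function affectedDates(year) {
  const result = [];
  for (let m = 0; m < 12; m++) {
    const daysThis = daysInMonth(year, m);
    const daysNext = daysInMonth(year, m + 1);
    for (let d = daysNext + 1; d <= daysThis; d++) {
      const mm = String(m + 1).padStart(2, "0");
      const dd = String(d).padStart(2, "0");
      result.push(`${year}-${mm}-${dd}`);
    }
  }
  return result;
}

console.log(affectedDates(2023));
// [ '2023-01-29', '2023-01-30', '2023-01-31', '2023-03-31',
//   '2023-05-31', '2023-08-31', '2023-10-31' ]
```

Under that assumption, only 7 days in 2023 trigger the problem, which helps explain why a test suite can stay green for months before failing.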
Why simple actions can lead to bad things
After finding this out, we had a 'Eureka!' moment.
Our CTO, Dima Grossman, then had the idea to try it on Raycast. Interestingly enough, it was happening in their product too.
We realized that the issue stemmed from being on the last day of the month, but what exactly was going awry?
The Culprit: date-fns
This popular utility library for date operations was at the heart of the problem.
Specifically, the addMonths and subMonths functions.
The addMonths function keeps the original day of the month whenever that day exists in the target month. When it does not (adding a month to 31st August, for example), it returns the last day of the target month instead. Logical, right?
// source: https://github.com/date-fns/date-fns/blob/main/src/addMonths/index.ts
const daysInMonth = endOfDesiredMonth.getDate()
if (dayOfMonth >= daysInMonth) {
  // If we're already at the end of the month, then this is the correct date
  // and we're done.
  return endOfDesiredMonth
} else {
  // Otherwise, we now know that setting the original day-of-month value won't
  // cause an overflow, so set the desired day-of-month. Note that we can't
  // just set the date of `endOfDesiredMonth` because that object may have had
  // its time changed in the unusual case where where a DST transition was on
  // the last day of the month and its local time was in the hour skipped or
  // repeated next to a DST transition. So we use `date` instead which is
  // guaranteed to still have the original time.
  _date.setFullYear(
    endOfDesiredMonth.getFullYear(),
    endOfDesiredMonth.getMonth(),
    dayOfMonth
  )
  return _date
}
But the subMonths function, rather than having its own dedicated logic, simply reused addMonths with a negative number. DRY principles in action, but with an unintended consequence.
// source: https://github.com/date-fns/date-fns/blob/main/src/subMonths/index.ts
export default function subMonths<DateType extends Date>(
  date: DateType | number,
  amount: number
): DateType {
  return addMonths(date, -amount)
}
Here is exactly what caused our issue
Let's put it this way:
- For 28th February, add one month and then subtract one month, and you get 28th February. No problems there.
- But, for 31st August, add one month and then subtract one month, and you land on... 30th August. That's one day lost in date limbo!
The core of the issue was the way addMonths handled the end of the desired month.
For days that exist in both months, the logic was sound.
However, when the original day did not exist in the target month, the function fell back to the last day of that month, and the original day was lost. Subtracting a month from 30th September has no way of knowing that we started on the 31st.
The Simple Fix
To ensure a consistent approach to date manipulation, we shifted from using addMonths and subMonths to addDays and subDays.
This provided a more granular and precise way to handle date calculations, and importantly, allowed us to sidestep the addMonths pitfall.
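As a rough sketch of why day-based arithmetic avoids the problem: adding and then subtracting the same number of days always returns to the starting point, because no clamping is involved. The helpers addDaysUtc and subDaysUtc below are hypothetical UTC stand-ins rather than the date-fns functions, and the 30-day offset is an assumption chosen for illustration, since the number of days used in the real fix is not stated here.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical UTC stand-ins for day-based helpers (not the date-fns functions).
// In UTC every day is exactly 24 hours, so plain millisecond math is enough.
function addDaysUtc(date, amount) {
  return new Date(date.getTime() + amount * MS_PER_DAY);
}

function subDaysUtc(date, amount) {
  return addDaysUtc(date, -amount);
}

const start = new Date("2023-08-31T00:00:00Z");
const ahead = addDaysUtc(start, 30); // assumed offset, for illustration only
const back = subDaysUtc(ahead, 30);

console.log(ahead.toISOString().slice(0, 10)); // 2023-09-30
console.log(back.toISOString().slice(0, 10)); // 2023-08-31
```

The trade-off is that "one month" becomes a fixed number of days, so this fits cases where an exact calendar month is not required.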
Lessons Learnt
This bug served as a strong lesson in a few key areas:
- Assumptions are Risky: Never assume that widely-used libraries are infallible. Even the most popular ones have their quirks.
- Tests are Gold: If not for our rigorous testing suite, this bug might have remained hidden, only to wreak havoc at the most inopportune moment.
- Dates are Tricky: They've always been, and will continue to be, a challenging aspect of software development. Always handle with care.
While this bug threw a wrench in our pipes, it also reinforced the importance of comprehensive tests and the need to continually question and challenge our assumptions.
Death of this Bug
In a world of code where dates and times form such a crucial part of our applications, bugs like these provide not just a hiccup, but a learning opportunity. The next time you find a weird issue in your application, dig deep. Who knows, you might just uncover the next 'Bug Of The Year'.
You can find the PRs and Issues here:
Latest comments (44)
I wouldn't consider this a bug.
This works as I expect it to work.
I think your point is very true. From our perspective it was a bug, but I agree that the actual logic is not a bug.
However, I do think that there should be a warning about this edge case as not everyone would be able to see it.
Month math and math with leap years is weird. I'm sure there is some sort of reason things are the way they are in calendars, but it is annoying that the stuff isn't very math-friendly and, consequently, computer-friendly...
The good old date time problem.
We have two popular problems in programming:
Now with this exposé, we have a third - date/time problems!
Thanks for writing this @cliftonz. Good lessons learned.
I think that the issue is one of definition rather than of coding around the end of the month. We write software that calculates rental periods. When the rent is monthly, the start date could fall on (say) the 31st or 30th of the month. Advancing these through the year gives an ambiguity. When you get to (e.g.) February, both charges are adjusted to the 28th / 29th. What day should they be in March? They should be back to the 31st or 30th. You can't know this without storing an additional piece of information, in this case the regular billing day. However, without needing to support both methods, most people expect the last day of the month to be dominant, so once the day is the last day of the month it stays there when you next move a month. To easily support this dominant definition, you add one day to the date (for all calculations), then add / subtract the required number of months, then subtract one day. This makes the end of the month dominant. Remember that, if advancing a date monthly through the year using the dominant method, any date with a day greater than or equal to 28 will eventually end up being the 31st or last day of the month (without also knowing the regular billing day to make the distinction).
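The "add one day, move the months, subtract one day" recipe from the comment above can be sketched roughly as follows. All three helper names are hypothetical, the month-add is a simplified UTC stand-in rather than date-fns, and the time of day is ignored.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical UTC day-add helper.
function addDaysUtc(date, amount) {
  return new Date(date.getTime() + amount * MS_PER_DAY);
}

// Hypothetical clamping month-add (simplified stand-in, not date-fns).
function addMonthsClamped(date, amount) {
  const endOfTarget = new Date(
    Date.UTC(date.getUTCFullYear(), date.getUTCMonth() + amount + 1, 0)
  );
  const day = Math.min(date.getUTCDate(), endOfTarget.getUTCDate());
  return new Date(
    Date.UTC(endOfTarget.getUTCFullYear(), endOfTarget.getUTCMonth(), day)
  );
}

// End-of-month-dominant variant: shift forward a day, move months, shift back.
function addMonthsEndDominant(date, amount) {
  return addDaysUtc(addMonthsClamped(addDaysUtc(date, 1), amount), -1);
}

const iso = (d) => d.toISOString().slice(0, 10);
const aug31 = new Date("2023-08-31T00:00:00Z");
const sep30 = addMonthsEndDominant(aug31, 1);

console.log(iso(sep30)); // 2023-09-30
console.log(iso(addMonthsEndDominant(sep30, -1))); // 2023-08-31
```

In this sketch, a date on the last day of a month stays on the last day as it moves, so the round trip from 31st August comes back intact. It is not a general fix, though: this version maps 2023-01-30 to 2023-02-27, so dates just before the end of a month still need a deliberate policy.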
The fact that the bug was happening 2 times a year brought me all the way here. Have you ever encountered a bug that only manifests, say, after 5 years under very specific scenarios? "Damn, it was working all along," you say!
From my experience, I'd advise always using a fixed version of an npm package. The most common usage I've seen is a tilde. Using a tilde (~) gives you bug-fix releases. However, on CI/CD, npm may fetch a slightly newer version, and that may fail your build. You'll spend a whole day investigating the root cause. Even more confusion comes from the fact that package.json contains the same version, e.g., 3.4.0, in the repository and on your computer, but on the server there might be 3.4.1 installed. Hence, fixed versions ensure that npm dependencies will always be on the same version.
It's fine using dependency constraints like ^ and ~ as long as you are locking your dependencies with a lock file and only use npm ci inside your pipeline instead of npm i, as this will always take the resolved versions from the lockfile. If any commit breaks the pipeline you can just check if the lock file was updated 👍
@niklaspor I don't recall exactly, but I think I had a past problem with package-lock.json. There were unresolvable conflicts. You can use the keywords "package lock json problem" to find out what others are struggling with.
One of the points the npm documentation makes is "install exactly the same dependencies". That's exactly what you can achieve without ^ and ~.
From my personal side, I have never found any real practical usage of package-lock.json.
npm i / npm install will bump any package to the latest matching package version. npm ci or pnpm --frozen-lockfile will keep exactly the versions which were resolved in the last npm i which was executed.
Always use npm ci inside your pipelines, otherwise you risk getting different packages from your local installation, even if you don't use any ranges but just plain versions. Also, any package deeper in your dependency tree might specify a version range, which might lead to a newer resolved dependency if you execute npm i instead of npm ci. Same for pnpm and yarn.
I would even suggest you use npm ci on your local machine when you're working with bigger teams and you don't want to update dependencies. Otherwise the code on your machine might differ from the one on your colleagues' machines.
stackoverflow.com/questions/524996...
support.deploybot.com/article/131-....
@niklaspor Let me put this into a different perspective. The most important thing here is to ask, what kind of problem is being solved here? So the problem is: how to keep npm package dependencies consistent? If using a fixed version works, any other solution will simply be unnecessary. And so far, the solution is working excellently.
As for CI/CD, it is organised in very different ways, and there is no single approach. So if npm ci solves specific problems for someone, then using this approach is the right solution.
A bit off-topic. Recently I published the zoned-date library.
dev.to/sang/javascript-zoned-date-...
The library has sophisticated DST support (better than date-fns) and a very elegant API. If you have ever had a DST or timezone-related problem, I would highly recommend it. Of course, feedback is always welcome.
Thanks for letting us know!
Great catch! This is part of the 10% of bugs that make you question the fabric of nature 😁
What was the test about that helped you find the bug? If you feel comfortable sharing :)
The test was ensuring that the date to send a notification was correct.
Such a simple thing turned into something much bigger.
Was a pull-request to date-fns created as a result?
Absolutely, it's linked at the bottom of the article, but you can go directly to the issue with this link.
github.com/date-fns/date-fns/issue...
The picture has no link to your repo
Fixed :)