One of my favorite features of PHP is the strtotime function. This function lets you pass in a date string and get a unix timestamp back.
$time = strtotime('2021-04-01');
echo date('c',$time),"\n";
// outputs
// 2021-04-01T00:00:00-07:00
What's great about it is it works with a variety of date formats.
$time = strtotime('04/01/2021');
echo date('c',$time),"\n";
// outputs
// 2021-04-01T00:00:00-07:00
And don't worry -- if you're all objects all the time, the same string parsing behavior works with PHP's DateTime class.
$date = new DateTime('April 1, 2020');
echo $date->format('c'),"\n";
// outputs
// 2020-04-01T00:00:00-07:00
With strtotime, if you're working with sketchy data (in other words -- real data), you have a bit more confidence that your code will keep working when/if it encounters an unexpected date format.
Javascript's Date.parse
Javascript has similar functionality built in to its Date object. Unfortunately, there are a few weird edge cases around timezones that make it unreliable. The following examples all use a Node.js 14.2 REPL, but should apply generally to modern versions of javascript.
In javascript, you can use the Date.parse method to automatically parse a date string and get a unix timestamp back, or you can pass a string directly to the Date object's constructor function.
$ node
Welcome to Node.js v14.2.0.
Type ".help" for more information.
> Date.parse('April 1, 2021')
1617260400000
> new Date('April 1, 2021')
2021-04-01T07:00:00.000Z
Right away we see a few small differences from strtotime. First, javascript reports its unix epoch timestamps in milliseconds, not seconds. Second, javascript's ISO date formatting (the 'c' format in PHP's date function) always reports using UTC time (indicated by the trailing Z), where PHP reports the timezone offset from UTC. So these two ISO date strings
2021-04-01T00:00:00-07:00
2021-04-01T07:00:00.000Z
represent the same time.
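As a quick check of both points, here's a minimal sketch that uses only the built-in Date object. Both strings carry an explicit offset, so the results shouldn't depend on your machine's timezone. The variable names are just for illustration.

```javascript
// Both strings include an explicit offset, so the local timezone doesn't matter here.
const fromOffset = Date.parse('2021-04-01T00:00:00-07:00')
const fromUtc = Date.parse('2021-04-01T07:00:00.000Z')

// Same instant, written two different ways.
console.log(fromOffset === fromUtc) // true

// Date.parse returns milliseconds; divide by 1000 for a PHP-style timestamp in seconds.
const phpStyleSeconds = Math.floor(fromOffset / 1000)
console.log(phpStyleSeconds) // 1617260400
```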
Note: All examples in this article were run on a computer set up for US West Coast time during daylight saving time -- you may see an offset other than seven hours depending on when and where you run the code samples.
So far these are important, but small, differences. The bigger difference comes when you start using date strings that look like they're part of an ISO 8601 date string.
> new Date('2021-04-01')
2021-04-01T00:00:00.000Z
You'll see that, like before, javascript's using a Z to indicate the date is in UTC time. However, you'll also notice the time is not 07:00:00 -- it's 00:00:00. In our previous examples javascript assumed a time of midnight in the currently configured timezone. However, when we used 2021-04-01 as a date string, javascript assumed a time of midnight in UTC. Because 2021-04-01 looks like an incomplete ISO 8601 date, javascript treated it as an ISO 8601 date with a missing timezone, and the timezone defaulted to UTC.
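To see the two behaviors side by side without depending on your machine's timezone, here's a small sketch that compares each parse against a date built explicitly in UTC or in local time. It assumes a modern engine that follows the current ECMAScript rules, where a date-only ISO string is treated as UTC and an ISO date-time string without an offset is treated as local time.

```javascript
// Date-only ISO string: parsed as midnight UTC.
const dateOnly = new Date('2021-04-01')
const utcMidnight = new Date(Date.UTC(2021, 3, 1)) // months are zero-based, so 3 is April
console.log(dateOnly.getTime() === utcMidnight.getTime()) // true

// ISO date-time string with no offset: parsed as local time.
const dateTime = new Date('2021-04-01T00:00:00')
const localMidnight = new Date(2021, 3, 1)
console.log(dateTime.getTime() === localMidnight.getTime()) // true
```

If you're stuck with the native Date object, one possible workaround is to append T00:00:00 to date-only strings before parsing them, so they're treated as local time like the other formats.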
If you're not aware of it, this behavior can cause bugs in your program. I ran into this when I was processing some CSV files from banks. Some transactions appeared on the wrong day because one CSV file used the YYYY-MM-DD format and another used the MM/DD/YYYY format.
This isn't the only problem with string parsing in the Date class. The MDN documentation on javascript's Date Time String Format covers some other edge cases you might be interested in.
Date Libraries
The Date object is one of javascript's original objects, and its behavior is not likely to change. If some javascript vendor "fixed" this to be more consistent, it would almost certainly break a large amount of code in the world. Because of this, most javascript programmers rely on a third party library to handle dates.
Let's look at four popular date handling libraries (date-fns, dayjs, luxon, and moment) and see how they handle the YYYY-MM-DD case. The following examples presume you have these date libraries installed via npm.
$ npm install date-fns dayjs luxon moment
Moment
The moment library is one of the most popular date libraries for javascript, even if its developers have stepped away from it and consider it "finished". Let's see how it handles abbreviated ISO date strings.
> moment= require('moment')
//...
> moment('2021-04-01')
Moment<2021-04-01T00:00:00-07:00>
Success! Unlike the native Date object, moment doesn't assume a UTC timezone. Instead, it assumes the currently configured system timezone.
However, something interesting will happen if we try to parse a date string that's not ISO formatted.
> moment('04/01/2021')
Deprecation warning: value provided is not in a recognized RFC2822 or
ISO format. moment construction falls back to js Date(), which is not
reliable across all browsers and versions. Non RFC2822/ISO date formats
are discouraged.
Please refer to http://momentjs.com/guides/#/warnings/js-date/ for more info.
/* ... */
Moment<2021-04-01T00:00:00-07:00>
The moment function still returns a date, but we get a warning that our date's in a format that moment doesn't recognize, and that moment is falling back to javascript's built-in Date. So, although we got the answer we wanted for our ISO 8601 date (Moment<2021-04-01T00:00:00-07:00>), we might not be so lucky if we were using a different version of javascript or a string format that wasn't ISO 8601 based.
Luxon
The luxon date library (created by one of the maintainers of moment) has a different approach.
Luxon can handle a variety of date formats, but does not attempt to automatically detect which format is which.
const {DateTime} = require('luxon')
DateTime.fromISO(...)
DateTime.fromRFC2822(...)
DateTime.fromSQL(...)
DateTime.fromMillis(...)
DateTime.fromSeconds(...)
DateTime.fromJsDate(...)
Luxon's philosophy is that it's up to you, the end-user-programmer, to know what sort of dates you're dealing with. If you call one of these methods with an invalid date format, luxon will return a DateTime object, but that object will be considered invalid.
> DateTime.fromISO('04/01/2021')
DateTime {
/* ... */
invalid: Invalid {
reason: 'unparsable',
explanation: `the input "04/01/2021" can't be parsed as ISO 8601`
},
/* ... */
}
Day.js
Next up is Day.js, a library that prides itself on its small size and a Moment.js-like API.
Day.js seems capable of parsing a variety of date formats, and doesn't get caught up in the ISO 8601 UTC issue.
> const dayjs = require('dayjs')
undefined
> dayjs('2021-04-01')
d {
/* ... */
'$d': 2021-04-01T07:00:00.000Z,
/* ... */
}
> dayjs('4/01/2021')
d {
/* ... */
'$d': 2021-04-01T07:00:00.000Z,
/* ... */
}
However, their docs page contains this vague warning.
For consistent results parsing anything other than ISO 8601 strings, you should use String + Format.
This hints that, behind the scenes, Day.js is doing some extra data validation and parsing, but ultimately just using a Date object for its parsing. Since Day.js is open source we can peek behind the scenes and confirm this is true.
This means if you're using Day.js and want consistent parsing of non-ISO dates, you'll need to use their CustomParseFormat plugin. The plugin allows you to define a string format that will parse a specific date string.
> const dayjs = require('dayjs')
/* ... */
> const customParseFormat = require('dayjs/plugin/customParseFormat')
/* ... */
> dayjs.extend(customParseFormat)
/* ... */
> dayjs('04/01/2021', 'MM/DD/YYYY')
d {
/* ... */
'$d': 2021-04-01T07:00:00.000Z,
/* ... */
}
If your date is of a known format and uses one of the Day.js parsing tokens you'll be in good shape.
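To make the format-string idea concrete, here's a minimal sketch in plain javascript. This is not how Day.js implements its plugin. The parseWithFormat function is a hypothetical helper that only understands the YYYY, MM, and DD tokens, and it assumes you want midnight in the local timezone.

```javascript
// Hypothetical helper: only supports the YYYY, MM, and DD tokens.
function parseWithFormat(input, format) {
  const tokens = { YYYY: '(\\d{4})', MM: '(\\d{2})', DD: '(\\d{2})' }
  const order = []

  // Escape regex metacharacters in the format, then swap each token for a capture group.
  const pattern = format
    .replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    .replace(/YYYY|MM|DD/g, (token) => {
      order.push(token)
      return tokens[token]
    })

  const match = new RegExp(`^${pattern}$`).exec(input)
  if (!match) return null

  // Map each captured group back to the token that produced it.
  const parts = {}
  order.forEach((token, i) => { parts[token] = Number(match[i + 1]) })
  if (!parts.YYYY || !parts.MM || !parts.DD) return null // all three parts are required

  const date = new Date(parts.YYYY, parts.MM - 1, parts.DD)
  // Reject values like 02/31/2021 that the Date constructor would silently roll over.
  if (date.getFullYear() !== parts.YYYY ||
      date.getMonth() !== parts.MM - 1 ||
      date.getDate() !== parts.DD) return null
  return date
}

console.log(parseWithFormat('04/01/2021', 'MM/DD/YYYY')) // a Date at local midnight on April 1, 2021
console.log(parseWithFormat('2021-04-01', 'MM/DD/YYYY')) // null
```

The useful property here is that a string which doesn't match the format you asked for is rejected, instead of being handed to Date for a best guess.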
date-fns
The last date library we'll look at is date-fns, which describes itself as
like lodash for dates
The date-fns library prides itself on its size, boasting of 200+ functions in their GitHub README. When it comes to date parsing, date-fns has a parseISO function that's explicitly for parsing full and partial ISO date strings.
> const datefns = require('date-fns')
//...
> datefns.parseISO('2021-04-01')
2021-04-01T07:00:00.000Z
Similar to the other library based solutions, this function will use the current timezone if one is not provided.
If your date is not an ISO-like string, date-fns provides a format-string based solution via the parse method. Similar to Day.js, the parse method allows you to tell date-fns how it should parse a date string.
> foo = datefns.parse('04/01/2021','MM/dd/yyyy', (new Date))
2021-04-01T07:00:00.000Z
That third required parameter is a Date object -- per the docs, parse will use this object to
define values missing from the parsed dateString
What this means in practice we'll leave as an exercise for the reader -- for the general case this means passing in a new Date instance.
Another thing to watch out for here -- those format tokens aren't the same tokens used in other libraries. Note the lowercase dd and yyyy in this format string, where the Day.js example used DD and YYYY.
Responsibility Shifted
As you can see, there are a variety of libraries and approaches available to a javascript developer to work around the non-ideal default behavior of javascript's Date object. However, you also may have noticed that none of these libraries attempts to solve the problem of generic date string parsing. Instead, they offer the end-user-programmer a variety of options for dealing with date strings, but it's the end-user-programmer's responsibility to identify which format their dates are using.
Put another way, if you have a bank CSV file that includes dates in the format
04/01/2021
you'll either be writing a format string to parse this specific date format, or parsing your date string into its month/day/year parts yourself. If you have a datasource where the date format varies, you'll be writing code to identify what format that is.
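As a rough illustration of that last point, here's a sketch of what format-identifying code might look like when a datasource mixes the two formats from the bank CSV example. The parseBankDate name and the choice of exactly two supported formats are assumptions made for this example. Anything else is rejected rather than guessed at.

```javascript
// Hypothetical helper: accepts only YYYY-MM-DD or MM/DD/YYYY and returns local midnight.
function parseBankDate(input) {
  const iso = /^(\d{4})-(\d{2})-(\d{2})$/.exec(input)
  if (iso) {
    const [year, month, day] = iso.slice(1).map(Number)
    return new Date(year, month - 1, day)
  }

  const us = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(input)
  if (us) {
    const [month, day, year] = us.slice(1).map(Number)
    return new Date(year, month - 1, day)
  }

  throw new Error(`Unrecognized date format: ${input}`)
}

// Both formats are built from their parts, so they land on the same local midnight.
const fromIso = parseBankDate('2021-04-01')
const fromUs = parseBankDate('04/01/2021')
console.log(fromIso.getTime() === fromUs.getTime()) // true
```

Keep in mind that no amount of pattern matching can tell a month-first 04/01/2021 (April 1) from a day-first 04/01/2021 (January 4). That knowledge has to come from you.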
This fits in with the general trend in open source code over the past 5 - 10 years. More often than not creators and maintainers of software libraries are trying to limit the scope of what the code they put out in the world does in order to limit the scope of what they need to support in the future.
Porting strtotime?
After doing all this research I had one last question -- why not just port strtotime to other languages? I went looking and found two things worth mentioning.
First, the implementation of strtotime is a textbook study in why other people's C code is not where you want to spend time. You can see the guts of the implementation logic here. This isn't stock C code -- it's code for a system called re2c. This system allows you to write regular expressions in a custom DSL (domain specific language), and then transform/compile those regular expressions down to C programs (also C++ and Go) that will execute those regular expressions. Something in PHP's make file uses this parse_date.re file to generate parse_date.c. If you don't realize parse_date.c is a generated file, this can be extremely rough going. If you're not familiar with re2c, it can be regular rough going. We leave further exploration as an exercise for the reader -- an exercise we haven't taken ourselves.
So porting this function isn't a straightforward task, but there is a community-driven open source package named locutus that's trying. In their own words
Locutus is a project that seeks to assimilate other languages’ standard libraries to JavaScript. Why, you ask? Well, firstly because we can of course! Apart from that, it can also serve as a nice pastime for a rainy Sunday afternoon. Not only can porting a function be quite rewarding, but it also deepens your understanding of different languages. In that sense, it is not unlike doing a crossword puzzle.
This package includes an implementation of PHP's strtotime function. While it's not a direct port of the re2c PHP regular expressions, it does seem to handle the date formats we've used in this article. A program like this
const strtotime = require('locutus/php/datetime/strtotime')
console.log(new Date(strtotime('April 1, 2021') * 1000))
console.log(new Date(strtotime('4/1/2021') * 1000))
console.log(new Date(strtotime('2021-04-01') * 1000))
results in output like this
2021-04-01T07:00:00.000Z
2021-04-01T07:00:00.000Z
2021-04-01T07:00:00.000Z
Identical dates, created with a date of midnight in the local timezone, represented as a UTC date.
Top comments (1)
Hi Alan, nice write up, but what I find way more bothersome is that it's impossible to get a date without a time in JavaScript. For something as simple as a birthdate, a time (and timezone) is irrelevant, but can lead to misinterpretations when serialized. Sometimes this happens behind the scenes and you're a day off for reasons hidden in frameworks that convert to UTC before sending things over the wire.
Anyway, thanks for the overview. I'm moving to Luxon after reading this. I'm all for explicit over magic. Define what you support and be strict about it. I'd rather aid a customer in providing the input I need (writing a wrapper/post processor for their export tool for instance), than guessing what the customer meant.