DEV Community

Alan Storm

Javascript Date String Parsing

One of my favorite features of PHP is the strtotime function. This function lets you pass in a date string and get a unix timestamp back.

    $time = strtotime('2021-04-01');
    echo date('c',$time),"\n";
    // outputs
    // 2021-04-01T00:00:00-07:00

What's great about it is it works with a variety of date formats.

    $time = strtotime('04/01/2021');
    echo date('c',$time),"\n";
    // outputs
    // 2021-04-01T00:00:00-07:00

And don't worry -- if you're all objects all the time, the same string parsing behavior works with PHP's DateTime class.

    $date = new DateTime('April 1, 2020');
    echo $date->format('c'),"\n";
    // outputs
    // 2020-04-01T00:00:00-07:00

With strtotime, if you're working with sketchy data (in other words -- real data), you have a bit more confidence that your code will keep working when/if it encounters an unexpected date format.

Javascript's Date.parse

Javascript has similar functionality built in to its Date object. Unfortunately, there are a few weird edge cases around timezones that make it unreliable. The following examples all use a Node.js 14.2 REPL, but should apply generally to modern versions of javascript.

In javascript, you can use the Date.parse method to automatically parse a date string and get a unix timestamp back, or you can pass a string directly to the Date object's constructor function.

    $ node
    Welcome to Node.js v14.2.0.
    Type ".help" for more information.
    > Date.parse('April 1, 2021')
    1617260400000
    > new Date('April 1, 2021')
    2021-04-01T07:00:00.000Z

Right away we see a few small differences from strtotime. First, javascript reports its unix epoch timestamps in milliseconds, not seconds. Second, javascript's ISO date formatting (the 'c' format in PHP's date function) always reports using UTC time (indicated by the trailing Z), where PHP reports the timezone offset from UTC. So these two ISO date strings

2021-04-01T00:00:00-07:00
2021-04-01T07:00:00.000Z

represent the same time.
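
One way to check both points is to compare the parsed values directly. This is a minimal sketch; the variable names are only for illustration, and both strings carry an explicit offset so the result should not depend on where the code runs.

```javascript
// Both strings include an explicit offset, so the machine's timezone
// plays no part in how they are parsed.
const fromOffset = Date.parse('2021-04-01T00:00:00-07:00')
const fromUtc = Date.parse('2021-04-01T07:00:00.000Z')

console.log(fromOffset === fromUtc) // true

// Date.parse returns milliseconds; PHP's strtotime returns seconds.
const phpStyleSeconds = Math.floor(fromUtc / 1000)
console.log(phpStyleSeconds) // 1617260400

// Going the other way, multiply by 1000 before handing the value to Date.
console.log(new Date(phpStyleSeconds * 1000).toISOString())
// 2021-04-01T07:00:00.000Z
```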

Note: All examples in this article were run on a computer set up for US West Coast time during daylight saving time -- you may see an offset other than seven hours depending on when and where you run the code samples.

So far these are important, but small, differences. The bigger difference comes when you start using date strings that look like they're part of an ISO 8601 date string.

    > new Date('2021-04-01')
    2021-04-01T00:00:00.000Z

You'll see that, like before, javascript's using a Z to indicate the date is in UTC time. However, you'll also notice the time is not 07:00:00 -- it's 00:00:00. In our previous examples javascript assumed a time of midnight using the current configured timezone. However, when we used 2021-04-01 as a date string, javascript assumed a time of midnight with a UTC timezone. Because 2021-04-01 looks like an ISO 8601 date with no time and no timezone, javascript parsed it as one, and the timezone defaulted to UTC. (The ECMAScript specification treats date-only strings as UTC, while a date-time string with no offset, such as 2021-04-01T00:00:00, is treated as local time.)
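
A small sketch makes the split easier to see. It assumes an engine that follows the current ECMAScript rules (date-only strings are UTC, date-time strings with no offset are local), which should include the Node.js version used in this article; the variable names are only for illustration.

```javascript
// Date-only ISO form: parsed as midnight UTC.
const dateOnly = new Date('2021-04-01')
console.log(dateOnly.toISOString()) // 2021-04-01T00:00:00.000Z

// Date-time ISO form with no offset: parsed as local time.
const dateTime = new Date('2021-04-01T00:00:00')

// Numeric arguments are always local time (months are zero-based).
const localMidnight = new Date(2021, 3, 1)

console.log(dateTime.getTime() === localMidnight.getTime()) // true

// The gap between the two readings is the local UTC offset for that
// day, in minutes: 420 on US West Coast daylight time, 0 on a UTC machine.
const gapInMinutes = (localMidnight.getTime() - dateOnly.getTime()) / 60000
console.log(gapInMinutes === localMidnight.getTimezoneOffset()) // true
```

Strings such as 'April 1, 2021' fall outside that format entirely, so how they are parsed is left to the engine.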

If you're not aware of it, this behavior can cause bugs in your program. I ran into this when I was processing some CSV files from banks. Some transactions appeared on the wrong day because one CSV file used the YYYY-MM-DD format and another used the MM/DD/YYYY format.

This isn't the only problem with string parsing in the Date class. The MDN documentation on javascript's Date Time String Format covers some other edge cases you might be interested in.

Date Libraries

The Date object is one of javascript's original objects, and its behavior is not likely to change. If some javascript vendor "fixed" this to be more consistent, it would almost certainly break a large amount of code in the world. Because of this, most javascript programmers rely on a third party library to handle dates.

Let's look at four popular date handling libraries (date-fns, dayjs, luxon, and moment) and see how they handle the YYYY-MM-DD case. The following examples presume you have these date libraries installed via npm.

$ npm install date-fns dayjs luxon moment

Moment

The moment library is one of the most popular date libraries for javascript, even if its developers have stepped away from it and consider it "finished". Let's see how it handles abbreviated ISO date strings.

    > moment = require('moment')
    //...
    > moment('2021-04-01')
    Moment<2021-04-01T00:00:00-07:00>

Success! Unlike the native Date object, moment doesn't assume a UTC timezone. Instead, it assumes the currently configured system timezone.

However, something interesting will happen if we try to parse a date string that's not ISO formatted.

    > moment('04/01/2021')
    Deprecation warning: value provided is not in a recognized RFC2822 or
    ISO format. moment construction falls back to js Date(), which is not
    reliable across all browsers and versions. Non RFC2822/ISO date formats
    are discouraged.

    Please refer to http://momentjs.com/guides/#/warnings/js-date/ for more info.
    /* ... */
    Moment<2021-04-01T00:00:00-07:00>

The moment function still returns a date, but we get a warning that our date's in a format that moment doesn't recognize, and that moment is falling back to using javascript's built-in Date. So, although we got the answer we wanted for our ISO 8601 date (Moment<2021-04-01T00:00:00-07:00>), we might not be so lucky if we were using a different version of javascript or a string format that wasn't ISO 8601 based.

Luxon

The luxon date library (created by one of the maintainers of moment) has a different approach.

Luxon can handle a variety of date formats, but does not attempt to automatically detect which format is which.

    const {DateTime} = require('luxon')

    DateTime.fromISO(...)
    DateTime.fromRFC2822(...)
    DateTime.fromSQL(...)
    DateTime.fromMillis(...)
    DateTime.fromSeconds(...)
    DateTime.fromJsDate(...)

Luxon's philosophy is that it's up to you, the end-user-programmer, to know what sort of dates you're dealing with. If you call one of these methods with an invalid date format, luxon will still return a DateTime object, but that object will be marked as invalid.

    > DateTime.fromISO('04/01/2021')
    DateTime {
      /* ... */
      invalid: Invalid {
        reason: 'unparsable',
        explanation: `the input "04/01/2021" can't be parsed as ISO 8601`
      },
      /* ... */
    }

Day.js

Next up is Day.js, a library that prides itself on its small size and a Moment.js-like API.

Day.js seems capable of parsing a variety of date formats, and doesn't get caught up in the ISO 8601 UTC issue.

    > const dayjs = require('dayjs')
    undefined
    > dayjs('2021-04-01')
    d {
      /* ... */
      '$d': 2021-04-01T07:00:00.000Z,
      /* ... */
    }
    > dayjs('4/01/2021')
    d {
      /* ... */
      '$d': 2021-04-01T07:00:00.000Z,
      /* ... */
    }

However, its docs page contains this vague warning:

For consistent results parsing anything other than ISO 8601 strings, you should use String + Format.

This hints that, behind the scenes, Day.js is doing some extra data validation and parsing, but ultimately just using a Date object for its parsing. Since Day.js is open source we can peek behind the scenes and confirm this is true.

This means if you're using Day.js and want consistent parsing of non-ISO dates, you'll need to use their CustomParseFormat plugin. The plugin allows you to define a string format that will parse a specific date string.

    > const dayjs = require('dayjs')
    /* ... */
    > const customParseFormat = require('dayjs/plugin/customParseFormat')
    /* ... */
    > dayjs.extend(customParseFormat)
    /* ... */
    > dayjs('04/01/2021', 'MM/DD/YYYY')
    d {
      /* ... */
      '$d': 2021-04-01T07:00:00.000Z,
      /* ... */
    }

If your date is of a known format and uses one of the Day.js parsing tokens you'll be in good shape.

date-fns

The last date library we'll look at is date-fns, which describes itself as

like lodash for dates

The date-fns library prides itself on its size, boasting of 200+ functions in their GitHub README. When it comes to date parsing, date-fns has a parseISO function that's explicitly for parsing full and partial ISO date strings.

    > const datefns = require('date-fns')
    //...
    > datefns.parseISO('2021-04-01')
    2021-04-01T07:00:00.000Z

Similar to the other library based solutions, this function will use the current timezone if one is not provided.

If your date is not an ISO-like string, date-fns provides a format-string based solution via the parse function. Similar to Day.js, the parse function allows you to tell date-fns how it should parse a date string.

    > datefns.parse('04/01/2021', 'MM/dd/yyyy', new Date())
    2021-04-01T07:00:00.000Z

That third required parameter is a Date object -- per the docs, parse will use this object to

define values missing from the parsed dateString

What this means in practice we'll leave as an exercise for the reader -- for the general case this means passing in a new Date instance.
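
For readers who want a head start on that exercise, here is a plain javascript sketch of the general idea of a reference date. The parseMonthDay helper is hypothetical and is not how date-fns is implemented; it only shows a date string with no year borrowing the year from a second Date.

```javascript
// Hypothetical helper, not part of date-fns: parse "MM/DD" and take the
// missing year from a reference date.
function parseMonthDay(input, referenceDate) {
  const match = /^(\d{1,2})\/(\d{1,2})$/.exec(input)
  if (!match) return null

  const month = Number(match[1])
  const day = Number(match[2])

  // The string has no year, so borrow it from the reference date.
  return new Date(referenceDate.getFullYear(), month - 1, day)
}

const reference = new Date(2021, 0, 15) // January 15, 2021, local time
const parsed = parseMonthDay('04/01', reference)

console.log(parsed.getFullYear(), parsed.getMonth() + 1, parsed.getDate())
// 2021 4 1
```

Passing a fresh Date as the reference means "fill any gaps from right now", which is why it works as a default.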

Another thing to watch out for here -- those format tokens aren't the same tokens used in other libraries. The date-fns example above uses MM/dd/yyyy, where the Day.js example uses MM/DD/YYYY.

Responsibility Shifted

As you can see, there's a variety of libraries and approaches available to a javascript developer to work around the non-ideal default behavior of javascript's Date object. However, you also may have noticed that none of these libraries attempts to solve the problem of generic date string parsing. Instead, they offer the end-user-programmer a variety of options for dealing with date strings, but it's the client programmer's responsibility to identify which format their dates are using.

Put another way, if you have a bank CSV file that includes dates in the format

04/01/2021

you'll either be writing a format string to parse this specific date format, or parsing your date string into its month/day/year parts yourself. If you have a datasource where the date format varies, you'll be writing code to identify what format that is.
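
As a rough illustration of what that code might look like without any library, here is a minimal sketch that recognizes the two formats from the bank CSV story and builds a Date at local midnight for either one. The parseBankDate helper and the FORMATS table are hypothetical names, and reading 04/01/2021 as month/day/year is an assumption about the data source.

```javascript
// Hypothetical format table: each entry pairs a pattern with the order
// in which its capture groups appear.
const FORMATS = [
  // YYYY-MM-DD
  { pattern: /^(\d{4})-(\d{2})-(\d{2})$/, order: ['year', 'month', 'day'] },
  // MM/DD/YYYY (US ordering is an assumption about the data source)
  { pattern: /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/, order: ['month', 'day', 'year'] },
]

function parseBankDate(input) {
  for (const { pattern, order } of FORMATS) {
    const match = pattern.exec(input.trim())
    if (!match) continue

    const parts = {}
    order.forEach((name, i) => { parts[name] = Number(match[i + 1]) })

    // Numeric arguments always mean local time, whatever the input format.
    const date = new Date(parts.year, parts.month - 1, parts.day)

    // Date silently rolls over values like February 30, so confirm the
    // parts survived the round trip before trusting the result.
    const valid =
      date.getFullYear() === parts.year &&
      date.getMonth() === parts.month - 1 &&
      date.getDate() === parts.day

    return valid ? date : null
  }
  return null // unknown format: let the caller decide what to do
}

const fromIso = parseBankDate('2021-04-01')
const fromUs = parseBankDate('04/01/2021')
console.log(fromIso.getTime() === fromUs.getTime()) // true
console.log(parseBankDate('02/30/2021')) // null
console.log(parseBankDate('1 April 2021')) // null
```

Returning null for anything unrecognized is a deliberate choice in this sketch: it forces the caller to notice a new format rather than quietly guessing.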

This fits in with the general trend in open source code over the past 5 - 10 years. More often than not creators and maintainers of software libraries are trying to limit the scope of what the code they put out in the world does in order to limit the scope of what they need to support in the future.

Porting strtotime?

After doing all this research I had one last question -- why not just port strtotime to other languages? I went looking and found two things worth mentioning.

First, the implementation of strtotime is a textbook study in why other people's C code is not where you want to spend time. You can see the guts of the implementation logic here. This isn't stock C code -- it's code for a system called re2c. This system allows you to write regular expressions in a custom DSL (domain specific language), and then transform/compile those regular expressions down to C programs (also C++ and Go) that will execute those regular expressions. Something in PHP's makefile uses this parse_date.re file to generate parse_date.c. If you don't realize parse_date.c is a generated file, this can be extremely rough going. If you're not familiar with re2c it can be regular rough going. We leave further exploration as an exercise for the reader -- an exercise we haven't taken ourselves.

So porting this function isn't a straightforward task, but there is a community driven open source package named locutus that's trying. In their own words

Locutus is a project that seeks to assimilate other languages’ standard libraries to JavaScript. Why, you ask? Well, firstly because we can of course! Apart from that, it can also serve as a nice pastime for a rainy Sunday afternoon. Not only can porting a function be quite rewarding, but it also deepens your understanding of different languages. In that sense, it is not unlike doing a crossword puzzle.

This package includes an implementation of PHP's strtotime function. While it's not a direct port of the re2c PHP regular expressions, it does seem to handle the date formats we've used in this article. A program like this

    const strtotime = require('locutus/php/datetime/strtotime')
    console.log(new Date(strtotime('April 1, 2021') * 1000))
    console.log(new Date(strtotime('4/1/2021') * 1000))
    console.log(new Date(strtotime('2021-04-01') * 1000))

results in output like this

2021-04-01T07:00:00.000Z
2021-04-01T07:00:00.000Z
2021-04-01T07:00:00.000Z

Identical dates, created with a time of midnight in the local timezone, represented as a UTC date.

Top comments (1)

Melvyn Sopacua

Hi Alan, nice write up, but what I find way more bothersome is that it's impossible to get a date without a time in JavaScript. For something as simple as a birthdate, a time (and timezone) is irrelevant, but can lead to misinterpretations when serialized. Sometimes this happens behind the scenes and you're a day off for reasons hidden in frameworks that convert to UTC before sending things over the wire.

Anyway, thanks for the overview. I'm moving to Luxon after reading this. I'm all for explicit over magic. Define what you support and be strict about it. I'd rather aid a customer in providing the input I need (writing a wrapper/post processor for their export tool for instance), than guessing what the customer meant.