Prateek Saxena for WinkJS

How to find date and time in text?

Dates, times, and other named entities can be extracted from a document using winkNLP. To do this, we load a document, ask for its entities, and then keep only the ones we need. If we also want the Unix time, we can look at the entity's shape and check whether its text can be passed directly to the JavaScript Date object for parsing. Here is how we would find the date and time entities in a text using winkNLP:

// Load wink-nlp package & helpers.
const winkNLP = require( 'wink-nlp' );
const its = require( 'wink-nlp/src/its.js' );
const model = require( 'wink-eng-lite-model' );
const nlp = winkNLP( model );

const text = `The release happened on 21 August 2020 at 4:32 pm`;
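const doc = nlp.readDoc( text );

// Visit every entity and report the dates and times.
doc.entities().each( ( e ) => {
  if ( e.out( its.type ) === 'DATE' ) {
    // Try to parse the entity text; the result is an Invalid Date if the format is not understood.
    console.log( e.out(), new Date( e.out() ) );
    // -> 21 August 2020 Fri Aug 21 2020 00:00:00 GMT+0530 (India Standard Time)
    // (how the Date is printed depends on the runtime and the local time zone)
  }
  if ( e.out( its.type ) === 'TIME' ) {
    console.log( 'Time:', e.out() );
    // -> Time: 4:32 pm
  }
} );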

This gives you every string that contains a date or a time, along with a parsed Date (and hence the Unix time) for each date whenever its format allows it. You can then use the usual JavaScript functions, such as sort, on this data to get the insights you need. For example, our Wikipedia Timeline showcase uses it to create visualizations of articles.
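
As a minimal sketch of that sorting step, the snippet below takes a plain array of strings, standing in for what e.out() returns for DATE entities, converts each one to Unix time, drops the ones Date cannot parse, and sorts the rest. The names toUnixTime and buildTimeline and the sample strings are made up for illustration and are not part of winkNLP. Parsing of non-ISO date strings depends on the JavaScript engine, so the output shown is what Node.js would be expected to print.

```javascript
// Hypothetical input: strings such as e.out() would return for DATE entities.
const dateTexts = [ '21 August 2020', 'last summer', '3 March 2019', '15 January 2021' ];

// Returns the Unix time in milliseconds, or null when Date cannot parse the text.
function toUnixTime( text ) {
  const ms = new Date( text ).getTime();
  return Number.isNaN( ms ) ? null : ms;
}

// Keeps only the parseable entries and orders them from oldest to newest.
function buildTimeline( texts ) {
  return texts
    .map( ( text ) => ( { text, unixTime: toUnixTime( text ) } ) )
    .filter( ( item ) => item.unixTime !== null )
    .sort( ( a, b ) => a.unixTime - b.unixTime );
}

const timeline = buildTimeline( dateTexts );
console.log( timeline.map( ( item ) => item.text ) );
// -> [ '3 March 2019', '21 August 2020', '15 January 2021' ]
```

Keeping the original text next to the Unix time makes it easy to show the source phrase on a timeline while still ordering by the numeric value.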

Raw text may contain many named entities, such as times, money, and hashtags. The English language lite model for winkNLP finds entities that span multiple tokens by employing a pre-trained finite state machine.
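
To give a feel for how a finite state machine can pick out an entity that spans several tokens, here is a toy, hand-written recognizer for dates of the form "day month year". It is only an illustration of the general idea: it is not winkNLP's model, it is not pre-trained, and every name in it (classify, TRANSITIONS, findDateSpans) is hypothetical.

```javascript
// Toy illustration only: NOT winkNLP's actual model or implementation.
const MONTHS = new Set( [
  'january', 'february', 'march', 'april', 'may', 'june',
  'july', 'august', 'september', 'october', 'november', 'december'
] );

// Maps a token to a coarse class that the state machine understands.
function classify( token ) {
  if ( /^\d{1,2}$/.test( token ) ) return 'DAY';
  if ( /^\d{4}$/.test( token ) ) return 'YEAR';
  if ( MONTHS.has( token.toLowerCase() ) ) return 'MONTH';
  return 'OTHER';
}

// Transition table: current state -> token class -> next state.
const TRANSITIONS = {
  START: { DAY: 'SEEN_DAY' },
  SEEN_DAY: { MONTH: 'SEEN_MONTH' },
  SEEN_MONTH: { YEAR: 'ACCEPT' }
};

// Runs the machine from every token position and collects accepted spans.
function findDateSpans( tokens ) {
  const spans = [];
  for ( let start = 0; start < tokens.length; start += 1 ) {
    let state = 'START';
    let end = start;
    while ( end < tokens.length && state !== 'ACCEPT' ) {
      const next = ( TRANSITIONS[ state ] || {} )[ classify( tokens[ end ] ) ];
      if ( !next ) break;
      state = next;
      end += 1;
    }
    if ( state === 'ACCEPT' ) {
      spans.push( tokens.slice( start, end ).join( ' ' ) );
      start = end - 1; // resume scanning right after the accepted span
    }
  }
  return spans;
}

const tokens = 'The release happened on 21 August 2020 at 4:32 pm'.split( ' ' );
console.log( findDateSpans( tokens ) );
// -> [ '21 August 2020' ]
```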

To-do applications that automatically add due dates based on the text entered, and email clients that add events to your calendar based on time and location, rely on this form of named entity extraction. It can also be used to create a timeline of events from raw text.
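
A to-do application would typically need a single due date built from a DATE entity and a TIME entity. The sketch below shows one possible way to combine the two strings, assuming the date text is something Date can parse and the time looks like "4:32 pm" or "16:05". The helper name toDueDate is hypothetical and the time parsing is deliberately simple; it does not validate out-of-range values.

```javascript
// Hypothetical helper: merges a DATE entity's text and a TIME entity's text
// into one Date in the local time zone. Returns null if the date is unparseable.
function toDueDate( dateText, timeText ) {
  const due = new Date( dateText );
  if ( Number.isNaN( due.getTime() ) ) return null;

  // Accepts forms like "4:32 pm", "4:32pm" or "16:05".
  const match = /^(\d{1,2}):(\d{2})\s*(am|pm)?$/i.exec( timeText.trim() );
  if ( !match ) return due; // time not understood: keep the date as parsed

  let hours = Number( match[ 1 ] );
  const minutes = Number( match[ 2 ] );
  const meridiem = ( match[ 3 ] || '' ).toLowerCase();
  if ( meridiem === 'pm' && hours < 12 ) hours += 12;
  if ( meridiem === 'am' && hours === 12 ) hours = 0;

  due.setHours( hours, minutes, 0, 0 );
  return due;
}

const due = toDueDate( '21 August 2020', '4:32 pm' );
console.log( due.getHours(), due.getMinutes() );
// -> 16 32
```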
