
Pallavi Ratra for WinkJS

Posted on • Updated on • Originally published at winkjs.org

How to split text into sentences in NLP?

To split any text into sentences using winkNLP, read the text using readDoc. Then use the sentences method to get a collection of sentences from the text. Follow this with the out method to get this collection as a JavaScript array. This is how you can split a text into sentences:

// Load wink-nlp package & helpers.
const winkNLP = require( 'wink-nlp' );
// Load "its" helper to extract item properties.
const its = require( 'wink-nlp/src/its.js' );
// Load english language model — light version.
const model = require( 'wink-eng-lite-model' );
// Instantiate winkNLP.
const nlp = winkNLP( model );

// Input text
const text = 'AI Inc. is focussing on AI. It is based in ' +
             'the U.S.A. It was started on 06.12.2007.';
// Read text
const doc = nlp.readDoc( text );
// Extract sentences from the data
const sentences = doc.sentences().out();
console.log( sentences );

This returns an array of sentences:

[
  'AI Inc. is focussing on AI.',
  'It is based in the U.S.A.',
  'It was started on 06.12.2007.'
]

If no sentence break is found in the input text, the output is an array with a single member containing the complete text.

A sentence usually ends at a full stop, question mark, or exclamation mark. winkNLP attempts to identify the sentence boundary correctly even when the text contains abbreviations, honorifics, and similar tokens that carry their own periods, such as "Inc." and "U.S.A." in the example above.
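
To see why this matters, here is a minimal sketch of a naive splitter for contrast. It is not how winkNLP works internally; `naiveSplit` is a hypothetical helper written only for this illustration. It breaks after every full stop, question mark, or exclamation mark that is followed by whitespace, so it wrongly treats the period in "Inc." as the end of a sentence:

```javascript
// Naive baseline for comparison only. This is NOT winkNLP's algorithm.
// Splits on whitespace that follows '.', '?' or '!'.
function naiveSplit( text ) {
  return text.split( /(?<=[.?!])\s+/ ).filter( ( s ) => s.length > 0 );
}

// Same input text as in the winkNLP example above.
const sample = 'AI Inc. is focussing on AI. It is based in ' +
               'the U.S.A. It was started on 06.12.2007.';

console.log( naiveSplit( sample ) );
// [
//   'AI Inc.',
//   'is focussing on AI.',
//   'It is based in the U.S.A.',
//   'It was started on 06.12.2007.'
// ]

// Text without a sentence break comes back as a single-member array.
console.log( naiveSplit( 'No terminator here' ) );
// [ 'No terminator here' ]
```

The naive version yields four pieces where winkNLP returns three sentences, because it has no notion of abbreviations. It only survives "U.S.A." and "06.12.2007" here because their inner periods are not followed by whitespace.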
