How many times have you done this?
const text = "I love pizza and pasta";
text.split(" ");
// ["I", "love", "pizza", "and", "pasta"]
Looks fine — until you deal with punctuation, multiple spaces, or languages that don’t even use spaces.
Try splitting a French sentence with apostrophes, or a Japanese one with no whitespace at all.
Suddenly .split(" ") feels more like a guess than a rule.
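To make those failure cases concrete, here is a small sketch. The sample strings are made up for illustration.

```javascript
// Naive whitespace splitting on text with punctuation and a double space.
const messy = "Pizza,  pasta and ice-cream!";
const naiveTokens = messy.split(" ");
console.log(naiveTokens);
// ["Pizza,", "", "pasta", "and", "ice-cream!"]
// Punctuation stays glued to the words, and the double space yields an empty string.

// Japanese has no spaces between words, so the whole sentence comes back as one "word".
const japanese = "私は寿司が好きです。";
const japaneseTokens = japanese.split(" ");
console.log(japaneseTokens.length); // 1
```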
That’s where Intl.Segmenter comes in.
What Is Intl.Segmenter?
Intl.Segmenter is part of the JavaScript Internationalization API.
It splits text into human-perceived units — words, sentences, or characters — using the rules of a specific language.
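In other words, it doesn’t just cut on whitespace.
Engines typically follow the Unicode text segmentation rules (UAX #29), adjusted for the locale you pass in, so the boundaries match how people actually read and write.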
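A minimal sketch of the basic API shape might look like this. The `listSegments` helper and the `hasSegmenter` flag are hypothetical names used only for illustration; the constructor, `segment()`, and `resolvedOptions()` are part of the standard API.

```javascript
// Feature-detect first: older runtimes may not ship Intl.Segmenter.
const hasSegmenter =
  typeof Intl !== "undefined" && typeof Intl.Segmenter === "function";

// Hypothetical helper: returns plain { segment, index } records, or null when unsupported.
function listSegments(text, locale, granularity) {
  if (!hasSegmenter) return null;
  const segmenter = new Intl.Segmenter(locale, { granularity });
  // segment() returns an iterable of segment objects, not an array.
  return Array.from(segmenter.segment(text), ({ segment, index }) => ({ segment, index }));
}

// When no options are given, the granularity defaults to "grapheme".
const defaultGranularity = hasSegmenter
  ? new Intl.Segmenter("en").resolvedOptions().granularity
  : undefined;
console.log(defaultGranularity); // "grapheme"

console.log(listSegments("Hello, world", "en", "word"));
// Segments and their start indices: "Hello" (0), "," (5), " " (6), "world" (7)
```

Note that the word granularity also yields punctuation and whitespace as segments; filtering them out is a separate step.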
Understanding Granularity
The secret sauce of Intl.Segmenter lies in its granularity — the level at which text is broken down.
You can choose between three modes:
- grapheme → splits text into visible characters (user-perceived symbols)
- word → splits into words, respecting language-specific rules
- sentence → splits into sentences, automatically detecting punctuation and spacing
Each level serves a different purpose:
use graphemes for counting characters, words for tokenization, and sentences for text analysis.
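To compare the three modes side by side, here is a small sketch that runs one string through each granularity. The `segmentsAt` helper and the sample string are assumptions made for this example.

```javascript
const sample = "Hi there. Bye!";

// Hypothetical helper: collect the raw segment strings at a given granularity.
function segmentsAt(text, granularity, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const byGrapheme = segmentsAt(sample, "grapheme");
const byWord = segmentsAt(sample, "word");
const bySentence = segmentsAt(sample, "sentence");

console.log(byGrapheme.length); // 14
console.log(byWord);            // ["Hi", " ", "there", ".", " ", "Bye", "!"]
console.log(bySentence);        // ["Hi there. ", "Bye!"]

// Whatever the granularity, the segments concatenate back to the input.
console.log(byWord.join("") === sample); // true
```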
Here’s the beauty: the same phrase can be segmented differently depending on the locale.
Example: One Sentence, Three Languages
Let’s see how Intl.Segmenter handles the same idea across English, French, and Japanese.
const sentences = {
  en: "I love sushi.",
  fr: "J’aime les sushis.",
  ja: "私は寿司が好きです。"
};
for (const [locale, text] of Object.entries(sentences)) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const words = [...segmenter.segment(text)]
    .filter(s => s.isWordLike)
    .map(s => s.segment);
  console.log(locale, words, `(${words.length} words)`);
}
Output:
en [ 'I', 'love', 'sushi' ] (3 words)
fr [ 'J’aime', 'les', 'sushis' ] (3 words)
ja [ '私', 'は', '寿司', 'が', '好き', 'です' ] (6 words)
Same sentence.
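Different segmentation, adapted to each language’s rules. Japanese has no spaces, so engines typically rely on dictionary data there, and the exact boundaries can differ slightly between runtimes.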
Example: Word Segmentation (English)
const text = "Pizza, pasta and ice cream!";
const segmenter = new Intl.Segmenter("en", { granularity: "word" });
const words = [...segmenter.segment(text)]
  .filter(s => s.isWordLike)
  .map(s => s.segment);
console.log(words);
// ["Pizza", "pasta", "and", "ice", "cream"]
Before the .map() call, each item yielded by segment() is not just a string. It’s a full object with metadata:
{
  segment: "Pizza",
  index: 0,
  input: "Pizza, pasta and ice cream!",
  isWordLike: true
}
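That metadata is useful for more than filtering. As a rough sketch, the `index` field can drive highlighting, and the object returned by `segment()` also has a `containing()` method that looks up the segment at a given position. The variable names and the cursor position 8 are assumptions for this example.

```javascript
const menu = "Pizza, pasta and ice cream!";
const menuSegments = new Intl.Segmenter("en", { granularity: "word" }).segment(menu);

// containing(i) returns the segment that covers code-unit index i.
const underCursor = menuSegments.containing(8);
console.log(underCursor.segment, underCursor.index); // "pasta" 7

// Turn word-like segments into [start, end) ranges, e.g. for highlighting.
const wordRanges = [...menuSegments]
  .filter((s) => s.isWordLike)
  .map((s) => ({ word: s.segment, start: s.index, end: s.index + s.segment.length }));
console.log(wordRanges[1]); // { word: "pasta", start: 7, end: 12 }
```

The offsets are UTF-16 code-unit indices, so they can be passed directly to `String.prototype.slice`.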
Example: Grapheme Segmentation
Need to count or slice user-visible characters correctly?
const text = "Café résumé";
const graphemeSegmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemes = [...graphemeSegmenter.segment(text)].map(s => s.segment);
console.log(graphemes);
// ["C", "a", "f", "é", " ", "r", "é", "s", "u", "m", "é"]
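Each accented character counts as one user-perceived character, even when it is stored as a base letter plus a combining accent (two code units), a case where .length or .split("") would miscount.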
Perfect for counters, highlighting, or text selection logic.
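The difference shows up most clearly with decomposed accents and emoji sequences. Here is a sketch of a character counter and a safe truncation helper; `countGraphemes` and `truncateGraphemes` are hypothetical names, and the strings use escape sequences so the code units are explicit.

```javascript
const graphemeCounter = new Intl.Segmenter("en", { granularity: "grapheme" });

// Count user-perceived characters rather than UTF-16 code units.
function countGraphemes(text) {
  let count = 0;
  for (const _ of graphemeCounter.segment(text)) count += 1;
  return count;
}

// Keep at most `max` user-perceived characters without cutting one in half.
function truncateGraphemes(text, max) {
  let out = "";
  let taken = 0;
  for (const { segment } of graphemeCounter.segment(text)) {
    if (taken === max) break;
    out += segment;
    taken += 1;
  }
  return out;
}

const decomposed = "Cafe\u0301"; // "Café" written as e + combining acute accent
console.log(decomposed.length, countGraphemes(decomposed)); // 5 4

const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"; // family emoji joined by ZWJ
console.log(family.length, countGraphemes(family)); // 8 1

console.log(truncateGraphemes(decomposed + "!", 4) === decomposed); // true
```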
Example: Sentence Segmentation
const paragraph = "I love pizza. Pasta is great too! Let's eat.";
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });
for (const segment of sentenceSegmenter.segment(paragraph)) {
  console.log(segment.segment);
}
Output:
I love pizza.
Pasta is great too!
Let's eat.
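The segmenter handles punctuation and spacing automatically.
Note that each segment keeps its trailing whitespace, so call .trim() if you need clean sentences.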
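Since each sentence segment includes the whitespace that follows it, a small wrapper is often handy. This is a sketch under that assumption; `splitSentences` is a hypothetical helper name.

```javascript
// Hypothetical helper: split text into trimmed, non-empty sentences.
function splitSentences(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  return Array.from(segmenter.segment(text), (s) => s.segment.trim())
    .filter((sentence) => sentence.length > 0);
}

const sampleParagraph = "I love pizza. Pasta is great too! Let's eat.";

// The raw first segment still carries the space after the period.
const rawSentences = new Intl.Segmenter("en", { granularity: "sentence" });
const rawFirst = [...rawSentences.segment(sampleParagraph)][0].segment;
console.log(JSON.stringify(rawFirst)); // "I love pizza. "

console.log(splitSentences(sampleParagraph));
// ["I love pizza.", "Pasta is great too!", "Let's eat."]
```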
Example: Real-world Use — Word Counter
You can build a word counter that works in any language:
function countWords(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return [...segmenter.segment(text)].filter(s => s.isWordLike).length;
}
// Pass the locale that matches the text; "en" is only the default.
console.log(countWords("Pizza, pasta and ice cream!")); // 5
console.log(countWords("J’aime le chocolat et le café.", "fr")); // 6
console.log(countWords("私は寿司が好きです。", "ja")); // 6
No regex, no guesswork — fully locale-aware.
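The same pattern extends from counting to simple tokenization. As a sketch, here is a word-frequency helper built on the same segmenter; `wordFrequencies` is a hypothetical name, and lowercasing with `toLocaleLowerCase` is one assumed way of normalizing words.

```javascript
// Hypothetical helper: count how often each word appears, ignoring case.
function wordFrequencies(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const counts = new Map();
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    if (!isWordLike) continue; // skip spaces and punctuation
    const key = segment.toLocaleLowerCase(locale);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const frequencies = wordFrequencies("Pizza, pizza and PASTA. Pasta!");
console.log([...frequencies]);
// [["pizza", 2], ["and", 1], ["pasta", 2]]
```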
Conclusion
Splitting text isn’t just a technical task — it’s a linguistic one.
Intl.Segmenter gives JavaScript a locale-aware way to find grapheme, word, and sentence boundaries, instead of treating text as a bare sequence of characters.
No regex. No libraries. No fragile logic.
To explore more, visit the official MDN Web Docs.