DEV Community

Fin Chen


TIL: Many emojis are actually multiple emojis combined together, not single characters

Ever wondered why some emojis count as more characters than expected, especially when you enforce a character limit on an input? It turns out many emojis are actually sequences of simpler emojis, glued together by an invisible zero-width joiner (U+200D).

Examples of composite emojis:

  • 👨‍👨‍👦‍👦 = 👨+👨+👦+👦 (family: man, man, boy, boy)
  • 👩‍💻 = 👩+💻 (woman technologist)
  • 🏳️‍🌈 = 🏳️+🌈 (rainbow flag)
  • 👨‍🍳 = 👨+🍳 (man cook)
  • 🧑🏻‍🎨 = 🧑 + 🏻 + 🎨 (artist: light skin tone)
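
To see how these sequences are put together, here is a minimal sketch that builds two of the emojis above from their code points and then lists the parts again. The helper names joinEmoji and codePoints are made up for this example; only String.prototype.codePointAt and Array.from are standard.

```javascript
// U+200D ZERO WIDTH JOINER glues separate emojis into one displayed glyph.
const ZWJ = '\u200D';

// Hypothetical helper: join individual emojis into a ZWJ sequence.
function joinEmoji(...parts) {
  return parts.join(ZWJ);
}

// Hypothetical helper: list the code points of a string in U+XXXX notation.
function codePoints(str) {
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

// woman (U+1F469) + laptop (U+1F4BB) = woman technologist
const technologist = joinEmoji('\u{1F469}', '\u{1F4BB}');
console.log(codePoints(technologist));
// ['U+1F469', 'U+200D', 'U+1F4BB']

// A skin tone modifier sits directly after the base emoji, without a ZWJ.
const artist = joinEmoji('\u{1F9D1}\u{1F3FB}', '\u{1F3A8}');
console.log(codePoints(artist));
// ['U+1F9D1', 'U+1F3FB', 'U+200D', 'U+1F3A8']
```

Whether the result renders as a single glyph depends on the font and platform; where a sequence is not supported, the parts are shown side by side.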

Developer gotcha: Programming languages measure string length in different units (UTF-16 code units, code points, or UTF-8 bytes), so the length of the same emoji can differ between frontend and backend. Always test your character limits with composite emojis.
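
To make the mismatch concrete, here is a rough sketch that measures the family emoji in the three units you are most likely to meet. The measure helper is hypothetical; the comments name a few languages whose default length function uses each unit.

```javascript
// family: man, man, boy, boy (4 emojis joined by 3 zero-width joiners)
const familyEmoji = '\u{1F468}\u200D\u{1F468}\u200D\u{1F466}\u200D\u{1F466}';

// Hypothetical helper: measure one string in the units different runtimes use.
function measure(str) {
  return {
    // UTF-16 code units: JavaScript .length, Java String.length()
    utf16Units: str.length,
    // Unicode code points: Python 3 len()
    codePoints: Array.from(str).length,
    // UTF-8 bytes: Go len(), Rust String::len()
    utf8Bytes: new TextEncoder().encode(str).length,
  };
}

console.log(measure(familyEmoji));
// { utf16Units: 11, codePoints: 7, utf8Bytes: 25 }
```

None of the three numbers is 1, which is what the user actually sees, so a limit checked on the client can easily disagree with the same limit checked on the server.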

JavaScript examples:

For JavaScript, Intl.Segmenter can be a great help:

const family = '👨‍👨‍👦‍👦';

// Default length - counts UTF-16 code units
console.log(family.length); // 11

// Spread syntax - counts code points, not grapheme clusters
console.log([...family].length); // 7

// See the actual components
console.log(Array.from(family));
// ['👨', '‍', '👨', '‍', '👦', '‍', '👦']
// (the seemingly empty strings are zero-width joiners, U+200D)

// For accurate user-visible character count
const segmenter = new Intl.Segmenter('en', {granularity: 'grapheme'});
console.log([...segmenter.segment(family)].length); // 1
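
Building on the snippet above, here is one possible way to enforce a character limit by what the user sees. The helpers graphemeLength and truncateGraphemes are names invented for this sketch, not part of any library; it assumes a runtime with Intl.Segmenter support.

```javascript
// One shared segmenter; the 'en' locale mirrors the snippet above.
const graphemeSegmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });

// Hypothetical helper: count user-perceived characters.
function graphemeLength(str) {
  let count = 0;
  for (const _ of graphemeSegmenter.segment(str)) count += 1;
  return count;
}

// Hypothetical helper: keep at most `max` graphemes without splitting an emoji.
function truncateGraphemes(str, max) {
  let out = '';
  let count = 0;
  for (const { segment } of graphemeSegmenter.segment(str)) {
    if (count === max) break;
    out += segment;
    count += 1;
  }
  return out;
}

const familySequence = '\u{1F468}\u200D\u{1F468}\u200D\u{1F466}\u200D\u{1F466}';
const input = 'Hi ' + familySequence + '!';

console.log(input.length);          // 15 (UTF-16 code units)
console.log(graphemeLength(input)); // 5  (H, i, space, family, !)

// Truncating to 4 graphemes keeps the whole family emoji intact.
console.log(truncateGraphemes(input, 4) === 'Hi ' + familySequence); // true

// By contrast, input.slice(0, 4) cuts inside the first emoji and
// leaves a lone surrogate at the end of the string.
```

If the backend enforces the limit too, it needs to count graphemes the same way, otherwise the two sides will still disagree.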

Playground

I got curious about all possible combinations, so I made "Emoji Architect" to explore them. With this tool you can:

  • Browse all composite emojis
  • See the breakdown of any emoji
  • Filter combinations by base emoji components

🔗 https://www.thingsaboutweb.dev/en/emojiarchitect

More explanation and ways of building emojis coming soon...

Reference

Best reference is the spec
