Michael Lip

Posted on • Originally published at zovo.one

The Unicode Emoji System Is More Complex Than You Think

There are over 3,600 emoji in Unicode 15.1. Finding the right one by scrolling through your phone's keyboard is slow. Understanding how emoji are encoded helps developers handle them correctly in text processing, databases, and rendering.

How emoji work in Unicode

Each emoji is one or more Unicode code points. Simple emoji like the smiley face are a single code point:

  • Grinning face: U+1F600
  • Heart: U+2764 (usually followed by U+FE0F to request the colored emoji presentation)
  • Thumbs up: U+1F44D
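To check which code points a string actually contains, a small helper like the one below can be useful. This is a minimal sketch; `toCodePoints` is a name I made up for illustration, not a built-in.

```javascript
// Hypothetical helper: list the code points in a string as "U+XXXX" labels.
function toCodePoints(str) {
  // Spreading a string iterates by code point, not by UTF-16 code unit.
  return [...str].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(toCodePoints("\u{1F600}")); // [ 'U+1F600' ]
console.log(toCodePoints("\u{1F44D}")); // [ 'U+1F44D' ]
console.log(toCodePoints("\u2764"));    // [ 'U+2764' ]
```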

But many emoji are composed of multiple code points joined together:

Skin tone modifiers: The thumbs up emoji (U+1F44D) followed by a skin tone modifier (U+1F3FB through U+1F3FF) produces a thumbs up with a specific skin tone. That is two code points rendering as one visible character.

Gender modifiers: The "person running" emoji (U+1F3C3) plus a Zero-Width Joiner (ZWJ, U+200D) plus the female sign (U+2640) plus Variation Selector-16 (U+FE0F, which requests emoji presentation) produces the "woman running" emoji. Four code points, one visible character.

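Family emoji: The family emoji with two parents and two children is 7 code points: person + ZWJ + person + ZWJ + child + ZWJ + child. If each of the four people also carried a skin tone modifier, the sequence would grow to 11 code points, although support for skin-toned family sequences varies by platform.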

Flag emoji: Country flags are composed of two Regional Indicator Symbol characters. The US flag is U+1F1FA (Regional Indicator U) + U+1F1F8 (Regional Indicator S). There is no single "US flag" code point.
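The sequences described above can be assembled by hand with `String.fromCodePoint`. The sketch below is only illustrative; `flagFromCountryCode` is a hypothetical helper, and it assumes the input is a two-letter ASCII country code.

```javascript
const ZWJ = 0x200d;  // Zero-Width Joiner
const VS16 = 0xfe0f; // Variation Selector-16 (requests emoji presentation)

// Thumbs up + medium skin tone modifier (U+1F3FD): two code points.
const thumbsUpMedium = String.fromCodePoint(0x1f44d, 0x1f3fd);

// Person running + ZWJ + female sign + VS16: four code points.
const womanRunning = String.fromCodePoint(0x1f3c3, ZWJ, 0x2640, VS16);

// Hypothetical helper: map a two-letter country code to Regional Indicator Symbols.
// Regional Indicator A is U+1F1E6, so each letter is offset from "A" (0x41).
function flagFromCountryCode(code) {
  const REGIONAL_INDICATOR_A = 0x1f1e6;
  return String.fromCodePoint(
    ...[...code.toUpperCase()].map(
      (letter) => REGIONAL_INDICATOR_A + letter.charCodeAt(0) - 0x41
    )
  );
}

console.log(thumbsUpMedium, womanRunning, flagFromCountryCode("US"));
```

Whether each sequence shows up as a single glyph depends on the font and platform doing the rendering.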

Why this matters for developers

String length: "👍🏽".length in JavaScript returns 4, not 1. The emoji is two code points (base + skin tone modifier), and JavaScript's .length counts UTF-16 code units, not visual characters. Both the base emoji and the skin tone modifier lie outside the Basic Multilingual Plane, so each needs a surrogate pair (2 code units), for a total of 4.

To count code points instead of code units, use [..."👍🏽"].length (spread into an array of code points), which returns 2. That is still not the number of visual characters. For true grapheme cluster counting, use Intl.Segmenter:

const segmenter = new Intl.Segmenter();
const count = [...segmenter.segment("👍🏽")].length; // 1
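To see the three kinds of "length" side by side, here is a small sketch. `measure` is a hypothetical helper name, and it assumes a runtime that ships Intl.Segmenter (recent Node.js versions and current browsers do).

```javascript
// Hypothetical helper: report three different "lengths" for the same string.
function measure(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    codeUnits: str.length,                         // UTF-16 code units
    codePoints: [...str].length,                   // Unicode code points
    graphemes: [...segmenter.segment(str)].length, // user-perceived characters
  };
}

// Thumbs up + skin tone: 4 code units, 2 code points, 1 grapheme
console.log(measure("\u{1F44D}\u{1F3FD}"));
// US flag: 4 code units, 2 code points, 1 grapheme
console.log(measure("\u{1F1FA}\u{1F1F8}"));
// Woman running: 5 code units, 4 code points, 1 grapheme
console.log(measure("\u{1F3C3}\u200D\u2640\uFE0F"));
```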

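Database storage: Most emoji sit outside the Basic Multilingual Plane and take 4 bytes in UTF-8. MySQL's utf8 charset (an alias for utf8mb3) stores at most 3 bytes per character, so it cannot hold them. Use utf8mb4 instead. With utf8, inserts containing such emoji typically fail with an "Incorrect string value" error in strict SQL mode, or the value is truncated at the emoji in non-strict mode. This has caused countless production bugs.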
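If you want to catch this before the data reaches the database, one option is to check for 4-byte characters in application code. The sketch below is just that, a sketch; `needsUtf8mb4` and `utf8ByteLength` are hypothetical helper names, and it assumes a runtime with a global TextEncoder.

```javascript
// Hypothetical helper: true if the string contains any code point above U+FFFF.
// Those code points take 4 bytes in UTF-8 and will not fit in a 3-byte-per-character column.
function needsUtf8mb4(str) {
  for (const ch of str) {
    if (ch.codePointAt(0) > 0xffff) return true;
  }
  return false;
}

// Hypothetical helper: number of bytes the string occupies when encoded as UTF-8.
const utf8ByteLength = (str) => new TextEncoder().encode(str).length;

console.log(utf8ByteLength("\u{1F44D}"));     // 4 (thumbs up)
console.log(utf8ByteLength("\u2764"));        // 3 (heart, inside the BMP)
console.log(needsUtf8mb4("hello \u{1F44D}")); // true
console.log(needsUtf8mb4("hello \u2764"));    // false
```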

Rendering: Platforms do not draw the same emoji identically. Apple redrew the pistol emoji (U+1F52B) as a water pistol in 2016 while other platforms still showed a realistic handgun; most other vendors followed in 2018. Google's green salad emoji (U+1F957) included an egg until 2018, when it was removed so the salad would be vegan. Platform differences like these can change the perceived meaning of a message.

Text processing: Any operation that manipulates strings character by character (truncation, substring, reversal) can split emoji in the middle of a multi-code-point sequence, producing broken rendering.
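One way to avoid splitting a sequence is to operate on grapheme clusters rather than code units. Here is a minimal sketch using Intl.Segmenter; `truncateGraphemes` and `reverseGraphemes` are hypothetical helpers, not library functions.

```javascript
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: keep at most maxGraphemes user-perceived characters.
function truncateGraphemes(str, maxGraphemes) {
  let out = "";
  let count = 0;
  for (const { segment } of graphemeSegmenter.segment(str)) {
    if (count >= maxGraphemes) break;
    out += segment;
    count += 1;
  }
  return out;
}

// Hypothetical helper: reverse a string without tearing emoji sequences apart.
function reverseGraphemes(str) {
  return [...graphemeSegmenter.segment(str)]
    .map((s) => s.segment)
    .reverse()
    .join("");
}

const text = "ok\u{1F44D}\u{1F3FD}!"; // "ok" + thumbs up with skin tone + "!"

// Naive slicing cuts the surrogate pair in half and leaves a lone surrogate.
console.log(text.slice(0, 3));
// Grapheme-aware versions keep the emoji intact.
console.log(truncateGraphemes(text, 3));
console.log(reverseGraphemes(text));
```

The trade-off is speed: segmenting is slower than slicing by index, so it is probably best reserved for user-facing text.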

Emoji categories

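The Unicode Consortium's emoji data files organize emoji into nine groups that pickers display (a tenth group, Component, holds building blocks such as skin tone modifiers). The older combined "Smileys and People" group was split in two:

  1. Smileys and Emotion (most commonly used)
  2. People and Body
  3. Animals and Nature
  4. Food and Drink
  5. Travel and Places
  6. Activities
  7. Objects
  8. Symbols
  9. Flags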

Within each category, emoji are ordered by subcategory (face-smiling, face-affection, hand-fingers-open, person-gesture, etc.). This standardized ordering is what most emoji pickers follow.

The picker

For finding and copying emoji quickly, I built an emoji picker with search, categories, skin tone selection, and recently used tracking. Click to copy to clipboard, ready to paste anywhere.


I'm Michael Lip. I build free developer tools at zovo.one. 500+ tools, all private, all free.