DEV Community

Dev Nestio

Unicode Character Search: UTF-8 Bytes, HTML Entities, JS Escapes in One Tool

How many bytes does the snowman emoji take in UTF-8? What's the HTML entity for the euro sign? I got tired of googling these, so I built a lookup tool.

Try it

Unicode Character Search — DevNestio

What you get for any character

  • Codepoint (U+XXXX)
  • Decimal value
  • Unicode name and block
  • UTF-8 bytes (hex)
  • UTF-16 code units
  • HTML entity (&amp;amp;, &amp;#9731;, etc.)
  • JavaScript escape (\u2603 or \u{1F600})
  • CSS escape (\2603)

All with copy buttons.
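
To make the escape formats in that list concrete, here is a minimal sketch of how they could be derived from a single codepoint. The name escapesFor is hypothetical and this is not the tool's actual code; it only emits numeric HTML references, not named entities.

```javascript
// Hypothetical helper: formats one codepoint in the escape styles listed above.
function escapesFor(cp) {
  const hex = cp.toString(16).toUpperCase();
  return {
    codepoint: 'U+' + hex.padStart(4, '0'),
    decimal: cp,
    // Decimal numeric character reference; valid whether or not a named entity exists.
    html: '&#' + cp + ';',
    // \uXXXX only reaches U+FFFF; anything higher needs the \u{...} form.
    js: cp <= 0xFFFF ? '\\u' + hex.padStart(4, '0') : '\\u{' + hex + '}',
    // CSS escape: a backslash followed by the hex digits.
    css: '\\' + hex,
  };
}

console.log(escapesFor(0x2603));
console.log(escapesFor(0x1F600));
```

One caveat on the CSS form: when the escape is followed by a hex digit or a space in the stylesheet, it needs a trailing space to terminate it, which this sketch does not add.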

Input methods

  • U+2603 or u+2603 — codepoint notation
  • 0x2603 — hex
  • 9731 — decimal
  • ☃ — paste a character directly
  • SNOWMAN — name search (500+ named characters)
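
The input forms above can be told apart with a couple of regular expressions. This is a rough sketch under my own assumptions, not the tool's parser: parseInput is a made-up name, name search is left out, and a single digit such as 5 is treated as a pasted character rather than a decimal value.

```javascript
// Hypothetical parser for the input forms listed above (name search omitted).
// Returns a codepoint number, or null when the input is not recognized.
function parseInput(text) {
  const s = text.trim();
  const hex = /^(?:u\+|0x)([0-9a-f]{1,6})$/i.exec(s);
  let cp = null;
  if (hex) {
    cp = parseInt(hex[1], 16);          // U+2603, u+2603, 0x2603
  } else if (/^\d{2,7}$/.test(s)) {
    cp = parseInt(s, 10);               // 9731
  } else if ([...s].length === 1) {
    // Spread iterates by codepoint, so an astral character still counts as one.
    cp = s.codePointAt(0);
  }
  // Reject anything past the end of the Unicode range.
  return cp !== null && cp <= 0x10FFFF ? cp : null;
}

console.log(parseInput('U+2603'), parseInput('0x2603'), parseInput('9731'));
```

All three calls in the last line resolve to the same codepoint, 9731.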

UTF-8 encoding from scratch

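// Encode one codepoint as an array of UTF-8 byte values.
// Assumes cp is a valid scalar value: no range or surrogate check is done here.
function toUtf8Bytes(cp) {
  if (cp < 0x80)    return [cp];                                                          // 1 byte:  0xxxxxxx
  if (cp < 0x800)   return [0xC0|(cp>>6), 0x80|(cp&0x3F)];                                // 2 bytes: 110xxxxx 10xxxxxx
  if (cp < 0x10000) return [0xE0|(cp>>12), 0x80|((cp>>6)&0x3F), 0x80|(cp&0x3F)];          // 3 bytes: 1110xxxx + 2 continuation bytes
  return [0xF0|(cp>>18), 0x80|((cp>>12)&0x3F), 0x80|((cp>>6)&0x3F), 0x80|(cp&0x3F)];      // 4 bytes: 11110xxx + 3 continuation bytes
}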
  • U+2603 (SNOWMAN): E2 98 83 — 3 bytes
  • U+1F600 (Grinning Face): F0 9F 98 80 — 4 bytes
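
A quick way to sanity-check byte sequences like these is to compare them with the platform's built-in TextEncoder, which is available in browsers and in Node. The helper name utf8Hex below is my own, and this is just a cross-check sketch, not part of the tool.

```javascript
// Cross-check: encode a codepoint with the built-in TextEncoder
// and format the bytes as space-separated uppercase hex.
function utf8Hex(cp) {
  const bytes = new TextEncoder().encode(String.fromCodePoint(cp));
  return Array.from(bytes, b => b.toString(16).toUpperCase().padStart(2, '0')).join(' ');
}

console.log(utf8Hex(0x2603));  // E2 98 83
console.log(utf8Hex(0x1F600)); // F0 9F 98 80
```

The two approaches differ for lone surrogates (U+D800 to U+DFFF): TextEncoder replaces them with U+FFFD, while a hand-rolled encoder with no validation will emit bytes for them.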

UTF-16 surrogate pairs

Codepoints above U+FFFF need two UTF-16 code units:

function toUtf16(cp) {
  if (cp < 0x10000) return [cp];
  const c = cp - 0x10000;
  return [0xD800 + (c >> 10), 0xDC00 + (c & 0x3FF)];
}
// U+1F600 -> [0xD83D, 0xDE00]
// That's why '😀'.length === 2 in JavaScript
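
Going the other way is a useful check on the arithmetic. The sketch below recombines a surrogate pair into a codepoint and shows the built-in string methods that already work per codepoint. fromSurrogates is a name I made up for illustration.

```javascript
// Inverse of the surrogate split: recombine a high/low pair into one codepoint.
function fromSurrogates(hi, lo) {
  return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

const face = '\u{1F600}'; // the grinning face, written as an escape

console.log(fromSurrogates(0xD83D, 0xDE00) === 0x1F600); // true
console.log(face.length);          // 2 (UTF-16 code units)
console.log([...face].length);     // 1 (string iteration goes by codepoint)
console.log(face.codePointAt(0));  // 128512, i.e. 0x1F600
console.log(face.charCodeAt(0));   // 55357, i.e. 0xD83D, the high surrogate only
```

So when counting or slicing user-visible characters, iterating the string (or using codePointAt) avoids cutting a pair in half, which indexing by code unit can do.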

Bug caught by tests

A logic error hid &nbsp; inside an if (cp < 128) guard, making it unreachable for codepoint 160:

// Bug: 160 < 128 is false, so map was never checked
if (cp < 128) {
  const map = { ..., 160: '&nbsp;' }; // dead code!
}
// Fix:
const named = { 34:'&quot;', 38:'&amp;', 60:'&lt;', 62:'&gt;', 160:'&nbsp;' };
if (named[cp]) return named[cp];

The tool's test suite (93 tests) catches mistakes like this.
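
For illustration, a regression check for this kind of bug could look like the sketch below. It is not taken from the tool's test suite: htmlEntity and NAMED are hypothetical names, and the numeric fallback for characters without a named entity is my assumption.

```javascript
// Sketch: the named-entity lookup runs before any range check.
const NAMED = { 34: '&quot;', 38: '&amp;', 60: '&lt;', 62: '&gt;', 160: '&nbsp;' };

function htmlEntity(cp) {
  // Named entities first, so 160 is never shadowed by an ASCII-only guard.
  if (NAMED[cp]) return NAMED[cp];
  // Assumed fallback: a decimal numeric character reference.
  return '&#' + cp + ';';
}

// Regression checks on both sides of the 128 boundary that hid the bug.
const cases = [[38, '&amp;'], [160, '&nbsp;'], [9731, '&#9731;']];
for (const [cp, want] of cases) {
  const got = htmlEntity(cp);
  if (got !== want) throw new Error(`codepoint ${cp}: got ${got}, want ${want}`);
}
console.log('all entity checks passed');
```

The case for 160 is the important one: it sits just past the ASCII range, which is exactly where a cp < 128 guard silently stops applying.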


Part of DevNestio — 115 free browser-only developer tools.
