How many bytes does the snowman emoji take in UTF-8? What's the HTML entity for the euro sign? I got tired of googling these, so I built a lookup tool.
Try it: Unicode Character Search (DevNestio)
What you get for any character
- Codepoint (U+XXXX)
- Decimal value
- Unicode name and block
- UTF-8 bytes (hex)
- UTF-16 code units
- HTML entity (`&amp;`, `&#9731;`, etc.)
- JavaScript escape (`\u2603` or `\u{1F600}`)
- CSS escape (`\2603`)
All with copy buttons.
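The text formats in that list are mechanical transformations of the codepoint. Below is a minimal sketch of how they could be derived. It assumes upper-case hex and numeric HTML entities (named ones such as `&euro;` need a lookup table). `charInfo` is a hypothetical helper, not the tool's actual code.

```javascript
// Hypothetical helper: derive the text representations listed above from a codepoint.
function charInfo(cp) {
  const hex = cp.toString(16).toUpperCase();
  const padded = hex.padStart(4, '0');
  return {
    codepoint: `U+${padded}`,   // U+2603, U+1F600
    decimal: cp,                // 9731
    htmlEntity: `&#${cp};`,     // numeric entity; named ones need a table
    jsEscape: cp <= 0xFFFF
      ? `\\u${padded}`          // \u2603 (exactly four hex digits)
      : `\\u{${hex}}`,          // \u{1F600} (ES2015 code point escape)
    cssEscape: `\\${hex}`,      // \2603 (no terminator; see note below)
  };
}

console.log(charInfo(0x2603));
console.log(charInfo(0x1F600).jsEscape);
```

A CSS escape consumes up to six hex digits, so in a stylesheet an escape followed by another hex digit needs a separating space (`\2603 a`). The space is swallowed as part of the escape.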
Input methods
- `U+2603` or `u+2603`: codepoint notation
- `0x2603`: hex
- `9731`: decimal
- `☃`: paste a character directly
- `SNOWMAN`: name search (500+ named characters)
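A few regular expressions are enough to tell these input formats apart. This is a sketch under stated assumptions: `parseInput` and the three-entry `NAMES` table are hypothetical stand-ins (the real tool has 500+ names). It also treats a lone digit like `7` as the pasted character U+0037 rather than as decimal 7.

```javascript
// Hypothetical parser for the input forms listed above; returns a codepoint or null.
// NAMES is a tiny stand-in for the tool's table of named characters.
const NAMES = { SNOWMAN: 0x2603, 'EURO SIGN': 0x20AC, 'GRINNING FACE': 0x1F600 };

function inRange(cp) {
  return cp <= 0x10FFFF ? cp : null; // U+10FFFF is the last Unicode codepoint
}

function parseInput(raw) {
  const s = raw.trim() || raw; // keep a pasted space character instead of trimming it away
  let m;
  if ((m = /^u\+([0-9a-f]{1,6})$/i.exec(s))) return inRange(parseInt(m[1], 16)); // U+2603, u+2603
  if ((m = /^0x([0-9a-f]{1,6})$/i.exec(s))) return inRange(parseInt(m[1], 16));  // 0x2603
  if (/^[0-9]{2,}$/.test(s)) return inRange(parseInt(s, 10));                     // 9731 (two or more digits)
  const chars = [...s];                                   // splits by codepoint, not UTF-16 unit
  if (chars.length === 1) return chars[0].codePointAt(0); // pasted character, e.g. ☃
  return NAMES[s.toUpperCase()] ?? null;                  // name search, case-insensitive
}

console.log(parseInput('U+2603'), parseInput('9731'), parseInput('snowman'));
```

Checking the prefixed forms first keeps `0x2603` from being read as anything else. Spreading the string with `[...s]` counts codepoints, so a pasted 😀 is one character even though it occupies two UTF-16 code units.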
UTF-8 encoding from scratch
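```javascript
// Encode one Unicode codepoint as an array of UTF-8 byte values.
function toUtf8Bytes(cp) {
  // Surrogates (U+D800..U+DFFF) and values above U+10FFFF have no UTF-8 encoding.
  if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
    throw new RangeError(`Not a Unicode scalar value: ${cp}`);
  }
  if (cp < 0x80) return [cp];                                    // 1 byte:  0xxxxxxx
  if (cp < 0x800) return [0xC0|(cp>>6), 0x80|(cp&0x3F)];         // 2 bytes: 110xxxxx 10xxxxxx
  if (cp < 0x10000) return [0xE0|(cp>>12), 0x80|((cp>>6)&0x3F), 0x80|(cp&0x3F)]; // 3 bytes
  return [0xF0|(cp>>18), 0x80|((cp>>12)&0x3F), 0x80|((cp>>6)&0x3F), 0x80|(cp&0x3F)]; // 4 bytes
}
```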
- U+2603 (SNOWMAN): `E2 98 83` (3 bytes)
- U+1F600 (GRINNING FACE): `F0 9F 98 80` (4 bytes)
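The hand-rolled encoder can be checked against the platform. `TextEncoder`, available in browsers and Node.js, always produces UTF-8. Below is a small sketch. `utf8Hex` is a hypothetical helper that formats the bytes the way the list above shows them.

```javascript
// Hypothetical helper: UTF-8 bytes of a codepoint as upper-case hex, via TextEncoder.
function utf8Hex(cp) {
  const bytes = new TextEncoder().encode(String.fromCodePoint(cp));
  return Array.from(bytes, b => b.toString(16).toUpperCase().padStart(2, '0')).join(' ');
}

console.log(utf8Hex(0xE9));    // C3 A9
console.log(utf8Hex(0x2603));  // E2 98 83
console.log(utf8Hex(0x1F600)); // F0 9F 98 80
```

`String.fromCodePoint` throws a `RangeError` above U+10FFFF. `TextEncoder` also replaces a lone surrogate with U+FFFD (`EF BF BD`). A cross-check therefore has to handle those inputs separately.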
UTF-16 surrogate pairs
Codepoints above U+FFFF need two UTF-16 code units:
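```javascript
// Split a codepoint into UTF-16 code units: one unit, or a high/low surrogate pair.
function toUtf16(cp) {
  if (cp < 0x10000) return [cp];
  const c = cp - 0x10000;                            // 20-bit offset into the supplementary planes
  return [0xD800 + (c >> 10), 0xDC00 + (c & 0x3FF)]; // top 10 bits, bottom 10 bits
}
// U+1F600 -> [0xD83D, 0xDE00]
// That's why '😀'.length === 2 in JavaScript
```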
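Going the other way, you can recombine a surrogate pair into one codepoint by reversing the arithmetic. The sketch below uses `fromSurrogates`, a hypothetical helper, and shows the built-in `codePointAt` for comparison.

```javascript
// Hypothetical inverse of toUtf16: combine a high and low surrogate into one codepoint.
function fromSurrogates(high, low) {
  return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
}

const grin = '😀';
console.log(grin.length);                                 // 2 (UTF-16 code units)
console.log(grin.charCodeAt(0).toString(16));             // d83d (high surrogate)
console.log(grin.charCodeAt(1).toString(16));             // de00 (low surrogate)
console.log(fromSurrogates(0xD83D, 0xDE00).toString(16)); // 1f600
console.log(grin.codePointAt(0).toString(16));            // 1f600 (built-in equivalent)
console.log([...grin].length);                            // 1 (string iteration is by codepoint)
```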
Bug caught by tests
A named-entity lookup table sat inside an `if (cp < 128)` guard, so its entry for codepoint 160 (U+00A0, NO-BREAK SPACE) was unreachable:
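```javascript
// Bug: the named-entity table lived inside this guard.
// 160 < 128 is false, so the table was never consulted for U+00A0.
if (cp < 128) {
  const map = { ..., 160: '&nbsp;' }; // dead code!
}

// Fix: check the named table first, for every codepoint.
const named = { 34: '&quot;', 38: '&amp;', 60: '&lt;', 62: '&gt;', 160: '&nbsp;' };
if (named[cp]) return named[cp];
```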
The tool's 93 tests catch bugs like this one.
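Put together, the fixed lookup and a regression check for the case the guard hid might look like the sketch below. `htmlEntity`, the numeric `&#N;` fallback, and the small test loop are assumptions. They are not the tool's actual code or test suite.

```javascript
// Hypothetical entity lookup with the fix applied: the named table is consulted
// for every codepoint, not only those below 128.
const NAMED_ENTITIES = { 34: '&quot;', 38: '&amp;', 60: '&lt;', 62: '&gt;', 160: '&nbsp;' };

function htmlEntity(cp) {
  // Numeric keys never collide with Object.prototype properties, so ?? is safe here.
  return NAMED_ENTITIES[cp] ?? `&#${cp};`; // numeric fallback, e.g. &#9731; for U+2603
}

// Minimal regression check: throws if any case regresses.
const cases = [
  [160, '&nbsp;'],     // the entry the old `cp < 128` guard made unreachable
  [38, '&amp;'],
  [0x2603, '&#9731;'],
];
for (const [cp, expected] of cases) {
  const got = htmlEntity(cp);
  if (got !== expected) throw new Error(`htmlEntity(${cp}) returned ${got}, expected ${expected}`);
}
console.log('all entity checks passed');
```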
Part of DevNestio — 115 free browser-only developer tools.