Every developer has hit this: you need to escape <, >, &, and quotes before dropping user input into HTML, or you're staring at mangled text full of &amp; and need to convert it back. Most online tools do the basics, but fall short on the full HTML5 named entity set or force you to choose between three encoding formats.
I built one that handles all three formats, 253 named entities, and decodes all of them with a single regex pass — entirely in the browser, no server, no framework.
👉 https://html-entity-encoder.pages.dev
What It Does
- Encode: text → HTML entities in three modes
  - Named: é→&eacute;, ©→&copy;, π→&pi;
  - Decimal: é→&#233;, ©→&#169;
  - Hex: é→&#xE9;, ©→&#xA9;
- Decode: all three entity formats back to plain text
- 253 HTML5 named entities — Latin-1 Supplement, Latin Extended-A, Greek, Math, Arrows, Punctuation, Currency, Symbols
- Real-time: output updates on every keystroke
- Quick Reference: collapsible table you can click to insert characters
- Swap, Copy, Clear, Sample buttons
- Zero external dependencies — single HTML file, works offline
The Core: Encoding in Three Modes
The encoding logic iterates over Unicode code points (not UTF-16 code units), which is essential for handling emoji and characters outside the BMP:
const ALWAYS_ENCODE = new Set(['&', '<', '>', '"', "'"]);

function encode(text, mode) {
  if (!text) return '';
  const result = [];
  for (const ch of text) { // for...of iterates code points
    const cp = ch.codePointAt(0);
    const mustEncode = ALWAYS_ENCODE.has(ch) || cp > 127;
    if (!mustEncode) { result.push(ch); continue; }
    if (mode === 'named') {
      result.push(CHAR_TO_ENTITY[ch] || `&#${cp};`);
    } else if (mode === 'decimal') {
      result.push(`&#${cp};`);
    } else { // hex
      result.push(`&#x${cp.toString(16).toUpperCase()};`);
    }
  }
  return result.join('');
}
The for...of loop over a string yields Unicode code points. A for loop with index would break on any character outside the Basic Multilingual Plane — emoji like 😀 are encoded as surrogate pairs in UTF-16, so a naive str[i] approach would emit two separate (invalid) entities for a single character.
Why &#N; fallback in named mode? Because the 253 named entities don't cover everything. A character like 😀 (U+1F600) has no HTML5 named form, so decimal is the only option.
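To make the code unit versus code point distinction concrete, here is a small standalone sketch. It is not part of the tool's source; the variable names are made up for illustration.

```javascript
// Compare index-based (UTF-16 code unit) iteration with for...of (code point) iteration.
const sample = 'a😀';

// Index-based access sees three code units: 'a' plus a surrogate pair.
const codeUnits = [];
for (let i = 0; i < sample.length; i++) {
  codeUnits.push(sample.charCodeAt(i).toString(16).toUpperCase());
}

// for...of sees two code points: 'a' and U+1F600.
const codePoints = [];
for (const ch of sample) {
  codePoints.push(ch.codePointAt(0).toString(16).toUpperCase());
}

console.log(codeUnits);  // [ '61', 'D83D', 'DE00' ]
console.log(codePoints); // [ '61', '1F600' ]

// Decimal fallback for a character that has no named entity.
const fallback = `&#${'😀'.codePointAt(0)};`;
console.log(fallback);   // &#128512;
```

An index-based encoder would turn D83D and DE00 into two separate references, and lone surrogates are not valid character references.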
The Decode Regex
One regex handles all three entity formats in a single pass:
function decode(text) {
  if (!text) return '';
  return text.replace(
    /&([a-zA-Z][a-zA-Z0-9]*);|&#([0-9]+);|&#[xX]([0-9a-fA-F]+);/g,
    (match, name, dec, hex) => {
      try {
        if (name !== undefined)
          return Object.prototype.hasOwnProperty.call(ENTITY_TO_CHAR, name)
            ? ENTITY_TO_CHAR[name] : match;
        if (dec !== undefined)
          return String.fromCodePoint(parseInt(dec, 10));
        if (hex !== undefined)
          return String.fromCodePoint(parseInt(hex, 16));
      } catch (_) {}
      return match;
    }
  );
}
Three alternation groups, each capturing a different entity format. The named entity lookup uses hasOwnProperty explicitly to guard against inherited properties: names such as toString, constructor, and valueOf match the named-entity pattern, so a direct ENTITY_TO_CHAR[name] lookup on a plain object would return functions from Object.prototype instead of leaving the text untouched. (__proto__ cannot reach the lookup at all, because the pattern does not allow underscores.)
The hex branch accepts both &#x...; and &#X...; (the [xX] in the regex) — the HTML5 spec allows both, even though lowercase is conventional.
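Here is how the decoder behaves on a few edge cases. The sketch repeats the decode function from above so it runs on its own, and the three-entry ENTITY_TO_CHAR is a hypothetical stand-in for the real map.

```javascript
// Hypothetical three-entry subset of the real map, for illustration only.
const ENTITY_TO_CHAR = { amp: '&', lt: '<', eacute: 'é' };

function decode(text) {
  if (!text) return '';
  return text.replace(
    /&([a-zA-Z][a-zA-Z0-9]*);|&#([0-9]+);|&#[xX]([0-9a-fA-F]+);/g,
    (match, name, dec, hex) => {
      try {
        if (name !== undefined)
          return Object.prototype.hasOwnProperty.call(ENTITY_TO_CHAR, name)
            ? ENTITY_TO_CHAR[name] : match;
        if (dec !== undefined) return String.fromCodePoint(parseInt(dec, 10));
        if (hex !== undefined) return String.fromCodePoint(parseInt(hex, 16));
      } catch (_) {} // String.fromCodePoint throws RangeError above U+10FFFF
      return match;
    }
  );
}

const results = {
  mixed: decode('caf&eacute; &#233; &#xE9; &#XE9;'), // named, decimal, hex, uppercase X
  unknown: decode('&bogus;'),       // unknown name is left as-is
  noSemicolon: decode('&amp'),      // the regex requires the trailing ';'
  outOfRange: decode('&#x110000;'), // not a valid code point, left as-is
  singlePass: decode('&amp;lt;'),   // only '&amp;' is decoded; the result is not rescanned
};
console.log(results);
```

The singlePass case matters for round-tripping: decoding &amp;lt; yields the text &lt;, not <, because replace never re-examines its own output.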
Building the Entity Maps
The decode map is the source of truth: ENTITY_TO_CHAR maps each name string to its Unicode character. Then the encode map is derived by inverting it:
const CHAR_TO_ENTITY = {};
(function buildCharMap() {
  // First pass: reverse all entries (the first name seen for a character wins)
  for (const [name, ch] of Object.entries(ENTITY_TO_CHAR)) {
    if (!CHAR_TO_ENTITY[ch]) CHAR_TO_ENTITY[ch] = `&${name};`;
  }
  // Second pass: force canonical preferred names for ambiguous chars
  const preferred = {
    '"': '&quot;', '&': '&amp;', "'": '&apos;',
    '<': '&lt;', '>': '&gt;', '\u00A0': '&nbsp;', // U+00A0 is the no-break space
    '©': '&copy;', '®': '&reg;', '™': '&trade;',
    '€': '&euro;', '×': '&times;', '÷': '&divide;'
  };
  Object.assign(CHAR_TO_ENTITY, preferred);
})();
Some characters have more than one name in the map. For example, ' maps to both &apos; (from XHTML) and &squot; here, and the HTML5 list itself defines both &copy; and &COPY; for ©. The second pass pins canonical names so the encoder always outputs the most recognizable form.
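The two-pass inversion is easier to see on a tiny map. This is a minimal sketch with a hypothetical five-entry ENTITY_TO_CHAR, using &COPY; and &QUOT;, which are real HTML5 aliases. The real map's insertion order may differ.

```javascript
// Hypothetical mini map: the uppercase aliases happen to be inserted first.
const ENTITY_TO_CHAR = { COPY: '©', copy: '©', QUOT: '"', quot: '"', pi: 'π' };

const CHAR_TO_ENTITY = {};

// First pass: the first name encountered for each character wins.
for (const [name, ch] of Object.entries(ENTITY_TO_CHAR)) {
  if (!CHAR_TO_ENTITY[ch]) CHAR_TO_ENTITY[ch] = `&${name};`;
}
const afterFirstPass = { ...CHAR_TO_ENTITY };

// Second pass: overwrite ambiguous characters with the preferred names.
Object.assign(CHAR_TO_ENTITY, { '©': '&copy;', '"': '&quot;' });

console.log(afterFirstPass['©']); // &COPY;
console.log(CHAR_TO_ENTITY['©']); // &copy;
console.log(CHAR_TO_ENTITY['π']); // &pi;
```

Without the second pass, the encoder's output would depend on the order in which names were added to the decode map.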
What's in the 253-entity Map
| Category | Count | Examples |
|---|---|---|
| HTML special | 5 | & < > " ' |
| Latin-1 Supplement | 96 | é ñ © € |
| Latin Extended-A | 5 | Œ œ Š |
| Greek | 49 | α π Σ Ω |
| Mathematical | 37 | ∞ ≠ ≤ ∑ √ |
| Arrows | 11 | → ← ⇔ ↵ |
| Punctuation | 20 | — – … “ |
| Misc Symbols | 10+ | ™ • ♠ ♥ |
| Currency | 5 | € £ ¥ ¢ |
Testing: 246 Cases, No Framework
246 tests across 26 sections, built on a two-function inline runner:
let passed = 0, failed = 0;

function eq(a, b, label) {
  if (a === b) { console.log(` ✓ ${label}`); passed++; }
  else {
    console.error(` ✗ ${label}\n got: ${JSON.stringify(a)}\n expected: ${JSON.stringify(b)}`);
    failed++;
  }
}
| Section | Tests | What's covered |
|---|---|---|
| Entity map coverage | 12 | Size ≥ 250, key entries exist |
| Encode HTML specials (named/decimal/hex) | 23 | & < > " ' in all modes |
| Encode Latin extended (all modes) | 25 | © € é ñ ü ± ° ½ |
| Encode Greek (all modes) | 14 | α β γ π Σ Ω |
| Encode math & symbols | 11 | ∞ ≠ ≤ √ → • — … |
| ASCII passthrough | 8 | Letters, digits, misc symbols |
| Encode mixed strings | 7 | XSS payloads, café, résumé |
| Decode named entities | 20 | All common named entities |
| Decode decimal entities | 10 | &#38; through &#960; |
| Decode hex (lowercase/uppercase X) | 15 | &#x3C; and &#X3C; forms |
| Decode mixed strings | 8 | Full HTML tags, price strings |
| Decode edge cases | 10 | Unknown entities, no semicolon, empty |
| Round-trip (encode→decode) | 30 | 10 strings × 3 modes |
| Double-encoding prevention | 2 | &amp; → &amp;amp; |
| Unicode correctness | 7 | U+0000, U+0041, U+2665 |
| Entity map value checks | 10 | Known char values |
| Misc symbols encode | 8 | ♠ ♥ ♦ ♣ ⇒ ∑ |
| Less common entities | 13 | Œ • ‰ |
| Whitespace entities | 5 | &nbsp; &ensp; &emsp; &thinsp; &zwnj; |
| Hex uppercase digits | 4 | &#xC4; (Ä), &#xDC; (Ü) |
| Non-BMP encode/decode | 4 | 😀 decimal + hex round-trip |
Run with npm test.
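A minimal sketch of what one section of such a test file might look like, reusing the eq helper shown earlier. The section helper, the encodeHex stand-in, and the final summary are assumptions for illustration, not the project's actual code.

```javascript
let passed = 0, failed = 0;

function eq(a, b, label) {
  if (a === b) { console.log(` ✓ ${label}`); passed++; }
  else {
    console.error(` ✗ ${label}\n got: ${JSON.stringify(a)}\n expected: ${JSON.stringify(b)}`);
    failed++;
  }
}

// Hypothetical helper: prints a heading before a group of checks.
function section(title) { console.log(`\n${title}`); }

// Stand-in for the real encoder: hex mode for non-ASCII characters only.
function encodeHex(text) {
  let out = '';
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    out += cp > 127 ? `&#x${cp.toString(16).toUpperCase()};` : ch;
  }
  return out;
}

section('Hex uppercase digits');
eq(encodeHex('Ä'), '&#xC4;', 'Ä encodes to &#xC4;');
eq(encodeHex('Ü'), '&#xDC;', 'Ü encodes to &#xDC;');
eq(encodeHex('abc'), 'abc', 'ASCII passes through');

console.log(`\n${passed} passed, ${failed} failed`);
// A non-zero exit code lets npm test report the run as failed.
if (failed > 0) process.exitCode = 1;
```

Setting process.exitCode instead of calling process.exit() lets any pending console output flush before Node exits.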
A Subtle Edge Case: Prototype Pollution in Decode
The named entity lookup is written as:
Object.prototype.hasOwnProperty.call(ENTITY_TO_CHAR, name)
? ENTITY_TO_CHAR[name]
: match
rather than the simpler ENTITY_TO_CHAR[name]. Why? Because name comes from user-supplied text via the regex match. If someone passes &constructor; or &toString; as input, a direct bracket lookup would walk the prototype chain and return a function object. The replace callback's return value is coerced to a string, so the decoded output would contain the function's source text (something like function Object() { [native code] }) in place of the original entity. Input like &__proto__; never gets this far, since the regex only accepts letters and digits in a name.
The hasOwnProperty check ensures we only return values that are explicitly in the entity map, not inherited from Object.prototype.
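The difference is easy to check in a console. The sketch below uses a hypothetical one-entry map and also shows two alternatives that avoid the issue by construction: a null-prototype object and a Map. The tool itself uses the hasOwnProperty guard; the alternatives are only for comparison.

```javascript
const plainMap = { amp: '&' }; // inherits from Object.prototype
const nullProtoMap = Object.assign(Object.create(null), { amp: '&' });
const realMap = new Map([['amp', '&']]);

const own = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

console.log(typeof plainMap['constructor']); // function
console.log(own(plainMap, 'constructor'));   // false
console.log(own(plainMap, 'amp'));           // true
console.log(nullProtoMap['constructor']);    // undefined
console.log(realMap.get('constructor'));     // undefined

// What an unguarded lookup would splice into the decoded text:
const leaked = '&constructor;'.replace(
  /&([a-zA-Z][a-zA-Z0-9]*);/g,
  (match, name) => plainMap[name] ?? match
);
console.log(leaked.startsWith('function'));  // true
```

With Object.create(null) or a Map there is no prototype chain to consult, so the guard becomes unnecessary. The trade-off is that the map can no longer be written as a plain object literal.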
Try It
https://html-entity-encoder.pages.dev
Single HTML file, no build step. Open DevTools and read the source — everything is there.
Also part of devnestio — a growing collection of zero-dependency browser tools for developers.
Built with: vanilla JS, the HTML5 named character references spec, and an unreasonable number of Greek letters.