arenasbob2024-cell

Posted on Feb 27 • Originally published at viadreams.cc

HTML Entities: The Complete Guide to Special Characters and XSS Prevention

#security #webdev #html #javascript

HTML entities let you write characters that the browser would otherwise parse as markup, which makes them central to rendering web content safely. Getting them wrong can lead to broken layouts, garbled characters, and XSS vulnerabilities.

The 5 Characters You Must Always Encode

<!-- These 5 MUST be encoded in HTML content -->
& → &amp;   <!-- Would be parsed as entity start -->
< → &lt;    <!-- Would start a tag -->
> → &gt;    <!-- Would end a tag -->
" → &quot;  <!-- In quoted attributes -->
' → &#39;   <!-- In single-quoted attributes -->

Named vs Numeric Entities

Three equivalent ways to write the same character:

<!-- Named entity (most readable) -->
&copy; → ©

<!-- Decimal numeric entity -->
&#169; → ©

<!-- Hex numeric entity -->
&#xA9; → ©
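
To make the relationship between the two numeric forms concrete, here is a minimal sketch that derives both from a character's Unicode code point. The helper name `toNumericEntities` is hypothetical and only meant for illustration.

```javascript
// Hypothetical helper: builds the decimal and hex numeric references
// for the first code point of `ch`.
function toNumericEntities(ch) {
  const codePoint = ch.codePointAt(0);
  return {
    decimal: `&#${codePoint};`,
    hex: `&#x${codePoint.toString(16).toUpperCase()};`,
  };
}

console.log(toNumericEntities('©')); // { decimal: '&#169;', hex: '&#xA9;' }
```

Numeric references work for any code point, while named entities exist only for a fixed set of characters.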

Essential Entities Reference

Typography:
| Character | Entity | Name |
|-----------|--------|------|
| © | `&copy;` | Copyright |
| ® | `&reg;` | Registered |
| ™ | `&trade;` | Trademark |
| — | `&mdash;` | Em dash |
| – | `&ndash;` | En dash |
| … | `&hellip;` | Ellipsis |
| “ | `&ldquo;` | Left double quote |
| ” | `&rdquo;` | Right double quote |
| (space) | `&nbsp;` | Non-breaking space |

Symbols:
| Character | Entity |
|-----------|--------|
| × | `&times;` |
| ÷ | `&divide;` |
| ≤ | `&le;` |
| ≥ | `&ge;` |
| ≠ | `&ne;` |
| ∞ | `&infin;` |
| € | `&euro;` |
| £ | `&pound;` |

JavaScript Encoding

// Safe DOM API - PREFERRED
element.textContent = userInput;  // inserted as plain text, never parsed as markup

// Manual encode (for inserting into HTML strings)
function encodeHTML(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Manual decode
function decodeHTML(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return doc.documentElement.textContent;
}

// Library options
// npm install he
import he from 'he';
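he.escape('<script>alert("xss")</script>');
// → '&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;'
// he.encode() additionally encodes non-ASCII characters and, by default,
// emits hex references such as &#x3C; (pass { useNamedReferences: true } for named ones)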
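
One detail of the manual encoder above is easy to miss: `&` has to be replaced first, otherwise the ampersands introduced by the other replacements get encoded again. The sketch below shows the failure and a single-pass alternative that avoids the ordering question entirely. The names `HTML_ESCAPES`, `encodeHTMLOnce`, and `encodeWrongOrder` are hypothetical.

```javascript
// Lookup table for the five characters covered earlier.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Single pass: each input character is inspected exactly once,
// so freshly inserted ampersands are never re-encoded.
function encodeHTMLOnce(str) {
  return String(str).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Deliberately wrong: '<' is replaced before '&'.
function encodeWrongOrder(str) {
  return str.replace(/</g, '&lt;').replace(/&/g, '&amp;');
}

console.log(encodeHTMLOnce('a < b & c'));   // a &lt; b &amp; c
console.log(encodeWrongOrder('a < b & c')); // a &amp;lt; b &amp; c
```

The wrong-order version renders as the literal text `&lt;` on the page, which is the classic symptom of double encoding.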

Python Encoding

import html

# Encode
safe = html.escape('<script>alert("xss")</script>')
# → '&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;'

# quote=True is the default: " becomes &quot; and ' becomes &#x27;
# Pass quote=False to escape only &, <, > (text content, not attributes)
safe_attr = html.escape(user_input, quote=True)

# Decode
original = html.unescape('&lt;b&gt;Hello&lt;/b&gt;')
# → '<b>Hello</b>'

PHP Encoding

// htmlspecialchars: encodes &, <, >, ", '
$safe = htmlspecialchars($user_input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// htmlentities: encodes every character that has a named entity (accents, symbols, etc.)
// Not more secure than htmlspecialchars when the page is served as UTF-8
$encoded = htmlentities($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Decode
$original = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');

React/JSX — Auto-Escaping

React automatically escapes content in JSX:

// This is safe — React encodes user_input automatically
<div>{user_input}</div>  // → &lt;script&gt; in rendered HTML

// Avoid dangerouslySetInnerHTML unless you sanitize first
import DOMPurify from 'dompurify';
<div dangerouslySetInnerHTML={{ __html: DOMPurify.sanitize(html_content) }} />

// Entities in JSX source (not in variables):
<span>Copyright &copy; 2026</span>
<span>Arrows: &larr; &rarr; &uarr; &darr;</span>

XSS Prevention Context Table

Context encoding rules:

| Location | Required encoding |
|----------|-------------------|
| HTML content | `& < >` |
| HTML attribute | `& < > " '` |
| JavaScript string | `\x26 \x3c \x3e` or JSON.stringify |
| CSS value | `\00XX` hex encoding |
| URL parameter | `encodeURIComponent()` |
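
As a rough illustration of the non-HTML rows in the table, the sketch below pairs each context with one encoder. The function names are hypothetical, and the escaping choices (for example, escaping every non-alphanumeric character in CSS values) are conservative assumptions rather than the only valid approach.

```javascript
// JavaScript string context: JSON.stringify quotes and escapes the value.
// The extra replacements keep sequences like "</script>" from closing an
// inline <script> block. The result is still valid JSON.
function encodeForJsString(value) {
  return JSON.stringify(String(value))
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

// CSS value context: every non-alphanumeric character becomes a hex escape.
// The trailing space terminates the escape so the next character is not
// read as part of the hex number.
function encodeForCssValue(value) {
  return Array.from(String(value), (ch) =>
    /[a-zA-Z0-9]/.test(ch) ? ch : `\\${ch.codePointAt(0).toString(16)} `
  ).join('');
}

// URL parameter context: the built-in function is enough.
function encodeForUrlParam(value) {
  return encodeURIComponent(String(value));
}

console.log(encodeForJsString('</script>')); // "\u003c/script\u003e"
console.log(encodeForCssValue('red;}'));     // red\3b \7d 
console.log(encodeForUrlParam('a&b=c d'));   // a%26b%3Dc%20d
```

Values often pass through more than one context, such as a URL parameter inside an `href` attribute. In that case, apply the encoders from the innermost context outward.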

Common Mistakes

& in URLs: href="/search?a=1&b=2" → href="/search?a=1&amp;b=2"
Double encoding: encoding already-encoded text turns &amp; into &amp;amp;
innerHTML without sanitization: use DOMPurify
&nbsp; abuse: for spacing, use CSS margin/padding instead
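
The first two mistakes are easy to reproduce. The sketch below reuses the five-character encoder from earlier and adds a hypothetical `buildSearchHref` helper. The `/search` path comes from the example above, and `URLSearchParams` is assumed to be available (modern browsers and Node.js).

```javascript
function encodeHTML(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Double encoding: running the encoder on already-encoded text.
const once = encodeHTML('Tom & Jerry');
const twice = encodeHTML(once);
console.log(once);  // Tom &amp; Jerry
console.log(twice); // Tom &amp;amp; Jerry

// & in URLs: build the query string first, then HTML-encode the whole
// URL exactly once before placing it in an href attribute.
function buildSearchHref(params) {
  const query = new URLSearchParams(params).toString();
  return encodeHTML(`/search?${query}`);
}

console.log(buildSearchHref({ a: '1', b: '2' })); // /search?a=1&amp;b=2
```

Encoding once, at the point where the value is written into the HTML, is usually the simplest way to avoid both problems.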

Online Encoder

For quick entity conversion, use DevToolBox's HTML entity encoder — supports encode/decode, named entities, numeric decimal, and hex formats.

