HTML entities are the backbone of safe web content rendering. Getting them wrong leads to broken layouts, encoding issues, and XSS vulnerabilities.
The 5 Characters You Must Always Encode
<!-- These 5 MUST be encoded in HTML content -->
& → &amp; <!-- Would be parsed as entity start -->
< → &lt; <!-- Would start a tag -->
> → &gt; <!-- Would end a tag -->
" → &quot; <!-- In quoted attributes -->
' → &#39; <!-- In single-quoted attributes -->
Named vs Numeric Entities
Three equivalent ways to write the same character:
<!-- Named entity (most readable) -->
&copy; → ©
<!-- Decimal numeric entity -->
&#169; → ©
<!-- Hex numeric entity -->
&#xA9; → ©
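The equivalence of the three forms can be checked with a short Python sketch. The helper name numeric_forms is hypothetical, introduced only for illustration; html.unescape is the standard-library decoder.

```python
import html


def numeric_forms(ch: str) -> tuple[str, str]:
    """Return the decimal and hex numeric entities for one character.

    Hypothetical helper for illustration; not part of any library.
    """
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"


decimal_entity, hex_entity = numeric_forms("©")
print(decimal_entity, hex_entity)  # &#169; &#xA9;

# The named, decimal, and hex spellings all decode to the same character.
for entity in ("&copy;", decimal_entity, hex_entity):
    assert html.unescape(entity) == "©"
```

Numeric entities work for any Unicode code point, while named entities exist only for the characters the HTML specification lists.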
Essential Entities Reference
Typography:
| Character | Entity | Name |
|-----------|--------|------|
| © | &copy; | Copyright |
| ® | &reg; | Registered |
| ™ | &trade; | Trademark |
| — | &mdash; | Em dash |
| – | &ndash; | En dash |
| … | &hellip; | Ellipsis |
| “ | &ldquo; | Left double quote |
| ” | &rdquo; | Right double quote |
| (invisible) | &nbsp; | Non-breaking space |
Symbols:
| Character | Entity |
|-----------|--------|
| × | &times; |
| ÷ | &divide; |
| ≤ | &le; |
| ≥ | &ge; |
| ≠ | &ne; |
| ∞ | &infin; |
| € | &euro; |
| £ | &pound; |
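Rather than memorizing these tables, the names can be looked up programmatically. The following is a minimal sketch using Python's html.entities module, whose codepoint2name and name2codepoint mappings cover the HTML 4 entity set; the named_entity helper is hypothetical and falls back to a hex numeric entity when no name is known.

```python
from html.entities import codepoint2name, name2codepoint


def named_entity(ch: str) -> str:
    """Return the named entity for ch, or a hex numeric entity if none exists.

    Hypothetical helper for illustration; only HTML 4 names are known here.
    """
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else f"&#x{ord(ch):X};"


print(named_entity("—"))  # &mdash;
print(named_entity("€"))  # &euro;
print(named_entity("✓"))  # &#x2713; (no HTML 4 name, so numeric fallback)

# The reverse direction: entity name to character.
print(chr(name2codepoint["infin"]))  # ∞
```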
JavaScript Encoding
// Safe DOM API (auto-encodes) - PREFERRED
element.textContent = userInput; // & < etc. handled automatically
// Manual encode (for inserting into HTML strings)
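function encodeHTML(str) {
  // & must be replaced first, otherwise the & in the other entities gets re-encoded
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
// Manual decode
function decodeHTML(html) {
  // DOMParser builds an inert document, so scripts in the input are not executed
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return doc.documentElement.textContent;
}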
// Library options
// npm install he
import he from 'he';
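// By default he.encode emits hex numeric entities; useNamedReferences switches to named ones
he.encode('<script>alert("xss")</script>', { useNamedReferences: true });
// → '&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;'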
Python Encoding
import html
# Encode
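safe = html.escape('<script>alert("xss")</script>')
# → '&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;'
# quote=True is the default: " and ' are encoded too, so the result is safe in attributes.
# Pass quote=False to leave quotes untouched in plain text content.
safe_attr = html.escape(user_input, quote=True)
# Decode
original = html.unescape('&lt;b&gt;Hello&lt;/b&gt;')
# → '<b>Hello</b>'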
PHP Encoding
// htmlspecialchars: encodes &, <, >, ", '
$safe = htmlspecialchars($user_input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
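// htmlentities: encodes every character that has a named entity (accents, symbols, etc.).
// It is no safer than htmlspecialchars and is rarely needed when the page is served as UTF-8.
$all_encoded = htmlentities($input, ENT_QUOTES, 'UTF-8');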
// Decode
$original = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
React/JSX — Auto-Escaping
React automatically escapes content in JSX:
// This is safe — React encodes user_input automatically
<div>{user_input}</div> // → &lt;script&gt; in rendered HTML
// Avoid dangerouslySetInnerHTML unless you sanitize first
import DOMPurify from 'dompurify';
<div dangerouslySetInnerHTML={{ __html: DOMPurify.sanitize(html_content) }} />
// Entities in JSX source (not in variables):
<span>Copyright &copy; 2026</span>
<span>Arrows: &larr; &rarr; &uarr; &darr;</span>
XSS Prevention Context Table
Context encoding rules:
| Location | Required encoding |
|---|---|
| HTML content | &amp; &lt; &gt; |
| HTML attribute | &amp; &lt; &gt; &quot; &#39; |
| JavaScript string | \x26 \x3c \x3e or JSON.stringify |
| CSS value | \00XX hex encoding |
| URL parameter | encodeURIComponent() |
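The table can be read as "one encoder per output context". Below is a rough Python sketch of that idea using only the standard library. The function names are hypothetical, the CSS row is left out, and the extra \u003c-style escaping in for_js_string is an assumption about inline script blocks, where a literal </script> inside a string would otherwise end the script.

```python
import html
import json
from urllib.parse import quote


def for_html_text(value: str) -> str:
    # Element content: only &, <, > need encoding.
    return html.escape(value, quote=False)


def for_html_attribute(value: str) -> str:
    # Quoted attribute values: also encode " and '.
    return html.escape(value, quote=True)


def for_js_string(value: str) -> str:
    # json.dumps yields a quoted string literal; <, >, & are then escaped
    # so the literal cannot close an inline <script> block.
    literal = json.dumps(value)
    return (literal.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))


def for_url_parameter(value: str) -> str:
    # safe="" percent-encodes every reserved character, including "/".
    return quote(value, safe="")


print(for_html_text("a < b & c"))      # a &lt; b &amp; c
print(for_js_string("</script>"))      # "\u003c/script\u003e"
print(for_url_parameter("a&b=c d"))    # a%26b%3Dc%20d
```

Choosing the encoder by where the value lands, rather than by where it came from, is what keeps the same input safe in each of these contexts.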
Common Mistakes
- Unencoded & in URLs: href="/search?a=1&b=2" → href="/search?a=1&amp;b=2"
- Double encoding: encoding already-encoded text, so &amp; becomes &amp;amp;
- innerHTML without sanitization: use DOMPurify
- &nbsp; abuse: for spacing, use CSS margin/padding instead
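The first two mistakes are easy to reproduce. This small Python sketch, using the URL from the list above as sample input, shows the correct single encoding of a URL for an href attribute and what a second, accidental pass does to it.

```python
import html

raw_url = "/search?a=1&b=2"

# Correct: encode exactly once when writing the attribute value.
href = html.escape(raw_url)
print(href)   # /search?a=1&amp;b=2

# Mistake: encoding text that is already encoded.
twice = html.escape(href)
print(twice)  # /search?a=1&amp;amp;b=2

# One decode recovers the raw URL from the correct form...
assert html.unescape(href) == raw_url
# ...but the double-encoded form still carries a visible "&amp;" after decoding.
assert html.unescape(twice) == href
```

A practical way to avoid double encoding is to keep values raw in storage and in variables, and encode only at the point of output.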
Online Encoder
For quick entity conversion, use DevToolBox's HTML entity encoder — supports encode/decode, named entities, numeric decimal, and hex formats.