Snappy Tools

Posted on Apr 30 • Originally published at snappytools.app

HTML Entities Explained: When to Encode, When to Skip, and What Goes Wrong

#security #webdev #html #beginners

HTML entities trip up developers in two situations: encoding content for display, and sanitizing content to prevent injection. These are related but not the same thing — and mixing them up is how XSS vulnerabilities happen.

This post covers what HTML entities actually are, when you must encode, when you don't need to, and what your framework is (hopefully) doing for you.

What is an HTML entity?

An HTML entity is a text representation of a character that has special meaning in HTML, or a character that can't easily be typed.

The format is either:

Named entity: &amp;, &lt;, &gt;, &quot;, &apos;
Numeric decimal: &#38;, &#60;, &#62;
Numeric hex: &#x26;, &#x3C;, &#x3E;

They all end with a semicolon. The browser decodes them back to the character when rendering.
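To make the numeric formats concrete, here is a minimal sketch that derives the decimal and hex entity for a single character from its Unicode code point. The helper name `numericEntities` is made up for illustration.

```javascript
// Hypothetical helper: build the decimal and hex numeric entities for one character.
function numericEntities(char) {
  const codePoint = char.codePointAt(0); // Unicode code point of the first character
  return {
    decimal: `&#${codePoint};`,
    hex: `&#x${codePoint.toString(16).toUpperCase()};`,
  };
}

console.log(numericEntities('<')); // { decimal: '&#60;', hex: '&#x3C;' }
console.log(numericEntities('&')); // { decimal: '&#38;', hex: '&#x26;' }
```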

The five characters you always need to encode

These five characters have special meaning in HTML. If your content contains them and you're inserting it into HTML, you must encode them:

Character	Entity	Why it matters
`&`	`&amp;`	Starts every entity, so it must be encoded first
`<`	`&lt;`	Opens a tag
`>`	`&gt;`	Closes a tag
`"`	`&quot;`	Closes a double-quoted attribute value
`'`	`&#39;` or `&apos;`	Closes a single-quoted attribute value

The order matters when encoding manually: always encode & first. If you encode it last, the & inside the entities you just produced gets encoded again, so < ends up as &amp;lt; instead of &lt;.
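The ordering problem is easy to see with two replacement chains that differ only in order. This is a small illustration; both function names are invented for the comparison.

```javascript
// Wrong order: '<' is encoded first, then the '&' pass re-encodes the new entity.
function encodeWrongOrder(str) {
  return str.replace(/</g, '&lt;').replace(/&/g, '&amp;');
}

// Right order: '&' is handled before any entity is introduced.
function encodeRightOrder(str) {
  return str.replace(/&/g, '&amp;').replace(/</g, '&lt;');
}

console.log(encodeWrongOrder('a < b & c')); // a &amp;lt; b &amp; c
console.log(encodeRightOrder('a < b & c')); // a &lt; b &amp; c
```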

When encoding matters for security: XSS

Cross-site scripting (XSS) happens when user-supplied content is rendered as HTML instead of text. A classic example:

<!-- User submits: -->
<script>alert('XSS')</script>

<!-- Page renders without encoding: -->
Hello, <script>alert('XSS')</script>!
<!-- The script runs in every visitor's browser -->

With proper encoding:

Hello, &lt;script&gt;alert('XSS')&lt;/script&gt;!
<!-- Renders as text — no execution -->

The browser sees &lt; and displays <, but doesn't treat it as a tag opener.

The attribute context

Encoding in attributes requires extra care:

<!-- User submits: "><script>alert(1)</script> -->
<!-- Inserted into an attribute without encoding the quote: -->
<input value=""><script>alert(1)</script>">

<!-- The browser sees: value="" -->
<!-- The rest is treated as markup -->

When injecting into attribute values, you must encode " and ' in addition to < and >.
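As a rough sketch of the attribute case, the helper below encodes all five characters before the value is placed inside a double-quoted attribute. `escapeAttribute` and `renderInput` are illustrative names, and the sketch assumes the attribute is always quoted.

```javascript
// Hypothetical helper: encode the five significant characters, '&' first.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Assumes a quoted attribute; unquoted attribute values need stricter handling.
function renderInput(userValue) {
  return `<input value="${escapeAttribute(userValue)}">`;
}

console.log(renderInput('"><script>alert(1)</script>'));
// <input value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">
```

With the quote encoded, the payload stays inside the attribute value instead of closing it.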

The JavaScript context

HTML encoding is not sufficient for content inside <script> tags or onclick handlers. That content is parsed as JavaScript, not HTML. Use JavaScript string escaping (or better, never inject user content directly into JS).
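If data really has to be embedded in an inline script block, one common approach is to serialize it as JSON and then escape the characters the HTML parser cares about. The sketch below assumes that approach; `serializeForScript` is a hypothetical name, and it covers inline script blocks only, not event-handler attributes such as onclick.

```javascript
// Hypothetical helper: serialize data for embedding inside an inline <script> block.
function serializeForScript(data) {
  return JSON.stringify(data)
    .replace(/</g, '\\u003c') // stops "</script>" and "<!--" from affecting the HTML parser
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028') // line separators were not allowed in string literals before ES2019
    .replace(/\u2029/g, '\\u2029');
}

const userName = '</script><script>alert(1)</script>';
const page = `<script>const name = ${serializeForScript(userName)};</script>`;
console.log(page);
```

The escaped output is still a valid JavaScript string literal, so the script receives the original value while the HTML parser never sees a closing tag inside it.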

What your framework does for you

Modern frameworks escape HTML by default in templates. If you're using React, Vue, Angular, Svelte, or any major framework, dynamic content is encoded automatically:

// React — safe by default
function Greeting({ name }) {
  return <p>Hello, {name}!</p>;
  // name is automatically HTML-escaped
}

The risk comes from bypassing the default:

// DANGEROUS — opt-in to raw HTML injection
<div dangerouslySetInnerHTML={{ __html: userContent }} />

Any time you use dangerouslySetInnerHTML, v-html, or equivalent, you are responsible for sanitizing the content yourself.

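For server-side rendering, the same rules apply. Template engines like Twig and Handlebars escape by default, and Jinja2 does so when autoescaping is enabled (Flask turns it on for HTML templates). Each has an escape hatch that bypasses escaping: the | safe filter in Jinja2, | raw in Twig, and {{{ }}} in Handlebars.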

When you don't need entities

You only need to encode when:

You're writing raw HTML (not using a framework or template engine)
You're doing server-side string interpolation into HTML output
You're using a raw/safe escape hatch in a template engine
You're working with email HTML (many email clients don't process entities reliably — be extra careful)
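For the string-interpolation case, a tagged template literal is one way to make escaping the default rather than something to remember at each call site. This is a sketch, not a replacement for a real template engine; `escapeHtml` and the `html` tag are names chosen for the example.

```javascript
// Hypothetical helper: encode the five significant characters, '&' first.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical tag: literal parts pass through, interpolated values are escaped.
function html(strings, ...values) {
  return strings.reduce(
    (out, literal, i) => out + literal + (i < values.length ? escapeHtml(values[i]) : ''),
    ''
  );
}

const comment = '<img src=x onerror=alert(1)>';
console.log(html`<p class="comment">${comment}</p>`);
// <p class="comment">&lt;img src=x onerror=alert(1)&gt;</p>
```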

You don't need entities for:

Static content inside <textarea>: < and > are literal text there (except a closing </textarea>, so untrusted content still needs encoding)
JSON API responses — JSON is not HTML
JavaScript strings that aren't injected into HTML
CSS — different parser entirely

Common entities beyond the core five

A few you'll encounter regularly:

Character	Entity	Use case
Non-breaking space	`&nbsp;`	Prevents line break between words
Em dash —	`&mdash;`	Typography in HTML content
En dash –	`&ndash;`	Date ranges, scores
©	`&copy;`	Copyright notice
®	`&reg;`	Registered trademark
™	`&trade;`	Trademark
→	`&rarr;`	Arrows in UI copy
“ ”	`&ldquo;` `&rdquo;`	Typographic quote marks

In practice, you can use UTF-8 characters directly in modern HTML (as long as you declare <meta charset="utf-8">). Named entities are more of a legacy requirement — they were necessary when character encoding was inconsistent.

Encoding and decoding in JavaScript

// Encode — no built-in, DIY or use a library
function htmlEncode(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Decode — can use the DOM
function htmlDecode(str) {
  const txt = document.createElement('textarea');
  txt.innerHTML = str;
  return txt.value;
}

The DOM decode trick works because textarea.innerHTML parses entities, and .value returns the decoded text. It's safe for decoding — the <textarea> doesn't execute scripts.
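Outside the browser there is no document to lean on. The sketch below is a different, DOM-free technique: a deliberately small decoder that handles only the five core named entities plus numeric references. The names `NAMED_ENTITIES` and `decodeBasicEntities` are invented here; for the full named-entity table a dedicated library is the safer choice.

```javascript
// Only the five core named entities are handled in this sketch.
const NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

function decodeBasicEntities(str) {
  return str.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    // Unknown names are left as-is.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

console.log(decodeBasicEntities('&lt;p&gt;Tom &amp; Jerry&#39;s &#x22;show&#x22;&lt;/p&gt;'));
// <p>Tom & Jerry's "show"</p>
```

Because the replacement runs in a single pass, `&amp;lt;` decodes to `&lt;` rather than all the way to `<`.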

Quick reference

If you need to encode or decode HTML entities without writing code, SnappyTools HTML Entity Encoder handles both directions — encode raw text to entities, or decode entities back to characters.

Summary

Encode &, <, >, ", ' when inserting user content into HTML
Modern frameworks do this automatically — be careful when bypassing them
HTML encoding is not enough for JavaScript contexts
&nbsp; and typographic entities are useful but optional in UTF-8 documents
When in doubt, encode — double-encoding is ugly but not a security hole; missing encoding can be one
