Ken Bellows

Posted on Mar 24, 2019 • Edited on Dec 11, 2021

Stop using so many divs! An intro to semantic HTML

#html #webdev #beginners #semantic

Divs are played out

We all love our <div> tags. They've been around for decades, and for decades they've been the go-to element when you need to wrap some stuff in a block for styling or structural purposes. It's still very common to look through production websites and see stuff like this:



<div class="container" id="header">
    <div class="header header-main">Super duper best blog ever</div>
    <div class="site-navigation">
        <a href="/">Home</a>
        <a href="/about">About</a>
        <a href="/archive">Archive</a>
    </div>
</div>
<div class="container" id="main">
    <div class="article-header-level-1">
        Why you should buy more cheeses than you currently do
    </div>
    <div class="article-content">
        <div class="article-section">
            <div class="article-header-level-2">
                Part 1: Variety is spicy
            </div>
            <!-- cheesy content -->
        </div>
        <div class="article-section">
            <div class="article-header-level-2">
                Part 2: Cows are great
            </div>
            <!-- more cheesy content -->
        </div>
    </div>
</div>
<div class="container" id="footer">
    Contact us!
    <div class="contact-info">
        <p class="email">
            <a href="mailto:us@example.com">us@example.com</a>
        </p>
        <div class="street-address">
            <p>123 Main St., Suite 404</p>
            <p>Yourtown, AK, 12345</p>
            <p>United States of America</p>
        </div>
    </div>
</div>

Hoo, that's a lot of <div>s. And hey, it works. I mean, mostly. It has the structure you need, and I'm sure it'll look the way you intend by the time you're done styling it. But it has some big problems:

Accessibility - Many a11y tools are pretty smart, and try their best to parse the structure of a page to help guide users through it in the way the page's author intends, and to give users easy jump points to navigate quickly to the section of the page they care about. But <div>s don't really impart any useful info about the structure of a document. The smartest a11y tool in the world still isn't a human, and can't be expected to parse class and id attributes and recognize all the weird and wild ways that devs all over the world name their blocks. I can recognize that class="article-header-level-2" is a subheading, but a robot can't. (And if it can, get it out of my computer, I'm not ready for the AGI revolution just yet.)
Readability - To read this code, you need to carefully scan for the class names, picking them out from between the <div class="..."></div> boilerplate. And once you're a few levels deep in the markup, it becomes tricky to keep track of which </div> closing tags go with which <div...> opening tags. You start to rely very heavily on IDE features like coloring different indentation levels or highlighting the matching tag for you to keep track of where you are, and in larger documents it can require a lot of scrolling on top of those features.
Consistency and standards - It can be frustrating to start a new job or move to a new project and have to learn from scratch all the crazy markup conventions used across the codebase. If everyone had a standardized way to mark up common structures in web documents, it would be much easier to skim an HTML file in an unfamiliar codebase and get a quick handle on what it's supposed to represent. If only there was such a standard...

HTML5: Such a standard

HTML5 is not new. That's an understatement; an initial working draft was released for public comment in January of 2008 (11 years ago!), and it became a full-fledged W3C recommendation in October of 2014, 4½ years ago. So, like, it's been around for a while.

One of the primary advancements of HTML5 was introducing a standardized set of semantic elements. The term "semantic" refers to the meaning of a word or a thing, so "semantic elements" are elements designed to mark up the structure of a document in a more meaningful way, a way that makes it clear what they're for, what purpose they serve in the document. And importantly, since they're standardized, these elements define the document in a way that everyone can use and understand, robots included.

I think the HTML5 spec itself sums up the issue nicely in a note under the definition of the <div> element:

NOTE:
Authors are strongly encouraged to view the div element as an element of last resort, for when no other element is suitable. Use of more appropriate elements instead of the div element leads to better accessibility for readers and easier maintainability for authors.
— https://www.w3.org/TR/html5/grouping-content.html#the-div-element

I'll divide the semantic block elements into two categories: primary structure and content indicators. These aren't standard terms or anything; I just made them up for this article. But I think the distinction is useful enough. 🤷‍♂️

Primary Structures

There's a super common pattern that can be found in websites, tutorials, and even CSS libraries all over the internet, and for good reason. We often divide a page at its topmost level into three regions: header, main, and footer, then divide those regions into sections as needed. I included this in my example above to prove the point:



<div class="container" id="header">...</div>
<div class="container" id="main">
    ...
    <div class="article-section">...</div>
    ...
</div>
<div class="container" id="footer">...</div>

I've seen (and used) this pattern for decades, and it makes a ton of sense to structure a document this way, both for readability of the HTML and for easier styling of the page in CSS. Separate header and footer regions also make partial templates in something like PHP or Rails/ERB a ton easier to work with, as you can include common header and footer partials all over the site:



<?php include 'header.php'; ?>

<div id="main">...</div>

<?php include 'footer.php'; ?>

So here's the thing: everyone agrees that this is a nice pattern to follow. And that includes the folks at the WHATWG and W3C, who standardized the pattern into four new elements in HTML5 with very clear names: <header>, <main>, <footer>, and <section>.

Bookends: `<header>` and `<footer>`

The <header> and <footer> elements are basically twins: they're very similarly defined in the spec and follow the same set of rules about where they're allowed to be used, with the only difference being their semantic purposes: headers go at the beginnings of things, footers go at the ends of things. And by "things", I mean more than just the <body> of your page: this pair of elements is designed to be used within any part of your document that represents a chunk of content with a clear beginning and end. This can include things like forms, articles, sections of articles, posts on a social media site, cards, etc.

Headers and footers are attached semantically to the closest "sectioning root" or "sectioning content" element. These are things like <body>, <blockquote>, <section>, <td>, <aside>, and lots of others; the spec has the full lists if you want them. Assistive technologies can use these elements and others to generate an outline of a document, which can help users navigate it more easily. You shouldn't have more than one <header> or <footer> per sectioning root/content. (One of each is fine, but not two of the same.)

As a final note, <header>s very often hold the heading element (<h1>-<h6>) for their context. This is not necessary, but can help to group other related elements with the heading, like links, images, or subheadings, and can help maintain a consistent structure even when the heading is the only element in the <header>.
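To make that scoping a bit more concrete, here's a rough sketch of a page-level <header> and <footer> living alongside a <section> that has its own pair. The "Cheese of the week" section and its contents are invented purely for illustration; they aren't part of the running example.

```html
<body>
    <!-- Page-level header: attached to <body>, the sectioning root -->
    <header>
        <h1>Super duper best blog ever</h1>
    </header>

    <section>
        <!-- This header is attached to the <section>, not to the page -->
        <header>
            <h2>Cheese of the week</h2>
            <p>A hypothetical subheading grouped with the heading</p>
        </header>
        <p>...</p>
        <!-- Likewise, this footer closes out only the <section> -->
        <footer>
            <p>Last updated: ...</p>
        </footer>
    </section>

    <!-- Page-level footer: attached to <body> again -->
    <footer>
        <p>Contact us!</p>
    </footer>
</body>
```

Each sectioning element gets at most one <header> and one <footer>, but the page as a whole can contain several of each.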

The good stuff: `<main>`

The third primary region element, <main>, is special. The spec says two very important things about <main>:

The main content area of a document includes content that is unique to that document and excludes content that is repeated across a set of documents such as site navigation links, copyright information, site logos and banners and search forms (unless the document or application’s main function is that of a search form).
— https://www.w3.org/TR/html5/grouping-content.html#elementdef-main

So <main> is where you put the good stuff, the important parts of a page, the reason the user came to this page in particular, not your site in general. In other words, the main content. 😯😲🤯

All that other stuff, logos and search forms and navigation and such, can go in a <header> or <footer> within the <body> but outside of <main>.

There must not be more than one visible main element in a document. If more than one main element is present in a document, all other instances must be hidden using the hidden attribute.
— https://www.w3.org/TR/html5/grouping-content.html#elementdef-main

This is pretty unique. Unlike <header> and <footer> (and most other block elements), <main> can't be used all over the page within arbitrary sectioning content; it should be used once and only once. Or rather, it can be used multiple times in a document, but only one <main> element should be visible at a time; all others must be hidden with the hidden attribute, which basically acts like display: none; in CSS. If you think about it, this suggests a pretty useful pattern for preloading views in an app: create a new <main hidden>, fetch some content that the user is likely to view next (e.g., the next article in a series, the next slide in a slideshow, etc.), and when the user clicks the link/button to load that view, swap out the current <main> with the preloaded one by toggling the hidden attribute on both.
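Here's a minimal sketch of that swapping idea. The ids (current-view, next-view, next-button) are made up for this example, and the "preloaded" content is simply written into the page rather than fetched, so treat it as an illustration of the hidden toggle and not a full preloading setup.

```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>Preloaded views (sketch)</title>
</head>
<body>
    <!-- The view the user is currently reading -->
    <main id="current-view">
        <h1>Part 1: Variety is spicy</h1>
        <button type="button" id="next-button">Next part</button>
    </main>

    <!-- A second view, kept out of sight with `hidden` until it's needed -->
    <main id="next-view" hidden>
        <h1>Part 2: Cows are great</h1>
    </main>

    <script>
        const currentView = document.getElementById('current-view');
        const nextView = document.getElementById('next-view');

        // Flip `hidden` on both elements so that only one <main>
        // is ever visible at a time.
        document.getElementById('next-button').addEventListener('click', () => {
            currentView.hidden = true;
            nextView.hidden = false;
        });
    </script>
</body>
</html>
```

In a real app you would presumably fill the hidden <main> with fetched content before the user asks for it; the toggle itself stays the same.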

Before continuing, let's pause and review our original example from the top of the article. Here's how it would look if we used <header>, <main>, and <footer> for the main structure of the page:



<header>
    <h1>Super duper best blog ever</h1>
    ...
</header>
<main>
    <h2>Why you should buy more cheeses than you currently do</h2>
    ...
</main>
<footer>
    Contact us!
    <div class="contact-info">us@example.com</div>
</footer>

That's so much nicer already! But there's still plenty of work to do.

Break it down: `<section>`

So we've got a basic outline for our page: a header, a footer, and a main content region. Now it's time to add some of that sweet, sweet content.

Typically you'll want to break your content down into sections, especially for mass text content like this article, because no one likes reading impenetrable walls of text.

[Image: a wall of text. Caption: "Nobody likes a wall of text"]

That's where <section> comes in. This one is the simplest in terms of rules: structurally speaking, it's basically just a <div> with special semantic meaning. A <section> begins a new "sectioning content" region, so it can have its own <header> and/or <footer>.

What's the difference, then, between a <section> and a regular old <div>, and when should you use each? Well, allow me to quote the spec once again:

NOTE:
The <section> element is not a generic container element. When an element is needed only for styling purposes or as a convenience for scripting, authors are encouraged to use the <div> element instead. A general rule is that the <section> element is appropriate only if the element’s contents would be listed explicitly in the document’s outline.
— https://www.w3.org/TR/html5/sections.html#the-section-element

You know, as a quick aside, the HTML5 spec is actually pretty readable. It's one of the more readable specs out there. Every time I glance at it for a quick answer, I inevitably learn something unexpected and useful, especially if I start clicking links. Give it a try some time!

So in short, if you would list this portion of the document in the table of contents, use a <section>. If not, use a <div> or something else.
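As a quick sketch of that rule of thumb (the two-columns class is a hypothetical styling hook, not something from the example above):

```html
<main>
    <!-- This part would show up in a table of contents, so it's a <section> -->
    <section>
        <h2>Part 1: Variety is spicy</h2>

        <!-- This wrapper exists only so CSS can lay out two columns,
             so a plain <div> is the right tool -->
        <div class="two-columns">
            <p>...</p>
            <p>...</p>
        </div>
    </section>
</main>
```

The <section> says something about the document; the <div> only says something about the layout.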

Content Indicators

Okay, so we've got a solid structure for our page. Instead of just slinging <div>s all over, we've explicitly marked the main content region of the page, and we've called out headers, footers, and sections. But there's definitely more semantics than that to our document.

Let's talk about a few of the elements added in HTML5 that communicate content semantics rather than structure.

The whole shebang: `<article>`

The <article> element is used to represent a fully self-contained region of content, something that could be plucked out of your page and dropped into another and still make sense on its own. This might be a literal article or blog post, but could also be used for a social media post like a tweet or a Facebook wall post.

The HTML5 spec recommends that every article have a heading that identifies it, ideally using a heading element (<h1>-<h6>). An <article> can also have <header>, <footer>, and <section> elements, so you really could use it to embed a full document fragment with all the structure it needs within another page.
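For instance, a feed of social media posts might look roughly like this. The feed, the post text, and the /posts/1 URL are all invented for illustration.

```html
<main>
    <h1>Cheese feed</h1>

    <!-- Each post makes sense on its own, so each one is an <article> -->
    <article>
        <header>
            <h2>Just discovered Gruyère</h2>
            <p>Posted by a hypothetical user</p>
        </header>
        <p>Where has this been all my life?</p>
        <footer>
            <a href="/posts/1">Permalink</a>
        </footer>
    </article>

    <article>
        <header>
            <h2>Hot take: cheddar is underrated</h2>
        </header>
        <p>...</p>
    </article>
</main>
```

Either <article> could be lifted out and dropped into another page, heading and all, and it would still make sense.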

To return to the example from way up above, let's rewrite the class="article-*" elements using an <article> and some of the other elements we've discussed.



<article>
    <header>
        <h1>Why you should buy more cheeses than you currently do</h1>
    </header>
    <section>
        <header>
            <h2>Part 1: Variety is spicy</h2>
        </header>
        <!-- cheesy content -->
    </section>
    <section>
        <header>
            <h2>Part 2: Cows are great</h2>
        </header>
        <!-- more cheesy content -->
    </section>
</article>

Isn't that a ton more readable than the original? And again, not only is it easier to read, it's way more useful for assistive tech; robots can't always figure out your specific class name pattern, but they can follow this structure.

Getting around: `<nav>`

This element is a bit more well-known than others. <nav> is designed to clearly identify the main navigation blocks on the page, the groups of links that help the user find their way around the rest of the site (e.g. a site map or list of links in the header) or the current page (e.g. a table of contents).

In our example up top, let's apply a <nav> to that group of links in the header.



<nav>
    <a href="/">Home</a>
    <a href="/about">About</a>
    <a href="/archive">Archive</a>
</nav>

Doesn't change the structure at all, but you know what it is at a glance rather than needing to read and process the class name on a <div> to figure it out, and more importantly the robots can find it too.
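The same element works for navigation within the current page. Here's a rough sketch of a table of contents for our cheese article; the part-1 and part-2 ids are assumptions added for this example so the links have something to point at.

```html
<article>
    <h1>Why you should buy more cheeses than you currently do</h1>

    <!-- In-page navigation: each link targets the id of a section below -->
    <nav>
        <a href="#part-1">Part 1: Variety is spicy</a>
        <a href="#part-2">Part 2: Cows are great</a>
    </nav>

    <section id="part-1">
        <h2>Part 1: Variety is spicy</h2>
        <!-- cheesy content -->
    </section>
    <section id="part-2">
        <h2>Part 2: Cows are great</h2>
        <!-- more cheesy content -->
    </section>
</article>
```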

Getting in touch: `<address>`

The last element we'll discuss is <address>. This element is intended to call out contact info, and it's often used in the main page <footer> to mark up the mailing address, phone number, customer service email address, etc. for a business.

Interestingly, the rules for how to mark up the content within an <address> element are left open. The spec mentions that there are several other specs that address this, and it probably is outside the scope of HTML itself to provide that level of granularity.

A common solution is RDFa, also a W3C spec, which uses attributes on tags to label the different components of data. Here's what the footer from our example might look like when marked up with <address> elements and RDFa:



<footer>
    <section class="contact" vocab="http://schema.org/" typeof="LocalBusiness">
        <h2>Contact us!</h2>
        <address property="email">
            <a href="mailto:us@example.com">us@example.com</a>
        </address>
        <address property="address" typeof="PostalAddress">
            <p property="streetAddress">123 Main St., Suite 404</p>
            <p>
                <span property="addressLocality">Yourtown</span>,
                <span property="addressRegion">AK</span>,
                <span property="postalCode">12345</span>   
            </p>
            <p property="addressCountry">United States of America</p>
        </address>
    </section>
</footer>

RDFa is admittedly a bit verbose, but it's pretty handy for marking up data. If you're interested in learning more about RDFa, here's a few links:

Conclusion

Okay, we've covered a lot, and we've seen a lot of it applied to our example in bits and pieces. But let's put it all together and see what it looks like.



<header>
    <h1>Super duper best blog ever</h1>
    <nav>
        <a href="/">Home</a>
        <a href="/about">About</a>
        <a href="/archive">Archive</a>
    </nav>
</header>
<main>
    <article>
        <header>
            <h1>Why you should buy more cheeses than you currently do</h1>
        </header>
        <section>
            <header>
                <h2>Part 1: Variety is spicy</h2>
            </header>
            <!-- cheesy content -->
        </section>
        <section>
            <header>
                <h2>Part 2: Cows are great</h2>
            </header>
            <!-- more cheesy content -->
        </section>
    </article>
</main>
<footer>
    <section class="contact" vocab="http://schema.org/" typeof="LocalBusiness">
        <h2>Contact us!</h2>
        <address property="email">
            <a href="mailto:us@example.com">us@example.com</a>
        </address>
        <address property="address" typeof="PostalAddress">
            <p property="streetAddress">123 Main St., Suite 404</p>
            <p>
                <span property="addressLocality">Yourtown</span>,
                <span property="addressRegion">AK</span>,
                <span property="postalCode">12345</span>   
            </p>
            <p property="addressCountry">United States of America</p>
        </address>
    </section>
</footer>

If you ask me, that's 100x more readable than the original example, and it's going to be 100x more effective for SEO and accessibility purposes, too.

These are by no means the only semantic elements in HTML. There are lots of additional elements that help to tag and structure your text content, embedded media, etc. Here are a few to check out if you're enjoying this and want to dig deeper. You might recognize a few:

And that's just a start! Like I said, when you start reading the HTML spec, it's tough to stop. It's an incredibly rich language, and I think people underestimate it a little too often.

Oldest comments (128)

Wes • Mar 24 '19

This is beautiful 😍😍🤩🤩. I’m converting my apps to use semantic html from now on. Soooo much more readable and accessible. #html #javascript

Mihail Malo • Mar 24 '19 • Edited

#hashtag

Ben Halpern • Mar 24 '19

Fab post

Oyebanji Jacob • Mar 24 '19

Awesome post!

you have a dangling </div> in the last HTML code block. :)

Ken Bellows • Mar 24 '19

Oh no! 😫 Hahaha, thanks for pointing that out! Fixed it

PauGuillamon • Mar 24 '19

I agree with everyone, very nice and useful post! Especially for me, since I'm not a web developer but I sometimes do side and personal projects that involve web development!

As a side note, there's still one extra </div> in the code sample in The whole shebang: <article> section ;)

Ken Bellows • Mar 24 '19

Thanks, fixed!

programmerabhi • Apr 5 '20

Which tag should I use to display the code?

Ken Bellows • Apr 6 '20 • Edited

For semantics purposes, use the <code> tag. For display purposes, to make it look like code, it depends:

If it's an inline code snippet in the middle of a sentence, you can just use the <code> tag:
The <code>while(condition)</code> loop is useful for loops with an unknown number of iterations
If it's an independent block of code, possibly with multiple lines, and you don't need syntax highlighting, you can wrap your <code> tag in the "preformatted text" tag, <pre>:

Below is an example of a while loop used to walk a graph of nodes:
<pre><code>
const queue = [rootNode]
const prev = new Map()
const visited = new Set()
while (queue.length > 0)  {
  const node = queue.shift()
  for (const child of node.children) {
    if (visited.has(child)) {
      continue
    }
    visited.add(child)
    prev.set(child, node)
    queue.push(child)
  }
}
</code></pre>

If you are worried about syntax highlighting, you should still wrap the whole thing in a <code> tag, but you're probably going to want to bring in a library that does the very hard work of parsing and highlighting your code, like Prism or highlight.js.

Bottom line, though, no matter what extra stuff you're doing for display purposes, code should always be wrapped in a <code> tag.

Michael Caveney • Mar 24 '19

Very nice work, thanks for taking the time to write this up!

Isa Levine • Mar 24 '19

thanks for writing this! i'm just starting out with html and i totally thought the convention was to throw divs in everywhere--glad to have these other options, especially early on!

Tygari • Mar 24 '19 • Edited

Great post.
I've been doing this for over a year and love it.

I feel you should have given some mention to Custom Tags. They help so much in pushing this to the next level.

PS-- Content Indicators has a hanging < /div>

Ken Bellows • Mar 24 '19

Thanks, fixed!

By custom tags, do you mean custom elements? Those are a whole other ball of wax. They're part of the Web Components API, which requires JavaScript and a ton of extra domain knowledge. This article is just an introduction to an important feature of HTML5, and Web Components are not a part of HTML5, so that would fall pretty far outside the scope of this article.

To be honest, I'm pretty unfamiliar with custom elements myself, so I'm not the person to write that article. Sounds like you have some background, so maybe you are! I'd love to read it!

Tygari • Mar 26 '19

Yes custom elements are what I mean.

Though there is far more to it than I know or understand (for now). In its simplest form you can just write a tag as <box></box> (or any word you choose) and be able to target that tag as you would any other.

JavaScript:
document.getElementsByTagName("box")

CSS:
box{display: none;}

All without understanding anything else of the API.

The greatest thing about a custom tag is it is blank.
No pre-attached css or code like 'p' or 'div' have. You're free to write without having to remember what has what attached.

I use this kind of coding extensively in my Browser Game "EVO Idle". As well as some new Dynamics to manipulate information quickly and easily. (Be warned some are considered against the standard.)

Ken Bellows • Mar 26 '19

Right, that's the thing. Those aren't so much "custom elements" as they are "nonstandard elements", because the browser doesn't understand them, and they don't have any functionality backing them. It is true that you can write arbitrary tags in your markup, and IIRC the browser treats them like <span>s by default (which is to say, as generic inline elements), but this isn't standard practice, and it's usually considered a bad idea.

A custom tag name indicates to other developers that there's something special about this tag, that it's a component or a proper Custom Element with some JS behind it or something, and it can be very confusing to read markup with lots of nonstandard elements that aren't backed by any other code, especially if they're mixed with actual components that are backed by code. So my recommendation is to use a standard element with a class="..." instead of a custom tag name. The only thing that changes in your selector is an extra . before what would have been the element name, and is now a class name.

Just my two cents, take it or leave it 😊

more-urgent-jest • Mar 26 '19

html.spec.whatwg.org/#custom-elements

Ken Bellows • Mar 26 '19 • Edited

Yep, that's a nice summary! They distinguish there between "non-standard elements", arbitrary tags that have not been added to the CustomElementRegistry, and "custom elements", which have:

Custom elements provide a way for authors to build their own fully-featured DOM elements. Although authors could always use non-standard elements in their documents, with application-specific behavior added after the fact by scripting or similar, such elements have historically been non-conforming and not very functional. By defining a custom element, authors can inform the parser how to properly construct an element and how elements of that class should react to changes.

(My emphasis added.)

The word "defining" is linked to the portion of the standard that describes how to define a custom element, which begins as follows:

Element definition is a process of adding a custom element definition to the CustomElementRegistry. This is accomplished by the define() method.

Tygari • Mar 29 '19

I disagree. If people wish to make assumptions about everything then it is their own fault for messing up.

A simple Google check will verify if a tag is standard or custom.

Using custom tags with or without javascript makes the document more readable. Which is the point you were making.

Not everything needs to be predefined.
If it were, the language would never have evolved in the first place.

It is people pushing out of the standard practice that evolve the language.

If you wish to be confined by such limitations that is your choice.
I choose to evolve my coding style by trying new things and creating new concepts. Even in the face of the people's backward concepts of what is and is not proper coding.

Ken Bellows • Mar 30 '19 • Edited

Alright, please read what I have to say here in full. I need to say something that I feel is very important.

I fully respect that you have a different opinion from me on how best to write your markup. And I'm perfectly cool with that; you can write your HTML in whatever way you see fit, and I'll be happy to hear how it works for you, and what benefits you find in it. Genuinely, reading about alternative viewpoints on web development is one of my hobbies, and I almost always find something I like in every one I explore.

But please do not turn around and label my attempts to explain the existing standards and best practices of the web platform as "backward concepts of what is and is not proper coding". Because these are not some arbitrary whims handed down from some oligarchic hierarchy of web gods. These are standards with a huge amount, literally decades, of research, community-wide debate, and iterative revisions behind them, and there are very good reasons why they exist.

It honestly hurts to be accused of trying to "confine" and "limit" people from "trying new things and creating new concepts" simply because I'm explaining the background and advantages of the specs that are out there. My point was never that nonstandard tags are evil and you should feel bad for using them. But you started the conversation by suggesting that I should have promoted nonstandard elements as a good practice that "help[s] so much in pushing this to the next level." The fact that you said that nonstandard tags can build on the techniques that I discussed in my article tells me that I did a poor job of emphasizing the reasons why we use standardized semantic tags. Semantic tags are not primarily about improving code readability. If that was the case, there would have been no point in defining a spec for them; we could just standardize the use of arbitrary tag names and let common patterns develop within the community, like we have with CSS class names.

My point in this thread was to let you know that there is a very real difference between arbitrary nonstandard elements, which have no defined semantics or behavior and can't be used by assistive tech or web crawlers, and true custom elements, for which the developer has explicitly defined the behavior and semantics for the browser.

Because here's the thing: the semantic web isn't just a matter of preference or style or convenience. It has a huge direct impact on the lives of many, many users, those who rely on assistive technology, which in turn relies on the semantics it can parse from the text to help those users.

If you've never tried to use the web with a screen reader before, please do. I think every web developer needs to do this periodically in order to better understand how many of their users interact with the web, and how honestly horrible a lot of the web can be for users who rely on assistive tech. If your site is built with nonstandard elements with no defined semantics, then the best that a screen reader can do is read the text top-to-bottom, with no way to let the user easily navigate the page. But if you use the elements I talked about in this article, a screen reader can build an outline of the page to give to the user, and it makes it a hundred times easier to navigate the page.

And microdata specs like RDFa help fill in the rest of the semantics that aren't expressible in HTML alone. Seriously, browse a little through schema.org/docs/full.html and look at all the options. And that's all stuff that assistive tools can potentially utilize to give users more context about what the page represents. (And by the way, it can dramatically help your SEO on top of this.)

In my experience, there's a tragic lack of attention paid to semantics in web development training, and this knowledge gap actively hurts the users that actually need it. That's a big part of why I wrote this article. Using semantic HTML and microdata formats improves the lives of many people much more directly than you might expect. Nonstandard tags, unfortunately, don’t, and I worry they may redirect devs away from the standard methods that do because nonstandard tags require fewer characters and zero research.

Maybe I should have emphasized the a11y aspect of semantics more strongly in my article, and maybe I'll write a follow-up to do just that. But please, please don't think that I or anyone else is trying to enforce some arbitrary restrictions that stifle innovation by recommending that devs avoid nonstandard elements and use existing semantics frameworks instead. What I'm trying to do is help one of the most underserved and ignored groups of users on the web.