Let’s Talk About HTML Hierarchy
Jennifer Lyn Parsons Nov 1 '17
Okay, this is our meat and potatoes, right? HTML is the backbone of how we put a website together. At some point, no matter what framework, JS, Ruby, etc. fanciness we’re using to build a site, we’re going to spit HTML out at the end to be ingested by some sort of browser.
We don’t know what kind of browser will be reading our code. We only know that it will be able to read it. Firefox, Chrome, Internet Explorer and Safari are usually what pop into people’s heads when you say the word “browser”, whether on desktop or mobile, but there are many other browsers out there. In particular, the ones that get forgotten most often are screen readers, the browsers that make the web accessible to people with visual limitations of one form or another.
In addition to being incredible tools that give the visually impaired access to the internet, these limited browsers also demonstrate the power and importance of using semantic elements in our HTML structures.
Frankly, we should all be more concerned about accessibility (myself included) but even if you’re not for some reason, this information is useful to everyone for another reason: this is also how search engines read your page. Want better page rankings? Using HTML properly will help you out. The bots will be able to determine what the most important information is on the page. This, in turn, tells the search engine what your page is about and whether to serve your link up higher on the results page when someone searches for something related to your page.
There is a third important reason to write your HTML semantically: it makes it easier to write CSS and you’ll have to write less of it. I’ll get into that at another time, but suffice it to say that there is magic sauce in the mix between semantic HTML and the power of the cascade.
So, with all that said, what does it mean to write “semantic” HTML? It means that the content that we’re adding to our page has context. It has hierarchy, levels of importance that help tell the browsers what to emphasize and what to deemphasize. It helps describe the data that we’re displaying on the page so our users (don’t forget the users!) don’t have to think too hard about what they’re looking at. We want them to intuitively understand our page as much as possible.
Okay, perhaps an example here would be helpful. Take a look at this chunk of information:
And now this one:
Yes, you may have parsed out that this bit of the Wikipedia article (Drabble - Wikipedia) is about the Drabble fiction form, but what was less clear was that “Criteria” is a subsection of “55 Fiction”. Both examples above are raw HTML rendered in Firefox, by the way. No CSS or anything fancy other than setting a width for the body so I could take a nice screenshot.
In the first example, all the chunks of information are wrapped in p tags. In the second, we have a structure that looks like this:
<h1></h1>
<p></p>
<h2></h2>
<p></p>
<h2></h2>
<p></p>
<h3></h3>
<p></p>
With some indenting to clarify the hierarchy, this is really a page outline, and could be thought of like this:
<h1>Title</h1>
  <p>information related to the title</p>
  <h2>Subtitle</h2>
    <p>information related to the subtitle</p>
  <h2>Subtitle</h2>
    <p>information related to the subtitle</p>
    <h3>Sub Subtitle</h3>
      <p>information related to the subsubtitle</p>
The only reason you can discern the headlines in the first example is that they are short and sit in separate p tags, which by default in Firefox have margin around them.
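For comparison, the all-paragraph version might look something like the sketch below. The text is placeholder content rather than the actual Wikipedia copy; the point is only that every chunk, headline or not, gets the same tag.

```html
<!-- Hypothetical sketch: every chunk is a paragraph, so nothing marks a headline -->
<p>Title</p>
<p>information related to the title</p>
<p>Subtitle</p>
<p>information related to the subtitle</p>
<p>Subtitle</p>
<p>information related to the subtitle</p>
<p>Sub Subtitle</p>
<p>information related to the subsubtitle</p>
```

Nothing in this markup says which lines are headlines or how they nest; a reader can only guess from length and spacing.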
Here are two slightly more extreme examples for you. The first uses only div elements, which are rendered by default as blocks similar to p tags, but without any margins. They are “styleless” by default:
The second contains only span tags, which are inline elements really meant for styling and grouping data with similar attributes, such as phrases that are in a different language:
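To make the two extreme cases concrete, here is a rough sketch of each using placeholder text (not the real article copy). The last snippet shows the kind of job span is actually meant for; the French phrase is just an illustrative assumption.

```html
<!-- div only: each chunk is a block, but with no margins and no meaning -->
<div>Title</div>
<div>information related to the title</div>
<div>Subtitle</div>
<div>information related to the subtitle</div>

<!-- span only: inline elements, so all of the text runs together -->
<span>Title</span>
<span>information related to the title</span>
<span>Subtitle</span>
<span>information related to the subtitle</span>

<!-- what span is for: marking a phrase inside a sentence, here a change of language -->
<p>She greeted us with a cheerful <span lang="fr">bonjour</span>.</p>
```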
Yeah, things are starting to get hard to parse now, right? And yet, in both these examples, there is the same number of HTML tags, wrapping the information in exactly the same way as in the first two examples.
It has been argued to me that tags like article, etc. only encode style. In theory, this may be true. Tags are just an abstract way of displaying information in organized chunks and their names don’t matter. If we had collectively decided to identify semantics with class names, div.h1 would have the same importance as h1. However, this is not the paradigm we work in, and so h1 still holds more semantic weight in the page hierarchy, and browsers are engineered with stylistic defaults to match that hierarchy.
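A quick sketch of that difference, assuming a hypothetical h1 class name: with enough CSS both lines below can be made to look identical, but only the second is treated as a top-level heading by default.

```html
<!-- A class name is only a hook for CSS; on its own it tells the browser nothing about hierarchy -->
<div class="h1">Drabble</div>

<!-- The element itself carries the meaning: this is the page's top-level heading -->
<h1>Drabble</h1>
```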
Semantic HTML goes beyond just headers and paragraphs. Elements such as footer, along with a few other special tags, all help not only organize your content and make it easier to style, but also make your HTML make more sense both to the browser and to the user reading your page. While they don’t have any default styling associated with them, once again, screen readers and SEO concerns benefit greatly from proper usage of these tags.
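As a rough illustration, a page skeleton built from these structural elements might look like the sketch below. The particular mix of elements, the link targets, and the placeholder text are assumptions for the sake of the example, not a required layout.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Page title</title>
</head>
<body>
  <header>
    <!-- site-wide banner: logo, site name, primary navigation -->
    <nav>
      <a href="/">Home</a>
      <a href="/about">About</a>
    </nav>
  </header>

  <main>
    <article>
      <h1>Title</h1>
      <p>information related to the title</p>

      <section>
        <h2>Subtitle</h2>
        <p>information related to the subtitle</p>
      </section>
    </article>

    <aside>
      <!-- tangentially related content, such as related links -->
    </aside>
  </main>

  <footer>
    <!-- copyright, contact details, secondary links -->
  </footer>
</body>
</html>
```

Each region is named for what it contains rather than how it looks, so a screen reader or a bot can find the main content without relying on any styling.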
I hope this has helped clarify a few things about how the HTML hierarchy works and how you can use it to provide your users with a better experience. To help you along the way, I’ll be posting a follow-up with some tips for writing better code that outlines some common mistakes I’ve seen and more thoughts on writing cleaner, more performant HTML.
Some final notes and resources:
I highly recommend reading this article for a brief but thorough overview on screen reader capabilities and how semantics can improve the experience for the visually impaired:
WebAIM: Designing for Screen Reader Compatibility
Want to go more in depth in understanding semantic HTML? This is a great resource:
Semantic HTML Tutorial | HTML & CSS Is Hard
For a more nitty-gritty overview of the various tags mentioned here, as well as plenty of others, there are few better resources than MDN. Their list of tags conveniently breaks them down by intended usage:
HTML element reference - HTML | MDN
A few updates:
Too often framework users make everything a div. That's bad for users, search engines, and even devs making changes!
As I thought a bit further about the topic, I also thought of a few more use cases you may not have considered:
• Print/PDF versions of your page (and EPUB if you do that kind of thing) are much easier to generate when a proper page hierarchy is used.
• RSS readers ingest the HTML sent over without any styling attached. If you’d like your headlines and general page structure to translate into other interfaces with clarity, semantic HTML is the way to go.