Tom Streeter

Posted on Aug 12, 2020

Semantics: Meaning from Markup

#html #semantics #beginners

The Power of Patterns

Thinking is hard. It takes effort. It takes energy. It takes time. Many species have developed the ability to recognize patterns to minimize the effort required to accomplish tasks. What humans can do that many other species can't is extend this ability to abstract symbols.

This figure illustrates what recognizing patterns can do. We really don't need to read every letter in order to understand what's being communicated as long as we understand the patterns that words in our language typically employ. Your mind can "fill in the blanks" because it knows when vowels usually show up in words and you can insert letters "on the fly" to make reasonable guesses as to what the groups of letters mean.

This can only work if you have an intuitive understanding of the language being used. Indeed, being able to identify what something is based only on patterns you recognize isn't a bad working definition of the word "intuitive." When someone says they can read a language but can't speak it, that usually means they've begun to pick up the patterns. It's not a small thing.

You know that HTML stands for HyperText Markup Language and that individual HTML elements follow patterns we refer to as syntax. The syntax of HTML elements is so well-defined that your text editor will typically color-code their different parts so they're easier to understand at a glance.
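As a rough illustration of that syntax, here is a single element with its parts labeled in comments. The link text and the URL are placeholders chosen for this sketch, not something taken from the original post.

```html
<!-- Opening tag: the element name plus one attribute (name="value") -->
<a href="https://example.com/">
  <!-- Content: the text the browser shows to the reader -->
  An example link
<!-- Closing tag: the same element name, prefixed with a slash -->
</a>
```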

Markup Provides Structure

You also know that HTML elements can be grouped in particular ways to give HTML documents their overall structure. You can think about individual HTML elements as containers when they're used this way.

This is the basic structure of every web page ever written. There's nothing accidental about it. Each element means something to the browser.
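A minimal sketch of that structure might look like the following. The doctype line, the `lang` and `charset` values, the title text, and the paragraph are placeholder assumptions for illustration, not a copy of the original figure.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Information about the page, meant for the browser -->
    <meta charset="utf-8">
    <title>Semantics: Meaning from Markup</title>
  </head>
  <body>
    <!-- Content meant to be seen by people -->
    <p>The visible content of the page goes here.</p>
  </body>
</html>
```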

The <html> element is the container within which the HTML document is held. The <head> element contains elements that hold meta information. These elements are not part of the page's visible content; they carry information about the page's content that only the browser needs to know. The <title> element is meta information. It's not part of the page's content. It's information about the page that's there for the benefit of the web browser. Think of it this way: Some of my students would call me "Dr. Streeter." That's incorrect, as it turns out. I never finished my Ph.D., so it's really not appropriate to call me that. It's a title; it's not my name. It's an arbitrary label (albeit one that's hard to get, and I didn't). The text of every HTML <title> element could legitimately be just A Page. It'd be accurate, but not very helpful.

(Don't do this.)

(Really.)

I don't know how the word "title" was chosen. It would have been just as accurate to use the word "label" or "identity." It is <title>, though, and it's only a "title" as far as the web browser is concerned.

The <body> element, on the other hand, is the container for the content meant to be seen by people. It's why the page exists. The <head> element contains a series of elements that are meant to be understood by web browsers. The stuff inside the <body> element is read by computer programs called web browsers (or User Agents if you really want to show off) that are reading the HTML documents for people.

That last part is crucial. HTML is read by machines, not people. People are meant to see the output of what a browser interprets from what's in the HTML document.

The reason tools like Codepen.io exist is that the <html> and <head> elements are really more boilerplate than anything else. Yes, they're important, but they're not the part of the web page we (as humans) spend the most time on (if we're doing it right). The part of the page we really care about is inside the <body>. When you use a tool like Codepen, you're able to just work with HTML markup fragments as if you were writing them in the body of a full HTML document -- but without having to actually write a whole document and stick it on a server somewhere.

We want to be able to do that because the HTML elements that appear within the <body> element take this idea of structuring content to a whole other level. To understand the importance of this, it's worth taking a little side trip to think about how we really use words.

Structure Provides Meaning

Consider the following text:

ea efflorescere quo malis occaecat veniam exquisitaque instituendarum voluptate exquisitaque anim tempor quis fabulas dolor senserit relinqueret ne esse admodum singulis nam vidisse ab exercitation elit voluptatibus irure sunt familiaritatem o quae probant consectetur ullamco offendit voluptate ab eiusmod ita voluptate sempiternum exercitation do ea vidisse offendit adipisicing familiaritatem cupidatat multos hic quis adipisicing singulis laboris quid enim fugiat incurreret sint tractavissent deserunt admodum nisi deserunt comprehenderit est se iis tractavissent quamquam culpa tempor elit

There is no way to know what this is. The words are a kind-of-sort-of-Latin called lorem ipsum text that's commonly used as dummy text when mocking up all kinds of print and electronic documents.

In this example there are no capital letters. There are no sentences. We don't know what's supposed to be what. Could be a cookie recipe. Could be a ransom note (has anyone seen my cat?). Since the words are essentially nonsense, the only clue we're going to get about what this is comes from the structure of the words in relation to one another.

Let's add capital letters and some punctuation:

Ea efflorescere quo malis. Occaecat veniam exquisitaque instituendarum voluptate. Exquisitaque anim tempor quis, fabulas dolor senserit. Relinqueret ne esse admodum singulis nam vidisse ab exercitation elit voluptatibus. Irure sunt familiaritatem o quae probant consectetur ullamco offendit. Voluptate ab eiusmod ita voluptate sempiternum exercitation do ea vidisse offendit. Adipisicing familiaritatem cupidatat multos hic quis adipisicing singulis. Laboris quid enim fugiat incurreret sint tractavissent deserunt. Admodum nisi deserunt comprehenderit est. Se iis tractavissent quamquam culpa tempor elit.

Same letters and spaces, but now, at least, we can see what's supposed to be a sentence. Sure, the words still don't mean anything to us, but at least we can recognize a basic structure. We know it's (probably) not just a list of words.

Let's add a bit more structure:

Ea Efflorescere Quo Malis

Occaecat veniam exquisitaque instituendarum voluptate. Exquisitaque anim tempor quis, fabulas dolor senserit. Relinqueret ne esse admodum singulis nam vidisse ab exercitation elit voluptatibus. Irure sunt familiaritatem o quae probant consectetur ullamco offendit.

- Voluptate ab eiusmod ita voluptate sempiternum exercitation do ea vidisse offendit.
- Adipisicing familiaritatem cupidatat multos hic quis adipisicing singulis.
- Laboris quid enim fugiat incurreret sint tractavissent deserunt.

Admodum nisi deserunt comprehenderit est. Se iis tractavissent quamquam culpa tempor elit.

Now we're getting somewhere! Without knowing a single thing about what any of the words mean, you might be able to make a reasonable guess as to what's going on. You won't be able to translate it, but just the form of the words communicates something.

What different interpretation would you make if you saw this instead?

Ea Efflorescere Quo Malis

Occaecat veniam exquisitaque instituendarum voluptate. Exquisitaque anim tempor quis, fabulas dolor senserit. Relinqueret ne esse admodum singulis nam vidisse ab exercitation elit voluptatibus. Irure sunt familiaritatem o quae probant consectetur ullamco offendit.

1. Voluptate ab eiusmod ita voluptate sempiternum exercitation do ea vidisse offendit.
2. Adipisicing familiaritatem cupidatat multos hic quis adipisicing singulis.
3. Laboris quid enim fugiat incurreret sint tractavissent deserunt.

Admodum nisi deserunt comprehenderit est. Se iis tractavissent quamquam culpa tempor elit.

See what changed? Why would you choose to use a numbered list instead of a bulleted list? Does your guess about what these words mean change because of the way the items are presented?

Semantics Are Meanings Communicated Through Structure Defined By Markup

I hope you can see that the structure of the words -- the way they are arranged in relationship to one another -- suggests what the words themselves might mean. We know some structure has been imposed because the appearance of some of the words has changed in recognizable ways.

How does HTML markup make this happen? Let's look at the markup and see:

<h4>Ea efflorescere quo malis</h4>

<p>Occaecat veniam exquisitaque instituendarum voluptate.
Exquisitaque anim tempor quis, fabulas dolor senserit. Relinqueret
ne esse admodum singulis nam vidisse ab exercitation elit
voluptatibus. Irure sunt familiaritatem o quae probant consectetur
ullamco offendit.</p>

<ol>
    <li>Voluptate ab eiusmod ita voluptate sempiternum
    exercitation do ea vidisse offendit.</li>
    <li>Adipisicing familiaritatem cupidatat multos hic quis
    adipisicing singulis.</li>
    <li>Laboris quid enim fugiat incurreret sint tractavissent
    deserunt.</li>
</ol>

<p>Admodum nisi deserunt comprehenderit est. Se iis tractavissent
quamquam culpa tempor elit.</p>

Let's break down what we see — and don't see:

- The first four words are a heading. In the context of this HTML page, this heading is a fourth-level heading. That means that somewhere else in this document, there are first, second, and third-level headings. The beginning and the ending of the heading are marked by the <h4> and </h4> tags. The markup says nothing about what a fourth-level heading should look like. CSS dictates that, not the HTML.
- There is a block of text surrounded by the tags <p> and </p>, which set it off as a paragraph. It's literally the use of those tags that means the content should be considered a paragraph. In this case paragraphs happen to have an extra line before and after them and no indentation of the first line. That's only because the CSS is set up that way. All the HTML does is say "this is a paragraph." If the CSS defined indentations and no extra lines, the same markup would work just as well.
- Three sentences form a list in which their order matters. We know the order matters because they're displayed in numerical order, but those numbers could just as easily be letters, roman numerals, or even spelled out as words. The <ol> and </ol> tags group the items on that list. Individual list items are defined by <li> and </li> tags. If the order of the sentences didn't matter, bullet points would be more appropriate and unordered list (<ul> and </ul>) tags would be used.

  List items must be surrounded by either ordered list tags or unordered list tags so the item "knows" what kind of marker to add. If you knew nothing about HTML you'd never know that this was a numbered list. And it doesn't have to be numbered, as I pointed out in the description. "Ordered" and "unordered" is all the HTML defines. How those concepts manifest themselves is up to CSS.
- The last two sentences are marked off as a final paragraph with our old pals <p> and </p>.
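To make the split between meaning and appearance concrete, here is a hedged sketch that reuses a shortened version of the markup above and changes only the CSS. The style rules, the page title, and the trimmed paragraph text are assumptions made for this example; they are not the styles used on the original page.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Same markup, different presentation</title>
    <style>
      /* Indent the first line of each paragraph and drop the extra space around it */
      p { margin: 0; text-indent: 1.5em; }
      /* The list is still ordered, but the markers are now roman numerals */
      ol { list-style-type: upper-roman; }
    </style>
  </head>
  <body>
    <h4>Ea efflorescere quo malis</h4>
    <p>Occaecat veniam exquisitaque instituendarum voluptate.</p>
    <ol>
      <li>Voluptate ab eiusmod ita voluptate sempiternum exercitation do ea vidisse offendit.</li>
      <li>Adipisicing familiaritatem cupidatat multos hic quis adipisicing singulis.</li>
      <li>Laboris quid enim fugiat incurreret sint tractavissent deserunt.</li>
    </ol>
    <p>Admodum nisi deserunt comprehenderit est.</p>
  </body>
</html>
```

Editing the style rules changes only how the page looks. Swapping <ol> for <ul> would change what the markup means, because it would say the order of the items no longer matters.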

Documents can have further internal structures that aren't necessarily visible to the eye, but still have meanings that matter. The figure above is the structure of the web page where the text of this blog post originated. See where there's a <section> element with an id attribute with the value semantics? That could just as easily say "You are Here."

Except that would be stupid.

Which doesn't mean I wouldn't do it.

It just means I didn't think of it before I grabbed the screenshot.

Look at all those other tags. Now look at the page you're reading.

You should be able to find that level-3 heading since it's up there at the top of the page. What about the rest of them? What does a <header> look like? What about a <nav>? What the heck is a <div> and why would it have an id with a value of "wrapper"? What's the deal with all those <section>s?

Many of those tags have names that are descriptive of what kind of region they are meant to define. They mean something. When content is ascribed particular meaning through the use of particular markup tags, it's referred to as semantic markup.

Semantic markup provides meaning. It doesn't define appearance.
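As a rough sketch of how such regions can fit together, consider the fragment below. It is a hypothetical outline, not the actual markup of the page in the screenshot: the site name, the link targets, the heading levels, and the "intro" id are invented for illustration, and only the <header>, <nav>, <div id="wrapper">, and <section id="semantics"> names come from the discussion above.

```html
<div id="wrapper">
  <!-- Introductory content for the page: site name and navigation -->
  <header>
    <h1>Example Site</h1>
    <nav>
      <a href="/">Home</a>
      <a href="/posts/">Posts</a>
    </nav>
  </header>

  <!-- Each section groups related content under its own heading -->
  <section id="intro">
    <h2>The Power of Patterns</h2>
    <p>Placeholder paragraph for the first section.</p>
  </section>

  <section id="semantics">
    <h2>Semantics: Meaning from Markup</h2>
    <p>Placeholder paragraph for the second section.</p>
  </section>
</div>
```

Note that <div> is the odd one out here: it groups content without saying anything about what that content is, which is why it usually carries an id or class to explain its role.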

Understanding this is what separates people who understand what they are doing when they mark up a document from people who just cut-and-paste tags on a page and hope they work.

If you want to know specifically what those tags you see mean, you can look them up. The main takeaway from this post is that the purpose of HTML is to give a document a meaningful structure. Sometimes those meanings will result in special formatting, but it's not always the case.

When you learn about a particular element's tag, don't think about it in terms of what it makes something look like. Think about it in terms of what it says something means.
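One small, hedged illustration of that difference: browsers typically render both paragraphs below in bold, but only the first one says anything about meaning. The warning text is a placeholder made up for this example.

```html
<!-- Says what the text means: this warning is important -->
<p><strong>Back up your files before you continue.</strong></p>

<!-- Says only what the text looks like: bold, with no added meaning -->
<p><span style="font-weight: bold">Back up your files before you continue.</span></p>
```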

That's the essence of semantic markup with HTML. Understand that and you'll understand HTML in a pretty deep way.