Hasan Ali

Web Fundamentals: The HTML-JSX Confusion

Contents

  1. The mistake
  2. Parsers
  3. XHTML & JSX
  4. Why it matters
  5. Summary

1. The mistake

This post was originally going to be a short correction on Web Fundamentals: Web Components Part 1 about a false claim I made. I claimed that you could follow along with a simple HTML file by copying and pasting the code snippets, which were originally written in Astro. However, that isn't entirely true unless you tweak some of the markup first. I used Astro's markup syntax [1], which is very similar to HTML, and since I didn't use any of its templating features in my snippets, I thought it'd be identical to HTML. It turns out that it would only be identical to the incorrect HTML I would've written, because I wasn't aware of how browsers parse it. The mistake I made was not putting closing tags on my custom elements: I wrote <x-timer /> instead of <x-timer></x-timer>.

I suppose this is a small enough error that you could either ignore it or accept a quick correction for it, but I thought it'd be more valuable to take this opportunity to peek behind the scenes of how browsers parse HTML. My intention with this snippet was to declare an empty element in the document, and then add children to it using JavaScript. It worked for me in Astro because Astro's templating syntax is closer to JSX than it is to HTML. If you were to do the same thing in a plain .html file and then view it in the browser, there's a chance it might not look right, depending on the markup that followed the declaration. What's the problem then? And why is it nuanced?

2. Parsers

Syntax is an agreement. It’s an agreement between you and the tool that parses it. When we follow the rules, we expect to see the same results every time because that’s what we’ve agreed to. If the rules were strict, they'd be harder to break, but they might also feel too rigid. If the rules were lenient, they'd offer greater flexibility, but it would also be harder to avoid ambiguity in the code.

The HTML parser is very flexible by design. It’s built in such a way that even if you break the rules a little, it’ll figure out how to simplify things and display something on the screen. It’s very tolerant. This means if you view the following incomplete markup in the browser, it'll still render a page:

<div>
<p>HTML is tolerant
<br>
<p>Unusually tolerant if you've used a programming language
<section>
Why does it work without any closing tags?

If you inspect this page, you'll see that the browser has inserted some basic tags you've missed and also figured out where to put the closing tags for the elements you’ve declared:

<html>
  <head></head>
  <body>
    <div>
      <p>HTML is tolerant
        <br>
      </p>
      <p>Unusually tolerant if you've used a programming language</p>
      <section>
        Why does it work without any closing tags?
      </section>
    </div>
  </body>
</html>

This might've been exactly what you wanted. Or you might've intended for the section to be outside of the div. Since you've omitted the closing tags, there is no way for the HTML parser to deduce your intent any further [2]. It simplifies it and just does what it does because that’s the agreement. This flexibility means that the burden of enforcing the correctness of the markup is on you. Are there other parsers with different rules that will yell at you when you're being ambiguous like this? Yes.
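To make the "agreement" a little more concrete, here is a minimal sketch of one of the rules the HTML parser applies: certain start tags close a paragraph that is still open. The helper name insertImpliedParagraphEnds and the short tag list are my own assumptions for illustration; the real list in the specification is longer and the real algorithm works on a stack of open elements rather than a flat token list.

```javascript
// Start tags that end a still-open <p>. Assumption: a small subset of the
// real list in the HTML specification, enough for this example.
const CLOSES_OPEN_PARAGRAPH = new Set(["p", "div", "section", "ul", "ol", "h1", "h2"]);

// Hypothetical helper: takes a flat list of tokens and returns a new list
// with the implied "</p>" tokens written out.
function insertImpliedParagraphEnds(tokens) {
  const output = [];
  let paragraphOpen = false;

  for (const token of tokens) {
    const startTag = /^<([a-z][a-z0-9-]*)>$/.exec(token);

    // A start tag from the list arriving while a <p> is open closes that <p>.
    if (startTag && paragraphOpen && CLOSES_OPEN_PARAGRAPH.has(startTag[1])) {
      output.push("</p>");
      paragraphOpen = false;
    }

    if (token === "</p>") paragraphOpen = false;
    if (startTag && startTag[1] === "p") paragraphOpen = true;
    output.push(token);
  }

  // Running out of input also closes the paragraph.
  if (paragraphOpen) output.push("</p>");
  return output;
}

const exampleTokens = ["<p>", "HTML is tolerant", "<br>", "<p>", "Unusually tolerant", "<section>", "Why?"];
console.log(insertImpliedParagraphEnds(exampleTokens).join(""));
```

Notice that <br> is not in the list, so it doesn't end the paragraph; only the next <p> and the <section> do. The sketch only handles paragraphs, so the <section> itself is left open here.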

The flexibility is a feature of the HTML parser, but there are other parsers out there that are stricter, namely XML parsers. XML is a data format like JSON. In XML, there are no predefined tags and you can use it to create whatever shape you want to describe your data. For example:

<person>
  <name>Hasan</name>
  <occupation>Software Engineer</occupation>
  <hobbies>
    <hobby>
      <name>Writing</name>
      <reason>It helps me shape my thinking</reason>
      <frequency>Few times a week</frequency>
    </hobby>
    <hobby>
      <name>Video games</name>
      <reason>It helps me relax and unwind</reason>
      <frequency>Few times a week</frequency>
    </hobby>
    <hobby>
      <name>Cooking</name>
      <reason>It helps me try new things and be creative</reason>
      <frequency>Almost every day</frequency>
    </hobby>
  </hobbies>
</person>

If you're more familiar with JSON, the equivalent is:

{
  "name": "Hasan",
  "occupation": "Software Engineer",
  "hobbies": [
    {
      "name": "Writing",
      "reason": "It helps me shape my thinking",
      "frequency": "Few times a week"   
    },
    {
      "name": "Video games",
      "reason": "It helps me relax and unwind",
      "frequency": "Few times a week"   
    },
    {
      "name": "Cooking",
      "reason": "It helps me try new things and be creative",
      "frequency": "Almost every day"
    }   
  ] 
}

There are no predefined tags in XML and, if you notice, every tag that we created has a closing tag; this isn't optional [3]. Without it, there would be no way to know where the boundaries of the different elements are. With HTML, the parser uses the rules of the predefined elements to decide where the closing tags could go if they're not specified. If there aren't any predefined tags, the only way to reliably parse XML is to expect explicit closing tags. So, what if you wrote HTML as if it were XML, so that you gain all the benefits of XML? This would remove all ambiguity in your markup at the time of authoring it.
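As a rough illustration of that strictness, here is a small sketch of a checker that behaves in the spirit of an XML parser: it keeps a stack of open elements and refuses to guess. The function name checkWellFormed is hypothetical, and the sketch ignores comments, CDATA sections, and processing instructions, all of which a real XML parser has to handle.

```javascript
// Hypothetical strict checker: every start tag needs a matching end tag,
// and nothing is inferred on the author's behalf.
function checkWellFormed(markup) {
  const openElements = [];
  const tagPattern = /<(\/?)([A-Za-z][\w-]*)[^<>]*?(\/?)>/g;

  for (const [, isEndTag, name, isSelfClosing] of markup.matchAll(tagPattern)) {
    if (isSelfClosing) continue; // <hobby /> opens and closes in one step
    if (!isEndTag) {
      openElements.push(name);
      continue;
    }
    // An end tag must match the most recently opened element.
    const expected = openElements.pop();
    if (expected !== name) {
      throw new Error(`Unexpected closing tag </${name}>`);
    }
  }

  if (openElements.length > 0) {
    throw new Error(`Missing closing tag for <${openElements[openElements.length - 1]}>`);
  }
  return true;
}

console.log(checkWellFormed("<person><name>Hasan</name></person>"));

try {
  checkWellFormed("<div><p>HTML is tolerant<br><p>Unusually tolerant");
} catch (error) {
  console.log(error.message);
}
```

The first call passes, while the tolerant HTML from earlier is rejected because its tags are never closed. There is no list of predefined elements to fall back on, so the checker has nothing to infer from.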

3. XHTML & JSX

This was the motivation behind XHTML, or the XML syntax for HTML [4]. You can write your markup strictly in the XML syntax, but it does have the drawback of not rendering anything if you get the markup wrong. With it, you’ve traded the flexibility of the HTML syntax for correctness, and depending on the problems you’re facing, this might be worth it. The actual history of how the standards evolved is a little murky to me, but today you can have the better parts of both approaches and write your HTML markup using a mixture of traditional HTML syntax and an XML-like one. This compatibility was added in HTML5 to be more friendly to parser-agnostic markup, so that it can be parsed by both HTML and XML parsers. The XML syntax inspired the JSX syntax with the advent of React [5], and even though strict XHTML is not in fashion anymore, its impact can be felt in the templating syntaxes that it inspired.

With that context established, how would you define elements that can't have children according to the HTML specification, like <br> or <input>, using the XML syntax? In the specification, these elements are called "void elements" [6]. XML syntax requires you to have closing tags, so how can you write this and remain parser agnostic? The XML syntax supports "self-closing" tags for this reason, so that would mean <br> and <input> would become <br /> and <input />, respectively (on the other hand, the HTML parser is also smart enough to ignore the erroneous closing tags for void elements, but don't do this). This was added to HTML5, but there are still strict rules around it. There's a short list of predefined void elements and that's it. You cannot define any other elements as if they were void elements (for example, you can't do <div /> instead of <div></div> for an empty div), and this is the reason why my code snippet wasn't valid HTML.
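If you want to catch this mistake in your own markup, a small lint-style check is enough to get started. The sketch below hard-codes the void elements from the HTML specification and reports every self-closing tag that isn't one of them. The function name findInvalidSelfClosingTags is my own, and the regular expression is a simplification: it doesn't know about comments or scripts, and it will also flag SVG and MathML elements, where self-closing tags are allowed.

```javascript
// The void elements defined by the HTML specification.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Hypothetical helper: returns the names of self-closing tags that are
// not void elements, in the order they appear.
function findInvalidSelfClosingTags(html) {
  const selfClosingTag = /<([a-zA-Z][a-zA-Z0-9-]*)(?:\s[^<>]*?)?\s*\/>/g;
  const offenders = [];

  for (const match of html.matchAll(selfClosingTag)) {
    const name = match[1].toLowerCase();
    if (!VOID_ELEMENTS.has(name)) offenders.push(name);
  }
  return offenders;
}

const snippet = '<x-timer /><br /><div class="box" /><input type="text" />';
console.log(findInvalidSelfClosingTags(snippet));
```

For the snippet above, the check reports x-timer and div, and leaves <br /> and <input /> alone.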

When the HTML parser encounters a non-void element with a self-closing tag, it treats it as if it were just an opening tag [7]. The parser then figures out where to put the closing tag like it normally would when you omit it, and that's why my snippet would've worked only some of the time if it were used directly in HTML. It worked reliably for me in Astro because Astro uses a JSX-like syntax, which would translate <div /> to <div></div>, and <x-timer /> to <x-timer></x-timer>. The more granular reason this syntax isn't supported for custom elements is that the list of void elements is fixed in the HTML parser: the parser decides how to treat a tag from its name alone [8], and a custom element has no way to declare itself void, since its connectedCallback only runs once the element has already been created and connected to the document [9].
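Both behaviours can be sketched in a few lines. The first function below is a toy tree builder that, like the HTML parser, never consults the trailing slash and only asks whether the tag name is a void element. The second mimics what a JSX-like compiler does by expanding self-closing tags into explicit pairs. The names buildTree, outline, and expandSelfClosing are hypothetical, and the tree builder deliberately leaves out implied end tags, attributes, and error recovery.

```javascript
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Toy tree builder: the "/" in "<x-timer />" is ignored, as in HTML.
function buildTree(html) {
  const root = { name: "#root", children: [] };
  const stack = [root];
  const tokenPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)[^<>]*>|([^<]+)/g;

  for (const [, isEndTag, rawName, text] of html.matchAll(tokenPattern)) {
    const parent = stack[stack.length - 1];
    if (text !== undefined) {
      if (text.trim()) parent.children.push({ name: "#text", children: [] });
      continue;
    }
    const name = rawName.toLowerCase();
    if (isEndTag) {
      // Close the nearest open element with this name; ignore stray end tags.
      const index = stack.map((node) => node.name).lastIndexOf(name);
      if (index > 0) stack.length = index;
      continue;
    }
    const node = { name, children: [] };
    parent.children.push(node);
    // Only the void list decides whether the element stays open.
    if (!VOID_ELEMENTS.has(name)) stack.push(node);
  }
  return root;
}

// Prints a tree as nested names, for example "#root(x-timer,p(#text))".
function outline(node) {
  const inner = node.children.map(outline).join(",");
  return inner ? `${node.name}(${inner})` : node.name;
}

// JSX-like translation: <x-timer /> becomes <x-timer></x-timer>.
function expandSelfClosing(markup) {
  const selfClosingTag = /<([a-zA-Z][a-zA-Z0-9-]*)((?:\s[^<>]*?)?)\s*\/>/g;
  return markup.replace(selfClosingTag, (tag, name, attributes) =>
    VOID_ELEMENTS.has(name.toLowerCase())
      ? tag
      : `<${name}${attributes.trimEnd()}></${name}>`
  );
}

const source = "<x-timer /><p>After</p>";
console.log(outline(buildTree(source)));
console.log(outline(buildTree(expandSelfClosing(source))));
```

In the first outline the paragraph ends up as a child of x-timer, which is the "depends on the markup that followed" problem. After the expansion step, x-timer is empty and the paragraph is its sibling, which is what I had intended all along.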

4. Why it matters

Does it really matter? I think when learning the fundamentals, it's important to be shown the right thing and then the alternatives for the very specific problems they solve. That's quite challenging today because of how ubiquitous non-standard technologies are, and how much they influence our understanding of the web platform. When I learnt HTML 15 years ago, I didn't spend long enough with it to notice things like void elements and optional tags. Having written JSX in React, Solid and Astro for years, my brain couldn't even understand why my mistake was a mistake until I saw it pointed out in a video about WebC [8], which is a templating syntax that is much closer to native HTML than JSX is.

A part of understanding the web is understanding how the browser handles what you throw at it. In Web Fundamentals: HTML Forms we looked at how the browser takes your declarative HTML form and submits an HTTP request to the server when you submit it. In Web Fundamentals: Web Components Part 1 we looked at how the browser parses and executes your custom element logic. This approach gives us a stronger foundation when trying to understand and evaluate some of the modern innovations in web development, because framework authors are building on top of the strengths and weaknesses of the platform.

5. Summary

HTML tags are so fundamental to web development that the rules around them can often be overlooked, especially when you work with various templating languages regularly. By looking at how the browser parses HTML tags, we saw that the source of my error was not understanding the difference between the HTML and JSX syntaxes. We built up to it by looking at the different considerations the different parsers have had to make, and how all of that culminated in my misunderstanding of when it's okay to use self-closing tags. This was an example of the importance of questioning your most fundamental assumptions, because that is how you keep growing and stay up-to-date.

If you think of anything I've missed or just want to get in touch, you can reach me through a comment, via Mastodon, via Threads, via Twitter or through LinkedIn.

References

  1. Astro templating syntax [Website]
  2. Where HTML beats C [YouTube]
  3. XML Tags [Website]
  4. XHTML [MDN]
  5. React JSX closing tags [Website]
  6. HTML spec on void elements [Website]
  7. HTML spec on HTML parsing [Website]
  8. WebC is neat! with Zach Leatherman, Web Developer [YouTube]
  9. Web Fundamentals: Web Components Part 1 [Website]

Top comments (1)

Hasan Ali

Coincidentally, I saw a YouTuber run into this same issue today, so I clipped it for reassurance that it wasn't just me:

🎬 JSX definitely HAS infected our brains