Fishbite

HTML Parsing with PHP 8.4's New DOM\HTMLDocument Class

Intro: PHP 8.4's New DOM\HTMLDocument Class

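PHP 8.4 quietly introduced a powerful new class: DOM\HTMLDocument (documented as Dom\HTMLDocument; PHP class names are case-insensitive, so both spellings work). Unlike the traditional DOMDocument, it uses a spec-compliant HTML5 parser rather than an XML-oriented one, and it brings a more forgiving and intuitive DOM interface to modern PHP.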

Note: This requires PHP 8.4+ and the dom extension to be enabled.
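
A quick way to confirm both requirements at runtime is a small guard like the one below. This is only a sketch; the variable names are illustrative and not taken from the repo.

```php
<?php
// Check the two requirements from the note: PHP 8.4+ and the dom extension.
$hasPhp84 = version_compare(PHP_VERSION, '8.4.0', '>=');
$hasDom   = extension_loaded('dom');

// The class only exists when both conditions hold.
$ready = $hasPhp84 && $hasDom && class_exists(\Dom\HTMLDocument::class);

echo $ready
    ? "Dom\\HTMLDocument is available\n"
    : "Dom\\HTMLDocument is NOT available (PHP " . PHP_VERSION . ")\n";
```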


What's the Difference?

DOM\HTMLDocument is more HTML5-native than the classic DOMDocument, whose HTML loading relies on an older, XML-centric parser that predates HTML5 and tends to complain about modern markup. If you’ve ever fought with DOMDocument emitting warnings, silently failing, or auto-correcting your tags in unexpected ways, this new API is for you.
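
To make the difference concrete, here is a minimal sketch that feeds the same sloppy HTML5 string to both APIs. The markup is invented for illustration. LIBXML_NOERROR is passed so the new parser does not emit warnings for parse errors, and how many errors the classic parser reports depends on the libxml2 version installed, so treat that number as informational only.

```php
<?php
// Deliberately sloppy HTML5: semantic tags and unclosed <p> elements.
$html = '<!DOCTYPE html><html><head><title>Demo</title></head>'
      . '<body><main><article><p>One<p>Two</article></main></body></html>';

// Classic API: collect parser complaints instead of printing warnings.
libxml_use_internal_errors(true);
$old = new DOMDocument();
$old->loadHTML($html);
$oldErrors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors(false);

// New API: HTML5 parsing rules close the first <p> when the second one starts.
$new = Dom\HTMLDocument::createFromString($html, LIBXML_NOERROR);
$paragraphs = $new->querySelectorAll('article > p');

echo "DOMDocument reported " . count($oldErrors) . " parser error(s)\n";
echo "Dom\\HTMLDocument found {$paragraphs->length} paragraph(s)\n";
```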


Sample Project on GitHub

I created a GitHub repo showing how to:

  • Load and parse HTML5 files
  • Access elements like <head> and <title>
  • Extract <meta> tags
  • Cleanly render DOM elements as HTML

GitHub Repo: github.com/Fishbite/php-dom-htmldocument-examples


Example: Extracting the First Element in <head>

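<?php

use DOM\HTMLDocument;

require __DIR__ . '/vendor/autoload.php';

// Resolve the path from this script's directory so it works from any working directory
$dom = HTMLDocument::createFromFile(__DIR__ . '/html/sample.html');
$head = $dom->getElementsByTagName('head')->item(0);

// childNodes also contains text and comment nodes, so skip ahead to the first element
foreach ($head->childNodes as $child) {
    if ($child instanceof DOM\Element) {
        // localName is the lowercase element name; tagName is uppercased for HTML elements, as the DOM spec requires
        echo "First tag in <head>: <{$child->localName}>\n";
        echo $dom->saveHTML($child), "\n";
        break;
    }
}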

What's Included in the Repo?

  • extract_head.php: Show the first child of <head>
  • extract_title.php: Grab the <title> text
  • list_meta.php: List all <meta> tags
  • sample.html: Minimal HTML5 doc for testing
  • composer.json: Basic autoloading config

You can run each file individually from the command line or route them through a browser if preferred.
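
For a feel of what the title and meta scripts do, here is a minimal sketch of both tasks in one file. It is not the repo's actual code: the inline document stands in for sample.html, and the variable names are made up for illustration.

```php
<?php
// Hypothetical stand-in for sample.html, inlined so the snippet runs on its own.
$html = <<<'HTML'
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="description" content="Sample page">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Sample Page</title>
</head>
<body><h1>Hello</h1></body>
</html>
HTML;

$dom = Dom\HTMLDocument::createFromString($html, LIBXML_NOERROR);

// Grab the <title> text; the nullsafe operator covers documents without a title.
$title = $dom->querySelector('title')?->textContent ?? '';
echo "Title: {$title}\n";

// Collect every <meta> tag that has a name attribute as name => content.
$meta = [];
foreach ($dom->querySelectorAll('meta[name]') as $tag) {
    $meta[$tag->getAttribute('name')] = $tag->getAttribute('content') ?? '';
}

foreach ($meta as $name => $content) {
    echo "{$name}: {$content}\n";
}
```

The CSS selector meta[name] skips tags such as the charset declaration, which has no name/content pair to report.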


What’s Next?

There’s very little official documentation or community content around DOM\HTMLDocument, so this is a good time to explore it together. Feedback, suggestions, or PRs are more than welcome!

Let’s help get the word out and make working with HTML in PHP a whole lot nicer.
